My Journey with LLVM (GSoC'20 Phase 1)

2020/06/30

It has been one month since my proposal was accepted for GSoC'20. I have learned a lot and had a wonderful time. Besides, we’ve made some progress toward our goal, so it’s a good time to review what I’ve done and what I’ve learned in the first coding period.

The Project

In LLVM, we use yaml2obj to handcraft simple binaries of various formats (ELF, Mach-O, COFF, etc.) from YAML descriptions. My project is to add DWARF support to yaml2obj, which will hopefully make it easier for people to handcraft debug sections in those kinds of binaries. The project is mentored by James Henderson.

The Progress

We’ve already ported the existing DWARF implementation to yaml2elf as planned, so people are now able to handcraft DWARF sections at a low level. I have to admit that the current implementation is hard to use, since we have to specify nearly every field of those sections, e.g., the length, the version, and the address or offset of the associated DWARF section. That’s because the sections are isolated from each other in the current implementation, and DWARFYAML lacks a strategy for interlinking them properly. This is what we are going to address next, and I believe usability will improve. We also keep a spreadsheet that records our progress against the expected timeline.
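
To see why a field like Length shouldn’t need to be written by hand: a unit’s initial length is just the number of bytes that follow the length field, so it can be computed once the rest of the unit has been encoded. Below is a minimal Python sketch of that idea, assuming a little-endian target. It is not the yaml2obj C++ code, and `initial_length` and `wrap_unit` are hypothetical helper names.

```python
import struct


def initial_length(fmt: str, length: int) -> bytes:
    """Encode a DWARF initial length field (little-endian assumed)."""
    if fmt == "DWARF32":
        return struct.pack("<I", length)
    if fmt == "DWARF64":
        # DWARF64 escape: 0xffffffff followed by a 64-bit length.
        return struct.pack("<IQ", 0xFFFFFFFF, length)
    raise ValueError(f"unknown format: {fmt}")


def wrap_unit(fmt: str, body: bytes) -> bytes:
    """Prefix an already-encoded unit body with its computed length."""
    # The length counts the bytes after the length field itself.
    return initial_length(fmt, len(body)) + body


body = struct.pack("<H", 5)  # e.g. just a 2-byte version field
print(wrap_unit("DWARF32", body).hex())  # 020000000500
print(wrap_unit("DWARF64", body).hex())  # ffffffff02000000000000000500
```

With a helper like this, Length can default to the computed value, and the same approach extends to other derived fields such as offsets into .debug_abbrev or .debug_str.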

The Implementation Status

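The syntax of the supported DWARF sections and their known issues are listed below. I’m not going to resolve all of the issues, since some of these sections are deprecated in the DWARFv5 spec and rarely used (e.g., .debug_pubnames/.debug_pubtypes are superseded by .debug_names, and .debug_ranges by .debug_rnglists).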

Note: Fields wrapped in “[[ ]]” are optional.

Each entry below shows a section’s YAML syntax, followed by its known issues and possible improvements.

debug_abbrev:
  - [[Code: 1]]
    Tag: DW_TAG_compile_unit
    Children: DW_CHILDREN_yes
    Attributes:
      - Attribute: DW_AT_producer
        Form: DW_FORM_strp

* Doesn’t support emitting multiple abbreviation tables. See D83116.
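
For reference, each abbreviation declaration is encoded as a ULEB128 code, a ULEB128 tag, a one-byte children flag, and (attribute, form) pairs ending with 0, 0; a table ends with a single 0 byte (the terminator added in D82933). Below is a minimal Python sketch with constant values taken from the DWARF spec and hypothetical helper names. It also records where each table starts, which is the offset a unit header would point at if multiple tables were supported.

```python
DW_TAG_compile_unit = 0x11
DW_CHILDREN_yes = 0x01
DW_AT_producer = 0x25
DW_FORM_strp = 0x0E


def uleb128(value: int) -> bytes:
    """Unsigned LEB128: 7 bits per byte, high bit set on all but the last."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_abbrev_table(decls) -> bytes:
    """decls: list of (code, tag, children, [(attribute, form), ...])."""
    out = bytearray()
    for code, tag, children, attrs in decls:
        out += uleb128(code) + uleb128(tag) + bytes([children])
        for attribute, form in attrs:
            out += uleb128(attribute) + uleb128(form)
        out += b"\x00\x00"  # end of this declaration's attribute list
    out += b"\x00"  # null code terminates the table
    return bytes(out)


def encode_abbrev_section(tables):
    """Concatenate tables and return (bytes, start offset of each table)."""
    data = bytearray()
    offsets = []
    for table in tables:
        offsets.append(len(data))
        data += encode_abbrev_table(table)
    return bytes(data), offsets


cu = [(1, DW_TAG_compile_unit, DW_CHILDREN_yes, [(DW_AT_producer, DW_FORM_strp)])]
data, offsets = encode_abbrev_section([cu, cu])
print(data.hex(), offsets)  # 011101250e000000011101250e000000 [0, 8]
```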

debug_addr:
  - [[Format: DWARF32/DWARF64]]
    [[Length: 0x1234]]
    Version: 5
    [[AddressSize: 8]]
    [[SegmentSelectorSize: 0]]
    Entries:
      - Address: 0x1234
        [[Segment: 0x1234]]

* yaml2macho doesn’t support emitting the .debug_addr section.
* dwarf2yaml doesn’t support parsing the .debug_addr section.
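
The DWARFv5 .debug_addr layout is simple enough that the optional fields above can all be derived: after the initial length come a 2-byte version, a 1-byte address size, a 1-byte segment selector size, and then the segment/address pairs. Below is a minimal Python sketch, assuming a little-endian target; the helper names are hypothetical.

```python
import struct


def initial_length(fmt: str, length: int) -> bytes:
    if fmt == "DWARF32":
        return struct.pack("<I", length)
    return struct.pack("<IQ", 0xFFFFFFFF, length)  # DWARF64


def encode_debug_addr(entries, fmt="DWARF32", version=5,
                      address_size=8, seg_size=0) -> bytes:
    """entries: list of (segment, address); segment is skipped if seg_size == 0."""
    body = struct.pack("<HBB", version, address_size, seg_size)
    for segment, address in entries:
        if seg_size:
            body += segment.to_bytes(seg_size, "little")
        body += address.to_bytes(address_size, "little")
    # Length is derived from the body, so it never has to be written by hand.
    return initial_length(fmt, len(body)) + body


table = encode_debug_addr([(0, 0x1234)])
print(table.hex())  # 0c000000050008003412000000000000
```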

debug_aranges:
  - [[Format: DWARF32/DWARF64]]
    Length: 0x1234
    CuOffset: 0x1234
    AddrSize: 0x08
    SegSize: 0x00
    Descriptors:
      - Address: 0x1234
        Length: 0x00

* The Length, AddrSize and SegSize fields should be optional.
* Rename CuOffset to DebugInfoOffset.
* Rename AddrSize to AddressSize.
* Rename SegSize to SegmentSelectorSize.
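
Similarly, the Length, AddrSize and SegSize values of a .debug_aranges set follow from the rest of the set. One subtlety a generator has to handle is padding: the first (address, length) tuple must start at an offset, relative to the beginning of the set, that is a multiple of the tuple size. Below is a minimal Python sketch, assuming a zero segment selector size and a little-endian target; the helper names are hypothetical.

```python
import struct


def initial_length(fmt: str, length: int) -> bytes:
    if fmt == "DWARF32":
        return struct.pack("<I", length)
    return struct.pack("<IQ", 0xFFFFFFFF, length)  # DWARF64


def encode_debug_aranges(descriptors, debug_info_offset,
                         fmt="DWARF32", address_size=8, version=2) -> bytes:
    """descriptors: list of (address, length); segment selector size is 0."""
    offset_size = 4 if fmt == "DWARF32" else 8
    length_field_size = 4 if fmt == "DWARF32" else 12
    body = struct.pack("<H", version)
    body += debug_info_offset.to_bytes(offset_size, "little")
    body += struct.pack("<BB", address_size, 0)  # address_size, segment_size
    # Pad so the first tuple is aligned to 2 * address_size from the set start.
    tuple_size = 2 * address_size
    body += b"\x00" * (-(length_field_size + len(body)) % tuple_size)
    # The list of tuples ends with an all-zero (0, 0) entry.
    for address, length in list(descriptors) + [(0, 0)]:
        body += address.to_bytes(address_size, "little")
        body += length.to_bytes(address_size, "little")
    return initial_length(fmt, len(body)) + body


aranges = encode_debug_aranges([(0x1234, 0x10)], debug_info_offset=0)
print(len(aranges))  # 48: 12-byte header, 4 bytes of padding, 2 tuples of 16
```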

debug_info:
  - [[Format: DWARF32/DWARF64]]
    Length: 0x1234
    Version: 5
    UnitType: DW_UT_compile
    AbbrOffset: 0x00
    AddrSize: 0x08
    Entries:
      - AbbrCode: 1
        Values:
          - Value: 0x1234
          - BlockData: [ 0x12, 0x34 ]
          - CStr: 'abcd'

* Rename AbbrOffset to DebugAbbrevOffset.
* Rename AddrSize to AddressSize.
* Rename AbbrCode to AbbrevCode or Code.
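
The field order of the unit header depends on the version, which is one reason a correct header (including the DWARF64 one fixed in D82621) is easy to get wrong: DWARFv5 puts unit_type and address_size before debug_abbrev_offset, while versions 2 to 4 have no unit_type and place debug_abbrev_offset before address_size. Below is a minimal Python sketch that derives Length from the encoded DIEs, assuming a little-endian target; the helper names are hypothetical.

```python
import struct

DW_UT_compile = 0x01


def initial_length(fmt: str, length: int) -> bytes:
    if fmt == "DWARF32":
        return struct.pack("<I", length)
    return struct.pack("<IQ", 0xFFFFFFFF, length)  # DWARF64


def encode_compile_unit(fmt, version, abbrev_offset, address_size, dies,
                        unit_type=DW_UT_compile) -> bytes:
    """dies: already-encoded DIE bytes (abbrev code + attribute values)."""
    offset_size = 4 if fmt == "DWARF32" else 8
    abbrev = abbrev_offset.to_bytes(offset_size, "little")
    if version >= 5:
        header = struct.pack("<HBB", version, unit_type, address_size) + abbrev
    else:
        header = struct.pack("<H", version) + abbrev + bytes([address_size])
    body = header + dies
    return initial_length(fmt, len(body)) + body


# Abbrev code 1, a DW_FORM_strp value (offset 0 into .debug_str), and a
# null entry that ends the children of a DW_CHILDREN_yes DIE.
dies = b"\x01" + (0).to_bytes(4, "little") + b"\x00"
v5 = encode_compile_unit("DWARF32", 5, 0, 8, dies)
v4 = encode_compile_unit("DWARF32", 4, 0, 8, dies)
```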

debug_line:
  - [[Format: DWARF32/DWARF64]]
    Length: 0x1234
    Version: 4
    PrologueLength: 0x1234
    MinInstLength: 1
    DefaultIsStmt: 1
    LineBase: 251
    LineRange: 14
    OpcodeBase: 3
    StandardOpcodeLengths: [ 0, 1, 1 ]
    IncludeDirs:
      - a.dir
    Files:
      - Name: hello.c
        DirIndex: 0
        ModTime: 0
        Length: 0
    Opcodes:
      - Opcode: DW_LNS_extended_op
        ExtLen: 9
        SubOpcode: DW_LNE_set_address
        Data: 0x1234

* The DWARFv5 .debug_line section isn’t tested.
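
ExtLen is another derivable field. An extended opcode is a 0x00 byte, a ULEB128 length covering the sub-opcode and its operands, then the sub-opcode and operands; for DW_LNE_set_address with an 8-byte address that is 1 + 8 = 9, matching the ExtLen in the debug_line syntax above. Below is a minimal Python sketch with opcode values from the DWARF spec; the helper names are hypothetical.

```python
DW_LNS_extended_op = 0x00
DW_LNE_end_sequence = 0x01
DW_LNE_set_address = 0x02


def uleb128(value: int) -> bytes:
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_extended_op(sub_opcode: int, operands: bytes = b"") -> bytes:
    """0x00, ULEB128(len(sub_opcode + operands)), sub_opcode, operands."""
    payload = bytes([sub_opcode]) + operands
    return bytes([DW_LNS_extended_op]) + uleb128(len(payload)) + payload


set_address = encode_extended_op(DW_LNE_set_address, (0x1234).to_bytes(8, "little"))
end_sequence = encode_extended_op(DW_LNE_end_sequence)
print(set_address.hex())   # 0009023412000000000000
print(end_sequence.hex())  # 000101
```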

debug_pubnames/debug_pubtypes:
  Length:
    TotalLength: 0xffffffff
    TotalLength64: 0x0c
  Version: 2
  UnitOffset: 0x1234
  UnitSize: 0x4321
  Entries:
    - DieOffset: 0x1234
      Name: abcd

* Doesn’t support emitting multiple pub tables.
* Replace Length with Format and Length.
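
The Length mapping above (TotalLength: 0xffffffff plus TotalLength64) is just the DWARF64 initial length spelled out by hand, which is why replacing it with Format and Length is listed as an improvement. A name-lookup set consists of that initial length, a 2-byte version, the offset and size of the referenced unit in .debug_info, and then (DIE offset, null-terminated name) entries ending with a zero offset. Below is a minimal Python sketch, assuming a little-endian target; the helper name is hypothetical.

```python
import struct


def initial_length(fmt: str, length: int) -> bytes:
    if fmt == "DWARF32":
        return struct.pack("<I", length)
    return struct.pack("<IQ", 0xFFFFFFFF, length)  # DWARF64


def encode_pubnames(entries, unit_offset, unit_size,
                    fmt="DWARF32", version=2) -> bytes:
    """entries: list of (die_offset, name)."""
    offset_size = 4 if fmt == "DWARF32" else 8

    def offset(value: int) -> bytes:
        return value.to_bytes(offset_size, "little")

    body = struct.pack("<H", version) + offset(unit_offset) + offset(unit_size)
    for die_offset, name in entries:
        body += offset(die_offset) + name.encode("utf-8") + b"\x00"
    body += offset(0)  # terminating entry
    return initial_length(fmt, len(body)) + body


table = encode_pubnames([(0x1234, "abcd")], unit_offset=0x1234,
                        unit_size=0x4321, fmt="DWARF64")
```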

debug_ranges:
  - AddrSize: 0x04
    Entries:
      - LowOffset: 0x10
        HighOffset: 0x20
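
A .debug_ranges list is a sequence of (begin, end) pairs of AddrSize bytes each, terminated by a (0, 0) pair, and a unit refers to a list by its byte offset in the section. Below is a minimal Python sketch that lays out several lists and reports their offsets, assuming a little-endian target and ignoring base-address selection entries; the helper name is hypothetical.

```python
def encode_debug_ranges(range_lists, address_size=8):
    """range_lists: list of lists of (low, high); returns (bytes, offsets)."""
    data = bytearray()
    offsets = []
    for entries in range_lists:
        offsets.append(len(data))  # what a DW_AT_ranges value would point at
        # Each list ends with an all-zero (0, 0) pair.
        for low, high in list(entries) + [(0, 0)]:
            data += low.to_bytes(address_size, "little")
            data += high.to_bytes(address_size, "little")
    return bytes(data), offsets


data, offsets = encode_debug_ranges([[(0x10, 0x20)]], address_size=4)
print(data.hex(), offsets)  # 10000000200000000000000000000000 [0]
```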

debug_str:
  - abc
  - def
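
.debug_str is simply the strings concatenated, each followed by a null byte, and a DW_FORM_strp attribute stores the byte offset of its string. Below is a minimal Python sketch that also returns those offsets; the helper name is hypothetical.

```python
def encode_debug_str(strings):
    """Return (section bytes, offset of each string) for DW_FORM_strp."""
    data = bytearray()
    offsets = []
    for s in strings:
        offsets.append(len(data))
        data += s.encode("utf-8") + b"\x00"
    return bytes(data), offsets


data, offsets = encode_debug_str(["abc", "def"])
print(data, offsets)  # b'abc\x00def\x00' [0, 4]
```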

Accomplishments

I’m very happy that I was roughly able to reach the goal of the first coding period. During this period, I learned how debug information is represented at a low level in object files and how errors are handled in LLVM libraries. I was also able to dig into some related core libraries, such as DebugInfo and CodeGen.

Areas in Need of Improvement

However, there are still some areas where I didn’t do well. When I was porting DWARF support to yaml2elf, I found that some DWARF sections were not well-formed. For example, the .debug_pub* sections don’t support emitting multiple pub tables, the .debug_abbrev section doesn’t support emitting multiple abbreviation tables, and the .debug_pub* and .debug_abbrev sections lack terminating entries. At first, I ported these sections to yaml2elf and then tried to fix the issues, but that was not the right approach! I should have fixed the issues first and then ported the sections to yaml2elf. That way, I wouldn’t have had to update test cases in many places, and ill-formed test cases wouldn’t have spread everywhere.

Besides, my life would have been easier if I had first made elf2yaml support converting DWARF sections back to YAML. After porting some sections to yaml2elf, I realized that such a tool would save me from handcrafting so many sections.
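
The reverse direction can start very small. As an illustration only (this is not how elf2yaml/obj2yaml is implemented, and the helper names are hypothetical), here is a Python sketch that turns raw .debug_str bytes back into the debug_str YAML form. Strings are emitted as JSON-style double-quoted scalars, which YAML accepts, so empty or unusual strings survive the round trip.

```python
import json


def decode_debug_str(data: bytes):
    """Split a .debug_str section into its null-terminated strings."""
    if not data:
        return []
    parts = data.split(b"\x00")
    if data.endswith(b"\x00"):
        parts = parts[:-1]  # drop the empty piece after the last terminator
    return [p.decode("utf-8") for p in parts]


def debug_str_to_yaml(data: bytes) -> str:
    lines = ["debug_str:"]
    for s in decode_debug_str(data):
        lines.append("  - " + json.dumps(s))  # JSON strings are valid YAML
    return "\n".join(lines)


print(debug_str_to_yaml(b"abc\x00def\x00"))
# debug_str:
#   - "abc"
#   - "def"
```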

Acknowledgements

I would love to express my sincere gratitude to James Henderson for mentoring me during this project, and to everyone who reviewed my patches and gave me many useful suggestions on my proposal!

Accepted Patches

I list them here in case they are useful for evaluation.

D82435 [DWARFYAML][debug_gnu_*] Add the missing context
D82933 [DWARFYAML][debug_abbrev] Emit 0 byte for terminating abbreviations.
D82622 [DWARFYAML][debug_info] Replace ‘InitialLength’ with ‘Format’ and ‘Length’.
D82367 [ObjectYAML][ELF] Add support for emitting the .debug_gnu_pubnames/pubtypes sections.
D82630 [ObjectYAML][DWARF] Collect diagnostic message when YAMLParser fails.
D82296 [ObjectYAML][ELF] Add support for emitting the .debug_pubnames section.
D82621 [DWARFYAML][debug_info] Teach yaml2obj emit correct DWARF64 unit header.
D82351 [ObjectYAML][DWARF] Remove unused context. NFC.
D82347 [ObjectYAML][ELF] Add support for emitting the .debug_pubtypes section.
D82275 [DWARFYAML][debug_info] Add support for error handling.
D82173 [DWARFYAML][debug_info] Use ‘AbbrCode’ to index the abbreviation.
D82139 [DWARFYAML][debug_info] Fix array index out of bounds error.
D82073 [ObjectYAML][ELF] Add support for emitting the .debug_info section.
D81826 [DWARFYAML][debug_abbrev] Make the abbreviation code optional.
D81820 [ObjectYAML][ELF] Add support for emitting the .debug_abbrev section.
D81915 [ObjectYAML][DWARF] Let writeVariableSizedInteger() return Error.
D81541 [ObjectYAML][DWARF] Implement the .debug_addr section.
D81709 [ObjectYAML][DWARF] Let the target address size be inferred from FileHeader.
D81529 [ObjectYAML][test] Use a single test file to test the empty ‘DWARF’ entry.
D80722 [ObjectYAML][DWARF] Make the PubSection optional.
D81220 [DWARFYAML][debug_ranges] Make the “Offset” field optional.
D81528 [DWARFYAML] Add support for emitting DWARF64 .debug_aranges section.
D81450 [ObjectYAML][ELF] Add support for emitting the .debug_line section.
D81357 [DWARFYAML][debug_ranges] Emit an error message for invalid offset.
D81356 [ObjectYAML] Add support for error handling in DWARFYAML. NFC.
D80203 [ObjectYAML][DWARF] Add DWARF entry in ELFYAML.
D80862 [ObjectYAML][test] Address comments in D80203.
D81217 [ObjectYAML][DWARF] Support emitting .debug_ranges section in ELFYAML.
D81063 [DWARFYAML][debug_aranges] Replace InitialLength with Format and Length.
D81051 [ObjectYAML][ELF] Let the endianness of DWARF sections be inferred from FileHeader.
D80972 [ObjectYAML][DWARF] Support emitting the .debug_aranges section in ELFYAML.
D80535 [ObjectYAML][MachO] Add error handling in MachOEmitter.