My Journey with LLVM (GSoC'20 Final Evaluation)

2020/08/28

My GSoC has come to an end, and this is a report on my work over the last three months. My project adds DWARF support to yaml2obj, especially yaml2elf. The original proposal is here.

Implementation Status

With the help of my mentor James and other community members, I was able to accomplish most of the milestones in my original proposal. The tool is now considerably easier to use. Some notable features are listed below.

* The InitialLength fields of DWARF sections are replaced with Format and Length. Previously, we had to hardcode the InitialLength field to tell the tool whether to emit a DWARF32 or a DWARF64 section, e.g.,

## DWARF32 section.
InitialLength:
  TotalLength32: 0x1234

## DWARF64 section.
InitialLength:
  TotalLength32: 0xffffffff
  TotalLength64: 0x1234

Now, yaml2obj emits DWARF32 sections by default, and the Length field can be omitted because yaml2obj calculates it for us (patches that address this issue: D82622, D85880, D81063, D84008, D86590, D84911).

## DWARF32 section.
## The Format and Length fields can be omitted.
## We don't need to care about them.

## DWARF64 section.
Format: DWARF64 ## We only need to specify the Format field.
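
To make the difference between the two formats concrete, here is a minimal Python sketch of how an initial length is laid out in each of them. It is not yaml2obj's code: the helper name encode_initial_length and its parameters are made up for illustration, and it only follows the DWARF rule that a DWARF64 length is a 4-byte 0xffffffff escape followed by an 8-byte length.

```python
import struct

# Escape value that introduces a 64-bit length in the DWARF64 format.
DWARF64_ESCAPE = 0xFFFFFFFF


def encode_initial_length(length, fmt="DWARF32", little_endian=True):
    """Hypothetical helper: encode the initial length of a DWARF unit or table."""
    order = "<" if little_endian else ">"
    if fmt == "DWARF32":
        # Values 0xfffffff0 and above are reserved in the 32-bit format.
        if length >= 0xFFFFFFF0:
            raise ValueError("length does not fit in a DWARF32 initial length")
        return struct.pack(order + "I", length)
    if fmt == "DWARF64":
        # 4-byte escape followed by the real 8-byte length.
        return struct.pack(order + "IQ", DWARF64_ESCAPE, length)
    raise ValueError("unknown format: " + fmt)


dwarf32 = encode_initial_length(0x1234)
dwarf64 = encode_initial_length(0x1234, "DWARF64")
print(dwarf32.hex(), dwarf64.hex())
```

The two calls mirror the old TotalLength32/TotalLength64 examples above: the DWARF32 field takes 4 bytes and the DWARF64 field takes 12.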

* yaml2obj supports emitting multiple abbrev tables. Before D86194 and D83116, yaml2obj could only emit a single abbrev table, which all compilation units had to share. Now, yaml2obj is able to emit multiple abbrev tables, and a compilation unit can be linked to any one of them. We add an optional field ID to abbrev tables and an optional field AbbrevTableID to compilation units. A compilation unit uses AbbrevTableID to link to the abbrev table with the same ID. However, the AbbrOffset field of compilation units, which corresponds to the debug_abbrev_offset field, still needs to be specified. If D86614 is accepted in the future, we won’t need to calculate and specify it any more!

debug_abbrev:
  - ID: 0
    Table:
      ...
  - ID: 1
    Table:
      ...
debug_info:
  - ...
    AbbrevTableID: 1 ## Reference the second abbrev table.
  - ...
    AbbrevTableID: 0 ## Reference the first abbrev table.
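
Until the offset can be inferred, it has to be worked out by hand. The following Python sketch shows one way to do that calculation. It assumes the abbrev tables are emitted back to back in the order they are listed and that the encoded size of each table is already known. The function name abbrev_offsets and the sizes are hypothetical, not part of yaml2obj.

```python
def abbrev_offsets(tables):
    """Map each abbrev table ID to its byte offset in .debug_abbrev.

    `tables` is a list of (table_id, size_in_bytes) pairs in emission order.
    Assumption: tables are laid out contiguously with no padding.
    """
    offsets = {}
    cursor = 0
    for table_id, size in tables:
        if table_id in offsets:
            raise ValueError("duplicate abbrev table ID: %d" % table_id)
        offsets[table_id] = cursor
        cursor += size
    return offsets


# Two tables with made-up sizes of 10 and 7 bytes.
offsets = abbrev_offsets([(0, 10), (1, 7)])
# A unit with AbbrevTableID: 1 would need AbbrOffset set to offsets[1].
print(offsets[1])
```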

* More DWARF sections are supported. The debug_rnglists, debug_loclists, debug_addr and debug_str_offsets sections are newly supported in yaml2obj. Check out D83624, D84234, D81541 and D83853 for more information!

* DWARF support is added to elf2yaml and improved in macho2yaml. Previously, the output of macho2yaml was noisy: it dumped the DWARF sections twice, once in the Sections: entry and once in the DWARF: entry, e.g.,

## The content of the debug_str section is dumped twice!
Sections:
  - sectname: __debug_str
    ...
    content: 6D61696E00 ## "main\0"
DWARF:
  debug_str:
    - main

After D85506, if the DWARF parser fails to parse a DWARF section into the DWARF: entry, obj2yaml dumps it as a raw content section. Otherwise, the section is presented as a structured DWARF section in the DWARF: entry. In addition, D85094 adds DWARF support to elf2yaml. Although it only supports dumping the debug_aranges section for now, we can easily extend it in the future.
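
The fallback can be pictured with a small Python sketch. This is a simplification and not the obj2yaml implementation: dump_section and split_strings are hypothetical names, the dictionaries only imitate the YAML layout shown above, and the toy parser handles nothing but a string table.

```python
def split_strings(raw):
    """Toy parser for a string table made of NUL-terminated strings."""
    if raw and not raw.endswith(b"\x00"):
        raise ValueError("unterminated string")
    return [s.decode("utf-8") for s in raw.split(b"\x00")[:-1]]


def dump_section(sectname, dwarf_name, raw, parse):
    """Dump a section once: structured if it parses, raw bytes otherwise."""
    try:
        parsed = parse(raw)
    except ValueError:
        # Parsing failed, so fall back to the raw hex content.
        return {"Sections": [{"sectname": sectname, "content": raw.hex().upper()}]}
    return {"DWARF": {dwarf_name: parsed}}


print(dump_section("__debug_str", "debug_str", b"main\x00", split_strings))
print(dump_section("__debug_str", "debug_str", b"main", split_strings))
```

The first call takes the structured path and the second, whose input lacks the terminating NUL, takes the raw path, so the content never appears in both places.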

Unfinished Tasks

* Allow users to describe DIEs at a high level. In my original proposal, we planned to make yaml2obj support describing DIEs at a high level. However, yaml2obj didn’t support emitting multiple abbrev tables at that time, so I spent some time on enabling it to emit multiple abbrev tables and link compilation units with them. I’m not going to leave the community, and I will work on this in the future.

My username on Phabricator is @Higuoxing. Please feel free to ping me if you have trouble or encounter bugs when crafting DWARF test cases in YAML. I’m very happy to help!

Acknowledgements

I would like to express my sincere gratitude to @jhenderson (James Henderson) for mentoring me during this project. I would also like to thank @grimar (George Rimar), @MaskRay (Fangrui Song), @labath (Pavel Labath), @dblaikie (David Blaikie), @aprantl (Adrian Prantl) and @probinson (Paul Robinson) for reviewing my patches, patiently answering my questions and leaving comments on my proposal!

Proposed Changes (Only accepted and ongoing ones are listed)

Ongoing:

D86614 [DWARFYAML] Make the debug_abbrev_offset field optional.
D86545 [DWARFYAML] Abbrev codes in a new abbrev table should start from 1 (by default).
D85289 [DWARFYAML][debug_info] Rename some mapping keys. NFC.

Porting the existing DWARF support to yaml2elf:

D80203 [ObjectYAML][DWARF] Add DWARF entry in ELFYAML.
D80972 [ObjectYAML][DWARF] Support emitting the .debug_aranges section in ELFYAML.
D81217 [ObjectYAML][DWARF] Support emitting .debug_ranges section in ELFYAML.
D81450 [ObjectYAML][ELF] Add support for emitting the .debug_line section.
D81820 [ObjectYAML][ELF] Add support for emitting the .debug_abbrev section.
D82073 [ObjectYAML][ELF] Add support for emitting the .debug_info section.
D82347 [ObjectYAML][ELF] Add support for emitting the .debug_pubtypes section.
D82367 [ObjectYAML][ELF] Add support for emitting the .debug_gnu_pubnames/pubtypes sections.
D82296 [ObjectYAML][ELF] Add support for emitting the .debug_pubnames section.

Introducing new DWARF sections to yaml2obj:

D81541 [ObjectYAML][DWARF] Implement the .debug_addr section.
D83624 [DWARFYAML] Implement the .debug_rnglists section.
D83853 [DWARFYAML] Implement the .debug_str_offsets section.
D84234 [DWARFYAML] Implement the .debug_loclists section.

Adding DWARF support to obj2yaml:

D85094 [obj2yaml] Add support for dumping the .debug_aranges section.

Refactoring work (improving error handling, making YAML fields optional, adding DWARF64 support, etc):

D80535 [ObjectYAML][MachO] Add error handling in MachOEmitter.
D80861 [ObjectYAML][DWARF] Let dumpPubSection return DWARFYAML::PubSection.
D81063 [DWARFYAML][debug_aranges] Replace InitialLength with Format and Length.
D81051 [ObjectYAML][ELF] Let the endianness of DWARF sections be inferred from FileHeader.
D86590 [DWARFYAML] Make the unit_length and header_length fields optional.
D86537 [DWARFYAML] Make the ‘Attributes’ field optional.
D83116 [DWARFYAML] Add support for referencing different abbrev tables.
D86194 [DWARFYAML] Add support for emitting multiple abbrev tables.
D86192 [obj2yaml] Refactor the .debug_pub* sections dumper.
D85880 [DWARFYAML] Replace InitialLength with Format and Length. NFC.
D85805 [DWARFYAML] Make the address size of compilation units optional.
D85821 [MachOYAML] Simplify the section data emitting function. NFC.
D85707 [DWARFYAML] Let the address size of line tables inferred from the object file.
D85506 [macho2yaml] Refactor the DWARF section dumpers.
D85496 [macho2yaml] Remove unused functions. NFC.
D85397 [DWARFYAML][debug_info] Make the ‘Values’ field optional.
D85405 [obj2yaml] Test dumping an empty .debug_aranges section.
D84496 [DWARFYAML] Replace ‘Format’, ‘Version’, etc with ‘FormParams’. NFC.
D85296 [DWARFYAML][debug_info] Pull out dwarf::FormParams from DWARFYAML::Unit.
D85179 [DebugInfo][unittest] Use YAML to generate the .debug_loclists section.
D85006 [DWARFYAML] Offsets should be omitted when the OffsetEntryCount is 0.
D84921 [DWARFYAML] Make the debug_aranges entry optional.
D84952 [DWARFYAML] Add helper function getDWARFEmitterByName(). NFC.
D85003 [DWARFYAML] Add emitDebug[GNU]Pub[names/types] functions. NFC.
D84911 [DWARFYAML] Make the ‘Length’ field of the address range table optional.
D84907 [DWARFYAML] Make the ‘AddressSize’, ‘SegmentSelectorSize’ fields optional.
D84624 [DWARFYAML] Rename checkListEntryOperands() to checkOperandCount(). NFC.
D84618 [DWARFYAML] Add support for emitting custom range list content.
D83282 [DWARFYAML] Refactor: Pull out member functions to DWARFYAMLUtils.cpp.
D84383 [DWARFYAML] Pull out common helper functions for rnglist and loclist tables. NFC.
D84008 [DWARFYAML] Refactor emitDebugInfo() to make the length be inferred.
D84239 [DWARFYAML] Refactor range list table to hold more data structure.
D83749 [DWARFYAML] Add support for emitting value forms of strx, addrx, etc.
D83452 [DWARFYAML] Use override instead of virtual for better safety.
D83220 [DWARFYAML][unittest] Refactor parseDWARFYAML().
D82435 [DWARFYAML][debug_gnu_*] Add the missing context IsGNUStyle. NFC.
D82933 [DWARFYAML][debug_abbrev] Emit 0 byte for terminating abbreviations.
D82622 [DWARFYAML][debug_info] Replace ‘InitialLength’ with ‘Format’ and ‘Length’.
D82630 [ObjectYAML][DWARF] Collect diagnostic message when YAMLParser fails.
D82351 [ObjectYAML][DWARF] Remove unused context. NFC.
D82275 [DWARFYAML][debug_info] Add support for error handling.
D82173 [DWARFYAML][debug_info] Use ‘AbbrCode’ to index the abbreviation.
D81826 [DWARFYAML][debug_abbrev] Make the abbreviation code optional.
D81915 [ObjectYAML][DWARF] Let writeVariableSizedInteger() return Error.
D81709 [ObjectYAML][DWARF] Let the target address size be inferred from FileHeader.
D81529 [ObjectYAML][test] Use a single test file to test the empty ‘DWARF’ entry.
D80722 [ObjectYAML][DWARF] Make the PubSection optional.
D81220 [DWARFYAML][debug_ranges] Make the “Offset” field optional.
D81528 [DWARFYAML] Add support for emitting DWARF64 .debug_aranges section.
D81357 [DWARFYAML][debug_ranges] Emit an error message for invalid offset.
D81356 [ObjectYAML] Add support for error handling in DWARFYAML. NFC.

Bugfixes:

D85717 [DWARFYAML] Teach yaml2obj emit the correct line table program.
D85180 [YAMLTraits] Fix mapping <none> value that followed by comments.
D84640 [llvm-readelf] Fix emitting incorrect number of spaces in ‘--hex-dump’.
D82621 [DWARFYAML][debug_info] Teach yaml2obj emit correct DWARF64 unit header.
D82139 [DWARFYAML][debug_info] Fix array index out of bounds error.
D80862 [ObjectYAML][test] Address comments in D80203.