dwarf2ctf, a type encoder for the Linux kernel
=========

Many kernel-level debugging and tracing systems need access to the kernel's type
information.  Since C doesn't support any form of introspection, the data must
be extracted in some other way: here, we extract it from the DWARF debugging
information generated by the compiler.  Unfortunately, this information is very
voluminous (just the type information alone adds up to a couple of hundred
megabytes in a 'make allyesconfig' kernel): even if users are happy to spend the
disk space, the time and memory required to read much of this information in is
likely to be prohibitive.

This problem is not new -- back in 2004, Sun had the same problem when
attempting to give DTrace a view of the type information in the Solaris kernel.
Their solution was the Compact ANSI-C Type Format (CTF), a highly compacted
representation of C types suitable for debuggers and tracers.  They combined
this with a highly efficient tool for converting DWARF2 types to CTF, and hacks
in the Solaris kernel causing the kernel itself to emit CTF data for its own
types.

Unfortunately while this tool may be highly efficient it is not adequate for the
Linux kernel.  It treats every ELF object as an independent entity with an
independent set of types -- perfectly all right for the Solaris kernel with a
few hundred modules maximum, but very much not for Linux, where distro kernels
often compile in thousands of modules.  Ideally, we would like to treat all
kernel modules, built-in or not, the same way, sharing and deduplicating all
globally-visible types across the entire set of visible modules and recording
each precisely once.

We also want to collect descriptions of global variables and emit descriptions
of their name->type mapping as well, since the kernel has no easily accessible
ELF section we can extract this information from at runtime (kernel modules must
be accessible at runtime for modern Linux systems to work, but the kernel itself
could have come from over the network or off a USB key or from a non-mounted
partition or an EFI boot partition or who knows where, and could have any name
even if it is accessible: so tracing tools should not rely on being able to look
inside the kernel image).

We do all this with dwarf2ctf, a CTF generation tool that reads in DWARF from a
set of object files (usually, every object file in the kernel and all modules)
and fills a directory with compressed files containing CTF representations of
the types in those object files: the kernel build system regenerates these as
necessary and links them directly into kernel modules.

Caveats: It is somewhat specific to the form of DWARF output emitted by GCC, and
doesn't yet support DWARF-4 type signatures or compressed DWARF at all.

We'll look at each part of this system in turn, from the top down, starting with
using the kernel type information  dwarf2ctf produces in other programs.


Using dwarf2ctf output
----------------------

Using this data is fairly simple.  Once you've read the CTF sections from the
kernel modules and inflated them (or ignored them if they are empty or, as just
mentioned, one byte long), you simply need to look at the ctf_parent_name() for
each module, and if it is set to "ctf", call ctf_import() to set the parent of
this module to the CTF data you have read from the .ctf.shared_ctf section in
the ctf.ko kernel module.  The core kernel's types are stored in the
.ctf.vmlinux section in the same kernel module, and all built-in kernel modules
have their types in .ctf.$module_name.  Non-built-in kernel modules just have a
.ctf section containing their types, which again might need their parent set to
"shared_ctf".  (Out-of-tree kernel modules will have no such parent.)

Once you've set up the parenthood relationships you can call ctf_close() on the
shared type repository and forget about it entirely: it will be refcounted and
destroyed when all its children are closed.


You should end up with a family of CTF files, one per kernel module built-in or
not and one for the core kernel, freely usable for whatever purpose you need.


Invocation and build-system connections
----------

dwarf2ctf's command-line syntax emphasises simplicity over compactness.  Linux
has nearly-infinitely-long command lines these days, so we can take advantage of
this.

Two syntaxes are supported.  The first shares types across multiple modules and
the core kernel; the second is used for out-of-tree module building, and avoids
either sharing anything at all across modules or depending on the set of shared
types defined for the core kernel.


dwarf2ctf outputdir objects.builtin modules.builtin dedup.blacklist \
	  vmlinux.o module.o ...
dwarf2ctf outputdir -e module.o ...

where:

 - 'outputdir' is the possibly-relative path to a directory in which the
   generated CTF files get placed.
 - 'objects.builtin' is the name of the file containing the object files that
   correspond to always-built-in kernel code (that cannot be built as modules).
 - 'modules.builtin' is the name of the file containing the names of
   kernel modules presently built in to the kernel.
 - 'dedup.blacklist' is a blacklist of modules that should never participate
   in deduplication: see 'Duplicate type detection' below.
 - the .o filenames are the names of object files comprising the kernel and/or
   modules: you can feed in whole modules at once (before linking with .mod.o).
   This list is often very, very long (I have seen command-lines in excess of
   60Kb).

dwarf2ctf's output consists of a series of gzip-compressed .ctf.new files in the
outputdir, which the makefile compares with and if necessary moves over the top
of .ctf files with the same basename, so as to avoid relinking things if
dwarf2ctf has written out content identical to what it wrote last time it ran.
These fall into several classes, partitioned according to the contents of
objects.builtin and modules.builtin:

 - shared_ctf.builtin.ctf: The shared type repository.  Types shared by more
   than one of the files below go here.
   libdtrace-ctf).  See 'Using dwarf2ctf output' below regarding use of this
   data.
 - vmlinux.builtin.ctf: Types in the core kernel, that cannot be built in to
   modules, go here.
 - *.builtin.ctf: One of these is generated for the types in each module that
   is presently built in to the kernel.
 - *.mod.ctf: One of these is generated for each .ko.

All the files in the first three classes are linked into the ctf.ko module under
various names, an empty module containing nothing but CTF data.


A lengthy section of Makefile.modpost, and a short section of the toplevel
Makefile, is dedicated to creating these files, and to linking them into the
kernel modules.  The dependency graph related to dwarf2ctf output is quite
complex: modules and objects (ld -r'ed *.o files) are processed by dwarf2ctf to
produce a number of files in the .ctf directory, and the final modules depend on
the relevant ctf files.  The .mod.ctf's go into the .ko's with the same stem
name, but ctf.ko receives content from all the CTF files corresponding to
built-in modules, and until dwarf2ctf runs and creates those files we cannot
tell what those CTF files will be, though we do have a wildcard that matches
them all.
[
GNU Make's 'secondary expansion' feature comes to the rescue here: we can
compute a list of expected CTF filenames at runtime, given the names of the
modules we are linking in.  For the builtin modules, we cheat and touch a stamp
file after moving any .ctf.new files back over a .ctf file, then depend on that
to see if ctf.ko needs to be relinked.

The actual incorporation of the CTF data into the kernel modules happens before
module signing (if signing is active), by calling objcopy --add-section on the
module in question.  This too has some knotty corners.

First of all, the module linkage process normally links a module using all the
prerequisites of the module's target -- but we have designated all the CTF files
as prerequisites of the module's target, and we don't want to link them directly
in using ld(1), since they aren't object files.  So we have to filter them out in
the link line.

Secondly, those modules which have no CTF files should acquire empty CTF
sections to indicate their lack of unique types -- but objcopy in binutils 2.20
and below silently exits if asked to --add-section an empty file.  So we use dd
to generate a file with a one-byte null in it instead, and teach the users of
CTF sections to treat a one-byte-long 'CTF' section as if it were empty.


Overview of dwarf2ctf operation
--------

There are four phases to dwarf2ctf operation: initialization, duplicate type
detection, CTF construction, and writeout.  Some of these phases can repeat.
All but the last phase consists purely of sucking data from object files into
GHashTables in memory. (The last two phases could potentially be combined,
shrinking the size of one hash and saving memory, but the hash that is shrunk is
by no means the largest one, so the extra complexity is probably not worth it.)

dwarf2ctf uses several other libraries to do this:

 - elfutils, used for DWARF parsing.  We could potentially write our own
   DWARF parser, but elfutils works and is tested.

 - glib, used for the GHashTable.  The rest of the kernel uses roll-your-own
   hash tables, but dwarf2ctf makes heavy demands of its hashtables: they must
   be expanding hashes capable of efficiently storing hundreds of thousands of
   items, with amortized log(N) lookup time, and they must support deletion
   (though it need not be particularly efficient deletion).  This rules out
   simple fixed-size bucket hashes like the ones used in other parts of the
   kernel build system: GHashTable is already implemented, and works.

 - zlib, used to compress the CTF information.  

 - libdtrace-ctf, which both reads and writes the CTF data.  This is a port of
   the Solaris CTF library, GPLed and with additional support for the storage
   of name->type mappings (meant to represent variables) akin to its existing
   ELF symbol->type mappings.

dwarf2ctf has a good few important data structures, described at the top of
scripts/dwarf2ctf/dwarf2ctf.c.

dwarf2ctf has its own trace facility, implemented via the dw_ctf_trace() macro
and enabled by compiling with -DDEBUG and setting DWARF2CTF_TRACE in the
environment.  (The first step is required because some very numerous data
structures are greatly expanded when debugging is turned on, which would waste
memory if it were done all the time).  This produces a huge volume of trace
output, several gigabytes when run over an allyesconfig kernel.


Unless you're interested in how dwarf2ctf works internally, you can stop reading
here.  If you are interested, now is a good time to read the comments above
main() in scripts/dwarf2ctf/dwarf2ctf.c, which briefly describe dwarf2ctf's data
structures and functions.


Flow of Control
---------------

The /* C comments */ point to other sections of this document,

Functions named in the /* Utilities */ section of dwarf2ctf.c are not mentioned
here for simplicity's sake.

[C]: Callback
[R]: recursive
[1]: Numbers: Mutually-recursive loop
|: Several functions which all call the same functions
->: Call from array of callbacks (filter_ctf_*() omitted as uninteresting)

main()
 /* See 'Initialization' */
 init_assembly_tab()
 init_builtin()
 init_blacklist()
 run()
   init_tu_to_modules()
   init_ctf_table()

   /* Duplicate detection */

   scan_duplicates()
     process_file()                       /* Toplevel DWARF walkers */
[C]    detect_duplicates_init()
[R]    process_tu_func()
[C]      assembly_filter_tab[]
[C]      detect_duplicates()
[ 1]       mark_shared()
[R]          type_id()                    /* Type IDs */
[C1]           mark_shared()
[R]        mark_seen_contained()
[C]    detect_duplicates_done()

     process_file()
[C]    detect_duplicates_init()
[R]    process_tu_func()
[C]      assembly_filter_tab[]
[C]      detect_duplicates_alias_fixup()
[R]        type_id()
[C]          is_named_struct_union_enum()
[R]        type_id()
[C]          detect_duplicates_alias_fixup_internal()
               mark_shared() (see above)
[C]    detect_duplicates_done()

   /* CTF construction */

   process_file()
[R]  process_tu_func()
[C]    assembly_filter_tab[]
[C]    construct_ctf()
[ 2]     construct_ctf_id()
[R3]       die_to_ctf()
             assembly_tab[]
[C]           -> assemble_ctf_base()
              -> assemble_ctf_pointer()
               | assemble_ctf_array()
               | assemble_ctf_array_dimension()
               | assemble_ctf_typedef()
               | assemble_ctf_cvr_qual()
               | assemble_ctf_variable()
                   lookup_ctf_type()
[ 2]                 construct_ctf_id()
              -> assemble_ctf_enumeration()
              -> assemble_ctf_enumerator()
              -> assemble_ctf_struct_union()
              -> assemble_ctf_su_member()
[ 3]               die_to_ctf()
[ 2]               construct_ctf_id()

   write_types()

Initialization
--------------

 init_assembly_tab()
 init_builtin()
 init_blacklist()
 run()
   init_tu_to_modules()
   init_ctf_table()

This happens at the top of main() and run(), and in various functions named
init_*().  Of these, init_assembly_tab() and init_builtin() serve only to turn
various static arrays and files mentioned on the command line into more useful
internal representations (e.g. the assembly filter array of structures is turned
into a pair of arrays indexed by DWARF tag), and init_blacklist() is described
in the section on duplicate type detection below.

init_ctf_table(), called both at initialization time and later during CTF
assembly when new CTF files are found to be needed, creates a new CTF file in
memory and either marks it as a child of the shared type repository, or (if it
*is* the shared type repository, or deduplication is off and there is only one
CTF file being processed and no shared type repository at all) creates a few
types in it which CTF has representations of but DWARF does not: a void type,
and a generic catchall pointer-to-function-returning-int.

That leaves init_tu_to_modules().  This walks over all the top-level
compile_unit DIEs in the DWARF debugging information in every object file
mentioned in the list of modules and built-in modules, constructing a mapping
from translation unit name back to the name of the kernel module it comes from,
even if that module is built in to the kernel.  This is normally the same as the
filename (sans extension), but for built-in kernel modules, the name comes from
the modules.builtin file's entry for the translation unit instead, so that the
output can land in a .builtin.ctf file rather than being jammed into
vmlinux.builtin.ctf with the core kernel's types.

This means that dwarf2ctf can operate in terms of the kernel module a type is
contained within rather than having to think about the mapping between object
file name, translation unit name and module name all the time.


Toplevel DWARF walkers
----------------------

     process_file()
[C]    (per-TU initialization callback)
[R]    process_tu_func()
[C]      assembly_filter_tab[]
[C]      (per-DIE callback)
[C]    (per-TU cleanup callback)

All routines in dwarf2ctf other than initialization and writeout are DWARF
walkers: i.e., they walk over all DWARF DIEs in all object files specified on
the command line and do something with every DIE.  This job is done by
process_file() and its helper process_tu_func(), which not only digs out the
corresponding (built-in or non-built-in) module name corresponding to each
object file, but also detects and skips translation units it has handled before
(in case they are incrementally linked into multiple object files) and allows
callbacks to be invoked at the start and end of each translation unit.

Even though dwarf2ctf only cares about top-level types, in some situations DWARF
can emit top-level types with references to a non-top-level type: if all
occurrences of the top-level type are an opaque structure, and the only
non-opaque definition is inside a function, references in the same translation
unit as the non-opaque definition will point to the definition inside the
function (and references outside the translation unit will not point at any
definition).  Thus, if we want to catch all nuances of globally-visible types,
we have to scan types inside functions and lexical blocks inside functions too.

To avoid generating a vast number of unnecessary type definitions, the 'assembly
table' which describes how to construct a CTF type given a DWARF DIE also
contains a description of a set of filters which are passed the current DIE and
its parent: if they return false, the DIE is skipped and never passed to the
callback function.  We also avoid calling the callback for any DWARF DIE whose
tag doesn't appear in the assembly table at all: there's no point doing
duplicate detection or anything else for a DWARF DIE we won't be generating CTF
from.  There are currently two filters defined: filter_ctf_file_scope(), which
is called for every DWARF DIE whose tag is one we never expect to see a
reference to if it is inside a function (except if they relate to a structure or
union, as above), and filter_ctf_uninteresting(), which is called for variables
to see if they are worthy of recording (top-level named variables with external
linkage not part of the internal workings of macros only).


Type IDs
--------

[R]  type_id()
[C]    (optional per-type callback)       

The only thing dwarf2ctf does which the Sun tool does not is the detection of
duplicate and shared types, both within individual kernel modules and across
modules.  Our ultimate goal is that a type that appears in the source code once
appears in the CTF output once as well.  This goal has mostly been attained,
except for out-of-tree modules, where cross-module type sharing must be disabled
to avoid requiring rebuilds of the module whenever the core kernel is rebuilt.

The core of this is the concept of a *type ID* and the function type_id() which
computes it.  A type ID is an identifier for a type which precisely represents
that type and only that type.  Doing this for types in different headers or at
different scopes with the same name without needing to encode knowledge of C
scoping rules into dwarf2ctf is an interesting proposition: we can use the line
number and filename info provided by DWARF in most user-specified types to help.

A type ID is a recursively-constructed string of the following form (fixed
elements represented by {}, optional elements by []):

//[filename]//[line number]//{type string}

Types are *based* upon other types iff they have a DW_AT_type attribute pointing
to some other type.  All types based upon other types have a type ID that is the
type ID of the type upon which they are based, with additional information
specific to this type appended to it.  The filename and line number is only
added for those types which are not based upon other types and which have a
filename and line number in the DWARF (lots don't, e.g. base types): the
filename is canonicalized with realpath(), though since this is quite slow and
type_id() is called a lot, the mapping from DWARF filename to realpath() result
is cached.  Types that have no filename or line number start with '////'.

We use // to separate the filename and line number elements because this is the
shortest string other than NUL that cannot appear in a canonicalized POSIX
pathname (ignoring Pyramid, Cygwin and other strange systems that actually
return // in the result of realpath(): Linux doesn't use it and that's all that
matters.  Should it start to use it, we can switch delimiter to ///.)

Function pointers are not represented (or, rather, are all mapped to the same
type ID, the generic catchall function-pointer type mentioned above); array
dimensions are represented by [index-type dimension], or [] for flexible array
members.  Structure members are not represented, since they are not types, but
the types of their members *are* represented, as are nested structures (the line
number and filename serving, as ever, to disambiguate them from other structures
with the same name declared nested inside different structures).

The following are some examples of valid type IDs (assuming the kernel source
tree is, implausibly, located at /k/, just off the root directory: comments on
individual types done /* like C */; the last example is broken across lines for
formatting's sake):

//fp//*  # a pointer to a function, any function
////long int
////char []
////unsigned int typedef __kernel_uid_t typedef __kernel_uid32_t
//fp//* typedef __signalfn_t * typedef __sighandler_t
////struct nsproxy                       /* an opaque type */
////struct nsproxy * #                   /* pointer to it */
////long unsigned int typedef u64 volatile
////long unsigned int volatile const * const
////long unsigned int typedef sector_t [////long unsigned int  511]
/k/include/linux/types.h//222//struct list_head
/k/include/linux/types.h//222//struct list_head *
/k/include/linux/types.h//222//struct list_head [////long unsigned int 5]
/k/include/linux/types.h//217//struct    /* no struct tag */
/k/include/linux/types.h//217//struct typedef atomic64_t
/k/include/linux/mm_types.h//34//struct page * typedef pgtable_t
/k/fs/eventpoll.c//122//struct nested_calls
/k/include/linux/sysctl.h//1016//struct ctl_table typedef ctl_table
    [////long unsigned int 4] var inotify_table  /* A global variable */

This scheme means that cv-quals and other modifiers applied to other types are
always merged: if there are a dozen typedefs for a single type 'foo' with the
same name declared in the same place, they all end up with the same type ID and
are only emitted into the CTF once.

The type_id() function can also accept a callback, which is called as the
recursion unwinds, from base type up to derived type: so it might be called for
"////unsigned int", then for "////unsigned int typedef __kernel_uid_t", and so
on up to the DIE that was originally passed in.  Because type_id() returns a
dynamically-allocated string, calls to type_id() made purely for the sake of
invoking a callback are normally of the peculiar form "free(type_id(...))".

type_id() is a very hot spot, so syscall results are cached in it (such as
realpath(), as mentioned above), and when string appending is done, it is done
all at once where possible, via str_appendn(), which calls realloc() only once
no matter the number of strings being appended.

Lots of core data structures in dwarf2ctf consist of hashes mapping type IDs to
something else (predominantly CTF file/ID pairs and module names).  It would
be possible to map from hashes of type IDs, saving some memory, but this would
impair debugging so is not yet implemented.  (If it is implemented, it should
probably be implemented only when DEBUG is not defined.)


Duplicate detection
-------------------

   scan_duplicates()
     process_file()                       /* Toplevel DWARF walkers */
[C]    detect_duplicates_init()
[R]    process_tu_func()
[C]      assembly_filter_tab[]
[C]      detect_duplicates()
[ 1]       mark_shared()
[R]          type_id()                    /* Type IDs */
[C1]           mark_shared()
[R]        mark_seen_contained()
[C]    detect_duplicates_done()

     process_file()
[C]    detect_duplicates_init()
[R]    process_tu_func()
[C]      assembly_filter_tab[]
[C]      detect_duplicates_alias_fixup()
[R]        type_id()
[C]          is_named_struct_union_enum()
[R]        type_id()
[C]          detect_duplicates_alias_fixup_internal()
               mark_shared() (see above)
[C]    detect_duplicates_done()

The job of the duplicate detection pass is to fill out the id_to_module hash,
which maps type IDs to the module they appear in, with the two special cases
that types that appear only in the core kernel are said to appear in the module
'vmlinux', and types that appear in more than one module (or in a module and in
the core kernel) are said to appear in the module 'shared_ctf', the shared type
repository.  This is quite a tricky multi-pass process, because we must ensure
that the shared type repository is self-contained: all types in the repository
must not reference any types outside the repository.

Detecting duplicates itself is easy: we consider two types duplicates if they
have the same type ID: if they both reside in the same module, the resulting
type resides in that module too.  However, detecting shared types is harder.
We consider that a type belongs in the shared module if any of these conditions
is true:

 - the type appears multiple times in different modules
 - a type for which this type is a base type is shared
 - the type is referenced by a structure or union member, and the structure
   or union is shared
 - the type is a non-opaque type with an opaque variant ('struct foo'), or
   vice versa, and either of these variants is shared: these two types will
   get different type IDs, so explicit checking is necessary

It should be fairly easy to see that sharedness is a contagious property:
e.g. if you mark a structure as shared, and one of its members is an
otherwise-unshared opaque pointer to a structure, you have to mark that as
shared: this causes the non-opaque definition of the structure, and all *its*
members, to be shared, and so on.  So since dwarf2ctf does not track the members
of structures itself (not until the CTF generation phase, anyway), this means
walking over the DWARF DIEs multiple times, checking for sharedness over and
over until we are done.

We partition the problem into two parts, both of which are carried out by
process_file() callback functions: detect_duplicates() and
detect_duplicates_alias_fixup().


detect_duplicates() is called first, once for every DIE in the kernel (via
process_file()).  This identifies types that are duplicated but not shared, and
identifies shared types without consideration of opaque struct/union aliasing.
It also flags types that have been seen only once as 'seen': this is checked
much later on by the CTF construction phase, since construction of CTF for
any type which has not been inspected by the deduplicator is a sign of a bug in
the deduplicator.

This has several subtleties.

If we are running for an out-of-tree module, we must still identify types as
duplicated within the module, but must never mark them as shared: out-of-tree
modules cannot contribute to the shared type repository nor even use types in
it, since they are rebuilt independently from the kernel proper and thus cannot
depend on a type currently in the repository remaining there (e.g. perhaps it
has only two users, both in modules, and the kernel is rebuilt to not build one
of those modules anymore: this should not require rebuilding of any out-of-tree
modules).

If we mark a structure or union type as seen, we must mark aggregate types that
appear directly within that type's DIE as seen as well.  This is done by the
recursive function mark_seen_contained().  You might wonder what the point of it
is: such types surely cannot appear anywhere else, and any duplication will
precisely match the duplication of the containing type.  The answer is that they
can still be referenced as the type of structure members of their containing
structure, e.g. in

struct foo {
	struct bar {
	} *baz;
	struct bar wombat[16];
};

Here, a reference to 'struct bar' appears in 'struct foo', and CTF is
constructed for it, even though it is not a top-level DIE.  In GCC 4.8+, a
reference to the 16-element array-of-struct-bar can also appear in 'struct foo':
in fact almost anything can appear in there if used nowhere else in the
translation unit, even base types.  So we look for the appearance of anything
which we can assemble into CTF (anything in the assembly_tab) other than
members, since members cannot be used as the type of anything else, and mark
them all as seen in this module.  (Nearly everything in a structure or union is
a member, so this ends up skipping almost but not quite everything.)


If we find that a type has appeared more than once in different kernel modules
(or in a module and in the core kernel), we must mark it as shared.  This is
done via mark_shared(), which is both a function that can be directly called
(e.g. from detect_duplicates()) and a type_id() callback.  If it is called
directly, it immediately reinvokes itself as a type_id() callback, which calls
it for the base type of the type in question and then for all qualifiers up the
type ID stack, marking them all as shared if they weren't already.

If a structure or union is marked as shared, the types of its members are also
marked as shared via a recursive call (even if they have already been so marked:
just because this structure is of a type we've already seen, in a location we've
already seen, doesn't mean that someone might not have legitimately used
#defines to add extra members to the end of it, and we need to mark them as
shared too).  We track structures that have been seen in this translation unit
and avoid recursing into them, to avoid an infinite loop in cases like this:

struct one;
struct two {
	struct one *foo;
};

struct one {
	struct two *foo;
}

The types being pointers does not help here -- the marking of 'struct one *' as
shared will automatically mark 'struct one' as shared too, because otherwise we
might have a structure in the shared type repository whose members' types 
could not be found there.


The second pass is the 'alias fixup' pass, implemented by
detect_duplicates_alias_fixup().  This pass serves to detect unshared opaque
types whose non-opaque equivalents are shared, and vice versa.  It is executed
repeatedly until no types have been marked as shared for an entire iteration,
but is considerably faster per iteration than the first pass, which often
consumes more than half of dwarf2ctf's total runtime.  We work in one direction
only, looking for non-opaque structures, unions or enums which have structure
tags.  (Structures without tags cannot have opaque variants, and structures
which are opaque will have non-opaque cousins somewhere, or can be emitted to
the CTF as an opaque structure harmlessly since they truly have no members and
are probably manipulated only via casts.)

We identify structures, unions or enums with tags via the type_id() callback
is_named_struct_union_enum(), but cannot determine if something is an opaque
structure at this stage.  Instead, we do that after the callback, checking to
see if the first four characters of the type ID are "////": this relies on the
fact that GCC never gives opaque structures line numbers in DWARF.  We do the
actual checking and marking of each non-opaque structure using
detect_duplicates_alias_fixup_internal(), which is yet another type_id()
callback.

This function directly synthesises the name which this structure's opaque cousin
would have, if it existed, by stripping off the line number and filename and
replacing them with '////', and sees if either of these types have been marked
as shared while the other has not.  If the opaque type is shared, the non-opaque
variant can be marked shared using the same recursive mark_shared() function as
before (thus marking the types of all its members and types it depends upon as
shared too).  If it is the opaque type that needs marking shared, this will not
work, since mark_shared() takes a DWARF DIE, and we don't have one for the
opaque type, just a faked-up type ID.  However, since an opaque type doesn't
have any members that need recursively tracing, we don't need access to its
DWARF DIE to figure them out, and can just mark it as shared directly, via an
insert into the id_to_module hash.

Since this function is a type_id() callback, it is called not just for
structures but for types based on them (e.g. in type_id form, "struct foo const
* [43] volatile *"); at every level of this declarator stack, a shared opaque
base type will contaminate its non-opaque cousins with sharedness, and vice
versa.  This handles situations in which, say, the opaque version of "struct foo
const * [43]" was used by more than one module and was marked as shared: the
marking process will have marked the opaque versions of "struct foo const *
[43]", "struct foo const *", "struct foo const" and "struct foo" as shared, but
will not have touched any non-opaque versions of these types which may exist
until this routine runs.  It also handles typedefs to structures with no need
for any extra code.

This whole alias fixup process needs to be repeated, because whenever a
non-opaque type is marked as shared and its member's types traced and marked
shared, *those* may themselves be structure types with corresponding opaque or
non-opaque variants, and when they are opaque types the non-opaque variant that
alias fixup works from may already have passed under the DWARF walker's gaze: so
another pass over the kernel's DWARF is necessary to be sure we catch it.
mark_shared() thus sets a flag that scan_duplicates() recognizes and uses to
trigger another run through the alias fixup pass.


There are a very few modules that this algorithm doesn't work for.  One example
is snd-ens1371, which reads, in toto

#define CHIP1371
#include "ens1370.c"

and ens1370.c (itself a distinct kernel module) then defines a 'struct ensoniq'
whose members vary depending on whether CHIP1371 is defined.  Obviously, it is
impossible to share any such types between kernel modules even though their
names are the same and they are defined in the same place in the same source
file in both cases.  But this sort of trickery is very rare, so we simply
implement a blacklist of modules which will not introduce new types into the
shared CTF repository, and who do not participate in alias fixup detection
either.  Detecting these cases in order to blacklist them is harder: no
automated system has yet been implemented, although instances where #defines of
this nature introduce new types that are then used by later members will cause
assertion failures inside dwarf2ctf which might be a clue.  So it is possible
that some examples have been missed.  (The blacklist only applies to cases where
structure members change within a single kernel build, so cases where structures
have members whose presence depends on CONFIG_* values are quite all right, as
are cases where #defines are introduced by one translation that #includes
another that then goes on to define whole new structures: it is only cases where
modules #define something that changes the definition of individual
possibly-shared types that will need blacklisting.)


CTF construction
----------------

   process_file()
[R]  process_tu_func()
[C]    assembly_filter_tab[]
[C]    construct_ctf()
[ 2]     construct_ctf_id()
[R3]       die_to_ctf()
             assembly_tab[]
[C]           -> assemble_ctf_base()
              -> assemble_ctf_pointer()
               | assemble_ctf_array()
               | assemble_ctf_array_dimension()
               | assemble_ctf_typedef()
               | assemble_ctf_cvr_qual()
               | assemble_ctf_variable()
                   lookup_ctf_type()
[ 2]                 construct_ctf_id()
              -> assemble_ctf_enumeration()
              -> assemble_ctf_enumerator()
              -> assemble_ctf_struct_union()
              -> assemble_ctf_su_member()
[ 3]               die_to_ctf()
[ 2]               construct_ctf_id()

The next stage after the detection of duplicate and cross-module shared types is
to generate CTF.  We generate all CTF at once before emitting it: this is
potentially somewhat wasteful of memory, but in practice has not proved to be a
problem: substantially less memory is used than is used by other parts of the
kernel build, unless -DDEBUG is enabled.  Its job is to look through the
kernel's type DWARF (via process_file(), as usual) and create CTF for every
file-or-global-scope type and every externally-visible variable in the CTF file
in which the duplicate detection pass has said that type should appear.
(Variables are treated exactly like types: it just so happens that they are
never shared because no type or variable can depend upon them, so they always go
directly into the appropriate module and never into the shared type repository.)

At this stage, the 'CTF files' are not actually files but rather ctf_file_t
structures maintained by libdtrace-ctf and tracked in the module_to_ctf_file
hash.  We track every single individual type in the CTF file in the id_to_type
hash, which maps type IDs to pairs of (CTF file ID, ctf type ID): this lets us
use the type IDs described above when considering cross-references within CTF
files (e.g. from one CTF type to a type it depends upon).

The most important functions in this phase are:

 - construct_ctf_id(), the top-level process_file() callback which is given a
   DWARF DIE, looks in module_to_ctf_file for the CTF file where this type
   should land (creating it if necessary), makes sure a type with this type ID
   has not already been created there, calls die_to_ctf() to create the CTF,
   notes where it was created in id_to_type, and handles errors.

 - die_to_ctf(), a recursive function which calls the assembly function for the
   DIE it is given and all its immediate children, with special-case handling
   for tagged structures and unions.  If you want to create a type but not note
   where it was created for future lookups by lookup_ctf_type(), this is what to
   call.  (This is only done currently for unnamed structures/unions.)

 - lookup_ctf_type(), which is called by CTF assembly functions for those
   types that depend upon other types: it calls construct_ctf_id() again
   to construct the type, and double-checks that all such types appear
   either in the module we are constructing types for, or in the shared CTF
   module.  CTF represents all function pointers the same way, and has a
   special type ID for 'void', so we special-case both of these cases.

These functions are mostly straightforward (though highly recursive, with all
three plus CTF construction functions participating in loop 2 above,
die_to_ctf() calling itself directly, and even one situation, the
already-mentioned unnamed structures/unions, in which die_to_ctf() is directly
called back by a CTF construction function, in loop 3 above.)


There are a few subtleties, though.  Firstly, error handling.  We consider that
errors that will lead to unusable CTF are fatal: mostly, these are errors where
a bug in the deduplicator has failed to trace types correctly and has left at
least one type in the shared module depending on a type in a non-shared module,
or has failed to mark a type as shared at all.  In all these cases, you'll
eventually get an error from lookup_ctf_type() of the general form

blah.c:413:foo_t: Internal error: lookup of flob found in different file.

The first two parts of this error are the translation unit and line number the
type being assembled (usually a structure or union) was found in: foo_t is the
name of the type being assembled.  The type of the structure being looked up
appears nowhere, because we don't know it, but the name of the member is given
("flob" above, or "(unnamed)" if we don't know it).  If you want more
information, you can pass -DDEBUG to the compilation of dwarf2ctf.c in
scripts/dwarf2ctf/Makefile, and rerun dwarf2ctf, and you'll get the module and
filename in which both the originating and the target types appear.  In order to
actually track down the bug you'll probably have to run dwarf2ctf with
DWARF2CTF_TRACE set in the environment, and look at the place where the target
type was deduplicated, and try to figure out why the deduplicator didn't trace
the reference to the type in foo_t correctly.

There is another kind of error, though: a failure to assemble a single type,
perhaps because DWARF was emitted that we don't know how to understand (this is
particularly likely in structure assembly, where we are highly dependent on the
form of the DWARF that GCC happens to emit for DW_AT_member_location).  We pass
an 'enum skip_type' around, which has three possible values, one of which is
SKIP_ABORT.  After every type is assembled, we call the notably inefficient
function ctf_update() on the CTF file we're working over.  This takes the
in-memory structures and serializes them (all of them, every time), following
which they can be referred to by other CTF types.  If a SKIP_ABORT propagates up
to construct_ctf_id(), we call ctf_discard() instead, which throws away every
type constructed since the last ctf_update() -- i.e., the specific erroneous
type we've just been working on.  (We might have emitted some parts of it and
then failed, so we should try to clean up.)

A SKIP_ABORT is not fatal unless DEBUG is defined: its only effect is to omit
one single type from the resulting CTF, which is probably still usable.

There's more error-handling complexity inside die_to_ctf(), where errors from
libdtrace-ctf are actually reported (there may be multiple of them for a single
type, e.g. if we are assembling a structure and several members somehow refer to
a type we do not know about).


die_to_ctf() itself has the sort of parameter list that can make people swear
off C for life.  It is largely explained in the description of ctf_assembly_fun.
Most parts of it are hardly used in the function itself, just passed down to CTF
assembly functions.  The only part of die_to_ctf() that is worth noting is that
it goes to extra effort to call ctf_update() early when new structure and union
DIEs are created, before any of their members are assembled.  This allows self-
references like

struct foo {
	struct foo *next;
};

to work (since you can't refer to a type in CTF if you haven't called
ctf_update() since adding that type).

Finally, we must note the override flag.  Both die_to_ctf() and
construct_ctf_id() return a CTF ID.  This is thrown away by the DWARF walking
code (the function construct_ctf() exists just for that purpose), but when
called by lookup_ctf_type(), this CTF ID is taken to be the single ID of the CTF
type that's just been assembled.  Normally this is the same as the CTF ID
returned by the CTF assembly function for the top-level DWARF DIE, but there are
a few structures for which we want to return the result of some other CTF
assembly function.

The only currently-existing example is array dimensions, which DWARF represents
as a typed array DIE whose child is a dimension, but which CTF represents as an
array-with-dimensions that you can't change afterwards.  We can't assemble an
'array' at the top level because we don't know how big it is, but we have to
track the type recorded there somehow.  We handle this by having
assemble_ctf_array(), the assembly function for the top-level DW_TAG_array_type
DIE, simply look up the type of the array's members and return its ID as if it
had just constructed it, after which assemble_ctf_array_dimension(), the
assembly function for DW_TAG_subrange_type, actually constructs the array,
wrapping it around the CTF 'ID' 'assembled' by the parent and setting the
override flag to make sure that this is what is really recorded.


CTF construction functions
--------------------------

Each CTF construction function takes a single DWARF DIE and turns it into CTF,
somehow.  They are laid out in the assembly table described in 'toplevel DWARF
walkers' above.  They all start the same way, with a series of CTF_DW_ENFORCE or
CTF_DW_ENFORCE_NOT assertions.  These guard against corrupted DWARF missing some
of the attributes we need, or DWARF containing attributes which indicate that we
can't handle the content (e.g. DW_AT_signature or DW_AT_specification on
structures, which would both indicate this is DWARF 4, which we can't handle
yet.)

We'll go through these functions one by one, pointing out anything that
maintainers should be aware of.


assemble_ctf_base() assembles all integral base types (DW_TAG_base_type) and
transforms them into the corresponding CTF type.  The functions we need to call
for this in the CTF API all have the same type signature but have different
names; CTF also distinguishes between the various differently-sized
floating-point types, so we must figure out from the type size which type a
given DWARF base type is referring to.  We map from DWARF encoding to a triple
of (CTF addition function, CTF integral type, type size) where the latter is
optional and depends on the size of the DWARF type we are encoding and the size
of various floating_point types on the current system.  (This does mean that
cross-compilation using dwarf2ctf is likely to fail fairly often: we need
machinery to determine the sizeof() types on the target system before that can
function.)

This sizeof()-based search procedure is why we do not currently support
DW_AT_bit_size for base types: we could easily support it for sizes modulo 8,
but GCC happens to emit DW_AT_byte_size in this case.  In C DW_AT_bit_size is
likely to be emitted only for bitfields in structures anyway, not for base
types.


assemble_ctf_pointer() and assemble_ctf_typedef() are trivial: look up the
associated type with lookup_ctf_type() and assemble the appropriate thing.
assemble_ctf_cvr_qual() is almost as trivial, but has to figure out which of
const, volatile or restrict it was called for and call the corresponding CTF API
function.  assemble_ctf_enumeration() and assemble_ctf_enumerator() are quite
simple too.


assemble_ctf_variable() has a couple of extra complexities: we unconditionally
set the skip parameter to SKIP_SKIP, suppressing recursion into containing DIEs,
since we already know we won't care about any of them.  Also, while the
deduplication pass unifies opaque and non-opaque structures into the same type,
it never makes sure that variables declared in the same header by translation
units which have opaque versus non-opaque structures in scope are deduplicated.
e.g. you could well end up with these two type IDs, depending on whether
<linux/pid_namespace.h> was included before <linux/pid.h> in a given translation
unit:

////struct pid_namespace var init_pid_ns
/path/to/kernel/include/linux/pid_namespace.h//19//struct pid_namespace var init_pid_ns

These variables both refer to the same type, but deduplicating them would
require an additional deduplication pass.  Since variables are always terminal
and nothing can refer to them, nothing will ever look up any of those type IDs
(since the only thing that looks up type IDs is code that is searching for type
that other types depend on).  So we don't care about this duplication and
running an additional deduplication pass to eliminate it would slow down
dwarf2ctf to no good end.  It's better just to ignore duplicate errors from
ctf_add_variable().


assemble_ctf_array() and assemble_ctf_array_dimension() we talked about
above. One last subtlety remains, which is that figuring out the actual
dimensionality of an array is complicated enough that it has been hived off into
a private_subrange_dimension() function, called both from here and from
type_id().  Arrays with neither a DW_AT_upper_bound nor a DW_AT_count, and
arrays without an indexing type, are best considered flexible arrays; arrays
whose upper bound or count is not unsigned or signed integral data are also
flexible (perhaps they're using a full-blown location list, but we can't encode
that in CTF so we treat it as flexible); and if an upper bound is used, we want
to add one to its value before treating it as a count of elements.


This leaves structure/union assembly, both of which are assembled by the same
pair of functions, assemble_ctf_struct_union for the type itself and
assemble_ctf_su_member() for the individual members.  As with
assemble_ctf_cvr_qual(), we have to look at the tag to figure out which CTF
function to use to do the assembly, but we have an extra constraint: it is
perfectly idiomatic C to declare a structure repeatedly with a different number
of members every time.  This is perfectly permissible as long as the leading
portions of all declarations match.  We do not verify this (we hope that the
compiler will diagnose it, which it will unless the conflicting declarations
cross modules), though perhaps we should: we simply look up the structure in the
CTF and the DWARF and skip assembly of the structure members via SKIP_SKIP if
the already-assembled structure has at least as many members as the current one.

assemble_ctf_su_member() is by far the most complex of the assembly functions.
It has to handle members that already exist, members that need assembly, members
that correspond to unnamed structure members, numerous different ways of
representing structure offsets and members with no offset at all.

The offset computation is quite laborious and by no means complete: a complete
implementation would require an interpreter for DWARF location lists, which is
total overkill given that in DWARF2 GCC emits a totally stereotyped location
list, and in DWARF3+ we don't need location list parsing at all.  CTF wants an
offset in bits.

We have five cases:
 - for DW_AT_data_bit_offset, we just use the offset unchanged.

 - for DW_AT_data_member_location with an integral form (data2, data4, data8,
   udata, or sdata) we just look it up and multiply it by eight, adding the
   parent's DW_AT_bit_offset to handle structures nested inside other
   structures.

 - for DW_AT_data_member_location with a block form, we make sure that the list
   is of one particular simple form (DW_OP_plus_uconst and a constant value in
   bytes), and abort assembly otherwise.  The only case I know of where this
   test will trip is C++ virtual bases: if people are using C++ code with
   virtual bases inside the kernel they deserve sympathy, but probably not
   support in the code.  CTF can't represent C++ types in any case.

 - for expression location lists, or anything else that we don't understand, we
   simply die (we could simply skip the type, but this seems serious enough that
   dying is warranted).

 - with none of these present, we have no offset: the member is at the same
   location as the start of the structure.

But where is the 'start of the structure'?  That depends on whether this is an
unnamed struct/union member (usually a union).  If it is, we want to fold all
its members directly into the parent structure, with their offsets increased by
the offset of the unnamed member as a whole.  This is done by directly calling
die_to_ctf() with the first child of the anonymous member's type and with all
other parameters set as if the parent DIE was the current structure, thus
fooling die_to_ctf() into believing that these members are members of the
current structure, not of the anonymous one.  The offset-increasing magic is
done via the parent_bias parameter to die_to_ctf() and all the CTF construction
functions: it is ignored by all of them except for assemble_ctf_su_member()
itself, which adds the parent bias onto the normally-computed offset, and is
otherwise passed down unchanged to all children.  This means that even this
terribly contrived case works:

struct horror {
	int spacer;
	union {
		struct {
			int spacer;
			struct {
				int foo;
				int bar;
			} b;
		} a;
	};
};

In this situation, horror.a.b.bar may have:

 - a nonzero parent_bias due to the offset of the anonymous union in 'struct
   horror'
 - a nonzero offset due to the offset of 'bar' in its containing structure
 - if DW_AT_data_member_location with integral form is used, a nonzero
   DW_AT_bit_offset of 'b' in 'a'

If this is not an anonymous union, we are dealing with only one member: we look
up its type and add it reasonably conventionally via ctf_add_member_offset().
Even here there are subtleties: we use construct_ctf_id() directly rather than
via lookup_ctf_type() so we can get a better error message on failure, and we
ignore any duplicate-member errors because this is probably a sign that this
structure has already been encountered and we are working through another
instance of it with more members.


Writeout
--------

   write_types()

This couldn't really be simpler, as the trivial call graph shows.  We create an
output directory with the requested name, then work over the entire
module_to_ctf_file hash, writing out every CTF file into a new suitably-named
file via zlib's compressed file I/O functions.
