Chapter 1: Introduction
This limitation is only a problem when an application reads one format and writes another.
Each BFD back end is responsible for maintaining as much data as possible, and the internal
BFD canonical form has structures which are opaque to the BFD core, and exported only
to the back ends. When a file is read in one format, the canonical form is generated for
BFD and the application. At the same time, the back end saves away any information
which may otherwise be lost. If the data is then written back in the same format, the back
end routine will be able to use the canonical form provided by the BFD core as well as the
information it prepared earlier. Since there is a great deal of commonality between back
ends, there is no information lost when linking or copying big-endian COFF to little-endian
COFF, or a.out to b.out. When a mixture of formats is linked, the information is only
lost from the files whose format differs from the destination.
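The division of labor between the canonical form and a back end's private data can be illustrated with a minimal Python sketch. Everything in it (`CanonicalSymbol`, `read_aout_symbol`, `write_symbol`, and the `n_other` field used as an example of format-specific data) is hypothetical and models the idea only; it is not BFD's API. It also simplifies "same format" to an exact name match and ignores the commonality between related back ends such as a.out and b.out.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Hypothetical model: format names, field names, and the "private" payload
# are illustrative only and are not BFD's actual data structures.

@dataclass
class CanonicalSymbol:
    name: str
    value: int
    origin_format: str             # format of the file that defined the symbol
    private: Optional[Any] = None  # back-end data the canonical form cannot hold

def read_aout_symbol(name: str, value: int, n_other: int) -> CanonicalSymbol:
    """Build the canonical symbol and stash the a.out-only field privately."""
    return CanonicalSymbol(name, value, "a.out", private={"n_other": n_other})

def write_symbol(sym: CanonicalSymbol, dest_format: str) -> dict:
    """Emit a symbol record for dest_format.

    Canonical fields are always written. The private data is consulted only
    when the destination is the format it came from, so a same-format round
    trip loses nothing, while a cross-format write keeps only what the
    canonical form carries.
    """
    record = {"name": sym.name, "value": sym.value}
    if dest_format == sym.origin_format and sym.private is not None:
        record.update(sym.private)
    return record

sym = read_aout_symbol("main", 0x20, n_other=7)
print(write_symbol(sym, "a.out"))  # {'name': 'main', 'value': 32, 'n_other': 7}
print(write_symbol(sym, "coff"))   # {'name': 'main', 'value': 32}
```

Because each symbol records its own origin, a link that mixes input formats in this model drops private data only for symbols whose originating format differs from the destination.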
1.3.2 The BFD canonical object-file format
The greatest potential for loss of information occurs when there is the least overlap between
the information provided by the source format, that stored by the canonical format, and
that needed by the destination format. A brief description of the canonical form may help
you understand which kinds of data you can count on preserving across conversions.
files Information stored on a per-file basis includes target machine architecture, particular
implementation format type, a demand pageable bit, and a write protected bit. Information
like Unix magic numbers is not stored here; only the magic numbers’ meaning is, so a
ZMAGIC file would have both the demand pageable bit and the write protected text bit set.
The byte order of the target is stored on a per-file basis, so that big- and little-endian
object files may be used with one another.
sections Each section in the input file contains the name of the section, the section’s
original address in the object file, size and alignment information, various flags,
and pointers into other BFD data structures.
symbols Each symbol contains a pointer to the information for the object file which
originally defined it, its name, its value, and various flag bits. When a BFD
back end reads in a symbol table, it relocates all symbols to make them relative
to the base of the section where they were defined. Doing this ensures that
each symbol points to its containing section. Each symbol also has a varying
amount of hidden private data for the BFD back end. Since the symbol points
to the original file, the private data format for that symbol is accessible, so ld can
operate on a collection of symbols of wildly different formats without problems.
Normal global and simple local symbols are maintained on output, so an output
file (no matter its format) will retain symbols pointing to functions and to
global, static, and common variables. Some symbol information is not worth
retaining; in a.out, type information is stored in the symbol table as long
symbol names. This information would be useless to most COFF debuggers;
the linker has command line switches to allow users to throw it away.
There is one word of type information within the symbol, so if the format
supports symbol type information within symbols (for example, COFF, IEEE,
Oasys) and the type is simple enough to fit within one word (nearly everything
but aggregates), the information will be preserved.
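The three kinds of canonical data described above can be modeled in a short Python sketch under stated assumptions: the class and field names are hypothetical (not BFD's own structures), a "word" is assumed to be 32 bits, and type information is represented as an integer when simple and as any other object when it is an aggregate. The sketch records a ZMAGIC file by its meaning (demand paged, write protected text) rather than its magic number, rebases symbol values onto their containing section, and keeps one-word type information while dropping aggregate types.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical model of the canonical form; names are illustrative only.

WORD_BITS = 32  # assumption: "one word" taken to be 32 bits here

@dataclass
class CanonFile:
    arch: str
    big_endian: bool                 # byte order is kept per file
    demand_paged: bool = False
    write_protected_text: bool = False

@dataclass
class CanonSection:
    name: str
    vma: int         # original address in the object file
    size: int
    align_log2: int  # alignment as a power of two

@dataclass
class CanonSymbol:
    name: str
    owner: CanonFile       # file that originally defined the symbol
    section: CanonSection  # containing section
    value: int             # offset from the base of `section`
    type_word: Optional[int] = None

def file_from_aout_magic(magic: str, arch: str, big_endian: bool) -> CanonFile:
    """Store what the magic number means, not the number itself."""
    f = CanonFile(arch, big_endian)
    if magic == "ZMAGIC":
        f.demand_paged = True
        f.write_protected_text = True
    return f

def canonicalize_symbol(owner, name, abs_value, sections, type_info=None):
    """Make the value section-relative; keep type info only if it fits one word."""
    for sec in sections:
        if sec.vma <= abs_value < sec.vma + sec.size:
            fits = isinstance(type_info, int) and 0 <= type_info < (1 << WORD_BITS)
            return CanonSymbol(name, owner, sec, abs_value - sec.vma,
                               type_info if fits else None)
    raise ValueError(f"{name}: no section contains {abs_value:#x}")

hdr = file_from_aout_magic("ZMAGIC", "m68k", big_endian=True)
text = CanonSection(".text", 0x1000, 0x200, 2)
data = CanonSection(".data", 0x2000, 0x100, 3)

main_sym = canonicalize_symbol(hdr, "main", 0x1040, [text, data], type_info=0x24)
table_sym = canonicalize_symbol(hdr, "table", 0x2010, [text, data],
                                type_info=("struct", ["a", "b"]))  # aggregate
print(main_sym.section.name, hex(main_sym.value), main_sym.type_word)     # .text 0x40 36
print(table_sym.section.name, hex(table_sym.value), table_sym.type_word)  # .data 0x10 None
```

A real linker also has to handle undefined, common, and absolute symbols, which this sketch simply rejects with a `ValueError`.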