12
LIBRARY RECOGNITION USING FLIRT SIGNATURES
At this point it is time to start moving beyond IDA’s more obvious capabilities and begin our exploration of what to do after “The initial autoanalysis has been finished.”[1] In this chapter we discuss techniques for recognizing standard code sequences such as the library code contained in statically linked binaries or standard initialization and helper functions inserted by compilers.
When you set out to reverse engineer any binary, the last thing that you want to do is waste time reverse engineering library functions whose behavior you could learn much more easily simply by reading a man page, reading some source code, or doing a little Internet research. The challenge presented by statically linked binaries is that they blur the distinction between application code and library code. In a statically linked binary, entire libraries are combined with application code to form a single monolithic executable file. Fortunately for us, tools are available that enable IDA to recognize and mark library code, allowing us to focus our attention on the unique code within the application.
[1] IDA generates this message in the message window when it has finished its automated processing of a newly loaded binary.
The IDA Pro Book
(C) 2008 by Chris Eagle
Fast Library Identification and Recognition Technology
Fast Library Identification and Recognition Technology, better known as FLIRT,[2] encompasses the set of techniques employed by IDA to identify sequences of code as library code. At the heart of FLIRT are pattern-matching algorithms that enable IDA to quickly determine whether a disassembled function matches one of the many signatures known to IDA. The <IDADIR>/sig directory contains the signature files that ship with IDA. For the most part, these signatures cover libraries that ship with common Windows compilers, though a few non-Windows signatures are also included.
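The core idea of signature matching can be illustrated with a toy matcher. The sketch below is not IDA's FLIRT implementation, which also compresses and organizes its signatures for speed. It simply compares a function's leading bytes against fixed-length patterns. The assumption that some pattern positions are wildcards ("..") stems from the fact that bytes such as linker-patched addresses can differ between binaries. That convention, the pattern text, and the function name `hypothetical_lib_func` are all invented for illustration.

```python
from typing import Optional, Sequence

# One pattern position: a concrete byte value, or None for "don't care".
Pattern = Sequence[Optional[int]]


def parse_pattern(text: str) -> list[Optional[int]]:
    """Parse a hex pattern such as '55 8B EC E8 .. .. .. ..' ('..' = wildcard)."""
    return [None if tok == ".." else int(tok, 16) for tok in text.split()]


def matches(pattern: Pattern, code: bytes) -> bool:
    """True if the function's leading bytes agree with every non-wildcard position."""
    if len(code) < len(pattern):
        return False
    return all(p is None or p == b for p, b in zip(pattern, code))


def identify(code: bytes, signatures: dict[str, Pattern]) -> Optional[str]:
    """Return the name attached to the first matching signature, or None."""
    for name, pattern in signatures.items():
        if matches(pattern, code):
            return name
    return None


# Hypothetical signature: the name and byte values are invented for illustration.
SIGS = {"hypothetical_lib_func": parse_pattern("55 8B EC E8 .. .. .. ..")}
print(identify(bytes.fromhex("55 8B EC E8 10 20 30 40 C3"), SIGS))  # hypothetical_lib_func
```

The wildcards are what allow one signature to match the same library routine in many different executables, even though the routine's embedded addresses change from one link to the next.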
Signature files utilize a custom format in which the bulk of the signature data is compressed and wrapped in an IDA-specific header. In most cases, signature filenames fail to give a clear indication of which library the associated signatures were generated from. Depending on how they were created, signature files may contain a library name comment that describes their contents. If we view the first few lines of extracted ASCII content from a signature file, this comment is often revealed. The following Unix-style command[3] generally reveals the comment in the second or third line of output:
# strings sigfile | head -n 3
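On systems without `strings` and `head`, a short script can produce roughly the same result. The sketch below approximates the default behavior of `strings` by collecting runs of at least four printable ASCII bytes (tab included). It then keeps only the first few runs, as `head -n 3` does. Real `strings` implementations offer more options and encodings, so treat this as an approximation rather than an exact clone.

```python
import itertools
import sys
from typing import Iterator


def ascii_strings(data: bytes, min_len: int = 4) -> Iterator[str]:
    """Yield runs of printable ASCII (plus tab) at least min_len bytes long."""
    run = bytearray()
    for byte in data:
        if 0x20 <= byte <= 0x7E or byte == 0x09:
            run.append(byte)
            continue
        if len(run) >= min_len:
            yield run.decode("ascii")
        run.clear()
    if len(run) >= min_len:  # a run that reaches the end of the data
        yield run.decode("ascii")


def first_strings(path: str, n: int = 3) -> list[str]:
    """Rough equivalent of `strings path | head -n n`."""
    with open(path, "rb") as f:
        data = f.read()
    return list(itertools.islice(ascii_strings(data), n))


if __name__ == "__main__" and len(sys.argv) > 1:
    for line in first_strings(sys.argv[1]):
        print(line)
```

Run it as `python first_strings.py sigfile`. The script name here is hypothetical. The library name comment, if present, should appear among the lines printed.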
Within IDA, there are two ways to view comments associated with signature files. First, you can access the list of signatures that have been applied to a binary via View ▸ Open Subviews ▸ Signatures. Second, the list of all signature files is displayed as part of the manual signature application process, which is initiated via File ▸ Load File ▸ FLIRT Signature File.
Applying FLIRT Signatures
When a binary is first opened, IDA attempts to apply special signature files, designated as startup signatures, to the entry point of the binary. It turns out that the entry point code generated by various compilers is sufficiently different that matching entry point signatures is a useful technique for identifying the compiler that may have been used to generate a given binary.
[2] Please see http://www.hex-rays.com/idapro/flirt.htm.
[3] The strings command was discussed in Chapter 2, while the head command is used to view only the first few lines (three in the example) of its input source.
If IDA identifies the compiler used to create a particular binary, then the signature file for the corresponding compiler libraries is loaded and applied to the remainder of the binary. The signatures that ship with IDA tend to be related to proprietary compilers such as Microsoft Visual C++ or Borland Delphi. The reason behind this is that a finite number of binary libraries ship with these compilers. For open source compilers, such as GNU gcc, the binary variations of the associated libraries are as numerous as the operating systems the compilers ship with. For example, each version of FreeBSD ships with a unique version of the C standard library. For optimal pattern matching, signature files would need to be generated for each different version of the library. Consider the difficulty in collecting every variation of libc.a[4] that has shipped with every version of every Linux distribution. It simply is not practical. In part, these differences are due to changes in the library source code that result in different compiled code, but huge differences also result from the use of different compilation options, such as optimization settings, and from the use of different compiler versions to build the library. The net result is that IDA ships with very few signature files for open source compiler libraries. The good news, as you shall soon see, is that Hex-Rays makes tools available that allow you to generate your own signature files from static libraries.
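The two-stage process described above works as follows. First, the entry point is matched against startup signatures to identify the compiler. Then the library signatures for that compiler are chosen for the rest of the binary. This can be modeled as a simple table lookup. The sketch below is a simplification, assuming plain byte-prefix matching in place of real FLIRT patterns. Every byte value, compiler name, and `.sig` filename in the table is invented for illustration.

```python
from typing import Optional

# Hypothetical startup-signature table. Each entry pairs bytes expected at
# the entry point with (compiler name, library signature file to apply next).
# The byte values and names are invented for illustration only.
STARTUP_SIGNATURES: dict[bytes, tuple[str, str]] = {
    bytes.fromhex("DE AD 01"): ("Hypothetical Compiler A", "compiler_a_libs.sig"),
    bytes.fromhex("DE AD 02"): ("Hypothetical Compiler B", "compiler_b_libs.sig"),
}


def identify_compiler(entry_code: bytes) -> Optional[tuple[str, str]]:
    """Match the entry-point bytes against each startup signature in turn."""
    for prefix, result in STARTUP_SIGNATURES.items():
        if entry_code.startswith(prefix):
            return result
    return None


def library_signatures_for(entry_code: bytes) -> list[str]:
    """Return the library signature files to apply to the rest of the binary."""
    found = identify_compiler(entry_code)
    if found is None:
        # Compiler not recognized: signatures must be chosen manually.
        return []
    _compiler, sig_file = found
    return [sig_file]


print(library_signatures_for(bytes.fromhex("DE AD 02 90 90")))  # ['compiler_b_libs.sig']
```

The empty result for an unrecognized entry point corresponds to the situation in which you must select signatures yourself.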
So, under what circumstances might you be required to manually apply signatures to one of your databases? Occasionally IDA properly identifies the compiler used to build the binary but has no signatures for the related compiler libraries. In such cases, either you will need to live without signatures, or you will need to obtain copies of the static libraries used in the binary and generate your own signatures. Other times, IDA may simply fail to identify a compiler, making it impossible to determine which signatures should be applied to a database. This is common when analyzing obfuscated code in which the startup routines have been sufficiently mangled to preclude compiler identification. In that case, you must first de-obfuscate the binary sufficiently before you can have any hope of matching any library signatures. We will discuss techniques for dealing with obfuscated code in Chapter 21.
[4] libc.a is the version of the C standard library used in statically linked binaries on Unix-style systems.
MAIN VS. _START
Recall that a program’s entry point is the address of the first instruction that will be executed. Many longtime C programmers incorrectly believe that this is the address of the function named main, when in fact it is not. The file type of the program, not the language used to create the program, dictates the manner in which command-line arguments are provided to a program. In order to reconcile any differences between the way the loader presents command-line arguments and the way the program expects to receive them (via parameters to main, for example), some initialization code must execute prior to transferring control to main. It is this initialization code that IDA designates as the entry point of the program and labels _start.
This initialization code is also responsible for any initialization tasks that must take place before main is allowed to run. In a C++ program, this code is responsible for ensuring that constructors for globally declared objects are called prior to execution of main. Similarly, cleanup code is inserted that executes after main completes in order to invoke destructors for all global objects prior to the actual termination of the program.
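The ordering described in the MAIN VS. _START sidebar can be modeled in a few lines. In this model, global constructors run before main, the loader's argument format is converted into (argc, argv), and destructors run afterward in reverse order of construction. This is a hypothetical model, not real runtime startup code. The `start` function, its parameters, and the assumption that the loader hands over a single command-line string are all illustrative. The whitespace split also ignores quoting rules that real runtimes honor.

```python
from typing import Callable

Hook = Callable[[], None]


def start(command_line: str,
          main: Callable[[int, list[str]], int],
          constructors: list[Hook],
          destructors: list[Hook]) -> int:
    """Hypothetical model of the startup code IDA labels _start."""
    # Run constructors for globally declared objects before main gets control.
    for ctor in constructors:
        ctor()
    # Convert the loader's representation of the arguments into the
    # (argc, argv) pair main expects. A whitespace split is a simplification.
    argv = command_line.split()
    status = main(len(argv), argv)
    # Run destructors for global objects in reverse order of construction.
    for dtor in reversed(destructors):
        dtor()
    return status
```

Calling `start("prog -v file", ...)` with two logging constructors and two logging destructors records the sequence ctor A, ctor B, main, dtor B, dtor A. This is why the first instruction executed is never the first line of `main`.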
Regardless of the reason, if you wish to manually apply signatures to a database, you do so via File ▸ Load File ▸ FLIRT Signature File, which opens the signature selection dialog shown in Figure 12-1.
Figure 12-1: FLIRT signature selection
The File column reflects the name of each .sig file in IDA’s <IDADIR>/sig
directory. Note that there is no means to specify an alternate location for .sig
files. If you ever generate your own signatures, they need to be placed into
<IDADIR>/sig along with every other .sig file. The Library name column
displays the library name comment that is embedded within each file. Keep
in mind that these comments are only as descriptive as the creator of the
signatures (which could be you!) chooses to make them.
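Because IDA reads signatures only from <IDADIR>/sig, installing a signature you have generated amounts to copying it into that directory. The small helper below does this and also lists the .sig files found there. It is a sketch that follows the directory layout described in the text. The layout may differ in other IDA versions, and the IDA install path you pass in is an assumption you must supply.

```python
import shutil
from pathlib import Path


def install_signature(sig_path: str, ida_dir: str) -> Path:
    """Copy a user-generated .sig file into <IDADIR>/sig and return its new path."""
    src = Path(sig_path)
    if src.suffix.lower() != ".sig":
        raise ValueError(f"{src} does not have a .sig extension")
    sig_dir = Path(ida_dir) / "sig"
    if not sig_dir.is_dir():
        raise FileNotFoundError(f"{sig_dir} not found; check the IDA install path")
    dest = sig_dir / src.name
    shutil.copy2(src, dest)  # copy2 also preserves the file's timestamps
    return dest


def list_signatures(ida_dir: str) -> list[str]:
    """Sorted names of the .sig files present in <IDADIR>/sig."""
    return sorted(p.name for p in (Path(ida_dir) / "sig").glob("*.sig"))
```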
When a library module is selected, the signatures contained in the
corresponding .sig file are loaded and compared against every function
within the database. Only one set of signatures may be applied at a time,
so you will need to repeat the process if you wish to apply several different
signature files to a database. When a function is found to match a signature,
the function is marked as a library function, and the function is automatically
renamed according to the signature that has been matched.
WARNING Only functions named with an IDA dummy name can be automatically renamed. In
other words, if you have renamed a function, and that function is later matched by a
signature, then the function will not be renamed as a result of the match. Therefore, it
is to your benefit to apply signatures as early in your analysis process as possible.
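The behavior just described can be summarized in a short model. Each function is compared against one signature set. A match marks the function as library code. The name is replaced only if the function still carries an IDA dummy name such as `sub_401000`. This sketch makes several assumptions. Signatures are stand-in predicates rather than real FLIRT patterns. Only the `sub_` dummy prefix is recognized. A user-renamed function is still flagged as library code; the text says only that its name is left alone.

```python
import re
from dataclasses import dataclass
from typing import Callable

# IDA's auto-generated function names look like sub_401000 (hex address).
DUMMY_NAME = re.compile(r"sub_[0-9A-F]+")


@dataclass
class Function:
    name: str
    code: bytes
    is_library: bool = False


def apply_signature_set(functions: list[Function],
                        sigs: dict[str, Callable[[bytes], bool]]) -> int:
    """Apply one signature set to every function; return the number of matches."""
    matched = 0
    for func in functions:
        for lib_name, is_match in sigs.items():
            if is_match(func.code):
                func.is_library = True  # assumption: flagged even if user-renamed
                if DUMMY_NAME.fullmatch(func.name):
                    func.name = lib_name  # only dummy names are replaced
                matched += 1
                break
    return matched
```

In this model, a function you renamed `my_parser` before applying signatures keeps that name even though it matches. This mirrors why applying signatures early pays off.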
Recall that statically linked binaries blur the distinction between application code and library code. If you are fortunate enough to have a statically linked binary that has not had its symbols stripped, you will at least have useful function names (as useful as the trustworthy programmer has chosen to create) to help you sort your way through the code. However, if the binary has been stripped, you will have perhaps hundreds of functions, all with IDA-generated names that fail to indicate what the function does. In both cases, IDA will be able to identify library functions only if signatures are available (function names in an unstripped binary do not provide IDA with enough information to definitively identify a function as a library function).
Figure 12-2 shows the Overview Navigator for a statically linked binary.
Figure 12-2: Statically linked with no signatures
In this display, no functions have been identified as library functions, so
you may find yourself analyzing far more code than you really need to. After
application of an appropriate set of signatures, the Overview Navigator is
transformed as shown in Figure 12-3.
Figure 12-3: Statically linked binary with signatures applied
As you can see, the Overview Navigator provides the best indication of
the effectiveness of a particular set of signatures. With a large percentage of
matched signatures, substantial portions of code will be marked as library
code and renamed accordingly. In the example in Figure 12-3, it is highly
likely that the actual application-specific code is concentrated in the far-left
portion of the navigator display.
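The Overview Navigator gives a visual impression of how much code a signature set has claimed. To put a number on the same idea, you can compute the fraction of function bytes marked as library code and the address range still occupied by non-library functions. The sketch below assumes you have already exported function boundaries and library flags. The `FunctionRange` structure is hypothetical and is not an IDA type.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FunctionRange:
    start: int
    end: int  # exclusive
    is_library: bool


def library_coverage(functions: list[FunctionRange]) -> float:
    """Fraction of function bytes marked as library code (0.0 to 1.0)."""
    total = sum(f.end - f.start for f in functions)
    if total == 0:
        return 0.0
    library = sum(f.end - f.start for f in functions if f.is_library)
    return library / total


def application_span(functions: list[FunctionRange]) -> Optional[tuple[int, int]]:
    """Lowest start and highest end among functions not identified as library code."""
    app = [f for f in functions if not f.is_library]
    if not app:
        return None
    return min(f.start for f in app), max(f.end for f in app)
```

A high coverage value combined with a narrow application span suggests that the application-specific code is concentrated in one region, as in the Figure 12-3 example.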
There are two points worth remembering when applying signatures.
First, signatures are useful even when working with a binary that has not
been stripped, in which case you are using signatures more to help IDA
identify library functions than to rename those functions. Second, statically
linked binaries may be composed of several separate libraries, requiring the
application of several sets of signatures in order to completely identify all
library functions. With each additional signature application, additional
portions of the Overview Navigator will be transformed to reflect the discovery
of library code. Figure 12-4 shows one such example. In this figure, you see
a binary that was statically linked with both the C standard library and the OpenSSL[5] cryptographic library.
Figure 12-4: Static binary with first of several signatures applied
[5] Please see http://openssl.org/.
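When a binary links several libraries, such as the C standard library plus OpenSSL example above, each signature file is applied in its own pass. Tracking how many new functions each pass identifies shows when the remaining unidentified code is unlikely to be library code. In the sketch below, functions are keyed by address and signatures are stand-in predicates. The rule that a later set skips functions an earlier set already identified is an assumption of the sketch, and the set names are hypothetical.

```python
from typing import Callable

Matcher = Callable[[bytes], bool]


def apply_in_sequence(functions: dict[int, bytes],
                      signature_sets: list[tuple[str, dict[str, Matcher]]]
                      ) -> list[tuple[str, int, int]]:
    """Apply signature sets one at a time.

    Returns (set name, newly identified, running total) for each pass.
    """
    identified: dict[int, str] = {}
    report = []
    for set_name, sigs in signature_sets:
        new = 0
        for addr, code in functions.items():
            if addr in identified:
                continue  # assumption: already recognized by an earlier set
            for lib_name, is_match in sigs.items():
                if is_match(code):
                    identified[addr] = lib_name
                    new += 1
                    break
        report.append((set_name, new, len(identified)))
    return report
```

A pass that adds few or no new matches suggests that either the wrong library version was used or the remaining functions belong to the application itself.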