------------------ a presentation by Lord Julus (C) 1999 -------------------
--------
Briefing
--------
There are several documents around describing the PE file layout, some
better documented and better presented than others, and most of them
correctly cover all aspects of this file type. However, and I do hope no one
will object, I felt like taking my share of this stuff by writing down my own
PE file description, viewed more from the pure ASM point of view (most of the
articles are built around the .h files from the WinSDK and lean toward the C
programming style). Therefore, carefully read all the notes regarding my
notations, and get your hands on an x86 assembler and linker to work with.
Most of the material in here can be used easily with the include files I set
up, which can also be found in this issue.
Also, to check things as you read and understand them, a file dump
utility is absolutely necessary. Here are some you could use:
TDUMP - Borland's File dumper (TASM 5.0)
DUMPPE - V Communications PE dumper (Sourcer 7.0)
DUMPBIN - Microsoft File dumper (Microsoft SDK)
INFO-PE - My own PE dumper
PESPILL - My second PE dumper written in win32 ASM
The first three might be more complete than mine, but with my file
dumper the main idea was to make things easy to understand, so I made it
dump the file in a specially ordered form, giving you all the information
possible.
Also, please note that at the end of this document you will find an
Annex which will guide you in exploring the PE file using assembly language.
------------
Bibliography
------------
To write this article I compiled information gathered from many
documents, and I wish to thank all of their authors for their research and
explanations:
Randy Kath
Matt Pietrek
Michael J. O'Leary
Luevelsmeyer
Also, I would like to send a special greeting to a friend who helped
me clear the fog around some of the structures and values in the PE file:
Jacky Qwerty/29A.
-----------
First notes
-----------
PE stands for "Portable Executable" and is the executable file format
introduced by Microsoft with the Win32 platform, meaning that a PE file can
run on Windows 95/98, Windows NT, as well as Win32s. The Portable Executable
layout is inspired by the COFF file layout used on some UNIX platforms.
However complicated the PE layout looks, it is actually not complicated at
all; it simply contains lots of useless data, as well as some redundant data.
Understanding the basic concepts will help you make sense of almost anything
in the PE layout. The PE format succeeds the NE ("New Executable") file format
used in Windows 3.xx.
The PE files we will talk about in this document are the PE
executable files (usually with the .EXE extension). However, you should know
that the dynamic link libraries, the .DLL files, are also a very important
part of the PE family and share the same layout. The OBJ files (output of the
compiler and input for the linker) use the closely related COFF format in the
Win32 environment: they start directly with the file header, without the
MS-DOS stub and PE signature, and carry extra information such as per-section
relocations and a symbol table. Anyway, we will focus on the PE executable
files, but some references will be made to the DLLs as well.
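Because EXE and DLL files share one layout, a tool has to look inside the
headers to tell them apart. The following is a minimal sketch in C (chosen
only so it can be compiled and checked anywhere), assuming the documented
offsets: the PE header offset stored at 3Ch in the MS-DOS header, the 4-byte
"PE\0\0" signature, and the Characteristics field 18 bytes into the 20-byte
file header, whose IMAGE_FILE_DLL bit (2000h) marks a DLL. The names
classify, file_kind, rd16 and rd32 are hypothetical, and reporting a file
that begins with the i386 machine value 014Ch as an OBJ is only a heuristic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical classification result. */
typedef enum { KIND_UNKNOWN, KIND_OBJ, KIND_EXE, KIND_DLL } file_kind;

#define MACHINE_I386  0x014Cu  /* COFF Machine value for x86 */
#define FILE_DLL_FLAG 0x2000u  /* IMAGE_FILE_DLL bit in Characteristics */

/* Little-endian reads, independent of the host byte order. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Tell an image (EXE or DLL) from a COFF object, given the raw file bytes. */
file_kind classify(const uint8_t *buf, size_t len) {
    assert(buf != NULL);
    if (len >= 0x40 && buf[0] == 'M' && buf[1] == 'Z') {
        uint32_t pe = rd32(buf + 0x3C);            /* PE header offset */
        if (pe > len || len - pe < 4 + 20)         /* signature + file header */
            return KIND_UNKNOWN;
        if (memcmp(buf + pe, "PE\0\0", 4) != 0)    /* bad signature */
            return KIND_UNKNOWN;
        uint16_t flags = rd16(buf + pe + 4 + 18);  /* Characteristics */
        return (flags & FILE_DLL_FLAG) ? KIND_DLL : KIND_EXE;
    }
    /* Heuristic: an OBJ begins directly with the file header (Machine first). */
    if (len >= 20 && rd16(buf) == MACHINE_I386)
        return KIND_OBJ;
    return KIND_UNKNOWN;
}
```

Note that the EXE/DLL decision rests on a single flag bit: everything else in
the two kinds of files is laid out the same way.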
To understand all this, let me tell you what happens when a PE file is
loaded by the system. First, a virtual address space is created, going from
00000000h up to 0FFFFFFFFh. The file is then mapped (rather than simply read)
into this address space, starting at a given address and spanning the size of
the image in memory, which, because of alignment, usually differs from the
size of the file on disk. The address where the PE file gets loaded is called
the imagebase of the file (or base of image).
Two very important notions you need to be aware of are the virtual
address and the relative virtual address. The formula we have is this:
imagebase + Relative Virtual Address = Virtual Address
or imagebase + RVA = VA
So, the relative virtual address tells you where a certain piece of
data lies in memory relative to the imagebase; only after adding the
imagebase to it do you get the real address. It was done this way because,
even if a DLL file declares a certain imagebase inside its PE header (as we
will see), address clashes (different DLLs trying to load at the same
imagebase) force some DLLs to be loaded at other, free imagebases. Since RVAs
do not depend on where the file is loaded, they simply need to be added to
the new imagebase and the resulting addresses are correct. You will see how
important it is to know exactly the difference between the RVA and the VA.
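The formula is plain addition, but it helps to see it applied to the
rebasing case. Here is a minimal sketch in C (in ASM it is a single ADD);
rva_to_va and va_to_rva are hypothetical helper names, and a 32-bit Win32
address space is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* VA = imagebase + RVA, in a 32-bit (Win32) address space. */
uint32_t rva_to_va(uint32_t image_base, uint32_t rva) {
    assert(rva <= UINT32_MAX - image_base);  /* result must fit in 32 bits */
    return image_base + rva;
}

/* The inverse: recover the RVA of a VA that lies inside the loaded image. */
uint32_t va_to_rva(uint32_t image_base, uint32_t va) {
    assert(va >= image_base);
    return va - image_base;
}
```

With the common EXE imagebase of 00400000h, RVA 1000h gives VA 00401000h.
For a DLL, RVA 2345h gives 10002345h when loaded at 10000000h, and 20002345h
if the DLL is forced to load at 20000000h instead; the RVA itself never
changes.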
Another important thing is that Win32 runs in protected mode with
paging enabled. This means that memory is managed in pages (4 KB each on
x86), and therefore a lot of the information must be aligned to page
boundaries, as we shall see.
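Aligning a value to a page boundary means rounding it up to the next
multiple of the page size. A minimal sketch in C, assuming 4 KB (1000h) pages
and a power-of-two alignment; align_up and X86_PAGE_SIZE are hypothetical
names.

```c
#include <assert.h>
#include <stdint.h>

#define X86_PAGE_SIZE 0x1000u  /* 4 KB, the x86 page size */

/* Round value up to the next multiple of alignment (a power of two).
   Assumes value + alignment - 1 does not overflow 32 bits. */
uint32_t align_up(uint32_t value, uint32_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}
```

For example, 1234h rounds up to 2000h, while 2000h, already aligned, stays
unchanged. The mask trick only works for powers of two, which is why the
function asserts it; in ASM it is the familiar ADD/AND pair.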
---------------
Basic structure
---------------
The basic structure of the PE file is built around headers and data.
There is a number of well-defined headers, one after another, and a set of
data that you can access using the information gathered from the headers.
Basically, the PE file looks like this:
+---------------------------+
|     Old MS-DOS header     |
+---------------------------+
|        MS-DOS stub        |
+---------------------------+
|         PE header         |
+---------------------------+
|    PE Optional Header     |
+---------------------------+
|     PE Data Directory     |
+---------------------------+
|     Section header #1     |
+---------------------------+
|     Section header #2     |
+---------------------------+
|     Section header #3     |
+---------------------------+
|            ...            |
+---------------------------+
|     Section header #n     |
+---------------------------+
|      Section body #1      |
+---------------------------+
|      Section body #2      |
+---------------------------+
|      Section body #3      |
+---------------------------+
|            ...            |
+---------------------------+
|      Section body #n      |
+---------------------------+
So, you see how simple the layout is: first the headers, then the
real data. Let us look at each part in depth.
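The layout in the diagram can be walked with just three values: the PE
header offset stored at 3Ch in the MS-DOS header, NumberOfSections (offset 2
in the 20-byte file header) and SizeOfOptionalHeader (offset 16), since the
data directory sits at the end of the optional header and the 40-byte section
headers follow it. Below is a minimal sketch in C under those assumptions;
pe_layout, locate_headers, rd16 and rd32 are hypothetical names, and the input
is assumed to be the raw file read into memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical struct holding the file offsets of each part of the layout. */
typedef struct {
    uint32_t pe_header;       /* "PE\0\0" signature, then the file header */
    uint32_t optional_header; /* optional header; data directory at its end */
    uint32_t section_table;   /* first of num_sections 40-byte section headers */
    uint16_t num_sections;
} pe_layout;

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 and fills *out on success, -1 if the buffer is not a PE image. */
int locate_headers(const uint8_t *buf, size_t len, pe_layout *out) {
    assert(buf != NULL && out != NULL);
    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z')
        return -1;                                  /* no MS-DOS header */
    uint32_t pe = rd32(buf + 0x3C);                 /* PE header offset */
    if (pe > len || len - pe < 4 + 20)
        return -1;                                  /* truncated file header */
    if (memcmp(buf + pe, "PE\0\0", 4) != 0)
        return -1;                                  /* bad signature */
    const uint8_t *fh = buf + pe + 4;               /* file header */
    uint16_t nsec = rd16(fh + 2);                   /* NumberOfSections */
    uint64_t opt = (uint64_t)pe + 4 + 20;           /* optional header */
    uint64_t sect = opt + rd16(fh + 16);            /* + SizeOfOptionalHeader */
    if (sect + 40u * (uint64_t)nsec > len)
        return -1;                                  /* section table truncated */
    out->pe_header = pe;
    out->optional_header = (uint32_t)opt;
    out->section_table = (uint32_t)sect;
    out->num_sections = nsec;
    return 0;
}
```

Reading the section table position from SizeOfOptionalHeader, rather than
assuming a fixed size, keeps the walk correct even if the optional header
grows.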
-----------------
Old MS-DOS header
-----------------
The Old MS-DOS header looks, as its name says, exactly like the old
MS-DOS header. The only new addition is the PE header offset, stored in its
last field; the header is then followed by the MS-DOS stub. So, the MS-DOS
header is:
IMAGE_DOS_HEADER STRUC ; DOS .EXE header