Saturday, April 18, 2015

PE File of Windows

A PE file, or Portable Executable file, is the format Windows uses for executables and DLLs; object files use the closely related COFF format. At a high level, a PE file is a sequence of different structures. Each structure holds specific information about the file, and also tells you where to look next for further information.

Some tutorials and articles that explain PE files are:


I am trying to develop a disassembler, and one model tool for a disassembler is DumpBin. Among its options there is one called /ALL, used like DumpBin c:\testPE.exe /ALL
This goes through the PE file, reads all of its structures and displays the relevant information for you. I decided to write a DLL that does something similar. It will also help my disassembler with things like listing exported functions and picking out the code sections. While developing this tool I learned a lot about PE file structures, and I will go through them one by one. Before that, I prepared a high-level image of the PE file, which I hope helps you understand it.


The first thing to do is read the file: create a file mapping and obtain a view of the mapped file. This gives you the memory location of the first structure of the PE file, IMAGE_DOS_HEADER. This is the MS-DOS header, and the MS-DOS stub program follows it. Two members of this structure are important. The first is e_magic, which should have the value IMAGE_DOS_SIGNATURE, that is "MZ". The second is e_lfanew. This is an offset: add it to the base address of the mapped image and you get the address where IMAGE_NT_HEADERS is located.

IMAGE_NT_HEADERS is a Signature followed by two headers, the file header and the optional header. Signature should have the value IMAGE_NT_SIGNATURE, that is "PE\0\0". The file header says which machine architecture the PE file runs on, whether it is an executable or a DLL, how many sections it contains, and other characteristics such as whether symbols have been stripped. The optional header says whether this is a 32-bit (PE32) or 64-bit (PE32+) image, which subsystem it needs (GUI or command line), DLL characteristics such as whether the image can be relocated at load time, the stack size, heap size and code size, and finally the Data Directory array.

There are 16 entries in the Data Directory array. Each one gives you two pieces of information: the relative virtual address [RVA] of the directory and the size of the directory. The directories include the export directory, import directory, resource directory, debug directory, etc. If you go to the export directory you find information about the functions exported by this PE file. Similarly, if you go to the import directory you find information about all the functions imported by this PE file.
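To make the header walk concrete, here is a minimal Python sketch of it. This is not the code from my DLL; the function name parse_pe_headers and the dictionary keys are made up for illustration, and it assumes the whole file is already available as a bytes-like object (for example read from disk or taken from a mapped view).

```python
import struct

IMAGE_DOS_SIGNATURE = 0x5A4D     # "MZ" read as a little-endian WORD
IMAGE_NT_SIGNATURE = 0x00004550  # "PE\0\0" read as a little-endian DWORD

def parse_pe_headers(data):
    """Hypothetical helper: pull a few header fields out of raw PE bytes."""
    (e_magic,) = struct.unpack_from("<H", data, 0)
    if e_magic != IMAGE_DOS_SIGNATURE:
        raise ValueError("MZ signature not found")
    # e_lfanew is the last member of IMAGE_DOS_HEADER, at offset 0x3C.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    (signature,) = struct.unpack_from("<I", data, e_lfanew)
    if signature != IMAGE_NT_SIGNATURE:
        raise ValueError("PE signature not found")
    # IMAGE_FILE_HEADER is 20 bytes, right after the 4-byte signature.
    (machine, n_sections, _time, _symtab, _nsyms,
     _opt_size, characteristics) = struct.unpack_from("<HHIIIHH", data, e_lfanew + 4)
    # The optional header starts right after the file header.
    opt = e_lfanew + 24
    (magic,) = struct.unpack_from("<H", data, opt)
    is_64bit = magic == 0x20B  # 0x10B means PE32, 0x20B means PE32+
    (subsystem,) = struct.unpack_from("<H", data, opt + 68)
    # The Data Directory array sits at offset 96 (PE32) or 112 (PE32+),
    # preceded by the NumberOfRvaAndSizes field.
    dirs_at = opt + (112 if is_64bit else 96)
    (n_dirs,) = struct.unpack_from("<I", data, dirs_at - 4)
    directories = [struct.unpack_from("<II", data, dirs_at + 8 * i)
                   for i in range(min(n_dirs, 16))]
    return {
        "machine": machine,
        "number_of_sections": n_sections,
        "characteristics": characteristics,
        "is_64bit": is_64bit,
        "subsystem": subsystem,
        "directories": directories,  # list of (RVA, size) pairs
    }
```

With this, directories[0] is the export directory entry and directories[1] is the import directory entry.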

After the NT headers you get the IMAGE_SECTION_HEADERs. The file header gives the number of sections, and that many IMAGE_SECTION_HEADER structures follow. The code section, the initialized data section and the uninitialized data section are a few of the sections. Each section header holds the name of the section, its virtual size, virtual address, size of raw data and pointer to raw data. The section's characteristics, such as whether it can be executed, read or written to, or even discarded when no longer needed, are also present in the section header.
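Reading the section table can be sketched roughly like this in Python. The function name read_section_headers and the dictionary keys are my own for this example; the field layout follows the 40-byte IMAGE_SECTION_HEADER structure, and data is assumed to hold the raw bytes of the file.

```python
import struct

IMAGE_SCN_MEM_DISCARDABLE = 0x02000000
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000

def read_section_headers(data):
    """Hypothetical helper: list the section headers found in raw PE bytes."""
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    # NumberOfSections and SizeOfOptionalHeader live in the file header.
    (n_sections,) = struct.unpack_from("<H", data, e_lfanew + 6)
    (opt_size,) = struct.unpack_from("<H", data, e_lfanew + 20)
    # The section table starts right after the optional header.
    table = e_lfanew + 24 + opt_size
    sections = []
    for i in range(n_sections):
        (name, virtual_size, virtual_address, size_of_raw_data,
         pointer_to_raw_data, _relocs, _lines, _n_relocs, _n_lines,
         flags) = struct.unpack_from("<8sIIIIIIHHI", data, table + 40 * i)
        sections.append({
            "name": name.rstrip(b"\0").decode("ascii", "replace"),
            "virtual_size": virtual_size,
            "virtual_address": virtual_address,
            "size_of_raw_data": size_of_raw_data,
            "pointer_to_raw_data": pointer_to_raw_data,
            "executable": bool(flags & IMAGE_SCN_MEM_EXECUTE),
            "readable": bool(flags & IMAGE_SCN_MEM_READ),
            "writable": bool(flags & IMAGE_SCN_MEM_WRITE),
            "discardable": bool(flags & IMAGE_SCN_MEM_DISCARDABLE),
        })
    return sections
```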

Suppose I want to read the contents of a section from the PE file. Then I do the following 3 steps:
1. Open the file.
2. The section header has a field, PointerToRawData. Move the file pointer to the location given by that field.
3. Read as many bytes as the "SizeOfRawData" field of the section header says.
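The three steps above map almost directly onto file calls. A small Python sketch, where read_section_data is a name I made up and the two numeric arguments are the values taken from the section header:

```python
def read_section_data(path, pointer_to_raw_data, size_of_raw_data):
    """Hypothetical helper: return the raw bytes of one section."""
    # Step 1: open the file in binary mode.
    with open(path, "rb") as f:
        # Step 2: move the file pointer to PointerToRawData.
        f.seek(pointer_to_raw_data)
        # Step 3: read SizeOfRawData bytes.
        return f.read(size_of_raw_data)
```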

Finding export function details -
The 0th element of the Data Directory gives the location and size of the export table. It gives an RVA, which must be converted to an offset from the base of the mapped file; for a file mapped as it is on disk, that means translating the RVA through the section headers (from VirtualAddress to PointerToRawData). The offset points to IMAGE_EXPORT_DIRECTORY, which holds the module name, the number of functions, and the fields AddressOfFunctions, AddressOfNames and AddressOfNameOrdinals. AddressOfFunctions is an RVA pointing to an array of RVAs of the functions. AddressOfNames is an RVA pointing to an array of RVAs, each of which points to a null-terminated name string. AddressOfNameOrdinals is an RVA pointing to an array of 16-bit words. Suppose I want to find a particular exported function abc. I go through the names reachable from AddressOfNames. Let us assume I find the function at index i. Using the same index, I pick the value in the AddressOfNameOrdinals array. This value is the index into AddressOfFunctions, where the RVA of the function is stored.
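The name lookup described above could look something like this in Python. This is only a sketch: rva_to_offset, read_cstring and find_export are names I made up, and sections is assumed to be a list of (virtual address, size, pointer to raw data) tuples taken from the section headers.

```python
import struct

def rva_to_offset(rva, sections):
    """Translate an RVA to a file offset using the section ranges."""
    for virtual_address, size, pointer_to_raw_data in sections:
        if virtual_address <= rva < virtual_address + size:
            return rva - virtual_address + pointer_to_raw_data
    raise ValueError("RVA 0x%X is not inside any section" % rva)

def read_cstring(data, offset):
    """Read a null-terminated ASCII string starting at offset."""
    end = data.index(b"\0", offset)
    return data[offset:end].decode("ascii")

def find_export(data, export_dir_rva, sections, wanted):
    """Return the RVA of the exported function `wanted`, or None."""
    d = rva_to_offset(export_dir_rva, sections)
    # NumberOfFunctions, NumberOfNames, AddressOfFunctions,
    # AddressOfNames, AddressOfNameOrdinals start at offset 20.
    n_funcs, n_names, funcs_rva, names_rva, ords_rva = struct.unpack_from(
        "<IIIII", data, d + 20)
    funcs = rva_to_offset(funcs_rva, sections)
    names = rva_to_offset(names_rva, sections)
    ords = rva_to_offset(ords_rva, sections)
    for i in range(n_names):
        (name_rva,) = struct.unpack_from("<I", data, names + 4 * i)
        if read_cstring(data, rva_to_offset(name_rva, sections)) == wanted:
            # Same index i into the ordinal array gives the function index.
            (index,) = struct.unpack_from("<H", data, ords + 2 * i)
            if index >= n_funcs:
                raise ValueError("ordinal index out of range")
            (func_rva,) = struct.unpack_from("<I", data, funcs + 4 * index)
            return func_rva
    return None
```

A real tool would also have to handle forwarded exports, which this sketch ignores.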









Finding import function details -
The 2nd element of the Data Directory [index is 1] tells you where to find the import table. Its RVA, once converted to an offset, points to an array of IMAGE_IMPORT_DESCRIPTOR structures, one for each DLL/PE file from which functions have been imported. The last element of the array is an empty IMAGE_IMPORT_DESCRIPTOR, so to find how many DLLs we import from, count until you reach the empty one. Each IMAGE_IMPORT_DESCRIPTOR holds the library name (Name), the forwarder chain (ForwarderChain), and two fields that lead to the function names and function addresses: OriginalFirstThunk and FirstThunk. Both are RVAs pointing to arrays of IMAGE_THUNK_DATA structures, and initially the two arrays have the same contents. The value of each IMAGE_THUNK_DATA tells us whether the function was imported by name or by ordinal. If IMAGE_ORDINAL_FLAG32 == (value & IMAGE_ORDINAL_FLAG32), the import is by ordinal, and value & 0xFFFF gives the ordinal number. Otherwise it is by function name: the AddressOfData field is an RVA pointing to an IMAGE_IMPORT_BY_NAME structure, which holds the name of the function.
Initially the arrays reached through the OriginalFirstThunk and FirstThunk fields of an IMAGE_IMPORT_DESCRIPTOR hold the same IMAGE_THUNK_DATA entries, which in turn lead to the function names. When the PE file is loaded, the loader overwrites the entries of the FirstThunk array with the addresses of the functions, while the OriginalFirstThunk array keeps pointing to the names.
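Here is a rough Python sketch of walking the import descriptors of a 32-bit PE file on disk and telling name imports from ordinal imports. The function names are made up for this example, sections is assumed to be a list of (virtual address, size, pointer to raw data) tuples from the section headers, and 64-bit files (8-byte thunks, IMAGE_ORDINAL_FLAG64) are not handled.

```python
import struct

IMAGE_ORDINAL_FLAG32 = 0x80000000

def rva_to_offset(rva, sections):
    """Translate an RVA to a file offset using the section ranges."""
    for virtual_address, size, pointer_to_raw_data in sections:
        if virtual_address <= rva < virtual_address + size:
            return rva - virtual_address + pointer_to_raw_data
    raise ValueError("RVA 0x%X is not inside any section" % rva)

def read_cstring(data, offset):
    """Read a null-terminated ASCII string starting at offset."""
    end = data.index(b"\0", offset)
    return data[offset:end].decode("ascii")

def list_imports32(data, import_dir_rva, sections):
    """Return {dll name: [("name", str) or ("ordinal", int), ...]}."""
    imports = {}
    desc = rva_to_offset(import_dir_rva, sections)
    while True:
        # OriginalFirstThunk, TimeDateStamp, ForwarderChain, Name, FirstThunk
        fields = struct.unpack_from("<IIIII", data, desc)
        if not any(fields):
            break  # the empty descriptor ends the array
        original_first_thunk, _stamp, _chain, name_rva, first_thunk = fields
        dll = read_cstring(data, rva_to_offset(name_rva, sections))
        # Prefer OriginalFirstThunk; some files leave it 0 and use FirstThunk.
        thunk = rva_to_offset(original_first_thunk or first_thunk, sections)
        entries = []
        while True:
            (value,) = struct.unpack_from("<I", data, thunk)
            if value == 0:
                break  # a zero thunk ends this DLL's list
            if value & IMAGE_ORDINAL_FLAG32:
                entries.append(("ordinal", value & 0xFFFF))
            else:
                # IMAGE_IMPORT_BY_NAME: 2-byte Hint, then the name string.
                by_name = rva_to_offset(value, sections)
                entries.append(("name", read_cstring(data, by_name + 2)))
            thunk += 4
        imports[dll] = entries
        desc += 20
    return imports
```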
IMPORT_TABLE link from First Thunk



SAME IMPORT TABLE link from Original First Thunk



IMPORT Table after PE file loaded (First Thunk) has been replaced.
  







My code is on GitHub - https://github.com/harsha-kadekar/Disassembler.git
In that repository, the solution for the DLL can be found in the folder ProcessDissector

Please note that the images have been taken from the following links -
http://www.codeproject.com/Articles/14360/Injective-Code-inside-Import-Table  - For import table
https://msdn.microsoft.com/en-IN/library/ms809762.aspx - For export table.