Wednesday, March 9, 2016

Behind Function call - Dissassembly of the function call in C Microsoft Visual Studio

Just for a curiosity sake, I wanted to know how the function call is actually implemented. I have worked in PLP , where we write assembly programs to a bare hardware. In that we try to simulate the function call by using special instructions like jal/jr. Also in MASM, when we are calling a function with parameters we try to push the parameters to the stack in an understood order and then call the function. Inside the function, we try to access those parameters from stack. 

Now C is a high level programming language and all this stack management is done by the compiler when it is generating object code. I wanted to see how it is done. So in this post I will explain what I learnt in it.

For a function call, there are 3 parts. 
  1. Before function call
  2. During function call
  3. After function return


Lets go one by one. My main function for this experiment is as given below.

int main(int argc, char **argv)
{
int nReturnValue = 0;

int x = 10;
int y = 10;

int value = Add(x, y);

x = 20;
y = 30;

value = AddEmpty();

x = 40;
y = 50;

AddEmptyVoid();

return nReturnValue;
}


I will be explaining 3 function calls - Add [has return value, has parameters], AddEmpty [has return value, but no parameters], AddEmptyVoid [No return value, No parameters].

a. Add Function [int Add(int x, int y)]

int Add(int x, int y)
{
int value = x + y;

return value;
}



So when I call a function in C, internally compiler will translate into the code as shown in the above image. So all the instruction before the Call Add (012A1000h) falls under  before function call, Call instruction and some others (which I will show afterwards) comes under during function call. Finally all the instructions after call comes under after function call category.

Before Function Call: In this function I have 2 parameters - x and y. Here I am pushing those values into stack before going to the function, so that function can use those variables from the stack. As you can see, it happens from right to left that is first y and x. Suppose I have 3 parameters like SomeFunc(x, y, z) then it would be push z, push y, push x. 

During Function Call: Call will store the return address into the stack. That is what ever value present in the EIP will be stored/pushed into stack before going to function. Here EIP will be pointing to add esp,8 instruction as shown in the above image. Before starting actual computation of function, it has to some extra work which is shown in the below image.

That's a lot of work. Lets go one by one. Ebp is base pointer. In every function, this will be holding your frame pointer. Frame pointer is nothing but your starting point of the function in the stack. Stack pointer will keep on varying as function proceeds but frame pointer is constant for that function. When I came inside Add function, Ebp is still pointing to old functions frame pointer, in this case it is main functions frame pointer. First work is backup old functions frame pointer(push ebp) and change eBp to point to this functions frame pointer. Esp is pointing to the top of the stack which is beginning of our function, so copy it to ebp (mov ebp, esp) which forms our functions frame pointer. Once functions frame pointer is established, now allocation of local variables. Allocate space in stack for the local variables. As stack grows in reverse direction, to allocate space in stack we are subtracting the stack pointer(sub esp, 0xCCh). If I have 0 local variables, by default it will subtract 0xC0h that is 190 bytes. Why 190 bytes I am not sure. In this function, I have one local variable value which is of type int. Here again, it is reserving 12 bytes for 1 integer. Once this is done, now backup the old functions registers relating to string or array operation registers that is ebx, esi and edi. This is done by pushing those registers to stack. Next 4 instructions help to initialize the local variables with some value. In this case 0xCCCCCCCCh is stored for each local int variable. For local variables, we have allocated 0xCCh bytes. Divide that number by 4, as we are considering dwords which are 4 bytes. This gives us 0x33h value. rep stos instruction uses registers ecx, eax and edi for its working. Basically it will loop ecx times, during each loop initializes the address as pointed by edi with value mentioned in eax register. Once initialized, it will increment edi by 4. So lea edi, [ebp-0CCh] is initializing edi to the beginning of the local variable section of stack. Then mov ecx, 33h instruction stores  ecx with the number of dwords so that those many times it can be looped. mov eax, 0CCCCCCCCh instruction loads the value used for initializing the local variables. Last instruction rep stos dword ptr es:[edi] will loop ecx times and initializes each dwords pointed by edi to value stored in eax. 
Finally all the necessary work is done, now it will start the function computation.

After Function call: This comes after return statement of the function. Similar to during function call, some work is done by the called function (add) and some done by callee function(main). Basically we need to store the return value if any and then do the cleanup of the stack. Whole idea is before returning to callee return the stack to same state how it was before function call. 

The return value is passed by using eax register. So value variable's value is copied to eax register (mov eax, dword ptr [value]). Next is restore the registers which were pushed at the beginning of the function call. Remember pop has to happen in reverse order that is those registers which were pushed last has to be popped first. So first edi, next is esi and finally ebx. Now go back to the beginning of the function call. Frame pointer will be  pointing to the beginning of the function call. Ebp holds the function pointer. So copy your frame pointer to stack pointer (mov esp, ebp). Remember just before creating the frame pointer of this function, we had pushed the old functions frame pointer to the stack. As we are going back to the old function, restore the frame pointer of the old function (push ebp).  Finally call the ret. This will initialize the EIP to the return address. Remember with Call return address was pushed to stack. ret will pop that return address and store it to EIP. Now control comes back to main function and it would point to add esp, 8 instruction. Remember, for calling function add we had pushed 2 integers to stack, so we need to pop those values from the stack. But we don't use those values, so we just reduce the top of the stack (add esp, 8). Return value of the function is present in eax register. That value will be copied to the local variable (mov dword ptr [value], eax). 

b. AddEmpty Function [int AddEmpty()]

int AddEmpty()
{
int x = 10;
int y = 20;

return x + y;
}

Now this is similar to previous Add function call, but with small changes. This function is not having any parameters. So before Call instruction, we need not have to push any values to the stack. Also after the Call we need not have to correct the stack top as we have not pushed any parameters to stack.

In AddEmpty function, we have 2 local variables. Hence local variable storage is 0xC0h + 0x18h = 0xD8h. So while initializing the local variables, we need to loop 0xD8h/4 = 0x36h times. Rest all are same as previous function call.


Return of the function is same as previous one.


c. AddEmptyVoid [void AddEmptyVoid()]

void AddEmptyVoid()
{

}

As you can see, this function neither has parameters not does it return any value. It will directly call the function using Call instruction. Also after the return of the function it is not copying the value stored in eax register as function is not returning anything.


As it has no local variables, stack allocation for local variables is 0xC0h. For local variable initialization we have to loop 0xC0h/4 = 0x30h times. Rest all are same as previous function calls. 



This explains only for x86. There is a difference when I am using x64. Currently I am running late :-p So will update the post some later day.

Please let me know if any errors present in the explanation or if i have done any wrong statements.


Following are the references which helped me understand the concepts.





No comments:

Post a Comment