Intro to RE: C : part-3

In the previous blog, I discussed some of the basic C program’s disassembly structures, concentrating on the variables and their memory layouts. This article, a follow-up to the previous one, focuses on basic operations and functions in C programs.

In the previous blogs, we have seen what an empty C program looks like

void main() {}

Disassembly:

main:
        push    rbp
        mov     rbp, rsp
        nop
        pop     rbp
        ret

Arithmatic operators

Now if we want to work with operations, we’ll have to add 2 local variables to the function. Something like in the below example.

void main() {
    int a=1, b=2;
}

Disassembly:

main:
        push    rbp
        mov     rbp, rsp
        mov     DWORD PTR [rbp-4], 1
        mov     DWORD PTR [rbp-8], 2
        nop
        pop     rbp
        ret

Addition

Now let’s perform add operation on the 2 local variables we created and save the result in a new variable.

void main() {
    int a=1, b=2;
    int c = a + b;
}

Disassembly:

main:
        push    rbp
        mov     rbp, rsp
        mov     DWORD PTR [rbp-4], 1
        mov     DWORD PTR [rbp-8], 2
        mov     edx, DWORD PTR [rbp-4]
        mov     eax, DWORD PTR [rbp-8]
        add     eax, edx
        mov     DWORD PTR [rbp-12], eax
        nop
        pop     rbp
        ret

We can see a few new instructions in the disassembly code that are responsible for the int c = a + b instruction in the source code.

When we look at them separately, it becomes quite natural to understand.

main:
        push    rbp
        mov     rbp, rsp
        mov     DWORD PTR [rbp-4], 1
        mov     DWORD PTR [rbp-8], 2

        mov     edx, DWORD PTR [rbp-4]      ; Save the first variable value in EDX register
        mov     eax, DWORD PTR [rbp-8]      ; Save the second variable value in EAX register
        add     eax, edx                    ; add EAX and EDX register values, this stores the result in EAX here
        mov     DWORD PTR [rbp-12], eax     ; Move the new value of EAX to third variable

        nop
        pop     rbp
        ret

Let’s look at other arithmetic operations as well

Subtraction

void main() {
    int a=1, b=2;
    int c = a - b;
}

Disassembly:

main:
        push    rbp
        mov     rbp, rsp
        mov     DWORD PTR [rbp-4], 1
        mov     DWORD PTR [rbp-8], 2

        mov     eax, DWORD PTR [rbp-4]      ; load first variable in EAX
        sub     eax, DWORD PTR [rbp-8]      ; subtract EAX with second variable, then save the result in EAX
        mov     DWORD PTR [rbp-12], eax     ; save the new value of EAX in the third register

        nop
        pop     rbp
        ret

Multiplication

void main() {
    int a=1, b=2;
    int c = a * b;
}

Disassembly:

main:
        push    rbp
        mov     rbp, rsp
        mov     DWORD PTR [rbp-4], 1
        mov     DWORD PTR [rbp-8], 2
        mov     eax, DWORD PTR [rbp-4]      ; load first variable
        imul    eax, DWORD PTR [rbp-8]      ; multiply it with second
        mov     DWORD PTR [rbp-12], eax     ; save it in the third
        nop
        pop     rbp
        ret

Division and modulo

If you are not aware, the division operation is about calculating the quotient and the modulo operation is about the remainder.

void main() {
    int a = 1, b = 2;
    int c = a / b;
    int d = a % b;
}

Disassembly:

main:
        ; prologue
        push    rbp
        mov     rbp, rsp

        ; first and second variable
        mov     DWORD PTR [rbp-4], 1
        mov     DWORD PTR [rbp-8], 2

        ; division
        mov     eax, DWORD PTR [rbp-4]      ; Load first variable in EAX
        cdq                                 ; Convert double to quad value;
        idiv    DWORD PTR [rbp-8]           ; perform idiv operation with second variable
        mov     DWORD PTR [rbp-12], eax     ; Store new EAX value in third variable

        ; modulo
        mov     eax, DWORD PTR [rbp-4]      ; Load the first value again
        cdq                                 ; Convert double to quad value;
        idiv    DWORD PTR [rbp-8]           ; perform idiv operation with second variable
        mov     DWORD PTR [rbp-16], edx     ; Store the EDX value in fourth variable

        ; epilogue
        nop
        pop     rbp
        ret

If you didn’t notice, the division result was stored in the EAX register, while the modulo result was stored in the EDX…Everything else stays unchanged.

Its OKAY if you are having questions like-

How?
why EDX??
WTF is going on???
Does it perform both operations even if either one of them is required????

These instructions, however, are not as simple to understand as others. So allow me to attempt to explain what’s going on.

To begin, you must comprehend the cdq instruction’s magic. This converts a Doubleword to a Quadword by extending the sign bit of EAX into the EDX register. For the purposes of this blog, consider the EAX and EDX to be joined together to form a large quadword register. So, if EDX contains 0x12 and EAX contains 0x3456789a, the resulting value is 0x123456789a. Does that make sense?

So when a idiv (or other div derivatives) operation is performed, both the quotient and the remainder are calculated. The instruction stores the quotient in EAX and the remainder in EDX register.

Now that you understand the concept, you can think about removing some of the repeated instructions to make your program smaller and run faster.

main:
        ; prologue
        push    rbp
        mov     rbp, rsp

        ; first and second variable
        mov     DWORD PTR [rbp-4], 1
        mov     DWORD PTR [rbp-8], 2

        ; division and modulo
        mov     eax, DWORD PTR [rbp-4]      ; Load first variable in EAX
        cdq                                 ; Convert double to quad value;
        idiv    DWORD PTR [rbp-8]           ; perform idiv operation with second variable
        mov     DWORD PTR [rbp-12], eax     ; Store new EAX value in third variable (quotient)
        mov     DWORD PTR [rbp-16], edx     ; Store the EDX value in fourth variable (remainder)

        ; epilogue
        nop
        pop     rbp
        ret

Another point worth mentioning is that div operations cannot be used without overwriting the contents of the EAX and EDX registers. If you want to use the values of these registers after the div operation, save them somewhere else where they can be read later.

Increment/Decrement operators

That’s all there is to arithmetic operators. Let’s move on to the increment and decrement operators…

void main() {
    int A = 5;
    int B = A++;
    int C = ++A;
}

Disassembly:-

main:
        ; prologue
        push    rbp
        mov     rbp, rsp

        ; int A = 5;
        mov     DWORD PTR [rbp-4], 5

        ; int B = A++;
        mov     eax, DWORD PTR [rbp-4]      ; load the value from variable A in EAX
        lea     edx, [rax+1]                ; increment the value and store it in EDX
        mov     DWORD PTR [rbp-4], edx      ; update the incremented value in the variable A
        mov     DWORD PTR [rbp-8], eax      ; Load the old EAX value in variable B;

        ; int C = ++A;
        add     DWORD PTR [rbp-4], 1        ; Increment the value of variable A
        mov     eax, DWORD PTR [rbp-4]      ; Load the updated value of variable A in EAX
        mov     DWORD PTR [rbp-12], eax     ; Store the EAX value in variable C

        ; epilogue
        nop
        pop     rbp
        ret

At this level, I believe you can see that this operator is nothing special. I’ll leave the decrement operator upto you to test and disect. You can always use godbolt.org ¹ for quick testing.

Bitwise operators

We can now proceed to examine the bitwise operators from a low-level perspective. (PS: They are my personal favourites)

void main() {
    int A = 5, B = 0;
    int C1 = A & B;
    int C2 = A | B;
    int C3 = A ^ B;
    int C4 = ~ B;
}

Disassembly:-

main:
        ; prologue
        push    rbp
        mov     rbp, rsp

        ; int A = 5, B = 0;
        mov     DWORD PTR [rbp-4], 5
        mov     DWORD PTR [rbp-8], 0

        ; int C1 = A & B;
        mov     eax, DWORD PTR [rbp-4]
        and     eax, DWORD PTR [rbp-8]
        mov     DWORD PTR [rbp-12], eax

        ; int C1 = A | B;
        mov     eax, DWORD PTR [rbp-4]
        or      eax, DWORD PTR [rbp-8]
        mov     DWORD PTR [rbp-16], eax

        ; int C1 = A ^ B;
        mov     eax, DWORD PTR [rbp-4]
        xor     eax, DWORD PTR [rbp-8]
        mov     DWORD PTR [rbp-20], eax

        ; int C1 = ~ B;
        mov     eax, DWORD PTR [rbp-8]
        not     eax
        mov     DWORD PTR [rbp-24], eax

        ; epilogue
        nop
        pop     rbp
        ret

Simple and neat. Aren’t they? Load the variables in registers, perform the operation, store the result.

shift right/left operators

Then there are shift operators - shift left and shift right.

void main() {
    int A = 1;
    int B = A << 4;
    int C = B >> 4;
}

Disassembly:-

main:
        ; prologue
        push    rbp
        mov     rbp, rsp
        ; int A = 1;
        mov     DWORD PTR [rbp-4], 1

        ; int B = A << 4;
        mov     eax, DWORD PTR [rbp-4]
        sal     eax, 4                  ; Shift arithmetic left
        mov     DWORD PTR [rbp-8], eax

        ; int C = B >> 4;
        mov     eax, DWORD PTR [rbp-8]
        sar     eax, 4                  ; Shift arithmetic right
        mov     DWORD PTR [rbp-12], eax

        ; epilogue
        nop
        pop     rbp
        ret

If you look at their binary representation, shift operators are very straightforward. Allow me to create an image for you.

Initial value in memory

After shifting left 4 times.

Blocks with “.” are the freshly shifted block from outside the memory frame. These blocks are packed with zeroes. This makes our resulting value 2^4 = 16.

If we shift it right 4 times we’ll get our initial value.

Consider this: if we conducted a shift right operation with this, the entire frame would be filled with 0s, and the resultant value would be zero. No matter how many shifts we make.

Another intriguing thing is that you can multiply a number by 2 using the shift left procedure…without actually using the * operation.

void main() {
    int a = 589;
    int X = a*2;
    int Y = a << 1;
}

main:
        push    rbp
        mov     rbp, rsp
        mov     DWORD PTR [rbp-4], 589

        ; int X = a*2;
        mov     eax, DWORD PTR [rbp-4]
        add     eax, eax
        mov     DWORD PTR [rbp-8], eax

        ; int Y = a << 1;
        mov     eax, DWORD PTR [rbp-4]
        add     eax, eax
        mov     DWORD PTR [rbp-12], eax

        nop
        pop     rbp
        ret

At a lower level, they are identical. Nothing particularly useful, but it’s good to know what’s happening behind the scenes.

Branching

Now comes the branching. Every good program employs branching for one reason or another. This is very useful to understand when considering reverse engineering.

If-else

void main() {
    int a = 1;
    int x;

    if(a==2)
        x = 10;
    else
        x = 5;
}

Disassembly:

main:
        push    rbp
        mov     rbp, rsp
        mov     DWORD PTR [rbp-4], 1
        cmp     DWORD PTR [rbp-4], 2
        jne     .L2
        mov     DWORD PTR [rbp-8], 10
        jmp     .L4
.L2:
        mov     DWORD PTR [rbp-8], 5
.L4:
        nop
        pop     rbp
        ret

Let’s understand this step by step

Line	Description
Line 1	label for the function starting
Line 2:3	Prologue; Setting up the function frame
Line 4	int a = 1;
Line 5	Comparing this value with a hardcoded value `2`
Line 6	If the result of the comparision is not equal, then jump to `L2` flag
Line 7	x = 10; This will run if it didn’t jump to `L2`
Line 8	Now jump to `L4`
Line 9	Flag for `L2`
Line 10	x = 5;
Line 11:14	epilogue for the function

Here is a graph to make it more simpler

Switch-case

Branching can also be implemented with switch-case directives in C and some other languages. At lower level, they function similarly to if-else.


void main() {
    int a = 1;
    int x;

    switch(a){
        case 1: {
            x = 10;
            break;
        }

        case 2: {
            x = 20;
            break;
        }
    }
}

Disassembly:-

main:
        push    rbp
        mov     rbp, rsp
        mov     DWORD PTR [rbp-4], 1

        ; Compare and jump if equal
        cmp     DWORD PTR [rbp-4], 1
        je      .L2

        ; Compare and jump if equal
        cmp     DWORD PTR [rbp-4], 2
        je      .L3

        ; Default jump to the end
        jmp     .L4

.L2:
        mov     DWORD PTR [rbp-8], 10
        jmp     .L4

.L3:
        mov     DWORD PTR [rbp-8], 20
        nop

.L4:
        nop
        pop     rbp
        ret

See…just like if-else statements, switch-case statements also use cmp and jmp instructions to create branches in the flow.

Graph diagram for the above disassembly will look something like this

With all that out of the way, let us take a brief look at how function calling works at the low level.

Functions

I always wished to demonstrate people an infinite loop with recursion. So here you have it.

void main()
{
    main();
}

Disassembly:-

main:
        push    rbp
        mov     rbp, rsp
        mov     eax, 0
        call    main
        nop
        pop     rbp
        ret

Each time the call main instruction is encountered, the main() function is called, and a new function frame is created. Due to the lack of an exit condition, the processor will never be able to read anything beyond the call main instruction, and thus the function will never return. Hence, the infinite loop.

Now take a look at how things change when we add arguments to a function.

void main()
{
    main(1,2,3,4,5,6,7,8,9,10);
}

Disassembly:-

main:
        push    rbp
        mov     rbp, rsp
        push    10
        push    9
        push    8
        push    7
        mov     r9d, 6
        mov     r8d, 5
        mov     ecx, 4
        mov     edx, 3
        mov     esi, 2
        mov     edi, 1
        mov     eax, 0
        call    main
        add     rsp, 32
        nop
        leave
        ret

If you examine the pattern, you will notice that the arguments are loaded in a particular sequence - from right to left. The first six arguments remain in the registers edi, esi, edx, ecx, r8d & r9d (from left to right). The rest of the arguments are stored on the stack.

This pattern is followed by any function that you wish to invoke from your code.

void main()
{
    printf(1,2,3);
}

Disassembly:-

main:
        push    rbp
        mov     rbp, rsp
        mov     edx, 3
        mov     esi, 2
        mov     edi, 1
        mov     eax, 0
        call    printf
        nop
        pop     rbp
        ret

This is obviously not the correct method to invoke a printf function. Printf’s first argument should be a string (possibly a format string).

void main()
{
    printf("Hello");
}

Disassembly:-

.LC0:
        .string "Hello"
main:
        push    rbp
        mov     rbp, rsp
        mov     edi, OFFSET FLAT:.LC0
        mov     eax, 0
        call    printf
        nop
        pop     rbp
        ret

The Hello string is kept at LC0 offset here. So we load the offset’s pointer to value and put it in edi. Then execute the printf() function, which will take as its first argument the value stored in the edi register.

If you are wondering what’s the point of mov eax, 0 just before printf call… read this StackOverflow thread ²

If we add 1 more argument to printf call, that should be stored in esi register. And the right-most argument will be processed first.

void main()
{
    printf("%d", 10);
}

Disassembly:-

.LC0:
        .string "%d"
main:
        push    rbp
        mov     rbp, rsp
        mov     esi, 10
        mov     edi, OFFSET FLAT:.LC0
        mov     eax, 0
        call    printf
        nop
        pop     rbp
        ret

Printf, like all other C functions, has a return value that is an int type. Printf gives the number of characters in the format string that the function has processed.

void main()
{
    int x = printf("%d\n", 10);
    printf("%d\n", x);
}

Disassembly:-

.LC0:
        .string "%d"
main:
        ; Prologue
        push    rbp
        mov     rbp, rsp

        ; Getting memory block for variables
        sub     rsp, 16

        ; first printf
        mov     esi, 10
        mov     edi, OFFSET FLAT:.LC0
        mov     eax, 0
        call    printf

        ; Storing return value (eax) of printf in the local variable
        mov     DWORD PTR [rbp-4], eax

        ; second printf
        mov     eax, DWORD PTR [rbp-4]
        mov     esi, eax
        mov     edi, OFFSET FLAT:.LC0
        mov     eax, 0
        call    printf

        ; epologue
        nop
        leave
        ret

Remember how we talked about that the return values from functions are stored in eax register. Here also, the return value from printf is stored in eax which is then stored in some other local variable.

Since both of my strings for printf were exactly same, the compiler reused it to call printf second time, instead of creating 2 strings with same content.

Function pointers

Let us spice things up a little more now… and look at function pointers.

void main()
{
    printf("%p\n", main);
    printf("%p\n", *main);
    printf("%p\n", &main);
}

Output of this program is not what you might expect if you are not a seasoned C programmer or have never worked with function pointers before.

Output:-

0x55770bbfa139
0x55770bbfa139
0x55770bbfa139

Each of them gave the same output. This is not the case when working with integer pointers. Let’s see how this looks at lower level

Disassembly:-

.LC0:
        .string "%p\n"
main:
        push    rbp
        mov     rbp, rsp

        ; printf("%p\n", main);
        mov     esi, OFFSET FLAT:main
        mov     edi, OFFSET FLAT:.LC0
        mov     eax, 0
        call    printf

        ; printf("%p\n", *main);
        mov     esi, OFFSET FLAT:main
        mov     edi, OFFSET FLAT:.LC0
        mov     eax, 0
        call    printf

        ; printf("%p\n", &main);
        mov     esi, OFFSET FLAT:main
        mov     edi, OFFSET FLAT:.LC0
        mov     eax, 0
        call    printf

        nop
        pop     rbp
        ret

They are all precisely the same!! The assembly code for all three lines remains unchanged.

To test whether it behaves the same way with other functions as well, let’s add another function to the code.

#include <stdio.h>

int func() {}


void main()
{
    printf("%p\n", func);
    printf("%p\n", *func);
    printf("%p\n", &func);
}

Disassembly:-

func:
        push    rbp
        mov     rbp, rsp
        nop
        pop     rbp
        ret
.LC0:
        .string "%p\n"
main:
        push    rbp
        mov     rbp, rsp

        ; printf("%p\n", func);
        mov     esi, OFFSET FLAT:func
        mov     edi, OFFSET FLAT:.LC0
        mov     eax, 0
        call    printf

        ; printf("%p\n", *func);
        mov     esi, OFFSET FLAT:func
        mov     edi, OFFSET FLAT:.LC0
        mov     eax, 0
        call    printf

        ;printf("%p\n", &func);
        mov     esi, OFFSET FLAT:func
        mov     edi, OFFSET FLAT:.LC0
        mov     eax, 0
        call    printf

        nop
        pop     rbp
        ret

Still the same!!

If you have never worked with function pointers before, this is just the beginning of things, we can even call the function func using above pointer notations.

#include <stdio.h>

void func() {}


void main()
{
    func();
    (*func)();
    (&func)();
}

Disassembly:-

func:
        push    rbp
        mov     rbp, rsp
        nop
        pop     rbp
        ret
main:
        push    rbp
        mov     rbp, rsp

        ; func();
        mov     eax, 0
        call    func

        ; (*func)();
        mov     eax, 0
        call    func

        ; (&func)();
        mov     eax, 0
        call    func

        nop
        pop     rbp
        ret

We can even add arguments to our func function call, just like we do with a normal function… and the first argument will be stored in edi the second on in esi and so on.

int func(int x) {
    printf("%d\n", x);
}


void main()
{
    func(5);
    (func)(6);
    (*func)(7);
    (&func)(8);
}

Disassembly:-

.LC0:
        .string "%d\n"
func:
        push    rbp
        mov     rbp, rsp
        sub     rsp, 16
        mov     DWORD PTR [rbp-4], edi
        mov     eax, DWORD PTR [rbp-4]
        mov     esi, eax
        mov     edi, OFFSET FLAT:.LC0
        mov     eax, 0
        call    printf
        nop
        leave
        ret
main:
        push    rbp
        mov     rbp, rsp

        ; func(5);
        mov     edi, 5
        call    func

        ; func(6);
        mov     edi, 6
        call    func

        ; func(7);
        mov     edi, 7
        call    func

        ; func(8);
        mov     edi, 8
        call    func

        nop
        pop     rbp
        ret

That’s it for today. In the next article, I’ll try to use all our knowledge we have gathered till now to reverse engineer a very simple calculator program.

Arithmatic operators#

Addition#

Subtraction#

Multiplication#

Division and modulo#

Increment/Decrement operators#

Bitwise operators#

shift right/left operators#

Branching#

If-else#

Switch-case#

Functions#

Function pointers#

Arithmatic operators

Addition

Subtraction

Multiplication

Division and modulo

Increment/Decrement operators

Bitwise operators

shift right/left operators

Branching

If-else

Switch-case

Functions

Function pointers