… prologue
At this point I hope you have a general idea of how a C program goes through multiple stages/passes and finally an ELF file is generated. Below is a diagram to jog your memory on this
┌──────────────────┐
│ │
│ hello.c │ // C source
│ │
└────────┬─────────┘
│
│
│ /* Compile */
│
│
│
▼
┌──────────────────┐
│ │
│ hello.s │ // assembler source
│ │
└────────┬─────────┘
│
│
│ /* assemble */
│
│
▼
┌──────────────────┐
│ │
│ hello.o │ // Assembled program (ELF - relocatable)
│ │
└────────┬─────────┘
│
│
│ /* link */
│
│
▼
┌──────────────────┐
│ │
│ hello │ // Executable binary (ELF - executable)
│ │
└──────────────────┘
Creating a simple hello program is very straight-forward, let me show you how this flow works when we are building something that has more than 1 source file. This is generally what most of the “real-world” projects do, they create multiple files with different functionalities and then merge them together to complete the program with the desired features only.
┌────────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ │ │ │ │ │
│ libarithmatic.c │ │ libarithmatic.h ├───────► │ main.c │
│ │ │ │ │ │
└─────────┬──────────┘ └─────────────────┘ └────────┬────────┘
│ │
│ │
│ /* Compile + assemble */ │ /* Compile + assemble */
│ │
│ │
▼ ▼
┌─────────────────────┐ ┌────────────────────┐
│ │ │ │
│ libarithmatic.o │ │ main.o │
│ │ │ │
└─────────┬───────────┘ └──────────┬─────────┘
│ │
│ │
│ │
│ │
│ /* Linking Magic */ │
└───────────────────────────────────┬──────────────────────────────────────┘
│
│
│
│
│
│
▼
┌────────────────┐
│ │
│ calc │
│ │
└────────────────┘
/*
File: libarithmatic.c
*/
float addFunc (float a, float b) {
return a + b;
}
float subFunc (float a, float b) {
return a - b;
}
float mulFunc (float a, float b) {
return a * b;
}
float divFunc (float a, float b) {
if (b == 0) {
return 0.0;
}
return a / b;
}
/*
File: libarithmatic.h
*/
#ifndef ARITHMATIC_H
#define ARITHMATIC_H
float addFunc (float, float);
float subFunc (float, float);
float mulFunc (float, float);
float divFunc (float, float);
float magicFunc (float a, float b);
#endif
/*
File: main.c
*/
#include <stdio.h>
#include "libarithmatic.h"
int main() {
float num1, num2, result;
char operator;
printf("Enter equation (9 * 6): ");
scanf("%f %c %f", &num1, &operator, &num2);
switch (operator) {
case '+':
result = addFunc(num1, num2);
break;
case '-':
result = subFunc(num1, num2);
break;
case '*':
result = mulFunc(num1, num2);
break;
case '/':
result = divFunc(num1, num2);
break;
default:
printf("Invalid operator\n");
return 1;
}
printf("Result: %.2f\n", result);
return 0;
}
Luckily gcc provides some features, that helps us to make this process easier.
❯ gcc --help
Usage: gcc [options] file...
Options:
<... OMITTED ...>
-E Preprocess only; do not compile, assemble or link.
-S Compile only; do not assemble or link.
-c Compile and assemble, but do not link.
So if you follow these commands, you’ll be fine
# Compile + assemble -> generates main.o
gcc -c main.c
# Compile + assemble -> generates libarithmatic.o
gcc -c libarithmatic.c
# Linking -> generates calc
gcc main.o libarithmatic.o -o calc
This is our first time so far writing multiple files for a program. So let’s take a moment to understand how this works.
First, we create a libarithmatic.c
file with all of the required arithmatic functions - addFunc
, subFunc
, mulFunc
, and divFunc
. Since this file contains these functions (function definitions), the intermediate object file for this file will have related information as well.
Then comes the main.c
file, where we have declared the main
function. Inside the main function, we have used arithmatic functions which are not defined in this file. This will give an error at compilation time when those functions will not be found, so as a promise we give a declaration that these functions are present somewhere and they will be found in later steps by linker. Here those definitions are present in libarithmatic.h
file – header file for libarithmatic.c
.
So when we are compiling libarithmatic.c
, it’ll create a libarithmatic.o
file which will have 4 arithmatic functions as defined. On the other hand, main.c
will generate a main.o
file that will have a main
function which will be trying to call the arithmatic functions - addFunc
, subFunc
, mulFunc
, and divFunc
.
Question - How did main.o
call these functions when the address of these functions is not known to the compiler??
Answer - Compiler takes main.c
and libarithmatic.h
(a promise that these will be present when linking), and then generates the main.o
with all of the call
instructions… but because of the fact that it does not know the address of the functions to be called these addresses are left blank. These blanks will be filled by linker during relocation
process.
Here is a proof that all of them are empty before linking and have all of the addresses fixed up after linking
## Before linking - main.o
❯ objdump -M intel -D -j .text main.o | grep call
26: e8 00 00 00 00 call 2b <main+0x2b>
49: e8 00 00 00 00 call 4e <main+0x4e>
86: e8 00 00 00 00 call 8b <main+0x8b>
a3: e8 00 00 00 00 call a8 <main+0xa8>
c0: e8 00 00 00 00 call c5 <main+0xc5>
dd: e8 00 00 00 00 call e2 <main+0xe2>
f5: e8 00 00 00 00 call fa <main+0xfa>
123: e8 00 00 00 00 call 128 <main+0x128>
13c: e8 00 00 00 00 call 141 <main+0x141>
## After linking - calc
❯ objdump -M intel -D -j .text calc | grep call
1138: e8 63 ff ff ff call 10a0 <_start+0x30>
118f: e8 bc fe ff ff call 1050 <printf@plt>
11b2: e8 a9 fe ff ff call 1060 <__isoc99_scanf@plt>
11ef: e8 b8 00 00 00 call 12ac <addFunc>
120c: e8 b5 00 00 00 call 12c6 <subFunc>
1229: e8 b2 00 00 00 call 12e0 <mulFunc>
1246: e8 af 00 00 00 call 12fa <divFunc>
125e: e8 cd fd ff ff call 1030 <puts@plt>
128c: e8 bf fd ff ff call 1050 <printf@plt>
12a5: e8 96 fd ff ff call 1040 <__stack_chk_fail@plt>
Symbols and symbol tables
Now the question is that how does linker know which blanks to fill and how to fill them?? …here comes the role of symbols and symbol tables.
When writing a program, we often use “names” to reference “objects” in our code, like function “names” and variable “names”. These “names” are commonly referred to as symbols
. (yeah, deal with it now!)
Keep in mind that not all “names” are symbols. For example, a local variables to a function won’t be treated as symbols. If you think it through, you don’t need linker to handle that data so what’s the point of adding that info as a symbol, right?
Another worth noting thing is that unlike string tables, symbol tables have a well-defined structure, and both Glibc and the Linux kernel define a struct for this (Elf64_Sym
for 64-bit files).
/*
Glibc
https://sourceware.org/git/?p=glibc.git;a=blob;f=elf/elf.h;hb=2bd00179885928fd95fcabfafc50e7b5c6e660d2#l530
*/
typedef struct
{
Elf64_Word st_name; /* Symbol name (string tbl index) */
unsigned char st_info; /* Symbol type and binding */
unsigned char st_other; /* Symbol visibility */
Elf64_Section st_shndx; /* Section index */
Elf64_Addr st_value; /* Symbol value */
Elf64_Xword st_size; /* Symbol size */
} Elf64_Sym;
/*
Linux kernel v6.5.8
https://elixir.bootlin.com/linux/v6.5.8/source/include/uapi/linux/elf.h#L197
*/
typedef struct elf64_sym {
Elf64_Word st_name; /* Symbol name, index in string tbl */
unsigned char st_info; /* Type and binding attributes */
unsigned char st_other; /* No defined meaning, 0 */
Elf64_Half st_shndx; /* Associated section index */
Elf64_Addr st_value; /* Value of the symbol */
Elf64_Xword st_size; /* Associated symbol size */
} Elf64_Sym;
Let’s see what each member of this struct resembles
st_name
Similar to other name fields in the ELF specification, this member stores the index or offset in the associated string table.
st_info
This member represents a combined value derived from two different but related attributes: bind
and type
.
Both, Linux Kernel and glibc provide definitions and macros to work with this member.
1. Bind
The “bind” bits provide information about where this symbol can be seen and used… There are 3 kinds of binding defined by linux kernel
/*
https://elixir.bootlin.com/linux/v6.5.8/source/include/uapi/linux/elf.h#L123
*/
#define STB_LOCAL 0 /* not visible outside the object file */
#define STB_GLOBAL 1 /* visible to all object files */
#define STB_WEAK 2 /* like globals, but with lower precedence */
But glibc adds few more to this list
/*
https://sourceware.org/git/?p=glibc.git;a=blob;f=elf/elf.h;hb=2bd00179885928fd95fcabfafc50e7b5c6e660d2#l582
*/
#define STB_LOCAL 0 /* Local symbol */
#define STB_GLOBAL 1 /* Global symbol */
#define STB_WEAK 2 /* Weak symbol */
#define STB_NUM 3 /* Number of defined types. */
#define STB_LOOS 10 /* Start of OS-specific */
#define STB_GNU_UNIQUE 10 /* Unique symbol. */
#define STB_HIOS 12 /* End of OS-specific */
#define STB_LOPROC 13 /* Start of processor-specific */
#define STB_HIPROC 15 /* End of processor-specific */
Kernel and glibc both provide a macro to extract the bind
value from the provided st_info
member - #define ELF_ST_BIND(x) ((x) >> 4)
2. Type
type
bits tells about the type of symbol - function, file, variable, etc. One could say – A general classification for the symbol.
Linux kernel defines total 7 types
/*
https://elixir.bootlin.com/linux/v6.5.8/source/include/uapi/linux/elf.h#L128
*/
#define STT_NOTYPE 0 /* Unspecified */
#define STT_OBJECT 1 /* data objects like variables, arrays, etc*/
#define STT_FUNC 2 /* functions or other executable codes*/
#define STT_SECTION 3 /* associated with a section;
mainly used for relocations
(we'll see relocations in later articles)*/
#define STT_FILE 4 /* name of the source file*/
#define STT_COMMON 5 /* just like STT_OBJECT, but for tentative values */
#define STT_TLS 6 /* stores thread local data which is unique to each thread */
And again our beloved glibc expanded these definitions
/*
https://sourceware.org/git/?p=glibc.git;a=blob;f=elf/elf.h;hb=2bd00179885928fd95fcabfafc50e7b5c6e660d2#l595
*/
#define STT_NOTYPE 0 /* Symbol type is unspecified */
#define STT_OBJECT 1 /* Symbol is a data object */
#define STT_FUNC 2 /* Symbol is a code object */
#define STT_SECTION 3 /* Symbol associated with a section */
#define STT_FILE 4 /* Symbol's name is file name */
#define STT_COMMON 5 /* Symbol is a common data object */
#define STT_TLS 6 /* Symbol is thread-local data object*/
#define STT_NUM 7 /* Number of defined types. */
#define STT_LOOS 10 /* Start of OS-specific */
#define STT_GNU_IFUNC 10 /* Symbol is indirect code object */
#define STT_HIOS 12 /* End of OS-specific */
#define STT_LOPROC 13 /* Start of processor-specific */
#define STT_HIPROC 15 /* End of processor-specific */
Kernel and glibc both provide a macro to extract the type
value from the provided st_info
member - #define ELF_ST_TYPE(x) ((x) & 0xf)
st_other
If you examine the Elf64_Sym
struct in both the kernel and Glibc source code, you’ll notice that the kernel doesn’t currently have any use case for this field and marks it as such. However, Glibc uses this field to track the visibility of the symbol.
/*
https://sourceware.org/git/?p=glibc.git;a=blob;f=elf/elf.h;hb=2bd00179885928fd95fcabfafc50e7b5c6e660d2#l626
*/
#define STV_DEFAULT 0 /* Default symbol visibility rules - as specified by symbol binding*/
#define STV_INTERNAL 1 /* Processor specific hidden class */
#define STV_HIDDEN 2 /* Sym unavailable in other modules */
#define STV_PROTECTED 3 /* Not preemptible, not exported */
From what I understand, symbol visibility (yup, this is what glibc calls st_other
) extends the concept of symbol binding and provides more control over symbol access.
You can read more about this member from here 1 and here 2.
st_shndx
This attribute indicates the section associated with this symbol. It holds the section index corresponding to the sections in the section header.
st_value
Indeed, each symbol should have both a name and an associated value. This member holds the value associated with the respective symbol.
st_size
Many symbols come with associated sizes, for function type symbols this will be the size of that function. If a symbol doesn’t have a size or its size is unknown, this member holds a value of zero.
Analysis
Now that we have a foundational understanding, we can apply this knowledge to analyze our previous files.
1. libarithmatic.o
To keep things straightforward, I’ll begin by listing all the sections in the libarithmatic.o
file. (This is the output from my parser, you can use hexdumps or any other parser of your choice…)
[ 00 ] Section Name: Type: 0x0 Flags: 0x0 Addr: 0x0 Offset: 0x0 Size: 0 Link: 0 Info: 0x0 Addralign: 0x0 Entsize: 0
[ 01 ] Section Name: .text Type: 0x1 Flags: 0x6 Addr: 0x0 Offset: 0x40 Size: 130 Link: 0 Info: 0x0 Addralign: 0x1 Entsize: 0
[ 02 ] Section Name: .data Type: 0x1 Flags: 0x3 Addr: 0x0 Offset: 0xc2 Size: 0 Link: 0 Info: 0x0 Addralign: 0x1 Entsize: 0
[ 03 ] Section Name: .bss Type: 0x8 Flags: 0x3 Addr: 0x0 Offset: 0xc2 Size: 0 Link: 0 Info: 0x0 Addralign: 0x1 Entsize: 0
[ 04 ] Section Name: .comment Type: 0x1 Flags: 0x30 Addr: 0x0 Offset: 0xc2 Size: 28 Link: 0 Info: 0x0 Addralign: 0x1 Entsize: 1
[ 05 ] Section Name: .note.GNU-stack Type: 0x1 Flags: 0x0 Addr: 0x0 Offset: 0xde Size: 0 Link: 0 Info: 0x0 Addralign: 0x1 Entsize: 0
[ 06 ] Section Name: .note.gnu.property Type: 0x7 Flags: 0x2 Addr: 0x0 Offset: 0xe0 Size: 48 Link: 0 Info: 0x0 Addralign: 0x8 Entsize: 0
[ 07 ] Section Name: .eh_frame Type: 0x1 Flags: 0x2 Addr: 0x0 Offset: 0x110 Size: 152 Link: 0 Info: 0x0 Addralign: 0x8 Entsize: 0
[ 08 ] Section Name: .rela.eh_frame Type: 0x4 Flags: 0x40 Addr: 0x0 Offset: 0x288 Size: 96 Link: 9 Info: 0x7 Addralign: 0x8 Entsize: 24
[ 09 ] Section Name: .symtab Type: 0x2 Flags: 0x0 Addr: 0x0 Offset: 0x1a8 Size: 168 Link: 10 Info: 0x3 Addralign: 0x8 Entsize: 24
[ 10 ] Section Name: .strtab Type: 0x3 Flags: 0x0 Addr: 0x0 Offset: 0x250 Size: 49 Link: 0 Info: 0x0 Addralign: 0x1 Entsize: 0
[ 11 ] Section Name: .shstrtab Type: 0x3 Flags: 0x0 Addr: 0x0 Offset: 0x2e8 Size: 103 Link: 0 Info: 0x0 Addralign: 0x1 Entsize: 0
Now we can easily filter out the symbol table from this (Type: 0x2
)
[ 09 ] Section Name: .symtab Type: 0x2 Flags: 0x0 Addr: 0x0 Offset: 0x1a8 Size: 168 Link: 10 Info: 0x3 Addralign: 0x8 Entsize: 24
If you go back and revisit the article about section headers and check the explaination about members, you’ll be able to conclude this – .symtab
section is linked to .strtab
section. So the offset values from st_name
of symbol table can be resolved to proper strings using this string table.
┌─────────────────────────────────┐
│ │
│ [ 09 ] Section Name: .symtab │
│ Type: 0x2 │
│ Flags: 0x0 │
│ Addr: 0x0 │
│ Offset: 0x1a8 │
│ Size: 168 │
┌────┼────────── Link: 10 │
│ │ Info: 0x3 │
│ │ Addralign: 0x8 │
│ │ Entsize: 24 │
│ │ │
│ │ │
│ └─────────────────────────────────┘
│
│
│
│
│ ┌─────────────────────────────────┐
│ │ │
└────┤► [ 10 ] Section Name: .strtab │
│ Type: 0x3 │
│ Flags: 0x0 │
│ Addr: 0x0 │
│ Offset: 0x250 │
│ Size: 49 │
│ Link: 0 │
│ Info: 0x0 │
│ Addralign: 0x1 │
│ Entsize: 0 │
│ │
│ │
└─────────────────────────────────┘
Now we can begin with the interesting stuff and the first step will be to pull out the .symtab
section and parse it.
############ Explaination #################
#
# xxd
# -s 0x1a8 # start point (Offset: 0x1a8)
# -l 168 # total length (Size: 168)
# -c 24 # bytes per line (Entsize: 24) - I wanted to get each entry in a single line for uniformity
# libarithmatic.o # filename
# | nl -v0 # line numbers starting from 0
#
#############################################
❯ xxd -s 0x1a8 -l 168 -c 24 libarithmatic.o | nl -v0
0 000001a8: 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 ........................
1 000001c0: 0100 0000 0400 f1ff 0000 0000 0000 0000 0000 0000 0000 0000 ........................
2 000001d8: 0000 0000 0300 0100 0000 0000 0000 0000 0000 0000 0000 0000 ........................
3 000001f0: 1100 0000 1200 0100 0000 0000 0000 0000 1a00 0000 0000 0000 ........................
4 00000208: 1900 0000 1200 0100 1a00 0000 0000 0000 1a00 0000 0000 0000 ........................
5 00000220: 2100 0000 1200 0100 3400 0000 0000 0000 1a00 0000 0000 0000 !.......4...............
6 00000238: 2900 0000 1200 0100 4e00 0000 0000 0000 3400 0000 0000 0000 ).......N.......4.......
If we parse this data using the struct Elf64_Sym
, we’ll get something like this
typedef struct {
+------------------------------Elf64_Word st_name;
|
| +---------------------unsigned char st_info;
| |
| | +---------------unsigned char st_other;
| | |
| | | +----------Elf64_Section st_shndx;
| | | |
| | | | Elf64_Addr st_value;----+
| | | | |
| | | | Elf64_Xword st_size;-----+-----------------+
| | | | | |
| | | | } Elf64_Sym; | |
| | | | | |
| | | | | |
| | | +-------------------+ | |
| | | | | |
| | +------------------+ | | |
| | | | | |
| +-------------------+ | | | |
| | | | | |
+-------------------+ | | | | |
| | | | | |
v v v v v v
Index | Offset |
0 | 000001a8:| 0000 0000 | 00 | 00 | 0000 | 0000 0000 0000 0000 | 0000 0000 0000 0000 |
1 | 000001c0:| 0100 0000 | 04 | 00 | f1ff | 0000 0000 0000 0000 | 0000 0000 0000 0000 |
2 | 000001d8:| 0000 0000 | 03 | 00 | 0100 | 0000 0000 0000 0000 | 0000 0000 0000 0000 |
3 | 000001f0:| 1100 0000 | 12 | 00 | 0100 | 0000 0000 0000 0000 | 1a00 0000 0000 0000 |
4 | 00000208:| 1900 0000 | 12 | 00 | 0100 | 1a00 0000 0000 0000 | 1a00 0000 0000 0000 |
5 | 00000220:| 2100 0000 | 12 | 00 | 0100 | 3400 0000 0000 0000 | 1a00 0000 0000 0000 |
6 | 00000238:| 2900 0000 | 12 | 00 | 0100 | 4e00 0000 0000 0000 | 3400 0000 0000 0000 |
From my parser, I got this result
[ 0 ] Name: Info: 0x00 (Bind: 0x0 | Type: 0x0) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0
[ 1 ] Name: libarithmatic.c Info: 0x04 (Bind: 0x0 | Type: 0x4) Other: 0x0 Shndx: 0xfff1 Value: 0x000000000000 Size: 0x0
[ 2 ] Name: Info: 0x03 (Bind: 0x0 | Type: 0x3) Other: 0x0 Shndx: 0x1 Value: 0x000000000000 Size: 0x0
[ 3 ] Name: addFunc Info: 0x18 (Bind: 0x1 | Type: 0x2) Other: 0x0 Shndx: 0x1 Value: 0x000000000000 Size: 0x1a
[ 4 ] Name: subFunc Info: 0x18 (Bind: 0x1 | Type: 0x2) Other: 0x0 Shndx: 0x1 Value: 0x00000000001a Size: 0x1a
[ 5 ] Name: mulFunc Info: 0x18 (Bind: 0x1 | Type: 0x2) Other: 0x0 Shndx: 0x1 Value: 0x000000000034 Size: 0x1a
[ 6 ] Name: divFunc Info: 0x18 (Bind: 0x1 | Type: 0x2) Other: 0x0 Shndx: 0x1 Value: 0x00000000004e Size: 0x34
For the sake of simplicity and the scope of this article, I’ll focus on discussing the four functions in this table and leave the rest for you to explore and learn.
We can observe that the st_info
value for all of these symbols is the same, which implies that their “bind” and “type” values are identical (duhh). According to the information we’ve gathered, these symbols are GLOBAL
(bind=0x1) and of FUNC
(type=0x2) type. This indicates that these symbols are basically global functions and can be called from other files as well.
It’s worth noting that there’s a very cool tool called "ftrace" by elfmaster, which utilizes this information to trace function calls, specifically focusing on function calls and not other symbols.
Furthermore, the st_other
field is empty for these members, indicating default
symbol visibility. There’s nothing noteworthy to discuss here.
So we move on to the sh_shndx
(section index) member. This member tells us that all of these symbols are associated with section 0x1
(which is .text
, and that does make sense – Code of these functions should be in .text
section only).
The st_value
field indicates the offset within the .text
section at which these functions begin. So, if you start executing instructions from offset 0x34
in the .text
section, you’ll be running the mulFunc
function. Makes sense??
The linker will perform relocation
on the object files and generate a final executable binary that will have all the values in correct places. At that point we won’t need the mulFunc
string in our ELF file.
Last but not least, the st_size
field provides the size of the function. This helps the magical entity reading the code determine when to stop and understand the boundaries of the function.
2. main.o
Performing the same initial process for the main.o
file, you will be able yield its symbol table, as shown below.
[ 11 ] Section Name: .symtab Type: 0x2 Flags: 0x0 Addr: 0x0 Offset: 0x248 Size: 312 Link: 12 Info: 0x4 Addralign: 0x8 Entsize: 24
[ 0 ] Name: Info: 0x00 (Bind: 0x0 | Type: 0x0) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0
[ 1 ] Name: main.c Info: 0x04 (Bind: 0x0 | Type: 0x4) Other: 0x0 Shndx: 0xfff1 Value: 0x000000000000 Size: 0x0
[ 2 ] Name: Info: 0x03 (Bind: 0x0 | Type: 0x3) Other: 0x0 Shndx: 0x1 Value: 0x000000000000 Size: 0x0
[ 3 ] Name: Info: 0x03 (Bind: 0x0 | Type: 0x3) Other: 0x0 Shndx: 0x5 Value: 0x000000000000 Size: 0x0
[ 4 ] Name: main Info: 0x18 (Bind: 0x1 | Type: 0x2) Other: 0x0 Shndx: 0x1 Value: 0x000000000000 Size: 0x143
[ 5 ] Name: printf Info: 0x16 (Bind: 0x1 | Type: 0x0) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0
[ 6 ] Name: __isoc99_scanf Info: 0x16 (Bind: 0x1 | Type: 0x0) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0
[ 7 ] Name: addFunc Info: 0x16 (Bind: 0x1 | Type: 0x0) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0
[ 8 ] Name: subFunc Info: 0x16 (Bind: 0x1 | Type: 0x0) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0
[ 9 ] Name: mulFunc Info: 0x16 (Bind: 0x1 | Type: 0x0) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0
[ 10 ] Name: divFunc Info: 0x16 (Bind: 0x1 | Type: 0x0) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0
[ 11 ] Name: puts Info: 0x16 (Bind: 0x1 | Type: 0x0) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0
[ 12 ] Name: __stack_chk_fail Info: 0x16 (Bind: 0x1 | Type: 0x0) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0
In this case, things get a bit more interesting. Let’s begin with the same set of symbols: addFunc
, subFunc
, mulFunc
, and divFunc
.
You’ll notice that these symbols are global, but they don’t have any associated types. This is expected since the symbols are not defined in this file; they are just being called. At this stage, we’re not certain if there’s anything like these symbols elsewhere, which is why all the other members are zeroed out (undefined). This essentially instructs the magical linker to locate the values of these symbols (linkers are pretty good at this; they will give errors if the symbols aren’t found).
Now, you’ll also notice the presence of printf
and puts
symbols. This may raise a question: “I didn’t use puts in my code, so why is it there?”
Answer: It’s compiler magic! The compiler observed that the line printf("Enter equation (9 * 6): ");
could be expressed as puts("Enter equation (9 * 6): ");
, so it made this conversion during compilation. To confirm this, you can generate the compiled code using gcc -S
and check the call
to puts
function.
Now, let’s examine our mighty main
symbol. The st_info
indicates that it’s a GLOBAL
function
(with bind=0x1
and type=0x2
). This function is located in the 1st section (sh_shndx: 0x1
) of main.o
, which in our case is the .text
section. The function begins at offset 0x0
, and its size is 0x143
. Pretty simple, right?
(Note: I’m leaving __isoc99_scanf
and __stack_chk_fail
for you. Google them!)
3. calc
This represents the ultimate outcome of the entire compilation, assembly, and linking process – the final ELF executable binary. However, the process to obtain its symbol table remains same.
Here is the symtab
for this ELF binary
[ 27 ] Section Name: .symtab Type: 0x2 Flags: 0x0 Addr: 0x0 Offset: 0x3050 Size: 768 Link: 28 Info: 0x7 Addralign: 0x8 Entsize: 24
[ 0 ] Name: Info: 0x00 (Bind: 0x0 | Type: 0x0) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0
[ 1 ] Name: main.c Info: 0x04 (Bind: 0x0 | Type: 0x4) Other: 0x0 Shndx: 0xfff1 Value: 0x000000000000 Size: 0x0
[ 2 ] Name: libarithmatic.c Info: 0x04 (Bind: 0x0 | Type: 0x4) Other: 0x0 Shndx: 0xfff1 Value: 0x000000000000 Size: 0x0
[ 3 ] Name: Info: 0x04 (Bind: 0x0 | Type: 0x4) Other: 0x0 Shndx: 0xfff1 Value: 0x000000000000 Size: 0x0
[ 4 ] Name: _DYNAMIC Info: 0x01 (Bind: 0x0 | Type: 0x1) Other: 0x0 Shndx: 0x15 Value: 0x000000003de0 Size: 0x0
[ 5 ] Name: __GNU_EH_FRAME_HDR Info: 0x00 (Bind: 0x0 | Type: 0x0) Other: 0x0 Shndx: 0x11 Value: 0x000000002048 Size: 0x0
[ 6 ] Name: _GLOBAL_OFFSET_TABLE_ Info: 0x01 (Bind: 0x0 | Type: 0x1) Other: 0x0 Shndx: 0x17 Value: 0x000000003fe8 Size: 0x0
[ 7 ] Name: __libc_start_main@GLIBC_2.34 Info: 0x18 (Bind: 0x1 | Type: 0x2) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0
[ 8 ] Name: _ITM_deregisterTMCloneTable Info: 0x32 (Bind: 0x2 | Type: 0x0) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0
[ 9 ] Name: data_start Info: 0x32 (Bind: 0x2 | Type: 0x0) Other: 0x0 Shndx: 0x18 Value: 0x000000004020 Size: 0x0
[ 10 ] Name: subFunc Info: 0x18 (Bind: 0x1 | Type: 0x2) Other: 0x0 Shndx: 0xe Value: 0x0000000012c6 Size: 0x1a
[ 11 ] Name: puts@GLIBC_2.2.5 Info: 0x18 (Bind: 0x1 | Type: 0x2) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0
[ 12 ] Name: _edata Info: 0x16 (Bind: 0x1 | Type: 0x0) Other: 0x0 Shndx: 0x18 Value: 0x000000004030 Size: 0x0
[ 13 ] Name: _fini Info: 0x18 (Bind: 0x1 | Type: 0x2) Other: 0x2 Shndx: 0xf Value: 0x000000001330 Size: 0x0
[ 14 ] Name: __stack_chk_fail@GLIBC_2.4 Info: 0x18 (Bind: 0x1 | Type: 0x2) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0
[ 15 ] Name: printf@GLIBC_2.2.5 Info: 0x18 (Bind: 0x1 | Type: 0x2) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0
[ 16 ] Name: addFunc Info: 0x18 (Bind: 0x1 | Type: 0x2) Other: 0x0 Shndx: 0xe Value: 0x0000000012ac Size: 0x1a
[ 17 ] Name: __data_start Info: 0x16 (Bind: 0x1 | Type: 0x0) Other: 0x0 Shndx: 0x18 Value: 0x000000004020 Size: 0x0
[ 18 ] Name: __gmon_start__ Info: 0x32 (Bind: 0x2 | Type: 0x0) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0
[ 19 ] Name: __dso_handle Info: 0x17 (Bind: 0x1 | Type: 0x1) Other: 0x2 Shndx: 0x18 Value: 0x000000004028 Size: 0x0
[ 20 ] Name: _IO_stdin_used Info: 0x17 (Bind: 0x1 | Type: 0x1) Other: 0x0 Shndx: 0x10 Value: 0x000000002000 Size: 0x4
[ 21 ] Name: divFunc Info: 0x18 (Bind: 0x1 | Type: 0x2) Other: 0x0 Shndx: 0xe Value: 0x0000000012fa Size: 0x34
[ 22 ] Name: _end Info: 0x16 (Bind: 0x1 | Type: 0x0) Other: 0x0 Shndx: 0x19 Value: 0x000000004038 Size: 0x0
[ 23 ] Name: _start Info: 0x18 (Bind: 0x1 | Type: 0x2) Other: 0x0 Shndx: 0xe Value: 0x000000001070 Size: 0x26
[ 24 ] Name: __bss_start Info: 0x16 (Bind: 0x1 | Type: 0x0) Other: 0x0 Shndx: 0x19 Value: 0x000000004030 Size: 0x0
[ 25 ] Name: mulFunc Info: 0x18 (Bind: 0x1 | Type: 0x2) Other: 0x0 Shndx: 0xe Value: 0x0000000012e0 Size: 0x1a
[ 26 ] Name: main Info: 0x18 (Bind: 0x1 | Type: 0x2) Other: 0x0 Shndx: 0xe Value: 0x000000001169 Size: 0x143
[ 27 ] Name: __isoc99_scanf@GLIBC_2.7 Info: 0x18 (Bind: 0x1 | Type: 0x2) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0
[ 28 ] Name: __TMC_END__ Info: 0x17 (Bind: 0x1 | Type: 0x1) Other: 0x2 Shndx: 0x18 Value: 0x000000004030 Size: 0x0
[ 29 ] Name: _ITM_registerTMCloneTable Info: 0x32 (Bind: 0x2 | Type: 0x0) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0
[ 30 ] Name: __cxa_finalize@GLIBC_2.2.5 Info: 0x34 (Bind: 0x2 | Type: 0x2) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0
[ 31 ] Name: _init Info: 0x18 (Bind: 0x1 | Type: 0x2) Other: 0x2 Shndx: 0xc Value: 0x000000001000 Size: 0x0
The linking process did introduce numerous symbols that exceed the combined count of symbols in both individual object files. To keep things simple (* once again *), we won’t dive into the specifics of what these additional symbols do, and we can think of them as a result of linker magic.
Our primary focus for now remains on the symbols and their properties, even if we don’t have detailed knowledge of their functions.
These are the symbols we defined ourselves…
[ 10 ] Name: subFunc Info: 0x18 (Bind: 0x1 | Type: 0x2) Other: 0x0 Shndx: 0xe Value: 0x0000000012c6 Size: 0x1a
[ 16 ] Name: addFunc Info: 0x18 (Bind: 0x1 | Type: 0x2) Other: 0x0 Shndx: 0xe Value: 0x0000000012ac Size: 0x1a
[ 21 ] Name: divFunc Info: 0x18 (Bind: 0x1 | Type: 0x2) Other: 0x0 Shndx: 0xe Value: 0x0000000012fa Size: 0x34
[ 25 ] Name: mulFunc Info: 0x18 (Bind: 0x1 | Type: 0x2) Other: 0x0 Shndx: 0xe Value: 0x0000000012e0 Size: 0x1a
[ 26 ] Name: main Info: 0x18 (Bind: 0x1 | Type: 0x2) Other: 0x0 Shndx: 0xe Value: 0x000000001169 Size: 0x143
We can observe the similarities in various members between libarithmatic.o
and main.o
. The notable difference I can identify is the sh_shndx
value, which has changed but still points to the .text
section of calc
file. The important point is that it should reference the .text
section, regardless of the section index value.
Another difference is in the st_value
. With the addition of numerous new symbols in this file, the positions of these symbols have shifted. Initially, we had the main
function in main.o
and addFunc
in libarithmatic.o
, both at offset 0x0
. However, when combining them into a single file, one of them had to adjust its offset to make room for the other. This is precisely what occurred here, and there are also other symbols (of function type) that occupied the initial offsets, causing our defined functions to compromise on their offsets.
One more intriguing detail is the _start
symbol, which has an offset of 0x000000001070
. This offset serves as the entry point of our ELF executable binary. You can verify this using readelf
or any method you prefer. If you happen to overwrite the entrypoint
value in ELF file headers
, you’ll be calling some other function instead of _start
function of glibc
. Since _start
function performs some startup actions for C runtime environment, so the modified binary may or may not work as intended.
I’m sure that’s enough for today, ta-ta!