In preceding articles, we’ve delved into the details of ELF file headers and section headers. Section headers provide insight into how data and instructions are organized based on their characteristics and grouped into distinct sections. These sections remain distinct due to variations in their types and permissions (… and few other things).
Up to this point, our focus has been on the aspects of the ELF file as it resides on-disk. However, we now turn our attention to what occurs when the file is loaded into memory. How is its arrangement handled? Are all the sections loaded into memory?
This is where the concept of program headers comes into play. Program headers are similar to section headers, but instead of section information, they store segment information. A segment encompasses one or more sections from the ELF file. While program headers hold little significance while the file is on disk, they become imperative when the file needs to be loaded and executed in memory, specifically in the case of executables and shared objects.
Some criteria for grouping sections to form segments can be:
- Type and purpose of the sections (like
.data
and.bss
), - Memory Access Permissions and mapping,
- Alignment and Layout,
- Segment size constraints,
- OS and platform requirements, etc
For this article, I’ll be using the same C code to generate an ELF file
/*
File: hello_world.c
Compile: gcc hello_world.c -o hello_world
*/
#include <stdio.h>
// A macro
#define HELLO_MSG1 "Hello World1"
// A global variable
char HELLO_MSG2[] = "Hello World2";
// main function
int main() {
// local variable for main
char HELLO_MSG3[] = "Hello World3";
// Print messages
printf("%s\n", HELLO_MSG1);
printf("%s\n", HELLO_MSG2);
printf("%s\n", HELLO_MSG3);
return 0;
}
Once you have the ELF file, you can get the program header related information from ELF file headers - e_phoff
, e_phentsize
and e_phnum
I’ll use readelf to get this information from the ELF headers. Feel free to use any method of your choice.
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: DYN (Position-Independent Executable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x1040
Start of program headers: 64 (bytes into file)
Start of section headers: 13496 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 13
Size of section headers: 64 (bytes)
Number of section headers: 30
Section header string table index: 29
From the the output above, we can deduce that
- the program headers are located at offset of
64
bytes, - each of these header entries is
56
bytes in size, - and in total, we’ve got
13
entries
Now we can use xxd to get the data out
❯ xxd -s 64 -l $(( 54*13 )) -c 54 build/hello
00000040: 0600 0000 0400 0000 4000 0000 0000 0000 4000 0000 0000 0000 4000 0000 0000 0000 d802 0000 0000 0000 d802 0000 0000 0000 0800 0000 0000 ........@.......@.......@.............................
00000076: 0000 0300 0000 0400 0000 1803 0000 0000 0000 1803 0000 0000 0000 1803 0000 0000 0000 1c00 0000 0000 0000 1c00 0000 0000 0000 0100 0000 ......................................................
000000ac: 0000 0000 0100 0000 0400 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 3006 0000 0000 0000 3006 0000 0000 0000 0010 ....................................0.......0.........
000000e2: 0000 0000 0000 0100 0000 0500 0000 0010 0000 0000 0000 0010 0000 0000 0000 0010 0000 0000 0000 8901 0000 0000 0000 8901 0000 0000 0000 ......................................................
00000118: 0010 0000 0000 0000 0100 0000 0400 0000 0020 0000 0000 0000 0020 0000 0000 0000 0020 0000 0000 0000 b400 0000 0000 0000 b400 0000 0000 ................. ....... ....... ....................
0000014e: 0000 0010 0000 0000 0000 0100 0000 0600 0000 d02d 0000 0000 0000 d03d 0000 0000 0000 d03d 0000 0000 0000 4802 0000 0000 0000 5002 0000 ...................-.......=.......=......H.......P...
00000184: 0000 0000 0010 0000 0000 0000 0200 0000 0600 0000 e02d 0000 0000 0000 e03d 0000 0000 0000 e03d 0000 0000 0000 e001 0000 0000 0000 e001 .....................-.......=.......=................
000001ba: 0000 0000 0000 0800 0000 0000 0000 0400 0000 0400 0000 3803 0000 0000 0000 3803 0000 0000 0000 3803 0000 0000 0000 4000 0000 0000 0000 ......................8.......8.......8.......@.......
000001f0: 4000 0000 0000 0000 0800 0000 0000 0000 0400 0000 0400 0000 7803 0000 0000 0000 7803 0000 0000 0000 7803 0000 0000 0000 4400 0000 0000 @.......................x.......x.......x.......D.....
00000226: 0000 4400 0000 0000 0000 0400 0000 0000 0000 53e5 7464 0400 0000 3803 0000 0000 0000 3803 0000 0000 0000 3803 0000 0000 0000 4000 0000 ..D...............S.td....8.......8.......8.......@...
0000025c: 0000 0000 4000 0000 0000 0000 0800 0000 0000 0000 50e5 7464 0400 0000 1420 0000 0000 0000 1420 0000 0000 0000 1420 0000 0000 0000 2400 ....@...............P.td..... ....... ....... ......$.
00000292: 0000 0000 0000 2400 0000 0000 0000 0400 0000 0000 0000 51e5 7464 0600 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 ......$...............Q.td............................
000002c8: 0000 0000 0000 0000 0000 0000 0000 0000 1000 0000 0000 0000 52e5 7464 0400 0000 d02d 0000 0000 0000 d03d 0000 0000 0000 d03d 0000 0000 ........................R.td.....-.......=.......=....
Now we just have to map each of these lines to Elf64_Phdr
(since we have a 64Bit file)
/*
https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/elf.h#L260
*/
typedef struct elf64_phdr {
Elf64_Word p_type; /* Segment type */
Elf64_Word p_flags; /* Segment flags */
Elf64_Off p_offset; /* Segment file offset */
Elf64_Addr p_vaddr; /* Segment virtual address */
Elf64_Addr p_paddr; /* Segment physical address */
Elf64_Xword p_filesz; /* Segment size in file */
Elf64_Xword p_memsz; /* Segment size in memory */
Elf64_Xword p_align; /* Segment alignment, file & memory */
} Elf64_Phdr;
Using my nifty little parser, I got this digestible and user-friendly output for the above dump (Feel free to compare it)
[ + ] Program headers begins at: 0x40
[ 00 ] Type: 0x6 Flags: 0x4 Offset: 0x0040 vaddr: 0x40 paddr: 0x40 filesz: 0x728 memsz: 0x728 align: 0x8
[ 01 ] Type: 0x3 Flags: 0x4 Offset: 0x0318 vaddr: 0x318 paddr: 0x318 filesz: 0x28 memsz: 0x28 align: 0x1
[ 02 ] Type: 0x1 Flags: 0x4 Offset: 0x0000 vaddr: 0x0 paddr: 0x0 filesz: 0x1584 memsz: 0x1584 align: 0x1000
[ 03 ] Type: 0x1 Flags: 0x5 Offset: 0x1000 vaddr: 0x1000 paddr: 0x1000 filesz: 0x393 memsz: 0x393 align: 0x1000
[ 04 ] Type: 0x1 Flags: 0x4 Offset: 0x2000 vaddr: 0x2000 paddr: 0x2000 filesz: 0x180 memsz: 0x180 align: 0x1000
[ 05 ] Type: 0x1 Flags: 0x6 Offset: 0x2dd0 vaddr: 0x3dd0 paddr: 0x3dd0 filesz: 0x584 memsz: 0x592 align: 0x1000
[ 06 ] Type: 0x2 Flags: 0x6 Offset: 0x2de0 vaddr: 0x3de0 paddr: 0x3de0 filesz: 0x480 memsz: 0x480 align: 0x8
[ 07 ] Type: 0x4 Flags: 0x4 Offset: 0x0338 vaddr: 0x338 paddr: 0x338 filesz: 0x64 memsz: 0x64 align: 0x8
[ 08 ] Type: 0x4 Flags: 0x4 Offset: 0x0378 vaddr: 0x378 paddr: 0x378 filesz: 0x68 memsz: 0x68 align: 0x4
[ 09 ] Type: 0xe553 Flags: 0x4 Offset: 0x0338 vaddr: 0x338 paddr: 0x338 filesz: 0x64 memsz: 0x64 align: 0x8
[ 10 ] Type: 0xe550 Flags: 0x4 Offset: 0x2014 vaddr: 0x2014 paddr: 0x2014 filesz: 0x36 memsz: 0x36 align: 0x4
[ 11 ] Type: 0xe551 Flags: 0x6 Offset: 0x0000 vaddr: 0x0 paddr: 0x0 filesz: 0x0 memsz: 0x0 align: 0x10
[ 12 ] Type: 0xe552 Flags: 0x4 Offset: 0x2dd0 vaddr: 0x3dd0 paddr: 0x3dd0 filesz: 0x560 memsz: 0x560 align: 0x1
Now, it’s time to take a deep dive into the inner workings of the Elf64_Phdr
struct
1. p_type
Just like sh_type
, this member tells the type of the segment. Whether the segment will be loaded in the memory or is it just used to store notes.
/*
https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/elf.h#L25
*/
/* These constants are for the segment types stored in the image headers */
#define PT_NULL 0
#define PT_LOAD 1
#define PT_DYNAMIC 2
#define PT_INTERP 3
#define PT_NOTE 4
#define PT_SHLIB 5
#define PT_PHDR 6
#define PT_TLS 7 /* Thread local storage segment */
#define PT_LOOS 0x60000000 /* OS-specific */
#define PT_HIOS 0x6fffffff /* OS-specific */
#define PT_LOPROC 0x70000000
#define PT_HIPROC 0x7fffffff
#define PT_GNU_EH_FRAME (PT_LOOS + 0x474e550)
#define PT_GNU_STACK (PT_LOOS + 0x474e551)
#define PT_GNU_RELRO (PT_LOOS + 0x474e552)
#define PT_GNU_PROPERTY (PT_LOOS + 0x474e553)
2. p_flags
This is quite similar to the the (r)ead
, (w)rite
and e(x)ecute
permissions we are familiar with. This member specifies the permissions for the given segment.
Usually the segment containing the .text
section will have (r)ead
and e(x)ecute
permissions.
/*
https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/elf.h#L243
*/
/* These constants define the permissions on sections in the program
header, p_flags. */
#define PF_R 0x4
#define PF_W 0x2
#define PF_X 0x1
3. p_offset
This holds the offset from the beginning of the file, where the first byte of the first section in this segment is located.
4. p_vaddr
This member holds the memory/virtual address for the segment.
5. p_paddr
This is same as p_vaddr
, but holds the physical/on-disk address for the segment.
6. p_filesz
This holds the on-disk size (in bytes) of the segment.
7. p_memsz
This member holds the memory/virtual size (in bytes) of the segment.
8. p_align
This member holds the value to which the segments are aligned in memory and in the file.
Similar to sh_addralign
, value of 0
and 1
are treated as “no alignment”, while the positive powers of 2
are taken as the actual alignment values.
Practicals
Let’s start with checking if strip
command makes any change to the program headers.
- Try to write a program to parse the program headers and display the information in better way.
- Try to write a program that gives the information about what sections are grouped together in a segment.
readelf
gives this information in below format
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000040 0x0000000000000040 0x0000000000000040 0x0002d8 0x0002d8 R 0x8
INTERP 0x000318 0x0000000000000318 0x0000000000000318 0x00001c 0x00001c R 0x1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2 ]
LOAD 0x000000 0x0000000000000000 0x0000000000000000 0x000630 0x000630 R 0x1000
LOAD 0x001000 0x0000000000001000 0x0000000000001000 0x000189 0x000189 R E 0x1000
LOAD 0x002000 0x0000000000002000 0x0000000000002000 0x0000b4 0x0000b4 R 0x1000
LOAD 0x002dd0 0x0000000000003dd0 0x0000000000003dd0 0x000248 0x000250 RW 0x1000
DYNAMIC 0x002de0 0x0000000000003de0 0x0000000000003de0 0x0001e0 0x0001e0 RW 0x8
NOTE 0x000338 0x0000000000000338 0x0000000000000338 0x000040 0x000040 R 0x8
NOTE 0x000378 0x0000000000000378 0x0000000000000378 0x000044 0x000044 R 0x4
GNU_PROPERTY 0x000338 0x0000000000000338 0x0000000000000338 0x000040 0x000040 R 0x8
GNU_EH_FRAME 0x002014 0x0000000000002014 0x0000000000002014 0x000024 0x000024 R 0x4
GNU_STACK 0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW 0x10
GNU_RELRO 0x002dd0 0x0000000000003dd0 0x0000000000003dd0 0x000230 0x000230 R 0x1
Section to Segment mapping:
Segment Sections...
00
01 .interp
02 .interp .note.gnu.property .note.gnu.build-id .note.ABI-tag .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt
03 .init .plt .text .fini
04 .rodata .eh_frame_hdr .eh_frame
05 .init_array .fini_array .dynamic .got .got.plt .data .bss
06 .dynamic
07 .note.gnu.property
08 .note.gnu.build-id .note.ABI-tag
09 .note.gnu.property
10 .eh_frame_hdr
11
12 .init_array .fini_array .dynamic .got
If you want to go extra mile and dig deep,
- Try overwriting the program interpreter with your custom loader program. Things will probably go wrong and then you can dig deep what’s the root cause.
- Add a new section (
.text
type), create it’s section header entry, then create it’s program header entry such that it is loadable in memory. Then change the ELF entrypoint to the newly created section.
Conclusion
Alright, buckle up, because we have just seen what segments are, how sections are grouped into segments, and how program headers act as a table to store information about segments which is helpful for runtime. Picture this -
┌───────────────────────────┐
│ │
│ File Header │
│ │
│ │
├───────────────────────────┤
│ │
│ Program Header │
│ │
│ │
├───────────────────────────┤ ◄───┐
│ │ │
│ │ │
│ Section 1 │ │
│ │ │
├───────────────────────────┤ │ Segment 1
│ Section 2 │ │
├───────────────────────────┤ │
│ │ │
│ Section 3 │ │
├───────────────────────────┤ ◄───┤
│ │ │
│ │ │
│ │ │
│ │ │ Segment 2
│ │ │
│ Section 4 │ │
│ │ │
│ │ ◄───┤
│ │ │ Segment 3
│ │ │
├───────────────────────────┤ ◄───┤
│ │ │
│ │ │
│ Section 5 │ │
│ │ │ Segment 4
│ │ │
│ │ │
├───────────────────────────┤ ◄───┤
│ │ │
│ │ │
│ Section 6 │ │ Segment 5
│ │ │
│ │ │
├───────────────────────────┤ ◄───┘
│ │
│ │
│ Section Header │
│ │
│ │
└───────────────────────────┘