Discussion on the Dynamic-link Library Mechanism under Linux

This article will delve into the dynamic-link library mechanism in Linux, including but not limited to global symbol interposition, lazy binding, and position-independent code.

By Yumu

Overview

In software development, problems often arise when linking dynamic libraries. They can lead to symbol conflicts that cause abnormal program behavior or crashes. To understand the dynamic linking mechanism and how it works, the author reviewed the book Self-cultivation of Programmers and studied the dynamic linking process through hands-on demonstration and disassembly analysis.

This article examines the dynamic-link library mechanism in Linux, including but not limited to global symbol interposition, lazy binding, and position-independent code (PIC). By working through these concepts and their technical details, it aims to give a clearer mental model of the root causes of symbol conflicts, so that developers who run into similar problems in practice can prevent or resolve them more easily and keep their programs stable.

For the convenience of readers, basic concepts mentioned in this article, such as ELF, PIC, GOT, PLT, and commonly used sections, are summarized in the appendix.

1. Example

Through a simple C language program, we will explore the operation mechanism of dynamic-link libraries within and between modules, which involves the interaction between variables and functions. Moreover, we will use the -fPIC option to ensure that position-independent code is generated.

// pic.c
#include <stdio.h>

// The static variable a is only visible in this module.
static int a;

// Declare the external global variable b with extern.
extern int b;

// The global variable c accessed in this module.
int c = 3;

// Declare the external function ext().
extern void ext();

// The scope of the static function inner() is limited to this module.
static void inner() {}

// The bar() function modifies the static variable a and the external global variable b.
void bar() {
  a = 1; // Modify the value of the static variable a.
  b = 2; // Modify the value of the external global variable b.
  c = 4; // Modify the value of the global variable c in the module.
}

// The inner, bar, and ext are called in the foo() function, and the variable values are printed.
void foo() {
  inner(); // Call the static function inner().
  bar();   // Call the function bar().
  ext();   // Call the external function ext().
  printf("a = %d, b = %d, c = %d\n", a, b, c); // Output the value of the variable.
}


// main.c
// Declare foo(), which is defined in libpic.so.
extern void foo();

// Define the external global variable b.
int b = 1;

// The external function ext() modifies the value of the external global variable b.
void ext() {
  b = 3; // Modify the value of the external global variable b.
}

int main() {
  foo(); // Call the foo() function to demonstrate the interaction between modules.
  return 0; // The program ends normally.
}

gcc -shared -fPIC -o libpic.so pic.c -g
gcc -o main main.c -L. -lpic

In this code example, the -fPIC compilation option can generate position-independent code that is suitable for creating shared libraries. The code contains multiple scenarios:

Intra-module function calls: The inner and bar functions are called in the foo function. Since inner is a static function, its scope is limited to this module. The bar function operates on the static variable a and the global variable c in the module.

Inter-module function calls: The foo function calls the external function ext, which is defined in another module. The ext function is responsible for modifying the external global variable b.

Different types of variables:

• The static variable a is only visible in this module. Its value is not changed in other modules of the program, nor is it lost due to function calls.

• The external global variable b can be shared among multiple modules. Its value is unique and changeable throughout the program.

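• The global variable c is defined in this module. Because it has default visibility, other modules can still reference it, which is why (as shown later) it is accessed through the GOT rather than directly.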

Dynamic-link libraries need to share the same code among multiple processes. To achieve this, the code must be position-independent so that it can be mapped at different addresses when loaded; the -fPIC compilation option generates such code. How, then, are these functions and variables actually addressed at runtime? The following sections analyze the dynamic linking process step by step.

2. In-depth Dynamic-link Library Analysis Through Examples

2.1 Intra-module Function Calls

In the example, there are two function calls in the foo function implementation: the static function inner() and the non-static function bar(). The result after disassembly is as follows.

Disassembly of section .plt:

0000000000000670 <bar@plt-0x10>:
 670:  ff 35 92 09 20 00      push   QWORD PTR [rip+0x200992]        # 201008 <_GLOBAL_OFFSET_TABLE_+0x8>
 676:  ff 25 94 09 20 00      jmp    QWORD PTR [rip+0x200994]        # 201010 <_GLOBAL_OFFSET_TABLE_+0x10>
 67c:  0f 1f 40 00            nop    DWORD PTR [rax+0x0]

0000000000000680 <bar@plt>:
 680:  ff 25 92 09 20 00      jmp    QWORD PTR [rip+0x200992]        # 201018 <_GLOBAL_OFFSET_TABLE_+0x18>
 686:  68 00 00 00 00         push   0x0
 68b:  e9 e0 ff ff ff         jmp    670 <_init+0x20>
...
00000000000007e2 <inner>:
inner():
/mnt/share/demo1/pic.c:12

static void inner() {}
 7e2:  55                     push   rbp
 7e3:  48 89 e5               mov    rbp,rsp
 7e6:  5d                     pop    rbp
 7e7:  c3                     ret

00000000000007e8 <foo>:
foo():
...
/mnt/share/demo1/pic.c:15
  inner();
 7ec:  b8 00 00 00 00         mov    eax,0x0
 7f1:  e8 ec ff ff ff         call   7e2 <inner>
/mnt/share/demo1/pic.c:16
  bar();
 7f6:  b8 00 00 00 00         mov    eax,0x0
 7fb:  e8 80 fe ff ff         call   680 <bar@plt>

2.1.1 Static function call: inner() function call

This resembles relocation in static linking, but is simpler:

7f1: e8 ec ff ff ff call 7e2 <inner>

e8: relative offset call instruction.

ec ff ff ff: in little-endian order this is 0xFFFFFFEC, the two's complement representation of -20 (-0x14). It is the offset of the destination address relative to the instruction that follows the call. That is, the address of inner is 0x7f6 (address of the next instruction) - 0x14 = 0x7e2.

Conclusion: Static function calls are simple. You can jump by relative address offset.

2.1.2 Global function call: bar() function call

First call

7fb: e8 80 fe ff ff call 680 <bar@plt>

• The parsing rule is the same as above, but the jump target is 0x680, the PLT entry bar@plt.
• The first instruction of bar@plt is jmp QWORD PTR [rip+0x200992], an indirect jump. It jumps to whatever address is stored at 0x201018 (0x686 + 0x200992). What is stored there?

objdump -s libpic.so

Contents of section .got:
 200fc8 00000000 00000000 00000000 00000000  ................
 200fd8 00000000 00000000 00000000 00000000  ................
 200fe8 00000000 00000000 00000000 00000000  ................
 200ff8 00000000 00000000                    ........
Contents of section .got.plt:
 201000 080e2000 00000000 00000000 00000000  .. .............
 201010 00000000 00000000 86060000 00000000  ................
 201020 96060000 00000000 a6060000 00000000  ................
 201030 b6060000 00000000 c6060000 00000000  ................

• This address lies in the .got.plt section, and the value stored there is 0x00000686 (bytes 86 06 00 00 in little-endian order). That value is the address of the second instruction of bar@plt:

0000000000000680 <bar@plt>:
 680:  ff 25 92 09 20 00      jmp    QWORD PTR [rip+0x200992]        # 201018 <_GLOBAL_OFFSET_TABLE_+0x18>
 686:  68 00 00 00 00         push   0x0
 68b:  e9 e0 ff ff ff         jmp    670 <_init+0x20>

What is the above series of address jumps doing? We use a schematic diagram to show the first address relocation process of bar (orange is the call entry, blue indicates the running instruction, and purple represents the corrected address).

[Figure 1: first-call address relocation of bar]

The _dl_runtime_resolve() function is not covered in detail here. Its arguments are passed on the stack: the relocation index of the symbol (push 0x0 in bar@plt) and an identifier for the module (pushed from _GLOBAL_OFFSET_TABLE_+0x8). The resolution relies on section information such as .dynamic and .rela.plt. Once the symbol has been resolved, its address is written into the .got.plt slot at 0x201018. You can check the contents of the .rela.plt section:

[root@docker-desktop demo1]# readelf -r libpic.so

Relocation section '.rela.dyn' at offset 0x4e8 contains 10 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000200de8  000000000008 R_X86_64_RELATIVE                    780
000000200df0  000000000008 R_X86_64_RELATIVE                    740
000000200e00  000000000008 R_X86_64_RELATIVE                    200e00
000000200fc8  000200000006 R_X86_64_GLOB_DAT 0000000000000000 _ITM_deregisterTMClone + 0
000000200fd0  000300000006 R_X86_64_GLOB_DAT 0000000000000000 b + 0
000000200fd8  000500000006 R_X86_64_GLOB_DAT 0000000000000000 __gmon_start__ + 0
000000200fe0  000e00000006 R_X86_64_GLOB_DAT 0000000000201040 c + 0
000000200fe8  000700000006 R_X86_64_GLOB_DAT 0000000000000000 _Jv_RegisterClasses + 0
000000200ff0  000800000006 R_X86_64_GLOB_DAT 0000000000000000 _ITM_registerTMCloneTa + 0
000000200ff8  000900000006 R_X86_64_GLOB_DAT 0000000000000000 __cxa_finalize + 0

Relocation section '.rela.plt' at offset 0x5d8 contains 5 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000201018  000b00000007 R_X86_64_JUMP_SLO 00000000000007b8 bar + 0
000000201020  000400000007 R_X86_64_JUMP_SLO 0000000000000000 printf + 0
000000201028  000500000007 R_X86_64_JUMP_SLO 0000000000000000 __gmon_start__ + 0
000000201030  000600000007 R_X86_64_JUMP_SLO 0000000000000000 ext + 0
000000201038  000900000007 R_X86_64_JUMP_SLO 0000000000000000 __cxa_finalize + 0

The .rela.plt section in the ELF file contains the relocation entries for function slots. The columns have the following meanings:

  • Offset - The address of the location to be patched, that is, the address of the corresponding entry in the GOT.
  • Info - Contains two parts: the symbol index and the relocation type. In this case, the relocation type is R_X86_64_JUMP_SLOT, which is used for function calls through the PLT.
  • Type - The relocation type decoded from Info. In this case, the type is R_X86_64_JUMP_SLOT, which resolves the PLT entry of a symbol through lazy binding. Other common types include:

    • R_X86_64_GLOB_DAT - Sets a GOT entry to the address of a symbol.
    • R_X86_64_64 - 64-bit direct relocation; modifies a 64-bit value.
    • R_X86_64_PC32 - 32-bit PC-relative relocation; modifies the 32-bit offset within an instruction.
    • R_X86_64_GOT32 - 32-bit global offset table entry.
    • R_X86_64_PLT32 - 32-bit PLT relocation for function calls.
    • R_X86_64_RELATIVE - Adjusts a module-internal address by adding the base address at which the module is loaded.
    • R_X86_64_GOTPCREL - PC-relative relocation accessing the GOT.
  • Sym. Value - The value of the symbol within the module that defines it. Before relocation, the symbol may not yet have a final runtime address. For symbols defined in the current module (such as the bar function), this is usually their offset in the module. For external symbols (such as the printf function), this is usually 0 before relocation, indicating that the address has not been determined.
  • Sym. Name + Addend - Shows the symbol name and the addend. In the .rela format, the addend is stored explicitly in each relocation entry; here it is 0.

At runtime, the dynamic linker performs address resolution based on these relocation items. For example, when the program calls printf for the first time, the control flow first jumps to the corresponding item of printf in PLT. There will be a stub code in PLT to trigger the dynamic linker, which will resolve the real address of printf and update the corresponding address in GOT.

Second call

Once the address has been resolved at runtime, the second call is much simpler, as shown in the following figure:

[Figure 2: second call of bar, jumping through the resolved .got.plt entry]

Single-stepping in GDB shows how the contents of the .got.plt section change when the address is resolved (base address: 0x7F7A97F75000). For comparison, the first line of .got.plt as stored in the file is:

201000 080e2000 00000000 00000000 00000000 .. .............

(gdb) x/16a 0x7f7a98176000
0x7f7a98176000:  0x200e08  0x7f7a983976a8
0x7f7a98176010:  0x7f7a9818d890 <_dl_runtime_resolve_xsave>  0x7f7a97f75686 <bar@plt+6>
0x7f7a98176020:  0x7f7a97f75696 <printf@plt+6>  0x7f7a97f756a6 <__gmon_start__@plt+6>
0x7f7a98176030:  0x7f7a97f756b6 <ext@plt+6>  0x7f7a97f756c6 <__cxa_finalize@plt+6>
0x7f7a98176040 <c>:  0x3  0x0
0x7f7a98176050:  0x31303220352e382e  0x5228203332363035
0x7f7a98176060:  0x3420746148206465  0x2936332d352e382e
0x7f7a98176070:  0x20000002c00  0x8000000

The slot for bar in the .got.plt section is at 0x201018 + 0x7F7A97F75000 (base address) = 0x7F7A98176018. Before the first call, this slot contains 0x7f7a97f75686 <bar@plt+6>, which matches the offset 0x686 stored in the file. After the address has been resolved, the result is as follows:

(gdb) x/16a 0x7f7a98176000
0x7f7a98176000:  0x200e08  0x7f7a983976a8
0x7f7a98176010:  0x7f7a9818d890 <_dl_runtime_resolve_xsave>  0x7f7a97f757b8 <bar>
0x7f7a98176020:  0x7f7a97f75696 <printf@plt+6>  0x7f7a97f756a6 <__gmon_start__@plt+6>
0x7f7a98176030:  0x7f7a97f756b6 <ext@plt+6>  0x7f7a97f756c6 <__cxa_finalize@plt+6>
0x7f7a98176040 <c>:  0x3  0x0
0x7f7a98176050:  0x31303220352e382e  0x5228203332363035
0x7f7a98176060:  0x3420746148206465  0x2936332d352e382e
0x7f7a98176070:  0x20000002c00  0x8000000

0x7f7a97f757b8 lies in the code segment, and 0x7f7a97f757b8 - 0x7F7A97F75000 (base address) = 0x7B8. This offset is exactly the entry address of bar in .text.

Let's abstract it as follows:

Based on the figure, the instruction call bar@plt enters .plt, which jumps through the writable .got.plt section. During program execution, the function pointers in .got.plt are patched to point to the actual addresses in the .text section (which is not writable). Because only data is patched and the code itself stays unchanged, the code can remain position-independent.

This process also involves an important concept: lazy binding. Symbol resolution is performed by the dynamic linker at runtime. If every symbol were resolved at program startup, startup would slow down, even for functions that are never called. Therefore, a function is not bound until it is used for the first time, which can speed up program startup considerably. In this example, bar is resolved only when it is first called; if it is never called, no binding is performed.

Does the external function redirection have to be in .rela.plt?

No. If the code is compiled with PIC, the relocation is recorded in .rela.plt; if not, it is recorded in .rela.dyn.

Reason: With PIC enabled, the call instruction points to an entry in the PLT, which requires the .rela.plt section to implement lazy binding. The .rela.dyn section holds the relocation entries that the dynamic linker processes at load time to bind symbols to their runtime addresses; these entries are not specific to the PLT. In short, .rela.plt is used for PLT relocation, resolving function addresses lazily during dynamic linking, while .rela.dyn covers the broader dynamic relocation requirements.

Open questions

Question 1: What are the differences between global function calls within a module and global function calls between modules?

Question 2: Why is there such a significant difference in the jump behavior between static function calls and global function calls, even though both involve function calls?

Put these two questions aside for a moment. Let's move on to inter-module function calls.

2.2 Inter-module Function Calls

In the example, foo() calls ext(). Looking at the assembly, it is found that the method of inter-module function calls is exactly the same as that of intra-module function calls. The assembly instructions are as follows:

/mnt/share/demo1/pic.c:17
  ext();
 800:  b8 00 00 00 00         mov    eax,0x0
 805:  e8 a6 fe ff ff         call   6b0 <ext@plt>

Now let's answer the first question in the previous section. There is no difference between global function calls within a module and between modules. Why?

Let's first recall the loading process. After the dynamic linker completes bootstrapping, it merges the symbol table of the executable file and its own symbol table into one table called the global symbol table. The symbols of each shared object are then added as it is loaded. When a symbol is about to be added to the global symbol table and a symbol of the same name already exists, the one added later is ignored. This rule is called global symbol interposition.

Because of global symbol interposition, the bar() defined in this module may be overridden by a function of the same name in a module that was loaded earlier. If the intra-module call to bar() in the previous section used a relative address directly, it would always reach the local definition and could not follow the symbol that actually won. Therefore, both intra-module and inter-module calls to global functions must go indirectly through the .got.plt relocation.

The answer to the second question in the previous section is now also clear. Static functions are not subject to global symbol interposition, so they can be called through a relative address within the module. Such calls are also faster than calls to global functions.

To gain a deeper understanding of global symbol interposition, let's look at another example.

/* a1.c*/
#include <stdio.h>
void a() { 
  printf("a1.c\n"); 
}

/* a2.c */
#include <stdio.h>
void a() { 
  printf("a2.c\n"); 
}

/* b1.c */
void a();
void b1() { 
  a(); 
}

/* b2.c */
void a();
void b2() { 
  a(); 
}

/* main.c */
#include <stdio.h>
void b1();
void b2();
int main() {
  b1();
  b2();
  return 0;
}

[root@docker-desktop priority]# g++ -fPIC -shared a1.c -o a1.so
[root@docker-desktop priority]# g++ -fPIC -shared a2.c -o a2.so
[root@docker-desktop priority]# g++ -fPIC -shared b1.c a1.so -o b1.so
[root@docker-desktop priority]# g++ -fPIC -shared b2.c a2.so -o b2.so
[root@docker-desktop priority]# ldd b1.so
  a1.so (0x0000004001c2a000)
  libstdc++.so.6 => /usr/local/gcc-5.4.0/lib64/libstdc++.so.6 (0x0000004001e2c000)
  libm.so.6 => /lib64/libm.so.6 (0x00000040021ad000)
  libgcc_s.so.1 => /usr/local/gcc-5.4.0/lib64/libgcc_s.so.1 (0x00000040024b0000)
  libc.so.6 => /lib64/libc.so.6 (0x00000040026c7000)
  /lib64/ld-linux-x86-64.so.2 (0x0000004000000000)
[root@docker-desktop priority]# ldd b2.so
  a2.so (0x0000004001c2a000)
  libstdc++.so.6 => /usr/local/gcc-5.4.0/lib64/libstdc++.so.6 (0x0000004001e2c000)
  libm.so.6 => /lib64/libm.so.6 (0x00000040021ad000)
  libgcc_s.so.1 => /usr/local/gcc-5.4.0/lib64/libgcc_s.so.1 (0x00000040024b0000)
  libc.so.6 => /lib64/libc.so.6 (0x00000040026c7000)
  /lib64/ld-linux-x86-64.so.2 (0x0000004000000000)
[root@docker-desktop priority]# g++ main.c b1.so b2.so -o main
[root@docker-desktop priority]# ./main
a1.c
a1.c

In the above example, although both b1.so and b2.so call the a() function, the main program links b1.so first, so its dependency a1.so is loaded before a2.so and the definition of a() in a1.so wins. Therefore, no matter how b2.so changes, the main program always calls the implementation in a1.so. This shows how the resolution order of symbols in dynamic-link libraries affects the final execution result. When designing interfaces, developers need to consider symbol naming and library loading order carefully to avoid symbol conflicts and unpredictable behavior.

2.3 Intra-module and Inter-module Variables

The example shows the static variable a, the external global variable b, and the internal global variable c. The results after disassembly are as follows:

void bar() {
 7b8:  55                     push   rbp
 7b9:  48 89 e5              mov    rbp,rsp
/mnt/share/demo1/pic.c:7
  a = 1;
 7bc:  c7 05 82 08 20 00 01   mov    DWORD PTR [rip+0x200882],0x1        # 201048 <__TMC_END__>
 7c3:  00 00 00
/mnt/share/demo1/pic.c:8
  b = 2;
 7c6:  48 8b 05 03 08 20 00   mov    rax,QWORD PTR [rip+0x200803]        # 200fd0 <_DYNAMIC+0x1c8>
 7cd:  c7 00 02 00 00 00      mov    DWORD PTR [rax],0x2
/mnt/share/demo1/pic.c:9
  c = 4;
 7d3:  48 8b 05 06 08 20 00   mov    rax,QWORD PTR [rip+0x200806]        # 200fe0 <_DYNAMIC+0x1d8>
 7da:  c7 00 04 00 00 00      mov    DWORD PTR [rax],0x4
/mnt/share/demo1/pic.c:10
}

Idx Name          Size      VMA               LMA               File off  Algn
                  CONTENTS, ALLOC, LOAD, DATA
 20 .got          00000038  0000000000200fc8  0000000000200fc8  00000fc8  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 21 .got.plt      00000040  0000000000201000  0000000000201000  00001000  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 22 .data         00000004  0000000000201040  0000000000201040  00001040  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 23 .bss          0000000c  0000000000201044  0000000000201044  00001044  2**2
                  ALLOC

static int a; # 201048 <__TMC_END__> ==> .bss

extern int b; # 200fd0 <_DYNAMIC+0x1c8> ==> .got

int c; # 200fe0 <_DYNAMIC+0x1d8> ==> .got

Variable access follows the same pattern as the function calls above. Static variables are accessed directly through an offset relative to rip. This is efficient because the scope of a static variable is limited to its own compilation unit, so its distance from the accessing instruction is fixed at link time. Non-static variables (both global variables defined in the current module and extern variables defined elsewhere) may be referenced or interposed by other modules, so their addresses must be resolved by the dynamic linker at runtime. For these variables, shared libraries use rip-relative addressing to reach an entry in the .got section, and the dynamic linker relocates that entry at runtime, which preserves position independence.

There is no lazy binding for global variables: their GOT entries are resolved at load time rather than deferred until first use. A data access is a plain load or store with no stub that could invoke the resolver, so deferring the resolution would bring no significant advantage and would only add runtime overhead.

3. Position-independent Extension

3.1 Impact of Hidden Symbols

If the function bar and the variable c are hidden with __attribute__((visibility("hidden"))), what happens to the call and access redirection?

#include <stdio.h>
static int a;
extern int b;
__attribute__((visibility("hidden"))) int c = 3;
extern void ext();
void bar() __attribute__((visibility("hidden")));
void bar() {
  a = 1;
  b = 2;
  c = 4;
}

static void inner() {}

void foo() {
  inner();
  bar();
  ext();
  printf("a = %d, b = %d, c = %d\n", a, b, c);
}

Results after disassembly

[root@docker-desktop demo1]# objdump -d -M intel -S -l libpic_hidden.so

Disassembly of section .text:
...
0000000000000738 <bar>:
bar():
/mnt/share/demo1/pic_hidden.c:7
static int a;
extern int b;
__attribute__((visibility("hidden"))) int c = 3;
extern void ext();
void bar() __attribute__((visibility("hidden")));
void bar() {
 738:  55                     push   rbp
 739:  48 89 e5               mov    rbp,rsp
/mnt/share/demo1/pic_hidden.c:8
  a = 1;
 73c:  c7 05 fa 08 20 00 01   mov    DWORD PTR [rip+0x2008fa],0x1        # 201040 <__TMC_END__>
 743:  00 00 00
/mnt/share/demo1/pic_hidden.c:9
  b = 2;
 746:  48 8b 05 8b 08 20 00   mov    rax,QWORD PTR [rip+0x20088b]        # 200fd8 <_DYNAMIC+0x1c8>
 74d:  c7 00 02 00 00 00      mov    DWORD PTR [rax],0x2
/mnt/share/demo1/pic_hidden.c:10
  c = 4;
 753:  c7 05 db 08 20 00 04   mov    DWORD PTR [rip+0x2008db],0x4        # 201038 <c>
 75a:  00 00 00

...
/mnt/share/demo1/pic_hidden.c:17
  bar();
 773:  b8 00 00 00 00         mov    eax,0x0
 778:  e8 bb ff ff ff         call   738 <bar>
[root@docker-desktop demo1]# readelf -S libpic_hidden.so
There are 34 section headers, starting at offset 0x1470:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  ......
  [23] .data             PROGBITS         0000000000201038  00001038
       0000000000000004  0000000000000000  WA       0     0     4

bar: The disassembly shows that the call to bar jumps directly through a relative address, with no runtime relocation.

int c; # 201038 <c> ==> .data section

View the .rela.dyn and .rela.plt sections.

[root@docker-desktop demo1]# readelf -r libpic_hidden.so

Relocation section '.rela.dyn' at offset 0x4a8 contains 9 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000200df0  000000000008 R_X86_64_RELATIVE                    700
000000200df8  000000000008 R_X86_64_RELATIVE                    6c0
000000200e08  000000000008 R_X86_64_RELATIVE                    200e08
000000200fd0  000200000006 R_X86_64_GLOB_DAT 0000000000000000 _ITM_deregisterTMClone + 0
000000200fd8  000300000006 R_X86_64_GLOB_DAT 0000000000000000 b + 0
000000200fe0  000500000006 R_X86_64_GLOB_DAT 0000000000000000 __gmon_start__ + 0
000000200fe8  000700000006 R_X86_64_GLOB_DAT 0000000000000000 _Jv_RegisterClasses + 0
000000200ff0  000800000006 R_X86_64_GLOB_DAT 0000000000000000 _ITM_registerTMCloneTa + 0
000000200ff8  000900000006 R_X86_64_GLOB_DAT 0000000000000000 __cxa_finalize + 0

Relocation section '.rela.plt' at offset 0x580 contains 4 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000201018  000400000007 R_X86_64_JUMP_SLO 0000000000000000 printf + 0
000000201020  000500000007 R_X86_64_JUMP_SLO 0000000000000000 __gmon_start__ + 0
000000201028  000600000007 R_X86_64_JUMP_SLO 0000000000000000 ext + 0
000000201030  000900000007 R_X86_64_JUMP_SLO 0000000000000000 __cxa_finalize + 0

There is no bar() in .rela.plt and no variable c in .rela.dyn. After hiding, bar() does not need to be relocated, and variable c does not need to be accessed indirectly. The hidden symbols bar() and c also do not appear in the dynamic symbol table (.dynsym) of the dynamic-link library, so they are not visible to other shared objects or executable files during linking. As a result, hidden symbols are not subject to global symbol interposition.

3.2 Questions about PIC

1. How to distinguish whether a DSO is a PIC?

readelf -d xxx.so | grep TEXTREL

If there is no output, the dynamic library is generated using PIC. Text relocation (TEXTREL) means that the code section (.text section) needs to be modified to reference the correct address. In non-PIC code, there will be references based on absolute addresses, which need to be modified when loading so that the code can run correctly. This process is text relocation.

2. How to distinguish whether a static library is PIC?

ar -t xxx.a
readelf -r xxx.o

Check the relocation types in the output. Absolute address-based relocation types such as R_X86_64_32 or R_X86_64_32S indicate that the object was not compiled as PIC, whereas GOT-based types such as R_X86_64_GOTPCREL for accesses to global data are typical of PIC code.

3. If the static library is compiled without -fPIC and the dynamic library is compiled with -fPIC, does it work?

No. In practice, when a static library compiled without -fPIC is linked into a dynamic library compiled with -fPIC, the link step fails. The error log is shown as follows:

g++ -c nopic_common.c -o nopic_common.o
ar rcs libnopic_common.a nopic_common.o
g++ -shared -o libnopic.so pic.c -L. -lnopic_common -fPIC
/usr/bin/ld: ./libnopic_common.a(nopic_common.o): relocation R_X86_64_PC32 against symbol `b' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: Bad value
collect2: error: ld returned 1 exit status

The nopic_common.o object file is not compiled with -fPIC, so it references the global variable b in a PC-relative manner (relocation type R_X86_64_PC32). This type of relocation cannot be used when creating a dynamic library: it assumes that the distance between the instruction and b is fixed at link time, but a global symbol in a shared object may be interposed or defined in another module, so its final address is only known at load time and may differ from run to run. The linker therefore rejects the relocation instead of producing a library that could fail at runtime.

To fix this error, recompile nopic_common.c with -fPIC so that nopic_common.o contains position-independent code (PIC).

4. Why is PIC not used by default when compiling a dynamic library?

• Historical reason: Due to historical inertia, earlier compiler versions did not include PIC generation as a default option.

• Option delivery issue: -fPIC is a compiler option, which is determined at the source code compilation stage, while -shared is a linker option, which is determined at a different stage, so -fPIC cannot be automatically enabled through -shared.

• Performance: While PIC is important for efficient operation of shared libraries, in some cases PIC code may be slightly slower than non-PIC code because it uses indirect addresses to reference global variables and functions. This performance impact is generally small, but it can be a factor in applications with extremely high performance requirements.

• Compiler and build system design: Compilers and build systems often allow developers to choose whether to generate PIC based on project requirements. Support for flexible configuration enables developers to determine the most appropriate compilation option based on specific usage scenarios and requirements.

3.3 Differences between Dynamic and Static Linking Redirection

| | Static Linking | Dynamic Linking |
| --- | --- | --- |
| Phase | Compilation and linking | Loading and running |
| Execution Control | Control is handed over to the executable file. | Control is handed over to the dynamic linker, and then to the executable file after mapping. |
| Addressing Speed | Fast | Due to indirect jumps, it is about 1% to 5% slower than static linking, and it is improved by using lazy binding. |
| Relocation Table Name | .rela.text: code segment relocation table. .rela.data: data segment relocation table. | .rela.plt: code segment relocation table. .rela.dyn: data segment relocation table. |

4. How to Specify the Order of Loading Global Variables and Functions

The above section mainly introduces the dynamic loading process. In the initialization and de-initialization phases, special attention needs to be paid to the construction and destruction order of global variables and functions. These processes directly affect the dependencies between modules and the interactions between objects. Therefore, we need to understand how to control these sequences by using specific attributes to ensure the stability and expected behavior of the program. Especially in the multi-module dynamic library environment, reasonable arrangement of initialization and de-initialization order is an important measure to avoid runtime errors and crashes.

4.1 Global Variable Initialization Order

For global variables across shared libraries, their initialization order is affected by the dependencies between these shared libraries. If shared library A depends on shared library B, then the initialization code of B will be executed before the initialization code of A, so the global variables in B will be initialized before the global variables in A.

Let's take the example from Section 2.2 (inter-module function calls with b1.so and b2.so) and view the library search order and initialization order with the LD_DEBUG=files ./main command.

[root@docker-desktop]# LD_DEBUG=files ./main
       112:  find library=b1.so [0]; searching
       112:   search path=/usr/local/gcc-5.4.0/lib64/tls/i686:/usr/local/gcc-5.4.0/lib64/tls:/usr/local/gcc-5.4.0/lib64/i686:/usr/local/gcc-5.4.0/lib64:tls/i686:tls:i686:    (LD_LIBRARY_PATH)
       112:    trying file=/usr/local/gcc-5.4.0/lib64/tls/i686/b1.so
       112:    trying file=/usr/local/gcc-5.4.0/lib64/tls/b1.so
       112:    trying file=/usr/local/gcc-5.4.0/lib64/i686/b1.so
       112:    trying file=/usr/local/gcc-5.4.0/lib64/b1.so
       112:    trying file=tls/i686/b1.so
       112:    trying file=tls/b1.so
       112:    trying file=i686/b1.so
       112:    trying file=b1.so
       112:
       112:  find library=b2.so [0]; searching
       112:   search path=/usr/local/gcc-5.4.0/lib64:tls/i686:tls:i686:    (LD_LIBRARY_PATH)
       112:    trying file=/usr/local/gcc-5.4.0/lib64/b2.so
       112:    trying file=tls/i686/b2.so
       112:    trying file=tls/b2.so
       112:    trying file=i686/b2.so
       112:    trying file=b2.so
       112:
       112:  find library=libstdc++.so.6 [0]; searching
       112:   search path=/usr/local/gcc-5.4.0/lib64:tls/i686:tls:i686:    (LD_LIBRARY_PATH)
       112:    trying file=/usr/local/gcc-5.4.0/lib64/libstdc++.so.6
       112:
       112:  find library=libm.so.6 [0]; searching
       112:   search path=/usr/local/gcc-5.4.0/lib64:tls/i686:tls:i686:    (LD_LIBRARY_PATH)
       112:    trying file=/usr/local/gcc-5.4.0/lib64/libm.so.6
       112:    trying file=tls/i686/libm.so.6
       112:    trying file=tls/libm.so.6
       112:    trying file=i686/libm.so.6
       112:    trying file=libm.so.6
       112:   search cache=/etc/ld.so.cache
       112:    trying file=/lib64/libm.so.6
       112:
       112:  find library=libgcc_s.so.1 [0]; searching
       112:   search path=/usr/local/gcc-5.4.0/lib64:tls/i686:tls:i686:    (LD_LIBRARY_PATH)
       112:    trying file=/usr/local/gcc-5.4.0/lib64/libgcc_s.so.1
       112:
       112:  find library=libc.so.6 [0]; searching
       112:   search path=/usr/local/gcc-5.4.0/lib64:tls/i686:tls:i686:    (LD_LIBRARY_PATH)
       112:    trying file=/usr/local/gcc-5.4.0/lib64/libc.so.6
       112:    trying file=tls/i686/libc.so.6
       112:    trying file=tls/libc.so.6
       112:    trying file=i686/libc.so.6
       112:    trying file=libc.so.6
       112:   search cache=/etc/ld.so.cache
       112:    trying file=/lib64/libc.so.6
       112:
       112:  find library=a1.so [0]; searching
       112:   search path=/usr/local/gcc-5.4.0/lib64:tls/i686:tls:i686:    (LD_LIBRARY_PATH)
       112:    trying file=/usr/local/gcc-5.4.0/lib64/a1.so
       112:    trying file=tls/i686/a1.so
       112:    trying file=tls/a1.so
       112:    trying file=i686/a1.so
       112:    trying file=a1.so
       112:
       112:  find library=a2.so [0]; searching
       112:   search path=/usr/local/gcc-5.4.0/lib64:tls/i686:tls:i686:    (LD_LIBRARY_PATH)
       112:    trying file=/usr/local/gcc-5.4.0/lib64/a2.so
       112:    trying file=tls/i686/a2.so
       112:    trying file=tls/a2.so
       112:    trying file=i686/a2.so
       112:    trying file=a2.so
       112:
       112:
       112:  calling init: /lib64/libc.so.6
       112:
       112:
       112:  calling init: /lib64/libm.so.6
       112:
       112:
       112:  calling init: /usr/local/gcc-5.4.0/lib64/libgcc_s.so.1
       112:
       112:
       112:  calling init: /usr/local/gcc-5.4.0/lib64/libstdc++.so.6
       112:
       112:
       112:  calling init: a2.so
       112:
       112:
       112:  calling init: a1.so
       112:
       112:
       112:  calling init: b2.so
       112:
       112:
       112:  calling init: b1.so
       112:
       112:
       112:  initialize program: ./main
       112:
       112:
       112:  transferring control: ./main
       112:
a1.c
a1.c      
      ......

As can be seen from the log, the dynamic libraries are loaded in the following order (system libraries omitted): b1.so, b2.so, a1.so, a2.so. The libraries are loaded according to their dependencies. Each find library entry shows the paths that were searched, and the last trying file entry under it is the path that succeeded.

The order of initialization is: a2.so, a1.so, b2.so, b1.so.

This sequence shows how the constructor of each library is called before the main function is executed. It can be seen that the initialization of dynamic libraries is carried out in the order of dependencies, that is, the initialization of a library will be performed after all the libraries it depends on are initialized.

__attribute__((init_priority(PRIORITY))) is a feature provided by GCC for C++ that controls the initialization priority of a global object. It can only be used on definitions of global (namespace-scope) objects of class type, not on functions. It changes the order in which object constructors are called, ensuring that the constructors of different objects are called in the specified priority order when the program starts (that is, before the main() function is executed). PRIORITY must be an integer between 101 and 65535, where 101 is the highest priority (initialized first) and 65535 is the lowest priority (initialized last).

• If no priority is defined, the initialization order depends on the order of the '.o' where the global variable is defined in the command line parameters when linking.

• If some global variables use init_priority and some do not, all global variables that use init_priority are initialized before global variables that do not use init_priority.

Sample code:

TestClass obj __attribute__((init_priority(102)));

4.2 Function Construction and Destruction Order

Functions can use __attribute__((constructor(PRIORITY))) and __attribute__((destructor(PRIORITY))).

The __attribute__((constructor(PRIORITY))) attribute marks a function that should be executed automatically before the main() function is executed. PRIORITY is optional. If you specify it, it affects the order in which multiple such functions are executed: a smaller PRIORITY value means that the function is executed earlier.

A function marked with __attribute__((destructor(PRIORITY))) is called automatically after the main() function returns or exit() is called. For destructors the relationship is reversed: a smaller PRIORITY value means that the function is executed later.

Sample code:

void __attribute__((constructor(102))) test(void);
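To make the ordering concrete, here is a minimal sketch in C, assuming GCC or Clang on Linux. The hook names (init_first, init_second, init_default, fini_first, fini_last) and the hook_index helper are made up for illustration and are not part of the example in Chapter 1. Each constructor records its name so that the order in which the hooks ran can be inspected afterwards.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_EVENTS 8

/* Names of the constructors, in the order they actually ran. */
static const char *events[MAX_EVENTS];
static int event_count;

static void record(const char *name)
{
    assert(event_count < MAX_EVENTS);
    events[event_count++] = name;
}

/* Smaller numbers run first; values up to 100 are reserved for the implementation. */
__attribute__((constructor(102))) static void init_second(void) { record("init_second"); }
__attribute__((constructor(101))) static void init_first(void)  { record("init_first"); }

/* No priority given: with the GNU toolchain this runs after the prioritized ones. */
__attribute__((constructor)) static void init_default(void) { record("init_default"); }

/* Destructors use the reverse relationship: the larger number runs first. */
__attribute__((destructor(101))) static void fini_last(void)  { puts("fini_last"); }
__attribute__((destructor(102))) static void fini_first(void) { puts("fini_first"); }

/* Position at which the named constructor ran, or -1 if it did not run. */
int hook_index(const char *name)
{
    for (int i = 0; i < event_count; i++)
        if (strcmp(events[i], name) == 0)
            return i;
    return -1;
}
```

Calling hook_index("init_first") from main() should return 0, with init_second at 1 and init_default at 2, even though init_second appears first in the source. The position of the constructor without a priority reflects common GNU toolchain behavior rather than a guarantee for every compiler.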

4.3 Notes

Portability: __attribute__ is a GCC extension. Clang also supports it, and some other compilers provide similar extensions of their own, but these extensions are not compatible across all compilers, so you should consider using other mechanisms or adding conditional compilation for compatibility.

Initialization dependencies: Great care must be taken to manage dependencies between objects when using these attributes to modify the initialization order. Incorrectly planned initialization sequences can cause programs to crash when using uninitialized or semi-initialized objects.

Default priority: The compiler also assigns a default initialization priority to global objects that do not have a specified priority. However, this default priority may vary from compiler to compiler, so it is best to specify the priority explicitly to avoid ambiguity.

Compatibility with other features: When using constructor attributes, consider their possible compatibility with other language features such as smart pointers, lazy initialization of static local variables, and so on.
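One possible way to act on the portability and initialization-dependency notes is to wrap the attribute in a macro and keep an explicit, idempotent initialization entry point as a fallback. The sketch below assumes that testing __GNUC__ is enough to detect support for the constructor attribute; AUTO_INIT, subsystem_init, and subsystem_ensure_init are hypothetical names chosen for illustration.

```c
/* Assumption: GCC and Clang define __GNUC__; any other compiler takes the fallback. */
#if defined(__GNUC__)
#  define AUTO_INIT __attribute__((constructor))
#else
#  define AUTO_INIT /* no automatic initialization available */
#endif

static int subsystem_ready;

/* Runs before main() when the attribute is available. */
AUTO_INIT static void subsystem_init(void)
{
    subsystem_ready = 1;
}

/* Idempotent entry point: safe to call whether or not the
 * constructor has already run. Returns 1 once initialized. */
int subsystem_ensure_init(void)
{
    if (!subsystem_ready)
        subsystem_init();
    return subsystem_ready;
}
```

Code that depends on the subsystem calls subsystem_ensure_init() first, so it does not rely on the constructor having run and behaves the same on compilers without the attribute.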

5. Summary

The above sections describe the process of dynamic linking. From the perspective of the overall operation process of the program, it can be divided into several key stages: compilation, linking, loading, and running. The following table briefly summarizes these stages.

Compile

Main work: gcc/g++ converts each source file into an ELF object file that contains compiled code but is not yet bound to the addresses of its dependencies. The .o file is generated on the disk.

Sample commands:

gcc -fPIC -c test.c -o test.o

gcc -c main.c -o main.o

• -fPIC: generates position-independent code.

• -c: executes only the compilation step, without linking.

• -o test.o: specifies the name of the output file.

Linking

Main work: Set up the information that the dynamic linker (ld.so) needs, and prepare the table structures and reference placeholders used for runtime dynamic linking. The .so file is generated on the disk.

Detailed process:

1. Create a table of symbol references for subsequent resolution by the loader and dynamic linker.

2. Create data structures for runtime symbol resolution, such as placeholders for the global offset table (GOT) and procedure linkage table (PLT).

3. Provide the necessary relocation entries to tell the loader where to find all references to dynamic libraries.

Sample commands:

gcc -shared -o libtest.so test.o

gcc -o main main.o -L. -ltest

• -shared: tells the linker that we want to create a shared object, namely, a dynamic library.

• -o libtest.so: specifies the name of the generated dynamic library file.

Loading (the focus of this article)

Main work: The dynamic linker loads the dynamic libraries into memory, resolves symbols, and performs relocation so that the program can run correctly in memory.

Detailed process:

1. Start the dynamic linker, which performs its own relocation through the GOT and .dynamic information to complete bootstrapping.

2. Load the shared objects: merge the symbols of the executable file and of the dynamic linker itself into the global symbol table, then traverse the shared objects in breadth-first order and merge their symbol tables into the global symbol table one by one. If multiple shared objects define the same symbol, the definition in the shared object loaded first takes precedence and later definitions are ignored.

3. Relocation (memory): Fix up the function calls and variable addresses that need to be corrected so that they point to the correct memory addresses.

4. Initialization: Run the initialization code of the dynamic libraries, such as .init and constructors.

Sample command:

./main

Running

Main work: Control is handed over to the main function. Further symbol references are resolved and updated when needed (for example, with lazy binding).
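The breadth-first traversal and the rule that the first definition wins, both described in the loading stage, can be modeled with a toy program. This is only a sketch and not how ld.so is implemented: the module table, the dependency edges (main needs b1.so and b2.so, which in turn need a1.so and a2.so), and the symbol names such as foo are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_MODULES 8
#define MAX_DEPS    4
#define MAX_SYMS    4

/* Toy stand-in for a loaded object: its name, the indices of the
 * modules it needs, and the global symbols it defines. */
struct module {
    const char *name;
    int deps[MAX_DEPS];
    int ndeps;
    const char *syms[MAX_SYMS];
    int nsyms;
};

/* Hypothetical dependency graph; "foo" is defined twice on purpose. */
static const struct module modules[] = {
    { "main",  {1, 2}, 2, {"main"},           1 },
    { "b1.so", {3},    1, {"b1_func"},        1 },
    { "b2.so", {4},    1, {"b2_func", "foo"}, 2 },
    { "a1.so", {0},    0, {"a1_func", "foo"}, 2 },
    { "a2.so", {0},    0, {"a2_func"},        1 },
};
#define NUM_MODULES ((int)(sizeof modules / sizeof modules[0]))

/* Fill `order` with module indices in breadth-first order starting
 * from module 0; returns the number of modules visited. */
int load_order(int order[MAX_MODULES])
{
    int seen[MAX_MODULES] = {0};
    int head = 0, tail = 0;

    order[tail++] = 0;
    seen[0] = 1;
    while (head < tail) {
        const struct module *m = &modules[order[head++]];
        for (int i = 0; i < m->ndeps; i++) {
            int d = m->deps[i];
            assert(d >= 0 && d < NUM_MODULES);
            if (!seen[d]) {
                seen[d] = 1;
                order[tail++] = d;
            }
        }
    }
    return tail;
}

/* Name of the first module in load order that defines `sym`,
 * or NULL if none does. Later definitions are shadowed. */
const char *resolve(const char *sym)
{
    int order[MAX_MODULES];
    int n = load_order(order);

    for (int i = 0; i < n; i++) {
        const struct module *m = &modules[order[i]];
        for (int j = 0; j < m->nsyms; j++)
            if (strcmp(m->syms[j], sym) == 0)
                return m->name;
    }
    return NULL;
}

/* Print the load order, one module name per line. */
void print_load_order(void)
{
    int order[MAX_MODULES];
    int n = load_order(order);

    for (int i = 0; i < n; i++)
        printf("%s\n", modules[order[i]].name);
}
```

In this model, resolve("foo") returns b2.so rather than a1.so, because b2.so comes earlier in the breadth-first order. This is the same effect that makes a symbol in one library silently override an identically named symbol in another.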

Appendix 1: Key Concepts

ELF (Executable and Linkable Format)

The standard binary file format on Unix-like systems, used for executable files, object code, shared libraries, and core dumps. An ELF file contains all the information needed to run a program, such as program instructions, program entry points, data, and symbol tables.

PIC (Position Independent Code)

Concept: Position-independent code refers to code that can be executed without depending on the specific loading address. Compiling to PIC means that the generated code can run anywhere in the address space of the process. This is especially crucial in dynamic libraries, because multiple programs may share a single copy of the same dynamic library, but the library may be loaded to different locations in the address spaces of the programs.

Use phase: Compilation. Compiling with the '-fPIC' option generates position-independent code.

GOT (Global Offset Table)

Concept: The global offset table provides a fixed location for storing the absolute addresses of external symbols. The linker creates it, and the dynamic linker populates it at load time. It is used to support position-independent code (PIC) in shared libraries.

Use phase: Linking/loading. The linker creates the GOT and it is populated by the dynamic linker (part of the loader) when the program starts.

PLT (Procedure Linkage Table)

Concept: The procedure linkage table works with the GOT for function calls in dynamic linking. It contains code that looks up the address of an external function in .got.plt. If the function is called for the first time, the PLT code triggers the dynamic linker to resolve the function address and store it in the corresponding slot of .got.plt. If the function address has already been stored in .got.plt, the PLT code jumps directly to that address to continue execution.

Use phase: Linking/loading. Similar to the GOT, the creation of the PLT occurs in the linking phase, and its filling and updating occur when the program starts and the dynamic symbol is accessed for the first time.
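The lazy-binding behavior of the PLT and .got.plt can be imitated in plain C with a function pointer that initially points at a resolver stub. This is a toy model under stated assumptions, not the real mechanism: the real PLT is generated machine code and the real resolver lives in the dynamic linker. The names got_square, square_stub, square_plt, and real_square are invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*func_ptr)(int);

/* How many times the slow resolution path has run. */
static int resolver_calls;

/* Stands in for a function defined in another shared object. */
static int real_square(int x) { return x * x; }

static int square_stub(int x);

/* Toy ".got.plt" slot: starts out pointing at the resolver stub. */
static func_ptr got_square = square_stub;

/* Toy resolver: "looks up" the real address, patches the slot,
 * then forwards the original call. */
static int square_stub(int x)
{
    resolver_calls++;
    got_square = real_square;   /* later calls bypass the stub */
    return got_square(x);
}

/* Toy ".plt" entry: always jumps through the slot. */
int square_plt(int x)
{
    assert(got_square != NULL);
    return got_square(x);
}

/* Number of times the resolver has been invoked so far. */
int resolver_call_count(void)
{
    return resolver_calls;
}
```

The first call to square_plt goes through the stub and patches the slot, so the resolver runs exactly once no matter how many calls follow.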

ld.so

A dynamic linker program in the Linux system that is responsible for loading shared libraries and performing dynamic linking and binding. It reads the dynamic library dependencies specified by the executable file and loads these libraries into memory, while also handling symbol resolution and relocation. When you run a dynamically linked executable file, it actually runs ld.so first, and then your program itself. ld.so will check the libraries needed by the program and load them into memory.

Key sections

Each entry gives the section name, its attributes (where listed), a description, and the commands used to view the information.

.interp

Description: Stores the path to the dynamic linker.

Command: objdump -s xxx # View all sections.

.dynsym

Attributes: RA

Description: Contains only the symbols that need to be dynamically linked during program execution. Symbols hidden by __attribute__((visibility("hidden"))) in GCC do not appear here.

Command: readelf -S xxx / objdump -h xxx # View section address distribution.

Notes on the output:

• 'Ndx' (index) displayed as UND (short for "undefined") indicates that the symbol is not defined in the shared object and needs to be resolved (imported) from other shared objects.

• A non-zero address in the 'Value' column indicates the symbol's location in the shared object file (.so file).

.rela.dyn and .rela.plt

Attributes: RA

Description: The relocation table sections, which store the relocation information.

• .rela.dyn fixes data references located in .got and the data segment.

• .rela.plt fixes function references (with PIC compilation enabled) located in .got.plt. This table usually exists wherever there is a PLT: because PLT entries jump through absolute addresses, every absolute address used by the PLT that needs dynamic linking or relocation (stored in .got.plt or .got, depending on whether lazy binding is enabled) must be recorded in .rela.plt.

Commands:

readelf -r xxx # View the content of the relocation table.

readelf -S xxx / objdump -h xxx # View section address distribution.

.plt

Attributes: RA

Description: A set of trampoline (stub) functions that implement lazy binding of shared library functions.

Command: readelf -S xxx / objdump -h xxx # View section address distribution.

.text

Attributes: RA

Description: Code section.

Command: readelf -S xxx / objdump -h xxx # View section address distribution.

.dynamic

Attributes: RWA

Description: Stores the basic information used by the dynamic linker, such as the locations of the dynamic symbol table (.dynsym), the string table (.dynstr), and the relocation tables (.rela.dyn/.rela.plt), the runtime libraries that the object depends on, and the library search paths.

Command: readelf -d xxx # View the .dynamic section.

.got and .got.plt

Attributes: RWA

Description: The places where the relocated addresses (pointers) are stored.

Commands:

readelf -S xxx / objdump -h xxx # View section address distribution.

readelf -x <section> <xxx.so> # View the content of a specific section.

.data

Attributes: RWA

Description: Stores initialized global and static variables.

Command: readelf -S xxx / objdump -h xxx # View section address distribution.

.bss

Attributes: RWA

Description: Stores uninitialized global and static variables. .bss does not occupy actual disk space, because it is just a placeholder.

Command: readelf -S xxx / objdump -h xxx # View section address distribution.

.symtab

Description: Contains not only exported and imported symbols (those in .dynsym) but also local symbols (such as static functions and static global variables) and debug symbols.

Command: readelf -s xxx # View all symbols.

Notes on the output:

• 'Ndx' (index) displayed as UND (short for "undefined") indicates that the symbol is not defined in the shared object and needs to be resolved (imported) from other shared objects.

• A non-zero address in the 'Value' column indicates the symbol's location in the shared object file (.so file).

Appendix 2: Common Commands

  • Explicit runtime linking

    • dlopen: Loads a dynamic-link library (.so file) and returns a handle.
    • dlsym: Finds and returns the address of a symbol by using the given dynamic-link library handle and symbol name.
    • dlclose: Closes the dynamic-link library handle opened by dlopen to release resources.
    • dlerror: Returns a string that describes the last error. If no error occurs, NULL is returned.
  • Environment variables

    • LD_LIBRARY_PATH: Specifies additional library search paths for the dynamic linker. These paths are searched before the default paths.
    • LD_PRELOAD: Specifies a list of shared libraries to be loaded before all other libraries, so the libraries listed in LD_PRELOAD are always loaded first. For the remaining libraries, the dynamic linker reads the NEEDED entries in the .dynamic section and searches the following locations in order: LD_LIBRARY_PATH, the directories specified in the /etc/ld.so.conf configuration file (cached in /etc/ld.so.cache), /lib, and /usr/lib.
    • LD_DEBUG: Makes the dynamic linker print debugging information that helps developers understand what happens during dynamic linking, including library search paths and symbol resolution. When set, it writes a large amount of information (to standard error by default) and may degrade performance, so it is usually used only during debugging. Format: LD_DEBUG=[Parameter value] ./[Program name], for example, LD_DEBUG=libs ./your_program. The variable accepts the following parameters:
      • libs: prints information about each library that needs to be loaded, including the library search and loading process.
      • files: reports the processing of input files, that is, binary objects (programs or libraries).
      • symbols: reports the details of symbol resolution, including the process of symbol lookup and binding to a specific address.
      • bindings: provides information about binding to global and local symbols.
      • versions: outputs information about symbol versioning, which can show the version binding of the library.
      • all: outputs all of the above debugging information.
  • Tools and commands

    • ldd: Used to print dependencies of shared libraries. For example, running ldd /path/to/your/program can list all the dynamic-link libraries required for the program to run.
    • strip: Used to remove symbol and debugging information from programs or libraries to reduce the size of the resulting binary file. By default it removes the .symtab symbol table as well as debugging information, while strip --strip-debug /path/to/library.so removes only the debugging information. Note that debugging becomes more difficult once this information is removed.
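The four explicit runtime linking functions listed above are typically used together. The following sketch shows one plausible pattern on Linux with glibc; call_unary is a hypothetical helper written for this illustration, and libm.so.6 is used only because it appears in the LD_DEBUG log earlier. On older glibc versions the program may need to be linked with -ldl.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Load `lib`, look up `symbol` as a double(double) function, and call
 * it with `arg`. Returns 0 and stores the result in *out on success,
 * or -1 if the library or the symbol cannot be found. */
int call_unary(const char *lib, const char *symbol, double arg, double *out)
{
    void *handle = dlopen(lib, RTLD_LAZY);
    if (!handle) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return -1;
    }

    dlerror();                          /* clear any stale error state */
    void *sym = dlsym(handle, symbol);
    const char *err = dlerror();        /* non-NULL means dlsym failed */
    if (err) {
        fprintf(stderr, "dlsym: %s\n", err);
        dlclose(handle);
        return -1;
    }

    /* Conversion idiom from the dlsym documentation: store the object
     * pointer through a void ** view of the function pointer. */
    double (*fn)(double);
    *(void **)(&fn) = sym;
    *out = fn(arg);

    dlclose(handle);                    /* release the handle */
    return 0;
}
```

For example, call_unary("libm.so.6", "sqrt", 16.0, &result) is expected to store 4.0 in result on a system where that library is installed, while an unknown library name makes the function return -1 after printing the dlerror() message.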

Appendix 3: Reference Documentation

《Self-cultivation of Programmers》


Disclaimer: The views expressed herein are for reference only and don't necessarily represent the official views of Alibaba Cloud.
