
Coolbpf Can Compile Remotely and Discover Network Jitter!

This article is an excerpt from a speech about Coolbpf Application Practice from the 2022 Apsara Conference.


Recently, at the eBPF & Linux Stability session of the OpenAnolis forum at the 2022 Apsara Conference, Wenan Mao (a maintainer of the eBPF Technology Exploration SIG) delivered a technical talk on Coolbpf application practice. The following article highlights the main points of that talk:


1. Why Should BPF Support Portability?


As BPF technology has developed, writing a BPF program has become easier. Beyond convenience, the BPF community has been pursuing another goal: portability. BPF portability means that a BPF program, once written and accepted by the kernel verifier, can run across different kernel versions.

There are two challenges in achieving BPF portability:

  1. Different kernel versions lay out the same data differently in memory.
  2. Kernel types and data structures change constantly; struct fields may be removed or renamed.

BPF CO-RE (Compile Once, Run Everywhere) is a way to achieve portability.

The following components are provided to support CO-RE:

  • BTF: describes the kernel image and provides key type information about the kernel and the BPF program.
  • Clang: emits BPF program relocation records into the .BTF/.BTF.ext sections of the object file.
  • libbpf CO-RE: relocates the BPF program at load time according to those BTF records.

Three main types of information need to be relocated:

  1. Struct-related relocation is closely tied to BTF. Clang records member offsets via __builtin_preserve_access_index().
  2. Variable relocation of map fds, global variables (.data, .bss, and .rodata), and externs relies mainly on the ELF relocation mechanism to update the imm field of eBPF instructions.
  3. Sub-function relocation places the sub-functions called by an eBPF program together with the main function so they can be loaded into the kernel together.


The steps to develop with BPF CO-RE and libbpf are listed below:

  1. Generate the header file vmlinux.h, containing all kernel types, with bpftool.
  2. Use Clang (version 10 or newer) to compile the BPF program source into a .o object file.
  3. Generate the BPF skeleton header file from the compiled BPF object file with the bpftool gen command.
  4. Include the generated BPF skeleton header file in the user-space code.
  5. Compile the user-space code; the BPF object code is embedded in the resulting binary, so you don't have to distribute extra files with your application.
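The steps above can be sketched as the following build recipe (file names such as minimal.bpf.c and user.c are placeholders; the exact compiler flags depend on your architecture and toolchain):

```shell
# 1. Dump the running kernel's BTF into vmlinux.h (all kernel types)
bpftool btf dump file /sys/kernel/btf/vmlinux format c > vmlinux.h

# 2. Compile the BPF C source into a BPF object file
clang -g -O2 -target bpf -D__TARGET_ARCH_x86 -c minimal.bpf.c -o minimal.bpf.o

# 3. Generate the skeleton header from the object file
bpftool gen skeleton minimal.bpf.o > minimal.skel.h

# 4./5. Include minimal.skel.h in the user-space code and compile it;
#       the BPF object code is embedded in the resulting binary.
cc -g user.c -lbpf -lelf -lz -o minimal
```

Steps 1 and 3 require a bpftool matching a kernel built with BTF support (CONFIG_DEBUG_INFO_BTF).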

The skeleton exposes roughly the following function calls:

<name>__open(): Create and open the BPF application. After this call, you can set skel->rodata variables.

<name>__load(): Instantiate, load, and verify the BPF application parts.

<name>__attach(): Attach all auto-attachable BPF programs. When an attached event fires (e.g., a tracepoint is hit or a network packet arrives), the BPF program is triggered.

<name>__destroy(): Detach all BPF programs and free all the resources they use.


There are three main development methods for eBPF programs, each with advantages and disadvantages:

  1. Kernel sample code: This method is based on the kernel's samples/bpf and is not CO-RE. It has no third-party open-source dependencies, and resource usage is low. The disadvantage is that you need to rebuild the entire project, which is inefficient and has poor version compatibility.
  2. BPF CO-RE: Based on libbpf and bpf_core_read, the development machine produces a binary that runs on the target machine. This method does not require deploying Clang/LLVM in the target environment and consumes few resources. However, you need to set up a build project, and some code is relatively fixed and cannot be configured dynamically.
  3. BCC: This method is based on the most widely used open-source project, and development efficiency is high. However, Clang/LLVM compilation must run on every launch, competing for resources such as memory and CPU, and the target environment must provide the matching kernel header files.

2. Function and Architecture of Coolbpf


Coolbpf provides:

  • Remote compilation (cloud compilation): the machine that compiles the program is separate from the target machine that runs it, which avoids resource contention on the target.
  • Local compilation and base-library encapsulation, making it convenient for users to call the base library when writing programs.
  • Support for low-version kernels.
  • Automatic generation and release of BTF: users can download and use it directly without manual adaptation.
  • Automated testing.
  • High-level language support (such as Python, Go, and Rust) for application development.

Coolbpf natively supports BPF's CO-RE capability, which solves the problems of compilation and resource consumption. At the same time, the complex libbpf development steps described earlier are greatly simplified: users only need to focus on functional development, not on environment setup or boilerplate code.


Coolbpf provides a standardized BPF compilation service. When a bpf.c file is submitted to the remote compilation server, the server returns a bpf.so or bpf.o matched to the target kernel version for use by high-level language applications. You then only need to load bpf.so in your high-level language code to run the program. Instead of manually invoking libbpf's open(), load(), attach(), and other functions, these steps are completed automatically by the init() function of the high-level language library. This way, users can quickly build and deploy a project and only need to focus on processing the output data.


Coolbpf also supports low-version kernels without eBPF features: we provide an eBPF driver that lets eBPF programs run safely on these kernels. We implement the verification logic of the eBPF verifier from a higher kernel version inside this driver, which performs the same security checks, making the approach safer than ordinary kernel modules. In addition, we convert all the original libbpf-based calls into ioctl system calls.

Helper functions, map creation, and program loading are all reimplemented on top of kprobes or tracepoints on lower-version kernels. Perf events and JIT are also supported. As a result, once this driver is loaded, the same user program can run safely on a lower-version kernel without any changes to the eBPF program code.

3. Network Application Practice of Coolbpf


Raptor is an observability tool based on Coolbpf. It can run on older kernels (such as AliOS and CentOS with kernel 3.10) and can also be used as an SDK by third parties to collect data.


In this network observation application, the tool monitors request and response data in system calls to determine the interaction content and the five-tuple (source/destination addresses, ports, and protocol). The data is sent to user space through BPF maps, so the observation is non-intrusive. Finally, the results are presented (such as traffic statistics and request latency).


Let's look at a specific problem to understand how Coolbpf finds network jitter in the packet-receiving path. Network packet reception is divided into two phases:

Phase 1: Through softirq, the OS delivers packets to the application's receive queue, completes the protocol-stack part of packet reception, and notifies the process.

Phase 2: After being notified, the application fetches the data from its receive queue.

With Coolbpf, we can write a BPF program that monitors just two tracepoints (tcp_probe and tcp_rcv_space_adjust) to find delay problems in phase 2.


In this case, a business application was receiving packets slowly: the kernel had already received the TCP packet, but the application did not read it until nearly one second later.

Observation method: After deploying the eBPF agent, we could see that the phase-2 packet-receiving delay was nearly one second.

Cause: Each time the delay occurred, it happened at around the 42nd second, which suggested a scheduled business task. It was eventually found that a business task periodically collected JVM parameters and stalled the application.

Solution: After the task was stopped, the jitter disappeared.

Please see the following links for more information:

Link Address of eBPF Technology Exploration SIG:
