Seven Core Issues about eBPF

This article introduces the eBPF technology by answering seven core questions.

By Yanxun

Over the past year, ARMS has built Kubernetes monitoring on eBPF to provide non-intrusive, multi-language observability for application, system, and network performance. The eBPF technology and its ecosystem are developing rapidly. Drawing on that hands-on experience, this article introduces eBPF by answering seven core questions.

What Is eBPF?

eBPF is a technology that runs sandboxed programs in the kernel. It provides a mechanism to safely inject code when kernel or user-program events occur, so developers who do not work on the kernel can still extend and control its behavior. As the kernel has evolved, eBPF has expanded from its original use in packet filtering to networking, kernel observability, security, tracing, and more, and its feature set is still growing. The original BPF is now called classic BPF (cBPF); these extensions are why the current version is called extended BPF (eBPF).

What Are the Application Scenarios of eBPF?

Network Optimization

eBPF combines high performance with high scalability, which makes it a preferred choice for packet processing in networking solutions.

  • High Performance

The JIT compiler provides near-kernel native code execution efficiency.

  • High Scalability

Because eBPF programs run in kernel context, new protocol parsers and routing policies can be added quickly.

Troubleshooting

eBPF uses tracing mechanisms such as kprobes and tracepoints to provide both kernel and user-space tracing. This end-to-end tracing capability helps diagnose faults quickly. eBPF can also aggregate profiling statistics at the source instead of transmitting large volumes of sampled data as traditional systems do, which makes continuous real-time profiling practical.

Security Control

eBPF can see all system calls, network packets, and socket operations. Combining process-context tracing, network-level filtering, and system call filtering provides better security control.

Performance Monitoring

Traditional system monitoring components (such as sar) can only provide static counters and gauges. eBPF supports programmable, dynamic collection of custom metrics and events and aggregates them at the edge, which makes performance monitoring both more efficient and more flexible.
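To make the idea of aggregating at the source concrete, here is a minimal Python sketch of a power-of-two histogram, the kind of compact summary a tracing program can keep instead of shipping every raw sample. It is plain user-space Python, not an eBPF program, and the names log2_bucket and aggregate as well as the sample latencies are made up for illustration.

```python
from collections import Counter


def log2_bucket(value: int) -> int:
    """Return the index of the power-of-two bucket that holds value.

    Bucket 0 holds 0 and 1, bucket 1 holds 2..3, bucket 2 holds 4..7, and so on.
    """
    if value < 0:
        raise ValueError("value must be non-negative")
    return max(value.bit_length() - 1, 0)


def aggregate(samples):
    """Fold raw samples into a small {bucket index: count} table."""
    hist = Counter()
    for sample in samples:
        hist[log2_bucket(sample)] += 1
    return dict(hist)


if __name__ == "__main__":
    latencies_us = [3, 5, 6, 9, 120, 130, 1000]  # illustrative values only
    for bucket, count in sorted(aggregate(latencies_us).items()):
        print(f"bucket {bucket} (values from {1 << bucket}): {count}")
```

However many events occur, the table only grows with the number of distinct buckets, so far less data needs to cross from the collection point to the consumer.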

Why Did eBPF Emerge?

eBPF emerged to resolve the tension between slow kernel iteration and rapidly changing system requirements. A common analogy in the eBPF community is that eBPF is to the Linux kernel what JavaScript is to HTML: it makes the kernel programmable. Programmability usually brings new problems, however. Kernel modules, for example, also aim to make the kernel extensible, but they do not provide a good safety boundary. As a result, a kernel module can affect the stability of the kernel and must be adapted to different kernel versions. eBPF uses the following strategies to make itself a secure and efficient technology for programming the kernel.

  • Security

An eBPF program can be executed only after it passes the verifier, and it cannot contain unreachable instructions. An eBPF program cannot call arbitrary kernel functions; it can call only the helper functions defined in the API. The stack space of an eBPF program is limited to 512 bytes, so BPF maps must be used for larger storage.

  • Efficiency

The just-in-time (JIT) compiler translates eBPF bytecode into native machine code, and because eBPF programs run in the kernel, events can be processed without first copying data to user space. Both improve the efficiency of event processing.

  • Standard

It provides standard interfaces and data models for developers through BPF Helpers, BTF, and PERF MAP.

  • Powerful Features

eBPF not only expands the number of registers and introduces BPF maps for storage, but in the 4.x kernels also extends the original single use case of packet filtering to kernel functions, user-space functions, tracepoints, performance events (perf_events), security control, and other fields.

How to Use eBPF

Five Steps

1.  Develop an eBPF program with the C language

This is the eBPF sandbox program to be called when the insertion point triggers the event, which will run in the kernel state.

2.  Compile eBPF programs into BPF bytecode with LLVM

The eBPF program is compiled into the BPF bytecode for verification later and running within the eBPF virtual machine.

3.  Submit the BPF bytecode to the kernel through the bpf system call

In user space, the BPF bytecode is loaded into the kernel through the bpf system call.

4.  The kernel verifies and runs the BPF bytecode and saves the corresponding state to the BPF map.

The kernel verifies that the BPF bytecode is safe and ensures the correct eBPF program is called when the corresponding event occurs. If a state is to be saved, it is written to the corresponding BPF map. For example, monitoring data can be written to the BPF mapping.

5.  The user program queries the running status of BPF bytecode through BPF mapping.

The user state queries the content of the BPF mapping to obtain the status of the bytecode operation, such as obtaining the captured monitoring data.

A complete eBPF application consists of a user-space part and a kernel-space part. The user-space program interacts with the kernel through the bpf system call to load eBPF programs, attach them to events, and create and update maps. In kernel space, eBPF programs cannot call arbitrary kernel functions and must complete their tasks through BPF helper functions. In particular, to read arbitrary memory addresses they must use the bpf_probe_read family of helpers, which keeps memory access safe and efficient. When an eBPF program needs a large block of storage, choose a BPF map type that fits the application scenario and use it to expose runtime state to programs in user space.

eBPF Program Classification and Usage Scenarios

bpftool feature probe | grep program_type

You can run the preceding command to view the eBPF program types supported by the system. Generally, the following types are available:

eBPF program_type socket_filter is available
eBPF program_type kprobe is available
eBPF program_type sched_cls is available
eBPF program_type sched_act is available
eBPF program_type tracepoint is available
eBPF program_type xdp is available
eBPF program_type perf_event is available
eBPF program_type cgroup_skb is available
eBPF program_type cgroup_sock is available
eBPF program_type lwt_in is available
eBPF program_type lwt_out is available
eBPF program_type lwt_xmit is available
eBPF program_type sock_ops is available
eBPF program_type sk_skb is available
eBPF program_type cgroup_device is available
eBPF program_type sk_msg is available
eBPF program_type raw_tracepoint is available
eBPF program_type cgroup_sock_addr is available
eBPF program_type lwt_seg6local is available
eBPF program_type lirc_mode2 is NOT available
eBPF program_type sk_reuseport is available
eBPF program_type flow_dissector is available
eBPF program_type cgroup_sysctl is available
eBPF program_type raw_tracepoint_writable is available
eBPF program_type cgroup_sockopt is available
eBPF program_type tracing is available
eBPF program_type struct_ops is available
eBPF program_type ext is available
eBPF program_type lsm is available

Please visit this link for more information.
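If you want to work with this list programmatically, the lines are regular enough to parse. The following Python sketch assumes the line format shown above; the function name parse_program_types and the three-line SAMPLE are illustrative, not part of bpftool.

```python
import re

# Matches lines such as "eBPF program_type kprobe is available"
# and "eBPF program_type lirc_mode2 is NOT available".
LINE_RE = re.compile(r"^eBPF program_type (\S+) is (NOT )?available$")


def parse_program_types(text: str) -> dict:
    """Map each program type name to True (available) or False (not available)."""
    result = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            result[match.group(1)] = match.group(2) is None
    return result


SAMPLE = """\
eBPF program_type kprobe is available
eBPF program_type xdp is available
eBPF program_type lirc_mode2 is NOT available
"""

if __name__ == "__main__":
    types = parse_program_types(SAMPLE)
    print("available:", sorted(name for name, ok in types.items() if ok))
    print("missing:", sorted(name for name, ok in types.items() if not ok))
```

In practice you would pass the captured output of the bpftool command instead of SAMPLE; lines that do not match the pattern are simply ignored.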

There are mainly three scenarios:

  • Tracing

tracepoint, kprobe, perf_event, and similar types are mainly used to extract tracing information from the system and provide data for monitoring, troubleshooting, and performance optimization.

  • Network

xdp, sock_ops, cgroup_sock_addr, sk_msg, and similar types are mainly used to filter and process network packets and to implement functions such as network observation, filtering, traffic control, and performance optimization. Programs of these types can also drop or redirect packets.

Cilium uses all hook points.

  • Security and Others

lsm is used for security. Other types, such as flow_dissector and lwt_in, are not commonly used and are not described here.

What Is the Best Practice for eBPF?

Find the Insertion Point of the Kernel

Writing an eBPF program is not difficult; the hard part is finding a suitable event source to trigger it. In monitoring and diagnosis, tracing-type eBPF programs have three kinds of event sources: kernel functions (kprobe), kernel tracepoints (tracepoint), and performance events (perf_event). There are two questions to answer:

1.  What kernel functions, kernel trace points, or performance events are available in the kernel?

  • Use debugging information to obtain kernel functions and kernel trace points
sudo ls /sys/kernel/debug/tracing/events
  • Use bpftrace to obtain kernel functions and kernel trace points
# Query all kernel insertions and tracking points.
sudo bpftrace -l

# Use wildcards to query all system call tracking points.
sudo bpftrace -l 'tracepoint:syscalls:*'

# Use wildcards to query all trace points whose names contain "open".
sudo bpftrace -l '*open*'
  • Use perf list to obtain performance events
sudo perf list tracepoint

2.  How can you query the data structure definitions of kernel functions and trace points when you need to trace their arguments and return values?

  • Use debugging information
sudo cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_openat/format
  • Use bpftrace
sudo bpftrace -lv tracepoint:syscalls:sys_enter_openat

Please see bcc for more information.
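The format file is plain text, so the field layout can be extracted with a few lines of code. The Python sketch below assumes each field is described as "field:<C declaration>; offset:N; size:N; signed:N;", which is how these files are laid out on the systems I have seen; the two-field SAMPLE is an abridged illustration with example offsets, so check the real file on your kernel. The function name parse_format is hypothetical.

```python
import re

# One field description per match; assumes the
# "field:...; offset:...; size:...; signed:...;" layout described above.
FIELD_RE = re.compile(
    r"field:(?P<decl>.+?);\s*offset:(?P<offset>\d+);"
    r"\s*size:(?P<size>\d+);\s*signed:(?P<signed>[01]);"
)


def parse_format(text: str) -> list:
    """Return one dict per field with its name, declaration, offset, and size."""
    fields = []
    for match in FIELD_RE.finditer(text):
        decl = match.group("decl").strip()
        # The field name is the last token of the C declaration.
        name = decl.split()[-1].lstrip("*")
        fields.append({
            "name": name,
            "decl": decl,
            "offset": int(match.group("offset")),
            "size": int(match.group("size")),
        })
    return fields


# Abridged, illustrative sample; offsets are examples, not guaranteed values.
SAMPLE = (
    "name: sys_enter_openat\n"
    "format:\n"
    "\tfield:int __syscall_nr;\toffset:8;\tsize:4;\tsigned:1;\n"
    "\tfield:const char * filename;\toffset:24;\tsize:8;\tsigned:0;\n"
)

if __name__ == "__main__":
    for field in parse_format(SAMPLE):
        print(field["offset"], field["size"], field["name"])
```

Knowing the offset and size of each argument is what lets a tracing program read the right bytes from the event record.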

Find Insertion Points for Applications

1.  How to query the tracking point of the user process

  • Statically compiled languages retain debugging information when built with the -g compiler option. The application binary then contains DWARF (Debugging With Attributed Record Formats) data. With debugging information, you can use tools (such as readelf, objdump, and nm) to query the list of functions, variables, and other symbols that can be traced.
# Query the symbol table.
readelf -Ws /usr/lib/x86_64-linux-gnu/libc.so.6

# Query the USDT information.
readelf -n /usr/lib/x86_64-linux-gnu/libc.so.6
  • Use bpftrace
# Query uprobe.
bpftrace -l 'uprobe:/usr/lib/x86_64-linux-gnu/libc.so.6:*'

# Query USDT.
bpftrace -l 'usdt:/usr/lib/x86_64-linux-gnu/libc.so.6:*'

Uprobes are file-based. When a function in a file is traced, every process that uses the file is instrumented by default unless you filter by process PID.

The preceding applies to statically compiled languages, whose tracing is similar to kernel tracing. An application's symbol information can be stored in the ELF binary itself or in a separate debug file. The kernel's symbol information is stored in the kernel binary and is also exposed to user space through /proc/kallsyms and /sys/kernel/debug.

There are two main types of non-statically compiled languages:

1.  Interpreted Languages

Use the same query methods as for compiled applications to find uprobe and USDT trace points at the interpreter level. Relating interpreter-level behavior to application behavior requires analysis by experts in the language.

2.  Just-in-Time Compiled Languages

Application source code in these languages is first compiled into bytecode and then compiled into machine code by the just-in-time (JIT) compiler at run time. The heavy optimization involved makes tracing difficult. As with interpreted languages, uprobe and USDT tracing can only be applied to the JIT compiler itself, and the function information of the application is obtained from the trace point parameters of the JIT compiler. Relating the trace points of the JIT compiler to the behavior of the application also requires analysis by experts in the language.

You can refer to BCC's application tracing tools. User-process tracing essentially executes the uprobe handler through breakpoints. Although the kernel community has done a lot of performance tuning for BPF, tracing user-space functions (especially high-frequency ones, such as those involved in lock contention and memory allocation) may still cause significant performance overhead. Therefore, avoid tracing high-frequency functions with uprobes where possible.

Please see this link for more information.

Associate Problems with Insertion Points

Ideally, for every problem you would know exactly which insertion points to observe, but this requires support engineers to understand the end-to-end software stack in full detail. A more practical approach follows the Pareto principle: understand the core 80% of the data flow through the software stack well enough that problems will surface within that context. From there, use kernel and user stacks to inspect specific call stacks and find the root cause. For example, suppose the network is dropping packets but we do not know why. We know that the kfree_skb kernel function is called when a packet is dropped, so we can run:

sudo bpftrace -e 'kprobe:kfree_skb /comm=="<your comm>"/ {printf("kstack: %s\n", kstack);}'

This prints the call stack of the function:

kstack:
        kfree_skb+1
        udpv6_destroy_sock+66
        sk_common_release+34
        udp_lib_close+9
        inet_release+75
        inet6_release+49
        __sock_release+66
        sock_close+21
        __fput+159
        ____fput+14
        task_work_run+103
        exit_to_user_mode_loop+411
        exit_to_user_mode_prepare+187
        syscall_exit_to_user_mode+23
        do_syscall_64+110
        entry_SYSCALL_64_after_hwframe+68

You can then trace back through the functions above to see where and under what conditions each one is called, and so locate the problem. This method not only locates problems but also deepens your understanding of kernel call paths. For example:

bpftrace -e 'tracepoint:net:* { printf("%s(%d): %s %s\n", comm, pid, probe, kstack()); }'

You can view all network-related trace points and their call stacks.
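When many stacks are collected, it helps to split each one into frames so that callers can be counted or compared. The Python sketch below parses the "symbol+offset" frames of the kfree_skb stack shown earlier; parse_kstack is a hypothetical helper name, and SAMPLE contains only the first four frames of that stack.

```python
def parse_kstack(text: str) -> list:
    """Split 'kstack: sym+off sym+off ...' into (symbol, offset) tuples.

    The innermost frame (the probed function) comes first.
    """
    prefix = "kstack:"
    body = text[len(prefix):] if text.startswith(prefix) else text
    frames = []
    for token in body.split():
        if "+" in token:
            symbol, _, offset = token.rpartition("+")
            frames.append((symbol, int(offset)))
        else:
            frames.append((token, 0))  # frame printed without an offset
    return frames


SAMPLE = "kstack: kfree_skb+1 udpv6_destroy_sock+66 sk_common_release+34 udp_lib_close+9"

if __name__ == "__main__":
    frames = parse_kstack(SAMPLE)
    probed, caller = frames[0], frames[1]
    print(f"{probed[0]} was called from {caller[0]}")
```

In the sample, the direct caller is udpv6_destroy_sock, which suggests the buffer was freed while a UDP socket was being closed rather than on the receive path. That is the kind of clue the call stack is meant to surface.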

What Is the Implementation Principle of eBPF?

Five Modules

eBPF is mainly composed of five modules in the kernel:

1.  BPF Verifier

It ensures the security of eBPF programs. The verifier first builds a directed acyclic graph (DAG) of the instructions to be executed to ensure that the program contains no unreachable instructions. It then simulates the execution of the instructions to ensure that no invalid instruction will be executed. Colleagues have pointed out that the verifier cannot guarantee 100% security, so all BPF programs still need strict monitoring and review.
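The first of these checks can be illustrated with a toy model. The Python sketch below walks the control-flow graph of a tiny made-up instruction list and reports unreachable instructions, back edges (which would make the graph cyclic), and paths that run off the end of the program. The tuple-based instruction format and the function names are invented for this example; the real verifier works on eBPF bytecode and does far more.

```python
def successors(prog, pc):
    """Return the instruction indexes that can execute right after prog[pc]."""
    kind = prog[pc][0]
    if kind == "exit":
        return []
    if kind == "jmp":                 # unconditional jump
        return [prog[pc][1]]
    if kind == "jcond":               # conditional jump: fall through or jump
        return [pc + 1, prog[pc][1]]
    return [pc + 1]                   # ordinary instruction falls through


def check(prog):
    """Return a list of problems; an empty list means the toy checks pass."""
    problems = []
    state = {}  # pc -> "visiting" (on the current path) or "done"

    def visit(pc):
        if pc < 0 or pc >= len(prog):
            problems.append(f"control flow leaves the program at {pc}")
            return
        if state.get(pc) == "visiting":
            problems.append(f"back edge to instruction {pc}")
            return
        if state.get(pc) == "done":
            return
        state[pc] = "visiting"
        for nxt in successors(prog, pc):
            visit(nxt)
        state[pc] = "done"

    if prog:
        visit(0)
    for pc in range(len(prog)):
        if pc not in state:
            problems.append(f"unreachable instruction {pc}")
    return problems


if __name__ == "__main__":
    print(check([("op",), ("jcond", 3), ("op",), ("exit",)]))  # no problems
    print(check([("jmp", 2), ("op",), ("exit",)]))             # dead code
    print(check([("op",), ("jmp", 0)]))                        # loop
```

The sketch uses a depth-first search that marks instructions on the current path, which is a standard way to detect cycles in a directed graph. It is meant only to show why a program with dead code or an unbounded loop can be rejected before it ever runs.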

2.  BPF JIT

It compiles eBPF bytecode into native machine instructions for efficient execution in the kernel.

3.  A memory module consisting of multiple 64-bit registers, a program counter, and a 512-byte stack

It is used to control the execution of eBPF programs, hold stack data, and pass input and output parameters.
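A toy model can make this module more tangible. The Python sketch below simulates a machine with eleven 64-bit registers (r0 to r10, which is my understanding of the eBPF register set: r0 holds the return value, r1 the first argument, and r10 is a read-only frame pointer), a program counter, and a 512-byte stack. The instruction names such as mov_imm and store64 are invented for this example and are not real eBPF opcodes.

```python
STACK_SIZE = 512
MASK64 = (1 << 64) - 1


class ToyVM:
    """A toy register machine; not an eBPF interpreter."""

    def __init__(self):
        self.regs = [0] * 11            # r0..r10
        self.regs[10] = STACK_SIZE      # r10 points at the top of the stack
        self.stack = bytearray(STACK_SIZE)
        self.pc = 0                     # program counter

    def _set(self, dst, value):
        if dst == 10:
            raise ValueError("r10 is read-only")
        self.regs[dst] = value & MASK64  # keep values within 64 bits

    def _addr(self, off):
        addr = self.regs[10] + off       # stack slots use negative offsets
        if addr < 0 or addr + 8 > STACK_SIZE:
            raise ValueError(f"stack access out of bounds: r10{off:+d}")
        return addr

    def run(self, prog, arg=0):
        self.regs[1] = arg & MASK64      # r1 carries the input argument
        self.pc = 0
        while True:
            op, *args = prog[self.pc]
            self.pc += 1
            if op == "mov_imm":          # dst = constant
                self._set(args[0], args[1])
            elif op == "add_reg":        # dst += src
                self._set(args[0], self.regs[args[0]] + self.regs[args[1]])
            elif op == "store64":        # stack[r10 + off] = src
                addr = self._addr(args[0])
                self.stack[addr:addr + 8] = self.regs[args[1]].to_bytes(8, "little")
            elif op == "load64":         # dst = stack[r10 + off]
                addr = self._addr(args[1])
                self._set(args[0], int.from_bytes(self.stack[addr:addr + 8], "little"))
            elif op == "exit":           # r0 is the return value
                return self.regs[0]
            else:
                raise ValueError(f"unknown instruction {op!r}")


PROG = [
    ("mov_imm", 2, 40),    # r2 = 40
    ("add_reg", 2, 1),     # r2 += r1 (the argument)
    ("store64", -8, 2),    # spill r2 to the stack
    ("load64", 0, -8),     # r0 = value read back from the stack
    ("exit",),
]

if __name__ == "__main__":
    print(ToyVM().run(PROG, arg=2))
```

The bounds check in _addr mirrors the reason the stack limit matters: any access outside the 512 bytes is refused, so anything larger has to live in a BPF map.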

4.  BPF Helpers (Auxiliary Function)

They provide a set of functions through which eBPF programs interact with other kernel modules. Not every helper can be called by every eBPF program; the set of available helpers is determined by the BPF program type. Note: All changes to input and output parameters in eBPF must comply with the BPF specification. Apart from changes to local variables, all other changes must be made through BPF helpers. If no BPF helper supports a change, it cannot be made.

bpftool feature probe

Run the preceding command to see which BPF helpers each type of eBPF program can call.

5.  BPF Map and Context

They provide large blocks of storage that user-space programs can access in order to control the running state of eBPF programs.

bpftool feature probe | grep map_type

Run the preceding command to see which map types the system supports.

Three Actions

First, let's talk about the important system call bpf:

int bpf(int cmd, union bpf_attr *attr, unsigned int size);

Here, cmd selects the operation, attr carries the parameters of that operation, and size is the size of attr. The key is therefore to see which values cmd can take:

// 5.11 kernel
enum bpf_cmd {
    BPF_MAP_CREATE,
    BPF_MAP_LOOKUP_ELEM,
    BPF_MAP_UPDATE_ELEM,
    BPF_MAP_DELETE_ELEM,
    BPF_MAP_GET_NEXT_KEY,
    BPF_PROG_LOAD,
    BPF_OBJ_PIN,
    BPF_OBJ_GET,
    BPF_PROG_ATTACH,
    BPF_PROG_DETACH,
    BPF_PROG_TEST_RUN,
    BPF_PROG_GET_NEXT_ID,
    BPF_MAP_GET_NEXT_ID,
    BPF_PROG_GET_FD_BY_ID,
    BPF_MAP_GET_FD_BY_ID,
    BPF_OBJ_GET_INFO_BY_FD,
    BPF_PROG_QUERY,
    BPF_RAW_TRACEPOINT_OPEN,
    BPF_BTF_LOAD,
    BPF_BTF_GET_FD_BY_ID,
    BPF_TASK_FD_QUERY,
    BPF_MAP_LOOKUP_AND_DELETE_ELEM,
    BPF_MAP_FREEZE,
    BPF_BTF_GET_NEXT_ID,
    BPF_MAP_LOOKUP_BATCH,
    BPF_MAP_LOOKUP_AND_DELETE_BATCH,
    BPF_MAP_UPDATE_BATCH,
    BPF_MAP_DELETE_BATCH,
    BPF_LINK_CREATE,
    BPF_LINK_UPDATE,
    BPF_LINK_GET_FD_BY_ID,
    BPF_LINK_GET_NEXT_ID,
    BPF_ENABLE_STATS,
    BPF_ITER_CREATE,
    BPF_LINK_DETACH,
    BPF_PROG_BIND_MAP,
};

The core commands are those related to PROG and MAP, which handle program loading and map operations.
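Because a C enum without explicit values counts up from zero, the numeric value passed as cmd follows directly from the order of the list above. The Python sketch below mirrors that list to show the numbering and to group commands by family; BpfCmd and family are names chosen for this example, and the values hold only for the enum order shown here.

```python
from collections import Counter
from enum import IntEnum

# Names copied in order from the enum listing above.
_NAMES = """
BPF_MAP_CREATE BPF_MAP_LOOKUP_ELEM BPF_MAP_UPDATE_ELEM BPF_MAP_DELETE_ELEM
BPF_MAP_GET_NEXT_KEY BPF_PROG_LOAD BPF_OBJ_PIN BPF_OBJ_GET BPF_PROG_ATTACH
BPF_PROG_DETACH BPF_PROG_TEST_RUN BPF_PROG_GET_NEXT_ID BPF_MAP_GET_NEXT_ID
BPF_PROG_GET_FD_BY_ID BPF_MAP_GET_FD_BY_ID BPF_OBJ_GET_INFO_BY_FD
BPF_PROG_QUERY BPF_RAW_TRACEPOINT_OPEN BPF_BTF_LOAD BPF_BTF_GET_FD_BY_ID
BPF_TASK_FD_QUERY BPF_MAP_LOOKUP_AND_DELETE_ELEM BPF_MAP_FREEZE
BPF_BTF_GET_NEXT_ID BPF_MAP_LOOKUP_BATCH BPF_MAP_LOOKUP_AND_DELETE_BATCH
BPF_MAP_UPDATE_BATCH BPF_MAP_DELETE_BATCH BPF_LINK_CREATE BPF_LINK_UPDATE
BPF_LINK_GET_FD_BY_ID BPF_LINK_GET_NEXT_ID BPF_ENABLE_STATS BPF_ITER_CREATE
BPF_LINK_DETACH BPF_PROG_BIND_MAP
""".split()

# start=0 reproduces the implicit numbering of the C enum.
BpfCmd = IntEnum("BpfCmd", _NAMES, start=0)


def family(cmd) -> str:
    """Return the word after 'BPF_', for example 'MAP' or 'PROG'."""
    return cmd.name.split("_")[1]


if __name__ == "__main__":
    print(int(BpfCmd.BPF_PROG_LOAD))
    print(Counter(family(cmd) for cmd in BpfCmd).most_common(2))
```

Grouping by family shows at a glance that MAP and PROG commands make up most of the interface, which matches the focus of the three actions described in this section.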

1.  Program Loading

The BPF_PROG_LOAD command loads the BPF program into the kernel. Unlike a regular thread, however, an eBPF program does not run continuously once it is loaded; it executes only when an event triggers it. These events include system calls, kernel tracepoints, calls to and returns from kernel and user-space functions, network events, and so on. This is why the second action is required.

2.  Binding Events

b.attach_kprobe(event="xxx", fn_name="yyy")

The preceding BCC call binds a specific event to a specific BPF function. It works as follows:

(1) The BPF program is loaded through the bpf system call, and the returned file descriptor is saved.

(2) The attach operation looks up the event ID for the target function type.

(3) perf_event_open is called with that event ID to create a performance monitoring event.

(4) The BPF program is bound to the performance monitoring event through the PERF_EVENT_IOC_SET_BPF command of ioctl.

3.  Mapping Processing

The MAP-related commands create maps and add, update, and delete their entries. User space and kernel space then exchange data through these maps.
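The way the three actions fit together can be shown with a small in-process model. The Python sketch below does not call the kernel at all: ToyKernel, its methods, and the event name "kprobe:example_function" are all hypothetical stand-ins for program loading, event binding, and map access.

```python
import itertools


class ToyKernel:
    """An in-process stand-in for the kernel side; makes no real system calls."""

    def __init__(self):
        self._next_fd = itertools.count(3)
        self.progs = {}      # fd -> loaded program (a Python callable here)
        self.maps = {}       # fd -> map contents
        self.attached = {}   # event name -> list of program fds

    def map_create(self):
        fd = next(self._next_fd)
        self.maps[fd] = {}
        return fd

    def prog_load(self, func):
        """Action 1: loading returns a descriptor but runs nothing."""
        fd = next(self._next_fd)
        self.progs[fd] = func
        return fd

    def attach(self, event, prog_fd):
        """Action 2: bind a loaded program to an event."""
        if prog_fd not in self.progs:
            raise KeyError(f"no program with fd {prog_fd}")
        self.attached.setdefault(event, []).append(prog_fd)

    def fire(self, event, ctx):
        """Simulate the event: run every program attached to it."""
        for fd in self.attached.get(event, []):
            self.progs[fd](ctx, self.maps)

    def map_lookup(self, map_fd, key):
        """Action 3: user space reads state back through the map."""
        return self.maps[map_fd].get(key)


def run_demo():
    kernel = ToyKernel()
    counts_fd = kernel.map_create()

    def count_by_comm(ctx, maps):
        table = maps[counts_fd]
        table[ctx["comm"]] = table.get(ctx["comm"], 0) + 1

    prog_fd = kernel.prog_load(count_by_comm)
    kernel.fire("kprobe:example_function", {"comm": "cat"})  # not attached yet
    kernel.attach("kprobe:example_function", prog_fd)
    kernel.fire("kprobe:example_function", {"comm": "cat"})
    kernel.fire("kprobe:example_function", {"comm": "cat"})
    return kernel.map_lookup(counts_fd, "cat")


if __name__ == "__main__":
    print(run_demo())
```

The first fire call has no effect because nothing is attached yet, which is the point made above: loading alone does not run anything, and the map is the only channel through which the result becomes visible to user space.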

What Is the Current Development Status of eBPF?

Kernel Support

Recommendation: kernel version 4.14 or later

Ecosystem

The following is the bottom-up ecosystem of eBPF:

1.  Infrastructure

It supports the development of basic eBPF capabilities.

  • Linux Kernel
  • LLVM

2.  Development Tool Set

It is mainly used to load, compile, and debug eBPF programs. Different languages have different development tool sets:

  • Go

https://github.com/cilium/ebpf

https://github.com/aquasecurity/libbpfgo

  • C/C++

https://github.com/libbpf/libbpf

3.  eBPF Application

It provides a set of development tools and scripts.

Based on bcc, a script language is provided.

Network Optimization and Security

Network Security

High-Performance Four-Layer Load Balancing

Observability

Observability

Observability

Schedule the bpftrace script

The Platform for Starting and Managing eBPF Programs in a Distributed Environment

Dynamic Linux Trace

Monitoring Linux Runtime Security

4.  Websites Tracking the Ecosystem

Summary

The Premise of Using eBPF Well Is the Understanding of the Software Stack.

After reading this article, you should have a solid understanding of eBPF. eBPF only provides a framework and a mechanism. To use it well, you need to understand the software stack, find the right insertion points, and relate what you observe there to application problems.

The Advantages of eBPF Are Full Coverage, No Intrusion, and Programmability.

1.  Full Coverage

It fully covers kernel and application insertion points.

2.  No Intrusion

You do not need to modify any hooked code.

3.  Programmability

eBPF programs can be delivered dynamically, executed at the edge, and their results aggregated for analysis.

Team Information

The Alibaba Cloud Observability Team works on a variety of technical fields and products (such as frontend monitoring, application monitoring, container monitoring, Prometheus, Tracing Analysis, intelligent alerting, and O&M visualization). It aims to improve observable solutions and best practices in different industries and different technical scenarios.

Alibaba Cloud Kubernetes Monitoring is a comprehensive non-intrusive observability product developed for Kubernetes clusters based on the eBPF technology. It aims to provide an overall observability solution for IT developers and O&M personnel based on the metrics, application processes, logs, and events in Kubernetes clusters.

Introduction:
https://www.alibabacloud.com/help/en/application-real-time-monitoring-service/latest/what-is-kubernetes-monitoring

Access:
https://www.alibabacloud.com/help/en/application-real-time-monitoring-service/latest/enable-kubernetes-monitoring-for-a-kubernetes-cluster
