Kafka Client Buffer Pool Technology: A Source-Code Walkthrough

Date: Oct 25, 2022


Abstract: When an application calls the Kafka client producer to send a message, the client internally first aggregates messages belonging to the same topic partition into a batch. What is actually sent to the Kafka broker are these batches.

Background



When our application calls the Kafka client producer to send a message, the client internally first aggregates messages belonging to the same topic partition into a batch. What is actually sent to the Kafka broker are these batches. As shown below:

[Figure: messages for the same topic partition are aggregated into a batch before being sent]

The benefit is obvious: the client and server communicate over the network, so sending in batches reduces the per-message network overhead and improves throughput.
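Batching behavior is controlled by two standard producer configuration keys, `batch.size` and `linger.ms`. A minimal sketch of a producer configuration (the broker address is a placeholder):

```java
import java.util.Properties;

public class BatchingConfig {
    // Builds producer properties that tune batching.
    public static Properties producerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        // batch.size: upper bound, in bytes, of one per-partition batch (default 16384)
        props.put("batch.size", "16384");
        // linger.ms: how long to wait for more records before sending a partial batch
        props.put("linger.ms", "5");
        return props;
    }
}
```

A larger `batch.size` with a small `linger.ms` trades a little latency for noticeably better throughput under load.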

How these batches are managed is worth discussing. Some may say: isn't that easy? Allocate a block of memory when you need it, and release it after sending.

The Kafka client is written in Java. With the naive approach above, you would allocate a new block of memory and hold a reference to it each time, then set the reference to null when releasing it and leave the rest to JVM GC.

That doesn't seem like a problem. However, under high concurrency, GC will run frequently. We all know that GC involves stop-the-world pauses; even though modern collectors keep those pauses very short, GC may still become a performance bottleneck in production.

Kafka's designers certainly took this into account. Let's see how Kafka manages batches.

How Buffer Pool Technology Works



The Kafka client uses a buffer pool: it pre-allocates real memory blocks and keeps them in a pool.

Each batch corresponds to a block of memory from the buffer pool. After the message is sent and the batch is no longer needed, the memory block is returned to the pool.

Sound familiar? Yes: database connection pools, thread pools, and other pooling techniques work on essentially the same principle. Pooling reduces the overhead of repeated creation and destruction and improves efficiency.
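To make the idea concrete, here is a minimal, hypothetical buffer-pool sketch (illustrative only, not Kafka's actual BufferPool): a queue of reusable ByteBuffers that are handed out and returned instead of being allocated and garbage-collected on every use.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

// A minimal buffer-pool sketch: fixed-size buffers are pre-allocated once,
// then acquired and released repeatedly, avoiding per-use allocation and GC.
public class SimplePool {
    private final int bufferSize;
    private final Deque<ByteBuffer> free = new ArrayDeque<>();

    public SimplePool(int bufferSize, int count) {
        this.bufferSize = bufferSize;
        for (int i = 0; i < count; i++) {
            free.addLast(ByteBuffer.allocate(bufferSize)); // pre-allocate up front
        }
    }

    public synchronized ByteBuffer acquire() {
        ByteBuffer buf = free.pollFirst();
        // Fall back to a fresh allocation if the pool is temporarily empty.
        return buf != null ? buf : ByteBuffer.allocate(bufferSize);
    }

    public synchronized void release(ByteBuffer buf) {
        buf.clear();       // reset position/limit for the next user
        free.addLast(buf); // return the block to the pool
    }

    public synchronized int available() {
        return free.size();
    }
}
```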


Code is the best documentation, so let's look at the source.

We will walk through the code top-down: first where the buffer pool is used, and then deep into the buffer pool itself for detailed analysis.

The code discussed below has been trimmed; only the parts relevant to this article are kept, for ease of analysis.

When we call the client to send a message, the underlying code calls doSend, which uses a record accumulator, RecordAccumulator, to append the message. Let's continue.

RecordAccumulator actually manages a queue of batches. Its append method calls allocate on the BufferPool (held in a field named free) to request a block of memory (a ByteBuffer), then wraps this empty memory into a batch and adds it to the tail of the queue.
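A simplified, hypothetical sketch of that append flow (the Batch type and method names here are illustrative; Kafka's real RecordAccumulator is considerably more involved):

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative append flow: try the last batch in the queue; if the record
// doesn't fit, take a fresh buffer, wrap it in a new batch, enqueue at the tail.
public class AccumulatorSketch {
    // A "batch" here is just a ByteBuffer plus a record count.
    static final class Batch {
        final ByteBuffer buffer;
        int records = 0;
        Batch(ByteBuffer buffer) { this.buffer = buffer; }
        boolean tryAppend(byte[] payload) {
            if (buffer.remaining() < payload.length) return false; // batch is full
            buffer.put(payload);
            records++;
            return true;
        }
    }

    private final Deque<Batch> queue = new ArrayDeque<>();
    private final int batchSize;

    public AccumulatorSketch(int batchSize) { this.batchSize = batchSize; }

    public void append(byte[] payload) {
        Batch last = queue.peekLast();
        if (last == null || !last.tryAppend(payload)) {
            // In Kafka, this buffer would come from BufferPool.allocate(...)
            Batch fresh = new Batch(ByteBuffer.allocate(batchSize));
            fresh.tryAppend(payload);
            queue.addLast(fresh);
        }
    }

    public int batchCount() { return queue.size(); }
}
```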



After the message is sent and the batch is no longer used, RecordAccumulator calls its deallocate method to return the memory, which internally calls BufferPool's deallocate method.

Clearly, BufferPool is the buffer-pool management class, and it is the focus of today's discussion. Let's first look at the memory-allocation method.

First of all, the whole method body is guarded by a lock, so concurrent allocation is safe.

The logic goes like this: when the requested size equals poolableSize, the memory is taken from the pool. poolableSize can be understood as the page size of the buffer pool, the basic unit of allocation. Taking from the pool just means polling a ByteBuffer from the free queue and returning it.

If the requested size does not equal this specific value, the memory comes from the non-pooled part; at the same time, some ByteBuffers may be taken from the free queue and folded into the non-pooled memory. nonPooledAvailableMemory tracks how much non-pooled memory is available. Allocating non-pooled memory simply calls ByteBuffer.allocate to allocate real JVM memory.
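A simplified sketch of the two allocation paths (illustrative only; Kafka's real BufferPool additionally blocks callers on a condition queue when memory is exhausted, rather than throwing):

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative BufferPool-style allocation:
//  - size == poolableSize: reuse a pooled ByteBuffer from the free queue
//  - otherwise: carve the request out of nonPooledAvailableMemory, folding
//    pooled buffers into it if needed, then allocate fresh heap memory.
public class PoolSketch {
    private final int poolableSize;
    private final Deque<ByteBuffer> free = new ArrayDeque<>();
    private long nonPooledAvailableMemory;

    public PoolSketch(long totalMemory, int poolableSize, int pooledBuffers) {
        this.poolableSize = poolableSize;
        for (int i = 0; i < pooledBuffers; i++) {
            free.addLast(ByteBuffer.allocate(poolableSize));
        }
        this.nonPooledAvailableMemory = totalMemory - (long) pooledBuffers * poolableSize;
    }

    public synchronized ByteBuffer allocate(int size) {
        if (size == poolableSize && !free.isEmpty()) {
            return free.pollFirst(); // fast path: reuse a pooled page
        }
        // Slow path: fold pooled buffers back into non-pooled accounting
        // until the request fits, then allocate fresh heap memory.
        while (nonPooledAvailableMemory < size && !free.isEmpty()) {
            free.pollFirst(); // the dropped buffer becomes plain garbage
            nonPooledAvailableMemory += poolableSize;
        }
        if (nonPooledAvailableMemory < size) {
            throw new IllegalStateException("pool exhausted (real code would block here)");
        }
        nonPooledAvailableMemory -= size;
        return ByteBuffer.allocate(size);
    }

    public synchronized long nonPooledAvailable() { return nonPooledAvailableMemory; }
    public synchronized int freeCount() { return free.size(); }
}
```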

Pooled memory is generally rarely reclaimed by GC. Non-pooled memory is discarded after use and then waits to be garbage-collected.

Now let's continue with how a batch is released.

It's quite simple; there are two cases. Either the buffer is returned directly to the pool, or the non-pooled available memory is updated. Then the first thread in the waiting queue is notified.
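A matching sketch of the release path (again illustrative, not Kafka's actual code):

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative BufferPool-style deallocation:
//  - a poolableSize buffer is cleared and returned to the free queue for reuse
//  - any other size only credits nonPooledAvailableMemory; the buffer itself
//    becomes garbage and is reclaimed by the JVM later.
// (Kafka's real deallocate also signals the first thread waiting for memory.)
public class ReleaseSketch {
    private final int poolableSize;
    private final Deque<ByteBuffer> free = new ArrayDeque<>();
    private long nonPooledAvailableMemory = 0;

    public ReleaseSketch(int poolableSize) { this.poolableSize = poolableSize; }

    public synchronized void deallocate(ByteBuffer buffer) {
        if (buffer.capacity() == poolableSize) {
            buffer.clear();        // reset for the next user
            free.addLast(buffer);  // case 1: back into the pool
        } else {
            // case 2: only the accounting changes; GC reclaims the buffer
            nonPooledAvailableMemory += buffer.capacity();
        }
        // real code would signal the first waiter in the waiting queue here
    }

    public synchronized int freeCount() { return free.size(); }
    public synchronized long nonPooledAvailable() { return nonPooledAvailableMemory; }
}
```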
