Kafka message retrieval practice

Introduction to scene pain points

During the use of message queues, due to their distributed characteristics, it is inevitable to encounter problems such as message loss and message retransmission.

• For example, in log aggregation scenarios, multiple heterogeneous data sources typically produce data into Kafka for consumption by downstream computing engines such as Spark. When some logs are missing, it is difficult to directly troubleshoot them from the client's logs due to the complexity of message data transmission methods, data structures, and other types.

• In the process of message forwarding, for example, the consumer may repeatedly consume the same data, which requires retrieving data from the message queue based on content to determine whether the message is being repeatedly produced. However, the message queue can only traverse and scan by partition and consumption site, and is not flexible in achieving message retrieval.

The existing message queue products in the industry do not have good tools and methods to retrieve message content, which will greatly increase the difficulty of troubleshooting and investment costs.

Kafka Message Retrieval Component

Introduction to Search Components

Message Queuing Kafka "Search Component" is a fully hosted, highly elastic, and interactive search component with trillions of second level response capabilities for message content retrieval. It aims to solve the problem that industry messaging products do not support retrieving message content. The message queue Kafka "retrieval component" uses the Kafka Connector to transfer the message data in the topic to a table store, providing message retrieval capabilities based on the multiple index function of the table store. It can support retrieval by combining one or more conditions such as the partition, location, and time range of the message, as well as full-text retrieval of messages based on the message Key and Value.

Case Practice

Case Background

Suppose an operation and maintenance team needs to monitor the operation of an online cluster, collect process level logs, and import them to Kafka. Downstream, Flink consumption is used to calculate the resource consumption of each process in real time. When it is found in Flink that the log data of a process is missing for a certain period of time, it is necessary to use the message queue Kafka "retrieval component" to retrieve the message data based on the message value and time range to determine whether the log has been successfully pushed to the message queue Kafka.

For example, the collected log data is in JSON structure, and the format of a log data is:

Opening message retrieval

First, you need to log in to the AliCloud Message Queuing Kafka console, select the corresponding topic, and activate the message retrieval service.

After the message retrieval service is activated, a Tablestore instance will be automatically created, and then the message data will be transferred to the Tablestore, and an index will be created to provide message retrieval capabilities. Each topic corresponds to a data table in the Tablestore. You can view the details of the message retrieval components for each topic on the Message Queuing Kafka console.

Message Retrieval Practice

After the message retrieval service is activated, multiple search terms in the message can be used to retrieve the message, realizing the above case. For example, specify a time range and retrieve a message with PID=276 in the message Value.

2. Example of returned results

Capacity expansion

Introduction to Table Store

Table Store is a structured data store built on the underlying Feitian platform, capable of providing hundreds of billions of data storage and millisecond level data retrieval service capabilities. After the message queue Kafka transfers messages to the Tablestore, it supports retrieving messages through the native data access method of the Tablestore. The Tablestore supports more complex retrieval logic, and also supports retrieving messages through SQL syntax. The following are two methods for retrieving messages:

Multiple Index Search

1. Log in to the table store console, enter the Tablestore instance and data table corresponding to the Kafka message data transfer, and select a multiple index search message on the index management page.

For example, it is necessary to retrieve a message whose Value contains PID=276 or PID=277.

3. Return Results

SQL Search Message

Table storage Tablestore supports retrieving messages based on SQL syntax. First, you need to create an SQL mapping table on the data table where the message is transferred.

2. Retrieve the message with PID=276 based on Tablestore SQL.


Alibaba Cloud Message Queuing Kafka "Search Component" is the first component in the field of message queuing to support interactive message content retrieval. It provides message retrieval service capabilities based on data transfer table storage, Tablestore, and supports free combination of search messages based on any condition such as Key, Value, and partition. It also supports Key and Value full-text search of messages, featuring development free, O&M free, and high flexibility. At the same time, messages can also be retrieved directly through a table store index or SQL, greatly improving the speed of daily troubleshooting for the presence or correctness of messages.

Related Articles

Explore More Special Offers

  1. Short Message Service(SMS) & Mail Service

    50,000 email package starts as low as USD 1.99, 120 short messages start at only USD 1.00

phone Contact Us