You can use Tablestore Sink Connector to import data from Apache Kafka in batches to a data table or time series table in Tablestore.

Background information

Apache Kafka is a distributed message queue system. Data systems can use Kafka Connect to import data streams into Apache Kafka and export data streams from it.

The Tablestore team has developed Tablestore Sink Connector based on Kafka Connect. Tablestore Sink Connector pulls message records from the subscribed topics in Apache Kafka in poll mode, parses the message records, and then imports the data to Tablestore in batches. Tablestore Sink Connector optimizes the process of importing data and supports custom configurations.
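
The poll, parse, and batch-import flow described above can be illustrated with a minimal Python sketch. It is not the connector's actual implementation: parse_record, chunk, import_in_batches, the write_batch callback, and the batch size of 2 are hypothetical names and values chosen for illustration, and record values are assumed to be UTF-8 JSON.

```python
import json
from typing import Callable, Iterator, List


def parse_record(raw_value: bytes) -> dict:
    """Parse one polled record value (assumed to be UTF-8 JSON) into a row."""
    return json.loads(raw_value.decode("utf-8"))


def chunk(rows: List[dict], batch_size: int) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most batch_size rows."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def import_in_batches(
    polled_values: List[bytes],
    write_batch: Callable[[List[dict]], None],
    batch_size: int,
) -> int:
    """Parse every polled value, hand the rows to write_batch in batches,
    and return the number of batches written."""
    rows = [parse_record(value) for value in polled_values]
    batches = 0
    for batch in chunk(rows, batch_size):
        write_batch(batch)  # hypothetical stand-in for a batch write to Tablestore
        batches += 1
    return batches


if __name__ == "__main__":
    polled = [b'{"id": 1}', b'{"id": 2}', b'{"id": 3}']
    written: List[List[dict]] = []
    import_in_batches(polled, written.append, batch_size=2)
    print(written)
```

Writing in batches reduces the number of requests sent to the destination, at the cost that one batch can contain both valid and invalid records.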

Tablestore is a multi-model data storage service that is developed by Alibaba Cloud. Tablestore can store large amounts of structured data and supports a variety of data models, including the Wide Column model and the TimeSeries model. You can synchronize data from Apache Kafka to a data table or time series table in Tablestore. Data tables are a table type in the Wide Column model and time series tables are a table type in the TimeSeries model. For more information about specific operations, see Data synchronization to data tables and Data synchronization to time series tables.

Features

Tablestore Sink Connector supports the following features:

  • At-least-once delivery

    Ensures that Kafka message records are delivered from Kafka topics to Tablestore at least once.

  • Data mapping

    Deserializes data in Kafka topics by using a Converter. Before you use a Converter, set the key.converter and value.converter properties in the worker or connector configuration of Kafka Connect. You can choose the JsonConverter that is built into Kafka Connect, a third-party Converter, or a custom Converter.

  • Automatic creation of destination tables in Tablestore

    If the destination table does not exist in Tablestore, it can be automatically created based on the primary key columns and the attribute column whitelist that you specify. If no attribute column whitelist is specified, all fields in the record values of Kafka message records are used as the attribute columns of the destination table.

  • Error handling policy

    Because data is imported in batches, errors may occur for some message records when they are parsed or written to Tablestore. If an error occurs, you can terminate the task or ignore the error. You can also log the message record and the error message in Kafka or Tablestore.

Working mode

Tablestore Sink Connector can work in standalone mode or distributed mode. You can select a mode based on your business requirements.
  • In standalone mode, all tasks are executed in a single process. This mode is easy to configure and use. You can use standalone mode to learn about the features of Tablestore Sink Connector.
  • In distributed mode, tasks are executed in parallel across multiple processes. This mode allocates tasks to processes based on the workloads of the processes and provides fault tolerance during task execution, which makes it more stable than standalone mode. We recommend that you use distributed mode.
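
In Kafka Connect, a standalone worker typically reads the connector configuration from a properties file at startup, whereas connectors on distributed workers are created through the Kafka Connect REST API. The following Python sketch builds, but does not send, such a REST request. The worker address http://localhost:8083, the connector name, the topic name, and the task count are assumptions for illustration, and the connector.class value is a placeholder that must be replaced with the class name given in the connector's documentation.

```python
import json
import urllib.request


def build_create_connector_request(
    base_url: str, name: str, config: dict
) -> urllib.request.Request:
    """Build (but do not send) a POST /connectors request for a Connect worker."""
    payload = {"name": name, "config": config}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/connectors",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_create_connector_request(
        "http://localhost:8083",  # assumed address of a distributed worker
        "tablestore-sink-demo",   # hypothetical connector name
        {
            "connector.class": "REPLACE_WITH_CONNECTOR_CLASS",  # placeholder
            "tasks.max": "4",        # upper bound on parallel tasks
            "topics": "demo_topic",  # hypothetical topic to subscribe to
        },
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

To actually submit the request, pass it to urllib.request.urlopen against a running worker. Any worker in the cluster can accept it, and the cluster then distributes the tasks across its processes.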