Custom Message Queue for Apache Kafka connectors allow you to extract data from external systems into Message Queue for Apache Kafka or push data from Message Queue for Apache Kafka to external systems. This topic describes the basic terms and configurations that you need to understand when you use custom Message Queue for Apache Kafka connectors.

Basic terms

connector

A connector is a logical abstraction that specifies the data source and data sink. A connector copies data from a data source into a Message Queue for Apache Kafka topic, or copies data from a Message Queue for Apache Kafka topic to a data sink.

task

A task is a stateless logic execution unit. Each connector instance manages one or more tasks that perform the actual data transmission.

worker

A worker is a process that runs one or more connector instance threads and task threads. Kafka Connect supports the following running modes:
  • Standalone mode: Only one worker is started. All workloads are performed in the worker. This mode is not fault-tolerant.
  • Distributed mode: Multiple workers are started. All workers use the same group.id and form a worker cluster. A rebalance policy, which is similar to the rebalance policy that is used by a consumer group of Message Queue for Apache Kafka, is used to manage the connectors and schedule tasks among the workers. This mode is scalable and fault-tolerant. If a worker is added, shut down, or unexpectedly fails, the other workers detect the change and perform a rebalance to redistribute the connectors and tasks.
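As a sketch, the two modes are typically started with the scripts that ship with Apache Kafka. The property file names below are assumptions that you must adapt to your deployment:

```shell
# Standalone mode: a single worker process; the worker configuration and
# the connector configuration are both passed on the command line.
bin/connect-standalone.sh config/connect-standalone.properties config/my-connector.properties

# Distributed mode: run the same command on each host. Workers that share
# the same group.id automatically form one Connect cluster.
bin/connect-distributed.sh config/connect-distributed.properties
```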
    If you want to use the distributed mode, you must configure the following items:
    • plugin.path: a Kafka Connect configuration item that specifies the paths in which Kafka Connect looks for the executable content of connectors when Kafka Connect is started. The executable content can be JAR files or dependencies. You can configure multiple paths. Separate multiple paths with commas (,). Example:
      /usr/local/share/java,/usr/local/share/kafka/plugins,/opt/connectors

      Each directory in the preceding example can contain either an uber-JAR that packages a plugin and all of its third-party dependencies in a single file, or the individual JAR files of a plugin together with its third-party dependencies.

    • group.id: the unique ID of the Connect cluster group. The ID cannot be the same as the ID of a consumer group in Message Queue for Apache Kafka. Default value: connect-cluster.
    • config.storage.topic: the topic that is used to store the configuration information about connectors and tasks. The topic must contain one partition and multiple replicas. You must manually create the topic because a topic that is automatically created by the system may contain multiple partitions.
    • offset.storage.topic: the topic that is used to store the information about offsets. The topic must contain multiple partitions and replicas. Default value: connect-offsets.
    • status.storage.topic: the topic that is used to store the information about status. This topic can contain multiple partitions and replicas. Default value: connect-status.
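Putting the preceding items together, a minimal distributed-mode worker configuration file might look like the following sketch. The broker address and the replication factors are assumptions that you must adapt to your cluster:

```properties
# Kafka cluster to connect to (assumed address).
bootstrap.servers=kafka-broker1:9092

# All workers that share this ID form one Connect cluster.
group.id=connect-cluster

# Internal topics for connector configurations, offsets, and status.
config.storage.topic=connect-configs
offset.storage.topic=connect-offsets
status.storage.topic=connect-status
config.storage.replication.factor=3
offset.storage.replication.factor=3
status.storage.replication.factor=3

# Paths in which Kafka Connect looks for connector plugins.
plugin.path=/usr/local/share/java,/usr/local/share/kafka/plugins,/opt/connectors
```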

converter

A converter is a component that is used to convert data formats. You can use converters to serialize and deserialize message data between Message Queue for Apache Kafka and other external services to ensure compatibility on data formats and structures. Converters can be configured for workers and connectors. The converter configurations of connectors can overwrite the converter configurations of workers. Converters can be used to convert between the following data formats: Avro, Protobuf, String, JSON, JSON Schema, and ByteArray.
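For example, a worker can default to JSON while a single connector overrides the converter for its own records. The following properties are a sketch that uses converter classes shipped with Apache Kafka:

```properties
# Worker-level defaults, set in the worker configuration file.
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=false

# Connector-level override, set in the connector configuration.
# This value takes precedence over the worker-level default.
value.converter=org.apache.kafka.connect.storage.StringConverter
```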

Connector configurations

Important If you configure a connector both in the EventBridge console and in a ZIP file, the configurations in the console overwrite the configurations in the ZIP file.
The following parameters are required:
  • name: The name of the connector. In most cases, the name is a string that does not contain ISO control characters. Example: mongo-sink.
  • connector.class: The class name or alias of the connector. The class must be a subclass of org.apache.kafka.connect.connector.Connector. Example: com.mongodb.kafka.connect.MongoSinkConnector.
  • tasks.max: The maximum number of tasks. Valid values: 1 to the maximum number of partitions in the Message Queue for Apache Kafka topic. Example: 1.
  • topics: The source topics. This parameter takes effect when Message Queue for Apache Kafka Parameters is set to Sink Connect. Separate multiple topics with commas (,). Example: sourceA,sourceB.

For information about optional parameters, see 3.5 Kafka Connect Configs.

The following code shows the configuration of a connector whose data sink is a JDBC database:
name=testConnector
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
tasks.max=2
topics=connect-test
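In distributed mode, a connector is usually created by sending its configuration to the Kafka Connect REST API instead of passing a properties file. The following request is a sketch of the equivalent call for the connector above; the listener address and port are assumptions:

```shell
curl -X POST -H "Content-Type: application/json" \
  http://localhost:8083/connectors \
  -d '{
    "name": "testConnector",
    "config": {
      "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
      "tasks.max": "2",
      "topics": "connect-test"
    }
  }'
```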