
EventBridge: Kafka Connect

Last Updated: Mar 10, 2026

Kafka Connect is a framework for streaming data between Apache Kafka and external systems. Custom Message Queue for Apache Kafka connectors in EventBridge let you extract data from or push data to Apache RocketMQ clusters, as well as other external systems.

The following diagram shows the data flow through a Kafka Connect pipeline:

External system --> Source connector --> Kafka topics --> Sink connector --> External system

Core concepts

Kafka Connect has four core abstractions that work together to move data:

  • Connectors -- Coordinate data streaming by managing tasks. Source connectors read from external systems and write to Kafka topics; sink connectors read from Kafka topics and write to external systems.

  • Tasks -- Copy data between Kafka and external systems. Tasks are stateless execution units that run in parallel; each connector instance manages one or more tasks for data transmission.

  • Workers -- Run connector and task threads, in either standalone or distributed mode.

  • Converters -- Serialize and deserialize message data, ensuring format compatibility between Message Queue for Apache Kafka and external systems.

Connectors

A connector defines where data comes from and where it goes. There are two types:

  • Source connector -- Reads from an external system and writes records to one or more Kafka topics. For example, a database source connector can stream table updates into Kafka.

  • Sink connector -- Reads records from one or more Kafka topics and writes them to an external system. For example, a JDBC sink connector can push data from Kafka into a relational database.
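To make the two types concrete, here is a hypothetical standalone-mode example using the FileStream connectors that ship with Apache Kafka; the file paths, topic name, and connector names are illustrative, and each configuration would live in its own properties file:

```properties
# file-source.properties -- source connector: stream lines from a local file into a topic
name=file-source
connector.class=org.apache.kafka.connect.file.FileStreamSourceConnector
tasks.max=1
file=/tmp/input.txt
topic=connect-file-topic
```

```properties
# file-sink.properties -- sink connector: write records from the same topic out to a file
name=file-sink
connector.class=org.apache.kafka.connect.file.FileStreamSinkConnector
tasks.max=1
file=/tmp/output.txt
topics=connect-file-topic
```

Note that a source connector names its destination with the singular topic property, while a sink connector consumes from a (possibly comma-separated) topics list.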

Tasks

A task is a stateless execution unit. Each connector instance manages one or more tasks for parallel data transmission.

Task scaling considerations:

  • For sink connectors, the number of tasks is bounded by the number of partitions being consumed. Valid values for tasks.max range from 1 to the total number of partitions across the consumed Kafka topics.

  • For source connectors, how source data is partitioned depends on the connector implementation. The connector may create fewer tasks than the tasks.max value.
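The scaling rules above can be sketched in a few lines of Python. This is not Kafka source code, just a model of the reasoning: a sink connector never runs more useful tasks than there are partitions, and partitions are balanced across tasks much like a consumer group balances them across consumers.

```python
# Sketch (not Kafka internals): reasoning about sink-connector task scaling.

def effective_task_count(tasks_max: int, num_partitions: int) -> int:
    """A sink connector cannot usefully run more tasks than partitions."""
    return min(tasks_max, num_partitions)

def assign_partitions(num_partitions: int, num_tasks: int) -> list[list[int]]:
    """Round-robin partitions across tasks, consumer-group style."""
    assignments: list[list[int]] = [[] for _ in range(num_tasks)]
    for partition in range(num_partitions):
        assignments[partition % num_tasks].append(partition)
    return assignments

# With tasks.max=4 but only 3 partitions, only 3 tasks do useful work.
tasks = effective_task_count(4, 3)
print(tasks)                         # 3
print(assign_partitions(6, tasks))   # [[0, 3], [1, 4], [2, 5]]
```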

Workers

A worker is a process that runs one or more connector instance threads and task threads. Kafka Connect supports two running modes:

Standalone mode

A single worker handles all connectors and tasks. Simpler to set up, but not fault-tolerant -- if the worker fails, all data streaming stops.

Distributed mode

Multiple workers share the same group.id and form a cluster. A rebalance policy -- similar to Kafka consumer group rebalancing -- distributes connectors and tasks across workers. This mode is scalable and fault-tolerant:

  • When a worker is added or removed, the cluster rebalances to redistribute connectors and tasks evenly.

  • If a worker fails unexpectedly, other workers detect the failure and rebalance automatically.
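The rebalancing behavior can be illustrated with a simplified Python model. The real protocol is more involved (it negotiates assignments through the group coordinator), but the effect is the same: tasks are spread as evenly as possible over whichever workers are currently alive. Worker and task names here are illustrative.

```python
# Simplified model of Connect rebalancing: redistribute tasks over live workers.

def rebalance(tasks: list[str], workers: list[str]) -> dict[str, list[str]]:
    """Round-robin tasks across the currently live workers."""
    assignment: dict[str, list[str]] = {w: [] for w in workers}
    for i, task in enumerate(tasks):
        assignment[workers[i % len(workers)]].append(task)
    return assignment

tasks = ["sink-0", "sink-1", "sink-2", "sink-3"]

# Two live workers: each gets two tasks.
print(rebalance(tasks, ["worker-a", "worker-b"]))

# worker-b fails: the survivor picks up all four tasks after the rebalance.
print(rebalance(tasks, ["worker-a"]))
```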

Distributed mode configuration

To run Kafka Connect in distributed mode, configure the following parameters:

  • plugin.path -- Comma-separated list of directories containing connector plugin JARs and their dependencies. Kafka Connect loads plugins from these paths at startup. Each directory can contain either uber-JARs (single JARs with all dependencies bundled) or a set of individual JARs together with their third-party dependencies. No default; required.

  • group.id -- Unique identifier for the Connect worker cluster. Must differ from any Kafka consumer group ID. Default: connect-cluster.

  • config.storage.topic -- Topic for storing connector and task configuration. Must have exactly one partition and multiple replicas. Create this topic manually -- auto-created topics may have multiple partitions. No default; required.

  • offset.storage.topic -- Topic for storing connector offset data. Must have multiple partitions and replicas. Default: connect-offsets.

  • status.storage.topic -- Topic for storing connector and task status. Can have multiple partitions and replicas. Default: connect-status.

plugin.path example:

/usr/local/share/java,/usr/local/share/kafka/plugins,/opt/connectors
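Putting the parameters above together, a minimal distributed-mode worker file might look like the following sketch. The broker addresses, replication factors, and partition counts are illustrative; adjust them to your cluster:

```properties
# connect-distributed.properties (illustrative values)
bootstrap.servers=broker1:9092,broker2:9092
group.id=connect-cluster
plugin.path=/usr/local/share/kafka/plugins

# Internal topics; create config.storage.topic manually with exactly one partition.
config.storage.topic=connect-configs
config.storage.replication.factor=3
offset.storage.topic=connect-offsets
offset.storage.partitions=25
offset.storage.replication.factor=3
status.storage.topic=connect-status
status.storage.partitions=5
status.storage.replication.factor=3

# Worker-level converters (see the Converters section).
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
```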

Converters

A converter serializes and deserializes message data between Message Queue for Apache Kafka and external systems, ensuring compatibility across data formats and structures.

Converters operate at two levels:

  • Worker level -- Applies to all connectors on that worker.

  • Connector level -- Overrides the worker-level setting for a specific connector.
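As a sketch of the two levels, a worker might default to schemaless JSON while a single connector overrides the value converter. The converter class names below are the standard converters bundled with Apache Kafka; the connector name is illustrative:

```properties
# Worker configuration: default converters for every connector on this worker.
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=false
```

```properties
# Connector configuration: this one connector overrides the worker-level value converter.
name=string-sink
value.converter=org.apache.kafka.connect.storage.StringConverter
```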

Supported formats

Converters can be used to convert between the following data formats: Avro, Protobuf, String, JSON, JSON Schema, and ByteArray.

Connector configuration

Important

If you configure connectors in both the EventBridge console and a ZIP file, the console configuration overwrites the ZIP file configuration.

Required parameters

  • name -- Connector name. A string with no ISO control characters. Example: mongo-sink.

  • connector.class -- Fully qualified class name or alias of the connector. Must be a subclass of org.apache.kafka.connect.connector.Connector. Example: com.mongodb.kafka.connect.MongoSinkConnector.

  • tasks.max -- Maximum number of tasks to create. The effective upper limit depends on the connector type. Example: 1.

  • topics -- Comma-separated list of source topics. Applies when Message Queue for Apache Kafka Parameters is set to Sink Connect. Example: sourceA,sourceB.

For optional parameters, see Kafka Connect Configs in the Apache Kafka documentation.

Example: JDBC sink connector

name=testConnector
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
tasks.max=2
topics=connect-test
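In distributed mode, the same connector would be created through the Connect REST API rather than a properties file. The example above maps to a JSON request body like the following sketch; a real JDBC sink also needs connection settings such as connection.url, which are omitted here:

```json
{
  "name": "testConnector",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "tasks.max": "2",
    "topics": "connect-test"
  }
}
```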