Migrate Kafka data to MaxCompute

Integrating MaxCompute with Kafka offers efficient and reliable data processing and analytics capabilities. This integration is ideal for scenarios that require real-time processing, large-scale data streams, and complex data analytics. This topic describes how to write data from Message Queue for Apache Kafka and self-managed Kafka instances to MaxCompute and provides a detailed example for a self-managed Kafka instance.

Write Kafka data to MaxCompute: Alibaba Cloud fully managed Kafka

MaxCompute is tightly integrated with Message Queue for Apache Kafka. You can use the MaxCompute Sink Connector for Message Queue for Apache Kafka to continuously import data from a specified topic into a MaxCompute table without requiring third-party tools or custom development. For more information, see Create a MaxCompute sink connector.

Write Kafka data to MaxCompute: Self-managed open source Kafka

Prerequisites

You have deployed Kafka V2.2 or later (V3.4.0 is recommended) and created a Kafka topic.
You have created a MaxCompute project and table. For more information, see Create a MaxCompute project and Create a table.

Notes

The Kafka connector supports writing data in TEXT, CSV, JSON, and FLATTEN formats. The following notes apply to each format. For more information about data types, see Data type descriptions.

When you write Kafka data in TEXT or JSON format to MaxCompute, the MaxCompute table must meet the following requirements:

Field name	Field type	Fixed field
topic	STRING	Yes
partition	BIGINT	Yes
offset	BIGINT	Yes
key	STRING for TEXT data. STRING or JSON for JSON data, depending on the data being written.	Yes. Stores the key of the Kafka message. Whether the key is written depends on the mode parameter.
value	STRING for TEXT data. STRING or JSON for JSON data, depending on the data being written.	Yes. Stores the value of the Kafka message. Whether the value is written depends on the mode parameter.
pt	STRING (partition field)	Yes
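As an illustration of how the fixed columns relate to a Kafka message, the following Python sketch builds one row for a TEXT or JSON target table under each value of the mode parameter. It is not the connector's implementation: the KafkaRecord class and the to_row function are hypothetical, and the pt value is passed in rather than derived from the partition window.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KafkaRecord:
    """Hypothetical stand-in for a consumed Kafka message."""
    topic: str
    partition: int
    offset: int
    key: Optional[str]
    value: Optional[str]


def to_row(record: KafkaRecord, mode: str, pt: str) -> dict:
    """Build a row for a TEXT/JSON target table; mode is KEY, VALUE, or DEFAULT."""
    if mode not in ("KEY", "VALUE", "DEFAULT"):
        raise ValueError(f"unsupported mode: {mode}")
    return {
        "topic": record.topic,
        "partition": record.partition,
        "offset": record.offset,
        # KEY and DEFAULT keep the message key; VALUE leaves the column NULL (None).
        "key": record.key if mode in ("KEY", "DEFAULT") else None,
        # VALUE and DEFAULT keep the message value; KEY leaves the column NULL (None).
        "value": record.value if mode in ("VALUE", "DEFAULT") else None,
        "pt": pt,
    }


row = to_row(KafkaRecord("topic_text", 0, 0, "123", "abc"), "VALUE", "07-13-2023 21:13")
print(row["key"], row["value"])  # None abc
```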

When you write Kafka data in FLATTEN or CSV format to MaxCompute, the table must include the following fields and data types. You can define other fields based on the data that you are writing.
Field name	Field type
topic	STRING
partition	BIGINT
offset	BIGINT
pt	STRING (partition field)

- When you write Kafka data in CSV format to a MaxCompute table, the order and data types of the custom fields in the MaxCompute table must match the columns in the Kafka data to ensure a successful write operation.
- When you write Kafka data in FLATTEN format to a MaxCompute table, the names of the custom fields in the MaxCompute table must match the field names in the Kafka data to ensure a successful write operation.
  For example, if the FLATTEN Kafka data is {"A":1,"B":"b","C":{"D":"d","E":"e"}}, create the MaxCompute table as follows:
```
CREATE TABLE IF NOT EXISTS table_flatten(
 topic STRING,
 `partition` BIGINT,
 `offset` BIGINT,
 A BIGINT,
 B STRING,
 C JSON
) PARTITIONED BY (pt STRING);
```
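The FLATTEN matching rule can be sketched in Python as follows. This is an illustrative model, not the connector's code: flatten_to_row is a hypothetical helper, the value 1 is used for A so that the message is valid JSON, and treating an absent field as NULL is an assumption. Nested objects such as C are kept whole and serialized as JSON text, which matches C being declared as a JSON column in the table above.

```python
import json


def flatten_to_row(message_value: str, custom_columns: list) -> dict:
    """Pick top-level JSON fields by column name, as the FLATTEN format requires."""
    data = json.loads(message_value)
    row = {}
    for column in custom_columns:
        value = data.get(column)  # assumption: an absent field becomes NULL (None)
        if isinstance(value, (dict, list)):
            # Nested objects stay intact and are stored as JSON text (for a JSON column).
            value = json.dumps(value, separators=(",", ":"))
        row[column] = value
    return row


row = flatten_to_row('{"A":1,"B":"b","C":{"D":"d","E":"e"}}', ["A", "B", "C"])
print(row)  # {'A': 1, 'B': 'b', 'C': '{"D":"d","E":"e"}'}
```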

Configure and start the Kafka connector service

This example uses a Linux environment. In a command window, run the following command to download the kafka-connector-2.0.jar package, or download the package directly from the URL in the command.
```
wget http://maxcompute-repo.oss-cn-hangzhou.aliyuncs.com/kafka/kafka-connector-2.0.jar
```
To prevent dependency conflicts, create a subfolder, such as connector, in the $KAFKA_HOME/libs directory and place the kafka-connector-2.0.jar package in it.
Note
If the kafka-connector-2.0.jar package is not compatible with your Kafka deployment environment, see Configure Kafka-connector for more information about how to configure and start the Kafka-connector service.

In the $KAFKA_HOME/config directory, configure the connect-distributed.properties file.

Add the following content to the connect-distributed.properties file.

```
## Add the following content
plugin.path=<KAFKA_HOME>/libs/connector

## Update the values of the key.converter and value.converter parameters
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.storage.StringConverter
```
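To check a worker configuration before you start the service, you could use a small Python script such as the following. It is a sketch, not part of Kafka: it understands only simple key=value lines (no escapes or line continuations), and it assumes that the connector subfolder is named connector, as in the example above. As with Java properties files, a key that appears more than once keeps its last value.

```python
# Values that the connect-distributed.properties changes above are expected to set.
EXPECTED_CONVERTERS = {
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "org.apache.kafka.connect.storage.StringConverter",
}


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines; lines starting with '#' or '!' are comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()  # a later duplicate overrides an earlier one
    return props


def check_worker_config(text: str) -> list:
    """Return human-readable problems; an empty list means the checked settings look right."""
    props = parse_properties(text)
    problems = []
    # plugin.path may list several directories separated by commas.
    paths = [p.strip().rstrip("/") for p in props.get("plugin.path", "").split(",")]
    if not any(p.endswith("/libs/connector") for p in paths):
        problems.append("plugin.path does not include the libs/connector subfolder")
    for key, expected in EXPECTED_CONVERTERS.items():
        if props.get(key) != expected:
            problems.append(f"{key} should be {expected}")
    return problems


sample = """
plugin.path=/opt/kafka/libs/connector
key.converter=org.apache.kafka.connect.json.JsonConverter
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
"""
print(check_worker_config(sample))
# ['value.converter should be org.apache.kafka.connect.storage.StringConverter']
```

To check a real deployment, read $KAFKA_HOME/config/connect-distributed.properties into a string and pass it to check_worker_config.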

In the $KAFKA_HOME directory, run the following command to start the Kafka-connector service.
```
## Start command
bin/connect-distributed.sh config/connect-distributed.properties &
```

Configure and start the Kafka connector task

Create and configure the odps-sink-connector.json configuration file. Then, upload the odps-sink-connector.json file to any location.

The content and parameters of the odps-sink-connector.json configuration file are described in the following sections.

```
{
  "name": "Kafka connector task name",
  "config": {
    "connector.class": "com.aliyun.odps.kafka.connect.MaxComputeSinkConnector",
    "tasks.max": "3",
    "topics": "your_topic",
    "endpoint": "endpoint",
    "tunnel_endpoint": "your_tunnel endpoint",
    "project": "project",
    "schema": "default",
    "table": "your_table",
    "account_type": "account type (STS or ALIYUN)",
    "access_id": "access id",
    "access_key": "access key",
    "account_id": "account id for sts",
    "sts.endpoint": "sts endpoint",
    "region_id": "region id for sts",
    "role_name": "role name for sts",
    "client_timeout_ms": "STS Token valid period (ms)",
    "format": "TEXT",
    "mode": "KEY",
    "partition_window_type": "MINUTE",
    "use_streaming": false,
    "buffer_size_kb": 65536,
    "sink_pool_size": "150",
    "record_batch_size": "8000",
    "runtime.error.topic.name": "kafka topic when runtime errors happens",
    "runtime.error.topic.bootstrap.servers": "kafka bootstrap servers of error topic queue",
    "skip_error": "false"
  }
}
```

Common parameters

Parameter	Required	Description
name	Yes	The name of the task. The name must be unique.
connector.class	Yes	The class name of the Kafka connector. Set this parameter to `com.aliyun.odps.kafka.connect.MaxComputeSinkConnector`.
tasks.max	Yes	The maximum number of tasks that the Kafka connector can create to consume data. The value must be an integer greater than 0.
topics	Yes	The name of the Kafka topic.
endpoint	Yes	The endpoint of the MaxCompute service. You must configure the endpoint based on the region and network connectivity type you selected when creating the MaxCompute project. For the endpoints of each region and network, see Endpoints.
tunnel_endpoint	No	The public endpoint of the Tunnel service. If you do not configure this parameter, requests are automatically routed to the Tunnel endpoint that corresponds to the network where the MaxCompute service resides. If you configure this parameter, the configured endpoint takes precedence and automatic routing is disabled. For the endpoints of each region and network, see Endpoints.
project	Yes	The name of the target MaxCompute project.
schema	No	The name of the schema. This parameter is required only if the target MaxCompute project uses the three-layer schema model, in which case the default value is default. For more information about schemas, see Schema operations.
table	Yes	The name of the table in the target MaxCompute project.
format	No	The format of the messages to be written. Valid values: TEXT (default): The message is a string. BINARY: The message is a byte array. CSV: The message is a string with values separated by commas (,). JSON: The message is a string in the JSON data type. For more information about the MaxCompute JSON type, see JSON data type. FLATTEN: The message is a string in the JSON data type. The keys and values in the JSON string are parsed and written to the corresponding columns in the MaxCompute table. The keys in the JSON data must correspond to the column names in the MaxCompute table. For examples of importing messages in different formats, see Usage examples.
mode	No	The mode for syncing messages to MaxCompute. Valid values: KEY: Retains only the message key and writes the key to the target MaxCompute table. VALUE: Retains only the message value and writes the value to the target MaxCompute table. DEFAULT (default): Retains both the message key and value and writes them to the target MaxCompute table. In DEFAULT mode, only TEXT and BINARY data formats are supported.
partition_window_type	No	Partitions data by system time. Valid values: DAY, HOUR (default), and MINUTE.
use_streaming	No	Specifies whether to use the streaming data tunnel. Valid values: false (default): Disabled. true: Enabled.
buffer_size_kb	No	The size of the internal buffer of the ODPS partition writer, in KB. The default value is 65536.
sink_pool_size	No	The maximum number of threads for multi-threaded writing. The default value is the number of CPU cores in the system.
record_batch_size	No	The maximum number of messages that a single thread in a Kafka connector task can send at one time.
skip_error	No	Specifies whether to skip records that cause unknown errors. Valid values: false (default): does not skip the records. true: skips the records. Note: If skip_error is set to false and runtime.error.topic.name is not configured, the process stops writing data when an unknown error occurs, is blocked, and throws an exception in the log. If skip_error is set to true and runtime.error.topic.name is not configured, the process continues writing data and discards the abnormal data. If skip_error is set to false and runtime.error.topic.name is configured, the process continues writing data and records the abnormal data in the topic specified by runtime.error.topic.name. For an example of how to handle abnormal data, see Abnormal data handling example.
runtime.error.topic.name	No	The name of the Kafka topic to which data that causes unknown errors during the write operation is written.
runtime.error.topic.bootstrap.servers	No	The bootstrap server address of the Kafka instance to which data that causes unknown errors during the write operation is written.
account_type	Yes	The method to access the target MaxCompute service. Valid values are STS and ALIYUN. The default value is ALIYUN. Different access methods require different access credential parameters. For more information, see Access MaxCompute using the ALIYUN method and Access MaxCompute using the STS method.
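The three combinations of skip_error and runtime.error.topic.name described in the parameter table above can be summarized as a small Python function. The function name and the returned labels are hypothetical. The case in which skip_error is true and runtime.error.topic.name is also configured is not described in this topic, so the sketch reports it as undocumented rather than guessing.

```python
from typing import Optional


def on_unknown_error(skip_error: bool, error_topic: Optional[str]) -> str:
    """Describe what happens to a record that causes an unknown write error."""
    if not skip_error and error_topic is None:
        return "block"  # writing stops, the process is blocked, and an exception is logged
    if skip_error and error_topic is None:
        return "discard"  # writing continues and the abnormal record is dropped
    if not skip_error:
        return f"redirect to {error_topic}"  # writing continues; the record goes to the error topic
    return "undocumented"  # skip_error=true together with an error topic


for skip, topic in [(False, None), (True, None), (False, "runtime_error"), (True, "runtime_error")]:
    print(skip, topic, "->", on_unknown_error(skip, topic))
```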

If account_type is set to ALIYUN, you must configure the following parameters in addition to the common parameters.

Parameter	Description
access_id	The AccessKey ID of your Alibaba Cloud account or RAM user. You can obtain the AccessKey ID on the AccessKey Management page.
access_key	The AccessKey secret that corresponds to the AccessKey ID.

If account_type is set to STS, you must configure the following parameters in addition to the common parameters.

Parameter	Description
account_id	The ID of the account used to access the target MaxCompute project. You can view your account ID in the Account Center.
region_id	The region ID of the target MaxCompute project. For the ID of each region, see Endpoints.
role_name	The name of the role used to access the target MaxCompute project. You can view the role name on the Roles page.
client_timeout_ms	The refresh interval for the Security Token Service (STS) token, in milliseconds (ms). The default value is 11 ms.
sts.endpoint	The endpoint of the STS service required for identity authentication using a temporary security token (STS). For the endpoints of each region and network, see Endpoints.
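Before you submit a task, you can check that the required parameters are present. The following Python sketch encodes the Required column of the common parameters table and the credential parameters listed for each account_type. The function name and the sample values are hypothetical placeholders, and the payload has the same shape as the odps-sink-connector.json file.

```python
# Required common parameters (name sits outside the config object).
COMMON_REQUIRED = ("connector.class", "tasks.max", "topics", "endpoint",
                   "project", "table", "account_type")
# Credential parameters that each access method needs in addition.
CREDENTIALS = {
    "ALIYUN": ("access_id", "access_key"),
    "STS": ("account_id", "region_id", "role_name", "client_timeout_ms", "sts.endpoint"),
}


def missing_parameters(payload: dict) -> list:
    """Return the required parameters that are absent or empty in a connector payload."""
    config = payload.get("config", {})
    missing = [] if payload.get("name") else ["name"]
    missing += [key for key in COMMON_REQUIRED if not config.get(key)]
    # If account_type is absent it is already reported; fall back to ALIYUN, the default.
    account_type = config.get("account_type") or "ALIYUN"
    if account_type not in CREDENTIALS:
        raise ValueError(f"account_type must be STS or ALIYUN, got {account_type!r}")
    missing += [key for key in CREDENTIALS[account_type] if not config.get(key)]
    return missing


payload = {
    "name": "odps-test-sts",
    "config": {
        "connector.class": "com.aliyun.odps.kafka.connect.MaxComputeSinkConnector",
        "tasks.max": "3",
        "topics": "topic_text",
        "endpoint": "http://service.cn-shanghai.maxcompute.aliyun.com/api",
        "project": "project_name",
        "table": "table_text",
        "account_type": "STS",
        "account_id": "<yourAccountId>",
        "region_id": "cn-shanghai",
    },
}
print(missing_parameters(payload))  # ['role_name', 'client_timeout_ms', 'sts.endpoint']
```

To check an existing file, load it with json.load and pass the result to missing_parameters.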

Run the following command to start the Kafka connector data migration task.

```
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" http://localhost:8083/connectors -d @odps-sink-connector.json
```
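If you prefer to submit the task from a script, the following Python sketch builds the same POST /connectors request that the curl command sends to the Kafka Connect REST API, using only the standard library. The base URL http://localhost:8083 matches the curl command, and the helper names are hypothetical. The path /connectors/<name>/status is the standard Kafka Connect REST endpoint for checking a connector and its tasks after creation.

```python
import json
import urllib.request

CONNECT_URL = "http://localhost:8083"  # Kafka Connect REST address used by the curl command


def create_connector_request(payload: dict, base_url: str = CONNECT_URL) -> urllib.request.Request:
    """Build a POST /connectors request equivalent to the curl command above."""
    return urllib.request.Request(
        f"{base_url}/connectors",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


def status_url(name: str, base_url: str = CONNECT_URL) -> str:
    """URL of the Kafka Connect endpoint that reports a connector's and its tasks' state."""
    return f"{base_url}/connectors/{name}/status"


def submit(path: str) -> int:
    """Send odps-sink-connector.json to Kafka Connect and return the HTTP status code."""
    with open(path, encoding="utf-8") as f:
        request = create_connector_request(json.load(f))
    # urlopen raises urllib.error.HTTPError for error responses.
    with urllib.request.urlopen(request) as response:
        return response.status
```

For example, submit("odps-sink-connector.json") returns 201 when Kafka Connect creates the connector; an error response, such as 409 Conflict for a connector name that already exists, raises urllib.error.HTTPError instead.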

Write TEXT data

Prepare the data.

Use a local client (odpscmd) or another tool that can run MaxCompute SQL commands to connect to MaxCompute and create the target table.

```
CREATE TABLE IF NOT EXISTS table_text(
  topic STRING,
  `partition` BIGINT,
  `offset` BIGINT,
  key STRING,
  value STRING
) PARTITIONED BY (pt STRING);
```

Create the Kafka data.

In the $KAFKA_HOME/bin/ directory, run the following command to create a Kafka topic. This example uses topic_text as the topic name.

```
sh kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic topic_text
```

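Run the following command to create Kafka messages. In each line, separate the key from the value with a tab character, which is the console producer's default key separator when parse.key=true is set.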

```
sh kafka-console-producer.sh --bootstrap-server localhost:9092 --topic topic_text --property parse.key=true
>123	abc
>456	edf
```
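The console producer splits each input line at its key.separator property, which defaults to a tab when parse.key=true is set. If you generate test messages with a script instead of typing them, a minimal Python sketch such as the following writes correctly separated lines to standard output. The producer_lines helper and the file name gen_messages.py used below are hypothetical.

```python
import sys

KEY_SEPARATOR = "\t"  # default key.separator of kafka-console-producer.sh


def producer_lines(messages):
    """Yield one 'key<TAB>value' line per (key, value) pair."""
    for key, value in messages:
        if KEY_SEPARATOR in key or "\n" in key or "\n" in value:
            # A tab in the key or a newline anywhere would split the message incorrectly.
            raise ValueError(f"cannot encode message with key {key!r}")
        yield f"{key}{KEY_SEPARATOR}{value}\n"


if __name__ == "__main__":
    sys.stdout.writelines(producer_lines([("123", "abc"), ("456", "edf")]))
```

Save the script as gen_messages.py and pipe its output into the producer, for example `python3 gen_messages.py | sh kafka-console-producer.sh --bootstrap-server localhost:9092 --topic topic_text --property parse.key=true`.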

(Optional) Start the Kafka-connector service. For more information, see Configure and start the Kafka connector service.
Note
If the Kafka-connector service is already running, you can skip this step.

Create and configure the odps-sink-connector.json file. Then, upload the odps-sink-connector.json file to any location, such as the $KAFKA_HOME/config path.

The following code provides an example of the odps-sink-connector.json file. For more information about the odps-sink-connector.json file, see Configure and start the Kafka connector task.

```
{
  "name": "odps-test-text",
  "config": {
    "connector.class": "com.aliyun.odps.kafka.connect.MaxComputeSinkConnector",
    "tasks.max": "3",
    "topics": "topic_text",
    "endpoint": "http://service.cn-shanghai.maxcompute.aliyun.com/api",
    "project": "project_name",
    "schema": "default",
    "table": "table_text",
    "account_type": "ALIYUN",
    "access_id": "<yourAccessKeyId>",
    "access_key": "<yourAccessKeySecret>",
    "partition_window_type": "MINUTE",
    "mode": "VALUE",
    "format": "TEXT",
    "sink_pool_size": "150",
    "record_batch_size": "9000",
    "buffer_size_kb": "600000"
  }
}
```

Run the following command to start the Kafka connector data migration task.

```
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" http://localhost:8083/connectors -d @$KAFKA_HOME/config/odps-sink-connector.json
```

Verify the result.

Use a local client (odpscmd) or another tool that can run MaxCompute SQL commands to connect to MaxCompute, and then run the following command to query the data and verify the result.

```
set odps.sql.allow.fullscan=true;
select * from table_text;
```

The following output is returned:

```
# Because the mode parameter in the odps-sink-connector.json file is set to VALUE, only the message value is retained. The key field is NULL.
+------------+-----------+--------+------+-------+------------------+
| topic      | partition | offset | key  | value | pt               |
+------------+-----------+--------+------+-------+------------------+
| topic_text | 0         | 0      | NULL | abc   | 07-13-2023 21:13 |
| topic_text | 0         | 1      | NULL | edf   | 07-13-2023 21:13 |
+------------+-----------+--------+------+-------+------------------+
```
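The pt values above reflect partition_window_type set to MINUTE: records written within the same minute of system time land in the same partition. The following Python sketch shows that truncation. The window_start and minute_pt names are hypothetical, and the MM-dd-yyyy HH:mm pattern is inferred from the sample output. The pt formats used for DAY and HOUR windows are not shown in this topic, so the sketch truncates those windows but does not format them.

```python
from datetime import datetime


def window_start(ts: datetime, window: str = "HOUR") -> datetime:
    """Truncate a timestamp to the start of its DAY, HOUR (default), or MINUTE window."""
    if window == "DAY":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if window == "HOUR":
        return ts.replace(minute=0, second=0, microsecond=0)
    if window == "MINUTE":
        return ts.replace(second=0, microsecond=0)
    raise ValueError(f"unsupported partition_window_type: {window}")


def minute_pt(ts: datetime) -> str:
    """Format a MINUTE window like the pt values in the sample output."""
    return window_start(ts, "MINUTE").strftime("%m-%d-%Y %H:%M")


print(minute_pt(datetime(2023, 7, 13, 21, 13, 42)))  # 07-13-2023 21:13
```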

Write CSV data

Prepare the data.

Use a local client (odpscmd) or another tool that can run MaxCompute SQL commands to connect to MaxCompute and create the target table.

```
CREATE TABLE IF NOT EXISTS table_csv(
  topic STRING,
  `partition` BIGINT,
  `offset` BIGINT,
  id BIGINT,
  name STRING,
  region STRING
) PARTITIONED BY (pt STRING);
```

Write data to Kafka.

In the $KAFKA_HOME/bin/ directory, run the following command to create a Kafka topic named topic_csv.

```
sh kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic topic_csv
```

Run the following command to create Kafka messages.

```
sh kafka-console-producer.sh --bootstrap-server localhost:9092 --topic topic_csv --property parse.key=true
>123	1103,zhangsan,china
>456	1104,lisi,usa
```
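Because CSV values are matched to the custom columns by position, the field order in each message must follow the column order of table_csv (id, name, region). The following Python sketch illustrates that positional mapping and the type check for the BIGINT id column. It uses Python's csv module, which is an assumption: the connector's own handling of quoting and escaping is not described in this topic, and csv_value_to_columns is a hypothetical helper.

```python
import csv
import io

# Custom columns of table_csv in declaration order, paired with a Python converter.
TABLE_CSV_COLUMNS = [("id", int), ("name", str), ("region", str)]


def csv_value_to_columns(value: str, columns=TABLE_CSV_COLUMNS) -> dict:
    """Map the fields of one CSV message value to table columns by position."""
    fields = next(csv.reader(io.StringIO(value)))
    if len(fields) != len(columns):
        raise ValueError(f"expected {len(columns)} fields, got {len(fields)}")
    # Field i feeds column i; int() fails if a non-numeric field lands in the BIGINT column.
    return {name: convert(field) for (name, convert), field in zip(columns, fields)}


print(csv_value_to_columns("1103,zhangsan,china"))
# {'id': 1103, 'name': 'zhangsan', 'region': 'china'}
```

Swapping the field order, as in zhangsan,1103,china, makes the conversion of the id field fail, which mirrors the requirement that the order and data types must match.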

(Optional) Start the Kafka-connector service. For more information, see Configure and start the Kafka connector service.
Note
If the Kafka-connector service is already running, you can skip this step.

Create and configure the odps-sink-connector.json file, and then upload the odps-sink-connector.json file to any location. This topic uses the $KAFKA_HOME/config path as an example.

The following code provides an example of the odps-sink-connector.json file. For more information about the odps-sink-connector.json file, see Configure and start the Kafka connector task.

```
{
  "name": "odps-test-csv",
  "config": {
    "connector.class": "com.aliyun.odps.kafka.connect.MaxComputeSinkConnector",
    "tasks.max": "3",
    "topics": "topic_csv",
    "endpoint": "http://service.cn-shanghai.maxcompute.aliyun.com/api",
    "project": "project_name",
    "schema": "default",
    "table": "table_csv",
    "account_type": "ALIYUN",
    "access_id": "<yourAccessKeyId>",
    "access_key": "<yourAccessKeySecret>",
    "partition_window_type": "MINUTE",
    "format": "CSV",
    "mode": "VALUE",
    "sink_pool_size": "150",
    "record_batch_size": "9000",
    "buffer_size_kb": "600000"
  }
}
```

Run the following command to start the Kafka connector data migration task.

```
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" http://localhost:8083/connectors -d @$KAFKA_HOME/config/odps-sink-connector.json
```

Verify the result.

Use a local client (odpscmd) or another tool that can run MaxCompute SQL commands to connect to MaxCompute, and then run the following command to query the data and verify the result.

```
set odps.sql.allow.fullscan=true;
select * from table_csv;
```

The following output is returned:

```
+-----------+-----------+--------+------+----------+--------+------------------+
| topic     | partition | offset | id   | name     | region | pt               |
+-----------+-----------+--------+------+----------+--------+------------------+
| topic_csv | 0         | 0      | 1103 | zhangsan | china  | 07-14-2023 00:10 |
| topic_csv | 0         | 1      | 1104 | lisi     | usa    | 07-14-2023 00:10 |
+-----------+-----------+--------+------+----------+--------+------------------+
```

Write JSON data

Prepare the data.

Use a local client (odpscmd) or another tool that can run MaxCompute SQL commands to connect to MaxCompute and create the target table.

```
CREATE TABLE IF NOT EXISTS table_json(
  topic STRING,
  `partition` BIGINT,
  `offset` BIGINT,
  key STRING,
  value JSON
) PARTITIONED BY (pt STRING);
```

Create the Kafka data.

In the $KAFKA_HOME/bin/ directory, run the following command to create a Kafka topic. This example uses topic_json as the topic name.

```
sh kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic topic_json
```

Run the following command to create Kafka messages.

```
sh kafka-console-producer.sh --bootstrap-server localhost:9092 --topic topic_json --property parse.key=true
>123	{"id":123,"name":"json-1","region":"beijing"}
>456	{"id":456,"name":"json-2","region":"hangzhou"}
```

(Optional) Start the Kafka-connector service. For more information, see Configure and start the Kafka connector service.
Note
If the Kafka-connector service is already running, you can skip this step.

Create and configure the odps-sink-connector.json file. Then, upload the odps-sink-connector.json file to any location, such as the $KAFKA_HOME/config path.

The following code provides an example of the odps-sink-connector.json file. For more information about the odps-sink-connector.json file, see Configure and start the Kafka connector task.

```
{
  "name": "odps-test-json",
  "config": {
    "connector.class": "com.aliyun.odps.kafka.connect.MaxComputeSinkConnector",
    "tasks.max": "3",
    "topics": "topic_json",
    "endpoint": "http://service.cn-shanghai.maxcompute.aliyun.com/api",
    "project": "project_name",
    "schema": "default",
    "table": "table_json",
    "account_type": "ALIYUN",
    "access_id": "<yourAccessKeyId>",
    "access_key": "<yourAccessKeySecret>",
    "partition_window_type": "MINUTE",
    "mode": "VALUE",
    "format": "JSON",
    "sink_pool_size": "150",
    "record_batch_size": "9000",
    "buffer_size_kb": "600000"
  }
}
```

Run the following command to start the Kafka connector data migration task.

```
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" http://localhost:8083/connectors -d @$KAFKA_HOME/config/odps-sink-connector.json
```

Verify the result.

Use a local client (odpscmd) or another tool that can run MaxCompute SQL commands to connect to MaxCompute, and then run the following command to query the data and verify the result.

```
set odps.sql.allow.fullscan=true;
select * from table_json;
```

The following output is returned:

```
# The JSON data is successfully written to the value field.
+------------+-----------+--------+------+------------------------------------------------+------------------+
| topic      | partition | offset | key  | value                                          | pt               |
+------------+-----------+--------+------+------------------------------------------------+------------------+
| topic_json | 0         | 0      | NULL | {"id":123,"name":"json-1","region":"beijing"}  | 07-14-2023 00:28 |
| topic_json | 0         | 1      | NULL | {"id":456,"name":"json-2","region":"hangzhou"} | 07-14-2023 00:28 |
+------------+-----------+--------+------+------------------------------------------------+------------------+
```

Write FLATTEN data

Prepare the data.

Use a local client (odpscmd) or another tool that can run MaxCompute SQL commands to connect to MaxCompute and create the target table.

```
CREATE TABLE IF NOT EXISTS table_flatten(
  topic STRING,
  `partition` BIGINT,
  `offset` BIGINT,
  id BIGINT,
  name STRING,
  extendinfo JSON
) PARTITIONED BY (pt STRING);
```

Create the Kafka data.

In the $KAFKA_HOME/bin/ directory, run the following command to create a Kafka topic. This example uses topic_flatten as the topic name.

```
sh kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic topic_flatten
```

Run the following command to create Kafka messages.

```
sh kafka-console-producer.sh --bootstrap-server localhost:9092 --topic topic_flatten --property parse.key=true
>123	{"id":123,"name":"json-1","extendinfo":{"region":"beijing","sex":"M"}}
>456	{"id":456,"name":"json-2","extendinfo":{"region":"hangzhou","sex":"W"}}
```

(Optional) Start the Kafka-connector service. For more information, see Configure and start the Kafka connector service.
Note
If the Kafka-connector service is already running, you can skip this step.

Create and configure the odps-sink-connector.json file, and then upload the odps-sink-connector.json file to any location. This topic uses the $KAFKA_HOME/config path as an example.

The following code provides an example of the odps-sink-connector.json file. For more information about the odps-sink-connector.json file, see Configure and start the Kafka connector task.

```
{
  "name": "odps-test-flatten",
  "config": {
    "connector.class": "com.aliyun.odps.kafka.connect.MaxComputeSinkConnector",
    "tasks.max": "3",
    "topics": "topic_flatten",
    "endpoint": "http://service.cn-shanghai.maxcompute.aliyun.com/api",
    "project": "project_name",
    "schema": "default",
    "table": "table_flatten",
    "account_type": "ALIYUN",
    "access_id": "<yourAccessKeyId>",
    "access_key": "<yourAccessKeySecret>",
    "partition_window_type": "MINUTE",
    "mode": "VALUE",
    "format": "FLATTEN",
    "sink_pool_size": "150",
    "record_batch_size": "9000",
    "buffer_size_kb": "600000"
  }
}
```

Run the following command to start the Kafka connector task.

```
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" http://localhost:8083/connectors -d @$KAFKA_HOME/config/odps-sink-connector.json
```

Verify the result.

Use a local client (odpscmd) or another tool that can run MaxCompute SQL commands to connect to MaxCompute, and then run the following command to query the data and verify the result.

```
set odps.sql.allow.fullscan=true;
select * from table_flatten;
```

The following output is returned:

```
# The JSON data is parsed and written to the MaxCompute table. extendinfo is a JSON field that retains the nested object.
+---------------+-----------+--------+-----+--------+---------------------------------+------------------+
| topic         | partition | offset | id  | name   | extendinfo                      | pt               |
+---------------+-----------+--------+-----+--------+---------------------------------+------------------+
| topic_flatten | 0         | 0      | 123 | json-1 | {"sex":"M","region":"beijing"}  | 07-14-2023 01:33 |
| topic_flatten | 0         | 1      | 456 | json-2 | {"sex":"W","region":"hangzhou"} | 07-14-2023 01:33 |
+---------------+-----------+--------+-----+--------+---------------------------------+------------------+
```

Abnormal data handling example

Prepare the data.

Use a local client (odpscmd) or another tool that can run MaxCompute SQL commands to connect to MaxCompute and create the target table.

```
CREATE TABLE IF NOT EXISTS table_flatten(
  topic STRING,
  `partition` BIGINT,
  `offset` BIGINT,
  id BIGINT,
  name STRING,
  extendinfo JSON
) PARTITIONED BY (pt STRING);
```

Create the Kafka data.

In the $KAFKA_HOME/bin/ directory, run the following commands to create Kafka topics.

Create the topic_abnormal topic:

```
sh kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic topic_abnormal
```

Create the runtime_error topic, which receives the data that fails to be written:
```
sh kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic runtime_error
```
Note
If an error occurs during a data write operation, the abnormal data is written to the runtime_error topic. This type of error is usually caused by a mismatch between the Kafka data and the MaxCompute table schema.

Run the following command to create Kafka messages.

One of the following messages does not match the schema of the target MaxCompute table: the message with key 101 contains an extendinfos field instead of extendinfo.

```
sh kafka-console-producer.sh --bootstrap-server localhost:9092 --topic topic_abnormal --property parse.key=true
>100	{"id":100,"name":"json-3","extendinfo":{"region":"beijing","gender":"M"}}
>101	{"id":101,"name":"json-4","extendinfos":"null"}
>102	{"id":102,"name":"json-5","extendinfo":{"region":"beijing","gender":"M"}}
```
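To see in advance which of these messages are likely to be rejected, you can compare the top-level JSON keys of each message with the custom columns of table_flatten. The following Python sketch does this. It is a local pre-check under an assumption, not the connector's validation logic: it treats a message as mismatched unless its keys exactly equal the custom columns, which flags the message with key 101. The split_by_schema helper is hypothetical.

```python
import json

# Custom columns of table_flatten; topic, partition, offset, and pt are filled by the connector.
TABLE_FLATTEN_COLUMNS = {"id", "name", "extendinfo"}


def split_by_schema(lines, columns=TABLE_FLATTEN_COLUMNS):
    """Split 'key<TAB>json' producer lines into matching and mismatched message values."""
    matching, mismatched = [], []
    for line in lines:
        _key, _sep, value = line.partition("\t")
        # Assumption: the top-level keys must equal the custom column names exactly.
        target = matching if set(json.loads(value)) == columns else mismatched
        target.append(value)
    return matching, mismatched


lines = [
    '100\t{"id":100,"name":"json-3","extendinfo":{"region":"beijing","gender":"M"}}',
    '101\t{"id":101,"name":"json-4","extendinfos":"null"}',
    '102\t{"id":102,"name":"json-5","extendinfo":{"region":"beijing","gender":"M"}}',
]
ok, rejected = split_by_schema(lines)
print(rejected)  # ['{"id":101,"name":"json-4","extendinfos":"null"}']
```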

(Optional) Start the Kafka-connector service. For more information, see Configure and start the Kafka connector service.
Note
If the Kafka-connector service is already running, you can skip this step.

Create and configure the odps-sink-connector.json file, and then upload the odps-sink-connector.json file to any location. This topic uses the $KAFKA_HOME/config path as an example.

The following code provides an example of the odps-sink-connector.json file. For more information about the odps-sink-connector.json file, see Configure and start the Kafka connector task.

```
{
  "name": "odps-test-runtime-error",
  "config": {
    "connector.class": "com.aliyun.odps.kafka.connect.MaxComputeSinkConnector",
    "tasks.max": "3",
    "topics": "topic_abnormal",
    "endpoint": "http://service.cn-shanghai.maxcompute.aliyun.com/api",
    "project": "project_name",
    "schema": "default",
    "table": "table_flatten",
    "account_type": "ALIYUN",
    "access_id": "<yourAccessKeyId>",
    "access_key": "<yourAccessKeySecret>",
    "partition_window_type": "MINUTE",
    "mode": "VALUE",
    "format": "FLATTEN",
    "sink_pool_size": "150",
    "record_batch_size": "9000",
    "buffer_size_kb": "600000",
    "runtime.error.topic.name": "runtime_error",
    "runtime.error.topic.bootstrap.servers": "http://XXXX",
    "skip_error": "false"
  }
}
```

Run the following command to start the Kafka connector task.

```
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" http://localhost:8083/connectors -d @$KAFKA_HOME/config/odps-sink-connector.json
```

Verify the result.

Query the MaxCompute table data

Use a local client (odpscmd) or another tool that can run MaxCompute SQL commands to connect to MaxCompute, and then run the following command to query the data and verify the result.

```
set odps.sql.allow.fullscan=true;
select * from table_flatten;
```

The following output is returned:

```
# The results show that the record with ID 101 was not written to MaxCompute because it does not match the table schema.
# Because the runtime.error.topic.name parameter is configured, the process is not blocked, and subsequent data is written successfully.
+--------------+-----------+--------+-----+--------+------------------------------------+------------------+
| topic        | partition | offset | id  | name   | extendinfo                         | pt               |
+--------------+-----------+--------+-----+--------+------------------------------------+------------------+
| flatten_test | 0         | 0      | 123 | json-1 | {"gender":"M","region":"beijing"}  | 07-14-2023 01:33 |
| flatten_test | 0         | 1      | 456 | json-2 | {"gender":"W","region":"hangzhou"} | 07-14-2023 01:33 |
| flatten_test | 0         | 0      | 123 | json-1 | {"gender":"M","region":"beijing"}  | 07-14-2023 13:16 |
| flatten_test | 0         | 1      | 456 | json-2 | {"gender":"W","region":"hangzhou"} | 07-14-2023 13:16 |
| flatten_test | 0         | 2      | 100 | json-3 | {"gender":"M","region":"beijing"}  | 07-14-2023 13:16 |
| flatten_test | 0         | 4      | 102 | json-5 | {"gender":"M","region":"beijing"}  | 07-14-2023 13:16 |
+--------------+-----------+--------+-----+--------+------------------------------------+------------------+
```

Query messages in the runtime_error topic

In the $KAFKA_HOME/bin/ directory, run the following command to view the messages.

```
sh kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic runtime_error --from-beginning
```

The following result is returned:

```
# The abnormal data is successfully written to the runtime_error topic.
{"id":101,"name":"json-4","extendinfos":"null"}
```