EventBridge: Use the Debezium PostgreSQL Source connector to synchronize data from ApsaraDB RDS for PostgreSQL to ApsaraMQ for Kafka

Last Updated: Jan 26, 2024

This topic describes how to create the Debezium PostgreSQL Source connector to synchronize data from ApsaraDB RDS for PostgreSQL to ApsaraMQ for Kafka.

Limits

The Debezium PostgreSQL Source connector allows you to configure only one task to consume Change Data Capture (CDC) data from a source.

Prerequisites

  • An ApsaraMQ for Kafka instance is purchased and deployed in a virtual private cloud (VPC).

  • An Object Storage Service (OSS) bucket is created. For more information, see Get started by using the OSS console.

Step 1: Create a table

  1. Log on to the ApsaraDB RDS console and create an ApsaraDB RDS for PostgreSQL instance. For more information, see Create an ApsaraDB RDS for PostgreSQL instance.

    When you create the ApsaraDB RDS for PostgreSQL instance, select the VPC in which the ApsaraMQ for Kafka instance that you created in the "Prerequisites" section of this topic is deployed and add the CIDR block of the VPC to the whitelist.

  2. On the Instances page, click the ApsaraDB RDS for PostgreSQL instance that you created. Then, perform the following operations on the instance details page:

    1. Create a new account. For more information, see Create a database and an account on an ApsaraDB RDS for PostgreSQL instance. You can also use an existing account.

    2. Create a database. For more information, see Create a database and an account on an ApsaraDB RDS for PostgreSQL instance. You can also use an existing database.

    3. Click Database Connection to view and record the internal endpoint and port number.


    4. Click Parameters, change the value of the wal_level parameter to logical in the Running Parameter Value column, and then click Apply Changes.
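
      Changing wal_level typically takes effect only after the instance restarts. To confirm that the new value is active, you can run the following standard PostgreSQL statement after you log on to the database in the next step:

      SHOW wal_level;
      -- The expected result is logical.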

  3. On the instance details page, click Log On to Database to go to the Data Management (DMS) console. Then, perform the following operations:

    1. Right-click the database that you want to manage, select Schema Management, and then click Create Schema to create a schema. A sample statement is provided after the following note.

      Note

      You must use the newly created schema in the configurations of the connector. You cannot use system schemas, such as the information_schema, pg_catalog, or public schema.
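
      The following statement is a minimal example that creates a schema named test_schema, which is the schema name that is used in the sample connector configuration later in this topic. Replace the name with your own as needed:

      CREATE SCHEMA test_schema;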

    2. In the new schema, use SQL statements to create a table. The following sample statement creates a table named sql_table whose columns are id and number. For more information, see SQL Commands.

      CREATE TABLE sql_table(id INT, number INT);
    3. Run the following statement to create a logical replication slot named test_slot that uses the wal2json plug-in and enable data subscription:

      SELECT 'init' FROM pg_create_logical_replication_slot('test_slot', 'wal2json');
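
      To confirm that the replication slot was created, you can query the pg_replication_slots catalog view. Both statements below are standard PostgreSQL. The optional peek statement returns pending changes without consuming them and fails if another session, such as a running connector, currently holds the slot:

      -- Check that the slot exists and which plug-in it uses.
      SELECT slot_name, plugin, active FROM pg_replication_slots WHERE slot_name = 'test_slot';
      -- Optional: peek at pending changes while the slot is idle.
      SELECT * FROM pg_logical_slot_peek_changes('test_slot', NULL, NULL);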

Step 2: Create the connector

  1. Download the Debezium PostgreSQL CDC Source Connector file and upload the file to the OSS bucket that you created. For more information, see Get started by using the OSS console.

    Important

    When you download the Debezium PostgreSQL CDC Source Connector file, select a version that is compatible with Java 8.

  2. Log on to the ApsaraMQ for Kafka console. In the Resource Distribution section of the Overview page, select the region where the ApsaraMQ for Kafka instance that you want to manage resides.

  3. In the left-side navigation pane, choose Connector Ecosystem Integration > Tasks.

  4. On the Tasks page, click Create Task.

  5. On the Create Task page, configure the Task Name parameter and follow the on-screen instructions to configure other parameters. The following section describes the parameters:

    • Task Creation

      1. In the Source step, set the Data Provider parameter to Apache Kafka Connect and click Next Step.

      2. In the Connector step, configure the parameters and click Next Step. The following list describes the parameters by subsection.

        • Kafka Connect Plug-in

          • Bucket: Select the OSS bucket to which you uploaded the Debezium PostgreSQL CDC Source Connector file.

          • File: Select the Debezium PostgreSQL CDC Source Connector file that you uploaded to the OSS bucket.

        • Message Queue for Apache Kafka Resources

          • Message Queue for Apache Kafka Parameters: Select Source Connect.

          • Message Queue for Apache Kafka Instance: Select the ApsaraMQ for Kafka instance that you created in the "Prerequisites" section of this topic.

          • VPC: Select the VPC that you created.

          • vSwitch: Select the vSwitch that you created.

          • Security Group: Select a security group.

        • Kafka Connect

          • Parse .properties File in ZIP Package: Select Create .properties File and change the values of the related fields in the code editor. The following list describes the fields.

            • connector.class: The class name of the Debezium PostgreSQL CDC Source Connector. Use the default value for this field.

            • database.dbname: The name of the ApsaraDB RDS for PostgreSQL database.

            • database.hostname: The internal endpoint that you obtained in Step 1: Create a table.

            • database.port: The port number that you obtained in Step 1: Create a table.

            • database.user: The username that you use to log on to the ApsaraDB RDS for PostgreSQL database.

            • database.password: The password that you use to log on to the ApsaraDB RDS for PostgreSQL database.

            • slot.name: The name of the logical replication slot of the ApsaraDB RDS for PostgreSQL database.

            • table.whitelist: The database tables to capture. Separate multiple tables with commas (,). Specify each table in the <schemaName>.<tableName> format.

            • database.server.name: The prefix of the destination topic name. The name of a destination topic is in the <database.server.name>.<schemaName>.<tableName> format. For example, if database.server.name is test-prefix and the table is test_schema.test_table, the destination topic is named test-prefix.test_schema.test_table.

              Important

              Before you synchronize data, you must create a topic whose name is in the preceding format.

        Sample code:

        connector.class=io.debezium.connector.postgresql.PostgresConnector
        database.dbname=test_database
        database.hostname=pgm-xxx.pg.rds.aliyuncs.com
        database.password=xxx
        database.port=5432
        database.user=xxx
        name=debezium-psql-source
        # The name of the plug-in. In this example, wal2json is used. Valid values: decoderbufs, wal2json, wal2json_rds, wal2json_streaming, and wal2json_rds_streaming.
        plugin.name=wal2json
        slot.drop_on_stop=true
        slot.name=test_slot
        # The tables of the source database. Separate multiple tables with commas (,). The rules that are used to specify tables are in the <schemaName>.<tableName> format. 
        table.whitelist=test_schema.test_table
        # Note: Only one task can consume the message. 
        tasks.max=1
        
        # The prefix of the destination topic. The name of the destination topic is in the <database.server.name>.<schemaName>.<tableName> format.
        # In this example, schemaName is test_schema and tableName is test_table.
        # The CDC data of the test_table table flows to the destination topic test-prefix.test_schema.test_table.
        database.server.name=test-prefix
        
        # The component that is used to convert the format of the message value. 
        value.converter=org.apache.kafka.connect.json.JsonConverter
        # Specify whether the message value contains schema information about the structure. 
        value.converter.schemas.enable=false

        For information about all parameters that are used to create the Debezium PostgreSQL Source connector, see the "Connector Properties" section of Debezium Connector for PostgreSQL.

      3. In the Instance step, configure the parameters and click Next Step. The following list describes the parameters by subsection.

        • Worker Type

          • Worker Type: Select a worker type.

          • Min. Number of Workers: Set this parameter to 1.

          • Max. Number of Workers: Set this parameter to 1.

        • Worker Configurations

          • Automatically Create Dependencies for Apache Kafka Connector Worker: We recommend that you select this option. After you select this option, the system creates the internal topics and consumer groups that are required to run Kafka Connect in the selected ApsaraMQ for Kafka instance and synchronizes the information to the corresponding parameters in the code editor. The following items describe these parameters:

            • Offset Topic: the topic that is used to store offset data. The name of the topic is in the connect-eb-offset-<Task name> format.

            • Config Topic: the topic that is used to store the configuration data of connectors and tasks. The name of the topic is in the connect-eb-config-<Task name> format.

            • Status Topic: the topic that is used to store the status data of connectors and tasks. The name of the topic is in the connect-eb-status-<Task name> format.

            • Kafka Connect Consumer Group: the consumer group that the Kafka Connect workers use to consume messages in internal topics. The name of the consumer group is in the connect-eb-cluster-<Task name> format.

            • Kafka Source Connector Consumer Group: the consumer group that is used to consume data in the source ApsaraMQ for Kafka topic. This consumer group can be used only by sink connectors. The name of the consumer group is in the connector-eb-cluster-<Task name>-<Connector name> format.

      4. In the Running Configurations section, set the Log Delivery parameter to Deliver Data to Log Service or Deliver Data to ApsaraMQ for Kafka, select a role on which Kafka Connect depends from the Role drop-down list in the Role Authorization subsection, and then click Save.

        Important

        We recommend that you select a role to which the AliyunSAEFullAccess permission policy is attached. Otherwise, the task may fail to run.

    • Task Property

      Configure the retry policy and dead-letter queue for the task. For more information, see Retry policies and dead-letter queues.

    If the status of the task becomes Running, the connector starts to work as expected.

Step 3: Test the connector

  1. In the DMS console, insert a data record into the table that you created in Step 1: Create a table. The following sample statement inserts a record whose id is 123 and whose number is 20000:

    INSERT INTO sql_table(id, number) VALUES(123, 20000);
  2. Log on to the ApsaraMQ for Kafka console. On the Instances page, click the name of the instance that you want to manage.

  3. In the left-side navigation pane of the Instance Details page, click Topics. Click the topic that you created in the <database.server.name>.<schemaName>.<tableName> format, and then click the Message Query tab to view the inserted message data. The following sample code provides an example of a message value:

    {"before":null,"after":{"id":123,"number":20000},"source":{"version":"0.9.2.Final","connector":"postgresql","name":"test-prefix","db":"wb","ts_usec":168386295815075****,"txId":10339,"lsn":412719****,"schema":"test_schema","table":"sql_table","snapshot":false,"last_snapshot_record":null},"op":"c","ts_ms":168386295****}
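
To verify that update events are also captured, you can modify the record and query the topic again. The following statement is a minimal example. The connector emits a message whose op field is u; the contents of the before field in that message depend on the REPLICA IDENTITY setting of the table:

    UPDATE sql_table SET number = 30000 WHERE id = 123;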

Common errors and troubleshooting

Error 1: All tasks fail to run

Error message:

All tasks under connector mongo-source failed, please check the error trace of the task.

Solution: On the Message Inflow Task Details page, click Diagnostics in the Basic Information section to go to the Connector Monitoring page. On the Connector Monitoring page, you can view the details of the task failure.

Error 2: Kafka Connect unexpectedly exits

Error message:

Kafka connect exited! Please check the error log /opt/kafka/logs/connect.log on sae application to find out the reason why kafka connect exited and update the event streaming with valid arguments to solve it.

Solution: The status update of Kafka Connect may be delayed. We recommend that you refresh the page. If Kafka Connect still fails, you can perform the following operations to troubleshoot the issue:

  1. In the Worker Information section of the Message Inflow Task Details page, click the instance name to the right of SAE Application to go to the Application Details page.

  2. On the Basic Information page, click the Instance Deployment Information tab.

  3. Click Webshell in the Actions column to log on to the running environment of Kafka Connect.

    • Run the vi /home/admin/connector-bootstrap.log command to view the startup logs of the connector and check whether an error message exists in the logs.

    • Run the vi /opt/kafka/logs/connect.log command to view the running logs of the connector and check for messages at the ERROR or WARN level.

After you troubleshoot the issue based on the error message, you can restart the corresponding task.

Error 3: The verification of connector parameters fails

Error message:

Start or update connector xxx failed. Error code=400. Error message=Connector configuration is invalid and contains the following 1 error(s):
Value must be one of never, initial_only, when_needed, initial, schema_only, schema_only_recovery
You can also find the above list of errors at the endpoint `/connector-plugins/{connectorType}/config/validate`

Solution: Find the parameter whose value is invalid based on the error message and update the parameter. If you cannot find the parameter based on the error message, you can log on to the running environment of Kafka Connect and run the following command. For information about how to log on to the running environment of Kafka Connect, see Error 2 in this topic.

curl -i -X PUT -H "Accept:application/json" -H "Content-Type:application/json" -d @$CONNECTOR_PROPERTIES_MAPPING http://localhost:8083/connector-plugins/io.debezium.connector.postgresql.PostgresConnector/config/validate

The verification result of each connector parameter is returned in the response. If the value of a parameter is invalid, the errors field of the parameter is not empty.

"value":{
    "name":"snapshot.mode",
    "value":null,
    "recommended_values":[
        "never",
        "initial_only",
        "when_needed",
        "initial",
        "schema_only",
        "schema_only_recovery"
    ],
    "errors":[
        "Value must be one of never, initial_only, when_needed, initial, schema_only, schema_only_recovery"
    ],
    "visible":true
}