
Synchronize data from Kafka to ApsaraDB for OceanBase in real time

Last Updated: Feb 28, 2026

A single-table real-time synchronization task reads data from a Kafka topic, initializes the destination table structure in ApsaraDB for OceanBase, backfills historical data from the topic, and then continuously synchronizes incremental data in real time.

How it works

The synchronization task runs in two phases:

  1. Schema migration -- Creates the destination table in ApsaraDB for OceanBase or validates an existing table. If automatic table creation is selected, the DDL used for table creation is displayed.

  2. Real-time data synchronization -- Streams data from the Kafka topic to the destination table. During this phase, you can monitor read and write traffic, dirty data, failover events, and operation logs.
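The two phases can be sketched as follows. This is a minimal, hypothetical Python model of the flow, not the DataWorks implementation: `ensure_table` stands in for schema migration and `sync_records` for the streaming phase, with an in-memory dict playing the role of the destination table.

```python
# Hypothetical sketch of the two synchronization phases.
# Phase 1 (schema migration) ensures the destination table exists;
# phase 2 (real-time sync) applies records continuously.

def ensure_table(catalog: dict, table: str, ddl: str) -> str:
    """Create the destination table if absent; return the DDL in effect."""
    if table not in catalog:
        catalog[table] = ddl          # simulates CREATE TABLE in OceanBase
    return catalog[table]

def sync_records(records, table_rows: dict):
    """Apply a stream of (key, value) records to the destination table."""
    for key, value in records:
        table_rows[key] = value       # upsert keyed on the primary key

catalog, rows = {}, {}
ensure_table(catalog, "orders", "CREATE TABLE orders (id INT PRIMARY KEY, v TEXT)")
sync_records([(1, "a"), (2, "b"), (1, "a2")], rows)
print(rows)  # {1: 'a2', 2: 'b'}
```

Note that replaying the record with key `1` simply overwrites the earlier value, which is why the destination table needs a primary key (see the table configuration step below).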

Prerequisites

Before you begin, make sure that you have:

  • Added a Kafka data source and an ApsaraDB for OceanBase data source to the DataWorks workspace.

  • Prepared a resource group that has network connectivity to both data sources.

Create and run the synchronization task

Step 1: Create the synchronization task

  1. Log on to the DataWorks console. In the top navigation bar, select the target region. In the left-side navigation pane, choose Data Integration > Data Integration. Select the target workspace from the drop-down list and click Go to Data Integration.

  2. In the left-side navigation pane, click Synchronization Task, and then click Create Synchronization Task. Configure the following settings:

    • Source and Destination: Kafka -> ApsaraDB for OceanBase

    • Name: Enter a custom name.

    • Specific Type: Single table, real-time synchronization.

Step 2: Configure network and resources

  1. In the Network And Resource Configuration section, select the Resource Group for the synchronization task. Optionally, adjust the number of CUs under Task Resource Usage.

  2. For Source, select the Kafka data source. For Destination, select the ApsaraDB for OceanBase data source.

  3. Click Test Connectivity. After both data sources show successful connectivity, click Next.

    Network and resource configuration

Step 3: Configure the synchronization link

Configure the Kafka source

Click the Kafka data source at the top of the page to open the Kafka Source Information panel.

Kafka source configuration
  1. In the Kafka Source Information section, select the topic to synchronize. Retain the default values for other parameters, or modify them as needed.

  2. Click Data Sampling in the upper-right corner. In the dialog box, specify the Start Time and Number Of Samples, and then click Start Collection to retrieve sample data from the topic.

  3. In the Output Field Configuration section, select the fields to synchronize.
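Data sampling pulls raw messages from the topic so that output fields can be chosen from real data. A sketch of what that projection amounts to, assuming JSON-encoded messages (the field names and message format here are illustrative, not prescribed by the product):

```python
import json

# Hypothetical sample of what data sampling might return from the topic;
# the message format and field names are assumptions for illustration.
sampled = [
    '{"id": 1, "name": "alice", "ts": "2026-02-01T00:00:00Z"}',
    '{"id": 2, "name": "bob", "ts": "2026-02-01T00:00:01Z"}',
]

output_fields = ["id", "name"]  # fields selected in Output Field Configuration

def project(messages, fields):
    """Parse JSON messages and keep only the selected output fields."""
    return [{f: json.loads(m).get(f) for f in fields} for m in messages]

print(project(sampled, output_fields))
# [{'id': 1, 'name': 'alice'}, {'id': 2, 'name': 'bob'}]
```

Fields left out of the selection (here, `ts`) are simply dropped from the synchronized output.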

Add data processing nodes

Click the icon between the source and the destination to add data processing nodes. Several processing methods are available.

Arrange the processing nodes in the order in which they should run. During task execution, data passes through each node sequentially.

Data processing nodes

To preview the output of a processing node, click Data Output Preview in the upper-right corner, and then click Retrieve Upstream Output Again in the dialog box.

Data output preview
Note

Data output preview requires sampled data from the Kafka source. Complete data sampling before previewing.
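The sequential behavior described above can be modeled as a chain of functions, each one a processing node. This is a hypothetical sketch (the node functions `filter_empty` and `uppercase_name` are invented examples, not built-in processing methods):

```python
# Hypothetical data-processing pipeline: each node is a function, applied
# in order, mirroring how records pass through processing nodes sequentially.

def filter_empty(rec):
    return rec if rec.get("name") else None   # drop records with no name

def uppercase_name(rec):
    return {**rec, "name": rec["name"].upper()}

def run_pipeline(records, nodes):
    out = []
    for rec in records:
        for node in nodes:
            rec = node(rec)
            if rec is None:                   # record filtered out upstream
                break
        else:
            out.append(rec)
    return out

records = [{"id": 1, "name": "alice"}, {"id": 2, "name": ""}]
print(run_pipeline(records, [filter_empty, uppercase_name]))
# [{'id': 1, 'name': 'ALICE'}]
```

Because a node only ever sees the output of the node before it, node order matters: placing the filter first means later nodes never process records that would be discarded anyway.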

Configure the ApsaraDB for OceanBase destination

Click the ApsaraDB for OceanBase data destination at the top of the page to open the OceanBase Destination Information panel.

OceanBase destination configuration
  1. In the OceanBase Destination Information section, choose one of the following options for the destination table:

    • Automatically Create Table -- Creates a table with the same name as the data source table by default. Modify the table name if needed.

    • Use Existing Table -- Select a table from the drop-down list.

  2. (Optional) If you selected Automatically Create Table, click Edit Table Schema to modify the schema of the destination table. Click Re-generate Table Schema Based on Output Column of Ancestor Node to regenerate the schema from upstream output columns. Select a column and set it as the primary key.

    Important

    The destination table must have a primary key. The configuration cannot be saved without one.

  3. Configure field mappings. The system maps source fields to destination fields automatically based on matching field names. Adjust the mappings as needed. Field mapping rules:

    • One source field can map to multiple destination fields.

    • Multiple source fields cannot map to the same destination field.

    • Source fields without a mapped destination field are not synchronized.
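The mapping rules above can be checked mechanically. A minimal sketch of such a validator, assuming mappings are represented as (source, destination) pairs (this representation is an assumption for illustration):

```python
# Hypothetical validator for the field mapping rules:
# a source field may feed several destination fields, but two source
# fields may not map to the same destination field.

def validate_mappings(mappings):
    """mappings: list of (source_field, destination_field) pairs."""
    seen_dest = set()
    for _, dest in mappings:
        if dest in seen_dest:
            raise ValueError(f"destination field mapped twice: {dest}")
        seen_dest.add(dest)
    return True

# One source field ("ts") mapped to two destination fields: allowed.
validate_mappings([("id", "id"), ("ts", "created_at"), ("ts", "updated_at")])

# Two source fields mapped to one destination field: rejected.
try:
    validate_mappings([("id", "k"), ("name", "k")])
except ValueError as e:
    print(e)  # destination field mapped twice: k
```

Source fields that appear in no pair at all are simply absent from the mapping and, per the third rule, are not synchronized.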

Step 4: Configure alert rules

Set up alert rules to get notified when the synchronization task encounters issues.

  1. In the upper-right corner of the page, click Configure Alert Rule.

  2. In the panel, click Add Alert Rule and configure the alert parameters.

  3. Enable or disable individual alert rules as needed. Assign different alert recipients based on alert severity levels.

Note

Alert rules apply to the real-time synchronization subtask generated by this task. After the task is created, you can modify alert rules on the Real-time Synchronization Task page.

Step 5: Configure advanced parameters

  1. In the upper-right corner of the page, click Configure Advanced Parameters.

  2. In the panel, modify the parameter values as needed.

Note

Understand the meaning of each parameter before changing its value. Incorrect settings may cause errors or data quality issues.

Step 6: Change the resource group

Click Configure Resource Group in the upper-right corner to view or change the resource group assigned to this task.

Step 7: Test the synchronization task

Before going live, run a simulated test to verify the configuration. The test synchronizes sampled data to the destination table and reports errors in real time if any configuration is invalid, an exception occurs, or dirty data is generated.

  1. In the upper-right corner of the page, click Perform Simulated Running.

  2. In the dialog box, specify the Start At time and the Sampled Data Records count.

  3. Click Start Collection to retrieve sample data from the Kafka source.

  4. Click Preview to synchronize the sampled data to the destination and verify the result.

Step 8: Start the synchronization task

  1. Click Complete at the bottom of the page to save the task configuration.

  2. On the Synchronization Task page, find the task in the Tasks section and click Start in the Operation column.

  3. Click the task name or ID to monitor the execution progress.

Manage the synchronization task

View task status

After the task is created, go to the Tasks page to view all synchronization tasks in the workspace.

Tasks list
  • Click Start or Stop in the Operation column to control the task. Use the More drop-down list to Edit or View the task.

  • For a running task, click the Execution Overview area to view execution details.

Execution overview

The execution overview shows two phases:

  • Schema Migration -- Displays the table creation method (existing table or auto-created table) and the DDL if auto-creation was used.

  • Real-time Data Synchronization -- Displays real-time read and write traffic, dirty data, failover events, and operation logs.

Rerun the synchronization task

If you need to modify the synchronized fields, destination table fields, or table name, rerun the task. Only the changes are applied -- data that has already been synchronized and is unaffected by the changes is not re-synchronized.

Two options are available:

  • Rerun without changes: Click Rerun in the Operation column to rerun the task with the current configuration.

  • Rerun with changes: Edit the task configuration, click Complete to save, and then click Apply Updates in the Operation column.