Real-time integration enables you to collect and combine data from multiple data sources into a destination data source. This process creates a real-time link for data synchronization. This topic describes how to create a real-time integration task.
Prerequisites
You must configure the required data sources before you create a real-time integration task. This lets you select the source and destination data during the configuration process. For more information, see Supported data sources for real-time integration.
Background information
If you select Oracle or MySQL as the destination data source, the Java Database Connectivity (JDBC) protocol is used. Different messages are processed based on the following policies.
If the sink table does not have a primary key:
INSERT messages are directly appended.
UPDATE_BEFORE messages are discarded. UPDATE_AFTER messages are directly appended.
DELETE messages are discarded.
If the sink table has a primary key:
INSERT messages are processed as UPSERT messages.
UPDATE_BEFORE messages are discarded. UPDATE_AFTER messages are processed as UPSERT messages.
DELETE messages are processed as DELETE messages.
Because the JDBC protocol writes data immediately, duplicate data may exist if a node fails over and the sink table does not have a primary key. Exactly-once delivery is not guaranteed.
Because the JDBC protocol supports only Data Definition Language (DDL) statements for creating tables and adding fields, other types of DDL messages are discarded.
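Taken together, the write policy can be summarized in a few lines. The following Java sketch is illustrative only, not Dataphin's implementation; the RowKind names follow common changelog terminology (for example, in Flink), and the returned action names are placeholders.

```java
// Illustrative summary of the JDBC write policy described above.
enum RowKind { INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE }

final class JdbcWritePolicySketch {
    /** Returns the write action for a message, or null if it is discarded. */
    static String resolve(RowKind kind, boolean sinkHasPrimaryKey) {
        switch (kind) {
            case INSERT:
            case UPDATE_AFTER:
                return sinkHasPrimaryKey ? "UPSERT" : "APPEND";
            case UPDATE_BEFORE:
                return null; // always discarded
            case DELETE:
                return sinkHasPrimaryKey ? "DELETE" : null; // discarded without a primary key
            default:
                return null;
        }
    }
}
```

On MySQL, an upsert is typically issued as an INSERT ... ON DUPLICATE KEY UPDATE statement; on Oracle, as a MERGE statement.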
Oracle supports only basic data types. The INTERVAL YEAR, INTERVAL DAY, BFILE, SYS.ANY, XML, map, ROWID, and UROWID data types are not supported.
MySQL supports only basic data types. The map data type is not supported.
To prevent data inconsistency caused by out-of-order data, only a single concurrent task is supported.
Oracle data sources support Oracle Database 11g, Oracle Database 19c, and Oracle Database 21c.
MySQL data sources support MySQL 8.0, MySQL 8.4, and MySQL 5.7.
Step 1: Create a real-time integration task
On the Dataphin homepage, choose Developer > Data Integration from the top menu bar.
In the top menu bar, select a project. If you are in Dev-Prod mode, you also need to select an environment.
In the navigation pane on the left, choose Integration > Stream Pipeline.
In the real-time integration list, click the icon and choose Real-time Integration Task to open the Create Real-time Integration Task dialog box. In the Create Real-time Integration Task dialog box, configure the following parameters.
Parameter | Description
Task Name | Enter a name for the real-time task. The name must start with a letter, can contain only lowercase letters, digits, and underscores (_), and must be 4 to 63 characters in length.
Production/Development Environment Queue Resource | You can select any resource group that is configured for real-time tasks. Note: This configuration item is available only when the project uses a Flink compute source in Kubernetes deployment mode.
Description | Enter a brief description of the task. The description can be up to 1,000 characters in length.
Select Directory | Select a directory to store the real-time task. If no directory exists, you can create a folder: above the real-time task list on the left, click the icon to open the New Folder dialog box, enter a Name for the folder, select a location under Select Directory as needed, and click OK.
After you complete the configuration, click OK.
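If you generate task names programmatically, the naming rule above can be expressed as a regular expression. A minimal sketch in Java, with illustrative class and method names:

```java
import java.util.regex.Pattern;

// Checks the naming rule above: starts with a (lowercase) letter, contains
// only lowercase letters, digits, and underscores, and is 4-63 characters long.
final class TaskNameValidator {
    private static final Pattern NAME = Pattern.compile("^[a-z][a-z0-9_]{3,62}$");

    static boolean isValid(String name) {
        return NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("rt_orders_sync")); // true
        System.out.println(isValid("Rt_orders_sync")); // false: uppercase letter
        System.out.println(isValid("abc"));            // false: shorter than 4 characters
    }
}
```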
Step 2: Configure the real-time integration task
The supported source and destination data sources vary based on the real-time computing engine. For more information, see Supported data sources for real-time integration.
Source data source
MySQL
Parameter | Description | |
Data Source Configuration | Data Source Type | Select MySQL. |
Datasource | Select a data source. You can also click New to create a data source on the Datasource page. For more information, see Create a MySQL data source. Important: Enable logging for the data source and ensure that the configured account has permissions to read logs. Otherwise, the system cannot synchronize data from this data source in real time. | |
Time Zone | Displays the time zone configured for the selected data source. | |
Sync Rule Configuration | Sync Policy | Select Real-time Incremental or Real-time Incremental + Full. The default value is Real-time Incremental. Note: You can select Real-time Incremental + Full when the destination data source is Hive (Hudi table format), MaxCompute, or Databricks. |
Selection Method | You can select Entire Database, Select Tables, or Exclude Tables. | |
Microsoft SQL Server
Parameter | Description | |
Data Source Configuration | Data Source Type | Select Microsoft SQL Server. |
Datasource | Select a data source. You can also click New to create a data source on the Datasource page. For more information, see Create a Microsoft SQL Server data source. Important: Enable logging for the data source and ensure that the configured account has permissions to read logs. Otherwise, the system cannot synchronize data from this data source in real time. | |
Time Zone | Displays the time zone configured for the selected data source. | |
Sync Rule Configuration | Sync Policy | Only Real-time Incremental is supported. Incremental changes from the source database are collected and written to the downstream destination database in real time in the order they occur. |
Selection Method | You can select Entire Database, Select Tables, or Exclude Tables. | |
PostgreSQL
Parameter | Description | |
Data Source Configuration | Data Source Type | Select PostgreSQL. |
Datasource | Select a data source. You can also click New to create a data source on the Datasource page. For more information, see Create a PostgreSQL data source. Important: Enable logging for the data source and ensure that the configured account has permissions to read logs. Otherwise, the system cannot synchronize data from this data source in real time. | |
Time Zone | Displays the time zone configured for the selected data source. | |
Sync Rule Configuration | Sync Policy | Only Real-time Incremental is supported. Incremental changes from the source database are collected and written to the downstream destination database in real time in the order they occur. |
Selection Method | You can select Entire Database or Select Tables. | |
Oracle
Parameter | Description | |
Data Source Configuration | Data Source Type | Select Oracle. |
Datasource | Select a data source. You can also click New to create a data source on the Datasource page. For more information, see Create an Oracle data source. Important: Enable logging for the data source and ensure that the configured account has permissions to read logs. Otherwise, the system cannot synchronize data from this data source in real time. | |
Time Zone | Displays the time zone configured for the selected data source. | |
Sync Rule Configuration | Sync Policy | Only Real-time Incremental is supported. Incremental changes from the source database are collected and written to the downstream destination database in real time in the order they occur. |
Selection Method | You can select Entire Database, Select Tables, or Exclude Tables. | |
IBM DB2
Parameter | Description | |
Data Source Configuration | Data Source Type | Select IBM DB2. |
Datasource | Select a data source. You can also click New to create a data source on the Datasource page. For more information, see Create an IBM DB2 data source. Important: Enable logging for the data source and ensure that the configured account has permissions to read logs. Otherwise, the system cannot synchronize data from this data source in real time. | |
Sync Rule Configuration | Sync Policy | Only Real-time Incremental is supported. Incremental changes from the source database are collected and written to the downstream destination database in real time in the order they occur. |
Selection Method | You can select Entire Database, Select Tables, or Exclude Tables. | |
Kafka
Parameter | Description | |
Data Source Configuration | Data Source Type | Select Kafka. |
Datasource | Select a data source. You can also click New to create a data source on the Datasource page. For more information, see Create a Kafka data source. Important: Enable logging for the data source and ensure that the configured account has permissions to read logs. Otherwise, the system cannot synchronize data from this data source in real time. | |
Source Topic | Select the topic of the source data. You can enter a keyword of the topic name to perform a fuzzy search. | |
Data Format | Only Canal JSON is supported. Canal JSON is a JSON-based format compatible with Canal change data records. | |
Key Type | The key type of Kafka, which determines the key.deserializer configuration when initializing KafkaConsumer. Only STRING is supported. | |
Value Type | The value type of Kafka, which determines the value.deserializer configuration when initializing KafkaConsumer. Only STRING is supported. | |
Consumer Group ID (optional) | Enter the ID of the consumer group. The consumer group ID is used to report consumption offsets. | |
Sync Rule Configuration | Table List | Enter the names of the tables to be synchronized. Separate multiple table names with line breaks. The value can be up to 1,024 characters in length. Table names can be in one of the following three formats: |
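For context, the STRING key and value types correspond to Kafka's StringDeserializer, and the optional consumer group ID becomes the consumer's group.id, against which offsets are reported. The following sketch uses the standard Kafka client to show the equivalent configuration; the broker address, group ID, and topic name are placeholders, and Dataphin performs this setup internally.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public final class CanalJsonConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker-1:9092"); // placeholder address
        props.put("group.id", "dataphin_demo_group");    // Consumer Group ID (optional in the UI)
        // STRING key/value types map to StringDeserializer.
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("source_topic")); // the configured Source Topic
            // Each record value is expected to be a Canal JSON change message.
            consumer.poll(Duration.ofSeconds(1))
                    .forEach(record -> System.out.println(record.value()));
        }
    }
}
```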
Hive (Hudi table format)
You can select a Hive data source in Hudi table format as the source data source only when the real-time computing engine is Apache Flink and the compute resource is a Flink on YARN deployment.
Parameter | Description | |
Data Source Configuration | Data Source Type | Select Hive. |
Datasource | You can only select a Hive data source in Hudi table format. You can also click New to create a data source on the Datasource page. For more information, see Create a Hive data source. Important: Enable logging for the data source and ensure that the configured account has permissions to read logs. Otherwise, the system cannot synchronize data from this data source in real time. | |
Sync Rule Configuration | Sync Policy | Only Real-time Incremental is supported. Incremental changes from the source database are collected and written to the downstream destination database in real time in the order they occur. |
Select Table | Select a single table for real-time synchronization. | |
PolarDB (MySQL database type)
Parameter | Description | |
Data Source Configuration | Data Source Type | Select PolarDB. |
Datasource | You can only select a PolarDB data source of the MySQL database type. You can also click New to create a data source on the Datasource page. For more information, see Create a PolarDB data source. Important: Enable logging for the data source and ensure that the configured account has permissions to read logs. Otherwise, the system cannot synchronize data from this data source in real time. | |
Time Zone | Displays the time zone configured for the selected data source. | |
Sync Rule Configuration | Sync Policy | Select Real-time Incremental or Real-time Incremental + Full. The default value is Real-time Incremental. Note: You can select Real-time Incremental + Full when the destination data source is Hive (Hudi table format), MaxCompute, or Databricks. |
Selection Method | You can select Entire Database, Select Tables, or Exclude Tables. | |
Destination data source
MaxCompute
Parameter | Description | |
Data Source Configuration | Data Source Type | Select MaxCompute. |
Datasource | Select a destination data source. You can select a MaxCompute data source and project. You can also click New to create a data source on the data source page. For more information, see Create a MaxCompute data source. | |
New Sink Table Configuration | New Table Type | You can select Standard Table or Delta Table. The default value is Standard Table. If you select Delta Table and set the sink table creation method to Auto Create Table, a MaxCompute Delta table is created. Additional fields are not used when creating a Delta table. Note: After you configure the sink table, if you change the new table type, the system prompts you for confirmation. If you click OK in the dialog box, the sink table configuration is cleared and you must reconfigure it. |
Table Name Transform | Destination table names can contain only letters, digits, and underscores (_). If a source table name contains other characters, you must configure table name transformation rules. Click Configure Table Name Transform to open the Configure Table Name Transformation Rules dialog box. | |
Partition Format | If you set New Table Type to Standard Table, only Multiple Partitions is supported. If you set New Table Type to Delta Table, you can select No Partition or Multiple Partitions. | |
Partition Interval | If you set Partition Format to No Partition, you cannot configure the partition interval. If you set Partition Format to Multiple Partitions, you can set Partition Interval to Hour or Day. | |
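To make the table name transformation rule concrete: it boils down to replacing unsupported characters and optionally adding a prefix or suffix. The sketch below is a simplified illustration, not Dataphin's exact rule engine.

```java
import java.util.regex.Pattern;

// Simplified table-name transformation: any character other than a letter,
// digit, or underscore is replaced with an underscore, then an optional
// prefix and suffix are applied.
final class TableNameTransformSketch {
    private static final Pattern INVALID = Pattern.compile("[^A-Za-z0-9_]");

    static String apply(String sourceTable, String prefix, String suffix) {
        return prefix + INVALID.matcher(sourceTable).replaceAll("_") + suffix;
    }

    public static void main(String[] args) {
        // "ods.user-info" -> "ods_user_info", then prefixed: "dw_ods_user_info"
        System.out.println(apply("ods.user-info", "dw_", ""));
    }
}
```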
MySQL
Parameter | Description | |
Data Source Configuration | Data Source Type | Select MySQL. |
Datasource | Select a data source. You can also click New to create a data source on the Datasource page. For more information, see Create a MySQL data source. | |
Time Zone | Displays the time zone configured for the selected data source. | |
New Sink Table Configuration | Table Name Transform | Destination table names can contain only letters, digits, and underscores (_). If a source table name contains other characters, you must configure table name transformation rules. Click Configure Table Name Transform to open the Configure Table Name Transformation Rules dialog box. |
Microsoft SQL Server
Parameter | Description | |
Data Source Configuration | Data Source Type | Select Microsoft SQL Server. |
Datasource | Select a data source. You can also click New to create a data source on the Datasource page. For more information, see Create a Microsoft SQL Server data source. | |
Time Zone | Displays the time zone configured for the selected data source. | |
New Sink Table Configuration | Table Name Transform | Destination table names can contain only letters, digits, and underscores (_). If a source table name contains other characters, you must configure table name transformation rules. Click Configure Table Name Transform to open the Configure Table Name Transformation Rules dialog box. |
Oracle
Parameter | Description | |
Data Source Configuration | Data Source Type | Select Oracle. |
Datasource | Select a data source. You can also click New to create a data source on the Datasource page. For more information, see Create an Oracle data source. | |
Time Zone | Displays the time zone configured for the selected data source. | |
New Sink Table Configuration | Table Name Transform | Destination table names can contain only letters, digits, and underscores (_). If a source table name contains other characters, you must configure table name transformation rules. Click Configure Table Name Transform to open the Configure Table Name Transformation Rules dialog box. |
Kafka
Parameter | Description | |
Data Source Configuration | Data Source Type | Select Kafka. |
Datasource | Select a data source. You can also click New to create a data source on the Datasource page. For more information, see Create a Kafka data source. | |
Destination Topic | The topic for the destination data. You can select Single Topic or Multiple Topics. If you select Single Topic, you must select a destination topic. You can enter a keyword of the topic name to search. If you select Multiple Topics, you can configure topic name transformation and topic parameters. | |
Data Format | Set the storage format for the written data. Supported formats are DTS Avro and Canal Json. Note: If you set Destination Topic to Multiple Topics, you can select only Canal Json as the data format. | |
Destination Topic Configuration | Topic Name Transform | Click Configure Topic Name Transform. In the Configure Topic Name Transformation Rules dialog box, you can configure Topic Name Transformation Rules and a prefix and suffix for the topic name. |
Topic Parameters | Additional parameters for creating a topic. Note: This item can be configured only when Destination Topic is set to Multiple Topics. | |
DataHub
Parameter | Description | |
Destination Data | Data Source Type | Select DataHub. |
Datasource | Select a destination data source. You can also click New to create a DataHub data source on the Datasource page. For more information, see Create a DataHub data source. | |
Destination Topic Creation Method | You can select New Topic or Use Existing Topic. | |
Destination Topic | | |
Databricks
Parameter | Description | |
Data Source Configuration | Data Source Type | Select Databricks. |
Datasource | Select a destination data source. You can select a Databricks data source and project. You can also click New to create a data source on the data source page. For more information, see Create a Databricks data source. | |
Time Zone | Time-formatted data is processed based on the current time zone. The default value is the time zone configured in the selected data source and cannot be changed. Note: Time zone conversion is supported only when the source data source type is MySQL or PostgreSQL and the destination data source type is Databricks. | |
New Sink Table Configuration | Table Name Transform | Destination table names can contain only letters, digits, and underscores (_). If a source table name contains other characters, you must configure table name transformation rules. Click Configure Table Name Transform to open the Configure Table Name Transformation Rules dialog box. |
Partition Format | You can select No Partition or Multiple Partitions. | |
Partition Interval | If you set Partition Format to No Partition, you cannot configure the partition interval. If you set Partition Format to Multiple Partitions, you can set Partition Interval to Hour or Day. | |
SelectDB
Parameter | Description | |
Data Source Configuration | Data Source Type | Select SelectDB. |
Datasource | Select a data source. You can also click New to create a data source on the Datasource page. For more information, see Create a SelectDB data source. | |
New Sink Table Configuration | Table Name Transform | Destination table names can contain only letters, digits, and underscores (_). If a source table name contains other characters, you must configure table name transformation rules. Click Configure Table Name Transform to open the Configure Table Name Transformation Rules dialog box. |
Hive
Parameter | Description | |
Data Source Configuration | Data Source Type | Set Data Source Type to Hive. |
Datasource | Select a data source. You can also click New to create a data source on the Datasource page. For more information, see Create a Hive data source. | |
New Sink Table Configuration | Data Lake Table Format | You can select None, Hudi, Iceberg, or Paimon. Note: This item can be configured only when Data Lake Table Format Configuration is enabled for the selected Hive data source. |
Hudi Table Type/Paimon Table Type | For Hudi Table Type, you can select MOR (merge on read) or COW (copy on write). For Paimon Table Type, you can select MOR (merge on read), COW (copy on write), or MOW (merge on write). Note: This item can be configured only when Data Lake Table Format is set to Hudi or Paimon. | |
Table Creation Execution Engine | You can select Hive or Spark. After you select a data lake table format, Spark is selected by default. | |
Table Name Transform | Destination table names can contain only letters, digits, and underscores (_). If a source table name contains other characters, you must configure table name transformation rules. Click Configure Table Name Transform to open the Configure Table Name Transformation Rules dialog box. | |
Partition Format | You can select Single Partition, Multiple Partitions, or Fixed Partition. Note: When the format is set to Single Partition or Fixed Partition, a default partition field name is used. | |
Partition Interval | The default value is Hour. You can also select Day. Note: This configuration item is supported only when Partition Format is set to Single Partition or Multiple Partitions. | |
Partition Value | Enter a fixed partition value, for example, 20250101. Note: This configuration item is supported only when Partition Format is set to Fixed Partition. | |
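To illustrate how the Hour and Day intervals shape partition values, consider the sketch below; the partition field names ds and hh are assumptions for the example, not necessarily the names Dataphin generates.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Maps an event time to a partition value under the Day and Hour intervals.
// The field names ds/hh are assumed for illustration.
final class PartitionValueSketch {
    private static final DateTimeFormatter DAY  = DateTimeFormatter.ofPattern("yyyyMMdd");
    private static final DateTimeFormatter HOUR = DateTimeFormatter.ofPattern("HH");

    static String dayPartition(LocalDateTime t)  { return "ds=" + t.format(DAY); }
    static String hourPartition(LocalDateTime t) { return dayPartition(t) + "/hh=" + t.format(HOUR); }

    public static void main(String[] args) {
        LocalDateTime t = LocalDateTime.of(2025, 1, 1, 9, 30);
        System.out.println(dayPartition(t));  // ds=20250101
        System.out.println(hourPartition(t)); // ds=20250101/hh=09
    }
}
```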
Mapping configuration
Mapping configuration is not supported when the destination data source type is DataHub, or when the destination data source is Kafka and the destination topic is a single topic.
Destination data source is not Kafka

Area | Description |
① View Additional Fields | During real-time incremental synchronization, additional fields are automatically added by default when a table is created to facilitate data use. Click View Additional Fields to view information about the currently added fields in the Additional Fields dialog box. Click View DDL For Adding Fields to view the DDL statement for adding the additional fields. |
② Search and Filter Area | You can search by Source Table and Sink Table Name. To quickly filter sink tables, click the filter icon. |
③ Add Global Fields, Refresh Mappings | Click Add Global Fields to add fields to all sink tables. Click Refresh Mappings to refresh the sink table configuration list. |
④ Destination Database List | The destination database list includes Serial Number, Source Table, Mapping Status, Sink Table Creation Method, and Sink Table Name. You can also Add Field, View Fields, Refresh, or Delete a sink table. |
⑤ Batch Operations | You can perform batch Delete operations on sink tables. |
Destination data source is Kafka (destination topic is multiple topics)

Area | Description |
① Search and Filter Area | You can search by Source Table and Destination Topic Name. To quickly filter sink tables, click the filter icon. |
② Refresh Mappings | To refresh the sink table configuration list, click Refresh Mappings. Important: If the destination topic configuration already has content, reselecting the data source type and data source will reset the destination topic list and mapping status. Proceed with caution. |
③ List | The list includes Serial Number, Source Table, Mapping Status, Destination Topic Creation Method, and Destination Topic Name. You can also delete a sink table. |
④ Batch Operations | You can perform batch Delete operations on sink tables. |
DDL message handling policy
The DDL message handling policy is not supported when the source data source type is DataHub or Kafka.
The DDL message handling policy is not supported when the destination data source type is PostgreSQL or Hive (Hudi table format).
When the destination data source type is Hive and the data lake table format is Hudi, only the Ignore policy is supported.
When the source data source type is Kafka, only the Ignore policy is supported.
Data for new columns added to existing partitions of Hive or MaxCompute tables cannot be synchronized. The data for these new columns in existing partitions will be NULL. Data synchronization will function correctly for subsequent new partitions.
Create Table, Add Column, etc.: DDL operations are processed normally, including creating tables, adding columns, deleting columns, renaming columns, and modifying column types. The DDL information is sent to the destination data source for processing; the processing policy varies by destination data source.
Ignore: Discards the DDL message and does not send it to the destination data source.
Error: Immediately stops the real-time sync task with an error status.
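The three policies amount to a simple dispatch on each incoming DDL message. A minimal sketch; the enum and method names are illustrative, not Dataphin's internals.

```java
// Illustrative dispatch for the three DDL handling policies described above.
enum DdlPolicy { PROCESS, IGNORE, ERROR }

final class DdlMessageHandler {
    static void handle(String ddl, DdlPolicy policy) {
        switch (policy) {
            case PROCESS:
                forwardToDestination(ddl); // the destination applies it per its own rules
                break;
            case IGNORE:
                break;                     // discard; the task keeps running
            case ERROR:
                throw new IllegalStateException("DDL received, stopping task: " + ddl);
        }
    }

    private static void forwardToDestination(String ddl) {
        // Hand the DDL statement to the sink connector.
    }
}
```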
Step 3: Configure real-time integration task properties
Click Resource Configuration in the top menu bar of the current real-time integration task tab, or click Property in the right-side sidebar to open the Property panel.
Configure the Basic Information and Resource Configuration for the current real-time integration task.
Basic Information: Select the Development Owner and Operation Owner for the current real-time integration task, and enter a Description for the task. The description can be up to 1,000 characters.
Resource Configuration: For more information, see Real-time integration resource configuration.
Step 4: Submit the real-time integration task
Click Submit to submit the current real-time integration task.
In the Submit dialog box, enter the Submission Remarks and click OK And Submit.
After the submission is complete, you can view the submission details in the Submit dialog box.
If the project is in Dev-Prod mode, you need to publish the real-time integration task to the production environment. For more information, see Manage release tasks.
What to do next
You can view and manage the real-time integration task in the Operation Center to ensure that it runs as expected. For more information, see View and manage real-time tasks.