
Dataphin: Create a real-time integration task

Last Updated: Mar 05, 2026

Real-time integration lets you collect data from multiple source data sources and write it to a single destination data source, establishing a real-time synchronization link. This topic describes how to create a real-time integration task.

Prerequisites

You must configure at least one data source before creating a real-time integration task. This lets you select the source and destination data sources when configuring the task. For more information, see Supported data sources for real-time integration.

Background information

  • If the destination data source is Oracle or MySQL, the Java Database Connectivity (JDBC) protocol is used. Messages are processed according to the following policies.

    • If the sink table does not have a primary key:

      • When an INSERT message is received, it is appended directly.

      • When an UPDATE_BEFORE message is received, it is discarded. When an UPDATE_AFTER message is received, it is appended directly.

      • When a DELETE message is received, it is discarded.

    • If the sink table has a primary key:

      • When an INSERT message is received, it is processed as an UPSERT message.

      • When an UPDATE_BEFORE message is received, it is discarded. When an UPDATE_AFTER message is received, it is processed as an UPSERT message.

      • When a DELETE message is received, it is executed as a DELETE.

  • The JDBC protocol writes data immediately. If a task fails over and the sink table has no primary key, duplicate data may result. Exactly-once delivery is not guaranteed.

  • The JDBC protocol supports only DDL statements for creating tables and adding fields. DDL messages of other types are discarded.

  • Oracle supports only basic data types. The INTERVAL YEAR, INTERVAL DAY, BFILE, SYS.ANY, XML, map, ROWID, and UROWID data types are not supported.

  • MySQL supports only basic data types. The map data type is not supported.

  • To prevent data inconsistency caused by out-of-order data, only a single concurrent task is supported.

  • The Oracle data source supports Oracle Database 11g, Oracle Database 19c, and Oracle Database 21c.

  • The MySQL data source supports MySQL 8.0, MySQL 8.4, and MySQL 5.7.
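
The JDBC write policies above can be summarized as a small dispatch table. The following sketch is illustrative only and is not Dataphin code:

```python
# Illustrative sketch of the JDBC sink policy described above.
# Not Dataphin's actual implementation.

def jdbc_sink_action(op, has_primary_key):
    """Map a changelog operation (INSERT, UPDATE_BEFORE, UPDATE_AFTER,
    DELETE) to the action the JDBC sink takes on the message."""
    if has_primary_key:
        actions = {
            "INSERT": "UPSERT",
            "UPDATE_BEFORE": "DISCARD",
            "UPDATE_AFTER": "UPSERT",
            "DELETE": "DELETE",
        }
    else:
        actions = {
            "INSERT": "APPEND",
            "UPDATE_BEFORE": "DISCARD",
            "UPDATE_AFTER": "APPEND",
            "DELETE": "DISCARD",
        }
    return actions[op]
```

Note that without a primary key the sink can only append, which is why a failover can produce duplicate rows.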

Step 1: Create a real-time integration task

  1. In the top menu bar of the Dataphin homepage, choose Develop > Data Integration.

  2. In the top menu bar, select a project. If you are in Dev-Prod mode, select an environment.

  3. In the left navigation pane, select Integration > Real-time Integration.

  4. Click the image icon in the real-time integration list and select Real-time Integration Task to open the Create Real-time Integration Task dialog box.

  5. In the Create Real-time Integration Task dialog box, configure the following parameters.

    Parameter

    Description

    Task Name

    Enter a name for the real-time task.

    The name must start with a letter, contain only lowercase letters, digits, and underscores (_), and be 4 to 63 characters in length.

    Production/Development environment queue resource

    You can select all resource groups that are configured for real-time tasks.

    Note

    This configuration item is supported only when the compute source used by the project is a Flink compute source in Kubernetes deployment mode.

    Description

    Enter a brief description of the task. The description can be up to 1,000 characters in length.

    Select Directory

    Select the folder where the real-time task is stored.

    If no folder is created, you can create one as follows:

    1. Above the real-time task list on the left, click the image icon to open the New Folder dialog box.

    2. In the New Folder dialog box, enter a folder Name and Select Directory as needed.

    3. Click OK.

  6. After you complete the configuration, click OK.
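
The Task Name rule above can be expressed as a regular expression. A minimal validation sketch, assuming the leading letter must also be lowercase per the lowercase-only rule:

```python
import re

# Task Name rule from the table above: starts with a (lowercase) letter,
# contains only lowercase letters, digits, and underscores,
# and is 4 to 63 characters in length. Illustrative check only;
# Dataphin enforces the rule in the UI.
TASK_NAME_RE = re.compile(r"^[a-z][a-z0-9_]{3,62}$")

def is_valid_task_name(name):
    return TASK_NAME_RE.fullmatch(name) is not None
```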

Step 2: Configure the real-time integration task

The supported source and destination data sources depend on the real-time computing engine. For more information, see Supported data sources for real-time integration.

Source data source

Note

If the source data source is an external data source and you select Entire Database or Select Tables with Batch Select, the table names are retrieved from the Metadata Center. If no metadata acquisition task is configured for the data source, go to Metadata > Acquisition Task to create one.

MySQL

Parameter

Description

Data Source Configuration

Data Source Type

Select MySQL.

Datasource

Select a data source. You can also click New to create a data source on the Datasource page. For more information, see Create a MySQL data source.

Important

Enable logging for the data source and make sure that the configured account has permissions to read logs. Otherwise, the system cannot synchronize data from this data source in real time.

Time Zone

The time zone configured for the selected data source.

Sync Rule Configuration

Sync Solution

Select Real-time Incremental or Real-time Incremental + Full. The default value is Real-time Incremental.

  • Real-time Incremental: Collects incremental changes from the source database and writes them to the downstream destination database in the order they occur.

  • Real-time Incremental + Full: Imports the full data from the source database at one time, and then collects and writes incremental changes to the downstream destination database in the order they occur.

Note

You can set Sync Solution to Real-time Incremental + Full only if the destination data source is Hive (Hudi table format), MaxCompute, or Databricks.

Selection Method

You can select Entire Database, Select Tables, or Exclude Tables.

  • Entire Database: Synchronizes all tables in all databases under the selected data source.

  • Select Tables/Exclude Tables: Selects some tables in the current database for real-time synchronization. After you select tables, click Preview to view all matched tables in the Select/Exclude Table Preview dialog box. In the dialog box, you can search for tables by keyword and delete tables individually or in batches. You cannot delete tables if you use Regex Match.

    • Batch Select/Batch Exclude: If you select Batch Select, multiple selected tables in the current database are synchronized in real time. If you select Batch Exclude, multiple selected tables in the current database are not synchronized in real time.

      You can select all tables in all databases under the selected data source. Tables are displayed in the format of DBname.Tablename.

    • Regex Match: Enter a regular expression for table names in the Regular Expression input box. Java regular expressions are supported, such as schemaA.*|schemaB.*.

      You can match all tables in all databases under the selected data source in batches. You can use the database name (DBname) and table name (Tablename) for regex matching.
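
For example, the pattern from the text matches DBname.Tablename strings as follows. This is shown with Python's re module, whose syntax agrees with Java's for this simple pattern:

```python
import re

# The example pattern from the text: matches tables in schemaA or schemaB.
pattern = re.compile(r"schemaA.*|schemaB.*")

tables = ["schemaA.orders", "schemaB.users", "schemaC.logs"]
matched = [t for t in tables if pattern.fullmatch(t)]
# matched keeps schemaA.orders and schemaB.users but not schemaC.logs
```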

Microsoft SQL Server

Parameter

Description

Data Source Configuration

Data Source Type

Select Microsoft SQL Server.

Datasource

Select a data source. You can also click New to create a data source on the Datasource page. For more information, see Create a Microsoft SQL Server data source.

Important

Enable logging for the data source and make sure that the configured account has permissions to read logs. Otherwise, the system cannot synchronize data from this data source in real time.

Time Zone

The time zone configured for the selected data source.

Sync Rule Configuration

Sync Solution

Only Real-time Incremental is supported. Collects incremental changes from the source database and writes them to the downstream destination database in real time in the order they occur.

Selection Method

You can select Entire Database, Select Tables, or Exclude Tables.

  • Entire Database: Synchronizes data for the entire current database.

  • Select Tables/Exclude Tables: Selects some tables in the current database for real-time synchronization. After you select tables, click Preview to view all matched tables in the Select/Exclude Table Preview dialog box. In the dialog box, you can search for tables by keyword and delete tables individually or in batches.

    Batch Select/Batch Exclude: If you select Batch Select, multiple selected tables in the current database are synchronized in real time. If you select Batch Exclude, multiple selected tables in the current database are not synchronized in real time.

PostgreSQL

Parameter

Description

Data Source Configuration

Data Source Type

Select PostgreSQL.

Datasource

Select a data source. You can also click New to create a data source on the Datasource page. For more information, see Create a PostgreSQL data source.

Important

Enable logging for the data source and make sure that the configured account has permissions to read logs. Otherwise, the system cannot synchronize data from this data source in real time.

Time Zone

The time zone configured for the selected data source.

Sync Rule Configuration

Sync Solution

Only Real-time Incremental is supported. Collects incremental changes from the source database and writes them to the downstream destination database in real time in the order they occur.

Selection Method

You can select Entire Database or Select Tables.

  • Entire Database: Synchronizes data for the entire current database.

  • Select Tables: Selects some tables in the current database for real-time synchronization. After you select tables, click Preview to view all matched tables in the Select Table Preview dialog box. In the dialog box, you can search for tables by keyword and delete tables individually or in batches.

    Batch Select: The selected tables in the current database are synchronized in real time.

Oracle

Parameter

Description

Data Source Configuration

Data Source Type

Select Oracle.

Datasource

Select a data source. You can also click New to create a data source on the Datasource page. For more information, see Create an Oracle data source.

Important

Enable logging for the data source and make sure that the configured account has permissions to read logs. Otherwise, the system cannot synchronize data from this data source in real time.

Time Zone

The time zone configured for the selected data source.

Sync Rule Configuration

Sync Solution

Only Real-time Incremental is supported. Collects incremental changes from the source database and writes them to the downstream destination database in real time in the order they occur.

Selection Method

You can select Entire Database, Select Tables, or Exclude Tables.

  • Entire Database: Synchronizes all tables in all databases under the selected data source.

  • Select Tables/Exclude Tables: Selects some tables in the current database for real-time synchronization. After you select tables, click Preview to view all matched tables in the Select/Exclude Table Preview dialog box. In the dialog box, you can search for tables by keyword and delete tables individually or in batches. You cannot delete tables if you use Regex Match.

    • Batch Select/Batch Exclude: If you select Batch Select, multiple selected tables in the current database are synchronized in real time. If you select Batch Exclude, multiple selected tables in the current database are not synchronized in real time.

    • Regex Match: Enter a regular expression for table names in the Regular Expression input box. Java regular expressions are supported, such as schemaA.*|schemaB.*.

IBM DB2

Parameter

Description

Data Source Configuration

Data Source Type

Select IBM DB2.

Datasource

Select a data source. You can also click New to create a data source on the Datasource page. For more information, see Create an IBM DB2 data source.

Important

Enable logging for the data source and make sure that the configured account has permissions to read logs. Otherwise, the system cannot synchronize data from this data source in real time.

Sync Rule Configuration

Sync Solution

Only Real-time Incremental is supported. Collects incremental changes from the source database and writes them to the downstream destination database in real time in the order they occur.

Selection Method

You can select Entire Database, Select Tables, or Exclude Tables.

  • Entire Database: Synchronizes all tables in all databases under the selected data source.

  • Select Tables/Exclude Tables: Selects some tables in the current database for real-time synchronization. After you select tables, click Preview to view all matched tables in the Select/Exclude Table Preview dialog box. In the dialog box, you can search for tables by keyword and delete tables individually or in batches.

    Batch Select/Batch Exclude: If you select Batch Select, multiple selected tables in the current database are synchronized in real time. If you select Batch Exclude, multiple selected tables in the current database are not synchronized in real time.

Kafka

Parameter

Description

Data Source Configuration

Data Source Type

Select Kafka.

Datasource

Select a data source. You can also click New to create a data source on the Datasource page. For more information, see Create a Kafka data source.

Important

Enable logging for the data source and make sure that the configured account has permissions to read logs. Otherwise, the system cannot synchronize data from this data source in real time.

Source topic

Select the topic of the source data. You can enter a keyword in the topic name to perform a fuzzy search.

Data format

Only Canal JSON is supported. Canal JSON is a Canal-compatible format in which the synchronized data is stored.

Key Type

The key type for Kafka, which determines the key.deserializer configuration when initializing KafkaConsumer. Only STRING is supported.

Value Type

The value type for Kafka, which determines the value.deserializer configuration when initializing KafkaConsumer. Only STRING is supported.

Consumer Group ID (optional)

Enter the ID of the consumer group. The consumer group ID is used to report the consumption offset.

Sync Rule Configuration

Table List

Enter the names of the tables to be synchronized. Separate multiple table names with line breaks. The value can be up to 1,024 characters in length.

Table names can be in one of the following three formats: tablename, db.tablename, or schema.tablename.
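
A sketch of how such a table list can be parsed (hypothetical helper, not a Dataphin API):

```python
# Parses a Table List value: one name per line, in tablename,
# db.tablename, or schema.tablename form. Hypothetical helper.

def parse_table_list(raw):
    entries = []
    for line in raw.splitlines():
        name = line.strip()
        if not name:
            continue
        # Split on the last dot: the part before it is the db/schema
        # qualifier (None for a bare tablename).
        qualifier, _, table = name.rpartition(".")
        entries.append((qualifier or None, table))
    return entries
```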

Hive (Hudi table format)

You can select Hive (Hudi table format) as the source data source only when the real-time engine is Apache Flink and the compute source is a Flink on YARN deployment.

Parameter

Description

Data Source Configuration

Data Source Type

Select Hive.

Datasource

You can only select a Hive data source in Hudi table format. You can also click New to create a data source on the Datasource page. For more information, see Create a Hive data source.

Important

Enable logging for the data source and make sure that the configured account has permissions to read logs. Otherwise, the system cannot synchronize data from this data source in real time.

Sync Rule Configuration

Sync Solution

Only Real-time Incremental is supported. Collects incremental changes from the source database and writes them to the downstream destination database in real time in the order they occur.

Select Table

Select a single table for real-time synchronization.

PolarDB (MySQL database type)

Parameter

Description

Data Source Configuration

Data Source Type

Select PolarDB.

Datasource

You can only select a PolarDB data source of the MySQL database type. You can also click New to create a data source on the Datasource page. For more information, see Create a PolarDB data source.

Important

Enable logging for the data source and make sure that the configured account has permissions to read logs. Otherwise, the system cannot synchronize data from this data source in real time.

Time Zone

The time zone configured for the selected data source.

Sync Rule Configuration

Sync Solution

Select Real-time Incremental or Real-time Incremental + Full. The default value is Real-time Incremental.

  • Real-time Incremental: Collects incremental changes from the source database and writes them to the downstream destination database in the order they occur.

  • Real-time Incremental + Full: Imports the full data from the source database at one time, and then collects and writes incremental changes to the downstream destination database in the order they occur.

Note

You can set Sync Solution to Real-time Incremental + Full only if the destination data source is Hive (Hudi table format), MaxCompute, or Databricks.

Selection Method

You can select Entire Database, Select Tables, or Exclude Tables.

  • Entire Database: Synchronizes all tables in all databases under the selected data source.

  • Select Tables/Exclude Tables: Selects some tables in the current database for real-time synchronization. After you select tables, click Preview to view all matched tables in the Select/Exclude Table Preview dialog box. In the dialog box, you can search for tables by keyword and delete tables individually or in batches. You cannot delete tables if you use Regex Match.

    • Batch Select/Batch Exclude: If you select Batch Select, multiple selected tables in the current database are synchronized in real time. If you select Batch Exclude, multiple selected tables in the current database are not synchronized in real time.

    • Regex Match: Enter a regular expression for table names in the Regular Expression input box. Java regular expressions are supported, such as schemaA.*|schemaB.*.

Destination data source

MaxCompute

Parameter

Description

Data Source Configuration

Data Source Type

Select MaxCompute.

Datasource

Select a destination data source. You can select a MaxCompute data source and project. You can also click New to create a data source on the data source page. For more information, see Create a MaxCompute data source.

Sink Table Creation Configuration

New Table Type

Select Standard Table or Delta Table. The default value is Standard Table.

If you select Delta Table and set the sink table creation method to Auto-create table, a MaxCompute Delta table is created. Additional fields are not used when creating a Delta table.

Note

After you configure the sink table, if you change the new table type, the system asks for confirmation. If you click OK in the dialog box, the sink table configuration is cleared and you must re-enter it.

Table Name Transform

Sink table names can contain only letters, digits, and underscores (_). If a source table name contains other characters, you must configure a table name transform rule.

Click Configure Table Name Transform to open the Configure Table Name Transform Rules dialog box.

  • Replace String: Click New Rule to add a rule. Configure Source Table String to Replace and Sink Table Replacement String. You can add up to 5 rules.

  • Table Name Prefix/Suffix: Cannot be empty. Can contain only letters, digits, and underscores (_). The length cannot exceed 32 characters.

Note
  • After you configure the table name transform, the system automatically matches and replaces strings based on the transform rules in top-down order.

  • Letters in the replacement strings and table name prefixes and suffixes are automatically converted to lowercase.
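
The transform described above can be sketched as follows (illustrative only; the 5-rule and 32-character limits are enforced by the product UI):

```python
# Sketch of the table name transform: replace-string rules are applied
# in top-down order, then the prefix/suffix is attached. Per the note,
# letters in replacement strings and in the prefix/suffix are lowercased.
# Hypothetical helper, not a Dataphin API.

def transform_table_name(name, rules, prefix="", suffix=""):
    for source_str, replacement in rules:  # top-down order
        name = name.replace(source_str, replacement.lower())
    return prefix.lower() + name + suffix.lower()
```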

Partition Format

If you set New Table Type to Standard Table, the partition format supports only Multiple Partitions. If you set it to Delta Table, the partition format supports No Partition or Multiple Partitions.

Partition Interval

If you set Partition Format to No Partition, you cannot configure the partition interval. If you set Partition Format to Multiple Partitions, you can set the partition interval to hour or day.

Note
  • hour: Four levels of partitions: YYYY, MM, DD, and HH.

  • day: Three levels of partitions: YYYY, MM, and DD.
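
For a timestamp such as 2026-03-05 09:00, the two intervals produce the following partition values (hypothetical helper for illustration):

```python
from datetime import datetime

# "hour" yields four partition levels (YYYY, MM, DD, HH);
# "day" yields three (YYYY, MM, DD). Hypothetical helper,
# not a Dataphin API.

def partition_values(ts, interval):
    levels = [ts.strftime("%Y"), ts.strftime("%m"), ts.strftime("%d")]
    if interval == "hour":
        levels.append(ts.strftime("%H"))
    return levels
```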

MySQL

Parameter

Description

Data Source Configuration

Data Source Type

Select MySQL.

Datasource

Select a data source. You can also click New to create a data source on the Datasource page. For more information, see Create a MySQL data source.

Time Zone

The time zone configured for the selected data source.

Sink Table Creation Configuration

Table Name Transform

Sink table names can contain only letters, digits, and underscores (_). If a source table name contains other characters, you must configure a table name transform rule.

Click Configure Table Name Transform to open the Configure Table Name Transform Rules dialog box.

  • Replace String: Click New Rule to add a rule. Configure Source Table String to Replace and Sink Table Replacement String. You can add up to 5 rules.

  • Table Name Prefix/Suffix: Cannot be empty. Can contain only letters, digits, and underscores (_). The length cannot exceed 32 characters.

Note
  • After you configure the table name transform, the system automatically matches and replaces strings based on the transform rules in top-down order.

  • Letters in the replacement strings and table name prefixes and suffixes are automatically converted to lowercase.

Microsoft SQL Server

Parameter

Description

Data Source Configuration

Data Source Type

Select Microsoft SQL Server.

Datasource

Select a data source. You can also click New to create a data source on the Datasource page. For more information, see Create a Microsoft SQL Server data source.

Time Zone

The time zone configured for the selected data source.

Sink Table Creation Configuration

Table Name Transform

Sink table names can contain only letters, digits, and underscores (_). If a source table name contains other characters, you must configure a table name transform rule.

Click Configure Table Name Transform to open the Configure Table Name Transform Rules dialog box.

  • Replace String: Click New Rule to add a rule. Configure Source Table String to Replace and Sink Table Replacement String. You can add up to 5 rules.

  • Table Name Prefix/Suffix: Cannot be empty. Can contain only letters, digits, and underscores (_). The length cannot exceed 32 characters.

Note
  • After you configure the table name transform, the system automatically matches and replaces strings based on the transform rules in top-down order.

  • Letters in the replacement strings and table name prefixes and suffixes are automatically converted to lowercase.

Oracle

Parameter

Description

Data Source Configuration

Data Source Type

Select Oracle.

Datasource

Select a data source. You can also click New to create a data source on the Datasource page. For more information, see Create an Oracle data source.

Time Zone

The time zone configured for the selected data source.

Sink Table Creation Configuration

Table Name Transform

Sink table names can contain only letters, digits, and underscores (_). If a source table name contains other characters, you must configure a table name transform rule.

Click Configure Table Name Transform to open the Configure Table Name Transform Rules dialog box.

  • Replace String: Click New Rule to add a rule. Configure Source Table String to Replace and Sink Table Replacement String. You can add up to 5 rules.

  • Table Name Prefix/Suffix: Cannot be empty. Can contain only letters, digits, and underscores (_). The length cannot exceed 32 characters.

Note
  • After you configure the table name transform, the system automatically matches and replaces strings based on the transform rules in top-down order.

  • Letters in the replacement strings and table name prefixes and suffixes are automatically converted to lowercase.

Kafka

Parameter

Description

Data Source Configuration

Data Source Type

Select Kafka.

Datasource

Select a data source. You can also click New to create a data source on the Datasource page. For more information, see Create a Kafka data source.

Destination Topic

The topic for the destination data. You can select Single Topic or Multiple Topics. If you select Single Topic, select a destination topic. You can enter a keyword in the topic name to search. If you select Multiple Topics, you can configure topic name transform and topic parameters.

  • Single Topic: All table messages are written to the same topic.

  • Multiple Topics: A topic with the same name is created for each table.

Data format

Set the storage format for the written data. Supported formats are DTS Avro and Canal JSON.

  • DTS Avro: A data serialization format that converts data structures or objects into a format that is easy to store or transmit.

  • Canal JSON: A format compatible with Canal. Data is stored in Canal JSON format.

Note

If you set Destination Topic to Multiple Topics, you can only set Data format to Canal JSON.

Destination topic configuration

Topic Name Transform

Click Configure Topic Name Transform. In the Configure Topic Name Transform Rules dialog box, configure Topic Name Transform Rules and a prefix and suffix for the topic name.

  • Topic Name Transform Rules: Click New Rule to add a rule. You must enter the Source Table String to Replace and Destination Topic Replacement String. Neither can be empty, and the Destination Topic Replacement String can only contain letters, digits, and underscores (_) and be up to 32 characters in length.

  • Prefix and suffix for the topic name: Can contain letters, digits, and underscores (_). The length cannot exceed 32 characters.

Note
  • Letters in the replacement strings and topic name prefixes and suffixes are automatically converted to lowercase.

  • You can configure topic name transform only when Destination Topic is set to Multiple Topics.

Topic Parameters

Additional parameters for creating a topic. The format is key=value. Separate multiple parameters with line breaks.

Note

This item can be configured only when Destination Topic is set to Multiple Topics.
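
The key=value format can be parsed as follows. This is an illustrative sketch; the parameter names shown are ordinary Kafka topic configs used only as examples:

```python
# Parses Topic Parameters: key=value pairs separated by line breaks.
# Illustrative sketch only.

def parse_topic_params(raw):
    params = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition("=")
        params[key.strip()] = value.strip()
    return params
```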

DataHub

Parameter

Description

Destination Data

Data Source Type

Select DataHub.

Datasource

Select a destination data source.

The system provides a shortcut to create a data source. You can click New to create a DataHub data source on the data source page. For more information, see Create a DataHub data source.

Destination Topic Creation Method

You can select New Topic or Use Existing Topic.

  • New Topic: Manually enter the destination topic to create it.

  • Use Existing Topic: Use an existing topic in the destination database. Make sure that the topic's schema is consistent with the format of the sync message. Otherwise, the sync task will fail.

Destination Topic

  • Destination Topic Creation Method is New Topic.

    Manually enter the Destination Topic. The name must start with a lowercase letter, contain only letters, digits, and underscores (_), and be 3 to 64 characters in length.

    After you enter the topic, click Validate to check if the topic already exists in the destination database.

    • If the topic does not exist in the destination database, it is automatically created. The schema is the schema of the sync message, and the default lifecycle is 7 days.

    • If the topic already exists in the destination database, make sure that the topic's schema is consistent with the schema of the sync message. Otherwise, the task will fail.

  • Destination Topic Creation Method is Use Existing Topic.

    Click the drop-down list to select an existing topic in the destination database. If there are many topics, you can enter a topic name to search for the desired topic.

Databricks

Parameter

Description

Data Source Configuration

Data Source Type

Select Databricks.

Datasource

Select a destination data source. You can select a Databricks data source and project. You can also click New to create a data source on the data source page. For more information, see Create a Databricks data source.

Time Zone

Time-formatted data is processed based on the current time zone. By default, this is the time zone configured in the selected data source and cannot be modified.

Note

Time zone conversion is supported only when the source data source type is MySQL or PostgreSQL and the destination data source type is Databricks.

Sink Table Creation Configuration

Table Name Transform

Sink table names can contain only letters, digits, and underscores (_). If a source table name contains other characters, you must configure a table name transform rule.

Click Configure Table Name Transform to open the Configure Table Name Transform Rules dialog box.

  • Replace String: Click New Rule to add a rule. Configure Source Table String to Replace and Sink Table Replacement String. You can add up to 5 rules.

  • Table Name Prefix/Suffix: Cannot be empty. Can contain only letters, digits, and underscores (_). The length cannot exceed 32 characters.

Note
  • After you configure the table name transform, the system automatically matches and replaces strings based on the transform rules in top-down order.

  • Letters in the replacement strings and table name prefixes and suffixes are automatically converted to lowercase.

Partition Format

You can select No Partition or Multiple Partitions.

Partition Interval

If you set Partition Format to No Partition, you cannot configure the partition interval. If you set Partition Format to Multiple Partitions, you can set the partition interval to hour or day.

Note
  • hour: Four levels of partitions: YYYY, MM, DD, and HH.

  • day: Three levels of partitions: YYYY, MM, and DD.

SelectDB

Parameter

Description

Data Source Configuration

Data Source Type

Select SelectDB.

Datasource

Select a data source. You can also click New to create a data source on the Datasource page. For more information, see Create a SelectDB data source.

Sink Table Creation Configuration

Table Name Transform

Sink table names can contain only letters, digits, and underscores (_). If a source table name contains other characters, you must configure a table name transform rule.

Click Configure Table Name Transform to open the Configure Table Name Transform Rules dialog box.

  • Replace String: Click New Rule to add a rule. Configure Source Table String to Replace and Sink Table Replacement String. You can add up to 5 rules.

  • Table Name Prefix/Suffix: Cannot be empty. Can contain only letters, digits, and underscores (_). The length cannot exceed 32 characters.

Note
  • After you configure the table name transform, the system automatically matches and replaces strings based on the transform rules in top-down order.

  • Letters in the replacement strings and table name prefixes and suffixes are automatically converted to lowercase.

Hive

Parameter

Description

Data Source Configuration

Data Source Type

Set Data Source Type to Hive.

Datasource

Select a data source. You can also click New to create a data source on the Datasource page. For more information, see Create a Hive data source.

Sink Table Creation Configuration

Data lake table format

You can select None, Hudi, Iceberg, or Paimon.

  • None: Writes data and creates tables as standard Hive tables.

  • Hudi: Writes data and creates tables in Hudi format. You can select Hudi only when the Hive data source version is CDP7.x Hive 3.1.3.

  • Iceberg: Writes data and creates tables in Iceberg format. You can select Iceberg only when the Hive data source version is EMR5.x Hive 3.1.x.

  • Paimon: Writes data and creates tables in Paimon format. You can select Paimon only when the Hive data source version is EMR5.x Hive 3.1.x.

Note

This item can be configured only when Data lake table format configuration is enabled for the selected Hive data source.

Hudi Table Type/Paimon Table Type

For Hudi Table Type, you can select MOR (merge on read) or COW (copy on write).

For Paimon Table Type, you can select MOR (merge on read), COW (copy on write), or MOW (merge on write).

Note

This item can be configured only when Data lake table format is set to Hudi or Paimon.

Table Creation Execution Engine

You can select Hive or Spark. If you select a data lake table format, Spark is selected by default.

  • Hive: Uses the Hive engine to create tables. The table creation syntax is Hive syntax.

  • Spark: Uses the Spark engine to create tables. The table creation syntax is Spark syntax. You can select Spark only when Spark is enabled for the Hive data source.

    Note

    When Data lake table format is set to Paimon, only Spark is supported as the table creation execution engine.

Table Name Transform

Sink table names can contain only letters, digits, and underscores (_). If a source table name contains other characters, you must configure a table name transform rule.

Click Configure Table Name Transform to open the Configure Table Name Transform Rules dialog box.

  • Replace String: Click New Rule to add a rule. Configure Source Table String to Replace and Sink Table Replacement String. You can add up to 5 rules.

  • Table Name Prefix/Suffix: Cannot be empty. Can contain only letters, digits, and underscores (_). The length cannot exceed 32 characters.

Note
  • After you configure the table name transform, the system automatically matches and replaces strings based on the transform rules in top-down order.

  • Letters in the replacement strings and table name prefixes and suffixes are automatically converted to lowercase.

Partition Format

You can select Single Partition, Multiple Partitions, or Fixed Partition.

Note

If you select Single Partition or Fixed Partition, the default partition field name is ds and cannot be modified.

Partition Interval

The default value is hour. You can also select day. Click the icon next to Partition Interval to view partition setting details.

  • Single Partition:

    • hour: A hash partition (yyyyMMddhh) with the partition key column named ds.

    • day: A hash partition (yyyyMMdd) with the partition key column named ds.

  • Multiple Partitions:

    • hour: Four levels of partitions: yyyy, mm, dd, and hh.

    • day: Three levels of partitions: yyyy, mm, and dd.

Note

This configuration item is supported only when Partition Format is set to Single Partition or Multiple Partitions.
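The partition layouts above can be sketched with standard date formatting. This is an illustrative example of the described layouts, not Dataphin code.

```python
from datetime import datetime

# Sketch of the partition values produced by each Partition Format and
# Partition Interval combination described above.

def partition_values(ts, partition_format, interval):
    if partition_format == "Single Partition":
        fmt = "%Y%m%d%H" if interval == "hour" else "%Y%m%d"
        return {"ds": ts.strftime(fmt)}                 # single key column: ds
    if partition_format == "Multiple Partitions":
        parts = {"yyyy": ts.strftime("%Y"),
                 "mm": ts.strftime("%m"),
                 "dd": ts.strftime("%d")}
        if interval == "hour":
            parts["hh"] = ts.strftime("%H")             # fourth level for hourly
        return parts
    raise ValueError(partition_format)

ts = datetime(2025, 1, 1, 9)
print(partition_values(ts, "Single Partition", "hour"))    # {'ds': '2025010109'}
print(partition_values(ts, "Multiple Partitions", "day"))  # {'yyyy': '2025', 'mm': '01', 'dd': '01'}
```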

Partition Value

Enter a fixed partition value, for example, 20250101.

Note

This configuration item is supported only when Partition Format is set to Fixed Partition.

Hologres

Parameter

Description

Data Source Configuration

Data Source Type

Select Hologres.

Datasource

Select a destination data source. You can select a Hologres data source and project. You can also click New to create a data source on the data source page. For more information, see Create a Hologres data source.

Schema

Select a destination schema.

Sink Table Creation Configuration

Table Name Transform

Sink table names can contain only letters, digits, and underscores (_). If a source table name contains other characters, you must configure a table name transform rule.

Click Configure Table Name Transform to open the Configure Table Name Transform Rules dialog box.

  • Replace String: Click New Rule to add a rule. Configure Source Table String to Replace and Sink Table Replacement String. You can add up to 5 rules.

  • Table Name Prefix/Suffix: Cannot be empty. Can contain only letters, digits, and underscores (_). The length cannot exceed 32 characters.

Note
  • After you configure the table name transform, the system automatically matches and replaces strings based on the transform rules in top-down order.

  • Letters in the replacement strings and table name prefixes and suffixes are automatically converted to lowercase.

StarRocks

Parameter

Description

Data Source Configuration

Data Source Type

Select StarRocks.

Datasource

Select a data source. You can also click New to create a data source on the Datasource page. For more information, see Create a StarRocks data source.

Sink Table Creation Configuration

Table Name Transform

Sink table names can contain only letters, digits, and underscores (_). If a source table name contains other characters, you must configure a table name transform rule.

Click Configure Table Name Transform to open the Configure Table Name Transform Rules dialog box.

  • Replace String: Click New Rule to add a rule. Configure Source Table String to Replace and Sink Table Replacement String. You can add up to 5 rules.

  • Table Name Prefix/Suffix: Cannot be empty. Can contain only letters, digits, and underscores (_). The length cannot exceed 32 characters.

Note
  • After you configure the table name transform, the system automatically matches and replaces strings based on the transform rules in top-down order.

  • Letters in the replacement strings and table name prefixes and suffixes are automatically converted to lowercase.

Mapping configuration

Note
  • Mapping configuration is not supported if the destination data source is DataHub or Kafka (with a single destination topic).

  • If the destination data source is an external data source, the sink table names in the mapping configuration are retrieved from the Metadata Center. In this case, the sink table creation method does not support auto-create table. You must manually create the sink table in the database.

Destination data source is not Kafka

Block

Description

View additional fields

During real-time incremental synchronization, additional fields are automatically added to auto-created tables to facilitate data use. Click View additional fields to open the Additional Fields dialog box and view the currently added fields.

Important
  • If you select an existing table as the sink table and it has no additional fields, add them to the existing sink table. Otherwise, data usage will be affected.

  • If you select a data lake table format, no additional fields are included.

Click View DDL for Adding Fields to view the DDL statement for adding the additional fields.

Note

Viewing additional fields is not supported when the source data source type is Kafka.

Search and filter area

Search by Source Table and Sink Table Name. To quickly filter sink tables, click the filter icon at the top and filter by Mapping Status and Creation Method.

Add global fields, Refresh mapping

  • Add global fields

    Click Add global fields to add global fields in the Add Global Fields dialog box.

    • Name: The name of the global field.

    • Type: Supported data types are String, Long, Double, Date, and Boolean.

    • Value: The value of the global field.

    • Description: A description of the field.

    Note
    • If a field is added both globally and for a single table, only the single-table field takes effect.

    • Currently, only constants can be added.

    • Global fields only take effect for sink tables that are set to Auto-create table.

    • Adding global fields is not supported when the source data source type is Kafka.

  • Refresh mapping

    To refresh the sink table configuration list, click Refresh mapping.

    Important
    • If the sink table configuration already has content, reselecting the data source type and data source will reset the sink table list and mapping. Proceed with caution.

    • You can click Refresh mapping again at any time during the refresh process. Each time you click Refresh mapping, only the configured global fields are saved. Other information, including the sink table creation method, sink table name, and deletion records, is not saved.

    • When the source data source type is Kafka, clicking Refresh mapping will map based on the table list in the Sync Rule Configuration. An error is reported if a table does not exist.
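The precedence rule for global fields can be sketched as a simple merge. This is an assumption drawn from the note above, not the actual Dataphin implementation; the field shape is hypothetical and only constant values are supported.

```python
# Sketch of the stated precedence: a field added for a single table
# overrides a global field with the same name.

def effective_fields(global_fields, table_fields):
    merged = {f["name"]: f for f in global_fields}
    merged.update({f["name"]: f for f in table_fields})  # per-table wins
    return list(merged.values())

g = [{"name": "src_system", "type": "String", "value": "crm"}]
t = [{"name": "src_system", "type": "String", "value": "crm_cn"}]
print(effective_fields(g, t))  # the per-table value 'crm_cn' wins
```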

Destination database list

The destination database list includes Serial Number, Source Table, Mapping Status, Sink Table Creation Method, and Sink Table Name. You can also add fields, view fields, refresh, or delete a sink table.

  • Mapping Status:

    • Completed: Mapping is completed normally.

    • Incomplete: The configuration was modified, but the mapping was not refreshed.

    • Mapping: Waiting for mapping or in the process of mapping.

    • Abnormal: A data source or internal system error exists.

    • Failed: The destination partitioned table is inconsistent with the partition set for the real-time task.

    • Alerting: The source and sink tables may have incompatible data types.

  • Sink Table Creation Method has three options:

    • If a table with the same name as the source table exists in the destination database, the creation method is "Use existing table", and this table is used as the sink table by default. To change to "Auto-create table", add a table name transform rule or a prefix/suffix and remap.

    • If no table with the same name is found in the destination database, the creation method defaults to "Auto-create table". You can also change it to "Use existing table" and select an existing table for synchronization.

    • Only tables that are auto-created support adding fields or custom DDL table creation. Global fields also only take effect for auto-created tables.

    Note
    • When the destination data source type is Hive:

      • During auto-creation, if the data lake table format is None, a standard Hive table is created. Otherwise, a table of the selected format is created. Hudi and Iceberg are currently supported.

      • During custom creation, if the data lake table format is None, use the DDL for a standard Hive table. Otherwise, use the DDL for the selected table format. Hudi and Iceberg are currently supported.

    • When the source data source type is Kafka, the sink table creation method only supports Use existing table.

    • When the destination data source type is SelectDB, if the source table has no primary key, a Duplicate table is created during auto-creation. If the source table has a primary key, a Unique table is created.

    • If Partition Format is set to Single Partition or Fixed Partition and Sink Table Creation Method is set to Use existing table, the system automatically checks if the sink table partition matches the partition settings. An error is reported if it does not match.

    • When the destination data source is StarRocks, auto-creation creates a StarRocks table. If the source table has no primary key, a Duplicate table is created. If the source table has a primary key, a Primary table is created.

  • Sink Table Name: Sink table names can contain only letters, digits, and underscores (_). If a source table name contains other characters, you must configure a table name transform rule.

    When the destination data source type is MaxCompute: If Sink Table Creation Method is Auto-create table and New Table Type is Delta Table, an icon is displayed next to the sink table name to indicate that a Delta table will be created. If Sink Table Creation Method is Use existing table and you select a Delta table from the sink table list, the icon is also displayed next to the sink table name to indicate that the table is a Delta table.

  • Actions:

    • Custom table creation: You can create a table by adding fields or using DDL. After enabling custom table creation, global fields no longer take effect.

      Note
      • Added fields are only displayed in the actions column for auto-created tables.

      • You cannot modify an existing sink table (a table with the creation method "Use existing table").

    • View fields: View the fields and types of the source and sink tables.

    • Refresh: Remap the source and sink tables.

    • Delete: Deletes the corresponding source table from the list. This operation cannot be undone.
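The creation-method defaults and the key-based StarRocks table types described above can be sketched as follows. This is a minimal illustration of the stated rules, not Dataphin internals.

```python
# Sketch of the defaults: the creation method depends on whether a
# same-name table already exists in the destination, and the auto-created
# StarRocks table type depends on whether the source table has a primary key.

def creation_method(sink_table_name, existing_tables):
    if sink_table_name in existing_tables:
        return "Use existing table"   # same-name table found in destination
    return "Auto-create table"        # no match found: auto-create by default

def starrocks_table_type(has_primary_key):
    # Auto-creation: Primary table with a primary key, otherwise Duplicate.
    return "Primary" if has_primary_key else "Duplicate"

print(creation_method("orders", {"orders", "users"}))  # Use existing table
print(starrocks_table_type(False))                     # Duplicate
```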

Batch operations

You can Delete sink tables in batches.

Destination data source is Kafka (with multiple destination topics)

Block

Description

Search and filter area

Search by Source Table and Destination Topic Name. To quickly filter sink tables, click the filter icon at the top and filter by Mapping Status and Destination Topic Creation Method.

Refresh mapping

To refresh the sink table configuration list, click Refresh mapping.

Important

If the destination topic configuration already has content, reselecting the data source type and data source will reset the destination topic list and mapping. Proceed with caution.

List

The list includes Serial Number, Source Table, Mapping Status, Destination Topic Creation Method, and Destination Topic Name. You can also delete a sink table.

  • Destination Topic Creation Method: If the destination topic already exists, the creation method is Use Existing Topic. If the destination topic does not exist, the creation method is Auto-create Topic.

    When a topic is auto-created, the system creates it based on the generated destination topic name and topic parameters.

  • Mapping Status: Only checks if the destination topic exists.

  • Delete: Deletes the corresponding row. This operation cannot be undone.
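The topic-mapping rule above reduces to a simple existence check. A minimal sketch, assuming the described behavior:

```python
# Sketch of the assumed rule when the destination is Kafka with multiple
# topics: an existing topic is reused; a missing one is auto-created.

def topic_creation_method(topic, existing_topics):
    if topic in existing_topics:
        return "Use Existing Topic"
    return "Auto-create Topic"

print(topic_creation_method("orders_cdc", {"orders_cdc"}))  # Use Existing Topic
print(topic_creation_method("users_cdc", {"orders_cdc"}))   # Auto-create Topic
```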

Batch operations

You can Delete sink tables in batches.

DDL processing policy

Note
  • DDL processing policies are not supported when the source data source type is DataHub or Kafka.

  • DDL processing policies are not supported when the destination data source type is PostgreSQL.

  • When the destination data source type is Hive and the data lake table format is Hudi, all DDL processing policies support only Ignore.

  • When the source data source type is Kafka, all DDL processing policies support only Ignore.

  • New columns added to existing partitions of Hive or MaxCompute tables cannot have their data synchronized; the values of these columns in existing partitions are NULL. The new columns take effect only in partitions created after the change.

  • Normal processing: The DDL information (for operations such as creating tables, adding columns, deleting columns, renaming columns, and modifying column types) is passed to the destination data source for processing. Processing behavior varies by destination data source.

  • Ignore: Discards this DDL information and does not send it to the destination data source.

  • Error: Stops the real-time sync task with an error status.
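The three policies can be sketched as a small dispatcher. This is an illustrative example of the described behavior; the function and exception names are assumptions.

```python
# Sketch of the three DDL processing policies described above.

class DDLError(Exception):
    """Raised when the Error policy stops the sync task."""

def handle_ddl(ddl_event, policy):
    if policy == "Normal processing":
        return f"forward to sink: {ddl_event}"   # sink applies its own rules
    if policy == "Ignore":
        return None                              # discard; nothing sent to sink
    if policy == "Error":
        raise DDLError(f"task stopped on DDL: {ddl_event}")
    raise ValueError(f"unknown policy: {policy}")

print(handle_ddl("ALTER TABLE t ADD COLUMN c INT", "Ignore"))  # None
```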

Step 3: Configure real-time integration task properties

  1. Click Resource Configuration in the top menu bar of the current real-time integration task tab, or click Property in the right sidebar to open the Property panel.

  2. Configure the Basic Information and Resource Configuration for the current real-time integration task.

    • Basic Information: Select the Development Owner and Operation Owner for the current real-time integration task, and enter a Description for the task. The description can be up to 1,000 characters long.

    • Resource Configuration: For more information, see Real-time integration resource configuration.

Step 4: Submit the real-time integration task

  1. Click Submit to submit the current real-time integration task.

  2. In the Submit dialog box, enter Submission notes and click OK and Submit.

  3. After submission, you can view the submission details in the Submit dialog box.

    If the project is in Dev-Prod mode, you must publish the real-time integration task to the production environment. For more information, see Manage publish tasks.

What to do next

You can view and manage the real-time integration task in the Operation Center to ensure that it runs as expected. For more information, see View and manage real-time tasks.