Dataphin: Create a real-time integration task

Last Updated: Nov 19, 2025

Real-time integration enables you to collect and combine data from multiple data sources into a destination data source. This process creates a real-time link for data synchronization. This topic describes how to create a real-time integration task.

Prerequisites

You must configure the required data sources before you create a real-time integration task. This lets you select the source and destination data during the configuration process. For more information, see Supported data sources for real-time integration.

Background information

  • If you select Oracle or MySQL as the destination data source, the Java Database Connectivity (JDBC) protocol is used. Different messages are processed based on the following policies.

    • If the sink table does not have a primary key:

      • INSERT messages are directly appended.

      • UPDATE_BEFORE messages are discarded. UPDATE_AFTER messages are directly appended.

      • DELETE messages are discarded.

    • If the sink table has a primary key:

      • INSERT messages are processed as UPSERT messages.

      • UPDATE_BEFORE messages are discarded. UPDATE_AFTER messages are processed as UPSERT messages.

      • DELETE messages are processed as DELETE messages.

  • Because the JDBC protocol writes data immediately, duplicate data may exist if a node fails over and the sink table does not have a primary key. Exactly-once delivery is not guaranteed.

  • Because the JDBC protocol supports only Data Definition Language (DDL) statements for creating tables and adding fields, other types of DDL messages are discarded.

  • Oracle supports only basic data types. The INTERVAL YEAR, INTERVAL DAY, BFILE, SYS.ANY, XML, map, ROWID, and UROWID data types are not supported.

  • MySQL supports only basic data types. The map data type is not supported.

  • To prevent data inconsistency caused by out-of-order data, only a single concurrent task is supported.

  • Oracle data sources support Oracle Database 11g, Oracle Database 19c, and Oracle Database 21c.

  • MySQL data sources support MySQL 8.0, MySQL 8.4, and MySQL 5.7.
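The JDBC sink policies above amount to a small decision table. The following Java sketch is illustrative only; the class and method names are hypothetical and are not part of Dataphin:

```java
// Illustrative sketch of the JDBC sink message-handling policies described
// above. Hypothetical names; not Dataphin's actual implementation.
public class JdbcSinkPolicy {

    /**
     * Returns the action for a change-log message, given whether the sink
     * table has a primary key. "APPEND" writes the row as-is, "UPSERT"
     * inserts or updates by primary key, "DELETE" deletes by primary key,
     * and "DISCARD" drops the message.
     */
    public static String actionFor(boolean hasPrimaryKey, String messageType) {
        switch (messageType) {
            case "INSERT":
                return hasPrimaryKey ? "UPSERT" : "APPEND";
            case "UPDATE_BEFORE":
                return "DISCARD";                  // discarded in both cases
            case "UPDATE_AFTER":
                return hasPrimaryKey ? "UPSERT" : "APPEND";
            case "DELETE":
                return hasPrimaryKey ? "DELETE" : "DISCARD";
            default:
                throw new IllegalArgumentException(messageType);
        }
    }
}
```

Because the no-primary-key path only ever appends, a failover that replays messages can produce duplicates, which is why exactly-once delivery is not guaranteed in that case.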

Step 1: Create a real-time integration task

  1. On the Dataphin homepage, choose Developer > Data Integration from the top menu bar.

  2. In the top menu bar, select a project. If you are in Dev-Prod mode, you also need to select an environment.

  3. In the navigation pane on the left, choose Integration > Stream Pipeline.

  4. In the real-time integration list, click the icon and choose Real-time Integration Task to open the Create Real-time Integration Task dialog box.

  5. In the Create Real-time Integration Task dialog box, configure the following parameters.

    Parameter

    Description

    Task Name

    Enter a name for the real-time task.

    The name must be 4 to 63 characters in length, must start with a letter, and can contain only lowercase letters, digits, and underscores (_).

    Production/Development Environment Queue Resource

    You can select any resource group that is configured for real-time tasks.

    Note

    This configuration item is available only when the project uses a Flink compute source in Kubernetes deployment mode.

    Description

    Enter a brief description of the task. The description can be up to 1,000 characters in length.

    Select Directory

    Select a directory to store the real-time task.

    If no directory exists, you can create a new folder as follows:

    1. Above the real-time task list on the left, click the icon to open the New Folder dialog box.

    2. In the New Folder dialog box, enter a Name for the folder and select a location under Select Directory as needed.

    3. Click OK.

  6. After you complete the configuration, click OK.

Step 2: Configure the real-time integration task

The supported source and destination data sources vary based on the real-time computing engine. For more information, see Supported data sources for real-time integration.

Source data source

MySQL

Parameter

Description

Data Source Configuration

Data Source Type

Select MySQL.

Datasource

Select a data source. You can also click New to create a data source on the Datasource page. For more information, see Create a MySQL data source.

Important

Enable logging for the data source and ensure that the configured account has permissions to read logs. Otherwise, the system cannot synchronize data from this data source in real time.

Time Zone

Displays the time zone configured for the selected data source.

Sync Rule Configuration

Sync Policy

Select Real-time Incremental or Real-time Incremental + Full. The default value is Real-time Incremental.

  • Real-time Incremental: Collects incremental changes from the source database and writes them to the downstream destination database in the order they occur.

  • Real-time Incremental + Full: Imports the full data from the source database at once, and then collects and writes incremental changes to the downstream destination database in the order they occur.

Note

You can select Real-time Incremental + Full only when the destination data source is Hive (Hudi table format), MaxCompute, or Databricks.

Selection Method

You can select Entire Database, Select Tables, or Exclude Tables.

  • Entire Database: Synchronizes all tables in all databases under the selected data source.

  • Select Tables/Exclude Tables: Selects some tables in the current database for real-time synchronization. After you select the tables, you can click Preview to view all matched tables in the Select/Exclude Table Preview dialog box. In the dialog box, you can search for tables by keyword and delete tables individually or in batches. Deletion is not supported for regular expression matches.

    • Batch Select/Batch Exclude: When you use Batch Select, the selected tables in the current database are synchronized in real time. When you use Batch Exclude, the selected tables are not synchronized.

      You can select all tables in all databases under the selected data source. The tables are displayed in the DBname.Tablename format.

    • Regular Expression Match: Enter a regular expression for table names in the Regular Expression box. Java regular expressions are supported, such as schemaA.*|schemaB.*.

      You can match all tables in all databases under the selected data source in batches. You can use the database name (DBname) and table name (Tablename) for regular expression matching.
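As an illustration of the regular-expression matching described above, this hypothetical Java sketch applies a pattern such as schemaA.*|schemaB.* to qualified DBname.Tablename strings. Full-string matching is assumed here; Dataphin's exact matching semantics may differ:

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Sketch: selecting tables with a Java regular expression, as described
// above. Table names and class names are hypothetical.
public class TableMatcher {

    // Returns the qualified names (DBname.Tablename) that fully match the regex.
    public static List<String> match(String regex, List<String> qualifiedNames) {
        Pattern p = Pattern.compile(regex);
        return qualifiedNames.stream()
                .filter(name -> p.matcher(name).matches())
                .collect(Collectors.toList());
    }
}
```

For example, schemaA.*|schemaB.* matches schemaA.orders and schemaB.users but not schemaC.logs.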

Microsoft SQL Server

Parameter

Description

Data Source Configuration

Data Source Type

Select Microsoft SQL Server.

Datasource

Select a data source. You can also click New to create a data source on the Datasource page. For more information, see Create a Microsoft SQL Server data source.

Important

Enable logging for the data source and ensure that the configured account has permissions to read logs. Otherwise, the system cannot synchronize data from this data source in real time.

Time Zone

Displays the time zone configured for the selected data source.

Sync Rule Configuration

Sync Policy

Only Real-time Incremental is supported. Incremental changes from the source database are collected and written to the downstream destination database in real time in the order they occur.

Selection Method

You can select Entire Database, Select Tables, or Exclude Tables.

  • Entire Database: Synchronizes the entire current database.

  • Select Tables/Exclude Tables: Selects some tables in the current database for real-time synchronization. After you select the tables, you can click Preview to view all matched tables in the Select/Exclude Table Preview dialog box. In the dialog box, you can search for tables by keyword and delete tables individually or in batches.

    Batch Select/Batch Exclude: When you use Batch Select, the selected tables in the current database are synchronized in real time. When you use Batch Exclude, the selected tables are not synchronized.

PostgreSQL

Parameter

Description

Data Source Configuration

Data Source Type

Select PostgreSQL.

Datasource

Select a data source. You can also click New to create a data source on the Datasource page. For more information, see Create a PostgreSQL data source.

Important

Enable logging for the data source and ensure that the configured account has permissions to read logs. Otherwise, the system cannot synchronize data from this data source in real time.

Time Zone

Displays the time zone configured for the selected data source.

Sync Rule Configuration

Sync Policy

Only Real-time Incremental is supported. Incremental changes from the source database are collected and written to the downstream destination database in real time in the order they occur.

Selection Method

You can select Entire Database or Select Tables.

  • Entire Database: Synchronizes the entire current database.

  • Select Tables: Selects some tables in the current database for real-time synchronization. After you select the tables, you can click Preview to view all matched tables in the Select Table Preview dialog box. In the dialog box, you can search for tables by keyword and delete tables individually or in batches.

    Batch Select: The selected tables in the current database are synchronized in real time.

Oracle

Parameter

Description

Data Source Configuration

Data Source Type

Select Oracle.

Datasource

Select a data source. You can also click New to create a data source on the Datasource page. For more information, see Create an Oracle data source.

Important

Enable logging for the data source and ensure that the configured account has permissions to read logs. Otherwise, the system cannot synchronize data from this data source in real time.

Time Zone

Displays the time zone configured for the selected data source.

Sync Rule Configuration

Sync Policy

Only Real-time Incremental is supported. Incremental changes from the source database are collected and written to the downstream destination database in real time in the order they occur.

Selection Method

You can select Entire Database, Select Tables, or Exclude Tables.

  • Entire Database: Synchronizes all tables in all databases under the selected data source.

  • Select Tables/Exclude Tables: Selects some tables in the current database for real-time synchronization. After you select the tables, you can click Preview to view all matched tables in the Select/Exclude Table Preview dialog box. In the dialog box, you can search for tables by keyword and delete tables individually or in batches. Deletion is not supported for regular expression matches.

    • Batch Select/Batch Exclude: When you use Batch Select, the selected tables in the current database are synchronized in real time. When you use Batch Exclude, the selected tables are not synchronized.

    • Regular Expression Match: Enter a regular expression for table names in the Regular Expression box. Java regular expressions are supported, such as schemaA.*|schemaB.*.

IBM DB2

Parameter

Description

Data Source Configuration

Data Source Type

Select IBM DB2.

Datasource

Select a data source. You can also click New to create a data source on the Datasource page. For more information, see Create an IBM DB2 data source.

Important

Enable logging for the data source and ensure that the configured account has permissions to read logs. Otherwise, the system cannot synchronize data from this data source in real time.

Sync Rule Configuration

Sync Policy

Only Real-time Incremental is supported. Incremental changes from the source database are collected and written to the downstream destination database in real time in the order they occur.

Selection Method

You can select Entire Database, Select Tables, or Exclude Tables.

  • Entire Database: Synchronizes all tables in all databases under the selected data source.

  • Select Tables/Exclude Tables: Selects some tables in the current database for real-time synchronization. After you select the tables, you can click Preview to view all matched tables in the Select/Exclude Table Preview dialog box. In the dialog box, you can search for tables by keyword and delete tables individually or in batches.

    Batch Select/Batch Exclude: When you use Batch Select, the selected tables in the current database are synchronized in real time. When you use Batch Exclude, the selected tables are not synchronized.

Kafka

Parameter

Description

Data Source Configuration

Data Source Type

Select Kafka.

Datasource

Select a data source. You can also click New to create a data source on the Datasource page. For more information, see Create a Kafka data source.

Important

Enable logging for the data source and ensure that the configured account has permissions to read logs. Otherwise, the system cannot synchronize data from this data source in real time.

Source Topic

Select the topic of the source data. You can enter a keyword of the topic name to perform a fuzzy search.

Data Format

Only Canal JSON is supported. Canal JSON is a Canal-compatible format that stores change-log data as JSON.

Key Type

The key type of Kafka, which determines the key.deserializer configuration when initializing KafkaConsumer. Only STRING is supported.

Value Type

The value type of Kafka, which determines the value.deserializer configuration when initializing KafkaConsumer. Only STRING is supported.

Consumer Group ID (optional)

Enter the ID of the consumer group. The consumer group ID is used to report consumption offsets.

Sync Rule Configuration

Table List

Enter the names of the tables to be synchronized. Separate multiple table names with line breaks. The value can be up to 1,024 characters in length.

Table names can be in one of the following three formats: tablename, db.tablename, or schema.tablename.
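The three table-name formats above can be split into an optional qualifier and a table name. A minimal sketch with hypothetical class names; note that db.tablename and schema.tablename are syntactically indistinguishable, so the qualifier is kept as-is:

```java
// Sketch: splitting a table-list entry (tablename, db.tablename, or
// schema.tablename) into an optional qualifier and a table name.
// Illustrative only; Dataphin's internal parsing may differ.
public class TableListEntry {
    public final String qualifier; // database or schema name, or null
    public final String table;

    public TableListEntry(String entry) {
        int dot = entry.indexOf('.');
        if (dot < 0) {
            this.qualifier = null;      // bare tablename
            this.table = entry;
        } else {
            this.qualifier = entry.substring(0, dot);
            this.table = entry.substring(dot + 1);
        }
    }
}
```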

Hive (Hudi table format)

You can select Hive (Hudi table format) as the source data source only when the real-time computing engine is Apache Flink and the compute resource is a Flink on YARN deployment.

Parameter

Description

Data Source Configuration

Data Source Type

Select Hive.

Datasource

You can only select a Hive data source in Hudi table format. You can also click New to create a data source on the Datasource page. For more information, see Create a Hive data source.

Important

Enable logging for the data source and ensure that the configured account has permissions to read logs. Otherwise, the system cannot synchronize data from this data source in real time.

Sync Rule Configuration

Sync Policy

Only Real-time Incremental is supported. Incremental changes from the source database are collected and written to the downstream destination database in real time in the order they occur.

Select Table

Select a single table for real-time synchronization.

PolarDB (MySQL database type)

Parameter

Description

Data Source Configuration

Data Source Type

Select PolarDB.

Datasource

You can only select a PolarDB data source of the MySQL database type. You can also click New to create a data source on the Datasource page. For more information, see Create a PolarDB data source.

Important

Enable logging for the data source and ensure that the configured account has permissions to read logs. Otherwise, the system cannot synchronize data from this data source in real time.

Time Zone

Displays the time zone configured for the selected data source.

Sync Rule Configuration

Sync Policy

Select Real-time Incremental or Real-time Incremental + Full. The default value is Real-time Incremental.

  • Real-time Incremental: Collects incremental changes from the source database and writes them to the downstream destination database in the order they occur.

  • Real-time Incremental + Full: Imports the full data from the source database at once, and then collects and writes incremental changes to the downstream destination database in the order they occur.

Note

You can select Real-time Incremental + Full only when the destination data source is Hive (Hudi table format), MaxCompute, or Databricks.

Selection Method

You can select Entire Database, Select Tables, or Exclude Tables.

  • Entire Database: Synchronizes all tables in all databases under the selected data source.

  • Select Tables/Exclude Tables: Selects some tables in the current database for real-time synchronization. After you select the tables, you can click Preview to view all matched tables in the Select/Exclude Table Preview dialog box. In the dialog box, you can search for tables by keyword and delete tables individually or in batches. Deletion is not supported for regular expression matches.

    • Batch Select/Batch Exclude: When you use Batch Select, the selected tables in the current database are synchronized in real time. When you use Batch Exclude, the selected tables are not synchronized.

    • Regular Expression Match: Enter a regular expression for table names in the Regular Expression box. Java regular expressions are supported, such as schemaA.*|schemaB.*.

Destination data source

MaxCompute

Parameter

Description

Data Source Configuration

Data Source Type

Select MaxCompute.

Datasource

Select a destination data source. You can select a MaxCompute data source and project. You can also click New to create a data source on the data source page. For more information, see Create a MaxCompute data source.

New Sink Table Configuration

New Table Type

You can select Standard Table or Delta Table. The default value is Standard Table.

If you select Delta Table and set the sink table creation method to Auto Create Table, a MaxCompute Delta table is created. Additional fields are not used when creating a Delta table.

Note

After you configure the sink table, if you change the new table type, the system prompts you for confirmation. If you click OK in the dialog box, the sink table configuration is cleared and you must reconfigure it.

Table Name Transform

Destination table names can contain only letters, digits, and underscores (_). If a source table name contains other characters, you must configure table name transformation rules.

Click Configure Table Name Transform to open the Configure Table Name Transformation Rules dialog box.

  • Replace String: Click New Rule to add a rule. Configure the Source Table String To Replace and Destination Table Replacement String. You can add up to 5 rules.

  • Table Name Prefix/Suffix: This cannot be empty. It can contain only letters, digits, and underscores (_). The length cannot exceed 32 characters.

Note
  • After you configure the table name transformation, the system automatically matches and replaces strings based on the rules from top to bottom.

  • Letters in the replacement strings and table name prefixes and suffixes are automatically converted to lowercase.

Partition Format

If you set New Table Type to Standard Table, only Multiple Partitions is supported. If you set New Table Type to Delta Table, you can select No Partition or Multiple Partitions.

Partition Interval

If you set Partition Format to No Partition, you cannot configure the partition interval. If you set Partition Format to Multiple Partitions, you can set Partition Interval to Hour or Day.

Note
  • Hour: Creates four levels of partitions: YYYY, MM, DD, and HH.

  • Day: Creates three levels of partitions: YYYY, MM, and DD.
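The Hour and Day intervals in the note above can be sketched as deriving multi-level partition values from an event timestamp. This is an illustrative sketch with hypothetical names, not Dataphin's implementation:

```java
import java.time.LocalDateTime;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: deriving multi-level partition values (year/month/day, plus the
// hour level for the Hour interval) from an event timestamp, per the note
// above. Hypothetical names; illustrative only.
public class PartitionLevels {

    public static Map<String, String> partitionsFor(LocalDateTime ts, boolean hourly) {
        Map<String, String> parts = new LinkedHashMap<>();
        parts.put("yyyy", String.format("%04d", ts.getYear()));
        parts.put("mm", String.format("%02d", ts.getMonthValue()));
        parts.put("dd", String.format("%02d", ts.getDayOfMonth()));
        if (hourly) {
            parts.put("hh", String.format("%02d", ts.getHour())); // Hour interval only
        }
        return parts;
    }
}
```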

MySQL

Parameter

Description

Data Source Configuration

Data Source Type

Select MySQL.

Datasource

Select a data source. You can also click New to create a data source on the Datasource page. For more information, see Create a MySQL data source.

Time Zone

Displays the time zone configured for the selected data source.

New Sink Table Configuration

Table Name Transform

Destination table names can contain only letters, digits, and underscores (_). If a source table name contains other characters, you must configure table name transformation rules.

Click Configure Table Name Transform to open the Configure Table Name Transformation Rules dialog box.

  • Replace String: Click New Rule to add a rule. Configure the Source Table String To Replace and Destination Table Replacement String. You can add up to 5 rules.

  • Table Name Prefix/Suffix: This cannot be empty. It can contain only letters, digits, and underscores (_). The length cannot exceed 32 characters.

Note
  • After you configure the table name transformation, the system automatically matches and replaces strings based on the rules from top to bottom.

  • Letters in the replacement strings and table name prefixes and suffixes are automatically converted to lowercase.

Microsoft SQL Server

Parameter

Description

Data Source Configuration

Data Source Type

Select Microsoft SQL Server.

Datasource

Select a data source. You can also click New to create a data source on the Datasource page. For more information, see Create a Microsoft SQL Server data source.

Time Zone

Displays the time zone configured for the selected data source.

New Sink Table Configuration

Table Name Transform

Destination table names can contain only letters, digits, and underscores (_). If a source table name contains other characters, you must configure table name transformation rules.

Click Configure Table Name Transform to open the Configure Table Name Transformation Rules dialog box.

  • Replace String: Click New Rule to add a rule. Configure the Source Table String To Replace and Destination Table Replacement String. You can add up to 5 rules.

  • Table Name Prefix/Suffix: This cannot be empty. It can contain only letters, digits, and underscores (_). The length cannot exceed 32 characters.

Note
  • After you configure the table name transformation, the system automatically matches and replaces strings based on the rules from top to bottom.

  • Letters in the replacement strings and table name prefixes and suffixes are automatically converted to lowercase.

Oracle

Parameter

Description

Data Source Configuration

Data Source Type

Select Oracle.

Datasource

Select a data source. You can also click New to create a data source on the Datasource page. For more information, see Create an Oracle data source.

Time Zone

Displays the time zone configured for the selected data source.

New Sink Table Configuration

Table Name Transform

Destination table names can contain only letters, digits, and underscores (_). If a source table name contains other characters, you must configure table name transformation rules.

Click Configure Table Name Transform to open the Configure Table Name Transformation Rules dialog box.

  • Replace String: Click New Rule to add a rule. Configure the Source Table String To Replace and Destination Table Replacement String. You can add up to 5 rules.

  • Table Name Prefix/Suffix: This cannot be empty. It can contain only letters, digits, and underscores (_). The length cannot exceed 32 characters.

Note
  • After you configure the table name transformation, the system automatically matches and replaces strings based on the rules from top to bottom.

  • Letters in the replacement strings and table name prefixes and suffixes are automatically converted to lowercase.

Kafka

Parameter

Description

Data Source Configuration

Data Source Type

Select Kafka.

Datasource

Select a data source. You can also click New to create a data source on the Datasource page. For more information, see Create a Kafka data source.

Destination Topic

The topic for the destination data. You can select Single Topic or Multiple Topics. If you select Single Topic, you must select a destination topic. You can enter a keyword of the topic name to search. If you select Multiple Topics, you can configure topic name transformation and topic parameters.

  • Single Topic: All table messages are written to the same topic.

  • Multiple Topics: A topic with the same name as the table is created for each table.

Data Format

Set the storage format for the written data. Supported formats are DTS Avro and Canal JSON.

  • DTS Avro: A data serialization format that converts data structures or objects into a format that is easy to store or transmit.

  • Canal JSON: A Canal-compatible format that stores change-log data as JSON.

Note

If you set Destination Topic to Multiple Topics, you can select only Canal JSON as the data format.

Destination Topic Configuration

Topic Name Transform

Click Configure Topic Name Transform. In the Configure Topic Name Transformation Rules dialog box, you can configure Topic Name Transformation Rules and a prefix and suffix for the topic name.

  • Topic Name Transformation Rules: Click New Rule to add a rule. You must enter the Source Table String To Replace and the Destination Topic Replacement String. Neither can be empty. The Destination Topic Replacement String can contain only letters, digits, and underscores (_) and be up to 32 characters in length.

  • Prefix and suffix for the topic name: You can enter letters, digits, and underscores (_). The length cannot exceed 32 characters.

Note
  • Letters in the replacement strings and the topic name prefix and suffix are automatically converted to lowercase.

  • You can configure topic name transformation only when Destination Topic is set to Multiple Topics.

Topic Parameters

Additional parameters for creating a topic. The format is key=value. Separate multiple parameters with line breaks.

Note

This item can be configured only when Destination Topic is set to Multiple Topics.

DataHub

Parameter

Description

Destination Data

Data Source Type

Select DataHub.

Datasource

Select a destination data source.

You can also click New to create a DataHub data source on the data source page. For more information, see Create a DataHub data source.

Destination Topic Creation Method

You can select New Topic or Use Existing Topic.

  • New Topic: Manually enter the destination topic to create it.

  • Use Existing Topic: Use an existing topic in the destination database. Ensure that the schema of the topic is consistent with the format of the synchronization message. Otherwise, the sync task will fail.

Destination Topic

  • If you set Destination Topic Creation Method to New Topic:

    You must manually enter the Destination Topic. The topic name must start with a lowercase letter, must be 3 to 64 characters in length, and can contain only letters, digits, and underscores (_).

    After you enter the name, you can click Validate to check if the topic already exists in the destination database.

    • If the topic does not exist in the destination database, it is automatically created. Its schema is the schema of the synchronization message, and its default lifecycle is 7 days.

    • If the topic already exists in the destination database, ensure that its schema is consistent with the schema of the synchronization message. Otherwise, the task will fail.

  • If you set Destination Topic Creation Method to Use Existing Topic:

    Click the drop-down list to select an existing topic in the destination database. If there are many topics, you can enter a topic name to search for the one you need.
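The naming rule for a new topic above can be expressed as a regular-expression check. A hypothetical sketch; Dataphin performs its own validation:

```java
import java.util.regex.Pattern;

// Sketch of the naming rule above: a new topic name starts with a lowercase
// letter and is 3 to 64 letters, digits, or underscores in total.
// Hypothetical names; illustrative check only.
public class TopicNameCheck {
    // First char lowercase letter, then 2-63 more word characters.
    private static final Pattern VALID =
            Pattern.compile("^[a-z][A-Za-z0-9_]{2,63}$");

    public static boolean isValid(String topicName) {
        return VALID.matcher(topicName).matches();
    }
}
```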

Databricks

Parameter

Description

Data Source Configuration

Data Source Type

Select Databricks.

Datasource

Select a destination data source. You can select a Databricks data source and project. You can also click New to create a data source on the data source page. For more information, see Create a Databricks data source.

Time Zone

Time-formatted data is processed based on the current time zone. The default value is the time zone configured in the selected data source and cannot be changed.

Note

Time zone conversion is supported only when the source data source type is MySQL or PostgreSQL and the destination data source type is Databricks.

New Sink Table Configuration

Table Name Transform

Destination table names can contain only letters, digits, and underscores (_). If a source table name contains other characters, you must configure table name transformation rules.

Click Configure Table Name Transform to open the Configure Table Name Transformation Rules dialog box.

  • Replace String: Click New Rule to add a rule. Configure the Source Table String To Replace and Destination Table Replacement String. You can add up to 5 rules.

  • Table Name Prefix/Suffix: This cannot be empty. It can contain only letters, digits, and underscores (_). The length cannot exceed 32 characters.

Note
  • After you configure the table name transformation, the system automatically matches and replaces strings based on the rules from top to bottom.

  • Letters in the replacement strings and table name prefixes and suffixes are automatically converted to lowercase.

Partition Format

You can select No Partition or Multiple Partitions.

Partition Interval

If you set Partition Format to No Partition, you cannot configure the partition interval. If you set Partition Format to Multiple Partitions, you can set Partition Interval to Hour or Day.

Note
  • Hour: Creates four levels of partitions: YYYY, MM, DD, and HH.

  • Day: Creates three levels of partitions: YYYY, MM, and DD.

SelectDB

Parameter

Description

Data Source Configuration

Data Source Type

Select SelectDB.

Datasource

Select a data source. You can also click New to create a data source on the Datasource page. For more information, see Create a SelectDB data source.

New Sink Table Configuration

Table Name Transform

Destination table names can contain only letters, digits, and underscores (_). If a source table name contains other characters, you must configure table name transformation rules.

Click Configure Table Name Transform to open the Configure Table Name Transformation Rules dialog box.

  • Replace String: Click New Rule to add a rule. Configure the Source Table String To Replace and Destination Table Replacement String. You can add up to 5 rules.

  • Table Name Prefix/Suffix: This cannot be empty. It can contain only letters, digits, and underscores (_). The length cannot exceed 32 characters.

Note
  • After you configure the table name transformation, the system automatically matches and replaces strings based on the rules from top to bottom.

  • Letters in the replacement strings and table name prefixes and suffixes are automatically converted to lowercase.

Hive

Parameter

Description

Data Source Configuration

Data Source Type

Set Data Source Type to Hive.

Datasource

Select a data source. You can also click New to create a data source on the Datasource page. For more information, see Create a Hive data source.

New Sink Table Configuration

Data Lake Table Format

You can select None, Hudi, Iceberg, or Paimon.

  • None: Data is written and tables are created as standard Hive tables.

  • Hudi: Data is written and tables are created in Hudi format. You can select Hudi only when the Hive data source version is CDP7.x Hive 3.1.3.

  • Iceberg: Data is written and tables are created in Iceberg format. You can select Iceberg only when the Hive data source version is EMR5.x Hive 3.1.x.

  • Paimon: Data is written and tables are created in Paimon format. You can select Paimon only when the Hive data source version is EMR5.x Hive 3.1.x.

Note

This item can be configured only when Data Lake Table Format Configuration is enabled for the selected Hive data source.

Hudi Table Type/Paimon Table Type

For Hudi Table Type, you can select MOR (merge on read) or COW (copy on write).

For Paimon Table Type, you can select MOR (merge on read), COW (copy on write), or MOW (merge on write).

Note

This item can be configured only when Data Lake Table Format is set to Hudi or Paimon.

Table Creation Execution Engine

You can select Hive or Spark. After you select a data lake table format, Spark is selected by default.

  • Hive: Uses the Hive engine to create tables. The table creation syntax is Hive syntax.

  • Spark: Uses the Spark engine to create tables. The table creation syntax is Spark syntax. You can select Spark only when Spark is enabled for the Hive data source.

    Note

    When Data Lake Table Format is set to Paimon, only the Spark table creation execution engine is supported.

Table Name Transform

Destination table names can contain only letters, digits, and underscores (_). If a source table name contains other characters, you must configure table name transformation rules.

Click Configure Table Name Transform to open the Configure Table Name Transformation Rules dialog box.

  • Replace String: Click New Rule to add a rule. Configure the Source Table String To Replace and Destination Table Replacement String. You can add up to 5 rules.

  • Table Name Prefix/Suffix: This cannot be empty. It can contain only letters, digits, and underscores (_). The length cannot exceed 32 characters.

Note
  • After you configure the table name transformation, the system automatically matches and replaces strings based on the rules from top to bottom.

  • Letters in the replacement strings and table name prefixes and suffixes are automatically converted to lowercase.
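The transformation described above (replace-string rules applied from top to bottom, then the prefix/suffix, with replacement strings and prefix/suffix lowercased) can be sketched in Python. This is an illustrative sketch, not Dataphin's implementation; function and variable names are hypothetical:

```python
def transform_table_name(source_name, rules, prefix="", suffix=""):
    """Apply replace-string rules top to bottom, then add a prefix/suffix.

    rules: list of (search, replacement) tuples, at most 5 per the UI.
    Replacement strings and the prefix/suffix are lowercased, matching
    the documented behavior; the source table name itself is not.
    """
    name = source_name
    for search, replacement in rules:
        name = name.replace(search, replacement.lower())
    return prefix.lower() + name + suffix.lower()

# Example: replace a "$" that is illegal in destination table names.
print(transform_table_name("ods$Orders", [("$", "_")], prefix="dw_"))
# dw_ods_Orders
```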

Partition Format

You can select Single Partition, Multiple Partitions, or Fixed Partition.

Note

When the format is set to Single Partition or Fixed Partition, the default partition field name is ds and cannot be changed.

Partition Interval

The default value is Hour. You can also select Day. Click the icon next to Partition Interval to view the partition setting details.

  • Single Partition:

    • Hour: Displays a single partition named ds (yyyyMMddhh).

    • Day: Displays a single partition named ds (yyyyMMdd).

  • Multiple Partitions:

    • Hour: Displays four levels of partitions: yyyy, mm, dd, and hh.

    • Day: Displays three levels of partitions: yyyy, mm, and dd.

Note

This configuration item is supported only when Partition Format is set to Single Partition or Multiple Partitions.

Partition Value

Enter a fixed partition value, for example, 20250101.

Note

This configuration item is supported only when Partition Format is set to Fixed Partition.
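As a sketch of the partition naming described above, the following Python snippet derives partition values from an event time for the Single Partition and Multiple Partitions formats. The ds formats follow the descriptions above; the function name and argument shapes are illustrative assumptions:

```python
from datetime import datetime

def partition_values(ts: datetime, fmt: str, interval: str):
    """Return partition field values for a given event time.

    fmt: "single" (one ds field) or "multiple" (one field per level).
    interval: "hour" or "day", per the Partition Interval setting.
    """
    if fmt == "single":
        pattern = "%Y%m%d%H" if interval == "hour" else "%Y%m%d"
        return {"ds": ts.strftime(pattern)}
    # Multiple Partitions: yyyy/mm/dd levels, plus hh for hourly intervals.
    parts = {"yyyy": ts.strftime("%Y"), "mm": ts.strftime("%m"),
             "dd": ts.strftime("%d")}
    if interval == "hour":
        parts["hh"] = ts.strftime("%H")
    return parts

print(partition_values(datetime(2025, 1, 1, 9), "single", "hour"))
# {'ds': '2025010109'}
```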

Mapping configuration

Note

Mapping configuration is not supported when the destination data source type is DataHub, or when the destination data source is Kafka and the destination topic is a single topic.

Destination data source is not Kafka


Area

Description

View Additional Fields

During real-time incremental synchronization, additional fields are automatically added by default when a table is created to facilitate data use. Click View Additional Fields to view the fields. In the Additional Fields dialog box, you can view information about the currently added fields.

Important
  • If you select an existing table as the sink table and it has no additional fields, add the additional fields to the existing sink table. Otherwise, data usage will be affected.

  • After you select a data lake table format, additional fields are not included.

Click View DDL For Adding Fields to view the DDL statement for adding the additional fields.

Note
  • Viewing additional fields is not supported when the source data source type is Kafka.

  • If the sink table is a primary key table, you do not need to add additional fields. If the sink table is not a primary key table, you must add additional fields.

Search and Filter Area

You can search by Source Table and Sink Table Name. To quickly filter sink tables, click the filter icon at the top. You can filter by Mapping Status and Creation Method.

Add Global Fields, Refresh Mappings

  • Add Global Fields

    Click Add Global Fields to add global fields in the Add Global Fields dialog box.

    • Name: The name of the global field.

    • Type: Supported data types are String, Long, Double, Date, and Boolean.

    • Value: The value of the global field.

    • Description: A description of the field.

    Note
    • If a field is added both globally and to a single table, only the field added to the single table takes effect.

    • Currently, only constants can be added.

    • Global fields take effect only for sink tables created using the Auto Create Table method.

    • Adding global fields is not supported when the source data source type is Kafka.

  • Refresh Mappings

    To refresh the sink table configuration list, click Refresh Mappings.

    Important
    • If the sink table configuration already has content, reselecting the data source type and data source will reset the sink table list and mapping status. Proceed with caution.

    • You can click Refresh Mappings again at any time during the refresh process. Each time you click Refresh Mappings, only the configured global fields are saved. Other information, including the sink table creation method, sink table name, and deletion records, is not saved.

    • When the source data source type is Kafka, clicking Refresh Mappings will map the tables according to the table list in the Sync Rule Configuration. An error is reported if a table does not exist.
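The precedence rule stated above, where a field added to a single table overrides a global field with the same name, can be sketched as a simple merge. This is an illustrative sketch with hypothetical names, not Dataphin's implementation:

```python
def effective_fields(global_fields, table_fields):
    """Merge global and per-table added fields.

    Per the note above, a field added to a single table takes effect
    over a global field with the same name; all values are constants.
    """
    merged = dict(global_fields)   # name -> constant value
    merged.update(table_fields)    # per-table wins on name clashes
    return merged

print(effective_fields({"env": "prod", "region": "cn"},
                       {"region": "cn-shanghai"}))
# {'env': 'prod', 'region': 'cn-shanghai'}
```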

Destination Database List

The destination database list includes Serial Number, Source Table, Mapping Status, Sink Table Creation Method, and Sink Table Name. You can also Add Field, View Fields, Refresh, or Delete a sink table.

  • Mapping Status:

    • Completed: The mapping is completed normally.

    • Incomplete: The mapping has not been refreshed after a status change.

    • Mapping: Waiting for mapping or in the process of mapping.

    • Abnormal: A data source or internal system error exists.

    • Failed: The destination partitioned table is inconsistent with the partition set for the real-time task.

    • Alerting: The source and sink tables may have incompatible data types.

  • Sink Table Creation Method has three options:

    • If a table with the same name as the source table exists in the destination database, the creation method is Use Existing Table. This table is used as the sink table by default. To change to Auto Create Table, you must add a table name transformation rule or a prefix/suffix and then remap.

    • If no table with the same name is found in the destination database, the creation method defaults to Auto Create Table. You can also change the method to Use Existing Table and select an existing table for synchronization.

    • You can add fields or use custom DDL to create tables only for tables that are automatically created. Global fields also take effect only for automatically created tables.

    Note
    • When the destination data source type is Hive:

      • If you use Auto Create Table and the data lake table format is None, a standard Hive table is created. Otherwise, a table of the selected format is created. Hudi and Iceberg are currently supported.

      • If you use Custom Table Creation and the data lake table format is None, the DDL for a standard Hive table is used. Otherwise, you must use the DDL for the selected table format. Hudi and Iceberg are currently supported.

    • When the source data source type is Kafka, the only supported sink table creation method is Use Existing Table.

    • When the destination data source type is SelectDB, if the source table has no primary key during automatic table creation, a Duplicate table is created. If the source table has a primary key, a Unique table is created.

    • When the partition format is Single Partition or Fixed Partition and the sink table creation method is Use Existing Table, the system automatically checks if the sink table partition meets the partition settings. An error is reported if it does not.

  • Sink Table Name: Sink table names can contain only letters, digits, and underscores (_). If a source table name contains other characters, you must configure table name transformation rules.

    When the destination data source type is MaxCompute, the sink table creation method is Auto Create Table, and the new table type is Delta Table, a Delta table icon is displayed after the sink table name, indicating that a new Delta table will be created. When the sink table creation method is Use Existing Table and you select a Delta table from the sink table list, the same icon is displayed after the sink table name, indicating that the table is a Delta table.

  • Actions:

    • Custom Table Creation: You can create a table using Add Field or DDL. After you enable custom table creation, global fields no longer take effect.

      Note
      • After a field is added, it is displayed only in the actions column for Auto Create Table.

      • You cannot modify an existing sink table, that is, a sink table selected using the Use Existing Table method.

    • View Fields: View the fields and types of the source and sink tables.

    • Refresh: Remap the source and sink tables.

    • Delete: Deletes the corresponding source table mapping. This operation cannot be undone.
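The defaulting behavior described under Sink Table Creation Method can be sketched as follows. This is an illustrative sketch under the rules stated above; the function name and data shapes are assumptions:

```python
def default_creation_method(source_table, destination_tables):
    """Pick the default sink table creation method.

    If a table with the same name as the source table exists in the
    destination database, it is used (Use Existing Table); otherwise
    a table is created automatically (Auto Create Table).
    """
    if source_table in destination_tables:
        return "Use Existing Table"
    return "Auto Create Table"

print(default_creation_method("orders", {"orders", "users"}))
# Use Existing Table
print(default_creation_method("payments", {"orders", "users"}))
# Auto Create Table
```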

Batch Operations

You can perform batch Delete operations on sink tables.

Destination data source is Kafka (destination topic is multiple topics)


Area

Description

Search and Filter Area

You can search by Source Table and Destination Topic Name. To quickly filter sink tables, click the filter icon at the top. You can filter by Mapping Status and Destination Topic Creation Method.

Refresh Mappings

To refresh the sink table configuration list, click Refresh Mappings.

Important

If the destination topic configuration already has content, reselecting the data source type and data source will reset the destination topic list and mapping status. Proceed with caution.

List

The list includes Serial Number, Source Table, Mapping Status, Destination Topic Creation Method, and Destination Topic Name. You can also delete a sink table.

  • Destination Topic Creation Method: If the destination topic already exists, the creation method is Use Existing Topic. If the destination topic does not exist, the creation method is Auto Create Topic.

    When you use Auto Create Topic, the system creates the topic based on the generated destination topic name and topic parameters.

  • Mapping Status: Only checks if the destination topic exists.

  • Delete: Deletes the corresponding row. This operation cannot be undone.

Batch Operations

You can perform batch Delete operations on sink tables.

DDL message handling policy

Note
  • The DDL message handling policy is not supported when the source data source type is DataHub or Kafka.

  • The DDL message handling policy is not supported when the destination data source type is PostgreSQL.

  • When the destination data source type is Hive and the data lake table format is Hudi, only the Ignore policy is supported.

  • When the source data source type is Kafka, only the Ignore policy is supported.

  • Data for new columns added to existing partitions of Hive or MaxCompute tables cannot be synchronized. The data for these new columns in existing partitions will be NULL. Data synchronization will function correctly for subsequent new partitions.

  • Create Table, Add Column, etc.: DDL operations, such as creating tables, adding columns, deleting columns, renaming columns, and modifying column types, are processed normally. The DDL information is sent to the destination data source for processing, and the processing policy varies by destination data source.

  • Ignore: Discards the DDL message and does not send it to the destination data source.

  • Error: Immediately stops the real-time sync task with an error status.
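The three policies above can be sketched as a simple dispatch. This is an illustrative sketch, not Dataphin's implementation; the policy labels, exception name, and message shape are assumptions:

```python
class DdlError(Exception):
    """Raised when the Error policy stops the sync task."""

def handle_ddl(message, policy):
    """Dispatch a DDL message according to the configured policy.

    policy: "process" (forward to the destination for processing),
    "ignore" (discard the message), or "error" (stop the task),
    mirroring the three options described above.
    """
    if policy == "ignore":
        return None                      # discarded, nothing is sent
    if policy == "error":
        raise DdlError(f"DDL received, stopping task: {message}")
    return message                       # forwarded to the destination

print(handle_ddl("ALTER TABLE orders ADD COLUMN note STRING", "ignore"))
# None
```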

Step 3: Configure real-time integration task properties

  1. Click Resource Configuration in the top menu bar of the current real-time integration task tab, or click Property in the right-side sidebar to open the Property panel.

  2. Configure the Basic Information and Resource Configuration for the current real-time integration task.

    • Basic Information: Select the Development Owner and Operation Owner for the current real-time integration task, and enter a Description for the task. The description can be up to 1,000 characters.

    • Resource Configuration: For more information, see Real-time integration resource configuration.

Step 4: Submit the real-time integration task

  1. Click Submit to submit the current real-time integration task.

  2. In the Submit dialog box, enter the Submission Remarks and click OK And Submit.

  3. After the submission is complete, you can view the submission details in the Submit dialog box.

    If the project is in Dev-Prod mode, you need to publish the real-time integration task to the production environment. For more information, see Manage release tasks.

What to do next

You can view and manage the real-time integration task in the Operation Center to ensure that it runs as expected. For more information, see View and manage real-time tasks.