
Dataphin: Create a real-time integration task

Last Updated: Mar 05, 2026

Real-time integration lets you collect data from multiple source data sources and write it to a single destination data source, establishing a real-time synchronization link. This topic describes how to create a real-time integration task.

Prerequisites

You must configure at least one data source before creating a real-time integration task. This lets you select the source and destination data sources when configuring the task. For more information, see Supported data sources for real-time integration.

Background information

  • If the destination data source is Oracle or MySQL, the Java Database Connectivity (JDBC) protocol is used. Messages are processed according to the following policies.

    • If the sink table does not have a primary key:

      • When an INSERT message is received, it is appended directly.

      • When an UPDATE_BEFORE message is received, it is discarded. When an UPDATE_AFTER message is received, it is appended directly.

      • When a DELETE message is received, it is discarded.

    • If the sink table has a primary key:

      • When an INSERT message is received, it is processed as an UPSERT message.

      • When an UPDATE_BEFORE message is received, it is discarded. When an UPDATE_AFTER message is received, it is processed as an UPSERT message.

      • When a DELETE message is received, it is executed as a DELETE.

  • The JDBC protocol writes data immediately. If a task fails over and the sink table has no primary key, duplicate data may result. Exactly-once delivery is not guaranteed.

  • The JDBC protocol supports only DDL statements for creating tables and adding fields. DDL messages of other types are discarded.

  • Oracle supports only basic data types. The INTERVAL YEAR, INTERVAL DAY, BFILE, SYS.ANY, XML, map, ROWID, and UROWID data types are not supported.

  • MySQL supports only basic data types. The map data type is not supported.

  • To prevent data inconsistency caused by out-of-order data, only a single concurrent task is supported.

  • The Oracle data source supports Oracle Database 11g, Oracle Database 19c, and Oracle Database 21c.

  • The MySQL data source supports MySQL 8.0, MySQL 8.4, and MySQL 5.7.
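
The JDBC write policies above can be summarized as a small dispatch table. The following sketch is illustrative only and is not Dataphin code:

```python
# Illustrative sketch of the JDBC sink policy described above.
# Not Dataphin's actual implementation.

def jdbc_sink_action(op, has_primary_key):
    """Map a changelog operation (INSERT, UPDATE_BEFORE, UPDATE_AFTER,
    DELETE) to the action the JDBC sink takes on the message."""
    if has_primary_key:
        actions = {
            "INSERT": "UPSERT",
            "UPDATE_BEFORE": "DISCARD",
            "UPDATE_AFTER": "UPSERT",
            "DELETE": "DELETE",
        }
    else:
        actions = {
            "INSERT": "APPEND",
            "UPDATE_BEFORE": "DISCARD",
            "UPDATE_AFTER": "APPEND",
            "DELETE": "DISCARD",
        }
    return actions[op]
```

Note that without a primary key the sink can only append, which is why a failover can produce duplicate rows.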

Step 1: Create a real-time integration task

  1. In the top menu bar of the Dataphin homepage, choose Develop > Data Integration.

  2. In the top menu bar, select a project. If you are in Dev-Prod mode, select an environment.

  3. In the left navigation pane, select Integration > Real-time Integration.

  4. Click the image icon in the real-time integration list and select Real-time Integration Task to open the Create Real-time Integration Task dialog box.

  5. In the Create Real-time Integration Task dialog box, configure the following parameters.

    Parameter

    Description

    Task Name

    Enter a name for the real-time task.

    The name must start with a letter, contain only lowercase letters, digits, and underscores (_), and be 4 to 63 characters in length.

    Production/Development environment queue resource

    You can select all resource groups that are configured for real-time tasks.

    Note

    This configuration item is supported only when the compute source used by the project is a Flink compute source in Kubernetes deployment mode.

    Description

    Enter a brief description of the task. The description can be up to 1,000 characters in length.

    Select Directory

    Select the folder where the real-time task is stored.

    If no folder is created, you can create one as follows:

    1. Above the real-time task list on the left, click the image icon to open the New Folder dialog box.

    2. In the New Folder dialog box, enter a folder Name and Select Directory as needed.

    3. Click OK.

  6. After you complete the configuration, click OK.
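
The Task Name rule above can be expressed as a regular expression. A minimal validation sketch, assuming the leading letter must also be lowercase per the lowercase-only rule:

```python
import re

# Task Name rule from the table above: starts with a (lowercase) letter,
# contains only lowercase letters, digits, and underscores,
# and is 4 to 63 characters in length. Illustrative check only;
# Dataphin enforces the rule in the UI.
TASK_NAME_RE = re.compile(r"^[a-z][a-z0-9_]{3,62}$")

def is_valid_task_name(name):
    return TASK_NAME_RE.fullmatch(name) is not None
```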

Step 2: Configure the real-time integration task

The supported source and destination data sources depend on the real-time computing engine. For more information, see Supported data sources for real-time integration.

Source data source

Note

If the source data source is an external data source and you select Entire Database or Select Tables with Batch Select, the table names are retrieved from the Metadata Center. If no metadata acquisition task is configured for the data source, go to Metadata > Acquisition Task to create one.

MySQL

Parameter

Description

Data Source Configuration

Data Source Type

Select MySQL.

Datasource

Select a data source. You can also click New to create a data source on the Datasource page. For more information, see Create a MySQL data source.

Important

Enable logging for the data source and make sure that the configured account has permissions to read logs. Otherwise, the system cannot synchronize data from this data source in real time.

Time Zone

The time zone configured for the selected data source.

Sync Rule Configuration

Sync Solution

Select Real-time Incremental or Real-time Incremental + Full. The default value is Real-time Incremental.

  • Real-time Incremental: Collects incremental changes from the source database and writes them to the downstream destination database in the order they occur.

  • Real-time Incremental + Full: Imports the full data from the source database at one time, and then collects and writes incremental changes to the downstream destination database in the order they occur.

Note

You can set Sync Solution to Real-time Incremental + Full only if the destination data source is Hive (Hudi table format), MaxCompute, or Databricks.

Selection Method

You can select Entire Database, Select Tables, or Exclude Tables.

  • Entire Database: Synchronizes all tables in all databases under the selected data source.

  • Select Tables/Exclude Tables: Selects some tables in the current database for real-time synchronization. After you select tables, click Preview to view all matched tables in the Select/Exclude Table Preview dialog box. In the dialog box, you can search for tables by keyword and delete tables individually or in batches. You cannot delete tables if you use Regex Match.

    • Batch Select/Batch Exclude: If you select Batch Select, multiple selected tables in the current database are synchronized in real time. If you select Batch Exclude, multiple selected tables in the current database are not synchronized in real time.

      You can select all tables in all databases under the selected data source. Tables are displayed in the format of DBname.Tablename.

    • Regex Match: Enter a regular expression for table names in the Regular Expression input box. Java regular expressions are supported, such as schemaA.*|schemaB.*.

      You can match all tables in all databases under the selected data source in batches. You can use the database name (DBname) and table name (Tablename) for regex matching.
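
For example, the pattern from the text matches DBname.Tablename strings as follows. This is shown with Python's re module, whose syntax agrees with Java's for this simple pattern:

```python
import re

# The example pattern from the text: matches tables in schemaA or schemaB.
pattern = re.compile(r"schemaA.*|schemaB.*")

tables = ["schemaA.orders", "schemaB.users", "schemaC.logs"]
matched = [t for t in tables if pattern.fullmatch(t)]
# matched keeps schemaA.orders and schemaB.users but not schemaC.logs
```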

Microsoft SQL Server

Parameter

Description

Data Source Configuration

Data Source Type

Select Microsoft SQL Server.

Datasource

Select a data source. You can also click New to create a data source on the Datasource page. For more information, see Create a Microsoft SQL Server data source.

Important

Enable logging for the data source and make sure that the configured account has permissions to read logs. Otherwise, the system cannot synchronize data from this data source in real time.

Time Zone

The time zone configured for the selected data source.

Sync Rule Configuration

Sync Solution

Only Real-time Incremental is supported. Collects incremental changes from the source database and writes them to the downstream destination database in real time in the order they occur.

Selection Method

You can select Entire Database, Select Tables, or Exclude Tables.

  • Entire Database: Synchronizes data for the entire current database.

  • Select Tables/Exclude Tables: Selects some tables in the current database for real-time synchronization. After you select tables, click Preview to view all matched tables in the Select/Exclude Table Preview dialog box. In the dialog box, you can search for tables by keyword and delete tables individually or in batches.

    Batch Select/Batch Exclude: If you select Batch Select, multiple selected tables in the current database are synchronized in real time. If you select Batch Exclude, multiple selected tables in the current database are not synchronized in real time.

PostgreSQL

Parameter

Description

Data Source Configuration

Data Source Type

Select PostgreSQL.

Datasource

Select a data source. You can also click New to create a data source on the Datasource page. For more information, see Create a PostgreSQL data source.

Important

Enable logging for the data source and make sure that the configured account has permissions to read logs. Otherwise, the system cannot synchronize data from this data source in real time.

Time Zone

The time zone configured for the selected data source.

Sync Rule Configuration

Sync Solution

Only Real-time Incremental is supported. Collects incremental changes from the source database and writes them to the downstream destination database in real time in the order they occur.

Selection Method

You can select Entire Database or Select Tables.

  • Entire Database: Synchronizes data for the entire current database.

  • Select Tables: Selects some tables in the current database for real-time synchronization. After you select tables, click Preview to view all matched tables in the Select Table Preview dialog box. In the dialog box, you can search for tables by keyword and delete tables individually or in batches.

    Batch Select: The selected tables in the current database are synchronized in real time.

Oracle

Parameter

Description

Data Source Configuration

Data Source Type

Select Oracle.

Datasource

Select a data source. You can also click New to create a data source on the Datasource page. For more information, see Create an Oracle data source.

Important

Enable logging for the data source and make sure that the configured account has permissions to read logs. Otherwise, the system cannot synchronize data from this data source in real time.

Time Zone

The time zone configured for the selected data source.

Sync Rule Configuration

Sync Solution

Only Real-time Incremental is supported. Collects incremental changes from the source database and writes them to the downstream destination database in real time in the order they occur.

Selection Method

You can select Entire Database, Select Tables, or Exclude Tables.

  • Entire Database: Synchronizes all tables in all databases under the selected data source.

  • Select Tables/Exclude Tables: Selects some tables in the current database for real-time synchronization. After you select tables, click Preview to view all matched tables in the Select/Exclude Table Preview dialog box. In the dialog box, you can search for tables by keyword and delete tables individually or in batches. You cannot delete tables if you use Regex Match.

    • Batch Select/Batch Exclude: If you select Batch Select, multiple selected tables in the current database are synchronized in real time. If you select Batch Exclude, multiple selected tables in the current database are not synchronized in real time.

    • Regex Match: Enter a regular expression for table names in the Regular Expression input box. Java regular expressions are supported, such as schemaA.*|schemaB.*.

IBM DB2

Parameter

Description

Data Source Configuration

Data Source Type

Select IBM DB2.

Datasource

Select a data source. You can also click New to create a data source on the Datasource page. For more information, see Create an IBM DB2 data source.

Important

Enable logging for the data source and make sure that the configured account has permissions to read logs. Otherwise, the system cannot synchronize data from this data source in real time.

Sync Rule Configuration

Sync Solution

Only Real-time Incremental is supported. Collects incremental changes from the source database and writes them to the downstream destination database in real time in the order they occur.

Selection Method

You can select Entire Database, Select Tables, or Exclude Tables.

  • Entire Database: Synchronizes all tables in all databases under the selected data source.

  • Select Tables/Exclude Tables: Selects some tables in the current database for real-time synchronization. After you select tables, click Preview to view all matched tables in the Select/Exclude Table Preview dialog box. In the dialog box, you can search for tables by keyword and delete tables individually or in batches.

    Batch Select/Batch Exclude: If you select Batch Select, multiple selected tables in the current database are synchronized in real time. If you select Batch Exclude, multiple selected tables in the current database are not synchronized in real time.

Kafka

Parameter

Description

Data Source Configuration

Data Source Type

Select Kafka.

Datasource

Select a data source. You can also click New to create a data source on the Datasource page. For more information, see Create a Kafka data source.

Important

Enable logging for the data source and make sure that the configured account has permissions to read logs. Otherwise, the system cannot synchronize data from this data source in real time.

Source topic

Select the topic of the source data. You can enter a keyword in the topic name to perform a fuzzy search.

Data format

Only Canal JSON is supported. Canal JSON is a Canal-compatible format in which the synchronized data is stored.

Key Type

The key type for Kafka, which determines the key.deserializer configuration when initializing KafkaConsumer. Only STRING is supported.

Value Type

The value type for Kafka, which determines the value.deserializer configuration when initializing KafkaConsumer. Only STRING is supported.

Consumer Group ID (optional)

Enter the ID of the consumer group. The consumer group ID is used to report the consumption offset.

Sync Rule Configuration

Table List

Enter the names of the tables to be synchronized. Separate multiple table names with line breaks. The value can be up to 1,024 characters in length.

Table names can be in one of the following three formats: tablename, db.tablename, or schema.tablename.
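
A sketch of how such a table list can be parsed (hypothetical helper, not a Dataphin API):

```python
# Parses a Table List value: one name per line, in tablename,
# db.tablename, or schema.tablename form. Hypothetical helper.

def parse_table_list(raw):
    entries = []
    for line in raw.splitlines():
        name = line.strip()
        if not name:
            continue
        # Split on the last dot: the part before it is the db/schema
        # qualifier (None for a bare tablename).
        qualifier, _, table = name.rpartition(".")
        entries.append((qualifier or None, table))
    return entries
```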

Hive (Hudi table format)

You can select Hive (Hudi table format) as the source data source only when the real-time engine is Apache Flink and the compute source is a Flink on YARN deployment.

Parameter

Description

Data Source Configuration

Data Source Type

Select Hive.

Datasource

You can only select a Hive data source in Hudi table format. You can also click New to create a data source on the Datasource page. For more information, see Create a Hive data source.

Important

Enable logging for the data source and make sure that the configured account has permissions to read logs. Otherwise, the system cannot synchronize data from this data source in real time.

Sync Rule Configuration

Sync Solution

Only Real-time Incremental is supported. Collects incremental changes from the source database and writes them to the downstream destination database in real time in the order they occur.

Select Table

Select a single table for real-time synchronization.

PolarDB (MySQL database type)

Parameter

Description

Data Source Configuration

Data Source Type

Select PolarDB.

Datasource

You can only select a PolarDB data source of the MySQL database type. You can also click New to create a data source on the Datasource page. For more information, see Create a PolarDB data source.

Important

Enable logging for the data source and make sure that the configured account has permissions to read logs. Otherwise, the system cannot synchronize data from this data source in real time.

Time Zone

The time zone configured for the selected data source.

Sync Rule Configuration

Sync Solution

Select Real-time Incremental or Real-time Incremental + Full. The default value is Real-time Incremental.

  • Real-time Incremental: Collects incremental changes from the source database and writes them to the downstream destination database in the order they occur.

  • Real-time Incremental + Full: Imports the full data from the source database at one time, and then collects and writes incremental changes to the downstream destination database in the order they occur.

Note

You can set Sync Solution to Real-time Incremental + Full only if the destination data source is Hive (Hudi table format), MaxCompute, or Databricks.

Selection Method

You can select Entire Database, Select Tables, or Exclude Tables.

  • Entire Database: Synchronizes all tables in all databases under the selected data source.

  • Select Tables/Exclude Tables: Selects some tables in the current database for real-time synchronization. After you select tables, click Preview to view all matched tables in the Select/Exclude Table Preview dialog box. In the dialog box, you can search for tables by keyword and delete tables individually or in batches. You cannot delete tables if you use Regex Match.

    • Batch Select/Batch Exclude: If you select Batch Select, multiple selected tables in the current database are synchronized in real time. If you select Batch Exclude, multiple selected tables in the current database are not synchronized in real time.

    • Regex Match: Enter a regular expression for table names in the Regular Expression input box. Java regular expressions are supported, such as schemaA.*|schemaB.*.

Destination data source

MaxCompute

Parameter

Description

Data Source Configuration

Data Source Type

Select MaxCompute.

Datasource

Select a destination data source. You can select a MaxCompute data source and project. You can also click New to create a data source on the data source page. For more information, see Create a MaxCompute data source.

Sink Table Creation Configuration

New Table Type

Select Standard Table or Delta Table. The default value is Standard Table.

If you select Delta Table and set the sink table creation method to Auto-create table, a MaxCompute Delta table is created. Additional fields are not used when creating a Delta table.

Note

After you configure the sink table, if you change the new table type, the system asks for confirmation. If you click OK in the dialog box, the sink table configuration is cleared and you must re-enter it.

Table Name Transform

Sink table names can contain only letters, digits, and underscores (_). If a source table name contains other characters, you must configure a table name transform rule.

Click Configure Table Name Transform to open the Configure Table Name Transform Rules dialog box.

  • Replace String: Click New Rule to add a rule. Configure Source Table String to Replace and Sink Table Replacement String. You can add up to 5 rules.

  • Table Name Prefix/Suffix: Cannot be empty. Can contain only letters, digits, and underscores (_). The length cannot exceed 32 characters.

Note
  • After you configure the table name transform, the system automatically matches and replaces strings based on the transform rules in top-down order.

  • Letters in the replacement strings and table name prefixes and suffixes are automatically converted to lowercase.
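
The transform described above can be sketched as follows (illustrative only; the 5-rule and 32-character limits are enforced by the product UI):

```python
# Sketch of the table name transform: replace-string rules are applied
# in top-down order, then the prefix/suffix is attached. Per the note,
# letters in replacement strings and in the prefix/suffix are lowercased.
# Hypothetical helper, not a Dataphin API.

def transform_table_name(name, rules, prefix="", suffix=""):
    for source_str, replacement in rules:  # top-down order
        name = name.replace(source_str, replacement.lower())
    return prefix.lower() + name + suffix.lower()
```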

Partition Format

If you set New Table Type to Standard Table, the partition format supports only Multiple Partitions. If you set it to Delta Table, the partition format supports No Partition or Multiple Partitions.

Partition Interval

If you set Partition Format to No Partition, you cannot configure the partition interval. If you set Partition Format to Multiple Partitions, you can set the partition interval to hour or day.

Note
  • hour: Four levels of partitions: YYYY, MM, DD, and HH.

  • day: Three levels of partitions: YYYY, MM, and DD.
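
For a timestamp such as 2026-03-05 09:00, the two intervals produce the following partition values (hypothetical helper for illustration):

```python
from datetime import datetime

# "hour" yields four partition levels (YYYY, MM, DD, HH);
# "day" yields three (YYYY, MM, DD). Hypothetical helper,
# not a Dataphin API.

def partition_values(ts, interval):
    levels = [ts.strftime("%Y"), ts.strftime("%m"), ts.strftime("%d")]
    if interval == "hour":
        levels.append(ts.strftime("%H"))
    return levels
```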

MySQL

Parameter

Description

Data Source Configuration

Data Source Type

Select MySQL.

Datasource

Select a data source. You can also click New to create a data source on the Datasource page. For more information, see Create a MySQL data source.

Time Zone

The time zone configured for the selected data source.

Sink Table Creation Configuration

Table Name Transform

Sink table names can contain only letters, digits, and underscores (_). If a source table name contains other characters, you must configure a table name transform rule.

Click Configure Table Name Transform to open the Configure Table Name Transform Rules dialog box.

  • Replace String: Click New Rule to add a rule. Configure Source Table String to Replace and Sink Table Replacement String. You can add up to 5 rules.

  • Table Name Prefix/Suffix: Cannot be empty. Can contain only letters, digits, and underscores (_). The length cannot exceed 32 characters.

Note
  • After you configure the table name transform, the system automatically matches and replaces strings based on the transform rules in top-down order.

  • Letters in the replacement strings and table name prefixes and suffixes are automatically converted to lowercase.

Microsoft SQL Server

Parameter

Description

Data Source Configuration

Data Source Type

Select Microsoft SQL Server.

Datasource

Select a data source. You can also click New to create a data source on the Datasource page. For more information, see Create a Microsoft SQL Server data source.

Time Zone

The time zone configured for the selected data source.

Sink Table Creation Configuration

Table Name Transform

Sink table names can contain only letters, digits, and underscores (_). If a source table name contains other characters, you must configure a table name transform rule.

Click Configure Table Name Transform to open the Configure Table Name Transform Rules dialog box.

  • Replace String: Click New Rule to add a rule. Configure Source Table String to Replace and Sink Table Replacement String. You can add up to 5 rules.

  • Table Name Prefix/Suffix: Cannot be empty. Can contain only letters, digits, and underscores (_). The length cannot exceed 32 characters.

Note
  • After you configure the table name transform, the system automatically matches and replaces strings based on the transform rules in top-down order.

  • Letters in the replacement strings and table name prefixes and suffixes are automatically converted to lowercase.

Oracle

Parameter

Description

Data Source Configuration

Data Source Type

Select Oracle.

Datasource

Select a data source. You can also click New to create a data source on the Datasource page. For more information, see Create an Oracle data source.

Time Zone

The time zone configured for the selected data source.

Sink Table Creation Configuration

Table Name Transform

Sink table names can contain only letters, digits, and underscores (_). If a source table name contains other characters, you must configure a table name transform rule.

Click Configure Table Name Transform to open the Configure Table Name Transform Rules dialog box.

  • Replace String: Click New Rule to add a rule. Configure Source Table String to Replace and Sink Table Replacement String. You can add up to 5 rules.

  • Table Name Prefix/Suffix: Cannot be empty. Can contain only letters, digits, and underscores (_). The length cannot exceed 32 characters.

Note
  • After you configure the table name transform, the system automatically matches and replaces strings based on the transform rules in top-down order.

  • Letters in the replacement strings and table name prefixes and suffixes are automatically converted to lowercase.

Kafka

Parameter

Description

Data Source Configuration

Data Source Type

Select Kafka.

Datasource

Select a data source. You can also click New to create a data source on the Datasource page. For more information, see Create a Kafka data source.

Destination Topic

The topic for the destination data. You can select Single Topic or Multiple Topics. If you select Single Topic, select a destination topic. You can enter a keyword in the topic name to search. If you select Multiple Topics, you can configure topic name transform and topic parameters.

  • Single Topic: All table messages are written to the same topic.

  • Multiple Topics: A topic with the same name is created for each table.

Data format

Set the storage format for the written data. Supported formats are DTS Avro and Canal JSON.

  • DTS Avro: A data serialization format that converts data structures or objects into a format that is easy to store or transmit.

  • Canal JSON: A format compatible with Canal. Data is stored in Canal JSON format.

Note

If you set Destination Topic to Multiple Topics, you can only set Data format to Canal JSON.

Destination topic configuration

Topic Name Transform

Click Configure Topic Name Transform. In the Configure Topic Name Transform Rules dialog box, configure Topic Name Transform Rules and a prefix and suffix for the topic name.

  • Topic Name Transform Rules: Click New Rule to add a rule. You must enter the Source Table String to Replace and Destination Topic Replacement String. Neither can be empty, and the Destination Topic Replacement String can only contain letters, digits, and underscores (_) and be up to 32 characters in length.

  • Prefix and suffix for the topic name: Can contain letters, digits, and underscores (_). The length cannot exceed 32 characters.

Note
  • Letters in the replacement strings and topic name prefixes and suffixes are automatically converted to lowercase.

  • You can configure topic name transform only when Destination Topic is set to Multiple Topics.

Topic Parameters

Additional parameters for creating a topic. The format is key=value. Separate multiple parameters with line breaks.

Note

This item can be configured only when Destination Topic is set to Multiple Topics.
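
The key=value format can be parsed as follows. This is an illustrative sketch; the parameter names shown are ordinary Kafka topic configs used only as examples:

```python
# Parses Topic Parameters: key=value pairs separated by line breaks.
# Illustrative sketch only.

def parse_topic_params(raw):
    params = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition("=")
        params[key.strip()] = value.strip()
    return params
```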

DataHub

Parameter

Description

Destination Data

Data Source Type

Select DataHub.

Datasource

Select a destination data source.

The system provides a shortcut to create a data source. You can click New to create a DataHub data source on the data source page. For more information, see Create a DataHub data source.

Destination Topic Creation Method

You can select New Topic or Use Existing Topic.

  • New Topic: Manually enter the destination topic to create it.

  • Use Existing Topic: Use an existing topic in the destination database. Make sure that the topic's schema is consistent with the format of the sync message. Otherwise, the sync task will fail.

Destination Topic

  • Destination Topic Creation Method is New Topic.

    Manually enter the Destination Topic. The name must start with a lowercase letter, contain only letters, digits, and underscores (_), and be 3 to 64 characters in length.

    After you enter the topic, click Validate to check if the topic already exists in the destination database.

    • If the topic does not exist in the destination database, it is automatically created. The schema is the schema of the sync message, and the default lifecycle is 7 days.

    • If the topic already exists in the destination database, make sure that the topic's schema is consistent with the schema of the sync message. Otherwise, the task will fail.

  • Destination Topic Creation Method is Use Existing Topic.

    Click the drop-down list to select an existing topic in the destination database. If there are many topics, you can enter a topic name to search for the desired topic.

Databricks

Parameter

Description

Data Source Configuration

Data Source Type

Select Databricks.

Datasource

Select a destination data source. You can select a Databricks data source and project. You can also click New to create a data source on the data source page. For more information, see Create a Databricks data source.

Time Zone

Time-formatted data is processed based on the current time zone. By default, this is the time zone configured in the selected data source and cannot be modified.

Note

Time zone conversion is supported only when the source data source type is MySQL or PostgreSQL and the destination data source type is Databricks.

Sink Table Creation Configuration

Table Name Transform

Sink table names can contain only letters, digits, and underscores (_). If a source table name contains other characters, you must configure a table name transform rule.

Click Configure Table Name Transform to open the Configure Table Name Transform Rules dialog box.

  • Replace String: Click New Rule to add a rule. Configure Source Table String to Replace and Sink Table Replacement String. You can add up to 5 rules.

  • Table Name Prefix/Suffix: Cannot be empty. Can contain only letters, digits, and underscores (_). The length cannot exceed 32 characters.

Note
  • After you configure the table name transform, the system automatically matches and replaces strings based on the transform rules in top-down order.

  • Letters in the replacement strings and table name prefixes and suffixes are automatically converted to lowercase.

Partition Format

You can select No Partition or Multiple Partitions.

Partition Interval

If you set Partition Format to No Partition, you cannot configure the partition interval. If you set Partition Format to Multiple Partitions, you can set the partition interval to hour or day.

Note
  • hour: Four levels of partitions: YYYY, MM, DD, and HH.

  • day: Three levels of partitions: YYYY, MM, and DD.

SelectDB

Parameter

Description

Data Source Configuration

Data Source Type

Select SelectDB.

Datasource

Select a data source. You can also click New to create a data source on the Datasource page. For more information, see Create a SelectDB data source.

Sink Table Creation Configuration

Table Name Transform

Sink table names can contain only letters, digits, and underscores (_). If a source table name contains other characters, you must configure a table name transform rule.

Click Configure Table Name Transform to open the Configure Table Name Transform Rules dialog box.

  • Replace String: Click New Rule to add a rule. Configure Source Table String to Replace and Sink Table Replacement String. You can add up to 5 rules.

  • Table Name Prefix/Suffix: Cannot be empty. Can contain only letters, digits, and underscores (_). The length cannot exceed 32 characters.

Note
  • After you configure the table name transform, the system automatically matches and replaces strings based on the transform rules in top-down order.

  • Letters in the replacement strings and table name prefixes and suffixes are automatically converted to lowercase.

Hive

Parameter

Description

Data Source Configuration

Data Source Type

Set Data Source Type to Hive.

Datasource

Select a data source. You can also click New to create a data source on the Datasource page. For more information, see Create a Hive data source.

Sink Table Creation Configuration

Data lake table format

You can select None, Hudi, Iceberg, or Paimon.

  • None: Writes data and creates tables as standard Hive tables.

  • Hudi: Writes data and creates tables in Hudi format. You can select Hudi only when the Hive data source version is CDP7.x Hive 3.1.3.

  • Iceberg: Writes data and creates tables in Iceberg format. You can select Iceberg only when the Hive data source version is EMR5.x Hive 3.1.x.

  • Paimon: Writes data and creates tables in Paimon format. You can select Paimon only when the Hive data source version is EMR5.x Hive 3.1.x.

Note

This item can be configured only when Data lake table format configuration is enabled for the selected Hive data source.

Hudi Table Type/Paimon Table Type

For Hudi Table Type, you can select MOR (merge on read) or COW (copy on write).

For Paimon Table Type, you can select MOR (merge on read), COW (copy on write), or MOW (merge on write).

Note

This item can be configured only when Data lake table format is set to Hudi or Paimon.

Table Creation Execution Engine

You can select Hive or Spark. If you select a data lake table format, Spark is selected by default.

  • Hive: Uses the Hive engine to create tables. The table creation syntax is Hive syntax.

  • Spark: Uses the Spark engine to create tables. The table creation syntax is Spark syntax. You can select Spark only when Spark is enabled for the Hive data source.

    Note

    When Data lake table format is set to Paimon, only Spark is supported as the table creation execution engine.

Table Name Transform

Sink table names can contain only letters, digits, and underscores (_). If a source table name contains other characters, you must configure a table name transform rule.

Click Configure Table Name Transform to open the Configure Table Name Transform Rules dialog box.

  • Replace String: Click New Rule to add a rule. Configure Source Table String to Replace and Sink Table Replacement String. You can add up to 5 rules.

  • Table Name Prefix/Suffix: Cannot be empty. Can contain only letters, digits, and underscores (_). The length cannot exceed 32 characters.

Note
  • After you configure the table name transform, the system automatically matches and replaces strings based on the transform rules in top-down order.

  • Letters in the replacement strings and table name prefixes and suffixes are automatically converted to lowercase.

Partition Format

You can select Single Partition, Multiple Partitions, or Fixed Partition.

Note

If you select Single Partition or Fixed Partition, the default partition field name is ds and cannot be modified.

Partition Interval

The default value is hour. You can also select day. Click the icon next to Partition Interval to view partition setting details.

  • Single Partition:

    • hour: A hash partition (yyyyMMddhh) with the partition key column named ds.

    • day: A hash partition (yyyyMMdd) with the partition key column named ds.

  • Multiple Partitions:

    • hour: Four levels of partitions: yyyy, mm, dd, and hh.

    • day: Three levels of partitions: yyyy, mm, and dd.

Note

This configuration item is supported only when Partition Format is set to Single Partition or Multiple Partitions.
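The partition layouts above can be sketched with standard date formatting. This is an illustrative example of the described layouts, not Dataphin code.

```python
from datetime import datetime

# Sketch of the partition values produced by each Partition Format and
# Partition Interval combination described above.

def partition_values(ts, partition_format, interval):
    if partition_format == "Single Partition":
        fmt = "%Y%m%d%H" if interval == "hour" else "%Y%m%d"
        return {"ds": ts.strftime(fmt)}                 # single key column: ds
    if partition_format == "Multiple Partitions":
        parts = {"yyyy": ts.strftime("%Y"),
                 "mm": ts.strftime("%m"),
                 "dd": ts.strftime("%d")}
        if interval == "hour":
            parts["hh"] = ts.strftime("%H")             # fourth level for hourly
        return parts
    raise ValueError(partition_format)

ts = datetime(2025, 1, 1, 9)
print(partition_values(ts, "Single Partition", "hour"))    # {'ds': '2025010109'}
print(partition_values(ts, "Multiple Partitions", "day"))  # {'yyyy': '2025', 'mm': '01', 'dd': '01'}
```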

Partition Value

Enter a fixed partition value, for example, 20250101.

Note

This configuration item is supported only when Partition Format is set to Fixed Partition.

Hologres

Parameter

Description

Data Source Configuration

Data Source Type

Select Hologres.

Datasource

Select a destination data source. You can select a Hologres data source and project. You can also click New to create a data source on the data source page. For more information, see Create a Hologres data source.

Schema

Select a destination schema.

Sink Table Creation Configuration

Table Name Transform

Sink table names can contain only letters, digits, and underscores (_). If a source table name contains other characters, you must configure a table name transform rule.

Click Configure Table Name Transform to open the Configure Table Name Transform Rules dialog box.

  • Replace String: Click New Rule to add a rule. Configure Source Table String to Replace and Sink Table Replacement String. You can add up to 5 rules.

  • Table Name Prefix/Suffix: Cannot be empty. Can contain only letters, digits, and underscores (_). The length cannot exceed 32 characters.

Note
  • After you configure the table name transform, the system automatically matches and replaces strings based on the transform rules in top-down order.

  • Letters in the replacement strings and table name prefixes and suffixes are automatically converted to lowercase.

StarRocks

Parameter

Description

Data Source Configuration

Data Source Type

Select StarRocks.

Datasource

Select a data source. You can also click New to create a data source on the Datasource page. For more information, see Create a StarRocks data source.

Sink Table Creation Configuration

Table Name Transform

Sink table names can contain only letters, digits, and underscores (_). If a source table name contains other characters, you must configure a table name transform rule.

Click Configure Table Name Transform to open the Configure Table Name Transform Rules dialog box.

  • Replace String: Click New Rule to add a rule. Configure Source Table String to Replace and Sink Table Replacement String. You can add up to 5 rules.

  • Table Name Prefix/Suffix: Cannot be empty. Can contain only letters, digits, and underscores (_). The length cannot exceed 32 characters.

Note
  • After you configure the table name transform, the system automatically matches and replaces strings based on the transform rules in top-down order.

  • Letters in the replacement strings and table name prefixes and suffixes are automatically converted to lowercase.

Mapping configuration

Note
  • Mapping configuration is not supported if the destination data source is DataHub or Kafka (with a single destination topic).

  • If the destination data source is an external data source, the sink table names in the mapping configuration are retrieved from the Metadata Center. In this case, the sink table creation method does not support auto-create table. You must manually create the sink table in the database.

Destination data source is not Kafka

Block

Description

View additional fields

During real-time incremental synchronization, additional fields are automatically added to auto-created tables to facilitate data use. Click View additional fields to open the Additional Fields dialog box and view the currently added fields.

Important
  • If you select an existing table as the sink table and it has no additional fields, add them to the existing sink table. Otherwise, data usage will be affected.

  • If you select a data lake table format, no additional fields are included.

Click View DDL for Adding Fields to view the DDL statement for adding the additional fields.

Note

Viewing additional fields is not supported when the source data source type is Kafka.

Search and filter area

Search by Source Table and Sink Table Name. To quickly filter sink tables, click the filter icon at the top and filter by Mapping Status and Creation Method.

Add global fields, Refresh mapping

  • Add global fields

    Click Add global fields to add global fields in the Add Global Fields dialog box.

    • Name: The name of the global field.

    • Type: Supported data types are String, Long, Double, Date, and Boolean.

    • Value: The value of the global field.

    • Description: A description of the field.

    Note
    • If a field is added both globally and for a single table, only the single-table field takes effect.

    • Currently, only constants can be added.

    • Global fields only take effect for sink tables that are set to Auto-create table.

    • Adding global fields is not supported when the source data source type is Kafka.

  • Refresh mapping

    To refresh the sink table configuration list, click Refresh mapping.

    Important
    • If the sink table configuration already has content, reselecting the data source type and data source will reset the sink table list and mapping. Proceed with caution.

    • You can click Refresh mapping again at any time during the refresh process. Each time you click Refresh mapping, only the configured global fields are saved. Other information, including the sink table creation method, sink table name, and deletion records, is not saved.

    • When the source data source type is Kafka, clicking Refresh mapping will map based on the table list in the Sync Rule Configuration. An error is reported if a table does not exist.
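The precedence rule for global fields can be sketched as a simple merge. This is an assumption drawn from the note above, not the actual Dataphin implementation; the field shape is hypothetical and only constant values are supported.

```python
# Sketch of the stated precedence: a field added for a single table
# overrides a global field with the same name.

def effective_fields(global_fields, table_fields):
    merged = {f["name"]: f for f in global_fields}
    merged.update({f["name"]: f for f in table_fields})  # per-table wins
    return list(merged.values())

g = [{"name": "src_system", "type": "String", "value": "crm"}]
t = [{"name": "src_system", "type": "String", "value": "crm_cn"}]
print(effective_fields(g, t))  # the per-table value 'crm_cn' wins
```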

Destination database list

The destination database list includes Serial Number, Source Table, Mapping Status, Sink Table Creation Method, and Sink Table Name. You can also add fields, view fields, refresh, or delete a sink table.

  • Mapping Status:

    • Completed: Mapping is completed normally.

    • Incomplete: The configuration was modified, but the mapping was not refreshed.

    • Mapping: Waiting for mapping or in the process of mapping.

    • Abnormal: A data source or internal system error exists.

    • Failed: The destination partitioned table is inconsistent with the partition set for the real-time task.

    • Alerting: The source and sink tables may have incompatible data types.

  • Sink Table Creation Method has three options:

    • If a table with the same name as the source table exists in the destination database, the creation method is "Use existing table", and this table is used as the sink table by default. To change to "Auto-create table", add a table name transform rule or a prefix/suffix and remap.

    • If no table with the same name is found in the destination database, the creation method defaults to "Auto-create table". You can also change it to "Use existing table" and select an existing table for synchronization.

    • Only tables that are auto-created support adding fields or custom DDL table creation. Global fields also only take effect for auto-created tables.

    Note
    • When the destination data source type is Hive:

      • During auto-creation, if the data lake table format is None, a standard Hive table is created. Otherwise, a table of the selected format is created. Hudi and Iceberg are currently supported.

      • During custom creation, if the data lake table format is None, use the DDL for a standard Hive table. Otherwise, use the DDL for the selected table format. Hudi and Iceberg are currently supported.

    • When the source data source type is Kafka, the sink table creation method only supports Use existing table.

    • When the destination data source type is SelectDB, if the source table has no primary key, a Duplicate table is created during auto-creation. If the source table has a primary key, a Unique table is created.

    • If Partition Format is set to Single Partition or Fixed Partition and Sink Table Creation Method is set to Use existing table, the system automatically checks if the sink table partition matches the partition settings. An error is reported if it does not match.

    • When the destination data source is StarRocks, auto-creation creates a StarRocks table. If the source table has no primary key, a Duplicate table is created. If the source table has a primary key, a Primary table is created.

  • Sink Table Name: Sink table names can contain only letters, digits, and underscores (_). If a source table name contains other characters, you must configure a table name transform rule.

    When the destination data source type is MaxCompute: If Sink Table Creation Method is Auto-create table and New Table Type is Delta Table, an icon is displayed next to the sink table name to indicate that a Delta table will be created. If Sink Table Creation Method is Use existing table and you select a Delta table from the sink table list, the icon is also displayed next to the sink table name to indicate that the table is a Delta table.

  • Actions:

    • Custom table creation: You can create a table by adding fields or using DDL. After enabling custom table creation, global fields no longer take effect.

      Note
      • Added fields are only displayed in the actions column for auto-created tables.

      • You cannot modify an existing sink table (a table with the creation method "Use existing table").

    • View fields: View the fields and types of the source and sink tables.

    • Refresh: Remap the source and sink tables.

    • Delete: Deletes the corresponding source table from the list. This operation cannot be undone.
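The creation-method defaults and the key-based StarRocks table types described above can be sketched as follows. This is a minimal illustration of the stated rules, not Dataphin internals.

```python
# Sketch of the defaults: the creation method depends on whether a
# same-name table already exists in the destination, and the auto-created
# StarRocks table type depends on whether the source table has a primary key.

def creation_method(sink_table_name, existing_tables):
    if sink_table_name in existing_tables:
        return "Use existing table"   # same-name table found in destination
    return "Auto-create table"        # no match found: auto-create by default

def starrocks_table_type(has_primary_key):
    # Auto-creation: Primary table with a primary key, otherwise Duplicate.
    return "Primary" if has_primary_key else "Duplicate"

print(creation_method("orders", {"orders", "users"}))  # Use existing table
print(starrocks_table_type(False))                     # Duplicate
```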

Batch operations

You can Delete sink tables in batches.

Destination data source is Kafka (with multiple destination topics)

Block

Description

Search and filter area

Search by Source Table and Destination Topic Name. To quickly filter sink tables, click the filter icon at the top and filter by Mapping Status and Destination Topic Creation Method.

Refresh mapping

To refresh the sink table configuration list, click Refresh mapping.

Important

If the destination topic configuration already has content, reselecting the data source type and data source will reset the destination topic list and mapping. Proceed with caution.

List

The list includes Serial Number, Source Table, Mapping Status, Destination Topic Creation Method, and Destination Topic Name. You can also delete a sink table.

  • Destination Topic Creation Method: If the destination topic already exists, the creation method is Use Existing Topic. If the destination topic does not exist, the creation method is Auto-create Topic.

    When a topic is auto-created, the system creates it based on the generated destination topic name and topic parameters.

  • Mapping Status: Only checks if the destination topic exists.

  • Delete: Deletes the corresponding row. This operation cannot be undone.
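The topic-mapping rule above reduces to a simple existence check. A minimal sketch, assuming the described behavior:

```python
# Sketch of the assumed rule when the destination is Kafka with multiple
# topics: an existing topic is reused; a missing one is auto-created.

def topic_creation_method(topic, existing_topics):
    if topic in existing_topics:
        return "Use Existing Topic"
    return "Auto-create Topic"

print(topic_creation_method("orders_cdc", {"orders_cdc"}))  # Use Existing Topic
print(topic_creation_method("users_cdc", {"orders_cdc"}))   # Auto-create Topic
```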

Batch operations

You can Delete sink tables in batches.

DDL processing policy

Note
  • DDL processing policies are not supported when the source data source type is DataHub or Kafka.

  • DDL processing policies are not supported when the destination data source type is PostgreSQL.

  • When the destination data source type is Hive and the data lake table format is Hudi, all DDL processing policies support only Ignore.

  • When the source data source type is Kafka, all DDL processing policies support only Ignore.

  • New columns added to existing partitions of Hive or MaxCompute tables cannot have their data synchronized; the values of these columns in existing partitions are NULL. The new columns take effect only in partitions created after the change.

  • Normal processing: The DDL information (for operations such as creating tables, adding columns, deleting columns, renaming columns, and modifying column types) is passed to the destination data source for processing. Processing behavior varies by destination data source.

  • Ignore: Discards this DDL information and does not send it to the destination data source.

  • Error: Stops the real-time sync task with an error status.
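The three policies can be sketched as a small dispatcher. This is an illustrative example of the described behavior; the function and exception names are assumptions.

```python
# Sketch of the three DDL processing policies described above.

class DDLError(Exception):
    """Raised when the Error policy stops the sync task."""

def handle_ddl(ddl_event, policy):
    if policy == "Normal processing":
        return f"forward to sink: {ddl_event}"   # sink applies its own rules
    if policy == "Ignore":
        return None                              # discard; nothing sent to sink
    if policy == "Error":
        raise DDLError(f"task stopped on DDL: {ddl_event}")
    raise ValueError(f"unknown policy: {policy}")

print(handle_ddl("ALTER TABLE t ADD COLUMN c INT", "Ignore"))  # None
```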

Step 3: Configure real-time integration task properties

  1. Click Resource Configuration in the top menu bar of the current real-time integration task tab, or click Property in the right sidebar to open the Property panel.

  2. Configure the Basic Information and Resource Configuration for the current real-time integration task.

    • Basic Information: Select the Development Owner and Operation Owner for the current real-time integration task, and enter a Description for the task. The description can be up to 1,000 characters long.

    • Resource Configuration: For more information, see Real-time integration resource configuration.

Step 4: Submit the real-time integration task

  1. Click Submit to submit the current real-time integration task.

  2. In the Submit dialog box, enter Submission notes and click OK and Submit.

  3. After submission, you can view the submission details in the Submit dialog box.

    If the project is in Dev-Prod mode, you must publish the real-time integration task to the production environment. For more information, see Manage publish tasks.

What to do next

You can view and manage the real-time integration task in the Operation Center to ensure that it runs as expected. For more information, see View and manage real-time tasks.