You can create a real-time synchronization solution and use the solution to synchronize full and incremental data to Hologres. You can also use the solution to synchronize only incremental data to Hologres in real time if full data is synchronized to Hologres by using another method. This topic describes how to create a real-time synchronization solution to synchronize data to Hologres.
- The required data sources are configured. Before you configure a data synchronization solution, you must configure the data sources from which you want to read data and to which you want to write data. This way, you can select the data sources when you configure a data synchronization solution. For information about the data source types that support the solution-based synchronization feature and the configuration of a data source, see Supported data source types and read and write operations.
  Note: For information about the items that you need to understand before you configure a data source, see Overview.
- An exclusive resource group for Data Integration that meets your business requirements is purchased. For more information, see Create and use an exclusive resource group for Data Integration.
- Network connections between the exclusive resource group for Data Integration and data sources are established. For more information, see Establish a network connection between a resource group and a data source.
- The data source environments are prepared. Before you configure a data synchronization solution, you must create an account that can be used to access a database and grant the account the permissions required to perform specific operations on the database based on your configurations for data synchronization. For more information, see Overview.
|Number of tables from which you can read data|
|Nodes|A real-time synchronization solution generates batch synchronization nodes to synchronize full data and real-time synchronization nodes to synchronize incremental data. The number of batch synchronization nodes that are generated by the solution varies based on the number of tables from which you can read data.|
|Data write|After you run a real-time synchronization solution, full data in the source is written to the destination by using batch synchronization nodes. Then, incremental data in the source is written to the destination by using real-time synchronization nodes.|
Note: After full data is synchronized, the batch synchronization nodes are frozen.
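The two-phase write order described above (full data first, then incremental data) can be pictured with a small in-memory model. This is only a sketch and is not DataWorks code: the function names `full_sync` and `apply_incremental`, the dictionary-based tables, and the layout of the change records are all assumptions made for illustration.

```python
from typing import Any, Dict, Iterable, Tuple

Row = Dict[str, Any]
Table = Dict[Any, Row]  # hypothetical destination table, keyed by primary key


def full_sync(source_rows: Iterable[Row], pk: str) -> Table:
    """Batch phase: copy a snapshot of the source into the destination."""
    return {row[pk]: dict(row) for row in source_rows}


def apply_incremental(dest: Table, changes: Iterable[Tuple[str, Row]], pk: str) -> None:
    """Real-time phase: apply changes captured after the snapshot, in order."""
    for op, row in changes:
        if op in ("INSERT", "UPDATE"):
            dest[row[pk]] = dict(row)
        elif op == "DELETE":
            dest.pop(row[pk], None)
        else:
            raise ValueError(f"unsupported operation: {op}")


# Made-up sample data for the illustration.
snapshot = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
dest = full_sync(snapshot, pk="id")
apply_incremental(
    dest,
    [
        ("UPDATE", {"id": 1, "name": "a2"}),
        ("DELETE", {"id": 2}),
        ("INSERT", {"id": 3, "name": "c"}),
    ],
    pk="id",
)
print(dest)
```

In this model the batch phase runs once, which loosely mirrors the batch synchronization nodes being frozen after full data is synchronized, while the incremental phase keeps consuming changes.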
When you synchronize data to Hologres, you can write data only to the partitions of a partitioned table.
- Step 1: Select a synchronization solution
- Step 2: Configure network connections for data synchronization
- Step 3: Configure the source and synchronization rules
- Step 4: Configure a destination table
- Step 5: Configure table-level synchronization rules
- Step 6: Configure rules to process DDL messages
- Step 7: Configure the resources required by the synchronization solution
- Step 8: Run the synchronization solution
Step 1: Select a synchronization solution
Go to the Data Integration page in the DataWorks console and click Create Data Synchronization Solution. On the Create Data Synchronization Solution page, select a source and a destination for data synchronization from the drop-down lists. Then, select One-click real-time synchronization to Hologres. For more information, see Create a synchronization solution.
Step 2: Configure network connections for data synchronization
Select a source, a destination, and a resource group that is used to run nodes. Test the network connectivity to make sure that the resource group is connected to the source and destination. For more information, see Configure network connections for data synchronization.
Step 3: Configure the source and synchronization rules
- In the Basic Configuration section, configure the parameters, such as the Solution Name and Location parameters, based on your business requirements.
- In the Data Source section, confirm the information about the source.
- In the Source Table section, select the tables from which you want to read data from the Source Table list. Then, click the icon to add the tables to the Selected Source Table list.
The Source Table list displays all tables in the source. You can select all or specific tables.
- In the Set Mapping Rules for Table/Database Names section, click Add Rule, select a rule type, and then configure a mapping rule of the selected type. By default, data in a source table is written to the destination schema or table that has the same name as the source schema or table. You can specify a destination schema name or table name in a mapping rule to write data in multiple source tables to the same Hologres table. You can also configure a mapping rule to synchronize data from source tables whose names start with a specified prefix to the destination tables whose names start with another specified prefix. You can use regular expressions to convert the names of the schemas or tables. You can also use built-in variables to add prefixes and suffixes to the names of destination tables. For more information, see Configure the source and synchronization rules.
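To make the effect of name mapping rules concrete, the following sketch shows how regular-expression rules could route several source tables to one destination table or replace one prefix with another. It is an illustration only: the rule format (a list of pattern and replacement pairs), the first-match-wins ordering, and the function `map_table_name` are assumptions, not the rule syntax used in the console.

```python
import re
from typing import List, Tuple

# Hypothetical rule format: (regular expression, replacement template).
# Rules are tried in order; the first rule that matches the whole name wins.
MappingRule = Tuple[str, str]


def map_table_name(source_name: str, rules: List[MappingRule]) -> str:
    """Return the destination table name for a source table name."""
    for pattern, replacement in rules:
        match = re.fullmatch(pattern, source_name)
        if match:
            # expand() substitutes backreferences such as \1 in the template.
            return match.expand(replacement)
    # No rule matched: keep the source name, as in the default behavior.
    return source_name


rules = [
    (r"order_\d+", "orders_all"),  # write many source tables to one table
    (r"ods_(.+)", r"holo_\1"),     # replace one prefix with another
]

for name in ["order_001", "order_002", "ods_user", "product"]:
    print(name, "->", map_table_name(name, rules))
```

Matching against the whole name rather than a substring keeps a rule such as `order_\d+` from accidentally capturing an unrelated table like `preorder_1_backup`.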
Step 4: Configure a destination table
- Configure the Policy for Writing to Hologres parameter. You can set the Policy for Writing to Hologres parameter only to Replay. This value indicates that the operations performed on the source are also performed on the destination by Hologres Writer. The operations include INSERT, UPDATE, and DELETE.
- Configure mappings between the source table and the destination Hologres table. Click Refresh source table and Hologres Table mapping to create a destination table based on the rules you configured in the Set Mapping Rules for Table/Database Names section in Step 3. If no mapping rule is configured in Step 3, data in the source table is written to the destination table that has the same name as the source table. If no destination table that has the same name as the source table exists, the system automatically creates such a destination table. You can change the method of creating the destination table and add additional fields to the destination table.
|Operation|Description|
|Select a primary key for a source table|Source tables that do not have primary keys cannot be synchronized. If a source table does not have a primary key, you can click the icon in the Synchronized Primary Key column to specify one or more fields in the table as the primary key of the table.|
|Select the method of creating a destination table|You can set the Table creation method parameter to Create Table or Use Existing Table. Use Existing Table: If you select this method, you must select the desired destination table from the drop-down list in the Table name column. Create Table: If you select this method, the name of the Hologres table that is automatically created appears in the drop-down list of the Table name column. You can click the table name to view and modify the table creation statements.|
|Configure the Full Synchronization parameter|You can specify whether to turn on the switch in the Full Synchronization column to synchronize full data of the source tables to the destination. If you disable full data synchronization for specific tables, you cannot perform full batch synchronization for data in the source tables. You can disable full data synchronization for tables whose data is synchronized by using other batch synchronization methods.|
|Edit additional fields|You can click Edit additional fields in the Actions column to add additional fields to a destination table and assign values to the fields. The values can be constants or variables. Note: You can add additional fields only if you select Create Table from the drop-down list in the Table creation method column. The following additional variables are supported by Data Integration: EXECUTE_TIME: the execution time; UPDATE_TIME: the update time; DB_NAME_SRC: the name of the original database; DB_NAME_SRC_TRANSED: the converted name of the database; DATASOURCE_NAME_SRC: the name of the source data source; DATASOURCE_NAME_DEST: the name of the destination data source; DB_NAME_DEST: the name of the destination database; TABLE_NAME_DEST: the name of the destination table; TABLE_NAME_SRC: the name of the source table.|
|Edit destination tables|You can click the name of a table in the Table name column to edit the destination table. For example, you can configure a primary key and a distribution key for the destination table and modify the mappings between the source fields and destination fields. Field type conversion may occur during data synchronization. For example, if the data types of the fields in a destination table are different from the data types of the fields in a source table, the synchronization solution converts the fields in the source table to the data types that can be written to the destination table. Note: You can edit a destination table only if you select Create Table from the drop-down list in the Table creation method column.|
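As a rough illustration of additional fields, the sketch below extends each synchronized row with extra columns whose values are either constants or variables. The variable names come from the list above, but everything else is assumed for the example: the sample values, the `("variable", ...)` and `("constant", ...)` definition format, and the `add_fields` helper. The way variables are actually referenced in the console is not shown here.

```python
from typing import Any, Dict, Tuple

# Variable names are taken from the list above; the values are made up.
context: Dict[str, str] = {
    "DB_NAME_SRC": "shop",
    "TABLE_NAME_SRC": "order_001",
    "TABLE_NAME_DEST": "orders_all",
}

# Hypothetical definitions: field name -> ("variable", name) or ("constant", value).
additional_fields: Dict[str, Tuple[str, Any]] = {
    "src_table": ("variable", "TABLE_NAME_SRC"),
    "src_db": ("variable", "DB_NAME_SRC"),
    "sync_channel": ("constant", "realtime"),
}


def add_fields(row: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the row extended with the additional fields."""
    out = dict(row)
    for field, (kind, value) in additional_fields.items():
        # A variable is looked up in the context; a constant is used as-is.
        out[field] = context[value] if kind == "variable" else value
    return out


print(add_fields({"id": 1, "amount": 9.5}))
```

A field such as `src_table` is useful when several source tables are merged into one destination table, because it records where each row came from.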
Step 5: Configure table-level synchronization rules
You can configure rules to process DML messages generated for insert, update, and delete operations that are performed on the source.
- Normal: The system sends the DML message to the destination, and the destination processes data.
- Ignore: The system discards the DML message and no longer sends this message to the destination, and the related data is not modified.
- Conditionally Normal Processing: The system first filters data based on the filter expression. Data that meets the filter expression is normally processed. Data that does not meet the filter expression is ignored.
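The three processing rules above can be sketched as a small decision function. This is an illustrative model, not the product's implementation: the `should_forward` function, the per-operation rule table, and the use of a Python callable in place of a filter expression are assumptions.

```python
from typing import Any, Callable, Dict, Optional

Row = Dict[str, Any]
Predicate = Callable[[Row], bool]


def should_forward(rule: str, row: Row, predicate: Optional[Predicate] = None) -> bool:
    """Decide whether a DML message is sent on to the destination."""
    if rule == "Normal":
        return True
    if rule == "Ignore":
        return False
    if rule == "Conditionally Normal Processing":
        if predicate is None:
            raise ValueError("a filter expression is required for this rule")
        # Rows that satisfy the filter are processed; the rest are ignored.
        return predicate(row)
    raise ValueError(f"unknown rule: {rule}")


# Hypothetical configuration for one table: one rule per DML operation type.
rules = {
    "INSERT": ("Normal", None),
    "UPDATE": ("Conditionally Normal Processing", lambda r: r["status"] != "draft"),
    "DELETE": ("Ignore", None),
}

messages = [
    ("INSERT", {"id": 1, "status": "draft"}),
    ("UPDATE", {"id": 1, "status": "draft"}),
    ("UPDATE", {"id": 1, "status": "paid"}),
    ("DELETE", {"id": 1, "status": "paid"}),
]

forwarded = []
for op, row in messages:
    rule, predicate = rules[op]
    if should_forward(rule, row, predicate):
        forwarded.append((op, row["status"]))

print(forwarded)
```

With this configuration the delete is never applied to the destination, so the row stays there even after it is removed from the source.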
Step 6: Configure rules to process DDL messages
DDL operations that are performed on the source generate DDL messages. Data Integration provides default rules to process DDL messages. You can also configure processing rules for different DDL messages based on your business requirements. For more information, see Rules for processing DDL messages.
Step 7: Configure the resources required by the synchronization solution
After you create a synchronization solution, the solution generates batch synchronization nodes for full data synchronization and real-time synchronization nodes for incremental data synchronization. You must configure the resources for these nodes in the Configure Resources step.
You can configure the exclusive resource groups for Data Integration that you want to use to run real-time synchronization nodes and batch synchronization nodes, and the resource groups for scheduling that you want to use to run batch synchronization nodes. You can also click Advanced Configuration to configure the Number of concurrent writes on the target side and Allow Dirty Data Records parameters.
- DataWorks uses resource groups for scheduling to issue batch synchronization nodes to resource groups for Data Integration and runs the nodes on the resource groups for Data Integration. Therefore, a batch synchronization node also occupies the resources of a resource group for scheduling. You are charged fees for using the resource group for scheduling to schedule the batch synchronization nodes. For information about the node issuing mechanism, see Mechanism for issuing nodes.
- We recommend that you use different resource groups to run batch and real-time synchronization nodes. If you use the same resource group to run batch and real-time synchronization nodes, the nodes compete for resources and affect each other. For example, CPU resources, memory resources, and networks used by the two types of nodes may affect each other. In this case, the batch synchronization nodes may slow down, or the real-time synchronization node may be delayed. Out of memory (OOM) errors may also occur due to insufficient resources.
Step 8: Run the synchronization solution
- Go to the Tasks page in Data Integration and find the created data synchronization solution.
- Click Submit and Run in the Actions column to run the data synchronization solution.
- Click Execution details in the Actions column to view the execution details of the data synchronization solution.