This topic describes how to create and configure a batch synchronization solution to synchronize data from multiple tables in an ApsaraDB for ClickHouse database to Hologres.
Limits
- Batch synchronization from ApsaraDB for ClickHouse supports only ApsaraDB for ClickHouse data sources of V20.8 or V21.8.
- Batch synchronization from ApsaraDB for ClickHouse supports only exclusive resource groups for Data Integration.
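The version limit above can be checked before you add the data source. The following is a minimal sketch, assuming that the cluster reports its version as a dotted string such as the one returned by the ClickHouse query SELECT version(). The function name and the sample version strings are hypothetical and are not part of DataWorks.

```python
# Versions that batch synchronization supports, as (major, minor) pairs.
SUPPORTED_VERSIONS = {(20, 8), (21, 8)}


def is_supported_version(version: str) -> bool:
    """Return True if the major.minor part of the version is 20.8 or 21.8."""
    # Accept an optional leading "V", as in "V21.8".
    parts = version.strip().lstrip("vV").split(".")
    if len(parts) < 2:
        return False
    try:
        major_minor = (int(parts[0]), int(parts[1]))
    except ValueError:
        return False
    return major_minor in SUPPORTED_VERSIONS


if __name__ == "__main__":
    # Hypothetical version strings, for illustration only.
    for candidate in ("21.8.10.19", "V20.8", "22.3.1.1"):
        print(candidate, is_supported_version(candidate))
```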
Create an exclusive resource group for Data Integration and establish network connections between the resource group and data sources
Before you run a batch synchronization solution, you must establish network connections between your exclusive resource group for Data Integration and data sources. For more information, see Configure network connectivity.
- If your exclusive resource group for Data Integration and a data source reside in the same region, you can use a virtual private cloud (VPC) in that region to establish a network connection between the resource group and the data source. To establish such a network connection, perform the following operations:
  - Associate the exclusive resource group for Data Integration with a VPC and add a custom route for the resource group.
  - Add the required IP address or CIDR block to the IP address whitelist of the data source.
- If your exclusive resource group for Data Integration and a data source reside in different regions, you can establish a network connection between the resource group and the data source over the Internet. To establish such a network connection, add the required IP address or CIDR block to the IP address whitelist of the data source.
Step 1: Associate the exclusive resource group for Data Integration with a VPC and add a custom route for the resource group
Note: If you establish a network connection between the exclusive resource group for Data Integration and a data source over the Internet, you can skip this step.
- Associate the exclusive resource group for Data Integration with a VPC.
- Add a custom route for the exclusive resource group for Data Integration.
  Note: If you select the zone and vSwitch in which the data source resides in the preceding substep, you can skip this substep. If you select another zone and another vSwitch, you must perform the operations in this substep to add a custom route for the exclusive resource group for Data Integration.
Step 2: Configure the IP address whitelist for the data source
- Obtain required IP addresses or CIDR blocks.
  - If you use a VPC to establish a network connection between the exclusive resource group for Data Integration and the data source, you must enter the CIDR block of the vSwitch that is specified when you associate the resource group with a VPC. You can find the resource group on the Exclusive Resource Groups tab of the Resource Groups page in the DataWorks console and click Network Settings in the Actions column to view the CIDR block of the vSwitch.
  - If you establish a network connection between the exclusive resource group for Data Integration and the data source over the Internet, you must enter the elastic IP address (EIP) of the resource group in the IP address whitelist. You can find the resource group on the Exclusive Resource Groups tab of the Resource Groups page in the DataWorks console and click View Information in the Actions column to view the EIP of the resource group.
- Add the IP address or CIDR block to the IP address whitelist configured for the data source.
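Before you run a connectivity test, it can help to confirm that the address used by the resource group is actually covered by the whitelist entries. The following is a minimal sketch that uses the Python standard library ipaddress module. The function name and all addresses and CIDR blocks are hypothetical placeholders. Replace them with the vSwitch CIDR block or EIP shown in the DataWorks console.

```python
import ipaddress
from typing import Iterable


def is_covered(address: str, whitelist: Iterable[str]) -> bool:
    """Return True if the address matches any whitelist entry.

    Each entry is either a single IP address or a CIDR block.
    """
    ip = ipaddress.ip_address(address)
    # strict=False accepts entries such as "192.168.0.10/24" whose host bits are set.
    return any(ip in ipaddress.ip_network(entry, strict=False) for entry in whitelist)


if __name__ == "__main__":
    # Hypothetical whitelist: one vSwitch CIDR block and one EIP.
    whitelist = ["192.168.0.0/24", "203.0.113.10"]
    print(is_covered("192.168.0.25", whitelist))   # address inside the CIDR block
    print(is_covered("203.0.113.10", whitelist))   # exact match on the EIP
    print(is_covered("192.168.1.25", whitelist))   # address outside every entry
```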
Prepare data sources
Add an ApsaraDB for ClickHouse data source
In the upper-right corner of the Data Source page in the DataWorks console, click Add data source. In the Add data source dialog box, add an ApsaraDB for ClickHouse data source as prompted. You must configure the following parameters.
Note: You can log on to the ApsaraDB for ClickHouse console, find the cluster from which you want to read data, and then click the cluster name. The Cluster Information page shows the following information about the cluster: internal and public endpoints, HTTP port number, vSwitch ID, and zone.
- JDBC URL: Configure this parameter in the jdbc:clickhouse://<ip>:<port>/<dbname> format. Before you configure this parameter, take note of the following items:
  - <ip>: You must replace this item with the public or internal endpoint of the ApsaraDB for ClickHouse cluster.
    - If you use a VPC to establish a network connection between the exclusive resource group for Data Integration and the ApsaraDB for ClickHouse cluster, you must replace <ip> with the internal endpoint of the cluster.
    - If you establish a network connection between the exclusive resource group for Data Integration and the ApsaraDB for ClickHouse cluster over the Internet, you must replace <ip> with the public endpoint of the cluster.
  - <port>: You must replace this item with the HTTP port of the ApsaraDB for ClickHouse cluster. In most cases, the HTTP port number is 8123.
  - <dbname>: You must replace this item with the name of the ApsaraDB for ClickHouse database from which you want to read data.
- Username and Password: Specify the username and password of the ApsaraDB for ClickHouse database.
- Test connectivity: Select the exclusive resource group for Data Integration that you want to use and test the network connectivity between the resource group and the ApsaraDB for ClickHouse data source. Make sure that the network connectivity test is successful.
  Note: The preceding configurations establish a network connection only between the exclusive resource group for Data Integration and the ApsaraDB for ClickHouse data source. If you want to establish a network connection between a resource group for DataService Studio or a resource group for scheduling and the ApsaraDB for ClickHouse data source, you must configure the required network settings and test the network connectivity.
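The JDBC URL format described above can be assembled programmatically. The following is a minimal sketch. The helper name and the endpoint and database values are hypothetical, and the default port of 8123 follows the "in most cases" note above rather than a guarantee for every cluster.

```python
def build_clickhouse_jdbc_url(endpoint: str, dbname: str, port: int = 8123) -> str:
    """Build a URL in the jdbc:clickhouse://<ip>:<port>/<dbname> format.

    endpoint: internal endpoint (VPC connection) or public endpoint
              (Internet connection) of the cluster.
    port:     HTTP port of the cluster; 8123 in most cases.
    dbname:   name of the database to read from.
    """
    if not endpoint or not dbname:
        raise ValueError("endpoint and dbname must be non-empty")
    if not 1 <= port <= 65535:
        raise ValueError("port must be between 1 and 65535")
    return f"jdbc:clickhouse://{endpoint}:{port}/{dbname}"


if __name__ == "__main__":
    # Hypothetical internal endpoint and database name, for illustration only.
    print(build_clickhouse_jdbc_url("cc-example.internal.example.com", "default"))
```

Use the internal endpoint when the resource group reaches the cluster through a VPC, and the public endpoint when it connects over the Internet.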
Add a Hologres data source
You can associate a Hologres compute engine with the workspace that you want to use to enable the system to generate a Hologres data source. You can also directly add a Hologres data source to the workspace that you want to use. For more information, see Associate a Hologres compute engine with a workspace or Add a Hologres data source.
Create and configure a synchronization solution
- Select a data synchronization solution type. On the Tasks page in Data Integration, click Create Node. On the Create Data Synchronization Solution page, select ClickHouse as the source and Hologres as the destination for the Data Source field in the Synchronization Type section. The system displays the Hologres Offline synchronization solution type. By default, the Hologres Offline synchronization solution type is selected. You cannot change the type.
- Test the network connectivity between the exclusive resource group for Data Integration and the data sources. In the Network and Resource Configuration section, select the ApsaraDB for ClickHouse data source and the Hologres data source that are added to DataWorks, select the exclusive resource group for Data Integration that is purchased, and then click Test Connectivity for All Resource Groups and Data Sources to test the network connectivity between the two data sources and the resource group. If the system prompts that the network connections between the data sources and resource group are established, click Next.
- Select the tables from which you want to read data. In the Select Data Sources and Tables for Data Synchronization section of the page that appears, select the tables from which you want to read data in the Source Table area on the left side and click the icon to add the selected tables to the Selected Tables area on the right side.
- In the Mapping Rules for Destination Tables section, select all items in this section and click Batch Refresh Mapping Results. You can also select specific items and click Batch Modify to modify the items based on your business requirements. The following table describes the options under Batch Modify.
| Option | Description |
| --- | --- |
| Value assignment | You can add constants and variables to destination tables. |
| Customize Mapping Rules for Destination Schema Names | You can concatenate built-in variables and specified strings into a final destination schema name. You can edit built-in variables. For example, you can specify strings as the value of built-in variables. |
| Customize Mapping Rules for Destination Table Names | You can concatenate built-in variables and specified strings into a final destination table name. You can edit built-in variables. For example, you can specify strings as the value of built-in variables. |
| Have Primary Key | You can use the primary key information of destination tables to implement automatic mapping. If destination tables are created in a visualized manner, you can click the icon in the Destination Table Name column to edit the table schemas and select primary keys. Then, refresh the mappings. If destination tables contain primary keys, the system uses the new data to overwrite the original data. This indicates that data in all columns of specific rows is completely overwritten. The fields for which column mappings are not configured are forcefully set to NULL. |
- Modify mappings between field types. If the destination Hologres tables are in the pending state, the system provides default mappings between data types of fields in ApsaraDB for ClickHouse and Hologres tables. The following table lists the default mappings. You can also click Edit Mapping of Field Data Types in the upper-right corner of the Mapping Rules for Destination Tables section to customize field type mappings. After you customize field type mappings, click Apply and Refresh Mapping.
| Category | Data type of fields in ApsaraDB for ClickHouse data source | Data type of fields in Hologres data source |
| --- | --- | --- |
| Date | Date | DATE |
| Date | DateTime | TIMESTAMPTZ |
| Date | DateTime(timezone) | TIMESTAMPTZ |
| Date | DateTime64 | TIMESTAMPTZ |
| Numeric | Int8 | SMALLINT |
| Numeric | Int16 | SMALLINT |
| Numeric | Int32 | INTEGER |
| Numeric | Int64 | BIGINT |
| Numeric | UInt8 | INTEGER |
| Numeric | UInt16 | INTEGER |
| Numeric | UInt32 | BIGINT |
| Numeric | UInt64 | BIGINT |
| Numeric | Float32 | FLOAT |
| Numeric | Float64 | DOUBLE PRECISION |
| Numeric | Decimal(P, S) | DECIMAL |
| Numeric | Decimal32(S) | DECIMAL |
| Numeric | Decimal64(S) | DECIMAL |
| Numeric | Decimal128(S) | DECIMAL |
| Boolean | None (UInt8 is used instead.) | BOOLEAN |
| String | String | TEXT |
- Configure advanced parameters. You can click Configure Advanced Parameters in the upper-right corner of the configuration page to perform finer-grained configurations for the source and destination for data synchronization. For example, you can configure the maximum number of connections and the parameters related to throttling.
- Configure a resource group. You can click Configure Resource Group in the upper-right corner of the configuration page and modify the exclusive resource group for Data Integration that you want to use to run the data synchronization solution.
- After the preceding configuration is complete, click Complete.
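The default field type mappings listed in the preceding steps can be expressed as a lookup table, which is handy when you review source schemas before you customize mappings. The following is a minimal sketch, not the logic that DataWorks itself runs. The function name is hypothetical, and the parsing assumes that type parameters appear in parentheses after the type name, as in Decimal(10, 2).

```python
# Default ClickHouse -> Hologres field type mappings, taken from the table in this topic.
# ClickHouse has no Boolean type in this table; UInt8 is used instead and maps to INTEGER.
DEFAULT_TYPE_MAPPING = {
    "Date": "DATE",
    "DateTime": "TIMESTAMPTZ",
    "DateTime64": "TIMESTAMPTZ",
    "Int8": "SMALLINT",
    "Int16": "SMALLINT",
    "Int32": "INTEGER",
    "Int64": "BIGINT",
    "UInt8": "INTEGER",
    "UInt16": "INTEGER",
    "UInt32": "BIGINT",
    "UInt64": "BIGINT",
    "Float32": "FLOAT",
    "Float64": "DOUBLE PRECISION",
    "Decimal": "DECIMAL",
    "Decimal32": "DECIMAL",
    "Decimal64": "DECIMAL",
    "Decimal128": "DECIMAL",
    "String": "TEXT",
}


def hologres_type(clickhouse_type: str) -> str:
    """Return the default Hologres type for a ClickHouse field type.

    Parameters such as precision, scale, or time zone are ignored.
    Raises KeyError for types that the table does not list.
    """
    base_name = clickhouse_type.split("(", 1)[0].strip()
    return DEFAULT_TYPE_MAPPING[base_name]


if __name__ == "__main__":
    for source_type in ("UInt32", "Decimal(10, 2)", "DateTime('Asia/Shanghai')", "String"):
        print(source_type, "->", hologres_type(source_type))
```

Types that the table does not list raise KeyError in this sketch, which signals that a custom mapping may be needed.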
Run the data synchronization solution
- Go to the Tasks page in Data Integration and find the created data synchronization solution.
- Click Submit and Run in the Actions column to run the data synchronization solution.
- Click Execution details in the Actions column to view the execution details of the data synchronization solution.
Perform O&M operations for the data synchronization solution
View the status of the data synchronization solution
After the data synchronization solution is created, you can go to the Tasks page to view all data synchronization solutions that are created in the workspace and the basic information of each data synchronization solution.
- You can find the desired data synchronization solution and start, stop, modify, or view the details of the data synchronization solution.
- You can find the desired data synchronization solution and click Execution details in the Actions column to view the running details of the solution. You can also click different sections on the Execution details page to view the related information.