To use DataWorks to develop and manage tasks for an ApsaraDB for ClickHouse cluster, you must add the cluster as a ClickHouse computing resource in DataWorks. This lets you connect to the ApsaraDB for ClickHouse cluster from DataWorks and perform operations such as data synchronization and data development.
Prerequisites
An ApsaraDB for ClickHouse cluster has been created.
Note: We recommend that you create your ClickHouse cluster in the same region as the DataWorks workspace that is associated with the ClickHouse computing resource.
If the cluster and workspace are in different regions, you can add the cluster only as a cross-region data source. This type of data source can be used only for data synchronization tasks and not for computing tasks in Data Studio or Operation Center.
A workspace has been created in DataWorks. The Resource Access Management (RAM) user who performs the operation has been added to the workspace and assigned the Workspace Administrator role.
A resource group has been associated with the workspace, and network connectivity is confirmed.
If you use a Serverless resource group, ensure that the Serverless resource group can connect to the ApsaraDB for ClickHouse cluster.
If you use a legacy exclusive resource group, ensure that each exclusive resource group that your scenario requires (for integration, for scheduling, or for services) can connect to the ApsaraDB for ClickHouse cluster.
By default, ApsaraDB for ClickHouse clusters deny access from all IP addresses. Before you associate the computing resource, you must add the vSwitch CIDR block of the resource group, the EIP of a legacy resource group, or the EIP of the VPC associated with a Serverless resource group to the whitelist of the ApsaraDB for ClickHouse cluster. Otherwise, the connection fails and you cannot associate the ClickHouse computing resource.
Note: For more information about how to obtain the vSwitch CIDR block, the EIP of a legacy resource group, or the EIP of the VPC associated with a Serverless resource group, see Add IP addresses to the DataWorks whitelist.
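Before you edit the whitelist, it can help to confirm that a given address is actually covered by the entries you plan to add. The following is a minimal sketch that uses only the Python standard library. The function name is_whitelisted and the sample addresses are hypothetical and are not part of DataWorks or ApsaraDB for ClickHouse.

```python
import ipaddress
from typing import Iterable


def is_whitelisted(client_ip: str, whitelist: Iterable[str]) -> bool:
    """Return True if client_ip falls inside any whitelist entry.

    Entries may be single addresses ("203.0.113.7") or CIDR blocks
    ("192.168.0.0/24"). A single address is treated as a one-host network.
    """
    addr = ipaddress.ip_address(client_ip)
    return any(
        addr in ipaddress.ip_network(entry, strict=False)
        for entry in whitelist
    )


# Hypothetical values: a vSwitch CIDR block and one EIP.
planned_entries = ["192.168.0.0/24", "203.0.113.7"]
print(is_whitelisted("192.168.0.15", planned_entries))  # True
print(is_whitelisted("10.0.0.1", planned_entries))      # False
```

This only checks the arithmetic of the CIDR blocks. Whether the cluster accepts the connection still depends on the whitelist that is actually saved on the ApsaraDB for ClickHouse side.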
Limits
Feature limits: If the SSL authentication service is enabled for the ClickHouse compute engine, you cannot use it for data development or periodic scheduling tasks.
Region limits: This feature is available only in the following regions: China (Hangzhou), China (Shanghai), China (Beijing), China (Shenzhen), China (Chengdu), China (Hong Kong), Japan (Tokyo), Singapore, Malaysia (Kuala Lumpur), and Indonesia (Jakarta).
Permission limits:
Operator
Required permissions
Alibaba Cloud account
No extra permissions are required.
RAM user/RAM role
Only workspace members with the O&M or Workspace Administrator role, or members with the AliyunDataWorksFullAccess permission, can create computing resources. For more information about authorization, see Grant a user the Workspace Administrator permissions.
Associate a ClickHouse computing resource in the new DataStudio
Associate a ClickHouse computing resource with a workspace that uses Data Studio (New Version).
Go to the computing resources page
Log on to the DataWorks console. In the top navigation bar, select the destination region. In the navigation pane on the left, choose . Select the desired workspace from the drop-down list and click Go To Management Center.
In the navigation pane on the left, click Computing Resources to go to the Computing Resources page.
Associate the ClickHouse computing resource
On the Computing Resources page, configure and associate the ClickHouse computing resource.
Select the computing resource type.
Click Associate Computing Resource to go to the Associate Computing Resource page.
On the Associate Computing Resource page, set the computing resource type to ClickHouse, which opens the Associate ClickHouse Computing Resource configuration page.
Configure the ClickHouse computing resource.
On the Associate ClickHouse Computing Resource configuration page, set the parameters as described in the following table.
Parameter
Description
Configuration Mode
Only Connection String Mode is supported.
JDBC URL
JDBC URL format: jdbc:clickhouse://<ip>:<port>/<dbname>
<ip>: The VPC Address or Public Address on the ClickHouse Cluster Information page. For example, cc-bp1xxx..clickhouse.ads.aliyuncs.com.
<port>: If the Authentication Option is No Authentication, use the VPC HTTP Port Number (8123) from the ClickHouse Cluster Information page. If the Authentication Option is SSL Authentication, use the VPC HTTPS Port Number (8443) from the ClickHouse Cluster Information page.
<dbname>: The ClickHouse database that you use. The default value is default. You can create a new database as needed.
Username and password
The account and password for your ApsaraDB for ClickHouse cluster.
Authentication Method
Select the authentication method to access the ApsaraDB for ClickHouse cluster.
No Authentication: No other operations are required.
SSL Authentication: If you select this method, you must download the CA certificate from the Cluster Information page of the ApsaraDB for ClickHouse cluster for later verification.
Note: If the SSL authentication service is enabled for the ClickHouse compute engine, you cannot use it for data development or periodic scheduling tasks.
SSL CA Certificate
If you set Authentication Option to SSL Authentication, click Add Authentication File below and upload the CA certificate that you downloaded from the Cluster Information page of the ApsaraDB for ClickHouse cluster.
Computing Resource Instance Name
Enter a custom name for the computing resource instance.
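The JDBC URL format in the table above can be assembled programmatically to avoid typing mistakes. The following is a minimal sketch, assuming the ports shown in the table (8123 without authentication, 8443 with SSL authentication). The function name build_jdbc_url, the auth keys "none" and "ssl", and the sample host are hypothetical. Always confirm the actual ports on the Cluster Information page.

```python
# Ports as listed in the parameter table; verify them on the
# Cluster Information page of your own cluster.
PORT_BY_AUTH = {"none": 8123, "ssl": 8443}


def build_jdbc_url(host: str, auth: str = "none", dbname: str = "default") -> str:
    """Build a URL of the form jdbc:clickhouse://<ip>:<port>/<dbname>."""
    host = host.strip()
    dbname = dbname.strip()
    if auth not in PORT_BY_AUTH:
        raise ValueError(f"auth must be one of {sorted(PORT_BY_AUTH)}")
    if not host or any(ch.isspace() for ch in host):
        raise ValueError("host must be non-empty and contain no whitespace")
    if not dbname or any(ch.isspace() for ch in dbname):
        raise ValueError("dbname must be non-empty and contain no whitespace")
    return f"jdbc:clickhouse://{host}:{PORT_BY_AUTH[auth]}/{dbname}"


# Hypothetical host name, used only for illustration.
print(build_jdbc_url("cc-example.clickhouse.ads.aliyuncs.com"))
print(build_jdbc_url("cc-example.clickhouse.ads.aliyuncs.com", auth="ssl"))
```

Stripping the inputs and rejecting embedded whitespace keeps stray spaces out of the resulting URL.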
Test the connectivity.
In the connection configuration section, select the resource group for running ClickHouse node tasks. Click Test Network Connectivity to verify that the resource group can connect to your ApsaraDB for ClickHouse cluster. For more information, see Overview of network connection solutions.
Click OK to complete the configuration.
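Test Network Connectivity runs from the selected resource group, so it is the authoritative check. If it fails and you have a machine in the same VPC, a plain TCP probe can help separate network problems from credential problems. The following sketch is an assumption-laden illustration: the function name can_reach is hypothetical, and a successful result only shows that the port accepts connections from the machine where the script runs, not from the DataWorks resource group.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


# Example call (hypothetical host; 8123 is the HTTP port from the table):
# can_reach("cc-example.clickhouse.ads.aliyuncs.com", 8123)
```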
Associate a ClickHouse computing resource in the legacy DataStudio
Associate a ClickHouse computing resource with a workspace that does not use Data Studio (New Version).
Go to the computing resources page
Go to the DataStudio page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to Data Development.
In the left navigation pane, click the icon to open the Computing Resources page.
Associate the ClickHouse computing resource
On the Computing Resources page, configure and associate the ClickHouse computing resource.
Select the computing resource type.
Click Create Computing Resource to go to the Create Computing Resource page.
On the Create Computing Resource page, set the computing resource type to ClickHouse. The Create Computing Resource configuration page opens.
Configure the ClickHouse computing resource.
On the Create Computing Resource configuration page, set the parameters as described in the following table.
Parameter
Description
Data Source Name
Enter a custom name for the computing resource.
Configuration Mode
Only Connection String Mode is supported.
Host Address/IP Address
The VPC Address or Public Address on the Cluster Information page of the ApsaraDB for ClickHouse cluster. For example, cc-bp1xxx..clickhouse.ads.aliyuncs.com.
Port
If you set Authentication Option to No Authentication, use the VPC HTTP Port (8123) from the Cluster Information page of the ApsaraDB for ClickHouse cluster. If you set Authentication Option to SSL Authentication, use the VPC HTTPS Port (8443) from the Cluster Information page of the ApsaraDB for ClickHouse cluster.
Database Name
The ClickHouse database that you use. The default value is default. You can also create a new database as needed.
Username and password
The account and password for your ApsaraDB for ClickHouse cluster.
Version
Specify the version of the cluster to associate.
Advanced Parameters
This parameter is optional. You can click Add Property to configure properties.
Authentication Method
Select the authentication method to access the ApsaraDB for ClickHouse cluster.
No Authentication: No other operations are required.
SSL Authentication: If you select this method, you must download the CA certificate from the Cluster Information page of the ApsaraDB for ClickHouse cluster for later verification.
Note: If the SSL authentication service is enabled for the ClickHouse compute engine, you cannot use it for data development or periodic scheduling tasks.
SSL CA Certificate
If you set Authentication Option to SSL Authentication, click Add Authentication File below and upload the CA certificate that you downloaded from the Cluster Information page of the ApsaraDB for ClickHouse cluster.
Test the connectivity.
In the connection configuration section, select the resource group for running ClickHouse tasks. Click Test Network Connectivity to verify that the resource group can connect to your ApsaraDB for ClickHouse cluster. For more information, see Overview of network connection solutions.
Click Create and Associate Computing Resource with DataStudio to complete the configuration.
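The legacy form asks for the host, port, and database name as separate fields, whereas a JDBC URL carries all three in one string. If you already have a URL of the form jdbc:clickhouse://<ip>:<port>/<dbname>, the following sketch shows one way to split it into those fields with the standard library. The function name split_jdbc_url and the sample URL are hypothetical.

```python
from urllib.parse import urlsplit


def split_jdbc_url(jdbc_url: str) -> dict:
    """Split jdbc:clickhouse://<ip>:<port>/<dbname> into form fields."""
    prefix = "jdbc:"
    if not jdbc_url.startswith(prefix + "clickhouse://"):
        raise ValueError("expected jdbc:clickhouse://<ip>:<port>/<dbname>")
    # Drop the "jdbc:" prefix so urlsplit sees an ordinary scheme://host URL.
    parts = urlsplit(jdbc_url[len(prefix):])
    return {
        "host": parts.hostname,
        "port": parts.port,
        # Fall back to the default database when the path is empty.
        "database": parts.path.lstrip("/") or "default",
    }


# Hypothetical URL, used only for illustration.
fields = split_jdbc_url("jdbc:clickhouse://cc-example.clickhouse.ads.aliyuncs.com:8443/sales")
print(fields)
```

Note that urlsplit lowercases the host name and raises ValueError if the port is not numeric, which is usually the desired behavior for this form.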
What to do next
New Data Studio: After you associate the ClickHouse computing resource, you can use a batch synchronization node for data synchronization or a ClickHouse SQL node for data development.
Legacy DataStudio: After you associate the ClickHouse computing resource, you can use a node for data synchronization.
FAQ
Error message: "not support data sync channel, error code: 0001."
Solution: Verify that the JDBC URL parameter does not contain spaces or extra characters.
Error message: "ru.yandex.clickhouse.except.ClickHouseUnknownException: ClickHouse exception, code: 1002."
Solution: Verify that the IP address is correct.
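For the first error above, stray spaces in the JDBC URL are easy to miss by eye. The following sketch flags them before you paste the URL into the console. It assumes the strict format jdbc:clickhouse://<ip>:<port>/<dbname> from the parameter table, so it would also flag URLs that carry extra query parameters. The function name check_jdbc_url and the character set allowed by the pattern are assumptions, not a rule published by DataWorks.

```python
import re

# Strict pattern for jdbc:clickhouse://<ip>:<port>/<dbname>.
# The allowed character sets are an assumption for illustration.
JDBC_PATTERN = re.compile(r"jdbc:clickhouse://[A-Za-z0-9.\-]+:\d{1,5}/[A-Za-z0-9_]+")


def check_jdbc_url(url: str) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if url != url.strip():
        problems.append("leading or trailing whitespace")
    if any(ch.isspace() for ch in url.strip()):
        problems.append("whitespace inside the URL")
    # Only run the format check when no whitespace problem was found,
    # so each report points at one cause.
    if not problems and not JDBC_PATTERN.fullmatch(url):
        problems.append("does not match jdbc:clickhouse://<ip>:<port>/<dbname>")
    return problems


# Hypothetical URL with a trailing space.
print(check_jdbc_url("jdbc:clickhouse://cc-example.clickhouse.ads.aliyuncs.com:8123/default "))
```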