To use DataWorks to develop and manage tasks for a ClickHouse cluster, you must first add the cluster as a ClickHouse computing resource. Once added, you can use the computing resource to connect to the ClickHouse cluster from various DataWorks modules for operations such as data synchronization and data development.
Prerequisites
A ClickHouse cluster has been created.
Note: We recommend that you create the ClickHouse cluster in the same region as the DataWorks workspace.
If the cluster and the workspace are in different regions, the cluster can be used as a cross-region data source only for data synchronization tasks. You cannot use it to run computing tasks from Data Studio or Operation Center.
A DataWorks workspace has been created. The RAM user who performs the operation is a member of the workspace and has the Workspace Administrator role.
Ensure that a resource group is associated with the workspace and that the resource group has network connectivity to the ClickHouse cluster.
If you use a Serverless resource group, ensure that the resource group can connect to the ClickHouse computing resource.
If you use a legacy exclusive resource group, ensure that the exclusive resource groups your scenario requires can connect to the ClickHouse computing resource. Depending on the scenario, this can be the exclusive resource group for integration, the exclusive resource group for scheduling, or the exclusive resource group for services.
By default, ApsaraDB for ClickHouse clusters deny access from all IP addresses. Before you associate the computing resource, you must add the IP addresses or CIDR blocks of the DataWorks resource group to the whitelist of the ClickHouse cluster. Otherwise, the association fails. Depending on your resource group type, add the vSwitch CIDR block associated with your resource group, the EIP of your legacy resource group, or the EIP of the VPC that is associated with your Serverless resource group.
Note: To learn how to obtain the vSwitch CIDR block, the EIP of the legacy resource group, or the EIP of the VPC that is associated with the Serverless resource group, see Add the IP addresses of DataWorks to a whitelist.
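Conceptually, the whitelist check compares the address that the resource group connects from against each whitelist entry. The following Python sketch uses only the standard ipaddress module to show how you could confirm locally that an address is covered by the entries you added. The function name and all addresses are hypothetical placeholders. The real check is enforced by the ClickHouse cluster, not by this code.

```python
import ipaddress

def is_whitelisted(source_ip: str, whitelist: list[str]) -> bool:
    """Return True if source_ip falls inside any whitelist entry (illustrative helper).

    Entries may be single addresses ("47.100.1.2") or CIDR blocks ("172.16.0.0/24");
    a bare address is treated as a single-host network.
    """
    addr = ipaddress.ip_address(source_ip)
    for entry in whitelist:
        # strict=False tolerates entries such as "10.0.1.5/24" whose host bits are set.
        network = ipaddress.ip_network(entry, strict=False)
        if addr in network:  # an IPv4/IPv6 mismatch simply evaluates to False
            return True
    return False

# Hypothetical whitelist: a vSwitch CIDR block and a single EIP.
whitelist = ["172.16.0.0/24", "47.100.1.2"]
print(is_whitelisted("172.16.0.25", whitelist))  # True
print(is_whitelisted("172.16.1.25", whitelist))  # False
```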
Limitations
Feature limitation: If SSL authentication is enabled for the ClickHouse compute engine, you cannot use it for data development or periodic scheduling tasks.
Region limitation: This feature is available only in the following regions: China (Hangzhou), China (Shanghai), China (Beijing), China (Shenzhen), China (Chengdu), China (Hong Kong), Japan (Tokyo), Singapore, Malaysia (Kuala Lumpur), and Indonesia (Jakarta).
Permission requirements:
Alibaba Cloud account: No additional permissions are required.
RAM user or RAM role: Only workspace members with the O&M or Workspace Administrator role, or users with the AliyunDataWorksFullAccess permission, can create computing resources. For more information, see Grant a user the Workspace Administrator permissions.
New Data Studio: Associate a ClickHouse resource
This procedure applies to workspaces that use Data Studio (new version).
Go to the computing resources page
Log on to the DataWorks console. In the top navigation bar, select a region. In the left-side navigation pane, choose . Select a workspace from the drop-down list and click Go to Management Center.
In the left-side navigation pane, click Computing Resources to go to the computing resources page.
Associate the ClickHouse computing resource
On the computing resources page, configure and associate the ClickHouse computing resource.
Select the computing resource type.
Click Associate Computing Resources to open the Associate Computing Resources page.
On the Associate Computing Resources page, select ClickHouse as the resource type to open the Associate ClickHouse Computing Resource configuration page.
Configure the ClickHouse computing resource.
On the Associate ClickHouse Computing Resource page, configure the parameters as described in the following table.
Configuration Mode: Only Connection String Mode is supported.
JDBC URL: The connection string, in the format jdbc:clickhouse://<ip>:<port>/<dbname>, where:
  <ip>: The VPC Endpoint or Public Endpoint on the Cluster Information page of your ClickHouse cluster. For example, cc-bp1xxx.clickhouse.ads.aliyuncs.com.
  <port>: If Authentication Method is set to No Authentication, use the VPC HTTP Port (8123) from the Cluster Information page of your ClickHouse cluster. If Authentication Method is set to SSL Authentication, use the VPC HTTPS Port (8443) from the same page.
  <dbname>: The name of the ClickHouse database. The default value is default. You can create a new database as needed.
Username and password: The account and password for your ClickHouse cluster.
Authentication Method: The method used to authenticate connections to the ClickHouse cluster.
  No Authentication: No additional configuration is required.
  SSL Authentication: You must download the CA certificate from the Cluster Information page of your ClickHouse cluster and upload it for verification.
  Note: If SSL authentication is enabled for the ClickHouse compute engine, you cannot use it for data development or periodic scheduling tasks.
SSL CA Certificate: Required only if Authentication Method is set to SSL Authentication. Click Add authentication document and upload the CA certificate that you downloaded from the Cluster Information page of your ClickHouse cluster.
Computing Resource Instance Name: A custom name for the computing resource instance.
Test the network connectivity.
In the connection settings area, select the resource group that you want to use to run ClickHouse nodes and click Test Connectivity to verify that the resource group can access your ClickHouse cluster. For more information, see Network connectivity solutions.
Click Confirm to complete the configuration.
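Building the JDBC URL in code, instead of typing it by hand, helps avoid stray spaces and mismatched ports. The following Python sketch assembles the jdbc:clickhouse://<ip>:<port>/<dbname> format from the parameter table and picks the port from the authentication method. It assumes the default VPC HTTP port (8123) and VPC HTTPS port (8443). Confirm the actual values on your cluster's Cluster Information page. The function name and endpoint are hypothetical.

```python
# Assumed default VPC ports per authentication method; verify them on the
# Cluster Information page of your cluster.
DEFAULT_PORTS = {"none": 8123, "ssl": 8443}

def build_jdbc_url(endpoint: str, auth_method: str = "none", dbname: str = "default") -> str:
    """Assemble jdbc:clickhouse://<ip>:<port>/<dbname> (illustrative helper)."""
    if auth_method not in DEFAULT_PORTS:
        raise ValueError(f"unknown authentication method: {auth_method!r}")
    # Strip surrounding whitespace, a common cause of connection string errors.
    endpoint = endpoint.strip()
    dbname = dbname.strip()
    if not endpoint or not dbname:
        raise ValueError("endpoint and dbname must be non-empty")
    port = DEFAULT_PORTS[auth_method]
    return f"jdbc:clickhouse://{endpoint}:{port}/{dbname}"

print(build_jdbc_url("cc-bp1xxx.clickhouse.ads.aliyuncs.com"))
# jdbc:clickhouse://cc-bp1xxx.clickhouse.ads.aliyuncs.com:8123/default
print(build_jdbc_url("cc-bp1xxx.clickhouse.ads.aliyuncs.com", auth_method="ssl"))
# jdbc:clickhouse://cc-bp1xxx.clickhouse.ads.aliyuncs.com:8443/default
```

Paste the resulting string into the JDBC URL field. The helper only strips leading and trailing whitespace, so it does not check spaces inside the endpoint.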
Legacy Data Studio: Associate a ClickHouse resource
This procedure applies to workspaces that use the legacy version of Data Studio.
Go to the computing resources page
Go to the DataStudio page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to Data Development.
In the left-side navigation pane, click the icon to open the Computing Resources page.
Associate the ClickHouse computing resource
On the computing resources page, configure and associate the ClickHouse computing resource.
Select the computing resource type.
Click Create Computing Resource to go to the Create Computing Resource page.
On the Create Computing Resource page, select ClickHouse as the computing resource type to open the Create Computing Resource configuration page.
Configure the ClickHouse computing resource.
On the Create Computing Resource page, configure the following parameters:
Data Source Name: A custom name for the computing resource.
Configuration Mode: Only Connection String Mode is supported.
Host Address/IP: The VPC Endpoint or Public Endpoint on the Cluster Information page of your ClickHouse cluster. For example, cc-bp1xxx.clickhouse.ads.aliyuncs.com.
Port: If Authentication Method is set to No Authentication, use the VPC HTTP Port Number (8123) from the Cluster Information page of your ClickHouse cluster. If Authentication Method is set to SSL Authentication, use the VPC HTTPS Port Number (8443) from the same page.
Database Name: The name of the ClickHouse database. The default value is default. You can create a new database as needed.
Username and password: The account and password for your ClickHouse cluster.
Version: The version of the cluster that you want to associate.
Advanced Parameters: Optional. Click Add Property to configure additional properties.
Authentication Method: The method used to authenticate connections to the ClickHouse cluster.
  No Authentication: No additional configuration is required.
  SSL Authentication: You must download the CA certificate from the Cluster Information page of your ClickHouse cluster and upload it for verification.
  Note: If SSL authentication is enabled for the ClickHouse compute engine, you cannot use it for data development or periodic scheduling tasks.
SSL CA Certificate: Required only if Authentication Method is set to SSL Authentication. Click Add authentication document and upload the CA certificate that you downloaded from the Cluster Information page of your ClickHouse cluster.
Test the network connectivity.
In the connection settings area, select the resource group that you want to use to run ClickHouse tasks and click Test Connectivity to verify that the resource group can access your ClickHouse cluster. For more information, see Network connectivity solutions.
Click Create and Associate Computing Resource with DataStudio to complete the configuration.
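Test Connectivity runs from the resource group that you select, in both the new and the legacy Data Studio. If the test fails, a rough first check is to see whether a TCP connection to the endpoint and port can be opened from a machine in the same network as the resource group. The following Python sketch uses only the standard socket module. It checks TCP reachability only. It does not verify credentials, the database name, or SSL, and it is not how DataWorks itself performs the test. The function name is hypothetical.

```python
import socket

def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds (illustrative helper)."""
    try:
        # The timeout bounds the connection attempt; name resolution may take longer.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts, and resolution failures (socket.gaierror)
        # are all OSError subclasses.
        return False
    except UnicodeError:
        # Some malformed host names fail while being encoded for lookup.
        return False

# Example with a hypothetical endpoint and the VPC HTTP port used with No Authentication:
# port_is_reachable("cc-bp1xxx.clickhouse.ads.aliyuncs.com", 8123)
```

A False result usually points to a whitelist or VPC routing problem rather than to a wrong username or password.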
Next steps
Data Studio (new version): After you configure the ClickHouse computing resource, you can use a batch synchronization node for data synchronization or a ClickHouse SQL node for data development in Data Studio.
Legacy Data Studio: After you configure the ClickHouse computing resource, you can use a node for data synchronization.
FAQ
Error message: not support data sync channel, error code: 0001.
Solution: Check the JDBC URL for spaces or extra characters.
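Because this error is often caused by stray whitespace or extra characters, it can help to check the JDBC URL locally before you paste it. The following Python sketch is not an official DataWorks validator. It uses a regular expression written for this example that accepts only jdbc:clickhouse://<ip>:<port>/<dbname> and assumes a plain database identifier. It rejects whitespace, trailing characters, empty host labels such as "..", and out-of-range ports. It returns the host, port, and database, which also correspond to the Host Address/IP, Port, and Database Name fields in legacy Data Studio. The names used are hypothetical.

```python
import re

# Illustrative pattern for jdbc:clickhouse://<ip>:<port>/<dbname>. It deliberately allows
# no query string, whitespace, or trailing characters, and assumes a plain database name.
JDBC_PATTERN = re.compile(
    r"jdbc:clickhouse://(?P<host>[A-Za-z0-9.-]+):(?P<port>\d{1,5})/(?P<db>[A-Za-z0-9_]+)"
)

def parse_jdbc_url(url: str) -> tuple[str, int, str]:
    """Return (host, port, dbname), or raise ValueError describing the problem."""
    if any(ch.isspace() for ch in url):
        raise ValueError("JDBC URL contains whitespace")
    match = JDBC_PATTERN.fullmatch(url)
    if match is None:
        raise ValueError(f"JDBC URL does not match jdbc:clickhouse://<ip>:<port>/<dbname>: {url!r}")
    host = match["host"]
    # An empty label (for example "..", or a leading or trailing dot) is not a valid host name.
    if any(label == "" for label in host.split(".")):
        raise ValueError(f"host contains an empty label: {host!r}")
    port = int(match["port"])
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return host, port, match["db"]
```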
Error message: ru.yandex.clickhouse.except.ClickHouseUnknownException: ClickHouse exception, code: 1002.
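Solution: Check that the IP address or endpoint is correct. In the new Data Studio, this is the address in the JDBC URL. In legacy Data Studio, this is the Host Address/IP field.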
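If the endpoint is a domain name rather than a literal IP address, it can help to see which IP addresses it resolves to. The following Python sketch uses socket.getaddrinfo from the standard library. Results can differ depending on where you run it, so run it from a host in the same network as the resource group and treat the output as a hint only. The function name is hypothetical.

```python
import socket

def resolve_endpoint(host: str, port: int = 8123) -> list[str]:
    """Return the distinct IP addresses that host resolves to, or [] (illustrative helper)."""
    try:
        infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    except (socket.gaierror, UnicodeError):
        # gaierror: the name could not be resolved; UnicodeError: malformed host name.
        return []
    addresses: list[str] = []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    for *_, sockaddr in infos:
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses

# Example with a hypothetical endpoint:
# print(resolve_endpoint("cc-bp1xxx.clickhouse.ads.aliyuncs.com"))
```

An empty list means the name did not resolve at all. Compare any addresses that do come back with the endpoint shown on the Cluster Information page.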