By adding a ClickHouse data source, you can enable Dataphin to read business data from ClickHouse or write data to ClickHouse. This topic describes how to add a ClickHouse data source.
Background information
If you use a ClickHouse database and need to import business data from ClickHouse into Dataphin or export Dataphin data to ClickHouse, you must first create a ClickHouse data source.
Permission requirements
Only custom global roles with the Create Data Source permission and system roles such as Super Administrator, Data Source Administrator, Domain Architect, and Project Administrator can create data sources.
Procedure
On the Dataphin homepage, choose Management Center > Datasource Management from the top navigation bar.
On the Datasource page, click +Create Data Source.
On the Create Data Source page, select ClickHouse in the Relational Database section.
If you have recently used ClickHouse, you can select it in the Recently Used section. You can also enter keywords in the search box to find ClickHouse quickly.
On the Create ClickHouse Data Source page, configure the connection parameters.
Configure the basic information of the data source.
Parameter
Description
Datasource Name
Enter a name for the data source. The name must meet the following requirements:
It can contain only Chinese characters, letters, digits, underscores (_), or hyphens (-).
It cannot exceed 64 characters in length.
Datasource Code
After you configure the data source code, you can access Dataphin data source tables in Flink_SQL tasks or with the Dataphin JDBC client by using the format data source code.table name or data source code.schema.table name for quick consumption. To switch data sources automatically based on the task execution environment, use the variable format ${data source code}.table or ${data source code}.schema.table. For more information, see Dataphin data source table development method.
Important: The data source code cannot be modified after it is configured successfully.
After the data source code is configured successfully, you can preview data on the object details page in the asset directory and asset inventory.
In Flink SQL, only MySQL, Hologres, MaxCompute, Oracle, StarRocks, Hive, and SelectDB data sources are currently supported.
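The reference formats described for the data source code can be illustrated with a small helper. This is only a sketch of the string formats; the function name `table_reference` and the sample values `ck_orders`, `default`, and `events` are hypothetical and are not part of Dataphin.

```python
from typing import Optional


def table_reference(code: str, table: str, schema: Optional[str] = None,
                    as_variable: bool = False) -> str:
    """Build a table reference from a data source code.

    With as_variable=True the code is wrapped as ${code}, the variable
    format used to switch data sources by execution environment.
    """
    prefix = "${" + code + "}" if as_variable else code
    parts = [prefix, schema, table] if schema else [prefix, table]
    return ".".join(parts)


# Hypothetical data source code "ck_orders" and table "events".
print(table_reference("ck_orders", "events"))                    # ck_orders.events
print(table_reference("ck_orders", "events", schema="default"))  # ck_orders.default.events
print(table_reference("ck_orders", "events", as_variable=True))  # ${ck_orders}.events
```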
Data Source Description
A brief description of the data source. It cannot exceed 128 characters.
Data Source Configuration
Select the data source that you want to configure:
If your business data source distinguishes between production and development data sources, select Production + Development Data Source.
If your business data source does not distinguish between production and development data sources, select Production Data Source.
Tag
You can categorize and tag data sources based on labels. For information on how to create tags, see Manage data source tags.
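The naming limits listed for Datasource Name and Data Source Description can be checked before you fill in the form. The following is a minimal sketch, not Dataphin's own validation: it assumes that "Chinese characters" means the CJK Unified Ideographs range U+4E00 to U+9FFF, and the function names are hypothetical.

```python
import re

# Assumption: "Chinese characters" is approximated by the CJK Unified
# Ideographs block (U+4E00 to U+9FFF); the check Dataphin applies may differ.
_NAME_PATTERN = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_-]+")
MAX_NAME_LENGTH = 64
MAX_DESCRIPTION_LENGTH = 128


def is_valid_datasource_name(name: str) -> bool:
    """Return True if the name is 1 to 64 characters long and uses only allowed characters."""
    return 0 < len(name) <= MAX_NAME_LENGTH and _NAME_PATTERN.fullmatch(name) is not None


def is_valid_description(description: str) -> bool:
    """Return True if the description is at most 128 characters long."""
    return len(description) <= MAX_DESCRIPTION_LENGTH
```

For example, `is_valid_datasource_name("clickhouse_prod-01")` passes, while a name that contains a space or is longer than 64 characters does not.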
Configure the connection parameters between the data source and Dataphin.
If you select Production + Development Data Source, configure the connection information for both the production and the development data source. If you select Production Data Source, configure the connection information only for the production data source.
Note: Typically, production and development data sources should be configured as separate data sources to achieve environment isolation between development and production, reducing the impact of development activities on production. However, Dataphin also supports configuring them as the same data source with identical parameter values.
For Configuration Method, you can choose either JDBC URL or Host. The default selection is JDBC URL.
JDBC URL configuration method
Parameter
Description
JDBC URL
The connection address of ClickHouse. The JDBC URL format is jdbc:clickhouse://host:port/dbname.
If you are using ApsaraDB for ClickHouse, you can view the connection address and port information on the Cluster Information page in the ClickHouse console. After connecting to the cluster, you can execute the SHOW DATABASES command in SQL Console to view the database name.
Username, Password
The username and password used to access the ClickHouse instance.
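To check that a URL follows the jdbc:clickhouse://host:port/dbname format before you paste it into the form, a small parser can help. This is a sketch under stated assumptions: it handles a single host only, tolerates trailing query parameters, and the names `parse_jdbc_url` and `ClickHouseUrl` are hypothetical. The address 192.0.2.10 is a documentation-only example, and 8123 is the default ClickHouse HTTP port.

```python
import re
from typing import NamedTuple

# Matches the documented single-host form; anything after the database name
# (for example ?key=value query parameters) is ignored by this sketch.
_JDBC_URL = re.compile(
    r"jdbc:clickhouse://(?P<host>[^:/?#,]+):(?P<port>\d+)/(?P<dbname>[^/?#]+)"
)


class ClickHouseUrl(NamedTuple):
    host: str
    port: int
    dbname: str


def parse_jdbc_url(url: str) -> ClickHouseUrl:
    """Split a ClickHouse JDBC URL into host, port, and database name."""
    match = _JDBC_URL.match(url)
    if match is None:
        raise ValueError(f"not in jdbc:clickhouse://host:port/dbname format: {url!r}")
    return ClickHouseUrl(match["host"], int(match["port"]), match["dbname"])


print(parse_jdbc_url("jdbc:clickhouse://192.0.2.10:8123/default"))
```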
Host configuration method
Parameter
Description
Server Address
Enter the IP address and port number of the server.
You can click +Add to add multiple sets of IP addresses and port numbers, and click the delete icon to remove extra sets. At least one set must be retained.
dbname
Enter the database name.
Parameter configuration
Parameter
Description
Parameter
Parameter name: Supports selecting existing parameter names or entering custom parameter names.
Custom parameter names only support uppercase and lowercase English letters, numbers, periods (.), underscores (_), and hyphens (-).
Parameter value: Required when a parameter name is selected. The value can contain only uppercase and lowercase English letters, numbers, periods (.), underscores (_), and hyphens (-), and cannot exceed 256 characters.
Note: You can add multiple parameters by clicking +Add Parameter, and remove extra parameters by clicking the delete icon. You can add up to 30 parameters.
Username, Password
The username and password used to access the ClickHouse instance.
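The limits on custom parameters (allowed characters, a 256-character maximum for values, and at most 30 parameters) can be checked with a short script before you enter them. This is a sketch only; `validate_parameters` is a hypothetical helper, not a Dataphin API, and it applies the custom-name character rule to every name.

```python
import re
from typing import Dict, List

# Letters, numbers, periods, underscores, and hyphens only.
_ALLOWED = re.compile(r"[A-Za-z0-9._-]+")
MAX_VALUE_LENGTH = 256
MAX_PARAMETERS = 30


def validate_parameters(params: Dict[str, str]) -> List[str]:
    """Return a list of problems found; an empty list means all checks passed."""
    errors = []
    if len(params) > MAX_PARAMETERS:
        errors.append(f"too many parameters: {len(params)} (maximum {MAX_PARAMETERS})")
    for name, value in params.items():
        if not _ALLOWED.fullmatch(name):
            errors.append(f"invalid parameter name: {name!r}")
        if not value:
            errors.append(f"missing value for {name!r}")
        elif len(value) > MAX_VALUE_LENGTH or not _ALLOWED.fullmatch(value):
            errors.append(f"invalid value for {name!r}")
    return errors
```

Calling `validate_parameters({"socket_timeout": "60000"})` returns an empty list, while a name with a space or an empty value produces one message each.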
Note: If you create the data source with the Host configuration method and later switch to the JDBC URL configuration method, the system concatenates the server IP address and port number into a JDBC URL and fills it in automatically.
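The concatenation described in this note can be pictured as follows. This is an illustrative sketch, not Dataphin's implementation: the function name `hosts_to_jdbc_url` is hypothetical, and joining multiple servers with commas is an assumption about how several addresses might be combined.

```python
from typing import List, Tuple


def hosts_to_jdbc_url(servers: List[Tuple[str, int]], dbname: str) -> str:
    """Combine (IP address, port) pairs and a database name into a JDBC URL."""
    if not servers:
        # The Host configuration method requires at least one server address.
        raise ValueError("at least one server address is required")
    # Assumption: multiple servers are joined with commas.
    endpoints = ",".join(f"{host}:{port}" for host, port in servers)
    return f"jdbc:clickhouse://{endpoints}/{dbname}"


print(hosts_to_jdbc_url([("192.0.2.10", 8123)], "default"))
# jdbc:clickhouse://192.0.2.10:8123/default
```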
Configure advanced settings for the data source.
Parameter
Description
connection_timeout
The connection timeout of the database, in milliseconds. The default is 900000 milliseconds (15 minutes).
Note: If connection_timeout is also configured in the JDBC URL, the value in the JDBC URL is used.
For data sources created before Dataphin V3.11, the default connection_timeout is -1, which indicates no timeout limit.
socket_timeout
The socket timeout of the database, in milliseconds. The default is 1800000 milliseconds (30 minutes).
Note: If socket_timeout is also configured in the JDBC URL, the value in the JDBC URL is used.
For data sources created before Dataphin V3.11, the default socket_timeout is -1, which indicates no timeout limit.
Connection Retries
If the database connection times out, it will automatically retry connecting until the set number of retries is completed. If the connection is still unsuccessful after reaching the maximum number of retries, the connection fails.
Note: The default number of retries is 1. You can configure a value from 0 to 10.
The connection retry count will be applied by default to offline integration tasks and global quality (requires enabling the asset quality function module). In offline integration tasks, you can configure task-level retry counts separately.
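The retry behavior described for Connection Retries can be sketched as a simple loop. This is an illustration under assumptions, not Dataphin's code: it assumes one initial attempt followed by up to the configured number of retries, and `connect_with_retries` and the `connect` callable are hypothetical names.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retries(connect: Callable[[], T], retries: int = 1) -> T:
    """Call connect(); on a timeout, retry up to `retries` more times."""
    if not 0 <= retries <= 10:
        raise ValueError("retries must be between 0 and 10")
    last_error = None
    # Assumption: one initial attempt plus `retries` retries.
    for _ in range(retries + 1):
        try:
            return connect()
        except TimeoutError as error:
            last_error = error
    # All attempts timed out, so the connection is reported as failed.
    raise ConnectionError(f"connection failed after {retries} retries") from last_error
```

With the default of 1, a connection that times out once and then succeeds is returned normally. With 0, the first timeout fails immediately.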
Note: Rules for duplicate parameter values:
If a parameter exists simultaneously in JDBC URL, Advanced Settings parameters, and Host Configuration method's parameter configuration, the value in the JDBC URL takes precedence.
If a parameter exists simultaneously in JDBC URL and Advanced Settings parameters, the value in the JDBC URL takes precedence.
If a parameter exists simultaneously in Advanced Settings parameters and Host Configuration method's parameter configuration, the value in the Advanced Settings parameter configuration takes precedence.
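The three precedence rules above amount to a layered merge in which later layers override earlier ones. The sketch below shows that ordering with plain dictionaries; `effective_parameters` is a hypothetical helper, and the sample values other than the documented defaults are made up for illustration.

```python
from typing import Dict


def effective_parameters(jdbc_url_params: Dict[str, str],
                         advanced_settings: Dict[str, str],
                         host_params: Dict[str, str]) -> Dict[str, str]:
    """Resolve duplicates: JDBC URL > Advanced Settings > Host parameter configuration."""
    merged: Dict[str, str] = {}
    merged.update(host_params)        # lowest precedence
    merged.update(advanced_settings)  # overrides Host parameter configuration
    merged.update(jdbc_url_params)    # highest precedence
    return merged


resolved = effective_parameters(
    jdbc_url_params={"connection_timeout": "5000"},
    advanced_settings={"connection_timeout": "900000", "socket_timeout": "1800000"},
    host_params={"socket_timeout": "1000", "compress": "1"},
)
print(resolved)
# {'socket_timeout': '1800000', 'compress': '1', 'connection_timeout': '5000'}
```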
Select a Default Resource Group, which is used to run tasks related to the current data source, including database SQL, offline database migration, data preview, and more.
Click Test Connection, or click OK to save the configuration and finish creating the ClickHouse data source.
When you click Test Connection, the system tests whether the data source can connect to Dataphin. If you click OK directly, the system automatically tests the connection for all selected clusters. The data source is still created even if all selected clusters fail to connect.