By creating a Doris data source, you can enable Dataphin to read business data from Doris or write data to Doris. This topic describes how to create a Doris data source.
Background information
Doris, also known as Apache Doris, is a high-performance, real-time analytical database based on a massively parallel processing (MPP) architecture. It returns query results over massive datasets with sub-second response times, and it supports both high-concurrency point queries and high-throughput complex analytics. These capabilities make it suitable for report analysis, ad hoc queries, unified data warehouse construction, and accelerated federated queries over data lakes. For more information, see the Doris official website.
Permissions
Only the following roles can create data sources: custom global roles that have the permission to create data sources, super administrator, data source administrator, domain architect, and project administrator.
Procedure
On the Dataphin homepage, click Management Center > Data Source Management in the top navigation bar.
On the Datasource page, click +Create Data Source.
In the Big Data section of the Create Data Source page, select Doris.
If you have used Doris recently, you can also select it in the Recently Used section, or enter a keyword such as Doris in the search box to find it quickly.
On the Create Doris Data Source page, configure the connection parameters.
Configure the basic information of the data source.
Parameter
Description
Datasource Name
The name must meet the following requirements:
It can contain only Chinese characters, uppercase and lowercase letters, digits, underscores (_), and hyphens (-).
It cannot exceed 64 characters in length.
Datasource Code
After you configure the data source code, you can reference tables in the data source in Flink_SQL tasks in the format data_source_code.table_name or data_source_code.schema.table_name. To automatically access the data source that matches the current environment, use the variable format ${data_source_code}.table or ${data_source_code}.schema.table. For more information, see Dataphin data source table development method.
Important: The data source code cannot be modified after it is configured.
After the data source code is configured successfully, you can preview data on the object details page in the asset directory and asset inventory.
Currently, Flink SQL tasks support only MySQL, Hologres, MaxCompute, Oracle, StarRocks, Hive, and SelectDB data sources.
Data Source Description
A brief description of the Doris data source. It cannot exceed 128 characters.
Data Source Configuration
Based on whether the business data source distinguishes between production and development data sources:
If the business data source distinguishes between production and development data sources, select Production + Development Data Source.
If the business data source does not distinguish between production and development data sources, select Production Data Source.
Tag
You can use tags to categorize data sources. For information about how to create tags, see Manage data source tags.
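The naming and description rules in the table above can be checked before you submit the form. The following Python sketch is not Dataphin's own validator, and the function names are hypothetical. It also makes two assumptions: "Chinese characters" is approximated by the CJK Unified Ideographs block (U+4E00 to U+9FFF), and lengths are counted in Unicode characters.

```python
import re

# Assumption: "Chinese characters" is approximated by the CJK Unified
# Ideographs block (U+4E00 to U+9FFF); Dataphin's exact rule may differ.
NAME_PATTERN = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_-]+")
MAX_NAME_LENGTH = 64
MAX_DESCRIPTION_LENGTH = 128


def is_valid_name(name: str) -> bool:
    """Check a data source name against the documented character and length rules."""
    # Assumption: length is counted in characters (Unicode code points), and
    # an empty name is rejected.
    return (
        0 < len(name) <= MAX_NAME_LENGTH
        and NAME_PATTERN.fullmatch(name) is not None
    )


def is_valid_description(description: str) -> bool:
    """Check that a description does not exceed 128 characters."""
    return len(description) <= MAX_DESCRIPTION_LENGTH


if __name__ == "__main__":
    print(is_valid_name("doris_prod-01"))  # True
    print(is_valid_name("doris prod"))     # False: spaces are not allowed
    print(is_valid_name("x" * 65))         # False: longer than 64 characters
```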
Configure the connection parameters between the data source and Dataphin.
If you select Production + Development Data Source for Data Source Configuration, you need to configure the connection information for both the production and development data sources. If you select Production Data Source, you only need to configure the connection information for the production data source.
Note: Typically, production and development data sources should be configured as separate data sources to achieve environment isolation and reduce the impact of development data sources on production data sources. However, Dataphin also supports configuring them as the same data source with identical parameter values.
Parameter
Description
JDBC URL
Enter the connection address of the data source, in the format jdbc:mysql://host:port/dbname.
Username, Password
The username and password used to log on to the Doris data source.
FE Node URL
Enter the connection address of the FE node, which is used to access the FE node through its web server. The format is <FE IP>:<Http Port>, and the HTTP port is typically 8030 by default. To configure multiple FE nodes, separate them with commas (,).
SSL Encryption
To establish an SSL-encrypted connection, enable SSL Encryption, upload the truststore certificate, and enter the truststore certificate password.
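The following sketch shows how the JDBC URL and FE Node URL formats above can be assembled and parsed. The helper names and example addresses are hypothetical. In the usage, port 9030 is the default Doris FE MySQL-protocol query port and 8030 is the typical FE HTTP port, but your cluster may be configured with different ports.

```python
from typing import List, Tuple


def build_jdbc_url(host: str, port: int, dbname: str) -> str:
    """Assemble a URL in the documented jdbc:mysql://host:port/dbname format."""
    return f"jdbc:mysql://{host}:{port}/{dbname}"


def parse_fe_nodes(value: str) -> List[Tuple[str, int]]:
    """Split a comma-separated '<FE IP>:<Http Port>' list into (ip, port) pairs."""
    nodes = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate stray whitespace or a trailing comma
        ip, sep, port = item.rpartition(":")
        if not sep or not ip:
            raise ValueError(f"expected <FE IP>:<Http Port>, got {item!r}")
        nodes.append((ip, int(port)))  # int() raises ValueError for a non-numeric port
    return nodes


if __name__ == "__main__":
    print(build_jdbc_url("192.168.0.10", 9030, "demo_db"))
    # jdbc:mysql://192.168.0.10:9030/demo_db
    print(parse_fe_nodes("192.168.0.10:8030,192.168.0.11:8030"))
    # [('192.168.0.10', 8030), ('192.168.0.11', 8030)]
```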
Configure advanced settings for the data source.
Parameter
Description
connectTimeout
The database connection timeout, in milliseconds. The default is 900,000 ms (15 minutes).
Note: If the JDBC URL includes a connectTimeout parameter, the value configured in the JDBC URL takes effect.
For data sources created before Dataphin V3.11, the default connectTimeout is -1, which means no timeout limit.
socketTimeout
The database socket timeout, in milliseconds. The default is 1,800,000 ms (30 minutes).
Note: If the JDBC URL includes a socketTimeout parameter, the value configured in the JDBC URL takes effect.
For data sources created before Dataphin V3.11, the default socketTimeout is -1, which means no timeout limit.
Connection Retry Count
If the database connection times out, the system automatically retries the connection until the specified number of retries is reached. If the connection still fails after the last retry, the connection is considered failed.
Note: The default retry count is 1. You can configure a value from 0 to 10.
By default, the connection retry count applies to offline integration tasks and global quality (which requires the asset quality module to be enabled). In offline integration tasks, you can also configure a task-level retry count separately.
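The timeout precedence and retry rules above can be modeled roughly as shown in this sketch. It is hypothetical and is not Dataphin's implementation. It assumes three things: URL parameters appear after ? as key=value pairs separated by &, only timeouts trigger a retry, and a retry count of N means one initial attempt followed by up to N retries. The function names are illustrative.

```python
from typing import Callable, Dict, List, Optional, TypeVar
from urllib.parse import parse_qs

T = TypeVar("T")

# Documented defaults (milliseconds) for data sources created in V3.11 or later.
DEFAULT_TIMEOUTS_MS = {"connectTimeout": 900_000, "socketTimeout": 1_800_000}
LEGACY_DEFAULT_MS = -1  # pre-V3.11 default: no timeout limit


def effective_timeout_ms(jdbc_url: str, name: str, created_before_v3_11: bool = False) -> int:
    """Return the timeout that takes effect: a value in the JDBC URL wins over the default."""
    _, _, query = jdbc_url.partition("?")  # assumption: parameters follow '?'
    params: Dict[str, List[str]] = parse_qs(query)
    if name in params:
        return int(params[name][0])
    return LEGACY_DEFAULT_MS if created_before_v3_11 else DEFAULT_TIMEOUTS_MS[name]


def connect_with_retry(connect: Callable[[], T], retries: int = 1) -> T:
    """Call connect(); if it raises TimeoutError, retry up to `retries` more times."""
    if not 0 <= retries <= 10:
        raise ValueError("retry count must be between 0 and 10")
    last_error: Optional[Exception] = None
    for _ in range(retries + 1):  # assumption: initial attempt plus `retries` retries
        try:
            return connect()
        except TimeoutError as exc:  # other errors propagate immediately
            last_error = exc
    raise ConnectionError(f"connection failed after {retries} retries") from last_error


if __name__ == "__main__":
    url = "jdbc:mysql://192.168.0.10:9030/demo_db?connectTimeout=5000"
    print(effective_timeout_ms(url, "connectTimeout"))  # 5000
    print(effective_timeout_ms(url, "socketTimeout"))   # 1800000
```

Passing the connection attempt as a callable keeps the retry policy independent of any particular database driver.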
Select a Default Resource Group, which is used to run tasks related to the current data source, including database SQL, offline database migration, data preview, and more.
Click Test Connection, or click OK to save the configuration and create the Doris data source.
If you click Test Connection, the system tests whether Dataphin can connect to the data source. If you click OK directly, the system automatically tests the connection for all selected clusters. The data source is still created even if the connection test fails for every selected cluster.
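If Test Connection fails, a quick network-level check from a machine in the same network as the resource group can help narrow down the cause. The sketch below only verifies that a TCP connection can be opened, for example to the JDBC port and to each FE HTTP port. It does not authenticate or validate SSL, and it is not how Dataphin performs its own test. The function name is hypothetical.

```python
import socket


def is_reachable(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s seconds."""
    try:
        # create_connection resolves the host and opens a TCP socket; the
        # context manager closes it immediately after a successful connect.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # covers refused connections, DNS failures, and timeouts
        return False
```

For example, is_reachable("192.168.0.10", 9030) checks a hypothetical JDBC endpoint. Calling the function for each (ip, port) pair listed in FE Node URL checks the FE HTTP endpoints.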