Create a MySQL data source to enable Dataphin to read data from or write data to MySQL. This topic describes how to create a MySQL data source.
Background information
MySQL is a leading open-source relational database management system that uses Structured Query Language (SQL) to store, retrieve, and manage data. It is often used for websites, applications, and commercial products. It is commonly used as a primary relational data storage system.
Permissions
Only custom global roles with the Create Data Source permission and the super administrator, data source administrator, domain architect, and project administrator roles can create data sources.
Procedure
On the Dataphin home page, choose Management Center > Datasource Management from the top menu bar.
On the Datasource page, click +Create Data Source.
On the Create Data Source page, select MySQL in the Relational Database section.
If you have recently used MySQL, you can also select it from the Recently Used section. Alternatively, you can enter MySQL in the search box to find it.
On the Create MySQL Data Source page, configure the connection parameters.
Configure the basic information for the data source.
Parameter
Description
Datasource Name
Enter a name for the data source. The naming convention is as follows:
The name can contain Chinese characters, letters, digits, underscores (_), and hyphens (-).
The name can be up to 64 characters long.
Datasource Code
After you configure the data source code, you can directly access Dataphin data source tables in Flink SQL tasks or using the Dataphin JDBC client. Use the format
datasourcecode.tablenameordatasourcecode.schema.tablenamefor quick access. To automatically switch data sources based on the task execution environment, use the variable format${datasourcecode}.tableor${datasourcecode}.schema.table. For more information, see Develop Dataphin data source tables.ImportantThe data source code cannot be modified after it is configured.
You can preview data on the object details page in the asset directory and asset inventory only after the data source code is configured.
In Flink SQL, only MySQL, Hologres, MaxCompute, Oracle, StarRocks, Hive, and SelectDB data sources are supported.
Version
You can select only MySQL 5.1.43, MySQL 5.6/5.7, MySQL 8.0.x, MySQL 8.4.x, or RDS MySQL.
ImportantIf you use a MySQL data source from Alibaba Cloud, select RDS MySQL to connect properly.
Data Source Description
A brief description of the MySQL data source. The description can be up to 128 characters long.
Time Zone
Time-formatted data in integration tasks is processed based on the current time zone. The default time zone is Asia/Shanghai. Click Modify to select a target time zone. The options are as follows:
GMT: GMT-12:00, GMT-11:00, GMT-10:00, GMT-09:30, GMT-09:00, GMT-08:00, GMT-07:00, GMT-06:00, GMT-05:00, GMT-04:00, GMT-03:00, GMT-03:00, GMT-02:30, GMT-02:00, GMT-01:00, GMT+00:00, GMT+01:00, GMT+02:00, GMT+03:00, GMT+03:30, GMT+04:00, GMT+04:30, GMT+05:00, GMT+05:30, GMT+05:45, GMT+06:00, GMT+06:30, GMT+07:00, GMT+08:00, GMT+08:45, GMT+09:00, GMT+09:30, GMT+10:00, GMT+10:30, GMT+11:00, GMT+12:00, GMT+12:45, GMT+13:00, GMT+14:00.
Daylight Saving Time: Africa/Cairo, America/Chicago, America/Denver, America/Los_Angeles, America/New_York, America/Sao_Paulo, Asia/Bangkok, Asia/Dubai, Asia/Kolkata, Asia/Shanghai, Asia/Tokyo, Atlantic/Azores, Australia/Sydney, Europe/Berlin, Europe/London, Europe/Moscow, Europe/Paris, Pacific/Auckland, Pacific/Honolulu.
Data Source Configuration
Specify whether your business data source distinguishes between production and development environments:
If your business data source distinguishes between production and development environments, select Production + Development Data Source.
If your business data source does not distinguish between production and development environments, select Production Data Source.
Tag
You can add tags to data sources for categorization. For more information about how to create a tag, see Manage data source tags.
Configure the connection parameters between the data source and Dataphin.
The connection information that you need to configure depends on the Data Source Configuration setting. If you select Production + Development data source, configure the connection information for both environments. If you select Production data source, configure only the production connection information.
NoteTypically, production and development data sources are configured separately to isolate the environments and prevent development activities from affecting the production data source. However, Dataphin also supports configuring them as a single data source with identical parameter values.
For Configuration Method, you can select JDBC URL, Host, . The default method is JDBC URL.
JDBC URL configuration method
Parameter
Description
JDBC URL
The format of the JDBC URL is
jdbc:mysql://host:port/dbname.NoteThe default port for a MySQL data source is 3306.
Username, Password
The Username and Password for the database.
SSL Encryption
If you enable this feature, upload a Truststore certificate and enter the Truststore certificate password.
Host configuration method
Host configuration
Parameter
Description
Server Address
Enter the IP address and port number of the server.
Click +Add to add multiple IP addresses and port numbers. Click the
icon to delete IP addresses and port numbers. You must retain at least one IP address and port number.dbname
Enter the database name.
Parameter configuration
Parameter
Description
Parameter
Parameter name: Select an existing parameter name or enter a custom parameter name.
A custom parameter name can contain only letters, digits, periods (.), underscores (_), and hyphens (-).
Parameter value: If you select a parameter name, this parameter is required. The value can contain only letters, digits, periods (.), underscores (_), and hyphens (-). The value can be up to 256 characters long.
NoteClick +Add Parameter to add multiple parameters. Click the
icon to delete parameters. You can add up to 30 parameters.Username, Password
The username and password used to log on to the MySQL instance.
SSL Encryption
If you enable this feature, upload a Truststore certificate and enter the Truststore certificate password.
NoteIf you select the Host configuration method and later switch to the JDBC URL method for the same data source, the system automatically populates the JDBC URL field by concatenating the server IP address and port number.
Configure the advanced settings for the data source.
Parameter
Description
connectTimeout
The connection timeout period for the database, in milliseconds. The default value is 900000 ms (15 minutes).
NoteIf you specify a `connectTimeout` value in the JDBC URL, that value is used.
For data sources created before Dataphin V3.11, the default `connectTimeout` value is
-1, which indicates no timeout limit.
socketTimeout
The socket timeout period for the database, in milliseconds. The default value is 1800000 ms (30 minutes).
NoteIf you specify a `socketTimeout` value in the JDBC URL, that value is used.
For data sources created before Dataphin V3.11, the default `socketTimeout` value is
-1, which indicates no timeout limit.
Connection Retries
If the database connection times out, the system automatically retries the connection until the specified number of retries is reached. If the connection still fails after the maximum number of retries, the connection fails.
NoteThe default number of retries is 1. You can set this parameter to a value from 0 to 10.
The number of connection retries is applied to offline integration tasks and global quality checks by default. To use global quality checks, you must enable the asset quality module. For offline integration tasks, you can configure the number of retries at the task level.
NoteValue rules for duplicate parameters:
If a parameter is specified in the JDBC URL, the Advanced Settings, and the Host configuration, the value in the JDBC URL takes precedence.
If a parameter is specified in both the JDBC URL and the Advanced Settings, the value in the JDBC URL takes precedence.
If a parameter is specified in both the Advanced Settings and the Host configuration, the value in the Advanced Settings takes precedence.
Select a Default Resource Group. This resource group is used to run tasks related to the data source, such as SQL tasks, full offline database migrations, and data previews.
Click Test Connection or click OK to save the configuration and create the MySQL data source.
When you click Test Connection, the system tests the connection to the data source. If you click OK, the system automatically tests the connection for all selected clusters. The data source can be created even if the connection test fails.
Test Connection tests the connection for the Default Cluster or Registered Scheduling Clusters that have been registered in Dataphin and are in normal use. The Default Cluster is selected by default and cannot be deselected. If there are no resource groups under a Registered Scheduling Cluster, connection testing is not supported. You need to create a resource group first before testing the connection.
The selected clusters are only used to test network connectivity with the current data source and are not used for running related tasks later.
The test connection usually takes less than 2 minutes. If it times out, you can click the
icon to view the specific reason and retry.Regardless of whether the test result is Connection Failed, Connection Successful, or Succeeded With Warning, the system will record the generation time of the final result.
NoteOnly the test results for the Default Cluster include three connection statuses: Succeeded With Warning, Connection Successful, and Connection Failed. The test results for Registered Scheduling Clusters in Dataphin only include two connection statuses: Connection Successful and Connection Failed.
When the test result is Connection Failed, you can click the
icon to view the specific failure reason.When the test result is Succeeded With Warning, it means that the application cluster connection is successful but the scheduling cluster connection failed. The current data source cannot be used for data development and integration. You can click the
icon to view the log information.