By creating an Amazon RDS for MySQL data source, you can enable Dataphin to read business data from or write data to Amazon RDS for MySQL. This topic describes how to create an Amazon RDS for MySQL data source.
Permission requirements
Only custom global roles with the Create Data Source permission and the super administrator, data source administrator, domain architect, and project administrator roles can create data sources.
Procedure
On the Dataphin homepage, choose Management Center > Datasource Management from the top navigation bar.
On the Datasource page, click +Create Data Source.
In the Create Data Source dialog box, select Amazon RDS for MySQL in the Relational Database section.
If you have recently used Amazon RDS for MySQL, you can also select it in the Recently Used section, or enter a keyword in the search box to find it quickly.
In the Create Amazon RDS For MySQL Data Source dialog box, configure the parameters for connecting to the data source.
Configure the basic information of the data source.
Parameter
Description
Datasource Name
Enter a name for the data source. The name must meet the following requirements:
It can contain only Chinese characters, letters, digits, underscores (_), and hyphens (-).
It cannot exceed 64 characters in length.
Datasource Code
After you configure the data source code, you can access Dataphin data source tables in Flink_SQL tasks or through the Dataphin JDBC client by using the format Data source code.Table name or Data source code.Schema.Table name for quick consumption. To automatically switch data sources based on the task execution environment, use the variable format ${Data source code}.table or ${Data source code}.schema.table. For more information, see Development method for Dataphin data source tables.
Important: The data source code cannot be modified after it is configured successfully.
After the data source code is configured successfully, you can preview data on the object details page in the asset directory and asset inventory.
In Flink SQL, only MySQL, Hologres, MaxCompute, Oracle, StarRocks, Hive, and SelectDB data sources are currently supported.
Version
You can select only MySQL 5.6/5.7 or MySQL 8.
Data Source Description
Enter a brief description of the Amazon RDS for MySQL data source. The description cannot exceed 128 characters in length.
Data Source Configuration
Select an option based on whether the business data source distinguishes between production and development data sources:
If the business data source distinguishes between production and development data sources, select Production + Development Data Source.
If the business data source does not distinguish between production and development data sources, select Production Data Source.
Tag
You can categorize and tag data sources based on tags. For information about how to create tags, see Manage data source tags.
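The naming rules and the table reference formats described in the table above can be sketched in Python as follows. This is an illustrative sketch only, not Dataphin's own validation logic. The function names are hypothetical, "Chinese characters" is approximated by the CJK Unified Ideographs range (U+4E00 to U+9FFF), and an empty name is assumed to be invalid.

```python
import re
from typing import Optional

# Assumption: "Chinese characters" means U+4E00 to U+9FFF, and a name
# must contain at least one character. Dataphin's real check may differ.
_NAME_PATTERN = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_-]{1,64}")


def is_valid_datasource_name(name: str) -> bool:
    """Return True if the name uses only allowed characters and has 1 to 64 of them."""
    return _NAME_PATTERN.fullmatch(name) is not None


def table_reference(code: str, table: str, schema: Optional[str] = None,
                    as_variable: bool = False) -> str:
    """Build a table reference from a data source code.

    With as_variable=True the code is wrapped as ${code}, the variable
    format used to switch data sources by execution environment.
    """
    prefix = "${" + code + "}" if as_variable else code
    parts = [prefix, schema, table] if schema else [prefix, table]
    return ".".join(parts)
```

For example, with a hypothetical data source code rds_orders, table_reference("rds_orders", "t_order") yields rds_orders.t_order, and passing schema="sales" with as_variable=True yields ${rds_orders}.sales.t_order.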
Configure the parameters for connecting the data source to Dataphin.
If you selected Production + Development Data Source for Data Source Configuration, configure the connection information for both the production and development data sources. If you selected Production Data Source, configure the connection information for the production data source only.
Note: In most cases, the production data source and development data source should be configured as different data sources to isolate the development environment from the production environment and reduce the impact of the development data source on the production data source. However, Dataphin also supports configuring them as the same data source with the same parameter values.
For Configuration Method, you can select JDBC URL or Host. The default value is JDBC URL.
JDBC URL configuration method
Parameter
Description
JDBC URL
The format of the JDBC URL is jdbc:mysql://host:port/dbname.
Username, Password
Enter the username and password of a user who has the required permissions. To ensure that tasks can be executed properly, make sure that the user has the required data permissions.
SSL Encryption
After you enable this option, you need to upload a truststore certificate and enter the password of the certificate.
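To check a URL against the jdbc:mysql://host:port/dbname format before entering it, a small parser such as the following can help. It is a sketch under stated assumptions: the function name parse_jdbc_url is hypothetical, only the single-host form is handled, and the optional ?key=value suffix follows the general MySQL Connector/J URL convention rather than anything specific to Dataphin.

```python
import re

# Matches jdbc:mysql://host:port/dbname, optionally followed by
# ?key=value&key=value parameters. Other URL shapes are rejected.
_JDBC_URL = re.compile(
    r"jdbc:mysql://(?P<host>[^:/?#,]+):(?P<port>\d+)"
    r"/(?P<dbname>[^?#/]+)(?:\?(?P<query>.*))?"
)


def parse_jdbc_url(url: str) -> dict:
    """Split a jdbc:mysql URL into host, port, dbname, and parameters."""
    match = _JDBC_URL.fullmatch(url)
    if match is None:
        raise ValueError(f"not in jdbc:mysql://host:port/dbname format: {url!r}")
    params = {}
    if match.group("query"):
        for pair in match.group("query").split("&"):
            key, _, value = pair.partition("=")
            params[key] = value
    return {
        "host": match.group("host"),
        "port": int(match.group("port")),
        "dbname": match.group("dbname"),
        "params": params,
    }
```

The parsed parameters make it easy to see which settings, such as connectTimeout, are already fixed in the URL.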
Host configuration method
Parameter
Description
Server Address
Enter the IP address and port number of the server.
You can click +Add to add multiple sets of IP addresses and port numbers. To remove a redundant set, click its delete icon. You must retain at least one set.
dbname
Enter the database name.
Parameter configuration
Parameter
Description
Parameter
Parameter name: You can select an existing parameter name or enter a custom parameter name.
A custom parameter name can contain only letters, digits, periods (.), underscores (_), and hyphens (-).
Parameter value: If you have selected a parameter name, you must enter a parameter value. The parameter value can contain only letters, digits, periods (.), underscores (_), and hyphens (-). It cannot exceed 256 characters in length.
Note: You can click +Add Parameter to add multiple parameters, up to a maximum of 30. To remove a redundant parameter, click its delete icon.
Username, Password
The username and password used to log on to the Amazon RDS for MySQL instance.
SSL Encryption
After you enable this option, you need to upload a truststore certificate and enter the password of the certificate.
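The parameter rules in the table above (allowed characters, a 256-character limit on values, and at most 30 parameters) can be expressed as a pre-check like the one below. This is only a sketch: validate_parameters is a hypothetical helper, and it applies the custom-name character rule to every name, which is an assumption.

```python
import re

_ALLOWED = re.compile(r"[A-Za-z0-9._-]+")  # letters, digits, ".", "_", "-"
MAX_PARAMETERS = 30
MAX_VALUE_LENGTH = 256


def validate_parameters(params: dict) -> list:
    """Return a list of rule violations; an empty list means all checks pass."""
    errors = []
    if len(params) > MAX_PARAMETERS:
        errors.append(f"too many parameters: {len(params)} > {MAX_PARAMETERS}")
    for name, value in params.items():
        if not _ALLOWED.fullmatch(name):
            errors.append(f"invalid parameter name: {name!r}")
        if not _ALLOWED.fullmatch(value):
            # Also covers an empty value, since a value is required.
            errors.append(f"invalid parameter value for {name!r}")
        elif len(value) > MAX_VALUE_LENGTH:
            errors.append(f"parameter value for {name!r} exceeds {MAX_VALUE_LENGTH} characters")
    return errors
```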
Note: If you create a data source with the Host configuration method and later switch to the JDBC URL configuration method, the system concatenates the IP address and port number of the server to form a JDBC URL.
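The concatenation described in this note can be pictured with the following sketch. The exact output that Dataphin generates is not documented here, so the details are assumptions: build_jdbc_url is a hypothetical name, multiple servers are joined with commas (the multi-host form that MySQL Connector/J accepts), and the database name and parameters are appended in the usual /dbname?key=value form.

```python
def build_jdbc_url(servers, dbname, params=None):
    """Concatenate (host, port) pairs into a jdbc:mysql URL.

    Assumption: servers are comma-separated and parameters are appended
    as a query string; Dataphin's generated URL may differ.
    """
    if not servers:
        raise ValueError("at least one server address is required")
    authority = ",".join(f"{host}:{port}" for host, port in servers)
    url = f"jdbc:mysql://{authority}/{dbname}"
    if params:
        url += "?" + "&".join(f"{key}={value}" for key, value in params.items())
    return url
```

With a single server, the result has the same jdbc:mysql://host:port/dbname shape that the JDBC URL configuration method expects.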
Configure advanced settings for the data source.
Parameter
Description
connectTimeout
The connection timeout period of the database. Unit: milliseconds. Default value: 900000 milliseconds (15 minutes).
Note: If connectTimeout is also configured in the JDBC URL, the value in the JDBC URL takes effect.
For data sources created before Dataphin V3.11, the default value of connectTimeout is -1, which indicates that no timeout limit is set.
socketTimeout
The socket timeout period of the database. Unit: milliseconds. Default value: 1800000 milliseconds (30 minutes).
Note: If socketTimeout is also configured in the JDBC URL, the value in the JDBC URL takes effect.
For data sources created before Dataphin V3.11, the default value of socketTimeout is -1, which indicates that no timeout limit is set.
Connection Retries
If the database connection times out, the system automatically retries the connection until the specified number of retries is reached. If the connection still cannot be established after the last retry, the connection attempt is reported as failed.
Note: The default number of retries is 1. You can set this parameter to a value from 0 to 10.
The number of connection retries applies to offline integration tasks and global quality (the Data Quality module must be activated). You can separately configure the number of retries for offline integration tasks.
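The retry behavior can be illustrated with a short sketch. It is not Dataphin's implementation: connect_with_retries is a hypothetical helper, connect stands for any callable that raises TimeoutError on a timeout, and the retry count is assumed to be in addition to the first attempt, so a value of 1 allows two attempts in total.

```python
def connect_with_retries(connect, retries=1):
    """Call connect() once, then retry up to `retries` more times on timeout."""
    if not 0 <= retries <= 10:
        raise ValueError("retries must be between 0 and 10")
    last_error = None
    for _ in range(1 + retries):
        try:
            return connect()
        except TimeoutError as error:
            # Only timeouts trigger a retry; other errors propagate at once.
            last_error = error
    raise ConnectionError(
        f"connection failed after {1 + retries} attempts"
    ) from last_error
```

Setting retries to 0 disables retrying, so the first timeout is reported as a failure.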
Note: The following rules apply to duplicate parameters:
If a parameter exists in the JDBC URL, Advanced Settings, and Host Configuration method, the value of the parameter in the JDBC URL takes effect.
If a parameter exists in the JDBC URL and Advanced Settings, the value of the parameter in the JDBC URL takes effect.
If a parameter exists in Advanced Settings and the Host Configuration method, the value of the parameter in Advanced Settings takes effect.
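Taken together, these rules give a fixed priority order: JDBC URL first, then Advanced Settings, then the Host configuration method. A minimal sketch of that merge, using the hypothetical function name resolve_parameters and plain dictionaries for each source:

```python
def resolve_parameters(jdbc_url_params=None, advanced_settings=None, host_params=None):
    """Merge duplicate parameters: JDBC URL > Advanced Settings > Host configuration."""
    resolved = {}
    # Apply the lowest-priority source first so later updates override it.
    for source in (host_params, advanced_settings, jdbc_url_params):
        resolved.update(source or {})
    return resolved
```

For example, if connectTimeout is 5000 in the JDBC URL and 900000 in Advanced Settings, the resolved value is 5000. If socketTimeout is 1800000 in Advanced Settings and 60000 in the Host configuration parameters, the resolved value is 1800000.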
Select a Default Resource Group. The resource group is used to run tasks related to the current data source, including database SQL, offline database migration, and data preview.
Click Test Connection or directly click OK to save the settings and create the Amazon RDS for MySQL data source.
If you click Test Connection, the system tests whether the data source can be connected to Dataphin. If you directly click OK, the system automatically tests the connection to all selected clusters. Even if the connection to all selected clusters fails, the data source can still be created.
Test Connection tests the connection for the Default Cluster or Registered Scheduling Clusters that have been registered in Dataphin and are in normal use. The Default Cluster is selected by default and cannot be deselected. If there are no resource groups under a Registered Scheduling Cluster, connection testing is not supported. You need to create a resource group first before testing the connection.
The selected clusters are only used to test network connectivity with the current data source and are not used for running related tasks later.
The connection test usually takes less than 2 minutes. If it times out, you can click the icon to view the specific reason and retry.
Regardless of whether the test result is Connection Failed, Connection Successful, or Succeeded With Warning, the system records the time at which the final result was generated.
Note: Only the test results for the Default Cluster include three connection statuses: Succeeded With Warning, Connection Successful, and Connection Failed. The test results for Registered Scheduling Clusters in Dataphin only include two connection statuses: Connection Successful and Connection Failed.
When the test result is Connection Failed, you can click the icon to view the specific failure reason.
When the test result is Succeeded With Warning, the application cluster connection succeeded but the scheduling cluster connection failed. The current data source cannot be used for data development and integration. You can click the icon to view the log information.
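The three Default Cluster statuses can be summarized with a small sketch. Only the Succeeded With Warning case is defined explicitly in this topic. The other two branches are assumptions: Connection Successful when both connections succeed, and Connection Failed whenever the application cluster connection fails. The function name is hypothetical.

```python
def default_cluster_status(application_ok: bool, scheduling_ok: bool) -> str:
    """Map the two connection results to a Default Cluster test status."""
    if application_ok and scheduling_ok:
        return "Connection Successful"  # assumption: both connections succeeded
    if application_ok:
        # Stated in the topic: application cluster connected, scheduling
        # cluster did not. The data source cannot be used for data
        # development and integration in this state.
        return "Succeeded With Warning"
    return "Connection Failed"  # assumption: application cluster connection failed
```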