By creating an Amazon S3 data source, you can enable Dataphin to read business data from Amazon S3 or write data to Amazon S3. This topic describes how to create an Amazon S3 data source.
Background information
Amazon S3 (Simple Storage Service) is a cloud storage service provided by Amazon. It allows individuals, organizations, and enterprises to store and retrieve data in the cloud. If you use Amazon S3, you need to create an Amazon S3 data source before you can develop data in Dataphin or write Dataphin data to Amazon S3. For more information, see What is Amazon S3.
Permission requirements
Only the following roles can create data sources: custom global roles that have the Create Data Source permission, and the system roles Super Administrator, Data Source Administrator, Domain Architect, and Project Administrator.
Procedure
On the Dataphin homepage, click Management Hub > Datasource Management in the top navigation bar.
On the Datasource page, click +Create Data Source.
On the Create Data Source page, select Amazon S3 in the File section.
If you have used Amazon S3 recently, you can also select it from the Recently Used section, or find it quickly by entering a keyword in the search box.
On the Create Amazon S3 Data Source page, configure the connection parameters.
Configure the basic information of the data source.
Parameter
Description
Datasource Name
Enter the name of the data source. The name must meet the following requirements:
It can contain only Chinese characters, letters, digits, underscores (_), and hyphens (-).
It cannot exceed 64 characters in length.
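The naming rules above can be expressed as a small validation sketch. The regular expression below is an assumption derived from the stated rules; Dataphin's actual server-side validation may differ.

```python
import re

# Hypothetical check based on the documented rules: Chinese characters,
# letters, digits, underscores (_), and hyphens (-); at most 64 characters.
NAME_PATTERN = re.compile(r'^[\u4e00-\u9fa5A-Za-z0-9_-]+$')

def is_valid_datasource_name(name: str) -> bool:
    """Return True if the name satisfies the documented constraints."""
    return 0 < len(name) <= 64 and bool(NAME_PATTERN.match(name))

print(is_valid_datasource_name("s3_prod-01"))   # True
print(is_valid_datasource_name("bad name!"))    # False: space and '!' are not allowed
```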
Datasource Code
After you configure the data source code, you can reference tables in the data source in Flink SQL tasks by using the format data_source_code.table_name or data_source_code.schema.table_name. If you want the task to automatically access the data source of the corresponding environment based on the current environment, use the variable format ${data_source_code}.table or ${data_source_code}.schema.table. For more information, see Dataphin data source table development method.
Important: The data source code cannot be modified after it is configured.
After the data source code is configured successfully, you can preview data on the object details page in the asset directory and asset inventory.
In Flink SQL, only MySQL, Hologres, MaxCompute, Oracle, StarRocks, Hive, and SelectDB data sources are currently supported.
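The two reference formats described above can be sketched with a small helper. The function name and parameters are illustrative only, not part of any Dataphin API.

```python
def table_ref(code: str, table: str, schema: str = None,
              cross_env: bool = False) -> str:
    """Build a table reference string for Flink SQL (illustrative helper).

    cross_env=True uses the ${...} variable form, which resolves to the
    data source of the current environment at run time.
    """
    prefix = f"${{{code}}}" if cross_env else code
    parts = [prefix] + ([schema] if schema else []) + [table]
    return ".".join(parts)

print(table_ref("my_s3", "orders"))                  # my_s3.orders
print(table_ref("my_s3", "orders", cross_env=True))  # ${my_s3}.orders
```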
Data Source Description
A brief description of the data source. It cannot exceed 128 characters.
Data Source Configuration
Select the data source to configure:
If the business data source distinguishes between production and development data sources, select Production + Development Data Source.
If the business data source does not distinguish between production and development data sources, select Production Data Source.
Tag
You can use tags to categorize data sources. For information about how to create tags, see Manage data source tags.
Configure the connection parameters between the data source and Dataphin.
If you select Production + Development data source for your data source configuration, you need to configure the connection information for the Production + Development data source. If your data source configuration is Production data source, you only need to configure the connection information for the Production data source.
Note: Typically, production and development data sources should be configured as separate data sources to isolate the environments and reduce the impact of development activity on the production data source. However, Dataphin also supports configuring them as the same data source with identical parameter values.
Parameter
Description
Endpoint
The endpoint of the region where Amazon S3 is located, in the format http://s3-{Region}.amazonaws.com, where {Region} is the region where the bucket is located.
Amazon S3 endpoints are region-specific: to access a different region, you must enter a different domain name. For more information, see Amazon S3 endpoints.
Region
The region where the bucket is located. This parameter is optional if the region is already included in the Endpoint; otherwise, you must specify it.
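The relationship between the Endpoint and Region parameters can be sketched as follows. The endpoint format follows the dash style given above; this is an assumption, and some S3 regions use the dot style (s3.{region}) instead, which the parser below also accepts.

```python
import re

def s3_endpoint(region: str) -> str:
    """Build an endpoint URL in the documented http://s3-{Region} format."""
    return f"http://s3-{region}.amazonaws.com"

def region_from_endpoint(endpoint: str):
    """Extract the region embedded in an endpoint, or return None.

    If this returns None, the Region parameter must be filled in
    separately when creating the data source.
    """
    m = re.match(r'https?://s3[-.]([a-z0-9-]+)\.amazonaws\.com', endpoint)
    return m.group(1) if m else None

print(s3_endpoint("us-west-2"))                        # http://s3-us-west-2.amazonaws.com
print(region_from_endpoint("http://s3.amazonaws.com")) # None: no region embedded
```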
Bucket
The bucket information corresponding to the Amazon S3 region. A bucket is a container for storing objects. See Amazon S3 bucket overview to obtain the bucket corresponding to the Amazon S3 region.
Directory
If you have permissions only on a specific directory, you can specify the directory path here, for example, /dataphin/.
Access ID, Access Key
The AccessKey ID and AccessKey Secret of the account where the Amazon S3 data source is located.
For information on how to obtain them, see Amazon access keys.
Note: These are not the AccessKey ID and AccessKey Secret of an Alibaba Cloud account.
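When you keep a copy of these credentials for scripts outside Dataphin, avoid hard-coding them. A minimal sketch is shown below; the environment variable names are illustrative and not required by Dataphin or by any AWS tooling.

```python
import os

def load_s3_credentials(env=os.environ):
    """Read the Access ID and Access Key from environment variables.

    The variable names S3_ACCESS_ID / S3_ACCESS_KEY are hypothetical;
    pick whatever naming convention your environment uses.
    """
    access_id = env.get("S3_ACCESS_ID", "")
    access_key = env.get("S3_ACCESS_KEY", "")
    if not access_id or not access_key:
        raise ValueError("missing S3 credentials in the environment")
    return access_id, access_key
```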
Select a Default Resource Group, which will be used to run tasks related to the current data source, including database SQL, offline database migration, and data preview.
Click Test Connection, or click OK directly to save the configuration and complete the creation of the Amazon S3 data source.
If you click Test Connection, the system tests whether the data source can connect to Dataphin. If you click OK directly, the system automatically tests the connection for all selected clusters; however, the data source is still created even if every selected cluster fails the connection test.