Dataphin: Create Amazon S3 Data Source

Last Updated: Mar 05, 2025

By creating an Amazon S3 data source, Dataphin can read business data from Amazon S3 or write data to Amazon S3. This topic describes how to create an Amazon S3 data source.

Background information

Amazon S3 (Simple Storage Service) is a cloud storage service provided by Amazon. It enables individuals, organizations, and enterprises to store and retrieve data in the cloud. To integrate with Dataphin for data development or to write Dataphin data to Amazon S3, you must first create an Amazon S3 data source. For more information about Amazon S3, see What is Amazon S3.

Permission description

Only custom global roles that include the Create Data Source permission point and the system roles Super Administrator, Data Source Administrator, Board Architect, and Project Administrator can create data sources.

Procedure

  1. On the Dataphin home page, choose Management Center > Datasource Management from the top menu bar.

  2. On the Datasource page, click + Create Data Source.

  3. In the Create Data Source dialog box, in the File area, select Amazon S3.

    If you have recently used Amazon S3, you can also select it in the Recently Used area. Alternatively, enter the keyword for Amazon S3 in the search box for a quick search.

  4. In the Create Amazon S3 Data Source dialog box, configure the data source connection parameters.

    1. Configure the basic information of the data source.


      Datasource Name

      Enter the name of the data source. The naming conventions are as follows:

      • Can only contain Chinese characters, uppercase and lowercase English letters, numbers, underscores (_), or hyphens (-).

      • Cannot exceed 64 characters in length.

      Datasource Code

      After you configure the data source code, you can reference tables in this data source in Flink SQL tasks in the format Datasource Code.Table Name or Datasource Code.Schema.Table Name. To automatically access the data source that corresponds to the current environment, use the variable format ${Datasource Code}.table or ${Datasource Code}.schema.table. For more information, see Dataphin Data Source Table Development Method.

      Important
      • Once the data source code is configured, it cannot be modified.

      • After the data source code is configured, data can be previewed on the object details pages of the asset directory and asset checklist.

      • In Flink SQL, only MySQL, Hologres, MaxCompute, Oracle, StarRocks, Hive, and SelectDB data sources are currently supported.

      Datasource Description

      A brief description of the data source. Must not exceed 128 characters.

      Datasource Configuration

      Select the data source to be configured:

      • If the business data source distinguishes between a production data source and a development data source, select Production + Development Data Source.

      • If the business data source does not make this distinction, select Production Data Source.

      Tag

      You can classify and tag the data source based on labels. For information on how to create tags, see Manage Data Source Tags.
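    As a rough illustration of the Datasource Name rules above, they can be approximated in a small client-side check. This is a hypothetical helper, not part of Dataphin: the CJK range \u4e00-\u9fff is only an approximation of "Chinese characters", and Dataphin itself performs the authoritative validation.

    ```python
    import re

    # Approximate the documented rules: Chinese characters (CJK Unified
    # Ideographs, an approximation), ASCII letters, digits, underscores,
    # and hyphens; length between 1 and 64 characters.
    _NAME_RE = re.compile(r"^[\u4e00-\u9fffA-Za-z0-9_-]{1,64}$")

    def is_valid_datasource_name(name: str) -> bool:
        """Return True if the name satisfies the documented naming rules."""
        return bool(_NAME_RE.fullmatch(name))

    print(is_valid_datasource_name("s3_orders-2025"))  # True
    print(is_valid_datasource_name("bad name!"))       # False: space and "!"
    print(is_valid_datasource_name("x" * 65))          # False: longer than 64
    ```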

    2. Configure the connection parameters between the data source and Dataphin.

      If you selected Production + Development Data Source when you configured the data source, you must configure connection information for both the production and the development data source. If you selected Production Data Source, you only need to configure connection information for the production data source.

      Note

      Typically, production and development data sources should be configured as separate entities to maintain environment isolation and minimize the impact of development activities on production data sources. However, Dataphin also allows for them to be configured with the same parameter values.


      Endpoint

      The endpoint that corresponds to the region where Amazon S3 is located, in the format http://s3-{Region}.amazonaws.com, where {Region} is the region in which the bucket resides.

      The Amazon S3 endpoint is region-specific: a different domain name is used for each region. For more information, see Amazon S3 Endpoints.
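      Following the http://s3-{Region}.amazonaws.com format described above, the endpoint for a region can be assembled mechanically. This is a minimal sketch under that stated format; the region name is an example, and other endpoint styles Amazon may serve are not covered.

      ```python
      def s3_endpoint(region: str) -> str:
          # Assemble the endpoint in the documented
          # http://s3-{Region}.amazonaws.com form.
          return f"http://s3-{region}.amazonaws.com"

      # "us-west-2" is an example region name.
      print(s3_endpoint("us-west-2"))  # http://s3-us-west-2.amazonaws.com
      ```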

      Region

      Optional. The region where the bucket is located. If the region is not specified in the endpoint, you must fill it in here.

      Bucket

      The bucket that corresponds to the region where Amazon S3 is located. A bucket serves as a container for stored objects. For information about how to obtain the bucket, see Amazon S3 Bucket Overview.

      Directory

      If you only have permissions for a specific directory, you can specify the directory path here. For example, /dataphin/.

      Access ID, Access Key

      The AccessKey ID and AccessKey Secret of the account where the Amazon S3 data source is located.

      For information on how to obtain them, see Amazon Access Keys.

      Note

      These are not the AccessKey ID and AccessKey Secret of an Alibaba Cloud account.
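      Before you run Test Connection in the next step, the form values can be sanity-checked locally. The sketch below is a hypothetical helper whose fields mirror the parameters above; it only checks the shape of the values (the region heuristic is deliberately rough), not actual connectivity.

      ```python
      def precheck_s3_params(endpoint, bucket, access_id, access_key,
                             region="", directory=""):
          """Collect obvious problems in the connection parameters."""
          problems = []
          if not endpoint.startswith(("http://", "https://")):
              problems.append("endpoint should start with http:// or https://")
          # Region is optional, but required when the endpoint does not embed
          # one. A hyphen-free amazonaws.com endpoint is a rough hint of that.
          if not region and "amazonaws.com" in endpoint and "-" not in endpoint:
              problems.append("no region in endpoint: fill in the Region field")
          if not bucket:
              problems.append("bucket is required")
          if not (access_id and access_key):
              problems.append("Access ID and Access Key are both required")
          if directory and not directory.startswith("/"):
              problems.append("directory should be absolute, e.g. /dataphin/")
          return problems

      # Placeholder credentials; an empty list means no obvious problems.
      print(precheck_s3_params("http://s3-us-west-2.amazonaws.com",
                               "my-bucket", "AKIA-EXAMPLE", "secret"))  # []
      ```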

  5. Click Test Connection to verify that the data source can connect to Dataphin successfully.

  6. After a successful test, click OK to finalize the creation of the Amazon S3 data source.