Amazon Simple Storage Service (Amazon S3) is an object storage service that allows you to store and retrieve any amount of data from anywhere. You can add Amazon S3 data sources to your DataWorks workspace and then read data from and write data to the added data sources. This topic describes how to add an Amazon S3 data source.

Prerequisites

A resource group for Data Integration is created to run the sync node.

When you add an Amazon S3 data source, you must test the connectivity between the data source and the resource group to make sure that the two are connected. The resource group must be an exclusive resource group for Data Integration. For more information, see Exclusive resource groups for Data Integration.

Background information

Workspaces in standard mode support the data source isolation feature. You can add data sources separately for the development and production environments to isolate the data sources. This helps keep your data secure. For more information, see Isolate connections between the development and production environments.

Add an Amazon S3 data source

  1. Go to the Data Source page.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. Select the region where the workspace resides, find the workspace, and click Data Integration in the Actions column.
    4. In the left-side navigation pane of the Data Integration page, choose Data Source > Data Sources to go to the Data Source page.
  2. On the Data Source page, click Add data source in the upper-right corner.
  3. In the Add data source dialog box, click S3 in the Semi-structured storage section.
  4. In the Add S3 data source dialog box, set the parameters.
    1. Configure basic information for the Amazon S3 data source.
      Configure the basic information
      Parameter: Description
      Data Source Name: The name of the data source. The name can contain letters, digits, and underscores (_) and must start with a letter.
      Data source description: The description of the data source. The description can be up to 80 characters in length.
      Endpoint: The endpoint of the Amazon S3 data source. Example: http://s3.ap-northeast-1.amazonaws.com. You can query the endpoint of the source Amazon S3 bucket in the Amazon S3 console.
      Bucket: The name of the Amazon S3 bucket. A bucket is a container for storing objects. You can create one or more buckets and add one or more objects to each bucket. During data synchronization, DataWorks can search for objects only in the bucket that is specified by this parameter.
      AccessKey ID: The AccessKey ID of the account that you use to connect to the Amazon S3 bucket. You can view the AccessKey ID on the Security Management page.
      AccessKey Secret: The AccessKey secret of the account that you use to connect to the Amazon S3 bucket.
  5. Test the network connectivity between the Amazon S3 data source and the resource group.
    1. Select Data Integration for the Resource Group connectivity parameter.
    2. In the resource group list, find the resource group that you want to use and click Test connectivity in the Actions column.
      A synchronization node can use only one type of resource group. To make sure that your synchronization nodes run as expected, test the connectivity of every resource group for Data Integration on which the nodes will run. To test multiple resource groups for Data Integration at a time, select the resource groups and click Batch test connectivity. For more information, see Select a network connectivity solution.
      Note
      • By default, the resource group list displays only exclusive resource groups for Data Integration. To ensure the stability and performance of data synchronization, we recommend that you use exclusive resource groups for Data Integration.
      • If you want to test the network connectivity between the shared resource group or a custom resource group and the data source, click Advanced below the resource group list. In the Warning message, click Confirm. Then, all available shared and custom resource groups appear in the resource group list.
  6. After the data source passes the connectivity test, click Complete.
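
Before you enter the values in the dialog box, you may want to check them against the rules in the parameter descriptions above. The following is a minimal sketch in Python, not part of DataWorks: the function name validate_s3_data_source and its arguments are hypothetical. It also assumes that "letters" means ASCII letters, that the endpoint must be an http or https URL like the example shown, and that the bucket and AccessKey fields must not be empty. It checks the format of the values only and does not connect to Amazon S3.

```python
import re
from urllib.parse import urlparse

# Rule from the parameter descriptions: letters, digits, and underscores,
# starting with a letter. Assumption: "letters" means ASCII letters only.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")
MAX_DESCRIPTION_LENGTH = 80


def validate_s3_data_source(name, description, endpoint, bucket,
                            access_key_id, access_key_secret):
    """Return a list of problems found; an empty list means none were found."""
    problems = []

    if not NAME_PATTERN.fullmatch(name):
        problems.append("name must start with a letter and contain only "
                        "letters, digits, and underscores")

    if len(description) > MAX_DESCRIPTION_LENGTH:
        problems.append("description must be at most 80 characters")

    # Assumption: the endpoint is an http(s) URL with a host, as in the example.
    parsed = urlparse(endpoint)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("endpoint must be an http or https URL")

    # Assumption: these fields are required and must not be blank.
    required = (("Bucket", bucket),
                ("AccessKey ID", access_key_id),
                ("AccessKey Secret", access_key_secret))
    for label, value in required:
        if not value.strip():
            problems.append(f"{label} must not be empty")

    return problems
```

Calling the function with a name such as s3_source_1, a short description, the example endpoint, and non-empty bucket and AccessKey values returns an empty list. Passing the connectivity test in step 5 is still required, because this check does not verify that the bucket exists or that the AccessKey pair is valid.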

What to do next

You have added an Amazon S3 data source. The next step is to configure Amazon S3 Reader. For more information, see S3 Reader.