Amazon Simple Storage Service (Amazon S3) is an object storage service that allows
you to store and retrieve any amount of data from anywhere. You can add Amazon S3
data sources to your DataWorks workspace and then read data from and write data to
the added data sources. This topic describes how to add an Amazon S3 data source.
Prerequisites
A resource group for Data Integration is created to run the synchronization node.
When you add an Amazon S3 data source, you must test the network connectivity between
the data source and the resource group to make sure that the resource group can reach
the data source. You must use an exclusive resource group for Data Integration. For more information,
see Exclusive resource groups for Data Integration.
Background information
Workspaces in standard mode support the data source isolation feature. You can add
data sources separately for the development and production environments to isolate
the data sources. This helps keep your data secure. For more information, see Isolate connections between the development and production environments.
Add an Amazon S3 data source
- Go to the Data Source page.
  - Log on to the DataWorks console.
  - In the left-side navigation pane, click Workspaces.
  - Select the region where the required workspace resides, find the workspace,
    and click Data Integration in the Actions column.
  - In the left-side navigation pane of the Data Integration page, go to the Data Source page.
- On the Data Source page, click Add data source in the upper-right corner.
- In the Add data source dialog box, click S3 in the Semi-structured storage section.
- In the Add S3 data source dialog box, set the parameters.
  - Configure basic information for the Amazon S3 data source.

| Parameter | Description |
| --- | --- |
| Data Source Name | The name of the data source. The name can contain letters, digits, and underscores (_) and must start with a letter. |
| Data source description | The description of the data source. The description can be a maximum of 80 characters in length. |
| Endpoint | The endpoint of the Amazon S3 data source. Example: http://s3.ap-northeast-1.amazonaws.com. You can query the endpoint of the source Amazon S3 bucket in the Amazon S3 console. |
| Bucket | The name of the Amazon S3 bucket. A bucket is a storage space that serves as a container for objects. You can create one or more buckets and add one or more objects to each bucket. During data synchronization, DataWorks can search for objects only in the bucket that is specified by this parameter. |
| AccessKey ID | The AccessKey ID of the account that you use to connect to the Amazon S3 bucket. You can view the AccessKey ID on the Security Management page. |
| AccessKey Secret | The AccessKey secret of the account that you use to connect to the Amazon S3 bucket. |
  - Test the network connectivity between the Amazon S3 data source and the resource group.
    - Select Data Integration for the Resource Group connectivity parameter.
    - In the resource group list, find the resource group that you want to use and click
      Test connectivity in the Actions column.
      A synchronization node can use only one type of resource group. To make sure that your
      synchronization nodes run as expected, test the connectivity of every resource group
      for Data Integration on which the nodes will run. To test the connectivity of multiple
      resource groups for Data Integration at a time, select the resource groups and click
      Batch test connectivity. For more information, see
      Select a network connectivity solution.
      Note
      - By default, the resource group list displays only exclusive resource groups for Data
        Integration. To ensure the stability and performance of data synchronization, we recommend
        that you use exclusive resource groups for Data Integration.
      - If you want to test the network connectivity between the shared resource group or
        a custom resource group and the data source, click Advanced below the resource group list.
        In the Warning message, click Confirm. Then, all available shared and custom resource
        groups appear in the resource group list.
- After the data source passes the connectivity test, click Complete.
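
Before you enter the values in the dialog box, it can help to check them against the rules in the parameter table. The following Python sketch is illustrative only and is not part of DataWorks: the function name `validate_s3_data_source` and its messages are hypothetical. It assumes that "letters" means ASCII letters and that the endpoint is an http or https URL, as in the example above; neither assumption is confirmed by this topic.

```python
import re
from urllib.parse import urlparse

# Naming rule from the parameter table: letters, digits, and underscores,
# starting with a letter. ASCII-only letters are an assumption of this sketch.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")
MAX_DESCRIPTION_LENGTH = 80  # limit stated in the parameter table


def validate_s3_data_source(name, description, endpoint, bucket,
                            access_key_id, access_key_secret):
    """Return a list of problems; an empty list means the values look plausible.

    Hypothetical helper for local checks only. It does not contact Amazon S3
    or DataWorks and does not replace the connectivity test.
    """
    problems = []
    if not NAME_PATTERN.fullmatch(name):
        problems.append("name must start with a letter and contain only "
                        "letters, digits, and underscores")
    if len(description) > MAX_DESCRIPTION_LENGTH:
        problems.append("description must be at most 80 characters")
    parsed = urlparse(endpoint)
    # Assumption: the endpoint is given with an http or https scheme.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("endpoint must be an http(s) URL")
    if not bucket:
        problems.append("bucket must not be empty")
    if not access_key_id or not access_key_secret:
        problems.append("AccessKey ID and AccessKey secret must not be empty")
    return problems
```

For example, `validate_s3_data_source("s3_source", "demo", "http://s3.ap-northeast-1.amazonaws.com", "my-bucket", "id", "secret")` returns an empty list, whereas a name such as `1bad` is reported as a problem. The bucket name and credentials here are placeholders.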
What to do next
You have now added an Amazon S3 data source. As a next step, you can configure
Amazon S3 Reader to read data from the data source.
For more information, see S3 Reader.