DataWorks provides MaxCompute Reader and MaxCompute Writer for you to read data from and write data to MaxCompute data sources. This topic describes how to add a MaxCompute data source to DataWorks.
Background information
- MaxCompute provides a comprehensive data import solution that supports fast computing on large amounts of data. DataWorks provides MaxCompute Reader and MaxCompute Writer for you to read data from and write data to MaxCompute data sources.
- When you associate a MaxCompute compute engine with a workspace, DataWorks automatically generates a MaxCompute data source on the Data Source page in the DataWorks console based on the configurations of the MaxCompute compute engine instance. You can also refer to the instructions provided in this topic to add a MaxCompute project to DataWorks as a data source.
Precautions
- Workspaces in standard mode allow you to isolate data sources. You can separately add data sources for the development and production environments to isolate the data sources. This keeps your data secure. For more information, see Isolate a data source in the development and production environments.
- When you associate a MaxCompute compute engine with a workspace for the first time, DataWorks generates a default MaxCompute data source named odps_first on the Data Source page in the DataWorks console. If you associate other MaxCompute compute engines with the workspace subsequently, DataWorks generates MaxCompute data sources named in the format of 0_Region ID_Compute engine instance name.
- The MaxCompute project behind each default MaxCompute data source has the same name as the MaxCompute project that is associated with the workspace as a compute engine instance. You can view the information about the MaxCompute compute engine instances that are associated with a workspace on the Workspace Management page of the DataWorks console. Before you modify the information of an associated MaxCompute compute engine instance, make sure that no nodes are running on the instance. These nodes include data synchronization nodes of Data Integration, DataStudio nodes, and other nodes that are related to DataWorks. For information about how to view the information of a MaxCompute compute engine, see Associate a MaxCompute compute engine with a workspace.
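The naming rule for automatically generated data sources can be sketched as a small helper. This is an illustration only, not DataWorks code: the function name `default_data_source_name` and its parameters are hypothetical, and only the name `odps_first` and the `0_Region ID_Compute engine instance name` pattern come from this topic.

```python
def default_data_source_name(is_first_engine: bool,
                             region_id: str = "",
                             instance_name: str = "") -> str:
    """Return the data source name that this topic describes for a
    MaxCompute compute engine associated with a workspace.

    Hypothetical helper for illustration; DataWorks generates the
    name itself when the compute engine is associated.
    """
    if is_first_engine:
        # The first associated compute engine gets the default name.
        return "odps_first"
    if not region_id or not instance_name:
        raise ValueError("region_id and instance_name are required "
                         "for compute engines added later")
    # Later compute engines follow 0_<Region ID>_<instance name>.
    return f"0_{region_id}_{instance_name}"
```

Under these assumptions, a compute engine instance named `my_engine` that is added later in a region whose ID is `cn-shanghai` would map to `0_cn-shanghai_my_engine`. Both sample values are made up for the example.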
Add a MaxCompute data source
- Go to the Data Source page.
- Log on to the DataWorks console.
- In the left-side navigation pane, click Workspaces.
- In the top navigation bar, select the region in which the workspace you want to manage resides. On the Workspaces page, find the workspace, move the pointer over the icon in the Actions column, and then select Workspace Settings.
- In the left-side navigation pane of the page that appears, click Data Source to go to the Data Source page.
Note: You can also go to the Data Sources page in Data Integration to add a data source. However, you can add a data source on the Data Sources page in Data Integration only to the production environment.
- On the Data Source page, click Add data source in the upper-right corner.
- In the Add data source dialog box, click MaxCompute in the Big Data Storage Systems section.
- In the Add MaxCompute data source dialog box, configure the parameters.
| Parameter | Description |
| --- | --- |
| Data Source Name | The name of the data source. The name can contain only letters, digits, and underscores (_), and must start with a letter. |
| Data Source Description | The description of the data source. The description cannot exceed 80 characters in length. |
| Environment | The environment in which the data source is used. Valid values: Development and Production. Note: This parameter is displayed only if the workspace is in standard mode. |
| ODPS Endpoint | The endpoint of the MaxCompute project. DataWorks automatically obtains the endpoint from system configurations. |
| Tunnel Endpoint | The endpoint of the MaxCompute Tunnel service. For more information, see Endpoints. |
| ODPS project name | The name of the MaxCompute project. |
| AccessKey ID | The AccessKey ID of the account that you use to connect to the MaxCompute project. You can view and obtain the AccessKey ID on the Security Management page. Important: You must make sure that the account you use has permissions on the MaxCompute project. For example, you can add the account that you use to the MaxCompute project as a member to obtain permissions on the project. |
| AccessKey Secret | The AccessKey secret of the Alibaba Cloud account that you use to connect to the MaxCompute project. |

- Set Resource Group connectivity to Data Integration.
- Find the desired resource group in the resource group list in the lower part of the dialog box and click Test connectivity in the Actions column. A synchronization node can use only one type of resource group. To make sure that your synchronization nodes run as expected, test the connectivity of every resource group for Data Integration on which the nodes will run. To test the connectivity of multiple resource groups for Data Integration at a time, select the resource groups and click Batch test connectivity. For more information, see Establish a network connection between a resource group and a data source.
Note:
- By default, the resource group list displays only exclusive resource groups for Data Integration. To ensure the stability and performance of data synchronization, we recommend that you use exclusive resource groups for Data Integration.
- If you want to test the network connectivity between the shared resource group or a custom resource group and the data source, click Advanced below the resource group list. In the Warning message, click Confirm. Then, all available shared and custom resource groups appear in the resource group list.
- If the data source passes the network connectivity test, click Complete.
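The constraints on Data Source Name, Data Source Description, and Environment listed in the parameter table can be checked before you fill in the dialog box. The following is a minimal sketch under stated assumptions, not part of DataWorks: the function `validate_data_source_form` is hypothetical, and "letters" is assumed to mean ASCII letters, which this topic does not state explicitly.

```python
import re
from typing import List

# Assumption: "letters" means ASCII letters (A-Z, a-z).
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")
MAX_DESCRIPTION_LENGTH = 80
VALID_ENVIRONMENTS = ("Development", "Production")


def validate_data_source_form(name: str,
                              description: str,
                              environment: str = "Production") -> List[str]:
    """Return a list of problems found; an empty list means the
    values satisfy the rules stated in the parameter table."""
    problems = []
    # Name: only letters, digits, underscores; must start with a letter.
    if NAME_PATTERN.fullmatch(name) is None:
        problems.append("name must start with a letter and contain only "
                        "letters, digits, and underscores")
    # Description: at most 80 characters.
    if len(description) > MAX_DESCRIPTION_LENGTH:
        problems.append("description cannot exceed 80 characters")
    # Environment: shown only for workspaces in standard mode.
    if environment not in VALID_ENVIRONMENTS:
        problems.append("environment must be Development or Production")
    return problems
```

A call such as `validate_data_source_form("odps_dev", "Dev copy of sales data", "Development")` returns an empty list, while a name that starts with a digit is reported as a problem. The check covers only the format rules; endpoint reachability and account permissions still have to be confirmed with the connectivity test in the dialog box.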