All Products
Search
Document Center

DataWorks:Add an AnalyticDB for PostgreSQL data source

Last Updated:Apr 15, 2024

Before you can develop and manage AnalyticDB for PostgreSQL tasks in DataWorks, you must add an AnalyticDB for PostgreSQL instance to the required DataWorks workspace as a data source. This way, you can use the AnalyticDB for PostgreSQL data source in different services of DataWorks and perform operations such as data synchronization, data development, and data analysis based on the data source.

Prerequisites

  • An AnalyticDB for PostgreSQL instance is purchased, and an AnalyticDB for PostgreSQL database is created in the instance. For more information, see Create an instance and Manage databases.

    Note

    We recommend that you create an AnalyticDB for PostgreSQL instance in the same region as the workspace to which you want to add an AnalyticDB for PostgreSQL data source. If the regions are different, you can add only a cross-region data source to the workspace. The data source cannot be associated with DataStudio for data development or periodic task scheduling. The data source can be used only for data synchronization.

  • The required resource group is purchased and configured. An AnalyticDB for PostgreSQL data source supports only exclusive resource groups for Data Integration, exclusive resource groups for scheduling, and exclusive resource groups for DataService Studio. After the AnalyticDB for PostgreSQL data source is added, you can use the data source in scenarios such as data synchronization, development and scheduling of computing tasks, and generation of DataService Studio APIs. In these scenarios, a resource group for Data Integration, a resource group for scheduling, and a resource group for DataService Studio of DataWorks are separately required.

    You must purchase and configure the required resource group based on the use scenario of the AnalyticDB for PostgreSQL data source and establish a network connection between the data source and resource group in advance. For information about resource groups provided by DataWorks and how to select a resource group, see Overview.

  • A DataWorks workspace is created, or the account that you use is added to the desired workspace as a member.

    You must add the desired AnalyticDB for PostgreSQL instance to the workspace as a data source. This way, you can use the data source to perform data development operations in the workspace. In addition, you must associate the purchased resource group with the workspace and establish a network connection between the resource group and data source. For information about how to create a workspace, see Create and manage workspaces.

    Note

    You can add the same AnalyticDB for PostgreSQL instance to multiple workspaces as a data source.

Limits

  • An AnalyticDB for PostgreSQL data source can be associated with DataStudio only if the data source meets the following conditions: The AnalyticDB for PostgreSQL instance based on which the data source is added resides in the same region and belongs to the same Alibaba Cloud account as the workspace. This way, the AnalyticDB for PostgreSQL data source can be used for data development and periodic task scheduling.

  • You can add an AnalyticDB for PostgreSQL instance that does not belong to the current Alibaba Cloud account to a workspace within the current Alibaba Cloud account as a data source. After the data source is added, you can use only a RAM role to access the related AnalyticDB for PostgreSQL instance. AnalyticDB for PostgreSQL data sources that are added across accounts cannot be used for data development or periodic task scheduling.

  • AnalyticDB for PostgreSQL data sources that are added in connection string mode cannot be used for data development or periodic task scheduling. These data sources can be used only for data synchronization. For more information, see Add a data source.

  • You can use only an exclusive resource group for Data Integration or exclusive resource group for scheduling to run an AnalyticDB for PostgreSQL task. We recommend that you use an exclusive resource group for DataService Studio to create DataService Studio APIs based on an AnalyticDB for PostgreSQL data source. For more information, see Create and use an exclusive resource group for Data Integration, Create and use an exclusive resource group for scheduling, and Create and use an exclusive resource group for DataService Studio.

Preparations: Permission description and configuration

  • If you want to add a data source as a RAM user or by using a RAM role, make sure that the RAM user or RAM role meets one of the following requirements:

  • If you want to add a data source across accounts, you must also make the following permission configurations:

    If you add an AnalyticDB for PostgreSQL data source across Alibaba Cloud accounts, you can use only a RAM role to access the related AnalyticDB for PostgreSQL instance, and you must grant the required permissions to the RAM role.

    Sample scenario

    In this sample scenario, an AnalyticDB for PostgreSQL instance that belongs to Alibaba Cloud Account B is added to a DataWorks workspace that belongs to Alibaba Cloud Account A as a data source, and Alibaba Cloud Account A needs to access the AnalyticDB for PostgreSQL instance.

    • Alibaba Cloud Account A: A DataWorks workspace is created within Alibaba Cloud Account A, and Alibaba Cloud Account A needs to access an AnalyticDB for PostgreSQL instance that is created within Alibaba Cloud Account B.

    • Alibaba Cloud Account B: An AnalyticDB for PostgreSQL instance is created within Alibaba Cloud Account B, and an AnalyticDB for PostgreSQL database is created in the instance. In addition, a RAM role is created within Alibaba Cloud Account B and is assigned to Alibaba Cloud Account A. This way, Alibaba Cloud Account A can access the AnalyticDB for PostgreSQL instance by assuming the RAM role.

    After the preceding operations are performed, you can use Alibaba Cloud Account A to add an AnalyticDB for PostgreSQL instance to DataWorks as a data source and then access the AnalyticDB for PostgreSQL instance.

    Create and grant permissions to a RAM role

    You must use Alibaba Cloud Account B to create a RAM role, grant the RAM role the permissions to view and modify information about the AnalyticDB for PostgreSQL instance, and then assign the RAM role to Alibaba Cloud Account A. This way, Alibaba Cloud Account A can access services within Alibaba Cloud Account B by assuming the RAM role.

    1. Create a RAM role and add Alibaba Cloud Account A as the trusted cloud account of the RAM role. The RAM role is required for Alibaba Cloud Account A to access the AnalyticDB for PostgreSQL instance.

    2. Modify the trust policy of the RAM role to allow Alibaba Cloud Account A to assume the RAM role. The following code shows the document of the modified trust policy:

      {
          "Version": "1",
          "Statement": [
              {
                  "Action": [
                      "sts:AssumeRole",
                      "gpdb:DescribeDBInstanceAttribute",
                      "gpdb:DescribeDBInstances",
                      "gpdb:DescribeResourceUsage",
                      "gpdb:DescribeDBInstanceIPArrayList",
                      "gpdb:DescribeDBClusterIPArrayList",
                      "gpdb:DescribeDBInstancePerformance",
                      "gpdb:DescribeDBInstanceNetInfo",
                      "gpdb:DescribeRegions",
                      "gpdb:ModifySecurityIps"
                  ],
                  "Resource": "*",
                  "Effect": "Allow",
                  "Principal": {
                      "Service": [
                          "ID of Alibaba Cloud Account A@engine.dataworks.aliyuncs.com"
                      ]
                  }
              }
          ]
      }

      For information about how to create a RAM role and modify the trust policy of the RAM role, see Create a RAM role for a trusted Alibaba Cloud account and Edit the trust policy of a RAM role.

Entry points for adding a data source

  1. Go to the Data Source page.

    1. Log on to the DataWorks console. In the left-side navigation pane, click Management Center. On the page that appears, select the desired workspace from the drop-down list and click Go to Management Center.

    2. In the left-side navigation pane of the page that appears, click Data Source. The Data Source page appears.

  2. On the Data Source page, click Add Data Source. In the Add Data Source dialog box, click AnalyticDB for PostgreSQL. On the page that appears, configure the parameters to add an AnalyticDB for PostgreSQL data source.

    You can also go to the Data Sources page in Data Integration to add an AnalyticDB for PostgreSQL data source. You can add a data source only to the production environment on the Data Sources page in Data Integration. After the data source is added, you must manage the data source on the Data Source page in SettingCenter. You can go to Data Integration to view the types of data sources that you can add in this service.

Add a data source

DataWorks allows you to add a data source in one of the following modes.

Note

If you use a workspace in standard mode, you must separately add a data source in the development environment and production environment. For information about workspace modes, see Differences between workspaces in basic mode and workspaces in standard mode.

Mode 1: Alibaba Cloud instance mode

  1. Configure the parameters in the Basic Information section.image.png

    Parameter

    Description

    Data Source Name

    The name of the data source in DataWorks. The name must be unique within the current tenant.

    Configuration Mode

    The mode in which you want to add the data source. Select Alibaba Cloud Instance Mode.

    Alibaba Cloud Account

    Specifies whether the AnalyticDB for PostgreSQL instance you want to use belongs to the current Alibaba Cloud account or another Alibaba Cloud account. Valid values:

    • Current Alibaba Cloud Account: The AnalyticDB for PostgreSQL instance belongs to the current Alibaba Cloud account.

    • Another Alibaba Cloud Account: The AnalyticDB for PostgreSQL instance belongs to another Alibaba Cloud account.

      Note

      If you set this parameter to Another Alibaba Cloud Account, you must add the AnalyticDB for PostgreSQL data source across accounts. After the data source is added, you can use only a RAM role to access the related AnalyticDB for PostgreSQL instance. In addition, the data source cannot be associated with DataStudio for data development or periodic task scheduling. The data source can be used only in Data Integration for data synchronization.

    Other parameters that you must configure vary based on the value of the Alibaba Cloud Account parameter.

    Region

    The region in which the AnalyticDB for PostgreSQL instance that you want to use resides.

    Note

    If the region that you selected is different from the region in which the workspace resides, you cannot associate the data source with DataStudio after you add the data source. In this case, the data source cannot be used for data development or periodic task scheduling. The data source can be used only in Data Integration for data synchronization.

    Other items such as the AnalyticDB for PostgreSQL instance and the access identity

    Other parameters that you must configure vary based on the value of the Alibaba Cloud Account parameter.

    Value of the Alibaba Cloud Account parameter: Current Alibaba Cloud Account

    • Instance: Select the AnalyticDB for PostgreSQL instance that you want to add as a data source from the Instance drop-down list.

    • Database Name, Username, Password: Enter the name of the desired AnalyticDB for PostgreSQL database in the Database Name field, and separately enter the username and password that are used to access the database in the Username and Password fields. You can log on to the AnalyticDB for PostgreSQL to obtain information about the AnalyticDB for PostgreSQL database.

    Value of the Alibaba Cloud Account parameter: Another Alibaba Cloud Account

    • UID Of Alibaba Cloud Account, Instance ID, RAM Role: Enter the UID of the Alibaba Cloud account to which the AnalyticDB for PostgreSQL instance you want to add as a data source belongs in the UID Of Alibaba Cloud Account field, enter the ID of the instance in the Instance ID field, and enter the RAM role that you want to use to access the instance in the RAM Role field. The AnalyticDB for PostgreSQL instance that you specify is added to DataWorks as a data source, and you must use the RAM role that you specify to access the instance.

      Note

      For information about the permission configuration required by a RAM role to access a data source that is added across accounts, see the Preparations: Permission description and configuration in this topic.

    • Database Name, Username, Password: Enter the name of the desired AnalyticDB for PostgreSQL database in the Database Name field, and separately enter the username and password that are used to access the database in the Username and Password fields. You can log on to the AnalyticDB for PostgreSQL to obtain information about the AnalyticDB for PostgreSQL database.

    Note

    If you set this parameter to Another Alibaba Cloud Account, you must add the AnalyticDB for PostgreSQL data source across accounts. After the data source is added, you can use only a RAM role to access the related AnalyticDB for PostgreSQL instance. In addition, the data source cannot be associated with DataStudio for data development or periodic task scheduling. The data source can be used only in Data Integration for data synchronization.

  2. Test the network connectivity between the data source and a resource group.

    Resource groups provided by DataWorks can be classified into resource groups for Data Integration, resource groups for scheduling, and resource groups for DataService Studio based on the use scenarios of the resource groups. For more information about these resource groups, see Overview.

    You can find the resource group that you want to use in the Connection Configuration section based on the scenario in which you want to use the data source, and then test the network connectivity between the resource group and data source. If the network connectivity test fails, tasks that use the data source cannot be run.

Mode 2: Connection string mode

You can add a data source that belongs only to the current Alibaba Cloud account in connection string mode.

  1. Configure the parameters in the Basic Information section.image.png

    Parameter

    Description

    Data Source Name

    The name of the data source in DataWorks. The name must be unique within the current tenant.

    Configuration Mode

    Set this parameter to Connection String Mode.

    JDBC URL

    The JDBC URL that is used to connect to the AnalyticDB for PostgreSQL database. You can log on to the AnalyticDB for PostgreSQL to obtain information about the AnalyticDB for PostgreSQL database and the port number over which you can access the database.

    Username

    The username that you can use to access the database.

    Password

    The password that you can use to access the database.

  2. Test the network connectivity between the data source and a resource group.

    Resource groups provided by DataWorks can be classified into resource groups for Data Integration, resource groups for scheduling, and resource groups for DataService Studio based on the use scenarios of the resource groups. For more information about these resource groups, see Overview.

    You can find the resource group that you want to use in the Connection Configuration section based on the scenario in which you want to use the data source, and then test the network connectivity between the resource group and data source. If the network connectivity test fails, tasks that use the data source cannot be run.

What to do next

After the data source is added, you can perform the following operations based on your business requirements:

  • Develop and schedule computing tasks:

    DataWorks DataStudio and Operation Center provide the capabilities of developing and scheduling AnalyticDB for PostgreSQL tasks. If you want to develop AnalyticDB for PostgreSQL tasks based on the AnalyticDB for PostgreSQL data source or periodically schedule AnalyticDB for PostgreSQL tasks, you must go to the DataStudio page in the DataWorks console and associate the AnalyticDB for PostgreSQL data source with DataStudio.

    Note

    You can associate an AnalyticDB for PostgreSQL data source with DataStudio only if the AnalyticDB for PostgreSQL instance based on which the data source is added resides in the same region and belongs to the same Alibaba Cloud account as the workspace to which the data source is added.

  • Perform data synchronization:

    DataWorks Data Integration provides AnalyticDB for PostgreSQL Reader and AnalyticDB for PostgreSQL Writer for you to read data from and write data to the AnalyticDB for PostgreSQL data source. You can configure a batch synchronization task for the AnalyticDB for PostgreSQL data source in DataStudio or configure a synchronization task for the AnalyticDB for PostgreSQL data source in Data Integration based on your business requirements to perform data synchronization.

  • Manage the data source: You can go to the Data Source page in SettingCenter to perform management operations on the data source. For example, you can edit or delete the data source.