DataWorks: Add a Hologres data source

Last Updated: Apr 26, 2024

Before you can develop and manage Hologres tasks in DataWorks, you must add a Hologres instance to the desired DataWorks workspace as a data source. This way, you can use the Hologres data source in different services of DataWorks and perform operations such as data synchronization, data development, and data analysis based on the Hologres data source.

Prerequisites

  • A Hologres instance is created, and a database is created for the Hologres instance. For more information, see Purchase a Hologres instance and Create a database.

    Note

    We recommend that you create the Hologres instance in the same region as the workspace to which you want to add the Hologres data source. If the regions are different, you can add the instance only as a cross-region data source. A cross-region data source cannot be referenced as a compute engine instance, which means that it cannot be used for computing tasks in DataStudio or Operation Center. It can be used only for data synchronization.

  • The required resource group is purchased and configured. A Hologres data source supports only exclusive resource groups for Data Integration, exclusive resource groups for scheduling, and exclusive resource groups for DataService Studio.

    After the Hologres data source is added, you can use the data source in scenarios such as data synchronization, development and scheduling of computing tasks, and generation of DataService Studio APIs. In these scenarios, a resource group for Data Integration, a resource group for scheduling, and a resource group for DataService Studio of DataWorks are separately required.

    You must purchase and configure the required resource group based on the use scenario of the Hologres data source and establish a network connection between the data source and resource group in advance. For information about resource groups provided by DataWorks and how to select a resource group, see Overview.

  • A DataWorks workspace is created, or the account that you use is added to the desired workspace as a member.

    You must add the desired Hologres instance to the workspace as a data source. This way, you can use the data source to perform data development operations in the workspace. In addition, you must associate the purchased resource group with the workspace and establish a network connection between the resource group and data source. For information about how to create a workspace, see Create and manage workspaces.

    Note

    You can add the same Hologres instance to multiple workspaces as a data source.
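The pairing between use scenarios and resource groups described in the prerequisites can be summarized as a small lookup. The following Python sketch is illustrative only: the scenario keys and the function name `required_resource_group` are hypothetical labels chosen for this example and are not DataWorks API identifiers.

```python
# Hypothetical scenario keys mapped to the resource group type that the
# prerequisites name for each scenario. These keys are not DataWorks identifiers.
REQUIRED_RESOURCE_GROUP = {
    "data_synchronization": "exclusive resource group for Data Integration",
    "task_development_and_scheduling": "exclusive resource group for scheduling",
    "dataservice_api": "exclusive resource group for DataService Studio",
}


def required_resource_group(scenario: str) -> str:
    """Return the resource group type needed for a given use scenario."""
    try:
        return REQUIRED_RESOURCE_GROUP[scenario]
    except KeyError:
        raise ValueError(f"unknown scenario: {scenario!r}") from None


if __name__ == "__main__":
    for name in REQUIRED_RESOURCE_GROUP:
        print(f"{name}: {required_resource_group(name)}")
```

A lookup like this can serve as a checklist when you plan which resource groups to purchase before you add the data source.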

Limits

  • A Hologres data source can be referenced as a compute engine instance, and therefore be used for computing tasks in DataStudio and Operation Center, only if all of the following conditions are met: The Hologres instance based on which the data source is added resides in the same region as the workspace, the instance belongs to the same Alibaba Cloud account as the workspace, and SSL authentication is not enabled for the data source.

  • You can add a Hologres instance that does not belong to the current Alibaba Cloud account to a workspace within the current Alibaba Cloud account as a data source. After the data source is added, you can use only a RAM role to access the related Hologres instance. Hologres data sources that are added across accounts cannot be used for data development or task scheduling.

  • When you add a Hologres data source, you can specify whether to enable SSL authentication for the data source. If transmission encryption is enabled for the Hologres instance that you want to add as a data source, you can enable SSL authentication when you add the Hologres instance as a data source. Hologres data sources for which SSL authentication is enabled cannot be used for data development or task scheduling.

  • You can use only an exclusive resource group for Data Integration or exclusive resource group for scheduling to run a Hologres task that is configured for a Hologres data source. You can use an exclusive resource group for DataService Studio to provide data services of Hologres. For more information, see Create and use an exclusive resource group for Data Integration, Create and use an exclusive resource group for scheduling, or Create and use an exclusive resource group for DataService Studio.
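The limits above amount to a simple decision rule. The following Python sketch encodes that rule under one assumption: data synchronization stays available in every case, which the text states explicitly for cross-region data sources and only implies for the cross-account and SSL cases. The class name, field names, and use labels are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class HologresSourceTraits:
    """Hypothetical summary of how a data source relates to its workspace."""
    same_region: bool    # instance and workspace reside in the same region
    same_account: bool   # instance and workspace belong to the same Alibaba Cloud account
    ssl_enabled: bool    # SSL authentication is enabled for the data source


def allowed_uses(traits: HologresSourceTraits) -> List[str]:
    """Return the uses that the limits permit, sorted alphabetically."""
    # Assumption: data synchronization is available in every case.
    uses = {"data_synchronization"}
    # Referencing the data source as a compute engine instance (DataStudio and
    # Operation Center) requires all three conditions to hold.
    if traits.same_region and traits.same_account and not traits.ssl_enabled:
        uses.add("compute_engine")
    return sorted(uses)
```

For example, `allowed_uses(HologresSourceTraits(True, True, True))` returns only `["data_synchronization"]`, because SSL authentication is enabled.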

Preparations: Permission description and configuration

  1. Configure the required permissions on the DataWorks side.

    Before you add a Hologres data source to DataWorks, you must make sure that your Alibaba Cloud account has the required permissions to perform operations on a data source, such as the permissions to add, remove, and modify a data source and test the network connectivity of a data source.

    • If the AdministratorAccess or AliyunDataWorksFullAccess policy is attached to the account that you use, the account has the permissions to perform the preceding operations.

    • If you want to add a data source as a RAM user or by using a RAM role, you must take note of the following items:

      • If the RAM user or RAM role is assigned the Workspace Owner role, the RAM user or RAM role has the permissions to perform the preceding operations.

      • If the RAM user or RAM role is not assigned the Workspace Owner role, you must assign the O&M or Workspace Administrator role to the RAM user or RAM role. For more information, see Add a RAM user to a workspace as a member and assign roles to the member.

  2. Configure the required permissions on the Hologres side.

    After a Hologres data source is added, you must use the access identity that is specified for the data source to access the related Hologres instance. You must make sure that the Alibaba Cloud account that corresponds to the access identity has operation permissions on the Hologres instance. For information about permissions on a Hologres instance and how to grant a user the permissions on a Hologres instance, see Overview.

  3. Optional. Configure the required permissions if you want to add a Hologres data source across accounts.

    If you add a Hologres data source across Alibaba Cloud accounts, you can use only a RAM role to access the related Hologres instance, and you must grant the required permissions to the RAM role.

    • Example for adding a Hologres data source across Alibaba Cloud accounts:

      In this example, Alibaba Cloud Account A is used to log on to the DataWorks console and add a Hologres instance that belongs to Alibaba Cloud Account B to DataWorks as a data source.

      • Alibaba Cloud Account A: DataWorks is activated within Alibaba Cloud Account A and Alibaba Cloud Account A needs to access the Hologres instance.

      • Alibaba Cloud Account B: A Hologres instance is created within Alibaba Cloud Account B and a Hologres database is created for the instance.

    • Requirements on a RAM role of Alibaba Cloud Account B and permission configuration for the RAM role:

      • You must create a RAM role within Alibaba Cloud Account B. The RAM role is required for Alibaba Cloud Account A to access the Hologres instance.

      • You must add Alibaba Cloud Account A as the trusted cloud account of the RAM role to allow Alibaba Cloud Account A to assume the RAM role.

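      • You must modify the trust policy of the RAM role to allow Alibaba Cloud Account A to assume the RAM role. The following code shows the modified trust policy document. In the Service value, replace ID of Alibaba Cloud Account A with the actual ID of Alibaba Cloud Account A: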

        {
            "Version": "1",
            "Statement": [
                {
                    "Action": [
                        "sts:AssumeRole",
                        "hologram:GetInstance",
                        "hologram:ListInstances",
                        "hologram:ListWarehouses"
                    ],
                    "Resource": "*",
                    "Effect": "Allow",
                    "Principal": {
                        "Service": [
                            "ID of Alibaba Cloud Account A@engine.dataworks.aliyuncs.com"
                        ]
                    }
                }
            ]
        }

        For information about how to create a RAM role and modify the trust policy of the RAM role, see Create a RAM role for a trusted Alibaba Cloud account and Edit the trust policy of a RAM role.
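If you prepare the trust policy for more than one account, generating the document programmatically helps avoid typos in the principal. The following Python sketch only reproduces the policy document shown above and fills in the account ID. The function name `build_trust_policy` and the sample UID are hypothetical, and the sketch does not call any Alibaba Cloud API.

```python
import json


def build_trust_policy(account_a_uid: str) -> dict:
    """Build the trust policy document shown above for the given account ID."""
    uid = account_a_uid.strip()
    if not uid:
        raise ValueError("the ID of Alibaba Cloud Account A must not be empty")
    return {
        "Version": "1",
        "Statement": [
            {
                # Actions, resource, and effect are copied from the policy above.
                "Action": [
                    "sts:AssumeRole",
                    "hologram:GetInstance",
                    "hologram:ListInstances",
                    "hologram:ListWarehouses",
                ],
                "Resource": "*",
                "Effect": "Allow",
                "Principal": {
                    "Service": [f"{uid}@engine.dataworks.aliyuncs.com"]
                },
            }
        ],
    }


if __name__ == "__main__":
    # "1234567890123456" is a placeholder, not a real account ID.
    print(json.dumps(build_trust_policy("1234567890123456"), indent=4))
```

You can paste the printed JSON into the trust policy editor of the RAM role within Alibaba Cloud Account B.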

Add a data source

  1. Go to the Data Source page.

    1. Log on to the DataWorks console. In the left-side navigation pane, click Management Center. On the page that appears, select the desired workspace from the drop-down list and click Go to Management Center.

    2. In the left-side navigation pane of the page that appears, click Data Source. The Data Source page appears.

  2. On the Data Source page, click Add Data Source. In the Add Data Source dialog box, click Hologres. On the page that appears, configure the parameters.

  3. Configure information for the Hologres data source.

    Configure parameters such as Data Source Name in the Basic Information section. The following table describes the parameters that you must configure.

    Note

    If you use a workspace in standard mode, you must separately add data sources in the development environment and production environment. For information about the workspace modes, see Differences between workspaces in basic mode and workspaces in standard mode.

    Parameter

    Description

    Data Source Name

    The name of the data source in DataWorks. The name must be unique within the current tenant.

    Authentication Method

    For a new data source, the value of this parameter is fixed as Alibaba Cloud account and Alibaba Cloud RAM role.

    Note

    For an existing data source that is added by using an AccessKey pair, we recommend that you change the value of this parameter to Alibaba Cloud account and Alibaba Cloud RAM role for the data source.

    Alibaba Cloud Account

    Specifies whether the Hologres instance you want to use belongs to the current Alibaba Cloud account or another Alibaba Cloud account. Valid values:

    • Current Alibaba Cloud Account: The Hologres instance belongs to the current Alibaba Cloud account.

    • Another Alibaba Cloud Account: The Hologres instance belongs to another Alibaba Cloud account.

      Note

      If you set this parameter to Another Alibaba Cloud Account, you must add the Hologres data source across accounts. After the Hologres data source is added, you can use only a RAM role to access the related Hologres instance.

    Region

    The region in which the Hologres instance that you want to use resides.

    Note

    If the region that you select is different from the region in which the workspace resides, you cannot reference the data source as a compute engine instance of the workspace after you add the data source. This means that the data source cannot be used in DataStudio or Operation Center and can be used only in Data Integration for data synchronization.

    Other items such as the Hologres instance and default access identity

    The other parameters that you must configure vary based on the value of the Alibaba Cloud Account parameter.

    Value of the Alibaba Cloud Account parameter: Current Alibaba Cloud Account

    • Hologres Instance and Database Name: Select the Hologres instance that you want to add as a data source from the Hologres Instance drop-down list and enter the name of the desired Hologres database in the Database Name field. You can log on to the Hologres console, and obtain the Hologres instance and database information on the details page of the Hologres instance.

    • Default Access Identity: The default access identity that is used to access the data source.

      • Development environment: The value of this parameter is fixed as Executor. Executor indicates the current logon account.

        For example, if you create and debug a Hologres task on the DataStudio page in the DataWorks console, the default access identity that is used to access Hologres is the Alibaba Cloud account used to log on to the DataWorks console.

      • Production environment: The value of this parameter can be Alibaba Cloud Account, Alibaba Cloud RAM User, or Alibaba Cloud RAM role.

        Note

        For information about how to use a RAM role to perform operations, see (Advanced) Use a RAM role to log on to the DataWorks console and use DataWorks.

        The default access identities that are displayed in the Default Access Identity drop-down list vary based on the account that you use to add the Hologres data source.

        When a Hologres task is periodically scheduled in Operation Center, the default access identity that you selected is used to access the related Hologres instance.

    Value of the Alibaba Cloud Account parameter: Another Alibaba Cloud Account

    Note

    If you want to add the Hologres data source across accounts, you must set this parameter to Another Alibaba Cloud Account. After the Hologres data source is added, you can use only a RAM role to access the related Hologres instance. Hologres data sources that are added across accounts cannot be used for data development or task scheduling.

    • UID Of Alibaba Cloud Account and RAM Role: From the Alibaba Cloud Primary Account UID drop-down list, select the UID of the other Alibaba Cloud account to which the Hologres instance belongs. Then, from the RAM Role drop-down list, select the RAM role that you want to use to access the Hologres instance. The selected RAM role is the only identity that can be used to access the Hologres instance.

    • Hologres Instance and Database Name: Enter the ID of the Hologres instance that you want to add to the current workspace as a data source in the Hologres Instance field, and then enter the name of the desired Hologres database in the Database Name field. You can log on to the Hologres console, and obtain the Hologres instance and database information on the details page of the Hologres instance.

    Authentication Method and SSL Authentication

    Specifies whether to enable SSL authentication for the Hologres data source and whether to implement encrypted transmission for the Hologres data source.

    • If you want to set the Authentication Method parameter to SSL Authentication, you must make sure that encrypted transmission is enabled for the Hologres instance that you use. If encrypted transmission is not enabled for the Hologres instance and you set the Authentication Method parameter to SSL Authentication, an error is reported when you access the Hologres instance.

    • If you enable SSL authentication for the Hologres data source, the Hologres data source cannot be used for data development or task scheduling after it is added.

  4. Test the network connectivity between the Hologres data source and a resource group.

    Resource groups provided by DataWorks can be classified into resource groups for Data Integration, resource groups for scheduling, and resource groups for DataService Studio based on the use scenarios of the resource groups. For more information about these resource groups, see Overview.

    You can find the resource group that you want to use in the Connection Configuration section and test the network connectivity between the data source and resource group. If the network connectivity fails, tasks that use the data source cannot be run.
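Before you fill in the dialog box, it can help to check the planned values against the rules in the parameter descriptions above. The following Python sketch is a local checklist only: the `HologresDataSourceForm` class, its field names, and the `review_form` function are hypothetical and do not correspond to any DataWorks API or to the exact console field identifiers.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class HologresDataSourceForm:
    """Hypothetical stand-in for the values entered in the console."""
    data_source_name: str
    region: str
    hologres_instance: str
    database_name: str
    another_account: bool = False      # Alibaba Cloud Account = Another Alibaba Cloud Account
    account_uid: Optional[str] = None  # required only for cross-account data sources
    ram_role: Optional[str] = None     # required only for cross-account data sources
    ssl_authentication: bool = False


def review_form(form: HologresDataSourceForm,
                workspace_region: str) -> Tuple[List[str], List[str]]:
    """Return (errors, notes): missing values and usage restrictions."""
    errors: List[str] = []
    notes: List[str] = []
    required = (
        ("Data Source Name", form.data_source_name),
        ("Hologres Instance", form.hologres_instance),
        ("Database Name", form.database_name),
    )
    for label, value in required:
        if not value.strip():
            errors.append(f"{label} is required")
    if form.another_account:
        if not form.account_uid:
            errors.append("UID of the other Alibaba Cloud account is required")
        if not form.ram_role:
            errors.append("RAM Role is required for a cross-account data source")
        notes.append("cross-account: not usable for data development or task scheduling")
    if form.region != workspace_region:
        notes.append("cross-region: usable only for data synchronization")
    if form.ssl_authentication:
        notes.append("SSL authentication: requires encrypted transmission on the instance; "
                     "not usable for data development or task scheduling")
    return errors, notes
```

An empty `errors` list means that the values named in the table are present. The `notes` list repeats the restrictions that apply to the resulting data source, so that you can confirm them before you test network connectivity.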

What to do next

Before you start data development, we recommend that you read Usage notes for development of Hologres tasks in DataWorks. The topic covers the procedure for using Hologres in DataWorks, the fees for data development with Hologres, environment preparation, and permission management.

After the data source is added, you can perform the following operations based on your business requirements:

  • Develop and schedule computing tasks:

    DataWorks DataStudio and Operation Center provide the capabilities of developing and scheduling Hologres tasks. If you want to develop Hologres tasks based on the Hologres data source or periodically schedule Hologres tasks, you must go to the DataStudio page in the DataWorks console and associate the Hologres data source with DataStudio.

    Note

    You can associate a Hologres data source with DataStudio only if the Hologres instance based on which the data source is added resides in the same region and belongs to the same Alibaba Cloud account as the workspace to which the data source is added.

  • Perform data synchronization:

    DataWorks Data Integration provides Hologres Reader and Hologres Writer for you to read data from and write data to the Hologres data source. You can configure a batch or real-time synchronization task for the Hologres data source in DataStudio or configure a synchronization task for the Hologres data source in Data Integration based on your business requirements to perform data synchronization.

  • Manage the data source: You can go to the Data Source page in SettingCenter to perform management operations on the data source. For example, you can edit or delete the data source.