On the Workspace Management page of a workspace, you can manage and configure the workspace. DataWorks supports a variety of compute engines, such as MaxCompute, E-MapReduce (EMR), Realtime Compute for Apache Flink, Hologres, Graph Compute, AnalyticDB for PostgreSQL, AnalyticDB for MySQL, and ClickHouse.

Go to the Workspace Management page

  1. Log on to the DataWorks console.
  2. In the left-side navigation pane, click Workspaces.
  3. Go to the Workspace Management page of a workspace.
    You can use one of the following methods to go to the Workspace Management page:
    • On the Workspaces page, find the workspace that you want to configure and click Workspace Settings in the Actions column. In the Workspace Settings panel, click More. The Workspace Management page appears.
    • On the Workspaces page, find the workspace that you want to configure and click Data Analytics in the Actions column. On the DataStudio page, click the Workspace Management icon in the upper-right corner. The Workspace Management page appears.
  4. On the Workspace Management page, configure Basic properties, Scheduling Properties, Security Settings, and Compute Engine Information for the workspace based on your business requirements.

Configure basic properties

Parameter Description
Workspace ID The ID of the workspace.
Workspace Name The name of the workspace. The name must start with a letter and can contain only letters and digits. It is not case-sensitive. The name uniquely identifies the workspace and cannot be changed after the workspace is created.
Status The status of the workspace. Valid values: Normal, Deleted, Initializing, Initialization Failed, Manual Disable, Deleting, Deletion Failed, Suspended (Overdue), Updating, and Update Failed.
Note
  • If a workspace fails to be created, it enters the Initialization Failed state. In this case, you can create the workspace again.
  • A workspace in the Normal state can be disabled by the workspace administrator. After a workspace is disabled, all the features of the workspace become unavailable. However, the data of the workspace is retained, and the committed nodes can run normally.
  • The workspace administrator can click Enable in the Actions column to restore a disabled workspace to the Normal state.
Display Name The display name that is used to identify the workspace. The display name can contain only letters and digits. You can change it based on your requirements.
Creation Time The time when the workspace was created. The value cannot be changed.
Mode The mode of the workspace. Valid values: Basic Mode and Standard Mode.
Note In this example, a workspace in standard mode is used.
Owner The owner of the workspace, who has permissions to delete and disable the workspace. The owner of the workspace cannot be changed.
Description The description of the workspace. You can modify the description based on your requirements. The description can be a maximum of 128 characters in length and can contain letters, special characters, and digits.
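
The naming and length rules in the table above can be checked before a workspace is requested. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes that "letters" means ASCII letters, which the table does not state.

```python
import re

# Assumption: "letters" means ASCII letters; the table does not say so explicitly.
_WORKSPACE_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9]*")
_DISPLAY_NAME_RE = re.compile(r"[A-Za-z0-9]+")
MAX_DESCRIPTION_LENGTH = 128


def is_valid_workspace_name(name: str) -> bool:
    """The name must start with a letter and contain only letters and digits."""
    return _WORKSPACE_NAME_RE.fullmatch(name) is not None


def is_valid_display_name(display_name: str) -> bool:
    """The display name can contain only letters and digits."""
    return _DISPLAY_NAME_RE.fullmatch(display_name) is not None


def is_valid_description(description: str) -> bool:
    """The description can be a maximum of 128 characters in length."""
    return len(description) <= MAX_DESCRIPTION_LENGTH


def same_workspace_name(first: str, second: str) -> bool:
    """Workspace names are not case-sensitive, so compare them case-insensitively."""
    return first.casefold() == second.casefold()
```

Because the workspace name cannot be changed after creation, a check such as `same_workspace_name` can help catch names that differ only in case before a workspace is created.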

Configure scheduling properties

In the Scheduling Properties section, you can enable periodic scheduling for the workspace. You can also specify Default Scheduling Resource Group, Default Data Integration Resource Group, Default Automatic Rerun Times Upon Error, and Default Automatic Rerun Interval Upon Error for the workspace.

Nodes can be periodically run in a workspace only after you turn on Enable periodic scheduling for the workspace.

Configure security settings

Parameter Description
Download SELECT Query Result Specifies whether the query results that are returned by SELECT statements in DataStudio can be downloaded. If you turn off this switch, the query results cannot be downloaded.
Change Node Owner by RAM User Specifies whether to allow RAM users to change the owners of their nodes.
Sandbox Whitelist (contains IP addresses and domain names that can be accessed by Shell nodes) The IP addresses or domain names that can be accessed by a Shell node that runs on the default resource group.
Note You must specify public IP addresses or domain names that can be accessed. For internal services in your enterprise, we recommend that you use exclusive resource groups to ensure network accessibility. For more information, see Exclusive resource group mode.

To add an IP address or domain name to the whitelist, perform the following steps:

  1. In the Security Settings section, click Add.
  2. In the Add dialog box, enter an IP address or domain name in the Address field and a port number in the Port field.
  3. Click Confirm.
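
Before you add an entry, it can help to confirm that the address is public and that the port is in range. The following Python sketch uses only the standard library. The function names are hypothetical, the domain-name pattern is a simplified assumption, and the checks are not the validation that the DataWorks console itself performs.

```python
import ipaddress
import re

# Assumption: a simplified hostname rule (labels of letters, digits, and hyphens
# that neither start nor end with a hyphen).
_LABEL_RE = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_public_ip(address: str) -> bool:
    """Return True if the text is an IP address allocated for public networks."""
    try:
        return ipaddress.ip_address(address).is_global
    except ValueError:
        return False


def looks_like_domain(address: str) -> bool:
    """Return True if the text has the shape of a dotted domain name."""
    labels = address.split(".")
    if len(labels) < 2 or len(address) > 253:
        return False
    if labels[-1].isdigit():
        # A numeric last label means the text is an IP address, not a domain name.
        return False
    return all(_LABEL_RE.fullmatch(label) for label in labels)


def is_valid_whitelist_entry(address: str, port: int) -> bool:
    """Accept a public IP address or a domain name together with a valid port."""
    if not 1 <= port <= 65535:
        return False
    return is_public_ip(address) or looks_like_domain(address)
```

A private address such as 192.168.1.10 is rejected by this sketch, which matches the recommendation to use exclusive resource groups for internal services.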

Associate a MaxCompute compute engine instance with a workspace

  1. In the Compute Engine Information section, click the MaxCompute tab. On this tab, you can view the information about all the available MaxCompute compute engine instances in the workspace.
  2. Click Add Instance.
  3. In the Add a MaxCompute instance dialog box, configure the parameters.
    Parameter or section Description
    Instance Display Name The display name of the compute engine instance. The display name can be a maximum of 27 characters in length. It must start with a letter and can contain letters, underscores (_), and digits.
    Region The region of the workspace.
    Payment mode The billing method of the compute engine instance. Valid values: The pay-as-you-go billing method, Monthly package, and Developer version.
    Note An instance of the developer version cannot be associated with a workspace in standard mode.
    Quota group The quotas of computing resources and disk space for the compute engine instance.
    Production Environment The parameters in this section include Project name and Access Identity.
    • Project name: the name of the MaxCompute project in the production environment that is associated with the DataWorks workspace.
    • Access Identity: the type of the account used to access the MaxCompute project. Valid values: Alibaba Cloud primary account and Alibaba Cloud sub-account.
    Development Environment The parameters in this section include Project name and Access Identity.
    • Project name: the name of the MaxCompute project in the development environment that is associated with the DataWorks workspace.
      Note This MaxCompute project provides computing and storage resources.
    • Access Identity: the type of the account used to access the MaxCompute project. The default value of this parameter is Task owner and cannot be changed.
  4. Click Confirm.
    After the compute engine instance is added, you can configure it as the default instance.
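
Two of the constraints in the preceding parameter descriptions lend themselves to a quick pre-check: the format of Instance Display Name and the restriction on the developer version. The following Python sketch is a hypothetical helper, not part of DataWorks, and it assumes that "letters" means ASCII letters.

```python
import re
from typing import List

# A leading letter plus up to 26 letters, digits, or underscores: 27 characters at most.
# Assumption: "letters" means ASCII letters.
_INSTANCE_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,26}")


def check_maxcompute_instance(display_name: str, payment_mode: str,
                              workspace_mode: str) -> List[str]:
    """Return a list of problems found; an empty list means that no rule is violated."""
    problems = []
    if _INSTANCE_NAME_RE.fullmatch(display_name) is None:
        problems.append(
            "display name must start with a letter, use only letters, digits, "
            "and underscores, and be at most 27 characters long"
        )
    if payment_mode == "Developer version" and workspace_mode == "Standard Mode":
        problems.append(
            "a developer version instance cannot be associated with a workspace "
            "in standard mode"
        )
    return problems
```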

Associate an EMR compute engine instance with a workspace

  1. In the Compute Engine Information section, click the E-MapReduce tab. On this tab, you can view the information about all the available EMR compute engine instances in the workspace.
  2. Click Add Instance.
  3. In the New EMR cluster dialog box, configure the parameters.
    Parameter Description
    Instance Display Name The display name of the EMR cluster.
    Region The region of the workspace. The region cannot be modified.
    Access Mode The access mode of the EMR cluster. Valid values: Shortcut mode and Security mode.
    Note In this example, an EMR cluster in shortcut mode is associated with the workspace.
    Scheduling access identity The identity that is used to commit the code of an EMR node in the production environment to the EMR cluster after the node is committed to the scheduling system of DataWorks. Valid values: Alibaba Cloud primary account and Alibaba Cloud sub-account.
    Note If you select Alibaba Cloud sub-account, you must specify a RAM user to which the AliyunEMRDevelopAccess policy is attached.
    Access identity The identity that is used to commit the code of an EMR node in the development environment to the EMR cluster. Default value: Task owner.
    Note This parameter is available only for workspaces in standard mode.
    Cluster ID The ID of the EMR cluster. Select an ID from the drop-down list. The selected EMR cluster is used as the runtime environment of EMR nodes.
    Project ID The ID of the EMR project that you want to associate with the workspace. Select an ID from the drop-down list. The selected EMR project is used as the runtime environment of EMR nodes.
    Note If Access Mode is set to Security mode, EMR projects cannot be selected.
    YARN resource queue The name of the resource queue in the EMR cluster. Unless otherwise specified, set the parameter to default.
    Endpoint The endpoint of EMR, which cannot be modified.
  4. Click Confirm.
    After the compute engine instance is added, you can configure it as the default instance and modify the instance configuration based on your requirements.
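
The dependencies between the EMR parameters can be summarized in a small validation routine. The following Python sketch is illustrative only: `EmrInstanceConfig` is a hypothetical container for the dialog box values, not a DataWorks API type, and the sample IDs in the usage note are made up.

```python
from dataclasses import dataclass
from typing import List, Optional

ACCESS_MODES = ("Shortcut mode", "Security mode")


@dataclass
class EmrInstanceConfig:
    """Hypothetical container for the values entered in the dialog box."""
    display_name: str
    access_mode: str
    cluster_id: str
    project_id: Optional[str] = None
    yarn_queue: str = "default"  # documented value to use unless otherwise specified


def validate_emr_config(config: EmrInstanceConfig) -> List[str]:
    """Return a list of problems found; an empty list means that no rule is violated."""
    problems = []
    if config.access_mode not in ACCESS_MODES:
        problems.append(f"unknown access mode: {config.access_mode!r}")
    if config.access_mode == "Security mode" and config.project_id is not None:
        problems.append("an EMR project cannot be selected in Security mode")
    if not config.cluster_id:
        problems.append("a cluster ID must be selected")
    if not config.yarn_queue:
        problems.append("the YARN resource queue must not be empty")
    return problems
```

For example, `validate_emr_config(EmrInstanceConfig("emr_dev", "Security mode", "C-EXAMPLE", project_id="FP-EXAMPLE"))` reports that a project cannot be selected in Security mode.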

Associate a Realtime Compute for Apache Flink compute engine instance with a workspace

  1. In the Compute Engine Information section, click the Real-time Computing tab. On this tab, you can view the information about all the available Realtime Compute for Apache Flink compute engine instances in the workspace.
  2. Click Add Instance.
  3. In the Add a real-time computing instance dialog box, configure the parameters.
    Parameter Description
    Instance Display Name The display name of the Realtime Compute for Apache Flink compute engine instance.
    Region The region of the workspace.
    Select Project The Realtime Compute for Apache Flink project that you want to associate with the workspace. Select a project from the drop-down list. If you need to create a project, click Real-time calculation control platform.
  4. Click Confirm.
    After the compute engine instance is added, you can configure it as the default instance and modify the instance configuration based on your requirements.

Associate a Hologres compute engine instance with a workspace

  1. In the Compute Engine Information section, click the Hologres tab. On this tab, you can view the information about all the available Hologres compute engine instances in the workspace.
  2. Click Bind Hologres Database.
  3. In the Bind Hologres Database dialog box, configure the parameters.
    Parameter Description
    Instance Display Name The display name of the Hologres compute engine instance.
    Access identity The identity used to run the code of committed Hologres nodes. Valid values: Alibaba Cloud primary account and Alibaba Cloud sub-account.
    Hologres instance name The name of the Hologres instance that you want to associate with the workspace.
    Database name The name of the database that was created in SQL Console, such as testdb.
    Server The endpoint of the purchased Hologres instance. This value is automatically generated after you select the Hologres instance.
    Port The port of the purchased Hologres instance. This value is automatically generated after you select the Hologres instance.
  4. Click Test Connectivity.
  5. After the connectivity test is passed, click Confirm.

Associate a Graph Compute compute engine instance with a workspace

  1. In the Compute Engine Information section, click the GraphCompute tab.
  2. Click Bind Graph Compute Instance.
    Notice A Graph Compute instance can be associated with only one DataWorks workspace. After a Graph Compute instance is associated with a DataWorks workspace, the instance cannot be used in other DataWorks workspaces.
  3. In the Bind Graph Compute Instance dialog box, configure the parameters.
    Parameter Description
    Instance Display Name The display name of the Graph Compute instance.
    Graph Compute Instance The name of the Graph Compute instance that you want to associate with the workspace.
    Create an instance If you do not have a Graph Compute instance, click Create an instance to purchase a Graph Compute instance.
    Notice By default, each Alibaba Cloud account can purchase only one Graph Compute instance.
  4. Click Bind.

Associate an AnalyticDB for PostgreSQL compute engine instance with a workspace

Notice
  • The AnalyticDB for PostgreSQL compute engine, and therefore the AnalyticDB for PostgreSQL tab, is available only in DataWorks Standard Edition or a more advanced edition.
  • AnalyticDB for PostgreSQL nodes can run only on exclusive resource groups for scheduling.
  1. In the Compute Engine Information section, click the AnalyticDB for PostgreSQL tab.
  2. Click Add Instance.
    For a workspace in standard mode, the development environment is isolated from the production environment. If you are using a workspace in standard mode, you must add instances to both the development and production environments.
  3. In the Add an AnalyticDB for PostgreSQL instance dialog box, configure the parameters. In this example, the workspace is in standard mode.
    Parameter Description
    Instance Display Name The display name of the AnalyticDB for PostgreSQL instance. The display name must be unique.
    InstanceName The name of the AnalyticDB for PostgreSQL instance that you want to associate with the workspace.
    DatabaseName The name of the AnalyticDB for PostgreSQL database that you want to associate with the workspace.
    Username The username that you can use to connect to the database. You can obtain the information from the Account Management page in the AnalyticDB for PostgreSQL console. For more information, see Configure an account.
    Password The password that you can use to connect to the database. You can obtain the information from the Account Management page in the AnalyticDB for PostgreSQL console. For more information, see Configure an account.
    Connectivity Test AnalyticDB for PostgreSQL nodes must be run on exclusive resource groups for scheduling. Therefore, you must select an exclusive resource group for scheduling. For more information, see Exclusive resource group mode.

    Click Test connectivity to test the connectivity between the specified exclusive resource group for scheduling and AnalyticDB for PostgreSQL instance. If no exclusive resource group for scheduling is available, click Create a new exclusive Resource Group to create one.

  4. After the connectivity test is passed, click Confirm.
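
Step 2 requires that a workspace in standard mode has an instance in both the development and production environments. The following Python sketch shows one way to check for a missing environment. The function name and the environment labels are hypothetical, and the sketch assumes that a workspace in any other mode has no such requirement, which this topic does not state.

```python
from typing import Iterable, Set

REQUIRED_ENVIRONMENTS = {"development", "production"}


def missing_environments(workspace_mode: str,
                         bound_environments: Iterable[str]) -> Set[str]:
    """Return the environments that still need an instance."""
    if workspace_mode != "standard":
        # Assumption: only standard mode isolates development from production.
        return set()
    return REQUIRED_ENVIRONMENTS - set(bound_environments)
```

The same check applies to any compute engine for which instances must be added to both environments of a workspace in standard mode.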

Associate an AnalyticDB for MySQL compute engine instance with a workspace

Notice
  • The AnalyticDB for MySQL compute engine, and therefore the AnalyticDB for MySQL tab, is available only in DataWorks Standard Edition or a more advanced edition.
  • AnalyticDB for MySQL nodes can run only on exclusive resource groups for scheduling.
  1. In the Compute Engine Information section, click the AnalyticDB for MySQL tab.
  2. Click Add Instance.
    For a workspace in standard mode, the development environment is isolated from the production environment. If you are using a workspace in standard mode, you must add instances to both the development and production environments.
  3. In the Add an AnalyticDB for MySQL instance dialog box, configure the parameters. In this example, the workspace is in standard mode.
    Parameter Description
    Instance Display Name The display name of the compute engine instance. The value must be unique.
    InstanceName The name of the AnalyticDB for MySQL cluster that you want to associate with the workspace.
    DatabaseName The name of the AnalyticDB for MySQL database that you want to associate with the workspace.
    Username The username that you can use to connect to the database. You can obtain the information from the Accounts page in the Cloud Native Data Warehouse Console. For more information, see Database accounts and permissions.
    Password The password that you can use to connect to the database. You can obtain the information from the Accounts page in the Cloud Native Data Warehouse Console. For more information, see Database accounts and permissions.
    Connectivity Test AnalyticDB for MySQL nodes must be run on exclusive resource groups for scheduling. Therefore, you must select an exclusive resource group for scheduling. For more information, see Exclusive resource group mode.

    Click Test connectivity to test the connectivity between the specified exclusive resource group for scheduling and AnalyticDB for MySQL cluster. If no exclusive resource group for scheduling is available, click Create a new exclusive Resource Group to create one.

  4. After the connectivity test is passed, click Confirm.

Associate a ClickHouse compute engine instance with a workspace

Before you associate a ClickHouse compute engine instance with a workspace, make sure that the following prerequisites are met:
  • A ClickHouse cluster is created.

    For more information about how to create an ApsaraDB for ClickHouse cluster, see Create a cluster.

  • DataWorks is activated, and the workspace that you want to use to connect to the ClickHouse cluster is created. The workspace that is used to connect to the ClickHouse cluster does not need to be associated with compute engines. Therefore, when you create the workspace, you do not need to select an engine. For more information about how to create a workspace, see Create a workspace.
  • An exclusive resource group for scheduling is created and is associated with the DataWorks workspace. For more information, see Create and use an exclusive resource group for scheduling.
    Note We recommend that you associate the exclusive resource group for scheduling with the virtual private cloud (VPC) to which the ClickHouse cluster belongs. If you want to associate the exclusive resource group for scheduling with a different VPC in the same region as the ClickHouse cluster or with a VPC in another region, you must perform complex network connectivity tests. For more information, see Select a network connectivity solution.
  1. In the Compute Engine Information section, click the ClickHouse tab.
  2. Click Add Instance.
    For a workspace in standard mode, the development environment is isolated from the production environment. If you are using a workspace in standard mode, you must add instances to both the development and production environments.
  3. In the Add EMR ClickHouse Cluster dialog box, configure the parameters.
    You can set Cluster Type to Connection String Mode or EMR Cluster Mode for the workspace that DataWorks uses to connect to the ClickHouse cluster. This topic uses a workspace in standard mode to demonstrate how to configure the parameters in each mode.
    • Add a ClickHouse cluster with Cluster Type set to Connection String Mode.
      Parameter Description
      Instance Display Name

      The display name of the compute engine instance. The display name must be unique.

      Cluster Type Set this parameter to Connection String Mode.
      Access Mode

      The parameter value is Shortcut mode and cannot be modified. If the access mode of the ClickHouse compute engine is shortcut mode, the Alibaba Cloud account or RAM user only commits code to the ClickHouse cluster when code is run or nodes are automatically scheduled. The username that you specified in the AccessKey ID section is used to run nodes in DataWorks.

      JDBC URL The Java Database Connectivity (JDBC) connection string that is used to connect to the ClickHouse cluster.
      Username The username that you use to connect to the ClickHouse cluster.

      To view the username of an EMR ClickHouse cluster, log on to the EMR console, click the Cluster Management tab, find the ClickHouse cluster, and then click Details in the Actions column. Then, go to the ClickHouse service page to view the username. The following figure shows how to view the username.

      Password The password that you use to connect to the ClickHouse cluster.

      To view the password of an EMR ClickHouse cluster, log on to the EMR console, click the Cluster Management tab, find the ClickHouse cluster, and then click Details in the Actions column. Then, go to the ClickHouse service page to view the password. The following figure shows how to view the password.

      Exclusive Resource Group for Scheduling

      Select an exclusive resource group for scheduling that connects to the DataWorks workspace. If no exclusive resource group for scheduling is available, you can create one. For more information about how to create an exclusive resource group for scheduling and configure network connectivity, see Create and use an exclusive resource group for scheduling.

    • Add a ClickHouse cluster with Cluster Type set to EMR Cluster Mode.
      Parameter Description
      Instance Display Name

      The display name of the compute engine instance. The display name must be unique.

      Cluster Type Set this parameter to EMR Cluster Mode.
      Access Mode

      The parameter value is Shortcut mode and cannot be modified. If the access mode of the ClickHouse compute engine is shortcut mode, the Alibaba Cloud account or RAM user only commits code to the ClickHouse cluster when code is run or nodes are automatically scheduled. The username that you specified in the AccessKey ID section is used to run nodes in DataWorks.

      Select EMR ClickHouse Cluster Set this parameter to the ID of the EMR ClickHouse cluster that you associate with the workspace.
      Note If you log on to the DataWorks console by using a RAM user, you must grant the AliyunEMRDevelopAccess permission to the RAM user in the RAM console before you select a cluster. For more information about how to grant the RAM user the AliyunEMRDevelopAccess permission, see Authorize RAM users.
      Username The username that you use to connect to the ClickHouse cluster.

      To view the username of an EMR ClickHouse cluster, log on to the EMR console, click the Cluster Management tab, find the ClickHouse cluster, and then click Details in the Actions column. Then, go to the ClickHouse service page to view the username. The following figure shows how to view the username.

      Password The password that you use to connect to the ClickHouse cluster.

      To view the password of an EMR ClickHouse cluster, log on to the EMR console, click the Cluster Management tab, find the ClickHouse cluster, and then click Details in the Actions column. Then, go to the ClickHouse service page to view the password. The following figure shows how to view the password.

      Exclusive Resource Group for Scheduling

      Select an exclusive resource group for scheduling that connects to the DataWorks workspace. If no exclusive resource group for scheduling is available, you can create one. For more information about how to create an exclusive resource group for scheduling and configure network connectivity, see Create and use an exclusive resource group for scheduling.

  4. After the connectivity test is passed, click Confirm.
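
In Connection String Mode, a malformed JDBC URL is a common reason for a failed connectivity test. This topic does not specify the URL format, so the following Python sketch assumes the commonly used form `jdbc:clickhouse://host:port/database`. The parser, its names, and the sample hosts in the usage note are hypothetical and serve only to catch obvious mistakes before the value is entered.

```python
import re
from typing import NamedTuple, Optional


class ClickHouseJdbcUrl(NamedTuple):
    host: str
    port: int
    database: Optional[str]


# Assumed format: jdbc:clickhouse://host:port[/database][?parameters]
_JDBC_RE = re.compile(
    r"jdbc:clickhouse://(?P<host>[^:/?#]+):(?P<port>\d+)"
    r"(?:/(?P<database>[^/?#]*))?(?:\?.*)?"
)


def parse_clickhouse_jdbc_url(url: str) -> ClickHouseJdbcUrl:
    """Split a JDBC URL into host, port, and database, or raise ValueError."""
    match = _JDBC_RE.fullmatch(url)
    if match is None:
        raise ValueError(f"not a ClickHouse JDBC URL in the assumed format: {url!r}")
    port = int(match.group("port"))
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    # An absent or empty database segment is reported as None.
    return ClickHouseJdbcUrl(match.group("host"), port, match.group("database") or None)
```

For example, `parse_clickhouse_jdbc_url("jdbc:clickhouse://ch.example.internal:8123/default")` yields the host, port, and database as separate fields, while a URL for a different database type raises `ValueError`.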