On the Workspaces page in the DataWorks console, you can view all the workspaces within your account and perform relevant operations. For example, you can create, configure, delete, enable, and disable workspaces, and refresh the workspace list.

Go to the Workspaces page

  1. Log on to the DataWorks console by using your Alibaba Cloud account. The Overview page appears.
  2. In the left-side navigation pane, click Workspaces. On the Workspaces page, you can view all the workspaces within your Alibaba Cloud account.
    • Status: This column displays the status of each workspace. The status of a workspace may be Normal, Initializing, Initialization Failed, Deleting, Deleted, Disabled, or Update Failed. After you create a workspace, the workspace enters the Initializing state. Then, it enters the Initialization Failed or Normal state based on the initialization result.

      After you disable a workspace, you can enable it again or delete it. After you enable the workspace again, the workspace enters the Normal state.

    • Service: This column displays icons for the types of compute engine instances that are associated with the workspace. When you move the pointer over an icon, the icons for all associated compute engine instance types are displayed. A blue icon indicates that the compute engine instance of that type is available. A red icon with an overdue payment mark indicates that the instance has an overdue payment. A dimmed icon indicates that the instance had an overdue payment and has been deleted. In most cases, a compute engine instance is automatically deleted if you do not renew it within seven days after its payment becomes overdue.
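The status lifecycle and the Service icon rules above can be summarized as small lookups. The following Python sketch is illustrative only: the action names are hypothetical labels rather than DataWorks API values, it models only the transitions described on this page, and the step from Disabled to Deleting is an assumption.

```python
from datetime import date, timedelta

# Status transitions described on this page. Illustrative only: this is not
# DataWorks' internal state machine, and Deleted and Update Failed are not modeled.
TRANSITIONS = {
    ("Initializing", "initialization succeeded"): "Normal",
    ("Initializing", "initialization failed"): "Initialization Failed",
    ("Normal", "disable"): "Disabled",
    ("Disabled", "enable"): "Normal",
    ("Disabled", "delete"): "Deleting",  # assumption: deletion passes through Deleting
}


def next_status(current: str, action: str) -> str:
    """Return the status after `action`, or raise if this page describes no such transition."""
    try:
        return TRANSITIONS[(current, action)]
    except KeyError:
        raise ValueError(f"no documented transition from {current!r} via {action!r}") from None


def service_icon(overdue: bool, deleted: bool) -> str:
    """Map a compute engine instance's billing state to its icon in the Service column."""
    if overdue and deleted:
        return "dimmed"
    if overdue:
        return "red, with overdue payment mark"
    return "blue"


def typical_auto_deletion_date(overdue_since: date, grace_days: int = 7) -> date:
    """Date by which an unrenewed overdue instance is usually deleted ("in most cases")."""
    return overdue_since + timedelta(days=grace_days)


print(next_status("Initializing", "initialization succeeded"))  # Normal
print(service_icon(overdue=True, deleted=False))               # red, with overdue payment mark
```

The seven-day grace period is a parameter because the page says deletion happens after seven days only "in most cases".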

Create a workspace

  1. In the top navigation bar of the Workspaces page, select the region in which you want to create a workspace.
    Note
    • The time zone of the region that you select is automatically used as the time zone for scheduling. This means that the scheduling times that you configure for nodes are interpreted in this time zone.
    • If you select the US (Silicon Valley) or Germany (Frankfurt) region to create a workspace for the first time, a message appears. In the message, you can submit a ticket to change the time zone for scheduling to UTC+8.
    • Take note of the following items when you change the time zone for scheduling:
      • The time zone for scheduling can be changed only once. Proceed with caution.
      • Impact on scheduling and displayed times:
        • The time zone that is used when you configure the scheduling time for a node on the DataStudio page is changed. For more information about how to configure the scheduling time for a node, see Configure time properties.
        • The time zone for the time that is displayed on pages in Operation Center is changed. For more information about Operation Center, see Overview.
        All other pages in DataStudio and the pages of other DataWorks modules continue to use the time zone of the region in which the workspace resides.
      • Impact on users:

        A change to the time zone for scheduling applies to an entire region. After the change, the new time zone is used to configure node scheduling times and to display times in Operation Center for all workspaces of all users in that region.
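To see why the change matters, consider a node that is configured to run at 02:00. The following Python sketch interprets that configured time first in the region's own time zone and then in UTC+8. It assumes that the US (Silicon Valley) region uses Pacific Time and, as a simplification, models both zones as fixed UTC offsets without daylight saving time. It does not call any DataWorks API.

```python
from datetime import date, datetime, time, timedelta, timezone

# Fixed UTC offsets are a simplification: real regional time zones such as
# Pacific Time observe daylight saving time, which this sketch ignores.
SILICON_VALLEY_STANDARD = timezone(timedelta(hours=-8), "UTC-8")
UTC_PLUS_8 = timezone(timedelta(hours=8), "UTC+8")


def scheduled_instant_utc(run_date: date, configured: time, scheduling_tz: timezone) -> datetime:
    """Interpret a node's configured scheduling time in `scheduling_tz` and express it in UTC."""
    local = datetime.combine(run_date, configured, tzinfo=scheduling_tz)
    return local.astimezone(timezone.utc)


# The same configured time (02:00) denotes different moments before and after the change.
before = scheduled_instant_utc(date(2024, 3, 1), time(2, 0), SILICON_VALLEY_STANDARD)
after = scheduled_instant_utc(date(2024, 3, 1), time(2, 0), UTC_PLUS_8)
print(before.isoformat())  # 2024-03-01T10:00:00+00:00
print(after.isoformat())   # 2024-02-29T18:00:00+00:00
```

In this example, the configured time stays 02:00, but the node effectively runs 16 hours earlier, which is why the change can be made only once and should be planned carefully.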

  2. Click Create Workspace. The Basic Settings step of the Create Workspace wizard appears. Configure the parameters and click Next.
    Parameters in the Basic Information section:
    • Workspace Name: The name of the workspace. The name must be 3 to 23 characters in length, can contain letters, digits, and underscores (_), and must start with a letter.
    • Display Name: The display name of the workspace. The display name can be up to 23 characters in length, can contain letters, digits, and underscores (_), and must start with a letter.
    • Mode: The mode of the workspace. Valid values: Basic Mode (Production Environment Only) and Standard Mode (Development and Production Environments).
      • Basic Mode (Production Environment Only): A workspace in basic mode is associated with only one MaxCompute project. Workspaces in basic mode do not isolate the development environment from the production environment. In these workspaces, you can perform only basic data development, and you cannot strictly control the data development process or the permissions on tables.
      • Standard Mode (Development and Production Environments): A workspace in standard mode is associated with two MaxCompute projects. One serves as the development environment, and the other serves as the production environment. Workspaces in standard mode allow you to develop code in a standardized way and strictly control the permissions on tables. These workspaces restrict table operations in the production environment to protect data security.
      For more information, see Basic mode and standard mode.
    • Description: The description of the workspace.
    Parameters in the Advanced Settings section:
    • Download SELECT Query Result: Specifies whether the query results that are returned by SELECT statements in DataStudio can be downloaded. If you turn off this switch, the query results cannot be downloaded. You can change this setting in the Workspace Settings panel after the workspace is created. For more information, see Configure security settings.
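The naming rules in step 2 can be checked before you submit the form. The following Python sketch encodes them as regular expressions. It assumes that "letters" means ASCII letters and that a display name needs at least one character; the console itself remains the authority on which names it accepts.

```python
import re

# Assumption: "letters" means the ASCII letters A-Z and a-z.
WORKSPACE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]{2,22}")  # 3 to 23 characters in total
DISPLAY_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,22}")    # 1 to 23 characters in total (minimum assumed)

# Number of MaxCompute projects that each workspace mode is associated with.
MAXCOMPUTE_PROJECTS_PER_MODE = {"Basic Mode": 1, "Standard Mode": 2}


def is_valid_workspace_name(name: str) -> bool:
    """Workspace Name: starts with a letter, then letters, digits, or underscores."""
    return WORKSPACE_NAME.fullmatch(name) is not None


def is_valid_display_name(name: str) -> bool:
    """Display Name: the rules as stated on this page, up to 23 characters."""
    return DISPLAY_NAME.fullmatch(name) is not None


print(is_valid_workspace_name("sales_dw01"))  # True
print(is_valid_workspace_name("1_sales"))     # False: must start with a letter
print(is_valid_display_name("sales-dw"))      # False: hyphens are not allowed
```

`re.fullmatch` is used instead of a `^...$` pattern so that a trailing newline cannot slip through.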
  3. In the Select Engines and Services step, select compute engines and services based on your business requirements and click Next.
    DataWorks is a commercial service. If you have not activated DataWorks in a region, activate it before you create a workspace in that region.
    DataWorks Services: The services that are enabled for the workspace. By default, the check box in this section is selected.
    • Data Integration: Provides a stable, efficient, and scalable data synchronization platform that is designed to efficiently transmit and synchronize data between heterogeneous data sources in complex network environments. For more information, see Data Integration.
    • Data Analytics: Allows you to design a data computing process that consists of multiple interdependent nodes based on your business requirements. The nodes are run by the scheduling system of DataWorks. For more information, see DataStudio.
    • Operation Center: Allows you to view all your nodes and node instances and perform operations on them. For more information, see Operation Center.
    • Data Quality: Provides an end-to-end data quality solution based on DataWorks. This solution allows you to explore data, compare data, monitor data quality, scan SQL statements, and use intelligent alerting. For more information, see Data Quality.
    Compute Engines:
    • MaxCompute: Provides a fast, fully managed data warehouse solution that can process terabytes or petabytes of data. MaxCompute supports fast computing on large amounts of data, reduces costs for enterprises, and ensures data security. For more information, see the MaxCompute documentation.
      Note After you create a workspace in DataWorks, you must associate it with a MaxCompute project. Otherwise, the error project not found is returned when you run commands in the workspace.
    • Realtime Compute: Allows you to develop streaming computing nodes in DataWorks.
    • E-MapReduce: Allows you to use E-MapReduce (EMR) to develop big data processing nodes in DataWorks. For more information, see the EMR documentation.
    • Hologres: Allows you to use HoloStudio in DataWorks to manage internal and foreign tables and develop Hologres SQL nodes.
    • Graph Compute: Allows you to use Graph Studio in DataWorks to manage Graph Compute instances.
    • AnalyticDB for PostgreSQL: Allows you to develop AnalyticDB for PostgreSQL nodes in DataWorks. For more information, see Overview.
      Note You can use the AnalyticDB for PostgreSQL compute engine only in DataWorks Standard Edition or a more advanced edition.
    • AnalyticDB for MySQL: Allows you to develop AnalyticDB for MySQL nodes in DataWorks. For more information about AnalyticDB for MySQL, see Product introduction.
      Note You can use the AnalyticDB for MySQL compute engine only in DataWorks Standard Edition or a more advanced edition.
    Machine Learning Services:
    • PAI Studio: Uses statistical algorithms to learn from large amounts of historical data and generate empirical models that inform business decisions.
  4. In the Engine Details step, configure the parameters for the selected compute engines.
    • Associate a MaxCompute compute engine instance with a workspace
      Parameters:
      • Method: Specifies whether to create a MaxCompute project or use an existing one. Valid values: Create Project and Associate Existing Project.
      • Instance Display Name: The display name of the MaxCompute compute engine instance. The display name must be 3 to 28 characters in length, can contain letters, digits, and underscores (_), and must start with a letter.
      • Region: The region of the workspace.
      • Payment mode: The billing method of the MaxCompute compute engine instance. Valid values: Pay-as-you-go, Monthly package, and Developer version.
        Note A MaxCompute compute engine instance of the developer version cannot be associated with a workspace in standard mode.
      • Quota group: The quotas of computing resources and disk space for the MaxCompute compute engine instance.
      • MaxCompute data type: The data type edition of the MaxCompute compute engine instance. Valid values: 2.0 data type (recommended), 1.0 data type (for users who already use the 1.0 data type), and Hive compatible types (for users who migrate from Hive). For more information, see Data type editions.
      • Whether to encrypt: Specifies whether to encrypt the MaxCompute compute engine instance.
      • Production Environment: Configure the MaxCompute Project Name and Access Identity parameters for the production environment.
        • MaxCompute Project Name: the name of the MaxCompute project that you want to associate with the workspace as the compute engine instance for the production environment.
        • Access Identity: the identity that is used to access the MaxCompute project. Valid values: Alibaba Cloud primary account, Alibaba Cloud sub-account, and Alibaba Cloud RAM role.
      • Development Environment: Configure the MaxCompute Project Name and Access Identity parameters for the development environment.
        • MaxCompute Project Name: the name of the MaxCompute project that you want to associate with the workspace as the compute engine instance for the development environment.
          Note This MaxCompute project provides computing and storage resources.
        • Access Identity: The default value is Task owner and cannot be changed.
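Several MaxCompute parameters constrain one another. The following Python sketch collects the documented checks in one place: the display name rule, the ban on developer-version instances in standard-mode workspaces, the requirement for a development project in standard mode, and the fixed development access identity. The MaxComputeBinding class and its field names are hypothetical and do not correspond to a DataWorks API.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

DISPLAY_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]{2,27}")  # 3 to 28 characters in total


@dataclass
class MaxComputeBinding:
    """Hypothetical model of the MaxCompute Engine Details form."""
    instance_display_name: str
    payment_mode: str     # "Pay-as-you-go", "Monthly package", or "Developer version"
    workspace_mode: str   # "Basic" or "Standard"
    prod_project: str
    dev_project: Optional[str] = None
    dev_access_identity: str = "Task owner"


def validate(binding: MaxComputeBinding) -> List[str]:
    """Return a list of rule violations; an empty list means the documented checks pass."""
    problems = []
    if not DISPLAY_NAME.fullmatch(binding.instance_display_name):
        problems.append("display name must be 3-28 letters, digits, or underscores, starting with a letter")
    if binding.workspace_mode == "Standard":
        if binding.payment_mode == "Developer version":
            problems.append("developer version cannot be associated with a standard-mode workspace")
        if not binding.dev_project:
            problems.append("standard mode needs a MaxCompute project for the development environment")
        if binding.dev_access_identity != "Task owner":
            problems.append("development access identity is fixed to Task owner")
    return problems


ok = MaxComputeBinding("mc_prod_engine", "Pay-as-you-go", "Standard", "sales_prod", "sales_dev")
print(validate(ok))  # []
```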
    • Associate an EMR compute engine instance with a workspace
      Parameters:
      • Instance Display Name: The display name of the EMR compute engine instance.
      • Region: The region of the workspace.
      • Access Mode: Valid values: Shortcut mode and Security mode.
        • In shortcut mode, if you run or schedule EMR nodes in DataWorks by using an Alibaba Cloud account or a RAM user, the code of the nodes is committed to the EMR compute engine instance and run by a Hadoop user in the EMR compute engine instance.
        • In security mode, if you run or schedule EMR nodes in DataWorks by using an Alibaba Cloud account or a RAM user, the code of the nodes is committed to the EMR compute engine instance and run by the user in the EMR compute engine instance that has the same name as the Alibaba Cloud account or RAM user. You can use EMR Ranger to manage the permissions of each user in the EMR compute engine instance. This allows Alibaba Cloud accounts, node owners, and RAM users to have different data permissions when they run EMR nodes in DataWorks, which improves data security.
      • Scheduling access identity:
        • If you set Access Mode to Shortcut mode, you can use an Alibaba Cloud account or a RAM user to commit the code of an EMR node to the EMR compute engine instance after the node is committed and deployed to the scheduling system of DataWorks.
        • If you set Access Mode to Security mode, you can use an Alibaba Cloud account, a RAM user, or the node owner to commit the code of an EMR node to the EMR compute engine instance after the node is committed and deployed to the production environment. The code is run by the Hadoop user in the EMR compute engine instance that corresponds to this identity.
        Note
        • This parameter is available only for the production environment.
        • Before you associate an EMR compute engine instance with a workspace, you must attach the AliyunEMRDevelopAccess policy to workspace roles such as developers and administrators so that these roles can create and run EMR nodes in DataStudio.
          • The AliyunEMRDevelopAccess policy is attached to Alibaba Cloud accounts by default.
          • If you want to use a RAM user to run EMR nodes, you must attach the AliyunEMRDevelopAccess policy to the RAM user. For more information, see Grant permissions to RAM users.
      • Access identity: The identity that is used to commit the code of an EMR node in the development environment to the EMR compute engine instance. Default value: Task owner.
        Note
        • This parameter is available only for the development environment of a workspace in standard mode.
        • Task owner can be an Alibaba Cloud account or a RAM user. The AliyunEMRDevelopAccess policy requirement described for the Scheduling access identity parameter also applies to these identities.
      • Cluster ID: The ID of the EMR cluster that you want to associate with the workspace as the compute engine instance. Select an ID from the drop-down list. The EMR cluster is used as the runtime environment of EMR nodes.
      • Project ID: The ID of the EMR project that you want to associate with the workspace. Select an ID from the drop-down list. The EMR project is used as the runtime environment of EMR nodes.
        Note If you set Access Mode to Security mode, no EMR project IDs are available for selection.
      • YARN resource queue: The name of the YARN resource queue in the EMR cluster. Unless you have specific requirements, set this parameter to default.
      • Override DataStudio YARN resource queue:
        • If this check box is selected, all EMR nodes run in the specified YARN resource queue.
        • If this check box is not selected:
          • If you configure the queue parameter for an EMR node on the Advanced Settings tab, the node runs in the YARN resource queue that is configured for the node.
          • If you do not configure the queue parameter for an EMR node, or if you delete the setting of the queue parameter on the Advanced Settings tab, the node runs in the specified YARN resource queue.
        Note If the Override DataStudio YARN resource queue check box is not displayed, submit a ticket to upgrade the DataWorks edition.
      • Endpoint: The endpoint of the EMR cluster. This value cannot be changed.
      • Resource Group: Select an exclusive resource group for scheduling that is connected to the DataWorks workspace. If no exclusive resource group for scheduling is available, create one. For more information about how to create an exclusive resource group for scheduling and configure network connectivity, see Create and use an exclusive resource group for scheduling.

        After you select an exclusive resource group for scheduling, click Test Connectivity to test the connectivity between the exclusive resource group for scheduling and the EMR cluster. After the connectivity test passes, the system initializes the exclusive resource group for scheduling.

        Note The exclusive resource group for scheduling must be reinitialized if the configuration of the EMR cluster is modified.
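The YARN queue rules and the Access Mode behavior can be read as two small decision functions. In this Python sketch, the function names are hypothetical, and the literal user name hadoop for shortcut mode is an assumption: the page only says that a Hadoop user runs the code.

```python
from typing import Optional


def effective_yarn_queue(engine_queue: str, override: bool, node_queue: Optional[str]) -> str:
    """Return the YARN queue an EMR node runs in under the Override DataStudio YARN resource queue rules."""
    if override:
        return engine_queue  # check box selected: every EMR node uses the engine-level queue
    if node_queue:
        return node_queue    # the node sets `queue` on its Advanced Settings tab
    return engine_queue      # queue not set on the node, or its setting was deleted


def run_as_user(access_mode: str, submitter: str) -> str:
    """Cluster-side user that runs the node code, per the Access Mode description."""
    if access_mode == "Shortcut mode":
        return "hadoop"      # assumption: the Hadoop user is literally named "hadoop"
    if access_mode == "Security mode":
        return submitter     # user with the same name as the Alibaba Cloud account or RAM user
    raise ValueError(f"unknown access mode: {access_mode!r}")


print(effective_yarn_queue("default", override=False, node_queue="etl"))  # etl
print(run_as_user("Security mode", "alice"))                              # alice
```

Treating an empty string the same as an unset queue mirrors the case in which you delete the queue setting on the Advanced Settings tab.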
    • Associate a Hologres compute engine instance with a workspace
      Parameters:
      • Instance Display Name: The display name of the Hologres compute engine instance.
      • Access identity:
        • Production environment: the identity that is used to run the code of committed Hologres nodes. Valid values: Alibaba Cloud primary account and Alibaba Cloud sub-account. This setting is available only for the production environment.
        • Development environment: the default value is Task owner.
      • Hologres instance name: The name of the Hologres instance that you want to associate with the workspace as the compute engine instance.
      • Database name: The name of the database that you created in SQL Console, such as testdb.
    • Associate a Graph Compute compute engine instance with a workspace
      Parameters:
      • Instance Display Name: The display name of the Graph Compute compute engine instance. The display name must be 3 to 27 characters in length, can contain letters, digits, and underscores (_), and must start with a letter.
      • Graph Compute Instance Name: The name of the Graph Compute instance that you want to associate with the workspace as the compute engine instance. If you do not have a Graph Compute instance, click Create an instance to purchase one.
        Notice By default, each Alibaba Cloud account can purchase only one Graph Compute instance.
    • Associate an AnalyticDB for PostgreSQL compute engine instance with a workspace
      Parameters:
      • Instance Display Name: The display name of the AnalyticDB for PostgreSQL compute engine instance. The display name must be unique.
      • InstanceName: The name of the AnalyticDB for PostgreSQL instance that you want to associate with the workspace as the compute engine instance.
      • DatabaseName: The name of the AnalyticDB for PostgreSQL database that you want to associate with the workspace.
      • Username: The username of the database account that is used to connect to the database. You can find it on the Account Management page in the AnalyticDB for PostgreSQL console. For more information, see Create a database account.
      • Password: The password of the database account. Database accounts are managed on the Account Management page in the AnalyticDB for PostgreSQL console. For more information, see Create a database account.
      • Connectivity Test: AnalyticDB for PostgreSQL nodes must run on exclusive resource groups for scheduling, so you must select an exclusive resource group for scheduling. For more information, see Exclusive resource group mode.

        Click Test Connectivity to test the connectivity between the selected exclusive resource group for scheduling and the AnalyticDB for PostgreSQL instance. If no exclusive resource group for scheduling is available, click Create Exclusive Resource Group to create one.
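Test Connectivity runs from the exclusive resource group for scheduling, so you cannot reproduce it exactly from your own machine. As a rough, hypothetical troubleshooting aid, the following Python sketch checks only whether a TCP connection to an endpoint can be opened from wherever the script runs; it does not verify the username or password. Port 5432, the PostgreSQL default, is an assumption; use the port that is shown for your instance in the AnalyticDB for PostgreSQL console.

```python
import socket


def tcp_reachable(host: str, port: int = 5432, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within `timeout` seconds.

    This checks network reachability only, not database credentials.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False


# Hypothetical usage with a placeholder endpoint:
# tcp_reachable("<your-instance-endpoint>", 5432)
```

A successful result from your own network does not prove that the exclusive resource group can reach the instance; the resource group's VPC and whitelist settings still apply.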

    • Associate an AnalyticDB for MySQL compute engine instance with a workspace
      Notice
      • You can use the AnalyticDB for MySQL compute engine only in DataWorks Standard Edition or a more advanced edition. Therefore, the AnalyticDB for MySQL tab is displayed only in these editions.
      • AnalyticDB for MySQL nodes can run only on exclusive resource groups for scheduling.
      • If you want to use a RAM user to associate an AnalyticDB for MySQL compute engine instance with a workspace, make sure that the RAM user has been granted the DescribeDBClusters permission. For more information about how to grant permissions to a RAM user, see RAM users and permissions.
      Parameters:
      • Instance Display Name: The display name of the AnalyticDB for MySQL compute engine instance. The display name must be unique.
      • InstanceName: The name of the AnalyticDB for MySQL cluster that you want to associate with the workspace as the compute engine instance.
      • DatabaseName: The name of the AnalyticDB for MySQL database that you want to associate with the workspace.
      • Username: The username of the database account that is used to connect to the database. You can find it on the Accounts page in the AnalyticDB for MySQL console. For more information, see Database accounts and permissions.
      • Password: The password of the database account. Database accounts are managed on the Accounts page in the AnalyticDB for MySQL console. For more information, see Database accounts and permissions.
      • Connectivity Test: AnalyticDB for MySQL nodes must run on exclusive resource groups for scheduling, so you must select an exclusive resource group for scheduling. For more information, see Exclusive resource group mode.

        Click Test Connectivity to test the connectivity between the selected exclusive resource group for scheduling and the AnalyticDB for MySQL cluster. If no exclusive resource group for scheduling is available, click Create Exclusive Resource Group to create one.
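If you grant the DescribeDBClusters permission through a custom RAM policy, the policy document might look like the one that the following Python sketch generates. The action string adb:DescribeDBClusters, including the adb: service prefix, is an assumption; confirm the exact action name in the RAM authorization reference for AnalyticDB for MySQL before you use it.

```python
import json


def describe_clusters_policy(action: str = "adb:DescribeDBClusters") -> str:
    """Build a custom RAM policy document that allows a single action on all resources.

    The default action name is an assumption; verify it before attaching the policy.
    """
    policy = {
        "Version": "1",
        "Statement": [
            {"Effect": "Allow", "Action": [action], "Resource": ["*"]},
        ],
    }
    return json.dumps(policy, indent=2)


print(describe_clusters_policy())
```

Scoping Resource to specific cluster ARNs instead of "*" would follow the principle of least privilege, if your RAM setup supports it for this action.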

  5. Click Create Workspace.
After the workspace is created, you can view the information about the workspace on the Workspaces page.
Note
  • If you are the owner of a workspace, all data in the workspace belongs to you. Other users can access the workspace only after you grant permissions to them. If you create a workspace by using a RAM user of an Alibaba Cloud account, the workspace belongs to both the RAM user and the Alibaba Cloud account.
  • You can add a RAM user to a workspace so that the RAM user can use the workspace without creating a workspace of its own.

Configure a workspace

On the Workspaces page, find the workspace, click the More icon in the Actions column, and select Workspace Settings to configure the basic and advanced settings of the workspace. For example, you can modify the display name and description of the workspace and enable the recurrence feature for the workspace. For more information, see Configure a workspace.

Configure compute engines

You can configure DataWorks services, compute engines, and machine learning services for a workspace. In the Modify service configuration pane of a workspace, you can select a compute engine type only after you purchase a compute engine instance of that type.

On the Workspaces page, find the workspace, click the More icon in the Actions column, and select Engine Configuration. In the dialog box that appears, configure the parameters.

To use a new compute engine in your workspace, select the compute engine and click Next. In the Engine Details step, configure the parameters for the compute engine and click OK.

Go to the DataStudio, Data Integration, or Data Map page

On the Workspaces page, find a workspace and click Data Development, Data Integration, or Data Map in the Actions column to go to the related page.

Delete or disable a workspace

On the Workspaces page, find the workspace, click the More icon in the Actions column, and select Delete Workspace or Disable Workspace to delete or disable the workspace.
  • Delete a workspace
    Select Delete Workspace from the drop-down list that appears. In the Delete Workspace panel, enter the verification code YES and click OK.
    Note
    • In the Delete Workspace panel, the verification code is fixed as YES.
    • After you delete a workspace, you cannot recover it. Proceed with caution when you delete a workspace.
  • Disable a workspace
    Select Disable Workspace from the drop-down list that appears. In the Disable Workspace panel, click OK.
    Note
    • After you disable a workspace, the system no longer generates instances for auto triggered nodes in the workspace. Instances that were generated before you disabled the workspace still run at their scheduled times. However, you cannot log on to the workspace to view information about these instances.
    • After you disable a workspace, the compute engine instances that are associated with the workspace still exist, and you may still be charged for the compute engine instances that store your data. These charges are billed by the Alibaba Cloud services to which the compute engine instances belong, not by DataWorks. If you have questions about these charges, contact the technical support of the corresponding Alibaba Cloud service.
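The two behaviors described above can be sketched as follows. In this Python sketch, the names are hypothetical, the case-sensitive comparison of the verification code is an assumption, and instances are modeled only by when they were generated and when they are scheduled to run.

```python
from dataclasses import dataclass
from datetime import datetime

DELETE_CONFIRMATION_CODE = "YES"  # fixed verification code in the Delete Workspace panel


def delete_confirmed(entered: str) -> bool:
    """Exact comparison with the fixed code; case sensitivity is an assumption of this sketch."""
    return entered == DELETE_CONFIRMATION_CODE


@dataclass
class NodeInstance:
    """Hypothetical model of an instance of an auto triggered node."""
    node: str
    generated_at: datetime  # when the scheduler created the instance
    scheduled_at: datetime  # when the instance is due to run


def still_runs_after_disable(instance: NodeInstance, disabled_at: datetime) -> bool:
    """Instances generated before the workspace was disabled still run at their scheduled time;
    no new instances are generated for auto triggered nodes after that point."""
    return instance.generated_at < disabled_at


disabled = datetime(2024, 5, 1, 12, 0)
today = NodeInstance("daily_etl", datetime(2024, 5, 1, 0, 0), datetime(2024, 5, 1, 23, 0))
print(still_runs_after_disable(today, disabled))  # True
```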