A workspace in DataWorks is similar to a project in MaxCompute. This topic describes how to create a workspace in the DataWorks console.

Prerequisites

An Alibaba Cloud account is available. For more information, see Activate DataWorks. The Alibaba Cloud account is required to create a workspace, as described in this topic.
Note

If a Resource Access Management (RAM) user is required to perform operations on behalf of an Alibaba Cloud account, the RAM user must be granted the required permissions. For more information about how to grant permissions to a RAM user, see How do I grant the AliyunDataWorksFullAccess permission to a RAM user by using the Alibaba Cloud account? and User permission management.

Procedure

  1. Log on to the DataWorks console by using your Alibaba Cloud account.
  2. On the Overview page, click Create Workspace in the Frequently Used Workspaces section on the right.
    You can also click Workspaces in the left-side navigation pane and click Create Workspace on the page that appears.
  3. In the Create Workspace panel, set the parameters in the Basic Settings step and click Next.
    Section: Basic Information
      Workspace Name: The name of the workspace. The name must be 3 to 23 characters in length and can contain letters, underscores (_), and digits. The name must start with a letter.
      Display Name: The display name of the workspace. The display name can be a maximum of 23 characters in length and can contain letters, underscores (_), and digits. The display name must start with a letter.
      Mode: The mode of the workspace. Valid values: Basic Mode (Production Environment Only) and Standard Mode (Development and Production Environments).
      • Basic Mode (Production Environment Only): A workspace in basic mode is associated with only one MaxCompute project. Workspaces in basic mode do not isolate the development environment from the production environment. In these workspaces, you can perform only basic data development and cannot strictly control the data development process and the permissions on tables.
      • Standard Mode (Development and Production Environments): A workspace in standard mode is associated with two MaxCompute projects. One serves as the development environment, and the other serves as the production environment. Workspaces in standard mode allow you to develop code in a standard way and strictly control the permissions on tables. These workspaces impose limits on table operations in the production environment for data security.

      For more information, see Basic mode and standard mode.

      Description: The description of the workspace.
    Section: Advanced Settings
      Download SELECT Query Result: Specifies whether the query results that are returned by SELECT statements in DataStudio can be downloaded. If you turn off this switch, the query results cannot be downloaded. You can change the setting of this parameter for the workspace in the Workspace Settings panel after the workspace is created. For more information, see Configure security settings.
  4. In the Select Engines and Services step, select the compute engines and services based on your needs and click Next.
    Note
    • If you need to bind a compute engine, you must first activate the corresponding service, such as Realtime Compute, E-MapReduce, Hologres, Graph Compute, AnalyticDB for MySQL, or AnalyticDB for PostgreSQL. You cannot select the check box for a compute engine that is not activated.
    • If you do not select a compute engine when you create a workspace, you cannot perform operations that are related to the compute engine on the DataStudio page. For example, you cannot create tables or cleanse data based on engine nodes.
    DataWorks is available as a commercial service. If you have not activated DataWorks in a region, activate it before you create a workspace in the region.
    Section: DataWorks Services
      Note: The services that are enabled for the workspace. By default, the check box in this section is selected.
      Data Integration: Provides a stable, efficient, and scalable data synchronization platform. Data Integration is designed to efficiently transmit and synchronize data between heterogeneous data sources in complex network environments. For more information, see Data Integration.
      Data Analytics: Allows you to design a data computing process that consists of multiple mutually dependent nodes based on your business requirements. The nodes are run by the scheduling system of DataWorks. For more information, see DataStudio.
      Operation Center: Allows you to view all your nodes and node instances and perform operations on them. For more information, see Operation Center.
      Data Quality: Provides an end-to-end data quality solution that relies on DataWorks. This solution allows you to explore data, compare data, monitor data quality, scan SQL statements, and use intelligent alerting. For more information, see Data Quality.
    Section: Compute Engines
      MaxCompute: Provides a rapid, fully managed data warehouse solution that can process terabytes or petabytes of data. MaxCompute supports fast computing on large amounts of data, effectively reduces costs for enterprises, and ensures data security. For more information, see the MaxCompute documentation.
        Note: After you create workspaces in DataWorks, you must associate them with MaxCompute projects. Otherwise, the "project not found" error is returned when you run commands in the workspaces.
      Realtime Compute: Allows you to develop streaming computing nodes in DataWorks.
      E-MapReduce: Allows you to use E-MapReduce (EMR) to develop big data processing nodes in DataWorks. For more information, see the EMR documentation.
      Hologres: Allows you to use HoloStudio in DataWorks to manage internal and foreign tables and develop Hologres SQL nodes.
      Graph Compute: Allows you to use Graph Studio in DataWorks to manage Graph Compute instances.
      AnalyticDB for PostgreSQL: Allows you to develop AnalyticDB for PostgreSQL nodes in DataWorks. For more information, see Overview.
        Note: You can use the AnalyticDB for PostgreSQL compute engine only in DataWorks Standard Edition or a more advanced edition.
      AnalyticDB for MySQL: Allows you to develop AnalyticDB for MySQL nodes in DataWorks. For more information about AnalyticDB for MySQL, see Product introduction.
        Note: You can use the AnalyticDB for MySQL compute engine only in DataWorks Standard Edition or a more advanced edition.
    Section: Machine Learning Services
      PAI Studio: Uses statistical algorithms to learn from large amounts of historical data and generate an empirical model to provide business strategies.
  5. In the Engine Details step, set the parameters for the selected compute engines.
    • Associate a MaxCompute compute engine instance with the workspace
      Method: Specifies whether to create a MaxCompute project or use an existing MaxCompute project. Valid values: Create Project and Associate Existing Project.
      Instance Display Name: The display name of the MaxCompute compute engine instance. The display name must be 3 to 28 characters in length and can contain letters, underscores (_), and digits. The display name must start with a letter.
      Region: The region of the workspace.
      Payment mode: The billing method of the MaxCompute compute engine instance. Valid values: The pay-as-you-go billing method, Monthly package, and Developer version.
        Note: A MaxCompute compute engine instance of the developer version cannot be associated with a workspace in standard mode.
      Quota group: The quotas of computing resources and disk space for the MaxCompute compute engine instance.
      MaxCompute data type: The data type edition of the MaxCompute compute engine instance. Valid values: 2.0 data type (recommended), 1.0 data types (for users who already use 1.0 data type), and Hive compatible types (for Hive migration users). For more information, see Data type editions.
      Whether to encrypt: Specifies whether to encrypt the MaxCompute compute engine instance. For more information, see Data encryption.
      Production Environment: Configure the MaxCompute Project Name and Access Identity parameters for the production environment.
      • MaxCompute Project Name: the name of the MaxCompute project that you want to associate with the workspace as the compute engine instance in the production environment.
      • Access Identity: the identity that is used to access the MaxCompute project. Valid values: Alibaba Cloud primary account, Alibaba Cloud sub-account, and Alibaba Cloud RAM role.
      Development Environment: Configure the MaxCompute Project Name and Access Identity parameters for the development environment.
      • MaxCompute Project Name: the name of the MaxCompute project that you want to associate with the workspace as the compute engine instance in the development environment.
        Note: This MaxCompute project provides computing and storage resources.
      • Access Identity: The default value is Task owner and cannot be changed.
    • Associate an E-MapReduce compute engine instance with the workspace
      Instance Display Name: The display name of the EMR compute engine instance.
      Region: The region of the workspace.
      Access Mode:
      • In shortcut mode, if you run or schedule EMR nodes in DataWorks by using an Alibaba Cloud account or a RAM user, the code of the nodes is committed to the EMR compute engine instance and run by a Hadoop user in the EMR compute engine instance.
      • In security mode, if you run or schedule EMR nodes in DataWorks by using an Alibaba Cloud account or a RAM user, the code of the nodes is committed to the EMR compute engine instance and run by a user that has the same name as the Alibaba Cloud account or RAM user in the EMR compute engine instance. You can use EMR Ranger to manage the permissions of each user in the EMR compute engine instance. This ensures that Alibaba Cloud accounts, node owners, or RAM users have different data permissions when they run EMR nodes in DataWorks. This way, higher data security is implemented.
      Scheduling access identity:
      • If you set the Access Mode parameter to Shortcut mode, you can commit the code of an EMR node to the EMR compute engine instance by using an Alibaba Cloud account or a RAM user after the node is committed and deployed to the scheduling system of DataWorks.
      • If you set the Access Mode parameter to Security mode, you can commit the code of an EMR node to the EMR compute engine instance by using an Alibaba Cloud account or a RAM user or as a node owner after the node is committed and deployed to the production environment. A Hadoop user that corresponds to the identity in the EMR compute engine instance is used to run the code.
      Note
      • This parameter is available only for the production environment.
      • Before you associate an EMR compute engine instance with a workspace, you must attach the AliyunEMRDevelopAccess policy to workspace roles such as developers and administrators. This way, the roles can be used to create and run EMR nodes in DataStudio.
        • The AliyunEMRDevelopAccess policy is attached to Alibaba Cloud accounts by default.
        • If you want to use a RAM user to run EMR nodes, you must attach the AliyunEMRDevelopAccess policy to the RAM user. For more information, see Grant permissions to RAM users.
      Access identity: The identity that is used to commit the code of an EMR node in the development environment to the EMR compute engine instance. Default value: Task owner.
      Note
      • This parameter is available only for the development environment of a workspace in standard mode.
      • Task owner can be an Alibaba Cloud account or a RAM user.
        Before you associate an EMR compute engine instance with a workspace, you must attach the AliyunEMRDevelopAccess policy to workspace roles such as developers and administrators. This way, the roles can be used to create and run EMR nodes in DataStudio.
        • The AliyunEMRDevelopAccess policy is attached to Alibaba Cloud accounts by default.
        • If you want to use a RAM user to run EMR nodes, you must attach the AliyunEMRDevelopAccess policy to the RAM user.
      Cluster ID: The ID of the EMR cluster that you want to associate with the workspace as the compute engine instance. Select an ID from the drop-down list. The EMR cluster is used as the runtime environment of EMR nodes.
      Project ID: The ID of the EMR project that you want to associate with the workspace. Select an ID from the drop-down list. The EMR project is used as the runtime environment of EMR nodes.
        Note: If you set Access Mode to Security mode, no EMR project IDs are displayed or available for selection.
      YARN resource queue: The name of the resource queue in the EMR cluster. Unless otherwise specified, set this parameter to default.
      Endpoint: The endpoint of the EMR cluster. The value of this parameter cannot be changed.
      Resource Group: Select an exclusive resource group for scheduling that connects to the DataWorks workspace. If no exclusive resource group for scheduling is available, create one. For more information about how to create an exclusive resource group for scheduling and configure network connectivity, see Create and use an exclusive resource group for scheduling.

      After you select an exclusive resource group for scheduling, click Test Connectivity to test the connectivity between the exclusive resource group for scheduling and the EMR cluster.

    • Associate a Hologres compute engine instance with the workspace
      Instance Display Name: The display name of the Hologres compute engine instance.
      Access identity:
      • The identity that is used to run the code of committed Hologres nodes. Valid values: Alibaba Cloud primary account and Alibaba Cloud sub-account.
        Note: This parameter is available only for the production environment.
      • The default value of this parameter for the development environment is Task owner.
      Hologres instance name: The name of the Hologres instance that you want to associate with the workspace as the compute engine instance.
      Database name: The name of the database that is created in SQL Console, such as testdb.
    • Associate a Graph Compute compute engine instance with the workspace
      Instance Display Name: The display name of the Graph Compute compute engine instance. The display name must be 3 to 27 characters in length and can contain letters, underscores (_), and digits. The display name must start with a letter.
      Graph Compute Instance Name: The name of the Graph Compute instance that you want to associate with the workspace as the compute engine instance. If you do not have a Graph Compute instance, click Create an instance to purchase a Graph Compute instance.
        Notice: By default, each Alibaba Cloud account can purchase only one Graph Compute instance.
    • Associate an AnalyticDB for PostgreSQL compute engine instance with the workspace
      Instance Display Name: The display name of the AnalyticDB for PostgreSQL compute engine instance. The display name must be unique.
      InstanceName: The name of the AnalyticDB for PostgreSQL instance that you want to associate with the workspace as the compute engine instance.
      DatabaseName: The name of the AnalyticDB for PostgreSQL database that you want to associate with the workspace.
      Username: The username that you can use to connect to the database. You can obtain the information from the Account Management page in the AnalyticDB for PostgreSQL console. For more information, see Create a database account.
      Password: The password that you can use to connect to the database. You can obtain the information from the Account Management page in the AnalyticDB for PostgreSQL console. For more information, see Create a database account.
      Connectivity Test: AnalyticDB for PostgreSQL nodes must be run on exclusive resource groups for scheduling. Therefore, you must select an exclusive resource group for scheduling. For more information, see Exclusive resource group mode.

      Click Test Connectivity to test the connectivity between the specified exclusive resource group for scheduling and the AnalyticDB for PostgreSQL instance. If no exclusive resource group for scheduling is available, click Create Exclusive Resource Group to create one.

    • Associate an AnalyticDB for MySQL compute engine instance with the workspace
      Instance Display Name: The display name of the AnalyticDB for MySQL compute engine instance. The display name must be unique.
      InstanceName: The name of the AnalyticDB for MySQL cluster that you want to associate with the workspace as the compute engine instance.
      DatabaseName: The name of the AnalyticDB for MySQL database that you want to associate with the workspace.
      Username: The username that you can use to connect to the database. You can obtain the information from the Accounts page in the AnalyticDB for MySQL console. For more information, see Database accounts and permissions.
      Password: The password that you can use to connect to the database. You can obtain the information from the Accounts page in the AnalyticDB for MySQL console. For more information, see Database accounts and permissions.
      Connectivity Test: AnalyticDB for MySQL nodes must be run on exclusive resource groups for scheduling. Therefore, you must select an exclusive resource group for scheduling. For more information, see Exclusive resource group mode.

      Click Test Connectivity to test the connectivity between the specified exclusive resource group for scheduling and the AnalyticDB for MySQL cluster. If no exclusive resource group for scheduling is available, click Create Exclusive Resource Group to create one.

  6. Click Create Workspace.
    After the workspace is created, you can view the information about the workspace on the Workspaces page.
    Note
    • If you are the owner of a workspace, all data in the workspace belongs to you. Other users can access the workspace only after you grant permissions to them. If you create a workspace as a RAM user, the workspace belongs to the RAM user and the Alibaba Cloud account that manages the RAM user.
    • You can add a RAM user to a workspace so that the RAM user can use the workspace. This way, the RAM user does not need to create a workspace.
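
The naming and compatibility rules stated in the procedure can be checked locally before you fill in the console form. The following Python sketch is illustrative only and is not part of DataWorks. The function names, the field keys, and the mode labels "developer" and "standard" are hypothetical. The sketch also assumes that "letters" means ASCII letters and that the workspace display name has a minimum length of 1, because the procedure states only a maximum for that field.

```python
import re

# Assumption: "letters" in the naming rules means ASCII letters (A-Z, a-z).
# A valid value starts with a letter, followed by letters, digits, or underscores.
_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

# (min_length, max_length) per field, taken from the parameter descriptions.
# The field keys are hypothetical labels used only in this sketch.
# The workspace display name states only a maximum, so a minimum of 1 is assumed.
LENGTH_RULES = {
    "workspace_name": (3, 23),
    "workspace_display_name": (1, 23),
    "maxcompute_instance_display_name": (3, 28),
    "graph_compute_instance_display_name": (3, 27),
}


def is_valid_name(field: str, value: str) -> bool:
    """Check a value against the length and character rules for a field."""
    min_len, max_len = LENGTH_RULES[field]
    if not min_len <= len(value) <= max_len:
        return False
    return _NAME_PATTERN.fullmatch(value) is not None


def can_associate_maxcompute(payment_mode: str, workspace_mode: str) -> bool:
    """Return False for a developer-version instance in a standard-mode workspace."""
    # "developer" and "standard" are hypothetical labels for the console options.
    return not (payment_mode == "developer" and workspace_mode == "standard")


if __name__ == "__main__":
    for candidate in ["sales_dw_01", "1st_workspace", "ab", "bad-name"]:
        print(candidate, is_valid_name("workspace_name", candidate))
    print(can_associate_maxcompute("developer", "standard"))
```

The console remains the source of truth for validation. A check like this is useful only for catching obvious mistakes early, for example a name that starts with a digit.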

What to do next

You have learned how to create a workspace. You can now proceed with the next tutorial to add workspace members or directly perform operations that are described in Quick Start. Quick Start guides you through a complete process of data development and O&M.