A workspace in DataWorks is similar to a project in MaxCompute. This topic describes how to create a workspace in the DataWorks console.

Prerequisites

An Alibaba Cloud account is prepared. For more information, see Prepare an Alibaba Cloud account. This topic uses an Alibaba Cloud account to create the workspace.
Note: The procedure for creating a workspace as a RAM user is the same as that described in this topic.

Procedure

  1. Log on to the DataWorks console by using your Alibaba Cloud account.
  2. On the Overview page, click Create Workspace in the Shortcuts section on the right.
    You can also click Workspaces in the left-side navigation pane and click Create Workspace on the page that appears.
  3. In the Create Workspace panel, set the parameters in the Basic Settings step and click Next.
    Section | Parameter | Description
    Basic Information | Workspace Name | The name of the workspace. The name must be 3 to 23 characters in length. It must start with a letter and can contain only letters, underscores (_), and digits.
    Basic Information | Display Name | The display name of the workspace. The display name can be up to 23 characters in length. It must start with a letter and can contain only letters, underscores (_), and digits.
    Basic Information | Mode | The mode of the workspace. Valid values: Basic Mode (Production Environment Only) and Standard Mode (Development and Production Environments).
      • Basic Mode (Production Environment Only): A workspace in basic mode is associated with only one MaxCompute project. Workspaces in basic mode do not isolate the development environment from the production environment. In these workspaces, you can only perform basic data development and cannot strictly control the data development process and table permissions.
      • Standard Mode (Development and Production Environments): A workspace in standard mode is associated with two different MaxCompute projects. One of the projects serves as the development environment, and the other serves as the production environment. Workspaces in standard mode allow you to develop code in a standard way and strictly control table permissions. These workspaces impose limits on table operations in the production environment for data security.
    Basic Information | Description | The description of the workspace.
    Advanced Settings | Download SELECT Query Result | Specifies whether workspace members can download the query results that are returned by SELECT statements in DataStudio. If you turn off this switch, workspace members cannot download the query results.
  4. In the Select Engines and Services step, select the required compute engines and services, and then click Next.
    Note
    • If you need to bind a compute engine, you must first activate the corresponding service, such as Realtime Compute, E-MapReduce, Hologres, Graph Compute, or AnalyticDB for PostgreSQL. Otherwise, the compute engine does not appear in the Select Engines and Services step for you to select.
    • If you do not select a required compute engine when you create a workspace, you cannot perform operations that are related to the compute engine on the DataStudio page. For example, you cannot create tables.
    DataWorks is now available as a commercial service. If you have not activated DataWorks in a region, activate it before you create a workspace in the region.
    Note: The DataWorks Services section lists the services that are enabled for the workspace. By default, the check box in this section is selected.
    Section | Service or engine | Description
    DataWorks Services | Data Integration | Provides a data synchronization platform that features stable, efficient, and scalable services. Data Integration is designed to transmit and synchronize data efficiently between various heterogeneous data stores in complex network environments. For more information, see Data Integration.
    DataWorks Services | Data Analytics | Allows you to design a data computing process that consists of multiple interdependent nodes based on your business needs. The nodes are then run automatically in Operation Center. For more information, see Data Analytics.
    DataWorks Services | Operation Center | Allows you to view all your nodes and node instances and perform relevant operations on them as needed. For more information, see Operation Center.
    DataWorks Services | Data Quality | Provides a comprehensive data quality solution that relies on DataWorks. This solution allows you to explore data, compare data, monitor data quality, scan SQL statements, and use intelligent alerting. For more information, see Data Quality.
    Compute Engines | MaxCompute | Provides a fast, fully managed data warehouse solution that can process terabytes or petabytes of data. MaxCompute supports fast computing on large amounts of data, reduces costs for enterprises, and ensures data security. For more information, see the MaxCompute documentation. Note: After you create workspaces in DataWorks, you must associate them with MaxCompute projects. Otherwise, the "project not found" error is returned when you run commands in the workspaces.
    Compute Engines | Realtime Compute | Allows you to use Stream Studio in DataWorks to develop stream computing nodes.
    Compute Engines | E-MapReduce | Allows you to use E-MapReduce to develop big data processing nodes in DataWorks. For more information, see the E-MapReduce documentation.
    Compute Engines | Hologres | Allows you to use HoloStudio in DataWorks to manage internal and foreign tables and develop SQL nodes of Hologres.
    Compute Engines | Graph Compute | Allows you to use Graph Studio in DataWorks to manage Graph Compute instances.
    Compute Engines | AnalyticDB for PostgreSQL | Allows you to use AnalyticDB for PostgreSQL to develop AnalyticDB for PostgreSQL nodes in DataWorks. For more information, see Overview. Note: You can use the AnalyticDB for PostgreSQL compute engine only in DataWorks Standard Edition or a more advanced edition.
    Compute Engines | AnalyticDB for MySQL | Allows you to use AnalyticDB for MySQL to develop AnalyticDB for MySQL nodes in DataWorks.
    Machine Learning Services | PAI Studio | Uses statistical algorithms to learn from large amounts of historical data and generate an empirical model to provide business strategies.
  5. In the Engine Details step, set the parameters for the selected compute engines.
    Engine | Parameter or button | Description
    MaxCompute | Instance Display Name | The display name of the compute engine instance. The display name must start with a letter and can contain only letters, underscores (_), and digits.
    MaxCompute | Resource Group | The quotas of computing resources and disk space for the compute engine instance.
    MaxCompute | MaxCompute Data Type Edition | The edition of the MaxCompute data type. This configuration takes effect within 5 minutes. For more information, see Data types. If you do not know which edition to select, we recommend that you contact the workspace administrator.
    MaxCompute | Whether to encrypt | Specifies whether to encrypt data. Valid values: No encryption and Encryption.
    MaxCompute | MaxCompute Project Name | The name of the MaxCompute project. By default, the MaxCompute project that serves as the production environment is named after the DataWorks workspace. The MaxCompute project that serves as the development environment is named in the format of DataWorks workspace name_dev.
    MaxCompute | Account for Accessing MaxCompute | The identity that you can use to access the MaxCompute project. For the development environment, the value is fixed to Node Owner. For the production environment, the valid values are Alibaba Cloud Account and RAM User.
    Realtime Compute | Instance Display Name | The display name of the compute engine instance. The display name must start with a letter and can contain only letters, underscores (_), and digits.
    Realtime Compute | Realtime Compute Cluster | The Realtime Compute cluster to which the Realtime Compute project belongs. If no Realtime Compute cluster exists, create one in the Realtime Compute console.
    Realtime Compute | Realtime Compute Project | The Realtime Compute project to be bound to the DataWorks workspace. If no Realtime Compute project exists, create one in the Realtime Compute console.
    E-MapReduce | Instance Display Name | The display name of the compute engine instance. The display name must start with a letter and can contain only letters, underscores (_), and digits.
    E-MapReduce | Access ID | The AccessKey ID of the account that is authorized to access the E-MapReduce cluster to be added as the compute engine instance.
    E-MapReduce | Access Key | The AccessKey secret of the account that is authorized to access the E-MapReduce cluster.
    E-MapReduce | Cluster ID | The ID of the E-MapReduce cluster. You can obtain the ID from the E-MapReduce console.
    E-MapReduce | EmrUserID | The ID of the user who created the E-MapReduce cluster.
    E-MapReduce | Workspace ID | The ID of the project in the E-MapReduce cluster.
    E-MapReduce | YARN Resource Queue | The name of the resource queue in the E-MapReduce cluster. Unless otherwise specified, set this parameter to default.
    E-MapReduce | Endpoint | The endpoint of the E-MapReduce cluster. You can obtain the endpoint from the E-MapReduce console.
    Hologres | Instance Display Name | The display name of the compute engine instance. The display name must start with a letter and can contain only letters, underscores (_), and digits.
    Hologres | Access identity | The identity that you can use to access the Hologres instance. For the development environment, the value is fixed to Task owner. For the production environment, the valid values are Alibaba Cloud primary account and Alibaba Cloud sub-account.
    Hologres | Hologres instance name | The name of the Hologres instance.
    Hologres | Database name | The name of the database to be bound to the DataWorks workspace. After you create a Hologres instance, the system automatically creates a database named postgres for management only. You can create a database based on your business needs in the Hologres console and bind the database to the DataWorks workspace.
    Hologres | Connectivity Test | Click Test Connectivity to test the connectivity of the compute engine instance.
    Graph Compute | Instance Display name | The display name of the compute engine instance. The display name must start with a letter and can contain only letters, underscores (_), and digits.
    Graph Compute | Bind Graph Compute Instance | The Graph Compute instance to be bound to the DataWorks workspace.
    AnalyticDB for PostgreSQL | Instance Display Name | The display name of the compute engine instance. The display name must start with a letter and can contain only letters, underscores (_), and digits.
    AnalyticDB for PostgreSQL | InstanceName | The name of the AnalyticDB for PostgreSQL instance to be added as the compute engine instance.
    AnalyticDB for PostgreSQL | DatabaseName | The name of the database to be connected in the AnalyticDB for PostgreSQL instance.
    AnalyticDB for PostgreSQL | Username | The username that you can use to connect to the database.
    AnalyticDB for PostgreSQL | Password | The password that you can use to connect to the database.
    AnalyticDB for PostgreSQL | Connectivity Test | AnalyticDB for PostgreSQL nodes must run on exclusive resource groups. Specify an exclusive resource group for running these nodes.
    AnalyticDB for PostgreSQL | Test Connectivity | Click Test Connectivity to test the connectivity between the specified exclusive resource group and the AnalyticDB for PostgreSQL instance.
    AnalyticDB for MySQL | Instance Display Name | The display name of the compute engine instance. The display name must start with a letter and can contain only letters, underscores (_), and digits.
    AnalyticDB for MySQL | InstanceName | The name of the AnalyticDB for MySQL instance to be added as the compute engine instance.
    AnalyticDB for MySQL | DatabaseName | The name of the database to be connected in the AnalyticDB for MySQL instance.
    AnalyticDB for MySQL | Username | The username that you can use to connect to the database.
    AnalyticDB for MySQL | Password | The password that you can use to connect to the database.
    AnalyticDB for MySQL | Connectivity Test | AnalyticDB for MySQL nodes must run on exclusive resource groups. Specify an exclusive resource group for running these nodes.
    AnalyticDB for MySQL | Test Connectivity | Click Test Connectivity to test the connectivity between the specified exclusive resource group and the AnalyticDB for MySQL instance.
  6. Click Create Workspace.
    After the workspace is created, you can view information about the workspace on the Workspaces page.
    Note
    • If you are the owner of a workspace, all data in the workspace belongs to you. Other users cannot access your workspace until you grant them permissions. If you create a workspace as a RAM user under an Alibaba Cloud account, the workspace belongs to both the RAM user and the Alibaba Cloud account.
    • You can add a RAM user to a workspace so that the RAM user can use the workspace. This way, the RAM user does not need to create a workspace.
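
If you plan workspace names before you open the console, it can help to check them against the naming rules from the Basic Settings step and to preview the default MaxCompute project names from the Engine Details step. The following Python sketch is illustrative only and is not part of DataWorks. The function names are hypothetical, and the code assumes that "letters" means the ASCII letters A to Z and a to z, which this topic does not state explicitly.

```python
import re

# Assumption: "letters" means ASCII letters only.
# Workspace Name: 3 to 23 characters, starts with a letter,
# and contains only letters, underscores (_), and digits.
WORKSPACE_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]{2,22}")

# Display Name: up to 23 characters, same character rules.
DISPLAY_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,22}")


def is_valid_workspace_name(name: str) -> bool:
    """Return True if name satisfies the Workspace Name rule (hypothetical helper)."""
    return WORKSPACE_NAME_RE.fullmatch(name) is not None


def is_valid_display_name(name: str) -> bool:
    """Return True if name satisfies the Display Name rule (hypothetical helper)."""
    return DISPLAY_NAME_RE.fullmatch(name) is not None


def default_maxcompute_projects(workspace_name: str, standard_mode: bool = True) -> dict:
    """Preview the default MaxCompute project names for a workspace.

    The production project is named after the workspace. In standard mode,
    the development project is named <workspace name>_dev. A workspace in
    basic mode is associated with only one MaxCompute project.
    """
    if not is_valid_workspace_name(workspace_name):
        raise ValueError(f"invalid workspace name: {workspace_name!r}")
    projects = {"production": workspace_name}
    if standard_mode:
        projects["development"] = f"{workspace_name}_dev"
    return projects


if __name__ == "__main__":
    # "sales_dw" is a made-up example name.
    print(is_valid_workspace_name("sales_dw"))
    print(default_maxcompute_projects("sales_dw"))
```

The console remains the authority on which names are accepted. A check like this only catches obvious violations early, such as a name that starts with a digit or contains a hyphen.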

What to do next

You have learned how to create a workspace. You can now proceed with the next tutorial to add workspace members or directly perform operations that are described in Quick Start. Quick Start guides you through a complete process of data analytics and O&M.