DataWorks: Prepare the environment

Last Updated: Feb 03, 2026

This tutorial shows you how to build a user profile in the China (Shanghai) region. It uses a raw dataset from DataWorks to guide you through the entire process of data synchronization, transformation, and quality monitoring. You must prepare a MaxCompute project and a DataWorks workspace, and configure data sources, computing resources, and storage resources in advance.

Business background

To create better business strategies, you need basic profile data about your website's user groups, such as geographical and social attributes, derived from their behavior on the site. With this data, you can run profile analysis on a schedule and manage website traffic at a fine-grained level.

Before you begin

Before you start, read the introduction to the user profile analysis experiment.

Notes

  • This tutorial provides the required user information and website access test data.

  • The data in this tutorial is intended only for hands-on practice with DataWorks. All data is mock data.

  • This tutorial uses Data Studio (new version) for data transformation.

Prepare the MaxCompute environment

1. Activate MaxCompute

This tutorial uses MaxCompute. First, activate MaxCompute in the China (Shanghai) region with the following parameters.

  • Region: China (Shanghai)

  • Specifications Type: Standard

2. Create MaxCompute projects

A standard DataWorks workspace requires two MaxCompute projects: one for the development environment and one for the production environment. These projects act as computing resources.

  1. Go to the MaxCompute console. In the navigation pane on the left, choose Manage Configurations > Projects.

  2. Click Create Project to create two MaxCompute projects. The key parameters for this tutorial are listed below. Keep the default values for any parameters not mentioned.

    • Project Name (Globally Unique): Custom. Must be globally unique. This tutorial uses:

      • Production environment: workshop2024_01

      • Development environment: workshop2024_01_dev

    • Billing Method: This tutorial uses Pay-as-you-go.

    • Default Quota: This tutorial uses os_PayAsYouGoQuota.

    • Data Type Edition: This tutorial uses 2.0 Data Type (Recommended).

    • Storage Encryption: This tutorial uses Not encrypted.

For more information about how to create a MaxCompute project, see Create a MaxCompute project.
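
The project names used in this tutorial follow a simple pattern: one base name for production and the same name with a `_dev` suffix for development. The sketch below illustrates that pattern in Python. It is not part of DataWorks or MaxCompute: the helper `project_pair` is hypothetical, and the name check (a leading letter, then letters, digits, or underscores, 3 to 28 characters in total) is an assumption about the naming rule that you should confirm in the MaxCompute console.

```python
import re
from typing import NamedTuple

# Assumed naming rule (verify in the MaxCompute console): starts with a letter,
# then letters, digits, or underscores, 3 to 28 characters in total.
_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]{2,27}")


class ProjectPair(NamedTuple):
    production: str
    development: str


def project_pair(base_name: str, dev_suffix: str = "_dev") -> ProjectPair:
    """Derive the production and development project names from one base name."""
    pair = ProjectPair(production=base_name, development=base_name + dev_suffix)
    for name in pair:
        if not _NAME_PATTERN.fullmatch(name):
            raise ValueError(f"project name does not match the assumed rule: {name!r}")
    return pair


pair = project_pair("workshop2024_01")
print(pair.production)   # workshop2024_01
print(pair.development)  # workshop2024_01_dev
```

Because project names must be globally unique, the exact names from this tutorial may already be taken. In that case, choose a different base name and keep the same suffix convention.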

Prepare the DataWorks environment

Before using DataWorks for development, ensure that the DataWorks service is activated. For more information, see Purchase.

1. Create a workspace

If you already have a workspace (new version) in the China (Shanghai) region, you can skip this step and use the existing workspace.

  1. Log on to the DataWorks console. In the top navigation bar, set the region to China (Shanghai). In the navigation pane on the left, click Workspace to go to the workspace list page.

  2. Click Create Workspace. Select Use Data Studio (New Version) and enable Isolate Development and Production Environments.

    Note

    Starting February 18, 2025, the new Data Studio is enabled by default the first time an Alibaba Cloud account activates DataWorks and creates a workspace in the China (Shanghai) region.

For more information about how to create a workspace, see Create a workspace.

2. Create a serverless resource group

  1. Purchase a Serverless resource group.

    This tutorial requires a DataWorks Serverless resource group for data synchronization and scheduling. You must purchase a Serverless resource group and complete the initial setup first.

    1. Log on to the DataWorks - Resource Group List page. In the top navigation bar, set the region to China (Shanghai). In the navigation pane on the left, click Resource Group to go to the resource group list page.

    2. Click Create Resource Group. On the purchase page, set Region And Zone to China (Shanghai) and specify a Resource Group Name. Configure other parameters as prompted and complete the payment. For information about the billing of Serverless resource groups, see Billing of Serverless resource groups.

      Note

      If no VPC or vSwitch is available in the current region, click the console link in the parameter description to create them. For more information about VPCs and vSwitches, see What is a virtual private cloud (VPC)?.

  2. Bind the resource group to the DataWorks workspace.

    A newly purchased Serverless resource group must be bound to a workspace before it can be used.

    Log on to the DataWorks - Resource Group List page and set the region to China (Shanghai) in the top navigation bar. Find the Serverless resource group that you purchased. In the Actions column, click Associate Workspace, and then click Associate next to the DataWorks workspace that you created.

  3. Configure public network access for the resource group.

    The test data for this tutorial is retrieved from the internet. By default, resource groups do not have public network access. You must configure an Internet NAT Gateway for the VPC that is bound to the resource group and add an EIP to retrieve data from the public network.

    1. Log on to the VPC - Internet NAT Gateway console. In the top menu bar, set the region to China (Shanghai).

    2. Click Create Internet NAT Gateway and configure the parameters. The key parameters for this tutorial are listed below. Keep the default values for any parameters not mentioned.

      • Region: China (Shanghai).

      • Network And Zone: Select the VPC and vSwitch bound to the resource group. To look them up, go to the DataWorks console and switch to the China (Shanghai) region. In the navigation pane on the left, click Resource Group. Find the resource group that you created and click Network Settings in the Actions column. In the Data Scheduling & Data Integration area, view the associated VPC and vSwitch. For more information about VPCs and vSwitches, see What is a virtual private cloud (VPC)?.

      • Network Type: Internet NAT Gateway.

      • EIP: Create EIP.

      • Service-linked Role Creation: When you create a NAT Gateway for the first time, you must create a service-linked role. Click Create Service-linked Role.

    3. Click Buy Now, select the terms of service, and then click Activate Now to complete the purchase.

    4. After the NAT Gateway instance is purchased successfully, return to the console to create SNAT entries for the newly purchased NAT Gateway instance.

      Note

      Resource groups using this VPC can access the Internet only after SNAT entries are configured.

      1. Find the newly purchased instance and click Manage in the Actions column. On the management page of the NAT Gateway instance, switch to the SNAT tab.

      2. In the SNAT Entry List section, click Create SNAT Entry. The key configurations are as follows:

        • SNAT Entry: Select Specify VPC. This ensures that all resource groups within the VPC to which the NAT Gateway belongs can access the Internet through the configured EIP.

        • Select EIP: Select the EIP bound to the current NAT Gateway instance.

      3. After configuring the parameters, click OK to create the SNAT entry.

      When the status of the new entry in the SNAT entry list changes to Available, the VPC bound to the resource group can access the Internet.

For more information about how to add and use Serverless resource groups, see Use a Serverless resource group.
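
After the SNAT entry becomes Available, a quick outbound TCP check can help confirm that traffic from the VPC reaches a public host. The following is a minimal sketch using only the Python standard library. The function name `can_reach` is hypothetical, and the sketch assumes that you run it from an environment whose traffic leaves through the same VPC. It tests TCP reachability only and does not inspect the NAT Gateway or SNAT configuration itself.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the host name and opens a TCP connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

For example, `can_reach("example.com", 443)` checks an HTTPS port on a placeholder host. Replace the host with the public endpoint that your tasks actually read from. A result of `False` can also be caused by DNS settings or by the remote host, so treat it as a hint rather than a diagnosis.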

3. Bind MaxCompute as a computing resource

You must bind the MaxCompute projects that you created to the DataWorks workspace as computing resources. Then, you can process data in MaxCompute within the Data Studio module.

  1. Go to the DataWorks - Workspace List page. In the top navigation bar, set the region to China (Shanghai). Find the workspace that you created and click its name to open the Workspace Details page.

  2. In the navigation pane on the left, click Computing Resource.

  3. Click Associate Computing Resource, select a computing resource type, and then configure the parameters.

    This tutorial uses MaxCompute as the computing and storage resource. Select MaxCompute as the computing resource type and configure the key parameters listed below. Retain the default values for other parameters.

    • MaxCompute Project: Select the MaxCompute projects to bind. For this tutorial, bind the production and development projects that you created in "Create MaxCompute projects" to the production and development environments, respectively.

    • Default Access Identity: The identity used to access the MaxCompute project from the current workspace.

      • Development environment: Only the Executor identity is supported.

      • Production environment: Select from the drop-down list based on the current logon account. This tutorial uses Alibaba Cloud Account.

        Note

        If you are logged on with a different identity, see New Data Studio: Bind a MaxCompute computing resource for configuration details.

    • Computing Resource Instance Name: A custom name that identifies the computing resource. Tasks use this name to select the computing resource at runtime.

    • Connection Configuration: The resource group used to connect to the MaxCompute computing resource. The Serverless resource group that you created and bound to the current workspace is displayed here. You must test the connectivity for both the development and production environments.

  4. Click Confirm to complete the MaxCompute computing resource configuration.

For more information about how to bind computing resources, see Bind computing resources.
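
The binding requirements described above can be captured as a small checklist before you click Confirm. The sketch below is purely illustrative and does not call any DataWorks API: the `Binding` class and `check_bindings` function are hypothetical names, and the field values are typed in by hand from what you see in the console.

```python
from dataclasses import dataclass


@dataclass
class Binding:
    environment: str           # "development" or "production"
    project: str               # MaxCompute project bound to this environment
    access_identity: str       # value chosen for Default Access Identity
    connectivity_tested: bool  # whether the connectivity test passed


def check_bindings(bindings: list[Binding]) -> list[str]:
    """Return a list of problems; an empty list means the checklist passes."""
    problems = []
    seen = {b.environment for b in bindings}
    for env in ("development", "production"):
        if env not in seen:
            problems.append(f"missing binding for {env}")
    for b in bindings:
        # The tutorial states that development supports only the Executor identity.
        if b.environment == "development" and b.access_identity != "Executor":
            problems.append("development supports only the Executor identity")
        if not b.connectivity_tested:
            problems.append(f"connectivity not tested for {b.environment}")
    return problems


bindings = [
    Binding("development", "workshop2024_01_dev", "Executor", True),
    Binding("production", "workshop2024_01", "Alibaba Cloud Account", False),
]
print(check_bindings(bindings))
# ['connectivity not tested for production']
```

In this example the production connectivity test has not been run yet, so the checklist reports one open item.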

Next steps

Now that you have prepared the environment, you can proceed to the next tutorial. In the next tutorial, you will learn how to synchronize basic user information and user website access logs to OSS, and then use an ODPS SQL node to create a table and query the synchronized data. For more information, see Synchronize data.