DataWorks: Prepare the environment

Last Updated: Mar 28, 2026

This tutorial walks you through setting up a MaxCompute and DataWorks environment in the China (Shanghai) region — a required first step before running the user profile analysis experiment. By the end, you'll have two MaxCompute projects, a DataWorks workspace, a Serverless resource group with internet access, and the computing resources connected together.

Prerequisites

Before you begin, ensure that you have:

  • An Alibaba Cloud account

  • DataWorks activated (see Purchase)

  • Permissions to create MaxCompute projects, DataWorks workspaces, and virtual private cloud (VPC) resources

Note

This tutorial uses Data Studio (new version) for data transformation. Before starting, read the introduction to the user profile analysis experiment. All data used is mock data for hands-on practice only.

Steps in this tutorial

This page covers environment preparation only. You will:

  1. Activate MaxCompute in the China (Shanghai) region

  2. Create two MaxCompute projects (development and production)

  3. Create a DataWorks workspace

  4. Purchase and configure a Serverless resource group

  5. Associate the MaxCompute projects as computing resources

Prepare the MaxCompute environment

Step 1: Activate MaxCompute

Activate MaxCompute in the MaxCompute console with the following settings:

Parameter | Value
Region | China (Shanghai)
Type | Standard

Step 2: Create MaxCompute projects

A standard DataWorks workspace requires two MaxCompute projects — one for the development environment and one for the production environment. These projects serve as the computing and storage resources for your workspace.

  1. Go to the MaxCompute console. In the left navigation pane, choose Manage Configurations > Project management.

  2. Click New Project and create two projects. Use the parameters below; keep the defaults for anything not listed.

Parameter | Value
Project name | Custom. Must be globally unique. This tutorial uses workshop2024_01 (production) and workshop2024_01_dev (development).
Billing method of computing resources | Pay-as-you-go
Default quota | os_PayAsYouGoQuota
Data type | 2.0 Data Type (Recommended)
Storage encryption | Not encrypted

For more information, see Create a MaxCompute project.
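
The production/development naming used above can be sketched in a few lines of Python. This is only an illustration for planning names before you type them into the console. The helper `plan_projects`, the `ProjectPair` type, and the `_dev` suffix parameter are hypothetical, and the naming pattern (a leading letter followed by letters, digits, or underscores, 3 to 28 characters in total) is an assumption to verify against the MaxCompute console. The check does not test global uniqueness, which only the console can confirm.

```python
import re
from dataclasses import dataclass

# Assumed naming rule (verify in the MaxCompute console): a leading letter,
# then letters, digits, or underscores, 3 to 28 characters in total.
NAME_PATTERN = re.compile(r"^[A-Za-z][A-Za-z0-9_]{2,27}$")


@dataclass(frozen=True)
class ProjectPair:
    """Hypothetical container for the two project names a workspace needs."""
    production: str
    development: str


def plan_projects(base_name: str, dev_suffix: str = "_dev") -> ProjectPair:
    """Derive the development project name from the production name and
    check both names against the assumed pattern."""
    pair = ProjectPair(production=base_name, development=base_name + dev_suffix)
    for name in (pair.production, pair.development):
        if not NAME_PATTERN.match(name):
            raise ValueError(f"invalid project name: {name!r}")
    return pair


pair = plan_projects("workshop2024_01")
print(pair.production)   # workshop2024_01
print(pair.development)  # workshop2024_01_dev
```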

Prepare the DataWorks environment

Step 1: Create a workspace

If you already have a workspace (new version) in the China (Shanghai) region, skip this step.

  1. Log on to the DataWorks console. In the top navigation bar, set the region to China (Shanghai). In the left navigation pane, click Workspace.

  2. Click Create Workspace, select Use Data Studio (New Version), and enable Isolate Development and Production Environments.

Note

Starting February 18, 2025, new Data Studio is enabled by default the first time an Alibaba Cloud account activates DataWorks and creates a workspace in the China (Shanghai) region.

For more information, see Create a workspace.

Step 2: Create a Serverless resource group

The tutorial uses a DataWorks Serverless resource group for data synchronization and scheduling. Setting it up involves three sub-steps: purchase, associate with your workspace, and configure internet access.

Purchase a Serverless resource group

  1. Log on to the DataWorks - Resource Group List page. In the top navigation bar, set the region to China (Shanghai). In the left navigation pane, click Resource Group.

  2. Click Create Resource Group. On the purchase page, set Region and Zone to China (Shanghai), specify a Resource Group Name, and complete the purchase. For billing details, see Billing of Serverless resource groups.

Note

If no VPC or vSwitch is available in the region, follow the console link in the parameter description to create them. For details, see What is a virtual private cloud (VPC)?.

Associate the resource group with your workspace

A newly purchased Serverless resource group must be associated with a workspace before it can be used.

On the DataWorks - Resource Group List page, set the region to China (Shanghai). Find the resource group you purchased and click Associate Workspace in the Actions column. Then click Associate next to your workspace.

Configure internet access for the resource group

Important

The test data in this tutorial is retrieved from the internet. By default, Serverless resource groups have no outbound internet access. Without the following configuration, the data synchronization tasks in the next tutorial will fail. Complete all sub-steps before proceeding.

To enable internet access, add an Internet NAT Gateway and an elastic IP address (EIP) to the VPC bound to the resource group.

Create an Internet NAT Gateway:

  1. Go to the VPC - Internet NAT Gateway console. In the top navigation bar, set the region to China (Shanghai).

  2. Click Create Internet NAT Gateway and configure the parameters below. Keep the defaults for anything not listed.

Parameter | Value
Region | China (Shanghai)
Network and zone | Select the VPC and vSwitch bound to the resource group. To find them: go to the DataWorks console, switch to China (Shanghai), click Resource Group in the left pane, find your resource group, and click Network Settings in the Actions column. The VPC and vSwitch appear under Data Scheduling & Data Integration.
Network type | Internet NAT Gateway
EIP | Create EIP
Service-linked role | If this is your first NAT Gateway, click Create Service-linked Role.

  3. Click Buy Now, accept the terms of service, and click Activate Now.

Create SNAT entries:

Resource groups in the VPC cannot reach the internet until you add Source Network Address Translation (SNAT) entries to the NAT Gateway.
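
As a rough mental model of what an SNAT entry does, the toy function below rewrites the source address of outbound traffic from inside the VPC to the EIP. It is a conceptual sketch only, not how the NAT Gateway is implemented: a real gateway also translates ports and tracks connections. The function name `snat_source` and all addresses are hypothetical (the public address comes from a range reserved for documentation).

```python
import ipaddress
from typing import Optional


def snat_source(src_ip: str, vpc_cidr: str, eip: str) -> Optional[str]:
    """Return the public source address a packet would carry after SNAT,
    or None if the sender is outside the range covered by the entry."""
    if ipaddress.ip_address(src_ip) in ipaddress.ip_network(vpc_cidr):
        return eip
    return None


# Hypothetical values: a VPC CIDR block and an EIP bound to the NAT Gateway.
print(snat_source("192.168.0.15", "192.168.0.0/16", "203.0.113.10"))  # 203.0.113.10
print(snat_source("10.1.2.3", "192.168.0.0/16", "203.0.113.10"))      # None
```

Selecting Specify VPC in the console corresponds to covering the whole VPC address range, which is why every resource group in the VPC gains outbound access through the same EIP.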

  1. Find the NAT Gateway instance you just purchased and click Manage in the Actions column.

  2. Switch to the SNAT tab. In the SNAT entry list section, click Create SNAT entry with the following settings:

Parameter | Value
SNAT entry | Select Specify VPC. This allows all resource groups in the VPC to access the internet through the EIP.
Select EIP | Select the EIP bound to the current NAT Gateway instance.

  3. Click OK.

The SNAT entry is ready when its status changes to Available.
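
Once the entry is Available, one way to confirm outbound access is a plain TCP connection attempt from code that runs on the resource group. The sketch below uses only the Python standard library. It assumes you have some means of running Python on the resource group (this tutorial does not set one up), and the helper name `can_reach` is hypothetical.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within
    the timeout, and False on refusal, timeout, or DNS failure."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution errors.
        return False
```

For example, `can_reach("example.com", 443)` checks HTTPS reachability of a placeholder host. Substitute a public endpoint you actually need to reach. A result of False from inside the VPC suggests the NAT Gateway, EIP, or SNAT entry still needs attention.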

For more information, see Use a Serverless resource group.

Step 3: Associate MaxCompute as a computing resource

Associate the two MaxCompute projects with your DataWorks workspace so you can process data in MaxCompute from within Data Studio.

  1. Go to the DataWorks - Workspace List page. Set the region to China (Shanghai), find your workspace, and click its name to open the Workspace details page.

  2. In the left navigation pane, click Computing Resources.

  3. Click Associate Computing Resources, select MaxCompute as the computing resource type, and configure the parameters below.

Parameter | Description
MaxCompute project | Select the production and development MaxCompute projects you created in Step 2 of Prepare the MaxCompute environment.
Default access identity | Controls which identity is used to access the MaxCompute project from this workspace. Development environment: Executor only. Production environment: select based on your current account; this tutorial uses Alibaba Cloud account. If you are logged on with a different identity, see Associate in Data Studio (new version).
Computing resource instance name | Custom name used to identify this computing resource when assigning it to a task.
Connection configuration | Select the Serverless resource group you created and associated with this workspace. Test connectivity for both the development and production environments.

  4. Click Confirm.

For more information, see Associate a computing resource.

What's next

Your environment is ready. In the next tutorial, you'll synchronize basic user information and website access logs to OSS, create a table using an ODPS SQL node, and query the synchronized data.

Synchronize data