Before running the user profile analysis tutorial, set up three cloud services: Object Storage Service (OSS), EMR Serverless StarRocks, and DataWorks — including a workspace, resource group, and data source connections.
Prerequisites
Before you begin, make sure you have:
An Alibaba Cloud account with billing enabled
OSS activated and an OSS bucket created — used to store custom function resources in this tutorial
An EMR Serverless StarRocks instance in the China (Shanghai) region — check free trial eligibility or purchase an instance
What you'll do
Create an EMR Serverless StarRocks instance and database
Create a DataWorks workspace
Purchase a resource group and configure network connectivity
Add StarRocks, MySQL, and HttpFile data sources to DataWorks
Step 1: Set up EMR Serverless StarRocks
Create an instance
If you don't have an EMR Serverless StarRocks instance, create one with the following configuration:
| Parameter | Value |
|---|---|
| Instance type | Compute-storage Integration |
| Region | China (Shanghai) |
| Instance edition | Basic Edition |
| Version | 3.1 |
Basic Edition is for trial use and feature testing only. The service level agreement (SLA) is not guaranteed for this edition. For production workloads, use Standard Edition.
Create the database
After the instance is created, log on to the instance and run the following SQL statement in the SQL Editor:
CREATE DATABASE user_behavior_analysis;
This database is used throughout the tutorial.
Step 2: Set up DataWorks
Create a workspace
Log on to the DataWorks console. In the upper-left corner, select the China (Shanghai) region.
In the left-side navigation pane, click Workspaces. On the Workspaces page, click Create Workspace and configure the parameters. For details, see Create a workspace.
If you already have a workspace, skip this step and use your existing workspace. The MySQL and HttpFile data sources in this tutorial are in the China (Shanghai) region, so use a workspace in the same region.
Create a resource group
A resource group is required to run StarRocks tasks in DataWorks.
Purchase a resource group
Purchase a serverless resource group. For details, see Use serverless resource groups.
Configure network connectivity
Make sure the resource group can reach the StarRocks data source. For an overview of network connectivity options, see Network connectivity solutions.
Complete the following tasks:
Check the StarRocks network environment.

Associate the resource group with the virtual private cloud (VPC) where the StarRocks instance resides.

Add the resource group's outbound IP address to the StarRocks IP address whitelist.
Get the outbound IP address of the DataWorks serverless resource group.

Click the EMR Serverless StarRocks instance name. In the Basic information section, click Add Rule and add the CIDR block of the vSwitch associated with the serverless resource group.

Configure an Internet NAT gateway so the resource group can access the StarRocks instance over the internet using an elastic IP address (EIP).
Log on to the VPC console and go to the Internet NAT Gateway page. In the top navigation bar, select the China (Shanghai) region.
Click Create NAT Gateway and configure the following parameters:
| Parameter | Value |
|---|---|
| Region | China (Shanghai) |
| Network and zone | Select the VPC and vSwitch associated with the resource group. To find these values: in the DataWorks console, go to Resource Groups, find the resource group, and click Network Settings in the Actions column. Under the VPC Binding tab, view the VPC and vSwitch in the Data Scheduling & Data Integration section. For details on VPCs, see What is VPC? |
| EIP | Select Purchase EIP |
| Service-linked role | Click Create Service-linked Role. Required the first time you create an Internet NAT gateway. |

Retain the default values for all other parameters.
Click Buy Now. On the Confirm page, read and accept the terms of service, then click Confirm.
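When you add the whitelist rule, it helps to confirm that the address the resource group actually uses falls inside the CIDR block you entered. The following is a minimal sketch using Python's standard `ipaddress` module. The function name `is_whitelisted` and the sample addresses are hypothetical placeholders, not values from this tutorial; substitute the vSwitch CIDR block and IP address shown in your own console.

```python
import ipaddress


def is_whitelisted(ip: str, cidr_blocks: list[str]) -> bool:
    """Return True if `ip` falls inside any of the given CIDR blocks."""
    address = ipaddress.ip_address(ip)
    # strict=False accepts blocks written with host bits set, e.g. 192.168.0.5/24
    return any(
        address in ipaddress.ip_network(block, strict=False)
        for block in cidr_blocks
    )


# Hypothetical values for illustration only.
whitelist = ["192.168.0.0/24"]
print(is_whitelisted("192.168.0.17", whitelist))  # True
print(is_whitelisted("192.168.1.17", whitelist))  # False
```

This check only verifies the arithmetic of the rule. Whether traffic actually reaches the instance still depends on the VPC association and NAT gateway settings described above.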
Add a StarRocks data source
In the left-side navigation pane of the DataWorks console, click Management Center. Select your workspace from the drop-down list and click Go to Management Center.
In the left-side navigation pane of the SettingCenter page, choose Data Sources > Data Sources. Click Add Data Source.
In the Add Data Source dialog box, click StarRocks. On the Add StarRocks Data Source page, set Configuration Mode to Alibaba Cloud Instance Mode.

Configure the basic information. Get the instance details from the EMR console.
| Parameter | Value |
|---|---|
| Data source name | Doc_StarRocks_Storage_Compute_Tightly_01 |
| Configuration mode | Alibaba Cloud Instance Mode |
| Region | China East 2 (Shanghai) |
| Instance | Select the Serverless instance you created |
| Database name | user_behavior_analysis |
| Username | Your StarRocks database username |
| Password | Your StarRocks database password |

Click Test Network Connectivity. If the test passes, click Complete Creation.
Add a MySQL data source
In the left-side navigation pane of the SettingCenter page, click Data Sources. Click Add Data Source.
In the Add Data Source dialog box, select MySQL.
On the Add MySQL Data Source page, configure the following parameters. Use the sample values shown — this is a shared test dataset provided for this tutorial.
| Parameter | Value |
|---|---|
| Data source name | user_behavior_analysis_mysql |
| Data source description | This data source is provided for this tutorial and serves as the source for batch synchronization tasks. Use it for data reading in data synchronization scenarios only. |
| Configuration mode | Connection String Mode |
| Host IP address | rm-bp1z69dodhh85z9qa.mysql.rds.aliyuncs.com |
| Port number | 3306 |
| Database name | workshop |
| Username | workshop |
| Password | workshop#2017 |
| Authentication method | No Authentication |

For each resource group, click Test Network Connectivity in both the Connection Status (Development Environment) and Connection Status (Production Environment) columns. When connectivity is confirmed, Connected appears in each column.
Click Complete Creation.
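If a connectivity test fails, a quick TCP probe can help separate a wrong host or port from a network routing problem. The sketch below uses only Python's standard `socket` module. The helper name `can_connect` is hypothetical, and the probe runs from whatever machine you execute it on, not from the DataWorks resource group, so it is a rough diagnostic rather than a substitute for Test Network Connectivity.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host name and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

For example, calling `can_connect("rm-bp1z69dodhh85z9qa.mysql.rds.aliyuncs.com", 3306)` probes the MySQL endpoint from the table above. A `False` result from your own machine does not necessarily mean the resource group cannot reach it, because the two sit on different networks.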
Add an HttpFile data source
In the left-side navigation pane of the SettingCenter page, choose Data Sources > Data Sources. Click Add Connection.
In the dialog box, click HttpFile.

Configure the following parameters:
| Parameter | Value |
|---|---|
| Data source name | user_behavior_analysis_httpfile |
| Description | This data source is provided for this tutorial and serves as the source for batch synchronization tasks. Use it for data reading in data synchronization scenarios only. |
| URL | https://dataworks-workshop-2024.oss-cn-shanghai.aliyuncs.com |

Click Test Network Connectivity. If the test passes, click Complete Creation.
What's next
Your environment is ready. In the next tutorial, you'll synchronize basic user information and website access logs to StarRocks. For details, see Synchronize data.