This tutorial shows you how to use the DataWorks and MaxCompute product portfolio for data development and analysis. It uses a user persona analysis scenario to demonstrate the capabilities of DataWorks modules, such as Data Integration, Data Studio, and Operation Center.
Introduction
To create better business strategies, you can analyze user behavior on your website. This analysis helps you build basic user personas with data such as geographic and social attributes. You can then run scheduled analyses for fine-grained traffic operations. You can use the DataWorks and MaxCompute product portfolio to synchronize, transform, manage, and consume data.
Before you begin, read Tutorial objectives and design to understand the overall flow of the user persona analysis.
Data Studio
This tutorial uses the new version of Data Studio in DataWorks. Make sure that the new Data Studio is enabled for your workspace. You can enable it in either of the following ways:
When you create a workspace, select Use Data Studio (New Version).
To upgrade from the old DataStudio version, click the Upgrade button at the top of the interface. Then, follow the on-screen instructions to complete the upgrade.
After February 18, 2025, the new Data Studio is enabled by default when an Alibaba Cloud account activates DataWorks and creates a workspace for the first time in one of the following regions:
China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Ulanqab), China (Shenzhen), China (Chengdu), China (Hong Kong), Japan (Tokyo), Singapore, Malaysia (Kuala Lumpur), Indonesia (Jakarta), Thailand (Bangkok), Germany (Frankfurt), UK (London), US (Silicon Valley), US (Virginia)
Procedure
Create the required MaxCompute project and DataWorks workspace. Then, configure the network for the resource group.
In DataWorks, configure a data synchronization task. This task synchronizes the provided user information and website log data to MaxCompute. Then, query the synchronized data.
Use a MaxCompute SQL node in DataWorks to transform the data in the user information and access log tables. This produces the target user persona data.
Configure data quality monitoring rules for the tables generated from data transformation. This helps you detect and block dirty data early to prevent it from affecting downstream data.
After the user persona analysis task flow is complete, corresponding data tables are created in MaxCompute. You can view these tables in the Data Map module. You can also view the relationships between the tables using data lineage.
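The transformation and data quality steps in the procedure above can be illustrated with a small in-memory sketch. In the tutorial itself, the transformation runs in a MaxCompute SQL node and the checks are configured as data quality monitoring rules in DataWorks. The Python below only mirrors that logic on sample rows. The field names (uid, gender, age_range, region, pv), the sample data, and the two functions are hypothetical and are not taken from the tutorial's tables.

```python
from collections import Counter

# Hypothetical sample rows standing in for the two synchronized source tables.
user_info = [
    {"uid": "u1", "gender": "F", "age_range": "20-30"},
    {"uid": "u2", "gender": "M", "age_range": "30-40"},
]
access_log = [
    {"uid": "u1", "region": "Hangzhou"},
    {"uid": "u1", "region": "Hangzhou"},
    {"uid": "u2", "region": "Beijing"},
    {"uid": "u3", "region": "Shanghai"},  # no matching user record
]


def build_personas(users, logs):
    """Join access logs to user attributes and count page views per user and region."""
    page_views = Counter((row["uid"], row["region"]) for row in logs)
    users_by_uid = {user["uid"]: user for user in users}
    personas = []
    for (uid, region), count in sorted(page_views.items()):
        user = users_by_uid.get(uid)
        if user is None:
            continue  # inner join: drop log rows that have no user record
        personas.append({
            "uid": uid,
            "region": region,
            "gender": user["gender"],
            "age_range": user["age_range"],
            "pv": count,
        })
    return personas


def check_quality(rows):
    """Return a list of rule violations; an empty list means the table passes."""
    problems = []
    if not rows:
        problems.append("table is empty")
    if any(not row["uid"] for row in rows):
        problems.append("null uid found")
    keys = [(row["uid"], row["region"]) for row in rows]
    if len(keys) != len(set(keys)):
        problems.append("duplicate key found")
    return problems


personas = build_personas(user_info, access_log)
violations = check_quality(personas)
if violations:
    # Mirrors a blocking rule: stop before dirty data reaches downstream tasks.
    raise ValueError(f"blocking downstream tasks: {violations}")
```

Running the checks immediately after the transformation, and raising on any violation, reflects the intent of the monitoring step: dirty data is caught before downstream tasks read the table.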
Consume data
After the user persona analysis is complete, use the DataAnalysis module to visualize the processed data. This lets you quickly extract key information and gain insight into the business trends behind the data.
After you obtain the final processed data, you can use the DataService Studio module to share it through standardized API data services. Other business modules that accept data through APIs can then consume it.