This tutorial explains how to use the DataWorks and MaxCompute product portfolio for big data development and analysis. It uses a user profile analysis example to demonstrate the capabilities of DataWorks in Data Integration, Data Development, and Operation Center.
Tutorial overview
To create better business strategies, you can obtain basic profile data about your website users from their online behavior. This data includes geographical and social attributes. You can then perform profile analysis at scheduled times for fine-grained website traffic operations. You can use the DataWorks and MaxCompute product portfolio to complete data synchronization, data transformation, data management, and data consumption.
Before you start, read Experiment introduction to familiarize yourself with the end-to-end process of the user profile analysis case.
Data development platform
This tutorial uses the previous version of DataStudio in DataWorks. Ensure that your workspace does not use the New Version Of Data Studio.
When you create a workspace, do not select Use The New Version Of Data Studio.
After February 18, 2025, the new version of Data Studio is enabled by default when you use an Alibaba Cloud account to activate DataWorks and create a workspace for the first time in one of the following regions. If the new version of Data Studio is enabled by default for your account, see Experience the new version of Data Studio.
China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Ulanqab), China (Shenzhen), China (Chengdu), China (Hong Kong), Japan (Tokyo), Singapore, Malaysia (Kuala Lumpur), Indonesia (Jakarta), Thailand (Bangkok), Germany (Frankfurt), UK (London), US (Silicon Valley), US (Virginia)
Procedure
Create the MaxCompute project and DataWorks workspace required for this tutorial. Then, complete the network configurations for the resource group.
In DataWorks, configure a data synchronization task to synchronize the user information and website log data provided in this tutorial to MaxCompute. Then, query the synchronized data.
Use a MaxCompute SQL node in DataWorks to transform the data in the user information table and the access log table that were synchronized to MaxCompute. This process produces the target user profile data.
Configure data quality monitoring rules for the tables generated by data transformation. These rules help you identify and block dirty data early, before it affects downstream tasks.
After the user profile analysis task flow is complete, the corresponding data tables are created in MaxCompute. You can view the generated tables in the Data Map module and check their data lineage to see the relationships between them.
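The transformation and data quality steps above can be illustrated with a minimal Python sketch. It is not the tutorial's MaxCompute SQL. It only mirrors the idea of joining user information with access logs to produce profile rows, and then applying simple quality rules to the result. All field names (uid, gender, region, url, pv), the sample rows, and the two rules are assumptions made for illustration; the tables used in the tutorial may differ.

```python
from collections import Counter

# Hypothetical sample rows; the tutorial's real table and column names may differ.
user_info = [
    {"uid": "u1", "gender": "F", "region": "Hangzhou"},
    {"uid": "u2", "gender": "M", "region": "Singapore"},
]
access_log = [
    {"uid": "u1", "url": "/home"},
    {"uid": "u1", "url": "/cart"},
    {"uid": "u2", "url": "/home"},
    {"uid": "u3", "url": "/home"},  # no matching user record, so it is ignored
]


def build_profiles(users, logs):
    """Attach a page-view count (pv) from the logs to each user row."""
    page_views = Counter(row["uid"] for row in logs)
    return [{**user, "pv": page_views.get(user["uid"], 0)} for user in users]


def check_quality(rows):
    """Return a list of rule violations; an empty list means the rows pass."""
    problems = []
    if not rows:
        problems.append("table is empty")
    if any(not row.get("uid") for row in rows):
        problems.append("null uid found")
    return problems


profiles = build_profiles(user_info, access_log)
violations = check_quality(profiles)
```

In this sketch, a non-empty list from check_quality plays the role of a blocking rule: a scheduler would stop downstream steps instead of passing dirty data along.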
Consume data
After the user profile analysis is complete, you can use the DataAnalysis module to visualize the transformed data. This helps you quickly extract key information and gain insights into business trends from the data.
After you obtain the final transformed data, you can use the DataService Studio module to expose the data through standard APIs, so that other business modules can consume it through API calls.