This tutorial shows how to use the DataWorks and StarRocks product portfolio for big data development and analysis. A user profile analysis case study walks you through the capabilities of the Data Integration, Data Development, and Operation Center modules in DataWorks.
Case study introduction
This case study shows how to derive basic user profile data, such as geographical and social attributes, from the behavior of website users, and how to run scheduled profile analysis on that data. The results support fine-grained operation of website traffic and help you improve business strategies. You use DataWorks together with EMR Serverless StarRocks to synchronize, transform, manage, and consume the data.
Before you start, read Experiment introduction to familiarize yourself with the end-to-end process of the user profile analysis case. This helps you complete the tutorial without missing a step.
Data development platform
This tutorial uses the legacy DataWorks Data Development (DataStudio) platform, so make sure that your workspace does not use the new version of Data Studio.
When you create a workspace, do not select the Use The New Data Development (DataStudio) option.
After February 18, 2025, the new version of Data Studio is enabled by default when you use an Alibaba Cloud account to enable DataWorks and create a workspace for the first time in the following regions. If the new version of Data Studio is already enabled by default for your account, see Use the new version of Data Studio.
China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Ulanqab), China (Shenzhen), China (Chengdu), China (Hong Kong), Japan (Tokyo), Singapore, Malaysia (Kuala Lumpur), Indonesia (Jakarta), Thailand (Bangkok), Germany (Frankfurt), UK (London), US (Silicon Valley), US (Virginia)
Procedure
Create the StarRocks instance and DataWorks workspace required for this tutorial, and complete the related resource group network configuration.
Configure a data synchronization channel in DataWorks to synchronize the user information and website log data provided in this tutorial to the StarRocks computing resource, and then query the synchronized data.
Use StarRocks nodes in DataWorks to transform the user information table and the website access log table that were synchronized to StarRocks, and generate the target user profile data.
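In the tutorial, this transformation is written as SQL in StarRocks nodes. The following Python sketch only illustrates the general shape of that logic under stated assumptions: each raw log record is parsed into fields, malformed records are dropped, and the remaining records are joined with the user information table on a user ID to produce one profile row per user. The `|`-separated log layout, the field names (`uid`, `gender`, `age_range`, `device`), and the user-agent heuristic are hypothetical and are not the tutorial's actual table schemas.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical record layouts; the real tables in the tutorial may differ.
@dataclass
class UserInfo:
    uid: str
    gender: str
    age_range: str


@dataclass
class ProfileRow:
    uid: str
    gender: str
    age_range: str
    device: str
    visits: int


def parse_log_line(line: str) -> Optional[dict]:
    """Parse an assumed 'ip|uid|time|request|user_agent' record; return None if malformed."""
    parts = line.strip().split("|")
    if len(parts) != 5:
        return None
    ip, uid, ts, request, agent = parts
    return {"ip": ip, "uid": uid, "time": ts, "request": request, "agent": agent}


def device_from_agent(agent: str) -> str:
    """Rough, illustrative device classification based on the user-agent string."""
    lowered = agent.lower()
    if "android" in lowered:
        return "android"
    if "iphone" in lowered or "ipad" in lowered:
        return "ios"
    return "pc"


def build_profiles(log_lines: list[str], users: dict[str, UserInfo]) -> list[ProfileRow]:
    """Inner-join parsed log records with user info on uid and count visits per user."""
    profiles: dict[str, ProfileRow] = {}
    for line in log_lines:
        rec = parse_log_line(line)
        if rec is None or rec["uid"] not in users:
            continue  # drop malformed records and records with no matching user
        user = users[rec["uid"]]
        row = profiles.get(user.uid)
        if row is None:
            # The device is taken from the first matching record (a simplification).
            row = ProfileRow(user.uid, user.gender, user.age_range,
                             device_from_agent(rec["agent"]), 0)
            profiles[user.uid] = row
        row.visits += 1
    return sorted(profiles.values(), key=lambda r: r.uid)


users = {
    "u1": UserInfo("u1", "female", "30-40"),
    "u2": UserInfo("u2", "male", "20-30"),
}
logs = [
    "192.0.2.1|u1|2025-01-01 10:00:00|GET /index.html|Mozilla/5.0 (iPhone)",
    "192.0.2.1|u1|2025-01-01 10:05:00|GET /cart.html|Mozilla/5.0 (iPhone)",
    "198.51.100.7|u2|2025-01-01 11:00:00|GET /index.html|Mozilla/5.0 (Windows NT 10.0)",
    "malformed record",
]
for row in build_profiles(logs, users):
    print(row.uid, row.device, row.visits)  # prints "u1 ios 2", then "u2 pc 1"
```

Keeping parsing separate from the join mirrors a common SQL layout (a parsing step followed by a JOIN and a GROUP BY) and makes it easy to reject dirty records before they reach the profile table.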
Configure data quality monitoring rules for the tables that the transformation generates. The rules detect and block dirty data early so that it does not propagate to downstream tasks.
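DataWorks evaluates the configured rules for you. To make the idea concrete, the following Python sketch evaluates a few typical rules against the rows of a generated table. A failed blocking rule raises an exception, which is how a scheduler could stop downstream tasks, while a failed non-blocking rule only produces a warning. The rule set, the `blocking` flag, and the table and column names are hypothetical and do not reflect how DataWorks Data Quality is actually configured.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    check: Callable[[list[dict]], bool]  # returns True when the table passes
    blocking: bool  # True: a failure must stop downstream tasks


class QualityCheckFailed(Exception):
    """Raised when a blocking rule fails."""


def row_count_positive(rows: list[dict]) -> bool:
    return len(rows) > 0


def uid_unique(rows: list[dict]) -> bool:
    uids = [r.get("uid") for r in rows]
    return len(uids) == len(set(uids))


def not_null(column: str) -> Callable[[list[dict]], bool]:
    def check(rows: list[dict]) -> bool:
        return all(r.get(column) is not None for r in rows)
    return check


def run_rules(table: str, rows: list[dict], rules: list[Rule]) -> list[str]:
    """Evaluate rules in order; stop at the first blocking failure, else collect warnings."""
    warnings: list[str] = []
    for rule in rules:
        if rule.check(rows):
            continue
        message = f"{table}: rule '{rule.name}' failed"
        if rule.blocking:
            raise QualityCheckFailed(message)
        warnings.append(message)
    return warnings


# Hypothetical rules for a hypothetical "user_profile" table.
rules = [
    Rule("row count > 0", row_count_positive, blocking=True),
    Rule("uid is unique", uid_unique, blocking=True),
    Rule("gender is not null", not_null("gender"), blocking=False),
]
rows = [{"uid": "u1", "gender": "female"}, {"uid": "u2", "gender": None}]
print(run_rules("user_profile", rows, rules))
# prints ["user_profile: rule 'gender is not null' failed"]
```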
After the user profile analysis task workflow is complete, the corresponding data tables are created in StarRocks. You can view the generated data tables and their lineage in the Data Map module.
Consume data
After the user profile analysis is complete, use the DataAnalysis module to visualize the transformed data. This helps you quickly extract key information and gain insights into business trends.
After you obtain the final transformed data, use the DataService Studio module to publish it as a standard API-based data service so that other business modules can consume it through APIs.
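Conceptually, an API-based data service turns a parameterized query over the result table into an HTTP endpoint. The following Python sketch shows only the server-side logic that such an API might implement: filter the profile rows by a request parameter, page the result, and return JSON. The parameter names (`gender`, `page_number`, `page_size`), the response fields, and the in-memory table are hypothetical. DataService Studio generates and hosts the real API, and its request and response formats may differ.

```python
import json

# Hypothetical in-memory stand-in for the final user profile table.
PROFILE_TABLE = [
    {"uid": "u1", "gender": "female", "age_range": "30-40"},
    {"uid": "u2", "gender": "male", "age_range": "20-30"},
    {"uid": "u3", "gender": "female", "age_range": "20-30"},
]

MAX_PAGE_SIZE = 100


def handle_request(params: dict[str, str], table: list[dict]) -> str:
    """Filter rows by an optional 'gender' parameter and return one page as JSON."""
    try:
        page_number = int(params.get("page_number", "1"))
        page_size = int(params.get("page_size", "10"))
    except ValueError:
        return json.dumps({"code": 400, "message": "paging parameters must be integers"})
    if page_number < 1 or not 1 <= page_size <= MAX_PAGE_SIZE:
        return json.dumps({"code": 400, "message": "paging parameters out of range"})

    gender = params.get("gender")
    matched = [r for r in table if gender is None or r["gender"] == gender]
    start = (page_number - 1) * page_size
    return json.dumps({
        "code": 200,
        "total": len(matched),
        "rows": matched[start:start + page_size],
    })


print(handle_request({"gender": "female", "page_size": "1"}, PROFILE_TABLE))
```

Capping `page_size` reflects a common design choice for query APIs: it bounds the amount of work that any single call can trigger against the underlying tables.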