This tutorial explains how to use the DataWorks and StarRocks product portfolio for big data development and analysis. It uses a user profile analysis case study to demonstrate the Data Integration, DataStudio, and Operation Center capabilities of DataWorks.
Case introduction
To create better business strategies, you need to derive basic profile data, such as geographic and social attributes, from the behavior of website users. By analyzing user profiles at specific times and locations, you can manage website traffic in a fine-grained way. The DataWorks and StarRocks product portfolio lets you perform data synchronization, data transformation, data management, and data consumption.
Before you start, read Tutorial objectives and design to understand the overall flow of the user profile analysis.
DataStudio
This tutorial uses the new version of DataStudio in DataWorks. Make sure that the new DataStudio is enabled for your workspace. You can enable it in either of the following ways:
When you create a workspace, select Use Data Studio (New Version).
To upgrade from the old DataStudio version, click the Upgrade button at the top of the interface. Then, follow the on-screen instructions to complete the upgrade.
After February 18, 2025, if an Alibaba Cloud account activates DataWorks and creates its first workspace in one of the following regions, the new DataStudio is enabled by default:
China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Ulanqab), China (Shenzhen), China (Chengdu), China (Hong Kong), Japan (Tokyo), Singapore, Malaysia (Kuala Lumpur), Indonesia (Jakarta), Thailand (Bangkok), Germany (Frankfurt), UK (London), US (Silicon Valley), US (Virginia)
Procedure
Create the StarRocks instance and DataWorks workspace required for this tutorial, and configure the required resource group and network settings.
In DataWorks, configure a data synchronization task to sync the user information and website log data provided in this tutorial to a StarRocks computing resource. Then, query the synchronized data.
In DataWorks, use StarRocks nodes to process the user information and access log tables that were synchronized to StarRocks and generate the target user profile data.
Configure data quality monitoring rules for the tables generated during data transformation. These rules identify and block dirty data early and prevent it from spreading to downstream data.
After the user profile analysis workflow finishes, its output tables are created in StarRocks. You can then view the data lineage between these tables in Data Map.
Consume data
After you complete the user profile analysis, you can use the DataAnalysis module to visualize the transformed data. This helps you quickly extract key information and gain insights into business trends.
After you obtain the final transformed data, you can use the DataService Studio module to share it through standardized APIs, so that other business modules can retrieve the data by calling those APIs.