Use DataWorks with EMR Serverless Spark for big data development and analytics. A user profile analysis case walks you through the Data Integration, Data Development, and Operation Center modules.
Overview
Extract user profile data, such as geographic and social attributes, from website behavior to drive business strategy. Use DataWorks and EMR Serverless Spark to synchronize, transform, manage, and consume the data on a recurring schedule.
Before you begin, read Tutorial objectives and design for a workflow overview.
Data development platform
This tutorial uses the legacy data development platform (DataStudio) in DataWorks. Make sure that your workspace does not use the new Data Studio.
-
When you create a workspace, do not select the Use Data Studio (New Version) option.
-
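After February 18, 2025, when a primary account activates DataWorks and creates a workspace for the first time in one of the regions listed below, the new Data Studio is enabled by default. If the new Data Studio is already enabled for your workspace, see Get started with the new Data Studio for the corresponding tutorial.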
China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Ulanqab), China (Shenzhen), China (Chengdu), China (Hong Kong), Japan (Tokyo), Singapore, Malaysia (Kuala Lumpur), Indonesia (Jakarta), Thailand (Bangkok), Germany (Frankfurt), UK (London), US (Silicon Valley), US (Virginia)
Procedure
-
Step 1: Prepare the environment
Create the required EMR Serverless Spark and DataWorks workspaces, and configure the resource group and network settings.
-
Step 2: Synchronize data
Configure a data synchronization task in DataWorks to synchronize basic user information and website access logs to a Spark computing resource, and query the synchronized data.
-
Step 3: Process data
Use EMR Spark SQL nodes in DataWorks to process the synchronized user information and access log data to obtain user profile data.
-
Step 4: Monitor data quality
Configure a monitor for the output tables to identify and intercept dirty data before its impact escalates.
-
Step 5: Manage data
View the data tables generated in Spark after user profile analysis and explore data lineages between the tables in Data Map.
-
Step 6: Consume data
Use the data analytics module to visualize the processed data, extract key information, and gain insights into business trends.
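The data flow behind Steps 2 through 4 can be sketched in plain Python. This is only an illustration of the logic, not the tutorial's implementation: in DataWorks the processing runs in EMR Spark SQL nodes, and every name here (`USER_INFO`, `ACCESS_LOGS`, `is_clean`, `build_profiles`, and all field names) is a hypothetical stand-in rather than the tutorial's actual schema.

```python
from collections import Counter

# Hypothetical rows standing in for the two synchronized sources (Step 2):
# basic user information and website access logs.
USER_INFO = [
    {"uid": "u1", "gender": "F", "age_range": "20-30"},
    {"uid": "u2", "gender": "M", "age_range": "30-40"},
]
ACCESS_LOGS = [
    {"uid": "u1", "region": "Hangzhou", "url": "/home"},
    {"uid": "u1", "region": "Hangzhou", "url": "/cart"},
    {"uid": "u2", "region": "Beijing", "url": "/home"},
    {"uid": "", "region": "Shanghai", "url": "/home"},  # dirty: missing uid
]


def is_clean(row):
    """Assumed quality rule (Step 4): a log row must carry a non-empty uid."""
    return bool(row.get("uid"))


def build_profiles(users, logs):
    """Join user info with access logs into one profile row per user (Step 3)."""
    clean_logs = [row for row in logs if is_clean(row)]

    # Count page views and region occurrences per user.
    page_views = Counter(row["uid"] for row in clean_logs)
    regions = {}
    for row in clean_logs:
        regions.setdefault(row["uid"], Counter())[row["region"]] += 1

    profiles = []
    for user in users:
        uid = user["uid"]
        if uid not in page_views:
            continue  # inner-join semantics: skip users with no log rows
        profiles.append({
            "uid": uid,
            "gender": user["gender"],
            "age_range": user["age_range"],
            # Most frequent region observed for this user.
            "region": regions[uid].most_common(1)[0][0],
            "pv": page_views[uid],
        })
    return profiles


if __name__ == "__main__":
    for profile in build_profiles(USER_INFO, ACCESS_LOGS):
        print(profile)
```

The sketch filters dirty rows before the join so that they never reach the profile output, which mirrors the idea of intercepting bad data early. The actual monitoring rules are configured in DataWorks and may differ.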