
DataWorks: Using the Legacy Data Development Experience

Last Updated: Oct 28, 2025

This tutorial shows how to use DataWorks together with E-MapReduce (EMR) for big data development and analysis. Through a user persona analysis case, it demonstrates the Data Integration, Data Development, and Operation Center capabilities of DataWorks.

Case description

To create better business strategies, you need to derive basic profile data, such as geographical and social attributes, from users' website behavior. With this data, you can run scheduled persona analysis and manage website traffic at a fine-grained level. You can use DataWorks and EMR together to perform data synchronization, data transformation, data management, and data consumption.

Note

Before you begin, read Experiment introduction to familiarize yourself with the end-to-end process of the user profile analysis case, so that you can complete this tutorial without interruption.

Data development platform

This tutorial uses the classic DataStudio of DataWorks. Make sure that Use The New Data Studio is not enabled for your workspace.

  • When you create a workspace, do not select Use The New Data Studio.

  • After February 18, 2025, if you use an Alibaba Cloud account with DataWorks activated to create a workspace for the first time in one of the following regions, the new Data Studio is enabled for that workspace by default. If this applies to your workspace, follow the Experience the new Data Studio tutorial instead.

    China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Ulanqab), China (Shenzhen), China (Chengdu), China (Hong Kong), Japan (Tokyo), Singapore, Malaysia (Kuala Lumpur), Indonesia (Jakarta), Thailand (Bangkok), Germany (Frankfurt), UK (London), US (Silicon Valley), US (Virginia)

Procedure

  1. Prepare the environment

    Create the EMR cluster and the DataWorks workspace used in this tutorial, and then configure network connectivity for the resource group.

  2. Synchronize data

    In DataWorks, configure data synchronization tasks that sync the provided user information and website log data to Object Storage Service (OSS). Then, create an EMR external table that parses the data in OSS, which makes the data available to the EMR computing resource attached to the workspace. After that, you can query the synchronized data.

  3. Transform data

    Use EMR Hive nodes in DataWorks to transform the user information and access log data synced to EMR into the target user persona data.

  4. Monitor data quality

    Configure data quality monitoring for the tables generated during data transformation. This helps detect and block dirty data early to prevent it from affecting downstream processes.

  5. Manage data

    After the user persona analysis workflow completes, data tables are created in EMR. Use Data Map to view the data lineage between these tables.

  6. Consume data

    After the user persona analysis is complete, use the DataAnalysis module to visualize the transformed data. This helps you quickly extract key information and understand business trends.
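The logic behind the Transform data and Monitor data quality steps can be illustrated with a small local sketch. It parses raw access log lines, drops lines that do not match (dirty data), joins the logs with user information on a user ID, counts page views per user, and then applies two simple quality rules. This is a hypothetical, in-memory Python illustration only: the log format, the field names (`uid`, `gender`, `age_range`, `pv`), and the sample rows are assumptions, not the tutorial's actual tables or data. In the tutorial itself, these steps run as EMR Hive nodes and DataWorks data quality rules, not as Python code.

```python
"""Hypothetical sketch of the transform and data quality steps.

Assumptions: the log format, field names, and sample rows are illustrative
only. The tutorial performs these steps with EMR Hive nodes and DataWorks
data quality rules, not with Python.
"""
import re
from collections import Counter
from dataclasses import dataclass

# Assumed raw log format: a simplified Apache/Nginx-style access log line.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) - (?P<uid>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


@dataclass
class LogRecord:
    ip: str
    uid: str
    url: str


def parse_logs(lines):
    """Parse raw log lines, skipping lines that do not match the pattern."""
    records = []
    for line in lines:
        m = LOG_PATTERN.match(line)
        if m:
            records.append(LogRecord(m["ip"], m["uid"], m["url"]))
    return records


def build_persona(users, logs):
    """Inner-join logs with user info on uid and count page views (pv) per user."""
    pv = Counter(record.uid for record in logs)
    rows = []
    for uid, pv_count in pv.items():
        profile = users.get(uid)
        if profile is None:
            continue  # no matching user: drop the row, as an inner join would
        rows.append({"uid": uid, **profile, "pv": pv_count})
    return rows


def check_quality(rows):
    """Two stand-in quality rules: the table is not empty and uid is never empty."""
    errors = []
    if not rows:
        errors.append("row count is 0")
    if any(not row.get("uid") for row in rows):
        errors.append("uid contains null or empty values")
    return errors


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    users = {
        "u1": {"gender": "female", "age_range": "30-40"},
        "u2": {"gender": "male", "age_range": "20-30"},
    }
    raw = [
        '10.0.0.1 - u1 [28/Oct/2025:10:00:00 +0800] "GET /index.html HTTP/1.1" 200 512',
        '10.0.0.2 - u2 [28/Oct/2025:10:01:00 +0800] "GET /cart HTTP/1.1" 200 128',
        '10.0.0.1 - u1 [28/Oct/2025:10:02:00 +0800] "POST /order HTTP/1.1" 201 64',
        "malformed line",
    ]
    persona = build_persona(users, parse_logs(raw))
    print(persona)
    print(check_quality(persona) or "all checks passed")
```

The quality check runs after the transformation and returns a list of failures instead of raising an error, which loosely mirrors the idea of a rule that blocks downstream processing only when it reports a problem. A real data quality rule in DataWorks is configured on the output table, not written in application code.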