DataWorks: Data Studio (legacy version) tutorial

Last Updated: Mar 02, 2026

This tutorial shows you how to use DataWorks together with EMR for big data development and analysis. It walks through a user persona analysis case to demonstrate the capabilities of DataWorks in Data Integration, Data Development, and Operation Center.

Case description

To create better business strategies, you need to derive basic user profile data, such as geographical and social attributes, from users' website behavior. With this data, you can run persona analysis on a schedule and manage website traffic at a fine-grained level. You can use DataWorks together with EMR to perform data synchronization, data transformation, data management, and data consumption.

Note

Before you start this tutorial, read Tutorial objectives and design to understand the overall flow of the user persona analysis.

Data development platform

This tutorial uses the legacy version of Data Studio in DataWorks. Make sure that the Use The New Data Studio option is not enabled for your workspace.

  • When you create a workspace, do not select the Use The New Data Development (DataStudio) option.

  • After February 18, 2025, the new Data Studio is enabled by default the first time an Alibaba Cloud account with DataWorks enabled creates a workspace in one of the following regions. If the new Data Studio is enabled by default in your workspace, follow the Experience the new Data Studio tutorial instead.

    China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Ulanqab), China (Shenzhen), China (Chengdu), China (Hong Kong), Japan (Tokyo), Singapore, Malaysia (Kuala Lumpur), Indonesia (Jakarta), Thailand (Bangkok), Germany (Frankfurt), UK (London), US (Silicon Valley), US (Virginia)

Procedure

  1. Prepare the environment

    Create the EMR cluster and DataWorks workspace for this tutorial. Then, configure the resource group network.

  2. Synchronize data

    In DataWorks, configure a data synchronization task to sync the provided user information and website log data to Object Storage Service (OSS). Then create an EMR foreign table over the data in OSS. The table parses the files and makes the data available to the attached EMR computing resource, where you can query it.

  3. Transform data

    Use an EMR Hive node in DataWorks to transform the user information and access log tables that were synced to EMR and generate the target user persona data.

  4. Monitor data quality

    Configure data quality monitoring for the tables generated during data transformation. This helps detect and block dirty data early to prevent it from affecting downstream processes.

  5. Manage data

    After the user persona analysis workflow completes, data tables are created in EMR. Use Data Map to view the data lineage between these tables.

  6. Consume data

    After the user persona analysis is complete, use the DataAnalysis module to visualize the transformed data. This helps you quickly extract key information and understand business trends.
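
As a rough illustration of step 2, the sketch below renders the kind of HiveQL statement that could define a table over delimited files in OSS. It assumes that the foreign table is a Hive external table and that the files are comma-delimited text, neither of which this page confirms. The function name `external_table_ddl`, the table name, the columns, and the `oss://example-bucket/...` path are all hypothetical placeholders, not values from this tutorial.

```python
def external_table_ddl(table, columns, oss_location, delimiter=","):
    """Render a HiveQL CREATE EXTERNAL TABLE statement for delimited text files.

    All arguments are illustrative; nothing here is taken from the tutorial.
    """
    if not columns:
        raise ValueError("at least one column is required")
    if not oss_location.startswith("oss://"):
        raise ValueError("oss_location must be an oss:// URI")

    # One "`name` TYPE" entry per column, indented for readability.
    column_lines = ",\n".join(
        f"  `{name}` {hive_type}" for name, hive_type in columns
    )
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{table}` (\n"
        f"{column_lines}\n"
        ")\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{oss_location}';"
    )


# Hypothetical table, columns, and bucket, used only to show the output shape.
ddl = external_table_ddl(
    table="ods_user_info_demo",
    columns=[("uid", "STRING"), ("gender", "STRING"), ("age_range", "STRING")],
    oss_location="oss://example-bucket/user_info/",
)
print(ddl)
```

A statement of this shape would typically be run from an EMR Hive node. Refer to the Synchronize data step for the actual table definitions and OSS paths.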