
DataWorks: Use old-version DataStudio

Last Updated: May 16, 2025

This topic describes how to use DataWorks together with E-MapReduce (EMR) for big data development and analysis. It also walks through a user profile analysis case study that lets you try out DataWorks services such as Data Integration, DataStudio, and Operation Center.

Case introduction

To develop effective business management strategies, you must obtain basic profile data for website users based on their activity on the website. The basic profile data includes the geographical and social attributes of the users. You can analyze this profile data by time and location to support refined operations on website traffic. DataWorks and EMR together let you complete data synchronization, data processing, data management, and data consumption.

Note

Before you start, read Experiment introduction to familiarize yourself with the end-to-end process of the user profile analysis case. This helps you complete this tutorial successfully.

Data development platform

This tutorial uses old-version DataStudio of DataWorks. Make sure that your workspace does not participate in the public preview of new-version Data Studio.

  • Do not turn on Participate in Public Preview of Data Studio when you create a workspace.

  • Since February 19, 2025, if you use your Alibaba Cloud account to activate DataWorks and create a workspace for the first time in one of the following regions, new-version Data Studio is enabled by default. In that case, follow the tutorial in Use new-version Data Studio instead.

    China (Hangzhou), China (Shanghai), China (Beijing), China (Shenzhen), China (Hong Kong), Singapore, Indonesia (Jakarta), and Germany (Frankfurt)

Procedure

  1. Step 1: Prepare environments

    Create an EMR cluster and a DataWorks workspace that are required for the tutorial, and configure the resource group and network settings.

  2. Step 2: Synchronize data

    Configure a data synchronization task in DataWorks to synchronize the basic user information and website access logs provided for this tutorial to Object Storage Service (OSS). Then create an EMR external table to parse the data stored in OSS, synchronize the data to the associated EMR computing resource, and query the synchronized data.

  3. Step 3: Process data

    Use EMR Hive nodes in DataWorks to process the basic user information table and the access log table that were synchronized to EMR, and generate the desired user profile data.

  4. Step 4: Monitor data quality

    Configure a monitor for the tables generated by data processing. The monitor helps you identify and intercept dirty data early, before its impact spreads.

  5. Step 5: Manage data

    After the user profile analysis tasks are complete, the resulting data tables are generated in EMR. You can view the data lineage between these tables in Data Map.

  6. Step 6: Consume data

    After you complete user profile analysis, use DataAnalysis to visualize the processed data. This helps you quickly extract key information and gain insight into the business trends behind the data.