DataWorks allows you to analyze data in online mode. In this topic, the MaxCompute public dataset is used to describe how to perform this operation.

Procedure

  1. Preparations: Prepare an environment. Before you prepare an environment, make sure that you understand all the requirements on the environment for data queries and analytics.
  2. Query data: Query data from two tables in the MaxCompute public dataset.
  3. Data analytics and sharing: Use web Excel to sort query results and perform pivoting for the results.

Preparations

Activate MaxCompute and DataWorks that are deployed in the same region, and create a DataWorks workspace and a MaxCompute project. Skip this step if you already have an environment that meets the preceding requirements.

  1. Activate MaxCompute and DataWorks.
    1. Log on to Alibaba Cloud, go to the product page of Alibaba Cloud MaxCompute, and then click Buy Now.
    2. Configure parameters such as Region, read and select the terms of service, and complete payment as prompted.
  2. Create a DataWorks workspace and a MaxCompute project, and associate them.
    1. Log on to the MaxCompute console and click Create Project on the Project management tab.
    2. Create a DataWorks workspace.
      In the Create Workspace panel, configure parameters and click Create project in the lower-left corner.Create Project
      • Mode: In this topic, Basic Mode (Production Environment Only) is selected. You can set this parameter based on your requirements.
      • Advanced Settings: If you do not need to download data to your on-premises machine, we recommend that you turn off Download SELECT Query to prevent downloaded data from being forwarded. This improves data security.
    3. Create a MaxCompute project.
      Configure parameters and click Confirm creation in the lower-left corner.Create a MaxCompute project
      • Payment mode: You can use the default mode or enable another mode based on your requirements. Default value: The pay-as-you-go billing method.
      • Access identity: For a data analytics project, we recommend that you select Node Owner. This indicates that the system verifies the permissions of the logon accounts of all members before the members can commit nodes.
    4. After the system displays the "Created successfully" message, click Go to list view to view the created workspace.

Query data

This section describes how to use the online data analytics feature provided by DataWorks to query data from the MaxCompute public dataset.

The MaxCompute public dataset provides a variety of public data for verification. This topic uses the maxcompute_public_data.dwd_product_movie_basic_info and maxcompute_public_data.ods_product_movie_box tables to collect statistics on the daily box office charts during the Spring Festival (Chinese New Year) in 2017.Public dataset
  • maxcompute_public_data.dwd_product_movie_basic_info: stores the basic information of movies, including the movie title, director, screenwriter, leading actors, and movie type.
  • maxcompute_public_data.ods_product_movie_box: stores the basic information of the box office for movies, including the movie title, daily box office gross, and total box office gross.
To collect statistics on the daily box office charts during the Spring Festival, you must associate the two tables. The following description provides the related operations and sample code.
  1. In the MaxCompute console, click Query editing.
  2. In the Select Datasource dialog box, select a data source.
    Select MaxCompute for Type and the workspace created in Preparations for Workspace. Then, click OK.
  3. In the left-side navigation pane of the page that appears, find the maxcompute_public_data.dwd_product_movie_basic_info and maxcompute_public_data.ods_product_movie_box tables on the Public data set tab and view the fields of the tables.
    1. Expand a table and move the pointer over a field name to view the description of the field.
    2. Right-click the table and select Data Preview. In the Table Details dialog box, 20 data records are randomly displayed.
  4. In the code editor, edit code to query data.
    In this example, the period from January 28, 2017 to February 3, 2017 is used as the official period of the Spring Festival. The daily box office gross of movies released during this period and the key information of the movies are queried. The query results are stored in a query result table, which is used for subsequent online data analytics.
    1. Edit query code in the code editor on the right.
      Sample code:
      WITH 
          a as(select ds,moviename,sumboxoffice,DENSE_RANK() OVER (PARTITION BY ds ORDER BY CAST(sumboxoffice AS DOUBLE) DESC) AS srank,boxoffice,irank from maxcompute_public_data.ods_product_movie_box WHERE ds>='20170128' and ds<='20170203' ),
          b as (select   movie_name,dirctor,type ,area, actors,movie_language ,ROW_NUMBER() OVER (PARTITION BY movie_name ORDER BY type DESC) AS nums from maxcompute_public_data.dwd_product_movie_basic_info where ds>='20170128' and ds<='20170203' ),
          c as (select b.movie_name,b.dirctor,b.type ,b.area, b.actors,b.movie_language from b where b.nums=1)
          select a.ds as Date,a.moviename as Movie title,a.irank as Box office ranking on the day,a.boxoffice as Box office gross on the day (tens of thousands) ,a.srank as Overall box office ranking,a.sumboxoffice as Total box office gross (tens of thousands),c.dirctor as Director,c.type as Movie type,c.area as Production area,c.actors as Leading actors,c.movie_language as Language from a LEFT join c on a.moviename=c.movie_name;
    2. Click the Run icon to run the query code.
    3. After the code is successfully run, click the Save query icon to save the query code.
      Query code
    4. Click Query mode in the upper-left corner to switch to Analysis mode. On the page that appears, click Save in the upper-right corner to save the result table for subsequent data analytics and sharing.
      Save the result table

Data analytics and sharing

You can use the analysis mode to perform simple secondary analytics and measurements based on web Excel. You can also share the results with others.

  1. Analyze data.
    • View the box office chart of movies each day.Chart
      1. Select a cell in the result table and click the Sort and Filter icon.
      2. Right-click the Box office ranking on the day column and select Convert the values in this column into numeric values.
      3. Sort the values in the Box office ranking on the day column in ascending order and the values in the Date column in descending order.
    • Use the pivoting feature to collect statistics on the box office distribution of movies throughout the Spring Festival. Select the column on which you want to perform pivoting and click Pivot in the upper-right corner. In the dialog box that appears, leave the parameters at their default values and click OK.Pivoting
      1. Drag the Movie title field from the Pivot Table Fields section to the Row section and the Box office gross on the day (tens of thousands) field to the Indicator section. Then, move the pointer over the field in the Indicator section, click the More icon, and select Edit. In the Property settings dialog box, select SUM for Summary method.
      2. Select the columns and click the Pie Chart icon.
    You can also perform more diversified analytics and measurements on data based on your business requirements. You must save all the statistics and analysis results.
  2. Share data.
    After you save the pie chart, click Share in the upper-right corner to share the results with RAM users that belong to your Alibaba Cloud account. The RAM users can access the result table based on the shared link or access code. You can specify whether the RAM users have edit or read permissions on the table.Share