This topic describes how to use the data warehouse development feature of Data Management (DMS) to create analysis reports.

Background information

Reports are used in various scenarios.

  • Employees use reports to present business performance to managers.
  • Data analysts use reports to analyze business scenarios from different angles to gain insights into business growth.
  • Enterprises use reports to summarize data in business systems, such as daily revenue in an e-commerce system and daily orders in a logistics system.

These reports are often created on a big data platform. However, you can also create them by using the data warehouse development feature of DMS.

Benefits

The data warehouse development feature of DMS provides the following benefits:
  • You can manage and analyze data in a centralized manner. You can manage both the business database and analytic database in DMS without the need to access the big data platform to obtain data.
  • You can change the analytics engine based on your business requirements. You can select a database in DMS as the analytic database based on your business scale. DMS provides a wide range of databases for different scenarios. For example, you can use ApsaraDB RDS to analyze gigabyte-scale data and use AnalyticDB for MySQL to analyze petabyte-scale data.
  • You can use permissions in DMS to manage data access and improve data security.

Prerequisites

  • Select a business database that you want to analyze. The TPC-H sample data is used in the examples in this topic.
  • On the Permission Tickets page, apply for the export permission on the business database, change permission on the analytic database (ApsaraDB RDS is used in the examples in this topic), and query permission on the dictionary.
  • Create a task flow named tpch_report.

Step 1: Synchronize tables in the business database to the analytic database

This section describes how to use a cross-database Spark SQL task to synchronize tables in the business database to the analytic database. For more information about how to process data across databases, see Query data across multiple database instances.

Create a cross-database Spark SQL task

  1. Log on to the DMS console V5.0.
  2. In the top navigation bar, click DTS. In the left-side navigation pane, choose Data Development > Data Warehouse Development.
  3. On the Data Warehouse Development page, find the workspace that you want to use and click Go to Workspace.
    Note If you do not have a workspace, create a Warehouse project before you perform the following operations.
  4. Create a cross-database Spark SQL task.
    1. In the Task Type panel on the left side of the canvas, drag the Cross-Database Spark SQL node onto the canvas.
    2. Right-click the Cross-Database Spark SQL node on the canvas and click Rename to change the node name.
      Change the name of the node to table_sync.
      Note You can click the blank area on the canvas to save your changes to the node name.
  5. Copy data by using the cross-database Spark SQL node.
    1. Double-click the table_sync node.
    2. Click Add Database Reference, specify the Database Type and Database parameters, and click Save. Then, write SQL statements to synchronize data, as shown in the sample after this procedure.
  6. Configure variables.
    1. Click Variable Setting on the right side of the node configuration page.
    2. Click the Task Flow Variables tab and specify the Variable Name and Variable Rule parameters.
  7. Click Preview in the upper part of the page to check whether the variables are replaced correctly in the SQL statements.
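
The following statements are a minimal sketch of the synchronization SQL, not the definitive implementation. They assume that the database references added in the previous step are named tpch_biz (the business database) and tpch_rds (the analytic database) and that a task flow variable named bizdate is configured. The reference names, the table, and the date filter are placeholders that you adjust to match your configuration.

  -- Copy one day of orders from the business database to the analytic database.
  -- tpch_biz and tpch_rds are the database reference names defined in Add Database Reference.
  -- ${bizdate} is the task flow variable configured in the Variable Setting step.
  INSERT INTO tpch_rds.orders
  SELECT *
  FROM tpch_biz.orders
  WHERE o_orderdate = '${bizdate}';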

Step 2: Analyze data in the analytic database

Obtain the number of customers who placed orders, the number of orders, and the total order amount for each day.

  1. Create a single-instance SQL task.
    1. In the Task Type panel on the left side of the canvas, drag the Single Instance SQL node to the canvas.
    2. Right-click the Single Instance SQL node on the canvas and click Rename to change the node name.

      Change the name of the node to daily_orders.

  2. Select the analytic database. Create a table named daily_orders_summary and write SQL statements to count the daily data, as shown in the sample after this procedure.
    Note Your edits are saved automatically.
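
A minimal sketch of the SQL statements for the daily_orders node follows. It assumes that the orders table was synchronized to the analytic database in Step 1; the column names and data types of daily_orders_summary are examples that you can change as needed.

  -- Create the summary table in the analytic database (ApsaraDB RDS in this example).
  CREATE TABLE IF NOT EXISTS daily_orders_summary (
    order_date   DATE,            -- day of the order
    customer_cnt BIGINT,          -- number of customers who placed orders
    order_cnt    BIGINT,          -- number of orders
    total_amount DECIMAL(18,2)    -- total order amount
  );

  -- Aggregate the synchronized orders table by day.
  INSERT INTO daily_orders_summary (order_date, customer_cnt, order_cnt, total_amount)
  SELECT o_orderdate,
         COUNT(DISTINCT o_custkey),
         COUNT(*),
         SUM(o_totalprice)
  FROM orders
  GROUP BY o_orderdate;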

Count the number of parts, the total discount amount, and the total tax amount of orders for each day in the lineitem table. A sample query follows.
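
The following statements are one possible way to compute these statistics. They assume a summary table named daily_lineitem_summary (a hypothetical name), group the data by the l_shipdate column, and derive the discount and tax amounts from the standard TPC-H price formulas.

  CREATE TABLE IF NOT EXISTS daily_lineitem_summary (
    ship_date      DATE,            -- day on which the line items were shipped
    part_cnt       BIGINT,          -- number of distinct parts
    total_discount DECIMAL(18,2),   -- total discount amount
    total_tax      DECIMAL(18,2)    -- total tax amount
  );

  INSERT INTO daily_lineitem_summary (ship_date, part_cnt, total_discount, total_tax)
  SELECT l_shipdate,
         COUNT(DISTINCT l_partkey),
         SUM(l_extendedprice * l_discount),
         SUM(l_extendedprice * (1 - l_discount) * l_tax)
  FROM lineitem
  GROUP BY l_shipdate;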

Step 3: Synchronize analysis tables to the business or BI database

For more information about how to add a cross-database Spark SQL task to synchronize tables, see Step 1: Synchronize tables in the business database to the analytic database. A sample of the statements that write the summary data back to the business or BI database follows.
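
This sketch assumes database references named tpch_rds (the analytic database) and tpch_bi (the business or BI database); both names and the date filter are placeholders.

  -- Write the daily summary data back to the business or BI database.
  -- ${bizdate} is the task flow variable configured in Step 1.
  INSERT INTO tpch_bi.daily_orders_summary
  SELECT *
  FROM tpch_rds.daily_orders_summary
  WHERE order_date = '${bizdate}';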

Connect nodes in the directed acyclic graph (DAG)

Connect the task nodes in the order in which they run: table_sync, daily_orders, and then the synchronization node that you created in Step 3. The connected nodes form a directed acyclic graph (DAG).

Configure the scheduling settings

  1. On the task configuration page, click Task Flow Information below the canvas.
  2. In the Scheduling Settings section, turn on the Enable Scheduling switch and specify the parameters including Scheduling Type and Effective Time.

Run a task flow and view the execution status

Run a task flow

Click Try Run above the canvas to run the task flow and check whether it runs as expected.
  • If status SUCCEEDED appears in the last line of the logs, the test run is successful.
  • If status FAILED appears in the last line of the logs, the test run fails. In this case, view the node on which the failure occurs and the reason for the failure in the logs. Then, modify the configuration of the node and try again.

View the execution status of a task

Click Go to O&M in the upper-right corner of the canvas to view the task flow O&M page.

  • You can view basic information about the task flow, including the Created At, Changed At, and Scheduling Settings parameters.
  • Click the Running History tab, and select Scheduling Trigger or Trigger Manually to view the running history of the task flow.
    Note
    • Scheduling Trigger: The task flow is automatically run at the scheduled time based on the specified scheduling properties.
    • Trigger Manually: The task flow is run after you click Try Run.
  • Click the plus icon next to the Status column to view the detailed run logs of the task flow.
  • Click Executive History in the Operating column. In the panel that appears, you can view the time when the task flow was run, the user who ran the task flow, and the scheduling type of the task flow.
  • In the Operating column, select End, Rerun, Pause, Restore, or Set Successfully to manage the task flow based on its status.
    Note
    • If a task flow is successful, you can rerun it.
    • If a task flow fails, you can set its status to Success in its running history.
    • If a task flow is running, you can terminate or pause it.
  • Click the Published Tasks tab to view the Version ID, Published By, and Published At parameters of the task flow.