This topic describes how to use the data warehouse development feature of Data Management (DMS) to create analysis reports.
Background information
Reports are used in various scenarios.
- Employees use reports to present business performance to managers.
- Data analysts use reports to analyze business scenarios from different angles to gain insights into business growth.
- Enterprises use reports to summarize data in business systems, such as daily revenue in an e-commerce system and daily orders in a logistics system.
These reports are often created in a big data platform. However, you can also use the data warehouse development feature of DMS to create these reports.
Benefits
- You can manage and analyze data in a centralized manner. You can manage both the business database and analytic database in DMS without the need to access the big data platform to obtain data.
- You can change the analytics engine based on your business requirements. You can select a database in DMS as the analytic database based on your business scale. DMS provides a wide range of databases for different scenarios. For example, you can use ApsaraDB RDS to analyze gigabyte-scale data and use AnalyticDB for MySQL to analyze petabyte-scale data.
- You can use permissions in DMS to manage data access and improve data security.
Prerequisites
- Select a business database that you want to analyze. The TPCH sample data is used in the examples in this topic.
- On the Permission Tickets page, apply for the export permission on the business database, change permission on the analytic database (ApsaraDB RDS is used in the examples in this topic), and query permission on the dictionary.
- Create a task flow named tpch_report.
Step 1: Synchronize tables in the business database to the analytic database
This section describes how to use a cross-database Spark SQL task to synchronize tables in the business database to the analytic database. For more information about how to process data across databases, see Query data across multiple database instances.
Create a cross-database Spark SQL task
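As a rough illustration, a cross-database Spark SQL statement for such a synchronization task might look like the following sketch. The database link names tpch_biz and tpch_rds are hypothetical placeholders for the links between DMS and your business and analytic databases; the column list follows the standard TPCH orders schema.

```sql
/* Hypothetical sketch: synchronize the TPCH orders table from the
   business database (tpch_biz) to the analytic database (tpch_rds).
   Replace the link and table names with the ones configured in DMS. */
INSERT INTO tpch_rds.orders
SELECT o_orderkey,
       o_custkey,
       o_orderstatus,
       o_totalprice,
       o_orderdate
FROM   tpch_biz.orders;
```

The same pattern can be repeated, one statement per table, for every business table that the analysis queries depend on.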
Step 2: Analyze data in the analytic database
Obtain the number of customers who placed an order, the number of orders, and the total order amount for each day.
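A query of this shape, written against the standard TPCH orders schema, is one way to produce that daily summary (the output column names are illustrative, not mandated by DMS):

```sql
-- Sketch of a daily order summary over the TPCH orders table.
SELECT o_orderdate               AS order_date,
       COUNT(DISTINCT o_custkey) AS customer_count,  -- customers who ordered that day
       COUNT(*)                  AS order_count,     -- orders placed that day
       SUM(o_totalprice)         AS total_amount     -- total order amount that day
FROM   orders
GROUP  BY o_orderdate;
```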
Count the number of parts and calculate the total discount amount and total tax amount of orders for each day in the lineitem table.
- For more information about how to add a task to synchronize the lineitem table, see Step 1: Synchronize tables in the business database to the analytic database.
- For more information about how to configure the task, see the configuration of the daily_orders task in Step 2: Analyze data in the analytic database.
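The lineitem summary can be sketched as follows, using the standard TPCH lineitem schema. In TPCH, the discount amount of a line item is l_extendedprice * l_discount and the tax amount is l_extendedprice * (1 - l_discount) * l_tax; the output column names are illustrative.

```sql
-- Sketch of a daily lineitem summary over the TPCH lineitem table.
SELECT l_shipdate                                       AS ship_date,
       SUM(l_quantity)                                  AS part_count,      -- parts shipped that day
       SUM(l_extendedprice * l_discount)                AS total_discount,  -- total discount amount
       SUM(l_extendedprice * (1 - l_discount) * l_tax)  AS total_tax        -- total tax amount
FROM   lineitem
GROUP  BY l_shipdate;
```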
Step 3: Synchronize analysis tables to the business or BI database
For more information about how to add a cross-database Spark SQL task to synchronize tables, see Step 1: Synchronize tables in the business database to the analytic database.
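The write-back task can follow the same cross-database pattern as the earlier synchronization, only with the direction reversed. In this sketch, daily_orders is the analysis table produced in Step 2, and tpch_biz and tpch_rds are hypothetical placeholders for the database links configured in DMS:

```sql
/* Hypothetical sketch: write the daily summary from the analytic
   database back to the business (or BI) database. */
INSERT INTO tpch_biz.daily_orders
SELECT order_date,
       customer_count,
       order_count,
       total_amount
FROM   tpch_rds.daily_orders;
```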
Connect nodes in the directed acyclic graph (DAG)
Connect the task nodes.
Configure the scheduling settings
Run a task flow and view the execution status
Run a task flow
- If status SUCCEEDED appears in the last line of the logs, the test run is successful.
- If status FAILED appears in the last line of the logs, the test run fails. In this case, view the node on which the failure occurs and the reason for the failure in the logs. Then, modify the configuration of the node and try again.
View the execution status of a task
Click Go to O&M in the upper-right corner of the canvas to go to the task flow O&M page.
- You can view basic information about the task flow, including the Created At, Changed At, and Scheduling Settings parameters.
- Click the Running History tab, and select Scheduling Trigger or Trigger Manually to view the running history of the task flow.
Note
- Scheduling Trigger: The task flow is automatically run at a scheduled time based on the specified scheduling properties.
- Trigger Manually: The task flow is run after you click Test Run.
- Click the icon next to the Status column to view the detailed run logs of the task flow.
- Click Executive History in the Operating column. In the panel that appears, you can view the time when the task flow was run, the user who ran the task flow, and the scheduling type of the task flow.
- In the Operating column, select End, Rerun, Pause, Restore, or Set Successfully to manage the task flow based on its status.
Note
- If a task flow is successful, you can rerun it.
- If a task flow fails, you can set the Status of the task flow to Success in its running history.
- If a task flow is running, you can terminate or pause it.
- Click the Published Tasks tab to view the Version ID, Published By, and Published At parameters of the task flow.