DataWorks: Monitor data quality

Last Updated: Mar 31, 2025

This topic describes how to use monitoring rules in Data Quality to detect exceptions when data synchronization nodes run on their daily schedule. The example uses the user information table ods_user_info_d_starrocks and configures two rules for it: a strong rule that monitors whether the number of rows synchronized to the table is greater than 0, and a weak rule that monitors whether the business primary key in the table is unique. Each day, when the data synchronization node that generates the table is scheduled, the rules are triggered to detect missing source data and duplicate primary keys in real time, which helps ensure the reliability of the node's downstream computing. The following sections describe how to configure these monitoring rules.

Prerequisites

Data is synchronized and processed.

  • The basic user information in the ApsaraDB RDS for MySQL table ods_user_info_d is synchronized to the ods_user_info_d_starrocks table created in an E-MapReduce (EMR) Serverless StarRocks instance by using Data Integration.

  • The website access logs of users in user_log.txt in Object Storage Service (OSS) are synchronized to the ods_raw_log_d_starrocks table created in an EMR Serverless StarRocks instance by using Data Integration.

  • The collected data is processed into basic user profile data in Data Studio.

Analysis of data quality monitoring requirements

In this example, Data Quality is used to promptly detect changes to source data in the user profile analysis case and dirty data generated when the extract, transform, and load (ETL) operations are performed on the source data. The following table describes the monitoring requirements for the user profile analysis and processing procedure.

Table name | Detailed requirement

ods_raw_log_d_starrocks | Configure a strong rule that monitors whether the number of rows synchronized to the raw log data table is greater than 0 on a daily basis. This ensures that the raw log data can be successfully obtained on a daily basis and prevents subsequent computing from being affected due to missing data.

ods_user_info_d_starrocks | Configure a strong rule that monitors whether the number of rows synchronized to the user information table is greater than 0 on a daily basis, and a weak rule that monitors whether the business primary key in the table is unique on a daily basis. This ensures that the user information can be successfully obtained on a daily basis, prevents data duplication, and ensures accuracy of subsequent computing.

dwd_log_info_di_starrocks | Run the node without configuring a monitoring rule.

dws_user_info_all_di_starrocks | Run the node without configuring a monitoring rule.

ads_user_info_1d_starrocks | Configure a rule that monitors the fluctuation of the number of rows in the user information table on a daily basis. The rule is used to observe the fluctuation of daily unique visitors (UVs) and helps you learn the application status at the earliest opportunity.
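
The row-count fluctuation requirement for ads_user_info_1d_starrocks can be illustrated with a small sketch. It is not the Data Quality implementation: it assumes that fluctuation means the relative day-over-day change in row count, and the function names and the 10% and 50% thresholds are hypothetical values chosen only for illustration.

```python
def fluctuation_rate(current_rows: int, previous_rows: int) -> float:
    """Relative day-over-day change in row count, expressed as a fraction."""
    if previous_rows == 0:
        # With no rows on the previous day the relative change is undefined.
        raise ValueError("previous row count is 0; fluctuation is undefined")
    return (current_rows - previous_rows) / previous_rows


def classify(rate: float, warning: float = 0.10, critical: float = 0.50) -> str:
    """Map the absolute fluctuation to a status. Thresholds are hypothetical."""
    magnitude = abs(rate)
    if magnitude > critical:
        return "critical"
    if magnitude > warning:
        return "warning"
    return "normal"
```

For example, a change from 100 rows to 120 rows is a fluctuation of 0.2, which this sketch classifies as a warning under the assumed thresholds.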

You can perform the steps in the following sections to configure monitoring rules for the ods_user_info_d_starrocks table to monitor the quality of the table data generated based on periodic scheduling.

Step 1: Go to the Configure by Table page

  1. Go to the Data Quality page.

    Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Governance > Data Quality. On the page that appears, select the desired workspace from the drop-down list and click Go to Data Quality.

  2. Go to the Configure by Table page.

    In the left-side navigation pane of the Data Quality page, choose Configure Rules > Configure by Table. On the Configure by Table page, search for the desired table based on the following filter conditions:

    • In the Connection section, select StarRocks.

    • On the right side of the Configure by Table page, specify filter conditions to find the ods_user_info_d_starrocks table.

  3. Find the desired table in the search results and click Rule Management in the Actions column. The Table Quality Details page of the table appears. The following sections describe the configurations of the table.

Step 2: Configure monitoring rules

In this section, monitoring rules are configured for the ods_user_info_d_starrocks table to check whether the specified partition contains data and whether the business primary key is unique. The configuration includes creating the monitoring rules, specifying how the rules are triggered, and specifying the handling policy for exceptions that the rules detect.

  1. Select a monitoring scope.

    1. On the Monitor tab, click Create Monitor.

    2. Set the Data Range parameter to dt=$[yyyymmdd-1].

      Note

      To monitor the table data generated based on periodic scheduling, make sure that the value of the Data Range parameter corresponds to the partition generated for the table on the current day.

  2. Create monitoring rules.

    In this step, two rules are configured for the ods_user_info_d_starrocks table: one that monitors whether the number of rows in the table is greater than 0, and one that monitors whether the business primary key is unique. For more information about how to configure monitoring rules, see Configure a monitoring rule for a single table.

    1. On the Create Monitor page, click Create Rule. The Create Rule panel appears.

    2. On the System Template tab of the Create Rule panel, find the Table is not empty rule and click Use. On the right side of the panel, set the Degree of Importance parameter to Strong Rule.

      Note

      In this example, the rule is defined as a strong rule. If the number of rows in the ods_user_info_d_starrocks table is 0, an alert is triggered and the descendant nodes are blocked from running.

    3. On the System Template tab of the Create Rule panel, find the Unique value. fixed value rule and click Use. On the right side of the panel, configure the Rule Scope, Monitoring Threshold, and Degree of Importance parameters.

      • Rule Scope: Set it to uid(STRING).

      • Monitoring Threshold: For the Normal threshold parameter, set the comparison operator to = and the value to 0.

      • Degree of Importance: Set it to Weak rules.

    4. Click Determine to save the configured monitoring rules.

  3. Specify the trigger method of the rules.

    Set the Trigger Method parameter to Triggered by Node Scheduling in Production Environment and select the ods_user_info_d_starrocks node that is created during data synchronization.

  4. Specify the handling policy for exceptions detected by the rules.

    Based on your business requirements, set the handling policy to block the node from running or to send an alert notification to the recipient.

  5. After the configuration is complete, click Save to save the configurations of the monitor.
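
To make the intent of the two rules concrete, the following is a minimal sketch of the checks they express, run against in-memory rows rather than a StarRocks table. It is not how Data Quality evaluates rules. The function names and result fields are hypothetical. The sketch also assumes that the uniqueness rule compares the number of duplicated uid values with 0, and that a failed weak rule raises an alert without blocking descendant nodes.

```python
from collections import Counter
from typing import Iterable, List, Mapping


def evaluate_rules(rows: Iterable[Mapping[str, str]], key: str = "uid") -> List[dict]:
    """Evaluate a 'table is not empty' check and a key-uniqueness check."""
    materialized = list(rows)
    row_count = len(materialized)
    # Count how many distinct key values appear more than once.
    key_counts = Counter(row[key] for row in materialized)
    duplicated_keys = sum(1 for count in key_counts.values() if count > 1)
    return [
        {"rule": "table_not_empty", "severity": "strong", "passed": row_count > 0},
        {"rule": "unique_key", "severity": "weak", "passed": duplicated_keys == 0},
    ]


def should_block(results: List[dict]) -> bool:
    """Assumption: only a failed strong rule blocks descendant nodes."""
    return any(r["severity"] == "strong" and not r["passed"] for r in results)


def failed_rules(results: List[dict]) -> List[str]:
    """Names of the rules that would produce an alert."""
    return [r["rule"] for r in results if not r["passed"]]
```

In this sketch, a partition that contains a duplicated uid fails only the weak rule, so it produces an alert and downstream nodes still run. An empty partition fails the strong rule, so downstream nodes are blocked.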

Step 3: Perform a test run on the monitor

After the configuration is complete, perform a test run on the monitor to verify that the monitoring rules associated with it are configured correctly and work as expected.

  1. In the Monitor Perspective section of the Rule Management tab, select the created monitor. Then, click Test Run on the right side of the tab. The Test Run dialog box appears.

  2. In the Test Run dialog box, configure the Scheduling Time parameter and click Test Run.

  3. After the test run is complete, click View Details to check whether the data passes the test.
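
When you choose a Scheduling Time for the test run, it helps to know which partition the Data Range expression dt=$[yyyymmdd-1] from Step 2 points to. The following sketch assumes that the expression resolves to the calendar day before the scheduling date. The function name and the offset_days parameter are hypothetical and are not part of DataWorks.

```python
from datetime import datetime, timedelta


def resolve_data_range(scheduling_time: datetime, offset_days: int = 1) -> str:
    """Return the partition filter for a scheduling time.

    Assumption: dt=$[yyyymmdd-1] means the scheduling date minus one day.
    """
    partition_day = scheduling_time.date() - timedelta(days=offset_days)
    return f"dt={partition_day:%Y%m%d}"


# Example: a run scheduled at 00:30 on March 31, 2025.
partition = resolve_data_range(datetime(2025, 3, 31, 0, 30))
```

Under this assumption, a scheduling time on March 31, 2025 checks the partition dt=20250330, so the test run passes only if that partition already contains data.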

Step 4: Subscribe to the monitor

After you configure the monitoring rules, perform the following steps to configure the alert notification method and the recipient of alert notifications.

  1. In the Monitor Perspective section of the Rule Management tab, select the created monitor.

  2. Click Alert Subscription on the right side of the tab.

  3. In the Alert Subscription dialog box, configure the Notification Method and Recipient parameters, and click Save in the Actions column.

  4. After the subscription configuration is complete, choose Quality O&M > Monitor in the left-side navigation pane. Then, select My Subscriptions on the Monitor page to view and modify the subscribed monitors.

What to do next

After the data is processed, you can use DataAnalysis to visualize the data. For more information, see Visualize data on a dashboard.