DataWorks: Monitor data quality

Last Updated: Aug 11, 2025

This topic describes how to use Data Quality to monitor table data.

Prerequisites

Before you begin, ensure you have completed the steps in Synchronize data and Process data.

  • The basic user information in the ApsaraDB RDS for MySQL table ods_user_info_d is synchronized to the MaxCompute table ods_user_info_d_odps by using Data Integration.

  • The website access logs of users in user_log.txt in Object Storage Service (OSS) are synchronized to the MaxCompute table ods_raw_log_d_odps by using Data Integration.

  • The collected data is processed into basic user profile data in DataStudio.

Background information

Data Quality is an end-to-end platform that allows you to check the data quality of heterogeneous data sources, configure alert notifications, and manage data sources. Data Quality monitors data in datasets, including MaxCompute tables. When offline MaxCompute data changes, Data Quality checks the data and, if the check fails, blocks the nodes that consume the data. This prevents dirty data from affecting downstream data. In addition, Data Quality retains the history of check results, which allows you to analyze and grade data quality.

In this user profile analysis tutorial, Data Quality is used to promptly detect changes in the source data and any dirty data generated during the ETL process. The following list describes the monitoring requirements for each table in the user profile data analysis and processing procedure.

  • ods_raw_log_d_odps: Perform a daily non-zero row count check on the raw log data to prevent invalid downstream processing.

  • ods_user_info_d_odps: Configure a strong rule that monitors whether the number of rows synchronized to the user information table is 0 on a daily basis, and a weak rule that monitors whether the business primary key in the table is unique on a daily basis. These rules help prevent invalid data processing.

  • dwd_log_info_di_odps: No separate rule is required.

  • dws_user_info_all_di_odps: No separate rule is required.

  • ads_user_info_1d_odps: Configure a rule that monitors the fluctuation of the number of rows in the table on a daily basis. The rule is used to observe the fluctuation of daily unique visitors (UVs) and helps you learn about the application status at the earliest opportunity.

Go to the Configure by Table page

  1. Go to the Data Quality page.

    Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Governance > Data Quality. On the page that appears, select the desired workspace from the drop-down list and click Go to Data Quality.

  2. Go to the Configure by Table page.

    In the left-side navigation pane of the Data Quality page, choose Configure Rules > Configure by Table. On the Configure by Table page, find the desired table based on the following filter conditions:

    • In the Connection section, select MaxCompute.

    • In the MaxCompute category, select the current project in the production environment. In this example, workshop2024_01 is used.

    • On the right side of the Configure by Table page, specify filter conditions to find the ods_raw_log_d_odps, ods_user_info_d_odps, and ads_user_info_1d_odps tables for which you want to configure a monitor.

  3. Find the desired table in the search results and click Create Monitor in the Actions column. The Table Quality Details page of the table appears. The following section describes the configurations of each table.

Configure monitoring rules

Configure monitoring rules for the ods_raw_log_d_odps table

The ods_raw_log_d_odps table stores the website access logs of users that are synchronized from OSS. Based on the business properties of the table, you can configure a monitoring rule that checks whether the number of rows in the table is 0. Then, you can associate the monitoring rule with a monitor to trigger quality checks for the table.

1. Configure a monitor

You can use a monitor to check whether the quality of data in the specified range (partition) of a table meets your expectations.

In this step, you must set the Data Range parameter of the monitor to dt=$[yyyymmdd-1]. When triggered, the monitor checks the data in the specified partition to verify it meets the configured quality rules.

In this case, each time the scheduling node that is used to write data to the ods_raw_log_d_odps table is run, the monitor is triggered and the rules that are associated with the monitor are used to check whether the quality of data in the specified range meets your expectations.
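For reference, the following minimal Python sketch shows how a data range such as dt=$[yyyymmdd-1] resolves to the previous day's partition. The helper function is illustrative and assumes that the expression is evaluated against the node's scheduling date; DataWorks resolves the expression for you when the monitor is triggered.

```python
from datetime import date, timedelta

def resolve_data_range(scheduled_date: date) -> str:
    """Resolve a dt=$[yyyymmdd-1] data range for a given scheduling date.

    $[yyyymmdd-1] evaluates to the day before the scheduling date,
    formatted as yyyymmdd, so the monitor checks the previous day's partition.
    """
    partition_value = (scheduled_date - timedelta(days=1)).strftime("%Y%m%d")
    return f"dt={partition_value}"

# Example: a node scheduled on 2024-06-15 checks the partition dt=20240614.
print(resolve_data_range(date(2024, 6, 15)))  # dt=20240614
```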

You need to perform the following steps:

  1. On the Monitor tab, click Create Monitor.

  2. Configure the parameters of the monitor.

    The following list describes the key parameters.

    • Data Range: dt=$[yyyymmdd-1]

    • Trigger Method: Set this parameter to Triggered by Node Scheduling in Production Environment and select the ods_raw_log_d_odps node that is created during data synchronization.

    • Monitoring Rule: You do not need to configure this parameter. The monitoring rules are configured in the Configure monitoring rules section.

    Note

    For more information about how to configure a monitor, see Configure a monitoring rule for a single table.

2. Configure monitoring rules

The ods_raw_log_d_odps table is used to store website access logs of users synchronized from OSS. The table is used as a source table in a user profile analysis scenario. To prevent invalid data processing and data quality issues, you need to create and configure a strong rule that monitors whether the number of rows in the table is greater than 0. This rule helps you determine whether synchronization tasks wrote data to the related partitions in the table.

If the number of rows in the related partitions of the ods_raw_log_d_odps table is 0, an alert is triggered, the ods_raw_log_d_odps node fails and exits, and the descendant nodes of the ods_raw_log_d_odps node are blocked from running.
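Conceptually, this strong rule performs a row count check on the checked partition, roughly as in the following Python sketch. The SQL statement and the run_query callable are assumptions used for illustration; Data Quality generates and executes the actual check for you.

```python
def check_partition_not_empty(run_query, table: str, partition: str) -> bool:
    """Illustrative 'table is not empty' check on a single partition.

    run_query is assumed to be a callable that runs SQL against the
    MaxCompute project and returns a single scalar value.
    """
    row_count = run_query(f"SELECT COUNT(*) FROM {table} WHERE {partition};")
    if row_count == 0:
        # Strong rule: fail the node so that descendant nodes are blocked.
        raise RuntimeError(f"{table} ({partition}) is empty; blocking descendants")
    return True

# Example: check yesterday's partition of the raw log table.
# check_partition_not_empty(run_query, "ods_raw_log_d_odps", "dt=20240614")
```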

You need to perform the following steps:

  1. In the Monitor Perspective section of the Rule Management tab, select a monitor. In this example, the raw_log_number_of_table_rows_not_0 monitor is selected. Then, click Create Rule on the right side of the tab. The Create Rule panel appears.

  2. On the System Template tab of the Create Rule panel, find the Table is not empty rule and click Use. On the right side of the panel, set the Degree of Importance parameter to Strong Rule.

    Note

    In this example, the rule is defined as a strong rule. This indicates that when the number of rows in the ods_raw_log_d_odps table is found to be 0, an alert is triggered and the descendant nodes are blocked from running.

  3. Click OK.

    Note

    For information about other parameters configured for a monitoring rule, see Configure a monitoring rule for a single table.

3. Perform a test run on the monitor

You can perform a test run to verify whether the configurations of the monitoring rules that are associated with the monitor work as expected. To ensure that the configurations of the rules are correct and meet your expectations, perform a test run on the monitor after you create the rules that are associated with the monitor.

  1. In the Monitor Perspective section of the Rule Management tab, select a monitor. In this example, the raw_log_number_of_table_rows_not_0 monitor is selected. Then, click Test Run on the right side of the tab. The Test Run dialog box appears.

  2. In the Test Run dialog box, configure the Scheduling Time parameter and click Test Run.

  3. After the test run is complete, click View Details to check whether the data passes the test.

4. Subscribe to the monitor

Data Quality provides the monitoring and alerting feature. You can subscribe to monitors to receive alert notifications about data quality issues. This way, you can resolve the data quality issues at the earliest opportunity and ensure data security, data stability, and the timeliness of data generation.

  1. In the Monitor Perspective section of the Rule Management tab, select a monitor. In this example, the raw_log_number_of_table_rows_not_0 monitor is selected. Then, click Alert Subscription on the right side of the tab.

  2. In the Alert Subscription dialog box, configure the Notification Method and Recipient parameters, and click Save in the Actions column.

  3. After the subscription configuration is complete, choose Quality O&M > Monitor in the left-side navigation pane. Then, select My Subscriptions on the Monitor page to view and modify the subscribed monitors.

Configure monitoring rules for the ods_user_info_d_odps table

The ods_user_info_d_odps table stores the basic user information that is synchronized from ApsaraDB RDS for MySQL. Based on the business properties of the table, you can configure a rule that checks whether the number of rows in the table is 0 and a rule that checks whether the business primary key values are unique. Then, you can associate the rules with a monitor to trigger quality checks for the table.

1. Configure a monitor

You can use a monitor to check whether the quality of data in the specified range (partition) of a table meets your expectations.

In this step, you must set the Data Range parameter of the monitor to dt=$[yyyymmdd-1]. When the monitor is run, the monitor searches for the data partitions that match the parameter value and checks whether the quality of the data meets your expectations.

In this case, each time the scheduling node that is used to write data to the ods_user_info_d_odps table is run, the monitor is triggered and the rules that are associated with the monitor are used to check whether the quality of data in the specified range meets your expectations.

You need to perform the following steps:

  1. On the Monitor tab, click Create Monitor.

  2. Configure the parameters of the monitor.

    The following list describes the key parameters.

    • Data Range: dt=$[yyyymmdd-1]

    • Trigger Method: Set this parameter to Triggered by Node Scheduling in Production Environment and select the ods_user_info_d_odps node that is created during data synchronization.

    • Monitoring Rule: You do not need to configure this parameter. The monitoring rules are configured in the Configure monitoring rules section.

    Note

    For more information about how to configure a monitor, see Configure a monitoring rule for a single table.

2. Configure monitoring rules

The ods_user_info_d_odps table is used to store basic user information synchronized from ApsaraDB RDS for MySQL. The table is used as a source table in a user profile analysis scenario. To prevent invalid data processing and data quality issues, you need to create and configure a strong rule that monitors whether the number of rows in the table is greater than 0. This rule helps you determine whether synchronization tasks wrote data to the related partitions in the table.

After the monitoring rules take effect, if the number of rows in the related partitions of the ods_user_info_d_odps table is 0, an alert is triggered, the ods_user_info_d_odps node fails and exits, and the descendant nodes of the ods_user_info_d_odps node are blocked from running.

You need to perform the following steps:

  1. In the Monitor Perspective section of the Rule Management tab, select a monitor. In this example, the user_info_quality_control monitor is selected. Then, click Create Rule on the right side of the tab. The Create Rule panel appears.

  2. On the System Template tab of the Create Rule panel, find the Table is not empty rule and click Use. On the right side of the panel, set the Degree of Importance parameter to Strong Rule.

    Note

    In this example, the rule is defined as a strong rule. This indicates that when the number of rows in the ods_user_info_d_odps table is found to be 0, an alert is triggered and the descendant nodes are blocked from running.

  3. On the System Template tab of the Create Rule panel, find the Unique value, fixed value rule and click Use. On the right side of the panel, configure the Rule Scope, Monitoring Threshold, and Degree of Importance parameters. A conceptual sketch of this uniqueness check appears after these steps.

    • Rule Scope: Set it to uid(STRING).

    • Monitoring Threshold: For the Normal threshold parameter, set the comparison operator to = and the value to 0.

    • Degree of Importance: Set it to Weak Rule.

  4. Click OK.

    Note

    For information about other parameters configured for a monitoring rule, see Configure a monitoring rule for a single table.
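For reference, one plausible reading of the uniqueness check on the uid field is that it counts duplicate business primary key values in the checked partition and expects the result to equal the configured normal threshold (0). The SQL statement and the run_query callable in the following Python sketch are assumptions; the rule template may generate different SQL.

```python
def check_uid_unique(run_query, table: str, partition: str) -> bool:
    """Illustrative uniqueness check on the uid column of one partition."""
    duplicate_count = run_query(
        f"SELECT COUNT(uid) - COUNT(DISTINCT uid) FROM {table} WHERE {partition};"
    )
    if duplicate_count != 0:
        # Weak rule: report an alert but do not block descendant nodes.
        print(f"ALERT: {duplicate_count} duplicate uid values in {table} ({partition})")
        return False
    return True

# Example: check_uid_unique(run_query, "ods_user_info_d_odps", "dt=20240614")
```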

3. Other configurations

The operations to perform a test run on the monitor and subscribe to the monitor are the same as the operations that are described in the Configure monitoring rules for the ods_raw_log_d_odps table section.

Configure monitoring rules for the ads_user_info_1d_odps table

The ads_user_info_1d_odps table is the final result table. Based on the business properties of the table, you can configure a rule that monitors the fluctuation of the number of rows in the table and a rule that checks whether the primary key values are unique. This way, you can observe fluctuations in daily UVs and promptly learn about online traffic changes. Then, you can associate the rules with a monitor to trigger quality checks for the table.

1. Configure a monitor

You can use a monitor to check whether the quality of data in the specified range (partition) of a table meets your expectations.

In this step, you must set the Data Range parameter of the monitor to dt=$[yyyymmdd-1]. When the monitor is run, the monitor searches for the data partitions that match the parameter value and checks whether the quality of the data meets your expectations.

In this case, each time the scheduling node that is used to write data to the ads_user_info_1d_odps table is run, the monitor is triggered and the rules that are associated with the monitor are used to check whether the quality of data in the specified range meets your expectations.

You need to perform the following steps:

  1. On the Monitor tab, click Create Monitor.

  2. Configure the parameters of the monitor.

    The following list describes the key parameters.

    • Data Range: dt=$[yyyymmdd-1]

    • Trigger Method: Set this parameter to Triggered by Node Scheduling in Production Environment and select the ads_user_info_1d_odps node that is created during data processing.

    • Monitoring Rule: You do not need to configure this parameter. The monitoring rules are configured in the Configure monitoring rules section.

    Note

    For more information about how to configure a monitor, see Configure a monitoring rule for a single table.

2. Configure monitoring rules

The ads_user_info_1d_odps table stores the aggregated results that are used for user profile analysis. To detect the fluctuation of daily UVs, you need to create and configure a rule that monitors the fluctuation of the number of rows in the table and a rule that checks whether the primary key values are unique. This helps you observe the fluctuation of daily UVs and learn about online traffic fluctuations at the earliest opportunity.

After the monitoring rules take effect, if repeated primary key values exist in the ads_user_info_1d_odps table, an alert is triggered. If the 7-day fluctuation rate of the number of rows in the ads_user_info_1d_odps table is greater than 10% and less than or equal to 50%, a warning alert is triggered. If the 7-day fluctuation rate is greater than 50%, a critical alert is triggered.
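One plausible way to compute the 7-day fluctuation rate that these thresholds apply to is the relative change of the current row count against the average of the previous seven partitions, as in the following Python sketch. The formula and names are assumptions for illustration; the built-in 7-day volatility template may compute the rate differently.

```python
def seven_day_volatility(today_count: int, previous_counts: list) -> float:
    """Fluctuation rate of today's row count against the 7-day average.

    previous_counts holds the row counts of the previous seven partitions.
    Returns a ratio, for example 0.12 for a 12% fluctuation.
    """
    baseline = sum(previous_counts) / len(previous_counts)
    return abs(today_count - baseline) / baseline

def classify(volatility: float) -> str:
    """Map the fluctuation rate to the thresholds configured in this example."""
    if volatility > 0.50:   # Red threshold: > 50%
        return "critical alert"
    if volatility > 0.10:   # Orange threshold: > 10%
        return "warning alert"
    return "normal"         # Normal threshold: <= 10%

# Example: 13,000 rows today against a 10,000-row average is a 30% fluctuation.
print(classify(seven_day_volatility(13000, [10000] * 7)))  # warning alert
```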

Note

A handling policy is configured in the monitor.

  • If a rule is defined as a strong rule and the critical threshold is exceeded, the handling policy is block. This indicates that if a data quality issue is detected in the table, the scheduling node in the production environment that is used to write data to the table is identified, and the system sets the running status of the node to Failed. In this case, the descendant nodes of the node cannot be run, which blocks the production link and prevents the spread of dirty data.

  • If other exceptions are detected, the handling policy is alert. This indicates that if a data quality issue is detected in the table, the system sends alert notifications to the alert recipient by using the notification method configured in the monitor.

Take note of the following items when you configure monitoring rules (a sketch of this handling logic follows the list):

  • If a rule is defined as a strong rule and the critical threshold is exceeded, critical alerts are reported and descendant nodes are blocked. If other exceptions occur, alerts are reported but descendant nodes are not blocked.

  • If a rule is defined as a weak rule and the critical threshold is exceeded, critical alerts are reported but descendant nodes are not blocked. If other exceptions occur, alerts are reported but descendant nodes are not blocked.
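The following minimal sketch summarizes how rule strength and the check result map to the handling behavior described in the preceding list. The function and its string labels are illustrative, not a DataWorks API.

```python
def handling_policy(rule_strength: str, critical_threshold_exceeded: bool) -> str:
    """Map a check result to the handling behavior described above."""
    if critical_threshold_exceeded and rule_strength == "strong":
        # Only this case blocks the production link.
        return "critical alert; descendant nodes are blocked"
    if critical_threshold_exceeded:
        return "critical alert; descendant nodes keep running"
    # Any other exception, for example an orange threshold is exceeded.
    return "alert; descendant nodes keep running"

print(handling_policy("strong", True))  # critical alert; descendant nodes are blocked
print(handling_policy("weak", True))    # critical alert; descendant nodes keep running
```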

You need to perform the following steps:

  1. In the Monitor Perspective section of the Rule Management tab, select a monitor. In this example, the ads_user_info_quality_control monitor is selected. Then, click Create Rule on the right side of the tab. The Create Rule panel appears.

  2. On the System Template tab of the Create Rule panel, find the Number of rows. 7-day volatility rule and click Use. On the right side of the panel, configure the Monitoring Threshold and Degree of Importance parameters.

    • Monitoring Threshold:

      • For the Red threshold parameter, set the comparison operator to > and the value to 50%.

      • For the Orange threshold parameter, set the comparison operator to > and the value to 10%.

      • For the Normal threshold parameter, set the comparison operator to <= and the value to 10%.

    • Degree of Importance: Set it to Weak Rule.

  3. On the System Template tab of the Create Rule panel, find the Table is not empty rule and click Use. On the right side of the panel, set the Degree of Importance parameter to Strong Rule.

  4. Click OK.

    Note

    For information about other parameters configured for a monitoring rule, see Configure a monitoring rule for a single table.

3. Other configurations

The operations to perform a test run on the monitor and subscribe to the monitor are the same as the operations that are described in the Configure monitoring rules for the ods_raw_log_d_odps table section.

What to do next

After the data is processed, you can use DataAnalysis to visualize the data. For more information, see Visualize data on a dashboard.