DataWorks: Monitor data quality

Last Updated: Nov 27, 2025

This topic describes how to use Data Quality to monitor the quality of table data.

Prerequisites

Before you begin this tutorial, ensure that you have completed Synchronize data and Process data.

  • The basic user information from the RDS for MySQL table ods_user_info_d is synchronized to the MaxCompute table ods_user_info_d using Data Integration.

  • The website access logs of users from the user_log.txt file in OSS are synchronized to the MaxCompute table ods_raw_log_d using Data Integration.

  • The collected data is processed into basic user persona data in DataStudio.

Background

Data Quality is an end-to-end platform that lets you check the data quality of heterogeneous data sources, configure alert notifications, and manage data sources. Data Quality monitors datasets, including MaxCompute tables. When offline MaxCompute data changes, Data Quality checks the data and blocks the nodes that consume it. This prevents downstream data from being affected by dirty data. In addition, Data Quality keeps a history of check results so that you can analyze and evaluate data quality.

In this example, the Data Quality feature of DataWorks is used to promptly detect changes to source data and identify dirty data generated during the extract, transform, and load (ETL) process for the user persona analysis case. The following table describes the monitoring requirements for the analysis and transformation procedure of user persona data.

  • ods_raw_log_d: Monitor the daily synchronized raw log data to ensure that the table is not empty. This prevents invalid processing and lets you mitigate issues promptly.

  • ods_user_info_d: Check the user information data that is synchronized daily. Ensure that the row count is not zero (strong rule) and that the business primary key is unique (soft rule). This prevents invalid processing and stops data issues early.

  • dwd_log_info_di: Not monitored separately.

  • dws_user_info_all_di: Not monitored separately.

  • ads_user_info_1d: Configure a rule to monitor the daily row count of the user data table. This lets you track changes in daily unique visitors (UVs) and promptly understand your application's status.

Go to the rule configuration page

  1. Go to the Data Quality page.

    Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Governance > Data Quality. On the page that appears, select the desired workspace from the drop-down list and click Go to Data Quality.

  2. Go to the rule configuration page for a table.

    In the navigation pane on the left of the Data Quality page, choose Configure Rules > Configure By Table, and then find the target table based on the following parameters:

    • Data Source: MaxCompute

    • Database: Your current production project, such as workshop2024_01.

    • Table: In this tutorial, you need to configure quality monitoring for the ods_raw_log_d, ods_user_info_d, and ads_user_info_1d tables.

  3. Find the target table in the search results and click Rule Management in the Actions column. The Table Quality Details page for the table appears. The following section describes how to configure monitoring rules for the table.

Configure monitoring rules

Configure monitoring rules for the ods_raw_log_d table

The ods_raw_log_d table stores user website access records synchronized from OSS. Based on the business properties of this table, you can configure a rule to verify that the table row count is not 0. You can then associate this rule with a quality monitor to trigger a data quality check.

1. Configure a monitor

A monitor checks whether the data quality in a specified timestamp range (partition) of a table meets your expectations.

In this step, you must set the Data Range parameter of the monitor to dt=$[yyyymmdd-1]. When the monitor is run, it searches for the data partitions that match the parameter value and checks whether the data quality meets your expectations.

This means that each time the scheduling job for the ods_raw_log_d table runs, a data quality monitor is triggered. The monitor then uses its associated quality rules to check the data in the corresponding data range and determine whether the data conforms to the quality check rules.
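
The following Python sketch illustrates how the dt=$[yyyymmdd-1] expression resolves at run time. It models only the scheduling-parameter semantics described above and is not code that DataWorks executes:

```python
from datetime import date, timedelta

def resolve_data_range(scheduling_date: date) -> str:
    """Resolve dt=$[yyyymmdd-1]: the partition one day before the scheduling date."""
    business_date = scheduling_date - timedelta(days=1)
    return f"dt={business_date.strftime('%Y%m%d')}"

# A monitor triggered on June 15, 2024 checks the dt=20240614 partition.
print(resolve_data_range(date(2024, 6, 15)))  # dt=20240614
```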

Perform the following steps:

  1. On the Monitor tab, click Create Monitor.

  2. Configure the monitor.

    Key parameters:

    • Data Range: dt=$[yyyymmdd-1]

    • Trigger Method: Triggered by production scheduling. Select the ods_raw_log_d node that is created during data synchronization.

    • Monitoring Rule: Configure this in a later section.

    Note

    For more information about how to configure data quality monitoring, see Configure a monitoring rule for a single table.

In this example, the goal is to monitor whether the table data generated by daily scheduling tasks meets expectations. Because each daily run produces data for the previous day, the data is as expected when the task is scheduled on the current day and the result covers the previous day.

2. Configure monitoring rules

The ods_raw_log_d table stores website access logs of users synchronized from OSS. The table is used as a source table in a user persona analysis scenario. To prevent invalid data processing and data quality issues, you need to create and configure a strong rule that monitors whether the number of rows in the table is greater than 0. This rule helps you determine whether synchronization tasks wrote data to the related partitions in the table.

If the number of rows in the related partitions of the ods_raw_log_d table is 0, an alert is triggered, the ods_raw_log_d node fails, and the downstream tasks of the ods_raw_log_d node are blocked.
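
The following minimal Python sketch models this blocking behavior. The function and messages are illustrative placeholders, not the DataWorks implementation; in practice, the row count is computed from the checked partition:

```python
def check_row_count_strong_rule(table: str, partition: str, row_count: int) -> bool:
    """Model of the 'Table Row Count Is Greater Than 0' strong rule."""
    if row_count == 0:
        # Strong rule violated: an alert is triggered, the producing node is
        # set to Failed, and its downstream tasks are blocked.
        print(f"ALERT: {table} {partition} is empty; downstream tasks are blocked")
        return False
    return True

# Example: an empty partition fails the check and blocks the pipeline.
check_row_count_strong_rule("ods_raw_log_d", "dt=20240614", row_count=0)
```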

Perform the following steps:

  1. On the Rule Management tab, in the Monitor Perspective section, select an existing quality monitor, such as raw_log_number_of_table_rows_not_0. Then, click Create Rule.

  2. On the System Template Rules tab, find the Table Row Count Is Greater Than 0 rule, click +Use, and then set Degree of importance to Strong rules.

    Note

    In this example, the rule is defined as a strong rule. This indicates that if the number of rows in the ods_raw_log_d table is 0, an alert is triggered and the execution of downstream tasks is blocked.

  3. Click OK.

    Note

    For more information about configuration items, see Configure rules for a single table.

3. Perform a test run on the monitor

A test run verifies whether the monitoring rules in the monitor are configured correctly. To ensure that the rules are correct and meet your expectations, you can perform a test run after you create them.

  1. On the Rule Management tab, under Monitor Perspective, select the quality monitor that you created, such as raw_log_number_of_table_rows_not_0. Then, click Test Run on the right to open the Test Run dialog box.

  2. In the Test Run dialog box, set the Scheduling Time and click Test Run.

  3. After the test run is complete, click View Details to check whether the test passed.

4. Subscribe to the monitor

Data Quality provides a monitoring and alerting feature. You can subscribe to monitors to receive alert notifications about data quality issues. This lets you resolve issues promptly and ensures data security, stability, and timeliness.

  1. On the Rule Management tab, in the Monitor Perspective section, select the quality monitor, such as raw_log_number_of_table_rows_not_0, and then click Alert Subscription on the right.

  2. As prompted, add a Notification Method and Recipient, and then click Save in the Actions column.

  3. After you configure the subscription settings, choose Quality O&M > Monitor in the navigation pane on the left and select My Subscriptions to view and modify the subscribed tasks.

Configure monitoring rules for the ods_user_info_d table

The ods_user_info_d table stores basic user information synchronized from ApsaraDB RDS for MySQL. Based on the business properties of this table, you can configure a rule to monitor whether the number of rows in the table is 0 and a rule to monitor whether the primary key values are unique. You can then associate the rules with a monitor to trigger a quality check for the table.

1. Configure a monitor

A monitor checks whether the data quality in a specified timestamp range (partition) of a table meets your expectations.

In this step, you need to set the Data Range parameter of the quality monitor to dt=$[yyyymmdd-1]. When the monitor runs, it matches the data in this partition to determine whether it meets your expectations.

This means that each time the scheduling task for the ods_user_info_d table runs, the data quality monitor is triggered. The associated quality rules are then used to check whether the data in the corresponding range meets the data quality requirements.

Perform the following steps:

  1. On the Monitor tab, click Create Monitor.

  2. Configure the monitor.

    Key parameters:

    • Data Range: dt=$[yyyymmdd-1]

    • Trigger Method: Triggered by production scheduling. Select the ods_user_info_d node that is created during data synchronization.

    • Monitoring Rule: Do not configure this now. You will configure it in a later section.

    Note

    For more information about how to configure data quality monitoring, see Configure a monitoring rule for a single table.

2. Configure monitoring rules

The ods_user_info_d table stores basic user information synchronized from RDS for MySQL. The table is used as a source table in a user persona analysis scenario. To prevent invalid data processing and data quality issues, you need to create and configure a strong rule that monitors whether the number of rows in the table is greater than 0. This rule helps you determine whether synchronization tasks wrote data to the related partitions in the table.

After the rule takes effect, if the number of rows in the related partitions of the ods_user_info_d table is 0, an alert is triggered, the ods_user_info_d node fails, and the downstream tasks of the ods_user_info_d node are blocked.

Perform the following steps:

  1. On the Rule Management tab, in the Monitor Perspective section, select a monitor. In this example, the user_info_quality_control monitor is selected. Then, click Create Rule.

  2. On the System Template Rules tab, find the Table Row Count Is Greater Than 0 rule and click +Use. Then, set Degree of importance to Strong rules.

    Note

    In this example, the rule is defined as a strong rule. This indicates that if the number of rows in the ods_user_info_d table is 0, an alert is triggered and the execution of downstream tasks is blocked.

  3. On the System Template Rules tab, find the Unique Value Count, Fixed Value rule, click +Use, and then modify the Rule Scope, Monitoring Threshold, and Degree of importance parameters as follows. A sketch of this uniqueness check appears after these steps.

    • Rule Scope: uid(STRING)

    • Monitoring Threshold: Normal threshold = 0

    • Degree of importance: soft rule

  4. Click OK.

    Note

    For more information about configuration items, see Configure rules for a single table.
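
The following Python sketch shows one plausible reading of the uniqueness rule on the uid column: the check passes when the number of duplicated uid values is 0, which matches the Normal threshold above. It is an illustration only, not the DataWorks implementation:

```python
from collections import Counter

def duplicated_uid_count(uids: list[str]) -> int:
    """Count uid values that appear more than once in the checked partition."""
    return sum(1 for n in Counter(uids).values() if n > 1)

sample = ["u001", "u002", "u003", "u002"]  # u002 appears twice
duplicates = duplicated_uid_count(sample)
# Soft rule: a violation triggers an alert but does not block downstream tasks.
print("passed" if duplicates == 0 else f"ALERT: {duplicates} duplicated uid value(s)")
```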

3. Other configurations

The procedures to perform a test run on a quality monitor and subscribe to a quality monitor are the same as those described in the Configure monitoring rules for the ods_raw_log_d table section.

Configure monitoring rules for the ads_user_info_1d table

The ads_user_info_1d table is the final result table. Based on the business properties of the table, you can configure a rule that monitors the fluctuation of the number of rows in the table and a rule that checks that the table contains data. This lets you observe the fluctuation of daily unique visitors (UVs) and learn about online traffic fluctuations at the earliest opportunity. You can then associate the rules with a quality monitor to trigger data quality checks for the table.

1. Configure a monitor

A monitor checks whether the data quality in a specified timestamp range (partition) of a table meets your expectations.

In this step, you must set the Data Range parameter of the monitor to dt=$[yyyymmdd-1]. When the monitor runs, it matches the data in this partition to determine whether the data quality meets your expectations.

This means that each time the scheduling task for the ads_user_info_1d table is run, quality monitoring is triggered and its associated quality rules are used to check whether the data in the corresponding data range meets the data quality check rules.

Perform the following steps:

  1. On the Monitor tab, click Create Monitor.

  2. Configure the monitor.

    Key parameters:

    • Data Range: dt=$[yyyymmdd-1]

    • Trigger Method: Triggered by production scheduling. Select the ads_user_info_1d node that is created in Data Processing.

    • Monitoring Rule: Do not configure this now. You will configure it in a later section.

    Note

    For more information about how to configure data quality monitoring, see Configure a monitoring rule for a single table.

2. Configure monitoring rules

The ads_user_info_1d table is used for user persona analysis. To detect the fluctuation of daily UVs, you need to create and configure a rule that monitors the fluctuation of the number of rows in the aggregate data and a strong rule that monitors whether the table contains data. This helps you observe the fluctuation of daily UVs and learn about online traffic fluctuations at the earliest opportunity.

After the rules take effect, an alert is triggered and downstream tasks are blocked if the related partition of the ads_user_info_1d table is empty. A warning alert is triggered if the 7-day row count fluctuation rate is greater than 10% but does not exceed 50%, and a critical alert is triggered if the rate exceeds 50%.

Note

The following handling policies are configured in data quality monitoring:

  • Strong rule with a red alert: The handling policy is Block. If a data quality issue is detected in the table, the scheduling node in the production environment that writes data to the table is identified, and the system sets the running status of the node to Failed. The descendant nodes of that node cannot run, which blocks the production pipeline and prevents the spread of dirty data.

  • For other exceptions, the handling policy is Alert. This means that if a data quality issue is detected, the system sends alert information to the alert subscription channels of the monitoring job.

Note the following when you configure monitoring rules:

  • If you set a strong rule, an error alert blocks descendant nodes. Other alerts do not.

  • If you set a soft rule, no alerts block descendant nodes.
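
The following Python sketch illustrates how the volatility thresholds used in the next steps classify a check result. It assumes the fluctuation rate is the relative change of the current row count against the 7-day average; the exact formula of the system template may differ:

```python
def classify_volatility(today_rows: int, avg_7day_rows: float) -> str:
    """Classify the 7-day row count fluctuation rate against the rule thresholds."""
    rate = abs(today_rows - avg_7day_rows) / avg_7day_rows
    if rate > 0.50:
        return "red alert (critical)"    # Red Threshold > 50%
    if rate > 0.10:
        return "orange alert (warning)"  # Orange Threshold > 10%
    return "normal"                      # Normal Threshold <= 10%

# Example: 1,800 rows against a 7-day average of 1,000 is an 80% fluctuation,
# which exceeds the red threshold.
print(classify_volatility(1800, 1000.0))  # red alert (critical)
```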

Perform the following steps:

  1. On the Rule Management tab, in the Monitor Perspective section, select an existing quality monitor, such as ads_user_info_quality_control. Then, click Create Rule.

  2. On the System Template Rules tab, find the Number Of Rows, 7-day Volatility rule, click +Use, and then configure the Monitoring Threshold and Degree of importance as follows.

    • Monitoring Threshold:

      • Red Threshold > 50%

      • Orange Threshold > 10%

      • Normal Threshold <= 10%

    • Degree of importance: soft rule

  3. On the System Template Rules tab, find the Table Row Count Is Greater Than 0 rule, click +Use, and then set Degree of importance to Strong rules.

  4. Click OK.

    Note

    For more information about configuration items, see Configure rules for a single table.

3. Other configurations

The procedures to perform a test run on a quality monitor and subscribe to a quality monitor are the same as those described in the Configure monitoring rules for the ods_raw_log_d table section.

What to do next

After the data is processed, you can use DataAnalysis to visualize the data. For more information, see Visualize data on a dashboard.