DataWorks: Monitor data quality

Last Updated: Mar 26, 2026

When an ETL pipeline processes bad data, every downstream table inherits the problem. Data Quality lets you catch these issues at the source: it checks each MaxCompute table after its scheduling node runs, and when a rule fails, it can block all downstream tasks before dirty data spreads further.

This tutorial shows how to configure monitoring rules for three tables in the user persona analysis pipeline: ods_raw_log_d, ods_user_info_d, and ads_user_info_1d.

Prerequisites

Before you begin, ensure that you have completed:

  • Synchronize data — synchronizes ods_user_info_d from ApsaraDB RDS for MySQL and ods_raw_log_d from Object Storage Service (OSS) into MaxCompute using Data Integration

  • Process data — processes the raw data into user persona data in DataStudio

Key concepts

| Concept | Description |
| --- | --- |
| Monitor | Defines the data range to check and the trigger condition. A monitor is scoped to a partition (for example, dt=$[yyyymmdd-1] selects yesterday's data) and fires each time the associated scheduling node completes. |
| Monitoring rule | Defines what to check (for example, row count > 0, primary key uniqueness) and what happens on violation. |

Rules have two severity levels that determine the outcome:

| Severity | Violation outcome |
| --- | --- |
| Strong rule | The scheduling node is set to Failed, and all downstream tasks are blocked. Use for checks where dirty data must not proceed. |
| Soft rule | An alert notification is sent to subscribers. Downstream tasks continue. Use for checks where you want visibility without blocking the pipeline. |
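The difference between the two severity levels can be sketched in a few lines of Python. This is only an illustration of the outcomes described above, not DataWorks code: the names Severity, Outcome, and outcome_for are hypothetical, and the sketch assumes that subscribers are also notified when a strong rule is violated.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    STRONG = "strong"
    SOFT = "soft"


@dataclass(frozen=True)
class Outcome:
    node_failed: bool         # scheduling node is set to Failed
    downstream_blocked: bool  # downstream tasks do not run
    alert_sent: bool          # subscribers receive a notification


def outcome_for(severity: Severity, rule_passed: bool,
                has_subscribers: bool = True) -> Outcome:
    """Return the (hypothetical) outcome of one rule check."""
    if rule_passed:
        # A passing check changes nothing and sends nothing.
        return Outcome(False, False, False)
    # Only a strong rule fails the node and blocks downstream tasks.
    blocking = severity is Severity.STRONG
    # Assumption: any violation notifies subscribers, if there are any.
    return Outcome(node_failed=blocking,
                   downstream_blocked=blocking,
                   alert_sent=has_subscribers)


print(outcome_for(Severity.STRONG, rule_passed=False))
print(outcome_for(Severity.SOFT, rule_passed=False))
```

The only field that differs between the two violation cases is whether the pipeline is stopped, which is the practical question to ask when choosing a severity for a new rule.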

Monitoring requirements

The following table summarizes the quality checks for each table in this tutorial:

| Table | Rule | Severity and effect |
| --- | --- | --- |
| ods_raw_log_d | Row count > 0 | Strong rule: blocks downstream tasks if the table is empty after the daily sync from OSS |
| ods_user_info_d | Row count > 0 | Strong rule: blocks downstream tasks if no rows are loaded |
| ods_user_info_d | Primary key uid uniqueness | Soft rule: alerts subscribers if duplicates are detected |
| dwd_log_info_di | Not monitored | - |
| dws_user_info_all_di | Not monitored | - |
| ads_user_info_1d | Row count > 0 | Strong rule: blocks downstream tasks if the result table is empty |
| ads_user_info_1d | 7-day row count volatility | Soft rule: alerts subscribers if the daily row count (used here to track daily UV) deviates from the 7-day average by more than 10% (orange threshold) or 50% (red threshold) |

Go to the rule configuration page

  1. Log on to the DataWorks console. In the top navigation bar, select the target region. In the left-side navigation pane, choose Data Governance > Data Quality. Select the target workspace from the drop-down list and click Go to Data Quality.

  2. In the left navigation pane of the Data Quality page, click Configure Rules > Configure By Table.

  3. Search for the target table using the following filters:

    • Data Source: MaxCompute

    • Database: your production project, for example, workshop2024_01

    • Table: enter the table name

  4. In the search results, click Rule Management in the Actions column. The Table Quality Details page appears.

Repeat steps 3–4 for each table you want to configure: ods_raw_log_d, ods_user_info_d, and ads_user_info_1d.

Configure monitoring rules for ods_raw_log_d

The ods_raw_log_d table stores website access logs synchronized daily from OSS. If the table is empty after a sync, all downstream processing is wasted. Configure a strong rule that blocks downstream tasks when the row count is 0.

Step 1: Create a monitor

A monitor defines the partition to check and the scheduling node that triggers the check.

  1. On the Monitor tab, click Create Monitor.

  2. Configure the monitor with the following key parameters:

    | Parameter | Value |
    | --- | --- |
    | Data Range | dt=$[yyyymmdd-1] |
    | Trigger Method | Triggered by production scheduling. Select the ods_raw_log_d node created during data synchronization. |
    | Monitoring Rule | Leave blank for now. |

    Setting Data Range to dt=$[yyyymmdd-1] means the monitor checks yesterday's partition each time the scheduling job runs today. For a full list of monitor parameters, see Configure a monitoring rule for a single table.
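To make the Data Range expression concrete, the following Python sketch shows how dt=$[yyyymmdd-1] maps a scheduling date to a partition. The function resolve_data_range is hypothetical and only mimics the date arithmetic; DataWorks resolves the expression itself at run time.

```python
from datetime import date, timedelta


def resolve_data_range(scheduling_date: date, offset_days: int = -1) -> str:
    """Mimic dt=$[yyyymmdd-1]: shift the scheduling date and format it as yyyymmdd."""
    partition_day = scheduling_date + timedelta(days=offset_days)
    return f"dt={partition_day:%Y%m%d}"


# A run scheduled for March 26, 2026 checks the March 25 partition.
print(resolve_data_range(date(2026, 3, 26)))  # dt=20260325

# The offset crosses month boundaries as expected.
print(resolve_data_range(date(2026, 3, 1)))   # dt=20260228
```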

Step 2: Add a monitoring rule

  1. On the Rule Management tab, in the Monitor Perspective section, select the monitor you created (for example, raw_log_number_of_table_rows_not_0). Click Create Rule.

  2. On the System Template Rules tab, find Table Row Count Is Greater Than 0, click +Use, and set Degree of importance to Strong rules.

  3. Click Determine.

    With a strong rule, if the row count in the target partition is 0, the ods_raw_log_d node fails and all downstream tasks are blocked. For parameter details, see Configure rules for a single table.

Step 3: Run a test

Test the monitor to verify the rules are configured correctly.

  1. On the Rule Management tab, under Monitor Perspective, select the monitor (for example, raw_log_number_of_table_rows_not_0) and click Test Run.

  2. In the Test Run dialog box, select Scheduling Time and click Test Run.

  3. After the test completes, click View Details to check whether the test passed.

Step 4: Subscribe to alerts

Subscribe to the monitor to receive notifications when a rule is violated.

  1. On the Rule Management tab, under Monitor Perspective, select the monitor and click Alert Subscription.

  2. Add a Notification Method and Recipient, then click Save.

  3. To view or modify subscriptions, go to Quality O&M > Monitor in the left navigation pane and select My Subscriptions.

Configure monitoring rules for ods_user_info_d

The ods_user_info_d table stores basic user information synchronized daily from ApsaraDB RDS for MySQL. Configure two rules: a strong rule for row count and a soft rule for primary key uniqueness.

Step 1: Create a monitor

  1. On the Monitor tab, click Create Monitor.

  2. Configure the monitor with the following key parameters:

    | Parameter | Value |
    | --- | --- |
    | Data Range | dt=$[yyyymmdd-1] |
    | Trigger Method | Triggered by production scheduling. Select the ods_user_info_d node created during data synchronization. |
    | Monitoring Rule | Leave blank for now. |

    For a full list of monitor parameters, see Configure a monitoring rule for a single table.

Step 2: Add monitoring rules

  1. On the Rule Management tab, under Monitor Perspective, select the monitor (for example, user_info_quality_control). Click Create Rule.

  2. On the System Template Rules tab, find Table Row Count Is Greater Than 0, click +Use, and set Degree of importance to Strong rules.

  3. On the System Template Rules tab, find Unique Value Count, Fixed Value, click +Use, and configure the following parameters:

    | Parameter | Value |
    | --- | --- |
    | Rule Scope | uid(STRING) |
    | Monitoring Threshold | Normal threshold = 0 |
    | Degree of importance | soft rule |

  4. Click Determine.

    If the row count is 0, the strong rule fails the ods_user_info_d node and all downstream tasks are blocked. If duplicate uid values are detected, the soft rule sends an alert to subscribers and downstream tasks continue. For parameter details, see Configure rules for a single table.
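As an illustration of what the primary key uniqueness check looks for, the following Python sketch lists uid values that occur more than once. It is not how Data Quality evaluates the rule internally; the function find_duplicate_uids and the sample values are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def find_duplicate_uids(uids: Iterable[str]) -> Dict[str, int]:
    """Return each uid that appears more than once, with its occurrence count."""
    counts = Counter(uids)
    return {uid: n for uid, n in counts.items() if n > 1}


# Hypothetical uid column from one partition of ods_user_info_d.
sample_uids = ["0001", "0002", "0002", "0003"]
duplicates = find_duplicate_uids(sample_uids)
print(duplicates)  # {'0002': 2}

# An empty result means every uid is unique and the check passes.
print("violation" if duplicates else "pass")  # violation
```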

Step 3: Run a test and subscribe to alerts

The steps to test the monitor and subscribe to alerts are the same as those in Configure monitoring rules for ods_raw_log_d.

Configure monitoring rules for ads_user_info_1d

The ads_user_info_1d table is the final result table for user persona analysis. Configure a volatility rule to track daily unique visitor (UV) fluctuations and a strong rule to ensure the table is not empty.

Step 1: Create a monitor

  1. On the Monitor tab, click Create Monitor.

  2. Configure the monitor with the following key parameters:

    | Parameter | Value |
    | --- | --- |
    | Data Range | dt=$[yyyymmdd-1] |
    | Trigger Method | Triggered by production scheduling. Select the ads_user_info_1d node created during data processing. |
    | Monitoring Rule | Leave blank for now. |

    For a full list of monitor parameters, see Configure a monitoring rule for a single table.

Step 2: Add monitoring rules

  1. On the Rule Management tab, in the Monitor Perspective section, select the monitor (for example, ads_user_info_quality_control). Click Create Rule.

  2. On the System Template Rules tab, find Number Of Rows, 7-day Volatility, click +Use, and configure the following thresholds:

    | Threshold | Value | Outcome |
    | --- | --- | --- |
    | Red Threshold | > 50% | Alert sent: row count deviated more than 50% from the 7-day average |
    | Orange Threshold | > 10% | Alert sent: row count deviated more than 10% from the 7-day average |
    | Normal Threshold | ≤ 10% | No alert |

    Set Degree of importance to soft rule. Alerts notify subscribers but do not block downstream tasks.

  3. On the System Template Rules tab, find Table Row Count Is Greater Than 0, click +Use, and set Degree of importance to Strong rules.

  4. Click Determine. For parameter details, see Configure rules for a single table.

    The following table summarizes how violations are handled for this table:

    | Rule | Severity | Handling policy |
    | --- | --- | --- |
    | Table row count > 0 | Strong rule | Scheduling node is set to Failed; downstream tasks are blocked |
    | 7-day row count volatility > 50% | Soft rule | Alert sent to subscribers; downstream tasks continue |
    | 7-day row count volatility > 10% | Soft rule | Alert sent to subscribers; downstream tasks continue |
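The threshold arithmetic for the volatility rule can be worked through with a short Python sketch. It assumes the deviation is the absolute relative difference between today's row count and the mean of the previous seven daily counts; the exact formula Data Quality uses may differ, and classify_volatility and the sample numbers are hypothetical.

```python
from typing import List


def classify_volatility(today_rows: int, previous_7_days: List[int]) -> str:
    """Classify today's row count against the 7-day average as normal, orange, or red."""
    if len(previous_7_days) != 7:
        raise ValueError("expected exactly 7 daily row counts")
    average = sum(previous_7_days) / 7
    if average == 0:
        raise ValueError("7-day average is 0; relative deviation is undefined")
    # Assumed formula: absolute relative deviation from the 7-day mean.
    deviation = abs(today_rows - average) / average
    if deviation > 0.50:
        return "red"
    if deviation > 0.10:
        return "orange"
    return "normal"


history = [1000] * 7  # hypothetical daily row counts, average 1000
print(classify_volatility(1050, history))  # normal (5% deviation)
print(classify_volatility(1200, history))  # orange (20% deviation)
print(classify_volatility(400, history))   # red (60% deviation)
```

Because the rule is a soft rule, both the orange and red results only send an alert. The separate strong rule on row count is what stops the pipeline when the table is empty.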

Step 3: Run a test and subscribe to alerts

The steps to test the monitor and subscribe to alerts are the same as those in Configure monitoring rules for ods_raw_log_d.

What's next

With quality monitoring in place, visualize the processed user persona data in a dashboard. See Visualize data on a dashboard.