When an ETL pipeline processes bad data, every downstream table inherits the problem. Data Quality lets you catch these issues at the source: it checks each MaxCompute table after its scheduling node runs, and when a rule fails, it can block all downstream tasks before dirty data spreads further.
This tutorial shows how to configure monitoring rules for three tables in the user persona analysis pipeline: ods_raw_log_d, ods_user_info_d, and ads_user_info_1d.
Prerequisites
Before you begin, ensure that you have completed:
Synchronize data: synchronizes ods_user_info_d from ApsaraDB RDS for MySQL and ods_raw_log_d from Object Storage Service (OSS) into MaxCompute using Data Integration
Process data: processes the raw data into user persona data in DataStudio
Key concepts
Concept | Description |
Monitor | Defines the data range to check and the trigger condition. A monitor is scoped to a partition (for example, |
Monitoring rule | Defines what to check (for example, row count > 0, primary key uniqueness) and what happens on violation. |
Rules have two severity levels that determine the outcome:
Severity | Violation outcome |
Strong rule | The scheduling node is set to Failed, and all downstream tasks are blocked. Use for checks where dirty data must not proceed. |
Soft rule | An alert notification is sent to subscribers. Downstream tasks continue. Use for checks where you want visibility without blocking the pipeline. |
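The difference between the two severity levels can be sketched as a small decision function. This is only an illustration of the outcomes listed in the table above, not Data Quality's implementation; the names `Outcome` and `apply_rule_result` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    node_failed: bool         # True if the scheduling node is set to Failed
    downstream_blocked: bool  # True if downstream tasks do not run
    alert_sent: bool          # True if subscribers are notified (soft rule path)


def apply_rule_result(severity: str, rule_passed: bool) -> Outcome:
    """Map a rule result to the outcome described in the severity table.

    `severity` is "strong" or "soft" (hypothetical labels for this sketch).
    """
    if severity not in ("strong", "soft"):
        raise ValueError(f"unknown severity: {severity!r}")
    if rule_passed:
        # No violation: nothing is blocked and nobody is alerted.
        return Outcome(node_failed=False, downstream_blocked=False, alert_sent=False)
    if severity == "strong":
        # Strong rule violation: node fails, downstream is blocked.
        # The table does not describe alerting for strong rules, so it is not modeled.
        return Outcome(node_failed=True, downstream_blocked=True, alert_sent=False)
    # Soft rule violation: alert only, the pipeline keeps running.
    return Outcome(node_failed=False, downstream_blocked=False, alert_sent=True)
```

A strong rule is the right choice when `downstream_blocked` must be true on failure; a soft rule only surfaces the problem.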
Monitoring requirements
The following table summarizes the quality checks for each table in this tutorial:
Table | Rule | Severity |
ods_raw_log_d | Row count > 0 | Strong rule: blocks downstream if table is empty after daily sync from OSS |
ods_user_info_d | Row count > 0 | Strong rule: blocks downstream if no rows loaded |
ods_user_info_d | Primary key uniqueness | Soft rule: alerts subscribers if duplicates detected |
| Not monitored | — |
| Not monitored | — |
ads_user_info_1d | Row count > 0 | Strong rule: blocks downstream if result table is empty |
ads_user_info_1d | 7-day row count volatility | Soft rule: alerts if the daily UV count deviates more than 10% (orange threshold) or 50% (red threshold) from the 7-day average |
Go to the rule configuration page
Log on to the DataWorks console. In the top navigation bar, select the target region. In the left-side navigation pane, choose Data Governance > Data Quality. Select the target workspace from the drop-down list and click Go to Data Quality.
In the left navigation pane of the Data Quality page, click Configure Rules > Configure By Table.
Search for the target table using the following filters:
Data Source: MaxCompute
Database: your production project, for example, workshop2024_01
Table: enter the table name
In the search results, click Rule Management in the Actions column. The Table Quality Details page appears.
Repeat the table search and open Rule Management for each table you want to configure: ods_raw_log_d, ods_user_info_d, and ads_user_info_1d.
Configure monitoring rules for ods_raw_log_d
The ods_raw_log_d table stores website access logs synchronized daily from OSS. If the table is empty after a sync, all downstream processing is wasted. Configure a strong rule that blocks downstream tasks when the row count is 0.
Step 1: Create a monitor
A monitor defines the partition to check and the scheduling node that triggers the check.
On the Monitor tab, click Create Monitor.
Configure the monitor with the following key parameters:
Parameter | Value |
Data Range | dt=$[yyyymmdd-1] |
Trigger Method | Triggered by production scheduling. Select the ods_raw_log_d node created during data synchronization. |
Monitoring Rule | Leave blank for now. |
Setting Data Range to dt=$[yyyymmdd-1] means the monitor checks yesterday's partition each time the scheduling job runs today. For a full list of monitor parameters, see Configure a monitoring rule for a single table.
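To make the Data Range expression concrete, the following sketch mimics how dt=$[yyyymmdd-1] resolves to a partition value. It assumes the expression is evaluated against the date of the scheduled run and subtracts one day; the function name `resolve_data_range` is hypothetical and is not part of DataWorks.

```python
from datetime import date, timedelta


def resolve_data_range(scheduled_date: date, offset_days: int = 1) -> str:
    """Return the partition filter that dt=$[yyyymmdd-N] would produce.

    Assumption: the expression is relative to the scheduled run date.
    """
    partition_day = scheduled_date - timedelta(days=offset_days)
    return f"dt={partition_day.strftime('%Y%m%d')}"


# A run scheduled for 15 January 2024 checks the partition for 14 January.
print(resolve_data_range(date(2024, 1, 15)))  # dt=20240114
```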
Step 2: Add a monitoring rule
On the Rule Management tab, in the Monitor Perspective section, select the monitor you created (for example, raw_log_number_of_table_rows_not_0). Click Create Rule.
On the System Template Rules tab, find Table Row Count Is Greater Than 0, click +Use, and set Degree of importance to Strong rules.
Click Determine.
With a strong rule, if the row count in the target partition is 0, the ods_raw_log_d node fails and all downstream tasks are blocked. For parameter details, see Configure rules for a single table.
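Conceptually, the row count rule amounts to counting the rows in one partition and comparing the result with 0. The sketch below illustrates this with an in-memory SQLite table standing in for the MaxCompute table; the column `raw_line`, the sample row, and both helper functions are assumptions made for the example, and Data Quality runs its own check rather than this code.

```python
import sqlite3


def partition_row_count(conn: sqlite3.Connection, table: str, partition: str) -> int:
    """Count rows in one dt partition. `table` must be a trusted identifier."""
    cursor = conn.execute(f"SELECT COUNT(*) FROM {table} WHERE dt = ?", (partition,))
    return cursor.fetchone()[0]


def row_count_rule_passes(conn: sqlite3.Connection, table: str, partition: str) -> bool:
    """The rule passes only if the partition contains at least one row."""
    return partition_row_count(conn, table, partition) > 0


# Stand-in for the real table: one row loaded into the 20240114 partition.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ods_raw_log_d (raw_line TEXT, dt TEXT)")
conn.execute("INSERT INTO ods_raw_log_d VALUES ('sample log line', '20240114')")

loaded_ok = row_count_rule_passes(conn, "ods_raw_log_d", "20240114")     # partition has data
empty_day_ok = row_count_rule_passes(conn, "ods_raw_log_d", "20240115")  # partition is empty
```

With a strong rule, the second case is the one that would fail the node and block downstream tasks.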
Step 3: Run a test
Test the monitor to verify the rules are configured correctly.
On the Rule Management tab, under Monitor Perspective, select the monitor (for example, raw_log_number_of_table_rows_not_0) and click Test Run.
In the Test Run dialog box, select Scheduling Time and click Test Run.
After the test completes, click View Details to check whether the test passed.
Step 4: Subscribe to alerts
Subscribe to the monitor to receive notifications when a rule is violated.
On the Rule Management tab, under Monitor Perspective, select the monitor and click Alert Subscription.
Add a Notification Method and Recipient, then click Save.
To view or modify subscriptions, go to Quality O&M > Monitor in the left navigation pane and select My Subscriptions.
Configure monitoring rules for ods_user_info_d
The ods_user_info_d table stores basic user information synchronized daily from ApsaraDB RDS for MySQL. Configure two rules: a strong rule for row count and a soft rule for primary key uniqueness.
Step 1: Create a monitor
On the Monitor tab, click Create Monitor.
Configure the monitor with the following key parameters:
Parameter | Value |
Data Range | dt=$[yyyymmdd-1] |
Trigger Method | Triggered by production scheduling. Select the ods_user_info_d node created during data synchronization. |
Monitoring Rule | Leave blank for now. |
For a full list of monitor parameters, see Configure a monitoring rule for a single table.
Step 2: Add monitoring rules
On the Rule Management tab, under Monitor Perspective, select the monitor (for example, user_info_quality_control). Click Create Rule.
On the System Template Rules tab, find Table Row Count Is Greater Than 0, click +Use, and set Degree of importance to Strong rules.
On the System Template Rules tab, find Unique Value Count, Fixed Value, click +Use, and configure the following parameters:
Parameter | Value |
Rule Scope | uid(STRING) |
Monitoring Threshold | Normal threshold = 0 |
Degree of importance | soft rule |
Click Determine.
If the row count is 0, the ods_user_info_d node fails and all downstream tasks are blocked. If the primary key rule is violated, an alert is sent to subscribers and downstream tasks continue. For parameter details, see Configure rules for a single table.
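The primary key check can be pictured as counting how many uid values are repeats and comparing that count with the normal threshold of 0. This tutorial does not spell out how the template computes its metric, so the sketch below is an interpretation based on the requirement that duplicates trigger an alert; `duplicate_value_count` and `uniqueness_rule_passes` are hypothetical names.

```python
from collections import Counter
from typing import Iterable


def duplicate_value_count(values: Iterable[str]) -> int:
    """Count the extra occurrences of any value that appears more than once."""
    counts = Counter(values)
    return sum(n - 1 for n in counts.values() if n > 1)


def uniqueness_rule_passes(values: Iterable[str], normal_threshold: int = 0) -> bool:
    """Pass when the duplicate count equals the normal threshold (0 in this tutorial)."""
    return duplicate_value_count(values) == normal_threshold


clean_uids = ["u1", "u2", "u3"]
dirty_uids = ["u1", "u2", "u2", "u3", "u3", "u3"]  # u2 repeated once, u3 repeated twice
```

Because the rule is configured as a soft rule, a failing result for `dirty_uids` would only notify subscribers.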
Step 3: Run a test and subscribe to alerts
The steps to test the monitor and subscribe to alerts are the same as those in Configure monitoring rules for ods_raw_log_d.
Configure monitoring rules for ads_user_info_1d
The ads_user_info_1d table is the final result table for user persona analysis. Configure a volatility rule to track daily unique visitor (UV) fluctuations and a strong rule to ensure the table is not empty.
Step 1: Create a monitor
On the Monitor tab, click Create Monitor.
Configure the monitor with the following key parameters:
Parameter | Value |
Data Range | dt=$[yyyymmdd-1] |
Trigger Method | Triggered by production scheduling. Select the ads_user_info_1d node created during data processing. |
Monitoring Rule | Leave blank for now. |
For a full list of monitor parameters, see Configure a monitoring rule for a single table.
Step 2: Add monitoring rules
On the Rule Management tab, in the Monitor Perspective section, select the monitor (for example, ads_user_info_quality_control). Click Create Rule.
On the System Template Rules tab, find Number Of Rows, 7-day Volatility, click +Use, and configure the following thresholds:
Threshold | Value | Outcome |
Red Threshold | > 50% | Alert sent: row count deviated more than 50% from the 7-day average |
Orange Threshold | > 10% | Alert sent: row count deviated more than 10% from the 7-day average |
Normal Threshold | ≤ 10% | No alert |
Set Degree of importance to soft rule. Alerts notify subscribers but do not block downstream tasks.
On the System Template Rules tab, find Table Row Count Is Greater Than 0, click +Use, and set Degree of importance to Strong rules.
Click Determine.
The following table summarizes how violations are handled for this table:
Rule | Severity | Handling policy |
Table row count > 0 | Strong rule | Scheduling node is set to Failed; downstream tasks are blocked |
7-day row count volatility > 50% | Soft rule | Alert sent to subscribers; downstream tasks continue |
7-day row count volatility > 10% | Soft rule | Alert sent to subscribers; downstream tasks continue |
For parameter details, see Configure rules for a single table.
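The volatility thresholds can be illustrated with a short calculation. The sketch assumes volatility is the absolute deviation of today's row count from the mean of the previous seven daily counts, divided by that mean; the exact formula and window used by the Number Of Rows, 7-day Volatility template are not stated in this tutorial, and the function names are hypothetical.

```python
from statistics import mean
from typing import Sequence


def volatility(today_count: int, previous_counts: Sequence[int]) -> float:
    """Relative deviation of today's count from the average of earlier days."""
    baseline = mean(previous_counts)
    if baseline == 0:
        raise ValueError("baseline average is 0; volatility is undefined")
    return abs(today_count - baseline) / baseline


def classify(value: float, orange: float = 0.10, red: float = 0.50) -> str:
    """Map a volatility value to the threshold bands used in this tutorial."""
    if value > red:
        return "red"
    if value > orange:
        return "orange"
    return "normal"


last_seven_days = [100, 100, 100, 100, 100, 100, 100]
band_spike = classify(volatility(160, last_seven_days))   # 60% deviation
band_drift = classify(volatility(120, last_seven_days))   # 20% deviation
band_steady = classify(volatility(105, last_seven_days))  # 5% deviation
```

Both the red and orange bands lead to the same handling here, because the rule is a soft rule: an alert is sent and downstream tasks continue.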
Step 3: Run a test and subscribe to alerts
The steps to test the monitor and subscribe to alerts are the same as those in Configure monitoring rules for ods_raw_log_d.
What's next
With quality monitoring in place, visualize the processed user persona data in a dashboard. See Visualize data on a dashboard.