Data pipelines that run on a daily schedule can silently fail—source data may not arrive, or extract, transform, and load (ETL) operations may introduce duplicate primary keys. Configure data quality monitoring rules to catch these issues before they reach downstream computations, and either block the pipeline or send an alert when a check fails.
This tutorial configures two rules for ods_user_info_d_starrocks: a strong rule that confirms the table received data, and a weak rule that detects duplicate primary keys.
Prerequisites
Before you begin, ensure that you have:
Synchronized basic user information from the ApsaraDB RDS for MySQL table ods_user_info_d to ods_user_info_d_starrocks in an E-MapReduce (EMR) Serverless StarRocks instance via Data Integration
Synchronized website access logs from user_log.txt in Object Storage Service (OSS) to ods_raw_log_d_starrocks in the same EMR Serverless StarRocks instance via Data Integration
Processed the collected data into basic user profile data in Data Studio
Monitoring requirements
The following table describes the monitoring requirements for each table in the user profile analysis pipeline. This tutorial focuses on ods_user_info_d_starrocks.
Table | Monitoring requirement |
| Strong rule: daily synchronized row count > 0 |
| Strong rule: row count > 0; weak rule: business primary key uniqueness |
| No rule |
| No rule |
| Rule monitoring daily row count fluctuation (UV observation) |
How rules work
A monitoring rule defines a *passing* condition. When Data Quality runs the check, it evaluates whether your data meets that condition. If the condition is not met, the rule fails.
Rule type | On failure | When to use |
Strong rule | Alert fires; downstream nodes are blocked | Conditions that make downstream computations meaningless if violated—such as an empty partition |
Weak rule | Alert fires; downstream nodes continue | Conditions that indicate data quality issues worth investigating but that do not require blocking the pipeline |
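The difference between the two rule types in the table above can be sketched in a few lines of Python. This is only an illustrative model of the described behavior, not DataWorks code: the names `Severity`, `RuleOutcome`, and `evaluate` are hypothetical and do not correspond to any Data Quality API.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    """Hypothetical stand-in for the Degree of Importance setting."""
    STRONG = "strong"
    WEAK = "weak"


@dataclass
class RuleOutcome:
    alert: bool             # whether an alert fires
    block_downstream: bool  # whether downstream nodes are blocked


def evaluate(severity: Severity, condition_met: bool) -> RuleOutcome:
    """Map a check result to the behavior described in the table."""
    if condition_met:
        # The passing condition holds: nothing happens.
        return RuleOutcome(alert=False, block_downstream=False)
    # The rule failed: both rule types alert, only a strong rule blocks.
    return RuleOutcome(alert=True, block_downstream=severity is Severity.STRONG)
```

In this model, a failed strong rule yields `RuleOutcome(alert=True, block_downstream=True)`, while a failed weak rule alerts without blocking.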
Step 1: Go to the Configure by Table page
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Governance > Data Quality. On the page that appears, select the desired workspace and click Go to Data Quality.
In the left-side navigation pane of the Data Quality page, choose Configure Rules > Configure by Table.
On the Configure by Table page, filter by:
Database type: StarRocks
Table: ods_user_info_d_starrocks
In the search results, click Rule Management in the Actions column. The Table Quality Details page opens.
Step 2: Configure monitoring rules
Select the monitoring scope
On the Monitor tab, click Create Monitor.
Set the Data Range parameter to dt=$[yyyymmdd-1]. To monitor table data generated by a periodic schedule, ensure that the Data Range value corresponds to the partition generated for the table on the current day.
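To see which partition a given run would check, the Data Range expression can be approximated in Python. This sketch assumes that `$[yyyymmdd-1]` resolves to the day before the scheduling time, formatted as `yyyymmdd`; the function name `resolve_data_range` is hypothetical and only mimics that substitution.

```python
from datetime import datetime, timedelta


def resolve_data_range(scheduling_time: datetime) -> str:
    """Approximate dt=$[yyyymmdd-1] for a given scheduling time.

    Assumption: the expression means "scheduling date minus one day".
    """
    partition_day = scheduling_time - timedelta(days=1)
    return "dt=" + partition_day.strftime("%Y%m%d")


# A run scheduled for 00:30 on March 1, 2024 checks the February 29 partition.
print(resolve_data_range(datetime(2024, 3, 1, 0, 30)))  # dt=20240229
```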
Create monitoring rules
This section configures two rules for ods_user_info_d_starrocks:
A strong rule to confirm that at least one row was synchronized each day. This catches cases where source data did not arrive.
A weak rule to verify that the uid column has no duplicate values. This catches ETL errors that introduce duplicate primary keys.
Create the strong rule:
On the Create Monitor page, click Create Rule. The Create Rule panel opens.
On the System Template tab, find Table is not empty and click Use.
On the right side of the panel, set Degree of Importance to Strong Rule. In plain terms: this rule passes if the row count is greater than 0. If it fails (row count = 0), an alert fires and downstream nodes are blocked.
Create the weak rule:
On the System Template tab, find Unique value. fixed value and click Use.
Configure the following parameters:
Parameter | Value |
Rule Scope | uid(STRING) |
Monitoring Threshold | 0 (expected number of duplicates) |
Degree of Importance | Weak rules |
In plain terms: this rule passes if the number of duplicate uid values equals 0. If duplicates are found, an alert fires but the pipeline continues running.
Click OK to save the rules.
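The two checks configured above can be reproduced by hand to sanity-check a partition. The sketch below uses an in-memory SQLite table as a stand-in for the StarRocks table, with made-up sample rows. It is not how Data Quality runs the templates internally, and it assumes that "number of duplicates" means the number of surplus rows beyond the first occurrence of each uid. The helper names `row_count` and `duplicate_count` are hypothetical.

```python
import sqlite3

TABLE = "ods_user_info_d_starrocks"

# In-memory stand-in for the real table, with illustrative sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE {TABLE} (uid TEXT, dt TEXT)")
conn.executemany(
    f"INSERT INTO {TABLE} VALUES (?, ?)",
    [("u1", "20240229"), ("u2", "20240229"), ("u2", "20240229")],
)


def row_count(conn: sqlite3.Connection, table: str, dt: str) -> int:
    """Strong-rule check: the partition passes if this is greater than 0."""
    # table is a trusted constant here, so interpolating it is acceptable.
    sql = f"SELECT COUNT(*) FROM {table} WHERE dt = ?"
    return conn.execute(sql, (dt,)).fetchone()[0]


def duplicate_count(conn: sqlite3.Connection, table: str, column: str, dt: str) -> int:
    """Weak-rule check: the partition passes if this equals 0.

    Assumption: duplicates = non-null values minus distinct values.
    """
    sql = f"SELECT COUNT({column}) - COUNT(DISTINCT {column}) FROM {table} WHERE dt = ?"
    return conn.execute(sql, (dt,)).fetchone()[0]


print(row_count(conn, TABLE, "20240229"))               # 3
print(duplicate_count(conn, TABLE, "uid", "20240229"))  # 1
```

With this sample data the strong rule would pass (three rows), and the weak rule would fail because `u2` appears twice.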
Set the trigger method and exception handling policy
Set the trigger method to Triggered by Node Scheduling in Production Environment and select the ods_user_info_d_starrocks node created during data synchronization.
Under the exception handling policy, choose one of the following actions:
Block the running of the node
Send an alert notification to the recipient
Click Save.
Step 3: Test the monitor
Run a test before the rules go into production to confirm the configuration is correct.
In the Monitor Perspective section of the Rule Management tab, select the monitor you created.
Click Test Run on the right side of the tab.
In the Test Run dialog box, set the Scheduling Time parameter and click Test Run.
After the test completes, click View Details to check whether the data passes the validation checks.
Step 4: Subscribe to monitor alerts
In the Monitor Perspective section of the Rule Management tab, select the monitor.
Click Alert Subscription on the right side of the tab.
In the Alert Subscription dialog box, configure Notification Method and Recipient, then click Save in the Actions column.
To view or modify subscriptions later, choose Quality O&M > Monitor in the left-side navigation pane, then select My Subscriptions.
What's next
After the data is processed, use DataAnalysis to visualize the results. For details, see Visualize data on a dashboard.