DataWorks: Monitor data quality

Last Updated: Mar 26, 2026

Data pipelines that run on a daily schedule can silently fail—source data may not arrive, or extract, transform, and load (ETL) operations may introduce duplicate primary keys. Configure data quality monitoring rules to catch these issues before they reach downstream computations, and either block the pipeline or send an alert when a check fails.

This tutorial configures two rules for ods_user_info_d_starrocks: a strong rule that confirms the table received data, and a weak rule that detects duplicate primary keys.

Prerequisites

Before you begin, ensure that you have:

  • Synchronized basic user information from the ApsaraDB RDS for MySQL table ods_user_info_d to ods_user_info_d_starrocks in an E-MapReduce (EMR) Serverless StarRocks instance via Data Integration

  • Synchronized website access logs from user_log.txt in Object Storage Service (OSS) to ods_raw_log_d_starrocks in the same EMR Serverless StarRocks instance via Data Integration

  • Processed the collected data into basic user profile data in Data Studio

Monitoring requirements

The following table describes the monitoring requirements for each table in the user profile analysis pipeline. This tutorial focuses on ods_user_info_d_starrocks.

| Table | Monitoring requirement |
|---|---|
| ods_raw_log_d_starrocks | Strong rule: daily synchronized row count > 0 |
| ods_user_info_d_starrocks | Strong rule: row count > 0; weak rule: business primary key uniqueness |
| dwd_log_info_di_starrocks | No rule |
| dws_user_info_all_di_starrocks | No rule |
| ads_user_info_1d_starrocks | Rule that monitors daily row count fluctuation (UV observation) |

How rules work

A monitoring rule defines a *passing* condition. When Data Quality runs the check, it evaluates whether your data meets that condition. If the condition is not met, the rule fails.

| Rule type | On failure | When to use |
|---|---|---|
| Strong rule | Alert fires; downstream nodes are blocked | Conditions that make downstream computations meaningless if violated, such as an empty partition |
| Weak rule | Alert fires; downstream nodes continue | Conditions that indicate data quality issues worth investigating but that do not require blocking the pipeline |
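The difference between the two rule types can be sketched in a few lines of Python. This is only an illustration of the behavior in the table above, not DataWorks code: the names `CheckOutcome` and `handle_check` are hypothetical and do not correspond to any DataWorks API.

```python
from dataclasses import dataclass


@dataclass
class CheckOutcome:
    alert: bool             # whether an alert fires
    block_downstream: bool  # whether downstream nodes are blocked


def handle_check(passed: bool, strength: str) -> CheckOutcome:
    """Map a rule result to the behavior described in the table (hypothetical helper)."""
    if strength not in ("strong", "weak"):
        raise ValueError(f"unknown rule strength: {strength!r}")
    if passed:
        # The passing condition was met: nothing happens.
        return CheckOutcome(alert=False, block_downstream=False)
    # A failed check always raises an alert; only a strong rule blocks downstream nodes.
    return CheckOutcome(alert=True, block_downstream=(strength == "strong"))


print(handle_check(False, "strong"))
print(handle_check(False, "weak"))
```

In this sketch, a failed strong rule yields an alert and a block, while a failed weak rule yields an alert only.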

Step 1: Go to the Configure by Table page

  1. Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Governance > Data Quality. On the page that appears, select the desired workspace and click Go to Data Quality.

  2. In the left-side navigation pane of the Data Quality page, choose Configure Rules > Configure by Table.

  3. On the Configure by Table page, filter by:

    • Database type: StarRocks

    • Table: ods_user_info_d_starrocks

  4. In the search results, click Rule Management in the Actions column. The Table Quality Details page opens.

Step 2: Configure monitoring rules

Select the monitoring scope

  1. On the Monitor tab, click Create Monitor.

  2. Set the Data Range parameter to dt=$[yyyymmdd-1].

    To monitor table data generated by a periodic schedule, ensure that the Data Range value corresponds to the partition generated for the table on the current day.
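To make the Data Range expression concrete, the following Python sketch mimics how dt=$[yyyymmdd-1] might resolve. It assumes that the expression is evaluated against the scheduling time of the node and that -1 subtracts one day; the function `resolve_data_range` is a hypothetical helper written for illustration, not part of DataWorks.

```python
from datetime import datetime, timedelta


def resolve_data_range(scheduling_time: datetime, offset_days: int = -1) -> str:
    """Return the partition filter for a given scheduling time (hypothetical helper).

    Assumption: the expression shifts the scheduling time by offset_days
    and formats the result as yyyymmdd.
    """
    partition_day = scheduling_time + timedelta(days=offset_days)
    return "dt=" + partition_day.strftime("%Y%m%d")


# A run scheduled on March 1, 2026 checks the partition for the previous day.
print(resolve_data_range(datetime(2026, 3, 1, 0, 30)))  # dt=20260228
```

Under these assumptions, a run scheduled for today checks the partition dated yesterday, which is the partition that the daily synchronization task writes.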

Create monitoring rules

This section configures two rules for ods_user_info_d_starrocks:

  • A strong rule to confirm that at least one row was synchronized each day. This catches cases where source data did not arrive.

  • A weak rule to verify that the uid column has no duplicate values. This catches ETL errors that introduce duplicate primary keys.

Create the strong rule:

  1. On the Create Monitor page, click Create Rule. The Create Rule panel opens.

  2. On the System Template tab, find Table is not empty and click Use.

  3. On the right side of the panel, set Degree of Importance to Strong Rule. In plain terms: this rule passes if the row count is greater than 0. If it fails (row count = 0), an alert fires and downstream nodes are blocked.

Create the weak rule:

  1. On the System Template tab, find Unique value. fixed value and click Use.

  2. Configure the following parameters:

    | Parameter | Value |
    |---|---|
    | Rule Scope | uid(STRING) |
    | Monitoring Threshold | 0 (expected number of duplicates) |
    | Degree of Importance | Weak rules |

    In plain terms: this rule passes if the number of duplicate uid values equals 0. If duplicates are found, an alert fires but the pipeline continues running.

  3. Click OK to save the rules.
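To see what the two rules check, the sketch below runs equivalent queries against a small in-memory SQLite table. It is an approximation for illustration only: Data Quality runs its own checks against StarRocks, and the two-column schema (uid, dt), the sample rows, and the helper names `table_is_not_empty` and `duplicate_uid_count` are assumptions, not taken from the product.

```python
import sqlite3

TABLE = "ods_user_info_d_starrocks"  # fixed identifier, safe to interpolate below


def table_is_not_empty(conn: sqlite3.Connection, dt: str) -> bool:
    """Strong rule: passes if the partition contains at least one row."""
    sql = f"SELECT COUNT(*) FROM {TABLE} WHERE dt = ?"
    return conn.execute(sql, (dt,)).fetchone()[0] > 0


def duplicate_uid_count(conn: sqlite3.Connection, dt: str) -> int:
    """Weak rule: passes if this value equals 0."""
    sql = f"SELECT COUNT(uid) - COUNT(DISTINCT uid) FROM {TABLE} WHERE dt = ?"
    return conn.execute(sql, (dt,)).fetchone()[0]


# Assumed minimal schema and sample data; the real table has more columns.
conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE {TABLE} (uid TEXT, dt TEXT)")
conn.executemany(
    f"INSERT INTO {TABLE} VALUES (?, ?)",
    [("u1", "20260301"), ("u2", "20260301"), ("u2", "20260301")],
)

print(table_is_not_empty(conn, "20260301"))   # strong rule passes
print(duplicate_uid_count(conn, "20260301"))  # weak rule fails: one duplicate uid
```

With this sample data, the strong rule passes because the partition has rows, and the weak rule fails because uid u2 appears twice.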

Set the trigger method and exception handling policy

  1. Set the trigger method to Triggered by Node Scheduling in Production Environment and select the ods_user_info_d_starrocks node created during data synchronization.

  2. Under the exception handling policy, choose one of the following actions:

    • Block the running of the node

    • Send an alert notification to the recipient

  3. Click Save.

Step 3: Test the monitor

Run a test before the rules go into production to confirm the configuration is correct.

  1. In the Monitor Perspective section of the Rule Management tab, select the monitor you created.

  2. Click Test Run on the right side of the tab.

  3. In the Test Run dialog box, set the Scheduling Time parameter and click Test Run.

  4. After the test completes, click View Details to check whether the data passes the validation checks.

Step 4: Subscribe to monitor alerts

  1. In the Monitor Perspective section of the Rule Management tab, select the monitor.

  2. Click Alert Subscription on the right side of the tab.

  3. In the Alert Subscription dialog box, configure Notification Method and Recipient, then click Save in the Actions column.

  4. To view or modify subscriptions later, choose Quality O&M > Monitor in the left-side navigation pane, then select My Subscriptions.

What's next

After the data is processed, use DataAnalysis to visualize the results. For details, see Visualize data on a dashboard.