Data Quality in DataWorks offers a variety of built-in monitoring templates. This topic describes the verification logic of Data Quality and the available built-in monitoring rule templates.
Built-in rule templates
Use built-in rule templates to quickly set up monitoring rules for a single table or multiple tables in bulk. For details, see the referenced document or the referenced document.
Table-level templates
Table-level templates monitor row count and storage size across the entire table.
Number of rows
| Template name | Description |
|---|---|
| Fixed value | Checks that the row count matches a fixed value. |
| Table not empty | Checks that the table contains at least one row. |
| 1-day difference | Checks the daily change in row count. The baseline is the previous day's partition row count. |
| Change in table row count since the previous epoch | Compares the current row count with the previous scheduling cycle's partition to detect fluctuations against set thresholds. |
| 1, 7, 30 days and 1st of month volatility | Compares the current row count with counts from 1 day ago, 7 days ago, 30 days ago, and the first day of the current month. Triggers an alert if any threshold is breached. |
| 1, 7, 30 day volatility | Compares the current row count with counts from 1 day ago, 7 days ago, and 30 days ago. Triggers an alert if any threshold is breached. |
| 1-day volatility | Compares the current row count with the previous day's. Triggers an alert if the threshold is breached. |
| 30-day volatility | Compares the current row count with the count from 30 days ago. Triggers an alert if the threshold is breached. |
| 7-day volatility | Compares the current row count with the count from 7 days ago. Triggers an alert if the threshold is breached. |
| Table rows (Intelligent Dynamic Threshold) | When Comparison Method is set to Intelligent Dynamic Threshold, the system automatically determines fluctuation thresholds using intelligent algorithms and triggers alerts or blocks on anomalies. |
| 30-day average volatility | Compares the current row count against the 30-day average. The 30-day average is the total row count over the past 30 days divided by 30. |
| 7-day average volatility | Compares the current row count against the 7-day average. The 7-day average is the total row count over the past 7 days divided by 7. |
| Upper cycle volatility | Compares the current row count with the previous scheduling cycle's partition. |
| Custom range of table rows | Lets you define the comparison method and threshold range for row count based on your business requirements. |
Table size
E-MapReduce (EMR) tables do not support table-size-based monitoring rules.
| Template name | Description |
|---|---|
| Fixed value for table size | Checks that the table size (in bytes) matches a fixed value. |
| Table size: periodic difference (24h) | Checks the change in table storage size over the last 24 hours (in bytes). |
| Table size: periodic difference (epoch) | Checks the change in table size since the previous epoch (in bytes). |
| Table size: 1-day volatility | Compares the current table size with the previous day's and checks the fluctuation against set thresholds. For example, with a warning threshold of 5% and an error threshold of 10%: a warning alert fires if the fluctuation exceeds 5% but is 10% or less; an error alert fires if it exceeds 10%. |
| Table size: 30-day volatility | Compares the current table size with the size from 30 days ago and checks the fluctuation against set thresholds. |
| Table size: 7-day volatility | Compares the current table size with the size from 7 days ago and checks the fluctuation against set thresholds. |
| Table size (Intelligent Dynamic Threshold) | When Comparison Method is set to Intelligent Dynamic Threshold, the system automatically determines thresholds using intelligent algorithms and triggers alerts or blocks on anomalies. |
Percent on condition
| Template name | Description |
|---|---|
| Percentage of matched conditions and custom range | Lets you define the comparison method and threshold range for the match rate of filter conditions based on your business requirements. |
Field-level templates
Field-level templates monitor individual fields (columns) within a table.
Null value count
| Template name | Description |
|---|---|
| Number of null values: fixed value | Checks the null value count in a field against a fixed value. A value is considered null if identified by the SQL IS NULL operator. |
| No null values in single field | Checks that a field contains no null values. |
| Null count / total rows: fixed value | Checks the ratio of null values to total rows against a fixed decimal value. |
Format verification
| Template name | Description |
|---|---|
| Regular expression verification | Checks that a field conforms to a specified regular expression. |
| Date format verification | Checks that a field conforms to a specified date format. |
| Email format verification | Checks that a field conforms to the email format. |
| ID card format verification | Checks that a field conforms to the ID card format. |
| Mobile phone number format verification | Checks that a field conforms to the mobile phone number format. |
| Currency format verification | Checks that a field conforms to the currency format. |
| Numeric format verification | Checks that a field conforms to the numeric format. |
| Telephone number format verification | Checks that a field conforms to the telephone number format. |
Duplicate value count
| Template name | Description |
|---|---|
| Repeated value: fixed value | Checks the number of duplicate values in a field against a fixed value. |
| No duplicate values in a single field | Checks that a field contains no duplicate values. |
Distinct count across multiple fields
| Template name | Description |
|---|---|
| No duplicate values across fields | Checks that there are no duplicate values across multiple fields. |
Duplicate values to total rows ratio
| Template name | Description |
|---|---|
| Ratio of repeated values to total rows: fixed value | Checks the ratio of duplicate values to total rows against a fixed value. |
Distinct count
| Template name | Description |
|---|---|
| Unique value: fixed value | Checks the number of unique values (post-deduplication) against a fixed value. |
| Unique value count: 1-day, 7-day, and 30-day volatility | Compares the unique value count for the current day with counts from 1 day ago, 7 days ago, and 30 days ago, and checks the fluctuations against set thresholds. |
| Unique value (Intelligent Dynamic Threshold) | When Comparison Method is set to Intelligent Dynamic Threshold, the system automatically determines thresholds using intelligent algorithms and triggers alerts or blocks on anomalies. |
| Unique values to total rows ratio: fixed value | Checks the ratio of unique values to total rows against a fixed value. |
Min value
| Template name | Description |
|---|---|
| Minimum: 1, 7, 30-day volatility | Compares the current day's minimum field value against the averages from 1 day ago, 7 days ago, and 30 days ago. Triggers an alert if any threshold is breached. |
| Minimum value (Intelligent Dynamic Threshold) | When Comparison Method is set to Intelligent Dynamic Threshold, the system automatically determines thresholds using intelligent algorithms and triggers alerts or blocks on anomalies. |
| Minimum value: 1-day volatility | Compares the current day's minimum with the previous day's and checks the fluctuation against set thresholds. |
| Minimum value: previous cycle volatility | Compares the current day's minimum with the previous scheduling cycle's and checks the fluctuation against set thresholds. Triggers an alert if the threshold is breached. |
| User-defined minimum value condition | Lets you define the comparison method and threshold range for a field's minimum value based on your business requirements. |
Maximum
| Template name | Description |
|---|---|
| Maximum: 1, 7, 30-day volatility | Compares the current day's maximum field value against the averages from 1 day ago, 7 days ago, and 30 days ago. Triggers an alert if any threshold is breached. |
| Maximum value (Intelligent Dynamic Threshold) | When Comparison Method is set to Intelligent Dynamic Threshold, the system automatically determines thresholds using intelligent algorithms and triggers alerts or blocks on anomalies. |
| Maximum daily volatility | Compares the current day's maximum with the previous day's and checks the fluctuation against set thresholds. |
| Maximum value: previous cycle volatility | Compares the current day's maximum with the previous scheduling cycle's and checks the fluctuation against set thresholds. Triggers an alert if the threshold is breached. |
| Maximum value with user-defined condition | Lets you define the comparison method and threshold range for a field's maximum value based on your business needs. |
Average
| Template name | Description |
|---|---|
| Average: 1, 7, 30-day volatility | Compares the current day's average field value with the values from 1 day ago, 7 days ago, and 30 days ago. Triggers an alert if any threshold is breached. |
| Average value (Intelligent Dynamic Threshold) | When Comparison Method is set to Intelligent Dynamic Threshold, the system automatically determines thresholds using intelligent algorithms and triggers alerts or blocks on anomalies. |
| Average daily volatility | Compares the current day's average with the previous day's and checks the fluctuation against set thresholds. |
| Average value with user-defined conditions | Lets you define the comparison method and threshold range for a field's average value based on your business needs. |
Sum
| Template name | Description |
|---|---|
| Summary value: 1, 7, 30-day volatility | Compares the current day's field sum against the average values from 1 day ago, 7 days ago, and 30 days ago. Triggers an alert if any threshold is breached. |
| Aggregated value (Intelligent Dynamic Threshold) | When Comparison Method is set to Intelligent Dynamic Threshold, the system automatically determines thresholds using intelligent algorithms and triggers alerts or blocks on anomalies. |
| Summary value: 1-day volatility | Compares the current day's field sum with the previous day's and checks the fluctuation against set thresholds. |
| Summary value: previous cycle volatility | Compares the current day's field sum with the previous scheduling cycle's and checks the fluctuation against set thresholds. Triggers an alert if the threshold is breached. |
| Sum value with user-defined conditions | Lets you define the comparison method and threshold range for the aggregated field value based on your business needs. |
Enumeration
| Template name | Description |
|---|---|
| Enumeration: unmatched unique value count, fixed value | Checks whether the count of unmatched enumeration items equals a fixed value. |
| Enumeration: unmatched row count, fixed value | Checks the number of rows with unmatched enumeration items against a fixed value. |
| Enumeration value mismatch when row count is 0 | Set the fixed value and comparison method to = 0. An alert fires if any rows contain unmatched enumeration values. |
| Enumeration value: custom, fixed value | Lets you customize the enumeration items' count, row number, comparison method, thresholds, and additional statistical metrics. |
Discrete values
| Template name | Description |
|---|---|
| Discrete value (status value): fixed value | Checks the count of values within each group of a field against a fixed value. |
| Discrete value (number of groups): fixed value | Compares the number of groups returned by a GROUP BY operation against a fixed value. |
| Discrete value (number of groups): Intelligent Dynamic Threshold | When Comparison Method is set to Intelligent Dynamic Threshold, the system automatically determines thresholds using intelligent algorithms and triggers alerts or blocks on anomalies. |
| Discrete value (status value): Intelligent Dynamic Threshold | When Comparison Method is set to Intelligent Dynamic Threshold, the system automatically determines thresholds using intelligent algorithms and triggers alerts or blocks on anomalies. |
| Discrete value (number of groups): 1-day volatility | Compares the number of groups from a GROUP BY operation with the number of groups from the previous day's sample to calculate the fluctuation rate. |
| Discrete values (groups and status values): 1-day, 7-day, and 30-day volatility | Compares the group count and per-group counts from a GROUP BY operation with the same metrics from 1, 7, and 30 days ago to calculate the volatility rate. |
Fluctuation rate formula
The fluctuation rate is calculated as:
Fluctuation rate = (Current Value - Baseline Value) / Baseline Value
Current Value is the actual metric value for the current check. Baseline Value depends on the template type:
-
N-day volatility templates: the metric value from N days ago.
-
N-day average volatility templates: the average metric value over the past N days.
-
Previous cycle volatility templates: the metric value from the previous scheduling cycle.
Verification logic
Data Quality uses three verification methods.
| Verification method | Logic |
|---|---|
| Comparison with a fixed value | Evaluates the metric against a fixed value using one of these operators: >, <, >=, <=, or !=. If the result is true, the status is Normal. Otherwise, an error alert fires. |
| Fluctuation comparison | Monitors the magnitude of change in a metric. Supports three comparison types: Absolute value (monitors any change, regardless of direction — for example, alert if row count changes by more than 10%), Increase (monitors only abnormal increases — for example, when monitoring costs, alert only if costs rise significantly), and Decrease (monitors only abnormal drops — for example, when monitoring order volume, alert only if volume drops suddenly). The system determines the alert level based on the warning and error thresholds you set. |
| Dynamic threshold | No threshold configuration required. The system checks metrics in real time using algorithm models and reports an alert if a metric falls outside a reasonable range. |
Appendix 1: How the previous cycle baseline works
For previous-cycle templates, the baseline is the most recent completed instance whose data timestamp differs from the current instance. Instances are sorted by data timestamp in descending order, and the most recent one (excluding the current data timestamp) is used.
| Scheduling type | Data timestamps | Baseline selection | Edge case |
|---|---|---|---|
| Daily | 2024-06-01, 2024-06-02, 2024-06-03, 2024-06-04, 2024-06-05 | When the June 6, 2024 instance starts its monitoring check, the June 5, 2024 instance is used as the baseline. | Data backfill scenario: The node runs as expected from June 1 to June 5, 2024. After the June 5 instance finishes, a backfill is run to populate July 1, 2024 data. When the June 6 instance starts its monitoring check, the July 1, 2024 instance (the most recent with a different data timestamp) is used as the baseline — until the July 2, 2024 instance completes. |
| Hourly (3 runs per day) | 2024-06-01, 2024-06-02, 2024-06-03 | When an instance with data timestamp June 4, 2024 starts its check, the last instance with data timestamp June 3, 2024 is used as the baseline. | Hourly scheduling scenario: Three instances run per day from June 1 to June 3, 2024, and the first June 4 instance also completes. When the second June 4 instance starts its monitoring check, the first June 4 instance is excluded (same data timestamp), so the last June 3 instance is used as the baseline. |
Appendix 2: How the N-day sample value works for hourly tasks
For hourly tasks, the sample value from N days ago is determined by sorting all instances on the Nth day by running time (not scheduled time) and using the last instance's result data as the sample.
| Scheduling type | Data timestamps | Fluctuation comparison method | Edge case |
|---|---|---|---|
| Hourly (3 runs per day) | 2024-06-01 through 2024-06-08 | For a 7-day fluctuation check, when an instance with running time June 8, 2024 starts its monitoring check, the last instance whose running time is June 1, 2024 is used as the sample value. | Hourly scheduling scenario: Three instances run per day from June 1 to June 8, 2024. When the second instance with running time June 8, 2024 starts its check, the last instance with running time June 1, 2024 is used as the 7-day sample value. |