Data Quality provides various built-in table-level and field-level monitoring rule templates. This topic describes the verification logic of Data Quality and the built-in monitoring rule templates.

Description of calculation formula

You can calculate the fluctuation by using the following formula: Fluctuation = (Sample value - Baseline)/Baseline.
  • Sample value

    The sample value for the current day. For example, if you want to check the fluctuation in the number of table rows on an SQL node within a day, the sample value is the number of table rows in partitions on that day.

  • Baseline
    The comparison value collected from the previous N days. Examples:
    • If you want to check the fluctuation in the number of table rows on an SQL node based on the statistics seven days ago, the baseline is the number of table rows in partitions seven days before the current day. In other words, the fluctuation is calculated by comparing the sample value collected on the current day with that collected seven days before the current day.
    • If you want to check the fluctuation in the number of table rows on an SQL node in the last seven days, the baseline is the average number of table rows in the last seven days. In other words, the baseline is calculated by dividing the total number of table rows in the last seven days by seven.

Verification logic

Data Quality supports three verification methods: comparison with a fixed value, comparison with thresholds, and comparison with a dynamic threshold.
Verification method Verification logic
Comparison with a fixed value
  1. Return a Boolean value based on the verification expression. The following comparison operators are supported:

    >, <, >=, <= and!=

  2. If the calculation result is true, the data is considered to be normal. If the calculation result is false, an error alert is reported.
Comparison with thresholds

The comparison of the raising range, drop range, and fluctuation range (absolute value) is supported. The comparison of the fluctuation range (absolute value) is used as an example in this topic.

  • If the absolute value of the fluctuation does not exceed the warning threshold, the data is considered to be normal.
  • If the absolute value of the fluctuation exceeds the warning threshold and does not exceed the error threshold, a warning alert is reported.
  • If the absolute value of the fluctuation exceeds the error threshold, an error alert is reported.
Comparison with a dynamic threshold You do not need to set thresholds. The system automatically checks the metrics in real time based on algorithm models. If the value of a metric falls outside a reasonable range, an alert is reported.
Important You must purchase DataWorks Enterprise Edition or a more advanced edition to use the dynamic threshold feature.

Built-in monitoring rule templates

Built-in monitoring rule templates are classified into table-level monitoring rule templates and field-level monitoring rule templates. You can use a built-in monitoring rule template to quickly configure monitoring rules for multiple tables or fields at a time in Data Quality. For more information, see Configure monitoring rules by table and Configure monitoring rules based on a monitoring rule template.

Built-in monitoring rule templates for offline data
Table 1. Table-level monitoring rule templates
Template Description
Number of rows, fixed value Data Quality compares the number of table rows collected on the current day with a fixed value.
Table row number, 1, 7, 30 day volatility Data Quality compares the number of table rows collected on the current day with that on the previous day, seven days ago, and 30 days ago to obtain the fluctuations.
Note Then, Data Quality compares the obtained fluctuations with thresholds. If a fluctuation exceeds a threshold, Data Quality reports an alert.
Table row number, 7-day average volatility Data Quality compares the number of table rows collected on the current day with the average number of table rows in the last seven days to obtain the fluctuation. The baseline is the average number of table rows in the last seven days. In other words, the baseline is calculated by dividing the total number of table rows in the last seven days by seven.
Table row number, 30-day average volatility Data Quality compares the number of table rows collected on the current day with the average number of table rows in the last 30 days to obtain the fluctuation. The baseline is the average number of table rows in the last 30 days. In other words, the baseline is calculated by dividing the total number of table rows in the last 30 days by 30.
Table row number, 1 day volatility Data Quality compares the number of table rows collected on the current day with that on the previous day to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds. If the fluctuation exceeds a threshold, Data Quality reports an alert.
Number of rows, 7-day volatility Data Quality compares the number of table rows collected on the current day with that seven days before the current day to obtain the fluctuation. Then, Data Quality compares the obtained fluctuations with thresholds. If a fluctuation exceeds a threshold, Data Quality reports an alert.
Table row number, 30-day volatility Data Quality compares the number of table rows collected on the current day with that 30 days before the current day to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds. If the fluctuation exceeds a threshold, Data Quality reports an alert.
Number of rows, 1, 7, 30 days, 1st of this month, volatility Data Quality compares the number of table rows collected on the current day with that on the previous day, seven days ago, 30 days ago, and that on the first day of the current month to obtain the fluctuations. Then, Data Quality compares the obtained fluctuations with thresholds. If a fluctuation exceeds a threshold, Data Quality reports an alert.
Number of table rows, upper cycle volatility Data Quality compares the number of table rows collected on the current day with that in partitions generated in the last cycle to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds.
Number of rows, 1 day difference Data Quality compares the number of table rows collected on the current day with that in partitions generated on the previous day to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds.
Note The baseline is the number of table rows in partitions generated on the previous day.
Number of table rows, upper cycle difference Data Quality compares the number of table rows collected on the current day with that in partitions generated in the last cycle to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds.
Table size, fixed value Data Quality compares the size of a table in bytes on the current day with a fixed value.
Table size, 1 day volatility Data Quality compares the size of a table on the current day with that on the previous day to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds. If the fluctuation exceeds a threshold, Data Quality reports an alert.

For example, you can set the warning threshold to 5% and the error threshold to 10%. If the fluctuation is greater than 5% and less than or equal to 10%, a warning alert is reported. If the fluctuation is greater than 10%, an error alert is reported.

Table size, 7-day volatility Data Quality compares the size of a table on the current day with that seven days ago to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds. If the fluctuation exceeds a threshold, Data Quality reports an alert.

For example, you can set the warning threshold to 5% and the error threshold to 10%. If the fluctuation is greater than 5% and less than or equal to 10%, a warning alert is reported. If the fluctuation is greater than 10%, an error alert is reported.

Table size, upper period difference Data Quality compares the size of a table in bytes on the current day with that in the last cycle to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds.
Table size (in bytes), upper period difference Data Quality compares the size of a table in bytes on the current day with that on the previous day to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds.
Note You cannot configure table-size-based monitoring rules for E-MapReduce (EMR) tables.
Table 2. Field-level monitoring rule templates
Template Description
Average, 1, 7, 30-day volatility Data Quality compares the average value of a field calculated on the current day with that on the previous day, seven days ago, and 30 days ago to obtain the fluctuations. Then, Data Quality compares the obtained fluctuations with thresholds. If a fluctuation exceeds a threshold, Data Quality reports an alert.
Note Data Quality compares the average value of a field calculated on the current day with that on the previous day, seven days ago, and 30 days ago.
Summary value, 1, 7, 30-day volatility Data Quality compares the value sum of a field on the current day with the average values calculated on the previous day, seven days ago, and 30 days ago to obtain the fluctuations. Then, Data Quality compares the obtained fluctuations with thresholds. If a fluctuation exceeds a threshold, Data Quality reports an alert.
Minimum, 1, 7, 30-day volatility Data Quality compares the minimum value of a field on the current day with the average values calculated on the previous day, seven days ago, and 30 days ago to obtain the fluctuations. Then, Data Quality compares the obtained fluctuations with thresholds. If a fluctuation exceeds a threshold, Data Quality reports an alert.
Maximum, 1, 7, 30-day volatility Data Quality compares the maximum value of a field on the current day with the average values calculated on the previous day, seven days ago, and 30 days ago to obtain the fluctuations. Then, Data Quality compares the obtained fluctuations with thresholds. If a fluctuation exceeds a threshold, Data Quality reports an alert.
Unique value, fixed value Data Quality compares the number of unique values of a field after deduplication with a fixed value.
The number of unique values, 1, 7, 30 volatility Data Quality compares the number of unique values of a field after deduplication on the current day with that on the previous day, seven days ago, and 30 days ago to obtain the fluctuations. Then, Data Quality compares the obtained fluctuations with thresholds.
Number of null values, fixed value Data Quality compares the number of null values of a field with a fixed value.
Note The IS NULL expression is used in the SQL statements to check whether a value of a field is a null value.
Number of nulls / total number of rows, fixed value Data Quality compares the ratio of the number of null values of a field to the total number of rows with a fixed value.
Note The fixed value is a decimal.
Repeated number of values / total number of rows, fixed value Data Quality compares the ratio of the number of duplicated values of a field to the total number of rows with a fixed value.
Repeated value, fixed value Data Quality subtracts the number of values of a field after deduplication from the total number of rows to obtain the number of duplicated values of the field. Then, Data Quality compares the number of duplicated values with a fixed value.
Number of unique values / total number of rows Data Quality compares the ratio of the number of unique values of a field to the total number of rows with a fixed value.
Discrete value (status value), fixed value Data Quality compares the number of values in each group of a field with a fixed value.
Discrete values (number of groups and status values), 1, 7, 30-day volatility Data Quality compares the number of groups and the number of values in each group of a field on the current day with those on the previous day, seven days ago, and 30 days ago to obtain the fluctuations. Then, Data Quality compares the obtained fluctuations with thresholds.
Discrete value (number of groups), fixed value Data Quality compares the number of groups of a field with a fixed value.
Discrete value (number of groups), 1 day volatility Data Quality compares the number of groups of a field on the current day with that on the previous day to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds.
Average, 1 day volatility Data Quality compares the average value of a field on the current day with that on the previous day to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds.
Summary value, 1 day volatility Data Quality compares the value sum of a field on the current day with that on the previous day to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds.
Minimum value, 1 day volatility Data Quality compares the minimum value of a field on the current day with that on the previous day to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds.
Maximum, 1 day volatility Data Quality compares the maximum value of a field on the current day with that on the previous day to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds.
Summary value, upper period volatility Data Quality compares the value sum of a field on the current day with that in the last cycle to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds.
Minimum period Data Quality compares the minimum value of a field on the current day with that in the last cycle to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds.
Maximum period Data Quality compares the maximum value of a field on the current day with that in the last cycle to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds.