This topic describes the verification logic of Data Quality and the built-in rule templates that are provided for monitoring offline data.

Terms

  • sample: the value of the monitored metric on the current day. For example, if you check the day-over-day fluctuation of the number of table rows on an SQL node, the sample is the number of table rows on the current day.
  • baseline: the comparison value derived from the previous N days. Examples:
    • If you check the day-over-day fluctuation of the number of table rows on an SQL node, the baseline is the number of table rows on the previous day.
    • If you check the fluctuation of the number of table rows on an SQL node against the seven-day average, the baseline is the average number of table rows in the last seven days.
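The relationship between a sample and its baseline can be sketched in Python. This is an illustration only: this topic does not state the fluctuation formula, so the relative change `(sample - baseline) / baseline` used here is an assumption, and the function names and row counts are hypothetical.

```python
from statistics import mean


def baseline_previous_day(history):
    """Baseline for a 1-day check: the metric value on the previous day."""
    return history[-1]


def baseline_average(history, days=7):
    """Baseline for an average check: mean of the last `days` values."""
    return mean(history[-days:])


def fluctuation(sample, baseline):
    """Relative change of the sample against the baseline (assumed formula)."""
    if baseline == 0:
        raise ValueError("baseline is zero; relative fluctuation is undefined")
    return (sample - baseline) / baseline


# Hypothetical row counts for the previous seven days, oldest first.
history = [100, 110, 90, 100, 105, 95, 80]
sample = 120  # number of table rows on the current day

one_day = fluctuation(sample, baseline_previous_day(history))
seven_day = fluctuation(sample, baseline_average(history, days=7))
```

With these hypothetical values, the same sample yields a different fluctuation for each baseline, which is why the baseline must be chosen before thresholds are set.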

Verification logic

Data Quality supports the following verification methods: comparison with a fixed value, comparison with thresholds, and dynamic threshold.
  • Comparison with a fixed value:
    1. Data Quality calculates the Boolean result of a comparison expression. The following comparison operators are supported: >, <, >=, <=, and !=
    2. If the calculation result is true, no alert is reported. If the calculation result is false, Data Quality reports an error alert.
  • Comparison with thresholds:
    • If the fluctuation does not exceed the warning threshold, Data Quality determines that data is normal.
    • If the fluctuation exceeds the warning threshold but does not exceed the error threshold, Data Quality reports a warning alert.
    • If the fluctuation exceeds the error threshold, Data Quality reports an error alert.
  • Dynamic threshold: You do not need to set thresholds. Data Quality automatically checks the metrics in real time based on algorithm models. If the value of a metric falls outside a reasonable range, Data Quality reports an alert.
    Notice: You can use the dynamic threshold feature only in DataWorks Enterprise Edition or more advanced editions.
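The first two verification methods can be sketched as follows. This is a minimal illustration, not the Data Quality implementation: the function names and the result strings are hypothetical, and comparing the absolute value of the fluctuation (so that increases and decreases are treated alike) is an assumption. The dynamic threshold method is omitted because its algorithm models are not described here.

```python
import operator

# Comparison operators listed for the fixed-value method.
OPERATORS = {
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "!=": operator.ne,
}


def check_fixed_value(value, op, expected):
    """Return "ok" if the comparison expression is true, else "error"."""
    return "ok" if OPERATORS[op](value, expected) else "error"


def check_thresholds(fluctuation, warning, error):
    """Classify a fluctuation against warning and error thresholds."""
    magnitude = abs(fluctuation)  # assumption: direction of change is ignored
    if magnitude > error:
        return "error"
    if magnitude > warning:
        return "warning"
    return "ok"


# Fixed value: expect the number of table rows to be greater than 0.
rows_result = check_fixed_value(1500, ">", 0)

# Thresholds: warning at 5%, error at 10%.
size_result = check_thresholds(0.08, warning=0.05, error=0.10)
```

A value that equals a threshold does not exceed it, so a fluctuation of exactly 10% produces a warning alert rather than an error alert in this sketch.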

List of built-in rule templates for offline data

Each entry lists the template name, followed by its description.

  • Fluctuations of the average value of a field compared with that on the previous day, that of seven days ago, and that of 30 days ago: Data Quality compares the average value of a field with that on the previous day, that of seven days ago, and that of 30 days ago to obtain the fluctuations. Then, Data Quality compares the obtained fluctuations with thresholds. If a fluctuation exceeds a threshold, Data Quality reports an alert.
  • Fluctuations of the sum of values in a field compared with that on the previous day, that of seven days ago, and that of 30 days ago: Data Quality compares the sum of values in a field with that on the previous day, that of seven days ago, and that of 30 days ago to obtain the fluctuations. Then, Data Quality compares the obtained fluctuations with thresholds. If a fluctuation exceeds a threshold, Data Quality reports an alert.
  • Fluctuations of the minimum value of a field compared with that on the previous day, that of seven days ago, and that of 30 days ago: Data Quality compares the minimum value of a field with that on the previous day, that of seven days ago, and that of 30 days ago to obtain the fluctuations. Then, Data Quality compares the obtained fluctuations with thresholds. If a fluctuation exceeds a threshold, Data Quality reports an alert.
  • Fluctuations of the maximum value of a field compared with that on the previous day, that of seven days ago, and that of 30 days ago: Data Quality compares the maximum value of a field with that on the previous day, that of seven days ago, and that of 30 days ago to obtain the fluctuations. Then, Data Quality compares the obtained fluctuations with thresholds. If a fluctuation exceeds a threshold, Data Quality reports an alert.
  • Number of unique values in a field: Data Quality compares the number of unique values in a field after deduplication with a fixed value.
  • Fluctuations of the number of unique values in a field compared with that on the previous day, that of seven days ago, and that of 30 days ago: Data Quality compares the number of unique values in a field after deduplication with that on the previous day, that of seven days ago, and that of 30 days ago to obtain the fluctuations. Then, Data Quality compares the obtained fluctuations with thresholds. If a fluctuation exceeds a threshold, Data Quality reports an alert.
  • Fluctuation of the number of table rows compared with that on the previous day, that of seven days ago, and that of 30 days ago: Data Quality compares the number of table rows with that on the previous day, that of seven days ago, and that of 30 days ago to obtain the fluctuations. Then, Data Quality compares the obtained fluctuations with thresholds. If a fluctuation exceeds a threshold, Data Quality reports an alert.
  • Number of null values in a field: Data Quality compares the number of null values in a field with a fixed value.
  • Ratio of the number of null values in a field to the total number of rows: Data Quality compares the ratio of the number of null values in a field to the total number of rows with a fixed value.
    Note: The fixed value is a decimal.
  • Ratio of the number of duplicated values in a field to the total number of rows: Data Quality compares the ratio of the number of duplicated values in a field to the total number of rows with a fixed value.
  • Number of duplicated values in a field: Data Quality subtracts the number of values in a field after deduplication from the total number of rows to obtain the number of duplicated values in the field. Then, Data Quality compares the number of duplicated values with a fixed value.
  • Ratio of the number of unique values in a field to the total number of rows: Data Quality compares the ratio of the number of unique values in a field to the total number of rows with a fixed value.
  • Fluctuation of the average value of a field compared with that on the previous day: Data Quality compares the average value of a field with that on the previous day to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds. If the fluctuation exceeds a threshold, Data Quality reports an alert.
  • Fluctuation of the sum of values in a field compared with that on the previous day: Data Quality compares the sum of values in a field with that on the previous day to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds. If the fluctuation exceeds a threshold, Data Quality reports an alert.
  • Fluctuation of the minimum value of a field compared with that on the previous day: Data Quality compares the minimum value of a field with that on the previous day to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds. If the fluctuation exceeds a threshold, Data Quality reports an alert.
  • Fluctuation of the maximum value of a field compared with that on the previous day: Data Quality compares the maximum value of a field with that on the previous day to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds. If the fluctuation exceeds a threshold, Data Quality reports an alert.
  • Fluctuation of the sum of values in a field compared with that in the last cycle: Data Quality compares the sum of values in a field with that in the last cycle to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds. If the fluctuation exceeds a threshold, Data Quality reports an alert.
  • Fluctuation of the minimum value of a field compared with that in the last cycle: Data Quality compares the minimum value of a field with that in the last cycle to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds. If the fluctuation exceeds a threshold, Data Quality reports an alert.
  • Fluctuation of the maximum value of a field compared with that in the last cycle: Data Quality compares the maximum value of a field with that in the last cycle to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds. If the fluctuation exceeds a threshold, Data Quality reports an alert.
  • Count of each discrete point for grouping in a field: Data Quality compares the count of each discrete point for grouping in a field with a fixed value.
  • Fluctuations of the count of each discrete point for grouping in a field compared with that on the previous day, that of seven days ago, and that of 30 days ago: Data Quality compares the count of each discrete point for grouping in a field with that on the previous day, that of seven days ago, and that of 30 days ago to obtain the fluctuations. Then, Data Quality compares the obtained fluctuations with thresholds. If a fluctuation exceeds a threshold, Data Quality reports an alert.
  • Total number of discrete points for grouping in a field: Data Quality compares the total number of discrete points for grouping in a field with a fixed value.
  • Fluctuation of the total number of discrete points for grouping in a field compared with that on the previous day: Data Quality compares the total number of discrete points for grouping in a field with that on the previous day to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds. If the fluctuation exceeds a threshold, Data Quality reports an alert.
  • Whether the table size, in bytes, remains unchanged, compared with that in the last cycle: Data Quality checks whether the table size in bytes remains unchanged, compared with that in the last cycle.
  • Whether the table size, in bytes, is changed, compared with that in the last cycle: Data Quality checks whether the table size in bytes is changed, compared with that in the last cycle.
  • Whether the number of table rows is changed, compared with that in the last cycle: Data Quality checks whether the number of table rows is changed, compared with that in the last cycle.
  • Whether the number of table rows remains unchanged, compared with that in the last cycle: Data Quality checks whether the number of table rows remains unchanged, compared with that in the last cycle.
  • Difference between the table size, in bytes, and that in the last cycle: Data Quality compares the table size in bytes with that in the last cycle to obtain the difference. Then, Data Quality compares the difference with a fixed value.
  • Difference between the number of table rows and that in the last cycle: Data Quality compares the number of table rows collected on the current day with that in the partition generated in the last cycle to obtain the difference. Then, Data Quality compares the difference with a fixed value.
  • Number of table rows: Data Quality compares the number of table rows with a fixed value.
  • Table size, in bytes: Data Quality compares the table size in bytes with a fixed value.
  • Difference between the number of table rows and that on the previous day: Data Quality compares the number of table rows collected on the current day with that in the partition generated on the previous day to obtain the difference. Then, Data Quality compares the difference with a fixed value.
  • Difference between the table size, in bytes, and that on the previous day: Data Quality compares the table size in bytes with that on the previous day to obtain the difference. Then, Data Quality compares the difference with a fixed value.
  • Fluctuation of the table size compared with that on the previous day: Data Quality compares the table size with that on the previous day to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds. If the fluctuation exceeds a threshold, Data Quality reports an alert.
    For example, you can set the warning threshold to 5% and the error threshold to 10%. If the fluctuation is greater than 5% and less than or equal to 10%, Data Quality reports a warning alert. If the fluctuation is greater than 10%, Data Quality reports an error alert.
  • Fluctuation of the table size compared with that of seven days ago: Data Quality compares the table size with that of seven days ago to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds. If the fluctuation exceeds a threshold, Data Quality reports an alert.
    For example, you can set the warning threshold to 5% and the error threshold to 10%. If the fluctuation is greater than 5% and less than or equal to 10%, Data Quality reports a warning alert. If the fluctuation is greater than 10%, Data Quality reports an error alert.
  • Fluctuation of the table size compared with that of 30 days ago: Data Quality compares the table size with that of 30 days ago to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds. If the fluctuation exceeds a threshold, Data Quality reports an alert.
    For example, you can set the warning threshold to 5% and the error threshold to 10%. If the fluctuation is greater than 5% and less than or equal to 10%, Data Quality reports a warning alert. If the fluctuation is greater than 10%, Data Quality reports an error alert.
  • Fluctuation of the number of table rows compared with the average value in the last seven days: Data Quality compares the number of table rows with the average value in the last seven days to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds. If the fluctuation exceeds a threshold, Data Quality reports an alert.
  • Fluctuation of the number of table rows compared with the average value in the last 30 days: Data Quality compares the number of table rows with the average value in the last 30 days to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds. If the fluctuation exceeds a threshold, Data Quality reports an alert.
  • Fluctuation of the number of table rows compared with that of the previous day: Data Quality compares the number of table rows collected on the current day with that in the partition generated on the previous day to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds. If the fluctuation exceeds a threshold, Data Quality reports an alert.
  • Fluctuation of the number of table rows compared with that of seven days ago: Data Quality compares the number of table rows collected on the current day with that in the partition generated seven days ago to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds. If the fluctuation exceeds a threshold, Data Quality reports an alert.
  • Fluctuation of the number of table rows compared with that of 30 days ago: Data Quality compares the number of table rows collected on the current day with that in the partition generated 30 days ago to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds. If the fluctuation exceeds a threshold, Data Quality reports an alert.
  • Fluctuation of the number of table rows compared with that on the previous day, that of seven days ago, that of 30 days ago, and that on the first day of the current month: Data Quality compares the number of table rows with that on the previous day, that of seven days ago, that of 30 days ago, and that on the first day of the current month to obtain the fluctuations. Then, Data Quality compares the obtained fluctuations with thresholds. If a fluctuation exceeds a threshold, Data Quality reports an alert.
  • Fluctuation of the number of table rows compared with that in the last cycle: Data Quality compares the number of table rows collected on the current day with that in the partition generated in the last cycle to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds. If the fluctuation exceeds a threshold, Data Quality reports an alert.
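
The field-level metrics that the fixed-value templates check (null count, unique count, duplicate count, and their ratios to the total number of rows) can be sketched in Python. The function name and the sample data are hypothetical. How Data Quality treats null values when it deduplicates a field is not stated in this topic, so this sketch assumes that nulls are excluded from the unique count, as in SQL COUNT(DISTINCT ...).

```python
def field_metrics(values):
    """Compute field-level metrics used by the fixed-value templates."""
    total = len(values)
    if total == 0:
        raise ValueError("the field has no rows; ratios are undefined")
    non_null = [v for v in values if v is not None]
    null_count = total - len(non_null)
    # Assumption: nulls are excluded from the unique count,
    # mirroring SQL COUNT(DISTINCT ...).
    unique_count = len(set(non_null))
    # Duplicated values = total rows minus values after deduplication.
    duplicate_count = total - unique_count
    return {
        "null_count": null_count,
        "null_ratio": null_count / total,
        "unique_count": unique_count,
        "unique_ratio": unique_count / total,
        "duplicate_count": duplicate_count,
        "duplicate_ratio": duplicate_count / total,
    }


# Hypothetical field with six rows, one of which is null.
metrics = field_metrics(["a", "b", "a", None, "c", "a"])

# Fixed-value rule: the null ratio must not exceed 0.2.
# The fixed value for a ratio is a decimal, not a percentage.
null_ratio_ok = metrics["null_ratio"] <= 0.2
```

Under the stated assumption, a null row is not counted as a unique value, so it contributes to the duplicate count. Verify the actual behavior on a small test table before you rely on a duplicate-count rule for fields that contain nulls.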