This topic describes the verification logic of Data Quality and the built-in rule templates that are provided for monitoring offline data.

Terms

  • sample: the sample value for the current day. For example, if you want to check the fluctuation of table rows on an SQL node in a day, the sample is the number of table rows on that day.
  • baseline: the comparison value from the previous samples.
    • If you want to check the fluctuation of table rows on an SQL node in a day, the baseline is the number of table rows on the previous day.
    • If you want to check the average fluctuation of table rows on an SQL node in seven days, the baseline is the average number of table rows in the last seven days.
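
The two baseline choices above can be sketched in Python. This is only an illustration of the terms, not the way Data Quality computes them internally. The function names and the `history` list of daily row counts are hypothetical.

```python
def previous_day_baseline(history):
    """Baseline for a one-day check: the most recent earlier sample."""
    return history[-1]


def average_baseline(history, days=7):
    """Baseline for an N-day average check: mean of the last `days` samples."""
    window = history[-days:]
    return sum(window) / len(window)


# Hypothetical row counts for the seven days before today, oldest first.
history = [1000, 1020, 980, 1010, 990, 1005, 995]
sample = 1050  # today's row count, that is, the sample

print("sample:", sample)
print("baseline (previous day):", previous_day_baseline(history))
print("baseline (7-day average):", average_baseline(history))
```

With these made-up numbers, the previous-day baseline is 995 and the seven-day average baseline is 1000.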

Verification logic

Data Quality supports three verification methods: comparison with a fixed value, comparison with thresholds, and dynamic threshold.
Verification method Verification logic
Comparison with a fixed value
  1. Data Quality evaluates the verification expression and returns a Boolean result. The following comparison operators are supported:

    >, <, >=, <=, and !=

  2. If the result is true, the data is considered normal. If the result is false, an error alert is reported.
Comparison with thresholds
  • If the absolute value of the fluctuation does not exceed the warning threshold, the data is considered normal.
  • If the absolute value of the fluctuation exceeds the warning threshold but does not exceed the error threshold, a warning alert is reported.
  • If the absolute value of the fluctuation exceeds the error threshold, an error alert is reported.
Dynamic threshold
  You do not need to set thresholds. The system automatically checks the metrics in real time based on algorithm models. If the value of a metric falls outside a reasonable range, an alert is reported.
  Notice: You must purchase DataWorks Enterprise Edition or a more advanced edition to use the dynamic threshold feature.
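
The fixed-value and threshold comparisons described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Data Quality implementation. The function names are hypothetical, and the fluctuation is assumed to be the relative change `(sample - baseline) / baseline`, which this topic does not define explicitly. The dynamic threshold method depends on algorithm models and is not covered.

```python
import operator

# The comparison operators listed for fixed-value checks.
OPERATORS = {
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "!=": operator.ne,
}


def check_fixed_value(sample, op, expected):
    """Return 'normal' if `sample op expected` is true, otherwise 'error'."""
    return "normal" if OPERATORS[op](sample, expected) else "error"


def fluctuation(sample, baseline):
    """Assumed definition: relative change of the sample against the baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (sample - baseline) / baseline


def check_thresholds(sample, baseline, warning, error):
    """Classify the absolute fluctuation against the two thresholds."""
    change = abs(fluctuation(sample, baseline))
    if change <= warning:
        return "normal"
    if change <= error:
        return "warning"
    return "error"


# A table that must not be empty: row count > 0.
print(check_fixed_value(0, ">", 0))
# An 8% change with a 5% warning threshold and a 10% error threshold.
print(check_thresholds(1080, 1000, 0.05, 0.10))
```

In this sketch, the empty table fails the fixed-value check and yields `error`, and the 8% change falls between the two thresholds and yields `warning`.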

Description of built-in rule templates for offline data

Template name Description
Fluctuations of the average value of a field compared with that on the previous day, that of seven days ago, and that of one month ago Data Quality compares the average value of a field with that on the previous day, that of seven days ago, and that of one month ago to obtain the fluctuations. Then, Data Quality compares the obtained fluctuations with thresholds. If a fluctuation exceeds a threshold, an alert is reported.
Fluctuations of the sum of values in a field compared with that on the previous day, that of seven days ago, and that of one month ago Data Quality compares the sum of values in a field with that on the previous day, that of seven days ago, and that of one month ago to obtain the fluctuations. Then, Data Quality compares the obtained fluctuations with thresholds. If a fluctuation exceeds a threshold, an alert is reported.
Fluctuations of the minimum value of a field compared with that on the previous day, that of seven days ago, and that of one month ago Data Quality compares the minimum value of a field with that on the previous day, that of seven days ago, and that of one month ago to obtain the fluctuations. Then, Data Quality compares the obtained fluctuations with thresholds. If a fluctuation exceeds a threshold, an alert is reported.
Fluctuations of the maximum value of a field compared with that on the previous day, that of seven days ago, and that of one month ago Data Quality compares the maximum value of a field with that on the previous day, that of seven days ago, and that of one month ago to obtain the fluctuations. Then, Data Quality compares the obtained fluctuations with thresholds. If a fluctuation exceeds a threshold, an alert is reported.
Number of unique values in a field Data Quality compares the number of unique values in a field after deduplication with a fixed value.
Fluctuations of the number of unique values in a field compared with that on the previous day, that of seven days ago, and that of one month ago Data Quality compares the number of unique values in a field after deduplication with that on the previous day, that of seven days ago, and that of one month ago. This is a comparison with a fixed value.
Fluctuations of the number of table rows compared with that on the previous day, that of seven days ago, and that of one month ago Data Quality compares the number of table rows with that on the previous day, that of seven days ago, and that of one month ago to obtain the fluctuations.
Number of null values in a field Data Quality compares the number of null values in a field with a fixed value.
Ratio of the number of null values in a field to the total number of rows Data Quality compares the ratio of the number of null values in a field to the total number of rows with a fixed value.
Note: The fixed value is a decimal.
Ratio of the number of duplicated values in a field to the total number of rows Data Quality compares the ratio of the number of duplicated values in a field to the total number of rows with a fixed value.
Number of duplicated values in a field Data Quality subtracts the number of values in a field after deduplication from the total number of rows to obtain the number of duplicated values in the field. Then, Data Quality compares the number of duplicated values with a fixed value.
Ratio of the number of unique values in a field to the total number of rows Data Quality compares the ratio of the number of unique values in a field to the total number of rows with a fixed value.
Fluctuation of the average value of a field compared with that on the previous day Data Quality compares the average value of a field with that on the previous day to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds.
Fluctuation of the sum of values in a field compared with that on the previous day Data Quality compares the sum of values in a field with that on the previous day to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds.
Fluctuation of the minimum value of a field compared with that on the previous day Data Quality compares the minimum value of a field with that on the previous day to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds.
Fluctuation of the maximum value of a field compared with that on the previous day Data Quality compares the maximum value of a field with that on the previous day to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds.
Fluctuation of the sum of values in a field compared with that in the last cycle Data Quality compares the sum of values in a field with that in the last cycle to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds. If the fluctuation exceeds a threshold, an alert is reported.
Fluctuation of the minimum value of a field compared with that in the last cycle Data Quality compares the minimum value of a field with that in the last cycle to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds. If the fluctuation exceeds a threshold, an alert is reported.
Fluctuation of the maximum value of a field compared with that in the last cycle Data Quality compares the maximum value of a field with that in the last cycle to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds. If the fluctuation exceeds a threshold, an alert is reported.
Count of each discrete point for grouping in a field The count of each discrete point for grouping in a field.
Fluctuations of the count of each discrete point for grouping in a field compared with that on the previous day, that of seven days ago, or that of one month ago The fluctuations of the count of each discrete point for grouping in a field compared with that on the previous day, that of seven days ago, or that of one month ago.
Total number of discrete points for grouping in a field The total number of discrete points for grouping in a field.
Fluctuation of the total number of discrete points for grouping in a field compared with that on the previous day The fluctuation of the total number of discrete points for grouping in a field compared with that on the previous day.
Whether the table size, in bytes, remains unchanged, compared with that in the last cycle Data Quality checks whether the table size, in bytes, is the same as that in the last cycle.
Whether the table size, in bytes, is changed, compared with that in the last cycle Data Quality checks whether the table size, in bytes, differs from that in the last cycle.
Whether the number of table rows is changed, compared with that in the last cycle Data Quality checks whether the number of table rows differs from that in the last cycle.
Whether the number of table rows remains unchanged, compared with that in the last cycle Data Quality checks whether the number of table rows is the same as that in the last cycle.
Difference between the table size, in bytes, and that in the last cycle Data Quality compares the table size in bytes with that in the last cycle to obtain the difference.
Difference between the number of table rows and that in the last cycle Data Quality compares the number of table rows collected on the current day with that in the partition generated in the last cycle to obtain the difference.
Number of table rows The number of table rows.
Table size, in bytes The table size, in bytes.
Difference between the number of table rows and that on the previous day Data Quality compares the number of table rows collected on the current day with that in the partition generated on the previous day to obtain the difference.
Difference between the table size, in bytes, and that on the previous day Data Quality compares the table size in bytes with that on the previous day to obtain the difference.
Fluctuation of the table size compared with that on the previous day This template is used to compare the table size with that on the previous day to obtain the fluctuation.

For example, you can set the warning threshold to 5% and the error threshold to 10%. If the fluctuation is greater than 5% and less than or equal to 10%, a warning alert is reported. If the fluctuation is greater than 10%, an error alert is reported.

Fluctuation of the table size compared with that of seven days ago This template is used to compare the table size with that of seven days ago to obtain the fluctuation.

For example, you can set the warning threshold to 5% and the error threshold to 10%. If the fluctuation is greater than 5% and less than or equal to 10%, a warning alert is reported. If the fluctuation is greater than 10%, an error alert is reported.

Fluctuation of the table size compared with that of one month ago This template is used to compare the table size with that of one month ago to obtain the fluctuation.

For example, you can set the warning threshold to 5% and the error threshold to 10%. If the fluctuation is greater than 5% and less than or equal to 10%, a warning alert is reported. If the fluctuation is greater than 10%, an error alert is reported.

Fluctuation of the number of table rows compared with the average number in the last seven days The average number of table rows in the last seven days is the baseline.
Fluctuation of the number of table rows compared with the average number in the last 30 days The average number of table rows in the last 30 days is the baseline.
Fluctuation of the number of table rows compared with that of the previous day Data Quality compares the number of table rows collected on the current day with that in the partition generated on the previous day to obtain the fluctuation.
Fluctuation of the number of table rows compared with that of seven days ago Data Quality compares the number of table rows collected on the current day with that in the partition generated seven days ago to obtain the fluctuation.
Fluctuation of the number of table rows compared with that of one month ago Data Quality compares the number of table rows collected on the current day with that in the partition generated one month ago to obtain the fluctuation.
Fluctuations of the number of table rows compared with that on the previous day, that of seven days ago, that of one month ago, and that on the first day of the current month Data Quality compares the number of table rows with that on the previous day, that of seven days ago, that of one month ago, and that on the first day of the current month to obtain the fluctuations.
Fluctuation of the number of table rows compared with that in the last cycle Data Quality compares the number of table rows collected on the current day with that in the partition generated in the last cycle to obtain the fluctuation.
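
Several of the field-level templates above reduce to simple counts over one column. The following sketch shows how the null, unique, and duplicated metrics relate to each other. It is an illustration only: `field_metrics` is a hypothetical helper, the column is modeled as a Python list in which `None` stands for a null value, and treating null as one value during deduplication is an assumption that this topic does not confirm. The duplicate count follows the subtraction described in the "Number of duplicated values in a field" template.

```python
def field_metrics(values):
    """Compute illustrative field-level metrics for one column."""
    total_rows = len(values)
    if total_rows == 0:
        raise ValueError("column has no rows")

    null_count = sum(1 for v in values if v is None)
    unique_count = len(set(values))  # number of values after deduplication
    duplicate_count = total_rows - unique_count

    return {
        "null_count": null_count,
        "null_ratio": null_count / total_rows,
        "unique_count": unique_count,
        "unique_ratio": unique_count / total_rows,
        "duplicate_count": duplicate_count,
        "duplicate_ratio": duplicate_count / total_rows,
    }


# Hypothetical column with eight rows, two of them null.
column = ["a", "b", "a", None, "c", "a", None, "b"]
metrics = field_metrics(column)
print(metrics["null_count"], metrics["unique_count"], metrics["duplicate_count"])
print(metrics["null_ratio"], metrics["unique_ratio"])
```

The ratios are returned as decimals (0.25 for the null ratio in this example), which matches the note that the fixed value for ratio templates is a decimal.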