Data Quality provides rule templates to simplify the creation of Specs. The system includes a set of common, built-in templates, and you can also create your own custom templates. Templates contain most of the required Spec settings, so when you create a rule from a template, you only need to configure the remaining settings.
Use system rule templates
The following example shows how to create a rule using the 1, 7, and 30-day average volatility system rule template:
datasets:
- type: Table
tables:
- tb_d_spec_demo
filter: "dt='$[yyyymmdd]' AND hh='$[hh24-1/24]'"
dataSource:
name: odps_first
envType: Dev
rules:
- templateId: "SYSTEM:field:avg:flux:1_7_1m_bizdate"
fields:
- col_income
warn: "when > 1%"
fail: "when > 10%"The preceding code snippet creates a Data Quality monitoring rule by referencing the 1, 7, and 30-day average volatility system rule template. The parameters are configured as follows:
templateId: The rule templateID. For more rule template IDs, see System Rule Template List. You can also create a custom rule template and refer to Use a custom rule template to create a rule.templateParameters: The parameters that the rule template requires. Different templates require different parameters. A common parameter isfields, which specifies the list of fields that the rule monitors.For the parameters required by each rule template, see System rule template list.
warn/fail: The thresholds.
Use custom rule templates
If system rule templates do not meet your needs, or if you have many custom SQL rules that contain the same SQL statements, you can create custom rule templates.
Define a custom template
The following code snippet shows an example of a Spec configuration for a custom rule template:
name: "Check for non-empty ID row count"
id: "1760d075-00bd-43c9-be8f-3c9a4bac35db"
assertion: "id_not_null_row_count = 0"
id_not_null_row_count:
query: "SELECT COUNT(*) FROM ${table} WHERE dt = '$[yyyymmdd-1]';"
catalog: "ods_layer/real-time_check"The preceding code snippet defines a simple custom rule template. It includes the following fields:
name: The template name.id: The unique identifier for the template. This ID must be globally unique and is used to reference the template.assertion: The logic of the rule. For more information about the syntax, see theassertionfield in Data Quality Spec configuration.id_not_null_row_count: Themetricdefinition referenced by theassertionin this example. In thequery, you can use${tableName}as a placeholder for the table name. When you create a rule from this rule template, this placeholder is replaced with the name of the table that the rule monitors. You can also directly reference system time parameters. For more information, see Data filtering configuration.catalog: The category of the rule template, which is used for management. Use a forward slash (/) to separate multiple levels of categories.
Threshold settings in custom templates
Fixed and non-fixed thresholds
The following code snippet shows an example of a configuration with a non-fixed threshold:
name: "Check for non-empty ID row count"
id: "1760d075-00bd-43c9-be8f-3c9a4bac35db"
assertion: "id_not_null_row_count"
id_not_null_row_count:
query: "SELECT COUNT(*) FROM ${table} WHERE dt = '$[yyyymmdd-1]';"Volatility thresholds
name: "Check for non-empty ID row count"
id: "1760d075-00bd-43c9-be8f-3c9a4bac35db"
assertion: "change avg last 7 days percent for id_not_null_row_count"
id_not_null_row_count:
query: "SELECT COUNT(*) FROM ${table} WHERE dt = '$[yyyymmdd-1]';"Create a rule using a custom template
Example 1
datasets: - type: Table tables: - tb_d_spec_demo filter: "dt='$[yyyymmdd]' AND hh='$[hh24-1/24]'" dataSource: name: odps_first envType: Dev rules: - templateId: "1760d075-00bd-43c9-be8f-3c9a4bac35db" # Defines the failure threshold fail: "when != 0"Example 2
datasets: - type: Table tables: - tb_d_spec_demo filter: "dt='$[yyyymmdd]' AND hh='$[hh24-1/24]'" dataSource: name: odps_first envType: Dev rules: - templateId: "1760d075-00bd-43c9-be8f-3c9a4bac35db" warn: "when between 1 and 10)" fail: "when > 10"Example 3
datasets: - type: Table tables: - tb_d_spec_demo filter: "dt='$[yyyymmdd]' AND hh='$[hh24-1/24]'" dataSource: name: odps_first envType: Dev rules: - templateId: "1760d075-00bd-43c9-be8f-3c9a4bac35db" warn: "when not between -1% and 1%" fail: "when not between -10% and 10%"
System rule template list
Id | Template | Required fields | Example |
| Table row count, fixed value |
| |
| Table row count greater than 0 | - | |
| Table row count, 1-day difference |
| |
| Table row count, difference from the previous epoch |
| |
| Table row count, 1-, 7-, and 30-day volatility rate |
| |
| Table row count, 1-, 7-, 30-day, and first-of-month volatility rate |
| |
| Table row count, 1-day volatility rate |
| |
| Table row count, 30-day volatility rate |
| |
| Table row count, 7-day volatility rate |
| |
| Table row count, dynamic threshold | - | |
| Table row count, 7-day average volatility rate |
| |
| Table row count, 30-day average volatility rate |
| |
| Table row count, volatility rate from the previous epoch |
| |
| Table size, fixed value |
| |
| Table size, 1-day difference (bytes) |
| |
| Table size, difference from the previous epoch |
| |
| Table size, 1-day volatility rate |
| |
| Table size, 30-day volatility rate |
| |
| Table size, 7-day volatility rate |
| |
| Table size, dynamic threshold | - | |
| Null value count, fixed value |
| |
| Null value count is 0 |
| |
| Percentage of null values, fixed value |
| |
| Regular expression check |
| |
| Date format check |
| |
| Email format check |
| |
| ID card format check |
| |
| Mobile number format check |
| |
| Currency format check |
| |
| Numeric format check |
| |
| Phone number format check |
| |
| Duplicate value count, fixed value |
| |
| Duplicate value count is 0 |
| |
| Duplicate value count for multiple fields is 0 |
| |
| Percentage of duplicate values, fixed value |
| |
| Unique value count, fixed value |
| |
| Unique value count, 1-, 7-, and 30-day volatility rate |
| |
| Unique value count, dynamic threshold |
| |
| Percentage of unique values, fixed value |
| |
| Minimum value, 1-, 7-, and 30-day volatility rate |
| |
| Minimum value, dynamic threshold |
| |
| Minimum value, 1-day volatility rate |
| |
| Minimum value, volatility rate from the previous epoch |
| |
| Minimum value, 1-, 7-, and 30-day volatility rate |
| |
| Maximum value, dynamic threshold |
| |
| Maximum value, 1-day volatility rate |
| |
| Maximum value, volatility rate from the previous epoch |
| |
| Average value, 1-, 7-, and 30-day volatility rate |
| |
| Average value, dynamic threshold |
| |
| Average value, 1-day volatility rate |
| |
| Sum, 1-, 7-, and 30-day volatility rate |
| |
| Sum, dynamic threshold |
| |
| Sum, 1-day volatility rate |
| |
| Sum, volatility rate from the previous epoch |
| |
| Count of unique values not in enumeration, fixed value |
| |
| Row count with values not in enumeration, fixed value |
| |
| Row count with values not in enumeration is 0 |
| |
| Custom enumeration check, fixed value |
| |
| Count of discrete status values, fixed value |
| |
| Count of discrete groups, fixed value |
| |
| Count of discrete groups, dynamic threshold |
| |
| Count of discrete status values, dynamic threshold |
| |
| Count of discrete groups, volatility rate from the previous epoch |
| |
| Discrete metrics (groups and values), 1-, 7-, and 30-day volatility rate |
| |
Appendix: Configuration instructions for custom range system templates
The following custom range system templates are configured differently from standard system templates and require separate configuration.
Conditional match percentage, custom range
rules: - assertion: "matched_row_percent = 0" filter: "id IS NULL"Field minimum value, custom range
rules: - assertion: "anomal detection for min(income)"Field maximum value, custom range
rules: - assertion: "change avg last 7 days percent for max(income)" warn: "when > 0.1%" fail: "when > 0.5%"Field average value, custom range
rules: - assertion: "change var last 30 days percent for avg(income)" warn: "when < -0.1%" fail: "when < -0.5%"Field sum, custom range
rules: - assertion: "change 1 days ago percent for sum(income)" warn: "when not between -0.1% and 0.1%" fail: "when not between -0.5% and 0.5%"Custom SQL
assertionparameter, see Define custom metric rules.rules: - assertion: "change percent for id_not_null_count" id_not_null_count: query: "SELECT COUNT(*) AS cnt FROM tb_spec_demo WHERE dt = '$[yyyymmdd]'" warn: "when not between -0.1% and 0.1%" fail: "when not between -0.5% and 0.5%"