Data Quality provides rule templates to simplify the creation of Specs. The system includes a set of common, built-in templates, and you can also create your own custom templates. Templates contain most of the required Spec settings, so when you create a rule from a template, you only need to configure the remaining settings.
Use system rule templates
The following example shows how to create a rule using the 1, 7, and 30-day average volatility system rule template:
datasets:
- type: Table
tables:
- tb_d_spec_demo
filter: "dt='$[yyyymmdd]' AND hh='$[hh24-1/24]'"
dataSource:
name: odps_first
envType: Dev
rules:
- templateId: "SYSTEM:field:avg:flux:1_7_1m_bizdate"
fields:
- col_income
warn: "when > 1%"
fail: "when > 10%"The preceding code snippet creates a Data Quality monitoring rule by referencing the 1, 7, and 30-day average volatility system rule template. The parameters are configured as follows:
templateId: The rule templateID. For more rule template IDs, see System Rule Template List. You can also create a custom rule template and refer to Use a custom rule template to create a rule.templateParameters: The parameters that the rule template requires. Different templates require different parameters. A common parameter isfields, which specifies the list of fields that the rule monitors.For the parameters required by each rule template, see System rule template list.
warn/fail: The thresholds.
Use custom rule templates
If system rule templates do not meet your needs, or if you have many custom SQL rules that contain the same SQL statements, you can create custom rule templates.
Define a custom template
The following code snippet shows an example of a Spec configuration for a custom rule template:
name: "Check for number of rows with non-null ID"
id: "1760d075-00bd-43c9-be8f-3c9a4bac35db"
assertion: "id_not_null_row_count = 0"
id_not_null_row_count:
query: "SELECT COUNT(*) FROM ${table} WHERE dt = '$[yyyymmdd-1]';"
catalog: "ods_layer/real_time_check"The preceding code snippet defines a simple custom rule template. It includes the following fields:
name: The template name.id: The unique identifier for the template. This ID must be globally unique and is used to reference the template.assertion: The logic of the rule. For more information about the syntax, see theassertionfield in Data Quality Spec configuration.id_not_null_row_count: Themetricdefinition referenced by theassertionin this example. In thequery, you can use${tableName}as a placeholder for the table name. When you create a rule from this rule template, this placeholder is replaced with the name of the table that the rule monitors. You can also directly reference system time parameters. For more information, see Data filtering configuration.catalog: The category of the rule template, which is used for management. Use a forward slash (/) to separate multiple levels of categories.
Threshold settings in custom templates
Fixed and non-fixed thresholds
The following code snippet shows an example of a configuration with a non-fixed threshold:
name: "Check for number of rows with non-null ID"
id: "1760d075-00bd-43c9-be8f-3c9a4bac35db"
assertion: "id_not_null_row_count"
id_not_null_row_count:
query: "SELECT COUNT(*) FROM ${table} WHERE dt = '$[yyyymmdd-1]';"Volatility thresholds
name: "Check for number of rows with non-null ID"
id: "1760d075-00bd-43c9-be8f-3c9a4bac35db"
assertion: "change avg last 7 days percent for id_not_null_row_count"
id_not_null_row_count:
query: "SELECT COUNT(*) FROM ${table} WHERE dt = '$[yyyymmdd-1]';"Create a rule using a custom template
Example 1
datasets: - type: Table tables: - tb_d_spec_demo filter: "dt='$[yyyymmdd]' AND hh='$[hh24-1/24]'" dataSource: name: odps_first envType: Dev rules: - templateId: "1760d075-00bd-43c9-be8f-3c9a4bac35db" # Use fail to define the failure threshold fail: "when != 0"Example 2
datasets: - type: Table tables: - tb_d_spec_demo filter: "dt='$[yyyymmdd]' AND hh='$[hh24-1/24]'" dataSource: name: odps_first envType: Dev rules: - templateId: "1760d075-00bd-43c9-be8f-3c9a4bac35db" warn: "when between 1 and 10)" fail: "when > 10"Example 2
datasets: - type: Table tables: - tb_d_spec_demo filter: "dt='$[yyyymmdd]' AND hh='$[hh24-1/24]'" dataSource: name: odps_first envType: Dev rules: - templateId: "1760d075-00bd-43c9-be8f-3c9a4bac35db" warn: "when not between -1% and 1%" fail: "when not between -10% and 10%"
System rule template list
Id | Template | Required fields | Example |
| Table Row Count, Static Field |
| |
| Table Row Count Is Greater Than 0 | - | |
| Table Row Count, 1-day Difference |
| |
| Table Row Count, Previous Epoch Difference |
| |
| Table Row Count, 1, 7, 30-day Volatility Rate |
| |
| Table Row Count, 1, 7, 30-day, And First Day Of The Month Volatility Rate |
| |
| Table Row Count, 1-day Volatility Rate |
| |
| Table Row Count, 30-day Volatility Rate |
| |
| Table Row Count, 7-day Volatility Rate |
| |
| Table Row Count, Dynamic Threshold | - | |
| Table Row Count, 7-day Average Volatility Rate |
| |
| Table Row Count, 30-day Average Volatility Rate |
| |
| Table Row Count, Previous Epoch Volatility Rate |
| |
| Table Size, Static Field |
| |
| Table Size, Difference From 1 Day Ago (bytes) |
| |
| Table Size, Previous Epoch Difference |
| |
| Table Size, 1-day Volatility Rate |
| |
| Table Size, 30-day Volatility Rate |
| |
| Table Size, 7-day Volatility Rate |
| |
| Table Size, Dynamic Threshold | - | |
| Number Of Null Values, Static Field |
| |
| Number Of Null Records Is 0 |
| |
| Number Of Null Values/Total Row Count, Static Field |
| |
| Regular Expression Check |
| |
| Date Format Check |
| |
| Email Format Check |
| |
| ID Card Format Check |
| |
| Mobile Number Format Check |
| |
| Currency Format Check |
| |
| Numeric Format Check |
| |
| Phone Number Format Check |
| |
| Number Of Duplicate Values, Static Field |
| |
| Field Has 0 Duplicate Values |
| |
| Multiple Fields Have 0 Duplicate Values |
| |
| Number Of Duplicate Values/Total Row Count, Static Field |
| |
| Number Of Unique Values, Static Field |
| |
| Number Of Unique Values, 1, 7, 30-day Volatility Rate |
| |
| Number Of Unique Values, Dynamic Threshold |
| |
| Number Of Unique Values/Total Row Count, Static Field |
| |
| Minimum Value, 1, 7, 30-day Volatility Rate |
| |
| Minimum Value, Dynamic Threshold |
| |
| Minimum Value, 1-day Volatility Rate |
| |
| Minimum Value, Previous Epoch Volatility Rate |
| |
| Maximum Value, 1, 7, 30-day Volatility Rate |
| |
| Maximum Value, Dynamic Threshold |
| |
| Maximum Value, 1-day Volatility Rate |
| |
| Maximum Value, Previous Epoch Volatility Rate |
| |
| Average Value, 1, 7, 30-day Volatility Rate |
| |
| Average Value, Dynamic Threshold |
| |
| Average Value, 1-day Volatility Rate |
| |
| Sum, 1, 7, 30-day Volatility Rate |
| |
| Sum, Dynamic Threshold |
| |
| Sum, 1-day Volatility Rate |
| |
| Sum, Previous Epoch Volatility Rate |
| |
| Number Of Unique Values Not Matching Enumeration, Static Field |
| |
| Number Of Rows Not Matching Enumeration, Static Field |
| |
| Number Of Rows Not Matching Enumeration Is 0 |
| |
| Enumeration Value, Custom, Static Field |
| |
| Discrete Value (status Value), Static Field |
| |
| Discrete Value (number Of Groups), Static Field |
| |
| Discrete Value (number Of Groups), Dynamic Threshold |
| |
| Discrete Value (status Value), Dynamic Threshold |
| |
| Discrete Value (number Of Groups), 1-day Volatility Rate |
| |
| Discrete Value (number Of Groups And Status Values), 1, 7, 30-day Volatility Rate |
| |
Appendix: Configuration instructions for custom range system templates
The following custom range system templates are configured differently from standard system templates and require separate configuration.
Conditional Match Percentage, Custom Range
rules: - assertion: "matched_row_percent = 0" filter: "id IS NULL"Field Minimum Value, Custom Range
rules: - assertion: "anomal detection for min(income)"Field Maximum Value, Custom Range
rules: - assertion: "change avg last 7 days percent for max(income)" warn: "when > 0.1%" fail: "when > 0.5%"Field Average Value, Custom Range
rules: - assertion: "change var last 30 days percent for avg(income)" warn: "when < -0.1%" fail: "when < -0.5%"Field Sum, Custom Range
rules: - assertion: "change 1 days ago percent for sum(income)" warn: "when not between -0.1% and 0.1%" fail: "when not between -0.5% and 0.5%"Custom SQL
assertionparameter, see Define custom metric rules.rules: - assertion: "change percent for id_not_null_count" id_not_null_count: query: "SELECT COUNT(*) AS cnt FROM tb_spec_demo WHERE dt = '$[yyyymmdd]'" warn: "when not between -0.1% and 0.1%" fail: "when not between -0.5% and 0.5%"