Data quality monitoring typically scans only changed data, not an entire table. Use the filter parameter in your monitoring configuration to define which rows Data Quality evaluates.
How filters work
Set a filter at two levels:
-
Dataset level — applies to all rules in the dataset. Use this when the same partition or time condition applies to every check, for example, checking only today's partition in a MaxCompute table.
-
Rule level — applies to a single rule only. Use this to add extra row-level conditions on top of the dataset filter.
When both levels are set, Data Quality combines them with AND logic. The following table shows how a dataset-level filter and a rule-level filter combine into the effective filter that Data Quality executes:
| Level | Filter expression |
|---|---|
| Dataset | dt = '$[yyyymmdd]' AND hh = '$[hh24-1/24]' |
| Rule | id IS NOT NULL |
| Effective filter | (dt = '$[yyyymmdd-1]' AND hh = '$[hh24-1/24]') AND (id IS NOT NULL) |
When to use each level:
-
Add a dataset-level filter when the same time partition or row scope applies to most or all rules in that dataset.
-
Add a rule-level filter when only one rule needs an extra condition. If the same rule-level filter appears on many rules, promote it to the dataset level.
If no rows match the filter expression, Data Quality does not evaluate the check. The check result is neither pass nor fail — it is skipped. This is expected behavior for incremental data scenarios where a partition may be empty.
Configuration example
The following YAML shows both filter levels in use:
datasets:
- type: Table
tables:
- tb_d_spec_demo
filter: "dt = '$[yyyymmdd]' AND hh = '$[hh24-1/24]'"
dataSource:
name: odps_first
envType: Dev
rules:
- assertion: "row_count > 0"
# Rule-level filter. Combined with the dataset filter using AND.
# Effective filter: (dt = '$[yyyymmdd-1]' AND hh = '$[hh24-1/24]') AND (id IS NOT NULL)
filter: "id IS NOT NULL"
computeResource:
id: 2001
Filter syntax
Supported predicates
The filter syntax follows the SQL-92 WHERE clause, except that the match and overlaps predicates are not supported. All other standard predicates are available:
| Predicate | Example |
|---|---|
| Comparison | salary > 50000 |
| BETWEEN | price BETWEEN 10 AND 50 |
| IN | department IN ('Sales', 'Marketing') |
| LIKE | name LIKE 'Jo%' |
| NULL | department IS NOT NULL |
| Quantified comparison | salary > ALL (SELECT salary FROM employees WHERE department = 'Sales') |
| EXISTS | EXISTS (SELECT 1 FROM customers cu WHERE cu.country_id = c.country_id) |
Combine predicates with AND or OR logical operators:
-- AND
salary > ALL (SELECT salary FROM employees WHERE department = 'Sales') AND EXISTS (SELECT 1 FROM customers cu WHERE cu.country_id = c.country_id)
-- OR
department IS NOT NULL OR name LIKE 'Jo%'
Formal grammar
<search condition> ::=
<boolean term> | <search condition> OR <boolean term>
<boolean term> ::=
<boolean factor> | <boolean term> AND <boolean factor>
<boolean factor> ::=
[ NOT ] <boolean test>
<boolean test> ::=
<predicate> | ( <search condition> )
<predicate> ::=
<comparison predicate> | <between predicate> |
<in predicate> | <like predicate> | <null predicate> |
<quantified comparison predicate> | <exists predicate>
<comparison predicate> ::=
<expression> <comparison_operator> <expression>
<between predicate> ::=
<expression> [NOT] BETWEEN <lower_bound> AND <upper_bound>
<in predicate> ::=
<expression> [NOT] IN (value_list | subquery)
<like predicate> ::=
<expression> [NOT] LIKE pattern [ESCAPE escape_character]
<null predicate> ::=
<expression> IS [NOT] NULL
<quantified comparison predicate> ::=
<expression> <comparison_operator> { ALL | ANY | SOME } (subquery)
<exists predicate> ::=
EXISTS (subquery)
Use a native query
If the underlying data source requires syntax not expressible in SQL-92 — such as RLIKE, geolocation search, or full-text search — write a native query directly in the filter field. Prefix the expression with query:.
The following example filters a MaxCompute table using RLIKE:
datasets:
- type: Table
indices:
- ods_d_customers
dataSource:
name: odps_first
envType: Dev
rules:
- assertion: "row_count > 0"
filter: "query: email_address RLIKE '^info@'"
Use time variables in a filter
Use scheduling system time offset expressions in filters to target the correct daily partition automatically. Data Quality evaluates the expression against the instance's scheduled time and substitutes the calculated value before running the check.
For example, $[yyyymmdd-1] resolves to yesterday's date, and $[hh24-1/24] resolves to the previous hour.
| Trigger type | How triggerTime is determined |
|---|---|
| Triggered by a scheduling instance | The instance's scheduled time is used automatically |
| Triggered manually (not by a scheduling instance) | Specify triggerTime in the trigger parameters of CreateDataQualityScanRun |
If triggerTime is not set for a manually triggered run, time variable expressions cannot be resolved correctly. Always set triggerTime when calling CreateDataQualityScanRun outside of a scheduled context.