Data quality monitoring typically involves checking only the data that has changed, not an entire table. This topic describes how to configure data filtering.
Data filtering configuration example
In most scenarios, you filter changed data based on a time field. For example, you can use time partitions in MaxCompute or filter by the transaction creation time in a MySQL business transaction table.
To filter data, set the filter parameter in your monitoring rules. The following code provides a configuration example:
datasets:
  - type: Table
    tables:
      - tb_d_spec_demo
    filter: "dt = '$[yyyymmdd-1]' AND hh = '$[hh24-1/24]'"
    dataSource:
      name: odps_first
      envType: Dev
rules:
  - assertion: "row_count > 0"
    # Add a filter at the rule level. It works together with the filter in Scan.dataset.
    # The actual filter executed is (dt = '$[yyyymmdd-1]' AND hh = '$[hh24-1/24]') AND (id IS NOT NULL)
    filter: "id IS NOT NULL"
computeResource:
  id: 2001
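As the comment in the example above notes, a rule-level filter does not replace the dataset-level filter: both are wrapped in parentheses and joined with AND. A minimal Python sketch of that combination (the `combine_filters` helper is illustrative only, not part of the product):

```python
def combine_filters(dataset_filter: str, rule_filter: str = "") -> str:
    # Illustrative only: wrap both filters in parentheses and join them
    # with AND, mirroring the combination behavior described above.
    if not rule_filter:
        return dataset_filter
    return f"({dataset_filter}) AND ({rule_filter})"

combined = combine_filters(
    "dt = '$[yyyymmdd-1]' AND hh = '$[hh24-1/24]'",
    "id IS NOT NULL",
)
print(combined)
# (dt = '$[yyyymmdd-1]' AND hh = '$[hh24-1/24]') AND (id IS NOT NULL)
```

Because each filter is parenthesized before joining, operator precedence inside either filter cannot leak into the other.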
Filter syntax
Basic syntax
The filter syntax largely follows the WHERE clause of the SQL-92 standard, except that the MATCH and OVERLAPS predicates are not supported. The syntax is as follows:
<search condition> ::=
    <boolean term> | <search condition> OR <boolean term>
<boolean term> ::=
    <boolean factor> | <boolean term> AND <boolean factor>
<boolean factor> ::=
    [ NOT ] <boolean test>
<boolean test> ::=
    <predicate> | ( <search condition> )
<predicate> ::=
    <comparison predicate> | <between predicate> |
    <in predicate> | <like predicate> | <null predicate> |
    <quantified comparison predicate> | <exists predicate>
<comparison predicate> ::=
    <expression> <comparison_operator> <expression>
<between predicate> ::=
    <expression> [NOT] BETWEEN <lower_bound> AND <upper_bound>
<in predicate> ::=
    <expression> [NOT] IN (value_list | subquery)
<like predicate> ::=
    <expression> [NOT] LIKE pattern [ESCAPE escape_character]
<null predicate> ::=
    <expression> IS [NOT] NULL
<quantified comparison predicate> ::=
    <expression> <comparison_operator> { ALL | ANY | SOME } (subquery)
<exists predicate> ::=
    EXISTS (subquery)
Configuration examples
Comparison predicate
salary > 50000
Between predicate
price BETWEEN 10 AND 50
In predicate
department IN ('Sales', 'Marketing')
Like predicate
name LIKE 'Jo%'
Null predicate
department IS NOT NULL
Quantified comparison predicate
salary > ALL (SELECT salary FROM employees WHERE department = 'Sales')
Exists predicate
EXISTS (SELECT 1 FROM customers cu WHERE cu.country_id = c.country_id)
More complex search condition
-- Connect two predicates with the AND logical operator
salary > ALL (SELECT salary FROM employees WHERE department = 'Sales')
AND EXISTS (SELECT 1 FROM customers cu WHERE cu.country_id = c.country_id)
-- Connect with the OR logical operator
department IS NOT NULL OR name LIKE 'Jo%'
Use a native query to define a filter
If the query for the underlying data storage cannot be expressed in standard SQL-92 because it uses features such as RLIKE, geolocation search, or full-text search, you can write a native query directly in the filter. Start the native query with the query: keyword.
For example, to filter data in a MaxCompute table using RLIKE, use the following configuration:
datasets:
  - type: Table
    tables:
      - ods_d_customers
    dataSource:
      name: odps_first
      envType: Dev
rules:
  - assertion: "row_count > 0"
    filter: "query: email_address RLIKE '^info@'"
Use time variables in a filter
If you use time partitions, Data Quality typically scans the partition for the current day or the previous day (T-1), so each run must resolve the correct date automatically. To achieve this, use the time offset expressions of the scheduling system.
For example, if you use a time expression such as $[yyyymmdd-1] in the filter, the Data Quality system automatically calculates the offset based on the instance's scheduled time and replaces the expression with the calculated value.
If Data Quality monitoring is not triggered by a scheduling instance, specify triggerTime in the trigger parameters of CreateDataQualityScanRun. The system uses triggerTime as the base for the offset.
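The substitution can be pictured with a simplified sketch that handles only the two expression shapes used above: `$[yyyymmdd-1]`-style day offsets and `$[hh24-1/24]`-style hour offsets. The real scheduling system supports a much richer expression language; `expand_time_expressions` is a hypothetical helper for illustration only.

```python
import re
from datetime import datetime, timedelta

def expand_time_expressions(filter_text: str, trigger_time: datetime) -> str:
    """Simplified sketch: expand $[yyyymmdd±N] and $[hh24±N/24] expressions
    relative to the trigger time. Illustrative, not the product implementation."""
    def repl(match: re.Match) -> str:
        fmt, offset = match.group(1), match.group(2)
        t = trigger_time
        if offset:
            if offset.endswith("/24"):      # fraction of a day, e.g. -1/24 = one hour
                t += timedelta(hours=int(offset[:-3]))
            else:                           # whole-day offset, e.g. -1 = yesterday
                t += timedelta(days=int(offset))
        if fmt == "yyyymmdd":
            return t.strftime("%Y%m%d")
        return t.strftime("%H")             # fmt == "hh24"
    return re.sub(r"\$\[(yyyymmdd|hh24)([+-]\d+(?:/24)?)?\]", repl, filter_text)

# A run whose base time is 2024-06-02 00:30 resolves T-1 and "one hour ago":
trigger = datetime(2024, 6, 2, 0, 30)
print(expand_time_expressions("dt = '$[yyyymmdd-1]' AND hh = '$[hh24-1/24]'", trigger))
# dt = '20240601' AND hh = '23'
```

Note that an hour offset can cross a day boundary, as in the example: subtracting one hour from 00:30 yields hour 23 of the previous day, which is why the offset is applied to the full timestamp before formatting.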