
Dataphin:Quality rule parameter configuration

Last Updated:Nov 18, 2025

This topic describes the parameter configuration information for quality rules.

Data table parameter configuration

Data table rule configuration

Template type

Description

Completeness/Uniqueness

Completeness-Null Value Validation/Empty String Validation

Uniqueness-Uniqueness Validation/Field Group Count Validation/Duplicate Value Count Validation:

  • Validation Field: Select the field from the physical table to validate.

  • Validation Table Data Filtering: Disabled by default. After you enable it, you can configure filter conditions for the validation table, either partition filters or regular data filters. The filter conditions are appended directly to the validation SQL statement. To filter partitions, we recommend that you configure a partition filter expression in the scheduling configuration; the quality report is then viewed at the minimum granularity of the validation partition. Enter the data filtering conditions. For example:

    id = 12 --single table

    T1.id=12 and T2.name = "Zhang San" --two tables
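A null-value completeness check with an appended filter condition can be sketched as follows. The generator function, table, and column names are hypothetical illustrations, not Dataphin's actual generated SQL; the point is that the filter is appended directly to the WHERE clause, as described above.

```python
import sqlite3

def null_check_sql(table, column, data_filter=None):
    # Hypothetical generator sketching the behavior described above.
    # The data filter condition is appended directly to the WHERE clause.
    where = column + " IS NULL"
    if data_filter:
        where += " AND (" + data_filter + ")"
    return "SELECT COUNT(*) FROM " + table + " WHERE " + where

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(12, None), (12, "a"), (7, None)])
# Null-value check restricted by the filter id = 12: only the id = 12
# rows are scanned, and one of them has a NULL name.
abnormal = conn.execute(null_check_sql("orders", "name", "id = 12")).fetchone()[0]
```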

Timeliness

  • Time Comparison With Expression:

    • Validation Field, Validation Table Data Filtering: For more information, see Completeness/Uniqueness.

    • Comparison Item: The expression for comparing time, with built-in parameters ${column}, ${bizdate}.

      • Define an expression based on the comparison field. ${column} is a built-in parameter that represents the comparison field, for example: ${column} or substr(${column}, 1, 10).

      • If you use ${bizdate} in a date function comparison, it is resolved as the data timestamp.

    • Time Difference: You can select Validation Field - (minus) Comparison Field or Comparison Field - (minus) Validation Field. The difference is truncated to the integer part of the selected unit, without rounding. For example, if the shipping date and the order date must fall on the same day, set Validation Field - Comparison Field >= 0 days and Validation Field - Comparison Field < 1 day.


  • Single Table Field Comparison:

    • Validation Field, Validation Table Data Filtering: For more information, see Completeness/Uniqueness.

    • Comparison Field: Select the field in the physical table that you want to compare.

    • Time Difference: You can select Validation Field - (minus) Comparison Field or Comparison Field - (minus) Validation Field. The difference is truncated to the integer part of the selected unit, without rounding. For example, if the shipping date and the order date must fall on the same day, set Validation Field - Comparison Field >= 0 days and Validation Field - Comparison Field < 1 day.


  • Time Interval Comparison In Two Tables:

    • Validation Field, Validation Table Data Filtering: For more information, see Completeness/Uniqueness.

    • Select Comparison Table: Select the table and field in the physical table that you want to compare.

    • Association Expression: Uses the built-in table parameters T1 and T2, where T1 is the validation table and T2 is the comparison table. For example: T1.id = T2.id.

    • Time Difference: You can select Validation Field - (minus) Comparison Field or Comparison Field - (minus) Validation Field. The difference is truncated to the integer part of the selected unit, without rounding. For example, if the shipping date and the order date must fall on the same day, set Validation Field - Comparison Field >= 0 days and Validation Field - Comparison Field < 1 day.

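The truncation behavior of the Time Difference setting can be sketched in Python. The field names and the same-day bounds (greater than or equal to 0 days and less than 1 day) are illustrative assumptions based on the example above.

```python
from datetime import datetime

def truncated_day_diff(validation_ts, comparison_ts):
    # Validation field - Comparison field in days; int() keeps the value
    # before the decimal point without rounding, as described above.
    seconds = (validation_ts - comparison_ts).total_seconds()
    return int(seconds / 86400)

def same_day(ship_time, order_time):
    # The same-day example expressed as 0 <= diff < 1 day.
    return 0 <= truncated_day_diff(ship_time, order_time) < 1

order = datetime(2024, 5, 1, 9, 0)
ship_same_day = datetime(2024, 5, 1, 23, 30)  # 14.5 hours later -> 0 days
ship_next_day = datetime(2024, 5, 2, 10, 0)   # 25 hours later -> 1 day
```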

Validity

  • Column Format Validation:

    • Validation Field, Validation Table Data Filtering: For more information, see Completeness/Uniqueness.

    • Content Recognition Form: You can select Recognition Expression, Regular Expression, String Matching (%), Starts With, Contains, or Ends With. The expression is used to match data that conforms to the rules.

    • Content Recognition Form - Recognition Expression: Enter an SQL expression. For example, ${column} > 0, where ${column} is a built-in parameter that represents the validation field.

    • Content Recognition Form-Regular Expression: You can enter a regular expression or select a built-in regular expression, including Mobile Phone Number, Landline Phone, ID Card Number, Email, Bank Card Number, and so on.

    • Content Recognition Form-String Matching (%): A standard LIKE expression, using % as a wildcard character. For example, to match data starting with 'a', enter a%.

    • Content Recognition Form-Starts With: Enter the string to match; % is automatically appended to the end, for example: a%.

    • Content Recognition Form-Contains: Enter the string to match; % is automatically added to both ends, for example: %a%.

    • Content Recognition Form-Ends With: Enter the string to match; % is automatically prepended to the beginning, for example: %a.
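The wildcard behavior of Starts With, Contains, and Ends With can be sketched with standard LIKE patterns. The helper below is hypothetical; the exact matching the product performs may differ.

```python
import sqlite3

def like_pattern(form, text):
    # Map a Content Recognition Form to its LIKE pattern, per the
    # descriptions above: a% / %a% / %a for the input 'a'.
    return {"starts_with": text + "%",
            "contains": "%" + text + "%",
            "ends_with": "%" + text}[form]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (v TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [("apple",), ("banana",), ("pea",)])
starts_a = conn.execute("SELECT COUNT(*) FROM t WHERE v LIKE ?",
                        (like_pattern("starts_with", "a"),)).fetchone()[0]  # 'apple'
ends_a = conn.execute("SELECT COUNT(*) FROM t WHERE v LIKE ?",
                      (like_pattern("ends_with", "a"),)).fetchone()[0]      # 'banana', 'pea'
```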

  • Column Length Validation:

    • Validation Field, Validation Table Data Filtering: For more information, see Completeness/Uniqueness.

    • Field Length: Used for field length validation. You can set this to an Enumeration or a Range. Example of enumeration value validation: To compare against 3, 6, and 9, the generated validation SQL statement is x in (3, 6, 9).

  • Column Value Domain Validation:

    • Validation Field, Validation Table Data Filtering: For more information, see Completeness/Uniqueness.

    • Value Domain Type: Select according to your actual business needs.

      • Text: Used for text validation. The validation field must be a text type. Set the Value Range to an Enumeration or a Range. Example of enumeration value validation: To compare against Zhang San and Li Si, the generated validation SQL statement is x in ("Zhang San", "Li Si").

      • Number: Used for numeric validation. The validation field must be a numeric type. Set the Value Range to an Enumeration or a Range. Example of enumeration value validation: To compare against 3, 6, and 9, the generated validation SQL statement is x in (3, 6, 9).

      • Date: Used for date validity validation. The validation field must be a date type (date). Set the Date Format, which supports Year, Year-month, Year-month-day, and Year-month-day-hour-minute-second. For example, Year: 2021~2022; Year-month: 2022-01~2022-02.

      • Timestamp: Used for timestamp validity validation. The validation field must be a timestamp type (timestamp).

      • Custom: Used for validating multiple formats, also supporting the use of functions. The content you enter will be directly sent to the data source for execution.

        You need to set Value Domain Range, supporting Enumeration or Range settings.

        Example of enumeration value validation: To compare against 3, 6, and 9, the generated validation SQL statement is x in (3, 6, 9).

        Example of range validation: The value range is from a minimum value y1 to a maximum value y2. If the field being compared is x, the final validation SQL statement is x >= y1 and x <= y2.

        Note
        • When using custom field formats, examples such as 10 and "10" may behave differently in different data sources.

        • You need to ensure that the functions you use are valid in the data source where the quality check is performed.
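The enumeration and range predicates described above can be sketched as follows. The generator and table are hypothetical; the real generated SQL follows the x in (...) and x >= y1 and x <= y2 forms shown in the examples.

```python
import sqlite3

def domain_abnormal_sql(table, column, enum=None, min_val=None, max_val=None):
    # Hypothetical generator: x IN (...) for an enumeration, or
    # x >= y1 AND x <= y2 for a range, negated to count abnormal rows.
    if enum is not None:
        pred = column + " IN (" + ", ".join(str(v) for v in enum) + ")"
    else:
        pred = "{c} >= {lo} AND {c} <= {hi}".format(c=column, lo=min_val, hi=max_val)
    return "SELECT COUNT(*) FROM " + table + " WHERE NOT (" + pred + ")"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(3,), (6,), (9,), (5,)])
bad_enum = conn.execute(domain_abnormal_sql("t", "x", enum=[3, 6, 9])).fetchone()[0]   # 5 fails
bad_range = conn.execute(domain_abnormal_sql("t", "x", min_val=3, max_val=9)).fetchone()[0]
```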

  • Reference Table Validation:

    • Validation Field, Validation Table Data Filtering: For more information, see Completeness/Uniqueness.

    • Data Source Type: Select the data source type.

    • Datasource: Select the data source corresponding to the data source type. The system automatically fills in the DB/Schema of the data source, which can be modified. If the data source cannot connect to the Dataphin cluster, you need to configure it manually.

      Note
      • For performance reasons, at most 1,000 records are read from the lookup table for cross-data source comparison. We recommend that the lookup table and the validated table reside in the same data source.

      • Fields for cross-data source comparison will be automatically converted to string type. Some special field types may cause exceptions. It is recommended that the field types in the lookup table and the validation field are both string or other identical types.

    • Reference Tables: Select the target lookup table.

    • Lookup Table Name: Enter the lookup table name, which will be used as prompt information when configuring quality rules.

    • Lookup Table Reference Field: Select the corresponding reference field in the lookup table.

      Note

      The lookup table reference is used to determine whether a field value exists in the lookup table. You need to select a field for comparison. For example, you can check whether all user IDs in the user details table belong to registered users.

    • Lookup Table Filter Condition: Set filter conditions for the partition table to filter data.
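The registered-user example above can be sketched as an existence check against the lookup table. The table names are illustrative, and the LEFT JOIN form is one possible implementation, not necessarily the SQL Dataphin generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_details (user_id TEXT);      -- validation table
CREATE TABLE registered_users (user_id TEXT);  -- lookup table
INSERT INTO user_details VALUES ('u1'), ('u2'), ('u3');
INSERT INTO registered_users VALUES ('u1'), ('u2');
""")
# Abnormal rows: user IDs in the details table that do not exist
# in the lookup table.
abnormal = conn.execute("""
    SELECT COUNT(*) FROM user_details d
    LEFT JOIN registered_users r ON d.user_id = r.user_id
    WHERE r.user_id IS NULL
""").fetchone()[0]
```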

  • Standard Reference Table Validation (requires the Data Standard module to be activated):

    • Validation Field, Validation Table Data Filtering: For more information, see Completeness/Uniqueness.

    • Reference Tables: Select a published lookup table under Data Standard. To create a lookup table, see Create and manage standard codes (lookup tables).

    • Lookup Table Reference: Compare the reference value selected here with the field value. You can select Code Value, Code Name, Code English Name.

Consistency

  • Columns Value Consistency Validation:

    • Validation Field, Validation Table Data Filtering: For more information, see Completeness/Uniqueness.

    • Comparison Field: Select the field in the physical table that you want to compare.

  • Columns Statistical Consistency Validation:

    • Statistical Method: You can count fields, calculate maximum field values, character repetition rates, etc. For more information, see Stability.

    • Validation Field, Validation Table Data Filtering: For more information, see Completeness/Uniqueness.

    • Comparison Field: Select the field in the physical table that you want to compare.

  • Single Field Business Logic Consistency Comparison:

    • Validation Field, Validation Table Data Filtering: For more information, see Completeness/Uniqueness.

    • Recognition Expression: Enter an SQL expression. For example, ${column} > 0, where ${column} is a built-in parameter that represents the validation field.

      Note
      • The recognition expression supports the use of functions, but you need to ensure that the database used for execution supports these functions.

      • You can validate the business logic between multiple fields. For example, to check if the total sales amount is correct using the formula total_sales = unit_price × sales_volume, you can use ${total_sales}=${unit_price}*${sales_volume}.
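The total_sales example above amounts to a row-level predicate; rows where the expression is false count as abnormal. The sketch below uses illustrative data and column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (unit_price REAL, sales_volume INTEGER, total_sales REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [(2.5, 4, 10.0), (3.0, 2, 6.0), (5.0, 3, 14.0)])  # last row is wrong
# ${total_sales} = ${unit_price} * ${sales_volume} expanded as a predicate;
# rows where it does not hold are abnormal (a sketch, not the generated SQL).
abnormal = conn.execute(
    "SELECT COUNT(*) FROM sales WHERE NOT (total_sales = unit_price * sales_volume)"
).fetchone()[0]
```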

  • Columns In Two Tables Value Consistency Validation:

    • Validation Field, Select Comparison Table, Association Expression, Data Filtering: For more information, see Time Interval Comparison In Two Tables under Timeliness.

    • Join Type: This affects the calculation of total rows, normal rows, and abnormal rows in the data result. You can view the specific logic through the SQL preview. Supports four join types: left join, right join, inner join, and full join.

      • left join: Based on the validation table, calculates metrics such as total rows and normal rows.

      • right join: Based on the comparison table, calculates metrics such as total rows and normal rows.

      • inner join: Based on the matched data between the validation table and comparison table, calculates metrics such as total rows and normal rows. Suitable for scenarios where only matched data needs to be validated.

      • full join: Based on all data from both the validation table and comparison table, calculates metrics such as total rows and normal rows. Suitable for scenarios requiring strong consistency assurance between two tables.
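How the join type changes the total-row base can be demonstrated with two small tables (left and inner joins only, since older SQLite versions lack RIGHT/FULL JOIN; the data is illustrative).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE v (id INTEGER, amt INTEGER);   -- validation table
CREATE TABLE c (id INTEGER, amt INTEGER);   -- comparison table
INSERT INTO v VALUES (1, 10), (2, 20), (3, 30);
INSERT INTO c VALUES (1, 10), (2, 99), (4, 40);
""")
# left join: total rows based on the validation table.
left_total = conn.execute("SELECT COUNT(*) FROM v LEFT JOIN c USING (id)").fetchone()[0]
# inner join: total rows based only on matched data.
inner_total = conn.execute("SELECT COUNT(*) FROM v INNER JOIN c USING (id)").fetchone()[0]
# Abnormal rows under inner join: matched rows whose values differ (id 2).
inner_bad = conn.execute(
    "SELECT COUNT(*) FROM v INNER JOIN c USING (id) WHERE v.amt <> c.amt"
).fetchone()[0]
```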

  • Columns In Two Tables Statistical Consistency Validation:

    • Statistical Method: You can count fields, calculate maximum field values, character repetition rates, etc. For more information, see the description of statistical methods for the Stability metric.

    • Validation Field, Select Comparison Table, Validation Table Data Filtering: For more information, see Time Interval Comparison In Two Tables under Timeliness.

    • Comparison Table Data Filtering: Enter the data filtering content. For example: city='beijing'.

  • Cross-source Two Tables Field Statistical Value Consistency Comparison:

    • Statistical Method: You can count fields, calculate maximum field values, character repetition rates, etc. For more information, see the description of statistical methods for the Stability metric.

    • Validation Field: Select the field in the data table that you want to validate.

    • Data Source Type: Select the data source type for the data table. For more information about the supported data source types, see Supported data sources in Dataphin.

    • Datasource: Select the data source to which this data table belongs. The system automatically fills in the DB/Schema of the data source, which can be modified. If it is an external data source, you need to configure it manually.

    • Select Comparison Table: Select the table and field in the data table that you want to compare.

    • Validation Table Data Filtering: Enter the data filtering content, for example:

      id = 12 --single table

      T1.id=12 and T2.name = "Zhang San" --two tables

    • Comparison Table Data Filtering: Enter the data filtering content. For example: city='beijing'.

  • Columns In Two Tables Processing Logic Consistency Validation:

    • Validation Field, Validation Table Data Filtering: For more information, see Completeness/Uniqueness.

    • Business Logic: Enter an SQL expression. For example, ${column} > 0, where ${column} is a built-in parameter.

      Note

      You can validate the business logic between multiple fields. For example, to check if the total sales amount is correct using the formula total_sales = unit_price × sales_volume, you can use ${T1.total_sales}=${T2.unit_price}*${T2.sales_volume}.

Stability

  • Table Stability Validation/Table Volatility Validation:

    • Statistical Method: You can count fields, calculate maximum field values, character repetition rates, etc. For more information, see the description of statistical methods for the Stability metric.

    • Validation Table Data Filtering: For more information, see Completeness/Uniqueness.

  • Column Stability Validation/Column Volatility Validation:

    • Validation Field, Validation Table Data Filtering: For more information, see Completeness/Uniqueness.

    • Statistical Method: You can count fields, calculate maximum field values, character repetition rates, etc. For more information, see the description of statistical methods for the Stability metric.

Custom SQL

  • Custom Statistic Validation:

    SQL: Supports SELECT query statements. The queried object must include the main table. For example:

    select sum(sale) from tableA where ds=${bizdate};

  • Custom Detail Value Validation:

    • SQL for total row count: Enter the SQL statement for counting the total number of rows. For example: select count(*) from tableA where ds=${bizdate};

    • SQL for abnormal row count: Enter the SQL statement for counting the number of abnormal rows. For example: select count(*) from tableA where ds=${bizdate} and age<0;

    • SQL for abnormal data: Enter the SQL statement for querying abnormal data. For example: select ${t1.c1} as dataphin_quality_error_data from ${t1} where ds=${bizdate} and ${t1.c1}<0;
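The ${bizdate} placeholder in the statements above is substituted before execution. The sketch below shows a simplified substitution and runs the total-row and abnormal-row statements against an illustrative tableA; the render helper is hypothetical, not Dataphin's templating engine.

```python
import sqlite3

def render(sql, params):
    # Simplified ${name} substitution, e.g. ${bizdate} -> the data timestamp.
    for key, value in params.items():
        sql = sql.replace("${" + key + "}", str(value))
    return sql

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (ds TEXT, age INTEGER)")
conn.executemany("INSERT INTO tableA VALUES (?, ?)",
                 [("20240501", 20), ("20240501", -3), ("20240502", 30)])

params = {"bizdate": "'20240501'"}
total = conn.execute(render(
    "select count(*) from tableA where ds=${bizdate}", params)).fetchone()[0]
abnormal = conn.execute(render(
    "select count(*) from tableA where ds=${bizdate} and age<0", params)).fetchone()[0]
```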

Note

If you select a custom rule template for a Custom SQL rule, the configuration section automatically parses the template's variable fields as property values. Configure these values based on the descriptions of the template variables.

Data table validation configuration description

Template type

Configuration item

Description

Completeness

Abnormal rows/Normal rows/Abnormal rate/Normal rate

  • Normal rows: Number of rows that pass the validation, that is, rows whose validation field is not null (or not an empty string).

  • Normal rate: Normal rows/Total rows.

  • Abnormal rows: Total rows - Normal rows.

  • Abnormal rate: 1 - Normal rate, that is, Abnormal rows/Total rows.

Uniqueness

Abnormal rows/Normal rows/Abnormal rate/Normal rate

  • Normal rows: Number of unique value rows.

  • Normal rate: Number of unique value rows/Total rows.

  • Abnormal rows: Total rows - Normal rows, that is, the total number of rows in groups whose count is greater than 1.

  • Abnormal rate: 1 - Normal rate or Abnormal rows/Total rows.

    Note

    Unique value definition: Records that appear only once. That is, groups with count=1 after grouping.
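The uniqueness metrics above can be sketched directly: normal rows are the groups with count = 1 after grouping, and abnormal rows are everything else. The data below is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x TEXT)")
conn.executemany("INSERT INTO t VALUES (?)",
                 [("a",), ("a",), ("a",), ("b",), ("c",)])

total = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
# Normal rows: records that appear only once, i.e. groups with count = 1.
normal = conn.execute(
    "SELECT COUNT(*) FROM (SELECT x FROM t GROUP BY x HAVING COUNT(*) = 1)"
).fetchone()[0]
abnormal = total - normal   # all rows belonging to groups with count > 1
normal_rate = normal / total
```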

Statistical value

The unique value count, that is, the result of a count distinct operation.

Statistical value (Duplicate rows/Duplication rate)

  • Duplicate rows is used to check the number of duplicated rows in the data, only counting the extra rows. The algorithm is total rows - deduplicated rows.

  • Duplication rate = Duplicate rows/Total rows.

  • To calculate the total number of duplicate data rows/duplication rate, you can use the abnormal rows/abnormal rate metrics of the unique value validation template.

Timeliness, Validity

Abnormal rows/Normal rows/Abnormal rate/Normal rate

  • Normal rows: Number of rows that pass the timeliness or validity validation.

  • Normal rate: Normal rows/Total rows.

  • Abnormal rows: Total rows - Normal rows.

  • Abnormal rate: 1 - Normal rate, that is, Abnormal rows/Total rows.

Consistency

Statistical difference, Statistical difference rate (%)

Statistical difference: Validation field - Comparison field.

Statistical difference rate: (Validation field - Comparison field)/Comparison field.

Stability

Statistical value (1-day volatility, 7-day volatility, 30-day volatility)

Compares the current table row count with the values collected 1 day, 7 days, and 30 days ago, calculates the volatility for each, and compares the results with the set threshold. If any comparison does not meet the rule, an alarm is triggered.
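The volatility comparison can be sketched as a signed rate against each past collection, with an alarm when any single comparison exceeds the threshold. The row counts and threshold below are illustrative assumptions.

```python
def volatility(current, past):
    # Fluctuation of the current row count relative to a past collection.
    return (current - past) / past

def stability_alarm(current, history, threshold):
    # history maps lookback days (1, 7, 30) to past row counts. Alarm if
    # any single comparison exceeds the threshold, per "if any one does
    # not meet the rule".
    return any(abs(volatility(current, past)) > threshold
               for past in history.values())

history = {1: 1000, 7: 950, 30: 800}
alarm = stability_alarm(1100, history, threshold=0.10)  # e.g. vs 7 days ago: ~+15.8%
```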

Custom SQL

Abnormal rows/Normal rows/Abnormal rate/Normal rate

  • Normal rows: Number of rows that pass the validation.

  • Normal rate: Normal rows/Total rows.

  • Abnormal rows: Total rows - Normal rows (for detail value validation, the result of the SQL statement for the abnormal row count).

  • Abnormal rate: 1 - Normal rate, that is, Abnormal rows/Total rows.

Statistical value (1-day volatility, 7-day volatility, 30-day volatility)

Compares the current table row count with the values collected 1 day, 7 days, and 30 days ago, calculates the volatility for each, and compares the results with the set threshold. If any comparison does not meet the rule, an alarm is triggered.

Metric parameter configuration

Metric rule configuration

Template type

Description

Uniqueness

Field Group Count Validation/Duplicate Value Count Validation: Data filtering needs to be configured.

Data Filtering: Disabled by default. After it is enabled, you can configure filter conditions, partition filters, or regular data filters for the validation table. The filter conditions will be directly appended to the validation SQL statement. To filter partitions for the validation table, we recommend that you configure a partition filter expression in the scheduling configuration. After the configuration, the quality report will be viewed at the minimum granularity of the validation partition. Enter the data filtering content, for example:

id = 12 --single table

T1.id=12 and T2.name = "Zhang San" --two tables

Stability

Column Stability Validation/Column Volatility Validation:

  • Data Filtering: Consistent with the description of Uniqueness in the metric rule configuration.

  • Statistical Method: Select the statistical method for the metric.

    • Field Count: Compares the field COUNT number with the expected number, i.e., fixed value check.

    • Field Unique Value Count: Compares the field COUNT number after deduplication with the expected number, i.e., fixed value check.

    • Field Sum Value: Takes the SUM value of the field and compares it with a fixed value.

    • Field Average Value: Takes the average value of the field and compares it with a set fixed value.

    • Field Maximum Value: Takes the maximum value of the field and compares it with a threshold.

    • Field Minimum Value: Takes the minimum value of the field and compares it with a threshold.

    • Field Duplicate Rows: Compares the number of duplicate values with a fixed value (total rows minus the number after deduplication, i.e., the number of duplicate values in the field).

    • Field Duplication Rate %: Field duplicate count/Total rows, compares the ratio of duplicate values to total rows with a fixed value.

    • Field Group Count: Takes the number of groups after grouping the field and compares it with a set fixed value.

    • Field Null Value Rows: Compares the number of null values in the field with a set fixed value.

    • Field Null Value Rate: Field null value count/Row count, compares the ratio of null values to total rows with a fixed value.
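Most of the statistical methods above map to standard aggregates and can be computed in one pass. The sketch below uses illustrative data; note that exact null handling (whether "total rows" includes nulls) may differ by data source.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE m (x INTEGER)")
conn.executemany("INSERT INTO m VALUES (?)", [(1,), (2,), (2,), (None,)])

row = conn.execute("""
    SELECT COUNT(x),                -- field count (non-null values)
           COUNT(DISTINCT x),       -- field unique value count
           SUM(x), AVG(x), MAX(x), MIN(x),
           COUNT(*) - COUNT(x)      -- field null value rows
    FROM m
""").fetchone()
cnt, distinct_cnt, total, avg, mx, mn, nulls = row
duplicate_rows = cnt - distinct_cnt   # total rows minus deduplicated rows
null_rate = nulls / conn.execute("SELECT COUNT(*) FROM m").fetchone()[0]
```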

Metric validation configuration

Template type

Configuration item

Description

Uniqueness

Number of field groups

Compares the grouped count of this field with the specified fixed value.

Statistics (duplicate rows/duplication rate)

  • Duplicate rows are used to check the number of repeated rows in the data, counting only the extra rows. The algorithm is total rows minus deduplicated rows.

  • Duplication rate = duplicate rows/total rows.

  • To calculate the total number of duplicate rows/duplication rate, you can use the abnormal rows/abnormal rate metrics of the unique value validation template.

Stability

Statistical value

The unique value count, which is obtained from a count distinct operation.

Statistical values (1-day volatility, 7-day volatility, 30-day volatility)

Compares the current table row count with the values collected 1 day, 7 days, and 30 days ago, calculates the fluctuation rate for each, and compares the results with the set threshold. If any comparison does not comply with the rule, an alarm is triggered.

Mean fluctuation detection (7-day fluctuation, 30-day fluctuation)

Uses the average table row count over the last 7 or 30 days as the baseline, and compares the fluctuation rate of the current value against this baseline with the set threshold.
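The mean fluctuation detection above can be sketched as follows; the window data is an illustrative assumption.

```python
def mean_fluctuation(current, daily_counts):
    # Baseline = average row count over the window (e.g. last 7 or 30 days);
    # returns the fluctuation rate of the current value against that baseline.
    baseline = sum(daily_counts) / len(daily_counts)
    return (current - baseline) / baseline

last_7_days = [980, 1000, 1010, 990, 1020, 1000, 1000]  # baseline = 1000
rate = mean_fluctuation(1100, last_7_days)              # +10% vs the 7-day mean
```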

Statistical values (fluctuation rate compared to the 1st day of the current month, fluctuation rate compared to the previous month, fluctuation rate compared to the previous year)

Compares the number of table rows collected with those from the 1st day of the current month, the previous month, and the previous year to calculate fluctuation rates. These rates are then compared with the set threshold. An alarm is triggered if any of the comparisons does not comply with the rule.

Real-time meta-table parameter configuration

Offline link comparison parameter configuration

When real-time data and offline data are produced with the same statistical logic, the real-time and offline data validation quality rule can detect differences between the two sets of data. Significant differences may indicate data quality issues.

Parameter

Description

Validation Field

Select the field that needs to be validated.

Metric Operator

Select the algorithm for the data.

Object Form

Select Single Value Data or Multiple Value Data.

Time Constraint Condition

Select the field for time constraint.

Enable Condition Constraint

Specify whether to enable the condition constraint.

Offline Data

Select offline data table from the dropdown.

Offline Data Retrieval

Disabled by default. When enabled, you can configure SQL statements to retrieve data from offline data tables.

Time Zone Setting

Select time zone from the dropdown.

Multi-path comparison parameter configuration

In scenarios that require strong business guarantees, you can monitor data through real-time dual-link or triple-link quality rules. If an abnormality occurs, O&M personnel can promptly switch links or back up data. Real-time multi-link comparison quality rules support monitoring issues such as data retention and statistical drift.

Parameter

Description

Validation Field

Select the field that needs to be validated.

Metric Operator

Select the algorithm for the data.

Object Form

Select Single-value Data or Multi-value Data.

Time Limit Condition

Select the time-limited field.

Enable Condition Restriction

Specify whether to enable the condition restriction.

Number Of Comparison Links

Select the number of comparison links for the quality rule. The system supports Real-time Dual-link Comparison and Real-time Triple-link Comparison.

Comparison Link 1/Comparison Link 2

Select a real-time meta table as the comparison link data:

  • If you select Real-time Dual-link Comparison as the template, you need to select only one real-time meta table.

  • If you select Real-time Triple-link Comparison as the template, you need to select two real-time meta tables.

Time Zone Settings

Select a time zone from the dropdown list.