All Products
Search
Document Center

Dataphin:Quality rule parameter configuration

Last Updated:Jan 23, 2025

This topic outlines the configuration parameters for quality rules.

Data table parameter configuration

Data table rule configuration

Template type

Description

Completeness/uniqueness

Completeness-Null Value Validation/Empty String Validation

Uniqueness-Uniqueness Validation/Field Group Count Validation/Duplicate Value Count Validation:

  • Validation Field: Select the field to be validated in the physical table.

  • Validation Table Data Filtering: Disabled by default. Enable to configure the filter condition, partition filter, or normal data filter of the validation table. The filter condition is appended directly to the validation SQL. If the validation table requires a partition filter, configure the partition expression in the scheduling configuration. Post-configuration, the quality report will reflect the validation partition as the minimum granularity. Example data filtering content:

    id = 12 --Single table

    T1.id=12 and T2.name = "Zhang San" --Double table

Timeliness

  • Time Comparison With Expression:

    • Validation Field, Validation Table Data Filtering: For more information, see Completeness/Uniqueness description.

    • Comparison Item: The expression of the comparison time, built-in ${column}, ${bizdate} parameters.

      • Define the expression based on the comparison field, ${column} is the built-in parameter of the comparison item, for example: ${column}, substr(${column}) .

      • If you select the date function comparison ${bizdate}, it will be recognized as the business date.

    • Time Difference: Supports selecting Validation Field - (minus) Comparison Field, Comparison Field - (minus) Validation Field, taking the value before the decimal point of the unit item without rounding. For example, if the delivery date and the order date need to be on the same day, then Validation Item- Comparison Item can be set to less than 1 day. As shown in the figure, the validation field - comparison field >= 1 day and the validation field - comparison field < 1 day.

      image.png

  • Single Table Field Comparison:

    • Validation Field, Validation Table Data Filtering: For more information, see Completeness/Uniqueness description.

    • Comparison Field: Select the field to be compared in the physical table.

    • Time Difference: Supports selecting Validation Field - (minus) Comparison Field, Comparison Field - (minus) Validation Field, taking the value before the decimal point of the unit item without rounding. For example, if the delivery date and the order date need to be on the same day, then Validation Item-Comparison Item can be set to less than 1 day. As shown in the figure, the validation field - comparison field >= 1 day and the validation field - comparison field < 1 day.

      image.png

  • Time Interval Comparison In Two Tables:

    • Validation Field, Validation Table Data Filtering: For more information, see Completeness/Uniqueness description.

    • Select Comparison Table: Select the table to be compared in the physical table and the field to be compared in the physical table.

    • Association Expression: Built-in data table parameters T1, T2, T1 is the validation table, T2 is the comparison table, for example: T1.id = T2.id.

    • Time Difference: Supports selecting Validation Field - (minus) Comparison Field, Comparison Field - (minus) Validation Field, taking the value before the decimal point of the unit item without rounding. For example, if the delivery date and the order date need to be on the same day, then Validation Item- Comparison Item can be set to less than 1 day. As shown in the figure, the validation field - comparison field >= 1 day and the validation field - comparison field < 1 day.

      image.png

Validity

  • Column Format Validation:

    • Validation Field, Validation Table Data Filtering: For more information, see Completeness/Uniqueness description.

    • Content Recognition Form: You can select Recognition Expression, Regular Expression, String Matching (%), Starts With, Contains, Ends With. The expression is used to match data that meets the rules.

    • Content Recognition Form-Recognition Expression: Enter the SQL expression function. For example, ${column1}>0, where ${column} is a built-in parameter.

    • Content Recognition Form-Regular Expression: You can enter a regular expression and select a built-in regular expression, including Mobile Number, Landline Number, ID Number, Email, Bank Card Number, etc.

    • Content Recognition Form-String Matching (%): Standard like expression, using % to represent a wildcard character. For example, to match data that starts with 'a', enter 'a%'.

    • Content Recognition Form-Starts With: Enter the string to be matched, and % will be automatically appended to the end according to the pattern. For example: 'a%'.

    • Content Recognition Form-Contains: Enter the string to be matched, and % will be automatically appended to the end according to the pattern. For example: '%a%'.

    • Content Recognition Form-Ends With: Enter the string to be matched, and % will be automatically appended to the end according to the pattern. For example: '%a'.

  • Column Length Validation:

    • Validation Field, Validation Table Data Filtering: For more information, see Completeness/Uniqueness description.

    • Field Length: Used for field length validity validation, supporting Enumeration or Range settings. Example of validation by selecting enumeration values: comparing 3, 6, 9, the generated validation SQL is x in (3, 6, 9).

  • Column Value Domain Validation:

    • Validation Field, Validation Table Data Filtering: For more information, see Completeness/Uniqueness description.

    • Value Domain Type: Select based on actual business needs.

      • Text: Used to validate the validity of text, ensuring that the validation field is of text type. You need to set the Value Domain Range, supporting Enumeration or Range settings. Example of validation by selecting enumeration values: comparing Zhang San and Li Si, the generated validation SQL is x in ("Zhang San", "Li Si").

      • Number: Used to validate the validity of numbers, ensuring that the validation field is of numeric type. You need to set the Value Domain Range, supporting Enumeration or Range settings. Example of validation by selecting enumeration values: comparing 3, 6, 9, the generated validation SQL is x in (3, 6, 9).

      • Date: Used to validate the validity of dates, ensuring that the validation field is of date type (date). You need to set the Date Format, supporting Year, Year-month, Year-month-day, Year-month-day-hour-minute-second settings. For example: Year: 2021~2022, Year-Month: 2022-01~2022-02.

      • Timestamp: Used to validate the validity of timestamps, ensuring that the validation field is of timestamp type (timestamp).

      • Customize: Used to validate multiple formats and supports the use of functions. The content filled in will be directly executed on the data source.

        You need to set the Value Domain Range, supporting Enumeration or Range settings.

        Example of validation by selecting enumeration values: comparing 3, 6, 9, the generated validation SQL is x in (3, 6, 9).

        Example of validation by selecting range settings: the value domain range is minimum value y1 to maximum value y2; assuming the field being compared is x, the final validation SQL is x >= y1 and x <= y2.

        Note
        • The field format when customizing, such as 10 and "10", may differ when executed on different data sources.

        • You need to ensure that the functions used are valid on the data source being validated.

  • Reference Table Validation:

    • Validation Field, Validation Table Data Filtering: For more information, see Completeness/Uniqueness description.

    • Data Source Type: Select the data source type.

    • Datasource: Select the corresponding data source of the data source type.

      Note
      • For performance considerations, a maximum of 1000 reference table data entries will be read across data sources for comparison. It is recommended that the reference table and the validation table be in the same data source.

      • Fields compared across data sources will be automatically converted to string type. Some special field types may have abnormal situations. It is recommended that the field types in the reference table and the validation field type are both string or other same types.

    • Reference Tables: Select the target reference table.

    • Reference Table Name: Enter the name of the reference table as a prompt message when configuring quality rules.

    • Reference Table Reference Field: Select the corresponding reference field in the reference table.

      Note

      The reference table reference field is used to determine whether a field is in the reference table. You need to select a comparison field. For example, to determine whether the user IDs in the user details table are all registered users.

    • Reference Table Filter Condition: Set the filter condition of the partitioned table to filter the data.

  • Standard Reference Table Validation (requires enabling the data standard module):

    • Validation Field, Validation Table Data Filtering: For more information, see Completeness/Uniqueness description.

    • Reference Tables: Select the reference table that has been published under the data standard. To create a reference table, see Create and manage standard codes (reference tables).

    • Reference Table Reference: Compare the selected reference value and field value here. You can choose Code Value, Code Name, Code English Name.

Consistency

  • Columns Value Consistency Validation:

    • Validation Field and Validation Table Data Filtering: For more information, see the Completeness/Uniqueness description.

    • Comparison Field: Select the field for comparison from the physical table.

  • Columns Statistical Consistency Validation:

    • Statistical Method: Count field occurrences, maximum values, character repetition rates, etc. For more information, refer to the Stability indicator description.

    • Validation Field and Validation Table Data Filtering: For more information, see the Completeness/Uniqueness description.

    • Comparison Field: Select the field for comparison from the physical table.

  • Single Field Business Logic Consistency Comparison:

    • Validation Field and Validation Table Data Filtering: For more information, see the Completeness/Uniqueness description.

    • Recognition Expression: Enter the SQL expression function, such as ${column1}>0, where ${column} is a built-in parameter.

      Note
      • The recognition expression supports functions; ensure the database supports the function during execution.

      • Supports validation of business logic across multiple fields. For instance, to verify if total sales are correct, use ${total_sales}=${unit_price}*${sales_volume}.

  • Columns In Two Tables Value Consistency Validation:

    • Validation Field, Select Comparison Table, Association Expression, and Data Filtering: For more information, see Timeliness and the Two Table Time Fields.

    • Join Type: Affects the calculation of total, normal, and abnormal row counts in the results. Use SQL preview to understand the logic. Supported joins: left join, right join, inner join, and full join.

      • left join: Calculates total, normal, and other row counts based on the validation table.

      • right join: Calculates total, normal, and other row counts based on the comparison table.

      • inner join: Calculates total, normal, and other row counts based on matched data between the validation and comparison tables, suitable for matched data validation scenarios.

      • full join: Calculates total, normal, and other row counts based on all data in both tables, suitable for scenarios requiring strong consistency.

  • Columns In Two Tables Statistical Consistency Validation:

    • Statistical Method: Count field occurrences, maximum values, character repetition rates, etc. For more information, refer to the Stability indicator description.

    • Validation Field, Select Comparison Table, and Validation Table Data Filtering: For more information, see Timeliness and the Two Table Time Fields.

    • Comparison Table Data Filtering: Specify the data filtering criteria, such as: city='beijing'.

  • Cross-source Two Table Field Statistical Value Consistency Comparison:

    • Statistical Method: Count field occurrences, maximum values, character repetition rates, etc. For more information, refer to the Stability indicator description.

    • Validation Field: Select the field for validation from the physical table.

    • Data Source Type: Choose the data source type for the physical table.

    • Datasource: Select the data source to which the physical table belongs.

    • Select Comparison Table: Choose the comparison table and the corresponding field from the physical table.

    • Validation Table Data Filtering: Enter the data filtering criteria, for example:

      id = 12 --Single table

      T1.id=12 and T2.name = "Zhang San" --Double table

    • Comparison Table Data Filtering: Specify the data filtering criteria, such as: city='beijing'.

  • Columns In Two Tables Processing Logic Consistency Validation:

    • Validation Field and Validation Table Data Filtering: For more information, see the Completeness/Uniqueness description.

    • Business Logic: Enter the SQL expression function, such as ${column1}>0, where ${column} is a built-in parameter.

      Note

      Supports validation of business logic across multiple fields. For example, to verify if total sales are correct, use ${T1.total_sales}=${T2.unit_price}*${T2.sales_volume}.

Stability

  • Table Stability Validation/Table Volatility Validation:

    • Statistical Method: You can count field counts, field maximum values, character repetition rates, etc. For additional information, consult the Stability indicator's statistical method description.

    • Validation Table Data Filtering: For more information, see Completeness/Uniqueness description.

  • Column Stability Validation/Column Volatility Validation:

    • Validation Field, Validation Table Data Filtering: For more information, see Completeness/Uniqueness description.

    • Statistical Method: You can count field counts, field maximum values, character repetition rates, etc. For more information, see the indicator Stability description of the statistical method.

Sql

  • Custom Statistic Validation:

    • Sql: Supports select query statements. The query object must include the primary table. For example:

      select sum(sale) from tableA where ds=${bizdate};

  • Custom Detail Value Validation:

    • Total Rows SQL: Enter the SQL for counting the total number of rows. For example: select count(*) from tableA where ds=${bizdate};

    • Abnormal Rows SQL: Enter the SQL for counting the number of abnormal rows. For example: select count(*) from tableA where ds=${bizdate} and age<0;

    • Abnormal Data SQL: Enter the SQL for counting the abnormal data. For example: select ${t1.c1} as dataphin_quality_error_data from ${t1} where ds=${bizdate} and ${t1.c1}<0;

Data table validation configuration description

Template type

Configuration item

Description

Completeness

Abnormal Rows/Normal Rows/Abnormal Rate/Normal Rate

  • Normal Rows: Number of unique value rows.

  • Normal Rate: Number of unique value rows/Total number of rows.

  • Abnormal Rows: Total number of rows - number of normal rows, that is, count greater than 2 sum.

  • Abnormal Rate: 1 - normal rate or abnormal rows/total number of rows.

    Note

    Definition of unique value: records that appear only once. That is, the group with count=1 after grouping.

Uniqueness

Abnormal Rows/Normal Rows/Abnormal Rate/Normal Rate

  • Normal Rows: Number of unique value rows.

  • Normal Rate: Number of unique value rows/Total number of rows.

  • Abnormal Rows: Total number of rows - number of normal rows, that is, count greater than 2 sum.

  • Abnormal Rate: 1 - normal rate or abnormal rows/total number of rows.

    Note

    Definition of unique value: records that appear only once. That is, the group with count=1 after grouping.

Statistical Value

Refers to the definition of deduplicated value, the data after count distinct.

Statistical Value (Duplicate Rows/Duplicate Rate)

  • Duplicate rows are used to check the number of duplicate rows in the data, only calculating the extra rows. The algorithm is total number of rows - number of deduplicated rows.

  • Duplicate Rate = Duplicate Rows/Total Number of Rows.

  • If you need to calculate the total number of duplicate data rows/duplicate rate, you can use the abnormal rows/abnormal rate indicators of the uniqueness validation template.

Timeliness, Validity

Abnormal Rows/Normal Rows/Abnormal Rate/Normal Rate

  • Normal Rows: Number of unique value rows.

  • Normal Rate: Number of unique value rows/Total number of rows.

  • Abnormal Rows: Total number of rows - number of normal rows, that is, count greater than 2 sum.

  • Abnormal Rate: 1 - normal rate or abnormal rows/total number of rows.

    Note

    Definition of unique value: records that appear only once. That is, the group with count=1 after grouping.

Consistency

Statistical Difference, Statistical Difference Rate (%)

Statistical Difference: Validation Field - Comparison Field.

Statistical Difference Rate: Validation Field/Comparison Field.

Stability

Statistical Value (1-Day Volatility, 7-Day Volatility, 30-Day Volatility)

Compare the number of rows collected 1 day, 7 days, and 30 days ago, compare the volatility rate, and then compare it with the set threshold. If any one does not meet the rule, an alarm will be triggered.

Custom SQL

Abnormal Rows/Normal Rows/Abnormal Rate/Normal Rate

  • Normal Rows: Number of unique value rows.

  • Normal Rate: Number of unique value rows/Total number of rows.

  • Abnormal Rows: Total number of rows - number of normal rows, that is, count greater than 2 sum.

  • Abnormal Rate: 1 - normal rate or abnormal rows/total number of rows.

    Note

    Definition of unique value: records that appear only once. That is, the group with count=1 after grouping.

Statistical Value (1-Day Volatility, 7-Day Volatility, 30-Day Volatility)

Compare the number of rows collected 1 day, 7 days, and 30 days ago, compare the volatility rate, and then compare it with the set threshold. If any one does not meet the rule, an alarm will be triggered.

Indicator parameter configuration

Indicator rule configuration

Template type

Description

Uniqueness

Field Group Count Validation/Duplicate Value Count Validation: Data filtering must be configured.

Data Filtering: Disabled by default. Enable to configure the filter condition, partition filter, or normal data filter of the validation table. The filter condition is appended directly to the validation SQL. If the validation table requires a partition filter, configure the partition expression in the scheduling configuration. Post-configuration, the quality report will reflect the validation partition as the minimum granularity. Example data filtering content:

id = 12 --Single table

T1.id=12 and T2.name = "Zhang San" --Double table

Stability

Column Stability Validation/Column Volatility Validation:

  • Data Filtering: Consistent with the description of Uniqueness in the indicator rule configuration.

  • Statistical Method: Select the statistical method of the indicator.

    • Field Count: Compare the field COUNT with the expected number, which is a static field validation.

    • Field Unique Value Count: Compare the deduplicated field COUNT with the expected number, which is a static field validation.

    • Field Sum Value: The sum value of the field, taking the SUM value of the field and comparing it with the static field.

    • Field Average Value: Take the average value of the field and compare it with the set static field.

    • Field Maximum Value: The maximum value of the field, taking the maximum value of the field and comparing it with the threshold.

    • Field Minimum Value: The minimum value of the field, taking the minimum value of the field and comparing it with the threshold.

    • Field Duplicate Row Count: Compare the number of duplicate values with the static field (total number of rows minus the number of deduplicated values, which is the number of duplicate values).

    • Field Duplicate Rate %: That is, the number of duplicate values/total number of rows, the ratio of the number of duplicate values to the total number of rows, and compare it with a static field.

    • Field Group Count: Take the number of groups of the field and compare it with the set static field.

    • Field Null Value Row Count: Take the number of null values of the field and compare it with the set static field.

    • Field Null Value Rate: That is, the number of null values/total number of rows, the ratio of the number of null values to the total number of rows, and compare it with a static field.

Indicator validation configuration

Template type

Configuration item

Description

Uniqueness

Field Group Count

Take the number of groups of the field and compare it with the set static field.

Statistical Value (Duplicate Rows/Duplicate Rate)

  • Duplicate rows are used to check the number of duplicate rows in the data, only calculating the extra rows. The algorithm is total number of rows - number of deduplicated rows.

  • Duplicate Rate = Duplicate Rows/Total Number of Rows.

  • If you need to calculate the total number of duplicate data rows/duplicate rate, you can use the abnormal rows/abnormal rate indicators of the uniqueness validation template.

Stability

Statistical Value

Refers to the deduplicated value, the data after count distinct.

Statistical Value (1-Day Volatility, 7-Day Volatility, 30-Day Volatility)

Compare the number of rows collected 1 day, 7 days, and 30 days ago, compare the volatility rate, and then compare it with the set threshold. If any one does not meet the rule, an alarm will be triggered.

Mean Volatility Detection (7-Day Volatility, 30-Day Volatility)

The benchmark value is the average number of rows in the table in the last 7 days and 30 days, comparing the volatility rate of the average value in the last 7 days and 30 days.

Statistical Value (Volatility Compared to the 1st Day of the Month, Volatility Compared to the Previous Month, Volatility Compared to the Previous Year)

Compare the number of rows collected on the 1st day of the month, the previous month, and the previous year, compare the volatility rate, and then compare it with the set threshold. If any one does not meet the rule, an alarm will be triggered.

Real-time metadata parameter configuration

Offline link comparison parameter configuration

When real-time data and offline data use the same statistical logic, the real-time and offline data comparison quality rules can detect the differences between the data. If the difference is large, there may be data quality issues.

Parameter

Description

Validation Field

Select the field for validation.

Indicator Operator

Choose the data operation algorithm.

Object Form

Select either Single Value Data or Multi-value Data.

Time Limitation Condition

Specify the time-limited field.

Enable Condition Limitation

Choose to Enable or Disable condition limitation.

Offline Data

Choose the offline data table from the list.

Offline Data Retrieval

By default, this is Shut Down. When enabled, configure the retrieval of offline data tables through SQL statements.

Time Zone Setting

Choose the time zone from the dropdown menu.

Multi-link comparison parameter configuration

In scenarios requiring strong assurance, real-time dual-link or real-time triple-link quality rules can be used to monitor data. If there is an anomaly, the operation and maintenance personnel can switch or back up the data in time. Real-time multi-link comparison quality rules support monitoring data latency, statistical deviations, and other issues.

Parameter

Description

Validation Field

Select the field for validation.

Indicator Operator

Choose the data operation algorithm.

Object Form

Select either Single Value Data or Multi-value Data.

Time Limitation Condition

Specify the time-limited field.

Enable Condition Limitation

Choose to Enable or Disable condition limitation.

Number Of Comparison Links

Select the number of comparison links for the quality rule. The system supports selecting Real-time Triple-link Comparison and Real-time Dual-link Comparison.

Comparison Link 1/Comparison Link 2

Select the real-time metadata table as the comparison link data:

  • If the template selects Real-time Dual-link Comparison, only one real-time metadata table needs to be selected.

  • If the template selects Real-time Triple-link Comparison, two real-time metadata tables need to be selected.

Time Zone Setting

Choose the time zone from the dropdown menu.