Template type | Description |
Completeness/Uniqueness | Completeness-Null Value Validation/Empty String Validation Uniqueness-Uniqueness Validation/Field Group Count Validation/Duplicate Value Count Validation: Validation Field: Select the field from the physical table to validate. Validation Table Data Filtering: This option is disabled by default. You can enable it to configure filter conditions, partition filters, or regular data filters for the validation table. The filter conditions are appended directly to the validation SQL statement. To filter partitions, you can configure a partition filter expression in the scheduling configuration. This sets the validation partition as the minimum granularity for viewing the quality report. Enter the data filtering conditions. For example: id = 12 --single table
T1.id=12 and T2.name = "Zhang San" --two tables
|
Timeliness | Time Comparison With Expression: Validation Field, Validation Table Data Filtering: For more information, see Completeness/Uniqueness. Comparison Item: The expression for comparing time, with built-in parameters ${column}, ${bizdate}. Define an expression based on the comparison field. ${column} is a built-in parameter for the comparison item, for example: ${column}, substr(${column}) . If you select ${bizdate} for date function comparison, it is recognized as the data timestamp.
Time Difference: Supports selecting Validation Field - (minus) Comparison Field, Comparison Field - (minus) Validation Field. The value before the decimal point of the unit item is taken without rounding. For example, if the shipping date and order date need to be on the same day, Validation Item - Comparison Item can be set to less than 1 day. As shown in the figure, the setting is: Validation field - Comparison field >= 1 day and Validation field - Comparison field < 1 day. 
Single Table Field Comparison: Validation Field, Validation Table Data Filtering: For more information, see Completeness/Uniqueness. Comparison Field: Select the field in the physical table that you want to compare. Time Difference: Supports selecting Validation Field - (minus) Comparison Field, Comparison Field - (minus) Validation Field. The value before the decimal point of the unit is taken without rounding. For example, if the shipping date and order date need to be on the same day, Validation Item-Comparison Item can be set to less than 1 day. As shown in the figure, the setting is: Validation field - Comparison field >= 1 day and Validation field - Comparison field < 1 day. 
Time Interval Comparison In Two Tables: Validation Field, Validation Table Data Filtering: For more information, see Completeness/Uniqueness. Select Comparison Table: Select the table and field in the physical table that you want to compare. Association Expression: Built-in data table parameters T1 and T2, where T1 is the detection table and T2 is the comparison table, for example: T1.id = T2.id. Time Difference: Supports selecting Validation Field - (minus) Comparison Field, Comparison Field - (minus) Validation Field. The value before the decimal point of the unit item is taken without rounding. For example, if the shipping date and order date need to be on the same day, Validation Item - Comparison Item can be set to less than 1 day. As shown in the figure, the setting is: Validation field - Comparison field >= 1 day and Validation field - Comparison field < 1 day. 
|
Validity | Column Format Validation: Validation Field, Validation Table Data Filtering: For more information, see Completeness/Uniqueness. Content Recognition Form: You can select Recognition Expression, Regular Expression, String Matching (%), Starts With, Contains, Ends With. The expression is used to match data that conforms to the rules. Content Recognition Method - Recognition Expression: Enter an SQL expression function. For example, ${column1}>0, where ${column} is a built-in parameter. Content Recognition Form-Regular Expression: You can enter a regular expression and select built-in regular expressions, including Mobile Phone Number,Landline Phone,ID Card Number,Email,Bank Card Number, etc. Content Recognition Form-String Matching (%): Standard like expression, using % as a wildcard character. For example, to match data starting with 'a', enter a%. Content Recognition Form-Starts With: Enter the string to be matched, and % will be automatically appended to the end according to the pattern, for example: a%. Content Recognition Form-Contains: Enter the string to be matched, and % will be automatically appended to the end according to the pattern, for example: %a%. Content Recognition Form-Ends With: Enter the string to be matched, and % will be automatically appended to the end according to the pattern, for example: %a.
Column Length Validation: Validation Field, Validation Table Data Filtering: For more information, see Completeness/Uniqueness. Field Length: Used for field length validation. You can set this to an Enumeration or a Range. Example of enumeration value validation: To compare against 3, 6, and 9, the generated validation SQL statement is x in (3, 6, 9).
Column Value Domain Validation: Validation Field, Validation Table Data Filtering: For more information, see Completeness/Uniqueness. Value Domain Type: Select according to your actual business needs. Text: Used for text validation. The validation field must be a text type. Set the Value Range to an Enumeration or a Range. Example of enumeration value validation: To compare against Zhang San and Li Si, the generated validation SQL statement is x in ("Zhang San", "Li Si"). Number: Used for numeric validation. The validation field must be a numeric type. Set the Value Range to an Enumeration or a Range. Example of enumeration value validation: To compare against 3, 6, and 9, the generated validation SQL statement is x in (3, 6, 9). Date: Used for date validity validation. You need to ensure that the validation field is a date type (date). You need to set Date Format, supporting Year, Year-month, Year-month-day, Year-month-day-hour-minute-second settings. For example: Year: 2021~2022, Year-month: 2022-01~2022-02. Timestamp: Used for timestamp validity validation. You need to ensure that the validation field is a timestamp type (timestamp). Custom: Used for validating multiple formats, also supporting the use of functions. The content you enter will be directly sent to the data source for execution. You need to set Value Domain Range, supporting Enumeration or Range settings. Example of enumeration value validation: To compare against 3, 6, and 9, the generated validation SQL statement is x in (3, 6, 9). Example of range validation: The value range is from a minimum value y1 to a maximum value y2. If the field being compared is x, the final validation SQL statement is x >= y1 and x <= y2.
Note When using custom field formats, examples such as 10 and "10" may behave differently in different data sources. You need to ensure that the functions you use are valid in the data source where the quality check is performed.
Reference Table Validation: Validation Field, Validation Table Data Filtering: For more information, see Completeness/Uniqueness. Data Source Type: Select the data source type. Datasource: Select the data source corresponding to the data source type. The system automatically fills in the DB/Schema of the data source, which can be modified. If the data source cannot connect to the Dataphin cluster, you need to configure it manually.
Note For performance considerations, at most 1000 lookup table data records will be read for cross-data source comparison. It is recommended that the lookup table and the table being validated are in the same data source. Fields for cross-data source comparison will be automatically converted to string type. Some special field types may cause exceptions. It is recommended that the field types in the lookup table and the validation field are both string or other identical types.
Reference Tables: Select the target lookup table. Lookup Table Name: Enter the lookup table name, which will be used as prompt information when configuring quality rules. Lookup Table Reference Field: Select the corresponding reference field in the lookup table.
Note The lookup table reference is used to determine whether a field exists in the lookup table. You need to select a field for comparison. For example, to determine whether all user IDs in the user details table are registered users. Lookup Table Filter Condition: Set filter conditions for the partition table to filter data.
Standard Reference Table Validation (requires the Data Standard module to be activated): Validation Field, Validation Table Data Filtering: For more information, see Completeness/Uniqueness. Reference Tables: Select a published lookup table under Data Standard. To create a lookup table, see Create and manage standard codes (lookup tables). Lookup Table Reference: Compare the reference value selected here with the field value. You can select Code Value, Code Name, Code English Name.
|
Consistency | Columns Value Consistency Validation: Validation Field, Validation Table Data Filtering: For more information, see Completeness/Uniqueness. Comparison Field: Select the field in the physical table that you want to compare.
Columns Statistical Consistency Validation: Statistical Method: You can count fields, calculate maximum field values, character repetition rates, etc. For more information, see Stability. Validation Field, Validation Table Data Filtering: For more information, see Completeness/Uniqueness. Comparison Field: Select the field in the physical table that you want to compare.
Single Field Business Logic Consistency Comparison: Validation Field, Validation Table Data Filtering: For more information, see Completeness/Uniqueness. Recognition Expression: Enter an SQL expression function. For example, ${column1}>0, where ${column} is a built-in parameter.
Note The recognition expression supports the use of functions, but you need to ensure that the database used for execution supports these functions. You can validate the business logic between multiple fields. For example, to check if the total sales amount is correct using the formula total_sales = unit_price × sales_volume, you can use ${total_sales}=${unit_price}*${sales_volume}.
Columns In Two Tables Value Consistency Validation: Validation Field, Select Comparison Table, Association Expression, Data Filtering: For more information, see Timeliness Time Fields In Two Tables. Join Type: This affects the calculation of total rows, normal rows, and abnormal rows in the data result. You can view the specific logic through the SQL preview. Supports four join types: left join, right join, inner join, and full join. left join: Based on the validation table, calculates metrics such as total rows and normal rows. right join: Based on the comparison table, calculates metrics such as total rows and normal rows. inner join: Based on the matched data between the validation table and comparison table, calculates metrics such as total rows and normal rows. Suitable for scenarios where only matched data needs to be validated. full join: Based on all data from both the validation table and comparison table, calculates metrics such as total rows and normal rows. Suitable for scenarios requiring strong consistency assurance between two tables.
Columns In Two Tables Statistical Consistency Validation: Statistical Method: You can count fields, calculate maximum field values, character repetition rates, etc. For more information, see the description of statistical methods for the Stability metric. Validation Field, Select Comparison Table, Validation Table Data Filtering: For more information, see Timeliness Time Fields In Two Tables. Comparison Table Data Filtering: Enter the data filtering content. For example: city='beijing'.
Cross-source Two Tables Field Statistical Value Consistency Comparison: Statistical Method: You can count fields, calculate maximum field values, character repetition rates, etc. For more information, see the description of statistical methods for the Stability metric. Validation Field: Select the field in the data table that you want to validate. Data Source Type: Select the data source type for the data table. For more information about the supported data source types, see Supported data sources in Dataphin. Datasource: Select the data source to which this data table belongs. The system automatically fills in the DB/Schema of the data source, which can be modified. If it is an external data source, you need to configure it manually. Select Comparison Table: Select the table and field in the data table that you want to compare. Validation Table Data Filtering: Enter the data filtering content, for example: id = 12 --single table
T1.id=12 and T2.name = "Zhang San" --two tables
Comparison Table Data Filtering: Enter the data filtering content. For example: city='beijing'.
Columns In Two Tables Processing Logic Consistency Validation: Validation Field, Validation Table Data Filtering: For more information, see Completeness/Uniqueness. Business Logic: Enter an SQL expression function. For example, ${column1}>0, where ${column} is a built-in parameter.
Note You can validate the business logic between multiple fields. For example, to check if the total sales amount is correct using the formula total_sales = unit_price × sales_volume, you can use ${T1.total_sales}=${T2.unit_price}*${T2.sales_volume}.
|
Stability | |
Custom SQL | Custom Statistic Validation: Sql: Supports select query statements. The query object must include the main table. For example: select sum(sale) from tableA where ds=${bizdate};
Custom Detail Value Validation: SQL for total row count: Enter the SQL statement for counting the total number of rows. For example: select count(*) from tableA where ds=${bizdate}; SQL for abnormal row count: Enter the SQL statement for counting the number of abnormal rows. For example: select count(*) from tableA where ds=${bizdate} and age<0; SQL for abnormal data: Enter the SQL statement for querying abnormal data. For example: select ${t1.c1} as dataphin_quality_error_data from ${t1} where ds=${bizdate} and ${t1.c1}<0;
Note If you select a custom rule template for a Custom SQL rule, the configuration section automatically parses the template's variable fields as property values. Configure these values based on the descriptions of the template variables. |