
Dataphin: Create metric quality rules

Last Updated: Jan 21, 2025

Dataphin lets you create quality rules that validate metrics, which makes it easier to monitor metric quality. This topic describes how to configure metric quality rules.

Prerequisites

Quality rules can be configured only after you have added monitored objects. For more information, see Add monitored objects.

Permission description

  • Super administrators, quality administrators, custom global roles with Quality Rule - Manage permissions, custom project roles with Project Quality Management - Quality Rule Management permissions for the project where the table resides, and metric business owners can configure scheduling, alerts, and more for quality rules.

  • Quality owners and regular users who need additional read permissions on logical table fields can apply for them. For more information, see Apply for, renew, and return table permissions.

  • The permissions supported for different objects vary. For details, see Quality Rule Operation Permissions.

Validation rule description

When metrics are validated against quality rules, the system's response depends on rule strength. If a weak rule is triggered, the system sends an alert so that you can promptly identify and address the anomaly. If a strong rule is triggered, the system sends an alert and also automatically interrupts the tasks associated with the table to prevent dirty data from flowing downstream.
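The behavior above can be sketched as a small Python model. It is a hypothetical illustration, not Dataphin code. `Strength`, `Outcome`, and `handle_validation` are invented names. The `has_downstream_tasks` flag reflects the Rule Strength description later in this topic: a strong rule with no downstream tasks to block only sends an alert.

```python
from dataclasses import dataclass
from enum import Enum


class Strength(Enum):
    WEAK = "weak"
    STRONG = "strong"


@dataclass(frozen=True)
class Outcome:
    alert_sent: bool
    downstream_blocked: bool


def handle_validation(strength: Strength, passed: bool,
                      has_downstream_tasks: bool) -> Outcome:
    """Hypothetical model of how rule strength shapes the response."""
    if passed:
        # A passing validation triggers neither an alert nor a block.
        return Outcome(alert_sent=False, downstream_blocked=False)
    if strength is Strength.WEAK:
        # Weak rules alert but let downstream tasks continue.
        return Outcome(alert_sent=True, downstream_blocked=False)
    # Strong rules alert and block downstream tasks, if there are any to block.
    return Outcome(alert_sent=True, downstream_blocked=has_downstream_tasks)


print(handle_validation(Strength.STRONG, passed=False, has_downstream_tasks=True))
# Outcome(alert_sent=True, downstream_blocked=True)
```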

Differences between trial runs and executions

Trial runs and executions differ in how they run and where their results are displayed. A trial run runs a quality rule in test mode to check that the rule is correct and executes properly. Its results are not displayed in quality reports. An execution checks the quality rule against a specific time frame, and its results are output to quality reports so that you can view and analyze them.
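A minimal Python sketch of this distinction follows. The names `QualityReport` and `run_rule` are hypothetical, and the check is passed in as a plain callable. The sketch shows only that both modes run the same check, while only an execution writes a record to the quality report.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class QualityReport:
    # Each record: (rule name, validation range, passed?)
    records: List[Tuple[str, str, bool]] = field(default_factory=list)


def run_rule(name: str, check: Callable[[], bool], report: QualityReport,
             validation_range: str, trial: bool) -> bool:
    """Run one quality check in trial mode or as a formal execution."""
    passed = check()
    if not trial:
        # Executions cover a specific time frame and are output to the report.
        report.records.append((name, validation_range, passed))
    # Trial runs only return the result (for example, for a trial run log).
    return passed


report = QualityReport()
run_rule("sales_not_null", lambda: True, report, "ds=20250120", trial=True)
run_rule("sales_not_null", lambda: False, report, "ds=20250120", trial=False)
print(report.records)  # [('sales_not_null', 'ds=20250120', False)]
```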

Quality rule configuration

  1. On the Dataphin home page, in the top menu bar, select Administration > Data Quality.

  2. Click Quality Rule in the left-side navigation pane. On the Metrics page, click the name of the target object to enter the Quality Rule Details page and configure the quality rule.

  3. On the Quality Rule Details page, click the Create Quality Rule button.

  4. In the Create Quality Rule dialog box, configure the parameters.

    Parameter

    Description

    Basic Information

    Rule Name

    The name of the custom quality rule.

    Rule Strength

    Supports Weak Rule and Strong Rule.

    • If you select Weak Rule, an alert is triggered when the validation result is abnormal, but downstream task nodes are not blocked.

    • If you select Strong Rule, an alert is triggered when the validation result is abnormal. If the schedule has downstream tasks (for example, code inspection scheduling or task-triggered scheduling), those tasks are also blocked to keep polluted data from spreading. If there are no downstream tasks (for example, periodic quality scheduling), only an alert is triggered.

    Description

    The description of the custom quality rule, up to 128 characters.

    Configuration Method

    • Template Creation: Quickly create quality rules from general system templates or custom business templates.

      • System Template: The template's built-in parameters can be configured. Suitable for creating general rules.

      • Custom Template: The template's preset parameters do not need to be configured. Typically used to create rules that carry business logic.

    • SQL: Define custom quality monitoring rules in SQL. Suitable for flexible and complex scenarios.

    Rule Template

    Select a rule template from the drop-down list: Uniqueness, Stability, or SQL.

    • Uniqueness: Includes Field Group Count Validation and Duplicate Value Count Validation.

    • Stability: Includes Column Stability Validation and Column Volatility Validation.

    • SQL: Includes Custom Statistic Validation.

    For more information, see the referenced document.

    Rule Type

    The rule type is determined by the template and is the template's most basic property. It is used for description and filtering.

    Rule Configuration

    Rule Configuration

    When Rule Template is set to Uniqueness, configure the following parameters.

    • Field Group Count Validation/Duplicate Value Count Validation:

      • Validation Table Data Filtering: Disabled by default. When enabled, you can configure filter conditions for the validation table, such as partition filters or general data filters. The filter conditions are appended directly to the validation SQL. If you need to filter the validation table by partition, we recommend configuring a partition expression in the scheduling configuration instead. The quality report then uses the validation partition as its minimum viewing granularity. Example filter conditions:

        id = 12 -- single table

        T1.id = 12 and T2.name = "Zhang San" -- multiple tables

    When Rule Template is set to Stability, configure the following parameters.

    • Column Stability Validation/Column Volatility Validation:

      • Statistical Method: Choose the statistical method that fits your business scenario.

      • Validation Table Data Filtering: Disabled by default. When enabled, you can configure filter conditions for the validation table, such as partition filters or general data filters. The filter conditions are appended directly to the validation SQL. If you need to filter the validation table by partition, we recommend configuring a partition expression in the scheduling configuration instead. The quality report then uses the validation partition as its minimum viewing granularity. Example filter conditions:

        id = 12 -- single table

        T1.id = 12 and T2.name = "Zhang San" -- multiple tables

    When Rule Template is set to SQL, configure the following parameters.

    • Custom Statistic Validation:

      • SQL: Enter a SELECT query statement. The query must reference the primary table. For example:

        select sum(sale) from tableA where ds=${bizdate};

    Validation Configuration

    Rule Validation

    • After validation, the result is compared with the abnormal validation conditions. If the conditions are met, the validation fails, which triggers alerts and other subsequent processes.

    • The metrics available for abnormal validation depend on the template and its configuration. Multiple conditions can be combined with and/or. In practice, we recommend using fewer than three.

    For more information, see the referenced document.

    Business Property Configuration

    Property Information

    How you fill in business properties depends on how the quality rule properties are configured. For example:

    • The field value type of the department in charge is an enumeration (multiple choice) with the values Big Data Department, Business Department, and Technical Department. When you create a quality rule, this property is therefore a drop-down multiple-choice box with those three options.

    • The field value type of the rule owner is custom input, with a property field length of 256. When you create a quality rule, you can therefore enter up to 256 characters for this property.

    If a property field is filled in as a Range Interval, configure it as follows:

    Range Interval: Commonly used when the values are continuous numbers or dates. Four operators are available: >, >=, <, and <=. For more property configurations, see the referenced document.

    Scheduling Property Configuration

    Scheduling Method

    Select a configured schedule. If you have not decided on a scheduling method yet, you can configure it after the quality rule is created. To create a new schedule, see the referenced document.

  5. Click Save to complete the rule configuration.

    You can click Preview SQL to compare the SQL of the current configuration with that of the last saved configuration and review the changes.

    Note
    • SQL preview is unavailable until all key information is filled in.

    • The left side shows the SQL of the last saved configuration (empty if nothing has been saved yet). The right side shows the SQL of the current configuration.
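The following Python sketch makes the Rule Configuration and Validation Configuration parameters concrete. It uses the standard-library `sqlite3` module as a stand-in for the compute engine. It assumes a hypothetical duplicate-value-count query shape; Dataphin's generated SQL may differ. The sketch appends a data filter to the WHERE clause, as the Validation Table Data Filtering description says. It then checks the result against abnormal validation conditions joined with and/or. `duplicate_count_sql` and `is_abnormal` are illustrative names, not Dataphin APIs.

```python
import operator
import sqlite3
from typing import List, Optional, Tuple

OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
       "<=": operator.le, "=": operator.eq, "!=": operator.ne}


def duplicate_count_sql(table: str, column: str,
                        data_filter: Optional[str] = None) -> str:
    """Hypothetical duplicate-value-count query with an optional filter."""
    # COUNT(col) - COUNT(DISTINCT col) = number of extra (duplicate) values.
    sql = f"SELECT COUNT({column}) - COUNT(DISTINCT {column}) FROM {table}"
    if data_filter:
        # The filter condition is appended to the validation SQL as-is.
        sql += f" WHERE {data_filter}"
    return sql


def is_abnormal(value: float, conditions: List[Tuple[str, float]],
                connector: str = "or") -> bool:
    """True if the statistic meets the abnormal conditions (rule fails)."""
    hits = [OPS[op](value, threshold) for op, threshold in conditions]
    return any(hits) if connector == "or" else all(hits)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (id INTEGER, sale REAL, ds TEXT)")
conn.executemany("INSERT INTO tableA VALUES (?, ?, ?)", [
    (1, 10.0, "20250120"), (1, 5.0, "20250120"),
    (2, 7.0, "20250120"), (2, 3.0, "20250119"),
])

dup = conn.execute(duplicate_count_sql("tableA", "id", "ds = '20250120'")).fetchone()[0]
print(dup, is_abnormal(dup, [(">", 0)]))  # 1 True -> the rule fails

# Custom Statistic Validation: a SELECT on the primary table, then the check.
total = conn.execute("SELECT SUM(sale) FROM tableA WHERE ds = '20250120'").fetchone()[0]
print(total, is_abnormal(total, [("<", 10.0), (">", 1000.0)]))  # 22.0 False
```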

    Rule configuration list

    The rule configuration list shows the configured metric rules. From this list, you can view, edit, trial run, run, or delete rules.


    Area

    Description

    Filter and search area

    Supports quick search by object or rule name.

    Supports filtering by rule type, rule template, rule strength, trial run status, or active status.

    Note

    If searchable and filterable business properties are configured and enabled in the quality rule properties, you can also search or filter by those properties.

    List area

    Displays each rule's object type/name, rule name/ID, trial run status, active status, rule type, rule template, rule strength, schedule type, and related knowledge base documents. Click the icon to the left of the refresh icon to choose which fields the rule list displays.

    • Active Status: We recommend trial running a rule before activating it and activating only rules that pass the trial run. This prevents an incorrect rule from blocking online tasks.

      • After a rule is activated, it runs automatically according to its configured schedule.

      • After a rule is deactivated, it no longer runs automatically but can still be run manually.

    • Related Knowledge Base Document: Click View Details to view the knowledge base information associated with the rule, including the table name, validation object, rule, and related knowledge base documents. You can also search, view, edit, or delete knowledge base entries. For more information, see the referenced document.

    Operation area

    You can view, clone, edit, trial run, or run a rule. You can also configure its schedules, associate knowledge base documents with it, or delete it.

    • View: View the details of the rule configuration.

    • Clone: Quickly clone a rule.

    • Edit: After you edit a rule, you must trial run it again.

    • Trial Run: Trial run the rule by using an Existing Schedule or a Custom Validation Range. After the trial run, click the View Trial Run Log icon to view the log.

    • Run: Run the rule by using an Existing Schedule or a Custom Validation Range. After the run, you can view the validation results in Quality Record.

    • Scan Configuration: In the dialog box, filter schedules by schedule type or search for schedules by name. You can also edit schedules.

    • Associate Knowledge Base Document: After you associate a rule with knowledge base documents, you can view them in the quality rule and in the administration workbench. Only knowledge base documents that are not yet associated can be selected. To create one, see Create and manage knowledge base.

    • Delete: Deleting this quality rule object deletes all quality rules under the object. This action cannot be undone. Proceed with caution.

    Batch operation area

    You can perform the following batch operations: trial run, run, schedule configuration, enable, shutdown, modify business properties, associate knowledge base document, and delete.

    • Trial Run: Batch trial run rules by using an Existing Schedule or a Custom Validation Range. After the trial run, click the View Trial Run Log icon to view the log.

    • Run: Batch run rules by using an Existing Schedule or a Custom Validation Range. After the run, you can view the validation results in Quality Record.

    • Scan Configuration: In the dialog box, filter schedules by schedule type or search for schedules by name. You can also edit schedules to configure schedules for multiple quality rules at once. Only selected rules that you can edit on the quality rule list page are modified.

    • Enable: After you batch enable the active status, the selected rules run automatically according to their configured schedules. Only selected rules that you can edit on the quality rule list page are enabled.

    • Shutdown: After you batch deactivate the active status, the selected rules no longer run automatically but can still be run manually. Only selected rules that you can edit on the quality rule list page are deactivated.

    • Modify Business Properties: Business properties whose field value type is single choice or multiple choice can be modified in batches.

      • For multiple-choice properties, you can append values or modify existing values.

      • For single-choice properties, you can modify the value directly.

    • Associate Knowledge Base Document: After you associate rules with knowledge base documents, you can view them in the quality rule and in the administration workbench. You can configure knowledge base documents for monitored objects in batches. To create one, see Create and manage knowledge base.

    • Delete: Delete quality rule objects in batches. This action cannot be undone. Proceed with caution. Only selected rules that you can edit on the quality rule list page are deleted.
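The Active Status and batch Enable/Shutdown behavior described above can be sketched as follows. This is a hypothetical Python model, not Dataphin code. `Rule`, `batch_set_active`, and `auto_run_ids` are invented names. `editable` stands for whether the current user can edit the rule on the quality rule list page.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Rule:
    rule_id: int
    active: bool
    editable: bool  # can the current user edit this rule in the list?


def batch_set_active(rules: List[Rule], selected: Iterable[int],
                     active: bool) -> List[int]:
    """Batch Enable (active=True) or Shutdown (active=False)."""
    selected_ids = set(selected)
    changed = []
    for rule in rules:
        # Selected rules that the user cannot edit are skipped.
        if rule.rule_id in selected_ids and rule.editable:
            rule.active = active
            changed.append(rule.rule_id)
    return changed


def auto_run_ids(rules: List[Rule]) -> List[int]:
    """Rules that run automatically on their schedule (active ones only).

    Inactive rules are not scheduled but can still be run manually.
    """
    return [rule.rule_id for rule in rules if rule.active]


rules = [Rule(1, False, True), Rule(2, False, False), Rule(3, True, True)]
print(batch_set_active(rules, [1, 2], active=True))  # [1]
print(auto_run_ids(rules))  # [1, 3]
```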

New scheduling

Note
  • When configuring scheduling for rules, you can quickly create configurations based on existing schedules. A maximum of 20 rules is supported per table.

  • A maximum of 10 schedules can be configured for a single rule.

  • Schedules with fully identical configurations are automatically deduplicated.

  • The validation scope is passed as a filter condition in the quality validation statement and controls what each quality validation covers. It is also the basic unit for quality reports and other downstream processes. Quality reports use the validation scope as their smallest viewing granularity.
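The limit of 10 schedules per rule and the deduplication of identical schedules can be modeled as below. This Python sketch is an illustration under assumptions. `Schedule`, its `timing` field (a free-form description of when the schedule fires), and `attach_schedules` are hypothetical. "Fully identical" is approximated as equality of every field.

```python
from dataclasses import dataclass
from typing import List

MAX_SCHEDULES_PER_RULE = 10


@dataclass(frozen=True)
class Schedule:
    schedule_type: str     # e.g. "recurrency", "data_update", "task"
    timing: str            # hypothetical free-form timing description
    validation_scope: str  # e.g. "ds='${yyyyMMdd}'"


def attach_schedules(existing: List[Schedule],
                     new: List[Schedule]) -> List[Schedule]:
    """Add schedules to a rule, deduplicating identical configurations."""
    result = list(existing)
    for schedule in new:
        if schedule in result:
            continue  # fully identical configuration: deduplicated
        if len(result) >= MAX_SCHEDULES_PER_RULE:
            raise ValueError("a rule supports at most 10 schedules")
        result.append(schedule)
    return result


daily = Schedule("recurrency", "daily 02:00", "ds='${yyyyMMdd}'")
print(len(attach_schedules([daily], [daily])))  # 1
```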

  1. On the Quality Rule Details page, click the Scan Configuration tab, and then click the New Scheduling button to open the New Scheduling dialog box.

  2. In the New Scheduling dialog box, configure the parameters.

    Parameter

    Description

    Schedule Name

    Custom schedule name.

    Schedule Type

    Supports Recurrency Triggered, Data Update Triggered, and Task Triggered.

    • Recurrency Triggered: Runs periodic quality checks on the data at the configured schedule time. Suitable for scenarios where data is output at a relatively fixed time.

      • Recurrence: The scheduling cycle can be Day, Week, Month, Hour, or Minute. Running quality rules consumes computing resources, so avoid running many quality rules at the same time, which could affect production tasks.

    • Data Update Triggered: When code tasks run, the system parses whether each run updated the specified validation scope of the current table. Suitable for tables that are modified by varying tasks, or for tables that need close monitoring where every change must be validated.

      Note

      We recommend using the partition updated by the task as the validation scope (for non-partitioned tables, the entire table is validated). The system then automatically detects all data changes and validates them, so that no change is missed.

    • Task Triggered: Runs the configured quality rules before the specified tasks run or after they run successfully. Triggering tasks can be engine SQL, offline pipeline, Python, Shell, Virtual, Datax, Spark_jar, Hive_MR, or database SQL nodes. Suitable for scenarios where the tasks that modify the table are fixed.

      Note

      Only production environment tasks can be selected as triggering tasks. If the rule strength is set to Strong Rule, a validation failure in a scheduled task may affect online tasks. Configure this with care based on your business needs.

      • Trigger Timing: When the quality check runs. Valid values: Trigger After All Tasks Run Successfully, Trigger After Each Task Runs Successfully, and Trigger Before Each Task Runs.

      • Triggering Task: Select production task nodes on which you have maintenance permissions. You can search by node output name.

        Note

        If Trigger Timing is set to Trigger After All Tasks Run Successfully, we recommend selecting tasks with the same scheduling cycle. Tasks with different scheduling cycles can delay rule execution and, in turn, the quality check results.

    Schedule Condition

    Disabled by default. When enabled, the system first checks whether the scheduling conditions are met before the quality rule is scheduled. If they are met, the rule is scheduled. If not, this scheduling run is skipped.

    • Data Timestamp/Executed On: Date conditions are supported for the Recurrency Triggered, Data Update Triggered, and Task Triggered schedule types (Recurrency Triggered schedules do not support Executed On). You can choose Regular Calendar or Custom Calendar. For how to customize a calendar, see Create a public calendar.

      • If you choose Regular Calendar, the conditions can be Month, Week, or Date. For example, see the figure below:


      • If you choose Custom Calendar, the conditions can be Date Type or Tag. For example, see the figure below:


    • Instance Type: Supported for the Data Update Triggered and Task Triggered schedule types. You can choose Recurring Instance, Data Backfill Instance, or One-time Instance. For example, see the figure below:


    Note
    • At least one rule must be configured. To add a rule, click the + Add Rule button.

    • A maximum of 10 scheduling conditions can be configured.

    • Scheduling conditions can be combined using either and or or.

    Validation Scope

    For the Recurrency Triggered and Task Triggered schedule types, the validation scope supports Custom Validation Scope. For the Data Update Triggered schedule type, it supports Updated Partition and Custom Validation Scope.

    • Updated Partition: If the inspected task updates a partition, the validation is issued directly for the updated partition.

      Note
      • In dynamic partition scenarios, the partition may not be parsed, in which case no quality validation is performed.

      • Volatility validation rules (such as those that check partition size, partition row count, or field statistics) require a specified partition and do not support Updated Partition as the validation scope.

      • If data in a non-partitioned table is updated, the entire table is validated.

    • Custom Validation Scope: For scenarios that cannot be parsed, you can use a custom validation scope and specify the validation scope expression based on the data timestamp or execution date.

      • Validation Scope Expression: An editable drop-down list. You can enter the scope to validate directly, such as ds='${yyyyMMdd}'. You can also select a built-in validation scope expression and modify it to speed up configuration. For details on partition expressions, see Built-in partition expression types.

        Note
        • If there are multiple conditions for validation, you can use and or or to connect them, such as province="Zhejiang" and ds<=${yyyyMMdd}.

        • If a filter condition is also configured in the quality rule, it is combined with the validation scope expression using AND, so data must satisfy both conditions to be validated.

        • The validation scope expression supports full table scans.

          Note: Full table scans consume significant resources and are not supported in some cases. We recommend configuring partition expressions to avoid full table scans.

      • Validation Scope Budget: The default is the current day's data timestamp.

  3. Click OK to complete the scheduling configuration.
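The following Python sketch ties together the Schedule Condition and Validation Scope parameters. Conditions are combined with and/or before the rule is scheduled. The `${yyyyMMdd}` placeholder in a custom validation scope expression is then resolved and ANDed with the rule's own filter condition. The sketch assumes that `${yyyyMMdd}` is replaced by the data timestamp in `yyyyMMdd` form. Dataphin's built-in partition expressions are richer than this simple substitution, and the function names are hypothetical.

```python
from datetime import date
from typing import List, Optional

MAX_CONDITIONS = 10


def conditions_met(results: List[bool], relation: str = "and") -> bool:
    """Evaluate 1 to 10 scheduling conditions joined by 'and' or 'or'."""
    if not 1 <= len(results) <= MAX_CONDITIONS:
        raise ValueError("configure between 1 and 10 scheduling conditions")
    return all(results) if relation == "and" else any(results)


def resolve_scope(expression: str, data_timestamp: date) -> str:
    # Assumption: ${yyyyMMdd} stands for the data timestamp as yyyyMMdd.
    return expression.replace("${yyyyMMdd}", data_timestamp.strftime("%Y%m%d"))


def where_clause(scope_expression: str, data_timestamp: date,
                 rule_filter: Optional[str] = None) -> str:
    scope = resolve_scope(scope_expression, data_timestamp)
    # A filter configured in the quality rule is combined with the scope by AND.
    return f"({scope}) AND ({rule_filter})" if rule_filter else scope


bizdate = date(2025, 1, 20)  # a Monday
# Example conditions: run only on weekdays and not on the first of the month.
if conditions_met([bizdate.weekday() < 5, bizdate.day != 1], "and"):
    print(where_clause("ds='${yyyyMMdd}'", bizdate, "province='Zhejiang'"))
    # (ds='20250120') AND (province='Zhejiang')
else:
    print("schedule skipped")
```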

Scheduling configuration list

After the scheduling is created, you can view, edit, clone, or delete it in the scheduling configuration list.


Area

Description

Filter and Search Area

Supports quick search by schedule name.

Supports filtering by Recurrency Triggered, Data Update Triggered, Task Triggered.

List Area

Displays the Schedule Name, Schedule Type, Last Updated By, and Last Updated Time of each schedule in the list.

Operation Area

You can edit, clone, or delete the schedule.

  • Edit: You can modify the configured schedule information.

    Important

    All rule configurations that reference this schedule change accordingly. Proceed with caution.

  • Clone: Quickly copy the schedule configuration.

  • Delete: Schedules referenced by rule configurations cannot be deleted.

Alert configuration

You can configure different alert methods for different rules to distinguish alerts. For example, you can configure phone alerts for strong rule anomalies and text message alerts for soft rule anomalies. If a rule matches multiple alert configurations at the same time, you can set an effective policy that determines which alerts take effect.

Note

You can create up to 20 alert configurations for a single monitored object.

  1. On the Quality Rule Details page, click the Alert Configuration tab, then click the New Alert Configuration button to open the New Alert Configuration dialog box.


  2. In the New Alert Configuration dialog box, configure the parameters.

    Parameter

    Description

    Coverage

    Supports selecting All Rules, All Strong Rules, All Soft Rules, and Custom.

    Note
    • For a single monitored object, you can configure only one alert each for the All Rules, All Strong Rules, and All Soft Rules ranges. New rules are automatically matched to the corresponding alert based on their rule strength. To change one of these alerts, modify the existing configuration.

    • A Custom range can include any configured rules under the current monitored object, up to 200 rules.

    Alert Configuration Name

    The alert configuration name must be unique within a single monitored object and can contain up to 256 characters.

    Alert Recipient

    Configure the alert recipients and alert methods. You must select at least one alert recipient and one alert method.

    • Alert Recipient: You can select three types of alert recipients: custom, shift schedule, and quality owner.

      You can configure up to 5 custom alert recipients and up to 3 shift schedules.

    • Alert Method: You can select receiving methods such as phone, email, text message, DingTalk, Lark, WeCom, and custom channel. The available receiving methods are controlled by the channel settings. For more information, see Configure channel settings.

  3. Click OK to complete the alert configuration.
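The coverage rules and limits from the New Alert Configuration dialog box can be checked with a small Python model. It is a hypothetical sketch, not a Dataphin API. `AlertConfig`, `validate_configs`, and `covers` are invented names, and recipient and method selection are reduced to plain lists.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class AlertConfig:
    name: str
    coverage: str  # "all", "strong", "soft", or "custom"
    custom_rule_ids: Set[int] = field(default_factory=set)
    custom_recipients: List[str] = field(default_factory=list)
    shift_schedules: List[str] = field(default_factory=list)


def validate_configs(configs: List[AlertConfig]) -> None:
    """Raise ValueError if the configs break the documented limits."""
    if len(configs) > 20:
        raise ValueError("at most 20 alert configurations per monitored object")
    names = [c.name for c in configs]
    if len(names) != len(set(names)):
        raise ValueError("alert configuration names must be unique")
    for scope in ("all", "strong", "soft"):
        if sum(c.coverage == scope for c in configs) > 1:
            raise ValueError(f"only one '{scope}' alert configuration allowed")
    for c in configs:
        if len(c.name) > 256:
            raise ValueError("name exceeds 256 characters")
        if len(c.custom_rule_ids) > 200:
            raise ValueError("custom range exceeds 200 rules")
        if len(c.custom_recipients) > 5 or len(c.shift_schedules) > 3:
            raise ValueError("too many custom recipients or shift schedules")


def covers(config: AlertConfig, rule_id: int, strength: str) -> bool:
    """Does this alert configuration apply to a rule of the given strength?"""
    if config.coverage == "all":
        return True
    if config.coverage in ("strong", "soft"):
        return config.coverage == strength
    return rule_id in config.custom_rule_ids


configs = [AlertConfig("strong-phone", "strong"),
           AlertConfig("soft-sms", "soft"),
           AlertConfig("key-rules", "custom", {7})]
validate_configs(configs)
print([c.name for c in configs if covers(c, 7, "strong")])  # ['strong-phone', 'key-rules']
```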

Alert configuration list

After you complete the alert configuration, you can sort, edit, or delete alert configurations in the alert configuration list.


Area

Description

① Sort area

Supports configuring the alert effective policy when a quality rule meets multiple alert configurations:

  • The First Alert Configuration Hit Is Effective: Only the first alert configuration that a rule matches takes effect. The other matching configurations are ignored. With this policy, you can reorder the alert configurations. Click Rule Sort. Then drag an alert configuration by the icon in front of its name, or use the icons in the Operation column to move it (from left to right: move to top, move to bottom). After adjusting the order, click Sort Complete.

  • All Alert Configurations Are Effective: All alert configurations in the list take effect for the quality rules under the current monitored object.

    When multiple alert configurations take effect, the system merges alerts by alert receiving method, alert recipient, and alert rule. As a special case, if the same person is both a custom alert recipient and the quality owner, that person's alert messages are merged according to the merge policy.

    Note

    Shift schedules do not support alert merging.

② List area

Displays the name of the alert configuration, the effective range, the specific recipients of each alert type, and the corresponding alert receiving method.

Effective Range: For custom alerts, you can view the configured object names and rule names. If a rule is deleted, the object name can no longer be viewed, so we recommend updating the alert configuration.

③ Operation area

You can edit or delete the configured alerts.

  • Edit: Modify the configured alert information. If you change the alert recipients or alert methods, notify the relevant personnel promptly so that no business alerts are missed.

  • Delete: After deletion, this alert configuration no longer takes effect for the rules that it matched. Proceed with caution.
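The two effective policies from the sort area can be compared in a short Python sketch. It is hypothetical. `Alert` and `notifications` are invented names, and each configuration's coverage is reduced to a set of rule IDs. The sketch leaves out the special merge policy for custom recipients who are also quality owners, and shift schedules (which are not merged). Merging is modeled as collecting unique (receiving method, recipient, rule) triples.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class Alert:
    name: str
    rule_ids: FrozenSet[int]    # rules this configuration applies to
    recipients: Tuple[str, ...]
    methods: Tuple[str, ...]    # e.g. "phone", "email", "text message"


def notifications(ordered: List[Alert], rule_id: int,
                  policy: str) -> Set[Tuple[str, str, int]]:
    """Return (method, recipient, rule) triples to send for one failing rule.

    policy="first": only the first matching configuration in list order counts.
    policy="all":   every matching configuration counts, merged by triple.
    """
    hits = [a for a in ordered if rule_id in a.rule_ids]
    if policy == "first":
        hits = hits[:1]
    # A set merges identical method + recipient + rule combinations.
    return {(m, r, rule_id) for a in hits for m in a.methods for r in a.recipients}


alerts = [
    Alert("strong-phone", frozenset({1, 2}), ("alice",), ("phone",)),
    Alert("all-email", frozenset({1, 2, 3}), ("alice", "bob"), ("email", "phone")),
]
print(sorted(notifications(alerts, 1, "first")))  # [('phone', 'alice', 1)]
print(len(notifications(alerts, 1, "all")))  # 4: ('phone', 'alice', 1) is merged
```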

View quality report

Click Quality Report to view the Rule Validation Overview and Rule Validation Details of the current quality rules.

  • You can quickly filter validation details based on abnormal results, partition time, or keywords in the names of rules or objects.

  • In the operation column of the rule validation details list, click the corresponding icon to view the rule validation details of a quality rule.

  • In the operation column of the rule validation details list, click the corresponding icon to view the execution log of a quality rule.

Set quality rule permission management

  1. Click Permission Management and configure View Details, which specifies the members who can view validation records, quality rule details, and quality reports.

    View Details: You can select All Members or Only Members With Current Object Quality Management Permissions.

  2. Click Confirm to complete the permission management configuration.

What to do next

Once you have configured the quality rule, you can view it on the metric rule list page. For more information, see the monitored object list.