Dataphin: Create real-time metadata table rules

Last Updated: Jan 21, 2025

This topic explains how to create real-time metadata table rules.

Prerequisites

Before you configure quality rules, add the monitored objects. For details, see Add Monitored Objects.

Permission description

  • The following roles can configure scheduling, alerts, and other settings for quality rules: super administrators; quality administrators; custom global roles with the Quality Rule - Manage permission; custom project roles with the Project Quality Management - Quality Rule Management permission for the project where the metadata table is located; and real-time metadata table owners.

  • Quality owners and regular users must also have read-through permissions for real-time metadata tables. For details on how to request real-time metadata table permissions, see Request Table Permissions.

  • The supported operation permissions vary depending on the object. For more information, see Operation Permissions for Quality Rules.

Differences between trial runs and executions

Trial runs and executions differ in how they are run and where their results appear. A trial run is a one-time simulated execution of a quality rule that verifies the rule's correctness and execution status; its results are not displayed in the quality report. An execution is a scheduled inspection of quality rules; its results are output to the quality report for review and analysis.
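The distinction can be illustrated with a minimal Python sketch. Every name in it (`QualityReport`, `run_rule`, the `mode` values) is hypothetical and not part of any Dataphin API; the sketch only assumes that both modes evaluate the same rule and differ in whether the result is recorded in the quality report.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class QualityReport:
    """Hypothetical stand-in for the quality report; not a Dataphin API."""
    records: List[dict] = field(default_factory=list)


def run_rule(name: str, check: Callable[[], bool], mode: str, report: QualityReport) -> bool:
    """Evaluate a rule; only executions write their result to the report."""
    if mode not in ("trial_run", "execution"):
        raise ValueError(f"unknown mode: {mode}")
    passed = check()
    if mode == "execution":
        report.records.append({"rule": name, "passed": passed})
    return passed


report = QualityReport()

# A trial run verifies the rule but leaves the report untouched.
trial_result = run_rule("row_count_positive", lambda: 42 > 0, "trial_run", report)
records_after_trial = len(report.records)

# An execution evaluates the same rule and records the outcome.
exec_result = run_rule("row_count_positive", lambda: 42 > 0, "execution", report)
records_after_execution = len(report.records)
```

In this sketch the report holds no record after the trial run and exactly one record after the execution.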

Description of metadata table rules

  • Statistical trend monitoring: Monitors data values and data change trends.

  • Real-time multi-link comparison: In scenarios with strong business guarantees, quality rules can monitor data across real-time dual or triple links. If an exception occurs, O&M personnel can promptly switch links or back up data. These rules support monitoring issues such as data retention and statistical drift.

  • Real-time and offline data comparison: When real-time and offline data use the same statistical logic, these quality rules detect differences between the two. Significant differences may indicate data quality issues.
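As an illustration of the real-time and offline data comparison idea, the following Python sketch compares one statistic computed on both sides. The function names and the 5% threshold are assumptions made for this example; the source does not state how Dataphin measures a "significant" difference.

```python
def relative_difference(realtime_value: float, offline_value: float) -> float:
    """Relative difference of the real-time value against the offline baseline."""
    if offline_value == 0:
        # Assumption: with a zero baseline, any non-zero real-time value is a full deviation.
        return 0.0 if realtime_value == 0 else float("inf")
    return abs(realtime_value - offline_value) / abs(offline_value)


def is_abnormal(realtime_value: float, offline_value: float, threshold: float = 0.05) -> bool:
    """Flag the pair as abnormal when the difference exceeds the assumed threshold."""
    return relative_difference(realtime_value, offline_value) > threshold


# Hypothetical order counts produced by the same statistical logic on both links.
small_gap = is_abnormal(1030, 1000)   # 3% difference, within the assumed threshold
large_gap = is_abnormal(1200, 1000)   # 20% difference, flagged as abnormal
```

A relative measure is used here so that the same threshold works for statistics of different magnitudes; an absolute difference would be an equally valid choice.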

Quality rule configuration

  1. On the Dataphin home page, select Administration > Data Quality from the top menu bar.

  2. Click Quality Rule in the left-side navigation pane. On the Real-time Metadata Table page, click the name of the target object to access the Quality Rule Details page and configure quality rules.

  3. On the Quality Rule Details page, click Create Quality Rule.

  4. In the Create Quality Rule dialog box, configure the parameters.

    Parameter

    Description

    Basic information

    Rule Name

    The name of the custom quality rule.

    Rule Strength

    Supports Weak Rule and Strong Rule.

    • If you select Weak Rule, an alert is triggered when the quality rule validation result is abnormal. However, it does not block downstream task nodes.

    • If you select Strong Rule, an alert is triggered when the quality rule validation result is abnormal. Additionally, if there are downstream tasks (such as code inspection scheduling or task trigger scheduling), the downstream tasks are blocked to prevent data divergence. If there are no downstream tasks (such as periodic quality scheduling), only an alert is triggered.

    Description

    A custom description of the quality rule, up to 128 characters.

    Rule Template

    You can select a consistency or stability rule template.

    • Consistency: Includes Stream-batch Comparison and Real-time Link Comparison.

    • Stability: Includes Real-time Statistical Value Detection.

    For more information, see real-time meta table template type.

    Rule Type

    The rule type is determined by the selected template. It is the template's most basic property and is used for description and filtering.

    Rule configuration

    Rule Configuration

    Configure the rule according to the selected Rule Template. For more information, see Offline link comparison parameter configuration and Multi-link comparison parameter configuration.

    Validation configuration

    Rule Validation

    • After a quality rule runs, its results are compared with the abnormal validation configuration. If the conditions are met, the validation is considered failed, which triggers subsequent processes such as alerts.

    • The indicators available for abnormal validation are determined by the template and its configuration. Multiple conditions can be combined with AND or OR. We recommend configuring no more than three conditions.

    For more information, see the referenced document.

    Business property configuration

    Property information

    How a business property is filled in depends on how the corresponding quality rule property is configured. For example, if the field value type of the department in charge property is enumeration (multiple choice) with the values Big Data Department, Business Department, and Technical Department, the property appears as a multi-select drop-down list offering those three values when you create a quality rule.

    If the field value type of the rule owner property is custom input with a field length of 256, you can enter up to 256 characters for the property when you create a quality rule.

    If the property is filled in as a range interval, configure it as follows:

    Range interval: Commonly used when the values are continuous numbers or dates. You can select from four symbols: >, >=, <, <=. For more property configurations, see Create and manage quality rule properties.

    Schedule property configuration

    Schedule Method

    You can select an existing schedule. If you have not yet decided on a scheduling method, you can create the quality rule first and configure the schedule later. To create a schedule, see Create a new schedule.

  5. Click OK to finalize the quality rule configuration.

    To review SQL changes, you can click Preview SQL and compare the current configuration with the previously saved one.

    Note
    • If key information is incomplete, the SQL preview is not available.

    • The left side shows the SQL preview of the previously saved configuration, which will be empty if no configuration exists. The right side displays the SQL preview of the current configuration.
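The interaction between Rule Validation and Rule Strength in step 4 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the indicator names (`diff_rate`, `delay_seconds`), the function names, and the action labels are hypothetical, and the real product evaluates these settings internally.

```python
import operator
from typing import Dict, List, Tuple

OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}

Condition = Tuple[str, str, float]  # (indicator name, comparison symbol, threshold)


def is_validation_failed(indicators: Dict[str, float], conditions: List[Condition],
                         combine: str = "or") -> bool:
    """Return True when the abnormal-validation conditions are met."""
    if not conditions:
        raise ValueError("at least one condition is required")
    if combine not in ("and", "or"):
        raise ValueError(f"unknown relation: {combine}")
    hits = [OPS[op](indicators[name], threshold) for name, op, threshold in conditions]
    return all(hits) if combine == "and" else any(hits)


def handle_result(failed: bool, strength: str, has_downstream_tasks: bool) -> List[str]:
    """Weak rules only alert; strong rules also block downstream tasks if any exist."""
    if not failed:
        return []
    actions = ["alert"]
    if strength == "strong" and has_downstream_tasks:
        actions.append("block_downstream")
    return actions


# Hypothetical indicator values and conditions.
indicators = {"diff_rate": 0.12, "delay_seconds": 30}
conditions = [("diff_rate", ">", 0.1), ("delay_seconds", ">=", 60)]

failed_with_or = is_validation_failed(indicators, conditions, "or")    # first condition hits
failed_with_and = is_validation_failed(indicators, conditions, "and")  # second does not
```

With these values, `handle_result(failed_with_or, "strong", True)` yields both an alert and a block, while the same failure under a weak rule, or under a strong rule without downstream tasks, yields only an alert.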

Rule configuration list

On the rule configuration list page, you can view the configured metadata table rules and view, edit, trial run, run, or delete them.

Area

Description

Filter and Search Area

Supports quick search by object or rule name.

Supports filtering by rule type, rule template, rule strength, trial run status, or effective status.

Note

If a quality rule property is configured as a searchable and filterable business property and is enabled, you can search or filter by that property.

List Area

Displays the object type/name, rule name/ID, trial run status, effective status, rule type, rule template, rule strength, schedule type, and related knowledge base documents of each rule. To choose which fields the list displays, click the icon to the left of the refresh icon.

  • Effective Status: It is recommended to perform a trial run before enabling the rule to avoid incorrect rules blocking online tasks.

    • After enabling the effective status, the selected rules will be automatically executed according to the configured schedule.

    • After disabling the effective status, the selected rules will not be automatically executed but can be executed manually.

  • Related Knowledge Base Document: Click View Details to view the knowledge base information associated with the rule. This includes table name, validation object, rule, and related knowledge base document information. You can also search, view, edit, or delete the knowledge base. For more information, see the referenced document.

Operation Area

You can view, clone, edit, trial run, or run a rule, configure its schedule, associate knowledge base documents with it, or delete it.

  • View: View the details of the rule configuration.

  • Clone: Quickly create a copy of the rule.

  • Edit: After editing the rule, you need to perform a trial run again.

  • Trial Run: After the trial run, click the View Trial Run Log icon to view the log.

  • Run: After running, you can view the validation results in Quality Record.

  • Scan Configuration: In the dialog box, you can filter by schedule type or quickly search for schedules by schedule name. Editing schedules is also supported.

  • Associate Knowledge Base Document: After you associate knowledge with the rule, you can view the associated knowledge in the quality rule and administration workbench. You can select knowledge bases that are not yet associated. To create a knowledge base, see Create and manage knowledge base.

  • Delete: Deleting this quality rule object deletes all quality rules under the object. This action cannot be undone.

Batch Operation Area

You can perform the following operations in batches: trial run, run, configure schedules, enable, disable, modify business properties, associate knowledge base documents, and delete.

  • Trial Run: Supports batch trial runs of rules. After the trial run, click the View Trial Run Log icon to view the log.

  • Run: Supports batch running of rules. After running, you can view the validation results in Quality Record.

  • Scan Configuration: Configures schedules for multiple quality rules at once. In the dialog box, you can filter by schedule type, search for schedules by schedule name, and edit schedules. Only rules that can be edited on the quality rule list page can be modified.

  • Enable: After batch enabling the effective status, the selected rules will be automatically executed according to the configured schedule. Only rules that can be edited on the quality rule list page can be enabled.

  • Disable: After batch disabling the effective status, the selected rules will not be automatically executed but can be executed manually. Only rules that can be edited on the quality rule list page can be disabled.

  • Modify Business Properties: When the field value type corresponding to the business property is single or multiple choice, batch modification of business properties is supported.

    • When the field value type corresponding to the business property is multiple choice, appending or modifying property values is supported.

    • When the field value type corresponding to the business property is single choice, direct modification of property values is supported.

  • Associate Knowledge Base Document: After you associate knowledge with the rules, you can view the associated knowledge in the quality rule and administration workbench. Knowledge bases can be configured for monitored objects in batches. To create a knowledge base, see Create and manage knowledge base.

  • Delete: Supports batch deletion of quality rule objects. This action cannot be undone, so proceed with caution. Only rules that can be edited on the quality rule list page can be deleted.
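The Modify Business Properties behavior for the two field value types can be sketched in Python. The function name, the `mode` values, and the list representation of property values are assumptions for illustration only; they do not correspond to a documented Dataphin interface.

```python
from typing import List


def modify_property(current: List[str], new_values: List[str],
                    value_type: str, mode: str = "modify") -> List[str]:
    """Return the property values after a batch modification.

    value_type: "single" or "multiple" (other types are not batch-modifiable).
    mode: "modify" replaces the values; "append" is only valid for multiple choice.
    """
    if value_type == "single":
        if mode != "modify" or len(new_values) != 1:
            raise ValueError("single choice supports direct modification to one value")
        return list(new_values)
    if value_type == "multiple":
        if mode == "append":
            result = list(current)
            for value in new_values:
                if value not in result:  # keep each option at most once
                    result.append(value)
            return result
        if mode == "modify":
            return list(new_values)
        raise ValueError(f"unknown mode: {mode}")
    raise ValueError("batch modification requires a single- or multiple-choice property")


appended = modify_property(["Big Data Department"], ["Business Department"], "multiple", "append")
replaced = modify_property(["Big Data Department"], ["Technical Department"], "multiple", "modify")
single = modify_property(["Big Data Department"], ["Business Department"], "single")
```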

Create schedule

Note
  • When you configure schedules for a rule, you can quickly reuse the schedules that already exist for this table. Each table can have up to 20 schedules.

  • A maximum of 10 schedules can be configured for the same rule.

  • Automatic deduplication is supported when the schedule configuration is identical.
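The limits and the deduplication behavior in the note above can be sketched in Python. The tuple representation of a schedule configuration and the `add_schedule` helper are assumptions made for this example; how Dataphin compares configurations internally is not described in the source.

```python
from typing import List, Tuple

MAX_SCHEDULES_PER_TABLE = 20  # limit stated in the note above
MAX_SCHEDULES_PER_RULE = 10   # limit stated in the note above

# Assumption: a schedule configuration is modeled as a tuple of (key, value) pairs.
ScheduleConfig = Tuple[Tuple[str, str], ...]


def add_schedule(existing: List[ScheduleConfig], new: ScheduleConfig,
                 limit: int) -> List[ScheduleConfig]:
    """Return the schedule list after adding `new`, skipping identical configurations."""
    if new in existing:
        return existing  # identical configuration: deduplicated, nothing is added
    if len(existing) >= limit:
        raise ValueError(f"limit of {limit} schedules reached")
    return existing + [new]


daily = (("type", "recurrency"), ("cycle", "day"))
hourly = (("type", "recurrency"), ("cycle", "hour"))

rule_schedules = add_schedule([], daily, MAX_SCHEDULES_PER_RULE)
rule_schedules = add_schedule(rule_schedules, daily, MAX_SCHEDULES_PER_RULE)   # deduplicated
rule_schedules = add_schedule(rule_schedules, hourly, MAX_SCHEDULES_PER_RULE)
```

The deduplication check runs before the limit check, so re-adding an identical schedule never fails even when the limit has been reached.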

  1. On the Quality Rule Details page, click the Scan Configuration tab, then click Create Schedule to open the Create Schedule dialog box.

  2. In the Create Schedule dialog box, set the parameters.

    Parameter

    Description

    Schedule Name

    Custom schedule name.

    Schedule Type

    Supports Recurrency Triggered, Data Update Triggered, and Task Triggered.

    • Recurrency Triggered: Runs data quality checks on a recurring basis at the configured schedule time. Suitable for scenarios where the data production time is relatively fixed.

      • Recurrence: Running quality rules consumes computing resources. To avoid affecting production tasks, we recommend that you do not run multiple quality rules at the same time. The scheduling cycle can be Day, Week, Month, Hour, or Minute.

    • Data Update Triggered: After each code task finishes, the system checks whether that task run updated the specified verification range of the current table. Suitable for tables whose modifying tasks are not fixed, or tables that must be closely monitored so that every change is checked.

      Note

      It is recommended to select the partition updated by the task as the verification range (non-partitioned tables will verify the entire table). The system will automatically detect all data changes and perform verification to avoid omissions.

    • Task Triggered: Runs the configured quality rules before the specified tasks run or after they run successfully. Triggering tasks can be of types such as Engine SQL, Offline Pipeline, Python, Shell, Virtual, Datax, Spark_jar, Hive_MR, and Database SQL nodes. Suitable for scenarios where the tasks that modify the table are fixed.

      Note

      Only production environment tasks can be selected for Task Triggered schedules. If the rule strength is Strong Rule and the validation fails, online tasks may be affected. Proceed with caution according to your business needs.

      • Trigger Timing: Select the timing for quality detection. Supports selecting Trigger After All Tasks Run Successfully, Trigger After Each Task Runs Successfully, and Trigger Before Each Task Runs.

      • Triggering Task: Select production task nodes for which the current user has O&M permissions. You can search by node output name.

        Note

        When the trigger timing is Trigger After All Tasks Run Successfully, we recommend selecting tasks that share the same scheduling cycle. Otherwise, the differing cycles can delay rule execution and therefore the quality detection results.

    Schedule Condition

    Disabled by default. When enabled, the system checks whether the scheduling conditions are met before the quality rule is scheduled. If the conditions are met, the rule is scheduled. If not, this schedule run is ignored.

    • Data Timestamp/Executed On: Date conditions are supported when the schedule type is Recurrency Triggered, Data Update Triggered, or Task Triggered (Recurrency Triggered does not support Executed On). You can choose Normal Calendar or Custom Calendar. To customize a calendar, see Create Public Calendar.

      • If you choose Normal Calendar, the conditions can be Month, Week, or Date. For example, see the image below:

        image

      • If you choose Custom Calendar, the conditions can be Date Type or Tag. For example, see the image below:

        image

    • Instance Type: Instance type conditions are supported when the schedule type is Data Update Triggered or Task Triggered. You can choose Recurring Instance, Data Backfill Instance, or One-time Instance. For example, see the image below:

      image

    Note
    • At least one rule must be configured. To add a rule, click the + Add Rule button.

    • A maximum of 10 scheduling conditions can be configured.

    • The relationship between scheduling conditions can be set to AND or OR.

  3. Click OK to complete the schedule setup.
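The Schedule Condition check described in step 2 can be sketched in Python for the Normal Calendar case. The helper names are hypothetical, and the sketch assumes that each condition is a simple membership test on the month, weekday, or day of month; the product's actual condition options may be richer.

```python
from datetime import date
from typing import Callable, List, Set

MAX_CONDITIONS = 10  # limit stated in the note in step 2

Condition = Callable[[date], bool]


def month_in(months: Set[int]) -> Condition:
    return lambda d: d.month in months


def weekday_in(weekdays: Set[int]) -> Condition:
    """ISO weekdays: 1 = Monday ... 7 = Sunday."""
    return lambda d: d.isoweekday() in weekdays


def day_of_month_in(days: Set[int]) -> Condition:
    return lambda d: d.day in days


def should_schedule(d: date, conditions: List[Condition], relation: str = "and") -> bool:
    """Return True if the rule should be scheduled on date `d`; otherwise the run is ignored."""
    if not 1 <= len(conditions) <= MAX_CONDITIONS:
        raise ValueError("configure between 1 and 10 scheduling conditions")
    if relation not in ("and", "or"):
        raise ValueError(f"unknown relation: {relation}")
    results = [condition(d) for condition in conditions]
    return all(results) if relation == "and" else any(results)


# Hypothetical conditions: only in January, and only on weekdays.
conditions = [month_in({1}), weekday_in({1, 2, 3, 4, 5})]

monday_and = should_schedule(date(2025, 1, 20), conditions, "and")    # Monday in January
saturday_and = should_schedule(date(2025, 1, 25), conditions, "and")  # Saturday fails the weekday test
saturday_or = should_schedule(date(2025, 1, 25), conditions, "or")    # month test alone suffices
```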

Schedule configuration list

Once the schedule is created, you can manage it through the schedule configuration list, including viewing, editing, cloning, and deleting schedules.

image.png

Area

Description

① Filter and Search Area

Supports quick search by schedule name.

Supports filtering by schedule type: Recurrency Triggered, Data Update Triggered, or Task Triggered.

List Area

Displays the Schedule Name, Schedule Type, Last Updated By, and Last Updated Time of each schedule.

Operation Area

You can edit, clone, and delete the schedule.

  • Edit: Modify the configured schedule information.

    Important

    Changes apply to all rule configurations that reference this schedule. Proceed with caution.

  • Clone: Quickly copy the schedule configuration.

  • Delete: Schedules referenced by rule configurations cannot be deleted.

Alert configurations

Configure different alert methods for different rules so that alerts can be told apart, for example phone alerts for strong rule anomalies and text message alerts for weak rule anomalies. If a rule matches multiple alert configurations, you can set the policy that determines which of them take effect.

Note

A single monitored object can have up to 20 alert configurations.

  1. On the Quality Rule Details page, click the Alert Configurations tab, then click Create Alert Configuration to open the Create Alert Configuration dialog box.

  2. In the Create Alert Configuration dialog box, enter the parameters.

    Parameter

    Description

    Coverage

    Supports selecting All Rules, All Strong Rules, All Soft Rules, and Custom.

    Note
    • For a single monitored object, only one alert configuration can be created for each of the All Rules, All Strong Rules, and All Soft Rules ranges. Newly added rules automatically match the corresponding alert configuration based on rule strength. To change the alert for one of these ranges, modify the existing configuration.

    • The Custom range lets you select up to 200 of the rules configured under the current monitored object.

    Alert Configuration Name

    The alert configuration name must be unique within a single monitored object and cannot exceed 256 characters.

    Alert Recipient

    Configure the alert recipient and alert method. You must select at least one alert recipient and one alert method.

    • Alert Recipient: Supports selecting custom, shift schedule, and quality owner as alert recipients.

      You can configure up to 5 custom alert recipients and up to 3 shift schedules.

    • Alert Method: Supports receiving methods such as phone, email, text message, DingTalk, Lark, WeCom, and custom channel. The available receiving methods are controlled in Configure Channel Settings.

  3. Click OK to finalize the alert configuration.
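The constraints listed for the Create Alert Configuration dialog box can be collected into a small validation sketch in Python. The `AlertConfig` class and its field names are hypothetical; only the numeric limits come from the text above, and treating an empty name as invalid is an added assumption.

```python
from dataclasses import dataclass, field
from typing import List

MAX_NAME_LENGTH = 256
MAX_CUSTOM_RECIPIENTS = 5
MAX_SHIFT_SCHEDULES = 3
MAX_CUSTOM_RULES = 200


@dataclass
class AlertConfig:
    """Hypothetical model of one alert configuration; not a Dataphin API object."""
    name: str
    coverage: str  # "all", "strong", "soft", or "custom"
    methods: List[str]
    custom_recipients: List[str] = field(default_factory=list)
    shift_schedules: List[str] = field(default_factory=list)
    notify_quality_owner: bool = False
    custom_rule_ids: List[str] = field(default_factory=list)


def validate(config: AlertConfig) -> List[str]:
    """Return a list of violated constraints; an empty list means the config is valid."""
    errors = []
    # Assumption: an empty name is rejected; the source only states the upper bound.
    if not 1 <= len(config.name) <= MAX_NAME_LENGTH:
        errors.append("name must be 1 to 256 characters")
    if not (config.custom_recipients or config.shift_schedules or config.notify_quality_owner):
        errors.append("select at least one alert recipient")
    if not config.methods:
        errors.append("select at least one alert method")
    if len(config.custom_recipients) > MAX_CUSTOM_RECIPIENTS:
        errors.append("no more than 5 custom alert recipients")
    if len(config.shift_schedules) > MAX_SHIFT_SCHEDULES:
        errors.append("no more than 3 shift schedules")
    if config.coverage == "custom" and len(config.custom_rule_ids) > MAX_CUSTOM_RULES:
        errors.append("no more than 200 rules in a custom range")
    return errors


valid = AlertConfig(name="strong-rule-phone", coverage="strong",
                    methods=["phone"], notify_quality_owner=True)
invalid = AlertConfig(name="too-many", coverage="all", methods=[],
                      custom_recipients=[f"user{i}" for i in range(6)])

valid_errors = validate(valid)
invalid_errors = validate(invalid)
```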

Alert configuration list

After setting up alerts, you can manage them through the alert configuration list, including sorting, editing, and deleting.

image.png

Area

Description

① Sort Area

Configures which alert configurations take effect when a quality rule matches more than one:

  • The First Hit Alert Configuration Takes Effect: Only the first alert configuration that the rule hits takes effect; the others do not. With this policy, you can sort the alert configurations. Click Rule Sort, then either drag the icon in front of an alert configuration name or use the icons in the operation column, which move the configuration to the top or to the bottom (from left to right). After adjusting the order, click Sort Completed.

    image.png

  • All Alert Configurations Take Effect: All alert configurations in the list take effect for the quality rules under the current monitored object.

    For example, when you create multiple alert configurations and select this policy, the system merges alerts by alert receiving method + alert recipient + alert rule. In particular, if the same person is an alert recipient through both the custom and the quality owner recipient types, the alert messages are merged according to the merge policy.

    Note

    Shift schedules do not support alert merging.

② List Area

Displays the name, effective range, recipients of each alert type, and corresponding alert receiving methods of each alert configuration.

Effective Range: For custom alert configurations, you can view the configured object name and rule name. If a rule has been deleted, its object name can no longer be viewed. In that case, we recommend updating the alert configuration.

③ Operation Area

You can edit and delete the configured alerts.

  • Edit: Modify the configured alert information. If you change the alert recipient or alert method, notify the relevant personnel promptly so that business alerts are not missed.

  • Delete: After deletion, this alert configuration no longer takes effect for the rules it matched. Proceed with caution.
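The two alert effective policies can be contrasted with a minimal Python sketch. The dictionary layout, the policy names `first_hit` and `all`, and the use of a set to merge alerts by (alert method, recipient, rule) are assumptions made for illustration; shift schedules, which do not support merging, are left out.

```python
from typing import Dict, List, Set, Tuple

Alert = Tuple[str, str, str]  # (alert method, recipient, rule id)


def matches(config: Dict, rule: Dict) -> bool:
    """Check whether an alert configuration covers the given rule."""
    coverage = config["coverage"]
    if coverage == "all":
        return True
    if coverage == "custom":
        return rule["id"] in config.get("rule_ids", [])
    return coverage == rule["strength"]  # "strong" or "soft"


def resolve_alerts(rule: Dict, configs: List[Dict], policy: str) -> Set[Alert]:
    """Return the merged alerts for a failed rule; `configs` is in sorted order."""
    if policy not in ("first_hit", "all"):
        raise ValueError(f"unknown policy: {policy}")
    hits = [config for config in configs if matches(config, rule)]
    if policy == "first_hit":
        hits = hits[:1]  # only the first configuration hit takes effect
    # A set merges alerts that share method, recipient, and rule.
    return {(method, recipient, rule["id"])
            for config in hits
            for method in config["methods"]
            for recipient in config["recipients"]}


# Hypothetical configurations, listed in their sorted order.
configs = [
    {"coverage": "strong", "methods": ["phone"], "recipients": ["alice"]},
    {"coverage": "all", "methods": ["phone", "email"], "recipients": ["alice"]},
]
rule = {"id": "r1", "strength": "strong"}

first_hit_alerts = resolve_alerts(rule, configs, "first_hit")
all_alerts = resolve_alerts(rule, configs, "all")
```

Under the first-hit policy only the phone alert from the first configuration is sent. Under the all-configurations policy both configurations apply, and the duplicate phone alert to the same recipient for the same rule is merged into one.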

View quality report

Click Quality Report to access the Rule Validation Overview and Rule Validation Details for the current quality rules.

  • Quickly filter validation details based on abnormal results, partition time, rule, or object name keywords.

  • In the operation column of the rule validation details list, click the corresponding icon to view the validation details of a quality rule.

  • In the operation column of the rule validation details list, click the corresponding icon to view the execution log of a quality rule.

Set permission management for quality rules

  1. To manage permissions, click Permission Management and configure View Details, which specifies members who can view validation records, quality rule details, and quality reports.

    View Details: Choose between All Members or Only Members With Current Object Quality Management Permissions.

  2. Click Confirm to save the permission management configuration.

What to do next

After completing the quality rule configurations, you can view them on the real-time metadata table rule list page. For more information, see Monitored Object List.