This topic explains how to create real-time metadata table rules.
Prerequisites
Before you configure quality rules, add the monitored objects. For more information, see Add Monitored Objects.
Permission description
The following users can configure scheduling, alerts, and other settings for quality rules: super administrators, quality administrators, custom global roles with the Quality Rule - Manage permission, custom project roles with the Project Quality Management - Quality Rule Management permission for the project where the metadata table is located, and owners of the real-time metadata table.
Quality owners and regular users must also have read-through permissions for real-time metadata tables. For details on how to request real-time metadata table permissions, see Request Table Permissions.
The supported operation permissions vary depending on the object. For more information, see Operation Permissions for Quality Rules.
Differences between trial runs and executions
Trial runs and executions differ in how they run and in where their results appear. A trial run is a one-time simulated execution of a quality rule that verifies the rule's correctness and execution status; its results are not displayed in the quality report. An execution is a scheduled inspection by the quality rule; its results are written to the quality report for review and analysis.
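The distinction can be sketched in a few lines of Python. This is an illustrative model only, not Dataphin code: the names `QualityReport` and `run_rule`, and the `check` callable, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class QualityReport:
    """Hypothetical stand-in for the quality report; holds execution results only."""
    entries: List[dict] = field(default_factory=list)


def run_rule(name: str, check: Callable[[], bool], report: QualityReport,
             trial: bool) -> dict:
    """Run a rule once; record the result in the report unless it is a trial run."""
    result = {"rule": name, "passed": check(), "trial": trial}
    if not trial:
        # Only executions are written to the quality report.
        report.entries.append(result)
    return result


report = QualityReport()
trial_result = run_rule("row_count_positive", lambda: 42 > 0, report, trial=True)
exec_result = run_rule("row_count_positive", lambda: 42 > 0, report, trial=False)
```

Both calls evaluate the same check, but only the second one leaves an entry in `report.entries`.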
Description of metadata table rules
Rule type | Description |
Statistical trend monitoring | Monitors data values and data change trends. |
Real-time multi-link comparison | In scenarios with strong business guarantees, real-time dual-link or triple-link quality rules can monitor data. If an exception occurs, O&M personnel can promptly switch or back up data. Real-time multi-link comparison quality rules support monitoring issues such as data retention and statistical drift. |
Real-time and offline data comparison | When real-time and offline data use the same statistical logic, real-time and offline data comparison quality rules can detect differences between the data. Significant differences may indicate data quality issues. |
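As a rough illustration of the comparison idea behind the two comparison rule types in the table, the sketch below measures how far the same statistic drifts between data sources. It is not how Dataphin computes the comparison; the function names and the 5% threshold are assumptions made for the example.

```python
from typing import List


def relative_difference(reference: float, candidate: float) -> float:
    """Return |candidate - reference| / |reference|, or the absolute gap if reference is 0."""
    gap = abs(candidate - reference)
    return gap if reference == 0 else gap / abs(reference)


def links_diverge(values: List[float], threshold: float) -> bool:
    """Compare every link against the first one; True if any exceeds the threshold."""
    baseline = values[0]
    return any(relative_difference(baseline, v) > threshold for v in values[1:])


# Dual-link example: the same statistic computed on two real-time links.
dual_link_ok = not links_diverge([1000.0, 1004.0], threshold=0.05)

# Real-time vs. offline example: the offline value is treated as the reference.
stream_batch_gap = relative_difference(reference=2000.0, candidate=1800.0)
```

A triple-link check works the same way with three values in the list.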
Quality rule configuration
On the Dataphin home page, select Administration > Data Quality from the top menu bar.
Click Quality Rule in the left-side navigation pane. On the Real-time Metadata Table page, click the name of the target object to access the Quality Rule Details page and configure quality rules.
On the Quality Rule Details page, click Create Quality Rule.
In the Create Quality Rule dialog box, configure the parameters.
Parameter
Description
Basic information
Rule Name
The name of the custom quality rule.
Rule Strength
Supports Weak Rule and Strong Rule.
If you select Weak Rule, an alert is triggered when the quality rule validation result is abnormal. However, it does not block downstream task nodes.
If you select Strong Rule, an alert is triggered when the quality rule validation result is abnormal. Additionally, if there are downstream tasks (such as code inspection scheduling or task trigger scheduling), the downstream tasks are blocked to prevent data divergence. If there are no downstream tasks (such as periodic quality scheduling), only an alert is triggered.
Description
Custom quality rule description. No more than 128 characters.
Rule Template
You can select a consistency or stability rule template.
Consistency: Includes Stream-batch Comparison and Real-time Link Comparison.
Stability: Includes Real-time Statistical Value Detection.
For more information, see real-time meta table template type.
Rule Type
The rule type is related to the template. It is the most basic property of the template and can be used for description and filtering features.
Rule configuration
Rule Configuration
Configure the rule based on the selected Rule Template. For more information, see Offline link comparison parameter configuration and Multi-link comparison parameter configuration.
Validation configuration
Rule Validation
After a quality rule runs, its results are compared against the abnormal validation configuration. If the abnormal conditions are met, the validation result is considered failed and subsequent processes such as alerts are triggered.
The indicators available for abnormal validation are determined by the template and its configuration. Multiple conditions can be combined with and/or; in practice, configure no more than three.
For more information, see the referenced document.
Business property configuration
Property information
How a business property is filled in depends on how the quality rule property is configured. For example, if the field value type of the department in charge is an enumeration (multiple choice) whose values are Big Data Department, Business Department, and Technical Department, then this property appears as a multi-select drop-down list offering those three values when you create a quality rule.
The field value type corresponding to the rule owner is custom input, with a property field length of 256. Therefore, when creating a quality rule, this property value can be entered with up to 256 characters.
If the method of filling in the property field is range interval, the configuration method is as follows:
Range interval: This is commonly used when the value range is continuous numbers or dates. You can select from four symbols: >, >=, <, <=. For more property configurations, see create and manage quality rule properties.
Schedule property configuration
Schedule Method
Select an already configured schedule. If you have not yet decided on a scheduling method, you can create the quality rule first and configure the schedule later. To create a schedule, see Create schedule.
Click OK to finalize the quality rule configuration.
To review SQL changes, you can click Preview SQL and compare the current configuration with the previously saved one.
Note: If key information is incomplete, the SQL preview is not available.
The left side shows the SQL preview of the previously saved configuration, which will be empty if no configuration exists. The right side displays the SQL preview of the current configuration.
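The interaction between the Rule Validation and Rule Strength parameters described in the table above can be sketched as follows. This is a simplified model under stated assumptions, not the product's implementation: the metric names and thresholds are hypothetical, and all conditions are joined by a single `and` or `or` operator.

```python
from typing import Callable, Dict, List

Condition = Callable[[Dict[str, float]], bool]


def is_abnormal(metrics: Dict[str, float], conditions: List[Condition],
                combine: str = "or") -> bool:
    """Combine abnormal-validation conditions with 'and' or 'or'."""
    outcomes = [cond(metrics) for cond in conditions]
    return all(outcomes) if combine == "and" else any(outcomes)


def handle_result(abnormal: bool, strength: str, has_downstream: bool) -> Dict[str, bool]:
    """Decide on alerting and blocking from the rule strength."""
    alert = abnormal
    # Only a strong rule with downstream tasks blocks anything.
    block = abnormal and strength == "strong" and has_downstream
    return {"alert": alert, "block_downstream": block}


metrics = {"null_rate": 0.12, "row_count": 950.0}
conditions = [
    lambda m: m["null_rate"] > 0.10,   # hypothetical threshold
    lambda m: m["row_count"] < 500.0,  # hypothetical threshold
]

abnormal = is_abnormal(metrics, conditions, combine="or")
weak = handle_result(abnormal, "weak", has_downstream=True)
strong = handle_result(abnormal, "strong", has_downstream=True)
strong_no_downstream = handle_result(abnormal, "strong", has_downstream=False)
```

In this model a weak rule never blocks, and a strong rule blocks only when downstream tasks exist, which mirrors the Rule Strength description.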
Rule configuration list
On the rule configuration list page, you can view the configured metadata table rules and view, edit, trial run, run, or delete them.
Area | Description |
①Filter and Search Area | Supports quick search by object or rule name. Supports filtering by rule type, rule template, rule strength, trial run status, or effective status. Note If the quality rule property is configured with searchable and filterable business attributes and is enabled, you can search or filter based on this attribute. |
②List Area | Displays the object type/name, rule name/ID, trial run status, effective status, rule type, rule template, rule strength, schedule type, and related knowledge base documents. Click the
|
③Operation Area | You can view, clone, edit, trial run, run, or delete a rule, configure its schedule, and associate knowledge base documents with it. |
④Batch Operation Area | You can perform the following operations in batches: trial run, run, configure schedules, enable, disable, modify business properties, associate knowledge base documents, and delete. |
Create schedule
When you configure a schedule for a rule, you can quickly reuse the schedules already configured for this table. Up to 20 schedules can be configured per table.
A maximum of 10 schedules can be configured for the same rule.
Automatic deduplication is supported when the schedule configuration is identical.
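The per-rule limit and the deduplication behavior can be pictured with the sketch below. It is an assumption-laden illustration: the dictionary layout, the `schedule_key` and `attach_schedule` helpers, and the definition of "identical" (same type and same settings) are hypothetical.

```python
MAX_SCHEDULES_PER_RULE = 10  # limit stated above


def schedule_key(schedule: dict) -> tuple:
    """Identity of a schedule for deduplication: its type plus sorted settings."""
    return (schedule["type"], tuple(sorted(schedule["settings"].items())))


def attach_schedule(existing: list, new: dict) -> list:
    """Return the rule's schedules with `new` added, skipping exact duplicates."""
    if any(schedule_key(s) == schedule_key(new) for s in existing):
        return existing  # identical configuration: deduplicated
    if len(existing) >= MAX_SCHEDULES_PER_RULE:
        raise ValueError("a rule can have at most 10 schedules")
    return existing + [new]


daily = {"type": "recurrency", "settings": {"cycle": "Day", "time": "02:00"}}
hourly = {"type": "recurrency", "settings": {"cycle": "Hour", "time": "00:05"}}

schedules = attach_schedule([], daily)
schedules = attach_schedule(schedules, dict(daily))  # duplicate, ignored
schedules = attach_schedule(schedules, hourly)
```

After these three calls the rule holds two schedules, because the second call repeats the first configuration.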
On the Quality Rule Details page, click the Scan Configuration tab, then click Create Schedule to open the Create Schedule dialog box.
In the Create Schedule dialog box, set the parameters.
Parameter
Description
Schedule Name
Custom schedule name.
Schedule Type
Supports Recurrency Triggered (timed scheduling), Data Update Triggered, and Task Triggered (fixed task triggered scheduling).
Recurrency Triggered: Supports scheduled and periodic data quality checks based on the set schedule time, suitable for scenarios where data production time is relatively fixed.
Recurrence: The scheduling cycle can be Day, Week, Month, Hour, or Minute. Running quality rules consumes computing resources, so avoid running many quality rules at the same time to prevent them from affecting production tasks.
Data Update Triggered: Each time a code task finishes running, the system parses whether that run updated the specified verification range of the current table. Suitable for tables whose modification tasks are not fixed, or for tables that need close monitoring, where every change must be checked.
Note: We recommend selecting the partition updated by the task as the verification range (for non-partitioned tables, the entire table is verified). The system automatically detects all data changes and verifies them to avoid omissions.
Task Triggered: Execute the configured quality rules after or before the specified task runs successfully. Supports selecting task types such as Engine SQL, Offline Pipeline, Python, Shell, Virtual, Datax, Spark_jar, Hive_MR, and Database SQL node to trigger tasks. Suitable for scenarios where table modification tasks are fixed.
Note: Task Triggered scheduling can select only production environment tasks. If the rule strength is Strong Rule and the validation fails, online tasks may be affected. Proceed with caution according to your business needs.
Trigger Timing: Select the timing for quality detection. Supports selecting Trigger After All Tasks Run Successfully, Trigger After Each Task Runs Successfully, and Trigger Before Each Task Runs.
Triggering Task: Supports selecting production task nodes that the current user has operation and maintenance permissions for, and you can search by node output name.
Note: When the trigger timing is Trigger After All Tasks Run Successfully, we recommend selecting tasks with the same scheduling cycle. Tasks with different scheduling cycles can delay the rule and therefore the quality detection results.
Schedule Condition
Disabled by default. When enabled, the system checks whether the schedule conditions are met before the quality rule is scheduled. If the conditions are met, the rule is scheduled; otherwise, this schedule is skipped.
Data Timestamp/Executed On: Date configuration is supported when the schedule type is Recurrency Triggered, Data Update Triggered, or Task Triggered (Recurrency Triggered does not support Executed On). You can choose Normal Calendar or Custom Calendar. To customize a calendar, see Create Public Calendar.
If you choose Normal Calendar, the conditions can be Month, Week, or Date. For example, see the image below:
If you choose Custom Calendar, the conditions can be Date Type or Tag. For example, see the image below:
Instance Type: Instance type configuration is supported when the schedule type is Data Update Triggered or Task Triggered. You can choose Recurring Instance, Data Backfill Instance, or One-time Instance. For example, see the image below:
Note: At least one rule must be configured. To add a rule, click the + Add Rule button.
A maximum of 10 scheduling conditions can be configured.
The relationship between scheduling conditions can be configured as and, or.
Click OK to complete the schedule setup.
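The Schedule Condition check for a Normal Calendar can be approximated as below. This is a sketch under assumptions, not the scheduler's actual logic: the helper names are hypothetical, and a single `and` or `or` relationship is applied to all conditions.

```python
from datetime import date
from typing import Callable, List

Condition = Callable[[date], bool]


def month_in(months: set) -> Condition:
    """Condition on the month number (1 to 12)."""
    return lambda d: d.month in months


def weekday_in(weekdays: set) -> Condition:
    """Condition on the ISO weekday (Monday is 1, Sunday is 7)."""
    return lambda d: d.isoweekday() in weekdays


def day_in(days: set) -> Condition:
    """Condition on the day of the month."""
    return lambda d: d.day in days


def should_schedule(d: date, conditions: List[Condition], relation: str) -> bool:
    """Evaluate the conditions with an 'and' or 'or' relationship."""
    if not 1 <= len(conditions) <= 10:
        raise ValueError("between 1 and 10 scheduling conditions are required")
    results = [c(d) for c in conditions]
    return all(results) if relation == "and" else any(results)


# Quarter-start months, Mondays only.
conditions = [month_in({1, 4, 7, 10}), weekday_in({1})]
run_on_monday = should_schedule(date(2024, 1, 1), conditions, "and")
skip_on_tuesday = should_schedule(date(2024, 1, 2), conditions, "and")
```

With the `and` relationship the rule is scheduled on Monday, 1 January 2024 and skipped the following day; switching to `or` would schedule both days because the month condition alone is satisfied.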
Schedule configuration list
Once the schedule is created, you can manage it through the schedule configuration list, including viewing, editing, cloning, and deleting schedules.
Area | Description |
① Filter and Search Area | Supports quick search by schedule name. Supports filtering by schedule type: Recurrency Triggered, Data Update Triggered, or Task Triggered. |
②List Area | Displays the Schedule Name, Schedule Type, Last Updated By, and Last Updated Time of each schedule. |
③Operation Area | You can edit, clone, and delete the schedule. |
Alert configurations
Configure different alert methods for different rules so that alerts can be told apart, for example phone alerts for strong rule anomalies and text message alerts for weak rule anomalies. If a rule matches multiple alert configurations, you can set the policy that determines which one takes effect.
A single monitored object can have up to 20 alert configurations.
On the Quality Rule Details page, click the Alert Configurations tab, then click Create Alert Configuration to open the Create Alert Configuration dialog box.
In the Create Alert Configuration dialog box, enter the parameters.
Parameter
Description
Coverage
Supports selecting All Rules, All Strong Rules, All Soft Rules, and Custom.
Note: For a single monitored object, the All Rules, All Strong Rules, and All Soft Rules ranges each support one alert configuration. Newly added rules automatically match the corresponding alert configuration based on rule strength. To change one of these alert configurations, modify the existing configuration.
The Custom range can include any rules configured under the current monitored object, up to 200 rules.
Alert Configuration Name
The alert configuration name is unique under a single monitored object and does not exceed 256 characters.
Alert Recipient
Configure the alert recipient and alert method. At least one alert recipient and alert method must be selected.
Alert Recipient: Supports selecting custom, shift schedule, and quality owner as alert recipients.
Supports configuring no more than 5 custom alert recipients and no more than 3 shift schedules.
Alert Method: Supports receiving methods such as phone, email, text message, DingTalk, Lark, WeCom, and custom channels. The available receiving methods are controlled through the channel settings; see Configure Channel Settings.
Click OK to finalize the alert configuration.
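How a rule can match more than one alert configuration through the Coverage setting is sketched below. The data layout and the `matching_alerts` helper are hypothetical, and the sketch assumes that All Soft Rules corresponds to rules whose strength is Weak Rule.

```python
from typing import Dict, List


def matching_alerts(rule: Dict[str, str], alert_configs: List[Dict]) -> List[str]:
    """Return the names of the alert configurations whose coverage includes the rule."""
    matched = []
    for cfg in alert_configs:
        coverage = cfg["coverage"]
        if coverage == "all_rules":
            matched.append(cfg["name"])
        elif coverage == "all_strong_rules" and rule["strength"] == "strong":
            matched.append(cfg["name"])
        elif coverage == "all_soft_rules" and rule["strength"] == "weak":
            matched.append(cfg["name"])
        elif coverage == "custom" and rule["name"] in cfg.get("rules", []):
            matched.append(cfg["name"])
    return matched


alert_configs = [
    {"name": "default", "coverage": "all_rules"},
    {"name": "phone-oncall", "coverage": "all_strong_rules"},
    {"name": "sms-weak", "coverage": "all_soft_rules"},
    {"name": "latency-team", "coverage": "custom", "rules": ["link_lag_check"]},
]

new_strong_rule = {"name": "link_lag_check", "strength": "strong"}
new_weak_rule = {"name": "null_rate_check", "strength": "weak"}

strong_matches = matching_alerts(new_strong_rule, alert_configs)
weak_matches = matching_alerts(new_weak_rule, alert_configs)
```

Here the strong rule matches three configurations, which is the situation in which the alert effective policy decides which configuration applies.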
Alert configuration list
After setting up alerts, you can manage them through the alert configuration list, including sorting, editing, and deleting.
Area | Description |
① Sort Area | Supports configuring the alert effective policy when a quality rule meets multiple alert configurations:
|
② List Area | Displays the name, effective range, specific recipients of each alert type, and the corresponding alert receiving method of the alert configuration. Effective Range: Custom alerts support viewing the configured object name and rule name. If the rule is deleted, the object name cannot be viewed. It is recommended to update the alert configuration. |
③ Operation Area | You can edit and delete the configured alerts. |
View quality report
Click Quality Report to access the Rule Validation Overview and Rule Validation Details for the current quality rules.
You can quickly filter the validation details by abnormal result, partition time, rule, or object name keyword.
In the operation column of the rule validation details list, click the icon to view detailed validation for the quality rules.
In the operation column of the rule validation details list, click the icon to view the execution log for the quality rules.
Set permission management for quality rules
To manage permissions, click Permission Management and configure View Details, which specifies members who can view validation records, quality rule details, and quality reports.
View Details: Choose between All Members or Only Members With Current Object Quality Management Permissions.
Click Confirm to save the permission management configuration.
What to do next
After completing the quality rule configurations, you can view them on the real-time metadata table rule list page. For more information, see Monitored Object List.