Dataphin enables monitoring of data source connectivity and table structure changes. It regularly checks for changes and supports setting up alerts for abnormalities, helping you stay informed about the status of data source connectivity and table structure in real time. This topic describes how to set up quality rules for data sources.
Prerequisites
You must add the monitored object before configuring quality rules. For more information, see Add Monitored Object.
Permission Description
Super administrators, quality administrators, custom global roles with Quality Rule-Management permissions, and data source owners are authorized to configure scheduling, alerts, and more for quality rules.
Quality owners and regular users require additional permissions to review data sources. For instructions on how to obtain these permissions, see Request Data Source Permissions.
Operation permissions vary by object type. For more information, see Quality Rule Operation Permissions.
Differences Between Trial Run and Run
A Trial Run simulates the execution of a quality rule to verify its correctness and operational status, with results not displayed in the quality report. A Run executes the quality rule at a specified time, with results published in the quality report for user review and analysis.
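The difference can be illustrated with a minimal Python sketch. The names `RunMode`, `QualityReport`, and `execute_rule` are hypothetical and are not part of any Dataphin API; the check itself is stubbed as a plain callable.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class RunMode(Enum):
    TRIAL_RUN = "trial_run"  # verifies the rule; the result is not reported
    RUN = "run"              # executes the rule; the result is reported


@dataclass
class QualityReport:
    """Hypothetical stand-in for the quality report."""
    results: List[bool] = field(default_factory=list)


def execute_rule(check: Callable[[], bool], mode: RunMode, report: QualityReport) -> bool:
    """Run the check and publish the outcome only for a formal Run."""
    passed = check()
    if mode is RunMode.RUN:
        report.results.append(passed)
    return passed
```

In this model, a trial run returns the outcome to the caller but leaves `report.results` unchanged.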
Quality Rule Configuration
Navigate to Administration > Data Quality from the top menu bar on the Dataphin home page.
Select Quality Rule in the left-side navigation pane. On the Datasource page, click the name of the target object to access the Quality Rule Details page and configure quality rules.
On the Quality Rule Details page, click the Create Quality Rule button.
In the Create Quality Rule dialog box, set the necessary parameters.
Parameter
Description
Basic Information
Rule Name
Custom name for the quality rule, not exceeding 256 characters.
Rule Strength
Supports Weak Rule and Strong Rule.
Weak Rule: An abnormal verification result triggers an alert but does not block downstream task nodes.
Strong Rule: An abnormal verification result triggers an alert. If there are downstream tasks (code check scheduling, task trigger scheduling), they are blocked to prevent data pollution. If there are no downstream tasks (such as periodic quality scheduling), only the alert is sent.
Description
Custom description of the quality rule, not exceeding 128 characters.
Rule Template
Only Stability templates are supported: Data Source Connectivity Monitoring and Table Structure Change Monitoring.
Data Source Connectivity Monitoring: Alerts when a data source configured in Dataphin can no longer be connected to, for example because of a network change or a changed username or password, which would cause task errors.
Table Structure Change Monitoring: Monitors and alerts on changes to the structure of ancestor tables, such as renaming, deleting, or adding fields, which may cause downstream errors.
Rule Type
The rule type is determined by the rule template and is the template's most basic property. It describes the rule and can be used for filtering.
Rule Configuration
Select Verification Table
When the rule template is set to Table Structure Change Monitoring, you need to select the data table to be verified.
Business Property Configuration
Property Information
How a business property is filled in depends on how the quality rule property is configured. For example, if the field value type of the department-in-charge property is an enumeration (multiple choice) whose values are Big Data Department, Business Department, and Technical Department, the property appears as a multi-select drop-down list with those three options when you create a quality rule.
If the field value type of the rule owner property is custom input with a field length of 256, you can enter up to 256 characters for this property when you create a quality rule.
If the filling method of the property field is Range Interval, configure it as follows:
Range Interval: Commonly used when the value range consists of continuous numbers or dates. Four comparison operators are available: >, >=, <, <=. For more property configurations, see Create and Manage Quality Rule Properties.
Scheduling Property Configuration
Scheduling Method
Select an existing scheduling configuration. If the scheduling method is not yet decided, you can configure it after creating the quality rule. To create a new scheduling configuration, see Create Scheduling.
Click OK to finalize the rule configuration.
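To make the Table Structure Change Monitoring template and the two rule strengths more concrete, here is a minimal Python sketch. It assumes a table structure can be represented as a set of column names, and every name in it (`SchemaDiff`, `diff_schema`, `actions_on_abnormal`) is hypothetical rather than a Dataphin API. It only illustrates the behavior described above, not how Dataphin implements it.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class SchemaDiff:
    added: List[str]
    removed: List[str]

    @property
    def changed(self) -> bool:
        return bool(self.added or self.removed)


def diff_schema(old_columns: Iterable[str], new_columns: Iterable[str]) -> SchemaDiff:
    """Compare two snapshots of a table's column names.

    A renamed column shows up as one removed column plus one added column.
    """
    old, new = set(old_columns), set(new_columns)
    return SchemaDiff(added=sorted(new - old), removed=sorted(old - new))


def actions_on_abnormal(strength: str, has_downstream_tasks: bool) -> List[str]:
    """Return the actions taken when a verification result is abnormal."""
    if strength not in ("weak", "strong"):
        raise ValueError(f"unknown rule strength: {strength!r}")
    actions = ["alert"]  # both strengths alert
    if strength == "strong" and has_downstream_tasks:
        actions.append("block_downstream")  # only strong rules block
    return actions
```

For example, comparing `["id", "name"]` with `["id", "full_name", "age"]` reports `name` as removed and `age` and `full_name` as added. A strong rule with downstream tasks would then both alert and block, while a weak rule would only alert.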
Rule Configuration List
The rule configuration list page displays the data source rule information, where you can view, edit, trial run, run, and delete rules.
Area | Description |
①Filter and Search Area | Supports quick search by object or rule name. Supports filtering by rule type, rule template, rule strength, trial run status, and effective status. Note: If a quality rule property is configured as a searchable and filterable business property and is enabled, you can also search or filter by that property. |
②List Area | Displays the object type/name, rule name/ID, trial run status, effective status, rule type, rule template, rule strength, scheduling type, and related knowledge base document information for each rule. Click the
|
③Operation Area | You can perform view, clone, edit, trial run, run, scheduling configuration, associate knowledge base document, and delete operations. |
④Batch Operation Area | You can perform batch trial run, run, configure scheduling, enable, disable, modify business properties, associate knowledge base document, and delete operations. |
Create Scheduling
When setting up scheduling for rule configuration, you can swiftly apply up to 20 scheduling rules per table based on its current schedule.
A single rule can have up to 10 scheduling configurations.
Automatic deduplication is supported when the scheduling configuration is fully consistent.
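The per-rule limit and the deduplication behavior can be sketched as follows. This is an illustration under stated assumptions: `ScheduleConfig` and `add_schedule` are hypothetical names, and "fully consistent" is modeled as equality on every field of the configuration. The fields that Dataphin actually compares are not specified here.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_SCHEDULES_PER_RULE = 10  # limit stated above


@dataclass(frozen=True)
class ScheduleConfig:
    """Hypothetical scheduling configuration; equality compares all fields."""
    schedule_type: str                  # e.g. "recurrency" or "task"
    cycle: Optional[str] = None         # e.g. "day", "hour"
    trigger_task: Optional[str] = None  # task name for task-triggered schedules


def add_schedule(existing: List[ScheduleConfig], new: ScheduleConfig) -> List[ScheduleConfig]:
    """Return the schedule list after adding `new`, skipping exact duplicates."""
    if new in existing:
        return list(existing)  # fully consistent configuration: deduplicated
    if len(existing) >= MAX_SCHEDULES_PER_RULE:
        raise ValueError(f"a rule can have at most {MAX_SCHEDULES_PER_RULE} scheduling configurations")
    return [*existing, new]
```

Adding the same daily schedule twice leaves a single entry, and adding an eleventh distinct schedule raises an error.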
On the Quality Rule Details page, select the Scan Configuration tab, then click the Create Scheduling button to open the Create Scheduling dialog box.
In the Create Scheduling dialog box, configure the required parameters.
Parameter
Description
Scheduling Name
Custom scheduling name, not exceeding 64 characters.
Scheduling Type
Supports Recurrency Triggered and Task Triggered.
Recurrency Triggered: Supports timed and periodic quality checks on data based on the set scheduling time, suitable for scenarios where data production time is relatively fixed.
Recurrence: Running quality rules consumes computing resources. Avoid scheduling multiple quality rules to run at the same time so that production tasks are not affected. Five scheduling cycles are supported: Day, Week, Month, Hour, and Minute.
Task Triggered: Executes the configured quality rule before the specified task runs or after it runs successfully. Tasks of the following types can be selected as triggers: engine SQL, offline pipeline, Python, Shell, Virtual, Datax, Spark_jar, Hive_MR, and database SQL nodes. Suitable when the tasks that modify the table are fixed.
Note: Fixed task triggers can only select production environment tasks. If the rule strength is Strong Rule, a failed verification may affect online tasks. Operate cautiously according to business needs.
Trigger Timing: Select the trigger timing for quality checks. Supports selecting Trigger After All Tasks Run Successfully, Trigger After Each Task Runs Successfully, and Trigger Before Each Task Runs.
Triggering Task: Supports selecting production task nodes with maintenance permissions for the current user. You can search by node output name.
Note: When the trigger timing is Trigger After All Tasks Run Successfully, select tasks with the same scheduling cycle. Otherwise, differing scheduling cycles can delay rule execution and the quality check results.
Schedule Condition
Disabled by default. When enabled, the scheduling conditions are evaluated before each scheduled run of the quality rule. The rule runs only if the conditions are met; otherwise, that run is skipped.
Business Date/Executed On: If the scheduling type is set to Recurrency Triggered (timed scheduling does not support execution date), Code Check Triggered Scheduling, or Task Triggered, date configuration is supported. You can select Regular Calendar or Custom Calendar. For how to customize a calendar, see Create Public Calendar.
If you select Regular Calendar, the conditions can be selected as Month, Week, Date. For example, see the figure below:
If you select Custom Calendar, the conditions can be selected as Date Type, Tag. For example, see the figure below:
Instance Type: If the scheduling type is set to Code Check Triggered Scheduling or Task Triggered, instance type configuration is supported. You can select Recurring Instance, Data Backfill Instance, One-time Instance. For example, see the figure below:
Note: At least one rule must be configured. To add a rule, click the + Add Rule button.
Up to 10 scheduling conditions can be configured.
The relationship between scheduling conditions can be set to AND or OR.
Click OK to complete the scheduling setup.
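The way schedule conditions gate a run can be sketched in Python. This is a simplified model, not Dataphin's implementation: `RunContext`, `Condition`, and `should_schedule` are hypothetical names, and each condition is modeled as a predicate over the business date and instance type.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, List

MAX_CONDITIONS = 10  # limit stated above


@dataclass
class RunContext:
    business_date: date
    instance_type: str  # e.g. "recurring", "backfill", "one_time"


# A condition is any predicate over the run context (hypothetical model).
Condition = Callable[[RunContext], bool]


def should_schedule(conditions: List[Condition], relation: str, ctx: RunContext) -> bool:
    """Decide whether a scheduled run proceeds or is ignored."""
    if not 1 <= len(conditions) <= MAX_CONDITIONS:
        raise ValueError(f"between 1 and {MAX_CONDITIONS} conditions are required")
    if relation == "and":
        return all(cond(ctx) for cond in conditions)
    if relation == "or":
        return any(cond(ctx) for cond in conditions)
    raise ValueError(f"unknown relation: {relation!r}")


def is_weekday(ctx: RunContext) -> bool:
    return ctx.business_date.weekday() < 5  # Monday=0 ... Friday=4


def is_recurring(ctx: RunContext) -> bool:
    return ctx.instance_type == "recurring"
```

For a recurring instance whose business date falls on a Saturday, combining `is_weekday` and `is_recurring` with "and" skips the run, while combining them with "or" lets it proceed.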
Scheduling Configuration List
The scheduling configuration list allows for viewing, editing, cloning, and deleting of scheduling configurations after their creation.
Area | Description |
①Filter and Search Area | Supports quick search by scheduling name. Supports filtering by Recurrency Triggered and Fixed Task Trigger Scheduling. |
②List Area | Displays the information of Schedule Name, Schedule Type, Last Updated By, and Last Updated Time in the rule configuration list. |
③Operation Area | You can perform edit, clone, and delete operations on scheduling. |
Set Alerts
Configure different alert methods for various rules to distinguish between them. For instance, set phone alerts for critical rule exceptions and text message alerts for minor ones. If a rule triggers multiple alert configurations at once, you can determine the effective alert policy.
A single monitored object can have no more than 20 alert configurations.
On the Quality Rule Details page, click the Alert Configuration tab, then click the Create Alert Configuration button to open the Create Alert Configuration dialog box.
In the Create Alert Configuration dialog box, set the necessary parameters.
Parameter
Description
Coverage
Supports selecting All Rules, All Strong Rules, All Weak Rules, and Custom.
Note: Under a single monitored object, each of the three ranges (All Rules, All Strong Rules, and All Weak Rules) supports one alert configuration. Newly added rules automatically match the corresponding alert based on rule strength. To change the alert for one of these ranges, modify the existing configuration.
For Custom, you can select up to 200 of the rules configured under the current monitored object.
Alert Configuration Name
The alert configuration name must be unique under a single monitored object and cannot exceed 256 characters.
Alert Recipients
Configure alert recipients and alert methods. At least one alert recipient and alert method must be selected.
Alert Recipients: Supports selecting custom, shift schedule, and quality owner as alert recipients.
Supports configuring up to 5 custom alert recipients and up to 3 shift schedules.
Alert Method: Supports receiving methods such as phone, email, text message, DingTalk, Lark, WeCom, and custom channel. The available receiving methods can be controlled through Configure Channel Settings.
Click OK to finalize the alert configuration.
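How a rule matches alert configurations by coverage can be sketched as below. The names `AlertConfig` and `matching_alerts` and the coverage strings are hypothetical. The sketch only collects every matching configuration and does not model the effective policy that decides which of them takes effect.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass
class AlertConfig:
    name: str
    coverage: str  # "all", "strong", "weak", or "custom"
    rule_ids: FrozenSet[str] = frozenset()  # used only when coverage is "custom"


def matching_alerts(rule_id: str, strength: str, configs: List[AlertConfig]) -> List[AlertConfig]:
    """Return every alert configuration whose coverage includes the rule."""
    matched = []
    for cfg in configs:
        if cfg.coverage == "all":
            matched.append(cfg)
        elif cfg.coverage in ("strong", "weak"):
            if cfg.coverage == strength:
                matched.append(cfg)
        elif cfg.coverage == "custom" and rule_id in cfg.rule_ids:
            matched.append(cfg)
    return matched
```

A newly added strong rule matches an existing All Strong Rules configuration without further setup, which mirrors the automatic matching by rule strength described in the Coverage parameter.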
Alert Configuration List
Upon completing the alert configuration, you can sort, edit, and delete configurations in the alert configuration list.
Ordinal Number | Description |
① Sorting Area | Supports configuring the alert effective policy when a quality rule matches multiple alert configurations: |
② List Area | Displays the name, effective range, specific recipients of each alert type, and corresponding alert receiving methods of the alert configuration. Scope Of Effect: You can click the View icon after the scope of effect rule to view the scope of the rule. Only custom alerts support viewing the object name and rule name in the configuration. If the rule is deleted, the object name cannot be viewed. It is recommended that you update the alert configuration. |
③ Operation Area | You can perform editing and deleting operations on the configured alerts. |
View Quality Report
Click Quality Report to access the Rule Verification Overview and Rule Verification Details for the current quality rule.
Quickly filter verification details by abnormal results, partition time, rule, or object name keyword.
In the operation column of the rule verification details list, click the icon to view detailed verification information for the quality rule.
In the operation column of the rule verification details list, click the icon to view the execution logs for the quality rule.
Set Quality Rule Permission Management
Click Permission Management to set up View Details, which allows specified members to view verification records, quality rule details, and quality reports.
View Details: Choose between All Members or Only Members With Current Object Quality Management Permissions.
Click OK to finalize the permission management settings.
What to do next
Once you have completed the quality rule configuration, you can view it on the data source rule list page. For more details, see View Monitored Object List.