Configure rules to monitor data quality

Last Updated: Apr 17, 2024

This topic describes how to configure a monitoring rule in Data Quality for the dwd_log_info_di_emr table.

Prerequisites

Data is collected and processed.

Procedure

  1. Go to the Data Quality page.

    Log on to the DataWorks console. In the left-side navigation pane, choose Data Modeling and Development > Data Quality. On the page that appears, select the desired workspace from the drop-down list and click Go to Data Quality.

  2. Go to the rule configuration page for the dwd_log_info_di_emr table.

    In the left-side navigation pane of the Data Quality page, choose Rule Management > Configure Rule (by Table). You can specify the following conditions to find the desired table:

    • Data source: E-MapReduce (EMR)

    • Database: current project in the production environment

    • Table: dwd_log_info_di_emr

  3. Find the desired table in the search result and click Monitoring Setting in the Actions column to go to the rule configuration page of the table. The following section describes how to configure monitoring rules for the table.

  4. Configure a partition filter expression.

    1. On the rule configuration page of the table, click the icon next to Partition Filter Expression. The Add Partition dialog box is displayed.

    2. In the Add Partition dialog box, set the Partition Filter Expression parameter to dt=$[yyyymmdd-1] and select a data quality plug-in.

    3. Click Verify to check whether the calculation result is as expected. After you confirm the result, click OK.
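As a rough illustration of what the Verify step should return, the expression dt=$[yyyymmdd-1] can be read as "take a base date, subtract one day, and format the result as yyyymmdd". The sketch below mimics that calculation in Python. The function name `resolve_partition_filter` and its parameters are hypothetical, and the base date that DataWorks actually uses is not stated in this topic, so it is passed in explicitly here. Treat the result shown by Verify as authoritative.

```python
from datetime import date, timedelta


def resolve_partition_filter(base_date: date, offset_days: int = -1,
                             column: str = "dt") -> str:
    """Hypothetical helper: emulate dt=$[yyyymmdd-1] for a given base date.

    Assumption: the expression shifts the base date by whole days and
    renders it as an eight-digit yyyymmdd string.
    """
    target = base_date + timedelta(days=offset_days)
    return f"{column}={target.strftime('%Y%m%d')}"


if __name__ == "__main__":
    # With a base date of 2024-04-17, the previous day's partition is selected.
    print(resolve_partition_filter(date(2024, 4, 17)))  # dt=20240416
```

If the value shown by Verify differs from what this calculation gives for the date you expect, check which base date the expression is evaluated against before you save the partition filter.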

  5. Create a monitoring rule.

    1. Click Create Rule. The Create Rule panel appears.

    2. Click Add Monitoring Rule on the Template Rules tab, configure the parameters shown in the following table, and then click Batch Create.

      | Parameter | Description |
      | --- | --- |
      | Rule Name | The name of the monitoring rule. |
      | Rule Type | The strength type of the monitoring rule. Set this parameter to Strong. |
      | Auto-Generated Threshold | Specifies whether to use a dynamic threshold. Configure this parameter based on your business requirements. Note: You can use dynamic thresholds only in DataWorks Enterprise Edition or a more advanced edition. |
      | Rule Source | The source for the monitoring rule. Valid values: Built-in Template and Rule Templates. Note: You can select Rule Templates only in DataWorks Enterprise Edition or a more advanced edition. |
      | Field | Set this parameter to All Fields in Table(table). |
      | Template | Set this parameter to Number of rows, fixed value. |
      | Comparison Method | Set this parameter to Greater Than. |
      | Expected Value | Set this parameter to 0. Combined with the Greater Than comparison method, the check passes only if the number of rows in the partition is greater than 0. |
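The rule configured above amounts to a simple condition: count the rows in the filtered partition and require the count to be greater than 0. The following minimal Python sketch models that logic so the pass and fail cases are easy to see. The `FixedValueRule` class, the `row_count_sql` helper, and the generated SQL are illustrative assumptions, not the queries or APIs that Data Quality uses internally.

```python
import operator
from dataclasses import dataclass

# Assumption: only the comparison method used in this topic is modeled here.
COMPARATORS = {"Greater Than": operator.gt}


@dataclass(frozen=True)
class FixedValueRule:
    """Hypothetical model of a 'Number of rows, fixed value' rule."""
    name: str
    comparison: str
    expected: int

    def passes(self, actual_row_count: int) -> bool:
        # The rule passes when actual <comparison> expected holds.
        return COMPARATORS[self.comparison](actual_row_count, self.expected)


def row_count_sql(table: str, partition_filter: str) -> str:
    """Build an illustrative row-count query for a filter such as dt=20240416."""
    column, value = partition_filter.split("=", 1)
    return f"SELECT COUNT(*) FROM {table} WHERE {column} = '{value}'"


if __name__ == "__main__":
    rule = FixedValueRule("row_count_not_empty", "Greater Than", 0)
    print(row_count_sql("dwd_log_info_di_emr", "dt=20240416"))
    print(rule.passes(0))     # False: an empty partition fails the check
    print(rule.passes(1250))  # True: any non-empty partition passes
```

With an expected value of 0, the rule acts as an emptiness check: it does not validate how many rows were loaded, only that the partition is not empty.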

  6. Test the monitoring rule.

    1. Click Test.

    2. In the Test dialog box, configure the Data Timestamp and Resource Group parameters and click Test.

    3. After the test run is complete, follow the on-screen instructions to view the test run result.

  7. Associate the monitoring rule with a scheduling node.

    1. On the rule configuration page of the table, click Manage Linked Nodes.

    2. In the Manage Linked Nodes dialog box, enter a node name in the search box and click Create.

    3. After the node is added, you can associate the node with the monitoring rule. After the node instance is run, the monitoring rule is triggered to check data quality.

  8. Subscribe to the check result of the monitoring rule.

    1. On the rule configuration page of the table, click Manage Subscriptions.

    2. In the Manage Subscriptions dialog box, configure the Notification Method and Recipient parameters.

    3. After the configuration is complete, click Save. Then, you can go to the My Subscriptions page to view your subscriptions and modify the subscription configuration.