Data Quality provides dozens of built-in table-level and field-level monitoring rule templates. This topic describes how to configure monitoring rules based on a monitoring rule template.

Background information

Built-in monitoring rule templates are classified into table-level monitoring rule templates and field-level monitoring rule templates. You can use a built-in monitoring rule template to quickly configure monitoring rules for multiple tables or fields at a time in Data Quality.

Limits

Data Quality allows you to configure monitoring rules for data in E-MapReduce (EMR), Hologres, AnalyticDB for PostgreSQL, and MaxCompute data sources based on monitoring rule templates.

Go to the Rule Configuration-Configure by Template page

  1. Log on to the DataWorks console.
  2. In the left-side navigation pane, click Workspaces.
  3. In the top navigation bar, select the region in which the workspace that you want to manage resides. Find the workspace and click Data Development in the Actions column.
  4. On the DataStudio page, click the icon in the upper-left corner and choose All Products > Data governance > Data Quality.
  5. In the left-side navigation pane of the Data Quality page, choose Rule configuration > Rule Configuration-Configure by Template to go to the Rule Configuration-Configure by Template page.
    Data Quality provides built-in table-level and field-level monitoring rule templates. You can find the template that you want to use on the Rule Configuration-Configure by Template page and click Configure monitoring rules in the Actions column to configure monitoring rules for multiple tables or fields at a time based on the template.

Configure monitoring rules

  1. On the Rule Configuration-Configure by Template page, find the template that you want to use and click Configure monitoring rules in the Actions column to go to the Batch new monitoring rules wizard.
  2. Configure attributes for the monitoring rules.
    1. Configure basic attributes for the monitoring rules.
      Configure basic attributes
      Parameter Description
      Engine/Data Source The type of the compute engine or data source of tables or fields for which you want to configure monitoring rules.
      Note Data Quality allows you to configure monitoring rules for data in E-MapReduce (EMR), Hologres, AnalyticDB for PostgreSQL, and MaxCompute data sources based on monitoring rule templates.

      Rule Source The value of this parameter is fixed as Built-in Template.
      Template The name of the built-in monitoring rule template. For more information, see Built-in monitoring rule templates.
      Note You can configure field-level monitoring rules of the following types only for numeric fields: average value, sum of values, minimum value, and maximum value.
      Rule Name The naming format of the monitoring rules. The names of the monitoring rules are automatically generated. You can change the suffix in the naming format based on your business requirements.
      Description The description of the monitoring rules.
    2. Configure advanced attributes for the monitoring rules.
      Monitoring rule settings
      Parameter Description
      Rule Type The strength of the monitoring rules. Valid values: Strong and Soft.
      • If you set this parameter to Strong, critical alerts are reported and descendant nodes are blocked, whereas warning alerts are reported but descendant nodes are not blocked.
      • If you set this parameter to Soft, critical alerts are reported but descendant nodes are not blocked, whereas warning alerts are not reported and descendant nodes are not blocked.
      Comparison Method
      • If you configure monitoring rules of the numeric type, the valid values of the Comparison Method parameter are Greater Than, Greater Than or Equal To, Equal To, Unequal To, Less Than, and Less Than or Equal To.
      • If you configure monitoring rules of the fluctuation type, the valid values of the Comparison Method parameter are Absolute Value, Raise, and Drop.
      Expected Value This parameter is required only if you configure monitoring rules of the numeric type. When the monitoring rules are triggered, the system compares data profiling results with the expected value that you specify. If the data profiling results deviate from the expected value, an alert or blocking is triggered.
      Thresholds If you configure monitoring rules of the fluctuation type, you must configure Warning Threshold and Error Threshold. The system calculates the fluctuation rate of the data profiling results relative to the data sampling results of a specific time range and compares the rate with the thresholds. You can compare the increase (Raise), the decrease (Drop), or the absolute value of the fluctuation (Absolute Value).
      For example, assume that you set Rule Type to Strong, Warning Threshold to 5%, and Error Threshold to 10%.
      • If the fluctuation rate is greater than 5% but less than or equal to 10%, a warning alert is reported, and descendant nodes are not blocked.
      • If the fluctuation rate is greater than 10%, a critical alert is reported, and descendant nodes are blocked.
      Start-Stop Status Specifies whether the monitoring rules take effect in the production environment. Turn the switch on to enable the monitoring rules or off to disable them.
      Notice If you disable the monitoring rules, you cannot test the monitoring rules, and the monitoring rules cannot be triggered by auto triggered nodes that are associated with them.
  3. Click Next to go to the Generate rules step.
    In the Generate rules step, add tables or fields to which you want to apply the monitoring rules based on the table-level or field-level monitoring rule template that you use. If you add partitioned tables, you must configure partition filter expressions for the tables. The partition filter expressions are used to determine the sampling scope of the data that you want to monitor. By default, if you add non-partitioned tables, NOTAPARTITIONTABLE is displayed in the Partition expression columns that correspond to the tables.
    1. Add tables or fields.
      • In the Select table section, click Add table. In the Add table dialog box, configure Engine/Database Instance. All tables that belong to the selected compute engine instance or database are displayed. You can also configure Table Name to search for the desired table. Then, you can select the desired tables and click Create to add the tables to the Select table section.
      • In the Select field section, click Add field. In the Add field dialog box, configure Engine/Database Instance. The Table to be added section displays all tables that belong to the selected compute engine instance or database. Then, select the tables that contain the fields to which you want to apply the monitoring rules. The Select field section displays all fields in the selected tables. You can filter the fields by field name or field description. Select the fields to which you want to apply the monitoring rules and click Create. The selected fields are displayed in the Select field section.
    2. Configure partition filter expressions.
      In the Select table section, find the table for which you want to configure a partition filter expression and click the icon next to the table. In the Set partition expression in batch dialog box, select a partition filter expression from the Partition expression drop-down list and click OK. Based on the partition filter expression, Data Quality matches the partition in which the data generated by the auto triggered node is stored each day. If you want to configure partition filter expressions for multiple tables at a time, you can select the tables and click Set partition expression.
  4. Click Generate rules to go to the Rule validation step.
    You can click Custom columns in the Rule validation section to select the columns that you want to display in the monitoring rule list. In the Rule validation section, you can perform the following operations:
    • Test the validity of the monitoring rules.
      After the monitoring rules are configured, you can select one or more monitoring rules that you want to test and click Trial run below the monitoring rule list. In the Trial run dialog box, select a data timestamp from the Data Timestamp drop-down list. The data timestamp is used to simulate the time when the monitoring rules are triggered. Then, click Calculate actual partition. The system calculates values for the partitions in the tables to which the monitoring rules are applied based on the data timestamp you select and the partition filter expressions you configure. Then, click Trial run. The system checks data in the partitions in the tables based on the monitoring rules.
      After you test a monitoring rule, you can click Test run record in the Actions column that corresponds to the monitoring rule to view details about the test and perform the required operations.
      Note If an error occurs when you test a monitoring rule, possible causes are that the table or the table partition does not exist, or that the table data does not meet the requirements of the monitoring rule.
    • Associate the monitoring rules with auto triggered nodes to trigger the monitoring rules.

      You can click Recommended Association scheduling or Manual Association scheduling to associate the monitoring rules with auto triggered nodes that generate the table data. The auto triggered nodes generate the table data after the auto triggered node instances, data backfill instances, or test instances generated for the auto triggered nodes are successfully run. When the auto triggered nodes start to run, the monitoring rules are triggered. You can configure the Rule Type parameter to control whether to block the descendant nodes of the auto triggered nodes. This helps reduce the impact of dirty data records.

      • Recommended Association scheduling: The system associates the selected monitoring rules with auto triggered nodes based on the lineage of the auto triggered nodes that generate the table data.
      • Manual Association scheduling: You can manually associate the selected monitoring rules with specific auto triggered nodes.
      Notice A monitoring rule can be triggered only if it is associated with auto triggered nodes.
    • Delete monitoring rules: You can delete one or more monitoring rules.
    • View the details of a monitoring rule: You can find the monitoring rule whose details you want to view and click Rule details in the Actions column. You can also modify, enable, disable, or delete the monitoring rule, specify the strength of the monitoring rule, or view the logs of the monitoring rule.
  5. After the test on the monitoring rules is successful and the monitoring rules are associated with auto triggered nodes, click Save. Check whether the configuration is complete. If the configuration is complete, click OK.
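
The way a partition filter expression and a data timestamp together determine the partition to check (the Calculate actual partition operation of a trial run) can be illustrated with a small Python sketch. The expression syntax `dt=$[yyyymmdd-1]`, the function name `resolve_partition`, and the `reference_date` parameter are assumptions made for illustration only. The expressions offered in the Partition expression drop-down list, and the way Data Quality derives dates from the data timestamp, may differ.

```python
import re
from datetime import date, timedelta
from typing import Optional

# Value displayed in the Partition expression column for non-partitioned tables.
NOT_PARTITIONED = "NOTAPARTITIONTABLE"

# Assumed placeholder syntax: $[yyyymmdd], $[yyyymmdd-1], $[yyyymmdd+2], ...
_PLACEHOLDER = re.compile(r"\$\[yyyymmdd(?:([+-])(\d+))?\]")


def resolve_partition(expression: str, reference_date: date) -> Optional[str]:
    """Return the concrete partition for a reference date.

    Returns None for non-partitioned tables, which means that the
    whole table is in scope instead of a single partition.
    """
    if expression == NOT_PARTITIONED:
        return None

    def substitute(match: re.Match) -> str:
        sign, days = match.group(1), match.group(2)
        offset = int(days) if days else 0
        if sign == "-":
            offset = -offset
        shifted = reference_date + timedelta(days=offset)
        return shifted.strftime("%Y%m%d")

    return _PLACEHOLDER.sub(substitute, expression)


if __name__ == "__main__":
    print(resolve_partition("dt=$[yyyymmdd-1]", date(2024, 3, 1)))  # dt=20240229
    print(resolve_partition(NOT_PARTITIONED, date(2024, 3, 1)))     # None
```

A sketch like this makes it easier to see why a trial run can fail when the resolved partition does not exist for the selected data timestamp.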

What to do next

  • After the monitoring rules are configured based on a template, you can view the details about the monitoring rules and subscribe to the monitoring rules when you configure monitoring rules by table. Alert messages that are generated after the monitoring rules are triggered can be sent to the related alert contacts by using DingTalk chatbots, text messages, or emails. For more information about how to configure monitoring rules by table, see Configure monitoring rules by table.
  • If you want to prevent data that does not meet the requirements of a monitoring rule from blocking the running of the associated auto triggered node on the specified data timestamp, you can configure a noise reduction rule for the monitoring rule to denoise the data. For more information, see Manage noise reduction rules.
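
The alerting and blocking behavior that results from the Rule Type, Warning Threshold, and Error Threshold settings described in this topic can be summarized in a short Python sketch. All names in the sketch (`RuleType`, `Severity`, `Outcome`, `classify_fluctuation`, `evaluate`) are hypothetical and are not part of any DataWorks API. The sketch also assumes that no alert is reported when the fluctuation rate does not exceed the warning threshold, which this topic does not state explicitly.

```python
from dataclasses import dataclass
from enum import Enum


class RuleType(Enum):
    STRONG = "Strong"
    SOFT = "Soft"


class Severity(Enum):
    NORMAL = "normal"
    WARNING = "warning"
    CRITICAL = "critical"


@dataclass(frozen=True)
class Outcome:
    severity: Severity
    alert_reported: bool
    descendants_blocked: bool


def classify_fluctuation(rate: float, warning_threshold: float,
                         error_threshold: float) -> Severity:
    """Map a fluctuation rate to a severity by using the two thresholds."""
    if rate > error_threshold:
        return Severity.CRITICAL
    if rate > warning_threshold:
        return Severity.WARNING
    # Assumption: rates at or below the warning threshold raise no alert.
    return Severity.NORMAL


def evaluate(rule_type: RuleType, severity: Severity) -> Outcome:
    """Apply the Strong/Soft semantics to a severity."""
    strong = rule_type is RuleType.STRONG
    if severity is Severity.CRITICAL:
        # Critical alerts are always reported; only strong rules block.
        return Outcome(severity, True, strong)
    if severity is Severity.WARNING:
        # Warning alerts are reported only for strong rules; nothing is blocked.
        return Outcome(severity, strong, False)
    return Outcome(severity, False, False)


if __name__ == "__main__":
    # Rule Type = Strong, Warning Threshold = 5%, Error Threshold = 10%
    severity = classify_fluctuation(0.07, 0.05, 0.10)
    outcome = evaluate(RuleType.STRONG, severity)
    print(outcome.alert_reported, outcome.descendants_blocked)  # True False
```

With a fluctuation rate of 12% instead of 7%, the same strong rule would report a critical alert and block descendant nodes, whereas a soft rule would report the alert without blocking.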