Monitoring rules are a key component of Data Quality. You can configure rules to monitor data in E-MapReduce (EMR), Hologres, AnalyticDB for PostgreSQL, MaxCompute, and DataHub. This topic describes how to configure monitoring rules for MaxCompute.

Add a MaxCompute data source

  1. Log on to the DataWorks console.
  2. In the left-side navigation pane, click Workspaces.
  3. On the Workspaces page, find the workspace to which you want to add a MaxCompute data source and click Data Integration in the Actions column.
  4. On the Data Integration page, click Data Source in the left-side navigation pane. The Data Source page appears.
  5. Click Add data source in the upper-right corner. In the Add data source dialog box, click MaxCompute in the Big Data Storage section. In the Add MaxCompute data source dialog box, configure the required parameters to add a MaxCompute data source. For more information, see Add a MaxCompute data source.

Select a data source

  1. Click the More icon in the upper-left corner and choose All Products > Data governance > Data Quality.
  2. In the left-side navigation pane, choose Rule management > Configure by Table.
  3. On the page that appears, select MaxCompute from the Engine/Data Source drop-down list and select a MaxCompute project from the Engine/Database Instance drop-down list. All tables in the selected MaxCompute project are displayed.
    You can search for a table by name. You can use the initial letters of a table name to perform a fuzzy match.
  4. Find the table for which you want to configure a monitoring rule and click View Monitoring Rules in the Actions column.

Configure a partition filter expression

In Data Quality, you must configure a monitoring rule based on a partition filter expression:
  • To configure a monitoring rule for a non-partitioned table, you can specify NOTAPARTITIONTABLE as the partition filter expression.
  • To configure a monitoring rule for a partitioned table, you can specify a data timestamp expression, such as $[yyyymmdd], as the partition filter expression.
On the rule configuration page of a table, click the + icon to the right of Partition Filter Expression.
You can create a partition filter expression or select a recommended partition filter expression.
  • Create a partition filter expression
    In the Add Partition dialog box, enter a partition filter expression that meets your business requirements and follows the syntax below. For a non-partitioned table, select NOTAPARTITIONTABLE from the recommended partition filter expressions.
    • For a table with only one level of partitions, specify the partition filter expression in the format of Partition key=Partition value. The partition value can be a constant or a built-in parameter.
    • For a table with multiple levels of partitions, specify the partition filter expression in the format of Partition key 1=Partition value/Partition key 2=Partition value/Partition key N=Partition value. Each partition value can be a constant or a built-in parameter. The partition filter expression must include information about all levels of partitions. You must enclose a parameter in brackets ([]), such as $[yyyymmdd-N].
    The data timestamp that is configured in a partition filter expression determines the recurrence of the partition filter expression. For example, if the data timestamp indicates the date that is five days before the current date, the partition filter expression is triggered every five days. The following table describes the supported partition filter expressions.
    Partition filter expression | Description
    dt=$[yyyymmdd-N] | The date that is N days before the current date.
    dt=$[hh24miss-1/24] | The time that is 1 hour before the current time.
    dt=$[hh24miss-30/24/60] | The time that is 30 minutes before the current time.
    dt=$[add_months(yyyymmdd,-1)] | The date that is one month before the current date. The date is accurate to the day.
    $[yyyymmdd] | The date when the node instance is run.
    $[yyyymmdd-1] | The date that is one day before the date when the node instance is run.
    $[yyyymmddhh24miss] | The time at which the node instance is run, in the yyyymmddhh24miss format, where yyyy is a four-digit year, mm is a two-digit month, dd is a two-digit day, hh24 is a two-digit hour (24-hour clock), mi is two-digit minutes, and ss is two-digit seconds.
    NOTAPARTITIONTABLE | The partition filter expression of a non-partitioned table.
  • Select a recommended partition filter expression
    This section describes how to select a recommended partition filter expression. In this example, the dt partition key is used.
    1. In the Add Partition dialog box, click the Partition Filter Expression field. A drop-down list appears to show you the partition filter expressions that are recommended by Data Quality.
      • Select a recommended partition filter expression that meets your business requirements.
      • If no recommended partition filter expressions meet your business requirements, specify a custom partition filter expression.
    2. After you enter a partition filter expression, click Verify. Data Quality uses the current time, which is the scheduling time, to calculate data and verify the partition filter expression.
    3. Click OK.
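
To make the expressions in the table above more concrete, the following Python sketch expands the $[...] parameters against a reference time supplied by the caller. It is only a local approximation for illustration, not the logic that Data Quality itself runs. The function name resolve_partition, the helper names, and the decision to clamp add_months to the last day of a shorter month are all assumptions.

```python
import calendar
import re
from datetime import datetime, timedelta
from fractions import Fraction

# Format tokens from the table above, mapped to strftime directives.
_DIRECTIVES = {"yyyy": "%Y", "mm": "%m", "dd": "%d",
               "hh24": "%H", "mi": "%M", "ss": "%S"}
_TOKEN = re.compile(r"yyyy|hh24|mm|dd|mi|ss")
_PARAM = re.compile(r"\$\[(.+?)\]")
_ADD_MONTHS = re.compile(r"add_months\(\s*(\w+)\s*,\s*(-?\d+)\s*\)")
_OFFSET = re.compile(r"(\w+)(?:-(\d+(?:/\d+)*))?")


def _format(moment, fmt):
    """Render a datetime using the yyyymmdd-style tokens."""
    return moment.strftime(_TOKEN.sub(lambda m: _DIRECTIVES[m.group()], fmt))


def _offset(amount):
    """Turn an offset such as '1', '1/24', or '30/24/60' (in days) into a timedelta."""
    parts = [int(p) for p in amount.split("/")]
    days = Fraction(parts[0])
    for divisor in parts[1:]:
        days /= divisor
    return timedelta(microseconds=round(days * 86_400_000_000))


def _add_months(moment, months):
    """Shift by calendar months; clamping the day is an assumption of this sketch."""
    year, month0 = divmod(moment.year * 12 + moment.month - 1 + months, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return moment.replace(year=year, month=month0 + 1,
                          day=min(moment.day, last_day))


def resolve_partition(expression, reference_time):
    """Expand every $[...] parameter in a partition filter expression."""
    if expression == "NOTAPARTITIONTABLE":
        return expression  # non-partitioned table: nothing to expand

    def expand(match):
        body = match.group(1).strip()
        months = _ADD_MONTHS.fullmatch(body)
        if months:
            shifted = _add_months(reference_time, int(months.group(2)))
            return _format(shifted, months.group(1))
        plain = _OFFSET.fullmatch(body)
        if plain is None:
            raise ValueError(f"unsupported parameter: {body!r}")
        fmt, amount = plain.groups()
        moment = reference_time - _offset(amount) if amount else reference_time
        return _format(moment, fmt)

    return _PARAM.sub(expand, expression)


if __name__ == "__main__":
    now = datetime(2024, 3, 31, 10, 15, 0)
    print(resolve_partition("dt=$[yyyymmdd-1]", now))         # dt=20240330
    print(resolve_partition("dt=$[hh24miss-30/24/60]", now))  # dt=094500
```

Because the substitution works on each $[...] group rather than splitting the expression on slashes, offsets such as -1/24 stay intact in multi-level expressions.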

To delete a partition filter expression, move the pointer over the expression and click the delete icon. Deleting a partition filter expression also deletes all rules that are configured based on it.
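
The multi-level format described earlier in this section (Partition key 1=Partition value/Partition key 2=Partition value) requires every partition level to be present. The sketch below shows one way to check this locally before you enter an expression. The function names split_levels and missing_partition_keys are hypothetical, and the check says nothing about how Data Quality validates expressions internally.

```python
def split_levels(expression):
    """Split an expression on '/' characters that sit outside $[...] parameters."""
    levels, current, depth = [], [], 0
    for ch in expression:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "/" and depth == 0:
            levels.append("".join(current))
            current = []
        else:
            current.append(ch)
    levels.append("".join(current))
    return levels


def missing_partition_keys(expression, partition_keys):
    """Return the partition keys of the table that the expression does not cover."""
    present = {
        level.split("=", 1)[0].strip()
        for level in split_levels(expression)
        if "=" in level
    }
    return [key for key in partition_keys if key not in present]


if __name__ == "__main__":
    # A table partitioned by dt and hh (example keys, chosen for illustration).
    print(missing_partition_keys("dt=$[yyyymmdd-1]", ["dt", "hh"]))  # ['hh']
```

Tracking bracket depth keeps offsets such as -30/24/60 from being mistaken for partition-level separators.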

Associate a partition filter expression with a node

To monitor the quality of data involved in a scheduling node that generates data, you must associate a partition filter expression with the node.
  • The Manage Linked Nodes dialog box lists all committed nodes.
  • Before you associate a partition filter expression with a node in another workspace, make sure that you are assigned the Workspace Manager, Development, or O&M role in the current workspace and the workspace where the node resides.
You can associate a partition filter expression with one or more nodes. After nodes are associated with a partition filter expression, Data Quality can automatically monitor the nodes.
Note Data Quality allows you to associate a partition filter expression with a node in a flexible manner. You can select a node that is not related to your table.
  1. On the rule configuration page of a table, click Manage Linked Nodes.
  2. In the Manage Linked Nodes dialog box, enter the name of the node that you want to associate with the partition filter expression.
  3. Click Create.

Create a monitoring rule

You can create monitoring rules for your tables based on your business requirements.

Data Quality supports template rules and custom rules. To create a template rule or a custom rule, click Add Monitoring Rule or Quick Create. For more information, see Configure monitoring rules by table.

After monitoring rules are configured, you can click Batch Create to save all configured monitoring rules for the current partition filter expression.
The following list describes the parameters of each creation method.
Add Monitoring Rule
  • Rule Name: The name of the monitoring rule.
  • Rule Type: The strength of the monitoring rule. Valid values:
    • Strong: If a node reaches the error threshold, Data Quality reports an error alert and determines that the node fails. If a node reaches the warning threshold, Data Quality reports a warning alert and determines that the node is successful.
    • Soft: If a node reaches the error threshold, Data Quality reports an error alert and determines that the node is successful. If a node reaches the warning threshold, Data Quality does not report a warning alert and determines that the node is successful.
  • Auto-Generated Threshold: Specifies whether to use dynamic thresholds. You can use dynamic thresholds only in DataWorks Enterprise Edition or a more advanced edition.
  • Rule Source: The source for the monitoring rule. Valid values: Built-in Template and Rule Templates.
  • Field: The fields that you want to monitor. You can select all fields in a table or a specific field. If you select a field, you can apply the monitoring rule to the specified field in the table.
    Note In this example, select All Fields in Table and configure other parameters for the table-specific rule.
  • Template: The template that you want to apply to the monitoring rule.
    • If you set the Rule Source parameter to Built-in Template, the built-in table-specific rules are displayed.
    • If you set the Rule Source parameter to Rule Templates, you must configure parameters such as Sampling Method and Set Flag. For more information, see Create, manage, and use rule templates.
  • Comparison Method: The comparison method of the monitoring rule. Valid values: Absolute Value, Raise, and Drop.
  • Thresholds: The warning threshold and error threshold of the fluctuation. You can enter thresholds or adjust the slider to specify thresholds.
  • Description: The description of the monitoring rule.
Quick Create
  • Rule Name: The name of the monitoring rule.
  • Field: The fields that you want to monitor. You can select all fields in a table or a specific field. If you select a field, you can apply the monitoring rule to the specified field in the table.
  • Trigger: The trigger condition of the monitoring rule.
    • If you select All Fields in Table for the Field parameter, you can set this parameter to The number of columns is greater than 0 or Table row number dynamic threshold.
      Important You can use dynamic thresholds only in DataWorks Enterprise Edition or a more advanced edition.
    • If you select a field for the Field parameter, the valid values of the Trigger parameter include The field value already exists, Null Field, and Unique value dynamic threshold.
      Important You can use dynamic thresholds only in DataWorks Enterprise Edition or a more advanced edition.
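
The difference between the Strong and Soft rule types described above can be summarized as a small decision function. The following Python sketch encodes only the four outcomes stated in the Rule Type description. The names Outcome and check_outcome are hypothetical, and the case in which no threshold is reached is an assumption, because it is not described in this topic.

```python
from typing import NamedTuple, Optional


class Outcome(NamedTuple):
    alert: Optional[str]   # "error", "warning", or None when no alert is reported
    node_succeeds: bool    # whether the node is considered successful


def check_outcome(rule_type, reached):
    """Map a rule type and the threshold that was reached to an alert and node status.

    rule_type: "strong" or "soft"
    reached:   "error", "warning", or None if no threshold was reached
    """
    if rule_type not in ("strong", "soft"):
        raise ValueError(f"unknown rule type: {rule_type!r}")
    if reached not in ("error", "warning", None):
        raise ValueError(f"unknown threshold: {reached!r}")

    if reached == "error":
        # Both types report an error alert; only a strong rule fails the node.
        return Outcome("error", node_succeeds=(rule_type == "soft"))
    if reached == "warning":
        # Only a strong rule reports a warning alert; the node succeeds either way.
        return Outcome("warning" if rule_type == "strong" else None, True)
    # Assumption: with no threshold reached, no alert is sent and the node succeeds.
    return Outcome(None, True)


if __name__ == "__main__":
    print(check_outcome("strong", "error").node_succeeds)  # False
    print(check_outcome("soft", "error").node_succeeds)    # True
```

In other words, a strong rule is the only combination in which a data quality problem stops the node.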

Test monitoring rules

After monitoring rules are configured for a partition filter expression, you can test all these rules and view the test results.
Note You can run the monitoring rules to test their configurations and notification methods. We recommend that you test rules based on your business requirements.
  1. On the rule configuration page of a table, click Test.
  2. In the Test dialog box, configure the Data Timestamp parameter.
    Parameter | Description
    Partition | The partition filter expression for which monitoring rules are run. The actual partition varies based on the specified data timestamp. For a non-partitioned table, NOTAPARTITIONTABLE is used as the partition filter expression.
    Data Timestamp | The data timestamp that is used to test monitoring rules. The default value is the current time.
  3. Click Test.
  4. In the Test dialog box, click the link "The test is complete. Click to view the results." On the Node Query page, view the test results. For more information, see View monitoring results.

Manage subscriptions

By default, Data Quality sends notifications to the user who created a partition filter expression. You can also specify other users to receive notifications.

  1. On the rule configuration page of a table, click Manage Subscriptions.
  2. In the Manage Subscriptions dialog box, specify the notification method and notification recipient.
    Data Quality supports the following notification methods: Email, Email and SMS, DingTalk Chatbot, DingTalk Chatbot @ALL, Lark Group Chatbot, and Enterprise WeChat Chatbot.
    Note Add a DingTalk chatbot, Lark chatbot, or WeChat chatbot and obtain a webhook URL. Then, copy the webhook URL to the Recipient field in the Manage Subscriptions dialog box.
  3. Click Save.

View operation logs

On the rule configuration page of a table, click View Operation Log. In the View Operation Logs panel, you can view the information about each operation, including the user who performed the operation, the time at which the operation was performed, and the operation details.

The Details column displays the details of each operation that is performed on the current partition filter expression, including the rule configuration details.

View check results

On the rule configuration page of a table, click View Check Results to go to the Node Query page. On this page, you can view the check results of all monitoring rules that are configured for the current partition filter expression.

Clone rules

  1. On the rule configuration page of a table, click Clone Rules.
  2. In the Clone Rules dialog box, configure the New Expression parameter.
  3. Select Clone Subscribers or Change Table Names in Custom Rules based on your business requirements.
  4. Click Clone.