The Monitoring Rules page is the most important part of Data Quality, where you can configure rules to monitor data in E-MapReduce, MaxCompute, and DataHub. This topic describes how to configure monitoring rules for MaxCompute.

Add a MaxCompute connection

  1. Log on to the DataWorks console.
  2. In the left-side navigation pane, click Workspaces.
  3. On the Workspaces page that appears, find the target workspace and click Data Integration in the Actions column.
  4. On the Data Integration page that appears, click Connection in the left-side navigation pane. The Data Source page appears.
  5. Click Add a Connection in the upper-right corner to add a MaxCompute connection. For more information, see Configure a MaxCompute connection.

Select the MaxCompute connection

  1. On the current page, click the icon in the upper-left corner and choose All Products > Data Quality.
  2. On the page that appears, click Monitoring Rules in the left-side navigation pane.
  3. Set Engine/Data Source to MaxCompute and select the MaxCompute connection. All the tables in the selected MaxCompute data store appear.
    You can search for a table by table name. Fuzzy search based on the initial letters of a table name is supported.
  4. Find the target table and click View Monitoring Rules in the Actions column.

Configure a partition filter expression

In Data Quality, you need to configure rules based on a partition filter expression:
  • To configure rules for a non-partitioned table, you can specify NOTAPARTITIONTABLE as the partition filter expression.
  • To configure rules for a partitioned table, you can specify a data timestamp expression, such as $[yyyymmdd], or a regular expression as the partition filter expression.
On the Monitoring Rules page of a table, click + next to Partition Expression to add a partition filter expression.
You can create a partition filter expression or select a recommended partition filter expression.
  • Create a partition filter expression
    In the Add Partition dialog box that appears, enter a partition filter expression that conforms to the syntax as required. For a non-partitioned table, select NOTAPARTITIONTABLE from the recommended partition filter expressions.
    • For a table with a single partition level, use the format Partition key = Partition value. The partition value can be either a constant or a system parameter. The expression must be specified down to the last partition level.
    • For a table with multiple partition levels, use the format Partition key 1 = Partition value/Partition key 2 = Partition value/Partition key N = Partition value. Each partition value can be either a constant or a system parameter. A system parameter must be enclosed in brackets ([ ]), such as $[yyyymmdd-N].
    The data timestamp configured in a partition filter expression also determines the recurrence of the partition filter expression. For example, if the data timestamp is the date of five days ago, the partition filter expression is triggered every five days. The following table lists supported partition filter expressions.
    Partition filter expression    Description
    dt=$[yyyymmdd-N]               Indicates the date N days before the data timestamp.
    dt=$[yyyymm01-1]               Indicates the first day of each month.
    dt=$[yyyymm01-Nm]              Indicates the first day of the month that is N months before the current month.
    dt=$[yyyymmld-1]               Indicates the last day of each month.
    dt=$[yyyymmld-1m]              Indicates the last day of the month that is one month before the current month.
    dt=$[hh24miss-1/24]            Indicates one hour before the time specified by the data timestamp.
    dt=$[hh24miss-30/24/60]        Indicates half an hour before the time specified by the data timestamp.
    $[yyyymmdd]                    Indicates the data timestamp.
    $[yyyymmdd-1]                  Indicates one day before the data timestamp of the current instance.
    $[yyyymmddhh24miss]            Indicates the data timestamp of the current instance. The format yyyymmddhh24miss is described as follows:
                                     • yyyy indicates a four-digit year.
                                     • mm indicates a two-digit month.
                                     • dd indicates a two-digit day.
                                     • hh24 indicates a two-digit hour (24-hour clock).
                                     • mi indicates two-digit minutes.
                                     • ss indicates two-digit seconds.
    NOTAPARTITIONTABLE             Indicates the partition filter expression of a non-partitioned table.
  • Select a recommended partition filter expression
    This section uses the dt partition as an example to describe how to select a recommended partition filter expression. We recommend that you specify a regular expression as the partition filter expression for a dynamic partitioned table.
    1. In the Add Partition dialog box that appears, click the Partition Expression field. A drop-down list appears to show you the partition filter expressions recommended by Data Quality.
      • Select a recommended partition filter expression if it meets your expectation.
      • Specify a custom partition filter expression if no recommended partition filter expressions meet your expectation.
    2. After you enter a partition filter expression, click Verify. Data Quality uses the current time, that is, the data timestamp, to calculate data and verify the partition filter expression.
    3. Click OK.

To delete a partition filter expression, move the pointer over it and click the Delete icon. When you delete a partition filter expression, all rules configured based on it are also deleted.
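
To make the day-offset expressions described in this section more concrete, the following Python sketch shows one way such an expression could be resolved against a data timestamp. It is an illustration only, not the Data Quality implementation: the function name resolve_partition and the parsing rules are assumptions made for this example. It handles only parameters built from the yyyy, mm, dd, hh24, mi, and ss tokens with an optional numeric offset in days (for example -1, -1/24, or -30/24/60). Month-based forms such as yyyymm01 and yyyymmld are deliberately left unresolved.

```python
import re
from datetime import datetime, timedelta
from fractions import Fraction

# Mapping from the tokens described in this topic to strftime directives.
_TOKENS = {"yyyy": "%Y", "mm": "%m", "dd": "%d", "hh24": "%H", "mi": "%M", "ss": "%S"}

# Assumed syntax: $[<tokens>] or $[<tokens>-<offset>], where the offset is a
# number of days that may be divided further, such as 1/24 or 30/24/60.
_PARAM = re.compile(r"\$\[((?:yyyy|hh24|mm|dd|mi|ss)+)(?:-(\d+(?:/\d+)*))?\]")


def _offset(spec):
    """Convert an offset such as '1', '1/24', or '30/24/60' (in days) to a timedelta."""
    if spec is None:
        return timedelta(0)
    parts = [int(p) for p in spec.split("/")]
    days = Fraction(parts[0])
    for divisor in parts[1:]:
        days /= divisor
    return timedelta(seconds=float(days * 86400))


def resolve_partition(expression, data_timestamp):
    """Replace each supported $[...] parameter with a concrete value.

    Constants, NOTAPARTITIONTABLE, and unsupported parameters are returned unchanged.
    """
    def substitute(match):
        fmt = re.sub(r"yyyy|hh24|mm|dd|mi|ss", lambda t: _TOKENS[t.group()], match.group(1))
        return (data_timestamp - _offset(match.group(2))).strftime(fmt)

    return _PARAM.sub(substitute, expression)


if __name__ == "__main__":
    ts = datetime(2024, 3, 15, 10, 30, 0)  # example data timestamp
    for expr in ("dt=$[yyyymmdd-1]", "dt=$[hh24miss-1/24]", "region=cn/dt=$[yyyymmdd-2]"):
        print(expr, "->", resolve_partition(expr, ts))
```

With the example data timestamp 2024-03-15 10:30:00, dt=$[yyyymmdd-1] resolves to dt=20240314 and dt=$[hh24miss-1/24] resolves to dt=093000. The region key in the multi-level example is hypothetical and only shows a constant value next to a system parameter.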

Link a partition filter expression to a node

To monitor the quality of data involved in a node, you need to link a partition filter expression to the node.
  • The Manage Linked Nodes dialog box lists all committed nodes. Data Quality allows you to link a partition filter expression to a node in another workspace.
  • Before you link a partition filter expression to a node in another workspace, make sure that you are an administrator, a developer, or an administration expert in the two workspaces.
You can link a partition filter expression to one or more nodes. After nodes are linked, Data Quality can automatically monitor linked nodes.
Note Data Quality allows you to flexibly link a partition filter expression to a node. You can select a node that is not related to your table.
  1. On the Monitoring Rules page of a table, click Manage Linked Nodes.
  2. In the Manage Linked Nodes dialog box that appears, enter the name of the node that you want to link to the partition filter expression.
  3. Click Create.

Create a rule

On the Monitoring Rules page, you can create rules for your tables.

Currently, Data Quality allows you to create template rules and custom rules as needed. To create a template rule or a custom rule, you can click Add Monitoring Rule or Quick Create. For more information, see Configure monitoring rules.

After rules are configured, you can click Batch Create to save all the configured rules for the current partition filter expression.
The following parameters are available for each creation method.
Creation method: Add Monitoring Rule
  • Rule Name: The name of the rule.
  • Rule Type: The type of the rule. Valid values:
    • Strong: If a node reaches the error threshold, Data Quality reports an error alert and determines that the node fails. If a node reaches the warning threshold, Data Quality reports a warning alert and determines that the node is successful.
    • Weak: If a node reaches the error threshold, Data Quality reports an error alert and determines that the node is successful. If a node reaches the warning threshold, Data Quality does not report a warning alert and determines that the node is successful.
  • Auto-Generated Threshold: You can only use the dynamic threshold feature in DataWorks Enterprise Edition or higher.
  • Rule Source: The source of the rule. Valid values: Built-in Template and Rule Templates.
  • Field: The fields to be monitored. You can select All Fields in Table or specify fields. If you specify fields, you can apply the rule to the specified fields in the table.
    Note In this example, select All Fields in Table and set other parameters for the table-specific rule.
  • Template:
    • If you set Rule Source to Built-in Template, the built-in table-specific rules appear.
    • If you set Rule Source to Rule Templates, you need to set parameters, such as Sampling Method and Set Flag. For more information, see Manage rule templates.
  • Comparison Method: The comparison method of the rule. Valid values: Absolute Value, Raise, and Drop.
  • Thresholds: The warning threshold and error threshold of the fluctuation. You can adjust the slider to specify thresholds or directly enter thresholds.
  • Description: The description of the rule.
Creation method: Quick Create
  • Rule Name: The name of the rule.
  • Field: The fields to be monitored. You can select All Fields in Table or specify fields. If you specify fields, you can apply the rule to the specified fields in the table.
  • Trigger: The trigger condition of the rule.
    • If you select All Fields in Table for Field, you can set this parameter to The number of columns is greater than 0 or Table row number dynamic threshold.
      Notice You can only use the dynamic threshold feature in DataWorks Enterprise Edition or higher.
    • If you specify fields for Field, you can select The field value already exists, Null Field, and Unique value dynamic threshold.
      Notice You can only use the dynamic threshold feature in DataWorks Enterprise Edition or higher.
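
The difference between the Strong and Weak rule types can be summarized as a small decision function. The following Python sketch restates the behavior described for Rule Type. It is not Data Quality code: the names RuleType, Level, threshold_reached, and outcome are hypothetical, and two points are assumptions made for this example, namely that a threshold is "reached" when the fluctuation is greater than or equal to it, and that no alert is reported when neither threshold is reached.

```python
from enum import Enum


class RuleType(Enum):
    STRONG = "Strong"
    WEAK = "Weak"


class Level(Enum):
    NONE = 0
    WARNING = 1
    ERROR = 2


def threshold_reached(fluctuation, warning_threshold, error_threshold):
    """Classify a fluctuation. Assumption: 'reaches' means greater than or equal to."""
    if fluctuation >= error_threshold:
        return Level.ERROR
    if fluctuation >= warning_threshold:
        return Level.WARNING
    return Level.NONE


def outcome(rule_type, level):
    """Return (alert, node_succeeds) for a rule type and the threshold level reached."""
    if level is Level.ERROR:
        # Both rule types report an error alert; only a strong rule fails the node.
        return ("error", rule_type is not RuleType.STRONG)
    if level is Level.WARNING:
        # Only a strong rule reports a warning alert; the node succeeds either way.
        return ("warning" if rule_type is RuleType.STRONG else None, True)
    # Assumption: no alert and a successful node when no threshold is reached.
    return (None, True)


if __name__ == "__main__":
    level = threshold_reached(fluctuation=25, warning_threshold=10, error_threshold=20)
    for rule_type in RuleType:
        print(rule_type.value, outcome(rule_type, level))
```

In this sketch, only the combination of a strong rule and the error threshold causes the node to fail, which is why strong rules are the ones that can block a node.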

Test rules

After rules are configured for a partition filter expression, you can test all these rules and view the test results.
Note You can manually run these rules to test their configuration and notification methods. We recommend that you test rules as required.
  1. On the Monitoring Rules page of a table, click Test.
  2. In the Test dialog box that appears, set the Data Timestamp parameter.
    • Partition: The partition filter expression for which rules are run. The actual partition key varies with the data timestamp. For a non-partitioned table, use NOTAPARTITIONTABLE as the partition filter expression.
    • Data Timestamp: The data timestamp for testing rules. The default value is the current time.
  3. Click Test.
  4. On the page that appears, click the message "The test is complete. Click to view the results" to view the test results on the Node Query page. For more information, see View monitoring results.

Manage subscriptions

By default, Data Quality sends notifications to the user who created a partition filter expression. You can add other users so that Data Quality sends notifications to them.

  1. On the Monitoring Rules page of a table, click Manage Subscriptions.
  2. In the Manage Subscriptions dialog box that appears, set Subscription Method.
    Data Quality supports the following four methods: Email, Email and SMS, DingTalk Chatbot, and DingTalk Chatbot @ALL.
    Note To use DingTalk Chatbot or DingTalk Chatbot @ALL, add a DingTalk chatbot and obtain its webhook URL. Then, copy the webhook URL to the Manage Subscriptions dialog box.
  3. Click Save.
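
Before you paste a DingTalk webhook URL into the Manage Subscriptions dialog box, it can be useful to confirm that the chatbot accepts messages. The following Python sketch is one way to do that, assuming the text message format of DingTalk custom chatbots (msgtype, text.content, and at.isAtAll). It is unrelated to how Data Quality itself sends notifications. The function names are hypothetical, and the webhook URL is a placeholder that you must replace with your own. If your chatbot uses keyword or signature security settings, the request may need additional content or parameters that this sketch does not cover.

```python
import json
import urllib.request


def build_text_message(content, at_all=False):
    """Build a text message body in the assumed DingTalk custom chatbot format."""
    return {
        "msgtype": "text",
        "text": {"content": content},
        "at": {"isAtAll": at_all},  # True mentions all group members
    }


def send_test_message(webhook_url, content, at_all=False):
    """POST a test message to the webhook and return the decoded JSON response."""
    body = json.dumps(build_text_message(content, at_all)).encode("utf-8")
    request = urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Printing the payload does not require network access.
    print(json.dumps(build_text_message("Data Quality webhook test", at_all=True)))
    # To send it, call send_test_message with your own webhook URL (placeholder below):
    # send_test_message("<your-webhook-URL>", "Data Quality webhook test")
```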

View operational logs

On the Monitoring Rules page of a table, click View Operational Log. In the Operations Logs pane that appears, you can view the information about each operation, including the user who performed the operation, the time when the operation was performed, and the operation details.

The Operations Logs pane displays all operations performed on the current partition filter expression.

View check results

On the Monitoring Rules page of a table, click View Check Results to go to the Node Query page. On this page, you can view the check results for all rules under the current partition filter expression.

Clone rules

  1. On the Monitoring Rules page of a table, click Clone Rules.
  2. In the Clone Rules dialog box that appears, set Target Expression.
  3. Select Clone Subscribers or Change Table Names in Custom Rules as required.
  4. Click Clone.