The Monitoring Rules page is the most important part of Data Quality. On this page, you can configure rules to monitor data in E-MapReduce, Hologres, AnalyticDB for PostgreSQL, MaxCompute, and DataHub. This topic describes how to configure monitoring rules for MaxCompute.

Create a MaxCompute connection

  1. Log on to the DataWorks console.
  2. In the left-side navigation pane, click Workspaces.
  3. On the Workspaces page, find the workspace in which you want to create a connection and click Data Integration in the Actions column.
  4. On the Data Integration page, click Connection in the left-side navigation pane. The Data Source page appears.
  5. Click New data source in the upper-right corner. In the Add data source dialog box, set the parameters as required to create a MaxCompute connection. For more information, see Configure a MaxCompute connection.

Select the MaxCompute connection

  1. On the current page, click the DataWorks icon in the upper-left corner and choose All Products > Data governance > Data Quality.
  2. On the Data Quality page, click Monitoring Rules in the left-side navigation pane.
  3. On the Monitoring Rules page, select MaxCompute from the Engine/Data Source drop-down list and select a MaxCompute project. All the tables in the selected MaxCompute project are displayed.
    You can search for a table by table name. Fuzzy search based on the initial letters of a table name is supported.
  4. Find the table for which you want to configure monitoring rules and click View Monitoring Rules in the Actions column.

Configure a partition filter expression

In Data Quality, you must configure rules based on a partition filter expression:
  • To configure rules for a non-partitioned table, you can specify NOTAPARTITIONTABLE as the partition filter expression.
  • To configure rules for a partitioned table, you can specify a data timestamp expression, such as $[yyyymmdd], or a regular expression as the partition filter expression.
On the rule configuration page of a table, click the + icon next to Partition Expression.
You can create a partition filter expression or select a recommended partition filter expression.
  • Create a partition filter expression
    In the Add Partition dialog box, enter a partition filter expression that conforms to the syntax as required. For a non-partitioned table, select NOTAPARTITIONTABLE from the recommended partition filter expressions.
    • For a table with only one partition, follow the format: Partition key=Partition value. The partition value can be a constant or a system parameter. You must configure partition filter expressions by using the last partition.
    • For a table with multiple partitions, follow the format: Partition key 1=Partition value/Partition key 2=Partition value/Partition key N=Partition value. Each partition value can be a constant or a system parameter. You must enclose a parameter in brackets [], such as $[yyyymmdd-N].
    The data timestamp that is configured in a partition filter expression also determines the recurrence of the partition filter expression. For example, if the data timestamp is the date of five days ago, the partition filter expression is triggered every five days. The following table describes the supported partition filter expressions.
    Partition filter expression   Description
    dt=$[yyyymmdd-N]              Indicates the date N days before the data timestamp.
    dt=$[yyyymm01-1]              Indicates the first day of each month.
    dt=$[yyyymm01-Nm]             Indicates the first day of the month that is N months before the current month.
    dt=$[yyyymmld-1]              Indicates the last day of each month.
    dt=$[yyyymmld-1m]             Indicates the last day of the month that is one month before the current month.
    dt=$[hh24miss-1/24]           Indicates one hour before the time that is specified by the data timestamp.
    dt=$[hh24miss-30/24/60]       Indicates half an hour before the time that is specified by the data timestamp.
    $[yyyymmdd]                   Indicates the data timestamp.
    $[yyyymmdd-1]                 Indicates one day before the data timestamp of the current instance.
    $[yyyymmddhh24miss]           Indicates the data timestamp of the current instance, in the yyyymmddhh24miss format, where:
                                  • yyyy indicates a four-digit year.
                                  • mm indicates a two-digit month.
                                  • dd indicates a two-digit day.
                                  • hh24 indicates a two-digit hour (24-hour clock).
                                  • mi indicates two-digit minutes.
                                  • ss indicates two-digit seconds.
    NOTAPARTITIONTABLE            Indicates the partition filter expression of a non-partitioned table.
  • Select a recommended partition filter expression
    This section describes how to select a recommended partition filter expression. In this example, the dt partition is used. We recommend that you specify a regular expression as the partition filter expression for a dynamic partitioned table.
    1. In the Add Partition dialog box, click the Partition Expression field. A drop-down list appears to show you the partition filter expressions that are recommended by Data Quality.
      • Select a recommended partition filter expression that meets your expectation.
      • Specify a custom partition filter expression if no recommended partition filter expressions meet your expectation.
    2. After you enter a partition filter expression, click Verify. Data Quality uses the current time as the data timestamp to calculate data and verify the partition filter expression.
    3. Click OK.
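
To make the expression syntax above concrete, the following Python sketch resolves a partition filter expression against a data timestamp. It is an illustration only, not how Data Quality evaluates expressions internally. The function names resolve_partition and _resolve_param are hypothetical, and the sketch assumes that a bare -N offset counts days, -N/24 counts hours, and -N/24/60 counts minutes, as the table suggests. Month-based forms such as $[yyyymm01-Nm] and $[yyyymmld-1m] are deliberately left out and raise an error.

```python
import re
from datetime import datetime, timedelta

# Format tokens from the table, mapped to strftime directives.
_STRFTIME = {"yyyy": "%Y", "mm": "%m", "dd": "%d", "hh24": "%H", "mi": "%M", "ss": "%S"}
_TOKEN = re.compile(r"yyyy|hh24|mm|dd|mi|ss")
_PARAM = re.compile(r"\$\[([^\]]+)\]")
# A run of format tokens, optionally followed by -N, -N/24, or -N/24/60.
_BODY = re.compile(r"((?:yyyy|hh24|mm|dd|mi|ss)+)(?:-(\d+)((?:/\d+)*))?")


def _resolve_param(body: str, data_timestamp: datetime) -> str:
    """Resolve the text inside one $[...] parameter (hypothetical helper)."""
    match = _BODY.fullmatch(body)
    if match is None:
        raise ValueError(f"parameter not supported by this sketch: $[{body}]")
    fmt, amount, divisors = match.groups()
    seconds = 0
    if amount is not None:
        seconds = int(amount) * 86400          # assumption: a bare -N counts days
        for divisor in divisors.split("/")[1:]:
            seconds //= int(divisor)           # -N/24 is hours, -N/24/60 is minutes
    moment = data_timestamp - timedelta(seconds=seconds)
    return moment.strftime(_TOKEN.sub(lambda m: _STRFTIME[m.group()], fmt))


def resolve_partition(expression: str, data_timestamp: datetime) -> str:
    """Replace every $[...] parameter in a partition filter expression."""
    if expression == "NOTAPARTITIONTABLE":
        return expression                      # non-partitioned table, nothing to resolve
    return _PARAM.sub(lambda m: _resolve_param(m.group(1), data_timestamp), expression)


ts = datetime(2024, 3, 10, 8, 30, 0)
print(resolve_partition("dt=$[yyyymmdd-1]", ts))                    # dt=20240309
print(resolve_partition("dt=$[yyyymmdd]/hh=$[hh24miss-1/24]", ts))  # dt=20240310/hh=073000
```

The second call shows the multi-partition format: each level is resolved independently, and the slash inside -1/24 is not confused with the slash that separates partition levels because it sits inside the brackets.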

If you need to delete a configured partition filter expression, move the pointer over the partition filter expression and click the Delete icon. After you delete a partition filter expression, all rules that are configured based on the partition filter expression are also deleted.

Link a partition filter expression to a node

To monitor the quality of data involved in a node, you must link a partition filter expression to the node.
  • The Manage Linked Nodes dialog box lists all committed nodes. Data Quality allows you to link a partition filter expression to a node in another workspace.
  • Before you link a partition filter expression to a node in another workspace, make sure that you are an administrator, a developer, or an administration expert in the two workspaces.
You can link a partition filter expression to one or more nodes. After nodes are linked, Data Quality can automatically monitor linked nodes.
Note Data Quality allows you to flexibly link a partition filter expression to a node. You can select a node that is not related to your table.
  1. On the rule configuration page of a table, click Manage Linked Nodes.
  2. In the Manage Linked Nodes dialog box, enter the name of the node that you want to link to the partition filter expression.
  3. Click Create.

Create a rule

On the Monitoring Rules page, you can create rules for your tables.

Data Quality allows you to create template rules and custom rules as needed. If you want to create a template rule or a custom rule, you can click Add Monitoring Rule or Quick Create. For more information, see Configure monitoring rules.

After rules are configured, you can click Batch Create to save all the configured rules for the current partition filter expression.
The following list describes the parameters of each creation method.
Add Monitoring Rule
  • Rule Name: The name of the rule.
  • Rule Type: The type of the rule. Valid values:
    • Strong: If a node reaches the error threshold, Data Quality reports an error alert and determines that the node fails. If a node reaches the warning threshold, Data Quality reports a warning alert and determines that the node is successful.
    • Soft: If a node reaches the error threshold, Data Quality reports an error alert and determines that the node is successful. If a node reaches the warning threshold, Data Quality does not report a warning alert and determines that the node is successful.
  • Auto-Generated Threshold: Specifies whether to use dynamic thresholds. You can use the dynamic threshold feature only in DataWorks Enterprise Edition or more advanced editions.
  • Rule Source: The source of the rule. Valid values: Built-in Template and Rule Templates.
  • Field: The fields to be monitored. You can select All Fields in Table or a specific field. If you select a field, you can apply the rule to the specified field in the table.
    Note In this example, select All Fields in Table and set other parameters for the table-specific rule.
  • Template: The template that you want to apply to the rule.
    • If you set the Rule Source parameter to Built-in Template, the built-in table-specific rules are displayed.
    • If you set the Rule Source parameter to Rule Templates, you must set parameters such as Sampling Method and Set Flag. For more information, see Manage rule templates.
  • Comparison Method: The comparison method of the rule. Valid values: Absolute Value, Raise, and Drop.
  • Thresholds: The warning threshold and error threshold of the fluctuation. You can enter thresholds or adjust the slider to specify thresholds.
  • Description: The description of the rule.
Quick Create
  • Rule Name: The name of the rule.
  • Field: The fields to be monitored. You can select All Fields in Table or a specific field. If you select a field, you can apply the rule to the specified field in the table.
  • Trigger: The trigger condition of the rule.
    • If you select All Fields in Table for the Field parameter, you can set this parameter to The number of columns is greater than 0 or Table row number dynamic threshold.
      Notice You can use the dynamic threshold feature only in DataWorks Enterprise Edition or more advanced editions.
    • If you select a field for the Field parameter, you can select The field value already exists, Null Field, Unique value dynamic threshold, Summary value dynamic threshold, Average dynamic threshold, Maximum dynamic threshold, or Minimum dynamic threshold.
      Notice You can use the dynamic threshold feature only in DataWorks Enterprise Edition or more advanced editions.
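
The way the Rule Type, Comparison Method, and Thresholds parameters combine can be sketched in Python. This is a hedged illustration, not Data Quality's implementation. The names Outcome, fluctuation, and evaluate are hypothetical, and the two rule types are written here as "Strong" and "Soft". The alert and node-status behavior follows the Rule Type description above. The meaning of the three comparison methods is an assumption: "Absolute Value" is taken to measure a change in either direction, "Raise" only increases, and "Drop" only decreases, all as a percentage of a baseline value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Outcome:
    alert: Optional[str]   # "error", "warning", or None when no alert is reported
    node_succeeds: bool


def fluctuation(current: float, baseline: float, method: str) -> float:
    """Return the fluctuation in percent (assumed semantics, see the lead-in)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    change = (current - baseline) * 100 / baseline
    if method == "Absolute Value":
        return abs(change)
    if method == "Raise":
        return max(change, 0.0)
    if method == "Drop":
        return max(-change, 0.0)
    raise ValueError(f"unknown comparison method: {method}")


def evaluate(rule_type: str, value: float, warning: float, error: float) -> Outcome:
    """Map a fluctuation value to an alert and a node status."""
    if rule_type not in ("Strong", "Soft"):
        raise ValueError(f"unknown rule type: {rule_type}")
    if value >= error:
        # Both rule types report an error alert; only a strong rule fails the node.
        return Outcome("error", node_succeeds=(rule_type == "Soft"))
    if value >= warning:
        # A strong rule reports a warning alert; a soft rule reports nothing.
        return Outcome("warning" if rule_type == "Strong" else None, True)
    return Outcome(None, True)


# A table whose row count grew from 100 to 160, checked with thresholds of 10% and 50%.
change = fluctuation(160, 100, "Raise")
print(evaluate("Strong", change, warning=10, error=50))
print(evaluate("Soft", change, warning=10, error=50))
```

With these inputs the strong rule fails the node and the soft rule lets it succeed, although both report an error alert. This is the practical difference to weigh when you choose a rule type.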

Test rules

After rules are configured for a partition filter expression, you can test all these rules and view the test results.
Note You can manually run these rules to test their configurations and notification methods. We recommend that you test rules as required.
  1. On the rule configuration page of a table, click Test.
  2. In the Test dialog box, set the Data Timestamp parameter.
    Parameter        Description
    Partition        The partition filter expression for which rules are run. The actual partition key varies based on the data timestamp. For a non-partitioned table, NOTAPARTITIONTABLE is used as the partition filter expression.
    Data Timestamp   The data timestamp for testing rules. The default value is the current time.
  3. Click Test.
  4. In the Test dialog box, click the message "The test is complete. Click to view the results." On the Node Query page, view the test results. For more information, see View monitoring results.

Manage subscriptions

By default, Data Quality sends notifications to the user who created a partition filter expression. You can add other users so that Data Quality sends notifications to them.

  1. On the rule configuration page of a table, click Manage Subscriptions.
  2. In the Manage Subscriptions dialog box, specify the notification method and notification recipient.
    Data Quality supports the following four methods: Email, Email and SMS, DingTalk Chatbot, and DingTalk Chatbot @ALL.
    Note To use the DingTalk Chatbot or DingTalk Chatbot @ALL method, add a DingTalk chatbot and obtain its webhook URL. Then, paste the webhook URL into the Manage Subscriptions dialog box. For more information, see Add a DingTalk chatbot and obtain a webhook URL.
  3. Click Save.
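
Before you paste a webhook URL into the Manage Subscriptions dialog box, it can help to confirm that the chatbot accepts messages. The following Python sketch builds a text message in the format used by DingTalk custom chatbots and posts it to a webhook URL. The function names build_text_message and send_test_message are hypothetical. The sketch does not describe what Data Quality itself sends, and a chatbot with security settings such as keyword filtering may reject the test message.

```python
import json
import urllib.request


def build_text_message(content: str, at_all: bool = False) -> dict:
    """Build a DingTalk custom chatbot text message.

    Setting at_all mentions everyone in the group, which roughly corresponds
    to the DingTalk Chatbot @ALL notification method (an assumption).
    """
    return {
        "msgtype": "text",
        "text": {"content": content},
        "at": {"isAtAll": at_all},
    }


def send_test_message(webhook_url: str, content: str, at_all: bool = False) -> int:
    """POST a test message to the webhook URL and return the HTTP status code."""
    body = json.dumps(build_text_message(content, at_all)).encode("utf-8")
    request = urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Inspect the payload without sending anything.
print(json.dumps(build_text_message("Data Quality webhook test", at_all=True)))
```

To send the message, call send_test_message with the webhook URL that you obtained for your chatbot. The sketch keeps the payload builder separate from the network call so that the message can be inspected without contacting DingTalk.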

View operations logs

On the rule configuration page of a table, click View Operation Log. In the Operations Logs panel, you can view the information about each operation, including the user who performed the operation, the time when the operation was performed, and the operation details.

The Details column displays the details of each operation that is performed on the current partition filter expression, including the rule configuration details.

View check results

On the rule configuration page of a table, click View Check Results to go to the Node Query page. On this page, you can view the check results for all rules under the current partition filter expression.

Clone rules

  1. On the rule configuration page of a table, click Clone Rules.
  2. In the Clone Rules dialog box, set the Target Expression parameter.
  3. Select Clone Subscribers and Change Table Names in Custom Rules as required.
  4. Click Clone.