DataWorks: Configuration example: MaxCompute

Last Updated: Sep 14, 2023

Monitoring rules are a key component of Data Quality. You can configure rules to monitor data in E-MapReduce (EMR), Hologres, AnalyticDB for PostgreSQL, MaxCompute, and CDH Hive. This topic describes how to configure monitoring rules for MaxCompute.

Add a MaxCompute data source

  1. Go to the Data Source page.

    1. Log on to the DataWorks console. In the left-side navigation pane, click Management Center. On the page that appears, select the desired workspace from the drop-down list and click Go to Management Center.

    2. In the left-side navigation pane of the page that appears, click Data Source. The Data Source page appears.

  2. In the upper-right corner of the Data Source page, click Add data source. In the Add data source dialog box, click MaxCompute in the Big Data Storage section. In the Add MaxCompute data source dialog box, configure the required parameters to add a MaxCompute data source. For more information, see Add a MaxCompute data source.

Select a data source

  1. Click the icon in the upper-left corner and choose All Products > Data Governance > Data Quality.

  2. In the left-side navigation pane, choose Rule Management > Configure Rule (by Table).

  3. On the page that appears, select MaxCompute from the Engine/Data Source drop-down list and select a MaxCompute project from the Engine/Database Instance drop-down list. All tables in the selected MaxCompute project are displayed.

    You can search for a table by name. You can use the initial letters of a table name to perform a fuzzy match.

  4. Find the table for which you want to configure a monitoring rule and click View Monitoring Rules in the Actions column.

Configure a partition filter expression

In Data Quality, you must configure a monitoring rule based on a partition filter expression:

  • To configure a monitoring rule for a non-partitioned table, you can specify NOTAPARTITIONTABLE as the partition filter expression.

  • To configure a monitoring rule for a partitioned table, you can specify a data timestamp expression, such as $[yyyymmdd-1], as the partition filter expression.

Note

Partition filter expressions used to create monitoring rules do not support braces, such as ${yyyymmdd-1}. Use brackets instead, such as $[yyyymmdd-1].
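The rules above can be approximated as a quick syntax check before you enter an expression in the Add Partition dialog box. The following Python sketch is illustrative only: check_partition_expression is a hypothetical helper, not a DataWorks API, and it accepts only the shapes described in this topic (NOTAPARTITIONTABLE, a bare $[...] parameter, or partition_key=value levels separated by /). Data Quality applies its own validation when you click Verify.

```python
import re
from typing import Optional

# Hypothetical helper; it mirrors the syntax rules in this topic, not DataWorks internals.
NON_PARTITIONED = "NOTAPARTITIONTABLE"
_BRACE_PARAM = re.compile(r"\$\{[^}]*\}")             # ${...}: not supported
_PARAM = r"\$\[[^\]]+\]"                              # $[...]: supported
# A value is a run of $[...] parameters and single constant characters.
_VALUE = rf"(?:{_PARAM}|[^/=$\[\]{{}}])+"
_LEVEL = rf"\w+={_VALUE}"                             # partition_key=value
_EXPRESSION = re.compile(rf"{_LEVEL}(?:/{_LEVEL})*")  # key1=v1/key2=v2/...


def check_partition_expression(expr: str) -> Optional[str]:
    """Return None if expr has an accepted shape, otherwise a short reason."""
    if expr == NON_PARTITIONED:
        return None
    if _BRACE_PARAM.search(expr):
        return "braces such as ${yyyymmdd-1} are not supported; use $[yyyymmdd-1]"
    if re.fullmatch(_PARAM, expr) or _EXPRESSION.fullmatch(expr):
        return None
    return "expected key=value, or key1=value1/key2=value2 for multi-level partitions"


if __name__ == "__main__":
    for candidate in ("dt=$[yyyymmdd-1]", "dt=${yyyymmdd-1}", "region=cn/dt=$[yyyymmdd]"):
        print(candidate, "->", check_partition_expression(candidate) or "ok")
```

The slash inside a parameter such as $[hh24miss-30/24/60] is consumed by the $[...] pattern, so it is not mistaken for a separator between partition levels.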

On the rule configuration page of the table, click the + icon to the right of Partition Filter Expression.

You can create a partition filter expression or select a recommended partition filter expression.

  • Create a partition filter expression

    In the Add Partition dialog box, enter a partition filter expression that conforms to the syntax based on your business requirements. For a non-partitioned table, select NOTAPARTITIONTABLE from the recommended partition filter expressions.

    • For a table with only one level of partitions, specify the partition filter expression in the format of Partition key=Partition value. The partition value can be a constant or a built-in parameter.

    • For a table with multiple levels of partitions, specify the partition filter expression in the format of Partition key 1=Partition value/Partition key 2=Partition value/Partition key N=Partition value. Each partition value can be a constant or a built-in parameter. The partition filter expression must include information about all levels of partitions. You must enclose a parameter in brackets ([]), such as $[yyyymmdd-N].

    The data timestamp configured in a partition filter expression determines how often the expression recurs. For example, if the data timestamp indicates the date that is five days earlier than the current date, the partition filter expression is triggered every five days. The following list describes the supported partition filter expressions.

    • dt=$[yyyymmdd-N]: Indicates the date that is N days earlier than the current date.

    • dt=$[hh24miss-1/24]: Indicates the time that is 1 hour earlier than the current time.

    • dt=$[hh24miss-30/24/60]: Indicates the time that is 30 minutes earlier than the current time.

    • dt=$[add_months(yyyymmdd,-1)]: Indicates the date that is one month earlier than the current date. The date is accurate to the day.

    • $[yyyymmdd]: Indicates the date when the node instance is run.

    • $[yyyymmdd-1]: Indicates the date that is one day earlier than the date when the node instance is run.

    • $[yyyymmddhh24miss]: Indicates the scheduling time of the node instance in the yyyymmddhh24miss format, where:

      • yyyy indicates a four-digit year.

      • mm indicates a two-digit month.

      • dd indicates a two-digit day.

      • hh24 indicates a two-digit hour (24-hour clock).

      • mi indicates two-digit minutes.

      • ss indicates two-digit seconds.

    • NOTAPARTITIONTABLE: Indicates the partition filter expression of a non-partitioned table.

  • Select a recommended partition filter expression

    This section describes how to select a recommended partition filter expression. In this example, the dt partition key is used.

    1. In the Add Partition dialog box, click the Partition Filter Expression field. A drop-down list appears to show you the partition filter expressions that are recommended by Data Quality.

      • Select a recommended partition filter expression that meets your business requirements.

      • If no recommended partition filter expressions meet your business requirements, specify a custom partition filter expression.

    2. After you enter a partition filter expression, click Verify. Data Quality uses the current time as the scheduling time to calculate the resulting partition and verify the partition filter expression.

    3. Click OK.
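To illustrate how the supported expressions resolve against a scheduling time, which is what the Verify step uses, the following Python sketch substitutes each $[...] parameter. It is an approximation, not the DataWorks implementation: resolve_partition and its helpers are hypothetical names, offsets such as 1/24 and 30/24/60 are treated as fractions of a day (1 hour and 30 minutes), and add_months clamps the day to the end of a shorter month, which is an assumption.

```python
import calendar
import re
from datetime import datetime, timedelta
from fractions import Fraction

# Hypothetical resolver; format tokens come from the yyyymmddhh24miss description.
_FORMAT_TOKENS = {"yyyy": "%Y", "mm": "%m", "dd": "%d", "hh24": "%H", "mi": "%M", "ss": "%S"}
_TOKEN_RE = re.compile(r"yyyy|hh24|mm|dd|mi|ss")
_PARAM_RE = re.compile(r"\$\[([^\]]+)\]")                       # $[...] parameters
_ADD_MONTHS_RE = re.compile(r"add_months\((\w+),\s*([+-]?\d+)\)")
_OFFSET_RE = re.compile(r"(\w+)([+-])(\d+(?:/\d+)*)")           # e.g. yyyymmdd-1, hh24miss-30/24/60


def _to_strftime(fmt: str) -> str:
    """Translate a format such as yyyymmddhh24miss into a strftime pattern."""
    return _TOKEN_RE.sub(lambda m: _FORMAT_TOKENS[m.group(0)], fmt)


def _add_months(t: datetime, months: int) -> datetime:
    """Shift by whole months, clamping the day to the target month (assumed behavior)."""
    years, month0 = divmod(t.month - 1 + months, 12)
    year, month = t.year + years, month0 + 1
    day = min(t.day, calendar.monthrange(year, month)[1])
    return t.replace(year=year, month=month, day=day)


def _eval_param(body: str, sched: datetime) -> str:
    """Evaluate the text inside one $[...] against the scheduling time."""
    m = _ADD_MONTHS_RE.fullmatch(body)
    if m:
        return _add_months(sched, int(m.group(2))).strftime(_to_strftime(m.group(1)))
    m = _OFFSET_RE.fullmatch(body)
    if m:
        fmt, sign, expr = m.groups()
        numbers = [int(p) for p in expr.split("/")]
        days = Fraction(numbers[0])
        for divisor in numbers[1:]:
            days /= divisor  # 1/24 day = 1 hour; 30/24/60 day = 30 minutes
        delta = timedelta(seconds=float(days * 86400))
        shifted = sched - delta if sign == "-" else sched + delta
        return shifted.strftime(_to_strftime(fmt))
    return sched.strftime(_to_strftime(body))  # no offset, e.g. yyyymmdd


def resolve_partition(expression: str, sched: datetime) -> str:
    """Substitute every $[...] parameter; NOTAPARTITIONTABLE passes through unchanged."""
    return _PARAM_RE.sub(lambda m: _eval_param(m.group(1), sched), expression)


if __name__ == "__main__":
    sched = datetime(2023, 9, 14, 0, 10)
    print(resolve_partition("dt=$[yyyymmdd-1]", sched))  # dt=20230913
    print(resolve_partition("dt=$[hh24miss-30/24/60]", sched))  # dt=234000
```

For example, with a scheduling time of 00:10 on September 14, 2023, dt=$[hh24miss-1/24] resolves to dt=231000 because the one-hour offset crosses midnight into the previous day.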

If you need to delete a specified partition filter expression, move the pointer over the partition filter expression and click the delete icon. After you delete a partition filter expression, all rules that are configured based on the partition filter expression are also deleted.

Associate a monitoring rule that is created based on a partition filter expression with scheduling nodes

To monitor the quality of offline data generated by scheduling nodes, you must associate a monitoring rule that is created based on a partition filter expression with the scheduling nodes that generate table data.

  • You can associate a monitoring rule with scheduling nodes that generate table data only after you deploy the scheduling nodes.

  • Before you associate a monitoring rule with scheduling nodes in a workspace, make sure that you are assigned at least one of the Workspace Manager, Development, and O&M roles in the workspace.

You can associate a monitoring rule with one or more nodes. After you associate a monitoring rule with nodes, data quality monitoring tasks are automatically run to monitor the quality of offline data generated by the nodes.

Note

Data Quality allows you to associate a monitoring rule with nodes in a flexible manner. You can select a node that is not related to your table.

  1. On the rule configuration page of a table, click Manage Linked Nodes.

  2. In the Manage Linked Nodes dialog box, enter the name of the node that you want to associate with the monitoring rule.

  3. Click Create.
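The two prerequisites for associating nodes can be modeled as a simple check. This Python sketch uses a hypothetical in-memory MonitoringRule and Node model, not a DataWorks API. It only mirrors the documented conditions (the nodes are deployed, and the user holds the Workspace Manager, Development, or O&M role) and, like Data Quality, does not require the nodes to be related to the table.

```python
from dataclasses import dataclass, field
from typing import Iterable, List

# Hypothetical model. Workspace roles listed above; holding any one of them is sufficient.
ALLOWED_ROLES = {"Workspace Manager", "Development", "O&M"}


@dataclass
class Node:
    name: str
    deployed: bool


@dataclass
class MonitoringRule:
    partition_expression: str
    linked_nodes: List[str] = field(default_factory=list)


def link_nodes(rule: MonitoringRule, nodes: Iterable[Node], user_roles: Iterable[str]) -> List[str]:
    """Associate nodes with a rule after checking the two documented preconditions."""
    if not ALLOWED_ROLES & set(user_roles):
        raise PermissionError("need the Workspace Manager, Development, or O&M role")
    nodes = list(nodes)
    undeployed = [n.name for n in nodes if not n.deployed]
    if undeployed:
        raise ValueError(f"deploy these nodes first: {undeployed}")
    for n in nodes:
        # Any deployed node may be linked, even one unrelated to the table.
        if n.name not in rule.linked_nodes:
            rule.linked_nodes.append(n.name)
    return rule.linked_nodes
```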

Create a rule

Monitoring rules are a key component of Data Quality. You can create monitoring rules for your tables based on your business requirements.

Data Quality allows you to create template rules and custom rules based on your business requirements. If you want to create a template rule or a custom rule, you can click Add Monitoring Rule or Quick Create. For more information, see Configure monitoring rules by table (for a single table).

After monitoring rules are configured, you can click Batch Create to save all the configured monitoring rules for the current partition filter expression.

The parameters that you configure depend on the creation method.

Add Monitoring Rule

  • Rule Name: The name of the monitoring rule.

  • Rule Type: The strength of the monitoring rule. Valid values:

    • Strong: If the critical threshold is exceeded, Data Quality reports critical alerts and determines that the related nodes fail. If the warning threshold is exceeded, Data Quality reports warning alerts but determines that the related nodes are successful.

    • Soft: If the critical threshold is exceeded, Data Quality reports critical alerts but determines that the related nodes are successful. If the warning threshold is exceeded, Data Quality does not report warning alerts and determines that the related nodes are successful.

  • Auto-Generated Threshold: Specifies whether to use dynamic thresholds. If you use dynamic thresholds, you do not need to configure thresholds. The system automatically checks the metrics in real time based on algorithm models and reports an alert if the value of a metric falls outside a reasonable range.

    Important

    You can use dynamic thresholds only in DataWorks Enterprise Edition or a more advanced edition.

  • Rule Source: The source of the monitoring rule. Valid values: Built-in Template and Rule Templates.

  • Field: The fields that you want to monitor. You can select all fields in a table or a specific field. If you select a specific field, the monitoring rule applies to that field in the table.

    Note

    In this example, select All Fields in Table and configure other parameters for the table-specific rule.

  • Template: The template that you want to apply to the monitoring rule.

    • If you set the Rule Source parameter to Built-in Template, the built-in table-specific rules are displayed.

    • If you set the Rule Source parameter to Rule Templates, you must configure parameters such as Sampling Method and Set Flag. For more information, see Create, manage, and use rule templates.

  • Comparison Method: The comparison method of the monitoring rule. Valid values: Absolute Value, Raise, and Drop.

  • Thresholds: The warning threshold and critical threshold of the fluctuation. You can enter the thresholds or drag the slider to set them.

  • Description: The description of the monitoring rule.

Quick Create

  • Rule Name: The name of the monitoring rule.

  • Field: The fields that you want to monitor. You can select all fields in a table or a specific field. If you select a specific field, the monitoring rule applies to that field in the table.

  • Trigger: The trigger condition of the monitoring rule. The valid values depend on the Field parameter:

    • If you select All Fields in Table, you can set this parameter to The number of columns is greater than 0 or Table row number dynamic threshold.

    • If you select a specific field, the valid values include The field value already exists, Null Field, and Unique value dynamic threshold.

    Important

    You can use dynamic thresholds only in DataWorks Enterprise Edition or a more advanced edition.
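The Strong and Soft rule types differ only in how an exceeded threshold affects alerts and node status. The following Python sketch encodes that mapping. It assumes that "exceeded" means strictly greater than the threshold, that the critical threshold is at least the warning threshold, and that value is the metric already computed by the selected comparison method; evaluate and Outcome are hypothetical names, not part of DataWorks.

```python
from enum import Enum
from typing import NamedTuple, Optional


class RuleType(Enum):
    STRONG = "Strong"
    SOFT = "Soft"


class Outcome(NamedTuple):
    alert: Optional[str]  # "critical", "warning", or None
    node_succeeds: bool


def evaluate(rule_type: RuleType, value: float, warning: float, critical: float) -> Outcome:
    """Hypothetical helper: map a checked metric to an alert level and node status.

    Assumes "exceeded" means value > threshold and critical >= warning.
    """
    if value > critical:
        # Both types report a critical alert; only a Strong rule fails the node.
        return Outcome("critical", node_succeeds=rule_type is RuleType.SOFT)
    if value > warning:
        # A Strong rule reports a warning; a Soft rule does not. The node succeeds either way.
        alert = "warning" if rule_type is RuleType.STRONG else None
        return Outcome(alert, node_succeeds=True)
    return Outcome(None, node_succeeds=True)
```

Only one combination fails the related node: a Strong rule whose critical threshold is exceeded.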
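This topic does not describe the algorithm models behind dynamic thresholds. Purely as a stand-in to show the idea of a "reasonable range", the following Python sketch flags a value that falls outside the mean plus or minus k sample standard deviations of recent history, for example daily row counts. The technique, the function names, and the default k=3 are assumptions; this is not how DataWorks computes dynamic thresholds.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def reasonable_range(history: Sequence[float], k: float = 3.0) -> Tuple[float, float]:
    """Stand-in band: mean +/- k sample standard deviations (not the DataWorks model)."""
    if len(history) < 2:
        raise ValueError("need at least two historical values")
    mu, sigma = mean(history), stdev(history)
    return mu - k * sigma, mu + k * sigma


def is_outside_range(history: Sequence[float], today: float, k: float = 3.0) -> bool:
    """True if today's value would trigger an alert under this stand-in band."""
    low, high = reasonable_range(history, k)
    return not low <= today <= high
```

A fixed-width band like this ignores trends and weekly seasonality, which is one reason a production system would use richer models.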

Test monitoring rules

After you configure monitoring rules based on a partition filter expression, you can test all the monitoring rules and view the test results.

Note

You can test the validity of the monitoring rules and notification methods. We recommend that you test monitoring rules based on your business requirements.

  1. On the rule configuration page of a table, click Test.

  2. In the Test dialog box, configure the Scheduling Time parameter.

    • Partition: The partition filter expression based on which monitoring rules are run. The actual partition varies based on the specified data timestamp. For a non-partitioned table, NOTAPARTITIONTABLE is used as the partition filter expression.

    • Scheduling Time: The time when you test monitoring rules. The default value is the current time.

  3. Click Test.

  4. After the test finishes, click "The test is complete. Click to view the results." in the Test dialog box. On the Node Query page that appears, view the test results. For more information, see View monitoring results.

Manage subscriptions

By default, Data Quality sends notifications to the user who created a partition filter expression. You can also specify other users to whom you want Data Quality to send notifications.

  1. On the rule configuration page of a table, click Manage Subscriptions.

  2. In the Manage Subscriptions dialog box, specify the notification method and notification recipient.

    Data Quality supports the following notification methods: Email, Email and SMS, DingTalk Chatbot, DingTalk Chatbot @ALL, Lark Group Chatbot, Enterprise WeChat Chatbot, and Custom Webhook.

    Note

    • To use a DingTalk, Lark, or Enterprise WeChat chatbot, add the chatbot and obtain its webhook URL. Then, copy the webhook URL to the Recipient field in the Manage Subscriptions dialog box.

    • The Custom Webhook notification method is supported only in DataWorks Enterprise Edition. For information about the message format of an alert notification sent by using a custom webhook, see the "Appendix: Message format of alert notifications sent by using a custom webhook URL" section in Configure monitoring rules for multiple tables by template.

  3. Click Close.
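If you choose Custom Webhook, the recipient must be an HTTP endpoint that can receive alert notifications. The following Python sketch, using only the standard library, runs such an endpoint locally for experimentation. It assumes that notifications arrive as HTTP POST requests with a JSON body; the actual message fields are described in the appendix referenced above and are not reproduced here. AlertHandler, start_receiver, and send_test_alert are hypothetical names, and a production endpoint must be reachable from DataWorks and properly secured.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # parsed alert payloads, in arrival order


class AlertHandler(BaseHTTPRequestHandler):
    """Accept POSTed JSON alerts (assumed format) and keep them in `received`."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
        except ValueError:  # covers JSONDecodeError and undecodable bytes
            self.send_response(400)
            self.end_headers()
            return
        received.append(payload)
        self.send_response(200)
        self.send_header("Content-Length", "2")
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def start_receiver(port: int = 0) -> HTTPServer:
    """Serve on 127.0.0.1 in a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), AlertHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def send_test_alert(url: str, payload: dict) -> int:
    """POST a JSON payload to check the receiver before registering its URL."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Ignore proxy environment variables so local test calls go direct.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request) as response:
        return response.status
```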

View operation logs

On the rule configuration page of a table, click View Operation Log. In the View Operation Logs panel, you can view the information about each operation, including the user who performed the operation, the time when the operation was performed, and the operation details.

The Details column displays the details of each operation that is performed on the current partition filter expression, including the rule configuration details.

View check results

On the rule configuration page of a table, click View Check Results to go to the Node Query page. On this page, you can view the check results of all monitoring rules that are configured based on the current partition filter expression.

Clone rules

  1. On the rule configuration page of a table, click Clone Rules.

  2. In the Clone Rules dialog box, configure the New Expression parameter.

  3. Select Clone Subscribers or Change Table Names in Custom Rules based on your business requirements.

  4. Click Clone.
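The clone options and the deletion behavior described earlier in this topic (deleting a partition filter expression also deletes its rules) can be pictured with a small in-memory model. This Python sketch is hypothetical throughout: TableRules and its methods are not a DataWorks API, and reading Change Table Names in Custom Rules as a whole-word rename inside custom rule SQL is an assumption.

```python
import copy
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Rule:
    name: str
    custom_sql: str = ""  # empty for template rules


@dataclass
class ExpressionConfig:
    rules: List[Rule] = field(default_factory=list)
    subscribers: List[str] = field(default_factory=list)


class TableRules:
    """Hypothetical model: a table's rules grouped by partition filter expression."""

    def __init__(self) -> None:
        self.by_expression: Dict[str, ExpressionConfig] = {}

    def delete_expression(self, expr: str) -> None:
        # Deleting an expression also deletes every rule configured under it.
        self.by_expression.pop(expr, None)

    def clone(self, source: str, new_expr: str, clone_subscribers: bool = False,
              table_renames: Optional[Dict[str, str]] = None) -> ExpressionConfig:
        """Copy the rules under `source` to `new_expr`; the source is left untouched."""
        rules = copy.deepcopy(self.by_expression[source].rules)
        for rule in rules:
            for old, new in (table_renames or {}).items():
                # Whole-word replacement so "sales" does not match "sales_archive".
                rule.custom_sql = re.sub(rf"\b{re.escape(old)}\b", lambda _m: new, rule.custom_sql)
        subscribers = list(self.by_expression[source].subscribers) if clone_subscribers else []
        self.by_expression[new_expr] = ExpressionConfig(rules, subscribers)
        return self.by_expression[new_expr]
```

Deep-copying the rules keeps later edits to the clone from changing the rules under the source expression.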