The Monitoring Rules page is the most important part of Data Quality. On this page, you can configure rules to monitor data in E-MapReduce, Hologres, AnalyticDB for PostgreSQL, MaxCompute, and DataHub. This topic describes how to configure monitoring rules for DataHub.

Background information

DataHub monitoring supports the following features:
  • Templates for monitoring stream discontinuity and data latency
  • Stream processing features, such as custom Flink SQL, dimension table JOIN, multi-stream JOIN, and window functions

Procedure

  1. Create a DataHub connection.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. In the top navigation bar, select the region where your workspace resides. Find the workspace and click Data Integration in the Actions column.
    4. On the Data Integration page, click Connection in the left-side navigation pane. The Data Source page appears.
    5. Click New data source in the upper-right corner. In the Add data source dialog box, set the parameters as required to create a DataHub connection. For more information, see Configure a DataHub connection.
  2. Select the DataHub connection.
    1. On the current page, click the DataWorks icon in the upper-left corner and choose All Products > Data governance > Data Quality.
    2. On the Data Quality page, click Monitoring Rules in the left-side navigation pane.
    3. On the Monitoring Rules page, select Datahub from the Engine/Data Source drop-down list and select the DataHub connection. All the topics in the selected DataHub data store are displayed.
      GUI element: Description
      Configure Flink/SLS Resources: After you create a connection, click Configure Flink/SLS Resources to configure the Realtime Compute and Log Service resources related to the connection.
      Topics: The Topics tab lists all topics in the DataHub data store. You can click the following buttons in the Actions column for a topic:
      • View Monitoring Rules: Click it to create rules for the topic. You can create template rules and custom rules as needed.
      • Manage Subscriptions: Click it to view and modify subscribers to the topic, and change the notification method. You can use a DingTalk chatbot to receive notifications. The changed notification method takes effect for all subscribers to the topic.
      Dimension Tables: When you create custom rules for a topic, you can create dimension tables and use the JOIN clause to join them with the topic. If the collected data streams lack some fields that a dimension table provides, you must supplement the data streams with those fields before data analysis and declare the dimension table in Data Quality.

      DataHub supports the dimension tables of ApsaraDB for HBase, Lindorm, ApsaraDB RDS, Tablestore, Taobao Distributed Data Layer (TDDL), and MaxCompute.

      Flink SQL does not define a dedicated data definition language (DDL) syntax for dimension tables. You can use the standard CREATE TABLE statement. However, you must add PERIOD FOR SYSTEM_TIME to the statement to specify the period of the dimension table and to declare that the dimension table stores time-varying data.
      Note: When you declare a dimension table, you must specify the primary key. When you join a dimension table with another table, the ON condition must contain an equality condition that includes the primary key of either table.
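
      The declaration and join requirements above can be sketched as follows. This is only a minimal sketch under stated assumptions: the column names (id, city), the connector type rds, and every value in the WITH clause are placeholders chosen for illustration, and the exact WITH parameters depend on the storage system that holds your dimension table. test_dim and zmr_test02 are the example names that appear in the custom rule examples in this topic.

      ```sql
      -- Hypothetical dimension table declaration. The columns and all WITH
      -- values are placeholders; replace them with the values for your store.
      CREATE TABLE test_dim (
        id BIGINT,
        city VARCHAR,
        PRIMARY KEY (id),        -- a dimension table must declare its primary key
        PERIOD FOR SYSTEM_TIME   -- declares that the table stores time-varying data
      ) WITH (
        type = 'rds',            -- assumed connector type (ApsaraDB RDS)
        url = '<yourDatabaseURL>',
        tableName = '<yourTableName>',
        userName = '<yourUserName>',
        password = '<yourPassword>'
      );

      -- Join the topic with the dimension table. The ON condition contains an
      -- equality condition on the primary key (id) of the dimension table.
      SELECT e.id AS eid
      FROM zmr_test02 AS e
      JOIN test_dim FOR SYSTEM_TIME AS OF PROCTIME() AS w
      ON e.id = w.id;
      ```

      FOR SYSTEM_TIME AS OF PROCTIME() makes each incoming record join against the dimension table as it is at processing time, which is why the table must be declared with PERIOD FOR SYSTEM_TIME.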
    4. Click the Topics tab. Find the topic for which you want to configure monitoring rules and click View Monitoring Rules in the Actions column.
  3. On the rule configuration page of the topic, click Create Rule.
  4. Create a monitoring rule.
    In Data Quality, you can create template rules and custom rules as needed.
    • On the Template Rules tab of the Create rules panel, click Create Template Rule. Two templates are available: Data Delay and Stream Discontinuity.
      For example, you can select Data Delay for the Template Type parameter.
      Parameter: Description
      Rule Name: The name of the rule. The name can be up to 255 characters in length.
      Field Type: The fields to be monitored. By default, this parameter is set to All Fields in Table.
      Template Type:
      • Data Delay: monitors the interval between the time when data is generated and the time when data is written to DataHub based on the data timestamp field. If the interval exceeds a specified threshold, an alert is generated.
        Note: The data timestamp field supports two data types: TIMESTAMP and STRING (yyyy-MM-dd HH:mm:ss).
      • Stream Discontinuity: monitors the period during which no data is written to DataHub. If the period exceeds a specified threshold, an alert is generated.

        Before you configure a stream discontinuity rule, you must activate Realtime Compute and create a project. On the Monitoring Rules page, click Configure Flink/SLS Resources in the upper-right corner. In the dialog box that appears, specify the Realtime Compute project and click OK.

      Alerts Threshold: The maximum number of alerts generated for data latency. Data Quality reports an alert when the number of alerts generated for data latency exceeds this threshold. This parameter is displayed only when you select Data Delay for the Template Type parameter.
      Data Timestamp Field: The data timestamp field of the topic for which the rule is created. This field supports two data types: TIMESTAMP and STRING (yyyy-MM-dd HH:mm:ss). This parameter is displayed only when you select Data Delay for the Template Type parameter.
      Alert Frequency: The interval at which alerts are reported. You can set the alert interval to 10 minutes, 30 minutes, 1 hour, or 2 hours.
      Warning Threshold: The warning threshold, in seconds. The value must be an integer and less than the error threshold.
      Error Threshold: The error threshold, in seconds. The value must be an integer and greater than the warning threshold.
    • If template rules do not meet your requirements for monitoring the data quality of DataHub topics, you can create a custom rule. On the Custom Rules tab of the Create rules panel, click Create Custom Rule.
      Note
      • The field in the SELECT clause must be a column. Make sure that you can compare the field values with the warning threshold and error threshold.
      • The FROM clause must include the current topic and all its columns.
      Parameter: Description
      Rule Name: The name of the rule. The name must be unique in the topic and can be up to 20 characters in length.
      Script: The custom SQL script that defines the rule. The return value of the SELECT clause must be unique. Examples:
      • Use a simple SQL statement.
        select id as a from zmr_test02;
      • Join the topic and a dimension table named test_dim.
        select e.id as eid
        from zmr_test02 as e
        join test_dim for system_time as of proctime() as w
        on e.id=w.id
      • Join the topic and another topic named dp1test_zmr01.
        select count(newtab.biz_date) as aa
        from (select o.*
              from zmr_test02 as o
              join dp1test_zmr01 as p
              on o.id=p.id) newtab
        group by id, biz_date, biz_date_str, total_price, `timestamp`
      Warning Threshold: The warning threshold, in minutes. The value must be an integer and less than the error threshold.
      Error Threshold: The error threshold, in minutes. The value must be an integer and greater than the warning threshold.
      Minimum Alert Interval: The minimum interval at which alerts are reported, in minutes.
      Description: The description of the rule.
  5. Click Batch Create. After rules are created for the topic, you can perform the following operations:
    • View Log: Click it to view the operational logs of the rules.
    • Manage Subscriptions: Click it to view and modify subscribers to the rules, and change the notification method. The changed notification method takes effect for all subscribers to the rules.
      Data Quality supports the following four notification methods: Email, Email and SMS, DingTalk Chatbot, and DingTalk Chatbot @ALL.
      Note: To use a DingTalk chatbot, add the chatbot and obtain its webhook URL. Then, copy the webhook URL to the Manage Subscriptions dialog box. For more information, see Add a DingTalk chatbot and obtain a webhook URL.