Data Quality allows you to configure monitoring rules for E-MapReduce tables, Hologres tables, AnalyticDB for PostgreSQL tables, MaxCompute tables, and DataHub topics. This topic describes how to configure a rule for monitoring data in a MaxCompute data store.

Go to the Monitoring Rules page

  1. Log on to the DataWorks console.
  2. In the left-side navigation pane, click Workspaces.
  3. In the top navigation bar, select the region where the target workspace resides. Find the target workspace and click Data Analytics in the Actions column.
  4. On the DataStudio page, click the DataWorks icon in the upper-left corner and choose All Products > Data governance > Data Quality.
  5. On the Data Quality page, click Monitoring Rules in the left-side navigation pane.
  6. On the Monitoring Rules page, select MaxCompute from the Engine/Data Source drop-down list and select a MaxCompute project from the Engine/Database Instance drop-down list.
    You can configure monitoring rules for E-MapReduce tables, Hologres tables, AnalyticDB for PostgreSQL tables, MaxCompute tables, and DataHub topics.
    • If you select an E-MapReduce, a Hologres, an AnalyticDB for PostgreSQL, or a MaxCompute data store, all tables in the data store are displayed.
    • If you select a DataHub data store, all topics in the data store are displayed.
  7. Find the table for which you want to configure monitoring rules and click View Monitoring Rules in the Actions column.
    Data Quality allows you to configure template rules and custom rules for a table or a topic.
    Notice: Before you configure a template rule for a table, you must configure a partition filter expression. For more information, see Configure a partition filter expression.

Create a template rule

  1. On the Monitoring Rules page, find a table and click View Monitoring Rules in the Actions column.
  2. On the rule configuration page that appears, click the partition filter expression for which you want to configure a template rule. Then, click Create rules. The Template Rules tab appears in the Create rules panel.
    On the Template Rules tab, click Add Monitoring Rule or Quick Create.
    • Click Add Monitoring Rule.
      Set the parameters as described in the following table. In this example, set the Rule Source parameter to Built-in Template.
      Parameter Description
      Rule Name The name of the rule.
      Rule Type The type of the rule. Valid values: Strong and Soft.
      • If you select Strong, error alerts are reported and descendant nodes are blocked, whereas warning alerts are reported but descendant nodes are not blocked.
      • If you select Soft, error alerts are reported but descendant nodes are not blocked, and warning alerts are not reported.
      Auto-Generated Threshold Specifies whether to use dynamic thresholds.
      Notice: You can use the dynamic threshold feature only in DataWorks Enterprise Edition or more advanced editions.
      Rule Source The source of the rule. Valid values: Built-in Template and Rule Templates.
      If you select Rule Templates, you must specify a rule template. For more information, see Manage rule templates.
      Notice: You can select Rule Templates only in DataWorks Enterprise Edition or more advanced editions.
      Field The fields to be monitored. You can select All Fields in Table or a specific field of a numeric type or non-numeric type.
      Template The template that you want to apply to the rule. Data Quality supports 43 rule templates. For more information, see Built-in rule templates for offline data.
      Note: Rules that check the average, accumulated (sum), minimum, or maximum value of a field can be configured only for numeric fields.
      Comparison Method The comparison method of the rule. Valid values: Absolute Value, Raise, and Drop.
      Thresholds
      • You can calculate the fluctuation by using the following formula:
        Fluctuation = (Sample - Baseline)/Baseline
        • Sample

          The sample value for the current day. For example, to check the day-over-day fluctuation in the row count of the table output by an SQL node, the sample is the row count on the current day.

        • Baseline
          The comparison value calculated from the previous N days. Examples:
          • To check the day-over-day fluctuation in the row count of the table output by an SQL node, the baseline is the row count on the previous day.
          • To check the fluctuation against the seven-day average row count of the table output by an SQL node, the baseline is the average row count over the last seven days.
      • The fluctuation variance can be calculated only for numeric fields, such as BIGINT and DOUBLE fields, by using the following formula:

        Fluctuation variance = (Sample - Baseline)/Standard deviation

      You can specify the warning threshold and error threshold of the fluctuation to monitor data at different severities:
      • If the fluctuation does not exceed the warning threshold, Data Quality determines that data is normal.
      • If the fluctuation exceeds the warning threshold but does not exceed the error threshold, Data Quality reports a warning alert.
      • If the fluctuation exceeds the error threshold, Data Quality reports an error alert.
      • If you do not specify the warning threshold, Data Quality reports only error alerts. Results that do not exceed the error threshold are considered normal.
      • If you do not specify the error threshold, Data Quality reports only warning alerts. Results that do not exceed the warning threshold are considered normal.
      • You must specify at least one of the two thresholds. If you specify neither, Data Quality applies the default values: 10% for the warning threshold and 50% for the error threshold.
      Description The description of the rule.
    • Click Quick Create.
      Set the parameters as described in the following table.
      Parameter Description
      Rule Name The name of the rule.
      Field The fields to be monitored. You can select All Fields in Table or a specific field of a numeric type or non-numeric type.
      Trigger The trigger condition of the rule. Valid values: The number of rows is greater than 0 and Table row number dynamic threshold.
      Notice: You can select Table row number dynamic threshold only in DataWorks Enterprise Edition or more advanced editions.
  3. Click Batch Create.
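The alerting behavior of the two rule types can be read as a small decision table. The following Python sketch encodes it, using the hypothetical names STRONG (the rule type that can block descendant nodes) and SOFT. It only restates the behavior described in the Rule Type row and is not a DataWorks API.

```python
from __future__ import annotations

from enum import Enum


class RuleStrength(Enum):
    # Hypothetical names: STRONG is the type that can block descendants, SOFT is Soft.
    STRONG = "strong"
    SOFT = "soft"


class AlertLevel(Enum):
    NORMAL = "normal"
    WARNING = "warning"
    ERROR = "error"


def alert_outcome(strength: RuleStrength, level: AlertLevel) -> tuple[bool, bool]:
    """Return (alert_reported, descendants_blocked) for one rule result."""
    if level is AlertLevel.NORMAL:
        # Normal results raise no alert and never block descendant nodes.
        return (False, False)
    if strength is RuleStrength.STRONG:
        # Strong: errors are reported and block; warnings are reported but do not block.
        return (True, level is AlertLevel.ERROR)
    # Soft: errors are reported but do not block; warnings are not reported.
    return (level is AlertLevel.ERROR, False)


if __name__ == "__main__":
    for strength in RuleStrength:
        for level in (AlertLevel.WARNING, AlertLevel.ERROR):
            print(strength.name, level.name, alert_outcome(strength, level))
```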
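The fluctuation formulas and threshold rules described for template rules can be sketched in Python as follows. The function names are hypothetical, thresholds are expressed as fractions (10% = 0.10), and the handling of Raise and Drop (only increases or only decreases count toward the threshold) is an assumption rather than documented DataWorks behavior. When neither threshold is given, the sketch applies the 10% and 50% defaults quoted above.

```python
from __future__ import annotations

# Default thresholds quoted in the text (10% warning, 50% error), as fractions.
DEFAULT_WARNING = 0.10
DEFAULT_ERROR = 0.50


def fluctuation(sample: float, baseline: float) -> float:
    """Fluctuation = (Sample - Baseline) / Baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (sample - baseline) / baseline


def fluctuation_variance(sample: float, baseline: float, std_dev: float) -> float:
    """Fluctuation variance = (Sample - Baseline) / Standard deviation (numeric fields only)."""
    if std_dev == 0:
        raise ValueError("standard deviation must be non-zero")
    return (sample - baseline) / std_dev


def compared_value(f: float, method: str) -> float:
    """Map a signed fluctuation to the value that is checked against the thresholds.

    Assumption: Raise only counts increases and Drop only counts decreases.
    """
    if method == "Absolute Value":
        return abs(f)
    if method == "Raise":
        return max(f, 0.0)
    if method == "Drop":
        return max(-f, 0.0)
    raise ValueError(f"unknown comparison method: {method}")


def classify(value: float, warning: float | None = None, error: float | None = None) -> str:
    """Return "normal", "warning", or "error" following the threshold rules."""
    if warning is None and error is None:
        warning, error = DEFAULT_WARNING, DEFAULT_ERROR
    if error is not None and value > error:
        return "error"
    if warning is not None and value > warning:
        return "warning"
    return "normal"


if __name__ == "__main__":
    # Row count was 1,000 yesterday (baseline) and 1,200 today (sample): +20%.
    f = fluctuation(1200, 1000)
    print(classify(compared_value(f, "Absolute Value")))  # warning under the defaults
```

With the same +20% change, the Drop method yields 0 and the result is normal, because only decreases are counted under the stated assumption.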

Create a custom rule

If template rules do not meet your requirements for monitoring the data quality, you can create custom rules.

  1. On the Monitoring Rules page, find a table and click View Monitoring Rules in the Actions column.
  2. On the rule configuration page that appears, click the partition filter expression for which you want to configure a custom rule. Then, click Create rules. The Template Rules tab appears in the Create rules panel.
  3. Click the Custom Rules tab.
    On the Custom Rules tab, click Add Monitoring Rule or Quick Create.
    • Click Add Monitoring Rule.
      You can select All Fields in Table, SQL Statement, or a specific field for the Field parameter.
      • If you select All Fields in Table or a specific field for the Field parameter, set the parameters as described in the following table.
        Parameter Description
        Rule Name The name of the rule.
        Rule Type The type of the rule. Valid values: Strong and Soft.
        • If you select Strong, error alerts are reported and descendant nodes are blocked, whereas warning alerts are reported but descendant nodes are not blocked.
        • If you select Soft, error alerts are reported but descendant nodes are not blocked, and warning alerts are not reported.
        Field The fields to be monitored. In this example, select All Fields in Table. If you select All Fields in Table, you can use the WHERE clause to customize filter conditions based on business requirements.
        Sampling Method The statistical function of the rule. Valid values: count and count/table_count.
        Filter The filter condition of the rule. For example, if you want to query the partitions of the table based on a specific data timestamp, you can specify pt=$[yyyymmdd-1] as the filter condition.
        Check type The threshold type of the rule. Valid values: Numeric type, Fluctuation, and Auto-Generated Threshold.
        Note: You can select Auto-Generated Threshold only in DataWorks Enterprise Edition or more advanced editions.
        Comparison Method The comparison method of the rule. The comparison methods that can be selected vary based on the threshold type.
        • If you set the Check type parameter to Numeric type, the valid values of this parameter are Greater Than, Greater Than or Equal To, Equal To, Unequal To, Less Than, and Less Than or Equal To.
        • If you set the Check type parameter to Fluctuation, the valid values of this parameter are Absolute Value, Raise, and Drop.
        Verification Method The verification method of the rule. The verification methods that can be selected vary based on the threshold type.
        • If you set the Check type parameter to Numeric type, you can set this parameter only to Compare with a specified value.
        • If you set the Check type parameter to Fluctuation, the valid values of this parameter are:
          • Compare the current value with the average value of the last 7 days
          • Compare the current value with the average value of the last 30 days
          • Compare the current value with the value 1 day before
          • Compare the current value with the value 7 days before
          • Compare the current value with the value 30 days before
          • The variance between the current value and the value 7 days before
          • The variance between the current value and the value 30 days before
          • Compare with the value 1, 7, and 30 days before
          • Compare with the value of the previous cycle
        Expected Value The expected value of the rule. If you set the Check type parameter to Numeric type, you must specify an expected value.
        Thresholds The warning threshold and error threshold of the fluctuation. If you set the Check type parameter to Fluctuation, you must specify a warning threshold and an error threshold for the fluctuation. You can enter thresholds or adjust the slider to specify thresholds.
        Description The description of the rule.
      • If you select SQL Statement for the Field parameter, set the parameters as described in the following table.
        Parameter Description
        Rule Name The name of the rule.
        Rule Type The type of the rule. Valid values: Strong and Soft.
        • If you select Strong, error alerts are reported and descendant nodes are blocked, whereas warning alerts are reported but descendant nodes are not blocked.
        • If you select Soft, error alerts are reported but descendant nodes are not blocked, and warning alerts are not reported.
        Field The fields to be monitored. If you select SQL Statement, you can write custom SQL logic for the rule. The statement must return a single value, that is, one row and one column.
        Sampling Method The statistical function of the rule. You can set this parameter only to SQL Statement.
        Set Flag The SET clause of the SQL statement that is used to query the fields to be monitored.
        Custom SQL The SQL statement that is used to query the fields to be monitored. The statement must return a single value (one row and one column).

        In the SQL statement, enclose the partition filter expression in brackets [].

        Check type The threshold type of the rule. Valid values: Numeric type and Fluctuation.
        Comparison Method The comparison method of the rule. The comparison methods that can be selected vary based on the threshold type.
        • If you set the Check type parameter to Numeric type, the valid values of this parameter are Greater Than, Greater Than or Equal To, Equal To, Unequal To, Less Than, and Less Than or Equal To.
        • If you set the Check type parameter to Fluctuation, the valid values of this parameter are Absolute Value, Raise, and Drop.
        Verification Method The verification method of the rule. The verification methods that can be selected vary based on the threshold type.
        • If you set the Check type parameter to Numeric type, you can set this parameter only to Compare with a specified value.
        • If you set the Check type parameter to Fluctuation, the valid values of this parameter are:
          • Compare the current value with the average value of the last 7 days
          • Compare the current value with the average value of the last 30 days
          • Compare the current value with the value 1 day before
          • Compare the current value with the value 7 days before
          • Compare the current value with the value 30 days before
          • The variance between the current value and the value 7 days before
          • The variance between the current value and the value 30 days before
          • Compare with the value 1, 7, and 30 days before
          • Compare with the value of the previous cycle
        Expected Value The expected value of the rule. If you set the Check type parameter to Numeric type, you must specify an expected value.
        Thresholds The warning threshold and error threshold of the fluctuation. If you set the Check type parameter to Fluctuation, you must specify a warning threshold and an error threshold for the fluctuation. You can enter thresholds or adjust the slider to specify thresholds.
        Description The description of the rule.
    • Click Quick Create. Set the parameters as described in the following table.
      Parameter Description
      Rule Name The name of the rule.
      Trigger The trigger condition of the rule. You can select only Values Duplicated in Multiple Fields.
      Field The fields to be monitored.
  4. Click Batch Create.
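The Filter parameter accepts scheduling expressions such as pt=$[yyyymmdd-1]. As a rough illustration only, the following sketch resolves $[yyyymmdd] and $[yyyymmdd-N] against a reference date that you pass in. The helper resolve_filter is hypothetical: DataWorks resolves these expressions itself, supports more formats, and determines the reference time from the scheduling configuration, so do not rely on this sketch to predict exact partition values.

```python
from __future__ import annotations

import re
from datetime import date, timedelta

# Hypothetical pattern: matches only $[yyyymmdd] and $[yyyymmdd-N].
_DAY_EXPR = re.compile(r"\$\[yyyymmdd(?:-(\d+))?\]")


def resolve_filter(expr: str, reference_date: date) -> str:
    """Replace each $[yyyymmdd-N] with reference_date minus N days, formatted as YYYYMMDD."""

    def _sub(match: re.Match) -> str:
        offset_days = int(match.group(1) or 0)
        return (reference_date - timedelta(days=offset_days)).strftime("%Y%m%d")

    return _DAY_EXPR.sub(_sub, expr)


if __name__ == "__main__":
    print(resolve_filter("pt=$[yyyymmdd-1]", date(2021, 3, 1)))  # pt=20210228
```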
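For custom rules, the Numeric type comparison methods and several of the Fluctuation verification methods can be modeled as follows. This is a sketch under stated assumptions: a Numeric type check is assumed to pass when the comparison against the expected value holds, the history list is assumed to be ordered from the most recent day backward, and method keys such as avg_7d are hypothetical shorthands, not DataWorks identifiers. The resulting baseline is the value that the fluctuation formula in the template rule section divides by.

```python
from __future__ import annotations

import operator
from statistics import mean

# Comparison Method values available when Check type is Numeric type.
NUMERIC_COMPARATORS = {
    "Greater Than": operator.gt,
    "Greater Than or Equal To": operator.ge,
    "Equal To": operator.eq,
    "Unequal To": operator.ne,
    "Less Than": operator.lt,
    "Less Than or Equal To": operator.le,
}


def numeric_check_passes(value: float, comparison: str, expected: float) -> bool:
    """Assumption: the check passes when `value <comparison> expected` holds."""
    return NUMERIC_COMPARATORS[comparison](value, expected)


def baseline(history: list[float], method: str) -> float:
    """Baseline for a Fluctuation check.

    history[0] is the value 1 day before, history[1] the value 2 days before, and so on.
    Method keys are hypothetical shorthands for some Verification Method options.
    """
    windows = {"avg_7d": 7, "avg_30d": 30}  # average of the last N days
    offsets = {"1d_before": 1, "7d_before": 7, "30d_before": 30}  # value N days before
    if method in windows:
        n = windows[method]
        if len(history) < n:
            raise ValueError(f"need {n} days of history")
        return mean(history[:n])
    if method in offsets:
        n = offsets[method]
        if len(history) < n:
            raise ValueError(f"need {n} days of history")
        return history[n - 1]
    raise ValueError(f"unsupported method: {method}")


if __name__ == "__main__":
    # Fixed-value check: the row count must be greater than 0.
    print(numeric_check_passes(0, "Greater Than", 0))  # False
    print(baseline([100, 110, 90, 100, 105, 95, 100], "avg_7d"))
```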
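The count and count/table_count sampling methods, and the Quick Create trigger Values Duplicated in Multiple Fields, can be illustrated on in-memory rows. The helpers below are hypothetical. They assume that count is the number of rows that match the filter, that table_count is the total number of rows, and that a duplicate is any combination of the selected field values that appears more than once. Data Quality computes these statistics in MaxCompute, not in Python.

```python
from __future__ import annotations

from collections import Counter
from typing import Any, Callable, Iterable, Mapping

Row = Mapping[str, Any]


def count_ratio(rows: list[Row], predicate: Callable[[Row], bool]) -> tuple[int, float]:
    """Return (count, count/table_count) for rows that satisfy a filter.

    Assumption: table_count is the total number of rows scanned.
    """
    table_count = len(rows)
    count = sum(1 for row in rows if predicate(row))
    return count, (count / table_count if table_count else 0.0)


def duplicated_combinations(rows: Iterable[Row], fields: list[str]) -> dict[tuple, int]:
    """Return each combination of `fields` values that occurs more than once, with its count."""
    counts = Counter(tuple(row[f] for f in fields) for row in rows)
    return {combo: n for combo, n in counts.items() if n > 1}


if __name__ == "__main__":
    orders = [
        {"user_id": 1, "order_id": "a"},
        {"user_id": 1, "order_id": "a"},
        {"user_id": 2, "order_id": "b"},
    ]
    print(duplicated_combinations(orders, ["user_id", "order_id"]))  # {(1, 'a'): 2}
```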