Data Quality allows you to configure rules for monitoring data in E-MapReduce, AnalyticDB for PostgreSQL, MaxCompute, and Datahub data stores. This topic describes how to configure a rule for monitoring data in a MaxCompute data store.

Go to the Monitoring Rules page

  1. Log on to the DataWorks console.
  2. In the left-side navigation pane, click Workspaces.
  3. Select the region of the workspace, and click Data Analytics.
  4. On the DataStudio page that appears, click the icon in the upper-left corner and choose All Products > Data Quality.
  5. On the page that appears, click Monitoring Rules in the left-side navigation pane.
  6. On the Monitoring Rules page that appears, select MaxCompute from the Engine/Data Source drop-down list and select a project from the Engine/Database Instance drop-down list. A MaxCompute data store is specified.
    Data Quality supports E-MapReduce, AnalyticDB for PostgreSQL, MaxCompute, and Datahub data stores.
    • If you specify an E-MapReduce, an AnalyticDB for PostgreSQL, or a MaxCompute data store, all tables in the data store appear.
    • If you specify a Datahub data store, all topics in the data store appear.
  7. Find the target table and click View Monitoring Rules in the Actions column. The rule configuration page for the table appears.
    Data Quality allows you to configure template rules and custom rules for a table.
    Notice Before configuring a template rule for a table, you must configure a partition filter expression. For more information, see Configure a partition filter expression.

Create a template rule

  1. On the Monitoring Rules page, click View Monitoring Rules in the Actions column of a table. The rule configuration page for the table appears.
  2. Select a partition filter expression in the Partition Expression section and click Create rules. By default, the Template Rules tab appears in the Create rules pane.
    You can click Add Monitoring Rule or Quick Create to create a template rule.
    • If you click Add Monitoring Rule:
      Set parameters as described in the following table. In this example, set Rule Source to Built-in Template.
      Parameter Description
      Rule Name The name of the rule.
      Rule Type The type of the rule. Valid values: Strong and Weak.
      • If you select Strong, error alerts are reported and descendant nodes are blocked, whereas warning alerts are reported but descendant nodes are not blocked.
      • If you select Weak, error alerts are reported but descendant nodes are not blocked, whereas warning alerts are not reported and descendant nodes are not blocked.
      Auto-Generated Threshold Specifies whether to use dynamic thresholds. Turn on this switch based on your needs.
      Note The Auto-Generated Threshold switch is only available in DataWorks Enterprise Edition or higher.
      Rule Source The source of the rule. Valid values: Built-in Template and Rule Templates.
      If you select Rule Templates, you must specify a rule template. For more information, see Manage rule templates.
      Note You can only select Rule Templates in DataWorks Enterprise Edition or higher.
      Field The fields to be monitored. You can select All Fields in Table or specify fields. If you specify fields, you can specify fields of a numeric type or non-numeric type.
      Template The template to apply to the rule. Currently, Data Quality supports 43 rule templates. For more information, see Built-in rule templates for offline data.
      Note Field-specific rules for the average value, accumulated value, minimum value, and maximum value can be set only for numeric fields.
      Comparison Method The comparison method of the rule. Valid values: Absolute Value, Raise, and Drop.
      Thresholds
      • You can calculate the fluctuation by using the following formula:
        Fluctuation = (Sample - Baseline)/Baseline.
        • Sample: the sample value for the current N days. For example, if you want to check the fluctuation of table rows on an SQL node in a day, the sample is the number of table rows on that day.
        • Baseline: the comparison value from the previous N days. Examples:
          • If you want to check the fluctuation of table rows on an SQL node in a day, the baseline is the number of table rows on the previous day.
          • If you want to check the fluctuation of the average number of table rows on an SQL node in seven days, the baseline is the average number of table rows in the last seven days.
      • You can only calculate the fluctuation variance for numeric fields such as Bigint and Double fields. The formula is as follows:

        Fluctuation variance = (Sample - Baseline)/Standard deviation

      You can set Warning Threshold and Error Threshold to monitor data at different severities:
      • If the fluctuation does not exceed the warning threshold, Data Quality determines that data is normal.
      • If the fluctuation exceeds the warning threshold but does not exceed the error threshold, Data Quality reports a warning alert.
      • If the fluctuation exceeds the error threshold, Data Quality reports an error alert.
      • If you do not specify the warning threshold, Data Quality reports either an error alert or a normal result.
      • If you do not specify the error threshold, Data Quality reports either a warning alert or a normal result.
      • Specify at least one of the two thresholds. If you specify neither, Data Quality applies the default values: 10% for the warning threshold and 50% for the error threshold.
      Description The description of the rule.
    • If you click Quick Create:
      Set parameters as described in the following table.
      Parameter Description
      Rule Name The name of the rule.
      Field The fields to be monitored. You can select All Fields in Table or specify fields. If you specify fields, you can specify fields of a numeric type or non-numeric type.
      Trigger The trigger condition of the rule. Valid values: The number of columns is greater than 0 and Table row number dynamic threshold.
      Note You can only select Table row number dynamic threshold in DataWorks Enterprise Edition or higher.
  3. Click Batch Create.
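
The fluctuation formulas, the threshold behavior, and the Strong and Weak rule types described in this section can be sketched in Python. This is a minimal illustration and not how Data Quality is implemented. The function names are hypothetical, thresholds are written as fractions (0.10 for 10%), and the sketch assumes that the magnitude of the fluctuation is what is compared against the thresholds.

```python
from typing import Optional, Tuple


def fluctuation(sample: float, baseline: float) -> float:
    """Fluctuation = (Sample - Baseline) / Baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (sample - baseline) / baseline


def fluctuation_variance(sample: float, baseline: float, std_dev: float) -> float:
    """Fluctuation variance = (Sample - Baseline) / Standard deviation."""
    if std_dev == 0:
        raise ValueError("standard deviation must be non-zero")
    return (sample - baseline) / std_dev


def classify(value: float,
             warning: Optional[float] = None,
             error: Optional[float] = None) -> str:
    """Return 'normal', 'warning', or 'error' for a fluctuation value."""
    if warning is None and error is None:
        # Defaults stated in the text: 10% warning, 50% error.
        warning, error = 0.10, 0.50
    magnitude = abs(value)  # assumption: compare the magnitude only
    if error is not None and magnitude > error:
        return "error"
    if warning is not None and magnitude > warning:
        return "warning"
    return "normal"


def outcome(rule_type: str, alert: str) -> Tuple[bool, bool]:
    """Return (alert_reported, descendant_nodes_blocked)."""
    if alert == "error":
        # Errors are always reported; only Strong rules block descendants.
        return True, rule_type == "Strong"
    if alert == "warning":
        # Warnings are reported for Strong rules only and never block.
        return rule_type == "Strong", False
    return False, False
```

For example, a table that grows from 100 rows to 120 rows has a fluctuation of 0.2. With the default thresholds this is classified as a warning, which a Strong rule reports without blocking descendant nodes and a Weak rule does not report.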

Create a custom rule

If template rules do not meet your requirements for monitoring the data quality, you can create custom rules.

  1. On the Monitoring Rules page, click View Monitoring Rules in the Actions column of a table. The rule configuration page for the table appears.
  2. Select a partition filter expression in the Partition Expression section and click Create rules. By default, the Template Rules tab appears in the Create rules pane.
  3. Click Custom Rules.
    You can click Add Monitoring Rule or Quick Create to create a custom rule.
    • If you click Add Monitoring Rule, set parameters as described in the following tables.
      You can select All Fields in Table, SQL Statement, or a specific field for the Field parameter.
      • If you select All Fields in Table or a specific field for Field, set parameters as described in the following table.
        Parameter Description
        Rule Name The name of the rule.
        Rule Type The type of the rule. Valid values: Strong and Weak.
        • If you select Strong, error alerts are reported and descendant nodes are blocked, whereas warning alerts are reported but descendant nodes are not blocked.
        • If you select Weak, error alerts are reported but descendant nodes are not blocked, whereas warning alerts are not reported and descendant nodes are not blocked.
        Field The fields to be monitored. In this example, select All Fields in Table. If you select All Fields in Table, you can specify the WHERE clause to customize filter conditions based on business requirements.
        Sampling Method The statistical function. Valid values: count and count/table_count.
        Filter The filter condition of the rule. For example, if you want to query partitions of the table based on a specific data timestamp, you can specify pt=$[yyyymmdd-1] as the filter condition.
        Check type The threshold type of the rule. Valid values: Numeric type, Fluctuation, and Auto-Generated Threshold.
        Note You can only select Auto-Generated Threshold in DataWorks Enterprise Edition or higher.
        Comparison Method The comparison method of the rule.
        • If you set Check type to Numeric type, valid values are Greater Than, Greater Than or Equal To, Equal To, Unequal To, Less Than, and Less Than or Equal To.
        • If you set Check type to Fluctuation, valid values are Absolute Value, Raise, and Drop.
        Verification Method The verification method of the rule.
        • If you set Check type to Numeric type, you can only set this parameter to Compare with a specified value.
        • If you set Check type to Fluctuation, valid values are Compare the current value with the average value of the last 7 days, Compare the current value with the average value of the last 30 days, Compare the current value with the value 1 day before, Compare the current value with the value 7 days before, Compare the current value with the value 30 days before, The variance between the current value and the value 7 days before, The variance between the current value and the value 30 days before, Compare with the value 1, 7, and 30 days before, and Compare with the value of the previous cycle.
        Expected Value The expected value of the rule. If you set Check type to Numeric type, you must specify an expected value.
        Thresholds The warning threshold and error threshold. If you set Check type to Fluctuation, you need to specify a warning threshold and an error threshold for the fluctuation. You can adjust the slider to specify thresholds or directly enter thresholds.
        Description The description of the rule.
      • If you select SQL Statement for Field, set parameters as described in the following table.
        Parameter Description
        Rule Name The name of the rule.
        Rule Type The type of the rule. Valid values: Strong and Weak.
        • If you select Strong, error alerts are reported and descendant nodes are blocked, whereas warning alerts are reported but descendant nodes are not blocked.
        • If you select Weak, error alerts are reported but descendant nodes are not blocked, whereas warning alerts are not reported and descendant nodes are not blocked.
        Field The fields to be monitored. If you select SQL Statement, you can customize the SQL logic to set a rule. The statement must return a single value, that is, one row of one column.
        Sampling Method The statistical function. You can only set this parameter to SQL Statement.
        Set Flag The SET clause of the SQL statement for querying the field to be monitored.
        Custom SQL The SQL statement for querying the field to be monitored. The statement must return a single value, that is, one row of one column.
        Check type The threshold type of the rule. Valid values: Numeric type and Fluctuation.
        Comparison Method The comparison method of the rule.
        • If you set Check type to Numeric type, valid values are Greater Than, Greater Than or Equal To, Equal To, Unequal To, Less Than, and Less Than or Equal To.
        • If you set Check type to Fluctuation, valid values are Absolute Value, Raise, and Drop.
        Verification Method The verification method of the rule.
        • If you set Check type to Numeric type, you can only set this parameter to Compare with a specified value.
        • If you set Check type to Fluctuation, valid values are Compare the current value with the average value of the last 7 days, Compare the current value with the average value of the last 30 days, Compare the current value with the value 1 day before, Compare the current value with the value 7 days before, Compare the current value with the value 30 days before, The variance between the current value and the value 7 days before, The variance between the current value and the value 30 days before, Compare with the value 1, 7, and 30 days before, and Compare with the value of the previous cycle.
        Expected Value The expected value of the rule. If you set Check type to Numeric type, you must specify an expected value.
        Thresholds The warning threshold and error threshold. If you set Check type to Fluctuation, you need to specify a warning threshold and an error threshold for the fluctuation. You can adjust the slider to specify thresholds or directly enter thresholds.
        Description The description of the rule.
    • If you click Quick Create, set parameters as described in the following table.
      Parameter Description
      Rule Name The name of the rule.
      Rule Type The type of the rule. You can only select Values Duplicated in Multiple Fields.
      Field The fields to be monitored.
  4. Click Batch Create.
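
To make the Numeric type check and the Filter parameter of a custom rule more concrete, the following Python sketch maps each Comparison Method value to an operator and expands a filter such as pt=$[yyyymmdd-1] into a concrete partition value. It is an illustration under stated assumptions, not the service's implementation: the function names are hypothetical, the offset in $[yyyymmdd-N] is assumed to be counted in days, and the base date that Data Quality actually uses is not stated in this topic, so it is passed in explicitly.

```python
import operator
import re
from datetime import date, timedelta

# Comparison Method values for Check type = Numeric type, as listed above.
COMPARISONS = {
    "Greater Than": operator.gt,
    "Greater Than or Equal To": operator.ge,
    "Equal To": operator.eq,
    "Unequal To": operator.ne,
    "Less Than": operator.lt,
    "Less Than or Equal To": operator.le,
}


def check_numeric(value: float, method: str, expected: float) -> bool:
    """Return True if the sampled value satisfies the comparison
    against the expected value."""
    return COMPARISONS[method](value, expected)


# Matches $[yyyymmdd], $[yyyymmdd-1], $[yyyymmdd+2], and so on.
_DATE_EXPR = re.compile(r"\$\[yyyymmdd(?:([+-])(\d+))?\]")


def resolve_filter(expression: str, base_date: date) -> str:
    """Replace each $[yyyymmdd±N] with base_date shifted by N days.

    Assumption: N is a number of days and base_date is supplied by
    the caller; the real base date is decided by the service.
    """
    def expand(match: re.Match) -> str:
        days = int(match.group(2) or 0)
        if match.group(1) == "-":
            days = -days
        return (base_date + timedelta(days=days)).strftime("%Y%m%d")

    return _DATE_EXPR.sub(expand, expression)
```

With a base date of March 1, 2024, resolve_filter("pt=$[yyyymmdd-1]", date(2024, 3, 1)) yields the partition filter for the previous day, and check_numeric(5, "Greater Than", 0) evaluates a count of 5 against an expected value of 0.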