Data Quality allows you to configure monitoring rules for data in E-MapReduce, Hologres, AnalyticDB for PostgreSQL, MaxCompute, and DataHub data stores. This topic describes how to configure a rule for monitoring data in a MaxCompute data store.

Go to the Monitoring Rules page

  1. Log on to the DataWorks console.
  2. In the left-side navigation pane, click Workspaces.
  3. After you select the region where the required workspace resides, find the workspace and click Data Analytics.
  4. Click the icon in the upper-left corner and choose All Products > Data governance > Data Quality.
  5. In the left-side navigation pane, click Monitoring Rules.
  6. Select MaxCompute from the Engine/Data Source drop-down list and select a MaxCompute project from the Engine/Database Instance drop-down list.
    Data Quality supports E-MapReduce, Hologres, AnalyticDB for PostgreSQL, MaxCompute, and DataHub data stores.
    • If you select an E-MapReduce, a Hologres, an AnalyticDB for PostgreSQL, or a MaxCompute data store, all tables in the data store are displayed.
    • If you select a DataHub data store, all topics in the data store are displayed.
  7. Find a table and click View Monitoring Rules.
    Data Quality allows you to configure template rules and custom rules.
    Notice: Before you configure a template rule, you must configure a partition filter expression. For more information, see Configure a partition filter expression.
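
To make the idea of a partition filter expression more concrete, the following minimal sketch resolves an expression such as pt=$[yyyymmdd-1] (the example used for the Filter parameter later in this topic) into a concrete partition value. It assumes that $[yyyymmdd-1] stands for the scheduled date minus one day, formatted as yyyymmdd. The function resolve_partition_filter is hypothetical and only illustrates the substitution; DataWorks resolves these expressions itself.

```python
import re
from datetime import date, timedelta

# Matches $[yyyymmdd], $[yyyymmdd-N], and $[yyyymmdd+N].
# Assumption: only this day-level token form is handled in this sketch.
_TOKEN = re.compile(r"\$\[yyyymmdd(?:([+-])(\d+))?\]")


def resolve_partition_filter(expression: str, scheduled: date) -> str:
    """Replace each $[yyyymmdd±N] token with a concrete yyyymmdd string.

    Hypothetical helper for illustration; not part of any DataWorks API.
    """
    def _substitute(match: re.Match) -> str:
        sign, days = match.group(1), match.group(2)
        offset = int(days) if days else 0
        if sign == "-":
            offset = -offset
        return (scheduled + timedelta(days=offset)).strftime("%Y%m%d")

    return _TOKEN.sub(_substitute, expression)


print(resolve_partition_filter("pt=$[yyyymmdd-1]", date(2024, 5, 10)))
# prints pt=20240509
```

Under this assumption, a rule that runs on May 10, 2024 checks the partition whose pt value is 20240509.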

Create a template rule

  1. Find a table and click View Monitoring Rules to go to the Monitoring Rules page of the table.
  2. Click Create rules. The Template Rules tab appears in a panel.
    To create a template rule, you can click Add Monitoring Rule or Quick Create.
    • Add Monitoring Rule
      Click Add Monitoring Rule. The following table describes the parameters that are displayed if you set the Rule Source parameter to Built-in Template.
      Parameter Description
      Rule Name The name of the rule.
      Rule Type Valid values: Strong and Soft.
      • If you select Strong, error alerts are reported and descendant nodes are blocked, whereas warning alerts are reported but descendant nodes are not blocked.
      • If you select Soft, error alerts are reported but descendant nodes are not blocked, whereas warning alerts are not reported and descendant nodes are not blocked.
      Auto-Generated Threshold Specifies whether to use dynamic thresholds. Set this parameter as needed.
      Notice: You can use the dynamic threshold feature only in DataWorks Enterprise Edition or more advanced editions.
      Rule Source Valid values: Built-in Template and Rule Templates.
      If you select Rule Templates, you must specify a rule template. For more information, see Create, manage, and use rule templates.
      Notice: You can select Rule Templates only in DataWorks Enterprise Edition or more advanced editions.
      Field You can select All Fields in Table or a specific field of a numeric type or non-numeric type.
      Template Data Quality supports 43 rule templates. You can select only the rule templates that are displayed. For more information, see Built-in rule templates for offline data.
      Note: You can set field-specific rules for the average value, accumulated value, minimum value, and maximum value only for numeric fields.
      Comparison Method Valid values: Absolute Value, Raise, and Drop.
      Thresholds
      • Calculate the fluctuation.
        You can calculate the fluctuation by using the following formula: Fluctuation = (Sample - Baseline)/Baseline.
        • Sample

          The sample value for the current day. For example, if you want to check the fluctuation of table rows on an SQL node in a day, the sample is the number of table rows on that day.

        • Baseline
          The comparison value from the previous N days. Examples:
          • If you want to check the fluctuation of table rows on an SQL node in a day, the baseline is the number of table rows on the previous day.
          • If you want to check the average fluctuation of table rows on an SQL node in seven days, the baseline is the average number of table rows in the last seven days.
      • Calculate the fluctuation variance.

        You can calculate the fluctuation variance only for numeric fields such as BIGINT and DOUBLE fields by using the following formula: Fluctuation variance = (Sample - Average value of past N days)/Standard deviation.

      You can specify the warning threshold and error threshold of the fluctuation to monitor data at different severities:
      • If the absolute value of the fluctuation does not exceed the warning threshold, the data is considered normal.
      • If the absolute value of the fluctuation exceeds the warning threshold but does not exceed the error threshold, a warning alert is reported.
      • If the absolute value of the fluctuation exceeds the error threshold, an error alert is reported.
      Description The description of the rule.
    • Quick Create
      Click Quick Create. Set the parameters as required.
      Parameter Description
      Rule Name The name of the rule.
      Field You can select All Fields in Table or a specific field of a numeric type or non-numeric type.
      Trigger Valid values: The number of columns is greater than 0 and Table row number dynamic threshold.
      Notice: You can select Table row number dynamic threshold only in DataWorks Enterprise Edition or more advanced editions.
  3. Click Batch Create.
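
The threshold logic described for the Thresholds parameter can be sketched in a few lines of Python. This is a minimal illustration, not the Data Quality implementation. The function names are hypothetical, thresholds are written as fractions (0.1 for 10%), and the population standard deviation is used for the fluctuation variance; all three are assumptions. The soft flag mirrors the Rule Type setting: soft=True corresponds to Soft, and soft=False corresponds to the other, blocking value.

```python
from statistics import mean, pstdev


def fluctuation(sample: float, baseline: float) -> float:
    """Fluctuation = (Sample - Baseline) / Baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (sample - baseline) / baseline


def fluctuation_variance(sample: float, history: list[float]) -> float:
    """(Sample - average value of past N days) / standard deviation."""
    spread = pstdev(history)  # assumption: population standard deviation
    if spread == 0:
        raise ValueError("history has zero standard deviation")
    return (sample - mean(history)) / spread


def classify(value: float, warning: float, error: float) -> str:
    """Map a fluctuation to 'normal', 'warning', or 'error' by absolute value."""
    magnitude = abs(value)
    if magnitude <= warning:
        return "normal"
    if magnitude <= error:
        return "warning"
    return "error"


def alert_action(level: str, soft: bool) -> tuple[bool, bool]:
    """Return (alert_reported, descendant_nodes_blocked) for a check result."""
    if level == "error":
        return True, not soft   # errors are always reported; only non-soft rules block
    if level == "warning":
        return not soft, False  # soft rules do not report warnings; warnings never block
    return False, False


# 1,200 rows today against 1,000 rows on the previous day.
level = classify(fluctuation(1200, 1000), warning=0.1, error=0.5)
print(level)
# prints warning
print(alert_action(level, soft=False))
# prints (True, False)
```

In this example the row count rose by 20%, which is above the 10% warning threshold and below the 50% error threshold, so a warning alert is reported and descendant nodes keep running.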

Create a custom rule

If template rules cannot express the check that you need for the data selected by a partition filter expression, you can create custom rules.

  1. Find a table and click View Monitoring Rules to go to the Monitoring Rules page of the table.
  2. Click Create rules. The Template Rules tab appears in a panel.
  3. Click the Custom Rules tab.
    To create a custom rule, you can click Add Monitoring Rule or Quick Create.
    • Add Monitoring Rule
      You can select All Fields in Table, SQL Statement, or a specific field for the Field parameter.
      • Select All Fields in Table or a specific field.
        Parameter Description
        Rule Name The name of the rule.
        Rule Type Valid values: Strong and Soft.
        • If you select Strong, error alerts are reported and descendant nodes are blocked, whereas warning alerts are reported but descendant nodes are not blocked.
        • If you select Soft, error alerts are reported but descendant nodes are not blocked, whereas warning alerts are not reported and descendant nodes are not blocked.
        Field In this example, select All Fields in Table. If you select All Fields in Table, you can use the WHERE clause to customize filter conditions based on business requirements.
        Sampling Method Valid values: count and count/table_count.
        Filter The filter condition. For example, if you want to query the partitions of the table based on a specific data timestamp, you can specify pt=$[yyyymmdd-1] as the filter condition.
        Check type Valid values: Numeric type, Fluctuation, and Auto-Generated Threshold.
        Note: You can select Auto-Generated Threshold only in DataWorks Enterprise Edition or more advanced editions.
        Comparison Method The comparison methods that can be selected vary based on the check type.
        • If you set the Check type parameter to Numeric type, the valid values of the Comparison Method parameter are Greater Than, Greater Than or Equal To, Equal To, Unequal To, Less Than, and Less Than or Equal To.
        • If you set the Check type parameter to Fluctuation, the valid values of the Comparison Method parameter are Absolute Value, Raise, and Drop.
        Verification Method The verification methods that can be selected vary based on the check type.
        • If you set the Check type parameter to Numeric type, you can set the Verification Method parameter only to Compare with a specified value.
        • If you set the Check type parameter to Fluctuation, the valid values of the Verification Method parameter are:
          • Compare the current value with the average value of the last 7 days
          • Compare the current value with the average value of the last 30 days
          • Compare the current value with the value 1 day before
          • Compare the current value with the value 7 days before
          • Compare the current value with the value 30 days before
          • The variance between the current value and the value 7 days before
          • The variance between the current value and the value 30 days before
          • Compare with the value 1, 7, and 30 days before
          • Compare with the value of the previous cycle
        Expected Value If you set the Check type parameter to Numeric type, you must specify an expected value.
        Thresholds If you set the Check type parameter to Fluctuation, you must specify a warning threshold and an error threshold for the fluctuation. You can enter thresholds or adjust the slider to specify thresholds.
        Description The description of the rule.
      • Select SQL Statement.
        Parameter Description
        Rule Name The name of the rule.
        Rule Type Valid values: Strong and Soft.
        • If you select Strong, error alerts are reported and descendant nodes are blocked, whereas warning alerts are reported but descendant nodes are not blocked.
        • If you select Soft, error alerts are reported but descendant nodes are not blocked, whereas warning alerts are not reported and descendant nodes are not blocked.
        Field If you select SQL Statement, you can customize the SQL logic. The statement must return a single value, that is, one row that contains one column.
        Sampling Method You can set this parameter only to SQL Statement.
        Set Flag The SET clause of the SQL statement to be used.
        Custom SQL The SQL statement to be used. The statement must return a single value, that is, one row that contains one column.

        In the SQL statement, enclose the partition filter expression in brackets [].

        Check type Valid values: Numeric type and Fluctuation.
        Comparison Method The comparison methods that can be selected vary based on the check type.
        • If you set the Check type parameter to Numeric type, the valid values of the Comparison Method parameter are Greater Than, Greater Than or Equal To, Equal To, Unequal To, Less Than, and Less Than or Equal To.
        • If you set the Check type parameter to Fluctuation, the valid values of the Comparison Method parameter are Absolute Value, Raise, and Drop.
        Verification Method The verification methods that can be selected vary based on the check type.
        • If you set the Check type parameter to Numeric type, you can set the Verification Method parameter only to Compare with a specified value.
        • If you set the Check type parameter to Fluctuation, the valid values of the Verification Method parameter are:
          • Compare the current value with the average value of the last 7 days
          • Compare the current value with the average value of the last 30 days
          • Compare the current value with the value 1 day before
          • Compare the current value with the value 7 days before
          • Compare the current value with the value 30 days before
          • The variance between the current value and the value 7 days before
          • The variance between the current value and the value 30 days before
          • Compare with the value 1, 7, and 30 days before
          • Compare with the value of the previous cycle
        Expected Value If you set the Check type parameter to Numeric type, you must specify an expected value.
        Thresholds If you set the Check type parameter to Fluctuation, you must specify a warning threshold and an error threshold for the fluctuation. You can enter thresholds or adjust the slider to specify thresholds.
        Description The description of the rule.
    • Quick Create
      Parameter Description
      Rule Name The name of the rule.
      Trigger You can select only Values Duplicated in Multiple Fields.
      Field The fields to be monitored.
  4. Click Batch Create.
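
The Numeric type check for a custom rule can be illustrated with a short sketch. It rests on several assumptions that this topic does not state: count is read as the number of rows that match the filter, count/table_count as that number divided by the total number of rows in the table, and a check is treated as passed when the selected comparison holds between the sampled value and the expected value. The function names are hypothetical.

```python
import operator

# The six Comparison Method values that are listed for the Numeric type check.
COMPARISONS = {
    "Greater Than": operator.gt,
    "Greater Than or Equal To": operator.ge,
    "Equal To": operator.eq,
    "Unequal To": operator.ne,
    "Less Than": operator.lt,
    "Less Than or Equal To": operator.le,
}


def sample_value(matching_rows: int, table_rows: int, method: str) -> float:
    """Compute the sampled value for the count or count/table_count method.

    Assumption: matching_rows is the number of rows that satisfy the filter.
    """
    if method == "count":
        return float(matching_rows)
    if method == "count/table_count":
        if table_rows == 0:
            raise ValueError("table_rows must be non-zero")
        return matching_rows / table_rows
    raise ValueError(f"unknown sampling method: {method}")


def numeric_check_passes(sample: float, comparison_method: str, expected: float) -> bool:
    """Assumption: the check passes when the comparison holds."""
    return COMPARISONS[comparison_method](sample, expected)


# 25 of 1,000 rows match the filter; expect the share to stay below 5%.
share = sample_value(25, 1000, "count/table_count")
print(numeric_check_passes(share, "Less Than", 0.05))
# prints True
```

Keeping the comparison methods in a lookup table makes it easy to see that each option is a plain two-operand comparison against the expected value.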