Data Quality allows you to configure monitoring rules for data in E-MapReduce (EMR), Hologres, AnalyticDB for PostgreSQL, MaxCompute, and DataHub data sources. This topic describes how to configure a rule for monitoring data in a MaxCompute data source.

Go to the Monitoring Rules page

  1. Log on to the DataWorks console.
  2. In the left-side navigation pane, click Workspaces.
  3. After you select the region in which the workspace that you want to manage resides, find the workspace and click Data Analytics in the Actions column.
  4. Click the icon in the upper-left corner and choose All Products > Data governance > Data Quality.
  5. In the left-side navigation pane, click Monitoring Rules.
  6. Select MaxCompute from the Engine/Data Source drop-down list and select a MaxCompute project from the Engine/Database Instance drop-down list.
    Data Quality supports EMR, Hologres, AnalyticDB for PostgreSQL, MaxCompute, and DataHub data sources.
    • If you select an EMR, a Hologres, an AnalyticDB for PostgreSQL, or a MaxCompute data source, all tables in the data source are displayed.
    • If you select a DataHub data source, all topics in the data source are displayed.
  7. Find a table and click View Monitoring Rules in the Actions column.
    Data Quality allows you to configure template rules and custom rules.
    Notice Before you configure a template rule, you must configure a partition filter expression. For more information, see Configure a partition filter expression.
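A partition filter expression typically contains a date placeholder, such as the pt=$[yyyymmdd-1] filter used later in this topic. The following Python sketch illustrates one way such a placeholder could be expanded. It assumes that the placeholder is evaluated against a reference date and that -N or +N shifts that date by N days. The function name resolve_partition_expression and the choice of reference date are hypothetical and are not part of Data Quality.

```python
import re
from datetime import date, timedelta

# Matches $[yyyymmdd], $[yyyymmdd-1], $[yyyymmdd+2], and so on.
_PLACEHOLDER = re.compile(r"\$\[yyyymmdd(?:([+-])(\d+))?\]")


def resolve_partition_expression(expression: str, reference: date) -> str:
    """Expand date placeholders in a partition filter expression.

    Hypothetical helper: the reference date and the day-offset semantics
    are assumptions made for illustration only.
    """

    def _expand(match: re.Match) -> str:
        offset = int(match.group(2) or 0)
        if match.group(1) == "-":
            offset = -offset
        return (reference + timedelta(days=offset)).strftime("%Y%m%d")

    return _PLACEHOLDER.sub(_expand, expression)


if __name__ == "__main__":
    print(resolve_partition_expression("pt=$[yyyymmdd-1]", date(2024, 3, 1)))
```

Under these assumptions, a reference date of March 1, 2024 turns pt=$[yyyymmdd-1] into a filter on the partition for the previous day.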

Create a template rule

  1. Find a table and click View Monitoring Rules to go to the Monitoring Rules page of the table.
  2. Click Create rules. The Template Rules tab appears in the Create rules panel.
    To create a template rule, you can click Add Monitoring Rule or Quick Create.
    • Add Monitoring Rule
      Click Add Monitoring Rule. The following table describes the parameters that are displayed if you set the Rule Source parameter to Built-in Template.
      Parameter Description
      Rule Name The name of the rule.
      Rule Type The type of the rule. Valid values: Strong and Soft.
      • If you select Strong, error alerts are reported and descendant nodes are blocked, whereas warning alerts are reported but descendant nodes are not blocked.
      • If you select Soft, error alerts are reported but descendant nodes are not blocked, whereas warning alerts are not reported and descendant nodes are not blocked.
      Auto-Generated Threshold Specifies whether to use dynamic thresholds. Set this parameter as needed.
      Notice You can use the dynamic threshold feature only in DataWorks Enterprise Edition or more advanced editions.
      Rule Source The source of the rule. Valid values: Built-in Template and Rule Templates.
      If you select Rule Templates, you must specify a rule template. For more information, see Create, manage, and use rule templates.
      Notice You can select Rule Templates only in DataWorks Enterprise Edition or more advanced editions.
      Field The fields to be monitored. You can select All Fields in Table or a specific field of a numeric type or non-numeric type.
      Template The template that you want to apply to the rule. Data Quality supports 43 rule templates. You can select only the rule templates that are displayed. For more information, see Built-in rule templates for offline data.
      Note You can set field-specific rules of the average value, accumulated value, minimum value, and maximum value only for numeric fields.
      Comparison Method The comparison method of the rule. Valid values: Absolute Value, Raise, and Drop.
      Thresholds The warning threshold and error threshold of the fluctuation.
      • Calculate the fluctuation.
        You can calculate the fluctuation by using the following formula: Fluctuation = (Sample - Baseline)/Baseline.
        • Sample
          The sample value for the current day. For example, if you want to check the fluctuation of table rows on an SQL node in a day, the sample is the number of table rows on that day.
        • Baseline
          The comparison value from the previous N days. Examples:
          • If you want to check the fluctuation of table rows on an SQL node in a day, the baseline is the number of table rows on the previous day.
          • If you want to check the average fluctuation of table rows on an SQL node in seven days, the baseline is the average number of table rows in the last seven days.
      • Calculate the fluctuation variance.
        You can calculate the fluctuation variance only for numeric fields such as BIGINT and DOUBLE fields by using the following formula: Fluctuation variance = (Sample - Average value of previous N days)/Standard deviation.
      You can specify the warning threshold and error threshold of the fluctuation to monitor data at different severities:
      • If the absolute value of the fluctuation does not exceed the warning threshold, the data is considered to be normal.
      • If the absolute value of the fluctuation exceeds the warning threshold but does not exceed the error threshold, a warning alert is reported.
      • If the absolute value of the fluctuation exceeds the error threshold, an error alert is reported.
      Description The description of the rule.
      The following figure shows the logic of alerts and blocks.
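The fluctuation formulas, the threshold cases, and the Strong and Soft blocking behavior described for this rule can be sketched in Python as follows. This is an illustrative sketch, not the Data Quality implementation. It assumes that thresholds are expressed as fractions (0.1 for 10%), that the absolute value of the result is compared with the thresholds, and that the standard deviation is the population standard deviation of the previous N days. All function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Sequence


def fluctuation(sample: float, baseline: float) -> float:
    """Fluctuation = (Sample - Baseline) / Baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (sample - baseline) / baseline


def fluctuation_variance(sample: float, previous: Sequence[float]) -> float:
    """(Sample - average value of previous N days) / standard deviation."""
    deviation = pstdev(previous)  # assumption: population standard deviation
    if deviation == 0:
        raise ValueError("standard deviation must be non-zero")
    return (sample - mean(previous)) / deviation


def severity(value: float, warning: float, error: float) -> str:
    """Map a fluctuation to 'normal', 'warning', or 'error'."""
    magnitude = abs(value)
    if magnitude <= warning:
        return "normal"
    if magnitude <= error:
        return "warning"
    return "error"


def blocks_descendants(rule_type: str, level: str) -> bool:
    """Only an error alert on a Strong rule blocks descendant nodes."""
    return rule_type == "Strong" and level == "error"


if __name__ == "__main__":
    # 120 rows today compared with 100 rows on the previous day.
    change = fluctuation(120, 100)
    level = severity(change, warning=0.1, error=0.5)
    print(change, level, blocks_descendants("Strong", level))
```

In this example, a 20% increase falls between the warning threshold and the error threshold, so a warning alert is reported and descendant nodes are not blocked.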
    • Quick Create
      Click Quick Create. Set the parameters as required.
      Parameter Description
      Rule Name The name of the rule.
      Field The fields to be monitored. You can select All Fields in Table or a specific field of a numeric type or non-numeric type.
      Trigger The trigger condition of the rule. Valid values: The number of columns is greater than 0 and Table row number dynamic threshold.
      Notice You can select Table row number dynamic threshold only in DataWorks Enterprise Edition or more advanced editions.
  3. Click Batch Create.

Create a custom rule

If template rules do not meet your requirements for monitoring data quality based on a partition filter expression, you can create custom rules.

  1. Find a table and click View Monitoring Rules to go to the Monitoring Rules page of the table.
  2. Click Create rules. The Template Rules tab appears in the Create rules panel.
  3. Click the Custom Rules tab.
    To create a custom rule, you can click Add Monitoring Rule or Quick Create.
    • Add Monitoring Rule
      You can select All Fields in Table, SQL Statement, or a specific field for the Field parameter.
      • Select All Fields in Table or a specific field.
        Parameter Description
        Rule Name The name of the rule.
        Rule Type The type of the rule. Valid values: Strong and Soft.
        • If you select Strong, error alerts are reported and descendant nodes are blocked, whereas warning alerts are reported but descendant nodes are not blocked.
        • If you select Soft, error alerts are reported but descendant nodes are not blocked, whereas warning alerts are not reported and descendant nodes are not blocked.
        Field The fields to be monitored. In this example, select All Fields in Table. If you select All Fields in Table, you can use the WHERE clause to customize filter conditions based on business requirements.
        Sampling Method The statistical function of the rule. Valid values: count and count/table_count.
        Note A value of count/table_count indicates the ratio of the number of results obtained after filtering to the total number of table rows. The filtering is performed based on the filter conditions that you specify.
        Filter The filter conditions. For example, if you want to query the partitions of the table based on a specific data timestamp, you can specify pt=$[yyyymmdd-1] as a filter condition.
        Check type The threshold type of the rule. Valid values: Numeric type, Fluctuation, and Auto-Generated Threshold.
        Note You can select Auto-Generated Threshold only in DataWorks Enterprise Edition or more advanced editions.
        Comparison Method The comparison method of the rule. The comparison methods that can be selected vary based on the threshold type.
        • If you set the Check type parameter to Numeric type, the valid values of the Comparison Method parameter are Greater Than, Greater Than or Equal To, Equal To, Unequal To, Less Than, and Less Than or Equal To.
        • If you set the Check type parameter to Fluctuation, the valid values of the Comparison Method parameter are Absolute Value, Raise, and Drop.
        Verification Method The verification method of the rule. The verification methods that can be selected vary based on the threshold type.
        • If you set the Check type parameter to Numeric type, you can set the Verification Method parameter only to Compare with a specified value.
        • If you set the Check type parameter to Fluctuation, the valid values of the Verification Method parameter are:
          • Compare the current value with the average value of the last 7 days
          • Compare the current value with the average value of the last 30 days
          • Compare the current value with the value 1 day before
          • Compare the current value with the value 7 days before
          • Compare the current value with the value 30 days before
          • The variance between the current value and the value 7 days before
          • The variance between the current value and the value 30 days before
          • Compare with the value 1, 7, and 30 days before
          • Compare with the value of the previous cycle
        Expected Value The expected value of the rule. If you set the Check type parameter to Numeric type, you must specify an expected value.
        Thresholds The warning threshold and error threshold of the fluctuation. If you set the Check type parameter to Fluctuation, you must specify a warning threshold and an error threshold for the fluctuation. You can enter thresholds or adjust the slider to specify thresholds.
        Description The description of the rule.
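To make the count and count/table_count sampling methods and the Numeric type comparison methods more concrete, the following Python sketch evaluates them over an in-memory list of rows. It is a simplified illustration only: the row data, the predicate that stands in for the Filter parameter, and the function names sample_value and compare are all hypothetical, and the sketch does not state how Data Quality turns a comparison result into an alert.

```python
import operator
from typing import Callable, Iterable, Mapping

Row = Mapping[str, object]

# Comparison Method labels for the Numeric type check.
COMPARATORS = {
    "Greater Than": operator.gt,
    "Greater Than or Equal To": operator.ge,
    "Equal To": operator.eq,
    "Unequal To": operator.ne,
    "Less Than": operator.lt,
    "Less Than or Equal To": operator.le,
}


def sample_value(rows: Iterable[Row], matches: Callable[[Row], bool], method: str) -> float:
    """Return the sampled value for 'count' or 'count/table_count'."""
    all_rows = list(rows)
    matched = sum(1 for row in all_rows if matches(row))
    if method == "count":
        return matched
    if method == "count/table_count":
        if not all_rows:
            raise ValueError("table is empty")
        # Ratio of filtered rows to the total number of table rows.
        return matched / len(all_rows)
    raise ValueError(f"unsupported sampling method: {method}")


def compare(value: float, comparison: str, expected: float) -> bool:
    """Report whether 'value <comparison> expected' holds."""
    return COMPARATORS[comparison](value, expected)


if __name__ == "__main__":
    table = [{"amount": 5}, {"amount": -1}, {"amount": 7}, {"amount": 0}]
    ratio = sample_value(table, lambda row: row["amount"] > 0, "count/table_count")
    print(ratio, compare(ratio, "Less Than", 0.6))
```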
      • Select SQL Statement.
        Parameter Description
        Rule Name The name of the rule.
        Rule Type The type of the rule. Valid values: Strong and Soft.
        • If you select Strong, error alerts are reported and descendant nodes are blocked, whereas warning alerts are reported but descendant nodes are not blocked.
        • If you select Soft, error alerts are reported but descendant nodes are not blocked, whereas warning alerts are not reported and descendant nodes are not blocked.
        Field The fields to be monitored. If you select SQL Statement, you can customize the SQL logic for the rule. The statement must return a single value, that is, one row of one column.
        Sampling Method The statistical function of the rule. You can set this parameter only to SQL Statement.
        Set Flag The SET clause of the SQL statement to be used.
        Custom SQL The SQL statement to be used. The statement must return a single value, that is, one row of one column.
        In the SQL statement, enclose the partition filter expression in brackets [].
        Check type The threshold type of the rule. Valid values: Numeric type and Fluctuation.
        Comparison Method The comparison method of the rule. The comparison methods that can be selected vary based on the threshold type.
        • If you set the Check type parameter to Numeric type, the valid values of the Comparison Method parameter are Greater Than, Greater Than or Equal To, Equal To, Unequal To, Less Than, and Less Than or Equal To.
        • If you set the Check type parameter to Fluctuation, the valid values of the Comparison Method parameter are Absolute Value, Raise, and Drop.
        Verification Method The verification method of the rule. The verification methods that can be selected vary based on the threshold type.
        • If you set the Check type parameter to Numeric type, you can set the Verification Method parameter only to Compare with a specified value.
        • If you set the Check type parameter to Fluctuation, the valid values of the Verification Method parameter are:
          • Compare the current value with the average value of the last 7 days
          • Compare the current value with the average value of the last 30 days
          • Compare the current value with the value 1 day before
          • Compare the current value with the value 7 days before
          • Compare the current value with the value 30 days before
          • The variance between the current value and the value 7 days before
          • The variance between the current value and the value 30 days before
          • Compare with the value 1, 7, and 30 days before
          • Compare with the value of the previous cycle
        Expected Value The expected value of the rule. If you set the Check type parameter to Numeric type, you must specify an expected value.
        Thresholds The warning threshold and error threshold of the fluctuation. If you set the Check type parameter to Fluctuation, you must specify a warning threshold and an error threshold for the fluctuation. You can enter thresholds or adjust the slider to specify thresholds.
        Description The description of the rule.
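For the Fluctuation check type, each verification method selects a different baseline for the single value that the rule returns. The following Python sketch shows how some of these baselines might be derived from a history of daily values. It assumes that the history is ordered from oldest to newest, that it excludes the current day, and that the fluctuation is (current value - baseline)/baseline. The helper names are hypothetical, and only five of the listed verification methods are covered.

```python
from statistics import mean
from typing import Callable, Dict, Sequence


def average_of_last(history: Sequence[float], days: int) -> float:
    """Average of the most recent `days` values, current day excluded."""
    if len(history) < days:
        raise ValueError(f"need at least {days} days of history")
    return mean(history[-days:])


def value_days_before(history: Sequence[float], days: int) -> float:
    """Value recorded `days` days before the current day."""
    if len(history) < days:
        raise ValueError(f"need at least {days} days of history")
    return history[-days]


# Verification Method labels mapped to baseline functions (partial list).
BASELINES: Dict[str, Callable[[Sequence[float]], float]] = {
    "Compare the current value with the average value of the last 7 days":
        lambda h: average_of_last(h, 7),
    "Compare the current value with the average value of the last 30 days":
        lambda h: average_of_last(h, 30),
    "Compare the current value with the value 1 day before":
        lambda h: value_days_before(h, 1),
    "Compare the current value with the value 7 days before":
        lambda h: value_days_before(h, 7),
    "Compare the current value with the value 30 days before":
        lambda h: value_days_before(h, 30),
}


def fluctuation_against(method: str, current: float, history: Sequence[float]) -> float:
    """Fluctuation of the current value against the selected baseline."""
    baseline = BASELINES[method](history)
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (current - baseline) / baseline


if __name__ == "__main__":
    daily_values = [90, 102, 98, 101, 99, 100, 110]  # oldest to newest
    method = "Compare the current value with the value 7 days before"
    print(fluctuation_against(method, 120, daily_values))
```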
    • Quick Create
      Parameter Description
      Rule Name The name of the rule.
      Trigger The trigger condition of the rule. You can select only Values Duplicated in Multiple Fields.
      Field The fields to be monitored.
  4. Click Batch Create.
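The Values Duplicated in Multiple Fields trigger checks whether the same combination of values appears in more than one row across the selected fields. The following Python sketch shows the idea on in-memory rows. The function name duplicated_combinations and the sample data are hypothetical, and the sketch does not reflect how Data Quality runs the check on a MaxCompute table.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Sequence, Tuple


def duplicated_combinations(
    rows: Iterable[Mapping[str, object]], fields: Sequence[str]
) -> Dict[Tuple[object, ...], int]:
    """Return each value combination that occurs more than once, with its count."""
    counts = Counter(tuple(row[field] for field in fields) for row in rows)
    return {combo: n for combo, n in counts.items() if n > 1}


if __name__ == "__main__":
    orders = [
        {"order_id": 1, "pt": "20240301"},
        {"order_id": 1, "pt": "20240301"},
        {"order_id": 2, "pt": "20240301"},
    ]
    print(duplicated_combinations(orders, ["order_id", "pt"]))
```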