Data Quality allows you to configure monitoring rules for data in E-MapReduce (EMR), Hologres, AnalyticDB for PostgreSQL, MaxCompute, and DataHub data sources. This topic describes how to configure a rule for monitoring data in a MaxCompute data source.

Precautions

Data standard-generated rules are not supported.

Go to the Monitoring Rules page

  1. Log on to the DataWorks console.
  2. In the left-side navigation pane, click Workspaces.
  3. In the top navigation bar, select the region in which the workspace that you want to manage resides. Find the workspace and click Data Development in the Actions column.
  4. On the DataStudio page, click the icon in the upper-left corner and choose All Products > Data governance > Data Quality.
  5. In the left-side navigation pane, choose Rule management > Configure by Table.
  6. Select MaxCompute from the Engine/Data Source drop-down list and select a MaxCompute project from the Engine/Database Instance drop-down list.
    Data Quality supports EMR, Hologres, AnalyticDB for PostgreSQL, MaxCompute, and DataHub data sources.
    • If you select an EMR, Hologres, AnalyticDB for PostgreSQL, or MaxCompute data source, all tables in the data source are displayed.
    • If you select a DataHub data source, all topics in the data source are displayed. For more information about how to add a DataHub data source, see Add a DataHub data source.
  7. Find the table for which you want to configure a monitoring rule and click View Monitoring Rules in the Actions column.
    Data Quality allows you to configure template rules and custom rules.
    Notice Before you configure a template rule, you must configure a partition filter expression. For more information, see Configure a partition filter expression. If you configure a monitoring rule for an EMR data source, you can specify resource queues for tables in the EMR data source after you configure a partition filter expression. By default, the resource queue named default is used. If you want to use other resource queues, you can submit a ticket.
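
The following Python sketch illustrates how a partition filter expression such as pt=$[yyyymmdd-1], the form used later in this topic, might expand to a concrete partition value. It is not DataWorks code: the function resolve_partition_filter is a hypothetical helper, and the sketch assumes that the offset counts whole days relative to the scheduled date of the node and that only the yyyymmdd format is used. For the formats that are actually supported, see Configure a partition filter expression.

```python
import re
from datetime import date, timedelta

# Matches $[yyyymmdd], $[yyyymmdd-N], and $[yyyymmdd+N] only (assumption:
# other date formats are out of scope for this sketch).
_PATTERN = re.compile(r"\$\[yyyymmdd(?:([+-])(\d+))?\]")


def resolve_partition_filter(expression: str, scheduled: date) -> str:
    """Hypothetical helper: expand $[yyyymmdd±N] relative to a scheduled date."""

    def _expand(match: re.Match) -> str:
        sign, days = match.group(1), match.group(2)
        offset = int(days) if days else 0
        if sign == "-":
            offset = -offset
        # Assumption: the offset is measured in whole days.
        return (scheduled + timedelta(days=offset)).strftime("%Y%m%d")

    return _PATTERN.sub(_expand, expression)


if __name__ == "__main__":
    print(resolve_partition_filter("pt=$[yyyymmdd-1]", date(2024, 3, 1)))
```

Under these assumptions, a scheduled date of March 1, 2024 expands pt=$[yyyymmdd-1] to pt=20240229.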

Create a template rule

  1. Find the table for which you want to configure monitoring rules and click View Monitoring Rules in the Actions column. On the page that appears, you can configure monitoring rules for this table.
  2. Click Create Rule. The Template Rules tab is displayed by default in the Create Rule panel.
    To create a template rule, you can click Add Monitoring Rule or Quick Create.
    • Add Monitoring Rule
      Click Add Monitoring Rule. The following table describes the parameters that are displayed if you set the Rule Source parameter to Built-in Template.
      Parameter Description
      Rule Name The name of the monitoring rule.
      Rule Type The strength of the monitoring rule. Valid values: Strong and Soft.
      • If you set this parameter to Strong, error alerts are reported and descendant nodes are blocked, whereas warning alerts are reported but descendant nodes are not blocked.
      • If you set this parameter to Soft, error alerts are reported but descendant nodes are not blocked, whereas warning alerts are not reported and descendant nodes are not blocked.
      Auto-Generated Threshold Specifies whether to use a dynamic threshold. Configure this parameter based on your business requirements.
      Notice You can use the dynamic threshold feature only in DataWorks Enterprise Edition or a more advanced edition.
      Rule Source The source for the monitoring rule. Valid values: Built-in Template and Rule Templates.
      If you select Rule Templates, you must specify a rule template. For more information, see Create, manage, and use rule templates.
      Notice You can select Rule Templates only in DataWorks Enterprise Edition or a more advanced edition.
      Field The fields to be monitored. You can select all fields in a table or a specific field of a numeric type or non-numeric type.
      Template The template that you want to apply to the monitoring rule. Data Quality supports 43 rule templates. You can select only the monitoring rule templates that are displayed. For more information, see Built-in monitoring rule templates.
      Note You can configure field-level monitoring rules of the following types only for numeric fields: average value, sum of values, minimum value, and maximum value.
      Comparison Method The comparison method for the monitoring rule. Valid values: Absolute Value, Raise, and Drop.
      Thresholds The warning threshold and error threshold of the fluctuation. The fluctuation is calculated as follows:
      • Calculate the fluctuation.
        You can calculate the fluctuation by using the following formula: Fluctuation = (Sample value - Baseline)/Baseline.
        • Sample value
          The sample value for the current day. For example, if you want to check the fluctuation in the number of table rows on an SQL node within a day, the sample value is the number of table rows in partitions on that day.
        • Baseline
          The comparison value collected from the previous N days. Examples:
          • If you want to check the fluctuation in the number of table rows on an SQL node within a day, the baseline is the number of table rows in partitions on the previous day.
          • If you want to check the average fluctuation in the number of table rows on an SQL node within seven days, the baseline is the average number of table rows in the last seven days.
      • Calculate the fluctuation variance.
        You can calculate the fluctuation variance only for numeric fields, such as BIGINT and DOUBLE fields, by using the following formula: Fluctuation variance = (Sample value - Average value of previous N days)/Standard deviation.
      You can specify the warning threshold and error threshold of the fluctuation to monitor data and identify issues of different severities:
      • If the absolute value of the fluctuation does not exceed the warning threshold, the data is considered to be normal.
      • If the absolute value of the fluctuation exceeds the warning threshold and does not exceed the error threshold, a warning alert is reported.
      • If the absolute value of the fluctuation exceeds the error threshold, an error alert is reported.
      Description The description of the monitoring rule.
      The following figure shows the logic of alerting and blocking.
    • Quick Create
      Click Quick Create. Configure the parameters as required. The following table describes the parameters.
      Parameter Description
      Rule Name The name of the monitoring rule.
      Field The fields to be monitored. You can select all fields in a table or a specific field of a numeric type or non-numeric type.
      Trigger The trigger condition of the monitoring rule. Valid values: The number of columns is greater than 0 and Table row number dynamic threshold.
      Notice You can select Table row number dynamic threshold only in DataWorks Enterprise Edition or a more advanced edition.
  3. Click Batch Create.
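
The fluctuation formulas, the threshold comparison, and the Strong and Soft behavior described for template rules can be sketched in Python as follows. This is an illustration only, not how Data Quality is implemented. All function names are hypothetical, thresholds are assumed to be expressed as fractions (0.1 for 10%), only the Absolute Value comparison is modeled, and the standard deviation is assumed to be the population standard deviation of the previous N days.

```python
from statistics import mean, pstdev
from typing import Sequence


def fluctuation(sample: float, baseline: float) -> float:
    """Fluctuation = (Sample value - Baseline) / Baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (sample - baseline) / baseline


def fluctuation_variance(sample: float, history: Sequence[float]) -> float:
    """(Sample value - Average value of previous N days) / Standard deviation."""
    deviation = pstdev(history)  # assumption: population standard deviation
    if deviation == 0:
        raise ValueError("standard deviation is zero")
    return (sample - mean(history)) / deviation


def classify(value: float, warning: float, error: float) -> str:
    """Compare the absolute fluctuation with the warning and error thresholds."""
    magnitude = abs(value)
    if magnitude > error:
        return "error"
    if magnitude > warning:
        return "warning"
    return "normal"


def alert_reported(rule_type: str, level: str) -> bool:
    """Error alerts are always reported; warning alerts only for Strong rules."""
    if level == "error":
        return True
    return level == "warning" and rule_type == "Strong"


def blocks_descendants(rule_type: str, level: str) -> bool:
    """Only an error alert on a Strong rule blocks descendant nodes."""
    return rule_type == "Strong" and level == "error"


if __name__ == "__main__":
    # 1,000 rows on the previous day and 1,150 rows today.
    level = classify(fluctuation(1150, 1000), warning=0.1, error=0.2)
    print(level, alert_reported("Strong", level), blocks_descendants("Strong", level))
```

In this example the row count rises by 15%, which exceeds the assumed warning threshold of 10% but not the error threshold of 20%, so a Strong rule reports a warning alert and does not block descendant nodes.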

Create a custom rule

If template rules do not meet your requirements for monitoring data quality based on a partition filter expression, you can create custom rules.

  1. Find the table for which you want to configure monitoring rules and click View Monitoring Rules in the Actions column. On the page that appears, you can configure monitoring rules for this table.
  2. Click Create Rule. The Template Rules tab is displayed by default in the Create Rule panel.
  3. Click the Custom Rules tab.
    To create a custom rule, you can click Add Monitoring Rule or Quick Create.
    • Add Monitoring Rule
      You can select all fields in a table, SQL statements, or a specific field from the Field drop-down list.
      • All Fields in Table or a specific field
        The parameters that you can configure vary based on the option that you select for Field. If you select All Fields in Table or a specific field, configure the parameters that are described in the following table.
        Parameter Description
        Rule Name The name of the monitoring rule.
        Rule Type The strength of the monitoring rule. Valid values: Strong and Soft.
        • If you set this parameter to Strong, error alerts are reported and descendant nodes are blocked, whereas warning alerts are reported but descendant nodes are not blocked.
        • If you set this parameter to Soft, error alerts are reported but descendant nodes are not blocked, whereas warning alerts are not reported and descendant nodes are not blocked.
        Field The fields to be monitored. In this example, select All Fields in Table. If you select All Fields in Table, you can use the WHERE clause to customize filter conditions based on your business requirements.
        Sampling Method The sampling method for the monitoring rule. Valid values: count and count/table_count.
        Note The value of count/table_count is the ratio of the number of table rows that you obtain based on filter conditions to the total number of table rows in the current partition.
        Filter The filter conditions. For example, if you want to query the partitions of the table based on a specific data timestamp, you can specify pt=$[yyyymmdd-1] as a filter condition.
        Check type The threshold type for the monitoring rule. Valid values: Numeric type, Fluctuation, and Auto-Generated Threshold.
        Note You can select Auto-Generated Threshold only in DataWorks Enterprise Edition or a more advanced edition.
        Comparison Method The comparison method for the monitoring rule. The comparison methods that can be selected vary based on the threshold type.
        • If you set the Check type parameter to Numeric type, the valid values of the Comparison Method parameter are Greater Than, Greater Than or Equal To, Equal To, Unequal To, Less Than, and Less Than or Equal To.
        • If you set the Check type parameter to Fluctuation, the valid values of the Comparison Method parameter are Absolute Value, Raise, and Drop.
        Verification Method The verification method for the monitoring rule. The verification methods that can be selected vary based on the threshold type.
        • If you set the Check type parameter to Numeric type, you can set the Verification Method parameter only to Compare with a specified value.
        • If you set the Check type parameter to Fluctuation, the valid values of the Verification Method parameter are:
          • Compare the current value with the average value of the last 7 days
          • Compare the current value with the average value of the last 30 days
          • Compare the current value with the value 1 day before
          • Compare the current value with the value 7 days before
          • Compare the current value with the value 30 days before
          • The variance between the current value and the value 7 days before
          • The variance between the current value and the value 30 days before
          • Compare with the value 1, 7, and 30 days before
          • Compare with the value of the previous cycle
        Expected Value The expected value for the monitoring rule. If you set the Check type parameter to Numeric type, you must specify an expected value.
        Thresholds The warning threshold and error threshold of the fluctuation. If you set the Check type parameter to Fluctuation, you must specify a warning threshold and an error threshold for the fluctuation. You can enter a threshold or adjust the slider to specify a threshold.
        Description The description of the custom rule.
      • SQL Statement
        Parameter Description
        Rule Name The name of the monitoring rule.
        Rule Type The strength of the monitoring rule. Valid values: Strong and Soft.
        • If you set this parameter to Strong, error alerts are reported and descendant nodes are blocked, whereas warning alerts are reported but descendant nodes are not blocked.
        • If you set this parameter to Soft, error alerts are reported but descendant nodes are not blocked, whereas warning alerts are not reported and descendant nodes are not blocked.
        Field The fields to be monitored. If you select SQL Statement, you can customize the SQL logic for the monitoring rule. The statement must return a single value, that is, one row of one column.
        Sampling Method The sampling method for the monitoring rule. You can set this parameter only to SQL Statement.
        Set Flag The SET clause of the SQL statement to be used.
        Custom SQL The custom SQL statement to be used. The statement must return a single value, that is, one row of one column.

        In the custom SQL statement, enclose the partition filter expression in brackets []. Sample custom SQL statement:

        select count(*) from table_name where ds=$[yyyymmdd];
        Note
        • In this statement, table_name indicates the name of the table for which you configure a monitoring rule. Replace table_name with the actual table name based on your business requirements.
        • You can use the parameter that you configure on the Properties tab to replace the partition filter expression. For more information, see Configure a partition filter expression.
        • The monitoring rule uses the partition filter expression that is specified in the custom SQL statement, not the partition filter expression that you configured in the previous step.
        Check type The threshold type for the monitoring rule. Valid values: Numeric type and Fluctuation.
        Comparison Method The comparison method for the monitoring rule. The comparison methods that can be selected vary based on the threshold type.
        • If you set the Check type parameter to Numeric type, the valid values of the Comparison Method parameter are Greater Than, Greater Than or Equal To, Equal To, Unequal To, Less Than, and Less Than or Equal To.
        • If you set the Check type parameter to Fluctuation, the valid values of the Comparison Method parameter are Absolute Value, Raise, and Drop.
        Verification Method The verification method for the monitoring rule. The verification methods that can be selected vary based on the threshold type.
        • If you set the Check type parameter to Numeric type, you can set the Verification Method parameter only to Compare with a specified value.
        • If you set the Check type parameter to Fluctuation, the valid values of the Verification Method parameter are:
          • Compare the current value with the average value of the last 7 days
          • Compare the current value with the average value of the last 30 days
          • Compare the current value with the value 1 day before
          • Compare the current value with the value 7 days before
          • Compare the current value with the value 30 days before
          • The variance between the current value and the value 7 days before
          • The variance between the current value and the value 30 days before
          • Compare with the value 1, 7, and 30 days before
          • Compare with the value of the previous cycle
        Expected Value The expected value for the monitoring rule. If you set the Check type parameter to Numeric type, you must specify an expected value.
        Thresholds The warning threshold and error threshold of the fluctuation. If you set the Check type parameter to Fluctuation, you must specify a warning threshold and an error threshold for the fluctuation. You can enter a threshold or adjust the slider to specify a threshold.
        Description The description of the custom rule.
    • Quick Create
      Parameter Description
      Rule Name The name of the monitoring rule.
      Trigger The trigger condition for the monitoring rule. You can select only Values Duplicated in Multiple Fields.
      Field The fields to be monitored.
  4. Click Batch Create.
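
The count/table_count sampling method and the Numeric type comparison methods described for custom rules can be sketched in Python as follows. This is a minimal illustration with hypothetical function names, not Data Quality code. The sketch only evaluates whether the comparison between the sample value and the expected value holds; how that result maps to an alert is not modeled here.

```python
import operator

# Comparison methods that are available when Check type is Numeric type.
COMPARATORS = {
    "Greater Than": operator.gt,
    "Greater Than or Equal To": operator.ge,
    "Equal To": operator.eq,
    "Unequal To": operator.ne,
    "Less Than": operator.lt,
    "Less Than or Equal To": operator.le,
}


def sample_ratio(filtered_rows: int, partition_rows: int) -> float:
    """count/table_count: rows that match the filter divided by all rows in the partition."""
    if partition_rows == 0:
        raise ValueError("the partition contains no rows")
    return filtered_rows / partition_rows


def comparison_holds(sample: float, method: str, expected: float) -> bool:
    """Evaluate 'sample <method> expected' for a Numeric type check."""
    return COMPARATORS[method](sample, expected)


if __name__ == "__main__":
    # 25 of 1,000 rows in the partition match the filter condition.
    ratio = sample_ratio(25, 1000)
    print(ratio, comparison_holds(ratio, "Less Than or Equal To", 0.05))
```

Here 25 matching rows out of 1,000 give a ratio of 0.025, and the comparison Less Than or Equal To with an expected value of 0.05 holds.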

What to do next

If you do not want data that fails a monitoring rule to block the associated auto triggered node on a specific data timestamp, you can configure a noise reduction rule for the monitoring rule to denoise the data. For more information, see Manage noise reduction rules.