Data Management (DMS) allows you to create quality rules for tables in a database instance or columns of tables. The created quality rules work with the audit tasks in the task flow to automatically check the data quality. This ensures data consistency, uniqueness, and availability for the tables and improves the quality of secondary analysis and development.

Prerequisites

  • The database instance for which you want to enable metadata access control is managed in Security Collaboration mode. For more information, see Control modes.

  • One of the following conditions is met:
    • You are a DMS administrator or a database administrator (DBA) in DMS. Alternatively, you assume the schema read-only role. For more information, see System roles.
    • You are the owner or the DBA of the database instance that you want to manage. For more information, see Modify database instances.
    • You have the permissions to query, change, or export the data of the database instance that you want to manage or a table in the database instance. For more information, see View owned permissions.

Background information

As your business volume grows, an increasing amount of data requires secondary analysis and development. This also brings challenges because the data is interdependent and usually comes from various sources, in a large quantity, and with diverse structures. In this case, the validity and quality of data are significant for your business. For example, before you perform secondary analysis on the obtained data, you can check whether the data quality meets your requirements. This ensures that the data analysis can be complete as expected.

Terms

  • Quality rule: a rule that you create for monitoring a table or a column in the table to ensure that the data meets your expectations. For example, you can specify that the number of rows in a table is a fixed value and the maximum column value in a table is 100.
  • Audit task: a task that you create to monitor data and check whether the data meets a specific quality rule. After you specify a quality rule for a table and a scheduling cycle for the task flow, DMS checks the data quality of the table and generates a report.

Procedure

  1. Go to the DMS console V5.0.
  2. Go to the Tables page. You can use one of the following methods to go to the Tables page:
    • Navigate to the Tables page from the SQLConsole tab
      1. In the top navigation bar, choose SQL Console > SQL Console.
        Note If you use the previous version of the DMS console, perform the following steps: In the top navigation bar, move the pointer over the More icon and choose SQLConsole > Single Database Query.
      2. In the extended feature section in the upper-right corner of the SQLConsole, click the Table List icon.
      3. Find the table that you want to view, move the pointer over More in the Actions column, and then select View Table Details. The Details panel appears.
    • Search for the table that you want to view
      1. In the top navigation bar, click Home. In the search box in the middle of the Home page, enter the name of the table that you want to view.
        Note If you use the previous version of the DMS console, enter the name of the table in the search box in the top navigation bar.
      2. From the results, click the Table tab. Find the table and click Details on the right to go to the Tables page.
      Details
  3. Click the Data quality tab.
  4. Click Create Rule. In the panel that appears, set the parameters that are described in the following table.
    Parameter Description Example
    Rule Name The name of the quality rule. The maximum column value is 100
    Rule Type The type of the quality rule. Valid values:
    • Table Rule: The quality rule is to be created for a table. For example, you can create a quality rule based on a table to check whether the number of rows in the table meets your expectations.
    • Column Rule: The quality rule is to be created for a column in a table. For example, you can create a quality rule based on a column in a table to check the number of null values in the column.
    Column Rule
    Rule Template The rule template that is preset by DMS. For more information, see Quality rule templates. Maximum Value in a Column
    Column Name The name of the column for which the quality rule is to be created.

    This parameter is displayed only when you set the Rule Type parameter to Column Rule.

    columns_test
    Comparison Method The comparison method. The method is used to compare the actual data in the table with the value of the Expected Value parameter that you specify.

    Valid values: Greater, Greater than or equal, Equal, Is less than or equal, Less, and Not equal.

    Is less than or equal
    Expected Value The value that you expect. 100
  5. Click OK.
    By default, after the quality rule is created, the quality rule enters the Enabled state.
    Note You can perform the following operations on the created quality rule:
    • To disable the quality rule, click Disable in the Actions column. After the quality rule is disabled, you cannot create an audit task for the quality rule.

      If the quality rule is used by an audit task, the quality rule cannot be disabled. In this case, delete the audit task and try again.

    • To modify the quality rule, move the pointer over More in the Actions column and select Edit.
    • To view the earlier versions of the quality rule, move the pointer over More in the Actions column and select Logs.
  6. Move the pointer over More in the Actions column and select Create Task Flow.
    Create Task Flow
  7. On the Task Orchestration page, configure an Audit Task node and publish the task flow.
    Note For more information, see Overview.
    After you specify a scheduling cycle and publish the task flow, DMS checks the data based on the scheduling cycle and the quality rule and automatically generates a report.

Quality rule templates

Rule Type Rule Template
Table Rule Number of Rows in a Table. The Number is a Fixed Value: This template checks whether the number of rows in a table is as expected. For example, you can specify that the number of rows in a table must be less than 100,000.
Column Rule
  • Maximum Value in a Column: This template checks whether the maximum value of a column is as expected. For example, you can specify that the maximum value of the primary key in a table cannot exceed 2147483647, which is the result of 2³¹ minus 1.
  • Minimum Value in a Column: This template checks whether the minimum value of a column is as expected. For example, you can specify that the minimum value of a column in a table cannot be less than 3.
  • Number of Strings That Match the Expression Containing Wildcards. The Number is a Fixed Value: This template checks whether the number of column values that meet the specified format requirements is as expected. For example, you can specify that the number of values that start with a in a column must be less than 10.
  • Number of Strings that Match the Regular Expression. The Number is a Fixed Value: This template checks whether the number of column values that match the specified regular expression is as expected. For example, you can specify that the number of values that match the regular expression in a column must be greater than 100.
  • Number of Null Values in a Column. The Number is a Fixed Value: This template checks whether the number of null values in a column is as expected. For example, you can specify that the number of null values in a column must be less than or equal to zero.
  • Number of Unique Values in a Column. The Number is a Fixed Value: This template checks whether the number of unique values in a column is as expected. For example, you can specify that the number of unique values in a column must be greater than or equal to 100. This ensures that at least 100 different values exist in the column.
  • Number of Duplicate Values in a Column. The Number is a Fixed Value: This template checks whether the number of duplicate values in a column is as expected. For example, you can specify that the number of duplicate values in a column must be less than 100. This ensures that the number of duplicate values in the column does not exceed 100.