All Products
Search
Document Center

Data Management:Check the data quality

Last Updated:Apr 18, 2024

Data Management (DMS) allows you to create quality rules for tables or columns. DMS automatically checks the data quality based on the created quality rules in combination with the audit tasks in the task flow. This ensures data consistency, uniqueness, and availability for the tables and improves the quality of secondary analysis and development.

Prerequisites

  • The database instance that you want to manage is managed in Security Collaboration mode. For more information, see Control modes.

  • One of the following conditions is met:

    • You are a DMS administrator, a database administrator (DBA) in DMS, or a user who assumes the schema read-only role. For more information, see System roles.

    • You are the owner or the DBA of the database instance that you want to manage. For more information, see Modify database instances.

    • You have the permissions to query, change, or export the data of the database instance that you want to manage or a table in the database instance. For more information, see View owned permissions.

Background information

An increasing amount of data requires secondary analysis and development. This brings challenges because the data is interdependent and usually comes from various sources, in a large amount, and with diverse structures. In this case, the validity and quality of data are significant for your business. For example, before you perform secondary analysis on the data, you can check whether the data quality meets your requirements. This ensures that the data analysis can be complete as expected.

Terms

  • Quality rule: a rule that you create for monitoring a table or a column in the table to ensure that the data meets your requirements. For example, you can specify that the number of rows in a table is a fixed value and the maximum column value in a table is 100.

  • Audit task: a task that you create to monitor data and check whether the data meets a specific quality rule. After you specify a quality rule for a table and a scheduling cycle for the task flow, DMS automatically checks the data quality of the table and generates a report.

Procedure

  1. Log on to the DMS console V5.0.
  2. Go to the details page of a table. You can use one of the following methods to go to the details page of a table:

    • Go to the Data Visualization tab from the SQL Console tab

      1. In the top navigation bar of the DMS console, choose SQL Console > SQL Console.

      2. In the Please select the database first dialog box, enter a keyword to search for the database instance that you want to manage, select the database instance from the search results, and then click Confirm.

      3. In the upper-right corner of the SQL Console tab, click the 表详情 icon in the extended feature section.

      4. Find the table that you want to view, move the pointer over More in the Actions column, and then select View Table Details. The Details panel appears.

    • Search for the table that you want to view

      1. In the top navigation bar, click Home. In the search box, enter the name of the table that you want to view.

      2. In the search results, click the Table option. Find the table that you want to view and click Details on the right to go to the Tables page.

  3. Click the Data quality tab.

  4. Click Create Rule. In the panel that appears, set the parameters that are described in the following table.

    Parameter

    Description

    Example

    Rule Name

    The name of the quality rule.

    The maximum column value is 100

    Rule Type

    The type of the quality rule. Valid values:

    • Table Rule: The quality rule is to be created for a table. For example, you can create a quality rule based on a table to check whether the number of rows in the table meets your requirements.

    • Column Rule: The quality rule is to be created for a column in a table. For example, you can create a quality rule based on a column in a table to check the number of null values in the column.

    Column Rule

    Rule Template

    The rule template that is preset by DMS. For more information, see the Quality rule templates section of this topic.

    Maximum Value in a Column

    Column Name

    The name of the column for which the quality rule is to be created.

    This parameter appears only when you set the Rule Type parameter to Column Rule.

    columns_test

    Comparison Method

    The comparison method. The method is used to compare the actual data in the table with the value of the Expected Value parameter that you specify.

    Valid values: Greater, Greater than or equal, Equal, Is less than or equal, Less, and Not equal.

    Is less than or equal

    Expected Value

    The value that you expect.

    100

  5. Click OK.

    By default, after a quality rule is created, the quality rule enters the Enabled state.

    Note

    You can perform the following operations on the created quality rule:

    • To disable the quality rule, click Disable in the Actions column. After the quality rule is disabled, you cannot create an audit task for the quality rule.

      If the quality rule is used by an audit task, the quality rule cannot be disabled. In this case, delete the audit task and try again.

    • To modify the quality rule, move the pointer over More in the Actions column and select Edit.

    • To view the earlier versions of the quality rule, move the pointer over More in the Actions column and select Logs.

  6. Move the pointer over More in the Actions column and select New Task Flow.

    新建任务流

  7. On the Task Orchestration page, configure an Audit Task node and publish the task flow.

    Note

    For more information, see Overview.

    After you specify a scheduling cycle and publish the task flow, DMS automatically checks the data based on the scheduling cycle and the quality rule.

Quality rule templates

Rule Type

Rule Template

Table Rule

Number of Rows in a Table. The Number is a Fixed Value: This template checks whether the number of rows in a table meets your business requirements. For example, you can specify that the number of rows in a table must be less than 100,000.

Column Rule

  • Maximum Value in a Column: This template checks whether the maximum value of a column meets your business requirements. For example, you can specify that the maximum value of the primary key in a table cannot exceed 2147483647, which is the result of 2³¹ minus 1.

  • Minimum Value in a Column: This template checks whether the minimum value of a column meets your business requirements. For example, you can specify that the minimum value of a column in a table cannot be less than 3.

  • Number of Strings That Match the Expression Containing Wildcards. The Number is a Fixed Value: This template checks whether the number of column values that meet the specified format requirements meets your business requirements. For example, you can specify that the number of values that start with a in a column must be less than 10.

  • Number of Strings that Match the Regular Expression. The Number is a Fixed Value: This template checks whether the number of column values that match the specified regular expression meets your business requirements. For example, you can specify that the number of values that match the regular expression in a column must be greater than 100.

  • Number of Null Values in a Column. The Number is a Fixed Value: This template checks whether the number of null values in a column meets your business requirements. For example, you can specify that the number of null values in a column must be less than or equal to zero.

  • Number of Unique Values in a Column. The Number is a Fixed Value: This template checks whether the number of unique values in a column meets your business requirements. For example, you can specify that the number of unique values in a column must be greater than or equal to 100. This ensures that at least 100 different values exist in the column.

  • Number of Duplicate Values in a Column. The Number is a Fixed Value: This template checks whether the number of duplicate values in a column meets your business requirements. For example, you can specify that the number of duplicate values in a column must be less than 100. This ensures that the number of duplicate values in the column does not exceed 100.