
Data Quality (new version)

Last Updated: Aug 27, 2025

Data Quality lets you detect changes in source data and dirty data produced by ETL jobs. It blocks problematic tasks and prevents dirty data from spreading downstream, avoiding unexpected results that can affect business use and decisions. It also reduces the time needed to fix issues and avoids rerunning tasks.

Billing

The cost of running data quality rules includes two parts:

  • DataWorks fees

    Charged on a pay-as-you-go basis by the number of data quality rule instances. For more information, see Fees for resources.

  • Engine-specific fees

    Data quality checks generate SQL statements that run on the engine, incurring engine fees. For details, see the billing documentation of each engine. For example, if you use MaxCompute in pay-as-you-go mode, data quality checks generate MaxCompute engine charges. These are billed by MaxCompute and do not appear on your DataWorks bill.
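The two-part cost structure above can be sketched as simple arithmetic. This is an illustrative example only: the function name and the unit prices are placeholders, not real DataWorks or engine list prices.

```python
# Hypothetical cost sketch: total spend splits into a DataWorks part
# (billed per data quality rule instance) and an engine part (billed
# separately by the engine and absent from the DataWorks bill).

def estimate_quality_cost(rule_instances: int,
                          price_per_instance: float,
                          engine_fee: float) -> dict:
    """Split one day's quality-check cost into its two parts."""
    dataworks_fee = rule_instances * price_per_instance
    return {
        "dataworks_fee": dataworks_fee,   # appears on the DataWorks bill
        "engine_fee": engine_fee,         # billed by the engine itself
        "total": dataworks_fee + engine_fee,
    }
```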

Features

Data Quality supports quality checks on common data analytics engines, including MaxCompute, E-MapReduce, Hologres, AnalyticDB for PostgreSQL, AnalyticDB for MySQL, and CDH.

You can configure rules covering completeness, accuracy, validity, consistency, uniqueness, and timeliness. You can associate these data quality rules with scheduling nodes. When a task finishes, data quality checks are triggered immediately. You can set the strength of rules to control when a task should fail or exit, thereby preventing the spread of dirty data and effectively reducing the time and financial costs of data recovery.
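The effect of rule strength described above can be sketched as a small decision function. This is not the DataWorks implementation; the strength names and return values are illustrative assumptions.

```python
# Illustrative sketch of how rule strength decides the outcome of a
# quality check: a "strong" rule fails the task on a violation so
# downstream nodes are blocked, while a "weak" rule only raises a warning.

def decide(rule_strength: str, check_passed: bool) -> str:
    if check_passed:
        return "continue"            # downstream nodes run normally
    if rule_strength == "strong":
        return "fail_and_block"      # task exits; dirty data is contained
    return "warn"                    # task continues; an alert is raised
```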

The features of each Data Quality module are described below:

  • Quality Dashboard

    Displays key overview metrics for data quality in the current workspace, trends and distributions of data quality check statuses triggered after instances run, top tables and owners with quality issues, and rule coverage. This helps quality assurance managers quickly understand the overall data quality of the workspace and promptly address issues to improve data quality.

  • Quality Assets

    • View the list of monitoring rules: shows all configured quality rules.

    • Create and manage custom rule templates: Data Quality allows you to build a custom rule template library to centrally manage common custom monitoring rules, improving the efficiency of rule configuration.

  • Configure Rules

    Data Quality supports configuring quality monitoring rules by table or by template:

    • Configure a monitoring rule for a single table

    • Configure a monitoring rule for multiple tables based on a template

  • Quality O&M

    • Monitors: displays all quality monitors created in this workspace.

    • View the details of a monitor: displays the data quality check results when a quality monitoring task runs. After a quality monitoring task finishes, you can view the details on the Run History page.

  • Quality Analysis

    • Configure report templates: Data Quality allows users to create report templates and freely add various metrics for rule configurations and rule runs. Reports are generated and sent periodically based on the configured statistical period, sending time, and subscription information.

Usage notes

  • Supported regions for each engine are as follows:

    • E-MapReduce: China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Shenzhen), China (Hong Kong), Japan (Tokyo), Singapore, Malaysia (Kuala Lumpur), Indonesia (Jakarta), Germany (Frankfurt), and US (Silicon Valley).

    • Hologres: China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Shenzhen), China (Hong Kong), Japan (Tokyo), Singapore, Malaysia (Kuala Lumpur), Indonesia (Jakarta), Germany (Frankfurt), US (Silicon Valley), and US (Virginia).

    • AnalyticDB for PostgreSQL: China (Hangzhou), China (Shanghai), China (Beijing), China (Shenzhen), and Japan (Tokyo).

    • AnalyticDB for MySQL: China (Shenzhen), Singapore, and US (Silicon Valley).

    • CDH: China (Shanghai), China (Beijing), China (Zhangjiakou), China (Hong Kong), and Germany (Frankfurt).

  • Before configuring data quality rules for E-MapReduce, Hologres, AnalyticDB for PostgreSQL, AnalyticDB for MySQL, and CDH, you must first collect metadata. For more information, see Collect metadata from an EMR data source.

  • After configuring data quality rules for tables in E-MapReduce, Hologres, AnalyticDB for PostgreSQL, AnalyticDB for MySQL, and CDH, run the scheduling node that generates the table data on a resource group that has network connectivity to the data source. Otherwise, the data quality rule checks cannot be triggered properly.

  • Multiple data quality rules can be configured for a single table.

Scenarios

In offline data check scenarios, Data Quality uses the partition expression configured for a table to check the table partitions generated by a node each day. The data quality rule is associated with the scheduling node that produces the table data. When the task finishes running, the quality check is triggered (dry-run tasks do not trigger quality checks). You can set the strength of the rule to control whether the node fails and exits, thereby preventing the spread of dirty data. You can also configure alert settings to receive alert notifications and handle issues promptly.
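The partition-expression mechanism above can be sketched as follows. The expression syntax (`dt=$[yyyymmdd-1]` for "yesterday's daily partition") and the function are illustrative assumptions, not the exact DataWorks expression grammar.

```python
from datetime import date, timedelta

# Illustrative sketch: a daily partition expression resolves, on each run,
# to the concrete partition produced that day, so the quality check only
# scans the newly generated data rather than the whole table.

def resolve_partition(expr: str, run_date: date) -> str:
    """Resolve a partition expression against the scheduling run date."""
    name, spec = expr.split("=")
    if spec == "$[yyyymmdd-1]":           # assumed syntax: previous day
        d = run_date - timedelta(days=1)
    else:
        d = run_date                       # fallback: the run date itself
    return f"{name}={d.strftime('%Y%m%d')}"
```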

Configure rules

  • Create rules: Data Quality allows you to create data quality rules by table. You can also use predefined rule templates to quickly create data quality rules for multiple tables in batches. For more information, see Configure a monitoring rule for a single table and Configure a monitoring rule for multiple tables based on a template.

  • Subscribe to rules: After creating a rule, you can subscribe to it to receive alert notifications for data quality rule checks. Supported methods include Email, Email and SMS, DingTalk Chatbot, DingTalk Chatbot @ALL, Lark Group Chatbot, Enterprise WeChat Chatbot, and Custom Webhook.

  • Note

    Only DataWorks Enterprise Edition supports the Custom Webhook method.
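For the Custom Webhook method, the alert is delivered as a payload posted to your endpoint. The field names below are hypothetical, chosen only to illustrate the idea; they are not the actual DataWorks webhook schema.

```python
import json

# Hypothetical example of an alert payload a custom webhook receiver
# might consume. All field names here are illustrative assumptions.

def build_alert_payload(table: str, rule_name: str, status: str) -> str:
    """Serialize a minimal quality-check alert as JSON."""
    return json.dumps({
        "table": table,
        "rule": rule_name,
        "status": status,   # e.g. "failed" when a strong rule is violated
    })
```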

Trigger data quality checks

In Operation Center, when a scheduling node associated with a table finishes running (executing the node code logic), it triggers a data quality check, which generates a SQL statement that validates data on the engine. DataWorks determines whether the task should fail and exit based on the strength of the data quality rule and the check results. This blocks downstream nodes from running and prevents dirty data from spreading.
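The kind of SQL generated for a simple completeness check can be sketched as below. The SQL shape, table name, and threshold are illustrative assumptions, not the exact statements DataWorks emits.

```python
# Illustrative sketch: a row-count completeness check issues a COUNT(*)
# over the checked partition and compares the result to a threshold.

def build_row_count_sql(table: str, partition_filter: str) -> str:
    """Build the validation SQL for one partition (hypothetical shape)."""
    return (f"SELECT COUNT(*) AS row_cnt FROM {table} "
            f"WHERE {partition_filter};")

def check_passes(row_cnt: int, min_rows: int) -> bool:
    """The check passes when the partition meets the minimum row count."""
    return row_cnt >= min_rows
```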

View check results

You can view the data quality check results through the node's runtime log in Operation Center and the Data Quality task query page.

  • View the node's runtime log in Operation Center

    1. Check the instance status. If the status shows failed, the code may have run successfully but produced output that failed a strong data quality rule check, causing the task to exit and blocking downstream instances.

    2. Open the DQC Log in the instance's Runtime Log to view the data quality check results. For more information, see View auto triggered instances.

  • View through the Running Records page

    On that page, search for the check details of a data quality monitoring task by table or node. For more information, see View the details of a monitor.