DataWorks: Data quality

Last Updated: Jun 23, 2026

DataWorks data quality (DQC) is a platform for monitoring and safeguarding your data. It detects and intercepts unexpected "dirty data" in your data production pipelines before it spreads downstream. This helps keep business decisions accurate and reduces the cost of troubleshooting and rerunning tasks.

Core concepts and workflow

Before you use data quality, you should understand its core concepts and workflow. The data quality system is built around the following core entities:

  1. Template: Defines the data validation logic. DataWorks provides a rich library of built-in templates (such as table row count and the number of distinct values in a column). You can also create custom templates to meet specific business requirements.

  2. Monitoring Rules: Specific instances of a rule template. You apply a template to a table or a column in a table and configure a threshold. For example, a rule can require that the order_count column in the daily_sales table contains no null values.

  3. Monitor: An execution plan that associates one or more Monitoring Rules with a scheduling node. When the scheduling node runs successfully, it automatically triggers all associated quality rules for validation.

  4. Strong/weak rules and blocking: You can set each rule as a strong rule or a weak rule. If a check fails, you can choose to block downstream tasks or only send an alert.
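The relationship between these entities can be sketched as a small data model. This is an illustration only, not the DataWorks API: the class names, field names, the null_count template name, and the load_daily_sales node name are all hypothetical. Only the daily_sales table and order_count column come from the example above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Strength(Enum):
    """Rule strength, as described in item 4 above."""
    STRONG = "strong"
    WEAK = "weak"


@dataclass(frozen=True)
class Template:
    """Reusable validation logic, e.g. a null-value count for a column."""
    name: str


@dataclass(frozen=True)
class MonitoringRule:
    """A template applied to one table/column with a concrete threshold."""
    template: Template
    table: str
    column: str
    threshold: float
    strength: Strength


@dataclass
class Monitor:
    """Associates one or more monitoring rules with a scheduling node."""
    node_name: str
    rules: List[MonitoringRule] = field(default_factory=list)


# Hypothetical template and node names, used only for illustration.
null_count = Template(name="null_count")
rule = MonitoringRule(
    template=null_count,
    table="daily_sales",
    column="order_count",
    threshold=0,  # no null values allowed
    strength=Strength.STRONG,
)
monitor = Monitor(node_name="load_daily_sales", rules=[rule])
```

One template can back many rules, and one monitor can carry many rules, which is why the rule holds a reference to its template while the monitor holds a list of rules.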

Typical workflow:

Important

Virtual nodes and dry-run nodes do not generate actual data and cannot trigger data quality rules.

[Figure: typical data quality workflow]

Features

DataWorks data quality supports quality checks for common big data storage services, such as MaxCompute, E-MapReduce, Hologres, and AnalyticDB. You can configure monitoring rules across multiple dimensions like completeness, accuracy, and consistency, and associate them with scheduling nodes to enable automated validation, alerts, and blocking.

The main modules of data quality and their corresponding pages in the console are as follows:

| Module | Page | Description |
| --- | --- | --- |
| Data quality overview | | The data quality overview page displays key data quality metrics for your workspace, including the trend and distribution of quality rule check statuses triggered by instance runs, tables with the most quality issues and the responsible owners, quality rule coverage, and more. This helps quality managers quickly understand the overall data quality status of a workspace and promptly address quality issues to improve data quality. |
| Quality assets | Rule list | Displays a list of all configured quality rules. |
| Quality assets | Rule template library | Data quality supports custom rule template libraries. You can centrally manage common custom monitoring rules in a template library to improve the efficiency of rule configuration. |
| Rule configuration | Configure rules: By table | One of the primary methods for configuring monitoring rules, which allows you to perform fine-grained configuration for a single table. |
| Rule configuration | Configure rules by template | Batch rule configuration for multiple tables that meet specified conditions based on existing rule templates. |
| Quality operations | Quality monitor | The quality monitor list page displays all quality monitor tasks created in the current workspace. |
| Quality operations | Running records | Displays the rule check results of quality monitor task runs. After a quality monitor task runs, you can view the details on the running records page. |
| Quality analysis | Quality reports | Data quality allows you to create report templates and add various metrics for rule configuration and rule execution. Reports are automatically generated and sent based on the configured statistical period, delivery time, and subscription settings. |

Billing

The costs of running data quality rules consist of two parts:

  • DataWorks charges: Pay-as-you-go billing based on the number of data quality rule instance runs. For more information, see Billing of data quality.

  • Compute engine costs: Data quality rule checks generate SQL statements and submit them to the underlying compute engine for execution. This incurs compute costs from the respective engine (for example, MaxCompute compute costs). These costs are charged by the engine provider and are not reflected in your DataWorks bill.
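To make the compute engine cost more concrete, the following is a minimal sketch of the kind of SQL a null-value check might translate to. The actual statements that DataWorks generates are not shown in this document, so the query shape, the function names build_null_check_sql and passes, and the null_rows alias are assumptions for illustration.

```python
import re

# Accept only plain identifiers so the sketch never interpolates unsafe text.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_null_check_sql(table: str, column: str) -> str:
    """Return a query that counts the NULL values in one column.

    Hypothetical helper: it illustrates the idea of turning a rule into
    SQL and is not the statement DataWorks actually generates.
    """
    for name in (table, column):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return f"SELECT COUNT(*) AS null_rows FROM {table} WHERE {column} IS NULL"


def passes(null_rows: int, threshold: int = 0) -> bool:
    """The check passes when the NULL count does not exceed the threshold."""
    return null_rows <= threshold


sql = build_null_check_sql("daily_sales", "order_count")
```

Each such query scans the checked table on the engine, which is why the engine bills the check separately from the DataWorks charge for the rule instance run.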

Notes

  • Supported data sources: Only MaxCompute, Hologres, E-MapReduce, DLF, CDH Hive, AnalyticDB PostgreSQL, AnalyticDB MySQL, StarRocks, MySQL, Lindorm, and SQL Server are supported. The supported regions vary by data source type. Refer to the regions supported by the specific engine.

  • Metadata collection: Before you configure rules for non-MaxCompute data sources such as E-MapReduce, Hologres, AnalyticDB, and CDH, you must complete metadata collection first. For more information, see Configure metadata collection.

  • Network connectivity: When you check non-MaxCompute data sources, the associated scheduling node must run on a resource group with a configured network connectivity solution.

Configuration and usage workflow

1. Configure rules

  • Create Rule: Data quality allows you to create data quality rules on a per-table basis. You can also use built-in or custom rule templates to create data quality rules in batches for multiple tables. For more information, see Create a rule for a single table and Create rules in batches.

  • Subscribe to alerts: After rules are created, you can configure alert notifications through subscriptions. Multiple channels are supported, including email, SMS, DingTalk group bot, WeCom, Lark, phone, and custom webhooks.

    The custom webhook method is supported only in DataWorks Enterprise Edition and higher editions.

2. Trigger rule checks

In a Monitor, associate rules with a scheduling node. When the scheduling node runs successfully in Operation Center, the associated data quality rules are automatically triggered for validation. DataWorks determines whether to set the task instance to failed and block downstream tasks based on the rule strength and check results, preventing dirty data from spreading.
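The decision described above can be sketched as follows. This is a simplified model under one stated assumption: a failed strong rule blocks downstream tasks, while a failed weak rule only raises an alert. The CheckResult type, the evaluate function, and the rule names are hypothetical, and the real behavior depends on how each rule is configured.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class CheckResult:
    """Outcome of one rule check (hypothetical structure)."""
    rule_name: str
    strong: bool   # True for a strong rule, False for a weak rule
    passed: bool


def evaluate(results: Iterable[CheckResult]) -> Tuple[bool, List[str]]:
    """Return (block_downstream, names_of_rules_to_alert_on).

    Assumption: any failed strong rule blocks downstream tasks;
    every failed rule, strong or weak, produces an alert.
    """
    failed = [r for r in results if not r.passed]
    block = any(r.strong for r in failed)
    alerts = [r.rule_name for r in failed]
    return block, alerts


# Hypothetical check results for one scheduling node run.
results = [
    CheckResult("order_count_not_null", strong=True, passed=False),
    CheckResult("row_count_fluctuation", strong=False, passed=False),
    CheckResult("distinct_region_count", strong=False, passed=True),
]
block, alerts = evaluate(results)
```

In this run, block is True because a strong rule failed, and both failed rules appear in alerts. If only the weak rule had failed, the node would still alert but downstream tasks would continue.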

3. View check results

On the Running Records page, you can search by table or node name and view the detailed check results and logs for each quality monitor run. For more information, see View running records.

FAQ

Are DataWorks data quality alerts and MaxCompute DingTalk group alerts duplicated?

MaxCompute does not provide an independent data quality monitoring feature. The data quality alerts received in MaxCompute DingTalk groups are actually configured and triggered through DataWorks data quality (DQC). DataWorks data quality generates SQL statements from check rules and submits them to compute engines such as MaxCompute for execution. After the checks are completed, notifications are sent based on the alert subscriptions configured for the rules.

Therefore, DataWorks data quality alerts and the alerts in MaxCompute DingTalk groups are not two independent systems and do not overlap in functionality. You only need to configure monitoring rules and alert subscriptions in DataWorks data quality.

Whether to configure data quality monitoring for a table depends on how critical the data in that table is to your business. For core business tables, configuring quality monitoring enables timely alerts or blocking of downstream tasks when data anomalies occur, preventing empty or dirty data from causing business disruptions.