The Asset Quality Overview page provides an overview of data quality in the current workspace, including the main data quality metrics, the trend and distribution of rule-based check instances, the top N tables with the most data quality issues and the owners of those issues, and the coverage of monitoring rules. This helps data quality owners understand the overall data quality of the current workspace and resolve data quality issues at the earliest opportunity.
Limits
This feature is in invitational preview. To use it, contact technical support.
Go to the Overview page
Log on to the DataWorks console. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to DataStudio.
On the DataStudio page, click the icon in the upper-left corner and choose . The Overview page appears.
Usage notes
The Overview page displays the statistics on asset quality in the current workspace.
In the upper-right corner of the Asset Quality Overview page, you can select Current Day, Previous Day, Day Before Previous Day, or another specific day based on your business requirements to view the statistics on asset quality in the selected time range. By default, Current Day is selected.
In the upper-right corner of the Asset Quality Overview page, you can also select View Production Environment Only to view statistics on asset quality only for the production environment of the workspace.
Note: If you select View Production Environment Only, the check results of tables in the development environment of the workspace are not included in the statistics. Only workspaces in standard mode provide both a development environment and a production environment and isolate the two. All tables in a workspace in basic mode are production tables. For more information, see Differences between workspaces in basic mode and workspaces in standard mode.
You can also view the asset quality overview of a workspace from the following perspectives:
All compute engines: If you select All, the Asset Quality Overview page displays the statistics on asset quality of all types of compute engines in the current workspace.
Specific compute engine: If you select a specific compute engine, the Asset Quality Overview page displays the statistics on asset quality of the compute engine in the current workspace.
Note: Statistics on streaming data are not supported.
Main data quality metrics
On the right side of this section, you can select Rule or Table to view the asset quality statistics related to rules or tables.
| Category | Metric | Description |
| --- | --- | --- |
| Table | Number of tables with configured rules | The number of tables for which monitoring rules are configured in the current workspace as of the current day. The current day is determined by the date selected in the upper-right corner of the Asset Quality Overview page. |
| | Number of tables with quality issues | The number of tables that failed the check of monitoring rules after the related rule-based check instances finish running on the current day. Tables that failed the quality threshold check and tables that failed the system check are included.<br>Number of checked tables: the number of tables for which the related rule-based check instances finish running on the current day. A check instance is considered complete when the table passes the quality check, fails the quality check, or fails the system check. |
| | Number of strong-rule exception tables | The number of tables that failed the check of strong monitoring rules after the rules finish running on the current day. Tables that failed the quality threshold check and tables that failed the system check are included.<br>Number of blocked tables: the number of tables that failed the critical threshold check of a strong monitoring rule on the current day.<br>Number of alert tables: the number of tables that failed the warning threshold check of a strong monitoring rule on the current day. |
| | Number of weak-rule exception tables | The number of tables that failed the check of weak monitoring rules after the rules finish running on the current day. Tables that failed the quality threshold check and tables that failed the system check are included.<br>Number of alert tables: the number of tables that failed the critical threshold check of a weak monitoring rule on the current day.<br>Number of prompt tables: the number of tables that failed the warning threshold check of a weak monitoring rule on the current day. |
| Rule | Total number of quality rules | The total number of monitoring rules created in the current workspace as of the current day. |
| | Number of problem rules | The number of monitoring rules whose checks failed after the related rule-based check instances finish running on the current day. Quality threshold failures and system check failures are included.<br>Number of checked rules: the number of monitoring rules for which the related rule-based check instances finish running on the current day. A check instance is considered complete when the table passes the quality check, fails the quality check, or fails the system check. |
| | Number of strong-rule exceptions | The number of strong monitoring rules whose checks failed after the related rule-based check instances finish running on the current day. Quality threshold failures and system check failures are included.<br>Number of strong-rule red blocks: the number of strong monitoring rules whose critical threshold check failed on the current day.<br>Number of strong-rule orange alerts: the number of strong monitoring rules whose warning threshold check failed on the current day. |
| | Number of weak-rule exceptions | The number of weak monitoring rules whose checks failed after the related rule-based check instances finish running on the current day. Quality threshold failures and system check failures are included.<br>Number of weak-rule red alerts: the number of weak monitoring rules whose critical threshold check failed on the current day.<br>Number of weak-rule orange prompts: the number of weak monitoring rules whose warning threshold check failed on the current day. |
Overview of rule-based check
This section displays the trend and distribution of rule-based check instances.
| Metric | Description |
| --- | --- |
| Instance trend analysis | Displays the trend of the number of rule-based check instances by check result. You can select By Day or By Hour to change the time granularity. You can select All, Strong rules, or Weak rules to view the trend for check results of all monitoring rules, only strong monitoring rules, or only weak monitoring rules. |
| Running status | Displays the distribution of rule-based check instances by check result, including the number of rule-based check instances that are run on the current day. You can select All, Strong rules, or Weak rules to view the distribution for check results of all monitoring rules, only strong monitoring rules, or only weak monitoring rules. |
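The By Day and By Hour views described above bucket finished check instances by timestamp. A minimal sketch of that grouping, using made-up sample timestamps and statuses (not real DataWorks data or API calls):

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample data: (finish timestamp, check result) pairs.
instances = [
    ("2024-06-01T08:15:00", "passed"),
    ("2024-06-01T09:40:00", "alert"),
    ("2024-06-02T08:05:00", "passed"),
]

def trend(instances, by="day"):
    """Count check instances per day or per hour, as the trend chart does."""
    fmt = "%Y-%m-%d" if by == "day" else "%Y-%m-%d %H:00"
    return Counter(datetime.fromisoformat(ts).strftime(fmt) for ts, _ in instances)

print(trend(instances, by="day"))
print(trend(instances, by="hour"))
```

Filtering the same list by rule strength before counting would reproduce the Strong rules and Weak rules views.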
Overview of the top N tables that have the maximum number of data quality issues
This section displays statistics on the top 10 tables with the most data quality issues within the specified period, ranked by the number of monitoring rules that reported issues for each table, together with the owners of those issues. You can click View more in the upper-right corner to go to the Node Query page, where you can view historical checks and check details and resolve data quality issues at the earliest opportunity.
The owner of a monitoring rule is the data quality owner of the table partition. In most cases, this is the user who created the partition filter expression.
Asset quality configuration analysis
This section displays the overall coverage of monitoring rules from the following dimensions: ratio of tables for which rules are configured, number of tables for which rules are not configured, number of tables for which rules are not enabled, number of tables configured with rules that are not associated with a scheduling node, and number of tables configured with rules for which no alert recipient is configured. You can click View Details in the Actions column of a table to view its rule configuration details for a dimension. The dimensions are described as follows:
- Number of tables for which rules are not configured: counts tables for which no monitoring rule is configured. Note: This dimension is supported only for MaxCompute data sources.
- Number of tables for which rules are not enabled: counts tables for which no monitoring rule is enabled.
- Number of tables configured with rules that are not associated with a scheduling node: counts tables whose monitoring rules are not associated with any scheduling node. Such a rule can be executed only in a trial run.
- Number of tables configured with rules for which no alert recipient is configured: counts tables whose monitoring rules have no alert recipient. If no alert recipient is configured for a monitoring rule, no one receives the check result in a timely manner when the rule detects a data quality issue.
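The dimensions above can be reproduced from table metadata. The sketch below uses a hypothetical metadata shape (`rules`, `enabled`, `node`, `alert_to` are illustrative field names, not a DataWorks schema) to show how each count and the configured-rule ratio relate:

```python
# Hypothetical table metadata; field names are assumptions for illustration.
tables = [
    {"name": "dwd_orders", "rules": [{"enabled": True,  "node": "n1", "alert_to": ["ops"]}]},
    {"name": "dwd_users",  "rules": []},                                    # no rule configured
    {"name": "ods_logs",   "rules": [{"enabled": False, "node": None, "alert_to": []}]},
]

no_rules     = [t for t in tables if not t["rules"]]
not_enabled  = [t for t in tables if t["rules"] and not any(r["enabled"] for r in t["rules"])]
no_node      = [t for t in tables if t["rules"] and all(r["node"] is None for r in t["rules"])]
no_recipient = [t for t in tables if t["rules"] and all(not r["alert_to"] for r in t["rules"])]

# Ratio of tables for which rules are configured.
configured_ratio = (len(tables) - len(no_rules)) / len(tables)

print(len(no_rules), len(not_enabled), len(no_node), len(no_recipient))  # 1 1 1 1
```

Note that the four counts are not mutually exclusive: a single table (here, `ods_logs`) can appear in the not-enabled, no-scheduling-node, and no-alert-recipient dimensions at once.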