DataWorks: Overview

Last Updated: Aug 18, 2023

Data Governance Center detects issues that need to be handled in five dimensions: data storage, node computing, code development, data quality, and security. It provides health scores to evaluate the effectiveness of data governance, and it visualizes governance results through governance reports and rankings of governance issues from the global, workspace, and personal perspectives, which helps you reach governance goals efficiently. It also provides features such as node resource consumption details and cost estimation to help you control the costs of various types of resources.

Limits

  • Limits on editions

    Only DataWorks Enterprise Edition or a more advanced edition supports Data Governance Center. For information about DataWorks editions, see Differences among DataWorks editions. For information about how to activate DataWorks, see Purchase guide.

  • Limits on regions

    Data Governance Center is available in the following regions: China North 2 Ali Gov, China East 2 Finance, China (Shanghai), China (Hangzhou), China (Beijing), China (Shenzhen), China (Chengdu), China (Hong Kong), Singapore, US (Silicon Valley), Germany (Frankfurt), and Indonesia (Jakarta).

  • Limits on permissions

    • Users of Data Governance Center are mainly workspace administrators or common users. The following table describes the permissions of workspace administrators and common users on Data Governance Center.

      | Role | Permission description | References |
      | --- | --- | --- |
      | Workspace administrator | A workspace administrator can view governance reports from the workspace perspective. If you want to view governance reports in a workspace from the workspace perspective, you must be assigned the workspace administrator role of the workspace. | For information about how to grant permissions to users, see Add a RAM user to a workspace as a member and assign roles to the member. |
      | Common user | Common users are the personnel who handle detected issues in Data Governance Center. A common user can view check events and governance issues from the personal perspective and perform rectification operations. If you want to perform rectification operations on issues that are identified in a workspace of a tenant, you must be added to the workspace as a member. Note: By default, except for Alibaba Cloud accounts and RAM users to which the AliyunDataWorksFullAccess policy is attached, all other users are common users within a tenant. | For information about how to grant permissions to users, see Add a RAM user to a workspace as a member and assign roles to the member. |

    • Only Alibaba Cloud accounts and RAM users to which the AliyunDataWorksFullAccess policy is attached can use all features of Data Governance Center. The features that common users can use in Data Governance Center are limited. If you want to use all features of Data Governance Center as a RAM user, you must apply for the required permissions. For more information, see Grant the permissions to perform operations in DataWorks to a RAM user.

  • Limits on data sources

    Only MaxCompute and E-MapReduce (EMR) data sources support Data Governance Center.

Logic of data governance

Data Governance Center detects check events based on check items before data development nodes are committed and deployed, and detects governance issues based on governance items after the nodes are committed and deployed. This helps you handle events and issues that are related to your data in a comprehensive manner. If the check on an item is triggered for a node and the node fails the check, an event is generated. Severe events may block the subsequent data development process. You can view and handle the event in Data Governance Center. After the event is handled and the node passes the check, you can proceed to the subsequent data development process. The following figure shows the logic of data governance.

[Figure: logic of data governance]

DataWorks provides workspaces in standard and basic modes. The node development process varies based on the workspace mode. In this topic, a workspace in standard mode is used to show how to develop a node. The actual data development process varies based on the mode of your workspace. For information about the common development process in workspaces in different modes, see Node development process.

  • Check for violations against constraints based on check items.

    Check items are used to check nodes for violations against constraints before nodes are committed and deployed. Before you commit and deploy nodes, you can check whether data violates the constraints that you specify for data development by using the check items. If the system detects that the data violates the constraints, a check event is generated and blocks the subsequent data development process. You can handle issues that are related to the check event. This way, the data development process can be executed as expected.

  • Detect issues based on governance items.

    Governance items are used to detect governance issues after nodes are committed and deployed. After nodes are committed and deployed, you can view governance issues from the global, personal, or workspace perspective in Data Governance Center. Data governance engineers can handle detected governance issues and implement measures at the earliest opportunity to achieve data governance goals.
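The pre-commit side of this logic can be illustrated with a minimal Python sketch. It is not the DataWorks API: the `CheckItem` and `CheckEvent` classes, the item names, and the regular expressions are all hypothetical, and the sketch assumes for simplicity that every check event blocks the commit, whereas in Data Governance Center blocking can depend on the severity of the event.

```python
import re
from dataclasses import dataclass
from typing import List

# All names below are hypothetical and only illustrate the idea of check items.

@dataclass(frozen=True)
class CheckItem:
    name: str
    pattern: re.Pattern  # SQL pattern that counts as a violation

@dataclass(frozen=True)
class CheckEvent:
    node: str
    check_item: str

# Two illustrative constraints: no SELECT *, no CREATE TABLE.
CHECK_ITEMS = [
    CheckItem("no_select_star", re.compile(r"\bselect\s*\*", re.IGNORECASE)),
    CheckItem("no_create_table", re.compile(r"\bcreate\s+table\b", re.IGNORECASE)),
]

def run_checks(node: str, sql: str) -> List[CheckEvent]:
    """Return one check event for each check item that the node's SQL violates."""
    return [
        CheckEvent(node, item.name)
        for item in CHECK_ITEMS
        if item.pattern.search(sql)
    ]

def can_commit(node: str, sql: str) -> bool:
    """Simplifying assumption: any check event blocks the commit."""
    return not run_checks(node, sql)
```

With these assumptions, `can_commit("n1", "select * from t")` is `False` until the statement is rewritten to list its columns, which mirrors handling a check event before the development process continues.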

Terms

  • check item: A check item is used to check nodes for violations against constraints before the nodes are committed and deployed and generate check events that block the subsequent data development process. Check items can help you restrict and manage the data development process.

    For example, a check item can be configured to prohibit the use of the SELECT * statement or the CREATE TABLE statement.

  • check event: A check event is triggered by a check item and blocks the subsequent data development process.

  • governance item: A governance item is used by DataWorks to detect governance issues after nodes are committed and deployed. Governance items are classified into mandatory governance items and optional governance items. By default, mandatory governance items are globally enabled and cannot be disabled. You can enable optional governance items based on your business requirements.

    For example, you can use governance items to detect nodes that time out, nodes that fail to run several consecutive times, leaf nodes that are not accessed by users, or dry-run nodes.

  • governance issue: A governance issue is detected by DataWorks based on specific governance items and must be resolved through data governance and optimization.

  • governance plan template: A governance plan template is provided by Data Governance Center and contains built-in check items and governance items. By default, the governance plan template is enabled. You can use the governance plan template to detect issues in data. If the built-in check items and governance items in the template cannot meet your business requirements, you can add custom check items, and create rules to disable the governance items that you do not want to use.

  • health score: A health score is calculated based on the health assessment model provided by DataWorks and reflects the effectiveness of data governance.

  • governance unit: A governance unit consists of one or more workspaces. You can view statistics on the overall health score, governance issues, and check events of the workspaces within the governance unit.

  • knowledge base: The knowledge base provides solutions for the check events and governance issues that are detected based on the built-in check items and governance items.
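The difference between mandatory and optional governance items can be sketched in Python as follows. This is only an illustration under stated assumptions: the `GovernancePlan` class and the item and workspace names are hypothetical, which items are mandatory is made up for the example, and the sketch assumes that optional items stay active in a workspace until a rule disables them there.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

@dataclass
class GovernancePlan:
    """Hypothetical model of a governance plan template (not a DataWorks API)."""
    # Governance item name -> True if the item is mandatory.
    items: Dict[str, bool]
    # Workspace name -> governance items disabled in that workspace by a rule.
    disabled: Dict[str, Set[str]] = field(default_factory=dict)

    def disable(self, workspace: str, item: str) -> None:
        """Create a rule that disables an optional item for one workspace."""
        if self.items[item]:
            raise ValueError(f"{item} is mandatory and cannot be disabled")
        self.disabled.setdefault(workspace, set()).add(item)

    def active_items(self, workspace: str) -> Set[str]:
        """Items that are still used to detect governance issues in the workspace."""
        return set(self.items) - self.disabled.get(workspace, set())
```

For example, after `plan.disable("ws_a", "unused_leaf_node")` the item no longer appears in `plan.active_items("ws_a")`, while other workspaces are unaffected and an attempt to disable a mandatory item raises an error.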

Data governance procedure

The following figure shows the data governance procedure.

[Figure: data governance procedure]

  1. Configure governance tools.

    • Enable a governance plan template and configure custom items.

      | Operation | Description | References |
      | --- | --- | --- |
      | Enable a governance plan template | Data governance is performed based on a governance plan template in Data Governance Center. The governance plan template contains built-in check items and governance items. To use the data governance feature, you must enable a governance plan template before you perform the subsequent operations. You can use only the default governance plan template, which is enabled by default. | View a governance plan template |
      | Configure custom check items | If the check items provided in the template do not meet your business requirements, you can configure custom check items based on your business requirements. (1) Create a check item for a registered custom extension: DataWorks also allows you to create a check item in Data Governance Center for a custom extension. After that, Data Governance Center also detects the check events triggered by the custom extension. (2) Disable one or more check items: If the governance plan template contains a check item that is unnecessary for a workspace, you can disable the check item for this workspace. After you disable the check item, Data Governance Center does not detect the check event triggered by the check item in the specified workspace. | Configure check items |
      | Create rules to disable governance items | If the governance plan template contains a governance item that is unnecessary for a workspace, you can create a rule to disable the governance item in the specified workspace. After you disable the governance item, Data Governance Center does not detect governance issues based on the governance item in the specified workspace. No governance issues for the governance item are displayed on the Governance issues page. Note: You can disable only optional governance items. You cannot create governance items. | Configure governance items |

    • Optional. Configure a governance unit.

      DataWorks allows you to perform data governance on multiple workspaces in a centralized manner by creating a governance unit based on your business requirements. Then, you can view statistics on the overall health score, governance issues, and check events for the workspaces within the governance unit. For more information about how to create and manage a governance unit, see Configure a governance unit.

    • Optional. Configure notifications for detected issues.

      If you want the system to notify specified personnel of detected issues, you can configure notifications that use methods such as system messages, emails, DingTalk group messages, and webhook URLs. This way, the specified personnel can view and handle the issues at the earliest opportunity. For more information, see Configure a periodic notification for governance issues.

  2. Start a check and handle detected issues.

    • Check nodes for violations against constraints before the nodes are committed and deployed.

      Before nodes are committed and deployed, DataWorks checks the nodes based on the check items. If data violates the constraints, check events are generated. Then, you can view and handle the check events. For more information, see Handle check events.

    • Detect governance issues after the nodes are committed and deployed.

      After nodes are committed and deployed, DataWorks detects governance issues based on the governance items. Then, you can view and handle the governance issues. For more information, see View and handle governance issues.

    • Perform a special check by using the features on the Task 360 and Table 360 pages.

      You can use the features on the Task 360 and Table 360 pages to view and detect issues that occur on specific nodes or tables in a comprehensive manner. For more information, see Obtain a panoramic view of a node and Obtain a panoramic view of a table.

    If invalid issues are detected in this process, you can add the issues to a whitelist or undeploy related nodes on which invalid issues are detected. For more information, see Add invalid governance issues to a whitelist and Create and manage node undeployment plans.

  3. Select an analytical perspective.

    • Based on use scenarios: DataWorks provides multiple perspectives such as data production, data usage, and data management to help you analyze the effectiveness of data governance and govern data in an efficient manner.

    • Based on rational use of resources: DataWorks provides statistics on the resource consumption and node running status, the number and storage status of MaxCompute tables, and resource usage overview and details. Data developers and administrators can view and analyze the overall resource situation of a workspace and use resources in a rational manner based on the statistics. For more information, see Data pivoting.

  4. View governance results.

    After you handle the issues, you can go to the Governance assessment page to view the governance results of the operations that you perform from different perspectives on the Report and Rankings tabs. You can analyze the governance results to identify the dimensions and types of governance issues that frequently occur. This can help you take measures to handle the governance issues and achieve data governance goals. For more information about how to view the governance results, see View data governance results.

    Data Governance Center calculates health scores based on the governance items by using the health assessment model. You can view the health scores on the Report and Rankings tabs to learn the governance results. A higher health score indicates a better governance result. For more information about health scores, see the Quantitative assessment: health scores section in this topic.

Quantitative assessment: health scores

Health scores are calculated based on the metadata related to user behaviors, data characteristics, and node type of your data assets in the production, transmission, and management processes. Technologies such as data processing and machine learning are used to calculate health scores. You can view the health scores of your data assets from the personal or workspace perspective. Data Governance Center provides health scores in the following dimensions based on different metadata: storage, computing, R&D, quality, and security.

[Figure: health score description]

Health scores range from 0 to 100. A higher score indicates healthier data assets and a better governance result, which helps you use data in a secure, efficient, and stable manner and ensure data production and business operation. Data Governance Center calculates health scores based on the governance items by using the health assessment model, and you can view the scores to learn the governance results for the current account. The following table lists the assessment grades and the health score range for each grade.

| Grade | Health score range |
| --- | --- |
| Excellent | [90,100] |
| Good | [75,90) |
| Fair | [60,75) |
| To be improved | [30,60) |
| Poor | [0,30) |
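The grade ranges above are half-open intervals, so a score that sits exactly on a boundary belongs to the higher grade. A minimal Python sketch of the mapping follows. The function name `health_grade` is hypothetical and is not part of DataWorks; the sketch only restates the table and does not compute a health score.

```python
def health_grade(score: float) -> str:
    """Map a health score in [0, 100] to its assessment grade."""
    if not 0 <= score <= 100:
        raise ValueError("health score must be between 0 and 100")
    if score >= 90:
        return "Excellent"       # [90, 100]
    if score >= 75:
        return "Good"            # [75, 90)
    if score >= 60:
        return "Fair"            # [60, 75)
    if score >= 30:
        return "To be improved"  # [30, 60)
    return "Poor"                # [0, 30)
```

For example, a score of 75 is graded Good, while a score just below 75 is graded Fair.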