DataWorks: Usage description of DataWorks modules

Last Updated: Sep 22, 2023

This topic describes the features and basic use scenarios of DataWorks modules.

Main modules and data processing procedure

DataWorks is an end-to-end data development and governance platform. The data processing procedure includes the following phases.

[Figure: phases of the DataWorks data processing procedure]

The following sections describe the DataWorks modules that are involved in each phase.

Data integration

  • Module: Data Integration

  • Feature description:

    Data Integration is a stable, efficient, and scalable data synchronization platform. It is designed to migrate and synchronize data quickly and reliably between heterogeneous data sources in complex network environments. Data Integration supports batch synchronization, real-time synchronization, and integrated batch and real-time synchronization, and allows you to synchronize data at the table or database level. For example, you can synchronize the full and incremental data of a database in real time, or synchronize the data of a database in batch (offline) mode.

  • References: Data Integration overview
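
The difference between full and incremental synchronization mentioned above can be illustrated with a minimal, in-memory sketch. It does not use any Data Integration API. The `updated_at` column, the `full_sync` and `incremental_sync` helpers, and the sample rows are all hypothetical and exist only for illustration.

```python
from typing import Any, Dict

Row = Dict[str, Any]
Table = Dict[int, Row]  # primary key -> row


def full_sync(source: Table) -> Table:
    """Copy every row from the source into a fresh destination table."""
    return {key: dict(row) for key, row in source.items()}


def incremental_sync(source: Table, destination: Table, last_synced_at: int) -> int:
    """Upsert only the rows modified after the previous sync point.

    Assumes each row carries a hypothetical 'updated_at' timestamp.
    Returns the number of rows written to the destination.
    """
    written = 0
    for key, row in source.items():
        if row["updated_at"] > last_synced_at:
            destination[key] = dict(row)
            written += 1
    return written


source: Table = {
    1: {"name": "alice", "updated_at": 100},
    2: {"name": "bob", "updated_at": 105},
}

# Initial full load of the whole table.
destination = full_sync(source)

# Changes that arrive after the full load: one update and one new row.
source[2] = {"name": "bobby", "updated_at": 120}
source[3] = {"name": "carol", "updated_at": 130}

# Only the rows changed since the last sync point are written.
changed = incremental_sync(source, destination, last_synced_at=105)
print(changed, destination == source)
```

A timestamp column is only one possible way to detect changes. Real-time synchronization typically relies on a change log of the source database instead, which this sketch does not model.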

Data modeling and development

  • Module: Data Modeling

  • Feature description:

    Data Modeling is the first step in end-to-end data governance. It applies the modeling methodology of the Alibaba data mid-end and interprets the business data of an enterprise from a business perspective by using the data warehouse planning, data standard, dimensional modeling, and data metric modules. This allows personnel inside the enterprise to quickly understand and share a common approach to measuring and interpreting business data that complies with data warehousing specifications.

  • References: Data Modeling overview

  • Module: DataStudio

  • Feature description:

    DataStudio is an end-to-end big data development platform that you can use to develop data processing tasks online for multiple types of big data compute engines, such as MaxCompute, E-MapReduce (EMR), Hologres, CDP, and AnalyticDB. DataStudio integrates powerful task scheduling capabilities and supports centralized orchestration and scheduling of tens of millions of instances. DataStudio also provides a controlled process for task deployment, which helps ensure the stability of task output.

  • References: DataStudio overview

  • Module: Operation Center

  • Feature description:

    Operation Center is an end-to-end big data O&M and monitoring platform. Operation Center allows you to view the status of tasks and perform O&M operations on tasks on which exceptions occur. For example, you can run intelligent diagnostics on tasks and rerun them in Operation Center. Operation Center also provides the intelligent baseline feature, which addresses issues such as unpredictable output times of important tasks and the difficulty of monitoring large numbers of tasks. This feature helps you ensure the timeliness of task output.

  • References: Operation Center overview

  • Module: Data Quality

  • Feature description:

    Data Quality ensures data availability throughout the end-to-end data R&D process and provides reliable data for your business in an efficient manner. Data Quality runs quality checks that are based on monitoring rules and associates the monitoring rules with task scheduling processes. This helps you identify data quality issues at the earliest opportunity and prevents the issues from escalating.

  • References: Data Quality overview
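
The idea of combining monitoring rules with task scheduling can be sketched as a simple quality gate. This is a conceptual illustration, not the Data Quality API. The `Rule` class, the `blocking` flag, the `evaluate` helper, and the sample rules are hypothetical, and the assumption that a failed blocking rule holds back downstream tasks is made only for this example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


@dataclass
class Rule:
    """A hypothetical monitoring rule; not a DataWorks API object."""
    name: str
    check: Callable[[Sequence[Row]], bool]  # returns True when the data passes
    blocking: bool  # assumption: a failed blocking rule stops downstream tasks


def evaluate(rows: Sequence[Row], rules: Sequence[Rule]) -> Tuple[List[str], bool]:
    """Run every rule and report which ones failed and whether to block."""
    failed = [rule for rule in rules if not rule.check(rows)]
    should_block = any(rule.blocking for rule in failed)
    return [rule.name for rule in failed], should_block


rules = [
    Rule("table_not_empty", lambda rows: len(rows) > 0, blocking=True),
    Rule("user_id_not_null",
         lambda rows: all(r["user_id"] is not None for r in rows),
         blocking=True),
    Rule("amount_non_negative",
         lambda rows: all(r["amount"] >= 0 for r in rows),
         blocking=False),
]

# Output of an upstream task; the second row violates the not-null rule.
rows = [{"user_id": 1, "amount": 10.0}, {"user_id": None, "amount": 5.0}]
failed_rules, should_block = evaluate(rows, rules)

if should_block:
    print(f"Downstream tasks held back; failed rules: {failed_rules}")
else:
    print("Checks passed or only raised warnings; downstream tasks may run")
```

Separating blocking rules from warning-only rules lets critical problems stop dirty data from spreading downstream, while less severe findings are only reported.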

Data analysis

  • Module: DataAnalysis

  • Feature description:

    DataAnalysis allows you to analyze, edit, and share data online. It provides the SQL query and workbook features.

  • References: DataAnalysis overview

Data governance

  • Module: Data Map

  • Feature description:

    Data Map is an enterprise-grade data management platform. Based on the underlying unified metadata services, it helps you manage, sort, quickly search for, and gain an in-depth understanding of data objects.

  • References: Data Map overview

  • Module: Security Center

  • Feature description:

    Security Center is an end-to-end data security governance platform that covers the classification of data assets, identification of sensitive data, management of data-related authorization, masking of sensitive data, auditing of access to sensitive data, and risk identification and response. Security Center helps you identify data security governance issues.

  • References: Security Center overview
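
Sensitive data identification and masking, two of the capabilities listed above, can be illustrated with a minimal sketch. It does not reflect how Security Center is implemented or configured. The regular expression, the `identify_sensitive_fields` and `mask_record` helpers, and the masking format are hypothetical, and only email addresses are treated as sensitive here.

```python
import re
from typing import Dict, List

# Simplified email pattern: group 1 is the first character of the local part,
# group 2 is the domain. Real identification rules are far more thorough.
EMAIL = re.compile(r"([A-Za-z0-9._%+-])[A-Za-z0-9._%+-]*@([A-Za-z0-9.-]+\.[A-Za-z]{2,})")


def identify_sensitive_fields(record: Dict[str, str]) -> List[str]:
    """Return the names of fields whose values contain an email address."""
    return [field for field, value in record.items() if EMAIL.search(value)]


def mask_record(record: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the record with email local parts partially hidden."""
    return {
        field: EMAIL.sub(lambda m: f"{m.group(1)}***@{m.group(2)}", value)
        for field, value in record.items()
    }


record = {"name": "Alice", "contact": "alice@example.com"}
sensitive = identify_sensitive_fields(record)
masked = mask_record(record)
print(sensitive, masked)
```

The original record is left unchanged, so the unmasked values remain available to users who are authorized to see them.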

  • Module: Data Governance Center

  • Feature description:

    Data Governance Center automatically identifies items to be governed in multiple governance fields based on rules that are derived from experience in data-related fields. It provides governance and optimization solutions that range from pre-event issue prevention to post-event issue resolution. Data Governance Center helps you complete data governance in an active and systematic manner.

  • References: Data Governance Center overview

Data service

  • Module: DataService Studio

  • Feature description:

    DataService Studio is designed to provide comprehensive data service and sharing capabilities for enterprises and helps enterprises manage API services for internal and external systems in a centralized manner.

  • References: DataService Studio overview

Other modules

| Scenario and module | Feature description | References |
| --- | --- | --- |
| Data security - Data Security Guard | Data Security Guard is a module that ensures data security. The module provides various features, such as identifying and masking sensitive data, adding watermarks to data, managing data permissions, identifying and auditing data risks, and tracing leak sources. | Data Security Guard overview |
| Process control and openness - Open Platform | Open Platform provides the OpenAPI, OpenEvent, and Extensions modules. You can use the modules to integrate DataWorks with your applications and subscribe to event messages. These modules facilitate process management of data processing, data governance, and data O&M, and allow you to identify important changes in DataWorks and respond to the changes at the earliest opportunity. | Open Platform overview |
| Task backup and migration - Migration Assistant | Migration Assistant allows you to migrate jobs of open source scheduling engines to DataWorks. Migration Assistant also allows you to migrate data objects within DataWorks across clouds, regions, or accounts. This way, you can quickly clone and deploy jobs in DataWorks. To quickly migrate data and jobs to the cloud, you can obtain help from the DataWorks team and the big data service team of Alibaba Cloud. | Migration Assistant overview |