This topic describes the features and basic use scenarios of DataWorks modules.
Main modules and data processing procedure
DataWorks is an end-to-end data development and governance platform. The data processing procedure includes the following phases.
The following sections describe the DataWorks modules that are involved in each phase.
Module: Data Integration
Data Integration is a stable, efficient, and scalable data synchronization platform. It migrates and synchronizes data between heterogeneous data sources at high speed and with stable performance, even in complex network environments. Data Integration supports batch synchronization, real-time synchronization, and integrated batch and real-time synchronization. You can synchronize data at the table or database level. For example, you can synchronize full and incremental data in a database in real time, or synchronize data in a database in offline mode.
References: Data Integration overview
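The difference between full and incremental synchronization mentioned above can be illustrated with a minimal sketch. This is not the Data Integration implementation or API. The `Row` type, the `updated_at` column, and the watermark approach are assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    id: int
    value: str
    updated_at: int  # hypothetical modification timestamp on each source row


def full_sync(source: list[Row]) -> dict[int, Row]:
    """Copy every source row into a fresh target keyed by primary key."""
    return {row.id: row for row in source}


def incremental_sync(source: list[Row], target: dict[int, Row], watermark: int) -> int:
    """Upsert only rows changed after `watermark` and return the new watermark."""
    new_watermark = watermark
    for row in source:
        if row.updated_at > watermark:
            target[row.id] = row
            new_watermark = max(new_watermark, row.updated_at)
    return new_watermark


# Initial load: copy everything, then remember the latest change that was seen.
source = [Row(1, "a", 10), Row(2, "b", 20)]
target = full_sync(source)
watermark = max(row.updated_at for row in source)

# Later the source changes: one row is updated and one row is added.
source[0] = Row(1, "a2", 30)
source.append(Row(3, "c", 40))

# Only the two changed rows are written to the target.
watermark = incremental_sync(source, target, watermark)
```

In this sketch the full load runs once, and each later run moves only the rows that changed since the stored watermark.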
Data modeling and development
Module: Data Modeling
Data Modeling is the first step of end-to-end data governance. It applies the modeling methodology of the Alibaba data mid-end and interprets the business data of an enterprise from a business perspective by using four modules: data warehouse planning, data standard, dimensional modeling, and data metric. This allows personnel inside the enterprise to quickly understand and share a common way of measuring and interpreting business data that complies with data warehousing specifications.
References: Data Modeling overview
Module: DataStudio
DataStudio is an end-to-end big data development platform that you can use to develop data processing tasks online for multiple types of big data compute engines, such as MaxCompute, E-MapReduce (EMR), Hologres, CDP, and AnalyticDB. DataStudio has built-in task scheduling capabilities and supports centralized orchestration and scheduling for tens of millions of instances. DataStudio also provides a control process for task deployment, which helps ensure the stability of task output.
References: DataStudio overview
Module: Operation Center
Operation Center is an end-to-end big data O&M and monitoring platform. Operation Center allows you to view the status of tasks and perform O&M operations on tasks on which exceptions occur. For example, you can run intelligent diagnostics on tasks and rerun them in Operation Center. Operation Center also provides the intelligent baseline feature, which addresses issues such as unpredictable output times for important tasks and the difficulty of monitoring a large number of tasks. This feature helps you ensure the timeliness of task output.
References: Operation Center overview
Module: Data Quality
Data Quality ensures data availability throughout the end-to-end data R&D process and efficiently provides reliable data for your business. Data Quality helps you identify data quality issues at the earliest opportunity and prevents the issues from escalating. It does so by running quality checks based on monitoring rules and by associating monitoring rules with task scheduling processes.
References: Data Quality overview
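The idea of combining monitoring rules with task scheduling can be illustrated with a minimal sketch. This is not the Data Quality API. The `Rule` type, the `blocking` flag, and the sample rules are hypothetical and only show how a failed check could stop downstream tasks before bad data spreads.

```python
from dataclasses import dataclass
from typing import Callable

Rows = list[dict]


@dataclass
class Rule:
    name: str
    check: Callable[[Rows], bool]  # returns True when the data passes the check
    blocking: bool  # hypothetical flag: a failed blocking rule stops downstream tasks


def run_quality_gate(rows: Rows, rules: list[Rule]) -> tuple[bool, list[str]]:
    """Evaluate every rule and return (may_continue, names of failed rules)."""
    failed = [rule for rule in rules if not rule.check(rows)]
    may_continue = not any(rule.blocking for rule in failed)
    return may_continue, [rule.name for rule in failed]


rules = [
    Rule("table_not_empty", lambda rows: len(rows) > 0, blocking=True),
    Rule(
        "no_null_user_id",
        lambda rows: all(r.get("user_id") is not None for r in rows),
        blocking=True,
    ),
    Rule(
        "amount_non_negative",
        lambda rows: all(r.get("amount", 0) >= 0 for r in rows),
        blocking=False,
    ),
]

# Output of an upstream task that contains one bad record.
task_output = [
    {"user_id": 1, "amount": 30},
    {"user_id": None, "amount": -5},
]

may_continue, failed_rules = run_quality_gate(task_output, rules)
if not may_continue:
    print("Downstream tasks are held back; failed rules:", failed_rules)
```

In this sketch a scheduler would call `run_quality_gate` after a task finishes and start downstream tasks only when `may_continue` is true. Non-blocking rules are still reported but do not stop the workflow.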
Module: DataAnalysis
DataAnalysis allows you to analyze, edit, and share data online. It provides the SQL query and workbook features.
References: DataAnalysis overview
Module: Data Map
Data Map is an enterprise-grade data management platform built on the underlying unified metadata services. You can use Data Map to manage, sort, and quickly search for data objects and to gain an in-depth understanding of them.
References: Data Map overview
Module: Security Center
Security Center is an end-to-end data security governance platform that covers data asset classification, sensitive data identification, data authorization management, sensitive data masking, auditing of access to sensitive data, and risk identification and response. Security Center helps you identify data security governance issues.
References: Security Center overview
Module: Data Governance Center
Data Governance Center automatically identifies items to be governed in multiple governance fields based on rules that are derived from experience in data-related fields. It provides governance and optimization solutions that range from pre-event issue prevention to post-event issue resolution. Data Governance Center helps you complete data governance in an active and systematic manner.
References: Data Governance Center overview
Module: DataService Studio
DataService Studio provides comprehensive data service and sharing capabilities for enterprises and helps them manage API services for internal and external systems in a centralized manner.
References: DataService Studio overview
Scenario and module
Data security - Data Security Guard
Data Security Guard is a module that ensures data security. The module provides various features, such as identifying and masking sensitive data, adding watermarks to data, managing data permissions, identifying and auditing data risks, and tracing leak sources.
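Identifying and masking sensitive data can be illustrated with a minimal sketch. This is not how Data Security Guard is implemented or configured. The two regular expressions and the masking format are assumptions chosen only to show the identify-then-mask idea.

```python
import re

# Hypothetical detection patterns; real identification rules would be far richer.
EMAIL = re.compile(
    r"\b([A-Za-z0-9._%+-])[A-Za-z0-9._%+-]*@([A-Za-z0-9.-]+\.[A-Za-z]{2,})\b"
)
PHONE = re.compile(r"\b(\d{3})\d{4}(\d{4})\b")  # assumes 11-digit phone numbers


def mask_sensitive(text: str) -> str:
    """Mask e-mail local parts and the middle digits of 11-digit numbers."""
    text = EMAIL.sub(r"\1***@\2", text)  # keep the first character and the domain
    text = PHONE.sub(r"\1****\2", text)  # keep the first three and last four digits
    return text


masked = mask_sensitive("Contact alice@example.com or 13812345678.")
print(masked)
```

The sketch keeps enough of each value for a reader to recognize its type while hiding the part that identifies a person.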
Process control and openness - Open Platform
Open Platform provides the OpenAPI, OpenEvent, and Extensions modules. You can use the modules to integrate DataWorks with your applications and subscribe to event messages. These modules facilitate process management of data processing, data governance, and data O&M, and allow you to identify important changes in DataWorks and respond to the changes at the earliest opportunity.
Task backup and migration - Migration Assistant
Migration Assistant allows you to migrate jobs of open source scheduling engines to DataWorks. Migration Assistant also allows you to migrate data objects within DataWorks across clouds, regions, or accounts. This way, you can quickly clone and deploy jobs in DataWorks. To quickly migrate data and jobs to the cloud, you can obtain help from the DataWorks team and the big data service team of Alibaba Cloud.