This topic describes the release notes and new features of DataWorks V3.0.
- Date: December 18, 2019
- Region: all regions that support DataWorks
- Content: In addition to the MaxCompute engine supported by DataWorks V2.0, DataWorks V3.0 integrates many other computing engines such as E-MapReduce, Realtime Compute, Hologres, and Graph Compute to build a multi-engine architecture.
You can add multiple types of engine instances to manage your workflows, nodes, and tables in a workspace. You can also configure dependencies between nodes in different regions to flexibly schedule nodes and facilitate cross-region collaboration.
- Multiple types of computing engines
DataWorks V3.0 supports plug-ins of multiple computing engines. In addition to the MaxCompute engine supported by DataWorks V2.0, DataWorks V3.0 integrates many other computing engines such as E-MapReduce, Realtime Compute, Hologres, and Graph Compute.
- MaxCompute: MaxCompute is a fast, fully managed computing platform for large-scale data warehousing that can process exabytes of data. MaxCompute is the first and most mature computing engine supported by DataWorks, and almost all of its features have been seamlessly integrated into DataWorks. For more information, see What is MaxCompute?
- E-MapReduce: E-MapReduce is a big data engine that runs on Alibaba Cloud Elastic Compute Service (ECS) based on open-source Apache Hadoop and Apache Spark. You can analyze and process your data by using peripheral systems such as Apache Hive in the Hadoop and Spark ecosystems.
All services and features of DataWorks V3.0 support E-MapReduce, such as metadata management, Data Map, data lineage, DataStudio, scheduling, node management and monitoring, and Data Quality. Currently, only DataWorks V3.0 Professional Edition and higher support E-MapReduce. To use E-MapReduce in DataWorks, you must bind the target workspace ID to an E-MapReduce cluster and add the endpoint of the E-MapReduce cluster to the whitelist of DataWorks. For more information, see What is E-MapReduce?
- Realtime Compute: Built on Apache Flink, Realtime Compute is a one-stop, high-performance platform used to process big data in real time. All services and features of DataWorks V3.0 support Realtime Compute. DataWorks V3.0 provides Stream Studio for you to create real-time computing nodes by using drag-and-drop operations in directed acyclic graphs (DAGs). DAGs and Flink SQL statements can be converted into each other to facilitate the development of real-time computing nodes with intelligent management and diagnosis.
- Hologres: Hologres is a real-time interactive data analysis service that is fully compatible with PostgreSQL and seamlessly integrated with Alibaba Cloud big data services.
With Hologres, you can analyze trillions of data records across multiple dimensions with high concurrency and low latency, and explore new business opportunities. You can also use your business intelligence (BI) tools with Hologres.
DataWorks V3.0 provides Holo Studio, a one-stop online analytical processing (OLAP) development service. Holo Studio standardizes development management and helps you build real-time data warehouses simply and efficiently.
- Graph Compute: Graph Compute is a new-generation, one-stop platform used to manage and analyze graph data. It supports quick data loading, auto scaling, millisecond-level query latency, hybrid computing engines for online and offline graph computing, and shared data storage.
DataWorks V3.0 provides Graph Studio based on Graph Compute, which supports one-stop development services, including instance modeling, data import, data query by using Gremlin, and visualized data analysis.
- Custom wrappers
In addition to a variety of computing engines, DataWorks V3.0 Enterprise Edition also supports custom wrappers. You can use wrappers to access specified computing services or database query services. For more information, see Overview of custom node types.
You can also use the intelligent SQL editor to configure custom nodes and schedule, orchestrate, manage, and monitor these custom nodes as required.
- Multiple engine instances in a workspace
In DataWorks V2.0, you can configure only one engine instance for a workspace. For example, if the computing engine is MaxCompute, you can create only one MaxCompute project for a workspace. DataWorks V3.0 Professional Edition and higher allow you to create multiple engine instances for a workspace or bind multiple existing engine instances to it. This way, you can manage the computing engines, computing nodes, and tables required by your business more flexibly.
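The difference between the two behaviors can be pictured with a small model. The following is a minimal Python sketch, not DataWorks code: the `Workspace` and `EngineInstance` classes, the `max_instances` parameter, the `try_bind` helper, and all instance names are hypothetical. It only illustrates that a workspace limited to one engine instance rejects a second one, whereas a workspace without that limit accepts several instances of the same or different engine types.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class EngineInstance:
    engine: str  # for example "MaxCompute" or "E-MapReduce"
    name: str    # hypothetical instance name


@dataclass
class Workspace:
    name: str
    # None means no limit; 1 models the single-instance behavior of V2.0.
    max_instances: Optional[int] = None
    instances: List[EngineInstance] = field(default_factory=list)

    def bind(self, instance: EngineInstance) -> None:
        """Attach an engine instance, enforcing the workspace limit."""
        limit = self.max_instances
        if limit is not None and len(self.instances) >= limit:
            raise ValueError(
                f"workspace {self.name!r} allows only {limit} engine instance(s)"
            )
        self.instances.append(instance)


def try_bind(workspace: Workspace, instance: EngineInstance) -> bool:
    """Return True if the instance was bound, False if the limit blocked it."""
    try:
        workspace.bind(instance)
        return True
    except ValueError:
        return False


# Hypothetical workspaces: one with the V2.0-style limit, one without.
v2_style = Workspace("ws_v2", max_instances=1)
v3_style = Workspace("ws_v3_professional")

mc_a = EngineInstance("MaxCompute", "mc_project_a")
mc_b = EngineInstance("MaxCompute", "mc_project_b")
emr = EngineInstance("E-MapReduce", "emr_cluster_a")

v2_results = [try_bind(v2_style, mc_a), try_bind(v2_style, mc_b)]
v3_results = [try_bind(v3_style, i) for i in (mc_a, mc_b, emr)]
print(v2_results)  # [True, False]
print(v3_results)  # [True, True, True]
```

Modeling the limit as a single optional number keeps the sketch short. How the product actually enforces limits per edition or per engine type is not described in this topic.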
- Cross-region dependencies
In DataWorks V2.0, you can configure dependencies only between nodes in the same region. DataWorks V3.0 Enterprise Edition and higher allow you to configure dependencies between nodes in different regions in mainland China, provided that the nodes belong to the same Alibaba Cloud account. You can use this feature to schedule nodes for your business across regions.
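The conditions described above can be expressed as a simple check. The following Python sketch is illustrative only and does not reflect how DataWorks validates dependencies: the `Node` class, the `can_depend` function, the `cross_region_enabled` flag (standing in for an edition that includes the feature), and the node names are hypothetical. The region set is an illustrative subset of mainland China region IDs, not the authoritative list of supported regions.

```python
from dataclasses import dataclass

# Illustrative subset of mainland China region IDs (assumption, not the
# authoritative list of regions that support this feature).
MAINLAND_CHINA_REGIONS = {"cn-hangzhou", "cn-shanghai", "cn-beijing", "cn-shenzhen"}


@dataclass(frozen=True)
class Node:
    name: str
    region: str
    account_id: str


def can_depend(downstream: Node, upstream: Node, cross_region_enabled: bool) -> bool:
    """Return True if downstream may declare a dependency on upstream."""
    if downstream.region == upstream.region:
        # Same-region dependencies are already possible in V2.0.
        return True
    # Cross-region case: needs the feature, one account, and both regions
    # located in mainland China.
    return (
        cross_region_enabled
        and downstream.account_id == upstream.account_id
        and downstream.region in MAINLAND_CHINA_REGIONS
        and upstream.region in MAINLAND_CHINA_REGIONS
    )


# Hypothetical nodes used to exercise the check.
ods_shanghai = Node("ods_orders", "cn-shanghai", "account_1")
dwd_shanghai = Node("dwd_orders", "cn-shanghai", "account_1")
dwd_beijing = Node("dwd_orders_bj", "cn-beijing", "account_1")
dwd_singapore = Node("dwd_orders_sg", "ap-southeast-1", "account_1")
dwd_other_account = Node("dwd_orders_x", "cn-beijing", "account_2")

print(can_depend(dwd_shanghai, ods_shanghai, cross_region_enabled=False))
print(can_depend(dwd_beijing, ods_shanghai, cross_region_enabled=False))
print(can_depend(dwd_beijing, ods_shanghai, cross_region_enabled=True))
print(can_depend(dwd_singapore, ods_shanghai, cross_region_enabled=True))
print(can_depend(dwd_other_account, ods_shanghai, cross_region_enabled=True))
```

In this sketch only the first and third calls return `True`: the same-region pair, and the Shanghai-to-Beijing pair with the feature enabled.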
- Resource group orchestration (available soon)
DataWorks V3.0 will support resource group orchestration. With this feature, you can quickly configure and change the resource group for multiple nodes at a time. For example, you can change multiple nodes from the default shared resource group to an exclusive resource group.
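Conceptually, this is a bulk update of the node-to-resource-group mapping. Because the feature is not yet released, the following Python sketch is purely hypothetical: the `reassign_resource_group` function, the dictionary layout, and the node and resource group names are assumptions made for illustration and are not a DataWorks API.

```python
from typing import Dict, Iterable, List


def reassign_resource_group(
    assignments: Dict[str, str],
    node_names: Iterable[str],
    target_group: str,
) -> List[str]:
    """Point every listed node at target_group and return the nodes that changed.

    All names are validated first, so an unknown node leaves the mapping untouched.
    """
    names = list(node_names)
    missing = [name for name in names if name not in assignments]
    if missing:
        raise KeyError(f"unknown nodes: {missing}")

    changed = []
    for name in names:
        if assignments[name] != target_group:
            assignments[name] = target_group
            changed.append(name)
    return changed


# Hypothetical mapping of node name -> resource group.
assignments = {
    "load_orders": "default_shared",
    "clean_orders": "default_shared",
    "report_daily": "exclusive_group_1",
}

changed = reassign_resource_group(
    assignments,
    ["load_orders", "clean_orders", "report_daily"],
    "exclusive_group_1",
)
print(changed)  # ['load_orders', 'clean_orders']
```

Validating every node name before changing anything keeps the update all-or-nothing, which is one reasonable design for a batch operation. Whether the actual feature behaves this way is not stated in this topic.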
- Workspace import and export (available soon)
DataWorks V2.0 supports the data backup and recovery of workspaces. DataWorks V3.0 upgrades this feature and makes it more flexible. You can import or export nodes, table data definition language (DDL) statements, resources, functions, and connections to or from workspaces. This facilitates workspace migration and initialization.