This topic provides the release notes for DataWorks V2.0 and describes its new features.
Release information
- Date: July 25, 2018
- Region: China (Shanghai)
- Content: Building on DataWorks V1.0, DataWorks V2.0 introduces new concepts, including workflows and SQL script templates, and improves the data development process. DataWorks V2.0 supports workspaces in basic mode and standard mode. Workspaces in standard mode isolate the development environment from the production environment, which helps you follow a standardized development process and reduces errors in production code.
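
The difference between the two workspace modes can be pictured with a minimal sketch. It is not DataWorks code: the `Workspace` class, its fields, and the project names are hypothetical and only model the idea that a basic-mode workspace maps to one MaxCompute project, while a standard-mode workspace maps to two.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Workspace:
    """Hypothetical model of a workspace and its MaxCompute project(s)."""
    name: str
    prod_project: str
    dev_project: Optional[str] = None  # set only for standard mode

    @property
    def mode(self) -> str:
        return "standard" if self.dev_project else "basic"

    def project_for(self, environment: str) -> str:
        """Return the project that code runs against in the given environment."""
        if environment not in ("development", "production"):
            raise ValueError(f"unknown environment: {environment}")
        if environment == "development" and self.dev_project:
            return self.dev_project
        # Basic mode: development and production share one project.
        return self.prod_project


# Illustrative names only.
basic = Workspace("sales", "sales_proj")
standard = Workspace("sales", "sales_proj", "sales_proj_dev")
```

In this model, development work in the standard-mode workspace never touches the production project, which is the isolation that standard mode is meant to provide.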
Available regions
All Alibaba Cloud regions support DataWorks V2.0.
Operation updates
Feature updates
DataWorks V2.0 improves the overall visual interaction and user experience of the DataStudio service. In addition, DataWorks V2.0 provides four new services: Alarm, Data Security Guard, Data Quality, and DataService Studio.
Service | Feature | Item | DataWorks V1.0 | DataWorks V2.0 | Description |
---|---|---|---|---|---|
MaxCompute project | Project management | Project management method | A DataWorks workspace is associated with one MaxCompute project. | The standard mode is added for workspaces. In standard mode, a DataWorks workspace is associated with two MaxCompute projects to isolate the development environment from the production environment. For more information, see Differences between workspaces in basic mode and workspaces in standard mode. | The development environment can be isolated from the production environment to ensure the stability of code in the production environment. |
DataStudio | Node development | Overall features | You can develop code and configure scheduling properties for a single node or a single flow. Then, you can commit the node or flow to Operation Center for automatic scheduling and running. | | |
DataStudio | Node development | SQL development | You can develop and test code for a single node or a single flow in the SQL editor. | The SQL editor provides more intelligent and user-friendly features, including syntax highlight, code formatting, intelligent code completion, error prompts, and schemas. The tab on the right of the SQL editor displays the SQL code structure in an intuitive manner. | |
DataStudio | Node development | Node configuration | You can develop your business only by associating nodes and flows. | Flows are replaced with workflows. You can develop your business by associating nodes in a workflow and adding the required tables, resources, and functions to the workflow. You can also integrate correlated workflows into a solution for unified management. | |
DataStudio | Node development | Scheduling cycle configuration | The scheduling cycle configuration of a node is subject to the scheduling cycle configuration of the flow to which the node belongs. | You cannot configure the scheduling cycle for associated nodes as a whole. Instead, you must configure the scheduling cycle for each node separately. The scheduling type of a node can be the same as or different from that of its ancestor or descendant nodes. | |
DataStudio | Node development | Dependency | You can configure dependencies only between flows. | You can configure dependencies between nodes of different workflows. | |
DataStudio | Script | Overall features | You can use scripts to process acyclic temporary data. For example, you can use a script to add, delete, or modify a temporary table. This feature is a supplement to auto triggered nodes and does not require scheduling cycle and dependency configurations. | The feature name is changed to Ad Hoc Query. | |
DataStudio | Manually triggered nodes | Overall features | In a manually triggered flow, all nodes must be manually triggered and cannot be automatically scheduled by DataWorks. | The feature name is changed to Manually Triggered Workflows. | |
DataStudio | Resource management | Overall features | Resource management is an independent feature used to manage all resources in a MaxCompute project, including the JAR, file, and archive resources. | Resource management is changed to a feature of workflows. You can add the required resources to a workflow and create multi-level folders to manage these resources. | |
DataStudio | Function management | Overall features | Function management is an independent feature used to manage the built-in functions and custom functions required by an ODPS SQL node. | Function management is an independent feature used to manage all functions. It is also a feature of workflows and allows you to manage the functions of a workflow. | |
DataStudio | Table query | Overall features | You can find, preview, and reference all tables of a MaxCompute project. | No update. | |
DataStudio | Table management | Overall features | Not supported. | Table management is added. It allows developers to manage their own tables. For example, developers can configure the lifecycle and modify the category, description, fields, and partitions for tables, and hide, show, and delete tables. | |
DataStudio | Ad hoc query | Overall features | Not supported. | The ad hoc query feature is added to test code in the development environment. You do not need to commit and deploy ad hoc query nodes or configure scheduling parameters for ad hoc query nodes. | |
DataStudio | SQL script template management | Overall features | Not supported. | SQL script template management is added. It allows you to abstract SQL code as an SQL script template to reuse the SQL code. You can select SQL script templates and configure input and output parameters for these templates based on your business requirements. | |
DataStudio | Operating history | Overall features | Not supported. | You can view the records of all nodes that are run in the development environment in the last three days. You can also view and filter the running results of nodes. | |
DataStudio | Filtering of SQL results | Overall features | Not supported. | An Excel component is integrated into the SQL editor so that you can filter, screen, and sort the running results of SQL statements. | |
DataStudio | Recycle bin | Overall features | Not supported. | A recycle bin is added to prevent misoperation. You can view the nodes deleted from the current workspace in the recycle bin and restore them based on your business requirements. | |
DataStudio | Global code search | Overall features | Not supported. | Global code search is supported. You can search for an ODPS SQL, Shell, or data synchronization node by entering a part of a string included in the node to quickly find the node that you want to view or manage. | |
DataStudio | Node deployment | Overall features | DataWorks V1.0 supports node deployment. | The feature name is changed to Cross-project cloning. You can clone a node only between workspaces in basic mode. | |
Operation Center | Node list | Feature | You can search for a node by node type, node name, and owner in the node list. | More filter conditions are added to help you search for nodes, including Workflow, Solution, and Baseline. | Nodes can be managed based on the business. More features are added for node development. |
Operation Center | Node O&M | Feature | You can search for a node by node type, node name, owner, data timestamp, and running date in the node list. | More filter conditions are added to help you search for nodes, including Workflow, Solution, and Baseline. | |
Operation Center | Alerting | Feature | You can configure monitoring and alerting based on the completion status and running status of nodes. | The smart baseline, event alerting, and rule management features are added to build a more intelligent and comprehensive alerting system. | |
Alarm | | | | The Alarm service is added as a node monitoring and analysis system of DataWorks. The intelligent monitoring system monitors the status of nodes and sends alert notifications based on the intervals, notification methods, and notification recipients that are specified in alert rules. It can automatically select the most appropriate alerting time, notification methods, and recipients. | An end-to-end cloud platform is built to provide data development, security, governance, and sharing services. |
Data Quality | | | | The Data Quality service is added for you to control the data quality of heterogeneous data sources. In Data Quality, you can check data quality, configure alert notifications, and manage data sources. Data Quality monitors data in datasets and allows you to monitor MaxCompute tables and DataHub topics. When offline MaxCompute data changes, Data Quality checks data and blocks nodes if it detects exceptions. This prevents nodes from being affected. In addition, Data Quality allows you to manage the check result history so that you can analyze and evaluate the data quality. | |
DataService Studio | | | | The DataService Studio service is added. You can quickly create APIs in DataService Studio based on tables, register existing APIs with DataService Studio, publish APIs, and manage APIs in a centralized manner. DataService Studio and API Gateway are interconnected. This allows you to publish APIs to API Gateway with ease. DataService Studio works together with API Gateway to provide a secure, stable, low-cost, and easy-to-use data sharing service. | |
Data Security Guard | | | | The Data Security Guard service is added to identify data assets, detect sensitive data, classify data, mask data, monitor data access behavior, report alerts, and audit risks. | |
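
The idea behind SQL script templates in the table above, abstracting SQL code and reusing it with different input and output parameters, can be illustrated with a minimal sketch. It uses the Python standard library `string.Template` rather than any DataWorks API. The template text, the parameter names, and the `render_template` helper are assumptions for illustration only; DataWorks defines its own template parameter syntax, which is not shown here.

```python
import re
from string import Template

# Hypothetical reusable SQL template with input and output parameters.
TOP_N_TEMPLATE = Template(
    "INSERT OVERWRITE TABLE ${output_table}\n"
    "SELECT ${key_column}, COUNT(*) AS cnt\n"
    "FROM ${input_table}\n"
    "GROUP BY ${key_column}\n"
    "ORDER BY cnt DESC\n"
    "LIMIT ${top_n};"
)

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def render_template(input_table: str, output_table: str,
                    key_column: str, top_n: int) -> str:
    """Fill the template after basic validation of every parameter."""
    for name in (input_table, output_table, key_column):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"invalid identifier: {name!r}")
    if top_n <= 0:
        raise ValueError("top_n must be positive")
    return TOP_N_TEMPLATE.substitute(
        input_table=input_table,
        output_table=output_table,
        key_column=key_column,
        top_n=top_n,
    )


# The same template is reused for two different pairs of tables.
print(render_template("sales_orders", "top_customers", "customer_id", 10))
print(render_template("page_views", "top_pages", "page_id", 5))
```

Each call produces a complete statement from the same template body, so the SQL logic is written once and only the parameters change between uses.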