DataWorks V2.0

This topic provides the release notes for DataWorks V2.0 and describes its new features.

Release information

Version: DataWorks V2.0

Date: July 25, 2018
Region: China (Shanghai)
Content: Building on DataWorks V1.0, DataWorks V2.0 adds concepts such as workflows and SQL script templates and improves the data development process. DataWorks V2.0 supports workspaces in basic mode and standard mode. Workspaces in standard mode isolate the development environment from the production environment, which helps you develop data in a standard manner and reduces errors in code.
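
The difference between the two workspace modes can be sketched as a small data model. This is only an illustration of the concept described above, not a DataWorks API: the Workspace class, its fields, and the project names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workspace:
    """Hypothetical model of a DataWorks workspace and its MaxCompute projects."""

    name: str
    dev_project: str   # MaxCompute project used for development
    prod_project: str  # MaxCompute project used for production

    @property
    def mode(self) -> str:
        # Basic mode: one project serves both environments.
        # Standard mode: two distinct projects isolate the environments.
        return "basic" if self.dev_project == self.prod_project else "standard"

    def project_for(self, environment: str) -> str:
        """Return the MaxCompute project that backs the given environment."""
        if environment == "development":
            return self.dev_project
        if environment == "production":
            return self.prod_project
        raise ValueError(f"unknown environment: {environment!r}")


# Project names below are made up for illustration.
basic = Workspace("sales_basic", dev_project="sales", prod_project="sales")
standard = Workspace("sales_std", dev_project="sales_dev", prod_project="sales")
```

In this sketch, code run in the development environment of the standard-mode workspace resolves to a different project than production code, which is the isolation that standard mode is meant to provide.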

Available regions

All Alibaba Cloud regions support DataWorks V2.0.

Operation updates

Feature updates

DataWorks V2.0 improves the overall visual interaction and user experience of the DataStudio service. In addition, DataWorks V2.0 provides four new services: Alarm, Data Security Guard, Data Quality, and DataService Studio.


Service	Feature	Item	DataWorks V1.0	DataWorks V2.0	Description
MaxCompute project	Project management	Project management method	A DataWorks workspace is associated with one MaxCompute project.	The standard mode is added for workspaces. In standard mode, a DataWorks workspace is associated with two MaxCompute projects to isolate the development environment from the production environment. For more information, see Differences between workspaces in basic mode and workspaces in standard mode.	The development environment can be isolated from the production environment to ensure the stability of code in the production environment.
DataStudio	Node development	Overall features	You can develop code and configure scheduling properties for a single node or a single flow. Then, you can commit the node or flow to Operation Center for automatic scheduling and running.	The service name is changed to DataStudio. The concepts of a solution and a workflow are added. The concept of a flow is deleted. The SQL editor is more intelligent. More permissions are granted on node scheduling and dependency configurations.	The SQL editor is optimized to provide a more user-friendly and immersive SQL development experience. Node development and management are simplified by using workflows and solutions. More permissions are granted in the scheduling system to handle complex business requirements. Some other features are added to resolve pain points and improve user experience.
		SQL development	You can develop and test code for a single node or a single flow in the SQL editor.	The SQL editor provides more intelligent and user-friendly features, including syntax highlighting, code formatting, intelligent code completion, error prompts, and schemas. The tab on the right of the SQL editor displays the SQL code structure in an intuitive manner.
		Node configuration	You can develop your business only by associating nodes and flows.	Flows are replaced with workflows. You can develop your business by associating nodes in a workflow and adding the required tables, resources, and functions to the workflow. You can also integrate correlated workflows into a solution for unified management.
		Scheduling cycle configuration	The scheduling cycle configuration of a node is subject to the scheduling cycle configuration of the flow to which the node belongs.	You cannot configure the scheduling cycle for associated nodes as a whole. Instead, you must configure the scheduling cycle for each node separately. The scheduling type of a node can be the same as or different from that of its ancestor or descendant nodes.
		Dependency	You can configure dependencies only between flows.	You can configure dependencies between nodes of different workflows.
	Script	Overall features	You can use scripts to process acyclic temporary data. For example, you can use a script to add, delete, or modify a temporary table. This feature is a supplement to auto triggered nodes and does not require scheduling cycle and dependency configurations.	The feature name is changed to Ad Hoc Query.
	Manually triggered nodes	Overall features	In a manually triggered flow, all nodes must be manually triggered and cannot be automatically scheduled by DataWorks.	The feature name is changed to Manually Triggered Workflows.
	Resource management	Overall features	Resource management is an independent feature used to manage all resources in a MaxCompute project, including the JAR, file, and archive resources.	Resource management is changed to a feature of workflows. You can add the required resources to a workflow and create multi-level folders to manage these resources.
	Function management	Overall features	Function management is an independent feature used to manage the built-in functions and custom functions required by an ODPS SQL node.	Function management is an independent feature used to manage all functions. It is also a feature of workflows and allows you to manage the functions of a workflow.
	Table query	Overall features	You can find, preview, and reference all tables of a MaxCompute project.	No update.
	Table management	Overall features	Not supported.	Table management is added. It allows developers to manage their own tables. For example, developers can configure the lifecycle and modify the category, description, fields, and partitions for tables, and hide, show, and delete tables.
	Ad hoc query	Overall features	Not supported.	The ad hoc query feature is added to test code in the development environment. You do not need to commit and deploy ad hoc query nodes or configure scheduling parameters for ad hoc query nodes.
	SQL script template management	Overall features	Not supported.	SQL script template management is added. It allows you to abstract SQL code as an SQL script template to reuse the SQL code. You can select SQL script templates and configure input and output parameters for these templates based on your business requirements.
	Operating history	Overall features	Not supported.	You can view the records of all nodes that are run in the development environment in the last three days. You can also view and filter the running results of nodes.
	Filtering of SQL results	Overall features	Not supported.	An Excel component is integrated into the SQL editor so that you can filter and sort the running results of SQL statements.
	Recycle bin	Overall features	Not supported.	A recycle bin is added to prevent misoperation. You can view the nodes deleted from the current workspace in the recycle bin and restore them based on your business requirements.
	Global code search	Overall features	Not supported.	Global code search is supported. You can search for an ODPS SQL, Shell, or data synchronization node by entering a part of a string included in the node to quickly find the node that you want to view or manage.
	Node deployment	Overall features	DataWorks V1.0 supports node deployment.	The feature name is changed to Cross-project cloning. You can clone a node only between workspaces in basic mode.
Operation Center	Node list	Feature	You can search for a node by node type, node name, and owner in the node list.	More filter conditions are added to help you search for nodes, including Workflow, Solution, and Baseline.	Nodes can be managed based on the business. More features are added for node development.
	Node O&M	Feature	You can search for a node by node type, node name, owner, data timestamp, and running date in the node list.	More filter conditions are added to help you search for nodes, including Workflow, Solution, and Baseline.
	Alerting	Feature	You can configure monitoring and alerting based on the completion status and running status of nodes.	The smart baseline, event alerting, and rule management features are added to build a more intelligent and comprehensive alerting system.
Alarm	The Alarm service is added as a node monitoring and analysis system of DataWorks. The intelligent monitoring system monitors the status of nodes and sends alert notifications based on the intervals, notification methods, and notification recipients that are specified in alert rules. It can automatically select the most appropriate alerting time, notification methods, and recipients.				An end-to-end cloud platform is built to provide data development, security, governance, and sharing services.
Data Quality	The Data Quality service is added for you to control the data quality of heterogeneous data sources. In Data Quality, you can check data quality, configure alert notifications, and manage data sources. Data Quality monitors data in datasets and allows you to monitor MaxCompute tables and DataHub topics. When offline MaxCompute data changes, Data Quality checks the data and blocks the related nodes if it detects exceptions. This prevents the abnormal data from affecting other nodes. In addition, Data Quality allows you to manage the check result history so that you can analyze and evaluate the data quality.
DataService Studio	The DataService Studio service is added. You can quickly create APIs in DataService Studio based on tables, register existing APIs with DataService Studio, publish APIs, and manage APIs in a centralized manner. DataService Studio and API Gateway are interconnected. This allows you to publish APIs to API Gateway with ease. DataService Studio works together with API Gateway to provide a secure, stable, low-cost, and easy-to-use data sharing service.
Data Security Guard	The Data Security Guard service is added to identify data assets, detect sensitive data, classify data, mask data, monitor data access behavior, report alerts, and audit risks.
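
The scheduling cycle and dependency rows in the table above state that, in DataWorks V2.0, each node has its own scheduling cycle and can depend on nodes in other workflows. The following minimal sketch illustrates that model. It is not the DataWorks scheduler: the Node class, the run_order function, and the workflow and node names are hypothetical, and the ordering relies on graphlib from the Python standard library (Python 3.9 or later).

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Node:
    """Hypothetical node with its own scheduling cycle and upstream nodes."""

    workflow: str
    name: str
    cycle: str  # configured per node, for example "hourly" or "daily"
    upstream: list[str] = field(default_factory=list)  # keys of ancestor nodes

    @property
    def key(self) -> str:
        # Qualify the node name with its workflow so that dependencies
        # can cross workflow boundaries.
        return f"{self.workflow}.{self.name}"


def run_order(nodes: list[Node]) -> list[str]:
    """Return node keys in an order that respects all dependencies."""
    graph = {node.key: set(node.upstream) for node in nodes}
    # static_order() raises graphlib.CycleError if the dependencies form a cycle.
    return list(TopologicalSorter(graph).static_order())


# Workflow and node names below are made up for illustration.
nodes = [
    Node("reporting", "daily_report", "daily", ["ingest.clean_orders"]),
    Node("ingest", "clean_orders", "daily", ["ingest.sync_orders"]),
    Node("ingest", "sync_orders", "hourly"),
]
order = run_order(nodes)
```

Here the daily_report node in the reporting workflow depends on a node in the ingest workflow, and the hourly sync_orders node feeds a daily descendant, so a node's cycle can differ from that of its ancestor or descendant nodes.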