This topic provides the release notes for DataWorks V2.0 and describes its new features.

Release information

Version: DataWorks V2.0
  • Date: July 25, 2018
  • Region: China (Shanghai)
  • Content: Building on DataWorks V1.0, DataWorks V2.0 introduces new concepts, including workflows and SQL script templates, and improves the data development process. DataWorks V2.0 supports workspaces in basic mode and standard mode. In standard mode, it isolates the development environment from the production environment to help you develop data in a standardized manner and reduce errors in code.

Available regions

Currently, all regions support DataWorks V2.0.

Operation updates

  • DataWorks V2.0 does not display retroactive instances that were created in earlier versions. To check whether a retroactive instance was created, choose Data Management > Tables > Partitions and check whether the corresponding time partition was generated. If the time partition does not appear as expected, we recommend that you create a retroactive instance.

    If you cannot determine whether the retroactive instance was created, submit a ticket.

  • To learn more about DataWorks V2.0, watch the video FAQ and key points of DataWorks V2.0.
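
The partition check described in the first item above can also be scripted once you have the partition names of a table. The following is a minimal sketch, assuming that the partition names are available as strings such as ds=20180725 (for example, copied from the Partitions page or from the output of the MaxCompute SHOW PARTITIONS statement). The partition column name ds, the daily yyyymmdd format, and the function names are assumptions for illustration and are not part of DataWorks.

```python
from datetime import date, timedelta


def expected_partitions(start, end, column="ds"):
    """Return one partition name per day from start to end, inclusive."""
    days = (end - start).days
    return [f"{column}={(start + timedelta(days=i)):%Y%m%d}" for i in range(days + 1)]


def missing_partitions(existing, start, end, column="ds"):
    """Return the expected daily partitions that are absent from existing."""
    present = {name.strip() for name in existing}
    return [p for p in expected_partitions(start, end, column) if p not in present]


if __name__ == "__main__":
    # Hypothetical partition list for a table that should have one partition per day.
    existing = ["ds=20180723", "ds=20180725"]
    gaps = missing_partitions(existing, date(2018, 7, 23), date(2018, 7, 25))
    # Each name in gaps is a data timestamp that may need a retroactive instance.
    print(gaps)
```

A non-empty result only indicates that a time partition is missing. Whether to create a retroactive instance for that data timestamp remains your decision.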

Feature updates

DataWorks V2.0 improves the overall visual interaction and user experience of the DataStudio service. In addition, it adds four services: Monitor, Data Security Guard, Data Quality, and DataService Studio.

The following entries compare DataWorks V1.0 with DataWorks V2.0. Each entry names the service, feature, and item, and then lists the DataWorks V1.0 behavior, the DataWorks V2.0 behavior, and a description where one is provided.

Service: MaxCompute project

Feature: Project management (Item: Project management method)
  • DataWorks V1.0: A DataWorks workspace is associated with one MaxCompute project.
  • DataWorks V2.0: The standard mode is added for workspaces. In standard mode, a DataWorks workspace is associated with two MaxCompute projects to isolate the development environment from the production environment. For more information, see Basic mode and standard mode.
  • Description: The development environment can be isolated from the production environment to guarantee the stability of code in the production environment.

Service: DataStudio

Feature: Node development (Item: Overall feature)
  • DataWorks V1.0: You can develop code and configure scheduling properties for a single node or a single flow. Then, you can commit the node or flow to Operation Center for automatic scheduling and running.
  • DataWorks V2.0:
      • The service name is changed to DataStudio.
      • The concepts of a solution and a workflow are added.
      • The concept of a flow is deleted.
      • The SQL editor is more intelligent. More permissions are granted on node scheduling and dependency configurations.
  • Description:
      • The SQL editor is optimized to provide a more user-friendly and immersive SQL development experience.
      • Node development and management are simplified by using workflows and solutions.
      • More permissions are granted in the scheduling system to handle complex business demands.
      • Some other features are added to resolve pain points and improve user experience.

Feature: Node development (Item: SQL development)
  • DataWorks V1.0: You can develop and test code for a single node or a single flow in the SQL editor.
  • DataWorks V2.0: The SQL editor provides more intelligent and user-friendly features, including syntax highlighting, code formatting, intelligent code completion, error prompts, and schemas.

    The Code Structure tab on the right of the SQL editor displays the SQL code structure in an intuitive manner.

Feature: Node development (Item: Node configuration)
  • DataWorks V1.0: You can develop your business only by associating nodes and flows.
  • DataWorks V2.0: Flows are replaced by workflows. You can develop your business by associating nodes in a workflow and adding the required tables, resources, and functions to the workflow. You can also integrate correlated workflows into a solution for unified management.

Feature: Node development (Item: Recurrence configuration)
  • DataWorks V1.0: The recurrence configuration of a node is subject to the flow recurrence.
  • DataWorks V2.0: You cannot configure the recurrence for associated nodes as a whole. Instead, you must configure the recurrence for each node separately. The recurrence type of a node can be the same as or different from that of its ancestor or descendant nodes.

Feature: Node development (Item: Dependency)
  • DataWorks V1.0: You can configure dependencies only between flows.
  • DataWorks V2.0: You can configure dependencies between nodes of different workflows.

Feature: Script (Item: Overall feature)
  • DataWorks V1.0: You can use scripts to process acyclic temporary data. For example, you can use a script to add, delete, or modify a temporary table. This feature is a supplement to auto-triggered nodes and does not require recurrence and dependency configurations.
  • DataWorks V2.0: The feature name is changed to Ad-Hoc Query.

Feature: Manually triggered node (Item: Overall feature)
  • DataWorks V1.0: In a manually triggered flow task, all nodes must be manually triggered and cannot be automatically scheduled by DataWorks.
  • DataWorks V2.0: The feature name is changed to Manually Triggered Workflows.

Feature: Resource management (Item: Overall feature)
  • DataWorks V1.0: Resource management is an independent feature used to manage all resources in a MaxCompute project, including the JAR, file, and archive resources.
  • DataWorks V2.0: Resource management is changed to a feature of workflows. You can add the required resources to a workflow and create multi-level folders to manage these resources.

Feature: Function management (Item: Overall feature)
  • DataWorks V1.0: Function management is an independent feature used to manage the built-in functions and custom functions required by an ODPS SQL node.
  • DataWorks V2.0: Function management remains an independent feature used to manage all functions. It is also available as a feature of workflows, which allows you to manage the functions of a workflow.

Feature: Table query (Item: Overall feature)
  • DataWorks V1.0: You can find, preview, and reference all tables of a MaxCompute project.
  • DataWorks V2.0: No update.

Feature: Table management (Item: Overall feature)
  • DataWorks V1.0: Not supported.
  • DataWorks V2.0: Table management is added. It allows developers to manage their own tables. For example, developers can set the time-to-live (TTL) of tables, modify the category, description, fields, and partitions of tables, and hide, show, and delete tables.

Feature: Ad-hoc query (Item: Overall feature)
  • DataWorks V1.0: Not supported.
  • DataWorks V2.0: The ad-hoc query feature is added to test code in the development environment. You do not need to commit and deploy ad-hoc query nodes or set scheduling parameters for ad-hoc query nodes.

Feature: SQL script template management (Item: Overall feature)
  • DataWorks V1.0: Not supported.
  • DataWorks V2.0: SQL script template management is added. It allows you to abstract SQL code as an SQL script template to reuse the SQL code. You can select SQL script templates and configure input and output parameters for these templates based on your business requirements.

Feature: Runtime logs (Item: Overall feature)
  • DataWorks V1.0: Not supported.
  • DataWorks V2.0: The Runtime Logs tab is added to show the records of all nodes that have been run in the development environment in the last three days. You can view and filter the running results.

Feature: Filtering of SQL results (Item: Overall feature)
  • DataWorks V1.0: Not supported.
  • DataWorks V2.0: An Excel component is integrated into the SQL editor so that you can filter, screen, and sort the running results of SQL statements.

Feature: Recycle bin (Item: Overall feature)
  • DataWorks V1.0: Not supported.
  • DataWorks V2.0: A recycle bin is added to guard against accidental deletion. You can view the nodes deleted from the current workspace in the recycle bin and restore them as needed.

Feature: Global code search (Item: Overall feature)
  • DataWorks V1.0: Not supported.
  • DataWorks V2.0: Global code search is supported. You can search for an ODPS SQL, Shell, or sync node by entering a part of a string included in the node to quickly find the node that you want to view or manage.

Feature: Node deployment (Item: Overall feature)
  • DataWorks V1.0: DataWorks V1.0 supports node deployment.
  • DataWorks V2.0: The feature name is changed to Cross-Project Cloning. You can clone a node only between workspaces in basic mode.

Service: Operation Center

Feature: Node list (Item: Feature)
  • DataWorks V1.0: You can search for a node by node type, node name, and owner in the node list.
  • DataWorks V2.0: More filters are added to help you search for nodes, including Workflow, Solution, and Baseline.
  • Description: Nodes can be managed based on the business. More features are added for node development.

Feature: Node management (Item: Feature)
  • DataWorks V1.0: You can search for a node by node type, node name, owner, data timestamp, and running date in the node list.
  • DataWorks V2.0: More filters are added to help you search for nodes, including Workflow, Solution, and Baseline.

Feature: Alerting (Item: Feature)
  • DataWorks V1.0: You can configure alerts based on the completion status and running status of nodes.
  • DataWorks V2.0: The Baseline Management, Event Management, and Rule Management features are added to build a more intelligent and comprehensive alerting system.

Service: Monitor
  • DataWorks V2.0: The Monitor service is added as a node monitoring and analysis system of DataWorks. It monitors the running status of nodes and sends alerts based on the intervals, notification methods, and recipients specified in alert triggers. It can automatically select the most appropriate alerting time, notification methods, and recipients.
  • Description: A one-stop cloud platform is built to provide data development, security, governance, and sharing services.

Service: Data Quality
  • DataWorks V2.0: The Data Quality service is added for you to control the data quality of heterogeneous connections. In Data Quality, you can check data quality, configure alert notifications, and manage connections.

    Data Quality monitors data in datasets. Currently, it allows you to monitor MaxCompute tables and DataHub topics. When offline MaxCompute data changes, Data Quality checks data and blocks nodes if it detects exceptions. This prevents nodes from being affected. In addition, Data Quality allows you to manage the check result history so that you can analyze and evaluate the data quality.

Service: DataService Studio
  • DataWorks V2.0: The DataService Studio service is added. You can quickly create API operations in DataService Studio based on tables or register existing API operations with DataService Studio. DataService Studio and API Gateway are interconnected. This allows you to deploy API operations to API Gateway with ease. DataService Studio works together with API Gateway to provide a secure, stable, low-cost, and easy-to-use data sharing service.

Service: Data Security Guard
  • DataWorks V2.0: The Data Security Guard service is added to identify data assets, detect sensitive data, classify data, de-identify data, monitor data access behavior, report alerts, and audit risks.
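
The SQL script template management entry above describes abstracting SQL code into a reusable template with input and output parameters. The following is a minimal sketch of that idea that uses Python's standard string.Template class. It does not use the DataWorks template syntax or any DataWorks API, and the table names, the ds partition column, and the render function are hypothetical names chosen for illustration.

```python
from string import Template

# Hypothetical template: one input table, one output table, and one partition value.
CATEGORY_COUNT_TEMPLATE = Template(
    "INSERT OVERWRITE TABLE ${output_table} PARTITION (ds='${bizdate}')\n"
    "SELECT category, COUNT(*) AS cnt\n"
    "FROM ${input_table}\n"
    "WHERE ds = '${bizdate}'\n"
    "GROUP BY category;"
)


def render(template, **params):
    """Fill in the template parameters and return the resulting SQL text.

    Template.substitute raises KeyError if a required parameter is missing,
    so an incomplete configuration fails early instead of producing bad SQL.
    """
    return template.substitute(**params)


if __name__ == "__main__":
    # The same SQL logic is reused with different input and output tables.
    print(render(CATEGORY_COUNT_TEMPLATE,
                 input_table="src_orders",
                 output_table="rpt_order_counts",
                 bizdate="20180725"))
```

Plain text substitution such as this does not validate or escape parameter values, so it is suitable only for trusted values such as table names that you control.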