DataWorks: Change history

Last Updated: Jan 24, 2024

This topic describes the change history of the DataWorks documentation. You can use it to learn about new features and feature changes in DataWorks.

Note

DataWorks is updated automatically. Updates do not affect existing users.

Changes in December 2023

Date

Item

Category

Description

References

2023.12.29

New feature

DataStudio

A topic is added to describe how to associate a data source or cluster with DataStudio. If you want to perform data modeling or data development, or periodically schedule tasks in Operation Center, you must associate your data source or cluster with DataStudio. This way, you can read data in the data source or cluster and perform data development operations.

Preparation before development: Binding data sources or clusters

2023.12.29

New feature

Data source

Topics are updated to describe how to add MaxCompute, Hologres, AnalyticDB for PostgreSQL, AnalyticDB for MySQL, and ClickHouse data sources to DataWorks and how to register E-MapReduce (EMR) clusters and CDH or CDP clusters to DataWorks. To provide a better user experience, DataWorks is gradually moving to manage the MaxCompute, Hologres, AnalyticDB for PostgreSQL, AnalyticDB for MySQL, and ClickHouse compute engines as data sources, and the EMR compute engines and CDH or CDP compute engines as open source clusters. After the change, you must perform compute engine operations, such as creating and modifying compute engines, on the Data Source page or Open Source Clusters page in the DataWorks console.

Changes in November 2023

Date

Item

Category

Description

References

2023.11.21

New feature

Data Security Guard

A topic is added to describe how to create a data masking scenario. When you use Data Security Guard to identify sensitive data, you can configure a data masking rule based on your data masking scenario. DataWorks provides multiple ready-to-use level-1 data masking scenarios, such as masking of displayed data in DataStudio and Data Map and static data masking in Data Integration. If the data scope and user scope for which the data masking scenarios take effect cannot meet your requirements for finer-grained data masking, you can use level-1 data masking scenarios to configure level-2 data masking scenarios based on your business requirements.

Create a data masking scenario

2023.11.20

New feature

DataStudio

A topic is added to describe Check nodes provided by DataWorks. DataWorks allows you to use a Check node to check whether a specified partition exists in a MaxCompute partitioned table or whether the operation of writing data to the partition is complete. If a task depends on a MaxCompute partitioned table, you can use a Check node to check whether the partition data in the table is available first. This prevents invalid data from being used.

Use a Check node
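The kind of gate a Check node provides (2023.11.20 entry) can be illustrated with a minimal sketch. It does not use any DataWorks or MaxCompute API: `PartitionedTable`, `partition_ready`, and `run_downstream` are hypothetical names, and the table is modeled as an in-memory dictionary purely to show the logic of "run the downstream task only when the partition exists and its write is complete".

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PartitionedTable:
    """Hypothetical in-memory stand-in for a partitioned table (not a real API)."""
    name: str
    # Maps a partition spec such as "ds=20231120" to True once writing has finished.
    partitions: Dict[str, bool] = field(default_factory=dict)


def partition_ready(table: PartitionedTable, spec: str) -> bool:
    """Return True only if the partition exists and its write is marked complete."""
    return table.partitions.get(spec, False)


def run_downstream(table: PartitionedTable, spec: str) -> str:
    """Run the dependent task only when the check passes; otherwise block it."""
    if not partition_ready(table, spec):
        return "blocked"
    return "ran"


orders = PartitionedTable(
    name="orders",
    partitions={"ds=20231119": True, "ds=20231120": False},
)
print(run_downstream(orders, "ds=20231119"))  # partition exists and is complete
print(run_downstream(orders, "ds=20231120"))  # partition exists but is still being written
print(run_downstream(orders, "ds=20231121"))  # partition does not exist
```

A missing partition and an incomplete partition are treated the same way here: in both cases the downstream task is held back so that it does not read unavailable data.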

Changes in October 2023

Date

Item

Category

Description

References

2023.10.30

New feature

Data Modeling API operation

A topic is added to describe the QueryPublicModelEngine operation that is newly supported by DataWorks. You can call this operation to query information about objects created in Data Modeling, such as information about the model on which a single metric depends and information about composite metrics.

QueryPublicModelEngine

2023.10.20

New feature

Upload and download

A topic is added to describe the upload and download service of DataWorks. This service allows you to upload data from various sources, such as local files and Object Storage Service (OSS) buckets, to MaxCompute for data analysis, processing, and management. The data upload and download service provides efficient and convenient data transmission capabilities to implement data-driven business.

Upload data

2023.10.12

New feature

Data source

A topic is added to describe the notice for a new version of DataWorks data sources.

Notice for a new version of DataWorks data sources

Changes in September 2023

Date

Item

Category

Description

References

2023.9.25

Updated feature

DataStudio

Topics are updated to describe how to configure same-cycle scheduling dependencies and preview scheduling dependencies.

2023.9.20

New feature

Open Platform

A topic is updated to describe the table permission pre-event and the workspace deletion event within a tenant. The events are newly supported.

Appendix: Formats of event messages sent to EventBridge

2023.09.13

Updated feature

Data Security Guard

A topic is optimized to describe how to specify the category and sensitivity level of sensitive data.

Specify the category and sensitivity level of sensitive data

2023.09.12

New feature

DataStudio

A topic is updated to describe the code and log isolation feature that is newly added on the Security Settings and Others tab of the DataStudio page in the DataWorks console. After the code and log isolation feature is enabled for a workspace, members that do not belong to the workspace have no permission to view the code and run logs of tasks in the workspace.

Configure settings on the Security Settings and Others tab

Changes in August 2023

Date

Item

Category

Description

References

2023.8.29

New feature

DataService Studio

A topic is updated to describe the instance types that are newly supported for exclusive resource groups for DataService Studio. The instance types are api.s2.small, api.s2.medium, and api.s2.large.

Billing of exclusive resource groups for DataService Studio (subscription)

2023.8.29

New feature

Operation Center

Topics are updated to describe the following feature that is newly supported in Operation Center: You can adjust the priority of the YARN queue of a node by configuring a priority mapping between the baseline to which the node belongs and a YARN queue.

2023.8.28

New feature

SettingCenter

A topic is updated to describe a built-in workspace-level role named Role_Project_Scheduler. This role is newly added and can be used to schedule and run MaxCompute tasks in the production environment.

Appendix: Mappings between the built-in workspace-level roles of DataWorks and the roles of MaxCompute

2023.8.25

New feature

Data Modeling

A topic is added to describe the relationship diagram feature that is newly provided. This feature allows you to quickly build an architecture for models in your data warehouse, which can intuitively display relationships between the models in your data warehouse. Each relationship diagram displays relationships between models in one data warehouse. You can create multiple relationship diagrams within one Alibaba Cloud account.

Relationship diagram

2023.8.25

New feature

Data Integration

A topic is added to describe Amazon Redshift data sources that are newly supported. You can use Amazon Redshift Reader and Amazon Redshift Writer to read data from and write data to Amazon Redshift data sources and can configure a synchronization task for an Amazon Redshift data source by using the codeless user interface (UI) or code editor.

Amazon Redshift data source

2023.8.24

New feature

Operation Center

A topic is added to describe the scheduling calendar feature that is newly provided. This feature allows you to define scheduling dates and scheduling methods for tasks in a more flexible manner.

Configure a scheduling calendar

2023.08.16

Updated feature

SettingCenter

A topic is added to describe how to create a MaxCompute data source of the new version. To provide a better user experience, the DataWorks development team released a new version of MaxCompute data sources, into which operations related to MaxCompute compute engines are integrated. For example, you can create or edit a MaxCompute compute engine on the pages related to MaxCompute data sources in the DataWorks console. In addition, the permissions on MaxCompute data sources have changed.

2023.08.15

New feature

Operation Center

A topic is updated to describe the trigger conditions that are newly supported if you set the Object Type parameter to Workspace for a custom alert rule. The trigger conditions include the number of instances with errors, the proportion of instances with errors, and task logs that contain specific keywords.

Create a custom alert rule

2023.08.04

New feature

Data Integration

A topic is added to describe how to create and configure a synchronization task in Data Integration to synchronize data from Kafka to a data lake, such as Object Storage Service (OSS), in real time.

Synchronize data from a Kafka table to OSS (Hudi) in real time
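The three workspace-level trigger conditions named in the 2023.08.15 entry (number of instances with errors, proportion of instances with errors, and task logs that contain specific keywords) can be sketched as plain arithmetic. This is an illustration only, not the DataWorks alerting implementation: the function name, the status strings, and the threshold parameters are all assumptions.

```python
from typing import Dict, List


def evaluate_triggers(
    statuses: List[str],
    log_lines: List[str],
    max_error_count: int,
    max_error_ratio: float,
    keywords: List[str],
) -> Dict[str, bool]:
    """Evaluate three hypothetical trigger conditions for one workspace.

    statuses  -- one status string per instance; "error" marks a failed instance
    log_lines -- task log lines to scan for keywords
    """
    error_count = sum(1 for status in statuses if status == "error")
    # Guard against division by zero when the workspace has no instances.
    error_ratio = error_count / len(statuses) if statuses else 0.0
    keyword_hit = any(keyword in line for line in log_lines for keyword in keywords)
    return {
        "error_count": error_count >= max_error_count,
        "error_ratio": error_ratio >= max_error_ratio,
        "log_keyword": keyword_hit,
    }


result = evaluate_triggers(
    statuses=["success", "error", "error", "success"],
    log_lines=["task started", "java.lang.OutOfMemoryError: heap space"],
    max_error_count=3,
    max_error_ratio=0.5,
    keywords=["OutOfMemoryError"],
)
print(result)
```

In this run, two of four instances failed, so the ratio condition (0.5) is met while the count condition (3) is not, and the keyword condition is met by the second log line.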

Changes in July 2023

Date

Item

Category

Description

References

2023.7.31

Optimized setting

DataService Studio

The structure and content of the DataService Studio topics are optimized.

Overview of DataService Studio

2023.7.31

Updated feature

Data Governance Center

Topics are updated to describe how to handle governance issues and check events of MaxCompute and EMR data sources.

2023.7.25

Updated feature

DataWorks console

A topic is updated to describe the updates of the features on various pages in the DataWorks console.

Overview of the DataWorks console

2023.7.18

New feature

Data Integration

A topic is added to describe how to create and configure a real-time extract, transform, and load (ETL) synchronization task to synchronize data from Simple Log Service to Hologres.

Create a real-time ETL synchronization task to synchronize data from Simple Log Service to Hologres

2023.7.16

New feature

Data Modeling

A topic is added to describe composite metrics. Composite metrics are calculated based on specific derived metrics and calculation rules and are fine-grained metrics that can help you collect statistics about your business in a flexible manner.

Composite metric

2023.7.13

New feature

Data Integration

A topic is added to describe how to create and configure a real-time ETL synchronization task to synchronize data from Kafka to Hologres. A real-time ETL synchronization task initializes the schema of a destination Hologres table based on the structure of a specified Kafka topic. It then synchronizes all existing data from the topic to the table at once and synchronizes incremental data from the topic to the table in real time.

Create a real-time ETL synchronization task to synchronize data from Kafka to Hologres

2023.07.08

New feature

SettingCenter

A topic is added to describe the built-in logic of a default workspace. A default workspace is generated the first time you activate DataWorks or if you activate DataWorks in a new region.

Built-in logic of a default workspace

2023.07.06

New feature

Data Modeling

A topic is updated to describe the billing standards of Data Modeling. Data Modeling newly supports the Individual Edition.

Billing standards of Data Modeling

Changes in June 2023

Date

Item

Category

Description

References

2023.6.30

New feature

DataStudio

A topic is updated to describe how to configure a code template for node types such as PyODPS 3 and EMR Spark SQL.

Configure a code template

2023.6.29

New feature

DataStudio

DataWorks provides Function Compute nodes. You can use Function Compute nodes to periodically schedule event processing functions and complete integration and joint scheduling with other types of nodes.

Create and use a Function Compute node

2023.6.29

New feature

SettingCenter

A topic is updated to describe the following operations that are supported when you associate an EMR compute engine with a workspace:

  • Select Spark for the Cluster Type parameter if you use an EMR on ACK cluster.

  • Configure global Spark properties.

  • Configure mappings between DataWorks member accounts and OpenLDAP accounts or Kerberos accounts, and upload a keytab file.

Associate an EMR compute engine with a workspace

2023.6.27

Updated feature

Operation Center

A topic is updated to describe the modification to the Overview page in Operation Center. After the modification, the page displays the overall O&M information, including the results of O&M stability assessment, key O&M metrics, usage of scheduling resources, and status information of auto triggered tasks. The page also displays information about synchronization tasks in Data Integration. This helps you quickly understand the overall information of tasks in your workspace, identify and handle exceptions at the earliest opportunity, and improve O&M efficiency.

View the statistics on the Overview page

2023.6.25

New feature

Data Modeling

A topic is updated to describe the following feature that is supported when you perform data modeling by using the code editor on the System Management page in Data Warehouse Planning: Based on your business requirements, you can specify whether the Comment field in the DDL statements of compute engines corresponds to the display name or to the description that is specified in the codeless UI.

System management

2023.6.16

New feature

DataStudio

  • Workflow parameters are newly supported for Hologres SQL nodes.

  • The value assignment logic of workflow parameters is updated.

Use workflow parameters

2023.6.10

Updated feature

DataStudio

The structure and content of the Develop a MaxCompute Spark task topic are optimized.

Develop a MaxCompute Spark task

Changes in May 2023

Date

Item

Category

Description

References

2023.5.22

New feature

SettingCenter

A topic is updated to describe the service-linked roles for the Alibaba Cloud services to which compute engines belong. If you want to perform compute engine-related operations in the DataWorks console, such as associating a compute engine with a workspace or modifying an existing compute engine instance, the system prompts you to perform authorization operations for DataWorks. After the authorization is complete, the system creates a service-linked role for the Alibaba Cloud service to which the related compute engine belongs.

Appendix: Service-linked roles used by DataWorks to access Alibaba Cloud services to which compute engines belong

2023.5.10

Updated feature

Open Platform

A topic is updated to describe the optimization and updates to the graphical user interface (GUI) of the Open Platform service.

Overview of Open Platform

Changes in April 2023

Date

Item

Category

Description

References

2023.4.19

New feature

Data Integration

A topic is added to describe how to create and configure a batch synchronization task to synchronize all data in an EMR Hive database to MaxCompute at once.

Synchronize data in an EMR Hive database to MaxCompute in offline mode

2023.4.17

Optimized setting

SettingCenter

A topic is added to describe how to change the time zone for scheduling. Before you create a DataWorks workspace, you must select the region in which you want to create the workspace. By default, the time zone of the region in which the DataWorks workspace resides is the time zone for scheduling. You can change the time zone for scheduling based on your business requirements.

Scenario: Change the time zone for scheduling

2023.4.14

New feature

Data Integration

A topic is added to describe how to create and configure a batch synchronization task to synchronize all data in a MySQL database to Hive at once.

Synchronize full data from a MySQL database to Hive at a time

2023.4.12

Updated feature

Data Integration

Topics are updated to describe the data read modes and data write modes that are newly supported for reading data from and writing data to wide tables and time series tables in Tablestore data sources and Tablestore Stream data sources. The new modes are the row mode and column mode.
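The effect of the scheduling time zone described in the 2023.4.17 entry can be shown with a small sketch: the same configured wall-clock time corresponds to different absolute instants depending on which time zone is used for scheduling. The fixed UTC offsets and the `scheduled_instant_utc` helper below are assumptions for illustration; they are not DataWorks settings or APIs, and fixed offsets ignore daylight saving time.

```python
from datetime import datetime, timedelta, timezone


def scheduled_instant_utc(wall_time: str, utc_offset_hours: int) -> datetime:
    """Interpret a wall-clock scheduling time in a fixed-offset zone and return it in UTC."""
    zone = timezone(timedelta(hours=utc_offset_hours))
    local = datetime.strptime(wall_time, "%Y-%m-%d %H:%M").replace(tzinfo=zone)
    return local.astimezone(timezone.utc)


# A task configured to run at 02:00, evaluated under two hypothetical scheduling time zones.
print(scheduled_instant_utc("2023-04-17 02:00", 8))  # UTC+8 -> 2023-04-16 18:00 UTC
print(scheduled_instant_utc("2023-04-17 02:00", 1))  # UTC+1 -> 2023-04-17 01:00 UTC
```

The two results differ by seven hours, which is why changing the time zone for scheduling shifts when tasks actually run even though their configured time is unchanged.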

Changes in March 2023

Date

Item

Category

Description

References

2023.3.28

Updated feature

Data Map

A topic is updated to describe how to create crawlers and use them to collect metadata from various data sources into DataWorks.

Metadata collection

2023.3.23

New feature

Data Integration

Topics are updated to describe the LogView feature that is newly provided. You can use the feature to view running information about the batch and real-time synchronization tasks.

2023.3.21

Updated feature

Data Modeling

Topics are updated to describe the optimization and updates that are related to data layers. The features of data layer checkers are optimized. The rules defined in all checkers that are created for tables or derived metrics at the same data layer have the same strength type. The strength type is strong or weak.

2023.3.02

New feature

Data Integration

A topic is added to describe how to create and configure a batch synchronization task to synchronize all data in an ApsaraDB for ClickHouse database to Hologres at once.

Synchronize data from multiple tables in an ApsaraDB for ClickHouse database to Hologres in offline mode

2023.3.02

New feature

DataStudio

An overview of scheduling properties is provided. If you want the system to periodically schedule a task, you must define scheduling properties such as the scheduling cycle, dependencies, and scheduling parameters for the task.

Overview

Changes in February 2023

Date

Item

Category

Description

References

2023.2.28

New feature

Data Governance Center

You can now customize notifications for the governance issues that are displayed on the Governance issues page in the DataWorks console. Notifications can be sent to specified personnel by system message, email, or DingTalk group message. This way, the governance issues can be viewed and handled at the earliest opportunity.

Configure a periodic notification for governance issues

2023.2.26

Updated feature

DataStudio

A topic is updated to describe the optimizations to the procedures of undeploying auto triggered tasks and restoring undeployed tasks, and the processing solutions for instances that are generated but are not run and instances that are running after tasks are undeployed.

Undeploy tasks

2023.2.21

New feature

DataStudio

A topic is added to describe a general development process for data development. Different types of compute engine tasks can be encapsulated into different types of nodes to define data development tasks. Resources, functions, and related logic processing nodes can be used to develop more complex tasks. You can refer to the general development process for tasks in DataStudio to perform data development.

General development process

2023.2.17

Optimized setting

Data Integration

The architecture and content of topics in the Data Integration documentation are adjusted.

Overview of Data Integration

2023.2.16

Updated feature

DataStudio

The description for configuring and using OSS object inspection nodes is optimized.

OSS object inspection node

2023.2.14

New feature

Migration Assistant

A topic is added to describe how to export the tasks in DolphinScheduler and then import the tasks to DataWorks.

Export tasks from open source engines

2023.2.09

Updated feature

DataStudio

The architecture of topics in the documentation for script templates is adjusted, and the logic for using script templates is optimized.

Overview of a script template

Changes in January 2023

Date

Item

Category

Description

References

2023.1.17

New feature

DataStudio

An introduction to the procedure of node debugging is provided. You can use features such as run, run with parameters, and quick run to debug complete code or code snippets based on your business requirements. After the debugging is complete, you can view the running results.

Debugging procedure

2023.1.17

Updated feature

DataStudio

The details of node groups are optimized. A description for deleting a node group is added.

Create and manage a node group

2023.1.11

New feature

Operation Center

A topic is added to describe the intelligent diagnosis feature that is newly provided. You can use this feature to quickly determine the reasons why tasks fail to run. Multiple factors may affect the running of a task.

Use the Intelligent Diagnosis feature

2023.1.10

New feature

DataStudio

A topic is updated to describe how to search for and view operation records in a workspace on the DataStudio page by operation type, operator, or operation time.

View operation records on the DataStudio page

2023.1.9

New feature

Data Modeling

A topic is added to describe the system management feature that is newly provided. You can use this feature to manage table creation policies in a data warehouse. For example, you can configure a table creation policy that prohibits users who do not have data models from creating physical MaxCompute tables in DataStudio in the production environment. After you enable the table creation policy, when a user creates or modifies a physical MaxCompute table in DataStudio in the production environment, the system checks the name of the table based on the policy. This ensures standardization of table creation.

System management

2023.1.6

New feature

Data Modeling

A topic is updated to describe how to publish and materialize a table. Tables can be published and materialized to EMR and Hologres compute engine instances.

Publish and materialize a table

Changes in December 2022

Date

Item

Category

Description

References

2022.12.29

Updated feature

DataStudio

A topic is updated to describe the optimizations to the operations that are related to the creation and use of MaxCompute tables in the following aspects: visualized creation of MaxCompute tables, committing and deployment of MaxCompute tables, data write to and data export from MaxCompute tables, and query of data in MaxCompute tables.

Create and manage MaxCompute tables

2022.12.23

Optimized setting

DataStudio

A topic is updated to describe the optimizations to the settings that are related to table management, such as configuring table-related formats, creating or managing folders, and creating or managing layers.

Manage settings for tables

2022.12.23

New feature

Compute engine association

A topic is updated to describe the change to the entry points for associating compute engines with a workspace, and to describe the permissions that are required to associate a compute engine with a workspace.

Go to the Compute Engine Information tab

2022.12.15

New feature

DataStudio

Topics are added to describe the processes of developing Hologres nodes and ODPS nodes in DataWorks.

2022.12.6

New feature

Open Platform

A status change event for a workflow is added.

Appendix: Formats of event messages sent to EventBridge

Changes in November 2022

Date

Item

Category

Description

References

2022.11.24

New feature

DataStudio

Guidance for configuring scheduling dependencies and principles of scheduling configurations in complex dependency scenarios are provided to help you understand the procedure and key points of configuring scheduling dependencies. Before you configure scheduling dependencies, make sure that you are familiar with the guidance and principles. This helps prevent data exceptions caused by inappropriate scheduling dependency configurations.

2022.11.23

New feature

DataStudio

Topics are updated to describe how to create Hologres internal tables and Hologres foreign tables in the DataWorks console.

2022.11.18

Updated feature

Open Platform

A topic is updated to describe the change to the entry point of the Open Platform page.

Overview of Open Platform

2022.11.17

New feature

Data Map

A topic is updated to describe how to view the details of and manage a table. Tables can be added to a data album for management, and the data albums to which tables are added can be viewed in Data Map.

View the details of a table

2022.11.3

New feature

Security Center

A topic is added to describe how to use the data query and analysis control feature that is newly provided. This feature allows you to authorize a role or member to query a specific data source in a DataWorks module. This feature also allows you to manage the permissions on query results.

Use the data query and analysis control feature

Changes in October 2022

Date

Item

Category

Description

References

2022.10.21

Updated feature

Management and control

  • The architecture of the "Overview of the DataWorks console" topic is optimized.

  • The logical architecture and content of topics in "Workspace management", "Resource group management", and "Data source management" are optimized. The topics include Configure a workspace, Associate compute engines with a workspace and manage compute engines, and Differences between workspaces in basic and standard modes.

Overview of the DataWorks console and Overview of the features in SettingCenter

2022.10.20

New feature

Resource group

The service-linked role AliyunServiceRoleForDataWorks is automatically created by DataWorks the first time you use an exclusive resource group. You can use the role to access resources in a virtual private cloud (VPC), an elastic network interface (ENI), and a security group. The service-linked role can also be created by using a RAM user.

DataWorks service-linked role

Changes in September 2022

Date

Item

Category

Description

References

2022.9.23

Updated feature

DataWorks console

A topic is updated to describe the optimization to the O&M Assistant feature. This feature allows you to create, run, and delete commands on an exclusive resource group for scheduling. This feature also allows you to view the execution results of the commands.

Use the O&M Assistant feature

2022.9.22

New feature

DataStudio

A topic is updated to describe the forcible code review feature and how to enable and use this feature for a workspace in basic mode.

Code review

2022.9.20

New feature

Operation Center

Topics are updated to describe how to view the custom alert rule and baseline that are associated with an auto triggered task or an auto triggered instance on the General tab of the task or instance. If no custom alert rule or baseline is associated with the task or instance, you can quickly create a custom alert rule or a baseline on this tab.

View auto triggered task instances, Test an auto triggered task and view test instances generated for the task, and Appendix: Use the features provided in a DAG

2022.9.19

Updated feature

Data Integration

Topics are updated to describe how to create and configure a synchronization task that uses DM Reader or DM Writer by using the codeless UI.

DM Reader and DM Writer

2022.9.06

New feature

Data Modeling

A topic is added to describe how to create a dimension. A dimension can be planned and created in the Dimensional Modeling module. You can associate the dimension with a dimension table when you create the dimension table. After the association, you can view your business data based on the dimension.

Create a conceptual model: dimension

2022.9.06

New feature

Data Modeling

A topic is added to describe the import feature that is newly supported. This feature provides different types of import templates for objects such as models and data metrics. You can use this feature to import information of multiple objects at the same time based on an object import template to create the objects in Dimensional Modeling.

Import

Changes in August 2022

Date

Item

Category

Description

References

2022.8.30

Updated feature

Data Integration

  • The architecture and content of topics in the Data Integration documentation are adjusted.

  • The logic and content of topics in the Data Integration documentation are optimized. The adjusted topics include the topics in the following directories: Select a data synchronization feature, Preparation before synchronization, Batch data synchronization, Real-time data synchronization, and Sync solutions.

  • Topics are added to describe how to create a real-time synchronization task to synchronize all data in a database to Oracle, PolarDB, and MySQL, and how to create a batch synchronization task to synchronize all data in a database to OSS.

Overview of Data Integration

2022.8.22

New feature

Operation Center

The Workflow Perspective tab is added on the Cycle Instance page. On this tab, you can view the status of a workflow in the Workflow column based on the icons that represent the status of auto triggered task instances in the workflow. You can perform different operations on a workflow by clicking entry points in the Actions column. The operations that you can perform on a single auto triggered task instance on the Workflow Perspective tab are the same as the operations that you can perform on the auto triggered task instance on the Instance Perspective tab.

View auto triggered task instances

2022.8.18

New feature

Data Modeling

The following features are added to Data Modeling of DataWorks:

  • Multiple metrics can be imported or exported at a time.

  • The versions of a metric can be managed, and the fields that are associated with a metric can be viewed.

  • Checkers used to check the names of derived metrics and tables are added. A checker at a data layer is used to define a naming convention for tables and derived metrics at the data layer and standardize the names of derived metrics and tables.

  • Reverse modeling is supported at an application layer.

  • A resource group can be selected based on your business requirements when you publish a table.

2022.8.05

New feature

DataStudio

Topics are added to describe how to synchronize schemas and data of MaxCompute tables to Hologres.

  • Synchronization of schemas of MaxCompute tables: DataWorks allows you to quickly create and configure a node on the DataStudio page to create multiple Hologres foreign tables whose schemas are the same as the schemas of the source MaxCompute tables at the same time. You can then use the created Hologres foreign tables to accelerate queries of data of the source MaxCompute tables.

  • Synchronization of data of MaxCompute tables: DataWorks allows you to create and configure a node on the DataStudio page to synchronize data of MaxCompute tables to Hologres. This way, you can quickly query the data of MaxCompute tables.

2022.8.02

New feature

DataStudio

  • A topic is added to describe the optimal configurations of an EMR DataLake cluster that is used when you run EMR tasks in DataWorks.

  • A topic is added to describe how to run PySpark jobs in DataWorks.

Changes in July 2022

Date

Item

Category

Description

References

2022.7.29

New feature

Data Modeling

  • If the fields that you want to import to a table have no display names or descriptions, you can fill them in by following the on-screen instructions.

  • The information about a table can be converted into DDL or ETL statements of different types of compute engines. Then, you can copy the code or export the code file.

Publish and materialize a table

2022.7.29

New feature

Data Modeling

Data Modeling now supports the model development feature. You can associate a table with an existing node in DataStudio. After the association, you can double-click the name of the node to go to the configuration tab of the node to develop data.

Develop a model

2022.7.29

New feature

Data Modeling

A topic is added to describe how to configure and use a checker that checks the names of derived metrics at a data layer. A checker at a data layer can define a naming convention for derived metrics at the data layer to help reduce O&M costs.

Configure and use a checker at a data layer

2022.7.8

New feature

DataStudio

An EMR DataLake cluster can be associated with a DataWorks workspace as a compute engine instance. This way, you can develop and run EMR nodes based on the compute engine instance. Topics are added to describe the development process of an EMR node in DataWorks, the configurations of an EMR DataLake cluster in DataWorks, and permission management when a user runs EMR nodes in DataWorks.

2022.7.2

Updated feature

DataStudio

Topics are updated to describe the following new scenarios in which zero load nodes can be used:

  • Manage workflows in scenarios in which scheduling dependencies between nodes are complex.

  • Schedule nodes that have no lineage relationship.

  • Configure scheduling dependencies for branch nodes in a workflow on nodes in another workflow.

Create and use a zero load node

Changes in June 2022

Date

Item

Category

Description

References

2022.6.28

New feature

Data Modeling

A topic is updated to describe how to perform reverse modeling on physical tables. The fuzzy match rule can be specified in a reverse modeling policy to match physical table names.

Perform reverse modeling on physical tables

2022.6.27

New feature

Data Security Guard

A topic is updated to describe how to identify sensitive data. You can set the Scanning range parameter to Custom range when you create a sensitive data identification task on the Sensitive data identification page in Data Security Guard in the DataWorks console. In addition, you can view the progress of and logs for sensitive data identification.

Configure sensitive data identification rules

2022.6.22

Updated feature

Open Platform

Topics are updated to describe how to use EventBridge to subscribe to and consume messages. In earlier versions, Kafka is used to subscribe to and consume messages.

2022.6.16

Updated feature

DataStudio

A scenario is added in which scheduling dependencies must be configured for nodes across workflows or workspaces.

Scenario 3: Configure dependencies for tasks across workflows or workspaces

2022.6.13

New feature

DataStudio

DataStudio features can be displayed based on the permissions of a user, and you can customize which features are displayed on the DataStudio page based on your business requirements. This helps you easily get started with DataStudio.

Scenario: Adjust the displayed DataStudio modules

2022.6.2

New feature

Data Integration

You can query the data that is synchronized to MaxCompute after the related synchronization task finishes running.

MaxCompute Writer

2022.6.2

New feature

Data Integration

A topic is added to describe how to add a StarRocks data source to DataWorks. You can configure synchronization tasks that use StarRocks Reader or StarRocks Writer to read data from or write data to StarRocks data sources in the codeless UI or code editor.

Add a StarRocks data source

Changes in May 2022

Date

Item

Category

Description

References

2022.5.23

New feature

Approval Center

Topics are updated to describe how to create a request processing policy that is used when a Data Integration task is saved. Such a request processing policy can be created by a user that is assigned the Workspace Administrator role and takes effect for the workspace in which the policy is created.

2022.5.22

Updated feature

Data Security Guard

  • A topic is updated to describe how to configure a whitelist. If a user queries data within the time range that is specified by the Effective From parameter in the whitelist, the query results are not masked.

  • You cannot set the values of all parameters for the whitelist to All.

Create a data masking rule

2022.5.18

New feature

Data Security Guard

A topic is added to describe how to use the data lineage feature of Data Security Guard to visualize the lineage of sensitive data, analyze abnormal associations between fields, and identify fields whose identification results are abnormal. The data lineage feature provides information about the spread and impacts of sensitive data and helps efficiently identify sensitive data.

View the data lineage of sensitive data (in public preview)

2022.5.18

New feature

Data Modeling

A topic is added to describe the Homepage of Data Modeling. On the Homepage, you can view the number of tables and derived metrics in the current workspace of your account. You can also view the tables that are successfully published to the production environment within the last 30 days. This way, you can obtain an overview of the tables.

Homepage

2022.5.13

New feature

API

A topic is added to describe how to query migration tasks.

ListMigrations

2022.5.11

New feature

Data Integration

A topic is added to describe how to use HBase20xsql Reader to read data from Phoenix tables that are mapped to HBase SQL tables.

HBase20xsql Reader

2022.5.12

Updated feature

Commercial use

The structure of the "Billing overview" topic is adjusted.

Billing overview

2022.5.10

New feature

Intelligent monitoring

  • Custom alert rules can be configured to monitor the status and resource usage of tasks.

  • Intelligent baselines can be configured to ensure that the data you want to obtain is generated as expected in scenarios that involve complex dependencies between tasks.

  • Custom O&M rules for resource groups can be configured based on your business requirements to implement automated O&M for task instances that are run on the resource groups.

Overview

Changes in April 2022

Date

Item

Category

Description

References

2022.04.29

Updated feature

Billing rule and resource group

  • The structure of the topics in the editions and resource groups directory and in the billing directory is adjusted.

  • The logic and content of the topics in the preceding directories are updated.

  • Topics are added to provide operation guides for specification changes, scaling, deductions and overdue payments, service expiration, and renewal.

2022.04.17

Updated feature

Edition and resource group

A topic is added to describe how to change the specifications of a resource group. The added topic also describes how to prepare for the change of specifications, confirm the possible impact of the operation, and determine whether to allow the system to automatically rerun the terminated task after the change is complete. This improves user experience.

Change the specifications of a resource group

2022.04.15

Updated feature

Intelligent baseline

  • The layout of the intelligent baseline feature in Operation Center is optimized. The Baselines, Baseline Instances, and Events tabs are merged.

  • Alert rules can be configured for baselines. The alerts include baseline alerts and event alerts.

  • The operation records of baselines can be viewed on the Operation History page. The following types of operations are recorded: create, modify, enable, disable, and delete.

2022.04.15

New feature

Data Analyst role

By default, users with the Data Analyst role have permissions only on DataAnalysis.

2022.04.14

New feature

Basic operations in the DataWorks console

By default, after you select a region, the time zone of that region is automatically used as the time zone for scheduling. This means that the time zone is used when you configure the scheduling time for a task. When you create a workspace in the US (Silicon Valley) or Germany (Frankfurt) region for the first time, a message appears. From the message, you can submit a ticket to ask technical support to set the time zone for scheduling to UTC+8.

Overview

2022.04.13

New feature

Data Security Guard

  • A topic is added to describe the new version of the risk identification rule feature.

    The new version of this feature provides multi-dimensional association analysis methods and algorithms. This feature uses intelligent analysis technologies to identify data risks based on risk identification rules and sends you alert notifications. This feature also allows you to perform end-to-end audit in a visualized manner. DataWorks provides risk identification rules for a variety of scenarios. You can directly use these rules or customize rules based on your business requirements.

  • A topic is added to describe the new version of the Data Risks feature.

    The new version of this feature displays, from multiple dimensions, the data risks that match the configured risk identification rules. You can view the distribution of data risks in different dimensions, the trend of data risks in a specified time range, and the rankings of the workspaces in which the most data risks are identified. This way, you can determine the time ranges and workspaces in which a large number of data risks are identified. You can also view details about a data risk, such as the user who performed a risky operation, the time when the operation was performed, and the operation itself. This helps you locate and handle data risks at the earliest opportunity.

Changes in March 2022

Date

Item

Category

Description

References

2022.03.28

New feature

DataStudio

A topic is added to describe the quick run feature of DataWorks and how to use it. This feature allows you to quickly run the code snippet that you select on the configuration tab of a node. You can use this feature to test whether a code snippet is correctly written.

Debug a code snippet: Quickly run a code snippet

2022.03.25

Updated feature

DataStudio

A topic is updated to describe the new features on the DataStudio page. This helps you understand the overall layout of the DataStudio page and the features on this page and view relevant topics with ease. The following features are added:

  • Quickly create a node: When you create a node, the system displays the node types that were recently used. If you click one of the node types, the system automatically configures the Engine Instance and Node Type parameters based on the most recently used node of this type. You can use this method to quickly create a node of a type that was recently used.

  • Delete a workflow: You can select Terminate the Delete Operation or Skip Current Object and Continue to Delete Other Objects in scenarios in which an object cannot be deleted.

Features on the DataStudio page

2022.03.21

Updated feature

Data Governance

A topic is updated to describe how to filter governance items and check events by role from the personal perspective.

View data governance results

2022.03.20

Updated feature

Updates

  • The Workspaces page in the DataWorks console is optimized.

  • The Apply All Contact Information with One Click feature is deprecated.

2022.03.17

Updated feature

Data Map

  • The Data Quality tab is added on the table details page. This tab displays the monitoring rules that are configured for the current table and the alerts that are generated based on the monitoring rules.

  • The total number of MaxCompute projects is displayed on the Overview page of Data Map. The number is collected in real time.

2022.03.17

Updated feature

Scheduling parameter

Topics related to scheduling parameters are restructured to help you quickly get started with scheduling parameters. In DataWorks, tasks are scheduled to run based on scheduling parameters. Scheduling parameters are automatically replaced with specific values based on the data timestamps of the tasks, the time when the tasks are scheduled to run, and the value formats of the scheduling parameters. This enables dynamic parameter settings for task scheduling.

Supported formats of scheduling parameters
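
The replacement behavior described in this entry can be illustrated with a minimal Python sketch. It is not the DataWorks implementation: the `${...}` placeholder syntax, the `render_parameters` helper, and the three format tokens it handles (`yyyy`, `mm`, `dd`) are assumptions chosen for illustration. See the referenced topic for the formats that DataWorks actually supports.

```python
import re
from datetime import date

# Hypothetical mapping from format tokens to strftime directives.
_TOKENS = {"yyyy": "%Y", "mm": "%m", "dd": "%d"}


def render_parameters(code: str, data_timestamp: date) -> str:
    """Replace ${...} placeholders with values derived from data_timestamp."""

    def substitute(match: re.Match) -> str:
        fmt = match.group(1)
        # Translate each illustrative token into its strftime directive.
        for token, directive in _TOKENS.items():
            fmt = fmt.replace(token, directive)
        return data_timestamp.strftime(fmt)

    return re.sub(r"\$\{([^}]+)\}", substitute, code)


sql = "SELECT * FROM orders WHERE ds = '${yyyymmdd}'"
rendered = render_parameters(sql, date(2022, 3, 16))
print(rendered)
```

The same task code can then be reused for every scheduled run, because only the data timestamp passed to the helper changes.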

2022.03.16

Updated feature

DataService Studio

A topic is updated to describe how to use a function as a filter for an API. If you need to use a filter to preprocess the request parameters of an API or perform secondary processing on query results of the API, perform the following operations to configure a filter: In the right-side navigation pane of the configuration tab of the API, click Filter. On the Filter tab, select Use Pre-filter or Use Post-filter based on your business requirements.

2022.03.07

Updated feature

Data Security Guard

  • Data identification rule

    • Content identification rules and metadata identification rules support the And and Or operators.

    • The hit ratio threshold can be configured for identification rules.

  • Global data masking rule

    • The following options are added for the Desensitization way parameter: empty, integer, Range transform, and Characters to replace.

    • The Reserved format encryption and To cover up methods are optimized.

    • The SHA256, SHA512, and SM3 encryption algorithms are added for HASH encryption.

  • Manual correction of sensitive data identification results

    • The feature that allows you to correct multiple sensitive data identification results at a time is added.

    • The filter conditions used to search for sensitive data identification results are optimized.

    • The feature that allows you to export sensitive data identification results is added.

    • The feature that allows you to add sensitive data identification results is added.

    • The display of sensitive data identification results on the Manual Check tab is optimized.
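
Hash-based masking, as mentioned in the list above, can be sketched in Python with the standard `hashlib` module. This is an illustration only, not the Data Security Guard implementation: the `mask_value` helper and its optional salt are assumptions, and SM3 is omitted because its availability in `hashlib` depends on the underlying OpenSSL build.

```python
import hashlib


def mask_value(value: str, algorithm: str = "sha256", salt: str = "") -> str:
    """Return the hex digest of salt + value for the chosen hash algorithm."""
    if algorithm not in ("sha256", "sha512"):
        raise ValueError(f"unsupported algorithm: {algorithm}")
    digest = hashlib.new(algorithm)
    digest.update((salt + value).encode("utf-8"))
    return digest.hexdigest()


masked_256 = mask_value("13800000000")
masked_512 = mask_value("13800000000", algorithm="sha512")
print(len(masked_256), len(masked_512))  # hex digest lengths: 64 and 128
```

Because a hash is deterministic, equal inputs produce equal masked values, so masked columns can still be joined or grouped without exposing the original data.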

Changes in February 2022

Date

Item

Category

Description

References

2022.02.08

Updated feature

Data Integration

Topics are updated to describe how to configure batch synchronization tasks that use different writers in the codeless UI.

2022.02.15

Updated feature

DataStudio

Topics are updated to describe how to configure related settings on the DataStudio page.

  • Personal Settings: This tab allows you to customize the features to be displayed in the left-side navigation pane of the DataStudio page, the settings of the code editor and the directed acyclic graph (DAG), and the theme of DataStudio.

  • Template Management: A code template provides the content that is displayed at the beginning of the code for a node. The Template Management tab allows you to configure a code template for the following types of nodes based on your business requirements: ODPS SQL, ODPS MR, and Shell.

  • Security Settings and Others:

    • Data security: In the Data Security section, you can configure the Mask Data in Page Query Results parameter based on your business requirements. This parameter specifies whether to mask sensitive information in the returned results of queries that you perform in DataStudio in the current workspace.

    • Forcible code review: In the Code Review section, you can enable forcible code review for workspaces and specify code reviewers. This helps ensure the code quality of nodes.

    • Forcible smoke testing: In the Smoke Testing section, you can enable forcible smoke testing. After you enable this feature for a workspace, nodes in the workspace can be deployed only after the nodes pass smoke testing.

    • Deletion of Datablau data models: In the Datablau DDM section, you can click Delete to delete all the Datablau DDM data models that you no longer need at the same time.

2022.02.20

New feature

Scheduling dependency

A topic is added to describe how to fix the following issue: After you enable automatic parsing for a node, the scheduling dependencies of the node are different from those that are identified by DataWorks when you commit the node.

Configure same-cycle scheduling dependencies

2022.02.25

Updated feature

DataStudio

A topic is updated to describe how to create a merge node and define the merging logic for the node.

Configure a merge node

Changes in January 2022

Date

Item

Category

Description

References

2022.01.20

New feature

Data Modeling

A topic is added to describe how to create an application table in Data Modeling. Each application table is suitable for different business scenarios. An application table is used to organize statistical data collected by atomic and derived metrics of the same statistical period, dimension, and statistic granularity. This allows you to perform subsequent business queries, online analytical processing (OLAP) analysis, and data distribution in an efficient manner.

Create a logical model: application table

2022.01.18

New feature

Data Modeling

A topic is added to describe how to create and manage dimensions in Data Modeling. The dimension management feature allows you to create and manage dimensions in a centralized manner to ensure that each dimension is unique.

Dimension management

2022.01.18

New feature

Data Modeling

Topics are added to describe how to create data marts and manage subject areas in Data Modeling.

  • A data mart is a data organization that is based on a business category. You can use data marts to organize data for a specific product or scenario. In most cases, a data mart belongs to an application layer and depends on the aggregate data in one or more common layers.

  • A subject area is a collection of business subjects and is used to categorize data in a data mart from various analytical perspectives. You can classify business subjects into different subject areas based on your business requirements. For example, you can create a transaction subject area, a member subject area, and a commodity subject area for e-commerce data.

2022.01.16

New feature

DataStudio

A topic is added to describe how to configure same-cycle scheduling dependencies between nodes in DataStudio. After you configure scheduling dependencies for a node, you can preview the scheduling dependencies of the node from the node dependency and instance dependency dimensions. This allows you to modify the scheduling dependencies that do not meet your business requirements at the earliest opportunity.

Configure same-cycle scheduling dependencies

2022.01.15

Updated feature

DataStudio

A topic is updated to describe how to configure a resource group for scheduling for auto triggered tasks in DataStudio. Running of auto triggered tasks depends on resource groups for scheduling. You can select a resource group in the Resource Group section of the Properties tab for your auto triggered task.

Configure the resource property

2022.01.14

New feature

DataStudio

A topic is added to describe how to enable periodic scheduling and configure the scheduling settings for auto triggered tasks in DataStudio. To run auto triggered tasks as scheduled, you must go to the Scheduling Settings tab in DataStudio to enable periodic scheduling.

Configure scheduling settings

2022.01.14

New feature

DataStudio

A topic is updated to describe how to configure the rerun property for a task in DataStudio. DataStudio allows you to configure rerun-related parameters in the Schedule section of the Properties tab of a task.

Configure time properties

2022.01.14

New feature

DataStudio

A topic is updated to describe the types of scheduling parameters and the precautions for using scheduling parameters in DataStudio. You can assign built-in parameters to scheduling parameters as values for a task.

Supported formats of scheduling parameters

2022.01.12

New feature

DataAnalysis

A topic is added to describe how to use the SQLNotes feature of DataWorks to write Markdown text and SQL code, run the code for queries, and then save the query results.

SQLNotes

2022.01.06

Updated feature

DataStudio

A topic is added to describe the features on the DataStudio page. This helps you understand the overall layout of and features on the DataStudio page and view relevant topics with ease.

Features on the DataStudio page

Changes in December 2021

Date

Item

Category

Description

References

2021.12.27

New feature

Data Map

A topic is added to describe how to create and manage a CDH Hive sampling crawler in Data Map. Data Map allows you to use the sampling crawler to sample a CDH Hive table. This way, Data Security Guard can detect sensitive data. If you configure data masking rules in Data Security Guard, data of the sensitive fields that match the rules is masked when you preview data on the details page of a table in Data Map.

CDH Hive sampling crawlers

2021.12.24

New feature

API

  • A topic is added to describe how to call the GetDISyncTask operation to query the details about a real-time synchronization task or a data synchronization solution.

  • A topic is added to describe how to call the DeployDISyncTask operation to deploy a real-time synchronization task or a data synchronization solution.

  • A topic is added to describe how to call the GetDISyncInstanceInfo operation to query the status of a real-time synchronization task or a data synchronization solution.

  • A topic is added to describe how to call the TerminateDISyncInstance operation to undeploy a real-time synchronization task.

GetDISyncTask, DeployDISyncTask, GetDISyncInstanceInfo, and TerminateDISyncInstance

2021.12.20

New feature

DataService Studio

Topics are added to describe how to create an Aviator function, use an Aviator function as the pre-filter or post-filter for an API, and edit code for the Aviator function based on the Aviator syntax.

Create an Aviator function and Best practices of using Aviator functions as filters

2021.12.14

New feature

Data Quality

A topic is added to describe how to configure monitoring rules based on a monitoring rule template. Data Quality provides various built-in table-level and field-level monitoring rule templates based on which you can configure monitoring rules.

Configure monitoring rules for multiple tables by template

2021.12.09

Updated feature

Usage analysis

A topic is added to describe how to view the data governance status in Data Governance Center. Data Governance Center allows you to view the data governance status from the following perspectives: data production, data usage, and data management. You can select a perspective based on your business requirements to facilitate data governance.

The data pivoting feature allows data developers and administrators to view and analyze the information about tables, running status of tasks, and resource usage in one or all workspaces. This helps data developers and administrators allocate resources.

Data pivoting from the resource type perspective

2021.12.02

New feature

API

  • Topics are added to describe the API operations related to extension point events.

  • Topics are added to describe the API operations that can be used to obtain and generate asynchronous information required by synchronization tasks.

API operations related to extension point events:

API operations that can be used to obtain and generate asynchronous information required by synchronization tasks:

Changes in November 2021

Date

Item

Category

Description

References

2021.11.24

Updated feature

Data Integration

Topics are added to describe how to configure batch synchronization tasks that use HDFS Reader or HDFS Writer in the codeless UI.

HDFS Reader and HDFS Writer

2021.11.20

New feature

API

A topic is added to describe how to call the ListDags operation to obtain the details of DAGs for a single data backfill instance based on OpSeq. OpSeq is the unique identifier for data backfill.

ListDags

2021.11.14

New feature

DataStudio

A topic is added to describe how to perform operations on multiple DataWorks objects at the same time. DataWorks allows you to modify configurations such as the owners of multiple nodes, resources, or functions at the same time. After the modification, you can commit and deploy the nodes, resources, or functions to the production environment for the modifications to take effect.

Perform operations on multiple DataWorks objects at a time

2021.11.08

New feature

DataStudio

A topic is added to describe the resource group orchestration feature. This feature allows you to change the resource groups for scheduling of multiple nodes in a workflow at the same time. If multiple resource groups for scheduling exist in your workspace, you can change the resource groups for scheduling of nodes in the workspace based on your business requirements. This helps you use resources efficiently.

Change resource groups for scheduling for nodes

Changes in October 2021

Date

Item

Category

Description

References

2021.10.26

New feature

Data Modeling

  • A topic is added to describe the naming dictionary feature. A naming dictionary can be used to manage the roots and morphemes of business terms, physical tables, and fields and the standardized translation of the roots and morphemes. DataWorks Data Standard allows you to create naming dictionaries and export existing naming dictionaries.

  • A topic is added to describe the reverse modeling feature. If you use a modeling tool to generate models and you want to use DataWorks Dimensional Modeling for subsequent modeling operations, you can use the reverse modeling feature provided by DataWorks Dimensional Modeling. The reverse modeling feature allows you to import the models that are generated by using the modeling tool into a compute engine instance. The system creates models based on the imported models. This way, you do not need to manually create models in DataWorks Dimensional Modeling. This reduces time costs.

2021.10.22

Updated feature

Data Security Guard

  • A topic is updated to describe how to manage data sensitivity levels in a more efficient manner. DataWorks allows you to classify data based on the value, sensitivity level, impact, and distribution range of the data. The management policy and development requirements vary for data of different sensitivity levels.

  • A topic is updated to describe how to identify sensitive field types in an efficient manner. DataWorks allows you to identify sensitive data in your workspace by using the data identification rules configured for built-in and custom sensitive field types.

2021.10.15

New feature

API

  • A topic is added to describe how to call the ListDeployments operation to query the information about deployment packages.

  • A topic is added to describe how to call the UpdateIDEEventResult operation to send the check results of an extension point event to DataStudio after the extension point event is triggered and an extension checks the extension point event.

  • A topic is added to describe how to call the GetIDEEventDetail operation to query the data snapshot of an extension point based on the ID of a DataWorks open message when an event that has the extension point is triggered.

2021.10.14

New feature

API

A topic is added to describe how to call API operations to create, configure, and manage a synchronization task in Data Integration.

Use API operations to create, modify, and delete a batch synchronization task

2021.10.11

New feature

DataStudio

A topic is added to describe the code search feature. The code search feature allows you to query code snippets in the code of nodes by keyword. The search results show the details of each code snippet and the nodes whose code contains the code snippets. You can use this feature to trace the node that causes changes in a table.

Code search

Changes in September 2021

Date

Item

Category

Description

References

2021.09.30

New feature

Scheduling settings in DataStudio

A topic is added to describe the configurations of scheduling parameters. Scheduling parameters are used during the running of DataWorks tasks. Scheduling parameters are automatically replaced with specific values based on the data timestamps of the tasks and the value formats of the scheduling parameters. This enables dynamic parameter settings during the running of tasks.

Supported formats of scheduling parameters

2021.09.30

New feature

Scheduling settings in DataStudio

A topic is added to describe how to configure cross-cycle dependencies between nodes and the types of cross-cycle dependencies supported by DataWorks. If you configure cross-cycle dependencies for a node, the instance of the current node in the current cycle can be run only if the instance of the node on which the current node depends in the previous cycle is successfully run.

Configure cross-cycle scheduling dependencies
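
The rule in this entry can be expressed as a small check. The following Python sketch is an illustration under stated assumptions, not DataWorks scheduling logic: the `can_run` helper, the status strings, the daily cycle, and the dictionary keyed by node name and cycle date are all hypothetical.

```python
from datetime import date, timedelta


def can_run(depends_on: str, cycle: date, statuses: dict) -> bool:
    """Return True if the depended-on node succeeded in the previous daily cycle."""
    previous_cycle = cycle - timedelta(days=1)
    return statuses.get((depends_on, previous_cycle)) == "success"


# Hypothetical instance statuses, keyed by (node name, cycle date).
statuses = {
    ("load_orders", date(2021, 9, 29)): "success",
    ("load_orders", date(2021, 9, 30)): "failed",
}

runs_on_sep_30 = can_run("load_orders", date(2021, 9, 30), statuses)
runs_on_oct_1 = can_run("load_orders", date(2021, 10, 1), statuses)
print(runs_on_sep_30, runs_on_oct_1)  # True False
```

In this sketch, the instance for September 30 can run because the September 29 instance of `load_orders` succeeded, whereas the instance for October 1 is blocked by the failed September 30 instance.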

2021.09.26

New feature

Data Map

Topics are added to describe the new features of the Data Map service. Data Map allows you to query APIs in all workspaces that are owned by the current tenant and view details of the APIs. This enables quick queries. On the details page of an API, you can view the basic information, parameters, and sample responses of the API.

APIs in DataService Studio

View the details of an API

2021.09.15

New feature

DataAnalysis

A topic is added to describe how to use SQL statements to query and analyze data of the added data sources in DataAnalysis.

SQL query

2021.09.02

New feature

Operation Center

A topic is added to describe the advanced mode for generating data backfill instances for auto triggered nodes. The advanced mode is used to generate data backfill instances for multiple nodes at the same time. The nodes that you select do not need to depend on each other. You can select the nodes for which you want to backfill data in the DAG of an auto triggered node or in the node list on the Cycle Task page.

Backfill data for an auto triggered node and view data backfill instances generated for the node

Changes in August 2021

Date

Item

Category

Description

References

2021.08.29

New feature

Data Integration

A topic is added to describe how to use the data masking feature. This feature can mask sensitive data in a single table that is synchronized in real time and store the data in a specific database.

Configure data masking

2021.08.22

New feature

Data Integration

A topic is added to describe how to synchronize data to Kafka in real time by using the Data Integration service of DataWorks.

Plan and configure resources

2021.08.11

New feature

SSL-based authentication

Topics are updated to describe how to configure SSL-based authentication when you add MySQL, SQL Server, and PostgreSQL data sources. After you configure SSL-based authentication for these types of data sources, only trusted applications and services can access data in the data sources. Third-party identity authentication mechanisms are used to perform strict identity authentication on users and services. These mechanisms prevent untrusted applications or services from accessing data and improve the stability of data access during data synchronization.

2021.08.07

Updated feature

Permission management system

A topic is added to describe the permission management system of DataWorks. The permission management system of DataWorks consists of two parts: permissions controlled by using RAM and permissions controlled by DataWorks.

Overview of the DataWorks permission management system

2021.08.01

New feature

Migration Assistant

A topic is updated to describe the Migration Assistant service of DataWorks. Migration Assistant became commercially available on August 1, 2021. Migration Assistant allows you to migrate data development objects across different DataWorks editions, Alibaba Cloud accounts, regions, and workspaces. You can export the data objects in your workspace, including auto triggered nodes, manually triggered nodes, resources, functions, data sources, table metadata, ad hoc queries, and script templates. You can also create full export tasks, incremental export tasks, or custom export tasks to export your data objects in DataWorks based on your business requirements.

Overview

Changes in July 2021

Date

Item

Category

Description

References

2021.07.22

New feature

API operation

A topic is added to describe how to call the CreateDISyncTask operation to create a batch synchronization task.

CreateDISyncTask

2021.07.14

New feature

Configurations in the DataWorks console

A topic is added to describe how to add a RAM user or a RAM role as an alert contact on the Alert Contacts page in the DataWorks console. If an error occurs during the running of a node, DataWorks sends alert notifications to the specified alert contact. This allows you to handle exceptions at the earliest opportunity.

Configure and view alert contacts

2021.07.09

New feature

Billing

A topic is updated to describe the billing method for different editions that are used in the China East 2 Finance and China South 1 Finance regions.

Billing of DataWorks advanced editions

2021.07.03

New feature

Data Security Guard

A topic is added to describe how to use the data traceability feature provided by DataWorks. This feature allows you to extract the watermark information of the data in a leaked data file. This helps you trace users who caused data leaks.

Trace leak sources

2021.07.02

New feature

Data Security Guard

A topic is added to describe how to create and manage sample libraries based on the sample files that you provide. You can associate a sample library with a data identification rule to identify data. If the data to be identified contains the data in the sample library, the data to be identified matches the data identification rule. You can use sample libraries to identify enumerated values, such as employee names and user addresses.

Identify sensitive data by using sample libraries
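
The matching idea described for sample libraries above can be illustrated with a minimal Python sketch. It is not the Data Security Guard implementation: the function names, the one-value-per-line file layout, and the `min_hit_ratio` threshold are all assumptions made for illustration.

```python
from typing import Iterable, Set


def load_sample_library(lines: Iterable[str]) -> Set[str]:
    """Build a sample library from lines of a sample file (hypothetical layout:
    one enumerated value per line; blank lines are ignored)."""
    return {line.strip() for line in lines if line.strip()}


def matches_rule(values: Iterable[str], library: Set[str],
                 min_hit_ratio: float = 0.5) -> bool:
    """Return True if enough field values appear in the sample library.

    min_hit_ratio is an assumed threshold, not a documented DataWorks setting.
    """
    cleaned = [v.strip() for v in values if v and v.strip()]
    if not cleaned:
        return False
    hits = sum(1 for v in cleaned if v in library)
    return hits / len(cleaned) >= min_hit_ratio


# Example: a library of employee names checked against a column of values.
library = load_sample_library(["Alice Chen\n", "Bob Li\n", "\n"])
is_match = matches_rule(["Alice Chen", "Bob Li", "Carol Wu"], library)
```

A set is used for the library so that each membership check stays constant-time even for large sample files.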

2021.07.02

New feature

Data Security Guard

A topic is added to describe how to use sample fields to train models. DataWorks extracts the characteristics of these fields and generates a rule model. You can use this rule model to identify the data that has similar characteristics in your data assets.

Generate a custom data identification model

Changes in June 2021

Date

Item

Category

Description

References

2021.06.11

New feature

DataStudio

A topic is added to describe how to create and run an EMR Streaming SQL node. EMR Streaming SQL nodes allow you to use SQL statements to develop streaming analytics jobs.

-

2021.06.11

New feature

DataStudio

A topic is added to describe how to create and run an EMR Spark Streaming node. EMR Spark Streaming nodes can be used to process streaming data with high throughput. This type of node supports fault tolerance, which helps you restore data streams on which errors occur.

-

2021.06.09

New feature

Operation Center

A topic is added to describe how to view the basic information and status of real-time computing nodes on the Stream Task page in Operation Center in the DataWorks console. This allows you to monitor the status of the nodes. In addition, you can configure alert rules for the nodes that you want to monitor. This way, you can identify and handle exceptions at the earliest opportunity.

Manage real-time compute nodes

Changes in May 2021

Date

Item

Category

Description

References

2021.05.20

New feature

Operation Center

A topic is added to describe how to create and manage a shift schedule in DataWorks. If you set the Recipient parameter to Varies According to Shift Schedule and select a shift schedule when you create a custom alert rule, DataWorks can send alert notifications to the on-duty engineers that you specify for the shift schedule. After the engineers receive the alert notifications, they can identify and handle exceptions at the earliest opportunity.

Create and manage a shift schedule
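
How an alert recipient could be resolved from a shift schedule is sketched below in Python. This is only an illustration of the idea behind the Varies According to Shift Schedule setting: the `Shift` type, the `on_duty` function, and the time-of-day model are hypothetical and do not reflect how DataWorks stores or evaluates shift schedules.

```python
from dataclasses import dataclass
from datetime import time
from typing import List, Optional


@dataclass(frozen=True)
class Shift:
    start: time      # inclusive start of the shift
    end: time        # exclusive end of the shift
    engineer: str    # hypothetical identifier of the on-duty engineer


def on_duty(shifts: List[Shift], at: time) -> Optional[str]:
    """Return the engineer whose shift covers the given time of day, if any."""
    for shift in shifts:
        if shift.start <= shift.end:
            covered = shift.start <= at < shift.end
        else:
            # Overnight shift, for example 18:00 to 09:00 the next day.
            covered = at >= shift.start or at < shift.end
        if covered:
            return shift.engineer
    return None


# Example: a day shift and an overnight shift.
schedule = [
    Shift(time(9), time(18), "day-engineer"),
    Shift(time(18), time(9), "night-engineer"),
]
recipient = on_duty(schedule, time(23, 0))
```

Returning `None` when no shift covers the time leaves it to the caller to decide on a fallback recipient.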

2021.05.17

New feature

DataStudio

A topic is added to describe how to create and run ClickHouse SQL nodes. A ClickHouse SQL node allows you to use a distributed SQL query engine to process structured data. This improves the running efficiency of jobs.

Create and use a ClickHouse SQL node

2021.05.15

New feature

Data Integration

Topics are added to describe how to synchronize data to AnalyticDB for MySQL 3.0 by using the Data Integration service of DataWorks.

Changes in April 2021

Date

Item

Category

Description

References

2021.04.28

New feature

Data Integration

A topic is added to describe how to add source tables to or remove source tables from a synchronization task used to synchronize data to Hologres after the synchronization task is run.

Add or remove source tables to or from a synchronization solution that is running

2021.04.22

New feature

DataStudio

A topic is added to describe how to create an FTP Check node that can be used to periodically detect whether a specific file exists based on FTP. If the FTP Check node detects that the file exists, the scheduling system starts to run the descendant node of the FTP Check node. Otherwise, the FTP Check node checks again at the configured detection interval and keeps retrying until the condition to stop the detection is met. In most cases, FTP Check nodes are used for communications between the DataWorks scheduling system and external scheduling systems.

Create an FTP Check node
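
The detection loop described for the FTP Check node above can be sketched in Python as follows. This is a minimal illustration, not the DataWorks implementation: the `wait_for_file` function, the injected `file_exists` probe, and the use of a maximum number of checks as the stop condition are assumptions.

```python
import time
from typing import Callable


def wait_for_file(file_exists: Callable[[], bool],
                  interval_seconds: float,
                  max_checks: int,
                  sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until the file is detected or the stop condition is reached.

    Returns True if the file was detected (descendant nodes may run) and
    False if max_checks (an assumed stop condition) was exhausted first.
    """
    for attempt in range(1, max_checks + 1):
        if file_exists():
            return True
        if attempt < max_checks:
            # Wait for the configured detection interval before retrying.
            sleep(interval_seconds)
    return False


# Example: the file appears on the third check; sleeping is stubbed out.
probe_results = iter([False, False, True])
found = wait_for_file(lambda: next(probe_results),
                      interval_seconds=60, max_checks=5,
                      sleep=lambda seconds: None)
```

Passing the probe and the sleep function as parameters keeps the loop independent of any particular FTP client and makes it easy to test without real waiting.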

2021.04.06

API operation

API operation

A topic is added to describe how to call the GetPermissionApplyOrderDetail operation that is provided by Security Center.

GetPermissionApplyOrderDetail

2021.04.05

New feature

Data Integration

A topic is added to describe how to synchronize data to Kafka in real time by using the Data Integration service of DataWorks.

Plan and configure resources

Changes in March 2021

Date

Item

Category

Description

References

2021.3.19

New feature

Custom role

A topic is added to describe how to create a custom role in a DataWorks workspace.

Manage permissions on workspace-level services

2021.3.11

New engine

Open source engines from which tasks are exported or migrated

Topics are updated to describe how to import tasks that are exported from the open source scheduling engines into DataWorks or migrate tasks from these scheduling engines to DataWorks.

2021.3.11

New feature

Engine O&M

A topic is added to describe how to use the engine O&M feature of DataWorks to view the details of each EMR node and identify and remove the nodes that fail to run. This way, failed nodes do not affect the performance of descendant nodes.

Use the engine O&M feature

2021.3.9

New feature

Aggregate analysis of auto triggered nodes in a DAG

Topics are added to describe the aggregate view and the downstream and upstream analysis features of a DAG. You can view the details of an auto triggered node from the DAG and perform operations based on your business requirements.

2021.3.3

New feature

API operation

Topics are added to describe API operations that are added for the Operation Center, Data Security Guard, and Migration Assistant services of DataWorks.

Changes in February 2021

Date

Item

Category

Description

References

2021.2.24

New feature

Viewing of the status information about synchronization tasks

A topic is added to describe how to view the distribution and execution details of synchronization tasks that are run and how to handle synchronization tasks on which exceptions occur. This improves the O&M efficiency of synchronization tasks.

Perform O&M on a full and incremental synchronization task

2021.2.5

New feature

New feature

A topic is added to describe how to add an ApsaraDB for OceanBase data source. You can configure a synchronization task for an ApsaraDB for OceanBase data source.

Add an ApsaraDB for OceanBase data source

Changes in January 2021

Date

Item

Category

Description

References

2021.1.28

New feature

New node types supported by DataStudio

Topics are added to describe how to create and run a MySQL node and an AnalyticDB for MySQL node. You can use SQL statements to develop data for a MySQL data source and an AnalyticDB for MySQL data source.

2021.1.20

New feature

New synchronization task

Topics are added to describe how to create a batch synchronization task and a real-time synchronization task to synchronize data from specific or all tables in a database to Elasticsearch and view the status of the synchronization tasks after the synchronization tasks are created.

2021.1.19

New feature

Whitelist configuration and category management in Data Map

A topic is added to describe how to configure IP address whitelists for metadata collection and grant category management permissions. After you configure IP address whitelists, you can use the metadata collection feature and category management feature in Data Map.

Configure IP address whitelists for metadata collection

2021.1.13

New feature

Integration with ActionTrail

A topic is added to describe how to query DataWorks behavior events in ActionTrail. You can use the queried event details to perform behavior analysis, security analysis, resource change tracking, and compliance auditing.

Use ActionTrail to query behavior events

Changes in December 2020

Date

Item

Category

Description

References

2020.12.14

New feature

New feature

A topic is added to describe how to create a crawler and collect metadata from a Tablestore data source to DataWorks. You can view collected metadata on the Data Map page.

Collect metadata from a Tablestore data source

Changes in November 2020

Date

Item

Category

Description

References

2020.11.18

New feature

API operation

A topic is added to describe how to call the CreateManualDag operation to trigger the running of a manually triggered workflow.

CreateManualDag

2020.11.18

New feature

API operation

A topic is added to describe how to call the GetManualDagInstances operation to query information about instances in a manually triggered workflow.

GetManualDagInstances

2020.11.18

New feature

API operation

A topic is added to describe how to query the details of a DAG based on the ID of the DAG.

GetDag

2020.11.18

New feature

API operation

A topic is added to describe how to call the SearchNodesByOutput operation to query a node based on the output.

SearchNodesByOutput

2020.11.10

New FAQ

User experience optimization

A topic is added to provide answers to frequently asked questions about Operation Center.

Overview

2020.11.02

New feature

New feature

A topic is updated to describe the code review configuration in DataWorks. If you enable forcible code review, you must commit each node and submit its code to a specified reviewer. You can deploy the node only after the reviewer approves the code.

Code review
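
The deployment gate that forcible code review implies can be illustrated with a short Python sketch. The `ReviewStatus` values, the `NodeCommit` type, and the `can_deploy` function are hypothetical names used only to show the rule; they are not part of any DataWorks API.

```python
from dataclasses import dataclass
from enum import Enum


class ReviewStatus(Enum):
    NOT_SUBMITTED = "not_submitted"
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class NodeCommit:
    node_name: str
    review_status: ReviewStatus = ReviewStatus.NOT_SUBMITTED


def can_deploy(commit: NodeCommit, forcible_review_enabled: bool) -> bool:
    """A committed node may be deployed only after approval when forcible
    code review is enabled; otherwise the review status does not block it."""
    if not forcible_review_enabled:
        return True
    return commit.review_status is ReviewStatus.APPROVED


# Example: a node that is still waiting for its reviewer cannot be deployed.
pending = NodeCommit("daily_report", ReviewStatus.PENDING)
deployable = can_deploy(pending, forcible_review_enabled=True)
```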

Changes in October 2020

Date

Item

Category

Description

References

2020.10.30

API operation overview

User experience optimization

A topic is added to describe the applicable scopes, billing rules, and call limits of DataWorks API operations and describe the DataWorks API operations by function.

Overview

2020.10.28

New feature

New feature

A topic is added to describe how to create an EMR table.

Create an EMR table

2020.10.28

New feature

New feature

A topic is updated to describe how to associate an EMR compute engine with a DataWorks workspace. In DataWorks, you can create nodes such as Hive, MapReduce, Presto, and Spark SQL nodes based on an EMR compute engine and configure EMR workflows. You can also schedule the nodes and manage metadata. This can facilitate data output.

Associate an EMR cluster with a DataWorks workspace as an EMR compute engine instance

Changes in September 2020

Date

Item

Category

Description

References

2020.09.03

Updates of billing method

Pricing

A topic is updated to describe the pay-as-you-go services and resources of DataWorks. The pay-as-you-go billing method allows you to use all the basic features of DataWorks in a cost-efficient manner.

Overview of the pay-as-you-go billing method in DataWorks

2020.09.03

Updated feature

User experience optimization

A topic is updated to provide basic information about DataWorks and to describe the features and limits of DataWorks.

What is DataWorks?

2020.09.02

New tutorial

User experience optimization

A topic is added to describe how to use DataWorks together with Platform for AI (PAI) to automatically identify users who steal electricity. This ensures that users use electricity in a safe manner.

Overview

Changes in August 2020

Date

Item

Category

Description

References

2020.08.07

New data source

New feature

A topic is added to describe how to add a Hive data source. You can use Hive Reader and Hive Writer to read data from and write data to a Hive data source and can configure synchronization tasks for Hive data sources by using the codeless UI and code editor.

Add a Hive data source

2020.08.07

New data source

New feature

A topic is added to describe how to add a GBase8a data source. You can use GBase8a Reader and GBase8a Writer to read data from and write data to GBase8a data sources and can configure synchronization tasks for GBase8a data sources by using the codeless UI and code editor.

Add a GBase8a data source

2020.08.07

New data source

New feature

A topic is added to describe how to configure a Hologres data source. You can use Hologres Reader and Hologres Writer to read data from and write data to Hologres data sources and can configure synchronization tasks for Hologres data sources by using the codeless UI and code editor.

Add a Hologres data source

2020.08.07

New data source

New feature

A topic is added to describe how to configure an HBase data source. You can use HBase Reader and HBase Writer to read data from and write data to HBase data sources and can configure synchronization tasks for HBase data sources by using the code editor.

Add an HBase data source

2020.08.07

New data source

New feature

A topic is added to describe how to add an Elasticsearch data source. You can use Elasticsearch Reader and Elasticsearch Writer to read data from and write data to Elasticsearch data sources and can configure synchronization tasks for Elasticsearch data sources by using the code editor.

Add an Elasticsearch data source

2020.08.07

New FAQ

User experience optimization

A topic is added to describe how to troubleshoot issues related to network connectivity, parameters, and permissions when you add data sources in DataWorks.

FAQ about adding data sources

2020.08.07

New feature

New feature

A topic is added to describe how to create an EMR Presto node. EMR Presto nodes allow you to perform interactive analysis and queries on large-scale structured and unstructured data.

Create an EMR Presto node

2020.08.05

New release notes of features

User experience optimization

A topic is added to describe the release notes of key features of DataWorks.

Announcements and updates

Changes in June 2020

Date

Item

Category

Description

References

2020.06.30

New FAQ

User experience optimization

A topic is added to provide answers to frequently asked questions about DataWorks services and features. These services and features include Data Integration, DataStudio, custom resource groups, exclusive resource groups, dependencies, Alarm, and DataService Studio.

Overview

2020.06.28

New feature

New feature

A topic is added to describe how to add a route to a VPC or a data center.

General reference: Add a route

2020.06.28

New best practice

User experience optimization

A best practice is added to describe how to use exclusive resource groups for Data Integration to migrate data from a self-managed MySQL database hosted on an Elastic Compute Service (ECS) instance to MaxCompute.

Migrate data from a user-created MySQL database on an ECS instance to MaxCompute

2020.06.28

New best practice

User experience optimization

A best practice is added to describe how to use AIRec. AIRec is developed by Alibaba Cloud based on cutting-edge big data and AI technologies, and years of experience in the e-commerce industry. AIRec provides a personalized recommendation service to increase the customer purchase rate and order conversion rate.

Intelligently recommend items on e-commerce websites

2020.06.28

New best practice

User experience optimization

A best practice is added to describe how to grant specific users access permissions on specific resources such as tables and user-defined functions (UDFs). This best practice involves data encryption and decryption algorithms that ensure data security.

Grant a specified user the access permissions on a specific UDF

2020.06.28

New best practice

User experience optimization

A best practice is added to describe how to build a data warehouse for an enterprise based on AnalyticDB for MySQL and use the data warehouse for O&M and metadata management.

Build a data warehouse for an enterprise based on AnalyticDB for MySQL

2020.06.28

New best practice

User experience optimization

A best practice is added to describe how to use a PyODPS node in DataWorks to segment Chinese text based on Jieba, an open source segmentation tool. The best practice also describes how to write the segmented words and phrases to a new table and use closure functions to segment Chinese text based on a custom dictionary.

Use a PyODPS node to segment Chinese text based on Jieba

2020.06.28

New best practice

User experience optimization

A best practice is added to describe how to use a PyODPS node that runs on an exclusive resource group to send emails.

Use a PyODPS node to send emails

2020.06.28

New best practice

User experience optimization

A best practice is added to describe how to connect DataV to DataWorks DataService Studio. You can create APIs in DataService Studio and call the APIs in DataV. Then, DataV presents analysis results of the MaxCompute data.

Best practices to connect DataV to DataWorks DataService Studio

2020.06.28

New best practice

User experience optimization

A best practice is added to describe how to use a PyODPS node in DataWorks to reference a third-party package. A PyODPS 2 node is used as an example.

Use a PyODPS node to reference a third-party package

2020.06.28

New best practice

User experience optimization

A best practice is added to describe how to enable automatic synchronization of IoT data to the cloud. IoT is a network that carries data based on the Internet and traditional telecommunication networks. IoT allows physical objects that can be independently addressed to be used as data sources.

Automatically synchronize IoT data to the cloud

2020.06.15

New data source

New data source

A topic is added to describe how to add an ApsaraDB for OceanBase data source. You can use ApsaraDB for OceanBase Reader and ApsaraDB for OceanBase Writer to read data from and write data to an ApsaraDB for OceanBase data source and can configure synchronization tasks for ApsaraDB for OceanBase data sources by using the code editor.

Add an ApsaraDB for OceanBase data source

2020.06.15

New data source

New data source

A topic is added to describe how to add a Vertica data source. You can use Vertica Reader and Vertica Writer to read data from and write data to a Vertica data source. You can configure synchronization tasks for Vertica data sources by using the code editor.

Add a Vertica data source

2020.06.15

New feature

New feature

A topic is added to describe the parameters that are supported by GBase8a Reader and how to configure GBase8a Reader by using the code editor.

GBase8a Reader

2020.06.15

New feature

New feature

A topic is added to describe Hologres Reader. Hologres Reader allows you to export data from Hologres data warehouses. You can read data from Hologres tables and then write the data to other data sources based on the standard protocol of Data Integration.

Hologres Reader

2020.06.15

New feature

New feature

A topic is added to describe Hologres Writer. Hologres Writer allows you to import data from multiple data sources to Hologres for real-time data analysis.

Hologres Writer

2020.06.15

New feature

New feature

A topic is added to describe how to configure a resource group for scheduling for a node in the Resource Group section of the Properties tab of the node.

Configure the resource property

2020.06.15

New description

User experience optimization

A topic is added to describe the logic of scheduling dependencies. You must make sure that the scheduling dependencies configured for a node are correct. Correct scheduling dependencies keep workflows running in order, ensure that business data is generated in an effective and timely manner, and standardize data development.

Scheduling dependency configuration guide

2020.06.15

New resource group

New feature

A topic is added to describe how to create and use an exclusive resource group for scheduling, and associate an exclusive resource group for scheduling with a virtual private cloud (VPC) to enable the resource group to access data sources in the VPC.

Create and use an exclusive resource group for scheduling

Changes in May 2020

Date

Item

Category

Description

References

2020.05.27

New usage notes

User experience optimization

A topic is added to describe the scenarios and methods of using shared resource groups, exclusive resource groups, and custom resource groups that are supported by DataWorks.

Overview

2020.05.27

New feature

New feature

A topic is added to describe how to manage report templates. You can create a template of data quality reports on the Report Template Management page. DataWorks Data Quality can periodically generate and send data quality reports based on the template.

Create and manage report templates

2020.05.27

New feature

New feature

A topic is added to describe how to manage rule templates. DataWorks Data Quality allows you to manage a set of custom rules and create a rule template library to configure rules in a more efficient manner.

Create and manage custom rule templates

2020.05.27

New feature

New feature

A topic is added to describe the verification logic of Data Quality and the built-in rule templates that DataWorks provides for monitoring offline data.

Built-in monitoring rule templates

Changes in April 2020

Date

Item

Category

Description

References

2020.04.19

Service upgrade

DataWorks V3.0

A topic is updated to describe how to use features provided in Operation Center. In Operation Center, you can view the dashboard, manage auto triggered nodes and manually triggered nodes, and monitor nodes.

Overview

2020.04.18

Service upgrade

DataWorks V3.0

Topics are updated to describe the overall process of building a MaxCompute data warehouse.

Build and optimize a data warehouse

2020.04.18

Service upgrade

DataWorks V3.0

A topic is added to describe the Data Integration service of DataWorks. Data Integration is a stable, efficient, and scalable data synchronization service. Data Integration is designed to migrate and synchronize data between a wide range of heterogeneous data sources in complex network environments in a fast and stable manner.

Data Integration

2020.04.08

Service upgrade

DataWorks V3.0

A topic is updated to describe a complete process of data development and O&M.

Overview

2020.04.08

Service upgrade

DataWorks V3.0

A topic is updated to provide an overview of DataWorks, including the basic concepts, scenarios, and data development processes.

What is DataWorks?

Changes in March 2020

Date

Item

Category

Description

References

2020.03.26

New tutorial

User experience optimization

A tutorial is added to describe the complete operations in the DataWorks for EMR workshop.

DataWorks for EMR Workshop

2020.03.17

Service upgrade

DataWorks V3.0

A topic is updated to describe the upgraded data development mode. In the upgraded data development mode, you can group multiple workflows under a solution in a workspace. The previous hierarchical structure is no longer used.

DataStudio

2020.03.17

Service upgrade

DataWorks V3.0

A topic is updated to describe various types of nodes in DataWorks, such as batch synchronization nodes, ODPS nodes, EMR nodes, general nodes, and custom nodes.

DataWorks nodes

2020.03.02

Service upgrade

DataWorks V3.0

A topic is updated to provide an overview of the DataWorks console. You can view the workspaces, resource groups, and compute engines in the DataWorks console.

Overview of the DataWorks console

Changes in February 2020

Date

Item

Category

Description

References

2020.02.29

New best practice

User experience optimization

A best practice is added to describe how to use the data synchronization feature of DataWorks to migrate data from Oracle to MaxCompute.

Best practice to migrate data from Oracle to MaxCompute

2020.02.02

New feature

New feature

A topic is added to describe how to use DataAnalysis. DataAnalysis allows you to collaboratively edit and analyze workbooks, manage MaxCompute tables in tabular mode, and generate and share visual reports.

Overview

Changes in December 2019