This topic describes the change history of DataWorks documentation. You can use it to learn about new features and feature changes in DataWorks.

Note: DataWorks can be updated automatically. Updates do not affect existing users.

Changes in June 2022

Date Item Category Description References
2022.6.28 New feature Data Modeling A topic is updated to describe how to configure a reverse modeling policy. Either exact match or fuzzy match can be used as the table name matching rule when you configure a reverse modeling policy. Reverse modeling
2022.6.27 New feature Data Security Guard A topic is updated to describe the Scanning range parameter for sensitive data identification. This parameter can be set to Custom range. You can view the progress and execution logs of sensitive data identification tasks. Identify sensitive data
2022.6.22 Updated feature Open platform Topics are updated to describe how to use EventBridge to subscribe to and consume messages. In earlier versions, Kafka is used to subscribe to and consume messages.
2022.6.16 Updated feature DataStudio A topic is updated to describe how to configure scheduling dependencies for nodes across workflows or workspaces. Scenario 3: Configure scheduling dependencies for nodes across workflows or workspaces
2022.6.13 New feature DataStudio A topic is added to describe how to assign roles to a user and adjust the displayed DataStudio modules. DataStudio displays different modules for users with different roles. In addition, after users access DataStudio, they can adjust the displayed modules based on their business requirements. Assign roles to a user and adjust the displayed DataStudio modules
2022.6.2 New feature Data Integration A topic is updated to describe how to query the data that is synchronized to MaxCompute after the related data synchronization node finishes running. MaxCompute Writer
2022.6.2 New feature Data Integration Topics are added to describe how to use the codeless user interface (UI) or code editor to configure synchronization nodes for StarRocks data sources. DataWorks provides StarRocks Reader and StarRocks Writer for you to read data from and write data to StarRocks data sources. Add a StarRocks data source
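
The 2022.6.28 entry above mentions exact match and fuzzy match as table name matching rules for a reverse modeling policy. The difference between the two can be illustrated with a minimal Python sketch. The helper name `match_tables`, the `mode` values, and the use of shell-style wildcards for fuzzy matching are assumptions made for illustration. The changelog does not state how DataWorks evaluates a fuzzy match.

```python
from fnmatch import fnmatchcase


def match_tables(table_names, rule, mode="exact"):
    """Return the table names selected by a matching rule.

    Hypothetical helper: 'exact' keeps names equal to the rule,
    'fuzzy' treats the rule as a shell-style wildcard pattern
    (an assumption, not the documented DataWorks behavior).
    """
    if mode == "exact":
        return [name for name in table_names if name == rule]
    if mode == "fuzzy":
        return [name for name in table_names if fnmatchcase(name, rule)]
    raise ValueError(f"unknown mode: {mode}")


tables = ["dim_user", "dim_user_daily", "fact_order"]
print(match_tables(tables, "dim_user", mode="exact"))
print(match_tables(tables, "dim_*", mode="fuzzy"))
```

With an exact rule only the identically named table is selected, whereas the wildcard rule in this sketch also selects every table that shares the prefix.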

Changes in May 2022

Date Item Category Description References
2022.5.23 New feature Approval Center A topic is added to describe how a user that is assigned the Workspace Manager role can create a request processing policy. The request processing policy is used when a Data Integration node is saved.
2022.5.22 Updated feature Data Security Guard
  • A topic is updated to describe how to configure an allowlist. If a user queries data within the time range that is specified by the Effective From parameter in the allowlist, the query results are not masked.
  • You cannot set the values of all parameters for the allowlist to All.
Create a data masking rule
2022.5.18 New feature Data Security Guard A topic is added to describe how to use the data lineage feature of Data Security Guard to visualize the lineage of sensitive data, analyze abnormal associations between fields, and identify fields whose identification results are abnormal. The data lineage feature provides information about the spread and impacts of sensitive data and helps efficiently identify sensitive data. Data lineage
2022.5.18 New feature Data Modeling A topic is added to describe the Homepage feature. On the Homepage tab, you can view the number of models and derived metrics in the current workspace of your account. You also can view the models that are successfully deployed to the production environment within the last 30 days. This way, you can obtain an overview of the models. Homepage
2022.5.13 New feature API A topic is added to describe how to query migration tasks. ListMigrations
2022.5.11 New feature Data Integration A topic is added to describe how to use HBase20xsql Reader to read data from Phoenix tables that are mapped to HBase SQL tables. HBase20xsql Reader
2022.5.12 Updated feature Commercial use The architecture of the "Billing overview" topic is adjusted. Billing overview
2022.5.10 New feature Alarm
  • Custom alert rules can be configured to monitor the status and resource usage of nodes.
  • Intelligent baselines can be configured to ensure that the data you want to obtain is generated as expected in scenarios that involve complex dependencies between nodes.
  • Custom O&M rules for resource groups can be configured based on your business requirements to implement automated O&M for node instances that are run on the resource groups.
Overview
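
The 2022.5.22 entry above states that query results are not masked when a user queries data within the time range specified by the Effective From parameter of an allowlist. The following Python sketch shows one way to reason about that rule. The function `should_mask` and the entry fields `user`, `start`, and `end` are hypothetical names chosen for this example and are not taken from DataWorks.

```python
from datetime import datetime


def should_mask(user, query_time, allowlist):
    """Return False if an allowlist entry covers this user at query time.

    Each entry is a dict with hypothetical keys: 'user', 'start', 'end'.
    """
    for entry in allowlist:
        in_range = entry["start"] <= query_time <= entry["end"]
        if entry["user"] == user and in_range:
            return False  # allowlisted and inside the effective range
    return True  # default: results are masked


allowlist = [
    {"user": "alice", "start": datetime(2022, 5, 1), "end": datetime(2022, 5, 31)},
]

print(should_mask("alice", datetime(2022, 5, 22), allowlist))
print(should_mask("alice", datetime(2022, 6, 1), allowlist))
print(should_mask("bob", datetime(2022, 5, 22), allowlist))
```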

Changes in April 2022

Date Item Category Description References
2022.04.29 Updated feature Billing rules and resource groups
  • The architecture of topics in Editions and Resource Groups and Billing is adjusted.
  • The Purchase guide topic in Billing and the Overview topic in Editions and Resource Groups are updated.
  • Topics are added to describe operation guides about specification changes, scaling, deduction and overdue payments, service expiration, and renewal.
2022.04.17 Updated feature Editions and resource groups A topic is added to describe how to change the specifications of a resource group. The added topic also describes how to prepare for the change of specifications. In the Change preparation step, you need to confirm the possible impact of the operation and determine whether to allow the system to automatically rerun the terminated production node after the change is complete. This improves user experience. Change the specifications of a resource group
2022.04.15 Updated feature Intelligent baseline
  • A topic is updated to describe the optimized layout of the intelligent baseline feature. The Baselines, Baseline Instances, and Events tabs are merged.
  • A topic is updated to describe how to configure alert rules for baselines. The alerts include baseline alerts and event alerts.
  • A topic is updated to describe how to view the operation records of baselines on the Operation History page. The following types of operations are recorded: create, modify, enable, disable, and delete.
2022.04.15 New feature Data analyst By default, users with the DataAnalysis role have permissions only on DataAnalysis.
2022.04.14 New feature Basic operations in the DataWorks console After you select a region, the time zone of that region is automatically used as the time zone for scheduling, which means that it applies when you configure the scheduling time for a node. When you create a workspace in the US (Silicon Valley) or Germany (Frankfurt) region for the first time, a message appears. From this message, you can submit a ticket to set the time zone for scheduling to UTC+8. Workspaces page
2022.04.13 New feature Data Security Guard
  • A topic is added to describe the new version of the risk identification rule feature.

    The new version of this feature provides multi-dimensional association analysis methods and algorithms. This feature uses intelligent analysis technologies to identify data risks based on risk identification rules and sends you alert notifications. This feature also allows you to perform end-to-end audits in a visualized manner. DataWorks provides risk identification rules for a variety of scenarios. You can directly use these rules or customize rules based on your business requirements.

  • A topic is added to describe the new version of the Data Risks feature.

    The new version of this feature displays, from multiple dimensions, the data risks that match configured risk identification rules. You can view the distribution of data risks in different dimensions, the trend of data risks in a specified time range, and the rankings of workspaces in which the most data risks are identified. You can obtain the time ranges and workspaces in which a large number of data risks are identified. You can view details about a data risk, such as the user who performed a risky operation, the time when the risky operation was performed, and the specific operation. This helps you locate and handle data risks at the earliest opportunity.

Changes in March 2022

Date Item Category Description References
2022.03.28 New feature DataStudio A topic is added to describe the quick run feature of DataWorks. This feature allows you to quickly run the code snippet that you select on the configuration tab of a node. You can use this feature to test whether a code snippet is correctly written. The added topic describes how to quickly run a code snippet of a node. Debug a code snippet: Quickly run a code snippet
2022.03.25 Updated feature DataStudio A topic is updated to describe the new features on the DataStudio page. This helps you understand the overall layout of the DataStudio page and the modules on this page and view relevant topics with ease. The following features are added:
  • Create a node: When you create a node, the system displays the node types that were recently used. If you click one of the node types, the system automatically configures the Engine Instance and Node Type parameters based on the most recently used node of that type. You can use this method to create a node of a recently used type.
  • Delete a workflow: You can select Terminate the Delete Operation or Skip Current Object and Continue to Delete Other Objects in scenarios in which an object cannot be deleted.
Features on the DataStudio page
2022.03.21 Updated feature Data governance A topic is updated to describe how to filter governance items and check events by role from the personal perspective. View data governance results
2022.03.20 Updated feature Updates
  • The Workspaces page in the DataWorks console is optimized.
  • The Apply All Contact Information with One Click feature is deprecated.
2022.03.17 Updated feature Data Map
  • A topic is updated to describe the Data Quality tab on the table details page. This tab displays the monitoring rules that are configured for the current table and the alerts that are generated based on the monitoring rules.
  • A topic is updated to describe the total number of MaxCompute projects that are displayed on the Overview page of Data Map. The number of MaxCompute projects is collected in real time.
2022.03.17 Updated feature Scheduling parameters A topic is updated to describe the adjusted overall structure and document logic of the "Overview of scheduling parameters" section. In DataWorks, nodes are scheduled to run based on scheduling parameters. Scheduling parameters are automatically replaced with specific values based on the data timestamps of the nodes, the time when the nodes are scheduled to run, and the value formats of the scheduling parameters. This enables dynamic parameter settings for node scheduling. This way, you can quickly view information about scheduling parameters and use them. Overview of scheduling parameters
2022.03.16 Updated feature DataService Studio A topic is updated to describe how to use a function as a filter for an API. If you need to use a filter to preprocess the request parameters of an API or perform secondary processing on query results of the API, perform the following operations to configure a filter: In the right-side navigation pane of the configuration tab of the API, click Filter. On the Filter tab, select Use Pre-filter or Use Post-filter based on your business requirements.
2022.03.07 Updated feature Data Security Guard
  • A topic is updated to describe how to configure data identification rules.
    • Content identification rules and metadata identification rules support the And and Or operators.
    • The hit ratio threshold can be configured for identification rules.
  • A topic is updated to describe how to create global de-identification rules.
    • The following options are added for the Desensitization way parameter: empty, integer, Range transform, and Characters to replace.
    • The Reserved format encryption and To cover up methods are optimized.
    • The SHA256, SHA512, and SM3 encryption algorithms are added for HASH encryption.
  • A topic is updated to describe how to manually correct sensitive data identification results.
    • The feature that allows you to correct multiple sensitive data identification results at a time is added.
    • The feature that allows you to configure the filter conditions to search for sensitive data identification results is optimized.
    • The feature that allows you to export sensitive data identification results is added.
    • The feature that allows you to add sensitive data identification results is added.
    • The display of sensitive data identification results on the Manual Check tab is optimized.
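
The 2022.03.16 entry above describes using a function as a pre-filter to preprocess the request parameters of an API, or as a post-filter to perform secondary processing on the query results. The control flow can be sketched in Python as follows. All names in this block (`call_api`, `strip_whitespace`, `keep_name_only`, `fake_query`) are hypothetical and exist only for illustration. This is not the DataService Studio function interface.

```python
def call_api(params, query, pre_filter=None, post_filter=None):
    """Run a query with an optional pre-filter and post-filter."""
    if pre_filter is not None:
        params = pre_filter(params)    # preprocess request parameters
    result = query(params)
    if post_filter is not None:
        result = post_filter(result)   # secondary processing of results
    return result


def strip_whitespace(params):
    """Pre-filter: trim surrounding whitespace from string parameters."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in params.items()}


def keep_name_only(rows):
    """Post-filter: return only the 'name' field of each row."""
    return [{"name": row["name"]} for row in rows]


def fake_query(params):
    """Stand-in for the real data query; filters an in-memory table."""
    data = [
        {"name": "alice", "city": "Hangzhou"},
        {"name": "bob", "city": "Beijing"},
    ]
    return [row for row in data if row["city"] == params["city"]]


result = call_api({"city": " Hangzhou "}, fake_query, strip_whitespace, keep_name_only)
print(result)
```

Without the pre-filter, the untrimmed parameter value in this sketch matches no rows, which shows why request parameters may need preprocessing before the query runs.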

Changes in February 2022

Date Item Category Description References
2022.02.08 Updated feature Data Integration Topics are updated to describe how to configure related plug-ins by using the codeless user interface (UI).
2022.02.15 Updated feature DataStudio Topics are updated to describe how to configure related settings on the DataStudio page.
  • Personal Settings: This tab allows you to customize the modules to be displayed in the left-side navigation pane of DataStudio, the settings of the code editor and the directed acyclic graph (DAG), and the theme of DataStudio.
  • Code Templates: A code template provides the content that is displayed at the beginning of the code for a node. The Code Templates tab allows you to configure a code template for the following types of nodes based on your business requirements: ODPS SQL, ODPS MR, and Shell.
  • Security Settings: This tab allows you to configure security settings based on your business requirements. You can determine whether to mask sensitive information in the returned results of queries that you perform in DataStudio in the current workspace.
  • Other Settings: This tab allows you to configure other settings. DataWorks supports various settings for data development. On the Other Settings tab, you can enable forcible code review and specify one or more code reviewers to control the code quality of your nodes. You can also delete all invalid DATABLAU data models on this tab.
2022.02.20 New feature Scheduling dependencies A topic is added to describe how to fix the following issue: After you enable automatic parsing for a node, the scheduling dependencies of the node are different from those that are identified by DataWorks when you commit the node. Configure same-cycle scheduling dependencies
2022.02.25 Updated feature DataStudio A topic is updated to describe how to create a merge node and define the merging logic for the node. Configure a merge node

Changes in January 2022

Date Item Category Description References
2022.01.20 New feature Data Modeling A topic is added to describe how to create an application table in Data Modeling. Each application table is suitable for different business scenarios. An application table is used to aggregate atomic and derived metrics of the same statistical period, dimension, and statistic granularity. This allows you to perform subsequent business queries, online analytical processing (OLAP) analysis, and data distribution in an efficient manner. Create an application table
2022.01.18 New feature Data Modeling A topic is added to describe how to create and manage dimensions in Data Modeling. This feature allows you to create and manage dimensions in a centralized manner to ensure that each dimension is unique. Dimension management
2022.01.18 New feature Data Modeling Topics are added to describe how to create data marts and manage subject domains in Data Modeling.
  • A data mart resides at the application layer and aggregates data from the public layer for a specific business scenario or product based on the business category to which the data mart belongs.
  • A subject area is a collection of business subjects that are closely correlated with each other. A subject area is used to divide a data mart from various analysis perspectives. You can classify business subjects into different subject areas based on your business requirements. For example, the subject areas in the e-commerce industry can be classified into the transaction domain, membership domain, and commodity domain.
2022.01.16 New feature DataStudio A topic is updated to describe how to configure same-cycle scheduling dependencies between nodes in DataStudio. After you configure scheduling dependencies for a node, click Preview Dependencies. In the Preview Dependencies dialog box, you can preview the scheduling dependencies of the node on the Node Dependency and Instance Dependency tabs. You can modify the scheduling dependencies that do not meet your business requirements. Configure same-cycle scheduling dependencies
2022.01.15 Updated feature DataStudio A topic is updated to describe how to configure a resource group for scheduling for auto triggered nodes in DataStudio. To run an auto triggered node, you must configure a resource group for scheduling. You can select a resource group in the Resource Group section of the Properties panel for the node. Configure a resource group
2022.01.14 New feature DataStudio A topic is added to describe how to enable periodic scheduling and configure the scheduling settings for auto triggered nodes in DataStudio. To run auto triggered nodes as scheduled, you must go to the Scheduling Settings tab in DataStudio to enable periodic scheduling. Configure scheduling settings
2022.01.14 New feature DataStudio A topic is updated to describe how to configure recurrence for a node in DataStudio. DataStudio allows you to configure rerun-related parameters in the Schedule section of the Properties panel. Configure time properties
2022.01.14 New feature DataStudio A topic is updated to describe the types of scheduling parameters and the precautions for using scheduling parameters in DataStudio. You must configure system parameters as required in the Parameters section of the Properties panel based on your business requirements. Overview of scheduling parameters
2022.01.12 New feature DataAnalysis A topic is added to describe how to write Markdown texts and SQL code, run the code for queries, and then save the query results by using the SQLNotes feature of DataWorks. SQLNotes
2022.01.06 Updated feature DataStudio A topic is added to describe the features on the DataStudio page. This helps you understand the overall layout of and modules on the DataStudio page and view relevant topics with ease. Features on the DataStudio page

Changes in December 2021

Date Item Category Description References
2021.12.27 New feature Data Map A topic is added to describe how to create and manage a Cloudera Distribution Hadoop (CDH) Hive sampling crawler in Data Map. Data Map allows you to use the sampling crawler to sample a CDH Hive table. This way, Data Security Guard can detect sensitive data. If you configure de-identification rules in Data Security Guard, data of the sensitive fields that match the rules is de-identified when you preview data on the details page of a table in Data Map. CDH Hive sampling crawlers
2021.12.24 New feature API
  • A topic is added to describe how to call the GetDISyncTask operation to query the details about a real-time synchronization node or a synchronization solution.
  • A topic is added to describe how to call the DeployDISyncTask operation to deploy a real-time synchronization node or a synchronization solution.
  • A topic is added to describe how to call the GetDISyncInstanceInfo operation to query the status of a real-time synchronization node or a synchronization solution.
  • A topic is added to describe how to call the TerminateDISyncInstance operation to undeploy a real-time synchronization node.
GetDISyncTask, DeployDISyncTask, GetDISyncInstanceInfo, and TerminateDISyncInstance
2021.12.20 New feature DataService Studio Topics are added to describe how to use an Aviator function as the prefilter or post-filter for an API and how to edit code for the Aviator function based on the Aviator syntax. DataService Studio allows you to create an Aviator function and use the Aviator function as the prefilter or post-filter for an API. Create an Aviator function and use the Aviator function as a filter and Best practices of using Aviator functions as filters
2021.12.14 New feature Data Quality A topic is added to describe how to configure monitoring rules based on a monitoring rule template. Data Quality provides various built-in table-level and field-level monitoring rule templates based on which you can configure monitoring rules. Configure monitoring rules based on a monitoring rule template
2021.12.09 Updated feature Usage analysis A topic is added to describe how to view the data governance status in Data Governance Center. Data Governance Center allows you to view the data governance status from the following three perspectives: data production, data usage, and data management. You can select a perspective based on your business requirements to facilitate data governance.

The data pivoting feature allows data developers and administrators to view and analyze the information about tables, tasks, and resources in one or all workspaces. This helps data developers and administrators allocate resources.

Data pivoting
2021.12.02 New feature API
  • Topics are added to describe the API operations related to extension point events.
  • Topics are added to describe the API operations related to asynchronous and synchronization nodes.
API operations related to extension point events:
2021.12.01 Updated feature Data Map A topic is added to describe how to view the information about Object Storage Service (OSS) buckets in Data Map. After you obtain the permissions on OSS buckets on the OSS Data Management page, the related services are activated and the information about OSS buckets is displayed on the Data Management page. If you disable the services for a specific OSS bucket, information about the bucket is not displayed on the OSS Data Management page.

Changes in November 2021

Date Item Category Description References
2021.11.24 Updated feature Data Integration A topic is updated to describe how to configure HDFS Reader and HDFS Writer by using the codeless UI. HDFS Reader and HDFS Writer
2021.11.20 New feature API A topic is added to describe how to call the ListDags operation to obtain the details of DAGs for a single data backfill instance based on OpSeq. ListDags
2021.11.14 New feature DataStudio A topic is added to describe how to perform operations on multiple DataWorks objects at the same time. DataWorks allows you to modify configurations such as the owner of multiple nodes, resources, or functions at the same time. After the modification, you can commit and deploy the nodes, resources, or functions to the production environment. This way, the new configurations take effect. Perform operations on multiple DataWorks objects at a time
2021.11.08 New feature DataStudio A topic is added to describe the resource group orchestration feature. This feature allows you to change resource groups for the scheduling of multiple nodes in a workflow at the same time. If multiple resource groups for scheduling exist in your workspace, you can change the resource groups for scheduling of nodes in the workspace based on your business requirements. This helps you improve resource usage. Change the resource groups for scheduling for one or more nodes

Changes in October 2021

Date Item Category Description References
2021.10.26 New feature Data Modeling
  • A topic is added to describe the naming dictionary feature. A naming dictionary can be used to manage the roots and morphemes of business terms, physical tables, and fields and the standardized translation of the roots and morphemes. DataWorks Data Standard allows you to create naming dictionaries and export existing naming dictionaries.
  • A topic is added to describe the reverse modeling feature. If you use a modeling tool to generate models and you want to use DataWorks Dimensional Modeling for subsequent modeling operations, you can use the reverse modeling feature provided by DataWorks Dimensional Modeling. The reverse modeling feature allows you to import the models that are generated by using the modeling tool into a compute engine instance. The system creates models based on the imported models. This way, you do not need to manually create models in DataWorks Dimensional Modeling. This shortens the time required to apply new models.
2021.10.22 Updated feature Data Security Guard
  • A topic is updated to describe how to manage data sensitivity levels in a more efficient manner. DataWorks allows you to classify data based on the value, sensitivity level, impact, and distribution range of the data. The management policy and development requirements vary for data of different sensitivity levels.
  • A topic is updated to describe how to identify sensitive field types in an efficient manner. DataWorks allows you to identify sensitive data in your workspace by using the data identification rules configured for built-in and custom sensitive field types.
2021.10.15 New feature API
  • A topic is added to describe how to call the ListDeployments operation to query the information about deployment tasks.
  • A topic is added to describe how to call the UpdateIDEEventResult operation to send the check results of an extension point event to DataStudio after the extension point event is triggered and an extension checks the extension point event.
  • A topic is added to describe how to call the GetIDEEventDetail operation to query the data snapshot of an extension point based on the ID of a DataWorks open message when an event that has the extension point is triggered.
2021.10.14 New feature API A topic is added to describe how to call API operations to create, configure, and manage a data synchronization node in Data Integration. Use API operations to create, configure, and manage a data synchronization node
2021.10.11 New feature DataStudio A topic is added to describe the code search feature. The code search feature allows you to query code snippets in nodes by keyword. The search results show the details of each code snippet and the nodes that contain the code snippets. You can use this feature to trace the node that causes changes in a table. Code search
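
The 2021.10.11 entry above describes searching code snippets in nodes by keyword and tracing the node that causes changes in a table. A minimal Python sketch of the idea follows. The function `search_code`, the in-memory `nodes` mapping, and the case-insensitive comparison are assumptions for illustration and do not describe how the DataStudio code search feature is implemented.

```python
def search_code(nodes, keyword):
    """Return (node_name, line_number, line) for each line containing keyword.

    `nodes` is a hypothetical mapping of node name to node code.
    Matching is case-insensitive in this sketch.
    """
    needle = keyword.lower()
    hits = []
    for node_name, code in nodes.items():
        for line_no, line in enumerate(code.splitlines(), start=1):
            if needle in line.lower():
                hits.append((node_name, line_no, line.strip()))
    return hits


nodes = {
    "load_orders": "INSERT OVERWRITE TABLE ods_orders\nSELECT * FROM src_orders;",
    "report_daily": "SELECT COUNT(*) FROM ods_orders;",
}

# Find every node that references the table, then only the node that writes to it.
print(search_code(nodes, "ods_orders"))
print(search_code(nodes, "insert overwrite table ods_orders"))
```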

Changes in September 2021

Date Item Category Description References
2021.09.30 New feature Scheduling settings in DataStudio A topic is added to describe the configurations of scheduling parameters. Scheduling parameters are used during the running of DataWorks nodes. The values of scheduling parameters are automatically replaced with specific values based on the data timestamps of the nodes and the value formats of the scheduling parameters. This enables dynamic parameter settings during the running of nodes. Overview of scheduling parameters
2021.09.30 New feature Scheduling settings in DataStudio A topic is added to describe how to configure cross-cycle dependencies between nodes and the types of cross-cycle dependencies supported by DataWorks. If you configure cross-cycle dependencies for a node, the instances of this node can be run only if the last-cycle instances of the dependent node are run as expected. Configure previous-cycle scheduling dependencies
2021.09.26 New feature Data Map Topics are added to describe the new features of the Data Map service. Data Map allows you to query APIs in all workspaces that are owned by the current tenant and view details of the APIs. This enables quick queries. On the details page of an API, you can view the basic information, parameters, and sample responses of the API. Query an API and View the details of an API
2021.09.15 New feature DataAnalysis A topic is added to describe how to use SQL statements to query and analyze data of the added data sources in DataAnalysis. SQL query
2021.09.02 New feature Operation Center A topic is added to describe the advanced mode to generate data backfill node instances for auto triggered nodes. The advanced mode is used to generate data backfill node instances for multiple nodes at the same time. The nodes that you select do not need to depend on each other. You can select nodes for which you want to backfill data in the DAG of an auto triggered node or in the node list on the Cycle Task page. View and manage data backfill instances
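
The 2021.09.30 entry above states that scheduling parameters are automatically replaced with specific values based on the data timestamps of nodes and the value formats of the parameters. The replacement step can be sketched in Python as follows. The `${name}` placeholder style, the `strftime` format strings, and the function `resolve_parameters` are assumptions for this example only. Refer to the "Overview of scheduling parameters" topic for the syntax that DataWorks actually supports.

```python
import re
from datetime import date


def resolve_parameters(code, data_timestamp, formats):
    """Replace ${name} placeholders with the formatted data timestamp.

    `formats` maps a hypothetical parameter name to a strftime format.
    Placeholders without a configured format are left unchanged.
    """
    def substitute(match):
        name = match.group(1)
        if name not in formats:
            return match.group(0)
        return data_timestamp.strftime(formats[name])

    return re.sub(r"\$\{(\w+)\}", substitute, code)


sql = "SELECT * FROM orders WHERE ds = '${ds}' AND month = '${month}';"
resolved = resolve_parameters(
    sql,
    date(2021, 9, 29),
    {"ds": "%Y%m%d", "month": "%Y-%m"},
)
print(resolved)
```

Because the values are derived from the data timestamp at run time, the same node code in this sketch produces a different query for each cycle without manual edits.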

Changes in August 2021

Date Item Category Description References
2021.08.29 New feature Data Integration A topic is added to describe how to use the data masking feature. This feature can mask sensitive data in a single table that is synchronized in real time and store the data in a specific database. Configure data de-identification
2021.08.22 New feature Data Integration A topic is added to describe how to synchronize data to Kafka in real time by using the Data Integration service of DataWorks. Plan and configure resources
2021.08.11 New feature SSL-based authentication Topics are updated to describe the SSL-based authentication for data sources. If you configure data sources such as MySQL, SQL Server, and PostgreSQL databases, you can enable SSL-based authentication for the data sources. After the configuration is complete, only trusted applications and services can access the data resources. In DataWorks, SSL-based authentication is provided as a third-party identity authentication mechanism. Third-party identity authentication mechanisms are used to perform strict identity authentication on users and services. These mechanisms prevent untrusted applications or services from accessing data and improve the stability of data access during data synchronization.
2021.08.07 Updated feature Permission management system A topic is added to describe the permission management system of DataWorks. The permission management system of DataWorks consists of two parts: permissions controlled by using RAM and permissions controlled by DataWorks. Overview
2021.08.01 New feature Migration Assistant A topic is updated to describe the Migration Assistant service of DataWorks. Migration Assistant was officially commercialized on August 1, 2021. Migration Assistant allows you to migrate data objects across different DataWorks versions, Alibaba Cloud accounts, regions, and workspaces. You can export the data objects in your workspace, including auto triggered nodes, manually triggered nodes, resources, functions, data sources, table metadata, ad hoc queries, and script templates. You can also create full export tasks, incremental export tasks, or custom export tasks to export your data objects in DataWorks based on your business requirements. Overview

Changes in July 2021

Date Item Category Description References
2021.07.22 New feature API operation A topic is added to describe how to call the CreateDISyncTask operation to create a batch synchronization node. CreateDISyncTask
2021.07.14 New feature Configurations in the console A topic is added to describe how to log on to the DataWorks console and add a RAM user or a RAM role as an alert contact on the Alert Contacts page. If an error occurs during the running of a node, DataWorks sends alert notifications to the specified alert contact. This allows you to handle exceptions at the earliest opportunity. Configure and view alert contacts
2021.07.09 New feature Billing rules A topic is updated to describe the billing method for different editions that are used in the China East 2 Finance and China South 1 Finance regions. Billing of DataWorks advanced editions
2021.07.03 New feature Data Security Guard A topic is added to describe how to use the data traceability feature provided by DataWorks. This feature allows you to extract the watermark information of the data in a leaked data file. This helps you trace users who caused data leaks. Trace leak sources
2021.07.02 New feature Data Security Guard A topic is added to describe how to create and manage sample libraries based on the sample files that you provide. You can associate a sample library with a data identification rule to identify data. If the data to be identified contains the data in the sample library, the data to be identified matches the data identification rule. You can use sample libraries to identify enumerated values, such as employee names and user addresses. Create and manage sample libraries
2021.07.02 New feature Data Security Guard A topic is added to describe how to use sample fields to train models. DataWorks extracts the characteristics of these fields and generates a rule model. You can use this rule model to identify the data that has similar characteristics in your data assets. Generate a custom data identification model

Changes in June 2021

Date Item Category Description References
2021.06.11 New feature DataStudio A topic is added to describe how to create and run an E-MapReduce (EMR) Streaming SQL node. EMR Streaming SQL nodes allow you to use SQL statements to develop streaming analytics jobs. -
2021.06.11 New feature DataStudio A topic is added to describe how to create and run an EMR Spark Streaming node. EMR Spark Streaming nodes can be used to process streaming data with high throughput. This type of node supports fault tolerance, which helps you restore data streams on which errors occur. -
2021.06.09 New feature Operation Center A topic is added to describe how to view the basic information and status of real-time computing nodes on the Stream Task page in the DataWorks console. This allows you to monitor the status of the nodes. In addition, you can configure alert rules for the nodes that you want to monitor. This way, you can identify and handle exceptions at the earliest opportunity. Manage real-time compute nodes

Changes in May 2021

Date Item Category Description References
2021.05.20 New feature Operation Center A topic is added to describe how to create and manage a shift schedule in DataWorks. If you set the Recipient parameter to Varies According to Shift Schedule and select a shift schedule when you create a custom alert rule, DataWorks can send alert notifications to the on-duty engineers that you specify for the shift schedule. After the engineers receive the alert notifications, they can identify and handle exceptions at the earliest opportunity. Create and manage a shift schedule
2021.05.17 New feature DataStudio A topic is added to describe how to create and run ClickHouse SQL nodes. A ClickHouse SQL node allows you to use a distributed SQL query engine to process structured data. This improves task efficiency. Create and use a ClickHouse SQL node
2021.05.15 New feature Data Integration Topics are added to describe how to synchronize data to AnalyticDB for MySQL 3.0 by using the Data Integration service of DataWorks.

Changes in April 2021

Date Item Category Description References
2021.04.29 New tutorial Getting started A topic is added to provide AI tutorials that teach how to develop nodes in DataWorks. AI tutorials
2021.04.28 New feature Data Integration A topic is added to describe how to add or remove source tables after you run a node to synchronize data to Hologres. Add or remove source tables to or from a synchronization solution that is running
2021.04.22 New feature DataStudio A topic is added to describe how to create an FTP Check node that can be used to periodically detect whether a specific file exists based on FTP. If the FTP Check node detects that the file exists, the scheduling system starts to run the descendant node of the FTP Check node. Otherwise, the FTP Check node retries the detection at the configured detection interval and stops retrying only when the condition to stop the detection is met. In most cases, FTP Check nodes are used for communications between the DataWorks scheduling system and external scheduling systems. Create an FTP Check node
2021.04.06 API operation API operation A topic is added to describe how to call the GetPermissionApplyOrderDetail operation that is provided by Security Center. GetPermissionApplyOrderDetail
2021.04.05 New feature Data Integration A topic is added to describe how to synchronize data to Kafka in real time by using the Data Integration service of DataWorks. Plan and configure resources

Changes in March 2021

Date Item Category Description References
2021.3.19 New feature Custom role A topic is added to describe how to customize a role in a DataWorks workspace. Manage workspace-level roles and members
2021.3.11 New engine Open source engines from which tasks are exported or migrated Topics are updated to describe how to import tasks that are exported from the open source scheduling engines into DataWorks or migrate tasks from these scheduling engines to DataWorks.
2021.3.11 New feature Engine O&M A topic is added to describe how to use the engine O&M feature of DataWorks to view the details of each EMR node and identify and remove the nodes that fail to run. This way, failed nodes do not affect the performance of descendant nodes. Use the engine O&M feature
2021.3.9 New feature Aggregate analysis of auto triggered nodes in a DAG Topics are added to describe the aggregate view and the downstream and upstream analysis features of a DAG. You can view the details of an auto triggered node from the DAG and perform operations based on your business requirements.
2021.3.3 New feature API operation Topics are added to describe API operations that are added in the Operation Center, Data Security Guard, and Migration Assistant services of DataWorks.

Changes in February 2021

Date Item Category Description References
2021.2.24 New feature Viewing of the status information about synchronization nodes A topic is added to describe how to view the distribution and execution details of synchronization nodes that are run and how to troubleshoot abnormal nodes. This improves the O&M efficiency of synchronization nodes. View the status information about sync solutions
2021.2.9 New feature Real-time synchronization node A topic is updated to describe how to create a real-time synchronization node to synchronize data from a specific table. After the real-time synchronization node is created, you can view the status of the node. Synchronize data in a single table
2021.2.6 New feature Real-time synchronization node Topics are added to describe how to create a real-time synchronization node to synchronize data from specific or all tables in a database to MaxCompute, Hologres, or DataHub. After the real-time synchronization node is created, you can view the status of the node.
2021.2.5 New feature New feature A topic is added to describe how to add an ApsaraDB for OceanBase data source. You can use the ApsaraDB for OceanBase data source to create a synchronization node. Add an ApsaraDB for OceanBase data source

Changes in January 2021

Date Item Category Description References
2021.1.28 New feature New node types supported by DataStudio Topics are added to describe how to create and run a MySQL node and an AnalyticDB for MySQL node. You can use SQL statements to develop data for a MySQL data source and an AnalyticDB for MySQL data source.
2021.1.20 New feature Synchronization solution Topics are added to describe how to create a batch synchronization node and a real-time synchronization node to synchronize data from specific or all tables in a database to Elasticsearch. After the batch synchronization node and real-time synchronization node are created, you can view the status of the nodes.
2021.1.19 New feature Allowlist configuration and category management in Data Map A topic is added to describe how to configure allowlists and attach the AliyunDataWorksFullAccess policy to a specific RAM user. After you configure allowlists, you can collect metadata and manage categories in Data Map. Configure IP address whitelists for metadata collection
2021.1.13 New feature Integration with ActionTrail A topic is added to describe how to query DataWorks behavior events in ActionTrail. You can use the queried event details to perform behavior analysis, security analysis, resource change tracking, and compliance auditing. Use ActionTrail to query behavior events
2021.1.13 New feature Billing method for the Datablau feature of DataWorks A topic is added to describe the billing method for the Datablau feature of DataWorks.
2021.1.7 New feature New feature A topic is added to describe how to synchronize data from a MySQL data source to Elasticsearch. You can learn how to prepare resource groups and data sources, create a synchronization node, and view the status of the synchronization node. Synchronize data to Elasticsearch

Changes in December 2020

Date Item Category Description References
2020.12.24 New feature Synchronization solution Topics are added to describe how to synchronize data from a PolarDB, Oracle, or MySQL data source to Hologres or MaxCompute. You can learn how to prepare resource groups and data sources, create a synchronization node, and view the status of the synchronization node. The topics also provide answers to frequently asked questions.
2020.12.14 New feature New feature A topic is added to describe how to create a crawler and collect metadata from a Tablestore data source to DataWorks. You can view collected metadata on the Data Map page. Collect metadata from a Tablestore data source

Changes in November 2020

Date Item Category Description References
2020.11.18 New feature API operation A topic is added to describe how to call the CreateManualDag operation to trigger the running of a manually triggered workflow. CreateManualDag
2020.11.18 New feature API operation A topic is added to describe how to call the GetManualDagInstances operation to query information about instances in a manually triggered workflow. GetManualDagInstances
2020.11.18 New feature API operation A topic is added to describe how to query the details of a DAG based on the ID of the DAG. GetDag
2020.11.18 New feature API operation A topic is added to describe how to call the SearchNodesByOutput operation to query a node based on the output. SearchNodesByOutput
2020.11.10 New FAQ User experience optimization A topic is added to provide answers to frequently asked questions about Operation Center. Overview
2020.11.02 New feature New feature A topic is updated to describe the code review configuration in DataWorks. If you enable forcible code reviews, the code of each node that you commit must be reviewed by the specified reviewer. You can deploy the node only after the reviewer approves the code. Code review

Changes in October 2020

Date Item Category Description References
2020.10.30 API operation overview User experience optimization A topic is added to describe the applicable scopes, billing rules, and call limits of DataWorks API operations and describe the DataWorks API operations by function. Overview
2020.10.28 New feature New feature A topic is added to describe how to create an EMR table. Create an EMR table
2020.10.28 New feature New feature A topic is updated to describe how to associate an EMR cluster with a DataWorks workspace. In DataWorks, you can create nodes such as Hive, MapReduce, Presto, and Spark SQL nodes based on an EMR compute engine and configure EMR workflows. You can also schedule the nodes and manage metadata. This improves your data output. Associate an EMR cluster with a DataWorks workspace as an EMR compute engine instance

Changes in September 2020

Date Item Category Description References
2020.09.03 Updates of billing method Billing A topic is updated to describe the pay-as-you-go billing method. This billing method allows you to use all the basic features of DataWorks in a cost-efficient manner. Overview
2020.09.03 Updated feature User experience optimization A topic is updated to provide basic information about DataWorks and to describe the features and limits of DataWorks. What is DataWorks?
2020.09.02 New tutorial User experience optimization A topic is added to describe how to use DataWorks together with Machine Learning Platform for AI (PAI) to automatically identify users who steal electricity. This ensures that users use electricity in a safe manner. Overview

Changes in August 2020

Date Item Category Description References
2020.08.07 New resource groups User experience optimization DataWorks provides custom resource groups for scheduling and custom resource groups for Data Integration to ensure that nodes are flexibly scheduled and data is synchronized as early as possible. A topic is added to describe how to create a custom resource group for scheduling and change the resource group for a node to the created custom resource group for scheduling. Create custom resource groups for scheduling
2020.08.07 New data source New feature A topic is added to describe how to configure a Hive data source. A Hive data source allows you to read data from and write data to Hive. You can use the codeless UI or code editor to configure synchronization nodes. Add a Hive data source
2020.08.07 New data source New feature A topic is updated to describe how to configure a GBase8a data source. A GBase8a data source allows you to read data from and write data to GBase8a by using GBase8a Reader and Writer. You can use the codeless UI or code editor to configure synchronization nodes for GBase8a. Add a GBase8a data source
2020.08.07 New data source New feature A topic is updated to describe how to configure a Hologres data source. A Hologres data source allows you to read data from and write data to Hologres by using Hologres Reader and Writer. You can use the codeless UI or code editor to configure synchronization nodes for Hologres. Add a Hologres data source
2020.08.07 New data source New feature A topic is updated to describe how to configure an HBase data source. An HBase data source allows you to read data from and write data to HBase by using HBase Reader and Writer. You can use the code editor to configure synchronization nodes for HBase. Add an HBase data source
2020.08.07 New data source New feature A topic is updated to describe how to configure an Elasticsearch data source. An Elasticsearch data source allows you to read data from and write data to Elasticsearch. You can use the code editor to configure synchronization nodes for Elasticsearch. Add an Elasticsearch data source
2020.08.07 New FAQ User experience optimization A topic is added to describe how to troubleshoot issues related to connectivity, parameters, and permissions when you add data sources in DataWorks. Connection creation
2020.08.07 New feature New feature A topic is added to describe how to create an EMR Presto node. EMR Presto nodes allow you to perform interactive analysis and queries on large-scale structured and unstructured data. Create an EMR Presto node
2020.08.05 New release notes of features User experience optimization A topic is added to describe the release notes of key features of DataWorks.

Announcements and updates

Changes in June 2020

Date Item Category Description References
2020.06.30 New FAQ User experience optimization A topic is added to provide answers to frequently asked questions about DataWorks services and features. These services and features include Data Integration, DataStudio, custom resource groups, exclusive resource groups, dependencies, Alarm, and DataService Studio. FAQ
2020.06.28 New feature New feature A topic is added to describe how to add a route to a VPC or a data center. Add a route
2020.06.28 New best practice User experience optimization A best practice is added to describe how to use exclusive resource groups for Data Integration to migrate data from a self-managed MySQL database hosted on Elastic Compute Service (ECS) to MaxCompute. Migrate data from a user-created MySQL database on an ECS instance to MaxCompute
2020.06.28 New best practice User experience optimization A best practice is added to describe how to use AIRec. AIRec is developed by Alibaba Cloud based on cutting-edge big data and AI technologies, and years of experience in the e-commerce industry. AIRec provides a personalized recommendation service to increase the customer purchase rate and order conversion rate. Intelligently recommend items on e-commerce websites
2020.06.28 New best practice User experience optimization A best practice is added to describe how to grant specific users access permissions on specific resources such as tables and user-defined functions (UDFs). This best practice involves data encryption and decryption algorithms that ensure data security. Grant a specified user the access permissions on a specific UDF
2020.06.28 New best practice User experience optimization A best practice is added to describe how to build a data warehouse for an enterprise based on AnalyticDB for MySQL and use the data warehouse for O&M and metadata management. Build a data warehouse for an enterprise based on AnalyticDB for MySQL
2020.06.28 New best practice User experience optimization A best practice is added to describe how to use a PyODPS node in DataWorks to segment Chinese text based on Jieba, an open source segmentation tool. The topic also describes how to write the segmented words and phrases to a new table and use closure functions to segment Chinese text based on a custom dictionary. Use a PyODPS node to segment Chinese text based on Jieba
2020.06.28 New best practice User experience optimization A best practice is added to describe how to use a PyODPS node that is running on an exclusive resource group to send emails. Use a PyODPS node to send emails
2020.06.28 New best practice User experience optimization A best practice is added to describe how to connect DataV to DataWorks DataService Studio. You can create APIs in DataService Studio and call the APIs in DataV. Then, DataV presents analysis results of the MaxCompute data. Connect DataV to DataWorks DataService Studio
2020.06.28 New best practice User experience optimization A best practice is added to describe how to use a PyODPS node in DataWorks to reference a third-party package. In this topic, a PyODPS 2 node is used as an example. Use a PyODPS node to reference a third-party package
2020.06.28 New best practice User experience optimization A best practice is added to describe how to enable automatic synchronization of IoT data to the cloud. IoT is a network that carries data based on the Internet and traditional telecommunication networks. IoT allows physical objects that can be independently addressed to be used as data sources. Automatically synchronize IoT data to the cloud
2020.06.16 New tutorial User experience optimization Data quality is crucial for effective and accurate data analysis. A tutorial is added to describe the application scenarios and the standards used to assess data quality. Overview
2020.06.15 New data source New data source A topic is added to describe how to configure an ApsaraDB for OceanBase data source. An ApsaraDB for OceanBase data source allows you to read data from and write data to ApsaraDB for OceanBase. You can use the code editor to configure synchronization nodes. Add an ApsaraDB for OceanBase data source
2020.06.15 New data source New data source A topic is added to describe how to configure a Vertica data source. A Vertica data source allows you to read data from and write data to Vertica. You can use the code editor to configure synchronization nodes. Add a Vertica data source
2020.06.15 New feature New feature A topic is added to describe the data types that are supported by GBase8a Reader and the parameters that you can use to configure GBase8a Reader. For example, you can specify the data source and configure field mapping for GBase8a Reader. This topic also provides an example on how to configure GBase8a Reader. Gbase8a Reader
2020.06.15 New feature New feature A topic is added to describe Hologres Reader. Hologres Reader allows you to export data from Hologres data warehouses. You can read data from Hologres tables and then write the data to other data sources based on the standard protocol of Data Integration. Hologres Reader
2020.06.15 New feature New feature A topic is added to describe Hologres Writer. Hologres Writer allows you to import data from multiple data sources to Hologres for real-time data analysis. Hologres Writer
2020.06.15 New feature New feature A topic is added to describe how to configure a resource group for node scheduling. You can select the required resource group for node scheduling in the Resource Group section. Configure a resource group
2020.06.15 New description User experience optimization A topic is added to describe the logic of scheduling dependencies. Node dependencies must be correct to ensure that business data is generated in an effective and timely manner and to standardize data development. This results in an effective workflow. Logic of same-cycle scheduling dependencies
2020.06.15 New resource groups New feature A topic is added to describe how to create and use exclusive resource groups for scheduling. DataWorks allows you to associate an exclusive resource group for scheduling with a VPC. This way, the resource group can connect to data sources in the VPC. Create and use an exclusive resource group for scheduling

Changes in May 2020

Date Item Category Description References
2020.05.27 New usage notes User experience optimization DataWorks supports shared resource groups, exclusive resource groups, and custom resource groups. A topic is added to describe the scenarios and methods of using these resource groups. Overview
2020.05.27 New feature New feature A topic is added to describe how to manage report templates. You can create a template of data quality reports on the Report Template Management page. DataWorks Data Quality can periodically generate and send data quality reports based on the template. Create and manage report templates
2020.05.27 New feature New feature A topic is added to describe how to manage rule templates. DataWorks Data Quality allows you to manage a set of custom rules and create a rule template library to configure rules in a more efficient manner. Create, manage, and use rule templates
2020.05.27 New feature New feature A topic is added to describe the verification logic of Data Quality and the built-in rule templates that DataWorks provides for monitoring offline data. Built-in monitoring rule templates

Changes in April 2020

Date Item Category Description References
2020.04.19 Service upgrade DataWorks V3.0 Topics are updated to describe how to use Operation Center. In Operation Center, you can view the dashboard, manage auto triggered nodes and manually triggered nodes, and monitor nodes. Operation Center
2020.04.18 Service upgrade DataWorks V3.0 Topics are updated to describe the overall process of how to build a MaxCompute data warehouse. Build and optimize a data warehouse
2020.04.18 Service upgrade DataWorks V3.0 Topics are added to describe the Data Integration service of DataWorks. Data Integration is a stable, efficient, and scalable data synchronization service. Data Integration is designed to migrate and synchronize data between a wide range of heterogeneous data sources in complex network environments in a fast and stable manner. Data Integration
2020.04.08 Service upgrade DataWorks V3.0 A topic is updated to describe a complete process of data development and O&M. Overview
2020.04.08 Service upgrade DataWorks V3.0 Topics are updated to provide an overview of DataWorks, including the basic concepts, scenarios, and data development processes. What is DataWorks?

Changes in March 2020

Date Item Category Description References
2020.03.26 New tutorial User experience optimization A tutorial is added to describe the complete operations in the DataWorks for EMR workshop. DataWorks for EMR Workshop
2020.03.17 Service upgrade DataWorks V3.0 A topic is updated to describe the upgraded data development mode. In the upgraded data development mode, you can group multiple workflows under a solution in a workspace. The previous hierarchical structure is no longer used. DataStudio
2020.03.17 Service upgrade DataWorks V3.0 Topics are updated to describe various types of nodes in DataWorks, such as batch synchronization nodes, MaxCompute nodes, EMR nodes, general nodes, and custom nodes. Node Type
2020.03.02 Service upgrade DataWorks V3.0 Topics are updated to provide an overview of the DataWorks console. You can view the workspaces, resource groups, and compute engines in the DataWorks console. Overview of the DataWorks console

Changes in February 2020

Date Item Category Description References
2020.02.29 New best practice User experience optimization A best practice is added to describe how to use the data synchronization feature of DataWorks to migrate data from Oracle to MaxCompute. Best practice to migrate data from Oracle to MaxCompute
2020.02.02 New feature New feature Topics are added to describe how to use DataAnalysis. DataAnalysis allows you to collaboratively edit and analyze workbooks, manage MaxCompute tables in tabular mode, and generate and share visual reports. Overview

Changes in December 2019