This topic introduces Alibaba Cloud DataWorks and describes the features and limits of DataWorks.

DataWorks is a platform as a service (PaaS) of Alibaba Cloud and provides services such as Data Integration, DataStudio, Data Map, Data Quality, and DataService Studio. DataWorks also provides an end-to-end data development and management console to help enterprises mine and explore data value.

DataWorks supports multiple types of compute and storage engines, such as MaxCompute, E-MapReduce (EMR), Realtime Compute for Apache Flink, Machine Learning Platform for AI, Graph Compute, and Hologres. DataWorks also allows you to use custom computing and storage services. DataWorks provides end-to-end big data services and allows you to perform artificial intelligence (AI) development and data governance.

DataWorks simplifies operations such as data transmission, conversion, and integration. You can import data from various data sources, convert, develop, and process the data, and then transmit the data to other data systems.

Limits

DataWorks supports only the following browsers: Google Chrome 69 or later and the new, Chromium-based Microsoft Edge.

Learning path

You can quickly learn the concepts, basic operations, and advanced operations of DataWorks from the documentation homepage of DataWorks. For more information, see Documentation homepage.

Features

  • DataWorks is fully hosted on the cloud.
    • DataWorks provides powerful scheduling capabilities. For more information, see Schedule.
      • In DataWorks, nodes can be triggered by time- or dependency-based scheduling configurations. For more information, see Configure time properties and Configure same-cycle scheduling dependencies.
      • DataWorks enables tens of millions of auto triggered nodes to accurately run on time every day based on the node relationships that are defined in directed acyclic graphs (DAGs).
      • DataWorks allows you to schedule nodes by minute, hour, day, week, month, or year.
    • DataWorks is a fully managed service that frees you from server deployment.
    • DataWorks provides the isolation feature to ensure that nodes of different tenants do not affect each other.
  • DataWorks supports a variety of node types. For more information, see Select a data development node.

    Engine capabilities are encapsulated into DataWorks. You do not need to use complex engine CLIs. DataWorks provides custom wrappers for custom nodes. This way, you can add computing task types and use custom nodes to access custom computing services. You can also use custom nodes with other types of DataWorks nodes to process complex data.

    • Data integration: DataWorks supports more than 20 types of data sources and provides stable, efficient data transmission features based on the Data Integration service. For more information, see Data Integration.
    • Data conversion:
      • DataWorks provides superior performance for big data analytics and processing based on the powerful capabilities of compute engines. You can create nodes of various compute engine types, such as ODPS SQL nodes, ODPS Spark nodes, EMR Hive nodes, and EMR MR nodes.
      • DataWorks provides multiple types of general nodes, such as assignment nodes, do-while nodes, and for-each nodes. You can use general nodes with compute engine nodes to analyze and process complex data.
      • DataWorks provides custom nodes and allows you to use the nodes to access custom computing services and develop data. For more information, see Overview.
  • DataWorks provides a visual interface for you to develop code.

    DataWorks provides a graphical user interface (GUI) for you to develop code and design workflows. You can perform simple drag-and-drop operations and code development operations to create complex data analytics nodes. For more information, see GUI elements.

    You need only a browser with Internet access to develop code anytime and anywhere.

  • DataWorks supports monitoring and alerting.

    Operation Center provides a visualized node monitoring and management tool and displays the statuses of nodes in DAGs. For more information, see Operation Center.

    You can configure various alert notification methods to notify relevant personnel at the earliest opportunity when a node error occurs. This ensures normal business operations. For more information, see Monitor.
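The dependency-based scheduling described in this section can be illustrated with a short sketch. The following Python example is not DataWorks code and does not use any DataWorks API. It assumes a hypothetical workflow whose node names are invented for illustration, and it uses the standard-library `graphlib` module (Python 3.9 and later) to show how a DAG determines which nodes can run once their upstream nodes finish.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow (node names are illustrative, not DataWorks objects).
# Each node maps to the set of upstream nodes that it depends on.
workflow = {
    "import_orders": set(),
    "import_users": set(),
    "clean_orders": {"import_orders"},
    "join_orders_users": {"clean_orders", "import_users"},
    "daily_report": {"join_orders_users"},
}


def run_order(dag):
    """Return batches of nodes in a valid execution order.

    Every node in a batch has all of its upstream nodes in earlier batches,
    so the nodes within one batch could run in parallel.
    """
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # Raises graphlib.CycleError if the graph has a cycle.
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # Nodes whose dependencies are done.
        batches.append(ready)
        sorter.done(*ready)  # Mark the batch as finished to unlock descendants.
    return batches


batches = run_order(workflow)
for number, batch in enumerate(batches, start=1):
    print(f"batch {number}: {', '.join(batch)}")
```

In this sketch, the two import nodes have no upstream nodes and form the first batch, and the report node runs last because every other node is upstream of it. A real scheduler would additionally combine these dependencies with time-based triggers, which the sketch leaves out.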

Features provided by DataWorks Professional Edition

  • Resource optimization: reduces computing and storage costs.
  • Mobile O&M: implements easy, rapid node O&M.
  • Runtime diagnosis: helps you quickly locate difficult issues.
  • Intelligent monitoring: improves productivity and provides intelligent solutions for alerts.
  • Field-level data lineage: helps you quickly locate the source of dirty data.
  • Multiple types of control nodes: support the logic of complex business scenarios.
  • Data Security Guard: protects the security of your data.
  • Development of real-time stream computing nodes: enables you to easily use new technologies in DataWorks.

DataWorks Professional Edition provides more features and significantly increases the efficiency of big data governance.