
The Scenarios of DataWorks

This topic describes typical scenarios of DataWorks.

DataWorks Definition

DataWorks is a platform as a service (PaaS) offering from Alibaba Cloud. It offers a range of services, including Data Integration, DataStudio, Data Map, Data Quality, and DataService Studio. In addition, it provides a one-stop data development and management console to help enterprises mine and explore data value.

DataWorks supports multiple compute and storage engines, including MaxCompute, E-MapReduce, Realtime Compute for Apache Flink, Machine Learning Platform for AI, Graph Compute, and Hologres. It also allows you to use custom computing and storage services. As an all-in-one platform, DataWorks provides end-to-end big data services, artificial intelligence (AI) development, and data governance.

DataWorks simplifies data transmission, conversion, and integration. You can import data from different data stores, convert, analyze, and process the data, and then transmit the data to other data systems.

DataWorks Features

DataWorks is hosted on the cloud.

  • DataWorks provides powerful scheduling capabilities. For more information, see Schedule.

    • In DataWorks, nodes can be triggered by time-based or dependency-based scheduling configurations. For more information, see Configure time properties and Dependencies.
    • DataWorks enables tens of millions of nodes to run accurately and on time every day based on node relationships in directed acyclic graphs (DAGs).
    • DataWorks allows you to run nodes at custom intervals in minutes, hours, days, weeks, or months.
  • DataWorks is a cloud-hosted environment that frees you from server deployment.
  • DataWorks provides the isolation feature to ensure that nodes of different tenants do not affect each other.
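
The dependency-based scheduling described above can be illustrated with a small sketch. This is not how DataWorks schedules nodes internally; it only shows, under assumed names, how a DAG of node relationships yields a valid run order. The node names and the `run_order` helper are hypothetical, and the ordering relies on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each key is a node, each value is the set of
# upstream nodes that must finish before the key node can start.
dependencies = {
    "sync_logs": set(),
    "clean_logs": {"sync_logs"},
    "daily_report": {"clean_logs"},
    "user_stats": {"clean_logs"},
}


def run_order(deps):
    """Return the nodes in an order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
print(order)
```

In this sketch, `sync_logs` always comes first and `clean_logs` always precedes both downstream nodes. The relative order of `daily_report` and `user_stats` is not fixed, because neither depends on the other.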

DataWorks supports multiple node types, including batch sync nodes, Shell nodes, ODPS SQL nodes, and ODPS MR nodes.

  • Data conversion: By using the powerful computing capabilities of MaxCompute, DataWorks ensures superior performance in analyzing and processing big data.
  • Data integration: Based on the Data Integration service, DataWorks supports more than 20 types of data stores and provides stable and efficient data transmission features.

DataWorks provides visualized code development.

DataWorks provides a graphical user interface (GUI) for you to develop code and design workflows. You can perform simple drag-and-drop operations to create complex data analytics nodes without the need to use development tools.

A browser with Internet access enables you to develop code anytime, anywhere.

DataWorks supports monitoring and alerting.

Operation Center provides a visualized node monitoring and management tool and displays the overall node running status in DAGs.

You can configure various alert notification methods to promptly notify relevant staff when a node error occurs. This ensures normal business operation.
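
As a rough illustration of alerting on node errors, the sketch below scans a list of node runs and notifies the owner of each failed node. The `NodeRun` fields, the `alert_on_failures` function, and the `send` callback are all hypothetical and do not reflect the Operation Center API. A real setup would plug in an email, SMS, or webhook sender.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class NodeRun:
    """One node execution; fields are hypothetical, not a DataWorks schema."""
    name: str
    status: str  # "success" or "failed"
    owner: str


def alert_on_failures(runs: List[NodeRun], send: Callable[[str, str], None]) -> int:
    """Call send(owner, message) for every failed run; return the alert count."""
    count = 0
    for run in runs:
        if run.status == "failed":
            send(run.owner, f"Node '{run.name}' failed and needs attention")
            count += 1
    return count


# Collect alerts in a list instead of sending real email or SMS.
sent = []
runs = [
    NodeRun("sync_logs", "success", "alice"),
    NodeRun("daily_report", "failed", "bob"),
]
alert_count = alert_on_failures(runs, lambda owner, msg: sent.append((owner, msg)))
print(alert_count, sent)
```

Passing the sender in as a callback keeps the failure check separate from the notification channel, so the same check can feed several alert methods.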

DataWorks Scenarios

Log and big data analysis

  • Improved work efficiency: DataWorks allows you to synchronize log data to MaxCompute and use SQL statements to analyze and process the log data. This improves your work efficiency.
  • Enhanced storage efficiency: DataWorks reduces overall costs and improves the performance and stability of storage and computing services.
  • Simplified use of big data: DataWorks supports multiple open source plug-ins for MaxCompute so that you can easily migrate data to the cloud.

Related services:
DataWorks + Data Integration + AnalyticDB for MySQL + Quick BI + MaxCompute
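
To make the log analysis scenario concrete, here is a minimal sketch of analyzing synchronized log data with SQL. It uses an in-memory SQLite database from the Python standard library purely as a local stand-in. It does not connect to MaxCompute, and the `access_log` table, its columns, and the sample rows are assumptions made for illustration.

```python
import sqlite3

# Hypothetical log rows: (timestamp, HTTP status, request path).
rows = [
    ("2024-01-01 10:00:00", 200, "/home"),
    ("2024-01-01 10:00:05", 500, "/cart"),
    ("2024-01-01 10:01:00", 200, "/home"),
    ("2024-01-01 10:02:00", 404, "/missing"),
]

# Local stand-in for the target table; this is SQLite, not MaxCompute.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE access_log (ts TEXT, status INTEGER, path TEXT)")
conn.executemany("INSERT INTO access_log VALUES (?, ?, ?)", rows)

# Count requests per HTTP status, the kind of aggregate a SQL node might run.
status_counts = conn.execute(
    "SELECT status, COUNT(*) FROM access_log GROUP BY status ORDER BY status"
).fetchall()
print(status_counts)
conn.close()
```

The same `GROUP BY` pattern carries over to most SQL engines, although table creation and data loading differ from engine to engine.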


Refined business operations

  • Improved business insights: With the help of MaxCompute, DataWorks allows you to refine business operations for millions of users.
  • Data-based business: DataWorks helps you effectively analyze and monitor business data to improve your business efficiency.
  • Quick response to business demands: DataWorks supports business data analysis so that you can quickly process new business demands.

Related services:
DataWorks + Data Integration + Quick BI + MaxCompute


Data security management

  • Sensitive data identification: DataWorks can automatically identify sensitive data and use tags to classify the data based on custom rules.
  • Sensitive data de-identification and presentation: DataWorks allows you to set data de-identification rules to de-identify the sensitive information during data presentation.
  • Risk monitoring of sensitive data operations: DataWorks allows you to monitor data distribution, usage, and export in a visualized manner, and customize risk levels for auditing.

Related service: Data Security Guard of DataWorks
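
The identification and de-identification steps above can be sketched with simple pattern rules. This is only an illustration of rule-based tagging and masking, not the Data Security Guard implementation. The `RULES` table, the `classify` and `mask` functions, and the two patterns are hypothetical and far simpler than production rules would be.

```python
import re

# Hypothetical custom rules: tag name -> pattern that marks a value as sensitive.
RULES = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\b\d{11}\b"),
}


def classify(value):
    """Return the sorted tags whose pattern occurs anywhere in the value."""
    return sorted(tag for tag, pattern in RULES.items() if pattern.search(value))


def mask(value):
    """Replace every sensitive match with asterisks of the same length."""
    for pattern in RULES.values():
        value = pattern.sub(lambda m: "*" * len(m.group()), value)
    return value


print(classify("contact alice@example.com"))
print(mask("call 13800000000 now"))
```

Masking at presentation time, as in `mask`, leaves the stored value untouched, which matches the idea of de-identifying data only when it is displayed.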


Related Blog

Data Quality Monitoring with DataWorks

This section describes how to monitor data quality while working through the data workshop, set up quality monitoring rules, and monitor alerts and tables.

The Latest Features of DataWorks: How to Choose the Right Edition of DataWorks

This article describes the intermediate-to-advanced features of DataWorks Advanced Edition and introduces the features and applicable scenarios of DataWorks Basic Edition, Standard Edition, Professional Edition, and Enterprise Edition. It helps you select the most suitable DataWorks edition to solve your problems.

How Does DataWorks Support More Than 99% of Alibaba's Data Development

Independently developed by Alibaba, DataWorks is used every day by tens of thousands of data and algorithm development engineers to build and administer 99% of the data-driven and data-focused business operations of Alibaba Group.

Initially released in 2010, DataWorks has undergone many technological changes and architecture upgrades on its way to the current version, which has unfortunately left it with a great deal of historical baggage. Technological innovation and business development often complement each other, but they can also restrict each other and cause various problems. The latter is the case with DataWorks. The product has some long-standing problems, including slow access, extensive code changes required to fix a single bug, and environmental complexity. Previous iterations did not fundamentally upgrade DataWorks or resolve these problems. Rather, they only improved performance, optimized the underlying engineering structures, and reduced repeated code.

This article will take a look at how we can resolve some of the problems that have plagued DataWorks by adopting the wildly popular microservice architecture and explore how we can transform the technical architecture of DataWorks in a practical manner while avoiding jumping through several complicated engineering hoops.

Related Product

Alibaba Cloud DataWorks

DataWorks is a Big Data platform product launched by Alibaba Cloud. It provides one-stop Big Data development, data permission management, offline job scheduling, and other features.

DataWorks works out of the box, so you do not need to worry about setting up complex underlying clusters or about operations and management.


Alibaba Clouder
