A Deep Understanding of DataWorks

DataWorks is a big data research and development platform that uses MaxCompute as its main compute engine. Together, these two Alibaba Cloud products cover data integration, modeling, development, operations tracking, data processing, and security, among other features. Combined with the algorithm platform PAI, DataWorks provides an end-to-end big data analytics solution that covers everything from data creation to data mining and machine learning.

How Does DataWorks Support More Than 99% of Alibaba's Data Development?

Independently developed by Alibaba, DataWorks is used every day by tens of thousands of data and algorithm development engineers to build and administer 99% of Alibaba Group's data-driven business operations.

Initially released in 2010, DataWorks has gone through many technological changes and architecture upgrades on the way to its current version, and it has accumulated a great deal of historical baggage as a result. Technological innovation and business development often complement each other, but they can also restrict each other and cause problems. The latter is the case with DataWorks. The product has some long-standing problems, including slow access, extensive code changes required to fix a single bug, and environmental complexity. Previous iterations did not fundamentally upgrade DataWorks or resolve these problems. They only improved performance, optimized the underlying engineering structures, and reduced duplicated code.

This article looks at how adopting the widely used microservice architecture can resolve some of the problems that have plagued DataWorks, and explores how to transform the technical architecture of DataWorks in a practical manner without unnecessary engineering complexity.

Cooperation and Competition

The DataWorks R&D platform provides a range of functions to assist with daily development work. Users experience how each function is designed as they use the platform, and this first-hand experience is what platform R&D in general still lacks. The PD and the user experience designer (UED) collect requirements and try out the functions themselves. However, without a background in data development, the PD and UED cannot feel the subtle disappointment that data developers experience after long-term use. Usage of the DataWorks R&D platform also varies greatly across sectors such as finance, banking, government, large state-owned enterprises, Internet companies, traditional enterprises, private enterprises, and education. Some customers may not know how to use DataWorks at all, and users differ in their needs, knowledge, and skills.

After frontline delivery teams or companies apply DataWorks in fields we have not considered, requirements are collected from these industries and sent to the PD for analysis. Frontline teams can package some DataWorks APIs and provide them as products to customers in specific industries to help solve their problems.
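The idea of packaging platform APIs into an industry-specific product can be sketched roughly as follows. This is only an illustration under stated assumptions: `PlatformClient`, `submit_job`, `InMemoryClient`, `BankingReports`, and the table and column names are all hypothetical and are not part of the real DataWorks API. The fake client exists so the sketch runs without any cloud access.

```python
from dataclasses import dataclass, field
from typing import Protocol


class PlatformClient(Protocol):
    """Hypothetical stand-in for a platform API client; not the real DataWorks SDK."""

    def submit_job(self, name: str, sql: str) -> str: ...


@dataclass
class InMemoryClient:
    """Fake client that records submissions instead of calling a remote service."""

    submitted: list[tuple[str, str]] = field(default_factory=list)

    def submit_job(self, name: str, sql: str) -> str:
        self.submitted.append((name, sql))
        return f"job-{len(self.submitted)}"


class BankingReports:
    """Industry-specific facade: exposes one domain task and hides the generic API."""

    def __init__(self, client: PlatformClient) -> None:
        self._client = client

    def daily_balance_summary(self, table: str, day: str) -> str:
        # The table layout (account_id, amount, ds) is an illustrative assumption.
        sql = (
            f"SELECT account_id, SUM(amount) AS balance "
            f"FROM {table} WHERE ds = '{day}' GROUP BY account_id"
        )
        return self._client.submit_job(f"balance_summary_{day}", sql)


client = InMemoryClient()
reports = BankingReports(client)
job_id = reports.daily_balance_summary("transactions", "20240101")
```

A customer in this hypothetical banking scenario would call only `daily_balance_summary` and would not need to know how jobs are defined or submitted on the underlying platform.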

New products are being planned, and the engine team uses DataWorks to improve the user-friendliness of the products it designs. It is difficult to scale DataWorks to meet the requirements of product planning and improvement if only its own developers are working according to the schedule. Considering the frontend and backend architectures and countless instances of cooperation and competition, we need a technical revolution that breaks away from the service-oriented architecture (SOA) and introduces more user-side R&D capabilities. We hope this will make DataWorks more robust.


Related Products

MaxCompute

MaxCompute (previously known as ODPS) is a general-purpose, fully managed, multi-tenant data processing platform for large-scale data warehousing. MaxCompute supports various data import solutions and distributed computing models, enabling users to query massive datasets effectively, reduce production costs, and ensure data security.

DataWorks

DataWorks is a Big Data platform product launched by Alibaba Cloud. It provides one-stop Big Data development, data permission management, offline job scheduling, and other features.

DataWorks works out of the box, so users do not need to worry about setting up complex underlying clusters or about operations and management.
