By DataWorks Team
Enterprise big data technology has gone through two transformations. The first was the shift from small workshops that solved individual big data problems to large platforms built from a variety of big data technologies, where platform capabilities raised data productivity.
The second is the paradigm shift from big platforms to agile manufacturing. At the 2021 Alibaba Cloud Summit, Jia Yangqing, Senior Fellow of Computing Platform at Alibaba Cloud Intelligence, released an all-in-one big data development and governance platform based on DataWorks. This release is clear evidence of the second shift.
Born in 2009, Alibaba Cloud DataWorks has gone through several transformations along with Alibaba's big data industry over the past decade. It is also one of the best practices for constructing Alibaba Cloud's data middle platform. Today, DataWorks supports data business construction for hundreds of business teams at Alibaba and stably schedules tens of millions of data processing tasks every day. Each day, more than 50,000 Alibaba employees perform data analysis, data development, and data governance tasks on DataWorks.
How does DataWorks enable the evolution of big data toward agile manufacturing? What are its core capabilities? At the Comprehensive-Procedure Data Services: Big Data and AI Sub-Forum, Huang Boyuan, Product Owner of DataWorks, presented the three ONEs, the three core capabilities of DataWorks.
DataWorks embodies an all-in-one concept: one data development procedure, one standard data architecture, and one data governance system. Combined with big data computing engines, it gives enterprises all-in-one data development and governance capabilities.
Enterprises that reach a certain stage of development typically encounter the following situations:
1) Data comes from data centers in different regions. For example, data may be scattered across Alibaba Cloud's public cloud and Alibaba Cloud Apsara Stack, and enterprises also maintain private domains for external customers and partners.
2) Big data engines are diverse, and it is difficult to choose the best one. Alibaba Cloud provides a wide range of products, such as the SaaS-based cloud data warehouse MaxCompute, Hologres, and AnalyticDB. Products from the open-source ecosystem, such as EMR, CDH, Flink, and Elasticsearch, are also available. Enterprises need to choose among them according to their needs.
3) Data needs to be integrated better with AI and applications. AI algorithms are required so that the data processed by big data products can be delivered as services. Integrating BI and AI in this way unlocks the value of the data.
DataWorks can help enterprises implement data integration, data development, data governance, and data services to solve the preceding problems. It also integrates the full lifecycle management of big data into one complete procedure.
First, DataWorks supports real-time and offline synchronization for nearly 50 types of data sources under complex network conditions. This is the first step in big data development. Second, it integrates with big data engines at the bottom layer, such as MaxCompute, EMR, CDH, Hologres, AnalyticDB, and Realtime Computing for Apache Flink, so that data development and governance for multiple computing engines can be performed on the same platform. Last, datasets processed by the big data platform can be passed seamlessly to the machine learning platform for AI training and online prediction. With the help of AI, the data can also be used in various data services, such as BI and big-screen dashboards.
For enterprises, data is never simply piled up. In building its data middle platform, Alibaba Cloud standardized and unified the group's data architecture. This architecture gives the data at each layer a clear scope and boundary. At the Operational Data Store (ODS) layer, enterprises gather data from all domains and retain all the raw data. At the integration layer, enterprises establish data standardization systems through data standards and data modeling. At the Data Warehouse Summary (DWS) layer, enterprises summarize data based on business requirements to extract common data metrics. At the Application Data Store (ADS) layer, a data mart is built for frontend business applications to provide them with continuous, high-quality data services. The architecture itself cannot be packaged as a product, but enterprises can quickly replicate this standard data architecture on DataWorks.
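The layering described above can be sketched in a few lines of plain Python. This is only an illustration of the ODS, integration, DWS, and ADS responsibilities, not DataWorks code: the sample records, field names, and function names are all hypothetical, and a real deployment would express each layer as tables and scheduled tasks on a computing engine.

```python
from collections import defaultdict

# Hypothetical raw order events from two source systems (illustrative data only).
RAW_EVENTS = [
    {"src": "app", "user": " U1 ", "amount": "10.50", "region": "cn-hangzhou"},
    {"src": "web", "user": "u2", "amount": "4.00", "region": "CN-HANGZHOU"},
    {"src": "app", "user": "u1", "amount": "6.25", "region": "cn-shanghai"},
]


def ods_layer(events):
    """ODS: gather records from all sources and keep them unchanged."""
    return [dict(e) for e in events]


def integration_layer(ods_rows):
    """Integration: apply data standards (normalized ids, numeric amounts)."""
    return [
        {
            "user_id": r["user"].strip().lower(),
            "amount": float(r["amount"]),
            "region": r["region"].lower(),
        }
        for r in ods_rows
    ]


def dws_layer(std_rows):
    """DWS: summarize into common metrics, here totals and counts per region."""
    summary = defaultdict(lambda: {"total_amount": 0.0, "order_count": 0})
    for r in std_rows:
        s = summary[r["region"]]
        s["total_amount"] += r["amount"]
        s["order_count"] += 1
    return dict(summary)


def ads_layer(dws_summary):
    """ADS: shape the metrics for one application, ranked by total amount."""
    return sorted(
        ({"region": k, **v} for k, v in dws_summary.items()),
        key=lambda row: row["total_amount"],
        reverse=True,
    )


def build_mart(events):
    """Run the four layers in order, each consuming only the layer below it."""
    return ads_layer(dws_layer(integration_layer(ods_layer(events))))
```

Calling `build_mart(RAW_EVENTS)` returns one row per region. The point of the sketch is the boundary between layers: raw data is never modified in place, standards are applied once, and applications read only from the top layer.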
How do enterprises manage data assets? How can enterprises guarantee data quality and data security? How can enterprises control costs and reduce unnecessary waste effectively? These questions make data governance more challenging. At a small scale, all kinds of data governance work can be done manually. However, Alibaba Cloud processes more than 1.7 EB of data and tens of millions of scheduling tasks every day, a scale at which manual governance is not feasible. DataWorks does this job based on more than a decade of data governance experience from Alibaba Group. It provides the comprehensive-procedure governance functions needed in data processing and utilization, such as model design, data quality management, metadata management, and security management. A complete and systematic set of capabilities is implemented on one platform.
At the sub-forum, DataWorks released a new data modeling product that enables enterprises to conduct data warehouse planning, data standard definition, dimension modeling, and data metric design from a business perspective. It also allows enterprises to use standardized blueprints to guide big data construction, which improves normalization and standardization and lowers the threshold and cost of building an enterprise data middle platform. DataWorks will expand its cooperation with ecosystem partners to launch data modeling products with different industry attributes and modeling methods, so that it can support the design of data warehouse models in different industries and scenarios. The DataWorks data modeling product will launch its public beta in July 2021. You are welcome to learn more about DataWorks at https://www.alibabacloud.com/product/ide
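To make the idea of automated data quality management concrete, the following is a minimal sketch of rule-based checks run over a batch of rows. It is an assumption-laden illustration rather than the DataWorks data quality API: the rule functions, the result format, and the sample columns are all hypothetical.

```python
def check_not_null(rows, column):
    """Flag the indexes of rows whose value in `column` is missing or empty."""
    bad = [i for i, r in enumerate(rows) if r.get(column) in (None, "")]
    return {"rule": f"not_null({column})", "passed": not bad, "failed_rows": bad}


def check_unique(rows, column):
    """Flag the indexes of rows that repeat an earlier value in `column`."""
    seen, dupes = set(), []
    for i, r in enumerate(rows):
        value = r.get(column)
        if value in seen:
            dupes.append(i)
        seen.add(value)
    return {"rule": f"unique({column})", "passed": not dupes, "failed_rows": dupes}


def run_checks(rows, checks):
    """Run every (rule_function, column) pair and report overall pass/fail."""
    results = [rule(rows, column) for rule, column in checks]
    return results, all(r["passed"] for r in results)
```

A scheduler could call `run_checks(rows, [(check_not_null, "email"), (check_unique, "id")])` after each task and block downstream tasks when the overall result is `False`. Expressing rules as data is what lets checks like these run across millions of tasks without manual review.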
In addition to data modeling, DataWorks covers governance capabilities throughout the data lifecycle, such as data synchronization, metadata, data assets, data quality, data map, task O&M, data security, data analysis, and data service.
DataWorks has been applied to the digital transformation of various industries. In the industrial sector, DataWorks helps Sany implement 86 core business systems, process 50 PB of image, video, and IoT data each month, and build the most comprehensive data middle platform in its industry. In the energy industry, DataWorks helped enterprises establish more than ten data middle platform operations standards, realize more than 50 metrics for four scenarios, standardize data governance processes, and improve data availability. In the iron and steel industry, DataWorks allows data to flow freely on the data middle platform while ensuring data accuracy, punctuality, and consistency, reducing costs for enterprises by RMB 100 million. In the Internet industry, Poizon built comprehensive-procedure data lineage through the DataWorks OpenAPI. It also independently developed a comprehensive-procedure parsing capability covering 20,000 offline tables and nearly 1,000 computing tasks, reducing costs by 20%.
Future digital transformations of enterprises will place higher requirements on data governance and analysis. DataWorks will help enterprises build a data middle platform quickly and provide high-quality databases through comprehensive-procedure data governance. In this way, it supports the move from agile manufacturing of data to agile digital transformation of the enterprise.
Disclaimer: The views expressed herein are for reference only and don't necessarily represent the official views of Alibaba Cloud.