DataWorks Releases New Full-Link Data Governance Products at Apsara Conference 2022

Date: Oct 1, 2022

At the Apsara Conference 2022, Alibaba Cloud released the DataWorks full-link data governance product system. DataWorks supports a range of big data architectures, including data warehouses, data lakes, and integrated lake-warehouse (lakehouse) architectures. It helps enterprises continuously govern the ever-rising "data hanging river" inside their organizations and release their data productivity.

"As data volumes grow larger and larger, the value of each unit of data becomes smaller and smaller. Full-link data governance allows data to move from low quality and low efficiency to high quality and high efficiency,"

said Jia Yangqing, vice president of Alibaba Group and senior researcher at the Alibaba Cloud Intelligence Computing Platform Division, at the event. Sediment carried by the Yellow River keeps raising its riverbed, creating a "hanging river" whose bed sits above the surrounding land, so the embankments on both banks must keep being raised as well. Enterprise digital transformation follows a similar pattern: the amount of data keeps growing, the number of machines keeps increasing, and teams keep expanding. But is digital transformation actually getting better? For a company, apparent prosperity does not rule out a future "flood". At Alibaba, Double 11-level load has become a daily routine: in 2021, the daily data processing volume of the big data computing service MaxCompute exceeded the Double 11 peak of 2020. This ever-increasing data volume puts great pressure on both cost and efficiency.

Machine Efficiency + Human Efficiency = Data Efficiency

Faced with data that expands this quickly every year, Alibaba's solution is to make data efficiency a core enterprise metric, using the capabilities of its integrated big data + AI platform. On the machine-efficiency side, MaxCompute, serving as the offline data warehouse, processes 1.7 EB of data per day. Beyond sheer volume, it is worth noting that MaxCompute absorbed a 75% increase in data volume with only a 10% increase in machines. To achieve this, MaxCompute continuously optimizes its underlying storage and performance, and it has broken the TPCx-BigBench 100TB performance world record for five consecutive years. Meanwhile, Hologres, serving as the real-time data warehouse, writes up to 596 million records per second at peak and stores up to 2.5 PB in a single table. It provides multi-dimensional analysis and serving over trillions of records, with 99.99% of queries returning results within 80 ms. Together, Hologres and MaxCompute form a data warehouse that integrates offline processing, real-time processing, analysis, and serving, greatly simplifying big data architecture from the bottom up.

Machine efficiency is usually easy to measure; human efficiency is harder to quantify. Since 2009, DataWorks has been Alibaba Group's unified big data development and governance platform and the foundation on which Alibaba's data middle platform was built. When it comes to how complete and easy to use a platform is, users vote with their feet. Today, the large-scale collaborative data middle platform built on DataWorks has more than 50,000 daily active users. On average, one in every three Alibaba employees uses DataWorks, which serves almost every department in the group and has accumulated hundreds of core full-link data governance capabilities. In FY2020, Alibaba's overall gains from data governance exceeded 1 billion yuan.
It can be said that the big data development and governance platform DataWorks, together with the computing engines MaxCompute and Hologres, forms a "Wintel alliance" of big data architecture that jointly improves enterprise data efficiency.
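The MaxCompute figure above (75% more data with 10% more machines) implies a gain in data handled per machine. Assuming both percentages are simple increases measured over the same period (the article does not state the baseline), the per-machine ratio works out as:

```latex
\frac{\text{data per machine (after)}}{\text{data per machine (before)}}
  = \frac{1 + 0.75}{1 + 0.10} = \frac{1.75}{1.10} \approx 1.59
```

In other words, under that assumption each machine handles roughly 59% more data than before.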

Construction experience: from small workshop to large platform to agile manufacturing

Neither data governance nor the data middle platform was conceived in an ivory tower; both were refined through years of practice. Alibaba's digital transformation also went through a "slash-and-burn" era. Each business team maintained several Hadoop clusters of its own, like small workshops: teams added whatever they needed, and technical components piled up like building blocks. The process was often painful. The platform would release a new feature, and for some reason another component would go down; engineers would spend a long time tracking down what had gone wrong in that component, fix it, release again, and then watch yet another component fail. Problems kept popping up like gourds bobbing back to the surface as soon as one was pushed under water, with no end in sight. Alibaba therefore launched a vigorous platform unification program: it built a large platform, replaced the open-source architecture with a self-developed one, and gradually migrated its data to MaxCompute. Around this time the data middle platform concept began to be promoted within the group, and the "Three ONEs" data middle platform methodology was gradually implemented in DataWorks, completing the construction of Alibaba's entire data middle platform. Since then, business teams ranging from the core e-commerce businesses Tmall and Taobao to Youku, Hema, and others have all carried out one-stop, collaborative data development on the same large platform. However, as the large platform became more popular and more people used it, data governance grew increasingly complicated.
With thousands of tables constantly being created, companies had no way of knowing how many non-standard statements were quietly consuming large amounts of computing resources like termites; how many tables were being copied over and over, creating an illusory "data boom"; how much dirty data was continuously being produced and polluting data quality; or how many tables were continually having access permissions requested, exposing the company to data security risks. All of these issues pose serious challenges for a large platform. As a result, large platforms are gradually evolving toward agile manufacturing: full-link data governance capabilities let them manage and control data from a global perspective while decentralizing data decision-making.

DataWorks full-link data governance new product release

At the Full-Link Data Governance Summit of Apsara Conference 2022, DataWorks released a series of new full-link data governance products built on the hundreds of data development and governance capabilities it has accumulated over the past 12 years.

Data Governance Center

Data governance is not only a technical issue for an enterprise's big data team, but also an organizational and management issue. For the organization as a whole, how should the ultimate effect of data governance be measured? How can the data governance organization be motivated to take the initiative? Some enterprises set up a dedicated data committee to define data governance specifications, only to find that their platform cannot support those specifications well; others purchase a data platform but do not know how to use it to carry out data governance. At Alibaba, the concept of a health score is used frequently. In terms of organizational design, the data committee oversees platform teams, business teams, and collaborating teams such as risk control and finance. A business team might set a goal for the year, such as raising its health score from 80 to 90. It then works on computing, storage, and other areas, carrying out governance optimization on both the business side and the production side; where needed, it also raises requirements with the data platform team, which optimizes and evolves the engines and data platform products. Everyone works toward the same goal. The organization gains a measurable yardstick, and each department can build these numbers into its own objectives. At the same time, health scores keep long-term operational activities going, such as data governance campaigns and competitions between teams, so that data collaboration stays organized and the data governance organization's initiative is fully brought into play.

The newly released Data Governance Center in DataWorks builds an enterprise data governance health score from five dimensions: computing, storage, R&D, quality, and security. Following a problem-driven approach, it covers the full chain of proactive governance before, during, and after an issue occurs, together with health assessment of data governance. Enterprise data governance is no longer a "one-off project" but a "sustainable operational effort".
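The article does not describe how the Data Governance Center actually computes its health score. As a purely hypothetical illustration of the idea, the Python sketch below combines the five named dimensions into one 0-100 score using an equal-weight average, identifies the weakest dimensions as places to start optimizing, and estimates how many points a team at 80 would need to reach a goal of 90. The class, method, and dimension key names, and the equal weighting, are assumptions, not DataWorks APIs.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# The five governance dimensions named in the article. The keys are
# hypothetical identifiers, not DataWorks API names.
DIMENSIONS = ("computing", "storage", "rd", "quality", "security")


@dataclass
class HealthReport:
    """Per-dimension health scores on a 0-100 scale (hypothetical model)."""

    scores: Dict[str, float]

    def __post_init__(self) -> None:
        # Reject reports that do not cover all five dimensions.
        missing = set(DIMENSIONS) - set(self.scores)
        if missing:
            raise ValueError(f"missing dimensions: {sorted(missing)}")

    def overall(self, weights: Optional[Dict[str, float]] = None) -> float:
        """Weighted average of the dimension scores; equal weights by default."""
        w = weights or {d: 1.0 for d in DIMENSIONS}
        total = sum(w[d] for d in DIMENSIONS)
        return sum(self.scores[d] * w[d] for d in DIMENSIONS) / total

    def weakest(self, n: int = 2) -> List[str]:
        """The n lowest-scoring dimensions, i.e. where optimization could start."""
        return sorted(DIMENSIONS, key=lambda d: self.scores[d])[:n]

    def points_needed(self, target: float) -> float:
        """Total dimension points to gain (equal weights) to reach the target."""
        return max(0.0, (target - self.overall()) * len(DIMENSIONS))


if __name__ == "__main__":
    team = HealthReport(
        {"computing": 70, "storage": 75, "rd": 85, "quality": 80, "security": 90}
    )
    print(team.overall())          # 80.0
    print(team.weakest())          # ['computing', 'storage']
    print(team.points_needed(90))  # 50.0
```

Equal weights keep the sketch simple; an organization could pass its own `weights` to `overall` if, say, security mattered more than storage.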

Smart Data Modeling

Enterprises have built platforms and carried out a great deal of standardization and governance, but what value does this bring to business personnel? How much cost has been saved and how many problems have been fixed matter relatively little to them. The business side simply wants the data it needs, faster, so data warehouses were traditionally built bottom-up in small, fast iterations that met immediate needs first. Full-link data governance now lets data warehouse construction evolve toward standardization and sustainability, combining two approaches: top-down, specification-driven modeling from the business perspective, and bottom-up data warehouse construction from the development perspective.

The newly released intelligent data modeling feature in DataWorks distills Alibaba's methodology for building its data middle platform and interprets an enterprise's data from a business perspective across four aspects: data warehouse planning, data standards, dimensional modeling, and data metrics. Intelligent data modeling supports rapid modeling, including both forward and reverse modeling, and can create models in minutes. Through integration with data development, models can be published directly to multiple engines, quality rules can be generated with one click, tables can be published directly, and simple ETL code can be generated automatically. Business personnel can easily understand the full picture of the enterprise's data, quickly obtain the data metrics they need, and carry out data analysis and exploration based on the data model, achieving truly effective decentralization.
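The article does not say how DataWorks turns a model into tables and quality rules. As a rough, hypothetical illustration of the idea behind publishing tables directly and generating quality rules with one click, the Python sketch below derives a `CREATE TABLE` statement and a few basic quality checks from a logical dimension-table definition. The classes, field names, and rule set are assumptions for illustration, not DataWorks APIs, and the SQL is generic rather than any specific engine's dialect.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Column:
    """One field of a logical model (hypothetical structure)."""

    name: str
    sql_type: str
    nullable: bool = True


@dataclass
class DimensionTable:
    """Hypothetical logical definition of a dimension table."""

    name: str
    primary_key: str
    columns: List[Column] = field(default_factory=list)

    def to_ddl(self) -> str:
        """Render the model as a generic SQL CREATE TABLE statement."""
        cols = ",\n".join(
            f"  {c.name} {c.sql_type}" + ("" if c.nullable else " NOT NULL")
            for c in self.columns
        )
        return f"CREATE TABLE IF NOT EXISTS {self.name} (\n{cols}\n);"

    def quality_rules(self) -> List[str]:
        """Derive checks that each count violating rows; 0 means the rule passes."""
        pk, t = self.primary_key, self.name
        rules = [
            # The primary key must never be NULL.
            f"SELECT COUNT(*) FROM {t} WHERE {pk} IS NULL",
            # The primary key must be unique.
            f"SELECT COUNT(*) FROM (SELECT {pk} FROM {t} "
            f"GROUP BY {pk} HAVING COUNT(*) > 1) dup",
        ]
        # Every other non-nullable column gets its own NULL check.
        for c in self.columns:
            if not c.nullable and c.name != pk:
                rules.append(f"SELECT COUNT(*) FROM {t} WHERE {c.name} IS NULL")
        return rules


if __name__ == "__main__":
    dim_user = DimensionTable(
        "dim_user",
        "user_id",
        [
            Column("user_id", "BIGINT", nullable=False),
            Column("gender", "STRING"),
            Column("city", "STRING", nullable=False),
        ],
    )
    print(dim_user.to_ddl())
    for rule in dim_user.quality_rules():
        print(rule)
```

Expressing every rule as a "count the violations" query keeps the pass condition uniform (a result of zero), which makes the generated checks easy to schedule alongside the table.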
