Developer Content

From November 26-27, 2022, Flink Forward Asia (FFA) Summit was successfully held. Flink Forward Asia is a technology summit officially authorized by the Apache Software Foundation and hosted by Alibaba Cloud. It is one of the largest Apache top-level project conferences in China, and also an annual event for Flink developers and users. Due to the epidemic situation, the summit is still online. In addition, the summit also held the award ceremony of the fourth Tianchi Real Time Computing Flink Challenge, in which a total of 11 teams from 4346 participating teams came to the fore through various competitions and finally won the awards.

The FFA Conference summarized the development of Apache Flink in the past year as usual. In 2022, the Apache Flink community will continue to maintain rapid development: the number of Github Stars will exceed 20000; The total number of code contributors exceeds 1600; The number of downloads in a single month exceeded 14 million. Among them, the Apache Flink Chinese community is particularly flourishing: according to ossinsight.io, 45% of all the PRs of the Apache Flink project are from Chinese developers up to now; The official WeChat official account authorized by the Apache Software Foundation and managed by Apache Flink PMC has published more than 130 technology sharing articles in 2022, with a cumulative number of subscribers exceeding 60000; The newly opened WeChat video account has released 36 videos, and now there are nearly 4000 subscribers.

We are pleased to see that Apache Flink has become the global de facto standard for real-time stream computing. With its powerful real-time big data computing capability, Flink has formed a series of solutions for real-time big data scenarios, such as real-time large screen display, real-time data integration, real-time lake warehouse analysis, real-time personalized recommendation, real-time risk control monitoring, and has become the core driving force to promote the real-time upgrading of data analysis in all walks of life.

Next, this article will briefly summarize several Keynotes topics in the main forum of the FFA Summit. Interested partners can watch the video replay of all topics on the official website.

Cloud and open source are rooted in the digital world

Before the Keynotes topic, Mr. Jia Yangqing, Vice President of Alibaba Group, Head of Alibaba Open Source Technology Committee and Head of Alibaba Cloud Intelligent Computing Platform, as the opening guest, shared his understanding of the relationship between cloud and open source.

In the era of industrial digitalization and digital industrialization, cloud and open source have been symbiotic, growing together and building the root of a digital world. How to better combine open source and commercialization? We think cloud is the most important link. The cloud provides a better environment for the deployment and acquisition of open source software. In the elastic environment provided by the cloud, users can obtain the ability of open source software and platforms with one click. The symbiosis of cloud and open source software also enables users to have wider and more flexible choices. Everyone can find the most suitable open source software portfolio to solve their own business problems. In this process of development, the concept of cloud native has gradually formed.

In the past decade or more, Alibaba has been a firm supporter and practitioner of open source software and communities, forming a "trinity" strategy: the technologies of open source communities, Alibaba's internal applications, and the technologies provided to customers through commercial forms on Alibaba Cloud are unified. Open source provides a very good user experience. In a large-scale scenario like Alibaba, many personalized or systematic requirements can be generated. Their concerns complement each other. Alibaba has contributed its best practices back to the open source community, so that the ease of use of the community can be well combined with the stability and flexibility used by large-scale enterprises.

Taking Apache Flink as an example, Alibaba began to use Flink as a technical route for internal real-time computing in 2016, and built an internal system based on Flink. Since 2016, we have gradually contributed Blink back to the community, and by 2018, we have become the largest contributor to the Flink community. Today, we are pleased to see that 1/4 of the members of the Apache Flink project management committee are from Alibaba. With the promotion of Alibaba and the cooperation of the whole community, Flink has been adopted by most Internet enterprises in China as the de facto standard of streaming computing. Flink has also been the most active project in the Apache community for two consecutive years.

Today's iteration of cloud and open source also makes people have new exploration in the direction of open source software. Take Flink for example. At first, it was a platform that used Java API to implement stream computing. It has gradually developed capabilities like SQL in Alibaba and Alibaba Cloud applications. In recent years, Alibaba has also been exploring new directions according to its own needs for using Flink, such as the Flink CDC project, which is developing rapidly in the direction of data integration, the Flink ML project combined with machine learning, the streaming digital warehouse concept combined with traditional digital warehouse, and the Flink Table Store project launched under this concept. In addition, there are also many common technologies in the whole big data field. For example, the remote shuffle service of large-scale distributed computing in the storage computing separation environment has similar requirements in Flink, Spark, Hive and other engines. We are also pleased to announce that Alibaba Cloud has donated the Remote Shuffle Service project bred in its own cloud scenario to the Apache Software Foundation, named Apache Celeborn.

Alibaba is not only a beneficiary of open-source software, but also a contributor. Open source has become an integral part of Alibaba's engineer culture. More and more engineers are learning from the open source community, actively participating in open source software and community building, and contributing our own projects to the open source community at an appropriate time. I believe that in the future, we will continue to work with the open source community to provide users with a platform and a way to access and use software more easily based on the cloud. At the same time, we will build a more prosperous open source community with our technical strength and that of the community.

Flink Towards Streaming Data Warehouse

The keynote topic of the main forum was opened by Mr. Wang Feng, the initiator of Apache Flink Chinese Community and the head of Alibaba Cloud's open source big data platform, as usual, to introduce the major technological innovations and achievements of Apache Flink community in 2022, as well as the future development direction.

Apache Flink 2022 - Data Real Time Technology Innovation

In 2022, Apache Flink released two major versions. In Flink 1.15, the community focused on solving many long-standing historical problems, including cross version upgrade of SQL jobs, ownership semantics and lifecycle management of state snapshots, synchronization of Watermark progress across data sources, concurrency settings of batch job adaptive operators, etc. In the 1.16 version of Flink, the community made more new innovations and attempts, including upgrading the distributed consistent snapshot architecture, innovative streaming batch adaptive integration Shuffle, improvement of streaming SQL dimension table Join based on asynchronous and cache technology, full compatibility with Hive ecology, and full production availability of PyFlink functions and performance.

New upgrade of distributed consistent snapshot architecture

Apache Flink is a stateful streaming computing engine, and distributed consistent snapshot is a very core technology. Flink regularly takes snapshots and persists the state during the flow calculation process. When the job is abnormal, it can recover from the latest snapshot to ensure business continuity. Therefore, it is the common appeal of Flink users to take snapshots at a higher frequency and at a lower cost to make business more smooth. However, in a real production environment, especially in a large-scale and complex production environment, distributed consistent snapshots face many challenges: on the one hand, in the back pressure state, the network buffer is congested, and the Barrier used for distributed snapshots cannot be transmitted down the data flow, and the snapshots cannot be triggered in time; On the other hand, even if the snapshot can be triggered, the amount of local state data that needs to be persisted to the remote storage system and the upload time are uncontrollable. As a result of the above reasons, users often fail to generate distributed snapshots within the specified time, which seriously affects the stability of the business.

To solve the above problems, Flink has upgraded the entire distributed consistent snapshot architecture in all aspects in recent versions, mainly including:

• Unaligned Checkpoint: When the Barrier alignment time reaches a certain threshold, it is automatically converted to an Unaligned Checkpoint, taking into account the amount of checkpoint data and the Barrier alignment time.

• Buffer Deblocking: Only the amount of data that can be processed by downstream operators within 1s is cached. On the premise of avoiding the impact of network transmission on operator performance, the amount of data cached between operators is minimized.

• Log based Checkpoint: the state table is decoupled from the incremental log and uploaded asynchronously, greatly reducing the cost of generating snapshots.

With the implementation of the above technologies in Flink 1.16, Flink has formed a new generation of distributed consistent snapshot architecture.

A new generation of state storage management architecture for cloud native

The cloud native era has arrived. Every basic software project needs to consider how to adapt to this era, and Apache Flink is no exception. For Flink, the most significant change brought about by the cloud native era is the demand for resource elastic expansion, which requires that the concurrency of Flink jobs can change with the business volume and resources. When the concurrency changes, Flink's state storage also needs to be reallocated quickly, that is, split and merged state storage. Therefore, the splitting and merging performance of Flink state storage is directly related to the experience of Flink's elastic expansion and contraction.

In version 1.16 of Flink, the community has greatly optimized the state reconstruction algorithm of RocksDB State Backend, which has achieved a performance improvement of 2-10 times, making Flink's elastic expansion more smooth and more adaptable to the cloud native era.

In addition, the community also plans to further upgrade the Flink state storage management system to a complete separation of storage and computing architecture to adapt to the cloud native environment. At present, Flink's state storage management system is not a real storage computing separation architecture. All state data is still stored in the local RocksDB instance. Only when distributed snapshots are taken, incremental data is copied to remote storage to ensure that a full amount of state data is stored in remote storage. In the future, Flink's state data will all originate from remote storage, and local disks and memory will only be used for cache acceleration, forming a tiered storage system, which we call Tiered State Backend architecture.

Hybrid Shuffle innovative technology of stream batch integration

Stream batch integration and stream batch integration is a very distinctive technical concept of Apache Flink, while Shuffle is a very core technology in distributed computing systems that is highly performance related. Flink 1.16 innovatively introduced the Hybrid Shuffle technology of streaming batch integration.

Before that, Apache Flink adopted two different shuffle technologies in streaming mode and batch mode:

• Streaming Pipelined Shuffle: Upstream and downstream tasks are directly connected through the network, data is transmitted through memory and network, and disk IO is not required, so the performance is better.

• Batch Blocking Shuffle: Upstream tasks first write intermediate data to disk or other storage services, and then downstream tasks read intermediate data from disk or storage services. Disk IO is required, and the performance is slightly poor.

Can streaming shuffles also be used in batch execution mode to speed up batch shuffles? It is OK from the perspective of technology itself, but it will face greater constraints in the production environment. Streaming Shuffle requires all connected tasks to be pulled up at the same time, which requires more resources. Whether there are so many resources in the production environment cannot be guaranteed, and even deadlock may occur. If we can only pull up the connected tasks at the same time for streaming shuffle acceleration when resources are sufficient, and at the same time degenerate to batch shuffle when resources are insufficient, we can make more reasonable use of resources to accelerate shuffle. This is the background and idea of Hybrid Shuffle.

The first version of Hybrid Shuffle has been completed in Flink 1.16, and the preliminary evaluation shows that it has achieved good performance improvement compared with the traditional Blocking Shuffle. In subsequent versions, the community will further improve and optimize this technology.

Flink CDC full incremental integrated data synchronization

Flink CDC, a full incremental integrated data synchronization technology based on Apache Flink, is a new concept proposed in recent two years.

Why build a full incremental integrated data synchronization engine based on Flink? Flink is essentially a streaming distributed computing engine. In fact, it has become a data pipeline connecting different storage. Flink has a rich connector ecosystem that can connect to various mainstream storage systems, and an excellent distributed architecture that supports distributed snapshot, stream batch integration and other mechanisms. These are the features required by a full incremental all-in-one data integration engine. Therefore, it is very suitable to build a full incremental integrated data synchronization engine based on Flink, which is the origin of the Flink CDC project.

Last year, we launched Flink CDC 1.0 and got very good feedback from the developer ecosystem. Therefore, this year we increased investment and launched a more mature and perfect Flink CDC 2.0. The main features of Flink CDC 2.0 include:

• The general incremental snapshot framework abstraction reduces the access cost of new data sources and enables Flink CDC to quickly access more data sources.

• Support high-performance parallel reads.

• Based on Flink's distributed snapshot mechanism, it can realize breakpoint continuous transmission of data synchronization and improve reliability.

• There is no lock on the data source in the whole process, and data synchronization has no impact on online business.

Flink CDC innovation project is growing rapidly and is becoming a new generation of data integration engine. At present, Flink CDC has supported mainstream databases including MySQL family, PolarDB, MongoDB, etc., and has access to the incremental snapshot framework. In addition, it also supports familiar databases such as DB2, SQLServer, PostgreSQL, TiDB, OceanBase, etc. It is believed that more data sources will access the Flink CDC framework in the future. The project has also won unanimous praise from developers in the open source ecosystem, and the number of Github Stars has exceeded 3000.

A new generation of iterative computing framework helps Flink ML-2.0

In the old version of Flink, there is a Flink ML module, which is a machine learning algorithm library based on DataSet API. As the basic API layer of Flink is unified to the streaming and batching DataStream API, the original Flink ML module and DataSet API are also abandoned. This year, Flink ML was rebuilt by the Flink community based on the DataStream API as a new sub project, and two versions have been released.

As we all know, the core of machine learning algorithm base operation is the iterative computing framework. Based on Flink DataStream API, Flink ML 2.0 has rebuilt a set of iterative computing framework integrating streaming and batching, which can support online training, training interruption recovery and high-performance asynchronous training on unlimited streams. Flink ML 2.0 is still in its infancy. The first batch of dozens of algorithms has been contributed by Alibaba Cloud's real-time computing and machine learning teams, covering common feature engineering scenarios, and supporting low latency near line reasoning. It is expected that more companies and developers will participate in the future, contribute more classic machine learning algorithms to Flink ML, and make Flink's excellent computing power play a greater role in machine learning scenarios.

Apache Flink Next - Streaming Data Warehouse

At the FFA Summit last year, we proposed the next technological evolution direction of the Apache Flink community - Streaming Data Warehouse.

First, let's review the evolution process of core technology concepts in Flink's history, which helps us understand why we think Streaming Data Warehouse is the next evolution direction of Flink.

• Stateful Streaming: Flink was favored by developers at the beginning of its birth, replacing the previous generation of stream computing engine Storm and becoming a new generation of stream computing engine. The key lies in its positioning of stateful stream computing. By organically integrating stream computing with state storage, Flink can support accurate data consistency of stateful stream computing on the framework layer while maintaining high throughput and low latency.

• Streaming SQL: Early Flink development must write Java programs, which makes the Flink project meet the bottleneck of high promotion threshold after several years of rapid development. In the world of data analysts, the standard language for facts is SQL. So in 2019, Alibaba Cloud contributed the Blink SQL accumulated internally to the Flink community, greatly reducing the development threshold of Flink, and making the application of Flink in all walks of life grow explosively.

• Streaming Data Warehouse: Flink's streaming and batching SQL can realize the experience of full incremental development integration of the computing layer, but it cannot solve the problem of storage layer fragmentation. It is difficult to query and analyze the data in streaming storage, and the timeliness of data in batch storage is poor. Therefore, we believe that the new opportunity of Flink community in the next stage is to continue to improve the integrated experience, and build a streaming database of integrated experience through streaming batch integrated SQL+streaming batch integrated storage.

Flink Table Store, a new sub project launched by Flink community, is positioned to achieve the storage capacity of streaming and batching, and to achieve high-performance streaming read, streaming write, batch read and batch write. The design of Flink Table Store follows the concept of separating storage from computation. Data is stored on the mainstream cloud storage, and its core storage format consists of LakeStore and LogStore. LakeStore applies classic LSM, ORC and other indexing technologies, and is suitable for large-scale, high-performance data update and reading. LogStore provides ChangeLog with complete CDC semantics, which can be incrementally subscribed to the Table Store with Flink Streaming SQL for streaming data analysis. In addition, Flink Table Store adopts an open data format system. In addition to the default connection to Flink, it can also connect to Spark, Hive, Trino and other mainstream open source computing engines.

Since the birth of Flink Table Store one year ago, two versions have been launched, and the incubation from 0 to 1 has been completed. At present, in addition to Alibaba Cloud, developers from companies such as ByteDance are also participating in co construction and trial. We made a performance comparison between Flink Table Store and Hudi, the current mainstream data lake storage. The results show that Flink Table Store's update performance is significantly ahead of Hudi's, and its query performance is significantly ahead of Hudi MOR mode and close to Hudi COW mode, with better overall performance.

Application and Practice of Apache Flink Real time Computing in Midea's Multi business Scenarios

The second Keynotes topic was brought by Ms. Dong Qi, the real-time data principal and senior data architect of Midea Group. She shared the application and practice of Apache Flink real-time computing in traditional and emerging business scenarios in the United States from the perspective of the household appliance industry.

Teacher Dong Qi first introduced the development and construction status of real-time ecosystem in the United States. Midea's real-time digital warehouse system construction mainly focuses on three elements: timeliness, stability and flexibility. In terms of timeliness, a timeliness assurance framework with Flink as the core is designed; In terms of stability, it includes a series of verifications for data source connectivity and metadata parameter format in the development phase and monitoring alarms for cluster resources and task status in the operation phase; In terms of flexibility, it includes unified metadata, UDF, Connector and other resource management, and support for task management functions such as task templates and common logic.

Flink has played an important role in the digital transformation of Midea's core traditional business scenarios. Teacher Dong Qi shared three of them.

Long cycle scenario on the B side: the specific business scenario includes the visualization of the American Cloud App Kanban and the full link order. The procurement, marketing, inventory analysis and tracking of long lead orders in traditional industries all need to backtrack the data for a long time in the past, which poses a great challenge to real-time computing. In our architecture, the historical full data is imported by Flink automatically loading Hive partition table, and combined with Kafka incremental data for further calculation and processing.

Factory production progress: factory managers and employees can see the production progress of each hour through the real-time large screen, which is of great practical value for better completing daily production tasks.

Large screen of order grabbing activities: the order grabbing activities for agents, operators and retailers involve interests in price, supply, new product launch, etc., which is very critical. The real-time large screen on the activity site is of great significance to guide operators to adjust their operation strategies, and agents and retailers to carry out retail and order grabbing activities.

In Midea's emerging business scenarios, there are also many real-time digital application practices based on Flink. In this regard, Mr. Dong Qi also shared three scenarios.

Real time intelligent control of household equipment: Refrigerator cloud housekeeper, floor washer cloud housekeeper, electric heating cloud housekeeper and other products all have the function of analyzing user behavior, adjusting and controlling the behavior of intelligent household appliances to achieve the purpose of energy saving and water saving. Flink consumes the device data in Kafka, associates it with Redis/HBase users, products, third-party data, algorithm models, and rules, writes the results back to Kafka, and finally sends the device instructions through the IoT cloud. In addition, Flink also undertakes the function of real-time monitoring in this set of links.

HI service real-time message push: In addition to the automatic regulation function, smart home products also have many functions that need to be completed through human-computer interaction and manual manipulation, such as fault reminder, completion reminder, consumables reminder, etc. This set of links is very similar to the real-time intelligent control of household devices, except that the final data will be written to the third-party push platform.

Large screen for monitoring e-commerce activities: business data, manually collected and entered business data by operators into the database, captured incremental change data through CDC technology, then processed by Flink, and built a real-time large screen through StarRocks+QuibkBI for fast and intuitive operation decisions.

Teacher Dong Qi pointed out that Midea Group's next real-time ecosystem construction will focus on cost reduction, efficiency improvement and tool empowerment, including cloud native deployment, hot spot balance, root cause of task error reporting and repair tips and other basic operation and maintenance capabilities, as well as visual configuration integration tools, fine-grained resource allocation, streaming batch integration practice of platform and business side.

Application of Apache Flink in Miha Tour

Next is the sharing from Mr. Zhang Jian, the head of the big data real-time computing team of Mihayou.

Teacher Zhang Jian first introduced the development process and platform construction of Flink in Miha Tour. At the beginning of the establishment of the real-time computing platform of Mihayou, Apache Flink was chosen, which is based on the excellent features of Flink such as millisecond delay, window computing, state storage, fault tolerance recovery, and the flourishing community behind it. The original real-time computing platform is completely based on the Flink DataStream API, and has preliminary task management and operation and maintenance capabilities. With the growth of business, the Mihayou real-time computing platform has entered a high-speed development stage since 2020, and started to build a one-stop development platform based on SQL, which has promoted the construction of multi cloud and cross regional task management, SQL and connectors, index and log system, metadata and blood ties, and greatly improved the efficiency of research and development. This year, the Mihayou real-time computing platform began to move towards new goals, and began to promote the function deepening and scenario coverage of the one-stop development platform, including the construction of static and dynamic optimization, automatic capacity expansion, resource elasticity, near real-time digital warehouse and other capabilities.

In terms of application, Mr. Zhang Jian shared four important internal application scenarios of Miha Tour.

Standardized collection and processing of global game logs: Flink is responsible for the daily log processing of nearly 10 billion yuan of the whole game business of Miha Tour, with a peak traffic of over 10 million. Logs received through Filebeat collection and log reporting services are transferred to Kafka real-time data bus, processed by Flink SQL, written to downstream Clickhouse, Doris, Iceberg and other storage, and provided to customer service query system, real-time operation analysis, offline digital warehouse and other application scenarios

Real time report and real-time large screen: we will provide real-time large screen services for important indicators according to business needs, and provide real-time indicator application viewing based on BI reports for operations. In the scenario of community post sorting, data sources are reported to Kafka at the client burial point, and incremental data of the business library is captured through Flink CDC. In order not to introduce additional KV storage, and to solve the problem of association failure caused by untimely updating of dimension tables, we combined the task of Flink streaming consumption Kafka and the task of Flink CDC capturing the business library into the same task, and used RegularJoin for association. Here we have expanded Flink SQL to control the life cycle of the underlying state refinement and avoid the expiration of the dimension table state. The associated data is then calculated by indicators and provided to the post sorting service.

Near real time data warehousing: We realized the near real time offline warehousing of logs through the way that Flink SQL writes to Iceberg in real time. The data warehousing efficiency has been shortened from the hour level to the minute level, and the volatility of offline storage IO has also been much more stable. Through Flink CDC, MySQL database is synchronized in full volume and increment, and combined with the platform's ability of one click task generation, automatic optimization and expansion, automatic submission and operation, the database can be accessed in one click, which greatly improves the development efficiency and reduces the pressure on the database. A typical application scenario of near real time data warehouse is the player's record query.

Real time risk control: In Miha Tour, the risk control team is closely connected with the real-time computing team. The risk control team has provided a good risk control engine, and the real-time computing team has built a relatively automated API and task management mode based on the risk control engine, making the real-time computing platform service-oriented. Specific application scenarios include login verification, game anti cheating, man-machine verification, etc.

Teacher Zhang Jian introduced that the future work of Mihayou in the field of real-time computing mainly includes three aspects: first, platform capacity building, including Flink SQL, resource optimization, automatic operation and maintenance, resource flexibility, etc; The second is the exploration of usage scenarios, such as delayed message service, Binlog service based on Flink CDC, application level indicator service, etc; The third is the constant practice of data lake and TableStore, including the practice and exploration of streaming batch integration and near real-time digital warehouse.

Application Practice of Disney Streaming Media Advertising Flink

The last Keynotes topic was jointly brought by Ms. Hao Youchao, executive director of Disney advertising intelligence, and Ms. Li Dingzhe, director of real-time computing of Disney advertising intelligence.

Hao Youchao first introduced Disney's streaming media advertising business. Hulu, as the leading streaming media platform in the United States, was first founded by Disney, Fox and NBC. With the acquisition of Fox in 2019, Disney has gained Hulu's operational control and high-quality resources of the advertising platform. It has started to promote online streaming media and successively launched Disney+, ESPN+, Star+and other brands. At present, Disney streaming media has 235 million subscribers (in households) worldwide, surpassing Netflix. Hulu is the main source of Disney's current streaming media advertising business. Every day, hundreds of millions of 15 second and 30 second video ads are launched. Every ad you choose will generate dozens or even hundreds of events, posing a high challenge to the data platform. With Disney+going online in December, this challenge is expected to multiply. Disney streaming media advertising data platform is divided into two layers: data algorithm and application service. Apache Flink is mainly used in the data algorithm layer to aggregate key indicators in operational data in real time.

Next, Mr. Li Dingzhe shared the details of real-time data in Disney streaming media advertising data platform. In the real-time link, Flink performs unified streaming calculation on the data collected from the system and the user side, and the calculated indicators are exposed to the business platform, operation and maintenance platform, advertising server, etc. through the data interface. In the offline link, Spark is used to generate offline reports and output external data, and Flink is used to backfill indicators.

Li Dingzhe also shared three real-time application scenarios of using Flink for Disney streaming media advertising.

Advertising decision-making funnel: advertising decision-making is a complex process, which requires selecting the most suitable ads for users from the huge advertising pool through rough sorting, fine sorting and a series of filtering conditions. In order to troubleshoot this complex process, we abstract it into a funnel model to display information such as advertising opportunities, targeting success, whether it is filtered, whether it is successful in the end, and the reasons for failure. We use Flink to decode and correlate the decision information obtained from the advertising server, restore the decision funnel and submit it to the front end for display. In the offline link, we practiced Flink's streaming and batching integration, using the same set of code to correct errors and backfill data when real-time data has problems.

Advertising exposure monitoring: advertisers usually put forward some requirements for advertising, such as targeting specific groups of people, limiting the number of times of advertising within the same time period for the same user, and avoiding the simultaneous appearance of competitive advertising. In response to these needs, we have developed an advertising exposure monitoring platform so that advertisers can view information about their advertising. In this scenario, we use Flink to associate and enhance the context information and user behavior from the advertising system and client, generate a series of fact indicators, and calculate more derivative indicators based on specific rules.

Large screen of the advertising system: it looks like the management and the business side, and provides overall insight into the advertising system and advertising. The data from the fact data source is processed by Flink, exposed through the indicator interface, aggregated according to different business rules, and finally released to the front end for large screen display.

Teacher Li Dingzhe introduced that the real-time data platform of Disney streaming media ads was built on the cloud and deployed on the Kubernetes container scheduling system. Flink operators were used to manage Flink clusters, and Hong Scheduler, streaming batch job mix, queue based elastic capacity expansion and other technologies were practiced.

At the end of the question, Mr. Hao Youchao shared some application scenarios of Flink on Disney streaming media advertising platform in the future, including full streaming and batch, OLAP, real-time attribution, streaming machine learning, etc.

summary

At this conference, we are pleased to see that the Apache Flink community is still developing continuously and prosperously: in terms of community construction, the scale and activity of global and Chinese communities have reached new heights; In terms of technological growth, the state, fault tolerance, shuffle, data integration, machine learning and other directions are continuously innovating, and the Flink Table Store, a streaming batch integrated storage for the future streaming digital warehouse, has also made gratifying progress; In terms of industrial applications, more and more companies from different industries are joining Flink's production practice team to continuously feed back technology accumulation and new demands to the community. Let's look forward to Apache Flink getting better and better~!

Overview of Flink Forward Asia 2022 Main Forum

Related Articles

A detailed explanation of Hadoop core architecture HDFS

What Does IOT Mean

6 Optional Technologies for Data Storage

What Is Blockchain Technology

Explore More Special Offers

Short Message Service(SMS) & Mail Service

Sales Support

Technical Support

Connect & Report Abuse