Apache Flink technology development

Big data technology has developed rapidly for more than 10 years, and its focus is shifting from large-scale computing toward real-time computing.

For example, during Alibaba's Double 11 Shopping Festival, the real-time transaction volume and turnover are displayed on a large live screen and updated within milliseconds. During the CCTV Spring Festival Gala, which is watched by Chinese audiences all over the world, national ratings and audience profiles are tallied in real time on a large screen. Many cities now run "city brain" projects that use IoT camera feeds to capture traffic, vehicle, and pedestrian flow information in real time for traffic monitoring and governance. In the financial industry, banks, stock exchanges, and other institutions use real-time big data computing in their core business scenarios to monitor transaction behavior and to detect fraud and money laundering. Across Taobao's e-commerce transactions, personalized recommendations are made in real time based on user behavior: from the products a user viewed in the previous minute or 30 seconds, the system computes a user profile and, as the user keeps browsing, recommends related products that the user may like. Many scenarios in daily life are thus driven around the clock by real-time computing, which improves productivity.

Real-time computing requires extremely powerful big data computing capabilities behind the scenes, and Apache Flink emerged as an open source technology for real-time big data computing. Flink was designed around stream computing from the start. Traditional computing engines such as Hadoop and Spark are essentially batch engines: they process bounded data sets, so processing latency cannot be guaranteed. As a stream computing engine, Apache Flink can subscribe to live data, analyze and process it in real time, and produce results immediately, so that the data delivers its value as early as possible.
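The difference between the two processing styles can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Flink code: the function names and the `(shop, amount)` event shape are assumptions made for the example. The batch function can answer only after it has read the whole bounded data set, while the streaming function emits an updated result after every event.

```python
from typing import Dict, Iterable, Iterator, Tuple

Order = Tuple[str, int]  # hypothetical event shape: (shop, amount)


def batch_totals(orders: Iterable[Order]) -> Dict[str, int]:
    """Batch style: the result exists only after the full data set is read."""
    totals: Dict[str, int] = {}
    for shop, amount in orders:
        totals[shop] = totals.get(shop, 0) + amount
    return totals


def streaming_totals(orders: Iterable[Order]) -> Iterator[Tuple[str, int]]:
    """Streaming style: emit the updated total as soon as each event arrives."""
    totals: Dict[str, int] = {}
    for shop, amount in orders:
        totals[shop] = totals.get(shop, 0) + amount
        yield shop, totals[shop]


orders = [("shop_a", 10), ("shop_b", 5), ("shop_a", 7)]

if __name__ == "__main__":
    print(batch_totals(orders))
    for update in streaming_totals(orders):
        print(update)
```

In the streaming version the input could be an unbounded source, because the function never needs to reach the end of the data before producing output.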

Apache Flink has since grown from a stream computing engine into one that unifies stream and batch processing. It can perform streaming analysis on log streams, clickstreams, and IoT data streams, and it can also process bounded data sets from databases and file systems in batch mode and return results quickly. Apache Flink is now a very popular open source big data technology and has been one of the most active Apache open source projects in the world for three consecutive years. It offers strongly consistent computation, large-scale scalability, and excellent overall performance. It also supports multiple languages such as SQL, Java, and Python and provides rich APIs for business use in a wide range of scenarios. Flink has become the mainstream real-time big data computing technology among Internet companies in China and abroad, and is the de facto technical standard in the field of real-time computing.

Alibaba Cloud's real-time computing Flink version has been tested and proven inside Alibaba Group for many years, building up a wealth of technology and product experience. It is now offered on the cloud as a service for small and medium-sized enterprises across industries. As early as 2016, the third year after Flink was donated to Apache, Alibaba was already using real-time computing products on a large scale. The product first went live in Alibaba's core search, recommendation, and advertising scenarios. These scenarios require large amounts of real-time data processing, such as real-time recommendation, real-time ranking, and real-time advertising, and the product gave a big boost to the core e-commerce business.

Product Development History

In 2017, the Flink-based real-time computing platform began serving the entire Alibaba Group. During that year's Double 11, it handled real-time data for the whole group, including the flagship Double 11 big screen. In 2018, the product officially launched on the cloud, serving not only the group but also small and medium-sized enterprises. This was the first time the real-time computing Flink product was offered externally as a public cloud service.

At the beginning of 2019, Alibaba acquired Ververica, the company founded by Flink's creators. Alibaba's Flink team (the real-time computing technology team) and the Flink founding team at the German headquarters joined forces, becoming the strongest Flink technology team in the world and jointly promoting the development of, and contributions to, the Apache Flink open source community. At present, more than 200,000 developers have participated in the Chinese Apache Flink community, and Flink has become one of the most active big data projects at the Apache Foundation.

Last year, mainstream cloud computing and big data companies around the world adopted Flink to launch their own Flink products. For example, Cloudera, which started with Hadoop, launched CDP/CDH with full Flink integration, and big data companies in China have also successively launched Flink-based real-time computing products.

Real-time computing Flink product architecture

Compared with the open source version, Alibaba Cloud's real-time computing product architecture adds significant improvements and value. Many developers today build their own real-time computing platforms on open source Apache Flink, either in their own data centers or on virtual machines in the cloud. So what distinguishes the real-time computing Flink product officially launched by Alibaba Cloud?

Figure: Product architecture

As the architecture diagram shows, the bottom layer is Alibaba Cloud's complete cloud-native infrastructure, on which the real-time computing Flink version is built through containerization. All Flink jobs run on Kubernetes, and containers provide multi-tenant isolation to ensure security. The product is a fully managed service with a high SLA guarantee on the cloud, which spares users the burden of operations and maintenance. With this service architecture, users can adjust the mix of resources more flexibly and choose capacity according to their business volume, without worrying about machine planning. The real-time computing Flink version is cloud-native by design.

In the core computing engine, Alibaba Cloud has optimized several core functions relative to open source Apache Flink, and these optimizations have been proven in Alibaba's internal businesses. The real-time computing Flink version currently supports the real-time data services of nearly 100 business divisions of Alibaba Group. Through extensive production use, the product has been tuned for storage, scheduling, and network transmission.

In terms of plug-ins, the product has dozens of enhanced connectors built in. They connect to mainstream data stores, including open source systems such as MySQL, HBase, and HDFS as well as Alibaba Cloud SLS, and they are integrated and ready to use out of the box. In terms of the development platform, the product provides an enterprise-level one-stop platform with built-in development and operations capabilities, which removes the need to build one in-house and improves the overall experience for enterprise users.

The real-time computing Flink version supports development in multiple languages such as SQL, Java, and Python, provides full lifecycle management of development tasks, supports enterprise-level security mechanisms based on OIDC and RBAC, and offers full-link monitoring and alerting based on the Prometheus protocol. It also provides its own AutoPilot intelligent tuning system, which helps users tune the parameters of Flink jobs, including resource tuning and parallelism tuning. The product can adapt to business traffic without manual tuning, and this intelligent tuning is its core advantage.

The difference between real-time computing Flink version and open source Apache Flink

Compared with the open source product, the real-time computing Flink version has dozens of advantages. They are compared below from the perspectives of development, operations and maintenance, cost, and security.

Figure: Product comparison

In terms of development, the product has rich data connection capabilities and a one-stop multi-language development environment. It includes a variety of built-in function libraries, makes it convenient to debug code, and supports multi-tenant development, task debugging, and test simulation. In terms of operations and maintenance, it supports full-link monitoring and alerting, and it can automatically raise alerts for data delays, data anomalies, and service interruptions.

In terms of intelligent operations and maintenance, the product supports automatic diagnosis and optimization. Based on business traffic, it can automatically perform performance tuning, job tuning, parameter tuning, and resource tuning, and it can diagnose and resolve problems. At the resource level, it adds finer-grained resource allocation on top of open source Flink, so that each job and each operator can be configured at the granularity of CPU and memory. This greatly improves resource utilization, helps users save costs, improves service stability, and reduces the probability of OOM (out-of-memory) errors. With operations and maintenance provided by the original vendor, a 99.9% SLA, full-link fault tolerance, and system stability guarantees, users' concerns are fully addressed.

In terms of cost, cost optimization on the cloud reduces users' overall TCO while improving performance, which is another core advantage.

In the standard NexMark-based stream computing benchmark, the performance of the real-time computing Flink version is about three times that of open source Flink. Drawing on the optimizations that Alibaba Group's R&D team has accumulated in internal core business scenarios, the product can lower users' base costs, which is one of its core advantages.

The real-time computing Flink version also has cloud-native elastic scaling, which helps users save resources and improve resource utilization. Billing supports both yearly or monthly subscription and pay-as-you-go, to suit different needs.

At the security layer, containerized task isolation supports multiple requirements such as tenant isolation, security isolation, and VPC isolation. The product is also connected directly to Alibaba Cloud's account system, so users can control security across products seamlessly through their Alibaba Cloud account. It also supports role-based access control and open identity authentication protocols such as OIDC, which greatly improves business security.

On the whole, the enterprise version offers better functionality and stability than the open source version. Beyond its advantages in operations and maintenance, it works out of the box, which makes it more convenient for users.

Product Solutions

As a stream computing engine for real-time computing, Flink can process a variety of real-time data, including online service logs from ECS, sensor data in IoT scenarios, and other live data. It can also subscribe to binlog updates from relational databases on the cloud, such as RDS and PolarDB. This real-time data is ingested through the DataHub data bus, the SLS log service, open source Kafka message queues, and similar products, and is fed into the real-time computing product for analysis and processing. Finally, the results are written to different data services, such as MaxCompute, MaxCompute-Hologres interactive analysis, PAI machine learning, and Elasticsearch. Users can choose the data service that best fits their business needs to improve data utilization.

Flink's main application scenario is to subscribe to, process, and analyze data from various real-time sources as it arrives, and to write the results into other online storage so that users can put them directly into production. The overall system is fast, accurate, cloud-native, and intelligent, which makes it a very competitive enterprise-level product. The product runs on Alibaba Cloud IaaS such as Container Service and ECS, and it integrates with Alibaba Cloud's other systems, which makes it convenient for customers to apply it to more scenarios.

Product Application Scenario

Four major application scenarios have been summarized for the real-time computing Flink version, so that users can easily build real-time computing solutions for their own business needs.

1. Real-time data warehouse

The real-time data warehouse is mainly used for transactional statistics such as website PV/UV, commodity sales, and transaction data. By subscribing to the business's real-time data sources, it analyzes information with second-level latency and presents the results on a big screen for decision makers, who can then judge the state of the business and of promotional events. Decisions are made on real-time business data, which achieves true data intelligence. Timeliness matters especially in this scenario: when business conditions change rapidly, analysis and decisions must rest on data from the last minute or even the last second. Real-time computing is the best choice here.
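As a rough illustration of the PV/UV statistics mentioned above, the following plain-Python sketch groups page-view events into fixed one-minute windows and counts page views (PV) and distinct users (UV) per window. It is not Flink code: the function name, the `(timestamp, user_id, page)` event shape, and the 60-second window are all assumptions chosen for the example. A stream engine would emit each window's result as time advances and would also have to deal with late or out-of-order events, which this sketch ignores.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Event = Tuple[int, str, str]  # hypothetical shape: (timestamp_seconds, user_id, page)


def pv_uv_per_window(
    events: Iterable[Event], window_seconds: int = 60
) -> Dict[int, Tuple[int, int]]:
    """Return {window_start: (pv, uv)} for fixed, non-overlapping windows."""
    pv: Dict[int, int] = defaultdict(int)
    users: Dict[int, Set[str]] = defaultdict(set)
    for ts, user_id, _page in events:
        # Align the timestamp to the start of its window.
        window_start = ts - ts % window_seconds
        pv[window_start] += 1              # every view counts toward PV
        users[window_start].add(user_id)   # a set keeps each user once for UV
    return {w: (pv[w], len(users[w])) for w in sorted(pv)}


events = [
    (0, "u1", "/home"),
    (15, "u2", "/item/1"),
    (59, "u1", "/cart"),
    (60, "u1", "/pay"),
    (119, "u3", "/home"),
]

if __name__ == "__main__":
    for window_start, (pv_count, uv_count) in pv_uv_per_window(events).items():
        print(window_start, pv_count, uv_count)
```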

2. Real-time recommendation

Real-time recommendation means personalized recommendation based on user preferences or on AI technology, and it is now a mainstream product form. It is common in short video, e-commerce shopping, and content feed scenarios. The system infers user preferences in real time from recent clicks in order to make targeted recommendations and increase user stickiness. Timeliness is critical in this scenario, and Flink combined with AI technology can be used to run real-time recommendation.

3. ETL scenario

Real-time ETL is common in data synchronization jobs where data must be computed and processed while it is being synchronized. Examples include synchronizing and transforming different tables in a database, synchronizing different databases, and pre-aggregating data. The results are finally written to a data warehouse or data lake for archiving, which prepares the data for later in-depth work such as log analysis. Across the whole synchronization and processing pipeline, doing this kind of real-time synchronization and preprocessing with Flink is very efficient.
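The extract, transform, and load steps described above can be sketched with plain Python generators, which process one record at a time in the way a streaming job would. This is only an illustration under assumptions: the JSON record layout, the field names (`order_id`, `amount_cents`, `currency`), the default currency, and the list used as a sink are all hypothetical, and a real job would read from and write to the connectors the product provides.

```python
import json
from typing import Dict, Iterable, Iterator, List


def parse(lines: Iterable[str]) -> Iterator[Dict]:
    """Extract: decode each raw line, dropping malformed records."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict):
            yield record


def transform(records: Iterable[Dict]) -> Iterator[Dict]:
    """Transform: keep complete records and normalize units and casing."""
    for r in records:
        if "order_id" not in r or "amount_cents" not in r:
            continue  # incomplete record, skip it
        yield {
            "order_id": r["order_id"],
            "amount": r["amount_cents"] / 100,
            "currency": r.get("currency", "CNY").upper(),
        }


def run_etl(lines: Iterable[str], sink: List[Dict]) -> List[Dict]:
    """Load: append each cleaned row to the sink as soon as it is ready."""
    for row in transform(parse(lines)):
        sink.append(row)
    return sink


raw_lines = [
    '{"order_id": 1, "amount_cents": 1250, "currency": "usd"}',
    "not json",
    '{"order_id": 2}',
    '{"order_id": 3, "amount_cents": 300}',
]

if __name__ == "__main__":
    print(run_etl(raw_lines, []))
```

Because each stage is a generator, no stage waits for the full input, so the same structure works when `lines` is an endless feed.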

4. Real-time monitoring

Real-time monitoring is common in financial and trading scenarios, where the nature of the industry requires anti-fraud supervision of business activity. The system decides from a user's recent short-term behavior whether the user is fraudulent, so that losses can be stopped in time. This scenario places extremely high demands on timeliness: anomaly detection finds abnormal situations in real time and triggers a stop-loss action. Scenarios such as collecting metrics or logs from various systems and observing and monitoring those metrics in real time can all be handled by the real-time computing Flink product.
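One simple way to judge "recent short-term behavior" is to count how many events a user produces inside a sliding time window and raise an alert when the count exceeds a threshold. The plain-Python sketch below shows that idea only; it is not Flink code and not the product's detection logic. The class name `BurstDetector`, the 10-second window, and the threshold of 3 events are assumptions chosen for the example.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class BurstDetector:
    """Flag a user who produces too many events within a sliding window."""

    def __init__(self, window_seconds: int = 10, max_events: int = 3) -> None:
        self.window_seconds = window_seconds
        self.max_events = max_events
        # Per-user timestamps of events still inside the window.
        self._recent: Dict[str, Deque[int]] = defaultdict(deque)

    def observe(self, ts: int, user_id: str) -> bool:
        """Record one event and return True if the user now looks suspicious."""
        recent = self._recent[user_id]
        recent.append(ts)
        # Evict timestamps that have fallen out of the window.
        while recent and recent[0] <= ts - self.window_seconds:
            recent.popleft()
        return len(recent) > self.max_events


stream = [(0, "u1"), (1, "u1"), (2, "u1"), (3, "u1"), (3, "u2"), (20, "u1")]

if __name__ == "__main__":
    detector = BurstDetector(window_seconds=10, max_events=3)
    for ts, user in stream:
        if detector.observe(ts, user):
            print(f"alert: {user} at t={ts}")
```

The detector keeps state per user and updates it on every event, so the decision is available the moment the suspicious event arrives rather than after a later batch run.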

General introduction to real-time computing Flink version
