Alibaba Cloud Realtime Compute for Apache Flink is an end-to-end real-time big data analytics platform that is built on Apache Flink. It processes data with subsecond latency and lets you develop jobs by using standard SQL statements. This simplifies business development and helps enterprises move their big data workloads to real-time, intelligent computing.

History

In 2017, Alibaba Group integrated Blink with Galaxy and JStorm. Because of its excellent performance, Blink became the unified real-time computing product, V2.0, and provided real-time computing services for all business units (BUs) in Alibaba Group. Blink is a fork of Apache Flink. Over the past four years, Alibaba Group has optimized and improved Blink in depth so that it can support the ultra-large-scale business scenarios of Alibaba Group, such as search and recommendation.

In January 2019, Alibaba Group acquired Data Artisans, the company founded by the original creators of Apache Flink. The Blink technical team and the Flink founding team then jointly built a globally unified Flink Enterprise Edition platform, which is called Ververica Platform (VVP). This development brought real-time computing into Era 3.0.

Architecture

Benefits

  • Superior performance: A single CPU core can process hundreds of thousands of data records per second, with subsecond end-to-end data processing latency. Tens of thousands of ultra-large-scale real-time computing tasks can run in parallel.
  • Powerful features: Realtime Compute for Apache Flink is an end-to-end SQL-based development and O&M platform that provides intelligent diagnosis and automatic configuration optimization. Realtime Compute for Apache Flink can seamlessly connect to mainstream data services of Alibaba Cloud.
  • Cost-effectiveness: The hourly computing fees per CPU core are low. Auto scaling is implemented based on the workload and the pay-as-you-go billing method is supported. The total cost of ownership (TCO) of Realtime Compute for Apache Flink is significantly lower than the TCO of self-managed Flink in data centers.
  • Guaranteed stability and reliability: The service level agreement (SLA) guarantees 99.9% availability. End-to-end metric monitoring and alerting are supported. Realtime Compute for Apache Flink provides high stability and reliability in large-scale deployment scenarios such as Double 11.
  • Compatibility with self-managed Flink: Realtime Compute for Apache Flink is fully compatible with self-managed Flink, so you can smoothly migrate self-managed Flink workloads to the cloud. Realtime Compute for Apache Flink also connects seamlessly to mainstream open source big data ecosystems.
  • Outstanding branding: Realtime Compute for Apache Flink is officially released by the founding team of Apache Flink and certified by the China Academy of Information and Communications Technology (CAICT). Realtime Compute for Apache Flink is the only real-time stream processing product that is recognized in the Forrester Wave.

Comparison between Realtime Compute for Apache Flink and self-managed Flink

Realtime Compute for Apache Flink offers advantages over self-managed Flink in functionality and stability. In addition to its O&M advantages, Realtime Compute for Apache Flink works out of the box. The following table describes the advantages of Realtime Compute for Apache Flink.

| Category | Feature | Description |
| --- | --- | --- |
| Development | Data connection | Fully managed Flink can be seamlessly integrated with mainstream data services of Alibaba Cloud, including databases, Message Queue, and Log Service. You can access various external storage systems from fully managed Flink by using custom connectors. |
| Development | Task development | Programming languages: Realtime Compute for Apache Flink provides an end-to-end development and management platform that supports various programming languages, including SQL, Java, Scala, and Python. Metadata: Realtime Compute for Apache Flink provides a unified metadata management system and can be seamlessly connected to external metadata systems, such as MySQL and Hive. Function libraries: Realtime Compute for Apache Flink supports multiple built-in function libraries for different fields, such as Analytics Zoo Cluster Serving, and allows you to use user-defined functions (UDFs) based on your business requirements. |
| Development | Code debugging | Test data management: Realtime Compute for Apache Flink supports online sampling and management of mock testing data to help you build a test process. Fast running and debugging: You can start or stop jobs in session clusters within seconds, which makes job debugging more efficient. Development and production isolation: Development is isolated from production, so jobs and data in the production environment are not affected during debugging. |
| O&M | Monitoring and alerting | Realtime Compute for Apache Flink supports end-to-end monitoring and alerting. When you run a job, it automatically reports alerts if issues such as data delay, data skew, and backpressure occur. It also monitors metrics and aggregates them by dimension to help you troubleshoot these issues. Alert notifications can be sent in a timely manner by using DingTalk, emails, and text messages. You can also connect Realtime Compute for Apache Flink to an internal unified alerting system, such as Prometheus or Graphite. |
| O&M | Intelligent diagnosis and Autopilot | Intelligent diagnosis: identifies job issues in a timely manner and provides suggestions for troubleshooting. Autopilot: automatically monitors and adjusts job resource allocation in unattended mode to manage traffic surges. |
| O&M | Fine-grained resource management | Realtime Compute for Apache Flink supports fine-grained resource configuration at the operator level. You can configure CPU cores and memory for each operator of each job. This significantly improves resource utilization and service stability, and reduces costs and the probability of out of memory (OOM) errors. |
| O&M | High availability | The service level agreement (SLA) guarantees service availability of up to 99.9%. In addition, end-to-end automated fault tolerance helps ensure system stability. |
| Cost | Billing method | The subscription and pay-as-you-go billing methods are supported. You can select a billing method that suits your business requirements. |
| Cost | Core performance | The Nexmark benchmark test result shows that the stream computing performance of Realtime Compute for Apache Flink is about three times the performance of self-managed Flink. The R&D team of Alibaba Group optimizes Realtime Compute for Apache Flink based on the practices accumulated in core internal business scenarios. This strengthens the core advantages of Flink and reduces the basic cost of the service. |
| Cost | Auto scaling | Realtime Compute for Apache Flink has cloud-native auto scaling capabilities. It automatically scales out or scales in based on the workload. This ensures the timeliness of business, improves resource utilization, and reduces cloud computing costs and the TCO. |
| Security | Isolation | Tenant-level and project-level resource isolation and code isolation are supported so that different teams can collaborate on projects. Containerized task isolation is used to improve user experience. |
| Security | Access control | Realtime Compute for Apache Flink uses the Alibaba Cloud account system and supports the OpenID Connect (OIDC) protocol and role-based access control (RBAC). You can manage the security of your services by using your Alibaba Cloud account. This significantly improves the security of your business. |
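
The auto scaling and Autopilot entries in the preceding table describe adjusting job resources based on the workload. The following Python snippet is a minimal sketch of that kind of decision and is not the algorithm that Realtime Compute for Apache Flink uses. The function name `target_parallelism` and all parameters (`per_slot_capacity`, `headroom`, `tolerance`, and the parallelism bounds) are hypothetical and chosen only for illustration.

```python
import math


def target_parallelism(input_rate, per_slot_capacity, current,
                       min_p=1, max_p=64, headroom=0.8, tolerance=0.1):
    """Suggest a job parallelism for the observed input rate.

    Hypothetical model: each parallel slot can process `per_slot_capacity`
    records per second, and we aim to keep slots at or below `headroom`
    (for example, 80%) of that capacity.
    """
    if per_slot_capacity <= 0:
        raise ValueError("per_slot_capacity must be positive")

    # Slots needed to handle the input rate with the requested headroom.
    needed = math.ceil(input_rate / (per_slot_capacity * headroom))

    # Clamp to the allowed range so the job never scales to zero
    # or beyond the configured maximum.
    needed = max(min_p, min(max_p, needed))

    # Skip small adjustments to avoid restarting the job for minor
    # fluctuations in traffic.
    if current > 0 and abs(needed - current) / current <= tolerance:
        return current
    return needed


if __name__ == "__main__":
    # Traffic surge: 100,000 records/s against slots rated at 10,000 records/s.
    print(target_parallelism(100_000, 10_000, current=4))
    # Traffic drop: the job can scale in.
    print(target_parallelism(5_000, 10_000, current=4))
```

The tolerance check reflects a common design choice in autoscalers: rescaling a streaming job usually requires a restart from state, so small changes in load are ignored.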

Solution

As a real-time stream computing engine, Flink can process a variety of real-time data, including online service logs of Elastic Compute Service (ECS) instances and sensor data in IoT scenarios. You can also subscribe to updates of binary logs in relational databases, such as ApsaraDB RDS and PolarDB. Then, you can use DataHub, Log Service, and Message Queue to subscribe to the real-time data. After Realtime Compute for Apache Flink reads the real-time data, it analyzes and processes the data in real time. The analysis results are written to different data services, such as MaxCompute, Hologres, Machine Learning Platform for Artificial Intelligence (PAI), and Elasticsearch. You can select the data service that best fits your business requirements to improve data utilization.
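
The read, process, and write flow described above can be sketched in plain Python. This is an illustration of the pattern only, not the Flink API: the in-memory list stands in for a source such as Log Service or DataHub, the result list stands in for a sink such as Hologres, and the function name `tumbling_window_counts` is hypothetical. The sketch assumes that events arrive in timestamp order, so it does not handle late or out-of-order data the way a real stream engine does.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_ms):
    """Count events per key in fixed, non-overlapping time windows.

    `events` is an iterable of (timestamp_ms, key) tuples that is assumed
    to be sorted by timestamp. Yields (window_start_ms, key, count) each
    time a window closes.
    """
    current_start = None
    counts = defaultdict(int)

    for ts, key in events:
        # Align the timestamp to the start of its window.
        start = ts - ts % window_ms
        if current_start is None:
            current_start = start
        if start != current_start:
            # The previous window is complete: emit its results.
            for k in sorted(counts):
                yield (current_start, k, counts[k])
            counts.clear()
            current_start = start
        counts[key] += 1

    # Emit the last open window when the input ends.
    for k in sorted(counts):
        yield (current_start, k, counts[k])


if __name__ == "__main__":
    # Stand-in source: (timestamp in ms, log level) pairs from service logs.
    source = [(0, "ERROR"), (500, "INFO"), (900, "ERROR"),
              (1000, "ERROR"), (2500, "INFO")]
    # Stand-in sink: collect results in a list instead of a data service.
    sink = list(tumbling_window_counts(source, window_ms=1000))
    for row in sink:
        print(row)
```

In a real job, the same logic would typically be expressed as a windowed aggregation in SQL, with connectors handling the source and the sink.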

Realtime Compute for Apache Flink is mainly used to subscribe to, process, and analyze data from various real-time data sources, and to write the analysis results to other online storage for subsequent use. Realtime Compute for Apache Flink is a comprehensive enterprise-class service that is developed based on the cloud-native architecture and provides fast, accurate, and intelligent data computing. Realtime Compute for Apache Flink runs on Infrastructure as a Service (IaaS) services of Alibaba Cloud, such as Container Service for Kubernetes and ECS, and can connect to various Alibaba Cloud services.