Alibaba Cloud Realtime Compute for Apache Flink provides an end-to-end, high-performance platform to process big data in real time based on Apache Flink. It is widely used to process streaming data or offline data.

Features

  • Powerful real-time computing functions
    Alibaba Cloud Realtime Compute for Apache Flink integrates a wide range of functions to simplify the development process. These functions include:
    • A powerful engine that offers the following advantages:
      • Provides Flink SQL that enables automatic data recovery from failures. This ensures accurate data processing when failures occur. For more information, see Flink SQL overview.
      • Supports a variety of built-in functions, such as string, date, and aggregate functions.
      • Enables accurate control over computing resources. This provides isolation between the jobs of different tenants.
    • Realtime Compute for Apache Flink outperforms Apache Flink by three to four times when measured by key performance metrics. For example, in Realtime Compute for Apache Flink, the data processing delay is reduced to seconds. The throughput of a job reaches millions of data records per second, and a cluster can contain thousands of nodes.
    • Realtime Compute for Apache Flink integrates various cloud-based data stores, such as DataHub, Log Service, ApsaraDB for RDS, Tablestore, and AnalyticDB for MySQL. Realtime Compute for Apache Flink can read data from and write data to these systems with minimal data integration effort.
  • Managed real-time computing services
    Unlike open source or self-built streaming data services, Realtime Compute for Apache Flink is a fully managed stream processing engine. You can query streaming data without deploying or managing any infrastructure, and you can start using streaming data services with a few clicks. Realtime Compute for Apache Flink integrates services such as data storage, data development, data administration, monitoring, and alerting. This allows you to try out streaming data services at low cost and then migrate your data for deployment. Realtime Compute for Apache Flink also supports complete tenant isolation, which extends from the top application layer to the underlying infrastructure layer. This helps ensure the security and privacy of your data.

  • Low labor and compute cluster costs
    Alibaba Cloud has made many improvements to the SQL execution engine, which allows you to create jobs that are more cost-effective than open source Flink jobs. Realtime Compute for Apache Flink is more cost-effective than open source stream processing frameworks in both development and production. For example, if you use an open source framework, your project budget must cover the following costs:
    • Labor costs of writing Flink jobs with complex business logic in Java
    • Costs of job debugging, testing, optimization, and publishing
    • Long-term O&M costs of open source software such as Flink or ZooKeeper
    Realtime Compute for Apache Flink allows you to fully focus on your business without the need to consider these cost issues.

Product positioning

  • Realtime Compute for Apache Flink is able to:
    • Collect data about page views (PVs) and unique visitors (UVs) in real time.
    • Collect data about the average traffic flow at a traffic checkpoint within a certain period of time, such as five minutes.
    • Collect and display the pressure data of hydroelectric dams.
    • Generate alerts for financial theft in online payment services based on fixed rules.
  • Realtime Compute for Apache Flink has limits in the following scenarios:
    • Realtime Compute for Apache Flink cannot replace the stored procedures of Oracle databases because the two are designed to solve problems in different fields.
    • Spark jobs cannot be seamlessly migrated to Realtime Compute for Apache Flink. You can rebuild the real-time computing part of Spark jobs and then migrate it from Spark to Realtime Compute for Apache Flink. After the migration, the O&M and development costs of Spark jobs are reduced.
    • Realtime Compute for Apache Flink does not support alerting based on multiple complex rules. If the alert for a single data record is defined by multiple complex rules, the alert continues to change while the system is running. In this case, we recommend that you use a dedicated rules engine system.
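
The first two scenarios listed above (real-time PV and UV statistics, and averages over a fixed period such as five minutes) both come down to aggregating events in fixed-size time windows. The following Python sketch illustrates that idea only. It is not Flink code and does not use any Realtime Compute for Apache Flink API; the function name `pv_uv_by_window`, the event format, and the sample data are assumptions made for illustration. In Realtime Compute for Apache Flink, logic of this kind would normally be written in Flink SQL.

```python
from collections import defaultdict

WINDOW_SECONDS = 300  # five-minute window, matching the example period in the text


def pv_uv_by_window(events, window_seconds=WINDOW_SECONDS):
    """Count page views (PV) and unique visitors (UV) per fixed-size window.

    `events` is an iterable of (timestamp_seconds, user_id) tuples
    (a hypothetical format chosen for this sketch).
    Returns a dict that maps each window start time to a (pv, uv) tuple.
    """
    views = defaultdict(int)      # window start -> number of page views
    visitors = defaultdict(set)   # window start -> set of distinct user IDs
    for ts, user_id in events:
        window_start = ts - ts % window_seconds  # align to the window boundary
        views[window_start] += 1
        visitors[window_start].add(user_id)
    return {start: (views[start], len(visitors[start])) for start in sorted(views)}


# Hypothetical sample events: (seconds since start, user ID)
sample_events = [(0, "u1"), (10, "u2"), (299, "u1"), (300, "u3"), (610, "u3")]
result = pv_uv_by_window(sample_events)
```

In this sample, the first window contains three page views from two distinct users, which is why PV and UV are tracked separately: a repeat visit by `u1` raises PV but not UV.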

Realtime Compute for Apache Flink uses Flink SQL and user-defined functions (UDFs) to provide services. It provides an end-to-end development tool that data warehousing developers and data analysts can use to analyze, process, and collect statistics on streaming data. You can write Flink SQL to analyze streaming data without developing the underlying code.

Terms

  • compute cluster
    A compute cluster is a distributed cluster system that hosts the computing tasks of Realtime Compute for Apache Flink and runs on YARN. Realtime Compute for Apache Flink has two modes: exclusive mode and shared mode. For more information, see Overview.
  • web console
    The web console of Realtime Compute for Apache Flink provides a complete set of integrated development environment (IDE) tools that implement end-to-end data storage, data development, data administration, monitoring, and alerting functions to help you develop your business.
  • project
    In Realtime Compute for Apache Flink, a project is the basic unit used to manage clusters, jobs, resources, and users. You can create projects or join existing projects as a RAM user.
    Note: Projects of Realtime Compute for Apache Flink allow concurrent operations by multiple RAM users.
  • CU
    In Realtime Compute for Apache Flink, a compute unit (CU) is the basic unit of computing resources for a job, with specified CPU cores, memory, and I/O capabilities. A job of Realtime Compute for Apache Flink can use one or more CUs. One CU represents 1 CPU core and 4 GB of memory. The processing capability of one CU depends on the complexity of business operations:
    • For simple operations such as single-stream filtering and string conversion, one CU can process 10,000 data records per second.
    • For complex operations such as operations that use a JOIN clause, GROUP BY clause, or window function, one CU can process 1,000 to 5,000 data records per second.
    Note: The actual processing capability of CUs in Realtime Compute for Apache Flink depends on your business.
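
The per-CU figures above can be turned into a rough capacity estimate. The following Python sketch shows the arithmetic only. The function names and the sample input rates are assumptions made for illustration, and the result is a starting point rather than a sizing guarantee, because the actual throughput of a CU depends on your business.

```python
import math

# Figures quoted in the CU description (data records per second per CU).
SIMPLE_RPS_PER_CU = 10_000
COMPLEX_RPS_PER_CU = (1_000, 5_000)  # (lower bound, upper bound)

# One CU represents 1 CPU core and 4 GB of memory.
CORES_PER_CU = 1
MEMORY_GB_PER_CU = 4


def cus_needed(records_per_second, rps_per_cu):
    """Round up to whole CUs; a job uses at least one CU."""
    return max(1, math.ceil(records_per_second / rps_per_cu))


def estimate_cu_range(records_per_second, complex_job):
    """Return (fewest, most) CUs for the given input rate.

    For complex jobs the range reflects the 1,000 to 5,000 records
    per second that one CU can process.
    """
    if complex_job:
        low_rate, high_rate = COMPLEX_RPS_PER_CU
        fewest = cus_needed(records_per_second, high_rate)
        most = cus_needed(records_per_second, low_rate)
    else:
        fewest = most = cus_needed(records_per_second, SIMPLE_RPS_PER_CU)
    return fewest, most


def describe(cus):
    """Translate a CU count into CPU cores and memory."""
    return {
        "cus": cus,
        "cpu_cores": cus * CORES_PER_CU,
        "memory_gb": cus * MEMORY_GB_PER_CU,
    }


# Hypothetical workloads used only to illustrate the calculation.
simple_estimate = estimate_cu_range(25_000, complex_job=False)
complex_estimate = estimate_cu_range(20_000, complex_job=True)
```

With these sample rates, a simple filtering job at 25,000 records per second needs 3 CUs, and a job with JOIN or GROUP BY operations at 20,000 records per second needs between 4 and 20 CUs, that is, 4 to 20 CPU cores and 16 GB to 80 GB of memory.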