Serverless Stream Processing & Real-Time Analytics with Apache Flink - Realtime Compute for Apache Flink

Overview

Realtime Compute for Apache Flink runs on top of Apache Flink and removes the operational burden of cluster setup, version management, and resource tuning. You get:

No infrastructure to manage — no clusters to provision, patch, or scale manually.
Always-on autoscaling — Autopilot continuously monitors your jobs and adjusts resource allocation to handle throughput changes, backpressure, and resource waste automatically.
Automatic fault tolerance — end-to-end fault recovery and JobManager high availability are built in, eliminating single points of failure.

The service delivers sub-second latency. Through SQL operator optimization and its self-developed GeminiStateBackend, it achieves twice the performance of open-source Apache Flink in Nexmark benchmark tests.

Use cases

Real-time ETL and CDC pipelines — sync entire databases to data warehouses or data lakes using Create Database As Select (CDAS) or Create Table As Select (CTAS), with built-in support for schema evolution and Change Data Capture (CDC).
Fraud and anomaly detection — apply complex event processing (CEP) rules over high-volume event streams to identify patterns in real time.
Real-time analytics dashboards — write Flink SQL to aggregate streaming data and push results to Hologres, ClickHouse, or other downstream systems with sub-second freshness.
AI-powered data pipelines — integrate streaming data preprocessing with AI and intelligent data analytics workflows.
Business process monitoring — track order fulfillment, delivery status, or customer activity by processing event streams with rule-based alerting and DingTalk or email notifications.
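To make the real-time analytics use case more concrete, the following is a minimal sketch in plain Python of what a tumbling-window count computes: events are grouped into fixed, non-overlapping time windows and counted per key. It is not Flink code and does not use any Realtime Compute API. The function name, the event layout (timestamp in seconds, key), and the sample data are all assumptions made for illustration. A real job would express the same logic in Flink SQL or the DataStream API and run continuously over an unbounded stream.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per (window_start, key) over fixed, non-overlapping windows.

    events: iterable of (timestamp_seconds, key) tuples (hypothetical layout).
    window_seconds: window length in seconds.
    """
    counts = defaultdict(int)
    for timestamp, key in events:
        # Align each timestamp to the start of the window that contains it.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


# Hypothetical page-view events: (timestamp in seconds, page name).
events = [(0, "page_a"), (3, "page_b"), (9, "page_a"), (10, "page_a"), (14, "page_b")]
result = tumbling_window_counts(events, window_seconds=10)

for (window_start, key), count in sorted(result.items()):
    print(window_start, key, count)
```

In this sketch, the two page_a events at 0 and 9 seconds fall into the window starting at 0, while the event at 10 seconds opens the next window. A streaming engine produces the same groupings incrementally and emits each window's result once the window closes.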

Comparison with Apache Flink

Realtime Compute for Apache Flink is fully compatible with Apache Flink APIs and connectors, so existing workloads migrate without code changes. The table below summarizes the key differences.

Area	Apache Flink	Realtime Compute for Apache Flink	What you gain
Performance and cost	No built-in elastic scalability. Resource utilization depends on manual tuning.	2x performance over Apache Flink in Nexmark benchmark tests, powered by SQL operator optimization and GeminiStateBackend. Autopilot automatically monitors and adjusts job resource allocation, resolving throughput bottlenecks, backpressure in the entire pipeline, and resource waste without manual intervention. Fine-grained CPU/memory configuration at the operator level improves resource utilization by up to 100% for large-scale jobs. Flexible billing: Subscription, Pay-as-you-go, and Hybrid billing for compute unit (CU) consumption.	Lower costs through intelligent CU scaling and flexible billing models.
Compatibility and integration	Native Flink SQL and DataStream APIs. Manual integration with systems such as MySQL, Apache Kafka, and Paimon; frequent version updates can cause compatibility issues.	Fully compatible with SQL, DataStream, PyFlink, and Flink CDC APIs. Over 30 built-in connectors covering databases, message queues, data warehouses, data lakes, and file systems — including MySQL, Apache Kafka, Hologres, and Paimon. Custom connectors supported for integrating with external systems.	Lower entry barrier and smoother migration for existing Flink workloads.
Development and debugging	No one-stop development management platform. Limited debugging tools.	Full-database sync with CDAS/CTAS, supporting schema evolution and reads from database replicas for incremental CDC. Draft version management with code comparison, rollback, and integration with remote Git repositories (GitHub, GitLab, Gitee). Native Apache Flink functions and custom functions, plus over 20 Flink SQL script templates for common use cases. Test data management, quick debugging, intermediate result display, development-production isolation, and a Visual Studio Code local development tool.	Faster iteration from development to production with lower debugging costs.
Operations and management	No comprehensive built-in monitoring and alerting. No graphical user interface (GUI). Manual resource scaling with complex scheduling.	GUI for job management, state monitoring, and log tracking. Multi-dimensional monitoring metrics with intelligent aggregation to surface high latency, skew, and backpressure issues quickly. Built-in real-time alerting via DingTalk, email, and SMS, with Prometheus integration for enterprise monitoring. Dynamic configuration changes without job restarts, plus intelligent diagnostics for TaskManager disconnections and tuning suggestions.	Lower O&M overhead with better job observability.
Stability and reliability	Cluster deployment has regional limitations. Fault tolerance must be manually configured.	Cross-zone high availability across multiple regions. Automatic end-to-end fault tolerance and JobManager high availability eliminate single points of failure. Checkpoint and savepoint management with state compatibility checks and data migration support.	Enterprise-grade reliability for large-scale production jobs.
Enterprise support	Relies on Apache Flink documentation and community support. No dedicated technical support team.	24/7 professional technical support from Realtime Compute for Apache Flink engineers, backed by a 99.9% SLA. Rapid response, customized feature support, continuous updates, and long-term version maintenance.	Faster issue resolution and reliable service for production deployments.
Security and access control	Basic authentication mechanisms such as Kerberos. Access control requires integration with external systems.	Alibaba Cloud role-based access control (RBAC). Tenant-level and project-level isolation of resources and code files for cross-team collaboration. Credential security through variable management. Comprehensive action auditing covering all production environment changes.	Unified identity and access management for data security and compliance.
Extensibility and ecosystem	Extends functionality through plug-ins. Ecosystem depends on the Apache Flink community.	Supports emerging use cases including AI and intelligent data analytics. Supports integration with data lakes (Iceberg, Hudi) and data warehouses (ClickHouse, Hologres, MaxCompute). SDKs and RESTful APIs for secondary development.	A flexible, extensible platform for diverse real-time use cases.

Billing

Realtime Compute for Apache Flink has two billable items: management resources and computing resources.

Three billing methods are available:

Subscription: Reserve dedicated resources for a fixed period.
Pay-as-you-go: Pay only for what you use.
Hybrid billing: Combine reserved subscription resources with elastic pay-as-you-go capacity.

For details, see Billing.
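The idea behind hybrid billing can be illustrated with a small arithmetic sketch in Python. It assumes a simplified model in which a fixed subscription fee covers a reserved number of CUs, and any hourly usage above that reserve is charged at a pay-as-you-go rate. The function name, the prices, and the usage figures are hypothetical, and the sketch ignores the management-resource billable item. The actual pricing rules are defined in the Billing documentation.

```python
def hybrid_cost(hourly_cu_usage, reserved_cu, subscription_fee, payg_price_per_cu_hour):
    """Estimate the cost of a period under a simplified hybrid billing model.

    hourly_cu_usage: CUs consumed in each hour of the period (hypothetical data).
    reserved_cu: CUs covered by the subscription.
    subscription_fee: fixed fee for the reserved CUs over the period.
    payg_price_per_cu_hour: assumed price for each CU-hour above the reserve.
    """
    # Only usage above the reserved capacity is billed as pay-as-you-go.
    overage_cu_hours = sum(max(0, used - reserved_cu) for used in hourly_cu_usage)
    return subscription_fee + overage_cu_hours * payg_price_per_cu_hour


# Four hours of usage with 10 reserved CUs: only the third hour exceeds the reserve.
usage = [8, 10, 14, 9]
total = hybrid_cost(usage, reserved_cu=10, subscription_fee=100.0, payg_price_per_cu_hour=0.5)
print(total)  # 100.0 fixed + 4 overage CU-hours * 0.5 = 102.0
```

Under these assumed numbers, the elastic charge applies only to the 4 CUs of overage in the third hour, which is why hybrid billing suits workloads with a steady baseline and occasional peaks.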

Next steps

Log on to the Realtime Compute for Apache Flink console to get started.
What is Realtime Compute for Apache Flink?
What is streaming data?
What are the differences between real-time computing and batch processing?