All Products
Search
Document Center

Realtime Compute for Apache Flink:Monitor deployment performance

Last Updated:Feb 28, 2026

Realtime Compute for Apache Flink provides built-in profiling tools that help you diagnose performance bottlenecks in running deployments at the JVM level. Use flame graphs to pinpoint CPU hot spots, inspect memory allocation across JVM spaces, and analyze thread behavior through sampling and thread dumps.

Prerequisites

Before you begin, make sure that you have:

  • Namespace-level permissions for Realtime Compute for Apache Flink granted to your Alibaba Cloud account or Resource Access Management (RAM) user. For details, see Grant namespace permissions

Limitations

  • Only Ververica Runtime (VVR) 4.0.11 or later supports deployment performance monitoring.

  • Performance data is available only for running deployments. Historical deployment data is not retained.

Choose a tool

Start with the tool that matches the symptom you are investigating.

SymptomToolWhat it shows
High CPU usage or slow throughputFlame GraphCPU-intensive methods and their call stacks
Suspected memory pressureFlame Graph (Alloc mode) or MemoryMemory allocation by function, or JVM memory usage by space
Thread contention or suspected deadlocksFlame Graph (Lock mode) or ThreadsLock contention patterns, or per-thread stack traces via sampling
Need a full snapshot of all thread statesThread DumpAll thread stacks at a single point in time

Access the performance tools

  1. Log on to the Realtime Compute for Apache Flink console.

  2. Find the workspace and click Console in the Actions column.

  3. In the left-side navigation pane, choose O&M > Deployments.

  4. Click the deployment name, then click the Logs tab.

  5. Access the performance tools as follows:

    • Flame Graph, Memory, or Threads: Click the Job Manager or Running Task Managers tab, then click Debug.

    • Thread Dump: Click the Running Task Managers tab, then click the value in the Path, ID column.

Flame Graph

Flame graphs visualize call stacks as layered horizontal bars. Each bar represents a stack frame: the bottom layer is the application entry point, and higher layers represent deeper function calls. Wider bars indicate more CPU time consumed by that function, making them the primary signal for performance bottlenecks.

For background on flame graph concepts, see Flame Graphs in the Apache Flink documentation.

Flame graph modes

ModeWhat it captures
CPUStack traces from actively running threads. Wider frames indicate higher CPU usage.
AllocMemory allocated by each function. Identifies functions that generate the most heap pressure.
LockLock contention and deadlock patterns. Highlights functions waiting to acquire locks.
ITimerCPU consumption across all threads within a sampling interval. Similar to CPU mode but does not require perf_events support.

Identify bottlenecks with flame graphs

Flame graph example
  1. Look for wide frames. A wide frame means the corresponding function consumes a significant share of CPU time relative to other functions. This is the most common indicator of a hot spot.

  2. Check frame frequency. Stack frames that appear repeatedly across samples indicate functions that are called frequently, which may cause cumulative performance degradation.

  3. Interpret vertical position. Wide frames near the bottom of the graph suggest issues in the main application path or entry-point code. Wide frames near the top suggest issues within a specific function deep in the call stack.

  4. Optimize the hot spots. After identifying problematic functions, review the corresponding code. Common optimizations include reducing loop iterations, improving data structures, and minimizing synchronization operations.

  5. Compare before and after. After applying optimizations, generate a new flame graph and compare it against the previous one to verify that the bottleneck is resolved.

Note

Flame graphs are generated from sampled data and may not capture the complete execution context. For more accurate diagnosis, use flame graphs alongside the other performance tools described in this topic. Non-Java functions appear as "unknown" in flame graphs. For details, see the async-profiler discussion.

Memory

The Memory tab shows memory usage across different JVM spaces (heap, non-heap, metaspace, and others). Use this view to identify memory leaks, excessive garbage collection, or JVM spaces approaching their limits.

Threads

Thread sampling captures stack traces from individual threads over a time window, helping you understand what each thread is doing during a performance issue.

Sample a thread

Access the Debug tab for the component to inspect:

  • JobManager: On the Logs tab, click the Job Manager tab, then click Debug.

  • TaskManager: On the Logs tab, click the Running Task Managers tab, click the value in the Path, ID column, then click Debug.

On the Threads tab, find the operator to inspect and click Sample in the Actions column. Wait for sampling to complete, then review the thread stacks.

Thread sampling example

This example shows thread stacks accessed by Gemini State.

Thread Dump

A thread dump captures the state of every thread at a single point in time. Use thread dumps to detect deadlocks, identify blocked threads, or verify state backend interactions.

Capture a thread dump

  1. On the Logs tab, click the Running Task Managers tab, then click the value in the Path, ID column.

  2. Click the Thread Dump tab.

  3. Search for the operator name that processes state data. Check whether thread stacks containing GeminiStateBackend or RocksDBStateBackend interaction are displayed under the operator.

Thread dump example
Note

Find the operator name on the Status tab.

Operator name on Status tab

References