Realtime Compute for Apache Flink: Monitor job performance

Last Updated: Jun 03, 2026

Diagnose performance bottlenecks in running deployments with flame graphs, memory profiling, thread sampling, and thread dumps at the JVM level.

Prerequisites

Before you begin, make sure that you have:

  • Namespace-level permissions for Realtime Compute for Apache Flink granted to your Alibaba Cloud account or RAM user. For more information, see Grant namespace permissions.

Limitations

  • Only Ververica Runtime (VVR) 4.0.11 or later supports deployment performance monitoring.

  • Performance data is available only for running deployments. Historical deployment data is not retained.
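The version limit above is easy to misjudge when versions are compared as strings, because "4.0.9" sorts after "4.0.11" alphabetically. The following is a minimal sketch of a numeric comparison. The helper names and the sample labels are hypothetical, and the assumption that a version label contains a three-part dotted number is not confirmed by this page.

```python
import re

MIN_VVR = (4, 0, 11)  # minimum version stated in the Limitations list


def parse_vvr_version(text):
    """Extract the first three-part dotted version from a string.

    Assumption: the label contains something like "4.0.11". The exact
    format of engine version labels is not specified on this page.
    """
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def supports_performance_monitoring(text):
    """Compare numerically, so that 4.0.9 is correctly older than 4.0.11."""
    return parse_vvr_version(text) >= MIN_VVR


# Hypothetical labels, for illustration only.
for label in ("4.0.9", "4.0.11", "vvr-6.0.2-flink-1.15"):
    print(label, supports_performance_monitoring(label))
```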

Choose a tool

Start with the tool that matches the symptom you are investigating.

| Symptom | Tool | What it shows |
| --- | --- | --- |
| High CPU usage or slow throughput | Flame Graph | CPU-intensive methods and their call stacks |
| Suspected memory pressure | Flame Graph (Alloc mode) or Memory | Memory allocation by function, or JVM memory usage by space |
| Thread contention or suspected deadlocks | Flame Graph (Lock mode) or Threads | Lock contention patterns, or per-thread stack traces via sampling |
| Need a full snapshot of all thread states | Thread Dump | All thread stacks at a single point in time |

Access the performance tools

  1. Log on to the Realtime Compute for Apache Flink console.

  2. Find the workspace and click Console in the Actions column.

  3. In the left-side navigation pane, choose O&M > Deployments.

  4. Click the deployment name, then click the Logs tab.

  5. Access the performance tools as follows:

    • Flame Graph, Memory, or Threads: Click the Job Manager or Running Task Managers tab, then click Debug.

    • Thread Dump: Click the Running Task Managers tab, then click the value in the Path, ID column.

Flame Graph

Flame graphs render call stacks as layered horizontal bars. Each bar is a stack frame, and its width is proportional to the number of samples in which that frame appears. In CPU mode, wider bars indicate more CPU time and signal potential bottlenecks. The bottom layer is the entry point; higher layers are deeper calls.

The Apache Flink Flame Graphs documentation covers the underlying concepts.
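To make the width rule concrete, the following is a minimal sketch of how sampled stacks can be folded into per-frame sample counts, which is the quantity a bar's width represents. The stacks and function names are invented for illustration, and the code does not reflect how the console or the underlying profiler is implemented.

```python
from collections import Counter


def fold_stacks(samples):
    """Count how many samples contain each call-path prefix.

    Each sample is a list of frames ordered from entry point to leaf.
    The count for a prefix is proportional to the width of its bar.
    """
    widths = Counter()
    for stack in samples:
        for depth in range(1, len(stack) + 1):
            widths[tuple(stack[:depth])] += 1
    return widths


# Hypothetical samples, not taken from a real deployment.
samples = [
    ["Task.run", "MapOperator.process", "JsonParser.parse"],
    ["Task.run", "MapOperator.process", "JsonParser.parse"],
    ["Task.run", "MapOperator.process", "Collector.emit"],
    ["Task.run", "Source.poll"],
]

widths = fold_stacks(samples)
total = len(samples)

# Print shallow frames first; indentation reflects stack depth.
for path, count in sorted(widths.items(), key=lambda kv: (len(kv[0]), -kv[1])):
    indent = "  " * (len(path) - 1)
    print(f"{indent}{path[-1]}: {count / total:.0%} of samples")
```

The entry point appears in every sample and therefore spans the full width, while a leaf such as the hypothetical `JsonParser.parse` is only as wide as the share of samples that ended in it.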

Flame graph modes

| Mode | What it captures |
| --- | --- |
| CPU | Stack traces from actively running threads. Wider frames indicate higher CPU usage. |
| Alloc | Memory allocated by each function. Identifies functions that generate the most heap pressure. |
| Lock | Lock contention and deadlock patterns. Highlights functions waiting to acquire locks. |
| ITimer | CPU consumption across all threads within a sampling interval. Similar to CPU mode but does not require perf_events support. |

Identify bottlenecks with flame graphs

Flame graph example

  1. Look for wide frames. A wide frame means the function consumes a large share of CPU time, which is the most common hot-spot indicator.

  2. Check frame frequency. A function that appears in many separate stacks is called often; even if each frame is narrow, the combined cost can degrade performance.

  3. Interpret vertical position. Wide frames near the bottom suggest issues in the main application path. Wide frames near the top point to a specific function deep in the call stack.

  4. Optimize the hot spots. Review the code for problematic functions. Common fixes include reducing loop iterations, improving data structures, and minimizing synchronization.

  5. Compare before and after. Generate a new flame graph after optimizing and compare it with the original to verify the fix.
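The before-and-after comparison in step 5 can also be done numerically if the sampled stacks are available as data. The sketch below compares the share of samples in which each function is the leaf frame (its self time). The sample data and function names are hypothetical, and exporting stacks in this form is an assumption rather than a documented console feature.

```python
def self_time_share(samples):
    """Return the fraction of samples in which each function is the leaf frame."""
    total = len(samples)
    if total == 0:
        return {}
    counts = {}
    for stack in samples:
        leaf = stack[-1]
        counts[leaf] = counts.get(leaf, 0) + 1
    return {name: n / total for name, n in counts.items()}


def compare_profiles(before, after):
    """Return (function, change in share) pairs, largest reduction first."""
    b = self_time_share(before)
    a = self_time_share(after)
    deltas = {name: a.get(name, 0.0) - b.get(name, 0.0) for name in set(b) | set(a)}
    return sorted(deltas.items(), key=lambda kv: (kv[1], kv[0]))


# Hypothetical profiles captured before and after an optimization.
before = [
    ["Task.run", "JsonParser.parse"],
    ["Task.run", "JsonParser.parse"],
    ["Task.run", "JsonParser.parse"],
    ["Task.run", "Collector.emit"],
]
after = [
    ["Task.run", "JsonParser.parse"],
    ["Task.run", "Collector.emit"],
    ["Task.run", "Collector.emit"],
    ["Task.run", "Source.poll"],
]

changes = compare_profiles(before, after)
for name, delta in changes:
    print(f"{name}: {delta:+.0%}")
```

A share that shrinks for the optimized function while other shares grow is expected, because shares are relative; confirm the improvement against throughput or latency as well.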

Note

Flame graphs are built from sampled data and may not capture the full execution context. Combine them with the other tools on this page for a more reliable diagnosis. Non-Java functions appear as "unknown"; the async-profiler discussion explains why.

Memory

The Memory tab shows memory usage across JVM spaces (heap, non-heap, metaspace, and others). Use it to identify memory leaks, excessive garbage collection, or spaces approaching their limits.
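As an illustration of "spaces approaching their limits", the following sketch flags any space whose used-to-maximum ratio crosses a threshold. The readings are hypothetical numbers typed in by hand, and the 90% threshold is an arbitrary assumption, not a documented recommendation.

```python
def spaces_near_limit(usage, threshold=0.9):
    """Return (space, ratio) pairs whose used/max ratio meets the threshold.

    `usage` maps a space name to a (used, maximum) pair in the same unit.
    """
    flagged = []
    for space, (used, maximum) in usage.items():
        if maximum <= 0:
            continue  # no configured limit to compare against
        ratio = used / maximum
        if ratio >= threshold:
            flagged.append((space, ratio))
    # Highest ratio first, so the most urgent space is listed at the top.
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical readings in MiB, as they might be read off the Memory tab.
usage = {
    "heap": (3700, 4096),
    "non-heap": (180, 512),
    "metaspace": (240, 256),
}

flagged = spaces_near_limit(usage)
for space, ratio in flagged:
    print(f"{space}: {ratio:.1%} of limit")
```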

Threads

Thread sampling captures stack traces over a time window, showing what each thread does during a performance issue.

Sample a thread

Access the Debug tab for the component to inspect:

  • JobManager: On the Logs tab, click the Job Manager tab, then click Debug.

  • TaskManager: On the Logs tab, click the Running Task Managers tab, click the value in the Path, ID column, then click Debug.

On the Threads tab, find the operator to inspect and click Sample in the Actions column. Wait for sampling to complete, then review the thread stacks.

Thread sampling example

This example shows sampled thread stacks that access Gemini state.

Thread Dump

A thread dump captures every thread's state at a single point in time. Use it to detect deadlocks, identify blocked threads, or verify state backend interactions.

Capture a thread dump

  1. On the Logs tab, click the Running Task Managers tab, then click the value in the Path, ID column.

  2. Click the Thread Dump tab.

  3. Search for the operator that processes state data. Check whether thread stacks under the operator show GeminiStateBackend or RocksDBStateBackend interactions.
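If the dump is copied out as text, the search in step 3 can be scripted. The sketch below parses a dump, selects the threads whose name contains an operator name, and checks their frames for the two state backend class names mentioned above. The dump layout (a quoted thread name followed by its state, then indented "at" lines) is an assumption based on the common JVM thread dump style, and the sample threads and `com.example` classes are invented.

```python
import re

STATE_BACKENDS = ("GeminiStateBackend", "RocksDBStateBackend")

# Assumed header shape: "thread name" ... STATE ...
HEADER = re.compile(r'^"(?P<name>[^"]*)"(?P<rest>.*)$')
STATE = re.compile(r"\b(RUNNABLE|BLOCKED|TIMED_WAITING|WAITING)\b")


def parse_thread_dump(text):
    """Split a dump into threads, each with a name, a state, and its frames."""
    threads = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            state = STATE.search(header.group("rest"))
            current = {
                "name": header.group("name"),
                "state": state.group(1) if state else "UNKNOWN",
                "frames": [],
            }
            threads.append(current)
        elif current is not None and line.startswith("at "):
            current["frames"].append(line[3:])
    return threads


def threads_for_operator(threads, operator_name):
    """Select threads whose name contains the operator name."""
    return [t for t in threads if operator_name in t["name"]]


def touches_state_backend(thread):
    """True if any frame mentions one of the state backend class names."""
    return any(b in frame for frame in thread["frames"] for b in STATE_BACKENDS)


# Hypothetical dump excerpt, for illustration only.
SAMPLE_DUMP = '''
"KeyedProcess -> Sink (1/2)#0" Id=71 BLOCKED on java.lang.Object@1a2b3c
    at com.example.RocksDBStateBackend.put(RocksDBStateBackend.java)
    at com.example.KeyedProcess.processElement(KeyedProcess.java)

"Source: Kafka (1/2)#0" Id=70 RUNNABLE
    at com.example.KafkaSource.poll(KafkaSource.java)
'''

threads = parse_thread_dump(SAMPLE_DUMP)
for thread in threads_for_operator(threads, "KeyedProcess"):
    print(thread["name"], thread["state"], touches_state_backend(thread))
```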

Thread dump example

Note

Find the operator name on the Status tab.

Operator name on Status tab
