Container Service for Kubernetes: Automatically collect JVM heap dumps on abnormal exits using CNFS

Last Updated: Nov 21, 2025

If your application is written in Java and the JVM heap is set too small for its workload, out-of-memory (OOM) errors can occur. To keep the resulting heap dump outside the container, mount a Container Network File System (CNFS) volume at the heap dump directory in the container. When a JVM OOM occurs, the JVM writes the heap dump to that directory on CNFS. This topic describes how to use CNFS to automatically collect JVM heap dumps on abnormal exits.

Prerequisites

Considerations

  • Set the maximum Java heap size (-Xmx) lower than the memory limit of the pod. Otherwise, the pod may be OOM-killed before the JVM itself runs out of heap, and no heap dump is written.

  • When you collect Java heap dumps, we recommend that you create a separate CNFS for the dumps instead of reusing the CNFS that serves your business workloads. .hprof files can be large, and writing them to a shared CNFS can consume capacity and throughput that your business workloads need.

  • The image docker.io/filebrowser/filebrowser:v2.18.0 used in the example might fail to pull due to network access restrictions. If it does, subscribe to the image from outside China to synchronize it to your ACR Enterprise Edition instance. Configure the subscription as follows:

    • Artifact Source: Docker Hub

    • Source Repository Coordinates: filebrowser/filebrowser

    • Subscription Policy: v2.18.0

      After completing the image subscription, you need to configure the password-free pull policy between the ACR Enterprise Edition instance and the ACK cluster. For more information, see Pull images from the same account.
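
The first consideration can be checked with simple arithmetic before you deploy. The following is a minimal sketch, not an official sizing rule: the helper name heap_fits_limit, the 128 MiB pod limit, and the 25% headroom for non-heap memory (metaspace, thread stacks, and so on) are assumptions chosen for illustration. The sample Deployment in this topic does not set a memory limit.

```shell
#!/bin/sh
# heap_fits_limit XMX_MIB LIMIT_MIB [HEADROOM_PERCENT]
# Succeeds when the heap plus headroom for non-heap memory fits in the pod limit.
# Hypothetical helper; the 25% default headroom is an illustrative assumption.
heap_fits_limit() (
  xmx_mib=$1
  limit_mib=$2
  headroom_pct=${3:-25}
  # Heap size plus headroom, in MiB (integer arithmetic).
  needed_mib=$(( xmx_mib + xmx_mib * headroom_pct / 100 ))
  [ "$needed_mib" -le "$limit_mib" ]
)

# Example: -Xmx80m (as in this topic) against an assumed 128 MiB pod limit.
if heap_fits_limit 80 128; then
  echo "ok: -Xmx80m fits in a 128 MiB limit"
else
  echo "warning: raise the pod memory limit or lower -Xmx"
fi
```

If the check fails, either raise the pod memory limit or lower -Xmx so that the JVM reaches its heap ceiling before the pod reaches its memory limit.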

Procedure

  1. Use the sample image registry.cn-hangzhou.aliyuncs.com/acs1/java-oom-test:v1.0, which contains a Java program that simulates a JVM OOM.

  2. Use the following example to create a Deployment named java-application.

    In this example, the Java program Mycode starts with its initial and maximum heap size both set to 80 MB (-Xms80m and -Xmx80m), and the heap dump directory set to /mnt/oom/logs. Because -XX:+HeapDumpOnOutOfMemoryError is enabled, the JVM writes a heap dump to that directory when the heap is exhausted.

    cat << EOF | kubectl apply -f -
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: java-application
    spec:
      selector:
        matchLabels:
          app: java-application
      template:
        metadata:
          labels:
            app: java-application
        spec:
          containers:
          - name: java-application
            image: registry.cn-hangzhou.aliyuncs.com/acs1/java-oom-test:v1.0  #Image address of the sample program in this topic.
            imagePullPolicy: Always
            env:                               #Define two key-value pairs: POD_NAME as metadata.name and POD_NAMESPACE as metadata.namespace.
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.namespace
            args:
            - java                            #Command to run.
            - -Xms80m                         #Initial (minimum) heap size.
            - -Xmx80m                         #Maximum heap size.
            - -XX:HeapDumpPath=/mnt/oom/logs  #Directory where the heap dump is written when an OOM occurs.
            - -XX:+HeapDumpOnOutOfMemoryError #Write a heap dump when an OutOfMemoryError is thrown.
            - Mycode                          #Main class of the sample program.
            volumeMounts:
            - name: java-oom-pv
              mountPath: "/mnt/oom/logs"      #Use /mnt/oom/logs as the mount directory inside the container.
              subPathExpr: $(POD_NAMESPACE).$(POD_NAME)   #Use $(POD_NAMESPACE).$(POD_NAME) as the created subdirectory to generate OOM dump files in the subdirectory.
          volumes:
          - name: java-oom-pv
            persistentVolumeClaim:
              claimName: cnfs-nas-pvc         #Use the CNFS PVC named cnfs-nas-pvc.
    ---
    kind: PersistentVolumeClaim
    apiVersion: v1
    metadata:
      name: cnfs-nas-pvc
    spec:
      accessModes:
        - ReadWriteMany
      storageClassName: alibabacloud-cnfs-nas
      resources:
        requests:
          storage: 70Gi # If the directory quota feature is enabled, the storage field takes effect, and the maximum amount of data that can be written to the dynamically created directory is 70 GiB.
    EOF
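
Before looking for OOM events, it can help to confirm that the PVC created above is bound, because the Deployment cannot start without it. A minimal sketch, assuming kubectl is configured for the cluster; the function names pvc_phase and pvc_is_bound are hypothetical helpers, not part of ACK or CNFS.

```shell
#!/bin/sh
# pvc_phase NAME [NAMESPACE]
# Prints the phase of a PVC (for example Pending or Bound).
pvc_phase() {
  kubectl get pvc "$1" -n "${2:-default}" -o jsonpath='{.status.phase}'
}

# pvc_is_bound NAME [NAMESPACE]
# Succeeds only when the PVC phase is Bound.
pvc_is_bound() {
  [ "$(pvc_phase "$1" "${2:-default}")" = "Bound" ]
}
```

For example, run pvc_is_bound cnfs-nas-pvc && echo "cnfs-nas-pvc is Bound" after applying the manifest. Depending on how the storage class binds volumes, the PVC may stay in Pending for a short time first.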
  3. In the Event Center of the Container Service console, check for a Back-off restarting alert event on the pod. This event shows that the java-application container exited and is being restarted, which is the expected result of the OOM.

    1. Log on to the ACK console. In the left navigation pane, click Clusters.

    2. On the Clusters page, find the cluster you want and click its name. In the left-side pane, choose Operations > Event Center.

    3. View the corresponding events.
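
The same check can be made from the command line. The sketch below assumes kubectl access to the cluster; the function names are hypothetical. A BackOff event only shows that the container keeps exiting, so the last helper searches the previous container's log for the OutOfMemoryError message, on the assumption that the sample program lets the JVM print it.

```shell
#!/bin/sh
# backoff_events POD [NAMESPACE]
# Lists BackOff events recorded for one pod.
backoff_events() {
  kubectl get events -n "${2:-default}" \
    --field-selector "involvedObject.name=$1,reason=BackOff"
}

# restart_counts [NAMESPACE]
# Prints "<pod name> <restart count>" for the first container of each
# java-application pod.
restart_counts() {
  kubectl get pods -n "${1:-default}" -l app=java-application \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.containerStatuses[0].restartCount}{"\n"}{end}'
}

# previous_oom_lines POD [NAMESPACE]
# Prints OutOfMemoryError lines from the log of the previous (exited) container.
previous_oom_lines() {
  kubectl logs "$1" -n "${2:-default}" --previous | grep -F 'java.lang.OutOfMemoryError'
}
```

Run restart_counts first to get the pod name, then pass that name to backoff_events and previous_oom_lines.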

  4. NAS does not currently provide a way to browse, upload, or download files, so use File Browser as a web-based access tool. Mount the NAS volume at /rootdir in the File Browser container, create a Service that exposes the File Browser container port, and then access the files stored on NAS through a browser.

    1. Use the following template to create a File Browser Deployment and the ConfigMap required by File Browser, with port 80 enabled by default.

      cat << EOF | kubectl apply -f -
      apiVersion: v1
      data:
        .filebrowser.json: |
          {
            "port": 80,
            "address": "0.0.0.0"
          }
      kind: ConfigMap
      metadata:
        labels:
          app.kubernetes.io/instance: filebrowser
          app.kubernetes.io/name: filebrowser
        name: filebrowser
        namespace: default
      ---
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        labels:
          app.kubernetes.io/instance: filebrowser
          app.kubernetes.io/name: filebrowser
        name: filebrowser
        namespace: default
      spec:
        progressDeadlineSeconds: 600
        replicas: 1
        revisionHistoryLimit: 10
        selector:
          matchLabels:
            app.kubernetes.io/instance: filebrowser
            app.kubernetes.io/name: filebrowser
        template:
          metadata:
            labels:
              app.kubernetes.io/instance: filebrowser
              app.kubernetes.io/name: filebrowser
          spec:
            containers:
            - image:  XXXX-registry-vpc.cn-hangzhou.cr.aliyuncs.com/test/test:v2.18.0  #The sample image docker.io/filebrowser/filebrowser:v2.18.0 might fail to pull due to network access restrictions. See the considerations section.
              imagePullPolicy: IfNotPresent
              name: filebrowser
              ports:
              - containerPort: 80
                name: http
                protocol: TCP
              resources: {}
              securityContext: {}
              terminationMessagePath: /dev/termination-log
              terminationMessagePolicy: File
              volumeMounts:
              - mountPath: /.filebrowser.json
                name: config
                subPath: .filebrowser.json
              - mountPath: /db
                name: rootdir
              - mountPath: /rootdir
                name: rootdir
            dnsPolicy: ClusterFirst
            restartPolicy: Always
            schedulerName: default-scheduler
            securityContext: {}
            terminationGracePeriodSeconds: 30
            volumes:
            - configMap:
                defaultMode: 420
                name: filebrowser
              name: config
            - name: rootdir
              persistentVolumeClaim:
                claimName: cnfs-nas-pvc
      EOF

      Expected output:

      configmap/filebrowser unchanged
      deployment.apps/filebrowser configured
    2. Log on to the ACK console. In the left navigation pane, click Clusters.

    3. On the Clusters page, find the cluster you want and click its name. In the left-side pane, choose Network > Services.

    4. On the Services page, select the default namespace, click Create, and then configure the following parameters.

      Parameters and example values:

      • Name: filebrowser

      • Service Type: SLB

        • SLB Type: NLB

        • Select Create Resource, click the Create NLB Instance drop-down list, and set Access Method to Public Access.

        For information about NLB billing, see NLB billing Overview.

      • Backend: Select +Reference Workload Label.

        • Resource Type: Deployments

        • Resources: filebrowser

      • Port Mapping

        • Service Port: 8080

        • Container Port: 80

        • Protocol: TCP

    5. In the dialog box, select SLB as the service type. Select Create Resource, set Access Method to Public Access, and submit the configuration changes as prompted.

    6. Open your browser and enter <endpoint address>:8080 in the address bar, where <endpoint address> is the external endpoint of the filebrowser Service. The File Browser login page appears. Log in with the default account (admin) and password (admin) to browse the files in the container.

    7. Because the File Browser Deployment mounts the PVC named cnfs-nas-pvc at /rootdir, double-click rootdir to enter the NAS mount point.
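
If you only need temporary access from your own machine, a local port forward is one possible alternative to a public NLB. This is a sketch under assumptions, not part of the documented procedure: it assumes kubectl and curl are installed locally, and the function names are hypothetical.

```shell
#!/bin/sh
# forward_filebrowser [LOCAL_PORT] [NAMESPACE]
# Forwards a local port to container port 80 of the filebrowser Deployment.
# The command keeps running until you interrupt it.
forward_filebrowser() {
  kubectl port-forward -n "${2:-default}" deployment/filebrowser "${1:-8080}:80"
}

# http_status URL
# Prints the HTTP status code returned for URL.
http_status() {
  curl -s -o /dev/null -w '%{http_code}' "$1"
}
```

Run forward_filebrowser in one terminal, then run http_status http://127.0.0.1:8080/ in another to confirm that File Browser responds before you open the same address in a browser.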

Result

In File Browser, you can see a directory named default.java-application-76d8cd95b7-prrl2. The directory name is generated by the subPathExpr: $(POD_NAMESPACE).$(POD_NAME) setting of the java-application Deployment, so it has the form <namespace>.<pod name>.
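
The naming rule can be reproduced locally to predict which directory a given pod writes to. A minimal sketch; dump_subdir is a hypothetical helper that only mirrors the subPathExpr expansion shown above.

```shell
#!/bin/sh
# dump_subdir NAMESPACE POD_NAME
# Mirrors subPathExpr: $(POD_NAMESPACE).$(POD_NAME) from the Deployment.
dump_subdir() {
  printf '%s.%s\n' "$1" "$2"
}

# Prints default.java-application-76d8cd95b7-prrl2
dump_subdir default java-application-76d8cd95b7-prrl2
```

Because the pod name is part of the path, a replacement pod writes to a new directory, while restarts of the container inside the same pod reuse the existing one.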

Open this directory to find the dump file java_pid1.hprof. To locate the code that caused the OOM, download java_pid1.hprof to your local machine and analyze the heap dump with Eclipse Memory Analyzer Tool (MAT).
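
The dump can also be located and downloaded without the web interface. The sketch below is an assumption-laden alternative, not the documented method: it assumes the File Browser container image provides find and tar (kubectl cp depends on tar in the container), and the function names are hypothetical.

```shell
#!/bin/sh
# list_heap_dumps [NAMESPACE]
# Lists .hprof files under /rootdir in a pod of the filebrowser Deployment.
list_heap_dumps() {
  kubectl exec -n "${1:-default}" deployment/filebrowser -- \
    find /rootdir -name '*.hprof'
}

# fetch_heap_dump POD REMOTE_PATH LOCAL_PATH [NAMESPACE]
# Copies one dump file from the named pod to the local machine.
fetch_heap_dump() {
  kubectl cp -n "${4:-default}" "$1:$2" "$3"
}
```

Run list_heap_dumps to find the full path of java_pid1.hprof, then pass the name of the File Browser pod, that path, and a local file name to fetch_heap_dump.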