When a Java application runs with a JVM heap that is too small, out-of-memory (OOM) errors can occur. You can use Container Network File System (CNFS) as the storage for heap dumps by mounting it to the dump directory in the container. When a JVM OOM occurs, the dump is written to that directory on CNFS. This topic describes how to use CNFS to automatically collect JVM heap dumps on abnormal exits.
Prerequisites
You have used CNFS to manage a NAS file system. For more information, see Manage NAS file systems using CNFS (recommended).
Container Network File System (CNFS) abstracts Alibaba Cloud file storage as a Kubernetes object (CRD) for independent management, including creation, deletion, description, mounting, monitoring, and scaling operations.
You have created a Container Registry Enterprise Edition instance. For more information, see Create an Enterprise instance.
Considerations
The maximum heap size (-Xmx) set for the JVM should be smaller than the memory limit of the pod. Otherwise, the pod can be killed for exceeding its memory limit before the JVM itself runs out of heap, and no heap dump is written.
When using Java heap dumps, we recommend creating a new CNFS and separating the CNFS used for business from the CNFS used for Java heap dumps. This prevents .hprof files from becoming too large and consuming excessive business resources during dumps, which could affect business operations.
The image docker.io/filebrowser/filebrowser:v2.18.0 used in the example might fail to pull due to network access restrictions. You must synchronize it to your ACR Enterprise Edition instance by subscribing to images from outside China. The specific configuration is as follows:
Artifact Source: Docker Hub
Source Repository Coordinates: filebrowser/filebrowser
Subscription Policy: v2.18.0
After completing the image subscription, you need to configure the password-free pull policy between the ACR Enterprise Edition instance and the ACK cluster. For more information, see Pull images from the same account.
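The first consideration above (keeping -Xmx below the pod memory limit) can be checked with a small helper before you deploy. This is a minimal sketch: the function names and the 75% headroom ratio are assumptions made for illustration and do not come from this topic, so adjust the ratio to your workload.

```shell
#!/bin/sh
# Hypothetical helpers; HEAP_PERCENT is an assumed headroom ratio, not a documented value.
HEAP_PERCENT=75

# Suggest a -Xmx value in MiB for a pod memory limit given in MiB.
suggest_xmx_mib() {
  limit_mib=$1
  echo $(( limit_mib * HEAP_PERCENT / 100 ))
}

# Succeed only when the maximum heap is strictly smaller than the pod memory limit.
heap_fits_limit() {
  xmx_mib=$1
  limit_mib=$2
  [ "$xmx_mib" -lt "$limit_mib" ]
}
```

For example, `heap_fits_limit 80 128` succeeds for the 80 MB heap used in this topic if the pod limit were 128 MiB (a hypothetical limit; the sample Deployment does not set one).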
Procedure
Use the registry.cn-hangzhou.aliyuncs.com/acs1/java-oom-test:v1.0 sample image as a Java program that simulates an OOM and triggers a JVM OOM. Use the following example to create a Deployment named java-application.
In this example, when the Java program Mycode starts, the requested heap size is set to 80 MB, and the heap dump directory is /mnt/oom/logs. When the JVM heap is insufficient, the HeapDumpOnOutOfMemoryError option causes the JVM to write a heap dump to that directory. The heredoc delimiter is quoted ('EOF') so that the shell passes $(POD_NAMESPACE) and $(POD_NAME) to kubectl without expanding them.
cat << 'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: java-application
spec:
  selector:
    matchLabels:
      app: java-application
  template:
    metadata:
      labels:
        app: java-application
    spec:
      containers:
      - name: java-application
        image: registry.cn-hangzhou.aliyuncs.com/acs1/java-oom-test:v1.0  #Image address of the sample program in this topic.
        imagePullPolicy: Always
        env:      #Define two key-value pairs: POD_NAME as metadata.name and POD_NAMESPACE as metadata.namespace.
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        args:
        - java                              #Command to execute.
        - -Xms80m                           #Initial (minimum) heap size.
        - -Xmx80m                           #Maximum heap size.
        - -XX:HeapDumpPath=/mnt/oom/logs    #Path for the heap dump when an OOM occurs.
        - -XX:+HeapDumpOnOutOfMemoryError   #Write a heap dump when the heap runs out of memory.
        - Mycode                            #Program to run.
        volumeMounts:
        - name: java-oom-pv
          mountPath: "/mnt/oom/logs"        #Use /mnt/oom/logs as the mount directory inside the container.
          subPathExpr: $(POD_NAMESPACE).$(POD_NAME)   #Use $(POD_NAMESPACE).$(POD_NAME) as the subdirectory in which OOM dump files are generated.
      volumes:
      - name: java-oom-pv
        persistentVolumeClaim:
          claimName: cnfs-nas-pvc           #Use the CNFS PVC named cnfs-nas-pvc.
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: cnfs-nas-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: alibabacloud-cnfs-nas
  resources:
    requests:
      storage: 70Gi   #If the directory quota feature is enabled, the storage field takes effect, and the maximum amount of data that can be written to the dynamically created directory is 70 GiB.
---
EOF
In the Event Center of the Container Service console, the pod shows a Back-off restarting alert event, which indicates that the java-application application has experienced an OOM.
Log on to the ACK console. In the left navigation pane, click Clusters.
On the Clusters page, find the cluster you want and click its name. In the left-side pane, go to the Event Center.
View the corresponding events.
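The same back-off events can also be inspected from the command line. The following is a minimal sketch, assuming kubectl is configured for the cluster; the function names are hypothetical, and the `app=java-application` label is the one set in the sample Deployment.

```shell
#!/bin/sh
# List BackOff events reported for pods in a namespace (defaults to "default").
show_backoff_events() {
  namespace=${1:-default}
  kubectl get events -n "$namespace" \
    --field-selector involvedObject.kind=Pod,reason=BackOff
}

# Print each java-application pod with the restart count of its first container.
show_restart_counts() {
  namespace=${1:-default}
  kubectl get pods -n "$namespace" -l app=java-application \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[0].restartCount}{"\n"}{end}'
}
```

Run `show_backoff_events default` after the Deployment has been running for a short while; a growing restart count from `show_restart_counts` suggests that the JVM keeps exiting after each OOM.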

Because NAS currently does not have the functionality to browse, upload, or download files, use File Browser as a web-based access tool. First, mount the NAS mount target to the rootDir of File Browser, then create a Service to map the container port of File Browser, and finally access the files stored on NAS through a browser.
Use the following template to create a File Browser Deployment and the ConfigMap required by File Browser, with port 80 enabled by default.
cat << EOF | kubectl apply -f -
apiVersion: v1
data:
  .filebrowser.json: |
    {
      "port": 80,
      "address": "0.0.0.0"
    }
kind: ConfigMap
metadata:
  labels:
    app.kubernetes.io/instance: filebrowser
    app.kubernetes.io/name: filebrowser
  name: filebrowser
  namespace: default
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/instance: filebrowser
    app.kubernetes.io/name: filebrowser
  name: filebrowser
  namespace: default
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/instance: filebrowser
      app.kubernetes.io/name: filebrowser
  template:
    metadata:
      labels:
        app.kubernetes.io/instance: filebrowser
        app.kubernetes.io/name: filebrowser
    spec:
      containers:
      - image: XXXX-registry-vpc.cn-hangzhou.cr.aliyuncs.com/test/test:v2.18.0  #The sample image docker.io/filebrowser/filebrowser:v2.18.0 might fail to pull due to network access restrictions. See the considerations section.
        imagePullPolicy: IfNotPresent
        name: filebrowser
        ports:
        - containerPort: 80
          name: http
          protocol: TCP
        resources: {}
        securityContext: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /.filebrowser.json
          name: config
          subPath: .filebrowser.json
        - mountPath: /db
          name: rootdir
        - mountPath: /rootdir
          name: rootdir
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - configMap:
          defaultMode: 420
          name: filebrowser
        name: config
      - name: rootdir
        persistentVolumeClaim:
          claimName: cnfs-nas-pvc
EOF
Expected output:
configmap/filebrowser unchanged
deployment.apps/filebrowser configured
Log on to the ACK console. In the left navigation pane, click Clusters.
On the Clusters page, find the cluster you want and click its name. In the left-side pane, go to the Services page.
On the Services page, select the default namespace, click Create, and then configure the following parameters.
Parameter: Example description
Name: filebrowser
Service Type: SLB. Set SLB Type to NLB. Select Create Resource, click the Create NLB Instance drop-down list, and set Access Method to Public Access. For information about NLB billing, see NLB billing Overview.
Backend: Select +Reference Workload Label. Set Resource Type to Deployments and Resources to filebrowser.
Port Mapping: Service Port: 8080. Container Port: 80. Protocol: TCP.
In the dialog box, select SLB as the service type. Select Create Resource, set Access Method to Public Access, and submit the configuration changes as prompted.
Open your browser and enter <endpoint address>:8080 in the address bar, where the endpoint address is the external endpoint of the filebrowser Service. The File Browser login page appears. Enter the default account (admin) and password (admin) to log in and browse the files mounted in the container.
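If you prefer to look up the endpoint from the command line, the address can be read from the Service status. This is a minimal sketch, assuming the Service is named filebrowser in the default namespace and listens on port 8080 as configured in this topic; `filebrowser_url` is a hypothetical helper name.

```shell
#!/bin/sh
# Print the File Browser URL based on the load balancer address of the Service.
filebrowser_url() {
  # A load balancer may report an IP address or a hostname; try the IP first.
  endpoint=$(kubectl get service filebrowser -n default \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
  if [ -z "$endpoint" ]; then
    endpoint=$(kubectl get service filebrowser -n default \
      -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
  fi
  if [ -z "$endpoint" ]; then
    echo "no external endpoint assigned yet" >&2
    return 1
  fi
  echo "http://${endpoint}:8080"
}
```

The function fails with a message on standard error while the load balancer is still being provisioned, so it can be retried until an address appears.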

Because File Browser mounts the PVC named cnfs-nas-pvc to rootDir, double-click rootDir to enter the NAS mount point.
Result
In File Browser, you can see a directory named default.java-application-76d8cd95b7-prrl2. This directory is generated by the java-application's subPathExpr: $(POD_NAMESPACE).$(POD_NAME) rule.
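The directory name follows directly from the subPathExpr rule: the namespace, a dot, then the pod name. The sketch below only mimics that expansion for illustration; both function names are hypothetical. Splitting at the first dot relies on the fact that Kubernetes namespace names cannot contain dots, whereas pod names can.

```shell
#!/bin/sh
# Mimic how $(POD_NAMESPACE).$(POD_NAME) expands into a dump directory name.
dump_subdir() {
  pod_namespace=$1
  pod_name=$2
  echo "${pod_namespace}.${pod_name}"
}

# Recover the namespace and the pod name from a dump directory name.
split_dump_subdir() {
  dir=$1
  echo "namespace=${dir%%.*} pod=${dir#*.}"
}

dump_subdir default java-application-76d8cd95b7-prrl2
split_dump_subdir default.java-application-76d8cd95b7-prrl2
```

Because each pod writes to its own subdirectory, dumps from different replicas or restarts with new pod names do not overwrite each other.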

Open this directory to find the dump file java_pid1.hprof. To locate the line of code where the OOM occurred in your program, download java_pid1.hprof to your local machine and analyze the heap dump with Eclipse Memory Analyzer Tool (MAT).
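Before opening a downloaded file in MAT, it can help to confirm that the download is a complete-looking heap dump and not an error page or an empty file. This is a minimal sketch with hypothetical function names; it relies on the fact that HPROF binary dumps begin with the ASCII text "JAVA PROFILE" followed by a version string, and it assumes a `head` that supports `-c`.

```shell
#!/bin/sh
# Succeed when the file starts with the HPROF header text.
is_hprof() {
  [ "$(head -c 12 "$1" 2>/dev/null)" = "JAVA PROFILE" ]
}

# List all .hprof files under a directory, for example a local copy of the NAS directory.
list_heap_dumps() {
  find "$1" -type f -name '*.hprof' | sort
}
```

For example, `is_hprof java_pid1.hprof && echo "looks like a heap dump"` prints the message only for a file with the expected header.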
