When a container program crashes unexpectedly, the Linux kernel captures the program's memory state at the moment of failure and saves it to a core file. Use gdb to analyze the core file and identify the root cause of the crash. This page covers how to enable core dumps for ACS pods, choose a storage method for core files, and access them after a crash.
How it works
In Linux, when a program terminates abnormally, the kernel records the state of the random access memory (RAM) allocated to that program and writes it to a file — a process called a core dump. The resulting file is called a core file.
The following figure shows the Linux signals that trigger a core dump. By default, signals whose action is Core generate core files. For details, see Core dump file.
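To see what an abnormal termination looks like from the shell, the sketch below sends SIGSEGV (a signal whose default action is Core) to a throwaway sleep process. It assumes a POSIX shell on Linux. Whether a core file is actually written depends on the core dump settings of the environment, so the example only inspects the exit status.

```shell
#!/bin/sh
# Start a throwaway process and terminate it with SIGSEGV (signal 11).
sleep 100 &
victim=$!
kill -s SEGV "$victim"

# For a process killed by a signal, wait reports 128 + the signal number.
status=0
wait "$victim" || status=$?
echo "exit status: $status"
```

An exit status of 139 (128 + 11) indicates that the process was terminated by SIGSEGV. The same status code appears for a container whose main process crashes this way.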
Prerequisites
Before you begin, ensure that you have:
- kubectl installed and a kubeconfig file configured for your ACS cluster. For details, see Obtain the kubeconfig file of a cluster and use kubectl to connect to the cluster.
- (Optional) A NAS file system with a mount target, if you plan to store core files on NAS.
- (Optional) An OSS bucket, if you plan to store core files on OSS (Object Storage Service).
You can also run the following kubectl commands in CloudShell without configuring a local kubeconfig file.
Enable core dumps
Core dumps are disabled by default for ACS pods. To enable them, add the alibabacloud.com/core-pattern annotation to your pod spec and set the path where core files are stored:
    apiVersion: v1
    kind: Pod
    metadata:
      annotations:
        alibabacloud.com/core-pattern: "/data/dump-a/core-%E-%p-%t"
    ...
The path value also sets the core file naming pattern. Commonly used format specifiers include:
| Specifier | Description |
|---|---|
| %E | Path of the executable that crashed, with slashes (/) replaced by exclamation marks (!) |
| %p | Process ID (PID) of the crashed process |
| %t | Time of the crash, in seconds since the Unix epoch |
For all supported specifiers, see Man page of core.
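To make the naming pattern concrete, the sketch below mimics how the three specifiers above expand into a file name. The function expand_core_pattern is a hypothetical helper written for illustration, not part of ACS or the kernel. It handles only %E, %p, and %t, and the executable path, PID, and timestamp are made-up sample values.

```shell
#!/bin/sh
# Hypothetical helper: expand %E, %p, and %t in a core pattern.
# Arguments: pattern, executable path, PID, time in seconds since the epoch.
expand_core_pattern() {
  pattern=$1
  # %E: the kernel replaces every slash in the executable path with '!'.
  exe=$(printf '%s' "$2" | tr '/' '!')
  printf '%s\n' "$pattern" |
    sed -e "s|%E|$exe|g" -e "s|%p|$3|g" -e "s|%t|$4|g"
}

expand_core_pattern "/data/dump-a/core-%E-%p-%t" /usr/sbin/nginx 42 1700000000
# prints: /data/dump-a/core-!usr!sbin!nginx-42-1700000000
```

Because %E contains exclamation marks, quote core file names in single quotes when you pass them to shell commands.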
Choose a storage method
The storage method you choose determines how you access core files after a crash:
| Method | When to use | Persistence |
|---|---|---|
| NAS volume | Multiple pods across nodes need access to core files; production environments | Persistent, shared across pods |
| OSS volume | You prefer object storage; core files need to be retained long-term | Persistent, shared across pods |
| emptyDir + ephemeral container | Quick, one-time debugging of a single pod crash; no shared storage available | Lost when the pod is deleted |
Mount a remote shared volume (NAS or OSS) to keep core files intact and prevent the pod's rootfs layer from filling up, which would cause CrashLoopBackOff events.
Mount a remote shared volume
Mount a NAS volume to store core files
Use a shared NAS volume to collect core files when a container crashes.
- Create a NAS file system and a mount target. For details, see Create a NAS file system and a mount target.
- Create a Deployment using the following YAML. Replace the volumes.volumeAttributes.server value with your NAS server address. For details on creating Deployments, see Create a stateless application by using a Deployment.

      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: coredump-nas-volume-test
        labels:
          app: test
      spec:
        replicas: 1
        selector:
          matchLabels:
            app: nginx
        template:
          metadata:
            name: nginx-test
            labels:
              app: nginx
            annotations:
              alibabacloud.com/core-pattern: "/data/dump-a/core-%E-%p-%t" # Core file storage path
          spec:
            containers:
            - name: nginx
              image: registry.cn-hangzhou.aliyuncs.com/acs-sample/nginx:latest
              volumeMounts:
              - name: nas-volume
                mountPath: /data/dump-a/
            volumes:
            - name: nas-volume
              csi:
                driver: nasplugin.csi.alibabacloud.com
                fsType: nas
                volumeAttributes:
                  server: "0389a***-nh7m.cn-shanghai.extreme.nas.aliyuncs.com"
                  path: "/"
                  vers: "3"
                  options: "nolock,tcp,noresvport"

  When a pod triggers a core dump, the core file is stored in the NAS volume.
- Verify that the NAS volume is mounted:

      kubectl exec -it deploy/coredump-nas-volume-test -- sh -c 'df -h | grep aliyun'

  Expected output:

      0389a***-nh7m.cn-shanghai.extreme.nas.aliyuncs.com:/ 10P 0 10P 0% /data/dump-a

  The NAS volume is mounted and ready. Core files generated by crashes are stored at /data/dump-a/.
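One way to check the setup end to end is to crash a throwaway process inside the pod and then list the dump directory. The sketch below is an assumption-laden illustration, not an official procedure: trigger_test_crash is a hypothetical helper, it assumes the container image ships sh and sleep, and it assumes that a process killed with SIGSEGV produces a core file under the configured pattern in your environment.

```shell
#!/bin/sh
# Hypothetical helper: crash a short-lived sleep process inside the pod,
# then list the dump directory. Defaults match the NAS example Deployment.
trigger_test_crash() {
  target=${1:-deploy/coredump-nas-volume-test}
  dump_dir=${2:-/data/dump-a}
  # \$ is escaped so that $! and $pid expand inside the container shell.
  kubectl exec "$target" -- sh -c \
    "sleep 60 & pid=\$!; kill -s SEGV \$pid; wait \$pid; ls -l $dump_dir"
}
```

Calling trigger_test_crash with no arguments targets the Deployment above. If the listing stays empty, the core file may still be in flight, or core dumps may not be enabled for the pod.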
Mount an OSS volume to store core files
Use a shared OSS volume to collect core files when a container crashes.
- Create an OSS bucket. For details, see Mount a statically provisioned OSS volume.
- Create a Deployment using the following YAML. Replace the spec.csi.volumeAttributes values with your OSS bucket, endpoint, and credentials. For details on creating Deployments, see Create a stateless application by using a Deployment.

      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: coredump-oss-volume-test
      spec:
        replicas: 2
        selector:
          matchLabels:
            app: nginx
        template:
          metadata:
            labels:
              app: nginx
            annotations:
              alibabacloud.com/core-pattern: "/data/dump-a/core-%E-%p-%t" # Core file storage path
          spec:
            containers:
            - name: nginx
              image: registry.cn-hangzhou.aliyuncs.com/acs-sample/nginx:latest
              volumeMounts:
              - name: oss-volume
                mountPath: /data/dump-a/
            volumes:
            - name: oss-volume
              persistentVolumeClaim:
                claimName: oss-pvc
      ---
      apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        name: oss-pvc
      spec:
        storageClassName: test # Used for binding mapping only; no resource is created
        accessModes:
        - ReadWriteMany
        resources:
          requests:
            storage: 50Gi
        selector:
          matchLabels:
            alicloud-pvname: oss-csi-pv
      ---
      apiVersion: v1
      kind: PersistentVolume
      metadata:
        name: oss-csi-pv
        labels:
          alicloud-pvname: oss-csi-pv
      spec:
        storageClassName: test # Used for binding mapping only; no resource is created
        capacity:
          storage: 50Gi
        accessModes:
        - ReadWriteMany
        persistentVolumeReclaimPolicy: Retain
        csi:
          driver: ossplugin.csi.alibabacloud.com
          volumeHandle: oss-csi-pv
          volumeAttributes:
            bucket: "oss-test"
            url: "oss-cn-hangzhou-internal.aliyuncs.com"
            otherOpts: "-o max_stat_cache_size=0 -o allow_other"
            akId: "<your AccessKey ID>"
            akSecret: "<your AccessKey Secret>"

  When a pod triggers a core dump, the core file is stored in the OSS volume.
- Verify that the OSS volume is mounted:

      kubectl exec -it deploy/coredump-oss-volume-test -- sh -c 'df -h | grep s3fs'

  Expected output:

      s3fs 16E 0 16E 0% /data/dump-a

  The OSS volume is mounted and ready. Core files generated by crashes are stored at /data/dump-a/.
Use an ephemeral container to access core files
Use this approach when excessive core files are generated or when you need to debug a specific pod crash without a remote shared volume. After a core dump event occurs, the core file is saved to an emptyDir volume, which uses the pod's local storage and is mounted to both the application container and the ephemeral container. You can then log in to an injected ephemeral container to analyze the core file.
The open-source kubectl debug command does not support volume mounts when injecting ephemeral containers. As a workaround, the steps below use the Kubernetes API directly via kubectl proxy and curl to inject an ephemeral container with a volume mount configured. Open two terminal windows and keep the proxy terminal running throughout.
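Because this workaround sends a hand-written JSON patch to the pod's ephemeralcontainers subresource, a malformed body is an easy mistake to make. The sketch below generates the body from a few arguments so that it can be checked before it is submitted. The function build_ephemeral_patch and the container name debugger are hypothetical and used only for illustration. The field names follow the request used in the steps in this section.

```shell
#!/bin/sh
# Hypothetical helper: print a patch body for the ephemeralcontainers subresource.
# Arguments: ephemeral container name, target container, volume name,
# mount path, image.
build_ephemeral_patch() {
  name=$1 target=$2 volume=$3 mount_path=$4 image=$5
  cat <<EOF
{
  "spec": {
    "ephemeralContainers": [
      {
        "name": "$name",
        "command": ["/bin/sh", "-c", "sleep 3600"],
        "image": "$image",
        "stdin": true,
        "tty": true,
        "targetContainerName": "$target",
        "volumeMounts": [
          { "name": "$volume", "mountPath": "$mount_path" }
        ]
      }
    ]
  }
}
EOF
}

build_ephemeral_patch debugger nginx emptydir-volume /data/dump-a/ \
  registry.cn-hangzhou.aliyuncs.com/acs-sample/nginx:latest
```

Piping the output through a JSON validator such as python3 -m json.tool catches syntax errors early. If you save the output to a file, curl can read the body from it with -d @patch.json.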
- Create a Deployment using the following YAML. For details on creating Deployments, see Create a stateless application by using a Deployment.

      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: coredump-emptydir-volume-test
      spec:
        replicas: 2
        selector:
          matchLabels:
            app: nginx
        template:
          metadata:
            labels:
              app: nginx
            annotations:
              alibabacloud.com/core-pattern: "/data/dump-a/core-%E-%p-%t" # Core file storage path
          spec:
            containers:
            - name: nginx
              image: registry.cn-hangzhou.aliyuncs.com/acs-sample/nginx:latest
              volumeMounts:
              - name: emptydir-volume
                mountPath: /data/dump-a/
            volumes:
            - name: emptydir-volume
              emptyDir: {}

  This creates a Deployment with an emptydir-volume volume mounted at /data/dump-a/. After a crash, inject an ephemeral container to access core files at that path.
- In the first terminal, start a local proxy between your client and the cluster:

      kubectl proxy

  Expected output:

      Starting to serve on 127.0.0.1:8001

  Use --port to specify a different port. For details, see kubectl proxy.
- In the second terminal, inject an ephemeral container into the pod. Replace coredump-emptydir-volume-test-xxxxx with the actual pod name and target-container with the target container name (nginx in the Deployment above).

      curl -k http://127.0.0.1:8001/api/v1/namespaces/default/pods/coredump-emptydir-volume-test-xxxxx/ephemeralcontainers \
        -X PATCH \
        -H 'Content-Type: application/strategic-merge-patch+json' \
        -d '{
          "spec": {
            "ephemeralContainers": [
              {
                "name": "debugger-container-name",
                "command": [
                  "/bin/sh",
                  "-c",
                  "sleep 3600"
                ],
                "image": "registry.cn-hangzhou.aliyuncs.com/acs-sample/nginx:latest",
                "stdin": true,
                "tty": true,
                "targetContainerName": "target-container",
                "volumeMounts": [
                  {
                    "name": "emptydir-volume",
                    "mountPath": "/data/dump-a/"
                  }
                ]
              }
            ]
          }
        }'

  The following table describes the key parameters:

  | Parameter | Description |
  |---|---|
  | ...namespaces/${NAMESPACE}... | Namespace of the pod |
  | ...pods/${POD_NAME}... | Name of the pod |
  | spec: ${SPEC_DETAIL} | Spec of the ephemeral container. Validate the JSON format before submitting. |
  | Spec.ephemeralContainers.name | Name of the ephemeral container. Must be unique when multiple ephemeral containers are injected. |
  | Spec.ephemeralContainers.command | Startup command. Optional when using a custom image with a default entrypoint. |
  | Spec.ephemeralContainers.targetContainerName | Name of the target container in the pod. Required when the pod has multiple containers. |
  | Spec.ephemeralContainers.volumeMounts | Mount path for the ephemeral container. Must match the directory set in the core-pattern annotation. |
- Log in to the ephemeral container:

      kubectl exec -it -n default coredump-emptydir-volume-test-xxxxx -c debugger-container-name -- sh
- Access the core file directory:

      cd /data/dump-a && pwd

  Expected output:

      /data/dump-a

  Core files generated by the crashed container are in this directory. Use gdb to analyze them.
- After finishing the debugging session, stop the proxy process in the first terminal by pressing Ctrl+C.
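When several core files accumulate in the directory, it helps to decode each name back into the executable, PID, and crash time before opening it in gdb. The sketch below assumes the core-%E-%p-%t pattern used throughout this page. The function describe_core_file is a hypothetical helper and the file name is a made-up sample. It also assumes that the executable path itself contains no exclamation marks.

```shell
#!/bin/sh
# Hypothetical helper: split a core file name of the form core-%E-%p-%t
# into its parts and print a matching gdb command line.
describe_core_file() {
  file=$1
  base=${file##*/}      # drop the directory
  rest=${base#core-}    # drop the fixed prefix
  epoch=${rest##*-}     # %t is the last dash-separated field
  rest=${rest%-*}
  pid=${rest##*-}       # %p is the second-to-last field
  # %E: turn '!' back into '/' to recover the executable path.
  exe=$(printf '%s' "${rest%-*}" | tr '!' '/')
  echo "executable=$exe pid=$pid epoch=$epoch"
  echo "gdb $exe '$file'"
}

describe_core_file '/data/dump-a/core-!usr!sbin!nginx-42-1700000000'
# prints: executable=/usr/sbin/nginx pid=42 epoch=1700000000
#         gdb /usr/sbin/nginx '/data/dump-a/core-!usr!sbin!nginx-42-1700000000'
```

The printed gdb command only works in a container that has gdb installed and that contains the same executable as the crashed container, so a debug image with both is usually a better choice for the ephemeral container than the application image.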