If a program unexpectedly terminates or stops responding, the operating system records the content of the random access memory (RAM) allocated to the program and saves it to a file for later debugging and analysis. This process is called a core dump. You can use the gdb debugger to analyze core dump files and locate the cause of a crash. This topic describes how to enable core dumps for an ACS pod. You can then view and analyze the core dump files that are generated when a container exits unexpectedly, identify the cause, and fix the issue.
How it works
In Linux, when a program terminates unexpectedly or crashes, the operating system records the state of the RAM allocated to the program and saves it to a file. The files generated during a core dump are called core dump files, or core files for short, and are usually named core. You can use gdb to view and analyze core files to find the cause of the crash.
The following figure shows the core dump signals supported in Linux. By default, signals whose default action is Core generate core files. For more information, see Core dump file.
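As a rough illustration, the following Python sketch encodes the signals that the Linux signal(7) man page lists with the default action Core. It also maps a container exit code back to a signal. The helper names are hypothetical. The 128 + N exit-code convention is an assumption about how the container runtime reports a process killed by signal N.

```python
import signal

# Signals whose default action is "Core" (terminate and dump core), per the
# Linux signal(7) man page. SIGIOT and SIGUNUSED are aliases of SIGABRT and
# SIGSYS. Names not defined on the current platform are skipped.
CORE_SIGNAL_NAMES = (
    "SIGABRT", "SIGBUS", "SIGFPE", "SIGILL", "SIGQUIT",
    "SIGSEGV", "SIGSYS", "SIGTRAP", "SIGXCPU", "SIGXFSZ",
)

CORE_SIGNALS = frozenset(
    getattr(signal, name) for name in CORE_SIGNAL_NAMES if hasattr(signal, name)
)


def dumps_core_by_default(signum: int) -> bool:
    """Return True if the default action of signal `signum` is Core."""
    return signum in CORE_SIGNALS


def signal_from_exit_code(exit_code: int):
    """Map a container exit code of the form 128 + N back to signal N.

    Assumption: the runtime reports death-by-signal as 128 + N, the common
    shell and container convention. Returns None for other exit codes.
    """
    try:
        signum = exit_code - 128
        return signal.Signals(signum) if signum > 0 else None
    except ValueError:
        # Not a signal number known to this platform.
        return None
```

For example, a container whose last state shows exit code 139 maps to SIGSEGV. SIGSEGV is a Core signal, so that crash would produce a core file once core dumps are enabled.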

How to work with core dumps
By default, core dumps are disabled for ACS pods. Frequent core dumps can generate a large number of core files, which can exhaust disk space and affect your business. We recommend that you mount a remote shared OSS or NAS volume to store core files. If you store core files in a custom path on a shared volume, the core files remain intact. Repeated crashes, such as those of a container in the CrashLoopBackOff state, also cannot fill the storage space of the container rootfs layer with core files.
When a large number of core files are generated, you can also use ephemeral containers. After a core dump event occurs, the core file is saved to the rootfs layer. You can then log on to an ephemeral container to analyze the core file.
To enable core dumps and specify the path of core files, add the pod annotation alibabacloud.com/core-pattern: core-path/core-pattern. Mount core-path to a shared volume so that you can analyze the core files stored there. In core-pattern, specify the core file name, such as core-%E-%p-%t, where:
%E: the path of the executable file that caused the crash, with each slash (/) replaced by an exclamation mark (!).
%p: the ID of the crashed process.
%t: the time when the crash occurred, in seconds since the Epoch.
For more information about core-pattern file naming, see Man page of core.
apiVersion: v1
kind: Pod
metadata:
  annotations:
    alibabacloud.com/core-pattern: "/data/dump-a/core-%E-%p-%t"
...
You can perform the following steps by using kubectl with the kubeconfig file of the ACS cluster, or in CloudShell. To use a kubeconfig file, make sure that kubectl is installed and the kubeconfig file is configured. For more information, see Obtain the kubeconfig file of a cluster and use kubectl to connect to the cluster.
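The following Python sketch shows how the core-%E-%p-%t pattern in the annotation example turns into a file name, and how to recover the fields afterwards. It follows the %E, %p, %t, and %% rules from the core(5) man page. The function names are hypothetical. Rejecting other specifiers is a choice of this sketch, not kernel behavior.

```python
def expand_core_pattern(pattern: str, exe_path: str, pid: int, timestamp: int) -> str:
    """Expand %E, %p, %t and %% in a core-pattern string (per core(5))."""
    values = {
        "E": exe_path.replace("/", "!"),  # %E: slashes become '!'
        "p": str(pid),                    # %p: process ID
        "t": str(timestamp),              # %t: seconds since the Epoch
        "%": "%",                         # %%: literal percent sign
    }
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch != "%":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(pattern):
            raise ValueError("pattern ends with a lone '%'")
        spec = pattern[i + 1]
        if spec not in values:
            raise ValueError(f"unsupported specifier %{spec} in this sketch")
        out.append(values[spec])
        i += 2
    return "".join(out)


def parse_core_name(filename: str) -> tuple:
    """Split a file named core-%E-%p-%t into (exe_path, pid, timestamp).

    rsplit keeps dashes inside the executable name intact. Executable
    paths that themselves contain '!' cannot be restored unambiguously.
    """
    if not filename.startswith("core-"):
        raise ValueError(f"not a core file name: {filename!r}")
    exe, pid, ts = filename[len("core-"):].rsplit("-", 2)
    return exe.replace("!", "/"), int(pid), int(ts)
```

For example, the process /usr/sbin/nginx with PID 42 that crashes at 1700000000 writes /data/dump-a/core-!usr!sbin!nginx-42-1700000000. The directory part, /data/dump-a, is the path that the volume must be mounted at. If gdb and the same binary are available, you could then open the file with gdb /usr/sbin/nginx <core file>.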
Mount a remotely shared volume
Mount a NAS volume to store core files
Use a shared NAS volume to collect core files generated by the kernel when a container crash occurs.
Create a NAS file system and a mount target. For more information, see Create a NAS file system and a mount target.
Create a Deployment named coredump-nas-volume-test based on the following YAML content. For more information, see Create a stateless application by using a Deployment. Replace the value of volumes.csi.volumeAttributes.server with the actual NAS server address.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: coredump-nas-volume-test
  labels:
    app: test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      name: nginx-test
      labels:
        app: nginx
      annotations:
        alibabacloud.com/core-pattern: "/data/dump-a/core-%E-%p-%t"  # Specify the path of core files.
    spec:
      containers:
      - name: nginx
        image: registry.cn-hangzhou.aliyuncs.com/acs-sample/nginx:latest
        volumeMounts:
        - name: nas-volume
          mountPath: /data/dump-a/
      volumes:  # Mount a shared NAS volume.
      - name: nas-volume
        csi:
          driver: nasplugin.csi.alibabacloud.com
          fsType: nas
          volumeAttributes:
            server: "0389a***-nh7m.cn-shanghai.extreme.nas.aliyuncs.com"
            path: "/"
            vers: "3"
            options: "nolock,tcp,noresvport"
After a pod of the preceding Deployment triggers a core dump event, the core file is stored in the remote NAS volume.
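Core files only reach the shared volume if the directory in the core-pattern annotation lies on a mounted volume. The following Python sketch checks this for a Deployment's spec.template loaded as a plain dict, for example with a YAML parser. The function names are hypothetical. The sketch assumes that every container in the pod must mount the directory, and it does not check whether the volume is actually remote or shared.

```python
import posixpath

ANNOTATION = "alibabacloud.com/core-pattern"


def _is_under(path: str, mount: str) -> bool:
    """True if `path` equals `mount` or lies inside it."""
    return path == mount or path.startswith(mount.rstrip("/") + "/")


def core_dir_on_volume(pod_template: dict) -> bool:
    """Check that each container mounts a volume covering the core directory."""
    pattern = pod_template["metadata"]["annotations"][ANNOTATION]
    # "/data/dump-a/core-%E-%p-%t" -> "/data/dump-a"
    core_dir = posixpath.normpath(posixpath.dirname(pattern))
    for container in pod_template["spec"]["containers"]:
        mounts = [posixpath.normpath(m["mountPath"])
                  for m in container.get("volumeMounts", [])]
        if not any(_is_under(core_dir, m) for m in mounts):
            return False
    return True
```

With the preceding manifest, the annotation directory /data/dump-a matches the mountPath /data/dump-a/ of nas-volume, so the check passes.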
Run the following command to check whether the volume is mounted. Make sure that you can view core files in the remotely shared volume when core dumps occur.
kubectl exec -it deploy/coredump-nas-volume-test -- sh -c 'df -h | grep aliyun'
Expected results:
0389a***-nh7m.cn-shanghai.extreme.nas.aliyuncs.com:/ 10P 0 10P 0% /data/dump-a
The mounted NAS volume is working.
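If you script this check, you can parse the df -h output instead of reading it by eye. The following Python sketch returns the Filesystem column of the line mounted at a given path. The function name is hypothetical. The sketch assumes one filesystem per output line and a mount point without spaces. The same parsing applies to the s3fs line in the OSS check later in this topic.

```python
def find_mount_source(df_output: str, mount_point: str):
    """Return the Filesystem column of the df line mounted at `mount_point`."""
    target = mount_point.rstrip("/") or "/"
    for line in df_output.splitlines():
        fields = line.split()
        # Expected columns: Filesystem Size Used Avail Use% Mounted-on
        if len(fields) >= 6 and fields[-1] == target:
            return fields[0]
    return None
```

For example, you could capture the output with subprocess.run(["kubectl", "exec", "deploy/coredump-nas-volume-test", "--", "df", "-h"], capture_output=True, text=True).stdout. If the function returns None for /data/dump-a, the volume is not mounted where the core-pattern annotation expects it.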
Mount an OSS volume to store core files
Use a shared OSS volume to collect core files generated by the kernel when a container crash occurs.
Create an OSS bucket. For more information, see Mount a statically provisioned OSS volume.
Create a Deployment named coredump-oss-volume-test based on the following YAML content. For more information, see Create a stateless application by using a Deployment. In the PersistentVolume, replace the values of spec.csi.volumeAttributes with the name and endpoint of your OSS bucket and your credential information.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: coredump-oss-volume-test
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
      annotations:
        alibabacloud.com/core-pattern: "/data/dump-a/core-%E-%p-%t"  # Specify the path of core files.
    spec:
      containers:
      - name: nginx
        image: registry.cn-hangzhou.aliyuncs.com/acs-sample/nginx:latest
        volumeMounts:
        - name: oss-volume
          mountPath: /data/dump-a/
      volumes:
      - name: oss-volume
        persistentVolumeClaim:
          claimName: oss-pvc
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: oss-pvc
spec:
  storageClassName: test  # The storageClass name is used only to bind the PVC to the PV. No StorageClass resource is created.
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 50Gi
  selector:
    matchLabels:
      alicloud-pvname: oss-csi-pv
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: oss-csi-pv
  labels:
    alicloud-pvname: oss-csi-pv
spec:
  storageClassName: test  # The storageClass name is used only to bind the PVC to the PV. No StorageClass resource is created.
  capacity:
    storage: 50Gi
  accessModes:
  - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: ossplugin.csi.alibabacloud.com
    volumeHandle: oss-csi-pv
    volumeAttributes:
      bucket: "oss-test"
      url: "oss-cn-hangzhou-internal.aliyuncs.com"
      otherOpts: "-o max_stat_cache_size=0 -o allow_other"
      akId: "<your AccessKey ID>"
      akSecret: "<your AccessKey Secret>"
After a pod of the preceding Deployment triggers a core dump event, the core file is stored in the remote OSS volume.
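In the preceding manifest, the PVC finds the statically provisioned PV through the shared storageClassName and the alicloud-pvname label selector. The following Python sketch is a simplified pre-check of those binding criteria on dicts that mirror the YAML. It is not the Kubernetes volume binder, which checks more conditions, such as volumeMode and whether the PV is already bound. The function name is hypothetical, and only capacities written as <n>Gi are handled.

```python
def pvc_can_bind(pvc: dict, pv: dict) -> bool:
    """Rough check that a static PV satisfies a PVC's binding criteria."""
    pvc_spec, pv_spec = pvc["spec"], pv["spec"]
    # 1. The storage class names must match (here "test" on both sides).
    if pvc_spec.get("storageClassName") != pv_spec.get("storageClassName"):
        return False
    # 2. Every matchLabels entry of the PVC selector must be on the PV.
    labels = pv["metadata"].get("labels", {})
    wanted = pvc_spec.get("selector", {}).get("matchLabels", {})
    if any(labels.get(k) != v for k, v in wanted.items()):
        return False
    # 3. The PV must offer every access mode the PVC asks for.
    if not set(pvc_spec["accessModes"]) <= set(pv_spec["accessModes"]):
        return False

    def gi(quantity: str) -> int:
        # Only the "Gi" suffix is supported in this sketch.
        if not quantity.endswith("Gi"):
            raise ValueError(f"unsupported quantity {quantity!r}")
        return int(quantity[:-2])

    # 4. The PV capacity must cover the requested storage.
    requested = gi(pvc_spec["resources"]["requests"]["storage"])
    return gi(pv_spec["capacity"]["storage"]) >= requested
```

If this check fails, the PVC typically stays Pending and the pods of the Deployment cannot start. In that case, no core files reach the OSS volume.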
Run the following command to check whether the volume is mounted. Make sure that you can view core files in the remotely shared volume when core dumps occur.
kubectl exec -it deploy/coredump-oss-volume-test -- sh -c 'df -h | grep s3fs'
Expected results:
s3fs 16E 0 16E 0% /data/dump-a
The mounted OSS volume is working.
Inject ephemeral containers
Mount the path of core files as an emptyDir volume in the application container. Then, inject an ephemeral container that mounts the same volume so that you can analyze the core files.
Create a Deployment named coredump-emptydir-volume-test based on the following YAML content. For more information, see Create a stateless application by using a Deployment.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: coredump-emptydir-volume-test
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
      annotations:
        alibabacloud.com/core-pattern: "/data/dump-a/core-%E-%p-%t"  # Specify the path of core files.
    spec:
      containers:
      - name: nginx
        image: registry.cn-hangzhou.aliyuncs.com/acs-sample/nginx:latest
        volumeMounts:
        - name: emptydir-volume
          mountPath: /data/dump-a/
      volumes:
      - name: emptydir-volume
        emptyDir: {}
The Deployment mounts the volume named emptydir-volume to the pod. After the configuration is complete, you can log on to the ephemeral container and view core files in its mount target.
Important: The open source version of kubectl debug does not allow you to configure volume mounts when you inject an ephemeral container into a pod. The following steps show how to use kubectl proxy to set up a local proxy, inject an ephemeral container that has a volume mount, and verify the mount target. Open two terminal windows on the client and keep the proxy running in one of them.
Run the following command in one terminal to start a proxy between the client and the cluster.
kubectl proxy
Expected results:
Starting to serve on 127.0.0.1:8001
Note: You can run the kubectl proxy command with the --port flag to specify a port. For more information, see kubectl proxy.
Run the following command in the other terminal to inject the ephemeral container into a pod. Replace coredump-emptydir-volume-test-xxxxx and target-container with the actual values. Set targetContainerName when the pod contains multiple containers. The request body must be valid JSON, so do not add comments to it.
curl -k http://127.0.0.1:8001/api/v1/namespaces/default/pods/coredump-emptydir-volume-test-xxxxx/ephemeralcontainers \
  -X PATCH \
  -H 'Content-Type: application/strategic-merge-patch+json' \
  -d '{
  "spec": {
    "ephemeralContainers": [
      {
        "name": "debugger-container-name",
        "command": ["/bin/sh", "-c", "sleep 3600"],
        "image": "registry.cn-hangzhou.aliyuncs.com/acs-sample/nginx:latest",
        "stdin": true,
        "tty": true,
        "targetContainerName": "target-container",
        "volumeMounts": [
          {
            "name": "emptydir-volume",
            "mountPath": "/data/dump-a/"
          }
        ]
      }
    ]
  }
}'
The following table describes some parameters.
Parameter | Description
...namespaces/${NAMESPACE}... | The namespace of the pod into which the ephemeral container is injected.
...pods/${POD_NAME}... | The name of the pod into which the ephemeral container is injected.
spec: ${SPEC_DETAIL} | The spec of the injected ephemeral container. Important: We recommend that you replace these values one at a time and use a tool to validate the JSON format of the spec field.
spec.ephemeralContainers.name | The name of the ephemeral container. If you inject multiple ephemeral containers, each name must be unique.
spec.ephemeralContainers.command | The startup command of the ephemeral container. This setting is optional when you use a custom image.
spec.ephemeralContainers.targetContainerName | The name of the target container. Specify this parameter when the pod contains multiple containers.
spec.ephemeralContainers.volumeMounts | The volume mounts of the ephemeral container. Mount the same volume as the application container (emptydir-volume in this example) at the directory specified in the core-pattern annotation of the pod (/data/dump-a/ in this example).
After the ephemeral container starts, run the following command to log on to it.
kubectl exec -it -n default coredump-emptydir-volume-test-xxxxx -c debugger-container-name -- sh
In the ephemeral container, run the following command to go to the mount target.
cd /data/dump-a && pwd
Expected results:
/data/dump-a
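Instead of typing the JSON body of the curl request by hand, you can generate it. The following Python sketch builds the body with the json module, which cannot emit comments or trailing commas, and sends it through the local kubectl proxy. The function names and the default proxy address constant are hypothetical. The endpoint path, Content-Type header, and field names come from the curl command in this topic. The inject call needs a running kubectl proxy, so the sketch only defines it.

```python
import json
import urllib.request

PROXY = "http://127.0.0.1:8001"  # Default kubectl proxy address (assumption).


def ephemeral_container_patch(name: str, image: str, volume: str,
                              mount_path: str, target=None) -> str:
    """Build the strategic-merge-patch body for the ephemeralcontainers subresource."""
    container = {
        "name": name,
        "image": image,
        "command": ["/bin/sh", "-c", "sleep 3600"],
        "stdin": True,
        "tty": True,
        "volumeMounts": [{"name": volume, "mountPath": mount_path}],
    }
    if target:
        # Only needed when the pod contains multiple containers.
        container["targetContainerName"] = target
    return json.dumps({"spec": {"ephemeralContainers": [container]}})


def inject(namespace: str, pod: str, patch: str) -> int:
    """PATCH the pod's ephemeralcontainers subresource and return the HTTP status."""
    url = f"{PROXY}/api/v1/namespaces/{namespace}/pods/{pod}/ephemeralcontainers"
    req = urllib.request.Request(
        url,
        data=patch.encode("utf-8"),
        method="PATCH",
        headers={"Content-Type": "application/strategic-merge-patch+json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

For example, ephemeral_container_patch("debugger-container-name", "registry.cn-hangzhou.aliyuncs.com/acs-sample/nginx:latest", "emptydir-volume", "/data/dump-a/", "target-container") produces the same body as the curl example. You could then pass that body to inject("default", "coredump-emptydir-volume-test-xxxxx", ...).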