Prevent Container Escapes with Pod Security Hardening - ACK

Container Service for Kubernetes (ACK) lets you harden pod security by applying a set of controls that reduce the risk of container escapes and privilege escalation. This topic covers nine security recommendations, including how to forbid privileged mode, enforce non-root execution, restrict hostPath volumes, and disable automatic ServiceAccount token mounting.

Why container escapes matter

Container escapes let attackers elevate privileges from inside a container to control the underlying host. Two default Kubernetes behaviors create this risk.

Default root context. By default, container processes run as the Linux root user. Docker limits what that root user can do by granting only a subset of Linux capabilities, but the default set is still broad:

cap_chown, cap_dac_override, cap_fowner, cap_fsetid, cap_kill, cap_setgid, cap_setuid, cap_setpcap, cap_net_bind_service, cap_net_raw, cap_sys_chroot, cap_mknod, cap_audit_write, cap_setfcap

An attacker who exploits a vulnerability in a containerized application can use these capabilities to read Secrets, ConfigMaps, and other sensitive data on the host. Avoid running containers in privileged mode, because privileged containers inherit all Linux capabilities of the root user.

Node-wide API access via the kubelet. All Kubernetes worker nodes use the node authorizer, a special-purpose authorization mode that governs API requests made by the kubelet. The node authorizer grants each kubelet read access to Services, Endpoints, Nodes, Pods, Secrets, ConfigMaps, persistent volumes (PVs), and persistent volume claims (PVCs) for pods on that node, plus write access to node status, pod status, and Events. In addition, the node authorizer grants authentication-related permissions: read and write access to the CertificateSigningRequest (CSR) API for Transport Layer Security (TLS) bootstrapping, and the ability to create TokenReview and SubjectAccessReview objects for delegated authentication and authorization checks.

By default, ACK clusters enable the NodeRestriction admission controller, which limits each kubelet to modifying only its own node and the pods bound to it. However, NodeRestriction alone cannot prevent an attacker from querying the Kubernetes API to collect information about the cluster environment.

Enforcement mechanism

ACK supports pod security policies built on Open Policy Agent (OPA) and Gatekeeper. These policies validate every request to create or update a pod against the security rules you configure. Non-compliant requests are rejected with an error. For each recommendation below, a corresponding predefined ACK policy is available to enforce the control at the namespace level.
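As a rough illustration of what such a rule can look like when written by hand, the sketch below expresses one as a Gatekeeper constraint. It assumes the predefined policy is installed as a constraint kind named ACKPSPPrivilegedContainer; the constraint name and the namespace prod-apps are hypothetical, and any parameters a given policy accepts should be checked against the ACK policy reference.

```yaml
# Sketch only: assumes a Gatekeeper constraint kind named ACKPSPPrivilegedContainer
# is installed in the cluster.
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: ACKPSPPrivilegedContainer
metadata:
  name: forbid-privileged-containers   # hypothetical constraint name
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    namespaces:
      - prod-apps                       # hypothetical namespace to enforce in
```

Scoping the constraint with match.namespaces keeps enforcement limited to the namespaces you list, which is useful when system namespaces still need exemptions.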

Pod security recommendations

The following nine controls address the most common attack vectors. Apply them together for defense in depth.

1. Forbid privileged containers

Privileged containers inherit all Linux capabilities of the root user on the host. Most workloads do not need these capabilities. Forbid privileged mode to prevent an attacker who gains a foothold in a container from directly accessing host resources.

Restricted fields:

Field	Allowed values
`spec.containers[*].securityContext.privileged`	Undefined, `false`
`spec.initContainers[*].securityContext.privileged`	Undefined, `false`

Deploy the ACKPSPPrivilegedContainer policy to enforce this restriction across specified namespaces.
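A minimal pod manifest that satisfies both restricted fields might look like the following. The pod name, container names, and image references are placeholders, not values from ACK.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: unprivileged-example                   # hypothetical name
spec:
  initContainers:
    - name: init
      image: registry.example.com/init:1.0     # placeholder image
      securityContext:
        privileged: false                      # explicit; leaving it undefined is also allowed
  containers:
    - name: app
      image: registry.example.com/app:1.0      # placeholder image
      securityContext:
        privileged: false
```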

2. Run pods as a non-root user

Unless the image or the pod spec sets a different user, containers run as root. An attacker who gains shell access to a root container has a much easier path to the host. Run containers as a non-root user to limit the blast radius of a compromise.

Use any of these approaches:

Remove the shell from the container image.
Add a USER instruction to the Dockerfile.
Set spec.securityContext.runAsUser and spec.securityContext.runAsGroup in the pod spec.

Deploy the ACKPSPAllowedUsers policy to restrict which users and groups can run containers in specified namespaces.
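The pod-spec approach can be sketched as follows. The UID and GID values, the pod name, and the image are illustrative assumptions. The runAsNonRoot field is not mentioned above; it is a standard Kubernetes field, added here as an extra guard so that the kubelet refuses to start a container that would run as UID 0.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: non-root-example                     # hypothetical name
spec:
  securityContext:                           # pod-level: applies to all containers
    runAsUser: 1000                          # illustrative non-root UID
    runAsGroup: 3000                         # illustrative GID
    runAsNonRoot: true                       # extra guard, not part of the list above
  containers:
    - name: app
      image: registry.example.com/app:1.0    # placeholder image
```

The application must be able to run as the chosen UID, so file ownership inside the image may need to be adjusted to match.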

3. Forbid Docker-in-Docker and docker.sock mounting

Building images inside a container with Docker-in-Docker, or by mounting the Docker socket (/var/run/docker.sock), gives the container process control over the Docker daemon and therefore over the node.

Use alternative image-build approaches instead:

Container Registry Enterprise Edition: use an Enterprise Edition instance to build images.
kaniko: builds images inside Kubernetes without Docker daemon access.
img: builds images rootless, without mounting docker.sock.
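For the kaniko option, a build can run as an ordinary pod with no socket mount. The following is a sketch under assumptions: the image reference and flags follow kaniko's public documentation and should be verified against the version you use, and the Git repository URL is hypothetical. The --no-push flag builds the image without pushing it anywhere.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: kaniko-build-example                          # hypothetical name
spec:
  restartPolicy: Never
  containers:
    - name: kaniko
      image: gcr.io/kaniko-project/executor:latest    # verify the image location and tag before use
      args:
        - --dockerfile=Dockerfile
        - --context=git://github.com/example/app.git  # hypothetical repository
        - --no-push                                   # build only; replace with --destination=<image> to push
```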

4. Restrict hostPath volumes

A hostPath volume mounts a host directory directly into a pod. A container running as root has write access to that directory. Attackers can use this to modify kubelet settings, create symbolic links to files outside the mounted path (such as /etc/shadow), or read Secrets mounted on the host. Set hostPath mounts to read-only to limit the damage.

volumeMounts:
- name: hostpath-volume # volume names must be lowercase DNS labels
  readOnly: true
  mountPath: /host-path

Deploy the ACKPSPHostFilesystem policy to restrict which host directories can be mounted in specified namespaces.
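For context, a complete pod that pairs a read-only mount with its hostPath volume definition might look like this. The host directory /var/log, the pod name, and the image are assumptions chosen for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hostpath-readonly-example            # hypothetical name
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0    # placeholder image
      volumeMounts:
        - name: hostpath-volume
          mountPath: /host-path
          readOnly: true                     # the container cannot write to the host directory
  volumes:
    - name: hostpath-volume
      hostPath:
        path: /var/log                       # illustrative host directory
        type: Directory                      # fail if the path is not an existing directory
```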

5. Set resource requests and limits

A pod without resource requests or limits can exhaust all CPU and memory on a node, causing the kubelet to crash or other pods to be evicted. Setting requests and limits reduces resource contention and limits the impact of misbehaving applications.

Specify CPU and memory requests and limits in the podSpec. Apply a resource quota to the namespace to require all containers to declare requests and limits. Use a LimitRange to set per-container defaults and bounds. For more information, see Managing Resources for Containers, Resource Quotas, and Limit Ranges.

Deploy the ACKContainerLimits policy to enforce resource limits in specified namespaces.
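The three mechanisms can be sketched together as below: a pod that declares its own requests and limits, a LimitRange that supplies defaults and an upper bound, and a ResourceQuota that caps the namespace total. The namespace team-a, the object names, the image, and all quantities are illustrative assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: bounded-example                      # hypothetical name
  namespace: team-a                          # hypothetical namespace
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0    # placeholder image
      resources:
        requests:
          cpu: 250m
          memory: 128Mi
        limits:
          cpu: 500m
          memory: 256Mi
---
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults
  namespace: team-a
spec:
  limits:
    - type: Container
      defaultRequest:                        # applied when a container omits requests
        cpu: 100m
        memory: 64Mi
      default:                               # applied when a container omits limits
        cpu: 500m
        memory: 256Mi
      max:                                   # upper bound for any single container
        cpu: "1"
        memory: 512Mi
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
```

Once a quota covers CPU and memory, pods in that namespace that declare neither requests nor limits are rejected unless a LimitRange fills in defaults for them.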

6. Forbid privilege escalation

Privilege escalation lets a process change its security context at runtime — for example, by executing a binary with the SUID or SGID bit set (such as sudo). This allows a process to run with another user's or group's permissions. Explicitly disabling this prevents a non-root process from regaining root-level access.

Restricted field:

Field	Allowed values
`securityContext.allowPrivilegeEscalation`	`false`

# Set per container; this field does not exist in the pod-level securityContext.
securityContext:
  allowPrivilegeEscalation: false

Deploy the ACKPSPAllowPrivilegeEscalationContainer policy to enforce this setting in specified namespaces.

7. Disable automatic ServiceAccount token mounting

For pods that do not need to call the Kubernetes API, disable automatic ServiceAccount token mounting. This removes the token from the pod's file system so it cannot be used if the pod is compromised.

Disable token mounting for a specific pod:

apiVersion: v1
kind: Pod
metadata:
  name: pod-no-automount
spec:
  automountServiceAccountToken: false

Disable token mounting for all pods that use a specific ServiceAccount:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: sa-no-automount
automountServiceAccountToken: false

Important

Disabling token mounting does not prevent the pod from reaching the Kubernetes API — a pod can still make network connections to the API server. To block API access entirely, configure network policies. For more information, see Use network policies in ACK clusters.

Deploy the ACKBlockAutomountToken policy to enforce automountServiceAccountToken: false across application pods in specified namespaces.
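As a sketch of the network-policy approach mentioned in the note above, the following policy denies all egress traffic from selected pods, which also cuts off the API server. It assumes the cluster's network plugin enforces NetworkPolicy; the namespace and the label are hypothetical.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all-egress                      # hypothetical name
  namespace: restricted-apps                 # hypothetical namespace
spec:
  podSelector:
    matchLabels:
      api-access: "none"                     # hypothetical label on pods to isolate
  policyTypes:
    - Egress                                 # no egress rules listed, so all egress is denied
```

Because this blocks every outbound connection, including DNS, workloads that need specific destinations require additional egress rules that allow them.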

8. Disable service discovery

For pods that do not need to look up or call other cluster services, reduce the information available to them by disabling service links and changing the DNS policy. This limits what an attacker can enumerate if the pod is compromised.

apiVersion: v1
kind: Pod
metadata:
  name: pod-no-service-info
spec:
  dnsPolicy: Default # Inherits the node's DNS configuration. Despite its name, Default is not the default policy (ClusterFirst is).
  enableServiceLinks: false

By default, a pod's DNS policy is ClusterFirst, which routes DNS queries through the in-cluster CoreDNS service. Setting dnsPolicy: Default routes DNS through the node's resolver instead. Setting enableServiceLinks: false prevents Services in the namespace from being injected as environment variables. For more information, see Environment variables and Pod's DNS policy.

Important

Changing the DNS policy and disabling service links does not prevent the pod from reaching CoreDNS directly. An attacker can still enumerate cluster services by running dig SRV *.*.svc.cluster.local @$CLUSTER_DNS_IP. Use network policies to fully restrict service discovery. For more information, see Use network policies in ACK clusters.

9. Use a read-only root file system

A read-only root file system prevents attackers from overwriting application binaries or configuration files in the container. If the application must write to disk, configure it to write to a tmpfs volume or a mounted persistent volume instead.

Restricted field:

Field	Allowed values
`securityContext.readOnlyRootFilesystem`	`true`

# Set per container; this field does not exist in the pod-level securityContext.
securityContext:
  readOnlyRootFilesystem: true

Deploy the ACKPSPReadOnlyRootFilesystem policy to enforce a read-only root file system for pods in specified namespaces.
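For an application that still needs scratch space, a read-only root file system can be combined with a memory-backed emptyDir volume, as in this sketch. The mount path /tmp, the size limit, the pod name, and the image are assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: readonly-rootfs-example              # hypothetical name
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0    # placeholder image
      securityContext:
        readOnlyRootFilesystem: true
      volumeMounts:
        - name: tmp
          mountPath: /tmp                    # the only writable path in this container
  volumes:
    - name: tmp
      emptyDir:
        medium: Memory                       # tmpfs-backed volume
        sizeLimit: 64Mi                      # illustrative cap; counts against container memory
```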

What's next

Review all predefined policies available for ACK clusters: Predefined security policies of ACK
Restrict pod-to-pod and pod-to-API-server traffic with network policies: Use network policies in ACK clusters