Containers share the host kernel, which means a container escape can give an attacker direct access to the node and everything running on it. ACK provides several controls—from built-in admission controllers to Open Policy Agent (OPA)-based policy enforcement—to reduce this attack surface. Apply the configurations in this topic to harden pod security in your clusters.
Background
Two default Kubernetes behaviors explain why pod security configuration matters:
Containers run as root by default. Processes inside a container run in the context of the Linux root user. Docker restricts root-level operations by assigning only a subset of Linux capabilities to each container, but the default set is still broad enough to allow privilege escalation or access to sensitive information on the host, such as Secrets and ConfigMaps mounted there. The default capabilities are:
cap_chown, cap_dac_override, cap_fowner, cap_fsetid, cap_kill, cap_setgid, cap_setuid, cap_setpcap, cap_net_bind_service, cap_net_raw, cap_sys_chroot, cap_mknod, cap_audit_write, cap_setfcap
Avoid running privileged pods unless absolutely necessary—they inherit all Linux capabilities of the root user on the host.
The node authorizer grants broad access. All Kubernetes worker nodes use the node authorizer, which grants every Kubelet permission to perform the following operations:
Read: Services, Endpoints, Nodes, Pods, Secrets, ConfigMaps, PersistentVolumes (PVs), and PersistentVolumeClaims (PVCs) for pods scheduled on that node.
Write: Nodes and node status, Pods and pod status, Events.
Auth-related: Read/write access to the CertificateSigningRequest (CSR) API for TLS bootstrapping; ability to create TokenReview and SubjectAccessReview for delegated authentication and authorization checks.
ACK clusters enable the NodeRestriction admission controller by default, restricting each Kubelet to modifying only its own node object and the pods bound to it. Even so, an attacker who gains access to the host can query the Kubernetes API to retrieve sensitive cluster information. For more information, see NodeRestriction admission controller.
Security recommendations
Restrict privileged containers
Privileged containers inherit all Linux capabilities of the host root user. In most cases, containers don't need these permissions. Deploy an ACKPSPPrivilegedContainer policy instance to block privileged containers in specified namespaces.
ACK's container security policy is built on OPA and Gatekeeper. It validates pod create and update requests against user-defined security rules, and rejects any request that doesn't comply.
The NodeRestriction admission controller is enabled by default in ACK clusters. It restricts Kubelet operations to the node and pods bound to it, but does not block privileged containers—configure the ACKPSPPrivilegedContainer policy for that.
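As a complement to the policy, the workload itself can declare that it does not need host-level privileges. The following is a minimal sketch, not a prescribed ACK configuration: the pod name and image are hypothetical, and the single capability added back is only an illustration of restoring what an application actually requires.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: unprivileged-example               # hypothetical name
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0  # hypothetical image
      securityContext:
        privileged: false                  # do not run as a privileged container
        capabilities:
          drop:
            - ALL                          # remove the default capability set
          add:
            - NET_BIND_SERVICE             # assumption: only needed to bind ports below 1024
```

Dropping ALL and adding back individual capabilities keeps the container's permissions explicit, rather than relying on the runtime's default set.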
Run containers as non-root
Containers run as root by default. If an attacker exploits a vulnerability and gains shell access to a running container, they inherit root permissions. To mitigate this:
Remove the shell from the container image.
Add a USER instruction to your Dockerfile.
Set spec.securityContext.runAsUser and runAsGroup in the pod specification to run processes as a non-root user.
Deploy an ACKPSPAllowedUsers policy instance to enforce non-root execution across specified namespaces.
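The pod-level settings described above might look like the following sketch. The pod name, image, and numeric IDs are hypothetical. runAsNonRoot is a standard Kubernetes field added here as an extra safeguard: the kubelet refuses to start a container that would otherwise run as UID 0.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: non-root-example                   # hypothetical name
spec:
  securityContext:
    runAsNonRoot: true                     # reject containers that would run as UID 0
    runAsUser: 1000                        # hypothetical non-root UID
    runAsGroup: 3000                       # hypothetical GID
  containers:
    - name: app
      image: registry.example.com/app:1.0  # hypothetical image
```

Settings under spec.securityContext apply to every container in the pod unless a container overrides them in its own securityContext.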
Disable privilege escalation
Privilege escalation lets a process change its security context—for example, by executing a binary with the SUID or SGID bit set. Set allowPrivilegeEscalation: false in the pod's security context:
securityContext:
  allowPrivilegeEscalation: false
Deploy an ACKPSPAllowPrivilegeEscalationContainer policy instance to enforce this setting in specified namespaces.
Restrict HostPath volume mounts
HostPath directly mounts a host directory into a container. A pod running as root has write access to the mounted path by default, which an attacker can use to modify Kubelet settings, create symbolic links to sensitive files such as /etc/shadow, install SSH keys, or read Secrets mounted on the host.
If your application requires HostPath, mount the volume as read-only and restrict the allowed path prefix:
volumeMounts:
  - name: hostpath-volume
    readOnly: true
    mountPath: /host-path
Deploy an ACKPSPHostFilesystem policy instance to limit which host directories pods can mount in specified namespaces.
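To show where the read-only mount sits in a complete manifest, here is a minimal sketch that pairs the volumeMounts entry with its hostPath volume definition. The pod name, image, and host directory are hypothetical; choose the narrowest host path your application needs.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hostpath-readonly-example          # hypothetical name
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0  # hypothetical image
      volumeMounts:
        - name: hostpath-volume
          mountPath: /host-path
          readOnly: true                   # container cannot write to the host directory
  volumes:
    - name: hostpath-volume
      hostPath:
        path: /var/log/app                 # hypothetical host directory
        type: Directory                    # the directory must already exist on the node
```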
Set resource requests and limits
A pod with no resource limits can consume all CPU and memory on a node, causing the Kubelet to crash or evict other pods. Setting requests and limits minimizes resource contention and reduces the blast radius of runaway or malicious workloads.
Use a ResourceQuota to cap total resource consumption per namespace, or a LimitRange to set per-pod and per-container defaults and bounds:
ResourceQuota: specifies the total CPU and memory allocated to a namespace; enforces that all containers declare requests and limits.
LimitRange: sets minimum, maximum, and default resource values at the container or pod level.
Deploy an ACKContainerLimits policy instance to require resource limits on application pods in specified namespaces. For more information, see Managing resources for containers.
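The two objects described above can be sketched as follows. The namespace, object names, and all quantities are hypothetical values chosen for illustration; size them for your own workloads.

```yaml
# Caps the total CPU and memory that all pods in the namespace may request and use.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota          # hypothetical name
  namespace: demo              # hypothetical namespace
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
---
# Sets per-container defaults and bounds in the same namespace.
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults     # hypothetical name
  namespace: demo
spec:
  limits:
    - type: Container
      default:                 # limits applied when a container declares none
        cpu: 500m
        memory: 512Mi
      defaultRequest:          # requests applied when a container declares none
        cpu: 250m
        memory: 256Mi
      max:
        cpu: "2"
        memory: 2Gi
      min:
        cpu: 100m
        memory: 64Mi
```

Using both together means containers that omit requests and limits receive the LimitRange defaults, so they can still be admitted under the quota.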
Forbid Docker-in-Docker and Docker socket mounts
Mounting /var/run/docker.sock or running Docker-in-Docker gives container processes control over the node's container runtime. Avoid these patterns. For building container images inside Kubernetes, use one of the following alternatives:
Disable automatic ServiceAccount token mounting
By default, Kubernetes mounts a ServiceAccount token into every pod. For pods that don't need to call the Kubernetes API, disable this to reduce the risk of token theft:
# Disable for a specific pod
apiVersion: v1
kind: Pod
metadata:
  name: pod-no-automount
spec:
  automountServiceAccountToken: false

# Disable for all pods using a ServiceAccount
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sa-no-automount
automountServiceAccountToken: false
Disabling automountServiceAccountToken does not block network access to the Kubernetes API. To block that, modify the ACK cluster endpoint access method and apply a network policy. For more information, see Use network policies in an ACK cluster.
Deploy an ACKBlockAutomountToken policy instance to enforce this setting across specified namespaces.
Limit service discovery
For pods that don't need to discover or call other cluster services, set the DNS policy to Default (which uses the node's DNS resolver instead of CoreDNS) and disable service environment variable injection:
apiVersion: v1
kind: Pod
metadata:
  name: pod-no-service-info
spec:
  dnsPolicy: Default # Uses the node's DNS, not CoreDNS. "Default" is not the actual default; the default is "ClusterFirst".
  enableServiceLinks: false
Changing the DNS policy and disabling service links does not block network access to the in-cluster DNS service. An attacker can still enumerate cluster services by querying CoreDNS directly, for example: dig SRV *.*.svc.cluster.local @$CLUSTER_DNS_IP. To block in-cluster service discovery, use a network policy. For more information, see Use network policies in an ACK cluster.
For more information on DNS policy values, see Kubernetes docs on pod DNS policy.
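As a starting point for the network policy mentioned above, the following sketch denies all egress traffic from a set of labeled pods, which includes traffic to the in-cluster DNS service. It is deliberately blunt and is not taken from the ACK documentation: the namespace, policy name, and label are hypothetical, and it only takes effect if the cluster's network plugin enforces NetworkPolicy. In practice you would add egress rules for the destinations the pods legitimately need.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-egress-isolated   # hypothetical name
  namespace: demo              # hypothetical namespace
spec:
  podSelector:
    matchLabels:
      app: isolated            # hypothetical label selecting the pods to restrict
  policyTypes:
    - Egress                   # no egress rules are listed, so all egress is denied
```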
Use a read-only root filesystem
A read-only root filesystem prevents attackers from overwriting application files or installing malicious binaries. Set readOnlyRootFilesystem: true in the security context:
securityContext:
  readOnlyRootFilesystem: true
If your application needs to write files, mount a temporary directory (emptyDir) or an additional volume at the path that requires write access.
Deploy an ACKPSPReadOnlyRootFilesystem policy instance to enforce this in specified namespaces.
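A minimal sketch of a read-only root filesystem combined with a writable scratch directory follows. The pod name, image, volume name, and the choice of /tmp as the writable path are assumptions for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: readonly-rootfs-example            # hypothetical name
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0  # hypothetical image
      securityContext:
        readOnlyRootFilesystem: true       # root filesystem is mounted read-only
      volumeMounts:
        - name: tmp
          mountPath: /tmp                  # assumption: the only path the app writes to
  volumes:
    - name: tmp
      emptyDir: {}                         # scratch space removed when the pod is deleted
```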