This topic describes how to configure Container Service for Kubernetes (ACK) to collect and analyze the logs of Kubernetes components, including the audit log of the API server, the logs of Ingresses and control plane components, and key Kubernetes events. You can use this log data to locate the causes of security issues and cluster issues.

Use cluster auditing

The audit log of the API server in a Kubernetes cluster helps administrators track the operations performed by different users, which is important for cluster security and maintenance. For more information about how to use Log Service to collect and analyze audit logs, how to set custom alert rules, and how to disable cluster auditing, see Use cluster auditing.

ACK provides the following audit policy:

apiVersion: audit.k8s.io/v1beta1 # This is required.
kind: Policy
#Do not generate audit events for requests in the RequestReceived stage. 
omitStages:
- "RequestReceived"
rules:
#Ignore the following requests because the requests are manually identified as high-volume and low-risk. 
- level: None
  users: ["system:kube-proxy"]
  verbs: ["watch"]
  resources:
    - group: "" # core
      resources: ["endpoints", "services"]
- level: None
  users: ["system:unsecured"]
  namespaces: ["kube-system"]
  verbs: ["get"]
  resources:
    - group: "" # core
      resources: ["configmaps"]
- level: None
  users: ["kubelet"] # legacy kubelet identity
  verbs: ["get"]
  resources:
    - group: "" # core
      resources: ["nodes"]
- level: None
  userGroups: ["system:nodes"]
  verbs: ["get"]
  resources:
    - group: "" # core
      resources: ["nodes"]
- level: None
  users:
    - system:kube-controller-manager
    - system:kube-scheduler
    - system:serviceaccount:kube-system:endpoint-controller
  verbs: ["get", "update"]
  namespaces: ["kube-system"]
  resources:
    - group: "" # core
      resources: ["endpoints"]
- level: None
  users: ["system:apiserver"]
  verbs: ["get"]
  resources:
    - group: "" # core
      resources: ["namespaces"]
#Do not audit requests that are sent to the following read-only URLs. 
- level: None
  nonResourceURLs:
    - /healthz*
    - /version
    - /swagger*
#Do not audit requests for event resources. 
- level: None
  resources:
    - group: "" # core
      resources: ["events"]
#Secrets, ConfigMaps, and token reviews can contain sensitive and binary data. 
#Therefore, audit only the metadata of these resources. 
- level: Metadata
  resources:
    - group: "" # core
      resources: ["secrets", "configmaps"]
    - group: authentication.k8s.io
      resources: ["tokenreviews"]

#Responses to get, list, and watch requests can be large. Audit only the request for these verbs. 
- level: Request
  verbs: ["get", "list", "watch"]
  resources:
    - group: "" # core
    - group: "admissionregistration.k8s.io"
    - group: "apps"
    - group: "authentication.k8s.io"
    - group: "authorization.k8s.io"
    - group: "autoscaling"
    - group: "batch"
    - group: "certificates.k8s.io"
    - group: "extensions"
    - group: "networking.k8s.io"
    - group: "policy"
    - group: "rbac.authorization.k8s.io"
    - group: "settings.k8s.io"
    - group: "storage.k8s.io"
#The default audit level for known API requests and responses. 
- level: RequestResponse
  resources:
    - group: "" # core
    - group: "admissionregistration.k8s.io"
    - group: "apps"
    - group: "authentication.k8s.io"
    - group: "authorization.k8s.io"
    - group: "autoscaling"
    - group: "batch"
    - group: "certificates.k8s.io"
    - group: "extensions"
    - group: "networking.k8s.io"
    - group: "policy"
    - group: "rbac.authorization.k8s.io"
    - group: "settings.k8s.io"
    - group: "storage.k8s.io"
    - group: "autoscaling.alibabacloud.com"
#The default audit level for other requests. 
- level: Metadata
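
The order of the rules matters: Kubernetes evaluates the rules of an audit policy from top to bottom, and the first rule that matches a request sets the audit level. The following Python sketch illustrates this first-match behavior on a shortened copy of the preceding policy. It is a simplified model, not the API server implementation. The names `audit_level`, `RULES`, and `KNOWN_GROUPS` are hypothetical, the group list is shortened, and the sketch ignores userGroups, namespaces, and nonResourceURLs.

```python
from typing import Dict, List

# Shortened, hypothetical stand-in for the API group list in the policy.
KNOWN_GROUPS = ["", "apps", "batch"]

# A reduced copy of the policy rules, kept in the same order.
RULES: List[Dict] = [
    {"level": "None", "users": ["system:kube-proxy"], "verbs": ["watch"],
     "resources": [{"group": "", "resources": ["endpoints", "services"]}]},
    {"level": "None",
     "resources": [{"group": "", "resources": ["events"]}]},
    {"level": "Metadata",
     "resources": [{"group": "", "resources": ["secrets", "configmaps"]}]},
    {"level": "Request", "verbs": ["get", "list", "watch"],
     "resources": [{"group": g} for g in KNOWN_GROUPS]},
    {"level": "RequestResponse",
     "resources": [{"group": g} for g in KNOWN_GROUPS]},
    {"level": "Metadata"},  # catch-all rule for all other requests
]


def _resource_matches(rule: Dict, group: str, resource: str) -> bool:
    """A rule without resource selectors matches every resource."""
    selectors = rule.get("resources")
    if not selectors:
        return True
    for selector in selectors:
        if selector["group"] != group:
            continue
        names = selector.get("resources")
        # A selector without resource names matches the whole API group.
        if not names or resource in names:
            return True
    return False


def _matches(rule: Dict, user: str, verb: str, group: str, resource: str) -> bool:
    """An absent users or verbs field matches any value."""
    if "users" in rule and user not in rule["users"]:
        return False
    if "verbs" in rule and verb not in rule["verbs"]:
        return False
    return _resource_matches(rule, group, resource)


def audit_level(user: str, verb: str, group: str, resource: str) -> str:
    """Return the level of the first rule that matches the request."""
    for rule in RULES:
        if _matches(rule, user, verb, group, resource):
            return rule["level"]
    # Not reached while RULES ends with a catch-all rule; fallback of this sketch.
    return "None"
```

In this model, `audit_level("alice", "get", "", "secrets")` returns `Metadata` because the rule for Secrets appears before the broader Request rule. Moving the Secrets rule below the Request rule would change the result.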

Use the audit log metadata

Each event in the Kubernetes audit log can carry two annotations: authorization.k8s.io/decision and authorization.k8s.io/reason. The authorization.k8s.io/decision annotation indicates whether a request is authorized. The authorization.k8s.io/reason annotation indicates the reason for the decision. You can use these annotations to determine why a specific API call was allowed or denied.
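
As an illustration of how these annotations can be used, the following Python sketch scans audit events stored as one JSON object per line and keeps the requests whose decision is `forbid`. The function name `denied_requests` and the sample events are hypothetical. The sketch assumes the standard audit event fields `user.username`, `verb`, `requestURI`, and `annotations`. How you export the events from Log Service is outside the scope of the sketch.

```python
import json
from typing import Dict, Iterable, Iterator

DECISION = "authorization.k8s.io/decision"
REASON = "authorization.k8s.io/reason"


def denied_requests(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield a short summary of every audit event that was denied."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        annotations = event.get("annotations") or {}
        if annotations.get(DECISION) != "forbid":
            continue
        yield {
            "user": (event.get("user") or {}).get("username"),
            "verb": event.get("verb"),
            "uri": event.get("requestURI"),
            "reason": annotations.get(REASON, ""),
        }


if __name__ == "__main__":
    # Hypothetical sample events for illustration, not real log output.
    sample = [
        json.dumps({
            "verb": "get",
            "requestURI": "/api/v1/namespaces/default/secrets",
            "user": {"username": "alice"},
            "annotations": {DECISION: "forbid", REASON: ""},
        }),
        json.dumps({
            "verb": "list",
            "requestURI": "/api/v1/pods",
            "user": {"username": "bob"},
            "annotations": {DECISION: "allow", REASON: "sample reason"},
        }),
    ]
    for item in denied_requests(sample):
        print(item)
```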

Use node-problem-detector with the Kubernetes event center of Log Service to identify abnormal cluster events

node-problem-detector is a tool maintained by ACK that diagnoses Kubernetes nodes. It detects node exceptions, generates node events, and works with kube-eventer to raise alerts on these events, which enables closed-loop management of alerts. node-problem-detector generates node events when it detects the following exceptions: Docker engine hangs, Linux kernel hangs, outbound traffic exceptions, and file descriptor exceptions.

In addition to the node issues and exceptions detected by node-problem-detector, a Kubernetes cluster also generates events when the status of the cluster changes. For example, a Kubernetes cluster generates events when a pod is evicted or when an image fails to be pulled.

The Kubernetes event center of Log Service collects all events generated in Kubernetes clusters and provides the following capabilities: storage, query, analytics, visualization, and alerting. The Kubernetes event center helps O&M administrators identify issues that may affect cluster stability as well as abnormal events, such as regular users running the exec command to log on to specific containers. For more information, see Event monitoring.
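
To give a rough idea of the kind of analysis the event center performs, the following Python sketch counts Warning events by reason. It assumes that each event is a dictionary with the `type` and `reason` fields of a Kubernetes Event object, for example the items returned by `kubectl get events -o json`. The function name `warning_reasons` and the sample events are hypothetical. This is not how the Kubernetes event center is implemented.

```python
from collections import Counter
from typing import Dict, Iterable


def warning_reasons(events: Iterable[Dict]) -> Counter:
    """Count events of type Warning, grouped by their reason field."""
    return Counter(
        event.get("reason", "")
        for event in events
        if event.get("type") == "Warning"
    )


if __name__ == "__main__":
    # Hypothetical sample events for illustration only.
    sample = [
        {"type": "Warning", "reason": "Evicted",
         "involvedObject": {"kind": "Pod", "name": "web-1"}},
        {"type": "Warning", "reason": "Failed",
         "involvedObject": {"kind": "Pod", "name": "web-2"}},
        {"type": "Warning", "reason": "Evicted",
         "involvedObject": {"kind": "Pod", "name": "web-3"}},
        {"type": "Normal", "reason": "Scheduled",
         "involvedObject": {"kind": "Pod", "name": "web-4"}},
    ]
    for reason, count in warning_reasons(sample).most_common():
        print(reason, count)
```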

Enable the Ingress dashboard

Ingress controllers of ACK can stream all HTTP request log data to standard output. ACK is also integrated with Log Service, so you can create dashboards to monitor and analyze the log data. The Ingress dashboard displays the following information about the status of Ingresses in a cluster: the number of page views (PVs), the number of unique visitors (UVs), inbound and outbound traffic, the average latency, and top URLs. This helps you gain insights into service traffic and detect malicious traffic and DDoS attacks early. For more information, see Ingress Dashboard.
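
The following Python sketch shows how metrics such as PVs, UVs, and top URLs can be derived from parsed access log records. It rests on several assumptions: each record is a dictionary with the hypothetical keys `client_ip` and `url`, a unique visitor is approximated by a distinct client IP address, and the function name `summarize` is made up for this example. The Ingress dashboard may compute these metrics differently.

```python
from collections import Counter
from typing import Dict, List


def summarize(records: List[Dict[str, str]], top_n: int = 3) -> Dict:
    """Compute page views, unique visitors, and the most requested URLs."""
    page_views = len(records)
    # Assumption of this sketch: one distinct client IP is one visitor.
    unique_visitors = len({record["client_ip"] for record in records})
    top_urls = Counter(record["url"] for record in records).most_common(top_n)
    return {"pv": page_views, "uv": unique_visitors, "top_urls": top_urls}


if __name__ == "__main__":
    # Hypothetical parsed records for illustration only.
    sample = [
        {"client_ip": "198.51.100.7", "url": "/cart"},
        {"client_ip": "198.51.100.7", "url": "/cart"},
        {"client_ip": "203.0.113.9", "url": "/cart"},
        {"client_ip": "203.0.113.9", "url": "/login"},
    ]
    print(summarize(sample))
```

A sudden rise in PVs without a matching rise in UVs means that a small number of clients send most of the requests, which is one pattern worth investigating as possible malicious traffic.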

Enable logging for CoreDNS

CoreDNS is deployed in ACK clusters and serves as the cluster DNS server. You can check the CoreDNS log to locate the causes of slow DNS resolution. You can also view the analytical report of the CoreDNS log in Log Service dashboards to identify DNS queries for high-risk domain names. For more information, see Monitor CoreDNS and analyze the CoreDNS log.