Enable API Server Audit Logs to Detect Cluster Security Threats - Container Service for Kubernetes

Prerequisites

Before you begin, ensure that you have:

Sufficient Simple Log Service (SLS) resource quotas in your account. You cannot enable audit logging if any of the following quotas is exceeded:
- Number of SLS projects
- Number of Logstores per SLS project
- Number of dashboards per SLS project
To check and adjust quotas, see Adjust resource quotas.

Step 1: Enable API server auditing

When you create an ACK cluster, Enable Log Service is selected by default, which enables the API server auditing feature. If you didn't enable it during cluster creation, follow these steps.

Log on to the ACS console. In the left-side navigation pane, click Clusters.
On the Clusters page, click the ID of the cluster you want to manage. In the left-side navigation pane of the cluster details page, choose Security > Audit.
If cluster logging or cluster auditing is not enabled, follow the on-screen instructions to select an SLS project and enable auditing.

Step 2: View audit reports

ACK provides four built-in audit log reports on the Cluster Auditing tab. Filter audit events by namespace or Resource Access Management (RAM) user to narrow results.

Important

Do not modify the built-in audit reports. To create custom reports, use the Simple Log Service console.

After viewing report results, click the icon in the upper-right corner of any chart area for additional options, such as full-screen view or query statement preview.

Audit center overview

Displays a summary of cluster events, with details on high-priority events: RAM user operations, public network access, command execution in containers, resource deletion, Secret access, and Kubernetes CVE security risks.

Resource operation overview

Shows create, update, delete, and access statistics for common resource types:

Computing: Deployments, StatefulSets, Jobs, CronJobs, pods, and DaemonSets
Network: Services and Ingresses
Storage: ConfigMaps, Secrets, and PersistentVolumeClaims
Access control: Roles, ClusterRoles, RoleBindings, and ClusterRoleBindings

Resource operation details

Displays a detailed operation list for a specific resource type. Select or type a resource type to run a real-time query. The report shows the total event count, namespace distribution, success rate, time-series trend, and a full operation list.

Note

To query CustomResourceDefinition (CRD) resources or other unlisted types, enter the plural form of the resource name. For example, for the AliyunLogConfig CRD, enter AliyunLogConfigs.

Kubernetes CVE security risks

Shows potential Kubernetes CVE security risks in the cluster. Filter by RAM user ID to scope results to a specific account. For CVE details and fix guidance, see [CVE Security] Vulnerability Fix Announcement.

(Optional) Step 3: View detailed log records

For custom queries and deeper analysis, view raw audit log records in the SLS console.

Note

Audit log data is retained for 30 days by default. To change the retention period, see Manage a Logstore.

Log on to the ACS console. In the left-side navigation pane, click Clusters.
On the Clusters page, click the name of the cluster you want to manage. In the left-side navigation pane of the cluster details page, click Cluster Information.
On the Basic Information tab, find the Cluster Resources section and click the project ID next to Log Service Project. In the project list, click the Logstore named audit-${clusterid}.

Important

Indexes are preconfigured for the audit Logstore. Do not modify them, because changes may break the built-in reports.

Enter a query statement in the search box, set a time range (for example, the last 15 minutes), and click Search & Analyze.

Common query patterns

What you want to find	Query statement
All operations by a RAM user	Enter the RAM user ID
All operations on a specific resource	Enter the resource name (Deployment, Service, Secret, etc.)
Operations with system components excluded	`NOT user.username: node NOT user.username: serviceaccount NOT user.username: apiserver NOT user.username: kube-scheduler NOT user.username: kube-controller-manager`

For the full query and analysis syntax, see Query and analysis methods for Simple Log Service.
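
The exclusion pattern in the table is one NOT clause per system component. As a minimal sketch, the statement can be generated from a list of keywords so that it is easier to maintain. The helper name `build_exclusion_query` and the constant `SYSTEM_USER_KEYWORDS` are hypothetical and not part of any SLS SDK; only the resulting query text comes from the table above.

```python
# Hypothetical helper: builds the search statement that excludes system components.
SYSTEM_USER_KEYWORDS = [
    "node",
    "serviceaccount",
    "apiserver",
    "kube-scheduler",
    "kube-controller-manager",
]


def build_exclusion_query(keywords, field="user.username"):
    """Return a search statement with one NOT clause per keyword."""
    return " ".join(f"NOT {field}: {keyword}" for keyword in keywords)


query = build_exclusion_query(SYSTEM_USER_KEYWORDS)
print(query)
```

Paste the printed statement into the search box, or append further clauses for components specific to your cluster.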

(Optional) Step 4: Configure alerts

Use the SLS alerting feature to get real-time notifications when specific operations occur. Supported notification channels include DingTalk chatbots, custom webhooks, and the Notification Center. For setup instructions, see Quickly set log-based alerts.

Example 1: Alert on exec commands in containers

Alert when a user runs a command in a container (kubectl exec). The alert includes the container, command, operator, event ID, timestamp, and source IP address.

Query statement:

verb : create and objectRef.subresource:exec and stage: ResponseStarted |
SELECT auditID as "Event ID",
  date_format(from_unixtime(__time__), '%Y-%m-%d %T') as "Operation Time",
  regexp_extract("requestURI", '([^\?]*)/exec\?.*', 1) as "Resource",
  regexp_extract("requestURI", '\?(.*)', 1) as "Command",
  "responseStatus.code" as "Status Code",
  CASE
    WHEN "user.username" != 'kubernetes-admin' then "user.username"
    WHEN "user.username" = 'kubernetes-admin' and regexp_like("annotations.authorization.k8s.io/reason", 'RoleBinding') then regexp_extract("annotations.authorization.k8s.io/reason", ' to User "(\w+)"', 1)
    ELSE 'kubernetes-admin'
  END as "Operator Account",
  CASE WHEN json_array_length(sourceIPs) = 1 then json_format(json_array_get(sourceIPs, 0)) ELSE sourceIPs END as "Source Address"
order by "Operation Time" desc
limit 10000

Conditional expression: Operation Event =~ ".*"
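
To see what the two `regexp_extract` calls in the query produce, the following sketch applies the same patterns to a request URI with Python's `re` module. It assumes that `re.search` behaves like `regexp_extract` for these patterns. The function name `split_exec_uri` and the sample URI are hypothetical and for illustration only.

```python
import re


def split_exec_uri(request_uri):
    """Split a pod exec request URI into (resource path, raw query string)."""
    # Same pattern as the "Resource" column: everything before "/exec?".
    resource = re.search(r"([^\?]*)/exec\?.*", request_uri)
    # Same pattern as the "Command" column: everything after the first "?".
    command = re.search(r"\?(.*)", request_uri)
    return (
        resource.group(1) if resource else None,
        command.group(1) if command else None,
    )


# Illustrative URI only, not taken from a real cluster.
sample_uri = "/api/v1/namespaces/default/pods/nginx/exec?command=ls&container=nginx&stdout=true"
resource, command = split_exec_uri(sample_uri)
print(resource)
print(command)
```

In this sketch, the "Command" value is the raw query string of the request, so it also carries parameters such as the container name alongside the command itself.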

Example 2: Alert on failed public network access to the API server

Alert when access from a public network IP reaches a specific threshold, such as 10 attempts, and the failure rate is higher than a specific threshold, such as 50%. The alert includes the source IP, geographic location, and whether the IP is flagged as high-risk.

Query statement:

* | select ip as "Source Address", total as "Access Count", round(rate * 100, 2) as "Failure Rate %", failCount as "Illegal Access Count",
  CASE when security_check_ip(ip) = 1 then 'yes' else 'no' end as "Is High-Risk IP",
  ip_to_country(ip) as "Country", ip_to_province(ip) as "Province", ip_to_city(ip) as "City", ip_to_provider(ip) as "Carrier"
from (
  select CASE WHEN json_array_length(sourceIPs) = 1 then json_format(json_array_get(sourceIPs, 0)) ELSE sourceIPs END as ip,
    count(1) as total,
    sum(CASE WHEN "responseStatus.code" < 400 then 0 ELSE 1 END) * 1.0 / count(1) as rate,
    count_if("responseStatus.code" = 403) as failCount
  from log
  group by ip
  limit 10000
)
where ip_to_domain(ip) != 'intranet' and ip not LIKE '%,%' and not try(is_subnet_of('<Your subnet IP address>', ip))
ORDER by "Access Count" desc
limit 10000

Conditional expression: Source Address =~ ".*"
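
The aggregation in this example can also be reproduced offline, for instance to tune thresholds against exported logs. The sketch below assumes each event is a dictionary with `sourceIPs` and `responseStatus.code`, as in the audit log fields used by the query. The function names and sample data are hypothetical, the private-address check is only a rough stand-in for `ip_to_domain(ip) != 'intranet'`, and the thresholds of 10 attempts and 50% are the example values mentioned above.

```python
import ipaddress
from collections import defaultdict


def summarize_public_access(events, min_total=10, min_fail_rate=0.5):
    """Aggregate audit events per source IP and keep suspicious public addresses."""
    stats = defaultdict(lambda: {"total": 0, "failed": 0, "forbidden": 0})
    for event in events:
        source_ips = event.get("sourceIPs", [])
        if len(source_ips) != 1:
            continue  # several addresses (proxied request); the query drops these with NOT LIKE '%,%'
        ip = source_ips[0]
        if ipaddress.ip_address(ip).is_private:
            continue  # rough stand-in for the intranet filter in the query
        code = event["responseStatus"]["code"]
        entry = stats[ip]
        entry["total"] += 1
        entry["failed"] += 1 if code >= 400 else 0
        entry["forbidden"] += 1 if code == 403 else 0

    rows = []
    for ip, entry in stats.items():
        rate = entry["failed"] / entry["total"]
        if entry["total"] >= min_total and rate > min_fail_rate:
            rows.append({
                "ip": ip,
                "total": entry["total"],
                "fail_rate_pct": round(rate * 100, 2),
                "forbidden": entry["forbidden"],
            })
    # Highest access count first, like ORDER BY "Access Count" desc.
    return sorted(rows, key=lambda row: row["total"], reverse=True)


def make_events(ip, code, count):
    """Build illustrative events; the addresses are sample values only."""
    return [{"sourceIPs": [ip], "responseStatus": {"code": code}} for _ in range(count)]


sample = (
    make_events("8.8.8.8", 403, 8)
    + make_events("8.8.8.8", 200, 4)
    + make_events("10.0.0.5", 403, 20)
)
rows = summarize_public_access(sample)
print(rows)
```

Unlike the query, this sketch applies the count and failure-rate thresholds directly in code instead of in the alert's conditional expression.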

More operations

Change the log project

To migrate audit log data to a different SLS project:

Log on to the ACS console. In the left-side navigation pane, click Clusters.
On the Clusters page, click the ID of the cluster you want to manage. In the left-side navigation pane of the cluster details page, choose Security > Audit.
In the upper-right corner of the Cluster Auditing tab, click Change Log Service Project.

Disable API server auditing

Log on to the ACS console. In the left-side navigation pane, click Clusters.
On the Clusters page, click the ID of the cluster you want to manage. In the left-side navigation pane of the cluster details page, choose Security > Audit.
In the upper-right corner of the Cluster Auditing tab, click Disable Cluster Auditing.

Use a third-party logging solution in an ACK cluster

SLS is the default logging solution for ACK cluster audit logs. To use a third-party logging service instead, opt out of SLS during cluster creation and connect your preferred solution to collect and retrieve audit logs.

Reference: API server audit configuration

When Enable Log Service is selected during cluster creation, the API server auditing feature is enabled. It collects event data based on an audit policy and writes events to the log backend.

Audit policy

An audit policy defines which requests are collected and at what level of detail. ACK uses the --audit-policy-file flag to apply the policy at API server startup.

The following audit levels are available:

Audit level	What is collected	Example events
None	Nothing — events matching this rule are skipped	`kube-proxy` watch on endpoints/services; kubelet GET on nodes; health check URLs (`/healthz`, `/version`, `/swagger`)
Metadata	Request metadata only (user, timestamp, resource, verb) — no request or response body	Secret and ConfigMap access; TokenReview requests
Request	Metadata and request body — no response body. Does not apply to non-resource requests.	GET, list, and watch operations on standard Kubernetes API groups
RequestResponse	Metadata, request body, and response body. Does not apply to non-resource requests.	Create, update, and delete operations on standard Kubernetes API groups

Note

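The policy omits the RequestReceived stage, so no audit event is generated when the API server first receives a request. Events are recorded at later stages, for example ResponseStarted, which occurs after the response headers are sent.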

The following YAML shows the default audit policy used by ACK clusters:

apiVersion: audit.k8s.io/v1 # Required. The value is audit.k8s.io/v1 for clusters of Kubernetes v1.24 or later, and audit.k8s.io/v1beta1 for clusters of earlier versions.
kind: Policy
# Do not generate audit events for requests at the RequestReceived stage.
omitStages:
  - "RequestReceived"
rules:
  # The following types of requests are frequent and have low potential risks. We recommend that you set the level to None to skip auditing.
  - level: None
    users: ["system:kube-proxy"]
    verbs: ["watch"]
    resources:
      - group: "" # core
        resources: ["endpoints", "services"]
  - level: None
    users: ["system:unsecured"]
    namespaces: ["kube-system"]
    verbs: ["get"]
    resources:
      - group: "" # core
        resources: ["configmaps"]
  - level: None
    users: ["kubelet"] # legacy kubelet identity
    verbs: ["get"]
    resources:
      - group: "" # core
        resources: ["nodes"]
  - level: None
    userGroups: ["system:nodes"]
    verbs: ["get"]
    resources:
      - group: "" # core
        resources: ["nodes"]
  - level: None
    users:
      - system:kube-controller-manager
      - system:kube-scheduler
      - system:serviceaccount:kube-system:endpoint-controller
    verbs: ["get", "update"]
    namespaces: ["kube-system"]
    resources:
      - group: "" # core
        resources: ["endpoints"]
  - level: None
    users: ["system:apiserver"]
    verbs: ["get"]
    resources:
      - group: "" # core
        resources: ["namespaces"]
  # For read-only URLs, such as /healthz*, /version, and /swagger*, set the level to None to skip auditing.
  - level: None
    nonResourceURLs:
      - /healthz*
      - /version
      - /swagger*
  # Set the level to None for events to skip auditing.
  - level: None
    resources:
      - group: "" # core
        resources: ["events"]
  # For interfaces such as Secrets, ConfigMaps, and TokenReviews that may contain sensitive information or binary files, set the level to Metadata.
  - level: Metadata
    resources:
      - group: "" # core
        resources: ["secrets", "configmaps"]
      - group: authentication.k8s.io
        resources: ["tokenreviews"]
  # Requests may return large amounts of data. Set the level to Request to not collect the response body.
  - level: Request
    verbs: ["get", "list", "watch"]
    resources:
      - group: "" # core
      - group: "admissionregistration.k8s.io"
      - group: "apps"
      - group: "authentication.k8s.io"
      - group: "authorization.k8s.io"
      - group: "autoscaling"
      - group: "batch"
      - group: "certificates.k8s.io"
      - group: "extensions"
      - group: "networking.k8s.io"
      - group: "policy"
      - group: "rbac.authorization.k8s.io"
      - group: "settings.k8s.io"
      - group: "storage.k8s.io"
  # For known Kubernetes APIs, the level is set to RequestResponse by default to record the request and response bodies.
  - level: RequestResponse
    resources:
      - group: "" # core
      - group: "admissionregistration.k8s.io"
      - group: "apps"
      - group: "authentication.k8s.io"
      - group: "authorization.k8s.io"
      - group: "autoscaling"
      - group: "batch"
      - group: "certificates.k8s.io"
      - group: "extensions"
      - group: "networking.k8s.io"
      - group: "policy"
      - group: "rbac.authorization.k8s.io"
      - group: "settings.k8s.io"
      - group: "storage.k8s.io"
  # For all other requests, the level is set to Metadata by default.
  - level: Metadata
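
Rule order matters in this policy: Kubernetes evaluates the rules from top to bottom, and the first matching rule sets the audit level of the event. The following sketch illustrates that behavior on a reduced copy of the rules above. It is a simplification and not the API server's implementation: it ignores `namespaces`, `userGroups`, and `nonResourceURLs`, and the function names are hypothetical.

```python
# Reduced copy of the policy rules, kept in the same order as the YAML.
RULES = [
    {"level": "None", "users": ["system:kube-proxy"], "verbs": ["watch"],
     "resources": [{"group": "", "resources": ["endpoints", "services"]}]},
    {"level": "None", "resources": [{"group": "", "resources": ["events"]}]},
    {"level": "Metadata", "resources": [
        {"group": "", "resources": ["secrets", "configmaps"]},
        {"group": "authentication.k8s.io", "resources": ["tokenreviews"]}]},
    {"level": "Request", "verbs": ["get", "list", "watch"],
     "resources": [{"group": ""}, {"group": "apps"}]},
    {"level": "RequestResponse", "resources": [{"group": ""}, {"group": "apps"}]},
    {"level": "Metadata"},  # catch-all rule
]


def rule_matches(rule, user, verb, group, resource):
    """Return True if every selector present in the rule matches the request."""
    if "users" in rule and user not in rule["users"]:
        return False
    if "verbs" in rule and verb not in rule["verbs"]:
        return False
    if "resources" in rule:
        for entry in rule["resources"]:
            if entry["group"] != group:
                continue
            names = entry.get("resources")
            if names is None or resource in names:  # no list means the whole group
                return True
        return False
    return True


def audit_level(rules, user, verb, group, resource):
    """Return the level of the first matching rule."""
    for rule in rules:
        if rule_matches(rule, user, verb, group, resource):
            return rule["level"]
    return "None"  # assumption for this sketch: unmatched requests are not audited


print(audit_level(RULES, "alice", "get", "", "secrets"))
print(audit_level(RULES, "alice", "create", "apps", "deployments"))
```

Because Secrets match the Metadata rule before the broader Request and RequestResponse rules, their contents are never written to the audit log, even though they belong to the core API group.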

Audit backend

After collection, audit events are written to the log backend file system in standard JSON format.
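
As a rough illustration of that JSON format, the sketch below parses one audit event and extracts the fields that the queries in this topic rely on, such as `auditID`, `verb`, `user.username`, and `sourceIPs`. The sample event is hand-written for illustration, with placeholder values, and the function name `summarize_event` is hypothetical.

```python
import json

# Hand-written sample event; all values are placeholders.
SAMPLE_EVENT = """
{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "level": "Metadata",
  "auditID": "00000000-0000-0000-0000-000000000000",
  "stage": "ResponseComplete",
  "requestURI": "/api/v1/namespaces/default/secrets/db-password",
  "verb": "get",
  "user": {"username": "alice", "groups": ["system:authenticated"]},
  "sourceIPs": ["192.0.2.10"],
  "objectRef": {"resource": "secrets", "namespace": "default", "name": "db-password", "apiVersion": "v1"},
  "responseStatus": {"code": 200}
}
"""


def summarize_event(raw):
    """Parse one JSON audit event and return the fields used for analysis."""
    event = json.loads(raw)
    ref = event.get("objectRef", {})
    return {
        "audit_id": event.get("auditID"),
        "stage": event.get("stage"),
        "verb": event.get("verb"),
        "user": event.get("user", {}).get("username"),
        "resource": ref.get("resource"),
        "namespace": ref.get("namespace"),
        "name": ref.get("name"),
        "status_code": event.get("responseStatus", {}).get("code"),
        "source_ips": event.get("sourceIPs", []),
    }


summary = summarize_event(SAMPLE_EVENT)
print(summary)
```

A similar parser could feed a third-party logging pipeline if you opt out of SLS, although the collection mechanism itself is outside the scope of this sketch.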