The Kubernetes API server logs every request and response as an audit event. ACK cluster auditing lets you analyze these logs to answer security questions such as who performed an operation, when it happened, which resource was affected, where the request originated, and whether the request was authorized.
Use cluster auditing to trace cluster operation history, investigate anomalies, and meet compliance requirements.
This topic applies to ACK managed clusters, ACK dedicated clusters, and ACK Serverless clusters.
For registered clusters, see Use cluster auditing.
Billing
Audit log data is billed using the pay-by-feature method. On the bills overview page, you can view the billing information about audit log data. For more information, see View your bills and Pay-by-feature.
Enabling cluster auditing generates Simple Log Service (SLS) usage costs. To control costs, monitor your log volume and set a retention period that matches your compliance requirements. The default retention period is 30 days for ACK managed clusters and 365 days for ACK dedicated clusters.
Prerequisites
Before you begin, make sure that the following Simple Log Service (SLS) quotas in your Alibaba Cloud account are sufficient:
-
The quota on SLS projects
-
The quota on Logstores in each SLS project
-
The quota on dashboards in each SLS project
For quota details and how to request increases, see Adjust resource quotas.
Enable cluster auditing
Enable Log Service is selected by default when you create a cluster, so cluster auditing is enabled out of the box. If you cleared this option during cluster creation, follow these steps to enable cluster auditing.
-
Log on to the ACK console. In the left-side navigation pane, click Clusters.
-
On the Clusters page, click the name of the cluster you want to manage. In the left-side pane, choose Security > Cluster Auditing.
-
Follow the on-screen instructions to select an SLS project and enable cluster auditing.
During this process, ACK automatically creates a Logstore named audit-${clustereid} in the project.
Do not modify the default indexes on this Logstore. Changing the indexes prevents audit reports from being generated correctly.
View audit log reports
ACK provides four built-in audit log reports. On the Cluster Auditing page, filter events by namespace or RAM user to focus on specific activity.
Do not modify the built-in audit log reports. To create custom reports, log on to the Simple Log Service console and create new reports there.
Click the icon in the upper-right corner of any chart to view it in full-screen mode or preview its query statement.
Overview
Displays all events in the cluster and detailed information about high-priority events, including RAM user operations, Internet access, command executions, resource deletions, Secret access, and Kubernetes Common Vulnerabilities and Exposures (CVE) vulnerabilities.
Operations overview
Provides statistics on create, update, delete, and read operations across:
-
Computing resources: Deployment, StatefulSet, CronJob, DaemonSet, Job, and Pod
-
Network resources: Service and Ingress
-
Storage resources: ConfigMap, Secret, and PersistentVolumeClaim (PVC)
-
Access control resources: Role, ClusterRole, RoleBinding, and ClusterRoleBinding
Operation details
Shows operation details for a specific resource type. Select or enter a resource type to query in real time. The report displays the total operation count, namespace distribution, success rate, and operation trends over time.
To query CustomResourceDefinition (CRD) resources or resources not listed in the report, enter the plural form of the resource name. For example, to query the AliyunLogConfig CRD, enter AliyunLogConfigs.
CVE vulnerabilities
Displays Kubernetes CVE vulnerabilities in the cluster. Enter a RAM user ID to filter results for that user. For vulnerability details and remediation guidance, see [CVE Securities] CVE vulnerability fixes.
Query detailed log data
For custom queries and deeper analysis, access the raw audit log data in the Simple Log Service console.
The default retention period is 30 days for ACK managed cluster audit logs and 365 days for ACK dedicated cluster audit logs. To change the retention period, see Manage a logstore.
-
Log on to the ACK console. In the left-side navigation pane, click Clusters.
-
On the Clusters page, click the cluster name. In the left-side pane, click Cluster Information.
-
On the Cluster Resources tab, click the project ID next to Log Service Project. In the Logstores list, click the Logstore named audit-${clustereid}.
-
Enter a query statement and set the time range (for example, 15 minutes), then click Search & Analysis.
Common query patterns:
-
By RAM user: Enter the RAM user ID.
-
By resource: Enter the resource name (Deployment, Service, ConfigMap, or similar).
-
Exclude system components: Enter the following statement to filter out noise from system components:
NOT user.username: node NOT user.username: serviceaccount NOT user.username: apiserver NOT user.username: kube-scheduler NOT user.username: kube-controller-manager
For query syntax details, see Query methods.
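If you maintain the exclusion list in a script, the statement above can be generated rather than typed by hand. The following is a minimal sketch: the helper name build_exclusion_query and the keyword list are illustrative assumptions that mirror the statement in this topic, not part of any Simple Log Service SDK.

```python
# Hypothetical helper (not an SLS API): rebuilds the exclusion statement shown
# above from a list of username keywords.
SYSTEM_USER_KEYWORDS = [
    "node",
    "serviceaccount",
    "apiserver",
    "kube-scheduler",
    "kube-controller-manager",
]


def build_exclusion_query(keywords):
    """Return a search statement with one NOT clause per username keyword."""
    return " ".join(f"NOT user.username: {keyword}" for keyword in keywords)


query = build_exclusion_query(SYSTEM_USER_KEYWORDS)
print(query)
```

Paste the printed statement into the search box of the Logstore. Adding a keyword to the list adds one more NOT clause.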
Configure alerting
Set up alert rules in Simple Log Service to get notified in real time when specific operations occur. Supported notification methods include DingTalk chatbots, custom webhooks, and Alibaba Cloud Message Center.
For general setup instructions, see Configure an alert rule in Simple Log Service.
Alert example 1: Commands executed in containers
Suppose an enterprise prohibits users from logging on to containers or running commands in them. This alert fires as soon as a user runs a command in a container. The alert message includes the container, command, user, event ID, time, and source IP address.
Sample query statement:
verb : create and objectRef.subresource:exec and stage: ResponseStarted | SELECT auditID as "Event ID", date_format(from_unixtime(__time__), '%Y-%m-%d %T' ) as "Time", regexp_extract("requestURI", '([^\?]*)/exec\?.*', 1)as "Resource", regexp_extract("requestURI", '\?(.*)', 1)as "Command" ,"responseStatus.code" as "Status code",
CASE
WHEN "user.username" != 'kubernetes-admin' then "user.username"
WHEN "user.username" = 'kubernetes-admin' and regexp_like("annotations.authorization.k8s.io/reason", 'RoleBinding') then regexp_extract("annotations.authorization.k8s.io/reason", ' to User "(\w+)"', 1)
ELSE 'kubernetes-admin' END
as "User account",
CASE WHEN json_array_length(sourceIPs) = 1 then json_format(json_array_get(sourceIPs, 0)) ELSE sourceIPs END
as "Source IP address" order by "Time" desc limit 10000
Condition expression: Event =~ ".*"
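To see what the two regexp_extract calls in the sample query pull out of requestURI, the same patterns can be tried locally. This is a sketch only: the URI below is a made-up example, and parse_exec_uri is a hypothetical helper. Decoding the command arguments with parse_qs is an extra step that the query itself does not perform.

```python
import re
from urllib.parse import parse_qs

# Same patterns as the regexp_extract calls in the sample query.
RESOURCE_RE = re.compile(r"([^\?]*)/exec\?.*")
COMMAND_RE = re.compile(r"\?(.*)")


def parse_exec_uri(request_uri):
    """Return (resource path, raw query string) for an exec requestURI."""
    resource = RESOURCE_RE.search(request_uri)
    command = COMMAND_RE.search(request_uri)
    return (
        resource.group(1) if resource else None,
        command.group(1) if command else None,
    )


# Hypothetical requestURI, for illustration only.
uri = (
    "/api/v1/namespaces/default/pods/web-0/exec"
    "?command=ls&command=-l&container=app&stdout=true"
)
resource, raw_command = parse_exec_uri(uri)

# Extra step (not part of the query): split the query string into arguments.
argv = parse_qs(raw_command).get("command", [])

print(resource)
print(raw_command)
print(argv)
```

The first value corresponds to the "Resource" column and the second to the "Command" column of the alert output.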
Alert example 2: Failed Internet access from the API server
Use this alert to monitor Internet access to the API server of your cluster. It fires when the number of Internet requests reaches the threshold (10) and the failure rate exceeds 50%.
Sample query statement:
* | select ip as "Source IP address", total as "Number of times of Internet access", round(rate * 100, 2) as "Failure rate in percentage", failCount as "Number of times of illegal access", CASE when security_check_ip(ip) = 1 then 'yes' else 'no' end as "Whether the IP address is risky", ip_to_country(ip) as "Country", ip_to_province(ip) as "Province", ip_to_city(ip) as "City", ip_to_provider(ip) as "ISP" from (select CASE WHEN json_array_length(sourceIPs) = 1 then json_format(json_array_get(sourceIPs, 0)) ELSE sourceIPs END
as ip, count(1) as total,
sum(CASE WHEN "responseStatus.code" < 400 then 0
ELSE 1 END) * 1.0 / count(1) as rate,
count_if("responseStatus.code" = 403) as failCount
from log group by ip limit 10000) where ip_to_domain(ip) != 'intranet' and ip not LIKE '%,%' and not try(is_subnet_of('7.0.0.0/8', ip)) ORDER by "Number of times of Internet access" desc limit 10000
Condition expression: Source IP address =~ ".*"
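The aggregation in the sample query can be reproduced on a few records to check the arithmetic. The sketch below makes several assumptions: the events are invented (source IP, status code) pairs, the thresholds (10 requests, 50% failure rate) are applied per source IP, and the intranet and risk lookups (ip_to_domain, security_check_ip, and so on) are left out because they are Simple Log Service functions.

```python
from collections import defaultdict


def summarize_by_ip(events):
    """Aggregate (source_ip, status_code) pairs the way the sample query does."""
    stats = defaultdict(lambda: {"total": 0, "failed": 0, "forbidden": 0})
    for ip, code in events:
        entry = stats[ip]
        entry["total"] += 1
        if code >= 400:          # same split as the CASE WHEN in the query
            entry["failed"] += 1
        if code == 403:          # same as count_if("responseStatus.code" = 403)
            entry["forbidden"] += 1
    return {
        ip: {
            "total": e["total"],
            "failure_rate_pct": round(e["failed"] * 100.0 / e["total"], 2),
            "forbidden": e["forbidden"],
        }
        for ip, e in stats.items()
    }


def ips_to_alert(summary, min_requests=10, min_failure_pct=50.0):
    """Assumed alert condition: enough requests and a failure rate above the limit."""
    return sorted(
        ip
        for ip, s in summary.items()
        if s["total"] >= min_requests and s["failure_rate_pct"] > min_failure_pct
    )


# Invented sample data (documentation-range IP addresses).
events = (
    [("203.0.113.7", 403)] * 8 + [("203.0.113.7", 200)] * 4
    + [("198.51.100.9", 401)] * 3
    + [("192.0.2.1", 500)] * 5 + [("192.0.2.1", 200)] * 5
)
summary = summarize_by_ip(events)
alerts = ips_to_alert(summary)
print(alerts)
```

In this sample only 203.0.113.7 qualifies: 198.51.100.9 has too few requests, and 192.0.2.1 sits exactly at 50%, which does not exceed the limit.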
Manage cluster auditing settings
Change the Simple Log Service project
To migrate audit logs to a different SLS project:
-
Log on to the ACK console. In the left-side navigation pane, click Clusters.
-
On the Clusters page, click the cluster name. In the left-side pane, choose Security > Cluster Auditing.
-
In the upper-right corner, click Change Log Service Project and follow the prompts.
Disable cluster auditing
-
Log on to the ACK console. In the left-side navigation pane, click Clusters.
-
On the Clusters page, click the cluster name. In the left-side pane, choose Security > Cluster Auditing.
-
In the upper-right corner of the Cluster Auditing page, click Disable Cluster Auditing.
Use a third-party log service (ACK dedicated clusters only)
Simple Log Service is the recommended storage for audit logs. To use a third-party log service instead, skip Simple Log Service during cluster creation and integrate the third-party service to collect logs. The raw audit log files are available on master nodes at /var/log/kubernetes/kubernetes.audit in JSON format.
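Before you wire up a third-party collector, it can help to confirm what a record looks like. The sketch below assumes the file holds one JSON audit event per line and uses field names from the Kubernetes audit event format (auditID, verb, user.username, objectRef, responseStatus.code). The sample lines are invented, and iter_audit_events and to_record are hypothetical helpers.

```python
import json


def iter_audit_events(lines):
    """Yield one dict per well-formed JSON line; skip blank or truncated lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            # A line may be incomplete if it is read while still being written.
            continue


def to_record(event):
    """Flatten the fields that a downstream log service is likely to index."""
    object_ref = event.get("objectRef") or {}
    return {
        "auditID": event.get("auditID"),
        "verb": event.get("verb"),
        "user": (event.get("user") or {}).get("username"),
        "resource": object_ref.get("resource"),
        "namespace": object_ref.get("namespace"),
        "code": (event.get("responseStatus") or {}).get("code"),
    }


# Invented sample lines; a real run would iterate over the audit log file.
SAMPLE_LINES = [
    '{"auditID":"a-1","stage":"ResponseComplete","verb":"get",'
    '"user":{"username":"alice"},"sourceIPs":["203.0.113.7"],'
    '"objectRef":{"resource":"secrets","namespace":"default","name":"db"},'
    '"responseStatus":{"code":200}}',
    "",
    '{"auditID":"a-2","stage"',
    '{"auditID":"a-3","stage":"ResponseComplete","verb":"get",'
    '"requestURI":"/apis","user":{"username":"system:anonymous"},'
    '"responseStatus":{"code":403}}',
]

records = [to_record(event) for event in iter_audit_events(SAMPLE_LINES)]
for record in records:
    print(record)
```

To read the real file, pass an open file object for the audit log path to iter_audit_events in place of SAMPLE_LINES.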
Audit policy and backend configuration (ACK dedicated clusters)
When you configure cluster components for an ACK dedicated cluster, Enable Log Service is selected by default. Audit events are collected based on the audit policy and written to the backend log file system.
Audit policy
The audit policy defines which events are collected and at what level of detail. The four audit levels are:
| Audit level | What is collected |
|---|---|
| None | Nothing. Events matching this rule are skipped. |
| Metadata | Request metadata, such as user information and timestamps. The request and response bodies are not collected. |
| Request | Request metadata and request body. The response body is not collected. Does not apply to non-resource requests. |
| RequestResponse | Request metadata, request body, and response body. Does not apply to non-resource requests. |
The audit policy is loaded from /etc/kubernetes/audit-policy.yml on master nodes (set via the --audit-policy-file flag). The following is the default policy used by ACK:
apiVersion: audit.k8s.io/v1 # Required. Set to audit.k8s.io/v1 if the Kubernetes version of the cluster is 1.24 or later and set to audit.k8s.io/v1beta1 if the Kubernetes version of the cluster is earlier than 1.24.
kind: Policy
# No need to generate audit events at the RequestReceived stage.
omitStages:
  - "RequestReceived"
rules:
  # The following types of requests are frequent and the risk of these requests is low. We recommend that you set the rule to None to skip these requests.
  - level: None
    users: ["system:kube-proxy"]
    verbs: ["watch"]
    resources:
      - group: "" # core
        resources: ["endpoints", "services"]
  - level: None
    users: ["system:unsecured"]
    namespaces: ["kube-system"]
    verbs: ["get"]
    resources:
      - group: "" # core
        resources: ["configmaps"]
  - level: None
    users: ["kubelet"] # legacy kubelet identity
    verbs: ["get"]
    resources:
      - group: "" # core
        resources: ["nodes"]
  - level: None
    userGroups: ["system:nodes"]
    verbs: ["get"]
    resources:
      - group: "" # core
        resources: ["nodes"]
  - level: None
    users:
      - system:kube-controller-manager
      - system:kube-scheduler
      - system:serviceaccount:kube-system:endpoint-controller
    verbs: ["get", "update"]
    namespaces: ["kube-system"]
    resources:
      - group: "" # core
        resources: ["endpoints"]
  - level: None
    users: ["system:apiserver"]
    verbs: ["get"]
    resources:
      - group: "" # core
        resources: ["namespaces"]
  # Set the rule to None for read-only URLs, such as /healthz*, /version*, and /swagger*.
  - level: None
    nonResourceURLs:
      - /healthz*
      - /version
      - /swagger*
  # Set the rule to None for events.
  - level: None
    resources:
      - group: "" # core
        resources: ["events"]
  # Set the rule to Metadata for Secrets, ConfigMaps, and TokenReview API requests that may contain sensitive information or binary files.
  - level: Metadata
    resources:
      - group: "" # core
        resources: ["secrets", "configmaps"]
      - group: authentication.k8s.io
        resources: ["tokenreviews"]
  # Responses may contain large amounts of data. Set the rule to Request so that the response body is not collected.
  - level: Request
    verbs: ["get", "list", "watch"]
    resources:
      - group: "" # core
      - group: "admissionregistration.k8s.io"
      - group: "apps"
      - group: "authentication.k8s.io"
      - group: "authorization.k8s.io"
      - group: "autoscaling"
      - group: "batch"
      - group: "certificates.k8s.io"
      - group: "extensions"
      - group: "networking.k8s.io"
      - group: "policy"
      - group: "rbac.authorization.k8s.io"
      - group: "settings.k8s.io"
      - group: "storage.k8s.io"
  # The rule is set to RequestResponse by default for known Kubernetes API requests to collect the request and response bodies.
  - level: RequestResponse
    resources:
      - group: "" # core
      - group: "admissionregistration.k8s.io"
      - group: "apps"
      - group: "authentication.k8s.io"
      - group: "authorization.k8s.io"
      - group: "autoscaling"
      - group: "batch"
      - group: "certificates.k8s.io"
      - group: "extensions"
      - group: "networking.k8s.io"
      - group: "policy"
      - group: "rbac.authorization.k8s.io"
      - group: "settings.k8s.io"
      - group: "storage.k8s.io"
  # The rule is set to Metadata by default for other requests.
  - level: Metadata
Key behaviors of this policy:
-
Logs are generated after response headers are sent, not when requests are received.
-
kube-proxy watch requests, kubelet and system:nodes GET requests to nodes, kube-system endpoint operations, and API server GET requests to namespaces are excluded.
-
Read operations (get, list, watch) for the authentication, rbac, certificates, autoscaling, and storage APIs are logged at the Request level. Write operations on these APIs are logged at RequestResponse.
-
Secrets and ConfigMaps are logged at Metadata only, so their content is never written to audit logs.
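Rule order matters in an audit policy: the first rule that matches a request sets its audit level. The sketch below illustrates that first-match behavior on a trimmed-down, hypothetical subset of the policy expressed as Python dicts. It is a simplification, not the API server's implementation: userGroups and namespaces are ignored, and nonResourceURLs support only a trailing * wildcard.

```python
# Hypothetical, trimmed subset of the policy; rule order is preserved.
POLICY_RULES = [
    {"level": "None", "users": ["system:kube-proxy"], "verbs": ["watch"],
     "resources": [{"group": "", "resources": ["endpoints", "services"]}]},
    {"level": "None", "nonResourceURLs": ["/healthz*", "/version", "/swagger*"]},
    {"level": "None", "resources": [{"group": "", "resources": ["events"]}]},
    {"level": "Metadata",
     "resources": [{"group": "", "resources": ["secrets", "configmaps"]}]},
    {"level": "Request", "verbs": ["get", "list", "watch"],
     "resources": [{"group": ""}, {"group": "apps"}]},
    {"level": "RequestResponse",
     "resources": [{"group": ""}, {"group": "apps"}]},
    {"level": "Metadata"},  # catch-all for everything else
]


def _url_matches(pattern, path):
    """Match a path against a pattern with an optional trailing '*'."""
    if pattern.endswith("*"):
        return path.startswith(pattern[:-1])
    return path == pattern


def _resource_matches(selectors, group, resource):
    """True if a selector names the API group and, if listed, the resource."""
    for selector in selectors:
        if selector["group"] != group:
            continue
        names = selector.get("resources")
        if not names or resource in names:
            return True
    return False


def rule_matches(rule, req):
    """Check every constraint present on the rule against the request."""
    if "users" in rule and req.get("user") not in rule["users"]:
        return False
    if "verbs" in rule and req.get("verb") not in rule["verbs"]:
        return False
    if "resources" in rule:
        if "resource" not in req:  # non-resource request
            return False
        if not _resource_matches(rule["resources"], req.get("group", ""), req["resource"]):
            return False
    if "nonResourceURLs" in rule:
        if "path" not in req:  # resource request
            return False
        if not any(_url_matches(p, req["path"]) for p in rule["nonResourceURLs"]):
            return False
    return True


def audit_level(req, rules=POLICY_RULES):
    """Return the level of the first matching rule, or "None" if none match."""
    for rule in rules:
        if rule_matches(rule, req):
            return rule["level"]
    return "None"


print(audit_level({"user": "alice", "verb": "get", "group": "", "resource": "secrets"}))
print(audit_level({"user": "alice", "verb": "create", "group": "apps", "resource": "deployments"}))
```

Because the Secrets rule comes before the Request and RequestResponse rules, a get on a Secret resolves to Metadata even though the core group also appears in the later rules.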
Audit backend
Audit events are written as JSON log files to the local file system on master nodes. The API server configuration file at /etc/kubernetes/manifests/kube-apiserver.yaml controls the backend behavior with the following flags:
| Flag | Description | Default |
|---|---|---|
| --audit-log-maxbackup | Maximum number of rotated log file shards to retain | 10 |
| --audit-log-maxsize | Maximum size of a single log file before rotation | 100 MB |
| --audit-log-path | Output path for audit log files | /var/log/kubernetes/kubernetes.audit |
| --audit-log-maxage | Retention period for rotated log files | 7 days |
| --audit-policy-file | Path to the audit policy file | /etc/kubernetes/audit-policy.yml |
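When you size the disk on master nodes, the rotation flags give a rough ceiling for audit log usage. The following back-of-the-envelope sketch assumes that the active log file counts in addition to the retained shards and that shards are stored uncompressed. The function name is hypothetical, and actual usage can be lower because shards older than the retention period are removed.

```python
def max_audit_disk_mb(maxbackup, maxsize_mb):
    """Rough upper bound: retained shards plus the active file, each at full size."""
    return (maxbackup + 1) * maxsize_mb


# Defaults from the table above: 10 retained shards, 100 MB per file.
default_ceiling_mb = max_audit_disk_mb(maxbackup=10, maxsize_mb=100)
print(default_ceiling_mb)
```

With the default values this comes to about 1,100 MB per master node.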
What's next
-
To audit commands that are run in containers by using kubectl exec, see Enable container auditing.
-
For best practices for enterprise security operations, see Best security practices.