Auditing, an API Server feature, generates logs that record requests to the Kubernetes API and the results of the requests. ACK provides API Server audit logs to help cluster administrators track who did what to which resource and when. You can use these logs to trace the history of cluster operations, troubleshoot cluster failures, and simplify security operations and maintenance (O&M).
Usage notes
This topic applies to ACK managed clusters, ACK dedicated clusters, and ACK Serverless clusters.
To enable the API Server auditing feature for a registered cluster, see Enable cluster auditing.
Billing
You can view billing details, including audit log fees, on the bills overview page of the Billing Management console. For more information, see View your bills. For more information about the billing rules for audit logs, see Pay-by-feature billing.
Step 1: Enable the API server auditing feature
When you create a Kubernetes cluster, the Enable Log Service option is selected by default to enable the API Server auditing feature. If you did not enable this feature during cluster creation, follow the steps below.
Log on to the ACK console. In the left navigation pane, click Clusters.
On the Clusters page, click the name of your cluster. In the left navigation pane, click .
On the Audit page, click the Cluster Auditing tab.
If you have not enabled cluster logging or cluster auditing, follow the on-screen instructions to select a Simple Log Service project and enable the feature.
Ensure your account has sufficient quotas for Simple Log Service resources. Otherwise, enabling the cluster auditing feature may fail.
- The quota on the number of projects that can be created.
- The quota on the number of Logstores that can be created in a project.
- The quota on the number of dashboards that can be created in a project.
For more information about Simple Log Service quotas and how to increase quotas, see Adjust resource quotas.
Step 2: View audit reports
Do not modify the default audit reports. If you want to create a custom audit report, create a new report on the Simple Log Service console.
ACK provides four built-in audit reports: Audit Center Overview, Resource Operations Overview, Resource Operation Details, and Kubernetes CVE Security Risks. On the Cluster Auditing page, you can specify dimensions such as namespace or RAM user to filter audit events and obtain the following information from the reports.
After you obtain the results, you can click the icon in the upper-right corner of a chart for more options, such as viewing the chart in full screen or previewing the query statement that corresponds to the chart.
Audit center overview
The Audit Center Overview report summarizes events in the ACK cluster and details important activities. These activities include RAM user operations, public-facing access, command executions, resource deletions, Secret access, and Kubernetes CVE security risks.
Resource operations overview
The Resource Operations Overview report provides statistics on common operations (create, update, delete, and access) performed on Kubernetes compute, network, storage, and access control resources. The resources include the following types:
- Compute resources: Deployment, StatefulSet, CronJob, DaemonSet, Job, and Pod.
- Network resources: Service and Ingress.
- Storage resources: ConfigMap, Secret, and PersistentVolumeClaim.
- Access control resources: Role, ClusterRole, RoleBinding, and ClusterRoleBinding.
This page has four tabs: Audit Center Overview, Resource Operations Overview, Resource Operation Details, and Kubernetes CVE Security Risks. You can filter resources by Namespace and specify a time range. Statistics for each resource type are displayed in donut charts that are categorized by operation type.
Resource operation details
This report lists detailed operations on a specific type of resource in the Kubernetes cluster. You can select or enter a resource type to run a real-time query. This report displays information such as the total number of events for each type of resource operation, distribution by namespace, success rate, time-series trend, and a detailed list of operations.
If you want to view the operations on CustomResourceDefinitions (CRDs) registered in Kubernetes or other resources that are not listed, you can enter the plural form of the resource name. For example, if the CRD is AliyunLogConfig, enter AliyunLogConfigs.
Kubernetes CVE security risks
This report displays potential Kubernetes Common Vulnerabilities and Exposures (CVE) security risks in the current cluster. You can run a real-time query by selecting or entering a RAM user ID to view associated CVE risks. For more information about the CVEs and the solutions, see [CVE Security] Vulnerability Fix Announcement.
(Optional) Step 3: View detailed log records
For custom querying and analysis of audit logs, you can go to the Simple Log Service console to view detailed log records.
For an ACK managed cluster, the API Server audit logs are stored in the corresponding Logstore of Simple Log Service for 30 days by default. For an ACK dedicated cluster, the default retention period is 365 days. To change the default retention period, see Manage a Logstore.
Log on to the ACK console. In the left navigation pane, click Clusters.
On the Clusters page, click the name of your cluster. In the left navigation pane, click Cluster Information.
Click the Basic Information tab. Then, click the project ID next to Log Service Project. In the project list, click the Logstore named audit-${clusterid}.
During cluster creation, a Logstore named audit-${clusterid} is automatically created in the specified Simple Log Service project.
Important: Indexes are configured for the audit Logstore by default. Do not modify them, or the reports may become invalid.
Enter a query in the search box, specify a time range such as the last 15 minutes, and then click Search & Analysis to view the results.
The following examples show how to search audit logs:
- Query the operations of a RAM user: enter the RAM user ID and click Search & Analysis.
- Query the operations on a resource: enter the name of a compute, network, storage, or access control resource, and click Search & Analysis.
- Filter out the operations that are performed by system components: enter NOT user.username: node NOT user.username: serviceaccount NOT user.username: apiserver NOT user.username: kube-scheduler NOT user.username: kube-controller-manager and click Search & Analysis.
For more information about query and analysis methods, see Query and analyze logs.
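The exclusion filter in the last example is just one NOT clause per system identity, so it can be generated instead of typed by hand. The following Python snippet is a minimal sketch of that idea; the helper name build_exclusion_query and the SYSTEM_USERS list are illustrative assumptions, and the output is meant to be pasted into the search box, not sent to any API.

```python
# System identities excluded in the example above; extend the list as needed.
SYSTEM_USERS = [
    "node",
    "serviceaccount",
    "apiserver",
    "kube-scheduler",
    "kube-controller-manager",
]


def build_exclusion_query(usernames):
    """Return a search string with one 'NOT user.username: <name>' clause per name."""
    return " ".join(f"NOT user.username: {name}" for name in usernames)


query = build_exclusion_query(SYSTEM_USERS)
print(query)
```

Adding another component to SYSTEM_USERS extends the filter without retyping the whole statement.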
(Optional) Step 4: Configure alerting
To receive real-time alerts for operations on specific resources, you can use the alerting feature of Simple Log Service. Supported notification methods include DingTalk chatbots, custom webhooks, and Notification Center. For more information, see Create an alert rule.
Example 1: Alert on command execution in a container
Scenario: An enterprise prohibits users from executing commands in containers. If a user runs a command, an alert must be sent immediately with details such as the container in which the user is logged on, the executed command, the operator, the event ID, the time, and the source IP address.
- Query statement:
  verb : create and objectRef.subresource:exec and stage: ResponseStarted | SELECT auditID as "Event ID", date_format(from_unixtime(__time__), '%Y-%m-%d %T' ) as "Time", regexp_extract("requestURI", '([^\?]*)/exec\?.*', 1)as "Resource", regexp_extract("requestURI", '\?(.*)', 1)as "Command" ,"responseStatus.code" as "Status Code", CASE WHEN "user.username" != 'kubernetes-admin' then "user.username" WHEN "user.username" = 'kubernetes-admin' and regexp_like("annotations.authorization.k8s.io/reason", 'RoleBinding') then regexp_extract("annotations.authorization.k8s.io/reason", ' to User "(\w+)"', 1) ELSE 'kubernetes-admin' END as "User Account", CASE WHEN json_array_length(sourceIPs) = 1 then json_format(json_array_get(sourceIPs, 0)) ELSE sourceIPs END as "Source Address" order by "Time" desc limit 10000
- Trigger condition: "Event ID" =~ ".*"
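To make the extraction logic of the Example 1 query easier to follow, the following Python snippet is a rough sketch that applies the same regular expressions to a single event. The sample event is hypothetical, the flat dotted keys mirror the field names used in the query, and the function name describe_exec_event is an assumption for illustration. Unlike the query, the sketch keeps kubernetes-admin when the reason text contains no user match.

```python
import re


def describe_exec_event(event):
    """Derive the Resource, Command, User Account, and Source Address columns."""
    uri = event["requestURI"]
    # Everything before "/exec?" identifies the pod that was entered.
    resource = re.search(r"([^?]*)/exec\?.*", uri)
    # Everything after "?" carries the command and container parameters.
    command = re.search(r"\?(.*)", uri)

    user = event["user.username"]
    reason = event.get("annotations.authorization.k8s.io/reason", "")
    # For kubernetes-admin, recover the bound user from the authorization reason.
    if user == "kubernetes-admin" and "RoleBinding" in reason:
        bound = re.search(r' to User "(\w+)"', reason)
        if bound:
            user = bound.group(1)

    ips = event["sourceIPs"]
    return {
        "Event ID": event["auditID"],
        "Resource": resource.group(1) if resource else None,
        "Command": command.group(1) if command else None,
        "User Account": user,
        "Source Address": ips[0] if len(ips) == 1 else ips,
    }


# Hypothetical event for illustration only.
sample_event = {
    "auditID": "example-audit-id",
    "requestURI": "/api/v1/namespaces/default/pods/nginx/exec?command=sh&container=nginx&stdin=true",
    "user.username": "kubernetes-admin",
    "annotations.authorization.k8s.io/reason": 'RBAC: allowed by RoleBinding "dev-binding/default" of Role "dev" to User "2345678"',
    "sourceIPs": ["8.8.8.8"],
}
row = describe_exec_event(sample_event)
print(row)
```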
Example 2: Alert on failed public access to the API server
Scenario: A cluster with public access must be monitored for malicious attacks. An alert is required if access attempts from the internet reach a threshold, such as 10, and the failure rate exceeds a threshold, such as 50%. The alert must include information such as the region of the source IP address, the source IP address, and whether the IP address is a high-risk IP address.
- Query statement:
  * | select ip as "Source Address", total as "Access Attempts", round(rate * 100, 2) as "Failure Rate (%)", failCount as "Illegal Access Attempts", CASE when security_check_ip(ip) = 1 then 'yes' else 'no' end as "High-risk IP", ip_to_country(ip) as "Country", ip_to_province(ip) as "Province", ip_to_city(ip) as "City", ip_to_provider(ip) as "ISP" from (select CASE WHEN json_array_length(sourceIPs) = 1 then json_format(json_array_get(sourceIPs, 0)) ELSE sourceIPs END as ip, count(1) as total, sum(CASE WHEN "responseStatus.code" < 400 then 0 ELSE 1 END) * 1.0 / count(1) as rate, count_if("responseStatus.code" = 403) as failCount from log group by ip limit 10000) where ip_to_domain(ip) != 'intranet' and ip not LIKE '%,%' ORDER by "Access Attempts" desc limit 10000
- Trigger condition: "Source Address" =~ ".*"
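The aggregation in the Example 2 query can be reasoned about offline with a small script. The following Python snippet is a sketch under stated assumptions: events are reduced to (source IP, status code) pairs, the thresholds of 10 attempts and a 50% failure rate come from the scenario text and not from the trigger condition, and Python's ipaddress.is_private check only approximates the intranet filter of the query. The geolocation and high-risk IP lookups are omitted because they rely on Simple Log Service functions.

```python
import ipaddress
from collections import defaultdict

MIN_ATTEMPTS = 10        # scenario threshold for access attempts
MIN_FAILURE_RATE = 0.5   # scenario threshold for the failure rate


def suspicious_sources(events, min_attempts=MIN_ATTEMPTS, min_failure_rate=MIN_FAILURE_RATE):
    """Group (ip, status_code) pairs by IP and return the sources that exceed both thresholds."""
    stats = defaultdict(lambda: {"total": 0, "failed": 0, "forbidden": 0})
    for ip, code in events:
        if ipaddress.ip_address(ip).is_private:
            continue  # rough stand-in for the intranet filter in the query
        entry = stats[ip]
        entry["total"] += 1
        if code >= 400:
            entry["failed"] += 1     # counted toward the failure rate
        if code == 403:
            entry["forbidden"] += 1  # reported as illegal access attempts

    rows = []
    for ip, entry in stats.items():
        rate = entry["failed"] / entry["total"]
        if entry["total"] >= min_attempts and rate > min_failure_rate:
            rows.append({
                "Source Address": ip,
                "Access Attempts": entry["total"],
                "Failure Rate (%)": round(rate * 100, 2),
                "Illegal Access Attempts": entry["forbidden"],
            })
    return sorted(rows, key=lambda r: r["Access Attempts"], reverse=True)


# Hypothetical traffic: one noisy public source, one healthy public source, one internal source.
sample_events = (
    [("8.8.8.8", 403)] * 9 + [("8.8.8.8", 200)] * 3
    + [("1.1.1.1", 200)] * 18 + [("1.1.1.1", 401)] * 2
    + [("10.0.0.5", 403)] * 15
)
alerts = suspicious_sources(sample_events)
print(alerts)
```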
Related operations
Change the Simple Log Service project
To migrate API Server audit logs to a different Simple Log Service project, you can use the Change Log Service Project feature.
Log on to the ACK console. In the left navigation pane, click Clusters.
On the Clusters page, click the name of your cluster. In the left navigation pane, click .
On the Audit page, click the Cluster Auditing tab, and then click Change Log Service Project to migrate the cluster audit logs to another Simple Log Service project.
Disable cluster auditing
If you no longer need the API Server auditing feature, you can disable it.
Log on to the ACK console. In the left navigation pane, click Clusters.
On the Clusters page, click the name of your cluster. In the left navigation pane, click .
On the Audit page, click the Cluster Auditing tab, and then click Disable Cluster Auditing to disable the cluster auditing feature for the current cluster.
Use third-party logging in ACK dedicated clusters
ACK recommends that you use Alibaba Cloud Simple Log Service (SLS) to record cluster audit logs. If you prefer a third-party log service, you do not need to enable SLS when you deploy your cluster. You can integrate another logging solution to collect and query audit logs based on your business requirements. On each control plane node, the audit log source files are in the standard JSON format and are stored at /var/log/kubernetes/kubernetes.audit.
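As a starting point for a custom collector, the following Python snippet is a minimal sketch that summarizes audit events by user, verb, and resource. It assumes that the file contains one JSON event per line and that events carry the user.username, verb, and objectRef.resource fields of the upstream Kubernetes audit event format. The sample lines are hypothetical, and the function name summarize_audit_lines is illustrative.

```python
import json
from collections import Counter


def summarize_audit_lines(lines):
    """Count audit events per (username, verb, resource) from JSON-lines input."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        user = event.get("user", {}).get("username", "unknown")
        verb = event.get("verb", "unknown")
        # Non-resource requests such as /healthz have no objectRef.
        resource = (event.get("objectRef") or {}).get("resource", "non-resource")
        counts[(user, verb, resource)] += 1
    return counts


# Hypothetical events for illustration only.
sample_lines = [
    '{"verb": "delete", "user": {"username": "alice"}, "objectRef": {"resource": "pods", "namespace": "default"}}',
    '{"verb": "delete", "user": {"username": "alice"}, "objectRef": {"resource": "pods", "namespace": "dev"}}',
    '{"verb": "get", "user": {"username": "bob"}, "requestURI": "/healthz"}',
]
summary = summarize_audit_lines(sample_lines)
print(summary.most_common())
```

On a control plane node, the same function could be fed an open file handle for the audit log path mentioned above, because iterating over a text file yields one line at a time.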
Reference: API server audit configurations of ACK dedicated clusters
When you configure cluster components for an ACK dedicated cluster, the Enable Log Service option is selected by default on the console. This enables the API Server auditing feature, which collects event data based on an audit policy and writes the data to an audit backend.
The following features involve changing the startup parameters of kube-apiserver and apply only to ACK dedicated clusters. For ACK managed clusters and ACK Serverless clusters, the control plane is managed by ACK and you cannot manually modify the startup parameters.
Audit policy
An audit policy defines the auditing configuration and rules for collecting requests. Log collection rules vary by audit level. The following audit levels are available.
| Audit level | Log collection rule |
| --- | --- |
| None | Events that match the rule are not collected. |
| Metadata | Collects request metadata, such as user information and timestamps, but does not collect the request body or response body. |
| Request | Collects request metadata and the request body, but not the response body. This does not apply to non-resource requests. |
| RequestResponse | Collects request metadata, the request body, and the response body. This does not apply to non-resource requests. |
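The differences between the audit levels can be summarized in a few lines of code. The following Python snippet is an illustrative sketch, not the API Server implementation. The function name recorded_parts is hypothetical, and the handling of non-resource requests (metadata only) is an assumption based on the note in the table.

```python
LEVELS = ("None", "Metadata", "Request", "RequestResponse")


def recorded_parts(level, is_resource_request=True):
    """Return the parts of a request that an audit rule at the given level records."""
    if level not in LEVELS:
        raise ValueError(f"unknown audit level: {level}")
    if level == "None":
        return set()
    parts = {"metadata"}
    # Assumption: bodies are recorded only for resource requests.
    if is_resource_request:
        if level in ("Request", "RequestResponse"):
            parts.add("request_body")
        if level == "RequestResponse":
            parts.add("response_body")
    return parts


for level in LEVELS:
    print(level, sorted(recorded_parts(level)))
```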
The API Server loads the audit policy from the file that is specified by the --audit-policy-file startup flag. After you log on to a control plane node, you can find the audit policy configuration file at /etc/kubernetes/audit-policy.yml. The following notes describe how the policy in that file behaves.
Logs are not immediately recorded after a request is received. Recording starts only after the response header is sent.
The system does not audit a large number of redundant kube-proxy watch requests, GET requests for nodes from kubelet and system:nodes, endpoint operations that are performed by kube components in the kube-system namespace, or GET requests for namespaces from the API Server.
For sensitive APIs such as authentication, rbac, certificates, autoscaling, and storage, the system records the corresponding request and response bodies based on read and write operations.
Audit backend
Audit events are collected and stored in the audit backend file system. The log files are in the standard JSON format. You can configure and use the following flags as startup parameters for the API Server.
After you log on to a control plane node, you can view the API Server configuration file at /etc/kubernetes/manifests/kube-apiserver.yaml.
| Parameter | Description |
| --- | --- |
| | The maximum number of audit log files that can be stored. The value is 10. |
| | The maximum size of an audit log file before it is rotated. The value is 100 MB. |
| | The path to which audit logs are written. The path is |
| | The maximum number of days to retain old audit log files. The value is 7. |
| | The path to the file that defines the audit policy. The path is |
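The parameter names are not shown in the table above. As a hedged aid, the following Python snippet pairs the described values with the upstream kube-apiserver audit flags that appear to correspond to each row. The flag-to-row mapping is an assumption, and the two paths are taken from other sections of this topic, so verify everything against /etc/kubernetes/manifests/kube-apiserver.yaml on a control plane node before relying on it.

```python
# Assumed mapping of upstream kube-apiserver flags to the values described in the table.
AUDIT_BACKEND_SETTINGS = {
    "--audit-log-maxbackup": "10",    # maximum number of retained audit log files
    "--audit-log-maxsize": "100",     # maximum file size in MB before rotation
    "--audit-log-path": "/var/log/kubernetes/kubernetes.audit",  # path from the third-party logging section
    "--audit-log-maxage": "7",        # maximum number of days to retain old files
    "--audit-policy-file": "/etc/kubernetes/audit-policy.yml",   # path from the audit policy section
}


def to_args(settings):
    """Render the settings as flag=value arguments in insertion order."""
    return [f"{flag}={value}" for flag, value in settings.items()]


args = to_args(AUDIT_BACKEND_SETTINGS)
for arg in args:
    print(arg)
```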
Related documents
- To audit commands that are executed inside a container by using kubectl exec, see Use the container operation auditing feature.
- For security best practices for enterprise O&M personnel, see Security best practices.