The audit log of an API server in a Kubernetes cluster helps administrators track operations performed by different users. This plays an important role in the security and maintenance of the cluster. This topic describes how to configure cluster auditing, how to use Log Service to collect and analyze audit logs, how to set custom alert rules based on audit logs, and how to disable cluster auditing.
Usage notes
- The cluster auditing feature is suitable for Container Service for Kubernetes (ACK) managed clusters and ACK dedicated clusters. For more information about how to configure cluster auditing for registered clusters, see Use cluster auditing in registered clusters.
- Cluster auditing cannot be disabled for a registered Kubernetes cluster after you enable this feature.
Configure parameters for cluster auditing
Parameter | Description |
---|---|
audit-log-maxbackup | A maximum of 10 audit log files can be retained. |
audit-log-maxsize | The maximum size of an audit log file is 100 MB. |
audit-log-path | The audit log files are stored in the /var/log/kubernetes/kubernetes.audit path. |
audit-log-maxage | Audit log files are retained for a maximum of seven days. |
audit-policy-file | The path of the audit policy file is /etc/kubernetes/audit-policy.yml. |
apiVersion: audit.k8s.io/v1beta1 # This is required.
kind: Policy
# Don't generate audit events for all requests in RequestReceived stage.
omitStages:
  - "RequestReceived"
rules:
  # The following requests were manually identified as high-volume and low-risk,
  # so drop them.
  - level: None
    users: ["system:kube-proxy"]
    verbs: ["watch"]
    resources:
      - group: "" # core
        resources: ["endpoints", "services"]
  - level: None
    users: ["system:unsecured"]
    namespaces: ["kube-system"]
    verbs: ["get"]
    resources:
      - group: "" # core
        resources: ["configmaps"]
  - level: None
    users: ["kubelet"] # legacy kubelet identity
    verbs: ["get"]
    resources:
      - group: "" # core
        resources: ["nodes"]
  - level: None
    userGroups: ["system:nodes"]
    verbs: ["get"]
    resources:
      - group: "" # core
        resources: ["nodes"]
  - level: None
    users:
      - system:kube-controller-manager
      - system:kube-scheduler
      - system:serviceaccount:kube-system:endpoint-controller
    verbs: ["get", "update"]
    namespaces: ["kube-system"]
    resources:
      - group: "" # core
        resources: ["endpoints"]
  - level: None
    users: ["system:apiserver"]
    verbs: ["get"]
    resources:
      - group: "" # core
        resources: ["namespaces"]
  # Don't log these read-only URLs.
  - level: None
    nonResourceURLs:
      - /healthz*
      - /version
      - /swagger*
  # Don't log events requests.
  - level: None
    resources:
      - group: "" # core
        resources: ["events"]
  # Secrets, ConfigMaps, and TokenReviews can contain sensitive & binary data,
  # so only log at the Metadata level.
  - level: Metadata
    resources:
      - group: "" # core
        resources: ["secrets", "configmaps"]
      - group: authentication.k8s.io
        resources: ["tokenreviews"]
  # Get responses can be large; skip them.
  - level: Request
    verbs: ["get", "list", "watch"]
    resources:
      - group: "" # core
      - group: "admissionregistration.k8s.io"
      - group: "apps"
      - group: "authentication.k8s.io"
      - group: "authorization.k8s.io"
      - group: "autoscaling"
      - group: "batch"
      - group: "certificates.k8s.io"
      - group: "extensions"
      - group: "networking.k8s.io"
      - group: "policy"
      - group: "rbac.authorization.k8s.io"
      - group: "settings.k8s.io"
      - group: "storage.k8s.io"
  # Default level for known APIs
  - level: RequestResponse
    resources:
      - group: "" # core
      - group: "admissionregistration.k8s.io"
      - group: "apps"
      - group: "authentication.k8s.io"
      - group: "authorization.k8s.io"
      - group: "autoscaling"
      - group: "batch"
      - group: "certificates.k8s.io"
      - group: "extensions"
      - group: "networking.k8s.io"
      - group: "policy"
      - group: "rbac.authorization.k8s.io"
      - group: "settings.k8s.io"
      - group: "storage.k8s.io"
  # Default level for all other requests.
  - level: Metadata
- Requests are not logged upon reception. Requests are logged only after response headers are sent.
- The following types of requests are not logged: watch requests by kube-proxy for endpoints and Services, GET requests by system:unsecured for ConfigMaps in the kube-system namespace, GET requests by kubelet and system:nodes for node resources, GET and update operations on endpoint resources in the kube-system namespace by kube components, GET requests by kube-apiserver for namespace resources, and all requests for events.
- Requests with read-only URLs that match /healthz*, /version, or /swagger* are not logged.
- Requests for Secrets, ConfigMaps, and TokenReview resources are logged at the metadata level because these resources may contain sensitive information or binary files. Cluster auditing logs only the request metadata, such as requesting users, timestamps, requested resources, and actions. The request body or the response body is not logged.
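- Requests for resources in the known API groups, such as authentication, role-based access control (RBAC), certificates, auto scaling, and storage resources, are logged together with the request body. The response body is also logged, except for get, list, and watch requests, whose responses can be large.
- All other requests are logged at the metadata level.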
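The rules in an audit policy are evaluated in order, and the first rule that matches a request determines its audit level. The following Python sketch illustrates that first-match behavior on a hand-picked subset of the policy above. It is a simplified model, not the kube-apiserver implementation: the `Rule` class and the `audit_level` function are hypothetical names, API groups, user groups, and namespaces are ignored, and `fnmatchcase` only approximates how wildcards in nonResourceURLs are matched.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional, Tuple


@dataclass(frozen=True)
class Rule:
    """Hypothetical, simplified stand-in for one audit policy rule."""
    level: str
    users: Tuple[str, ...] = ()                 # empty: any user
    verbs: Tuple[str, ...] = ()                 # empty: any verb
    resources: Optional[Tuple[str, ...]] = None  # None: rule does not constrain resources
    non_resource_urls: Tuple[str, ...] = ()     # empty: rule does not constrain URLs


# Subset of the policy above, kept in the same order.
RULES = [
    Rule("None", users=("system:kube-proxy",), verbs=("watch",),
         resources=("endpoints", "services")),
    Rule("None", non_resource_urls=("/healthz*", "/version", "/swagger*")),
    Rule("None", resources=("events",)),
    Rule("Metadata", resources=("secrets", "configmaps", "tokenreviews")),
    Rule("Request", verbs=("get", "list", "watch"), resources=("*",)),
    Rule("RequestResponse", resources=("*",)),
]


def audit_level(user, verb, resource=None, url=None):
    """Return the level of the first matching rule (Metadata if none matches)."""
    for rule in RULES:
        if rule.users and user not in rule.users:
            continue
        if rule.verbs and verb not in rule.verbs:
            continue
        if rule.resources is not None:
            # Resource rules never match non-resource requests.
            if resource is None:
                continue
            if "*" not in rule.resources and resource not in rule.resources:
                continue
        if rule.non_resource_urls and not any(
                fnmatchcase(url or "", pattern) for pattern in rule.non_resource_urls):
            continue
        return rule.level
    return "Metadata"  # mirrors the final catch-all rule of the policy
```

In this model, `audit_level("alice", "get", resource="secrets")` returns `"Metadata"` because the Secrets rule precedes the broader read rule, whereas `audit_level("alice", "get", resource="pods")` returns `"Request"`. Rule order therefore matters when you adapt a policy.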
View audit log reports
ACK provides four audit log reports for each cluster. You can find the following information in these reports:
- Important operations performed by users and system components on the cluster.
- The source IP addresses of these operations and the regional distribution of these IP addresses.
- The details of operations on each type of resource.
- Operations performed by Resource Access Management (RAM) users.
- Details of important operations, such as container logon, Secret retrieval, and resource deletion.
- CVE vulnerabilities.
- By default, the Enable Log Service check box is selected when you create a cluster. This automatically enables cluster auditing. For more information about the billing rules of Log Service, see Billing rules. If Log Service is not activated, see Enable cluster auditing.
- Do not modify audit log reports. If you want to customize audit log reports, log on to the Log Service console to create new reports.
- Log on to the ACK console. In the left-side navigation pane, click Clusters. On the Clusters page, find the cluster that you want to view and choose in the Actions column.
- Log on to the ACK console. In the left-side navigation pane, click Clusters. On the Clusters page, click the name of the cluster that you want to view. In the left-side navigation pane of the cluster details page, choose .
Audit log reports
The Cluster Auditing page displays four audit log reports on four tabs: Overview, Operations Overview, Operation Details, and CVE Vulnerabilities.
- Overview
This report provides an overview of the events in the cluster and the details of important events, such as requests from the Internet, command executions, resource deletions, Secret retrieval, and Common Vulnerabilities and Exposures (CVE) vulnerabilities.
Note
By default, the report displays statistics of the last seven days. You can specify a time period and view the statistics of the period. You can filter the statistics by namespace, RAM user ID, and status code. You can also select one or more items to filter the statistics.
- Operations Overview
This report provides statistics about common operations on computing resources, network resources, and storage resources in the cluster. The operations include creating resources, updating resources, deleting resources, and accessing resources. The following information is displayed:
- Computing resources include Deployment, StatefulSet, CronJob, DaemonSet, Job, and pod.
- Network resources include Service and Ingress.
- Storage resources include ConfigMap, Secret, and persistent volume claim (PVC).
Note
- By default, the report displays statistics of the last seven days. You can specify a time period and view the statistics of the period. You can filter the statistics by namespace and RAM user ID. You can also select one or more items to filter the statistics.
- To view details of the operations on a resource, go to the Operation Details report.
- Operation Details
This report provides operation details on a specific resource type. You can specify a resource type to query operation details in real time. The report contains the total number of operations, distribution of namespaces, operation success rate, temporal order of operations, and other operation details.
Note
- To query operations about CustomResourceDefinition (CRD) resources registered in Kubernetes or resources that are not listed in the report, enter the plural form of the resource name. For example, to query operations about a CRD resource named AliyunLogConfig, enter AliyunLogConfigs.
- By default, the report displays statistics of the last seven days. You can specify a time period and view the statistics of the period. You can filter the statistics by namespace, RAM user ID, and status code. You can also select one or more items to filter the statistics.
- CVE Vulnerabilities
This report displays the CVE vulnerabilities. You can select or specify a RAM user ID to filter the vulnerabilities. The page displays the Kubernetes CVE vulnerabilities related to the RAM user that you select or specify. For more information about CVE vulnerabilities and solutions, see [CVE Securities] CVE vulnerability fixes.
View detailed log data
- Log on to the Log Service console.
- In the Projects section, find the project used by the cluster and click the project name.
- In the project, click the Logstore named audit-${clusterid}.
Note
- During the cluster creation process, a Logstore named audit-${clusterid} is automatically created in the project.
- By default, indexes are set up for the Logstore. Do not modify the indexes. Otherwise, reports may fail to be generated.
- On the Logs tab, enter a query statement in the search box.
- Click 15 Minutes(Relative) to specify a time range for the query.
- Click Search & Analyze to view the query and analysis results.
You can query audit log data by using the following methods:
- To query the operations performed by a RAM user, enter the RAM user ID and click Search & Analyze.
- To query the operations performed on a resource, enter the resource name, and click Search & Analyze.
- To filter out the operations performed by system components, enter the following statement and click Search & Analyze:
NOT user.username: node NOT user.username: serviceaccount NOT user.username: apiserver NOT user.username: kube-scheduler NOT user.username: kube-controller-manager
For more information about how to query log data, see Query methods.
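If you maintain the list of excluded system components in a script, the exclusion statement can be generated instead of typed by hand. The following Python sketch is one way to do that. The function name `exclude_users_query` is hypothetical, and the component list is copied from the statement shown in this topic.

```python
# Username fragments that identify system components, as used in this topic.
SYSTEM_COMPONENTS = [
    "node",
    "serviceaccount",
    "apiserver",
    "kube-scheduler",
    "kube-controller-manager",
]


def exclude_users_query(fragments=SYSTEM_COMPONENTS):
    """Build a search statement that excludes users matching each fragment."""
    return " ".join(f"NOT user.username: {fragment}" for fragment in fragments)
```

Calling `exclude_users_query()` without arguments reproduces the statement listed above. Passing a shorter list, such as `["node", "apiserver"]`, yields a narrower filter.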
Configure alerting
You can configure Log Service to generate alerts on the operations that are performed on specific resources in real time. Available alert notification methods include DingTalk chatbots, custom webhooks, and Alibaba Cloud Message Center. For more information, see Configure alert rules.
Example 1: Alerts upon command execution on containers
To monitor command executions on containers, alerts must be sent at the earliest opportunity when a user attempts to log on to a container or run commands on a container. The alert notification must include the following information: the container to which the user logs on, the commands, the user name, the event ID, the operation time, and the source IP address.
- Sample query statement:
verb : create and objectRef.subresource:exec and stage: ResponseStarted | SELECT auditID as "Event ID", date_format(from_unixtime(__time__), '%Y-%m-%d %T' ) as "Operation time", regexp_extract("requestURI", '([^\?]*)/exec\?.*', 1)as "Resource", regexp_extract("requestURI", '\?(.*)', 1)as "Command" ,"responseStatus.code" as "Status code", CASE WHEN "user.username" != 'kubernetes-admin' then "user.username" WHEN "user.username" = 'kubernetes-admin' and regexp_like("annotations.authorization.k8s.io/reason", 'RoleBinding') then regexp_extract("annotations.authorization.k8s.io/reason", ' to User "(\w+)"', 1) ELSE 'kubernetes-admin' END as "User name", CASE WHEN json_array_length(sourceIPs) = 1 then json_format(json_array_get(sourceIPs, 0)) ELSE sourceIPs END as "Source IP address" limit 100
- The conditional expression is Event =~ ".*".
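The query statement in Example 1 uses two regular expressions to split the requestURI of an exec request into the target resource and the command. The following Python sketch mirrors that extraction outside Log Service, for example to test the patterns against sample data. The function name `parse_exec_uri` and the sample URI are assumptions for illustration. Unlike the query statement, which returns the raw query string as the command, the sketch also decodes the query string into its command parameters.

```python
import re
from urllib.parse import parse_qs

# Same idea as regexp_extract("requestURI", '([^\?]*)/exec\?.*', 1) and
# regexp_extract("requestURI", '\?(.*)', 1), combined into one pattern.
EXEC_URI = re.compile(r"([^?]*)/exec\?(.*)")


def parse_exec_uri(request_uri):
    """Return the exec target and command, or None if this is not an exec request."""
    match = EXEC_URI.match(request_uri)
    if match is None:
        return None
    resource, raw_query = match.groups()
    params = parse_qs(raw_query)
    return {
        "resource": resource,
        # Each command argument is passed as a separate "command" parameter.
        "command": " ".join(params.get("command", [])),
        "container": params.get("container", [None])[0],
    }
```

For a hypothetical URI such as `/api/v1/namespaces/default/pods/nginx-0/exec?command=ls&command=-l&container=nginx`, the function returns the resource `/api/v1/namespaces/default/pods/nginx-0`, the command `ls -l`, and the container `nginx`.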
Example 2: Alerts upon failed Internet connection requests to the API server
To protect a cluster against attacks launched from the Internet, you can monitor the number of Internet connection requests and the connection failure rate. Alerts are generated if the number of Internet connection requests and the connection failure rate exceed the specified thresholds. The alert notification must include the following information: the source IP address, the region of the source IP address, and whether the source IP address is malicious. In the following query statement, alerts are generated if the number of Internet connection requests exceeds 10 and the connection failure rate exceeds 50%.
- Sample query statement:
* | select ip as "Source IP address", total as "Number of connections", round(rate * 100, 2) as "Connection failure rate", failCount as "Number of invalid connections", CASE when security_check_ip(ip) = 1 then 'yes' else 'no' end as "Whether the IP address is risky", ip_to_country(ip) as "Country", ip_to_province(ip) as "Province", ip_to_city(ip) as "City", ip_to_provider(ip) as "ISP" from (select CASE WHEN json_array_length(sourceIPs) = 1 then json_format(json_array_get(sourceIPs, 0)) ELSE sourceIPs END as ip, count(1) as total, sum(CASE WHEN "responseStatus.code" < 400 then 0 ELSE 1 END) * 1.0 / count(1) as rate, count_if("responseStatus.code" = 403) as failCount from log group by ip limit 10000) where ip_to_domain(ip) != 'intranet' having "Number of connections" > 10 and "Connection failure rate" > 50 ORDER by "Number of connections" desc limit 100
- The conditional expression is Source IP address =~ ".*".
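The aggregation in Example 2 groups requests by source IP address, computes the share of responses whose status code is 400 or higher, and keeps only the addresses that exceed both thresholds. The following Python sketch reproduces that logic on audit events that are already loaded as dictionaries. The function name `suspicious_sources` and its parameters are hypothetical, and the sketch leaves out the Log Service functions used in the statement (`ip_to_domain`, `security_check_ip`, and the geolocation functions), so intranet addresses are not filtered out here.

```python
from collections import defaultdict


def suspicious_sources(events, min_requests=10, min_failure_pct=50.0):
    """Return per-IP statistics for sources that exceed both thresholds."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for event in events:
        ips = event.get("sourceIPs", [])
        # Use the single address if there is exactly one, else the joined list.
        ip = ips[0] if len(ips) == 1 else ",".join(ips)
        code = event.get("responseStatus", {}).get("code", 0)
        totals[ip] += 1
        if code >= 400:
            failures[ip] += 1

    rows = []
    for ip, total in totals.items():
        failure_pct = round(failures[ip] * 100.0 / total, 2)
        if total > min_requests and failure_pct > min_failure_pct:
            rows.append({"ip": ip, "total": total, "failure_pct": failure_pct})
    # Busiest sources first, as in the ORDER BY clause of the statement.
    return sorted(rows, key=lambda row: row["total"], reverse=True)
```

With the default thresholds, a source that sends 12 requests of which 9 fail is reported with a failure rate of 75.0, whereas a source with only 5 requests is ignored even if all of them fail.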
Enable cluster auditing
By default, Enable Log Service is selected when you create a cluster. In this case, kube-apiserver automatically collects audit logs from the cluster. If cluster auditing is disabled, perform the following steps to enable this feature:
- Log on to the ACK console and click Clusters in the left-side navigation pane.
- On the Clusters page, click the name of a cluster and choose in the left-side navigation pane. If cluster auditing is disabled, you are prompted to enable this feature.
Important
Make sure that your Alibaba Cloud account has sufficient Log Service quotas. If any of the following quotas is exhausted, cluster auditing cannot be enabled:
- The quota on Log Service projects.
- The quota on Logstores in each Log Service project.
- The quota on dashboards in each Log Service project.
- Click Enable. Select an existing project or create a project, and then click OK. If the following page appears, cluster auditing is enabled.
Change the Log Service project
If you want to migrate the audit logs to another Log Service project, you can use the Change Log Service Project feature in cluster auditing.
- Log on to the ACK console and click Clusters in the left-side navigation pane.
- On the Clusters page, click the name of a cluster and choose in the left-side navigation pane.
- In the upper-right corner of the Cluster Auditing page, click Change Log Service Project. Then, you can migrate the audit log data to another Log Service project.
Disable cluster auditing
If cluster auditing is no longer required, you can perform the following steps to disable cluster auditing:
- Log on to the ACK console and click Clusters in the left-side navigation pane.
- On the Clusters page, click the name of a cluster and choose in the left-side navigation pane.
- In the upper-right corner of the Cluster Auditing page, click Disable Cluster Auditing.
Billing rules
- On the bills overview page, you can view the billing information about audit log data. For more information, see View your bills.
- For more information about the billing methods of audit log data, see Pay-as-you-go.
Support for third-party logging services
You can find the source audit log file in the /var/log/kubernetes/kubernetes.audit path of a master node. This file is in standard JSON format. When you create a cluster, you can specify a third-party logging service to collect and retrieve log data.
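Before you connect a third-party logging service, it can help to inspect the file with a short script. The following Python sketch counts audit events by user and verb. It assumes that the file contains one JSON object per line, which is how the Kubernetes log backend usually writes audit events. Verify this against your own file. The function names are hypothetical, and `user.username` and `verb` are standard audit event fields.

```python
import json
from collections import Counter


def iter_audit_events(lines):
    """Yield one parsed audit event per non-empty, valid JSON line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            # Skip lines that are incomplete, for example during log rotation.
            continue


def count_by_user_and_verb(lines):
    """Count events per (username, verb) pair."""
    counts = Counter()
    for event in iter_audit_events(lines):
        username = event.get("user", {}).get("username", "<unknown>")
        verb = event.get("verb", "<unknown>")
        counts[(username, verb)] += 1
    return counts
```

To run the sketch on a master node, open /var/log/kubernetes/kubernetes.audit for reading and pass the file object to `count_by_user_and_verb`. Calling `most_common(10)` on the result lists the ten most frequent user and verb combinations.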