How to use cluster auditing - Container Service for Kubernetes

The audit log of an API server in a Kubernetes cluster helps administrators track operations performed by different users. This plays an important role in the security and maintenance of the cluster. This topic describes how to configure cluster auditing, how to use Log Service to collect and analyze audit logs, how to set custom alert rules based on audit logs, and how to disable cluster auditing.

Usage notes

The cluster auditing feature applies to Container Service for Kubernetes (ACK) managed clusters and ACK dedicated clusters. For more information about how to configure cluster auditing for registered clusters, see Use cluster auditing in registered clusters.
Cluster auditing cannot be disabled for a registered Kubernetes cluster after you enable this feature.

Configure parameters for cluster auditing

By default, Enable Log Service is selected when you create a cluster. This indicates that kube-apiserver automatically collects audit logs from the cluster. The following table describes the parameters of cluster auditing.

Note Log on to a master node. You can find the configuration file of kube-apiserver in the following path: /etc/kubernetes/manifests/kube-apiserver.yaml.

Parameter	Description
audit-log-maxbackup	A maximum of 10 audit log files can be retained.
audit-log-maxsize	The maximum size of an audit log file is 100 MB.
audit-log-path	The audit log files are stored in the /var/log/kubernetes/kubernetes.audit path.
audit-log-maxage	Audit log files are retained for a maximum of seven days.
audit-policy-file	The path of the audit policy file is /etc/kubernetes/audit-policy.yml.
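
As a rough illustration of how the table maps onto the kube-apiserver command line, the following Python sketch renders the values above as flags. The --audit-* flag names are standard kube-apiserver flags. The dictionary AUDIT_PARAMETERS and the helper render_audit_flags are hypothetical names used only for this example and are not part of ACK.

```python
# Values copied from the parameter table above.
AUDIT_PARAMETERS = {
    "audit-log-maxbackup": 10,   # number of rotated audit log files to keep
    "audit-log-maxsize": 100,    # maximum size of one file, in MB
    "audit-log-path": "/var/log/kubernetes/kubernetes.audit",
    "audit-log-maxage": 7,       # days to retain audit log files
    "audit-policy-file": "/etc/kubernetes/audit-policy.yml",
}


def render_audit_flags(parameters):
    """Return the parameters as kube-apiserver style command-line flags."""
    return [f"--{name}={value}" for name, value in parameters.items()]


if __name__ == "__main__":
    for flag in render_audit_flags(AUDIT_PARAMETERS):
        print(flag)
```

Comparing this output with the command section of /etc/kubernetes/manifests/kube-apiserver.yaml is one way to check that a master node still uses the default values.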

Log on to a master node. You can find the audit policy file in the following path: /etc/kubernetes/audit-policy.yml. The audit policy file contains the following content:

apiVersion: audit.k8s.io/v1beta1 # This is required.
kind: Policy
# Don't generate audit events for all requests in RequestReceived stage.
omitStages:
  - "RequestReceived"
rules:
  # The following requests were manually identified as high-volume and low-risk,
  # so drop them.
  - level: None
    users: ["system:kube-proxy"]
    verbs: ["watch"]
    resources:
      - group: "" # core
        resources: ["endpoints", "services"]
  - level: None
    users: ["system:unsecured"]
    namespaces: ["kube-system"]
    verbs: ["get"]
    resources:
      - group: "" # core
        resources: ["configmaps"]
  - level: None
    users: ["kubelet"] # legacy kubelet identity
    verbs: ["get"]
    resources:
      - group: "" # core
        resources: ["nodes"]
  - level: None
    userGroups: ["system:nodes"]
    verbs: ["get"]
    resources:
      - group: "" # core
        resources: ["nodes"]
  - level: None
    users:
      - system:kube-controller-manager
      - system:kube-scheduler
      - system:serviceaccount:kube-system:endpoint-controller
    verbs: ["get", "update"]
    namespaces: ["kube-system"]
    resources:
      - group: "" # core
        resources: ["endpoints"]
  - level: None
    users: ["system:apiserver"]
    verbs: ["get"]
    resources:
      - group: "" # core
        resources: ["namespaces"]
  # Don't log these read-only URLs.
  - level: None
    nonResourceURLs:
      - /healthz*
      - /version
      - /swagger*
  # Don't log events requests.
  - level: None
    resources:
      - group: "" # core
        resources: ["events"]
  # Secrets, ConfigMaps, and TokenReviews can contain sensitive & binary data,
  # so only log at the Metadata level.
  - level: Metadata
    resources:
      - group: "" # core
        resources: ["secrets", "configmaps"]
      - group: authentication.k8s.io
        resources: ["tokenreviews"]
  # Get responses can be large; skip them.
  - level: Request
    verbs: ["get", "list", "watch"]
    resources:
      - group: "" # core
      - group: "admissionregistration.k8s.io"
      - group: "apps"
      - group: "authentication.k8s.io"
      - group: "authorization.k8s.io"
      - group: "autoscaling"
      - group: "batch"
      - group: "certificates.k8s.io"
      - group: "extensions"
      - group: "networking.k8s.io"
      - group: "policy"
      - group: "rbac.authorization.k8s.io"
      - group: "settings.k8s.io"
      - group: "storage.k8s.io"
  # Default level for known APIs
  - level: RequestResponse
    resources:
      - group: "" # core
      - group: "admissionregistration.k8s.io"
      - group: "apps"
      - group: "authentication.k8s.io"
      - group: "authorization.k8s.io"
      - group: "autoscaling"
      - group: "batch"
      - group: "certificates.k8s.io"
      - group: "extensions"
      - group: "networking.k8s.io"
      - group: "policy"
      - group: "rbac.authorization.k8s.io"
      - group: "settings.k8s.io"
      - group: "storage.k8s.io"
  # Default level for all other requests.
  - level: Metadata

Note

Requests are not logged upon reception. Requests are logged only after response headers are sent.
The following types of requests are not logged: watch requests by kube-proxy for endpoint and Service resources, GET requests by system:unsecured for ConfigMaps in the kube-system namespace, GET requests by kubelet and system:nodes for node resources, get and update operations on endpoint resources in the kube-system namespace by kube components, GET requests by kube-apiserver for namespace resources, and requests for event resources.
Requests with read-only URLs that match /healthz*, /version, or /swagger* are not logged.
Requests for Secrets, ConfigMaps, and TokenReview resources are logged at the metadata level because these resources may contain sensitive information or binary files. Cluster auditing logs only the request metadata, such as requesting users, timestamps, requested resources, and actions. The request body or the response body is not logged.
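Requests for the known API groups, including sensitive requests related to authentication, role-based access control (RBAC), certificates, auto scaling, and storage resources, are logged together with the request body. The response body is also logged, except for get, list, and watch requests, whose responses can be large.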
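
The notes above follow from the way an audit policy is evaluated: rules are checked in order, and the first rule that matches a request determines its audit level. The following Python sketch illustrates that behavior with a hand-picked subset of the rules in the policy file. It is a simplified model, not the kube-apiserver implementation. The request dictionaries and the function names rule_matches and audit_level are assumptions made for this example.

```python
from fnmatch import fnmatchcase

# A subset of the rules from the policy file above, kept in the same order.
RULES = [
    {"level": "None", "users": ["system:kube-proxy"], "verbs": ["watch"],
     "resources": [{"group": "", "resources": ["endpoints", "services"]}]},
    {"level": "None", "nonResourceURLs": ["/healthz*", "/version", "/swagger*"]},
    {"level": "None", "resources": [{"group": "", "resources": ["events"]}]},
    {"level": "Metadata",
     "resources": [{"group": "", "resources": ["secrets", "configmaps"]},
                   {"group": "authentication.k8s.io",
                    "resources": ["tokenreviews"]}]},
    {"level": "Request", "verbs": ["get", "list", "watch"],
     "resources": [{"group": ""}, {"group": "apps"},
                   {"group": "rbac.authorization.k8s.io"}]},
    {"level": "RequestResponse",
     "resources": [{"group": ""}, {"group": "apps"},
                   {"group": "rbac.authorization.k8s.io"}]},
    {"level": "Metadata"},  # default level for all other requests
]


def rule_matches(rule, request):
    """Return True if every field that the rule sets matches the request."""
    if "users" in rule and request.get("user") not in rule["users"]:
        return False
    if "verbs" in rule and request.get("verb") not in rule["verbs"]:
        return False
    if "nonResourceURLs" in rule:
        path = request.get("path")
        return path is not None and any(
            fnmatchcase(path, pattern) for pattern in rule["nonResourceURLs"])
    if "resources" in rule:
        if "resource" not in request:
            return False
        for entry in rule["resources"]:
            if entry["group"] != request.get("group", ""):
                continue
            # An entry without a resources list covers the whole API group.
            if "resources" not in entry or request["resource"] in entry["resources"]:
                return True
        return False
    return True


def audit_level(request, rules=RULES):
    """Return the level of the first matching rule."""
    for rule in rules:
        if rule_matches(rule, request):
            return rule["level"]
    return "None"  # not reached here because the last rule is a catch-all
```

For example, audit_level({"user": "alice", "verb": "get", "group": "", "resource": "secrets"}) returns "Metadata" because the Secrets rule appears before the broader Request and RequestResponse rules. The user name alice is a placeholder.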

View audit log reports

ACK provides four audit log reports for each cluster. You can find the following information in these reports:

Important operations performed by users and system components on the cluster.
The source IP addresses of these operations and the regional distribution of these IP addresses.
The details of operations on each type of resource.
Operations performed by Resource Access Management (RAM) users.
Details of important operations, such as container logon, Secret retrieval, and resource deletion.
CVE vulnerabilities.

Note

By default, the Enable Log Service check box is selected when you create a cluster. This automatically enables cluster auditing. For more information about the billing rules of Log Service, see Billing rules. If Log Service is not activated, see Enable cluster auditing.
Do not modify audit log reports. If you want to customize audit log reports, log on to the Log Service console to create new reports.

You can use one of the following methods to view audit log reports:

Log on to the ACK console. In the left-side navigation pane, click Clusters. On the Clusters page, find the cluster that you want to view and choose More > Cluster Auditing in the Actions column.
Log on to the ACK console. In the left-side navigation pane, click Clusters. On the Clusters page, click the name of the cluster that you want to view. In the left-side navigation pane of the cluster details page, choose Security > Cluster Auditing.

Audit log reports

The Cluster Auditing page displays four audit log reports on four tabs: Overview, Operations Overview, Operation Details, and CVE Vulnerabilities.

Overview
This report provides an overview of the events in the cluster and the details of important events, such as requests from the Internet, command executions, resource deletions, Secret retrieval, and Common Vulnerabilities and Exposures (CVE) vulnerabilities.
Note By default, the report displays statistics of the last seven days. You can specify a time period and view the statistics of the period. You can filter the statistics by namespace, RAM user ID, and status code. You can also select one or more items to filter the statistics.
Operations Overview
This report provides statistics about common operations on computing resources, network resources, and storage resources in the cluster. The operations include creating resources, updating resources, deleting resources, and accessing resources. The following information is displayed:
- Computing resources include Deployment, StatefulSet, CronJob, DaemonSet, Job, and pod.
- Network resources include Service and Ingress.
- Storage resources include ConfigMap, Secret, and persistent volume claim (PVC).
Note
- By default, the report displays statistics of the last seven days. You can specify a time period and view the statistics of the period. You can filter the statistics by namespace and RAM user ID. You can also select one or more items to filter the statistics.
- To view details of the operations on a resource, go to the Operation Details report.
Operation Details
This report provides operation details on a specific resource type. You can specify a resource type to query operation details in real time. The report contains the total number of operations, distribution of namespaces, operation success rate, temporal order of operations, and other operation details.
Note
- To query operations on CustomResourceDefinition (CRD) resources registered in Kubernetes or on resources that are not listed in the report, enter the plural form of the resource name. For example, to query operations on a CRD resource named AliyunLogConfig, enter AliyunLogConfigs.
- By default, the report displays statistics of the last seven days. You can specify a time period and view the statistics of the period. You can filter the statistics by namespace, RAM user ID, and status code. You can also select one or more items to filter the statistics.
CVE Vulnerabilities
This report displays the CVE vulnerabilities. You can select or specify a RAM user ID to filter the vulnerabilities. The page displays the Kubernetes CVE vulnerabilities related to the RAM user that you select or specify. For more information about CVE vulnerabilities and solutions, see [CVE Securities] CVE vulnerability fixes.

View detailed log data

To customize queries or analyze audit log data, log on to the Log Service console and view detailed log data.

Note The default retention period of audit logs in the Logstore of Log Service is 30 days. For more information about how to change the log retention period, see Manage a Logstore.

Log on to the Log Service console.
In the Projects section, find the project used by the cluster and click the project name.
Choose Log Storage > Logstores. Then, click the Logstore named audit-${clusterid}.
Note
- During the cluster creation process, a Logstore named audit-${clusterid} is automatically created in the project.
- By default, indexes are set up for the Logstore. Do not modify the indexes. Otherwise, reports may fail to be generated.
On the Logs tab, enter a query statement in the search box.
Click 15 Minutes(Relative) to specify a time range for the query.
Click Search & Analyze to view the query and analysis results.

You can query audit log data by using the following methods:

To query the operations performed by a RAM user, enter the RAM user ID and click Search & Analyze.
To query the operations performed on a resource, enter the resource name, and click Search & Analyze.
To filter out the operations performed by system components, enter NOT user.username: node NOT user.username: serviceaccount NOT user.username: apiserver NOT user.username: kube-scheduler NOT user.username: kube-controller-manager, and click Search & Analyze.
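
If the list of system components to exclude changes over time, the filter statement can be generated instead of typed by hand. The following Python sketch builds the same statement as in the last method above. The helper name exclude_system_components is hypothetical, and the output is intended to be pasted into the search box.

```python
# Component keywords taken from the filter statement above.
SYSTEM_COMPONENTS = [
    "node",
    "serviceaccount",
    "apiserver",
    "kube-scheduler",
    "kube-controller-manager",
]


def exclude_system_components(components=SYSTEM_COMPONENTS):
    """Build a search statement that excludes the given user name keywords."""
    return " ".join(f"NOT user.username: {name}" for name in components)


if __name__ == "__main__":
    print(exclude_system_components())
```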

For more information about how to query log data, see Query methods.

Configure alerting

You can configure Log Service to generate alerts on the operations that are performed on specific resources in real time. Available alert notification methods include DingTalk chatbots, custom webhooks, and Alibaba Cloud Message Center. For more information, see Configure alert rules.

Note For more information about how to query audit log data, see the query statements in audit reports. You can perform the following steps to view query statements: On the project details page, click the Dashboard icon in the left-side navigation pane. Select a dashboard to go to the details page. Select a chart, click the More icon, and then click View Analysis Details.

Example 1: Alerts upon command execution on containers

To monitor command executions on containers, alerts must be sent at the earliest opportunity when a user attempts to log on to a container or run commands on a container. The alert notification must include the following information: the container to which the user logs on, the commands, the user name, the event ID, the operation time, and the source IP address.

Sample query statement:

verb : create and objectRef.subresource:exec and stage:  ResponseStarted | SELECT auditID as "Event ID", date_format(from_unixtime(__time__), '%Y-%m-%d %T' ) as "Operation time", regexp_extract("requestURI", '([^\?]*)/exec\?.*', 1)as "Resource",  regexp_extract("requestURI", '\?(.*)', 1)as "Command" ,"responseStatus.code" as "Status code",
 CASE 
 WHEN "user.username" != 'kubernetes-admin' then "user.username"
 WHEN "user.username" = 'kubernetes-admin' and regexp_like("annotations.authorization.k8s.io/reason", 'RoleBinding') then regexp_extract("annotations.authorization.k8s.io/reason", ' to User "(\w+)"', 1)
 ELSE 'kubernetes-admin' END  
 as "User name", 
CASE WHEN json_array_length(sourceIPs) = 1 then json_format(json_array_get(sourceIPs, 0)) ELSE  sourceIPs END
as "Source IP address" limit 100

The conditional expression is Event =~ ".*".
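
To see what the regular expressions and the CASE expression in this query extract, the following Python sketch applies the same patterns to a single event outside Log Service. The sample request URI, the sample authorization reason, and the function names are assumptions made for illustration. Python's re.search is used as a stand-in for regexp_extract.

```python
import re


def extract_exec_fields(request_uri):
    """Split an exec request URI into the target resource and the command part."""
    resource = re.search(r"([^\?]*)/exec\?.*", request_uri)
    command = re.search(r"\?(.*)", request_uri)
    return (
        resource.group(1) if resource else None,
        command.group(1) if command else None,
    )


def resolve_user(username, reason=""):
    """Mirror the CASE expression that derives the "User name" column."""
    if username != "kubernetes-admin":
        return username
    if "RoleBinding" in reason:
        match = re.search(r' to User "(\w+)"', reason)
        return match.group(1) if match else None
    return "kubernetes-admin"


if __name__ == "__main__":
    # Hypothetical values in the shape of an audit event for an exec request.
    uri = "/api/v1/namespaces/default/pods/nginx/exec?command=ls&container=nginx"
    reason = 'RBAC: allowed by RoleBinding "dev/default" of Role "dev" to User "20001"'
    print(extract_exec_fields(uri))
    print(resolve_user("kubernetes-admin", reason))
```

The second function shows why the query inspects the authorization reason: requests authenticated as kubernetes-admin are attributed to the user named in the matching RoleBinding when one is recorded.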

Example 2: Alerts upon failed Internet connection requests to the API server

To protect a cluster against attacks launched from the Internet, you can monitor the number of Internet connection requests and the connection failure rate. Alerts are generated if the number of Internet connection requests and the connection failure rate exceed the specified thresholds. The alert notification must include the following information: the source IP address, the region of the source IP address, and whether the source IP address is malicious. In the following query statement, alerts are generated if the number of Internet connection requests exceeds 10 and the connection failure rate exceeds 50%.

Sample query statement:

* | select ip as "Source IP address", total as "Number of connections", round(rate * 100, 2) as "Connection failure rate", failCount as "Number of invalid connections", CASE when security_check_ip(ip) = 1 then 'yes' else 'no' end  as "Whether the IP address is risky",  ip_to_country(ip) as "Country", ip_to_province(ip) as "Province", ip_to_city(ip) as "City", ip_to_provider(ip) as "ISP" from (select CASE WHEN json_array_length(sourceIPs) = 1 then json_format(json_array_get(sourceIPs, 0)) ELSE  sourceIPs END
as ip, count(1) as total,
sum(CASE WHEN "responseStatus.code" < 400 then 0 
ELSE 1 END) * 1.0 / count(1) as rate,
count_if("responseStatus.code" = 403) as failCount
from log  group by ip limit 10000) where ip_to_domain(ip) != 'intranet'  having "Number of connections" > 10 and "Connection failure rate" > 50 ORDER by "Number of connections" desc limit 100

The conditional expression is Source IP address =~ ".*".
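
The aggregation in this query can be checked offline against a handful of audit events. The following Python sketch computes the number of connections and the connection failure rate for each source IP address and applies the same thresholds (more than 10 connections and a failure rate above 50%). It makes two simplifying assumptions: events with more than one source IP address are skipped, and the RFC 1918 private ranges stand in for the ip_to_domain(ip) != 'intranet' check. The function names are hypothetical.

```python
import ipaddress
from collections import defaultdict

# Assumption: private IPv4 ranges approximate what Log Service treats as intranet.
INTRANET_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_intranet(ip):
    """Return True if the address falls in one of the private ranges."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in INTRANET_NETWORKS)


def suspicious_sources(events, min_connections=10, min_failure_pct=50.0):
    """Return (ip, connections, failure rate in percent) rows over the thresholds."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for event in events:
        source_ips = event.get("sourceIPs", [])
        if len(source_ips) != 1:
            continue  # simplification: only single-address events are counted
        ip = source_ips[0]
        totals[ip] += 1
        # Status codes of 400 or higher count as failed connections.
        if event.get("responseStatus", {}).get("code", 0) >= 400:
            failures[ip] += 1

    rows = []
    for ip, total in totals.items():
        if is_intranet(ip):
            continue
        failure_pct = round(failures[ip] * 100.0 / total, 2)
        if total > min_connections and failure_pct > min_failure_pct:
            rows.append((ip, total, failure_pct))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows
```

Each event passed to suspicious_sources is expected to be a dictionary with the sourceIPs and responseStatus fields of an audit event.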

Enable cluster auditing

By default, Enable Log Service is selected when you create a cluster. In this case, kube-apiserver automatically collects audit logs from the cluster. If cluster auditing is disabled, perform the following steps to enable this feature:

Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, click the name of a cluster and choose Security > Cluster Auditing in the left-side navigation pane.
If cluster auditing is disabled, you are prompted to enable this feature.
Important
Make sure that your Alibaba Cloud account has sufficient Log Service quotas. If any of the following quotas is exhausted, cluster auditing cannot be enabled:
- The quota on Log Service projects.
- The quota on Logstores in each Log Service project.
- The quota on dashboards in each Log Service project.
Click Enable. Select an existing project or create a project, and then click OK.
If the Cluster Auditing page displays the audit log reports, cluster auditing is enabled.

Change the Log Service project

If you want to migrate the audit logs to another Log Service project, you can use the Change Log Service Project feature in cluster auditing.

Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, click the name of the cluster that you want to manage and choose Security > Cluster Auditing in the left-side navigation pane.
In the upper-right corner of the Cluster Auditing page, click Change Log Service Project. Then, you can migrate the audit log data to another Log Service project.

Disable cluster auditing

If cluster auditing is no longer required, you can perform the following steps to disable cluster auditing:

Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, click the name of the cluster that you want to manage and choose Security > Cluster Auditing in the left-side navigation pane.
In the upper-right corner of the Cluster Auditing page, click Disable Cluster Auditing.

Billing rules

On the bills overview page, you can view the billing information about audit log data. For more information, see View your bills.
For more information about the billing methods of audit log data, see Pay-by-feature.

Support for third-party logging services

You can find the source audit log file in the /var/log/kubernetes/kubernetes.audit path of a master node. This file is in standard JSON format. When you create a cluster, you can specify a third-party logging service to collect and retrieve log data.
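
Before connecting a third-party logging service, it can help to inspect the source file directly. The following Python sketch counts audit events by user and verb. It assumes that the file contains one JSON-encoded audit event per line, which you should verify on your own master node. The function name summarize_audit_lines is hypothetical. The verb and user.username fields are standard fields of a Kubernetes audit event.

```python
import json
from collections import Counter


def summarize_audit_lines(lines):
    """Count audit events by (user name, verb) from an iterable of JSON lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        username = event.get("user", {}).get("username", "unknown")
        verb = event.get("verb", "unknown")
        counts[(username, verb)] += 1
    return counts


if __name__ == "__main__":
    # Path taken from the audit-log-path parameter described in this topic.
    with open("/var/log/kubernetes/kubernetes.audit", encoding="utf-8") as audit_file:
        for (username, verb), count in summarize_audit_lines(audit_file).most_common(10):
            print(f"{count:6d}  {verb:8s}  {username}")
```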