The Kubernetes API (KubeAPI) operation audit feature of Alibaba Cloud Service Mesh (ASM), also referred to as the mesh audit feature, records and traces the day-to-day operations that users perform. It is an important part of keeping cluster O&M secure. This topic describes how to enable the mesh audit feature, view audit reports and logs, and set alerts.
Prerequisites
Background information
- In this topic, resources refer to Istio resources, including virtual services, Istio gateways, destination rules, Envoy filters, sidecar proxies, and service entries.
- After you enable the mesh audit feature, you are charged for the audit logs that are generated. For more information about the billing method, see Pay-as-you-go.
Enable the KubeAPI operation audit feature
View KubeAPI audit reports
In the ASM console, you can view audit reports from different dimensions on the following tabs: Overview, Operation Audit, Operation Overview, and Operation Details.
View audit logs
If you want to query and analyze audit logs, go to the Log Service console to view detailed logs.
Set alerts
Log Service allows you to set alerts to monitor the operations that are performed on specific resources in real time. Alerts can be sent by using SMS messages, DingTalk chatbots, emails, custom webhooks, and Message Center of the Alibaba Cloud Management Console. For more information, see Overview.
You can also execute query statements in audit reports to query audit logs.
- Example 1: Set an alert on command execution on containers
A company enforces strict access control on its ASM instances and does not allow users to log on to or run commands on containers in the ASM instances. The company wants to receive an alert immediately if a user attempts to log on to or run commands on a container. The alert notification must include the following information: the container that was logged on to, the commands that were run, the operator, the event ID, the operation time, and the source IP address.
- Sample query statement:
verb : create and objectRef.subresource:exec and stage: ResponseStarted | SELECT auditID as "Event ID", date_format(from_unixtime(__time__), '%Y-%m-%d %T') as "Operation time", regexp_extract("requestURI", '([^\?]*)/exec\?.*', 1) as "Resource", regexp_extract("requestURI", '\?(.*)', 1) as "Command", "responseStatus.code" as "Status code", CASE WHEN "user.username" != 'kubernetes-admin' then "user.username" WHEN "user.username" = 'kubernetes-admin' and regexp_like("annotations.authorization.k8s.io/reason", 'RoleBinding') then regexp_extract("annotations.authorization.k8s.io/reason", ' to User "(\w+)"', 1) ELSE 'kubernetes-admin' END as "Username", CASE WHEN json_array_length(sourceIPs) = 1 then json_format(json_array_get(sourceIPs, 0)) ELSE sourceIPs END as "Source IP address" limit 100
- Sample conditional expression:
Event ID =~ ".*"
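To make the Example 1 query easier to read, the following Python sketch mirrors how its `regexp_extract` calls and its `CASE` expression derive the "Resource", "Command", and "Username" columns. It is only an illustration of the query logic and does not call Log Service. The sample request URI, the pod name, and the function names are assumptions made up for this sketch.

```python
import re

# Hypothetical request URI of an exec call; the namespace and pod name are made up.
SAMPLE_URI = (
    "/api/v1/namespaces/default/pods/demo-pod/exec"
    "?command=ls&command=-l&container=app&stdout=true"
)


def split_exec_uri(request_uri):
    """Mirror the two regexp_extract calls on requestURI in the sample query.

    Returns (resource, command); either value is None when the pattern
    does not match, which corresponds to NULL in the query result.
    """
    resource = re.search(r"([^?]*)/exec\?.*", request_uri)
    command = re.search(r"\?(.*)", request_uri)
    return (
        resource.group(1) if resource else None,
        command.group(1) if command else None,
    )


def resolve_username(username, reason):
    """Mirror the CASE expression that produces the "Username" column."""
    if username != "kubernetes-admin":
        return username
    # regexp_like succeeds when the pattern occurs anywhere in the string.
    if "RoleBinding" in reason:
        match = re.search(r' to User "(\w+)"', reason)
        return match.group(1) if match else None
    return "kubernetes-admin"


if __name__ == "__main__":
    print(split_exec_uri(SAMPLE_URI))
```

The part of the URI before `/exec` identifies the container that was accessed, and the query string after `?` carries the command arguments, which is why the alert can report both from a single field.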
- Example 2: Set an alert on failed Internet access to the API server
To prevent attacks on an ASM instance that allows Internet access, a company monitors the number of connections from each source IP address over the Internet and the connection failure rate. The company requires an alert to be sent immediately when both values exceed the specified thresholds. For example, an alert is sent when a source IP address makes more than 10 connections and more than 50% of those connections fail. The alert notification must include the following information: the source IP address, the region to which the source IP address belongs, and whether the source IP address is risky.
- Sample query statement:
* | select ip as "Source IP address", total as "Number of Internet connection requests", round(rate * 100, 2) as "Connection failure rate", failCount as "Number of unauthorized Internet connection requests", CASE when security_check_ip(ip) = 1 then 'yes' else 'no' end as "Whether the IP address is risky", ip_to_country(ip) as "Country", ip_to_province(ip) as "Province", ip_to_city(ip) as "City", ip_to_provider(ip) as "ISP" from (select CASE WHEN json_array_length(sourceIPs) = 1 then json_format(json_array_get(sourceIPs, 0)) ELSE sourceIPs END as ip, count(1) as total, sum(CASE WHEN "responseStatus.code" < 400 then 0 ELSE 1 END) * 1.0 / count(1) as rate, count_if("responseStatus.code" = 403) as failCount from log group by ip limit 10000) where ip_to_domain(ip) != 'intranet' having "Number of Internet connection requests" > 10 and "Connection failure rate" > 50 ORDER by "Number of Internet connection requests" desc limit 100
- Sample conditional expression:
Source IP address =~ ".*"
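The aggregation in the Example 2 query can be sketched in Python as follows. The sketch groups requests by source IP address, computes the failure rate as a percentage, counts 403 responses, and keeps only the addresses that exceed the thresholds used in the sample query (more than 10 requests and a failure rate above 50%). The function name, its parameters, and the input format are assumptions for illustration. The intranet filter and the IP lookup functions of Log Service (`ip_to_domain`, `security_check_ip`, and so on) are deliberately left out.

```python
from collections import defaultdict


def suspicious_ips(records, min_requests=10, min_failure_pct=50.0):
    """Summarize (source_ip, status_code) pairs the way the sample query does.

    A request counts as failed when its status code is 400 or higher,
    and as unauthorized when the status code is exactly 403.
    """
    total = defaultdict(int)
    failed = defaultdict(int)
    forbidden = defaultdict(int)
    for ip, code in records:
        total[ip] += 1
        if code >= 400:
            failed[ip] += 1
        if code == 403:
            forbidden[ip] += 1

    rows = []
    for ip, count in total.items():
        failure_pct = round(failed[ip] * 100.0 / count, 2)
        # Both thresholds must be exceeded, as in the HAVING clause.
        if count > min_requests and failure_pct > min_failure_pct:
            rows.append(
                {
                    "ip": ip,
                    "total": count,
                    "failure_pct": failure_pct,
                    "forbidden": forbidden[ip],
                }
            )
    # Busiest addresses first, as in the ORDER BY clause.
    rows.sort(key=lambda row: row["total"], reverse=True)
    return rows


if __name__ == "__main__":
    # Hypothetical sample data that uses documentation-only IP addresses.
    sample = [("203.0.113.7", 403)] * 8 + [("203.0.113.7", 200)] * 4
    sample += [("198.51.100.9", 200)] * 20
    for row in suspicious_ips(sample):
        print(row)
```

Because the conditional expression of the alert matches any non-empty "Source IP address" value, every row that survives these thresholds triggers a notification.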
Recreate a deleted project
If you accidentally delete a project that is used for mesh audit from Log Service but still want to use the mesh audit feature, you must recreate the deleted project.