The mesh audit feature of Alibaba Cloud Service Mesh (ASM) records the routine operations of users so that you can trace them later. This feature is important for secure cluster O&M. This topic describes how to enable the mesh audit feature, view audit reports and logs, and set alerts.

Prerequisites

Log Service is activated.

Background information

  • Resources in this topic refer to Istio resources, including virtual services, Istio gateways, destination rules, envoy filters, sidecar proxies, and service entries.
  • After you enable the mesh audit feature, you are charged for the audit logs that are generated. For more information about the billing method, see Pay-as-you-go.

Enable the mesh audit feature

When you create an ASM instance, you can enable the mesh audit feature in the Create ASM Instance panel. For more information, see Create an ASM instance.
Note By default, a project named mesh-log-${mesh-id} is created in Log Service, and a Logstore named audit-${mesh-id} is created in the project to store audit logs.
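
The naming convention in the preceding note can be sketched in a few lines of Python. This is only an illustration: the helper name `audit_log_targets` and the sample mesh ID are hypothetical, and only the `mesh-log-` and `audit-` prefixes come from this topic.

```python
def audit_log_targets(mesh_id: str) -> dict:
    """Return the default Log Service project and Logstore names for an ASM instance."""
    if not mesh_id:
        raise ValueError("mesh_id must not be empty")
    return {
        "project": f"mesh-log-{mesh_id}",   # project created by default
        "logstore": f"audit-{mesh_id}",     # Logstore that stores the audit logs
    }


# "c1234example" is a made-up mesh ID used only for illustration.
targets = audit_log_targets("c1234example")
print(targets["project"])
print(targets["logstore"])
```

Deriving both names from the mesh ID in one place makes it easier to locate the correct project and Logstore when you manage several ASM instances.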

View audit reports

In the ASM console, you can view audit reports from different dimensions on the following tabs: Overview, Operation Audit, Operation Overview, and Operation Details.

  1. Log on to the ASM console.
  2. In the left-side navigation pane, choose Service Mesh > Mesh Management.
  3. On the Mesh Management page, find the ASM instance that you want to configure. Click the name of the ASM instance or click Manage in the Actions column of the ASM instance.
  4. In the left-side navigation pane, click Mesh Audit.
  5. On the Mesh Audit page, select Enable Mesh Audit and click OK.
  6. Click the Overview tab. This tab provides an overview of events in the ASM instance and detailed information about key events such as access from the Internet, command execution, and resource deletion.
  7. Click the Operation Audit tab. On this tab, you can view detailed information about the operations of a specified account on the ASM instance, such as resource creation, modification, and deletion. You can also view the distribution of operations by namespace and the distribution of access by geographical location.
  8. Click the Operation Overview tab. On this tab, you can view the statistics of operations on main resources in the ASM instance.
    Note You can use the following methods to filter the operation statistics:
    • Specify the time range. By default, the statistics in the last seven days are displayed.
    • Specify the namespace and RAM user ID.
    • Specify one or more filters.
  9. Click the Operation Details tab. On this tab, you can view detailed information about the operations on specified resources in the ASM instance.
    You can specify the resource type to query detailed information about operations on such resources in real time. You can view the total number of operations, the distribution of operations by namespace, operation success rate, temporal order of operations, and detailed operation lists.
    Note You can use the following methods to filter the operation statistics:
    • Specify the time range. By default, the statistics in the last seven days are displayed.
    • Specify the namespace and RAM user ID.
    • Specify one or more filters.

View audit logs

If you want to query and analyze audit logs, go to the Log Service console to view detailed logs.

  1. Log on to the Log Service console.
  2. On the Projects tab, click the project that is named mesh-log-${mesh-id}.
  3. Click the audit-${mesh-id} Logstore that is created for the ASM instance. Then, click the Search & Analysis icon to view the audit logs.
    Note
    • When you enable the mesh audit feature, a Logstore that is named audit-${mesh-id} is created in the specified project.
    • By default, indexes are already set up in the Logstore. Do not modify the indexes. Otherwise, reports may fail to be generated from the audit logs.
    You can use the following methods to query audit logs:
    • To query the operations that are performed by a RAM user, enter the RAM user ID in the search box and click Search & Analyze.
    • To query the operations that are performed on a resource, enter the resource name in the search box and click Search & Analyze.
    • To exclude the operations that are performed by system components from the results, enter NOT user.username: node NOT user.username: serviceaccount NOT user.username: apiserver NOT user.username: kube-scheduler NOT user.username: kube-controller-manager in the search box and click Search & Analyze.

    For more information about the query and statistical methods, see Log search overview.
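
The system-component filter in the preceding list is a chain of NOT conditions on user.username, so it can be assembled from a list of names instead of being typed by hand. The following Python sketch shows one way to do that. The function name `exclude_users_query` is hypothetical; the component names and the `NOT user.username:` syntax are taken from this topic.

```python
# System component usernames listed in this topic.
SYSTEM_COMPONENTS = [
    "node",
    "serviceaccount",
    "apiserver",
    "kube-scheduler",
    "kube-controller-manager",
]


def exclude_users_query(usernames):
    """Build a search statement that filters out the given user.username values."""
    return " ".join(f"NOT user.username: {name}" for name in usernames)


query = exclude_users_query(SYSTEM_COMPONENTS)
print(query)
```

You can paste the printed statement into the search box of the Logstore. Adding or removing a name in the list changes the filter without retyping the whole statement.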

Set alerts

Log Service allows you to set alerts to monitor the operations that are performed on specific resources in real time. Alerts can be sent by using SMS messages, DingTalk chatbots, emails, custom webhooks, and Message Center of the Alibaba Cloud Management Console. For more information, see Overview.

You can also query audit logs by using the query statements in the audit reports.

  • Example 1: Set an alert on command execution on containers

    A company requires strict access control on its ASM instances and does not allow users to log on to or run commands on containers in the ASM instances. The company wants to be notified of an alert immediately if a user attempts to log on to or run commands on a container. The alert notification is required to include the following information: the container that was logged on to, executed commands, operator, event ID, operation time, and source IP address.

    • The following code shows you a sample query statement:
      verb : create and objectRef.subresource:exec and stage: ResponseStarted |
      SELECT auditID as "Event ID",
        date_format(from_unixtime(__time__), '%Y-%m-%d %T') as "Operation time",
        regexp_extract("requestURI", '([^\?]*)/exec\?.*', 1) as "Resource",
        regexp_extract("requestURI", '\?(.*)', 1) as "Command",
        "responseStatus.code" as "Status code",
        CASE
          WHEN "user.username" != 'kubernetes-admin' then "user.username"
          WHEN "user.username" = 'kubernetes-admin' and regexp_like("annotations.authorization.k8s.io/reason", 'RoleBinding') then regexp_extract("annotations.authorization.k8s.io/reason", ' to User "(\w+)"', 1)
          ELSE 'kubernetes-admin'
        END as "Username",
        CASE
          WHEN json_array_length(sourceIPs) = 1 then json_format(json_array_get(sourceIPs, 0))
          ELSE sourceIPs
        END as "Source IP address"
      limit 100
    • The following code shows you a sample conditional expression:
      event =~ "*"
  • Example 2: Set an alert on failed Internet access to the API server

    To prevent attacks on an ASM instance that allows Internet access, a company monitors the number of connections from each source IP address over the Internet and the connection failure rate. The company requires alerts to be sent immediately when both values exceed the specified thresholds. For example, the company requires an alert to be sent when the number of connections from a source IP address exceeds 10 and more than 50% of the connections fail. The alert notification is required to include the following information: the source IP address, the region to which the source IP address belongs, and whether the source IP address is risky.

    • The following code shows you a sample query statement:
      * | select ip as "Source IP address", total as "Number of Internet connection requests", round(rate * 100, 2) as "Connection failure rate", failCount as "Number of unauthorized Internet connection requests", CASE when security_check_ip(ip) = 1 then 'yes' else 'no' end  as "Whether the IP address is risky",  ip_to_country(ip) as "Country", ip_to_province(ip) as "Province", ip_to_city(ip) as "City", ip_to_provider(ip) as "ISP" from (select CASE WHEN json_array_length(sourceIPs) = 1 then json_format(json_array_get(sourceIPs, 0)) ELSE  sourceIPs END
      as ip, count(1) as total,
      sum(CASE WHEN "responseStatus.code" < 400 then 0 
      ELSE 1 END) * 1.0 / count(1) as rate,
      count_if("responseStatus.code" = 403) as failCount
      from log group by ip limit 10000) where ip_to_domain(ip) != 'intranet' having "Number of Internet connection requests" > 10 and "Connection failure rate" > 50 ORDER by "Number of Internet connection requests" desc limit 100
    • The following code shows you a sample conditional expression:
      source IP address =~ "*"
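
The aggregation in the Example 2 query can be illustrated offline with a short Python sketch. It is a simplified model under stated assumptions, not the Log Service implementation: the function name `suspicious_sources`, the record layout, and the sample IP addresses are hypothetical, and the geolocation and risk checks (ip_to_country, security_check_ip, and so on) are omitted because they are Log Service functions.

```python
from collections import defaultdict


def suspicious_sources(records, min_requests=10, min_failure_pct=50.0):
    """Aggregate requests per source IP and return the IPs over both thresholds.

    Each record is a (source_ip, status_code) pair. The default thresholds
    mirror the HAVING clause of the sample query.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)   # responses with a status code >= 400
    forbidden = defaultdict(int)  # responses with status code 403

    for ip, status in records:
        totals[ip] += 1
        if status >= 400:
            failures[ip] += 1
        if status == 403:
            forbidden[ip] += 1

    rows = []
    for ip, total in totals.items():
        failure_pct = round(failures[ip] * 100.0 / total, 2)
        if total > min_requests and failure_pct > min_failure_pct:
            rows.append({
                "ip": ip,
                "total": total,
                "failure_pct": failure_pct,
                "forbidden": forbidden[ip],
            })
    # Most active sources first, as in the ORDER BY clause.
    return sorted(rows, key=lambda row: row["total"], reverse=True)


# Made-up sample data: 203.0.113.7 sends 12 requests, 8 of which are rejected.
sample = [("203.0.113.7", 403)] * 8 + [("203.0.113.7", 200)] * 4
sample += [("198.51.100.2", 200)] * 20
result = suspicious_sources(sample)
print(result)
```

In this sample, only 203.0.113.7 is returned because it is the only source that exceeds both the request count threshold and the failure rate threshold. The second source sends more requests but none of them fail.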

Recreate a deleted project

If you accidentally delete a project that is used for mesh audit from Log Service but still want to use the mesh audit feature, you must recreate the deleted project.

  1. Log on to the ASM console.
  2. In the left-side navigation pane, choose Service Mesh > Mesh Management.
  3. On the Mesh Management page, find the ASM instance that you want to configure. Click the name of the ASM instance or click Manage in the Actions column of the ASM instance.
  4. In the left-side navigation pane, click Mesh Audit.
  5. On the Mesh Audit page, click Recreate in the Rebuild Mesh Audit message.
    The recreated project uses the name of the deleted project, suffixed with the timestamp of the time when the project is recreated.