This topic describes how to activate, monitor, and consume service logs in Log Service. You can use this feature to view the service status of Log Service in real time and improve O&M efficiency.

Activate the service log feature

Service logs are divided into detailed logs and important logs. Important logs include Logtail-related logs, consumption delay logs of consumer groups, and metering logs. They are stored in the Logstore named internal-diagnostic_log, which is free of charge. Detailed logs are the operations logs that are generated when you call API operations. One operations log entry is generated for each read or write request. Detailed logs are stored in the Logstore named internal-operation_log, which is charged in the same way as common Logstores.

You can enable the service log feature as required. We recommend that you select Automatic creation (recommended) for Log Storage Location. This way, service logs that are generated in the same region are stored in the same project, which simplifies log management and analysis.

Monitor the Logtail heartbeat

After Logtail is installed, you can view the status logs of Logtail to check the status of Logtail.

The status logs of Logtail are stored in the internal-diagnostic_log Logstore. You can execute the query statement __topic__: logtail_status on the Search & Analysis page of this Logstore to query the status logs. Based on the results, you can obtain the number of servers on which Logtail runs as expected within a period of time and compare it with the number of servers in the server group to which your Logtail configurations are applied. You can also configure alerts to monitor the status of Logtail. For example, an alert is triggered if the number of normally running servers is smaller than the number of servers in the server group.
  • An example query statement is shown as follows:
    __topic__: logtail_status | SELECT COUNT(DISTINCT ip) as ip_count
  • The following figure shows the result of the example query statement.
  • The following figure shows the alert rule configurations. In this example, the number of servers in the server group is 100.
If an alert is triggered, you can view the status of servers in the server group in the console and check the heartbeat information of the servers.
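If you want to observe how the heartbeat count changes over time, you can group the statistics by time. The following statement is only a sketch that reuses the logtail_status logs and the ip field from the preceding example; the 1-minute interval is an arbitrary choice:
    __topic__: logtail_status | SELECT date_trunc('minute', __time__) as t, COUNT(DISTINCT ip) as ip_count GROUP BY t ORDER BY t LIMIT 1000
If you configure the alert based on the original query, the trigger condition can be an expression such as ip_count < 100, which corresponds to the example in which the server group contains 100 servers.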

View the consumption delay of consumer groups

After logs are written to Log Service, you can search, analyze, and consume logs. Log Service provides consumer groups developed in multiple programming languages.

When you use consumer groups to consume log data, you can view the consumption delay to check the consumption progress. If the delay is high, you can increase the consumption speed by increasing the number of consumers.

Consumption delay logs of a consumer group are updated every 2 minutes and stored in the internal-diagnostic_log Logstore. You can execute the query statement __topic__: consumergroup_log on the Search & Analysis page of the internal-diagnostic_log Logstore to query log data about the delay of all consumer groups.

Query consumption delay logs of the consumer group test-consumer-group:
  • An example query statement is shown as follows:
    __topic__: consumergroup_log and consumer_group: test-consumer-group | SELECT max_by(fallbehind, __time__) as fallbehind
  • The following figure shows the result of the example query statement.
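If multiple consumer groups consume data from the same Logstore, you can also list the latest delay of each group in a single query. The following statement is only a sketch that reuses the consumer_group and fallbehind fields from the preceding example; adjust it as needed:
    __topic__: consumergroup_log | SELECT consumer_group, max_by(fallbehind, __time__) as fallbehind GROUP BY consumer_group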

Monitor Logtail exceptions

Log data cannot be collected if Logtail does not work as expected. To prevent collection failures, you must detect Logtail exceptions in a timely manner and adjust the related Logtail configurations.

You can execute the query statement __topic__: logtail_alarm on the Search & Analysis page of the internal-diagnostic_log Logstore to query the exception logs of Logtail.

Query the number of exceptions of each type in the last 15 minutes.
  • An example query statement is shown as follows:
    __topic__: logtail_alarm | select sum(alarm_count) as errorCount, alarm_type GROUP BY alarm_type
  • The following figure shows the result of the example query statement.
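To determine whether the exceptions are transient or continuous, you can also group the statistics by time. The following statement is only a sketch that reuses the alarm_count field from the preceding example; the 1-minute interval is an arbitrary choice:
    __topic__: logtail_alarm | SELECT date_trunc('minute', __time__) as t, sum(alarm_count) as errorCount GROUP BY t ORDER BY t LIMIT 1000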

Monitor data writes to a Logstore

You can use metering logs to analyze read and write traffic at a granularity of 1 hour. To analyze traffic over a shorter period, such as 15 minutes, you must use operations logs. An operations log entry is generated each time you call an API operation.
  1. Query the traffic of raw logs and compressed logs written to a Logstore in the last 15 minutes.
    • An example query statement is shown as follows:
      Method: PostLogStoreLogs AND Project: my-project and LogStore: my-logstore | SELECT sum(InFlow) as raw_bytes, sum(NetInflow) as network_bytes
    • The following figure shows the result of the example query statement.
  2. Query the traffic decline ratio of the log data written to a Logstore in the last 15 minutes.
    • An example query statement is shown as follows:
      Method: PostLogStoreLogs AND Project: my-project and LogStore: my-logstore | select round((diff[1]-diff[2])/diff[1],2) as rate from (select compare(network_bytes, 900) as diff from (select sum(NetInflow) as network_bytes from log))
    • The following figure shows the result of the example query statement.
  3. Create an alert.
    Configure an alert rule that triggers an alert if the volume of logs written to the Logstore decreases by more than 50%.
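In addition to the 15-minute aggregates in the preceding steps, you can plot the write traffic as a time series to check for gradual declines. The following statement is only a sketch that reuses the filter and the InFlow field from the preceding examples; the 1-minute interval and the my-project and my-logstore placeholders are examples:
    Method: PostLogStoreLogs AND Project: my-project and LogStore: my-logstore | SELECT date_trunc('minute', __time__) as t, sum(InFlow) as raw_bytes GROUP BY t ORDER BY t LIMIT 1000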

Audit operations logs

The logs about operations on all resources in a project are stored in the internal-operation_log Logstore. An operations log entry records the resource on which an operation is performed and the user who performs the operation. For example, when a user creates a server group, the name of the server group and the information about the user are logged. When a user performs an operation on a Logstore, the name of the Logstore and the information about the user are logged. The following list describes the user information that is recorded for each type of user.
Alibaba Cloud account
  • InvokerUid: the unique ID of the Alibaba Cloud account.
  • CallerType: Parent.
RAM user
  • InvokerUid: the unique ID of the RAM user.
  • CallerType: Subuser.
STS
  • InvokerUid: the unique ID of the Alibaba Cloud account.
  • CallerType: Sts.
  • RoleSessionName: the name of the session.
You can identify the user who performs an operation in operations logs based on the preceding descriptions.
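For example, to audit which RAM users call which API operations, you can aggregate the operations logs by user and operation. The following statement is only a sketch that uses the Method field from the preceding examples and the CallerType and InvokerUid fields described above; run it on the Search & Analysis page of the internal-operation_log Logstore:
    CallerType: Subuser | SELECT InvokerUid, Method, COUNT(*) as operation_count GROUP BY InvokerUid, Method ORDER BY operation_count DESC LIMIT 100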