In the Alibaba Cloud Elasticsearch console, you can query the logs of an Elasticsearch cluster by keyword and time range. The logs help you identify cluster issues and perform cluster O&M efficiently. This topic describes the common types of logs and how to query them.

Limits

  • In the Elasticsearch console, you can view access logs only for Elasticsearch V6.7.0 clusters and Elasticsearch V7.10 clusters.
  • In the Elasticsearch console, you can view audit logs only for Elasticsearch V7.X clusters that reside in the China (Beijing), China (Hangzhou), China (Shanghai), and China (Zhangjiakou) regions.

Procedure

  1. Log on to the Elasticsearch console.
  2. In the left-side navigation pane, click Elasticsearch Clusters.
  3. Navigate to the desired cluster.
    1. In the top navigation bar, select the resource group to which the cluster belongs and the region where the cluster resides.
    2. On the Elasticsearch Clusters page, find the cluster and click its ID.
  4. In the left-side navigation pane of the page that appears, click Logs. Then, you can view the logs of the cluster.
    Alibaba Cloud Elasticsearch provides the following types of logs for an Elasticsearch cluster: Cluster Log, Search Slow Log, Indexing Slow Log, GC Log, Access Log, and Audit Logs. The following entries describe each log type and its use scenarios. For more information about the logs, see Common types of logs.

    Cluster log
    Description: Records the health status of an Elasticsearch cluster and information about query and write operations performed on the cluster. For example, logs for write operations include logs generated for index creation, index mapping updates, and full write queues. Logs for query operations include logs generated for query queues and query exceptions.
    Scenario: View cluster logs if you want to check the status of each node in an Elasticsearch cluster or view information about query and write operations performed on the cluster, such as network connectivity between nodes, full garbage collection (GC), index creation or deletion, or errors reported for queries.
    Notice If errors occur in your business, we recommend that you first view the cluster logs and monitoring data of your cluster to troubleshoot performance or configuration issues.

    Slow search log
    Description: Records information about slow queries. If the time required to complete a query exceeds a specific threshold, the query is considered a slow query and is recorded in the slow query logs. The thresholds that define slow queries are configured in the scenario-based index template of an Elasticsearch cluster. The default threshold settings in the template are suitable for direct use, so you can apply the template without changes. For more information about the scenario-based index template of an Elasticsearch cluster, see Modify the index template of a cluster.
    Scenario: If queries in your business take a long time to complete, view slow query logs to troubleshoot the issue. Queries that take longer to complete consume more cluster resources. If a large number of slow logs are generated for a cluster, check the resource usage and loads of the cluster to identify the bottleneck. Then, add resources to the cluster to relieve the bottleneck, or use the aliyun-qos plug-in to throttle requests and keep the cluster stable.

    Slow indexing log
    Description: Records information about slow data write operations. If the time required to write data to an Elasticsearch cluster exceeds a specific threshold, the write is considered a slow data write operation and is recorded in the slow indexing logs. The thresholds that define slow data write operations are configured in the scenario-based index template of an Elasticsearch cluster. The default threshold settings in the template are suitable for direct use, so you can apply the template without changes. For more information about the scenario-based index template of an Elasticsearch cluster, see Modify the index template of a cluster.
    Scenario: If data write operations in your business take a long time to complete, view slow indexing logs to troubleshoot the issue. Data write operations that take longer to complete consume more cluster resources. If a large number of slow logs are generated for a cluster, check the resource usage and loads of the cluster to identify the bottleneck. Then, add resources to the cluster to relieve the bottleneck, or use the aliyun-qos plug-in to throttle requests and keep the cluster stable.

    GC log
    Description: Records information about GC for an Elasticsearch cluster. GC logs contain information about GC triggered by JVM heap memory usage, including details about the Old GC, Concurrent Mark Sweep (CMS) GC, Full GC, and Minor GC mechanisms.
    Scenario: If a performance bottleneck occurs on an Elasticsearch cluster, view the GC logs of the cluster to check whether GC operations take a long time to complete or are performed frequently. If they do, add resources to the cluster at your earliest opportunity, or use the aliyun-qos plug-in to throttle requests and keep the cluster stable.
    Notice By default, an Alibaba Cloud Elasticsearch cluster uses the CMS garbage collector. If the volume of data stored on each data node in a cluster is greater than or equal to 32 GiB, we recommend that you use the G1 garbage collector to improve GC efficiency. For more information, see Configure a garbage collector.

    Access log
    Description: Records information about access to an Elasticsearch cluster. The access logs of an Elasticsearch cluster contain details about all query requests, such as URIs, the sizes of the request bodies, and the time when the requests were initiated.
    Notice In the Elasticsearch console, you can view access logs only for Elasticsearch V6.7.0 and V7.10 clusters.
    Scenario: If you want to know which clients perform operations on an Elasticsearch cluster, view the access logs of the cluster on the Access Log tab of the Logs page of the cluster.

    Audit log
    Description: Records the operations that are performed on an Elasticsearch cluster, such as create, delete, modify, and query operations.
    Notice
    • In the Elasticsearch console, you can view audit logs only for Elasticsearch V7.X clusters that reside in the China (Beijing), China (Hangzhou), China (Shanghai), and China (Zhangjiakou) regions. For an Elasticsearch cluster of another version, you can enable the Audit Log Indexing feature in the YML configuration file of the cluster. After you enable the feature, the audit logs of the cluster are written to indexes in the cluster. These indexes are named in the .security_audit_log-* format. You can query these indexes in the Kibana console of the cluster to view the audit logs. For more information, see Configure the YML file.
    • Before you view the audit logs of an Elasticsearch cluster, you must click Log Configuration on the Logs page of the cluster in the Elasticsearch console to enable audit log collection.
    • By default, the system collects logs from the following types of audit events: access_denied, anonymous_access_denied, authentication_failed, connection_denied, tampered_request, run_as_denied, and run_as_granted. To change the types of audit events from which logs are collected, modify the xpack.security.audit.logfile.events.include parameter in the YML configuration file of the cluster. For more information, see Configure the Audit Log Indexing feature.
    Scenario: View the audit logs of the cluster if your identity fails to be verified when you access the cluster, a connection to the cluster is denied, you need to view the access events of the cluster, or you need to check whether suspicious events exist. For example, a modification to data access permissions or user security configurations may be a suspicious event.
  5. On a tab of the Logs page, enter a query string, select the start time and end time, and then click Search.
    You can query logs that are generated during the last seven days. By default, the logs are displayed by time in descending order. The Lucene query syntax is supported. For more information, see Query string syntax.

    In this example, the logs that meet the following conditions are queried on the Cluster Log tab: The value of the level field is info, the value of the host field is 172.16.xx.xx, and the value of the content field contains the health keyword. In this case, the query string is host:172.16.xx.xx AND content:health AND level:info.

    Notice
    • AND in the query string must be uppercase.
    • If you do not specify an end time, the current system time is used as the end time. If you do not specify a start time, the start time is 1 hour earlier than the end time.
    • Alibaba Cloud Elasticsearch can return a maximum of 10,000 logs for each query. If the returned logs do not contain the logs that you want to view, you can shorten the specified time range and perform another query.
    After you click Search, the logs that match your query string are displayed.
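
    The query rules in this step can be sketched in a few lines of Python. This is a minimal illustration, not part of the console or any Alibaba Cloud SDK: the helper names build_query_string, resolve_time_range, and split_time_range are hypothetical. The sketch joins field conditions with an uppercase AND, applies the documented defaults for a missing start or end time, and halves a time range for cases where a query reaches the 10,000-log limit.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple


def build_query_string(**fields: str) -> str:
    """Join field:value terms with an uppercase AND, as the console requires."""
    return " AND ".join(f"{name}:{value}" for name, value in fields.items())


def resolve_time_range(
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
    now: Optional[datetime] = None,
) -> Tuple[datetime, datetime]:
    """Apply the documented defaults: a missing end time becomes the current
    time, and a missing start time becomes one hour before the end time."""
    if end is None:
        end = now if now is not None else datetime.now(timezone.utc)
    if start is None:
        start = end - timedelta(hours=1)
    return start, end


def split_time_range(start: datetime, end: datetime) -> List[Tuple[datetime, datetime]]:
    """Halve a time range so that each half can be queried separately."""
    middle = start + (end - start) / 2
    return [(start, middle), (middle, end)]


# Rebuild the query string used in the example above.
query = build_query_string(host="172.16.xx.xx", content="health", level="info")
print(query)  # host:172.16.xx.xx AND content:health AND level:info
```

    Halving the range is only one way to shorten it. Any narrower window works, provided that it stays within the last seven days.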

Common types of logs

Cluster logs

The Cluster Log tab displays the run logs of the cluster. Each run log contains the following information: Time, Node IP, and Content.
Time: The time when the log was generated.
Node IP: The IP address of the node that generated the log.
Content: The details about the log. The content contains the following fields:
  • level: the level of the log. Log levels include trace, debug, info, warn, and error.
    Note GC logs do not contain the level field.
  • host: the IP address of the node that generated the log.
  • time: the time when the log was generated.
  • content: the content of the log.
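
As a rough illustration of how these fields can be used after logs are exported, the following Python sketch models a log entry and keeps only the entries at or above a given level. The LogEntry class, the at_least helper, and the sample records are hypothetical, and the sketch assumes that the levels are ordered by severity in the order listed above.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed severity order, lowest to highest, following the list above.
LEVELS = ("trace", "debug", "info", "warn", "error")


@dataclass
class LogEntry:
    host: str
    time: str
    content: str
    level: Optional[str] = None  # GC logs do not contain the level field


def at_least(entries: List[LogEntry], minimum: str) -> List[LogEntry]:
    """Keep entries whose level is `minimum` or more severe.
    Entries without a level, such as GC logs, are skipped."""
    floor = LEVELS.index(minimum)
    return [
        entry
        for entry in entries
        if entry.level is not None and LEVELS.index(entry.level) >= floor
    ]


# Hypothetical sample records, for illustration only.
entries = [
    LogEntry("172.16.xx.xx", "2024-01-01T00:00:00Z", "cluster health status changed", "info"),
    LogEntry("172.16.xx.xx", "2024-01-01T00:00:05Z", "query rejected: queue full", "warn"),
    LogEntry("172.16.xx.xx", "2024-01-01T00:00:09Z", "gc pause recorded"),
]
print([entry.content for entry in at_least(entries, "warn")])
```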

Slow logs

Slow logs include slow query logs and slow indexing logs. If the time that is required to complete an indexing or query operation exceeds a specific time threshold, slow logs are generated. The Search Slow Log tab displays slow query logs, and the Indexing Slow Log tab displays slow indexing logs. By default, slow log collection is enabled. If unbalanced loads, read or write exceptions, or slow data processing issues occur on your cluster, you can troubleshoot issues based on the slow logs.

By default, Elasticsearch records a read or write operation in slow logs only if the operation requires 5s to 10s to complete. Thresholds this high capture few operations, which makes the logs of little use for troubleshooting. After you create a cluster, you can lower the thresholds by using one of the following methods to capture more logs:
  • Use scenario-based templates. After a cluster is created, scenario-based templates are enabled and applied to the cluster. The index template defines the configurations of slow logs. We recommend that you retain the default configurations. The following code shows the default configurations of slow logs in the General scenario:
      "settings": {
        "index": {
          "search": {
            "slowlog": {
              "level": "info",
              "threshold": {
                "fetch": {
                  "warn": "200ms",
                  "trace": "50ms",
                  "debug": "80ms",
                  "info": "100ms"
                },
                "query": {
                  "warn": "500ms",
                  "trace": "50ms",
                  "debug": "100ms",
                  "info": "200ms"
                }
              }
            }
          },
          "refresh_interval": "10s",
          "unassigned": {
            "node_left": {
              "delayed_timeout": "5m"
            }
          },
          "indexing": {
            "slowlog": {
              "level": "info",
              "threshold": {
                "index": {
                  "warn": "200ms",
                  "trace": "20ms",
                  "debug": "50ms",
                  "info": "100ms"
                }
              },
              "source": "1000"
            }
          }
        }
      }
    Note If the value of the Scenario parameter is None in the Scenario-based Configuration section of the Cluster Configuration page, you can configure the parameter based on your business requirements. Then, submit the templates to apply the default configurations of slow logs to the cluster. For more information, see Use a scenario-based template to modify the configurations of a cluster.
  • Log on to the Kibana console of the cluster and run the following command to modify the configurations of slow logs.
    PUT _settings
    {
        "index.indexing.slowlog.threshold.index.warn" : "200ms",
        "index.indexing.slowlog.threshold.index.trace" : "20ms",
        "index.indexing.slowlog.threshold.index.debug" : "50ms",
        "index.indexing.slowlog.threshold.index.info" : "100ms",
        "index.search.slowlog.threshold.fetch.warn" : "200ms",
        "index.search.slowlog.threshold.fetch.trace" : "50ms",
        "index.search.slowlog.threshold.fetch.debug" : "80ms",
        "index.search.slowlog.threshold.fetch.info" : "100ms",
        "index.search.slowlog.threshold.query.warn" : "500ms",
        "index.search.slowlog.threshold.query.trace" : "50ms",
        "index.search.slowlog.threshold.query.debug" : "100ms",
        "index.search.slowlog.threshold.query.info" : "200ms"
    }
After you modify the configurations of slow logs, any read or write operation that takes longer than the specified threshold is recorded. You can then query the related logs on the Search Slow Log or Indexing Slow Log tab of the Logs page to troubleshoot the issue.
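
To see how the four thresholds interact, the following Python sketch picks the most severe level whose threshold an operation exceeds. It is a simplified model for illustration, not Elasticsearch code: the slowlog_level function is hypothetical, the threshold values are copied from the General scenario template above, and the sketch ignores the separate level setting that the template also defines.

```python
from typing import Dict, Optional

# Thresholds in milliseconds, copied from the General scenario template above.
QUERY_THRESHOLDS_MS = {"warn": 500, "info": 200, "debug": 100, "trace": 50}
FETCH_THRESHOLDS_MS = {"warn": 200, "info": 100, "debug": 80, "trace": 50}
INDEX_THRESHOLDS_MS = {"warn": 200, "info": 100, "debug": 50, "trace": 20}


def slowlog_level(took_ms: float, thresholds: Dict[str, int]) -> Optional[str]:
    """Return the most severe level whose threshold the operation exceeded,
    or None if the operation is faster than every threshold."""
    for level in ("warn", "info", "debug", "trace"):
        if took_ms > thresholds[level]:
            return level
    return None


print(slowlog_level(250, QUERY_THRESHOLDS_MS))  # info
print(slowlog_level(30, INDEX_THRESHOLDS_MS))   # trace
print(slowlog_level(10, INDEX_THRESHOLDS_MS))   # None
```

Under this model, a query that takes 250 ms exceeds the 200 ms info threshold but not the 500 ms warn threshold, so it is classified at the info level.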

GC logs

By default, GC log collection is enabled. Each GC log contains the following information: Time, Node IP, and Content. For descriptions of these items, see Cluster logs.

Access logs

The Access Log tab displays the details of all query requests, such as the names of nodes that are requested, IP addresses of the nodes, sizes of request bodies, request content, time when the requests are initiated, IP addresses that are used to send requests, and URIs.
Notice In the Elasticsearch console, you can view access logs only for Elasticsearch V6.7.0 and V7.10 clusters.

Audit logs

The Audit Logs tab displays the audit logs generated for operations that are performed on an Elasticsearch cluster, such as the create, delete, modify, and query operations. By default, audit log collection is disabled. To enable audit log collection and view audit logs, perform the following steps:

  1. On the Logs page of an Elasticsearch cluster, click Log Configuration on the right side.
  2. In the Log Configuration dialog box, turn on Audit Log Collection.
    Notice
    • After you turn on Audit Log Collection, you can view the audit logs of the cluster on the Audit Logs tab of the Logs page. To change the types of audit events from which logs are collected, modify the xpack.security.audit.logfile.events.include parameter in the YML configuration file of the cluster. For more information, see Configure the Audit Log Indexing feature.
    • After you turn on or off Audit Log Collection, the system restarts the cluster. The system uses the rolling restart method to restart a cluster. Before the restart, make sure that the cluster is in the Active state (indicated by the color green), each index in the cluster has at least one replica shard for each primary shard, and the resource usage of the cluster is not high. If all of the preceding conditions are met, the cluster can still provide services during the restart. However, we recommend that you restart your cluster during off-peak hours.
  3. Read the risk warning and select the check box. Then, click OK.
    The system restarts the cluster. You can view the restart progress in the Tasks dialog box. After the cluster is restarted, the system starts to collect audit logs.
    Notice Audit logs occupy the disk space of your cluster. If the disk space that is occupied by audit logs is excessively large, the performance of the cluster may be affected. If you do not need to view the audit logs of the cluster, you can turn off Audit Log Collection in the Log Configuration dialog box.
  4. On the Logs page, click the Audit Logs tab and view the audit logs of the cluster.
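
The conditions to check before the restart in step 2 can be expressed as a small pre-check. The following Python sketch is illustrative only: the restart_blockers function is hypothetical, the input shapes are assumed to mirror the status field of a cluster health response and the index and rep columns of a _cat/indices listing in JSON format, and the 80% CPU limit is an arbitrary placeholder for "resource usage is not high".

```python
from typing import Dict, List


def restart_blockers(
    health: Dict,
    indices: List[Dict],
    cpu_percent: float,
    max_cpu_percent: float = 80.0,  # hypothetical limit, not a product default
) -> List[str]:
    """Return the reasons a rolling restart might interrupt service.
    An empty list means that none of the checked conditions failed."""
    problems = []
    if health.get("status") != "green":
        problems.append(f"cluster status is {health.get('status')}, not green")
    for row in indices:
        # _cat output reports numbers as strings, so convert before comparing.
        if int(row["rep"]) < 1:
            problems.append(f"index {row['index']} has no replica shards")
    if cpu_percent > max_cpu_percent:
        problems.append(f"CPU usage {cpu_percent}% exceeds {max_cpu_percent}%")
    return problems


# Hypothetical inputs, for illustration only.
print(restart_blockers({"status": "green"}, [{"index": "logs-1", "rep": "0"}], 35.0))
```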

References

ListSearchLog

FAQ