Simple Log Service: Create indexes

Last Updated: Jul 09, 2024

An inverted index is a data storage structure that consists of keywords and logical pointers. Logical pointers can map to actual data. You can use keywords to quickly locate data rows of specific text in logs. An index is similar to a data catalog. You can query and analyze logs only after you create indexes. This topic describes the definition and types of indexes that are supported by Simple Log Service. This topic also describes how to create indexes and provides examples.

Prerequisites

  • Before you can analyze logs, you must store the logs in a Standard Logstore. For more information, see Data collection overview and Manage a Logstore.

  • If you want to use a Resource Access Management (RAM) user to create indexes, make sure that the RAM user is granted the required permissions. For more information about how to grant permissions, see Grant permissions to a RAM user. For more information about policies, see Overview.

Definition and types of indexes

Definition

In most cases, you use keywords to query data from raw logs. For example, suppose that you want to find the following log by searching for the Chrome keyword. If the log is not split, it is treated as a single string and the system cannot associate it with the Chrome keyword. In this case, the search does not return the log.

Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.2 (KHTML, like Gecko) Chrome/192.0.2.0 Safari/537.2

To search for the log, you must split the log into separate and searchable words. You can split a log by using delimiters. Delimiters determine the positions at which a log is split. In this example, you can use the following delimiters to split the preceding log: \n\t\r,;[]{}()&^*#@~=<>/\?:'". The log is split into the following words: Mozilla, 5.0, Windows, NT, 6.1, AppleWebKit, 537.2, KHTML, like, Gecko, Chrome, 192.0.2.0, Safari, and 537.2.

Simple Log Service creates indexes based on the words that are obtained after log splitting. You can use indexes to quickly locate specific information in a large number of logs.
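The splitting and indexing steps described above can be illustrated with a minimal Python sketch. This is not how Simple Log Service is implemented. The function names split_log and build_inverted_index are hypothetical, the second sample log is made up for illustration, and the delimiter set assumes that a space is also a delimiter, which the word list above implies.

```python
from collections import defaultdict

# Delimiters from the example above. The leading space is an assumption.
DELIMITERS = " \n\t\r,;[]{}()&^*#@~=<>/\\?:'\""


def split_log(text, delimiters=DELIMITERS):
    """Split text into words at every delimiter character."""
    words, current = [], []
    for ch in text:
        if ch in delimiters:
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return words


def build_inverted_index(logs):
    """Map each word to the set of row numbers of the logs that contain it."""
    index = defaultdict(set)
    for row, text in enumerate(logs):
        for word in split_log(text):
            index[word].add(row)
    return index


logs = [
    "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.2 (KHTML, like Gecko) "
    "Chrome/192.0.2.0 Safari/537.2",
    "curl/7.68.0",  # hypothetical second log, for illustration only
]
index = build_inverted_index(logs)
print(split_log(logs[0]))
print(sorted(index["Chrome"]))  # row numbers of logs that contain Chrome
```

Looking up a keyword in the index returns the matching rows directly, without scanning every log.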

Index types

Indexes are classified into full-text indexes and field indexes. Chinese content cannot be split by using delimiters. However, if you want to split Chinese content, you can turn on Include Chinese. Then, Simple Log Service automatically splits the Chinese content based on Chinese grammar.

  • Full-text indexes: Simple Log Service splits a log into multiple words that are of the Text type by using delimiters. You can query logs by using keywords. For example, you can query logs that contain Chrome or Safari based on the following search statement: Chrome or Safari. For more information, see Search syntax.

    Figure: Full-text index

  • Field indexes: Simple Log Service distinguishes logs by field name and then splits the fields by using delimiters. Supported field types are Text, Long, Double, and JSON. After you create field indexes, you can specify field names and field values in the key:value format to query logs. You can also use a SELECT statement to query logs. For more information, see Field-specific search syntax and Log analysis overview.

    Figure: Field index

    Note

    When you collect logs to Simple Log Service or ship logs from Simple Log Service to other cloud services, Simple Log Service adds fields such as log sources and timestamps to logs in the key-value format. The fields are considered reserved fields of Simple Log Service.

    After you create field indexes, you can use the following search or query statements to query data:

    • Search statement: request_method:GET and status in [200 299]. This search statement is used to query logs that record successful GET requests. GET requests whose status code ranges from 200 to 299 are considered successful.

    • Search statement: request_method:GET not region:cn-hangzhou. This search statement is used to query logs that record GET requests from regions other than the China (Hangzhou) region.

    • Query statement: * | SELECT status_code FROM web_logs.

    • Query statement: level: ERROR | SELECT status_code FROM web_logs.
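The filtering logic of the two search statements above can be approximated in plain Python. This is only a sketch of the intended semantics: the sample logs are hypothetical, the range [200 299] is treated as inclusive, and a field match is simplified to string equality, whereas Simple Log Service matches against the words produced by the field index.

```python
# Hypothetical sample logs. The field names follow the statements above.
logs = [
    {"request_method": "GET", "status": 200, "region": "cn-hangzhou"},
    {"request_method": "GET", "status": 404, "region": "cn-shanghai"},
    {"request_method": "POST", "status": 201, "region": "cn-hangzhou"},
    {"request_method": "GET", "status": 299, "region": "cn-beijing"},
]


def successful_get(log):
    """Approximates: request_method:GET and status in [200 299]"""
    return log["request_method"] == "GET" and 200 <= log["status"] <= 299


def get_outside_hangzhou(log):
    """Approximates: request_method:GET not region:cn-hangzhou"""
    return log["request_method"] == "GET" and log["region"] != "cn-hangzhou"


ok = [log for log in logs if successful_get(log)]
other = [log for log in logs if get_outside_hangzhou(log)]
print([log["status"] for log in ok])
print([log["region"] for log in other])
```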

Policies used to create indexes

Configured indexes take effect only for new logs. To query and analyze historical logs, you must reindex the logs. After indexes are created, the indexes take effect within approximately 1 minute. For more information about the configuration examples of field indexes, see Query and analyze JSON logs and Query and analyze website logs.

Important

Query and analysis results vary based on index configurations. You must create indexes based on your business requirements. If you create both full-text indexes and field indexes, the field indexes take precedence.

  • If only full-text indexes are configured, you can use only search syntax to query logs. For more information, see Search syntax.

  • If field indexes are configured, the query statement that you can use to query and analyze logs varies based on the data types of fields in the logs.

    • Fields of the Long and Double types: You can use field-based search statements and analytic statements to query and analyze data. An analytic statement includes a SELECT statement.

    • Fields of the Text type: You can use full text-based search statements, field-based search statements, and analytic statements to query and analyze data. If full-text indexing is not enabled, full text-based search statements query data from all fields of the Text type. If full-text indexing is enabled, full text-based search statements query data from all logs.

Index configuration examples

  • A log contains the request_time field, and the request_time>100 field-based search statement is executed.

    • If only full-text indexes are configured, logs that contain request_time, >, and 100 are returned. The greater-than sign (>) is not a delimiter.

    • If only field indexes are configured and the field type is Double or Long, logs whose request_time field value is greater than 100 are returned.

    • If both full-text indexes and field indexes are configured and the field type is Double or Long, the full-text indexes do not take effect for the request_time field, and logs whose request_time field value is greater than 100 are returned.

  • A log contains the request_time field, and the request_time full text-based search statement is executed.

    • If only field indexes are configured and the field type is Double or Long, no logs are returned.

    • If only full-text indexes are configured, logs that contain the request_time field are returned. In this case, the statement queries data from all logs.

    • If only field indexes are configured and the field type is Text, logs that contain the request_time field are returned. In this case, the statement queries data from all fields of the Text type.

  • A log contains the status field, and the * | SELECT status, count(*) AS PV GROUP BY status query statement is executed.

    • If only full-text indexes are configured, no logs are returned.

    • If an index is configured for the status field, the total numbers of page views (PVs) for different status codes are returned.
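To make the last example concrete, the following Python sketch shows the aggregation that the * | SELECT status, count(*) AS PV GROUP BY status statement expresses. The sample logs are hypothetical, and the code only mirrors the grouping logic. It does not call Simple Log Service.

```python
from collections import Counter

# Hypothetical logs that each carry an indexed status field.
logs = [
    {"status": "200"},
    {"status": "404"},
    {"status": "200"},
    {"status": "500"},
    {"status": "200"},
]

# GROUP BY status with count(*) AS PV: one counter entry per distinct status.
pv_by_status = Counter(log["status"] for log in logs)

for status, pv in sorted(pv_by_status.items()):
    print(status, pv)
```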

Index traffic

  • Index traffic for full-text indexes: All field names and field values are stored as text. In this case, field names and field values are both included in the calculation of index traffic.

  • Index traffic for field indexes: The method that is used to calculate index traffic varies based on the data type of a field.

    • Text: Field names and field values are both included in the calculation of index traffic.

    • Long and Double: Field names are not included in the calculation of index traffic. Each field value is counted as 8 bytes in index traffic.

      For example, if you create an index for the status field of the Long type and the field value is 400, the string status is not included in the calculation of index traffic, and the value 400 is counted as 8 bytes in index traffic.

    • JSON: Field names and field values are both included in the calculation of index traffic. The subfields that are not indexed are also included. For more information, see Why is index traffic generated for JSON subfields that are not indexed?

      • If a subfield is not indexed, index traffic is calculated by regarding the data type of the subfield as Text.

      • If a subfield is indexed, index traffic is calculated based on the data type of the subfield. The data type can be Text, Long, or Double.
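The per-type rules for field indexes can be sketched as a small Python function. The function name index_traffic_bytes is hypothetical, and measuring Text fields by their UTF-8 byte length is an assumption. The actual metering in Simple Log Service may differ, and JSON fields are not covered by this sketch.

```python
def index_traffic_bytes(name, value, field_type):
    """Estimate the index traffic of one field, following the rules above."""
    if field_type == "text":
        # Text: the field name and the field value are both counted.
        # Assumption: sizes are measured as UTF-8 byte lengths.
        return len(name.encode("utf-8")) + len(str(value).encode("utf-8"))
    if field_type in ("long", "double"):
        # Long and Double: the name is not counted, each value counts as 8 bytes.
        return 8
    raise ValueError(f"unsupported field type in this sketch: {field_type}")


# The status example above: the value 400 indexed as Long, then as Text.
print(index_traffic_bytes("status", 400, "long"))
print(index_traffic_bytes("status", 400, "text"))
```

Under these assumptions, indexing a short numeric field as Long and as Text produces similar traffic, while long field names or long values make the Text type noticeably larger.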

Billing description

Logstores support the following billing modes: pay-by-ingested-data and pay-by-feature. For more information, see Manage a Logstore, Billable items of pay-by-feature, and Billable items of pay-by-ingested-data.

Logstore that uses the pay-by-ingested-data billing mode

  • Indexes occupy storage space. For more information about storage types, see Overview of tiered storage.

  • Reindexing does not generate fees.

Logstore that uses the pay-by-feature billing mode

  • Indexes occupy storage space. For more information about storage types, see Overview of tiered storage.

  • When you create indexes, index traffic is generated. For more information about the billing of index traffic, see the "index traffic of log data" and "log index traffic of Query Logstores" billable items in Billable items of pay-by-feature. For more information about how to reduce index traffic, see the References section of this topic.

  • Reindexing generates fees. During reindexing, you are charged based on the same billable items and prices as when you create indexes.

Procedure

Step 1: Create indexes

  1. Go to the query and analysis page.

    1. Log on to the Simple Log Service console.

    2. In the Projects section, click the project that you want to manage.

    3. On the Log Storage > Logstores tab, click the Logstore that you want to manage.

    4. On the page that appears, choose Index Attributes > Attributes. If no indexes are created, click Enable.

      Figure: Configure indexes

  2. Turn off Auto Update. If a Logstore is a dedicated Logstore for a cloud service or an internal Logstore, Auto Update is turned on by default, and the built-in indexes of the Logstore are automatically updated to the latest version. If you want to create indexes for such a Logstore, turn off Auto Update in the Search & Analysis panel.

    Warning

    If you delete the indexes of a dedicated Logstore for a cloud service, features that are enabled for the Logstore may be affected. The features include reports and alerting.

    Figure: Automatically update indexes

  3. Create indexes.

    Configure index parameters. If you want to analyze fields, you must create field indexes. You must include a SELECT statement in your query statement for analysis. Field indexes have a higher priority than full-text indexes. After indexes are created, the indexes take effect within 1 minute.

    Important
    • Simple Log Service automatically creates indexes for specific reserved fields. For more information, see Reserved fields.

      Simple Log Service leaves delimiters empty when it creates indexes for the __topic__ and __source__ reserved fields. Therefore, only exact match is supported when you specify keywords to query the two fields.

    • Fields that are prefixed with __tag__ do not support full-text indexes. If you want to query and analyze fields that are prefixed with __tag__, you must create field indexes. Sample query statement: * | SELECT "__tag__:__receive_time__".

    • If a log contains two fields whose names are the same, such as request_time, Simple Log Service displays one of the fields as request_time_0. The two fields are still stored as request_time in Simple Log Service. If you want to query, analyze, ship, transform, or create indexes for the fields, you must use request_time.

    Parameters for full-text indexes and field indexes

    Full-text indexes

    Parameter

    Description

    LogReduce

    If you turn on LogReduce, Simple Log Service automatically clusters highly similar text logs during collection and extracts patterns from the logs. This can help you fully understand the logs. For more information, see LogReduce.

    Case Sensitive

    Specifies whether searches are case-sensitive.

    • If you turn on Case Sensitive, searches are case-sensitive. For example, if a log contains internalError, you can search for the log by using only the internalError keyword.

    • If you turn off Case Sensitive, searches are not case-sensitive. For example, if a log contains internalError, you can search for the log by using the INTERNALERROR or internalerror keyword.

    Include Chinese

    Specifies whether to distinguish between Chinese content and English content in searches.

    • If you turn on Include Chinese and a log contains Chinese characters, the Chinese content is split based on Chinese grammar. The English content is split by using specified delimiters.

      Important

      When Chinese content is split, the write speed is reduced. Proceed with caution.

    • If you turn off Include Chinese, all content of a log is split by using specified delimiters.

    Delimiter

    The delimiters that are used to split the content of a log into multiple words. By default, Simple Log Service uses the following delimiters: , '";=()[]{}?@&<>/:\n\t\r. If the default delimiters do not meet your business requirements, you can specify custom delimiters. All ASCII codes can be specified as delimiters.

    If you leave Delimiter empty, Simple Log Service considers an entire log as a whole. In this case, you can search for the log only by using a complete string or by performing fuzzy match.

    For example, the content of a log is /url/pic/abc.gif.

    • If you do not specify a delimiter, the content of the log is considered as a single word /url/pic/abc.gif. You can search for the log only by using the /url/pic/abc.gif keyword or by using /url/pic/* to perform fuzzy match.

    • If you set Delimiter to a forward slash (/), the content of the log is split into the following three words: url, pic, and abc.gif. You can search for the log by using the url, abc.gif, or /url/pic/abc.gif keyword, or by using pi* to perform fuzzy match.

    • If you set Delimiter to a forward slash (/) and a period (.), the content of the log is split into the following four words: url, pic, abc, and gif. You can search for the log by using one of the preceding words or by performing fuzzy match.
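The three delimiter settings in the example above can be reproduced with a short Python sketch. The function name split_words is hypothetical, and treating pi* as a prefix match on a single word is a simplification of fuzzy match.

```python
import re


def split_words(content, delimiters):
    """Split content into words at any of the given delimiter characters."""
    if not delimiters:
        # Empty Delimiter: the whole content is treated as a single word.
        return [content]
    pattern = "[" + re.escape(delimiters) + "]+"
    return [word for word in re.split(pattern, content) if word]


content = "/url/pic/abc.gif"
print(split_words(content, ""))    # no delimiter
print(split_words(content, "/"))   # forward slash only
print(split_words(content, "/."))  # forward slash and period

# Simplified fuzzy match: does any word start with "pi"?
print(any(word.startswith("pi") for word in split_words(content, "/")))
```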

    Field indexes

    Optional. Click Automatic Index Generation. Simple Log Service automatically generates field indexes based on the first log in the preview results of data collection.

    Figure: Automatic index generation

    Click the icon in the lower part of the Search & Analysis panel and configure the following parameters.

    Parameter

    Description

    Field Name

    The name of the log field. Example: client_ip.

    The name can contain only letters, digits, and underscores (_). It must start with a letter or an underscore (_).

    Important
    • If you want to create an index for a __tag__ field, such as a public IP address or a UNIX timestamp, you must set Field Name to a value in the __tag__:KEY format. Example: __tag__:__receive_time__. For more information, see Reserved fields.

    • __tag__ fields do not support numeric indexes. When you create an index for a __tag__ field, you must set Type to text.

    Type

    The data type of the field value. Valid values: text, long, double, and json. For more information, see Data types.

    If you set the data type for a field to long or double, you cannot configure Case Sensitive, Include Chinese, or Delimiter for the field.

    Alias

    The alias of the field. For example, you can set the alias of the client_ip field to ip.

    The alias can contain only letters, digits, and underscores (_). It must start with a letter or an underscore (_).

    Important

    You can use the alias of a field only in an analytic statement. You must use the original name of a field in a search statement. You must include a SELECT statement in your query statement for analysis. For more information, see Column aliases.

    Case Sensitive

    Specifies whether searches are case-sensitive.

    • If you turn on Case Sensitive, searches are case-sensitive. For example, if a log contains internalError, you can search for the log by using only the internalError keyword.

    • If you turn off Case Sensitive, searches are not case-sensitive. For example, if a log contains internalError, you can search for the log by using the INTERNALERROR or internalerror keyword.

    Delimiter

    The delimiters that are used to split the content of a log into multiple words. By default, Simple Log Service uses the following delimiters: , '";=()[]{}?@&<>/:\n\t\r. If the default delimiters do not meet your business requirements, you can specify custom delimiters. All ASCII codes can be specified as delimiters.

    If you leave Delimiter empty, Simple Log Service considers an entire log as a whole. In this case, you can search for the log by using a complete string or by performing fuzzy match.

    For example, the content of a log is /url/pic/abc.gif.

    • If you do not specify a delimiter, the content of the log is considered as a single word /url/pic/abc.gif. You can search for the log by using the /url/pic/abc.gif keyword or by using /url/pic/* to perform fuzzy match.

    • If you set Delimiter to a forward slash (/), the content of the log is split into the following three words: url, pic, and abc.gif. You can search for the log by using the url, abc.gif, or /url/pic/abc.gif keyword, or by using pi* to perform fuzzy match.

    • If you set Delimiter to a forward slash (/) and a period (.), the content of the log is split into the following four words: url, pic, abc, and gif. You can search for the log by using one of the preceding words or by performing fuzzy match.

    Include Chinese

    Specifies whether to distinguish between Chinese content and English content in searches.

    • If you turn on Include Chinese and a log contains Chinese characters, the Chinese content is split based on Chinese grammar. The English content is split by using specified delimiters.

      Important

      When Chinese content is split, the write speed is reduced. Proceed with caution.

    • If you turn off Include Chinese, all content of a log is split by using specified delimiters.

    Enable Analytics

    You can perform statistical analysis on a field only if you turn on Enable Analytics for the field.
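The naming rule stated for Field Name and Alias in the preceding table can be checked before you submit a configuration. The following Python sketch uses a hypothetical helper, is_valid_field_name. It assumes that "letters" means ASCII letters and that the KEY part of a __tag__:KEY name follows the same character rule. Neither assumption is confirmed by this topic.

```python
import re

# Letters, digits, and underscores only, starting with a letter or an underscore.
_NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
_TAG_PREFIX = "__tag__:"


def is_valid_field_name(name):
    """Return True if name satisfies the naming rule described above."""
    if name.startswith(_TAG_PREFIX):
        # Assumption: only the KEY part after "__tag__:" is checked.
        name = name[len(_TAG_PREFIX):]
    return _NAME_PATTERN.fullmatch(name) is not None


for candidate in ["client_ip", "_ip", "1st_ip", "client-ip", "__tag__:__receive_time__"]:
    print(candidate, is_valid_field_name(candidate))
```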

Step 2: Reindex logs

Simple Log Service provides the reindexing feature that you can use to configure or modify indexes for historical data. You can reindex the logs of a specified time range in a Logstore based on the most recent indexing rules. For more information, see Reindex logs for a Logstore and Function overview.

What to do next

Query and analyze logs

For more information about how to query and analyze logs, see Query and analyze logs. For more information about the examples of query and analysis, see Query and analyze website logs, Query and analyze JSON logs, Collect, query, and analyze NGINX monitoring logs, and Analyze Layer 7 access logs of SLB.

Specify the maximum length of a field value

The default maximum length of a field value that can be retained for analysis is 2,048 bytes, which is equivalent to 2 KB. You can change the value of Maximum Statistics Field Length. Valid values: 64 to 16384. Unit: bytes.

Important

If the length of a field value exceeds the value of this parameter, the field value is truncated, and the excess part is not involved in analysis.
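The effect of Maximum Statistics Field Length can be illustrated with a minimal Python sketch. The function name truncate_for_analysis is hypothetical, and the sketch assumes that the length is measured on the UTF-8 encoding of the value and that truncation happens at a byte boundary. The limits and the default come from the description above.

```python
MIN_LEN, MAX_LEN, DEFAULT_LEN = 64, 16384, 2048  # bytes


def truncate_for_analysis(value, max_len=DEFAULT_LEN):
    """Return the leading bytes of value that would be retained for analysis."""
    if not MIN_LEN <= max_len <= MAX_LEN:
        raise ValueError(f"max_len must be between {MIN_LEN} and {MAX_LEN}")
    # Assumption: the value is measured and cut as UTF-8 bytes.
    return value.encode("utf-8")[:max_len]


long_value = "a" * 3000
print(len(truncate_for_analysis(long_value)))      # default limit
print(len(truncate_for_analysis(long_value, 64)))  # smallest allowed limit
print(truncate_for_analysis("short value"))        # shorter values are kept whole
```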

Figure: Set the maximum field length

LogReduce

If you turn on LogReduce, Simple Log Service automatically clusters highly similar text logs during collection and extracts patterns from the logs. This can help you fully understand the logs. For more information, see LogReduce.

Disable indexing

After you disable the indexing feature for a Logstore, the storage space that is occupied by historical indexes is automatically released after the data retention period of the Logstore elapses.

References

FAQ

What do I do if I cannot query logs after the logs are imported to Simple Log Service?

  • Check whether the delimiters that you specify meet the requirements.

  • Configured indexes take effect only for new logs. If you want to query and analyze historical logs, you must reindex the logs. For more information, see Reindex logs for a Logstore.

How do I use two conditions to query logs?

If you want to use two conditions to query logs, combine both conditions in one search statement. For example, if you want to query logs whose status is neither OK nor Unknown in a Logstore, you can use the not OK not Unknown search statement.

How do I query logs that contain multiple keywords?

For example, if you want to query logs whose http_user_agent field value contains like Gecko, you can use one of the following methods:

  • Phrase search: http_user_agent:#"like Gecko". For more information, see Phrase search.

  • LIKE clause: * | SELECT * WHERE http_user_agent LIKE '%like Gecko%'

How do I query logs by using a keyword that contains spaces?

For example, if you query logs by using the POS version keyword, logs that contain POS or version are returned. If you query logs by using the "POS version" keyword, logs that contain POS version are returned.

Related operations