This topic describes how to collect delimiter-separated values (DSV) formatted logs and configure indexes. You can specify the required settings in the Log Service console.

Background information

DSV formatted logs use line breaks as boundaries: each line is one log entry. The fields of each log entry are separated by a single-character or multi-character delimiter. If a field contains the delimiter, you can enclose the field in a pair of quotes.

Commonly used DSV formatted logs include comma-separated values (CSV) and tab-separated values (TSV) formatted logs.

  • Single-character delimiter

    The following examples list some log entries with single-character delimiters:

    05/May/2016:13:30:28,10.10.*.*,"POST /PutData?Category=YunOsAccountOpLog&AccessKeyId=****************&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=******************************** HTTP/1.1",200,18204,aliyun-sdk-java
    05/May/2016:13:31:23,10.10.*.*,"POST /PutData?Category=YunOsAccountOpLog&AccessKeyId=****************&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=******************************** HTTP/1.1",401,23472,aliyun-sdk-java
    For log entries with single-character delimiters, you must specify the delimiter. You can also specify a quote.
    • Delimiter: Available single-character delimiters include the tab (\t), vertical bar (|), space, comma (,), and semicolon (;). You can also specify a non-printable character as the delimiter. A double quotation mark (") cannot be used as the delimiter.

      However, a double quotation mark (") can be used as a quote. The quote can appear at the border of a field or inside the field. If a double quotation mark (") appears in a field but is not used as a quote, it must be escaped as two double quotation marks (""). When Log Service parses the field, the escaped double quotation marks ("") are automatically converted back into one double quotation mark ("). For example, assume that you specify a comma (,) as the delimiter and a double quotation mark (") as the quote. You must enclose each field that contains commas in a pair of quotes, and escape each double quotation mark (") inside a field as two double quotation marks (""). The processed log entry 1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00 is parsed into five fields: 1999, Chevy, Venture "Extended Edition, Very Large", an empty field, and 5000.00.

    • Quote: If a log field contains delimiters, you must specify a pair of quotes to enclose the field. Log Service parses the content that is enclosed in a pair of quotes into a new complete field.

      Available quotes include the tab (\t), vertical bar (|), space, comma (,), semicolon (;), and non-printable characters.

      For example, assume that you specify a comma (,) as the delimiter and a double quotation mark (") as the quote. The log entry 1997,Ford,E350,"ac, abs, moon",3000.00 is parsed into five fields: 1997, Ford, E350, ac, abs, moon (as one field), and 3000.00.
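The delimiter and quote rules above match standard CSV quoting, so they can be reproduced with Python's csv module. The following sketch parses the two example log entries from this topic; it illustrates the parsing rules only and is not how Logtail is implemented.

```python
import csv
import io

# Two example log entries from this topic: one field contains commas and
# is enclosed in quotes; one field contains escaped double quotation marks.
lines = (
    '1997,Ford,E350,"ac, abs, moon",3000.00\n'
    '1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00\n'
)

# Comma as the delimiter, double quotation mark as the quote.
reader = csv.reader(io.StringIO(lines), delimiter=",", quotechar='"')
rows = list(reader)

print(rows[0])  # ['1997', 'Ford', 'E350', 'ac, abs, moon', '3000.00']
print(rows[1])  # ['1999', 'Chevy', 'Venture "Extended Edition, Very Large"', '', '5000.00']
```

Note that the escaped double quotation marks ("") in the second entry are converted back into one double quotation mark ("), and the empty quoted field is parsed as an empty string.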

  • Multi-character delimiter
    The following examples list some log entries with multi-character delimiters:
    05/May/2016:13:30:28&&10.200.**.**&&POST /PutData?Category=YunOsAccountOpLog&AccessKeyId=****************&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=pD12XYLmGxKQ%2Bmkd6x7hAgQ7b1c%3D HTTP/1.1&&200&&18204&&aliyun-sdk-java
    05/May/2016:13:31:23&&10.200.**.**&&POST /PutData?Category=YunOsAccountOpLog&AccessKeyId=****************&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=******************************** HTTP/1.1&&401&&23472&&aliyun-sdk-java

    A multi-character delimiter can contain two or three characters, such as ||, &&&, or ^_^. Log Service parses log fields based only on the delimiter. You do not need to use quotes to enclose the fields.

    Note You must ensure that no log field contains the complete delimiter. Otherwise, Log Service splits the field at the delimiter and cannot parse the log entry as expected.

    For example, you specify && as the delimiter. The log entry 1997&&Ford&&E350&&ac&abs&moon&&3000.00 is parsed into the following five fields: 1997, Ford, E350, ac&abs&moon, 3000.00.
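Multi-character delimiter parsing is a plain substring split with no quote handling. The following Python sketch reproduces the && example above; it illustrates the rule only and is not Logtail's implementation.

```python
# && is the delimiter; a single & inside a field is not a boundary.
entry = "1997&&Ford&&E350&&ac&abs&moon&&3000.00"
fields = entry.split("&&")
print(fields)  # ['1997', 'Ford', 'E350', 'ac&abs&moon', '3000.00']
```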

Procedure

  1. Log on to the Log Service console.
  2. In the Import Data section, select Delimiter Mode - Text Log.
  3. In the Specify Logstore step, select the target project and Logstore, and click Next.
    You can also click Create Now to create a project and a Logstore. For more information, see Step 1: Create a project and a Logstore.
  4. In the Create Machine Group step, create a machine group.
    • If a machine group is available, click Using Existing Machine Groups.
    • This section uses ECS instances as an example to describe how to create a machine group. To create a machine group, perform the following steps:
      1. Install Logtail on ECS instances. For more information, see Install Logtail on ECS instances.

        If Logtail is already installed on the ECS instances, click Complete Installation.

        Note If you need to collect logs from user-created clusters or servers of third-party cloud service providers, you must install Logtail on these servers. For more information, see Install Logtail in Linux or Install Logtail in Windows.
      2. After the installation is complete, click Complete Installation.
      3. On the page that appears, specify the parameters for the machine group. For more information, see Create an IP address-based machine group or Create a custom ID-based machine group.
  5. In the Machine Group Settings step, apply the configurations to the machine group.
    Select the created machine group and move the group from Source Server Groups to Applied Server Groups.
  6. In the Logtail Config step, create a Logtail configuration file. The following table describes the parameters in the Logtail configuration file.
    Parameter Description
    Config Name The name of the Logtail configuration file. After the Logtail configuration file is created, its name cannot be modified.

    You can also click Import Other Configuration to import a Logtail configuration file from another project.

    Log Path Specify the directory and names of the log files.
    The file names can be complete names or names that contain wildcards. For more information, see Wildcard matching. Log files in the specified directory and all levels of its subdirectories are monitored if the files match the specified pattern. Examples:
    • /apsara/nuwa/ … /*.log indicates that the files whose extension is .log in the /apsara/nuwa directory and its subdirectories are monitored.
    • /var/logs/app_* … /*.log* indicates that each file that meets the following conditions is monitored: The file name contains .log. The file is stored in a subdirectory (at all levels) of the /var/logs directory. The name of the subdirectory matches the app_* pattern.
    Note
    • Each log file can be collected by using only one Logtail configuration file.
    • You can include only asterisks (*) and question marks (?) as wildcard characters in the log path.
    Docker File Specifies whether the Logtail configuration file is a Docker file. If it is a Docker file, you can configure the log path and container tags. Container tags are specified by the configurations of the label whitelist and blacklist, and the environment variable whitelist and blacklist. Logtail monitors the creation and destruction of containers, and collects logs from the specified containers based on the tags. For more information, see Use the console to collect Kubernetes text logs in the DaemonSet mode.
    Blacklist Specifies whether to enable the blacklist feature. After you enable this feature, you can configure the blacklist in the Add Blacklist field. You can configure a blacklist to skip the specified directories or files during log collection. The directories and files in the blacklist support exact match and wildcard match. Examples:
    • If you select Filter by Directory from the Filter Type drop-down list and enter /tmp/mydir in the Content column, all files in the directory are skipped.
    • If you select Filter by File from the Filter Type drop-down list and enter /tmp/mydir/file in the Content column, only the specified file is skipped.
    Mode Select a mode. By default, Delimiter Mode is selected. For information about other modes, see Overview.
    Sample Log Enter a sample log. The delimiter mode applies only to single-line logs.
    Delimiter Select the delimiter that matches the log format. If the delimiter does not match the logs, Log Service may fail to parse the logs.
    Note If you select Hidden Characters as the delimiter, you must enter the character in the following format: 0x<hexadecimal ASCII code of the non-printable character>. For example, to use the non-printable character whose hexadecimal ASCII code is 01, enter 0x01.
    Quote Select the quote that matches the log format. If the quote does not match the logs, Log Service may fail to parse the logs.
    Note If you select Hidden Characters as the quote, you must enter the character in the following format: 0x<hexadecimal ASCII code of the non-printable character>. For example, to use the non-printable character whose hexadecimal ASCII code is 01, enter 0x01.
    Extracted Content Specify the key and value of the extracted content. Log Service extracts the log content based on the specified sample log and delimiter. The extracted log content is delimited into values. You must specify a key for each value.
    Incomplete Entry Upload Specifies whether to upload a log entry whose number of parsed fields is less than the number of specified keys. If you enable this feature, such a log entry is uploaded. Otherwise, the log entry is dropped.
    For example, if you set the delimiter to the vertical bar (|), the log entry 11|22|33|44|55 is parsed into the following fields: 11, 22, 33, 44, and 55. You can set the keys to A, B, C, D, and E, respectively.
    • If you enable the Incomplete Entry Upload feature, 55 is uploaded as the value of the D key when Log Service collects the log entry 11|22|33|55.
    • If you disable the Incomplete Entry Upload feature, Log Service drops the log entry because the fields and keys do not match.
    Use System Time
    • Specifies whether to use the system time. If you enable the Use System Time feature, the timestamp of a log entry is the system time of the server at the time when the log entry is collected.
    • If you disable the Use System Time feature, you must find the value that indicates time information in the Extracted Content and configure a key named time for the value. Specify the value and then click Auto Generate in the Time Conversion Format field to automatically parse the time. For more information, see Time formats.
    Drop Failed to Parse Logs
    • Specifies whether to drop logs that fail to be parsed. If you enable the Drop Failed to Parse Logs feature, logs that fail to be parsed are not uploaded to Log Service.
    • If you disable the Drop Failed to Parse Logs feature, raw logs are uploaded to Log Service when the raw logs fail to be parsed.
    Maximum Directory Monitoring Depth Enter the maximum depth at which the specified log directory is monitored. Valid values: 0 to 1000. The value 0 indicates that only the directory specified in the log path is monitored.
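The Incomplete Entry Upload rule described above can be sketched in Python. The parse function and its parameters are hypothetical illustrations of the documented behavior, not part of Logtail.

```python
# Hypothetical sketch of the Incomplete Entry Upload rule.
KEYS = ["A", "B", "C", "D", "E"]

def parse(entry, delimiter="|", upload_incomplete=True):
    values = entry.split(delimiter)
    if len(values) < len(KEYS) and not upload_incomplete:
        return None  # the log entry is dropped
    # With fewer fields than keys, trailing keys are left unmatched.
    return dict(zip(KEYS, values))

print(parse("11|22|33|44|55"))  # {'A': '11', 'B': '22', 'C': '33', 'D': '44', 'E': '55'}
print(parse("11|22|33|55"))     # {'A': '11', 'B': '22', 'C': '33', 'D': '55'}
print(parse("11|22|33|55", upload_incomplete=False))  # None
```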
    You can configure advanced options based on your business requirements. We recommend that you do not modify the settings. The following table describes the parameters in the advanced options.
    Parameter Description
    Enable Plug-in Processing Specifies whether to enable the plug-in processing feature. If you enable this feature, plug-ins are used to process logs. For more information, see Process data.
    Upload Raw Log Specifies whether to upload raw logs. If you enable this feature, raw logs are written to the __raw__ field and uploaded together with the parsed logs.
    Topic Generation Mode
    • Null - Do not generate topic: This mode is selected by default. In this mode, the topic field is set to an empty string. You can query logs without the need to enter a topic.
    • Machine Group Topic Attributes: This mode is used to differentiate logs that are generated by different servers.
    • File Path Regex: In this mode, you must configure a regular expression in the Custom RegEx field. The part of a log path that matches the regular expression is used as the topic name. This mode is used to differentiate logs that are generated by different users or instances.
    Log File Encoding
    • utf8: indicates that UTF-8 encoding is used.
    • gbk: indicates that GBK encoding is used.
    Timezone The time zone where logs are collected. Valid values:
    • System Timezone: This option is selected by default. It indicates that the time zone where logs are collected is the same as the time zone to which the server belongs.
    • Custom: Select a time zone.
    Timeout The timeout period of log files. If a log file is not updated within the specified period, Logtail considers the file to be timed out. Valid values:
    • Never: All log files are continuously monitored and never time out.
    • 30 Minute Timeout: If a log file is not updated within 30 minutes, Logtail considers the file to be timed out and no longer monitors the file.

      If you select 30 Minute Timeout, you must specify the Maximum Timeout Directory Depth parameter. Valid values: 1 to 3.

    Filter Configuration The filter conditions that are used to collect logs. Only logs that match the specified filter conditions are collected. Examples:
    • Collect logs that meet a condition: Specify the filter condition Key:level Regex:WARNING|ERROR if you need to collect only logs of the WARNING or ERROR severity level.
    • Filter out logs that do not meet a condition:
      • Specify the filter condition Key:level Regex:^(?!.*(INFO|DEBUG)).* if you need to filter out logs of the INFO or DEBUG severity level.
      • Specify the filter condition Key:url Regex:^(?!.*(healthcheck)).* if you need to filter out logs whose URL contains the keyword healthcheck. For example, logs in which the value of the url key is /inner/healthcheck/jiankong.html are not collected.

    For more examples, visit regex-exclude-word and regex-exclude-pattern.
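The filter conditions above are regular expressions matched against a field value. Log Service evaluates the conditions internally; the following Python sketch only illustrates the regex semantics of the two examples.

```python
import re

def matches(value, pattern):
    # re.match anchors at the start of the value, like the examples above.
    return re.match(pattern, value) is not None

# Collect only WARNING or ERROR logs:
print(matches("ERROR", r"WARNING|ERROR"))  # True
print(matches("INFO", r"WARNING|ERROR"))   # False

# Filter out INFO and DEBUG logs with a negative lookahead:
print(matches("WARNING: disk full", r"^(?!.*(INFO|DEBUG)).*"))  # True
print(matches("INFO: started", r"^(?!.*(INFO|DEBUG)).*"))       # False
```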

  7. In the Configure Query and Analysis step, configure the indexes.
    Indexes are configured by default. You can re-configure the indexes based on your business requirements. For more information, see Enable and configure the index feature for a Logstore.
    Note
    • You must configure Full Text Index or Field Search. If you configure both of them, the settings of Field Search are applied.
    • If the data type of an index is long or double, the Case Sensitive and Delimiter settings are unavailable.

After all configurations are completed, Log Service starts to collect logs.