In full regex mode, Logtail uses a regular expression to extract custom fields from logs. This topic describes how to create a Logtail configuration in full regex mode by using the Log Service console.

Prerequisites

  • A project and a Logstore are created. For more information, see Create a project and Create a Logstore.
  • The server on which Logtail is installed can access port 80 and port 443 of remote servers.

Procedure

  1. Log on to the Log Service console.
  2. In the Import Data section, click RegEx - Text Log.
  3. Select the project and Logstore. Then, click Next.
  4. Create a machine group.
    • If a machine group is available, click Using Existing Machine Groups.
    • If no machine groups are available, perform the following steps to create a machine group. In this example, an Elastic Compute Service (ECS) instance is used.
      1. On the ECS Instances tab, select Manually Select Instances. Then, select the ECS instance that you want to use and click Execute Now.

        For more information, see Install Logtail on ECS instances.

        Note If you want to collect logs from self-managed clusters or servers from third-party cloud service providers, you must manually install Logtail. For more information, see Install Logtail on a Linux server or Install Logtail in Windows.
      2. After Logtail is installed, click Complete Installation.
      3. In the Create Machine Group step, configure Name and click Next.

        Log Service allows you to create IP address-based machine groups and custom identifier-based machine groups. For more information, see Create an IP address-based machine group and Create a custom ID-based machine group.

  5. Select the newly created machine group and move it from the Source Server Groups section to the Applied Server Groups section. Then, click Next.
    Notice If you apply a machine group immediately after it is created, the heartbeat status of the machine group may be FAIL. This issue occurs because the machine group is not connected to Log Service. In this case, you can click Automatic Retry. If the issue persists, see What do I do if no heartbeat connections are detected on Logtail?
  6. Create a Logtail configuration and click Next.
    Parameter Description
    Config Name Enter a name for the Logtail configuration. The name must be unique in a project. After the Logtail configuration is created, you cannot change the name of the Logtail configuration.

    You can click Import Other Configuration to import a Logtail configuration from another project.

    Log Path Specify the log file directory and log file name.
    You can specify an exact directory and an exact name. You can also use wildcards to specify the directory and name. For more information, see Wildcard matching. Log Service scans all levels of the specified directory to match log files. Examples:
    • If you specify /apsara/nuwa/**/*.log, Log Service matches the files whose name is suffixed by .log in the /apsara/nuwa directory and its recursive subdirectories.
    • If you specify /var/logs/app_*/*.log, Log Service matches the files that meet the following conditions: The file name is suffixed by .log. The file is stored in a subdirectory under /var/logs or in a recursive subdirectory of that subdirectory. The name of the subdirectory under /var/logs matches the app_* pattern.
    Note
    • By default, logs in each log file can be collected by using one Logtail configuration.
    • If you want to collect logs in a log file by using multiple Logtail configurations, you must create a symbolic link for the directory in which the log file is stored. For example, if you want to collect logs from the /home/log/nginx/log/log.log file by using two Logtail configurations, you must run the following command to create a symbolic link that points to the directory of the file. Then, specify the real path in one Logtail configuration and specify the symbolic link in the other Logtail configuration.
      ln -s /home/log/nginx/log /home/log/nginx/link_log
    • When you configure this parameter, you can use only asterisks (*) or question marks (?) as wildcards.
    Blacklist If you turn on Blacklist, you must configure a blacklist to specify the directories or files that you want Log Service to skip when it collects logs. You can specify exact directories and file names. You can also use wildcards to specify directories and file names. Examples:
    • If you select Filter by Directory from a drop-down list in the Filter Type column and enter /home/admin/dir1 for Content, all files in the /home/admin/dir1 directory are skipped.
    • If you select Filter by Directory from a drop-down list in the Filter Type column and enter /home/admin/dir* for Content, the files in all subdirectories whose names are prefixed by dir in the /home/admin/ directory are skipped.
    • If you select Filter by Directory from a drop-down list in the Filter Type column and enter /home/admin/*/dir for Content, all files in dir directories in each subdirectory of the /home/admin/ directory are skipped.

      For example, the files in the /home/admin/a/dir directory are skipped, but the files in the /home/admin/a/b/dir directory are not skipped.

    • If you select Filter by File from a drop-down list in the Filter Type column and enter /home/admin/private*.log for Content, all files whose names are prefixed by private and suffixed by .log in the /home/admin/ directory are skipped.
    • If you select Filter by File from a drop-down list in the Filter Type column and enter /home/admin/private*/*_inner.log for Content, all files whose names are suffixed by _inner.log in the subdirectories whose names are prefixed by private in the /home/admin/ directory are skipped.

      For example, the /home/admin/private/app_inner.log file is skipped, but the /home/admin/private/app.log file is not skipped.

    Note
    • When you configure the blacklist, you can use only asterisks (*) or question marks (?) as wildcards.
    • If you use wildcards to configure Log Path and want to skip some directories in the specified directory, you must configure the blacklist and enter a complete directory.

      For example, if you set Log Path to /home/admin/app*/log/*.log and want to skip all subdirectories in the /home/admin/app1* directory, you must configure the blacklist by selecting Filter by Directory and entering /home/admin/app1*/**. If you enter /home/admin/app1*, the blacklist cannot take effect.

    • Computational overhead is generated when the blacklist is used. We recommend that you add a maximum of 10 entries to the blacklist.
    Docker File If you want to collect logs from Docker containers, you must turn on Docker File and specify the directories and tags of the containers. Logtail monitors the containers to check whether the containers are created or destroyed. Then, Logtail filters the logs of the containers based on tags and collects the logs that meet the filter conditions. For more information about how to collect the text logs of containers, see Use the console to collect Kubernetes text logs in DaemonSet mode.
    Mode Select the log collection mode. By default, Full Regex Mode is displayed. You can change the mode.
    Singleline
    • If you want to collect single-line logs, turn on Singleline. Then, Log Service collects logs by line.
    • If you want to collect multi-line logs such as Java program logs, turn off Singleline and use Single Mode - Multi-line to collect logs.
    Log Sample Enter a sample log that is collected from an actual scenario. This way, Log Service can extract a regular expression from the log. For more information about sample logs, see Case: Collect single-line logs and Case: Collect multi-line logs.
    Regex to Match First Line Configure a regular expression to match the start part in the first line of a log. If you want to collect multi-line logs, you must turn off Singleline and configure this parameter. Log Service can automatically generate the regular expression or use the regular expression that you manually specify.
    • Automatic generation

      After you enter a sample multi-line log, click Auto Generate. Log Service automatically generates a regular expression to match the start part in the first line of the log.

    • Manual configuration

      After you enter a sample multi-line log, click Manual and specify a regular expression to match the start part in the first line of the log. Then, click Validate to check whether the regular expression is valid. For more information, see How do I modify a regular expression?.

    Extract Field If you turn on Extract Field, Log Service can extract key-value pairs by using a regular expression.
    RegEx If you turn on Extract Field, you must configure this parameter.
    • Automatic generation

      In the Log Sample field, select the content that you want to extract and click Generate Regular Expression. A regular expression is automatically generated.

    • Manual configuration

      Click Manual to specify a regular expression. Then, click Validate to check whether the regular expression can be used to parse logs or extract content from logs. For more information, see How do I modify a regular expression?.

    Extracted Content If you turn on Extract Field, you must configure this parameter.

    After log content is extracted as values by using the regular expression, you must specify a key for each value.

    Use System Time If you turn on Extract Field, you must configure this parameter.
    • If you turn on Use System Time, the timestamp of a log indicates the system time when the log is collected. The system time refers to the time of the server on which Logtail runs.
    • If you turn off Use System Time, you must configure Specified Time Key and Time Format based on the value of the time field specified in Extracted Content. For more information about the time format, see Time formats.

      For example, if you set Specified Time Key to time_local and Time Format to %d/%b/%Y:%H:%M:%S, the timestamp of a log is the value of the time_local field.

    Drop Failed to Parse Logs Specify whether to drop the logs that fail to be parsed.
    • If you turn on Drop Failed to Parse Logs, the logs that fail to be parsed are not uploaded to Log Service.
    • If you turn off Drop Failed to Parse Logs, raw logs are uploaded to Log Service if the logs fail to be parsed.
    Maximum Directory Monitoring Depth Specify the maximum number of levels of subdirectories that you want to monitor. The subdirectories are in the log file directory that you specify. Valid values: 0 to 1000. The value 0 indicates that only the specified log file directory is monitored.
    You can configure advanced settings based on your business requirements. We recommend that you do not modify the advanced settings. The following table describes the parameters in the advanced settings.
    Parameter Description
    Enable Plug-in Processing If you turn on Enable Plug-in Processing, you can configure Logtail plug-ins to process logs. For more information, see Overview.
    Note If you turn on Enable Plug-in Processing, specific parameters such as Upload Raw Log, Timezone, Drop Failed to Parse Logs, Filter Configuration, and Incomplete Entry Upload (Delimiter mode) become unavailable.
    Upload Raw Log If you turn on Upload Raw Log, each raw log is uploaded to Log Service as a value of the __raw__ field together with the log parsed from the raw log.
    Topic Generation Mode Select the topic generation mode. For more information, see Log topics.
    • Null - Do not generate topic: This is the default value. In this mode, the topic field is set to an empty string. When you query logs, you do not need to specify a topic.
    • Machine Group Topic Attributes: In this mode, topics are configured at the machine group level. This mode is used to differentiate between the logs that are generated by different servers.
    • File Path RegEx: In this mode, you must specify a regular expression in the Custom RegEx field. The part of a log path that matches the regular expression is used as the topic. This mode is used to differentiate between the logs that are generated by different users or instances.
    Log File Encoding Select the encoding format of log files. Valid values: utf8 and gbk.
    Timezone Select the time zone where logs are collected. Valid values:
    • System Timezone: This is the default value. If you select this value, the time zone to which the server belongs is used.
    • Custom: If you select this value, you must select a time zone.
    Timeout Select a timeout period of log files. If a log file is not updated within the specified period, Logtail considers the file to be timed out. Valid values:
    • Never: All log files are continuously monitored and never time out.
    • 30 Minute Timeout: If a log file is not updated within 30 minutes, Logtail considers the file to be timed out and stops monitoring the file.

      If you select 30 Minute Timeout, you must configure the Maximum Timeout Directory Depth parameter. Valid values: 1 to 3.

    Filter Configuration Specify the filter conditions that are used to collect logs. Only the logs that match the specified filter conditions are collected. Examples:
    • Collect logs that meet specified conditions: If you set Key to level and RegEx to WARNING|ERROR, only the logs whose level is WARNING or ERROR are collected.
    • Filter out logs that do not meet specified conditions. For more information, see Regular-Expressions.info.
      • If you set Key to level and RegEx to ^(?!.*(INFO|DEBUG)).*, the logs whose level contains INFO or DEBUG are not collected.
      • If you set Key to level and RegEx to ^(?!(INFO|DEBUG)$).*, the logs whose level is INFO or DEBUG are not collected.
      • If you set Key to url and RegEx to .*^(?!.*(healthcheck)).*, the logs whose URL contains healthcheck are not collected. For example, if a log has the Key field of url and the Value field of /inner/healthcheck/jiankong.html, the log is not collected.

    For more information, see regex-exclude-word and regex-exclude-pattern.

    Click Next to complete the Logtail configuration. Then, Log Service starts to collect logs.
    Note
    • A Logtail configuration requires a maximum of 3 minutes to take effect.
    • If an error occurs when you use Logtail to collect logs, see Diagnose collection errors.
  7. Preview collected logs, configure indexes, and then click Next.
    By default, Log Service enables full-text indexing. You can also configure field-based indexes on collected logs in manual or automatic mode. For more information, see Configure indexes.
    Note
    • If you want to query and analyze logs, you must enable full-text indexing or configure field-based indexes. If you configure both features, the system prioritizes the settings of field-based indexes.
    • If the data type of an index is long or double, you cannot configure the Case-Sensitive or Delimiter parameter.
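
The Filter Configuration examples in step 6 can be checked offline before you save a configuration. The following Python sketch is only an approximation: the is_collected helper is hypothetical, and it assumes that a filter regular expression must match the whole field value and that a log without the filtered key is not collected. It uses Python's re module, not Logtail itself, so treat the results as a sanity check rather than a guarantee of Logtail behavior.

```python
import re

def is_collected(log, filters):
    """Return True if every filtered key exists and its value fully matches.

    Hypothetical helper. Full-match semantics and the handling of
    missing keys are assumptions of this sketch.
    """
    for key, pattern in filters.items():
        value = log.get(key)
        if value is None or re.fullmatch(pattern, value) is None:
            return False
    return True

# Filter regular expressions taken from the examples in step 6.
ONLY_WARN_ERROR = {"level": r"WARNING|ERROR"}
NO_INFO_DEBUG_SUBSTRING = {"level": r"^(?!.*(INFO|DEBUG)).*"}
NO_INFO_DEBUG_EXACT = {"level": r"^(?!(INFO|DEBUG)$).*"}
NO_HEALTHCHECK = {"url": r".*^(?!.*(healthcheck)).*"}

if __name__ == "__main__":
    checks = [
        ({"level": "ERROR"}, ONLY_WARN_ERROR),
        ({"level": "INFO"}, ONLY_WARN_ERROR),
        ({"level": "DEBUG"}, NO_INFO_DEBUG_EXACT),
        ({"url": "/inner/healthcheck/jiankong.html"}, NO_HEALTHCHECK),
        ({"url": "/index.html"}, NO_HEALTHCHECK),
    ]
    for log, filters in checks:
        print(log, "collected" if is_collected(log, filters) else "skipped")
```

The two INFO|DEBUG expressions differ only in strictness: the first rejects any value that contains INFO or DEBUG, and the second rejects only values that are exactly INFO or DEBUG.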

Case: Collect single-line logs

  • Sample log
    127.0.0.1 - - [10/Sep/2018:12:36:49 +0800] "GET /index.html HTTP/1.1" 200 612 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36"
  • Regular expression
    (\S+)\s-\s(\S+)\s\[([^]]+)]\s"(\S+)\s(\S+)\s(\S+)"\s(\S+)\s(\S+)\s"(\S+)"\s"([^"]+)".*
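
To see what this case produces, the following Python sketch applies the regular expression, written with the + quantifiers that its ten capture groups need, to the sample log and pairs each captured value with a key, as the Extracted Content parameter requires. The key names and the extract_fields helper are assumptions chosen for illustration. The sketch runs in Python's re module, not in Logtail, and the %z directive is added to the time format only so that Python can parse the +0800 offset.

```python
import re
from datetime import datetime

# Regular expression for the single-line case; each group is one value.
PATTERN = re.compile(
    r'(\S+)\s-\s(\S+)\s\[([^]]+)]\s"(\S+)\s(\S+)\s(\S+)"'
    r'\s(\S+)\s(\S+)\s"(\S+)"\s"([^"]+)".*'
)

# Hypothetical keys, one per capture group (names are assumptions).
KEYS = [
    "remote_addr", "remote_user", "time_local", "method", "url",
    "protocol", "status", "body_bytes_sent", "http_referer",
    "http_user_agent",
]

SAMPLE = (
    '127.0.0.1 - - [10/Sep/2018:12:36:49 +0800] '
    '"GET /index.html HTTP/1.1" 200 612 "-" '
    '"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) '
    'AppleWebKit/537.36 (KHTML, like Gecko) '
    'Chrome/68.0.3440.106 Safari/537.36"'
)

def extract_fields(line):
    """Return a key-value dict, or None if the line fails to parse."""
    match = PATTERN.fullmatch(line)
    if match is None:
        return None
    return dict(zip(KEYS, match.groups()))

fields = extract_fields(SAMPLE)

# Turn the extracted time field into a timestamp.
log_time = datetime.strptime(fields["time_local"], "%d/%b/%Y:%H:%M:%S %z")

if __name__ == "__main__":
    for key, value in fields.items():
        print(f"{key}: {value}")
    print("timestamp:", int(log_time.timestamp()))
```

A line that does not match the expression returns None, which corresponds to a log that fails to be parsed.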

Case: Collect multi-line logs

  • Sample log
    [2018-10-01T10:30:01,000] [INFO] java.lang.Exception: exception happened
        at TestPrintStackTrace.f(TestPrintStackTrace.java:3)
        at TestPrintStackTrace.g(TestPrintStackTrace.java:7)
        at TestPrintStackTrace.main(TestPrintStackTrace.java:16)
  • Regular expression that is used to match the start part in the first line of the log
    \[\d+-\d+-\w+:\d+:\d+,\d+]\s\[\w+]\s.*
  • Regular expression
    \[(\S+)]\s\[(\S+)]\s(.*)
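
The two expressions in this case play different roles: the first decides where a new log starts, and the second extracts fields from the assembled log. The following Python sketch illustrates that split under stated assumptions. The split_entries helper is hypothetical, the second log in LINES is an invented line added only to show the boundary, and re.DOTALL is used so that (.*) can span the stack-trace lines. It shows the idea in Python's re module and is not Logtail's implementation.

```python
import re

# Matches the first line of a log; written with its "+" quantifiers.
FIRST_LINE = re.compile(r'\[\d+-\d+-\w+:\d+:\d+,\d+]\s\[\w+]\s.*')

# Extracts time, level, and message. DOTALL is an assumption of this
# sketch so that (.*) also captures the continuation lines.
FIELDS = re.compile(r'\[(\S+)]\s\[(\S+)]\s(.*)', re.DOTALL)

LINES = [
    "[2018-10-01T10:30:01,000] [INFO] java.lang.Exception: exception happened",
    "    at TestPrintStackTrace.f(TestPrintStackTrace.java:3)",
    "    at TestPrintStackTrace.g(TestPrintStackTrace.java:7)",
    "    at TestPrintStackTrace.main(TestPrintStackTrace.java:16)",
    # Hypothetical second log, added only to show where a new log begins.
    "[2018-10-01T10:30:02,000] [INFO] next log",
]

def split_entries(lines):
    """Group raw lines into logs; a first-line match starts a new log."""
    entries, current = [], []
    for line in lines:
        if FIRST_LINE.fullmatch(line) and current:
            entries.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        entries.append("\n".join(current))
    return entries

entries = split_entries(LINES)
parsed = [FIELDS.fullmatch(entry).groups() for entry in entries]

if __name__ == "__main__":
    for time_value, level, message in parsed:
        print(time_value, level, repr(message))
```

The stack-trace lines start with whitespace, so they never match the first-line expression and stay attached to the log that precedes them.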