A Logtail configuration contains a set of policies that Logtail uses to collect logs. You can specify a data source and a collection mode when you create a custom Logtail configuration to collect logs. This topic describes how to configure Logtail when you use the Log Service API to collect data.

Basic parameters

The following table describes the basic parameters of Logtail configurations.
Table 1. Basic parameters of Logtail configurations
Parameter Type Required Example Description
configName string Yes config-sample The name of the Logtail configuration. The name must be unique in a project and cannot be modified after the configuration is created.

The name must meet the following requirements:

  • The name can contain only lowercase letters, digits, hyphens (-), and underscores (_).
  • The name must start and end with a lowercase letter or digit.
  • The name must be 2 to 128 characters in length.
inputType string Yes file The method that is used to collect logs. Valid values:
  • plugin: MySQL binary logs can be collected by using a Logtail plug-in.
  • file: Text logs can be collected in full regex mode or delimiter mode.
inputDetail JSON object Yes None The configuration details for log collection. For more information, see inputDetail.
outputType string Yes LogService The output type of collected logs. Valid value: LogService. This value indicates that the collected logs can only be uploaded to Log Service.
outputDetail JSON object Yes None The output configuration details for collected logs. For more information, see outputDetail.
logSample string No None The sample log.
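The naming rules for configName listed above can be checked on the client before a request is sent. The following is a minimal sketch; the function name is_valid_config_name is hypothetical and is not part of the Log Service API.

```python
import re

# Rules taken from the configName description:
#   - only lowercase letters, digits, hyphens (-), and underscores (_)
#   - must start and end with a lowercase letter or digit
#   - 2 to 128 characters in length
_CONFIG_NAME_RE = re.compile(r"[a-z0-9][a-z0-9_-]{0,126}[a-z0-9]")


def is_valid_config_name(name: str) -> bool:
    """Return True if name satisfies the configName rules (hypothetical helper)."""
    return _CONFIG_NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("config-sample", "a", "-config", "Config_1"):
        print(candidate, is_valid_config_name(candidate))
```

This check covers only the format of the name. Uniqueness within a project can be verified only by Log Service.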

inputDetail

The inputDetail parameter is used to configure the details of log collection.

  • Basic parameters of inputDetail
    Table 2. Basic parameters of inputDetail
    Parameter Type Required Example Description
    filterKey array No ["ip"] The keys that are used to filter logs. A log is collected only when the values of the keys match the regular expressions that are specified in the filterRegex parameter.
    filterRegex array No ["^10.*"] The regular expressions used to match the values of the keys that are specified in the filterKey parameter. The number of elements in the filterRegex parameter must be the same as the number of elements in the filterKey parameter.
    shardHashKey array No ["__topic__"] The mode of data writes. By default, data is written to Log Service in load balancing mode. For more information, see Load balancing mode.

    If you specify this parameter, data is written to Log Service in shard mode. For more information, see Shard mode. Valid values: __topic__, __hostname__, and __source__.

    enableRawLog boolean No false Specifies whether to upload raw logs. Valid values:
    • true: Raw logs are uploaded.
    • false: Raw logs are not uploaded.
    sensitive_keys array No None The desensitization configuration. For more information, see Table 3.
    mergeType string No topic The method that is used to aggregate data. Valid values:
    • topic: Data is aggregated by topic. This is the default value.
    • logstore: Data is aggregated by Logstore.
    delayAlarmBytes int No 209715200 The alert threshold of log collection delay. Default value: 209715200. This value indicates that an alert is triggered if the delay exceeds 200 MB.
    adjustTimezone boolean No false Specifies whether to change the time zone of logs. This parameter takes effect only when the time parsing feature is enabled, for example, when the timeFormat parameter is specified.
    logTimezone string No GMT+08:00 The offset of the time zone. Format: GMT+HH:MM or GMT-HH:MM. If the time zone of logs is UTC+8, set this parameter to GMT+08:00.
    advanced JSON object No None The advanced features. For more information, see Table 4.
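The pairing of filterKey and filterRegex can be illustrated with a short sketch. It assumes that each regular expression must match the whole field value and that a log without one of the keys is not collected. Neither assumption is stated in this topic, and passes_filter is a hypothetical name.

```python
import re


def passes_filter(log: dict, filter_key: list, filter_regex: list) -> bool:
    """Return True if every key in filter_key matches its paired regex.

    Assumptions (not confirmed by this topic): full-string matching is used,
    and a missing key causes the log to be filtered out.
    """
    if len(filter_key) != len(filter_regex):
        raise ValueError("filterKey and filterRegex must have the same length")
    for key, pattern in zip(filter_key, filter_regex):
        value = log.get(key)
        if value is None or re.fullmatch(pattern, value) is None:
            return False
    return True


if __name__ == "__main__":
    print(passes_filter({"ip": "10.0.0.1"}, ["ip"], ["^10.*"]))
    print(passes_filter({"ip": "192.168.0.1"}, ["ip"], ["^10.*"]))
```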
    The following table describes the settings for the sensitive_keys parameter.
    • Parameter settings
      Table 3. sensitive_keys
      Parameter Type Required Example Description
      key string Yes content The name of the log field.
      type string Yes const The method that is used to desensitize content. Valid values:
      • const: The sensitive content is replaced by the value of the const field.
      • md5: The sensitive content is replaced by the corresponding MD5 value.
      regex_begin string Yes 'password':' The regular expression that is used to match the prefix of the sensitive content. The regular expression is used to search for the sensitive content. Use the RE2 syntax. For more information, see RE2 syntax.
      regex_content string Yes [^']* The regular expression that is used to match the sensitive content. Use the RE2 syntax. For more information, see RE2 syntax.
      all boolean Yes true Specifies whether to replace all sensitive content in the field that is specified in the key parameter. Valid values:
      • true: replaces all sensitive content in the field. We recommend that you set the all parameter to true.
      • false: replaces only the first part of the content that matches the specified regular expression in the field.
      const string No "********" This parameter is required only when you set the type parameter to const.
    • Configuration example
      For example, if a log contains the content field whose value is [{'account':'1812213231432969','password':'04a23f38'}, {'account':'1812213685634','password':'123a'}], you can set the sensitive_keys parameter in the following format to desensitize password:
      {
          "key": "content",
          "type": "const",
          "regex_begin": "'password':'",
          "regex_content": "[^']*",
          "all": true,
          "const": "********"
      }
    • Sample log after desensitization
      [{'account':'1812213231432969','password':'********'}, {'account':'1812213685634','password':'********'}]
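The desensitization settings above can be approximated in a few lines of Python. This is an illustrative sketch, not Logtail's implementation: it uses Python's re module rather than RE2 (the two agree for the patterns in this example), and the function name desensitize and its keyword arguments are hypothetical.

```python
import hashlib
import re


def desensitize(value, regex_begin, regex_content,
                mode="const", const="********", replace_all=True):
    """Mask the text that follows regex_begin and matches regex_content.

    mode="const" replaces the sensitive content with `const`;
    mode="md5" replaces it with its MD5 hex digest.
    replace_all=False masks only the first match, like "all": false.
    """
    pattern = re.compile(
        "(?P<begin>" + regex_begin + ")(?P<content>" + regex_content + ")"
    )

    def _mask(match):
        secret = match.group("content")
        if mode == "const":
            masked = const
        else:
            masked = hashlib.md5(secret.encode("utf-8")).hexdigest()
        # Keep the prefix, replace only the sensitive part.
        return match.group("begin") + masked

    # count=0 means "replace every match" in re.sub.
    return pattern.sub(_mask, value, count=0 if replace_all else 1)


if __name__ == "__main__":
    content = ("[{'account':'1812213231432969','password':'04a23f38'}, "
               "{'account':'1812213685634','password':'123a'}]")
    print(desensitize(content, "'password':'", "[^']*"))
```

With the sample field value from the configuration example, the sketch masks both passwords when replace_all is True and only the first one when it is False.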
    The following table describes the settings for the advanced parameter.
    Table 4. advanced
    Parameter Type Required Example Description
    exactly_once_concurrency int No 1 Specifies whether to enable the ExactlyOnce write feature and, if the feature is enabled, the maximum number of log groups that can be sent concurrently for a single file. For more information, see Appendix: ExactlyOnce write feature. Valid values range from 0 to 512:
    • 0: The ExactlyOnce write feature is not enabled.
    • Other values: The ExactlyOnce write feature is enabled, and the value is the maximum number of log groups that can be sent concurrently for a single file.
    Notice
    • If you set this parameter to a greater value, more memory and disk usage is required. Set this parameter based on the actual local write traffic.
    • If the value of this parameter is less than the number of shards in the related Logstore, Logtail distributes data randomly so that the data is written evenly to each shard. The specified value does not need to be the same as the number of shards.
    • The setting is available only for files that are generated after the parameter is specified.
    • This parameter is available only for Logtail V1.0.21 or later.
    enable_log_position_meta boolean No true Specifies whether to add the metadata information of the raw file to the related log. The metadata information includes the __tag__:__inode__ and __file_offset__ fields. Valid values:
    • true: The metadata information of the raw file is added to the related log.
    • false: The metadata information of the raw file is not added to the related log.
    Note This parameter is available only for Logtail V1.0.21 or later.
    specified_year uint No 0 If the time of the raw log does not contain the year information, you can set this parameter to add the current year or a specified year to the log time. Valid values:
    • 0: adds the current year to the log time.
    • A specific year: adds a specified year to the log time, for example, 2020.
    Note This parameter is available only for Logtail V1.0.21 or later.
    force_multiconfig boolean No false Specifies whether the current Logtail configuration can collect data from the files that are matched by other Logtail configurations. Default value: false. This value indicates that the current Logtail configuration cannot collect data from the files that are matched by other Logtail configurations.

    This parameter is applicable to the scenario where the data in a file needs to be written to Log Service multiple times. For example, the data of a file is written to two different Logstores based on two Logtail configurations.

    raw_log_tag string No __raw__ The field that is used to store raw logs. Default value: __raw__.
    blacklist object No None The blacklist configuration. For more information, see Table 5.
    tail_size_kb int No 1024 The tail size of the data that is collected for the first time. Unit: KB. Default value: 1024. This value indicates that the tail size is 1 MB.
    batch_send_interval int No 3 The interval at which aggregated data is sent. Unit: seconds. Default value: 3.
    max_rotate_queue_size int No 20 The maximum length of the queue in which a file is rotated. Default value: 20.
    The following table describes the settings for the blacklist parameter.
    Table 5. blacklist
    Parameter Type Required Example Description
    dir_blacklist array No ["/home/admin/dir1", "/home/admin/dir2*"] The blacklist of directories (absolute paths). The wildcard character asterisk (*) can be used to match multiple directories.

    For example, if you add /home/admin/dir1 to the blacklist, all contents in the /home/admin/dir1 directory are ignored during the log collection process.

    filename_blacklist array No ["app*.log", "password"] The blacklist of file names. The files whose names are specified are not collected regardless of the directories to which the files belong. The wildcard character asterisk (*) can be used to match multiple file names.
    filepath_blacklist array No ["/home/admin/private*.log"] The blacklist of file paths (absolute paths). The wildcard character asterisk (*) can be used to match multiple files.

    For example, if you add /home/admin/private*.log to the blacklist, all files that start with private and end with .log in the /home/admin/ directory are ignored during the log collection process.
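The three blacklist types can be compared side by side with a small sketch built on Python's fnmatch module. It is an approximation under stated assumptions: the exact wildcard rules of Logtail are not described in this topic, and in fnmatch the asterisk (*) also matches path separators, which may be broader than Logtail's behavior. The function name is_blacklisted is hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def is_blacklisted(path, dir_blacklist=(), filename_blacklist=(),
                   filepath_blacklist=()):
    """Return True if an absolute file path is excluded by any blacklist."""
    p = PurePosixPath(path)
    # filename_blacklist: compared with the file name only, in any directory.
    if any(fnmatchcase(p.name, pat) for pat in filename_blacklist):
        return True
    # filepath_blacklist: compared with the full absolute path.
    if any(fnmatchcase(str(p), pat) for pat in filepath_blacklist):
        return True
    # dir_blacklist: the file is ignored if any ancestor directory matches.
    for parent in p.parents:
        if any(fnmatchcase(str(parent), pat) for pat in dir_blacklist):
            return True
    return False


if __name__ == "__main__":
    dirs = ["/home/admin/dir1", "/home/admin/dir2*"]
    print(is_blacklisted("/home/admin/dir1/app.log", dir_blacklist=dirs))
    print(is_blacklisted("/home/admin/dir3/app.log", dir_blacklist=dirs))
```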

  • Logtail configurations used to collect text logs
    • Basic configurations
      Parameter Type Required Example Description
      logType string Yes common_reg_log The mode in which logs are collected. Valid values:
      • json_log: collects logs in JSON mode.
      • common_reg_log: collects logs in full regex mode.
      • delimiter_log: collects logs in delimiter mode.
      logPath string Yes /var/log/http/ The path of log files.
      filePattern string Yes access*.log The pattern of a log file name.
      topicFormat string Yes none The method that is used to generate a topic. Valid values:
      • none: No log topic is generated.
      • default: The topic of the collected logs is the directory of the logs.
      • group_topic: The topic of the collected logs is the topic of the machine group to which the Logtail configuration is applied.
      • The regular expression of the file path: The topic of the collected logs is a part of the log file path. Example: /var/log/(.*).log.

      For more information, see Log topics.

      timeFormat string No %Y/%m/%d %H:%M:%S The format of the log time. For more information, see Time formats.
      preserve boolean No true Specifies whether the monitoring of log files can time out. Valid values:
      • true: The monitored directory never times out.
      • false: If a log file is not updated within 30 minutes, Logtail considers the file to be timed out and no longer monitors the file.
      preserveDepth integer No 1 If the preserve parameter is set to false, you must specify the maximum depth of the monitored directories. Valid values: 1 to 3.
      fileEncoding string No utf8 The encoding format of the log file. Valid values: utf8 and gbk.
      discardUnmatch boolean No true Specifies whether to discard the logs that do not match the specified filter conditions. Valid values:
      • true: Unmatched logs are not uploaded to Log Service.
      • false: Unmatched logs are uploaded to Log Service.
      maxDepth int No 100 The maximum depth at which the specified log directory is monitored. Valid values: 0 to 1000. The value 0 indicates that only the directory that is specified in the log path is monitored.
      delaySkipBytes int No 0 The threshold, in bytes, above which data that has accumulated because of a collection delay is discarded. Valid values:
      • 0: The delayed data is not discarded.
      • Other values: If the size of the data that is generated after a collection delay occurs exceeds the specified value, for example, 1024 bytes, the delayed data is discarded.
      dockerFile boolean No false Specifies whether logs are collected from containers. Default value: false.
      dockerIncludeLabel JSON object No None If you want to specify a label whitelist, you must specify the LabelKey parameter. If the value of the LabelValue parameter is not empty, logs are collected from the containers whose label key-value pairs match the specified key-value pairs. If the value of the LabelValue parameter is empty, logs are collected from the containers whose label keys match the specified keys.
      Note
      • Key-value pairs are associated by the OR operator. If a label key-value pair of a container matches one of the specified key-value pairs, logs of the container are collected.
      • By default, the value of the LabelValue parameter is matched as a string. Logs are collected from the containers whose label value is identical to the value of the LabelValue parameter. If the value of the LabelValue parameter starts with a caret (^) and ends with a dollar sign ($), it is matched as a regular expression. For example, if you set the LabelValue parameter to ^(kube-system|istio-system)$, logs are collected from the containers whose label value is kube-system or istio-system.
      • Do not specify duplicate values for the LabelKey parameter. Otherwise, the existing value of the LabelValue parameter is overwritten.
      dockerExcludeLabel JSON object No None If you want to specify a label blacklist, you must specify the LabelKey parameter. If the value of the LabelValue parameter is not empty, logs are not collected from the containers whose label key-value pairs match the specified key-value pairs. If the value of the LabelValue parameter is empty, logs are not collected from the containers whose label keys match the specified keys.
      Note
      • Key-value pairs are associated by the OR operator. If a label key-value pair of a container matches one of the specified key-value pairs, logs of the container are not collected.
      • By default, the value of the LabelValue parameter is matched as a string. Logs are not collected from the containers whose label value is identical to the value of the LabelValue parameter. If the value of the LabelValue parameter starts with a caret (^) and ends with a dollar sign ($), it is matched as a regular expression. For example, if you set the LabelValue parameter to ^(kube-system|istio-system)$, logs are not collected from the containers whose label value is kube-system or istio-system.
      • Do not specify duplicate values for the LabelKey parameter. Otherwise, the existing value of the LabelValue parameter is overwritten.
      dockerIncludeEnv JSON object No None If you want to specify an environment variable whitelist, you must specify the EnvKey parameter. If the value of the EnvValue parameter is not empty, logs are collected from the containers whose environment variable key-value pairs match the specified key-value pairs. If the value of the EnvValue parameter is empty, logs are collected from the containers whose environment variable keys match the specified keys.
      Note
      • Key-value pairs are associated by the OR operator. If an environment variable key-value pair of a container matches one of the specified key-value pairs, logs of the container are collected.
      • By default, the value of the EnvValue parameter is matched as a string. Logs are collected from the containers whose environment variable value is identical to the value of the EnvValue parameter. If the value of the EnvValue parameter starts with a caret (^) and ends with a dollar sign ($), it is matched as a regular expression. For example, if you set the EnvValue parameter to ^(kube-system|istio-system)$, logs are collected from the containers whose environment variable value is kube-system or istio-system.
      dockerExcludeEnv JSON object No None If you want to specify an environment variable blacklist, you must specify the EnvKey parameter. If the value of the EnvValue parameter is not empty, logs are not collected from the containers whose environment variable key-value pairs match the specified key-value pairs. If the value of the EnvValue parameter is empty, logs are not collected from the containers whose environment variable keys match the specified keys.
      Note
      • Key-value pairs are associated by the OR operator. If an environment variable key-value pair of a container matches one of the specified key-value pairs, logs of the container are not collected.
      • By default, the value of the EnvValue parameter is matched as a string. Logs are not collected from the containers whose environment variable value is identical to the value of the EnvValue parameter. If the value of the EnvValue parameter starts with a caret (^) and ends with a dollar sign ($), it is matched as a regular expression. For example, if you set the EnvValue parameter to ^(kube-system|istio-system)$, logs are not collected from the containers whose environment variable value is kube-system or istio-system.
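The matching rules shared by dockerIncludeLabel, dockerExcludeLabel, dockerIncludeEnv, and dockerExcludeEnv can be sketched as follows. The sketch assumes, in line with the key-value wording of the parameter descriptions, that LabelValue and EnvValue are compared with the label or environment variable value of the container. How the four parameters combine with each other is not described here, so the sketch evaluates a single rule set only. The function names and the label keys in the example are hypothetical.

```python
import re


def value_matches(expected: str, actual: str) -> bool:
    """Compare one configured value with one container value."""
    if expected == "":
        # Empty value: the key alone is enough for a match.
        return True
    if expected.startswith("^") and expected.endswith("$"):
        # Values wrapped in ^...$ are treated as regular expressions.
        return re.search(expected, actual) is not None
    # Otherwise the value is compared as a plain string.
    return expected == actual


def matches_any(rules: dict, container_pairs: dict) -> bool:
    """Return True if at least one configured pair matches (OR semantics)."""
    return any(
        key in container_pairs and value_matches(expected, container_pairs[key])
        for key, expected in rules.items()
    )


if __name__ == "__main__":
    include_label = {"namespace": "^(kube-system|istio-system)$", "tier": ""}
    print(matches_any(include_label, {"namespace": "kube-system"}))
    print(matches_any(include_label, {"namespace": "default"}))
```

For a whitelist parameter, a True result means that the logs of the container are collected. For a blacklist parameter, a True result means that they are not collected.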
    • Configurations that are specific to log collection in full regex mode and simple mode
      Table 6. Parameters in full regex mode and simple mode
      Parameter Type Required Example Description
      key array Yes ["content"] The list of keys that are used to specify fields for raw logs.
      logBeginRegex string No .* The regular expression that is used to match the first line of a log.
      regex string No (.*) The regular expression that is used to extract the value of a field.

      The following example shows a Logtail configuration that is used to collect logs in full regex mode:

      {
          "configName": "logConfigName", 
          "outputType": "LogService", 
          "inputType": "file", 
          "inputDetail": {
              "logPath": "/logPath", 
              "filePattern": "*", 
              "logType": "common_reg_log", 
              "topicFormat": "default", 
              "discardUnmatch": false, 
              "enableRawLog": true, 
              "fileEncoding": "utf8", 
              "maxDepth": 10, 
              "key": [
                  "content"
              ], 
              "logBeginRegex": ".*", 
              "regex": "(.*)"
          }, 
          "outputDetail": {
              "projectName": "test-project", 
              "logstoreName": "test-logstore"
          }
      }
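The relationship between the regex and key parameters can be shown with a brief sketch: each capture group in the regular expression supplies the value for the key at the same position. The sketch assumes that the expression must match the entire log and that the number of capture groups equals the number of keys. The name parse_with_regex and the second sample pattern are hypothetical.

```python
import re


def parse_with_regex(line: str, regex: str, keys: list):
    """Map the capture groups of regex onto keys; return None on a mismatch."""
    match = re.fullmatch(regex, line)
    if match is None or len(match.groups()) != len(keys):
        return None
    return dict(zip(keys, match.groups()))


if __name__ == "__main__":
    # Same settings as the sample configuration: one group, one key.
    print(parse_with_regex("GET /index.html 200", "(.*)", ["content"]))
    # A hypothetical two-field pattern.
    print(parse_with_regex("10.0.0.1 200", r"(\S+) (\d+)", ["ip", "status"]))
```

A None result corresponds to an unmatched log. Whether such a log is uploaded depends on the discardUnmatch parameter.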
    • Configurations that are specific to log collection in JSON mode
      Parameter Type Required Example Description
      timeKey string No time The key that is used to specify the time field.
    • Configurations that are specific to log collection in delimiter mode
      Parameter Type Required Example Description
      separator string No , Select a delimiter based on the log format. For more information, see Appendix: delimiters and sample log entries.
      quote string Yes \ If a log field contains delimiters, you must specify a pair of quotes to enclose the field. Log Service parses the content that is enclosed in a pair of quotes into a complete field. Select a quote based on the log format. For more information, see Appendix: delimiters and sample log entries.
      key array Yes [ "ip", "time"] The list of keys that are used to specify fields for raw logs.
      timeKey string Yes time Specify a field in the key list as a time field.
      autoExtend boolean No true This parameter specifies whether to upload a log entry whose number of parsed fields is less than the number of the specified keys.
      For example, if you set the delimiter to the vertical bar (|), the log entry 11|22|33|44|55 is parsed into the following fields: 11, 22, 33, 44, and 55. You can set the keys to A, B, C, D, and E.
      • true: 55 is uploaded as the value of the D key when Log Service collects the log entry 11|22|33|55.
      • false: Log Service discards the log entry 11|22|33|55 whose number of parsed fields is less than the number of the specified keys.
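The effect of autoExtend can be reproduced with a short sketch. It ignores the quote parameter and simply splits on the delimiter, which is a simplification. The function name parse_delimited is hypothetical.

```python
def parse_delimited(line: str, separator: str, keys: list, auto_extend: bool):
    """Split a log entry and pair the fields with keys.

    If fewer fields than keys are parsed, the entry is kept only when
    auto_extend is True; otherwise None is returned (entry discarded).
    """
    fields = line.split(separator)
    if len(fields) < len(keys) and not auto_extend:
        return None
    # zip stops at the shorter sequence, so missing trailing keys are omitted.
    return dict(zip(keys, fields))


if __name__ == "__main__":
    keys = ["A", "B", "C", "D", "E"]
    print(parse_delimited("11|22|33|44|55", "|", keys, auto_extend=False))
    print(parse_delimited("11|22|33|55", "|", keys, auto_extend=True))
    print(parse_delimited("11|22|33|55", "|", keys, auto_extend=False))
```

With autoExtend enabled, the entry 11|22|33|55 is kept and 55 becomes the value of the D key, as described above. With autoExtend disabled, the entry is dropped.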

      The following example shows a Logtail configuration that is used to collect logs in delimiter mode:

      {
          "configName": "logConfigName", 
          "logSample": "testlog", 
          "inputType": "file", 
          "outputType": "LogService", 
          "inputDetail": {
              "logPath": "/logPath", 
              "filePattern": "*", 
              "logType": "delimiter_log", 
              "topicFormat": "default", 
              "discardUnmatch": true, 
              "enableRawLog": true, 
              "fileEncoding": "utf8", 
              "maxDepth": 999, 
              "separator": ",", 
              "quote": "\"", 
              "key": [
                  "ip", 
                  "time"
              ], 
              "autoExtend": true
          }, 
          "outputDetail": {
              "projectName": "test-project", 
              "logstoreName": "test-logstore"
          }
      }
  • Plug-in configurations
    The following table describes the specific parameter in plug-in configurations.
    Parameter Type Required Example Description
    plugin JSON object Yes None If you use a Logtail plug-in to collect logs, you must specify this parameter. For more information, see Customize Logtail plug-ins to collect data.
    The following example shows Logtail plug-in configurations:
    {
        "configName": "logConfigName", 
        "outputType": "LogService", 
        "inputType": "plugin",
        "inputDetail": {
            "plugin": {
                "inputs": [
                    {
                        "detail": {
                            "ExcludeEnv": null, 
                            "ExcludeLabel": null, 
                            "IncludeEnv": null, 
                            "IncludeLabel": null, 
                            "Stderr": true, 
                            "Stdout": true
                        }, 
                        "type": "service_docker_stdout"
                    }
                ]
            }
        }, 
        "outputDetail": {
            "projectName": "test-project", 
            "logstoreName": "test-logstore"
        }
    }

outputDetail

The following table describes how to configure the project and Logstore for output logs.
Parameter Type Required Example Description
projectName string Yes my-project The name of the project. The name must be the same as the name of the requested project.
logstoreName string Yes my-logstore The name of the Logstore.
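Before a configuration is submitted, the parameters marked as required in Table 1 and in the outputDetail table can be checked locally. The following sketch reports missing fields. It does not call Log Service, and missing_required_fields is a hypothetical helper.

```python
import json

# Required parameters according to Table 1 and the outputDetail table.
REQUIRED_TOP_LEVEL = ("configName", "inputType", "inputDetail",
                      "outputType", "outputDetail")
REQUIRED_OUTPUT = ("projectName", "logstoreName")


def missing_required_fields(config: dict) -> list:
    """Return the names of required parameters that are absent from config."""
    missing = [key for key in REQUIRED_TOP_LEVEL if key not in config]
    output = config.get("outputDetail") or {}
    missing += ["outputDetail." + key for key in REQUIRED_OUTPUT
                if key not in output]
    return missing


if __name__ == "__main__":
    config = json.loads("""
    {
        "configName": "config-sample",
        "inputType": "file",
        "inputDetail": {"logType": "common_reg_log"},
        "outputType": "LogService",
        "outputDetail": {"projectName": "my-project"}
    }
    """)
    print(missing_required_fields(config))
```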

Appendix: ExactlyOnce write feature

If you enable the ExactlyOnce write feature, Logtail records fine-grained checkpoint information by file on your local disk. If exceptions occur during the log collection process or the server restarts, Logtail uses the checkpoint information to check the processing scope of the data in each file, and then uses the incremental sequence numbers of Log Service to prevent sending duplicate data. However, disk resources are consumed when you use the ExactlyOnce write feature to write data.

Limits:
  • Checkpoints are stored by using the local disk. If the disk has no storage space or disk failure occurs, checkpoints cannot be recorded. In this case, the checkpoints cannot be recovered.
  • Checkpoints contain only the metadata information of a file and do not contain the file data. If the file is deleted or modified, the related checkpoints cannot be recovered.
  • The ExactlyOnce write feature depends on the current write sequence numbers that are recorded by Log Service. Each shard supports only 10,000 records. If the limit is exceeded, the previous records are replaced. To ensure the reliability of the feature, the value that is calculated by using the following formula cannot exceed 9500: Value = Number of active files that are written to the same Logstore × Number of Logtail instances. We recommend that you keep the value well below this limit to leave a margin.
    • Number of active files: the number of the files that are being read and sent. Rotated files that share the same logical file name are sent serially and are counted as one active file.
    • Number of Logtail instances: the number of the log collection processes of Logtail. By default, each machine is a Logtail instance. The number of Logtail instances is the same as the number of machines.
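The formula in the preceding limit is a simple product, so a quick check can be scripted. The sketch below uses only the numbers given in this topic. The function names are hypothetical.

```python
# Upper bound stated in this topic for the calculated value.
EXACTLY_ONCE_VALUE_LIMIT = 9500


def exactly_once_value(active_files: int, logtail_instances: int) -> int:
    """Value = active files written to the same Logstore x Logtail instances."""
    return active_files * logtail_instances


def within_exactly_once_limit(active_files: int, logtail_instances: int) -> bool:
    """Return True if the calculated value does not exceed the limit."""
    return exactly_once_value(active_files, logtail_instances) <= EXACTLY_ONCE_VALUE_LIMIT


if __name__ == "__main__":
    # For example, 10 active files on each of 100 machines.
    print(exactly_once_value(10, 100), within_exactly_once_limit(10, 100))
```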

By default, the sync function is not called when Logtail writes checkpoints to the disk. If the machine restarts, the data in the buffer cannot be written to the disk and the checkpoints may be lost. To enable the sync write feature, you can add "enable_checkpoint_sync_write": true, to the /usr/local/ilogtail/ilogtail_config.json Logtail configuration file. For more information, see Set the startup parameters of Logtail.
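If you prefer to script the change instead of editing the file by hand, the following sketch adds the key to a JSON startup configuration file. It assumes that the file contains a single JSON object and that you have permission to write to it. The function name is hypothetical, and the sketch does not restart Logtail or check whether a restart is needed.

```python
import json
from pathlib import Path


def enable_checkpoint_sync_write(config_path: str) -> dict:
    """Set "enable_checkpoint_sync_write": true in a JSON config file."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    config["enable_checkpoint_sync_write"] = True
    # Write the whole object back; other keys are preserved.
    path.write_text(json.dumps(config, indent=4) + "\n", encoding="utf-8")
    return config


if __name__ == "__main__":
    target = "/usr/local/ilogtail/ilogtail_config.json"
    if Path(target).exists():
        print(enable_checkpoint_sync_write(target))
    else:
        print("Config file not found:", target)
```

Back up the file before you run a script like this, because the sketch rewrites the file with its own indentation.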