For information about how to configure monitoring metrics, see Monitor Log Service.

  1. Read/write traffic
    • The real-time read traffic from and write traffic to a Logstore. Data is read or written by using Logtail, the API, or an SDK. The size is that of the raw data or the compressed data, and is measured every minute.
    • Unit: bytes/min.
  2. Size of raw data
    • The size of raw data written to a Logstore.
    • Unit: bytes/min.
  3. Overall QPS
    • The total number of operations per second. The number is counted every minute.
    • Unit: times/min.
  4. Number of operations
    • The number of each type of operation per second. The number is counted every minute.
    • Unit: times/min.
    • The operations include:
      • Write operations:
        • PostLogStoreLogs: The API version is 0.5 or later.
        • PutData: The API version is 0.4 or earlier.
      • Query by keyword:
        • GetLogStoreHistogram: queries the distribution of one or more keywords in log data. The API version is 0.5 or later.
        • GetLogStoreLogs: queries log entries by keyword. The API version is 0.5 or later.
        • GetDataMeta: queries the distribution of one or more keywords in log data. The API version is 0.4 or earlier.
        • GetData: queries log entries by keyword. The API version is 0.4 or earlier.
      • Query log data in batches:
        • GetCursorOrData: retrieves the cursor or batch log data.
        • ListShards: retrieves all shards in a Logstore.
      • List operations:
        • ListCategory: lists Logstores in a project. This API operation has the same function as ListLogStores. The API version is 0.4 or earlier.
        • ListTopics: lists all topics in a Logstore.
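The grouping of operations listed above can be sketched as a simple lookup table. This is a minimal illustration, not part of any Log Service SDK: the category labels, the function names, and the input shape (a dictionary of per-minute counts keyed by operation name) are all assumptions made for the example. The helper `average_per_second` further assumes that a reported value is a count over one minute.

```python
from collections import defaultdict
from typing import Dict

# Operation names come from the list above; the category labels are
# hypothetical and chosen only for this example.
OPERATION_CATEGORY = {
    "PostLogStoreLogs": "write",
    "PutData": "write",
    "GetLogStoreHistogram": "query_by_keyword",
    "GetLogStoreLogs": "query_by_keyword",
    "GetDataMeta": "query_by_keyword",
    "GetData": "query_by_keyword",
    "GetCursorOrData": "batch_query",
    "ListShards": "batch_query",
    "ListCategory": "list",
    "ListTopics": "list",
}


def summarize_by_category(counts_per_min: Dict[str, int]) -> Dict[str, int]:
    """Sum per-minute operation counts by category."""
    totals: Dict[str, int] = defaultdict(int)
    for operation, count in counts_per_min.items():
        # Operations not in the table are grouped under "other".
        totals[OPERATION_CATEGORY.get(operation, "other")] += count
    return dict(totals)


def average_per_second(count_per_min: int) -> float:
    """Convert a per-minute count to an average per-second rate."""
    return count_per_min / 60
```

For example, `summarize_by_category({"PostLogStoreLogs": 120, "PutData": 60})` yields a single `write` total of 180 operations for that minute.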
  5. Service status
    • The QPS of API operations, broken down by the HTTP status code that each operation returns. You can use the status code to check whether a request succeeded. If a request failed, modify the request parameters based on the returned code.
    • HTTP status codes:
      • 200: indicates that the request succeeded.
      • 400: indicates request parameter errors. The parameters may include Host, Content-Length, x-log-apiversion, RequestTimeExpired, query time range, Reverse, Accept-Encoding, Accept, shard, cursor, PostBody, and Content-Type.
      • 401: indicates authentication failures. The probable cause is that the AccessKey ID does not exist, the signature is invalid, or the account is not authorized to send the request. Log on to the console to check whether the AccessKey ID of the account is granted relevant permissions to access the project.
      • 403: indicates that a resource quota limit is exceeded. For example, the maximum number of Logstores that can be created, the maximum number of shards that can be split, or the maximum read/write speed per minute is exceeded. You can analyze the error based on the returned error message.
      • 404: indicates that the requested resource does not exist. The resources can be projects, Logstores, log data topics, and users.
      • 405: indicates that the request method is incorrect. Check the URL of the request.
      • 500: indicates a server-side error. Try again.
      • 502: indicates a server-side error. Try again.
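The status codes above map naturally to a small decision table on the client side. The following is a minimal sketch under stated assumptions: `classify_status`, `RETRYABLE`, and `ADVICE` are hypothetical names invented for this example and are not part of the Log Service API or any SDK. Only 500 and 502 are treated as retryable, because those are the codes for which the list above recommends trying again.

```python
from typing import Tuple

# Hypothetical lookup tables that summarize the status code list above.
RETRYABLE = {500, 502}

ADVICE = {
    200: "request succeeded",
    400: "check the request parameters",
    401: "check the AccessKey ID, signature, and permissions",
    403: "resource quota exceeded; read the returned error message",
    404: "requested resource does not exist",
    405: "request method is incorrect; check the request URL",
    500: "server-side error; try again",
    502: "server-side error; try again",
}


def classify_status(code: int) -> Tuple[bool, str]:
    """Return (should_retry, advice) for an HTTP status code."""
    return code in RETRYABLE, ADVICE.get(code, "status code not listed")
```

A caller could retry while `classify_status(code)[0]` is true and surface the advice string otherwise.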
  6. Traffic resolved successfully
    • The size of the raw log data that Logtail successfully collects.
    • Unit: bytes.
  7. Lines resolved successfully
    • The number of lines of log data that Logtail successfully collects.
    • Unit: lines.
  8. Lines failed to be resolved
    • The number of lines of log data that Logtail fails to collect. A nonzero value for this metric indicates that errors occurred during log data collection.
    • Unit: lines.
  9. Number of errors
    • The number of errors that occurred during log data collection.
    • Unit: times.
  10. Number of error instances
    • The number of servers on which errors occurred during log data collection.
    • Unit: N/A.
  11. Number of error IP addresses
    • The number of server IP addresses categorized based on the errors that occurred during log data collection. The number is calculated every 5 minutes.
      • LOGFILE_PERMINSSION_ALARM: Data cannot be collected because you have no permissions to access the log file.
      • SENDER_BUFFER_FULL_ALARM: Data is dropped because the data collection speed exceeds the network transmission speed.
      • INOTIFY_DIR_NUM_LIMIT_ALARM (INOTIFY_DIR_QUOTA_ALARM): The number of monitored directories exceeds 3,000. Specify a lower-level directory to reduce the number of directories that are monitored.
      • DISCARD_DATA_ALARM: Data is dropped because CPU resources allocated to Logtail are insufficient or the network bandwidth is insufficient.
      • MULTI_CONFIG_MATCH_ALARM: Multiple configuration files match the same log file. Logtail randomly selects one of them to collect the data, and the other configuration files collect no data from that file.
      • REGISTER_INOTIFY_FAIL_ALARM: The inotify tool fails to be registered. For more information, check the logs of Logtail.
      • LOGDIR_PERMINSSION_ALARM: The user is not authorized to access the monitored directory.
      • REGEX_MATCH_ALARM: The regular expression fails to match the log entry. Adjust the regular expression.
      • ENCODING_CONVERT_ALARM: The encoding format of log data fails to be converted. For more information, check the logs of Logtail.
      • PARSE_LOG_FAIL_ALARM: Log data fails to be parsed. This error is returned because the regular expression fails to match the first line of a log entry or a log entry exceeds 512 KB. Check the Logtail logs to identify the cause. If the regular expression used to match the first line of a log entry is incorrect, modify the regular expression.
      • DISCARD_DATA_ALARM: Data is discarded because Logtail fails to write data to the local cache or send the data to Log Service. The probable cause is that log data is generated at a higher speed than it is written to the local cache.
      • DISCARD_DATA_ALARM: Parsed data fails to be sent to Log Service. The probable cause is that the size of log data sent to Log Service exceeds the limit or a network exception occurs. Check the returned error code and error message of Logtail to determine the cause.
      • PARSE_TIME_FAIL_ALARM: The time field in a log entry matched with the regular expression fails to be parsed by using the specified time format. Modify the configurations as required.
      • OUTDATED_LOG_ALARM: Data is dropped because more than 12 hours elapsed between the time the data was collected and the time it was written to Log Service.
    • Locate the specific IP address of the server based on the error. Log on to the server and check the /usr/logtail/ilogtail.LOG file to analyze the cause of the error.
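When checking the Logtail log on a server, it can help to tally which alarm types appear most often. The sketch below is an illustration only and rests on an assumption: that alarm names such as REGEX_MATCH_ALARM appear verbatim in the log lines. The actual format of ilogtail.LOG is not described here, so the pattern may need adjusting. `count_alarms` and `ALARM_PATTERN` are hypothetical names.

```python
import re
from collections import Counter
from typing import Iterable

# Matches upper-case identifiers that end in _ALARM, for example
# PARSE_LOG_FAIL_ALARM. Assumes the names appear verbatim in the log.
ALARM_PATTERN = re.compile(r"\b[A-Z][A-Z0-9_]*_ALARM\b")


def count_alarms(lines: Iterable[str]) -> Counter:
    """Count occurrences of each alarm name in an iterable of log lines."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(ALARM_PATTERN.findall(line))
    return counts
```

To apply it to a real file, pass an open file object, for example `count_alarms(open(path, errors="replace"))`, where `path` is the location of the Logtail log on the server.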