Simple Log Service: Diagnose and monitor LoongCollector runtime status

Last Updated: Jun 02, 2026

Simple Log Service provides diagnostics to pinpoint collection errors such as regex parsing failures, incorrect file paths, or traffic exceeding shard capacity. You can also use built-in alert rules to monitor the collector in real time and receive notifications through DingTalk or other channels.

Prerequisites

  • A collector is configured to collect logs. For details, see Collect text logs from a host.

  • Enable important logs for the destination project

    Enable the required service logs. For details, see Enable service logs.

    1. Log on to the Simple Log Service console. In the project list, click the destination project. On the project details page, click the Service Log tab and then click Enable Service Logs.

    2. In the Enable Detailed Logs panel, select Important Logs and Job Operational Logs, and then click OK.

      • A project named log-service-{user-id}-{region} is automatically created in the destination region.

      • Ingestion, storage, query, and analysis of important logs and job operational logs are free of charge. Data transformation and data shipping are billed on a pay-as-you-go basis.
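The naming pattern for the automatically created project can be sketched as a small helper, which may be handy when scripting a lookup of the service-log project. The function name and the sample user ID and region below are hypothetical and used only for illustration.

```python
def service_log_project_name(user_id: str, region: str) -> str:
    """Build a name following the pattern log-service-{user-id}-{region}."""
    return f"log-service-{user_id}-{region}"


# Hypothetical account ID and region, for illustration only.
print(service_log_project_name("1234567890", "cn-hangzhou"))
```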

Diagnose runtime issues

Two diagnostic modes are available:

  • Advanced Diagnostics (Recommended): Displays an exception dashboard with collector-related exceptions and supports querying over a longer time range.

  • Basic Diagnostics: Shows collection exceptions from the last hour.

Use cases

  • Abnormal collector status: heartbeat failures, inactive processes, or SSL certificate errors.

  • Log collection failures: logs not collected, high latency, or parsing errors such as regex mismatches.

  • Configuration errors: wrong file paths, mismatched machine group IPs, or cross-account permission issues.

  • Performance bottlenecks: collection rate near or above the default limit (20 MB/s), causing dropped logs.

  • Container log collection issues: frequent pod restarts or rapid log rotation causing incomplete collection.

  • Plugin and custom collection issues: custom plugin failures (for example, Grok parsing) or HTTP data source collection errors.

  • Data reliability issues: log loss from an inactive LoongCollector or excessively fast log rotation.
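As a rough illustration of the performance-bottleneck case above, the following sketch classifies a measured collection rate against the default 20 MB/s limit. The function name, the 80% warning ratio, and the choice of 1 MB = 1024 × 1024 bytes are assumptions made for this example and are not defined by Simple Log Service.

```python
DEFAULT_LIMIT_MB_PER_S = 20.0  # default collection rate limit mentioned above
BYTES_PER_MB = 1024 * 1024     # assumption: binary megabytes


def classify_rate(bytes_collected: int, seconds: float,
                  limit_mb_per_s: float = DEFAULT_LIMIT_MB_PER_S,
                  warn_ratio: float = 0.8) -> str:
    """Return 'ok', 'near_limit', or 'over_limit' for an observed rate.

    warn_ratio is a hypothetical threshold: rates at or above
    warn_ratio * limit are flagged as near the limit.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    rate_mb_per_s = bytes_collected / seconds / BYTES_PER_MB
    if rate_mb_per_s >= limit_mb_per_s:
        return "over_limit"
    if rate_mb_per_s >= warn_ratio * limit_mb_per_s:
        return "near_limit"
    return "ok"
```

For example, 17 MB/s sustained over a minute would be reported as near_limit under these assumptions, which is a signal to investigate before logs are dropped.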

Procedure

  1. Log on to the Simple Log Service console. In the project list, click the destination project.

  2. Click Log Storage. In the LogStore list, hover over the target LogStore and click the Logtail configuration management icon.

  3. Click Advanced Diagnostics or Basic Diagnostics to view the diagnostic information.

  4. View diagnostic results.

    Basic diagnostics

    The Log Collection Error panel lists all LoongCollector collection errors for the LogStore. Click an error code to view details. For error codes, see Common data collection errors.

    Advanced diagnostics

    The LoongCollector/Logtail Exception Monitoring page shows metrics such as Active Collection Agent Count and Complete Error Information. For dashboard details, see View data reports. For error codes, see Common data collection errors.

  5. After resolving an issue, check for new errors. Historical errors remain visible until they expire, so ignore them and confirm that no new errors appear. LoongCollector reports errors every 10 minutes, so wait at least that long before concluding that the issue is resolved.

    To view the complete logs that were dropped due to parsing failures, check the LoongCollector runtime logs:

      • For hosts: the /usr/local/ilogtail/loongcollector.LOG file on the server.

      • For containers: the /usr/local/ilogtail/loongcollector.LOG file in the container.
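The check in step 5 can be sketched as follows: separate errors reported after the fix from historical ones, and note whether a full 10-minute reporting interval has elapsed. The function name and the (timestamp, error code) tuple layout are assumptions for this example; real error records come from the diagnostics panel.

```python
from datetime import datetime, timedelta

REPORT_INTERVAL = timedelta(minutes=10)  # errors are reported every 10 minutes


def new_errors_since_fix(errors, fixed_at: datetime, now: datetime):
    """Split out errors reported after a fix was applied.

    errors: iterable of (timestamp, error_code) tuples (hypothetical layout).
    Returns (fresh_errors, waited_full_interval). The fix can be considered
    confirmed only when fresh_errors is empty AND waited_full_interval is True.
    """
    fresh = [e for e in errors if e[0] > fixed_at]
    waited_full_interval = (now - fixed_at) >= REPORT_INTERVAL
    return fresh, waited_full_interval
```

Errors with a timestamp at or before the fix are treated as historical and ignored, matching the guidance above.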

Monitor runtime status

Simple Log Service provides built-in alert policies to monitor the collector in real time:

  • Monitor collector heartbeats

    Query the internal-diagnostic_log LogStore for logs with __topic__:logtail_status to count machines with normal heartbeats. Configure an alert rule to trigger when the heartbeat count falls below the expected value, identifying machines that are down or have network issues.

  • Set up alerts for collection exceptions

    Run the __topic__:logtail_alarm query to analyze exceptions from the last 15 minutes, such as unreadable files, insufficient permissions, and parsing failures. This helps you identify and fix configuration issues to prevent log loss.

  • Receive warnings for performance bottlenecks

    Use the Logtail exception monitoring dashboard to view active LoongCollector counts, restart history, and error messages. Monitor runtime status and resource usage (CPU, memory) to identify performance bottlenecks or abnormal restarts.

  • Monitor centralized log collection

    Use the LoongCollector file collection monitoring dashboard to track collected file counts, average latency, and parsing failure rates. This lets you centrally manage log collection status across multiple accounts or regions.
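The heartbeat check described in the first item above can be sketched as a comparison between the machines you expect and the machines that actually reported a logtail_status log. This sketch assumes the status logs have already been fetched as dictionaries and that each carries an "ip" field; the field name, the function name, and the sample IP addresses are assumptions for illustration.

```python
def missing_heartbeats(expected_ips, status_logs):
    """Return the expected IPs that sent no heartbeat log.

    status_logs: list of dicts; only entries whose __topic__ is
    'logtail_status' count as heartbeats. The 'ip' key is an assumed
    field name for the reporting machine's address.
    """
    seen = {
        log["ip"]
        for log in status_logs
        if log.get("__topic__") == "logtail_status" and "ip" in log
    }
    return sorted(set(expected_ips) - seen)


# Hypothetical sample data: 10.0.0.2 only reported an alarm, not a heartbeat.
logs = [
    {"__topic__": "logtail_status", "ip": "10.0.0.1"},
    {"__topic__": "logtail_alarm", "ip": "10.0.0.2"},
    {"__topic__": "logtail_status", "ip": "10.0.0.3"},
]
print(missing_heartbeats(["10.0.0.1", "10.0.0.2", "10.0.0.3"], logs))
```

A non-empty result corresponds to the alert condition above: the heartbeat count has fallen below the expected value, and the returned addresses are the machines to inspect for downtime or network issues.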

Procedure

  1. Configure an action policy to define how notifications are sent when an alert status changes.

    1. Log on to the Simple Log Service console.

    2. In the project list, click the project where you enabled important logs.

    3. In the left-side navigation pane, click Alerts. On the Alert Center page, choose Notification Management > Action Policy.

    4. In the action policy list, find the sls.app.logtail.builtin action policy and click Modify in the Actions column.

    5. In the Edit Action Policy dialog box, select and configure a notification channel based on your needs. For details, see Notification channels. Then, click Confirm.

  2. Create an alert rule to trigger when the LoongCollector runtime status meets a specified threshold.

    1. On the Alert Center page, click the Alert Rules tab, and then click the icon next to Create Alert.

    2. Click Create from Template. In the Create from Template panel, click Logtail Fault Monitor under All Templates, then click the target card.

    3. In the Create Alert panel, review the configuration. The built-in alert rule includes preset parameters. Click OK. For details, see Create an alert rule.