Simple Log Service provides diagnostics to pinpoint collection errors such as regex parsing failures, incorrect file paths, or traffic exceeding shard capacity. You can also use built-in alert rules to monitor the collector in real time and receive notifications through DingTalk or other channels.
Prerequisites
-
A collector is configured to collect logs. For more information, see Collect text logs from a host.
Diagnose runtime issues
Two diagnostic modes are available:
-
Advanced Diagnostics (Recommended): Displays an exception dashboard with collector-related exceptions and supports querying over a longer time range.
-
Basic Diagnostics: Shows collection exceptions from the last hour.
Use cases
-
Abnormal collector status: heartbeat failures, inactive processes, or SSL certificate errors.
-
Log collection failures: logs not collected, high latency, or parsing errors such as regex mismatches.
-
Configuration errors: wrong file paths, mismatched machine group IPs, or cross-account permission issues.
-
Performance bottlenecks: collection rate near or above the default limit (20 MB/s), causing dropped logs.
-
Container log collection issues: frequent pod restarts or rapid log rotation causing incomplete collection.
-
Plugin and custom collection issues: custom plugin failures (for example, Grok parsing) or HTTP data source collection errors.
-
Data reliability issues: log loss from an inactive LoongCollector or excessively fast log rotation.
Procedure
-
Log on to the Simple Log Service console. In the project list, click the destination project.
-
Click Log Storage. In the LogStore list, hover over the target LogStore and click the icon.
-
Click Advanced Diagnostics or Basic Diagnostics to view the diagnostic information.
-
View diagnostic results.
Basic diagnostics
The Log Collection Error panel lists all LoongCollector collection errors for the LogStore. Click an error code to view details. For error codes, see Common data collection errors.
Advanced diagnostics
The LoongCollector/Logtail Exception Monitoring page shows metrics such as Active Collection Agent Count and Complete Error Information. For dashboard details, see View data reports. For error codes, see Common data collection errors.
-
After resolving an issue, check for new errors. LoongCollector reports errors every 10 minutes, so wait at least that long before you check. Historical errors remain visible until they expire; ignore them and confirm that no new errors appear.
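The check in this step amounts to a time filter over the reported errors. The following is a minimal sketch of that logic, not a console feature: the record shape (dicts with hypothetical `error_code` and `reported_at` keys), the function name, and the sample values are assumptions for illustration. Only the 10-minute reporting interval comes from the text.

```python
from datetime import datetime, timedelta

# LoongCollector reports errors every 10 minutes (stated in the text).
REPORT_INTERVAL = timedelta(minutes=10)


def new_errors_since_fix(errors, fixed_at, now):
    """Return errors reported after the fix, or None if it is too early to tell.

    `errors` is a list of dicts with hypothetical keys `error_code` and
    `reported_at` (a datetime); this shape is assumed for illustration.
    """
    # Until one full reporting interval has passed, the absence of new
    # errors proves nothing, so signal "undecided" with None.
    if now - fixed_at < REPORT_INTERVAL:
        return None
    # Errors reported before the fix are historical and can be ignored.
    return [e for e in errors if e["reported_at"] > fixed_at]


# Sample values for illustration only.
fixed_at = datetime(2024, 1, 1, 12, 0)
errors = [
    {"error_code": "REGEX_MATCH_ALARM", "reported_at": datetime(2024, 1, 1, 11, 50)},
    {"error_code": "REGEX_MATCH_ALARM", "reported_at": datetime(2024, 1, 1, 12, 20)},
]
```

An empty list returned after the interval has elapsed suggests that the fix worked; a non-empty list means that the error is still being reported.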
To view complete logs dropped due to parsing failures, check the LoongCollector runtime logs:
-
For hosts: the /usr/local/ilogtail/loongcollector.LOG file on the server.
-
For containers: the /usr/local/ilogtail/loongcollector.LOG file in the container.
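Searching the runtime log by hand can be slow on a busy host. The sketch below filters the file for lines that look like parsing failures. The file path is the one given above; the keywords and function names are hypothetical, because the runtime log format is not described here, so adjust the keywords to what your log actually contains.

```python
from pathlib import Path

# Path taken from the text; the same path applies inside a container.
RUNTIME_LOG = Path("/usr/local/ilogtail/loongcollector.LOG")


def matching_lines(lines, keywords=("parse", "fail")):
    """Yield lines that contain every keyword, case-insensitively.

    The default keywords are an assumption, not a documented log format.
    """
    lowered = [k.lower() for k in keywords]
    for line in lines:
        text = line.lower()
        if all(k in text for k in lowered):
            yield line.rstrip("\n")


def scan_runtime_log(path=RUNTIME_LOG, keywords=("parse", "fail")):
    """Return matching lines from the runtime log, or [] if the file is absent."""
    if not path.exists():
        return []
    # errors="replace" keeps the scan going if the log has undecodable bytes.
    with path.open(encoding="utf-8", errors="replace") as handle:
        return list(matching_lines(handle, keywords))
```

Calling `scan_runtime_log()` on the host, or inside the container, returns the matching lines so that you can inspect the dropped log content.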
Monitor runtime status
SLS provides built-in alert policies to monitor the collector in real time:
-
Monitor collector heartbeats
Query the internal-diagnostic_log LogStore for logs with __topic__:logtail_status to count machines with normal heartbeats. Configure an alert rule to trigger when the heartbeat count falls below the expected value, identifying machines that are down or have network issues.
-
Set up alerts for collection exceptions
Run the __topic__: logtail_alarm query to analyze exceptions within 15 minutes, such as unreadable files, insufficient permissions, and parsing failures. This helps you identify and fix configuration issues to prevent log loss.
-
Receive warnings for performance bottlenecks
Use the Logtail exception monitoring dashboard to view active LoongCollector counts, restart history, and error messages. Monitor runtime status and resource usage (CPU, memory) to identify performance bottlenecks or abnormal restarts.
-
Monitor centralized log collection
Use the LoongCollector file collection monitoring dashboard to track collected file counts, average latency, and parsing failure rates. Centrally manage log collection status across multi-account or multi-region scenarios.
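The logic behind the heartbeat and collection-exception alerts can be sketched in a few lines. This is an illustration of what those rules evaluate, not the built-in implementation: it assumes the query results are already available as lists of dicts, and the field names `ip`, `alarm_type`, and `time` (Unix seconds) are assumptions. It also reads "within 15 minutes" as the most recent 15 minutes.

```python
from collections import Counter


def machines_without_heartbeat(expected_ips, status_logs):
    """Return expected machines that have no logtail_status entry.

    `status_logs` stands for the result of the `__topic__:logtail_status`
    query; the `ip` key is an assumed field name.
    """
    seen = {
        log["ip"]
        for log in status_logs
        if log.get("__topic__") == "logtail_status"
    }
    return sorted(set(expected_ips) - seen)


def alarm_counts(alarm_logs, now, window_seconds=15 * 60):
    """Count logtail_alarm entries per alarm type in the last 15 minutes.

    `alarm_type` and `time` (Unix seconds) are assumed field names.
    """
    start = now - window_seconds
    return Counter(
        log["alarm_type"]
        for log in alarm_logs
        if log.get("__topic__") == "logtail_alarm" and start <= log["time"] <= now
    )
```

A heartbeat alert would fire when `machines_without_heartbeat` returns a non-empty list, and a collection-exception alert when any count from `alarm_counts` exceeds a threshold that you choose.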
Procedure
-
Configure an action policy to define how notifications are sent when an alert status changes.
-
Log on to the Simple Log Service console.
-
In the project list, click the project where you enabled important logs.
-
In the left-side navigation pane, click Alerts. On the Alert Center page, choose .
-
In the action policy list, find the sls.app.logtail.builtin action policy and click Modify in the Actions column.
-
In the Edit Action Policy dialog box, select and configure a notification channel based on your needs, and then click Confirm. For more information, see Notification channels.
-
Create an alert rule to trigger when the LoongCollector runtime status meets a specified threshold.
-
On the Alert Center page, click the Alert Rules tab, and then click the icon next to Create Alert.
-
Click Create from Template. In the Create from Template panel, click Logtail Fault Monitor under All Templates, then click the target card.
-
In the Create Alert panel, review the configuration. The built-in alert rule includes preset parameters. Click OK. For more information, see Create an alert rule.
-