This topic describes the operations that you can perform to troubleshoot errors that may occur when you use Logtail to collect logs from standard Docker containers or containers in a Kubernetes cluster.
Troubleshooting
- Troubleshoot the error that occurs due to the abnormal heartbeat status of a machine group
- Troubleshoot the error that occurs due to the abnormal collection of container logs
Troubleshoot the error that occurs due to the abnormal heartbeat status of a machine group
You can determine whether Logtail is installed as expected by checking the heartbeat status of the machine group.
- Check the heartbeat status of the machine group.
- Log on to the Log Service console.
- In the Projects section, click the project that you want to view.
- In the left-side navigation pane, choose Resources > Machine Groups.
- In the Machine Groups list, click the machine group whose heartbeat status you want to check.
- On the Machine Group Settings page, check the machine group status and record the number of servers whose heartbeat status is OK.
- Count the number of worker nodes in the container cluster. For a Kubernetes cluster, you can use the example command after this list.
- Check whether the number of servers whose heartbeat status is OK in the machine group is equal to the number of worker nodes in the container cluster. Troubleshoot the error based on the check result.
- The heartbeat status of all servers in the machine group is FAIL.
- If you want to collect logs from standard Docker containers, check whether the specified values of the ${your_region_name}, ${your_aliyun_user_id}, and ${your_machine_group_user_defined_id} parameters are valid. For more information, see Parameters.
- If you use a Container Service for Kubernetes (ACK) cluster, submit a ticket. For more information, see Alibaba Cloud Container Service for Kubernetes (ACK) clusters.
- If you use a self-managed Kubernetes cluster, check whether the specified values of the {your-project-suffix}, {regionId}, {aliuid}, {access-key-id}, and {access-key-secret} parameters are valid. For more information, see Parameters.
If any of the parameter values are invalid, run the following command to delete the installation package, and then reinstall the package:
helm del --purge alibaba-log-controller
- The number of servers whose heartbeat status is OK in the machine group is less than the number of worker nodes in the container cluster.
- Check whether you used a YAML file to deploy a DaemonSet.
- Run the following command. If a response is returned, the DaemonSet is deployed by using the YAML file.
kubectl get po -n kube-system -l k8s-app=logtail
- Download the latest version of the Logtail DaemonSet template.
- Configure parameters such as ${your_region_name}, ${your_aliyun_user_id}, and ${your_machine_group_name} based on your business scenario. To verify the values that a deployed DaemonSet currently uses, see the example command after this list.
- Run the following command to update the file:
kubectl apply -f ./logtail-daemonset.yaml
- In other cases, you must submit a ticket.
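To verify the parameter values that a deployed DaemonSet currently uses, you can inspect its environment variables. The following command is a minimal sketch that assumes the DaemonSet is named logtail-ds and that the template passes the parameters as environment variables whose names start with ALIYUN_LOGTAIL, as in the standard template:
kubectl get ds logtail-ds -n kube-system -o yaml | grep -A 1 ALIYUN_LOGTAIL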
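For step 2 of the preceding check, you can count the worker nodes in a Kubernetes cluster by running the following standard kubectl command:
kubectl get nodes
If your master nodes do not run the Logtail DaemonSet, exclude them when you compare the node count with the number of servers whose heartbeat status is OK.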
Troubleshoot the error that occurs due to the abnormal collection of container logs
- Take note of the following items when you collect logs from container files:
- Logtail collects only incremental logs. If a log file is not updated after the Logtail configuration is delivered, Logtail does not collect the logs in the file. For more information, see Read log files.
- Logtail collects logs only from files that are stored in the default storage of containers or mounted on a local host. Other storage methods are not supported.
- After logs are collected, you must create indexes. Then, you can query and analyze the logs in the Logstore. For more information, see Create indexes.
- Check whether the heartbeat status of the machine group is abnormal. For more information, see Troubleshoot the error that occurs due to the abnormal heartbeat status of a machine group.
- Check whether the Logtail configuration is valid. Check whether the following parameters in the Logtail configuration meet your business requirements: IncludeLabel, ExcludeLabel, IncludeEnv, and ExcludeEnv.
Note:
- Container labels are retrieved by running the docker inspect command. Container labels are different from Kubernetes labels.
- To check whether logs can be collected as expected, you can temporarily remove the IncludeLabel, ExcludeLabel, IncludeEnv, and ExcludeEnv configurations from the Logtail configuration. If logs can be collected, the preceding parameters are incorrectly configured.
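To view the container labels that the IncludeLabel and ExcludeLabel parameters are matched against, you can run docker inspect on the host. The following is a minimal example; 223****6e is a placeholder container ID that you must replace with the actual ID:
docker inspect --format '{{json .Config.Labels}}' 223****6e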
Related O&M operations
- Log on to a Logtail container
- View the operational logs of Logtail
- Standard output (stdout) of a Logtail container
- View the status of Log Service components in a Kubernetes cluster
- View the version number, IP address, and startup time of Logtail
- What do I do if I accidentally delete a Logstore that is created by using a custom resource definition (CRD)?
Log on to a Logtail container
- Standard Docker container
- Run the following command on the host to query the Logtail container:
docker ps | grep logtail
The system returns an output similar to the following example:
223****6e registry.cn-hangzhou.aliyuncs.com/log-service/logtail "/usr/local/ilogta..." 8 days ago Up 8 days logtail-iba
- Run the following command to log on to the Logtail container:
docker exec -it 223****6e bash
In this command, 223****6e indicates the ID of the Logtail container. Replace the value with the actual container ID.
- Kubernetes
- Run the following command to query the pods of Logtail:
kubectl get po -n kube-system | grep logtail
The system returns an output similar to the following example:
logtail-ds-****d 1/1 Running 0 8d
logtail-ds-****8 1/1 Running 0 8d
- Run the following command to log on to one of the returned pods:
kubectl exec -it -n kube-system logtail-ds-****d bash
In this command, logtail-ds-****d indicates the name of the pod. Replace the value with the actual pod name.
View the operational logs of Logtail
The logs of Logtail are stored in the ilogtail.LOG and logtail_plugin.LOG files in the /usr/local/ilogtail/ directory of a Logtail container.
- Log on to the Logtail container. For more information, see Log on to a Logtail container.
- Go to the /usr/local/ilogtail/ directory.
cd /usr/local/ilogtail
- View the ilogtail.LOG and logtail_plugin.LOG files.
cat ilogtail.LOG
cat logtail_plugin.LOG
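If you do not want to log on to the container, you can also view the log files from outside. The following sketch assumes a Kubernetes cluster and uses logtail-ds-****d as a placeholder pod name from the earlier example:
kubectl exec -n kube-system logtail-ds-****d -- tail -n 100 /usr/local/ilogtail/ilogtail.LOG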
Standard output (stdout) of a Logtail container
The stdout of a Logtail container contains only startup information, such as in the following example. You can ignore this output:
start umount useless mount points, /shm$|/merged$|/mqueue$
umount: /logtail_host/var/lib/docker/overlay2/3fd0043af174cb0273c3c7869500fbe2bdb95d13b1e110172ef57fe840c82155/merged: must be superuser to unmount
umount: /logtail_host/var/lib/docker/overlay2/d5b10aa19399992755de1f85d25009528daa749c1bf8c16edff44beab6e69718/merged: must be superuser to unmount
umount: /logtail_host/var/lib/docker/overlay2/5c3125daddacedec29df72ad0c52fac800cd56c6e880dc4e8a640b1e16c22dbe/merged: must be superuser to unmount
......
xargs: umount: exited with status 255; aborting
umount done
start logtail
ilogtail is running
logtail status:
ilogtail is running
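To view this stdout, you can run the standard Docker or kubectl log commands against the Logtail container or pod. The container ID and pod name below are placeholders from the earlier examples:
docker logs 223****6e
kubectl logs -n kube-system logtail-ds-****d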
View the status of Log Service components in a Kubernetes cluster
Run the following commands to view the status of the alibaba-log-controller Deployment and the logtail-ds DaemonSet:
kubectl get deploy alibaba-log-controller -n kube-system
kubectl get ds logtail-ds -n kube-system
View the version number, IP address, and startup time of Logtail
- Log on to the Logtail container. For more information, see Log on to a Logtail container.
- Run the following command to view the version number, IP address, and startup time of Logtail. The information is stored in the /usr/local/ilogtail/app_info.json file of the Logtail container.
kubectl exec logtail-ds-****k -n kube-system cat /usr/local/ilogtail/app_info.json
The system returns an output similar to the following example:
{
  "UUID" : "",
  "hostname" : "logtail-****k",
  "instance_id" : "0EB****_172.20.4.2_1517810940",
  "ip" : "172.20.4.2",
  "logtail_version" : "0.16.2",
  "os" : "Linux; 3.10.0-693.2.2.el7.x86_64; #1 SMP Tue Sep 12 22:26:13 UTC 2017; x86_64",
  "update_time" : "2018-02-05 06:09:01"
}
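For a standard Docker container, you can view the same file by using docker exec. The container ID 223****6e is a placeholder from the earlier example:
docker exec 223****6e cat /usr/local/ilogtail/app_info.json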
What do I do if I accidentally delete a Logstore that is created by using a CRD?
- Specify a different Logstore in the CRD configuration so that the deleted Logstore is no longer used.
- Restart the alibaba-log-controller pod. Run the following command to find the pod:
kubectl get po -n kube-system | grep alibaba-log-controller
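After you find the pod, delete it to trigger a restart. Because alibaba-log-controller is managed by a Deployment, Kubernetes automatically creates a new pod. The pod name below is a hypothetical placeholder:
kubectl delete po alibaba-log-controller-****x -n kube-system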