This topic describes how to resolve exceptions that may occur when you use a Logtail container (a standard Docker container or a Kubernetes cluster) to collect logs.
- Log on to the Logtail container
- View Logtail operational logs
- View Logtail standard output (stdout)
- View the status of log-related components in a Kubernetes cluster
- View the version information, IP address, and time of Logtail
- What do I do if I mistakenly delete a Logstore that is created through CRD?
Troubleshoot heartbeat exceptions in a machine group
You can determine whether Logtail is correctly installed in a container by checking the heartbeat status of its machine group.
- Check the heartbeat status of the machine group.
- Log on to the Log Service console, and then click the target project name.
- In the left-side navigation pane, click Logtail Machine Group.
- Find the target machine group and click Status.
Record the number of nodes whose heartbeat status is OK.
- Check the number of Worker nodes in the cluster.
Run kubectl get node | grep -v master to view the number of Worker nodes.
$ kubectl get node | grep -v master
NAME                                 STATUS    ROLES     AGE       VERSION
cn-hangzhou.i-bp17enxc2us3624wexh2   Ready     <none>    238d      v1.10.4
cn-hangzhou.i-bp1ad2b02jtqd1shi2ut   Ready     <none>    220d      v1.10.4
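If the cluster has many nodes, you can count the Worker nodes directly. A minimal sketch; --no-headers suppresses the title row so only node lines are counted:
$ kubectl get node --no-headers | grep -v master | wc -l
2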
- Compare the number of nodes whose heartbeat status is OK with the number of Worker nodes, and then troubleshoot based on the comparison result:
- The heartbeat status of all nodes is Failed.
- If you collect standard Docker logs, check whether ${your_region_name}, ${your_aliyun_user_id}, and ${your_machine_group_user_defined_id} are correct by referring to the parameter description.
- If you use the installation for Kubernetes on Alibaba Cloud Container Service, submit a ticket.
- If you use the installation for self-built Kubernetes, check whether {your-project-suffix}, {regionId}, {aliuid}, {access-key-id}, and {access-key-secret} are correct by referring to the parameter description. If any parameter is incorrect, run helm del --purge alibaba-log-controller to delete the installation package, and then install it again (see the sketch below).
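The following reinstallation sketch assumes the alicloud-log-k8s-install.sh script from the installation topic for self-built Kubernetes; the script name and parameter order are assumptions here, so follow that topic for the exact command:
# Delete the broken installation (Helm v2 syntax, matching the command above).
helm del --purge alibaba-log-controller
# Rerun the installation script with corrected parameters.
sh ./alicloud-log-k8s-install.sh {your-project-suffix} {regionId} {aliuid} {access-key-id} {access-key-secret}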
- The number of nodes whose heartbeat status is OK is smaller than the number of Worker nodes.
- Determine whether you manually deployed the DaemonSet by using a yaml file.
Run kubectl get po -n kube-system -l k8s-app=logtail. If any result is returned, you manually deployed the DaemonSet by using a yaml file. In this case, perform the following steps (see the sketch after this list):
- Download the latest DaemonSet template.
- Set ${your_region_name}, ${your_aliyun_user_id}, and ${your_machine_group_name} as needed.
- Run kubectl apply -f ./logtail-daemonset.yaml to apply the updated yaml file.
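The following sketch shows one way to fill in the template placeholders without editing the file in place; the region, user ID, and machine group name are example values only:
sed -e 's/${your_region_name}/cn-hangzhou/g' \
    -e 's/${your_aliyun_user_id}/165421896542****/g' \
    -e 's/${your_machine_group_name}/k8s-group-app/g' \
    ./logtail-daemonset.yaml | kubectl apply -f -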
Troubleshoot log collection exceptions in a container
If no logs appear on the preview or query page in the console, Log Service has not collected any logs from your container. In this case, check the container status and perform the following steps:
- Check whether the machine group status is normal.
- Check whether the Config identifier is correct.
Check whether the IncludeLabel, ExcludeLabel, IncludeEnv, and ExcludeEnv parameters in the Config match the configuration of the target container (see the docker inspect sketch after this list).
Note: Label refers to the container label (the label information in the output of docker inspect), not the label defined in Kubernetes. You can temporarily remove these parameters and then check whether logs can be collected. If logs can be collected, the exception is caused by an incorrect Config identifier.
- Check other items.
If you want to collect log files from your container, note that:
- Logtail does not collect a file until the file is modified. If no files in your container are updated, nothing is collected.
- Only files on the default container storage or on storage mounted from the local host can be collected.
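To see the values that the IncludeLabel, ExcludeLabel, IncludeEnv, and ExcludeEnv parameters are matched against, you can inspect the target container directly. A sketch, where my-app is a hypothetical container name:
# Container labels checked by IncludeLabel and ExcludeLabel:
docker inspect my-app --format '{{json .Config.Labels}}'
# Environment variables checked by IncludeEnv and ExcludeEnv:
docker inspect my-app --format '{{json .Config.Env}}'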
Log on to the Logtail container
- Standard Docker
- On the host, run docker ps | grep logtail to search for the Logtail container.
- Run docker exec -it ****** bash to log on to the Logtail container.
$ docker ps | grep logtail
223fbd3ed2a6e   registry.cn-hangzhou.aliyuncs.com/log-service/logtail   "/usr/local/ilogta..."   8 days ago   Up 8 days   logtail-iba
$ docker exec -it 223fbd3ed2a6e bash
- Kubernetes
- Run kubectl get po -n kube-system | grep logtail to search for the Logtail Pod.
- Run kubectl exec -it -n kube-system ****** bash to log on to the Pod.
$ kubectl get po -n kube-system | grep logtail
logtail-ds-g5wgd   1/1   Running   0   8d
logtail-ds-slpn8   1/1   Running   0   8d
$ kubectl exec -it -n kube-system logtail-ds-g5wgd bash
View Logtail operational logs
- Log on to the Logtail container.
- Go to the /usr/local/ilogtail/ directory.
cd /usr/local/ilogtail
- View the ilogtail.LOG and logtail_plugin.LOG files.
cat ilogtail.LOG
cat logtail_plugin.LOG
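These files can be long. A minimal sketch for surfacing recent warnings and errors with standard grep:
grep -iE 'warn|error' ilogtail.LOG | tail -n 20
grep -iE 'warn|error' logtail_plugin.LOG | tail -n 20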
View Logtail standard output (stdout)
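You can view the stdout of the Logtail container with the standard Docker or kubectl commands. A sketch, reusing the container ID and Pod name from the logon examples above:
docker logs 223fbd3ed2a6e
kubectl logs -n kube-system logtail-ds-g5wgd
Output similar to the following is expected at startup: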
start umount useless mount points, /shm$|/merged$|/mqueue$
umount: /logtail_host/var/lib/docker/overlay2/3fd0043af174cb0273c3c7869500fbe2bdb95d13b1e110172ef57fe840c82155/merged: must be superuser to unmount
umount: /logtail_host/var/lib/docker/overlay2/d5b10aa19399992755de1f85d25009528daa749c1bf8c16edff44beab6e69718/merged: must be superuser to unmount
umount: /logtail_host/var/lib/docker/overlay2/5c3125daddacedec29df72ad0c52fac800cd56c6e880dc4e8a640b1e16c22dbe/merged: must be superuser to unmount
......
xargs: umount: exited with status 255; aborting
umount done
start logtail
ilogtail is running
logtail status:
ilogtail is running
View the status of log-related components in a Kubernetes cluster
To view the status of log-related components in a Kubernetes cluster, run helm status alibaba-log-controller.
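If you did not install the components through Helm (for example, you manually deployed the yaml file), you can check them directly. A sketch; the resource names assume the default installation shown elsewhere in this topic:
kubectl get deploy alibaba-log-controller -n kube-system
kubectl get ds logtail-ds -n kube-system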
View the version information, IP address, and time of Logtail
Run the following command. Replace logtail-ds-gb92k with the name of your Logtail Pod:
kubectl exec logtail-ds-gb92k -n kube-system cat /usr/local/ilogtail/app_info.json
{
"UUID" : "",
"hostname" : "logtail-gb92k",
"instance_id" : "0EBB2B0E-0A3B-11E8-B0CE-0A58AC140402_172.20.4.2_1517810940",
"ip" : "172.20.4.2",
"logtail_version" : "0.16.2",
"os" : "Linux; 3.10.0-693.2.2.el7.x86_64; #1 SMP Tue Sep 12 22:26:13 UTC 2017; x86_64",
"update_time" : "2018-02-05 06:09:01"
}
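In this output, logtail_version, ip, and update_time indicate the version, IP address, and time. To print only these fields, a sketch assuming jq is installed on the machine where you run kubectl:
kubectl exec logtail-ds-gb92k -n kube-system cat /usr/local/ilogtail/app_info.json | jq -r '.logtail_version, .ip, .update_time'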
What do I do if I mistakenly delete a Logstore that is created through CRD?
- Create another Logstore through CRD, and make sure that its name is different from the name of the Logstore that was mistakenly deleted.
- Restart the alibaba-log-controller Pod. You can run kubectl get po -n kube-system | grep alibaba-log-controller to find the Pod (see the restart sketch below).
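A restart sketch, assuming the Pod is managed by a Deployment so that Kubernetes recreates it automatically after deletion; replace the placeholder with the Pod name returned by the command above:
kubectl delete po <alibaba-log-controller Pod name> -n kube-system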