Simple Log Service: What do I do if an error occurs when I use Logtail to collect logs from containers?

Last Updated: Dec 12, 2023

This topic describes the operations that you can perform to troubleshoot an error that may occur when you use Logtail to collect logs from standard containers or containers in a Kubernetes cluster.

Troubleshooting

Troubleshoot the error that occurs due to the abnormal heartbeat status of a machine group

You can determine whether Logtail is installed as expected by checking the heartbeat status of the machine group.

  1. Check the heartbeat status of the machine group.

    1. Log on to the Simple Log Service console.

    2. In the Projects section, click the project that you want to manage.

    3. In the left-side navigation pane, choose Resources > Machine Groups.

    4. In the Machine Groups list, click the machine group whose heartbeat status you want to check.

    5. On the Machine Group Settings page, check the machine group status and record the number of servers whose heartbeat status is OK.

  2. Count the number of worker nodes in the container cluster.

    1. Log on to the container cluster.

    2. Run the following command to view the number of worker nodes in the cluster:

      kubectl get node | grep -v master

      The system returns an output similar to the following example:

      NAME                                 STATUS    ROLES     AGE       VERSION
      cn-hangzhou.i-bp17enxc2us3624wexh2   Ready     <none>    238d      v1.10.4
      cn-hangzhou.i-bp1ad2b02jtqd1shi2ut   Ready     <none>    220d      v1.10.4
  3. Check whether the number of servers whose heartbeat status is OK in the machine group is equal to the number of worker nodes in the container cluster. Troubleshoot the error based on the check result. A quick way to compare the two counts is shown after this list.

    • The heartbeat status of all servers in the machine group is FAIL.

      • If you want to collect logs from standard Docker containers, check whether the specified values of the ${your_region_name}, ${your_aliyun_user_id}, and ${your_machine_group_user_defined_id} parameters are valid. For more information, see Parameters.

      • If you use a Container Service for Kubernetes (ACK) cluster, submit a ticket. For more information, see Install Logtail.

      • If you use a self-managed Kubernetes cluster, check whether the specified values of the {your-project-suffix}, {regionId}, {aliuid}, {access-key-id}, and {access-key-secret} parameters are valid. For more information, see Parameters.

        If the specified values of the parameters are invalid, run the helm del --purge alibaba-log-controller command (or helm uninstall alibaba-log-controller if you use Helm 3) to delete the installation package and then re-install the package.

    • The number of servers whose heartbeat status is OK in the machine group is less than the number of worker nodes in the container cluster.

      • Check whether you used a YAML file to deploy a DaemonSet.

        1. Run the following command. If one or more pods are returned, the DaemonSet was deployed by using a YAML file.

          kubectl get po -n kube-system -l k8s-app=logtail
        2. Download the latest version of the Logtail DaemonSet template.

        3. Configure parameters such as ${your_region_name}, ${your_aliyun_user_id}, and ${your_machine_group_name} based on your business scenario.

        4. Run the following command to apply the updated file:

          kubectl apply -f ./logtail-daemonset.yaml
      • In other cases, you must submit a ticket.
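
As a quick cross-check for step 3, you can count the worker nodes and the Logtail pods directly and compare the two numbers. This is a minimal sketch that assumes the DaemonSet carries the k8s-app=logtail label used above and that control plane nodes contain master in their names, as in the sample output. Newer clusters may use the control-plane role instead:

kubectl get node --no-headers | grep -v master | wc -l
kubectl get po -n kube-system -l k8s-app=logtail --no-headers | wc -l

If the second number is smaller than the first, some worker nodes are not running a Logtail pod.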

Troubleshoot the error that occurs due to the abnormal collection of container logs

If no data exists in the Preview Data section in the Simple Log Service console or no logs can be queried on the Search & Analysis page of the related Logstore, Simple Log Service has not collected your container logs. In this case, check the status of the containers that correspond to the servers in the machine group. If the containers are working as expected, perform the following steps to troubleshoot the error.

Important
  • Take note of the following items when you collect logs from container files:

    • Logtail collects only incremental logs. If a log file is not updated after the Logtail configuration is delivered, Logtail does not collect the logs in the file. For more information, see Read log files.

    • Logtail collects logs only from files that are stored in the default storage of a container or that are mounted to the local host. Other storage methods are not supported.

  • After logs are collected, you must create indexes. Then, you can query and analyze the logs in the Logstore. For more information, see Create indexes.

  1. Check whether the heartbeat status of the machine group is abnormal. For more information, see Troubleshoot the error that occurs due to the abnormal heartbeat status of a machine group.

  2. Check whether the Logtail configuration is valid.

    Check whether the following parameters in the Logtail configuration meet your business requirements: IncludeLabel, ExcludeLabel, IncludeEnv, and ExcludeEnv.

    Note
    • Container labels are retrieved by running the docker inspect command. Container labels are different from Kubernetes labels.

    • To check whether logs can be collected as expected, you can temporarily remove the IncludeLabel, ExcludeLabel, IncludeEnv, and ExcludeEnv configurations from the Logtail configuration. If logs can be collected, the preceding parameters are incorrectly configured.
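
Because IncludeLabel and ExcludeLabel match container labels rather than Kubernetes labels, it can help to view the labels and environment variables of a container directly. The following sketch assumes that you can run Docker commands on the node; replace 223****6e with your container ID:

docker inspect --format '{{json .Config.Labels}}' 223****6e
docker inspect --format '{{json .Config.Env}}' 223****6e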

Related O&M operations

Log on to a Logtail container

  • Common Docker container

    1. Run the following command on the host to query the Logtail container:

      docker ps | grep logtail

      The system returns an output similar to the following example:

      223****6e        registry.cn-hangzhou.aliyuncs.com/log-service/logtail                             "/usr/local/ilogta..."   8 days ago          Up 8 days                               logtail-iba
    2. Run the following command to log on to the Logtail container:

      docker exec -it 223****6e bash

      223****6e indicates the ID of the Logtail container. Replace the value with the actual container ID.

  • Kubernetes

    1. Run the following command to query the pods of Logtail:

      kubectl get po -n kube-system | grep logtail

      The system returns an output similar to the following example:

      logtail-ds-****d                                             1/1       Running    0          8d
      logtail-ds-****8                                             1/1       Running    0          8d
    2. Run the following command to log on to one of the returned pods:

      kubectl exec -it -n kube-system logtail-ds-****d bash

      logtail-ds-****d indicates the ID of the pod. Replace the value with the actual pod ID.
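
Note: Recent versions of kubectl require the command to be separated from the pod name by --. If the preceding command is rejected, run the following command instead:

kubectl exec -it -n kube-system logtail-ds-****d -- bash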

View the operational logs of Logtail

The logs of Logtail are stored in the ilogtail.LOG and logtail_plugin.LOG files in the /usr/local/ilogtail/ directory of a Logtail container.

  1. Log on to the Logtail container. For more information, see Log on to a Logtail container.

  2. Go to the /usr/local/ilogtail/ directory:

    cd /usr/local/ilogtail
  3. View the ilogtail.LOG and logtail_plugin.LOG files.

    cat ilogtail.LOG
    cat logtail_plugin.LOG
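
If you are looking for a recent failure, it is often faster to filter the operational logs for warnings and errors. The following is a minimal sketch that you can run from the same directory:

grep -iE "warn|error" ilogtail.LOG | tail -n 20
grep -iE "warn|error" logtail_plugin.LOG | tail -n 20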

Ignore the stdout of a Logtail container

The stdout of the Logtail container is not related to this troubleshooting case. You can ignore output similar to the following:

start umount useless mount points, /shm$|/merged$|/mqueue$
umount: /logtail_host/var/lib/docker/overlay2/3fd0043af174cb0273c3c7869500fbe2bdb95d13b1e110172ef57fe840c82155/merged: must be superuser to unmount
umount: /logtail_host/var/lib/docker/overlay2/d5b10aa19399992755de1f85d25009528daa749c1bf8c16edff44beab6e69718/merged: must be superuser to unmount
umount: /logtail_host/var/lib/docker/overlay2/5c3125daddacedec29df72ad0c52fac800cd56c6e880dc4e8a640b1e16c22dbe/merged: must be superuser to unmount
......
xargs: umount: exited with status 255; aborting
umount done
start logtail
ilogtail is running
logtail status:
ilogtail is running

View the status of Logtail components in a Kubernetes cluster

Run the following commands:

kubectl get deploy alibaba-log-controller -n kube-system
kubectl get ds logtail-ds -n kube-system
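
If the components are running as expected, the system returns an output similar to the following example. The values vary based on your cluster:

NAME                     READY   UP-TO-DATE   AVAILABLE   AGE
alibaba-log-controller   1/1     1            1           8d

NAME         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
logtail-ds   2         2         2       2            2           <none>          8d

The DESIRED value of logtail-ds should equal the number of nodes on which Logtail is expected to run.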

View the version number, IP address, and startup time of Logtail

  1. Log on to the Logtail container. For more information, see Log on to a Logtail container.

  2. View the version number, IP address, and startup time of Logtail.

    The information is stored in the /usr/local/ilogtail/app_info.json file of the Logtail container. In a Kubernetes cluster, you can also read the file without logging on to the container. For example, run the following command:

    kubectl exec logtail-ds-****k -n kube-system cat /usr/local/ilogtail/app_info.json

    The system returns an output similar to the following example:

    {
       "UUID" : "",
       "hostname" : "logtail-****k",
       "instance_id" : "0EB****_172.20.4.2_1517810940",
       "ip" : "172.20.4.2",
       "logtail_version" : "0.16.2",
       "os" : "Linux; 3.10.0-693.2.2.el7.x86_64; #1 SMP Tue Sep 12 22:26:13 UTC 2017; x86_64",
       "update_time" : "2018-02-05 06:09:01"
    }
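
    To view only the version number, you can filter the file. The following sketch reuses the pod name from the preceding command. Recent versions of kubectl require the command to be separated from the pod name by --:

    kubectl exec logtail-ds-****k -n kube-system -- cat /usr/local/ilogtail/app_info.json | grep logtail_version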

What do I do if I accidentally delete a Logstore that is created by using a CRD?

If you delete a Logstore that is automatically created by using a CRD, the collected data cannot be restored, and the CRD configuration for the Logstore becomes invalid. You can use one of the following methods to prevent log collection exceptions:

  • Specify another Logstore in the CRD configuration so that the deleted Logstore is no longer used.

  • Restart the alibaba-log-controller pod.

    Run the following command to find the pod:

    kubectl get po -n kube-system | grep alibaba-log-controller
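
    Because alibaba-log-controller is managed by a Deployment, deleting the pod causes a new pod to be created, which restarts the component. Replace alibaba-log-controller-**** in the following sketch with the pod name that the preceding command returns:

    kubectl delete po alibaba-log-controller-**** -n kube-system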