Understanding the Kubelet Core Execution Frame

Kubelet is the node agent in a Kubernetes cluster and is responsible for managing the Pod lifecycle on the local node. Kubelet first obtains the Pod configurations assigned to the local node, and then invokes the underlying container runtime, such as Docker or PouchContainer, to create Pods based on those configurations. Kubelet then monitors the Pods to ensure that all Pods on the node run in the expected state. This article walks through this process using the Kubelet source code.

Obtaining Pod Configurations

Kubelet can obtain the Pod configurations for the local node in several ways. The most important source is the Apiserver. Kubelet can also read Pod configurations from a specified file directory or fetch them from a specified HTTP endpoint. Kubelet periodically checks the directory or HTTP endpoint for Pod configuration updates and adjusts the running state of the Pods on the local node accordingly.

During the initialization of Kubelet, a PodConfig object is created, as shown below:

// kubernetes/pkg/kubelet/config/config.go
type PodConfig struct {
    pods *podStorage
    mux  *config.Mux
    // the channel of denormalized changes passed to listeners
    updates chan kubetypes.PodUpdate
    ...
}

PodConfig is essentially a multiplexer of Pod configurations. The built-in mux listens to the various Pod configuration sources (apiserver, file, and http) and periodically synchronizes their Pod configuration state. The pods field caches the Pod configuration state of each source as of the last synchronization. By comparing the new state with this cache, PodConfig can determine which Pods have changed. It then groups the changed Pods by change type and wraps each group in a PodUpdate structure:

// kubernetes/pkg/kubelet/types/pod_update.go
type PodUpdate struct {
    Pods   []*v1.Pod
    Op     PodOperation
    Source string
}

The Op field defines the change type. For example, its value can be ADD or REMOVE, meaning that the Pods listed in the Pods field are to be added or deleted. Finally, every PodUpdate is sent to the updates channel of PodConfig. Therefore, we only need to listen on the updates channel to obtain the Pod configuration updates for the local node.
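The comparison step described above can be sketched as follows. This is a minimal illustration, not the kubelet implementation: the Pod, PodUpdate, and Op types are simplified stand-ins, and the diff function and its Spec string comparison are assumptions made for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// Hypothetical, simplified stand-ins for the kubelet types.
type Op string

const (
	Add    Op = "ADD"
	Update Op = "UPDATE"
	Remove Op = "REMOVE"
)

type Pod struct {
	Name string
	Spec string // stands in for the whole Pod specification
}

type PodUpdate struct {
	Pods   []Pod
	Op     Op
	Source string
}

// diff compares the cached Pods of one source with its latest snapshot,
// refreshes the cache, and returns one PodUpdate per change type that occurred.
func diff(source string, cached map[string]Pod, latest []Pod) []PodUpdate {
	var adds, updates, removes []Pod
	seen := make(map[string]bool, len(latest))
	for _, p := range latest {
		seen[p.Name] = true
		old, ok := cached[p.Name]
		switch {
		case !ok:
			adds = append(adds, p)
		case old.Spec != p.Spec:
			updates = append(updates, p)
		}
		cached[p.Name] = p
	}
	for name, p := range cached {
		if !seen[name] {
			removes = append(removes, p)
			delete(cached, name)
		}
	}
	// Map iteration order is random, so sort removals for stable output.
	sort.Slice(removes, func(i, j int) bool { return removes[i].Name < removes[j].Name })

	var out []PodUpdate
	for _, u := range []PodUpdate{
		{adds, Add, source}, {updates, Update, source}, {removes, Remove, source},
	} {
		if len(u.Pods) > 0 {
			out = append(out, u)
		}
	}
	return out
}

func main() {
	cache := map[string]Pod{"a": {"a", "v1"}, "b": {"b", "v1"}}
	updates := make(chan PodUpdate, 3) // plays the role of PodConfig.updates
	for _, u := range diff("file", cache, []Pod{{"a", "v2"}, {"c", "v1"}}) {
		updates <- u
	}
	close(updates)
	for u := range updates {
		fmt.Println(u.Op, u.Source, u.Pods)
	}
}
```

A consumer only ever reads from the single updates channel, which is why the rest of Kubelet does not need to know which source a change came from.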

Pod Synchronization

After Kubelet initialization is complete, the syncLoop function shown below is invoked:

// kubernetes/pkg/kubelet/kubelet.go
// syncLoop is the main loop for processing changes. It watches for changes from
// three channels (file, apiserver, and http) and creates a union of them. For
// any new change seen, will run a sync against desired state and running state. If
// no changes are seen to the configuration, will synchronize the last known desired
// state every sync-frequency seconds. Never returns.
func (kl *Kubelet) syncLoop(updates <-chan kubetypes.PodUpdate, handler SyncHandler){
    ...
    for {
        if !kl.syncLoopIteration(...) {
            break
        }        
    }
    ...
}

As the comments indicate, syncLoop is the main loop of Kubelet. It listens on the updates channel, obtains the latest Pod configurations, and reconciles the running state with the desired state, so that all Pods on the local node run in the expected state. In fact, syncLoop is only a thin wrapper around syncLoopIteration, which carries out the actual synchronization.

// kubernetes/pkg/kubelet/kubelet.go
func (kl *Kubelet) syncLoopIteration(configCh <-chan kubetypes.PodUpdate ......) bool {
    select {
    case u, open := <-configCh:
        switch u.Op {
        case kubetypes.ADD:
            handler.HandlePodAdditions(u.Pods)
        case kubetypes.UPDATE:
            handler.HandlePodUpdates(u.Pods)
        ...
        }
    case e := <-plegCh:
        ...
        handler.HandlePodSyncs([]*v1.Pod{pod})
        ...
    case <-syncCh:
        podsToSync := kl.getPodsToSync()
        if len(podsToSync) == 0 {
            break
        }
        handler.HandlePodSyncs(podsToSync)
    case update := <-kl.livenessManager.Updates():
        if update.Result == proberesults.Failure {
            ...
            handler.HandlePodSyncs([]*v1.Pod{pod})
        }
    case <-housekeepingCh:
        ...
        handler.HandlePodCleanups()
        ...
    }
}

The logic of syncLoopIteration is simple. It listens on multiple channels, and when an event arrives on one of them, it invokes the corresponding handler. The events are processed as follows:

Pod configuration changes are read from configCh, and the handler matching the change type is invoked. For example, if new Pods are bound to the local node, HandlePodAdditions is invoked to create them. If the configurations of some Pods have changed, HandlePodUpdates is invoked to update those Pods.
If the state of a container in a Pod changes (for example, a new container is created and started), a PodLifecycleEvent is sent to the plegCh channel. The event carries the event type (such as ContainerStarted), the container ID, and the ID of the Pod to which the container belongs. syncLoopIteration then invokes HandlePodSyncs to synchronize that Pod.
syncCh is in fact a timer. By default, it fires every second, and Kubelet then synchronizes the Pods on the local node that are due for synchronization.
During initialization, Kubelet creates a livenessManager to check the health of the configured Pods. If a liveness check on a Pod fails, Kubelet invokes HandlePodSyncs to synchronize the Pod. This part will be further described later.
housekeepingCh is also a timer. By default, it fires every two seconds, and Kubelet invokes HandlePodCleanups. This is a periodic cleanup mechanism that reclaims the resources of stopped Pods at a fixed interval.

As shown in the figure above, most handlers follow a similar execution path. HandlePodAdditions, HandlePodUpdates, and HandlePodSyncs all invoke the dispatchWork function after completing their own work. If dispatchWork finds that the Pod to be synchronized is not in the Terminated state, it invokes the UpdatePod method of podWorkers to update the Pod. Pod creation, update, and synchronization can all be viewed as a transition from the current running state to the desired state. This view makes the update and synchronization processes easier to understand. For Pod creation, the current state of the new Pod can be regarded as empty, so creation is also a state transition. Therefore, whether a Pod is being created, updated, or synchronized, invoking UpdatePod is all that is needed to move it to its target state.

podWorkers is created during Kubelet initialization, as shown below:

// kubernetes/pkg/kubelet/pod_workers.go
type podWorkers struct {
    ...
    podUpdates map[types.UID]chan UpdatePodOptions

    isWorking map[types.UID]bool

    lastUndeliveredWorkUpdate map[types.UID]UpdatePodOptions

    workQueue queue.WorkQueue

    syncPodFn syncPodFnType
    
    podCache kubecontainer.Cache
    ...
}

Kubelet assigns a dedicated pod worker to each Pod it creates. A pod worker is in fact a goroutine. A channel of type UpdatePodOptions (a pod update event) with a buffer size of 1 is created for it. The worker listens on this channel for pod update events and invokes the synchronization function stored in the syncPodFn field of podWorkers to perform the synchronization.

In addition, the channel is registered in the podUpdates map of podWorkers so that an update event can be sent to the pod worker responsible for that Pod.

What happens if another update event arrives while the current one is still being processed? podWorkers caches the latest event in lastUndeliveredWorkUpdate and processes it immediately after the current event has been handled.
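The worker-per-Pod pattern with a one-slot channel and a "latest undelivered" cache can be sketched as follows. This is a reduced model written for illustration: the update and workers types, the string UID, and the runDemo helper are hypothetical, and only the field names podUpdates, isWorking, and syncPodFn mirror the struct shown earlier.

```go
package main

import (
	"fmt"
	"sync"
)

// update is a hypothetical stand-in for UpdatePodOptions.
type update struct {
	UID     string
	Version int
}

type workers struct {
	mu              sync.Mutex
	podUpdates      map[string]chan update // one channel per pod worker
	isWorking       map[string]bool
	lastUndelivered map[string]update // newest event that arrived while busy
	syncPodFn       func(update)
}

func newWorkers(fn func(update)) *workers {
	return &workers{
		podUpdates:      map[string]chan update{},
		isWorking:       map[string]bool{},
		lastUndelivered: map[string]update{},
		syncPodFn:       fn,
	}
}

// UpdatePod hands u to the Pod's worker, starting the worker on first use.
// If the worker is busy, u replaces any older pending event.
func (w *workers) UpdatePod(u update) {
	w.mu.Lock()
	defer w.mu.Unlock()
	ch, ok := w.podUpdates[u.UID]
	if !ok {
		ch = make(chan update, 1)
		w.podUpdates[u.UID] = ch
		go w.loop(ch)
	}
	if !w.isWorking[u.UID] {
		w.isWorking[u.UID] = true
		ch <- u // never blocks: an idle worker has an empty buffer
	} else {
		w.lastUndelivered[u.UID] = u
	}
}

func (w *workers) loop(ch chan update) {
	for u := range ch {
		w.syncPodFn(u)
		w.mu.Lock()
		if next, ok := w.lastUndelivered[u.UID]; ok {
			delete(w.lastUndelivered, u.UID)
			ch <- next // buffer is empty because this goroutine just drained it
		} else {
			w.isWorking[u.UID] = false
		}
		w.mu.Unlock()
	}
}

// runDemo sends three updates while the first is still being processed.
func runDemo() []int {
	started, release, done := make(chan struct{}), make(chan struct{}), make(chan struct{})
	var processed []int
	w := newWorkers(func(u update) {
		if u.Version == 1 {
			close(started)
			<-release // hold the worker so later updates arrive while it is busy
		}
		processed = append(processed, u.Version)
		if u.Version == 3 {
			close(done)
		}
	})
	w.UpdatePod(update{"pod-1", 1})
	<-started
	w.UpdatePod(update{"pod-1", 2}) // cached, then overwritten by version 3
	w.UpdatePod(update{"pod-1", 3})
	close(release)
	<-done
	return processed
}

func main() {
	fmt.Println(runDemo()) // prints [1 3]
}
```

Version 2 is never synchronized because only the newest pending event is kept. That is acceptable for a reconciliation loop, since each synchronization works toward the latest desired state rather than replaying every intermediate change.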

Every time a pod worker finishes processing an update event, it adds the Pod to the workQueue of podWorkers with an additional delay. The Pod can be retrieved from the queue, and the next synchronization performed, only after the delay expires. As mentioned earlier, syncCh fires every second to collect the Pods on the local node that need to be synchronized, and HandlePodSyncs is then invoked to synchronize them. These are exactly the Pods in workQueue whose delay has expired at that point in time. The whole pod synchronization process thus forms a closed loop, as shown below.

When creating the podWorkers object, Kubelet uses its own syncPod method to initialize syncPodFn. However, this method only prepares for the synchronization. For example, it reports the latest Pod status to the Apiserver, creates the directories dedicated to the Pod, and obtains the Pod's pull secrets. Kubelet then invokes the SyncPod method of its containerRuntime to perform the synchronization. containerRuntime abstracts the underlying container runtime of Kubelet and defines the interfaces that a container runtime must provide. SyncPod is one of these interfaces.

Kubelet itself does not perform any container operations. Pod synchronization essentially means changing container state, and changing container state requires calling the underlying container runtime, such as PouchContainer.

The following describes the SyncPod method of containerRuntime to show the real synchronization operations:

// kubernetes/pkg/kubelet/kuberuntime/kuberuntime_manager.go
func (m *kubeGenericRuntimeManager) SyncPod(pod *v1.Pod, _ v1.PodStatus, podStatus *kubecontainer.PodStatus, pullSecrets []v1.Secret, backOff *flowcontrol.Backoff) (result kubecontainer.PodSyncResult)

This function first invokes computePodActions(pod, podStatus) to compare the current Pod state (podStatus) with the target Pod state (pod) and calculate the required synchronization operations. It returns a podActions object, as shown below:

// kubernetes/pkg/kubelet/kuberuntime/kuberuntime_manager.go
type podActions struct {
    KillPod bool
    
    CreateSandbox bool
    
    SandboxID string
    
    Attempt uint32
    
    ContainersToKill map[kubecontainer.ContainerID]containerToKillInfo
    
    NextInitContainerToStart *v1.Container
    
    ContainersToStart []int
}

In effect, podActions is a list of operations:

KillPod and CreateSandbox usually have the same value. They indicate whether to kill the current Pod sandbox (for a newly created Pod there is nothing to kill) and create a new one.
SandboxID identifies the kind of Pod creation. If it is empty, the Pod is being created for the first time. If it is not empty, a new sandbox is being created after the old one is killed.
Attempt is the number of times the Pod's sandbox has been recreated. It is 0 when the Pod is created for the first time. It serves a similar purpose to SandboxID.
ContainersToKill specifies the containers in the Pod to be killed, either because their configuration has changed or because a health check failed.
If the Pod's init containers have not all finished running, or one of them failed, NextInitContainerToStart indicates the next init container to create. That init container is created and started, and this round of synchronization ends.
If the Pod sandbox has been created and the init containers have completed, the regular containers in the Pod that are not yet running are started according to ContainersToStart.
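How such an operation list can be derived from a desired state and a current state is sketched below. This is a heavily reduced model, not computePodActions itself: the container, runningContainer, and actions types and the computeActions function are hypothetical, and init containers, restart policies, and the Attempt counter are deliberately left out.

```go
package main

import "fmt"

// container is the desired state of one container (hypothetical, simplified).
type container struct {
	Name  string
	Image string
}

// runningContainer is the observed state of one container.
type runningContainer struct {
	ID      string
	Image   string
	Healthy bool
}

// actions is a reduced counterpart of podActions.
type actions struct {
	CreateSandbox     bool
	ContainersToKill  map[string]string // container ID -> reason
	ContainersToStart []int             // indexes into the desired container list
}

// computeActions compares desired containers with the running ones,
// keyed by container name, and lists what must be killed and started.
func computeActions(desired []container, sandboxReady bool, running map[string]runningContainer) actions {
	a := actions{ContainersToKill: map[string]string{}}
	if !sandboxReady {
		// Without a usable sandbox, everything has to be started from scratch.
		a.CreateSandbox = true
		for i := range desired {
			a.ContainersToStart = append(a.ContainersToStart, i)
		}
		return a
	}
	for i, c := range desired {
		cur, ok := running[c.Name]
		switch {
		case !ok:
			a.ContainersToStart = append(a.ContainersToStart, i)
		case cur.Image != c.Image:
			a.ContainersToKill[cur.ID] = "configuration changed"
			a.ContainersToStart = append(a.ContainersToStart, i)
		case !cur.Healthy:
			a.ContainersToKill[cur.ID] = "health check failed"
			a.ContainersToStart = append(a.ContainersToStart, i)
		}
		// Otherwise the container already matches the desired state: do nothing.
	}
	return a
}

func main() {
	desired := []container{{"web", "nginx:1.15"}, {"sidecar", "proxy:2"}, {"logger", "log:1"}}
	running := map[string]runningContainer{
		"web":     {ID: "c1", Image: "nginx:1.14", Healthy: true}, // image changed
		"sidecar": {ID: "c2", Image: "proxy:2", Healthy: true},    // already as desired
	}
	a := computeActions(desired, true, running)
	fmt.Println(a.CreateSandbox, a.ContainersToKill, a.ContainersToStart)
}
```

Note that a container whose state already matches the desired state appears in neither list, which is why a synchronization round on a healthy, unchanged Pod results in no runtime calls for its containers.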

With such an operation list, the rest of SyncPod is simple: it invokes the corresponding interfaces of the underlying container runtime one by one to create and remove containers, which completes the synchronization.

To summarize the Pod synchronization procedure: a synchronization is triggered when the target state of a Pod changes or when its synchronization interval expires. Synchronization compares the target state of the containers with their current state, generates a list of containers to start and stop, and invokes the interfaces of the underlying container runtime to start or stop the containers according to that list.

Conclusion

If a container is a process, Kubelet is a container-oriented process monitor. Its job is to continuously move the running state of the Pods on the local node toward the target state. During this transition, unwanted containers are deleted and new containers are created and configured. An existing container is never repeatedly modified, started, or stopped. That is all there is to Kubelet's core processing logic.

Note

The source code in this article is from Kubernetes v1.9.4, commit: bee2d1505c4fe820744d26d41ecd3fdd4a3d6546
For detailed comments about Kubernetes source code, visit my GitHub page.
Reference: What even is a kubelet?
