Open modular multi-cluster management platform OCM

1、 Overview

Open Cluster Management (OCM) is an open, modular multi-cluster management system for Kubernetes clusters. OCM is a CNCF sandbox project. In Alibaba Cloud's proprietary cloud, OCM will also become a prominent part of Alibaba Cloud's CNStack. Below, from the perspective of user scenarios, we explain why we need to move from a single cluster to multiple clusters, the main architecture and functions of OCM, and the recent developments of OCM.

2、 Why go from a single cluster to multiple clusters

As the practice of cloud native technologies grows in scale, a single Kubernetes cluster can no longer carry the load, and a transition to a multi-cluster solution becomes necessary. A year ago, OCM was still a very new project, so its activity was relatively low. Recently, more and more partners have joined and developed a series of features based on OCM, and KubeVela and OCM have also been integrated in some areas (such as multi-cluster application distribution).

KubeVela shares core components with OCM, so on top of multi-cluster KubeVela, OCM can take on tasks that make production more stable, such as certificate rotation and automatic extension discovery. So what are the bottlenecks of a single cluster?

1. Single cluster bottleneck

a. Cross-data-center/cross-region etcd operation and maintenance issues

First, there is the issue of operating and maintaining etcd across data centers and regions. In lightweight K8s operations, a Kubernetes cluster can be seen as a stateless application on top of etcd. In most cases, operating K8s comes down to operating etcd.

etcd has hard requirements on its local configuration, including the latency between two data centers. If the regions are far apart, etcd has to be split into separate deployments, for example one per data center or region. Once etcd is split, the corresponding K8s clusters also need to be re-planned. In addition, as a K8s cluster grows, it also runs into the bottleneck of etcd request pressure in production.

b. Kubernetes availability radius control problem

Second, a bottleneck of a single cluster is controlling the "availability radius". K8s has the concept of a namespace, and the design intent is that tenant isolation can be described through namespaces: for example, Application 1 is deployed in Namespace1 and Application 2 in Namespace2. In practice, however, the native isolation of K8s has not yet met expectations. Moreover, even if ideal isolation were achieved, it would still be necessary to guard against unpredictable disasters (such as K8s upgrade, operation, or downtime incidents). The simplest way to contain these problems is "availability radius" control, also commonly called blast radius control. For example, with one data center in Chengdu and one in Shanghai, a separate K8s cluster is deployed in each. For localized recommendation services, one set of clusters per region can likewise be used so that the availability of each region does not conflict or become entangled with that of the others.

c. Insufficient tenant level isolation in native Kubernetes

Finally, native tenant-level isolation within K8s is insufficient. Alongside multi-cluster, there is a parallel working group for multi-tenancy. Multi-cluster and multi-tenancy address the same kind of problem, but multi-cluster solves isolation in a gentler way: it fully respects each single K8s cluster without modifying its source code, manages the K8s clusters from a control plane, and can flexibly take over or remove a cluster. This kind of cluster management aims to transparently map and collect the information of each cluster back to the central control plane for unified management.

2. Multi cluster user scenario

a. Elastic expansion at the cluster level

Multi-cluster has many application scenarios today. One is cluster-level elastic scaling, such as scaling out for the Double 11 promotion on Alibaba Cloud. A similar scenario is flexible migration at the data center level: migrating between data centers or switching operators (carriers) requires cluster-level migration. In these cases, a multi-cluster control plane is needed to manage the complexity involved.

b. Expansion of Sandbox Testing and Exercise Environment

Second is the expansion of sandbox testing and exercise environments. Practical work requires various environments, such as integration testing and regular CI testing, and these environments can be isolated by adding clusters. As the tool chain (similar to cluster-level elastic scaling for K8s) matures step by step, managing test environments through multiple clusters becomes a natural choice.

c. Infrastructure operation and maintenance in a wide area

Furthermore, when operating infrastructure across a wide area, the physical network latency between regions leaves little room for choice in infrastructure planning. Generally, the multi-cluster topology is determined directly by physical conditions, such as whether a dedicated line is built.

d. Flexibility in a multi cloud and multi service provider environment

Multi-cluster gives users more flexibility in choosing among multiple clouds and multiple service providers.

e. Cross team/organizational cluster collaboration

Finally, there is cluster collaboration across teams and organizations. When several teams share a single K8s cluster, problems inevitably occur and someone has to be held accountable for them. Where observability is still incomplete, each cluster can instead be assigned to one team to manage.

3. Evaluation of soft indicators

What are the indicators for evaluating OCM?

• First and most important is infrastructure dependency and coupling. In the multi-cluster scenario there is a central multi-cluster brain, which should manage different clusters flexibly without requiring many dependencies to be preset before a cluster is taken over;

• Second is scalability. OCM aims to work like building blocks: necessary resources can be added and unnecessary ones removed flexibly, with hot-pluggable and customizable features. In terms of openness, functionality carries no preset vendor restrictions;

• Finally, vendor neutrality: there is no vendor-level mutual exclusion, and users can switch flexibly between the clouds of multiple vendors.

3、 Introduction to the main architecture/functions of OCM

1. Main architecture

Architecture diagram of OCM

First, recall the K8s architecture. There is a control node, the kube master (including components such as the scheduler, API server, and controller manager), and each node registers itself with the master. The node then pulls the metadata it needs (such as Pods and Nodes) from the remote control plane through List-Watch, so this is a classic pull architecture. Native K8s has the kube master and the kubelet; in OCM, the Hub Cluster corresponds to the kube master and the Klusterlet corresponds to the kubelet. A K8s cluster, viewed as an atomic black box, is treated as a Klusterlet. Operating a cluster from the OCM multi-cluster control plane is therefore like operating a node in a single Kubernetes cluster. The architecture deliberately imitates that of Kubernetes and is likewise based on the pull model.
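The pull relationship described above can be illustrated with a toy model. This is only a sketch under our own assumptions: the `Hub` and `Agent` classes are hypothetical stand-ins and are not OCM APIs. The point it shows is that the hub only stores desired state, while the agent on the managed side initiates every exchange.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Hub:
    """Hypothetical stand-in for the hub cluster: it only stores desired state."""
    desired: Dict[str, List[str]] = field(default_factory=dict)  # cluster name -> manifests

    def list_for(self, cluster: str) -> List[str]:
        # Return a copy of the desired state for one cluster (empty if unknown).
        return list(self.desired.get(cluster, []))


@dataclass
class Agent:
    """Hypothetical stand-in for a Klusterlet: it always initiates the exchange."""
    name: str
    applied: List[str] = field(default_factory=list)

    def sync(self, hub: Hub) -> None:
        # Pull: the agent asks the hub for its desired state; the hub never dials in.
        self.applied = hub.list_for(self.name)


hub = Hub()
hub.desired["cluster1"] = ["deployment/nginx", "service/nginx"]

agent = Agent("cluster1")
agent.sync(hub)
print(agent.applied)
```

Because the connection always starts from the managed side, the hub in this model needs no network route into the managed cluster.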

2. Main concepts

After understanding the core concepts of Hub and Klusterlet, what are the core functions currently supported by OCM?

a. ManagedCluster/Logical Managed Cluster

First, ManagedCluster represents a logical managed cluster. Both the flexibility of taking clusters over and the handling of infrastructure coupling and dependency are built around the ManagedCluster model. The managed cluster registers itself to the OCM hub, where you can operate on the ManagedCluster model and view the corresponding cluster-level metadata;
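As an illustration, a ManagedCluster object on the hub might look like the following. This is a minimal sketch built as a Python dictionary; the helper `managed_cluster` and the cluster name `cluster1` are hypothetical, and the field names should be checked against the OCM version in use.

```python
import json


def managed_cluster(name: str, accepted: bool = False) -> dict:
    """Build a ManagedCluster manifest (a cluster-scoped object on the hub)."""
    return {
        "apiVersion": "cluster.open-cluster-management.io/v1",
        "kind": "ManagedCluster",
        "metadata": {"name": name},
        # hubAcceptsClient records whether the hub has accepted this cluster.
        "spec": {"hubAcceptsClient": accepted},
    }


cluster = managed_cluster("cluster1", accepted=True)
print(json.dumps(cluster, indent=2))
```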

b. ManifestWork/Resource Distribution

Once multiple clusters are registered on the OCM hub, resources can be distributed to multiple ManagedClusters. To distribute resources, define a ManifestWork resource and put the content to be distributed into the Work API. For example, to distribute a Deployment and its corresponding Service, package them into a ManifestWork and associate it with the target ManagedClusters; the resources are then published on the corresponding clusters. Currently, ManifestWork is a public standard in the community;
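The packaging step can be sketched as follows. This is an illustrative example rather than a complete manifest: the helper `manifest_work`, the work name, and the nginx resources are hypothetical. It assumes the usual convention that a ManifestWork is created on the hub in the namespace named after the target managed cluster.

```python
import json
from typing import List


def manifest_work(name: str, cluster: str, manifests: List[dict]) -> dict:
    """Wrap plain Kubernetes manifests into a ManifestWork for one managed cluster."""
    return {
        "apiVersion": "work.open-cluster-management.io/v1",
        "kind": "ManifestWork",
        # The namespace on the hub selects the target managed cluster.
        "metadata": {"name": name, "namespace": cluster},
        "spec": {"workload": {"manifests": manifests}},
    }


deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "nginx", "namespace": "default"},
    "spec": {
        "replicas": 1,
        "selector": {"matchLabels": {"app": "nginx"}},
        "template": {
            "metadata": {"labels": {"app": "nginx"}},
            "spec": {"containers": [{"name": "nginx", "image": "nginx"}]},
        },
    },
}
service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": "nginx", "namespace": "default"},
    "spec": {"selector": {"app": "nginx"}, "ports": [{"port": 80}]},
}

work = manifest_work("nginx-work", "cluster1", [deployment, service])
print(json.dumps(work, indent=2))
```

Distributing the same content to several clusters then amounts to creating one such ManifestWork per target cluster namespace.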

c. Placement/Multi Cluster Routing

The third function is Placement, which covers multi-cluster routing, matching, or scheduling. It determines how distributed resources are associated with clusters: the resources to be distributed are bound to a routing policy that decides which clusters this set of resources is copied to;
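The matching idea can be sketched in a few lines. This is a simplified model under our own assumptions, not the Placement implementation: the function `select_clusters` and the labels are hypothetical, and only label matching plus a cluster-count limit are modeled.

```python
from typing import Dict, List


def select_clusters(
    clusters: Dict[str, Dict[str, str]],
    match_labels: Dict[str, str],
    number_of_clusters: int,
) -> List[str]:
    """Return up to number_of_clusters cluster names whose labels match."""
    matched = [
        name
        for name, labels in clusters.items()
        if all(labels.get(key) == value for key, value in match_labels.items())
    ]
    # Sort by name so the result is deterministic in this sketch.
    return sorted(matched)[:number_of_clusters]


clusters = {
    "cluster1": {"region": "shanghai", "env": "prod"},
    "cluster2": {"region": "chengdu", "env": "prod"},
    "cluster3": {"region": "shanghai", "env": "test"},
}
decisions = select_clusters(clusters, {"env": "prod"}, 2)
print(decisions)
```

The resulting list plays the role of the routing decision: it names the clusters that should receive the set of resources.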

d. Add-On/Plugin

The Add-On plugin mechanism is designed to improve the extensibility of multi-cluster management, similar to controller-runtime in K8s. It is a framework with which developers build custom operators and controllers.

3. Cluster takeover

The picture above shows a managed cluster being connected to the OCM hub. First plan the hub cluster, then generate the rendered clusteradm join command on the hub cluster and copy it to the managed cluster to execute it there. This registers the managed cluster. OCM registration does not assume mutual trust and requires a two-way handshake, similar to a TCP handshake: the managed cluster must initiate a takeover request to the central cluster, and the central cluster must also approve the request.
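The two-way handshake can be modeled as a small state check. This is a conceptual sketch only: the `Registration` class is hypothetical and does not call clusteradm or any OCM API. It shows that neither side alone can complete the registration.

```python
from dataclasses import dataclass


@dataclass
class Registration:
    """Hypothetical model of one cluster's registration state on the hub."""
    cluster: str
    requested: bool = False  # set when the managed cluster asks to join
    approved: bool = False   # set when the hub side approves the request

    def request_join(self) -> None:
        self.requested = True

    def approve(self) -> None:
        # Approval is only meaningful for a request that actually exists.
        if not self.requested:
            raise ValueError(f"{self.cluster} has not requested to join")
        self.approved = True

    @property
    def joined(self) -> bool:
        # Both halves of the handshake are required.
        return self.requested and self.approved


reg = Registration("cluster1")
reg.request_join()
print(reg.joined)  # the request alone is not enough
reg.approve()
print(reg.joined)  # both sides have now agreed
```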

4、 Introduction to recent features of OCM

1. Addon Framework

The plugin framework has the following features:

a. Free extension;

b. Free removal;

c. Continuous operation and upgrading.

2. Konnectivity Multi Cluster Tunnel/Cluster Proxy

Konnectivity is a native K8s technology used to solve the problem of the control plane network and the node network not being on the same network plane. Using network tunneling, it can cross any network topology and solves the certificate, network IP, and related problems that arise when pushing requests from the central multi-cluster network plane to the managed cluster network plane.

3. Multi cluster resource state backflow

The following figure shows the YAML file for multi-cluster resource state backflow, in which the state of distributed resources is returned from the managed clusters to the hub.
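Since the figure is not reproduced here, the following is our own sketch of what such a configuration could look like, not the content of the original figure. It assumes the `manifestConfigs`, `resourceIdentifier`, and `feedbackRules` fields of the ManifestWork API as we understand them; the helper `with_status_feedback` and the resource names are hypothetical, and the field names should be verified against the OCM version in use.

```python
import copy
import json


def with_status_feedback(work: dict, group: str, resource: str,
                         namespace: str, name: str) -> dict:
    """Return a copy of a ManifestWork that asks for status feedback on one resource."""
    result = copy.deepcopy(work)
    config = {
        # Identifies which distributed resource the rule applies to.
        "resourceIdentifier": {
            "group": group,
            "resource": resource,
            "namespace": namespace,
            "name": name,
        },
        # Assumed rule type: report a predefined set of well-known status fields.
        "feedbackRules": [{"type": "WellKnownStatus"}],
    }
    result.setdefault("spec", {}).setdefault("manifestConfigs", []).append(config)
    return result


base = {
    "apiVersion": "work.open-cluster-management.io/v1",
    "kind": "ManifestWork",
    "metadata": {"name": "nginx-work", "namespace": "cluster1"},
    "spec": {"workload": {"manifests": []}},
}
work_with_feedback = with_status_feedback(base, "apps", "deployments", "default", "nginx")
print(json.dumps(work_with_feedback["spec"]["manifestConfigs"], indent=2))
```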

4. Extensible scoring-based multi-cluster scheduling

Extensible scoring-based multi-cluster scheduling uses a scoring mechanism to deploy different applications to different clusters. It suits scenarios such as fine-grained multi-cluster scheduling and active/standby disaster recovery.
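The scoring idea can be sketched as a weighted ranking. This is a simplified illustration under our own assumptions, not the OCM scheduler: the function `schedule`, the score names, and the weights are all hypothetical.

```python
from typing import Dict, List


def schedule(
    scores: Dict[str, Dict[str, int]],
    weights: Dict[str, int],
    n: int,
) -> List[str]:
    """Pick the n clusters with the highest weighted total score."""
    totals = {
        cluster: sum(weights.get(key, 0) * value for key, value in cluster_scores.items())
        for cluster, cluster_scores in scores.items()
    }
    # Highest total first; ties are broken by cluster name for determinism.
    return sorted(totals, key=lambda cluster: (-totals[cluster], cluster))[:n]


scores = {
    "cluster1": {"cpuAvailable": 80, "memAvailable": 20},
    "cluster2": {"cpuAvailable": 40, "memAvailable": 90},
    "cluster3": {"cpuAvailable": 10, "memAvailable": 10},
}
weights = {"cpuAvailable": 2, "memAvailable": 1}

selected = schedule(scores, weights, 2)
print(selected)
```

For an active/standby setup in this model, the first entry could be treated as the active cluster and the second as the standby.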

5、 Integration of OCM and KubeVela

KubeVela is a control plane for application distribution. The Cluster Gateway among the OCM components is built as an extension through the native mechanisms of K8s, which include components such as Port Proxy and Node Proxy, and it extends resources such as Cluster Proxy to multiple clusters.

When users in the central cluster need to access a managed cluster, the Cluster Gateway connects the network between the central cluster and the managed cluster. The Cluster Gateway and the Konnectivity tunnel integrate natively, shielding the complexity of the network between the central cluster and the managed cluster and solving the problem of multi-cluster network connectivity.
