How Kubernetes SIG-Cloud-Provider-Alibaba Works

This blog post highlights the key takeaways from the online presentation that Tang Zhimin and Xie Yaoyao gave about SIG Cloud Provider.

By Tang Zhimin and Xie Yaoyao (Chuyang)

On February 12, 2020, Alibaba Cloud and the Cloud Native Computing Foundation (CNCF) jointly held a webinar. In it, Alibaba Cloud presented, for the first time, an overview of more than 20 open-source Kubernetes projects in 10 categories, which together cover the complete Kubernetes lifecycle. This article summarizes the content of the video, provides the documents for download, and answers the questions that were left open at the webinar.


What is SIG Cloud Provider?

Over the years, more and more enterprises have adopted Kubernetes in their production environments. Kubernetes is widely accepted thanks to its sound design and thriving community. So far, about 20 Special Interest Groups (SIGs) have formed around Kubernetes. As an important SIG in the Kubernetes community, SIG Cloud Provider works to ensure that all cloud vendors provide Kubernetes services with standard capabilities.

SIG-Cloud-Provider-Alibaba is the only sub-project of SIG Cloud Provider in China. SIG Cloud Provider is a cloud vendor interest group for the Kubernetes community. It ensures that the Kubernetes ecosystem is evolving in a way that is neutral to all cloud vendors and establishes standards and requirements that are common to all providers to ensure optimal Kubernetes integration. Currently, there are seven cloud vendors in SIG Cloud Provider, including Amazon Web Services (AWS), GCP, Alibaba Cloud, and IBM Cloud.

Why Did Alibaba Cloud Join SIG Cloud Provider?

1. Work with Global Cloud Providers to Promote Multi-Cloud Standards and Contribute to the Kubernetes Community with Alibaba Cloud Practices

In the era of full migration to the cloud, enterprises' IT architecture has been reshaped in the cloud. Cloud-native computing is a set of best practices and methodologies for building scalable, robust, and loosely coupled applications in Alibaba Cloud, Apsara Stack, and multi-cloud environments. This facilitates quick innovation and lower-cost trials.

As a world-leading cloud vendor, Alibaba Cloud hopes to promote Kubernetes standardization and deepen the cooperation with other cloud vendors such as AWS, Google, and Azure to optimize the cloud-Kubernetes connection and unify modular and standardized protocols for different components.

2. Provide Transparency, Controllability, Collaboration, and Smooth Evolution for Alibaba Cloud Kubernetes Developers

We hope to establish the best Kubernetes running environment for Kubernetes developers and users and to provide Alibaba Cloud open-source plug-ins for Kubernetes. Alibaba Cloud Container Service for Kubernetes (ACK) also reuses these components.


  • Transparency and controllability: Research-oriented developers can build Kubernetes clusters by using these plug-ins, whereas ACK users can better understand the related implementations.
  • Collaboration: Developers who need to use Kubernetes with Alibaba Cloud computing, network, and storage services can raise issues or participate in developing the open-source components and shaping the roadmap.
  • Smooth evolution: The open-source Kubernetes plug-ins of Alibaba Cloud can be deployed on self-managed clusters, but doing so places higher demands on an enterprise's O&M, upgrade, and stability-control capabilities. Users can migrate smoothly to ACK to obtain expert services such as continuous upgrades, high-availability guarantees, and corrective recommendations.

How Kubernetes SIG-Cloud-Provider-Alibaba Works

  • Slack
  • Bimonthly conference
  • Minutes of conference
  • Google Docs and YouTube
  • Conference languages: Chinese and English

Introduction to Alibaba Cloud Products for Kubernetes


Alibaba Cloud Open-Source Suite for Kubernetes


As a cloud-native operating system for applications, Kubernetes has become a standard. In its Kubernetes practice, Alibaba Cloud has released many open-source projects that provide full-stack lifecycle management for user applications. These projects fall into five underlying categories (CloudController, computing, storage, network, and security) and five upper-layer categories (AI, ServiceBroker, application management, migration, and serverless).

SIG-Cloud-Provider-Alibaba provides a channel for sharing Kubernetes cloud-native best practices on Alibaba Cloud. Individuals and organizations alike can learn how Cloud Provider works and apply it in production to realize its business value. For more information, please see the following:










Application Management

Introduction to Some Open-Source Components


CloudController is the cloud controller manager (CCM) of Kubernetes. It connects Kubernetes to the basic services of a cloud vendor. For Alibaba Cloud, these include Server Load Balancer (SLB), Virtual Private Cloud (VPC) routing, Elastic Compute Service (ECS), and Alibaba Cloud Domain Name System (DNS), which are managed through NodeController, ServiceController, RouteController, and PVLController.

NodeController manages compute nodes, such as managing the lifecycle of ECS instances. By marking nodes with zones, regions, and hostnames, it provides complete information required for the orchestration system to schedule workloads for compute pools. It also regularly polls the IP addresses of ECS instances and checks whether ECS resources are released to dynamically update node information. This ensures that the orchestration system can respond to computing node events promptly.
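The labeling and polling behavior described above can be sketched roughly as follows. This is a minimal illustration, not the CCM's actual code: the `Instance` and `Node` types and the `reconcile_nodes` helper are hypothetical, and a plain dictionary stands in for the ECS API. The region, zone, and hostname label keys are the well-known Kubernetes node labels.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Instance:
    """Hypothetical view of an ECS instance as returned by a cloud API."""
    instance_id: str
    region: str
    zone: str
    private_ip: str


@dataclass
class Node:
    """Hypothetical, simplified Kubernetes node."""
    name: str
    instance_id: str
    labels: Dict[str, str] = field(default_factory=dict)
    address: Optional[str] = None


def reconcile_nodes(nodes: List[Node], cloud: Dict[str, Instance]) -> List[Node]:
    """One polling pass: label live nodes, refresh their IP, drop released ones."""
    kept = []
    for node in nodes:
        instance = cloud.get(node.instance_id)
        if instance is None:
            # The ECS instance was released, so the node is dropped.
            continue
        node.labels["topology.kubernetes.io/region"] = instance.region
        node.labels["topology.kubernetes.io/zone"] = instance.zone
        node.labels["kubernetes.io/hostname"] = node.name
        node.address = instance.private_ip
        kept.append(node)
    return kept
```

Running such a pass on a timer is what lets the scheduler see up-to-date topology labels and stop placing workloads on instances that no longer exist.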

ServiceController implements load balancing management for applications. It watches for changes to Kubernetes Service objects, automatically configures and manages cloud SLB resources, including SLB instances, listeners, and virtual server groups, and adjusts the backend server groups of SLB instances as the number of application replicas changes. On this basis, we have defined a wide range of annotations to customize the load balancing configuration for applications. We have also worked with the Kubernetes community to standardize configurations and added the Elastic Network Interface (ENI) mode to the Kubernetes service discovery model. This simplifies the network hierarchy of service discovery and improves the overall application network performance by 10%.
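To make the backend adjustment concrete, here is a minimal sketch of how a controller might compute which backends to add to or remove from an SLB server group. It is an assumption-laden illustration rather than the CCM implementation: the function names, the dictionary shapes, and the `example.com/backend-type` annotation key are all hypothetical.

```python
from typing import Dict, Set, Tuple


def plan_backend_changes(current: Set[str], desired: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (to_add, to_remove) so the SLB backend group matches the desired set."""
    return desired - current, current - desired


def desired_backends(service: Dict, endpoints: Dict[str, str]) -> Set[str]:
    """Pick backend targets for a Service.

    `endpoints` maps pod IP -> node name. The annotation key below is
    hypothetical; it only illustrates switching modes through annotations.
    """
    mode = service.get("annotations", {}).get("example.com/backend-type", "node")
    if mode == "eni":
        # ENI mode: pods are attached to the SLB directly by pod IP.
        return set(endpoints.keys())
    # Default: the nodes that host the pods are the backends.
    return set(endpoints.values())
```

In ENI mode the load balancer targets pod IPs directly, which removes the extra node-level hop and is one way to read the "simplified network hierarchy" mentioned above.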


Terway: The High-Performance Network Component

Terway supports Kubernetes CNI specifications and is specially optimized for Alibaba Cloud. It supports multiple enterprise features, including the VPC routing mode, ENI mode, and inclusive ENI mode. Its performance in ENI mode is about 10% higher than that in the native VPC.

Terway is integrated with Alibaba Cloud Infrastructure as a Service (IaaS). It allows pods to use network products such as Cloud Enterprise Network (CEN) and SLB directly, and to use ENIs to avoid network performance loss. As a result, containerizing an application does not compromise user experience or degrade performance. Terway also supports advanced features such as Kubernetes network policies and quality of service (QoS)-based throttling.


CSI: The High-Performance Container Storage Component

The Alibaba Cloud Container Storage Interface (CSI) plug-in enables you to manage the lifecycle of container volumes in Kubernetes, including creating, mounting, and using cloud volumes. The CSI plug-in is implemented based on Kubernetes versions later than V1.14. It supports Alibaba Cloud storage services, such as disks, Apsara File Storage NAS, Cloud Paralleled File System (CPFS), Object Storage Service (OSS), and Logical Volume Manager (LVM).


Log-Pilot: The High-Performance Log Collector

Log-pilot is used to efficiently collect logs from containers. It can easily collect the standard output logs of containers and dynamically discover and collect log files from containers. In declarative configuration mode, it can automatically detect the status of a container in the cluster to configure the container log collection function. It has many advanced features, such as automatic checkpoint and handle retention, tagging, and tag customization. With these features, log-pilot can flexibly collect and save log data to various log storage backends, such as Elasticsearch, Message Queue for Apache Kafka, Logstash, Redis, and Graylog.
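As an illustration of the declarative configuration mode, the sketch below parses log declarations from a container's environment variables. It assumes an `aliyun_logs_<name>=<stdout or file path>` naming convention, which should be verified against the log-pilot documentation; the `parse_log_declarations` helper and the sample values are hypothetical.

```python
from typing import Dict

PREFIX = "aliyun_logs_"  # assumed prefix; verify against the log-pilot docs


def parse_log_declarations(env: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Turn declarative environment variables into a map of log sources."""
    sources = {}
    for key, value in env.items():
        if not key.startswith(PREFIX):
            continue
        name = key[len(PREFIX):]
        if not name or "_" in name:
            # Skip empty names and option-style keys such as <name>_tags.
            continue
        if value == "stdout":
            sources[name] = {"type": "stdout"}
        else:
            sources[name] = {"type": "file", "path": value}
    return sources
```

A collector that watches container start events could run this kind of parsing on each new container and generate the matching collection configuration without any per-application setup.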


Arena: The Lightweight Solution for Machine Learning Platform for AI

Arena is a lightweight solution for the Machine Learning Platform for AI based on Kubernetes. It supports data preparation, model development, model training, and model prediction throughout the lifecycle, improving the work efficiency of data scientists. The service platform allows data scientists and algorithm engineers to quickly perform data preparation, model development, model training, evaluation, and prediction tasks by using Alibaba Cloud resources. These cloud resources include ECS, Elastic GPU Service, Apsara File Storage NAS, CPFS, OSS, E-MapReduce, and SLB instances. The service platform can also easily transform deep learning capabilities into service APIs to accelerate business application integration. It can also improve the utilization of Elastic GPU Service resources in a cluster through visual management of Elastic GPU Service resources and shared scheduling of devices.


Welcome to SIG Cloud Provider

This webinar presented, for the first time, the strategic arrangement of Alibaba Cloud products for the Kubernetes community. We cannot detail all the open-source components here. Instead, we hope that developers who are interested in Kubernetes can find the corresponding open-source projects. All developers are welcome to submit pull requests, raise issues, or make roadmap suggestions. SIG-Cloud-Provider-Alibaba will share principles and best practices for specific components.


Q1: Can Cloud Provider of Alibaba Cloud Kubernetes add parameters to enable or disable each function?

A1: Yes. You can add annotations for this purpose.

Q2: Will it cause issues if we use a specified version of Kubernetes to make modifications based on Alibaba Cloud CCM?

A2: No, this will not cause issues because CCM is independent of the Kubernetes version.

Q3: Do Alibaba Cloud Kubernetes-based container services directly use open-source CCM? If so, what adjustments have been made internally before launch, and what is the specific format of provider_id?

A3: Yes, Alibaba Cloud Kubernetes-based container services directly use open-source CCM. The format of provider_id is ${regionid}.${nodeid}.
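The `${regionid}.${nodeid}` format from A3 can be built and parsed as in the following sketch. The helper names and sample values are hypothetical, and the parser assumes that the region ID itself contains no dot.

```python
from typing import Tuple


def build_provider_id(region_id: str, node_id: str) -> str:
    """Join region and node IDs in the ${regionid}.${nodeid} format."""
    return f"{region_id}.{node_id}"


def parse_provider_id(provider_id: str) -> Tuple[str, str]:
    """Split a provider ID at the first dot; assumes the region ID has no dot."""
    region_id, sep, node_id = provider_id.partition(".")
    if not sep or not region_id or not node_id:
        raise ValueError(f"malformed provider ID: {provider_id!r}")
    return region_id, node_id
```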

Q4: Must the node name of Kubernetes be the same as the instance ID of Alibaba Cloud for CCM? O&M personnel said they must be the same.

A4: No. Currently, only the provider ID needs to be set.

Q5: How is the underlying layer of Terway accelerated, by kernel level or Data Plane Development Kit (DPDK)?

A5: Terway can work on different networks with different configurations.

  • In exclusive ENI mode, the ENI at the IaaS layer is used directly as the pod ENI. No virtualization is performed on the host, and DPDK can be used to speed up the application network in a pod. The high-performance IaaS network developed by Alibaba Cloud can be used directly among nodes.
  • In inclusive ENI mode, the lightweight IPVLAN virtualization solution is used within a node, with only slight performance degradation compared with the host network.

Q6: Can underlying kernel parameters of a pod be set in namespaces?

A6: It depends on the kernel. In newer kernels, such as Linux kernel 4.19 in Aliyun Linux 2, most kernel parameters can be set and modified in a pod.

Q7: What security container products of Alibaba Cloud are available now?

A7: Alibaba Cloud Container Service currently provides the security sandbox as a container engine for users. In addition, some Alibaba Cloud serverless products, such as Serverless App Engine (SAE) and Elastic Container Instance (ECI), are also built on security containers.

Q8: Does Arena support multitenancy and virtual graphics processing units (vGPUs)?

A8: Arena reuses the existing user authorization and multitenancy policies of Kubernetes. Different users can use different kubeconfig files for authentication and use namespaces to isolate and share resources. In Arena, users can view only the training and inference tasks in their own namespace. Here, vGPU refers to the NVIDIA vGPU technology. Currently, the vGPU technology that supports P4 in Alibaba Cloud has been integrated with ACK, and you can get started with it in Alibaba Cloud Container Service. To Arena, a vGPU is simply a resource that can be scheduled and orchestrated, not a special resource.

Q9: Does the multi-container GPU sharing solution support resource isolation and can the GPU be limited?

A9: Alibaba Cloud Container Service provides the only open-source GPU sharing solution in the industry. Currently, our solution implements multi-container GPU sharing at the scheduling layer and can be integrated with frameworks, such as TensorFlow, to limit GPU resources at the application layer. For more information about the usage of the solution, please see the user guide. We are also working with the Alibaba Cloud team to develop a secure and high-performance GPU isolation solution. In the near future, you will be able to use a complete solution with both GPU sharing and isolation.
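Sharing "at the scheduling layer" means the scheduler only does bookkeeping: it tracks how much GPU memory each container has requested and places containers accordingly, without enforcing the limit on the device. The sketch below shows one possible placement policy (bin packing by free GPU memory). It is a hypothetical illustration; the function names and the choice of policy are assumptions, not a description of the actual scheduler.

```python
from typing import List, Optional


def pick_gpu(free_mem_gib: List[int], request_gib: int) -> Optional[int]:
    """Return the index of the GPU with the least free memory that still fits."""
    best = None
    for index, free in enumerate(free_mem_gib):
        if free >= request_gib and (best is None or free < free_mem_gib[best]):
            best = index
    return best


def schedule(free_mem_gib: List[int], requests_gib: List[int]) -> List[Optional[int]]:
    """Place each request in order; None means no GPU has enough free memory."""
    free = list(free_mem_gib)  # copy so the caller's list is not modified
    placements = []
    for request in requests_gib:
        gpu = pick_gpu(free, request)
        if gpu is not None:
            free[gpu] -= request
        placements.append(gpu)
    return placements
```

Because nothing here stops a container from using more memory than it requested, the application-layer limit (for example, a framework-level memory cap) remains necessary until an isolation mechanism is in place.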

Q10: Does ExternalDNS support Alibaba Cloud DNS?

A10: Alibaba Cloud DNS PrivateZone is supported now. The DNS records of services or pods can be synchronized from the Kubernetes cluster to Alibaba Cloud DNS, reducing the performance loss caused by CoreDNS deployed in the cluster.

Q11: What is the major difference between the ingress-nginx of Alibaba Cloud and that of the Kubernetes community?

A11: The ingress-nginx of Alibaba Cloud provides more advanced features, such as the dynamic update of the ingress-nginx configuration. It also supports a phased release policy based on the headers, cookies, request parameters, and weight.
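A phased release policy of this kind can be pictured as a routing decision per request. The following sketch is a hypothetical illustration, not the ingress-nginx implementation: the rule format, the `route` function, and the order in which the criteria are checked are all assumptions.

```python
import random
from typing import Dict, Optional


def route(request: Dict, rule: Dict, rng: Optional[random.Random] = None) -> str:
    """Return "canary" or "stable" for a request under a hypothetical rule.

    Each of "header", "cookie", and "query" in the rule is a (name, value) pair.
    """
    rng = rng or random.Random()
    header = rule.get("header")
    if header and request.get("headers", {}).get(header[0]) == header[1]:
        return "canary"
    cookie = rule.get("cookie")
    if cookie and request.get("cookies", {}).get(cookie[0]) == cookie[1]:
        return "canary"
    param = rule.get("query")
    if param and request.get("query", {}).get(param[0]) == param[1]:
        return "canary"
    # Weight: percentage (0-100) of the remaining traffic sent to the canary.
    if rng.random() * 100 < rule.get("weight", 0):
        return "canary"
    return "stable"
```

Explicit matches (header, cookie, request parameter) let testers opt in deterministically, while the weight gradually shifts the rest of the traffic.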

Q12: What is the release cycle of ACK and its development kits?

A12: A major version of ACK is released every six months. Bug fixes are released as needed, not on a fixed schedule.

Q13: Has the business edition of ACK@Edge been released and which users are using it?

A13: ACK@Edge has been launched for production. Its users come from many fields and industries, such as online education, video, Alibaba Cloud IoT, and Alibaba Cloud CDN. The business edition is expected to launch before June 2020.

Q14: Are there any control group (cgroup) memory leaks on the worker node on the host? If so, how can I solve the problem?

A14: The cgroup driver used by Container Service is the systemd cgroup driver. So far, no cgroup memory leaks have been reported.

Q15: Are the CPU and memory resources of a pod isolated from the host? If so, how are they isolated?

A15: You can configure the kubelet to reserve resources for the host, so that pods are confined to the remaining resources.
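The arithmetic behind this reservation is simple: what pods can use (the node's allocatable resources) is the node capacity minus what the kubelet reserves for Kubernetes daemons (`--kube-reserved`), for the operating system (`--system-reserved`), and for the hard eviction threshold (`--eviction-hard`). The sketch below works through it; the `allocatable` helper and the sample numbers are hypothetical.

```python
def allocatable(capacity: int, kube_reserved: int, system_reserved: int,
                eviction_threshold: int = 0) -> int:
    """Resources left for pods after host-side reservations (same unit throughout)."""
    reserved = kube_reserved + system_reserved + eviction_threshold
    if reserved > capacity:
        raise ValueError("reservations exceed node capacity")
    return capacity - reserved


# Memory in MiB on a hypothetical 16 GiB node.
memory_for_pods = allocatable(16384, kube_reserved=1024, system_reserved=512,
                              eviction_threshold=100)

# CPU in millicores on a hypothetical 4-core node (no eviction threshold for CPU).
cpu_for_pods = allocatable(4000, kube_reserved=200, system_reserved=100)
```

With these sample values, pods can be scheduled against 14748 MiB of memory and 3700 millicores of CPU; the rest stays available to the host.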

Q16: Does Alibaba Cloud have a tool similar to eksctl from AWS, for example, an ackctl?

A16: Please see aliyun-cli for the answer.

Q17: How does Alibaba Cloud support Windows containers?

A17: Windows 10 of version 1809 is currently supported and version 1903 will be supported soon. Windows nodes can be added to Linux clusters.

Q18: Can I integrate an open component into an existing Kubernetes cluster?

A18: Yes. Existing Kubernetes clusters meet the requirements of Kubernetes conformance testing.

You can find the complete video of the live presentation (in Chinese) here.
