A hybrid cluster connects your on-premises self-managed Kubernetes cluster to Alibaba Cloud through a registered cluster. This lets you scale your on-premises cluster with cloud-based Elastic Compute Service (ECS) nodes while managing both environments from a single control plane.
This guide uses a data center cluster running Calico in route reflector mode as the example. On the cloud side, Alibaba Cloud Container Service for Kubernetes (ACK) uses the Terway plugin for container networking.
Prerequisites
Before you begin, make sure you have:
- Network connectivity between your on-premises cluster and the virtual private cloud (VPC) used by the registered cluster, covering both the compute node network and the container network. Use Cloud Enterprise Network (CEN) to establish this connectivity. For details, see Establish multi-VPC connections in different scenarios.

  Network connectivity between your on-premises environment and the VPC is the foundational requirement. If this is not in place, no subsequent steps will succeed.

- The on-premises cluster connected to the registered cluster using the private cluster import agent configuration provided by the registered cluster.

- Cloud-based compute nodes added through the registered cluster that can reach the API server of your on-premises cluster.

- A kubectl connection to the registered cluster. See Obtain the kubeconfig file of a cluster and use kubectl to connect to the cluster.
Hybrid cluster architecture
In a hybrid cluster, Calico runs on-premises and Terway runs in the cloud. The two network plugins must not interfere with each other: Calico pods stay on on-premises nodes, and Terway pods run only on cloud-based ECS nodes.
The network topology for this example is as follows.
On-premises (data center):

- Private CIDR block: 192.168.0.0/24
- Container network CIDR block: 10.100.0.0/16
- Network plugin: Calico (route reflector mode)

Cloud side:

- VPC CIDR block: 10.0.0.0/8
- vSwitch CIDR block for compute nodes: 10.10.24.0/24
- vSwitch CIDR block for pods: 10.10.25.0/24
- Network plugin: Terway (shared mode)
Make sure the on-premises container network CIDR block (10.100.0.0/16) does not overlap with any CIDR block actually allocated on the cloud side, in particular the vSwitch CIDR blocks for compute nodes and pods (10.10.24.0/24 and 10.10.25.0/24). In this example, 10.100.0.0/16 falls inside the VPC CIDR block 10.0.0.0/8; what matters is that it collides with no vSwitch or pod CIDR block in use.
Set up the hybrid cluster
The setup consists of seven steps:

1. Restrict Calico to on-premises nodes
2. Grant RAM permissions to Terway
3. Install Terway
4. Verify the Terway DaemonSet
5. Configure the Terway ENI ConfigMap
6. Create a custom node initialization script
7. Create a node pool and scale out ECS nodes
Step 1: Restrict Calico to on-premises nodes
Cloud-based ECS nodes added to a registered ACK cluster are automatically labeled alibabacloud.com/external=true. Configure nodeAffinity on the Calico DaemonSet so that Calico pods run only on on-premises nodes (nodes without this label).
cat <<EOF > calico-ds.patch
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: alibabacloud.com/external
                operator: NotIn
                values:
                - "true"
              - key: type
                operator: NotIn
                values:
                - "virtual-kubelet"
EOF
kubectl -n kube-system patch ds calico-node -p "$(cat calico-ds.patch)"
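To confirm that the patch took effect, you can print the DaemonSet's node affinity. This verification command is an addition to the original procedure:

# Print the affinity rules now set on the Calico DaemonSet.
kubectl -n kube-system get ds calico-node \
  -o jsonpath='{.spec.template.spec.affinity.nodeAffinity}'

The output should show the two NotIn expressions for alibabacloud.com/external and type.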
Step 2: Grant RAM permissions to Terway
Terway needs Resource Access Management (RAM) permissions to manage elastic network interfaces (ENIs) on ECS nodes.
Option A: Using onectl (recommended)
1. Install and configure onectl. See Manage registered clusters using onectl.

2. Run the following command:

   onectl ram-user grant --addon terway-eniip

   Expected output:

   Ram policy ack-one-registered-cluster-policy-terway-eniip granted to ram user ack-one-user-ce313528c3 successfully.
Option B: Using the console
Grant the following RAM policy to the AccessKey used by Terway. For instructions, see Manage RAM user permissions.
{
  "Version": "1",
  "Statement": [
    {
      "Action": [
        "ecs:CreateNetworkInterface",
        "ecs:DescribeNetworkInterfaces",
        "ecs:AttachNetworkInterface",
        "ecs:DetachNetworkInterface",
        "ecs:DeleteNetworkInterface",
        "ecs:DescribeInstanceAttribute",
        "ecs:AssignPrivateIpAddresses",
        "ecs:UnassignPrivateIpAddresses",
        "ecs:DescribeInstances",
        "ecs:ModifyNetworkInterfaceAttribute"
      ],
      "Resource": [
        "*"
      ],
      "Effect": "Allow"
    },
    {
      "Action": [
        "vpc:DescribeVSwitches"
      ],
      "Resource": [
        "*"
      ],
      "Effect": "Allow"
    }
  ]
}
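If you prefer scripting this over console clicks, the Alibaba Cloud CLI can create and attach the policy. This is a sketch, not part of the original procedure; the policy file terway-policy.json, the policy name, and the RAM user name are hypothetical placeholders:

# Create a custom policy from the JSON above, then attach it to the RAM user
# whose AccessKey Terway will use. All names here are examples only.
aliyun ram CreatePolicy --PolicyName terway-eni-policy \
  --PolicyDocument "$(cat terway-policy.json)"
aliyun ram AttachPolicyToUser --PolicyType Custom \
  --PolicyName terway-eni-policy --UserName <your-terway-ram-user>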
Step 3: Install Terway
Option A: Using onectl
onectl addon install terway-eniip
Expected output:
Addon terway-eniip, version **** installed.
Option B: Using the console
1. Log on to the Container Service Management Console. In the left navigation pane, click Clusters.

2. Click your cluster name, then click Add-ons in the left navigation pane.

3. On the Add-ons page, search for terway-eniip. Click Install in the lower-right corner of the component card, then click OK.
Step 4: Verify the Terway DaemonSet
Before adding cloud-based nodes, confirm that Terway is not scheduled on any on-premises nodes.
kubectl -n kube-system get ds | grep terway
Expected output:
terway-eniip 0 0 0 0 0 alibabacloud.com/external=true 16s
The alibabacloud.com/external=true node selector confirms that Terway pods will run only on cloud-based ECS nodes.
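Once cloud nodes join in Step 7, you can also confirm the inverse: that Calico stays off ECS nodes. This check is an addition to the original steps, and it assumes the k8s-app=calico-node label that Calico's standard manifests apply; adjust the selector if your installation labels pods differently:

# List Calico pods with the nodes they run on.
kubectl -n kube-system get pods -l k8s-app=calico-node -o wide

No pod in the output should be running on a node labeled alibabacloud.com/external=true.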
Step 5: Configure the Terway ENI ConfigMap
Edit the eni-config ConfigMap in the kube-system namespace to set the AccessKey credentials for Terway:
kubectl -n kube-system edit cm eni-config
Set access_key and access_secret in the eni_conf section:
kind: ConfigMap
apiVersion: v1
metadata:
  name: eni-config
  namespace: kube-system
data:
  eni_conf: |
    {
      "version": "1",
      "max_pool_size": 5,
      "min_pool_size": 0,
      "vswitches": {"AZoneID":["VswitchId"]},
      "eni_tags": {"ack.aliyun.com":"{{.ClusterID}}"},
      "service_cidr": "{{.ServiceCIDR}}",
      "security_group": "{{.SecurityGroupId}}",
      "access_key": "",
      "access_secret": "",
      "vswitch_selection_policy": "ordered"
    }
  10-terway.conf: |
    {
      "cniVersion": "0.3.0",
      "name": "terway",
      "type": "terway"
    }
Replace access_key and access_secret with the AccessKey ID and AccessKey Secret of the RAM user that has the permissions granted in Step 2.
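Terway pods that are already running may not pick up the edited ConfigMap automatically. If they do not, you can restart the DaemonSet from Step 4; this restart step is an addition to the original procedure:

# Recreate the Terway pods so they reread eni_conf at startup.
kubectl -n kube-system rollout restart ds terway-eniip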
Step 6: Create a custom node initialization script
When a registered ACK cluster adds an ECS node, it runs a node initialization script and passes cloud-specific environment variables to it. Your existing on-premises initialization script (init-node.sh) must be extended to handle these variables.
Environment variables passed by the registered cluster:
| Environment variable | Purpose |
|---|---|
| ALIBABA_CLOUD_PROVIDER_ID | Sets --provider-id on kubelet |
| ALIBABA_CLOUD_NODE_NAME | Sets --hostname-override on kubelet |
| ALIBABA_CLOUD_LABELS | Sets --node-labels on kubelet |
| ALIBABA_CLOUD_TAINTS | Sets --register-with-taints on kubelet |
6a. Extend the initialization script
The custom script init-node-ecs.sh starts with the same setup as init-node.sh (installing containerd, kubelet, kubeadm, and so on), then adds a section that reads the Alibaba Cloud environment variables and passes them to kubelet before joining the cluster.
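The exact join flow depends on your cluster, so the following is a minimal sketch rather than the full script. It assumes a kubeadm-based join and a kubelet systemd drop-in that reads KUBELET_EXTRA_ARGS from /etc/sysconfig/kubelet; API_SERVER, JOIN_TOKEN, and CA_CERT_HASH are hypothetical placeholders for values your existing init-node.sh already supplies.

#!/bin/bash
# init-node-ecs.sh (sketch). The containerd/kubelet/kubeadm installation
# from init-node.sh is assumed to have run before this point.
set -e

# Translate the environment variables injected by the registered cluster
# into the corresponding kubelet flags.
KUBELET_EXTRA_ARGS=""
if [ -n "$ALIBABA_CLOUD_PROVIDER_ID" ]; then
  KUBELET_EXTRA_ARGS="$KUBELET_EXTRA_ARGS --provider-id=$ALIBABA_CLOUD_PROVIDER_ID"
fi
if [ -n "$ALIBABA_CLOUD_NODE_NAME" ]; then
  KUBELET_EXTRA_ARGS="$KUBELET_EXTRA_ARGS --hostname-override=$ALIBABA_CLOUD_NODE_NAME"
fi
if [ -n "$ALIBABA_CLOUD_LABELS" ]; then
  KUBELET_EXTRA_ARGS="$KUBELET_EXTRA_ARGS --node-labels=$ALIBABA_CLOUD_LABELS"
fi
if [ -n "$ALIBABA_CLOUD_TAINTS" ]; then
  KUBELET_EXTRA_ARGS="$KUBELET_EXTRA_ARGS --register-with-taints=$ALIBABA_CLOUD_TAINTS"
fi

# Hand the flags to kubelet via its systemd drop-in, then join the cluster.
# API_SERVER, JOIN_TOKEN, and CA_CERT_HASH are placeholders for the values
# your on-premises join flow already uses.
echo "KUBELET_EXTRA_ARGS=$KUBELET_EXTRA_ARGS" > /etc/sysconfig/kubelet
kubeadm join "$API_SERVER" --token "$JOIN_TOKEN" \
  --discovery-token-ca-cert-hash "$CA_CERT_HASH"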
6b. Host the script and register it with the cluster
1. Upload init-node-ecs.sh to an HTTP file server accessible from the cloud, for example, an Object Storage Service (OSS) bucket:

   https://kubelet-****.oss-cn-hangzhou-internal.aliyuncs.com/init-node-ecs.sh

2. Set addNodeScriptPath in the ack-agent-config ConfigMap to the script URL:

   apiVersion: v1
   data:
     addNodeScriptPath: https://kubelet-****.oss-cn-hangzhou-internal.aliyuncs.com/init-node-ecs.sh
   kind: ConfigMap
   metadata:
     name: ack-agent-config
     namespace: kube-system
Step 7: Create a node pool and scale out ECS nodes
1. Log on to the Container Service Management Console. In the left navigation pane, click Clusters.

2. Click your cluster name, then choose Nodes > Node Pools in the left navigation pane.

3. On the Node Pools page, create a node pool and scale out nodes. For details, see Create and manage a node pool.
Verify the hybrid cluster
After scaling out, confirm that the new ECS nodes have joined the cluster with the correct label:
kubectl get nodes --show-labels | grep external=true
Nodes labeled alibabacloud.com/external=true are cloud-based ECS nodes. Nodes without this label are on-premises nodes. If both appear in the output, the hybrid cluster is operational.
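As an optional end-to-end check (an addition to the original steps), you can schedule a test pod onto a cloud node and ping a pod running on-premises. The pod name and busybox image are arbitrary choices; substitute a real pod IP from the on-premises container network for the placeholder:

# Run a throwaway pod pinned to a cloud ECS node.
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: cloud-net-test
spec:
  nodeSelector:
    alibabacloud.com/external: "true"
  containers:
  - name: test
    image: busybox
    command: ["sleep", "3600"]
EOF

# Ping a pod IP from the on-premises container network (10.100.0.0/16).
kubectl exec cloud-net-test -- ping -c 3 <on-prem-pod-IP>

A successful ping confirms that traffic is routed between the Terway and Calico pod networks across CEN.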
What's next
- Plan container network CIDR blocks for the Terway scenario: Network planning for ACK managed clusters
- Connect your data center network to a VPC: Features
- Create the registered cluster that anchors this setup: Create an ACK One registered cluster