When you migrate your business to Container Service for Kubernetes (ACK) clusters, we recommend that you deploy the clusters by using the default OS images and relevant OS services provided by ACK.
Background information
When you migrate your business to ACK clusters, we recommend that you deploy the clusters by using the default OS images (CentOS 7.6 or Alibaba Cloud Linux (Alinux) 2.1903) and relevant OS services provided by ACK. The OS services include the OS kernel, DNS service, and Yellowdog Updater, Modified (yum) repositories. ACK also provides the open source tool ack-image-builder for you to create custom images. You can deploy ACK clusters by using custom images to meet special business requirements.
Use ack-image-builder to create a custom image
In this topic, a root user account is used to create and configure custom images.
The ack-image-builder tool is built on the open source tool HashiCorp Packer. It provides a default template and a verification script for you to create custom images.
By using ack-image-builder, you can reduce errors caused by manual operations. The ack-image-builder tool also records image changes to facilitate troubleshooting. To use the ack-image-builder tool to create a custom image for an ACK cluster, perform the following steps:
Install Packer.
Download Packer from its official website. Make sure that the version is compatible with your operating system. Then, install and verify Packer based on its installation documentation.
Run the packer version command. The following command output indicates that Packer is installed.
packer version
Packer v1.4.1
Create a template in Packer.
When you create a custom image by using Packer, you must create a template file in JSON format. In the template file, specify the image builder provided by Alibaba Cloud and the provisioner that is used to configure the custom image.
{
  "variables": {
    "region": "cn-hangzhou",
    "image_name": "test_image{{timestamp}}",
    "source_image": "centos_7_06_64_20G_alibase_20190711.vhd",
    "instance_type": "ecs.n1.large",
    "access_key": "{{env `ALICLOUD_ACCESS_KEY`}}",
    "secret_key": "{{env `ALICLOUD_SECRET_KEY`}}"
  },
  "builders": [
    {
      "type": "alicloud-ecs",
      "access_key": "{{user `access_key`}}",
      "secret_key": "{{user `secret_key`}}",
      "region": "{{user `region`}}",
      "image_name": "{{user `image_name`}}",
      "source_image": "{{user `source_image`}}",
      "ssh_username": "root",
      "instance_type": "{{user `instance_type`}}",
      "io_optimized": "true"
    }
  ],
  "provisioners": [
    {
      "type": "shell",
      "scripts": [
        "config/default.sh",
        "scripts/updateDNS.sh",
        "scripts/reboot.sh",
        "scripts/verify.sh"
      ],
      "expect_disconnect": true
    }
  ]
}
Parameter: Description
access_key: The AccessKey ID of your Alibaba Cloud account.
secret_key: The AccessKey secret of your Alibaba Cloud account.
region: The region of the cloud resources that are temporarily used to create the custom image.
image_name: The name of the custom image.
source_image: The name of the base image used to create the custom image. You can obtain the name of a base image from the public image list of Alibaba Cloud.
instance_type: The type of the cloud resources that are temporarily used to create the custom image.
provisioners: The provisioner that is used to configure the custom image.
Create a Resource Access Management (RAM) user and create an AccessKey pair for the RAM user.
We recommend that you create a RAM user and attach a RAM policy that provides Packer-related permissions to the RAM user. You also need to create an AccessKey pair for the RAM user.
Add the AccessKey pair information to the template and create a custom image.
Run the following commands to add the AccessKey pair information:
export ALICLOUD_ACCESS_KEY=XXXXXX
export ALICLOUD_SECRET_KEY=XXXXXX
Run the following command to create a custom image:
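Because the template reads both values from the environment, a build that starts with either variable unset fails only after Packer contacts the API. The following is a minimal sketch of a pre-check, assuming you wrap the build in your own script. The function name require_env is hypothetical and is not part of ack-image-builder or Packer.

```shell
#!/bin/bash
# Hypothetical helper: returns non-zero if any named environment
# variable is unset or empty, and reports each missing name on stderr.
require_env() {
  local name value missing=0
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, require_env ALICLOUD_ACCESS_KEY ALICLOUD_SECRET_KEY && packer build alicloud.json starts the build only when both variables are set.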
packer build alicloud.json
alicloud-ecs output will be in this color.
==> alicloud-ecs: Prevalidating source region and copied regions...
==> alicloud-ecs: Prevalidating image name...
    alicloud-ecs: Found image ID: centos_7_06_64_20G_alibase_20190711.vhd
==> alicloud-ecs: Creating temporary keypair: xxxxxx
==> alicloud-ecs: Creating vpc...
    alicloud-ecs: Created vpc: xxxxxx
==> alicloud-ecs: Creating vswitch...
    alicloud-ecs: Created vswitch: xxxxxx
==> alicloud-ecs: Creating security group...
    alicloud-ecs: Created security group: xxxxxx
==> alicloud-ecs: Creating instance...
    alicloud-ecs: Created instance: xxxxxx
==> alicloud-ecs: Allocating eip...
    alicloud-ecs: Allocated eip: xxxxxx
    alicloud-ecs: Attach keypair xxxxxx to instance: xxxxxx
==> alicloud-ecs: Starting instance: xxxxxx
==> alicloud-ecs: Using ssh communicator to connect: 47.111.127.54
==> alicloud-ecs: Waiting for SSH to become available...
==> alicloud-ecs: Connected to SSH!
==> alicloud-ecs: Provisioning with shell script: scripts/verify.sh
    alicloud-ecs: [20190726 11:04:10]: Check if kernel version >= 3.10. Verify Passed!
    alicloud-ecs: [20190726 11:04:10]: Check if systemd version >= 219. Verify Passed!
    alicloud-ecs: [20190726 11:04:10]: Check if sshd is running and listen on port 22. Verify Passed!
    alicloud-ecs: [20190726 11:04:10]: Check if cloud-init is installed. Verify Passed!
    alicloud-ecs: [20190726 11:04:10]: Check if wget is installed. Verify Passed!
    alicloud-ecs: [20190726 11:04:10]: Check if curl is installed. Verify Passed!
    alicloud-ecs: [20190726 11:04:10]: Check if kubeadm is cleaned up. Verify Passed!
    alicloud-ecs: [20190726 11:04:10]: Check if kubelet is cleaned up. Verify Passed!
    alicloud-ecs: [20190726 11:04:10]: Check if kubectl is cleaned up. Verify Passed!
    alicloud-ecs: [20190726 11:04:10]: Check if kubernetes-cni is cleaned up. Verify Passed!
==> alicloud-ecs: Stopping instance: xxxxxx
==> alicloud-ecs: Waiting instance stopped: xxxxxx
==> alicloud-ecs: Creating image: test_image1564110199
    alicloud-ecs: Detach keypair xxxxxx from instance: xxxxxxx
==> alicloud-ecs: Cleaning up 'EIP'
==> alicloud-ecs: Cleaning up 'instance'
==> alicloud-ecs: Cleaning up 'security group'
==> alicloud-ecs: Cleaning up 'vSwitch'
==> alicloud-ecs: Cleaning up 'VPC'
==> alicloud-ecs: Deleting temporary keypair...
Build 'alicloud-ecs' finished.
==> Builds finished. The artifacts of successful builds are:
--> alicloud-ecs: Alicloud images were created:
cn-hangzhou: m-bp1aifbnupnaktj00q7s
The scripts/verify.sh script is used to verify the custom image.
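The build output ends with the region and the ID of the new image, which you need later when you add nodes. The following is a minimal sketch that extracts the ID from a saved build log. It assumes the output format shown in the sample above (a region, a colon, and an ID that starts with m-). The function name extract_image_id and the log file name are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: reads Packer build output on stdin and prints
# the ID of the last "<region>: m-<id>" pair that it finds.
extract_image_id() {
  grep -oE '[a-z]+-[a-z0-9-]+: m-[a-z0-9]+' | tail -n 1 | awk '{print $2}'
}
```

For example, save the output with packer build alicloud.json | tee build.log, and then run extract_image_id < build.log.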
Use a custom operating system kernel
ACK requires a Linux operating system whose kernel version is 3.10 or later. We recommend that you update only the RPM packages that you want to customize. After you install a custom kernel, you must also set the boot parameters so that the system boots from that kernel.
You can use the following code:
cat scripts/updateOSKernel.sh
#!/bin/bash
# Kernel version to install. Replace xxx.xxx.xxx.xxx with the address of your RPM source.
VERSION_KERNEL="3.10.0-1062.4.3.el7"
yum localinstall -y http://xxx.xxx.xxx.xxx/kernel-${VERSION_KERNEL}.x86_64.rpm http://xxx.xxx.xxx.xxx/kernel-devel-${VERSION_KERNEL}.x86_64.rpm http://xxx.xxx.xxx.xxx/kernel-headers-${VERSION_KERNEL}.x86_64.rpm
# Find the index of the GRUB menu entry for the new kernel and make it the default boot entry.
grub_num=$(awk -F\' '$1=="menuentry " {print i++ " : " $2}' /etc/grub2.cfg | grep "$VERSION_KERNEL" | awk -F ':' '{print $1}')
grub2-set-default $grub_num
We recommend that you do not run commands that update all RPM packages, such as the yum update -y command.
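The V3.10 minimum can be checked against a version string before you commit to a kernel package. The following is a minimal sketch of such a comparison, similar in intent to the kernel check that appears in the verify.sh output. The function name kernel_meets_minimum is hypothetical, and the logic is an illustration rather than the check that ACK itself runs.

```shell
#!/bin/bash
# Hypothetical helper: succeeds when the given kernel version string
# (for example, the output of uname -r) is 3.10 or later.
kernel_meets_minimum() {
  local version="$1" major minor
  # A usable version string contains at least "major.minor".
  case "$version" in *.*) ;; *) return 1 ;; esac
  major="${version%%.*}"        # text before the first dot
  minor="${version#*.}"         # text after the first dot
  minor="${minor%%[!0-9]*}"     # keep only the leading digits
  case "$major" in ''|*[!0-9]*) return 1 ;; esac
  [ -n "$minor" ] || return 1
  [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 10 ]; }
}
```

For example, kernel_meets_minimum "$(uname -r)" || echo "kernel too old" reports a kernel that is older than V3.10.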
Customize the operating system kernel
When you customize kernel parameters, do not overwrite the following parameters:
["vm.max_map_count"]="262144"
["kernel.softlockup_panic"]="1"
["kernel.softlockup_all_cpu_backtrace"]="1"
["net.core.somaxconn"]="32768"
["net.core.rmem_max"]="16777216"
["net.core.wmem_max"]="16777216"
["net.ipv4.tcp_wmem"]="4096 12582912 16777216"
["net.ipv4.tcp_rmem"]="4096 12582912 16777216"
["net.ipv4.tcp_max_syn_backlog"]="8096"
["net.ipv4.tcp_slow_start_after_idle"]="0"
["net.core.netdev_max_backlog"]="16384"
["fs.file-max"]="2097152"
["fs.inotify.max_user_instances"]="8192"
["fs.inotify.max_user_watches"]="524288"
["fs.inotify.max_queued_events"]="16384"
["net.ipv4.ip_forward"]="1"
["net.bridge.bridge-nf-call-iptables"]="1"
["fs.may_detach_mounts"]="1"
["net.ipv4.conf.default.rp_filter"]="0"
["net.ipv4.tcp_tw_reuse"]="0"
["net.ipv4.tcp_tw_recycle"]="0"
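If your custom image ships its own sysctl configuration file, you can scan the file for the parameters listed above before you build the image. The following is a minimal sketch, assuming the file uses the usual key = value syntax with # or ; comments. The function name find_protected_overrides and the file path in the usage note are hypothetical.

```shell
#!/bin/bash
# Parameters that this topic says must not be overwritten.
PROTECTED_KEYS="vm.max_map_count
kernel.softlockup_panic
kernel.softlockup_all_cpu_backtrace
net.core.somaxconn
net.core.rmem_max
net.core.wmem_max
net.ipv4.tcp_wmem
net.ipv4.tcp_rmem
net.ipv4.tcp_max_syn_backlog
net.ipv4.tcp_slow_start_after_idle
net.core.netdev_max_backlog
fs.file-max
fs.inotify.max_user_instances
fs.inotify.max_user_watches
fs.inotify.max_queued_events
net.ipv4.ip_forward
net.bridge.bridge-nf-call-iptables
fs.may_detach_mounts
net.ipv4.conf.default.rp_filter
net.ipv4.tcp_tw_reuse
net.ipv4.tcp_tw_recycle"

# Hypothetical helper: prints every protected key that the given
# sysctl-style file assigns. Prints nothing if the file is safe.
find_protected_overrides() {
  # Extract the key from each non-comment "key = value" line.
  awk -F= '!/^[ \t]*[#;]/ && NF > 1 { gsub(/[ \t]/, "", $1); print $1 }' "$1" |
  while IFS= read -r key; do
    # Exact, fixed-string match against the protected list.
    if printf '%s\n' "$PROTECTED_KEYS" | grep -Fxq -- "$key"; then
      printf '%s\n' "$key"
    fi
  done
}
```

For example, find_protected_overrides /etc/sysctl.d/99-custom.conf lists the protected parameters that the file would change.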
If you need to modify some of the preceding parameters, submit a ticket to the ACK technical team to analyze the effects. After you are authorized to modify the preceding parameters, you can go to the cluster creation page or cluster scale-out page in the ACK console and enter the script.
Use a custom DNS service
If you use a custom DNS service, pay attention to the following notes:
Add Alibaba Cloud name servers to the upstream name servers of the custom DNS service.
cat /etc/resolv.conf
options timeout:2 attempts:3 rotate single-request-reopen
; generated by /usr/sbin/dhclient-script
nameserver 100.XX.XX.136
nameserver 100.XX.XX.138
Lock the /etc/resolv.conf file after you modify it. Otherwise, cloud-init restores the file to default settings after ECS instances are restarted. Example:
cat scripts/updateDNS.sh
#!/bin/bash
# unlock DNS file in case it was locked
chattr -i /etc/resolv.conf
# Using your custom nameserver to replace xxx.xxx.xxx.xxx
echo -e "nameserver xxx.xxx.xxx.xxx\nnameserver xxx.xxx.xxx.xxx" > /etc/resolv.conf
# Keep resolv locked to prevent overwriting by cloudinit/NetworkManager
chattr +i /etc/resolv.conf
Ensure adequate performance of the custom DNS service.
Make sure that the performance of the custom DNS service can meet the requirements if your cluster contains a large number of nodes.
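Because the resolver file is locked after it is written, a typo in a nameserver address stays in the image until you rebuild it. The following is a minimal sketch that validates the addresses before they are written. The function names is_ipv4 and render_resolv_conf are hypothetical, and the addresses in the usage note come from the 192.0.2.0/24 documentation range, not from a real DNS service.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if the argument is a dotted-quad IPv4
# address whose four octets are each in the range 0-255.
is_ipv4() {
  local ip="$1" octet rest
  printf '%s\n' "$ip" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$' || return 1
  rest="$ip."
  while [ -n "$rest" ]; do
    octet="${rest%%.*}"   # next octet
    rest="${rest#*.}"     # remainder after the dot
    [ "$octet" -le 255 ] || return 1
  done
  return 0
}

# Hypothetical helper: prints one "nameserver" line per argument, or
# fails without printing anything if any address is invalid.
render_resolv_conf() {
  local ns
  for ns in "$@"; do
    if ! is_ipv4 "$ns"; then
      echo "invalid nameserver: $ns" >&2
      return 1
    fi
  done
  for ns in "$@"; do
    printf 'nameserver %s\n' "$ns"
  done
}
```

For example, a script could run render_resolv_conf 192.0.2.10 192.0.2.11 > /etc/resolv.conf between the chattr -i and chattr +i calls.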
Use a custom YUM repository
If you use a custom YUM repository, pay attention to the following notes:
Do not update all RPM packages.
Update only the RPM packages to be installed. Do not run the yum update -y command to update all RPM packages.
Ensure adequate performance of the YUM repository.
If you want to add a large number of worker nodes to the cluster at a time and update RPM packages based on the YUM repository, make sure that the performance of the YUM repository can meet your business requirements. You can use the following code:
cat scripts/add-yum-repo.sh
#!/bin/bash
cat << EOF > /etc/yum.repos.d/my.repo
[base]
name=CentOS-\$releasever
enabled=1
failovermethod=priority
baseurl=http://mirrors.cloud.aliyuncs.com/centos/\$releasever/os/\$basearch/
gpgcheck=1
gpgkey=http://mirrors.cloud.aliyuncs.com/centos/RPM-GPG-KEY-CentOS-7
EOF
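One way to follow the advice against updating all RPM packages is to route updates in your provisioning scripts through a wrapper that refuses to run without explicit package names. The following is a minimal sketch. The function name update_packages and the DRY_RUN variable are hypothetical and are not part of yum or ack-image-builder.

```shell
#!/bin/bash
# Hypothetical helper: updates only the named packages and refuses to
# run a bare "yum update", which would update every installed package.
update_packages() {
  if [ "$#" -eq 0 ]; then
    echo "refusing to run 'yum update' without package names" >&2
    return 1
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command instead of running it.
    echo "yum update -y $*"
  else
    yum update -y "$@"
  fi
}
```

For example, DRY_RUN=1 update_packages kernel kernel-devel prints the command that would run without changing the system.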
Preload the container images of DaemonSet components
If you want to add more than 1,000 worker nodes to the cluster at a time, we recommend that you preload the container images of DaemonSet components before you create the custom image. This reduces the workload of pulling these container images when nodes start and improves the efficiency of cluster scale-outs.
Save the container images of the relevant system components as TAR files and store the files in the custom image.
Assume that the ACK cluster uses the Terway network plug-in and the Container Storage Interface (CSI) volume plug-in, and resides in the China (Hangzhou) region. You can use the following script to preload the container images of the preceding plug-ins:
cat scripts/prepare-images.sh
#!/bin/bash
set -x -e
EXPORT_PATH=/preheated
# Install Docker.
yum install -y docker
systemctl start docker
# Pull and save images.
images=(
registry-vpc.cn-hangzhou.aliyuncs.com/acs/csi-plugin:v1.14.5.60-5318afe-aliyun
registry-vpc.cn-hangzhou.aliyuncs.com/acs/terway:v1.0.10.78-g97729ee-aliyun
)
mkdir -p ${EXPORT_PATH}
for image in "${images[@]}"; do
    echo "preheating ${image}"
    docker pull ${image}
    docker save -o ${EXPORT_PATH}/$(echo ${image}| md5sum | cut -f1 -d" ").tar ${image}
done
# Uninstall Docker.
yum erase -y docker
rm -rf /var/lib/docker
Log on to the ACK console. Go to the cluster creation page, click Show Advanced Options, and then enter the following script in the field:
ls /preheated/ | xargs -n 1 -i docker load -i /preheated/{}
rm -rf /preheated
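The two commands above delete /preheated even if a docker load call fails. The following is a minimal sketch of a variant that deletes the directory only after every TAR file loads successfully. The function name load_preheated_images and the DOCKER override variable (used here only to substitute a different command for testing) are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: loads every *.tar file in the given directory
# with "docker load" and removes the directory only if all loads succeed.
load_preheated_images() {
  local dir="$1" tarball
  # Nothing to do if the image was built without preloaded images.
  [ -d "$dir" ] || return 0
  for tarball in "$dir"/*.tar; do
    # Skip the literal pattern when the glob matches no files.
    [ -e "$tarball" ] || continue
    ${DOCKER:-docker} load -i "$tarball" || return 1
  done
  rm -rf "$dir"
}
```

For example, load_preheated_images /preheated performs the same work as the two commands above but keeps the TAR files if a load fails.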
Administrator permissions are required for migrating the OS.
Edit the configuration file of the custom image
Add the following provisioners configuration to the alicloud.json file that is used to create the custom image:
"provisioners": [
  {
    "type": "shell",
    "scripts": [
      "config/default.sh",
      "scripts/updateOSKernel.sh",
      "scripts/updateDNS.sh",
      "scripts/add-yum-repo.sh",
      "scripts/prepare-images.sh",
      "scripts/reboot.sh",
      "scripts/verify.sh"
    ],
    "expect_disconnect": true
  }
]
The config/default.sh, scripts/reboot.sh, and scripts/verify.sh scripts are default scripts that you must run. Others are custom scripts.
The config/default.sh script sets the time zone and disables swap partitions.
The scripts/verify.sh script checks whether the custom image meets the requirements of the desired ACK cluster.
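A script that is listed in the template but missing on disk is a common reason for a failed build. The following is a minimal sketch that lists such scripts before you run the build. It assumes that script paths in the template are relative to the directory that contains alicloud.json and that every quoted string ending in .sh is a script path. The function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: prints every quoted path ending in ".sh"
# that appears in the given template file.
list_template_scripts() {
  grep -oE '"[^"]+\.sh"' "$1" | tr -d '"'
}

# Hypothetical helper: prints the template scripts that do not exist
# relative to the template's directory. Prints nothing if all exist.
find_missing_scripts() {
  local template="$1" base script
  base=$(dirname "$template")
  list_template_scripts "$template" | while IFS= read -r script; do
    [ -f "$base/$script" ] || printf '%s\n' "$script"
  done
}
```

For example, find_missing_scripts alicloud.json prints nothing when all referenced scripts are present.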
After you edit the configuration file of the custom image, you can create the custom image and use it to create or scale out an ACK cluster.
Create an ACK cluster
We recommend that you first create an ACK dedicated cluster that contains no worker nodes or an ACK managed cluster that contains two worker nodes, add worker nodes that use a custom image to the cluster, and verify the result. This saves time and decreases the probability of errors.
Use the default system image to create an ACK dedicated cluster that contains three or five master nodes and no worker nodes. For more information, see Create an ACK dedicated cluster.
Note: If you create an ACK managed cluster, select at least two worker nodes. For more information, see Create an ACK managed cluster.
Add worker nodes that use the custom image to the cluster. For more information, see Increase the number of nodes in an ACK cluster.
If you want to customize the initialization script of the nodes that you add, you can customize the user data of the relevant ECS instances.
Note: To use custom images and configure the user data of ECS instances, submit a ticket.