When you migrate a self-built Kubernetes cluster to Container Service for Kubernetes, we recommend that you use the default system image and services to create the Container Service for Kubernetes cluster. You can also use a custom image to create the Container Service for Kubernetes cluster based on your business requirements. This topic describes how to use a custom image to create a Container Service for Kubernetes cluster.

Background information

When you migrate a self-built Kubernetes cluster to Container Service for Kubernetes, we recommend that you use the default system image for CentOS 7.6 or Aliyun Linux 2.1903 and default system services, including the operating system kernel, domain name system (DNS) service, and Yum repository. Container Service for Kubernetes also supports custom images to meet special business requirements. We provide an open-source tool ack-image-builder to help you quickly create custom images that meet the requirements of Container Service for Kubernetes.

Create a Container Service for Kubernetes cluster

We recommend that you first create a dedicated Kubernetes cluster with no worker nodes or a managed Kubernetes cluster with two worker nodes, and then add worker nodes that use a custom image to the cluster. This saves time and reduces the chance of errors.

  1. Create a dedicated Kubernetes cluster with three or five master nodes and no worker node by using a default system image. For more information, see Create a Kubernetes cluster.
    Note If you create a managed Kubernetes cluster, select at least two worker nodes. For more information, see Create a managed Kubernetes cluster.
  2. Add worker nodes that use a custom image to the cluster. For more information, see Expand clusters.
    If you want to run an initialization script after you add the worker nodes to the cluster, you can configure the user data for Elastic Compute Service (ECS) instances.
    Note To use custom images and configure the user data, submit a ticket.

The following sections describe how to use ack-image-builder to create a custom image.

Use ack-image-builder

The ack-image-builder tool is developed based on the open-source tool HashiCorp Packer. The ack-image-builder tool provides the default configuration template and verification script for you to create custom images.

The ack-image-builder tool reduces the errors that manual operations tend to introduce. It also records the changes made to images, which makes troubleshooting easier. To use the ack-image-builder tool to create a custom image for a Container Service for Kubernetes cluster, follow these steps:

  1. Install Packer.
    Download Packer from its official website. Make sure that the version is compatible with your operating system. Then, install and verify Packer by following its installation documentation.

    Run the packer version command. If the version number of Packer is returned, Packer is installed successfully.

    $ packer version
    Packer v1.4.1
  2. Create a Packer template.
    To use Packer to create a custom image, you must create a template in JSON format. In the template, you must specify the image builder and provisioner that are used to create the custom image. In this example, Alicloud Image Builder and the shell provisioner are used.
    {
      "variables": {
        "region": "cn-hangzhou",
        "image_name": "test_image{{timestamp}}",
        "source_image": "centos_7_06_64_20G_alibase_20190711.vhd",
        "instance_type": "ecs.n1.large",
        "access_key": "{{env `ALICLOUD_ACCESS_KEY`}}",
        "secret_key": "{{env `ALICLOUD_SECRET_KEY`}}"
      },
      "builders": [
        {
          "type": "alicloud-ecs",
          "access_key": "{{user `access_key`}}",
          "secret_key": "{{user `secret_key`}}",
          "region": "{{user `region`}}",
          "image_name": "{{user `image_name`}}",
          "source_image": "{{user `source_image`}}",
          "ssh_username": "root",
          "instance_type": "{{user `instance_type`}}",
          "io_optimized": "true"
        }
      ],
      "provisioners": [
        {
          "type": "shell",
          "scripts": [
            "config/default.sh",
            "scripts/updateDNS.sh",
            "scripts/reboot.sh",
            "scripts/verify.sh"
          ],
          "expect_disconnect": true
        }
      ]
    }
    Parameter      Description
    access_key     Your AccessKey ID.
    secret_key     Your AccessKey secret.
    region         The region of the intermediate instance that is used to create the custom image.
    image_name     The name of the custom image.
    source_image   The name of the source image used to create the custom image. Set this parameter to the name of a public image provided by Alibaba Cloud.
    instance_type  The type of the intermediate instance used to create the custom image.
    provisioners   The provisioner used to create the custom image.
  3. Create a Resource Access Management (RAM) user and generate an AccessKey.
    Creating custom images requires high permissions. We recommend that you create a RAM user and use a RAM policy to grant the permissions required by Packer to the RAM user. Then, generate an AccessKey for the RAM user. For more information, see Create an AccessKey.
  4. Add the AccessKey information to the template and create a custom image.
    1. Run the following commands to add the AccessKey information:
      export ALICLOUD_ACCESS_KEY=XXXXXX
      export ALICLOUD_SECRET_KEY=XXXXXX
    2. Run the following command to create a custom image:
      $ packer build alicloud.json
      alicloud-ecs output will be in this color.
      
      ==> alicloud-ecs: Prevalidating source region and copied regions...
      ==> alicloud-ecs: Prevalidating image name...
          alicloud-ecs: Found image ID: centos_7_06_64_20G_alibase_20190711.vhd
      ==> alicloud-ecs: Creating temporary keypair: xxxxxx
      ==> alicloud-ecs: Creating vpc...
          alicloud-ecs: Created vpc: xxxxxx
      ==> alicloud-ecs: Creating vswitch...
          alicloud-ecs: Created vswitch: xxxxxx
      ==> alicloud-ecs: Creating security group...
          alicloud-ecs: Created security group: xxxxxx
      ==> alicloud-ecs: Creating instance...
          alicloud-ecs: Created instance: xxxxxx
      ==> alicloud-ecs: Allocating eip...
          alicloud-ecs: Allocated eip: xxxxxx
          alicloud-ecs: Attach keypair xxxxxx to instance: xxxxxx
      ==> alicloud-ecs: Starting instance: xxxxxx
      ==> alicloud-ecs: Using ssh communicator to connect: 47.111.127.54
      ==> alicloud-ecs: Waiting for SSH to become available...
      ==> alicloud-ecs: Connected to SSH!
      ==> alicloud-ecs: Provisioning with shell script: scripts/verify.sh
          alicloud-ecs: [20190726 11:04:10]: Check if kernel version >= 3.10.  Verify Passed!
          alicloud-ecs: [20190726 11:04:10]: Check if systemd version >= 219.  Verify Passed!
          alicloud-ecs: [20190726 11:04:10]: Check if sshd is running and listen on port 22.  Verify Passed!
          alicloud-ecs: [20190726 11:04:10]: Check if cloud-init is installed.  Verify Passed!
          alicloud-ecs: [20190726 11:04:10]: Check if wget is installed.  Verify Passed!
          alicloud-ecs: [20190726 11:04:10]: Check if curl is installed.  Verify Passed!
          alicloud-ecs: [20190726 11:04:10]: Check if kubeadm is cleaned up.  Verify Passed!
          alicloud-ecs: [20190726 11:04:10]: Check if kubelet is cleaned up.  Verify Passed!
          alicloud-ecs: [20190726 11:04:10]: Check if kubectl is cleaned up.  Verify Passed!
          alicloud-ecs: [20190726 11:04:10]: Check if kubernetes-cni is cleaned up.  Verify Passed!
      ==> alicloud-ecs: Stopping instance: xxxxxx
      ==> alicloud-ecs: Waiting instance stopped: xxxxxx
      ==> alicloud-ecs: Creating image: test_image1564110199
          alicloud-ecs: Detach keypair xxxxxx from instance: xxxxxxx
      ==> alicloud-ecs: Cleaning up 'EIP'
      ==> alicloud-ecs: Cleaning up 'instance'
      ==> alicloud-ecs: Cleaning up 'security group'
      ==> alicloud-ecs: Cleaning up 'vSwitch'
      ==> alicloud-ecs: Cleaning up 'VPC'
      ==> alicloud-ecs: Deleting temporary keypair...
      Build 'alicloud-ecs' finished.
      
      ==> Builds finished. The artifacts of successful builds are:
      --> alicloud-ecs: Alicloud images were created:
      
      cn-hangzhou: m-bp1aifbnupnaktj00q7s
      The scripts/verify.sh script is used to verify the custom image.
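
Before you run packer build, it can help to confirm that the two environment variables read by the template are set. The following is a minimal sketch under that assumption. It is not part of ack-image-builder, and the function name check_packer_env is hypothetical.

```shell
#!/bin/bash
# Hypothetical pre-flight helper (not part of ack-image-builder).
# Returns 0 only when both variables read by the template are non-empty.
check_packer_env() {
    missing=0
    for name in ALICLOUD_ACCESS_KEY ALICLOUD_SECRET_KEY; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "environment variable $name is not set" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

A possible sequence is check_packer_env && packer validate alicloud.json && packer build alicloud.json, where packer validate checks the template before the build starts.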

Use a custom operating system kernel

Container Service for Kubernetes requires a Linux operating system with kernel version 3.10 or later. We recommend that you update only the RPM packages that you want to customize. In addition, you must set the kernel boot parameters so that the new kernel is booted by default.

The sample code is as follows:
$ cat scripts/updateOSKernel.sh
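#! /bin/bash
# Kernel version to install. It must be 3.10 or later.
VERSION_KERNEL="3.10.0-1062.4.3.el7"
# Install only the kernel packages instead of updating all RPM packages.
yum localinstall -y http://xxx.xxx.xxx.xxx/kernel-${VERSION_KERNEL}.x86_64.rpm http://xxx.xxx.xxx.xxx/kernel-devel-${VERSION_KERNEL}.x86_64.rpm http://xxx.xxx.xxx.xxx/kernel-headers-${VERSION_KERNEL}.x86_64.rpm
# Find the index of the GRUB menu entry for the new kernel and boot it by default.
grub_num=$(awk -F\' '$1=="menuentry " {print i++ " : " $2}' /etc/grub2.cfg | grep -F "$VERSION_KERNEL" | awk -F ':' '{print $1}')
grub2-set-default $grub_num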
Note We do not recommend that you update all RPM packages by running the yum update -y command.
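
The build output shown earlier includes a check that the kernel version is 3.10 or later. A comparable check can be sketched as follows. This is an illustration only, not the actual implementation of scripts/verify.sh, and the function name kernel_at_least is hypothetical. It compares only the major and minor numbers.

```shell
#!/bin/bash
# Hypothetical helper: succeeds when kernel release $2 is at least major.minor $1.
kernel_at_least() {
    required="$1"
    actual="$2"

    # Split "3.10" into major "3" and minor "10".
    req_major=${required%%.*}
    req_rest=${required#*.}
    req_minor=${req_rest%%[!0-9]*}

    # Split "3.10.0-1062.4.3.el7" the same way; everything after the minor number is dropped.
    act_major=${actual%%.*}
    act_rest=${actual#*.}
    act_minor=${act_rest%%[!0-9]*}

    if [ "$act_major" -gt "$req_major" ]; then return 0; fi
    if [ "$act_major" -lt "$req_major" ]; then return 1; fi
    [ "$act_minor" -ge "$req_minor" ]
}
```

For example, kernel_at_least 3.10 "$(uname -r)" could be run on the intermediate instance after the reboot.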

Customize kernel parameters

When you customize kernel parameters, do not overwrite the following parameters:
["vm.max_map_count"]="262144"
["kernel.softlockup_panic"]="1"
["kernel.softlockup_all_cpu_backtrace"]="1"
["net.core.somaxconn"]="32768"
["net.core.rmem_max"]="16777216"
["net.core.wmem_max"]="16777216"
["net.ipv4.tcp_wmem"]="4096 12582912 16777216"
["net.ipv4.tcp_rmem"]="4096 12582912 16777216"
["net.ipv4.tcp_max_syn_backlog"]="8096"
["net.ipv4.tcp_slow_start_after_idle"]="0"
["net.core.netdev_max_backlog"]="16384"
["fs.file-max"]="2097152"
["fs.inotify.max_user_instances"]="8192"
["fs.inotify.max_user_watches"]="524288"
["fs.inotify.max_queued_events"]="16384"
["net.ipv4.ip_forward"]="1"
["net.bridge.bridge-nf-call-iptables"]="1"
["fs.may_detach_mounts"]="1"
["net.ipv4.conf.default.rp_filter"]="0"
["net.ipv4.tcp_tw_reuse"]="0"
["net.ipv4.tcp_tw_recycle"]="0"
Note If you must modify any of the preceding parameters, submit a ticket so that Alibaba Cloud engineers can analyze the impact. After you are authorized to modify the parameters, go to the page for creating or scaling out a cluster, click Show Advanced Options, and then enter your script in the User Data field.
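
To catch accidental overrides before you build the image, you can scan your own sysctl settings file for the parameters listed above. The following is a minimal sketch. The function name find_protected_overrides is hypothetical, and the sketch assumes the file uses the dotted key = value form.

```shell
#!/bin/bash
# The parameters listed above that must not be overwritten.
PROTECTED_KEYS="vm.max_map_count kernel.softlockup_panic \
kernel.softlockup_all_cpu_backtrace net.core.somaxconn net.core.rmem_max \
net.core.wmem_max net.ipv4.tcp_wmem net.ipv4.tcp_rmem \
net.ipv4.tcp_max_syn_backlog net.ipv4.tcp_slow_start_after_idle \
net.core.netdev_max_backlog fs.file-max fs.inotify.max_user_instances \
fs.inotify.max_user_watches fs.inotify.max_queued_events net.ipv4.ip_forward \
net.bridge.bridge-nf-call-iptables fs.may_detach_mounts \
net.ipv4.conf.default.rp_filter net.ipv4.tcp_tw_reuse net.ipv4.tcp_tw_recycle"

# Hypothetical helper: print every protected key that the given file sets.
find_protected_overrides() {
    conf_file="$1"
    while IFS='=' read -r key _value || [ -n "$key" ]; do
        # Remove blanks around the key.
        key=$(printf '%s' "$key" | tr -d ' \t')
        # Skip empty lines and comments.
        case "$key" in
            ''|'#'*|';'*) continue ;;
        esac
        # Report the key if it appears in the protected list.
        case " $PROTECTED_KEYS " in
            *" $key "*) echo "$key" ;;
        esac
    done < "$conf_file"
}
```

If the function prints nothing, the file does not touch any of the protected parameters.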

Use a custom DNS service

If you use a custom DNS service, note the following:
  • Add Alibaba Cloud name servers to the upstream name servers.
    cat /etc/resolv.conf
    options timeout:2 attempts:3 rotate single-request-reopen
    ; generated by /usr/sbin/dhclient-script
    nameserver 100.XX.XX.136
    nameserver 100.XX.XX.138
  • Lock the /etc/resolv.conf file after you modify it. Otherwise, cloud-init restores the file to default settings after ECS instances restart. The sample code is as follows:
    $ cat scripts/updateDNS.sh
    #! /bin/bash
    # unlock DNS file in case it was locked
    chattr -i /etc/resolv.conf
    
    # Replace xxx.xxx.xxx.xxx with the addresses of your custom name servers
    echo -e "nameserver xxx.xxx.xxx.xxx\nnameserver xxx.xxx.xxx.xxx" > /etc/resolv.conf
    
    # Keep resolv.conf locked to prevent cloud-init or NetworkManager from overwriting it
    chattr +i /etc/resolv.conf
  • Ensure adequate performance of the custom DNS service.

    If your cluster contains a large number of nodes, make sure that the custom DNS service can handle the resulting query load.
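
Because the file is locked after it is written, a malformed address stays in place until someone unlocks the file. One way to reduce that risk is to validate the addresses before writing them. The following is a minimal sketch. The function names are hypothetical, and the check covers only the dotted-quad shape of an IPv4 address.

```shell
#!/bin/bash
# Hypothetical helper: succeeds when $1 has the dotted-quad shape of an IPv4 address.
# It checks the shape only, not that each octet is 255 or less.
looks_like_ipv4() {
    printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'
}

# Print resolv.conf content for the given name servers, or fail without
# printing anything if one of them is malformed.
render_resolv_conf() {
    [ "$#" -gt 0 ] || { echo "no name servers given" >&2; return 1; }
    for ns in "$@"; do
        if ! looks_like_ipv4 "$ns"; then
            echo "invalid name server: $ns" >&2
            return 1
        fi
    done
    for ns in "$@"; do
        printf 'nameserver %s\n' "$ns"
    done
}
```

In scripts/updateDNS.sh, the echo line could then be replaced by capturing the output of render_resolv_conf and writing it to /etc/resolv.conf only if the function succeeds.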

Use a custom Yum repository

If you use a custom Yum repository, note the following:
  • Do not update all RPM packages.

    Update only the RPM packages to be installed. Do not run the yum update -y command to update all RPM packages.

  • Ensure adequate performance of the Yum repository.
    If you plan to add a large number of worker nodes to the cluster at a time and the nodes need to update RPM packages from the Yum repository, make sure that the Yum repository can handle the concurrent requests. The sample code is as follows:
    $ cat scripts/add-yum-repo.sh
    #! /bin/bash
    cat << EOF > /etc/yum.repos.d/my.repo
    [base]
    name=CentOS-\$releasever
    enabled=1
    failovermethod=priority
    baseurl=http://mirrors.cloud.aliyuncs.com/centos/\$releasever/os/\$basearch/
    gpgcheck=1
    gpgkey=http://mirrors.cloud.aliyuncs.com/centos/RPM-GPG-KEY-CentOS-7
    EOF
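
To enforce the first recommendation, you can scan your provisioning scripts for commands that update all RPM packages. The following is a minimal sketch. The function name find_full_yum_updates is hypothetical, and the pattern covers only simple one-line invocations.

```shell
#!/bin/bash
# Hypothetical lint: print the numbered lines of a script that run a bare
# "yum update" or "yum upgrade" (with an optional -y), which updates all packages.
# Lines that name specific packages, such as "yum update -y kernel", are not reported.
find_full_yum_updates() {
    grep -En '^[[:space:]]*(sudo[[:space:]]+)?yum[[:space:]]+(-y[[:space:]]+)?(update|upgrade)([[:space:]]+-y)?[[:space:]]*$' "$1"
}
```

The function returns a nonzero status when no such line is found, so it can be used as if find_full_yum_updates scripts/add-yum-repo.sh; then ... fi.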

Preload the container images of DaemonSet components

If you plan to add more than 1,000 worker nodes to the cluster at a time, we recommend that you preload the container images of DaemonSet components when you create the custom image. Nodes then do not need to pull these container images when they start, which speeds up cluster scale-out.

  1. Save the required container images as TAR packages and store them in the custom image.
    Assume that the Container Service for Kubernetes cluster uses the Terway network plug-in and Container Storage Interface (CSI) storage plug-in and resides in the China (Hangzhou) region. The sample script for preloading the container images of the preceding plug-ins is as follows:
    $ cat scripts/prepare-images.sh
    #! /bin/bash
    set -x -e
    EXPORT_PATH=/preheated
    
    # Install Docker.
    yum install -y docker
    systemctl start docker
    
    # Pull and save images.
    images=(
    registry-vpc.cn-hangzhou.aliyuncs.com/acs/csi-plugin:v1.14.5.60-5318afe-aliyun
    registry-vpc.cn-hangzhou.aliyuncs.com/acs/terway:v1.0.10.78-g97729ee-aliyun
    )
    
    mkdir -p ${EXPORT_PATH}
    
    for image in "${images[@]}"; do
        echo "preheating ${image}"
        docker pull ${image}
        docker save -o ${EXPORT_PATH}/$(echo ${image}| md5sum | cut -f1 -d" ").tar ${image}
    done
    
    # Uninstall Docker.
    yum erase -y docker
    rm -rf /var/lib/docker
  2. Log on to the Container Service console. Go to the cluster creation page, click Show Advanced Options, and then enter the following script in the User Data field:
    ls /preheated/ | xargs -I {} docker load -i /preheated/{}
    rm -rf /preheated
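
The user data script above deletes /preheated even if an image fails to load. A more defensive variant can be sketched as follows. The function name load_preheated_images and the LOAD_CMD override are hypothetical. LOAD_CMD exists only so that the function can be dry-run without Docker.

```shell
#!/bin/bash
# Hypothetical variant of the user data script.
# LOAD_CMD is an assumed override for dry runs; it defaults to "docker load -i".
load_preheated_images() {
    dir="$1"
    loader="${LOAD_CMD:-docker load -i}"

    # Nothing to do when the image was built without preloaded images.
    [ -d "$dir" ] || return 0

    for tarball in "$dir"/*.tar; do
        # Skip the unexpanded pattern when the directory holds no TAR files.
        [ -e "$tarball" ] || continue
        # $loader is intentionally unquoted so that it splits into command and options.
        $loader "$tarball" || return 1
    done

    # Free the disk space only after every image has been loaded.
    rm -rf "$dir"
}
```

In the User Data field, the function definition would be followed by the call load_preheated_images /preheated. If a load fails, the TAR packages stay on disk for inspection.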

Edit the configuration file of the custom image

Add the following provisioners configuration to the alicloud.json file that is used to create the custom image:

  "provisioners": [
    {
      "type": "shell",
      "scripts": [
        "config/default.sh",
        "scripts/updateOSKernel.sh",
        "scripts/updateDNS.sh",
        "scripts/add-yum-repo.sh",
        "scripts/prepare-images.sh",
        "scripts/reboot.sh",
        "scripts/verify.sh"
      ],
      "expect_disconnect": true
    }
  ]
Note The config/default.sh, scripts/reboot.sh, and scripts/verify.sh scripts are default scripts that must be run. Others are user-defined scripts.

The config/default.sh script sets the time zone and disables the swap partition.

The scripts/verify.sh script checks whether the custom image meets the requirements of Container Service for Kubernetes.
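
Because the three default scripts must always be run, it can be worth checking that an edited template still references them. The following is a minimal sketch. The function name missing_required_scripts is hypothetical, and the check is a plain text search rather than a JSON parse.

```shell
#!/bin/bash
# Hypothetical check: print each default script that the template does not reference.
# Returns 0 when all three default scripts are listed, 1 otherwise.
missing_required_scripts() {
    template="$1"
    status=0
    for script in config/default.sh scripts/reboot.sh scripts/verify.sh; do
        # -F matches the quoted path literally, as it appears in the JSON file.
        if ! grep -Fq "\"$script\"" "$template"; then
            echo "$script"
            status=1
        fi
    done
    return "$status"
}
```

For example, missing_required_scripts alicloud.json prints nothing and returns 0 for the configuration shown above.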

After you edit the configuration file of the custom image, you can create the custom image and use it to create or scale out a Container Service for Kubernetes cluster.