This topic provides answers to some frequently asked questions about container networks.

How do I resolve the issue that the cluster installed with the Terway network plug-in cannot access the Internet after I create a vSwitch for the cluster?

Symptom: In the Terway network, after a new vSwitch is created to provide more IP addresses for pods, the cluster cannot access the Internet.

Cause: The new vSwitch that assigns IP addresses to pods does not have access to the Internet.

Solution: You can create a NAT gateway and configure SNAT rules to enable the new vSwitch to access the Internet. For more information, see Enable an existing ACK cluster to access the Internet by using SNAT.

How do I resolve the issue that Flannel becomes incompatible with clusters of Kubernetes 1.16 or later after I manually update Flannel?

Symptom:

After the Kubernetes version of a cluster is updated to 1.16 or later, the nodes in the cluster change to the NotReady state.

Cause:

You manually updated Flannel but did not update the Flannel configuration. As a result, kubelet cannot recognize Flannel.

Solution:

  1. Run the following command to open the Flannel ConfigMap for editing:
    kubectl edit cm kube-flannel-cfg -n kube-system

    Add the cniVersion field to the CNI configuration in the ConfigMap:

    "name": "cb0",
    "cniVersion":"0.3.0",
    "type": "flannel",
  2. Run the following command to restart Flannel:
    kubectl delete pod -n kube-system -l app=flannel
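
The check below is a minimal sketch for confirming that the cniVersion field is present before you restart Flannel. The helper name has_cni_version is hypothetical, and the usage comment assumes that the ConfigMap stores the CNI configuration under the cni-conf.json key.

```shell
# Hypothetical helper: succeeds if the CNI configuration read from stdin
# contains a "cniVersion" key.
has_cni_version() {
  grep -q '"cniVersion"[[:space:]]*:'
}

# Usage sketch (assumes kubectl access and the cni-conf.json key):
#   kubectl get cm kube-flannel-cfg -n kube-system \
#     -o jsonpath='{.data.cni-conf\.json}' | has_cni_version \
#     && echo "cniVersion is set" || echo "cniVersion is missing"
```

The helper only checks that the key exists. It does not validate the version value.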

How do I resolve the issue that a pod is not immediately ready for communication after it is started?

Symptom:

After a pod is started, you must wait for a period of time before the pod is ready for communication.

Cause:

The network policies require a period of time to take effect. To resolve this issue, you can disable network policies.

Solution:

  1. Run the following command to modify the ConfigMap of Terway and disable network policies:
    kubectl edit cm -n kube-system eni-config 

    Add the following field to the ConfigMap:

    disable_network_policy: "true"
  2. Optional: If Terway is not updated to the latest version, log on to the Container Service for Kubernetes (ACK) console and update Terway.
    1. Log on to the ACK console.
    2. In the left-side navigation pane of the ACK console, click Clusters.
    3. On the Clusters page, find the cluster that you want to manage and click the name of the cluster or click Details in the Actions column. The details page of the cluster appears.
    4. In the left-side navigation pane, choose Operations > Add-ons.
    5. On the Add-ons page, click the Networking tab, find the Terway section, and then click Upgrade.
    6. In the Note message, click OK.
  3. Run the following command to restart all Terway pods:
     kubectl delete pod -n kube-system -l app=terway-eniip

How do I enable a pod to access a Service that is used to expose the pod?

Symptom:

Pods are not allowed to access the Services that expose them. If a pod accesses the Service that exposes it, the performance of the Service becomes unstable or scheduling errors may occur.

Cause:

Your cluster uses a Flannel version that does not allow loopback requests.
Note
  • Flannel versions earlier than 0.15.1.4-e02c8f12-aliyun do not allow loopback requests. If you directly update Flannel to 0.15.1.4-e02c8f12-aliyun or later, Flannel still does not allow loopback requests.
  • To allow loopback requests, you must uninstall the current Flannel version and then install Flannel 0.15.1.4-e02c8f12-aliyun or a later version.

Solution:

  • Use a headless Service to expose and access applications. For more information, see Headless Services.
    Note We recommend that you use this method.
  • Recreate a cluster that uses the Terway network plug-in. For more information, see Work with Terway.
  • Modify the Flannel configuration, reinstall Flannel, and then recreate the pod.
    Note We recommend that you do not use this method because the Flannel configuration may be overwritten when you update Flannel.
    1. Run the following command to modify cni-conf.json:
      kubectl edit cm kube-flannel-cfg -n kube-system
    2. Add hairpinMode: true to the delegate parameter.
      Example:
      cni-conf.json: |
          {
            "name": "cb0",
            "cniVersion":"0.3.1",
            "type": "flannel",
            "delegate": {
              "isDefaultGateway": true,
              "hairpinMode": true
            }
          }
    3. Run the following command to restart Flannel:
      kubectl delete pod -n kube-system -l app=flannel   
    4. Delete and recreate the pod.

Which network plug-in should I choose for an ACK cluster, Terway or Flannel?

The following section describes the Flannel and Terway network plug-ins for ACK clusters.

You can select one of the following network plug-ins when you create an ACK cluster:
  • Flannel: a simple and stable Container Network Interface (CNI) plug-in developed by the Kubernetes community. You can use Flannel with Virtual Private Cloud (VPC) of Alibaba Cloud. This ensures that your clusters and containers run in a high-speed and stable network. However, Flannel provides only basic features and does not support standard Kubernetes network policies.
  • Terway: a network plug-in developed by ACK. Terway provides all the features of Flannel and allows you to attach Alibaba Cloud elastic network interfaces (ENIs) to containers. You can use Terway to configure access control policies for containers based on standard Kubernetes network policies. Terway also supports bandwidth throttling on individual containers. For more information about Terway, see Work with Terway.

If you do not want to use Kubernetes network policies, you can choose Flannel. In other cases, we recommend that you choose Terway.

How do I plan the network of a cluster?

When you create an ACK cluster, you must specify a VPC, vSwitches, the pod CIDR block, and the Service CIDR block. We recommend that you plan the CIDR block of Elastic Compute Service (ECS) instances in the cluster, the pod CIDR block, and the Service CIDR block before you create an ACK cluster. For more information, see Plan CIDR blocks for an ACK cluster.

Can I use the hostPort feature to create port mappings in an ACK cluster?

  • Only Flannel allows you to use the hostPort feature to create port mappings in an ACK cluster.
  • A pod in a VPC can be accessed by other cloud resources that are deployed in the same VPC through the endpoint of the pod in the VPC. Therefore, port mapping is not required.
  • To expose applications to the Internet, you can use NodePort Services or LoadBalancer Services.

Can I configure multiple route tables for the VPC where my cluster is deployed?

Only ACK dedicated clusters allow you to configure multiple route tables for the VPC. For more information, see Configure multiple route tables for a VPC. To configure multiple route tables for the VPC where an ACK managed cluster is deployed, submit a ticket.

How do I check the network type and vSwitches of a cluster?

ACK supports two types of container network: Flannel and Terway.

To check the network type of the cluster, perform the following steps:

  1. Log on to the ACK console.
  2. In the left-side navigation pane, click Clusters.
  3. On the Clusters page, find the cluster that you want to manage and click the name of the cluster, or click Details in the Actions column. The details page of the cluster appears.
  4. On the Basic Information tab, check the value of Network Plug-in in the Cluster Information section.
    • If the value of Network Plug-in is terway-eniip, it indicates that the Terway network is used.
    • If the value of Network Plug-in is Flannel, it indicates that the Flannel network is used.

To check the vSwitch to which the nodes in the network belong, perform the following steps:

  1. Log on to the ACK console.
  2. In the left-side navigation pane of the ACK console, click Clusters.
  3. On the Clusters page, find the cluster that you want to manage. Then, click the name of the cluster or click Details in the Actions column.
  4. In the left-side navigation pane of the details page, choose Nodes > Node Pools.
  5. Find the node pool that you want to manage and click Details in the Actions column.

    In the Node Configurations section, check the value of Node vSwitch.

To check the vSwitch to which the pods in the Terway network belong, perform the following steps:

Note Only pods in the Terway network require a vSwitch. Pods in the Flannel network do not require a vSwitch.
  1. Log on to the ACK console.
  2. In the left-side navigation pane, click Clusters. On the Clusters page, find the cluster that you want to manage and click the name of the cluster, or click Details in the Actions column.
  3. On the details page of the cluster, click the Cluster Resources tab and check the value of Pod vSwitch.

How do I check the cloud resources used in an ACK cluster?

You can perform the following steps to check the cloud resources used in ACK clusters, such as vSwitches, VPCs, and worker RAM roles:
  1. Log on to the ACK console.
  2. In the left-side navigation pane, click Clusters.
  3. On the Clusters page, find the cluster that you want to manage and click the name of the cluster, or click Details in the Actions column. The details page of the cluster appears.
  4. On the Cluster Resources tab, view the information about the cloud resources that are used in the cluster.

How do I modify the kube-proxy configuration?

By default, a DaemonSet named kube-proxy-worker is deployed in an ACK managed cluster for load balancing. You can modify the ConfigMap named kube-proxy-worker to change the parameters of the kube-proxy-worker DaemonSet. If you use an ACK dedicated cluster, a DaemonSet and a ConfigMap both named kube-proxy-master are deployed on each master node.

The kube-proxy configuration items in ACK are fully compatible with the open source KubeProxyConfiguration. You can refer to the open source version when you customize configurations. For more information, see kube-proxy configuration. The kube-proxy configuration must conform to specific formats. Do not omit colons (:) or space characters. Perform the following steps to modify the kube-proxy configuration:

  • If you use an ACK managed cluster, modify the kube-proxy-worker ConfigMap.
    1. Log on to the ACK console.
    2. In the left-side navigation pane of the ACK console, click Clusters.
    3. On the Clusters page, find the cluster that you want to manage. Then, click the name of the cluster or click Details in the Actions column.
    4. In the left-side navigation pane of the details page, choose Configurations > ConfigMaps.
    5. On the ConfigMap page, select the kube-system namespace, find the kube-proxy-worker ConfigMap, and then click Edit YAML in the Actions column.
    6. In the View in YAML panel, modify the parameters and click OK.
    7. Recreate all kube-proxy-worker pods for the configuration to take effect.
      Notice Your workloads are not interrupted when kube-proxy is restarted. However, new releases will be postponed because the system must wait until kube-proxy is restarted. We recommend that you restart kube-proxy during off-peak hours.
      1. In the left-side navigation pane of the details page, choose Workloads > DaemonSets.
      2. On the DaemonSets page, find and click kube-proxy-worker.
      3. On the Pods tab of the kube-proxy-worker page, select a pod and choose More > Delete in the Actions column. In the message that appears, click Confirm.

        Repeat the preceding steps to delete all of the pods. After you delete the pods, the system automatically recreates the pods.

  • If you use an ACK dedicated cluster, modify the kube-proxy-worker and kube-proxy-master ConfigMaps, and then delete kube-proxy-worker and kube-proxy-master pods. The system automatically recreates them. For more information, see the preceding steps.

How do I increase the maximum number of tracked connections in the conntrack table of the Linux kernel?

If the dmesg command returns the conntrack full message, it indicates that the number of tracked connections in the conntrack table has exceeded the limit specified by conntrack_max. In this case, you may need to increase the maximum number of tracked connections in the conntrack table.
  1. Run the conntrack -L command to check the proportions of protocols used by the connections in the conntrack table.
    • If a large number of TCP connections are found, you must check the application type and consider using long-lived connections to replace short-lived connections.
    • If a large number of DNS connections are found, you must use NodeLocal DNSCache to improve DNS performance for your cluster. For more information, see Configure NodeLocal DNSCache.
  2. If the proportions of connections in the conntrack table are proper or you do not want to modify the application, you can add the maxPerCore parameter to the kube-proxy configuration to adjust the maximum number of tracked connections.
    • If you use an ACK managed cluster, add the maxPerCore parameter to the kube-proxy-worker ConfigMap and set its value to 65536 or greater. Then, delete kube-proxy-worker pods. The system automatically recreates the pods for the configuration to take effect. For more information about how to modify the kube-proxy-worker ConfigMap and delete kube-proxy-worker pods, see How do I modify the kube-proxy configuration?.
      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: kube-proxy-worker
        namespace: kube-system
      data:
        config.conf: |
          apiVersion: kubeproxy.config.k8s.io/v1alpha1
          kind: KubeProxyConfiguration
          featureGates:
            IPv6DualStack: true
          clusterCIDR: 172.20.0.0/16
          clientConnection:
            kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
          conntrack:
            maxPerCore: 65536 # Set maxPerCore to a proper value. Default value: 65536.
          mode: ipvs
      # Irrelevant fields are not shown.
    • If you use an ACK dedicated cluster, add the maxPerCore parameter to the kube-proxy-worker and kube-proxy-master ConfigMaps and set its value to 65536 or greater. Then, delete kube-proxy-worker and kube-proxy-master pods. The system automatically recreates the pods for the configurations to take effect. For more information about how to modify the kube-proxy-worker and kube-proxy-master ConfigMaps, and delete kube-proxy-worker and kube-proxy-master pods, see How do I modify the kube-proxy configuration?.
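
To judge the proportions mentioned in step 1, you can summarize the conntrack -L output by protocol. The following is a minimal sketch, not an official tool: the helper names conntrack_by_proto and count_dns_flows are hypothetical, and the sketch assumes that each flow line starts with the protocol name and that DNS flows can be recognized by destination port 53.

```shell
# Hypothetical helper: reads `conntrack -L` output on stdin and prints
# "<count> <protocol>" lines, most frequent protocol first.
conntrack_by_proto() {
  awk '{ count[$1]++ } END { for (p in count) print count[p], p }' | sort -rn
}

# Hypothetical helper: counts flows whose destination port is 53 (DNS).
# The trailing space prevents matches on ports such as 5353.
count_dns_flows() {
  grep -c 'dport=53 '
}

# Usage sketch on a node (requires root and the conntrack tool):
#   conntrack -L 2>/dev/null | conntrack_by_proto
#   conntrack -L 2>/dev/null | count_dns_flows
#   cat /proc/sys/net/netfilter/nf_conntrack_count \
#       /proc/sys/net/netfilter/nf_conntrack_max
```

Comparing nf_conntrack_count with nf_conntrack_max shows how close the table is to its limit before and after you change maxPerCore.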

How do I modify the IPVS load balancing algorithm in the kube-proxy configuration?

You can modify the IPVS load balancing algorithm in the kube-proxy configuration to resolve the issue that a large number of long-lived connections are not evenly distributed. To do this, perform the following steps:
  1. Select a proper scheduling algorithm. For more information about how to select scheduling algorithms, see Parameter changes.
  2. Set the ipvs scheduler parameter to the proper scheduling algorithm.
    • If you use an ACK managed cluster, set the ipvs scheduler parameter to the proper scheduling algorithm in the kube-proxy-worker ConfigMap. Then, delete kube-proxy-worker pods. The system automatically recreates the pods for the configuration to take effect. For more information about how to modify the kube-proxy-worker ConfigMap and delete kube-proxy-worker pods, see How do I modify the kube-proxy configuration?.
      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: kube-proxy-worker
        namespace: kube-system
      data:
        config.conf: |
          apiVersion: kubeproxy.config.k8s.io/v1alpha1
          kind: KubeProxyConfiguration
          featureGates:
            IPv6DualStack: true
          clusterCIDR: 172.20.0.0/16
          clientConnection:
            kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
          conntrack:
            maxPerCore: 65536
          mode: ipvs
          ipvs:
            scheduler: lc # Set scheduler to a proper scheduling algorithm.
      # Irrelevant fields are not shown.
    • If you use an ACK dedicated cluster, set the ipvs scheduler parameter to the proper scheduling algorithm in the kube-proxy-worker and kube-proxy-master ConfigMaps. Then, delete kube-proxy-worker and kube-proxy-master pods. The system automatically recreates the pods for the configurations to take effect. For more information about how to modify the kube-proxy-worker and kube-proxy-master ConfigMaps, and delete kube-proxy-worker and kube-proxy-master pods, see How do I modify the kube-proxy configuration?.
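
After the kube-proxy pods are recreated, you may want to confirm on a node that the IPVS virtual services use the new scheduler. The following is a minimal sketch under stated assumptions: the helper name ipvs_schedulers is hypothetical, ipvsadm must be installed on the node, and the sketch assumes that virtual service lines in the ipvsadm -Ln output start with TCP or UDP and carry the scheduler name in the third column.

```shell
# Hypothetical helper: reads `ipvsadm -Ln` output on stdin and prints
# "<count> <scheduler>" for the TCP and UDP virtual services.
ipvs_schedulers() {
  awk '$1 == "TCP" || $1 == "UDP" { count[$3]++ }
       END { for (s in count) print count[s], s }' | sort -rn
}

# Usage sketch on a node (requires root):
#   ipvsadm -Ln | ipvs_schedulers
```

If the output still lists the previous scheduler, check whether all kube-proxy pods on that node were recreated.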

How do I modify the timeout period for IPVS UDP sessions in the kube-proxy configuration?

If the kube-proxy mode is set to IPVS in your cluster, the default session persistence policy of IPVS may cause packet loss five minutes after the UDP connection is closed. If your application is dependent on CoreDNS, issues such as API latency and request timeouts may occur five minutes after CoreDNS is updated or its host is restarted.

If your applications in the cluster do not use the UDP protocol, you can shorten the timeout period for IPVS UDP sessions to minimize the effect of DNS resolution latency or failures. Perform the following operations:
Note If your application uses the UDP protocol, submit a ticket to request technical support.
  • ACK clusters that run Kubernetes 1.18 or later
    • If you use an ACK managed cluster, modify the udpTimeout parameter in the kube-proxy-worker ConfigMap. Then, delete kube-proxy-worker pods. The system automatically recreates the pods for the configuration to take effect. For more information about how to modify the kube-proxy-worker ConfigMap and delete kube-proxy-worker pods, see How do I modify the kube-proxy configuration?.
      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: kube-proxy-worker
        namespace: kube-system
      data:
        config.conf: |
          apiVersion: kubeproxy.config.k8s.io/v1alpha1
          kind: KubeProxyConfiguration
          # Irrelevant fields are not shown. 
          mode: ipvs
          # If the ipvs field does not exist, you must add the field.
          ipvs:
            udpTimeout: 10s # The default timeout period is 300 seconds. In this example, the timeout period is set to 10 seconds. This way, the time period during which packet loss may occur is reduced to 10 seconds after a UDP connection is closed.
    • If you use an ACK dedicated cluster, modify the udpTimeout parameter in the kube-proxy-worker and kube-proxy-master ConfigMaps. Then, delete kube-proxy-worker and kube-proxy-master pods. The system automatically recreates the pods for the configurations to take effect. For more information about how to modify the kube-proxy-worker and kube-proxy-master ConfigMaps, and delete kube-proxy-worker and kube-proxy-master pods, see How do I modify the kube-proxy configuration?.
  • ACK clusters that run Kubernetes 1.16 or earlier
    The kube-proxy component in these ACK clusters does not support the udpTimeout parameter. We recommend that you use Operation Orchestration Service (OOS) to run the following ipvsadm commands on all cluster nodes to modify the UDP timeout configuration:
    yum install -y ipvsadm
    ipvsadm -L --timeout > /tmp/ipvsadm_timeout_old
    ipvsadm --set 900 120 10
    ipvsadm -L --timeout > /tmp/ipvsadm_timeout_new
    diff /tmp/ipvsadm_timeout_old /tmp/ipvsadm_timeout_new

    For more information about how to use OOS to manage multiple ECS instances at the same time, see Manage multiple instances.
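
In the ipvsadm --set command, the three values are the timeout periods, in seconds, for TCP sessions, TCP sessions after a FIN packet is received, and UDP packets. The preceding command therefore sets the UDP timeout to 10 seconds. The following is a minimal sketch for reading the UDP value back on a node. The helper name ipvs_udp_timeout is hypothetical, and the sketch assumes that ipvsadm -L --timeout prints a line in the form "Timeout (tcp tcpfin udp): 900 120 10".

```shell
# Hypothetical helper: reads `ipvsadm -L --timeout` output on stdin and
# prints the third timeout value, which is the UDP timeout in seconds.
ipvs_udp_timeout() {
  awk -F': ' '/^Timeout/ { split($2, t, " "); print t[3] }'
}

# Usage sketch on a node (requires root):
#   ipvsadm -L --timeout | ipvs_udp_timeout
```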

How do I fix common issues related to IPv4/IPv6 dual stack?

  • Symptom: The pod IP addresses displayed in kubectl are still IPv4 addresses.

    Solution: Run the following command to display pod IPv6 addresses:

    kubectl get pods -A -o jsonpath='{range .items[*]}{@.metadata.namespace} {@.metadata.name} {@.status.podIPs[*].ip} {"\n"}{end}'
  • Symptom: The cluster IP addresses displayed in kubectl are still IPv4 addresses.
    Solution:
    1. Confirm that the spec.ipFamilyPolicy field is not set to SingleStack.
    2. Run the following command to display cluster IPv6 addresses:
      kubectl get svc -A -o jsonpath='{range .items[*]}{@.metadata.namespace} {@.metadata.name} {@.spec.ipFamilyPolicy} {@.spec.clusterIPs[*]} {"\n"}{end}'
  • Symptom: You cannot access a pod through its IPv6 address.

    Cause: Some applications, such as NGINX containers, do not listen on IPv6 addresses by default.

    Solution: Run the netstat -anp command to check whether the pod listens on IPv6 addresses.

    Expected output:

    Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
    tcp        0      0 127.0.XX.XX:10248         0.0.0.0:*               LISTEN      8196/kubelet
    tcp        0      0 127.0.XX.XX:41935         0.0.0.0:*               LISTEN      8196/kubelet
    tcp        0      0 0.0.XX.XX:111             0.0.0.0:*               LISTEN      598/rpcbind
    tcp        0      0 0.0.XX.XX:22              0.0.0.0:*               LISTEN      3577/sshd
    tcp6       0      0 :::30500                :::*                    LISTEN      1916680/kube-proxy
    tcp6       0      0 :::10250                :::*                    LISTEN      8196/kubelet
    tcp6       0      0 :::31183                :::*                    LISTEN      1916680/kube-proxy
    tcp6       0      0 :::10255                :::*                    LISTEN      8196/kubelet
    tcp6       0      0 :::111                  :::*                    LISTEN      598/rpcbind
    tcp6       0      0 :::10256                :::*                    LISTEN      1916680/kube-proxy
    tcp6       0      0 :::31641                :::*                    LISTEN      1916680/kube-proxy
    udp        0      0 0.0.0.0:68              0.0.0.0:*                           4892/dhclient
    udp        0      0 0.0.0.0:111             0.0.0.0:*                           598/rpcbind
    udp        0      0 47.100.XX.XX:323           0.0.0.0:*                           6750/chronyd
    udp        0      0 0.0.0.0:720             0.0.0.0:*                           598/rpcbind
    udp6       0      0 :::111                  :::*                                598/rpcbind
    udp6       0      0 ::1:323                 :::*                                6750/chronyd
    udp6       0      0 fe80::216:XXXX:fe03:546 :::*                                6673/dhclient
    udp6       0      0 :::720                  :::*                                598/rpcbind

    If the value in the Proto column is tcp, it indicates that the pod listens on IPv4 addresses. If the value is tcp6, it indicates that the pod listens on IPv6 addresses.

  • Symptom: You can access a pod through its IPv6 address from within the cluster but not from the Internet.

    Cause: No public bandwidth is configured for the IPv6 address.

    Solution: Configure public bandwidth for the IPv6 address. For more information, see Enable and manage IPv6 Internet bandwidth.

  • Symptom: You cannot access a pod through the cluster IPv6 address.
    Solution:
    1. Confirm that the spec.ipFamilyPolicy field is not set to SingleStack.
    2. Run the netstat -anp command to check whether the pod listens on IPv6 addresses.

      Expected output:

      Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
      tcp        0      0 127.0.XX.XX:10248         0.0.0.0:*               LISTEN      8196/kubelet
      tcp        0      0 127.0.XX.XX:41935         0.0.0.0:*               LISTEN      8196/kubelet
      tcp        0      0 0.0.XX.XX:111             0.0.0.0:*               LISTEN      598/rpcbind
      tcp        0      0 0.0.XX.XX:22              0.0.0.0:*               LISTEN      3577/sshd
      tcp6       0      0 :::30500                :::*                    LISTEN      1916680/kube-proxy
      tcp6       0      0 :::10250                :::*                    LISTEN      8196/kubelet
      tcp6       0      0 :::31183                :::*                    LISTEN      1916680/kube-proxy
      tcp6       0      0 :::10255                :::*                    LISTEN      8196/kubelet
      tcp6       0      0 :::111                  :::*                    LISTEN      598/rpcbind
      tcp6       0      0 :::10256                :::*                    LISTEN      1916680/kube-proxy
      tcp6       0      0 :::31641                :::*                    LISTEN      1916680/kube-proxy
      udp        0      0 0.0.0.0:68              0.0.0.0:*                           4892/dhclient
      udp        0      0 0.0.0.0:111             0.0.0.0:*                           598/rpcbind
      udp        0      0 47.100.XX.XX:323           0.0.0.0:*                           6750/chronyd
      udp        0      0 0.0.0.0:720             0.0.0.0:*                           598/rpcbind
      udp6       0      0 :::111                  :::*                                598/rpcbind
      udp6       0      0 ::1:323                 :::*                                6750/chronyd
      udp6       0      0 fe80::216:XXXX:fe03:546 :::*                                6673/dhclient
      udp6       0      0 :::720                  :::*                                598/rpcbind

      If the value in the Proto column is tcp, it indicates that the pod listens on IPv4 addresses. If the value is tcp6, it indicates that the pod listens on IPv6 addresses.
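
Instead of reading the netstat output by eye, you can filter it for a specific port. The following is a minimal sketch: the helper name listens_on_ipv6 is hypothetical, and the sketch assumes the column layout shown in the preceding expected output, in which the fourth column is the local address and the sixth column is the state.

```shell
# Hypothetical helper: reads `netstat -anp` output on stdin and succeeds
# if a tcp6 socket in the LISTEN state is bound to the port given as $1.
listens_on_ipv6() {
  awk -v port="$1" '
    $1 == "tcp6" && $6 == "LISTEN" {
      n = split($4, parts, ":")      # the port is the last ":"-separated part
      if (parts[n] == port) found = 1
    }
    END { exit !found }'
}

# Usage sketch (assumes that netstat is available where you run it):
#   netstat -anp | listens_on_ipv6 10250 && echo "listening on IPv6"
```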

What do I do if the IP address of a newly created pod does not fall within the vSwitch CIDR block in Terway mode?

Symptom:

In Terway mode, the IP address of a newly created pod does not fall within the vSwitch CIDR block.

Cause:

After the pod is created, the ENI of the ECS instance assigns an IP address from the VPC CIDR block to the pod. The vSwitch configuration takes effect only when a newly created ENI is attached to the node. If an ENI is attached to the node before you add the node to the cluster or modify the vSwitches used by Terway, the ENI assigns IP addresses to newly created pods on the node from the vSwitch to which the ENI belongs.

This issue may occur in the following scenarios:

  • You add a node that is removed from another cluster to your cluster. The node was not drained when it was removed. In this case, the node uses the ENI that was attached by the cluster to which the node previously belonged.
  • You manually add or modify vSwitches used by Terway. However, a node may still be attached to the original ENI. In this case, the ENI assigns IP addresses to newly created pods on the node from the vSwitch to which the ENI belongs.

Solution:

Create new nodes or create pods on other nodes.

What do I do if the IP address of a newly created pod does not fall within the vSwitch CIDR block after I add a vSwitch in Terway mode?

Symptom:

In Terway mode, the IP address of a newly created pod does not fall within the vSwitch CIDR block after you add a vSwitch.

Cause:

After the pod is created, the ENI of the ECS instance assigns an IP address from the VPC CIDR block to the pod. The vSwitch configuration takes effect only when a newly created ENI is attached to the node. If an ENI is attached to the node before you add the node to the cluster or modify the vSwitches used by Terway, the ENI assigns IP addresses to newly created pods on the node from the vSwitch to which the ENI belongs. If the number of ENIs that are attached to the node reaches the upper limit, new ENIs cannot be created. As a result, an error is returned. For more information about the ENI quota, see Limits.

Solution:

Create new nodes or create pods on other nodes.

How do I enable load balancing within a cluster in Terway IPVLAN mode?

Symptom:

In Terway IPVLAN mode, when you create a cluster that uses Terway 1.2.0 or a later version, load balancing is automatically enabled within the cluster. If you access an external IP address or an SLB instance from within a cluster, the traffic is routed to the backend Service. How do I enable load balancing within an existing cluster that uses the Terway IPVLAN mode?

Cause:

If you access an external IP address or an SLB instance from within a cluster, kube-proxy directly routes the traffic to the endpoint of the backend Service. In Terway IPVLAN mode, the traffic is handled by Cilium instead of kube-proxy. This feature is not supported in Terway versions earlier than 1.2.0. This feature is automatically enabled for clusters that use Terway 1.2.0 or later versions. You must manually enable this feature for clusters that use Terway versions earlier than 1.2.0.

Solution:

Note
  • Update Terway to 1.2.0 or later and enable the IPVLAN mode.
  • If the IPVLAN mode is not enabled, the following configuration does not take effect.
  • This feature is automatically enabled for newly created clusters that use Terway 1.2.0 or later versions.
  1. Run the following command to modify the Terway ConfigMap named eni-config:
    kubectl edit cm eni-config -n kube-system
  2. Add the following content to the eni-config ConfigMap:
    in_cluster_loadbalance: "true"
    Note Make sure that in_cluster_loadbalance is at the same indentation level as eni_conf.
  3. Run the following command to enable load balancing within the cluster by recreating the Terway pod:
    kubectl delete pod -n kube-system -l app=terway-eniip
    Verify the configuration
    Run the following command to query the policy log of the terway-eniip application. If enable-in-cluster-loadbalance=true is returned, the configuration is applied.
    kubectl logs -n kube-system <terway pod name> policy | grep enable-in-cluster-loadbalance

How do I add the pod CIDR block to a whitelist if my cluster uses the Terway network plug-in?

Symptom:

You want to enforce access control on services such as database services by using whitelists. To configure access control in container networks, you must add pod IP addresses to a whitelist. However, pod IP addresses dynamically change.

Cause:

ACK provides the Flannel and Terway network plug-ins to set up the container network:
  • If your cluster uses Flannel as the network plug-in, pods in the cluster use the IP addresses of the nodes that host the pods to access services such as database services. You can schedule client pods to a small number of nodes and then add the IP addresses of these nodes to the whitelist of a database service.
  • If your cluster uses Terway as the network plug-in, ENIs assign IP addresses to pods in the cluster. Pods use the IP addresses assigned by ENIs to access external services. Therefore, client pods do not use the IP addresses of nodes to access external services even if you schedule the client pods to specific nodes based on affinity settings. Random IP addresses are allocated to pods from the vSwitch that is specified by Terway. In most cases, auto scaling is required for pods. Therefore, static IP addresses are not suitable for pods. To meet the requirement of auto scaling, we recommend that you specify a CIDR block for pods and then add the CIDR block to the whitelist of a database service that you want to access.

Solution:

Add a label to a node to specify the vSwitch that is used to allocate IP addresses to pods. This way, the system uses the vSwitch to allocate IP addresses to pods when the pods are scheduled to the node with the specified label.

  1. Create a ConfigMap named eni-config-fixed in the kube-system namespace. Specify the vSwitch that you want to use in the ConfigMap.

    In this example, the vsw-2zem796p76viir02c**** vSwitch is used. The CIDR block of the vSwitch is 10.2.1.0/24.

    apiVersion: v1
    data:
      eni_conf: |
        {
           "vswitches": {"cn-beijing-h":["vsw-2zem796p76viir02c****"]},
           "security_group": "sg-bp19k3sj8dk3dcd7****",
           "security_groups": ["sg-bp1b39sjf3v49c33****","sg-bp1bpdfg35tg****"]
        }
    kind: ConfigMap
    metadata:
      name: eni-config-fixed
      namespace: kube-system
  2. Create a node pool and add the terway-config: eni-config-fixed label to the nodes in the node pool. For more information about how to create a node pool, see Create a node pool.
    To ensure that irrelevant pods are not scheduled to the nodes in the node pool, add specific taints, for example, fixed=true:NoSchedule, to the nodes.
  3. Scale out the node pool. For more information about how to scale out a node pool, see Increase the number of nodes in an ACK cluster.
    The label and taint that you added in the previous step are automatically added to the nodes that are newly added to the node pool.
  4. Create pods. To ensure that the pods are scheduled to nodes with the terway-config: eni-config-fixed label, you must add a node selector that matches the label and a toleration rule that matches the taint to the pod configuration.
    apiVersion: apps/v1 # Use apps/v1beta1 in clusters that run Kubernetes 1.8.0 or earlier versions. 
    kind: Deployment
    metadata:
      name: nginx-fixed
      labels:
        app: nginx-fixed
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: nginx-fixed
      template:
        metadata:
          labels:
            app: nginx-fixed
        spec:
          tolerations:        # Add a toleration rule. 
          - key: "fixed"
            operator: "Equal"
            value: "true"
            effect: "NoSchedule"
          nodeSelector:
            terway-config: eni-config-fixed
          containers:
          - name: nginx
            image: nginx:1.9.0 # Replace the value with the actual image that you use. Set the value in the following format: <image_name:tags>. 
            ports:
            - containerPort: 80
    Verify the result
    1. Run the following command to query the IP addresses of the pods that you created:
      kubectl get po -o wide | grep fixed
      Expected output:
      nginx-fixed-57d4c9bd97-l****                   1/1     Running             0          39s    10.2.1.124    bj-tw.062149.aliyun.com   <none>           <none>
      nginx-fixed-57d4c9bd97-t****                   1/1     Running             0          39s    10.2.1.125    bj-tw.062148.aliyun.com   <none>           <none>
      The output shows that the IP addresses of the pods are allocated from the vSwitch that you specified.
    2. Run the following command to increase the number of pods to 30:
      kubectl scale deployment nginx-fixed --replicas=30
      After the pods are scaled out, query the IP addresses of the pods again. Expected output:
      nginx-fixed-57d4c9bd97-2****                   1/1     Running     0          60s     10.2.1.132    bj-tw.062148.aliyun.com   <none>           <none>
      nginx-fixed-57d4c9bd97-4****                   1/1     Running     0          60s     10.2.1.144    bj-tw.062149.aliyun.com   <none>           <none>
      nginx-fixed-57d4c9bd97-5****                   1/1     Running     0          60s     10.2.1.143    bj-tw.062148.aliyun.com   <none>           <none>
      ...
      The output shows that the IP addresses of the pods are allocated from the vSwitch that you specified. You can add the CIDR block of the vSwitch to the whitelist of a database service that you want to access. This way, access control is enforced for pods that use dynamic IP addresses.
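
With 30 or more pods, checking the addresses by eye becomes tedious. The following Python sketch shows one way to confirm that every pod IP address belongs to the CIDR block of the vSwitch. The helper name ips_outside_cidr is hypothetical, and the sample addresses are the ones that appear in the expected output above.

```python
import ipaddress


def ips_outside_cidr(pod_ips, cidr):
    """Return the pod IP addresses that do not belong to the given CIDR block."""
    network = ipaddress.ip_network(cidr)
    return [ip for ip in pod_ips if ipaddress.ip_address(ip) not in network]


# Pod IP addresses copied from the expected output above.
pod_ips = ["10.2.1.124", "10.2.1.125", "10.2.1.132", "10.2.1.144", "10.2.1.143"]

# An empty list means that every pod IP address falls within 10.2.1.0/24.
print(ips_outside_cidr(pod_ips, "10.2.1.0/24"))  # prints []
```

Any address that the helper returns points to a pod that did not receive its IP address from the specified vSwitch and would be rejected by the whitelist.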
Note
  • We recommend that you create new nodes. If you use existing nodes, you must disassociate ENIs from the ECS instances before you can add the ECS instances to the cluster. You must add existing nodes in Auto mode. In Auto mode, the system disks of the nodes are replaced. For more information, see Unbind an ENI and Automatically add ECS instances.
  • You must add specific labels and taints to the nodes in the node pools that you want to use to host client pods. This ensures that irrelevant pods are not scheduled to the node pools.
  • After you create the eni-config-fixed ConfigMap, it overwrites the eni-config ConfigMap. For more information about how to configure the eni-config ConfigMap, see Dynamic Terway configuration on nodes.
  • We recommend that you specify a vSwitch that can provide at least twice as many IP addresses as the client pods. This ensures sufficient IP addresses when pods are scaled out or IP addresses cannot be reclaimed.
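
The sizing recommendation in the last item can be checked with simple arithmetic. The following Python sketch is a rough estimate only. The function name vswitch_is_large_enough is hypothetical, and the reserved parameter is an assumption about how many addresses in the vSwitch cannot be assigned to pods. Verify the actual number for your VPC, and keep in mind that nodes and other resources in the same vSwitch also consume IP addresses.

```python
import ipaddress


def vswitch_is_large_enough(cidr, client_pods, reserved=4):
    """Check the "at least twice as many IP addresses as pods" recommendation.

    reserved is an assumption: the number of addresses in the vSwitch that
    cannot be assigned to pods. Check your VPC documentation for the real value.
    """
    usable = ipaddress.ip_network(cidr).num_addresses - reserved
    return usable >= 2 * client_pods


# A /24 vSwitch has 256 addresses. With 4 assumed reserved, 252 remain.
print(vswitch_is_large_enough("10.2.1.0/24", 30))   # 252 >= 60, prints True
print(vswitch_is_large_enough("10.2.1.0/24", 200))  # 252 < 400, prints False
```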

What do I do if I fail to ping ECS nodes from a pod?

Symptom:

Your cluster uses Flannel as the network plug-in and the VPN gateway works as expected. After you log on to a pod, you fail to ping specific ECS nodes.

Cause:

The following causes are possible:
  • Cause 1: The ECS instances that you accessed from the pod are deployed in the same VPC as the cluster in which the pod is deployed but do not belong to the same security group as the cluster.
  • Cause 2: The ECS instances that you accessed from the pod are not deployed in the same VPC as the cluster in which the pod is deployed.

Solution:

The solution to the issue varies based on the cause.
  • If the issue is caused by Cause 1, you must add the ECS instances to the security group to which the cluster belongs. For more information, see Configure a security group.
  • If the issue is caused by Cause 2, you must add the public IP address of the cluster in which the pod is deployed to the inbound rules of the security group to which the ECS instances belong.