Container Service for Kubernetes: Improve the performance of the NetworkPolicy feature for a large ACK cluster in Terway mode

Last Updated: May 23, 2024

In a Container Service for Kubernetes (ACK) cluster that has the Terway network plug-in installed, you can use the NetworkPolicy feature to control communication among pods. When such a cluster contains more than 100 nodes, the NetworkPolicy agents place a heavy load on the management components of the cluster. This topic describes how to optimize the performance of the NetworkPolicy feature for a large ACK cluster in Terway mode.
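
For example, network policies are standard Kubernetes resources. The following manifest (the namespace and pod labels are illustrative) allows pods labeled app: backend to receive ingress traffic only from pods labeled app: frontend:

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: backend-allow-frontend
      namespace: default
    spec:
      podSelector:
        matchLabels:
          app: backend
      policyTypes:
        - Ingress
      ingress:
        - from:
            - podSelector:
                matchLabels:
                  app: frontend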

Background information

Terway implements the NetworkPolicy feature by using the Felix agent of Calico. In an ACK cluster that contains more than 100 nodes, the Felix agent on each node watches the API server for policy rules, which places a heavy load on the API server. To reduce this load, you can deploy the Typha component as a repeater that fans out policy updates from the API server to the Felix agents, or you can disable the NetworkPolicy feature.

You can improve the performance of the NetworkPolicy feature for a large ACK cluster in the following ways:

  • Deploy Typha as a repeater.

  • Disable the NetworkPolicy feature.

    Note

    After you disable the NetworkPolicy feature, you cannot use network policies to control communication among pods.

Deploy Typha as a repeater

  1. Log on to the ACK console.

  2. Update Terway to the latest version. For more information, see Manage components.

    The components that are used vary based on the Terway mode. For more information, see Compare Terway modes.

  3. Create a file named calico-typha.yaml and copy the following content to the file to deploy Typha as a repeater.

    apiVersion: v1
    kind: Service
    metadata:
      name: calico-typha
      namespace: kube-system
      labels:
        k8s-app: calico-typha
    spec:
      ports:
        - port: 5473
          protocol: TCP
          targetPort: calico-typha
          name: calico-typha
      selector:
        k8s-app: calico-typha
    
    ---
    
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: calico-typha
      namespace: kube-system
      labels:
        k8s-app: calico-typha
    spec:
      replicas: 3 # Modify the value of the replicas parameter based on the cluster size. Create 1 replica for every 200 nodes. You must create at least three replicas. 
      revisionHistoryLimit: 2
      selector:
        matchLabels:
          k8s-app: calico-typha
      template:
        metadata:
          labels:
            k8s-app: calico-typha
          annotations:
            cluster-autoscaler.kubernetes.io/safe-to-evict: 'true'
        spec:
          nodeSelector:
            kubernetes.io/os: linux
          hostNetwork: true
          tolerations:
            - operator: Exists
          serviceAccountName: terway
          priorityClassName: system-cluster-critical
          containers:
          - image: registry-vpc.{REGION-ID}.aliyuncs.com/acs/typha:v3.20.2 # Replace {REGION-ID} with the region ID of the cluster. 
            name: calico-typha
            ports:
            - containerPort: 5473
              name: calico-typha
              protocol: TCP
            env:
              - name: TYPHA_LOGSEVERITYSCREEN
                value: "info"
              - name: TYPHA_LOGFILEPATH
                value: "none"
              - name: TYPHA_LOGSEVERITYSYS
                value: "none"
              - name: TYPHA_CONNECTIONREBALANCINGMODE
                value: "kubernetes"
              - name: TYPHA_DATASTORETYPE
                value: "kubernetes"
              - name: TYPHA_HEALTHENABLED
                value: "true"
            livenessProbe:
              httpGet:
                path: /liveness
                port: 9098
                host: localhost
              periodSeconds: 30
              initialDelaySeconds: 30
            readinessProbe:
              httpGet:
                path: /readiness
                port: 9098
                host: localhost
              periodSeconds: 10
    
    ---
    
    apiVersion: policy/v1 # If the Kubernetes version of the cluster is earlier than 1.21, set the value of the apiVersion parameter to policy/v1beta1. 
    kind: PodDisruptionBudget
    metadata:
      name: calico-typha
      namespace: kube-system
      labels:
        k8s-app: calico-typha
    spec:
      maxUnavailable: 1
      selector:
        matchLabels:
          k8s-app: calico-typha
    
    ---
    
    apiVersion: apiextensions.k8s.io/v1
    kind: CustomResourceDefinition
    metadata:
      name: bgppeers.crd.projectcalico.org
    spec:
      scope: Cluster
      group: crd.projectcalico.org
      versions:
      - name: v1
        served: true
        storage: true
        schema:
          openAPIV3Schema:
            type: object
            properties:
              apiVersion:
                type: string
      names:
        kind: BGPPeer
        plural: bgppeers
        singular: bgppeer
    Note

    • Replace {REGION-ID} with the region ID of the cluster.

    • Modify the value of the replicas parameter based on the cluster size. Create 1 replica for every 200 nodes. You must create at least three replicas.

    • Modify the value of the apiVersion parameter of PodDisruptionBudget based on the Kubernetes version of the cluster. If the Kubernetes version of the cluster is 1.21 or later, set the value of the apiVersion parameter to policy/v1. If the Kubernetes version of the cluster is earlier than 1.21, set the value of the apiVersion parameter to policy/v1beta1.
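
    To choose a value for the replicas parameter, you can count the nodes in the cluster and apply the sizing rule from the preceding note. The following shell snippet is a minimal sketch:

    # Count the nodes in the cluster.
    NODES=$(kubectl get nodes --no-headers | wc -l)
    # One replica for every 200 nodes, rounded up, with a minimum of three.
    REPLICAS=$(( (NODES + 199) / 200 ))
    if [ "$REPLICAS" -lt 3 ]; then REPLICAS=3; fi
    echo "$REPLICAS"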

  4. Run the following command to deploy Typha as a repeater:

    kubectl apply -f calico-typha.yaml
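
    After you deploy Typha, you can verify that its pods are running. The following command filters by the k8s-app: calico-typha label that is used in the preceding manifest:

    kubectl get pods -n kube-system -l k8s-app=calico-typha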
  5. Run the following command to modify the eni-config ConfigMap of the Terway plug-in:

    kubectl edit cm eni-config -n kube-system

    Add the felix_relay_service: calico-typha repeater configuration to the file and set the disable_network_policy parameter to "false". If the disable_network_policy parameter does not exist, you do not need to add it. Both parameters must be at the same indentation level as the eni_conf parameter.

      felix_relay_service: calico-typha
      disable_network_policy: "false" # If this parameter is unavailable, you do not need to add the setting.
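
    After you save the changes, the data section of the eni-config ConfigMap looks similar to the following sketch. The eni_conf value varies by cluster and is omitted here:

    data:
      eni_conf: |
        ...
      felix_relay_service: calico-typha
      disable_network_policy: "false"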
  6. Run the following command to restart Terway:

    kubectl get pod -n kube-system  | grep terway | awk '{print $1}' | xargs kubectl delete -n kube-system pod

    Expected output:

    pod "terway-eniip-8hmz7" deleted
    pod "terway-eniip-dclfn" deleted
    pod "terway-eniip-rmctm" deleted
    ...
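
    The Terway DaemonSet recreates the deleted pods. You can run the following command to check that the new pods reach the Running state:

    kubectl get pod -n kube-system | grep terway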

Disable the NetworkPolicy feature

If you no longer need to use network policies, you can disable the NetworkPolicy feature to remove the load that the NetworkPolicy agents place on the API server.

  1. Run the following command to modify the eni-config ConfigMap of the Terway plug-in, and add the disable_network_policy: "true" setting to disable the NetworkPolicy feature:

    kubectl edit cm -n kube-system eni-config 
    # Add the following key, or set it to "true" if it already exists:
    disable_network_policy: "true"
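
    Alternatively, you can make the same change without an interactive editor. The following kubectl patch command is an equivalent sketch:

    kubectl patch configmap eni-config -n kube-system --type merge -p '{"data":{"disable_network_policy":"true"}}'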
  2. Run the following command to restart Terway:

    kubectl get pod -n kube-system  | grep terway | awk '{print $1}' | xargs kubectl delete -n kube-system pod

    Expected output:

    pod "terway-eniip-8hmz7" deleted
    pod "terway-eniip-dclfn" deleted
    pod "terway-eniip-rmctm" deleted
    ...

Result

After the preceding operations are complete, the NetworkPolicy agents connect to the Typha component instead of directly to the API server. This reduces the load on the API server. You can monitor the traffic that is distributed to the Server Load Balancer (SLB) instance of the API server to check whether the load on the API server is reduced.
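
As a quick check, which is not part of the preceding procedure, you can also verify that the calico-typha Service has ready endpoints for the Felix agents to connect to:

    kubectl get endpoints calico-typha -n kube-system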