Background information

Due to errors in the Elastic Compute Service (ECS) OpenAPI on February 23, 2021, IP addresses allocated to pods that were created on that day may be invalid. As a result, these pods may be inaccessible. Only ACK clusters that run Terway in exclusive ENI mode or inclusive ENI mode are affected. We recommend that you perform the following steps to find the affected pods and fix the issue.


Step 1: Run a script to scan nodes

Run the following script on each node:
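set -e
# The address that curl probes is missing from this copy of the script. CHECK_URL is a placeholder, not part of the original.
: "${CHECK_URL:?set CHECK_URL to the address that curl should probe}"
err() {
    echo "error at line $1"
}
trap 'err $LINENO' ERR
# Probe the network from inside the network namespace of the container.
check() {
    cid=$1
    pid=$(docker inspect $cid -f '{{.State.Pid}}')
    if [ -z "$pid" ]; then
        echo "cannot get pid from container $cid"
        return 1
    fi
    nsenter -t $pid -n curl -s --connect-timeout 4 -o /dev/null "$CHECK_URL"
}
# Check every Kubernetes-managed container except the pause (k8s_POD) containers.
for line in $(docker ps|grep -v k8s_POD|awk '$NF~/^k8s_/{print $1"_"$NF}'|awk -F_ '{print $1"_"$3"_"$4"_"$5}'); do
    IFS=_ read cid cname pod namespace <<< $line
    if ! check $cid; then
        echo "pod $namespace/$pod has connectivity issues"
    fi
    if [[ "$cname" == "terway" && "$namespace" == "kube-system"  && "$pod" =~ ^(terway-|terway-eniip-|terway-eni-) ]]; then
        terway_container=$cid  # remember the Terway container on this node
    fi
done
# Ask Terway for its pod mapping and report the entries whose third column is X.
if [ -n "$terway_container" ]; then
    for pod in $(docker exec -it $terway_container terway-cli mapping|sed -r "s/\x1B\[([0-9]{1,3}(;[0-9]{1,2})?)?[mGK]//g"|awk '$3=="X"{print $1}'); do
        echo "pod $pod on this host has connectivity issues"
    done
fi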

Expected output:

pod *** has connectivity issues

If output similar to the preceding line is returned, the pod may have been assigned an invalid IP address.
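
When many pods are reported, it can help to reduce the scan output to a list of pod names. The following is a minimal sketch, not part of the original procedure. The function name affected_pods and the file name scan.sh are hypothetical, and the sketch assumes that each report line has the form "pod NAME ... has connectivity issues", as printed by the script in Step 1.

```shell
# Hypothetical helper: read the scan output on stdin and print each
# reported pod once. Lines that are not pod reports are ignored.
affected_pods() {
    awk '$1 == "pod" && /has connectivity issues$/ { print $2 }' | sort -u
}
```

For example, if the script from Step 1 is saved as scan.sh, you could run `bash scan.sh | affected_pods` on a node. Pods reported by the per-container check appear as namespace/pod. The names reported from the Terway mapping are printed exactly as Terway returns them.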

Step 2: Recreate the pods that cannot connect to the network

  1. Recreate the Terway pods so that the Terway plug-in is reloaded on the nodes.
    kubectl -n kube-system delete pod -l app=terway
    kubectl -n kube-system delete pod -l app=terway-eniip
    kubectl -n kube-system delete pod -l app=terway-eni
  2. Recreate the pods that cannot connect to the network.
    • If the pod was created by a Deployment or DaemonSet, delete the pod. The system then recreates it automatically.
    • If the pod was created manually, delete the pod and then recreate it yourself.
Note: The commands in substep 1 reload the Terway plug-in on the nodes where pods cannot connect to the network. After you run them, repeat Step 1: Run a script to scan nodes to check whether the cluster works as expected. If the issue persists, submit a ticket.
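
The pod recreation in Step 2 can be sketched as two small helpers. This is an illustrative sketch under stated assumptions, not an official tool: the function names owner_kinds and delete_cmd are hypothetical, and both expect a pod in the namespace/pod form printed by the scan script. owner_kinds queries the ownerReferences of the pod. Pods created from a Deployment are typically owned by a ReplicaSet, and DaemonSet pods by a DaemonSet. Empty output suggests that the pod was created manually and will not be recreated for you. delete_cmd only prints the delete command so that you can review it before you run it.

```shell
# Hypothetical helper: print the kinds of the objects that own a pod.
# Empty output means that no controller will recreate the pod.
owner_kinds() {
    local ns pod
    ns=${1%%/*}
    pod=${1#*/}
    kubectl -n "$ns" get pod "$pod" -o jsonpath='{.metadata.ownerReferences[*].kind}'
}

# Hypothetical helper: turn "namespace/pod" into the matching delete
# command. The command is printed, not executed.
delete_cmd() {
    local ns pod
    ns=${1%%/*}
    pod=${1#*/}
    printf 'kubectl -n %s delete pod %s\n' "$ns" "$pod"
}
```

For example, `owner_kinds default/web-0` shows whether a controller manages the pod, and `delete_cmd default/web-0` prints the command to run once you have confirmed that the pod can be recreated.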