FAQ about Adding Nodes to an ACK Cluster

Overview

This document answers frequently asked questions (FAQ) about adding existing Elastic Compute Service (ECS) instances to a Container Service for Kubernetes (ACK) cluster.

Details

Prerequisites

If an ECS instance fails to be added, make sure that the following prerequisites are met:

The ACK cluster is running.
The ECS instance and the Kubernetes cluster reside in the same virtual private cloud (VPC) in the same region.
The ECS instance has not been added to another ACK cluster.
The ECS instance is running.
The ECS instance can access the Internet: either an elastic IP address (EIP) is bound to the instance, or source network address translation (SNAT) entries are configured for the vSwitch of the instance.
The ECS instance runs the CentOS operating system.
No more than 100 ECS instances are added at a time.
A sufficient quota is provided for adding ECS instances to the ACK cluster.
If the Flannel plug-in is used, the quota of route entries for a VPC is sufficient.
If you are a RAM user, you have permissions to manage nodes in the ACK cluster.
We recommend that you use the default system image provided by ACK. For more information about custom images, see Use a custom image to create an ACK cluster.
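
Two of the prerequisites above, the operating system and Internet access, can be checked from the ECS instance itself before it is added. The following is a minimal sketch, not an official ACK tool: the function names is_centos and has_internet and the default probe URL are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: is_centos and has_internet are hypothetical helper names.

# Succeeds if the given os-release file reports CentOS.
# The path argument defaults to the standard location on the instance.
is_centos() {
  local file="${1:-/etc/os-release}"
  grep -Eq '^ID="?centos"?$' "$file"
}

# Succeeds if an HTTPS request to the given URL completes within 5 seconds.
# The default URL is an arbitrary public endpoint chosen for illustration.
has_internet() {
  local url="${1:-https://www.alibabacloud.com}"
  curl -sS --max-time 5 -o /dev/null "$url"
}

# Usage on the instance:
#   is_centos && has_internet && echo "OS and Internet access look fine"
```

The remaining prerequisites, such as quotas and RAM permissions, still need to be checked in the console.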

Procedure of adding an ECS instance

The following figure describes how to add an ECS instance to an ACK cluster by calling API operations.

Troubleshooting

Log on to the ACK console. Find the cluster that you want to manage and click View Logs in the Actions column. On the log information page of the cluster, troubleshoot errors based on the log information of the cluster. A "Node XXX joined cluster successfully" message indicates that an ECS instance is added to the cluster.
Troubleshoot a specific error. A "Wait k8s node XXX join cluster timeout" message indicates that an error occurred when the deployment script was executed on the node. In this case, log on to the ECS instance on which the error occurred and troubleshoot the error based on the deployment logs. Run the following command to obtain the deployment logs:
```
grep cloud-init /var/log/messages
```

Check the cluster status and system configurations by using the intelligent operations and maintenance (O&M) feature. For more information, see Use cluster check to troubleshoot cluster issues.
Collect diagnostic logs to troubleshoot errors. For more information, see How do I collect Kubernetes diagnosis information when a Kubernetes cluster exception or a cluster node exception occurs?
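
The first two troubleshooting steps can be sketched as small shell helpers: one that classifies a cluster log message by the two messages quoted above, and one that narrows the cloud-init deployment log to lines that look like failures. The function names and the list of failure keywords are assumptions for illustration, not part of ACK.

```shell
#!/usr/bin/env bash
# Sketch only: join_status and cloud_init_failures are hypothetical helper names.

# Classify a cluster log message as joined, timeout, or unknown.
join_status() {
  case "$1" in
    *"joined cluster successfully"*) echo "joined" ;;
    *"join cluster timeout"*)        echo "timeout" ;;
    *)                               echo "unknown" ;;
  esac
}

# Print cloud-init lines that contain a failure keyword.
# The keyword list is an assumption and may need to be extended.
cloud_init_failures() {
  local log="${1:-/var/log/messages}"
  grep 'cloud-init' "$log" | grep -Ei 'error|fail|timed out|refused'
}

# Usage on the ECS instance:
#   cloud_init_failures
```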

FAQ

Error message	Description and solution
Code:ForbiddenAttachInstance, Message:Forbidden attach instance	The error message returned because the RAM user does not have the permissions to manage the Kubernetes cluster. For more information about how to grant permissions to RAM users, see Authorization overview.
Code:ErrorNoAttachEcsInstance, Message:ecs instances invalid	The error message returned because the ECS instance does not meet the requirements. Adjust the configurations of the ECS instance based on the prerequisites.
Throttling Message: Request was denied due to request throttling.	The error message returned because the API request was denied due to throttling. Try again later.
Code: 404 Code: InvalidImageId.NotFound Message: The specified ImageId does not exist	The error message returned because the custom image ID does not exist. Make sure that the custom image ID is valid.
Code: IncorrectInstanceStatus Message: The specified instance is in an incorrect status for the requested action	The error message returned because the status check of the ECS instance failed. Make sure that the ECS instance is running.
Code: OperationDenied.UnpaidOrder Message: The specified instance has unpaid order.	The error message returned because the ECS instance has unpaid bills. Pay the bills and try again later.
error on the server ("Get https://XXXX:XX/api/v1/namespaces/kube-system/services/kube-dns: net/http: request canceled while waiting for connection	The error message returned because the kube-dns service is unavailable. Run the kubectl -n kube-system get svc command to check that the kube-dns service is available.
OperationDenied Message: The specified image contains the snapshot of the data disk,does not support this operation.	The error message returned because the custom image contains a snapshot of a data disk. Detach the data disk from the ECS instance and create a new custom image.
Failed to config security group: wait for ecs instance join to security group i-xx running timeout	The error message returned because the ECS instance failed to be added to the default security group of the cluster. Manually add the ECS instance to the security group.
Failed to start instance i-xx: Aliyun API Error: RequestId: 909DA063-0BAE-4C40-844C-01FDAA502F80 Status Code: 403 Code: IncorrectInstanceStatus Message: The specified instance is in an incorrect status for the requested action; Status of the specified instance is Running but the expected status is in (Stopped).	The error message returned because the ECS instance named i-xx is not in the expected status. In most cases, this error occurs because the instance was operated on manually while it was being added. Add the instance again and do not operate on it manually until the process is complete.
Failed to attach node i-xxxx, err Aliyun API Error: RequestId: 7CE63A45-7932-493D-AE54-D1F199FD1EC7 Status Code: 403 Code: OperationDenied.UnpaidOrder Message: The specified instance has unpaid order.	The error message returned because the ECS instance named i-xx has unpaid bills. Pay the bills.
mount: unknown filesystem type 'swap'	The error message returned because the disk contains swap partitions. Format the disk as ext4 or delete all partitions.
error ipv4 ip_forward not set to 1	The error message returned because the ip_forward parameter is not set to 1. Run the following command on each node to set the parameter to 1: echo 1 > /proc/sys/net/ipv4/ip_forward
May 27 17:11:32 iZuXXXz2lZ cloud-init: [kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused. May 27 17:11:32 iZuXXXz2lZ cloud-init: Unfortunately, an error has occurred: May 27 17:11:32 iZuXXXz2lZ cloud-init: timed out waiting for the condition May 27 17:11:32 iZuXXXz2lZ cloud-init: This error is likely caused by: May 27 17:11:32 iZuXXXz2lZ cloud-init: - The kubelet is not running May 27 17:11:32 iZuXXXz2lZ cloud-init: - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)	The error message returned because kubelet failed to be started. You can troubleshoot the error by running the following command: journalctl -u kubelet
curl -k --connect-timeout 4 https://172.XXX.XXX.184:XXXX/version curl: (28) Connection timed out after 4001 milliseconds	The error message returned because the connection to the API server timed out. · Check whether the health check feature of the internal-facing Server Load Balancer (SLB) instance works as expected. · Check whether an access control list (ACL) is correctly configured for the internal-facing SLB instance. · Troubleshoot the error by using the intelligent O&M feature.
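
For the ip_forward error in the table above, the current value can be checked before it is changed. A minimal sketch follows. The function name ip_forward_enabled is an assumption for illustration, and its optional path argument exists only so that the check can be tried on a sample file.

```shell
#!/usr/bin/env bash
# Sketch only: ip_forward_enabled is a hypothetical helper name.

# Succeeds if the kernel setting file contains 1.
ip_forward_enabled() {
  local path="${1:-/proc/sys/net/ipv4/ip_forward}"
  [ "$(cat "$path")" = "1" ]
}

# Usage on a node (as root):
#   ip_forward_enabled || echo 1 > /proc/sys/net/ipv4/ip_forward
```

A value written to /proc/sys in this way is lost on reboot. Adding net.ipv4.ip_forward = 1 to /etc/sysctl.conf and running sysctl -p is the usual way to keep the setting.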
