
FAQ about the Scale-out of a Kubernetes Cluster

Last Updated: May 10, 2021

Overview

This topic describes common issues that occur when you add Elastic Compute Service (ECS) instances to a Container Service for Kubernetes (ACK) cluster to scale out the cluster, and how to resolve them.

Details

Take note of the following items:

  • Before you perform operations that may cause risks, such as modifying instance configurations or data, check the disaster recovery and fault tolerance capabilities of the instances to ensure data security.
  • We recommend that you back up logs or create snapshots before you modify the configurations or data of instances, including but not limited to ECS and ApsaraDB RDS instances.
  • If you have granted permissions or submitted sensitive information, such as a logon account and password, in the Alibaba Cloud Management Console, change this information as soon as possible.

Prerequisites

If ECS instances fail to be added to the cluster, check that the following prerequisites are met:

  • The ACK cluster is running.
  • No more than 100 ECS instances are added at a time.
  • The quota for adding ECS instances to the ACK cluster is sufficient.
  • The vCPU quota for ECS instances in your account is sufficient.
  • If the Flannel plug-in is used, the quota of route entries for the virtual private cloud (VPC) is sufficient.
  • The cluster uses its default security group, or an existing security group that meets the following requirements:
    • The security group and the cluster are in the same VPC.
    • If a basic security group is used, its rules allow inbound traffic from the internal network, with the Classless Inter-Domain Routing (CIDR) blocks of pods as the source.
    • If an advanced security group is used, its rules allow inbound traffic from the internal network, with the CIDR blocks of vSwitches as the source.
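Because each scale-out can add at most 100 ECS instances, a larger scale-out has to be split into several requests. The following bash function is a minimal sketch of that arithmetic. plan_batches and MAX_BATCH are hypothetical names, not part of ACK or any Alibaba Cloud tool, and the function only prints batch sizes; it does not call any API.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an ACK tool): split a desired scale-out into
# batches that respect the limit of 100 ECS instances per scale-out.
MAX_BATCH=100

plan_batches() {
  local total=$1 batch
  # Accept only plain decimal digits.
  if ! [[ $total =~ ^[0-9]+$ ]]; then
    echo "error: instance count must be a positive integer" >&2
    return 1
  fi
  total=$((10#$total))   # force base 10 so input such as "080" is not read as octal
  if (( total == 0 )); then
    echo "error: instance count must be a positive integer" >&2
    return 1
  fi
  # Emit full batches of MAX_BATCH, then the remainder.
  while (( total > 0 )); do
    batch=$(( total < MAX_BATCH ? total : MAX_BATCH ))
    echo "$batch"
    total=$(( total - batch ))
  done
}
```

For example, plan_batches 250 prints 100, 100, and 50 on separate lines, one line per scale-out request.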

Troubleshooting

  1. Log on to the ACK console. Find the cluster that you want to manage and click View Logs in the Actions column. On the cluster log page, use the log entries to troubleshoot errors. A "Node XXX joined cluster successfully" message indicates that an ECS instance was added and the cluster was scaled out.
  2. Check the scaling activity messages. If the cluster logs contain a message such as "Scaling activity asa-bp1fik5agdxxx, add "1" ecs instance, status:Successful", all ECS instances were created and added to the cluster. If the value of the status parameter is Warning, not all ECS instances were created and added to the cluster.
  3. Troubleshoot specific errors. A "Wait k8s node XXX join cluster timeout" message indicates that an error occurred when the deployment script ran on the node. In this case, log on to the ECS instance where the error occurred and check its deployment logs. Run the following command to obtain the deployment logs:
    grep cloud-init /var/log/messages
  4. Check the cluster status and system configurations by using the intelligent operations and maintenance (O&M) feature. For more information, see Use cluster check to troubleshoot cluster issues.
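If you copy the cluster log entries into a text file, the checks in the steps above can be partly automated. The following bash function is a sketch under that assumption. classify_scaleout_log is a hypothetical helper that matches only the message fragments quoted in this topic, so the patterns may need adjusting if the log wording differs in your ACK version.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read copied cluster log lines from stdin and tag the
# ones that match the messages described in the troubleshooting steps.
classify_scaleout_log() {
  local line
  # "|| [[ -n $line ]]" also processes a last line that lacks a trailing newline.
  while IFS= read -r line || [[ -n $line ]]; do
    case "$line" in
      *"join cluster timeout"*)
        # Deployment script failed on the node: inspect its cloud-init logs.
        echo "TIMEOUT (on the node, run: grep cloud-init /var/log/messages): $line" ;;
      *"joined cluster successfully"*)
        echo "JOINED: $line" ;;
      *"status:Successful"* | *"status: Successful"*)
        echo "SCALED: $line" ;;
      *"status:Warning"* | *"status: Warning"*)
        # Not all requested ECS instances were created.
        echo "PARTIAL: $line" ;;
      *) ;;  # ignore unrelated lines
    esac
  done
}
```

For example, run classify_scaleout_log < cluster.log, where cluster.log is a hypothetical file that holds the copied log entries.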

FAQ

Error message

Description

Solution

Failed to ExecuteScalingRule, err: SDK.ServerError ErrorCode: ScalingRule.InvalidScalingRuleType Recommend: RequestId: AEB2E940-3F70-41B5-95A3-0FFBF7C430CE Message: Specific scaling rule type: TargetTrackingScalingRule can not be executed.

This error is returned because the scaling rule type is invalid. A target tracking scaling rule (TargetTrackingScalingRule) cannot be executed to scale out the cluster.

Use a simple scaling rule.

Scaling activity asa-2zxxxx, add "20" ecs instances, status: Warning

This message is returned because not all ECS instances were created, which happens when ECS instances of the specified instance type are out of stock.

Scale out the cluster by using multiple ECS instance types.

Failed to DescribeScalingActivities error Reson : A user requests to execute scaling rule "asr-xxx", changing the Total Capacity from "1" to "2". , Description : Add "1" ECS instance , Detail : Fail to create Instances into scaling group(code:"QuotaExceed.ElasticQuota", msg:"No available ecs quota for the specified ecs instance type."). , Error code: ScalingActivity.Failed

This error is returned because no ECS quota is available for the specified instance type, which means that the instances are out of stock.

Scale out the cluster by specifying multiple instance types and multiple zones.

Failed to attach node i-xxxxx, err Aliyun API Error: RequestId: 89515AB0-E3EE-4FD2-996A-AC031D3CF921 Status Code: 404 Code: NoSuchResource Message: The specified resourceis not found.

This error is returned because a large number of ECS instances are waiting to be added in the background.

Submit a ticket.

Failed to ModifyScalingConfiguration, err: SDK.ServerError ErrorCode: InvalidSecurityGroupId.NotFound Recommend: RequestId: 498018EA-2F0B-4A19-B1D1-FBA93CC3A83E Message: The specified value of parameter "SecurityGroupId" is not valid.

This error is returned because the security group of the cluster has been deleted, which causes the scale-out to fail.

Specify an existing security group for the scale-out, or submit a ticket.
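To look up a solution quickly, the FAQ table can be expressed as a substring match on the error text. The following bash function is a minimal sketch. suggest_fix is a hypothetical name, and it recognizes only the error codes and messages listed in the table; anything else is reported as unknown.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map an error line from the cluster logs to the
# solution listed in the FAQ table. Pass the full error text as one argument.
suggest_fix() {
  local msg=$1
  case "$msg" in
    *ScalingRule.InvalidScalingRuleType*)
      echo "Use a simple scaling rule." ;;
    *QuotaExceed.ElasticQuota*)
      echo "Specify multiple instance types and zones." ;;
    *"status: Warning"* | *"status:Warning"*)
      echo "Scale out the cluster by using multiple ECS instance types." ;;
    *NoSuchResource*)
      echo "Submit a ticket." ;;
    *InvalidSecurityGroupId.NotFound*)
      echo "Specify an existing security group for the scale-out, or submit a ticket." ;;
    *)
      echo "Unknown error: check the cluster logs." >&2
      return 1 ;;
  esac
}
```

For example, suggest_fix "$(grep -m1 ErrorCode cluster.log)" prints the matching solution, assuming a hypothetical file cluster.log that contains the copied log entries.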

Applicable scope

  • ACK