This topic describes common issues with the Linux and Windows operating systems of Elastic Compute Service (ECS) instances and how to resolve them.
Common issues of Linux operating systems and their solutions
Instance startup failure
Check whether the fstab file contains all the block device names of the instance.
If a block device exists on the instance but the name of the block device is not present in the fstab file, the system may be unable to restart. You must remove from the instance the block device whose name is not present in the /etc/fstab file. For more information, see How do I remove a block device that does not exist in the /etc/fstab file of a Linux instance?
Check whether the block devices contained in the fstab file are properly attached to the instance.
If a block device is not properly attached, the system may be unable to restart. For more information, see A Linux instance has a disk that is not properly mounted.
Check whether the format of the fstab file is valid.
If the format of the /etc/fstab configuration file is invalid, the system may be unable to restart. For more information, see How do I fix a format error of the /etc/fstab configuration file of a Linux instance?
Run the fsck command to check system files.
If a file system is damaged, the instance may be unable to start. For more information, see How to check and fix the file systems of Linux instances.
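As a rough illustration of the fstab format check above, the following Python sketch flags lines that do not look like valid entries. It assumes the conventional layout of up to six whitespace-separated fields (device, mount point, file system type, options, dump, pass). The function name `check_fstab` and the four-field minimum are assumptions made for this example and are not part of any ECS tool.

```python
def check_fstab(text):
    """Return (line_number, problem) tuples for suspicious fstab lines.

    Heuristic only: assumes 4 to 6 whitespace-separated fields per entry
    and numeric values in the optional dump and pass fields.
    """
    problems = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if not 4 <= len(fields) <= 6:
            problems.append((lineno, f"expected 4 to 6 fields, found {len(fields)}"))
            continue
        # The fifth and sixth fields, when present, should be integers.
        for name, value in zip(("dump", "pass"), fields[4:]):
            if not value.isdigit():
                problems.append((lineno, f"{name} field is not a number: {value!r}"))
    return problems
```

To try it on an instance, pass the contents of /etc/fstab to `check_fstab`. An empty result only means that no obvious layout problem was found. It does not confirm that the listed devices exist.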
Instance logon failure
Check whether the limits configuration file is correct.
You can use the /etc/security/limits.conf configuration file to limit system resources for a Linux instance. If the value of nofile in the system is greater than the value of nr_open, you may be unable to connect to the instance. For more information, see How to adjust the limits system parameters of Linux instances.
Check whether the password of the critical system user (the root account) is contained in the /etc/shadow file.
If the password of the root account is not contained in the /etc/shadow file, you cannot log on to the instance. For more information, see The key system user does not exist in the Linux instance.
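The limits check and the root account check above can both be approximated with a few lines of Python. This is a minimal sketch. It assumes the usual four-column layout of limits.conf entries (domain, type, item, value) and the colon-separated layout of shadow entries. The function names are hypothetical, and non-numeric nofile values such as `unlimited` are ignored here.

```python
def nofile_over_nr_open(limits_text, nr_open):
    """Return (domain, type, value) for nofile limits larger than nr_open."""
    offenders = []
    for raw in limits_text.splitlines():
        fields = raw.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) == 4 and fields[2] == "nofile" and fields[3].isdigit():
            if int(fields[3]) > nr_open:
                offenders.append((fields[0], fields[1], int(fields[3])))
    return offenders


def has_root_entry(shadow_text):
    """Return True if shadow-style text contains a line for the root account."""
    return any(line.split(":", 1)[0] == "root" for line in shadow_text.splitlines())
```

On most Linux systems the current nr_open value can be read from /proc/sys/fs/nr_open and passed in as an integer. Reading /etc/shadow requires root privileges.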
Check the format of critical system files.
If a critical system file is not in the UNIX format, you may be unable to log on to the instance. For more information, see How can I modify the Unix format of a Linux instance?
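One common reason for a file not being in the UNIX format is Windows-style CRLF line endings. The sketch below, which assumes that this is the problem being checked, detects and converts such line endings in raw file content. Both function names are made up for this example.

```python
def has_dos_line_endings(data: bytes) -> bool:
    """Return True if the content contains Windows-style CRLF line endings."""
    return b"\r\n" in data


def to_unix_line_endings(data: bytes) -> bytes:
    """Convert CRLF line endings to LF, leaving all other bytes untouched."""
    return data.replace(b"\r\n", b"\n")
```

Read the file in binary mode before calling these functions, and back up any critical system file before writing a converted copy.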
Check whether SSH access permissions are properly configured.
If SSH access permissions are not properly configured in a Linux instance, you cannot log on to the instance. For more information, see Failed to remotely connect to a Linux instance due to an SSH access exception.
Check whether the critical files or directories required for SSH access exist.
If a critical file or directory that is required for SSH access (such as the sshd_config configuration file) is missing from a Linux instance, you may be unable to log on to the instance. For more information, see Check Linux instances for the required files or directories required by the SSH service.
Check whether the configured huge page size exceeds the permitted range.
If the huge page size of an instance exceeds the permitted range, you may be unable to log on to the instance and must adjust the huge page size in the /etc/sysctl.conf file. For more information, see How to adjust the huge page memory of Linux instances.
Check whether the operating system is out of memory.
If the operating system is out of memory, you may be unable to log on to the instance. For more information, see How to solve the OOM problem of Linux instances.
Check whether the system firewall is enabled.
If the firewall is enabled for an instance and has rules configured to block external access, you may be unable to connect to the instance. For more information, see Enable or disable the system firewall function for Linux instances.
Check whether TCP SACK is enabled.
If TCP SACK is disabled on a Linux instance, the network performance of the instance may be affected. For more information, see How to enable TCP SACK for a Linux instance.
Check whether the UDP buffer overflows.
If the UDP buffer of a Linux instance overflows, the network performance of the instance may be affected and you may be unable to log on to the instance. For more information, see Failure to connect to a Linux instance because the UDP buffer of the instance overflows.
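UDP buffer overflows are usually visible as growing error counters. The sketch below assumes the layout of /proc/net/snmp, where each protocol has a header line of counter names followed by a line of values, and extracts the two buffer error counters. The function name `udp_buffer_errors` is hypothetical.

```python
def udp_buffer_errors(snmp_text):
    """Extract UDP buffer error counters from /proc/net/snmp-style text."""
    # The first "Udp:" line holds counter names, the second holds the values.
    udp_lines = [l.split() for l in snmp_text.splitlines() if l.startswith("Udp:")]
    if len(udp_lines) < 2:
        return {}
    names, values = udp_lines[0][1:], udp_lines[1][1:]
    counters = dict(zip(names, (int(v) for v in values)))
    return {k: counters[k] for k in ("RcvbufErrors", "SndbufErrors") if k in counters}
```

A single reading shows the total since boot. Comparing two readings taken some time apart gives a better idea of whether the buffer is overflowing now.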
Check whether Security-Enhanced Linux (SELinux) is enabled.
If the SELinux service is enabled on an instance, an error may be reported when you attempt to connect to the instance. For more information, see An SSH remote connection exception occurs in a Linux instance because the SELinux service is enabled.
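As a quick way to see how SELinux is configured, the following sketch reads the `SELINUX=` setting from text in the style of /etc/selinux/config. It is illustrative only, and the function name `selinux_mode` is an assumption of this example.

```python
def selinux_mode(config_text):
    """Return the SELINUX= value from /etc/selinux/config-style text, or None."""
    for raw in config_text.splitlines():
        line = raw.strip()
        # Commented lines start with "#", and SELINUXTYPE= does not match "SELINUX=".
        if line.startswith("SELINUX="):
            return line.split("=", 1)[1].strip().lower()
    return None
```

The configuration file reflects the mode applied at boot. The mode that is currently in effect can differ if it was changed at run time.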
You cannot log on to an instance by using SSH or Virtual Network Computing (VNC).
You can detach the system disk from the instance, attach the system disk as a data disk to another instance, and then log on to the new instance by using SSH or VNC. For more information, see Detach the system disk from a Linux instance and attach the disk to another instance.
An error is reported when you attempt to connect to an instance.
When you attempt to use SSH to connect to the instance as the root user, the "Permission denied, please try again" error message is returned. For more information, see "Permission denied, please try again" error occurs when you log on to a Linux instance through SSH as the root user.
Instance access failure
Check whether the kernel parameters related to the NAT environment are valid.
If you use NAT to access a Linux instance over your private network and the kernel parameters related to the NAT environment are not properly configured, you cannot connect to the Linux instance by using SSH or access HTTP services on the instance. For more information, see Failed to access the instance through a NAT Linux due to kernel configuration issues.
Check whether processes are started and whether common ports are being listened to.
If you cannot access a service hosted on a Linux instance, the process of the service may not be running. For more information, see How do I start common services of a Linux instance and query the status of the listening port.
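To see which ports are being listened to, one option is to parse the kernel's TCP socket table. The sketch below assumes text in the /proc/net/tcp layout, where the second column is the local address with a hexadecimal port and the fourth column is the socket state (`0A` means LISTEN). The function name `listening_ports` is hypothetical.

```python
def listening_ports(proc_net_tcp_text):
    """Return the sorted local ports that are in LISTEN state."""
    ports = set()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) > 3 and fields[3] == "0A":  # 0A is the LISTEN state
            # Local address looks like 00000000:0016, with the port in hexadecimal.
            ports.add(int(fields[1].rsplit(":", 1)[1], 16))
    return sorted(ports)
```

If the expected port, such as 22 for SSH or 80 for HTTP, is missing from the result, the service process is probably not running. This sketch covers IPv4 sockets only. IPv6 listeners are listed separately in /proc/net/tcp6.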
Failed network connection
Check the Dynamic Host Configuration Protocol (DHCP) configurations.
By default, ECS instances use DHCP to assign IP addresses to elastic network interfaces (ENIs) and obtain the lease expiration time of the IP addresses. If the configuration file of an ENI is incorrect or if the dhclient process of an ENI does not run, the DHCP service on the instance may experience exceptions, which causes the instance to be disconnected from the network. For more information, see Check and repair the DHCP configurations for local network interface controllers on Linux instances.
Check whether a network-related process exists.
If no network process is present in the operating system and the network configuration is DHCP, the lease of IP addresses cannot be renewed after the lease expires. This causes the instance to be disconnected from the network. For more information, see The Linux network process does not exist.
Check whether network interface controller (NIC) multi-queue is enabled.
NIC multi-queue enables an ECS instance to use multiple NIC queues to improve network performance. Performance bottlenecks may occur when a single vCPU of an instance is used to process NIC interrupts. To solve this issue, you can use NIC multi-queue to distribute NIC interrupts across different vCPUs. For more information, see Configure NIC multi-queue.
Performance issues
Check whether the TCP backlog buffer overflows.
If the TCP backlog buffer overflows on a Linux instance, the network performance of the instance may be affected and you may be unable to log on to the instance. For more information, see Linux instance TCP backlog cache overflow caused by failure to connect to instance remotely.
Check whether CPU utilization exceeds the normal range.
High CPU utilization can affect system stability and business operations. For more information, see Query and case analysis Linux CPU load.
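A simple first look at CPU pressure is the load average relative to the number of vCPUs. The sketch below assumes text in the /proc/loadavg layout, where the first three values are the 1, 5, and 15 minute load averages. The function name is made up for this example, and load average is only a rough proxy for CPU utilization because on Linux it also counts tasks that are waiting on I/O.

```python
def load_per_cpu(loadavg_text, cpu_count):
    """Return the 1, 5 and 15 minute load averages divided by the vCPU count."""
    one, five, fifteen = (float(v) for v in loadavg_text.split()[:3])
    return (one / cpu_count, five / cpu_count, fifteen / cpu_count)
```

Values that stay well above 1.0 per vCPU suggest that more work is queued than the instance can process. What counts as the normal range depends on the workload.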
Files cannot be written to disks.
If a disk runs out of space, files cannot be written to it. You can extend the capacity of your system disks and data disks as your storage requirements increase. For more information, see Resize disks online for Linux instances or Resize disks offline for Linux instances.
Common issues of Windows operating systems and their solutions
Instance logon failure
Check the NICs of the instance.
If the NICs of a Windows instance are unavailable, you cannot log on to the instance. For more information, see Network interface controller of a Windows instance is not available.
Check whether port 3389 is enabled.
You can use Remote Desktop Services to manage Windows instances. By default, Remote Desktop Services listens on port 3389. If the Remote Desktop Services feature is not enabled, remote desktop connections cannot be established. For more information, see How do I start Remote Desktop Connection to RDP on a Windows instance?
Check whether the version of the virtio driver is earlier than expected.
If the version of the virtio driver is earlier than expected on a Windows instance, you may be unable to log on to the instance. For more information, see Update Red Hat virtio drivers of Windows instances.
Check whether the firewall is properly configured.
If the firewall is not properly configured on a Windows instance, you may be unable to log on to the instance. For more information, see Configure Windows Firewall rules for Windows Server instances.
Performance issues
Check whether CPU utilization exceeds the normal range.
High CPU utilization can affect system stability and business operations. For more information, see How to troubleshoot the high CPU usage of Windows instances.
Check the version of the Windows operating system.
On January 14, 2020, Microsoft stopped providing support for Windows Server 2008 and Windows Server 2008 R2 operating systems. In light of this, Alibaba Cloud no longer provides technical support for ECS instances that use these operating systems. If you have ECS instances that use these operating systems, upgrade them to Windows Server 2012 or later at your earliest opportunity. For information about supported images, see Overview of public images and visit the image buy page.
Check disk capacity.
The operating system of a Windows instance may be unable to work normally if free space on the C drive keeps shrinking. For more information, see Temporary files occupy a large amount of disk space on Windows instances.
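Free space on a drive can be monitored with the Python standard library. The following minimal sketch uses `shutil.disk_usage`, which works on both Windows and Linux. The function name `free_space_fraction` is an assumption of this example, and any alert threshold is left to the reader.

```python
import shutil


def free_space_fraction(path):
    """Return the fraction of the volume containing path that is still free."""
    usage = shutil.disk_usage(path)  # named tuple with total, used, free in bytes
    return usage.free / usage.total
```

On a Windows instance, calling `free_space_fraction("C:\\")` reports the share of the C drive that is still free.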