Diagnose and resolve startup failures, logon issues, network problems, and performance bottlenecks on Linux and Windows ECS instances.
Linux operating system issues
Startup failures
-
Check whether /etc/fstab references block devices that do not exist on the instance.
An /etc/fstab entry for a block device that does not exist can prevent the system from restarting. See How do I remove a block device that does not exist in the /etc/fstab file of a Linux instance?
-
Check whether the block devices in the fstab file are properly mounted.
A block device in /etc/fstab that is not properly mounted can prevent the system from restarting. See A Linux instance has a disk that is not properly mounted.
-
Check whether the format of the fstab file is valid.
An invalid /etc/fstab format can prevent the system from restarting. See How do I fix a format error of the /etc/fstab configuration file of a Linux instance?
-
Run the fsck command to check the file systems.
A damaged file system can prevent the instance from starting. See Check and repair the file systems on a Linux instance.
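Some of the /etc/fstab checks above can be scripted before a restart. The following Python sketch is illustrative only: check_fstab and device_path are hypothetical helper names, and the sketch assumes standard fstab syntax (4 to 6 whitespace-separated fields) and the udev symlinks under /dev/disk/by-uuid, /dev/disk/by-label, and /dev/disk/by-partuuid. It flags malformed lines and entries whose block device is missing and that use neither the nofail nor the noauto mount option. It does not check file system integrity, so fsck is still needed for a damaged file system.

```python
"""Sketch: sanity-check /etc/fstab before restarting a Linux instance."""
import os
from typing import Callable, List, Optional

# Device specs that name a block device, mapped to the udev symlink directory.
SPEC_DIRS = (("UUID=", "/dev/disk/by-uuid/"),
             ("LABEL=", "/dev/disk/by-label/"),
             ("PARTUUID=", "/dev/disk/by-partuuid/"))


def device_path(spec: str) -> Optional[str]:
    """Map an fstab device spec to the /dev path that should exist, or None
    if the spec is not a local block device (for example proc, tmpfs, NFS)."""
    for prefix, directory in SPEC_DIRS:
        if spec.startswith(prefix):
            return directory + spec[len(prefix):].strip('"')
    return spec if spec.startswith("/dev/") else None


def check_fstab(text: str, exists: Callable[[str], bool] = os.path.exists) -> List[str]:
    """Return human-readable problems found in fstab content."""
    problems = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are allowed
        fields = line.split()
        # Fields: device, mount point, type, options, [dump], [pass]
        if not 4 <= len(fields) <= 6:
            problems.append(f"line {lineno}: expected 4-6 fields, found {len(fields)}")
            continue
        if not all(value.isdigit() for value in fields[4:]):
            problems.append(f"line {lineno}: dump/pass fields must be numbers")
        path = device_path(fields[0])
        options = set(fields[3].split(","))
        # A missing device that the boot must mount (no nofail/noauto) can stop the boot.
        if path and not exists(path) and not options & {"nofail", "noauto"}:
            problems.append(f"line {lineno}: device {fields[0]} not found "
                            "(add nofail or remove the entry)")
    return problems


if __name__ == "__main__":
    try:
        with open("/etc/fstab") as f:
            content = f.read()
    except OSError as err:
        print(f"cannot read /etc/fstab: {err}")
    else:
        for problem in check_fstab(content) or ["no problems found"]:
            print(problem)
```

The exists parameter defaults to os.path.exists but can be replaced, which keeps the parsing logic testable without real devices.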
Logon failures
-
Check whether the limits configuration file is correct.
If a nofile value in /etc/security/limits.conf exceeds the kernel limit fs.nr_open (/proc/sys/fs/nr_open), connections to the instance may fail. See Resolve remote connection failures or "Too many open files" errors on a Linux instance.
-
Check whether the root account password exists in the /etc/shadow file.
A missing root password in /etc/shadow prevents logon. See The key system user does not exist in the Linux instance.
-
Check the format of critical system files.
A system file that is not in UNIX format (for example, a file with Windows-style CRLF line endings) can prevent logon. See How can I modify the Unix format of a Linux instance?
-
Check whether SSH access permissions are properly configured.
Incorrect SSH permissions block logon. See Incorrect SSH permissions block remote connection to a Linux instance.
-
Check whether critical SSH files and directories exist.
Missing SSH files (such as the sshd_config configuration file) can prevent logon. See Verify required SSH files on a Linux instance.
-
Check whether the configured huge page allocation exceeds the permitted range.
Reserving too much memory for huge pages leaves too little memory for other processes and can prevent logon. Adjust the huge page setting (such as vm.nr_hugepages) in /etc/sysctl.conf. See How to adjust Huge Pages on a Linux instance.
-
Check whether the operating system is out of memory.
Out-of-memory (OOM) conditions can prevent logon. See Troubleshoot Out of Memory issues in a Linux instance.
-
Check whether the system firewall is enabled.
Firewall rules that block external access can prevent connections. See Manage system firewall on Linux.
-
Check whether TCP SACK is enabled.
Disabling TCP SACK can degrade network performance. See Enable TCP SACK on Linux instances.
-
Check whether the UDP buffer overflows.
UDP buffer overflow degrades network performance and can prevent logon. See Linux instance UDP cache overflow causes remote connection failure.
-
Check whether Security-Enhanced Linux (SELinux) is enabled.
Enabling SELinux can cause SSH connection errors. See SSH connection fails when SELinux is enabled on a Linux instance.
-
SSH and VNC logon both fail.
Detach the system disk, attach it as a data disk to another instance, and then log on to that instance to inspect and repair the files on the disk. See Detach the system disk from a Linux instance and attach the disk to another instance.
-
SSH connection returns an error.
If you connect over SSH as root and receive the error "Permission denied, please try again", see Resolve the "Permission denied, please try again" error for SSH connections to a Linux instance.
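Several of the logon checks above only read configuration or /proc files, so they can be scripted. This Python sketch covers the limits.conf nofile check, CRLF line endings in a few system files, the root entry in /etc/shadow, huge page reservation, and TCP SACK. All function names are hypothetical. It assumes default file locations and root privileges (so /etc/shadow is readable), reads only /etc/security/limits.conf (not files under /etc/security/limits.d/), and uses an arbitrary 80% huge page threshold for illustration. It reports findings and changes nothing.

```python
"""Sketch: report-only checks for some common Linux logon blockers.

Assumptions: default file locations, root privileges (so /etc/shadow is
readable), and an arbitrary 80% huge page threshold chosen for illustration.
Files that are missing or unreadable are skipped.
"""
from typing import Dict, List, Optional


def nofile_over_limit(limits_text: str, nr_open: int) -> List[str]:
    """Return limits.conf lines whose nofile value exceeds fs.nr_open."""
    bad = []
    for raw in limits_text.splitlines():
        fields = raw.split("#", 1)[0].split()  # drop trailing comments
        # Expected layout: <domain> <type> <item> <value>
        if len(fields) == 4 and fields[2] == "nofile" and fields[3].isdigit():
            if int(fields[3]) > nr_open:
                bad.append(raw.strip())
    return bad


def has_crlf(data: bytes) -> bool:
    """True if the raw file bytes contain Windows (CRLF) line endings."""
    return b"\r\n" in data


def root_password_set(shadow_text: str) -> bool:
    """True if /etc/shadow has a root entry with a non-empty password field."""
    for line in shadow_text.splitlines():
        fields = line.split(":")
        if fields[0] == "root":
            return len(fields) > 1 and fields[1] != ""
    return False


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo into {key: first numeric value} (sizes in kB)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def hugepage_fraction(meminfo: Dict[str, int]) -> float:
    """Fraction of total memory reserved for huge pages."""
    reserved_kb = meminfo.get("HugePages_Total", 0) * meminfo.get("Hugepagesize", 0)
    return reserved_kb / meminfo["MemTotal"]


def read_bytes(path: str) -> Optional[bytes]:
    """Return raw file contents, or None if the file is missing or unreadable.
    Binary mode keeps CRLF intact (text mode would translate it to LF)."""
    try:
        with open(path, "rb") as f:
            return f.read()
    except OSError:
        return None


if __name__ == "__main__":
    paths = ["/proc/sys/fs/nr_open", "/etc/security/limits.conf", "/etc/passwd",
             "/etc/shadow", "/etc/ssh/sshd_config", "/proc/meminfo",
             "/proc/sys/net/ipv4/tcp_sack"]
    raw = {p: read_bytes(p) for p in paths}
    text = {p: b.decode(errors="replace") for p, b in raw.items() if b is not None}

    if "/proc/sys/fs/nr_open" in text and "/etc/security/limits.conf" in text:
        nr_open = int(text["/proc/sys/fs/nr_open"])
        for line in nofile_over_limit(text["/etc/security/limits.conf"], nr_open):
            print(f"nofile exceeds fs.nr_open ({nr_open}): {line}")
    for path in ("/etc/passwd", "/etc/shadow", "/etc/ssh/sshd_config"):
        if raw[path] is not None and has_crlf(raw[path]):
            print(f"{path} has CRLF line endings")
    if "/etc/shadow" in text and not root_password_set(text["/etc/shadow"]):
        print("root has no password in /etc/shadow")
    if "/proc/meminfo" in text and hugepage_fraction(parse_meminfo(text["/proc/meminfo"])) > 0.8:
        print("huge pages reserve more than 80% of memory")
    if text.get("/proc/sys/net/ipv4/tcp_sack", "1").strip() != "1":
        print("TCP SACK is disabled (net.ipv4.tcp_sack)")
```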
Access failures
-
Check NAT-related kernel parameters.
Misconfigured NAT kernel parameters can block SSH and HTTP access over the private network. See Failed to access the instance through a NAT Linux due to kernel configuration issues.
-
Check whether service processes are running and ports are listening.
If a service process is stopped or its port is not listening, connections to that service fail. See How do I start common services of a Linux instance and query the status of the listening port.
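The listening-port check can be done without extra tools by reading /proc/net/tcp and /proc/net/tcp6, where state 0A means LISTEN and ports are hexadecimal. The following Python sketch relies on that layout; listening_ports and read_sysctl are hypothetical helpers, and ports 22 and 80 are only examples. For the NAT check, it assumes the commonly cited combination of net.ipv4.tcp_tw_recycle=1 with net.ipv4.tcp_timestamps=1; the linked article may cover other parameters as well.

```python
"""Sketch: list TCP ports in the LISTEN state and flag a NAT-unfriendly setting."""
import os
from typing import Set

TCP_LISTEN = "0A"  # hex state code for LISTEN in /proc/net/tcp


def listening_ports(proc_net_tcp: str) -> Set[int]:
    """Parse /proc/net/tcp or /proc/net/tcp6 text and return listening ports."""
    ports = set()
    for line in proc_net_tcp.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) > 3 and fields[3] == TCP_LISTEN:
            # local_address looks like "00000000:0016"; the port is hexadecimal.
            ports.add(int(fields[1].rsplit(":", 1)[1], 16))
    return ports


def read_sysctl(name: str) -> str:
    """Read a sysctl such as net.ipv4.tcp_timestamps; return '' if it is absent."""
    path = "/proc/sys/" + name.replace(".", "/")
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return ""


if __name__ == "__main__":
    ports: Set[int] = set()
    for path in ("/proc/net/tcp", "/proc/net/tcp6"):
        if os.path.exists(path):
            with open(path) as f:
                ports |= listening_ports(f.read())
    for port, service in ((22, "SSH"), (80, "HTTP")):  # example ports
        state = "listening" if port in ports else "NOT listening"
        print(f"port {port} ({service}): {state}")
    # tcp_tw_recycle was removed in Linux 4.12, so it may be absent.
    if read_sysctl("net.ipv4.tcp_tw_recycle") == "1" and \
            read_sysctl("net.ipv4.tcp_timestamps") == "1":
        print("tcp_tw_recycle=1 with tcp_timestamps=1 can drop connections from clients behind NAT")
```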
Network connection failures
-
Check the Dynamic Host Configuration Protocol (DHCP) configurations.
ECS instances use DHCP to assign IP addresses to ENIs. An incorrect ENI configuration file or a stopped dhclient process can disrupt DHCP and disconnect the instance. See Check and fix DHCP configurations for local network interfaces on Linux instances.
-
Check whether a network-related process exists.
Without a network process (such as dhclient), DHCP leases cannot be renewed and the instance loses network connectivity. See The Linux network process does not exist.
-
Check whether network interface controller (NIC) multi-queue is enabled.
NIC multi-queue distributes NIC interrupts across vCPUs, preventing bottlenecks on a single vCPU. See NIC multi-queue.
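The process and multi-queue checks can be sketched in Python as follows. The sketch assumes that DHCP is handled by dhclient, NetworkManager, or systemd-networkd, that the ethtool utility is installed, and that the interface is named eth0; has_network_process, parse_combined_channels, and read_comms are hypothetical helpers. It compares the Combined queue counts reported by ethtool -l and suggests ethtool -L when fewer queues are in use than the NIC supports.

```python
"""Sketch: check for a network management process and NIC queue usage."""
import glob
import subprocess
from typing import Iterable, List, Optional, Tuple

# /proc/<pid>/comm holds at most 15 characters, so compare truncated names.
NETWORK_PROCESSES = {name[:15] for name in ("dhclient", "NetworkManager", "systemd-networkd")}


def has_network_process(comms: Iterable[str]) -> bool:
    """True if any process name is one of the assumed network managers."""
    return any(name.strip() in NETWORK_PROCESSES for name in comms)


def parse_combined_channels(ethtool_l: str) -> Tuple[Optional[int], Optional[int]]:
    """Return (pre-set maximum, current) Combined queue counts from `ethtool -l`."""
    values = []
    for line in ethtool_l.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "Combined" and value.strip().isdigit():
            values.append(int(value))
    # ethtool prints the pre-set maximums before the current hardware settings.
    return (values[0], values[1]) if len(values) >= 2 else (None, None)


def read_comms() -> List[str]:
    """Collect process names from /proc/<pid>/comm."""
    comms = []
    for path in glob.glob("/proc/[0-9]*/comm"):
        try:
            with open(path) as f:
                comms.append(f.read())
        except OSError:
            pass  # the process exited during the scan
    return comms


if __name__ == "__main__":
    if not has_network_process(read_comms()):
        print("no dhclient, NetworkManager, or systemd-networkd process found")
    try:
        result = subprocess.run(["ethtool", "-l", "eth0"], capture_output=True, text=True)
        maximum, current = parse_combined_channels(result.stdout)
    except FileNotFoundError:
        maximum, current = None, None
        print("ethtool is not installed")
    if maximum and current is not None and current < maximum:
        print(f"eth0 uses {current} of {maximum} queues; "
              f"consider: ethtool -L eth0 combined {maximum}")
```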
Performance issues (Linux)
-
Check whether the TCP backlog buffer overflows.
TCP backlog buffer overflow degrades network performance and can prevent logon. See Linux instance TCP backlog cache overflow caused by failure to connect to instance remotely.
-
Check whether CPU utilization exceeds the normal range.
High CPU utilization affects system stability. See Troubleshoot high CPU utilization or load on a Linux instance.
-
Disk write failures.
Disk writes can fail when a disk runs out of space. Extend the disk capacity as storage needs grow, either online or offline. See Step 1: Resize a disk to extend the disk capacity or Resize disks offline for Linux instances.
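A quick first pass over these Linux performance checks can be scripted with the Python standard library. The sketch below reads the ListenOverflows and ListenDrops counters from the TcpExt group in /proc/net/netstat, compares the 1-minute load average with the vCPU count, and checks usage of the root file system. The function names are hypothetical, and the thresholds (a load of 1.0 per vCPU, 90% disk usage) are illustrative assumptions rather than official limits.

```python
"""Sketch: check listen-queue overflows, CPU load, and disk usage on Linux."""
import os
import shutil
from typing import Dict


def parse_netstat(text: str) -> Dict[str, Dict[str, int]]:
    """Parse /proc/net/netstat into {group: {counter: value}}."""
    result: Dict[str, Dict[str, int]] = {}
    lines = text.splitlines()
    # Lines come in pairs: "TcpExt: Name1 Name2 ..." then "TcpExt: 1 2 ...".
    for header, values in zip(lines[0::2], lines[1::2]):
        group, _, names = header.partition(":")
        _, _, numbers = values.partition(":")
        result[group] = dict(zip(names.split(), map(int, numbers.split())))
    return result


def load_per_cpu(load1: float, cpus: int) -> float:
    """1-minute load average divided by the number of vCPUs."""
    return load1 / max(cpus, 1)


def used_fraction(total: int, used: int) -> float:
    """Share of a file system that is in use (0.0 if total is unknown)."""
    return used / total if total else 0.0


if __name__ == "__main__":
    if os.path.exists("/proc/net/netstat"):
        with open("/proc/net/netstat") as f:
            tcpext = parse_netstat(f.read()).get("TcpExt", {})
        # Counters that keep growing mean the accept (backlog) queue overflows.
        print("ListenOverflows:", tcpext.get("ListenOverflows"),
              "ListenDrops:", tcpext.get("ListenDrops"))
    ratio = load_per_cpu(os.getloadavg()[0], os.cpu_count() or 1)
    if ratio > 1.0:  # illustrative threshold
        print(f"1-minute load is {ratio:.2f} per vCPU; check which processes are busy")
    usage = shutil.disk_usage("/")
    if used_fraction(usage.total, usage.used) > 0.9:  # illustrative threshold
        print("the root file system is more than 90% full")
```

Run it a few times a minute apart: a single non-zero ListenOverflows value may be historical, while a growing value points to an ongoing backlog overflow.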
Windows operating system issues
Logon failures
-
Check the NICs of the instance.
Unavailable NICs prevent logon. See Network interface controller of a Windows instance is not available.
-
Check whether Remote Desktop Services is enabled and port 3389 is open.
Remote desktop connections use port 3389 by default and require Remote Desktop Services to be enabled. See How do I enable Remote Desktop Services on a Windows ECS instance?
-
Check the virtio driver version.
An outdated virtio driver can prevent logon. See Update the virtio driver for a Windows instance.
-
Check whether the firewall is properly configured.
Incorrect firewall settings can prevent logon. See Configure firewall rules for Windows.
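To check port 3389 from outside the instance, a plain TCP connection test is often enough. The following Python sketch assumes you run it from a client machine and pass the instance's public IP address as the first argument. When run on the Windows instance itself without an argument, it instead reads the fDenyTSConnections and RDP-Tcp PortNumber registry values, where fDenyTSConnections set to 0 means remote desktop connections are allowed. port_open and rdp_registry_settings are hypothetical helpers. A failed connection test can also be caused by security group rules or the Windows firewall, so check those as well.

```python
"""Sketch: test RDP reachability from a client, or read RDP settings on the instance."""
import socket
import sys


def port_open(host: str, port: int = 3389, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def rdp_registry_settings() -> dict:
    """On the Windows instance: report whether RDP is allowed and its port."""
    import winreg  # only available on Windows
    base = r"SYSTEM\CurrentControlSet\Control\Terminal Server"
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, base) as key:
        deny, _ = winreg.QueryValueEx(key, "fDenyTSConnections")
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, base + r"\WinStations\RDP-Tcp") as key:
        port, _ = winreg.QueryValueEx(key, "PortNumber")
    return {"rdp_enabled": deny == 0, "port": port}


if __name__ == "__main__":
    if len(sys.argv) > 1:  # client side: first argument is the instance's public IP
        print("3389 reachable" if port_open(sys.argv[1]) else "3389 not reachable")
    elif sys.platform == "win32":  # on the instance itself
        print(rdp_registry_settings())
    else:
        print("usage: pass the instance's public IP address as an argument")
```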
Performance issues (Windows)
-
Check whether CPU utilization exceeds the normal range.
High CPU utilization affects system stability. See Troubleshoot high CPU utilization on Windows instances.
-
Check the version of the Windows operating system.
Microsoft ended support for Windows Server 2008 and 2008 R2 on January 14, 2020. Alibaba Cloud no longer provides technical support for ECS instances running these operating systems. Upgrade to Windows Server 2012 or later. For supported images, see Public images.
-
Check disk capacity.
Decreasing free space on drive C can cause system instability. See Troubleshoot decreasing disk space on Drive C of a Windows instance.
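The operating system version and the free space on drive C can be checked with a short Python sketch run on the instance. It relies on the Windows NT version numbers 6.0 for Windows Server 2008 and 6.1 for Windows Server 2008 R2 (Windows Server 2012 is 6.2), and it treats less than 10% free space on drive C as low, which is an illustrative threshold, not an official one. is_server_2008_family and free_fraction are hypothetical helpers.

```python
"""Sketch: flag Windows Server 2008/2008 R2 and low free space on drive C."""
import shutil
import sys


def is_server_2008_family(major: int, minor: int) -> bool:
    """True for NT 6.0 (Windows Server 2008) and NT 6.1 (2008 R2)."""
    return (major, minor) in {(6, 0), (6, 1)}


def free_fraction(total: int, free: int) -> float:
    """Share of the drive that is free (0.0 if total is unknown)."""
    return free / total if total else 0.0


if __name__ == "__main__" and sys.platform == "win32":
    ver = sys.getwindowsversion()
    if is_server_2008_family(ver.major, ver.minor):
        print("Windows Server 2008/2008 R2 is out of support; upgrade to 2012 or later")
    usage = shutil.disk_usage("C:\\")
    if free_fraction(usage.total, usage.free) < 0.10:  # illustrative threshold
        print(f"drive C has only {usage.free / 2**30:.1f} GiB free")
```

Windows client versions share these NT version numbers (for example, Windows 7 is also 6.1), so the check is only meaningful on server images.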