Issue description
If you use an Alibaba Cloud Elastic Compute Service (ECS) instance for a long time without a restart, network issues can occur. The instance may disconnect from the network, and you may be unable to ping its public or private IP addresses.
Cause
When an ECS instance first starts, the system uses the Dynamic Host Configuration Protocol (DHCP) to automatically assign an IP address to the Elastic Network Interface (ENI) and obtains a lease with an expiration time. Under normal conditions, the dhclient process in Linux and the DHCP Client service in Windows periodically renew the lease with the DHCP server to ensure the IP address remains available. However, instances created from some CentOS 7 images (see the Scope section) might unexpectedly terminate the dhclient process. The DHCP Client service in Windows Server also has known issues. As a result, the instance cannot automatically renew its IP address lease. When the lease expires, the instance's private IP address is released, causing a network connection failure.
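To make the lease mechanism concrete, the following Python sketch reads the expiration time from a dhclient lease file on a Linux instance. It assumes the ISC dhclient lease format, in which each lease block contains a line such as "expire 4 2028/01/01 00:00:00;" with the time in UTC, and it assumes that the last block in the file is the most recent lease. The function name latest_expiry and the sample lease text are illustrative only and are not part of any Alibaba Cloud tooling.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Matches lines such as "  expire 4 2028/01/01 00:00:00;"
# (a weekday digit, then the date and the time). Assumed ISC dhclient format.
EXPIRE_RE = re.compile(
    r"^\s*expire\s+\d\s+(\d{4}/\d{2}/\d{2})\s+(\d{2}:\d{2}:\d{2});",
    re.MULTILINE,
)


def latest_expiry(lease_text: str) -> Optional[datetime]:
    """Return the expiry time of the last lease block, or None if none is found."""
    matches = EXPIRE_RE.findall(lease_text)
    if not matches:
        return None
    day, clock = matches[-1]  # assumption: the last block is the current lease
    parsed = datetime.strptime(f"{day} {clock}", "%Y/%m/%d %H:%M:%S")
    return parsed.replace(tzinfo=timezone.utc)


# Hypothetical lease file content, used only to exercise the parser.
SAMPLE = """\
lease {
  interface "eth0";
  fixed-address 192.168.0.10;
  expire 4 2018/11/15 06:00:00;
}
lease {
  interface "eth0";
  fixed-address 192.168.0.10;
  expire 6 2028/01/01 00:00:00;
}
"""

print(latest_expiry(SAMPLE))
```

On a real instance, you would pass in the text of the lease file for the ENI instead of the sample. If the returned time is close and no dhclient process is running to renew the lease, the instance is at risk of losing its IP address.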
Scope
This issue affects ECS instances that meet the following conditions and use DHCP to automatically assign IP addresses to their ENIs. You can fix the issue by following the instructions in this topic. If your ECS instance uses a static IP address, you do not need to take any action.
Instances created from the following CentOS 7 public images before May 31, 2018, and not restarted after November 15, 2018.
centos_7_04_64_20G_alibase_20180419.vhd
centos_7_04_64_20G_alibase_20180326.vhd
centos_7_04_64_20G_alibase_201701015.vhd
centos_7_03_64_20G_alibase_20170818.vhd
centos_7_02_64_20G_alibase_20170818.vhd
centos_7_03_64_40G_alibase_20170710.vhd
centos_7_03_64_40G_alibase_20170625.vhd
centos_7_03_64_40G_alibase_20170523.vhd
centos_7_03_64_40G_alibase_20170503.vhd
Instances that run the following Windows Server operating systems, were created before November 15, 2018, and have not been restarted since.
Windows Server 2008 R2
Windows Server 2012 R2
Windows Server 2016
Windows Server Version 1709
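As a rough illustration of the CentOS criteria above, the following Python sketch checks whether an instance falls within the affected scope. The image names and cutoff dates come from the list in this section. The function name is_affected_centos, its parameters, and the treatment of a restart that happened exactly on November 15, 2018 as "not restarted after" are assumptions made for this example. The sketch also does not check whether the ENI uses DHCP.

```python
from datetime import date
from typing import Optional

# Affected CentOS 7 public images, as listed in this section.
AFFECTED_CENTOS_IMAGES = {
    "centos_7_04_64_20G_alibase_20180419.vhd",
    "centos_7_04_64_20G_alibase_20180326.vhd",
    "centos_7_04_64_20G_alibase_201701015.vhd",
    "centos_7_03_64_20G_alibase_20170818.vhd",
    "centos_7_02_64_20G_alibase_20170818.vhd",
    "centos_7_03_64_40G_alibase_20170710.vhd",
    "centos_7_03_64_40G_alibase_20170625.vhd",
    "centos_7_03_64_40G_alibase_20170523.vhd",
    "centos_7_03_64_40G_alibase_20170503.vhd",
}

CREATION_CUTOFF = date(2018, 5, 31)   # instance created before this date
RESTART_CUTOFF = date(2018, 11, 15)   # instance not restarted after this date


def is_affected_centos(image_id: str, created: date,
                       last_restart: Optional[date]) -> bool:
    """Return True if a CentOS instance matches the scope criteria.

    last_restart is None when the instance has never been restarted.
    """
    if image_id not in AFFECTED_CENTOS_IMAGES:
        return False
    if created >= CREATION_CUTOFF:
        return False
    # Assumption: a restart on the cutoff date itself does not count as "after".
    return last_restart is None or last_restart <= RESTART_CUTOFF


print(is_affected_centos("centos_7_04_64_20G_alibase_20180419.vhd",
                         date(2018, 4, 30), None))
```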
Solutions
Before you perform high-risk operations, such as modifying an instance or its data, ensure that the instance has disaster recovery and fault tolerance capabilities to protect your data.
Before you modify the configuration or data of an instance, such as an ECS or ApsaraDB RDS instance, create a snapshot or enable a feature such as log backup for RDS.
If you have granted permissions or submitted security information, such as logon credentials, on the Alibaba Cloud platform, change them promptly.
Choose one of the four solutions in this topic based on your requirements.
Method 1: Batch fix with Cloud Assistant. This method is simple and suitable for fixing multiple instances from the ECS console.
Method 2: Python SDK script. This method uses a Python script based on the Cloud Assistant API. It checks the status of your instances in a region and automatically fixes them. This method is for users familiar with script-based operations and maintenance.
Method 3: Shell and PowerShell scripts. This method requires you to log on to each ECS instance and run a script to manually fix the issue. It is suitable for spot checks or testing on a small number of instances. The script content is the same as that used in Method 1.
Method 4: Troubleshoot ENIs one by one. This method is suitable for a small number of instances.
Method 1: Batch fix with Cloud Assistant
This example shows how to use Cloud Assistant to check and automatically repair ECS instances. Ensure that the Cloud Assistant Agent is installed on your instance. The Cloud Assistant Agent is pre-installed by default on ECS instances that are created after December 1, 2017. For more information about the Cloud Assistant Agent, see Install Cloud Assistant Agent.
Download the appropriate Shell or PowerShell script and paste it into the Command Content field in Cloud Assistant.
CentOS instances: linux_fix_dhclient.sh
Windows instances: win_fix_dhclient.ps1
Select the target ECS instances and run the command. For more information, see Run a command.
Verify that the command was executed successfully. For more information, see View execution results and status. The following figure shows the command output for CentOS and Windows instances.

Method 2: Batch fix with a Python SDK script
This method uses a Python script based on the Cloud Assistant API. The script checks and automatically fixes all affected instances in an Alibaba Cloud region. For more information about how to install the ECS SDK, see the installation document in the Alibaba Cloud GitHub repository.
Preparations
Run the following command to download the required Python SDK dependencies to your local computer or ECS instance.
pip install alibabacloud_ecs20140526
Procedure
Download the autofix_dhclient.py file to your ECS instance.
Run the following command to execute the script.
sudo python autofix_dhclient.py <AccessKeyID> <AccessKeySecret> <region-id>
Note: Replace <AccessKeyID>, <AccessKeySecret>, and <region-id> in the command with your actual values.
AccessKeyID: The AccessKey ID of your Alibaba Cloud account or RAM user.
AccessKeySecret: The AccessKey secret of your Alibaba Cloud account or RAM user.
region-id: The ID of the region where the instance is located. For a list of region IDs, see Regions and zones.
Execution results
The following figure shows a sample output of the script.
The following list describes the instance status checks.
Cloud Assistant: Checks whether the Cloud Assistant Agent is installed on the instance.
Installed: The Cloud Assistant Agent is installed.
Not Installed: The Cloud Assistant Agent is missing. You must install the Cloud Assistant Agent and then continue with the fix.
NeedFix: Checks whether the dhclient process or DHCP Client service needs to be fixed.
Yes: A fix is required. The script automatically completes the subsequent operations.
No: No fix is required.
Unknown: The script cannot determine if a fix is required. You must perform the fix manually.
FixResult: Reports the result of the fix.
Success: The dhclient process or DHCP Client service was fixed successfully.
Failed: The fix failed.
NoChange: No fix was required.
Unknown: The script cannot determine the result. You must perform the fix manually.
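The status fields above can be read as a small decision table. The following Python sketch maps one row of the script output to a suggested next step. It assumes that the three fields appear as plain strings with exactly the labels listed above, which may not match how the script prints them. The function name next_action and the advice returned for a Failed result are assumptions for this example.

```python
def next_action(cloud_assistant: str, need_fix: str, fix_result: str) -> str:
    """Suggest a follow-up step for one instance row of the script output."""
    if cloud_assistant == "Not Installed":
        return "Install the Cloud Assistant Agent, then run the script again."
    if need_fix == "No" or fix_result == "NoChange":
        return "No action required."
    if need_fix == "Unknown" or fix_result == "Unknown":
        return "Perform the fix manually."
    if fix_result == "Success":
        return "Fixed. No further action required."
    # Assumption: a failed automatic fix is followed up manually.
    return "The fix failed. Investigate and perform the fix manually."


print(next_action("Installed", "Yes", "Success"))
```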
Method 3: Fix with a Shell or PowerShell script
This method requires you to log on to each affected instance and fix the issue. It is suitable for a small number of instances.
Procedure for CentOS instances
Log on to the ECS instance. For more information, see Connection methods overview.
Download the linux_fix_dhclient.sh script to any folder.
Switch to the working directory of the script and run the script as the root user.
sudo bash linux_fix_dhclient.sh
Note: A return value of "0" indicates that the script completed the check and fix successfully.
Any other return value indicates that the fix failed.
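If you wrap the script in your own tooling, the return value can be checked programmatically. The following Python sketch runs the script with bash and interprets the exit code as described in the note above. The function names describe_exit_code and run_fix are made up for this example, and the sketch assumes that it is started by the root user from the directory that contains linux_fix_dhclient.sh.

```python
import subprocess


def describe_exit_code(code: int) -> str:
    """Translate the script's return value into a short status message."""
    if code == 0:
        return "check and fix completed successfully"
    return f"fix failed (return value {code})"


def run_fix(script_path: str = "linux_fix_dhclient.sh") -> str:
    """Run the fix script with bash and report the outcome.

    Assumption: the caller already has root privileges.
    """
    completed = subprocess.run(["bash", script_path], check=False)
    return describe_exit_code(completed.returncode)


if __name__ == "__main__":
    print(describe_exit_code(0))
```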
Procedure for Windows instances
Log on to the ECS instance. For more information, see Connection methods overview.
Download the win_fix_dhclient.ps1 script to any folder.
Open PowerShell as an administrator and run the following command.
powershell -executionpolicy bypass -file C:\win_fix_dhclient.ps1
Note: Replace C:\win_fix_dhclient.ps1 with the actual file path.
If the output is "No ip will expire in recent 500 days. Then no need fix.", the DHCP Client service on the instance is normal and no fix is required.
If the output is "Found one ip will expire in 500 days. We need fixing it!!! Fix it now... Fix success.", the DHCP Client service on the instance was abnormal and the script fixed it.
Any other output indicates that the fix failed.
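When you collect the PowerShell output from several instances, the three cases above can be told apart with a simple text match. The following Python sketch is one way to do that. It relies only on the two messages quoted above, and the function name classify_fix_output and the labels it returns are assumptions for this example.

```python
def classify_fix_output(output: str) -> str:
    """Classify the output of win_fix_dhclient.ps1 as healthy, fixed, or failed."""
    text = output.lower()
    if "no need fix" in text:
        return "healthy"   # the DHCP Client service is normal
    if "fix success" in text:
        return "fixed"     # the service was abnormal and the script fixed it
    return "failed"        # any other output


print(classify_fix_output("No ip will expire in recent 500 days. Then no need fix."))
```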
Method 4: Troubleshoot ENIs one by one
This method requires you to check and fix the dhclient process (for CentOS instances) or the IP address lease expiration time (for Windows instances) for each ENI.
Procedure for CentOS instances
Log on to the ECS instance. For more information, see Connection methods overview.
Run the following command to check all ENIs on the instance.
ls -al /sys/class/net/
Run the following command to check if the eth0 ENI uses DHCP to obtain an IP address.
cat /etc/sysconfig/network-scripts/ifcfg-eth0
If the output contains BOOTPROTO=dhcp, the ENI uses DHCP. Otherwise, skip to step 7.
Run the following command to check the status of the dhclient process for the eth0 ENI.
ps aux | grep dhclient | grep eth0
If the output is empty, the dhclient process for the ENI is not running and must be restarted.
If the output is similar to the following, the dhclient process is running normally. Skip to Step 7.
root 15340 0.0 0.3 113372 12788 ? Ss 14:16 0:00 /sbin/dhclient -1 -q -lf /var/lib/dhclient/dhclient--eth0.lease -pf /var/run/dhclient-eth0.pid -H izuf****************** eth0
Run the following command to restart the dhclient process.
ifup eth0
Note: This example uses the eth0 ENI. Replace eth0 with the actual identifier of your ENI.
Verify the status of the dhclient process for the ENI again.
Repeat steps 3 to 6 to check and fix the dhclient process for all other ENIs.
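The two checks in this procedure can also be expressed as small text parsers, which helps if you collect the command output from many instances. The following Python sketch works on the text of an ifcfg file and on the output of ps aux. It assumes the standard ps aux layout with ten columns before the command, and that the interface name appears as an argument of the dhclient command, as in the sample output above. The function names uses_dhcp and dhclient_running are made up for this example.

```python
import os


def uses_dhcp(ifcfg_text: str) -> bool:
    """Return True if the ifcfg file content sets BOOTPROTO to dhcp."""
    for line in ifcfg_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "BOOTPROTO":
            return value.strip().strip("\"'").lower() == "dhcp"
    return False


def dhclient_running(ps_output: str, interface: str) -> bool:
    """Return True if the ps aux output shows a dhclient process for the interface."""
    for line in ps_output.splitlines():
        fields = line.split(None, 10)  # 10 columns, then the full command
        if len(fields) < 11:
            continue
        command = fields[10].split()
        if os.path.basename(command[0]) == "dhclient" and interface in command[1:]:
            return True
    return False


# Sample line taken from the expected output shown in this procedure.
PS_SAMPLE = (
    "root 15340 0.0 0.3 113372 12788 ? Ss 14:16 0:00 /sbin/dhclient -1 -q "
    "-lf /var/lib/dhclient/dhclient--eth0.lease -pf /var/run/dhclient-eth0.pid "
    "-H izuf****************** eth0"
)

print(uses_dhcp("DEVICE=eth0\nBOOTPROTO=dhcp\nONBOOT=yes"))
print(dhclient_running(PS_SAMPLE, "eth0"))
```

An ENI needs the ifup step only when uses_dhcp is true and dhclient_running is false for that interface.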
Procedure for Windows instances
Log on to the ECS instance. For more information, see Connection methods overview.
Open Command Prompt as an administrator.
Run the following command. For each network adapter that is described as Red Hat VirtIO Ethernet Adaptor, check whether DHCP Enabled is set to Yes and note its Lease Expires time.
ipconfig /all
Note: The primary and secondary ENIs of an ECS instance are described as Red Hat VirtIO Ethernet Adaptor. Custom-configured network interface cards (NICs), such as VPN or loopback NICs, and NICs for which DHCP is not enabled are not affected.
If the lease expires within one year, run the following command to renew the lease.
ipconfig /renew
Run the ipconfig /all command again. The fix is successful if the new lease expiration time is several years in the future.
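The one-year rule in this procedure is a simple date comparison. The following Python sketch shows it for a lease expiration time that you have already read from the ipconfig /all output. It does not parse that output, because the date format depends on the system locale. The function name needs_renewal and the use of 365 days for one year are assumptions for this example.

```python
from datetime import datetime, timedelta

# Assumption: "within one year" is treated as fewer than 365 days from now.
RENEWAL_WINDOW = timedelta(days=365)


def needs_renewal(lease_expires: datetime, now: datetime) -> bool:
    """Return True if the lease expires within one year of the given time."""
    return lease_expires - now < RENEWAL_WINDOW


print(needs_renewal(datetime(2019, 3, 1), datetime(2018, 12, 1)))
```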
Applicable to
ECS