Troubleshoot IP address faults in CentOS 7 instances and Windows instances

Last Updated: May 14, 2019

This topic describes how to troubleshoot instances that become unreachable because their IP addresses are no longer valid. It applies to ECS instances created from CentOS 7 and Windows Server images.

Symptoms

After an ECS instance runs for a period of time without being restarted, its IP address can no longer be pinged.

Cause

When an ECS instance is started for the first time, the system automatically assigns IP addresses to its elastic network interfaces (ENIs) through the Dynamic Host Configuration Protocol (DHCP) and obtains the lease period of these IP addresses from the DHCP server. While the instance is running, the dhclient process on Linux and the DHCP Client service on Windows periodically extend this lease period.

However, on some CentOS 7 instances the dhclient process is inadvertently cleaned up, and on some Windows instances the DHCP Client encounters known issues. In either case, the lease period can no longer be extended. As a result, the private IP address is released when the lease period elapses, and the instance becomes unreachable over the network. For more information about the affected instances, see Applicable scope.
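
To make the lease mechanism concrete, the following minimal Python sketch reads the expire entries from the text of a dhclient lease file and computes how long the newest lease has left. It assumes the default ISC dhclient lease format, in which each lease block contains a line such as expire 2 2019/05/14 14:00:00; with the time in UTC. The function names and the sample lease (including its IP address) are made up for illustration and are not part of the repair scripts in this topic.

```python
import re
from datetime import datetime, timezone

# Matches "expire <weekday> <YYYY/MM/DD> <HH:MM:SS>;" (default dhclient format, UTC).
EXPIRE_RE = re.compile(r"expire\s+\d\s+(\d{4}/\d{2}/\d{2}\s+\d{2}:\d{2}:\d{2});")


def latest_lease_expiry(lease_text):
    """Return the latest 'expire' timestamp found in the lease text, or None."""
    stamps = [
        datetime.strptime(stamp, "%Y/%m/%d %H:%M:%S").replace(tzinfo=timezone.utc)
        for stamp in EXPIRE_RE.findall(lease_text)
    ]
    return max(stamps) if stamps else None


def seconds_until_expiry(lease_text, now):
    """Seconds left on the newest lease; negative once the lease has lapsed."""
    expiry = latest_lease_expiry(lease_text)
    if expiry is None:
        return None
    return (expiry - now).total_seconds()


# Hypothetical lease entry, for illustration only.
SAMPLE_LEASE = """
lease {
  interface "eth0";
  fixed-address 192.168.0.10;
  renew 2 2019/05/14 02:00:00;
  expire 2 2019/05/14 14:00:00;
}
"""
```

If no dhclient process is running to renew the lease, the value returned by seconds_until_expiry only decreases, and the address is released once it reaches zero.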

Applicable scope

This topic applies only to ECS instances that meet the following conditions:

  • ECS instances that were created from one of the following CentOS 7 public images before May 31, 2018 and have not been restarted since November 15, 2018:
    • centos_7_04_64_20G_alibase_20180419.vhd
    • centos_7_04_64_20G_alibase_20180326.vhd
    • centos_7_04_64_20G_alibase_201701015.vhd
    • centos_7_03_64_20G_alibase_20170818.vhd
    • centos_7_02_64_20G_alibase_20170818.vhd
    • centos_7_03_64_40G_alibase_20170710.vhd
    • centos_7_03_64_40G_alibase_20170625.vhd
    • centos_7_03_64_40G_alibase_20170523.vhd
    • centos_7_03_64_40G_alibase_20170503.vhd
  • ECS instances that were created from one of the following Windows Server images before November 15, 2018 and have not been restarted since then:
    • Windows Server 2008 R2
    • Windows Server 2012 R2
    • Windows Server 2016
    • Windows Server Version 1709
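
The CentOS conditions above can be expressed as a small check, sketched here in Python. The image names are copied verbatim from the list above. The function name and its parameters are hypothetical, and the sketch assumes that "not restarted since November 15, 2018" means the last boot happened before that date. How you obtain the image ID, creation date, and last boot date of an instance is outside the scope of the sketch.

```python
from datetime import date

# Image IDs copied verbatim from the list above.
AFFECTED_CENTOS_IMAGES = {
    "centos_7_04_64_20G_alibase_20180419.vhd",
    "centos_7_04_64_20G_alibase_20180326.vhd",
    "centos_7_04_64_20G_alibase_201701015.vhd",
    "centos_7_03_64_20G_alibase_20170818.vhd",
    "centos_7_02_64_20G_alibase_20170818.vhd",
    "centos_7_03_64_40G_alibase_20170710.vhd",
    "centos_7_03_64_40G_alibase_20170625.vhd",
    "centos_7_03_64_40G_alibase_20170523.vhd",
    "centos_7_03_64_40G_alibase_20170503.vhd",
}

CREATED_BEFORE = date(2018, 5, 31)
NOT_RESTARTED_SINCE = date(2018, 11, 15)


def centos_instance_in_scope(image_id, created_on, last_boot_on):
    """True if a CentOS 7 instance meets all three conditions listed above.

    Assumption: an instance whose last boot is earlier than
    NOT_RESTARTED_SINCE counts as "not restarted since" that date.
    """
    return (
        image_id in AFFECTED_CENTOS_IMAGES
        and created_on < CREATED_BEFORE
        and last_boot_on < NOT_RESTARTED_SINCE
    )
```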

Resolution

To resolve this issue, use one of the following solutions based on your specific requirements:

  • Solution 1: Use the cloud assistant client (batch operation).
  • Solution 2: Run the Python SDK script (batch operation).
  • Solution 3: Run the Shell or PowerShell script.
  • Solution 4: Check the ENIs individually (for scenarios with a small number of CentOS instances).

Solution 1: Use the cloud assistant client

In this example, the cloud assistant client is used to automatically check and repair ECS instances. Make sure that the cloud assistant client is installed on your ECS instances. Note that the cloud assistant client is pre-installed on ECS instances created after December 1, 2017. For more information, see Cloud assistant client.

  1. Log on to the Cloud Assistant page of the ECS console.
  2. Select the target region.
  3. Click Create Script. For more information, see Create commands.
  4. Download the Shell script (for CentOS instances) or the PowerShell script (for Windows instances) and paste its content into the Script field.
  5. Locate the created command, and then click Create Task in the Actions column. Then, select the affected ECS instances and run the command on them in batches. For more information, see Run commands.
  6. After the Status changes to Task Completed, click View Results in the Actions column in the Tasks pane. For more information, see Query execution results and statuses.
    [Figure: command results for CentOS instances and Windows instances]

Solution 2: Run the Python SDK script

In this example, a Python script that is developed based on cloud assistant APIs automatically checks and repairs all affected instances in an Alibaba Cloud region. For information about how to install the ECS SDK, see Alibaba Cloud GitHub repositories.

Preparations

Install the following Python SDK dependencies on your local computer or ECS instance:

    pip install aliyun-python-sdk-core
    pip install aliyun-python-sdk-ecs

Procedure

  1. Download autofix_dhclient.py to your local computer or ECS instance.
  2. (Optional) Run the python autofix_dhclient.py command to view instructions on script use.
    # python autofix_dhclient.py
    Usage: autofix_dhclient.py <AccessKeyID> <AccessKeySecret> <region-id>
    Parameter descriptions:
    • AccessKeyID: Your AccessKey ID. For information about how to obtain it, see Create an AccessKey.
    • AccessKeySecret: Your AccessKey secret.
    • region-id: The ID of the region to which the instance belongs. For information about the value range, see Regions and zones.
  3. Follow the instructions to set the AccessKeyID, AccessKeySecret and region-id, and then run the script as the root user or the administrator, for example:
    # python autofix_dhclient.py LTAIn*******Py6J kXXIOEoPXXvsYRUd**********TRyU cn-hangzhou

Command result

[Figure: command result of the Python SDK script]

The status check items of your instances are described as follows:

  • Cloud Assistant: Checks whether the cloud assistant client is installed on your instances.
    • Installed: The cloud assistant client is installed.
    • Not Installed: The cloud assistant client is not installed. You can configure the cloud assistant client, and then continue the repair.
  • NeedFix: Checks whether the dhclient process or DHCP Client service needs to be repaired.
    • Yes: A repair is required and will be automatically completed by the script.
    • No: No repair is required.
    • Unknown: The script cannot determine whether a repair is required. You need to manually run the script.
  • FixResult: Reports the repair result.
    • Success: The dhclient process or DHCP Client service is repaired.
    • Failed: The repair fails.
    • NoChange: No repair is required.
    • Unknown: The script cannot determine whether a repair is required. You need to manually run the script.
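
When you review the results for many instances, the status values above can be mapped to a follow-up action. The following Python sketch shows one way to do so. The function name and the returned strings are hypothetical. Only the status values themselves come from the script output described above.

```python
def next_action(cloud_assistant, need_fix, fix_result):
    """Map one row of status values to a suggested follow-up action.

    The arguments are the Cloud Assistant, NeedFix, and FixResult values
    described above; the returned text is illustrative only.
    """
    if cloud_assistant == "Not Installed":
        return "configure the cloud assistant client, then continue the repair"
    if need_fix == "No" or fix_result == "NoChange":
        return "no action required"
    if need_fix == "Unknown" or fix_result == "Unknown":
        return "run the repair script manually on the instance"
    if fix_result == "Success":
        return "repaired, no further action required"
    # Remaining case: FixResult is Failed.
    return "repair failed, open a ticket"
```

For example, an instance that reports Installed, Yes, and Failed is one for which you would open a ticket, as described in Contact us.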

Solution 3: Run the Shell or PowerShell script

This solution requires you to connect to the affected instances. It applies to scenarios with a small number of instances.

Steps for CentOS instances

  1. Connect to the instances. For information about how to connect to an instance, see Overview.
  2. Download the repair script linux_fix_dhclient.sh to a local directory.
  3. Go to the working directory where the script is stored and run the script as the root user.
    sudo bash linux_fix_dhclient.sh
    • If return code: 0 is returned, the repair is successful and you can skip the following steps.
    • If other messages are returned, continue to repair abnormal processes.

Steps for Windows instances

  1. Connect to the instances. For information about how to connect to an instance, see Overview.
  2. Download the repair script win_fix_dhclient.ps1 to a local directory.
  3. As the administrator, go to the working directory where the script is stored and run the PowerShell script:
    powershell -executionpolicy bypass -file C:\win_fix_dhclient.ps1
    Note You need to replace C:\win_fix_dhclient.ps1 in the command with the actual file path.
    • If "No ip will expire in recent 500 days. Then no need fix." is returned, the DHCP Client is normal and no repair is required.
    • If "Found one ip will expire in 500 days. We need fixing it!!! Fix it now... Fix success." is returned, the DHCP Client was abnormal and has been repaired.
    • If other messages are returned, the repair fails.

Solution 4: Check the ENIs individually

Check the dhclient process of each ENI to detect abnormal dhclient processes. If the dhclient process is not running, run the ifup ethX command to restart the process, where X indicates the ID of the ENI. To do so, follow these steps:

  1. Connect to the instances.
  2. Run the ls -al /sys/class/net/ command to check all ENIs of the instances.
  3. Run the cat /etc/sysconfig/network-scripts/ifcfg-eth0 command to check whether eth0 uses the IP address assigned through DHCP.
    • BOOTPROTO=dhcp indicates that eth0 uses the IP address assigned through DHCP.
    • If IP addresses are not assigned through DHCP, go to step 7.
  4. Run the ps aux | grep dhclient | grep eth0 command to check the running status of the dhclient process that corresponds to eth0.
    • If no result is returned, the dhclient process is running abnormally. In this case, go to step 5.
    • If the following result is returned, the dhclient process is running normally. In this case, go to step 7.
      root 15340 0.0 0.3 113372 12788 ? Ss 14:16 0:00 /sbin/dhclient -1 -q -lf /var/lib/dhclient/dhclient--eth0.lease -pf /var/run/dhclient-eth0.pid -H izuf695ygwh32u2i******z eth0
  5. Run the ifup eth0 command to restart the dhclient process.
  6. Run the command in step 4 again to confirm that the dhclient process that corresponds to eth0 is now running.
  7. Repeat steps 3 through 6 to check and repair the dhclient process of other ENIs as needed.
    Note You need to replace eth0 in the command with the actual ID of the ENI.
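
The manual steps above can also be scripted. The following Python sketch is one possible way to do so, not the official repair script. It uses only the commands and paths named in the steps (/sys/class/net, the ifcfg files, ps aux, and ifup). The function names are hypothetical, and the sketch assumes that it runs as the root user on the instance.

```python
import os
import subprocess


def uses_dhcp(ifcfg_text):
    """True if the ifcfg file content sets BOOTPROTO=dhcp (step 3)."""
    for line in ifcfg_text.splitlines():
        key, _, value = line.strip().partition("=")
        if key == "BOOTPROTO":
            return value.strip().strip("\"'").lower() == "dhcp"
    return False


def dhclient_running(ps_output, iface):
    """True if `ps aux` output lists a dhclient process for iface (step 4)."""
    for line in ps_output.splitlines():
        # `ps aux` prints 10 fixed columns; the command line starts at index 10.
        command = line.split()[10:]
        if command and os.path.basename(command[0]) == "dhclient" and iface in command[1:]:
            return True
    return False


def check_and_repair(iface):
    """Run steps 3 through 6 for one ENI and return a short status string."""
    path = "/etc/sysconfig/network-scripts/ifcfg-" + iface
    if not os.path.exists(path):
        return "no ifcfg file"
    with open(path) as handle:
        if not uses_dhcp(handle.read()):
            return "not assigned through DHCP"
    if dhclient_running(subprocess.check_output(["ps", "aux"], universal_newlines=True), iface):
        return "dhclient running"
    subprocess.call(["ifup", iface])  # step 5: restart the dhclient process
    ps_output = subprocess.check_output(["ps", "aux"], universal_newlines=True)
    return "repaired" if dhclient_running(ps_output, iface) else "repair failed"


def check_all():
    """Steps 2 and 7: visit every ENI listed under /sys/class/net except lo."""
    return {
        iface: check_and_repair(iface)
        for iface in sorted(os.listdir("/sys/class/net"))
        if iface != "lo"
    }
```

Calling check_all() on an affected instance returns one status per ENI. The parsing helpers uses_dhcp and dhclient_running are kept separate from the commands so that you can try them against saved output before you run anything on a production instance.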

Contact us

If the repair fails, open a ticket to contact Alibaba Cloud.