Elastic Compute Service: Automatic service restoration

Last Updated: May 14, 2024

Services or scripts may stop running due to program exceptions, instance restarts, or power outages. If they are not resumed promptly, your online business may suffer losses. You can use the Cloud Assistant plug-in ecs-tool-servicekeepalive to quickly resume interrupted services or scripts and help ensure service reliability and continuity.

Solution overview

The solution relies on systemd, the service manager provided by the Linux operating system. When you use the ecs-tool-servicekeepalive plug-in, you only need to enter the command that starts your service or program, for example, python /home/root/main.py. The plug-in automatically generates the systemd service configuration from the startup command that you enter, so the service or script starts automatically without you having to configure systemd yourself.

Note

The systemd service is a Linux component and can be used to automatically manage services. For example, the systemd service can start a service or script on instance startup or restart a service after an unexpected stop. For more information, see systemd documentation.
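To make the idea concrete, the following sketch prints the kind of unit file that systemd expects for a service that starts at boot and restarts after an unexpected stop. The configuration that ecs-tool-servicekeepalive actually generates is not shown in this topic, so the function name render_unit, the description text, and the Restart= and RestartSec= values are assumptions for illustration only.

```shell
#!/bin/bash
# Illustrative only: prints a minimal systemd unit for a given start command.
# This is NOT the file that the plug-in generates; it is an assumed equivalent.
render_unit() {
    local cmd="$1"            # startup command, for example: /bin/bash /home/work/debug/debug.sh
    local user="${2:-root}"   # account that runs the service (hypothetical default)
    cat <<EOF
[Unit]
Description=Keepalive wrapper for: ${cmd}

[Service]
ExecStart=${cmd}
User=${user}
Restart=always
RestartSec=1

[Install]
WantedBy=multi-user.target
EOF
}

# Print the unit to stdout; nothing is written to /etc/systemd/system.
render_unit "/bin/bash /home/work/debug/debug.sh /home/work/debug/debug.log"
```

In a unit like this, Restart=always asks systemd to restart the process whenever it exits, and WantedBy=multi-user.target allows the unit to be enabled so that it starts on instance startup.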

Procedure

  1. After services or programs are deployed, start the ecs-tool-servicekeepalive plug-in of Cloud Assistant as the root user.

    Run a service or script as the root user

    sudo acs-plugin-manager --exec --plugin ecs-tool-servicekeepalive --params "start,'cmd'"

    cmd: Replace the parameter with a command that starts a service or script. For example, you can enter the /bin/bash /home/work/debug/debug.sh command, which is used to run a script, or the python /home/root/main.py command, which is used to run a program.

    Important

    The path of the script or program file must be an absolute path, that is, a path that starts from the root directory (/).

    Run a service or script by specifying a username

    sudo acs-plugin-manager --exec --plugin ecs-tool-servicekeepalive --params "start,execstart='cmd',user=user_name,group=group_name"
    • cmd: Replace the parameter with a service startup command. For example, the /bin/bash /home/work/debug/debug.sh command is used to run a script, or the python /home/root/main.py command is used to run a program.

      Important

      The path of the script or program file must be an absolute path, that is, a path that starts from the root directory (/).

    • user_name: Replace the parameter with the name of the user that runs the service. To list the existing users, run the cut -d: -f1 /etc/passwd command.

    • group_name: Replace the parameter with the name of the user group under which the service runs. To list the existing user groups, run the cut -d: -f1 /etc/group command.

  2. Run the following command to check whether automatic restoration is enabled for the service:

    sudo acs-plugin-manager --exec --plugin ecs-tool-servicekeepalive --params "status"

    If the configuration is successful, the response shown in the following figure is returned.

    image

  3. (Optional) To disable automatic restoration for a service or script, run the following command:

    sudo acs-plugin-manager --exec --local --plugin ecs-tool-servicekeepalive --params "stop service_name"

    service_name: Replace the parameter with the name of the service. You can find the service name in the service_name column of the command output in Step 2.
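The two forms of the start parameter in Step 1 differ only in how the --params value is assembled. The following sketch builds that value from its parts and prints the resulting command line instead of running it. The helper name build_start_params and the user and group name work are hypothetical; only the parameter formats themselves come from the procedure above.

```shell
#!/bin/bash
# Illustrative helper: builds the --params value for the "start" action.
# With only a command, it emits the root-user form: start,'cmd'
# With a user and a group, it emits: start,execstart='cmd',user=...,group=...
build_start_params() {
    local cmd="$1"
    local user="${2:-}"
    local group="${3:-}"

    if [ -n "$user" ] && [ -n "$group" ]; then
        printf "start,execstart='%s',user=%s,group=%s\n" "$cmd" "$user" "$group"
    else
        printf "start,'%s'\n" "$cmd"
    fi
}

# Print the full command line for review; this sketch does not execute the plug-in.
params="$(build_start_params "/bin/bash /home/work/debug/debug.sh" work work)"
echo sudo acs-plugin-manager --exec --plugin ecs-tool-servicekeepalive --params "\"$params\""
```

Printing the command first makes it easier to check the quoting, because the startup command is wrapped in single quotes inside a double-quoted --params value.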

Example

  1. Prepare the environment.

    Create the /home/work/debug folder and then create the debug.sh script in the folder. The script prints one line of log data per second to the specified log file.

    sudo mkdir -p /home/work/debug && \
    sudo tee /home/work/debug/debug.sh > /dev/null << 'EOF'
    #!/bin/bash
    # Append one timestamped line per second to the log file passed as $1.
    while true
    do
        echo "$(date '+%Y-%m-%d %H:%M:%S') progress is alive" >> "$1"
        sleep 1
    done
    EOF

    Run the ps aux | grep debug.sh command. The command output shows that the script is not running.

    image

  2. Start the Cloud Assistant plug-in.

    sudo acs-plugin-manager --exec --plugin ecs-tool-servicekeepalive --params "start,'/bin/bash /home/work/debug/debug.sh /home/work/debug/debug.log'"

    Run the ps aux | grep debug.sh command. The command output shows that the script is running. The process ID (PID) is 2572.

    image

  3. Check whether the script can automatically resume.

    Restart the ECS instance and check whether the script resumes as expected

    Restart the Elastic Compute Service (ECS) instance in the ECS console. After the ECS instance is restored, log on to the instance and run the following command:

    ps aux |grep debug.sh

    The debug.sh process is running as expected, and its process ID (PID) has changed to 764, which indicates that the script was restarted.

    image

    Kill the process and check whether the script resumes as expected

    1. Run the following command to find the process ID (PID) of the debug.sh process:

      ps aux | grep debug.sh

      The following output is displayed. The PID of the debug.sh process is 2572.

      image

    2. Run the following command to kill the debug.sh process:

      date && sudo kill -9 <Process number>

    3. Run the following command. The command output shows that the debug.sh process is still running and that its process ID (PID) has changed to 4220, which indicates that the script was restarted.

      ps aux |grep debug.sh

      image
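As a complement to comparing process IDs, you can check that the log file from the example keeps growing, because debug.sh appends one line per second while it is alive. The following is a minimal sketch; the function name log_is_growing and the default wait of 3 seconds are assumptions, and the log path is the one used in the example above.

```shell
#!/bin/bash
# Illustrative check: returns 0 if FILE gains at least one line within WAIT seconds,
# 1 if the line count does not increase, and 2 if FILE cannot be read.
log_is_growing() {
    local file="$1"
    local wait="${2:-3}"
    local before after

    before=$(wc -l < "$file") || return 2
    sleep "$wait"
    after=$(wc -l < "$file") || return 2
    [ "$after" -gt "$before" ]
}

# Example usage on the instance; skipped if the log file does not exist.
if [ -f /home/work/debug/debug.log ]; then
    if log_is_growing /home/work/debug/debug.log; then
        echo "debug.sh is writing to the log"
    else
        echo "debug.sh appears to be stopped"
    fi
fi
```

Running this check shortly after you kill the process or restart the instance gives a second signal, independent of the process list, that the script has resumed.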

References

As your business grows, the number of data requests and concurrent page views increases. You can deploy multiple ECS instances to implement zone-level disaster recovery and ensure data availability and continuity. For more information, see Deploy a highly available architecture.