This topic explains how to configure persistent disk mounting and automatic startup of business processes for Elastic Compute Service (ECS) instances. It also explains how to set up monitoring and alerting to ensure services automatically recover after unexpected restarts, improving system stability.
Business scenario
An ECS instance may restart unexpectedly due to operating system out-of-memory (OOM) errors or host failures. After a restart, temporary configurations—such as manually mounted disks or manually started processes—are not restored automatically. This can prevent services dependent on data disks from starting or cause extended downtime for core business functions, affecting business continuity and availability.
Solution architecture
This solution uses configuration persistence to convert temporary operations into permanent settings. This allows the instance to automatically restore its business state after an unexpected restart. It also combines built-in operating system features with Alibaba Cloud monitoring services to build a resilient system that supports automatic recovery, monitoring, and alerting.
Persistent disk mounting: Use the Linux /etc/fstab file to permanently record disk mount information. Configure mounts using the device UUID instead of the device name (such as /dev/vdb). This avoids mount failures caused by device name changes after a restart.
Persistent business process management: Use systemd to manage business processes as system services. systemd is the standard initialization system for modern Linux distributions. It starts services at boot and automatically restarts them if they exit unexpectedly, ensuring continuous process operation.
Status monitoring and validation:
Monitoring: Configure Cloud Monitor to monitor disk mount points, core business processes, and instance restart events. Send alerts promptly when automatic recovery fails or anomalies occur.
Validation: Use the Cloud Assistant fault injection plug-in to simulate instance crashes in non-production environments. Validate end-to-end that configuration persistence and monitoring and alerting work as expected.
Implementation steps
Follow these steps to complete the full configuration—from disk mounting and process auto-start to monitoring, alerting, and final validation.
Step 1: Configure automatic disk mounting at boot
This step ensures your data disk mounts automatically to a specified directory after an instance restart. Use the disk’s UUID to identify it. The file system UUID stays the same across restarts unless the file system is re-created, so it is more reliable than device names such as /dev/vdb1, which can change.
Back up the fstab file
Before editing the critical system configuration file /etc/fstab, back it up. This lets you quickly restore it if a configuration error occurs.
    sudo cp /etc/fstab /etc/fstab.bak
Obtain the disk UUID
Run the blkid command to obtain the UUID of the target partition. This example uses partition /dev/vdb1.
    sudo blkid /dev/vdb1
The command returns output similar to the following. Record the UUID value.
    /dev/vdb1: UUID="f1645951-134f-4677-b5f4-c65c71f8f86d" TYPE="ext4" PARTUUID="..."
Add the mount entry to /etc/fstab
Edit the /etc/fstab file and add a new mount entry at the end.
    sudo nano /etc/fstab
Add a new line in the following format. Replace <Your-UUID> with your actual value, and change the example mount point (/mnt) and file system type (ext4) to match your environment.
    # <Device> <Mount-point> <File-system-type> <Mount-options> <dump> <pass>
    UUID=<Your-UUID> /mnt ext4 defaults,nofail 0 2
About the nofail mount option:
Purpose: The nofail option ensures the instance boots normally even if the disk cannot be mounted. Without it, the boot process would hang waiting for the mount.
When to use it: Add this option to all non-critical data disks.
Risk note: With this option, the system does not report mount failures during boot. Therefore, monitor the mount point in later steps to detect and fix issues quickly.
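The manual edit above can also be scripted. The following is a minimal sketch, not part of the original procedure: the function names fstab_line and add_fstab_entry are hypothetical, and the options (defaults,nofail 0 2) are copied from the example entry. It appends the entry only if the UUID is not already listed, so running it twice does not create a duplicate line.

```shell
# Sketch: build and append a UUID-based fstab entry (illustrative helper names).

# Print one fstab line: device, mount point, file system type, options, dump, pass.
fstab_line() {
  local uuid="$1" mount_point="$2" fs_type="$3"
  printf 'UUID=%s %s %s defaults,nofail 0 2\n' "$uuid" "$mount_point" "$fs_type"
}

# Append the line to the given file unless an entry for this UUID already exists.
add_fstab_entry() {
  local file="$1" uuid="$2" mount_point="$3" fs_type="$4"
  if grep -q "^UUID=${uuid}[[:space:]]" "$file" 2>/dev/null; then
    echo "UUID ${uuid} is already present in ${file}; nothing to do." >&2
    return 0
  fi
  fstab_line "$uuid" "$mount_point" "$fs_type" >> "$file"
}
```

One way to try it is against a copy of the file first, for example add_fstab_entry /tmp/fstab.test "$(sudo blkid -s UUID -o value /dev/vdb1)" /mnt ext4, and only then against /etc/fstab as root.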
Verify the automatic mount configuration
Test the /etc/fstab configuration without restarting the instance.
    # Unmount the current mount point, for example /mnt
    sudo umount /mnt
    # Reload all entries in /etc/fstab
    sudo mount -a
    # Check whether the mount succeeded
    sudo lsblk -f
If the output of the lsblk -f command shows that the target device is correctly mounted to the specified directory, the configuration is successful. If the mount -a command reports an error, this indicates a problem with the /etc/fstab file configuration. In this case, you can connect to the instance using VNC. Then, run the sudo mv /etc/fstab.bak /etc/fstab command to restore from the backup and ensure that the instance can restart normally.
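Before running mount -a, a quick structural check of the file can catch simple typing mistakes. The sketch below is an assumption-laden example rather than an official tool: the function name check_fstab_fields is hypothetical, and it only checks that each entry has between four and six whitespace-separated fields (the dump and pass fields are optional). It does not verify that the UUID or mount point exists.

```shell
# Sketch: report fstab entries with an unexpected number of fields.
# Exits non-zero if any entry has fewer than 4 or more than 6 fields.
check_fstab_fields() {
  awk '
    /^[ \t]*(#|$)/ { next }   # skip comments and blank lines
    NF < 4 || NF > 6 {
      printf "line %d: expected 4-6 fields, found %d\n", NR, NF
      bad = 1
    }
    END { exit bad + 0 }
  ' "$1"
}
```

For example, check_fstab_fields /etc/fstab prints nothing and returns 0 when every entry has a plausible shape.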
Step 2: Configure automatic process startup and service keepalive at boot
This step registers your business process as a system service using systemd. This enables automatic startup at boot and automatic restart if the process exits unexpectedly.
Method 1: systemd (recommended): Offers stronger, more transparent, and customizable service management. It is the best practice for production environments. Though slightly more complex to configure, it provides fine-grained control over service dependencies, run-as users, and environment variables.
Method 2: Use the Cloud Assistant plug-in for service keep-alive: The Cloud Assistant ecs-tool-servicekeepalive plug-in auto-generates systemd configuration files. It is simple to use and fits basic scenarios where detailed service management is not required.
This tutorial uses manual systemd configuration.
(Optional but recommended) Create a dedicated user for the application
Follow the principle of least privilege. Do not run business processes as the root user.
    sudo groupadd myapp
    sudo useradd -r -s /bin/false -g myapp myapp
Create a systemd service configuration file
In the /etc/systemd/system/ directory, create a service file ending in .service, such as myapp.service.
    sudo nano /etc/systemd/system/myapp.service
Paste the template below into the file and update it for your environment.
    [Unit]
    Description=My Application Service
    # Start the service after network and remote file systems are ready
    After=network-online.target remote-fs.target
    Wants=network-online.target
    # Uncomment the next line if the service depends on the disk mounted in Step 1 (for example, /mnt)
    # RequiresMountsFor=/mnt

    [Service]
    # --- Security configuration ---
    # Run the service as the dedicated user and group
    User=myapp
    Group=myapp
    # --- Runtime configuration ---
    # Working directory for the process
    WorkingDirectory=/opt/myapp
    # Absolute path and arguments for the start command
    ExecStart=/usr/local/bin/myapp --config /etc/myapp/config.json
    # --- Automatic restart policy ---
    # Restart only if the service exits abnormally (for example, crashes)
    Restart=on-failure
    # Time interval between restart attempts
    RestartSec=5s

    [Install]
    # Define the target for enabling the service. multi-user.target means enable in multi-user mode.
    WantedBy=multi-user.target
Enable and start the service
Run the following commands to reload the systemd configuration, enable boot-time auto-start, and start the service immediately.
    # Reload systemd configuration
    sudo systemctl daemon-reload
    # Enable boot-time autostart
    sudo systemctl enable myapp.service
    # Start the service now
    sudo systemctl start myapp.service
Check the service status
Verify the service is running and enabled for boot-time auto-start.
    sudo systemctl status myapp.service
If the Active field shows active (running) and the Loaded line includes enabled, the configuration is successful.
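When the service reads or writes data on the disk from Step 1, the RequiresMountsFor line in the template matters: with it enabled, systemd orders the service after the mount for that path and should not start the service if the mount is unavailable. The sketch below shows one way to generate such a unit file from a few parameters. The function name render_unit is hypothetical, and the user, group, and restart values are simply the ones used in the template.

```shell
# Sketch: print a unit file for a service that depends on a mounted data disk.
# Arguments: description, full start command, mount point the service needs.
render_unit() {
  local description="$1" exec_start="$2" mount_point="$3"
  cat <<EOF
[Unit]
Description=${description}
After=network-online.target
Wants=network-online.target
RequiresMountsFor=${mount_point}

[Service]
User=myapp
Group=myapp
ExecStart=${exec_start}
Restart=on-failure
RestartSec=5s

[Install]
WantedBy=multi-user.target
EOF
}
```

A possible invocation is render_unit "My Application Service" "/usr/local/bin/myapp --config /etc/myapp/config.json" /mnt | sudo tee /etc/systemd/system/myapp.service, followed by sudo systemctl daemon-reload.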
Step 3: Configure monitoring and alerting
After making the configuration persistent, set up monitoring to receive alerts immediately if automatic recovery fails.
Mount point availability monitoring
Purpose: Ensure key mount points (such as /mnt) remain available.
How to implement: Use a script that runs a command such as df -h to check mount points regularly. Alert immediately if a mount is missing.
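A mount point check along these lines could look like the sketch below. It reads the kernel mount table instead of parsing df -h output, which is a substitution made here for easier scripting. The function names is_mounted and check_mounts are hypothetical, and how the non-zero exit status reaches your alert channel is left open. Mount points containing spaces are not handled, because the mount table escapes them.

```shell
# Sketch: detect missing mount points from the mount table.

# Return 0 if the directory is listed as a mount point.
# The second argument (mount table path) defaults to /proc/self/mounts.
is_mounted() {
  local mount_point="$1" mounts_file="${2:-/proc/self/mounts}"
  awk -v target="$mount_point" \
    '$2 == target { found = 1 } END { exit !found }' "$mounts_file"
}

# Check every mount point given as an argument; return 1 if any is missing.
check_mounts() {
  local missing=0 mp
  for mp in "$@"; do
    if ! is_mounted "$mp"; then
      echo "ALERT: ${mp} is not mounted" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Running check_mounts /mnt from cron or a systemd timer gives a regular check; a non-zero exit status indicates that at least one mount point is missing.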
Key process liveness monitoring
Purpose: Ensure core business processes (such as myapp) are running.
How to implement: Use Cloud Monitor’s process monitoring feature.
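Cloud Monitor remains the mechanism this topic relies on for process monitoring. As a local cross-check on the instance itself, a small function can look for the process by name. The sketch below is only an illustration: process_running is a hypothetical name, it assumes a Linux /proc file system, and it compares against the kernel's short command name, which is truncated to 15 characters.

```shell
# Sketch: return 0 if any running process has the given command name.
process_running() {
  local name="$1" comm_file
  for comm_file in /proc/[0-9]*/comm; do
    # A process can exit between the glob and the read, so ignore read errors.
    if [ "$(cat "$comm_file" 2>/dev/null)" = "$name" ]; then
      return 0
    fi
  done
  return 1
}
```

For example, process_running myapp || echo "myapp is not running" reports when no process with that name exists.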
Instance unexpected restart monitoring
Purpose: Detect system events such as unexpected instance restarts.
How to implement: Use Cloud Monitor’s system event alerting feature.
For more information about monitoring configuration, see Cloud Monitor.
Step 4: Validate configuration persistence (crash drill)
After completing all configurations, test the full self-healing and alerting system by simulating a failure.
Fault injection causes instance restarts and service interruptions. Run this drill only in non-production environments or during approved maintenance windows.
Inject the fault
Use the Cloud Assistant plug-in ecs-fault-oscrash to simulate a kernel panic and trigger an unexpected restart.
    sudo acs-plugin-manager --exec --plugin ecs-fault-oscrash --params inject
Wait for the instance to restart
After the command runs, the instance crashes and restarts immediately. Wait approximately three to five minutes. Then, reconnect to the instance.
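Instead of waiting a fixed time, you can poll from another machine until the instance accepts connections again. The following sketch shows a generic retry helper. The name wait_for is hypothetical, and the SSH user and address in the usage note are placeholders.

```shell
# Sketch: run a command repeatedly until it succeeds or a timeout expires.
# Arguments: timeout in seconds, polling interval in seconds, then the command.
wait_for() {
  local timeout="$1" interval="$2" deadline
  shift 2
  deadline=$(( $(date +%s) + timeout ))
  until "$@"; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      return 1   # timed out without a successful attempt
    fi
    sleep "$interval"
  done
  return 0
}
```

A possible call is wait_for 600 10 ssh -o BatchMode=yes -o ConnectTimeout=5 user@instance-ip true, which retries every 10 seconds for up to 10 minutes.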
End-to-end validation
Check disk mounting: Run lsblk -f and confirm the data disk is mounted automatically.
Check the business process: Run systemctl status myapp.service and confirm the service started automatically and shows active (running).
Check monitoring and alerting: Log in to the Cloud Monitor console and verify that event alerts were received.
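It can also help to confirm that the instance really did restart during the drill, rather than the fault injection failing silently. The sketch below reads the system uptime for that purpose. The function names uptime_seconds and booted_recently are hypothetical, and the file path argument exists only so the logic can be tried against a sample file.

```shell
# Sketch: check whether the system booted within the last N seconds.

# Print whole seconds since boot, read from an uptime-style file
# (first field of /proc/uptime by default).
uptime_seconds() {
  local uptime_file="${1:-/proc/uptime}" raw
  read -r raw _ < "$uptime_file"
  echo "${raw%%.*}"
}

# Return 0 if the uptime is lower than the given maximum age in seconds.
booted_recently() {
  local max_age="$1" uptime_file="${2:-/proc/uptime}"
  [ "$(uptime_seconds "$uptime_file")" -lt "$max_age" ]
}
```

For example, booted_recently 600 && echo "instance restarted within the last 10 minutes" is a quick check right after reconnecting.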