startup configuration persistence fstab boot autostart disk mount - Elastic Compute Service

This topic explains how to configure persistent disk mounting and automatic startup of business processes for Elastic Compute Service (ECS) instances. It also explains how to set up monitoring and alerting to ensure services automatically recover after unexpected restarts, improving system stability.

Business scenario

An ECS instance may restart unexpectedly due to operating system out-of-memory (OOM) errors or host failures. After a restart, temporary configurations—such as manually mounted disks or manually started processes—are not restored automatically. This can prevent services dependent on data disks from starting or cause extended downtime for core business functions, affecting business continuity and availability.

Solution architecture

This solution uses configuration persistence to convert temporary operations into permanent settings, so the instance automatically restores its business state after an unexpected restart. It also combines built-in operating system features with Alibaba Cloud monitoring services to build a resilient system that recovers automatically and sends alerts when recovery fails.

Persistent disk mounting: Use the Linux /etc/fstab file to permanently record disk mount information. Configure mounts using the device UUID instead of the device name (such as /dev/vdb). This avoids mount failures caused by device name changes after a restart.
Persistent business process management: Use systemd to manage business processes as system services. systemd is the standard initialization system for modern Linux distributions. It starts services at boot and automatically restarts them if they exit unexpectedly, ensuring continuous process operation.
Status monitoring and validation:
- Monitoring: Configure Cloud Monitor to monitor disk mount points, core business processes, and instance restart events. Send alerts promptly when automatic recovery fails or anomalies occur.
- Validation: Use the Cloud Assistant fault injection plug-in to simulate instance crashes in non-production environments. Validate end-to-end that configuration persistence and monitoring and alerting work as expected.

Implementation steps

Follow these steps to complete the full configuration—from disk mounting and process auto-start to monitoring, alerting, and final validation.

Step 1: Configure automatic disk mounting at boot

This step ensures your data disk mounts automatically to a specified directory after an instance restart. Use the UUID of the file system to identify it. The UUID stays the same across restarts and changes only if the file system is re-created, so it is more reliable than device names such as /dev/vdb1, which can change.

Back up the fstab file
Before editing the critical system configuration file /etc/fstab, back it up. This lets you quickly restore it if a configuration error occurs.
```
sudo cp /etc/fstab /etc/fstab.bak
```
Obtain the disk UUID
Run the blkid command to obtain the UUID of the target partition. This example uses partition /dev/vdb1.
```
sudo blkid /dev/vdb1
```
The command returns output similar to the following. Record the UUID value.
```
/dev/vdb1: UUID="f1645951-134f-4677-b5f4-c65c71f8f86d" TYPE="ext4" PARTUUID="..."
```
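If you script this step, the UUID can be pulled out of the blkid line shown above. The following is a minimal sketch; extract_uuid is a hypothetical helper name, not part of blkid. On most systems, running blkid with -s UUID -o value prints the bare value directly, so the helper is mainly useful when you only have the full output line.

```shell
#!/bin/sh
# extract_uuid (hypothetical helper): read one line of blkid output on stdin
# and print only the value of the UUID="..." field.
extract_uuid() {
    # Requiring a space or colon before UUID keeps PARTUUID="..." from matching.
    sed -n 's/^.*[: ]UUID="\([^"]*\)".*$/\1/p'
}

# Example usage on a real instance (requires the device to exist):
#   sudo blkid /dev/vdb1 | extract_uuid
```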
Add the mount entry to /etc/fstab
Edit the /etc/fstab file and add a new mount entry at the end.
```
sudo nano /etc/fstab
```
Add a new line in the following format. Replace <Your-UUID> with the UUID you recorded, and change the mount point (/mnt in this example) and file system type (ext4 in this example) to match your environment.
```
# <Device>         <Mount-point>    <File-system-type>    <Mount-options>      <dump> <pass>
UUID=<Your-UUID>  /mnt        ext4        defaults,nofail   0      2
```
About the nofail mount option:
- Purpose: The nofail option ensures the instance boots normally even if the disk cannot be mounted. Without it, a failed mount can interrupt the boot process, for example by leaving the system waiting for the device or dropping it into emergency mode.
- When to use it: Add this option to all non-critical data disks.
- Risk note: With this option, the system does not report mount failures during boot. Therefore, monitor the mount point in later steps to detect and fix issues quickly.
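As an alternative to editing the file by hand, the entry can be appended by a script. The sketch below assumes the same options as the example entry (defaults,nofail, dump 0, pass 2); add_fstab_entry is a hypothetical helper name. It skips the append if the UUID is already listed, so running it twice does not create a duplicate line.

```shell
#!/bin/sh
# add_fstab_entry (hypothetical helper): append a UUID-based mount entry to an
# fstab-style file unless an entry for that UUID is already present.
# Arguments: <fstab-file> <uuid> <mount-point> <fs-type>
add_fstab_entry() {
    fstab_file=$1; uuid=$2; mount_point=$3; fs_type=$4

    # Skip the append if this UUID is already listed, so reruns are harmless.
    if grep -q "^UUID=${uuid}[[:space:]]" "$fstab_file"; then
        echo "entry for UUID=${uuid} already present"
        return 0
    fi

    # Options, dump, and pass are fixed to the values used in the example entry.
    printf 'UUID=%s  %s  %s  defaults,nofail  0  2\n' \
        "$uuid" "$mount_point" "$fs_type" >> "$fstab_file"
}

# Example usage (run as root, after backing up /etc/fstab):
#   add_fstab_entry /etc/fstab f1645951-134f-4677-b5f4-c65c71f8f86d /mnt ext4
```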
Verify the automatic mount configuration
Test the /etc/fstab configuration without restarting the instance.
```
# Unmount the current mount point, for example /mnt
sudo umount /mnt

# Mount every file system listed in /etc/fstab that is not already mounted
sudo mount -a

# Check whether the mount succeeded
sudo lsblk -f
```
If the output of the lsblk -f command shows that the target device is mounted to the specified directory, the configuration is successful. If the mount -a command reports an error, the entry in /etc/fstab is incorrect. Correct the entry, or run the sudo cp /etc/fstab.bak /etc/fstab command to restore the backup, before you restart the instance. If the instance has already restarted and you cannot log in remotely, connect to it using VNC and restore the backup there so that the instance can boot normally.
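The test-then-restore sequence can also be wrapped in one function so that a bad entry never survives until the next restart. This is a sketch only: try_fstab and the MOUNT_CMD variable are hypothetical names, and MOUNT_CMD exists purely so that the logic can be rehearsed without touching real mounts.

```shell
#!/bin/sh
# MOUNT_CMD is a hypothetical override for rehearsals; by default the real
# "mount -a" is used to test the fstab entries.
MOUNT_CMD=${MOUNT_CMD:-"mount -a"}

# try_fstab (hypothetical helper): test the fstab file with MOUNT_CMD and copy
# the backup over it if the test fails.
# Arguments: <fstab-file> <backup-file>
try_fstab() {
    fstab_file=$1; backup_file=$2

    if $MOUNT_CMD; then
        echo "mount test passed; keeping $fstab_file"
        return 0
    fi

    echo "mount test failed; restoring $fstab_file from $backup_file" >&2
    cp "$backup_file" "$fstab_file"
    return 1
}

# Example usage (run as root):
#   try_fstab /etc/fstab /etc/fstab.bak
```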

Step 2: Configure automatic process startup and service keepalive at boot

This step registers your business process as a system service using systemd. This enables automatic startup at boot and automatic restart if the process exits unexpectedly.

Method 1: systemd (recommended): Offers stronger, more transparent, and customizable service management. It is the best practice for production environments. Though slightly more complex to configure, it provides fine-grained control over service dependencies, run-as users, and environment variables.
Method 2: Use the Cloud Assistant plug-in for service keep-alive: The Cloud Assistant ecs-tool-servicekeepalive plug-in auto-generates systemd configuration files. It is simple to use and fits basic scenarios where detailed service management is not required.

This tutorial uses manual systemd configuration.

(Optional but recommended) Create a dedicated user for the application
Follow the principle of least privilege. Do not run business processes as the root user.
```
sudo groupadd myapp
sudo useradd -r -s /bin/false -g myapp myapp
```

Create a systemd service configuration file

In the /etc/systemd/system/ directory, create a service file ending in .service, such as myapp.service.

```
sudo nano /etc/systemd/system/myapp.service
```

Paste the template below into the file and update it for your environment.

```
[Unit]
Description=My Application Service
# Start the service after network and remote file systems are ready
After=network-online.target remote-fs.target
Wants=network-online.target
# Uncomment the next line if the service depends on the disk mounted in Step 1 (for example, /mnt)
# RequiresMountsFor=/mnt

[Service]
# --- Security configuration ---
# Run the service as the dedicated user and group
User=myapp
Group=myapp

# --- Runtime configuration ---
# Working directory for the process
WorkingDirectory=/opt/myapp
# Absolute path and arguments for the start command
ExecStart=/usr/local/bin/myapp --config /etc/myapp/config.json

# --- Automatic restart policy ---
# Restart only if the service exits abnormally (for example, crashes)
Restart=on-failure
# Time interval between restart attempts
RestartSec=5s

[Install]
# Define the target for enabling the service. multi-user.target means enable in multi-user mode.
WantedBy=multi-user.target
```

Enable and start the service

Run the following commands to reload the systemd configuration, enable boot-time auto-start, and start the service immediately.

```
# Reload systemd configuration
sudo systemctl daemon-reload
# Enable boot-time autostart
sudo systemctl enable myapp.service
# Start the service now
sudo systemctl start myapp.service
```

Check the service status
Verify the service is running and enabled for boot-time auto-start.
```
sudo systemctl status myapp.service
```
If the Active field shows active (running) and the Loaded line includes enabled, the configuration is successful.
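The same two conditions can be checked in a script with systemctl is-active and systemctl is-enabled, which print the state as a single word. The sketch below is illustrative; service_ready is a hypothetical helper that only compares the two strings, so the systemctl calls stay in the calling code.

```shell
#!/bin/sh
# service_ready (hypothetical helper): succeed only when the two state strings
# are exactly "active" and "enabled".
# Arguments: <is-active output> <is-enabled output>
service_ready() {
    [ "$1" = "active" ] && [ "$2" = "enabled" ]
}

# Example usage on the instance:
#   if service_ready "$(systemctl is-active myapp.service)" \
#                    "$(systemctl is-enabled myapp.service)"; then
#       echo "myapp.service is running and enabled at boot"
#   else
#       echo "myapp.service needs attention" >&2
#   fi
```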

Step 3: Configure monitoring and alerting

After you make the configuration persistent, set up monitoring so that you receive alerts immediately if automatic recovery fails.

Mount point availability monitoring
- Purpose: Ensure key mount points (such as /mnt) remain available.
- How to implement: Use a scheduled script that runs a command such as df -h or mountpoint to check the mount points regularly. Alert immediately if a mount is missing.
Key process liveness monitoring
- Purpose: Ensure core business processes (such as myapp) are running.
- How to implement: Use Cloud Monitor’s process monitoring feature.
Instance unexpected restart monitoring
- Purpose: Detect unexpected instance restarts and other system events.
- How to implement: Use Cloud Monitor’s system event alerting feature.
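For the mount point check, a scheduled script could look like the following sketch. It reads the kernel mount table rather than parsing df output; is_mounted is a hypothetical helper name, and how the alert reaches you (a collected log line, a custom event, or something else) is an assumption left to your environment.

```shell
#!/bin/sh
# is_mounted (hypothetical helper): succeed if <mount-point> appears as a
# mount target in a mounts table. The table defaults to /proc/self/mounts,
# where the second whitespace-separated field is the mount point.
# Arguments: <mount-point> [mounts-file]
is_mounted() {
    awk -v target="$1" '$2 == target { found = 1 } END { exit !found }' \
        "${2:-/proc/self/mounts}"
}

# Example scheduled check; the alert delivery shown here is only a log line.
#   is_mounted /mnt || logger -p user.err "mount point /mnt is missing"
```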

For more information about monitoring configuration, see Cloud Monitor.

Step 4: Validate configuration persistence (crash drill)

After completing all configurations, test the full self-healing and alerting system by simulating a failure.

Important

Fault injection causes instance restarts and service interruptions. Run this drill only in non-production environments or during approved maintenance windows.

Inject the fault
Use the Cloud Assistant plug-in ecs-fault-oscrash to simulate a kernel panic and trigger an unexpected restart.
```
sudo acs-plugin-manager --exec --plugin ecs-fault-oscrash --params inject
```
Wait for the instance to restart
After the command runs, the instance crashes and restarts immediately. Wait approximately three to five minutes. Then, reconnect to the instance.
End-to-end validation
- Check disk mounting: Run lsblk -f and confirm the data disk is mounted automatically.
- Check the business process: Run systemctl status myapp.service and confirm the service started automatically and shows active (running).
- Check monitoring and alerting: Log in to the Cloud Monitor console and verify that event alerts were received.
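To confirm that the instance really restarted during the drill, one option is to compare the kernel boot ID before and after the fault. The sketch below assumes a Linux instance that exposes /proc/sys/kernel/random/boot_id; save_boot_id, rebooted_since, and the file path in the usage comment are hypothetical.

```shell
#!/bin/sh
# BOOT_ID_FILE holds the kernel's per-boot identifier, which changes on every boot.
BOOT_ID_FILE=${BOOT_ID_FILE:-/proc/sys/kernel/random/boot_id}

# save_boot_id (hypothetical helper): record the current boot ID before the
# drill and flush it to disk so it survives the crash.
# Arguments: <saved-file>
save_boot_id() {
    cat "$BOOT_ID_FILE" > "$1" && sync
}

# rebooted_since (hypothetical helper): succeed if the current boot ID differs
# from the one recorded in <saved-file>.
rebooted_since() {
    [ "$(cat "$BOOT_ID_FILE")" != "$(cat "$1")" ]
}

# Example usage:
#   save_boot_id /var/tmp/drill.boot_id        # before injecting the fault
#   rebooted_since /var/tmp/drill.boot_id && echo "instance restarted"
```

The saved file is placed under /var/tmp in the usage example because /tmp is often cleared at boot.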