Alibaba Cloud DevOps: Common issues and solutions for host deployment

Last Updated: Apr 21, 2025

This topic lists common issues and solutions for host deployment in Alibaba Cloud DevOps.

Alibaba Cloud ECS prompts "deploy channel error" or "offline"

  1. Restart Cloud Assistant if it is not working properly. For more information, see View execution results and fix common issues.

  2. Clean up the ECS disk if it is out of space.

Self-hosted instances go offline

Restart the agent if it is offline.

Common agent operations

  • Check status.

    /home/staragent/bin/staragentctl status
  • Restart agent.

    /home/staragent/bin/staragentctl restart
  • Uninstall agent.

    /home/staragent/bin/staragentctl stop
    rm -rf /home/staragent
    rm /usr/sbin/staragent_sn

The build running on the instance is not the latest, and cannot be updated

Check the server disk as indicated by the logs. Clean up the disk if it is out of space.

Host deployment fails and no logs for it

For hosts that are not Alibaba Cloud ECS instances: if the self-hosted instance was created from an image, uninstall the agent, add the self-hosted instance again, and then retry the deployment.

Rollback tasks

To roll back multiple tasks, you must roll back each task separately.

  1. In the drop-down menu in Deployment History, select the task you want to roll back.

  2. Click Rollback.

Deployment fails and the service does not start

Debug your deployment script to verify that it works, because Alibaba Cloud DevOps executes the script without any modifications.

  1. Copy the deployment script from the Alibaba Cloud DevOps configuration to the server, and execute it to see whether it starts the service.

  2. If the script works on the server but not in Alibaba Cloud DevOps, replace all relative paths with absolute paths.

Windows host deployment fails with "deploy channel error"

Alibaba Cloud DevOps does not support Windows hosts. Here are two options to try:

  1. Execute a script on a Linux server that connects to the Windows host.

  2. Upload the build to your OSS, so that the Windows host can then download the package from it for deployment.

Cannot add a non-Alibaba Cloud ECS instance to the host group

We recommend the hybrid cloud hosting mode for self-hosted instances that are not Alibaba Cloud ECS instances. For details, see https://help.aliyun.com/document_detail/201140.html

The host reports that an environment variable is "not a valid identifier"

This error occurs because the environment variable contains special characters. To solve this, follow the steps below:

  1. In the host deployment task, select Encode Variables.

  2. In the deployment script, decode all the environment variables from Base64. For example, to use the PIPELINE_ID environment variable, add export PIPELINE_ID=$(echo "$PIPELINE_ID" | base64 -d) at the beginning of the deployment script.
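When several variables need decoding, the export line from step 2 can be wrapped in a small helper. The following is a minimal sketch under stated assumptions: the function name decode_var is hypothetical and not part of Alibaba Cloud DevOps, and it assumes that Encode Variables has been selected so that every listed variable holds valid Base64.

```shell
# Hypothetical helper: decode one Base64-encoded environment variable in place.
decode_var() {
  # $1 is the NAME of the variable (not its value).
  _dv_name="$1"
  # Read the current value of the named variable.
  eval "_dv_raw=\${$_dv_name}"
  # Decode it; stop with a non-zero status if the value is not valid Base64.
  _dv_decoded="$(printf '%s' "$_dv_raw" | base64 -d)" || return 1
  # Re-export the variable with its decoded value.
  export "$_dv_name=$_dv_decoded"
}
```

At the beginning of the deployment script, a call such as decode_var PIPELINE_ID would then replace the encoded value with the decoded one before the rest of the script uses it.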

Deployment fails due to insufficient disk space

  1. Log on to your host and execute df -hl to check the remaining disk space. Clean up the disk if it is out of space.

  2. Check the Runner status and service log information of the host.

    ls -al /etc/systemd/system | grep runner
    systemctl status runner-{version}-{tenant}.service

  3. If the host goes offline, uninstall the residual service and reinstall it.

    #  Stop the specified systemd service.
    systemctl stop runner-{version}-{tenant}.service
    #  Delete the systemd configuration file of the service.
    rm -rf /etc/systemd/system/runner-{version}-{tenant}.service
    #  Delete the runner configuration directory related to the user.
    rm -rf /root/yunxiao/{username}/runner/config
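The disk check in step 1 can also be scripted so that a deployment script stops early when space is short. This is a minimal sketch, not something Alibaba Cloud DevOps provides: the function names and the 90% default threshold are assumptions chosen for illustration.

```shell
# Print the usage percentage (integer, without "%") of the filesystem holding a path.
disk_usage_percent() {
  # df -P prints one POSIX-format line per filesystem; column 5 is the capacity, e.g. "45%".
  df -P "$1" | awk 'NR == 2 { gsub("%", "", $5); print $5 }'
}

# Succeed when usage is at or above the threshold.
disk_is_low() {
  # $1: path to check, $2: threshold in percent (assumed default: 90)
  usage="$(disk_usage_percent "$1")"
  [ "$usage" -ge "${2:-90}" ]
}
```

For example, a deployment script could begin with: if disk_is_low / 90; then echo "disk almost full" >&2; exit 1; fi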

Host deployment fails due to the conflict of Runner configurations

If you create instance B by replicating the image from instance A, host deployment will fail due to the reuse of Runner configurations from instance A in instance B.

  • Root cause analysis: The Runner's unique identifier (for example, its registration information) on instance A is the same as that on instance B. As a result, the routing system cannot differentiate between instance A and instance B.

  • Critical impact:

    • Deployment targets might be misrouted. Tasks to be processed by instance B were dispatched to instance A.

    • Automation workflows might be disrupted, leading to environmental contamination or service overlap.

  • Recommended solutions:

    • Uninstall Runner on instance B.

      1. Identify the Runner service name, which typically follows the pattern "runner-{version}-{tenant}.service".

        ls -al /etc/systemd/system | grep runner
      2. Stop and remove the Runner service.

        # Stop the Runner service  
        systemctl stop runner-{version}-{tenant}.service  
        
        # Remove the service file  
        rm -rf /etc/systemd/system/runner-{version}-{tenant}.service  
      3. Clean up Runner configuration directories.

        # Delete the Runner configuration directory  
        rm -rf /root/yunxiao/{tenant}/runner/config  
        
        # Completely remove the entire Yunxiao directory 
        rm -rf /root/yunxiao/  
      4. Verify full cleanup.

        # Check for running processes  
        ps -ef | grep runner
    • If you replicate an instance by cloning its image, reinstall Runner on the new instance and ensure that the Runner's channel ID is unique.
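Before deleting anything on a cloned instance, it can help to print the cleanup commands and review them. The sketch below only lists Runner unit files and echoes the stop and remove commands from the steps above; it does not execute them. The function names are hypothetical, and the runner-*.service glob assumes that the service follows the typical naming pattern.

```shell
# List Runner unit files in a systemd unit directory.
list_runner_units() {
  # $1: unit directory (defaults to /etc/systemd/system)
  dir="${1:-/etc/systemd/system}"
  for unit in "$dir"/runner-*.service; do
    # Skip the unexpanded pattern when nothing matches; keep symlinked units.
    [ -e "$unit" ] || [ -L "$unit" ] || continue
    basename "$unit"
  done
}

# Print (but do not run) the commands that would stop and remove each Runner unit.
print_cleanup_plan() {
  plan_dir="${1:-/etc/systemd/system}"
  list_runner_units "$plan_dir" | while IFS= read -r unit; do
    echo "systemctl stop $unit"
    echo "rm -f $plan_dir/$unit"
  done
}
```

Running print_cleanup_plan with no argument shows the plan for /etc/systemd/system. After reviewing the output, run the printed commands manually, then remove the configuration directory as described in step 3.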

The deployment script executes successfully on the host, but fails on Flow

  • Load the environment variables that the script depends on, for example by adding "source /root/.bash_profile;source /etc/profile;" at the beginning of the script.

  • Use an absolute path for the script, such as "/home/admin/app/deploy.sh" instead of "./deploy.sh".

  • Add "| grep -v rdc_deploy_command" to the end of each grep command. For example, change "ps -ef | grep athens | grep -v grep" to "ps -ef | grep athens | grep -v grep | grep -v rdc_deploy_command".
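The grep adjustment from the last bullet can be kept in one place so that every process lookup in the script applies it. This is a small sketch; the function name filter_app_processes is hypothetical, and "athens" is the example application name used above.

```shell
# Filter a process listing read from standard input.
# Keeps lines that mention the application, and drops both the grep process
# itself and any line that contains rdc_deploy_command.
filter_app_processes() {
  # $1: keyword that identifies the application
  grep -- "$1" | grep -v grep | grep -v rdc_deploy_command
}
```

In a deployment script this would be used as ps -ef | filter_app_processes athens.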

The host deployment fails and reports User.NoPermission

It means the user does not have permission to call a certain API. Check the service connection of the deployment group.

Host deployment service has started successfully, but the pipeline still shows "deploying"

This usually happens because the return code is not passed correctly, or because a child process does not exit normally.

Example script: https://atomgit.com/flow-example/spring-boot/blob/master/deploy.sh

Follow the steps below for troubleshooting:

  1. Return code verification

    • Add echo $? after key steps in the script (such as the service start command) to confirm that the return code is 0. If a non-zero value is returned, check the exception handling logic of the related commands.

    • End the script with an explicit exit 0 so that the result does not depend implicitly on the return code of the last command.

  2. Child process management

    • If you are using nohup, use the standard syntax nohup java -jar ${JAR_NAME} > ${JAVA_OUT} 2>&1 & to start a background process (such as a Java service). The & at the end runs the command in the background so that it does not block the main process.

    • Check other commands that may create child processes (such as services started through systemd or docker run -d) to ensure that they detach correctly.

  3. Timeout mechanism enhancement

    • If the service startup takes a long time, add polling detection (such as HTTP health check) in the script, so that the script exits once the service is ready.

Example:

# Start the service in the background
nohup java -jar app.jar > log.txt 2>&1 &
# $? is 0 once the background job is launched; it does not show whether the service started correctly
echo "Service launched with exit code: $?"

# Explicitly declare script successful exit
exit 0
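The polling detection mentioned in step 3 could be sketched as follows. This is an illustration under stated assumptions rather than a prescribed implementation: wait_until_ready is a hypothetical helper, and the health check URL in the usage note is a placeholder that depends on your service.

```shell
# Retry a check command until it succeeds or the attempts run out.
wait_until_ready() {
  # $1: maximum number of attempts
  # $2: seconds to wait between attempts
  # remaining arguments: the check command to run
  attempts="$1"
  interval="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    # The check's output is discarded; only its exit status matters.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$interval"
    i=$((i + 1))
  done
  return 1
}
```

After starting the service with nohup, a script could call wait_until_ready 30 2 curl -fsS http://127.0.0.1:8080/health || exit 1 and then finish with exit 0, so that the pipeline step ends as soon as the service answers or fails after roughly one minute.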

If the problem persists after these changes, check the pipeline configuration or the status of the Alibaba Cloud DevOps Runner service on the host.

The workspace cleanup task gets stuck during the host deployment

If Docker is installed manually, follow the steps below for troubleshooting and fixes.

# Check the status of the Docker service
systemctl status docker

# Key fields in the output:
# Active: active (running): The service is running
# Active: inactive (dead): The service is not started
# Loaded: loaded (...): The service has correctly loaded the configuration

# List running containers to confirm that the Docker daemon responds
docker ps

#  Restart the docker service
sudo systemctl restart docker

Troubleshoot Flow Runner

  1. Use the tool to get prompts for troubleshooting

    Important

    Check the Runner status first when host deployment fails or when a self-hosted instance fails to apply for an environment.

    1. Download the troubleshooting tool (supports Linux only).

      wget "https://rdc-public-software.oss-cn-hangzhou.aliyuncs.com/runner/runnerStatusCheck" -O runnerStatusCheck
    2. Set execution permissions.

      chmod u+x runnerStatusCheck
    3. Execute the tool.

      ./runnerStatusCheck
  2. Follow the prompts provided by the tool for further processing.

    1. Check the Linux system version.

      Use the command lsb_release -a to get the version. Alibaba Cloud DevOps Unified Runner currently supports the following Linux distributions:

      1. CentOS 6 and later

      2. Ubuntu 16.04 and later

      3. Alibaba Cloud Linux 2 and 3

    2. Check the Runner service status and logs.

      1. Use the command ls -al /etc/systemd/system | grep runner to identify the Runner service name, which is typically formatted as runner-{version}-{tenant}.service.

      2. Check the Runner service status with the command systemctl status runner-{version}-{tenant}.service. If the status is active (running), the service is functioning properly.

      3. You can view the Runner execution logs with the command journalctl -u runner-{version}-{tenant}.service -a --no-pager --since '5 minutes ago' -f.
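To use the status check inside a script, the Active field can be extracted from the systemctl status output. The helper below is a small sketch with a hypothetical name (active_state); it reads the status text from standard input and prints the state, for example "active (running)".

```shell
# Print the state from the "Active:" line of systemctl status output read on stdin.
active_state() {
  # awk ignores leading whitespace, so "Active:" is the first field on that line.
  awk '$1 == "Active:" { print $2, $3; exit }'
}
```

For example: systemctl status runner-{version}-{tenant}.service | active_state. Any result other than "active (running)" suggests that the service needs attention.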

Common issues with offline host groups

  • During building or deploying, logs are not reported or the host goes offline

    1. Use the command df -hl to check the disk usage. Clean up the disk if it is out of space.

    2. If the Runner service is not active (running), restart it with systemctl restart runner-{version}-{tenant}.service.

    3. Ensure the network connectivity.

      1. Query the Runner service status with systemctl status runner-{version}-{tenant}.service and note the service process parameter --configPath=***.

      2. Check the URL in the configPath using cat {***}/config.yml | grep url.

      3. Check if the URL is accessible using the following command.

        # Replace {url} in the command below with the URL found in config.yml
        curl '{url}/api/v2/runner/storage/latest?os=linux&arch=amd64'
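The last two steps can be combined in a short script. This sketch assumes that config.yml stores the address on a line of the form "url: <value>", which is inferred from the grep command above and may differ in your installation. Both function names are hypothetical.

```shell
# Print the value of the first "url:" entry in the given config file.
# Assumption: the entry has the form "url: <value>", optionally quoted.
runner_url_from_config() {
  sed -n 's/^[[:space:]]*url:[[:space:]]*//p' "$1" | head -n 1 | tr -d "\"'"
}

# Succeed when the Runner storage endpoint of the given base URL answers within 10 seconds.
check_runner_endpoint() {
  curl -fsS -o /dev/null --max-time 10 "$1/api/v2/runner/storage/latest?os=linux&arch=amd64"
}
```

Given the directory noted from --configPath, the two can be chained as check_runner_endpoint "$(runner_url_from_config {***}/config.yml)" && echo reachable. A failure points to a network or proxy problem between the host and the service.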