E-MapReduce: Use bootstrap actions to execute scripts

Last Updated: Mar 26, 2026

Bootstrap actions let you automatically run custom shell scripts on cluster nodes during cluster creation, scale-out, or auto scaling. Use them to install third-party software, modify runtime configurations, or set up services that EMR does not support natively.

For information about running scripts on existing nodes, see Manually execute scripts.

How it works

When a cluster node starts, EMR downloads your script from an Object Storage Service (OSS) bucket and runs it at the stage you specify:

  • Before component installation — runs before any services are installed

  • Before component startup — runs after services are installed but before they start

  • After component startup — runs after services are running

Bootstrap actions run in the order you define them. If a script fails, EMR either stops the process (blocking cluster creation or scale-out) or continues to the next script, depending on the failure policy you set.
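As a minimal sketch of how a script can cooperate with the failure policy, the following assumes that EMR treats a non-zero exit status as a failed script (the text above does not say how failure is detected). The step helper and the /tmp/bootstrap-demo path are hypothetical names used only for illustration.

```shell
#!/bin/bash
# Assumption: a non-zero exit status is what marks the script as failed.
# Abort on the first failing command, unset variable, or failed pipeline stage.
set -euo pipefail

step() {
  # Hypothetical helper: announce a step, then run it.
  echo "bootstrap: $*"
  "$@"
}

main() {
  step mkdir -p /tmp/bootstrap-demo
  step touch /tmp/bootstrap-demo/ready
}

main
```

With this structure, the first failing step ends the script with that step's exit status, so the Stop or Proceed policy applies to the script as a whole rather than to a partially completed run.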

Common use cases include:

  • Installing software using YUM (when the installation package is available)

  • Downloading public software from the Internet

  • Reading data from OSS

  • Installing and running services such as Flink or Impala

Limitations

  • Each cluster supports a maximum of 10 bootstrap actions.

  • Scripts run as root by default. To switch to the Hadoop user, run su - hadoop inside the script.
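When only part of a script should run as the Hadoop user, one option is to wrap individual commands instead of switching users for the whole script. The sketch below is illustrative: run_as is a hypothetical helper, and it assumes the target user's login shell is bash so that the quoting produced by printf %q is understood.

```shell
#!/bin/bash
# Hypothetical helper: run a command as the given user.
# When the script already runs as that user, the command runs directly;
# otherwise it goes through "su - <user> -c", which requires root.
run_as() {
  local user="$1"
  shift
  if [ "$(id -un)" = "$user" ]; then
    "$@"
  else
    # printf %q quotes each argument so it survives the extra shell started by su.
    su - "$user" -c "$(printf '%q ' "$@")"
  fi
}

# Usage inside a bootstrap action (the directory is a placeholder):
# run_as hadoop mkdir -p /tmp/<yourDir>
```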

Prerequisites

Before you begin, ensure that you have:

  • An OSS bucket in the same region as your cluster, with your script file uploaded

  • The script file stored at a path in oss://<bucket>/<path>.sh format

Add a bootstrap action

During cluster creation

  1. Log on to the EMR console. In the left-side navigation pane, click EMR on ECS.

  2. In the top navigation bar, select a region and a resource group.

  3. Click Create Cluster.

  4. In the Basic Configuration step, scroll to the Advanced Settings section, find Bootstrap Actions, and click Add Bootstrap Action.

  5. Configure the following parameters.

    Parameter | Description
    Action name | A display name for this bootstrap action.
    Script path | The OSS path to your script file. Format: oss://<bucket>/<path>.sh.
    Parameter | Optional arguments passed to the script. Use this to set variable values referenced inside the script.
    Execution time | When the script runs: Before component installation, Before component startup, or After component startup.
    Execution failure policy | Proceed: if the script fails, continue with the next script. Stop: if the script fails, abort cluster creation or scale-out.
    Execution scope | Cluster: run on all nodes. Node group type: run only on nodes of the specified group types.

  6. Click OK.

A bootstrap action can fail without blocking cluster creation, for example when its failure policy is Proceed. After the cluster is created, check the Script Operation tab to confirm that the script ran without errors. If an error occurred, see View execution logs for troubleshooting.
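To illustrate the Parameter field, the sketch below assumes that its values reach the script as ordinary positional arguments ($1, $2, and so on), which the parameter description suggests but does not spell out. The variable names and default values are hypothetical.

```shell
#!/bin/bash
# Assumption: values from the Parameter field arrive as positional arguments.
# The argument meanings and defaults below are illustrative only.
parse_args() {
  INSTALL_DIR="${1:-/opt/myapp}"   # first argument, with a fallback
  LOG_LEVEL="${2:-info}"           # second argument, with a fallback
}

parse_args "$@"
echo "install dir: ${INSTALL_DIR}, log level: ${LOG_LEVEL}"
```

Giving every argument a default keeps the script usable when the Parameter field is left empty.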

After cluster creation

  1. Log on to the EMR console. In the left-side navigation pane, click EMR on ECS.

  2. In the top navigation bar, select a region and a resource group.

  3. Find the target cluster and click Services in the Actions column.

  4. Click the Script Operation tab, then click the Bootstrap Actions tab.

  5. Click Add Bootstrap Action.

  6. In the Add Bootstrap Action dialog box, configure the following parameters.

    Parameter | Description
    Name | A display name for this bootstrap action.
    Script address | The OSS path to your script file. Format: oss://<bucket>/<path>.sh.
    Parameter | Optional arguments passed to the script. Use this to set variable values referenced inside the script.
    Execution scope | Cluster: run on all nodes. Node group type: run only on nodes of the specified group types. Node group: run only on the specific node groups you select.
    Execution time | When the script runs: Before component installation, Before component startup, or After component startup.
    Execution failure policy | Proceed: if the script fails, continue with the next script. Stop: if the script fails, abort cluster creation or scale-out.

  7. Click OK.

Manage existing bootstrap actions

On the Bootstrap Actions tab, you can edit, clone, or delete a bootstrap action from the Actions column.

View execution logs

Add logging statements at key points in your script so you can trace script behavior from the operational log.

  1. Log on to the EMR console. In the left-side navigation pane, click EMR on ECS.

  2. In the top navigation bar, select a region and a resource group.

  3. Find the target cluster and click Services in the Actions column.

  4. On the Script Operation tab, find the script and click View Execution Result in the Actions column.

  5. In the Operation History panel, locate the relevant operation record and click Expand to view the activity details.

    • DataLake, Dataflow, OLAP, DataServing, and custom clusters: look for createNodeGroup or increaseNodeGroup records. Tasks prefixed with RUN_BOOTSTRAP_CLUSTER_SCRIPT_<action-name>_<action-ID> are bootstrap action activities.

    • Hadoop, Data Science, and EMR Studio clusters: look for CREATE_CLUSTER or RESIZE_CLUSTER records. Under pollDeployTaskStatusActivity, tasks named RUN_SCRIPT_HOST_** are bootstrap action activities. View Stdout and Stderr logs.
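The advice at the start of this section, adding logging statements at key points, could be implemented with a small helper like the one below. This is a sketch rather than an EMR convention: log and run_logged are hypothetical names, and the /tmp/bootstrap.log default simply mirrors the path used elsewhere in this topic.

```shell
#!/bin/bash
# Hypothetical log file location; override by setting LOG_FILE beforehand.
LOG_FILE="${LOG_FILE:-/tmp/bootstrap.log}"

log() {
  # Write a timestamped line to stdout and append it to the log file.
  printf '%s [bootstrap] %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" | tee -a "$LOG_FILE"
}

run_logged() {
  # Log the command, run it with its output appended to the log file,
  # then log whether it succeeded or failed.
  log "START: $*"
  if "$@" >> "$LOG_FILE" 2>&1; then
    log "OK: $*"
  else
    local rc=$?
    log "FAILED (exit $rc): $*"
    return "$rc"
  fi
}

# Usage:
# run_logged mkdir -p /<yourDir>
```

Because log also writes to standard output, the same lines should be visible in the Stdout log of the operation record as well as in the file on the node.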

Examples

Each node downloads the script from the specified OSS path and runs it directly or with the parameters you define. The following examples cover common use cases.

Example 1: Download a file from OSS and decompress it

You can specify the file to download from OSS directly in the script. The following scripts download the <myFile>.tar.gz file from OSS and decompress it to the /<yourDir> directory. Choose the appropriate script based on your cluster type.

Important

The OSS endpoint in your script must be accessible from your cluster's network. Use an internal endpoint for classic networks (for example, oss-cn-hangzhou-internal.aliyuncs.com for the China (Hangzhou) region) and a virtual private cloud (VPC) endpoint for VPC networks (for example, vpc100-oss-cn-hangzhou.aliyuncs.com).

DataLake, Dataflow, OLAP, DataServing, and custom clusters (uses ossutil64):

#!/bin/bash
ossutil64 cp oss://<yourBucket>/<myFile>.tar.gz ./ \
  -e oss-cn-hangzhou-internal.aliyuncs.com \
  -i <yourAccessKeyId> \
  -k <yourAccessKeySecret>
mkdir -p /<yourDir>
tar -zxvf <myFile>.tar.gz -C /<yourDir>

Hadoop clusters (uses osscmd):

#!/bin/bash
osscmd --id=<yourAccessKeyId> \
  --key=<yourAccessKeySecret> \
  --host=oss-cn-hangzhou-internal.aliyuncs.com \
  get oss://<yourBucket>/<myFile>.tar.gz ./
mkdir -p /<yourDir>
tar -zxvf <myFile>.tar.gz -C /<yourDir>

Replace the following placeholders:

Placeholder | Description | Example
<yourBucket> | OSS bucket name | my-emr-bucket
<myFile> | File name in the bucket | my-package
<yourDir> | Local directory to decompress into | opt/myapp
<yourAccessKeyId> | AccessKey ID | LTAI5tXxx
<yourAccessKeySecret> | AccessKey secret | xXxXxXx
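If the download is interrupted, tar may fail partway through and leave a half-populated directory. One way to guard against that, sketched below, is to check the archive before unpacking it. The extract_archive helper is hypothetical and works on any local .tar.gz file, regardless of which tool downloaded it.

```shell
#!/bin/bash
# Hypothetical helper: verify that a downloaded .tar.gz is readable before unpacking it.
extract_archive() {
  local archive="$1" dest="$2"
  if [ ! -s "$archive" ]; then
    echo "archive missing or empty: $archive" >&2
    return 1
  fi
  # tar -tzf lists the contents without extracting; it fails on a truncated or corrupt file.
  if ! tar -tzf "$archive" > /dev/null 2>&1; then
    echo "archive is not a valid .tar.gz: $archive" >&2
    return 1
  fi
  mkdir -p "$dest"
  tar -zxf "$archive" -C "$dest"
}

# Usage after the download step (same placeholders as in the scripts above):
# extract_archive "<myFile>.tar.gz" "/<yourDir>"
```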

Example 2: Install system software with YUM

#!/bin/bash
yum install -y ld-linux.so.2
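Package installs depend on the repository being reachable when the node starts, so a transient network error can fail the whole bootstrap action. A possible mitigation is to retry the install a few times. The retry helper below is a hypothetical sketch, not part of EMR or YUM.

```shell
#!/bin/bash
# Hypothetical helper: retry a command up to <attempts> times,
# sleeping <delay> seconds between attempts.
retry() {
  local attempts="$1" delay="$2"
  shift 2
  local n=1
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    echo "attempt $n failed, retrying in ${delay}s: $*" >&2
    n=$((n + 1))
    sleep "$delay"
  done
}

# Usage (the package name is a placeholder):
# retry 3 10 yum install -y <package>
```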

Troubleshooting

Script interrupted with no error in the logs

This usually means the script exited unexpectedly before writing to the log. The most common causes are:

  • Network issue: The ECS instances and the OSS bucket must be in the same region. A cross-region connection attempt (for example, a China (Beijing) ECS instance accessing an OSS bucket outside China (Beijing)) silently fails.

  • Missing RAM role: If the cluster nodes cannot get the AccessKey pair, check that the ECS instances are assigned the AliyunECSInstanceForEMRRole role.

  • nohup without output redirection: If your script uses nohup but does not redirect output, the process may hang indefinitely. Use the form nohup <command> > <logfile> 2>&1.
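The nohup form from the last item can be wrapped in a helper when a script starts a long-running process. In the sketch below, start_background and BG_PID are hypothetical names. Running the command in the background with a trailing & and detaching stdin are additions beyond the form given above, made on the assumption that the script should return while the process keeps running.

```shell
#!/bin/bash
# Hypothetical helper: start a command with nohup, send stdout and stderr to a
# log file, detach stdin, and run it in the background.
# The PID of the started process is stored in BG_PID.
start_background() {
  local logfile="$1"
  shift
  nohup "$@" > "$logfile" 2>&1 < /dev/null &
  BG_PID=$!
}

# Usage (the log path and command are placeholders):
# start_background /tmp/my-service.log /opt/myapp/bin/start.sh
```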

Add logging at key checkpoints to identify where execution stops:

#!/bin/bash
echo "Step 1: Downloading file..." >> /tmp/bootstrap.log 2>&1
ossutil64 cp oss://<yourBucket>/<myFile>.tar.gz ./ -e <endpoint> >> /tmp/bootstrap.log 2>&1
echo "Step 1: Done" >> /tmp/bootstrap.log 2>&1

^M in error logs (Windows line endings)

If the Operation History error log contains ^M, the script was saved with Windows-style CRLF line endings, which cause errors in the Linux environment. Fix the line endings before uploading to OSS:

# Option 1: Use tr
tr -d '\r' < your-script.sh > your-script-fixed.sh

# Option 2: Use perl
perl -pi -e 's/\r\n/\n/g' your-script.sh
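To catch the problem before uploading, a script can be checked for carriage returns locally. The helpers below are a sketch with hypothetical names: has_crlf reports whether a file contains any carriage return, and fix_crlf applies the tr approach from Option 1 in place.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the file contains at least one carriage return.
has_crlf() {
  grep -q $'\r' "$1"
}

# Hypothetical helper: strip carriage returns in place via a temporary file.
# Writing back with cat keeps the original file's permissions.
fix_crlf() {
  local file="$1" tmp
  tmp="$(mktemp)"
  if tr -d '\r' < "$file" > "$tmp"; then
    cat "$tmp" > "$file"
  fi
  rm -f "$tmp"
}

# Usage:
# if has_crlf your-script.sh; then fix_crlf your-script.sh; fi
```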

YARN or HDFS commands not found

By default, scripts run without loading the system profile, so Hadoop-related commands are not in the PATH. Add the following line at the start of your script:

. /etc/profile

Important

There must be a space between . and /etc/profile. Without the space, the command will not work.
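After loading the profile, it can help to confirm that the expected commands are actually available before the script relies on them. The following sketch uses two hypothetical helpers, load_profile and require_cmd. The command names in the usage comments are examples.

```shell
#!/bin/bash
# Load the system profile if it is readable (note the space after the dot).
load_profile() {
  if [ -r /etc/profile ]; then
    . /etc/profile
  fi
}

# Hypothetical helper: fail early with a clear message when a command is missing.
require_cmd() {
  if ! command -v "$1" > /dev/null 2>&1; then
    echo "command not found in PATH: $1" >&2
    return 1
  fi
}

# Usage at the top of a bootstrap action:
# load_profile
# require_cmd hdfs || exit 1
# require_cmd yarn || exit 1
```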

What's next