After you create a cluster, especially a subscription cluster, you can run scripts on multiple nodes at the same time to meet your business requirements, for example, to install third-party software or modify the running environment of the cluster.

Prerequisites

  • A cluster is created. For more information, see Create a cluster.
  • The cluster is in the Idle or Running state. Scripts cannot be executed on clusters in other states.
  • Cluster scripts are written or obtained and then uploaded to Object Storage Service (OSS). For more information about the cluster scripts, see Examples.

Background information

You can use bootstrap actions to initialize on-demand clusters. A cluster script is similar to a bootstrap action, but you run it after a cluster is created. You can use the cluster script feature to install software and services that were previously unavailable on your cluster. For example, you can:
  • Use YUM to install software that is available from the YUM repositories.
  • Download public software from the Internet.
  • Install software that reads your data from OSS.
  • Install and run a service, such as Flink or Impala. Scripts for this purpose are more complex.
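For the second case, downloading public software from the Internet, it is good practice to verify the download before installing it. The following is a minimal sketch of a helper that you could include in a cluster script. The function name download_verified is hypothetical, and the sketch assumes that curl and sha256sum are available on the node and that you know the expected SHA-256 checksum of the file.

```shell
# Hypothetical helper: download a file and check its SHA-256 checksum.
# Arguments: URL, expected SHA-256 checksum, destination path.
download_verified() {
  local url="$1" expected_sha256="$2" dest="$3"
  # -f: fail on HTTP errors; -s/-S: quiet but still print errors; -L: follow redirects.
  curl -fsSL -o "$dest" "$url" || return 1
  # sha256sum -c reads "<checksum>  <file>" lines from stdin ("-").
  if ! echo "${expected_sha256}  ${dest}" | sha256sum -c --status -; then
    echo "checksum mismatch for ${dest}" >&2
    # Remove the unverified file so that later steps cannot use it.
    rm -f "$dest"
    return 1
  fi
}
```

For example, download_verified https://example.com/tool.tar.gz <sha256> /tmp/tool.tar.gz (a hypothetical URL and path) downloads the archive and deletes it again if the checksum does not match, so the script can stop instead of installing a corrupted file.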

Features

Only one cluster script can run on a cluster at a time. You cannot submit another cluster script while one is in progress. You can retain a maximum of 10 cluster script records for each cluster. If 10 records already exist, you must delete earlier records before you can create new cluster scripts.

A cluster script may succeed on some nodes but fail on others. For example, the script may fail on a node that is restarted while the script is running. After you resolve the issue, you can run the cluster script again on the failed nodes. After you scale out a cluster, you can also run cluster scripts on the newly added nodes.

The cluster script feature downloads scripts from OSS to the specified nodes and runs them there. If a script fails, you can log on to the node to check the operational log, which is stored in the /var/log/cluster-scripts/clusterScriptId directory on each node. If the cluster is configured with an OSS log storage directory, the operational log is also stored in the osslogpath/clusterId/ip/cluster-scripts/clusterScriptId directory in OSS. To create a cluster script, perform the following steps:

  1. Log on to the EMR console.
  2. In the top navigation bar, select the region where your cluster resides and select a resource group based on your business requirements.
  3. Click the Cluster Management tab.
  4. On the Cluster Management page, find your cluster and click Details in the Actions column.
  5. In the left-side navigation pane of the Cluster Overview page, click Cluster Scripts.
  6. On the Cluster Scripts page, click Create and Run in the upper-right corner.
  7. In the Create Script dialog box, specify Name, Script, and Target Nodes.
    Note We recommend that you test the cluster script feature on a single node before you use the feature on the entire cluster.
  8. Click OK.
    After the cluster script is created, it appears in the cluster script list in the running state.
    • Click Refresh in the upper-right corner to update the status of the cluster script.
    • Click Details in the Actions column to view the running status of the cluster script.
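If the script fails on a node, the operational log locations described earlier depend on the cluster script ID and, for the OSS copy, on the OSS log storage directory, the cluster ID, and the node IP address. The following minimal sketch prints both paths from those values. The function name cluster_script_log_paths and the example values are hypothetical.

```shell
# Hypothetical helper: print where a cluster script's operational log is stored.
# Arguments: cluster script ID, and optionally the OSS log storage directory,
# the cluster ID, and the node IP address.
cluster_script_log_paths() {
  local script_id="$1" oss_log_path="$2" cluster_id="$3" node_ip="$4"
  # Local log on the node where the script ran.
  echo "/var/log/cluster-scripts/${script_id}"
  # OSS copy, printed only if an OSS log storage directory is given.
  if [ -n "$oss_log_path" ]; then
    # ${oss_log_path%/} strips one trailing slash to avoid a double slash.
    echo "${oss_log_path%/}/${cluster_id}/${node_ip}/cluster-scripts/${script_id}"
  fi
}
```

On the node itself, you can then list the log files with, for example, ls -l /var/log/cluster-scripts/<clusterScriptId>.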

Examples

As with a bootstrap action, a cluster script can download files from OSS. In the following example, the <myfile>.tar.gz file is downloaded from the oss://<yourbucket> bucket to the current directory on each target node and extracted to the /<yourdir> directory.
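#! /bin/bash
# Stop at the first failing command so that the script exits with a non-zero status.
set -e
# Download the archive from OSS to the current directory on the node.
osscmd --id=<yourid> --key=<yourkey> --host=oss-cn-hangzhou-internal.aliyuncs.com get oss://<yourbucket>/<myfile>.tar.gz ./<myfile>.tar.gz
# Create the target directory and extract the archive into it.
mkdir -p /<yourdir>
tar -zxvf ./<myfile>.tar.gz -C /<yourdir>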
Note The specified OSS address can be an internal, public, or VPC endpoint. If the node is of the classic network type, you must specify an internal endpoint. For example, the internal endpoint for a node that resides in the China (Hangzhou) region is oss-cn-hangzhou-internal.aliyuncs.com. If the node is of the VPC type, you must specify a domain name that you can access from the VPC. For example, the domain name for a node that resides in the China (Hangzhou) region is vpc100-oss-cn-hangzhou.aliyuncs.com.
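The endpoint choice in the note above can be expressed as a small helper that returns the host to pass to the --host option of osscmd. This is a sketch based only on the China (Hangzhou) examples. The function name oss_endpoint is hypothetical, and the sketch assumes that other regions follow the same naming pattern with their own region ID, which you should confirm for your region.

```shell
# Hypothetical helper: pick an OSS endpoint from the node's network type.
# Arguments: network type ("classic" or "vpc") and region ID (e.g. cn-hangzhou).
# Assumption: the Hangzhou naming pattern applies to other regions as well.
oss_endpoint() {
  local network_type="$1" region_id="$2"
  case "$network_type" in
    classic) echo "oss-${region_id}-internal.aliyuncs.com" ;;
    vpc)     echo "vpc100-oss-${region_id}.aliyuncs.com" ;;
    *)       echo "unknown network type: ${network_type}" >&2; return 1 ;;
  esac
}
```

For example, a script for a classic network node in China (Hangzhou) could pass --host="$(oss_endpoint classic cn-hangzhou)" to osscmd.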
You can also use YUM to install additional system software packages, such as the package that provides ld-linux.so.2. YUM resolves the file name to the package that provides it.
#! /bin/bash
yum install -y ld-linux.so.2

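By default, cluster scripts run as the root user. To run a command as the hadoop user, use su with the -c option in the script, for example, su hadoop -c '<command>'. A line that contains only su hadoop does not switch the user for the commands that follow it in the script.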
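When commands whose arguments contain spaces or special characters need to run as the hadoop user, a small wrapper can quote the arguments before passing them to su -c. This is a minimal sketch. The function names quote_cmd and run_as_hadoop are hypothetical, and the sketch assumes that the script runs under bash as root (so su does not prompt for a password) and that the hadoop user exists on the node.

```shell
# Hypothetical helper: shell-quote each argument so that the whole command
# survives being passed to su -c as a single string (bash-only: printf %q).
quote_cmd() {
  printf '%q ' "$@"
}

# Hypothetical helper: run one command as the hadoop user.
run_as_hadoop() {
  su hadoop -c "$(quote_cmd "$@")"
}

# Hypothetical usage inside a cluster script:
#   run_as_hadoop hdfs dfs -mkdir -p /tmp/demo
```

Because printf '%q' is a bash feature, this helper requires the script to run under bash, as in the examples above, rather than sh.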