This topic describes how to upgrade JindoSDK on E-MapReduce (EMR) clusters that use the x86 architecture, whether you're patching a running cluster, adding nodes, or rolling back a failed upgrade.
Supported versions: EMR V3.40.0 or later, EMR V5.6.0 or later.
Which scenario applies to you?
| Scenario | When to use |
|---|---|
| Upgrade an existing cluster | Your cluster is running and you need to apply a patch or move to a newer JindoSDK version. |
| Scale out or create a new cluster | You're adding nodes to an existing cluster or creating a cluster from scratch. |
| Roll back to the default version | An upgrade caused issues and you need to revert to the default JindoSDK version. |
Scenario 1: Upgrade an existing cluster
Use this approach when your cluster is running and you want to upgrade JindoSDK in place.
If you're upgrading from JindoSDK 4.6.8 or earlier to 4.6.9 or later, or to any 6.x version, the default temporary job path used by JindoCommitter changes. To prevent data loss, complete both actions in sequence:
Before the upgrade — add one of the following configuration items:
- fs.jdo.committer.allow.concurrent=false in Hadoop-Common > Configuration > core-site.xml
- spark.hadoop.fs.jdo.committer.allow.concurrent=false in Spark > Configuration > spark-defaults.conf
After the upgrade completes on all nodes (including Gateway nodes) — set the same parameter back to true.
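Whether this precaution applies depends only on the current and target JindoSDK versions. The following is a minimal sketch of that check, assuming GNU `sort -V` is available. The helper names `version_le` and `needs_committer_guard` are illustrative and are not part of JindoSDK or the patch package.

```shell
#!/usr/bin/env bash
# version_le A B: succeeds when version A <= version B.
# Assumption: sort supports -V (version sort), as GNU coreutils does.
version_le() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# needs_committer_guard OLD NEW (hypothetical helper)
# Succeeds when OLD is 4.6.8 or earlier and NEW is either 4.6.9 or later
# within the 4.x line, or any 6.x version.
needs_committer_guard() {
  local old="$1" new="$2"
  version_le "$old" "4.6.8" || return 1
  case "$new" in
    6.*) return 0 ;;
    4.*) version_le "4.6.9" "$new" ;;
    *)   return 1 ;;
  esac
}
```

For example, `needs_committer_guard 4.6.5 6.8.2 && echo "set allow.concurrent=false first"` prints the reminder, while an upgrade from 6.2.0 to 6.8.2 does not.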
Step 1: Prepare the software package and upgrade script
Before downloading, decide which JindoSDK version you need:
- Direct OSS or OSS-HDFS access only: Check your local Hadoop dependency version. If it's below 2.7, additional compatibility steps may be required.
- Semi-managed services (JindoCache, JindoAuth, JindoFSx): Contact Alibaba Cloud EMR technical support to confirm version compatibility before proceeding.
- Log in to the master node. See Log on to a cluster.
- Download the patch package and decompress it:
su - emr-user
cd /home/emr-user/
wget https://jindodata-binary.oss-cn-shanghai.aliyuncs.com/resources/emr-taihao/jindosdk-patches.tar.gz
tar zxf jindosdk-patches.tar.gz
- Download the JindoSDK package into the jindosdk-patches directory. The following example uses version 6.8.2:
cd jindosdk-patches
wget https://jindodata-binary.oss-cn-shanghai.aliyuncs.com/release/6.8.2/jindosdk-6.8.2-linux.tar.gz
ls -l
The directory should contain:
-rwxrwxr-x 1 emr-user emr-user 2439 May 01 00:00 apply_all.sh
-rwxrwxr-x 1 emr-user emr-user 7315 May 01 00:00 apply.sh
-rw-rw-r-- 1 emr-user emr-user 40 May 01 00:00 hosts
-rw-r----- 1 emr-user emr-user xxxxxxxxx May 01 00:00 jindosdk-6.8.2-linux.tar.gz
-rwxrwxr-x 1 emr-user emr-user 1112 May 01 00:00 revert_all.sh
-rwxrwxr-x 1 emr-user emr-user 2042 May 01 00:00 revert.sh
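Before moving on, it can help to confirm that nothing is missing from the directory. The following sketch checks for the file names shown in the listing above. The function name `check_patch_dir` is hypothetical and is not shipped with the patch package.

```shell
#!/usr/bin/env bash
# check_patch_dir DIR VERSION (hypothetical helper)
# Succeeds only if DIR holds the four scripts, the hosts file, and the
# JindoSDK tarball for VERSION. Missing names are reported on stderr.
check_patch_dir() {
  local dir="$1" version="$2" missing=0 name
  for name in apply_all.sh apply.sh hosts revert_all.sh revert.sh \
              "jindosdk-${version}-linux.tar.gz"; do
    if [ ! -f "${dir}/${name}" ]; then
      echo "missing: ${name}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A possible call from the patch directory is `check_patch_dir . 6.8.2`.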
Step 2: Configure node information
Populate the hosts file with all cluster node hostnames. Choose either method:
Automatic (recommended):
cat /usr/local/taihao-executor-all/data/cache/.cluster_context | jq --raw-output '.nodes[].hostname.alias[]' > hosts
If this command fails to populate the file, use the manual method instead.
Manual:
vim hosts
Add one hostname per line:
master-1-1
core-1-1
core-1-2
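Because an empty or malformed hosts file is easy to miss, a quick sanity check before running the upgrade may be worthwhile. The sketch below is one way to do it. `validate_hosts` is a hypothetical helper, and the rules it enforces (no blank lines, no whitespace inside a line, no duplicates) are assumptions based on the one-hostname-per-line format shown above.

```shell
#!/usr/bin/env bash
# validate_hosts FILE (hypothetical helper)
# Fails if FILE is missing or empty, contains a blank line or any
# whitespace inside a line, or lists the same hostname more than once.
validate_hosts() {
  local file="$1"
  [ -s "$file" ] || { echo "hosts file is empty or missing" >&2; return 1; }
  if grep -q -E '^$|[[:space:]]' "$file"; then
    echo "hosts file has blank lines or embedded whitespace" >&2
    return 1
  fi
  if [ -n "$(sort "$file" | uniq -d)" ]; then
    echo "hosts file has duplicate entries" >&2
    return 1
  fi
}
```

Running `validate_hosts hosts` right after the automatic method also catches the case where the `jq` command produced no output.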
Step 3: Run the upgrade
Run apply_all.sh with the target version:
./apply_all.sh 6.8.2
The script is complete when ### DONE appears in the output:
>>> updating ... master-1-1
>>> updating ... core-1-1
>>> updating ... core-1-2
### DONE
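On a large cluster the completion marker can scroll past unnoticed. The following sketch keeps a log of the run and checks for the marker afterwards. `run_and_check_done` is a hypothetical wrapper, and it relies only on the `### DONE` line described above. The script's own exit status is ignored, because its exit codes are not described here.

```shell
#!/usr/bin/env bash
# run_and_check_done LOGFILE CMD [ARGS...] (hypothetical helper)
# Runs CMD, prints its stdout and stderr while also saving them to
# LOGFILE, and succeeds only if some output line starts with "### DONE".
run_and_check_done() {
  local log="$1"
  shift
  "$@" 2>&1 | tee "$log"
  grep -q '^### DONE' "$log"
}
```

A possible call is `run_and_check_done upgrade.log ./apply_all.sh 6.8.2 || echo "upgrade did not finish"`.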
Step 4: Update the plugin directory (OSS Ranger compatibility)
Skip this step if EMR OSS Ranger authentication is not enabled.
If OSS Ranger authentication is enabled and you're upgrading from EMR-3.51.2 or earlier (or EMR-5.17.2 or earlier) to a JindoSDK version in the range [6.5.0, 6.7.2], upgrade to 6.7.3 or later instead to avoid compatibility issues. Then update the plugin directory path:
- Open the Configuration page for the HADOOP-COMMON service and click the core-site.xml tab.
- Find and update fs.jdo.plugin.dir:
| Parameter | Before | After |
|---|---|---|
| fs.jdo.plugin.dir | /opt/apps/RANGER/jindoauth-current/plugins | /opt/apps/JINDOSDK/jindosdk-current/plugins |
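The version range in this step is inclusive at both ends, so 6.5.0 and 6.7.2 are both affected while 6.7.3 is not. A minimal sketch of that test follows, assuming GNU `sort -V`. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# version_le A B: succeeds when version A <= version B.
# Assumption: sort supports -V (version sort), as GNU coreutils does.
version_le() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# in_ranger_affected_range VERSION (hypothetical helper)
# Succeeds when 6.5.0 <= VERSION <= 6.7.2.
in_ranger_affected_range() {
  version_le "6.5.0" "$1" && version_le "$1" "6.7.2"
}
```

For example, `in_ranger_affected_range 6.6.1 && echo "choose 6.7.3 or later"` prints the reminder.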
Step 5: Handle special nodes
Gateway nodes created via EMR CLI
Gateway nodes created through the EMR console are already covered by the previous steps. If any Gateway nodes were created via EMR CLI, run the upgrade script on those nodes manually. For elastic nodes, run the script as part of node initialization so that the upgrade is complete before the nodes go into service.
Trino, Presto, and Impala (versions below EMR-3.53.0 / EMR-5.19.0)
For clusters running EMR versions earlier than EMR-3.53.0 (V3 series) or earlier than EMR-5.19.0 (V5 series), JindoSDK is not automatically upgraded for Trino, Presto, or Impala. Manually replace the JindoSDK JAR files in the plugins path of each service with the target version, then restart those services.
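The manual replacement could look roughly like the sketch below. It is only an illustration: `replace_jindo_jars` is a hypothetical helper, the `jindo-*.jar` naming pattern is taken from the lib directory listing in this topic, and the plugin path of each service is not given here, so the caller must supply it. Back up the plugin directory before trying anything similar.

```shell
#!/usr/bin/env bash
# replace_jindo_jars PLUGIN_DIR NEW_LIB_DIR (hypothetical helper)
# Removes jindo-*.jar files from PLUGIN_DIR and copies the jindo-*.jar
# files from NEW_LIB_DIR in their place. Other files are left untouched.
replace_jindo_jars() {
  local plugin_dir="$1" new_lib_dir="$2"
  [ -d "$plugin_dir" ] && [ -d "$new_lib_dir" ] || return 1
  # Refuse to delete anything if there are no new JARs to copy.
  ls "$new_lib_dir"/jindo-*.jar >/dev/null 2>&1 || return 1
  rm -f "$plugin_dir"/jindo-*.jar
  cp "$new_lib_dir"/jindo-*.jar "$plugin_dir"/
}
```

For version 6.8.2, the second argument would plausibly be `/opt/apps/JINDOSDK/jindosdk-6.8.2-linux/lib`, the directory that the upgraded symlinks point to. The first argument depends on the service and EMR version.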
Step 6: Verify the upgrade
ls -l /opt/apps/JINDOSDK/jindosdk-current/lib
After a successful upgrade from 6.2.0 to 6.8.2, the symlinks in lib point to the new version:
lrwxrwxrwx 1 emr-user emr-user 64 Apr 12 11:08 jindo-core-6.2.0.jar -> /opt/apps/JINDOSDK/jindosdk-6.8.2-linux/lib/jindo-core-6.8.2.jar
lrwxrwxrwx 1 emr-user emr-user 82 Apr 12 11:08 jindo-core-linux-el7-aarch64-6.2.0.jar -> /opt/apps/JINDOSDK/jindosdk-6.8.2-linux/lib/jindo-core-linux-el7-aarch64-6.8.2.jar
lrwxrwxrwx 1 emr-user emr-user 63 Apr 12 11:08 jindo-sdk-6.2.0.jar -> /opt/apps/JINDOSDK/jindosdk-6.8.2-linux/lib/jindo-sdk-6.8.2.jar
lrwxrwxrwx 1 emr-user emr-user 50 Apr 12 11:08 native -> /opt/apps/JINDOSDK/jindosdk-6.8.2-linux/lib/native
lrwxrwxrwx 1 emr-user emr-user 57 Apr 12 11:08 site-packages -> /opt/apps/JINDOSDK/jindosdk-6.8.2-linux/lib/site-packages
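Reading the listing by eye works for one node. For many nodes, a scripted check may be easier. The sketch below assumes that a successful upgrade leaves every entry in the lib directory as a symlink into `jindosdk-<version>-linux`, as in the listing above. `verify_upgrade` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# verify_upgrade LIB_DIR VERSION (hypothetical helper)
# Succeeds only if LIB_DIR is non-empty and every entry in it is a symlink
# whose target path contains "/jindosdk-VERSION-linux/".
verify_upgrade() {
  local lib_dir="$1" version="$2" entry target bad=0 seen=0
  for entry in "$lib_dir"/*; do
    [ -L "$entry" ] || [ -e "$entry" ] || continue   # glob matched nothing
    seen=1
    target="$(readlink "$entry")" || target=""       # empty if not a symlink
    case "$target" in
      *"/jindosdk-${version}-linux/"*) ;;
      *) echo "not upgraded: ${entry}" >&2; bad=1 ;;
    esac
  done
  [ "$seen" -eq 1 ] && [ "$bad" -eq 0 ]
}
```

A possible call is `verify_upgrade /opt/apps/JINDOSDK/jindosdk-current/lib 6.8.2`.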
Step 7: Restart services
Restart the relevant services for the upgrade to take effect.
| Service type | How to restart |
|---|---|
| Hive, Presto, Impala, Flink, Ranger, Spark, Zeppelin, and similar batch-oriented services | On the service page in the EMR cluster, choose More > Restart. |
| Spark Streaming and Flink jobs running on YARN | Wait for jobs to stop, then perform a rolling restart on YARN NodeManager. |
Scenario 2: Scale out or create a new cluster
Use bootstrap actions to apply a JindoSDK upgrade automatically when new nodes join the cluster or when creating a new cluster. This ensures every node starts with the correct JindoSDK version without manual intervention.
Use -gen when scaling out an existing cluster. Use -gen-full when creating a new cluster; the full package includes all required dependencies.
Step 1: Prepare a bootstrap upgrade package
- Download the required files into a jindo-patch directory. The following example targets version 6.8.2:
mkdir jindo-patch
cd jindo-patch
wget https://jindodata-binary.oss-cn-shanghai.aliyuncs.com/resources/emr-taihao/jindosdk-patches.tar.gz
wget https://jindodata-binary.oss-cn-shanghai.aliyuncs.com/release/6.8.2/jindosdk-6.8.2-linux.tar.gz
wget https://jindodata-binary.oss-cn-shanghai.aliyuncs.com/resources/emr-taihao/bootstrap_jindosdk.sh
ls -l
The output shows:
-rw-r----- 1 hadoop hadoop xxxx May 01 00:00 bootstrap_jindosdk.sh
-rw-r----- 1 hadoop hadoop xxxxxxxxx May 01 00:00 jindosdk-6.8.2-linux.tar.gz
-rw-r----- 1 hadoop hadoop xxxx May 01 00:00 jindosdk-patches.tar.gz
- Generate the upgrade package:
  - For scaling out an existing cluster:
  bash bootstrap_jindosdk.sh -gen 6.8.2
  - For creating a new cluster:
  bash bootstrap_jindosdk.sh -gen-full 6.8.2
When the package is ready, the output shows:
Generated patch at /home/emr-user/jindo-patch/jindosdk-bootstrap-patches.tar.gz
Step 2: Upload the package to OSS
Upload jindosdk-bootstrap-patches.tar.gz and bootstrap_jindosdk.sh to Object Storage Service (OSS). Use Hadoop commands, ossutil, OSS Browser, or the OSS console.
Example using Hadoop commands:
hadoop fs -mkdir -p oss://<bucket-name>/path/to/patch/
cd /home/emr-user/jindo-patch/
hadoop fs -put jindosdk-bootstrap-patches.tar.gz oss://<bucket-name>/path/to/patch/
hadoop fs -put bootstrap_jindosdk.sh oss://<bucket-name>/path/to/patch/
hadoop fs -ls oss://<bucket-name>/path/to/patch/
Expected output:
Found 2 items
-rw-rw-rw- 1 2634 2022-05-13 14:07 oss://<bucket-name>/.../bootstrap_jindosdk.sh
-rw-rw-rw- 1 597342992 2022-05-13 13:41 oss://<bucket-name>/.../jindosdk-bootstrap-patches.tar.gz
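The listing can also be checked in a script instead of by eye. The sketch below reads a listing from standard input and looks for the two file names used in this step. `listing_has_patch_files` is a hypothetical helper, and it assumes each file path appears at the end of a line, as in the expected output above.

```shell
#!/usr/bin/env bash
# listing_has_patch_files (hypothetical helper)
# Reads a directory listing on stdin and succeeds only if both uploaded
# file names appear at the end of some line.
listing_has_patch_files() {
  local listing
  listing="$(cat)"
  printf '%s\n' "$listing" | grep -q '/bootstrap_jindosdk\.sh$' &&
    printf '%s\n' "$listing" | grep -q '/jindosdk-bootstrap-patches\.tar\.gz$'
}
```

A possible call is `hadoop fs -ls oss://<bucket-name>/path/to/patch/ | listing_has_patch_files || echo "upload incomplete"`.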
Step 3: Add a bootstrap action
In the EMR console, add a bootstrap action with the following settings. For general guidance, see Manage bootstrap actions.
| Parameter | Description | Example |
|---|---|---|
| Name | A name for this bootstrap action. | update_jindosdk |
| Script location | The OSS path to the script file, in oss://*/*.sh format. | oss://<bucket-name>/path/to/patch/bootstrap_jindosdk.sh |
| Parameter | Arguments passed to the script. | -bootstrap oss://<bucket-name>/path/to/patch/jindosdk-bootstrap-patches.tar.gz |
| Execution scope | Select Cluster. | Cluster |
| Execution time | Select After Component Startup. | After Component Startup |
| Failure policy | Select Proceed With Execution. | Proceed With Execution |
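The Script location and Parameter values both derive from the OSS directory used for the upload, so they can be generated together to avoid typos. The following is a small sketch. `bootstrap_action_values` is a hypothetical helper that only prints strings to paste into the console and does not call any EMR API.

```shell
#!/usr/bin/env bash
# bootstrap_action_values OSS_DIR (hypothetical helper)
# Prints the "Script location" and "Parameter" values for the table above,
# given the OSS directory that holds both uploaded files.
bootstrap_action_values() {
  local oss_dir="${1%/}"   # drop one trailing slash, if present
  case "$oss_dir" in
    oss://?*) ;;
    *) echo "expected an oss:// path" >&2; return 1 ;;
  esac
  printf 'Script location: %s/bootstrap_jindosdk.sh\n' "$oss_dir"
  printf 'Parameter: -bootstrap %s/jindosdk-bootstrap-patches.tar.gz\n' "$oss_dir"
}
```

A possible call is `bootstrap_action_values oss://<bucket-name>/path/to/patch/`, with the placeholder replaced by the real bucket name.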
Step 4: Handle special nodes
Gateway nodes created via EMR CLI
Gateway nodes created through the EMR console are already covered by the bootstrap action. If any Gateway nodes were created via EMR CLI, run the upgrade script on those nodes manually. For elastic nodes, run the script as part of node initialization so that the upgrade is complete before the nodes go into service.
Trino, Presto, and Impala (versions below EMR-3.53.0 / EMR-5.19.0)
For clusters running EMR versions earlier than EMR-3.53.0 (V3 series) or earlier than EMR-5.19.0 (V5 series), JindoSDK is not automatically upgraded for Trino, Presto, or Impala. Manually replace the JindoSDK JAR files in the plugins path of each service with the target version, then restart those services.
Step 5: Restart services
After the bootstrap action completes, restart the relevant services:
- New cluster: Restart Hive, Presto, Impala, Flink, Ranger, Spark, and Zeppelin.
- Scale-out: Restart the relevant services on the new nodes only.
Scenario 3: Roll back to the default version
If a JindoSDK upgrade causes issues on a cluster running EMR V3.40.0 or later, or EMR V5.6.0 or later, use the rollback script to revert to the default version.
Step 1: Download the rollback script
- Log in to the master node. See Log on to a cluster.
- Download the patch package and decompress it:
su - emr-user
cd /home/emr-user/
wget https://jindodata-binary.oss-cn-shanghai.aliyuncs.com/resources/emr-taihao/jindosdk-patches.tar.gz
tar zxf jindosdk-patches.tar.gz
cd jindosdk-patches
ls -l
The directory contains:
-rwxrwxr-x 1 emr-user emr-user 2439 May 01 00:00 apply_all.sh
-rwxrwxr-x 1 emr-user emr-user 7315 May 01 00:00 apply.sh
-rw-rw-r-- 1 emr-user emr-user 40 May 01 00:00 hosts
-rwxrwxr-x 1 emr-user emr-user 1112 May 01 00:00 revert_all.sh
-rwxrwxr-x 1 emr-user emr-user 2042 May 01 00:00 revert.sh
Step 2: Configure node information
Populate the hosts file with all cluster node hostnames. Choose either method:
Automatic (recommended):
cat /usr/local/taihao-executor-all/data/cache/.cluster_context | jq --raw-output '.nodes[].hostname.alias[]' > hosts
If this command fails to populate the file, use the manual method instead.
Manual:
vim hosts
Add one hostname per line:
master-1-1
core-1-1
core-1-2
Step 3: Run the rollback
./revert_all.sh
The script is complete when ### DONE appears:
>>> updating ... master-1-1
>>> updating ... core-1-1
>>> updating ... core-1-2
### DONE
Step 4: Verify the rollback
ls -l /opt/apps/JINDOSDK/jindosdk-current/lib
After a successful rollback to 6.2.0, the files appear as regular files (not symlinks):
-rw-r--r-- 1 emr-user emr-user 1253740 Apr 24 17:40 jindo-core-6.2.0.jar
-rw-r--r-- 1 emr-user emr-user 13110547 Apr 24 17:40 jindo-core-linux-el7-aarch64-6.2.0.jar
-rw-r--r-- 1 emr-user emr-user 4432227 Apr 24 17:40 jindo-sdk-6.2.0.jar
drwxr-xr-x 2 emr-user emr-user 4096 Apr 24 17:40 native
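The distinction between symlinks and regular files can be tested directly. The sketch below assumes, as in the listing above, that a rolled-back lib directory contains `jindo-*.jar` entries that are regular files. `verify_rollback` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# verify_rollback LIB_DIR (hypothetical helper)
# Succeeds only if LIB_DIR contains at least one jindo-*.jar and none of
# those entries is a symlink.
verify_rollback() {
  local lib_dir="$1" jar seen=0
  for jar in "$lib_dir"/jindo-*.jar; do
    [ -e "$jar" ] || [ -L "$jar" ] || continue   # glob matched nothing
    seen=1
    if [ -L "$jar" ]; then
      echo "still a symlink: ${jar}" >&2
      return 1
    fi
  done
  [ "$seen" -eq 1 ]
}
```

A possible call is `verify_rollback /opt/apps/JINDOSDK/jindosdk-current/lib`.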
Step 5: Handle special nodes
Gateway nodes created via EMR CLI
Gateway nodes created through the EMR console are already covered by the previous steps. If any Gateway nodes were created via EMR CLI, run the rollback script on those nodes manually. For elastic nodes, run the script as part of node initialization so that the rollback is complete before the nodes go into service.
Trino, Presto, and Impala (versions below EMR-3.53.0 / EMR-5.19.0)
For clusters running EMR versions earlier than EMR-3.53.0 (V3 series) or earlier than EMR-5.19.0 (V5 series), JindoSDK for Trino, Presto, and Impala is not rolled back automatically. Manually replace the JindoSDK JAR files in the plugins path of each service with the default version, then restart those services.
Step 6: Restart services
Restart the relevant services for the rollback to take effect.
| Service type | How to restart |
|---|---|
| Hive, Presto, Impala, Flink, Ranger, Spark, Zeppelin, and similar batch-oriented services | On the service page in the EMR cluster, choose More > Restart. |
| Spark Streaming and Flink jobs running on YARN | Wait for jobs to stop, then perform a rolling restart on YARN NodeManager. |