Hortonworks Data Platform (HDP) 3.0.1 includes Hadoop 3.1.1, which supports Object Storage Service (OSS) natively. Earlier HDP versions do not. This guide shows you how to add OSS support to an HDP 2.6.1.0 cluster by installing the OSS connector JARs, configuring the Hadoop file system settings, and verifying connectivity with MapReduce jobs.
Prerequisites
Before you begin, ensure that you have:
An HDP 2.6.1.0 cluster. If you do not have one, create it using one of these methods:
Ambari — for cluster creation with a management UI
Manual setup — if Ambari is not available in your environment
sudo access on all cluster nodes
An OSS bucket and its endpoint. For endpoint formats, see Regions and endpoints.
An AccessKey ID and AccessKey secret with read/write permissions on the bucket
Install the OSS connector JARs
Download the OSS connector package for HDP 2.6.1.0.
Extract the archive:
sudo tar -xvf hadoop-oss-hdp-2.6.1.0-129.tar

The extracted directory contains the following files:
hadoop-oss-hdp-2.6.1.0-129/
hadoop-oss-hdp-2.6.1.0-129/hadoop-aliyun-2.7.3.2.6.1.0-129.jar
hadoop-oss-hdp-2.6.1.0-129/aliyun-sdk-oss-3.4.1.jar
hadoop-oss-hdp-2.6.1.0-129/aliyun-java-sdk-core-3.4.0.jar
hadoop-oss-hdp-2.6.1.0-129/aliyun-java-sdk-ecs-4.2.0.jar
hadoop-oss-hdp-2.6.1.0-129/aliyun-java-sdk-ram-3.0.0.jar
hadoop-oss-hdp-2.6.1.0-129/aliyun-java-sdk-sts-3.0.0.jar
hadoop-oss-hdp-2.6.1.0-129/jdom-1.1.jar

Move hadoop-aliyun-2.7.3.2.6.1.0-129.jar to the Hadoop client directory:

sudo mv hadoop-oss-hdp-2.6.1.0-129/hadoop-aliyun-2.7.3.2.6.1.0-129.jar \
  /usr/hdp/current/hadoop-client/

Verify the file is in place:
sudo ls -lh /usr/hdp/current/hadoop-client/hadoop-aliyun-2.7.3.2.6.1.0-129.jar

Expected output:
-rw-r--r-- 1 root root 64K Oct 28 20:56 /usr/hdp/current/hadoop-client/hadoop-aliyun-2.7.3.2.6.1.0-129.jar

Move all other JAR files to the Hadoop client lib directory:

sudo mv hadoop-oss-hdp-2.6.1.0-129/aliyun-*.jar \
  hadoop-oss-hdp-2.6.1.0-129/jdom-1.1.jar \
  /usr/hdp/current/hadoop-client/lib/

Verify the files are in place:
sudo ls -ltrh /usr/hdp/current/hadoop-client/lib

The output should include entries similar to:
-rw-r--r-- 1 root root 114K Oct 28 20:56 aliyun-java-sdk-core-3.4.0.jar
-rw-r--r-- 1 root root 513K Oct 28 20:56 aliyun-sdk-oss-3.4.1.jar
-rw-r--r-- 1 root root 13K Oct 28 20:56 aliyun-java-sdk-sts-3.0.0.jar
-rw-r--r-- 1 root root 211K Oct 28 20:56 aliyun-java-sdk-ram-3.0.0.jar
-rw-r--r-- 1 root root 770K Oct 28 20:56 aliyun-java-sdk-ecs-4.2.0.jar
-rw-r--r-- 1 root root 150K Oct 28 20:56 jdom-1.1.jar

Repeat steps 1–4 on every node in your HDP cluster.
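Repeating the copy by hand is tedious on a large cluster. The loop below is a minimal sketch of how the distribution could be scripted from the node where you extracted the archive; the node names, the root login, and passwordless SSH are all assumptions you would replace with your own setup. The scp commands are left commented out so the script prints its plan without touching anything.

```shell
#!/usr/bin/env bash
# Sketch only: distribute the OSS connector JARs to every cluster node.
# NODES and root@ SSH access are assumptions -- adjust for your environment.
set -u

NODES="${NODES:-node1 node2 node3}"   # space-separated hostnames (placeholder values)
visited=0

for node in $NODES; do
  echo "distributing OSS connector JARs to ${node}"
  # Uncomment to actually copy (requires SSH access with write permission):
  # scp hadoop-oss-hdp-2.6.1.0-129/hadoop-aliyun-2.7.3.2.6.1.0-129.jar \
  #     "root@${node}:/usr/hdp/current/hadoop-client/"
  # scp hadoop-oss-hdp-2.6.1.0-129/aliyun-*.jar \
  #     hadoop-oss-hdp-2.6.1.0-129/jdom-1.1.jar \
  #     "root@${node}:/usr/hdp/current/hadoop-client/lib/"
  visited=$((visited + 1))
done

echo "visited ${visited} nodes"
```

After the copy, rerun the ls verification commands above on each node to confirm the files landed.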
All paths shown in this guide (such as /usr/hdp/current) reflect standard HDP 2.6 default layouts. Adjust the paths if your cluster uses a custom installation directory.
Configure OSS settings
Add the following properties to your Hadoop configuration. How you apply them depends on whether your cluster uses Ambari.
Option 1: Use Ambari (recommended)
In the Ambari web UI, go to HDFS > Configs > Custom core-site.
Add each property from the table below.
Restart the cluster when Ambari prompts you.
Option 2: Edit core-site.xml directly
If your cluster does not use Ambari, add the properties directly to /etc/hadoop/conf/core-site.xml on each node, then restart the relevant services.
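If you edit the file by hand, note that every property element must sit inside the file's single top-level configuration element. A minimal skeleton (placeholder endpoint value; your existing properties stay where they are) looks like this:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- Existing HDP properties remain unchanged; append the fs.oss.* entries. -->
  <property>
    <name>fs.oss.endpoint</name>
    <value>oss-cn-zhangjiakou-internal.aliyuncs.com</value>
  </property>
  <!-- Add the remaining fs.oss.* properties listed under
       "Required and recommended properties". -->
</configuration>
```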
Required and recommended properties
Copy the following XML block and adapt it for your environment. It includes all required and recommended properties with their default values and descriptions.
<property>
<name>fs.oss.endpoint</name>
<value>oss-cn-zhangjiakou-internal.aliyuncs.com</value>
<description>Endpoint of the OSS region where your bucket is located.
Use the internal endpoint when your cluster runs inside Alibaba Cloud to avoid egress charges.</description>
</property>
<property>
<name>fs.oss.accessKeyId</name>
<value>YOUR_ACCESS_KEY_ID</value>
<description>AccessKey ID used to authenticate with OSS.</description>
</property>
<property>
<name>fs.oss.accessKeySecret</name>
<value>YOUR_ACCESS_KEY_SECRET</value>
<description>AccessKey secret used to authenticate with OSS.</description>
</property>
<property>
<name>fs.oss.impl</name>
<value>org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem</value>
<description>OSS file system implementation class. Do not change this value.</description>
</property>
<property>
<name>fs.oss.buffer.dir</name>
<value>/tmp/oss</value>
<description>Local directory for temporary files during OSS read/write operations.</description>
</property>
<property>
<name>fs.oss.connection.secure.enabled</name>
<value>false</value>
<description>Whether to use HTTPS for OSS connections. Set to false for internal cluster traffic
to avoid the performance overhead of TLS. Set to true if your security policy requires encryption in transit.</description>
</property>
<property>
<name>fs.oss.connection.maximum</name>
<value>2048</value>
<description>Maximum number of concurrent connections to OSS. The default upstream value is 32;
increase it for workloads with high parallelism.</description>
</property>

Replace YOUR_ACCESS_KEY_ID and YOUR_ACCESS_KEY_SECRET with your actual credentials. For a full list of supported properties, see the Hadoop-Aliyun module reference.
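A common stumbling point is the fs.oss.endpoint value. OSS endpoint host names follow a predictable pattern: oss-&lt;region&gt;.aliyuncs.com for public access and oss-&lt;region&gt;-internal.aliyuncs.com from inside Alibaba Cloud. The helper below is a small sketch of that pattern; the region IDs shown are examples, and you should confirm your region's endpoint against the official endpoint list.

```shell
# Build an OSS endpoint host name from a region ID.
# Pattern: public   -> oss-<region>.aliyuncs.com
#          internal -> oss-<region>-internal.aliyuncs.com
oss_endpoint() {
  local region="$1" scope="${2:-public}"
  if [ "$scope" = "internal" ]; then
    echo "oss-${region}-internal.aliyuncs.com"
  else
    echo "oss-${region}.aliyuncs.com"
  fi
}

oss_endpoint cn-zhangjiakou internal   # matches the sample value used above
oss_endpoint cn-hangzhou               # public endpoint for another example region
```

Use the internal form whenever the cluster runs inside Alibaba Cloud in the same region as the bucket; it is faster and avoids public egress charges.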
Verify connectivity
After restarting the cluster, run the following commands to confirm that Hadoop can read from and write to OSS. Replace <your-bucket-name> with your actual bucket name.
Read test — list the root of your bucket:
sudo hadoop fs -ls oss://<your-bucket-name>/

Write test — create a directory in your bucket:
sudo hadoop fs -mkdir oss://<your-bucket-name>/hadoop-test

If both commands succeed without errors, the connector is working. If you see authentication errors, double-check your fs.oss.accessKeyId and fs.oss.accessKeySecret values. If you see connection errors, verify that fs.oss.endpoint matches the region where your bucket is located.
Run MapReduce jobs against OSS
Before running MapReduce jobs, update the cluster's distributed MapReduce archive to include the OSS connector JARs.
The steps below use MapReduce as an example. For other frameworks such as Tez, apply the same approach — copy the connector JARs into the equivalent archive (for example, hdfs://hdp-master:8020/hdp/apps/2.6.1.0-129/tez/tez.tar.gz) and repackage it.

Run the following commands as the hdfs user:
sudo su hdfs
# Download the existing MapReduce archive from HDFS
hadoop fs -copyToLocal /hdp/apps/2.6.1.0-129/mapreduce/mapreduce.tar.gz
# Back up the original archive
cp mapreduce.tar.gz mapreduce.tar.gz.bak
# Extract the archive
tar zxf mapreduce.tar.gz
# Copy the OSS connector JARs into the archive's tools/lib directory
cp /usr/hdp/current/hadoop-client/hadoop-aliyun-2.7.3.2.6.1.0-129.jar \
hadoop/share/hadoop/tools/lib/
cp /usr/hdp/current/hadoop-client/lib/aliyun-*.jar \
hadoop/share/hadoop/tools/lib/
cp /usr/hdp/current/hadoop-client/lib/jdom-1.1.jar \
hadoop/share/hadoop/tools/lib/
# Repackage and upload the updated archive
tar zcf mapreduce.tar.gz hadoop
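Before removing the old archive from HDFS and uploading the new one, it can be worth confirming that the repacked tarball actually contains the connector JARs. The helper below is a hypothetical convenience, not part of HDP; it simply greps the archive listing for the expected entries.

```shell
# List connector-related entries in a repacked MapReduce archive.
# Exits non-zero (via grep) if none are found.
check_oss_jars() {
  tar tzf "$1" | grep -E 'hadoop-aliyun|aliyun-sdk-oss|aliyun-java-sdk|jdom'
}

# Usage on the archive built above:
# check_oss_jars mapreduce.tar.gz
```

If the function prints the aliyun and jdom entries, proceed with the upload.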
hadoop fs -rm /hdp/apps/2.6.1.0-129/mapreduce/mapreduce.tar.gz
hadoop fs -copyFromLocal mapreduce.tar.gz /hdp/apps/2.6.1.0-129/mapreduce/

Verify with TeraGen and TeraSort
Run the standard Hadoop benchmark suite to confirm that MapReduce jobs can read and write OSS end-to-end. Replace <bucket-name> with your actual bucket name.
Generate test data with TeraGen:
sudo hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar \
teragen -Dmapred.map.tasks=100 10995116 oss://<bucket-name>/1G-input

A successful run ends with output similar to:
18/10/28 21:35:15 INFO mapreduce.Job: Job job_1540728986531_0005 completed successfully
18/10/28 21:35:15 INFO mapreduce.Job: Counters: 36
...

Sort the data with TeraSort:
sudo hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar \
terasort -Dmapred.map.tasks=100 \
oss://<bucket-name>/1G-input \
oss://<bucket-name>/1G-output

A successful run ends with:
18/10/28 21:43:56 INFO mapreduce.Job: Job job_1540728986531_0006 completed successfully
18/10/28 21:43:56 INFO mapreduce.Job: Counters: 54
...

If both jobs complete successfully, your HDP 2.6.1.0 cluster is fully configured to run MapReduce workloads against OSS.
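A note on the teragen arguments used above: the first positional argument is a row count, and each TeraGen record is 100 bytes, so the value 10995116 works out to roughly 1 GB of input, which is where the 1G-input directory name comes from:

```shell
# TeraGen writes fixed 100-byte rows, so total size = rows * 100.
rows=10995116
bytes=$((rows * 100))
echo "${rows} rows x 100 bytes = ${bytes} bytes (~1.1 GB)"
```

Scale the row count up or down to benchmark with larger or smaller datasets.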
Next steps
Hadoop-Aliyun module reference — full list of fs.oss.* configuration properties
OSS endpoints by region — find the correct internal or public endpoint for your bucket's region