Hortonworks Data Platform (HDP) 3.0.1 includes Hadoop 3.1.1, which supports Object Storage Service (OSS) natively. Earlier HDP versions do not. This guide shows you how to add OSS support to an HDP 2.6.1.0 cluster by installing the OSS connector JARs, configuring the Hadoop file system settings, and verifying connectivity with MapReduce jobs.
Prerequisites
Before you begin, ensure that you have:
An HDP 2.6.1.0 cluster. If you do not have one, create it using one of these methods:
Ambari — for cluster creation with a management UI
Manual setup — if Ambari is not available in your environment
sudo access on all cluster nodes
An OSS bucket and its endpoint. For endpoint formats, see Regions and endpoints.
An AccessKey ID and AccessKey secret with read/write permissions on the bucket
Install the OSS connector JARs
Download the OSS connector package for HDP 2.6.1.0.
Extract the archive:
sudo tar -xvf hadoop-oss-hdp-2.6.1.0-129.tar

The extracted directory contains the following files:
hadoop-oss-hdp-2.6.1.0-129/
hadoop-oss-hdp-2.6.1.0-129/hadoop-aliyun-2.7.3.2.6.1.0-129.jar
hadoop-oss-hdp-2.6.1.0-129/aliyun-sdk-oss-3.4.1.jar
hadoop-oss-hdp-2.6.1.0-129/aliyun-java-sdk-core-3.4.0.jar
hadoop-oss-hdp-2.6.1.0-129/aliyun-java-sdk-ecs-4.2.0.jar
hadoop-oss-hdp-2.6.1.0-129/aliyun-java-sdk-ram-3.0.0.jar
hadoop-oss-hdp-2.6.1.0-129/aliyun-java-sdk-sts-3.0.0.jar
hadoop-oss-hdp-2.6.1.0-129/jdom-1.1.jar

Move hadoop-aliyun-2.7.3.2.6.1.0-129.jar to the Hadoop client directory:

sudo mv hadoop-oss-hdp-2.6.1.0-129/hadoop-aliyun-2.7.3.2.6.1.0-129.jar \
  /usr/hdp/current/hadoop-client/

Verify the file is in place:
sudo ls -lh /usr/hdp/current/hadoop-client/hadoop-aliyun-2.7.3.2.6.1.0-129.jar

Expected output:
-rw-r--r-- 1 root root 64K Oct 28 20:56 /usr/hdp/current/hadoop-client/hadoop-aliyun-2.7.3.2.6.1.0-129.jar

Move all other JAR files to the Hadoop client lib directory:

sudo mv hadoop-oss-hdp-2.6.1.0-129/aliyun-*.jar \
  hadoop-oss-hdp-2.6.1.0-129/jdom-1.1.jar \
  /usr/hdp/current/hadoop-client/lib/

Verify the files are in place:
sudo ls -ltrh /usr/hdp/current/hadoop-client/lib

The output should include entries similar to:
-rw-r--r-- 1 root root 114K Oct 28 20:56 aliyun-java-sdk-core-3.4.0.jar
-rw-r--r-- 1 root root 513K Oct 28 20:56 aliyun-sdk-oss-3.4.1.jar
-rw-r--r-- 1 root root 13K Oct 28 20:56 aliyun-java-sdk-sts-3.0.0.jar
-rw-r--r-- 1 root root 211K Oct 28 20:56 aliyun-java-sdk-ram-3.0.0.jar
-rw-r--r-- 1 root root 770K Oct 28 20:56 aliyun-java-sdk-ecs-4.2.0.jar
-rw-r--r-- 1 root root 150K Oct 28 20:56 jdom-1.1.jar

Repeat steps 1–4 on every node in your HDP cluster.
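Repeating the copy by hand is tedious on a large cluster. The loop below is a minimal sketch of how the distribution could be scripted from the node where you extracted the archive; the node names, the root login, and passwordless SSH are all assumptions you would replace with your own setup. The scp commands are left commented out so the script prints its plan without touching anything.

```shell
#!/usr/bin/env bash
# Sketch only: distribute the OSS connector JARs to every cluster node.
# NODES and root@ SSH access are assumptions -- adjust for your environment.
set -u

NODES="${NODES:-node1 node2 node3}"   # space-separated hostnames (placeholder values)
visited=0

for node in $NODES; do
  echo "distributing OSS connector JARs to ${node}"
  # Uncomment to actually copy (requires SSH access with write permission):
  # scp hadoop-oss-hdp-2.6.1.0-129/hadoop-aliyun-2.7.3.2.6.1.0-129.jar \
  #     "root@${node}:/usr/hdp/current/hadoop-client/"
  # scp hadoop-oss-hdp-2.6.1.0-129/aliyun-*.jar \
  #     hadoop-oss-hdp-2.6.1.0-129/jdom-1.1.jar \
  #     "root@${node}:/usr/hdp/current/hadoop-client/lib/"
  visited=$((visited + 1))
done

echo "visited ${visited} nodes"
```

After the copy, rerun the ls verification commands above on each node to confirm the files landed.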
All paths shown in this guide (such as /usr/hdp/current) reflect standard HDP 2.6 default layouts. Adjust the paths if your cluster uses a custom installation directory.
Configure OSS settings
Add the following properties to your Hadoop configuration. How you apply them depends on whether your cluster uses Ambari.
Option 1: Use Ambari (recommended)
In the Ambari web UI, go to HDFS > Configs > Custom core-site.
Add each property from the table below.
Restart the cluster when Ambari prompts you.
Option 2: Edit core-site.xml directly
If your cluster does not use Ambari, add the properties directly to /etc/hadoop/conf/core-site.xml on each node, then restart the relevant services.
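If you edit the file by hand, note that every property element must sit inside the file's single top-level configuration element. A minimal skeleton (placeholder endpoint value; your existing properties stay where they are) looks like this:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- Existing HDP properties remain unchanged; append the fs.oss.* entries. -->
  <property>
    <name>fs.oss.endpoint</name>
    <value>oss-cn-zhangjiakou-internal.aliyuncs.com</value>
  </property>
  <!-- Add the remaining fs.oss.* properties listed under
       "Required and recommended properties". -->
</configuration>
```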
Required and recommended properties
Copy the following XML block and adapt it for your environment. It includes all required and recommended properties with their default values and descriptions.
<property>
<name>fs.oss.endpoint</name>
<value>oss-cn-zhangjiakou-internal.aliyuncs.com</value>
<description>Endpoint of the OSS region where your bucket is located.
Use the internal endpoint when your cluster runs inside Alibaba Cloud to avoid egress charges.</description>
</property>
<property>
<name>fs.oss.accessKeyId</name>
<value>YOUR_ACCESS_KEY_ID</value>
<description>AccessKey ID used to authenticate with OSS.</description>
</property>
<property>
<name>fs.oss.accessKeySecret</name>
<value>YOUR_ACCESS_KEY_SECRET</value>
<description>AccessKey secret used to authenticate with OSS.</description>
</property>
<property>
<name>fs.oss.impl</name>
<value>org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem</value>
<description>OSS file system implementation class. Do not change this value.</description>
</property>
<property>
<name>fs.oss.buffer.dir</name>
<value>/tmp/oss</value>
<description>Local directory for temporary files during OSS read/write operations.</description>
</property>
<property>
<name>fs.oss.connection.secure.enabled</name>
<value>false</value>
<description>Whether to use HTTPS for OSS connections. Set to false for internal cluster traffic
to avoid the performance overhead of TLS. Set to true if your security policy requires encryption in transit.</description>
</property>
<property>
<name>fs.oss.connection.maximum</name>
<value>2048</value>
<description>Maximum number of concurrent connections to OSS. The default upstream value is 32;
increase it for workloads with high parallelism.</description>
</property>

Replace YOUR_ACCESS_KEY_ID and YOUR_ACCESS_KEY_SECRET with your actual credentials. For a full list of supported properties, see the Hadoop-Aliyun module reference.
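A common stumbling point is the fs.oss.endpoint value. OSS endpoint host names follow a predictable pattern: oss-&lt;region&gt;.aliyuncs.com for public access and oss-&lt;region&gt;-internal.aliyuncs.com from inside Alibaba Cloud. The helper below is a small sketch of that pattern; the region IDs shown are examples, and you should confirm your region's endpoint against the official endpoint list.

```shell
# Build an OSS endpoint host name from a region ID.
# Pattern: public   -> oss-<region>.aliyuncs.com
#          internal -> oss-<region>-internal.aliyuncs.com
oss_endpoint() {
  local region="$1" scope="${2:-public}"
  if [ "$scope" = "internal" ]; then
    echo "oss-${region}-internal.aliyuncs.com"
  else
    echo "oss-${region}.aliyuncs.com"
  fi
}

oss_endpoint cn-zhangjiakou internal   # matches the sample value used above
oss_endpoint cn-hangzhou               # public endpoint for another example region
```

Use the internal form whenever the cluster runs inside Alibaba Cloud in the same region as the bucket; it is faster and avoids public egress charges.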
Verify connectivity
After restarting the cluster, run the following commands to confirm that Hadoop can read from and write to OSS. Replace <your-bucket-name> with your actual bucket name.
Read test — list the root of your bucket:
sudo hadoop fs -ls oss://<your-bucket-name>/

Write test — create a directory in your bucket:
sudo hadoop fs -mkdir oss://<your-bucket-name>/hadoop-test

If both commands succeed without errors, the connector is working. If you see authentication errors, double-check your fs.oss.accessKeyId and fs.oss.accessKeySecret values. If you see connection errors, verify that fs.oss.endpoint matches the region where your bucket is located.
Run MapReduce jobs against OSS
Before running MapReduce jobs, update the cluster's distributed MapReduce archive to include the OSS connector JARs.
The steps below use MapReduce as an example. For other frameworks such as Tez, apply the same approach — copy the connector JARs into the equivalent archive (for example, hdfs://hdp-master:8020/hdp/apps/2.6.1.0-129/tez/tez.tar.gz) and repackage it.

Run the following commands as the hdfs user:
sudo su hdfs
# Download the existing MapReduce archive from HDFS
hadoop fs -copyToLocal /hdp/apps/2.6.1.0-129/mapreduce/mapreduce.tar.gz
# Back up the original archive
cp mapreduce.tar.gz mapreduce.tar.gz.bak
# Extract the archive
tar zxf mapreduce.tar.gz
# Copy the OSS connector JARs into the archive's tools/lib directory
cp /usr/hdp/current/hadoop-client/hadoop-aliyun-2.7.3.2.6.1.0-129.jar \
hadoop/share/hadoop/tools/lib/
cp /usr/hdp/current/hadoop-client/lib/aliyun-*.jar \
hadoop/share/hadoop/tools/lib/
cp /usr/hdp/current/hadoop-client/lib/jdom-1.1.jar \
hadoop/share/hadoop/tools/lib/
# Repackage and upload the updated archive
tar zcf mapreduce.tar.gz hadoop
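Before removing the old archive from HDFS and uploading the new one, it can be worth confirming that the repacked tarball actually contains the connector JARs. The helper below is a hypothetical convenience, not part of HDP; it simply greps the archive listing for the expected entries.

```shell
# List connector-related entries in a repacked MapReduce archive.
# Exits non-zero (via grep) if none are found.
check_oss_jars() {
  tar tzf "$1" | grep -E 'hadoop-aliyun|aliyun-sdk-oss|aliyun-java-sdk|jdom'
}

# Usage on the archive built above:
# check_oss_jars mapreduce.tar.gz
```

If the function prints the aliyun and jdom entries, proceed with the upload.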
hadoop fs -rm /hdp/apps/2.6.1.0-129/mapreduce/mapreduce.tar.gz
hadoop fs -copyFromLocal mapreduce.tar.gz /hdp/apps/2.6.1.0-129/mapreduce/

Verify with TeraGen and TeraSort
Run the standard Hadoop benchmark suite to confirm that MapReduce jobs can read and write OSS end-to-end. Replace <bucket-name> with your actual bucket name.
Generate test data with TeraGen:
sudo hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar \
teragen -Dmapred.map.tasks=100 10995116 oss://<bucket-name>/1G-input

A successful run ends with output similar to:
18/10/28 21:35:15 INFO mapreduce.Job: Job job_1540728986531_0005 completed successfully
18/10/28 21:35:15 INFO mapreduce.Job: Counters: 36
...

Sort the data with TeraSort:
sudo hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar \
terasort -Dmapred.map.tasks=100 \
oss://<bucket-name>/1G-input \
oss://<bucket-name>/1G-output

A successful run ends with:
18/10/28 21:43:56 INFO mapreduce.Job: Job job_1540728986531_0006 completed successfully
18/10/28 21:43:56 INFO mapreduce.Job: Counters: 54
...

If both jobs complete successfully, your HDP 2.6.1.0 cluster is fully configured to run MapReduce workloads against OSS.
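A note on the teragen arguments used above: the first positional argument is a row count, and each TeraGen record is 100 bytes, so the value 10995116 works out to roughly 1 GB of input, which is where the 1G-input directory name comes from:

```shell
# TeraGen writes fixed 100-byte rows, so total size = rows * 100.
rows=10995116
bytes=$((rows * 100))
echo "${rows} rows x 100 bytes = ${bytes} bytes (~1.1 GB)"
```

Scale the row count up or down to benchmark with larger or smaller datasets.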
Next steps
Hadoop-Aliyun module reference — full list of fs.oss.* configuration properties
OSS endpoints by region — find the correct internal or public endpoint for your bucket's region