Hortonworks Data Platform (HDP) is a big data platform released by Hortonworks and consists of open source components such as Hadoop, Hive, and HBase. Hadoop 3.1.1 is included in HDP 3.0.1 and supports Object Storage Service (OSS). However, earlier versions of HDP do not support OSS. This topic uses HDP 2.6.1.0 as an example to describe how to configure HDP 2.6 to read and write OSS data.
Prerequisites
An HDP 2.6.1.0 cluster is created.If you do not have an HDP 2.6.1.0 cluster, you can use one of the following methods to create an HDP 2.6.1.0 cluster:
- Use Ambari to create an HDP 2.6.1.0 cluster.
- If Ambari is not available, you can manually create an HDP 2.6.1.0 cluster.
Procedure
- Download the HDP 2.6.1.0 package that supports OSS.
- Run the following command to decompress the downloaded package:
sudo tar -xvf hadoop-oss-hdp-2.6.1.0-129.tar
Sample success response:
hadoop-oss-hdp-2.6.1.0-129/ hadoop-oss-hdp-2.6.1.0-129/aliyun-java-sdk-ram-3.0.0.jar hadoop-oss-hdp-2.6.1.0-129/aliyun-java-sdk-core-3.4.0.jar hadoop-oss-hdp-2.6.1.0-129/aliyun-java-sdk-ecs-4.2.0.jar hadoop-oss-hdp-2.6.1.0-129/aliyun-java-sdk-sts-3.0.0.jar hadoop-oss-hdp-2.6.1.0-129/jdom-1.1.jar hadoop-oss-hdp-2.6.1.0-129/aliyun-sdk-oss-3.4.1.jar hadoop-oss-hdp-2.6.1.0-129/hadoop-aliyun-2.7.3.2.6.1.0-129.jar
- Modify the directories of the JAR packages. Note In this topic, all contents enclosed by ${} are environment variables. Modify the environment variables based on the actual environment.
- Perform the preceding operations on all HDP nodes.
- Use Ambari to add configurations. If your cluster does not use Ambari for management, modify core-site.xml. In this example, Ambari is used. The following table describes the configurations that you must add.
Parameter Description fs.oss.endpoint Specify the endpoint of the region in which the bucket that you want to access is located. Example: oss-cn-zhangjiakou-internal.aliyuncs.com.
fs.oss.accessKeyId Enter the AccessKey ID used to access OSS. fs.oss.accessKeySecret Enter the AccessKey secret used to access OSS. fs.oss.impl Specify the class used to implement the OSS file system based on Hadoop. Set the value to org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem. fs.oss.buffer.dir Specify the name of the directory used to store temporary files. We recommend that you set this parameter to /tmp/oss.
fs.oss.connection.secure.enabled Specify whether to enable HTTPS. Performance may be affected when HTTPS is enabled. We recommend that you set this parameter to false.
fs.oss.connection.maximum Specify the maximum number of connections to OSS. We recommend that you set this parameter to 2048.
For more information about more parameters, visit Hadoop-Aliyun module.
- Restart the cluster as prompted by Ambari.
- Test whether data can be read from and written to OSS.
- To run MapReduce jobs, run the following command to move the HDP 2.6.1.0 package to the hdfs://hdp-master:8020/hdp/apps/2.6.1.0-129/mapreduce/mapreduce.tar.gz package: Note In this example, MapReduce jobs are used. For more information about how to run jobs of other types, refer to the following step and code. For example, to run TEZ jobs, move the HDP 2.6.1.0 package to the hdfs://hdp-master:8020/hdp/apps/2.6.1.0-129/tez/tez.tar.gz package.
sudo su hdfs sudo cd sudo hadoop fs -copyToLocal /hdp/apps/2.6.1.0-129/mapreduce/mapreduce.tar.gz sudo hadoop fs -rm /hdp/apps/2.6.1.0-129/mapreduce/mapreduce.tar.gz sudo cp mapreduce.tar.gz mapreduce.tar.gz.bak sudo tar zxf mapreduce.tar.gz sudo cp /usr/hdp/current/hadoop-client/hadoop-aliyun-2.7.3.2.6.1.0-129.jar hadoop/share/hadoop/tools/lib/ sudo cp /usr/hdp/current/hadoop-client/lib/aliyun-* hadoop/share/hadoop/tools/lib/ sudo cp /usr/hdp/current/hadoop-client/lib/jdom-1.1.jar hadoop/share/hadoop/tools/lib/ sudo tar zcf mapreduce.tar.gz hadoop sudo hadoop fs -copyFromLocal mapreduce.tar.gz /hdp/apps/2.6.1.0-129/mapreduce/
Verify the configurations
You can test TeraGen and TeraSort to check whether the configurations take effect.
- Run the following command to test TeraGen:
sudo hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar teragen -Dmapred.map.tasks=100 10995116 oss://{bucket-name}/1G-input
Sample success response:
18/10/28 21:32:38 INFO client.RMProxy: Connecting to ResourceManager at cdh-master/192.168.0.161:8050 18/10/28 21:32:38 INFO client.AHSProxy: Connecting to Application History server at cdh-master/192.168.0.161:10200 18/10/28 21:32:38 INFO aliyun.oss: [Server]Unable to execute HTTP request: Not Found [ErrorCode]: NoSuchKey [RequestId]: 5BD5BA7641FCE369BC1D052C [HostId]: null 18/10/28 21:32:38 INFO aliyun.oss: [Server]Unable to execute HTTP request: Not Found [ErrorCode]: NoSuchKey [RequestId]: 5BD5BA7641FCE369BC1D052F [HostId]: null 18/10/28 21:32:39 INFO terasort.TeraSort: Generating 10995116 using 100 18/10/28 21:32:39 INFO mapreduce.JobSubmitter: number of splits:100 18/10/28 21:32:39 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1540728986531_0005 18/10/28 21:32:39 INFO impl.YarnClientImpl: Submitted application application_1540728986531_0005 18/10/28 21:32:39 INFO mapreduce.Job: The url to track the job: http://cdh-master:8088/proxy/application_1540728986531_0005/ 18/10/28 21:32:39 INFO mapreduce.Job: Running job: job_1540728986531_0005 18/10/28 21:32:49 INFO mapreduce.Job: Job job_1540728986531_0005 running in uber mode : false 18/10/28 21:32:49 INFO mapreduce.Job: map 0% reduce 0% 18/10/28 21:32:55 INFO mapreduce.Job: map 1% reduce 0% 18/10/28 21:32:57 INFO mapreduce.Job: map 2% reduce 0% 18/10/28 21:32:58 INFO mapreduce.Job: map 4% reduce 0% ... 18/10/28 21:34:40 INFO mapreduce.Job: map 99% reduce 0% 18/10/28 21:34:42 INFO mapreduce.Job: map 100% reduce 0% 18/10/28 21:35:15 INFO mapreduce.Job: Job job_1540728986531_0005 completed successfully 18/10/28 21:35:15 INFO mapreduce.Job: Counters: 36 ...
- Run the following command to test TeraSort:
sudo hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar terasort -Dmapred.map.tasks=100 oss://{bucket-name}/1G-input oss://{bucket-name}/1G-output
Sample success response:
18/10/28 21:39:00 INFO terasort.TeraSort: starting ... 18/10/28 21:39:02 INFO mapreduce.JobSubmitter: number of splits:100 18/10/28 21:39:02 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1540728986531_0006 18/10/28 21:39:02 INFO impl.YarnClientImpl: Submitted application application_1540728986531_0006 18/10/28 21:39:02 INFO mapreduce.Job: The url to track the job: http://cdh-master:8088/proxy/application_1540728986531_0006/ 18/10/28 21:39:02 INFO mapreduce.Job: Running job: job_1540728986531_0006 18/10/28 21:39:09 INFO mapreduce.Job: Job job_1540728986531_0006 running in uber mode : false 18/10/28 21:39:09 INFO mapreduce.Job: map 0% reduce 0% 18/10/28 21:39:17 INFO mapreduce.Job: map 1% reduce 0% 18/10/28 21:39:19 INFO mapreduce.Job: map 2% reduce 0% 18/10/28 21:39:20 INFO mapreduce.Job: map 3% reduce 0% ... 18/10/28 21:42:50 INFO mapreduce.Job: map 100% reduce 75% 18/10/28 21:42:53 INFO mapreduce.Job: map 100% reduce 80% 18/10/28 21:42:56 INFO mapreduce.Job: map 100% reduce 86% 18/10/28 21:42:59 INFO mapreduce.Job: map 100% reduce 92% 18/10/28 21:43:02 INFO mapreduce.Job: map 100% reduce 98% 18/10/28 21:43:05 INFO mapreduce.Job: map 100% reduce 100% ^@18/10/28 21:43:56 INFO mapreduce.Job: Job job_1540728986531_0006 completed successfully 18/10/28 21:43:56 INFO mapreduce.Job: Counters: 54 ...
If the tests are successful, the configurations take effect.