This topic describes how to use open source HDFS clients to access LindormDFS.
Prerequisites
Java Development Kit (JDK) 1.7 or later is installed.
The IP address of your client is added to the whitelist of your Lindorm instance. For more information, see Configure whitelists.
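You can check the installed JDK version on your client by running the following command:
java -version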
Usage notes
If your client is deployed on an Elastic Compute Service (ECS) instance, make sure that the ECS instance and the Lindorm instance meet the following requirements to ensure network connectivity:
The ECS instance and the Lindorm instance are deployed in the same region. We recommend that you also deploy the two instances in the same zone to reduce network latency.
The ECS instance and the Lindorm instance belong to the same virtual private cloud (VPC).
Download the client
You can download the Apache Hadoop 2.7.3 release package hadoop-2.7.3.tar.gz from the official Apache Hadoop website.
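For reference, the following download command assumes the standard Apache release archive layout; verify the URL on the official site before you use it:
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz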
Configure Apache Hadoop
1. Run the following command to decompress the downloaded package:
tar -zxvf hadoop-2.7.3.tar.gz
2. Run the following command to configure environment variables:
export HADOOP_HOME=/${Hadoop installation directory}/hadoop-2.7.3
3. Run the following command to go to the hadoop directory:
cd $HADOOP_HOME
4. Add the JAVA_HOME variable to the hadoop-env.sh file in the etc/hadoop/ directory. In this example, Java is installed in the /opt/install/java directory.
# set to the root of your Java installation
export JAVA_HOME=/opt/install/java
5. Modify the etc/hadoop/hdfs-site.xml file. The following sample shows how to modify the hdfs-site.xml file. You must replace ${Instance ID} in the file with the ID of your Lindorm instance.
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>${Instance ID}</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.${Instance ID}</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.${Instance ID}</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.${Instance ID}.nn1</name>
    <value>${Instance ID}-master1-001.lindorm.rds.aliyuncs.com:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.${Instance ID}.nn2</name>
    <value>${Instance ID}-master2-001.lindorm.rds.aliyuncs.com:8020</value>
  </property>
</configuration>
You can also use the configuration file that is automatically generated by the system. For more information, see Activate LindormDFS.
The preceding example shows how to configure Apache Hadoop for a single instance, where ${Instance ID} is the ID of one Lindorm instance. To configure Apache Hadoop for multiple instances, duplicate all of the <property> elements in the example inside the <configuration> element, one copy for each instance, and replace the instance ID in each copy with the ID of the corresponding instance.
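After the configuration is complete, you can run a quick connectivity check. The following command, with ${Instance ID} replaced by your actual instance ID, lists the root directory of LindormDFS and confirms that the client can reach the active NameNode:
$HADOOP_HOME/bin/hadoop fs -ls hdfs://${Instance ID}/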
Examples of common operations
Upload a local file.
1. Create a directory.
$HADOOP_HOME/bin/hadoop fs -mkdir hdfs://${Instance ID}/test
2. Prepare a file and upload the file to the created directory in LindormDFS.
echo "test" > test.log
$HADOOP_HOME/bin/hadoop fs -put test.log hdfs://${Instance ID}/test
3. View the uploaded file.
$HADOOP_HOME/bin/hadoop fs -ls hdfs://${Instance ID}/test
Download the file to your local computer.
$HADOOP_HOME/bin/hadoop fs -get hdfs://${Instance ID}/test/test.log
Note: You must replace ${Instance ID} in the preceding commands with the ID of your Lindorm instance.
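Other standard hadoop fs subcommands work against LindormDFS in the same way. For example, the following sketch, which uses the same ${Instance ID} placeholder, prints the contents of the uploaded file and then removes the test directory:
$HADOOP_HOME/bin/hadoop fs -cat hdfs://${Instance ID}/test/test.log
$HADOOP_HOME/bin/hadoop fs -rm -r hdfs://${Instance ID}/test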