If your data is stored in Hadoop Distributed File System (HDFS), you must make sure that the HDFS can be accessed, test the network connectivity between a Data Transport device and the HDFS, and then migrate data by using the Data Transport device. This topic describes how to configure a service IP address, mount an HDFS file system to a Data Transport device, and test the connectivity between the HDFS and the Data Transport device.
Prerequisites
The device where the HDFS file system is deployed is connected to the Ethernet port or optical port of a Data Transport device, either directly over an Ethernet cable or optical fiber cable, or through a network switch.
The Ethernet cable, optical fiber cable, and optical modules are connected properly. The port connection indicator functions as expected.
The HDFS file system from which the data is migrated supports no authentication or Kerberos authentication.
Step 1: Configure a service IP address for the Data Transport device
Log on to the Data Transport device. For more information, see Install a Data Transport device.
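The exact procedure depends on your device model; as a minimal sketch, assuming the device exposes a standard Linux shell and the service port corresponds to a hypothetical interface eth0, the service IP address could be configured as follows:
ip addr add 192.168.24.248/24 dev eth0   # assign the service IP address (hypothetical address and interface)
ip link set eth0 up                      # bring the interface up
ip addr show eth0                        # verify that the address is configured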
Step 2: Perform operations on the HDFS cluster
The HDFS file system supports no authentication or Kerberos authentication. In the case of no authentication, you only need a username and password to log on to the HDFS cluster. Perform the following operations based on your business requirements.
No authentication
Check whether the HDFS cluster can be accessed.
Log on to the HDFS cluster and confirm the IP address of the primary node for the HDFS cluster. Example: 192.168.24.247.
Confirm the HDFS service port. Example: 8020. In the HDFS configuration file core-site.xml, the value of fs.defaultFS specifies the address and port number of the HDFS service, as shown in the excerpt after this note.
Run the following command on a device in the cluster to test whether the HDFS can be accessed: hadoop fs -ls hdfs://<IP address of the primary node for the HDFS cluster>:<HDFS service port>. Example: hadoop fs -ls hdfs://192.168.24.247:8020.
Note: If the HDFS directory information is returned, the HDFS cluster can be accessed. If an access denied message is returned, the IP address or the port number of the HDFS cluster may be invalid. Check the IP address and the port number and try again.
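For reference, the fs.defaultFS property in core-site.xml typically looks like the following excerpt (hypothetical values shown):
<!-- core-site.xml: fs.defaultFS holds the HDFS service URI, including the port. -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://192.168.24.247:8020</value>
</property>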
Test the connectivity between the Data Transport device and the HDFS cluster.
Log on to the Data Transport device and run the ping <IP address of the primary node for the HDFS cluster> command to test the connectivity between the Data Transport device and the HDFS cluster. Example: ping 192.168.24.247.
Run the telnet <IP address of the primary node for the HDFS cluster> <HDFS service port> command to test whether the HDFS service port is reachable. Example: telnet 192.168.24.247 8020. If the expected output is returned, the Data Transport device can access the HDFS cluster.
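For reference, a successful telnet connection typically prints output similar to the following (the exact text may vary with the telnet client):
Trying 192.168.24.247...
Connected to 192.168.24.247.
Escape character is '^]'.
If the connection is refused or times out, check the network configuration and the HDFS service port.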
Kerberos authentication
Check whether the Kerberos authentication succeeds.
Run the kinit -kt <keytab path> <PrincipalName> command to obtain a Kerberos ticket.
Run the hdfs dfs -ls command. If the directories can be listed correctly, the Kerberos authentication succeeds.
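For example, assuming a hypothetical principal hdfs/node1@EXAMPLE.COM and keytab path, the check can look like this:
kinit -kt /etc/security/keytabs/hdfs.keytab hdfs/node1@EXAMPLE.COM
klist            # verify that a valid ticket-granting ticket was obtained
hdfs dfs -ls /   # list the HDFS root directory to confirm access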
Check whether the HDFS cluster can be accessed.
Log on to the HDFS cluster and confirm the IP address of the primary node for the HDFS cluster. Example: 192.168.24.247.
Confirm the HDFS service port. Example: 8020. In the HDFS configuration file core-site.xml, the value of fs.defaultFS specifies the address and port number of the HDFS service.
Select a device in the cluster and run the following command to test whether the HDFS cluster can be accessed: hadoop fs -ls hdfs://<IP address of the primary node for the HDFS cluster>:<HDFS service port>. Example: hadoop fs -ls hdfs://192.168.24.247:8020.
Note: If the HDFS directory information is returned, the HDFS cluster can be accessed. If an access denied message is returned, the IP address or the port number of the HDFS cluster may be invalid. Check the IP address and the port number and try again.
Test the connectivity between the Data Transport device and the HDFS cluster.
Log on to the Data Transport device and run the ping <IP address of the primary node for the HDFS cluster> command to test the connectivity between the Data Transport device and the HDFS cluster. Example: ping 192.168.24.247.
Run the telnet <IP address of the primary node for the HDFS cluster> <HDFS service port> command to test whether the HDFS service port is reachable. Example: telnet 192.168.24.247 8020. If the expected output is returned, the Data Transport device can access the HDFS cluster.
Copy the specified files to the Data Transport device.
Find the core-site.xml, hdfs-site.xml, krb5.conf, and keytab files in the HDFS cluster and download these files. You can find the files in the following ways:
Run the find / -name core-site.xml command to find the location of the core-site.xml file.
Run the find / -name hdfs-site.xml command to find the location of the hdfs-site.xml file.
Run the find / -name krb5.conf command to find the location of the krb5.conf file.
Run the find / -name "*.keytab" command to find the location of the keytab file. Keytab files are typically named with a .keytab suffix.
Download and copy the files to the /mnt/cube/software/hdfs/ directory of the Data Transport device.
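For example, assuming the Data Transport device is reachable over SSH at a hypothetical address 192.168.24.250 as the root user (the keytab file name varies by cluster), the files can be copied with scp:
scp core-site.xml hdfs-site.xml krb5.conf hdfs.keytab root@192.168.24.250:/mnt/cube/software/hdfs/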
In the krb5.conf file, check whether the domain name or the IP address of the Kerberos server is specified by the kdc parameter. If the kdc parameter specifies the domain name of the Kerberos server, you must configure the hosts file on the Data Transport device. You can configure the hosts file in the following way:
Run the vim /etc/hosts command to add the IP address and the domain name of the Kerberos server in the <IP address of the Kerberos server> <Domain name of the Kerberos server> format.
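For example, if the krb5.conf file contains an entry such as kdc = kerberos.example.com (hypothetical values), add a matching line to /etc/hosts on the Data Transport device:
192.168.24.100   kerberos.example.com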