Hadoop is an open-source, highly reliable, and scalable distributed computing framework developed by the Apache Software Foundation. The core components of the Hadoop framework are HDFS and MapReduce.
HDFS provides storage for massive datasets, and MapReduce provides computation over them.
- HDFS is an open-source implementation of Google File System (GFS).
- MapReduce is a programming model for parallel operations on large-scale datasets (greater than 1 TB).
By default, Hadoop opens many ports to provide WebUI features and other services. These ports are listed in the port table at the end of this article.
Anyone who can reach port 50070 of the NameNode WebUI management interface can browse and download any file stored in HDFS. Additionally, if the DataNode's default port 50075 is open, attackers can manipulate data stored in HDFS through the RESTful API (WebHDFS) provided by HDFS.
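To check whether your own cluster is exposed in this way, you can send an unauthenticated WebHDFS `LISTSTATUS` request for the root directory and see whether it succeeds. The sketch below assumes WebHDFS is enabled and that the NameNode uses the Hadoop 2.x default HTTP port 50070; the helper names `webhdfs_url` and `is_webhdfs_open` are hypothetical. Run it only against clusters you are authorized to test.

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def webhdfs_url(host: str, port: int, path: str, op: str) -> str:
    """Build a WebHDFS URL of the form http://host:port/webhdfs/v1/<path>?op=<OP>."""
    # WebHDFS expects an absolute HDFS path appended after /webhdfs/v1
    clean_path = "/" + path.lstrip("/")
    query = urllib.parse.urlencode({"op": op})
    return f"http://{host}:{port}/webhdfs/v1{urllib.parse.quote(clean_path)}?{query}"


def is_webhdfs_open(host: str, port: int = 50070, timeout: float = 5.0) -> bool:
    """Return True if an unauthenticated LISTSTATUS on "/" succeeds (hypothetical helper)."""
    url = webhdfs_url(host, port, "/", "LISTSTATUS")
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Refused connections, HTTP 401/403 errors, timeouts, and non-JSON replies
        return False
    # A successful LISTSTATUS reply contains a "FileStatuses" object
    return "FileStatuses" in body
```

A `True` result means the directory listing was returned without any credentials, which is the condition described above.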
Vulnerabilities have also been reported in the following Hadoop ecosystem components:
- Cloudera Manager (earlier than version 5.5)
- Cloudera HUE (earlier than version 3.9.0)
- Apache Ranger (earlier than version 0.5)
  - Unauthenticated policy download
  - Authenticated SQL injection (CVE-2016-2174)
- Apache Group Hadoop (version 2.6.x)
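When auditing an inventory of deployed components, the version cutoffs in the list above can be checked mechanically. This is a minimal sketch: the component keys such as `"cloudera-hue"` are hypothetical labels, the thresholds are transcribed from the list, and version strings are compared as integer tuples with vendor suffixes (for example `-cdh5`) discarded.

```python
import re
from typing import Optional, Tuple

# Versions strictly earlier than these cutoffs are reported as affected.
# Keys are hypothetical labels; cutoffs come from the list above.
AFFECTED_BELOW = {
    "cloudera-manager": (5, 5),
    "cloudera-hue": (3, 9, 0),
    "apache-ranger": (0, 5),
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '3.8.1' into (3, 8, 1), stopping at the first non-numeric suffix."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
        if match.end() != len(piece):
            # e.g. "0-cdh5" in "2.6.0-cdh5.4.2": keep 0, ignore the rest
            break
    return tuple(parts)


def is_affected(component: str, version: str) -> Optional[bool]:
    """Return True/False for listed components, None for unknown ones."""
    v = parse_version(version)
    if component == "apache-hadoop":
        # The list names the whole 2.6.x line rather than a cutoff
        return v[:2] == (2, 6)
    threshold = AFFECTED_BELOW.get(component)
    if threshold is None:
        return None
    return v < threshold
```

For example, `is_affected("cloudera-hue", "3.8.1")` returns `True`, while `is_affected("cloudera-manager", "5.5.0")` returns `False`.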
Hive is a data warehouse infrastructure built on Hadoop. It provides a set of tools for data extract-transform-load (ETL) and a mechanism for storing, querying, and analyzing large-scale data stored in Hadoop. Hive uses a simple SQL-like query language called HQL, which lets users who are familiar with SQL query the data. HQL also lets developers who are familiar with MapReduce plug in custom mappers and reducers to handle complex analysis tasks that the built-in mappers and reducers cannot accomplish.
HQL's TRANSFORM clause can replace the Map/Reduce logic used by Hive with custom scripts, such as shell or Python scripts. An attacker who can reach the Hive interface can therefore use TRANSFORM, or other related operations, to run commands and obtain privileges on the server.
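The mechanism is easier to see with a benign script. By default Hive streams each input row to the script as tab-separated text on standard input and reads tab-separated rows back from standard output, so whatever program appears after `USING` runs on the servers that execute the query. The table `web_logs`, its columns, and the script name `clean_urls.py` below are hypothetical.

```python
# Hypothetical usage from Hive:
#   ADD FILE clean_urls.py;
#   SELECT TRANSFORM (user_id, url) USING 'python3 clean_urls.py' AS (user_id, domain)
#   FROM web_logs;
import sys
from typing import Iterable, Iterator
from urllib.parse import urlparse


def transform_rows(lines: Iterable[str]) -> Iterator[str]:
    """Turn 'user_id<TAB>url' rows into 'user_id<TAB>domain' rows."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            continue  # skip malformed rows
        user_id, url = fields
        yield f"{user_id}\t{urlparse(url).netloc}"


if __name__ == "__main__":
    # Hive writes input rows to stdin and collects output rows from stdout
    for out in transform_rows(sys.stdin):
        print(out)
```

Nothing restricts the command to a harmless script: if `USING` names a shell command instead, that command runs with the privileges of the account executing the Hive tasks.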
As the issues above show, exposed service ports pose severe security risks to a Hadoop environment. To harden the security of the Hadoop environment, apply the following configurations.
Use security group rules or the firewall of the local operating system to restrict which IP addresses can access Hadoop. If your application only provides services for intranet servers, we recommend that you do not expose any Hadoop service port to the Internet.
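On a Linux host that uses iptables, the "allow the intranet, block everyone else" policy can be expressed as one ACCEPT and one DROP rule per port. The sketch below only prints the commands rather than applying them; the trusted range `10.0.0.0/8` is an assumption, and `HADOOP_PORTS` is a small hypothetical subset of the default ports from the port table.

```python
import ipaddress
from typing import Iterable, List

# A few Hadoop 2.x default ports from the port table; extend as needed.
HADOOP_PORTS = [8088, 9000, 50070, 50075]


def iptables_rules(ports: Iterable[int], allowed_cidr: str) -> List[str]:
    """Build iptables commands that accept a trusted CIDR and drop all other sources."""
    # Validate the CIDR so a malformed value never ends up in a shell command
    network = ipaddress.ip_network(allowed_cidr)
    rules = []
    for port in ports:
        # ACCEPT goes first because iptables evaluates rules in order
        rules.append(f"iptables -A INPUT -p tcp -s {network} --dport {port} -j ACCEPT")
        rules.append(f"iptables -A INPUT -p tcp --dport {port} -j DROP")
    return rules


if __name__ == "__main__":
    for rule in iptables_rules(HADOOP_PORTS, "10.0.0.0/8"):
        print(rule)
```

Because `-A` appends to the end of the INPUT chain, review any existing rules first: an earlier rule that accepts all traffic would take precedence over these.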
Enable the Kerberos authentication protocol in the Hadoop environment.
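Hadoop's secure mode is switched on through two properties in `core-site.xml`: `hadoop.security.authentication` set to `kerberos` and `hadoop.security.authorization` set to `true`. When either is absent, Hadoop falls back to `simple` authentication and no service-level authorization. The sketch below reads a `*-site.xml` document and checks those two switches; the function names are hypothetical, and a real Kerberos deployment additionally needs a KDC, service principals, and keytabs, which this check does not cover.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def read_hadoop_conf(xml_text: str) -> Dict[str, str]:
    """Parse a Hadoop *-site.xml document into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    conf = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            conf[name.strip()] = value.strip()
    return conf


def secure_mode_enabled(conf: Dict[str, str]) -> bool:
    """True only if Kerberos authentication and service authorization are both on."""
    # Missing keys mean Hadoop's defaults: "simple" and "false"
    auth = conf.get("hadoop.security.authentication", "simple").lower()
    authz = conf.get("hadoop.security.authorization", "false").lower()
    return auth == "kerberos" and authz == "true"


SAMPLE = """<configuration>
  <property><name>hadoop.security.authentication</name><value>kerberos</value></property>
  <property><name>hadoop.security.authorization</name><value>true</value></property>
</configuration>"""

if __name__ == "__main__":
    print(secure_mode_enabled(read_hadoop_conf(SAMPLE)))
```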
We recommend that you follow the official Hadoop releases and apply security updates promptly.
Default ports in Hadoop and related components
| Port | Property or component | Notes |
|------|-----------------------|-------|
| 9000 | fs.defaultFS | For example, hdfs://172.25.40.171:9000 |
| 9001 | dfs.namenode.rpc-address | DataNodes connect to this port |
| 50070 | dfs.namenode.http-address | |
| 50470 | dfs.namenode.https-address | |
| 50100 | dfs.namenode.backup.address | |
| 50105 | dfs.namenode.backup.http-address | |
| 50090 | dfs.namenode.secondary.http-address | For example, 172.25.39.166:50090 |
| 50091 | dfs.namenode.secondary.https-address | For example, 172.25.39.166:50091 |
| 50020 | dfs.datanode.ipc.address | |
| 50075 | dfs.datanode.http.address | |
| 50475 | dfs.datanode.https.address | |
| 50010 | dfs.datanode.address | Data transfer port of the DataNode |
| 8480 | dfs.journalnode.http-address | |
| 8481 | dfs.journalnode.https-address | |
| 8032 | yarn.resourcemanager.address | |
| 8088 | yarn.resourcemanager.webapp.address | HTTP port of YARN |
| 8090 | yarn.resourcemanager.webapp.https.address | |
| 8030 | yarn.resourcemanager.scheduler.address | |
| 8031 | yarn.resourcemanager.resource-tracker.address | |
| 8033 | yarn.resourcemanager.admin.address | |
| 8042 | yarn.nodemanager.webapp.address | |
| 8040 | yarn.nodemanager.localizer.address | |
| 8188 | yarn.timeline-service.webapp.address | |
| 10020 | mapreduce.jobhistory.address | |
| 19888 | mapreduce.jobhistory.webapp.address | |
| 2888 | ZooKeeper | Used by the leader to listen for follower connections |
| 3888 | ZooKeeper | Used for leader election |
| 2181 | ZooKeeper | Listens for client connections |
| 60010 | hbase.master.info.port | HTTP port of HMaster |
| 60000 | hbase.master.port | RPC port of HMaster |
| 60030 | hbase.regionserver.info.port | HTTP port of HRegionServer |
| 60020 | hbase.regionserver.port | RPC port of HRegionServer |
| 8080 | hbase.rest.port | Port of the HBase REST server |
| 10000 | hive.server2.thrift.port | |
| 9083 | hive.metastore.uris | |
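After changing firewall rules, it helps to confirm from a host outside the trusted network that the ports in the table above are no longer reachable. This minimal sketch attempts a plain TCP connection to each port; `DEFAULT_PORTS` is a hypothetical subset of the table, and a successful connection only shows the port is reachable, not which service answers.

```python
import socket
from typing import Dict, Iterable

# A subset of the default ports from the table above (Hadoop 2.x defaults).
DEFAULT_PORTS = {
    9000: "fs.defaultFS",
    50070: "dfs.namenode.http-address",
    50075: "dfs.datanode.http.address",
    8088: "yarn.resourcemanager.webapp.address",
    10000: "hive.server2.thrift.port",
    2181: "ZooKeeper client port",
}


def open_ports(host: str, ports: Iterable[int], timeout: float = 1.0) -> Dict[int, bool]:
    """Try a TCP connect to each port; True means the port accepted a connection."""
    results = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Refused, filtered (timed out), or unreachable
            results[port] = False
    return results


if __name__ == "__main__":
    for port, reachable in open_ports("127.0.0.1", DEFAULT_PORTS).items():
        print(f"{port:>5} {DEFAULT_PORTS[port]:<40} {'OPEN' if reachable else 'closed'}")
```

Replace `"127.0.0.1"` with the public address of the host you are auditing; any port reported as `OPEN` from outside the intranet is still exposed.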