After reading this HashMap
Date: Oct 25, 2022
Related Tags: the resolution and principle analysis of HashMap conflicts; java.util.HashMap
Abstract: HashMap is an implementation of the Map interface. HashMap allows null keys and values. HashMap is considered an enhanced version of Hashtable. HashMap is a non-thread-safe container; if you need a thread-safe Map, consider ConcurrentHashMap. HashMap is unordered, because it cannot guarantee the order of the key-value pairs stored internally.
The underlying data structure of HashMap is an array plus linked lists; the array slots are called buckets. The time required to traverse a HashMap is proportional to the number of buckets plus the number of key-value mappings, so don't set the initial capacity too high (or the load factor too low) if iteration performance is important.
A HashMap instance has two very important parameters: the initial capacity and the load factor. The initial capacity is the number of buckets in the hash table, and the load factor is a measure of how full the hash table is allowed to get. When the number of entries exceeds the product of the load factor and the current capacity, the hash table is rehashed and the internal data structure is rebuilt.
Note that HashMap is not thread-safe. If multiple threads access a HashMap concurrently, and at least one of them modifies the map structurally, it must be synchronized externally. You can use Collections.synchronizedMap(new HashMap<>()) to create a thread-safe Map.
HashMap's iterators are fail-fast: if the structure of the map is modified after the iterator is created by anything other than the iterator's own remove method, a ConcurrentModificationException is thrown. So prefer the iterator's own remove method.
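To make the fail-fast behavior concrete, here is a minimal runnable sketch (the class name FailFastDemo and the sample keys are ours, purely for illustration):

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class FailFastDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);

        // Safe: remove through the iterator's own remove method
        Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getKey().equals("b")) {
                it.remove();
            }
        }
        System.out.println(map.containsKey("b")); // the entry is gone

        // Unsafe: a structural modification outside the iterator triggers fail-fast
        try {
            for (String key : map.keySet()) {
                map.remove(key); // modifies the map behind the iterator's back
            }
        } catch (ConcurrentModificationException e) {
            System.out.println("ConcurrentModificationException");
        }
    }
}
```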
Let's talk about the details of HashMap, starting from interview questions.
The difference between HashMap and HashTable
We introduced HashMap above; now let's introduce HashTable.
Similarities
Both HashMap and HashTable are implemented on top of hash tables, and each stores key-value pairs. Both implement the Map, Cloneable, and Serializable interfaces.
Differences
The parent classes are different: HashMap inherits the AbstractMap class, while HashTable inherits the Dictionary class
Null handling differs: HashMap allows null keys and null values, while HashTable allows neither. HashMap treats the null key like any normal key, and since duplicate keys are not allowed, there can be at most one null key.
Thread safety: HashMap is not thread-safe. If multiple external operations modify the data structure of a HashMap at the same time, such as adds or deletes, synchronization is required (modifying only a key or value does not change the data structure). You can optionally construct a thread-safe Map with Collections.synchronizedMap or use ConcurrentHashMap. HashTable itself is a thread-safe container.
Performance: although both HashMap and HashTable resolve collisions with singly linked lists, HashMap achieves (amortized) constant-time performance for put and get operations, while every put and get on HashTable is guarded by a synchronized lock, so it is much less efficient.
The initial capacities are different: the initial length of HashTable is 11, and on each expansion the capacity becomes 2n + 1 (where n is the previous length).
The initial length of HashMap is 16, and each expansion doubles the previous length. When an initial capacity is given at creation time, HashTable uses exactly the size you give, while HashMap rounds it up to a power of 2.
Difference between HashMap and HashSet
The difference between HashMap and HashSet is also often asked.
HashSet extends the AbstractSet class and implements the Set, Cloneable, and java.io.Serializable interfaces. HashSet does not allow duplicate values in the collection. The bottom layer of HashSet is actually a HashMap, and every operation on a HashSet is really an operation on that HashMap, which is also why HashSet does not guarantee the order of its elements.
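From the caller's point of view, the delegation to the underlying HashMap shows up as add returning false for duplicates (a quick sketch; the class name HashSetDemo is ours):

```java
import java.util.HashSet;
import java.util.Set;

public class HashSetDemo {
    public static void main(String[] args) {
        Set<String> set = new HashSet<>();
        // Internally, add(e) puts e as a key into a backing HashMap
        System.out.println(set.add("x")); // newly added
        System.out.println(set.add("x")); // duplicate rejected: the key already exists
        System.out.println(set.size());
    }
}
```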
HashMap underlying structure
To understand a class, you must first understand its structure. First, look at the structure of HashMap:
The three main classes (interfaces) are HashMap, AbstractMap and Map. We have briefly introduced HashMap in the overview above. Let's introduce AbstractMap.
AbstractMap class
This abstract class is the skeletal implementation of the Map interface, intended to minimize the work of implementing classes. To implement an unmodifiable map, the programmer only needs to extend this class and provide an implementation of the entrySet method, which returns a set view of the map's mappings. Typically, the returned set is implemented on top of AbstractSet; it should not support the add or remove methods, and its iterator should not support the remove method.
To implement a modifiable map, the programmer must additionally override the put method of this class (otherwise an UnsupportedOperationException will be thrown), and the iterator returned by entrySet.iterator() must implement the remove() method.
Map interface
The Map interface defines the standard for key-value pairs: an object that maps keys to values. A map cannot contain duplicate keys, and each key maps to at most one value. This interface takes the place of the Dictionary class, which was an abstract class rather than an interface.
The Map interface provides three collection views, which allow a map's contents to be viewed as a set of keys, a collection of values, or a set of key-value mappings. The order of a map is defined as the order in which iterators over its collection views return their elements. Some map implementations, like TreeMap, make ordering guarantees; others, like HashMap, do not.
Important inner classes and interfaces
Node class
Node is used to store the entries of a HashMap. It implements the Map.Entry interface. Let's first take a look at the definition of the Entry interface nested inside Map.
Map.Entry
A Node stores four attributes: the hash value, the key, the value, and a reference to the next Node.
Because entries are linked to one another, Node objects form entry chains. When a new Node is constructed, these four attribute values are passed in.
Since Node implements the Map.Entry interface, it must implement that interface's methods, so a Node also provides getKey, getValue, setValue, equals, and hashCode.
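A simplified, runnable sketch of such a Node follows. This is a hand-written approximation of the field layout described above, not the JDK source verbatim (the wrapper class NodeSketch is ours):

```java
import java.util.Map;
import java.util.Objects;

public class NodeSketch {
    // Approximation of HashMap's inner Node: hash, key, value, next
    static class Node<K, V> implements Map.Entry<K, V> {
        final int hash;   // cached hash of the key
        final K key;
        V value;
        Node<K, V> next;  // next node in the bucket's chain

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }

        public K getKey() { return key; }
        public V getValue() { return value; }
        public V setValue(V newValue) { V old = value; value = newValue; return old; }
        public int hashCode() { return Objects.hashCode(key) ^ Objects.hashCode(value); }
        public boolean equals(Object o) {
            return o == this
                || (o instanceof Map.Entry
                    && Objects.equals(key, ((Map.Entry<?, ?>) o).getKey())
                    && Objects.equals(value, ((Map.Entry<?, ?>) o).getValue()));
        }
        public String toString() { return key + "=" + value; }
    }

    public static void main(String[] args) {
        Node<String, Integer> head = new Node<>("a".hashCode(), "a", 1, null);
        head.next = new Node<>("b".hashCode(), "b", 2, null); // chained entry
        System.out.println(head + " -> " + head.next);
    }
}
```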
KeySet inner class
The KeySet class extends the AbstractSet abstract class. A KeySet instance is created by the keySet() method of HashMap, and it is designed to operate on the keys of the map. See a code example:
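The original example (lost as an image) put three keys into a map and looped over them with a lambda; a faithful reconstruction (class name KeySetDemo and the placeholder values are ours):

```java
import java.util.HashMap;
import java.util.Map;

public class KeySetDemo {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("1", "a");
        map.put("2", "b");
        map.put("3", "c");
        // keySet() returns a Set view backed by the map itself
        map.keySet().forEach(key -> System.out.println(key));
    }
}
```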
Putting the three keys "1, 2, 3" into the HashMap and then looping over the keys with a lambda expression, you can see that map.keySet() actually returns a Set. keySet() is declared in the Map interface but implemented by HashMap; a look at the source code makes this clear.
So the KeySet class operates on the Key in the Map:
Values inner class
The creation of the Values class is very similar to that of KeySet, but where KeySet is designed to operate on the keys in the map, Values is designed to operate on the value half of the key-value pairs. Take a look at a code example:
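A reconstruction of the lost example, looping over the values with a lambda (class name ValuesDemo and the sample entries are ours):

```java
import java.util.HashMap;
import java.util.Map;

public class ValuesDemo {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("1", "a");
        map.put("2", "b");
        map.put("3", "c");
        // values() returns a Collection view backed by the map itself
        map.values().forEach(value -> System.out.println(value));
    }
}
```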
Loop through the values in the Map and see what the values() method ends up creating:
The values field is actually declared in AbstractMap, and HashMap's Values class extends the AbstractCollection abstract class. Let's see what methods it provides for operating on values.
In fact, it is similar to the operation of the key
EntrySet inner class
As mentioned above, HashMap can operate on keys and values separately. There is also an inner class that operates on key-value pairs: EntrySet. Let's take a look at how an EntrySet is created:
Click into entrySet() and you will find that this method is also declared in the Map interface and overridden by HashMap.
If es is null, a new EntrySet instance is created. EntrySet mainly provides methods for operating on the map's key-value mappings, as follows.
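A small usage example of entrySet(), including the write-through behavior of setValue (the class name EntrySetDemo and the sample data are ours):

```java
import java.util.HashMap;
import java.util.Map;

public class EntrySetDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);

        // entrySet() is a Set view of the key-value mappings
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            System.out.println(e.getKey() + "=" + e.getValue());
        }

        // setValue on an entry writes through to the backing map
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            e.setValue(e.getValue() * 10);
        }
        System.out.println(map.get("a"));
    }
}
```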
The underlying structure of HashMap 1.7
In JDK 1.7, HashMap is implemented as buckets plus linked lists: conflicts are handled by chaining, and entries whose hashes land in the same bucket are stored in a linked list hanging off an array slot. However, when a bucket holds many elements, i.e. many elements share the same bucket, searching sequentially by key becomes inefficient. Its data structure is as follows
HashMap rough structure
The underlying data structure of HashMap (in JDK 1.7) is an Entry array. Entry is the basic unit of HashMap, and each Entry contains one key-value pair.
Each Entry contains the hash, key, and value attributes, and Entry is an inner class of HashMap.
So, the overall structure of HashMap is as follows
The underlying structure of HashMap 1.8
Compared with JDK 1.7, JDK 1.8 made some changes to the underlying structure: when the number of elements in a bucket is greater than 8, the linked list is converted into a red-black tree, the purpose being to optimize query efficiency. JDK 1.8 also rewrote the resize() method.
Important properties of HashMap
"Initial Capacity"
The default initial capacity of a HashMap is controlled by the DEFAULT_INITIAL_CAPACITY field.
The default initial capacity of HashMap is 1 << 4 = 16; << is the left-shift operator, so 1 << 4 is equivalent to 2^4.
"Maximum capacity"
The maximum capacity of HashMap is MAXIMUM_CAPACITY = 1 << 30.
Is there a question here? An int occupies four bytes, so you might expect the maximum capacity to be 1 shifted left by 31 bits. Why is HashMap's maximum capacity only shifted left by 30? Because in two's-complement arithmetic the highest (leftmost) bit represents the sign: 0 means positive and 1 means negative. The capacity cannot be negative, so the largest usable power of two is 2^30.
"Default Load Factor"
The default load factor of HashMap is DEFAULT_LOAD_FACTOR = 0.75f.
The f suffix marks a float literal. The load factor is tied to the expansion mechanism, which is briefly mentioned here and discussed in detail later. The rule is: when the number of entries stored in the HashMap exceeds HashMap capacity * load factor, the capacity of the HashMap is doubled.
The first expansion of a default HashMap happens when the size reaches DEFAULT_INITIAL_CAPACITY * DEFAULT_LOAD_FACTOR = 16 * 0.75 = 12.
"Tree Threshold"
The treeify threshold of HashMap is TREEIFY_THRESHOLD = 8.
When adding elements, once the number of elements stored in one bucket exceeds 8, the linked list is automatically converted into a red-black tree (a JDK 1.8 feature).
"Linked List Threshold"
The untreeify (linked-list) threshold of HashMap is UNTREEIFY_THRESHOLD = 6.
When elements are removed, if the number of elements stored in a bucket drops to 6 or fewer, the red-black tree is automatically converted back into a linked list.
"Minimum Treeify Capacity"
This value, MIN_TREEIFY_CAPACITY = 64, means that while the capacity of the bucket array is smaller than it, expansion takes priority over treeification.
"node array"
The node array in HashMap is the table field (a Node array in JDK 1.8, an Entry array in JDK 1.7); it is the array in HashMap's "array + linked list" data structure.
The Node array is initialized the first time it is used and resized when necessary.
"Number of keyvalue pairs"
In HashMap, use size to represent the number of keyvalue pairs in the HashMap.
"Number of Modifications"
In HashMap, modCount records the number of structural modifications; it is mainly used by the fail-fast mechanism when a HashMap is modified concurrently.
"Expansion Threshold"
In HashMap, use threshold to represent the threshold for expansion, that is, the value of initial capacity * load factor.
threshold involves computing the expansion threshold, which is handled by the tableSizeFor method. Let's look at its source code and then explain it.
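tableSizeFor in the JDK 8 source looks like the following; here it is wrapped in a small runnable class (the wrapper class is ours) so the rounding can be observed:

```java
public class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // From the JDK 8 HashMap source: smears the highest set bit of (cap - 1)
    // into all lower bits, then adds 1, rounding cap up to a power of two.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(13)); // rounds up to 16
        System.out.println(tableSizeFor(16)); // already a power of two
        System.out.println(tableSizeFor(17)); // rounds up to 32
    }
}
```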
The code involves the operator |=, which means bitwise-OR assignment. Just as "a += b means a = a + b", a |= b means a = a | b: both operands are viewed in binary and OR-ed bit by bit.
Working through a relatively large example, after the series of OR operations (and the final + 1), an input just above 2^29 is rounded up to 2^30.
In other words, tableSizeFor returns the smallest power of two greater than or equal to the requested capacity; separately, each resize doubles the array length.
"Load Factor"
loadFactor is the load factor, which describes how densely the HashMap is filled.
HashMap constructor
In the HashMap source code, there are four kinds of constructors, which will be introduced separately.
Constructor with initial capacity initialCapacity and load factor loadFactor
The initial capacity cannot be negative, so if initialCapacity < 0 is passed, an IllegalArgumentException is thrown directly. If the given initial capacity is greater than the maximum capacity, it is clamped to the maximum. The load factor must likewise be positive. The constructor then derives the expansion threshold from the capacity; this expansion mechanism is also very important, and we will discuss it later.
Constructor with only initialCapacity
This constructor eventually calls the one above, with the load factor set to HashMap's default of 0.75f.
parameterless constructor
The default load factor is also 0.75f
Constructor with map
The constructor that takes a Map copies the given map's entries into the new HashMap in one batch.
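The four constructors can be exercised side by side (a quick sketch; the class name CtorDemo and sample data are ours):

```java
import java.util.HashMap;
import java.util.Map;

public class CtorDemo {
    public static void main(String[] args) {
        Map<String, Integer> m1 = new HashMap<>();          // default: capacity 16, load factor 0.75
        Map<String, Integer> m2 = new HashMap<>(32);        // initial capacity, default load factor
        Map<String, Integer> m3 = new HashMap<>(32, 0.5f);  // capacity and load factor both given
        m3.put("k", 1);
        Map<String, Integer> m4 = new HashMap<>(m3);        // batch-copies entries from another map

        System.out.println(m4.get("k"));
        System.out.println(m1.isEmpty() && m2.isEmpty());
    }
}
```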
Talk about the whole process of HashMap put
I remember going to Beijing for an interview a year after graduation; when a company asked me about the HashMap put process, I hesitated and couldn't answer. The analysis below is based on JDK 1.8 and later. Let's post the entire code first and then analyze it line by line.
First look at the putVal method. This method is final: if you define your own HashMap subclass, you are not allowed to override it. The method takes five parameters.
hash: the hash of the key, computed by the hash function before the put
key: the key passed in
value: the value passed in
onlyIfAbsent: whether to leave an existing value untouched, i.e. whether to replace the existing value
evict: a flag that is false only while the HashMap is being constructed
When putVal is called, the hash function is first used to compute the position where the entry should be inserted.
The source code of the hash function is as follows
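The method itself is one line; the following reproduces its logic in a runnable wrapper (the wrapper class HashDemo is ours):

```java
public class HashDemo {
    // JDK 8 HashMap.hash(): XORs the high 16 bits of hashCode into the low 16,
    // so that high-order bits still influence the bucket index (n - 1) & hash.
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        System.out.println(hash(null)); // null key always hashes to 0
        System.out.println(hash("a"));  // "a".hashCode() == 97; its high 16 bits are 0
    }
}
```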
First, let's understand the calculation rules of the hash function
Hash function
The hash function computes a value from the key you pass in: it first takes the key's hashCode, then performs an unsigned right shift on that hashCode by 16 bits, and finally XORs (^) the two together.
❝
>>> is the unsigned right shift, also called the logical right shift: whether the number is positive or negative, the vacated high-order bits are filled with 0 after the shift.
❞
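The difference between the two right shifts can be seen directly (a small sketch; the class name ShiftDemo is ours):

```java
public class ShiftDemo {
    public static void main(String[] args) {
        System.out.println(-8 >> 1);  // arithmetic shift: the sign bit is preserved
        System.out.println(-8 >>> 1); // logical shift: high-order bits filled with 0
        System.out.println(8 >>> 1);  // for non-negative numbers, >> and >>> agree
    }
}
```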
After getting the hash value, the put process will be performed.
First, putVal checks whether the Node array (table) in the HashMap is null. If the HashMap was just created and this is the first insertion, the array is first resized, i.e. allocated; the resize() expansion mechanism is analyzed later. After that, the storage position is computed with "(n - 1) & hash".
That position is used as the array index at which to store the element. If the slot is not empty, the node found there is compared with the incoming hash and key. If the hash matches but the key does not, the code checks whether the node is a tree node: if so, the entry is inserted into the red-black tree; if not, tail insertion is performed, appending the entry at the end of the chain.
Whether a bucket holds a linked list or a red-black tree is decided by the number of elements in it. Finally, if the number of key-value pairs exceeds the threshold, the table is expanded.
Expansion mechanism
In Java, arrays have a fixed length, which means an array can only store a fixed amount of data. But during development we often cannot know in advance how big an array should be. Fortunately, HashMap is an automatically expanding data structure, and in such a variable-length structure the expansion mechanism is very important.
In HashMap, the threshold is the product of the bucket-array length and the load factor. When the number of key-value pairs in the HashMap exceeds the threshold, the map expands. The expansion mechanism is implemented by the resize() method; let's take a look at it.
The source code of the expansion mechanism is relatively long, we will split it patiently
We split it with if...else if...else logic. The above code mainly does these things
First, determine the length of the old array, oldTab.length. If that length is already at or above the maximum (2^30), the threshold is simply maxed out and the table is kept as it is; otherwise the capacity is doubled with the left-shift operator <<.
If the old capacity is not greater than 0, check whether the expansion threshold oldThr is greater than 0, i.e. whether a threshold was specified externally; if so, use it as the new capacity. Note that oldThr = threshold here, so what is actually compared is the threshold: when a capacity is specified, the HashMap(initialCapacity, loadFactor) constructor is invoked, which sets this.threshold = tableSizeFor(initialCapacity); without an externally specified initialCapacity, the initial capacity defaults to 16.
Otherwise, the default initial capacity and default expansion threshold are used directly; this else branch is taken when the table is being initialized for the first time by the no-argument constructor.
Then it checks whether newThr is 0. When I first studied this, I saw newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY) and kept wondering how a product of constants could be 0. In fact the problem is not that line, but the doubling logic in the branches above, which can overflow the int range.
An example of such overflow: with oldCap around 2^28 and a large load factor, the product (float)newCap * loadFactor in float ft = (float)newCap * loadFactor can exceed 2^31; that is why the code recomputes newThr there with explicit bounds checks instead of trusting the doubled value.
"After the expansion, the nodes need to be placed into the newly expanded array, which involves three steps"
Loop over every bucket: if Node[i] is null, skip it; if it is not null, traverse the bucket and remap its key-value pairs into the new bucket array.
If the bucket is not empty, check whether it holds a tree structure. If so, the tree is split; that logic lives in the split method.
If it is not a tree, traverse the linked list and partition its nodes into groups, preserving their original relative order.
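The partition rule can be sketched minimally: each node either stays at its old index j or moves to j + oldCap, depending on the bit (hash & oldCap). The Node class and the hard-coded bucket index below are ours, purely for illustration:

```java
public class SplitDemo {
    static class Node {
        final int hash;
        Node next;
        Node(int hash) { this.hash = hash; }
    }

    public static void main(String[] args) {
        int oldCap = 16;
        int j = 5; // these three nodes shared bucket 5 in the old 16-slot table
        Node a = new Node(5), b = new Node(21), c = new Node(37);
        a.next = b;
        b.next = c;

        // (hash & oldCap) == 0  -> node stays at index j in the new table
        // (hash & oldCap) != 0  -> node moves to index j + oldCap
        for (Node e = a; e != null; e = e.next) {
            int idx = (e.hash & oldCap) == 0 ? j : j + oldCap;
            System.out.println(e.hash + " -> " + idx);
        }
    }
}
```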
Talk about the whole process of the get method
We talked about the whole process of the put method in HashMap above. Let's take a look at the process of the get method.
Let's introduce it briefly. First it checks whether the table is non-empty, then computes the bucket position for the given key from its hash. It looks at the first element of that bucket: if it matches the key, the record is returned directly. If it does not match, it checks whether the next element is null: if null, it returns null directly; if not null, it checks whether the first node is a TreeNode instance, and if so fetches the element with TreeNode.getTreeNode; otherwise it loops through the chain until the next element is null.
A key step in the getNode method is "(n - 1) & hash": this expression determines which bucket to search. So why (n - 1) & hash?
n is the number of buckets in the HashMap, so (n - 1) & hash is (bucket capacity - 1) & hash.
(n - 1) & hash is evaluated for each lookup, and tab[(n - 1) & hash] is the exact slot that is examined.
Traversal of HashMap
Traversal of HashMap is also a very frequently used operation
The base class of HashMap traversal is HashIterator, a hash iterator implemented as an abstract inner class of HashMap. Its structure is relatively simple: it has only three methods, hasNext, remove, and nextNode, and the nextNode method is what the three concrete iterators are built on.
These three iterators are
KeyIterator , iterating over keys
ValueIterator, iterates over value
EntryIterator, which traverses the Entry chain
Although there are many iterators, in fact, their traversal order is the same, and the structure is very simple. They all use the nextNode method in HashIterator to traverse
Traversal in HashIterator
next and current refer to the next Node and the current Node respectively. When a HashIterator is constructed, it advances to the first non-empty bucket. Below we use a diagram to show the traversal order.
You will find that the nextNode() method traverses in the same way as the HashIterator constructor; only the loop conditions differ. During construction, the condition is scanning buckets until a non-null one is found; in nextNode, the conditions become that the next node is not null, or, failing that, that a further non-null bucket can still be found.
Remove method in HashMap
The removal method in HashMap is also relatively simple, the source code is as follows
There are several remove overloads, and they all eventually call the removeNode method, just with different argument values. Let's take remove(Object key) as the demonstration.
First, the corresponding bucket is found through the hash; then the linked list is traversed to find the node with an equal key, and that node is unlinked.
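From the caller's side, the overloads behave as follows (a quick sketch; the class name RemoveDemo and sample data are ours):

```java
import java.util.HashMap;
import java.util.Map;

public class RemoveDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);

        System.out.println(map.remove("a"));       // returns the removed value
        System.out.println(map.remove("missing")); // key absent: returns null
        System.out.println(map.remove("b", 3));    // value does not match: false
        System.out.println(map.remove("b", 2));    // key and value both match: true
    }
}
```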
HashMap data structure
In JDK 1.7, HashMap is implemented as buckets plus linked lists: conflicts are handled by chaining, and entries whose hashes land in the same bucket are stored in a linked list hanging off an array slot. However, when a bucket holds many elements, searching sequentially by key becomes inefficient.
Therefore, compared with JDK 1.7, JDK 1.8 has made some changes in the underlying structure. When the number of elements in each bucket is greater than 8, it will be converted into a redblack tree to optimize query efficiency.
The put process of HashMap
The general process is as follows. First, the hash method computes the object's hash code, which determines the storage bucket. If the bucket contains no Node, the entry is placed directly. If the bucket already contains nodes, the chain is examined: before JDK 1.8 new nodes were inserted at the head of the list (head insertion); since JDK 1.8, at the tail (tail insertion). If the length of the chain grows beyond 8, a treeify operation converts the linked list into a red-black tree, and the entries are stored on the tree.
Why HashMap is not thread safe
HashMap is not a thread-safe container; the danger shows up when multiple threads put into a HashMap concurrently. Suppose there are two threads A and B. A wants to insert a key-value pair; just after it has determined the bucket position, its time slice runs out. B then runs, performs the same steps as A, and successfully inserts its key-value pair. If A and B chose the same bucket, A will overwrite B's record when it resumes, causing data loss.
Another point: during expansion in JDK 1.7, the resize method could link a bucket's list into a cycle, resulting in an infinite loop and a CPU spike.
How HashMap handles hash collisions
The bottom layer of HashMap is implemented with buckets plus linked lists. The bucket determines the insertion position of an element and is selected by the hash method; elements that land in the same bucket are chained into a linked list. This way of dealing with hash collisions is called separate chaining (the chained-address method).
Other ways to deal with hash collisions include "open address method, rehash method, and establishing a public overflow area".
How HashMap gets elements
Briefly: first it checks whether the table is non-empty, then computes the bucket position for the given key from its hash. It looks at the first element of that bucket: if it matches the key, the record is returned directly. If it does not match, it checks whether the next element is null: if null, it returns null directly; if not null, it checks whether the first node is a TreeNode instance, and if so fetches the element with TreeNode.getTreeNode; otherwise it loops through the chain until the next element is null.
What is the difference between HashMap and HashTable
see above
Difference between HashMap and HashSet
see above
How HashMap scales
There are two very important variables in HashMap: loadFactor and threshold. loadFactor is the load factor, and threshold is the size at which the next expansion happens, threshold = loadFactor * array length. When the number of entries exceeds the threshold, the array length is doubled, the map is resized, and the original entries are redistributed into the new bucket array.
Why is the length of HashMap a power of 2?
I thought about this question for a few days. When discussing the question of the day with friends in a group, I asked them why hash % length == (n - 1) & hash. They said the equality only holds when length is a power of 2, and I replied: can't length be something other than a power of 2? I actually had the causality backwards: it is because the length of HashMap is a power of 2 that the bitwise AND can stand in for the modulo when computing the bucket index. If length were not a power of 2, try an example:
For example, when the length is 9: 3 & (9 - 1) = 0 and 2 & (9 - 1) = 0, so both land on slot 0 and collide;
This increases the chance of HashMap collisions.
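Both halves of the argument can be checked directly (a quick sketch; the class name PowerOfTwoDemo and the sample hashes are ours):

```java
public class PowerOfTwoDemo {
    public static void main(String[] args) {
        // Power-of-two length: (n - 1) & h equals h % n for non-negative h,
        // because n - 1 is an all-ones bitmask (16 - 1 = 0b1111).
        int n = 16;
        for (int h : new int[]{2, 3, 17, 35}) {
            System.out.println(((n - 1) & h) + " " + (h % n));
        }
        // Non-power-of-two length: 9 - 1 = 0b1000 keeps only one bit,
        // so many different hashes collapse into slot 0.
        System.out.println(3 & (9 - 1));
        System.out.println(2 & (9 - 1));
    }
}
```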
What are the implementations of HashMap thread safety
Because HashMap is not a thread-safe container, ConcurrentHashMap is recommended in concurrent scenarios. Alternatively, you can use a thread-safe wrapper from the Collections class, such as Collections.synchronizedMap.
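Both options look like this in use (a quick sketch; the class name SafeMaps and the sample entry are ours):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SafeMaps {
    public static void main(String[] args) {
        // Wrapper: every method of the returned map is synchronized on a single lock
        Map<String, Integer> syncMap = Collections.synchronizedMap(new HashMap<>());
        // Purpose-built concurrent map with finer-grained internal locking
        Map<String, Integer> concMap = new ConcurrentHashMap<>();

        syncMap.put("a", 1);
        concMap.put("a", 1);
        System.out.println(syncMap.get("a") + " " + concMap.get("a"));
    }
}
```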
You can also use HashTable, which is likewise a thread-safe container based on key-value storage. HashMap and HashTable are often compared precisely because HashTable's data structure is the same as HashMap's.
Related Tags：1. The solution and principle analysis of hashmap conflict
2.Java. util. HashMap, java. util. hashmap
Abstract: HashMap is an implementation of the Map interface. HashMap allows empty keyvalue pairs. HashMap is considered an enhanced version of Hashtable. HashMap is a nonthreadsafe container. If you want to construct a threadsafe Map, consider using ConcurrentHashMap. HashMap is unordered because HashMap cannot guarantee the ordering of keyvalue pairs stored internally.
HashMap is an implementation of the Map interface. HashMap allows empty keyvalue pairs. HashMap is considered an enhanced version of Hashtable. HashMap is a nonthreadsafe container. If you want to construct a threadsafe Map, consider using ConcurrentHashMap. HashMap is unordered because HashMap cannot guarantee the ordering of keyvalue pairs stored internally.
The underlying data structure of HashMap is a collection of arrays + linked lists. Arrays are also called buckets in HashMap. The time consumption required to traverse the HashMap is the number of HashMap instance buckets + the number of (key  value mapping). So don't set the initial capacity too high or the load factor too low if traversing the elements is important.
HashMap instance has two very important factors, initial capacity and load factor. The initial capacity refers to the number of hash table buckets, and the load factor is a standard to measure the filling degree of the hash table. entry, so that the load factor and current capacity are exceeded, the hash table will perform a rehash operation, and the internal data structure will be rebuilt.
Note that HashMap is not threadsafe. If multiple threads affect HashMap at the same time, and at least one thread modifies the structure of HashMap, then the HashMap must be synchronized. You can use Collections.synchronizedMap(new HashMap) to create a threadsafe Map.
HashMap will cause the external remove method in addition to the iterator itself to cause a failfast mechanism, so try to use the iterator's own remove method. If the structure of the map is modified during the iterator creation process, a ConcurrentModificationException will be thrown.
Let's talk about the details of HashMap. Let's start with the interview questions to analyze HashMap .
The difference between HashMap and HashTable
We introduced HashMap above, now let's introduce HashTable.
Same point
Both HashMap and HashTable are implemented based on hash tables, each of which is a keyvalue keyvalue pair. Both HashMap and HashTable implement Map, Cloneable, and Serializable interfaces.
difference
The parent classes are different: HashMap inherits the AbstractMap class, while HashTable inherits the Dictionary class
Null values are different: HashMap allows empty key and value values, HashTable does not allow empty key and value values. HashMap treats Null keys as normal keys. Duplicate null keys are not allowed.
Thread safety: HashMap is not threadsafe. If multiple external operations modify the data structure of HashMap at the same time, such as add or delete, synchronization operations must be performed. Only the modification of key or value is not an operation to change the data structure. Optionally construct threadsafe Maps such as Collections.synchronizedMap or ConcurrentHashMap. And HashTable itself is a threadsafe container.
Performance: Although both HashMap and HashTable are based on singly linked lists, HashMap can achieve constant time performance for put or get???? operations; while HashTable's put and get operations are all added with synchronized locks, so efficiency very poor.
The initial capacity is different: the initial length of HashTable is 11, and then each expansion capacity becomes the previous 2n+1 (n is the last length)
The initial length of HashMap is 16, and then each expansion becomes twice the original length. When creating, if an initial value of capacity is given, then HashTable will directly use the size you give, while HashMap will expand it to a power of 2 size.
Difference between HashMap and HashSet
The difference between HashMap and HashSet is also often asked
HashSet inherits the AbstractSet interface and implements the Set, Cloneable, and java.io.Serializable interfaces. HashSet does not allow duplicate values in the collection. The bottom layer of HashSet is actually HashMap, and all operations on HashSet are actually operations on HashMap. So HashSet also does not guarantee the order of the collection.
HashMap underlying structure
To understand a class, you first need to understand its structure, so let's start with the structure of HashMap:
The three main classes (interfaces) involved are HashMap, AbstractMap, and Map. HashMap was briefly introduced in the overview above, so let's look at AbstractMap next.
AbstractMap class
This abstract class is the backbone implementation of the Map interface and exists to minimize the work needed to implement the interface. To implement an unmodifiable map, the programmer only needs to extend this class and provide an implementation of the entrySet method, which returns a set view of the map's mappings. Typically, the returned set is in turn implemented on top of AbstractSet. That set should not support the add or remove methods, and its iterator should not support remove either.
To implement a modifiable map, the programmer must additionally override the put method of this class (which otherwise throws an UnsupportedOperationException), and the iterator returned by entrySet().iterator() must implement its remove() method.
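The unmodifiable case can be shown in a few lines. The class below is a hypothetical demo (the name OneEntryMap and its single entry are invented for illustration): it extends AbstractMap and implements only entrySet(), exactly as the javadoc describes, so get and size work via the inherited entry-set scan while put stays unsupported.

```java
import java.util.AbstractMap;
import java.util.Map;
import java.util.Set;

// Minimal unmodifiable map: extend AbstractMap, implement only entrySet().
public class OneEntryMap extends AbstractMap<String, Integer> {
    @Override
    public Set<Map.Entry<String, Integer>> entrySet() {
        // Set.of(...) is immutable, and its iterator does not support remove().
        return Set.of(Map.entry("answer", 42));
    }

    public static void main(String[] args) {
        OneEntryMap m = new OneEntryMap();
        System.out.println(m.get("answer")); // 42, via AbstractMap's entrySet scan
        System.out.println(m.size());        // 1
        try {
            m.put("x", 1);                   // put was not overridden
        } catch (UnsupportedOperationException e) {
            System.out.println("put unsupported");
        }
    }
}
```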
Map interface
The Map interface defines the standard for a key-value pair container: a Map cannot contain duplicate keys, and each key maps to at most one value. This interface replaces the Dictionary class, which was an abstract class rather than an interface.
The Map interface provides three collection views, which allow the contents of a map to be treated as a set of keys, a collection of values, or a set of key-value mappings. The order of a map is defined as the order in which iterators over these views return their elements. Some map implementations, like TreeMap, guarantee an ordering; others, like HashMap, do not.
Important inner classes and interfaces
Node class
Node is the class used to store HashMap's entries; it implements the Map.Entry interface. Let's first look at the definition of the Entry interface nested inside Map.
Map.Entry
A Node stores four attributes: the hash value, the key, the value, and a reference to the next Node.
Because nodes are linked through their next references, each bucket holds a chain of entries. When a new Node is constructed, these four attribute values are passed in.
Since Node implements the Map.Entry interface, it must provide that interface's methods, so the Node class also includes the five methods above (getKey, getValue, setValue, equals, and hashCode).
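A simplified sketch of the Node class, trimmed for illustration (the real class is a static nested class inside java.util.HashMap; the wrapping NodeDemo class and its main are added here just to make the sketch runnable):

```java
import java.util.Map;
import java.util.Objects;

public class NodeDemo {
    // Simplified sketch of java.util.HashMap.Node (JDK 1.8).
    static class Node<K, V> implements Map.Entry<K, V> {
        final int hash;   // cached hash of the key
        final K key;
        V value;
        Node<K, V> next;  // next node in this bucket's chain

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash; this.key = key; this.value = value; this.next = next;
        }

        public final K getKey()   { return key; }
        public final V getValue() { return value; }
        public final V setValue(V newValue) {
            V old = value;
            value = newValue;
            return old;       // Map.Entry.setValue returns the previous value
        }
        public final int hashCode() {
            return Objects.hashCode(key) ^ Objects.hashCode(value);
        }
        public final boolean equals(Object o) {
            if (o == this) return true;
            if (!(o instanceof Map.Entry)) return false;
            Map.Entry<?, ?> e = (Map.Entry<?, ?>) o;
            return Objects.equals(key, e.getKey()) && Objects.equals(value, e.getValue());
        }
        public final String toString() { return key + "=" + value; }
    }

    public static void main(String[] args) {
        Node<String, Integer> n = new Node<>("a".hashCode(), "a", 1, null);
        System.out.println(n);             // a=1
        System.out.println(n.setValue(2)); // 1: setValue returns the old value
        System.out.println(n.getValue());  // 2
    }
}
```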
KeySet inner class
The KeySet class extends the AbstractSet abstract class. A KeySet instance is created by the keySet() method in HashMap, and it is designed to operate on the keys of the HashMap. Consider a code example:
In the example, three keys 1, 2, 3 are put into the HashMap, and then a lambda expression is used to loop over the keys. You can see that map.keySet() actually returns a Set; keySet() is declared in the Map interface but implemented by HashMap, which is easy to confirm from the source code.
So the KeySet class operates on the Key in the Map:
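A minimal example of such key operations; note that keySet() returns a live view rather than a copy, so removals through the view write through to the map:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class KeySetDemo {
    public static void main(String[] args) {
        Map<Integer, String> map = new HashMap<>();
        map.put(1, "one");
        map.put(2, "two");
        map.put(3, "three");

        // keySet() returns a *view*: removing a key from it also
        // removes the corresponding entry from the map.
        Set<Integer> keys = map.keySet();
        keys.forEach(k -> System.out.println(k));

        keys.remove(1);
        System.out.println(map.containsKey(1)); // false: the view writes through
    }
}
```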
Values inner class
The Values class is created much like the KeySet class, but while KeySet operates on the keys in the Map, Values operates on the values of the key-value pairs. Take a look at a code example:
Loop over the values in the Map and see what the values() method ultimately creates:
The values field that caches this view is actually defined in AbstractMap, and HashMap's inner Values class provides the concrete implementation. Let's see what methods are available for operating on values.
In fact, they are similar to the operations on keys.
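The parallel example for values; one visible difference is that values() returns a Collection rather than a Set, since values may repeat:

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

public class ValuesDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);

        // values() also returns a view backed by the map. Unlike keySet(),
        // it is a Collection, not a Set, because values may repeat.
        Collection<Integer> values = map.values();
        System.out.println(values.contains(2)); // true

        map.put("c", 2);                        // a duplicate value is fine
        System.out.println(values.size());      // 3: the view reflects the change
    }
}
```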
EntrySet inner class
As mentioned above, HashMap has inner classes that operate on keys and values separately. There is also an inner class that operates on whole key-value pairs: EntrySet. Let's take a look at how an EntrySet is created:
Click into entrySet() and you will find that this method is also declared in the Map interface and overridden by HashMap.
If es is null, a new EntrySet instance is created. EntrySet mainly provides methods for working with the key-value mappings, as follows.
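A short example of working with entries; like the other two views, entrySet() is live, and Map.Entry.setValue writes through to the map:

```java
import java.util.HashMap;
import java.util.Map;

public class EntrySetDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);

        // entrySet() exposes key-value pairs; setValue writes through to the map.
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            e.setValue(e.getValue() * 10);
        }
        System.out.println(map.get("a")); // 10
        System.out.println(map.get("b")); // 20
    }
}
```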
The underlying structure of HashMap 1.7
In JDK 1.7, HashMap is implemented as an array of buckets plus linked lists: collisions are handled by chaining, and entries whose hashes map to the same bucket are stored in a linked list hanging off that array slot. However, when a bucket holds many elements, that is, when many elements share the same bucket, searching sequentially by key becomes slow. Its data structure looks like this:
HashMap rough structure
The underlying data structure of HashMap in JDK 1.7 is an Entry array. Entry is the basic unit of HashMap, and each Entry contains one key-value pair.
Each Entry carries hash, key, value, and next attributes, and Entry is an inner class of HashMap.
So, the overall structure of HashMap is as follows
The underlying structure of HashMap 1.8
Compared with JDK 1.7, JDK 1.8 changed the underlying structure: when the number of elements in a bucket reaches 8 (and the table is large enough), the bucket's linked list is converted into a red-black tree, which optimizes query efficiency. JDK 1.8 also rewrote the resize() method.
Important properties of HashMap
"Initial Capacity"
The default initial capacity of a HashMap is governed by the DEFAULT_INITIAL_CAPACITY field, which is 1 << 4 = 16; << is the left-shift operator, so this is equivalent to 2^4 = 16.
"Maximum capacity"
The maximum capacity of HashMap is MAXIMUM_CAPACITY = 1 << 30.
A question arises here: an int occupies four bytes, i.e., 32 bits, so shouldn't the maximum come from shifting left by 31? No: the highest (leftmost) bit is the sign bit, with 0 meaning positive and 1 meaning negative, and a capacity cannot be negative, so the largest power of two a positive int can hold, and thus the maximum capacity of HashMap, is 2^30.
"Default Load Factor"
The default load factor of HashMap is DEFAULT_LOAD_FACTOR = 0.75f.
The f suffix marks a float literal. The load factor drives the expansion mechanism, which is only sketched here and discussed in detail later: when the number of entries stored in the HashMap exceeds capacity * load factor, the capacity of the HashMap is doubled.
With the defaults, the first expansion of a HashMap is triggered when the size exceeds DEFAULT_INITIAL_CAPACITY * DEFAULT_LOAD_FACTOR = 16 * 0.75 = 12.
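The arithmetic, spelled out (the field names mirror the JDK constants; the demo class itself is just for illustration):

```java
public class ThresholdDemo {
    public static void main(String[] args) {
        int capacity = 1 << 4;        // DEFAULT_INITIAL_CAPACITY = 16
        float loadFactor = 0.75f;     // DEFAULT_LOAD_FACTOR
        int threshold = (int) (capacity * loadFactor);
        // The first resize happens once the size exceeds this threshold.
        System.out.println(threshold); // 12
    }
}
```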
"Tree Threshold"
The treeify threshold of HashMap is TREEIFY_THRESHOLD = 8.
When adding elements, once the number of elements stored in a single bucket reaches 8, the chain is converted to a red-black tree (a JDK 1.8 feature), provided the table itself is large enough (see the expansion threshold below).
"Linked List Threshold"
The untreeify threshold of HashMap is UNTREEIFY_THRESHOLD = 6.
When elements are removed (or a tree bin is split during a resize), if the number of elements in a bucket drops to 6 or fewer, the bin is converted back into a linked list.
"Expansion Threshold"
This value, MIN_TREEIFY_CAPACITY = 64, indicates that when the bucket array is smaller than 64 slots, a resize is preferred over treeifying a crowded bucket.
"node array"
The node array in HashMap is the Node<K,V>[] table field (the Entry array, in JDK 1.7 terms); it is the array in HashMap's "array + linked list" data structure.
The Node array is initialized lazily on first use and resized when necessary.
"Number of keyvalue pairs"
In HashMap, the size field records the number of key-value pairs in the map.
"Number of Modifications"
In HashMap, the modCount field records the number of structural modifications; it mainly powers the fail-fast mechanism that detects concurrent modification of the HashMap.
"Expansion Threshold"
In HashMap, the threshold field represents the expansion threshold, that is, capacity * load factor.
threshold is computed with the help of the tableSizeFor method. Let's look at its source code and then explain it.
The code involves the |= operator, compound bitwise OR. Just as a += b means a = a + b, a |= b means a = a | b: both operands are viewed in binary and OR-ed bit by bit, as shown below.
Working through a relatively large example, the series of shift-and-OR steps turns an input just above 2^29 into 2^30: the method smears the highest set bit into every lower position and then adds one.
Therefore, the table length chosen for a requested capacity is always the smallest power of two not less than it (and each later expansion doubles it).
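The tableSizeFor logic can be reproduced verbatim from JDK 1.8 (wrapped here in a demo class for runnability):

```java
public class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // JDK 1.8's tableSizeFor: smears the highest set bit of (cap - 1)
    // into every lower position, then adds 1, yielding the smallest
    // power of two >= cap.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(10));            // 16
        System.out.println(tableSizeFor(16));            // 16: cap - 1 avoids overshooting
        System.out.println(tableSizeFor(17));            // 32
        System.out.println(tableSizeFor((1 << 29) + 1)); // 1073741824 = 2^30
    }
}
```

The initial `cap - 1` is what keeps an exact power of two from being doubled unnecessarily.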
"Load Factor"
loadFactor is the load factor, which measures how densely the HashMap may fill before it resizes.
HashMap constructor
There are four constructors in the HashMap source code; they are introduced separately below.
Constructor with initial capacity initialCapacity and load factor loadFactor
The initial capacity cannot be negative, so passing an initial capacity < 0 throws an IllegalArgumentException immediately. If the given initial capacity exceeds the maximum capacity, the maximum capacity is used instead. The load factor must also be positive (and not NaN). Finally, the constructor records the expansion threshold via tableSizeFor; the table itself is allocated lazily. This expansion mechanism is important and is discussed later.
Constructor with only initialCapacity
This constructor eventually calls the one above, using HashMap's default load factor of 0.75f.
parameterless constructor
The load factor likewise defaults to 0.75f.
Constructor with map
The constructor that takes a Map copies all of the given map's entries into the new HashMap in one batch (via putMapEntries).
Talk about the whole process of HashMap put
I remember interviewing in Beijing a year after graduation: when a company asked me about HashMap's put process, I hesitated and couldn't answer. The analysis below is based on JDK 1.8. The full code is posted first, then analyzed line by line.
First look at the putVal method. This method is final, so even if you subclass HashMap you cannot override putVal. The method takes five parameters:
hash -> the hash computed for the key before the put
key -> the key parameter
value -> the value parameter
onlyIfAbsent -> whether to leave an existing value untouched, i.e., whether to skip replacing the value
evict -> false only while the table is in creation mode (for example, when populating the map at construction time)
When put is called, putVal first uses the hash function to compute where the entry should go.
The source code of the hash function is as follows
First, let's understand the calculation rules of the hash function
Hash function
The hash function computes from the key you pass in: it first obtains the key's hashCode, then unsigned-right-shifts that hash code by 16 bits, and finally XORs (^) the shifted value with the original hashCode.
❝
>>> is the unsigned right shift, also called logical right shift: regardless of whether the number is positive or negative, the high-order bits vacated by the shift are filled with 0.
❞
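The perturbation step can be written out directly; this mirrors the JDK 1.8 hash(Object) method (the wrapping demo class is added for runnability):

```java
public class HashDemo {
    // JDK 1.8's perturbation function: XOR the high 16 bits of the
    // hashCode into the low 16 bits, so that small tables (which only
    // use the low bits via (n - 1) & hash) still feel the high bits.
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        System.out.println(hash(null));            // 0: null keys land in bucket 0
        System.out.println(hash("hello"));         // the perturbed hash code
        System.out.println((16 - 1) & hash("hello")); // bucket index for a 16-slot table
    }
}
```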
After getting the hash value, the put process will be performed.
First, putVal checks whether the HashMap's Node array is null. On the very first insertion into a freshly created HashMap, the array is first allocated by resize(); the resize() expansion mechanism is analyzed later. Once the table exists, the slot for the entry is computed as (n - 1) & hash.
That slot is used as the array index at which to store the element. If the slot is not empty, the node already there is compared against the incoming hash and key. If the hash matches but the key differs, putVal checks whether the node is a tree node; if so, the entry is inserted into the red-black tree, and if not, tail insertion appends it at the end of the entry chain.
Along the way it decides between linked list and red-black tree based on the number of elements in the bucket, and afterwards it checks whether the number of key-value pairs exceeds the threshold and, if so, expands the capacity.
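The control flow above can be sketched with a toy, list-only map (no tree bins, no resize, String keys only). The names echo the JDK source, but this is an illustration under those simplifying assumptions, not the real implementation:

```java
public class ToyPutDemo {
    static class Node {
        final int hash; final String key; Object value; Node next;
        Node(int hash, String key, Object value) {
            this.hash = hash; this.key = key; this.value = value;
        }
    }

    Node[] table = new Node[16];

    static int hash(String key) {           // same perturbation as the JDK
        int h = key.hashCode();
        return h ^ (h >>> 16);
    }

    Object put(String key, Object value) {
        int h = hash(key);
        int i = (table.length - 1) & h;     // bucket index
        for (Node n = table[i]; n != null; n = n.next) {
            if (n.hash == h && n.key.equals(key)) {
                Object old = n.value;       // key already present: replace value
                n.value = value;
                return old;
            }
            if (n.next == null) {           // tail insertion (JDK 1.8 style)
                n.next = new Node(h, key, value);
                return null;
            }
        }
        table[i] = new Node(h, key, value); // empty bucket
        return null;
    }

    Object get(String key) {
        int h = hash(key);
        for (Node n = table[(table.length - 1) & h]; n != null; n = n.next)
            if (n.hash == h && n.key.equals(key)) return n.value;
        return null;
    }

    public static void main(String[] args) {
        ToyPutDemo m = new ToyPutDemo();
        System.out.println(m.put("a", 1)); // null: fresh insert
        System.out.println(m.put("a", 2)); // 1: the old value is returned
        System.out.println(m.get("a"));    // 2
    }
}
```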
Expansion mechanism
In Java, arrays have a fixed length, which means an array can only hold a fixed amount of data. But during development we often cannot know in advance how big the array should be. Fortunately, HashMap is a data structure that expands automatically, and in such a variable-size structure the expansion mechanism is very important.
In HashMap, the threshold is the product of the bucket array length and the load factor. When the number of key-value pairs in the HashMap exceeds the threshold, the capacity is expanded. The expansion mechanism is implemented by the resize() method; let's walk through it.
The source code of the expansion mechanism is relatively long, we will split it patiently
Splitting it along its if...else if...else logic, the code above mainly does the following things:
It first records the current table and its length, i.e., the old capacity oldCap of the Node array.
If the old capacity is not greater than 0, it checks whether the expansion threshold threshold is greater than 0, that is, whether a stored threshold exists from an externally specified capacity; if so, that value is used. To explain when oldThr > 0 holds: oldThr = threshold, and every HashMap constructor funnels into HashMap(initialCapacity, loadFactor), which sets this.threshold = tableSizeFor(initialCapacity); with no externally specified initialCapacity, the initial capacity is simply 16.
Otherwise, the default initial capacity and default expansion threshold are used directly; this else branch is taken when the table is being initialized with no arguments.
Next it checks whether newThr is 0. When I first studied this, seeing newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY) I kept wondering how a product of constants could ever be 0. In fact the problem is not that line but the expansion logic above it, where a shift or cast may overflow.
Example of overflow: when oldCap is already on the order of 2^28 and above, doubling the capacity or computing float ft = (float) newCap * loadFactor and casting back to int can exceed the representable range of 2^31 - 1, so the result comes out zeroed or capped, and newThr must be recomputed by the final check.
"After the expansion, the nodes need to be placed in the newly expanded array, which also involves three steps"
Loop over every bucket of the old array: if the bucket (Node[i]) is null, skip it; if it is not null, its nodes must be remapped into the new bucket array.
If the bucket is not empty, check whether it holds a tree: a red-black tree bin is divided between the new buckets by the split method.
If it is not a tree, traverse the linked list and partition its nodes into two groups, preserving their original order.
Talk about the whole process of the get method
Having covered the whole put process above, let's look at the get method.
Briefly: it first checks whether the table is non-empty, then computes the bucket for the given key from its hash. It then checks the first node of that bucket: if it matches, that record is returned directly; if it does not match, it looks at the next node. If the next node is null, null is returned; if not, it checks whether the node is a TreeNode instance. If so, the element is fetched with TreeNode.getTreeNode; otherwise it loops through the chain until the next element is null.
The getNode method contains one particularly important expression, (n - 1) & hash, which determines the bucket to search. So why (n - 1) & hash?
n is the number of buckets in the HashMap, so (n - 1) & hash is (bucket capacity - 1) & hash.
Computing (n - 1) & hash for each key in turn yields the concrete position given by tab[(n - 1) & hash].
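For a power-of-two n, this bitmask is exactly the remainder, just cheaper than a division; a quick check across several hashes:

```java
public class IndexDemo {
    public static void main(String[] args) {
        int n = 16; // bucket count: a power of two
        for (int hash : new int[]{0, 5, 16, 21, 1_000_003}) {
            // For a power-of-two n, (n - 1) & hash == hash % n
            // (for non-negative hashes), without the slower division.
            System.out.println(((n - 1) & hash) + " " + (hash % n));
        }
    }
}
```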
Traversal of HashMap
Traversing a HashMap is also a very frequent operation.
The base class for HashMap traversal is HashIterator, a hash iterator implemented as an abstract inner class of HashMap. Its structure is simple: it has only the hasNext, remove, and nextNode methods, and nextNode is what the three concrete iterators build on.
These three iterators are
KeyIterator, which iterates over keys
ValueIterator, which iterates over values
EntryIterator, which iterates over the entry chain
Although there are three iterators, their traversal order is identical and their structure is very simple: all of them traverse via the nextNode method in HashIterator.
Traversal in HashIterator
next and current refer to the next Node and the current Node respectively. On construction, HashIterator advances to the first non-empty bucket. The diagram below shows the traversal order.
You will find that nextNode() walks the table the same way the HashIterator constructor does, only with different conditions: the constructor scans until it finds a non-null bucket, while nextNode keeps following the current chain and, once the chain's next node is null, scans forward for the next non-null bucket.
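A quick demonstration that all three views walk the table in the same bucket order, since each view's iterator delegates to HashIterator.nextNode():

```java
import java.util.HashMap;
import java.util.Map;

public class IterateDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);

        // All three views iterate in the same (bucket) order, because
        // each view's iterator delegates to HashIterator.nextNode().
        for (String k : map.keySet())                 System.out.println("key: " + k);
        for (int v : map.values())                    System.out.println("value: " + v);
        for (Map.Entry<String, Integer> e : map.entrySet())
            System.out.println("entry: " + e.getKey() + "=" + e.getValue());
    }
}
```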
Remove method in HashMap
The removal method in HashMap is also relatively simple; the source code is as follows.
There are several remove overloads, and all of them eventually call the removeNode method, just with different arguments. Let's take remove(Object key) as the demonstration.
First the corresponding bucket is located through the hash, then the chain is traversed to find the node with an equal key, and that node is unlinked.
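A usage example, including the fail-fast caveat from the abstract: removing during iteration must go through the iterator itself, or the next iteration throws ConcurrentModificationException.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class RemoveDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);

        System.out.println(map.remove("a")); // 1: returns the removed value

        // Removing while iterating must use the iterator's own remove;
        // calling map.remove(...) inside this loop would trip the
        // fail-fast modCount check on the next iteration.
        Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() == 2) {
                it.remove();
            }
        }
        System.out.println(map); // {c=3}
    }
}
```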
Interview questions about HashMap
HashMap data structure
In JDK 1.7, HashMap is implemented as an array of buckets plus linked lists: conflicts are handled by chaining, and entries with the same bucket index are stored in a linked list in that array slot. When a bucket holds many elements, searching sequentially by key becomes slow.
Therefore, compared with JDK 1.7, JDK 1.8 changed the underlying structure: when the number of elements in a bucket reaches 8 (and the table is large enough), the list is converted into a red-black tree to optimize query efficiency.
The put process of HashMap
The general process is as follows. The hash method computes the hash of the key, which determines the bucket. If the bucket holds no Node, the entry is placed directly. If the bucket already holds nodes, the entry is added to the chain; JDK 1.7 used head insertion, while JDK 1.8 changed to tail insertion. If the chain length then exceeds the treeify threshold of 8, the list is converted into a red-black tree and the entry is stored in the tree.
Why HashMap is not thread safe
HashMap is not a thread-safe container, and the unsafety shows up when multiple threads put into a HashMap concurrently. Suppose threads A and B both insert. A determines the bucket position and is about to write when its time slice runs out; B then runs, performs the same steps, and successfully writes its key-value pair. If A and B chose the same bucket, A will overwrite B's record when it resumes, causing lost updates and inconsistent data.
Another point: in JDK 1.7, concurrent expansion could link nodes into a cycle inside the resize method, producing an infinite loop and driving CPU usage up.
How HashMap handles hash collisions
The bottom layer of HashMap is an array of buckets plus linked lists. The bucket, chosen via the hash method, determines where an element is inserted; elements that hash to the same bucket are placed there and chained into a linked list. This way of handling hash collisions is called separate chaining (the chained-address method).
Other ways to handle hash collisions include open addressing, rehashing, and establishing a public overflow area.
How HashMap gets elements
It first checks whether the table is non-empty, then computes the bucket for the given key from its hash. It then checks the first node of that bucket: if it matches, that record is returned directly; if it does not match, it looks at the next node. If the next node is null, null is returned; if not, it checks whether the node is a TreeNode instance. If so, the element is fetched with TreeNode.getTreeNode; otherwise it loops through the chain until the next element is null.
What is the difference between HashMap and HashTable
see above
Difference between HashMap and HashSet
see above
How HashMap scales
There are two very important variables in HashMap: loadFactor and threshold. loadFactor is the load factor, and threshold = loadFactor * array length is the point at which the next expansion happens. When size exceeds threshold, the array length is doubled, the map is resized, and the existing entries are rehashed into the new bucket array.
Why is the length of HashMap a power of 2?
I thought about this question for a few days. When discussing the question of the day with friends in a group, I asked them why hash % length == (n - 1) & hash. They said the equality only holds when length is a power of 2, and I replied: can't length be something other than a power of 2? I had the causality backwards: it is because HashMap's length is a power of 2 that the bitwise AND can stand in for the remainder when determining the bucket index. If length were not a power of 2, try an example:
For example, when the length is 9: 3 & (9 - 1) = 0 and 2 & (9 - 1) = 0, so both land on index 0 and collide;
a non-power-of-two length would thus increase HashMap's collision rate.
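The collision effect is easy to see: with n = 9, the mask n - 1 = 8 = 0b1000 has only one bit set, so every hash without that bit lands in bucket 0.

```java
public class CollisionDemo {
    public static void main(String[] args) {
        int n = 9; // NOT a power of two: n - 1 = 8 = 0b1000
        // Only hashes with the fourth bit set can land anywhere but 0,
        // so hashes 0 through 7 all collide in bucket 0.
        for (int hash = 0; hash < 8; hash++) {
            System.out.println(hash + " -> " + ((n - 1) & hash));
        }
    }
}
```

With a power-of-two n, the mask n - 1 is all ones, so every low-bit pattern maps to a distinct bucket.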
What are the implementations of HashMap thread safety
Because HashMap is not a thread-safe container, ConcurrentHashMap is recommended in concurrent scenarios. Alternatively, use the thread-safe wrappers in the Collections class, such as Collections.synchronizedMap(new HashMap<>()).
You can also use Hashtable, which is likewise a thread-safe key-value container. HashMap and Hashtable are often compared precisely because Hashtable's data structure resembles HashMap's.