In theory, a hash function would convert data of any size (any large text) to a whole number that is greater or equal to zero. Different large texts are converted to different numbers. The interest in this article is not to convert large text to an integer. The interest in this article is to map data of any size to a whole number. This includes the possibility of mapping (converting) a single number into another single number.
Data (on the left) to be mapped are called keys. So each key is converted to a whole number that is equal to or greater than zero. Imagine that there are five keys that have been converted to numbers: 0, 1, 2, 3, and 4. If these keys are the indexes of an array holding five values of the same type, then these keys have been mapped to five different values.
And so, what is a hashmap (hash map)? A hashmap is a data structure consisting of key/value pairs, where each key has been mapped to a value. Actually, each key has been hashed into a number of an array, and the corresponding array cell is to hold a value. In the hashmap topic, the location for the cell of each array index is called a bucket.”
Collision Problem and Resolution
In practice, different keys can be mapped to more than one value. Consider the case of ten keys for an array of length 5, with 5 buckets. In this case, a structure like an array-of-arrays would be constructed. Each bucket would actually be an array of length 2. Two keys would share the same bucket. In this sharing, the first key would map to the first array element for the bucket, and the second key would map to the second array element for the same bucket. This scheme resolves the collision problem, and ten keys have been mapped to 10 values, as expected.
For the rest of this article, imagine a hashmap with the collision problem resolved. The aim of this article is to provide the time complexity of the coding for insertion into a hashmap, coding for deletion into a hashmap, and coding for searching in a hashmap. The time complexity for hashmap is described with these three features. Hash-mapping for C++ is also addressed in this article.
Key/Value Pairs and Value_Type
Imagine a hashmap of names of people against telephone numbers for a telephone directory. The names of telephone users are of data type, text (string). The values of telephone numbers are of data type and text, assuming that spaces and/or hyphens are allowed in a telephone number. The user names are the keys, and the telephone numbers are the values. Do not forget that, internally, the keys are actually converted to array indexes in the data structure. So, the key type is text, and the value type is still text.
Value type means the key/value pair as an element. In C++, each element (pair) is pointed to by an iterator. So, in C++, there is iterator/pair mapping, as well. A pair (key and its value) is referred to as a value type.
Time Complexity for Hashmap Insertion
The time complexity for a hashmap does not refer to the time taken to create the hashmap. It refers to the time taken to insert, delete or search for a value based on a given key. Time complexity is normally written using the big-O notation. The big-O notation consists of O in uppercase, followed by parentheses. In the parentheses are variables and numbers, which give the relative running time for a piece of code and not necessarily the whole program. With the hashmap, k means the number of keys, and n means the number of buckets. Remember that with some hashmaps, a bucket may have more than one value for correspondingly different keys. In this case, more than one key is hashed into the same bucket index. A good hashmap resolves this collision.
Insertion
Given a new key, the average-case time complexity to have the key and its corresponding value inserted into a hashmap data structure is:
Where n is the number of buckets and k is the number of keys. For example, n maybe 10, and k maybe 5. In this particular situation, some buckets are empty (have no value). So, the number of operations would be, 1 + 10/5 = 1 + 2 = 3. That is, there would be 3 coding operations to insert a key/value pair element (given the key). Given a new key and value, the worst-case time complexity to have the key and its corresponding value inserted into a hashmap data structure is:
Where n is the number of buckets for n operations, if the hashmap takes more than one value per bucket for more than one key, then any extra time to insert a further value in the same bucket is negligible and neglected.
Deletion
Given a key already in the hashmap data structure, deletion deletes the key/value pair element. The average-case time complexity for deletion is:
Where n is the number of buckets and k is the number of keys. For example, n maybe 10, and k maybe 5. In this particular situation, some buckets are empty (have no value). So, the number of operations for the code of deletion, would be, 1 + 10/5 = 1 + 2 = 3. So here, there would be 3 coding operations to delete a key/value pair element, given the key.
The worst-case time complexity for deletion, given a key, is:
Where n is the number of buckets, so, if there are n buckets for the hashmap data structure, at the limit, it would take 10 operations to delete a key/value pair element in the hashmap.
Searching
Searching means finding the key/value pair element that has the given key, which should already be in the hashmap. The average-case time complexity for this is:
With the arguments having the same meanings as above. The worst-case time complexity for this is:
With the argument having the same meaning as above.
C++20
Does C++20 have a Hashmap class? – Well, yes, but not with the name, Hashmap. C++20 has four unordered associative containers, which are different forms of Hashmap classes. The containers are: unordered_map, unordered_multimap, unordered_set, and unordered_multiset. These associative containers are in the C++20 standard library. The associative container to consider in this article is unordered_map. The unordered_map uses a default Hash function provided by the C++ standard library.
To include the unordered_map header into a program, use the directive,
Do not use #include <map>, which will include the ordered_map. C++ does not take #include <ordered_map> . The unordered_map header brings the unordered_map class into the C++ program.
Insert with C++
The following code segment in the C++ main function will insert a key/value pair element (name and phone number) into the unordered_map object, ump:
unordered_map<string, string> ump;
ump.insert({"John Rambo", "555-5555"});
Delete with C++
The syntax to delete a key/value pair element from an unordered_map object, given the key, k, is:
Where “a” is the unordered_map object (such as ump above), and erase() is the unordered_map member function.
Searching
The syntax to search a key/value pair element in an unordered_map object, given the key k, which should already be in the unordered_map, is:
Where b is the unordered_map object (such as ump above), and find() is the unordered_map member function.
Conclusion
Time complexity means relative running time for some code. Though the time complexity for creating a hashmap can be determined, when dealing with the hashmap, the time complexity is for inserting, deleting, and searching tasks. The average and worse case time complexities for a well-defined hashmap insert are:
O(n)
The average and worse case time complexities for a well-defined hashmap delete are:
O(n)
The average and worse case time complexities for a well-defined hashmap search are:
O(n)