Data Structure & Algorithms: Hashing, Hash Table, and Hash Function

    With an array, we can search for an element with time complexity O(1). However, it has limitations, such as it stores similar data types only, each cell of an array occupies the same amount of space, and to find an element, we require its index value.

    To make things worse, finding the index value of an element in an array can have a time complexity of O(n) or O(log n). By using the concept of hashing, we can build a data structure that can search elements with constant time complexity.

    Let's learn more!

    What is Hashing?

    Hashing is a technique in which we store data in an array at specific indices using some methods rather than storing the data in an ascending or descending order or randomly.

    Suppose if we want to store 4 in an array, we perform some methods or operations on 4 and calculate the perfect index value for it, and if we want to retrieve 4 from the array, we just reverse that method or operation and get 4 with constant time complexity.

    Hashing Table

    A hashing table or hash table is a collection of elements that are stored in a data structure using a hashing method, which makes it easy to find the elements later. The hash table consists of keys and indexes (or slots).

    Here, a key represents the value that is stored in the table, and an index represents the index (location) of that key. Each position of the hash table, i.e., slots, can hold an item and is named by an integer value starting at 0. We can use an array to implement a hash table, and initially, all the elements of the array would be None.

    Hashing Example: Suppose we want to insert 1, 3, 5, 7, 8, 10, and 11 in a hash table using hashing. Now, create an array of arbitrary size as shown below:

    m=13
    arr[m]
    Key Value Hashing (key value % size of array) Array Index
    1 1 % 7 = 1
    3 3 % 7 = 3
    5 5 % 7 = 5
    7 7 % 7 = 0
    9 9 % 7 = 2
    10 10 % 7 = 3 ( collision ) 3 is already occupied Collision resolution = 3+1= 4
    11 11 % 7 = 4 (Collison) = 4+1=5(collision)= 5+1 = 6

    The array will be arr = [7, 1,  9, 3, 10, 5, 11, None, None, None, None, None, None]

    Hashing Function

    Hash functions or hashing methods are used to map each element or key to a unique slot or index. A hash function is also used to minimize the number of collisions, and using some easy methods, it computes and evenly distributes the items in the hash table.

    There are various hashing methods we can use to map a key to its slot. These are:

    1. Remainder Method
    2. Folding Method
    3. Mid Square Method

    1) Remainder method

    In the reminder method, we divide the key value by the total size of the table or array and use its remainder to specify the index or slot value of that key. For example, if we want to insert 9 in an array of size 20, it would be placed at 9%20 = 9th index if the 9th index is free.

    2) Folding Method

    The folding method for constructing hash functions begins by dividing the item into equal-sized pieces (the last piece may not be of equal size, however). These pieces are then added together to give the resulting hash value. This hashing method applies to large numbers.

    For example, if we want a hash table in which key elements are the mobile numbers of the customers, then the reminder method would not be efficient. We have to use the folding method here.

    If our key was the phone number 436-555-4601. We would take the digits and divide them into groups of 2 (43,65,55,46,01). After the addition of 43+65+55+46+01, we get 210. If we assume our hash table has 11 slots, then we need to perform the extra step of dividing by 11 and keeping the remainder. 210 % 11 is 1, so the phone number 436-555-4601 hashes to slot 1 .

    3. Mid-Square Method

    In the mid-square method, we compute the key slot number or index. We first square the item and then extract some portion of the resulting digits.

    For example, If the item were 44, we would first compute 442=1,936. By extracting the middle two digits, 93, and performing the remainder step, we get 93%11 = 5

    Implementation of the Hash Table

    Python Code for Implementing a Hash Table:

    #load factor = number of items / table size
    class Hash_Table():
        def __init__(self,size):
            self.size=size
            self.slots= [None]*self.size
            self.data =[None]*self.size
    
        def put(self,key,data):
            hashvalue = self.hashfunction(key,len(self.slots))
    
            #if slot is empty
            if self.slots[hashvalue]==None:
                self.slots[hashvalue]=key
                self.data[hashvalue]=data
            else:
                if self.slots[hashvalue]==key:          #if slot has the similar key as old update it with new data
                    self.data[hashvalue]=data
    
                else:
                    nextslots =self.rehash(hashvalue,len(self.slots))    #if key is different
    
                    while self.slots[nextslots]!=None and self.slots[nextslots]!=key:
                        nextslots=self.rehash(nextslots,len(self.slots))
                        if self.slots[nextslots]==None:
                            self.slots[nextslots]=key
                            self.data[nextslots]=data
                        else:
                            self.data[nextslots]=data  
        def hashfunction(self,key,size):
            return key%size
     
        def rehash(self,oldhash,size):
            return (oldhash+1)%size
    
        def get(self,key):
            startslot =self.hashfunction(key,len(self.slots))
            data= None
            stop =False
            found= False
            position =startslot
    
            while self.slots[position]!=None and not found and not stop:
                 if self.slots[position]==key:
                     found=True
                     data= self.data[position]
                 else:
                     position=self.rehash(position,len(self.slots))
                     if position==startslot:
                         stop=True
            return data
    
        def __getitem__(self,key):
            return self.get(key)
    
        def __setitem__(self,key,data):
            self.put(key,data)
    
    h = Hash_Table(5)
    h[1]='go'
    h[2]='python'
    print(h[1])
    print(h[2])
    

    Output:

    go
    python

    Conclusion

    Hashing is an effective way of searching stored elements. Choosing the best hash method for a given problem depends on the problem structure. Therefore, pay good attention to the problem first and then write a program to solve the same using the best hash function.

    We hope that this tutorial helps you understand the topic of hashing in detail.

    Keep learning!

    People are also reading: