Fundamentals 13 min read

Understanding Hash Tables and Implementing Them in Python

This article explains the concept of hashing and hash tables, describes their characteristics and collision‑resolution strategies, showcases practical Python implementations with load‑factor handling and resizing, and highlights common applications such as dictionaries and network session tables.

Python Programming Learning Circle
Python Programming Learning Circle
Python Programming Learning Circle
Understanding Hash Tables and Implementing Them in Python

Hashing is a computer‑science technique that maps a key to an index using a hash function, creating a hash table that enables fast data retrieval.

A hash table stores data based on keys; the hash function determines the memory location for each key‑value pair.

It provides direct access without comparison, but different keys may map to the same index, causing collisions.

A uniform hash function distributes keys evenly, reducing collisions.

Collision‑resolution methods include open addressing (linear probing, quadratic probing), chaining (storing colliding elements in a linked list), rehashing, and using a separate overflow area.

Typical applications are dictionary implementations, network firewalls session tables, and extensive use in the Linux kernel.

Below is a simple hash table using open addressing with linear probing (Code Segment 1).

<code>class Hash:
    # Table length set to 11
    def __init__(self):
        self.hash_table=[[None,None] for i in range(11)]
    # Hash function
    def hash(self,k,i):
        h_value=(k+i)%11
        if self.hash_table[h_value][0]==k:
            return h_value
        if self.hash_table[h_value][0]!=None:
            i+=1
            h_value=self.hash(k,i)
        return h_value
    def put(self,k,v):
        hash_v=self.hash(k,0)
        self.hash_table[hash_v][0]=k
        self.hash_table[hash_v][1]=v
    def get(self,k):
        hash_v=self.hash(k,0)
        return self.hash_table[hash_v][1]

hash = Hash()
hash.put(1 ,'wang')
print(hash.get(1))</code>

The simple table has a fixed length of 11, so as more elements are added, the chance of collisions grows.

To address this, the load factor (ratio of stored elements to table size) is introduced; when it exceeds a threshold (e.g., 0.75), the table should be resized, typically doubling its capacity.

An improved implementation with automatic resizing is shown in Code Segment 2.

<code>class Map:
    def __init__(self):
        self.capacity=11
        self.hash_table=[[None,None] for i in range(self.capacity)]
        self.num=0
        self.load_factor=0.75
    def hash(self,k,i):
        h_value=(k+i)%self.capacity
        if self.hash_table[h_value][0]==k:
            return h_value
        if self.hash_table[h_value][0]!=None:
            i+=1
            h_value=self.hash(k,i)
        return h_value
    def resize(self):
        # Expand to twice the number of inserted elements
        self.capacity=self.num*2
        temp=self.hash_table[:]
        self.hash_table=[[None,None] for i in range(self.capacity)]
        for i in temp:
            if(i[0]!=None):
                hash_v=self.hash(i[0],0)
                self.hash_table[hash_v][0]=i[0]
                self.hash_table[hash_v][1]=i[1]
    def put(self,k,v):
        hash_v=self.hash(k,0)
        self.hash_table[hash_v][0]=k
        self.hash_table[hash_v][1]=v
        self.num+=1
        if(self.num/len(self.hash_table)>self.load_factor):
            self.resize()
    def get(self,k):
        hash_v=self.hash(k,0)
        return self.hash_table[hash_v][1]
</code>

Python’s built‑in dict uses a hash table; its syntax is demonstrated, followed by a custom dictionary class that implements its own hash function, rehashing, and basic operations (Code Segment 3).

<code>dictionary = {'name': 'wang', 'age': 17, 'class': 'first'}
dictionary['age'] = 18
dictionary['country'] = 'china'
print(dictionary['age'])
print(dictionary['country'])

class MyDictionary(object):
    # Initialization
    def __init__(self):
        self.table_size = 13
        self.key_list = [None]*self.table_size
        self.value_list = [None]*self.table_size
    # Hash function
    def hashfuction(self, key):
        count_char = 0
        key_string = str(key)
        for key_char in key_string:
            count_char += ord(key_char)
        length = len(str(count_char))
        if length > 3:
            mid_int = 100*int(str(count_char)[length//2-1]) + 10*int(str(count_char)[length//2]) + int(str(count_char)[length//2+1])
        else:
            mid_int = count_char
        return mid_int%self.table_size
    # Rehash function (linear probing with step 3)
    def rehash(self, hash_value):
        return (hash_value+3)%self.table_size
    # Set item
    def __setitem__(self, key, value):
        hash_value = self.hashfuction(key)
        if None == self.key_list[hash_value]:
            pass
        elif key == self.key_list[hash_value]:
            pass
        else:
            hash_value = self.rehash(hash_value)
            while(None != self.key_list[hash_value] and (key != self.key_list[hash_value])):
                hash_value = self.rehash(hash_value)
        self.key_list[hash_value] = key
        self.value_list[hash_value] = value
    # Get item
    def __getitem__(self, key):
        hash_value = self.hashfuction(key)
        first_hash = hash_value
        if None == self.key_list[hash_value]:
            return None
        elif key == self.key_list[hash_value]:
            return self.value_list[hash_value]
        else:
            hash_value = self.rehash(hash_value)
            while(None != self.key_list[hash_value] and (key != self.key_list[hash_value])):
                hash_value = self.rehash(hash_value)
            if hash_value == first_hash:
                return None
            if None == self.key_list[hash_value]:
                return None
            else:
                return self.value_list[hash_value]
    # Delete item
    def __delitem__(self, key):
        hash_value = self.hashfuction(key)
        first_hash = hash_value
        if None == self.key_list[hash_value]:
            return
        elif key == self.key_list[hash_value]:
            self.key_list[hash_value] = None
            self.value_list[hash_value] = None
            return
        else:
            hash_value = self.rehash(hash_value)
            while(None != self.key_list[hash_value] and (key != self.key_list[hash_value])):
                hash_value = self.rehash(hash_value)
            if hash_value == first_hash:
                return
            if None == self.key_list[hash_value]:
                return
            else:
                self.key_list[hash_value] = None
                self.value_list[hash_value] = None
                return
    # Length of dictionary
    def __len__(self):
        count = 0
        for key in self.key_list:
            if key != None:
                count += 1
        return count

def main():
    H = MyDictionary()
    H["kcat"]="cat"
    H["kdog"]="dog"
    H["klion"]="lion"
    H["ktiger"]="tiger"
    H["kbird"]="bird"
    H["kcow"]="cow"
    H["kgoat"]="goat"
    H["pig"]="pig"
    H["chicken"]="chicken"
    print("Dictionary length is %d"%len(H))
    print("Key %s value is %s"%("kcow",H["kcow"]))
    print("Dictionary length is %d"%len(H))
    print("Key %s value is %s"%("kmonkey",H["kmonkey"]))
    print("Dictionary length is %d"%len(H))
    del H["klion"]
    print("Dictionary length is %d"%len(H))
    print(H.key_list)
    print(H.value_list)

if __name__ == "__main__":
    main()
</code>

These examples illustrate how hash tables work, how to handle collisions, and how to manage load factor and resizing in Python.

Data Structureshash tablehash functioncollision resolutiondictionaryload factor
Python Programming Learning Circle
Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.