Fundamentals 13 min read

Understanding Hash Tables: Concepts, Collision Resolution, and Python Implementations

This article explains the concept of hashing and hash tables, describes their characteristics and common collision‑resolution strategies such as open addressing, linear and quadratic probing, chaining, and rehashing, and provides complete Python implementations including a simple hash class, a dynamic map class, and a custom dictionary.

Top Architecture Tech Stack
Top Architecture Tech Stack
Top Architecture Tech Stack
Understanding Hash Tables: Concepts, Collision Resolution, and Python Implementations

Hashing is a method of processing data using a hash function that maps keys to indices, creating a hash table for efficient search.

A hash table stores key‑value pairs directly in memory based on the hash of the key. The hash function determines the storage location.

Characteristics of hash tables include direct access, possible collisions when different keys map to the same index, and the desire for uniform hash functions to reduce collisions.

Collision handling methods covered are open addressing (linear probing, quadratic probing), chaining (linked list at each bucket), rehashing, and using a separate overflow area.

Open addressing uses a probe function such as h(k,i) = (hash(k) + i) % table_size , where i is the probe count.

Code segment 1 shows a simple fixed‑size hash table implementation in Python using linear probing.

class Hash:
    # 表的长度定位11
    def __init__(self):
        self.hash_table=[[None,None] for i in range(11)]
    # 散列函数
    def hash(self,k,i):
        h_value=(k+i)%11
        if self.hash_table[h_value][0]==k:
            return h_value
        if self.hash_table[h_value][0]!=None:
            i+=1
            h_value=self.hash(k,i)
        return h_value
    def put(self,k,v):
        hash_v=self.hash(k,0)
        self.hash_table[hash_v][0]=k
        self.hash_table[hash_v][1]=v
    def get(self,k):
        hash_v=self.hash(k,0)
        return self.hash_table[hash_v][1]

hash = Hash()
hash.put(1 ,'wang')
print(hash.get(1))

The load factor concept is introduced; when the ratio of stored elements exceeds 0.75, the table should be resized, as demonstrated in code segment 2.

class Map:
    def __init__(self):
        self.capacity=11
        self.hash_table=[[None,None] for i in range(self.capacity)]
        self.num=0
        self.load_factor=0.75
    def hash(self,k,i):
        h_value=(k+i)%self.capacity
        if self.hash_table[h_value][0]==k:
            return h_value
        if self.hash_table[h_value][0]!=None:
            i+=1
            h_value=self.hash(k,i)
        return h_value
    def resize(self):
        self.capacity=self.num*2
        temp=self.hash_table[:]
        self.hash_table=[[None,None] for i in range(self.capacity)]
        for i in temp:
            if(i[0]!=None):
                hash_v=self.hash(i[0],0)
                self.hash_table[hash_v][0]=i[0]
                self.hash_table[hash_v][1]=i[1]
    def put(self,k,v):
        hash_v=self.hash(k,0)
        self.hash_table[hash_v][0]=k
        self.hash_table[hash_v][1]=v
        self.num+=1
        if(self.num/len(self.hash_table)>self.load_factor):
            self.resize()
    def get(self,k):
        hash_v=self.hash(k,0)
        return self.hash_table[hash_v][1]

A practical example is Python's built‑in dictionary, which uses a hash table internally. A custom dictionary class (MyDictionary) is implemented to illustrate insertion, lookup, deletion, and length operations with collision handling.

class MyDictionary(object):
    # 字典类的初始化
    def __init__(self):
        self.table_size = 13 # 哈希表的大小
        self.key_list = [None]*self.table_size #用以存储key的列表
        self.value_list = [None]*self.table_size #用以存储value的列表
    # 散列函数,返回散列值
    def hashfuction(self, key):
        count_char = 0
        key_string = str(key)
        for key_char in key_string:
            count_char += ord(key_char)
        length = len(str(count_char))
        if length > 3 :
            mid_int = 100*int((str(count_char)[length//2-1])) \
                    + 10*int((str(count_char)[length//2])) \
                    + 1*int((str(count_char)[length//2+1]))
        else:
            mid_int = count_char
        return mid_int%self.table_size
    def rehash(self, hash_value):
        return (hash_value+3)%self.table_size #线性探测步长为3
    def __setitem__(self, key, value):
        hash_value = self.hashfuction(key)
        if None == self.key_list[hash_value]:
            pass
        elif key == self.key_list[hash_value]:
            pass
        else:
            hash_value = self.rehash(hash_value)
            while (None != self.key_list[hash_value]) and (key != self.key_list[hash_value]):
                hash_value = self.rehash(hash_value)
        self.key_list[hash_value] = key
        self.value_list[hash_value] = value
    def __getitem__(self, key):
        hash_value = self.hashfuction(key)
        first_hash = hash_value
        if None == self.key_list[hash_value]:
            return None
        elif key == self.key_list[hash_value]:
            return self.value_list[hash_value]
        else:
            hash_value = self.rehash(hash_value)
            while (None != self.key_list[hash_value]) and (key != self.key_list[hash_value]):
                hash_value = self.rehash(hash_value)
                if hash_value == first_hash:
                    return None
            if None == self.key_list[hash_value]:
                return None
            else:
                return self.value_list[hash_value]
    def __delitem__(self, key):
        hash_value = self.hashfuction(key)
        first_hash = hash_value
        if None == self.key_list[hash_value]:
            return
        elif key == self.key_list[hash_value]:
            self.key_list[hash_value] = None
            self.value_list[hash_value] = None
            return
        else:
            hash_value = self.rehash(hash_value)
            while (None != self.key_list[hash_value]) and (key != self.key_list[hash_value]):
                hash_value = self.rehash(hash_value)
                if hash_value == first_hash:
                    return
            if None == self.key_list[hash_value]:
                return
            else:
                self.key_list[hash_value] = None
                self.value_list[hash_value] = None
                return
    def __len__(self):
        count = 0
        for key in self.key_list:
            if key != None:
                count += 1
        return count

def main():
    H = MyDictionary()
    H["kcat"]="cat"
    H["kdog"]="dog"
    H["klion"]="lion"
    H["ktiger"]="tiger"
    H["kbird"]="bird"
    H["kcow"]="cow"
    H["kgoat"]="goat"
    H["pig"]="pig"
    H["chicken"]="chicken"
    print("字典的长度为%d"%len(H))
    print("键 %s 的值为为 %s"%("kcow",H["kcow"]))
    print("字典的长度为%d"%len(H))
    print("键 %s 的值为为 %s"%("kmonkey",H["kmonkey"]))
    print("字典的长度为%d"%len(H))
    del H["klion"]
    print("字典的长度为%d"%len(H))
    print(H.key_list)
    print(H.value_list)
    if __name__ == "__main__":    main()

Overall, the article provides a comprehensive overview of hash tables, their properties, collision resolution techniques, and concrete Python code examples for building and extending hash‑based data structures.

PythonData Structureshash tablecollision resolutionchainingopen addressing
Top Architecture Tech Stack
Written by

Top Architecture Tech Stack

Sharing Java and Python tech insights, with occasional practical development tool tips.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.