Understanding Hash Tables: Concepts, Collision Resolution, and Python Implementations
This article explains the concept of hashing and hash tables, describes their characteristics and common collision‑resolution strategies such as open addressing, linear and quadratic probing, chaining, and rehashing, and provides complete Python implementations including a simple hash class, a dynamic map class, and a custom dictionary.
Hashing is a method of processing data using a hash function that maps keys to indices, creating a hash table for efficient search.
A hash table stores key‑value pairs directly in memory based on the hash of the key. The hash function determines the storage location.
Characteristics of hash tables include direct access, possible collisions when different keys map to the same index, and the desire for uniform hash functions to reduce collisions.
Collision handling methods covered are open addressing (linear probing, quadratic probing), chaining (linked list at each bucket), rehashing, and using a separate overflow area.
Open addressing uses a probe function such as h(k,i) = (hash(k) + i) % table_size , where i is the probe count.
Code segment 1 shows a simple fixed‑size hash table implementation in Python using linear probing.
class Hash:
# 表的长度定位11
def __init__(self):
self.hash_table=[[None,None] for i in range(11)]
# 散列函数
def hash(self,k,i):
h_value=(k+i)%11
if self.hash_table[h_value][0]==k:
return h_value
if self.hash_table[h_value][0]!=None:
i+=1
h_value=self.hash(k,i)
return h_value
def put(self,k,v):
hash_v=self.hash(k,0)
self.hash_table[hash_v][0]=k
self.hash_table[hash_v][1]=v
def get(self,k):
hash_v=self.hash(k,0)
return self.hash_table[hash_v][1]
hash = Hash()
hash.put(1 ,'wang')
print(hash.get(1))The load factor concept is introduced; when the ratio of stored elements exceeds 0.75, the table should be resized, as demonstrated in code segment 2.
class Map:
def __init__(self):
self.capacity=11
self.hash_table=[[None,None] for i in range(self.capacity)]
self.num=0
self.load_factor=0.75
def hash(self,k,i):
h_value=(k+i)%self.capacity
if self.hash_table[h_value][0]==k:
return h_value
if self.hash_table[h_value][0]!=None:
i+=1
h_value=self.hash(k,i)
return h_value
def resize(self):
self.capacity=self.num*2
temp=self.hash_table[:]
self.hash_table=[[None,None] for i in range(self.capacity)]
for i in temp:
if(i[0]!=None):
hash_v=self.hash(i[0],0)
self.hash_table[hash_v][0]=i[0]
self.hash_table[hash_v][1]=i[1]
def put(self,k,v):
hash_v=self.hash(k,0)
self.hash_table[hash_v][0]=k
self.hash_table[hash_v][1]=v
self.num+=1
if(self.num/len(self.hash_table)>self.load_factor):
self.resize()
def get(self,k):
hash_v=self.hash(k,0)
return self.hash_table[hash_v][1]A practical example is Python's built‑in dictionary, which uses a hash table internally. A custom dictionary class (MyDictionary) is implemented to illustrate insertion, lookup, deletion, and length operations with collision handling.
class MyDictionary(object):
# 字典类的初始化
def __init__(self):
self.table_size = 13 # 哈希表的大小
self.key_list = [None]*self.table_size #用以存储key的列表
self.value_list = [None]*self.table_size #用以存储value的列表
# 散列函数,返回散列值
def hashfuction(self, key):
count_char = 0
key_string = str(key)
for key_char in key_string:
count_char += ord(key_char)
length = len(str(count_char))
if length > 3 :
mid_int = 100*int((str(count_char)[length//2-1])) \
+ 10*int((str(count_char)[length//2])) \
+ 1*int((str(count_char)[length//2+1]))
else:
mid_int = count_char
return mid_int%self.table_size
def rehash(self, hash_value):
return (hash_value+3)%self.table_size #线性探测步长为3
def __setitem__(self, key, value):
hash_value = self.hashfuction(key)
if None == self.key_list[hash_value]:
pass
elif key == self.key_list[hash_value]:
pass
else:
hash_value = self.rehash(hash_value)
while (None != self.key_list[hash_value]) and (key != self.key_list[hash_value]):
hash_value = self.rehash(hash_value)
self.key_list[hash_value] = key
self.value_list[hash_value] = value
def __getitem__(self, key):
hash_value = self.hashfuction(key)
first_hash = hash_value
if None == self.key_list[hash_value]:
return None
elif key == self.key_list[hash_value]:
return self.value_list[hash_value]
else:
hash_value = self.rehash(hash_value)
while (None != self.key_list[hash_value]) and (key != self.key_list[hash_value]):
hash_value = self.rehash(hash_value)
if hash_value == first_hash:
return None
if None == self.key_list[hash_value]:
return None
else:
return self.value_list[hash_value]
def __delitem__(self, key):
hash_value = self.hashfuction(key)
first_hash = hash_value
if None == self.key_list[hash_value]:
return
elif key == self.key_list[hash_value]:
self.key_list[hash_value] = None
self.value_list[hash_value] = None
return
else:
hash_value = self.rehash(hash_value)
while (None != self.key_list[hash_value]) and (key != self.key_list[hash_value]):
hash_value = self.rehash(hash_value)
if hash_value == first_hash:
return
if None == self.key_list[hash_value]:
return
else:
self.key_list[hash_value] = None
self.value_list[hash_value] = None
return
def __len__(self):
count = 0
for key in self.key_list:
if key != None:
count += 1
return count
def main():
H = MyDictionary()
H["kcat"]="cat"
H["kdog"]="dog"
H["klion"]="lion"
H["ktiger"]="tiger"
H["kbird"]="bird"
H["kcow"]="cow"
H["kgoat"]="goat"
H["pig"]="pig"
H["chicken"]="chicken"
print("字典的长度为%d"%len(H))
print("键 %s 的值为为 %s"%("kcow",H["kcow"]))
print("字典的长度为%d"%len(H))
print("键 %s 的值为为 %s"%("kmonkey",H["kmonkey"]))
print("字典的长度为%d"%len(H))
del H["klion"]
print("字典的长度为%d"%len(H))
print(H.key_list)
print(H.value_list)
if __name__ == "__main__": main()Overall, the article provides a comprehensive overview of hash tables, their properties, collision resolution techniques, and concrete Python code examples for building and extending hash‑based data structures.
Top Architecture Tech Stack
Sharing Java and Python tech insights, with occasional practical development tool tips.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.