Understanding Hash Tables and Implementing Them in Python
This article explains the concept of hashing and hash tables, describes their characteristics and collision‑resolution strategies, showcases practical Python implementations with load‑factor handling and resizing, and highlights common applications such as dictionaries and network session tables.
Hashing is a computer‑science technique that maps a key to an index using a hash function, creating a hash table that enables fast data retrieval.
A hash table stores data based on keys; the hash function determines the memory location for each key‑value pair.
It provides direct access without comparison, but different keys may map to the same index, causing collisions.
A uniform hash function distributes keys evenly, reducing collisions.
Collision‑resolution methods include open addressing (linear probing, quadratic probing), chaining (storing colliding elements in a linked list), rehashing, and using a separate overflow area.
Typical applications are dictionary implementations, network firewalls session tables, and extensive use in the Linux kernel.
Below is a simple hash table using open addressing with linear probing (Code Segment 1).
<code>class Hash:
# Table length set to 11
def __init__(self):
self.hash_table=[[None,None] for i in range(11)]
# Hash function
def hash(self,k,i):
h_value=(k+i)%11
if self.hash_table[h_value][0]==k:
return h_value
if self.hash_table[h_value][0]!=None:
i+=1
h_value=self.hash(k,i)
return h_value
def put(self,k,v):
hash_v=self.hash(k,0)
self.hash_table[hash_v][0]=k
self.hash_table[hash_v][1]=v
def get(self,k):
hash_v=self.hash(k,0)
return self.hash_table[hash_v][1]
hash = Hash()
hash.put(1 ,'wang')
print(hash.get(1))</code>The simple table has a fixed length of 11, so as more elements are added, the chance of collisions grows.
To address this, the load factor (ratio of stored elements to table size) is introduced; when it exceeds a threshold (e.g., 0.75), the table should be resized, typically doubling its capacity.
An improved implementation with automatic resizing is shown in Code Segment 2.
<code>class Map:
def __init__(self):
self.capacity=11
self.hash_table=[[None,None] for i in range(self.capacity)]
self.num=0
self.load_factor=0.75
def hash(self,k,i):
h_value=(k+i)%self.capacity
if self.hash_table[h_value][0]==k:
return h_value
if self.hash_table[h_value][0]!=None:
i+=1
h_value=self.hash(k,i)
return h_value
def resize(self):
# Expand to twice the number of inserted elements
self.capacity=self.num*2
temp=self.hash_table[:]
self.hash_table=[[None,None] for i in range(self.capacity)]
for i in temp:
if(i[0]!=None):
hash_v=self.hash(i[0],0)
self.hash_table[hash_v][0]=i[0]
self.hash_table[hash_v][1]=i[1]
def put(self,k,v):
hash_v=self.hash(k,0)
self.hash_table[hash_v][0]=k
self.hash_table[hash_v][1]=v
self.num+=1
if(self.num/len(self.hash_table)>self.load_factor):
self.resize()
def get(self,k):
hash_v=self.hash(k,0)
return self.hash_table[hash_v][1]
</code>Python’s built‑in dict uses a hash table; its syntax is demonstrated, followed by a custom dictionary class that implements its own hash function, rehashing, and basic operations (Code Segment 3).
<code>dictionary = {'name': 'wang', 'age': 17, 'class': 'first'}
dictionary['age'] = 18
dictionary['country'] = 'china'
print(dictionary['age'])
print(dictionary['country'])
class MyDictionary(object):
# Initialization
def __init__(self):
self.table_size = 13
self.key_list = [None]*self.table_size
self.value_list = [None]*self.table_size
# Hash function
def hashfuction(self, key):
count_char = 0
key_string = str(key)
for key_char in key_string:
count_char += ord(key_char)
length = len(str(count_char))
if length > 3:
mid_int = 100*int(str(count_char)[length//2-1]) + 10*int(str(count_char)[length//2]) + int(str(count_char)[length//2+1])
else:
mid_int = count_char
return mid_int%self.table_size
# Rehash function (linear probing with step 3)
def rehash(self, hash_value):
return (hash_value+3)%self.table_size
# Set item
def __setitem__(self, key, value):
hash_value = self.hashfuction(key)
if None == self.key_list[hash_value]:
pass
elif key == self.key_list[hash_value]:
pass
else:
hash_value = self.rehash(hash_value)
while(None != self.key_list[hash_value] and (key != self.key_list[hash_value])):
hash_value = self.rehash(hash_value)
self.key_list[hash_value] = key
self.value_list[hash_value] = value
# Get item
def __getitem__(self, key):
hash_value = self.hashfuction(key)
first_hash = hash_value
if None == self.key_list[hash_value]:
return None
elif key == self.key_list[hash_value]:
return self.value_list[hash_value]
else:
hash_value = self.rehash(hash_value)
while(None != self.key_list[hash_value] and (key != self.key_list[hash_value])):
hash_value = self.rehash(hash_value)
if hash_value == first_hash:
return None
if None == self.key_list[hash_value]:
return None
else:
return self.value_list[hash_value]
# Delete item
def __delitem__(self, key):
hash_value = self.hashfuction(key)
first_hash = hash_value
if None == self.key_list[hash_value]:
return
elif key == self.key_list[hash_value]:
self.key_list[hash_value] = None
self.value_list[hash_value] = None
return
else:
hash_value = self.rehash(hash_value)
while(None != self.key_list[hash_value] and (key != self.key_list[hash_value])):
hash_value = self.rehash(hash_value)
if hash_value == first_hash:
return
if None == self.key_list[hash_value]:
return
else:
self.key_list[hash_value] = None
self.value_list[hash_value] = None
return
# Length of dictionary
def __len__(self):
count = 0
for key in self.key_list:
if key != None:
count += 1
return count
def main():
H = MyDictionary()
H["kcat"]="cat"
H["kdog"]="dog"
H["klion"]="lion"
H["ktiger"]="tiger"
H["kbird"]="bird"
H["kcow"]="cow"
H["kgoat"]="goat"
H["pig"]="pig"
H["chicken"]="chicken"
print("Dictionary length is %d"%len(H))
print("Key %s value is %s"%("kcow",H["kcow"]))
print("Dictionary length is %d"%len(H))
print("Key %s value is %s"%("kmonkey",H["kmonkey"]))
print("Dictionary length is %d"%len(H))
del H["klion"]
print("Dictionary length is %d"%len(H))
print(H.key_list)
print(H.value_list)
if __name__ == "__main__":
main()
</code>These examples illustrate how hash tables work, how to handle collisions, and how to manage load factor and resizing in Python.
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.