Understanding HashMap Implementations: A Comparative Analysis of Hash Tables and HAMT
This article compares two fundamental implementations of the HashMap data structure: the traditional Hash Table and the Hash Array Mapped Trie (HAMT). It analyzes their collision resolution strategies, hash functions, time and space complexities, and practical applications, from Ruby's MRI to functional programming languages.
HashMap is a fundamental data structure widely used across programming languages. This article introduces two common implementation approaches: the traditional Hash Table and the more complex Hash Array Mapped Trie (HAMT), analyzing their respective time and space efficiencies.
Hash Table
The most straightforward implementation uses an array to store elements, mapping keys to indices via a hash function. For example, using modulo arithmetic on an array of length 11 allows O(1) insertion and retrieval. However, collisions occur when multiple keys map to the same index.
def set(key, value)
  @hash[key.to_i % 11] = value
end

Collision Resolution
A common solution uses linked lists to chain elements sharing the same hash index. In the worst case, such as inserting multiples of the array size, lookup degrades to O(n). Practical implementations mitigate this by resizing buckets and rehashing when chains exceed a threshold.
class Node
  attr_accessor :value, :next_node

  # `next` is a reserved keyword in Ruby and cannot be used as a
  # parameter or variable name, so the link is called next_node.
  def initialize(value, next_node = nil)
    @value = value
    @next_node = next_node
  end
end

class Hash
  HASH_NUM = 11

  def initialize
    @hash = Array.new
  end

  def set(key, value)
    hash_key = key.to_i % HASH_NUM
    # Prepend to the chain: the old head becomes the new node's successor.
    @hash[hash_key] = Node.new([key, value], @hash[hash_key])
  end

  def get(key)
    node = @hash[key.to_i % HASH_NUM]
    # Walk the chain from the head, checking every node (including the first).
    while node
      return node.value[1] if node.value[0] == key
      node = node.next_node
    end
    nil
  end
end

Hash Functions
Simple modulo operations have limitations: collision rates are high, and resizing forces an expensive rehash of every key. Better-distributed functions such as MurmurHash reduce collisions, while techniques like consistent hashing limit how many keys must move when the table grows. In Ruby, plain objects are hashed from their identity, while strings and arrays are hashed by aggregating their elements. Cryptographic hashes such as MD5 and SHA-1 are also widely used, though known SHA-1 collisions highlight the security trade-offs.
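As a quick illustration in plain Ruby (no external libraries), the snippet below contrasts Ruby's content-based hashing for strings with a toy modulo function whose poor distribution sends every multiple of 11 to the same bucket:

```ruby
# Equal strings hash equally because String#hash aggregates the contents;
# two distinct plain objects typically do not, since Object#hash is
# derived from object identity.
a = "hello"
b = "hello"
puts a.hash == b.hash   # true: content-based hash

# A toy modulo "hash function" shows why distribution matters:
# every multiple of 11 collides into bucket 0.
buckets = [11, 22, 33, 44].map { |k| k % 11 }
p buckets               # [0, 0, 0, 0]
```

This is exactly the pathological input described above: with such keys, the chained implementation degenerates into a single O(n) linked list.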
Time and Space Complexity
Performance depends heavily on the hash function. In the ideal case, operations take O(1) time with constant extra space, while a poor function degrades lookups to O(n). Ruby optimizes small hashes (≤6 elements) by storing them as arrays and using linear search, bypassing hash calculations entirely, which explains why benchmarks show a drop in performance once a hash literal grows to 7+ elements.
require 'benchmark/ips'

Benchmark.ips do |x|
  x.report(5) do
    { a: 0, b: 1, c: 2, d: 3, e: 4 }
  end
  x.report(6) do
    { a: 0, b: 1, c: 2, d: 3, e: 4, f: 5 }
  end
  x.report(7) do
    { a: 0, b: 1, c: 2, d: 3, e: 4, f: 5, g: 6 }
  end
  x.report(8) do
    { a: 0, b: 1, c: 2, d: 3, e: 4, f: 5, g: 6, h: 7 }
  end
  x.compare!
end

Hash Array Mapped Trie (HAMT)
Functional programming emphasizes immutability and concurrency, driving the adoption of persistent data structures. HAMT serves as an efficient persistent HashMap implementation, preserving old states while creating new versions upon modification.
Vector Trie
Before tackling HAMT, it helps to understand the Vector Trie. It stores data in leaf nodes and uses a trie for indexing rather than chained pointers: an index is split into fixed-width bit groups (its base-32 digits), and each group selects a child at the corresponding level of the tree, giving O(log₃₂N) lookups. While not strictly O(1), this performs extremely well for persistent arrays.
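The digit-by-digit traversal can be sketched in a few lines of Ruby. This is a hypothetical illustration rather than a production Vector Trie: it assumes a node is simply an Array of 32 children, and `trie_get`, `BITS`, and `WIDTH` are names invented for the example.

```ruby
BITS  = 5          # each level consumes 5 bits of the index
WIDTH = 1 << BITS  # 32-way branching

# Walk from the root, using each 5-bit digit (most significant first)
# to pick a child, until we reach the leaf level (depth 0).
def trie_get(root, index, depth)
  node = root
  depth.downto(0) do |level|
    digit = (index >> (level * BITS)) & (WIDTH - 1)
    node = node[digit]
  end
  node
end

# Build a tiny 2-level trie (capacity 32 * 32 = 1024) and read back index 33.
leaf_block = Array.new(WIDTH)
leaf_block[1] = :payload   # digit 1 at the bottom level
root = Array.new(WIDTH)
root[1] = leaf_block       # digit 1 at the top level: index = 1 * 32 + 1 = 33
puts trie_get(root, 33, 1) # payload
```

Because the branching factor is 32, a tree of depth 6 already addresses over a billion elements, which is why the O(log₃₂N) bound behaves like a small constant in practice.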
HAMT Implementation
By adapting Vector Trie for sparse, non-contiguous hash spaces, HAMT efficiently represents hash maps. The wide hash space minimizes collisions, making tree traversal highly competitive despite being slower than direct array access.
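The node compaction that makes sparse hash spaces affordable can be sketched as follows. `BitmapNode` and its `insert`/`get` methods are hypothetical names for this illustration; a real HAMT combines this bitmap trick with the per-level hash-digit traversal.

```ruby
# A 32-bit bitmap records which of the 32 logical slots are occupied,
# and children are stored densely in a small array. A child's array
# position is the popcount of the bitmap bits below its slot, so empty
# slots cost no memory at all.
class BitmapNode
  attr_reader :bitmap, :children

  def initialize
    @bitmap = 0
    @children = []
  end

  # Popcount of the bitmap masked to the bits below `slot`.
  def index_for(slot)
    (@bitmap & ((1 << slot) - 1)).to_s(2).count("1")
  end

  def insert(slot, child)
    @children.insert(index_for(slot), child)
    @bitmap |= (1 << slot)
  end

  def get(slot)
    return nil if (@bitmap & (1 << slot)).zero?
    @children[index_for(slot)]
  end
end

node = BitmapNode.new
node.insert(17, :a)
node.insert(3,  :b)
p node.get(3)    # :b
p node.get(17)   # :a
p node.get(5)    # nil
```

Here two children occupy a 2-element array instead of a 32-slot one, which is the space efficiency the complexity section below refers to.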
Persistent HAMT
HAMT excels in persistence by modifying only affected subtrees. Instead of altering existing nodes, new nodes are created along the path to the root, preserving the original tree until the new version is committed. This minimizes memory overhead and maintains structural sharing.
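Path copying can be sketched with a tiny 4-way trie. `persistent_set`, `FANOUT_BITS`, and `FANOUT` are hypothetical names for this illustration; freezing each copied node emphasizes that committed versions are never mutated.

```ruby
FANOUT_BITS = 2           # a tiny 4-way trie keeps the example readable
FANOUT = 1 << FANOUT_BITS

# Return a NEW root: only the nodes on the path to `index` are copied;
# every untouched subtree is shared with the previous version.
def persistent_set(node, index, depth, value)
  copy = (node || Array.new(FANOUT)).dup
  digit = (index >> (depth * FANOUT_BITS)) & (FANOUT - 1)
  copy[digit] = depth.zero? ? value : persistent_set(copy[digit], index, depth - 1, value)
  copy.freeze
end

v1 = persistent_set(nil, 5, 1, :old)  # index 5 = digits [1, 1]
v2 = persistent_set(v1, 5, 1, :new)   # copies only the root and one leaf block

p v1[1][1]  # :old  -- the original version is untouched
p v2[1][1]  # :new
```

An update touches one node per level, so a modification copies O(log₃₂N) nodes while the rest of the structure is shared between versions.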
Time and Space Complexity
Persistence inherently increases memory usage and slightly slows queries due to structural complexity. However, HAMT achieves near-constant time complexity and high space efficiency by omitting empty tree nodes. Optimizations like path compression and node compaction further enhance performance.
Conclusion
Both Hash Tables and HAMT offer distinct advantages depending on the use case. Traditional Hash Tables provide simplicity and speed, as seen in Ruby MRI, while HAMT supports immutability and concurrency in functional languages. Understanding these implementations provides a solid foundation for selecting the right data structure for specific programming paradigms.
Liulishuo Tech Team