Understanding HashMap Implementations: A Comparative Analysis of Hash Tables and HAMT
This article compares two fundamental implementations of the HashMap data structure: the traditional Hash Table and the Hash Array Mapped Trie (HAMT). It analyzes their collision resolution strategies, hash functions, time and space complexities, and practical applications, from Ruby's MRI to functional programming languages.
HashMap is a fundamental data structure widely used across programming languages. This article introduces two common implementation approaches: the traditional Hash Table and the more complex Hash Array Mapped Trie (HAMT), analyzing their respective time and space efficiencies.
Hash Table
The most straightforward implementation uses an array to store elements, mapping keys to indices via a hash function. For example, using modulo arithmetic on an array of length 11 allows O(1) insertion and retrieval. However, collisions occur when multiple keys map to the same index.
def set(key, value)
  @hash[key.to_i % 11] = value
end

Collision Resolution
A common solution uses linked lists to chain elements sharing the same hash index. In the worst case, such as inserting multiples of the array size, lookup degrades to O(n). Practical implementations mitigate this by resizing buckets and rehashing when chains exceed a threshold.
class Node
  attr_accessor :value, :next_node

  # `next` is a reserved keyword in Ruby and cannot be used as a
  # parameter or variable name, so the link is called next_node.
  def initialize(value, next_node = nil)
    @value = value
    @next_node = next_node
  end
end

class Hash
  HASH_NUM = 11

  def initialize
    @hash = Array.new
  end

  def set(key, value)
    hash_key = key.to_i % HASH_NUM
    # Prepend to the chain: the old head becomes the new node's successor.
    @hash[hash_key] = Node.new([key, value], @hash[hash_key])
  end

  def get(key)
    node = @hash[key.to_i % HASH_NUM]
    # Walk the chain from the head, checking every node (including the first).
    while node
      return node.value[1] if node.value[0] == key
      node = node.next_node
    end
    nil
  end
end

Hash Functions
Simple modulo operations have limitations: collision rates are high, and resizing forces an expensive rehash of every key. Better-distributed functions such as MurmurHash reduce collisions, while techniques like consistent hashing limit how many keys must move when the table grows. In Ruby, plain objects are hashed from their identity, while strings and arrays are hashed by aggregating their elements. Cryptographic hashes such as MD5 and SHA-1 are also widely used, though known SHA-1 collisions highlight the security trade-offs.
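As a quick illustration in plain Ruby (no external libraries), the snippet below contrasts Ruby's content-based hashing for strings with a toy modulo function whose poor distribution sends every multiple of 11 to the same bucket:

```ruby
# Equal strings hash equally because String#hash aggregates the contents;
# two distinct plain objects typically do not, since Object#hash is
# derived from object identity.
a = "hello"
b = "hello"
puts a.hash == b.hash   # true: content-based hash

# A toy modulo "hash function" shows why distribution matters:
# every multiple of 11 collides into bucket 0.
buckets = [11, 22, 33, 44].map { |k| k % 11 }
p buckets               # [0, 0, 0, 0]
```

This is exactly the pathological input described above: with such keys, the chained implementation degenerates into a single O(n) linked list.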
Time and Space Complexity
Performance depends heavily on the hash function. In the ideal case, operations take O(1) time with constant extra space, while a poor function degrades lookups to O(n). Ruby optimizes small hashes (≤6 elements) by storing them as arrays and using linear search, bypassing hash calculations entirely, which explains why benchmarks show a drop in performance once a hash literal grows to 7+ elements.
require 'benchmark/ips'

Benchmark.ips do |x|
  x.report(5) do
    { a: 0, b: 1, c: 2, d: 3, e: 4 }
  end
  x.report(6) do
    { a: 0, b: 1, c: 2, d: 3, e: 4, f: 5 }
  end
  x.report(7) do
    { a: 0, b: 1, c: 2, d: 3, e: 4, f: 5, g: 6 }
  end
  x.report(8) do
    { a: 0, b: 1, c: 2, d: 3, e: 4, f: 5, g: 6, h: 7 }
  end
  x.compare!
end

Hash Array Mapped Trie (HAMT)
Functional programming emphasizes immutability and concurrency, driving the adoption of persistent data structures. HAMT serves as an efficient persistent HashMap implementation, preserving old states while creating new versions upon modification.
Vector Trie
Before tackling HAMT, it helps to understand the Vector Trie. It stores data in leaf nodes and uses a trie for indexing rather than chained pointers: an index is split into fixed-width bit groups (its base-32 digits), and each group selects a child at the corresponding level of the tree, giving O(log₃₂N) lookups. While not strictly O(1), this performs extremely well for persistent arrays.
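The digit-by-digit traversal can be sketched in a few lines of Ruby. This is a hypothetical illustration rather than a production Vector Trie: it assumes a node is simply an Array of 32 children, and `trie_get`, `BITS`, and `WIDTH` are names invented for the example.

```ruby
BITS  = 5          # each level consumes 5 bits of the index
WIDTH = 1 << BITS  # 32-way branching

# Walk from the root, using each 5-bit digit (most significant first)
# to pick a child, until we reach the leaf level (depth 0).
def trie_get(root, index, depth)
  node = root
  depth.downto(0) do |level|
    digit = (index >> (level * BITS)) & (WIDTH - 1)
    node = node[digit]
  end
  node
end

# Build a tiny 2-level trie (capacity 32 * 32 = 1024) and read back index 33.
leaf_block = Array.new(WIDTH)
leaf_block[1] = :payload   # digit 1 at the bottom level
root = Array.new(WIDTH)
root[1] = leaf_block       # digit 1 at the top level: index = 1 * 32 + 1 = 33
puts trie_get(root, 33, 1) # payload
```

Because the branching factor is 32, a tree of depth 6 already addresses over a billion elements, which is why the O(log₃₂N) bound behaves like a small constant in practice.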
HAMT Implementation
By adapting Vector Trie for sparse, non-contiguous hash spaces, HAMT efficiently represents hash maps. The wide hash space minimizes collisions, making tree traversal highly competitive despite being slower than direct array access.
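The node compaction that makes sparse hash spaces affordable can be sketched as follows. `BitmapNode` and its `insert`/`get` methods are hypothetical names for this illustration; a real HAMT combines this bitmap trick with the per-level hash-digit traversal.

```ruby
# A 32-bit bitmap records which of the 32 logical slots are occupied,
# and children are stored densely in a small array. A child's array
# position is the popcount of the bitmap bits below its slot, so empty
# slots cost no memory at all.
class BitmapNode
  attr_reader :bitmap, :children

  def initialize
    @bitmap = 0
    @children = []
  end

  # Popcount of the bitmap masked to the bits below `slot`.
  def index_for(slot)
    (@bitmap & ((1 << slot) - 1)).to_s(2).count("1")
  end

  def insert(slot, child)
    @children.insert(index_for(slot), child)
    @bitmap |= (1 << slot)
  end

  def get(slot)
    return nil if (@bitmap & (1 << slot)).zero?
    @children[index_for(slot)]
  end
end

node = BitmapNode.new
node.insert(17, :a)
node.insert(3,  :b)
p node.get(3)    # :b
p node.get(17)   # :a
p node.get(5)    # nil
```

Here two children occupy a 2-element array instead of a 32-slot one, which is the space efficiency the complexity section below refers to.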
Persistent HAMT
HAMT excels in persistence by modifying only affected subtrees. Instead of altering existing nodes, new nodes are created along the path to the root, preserving the original tree until the new version is committed. This minimizes memory overhead and maintains structural sharing.
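Path copying can be sketched with a tiny 4-way trie. `persistent_set`, `FANOUT_BITS`, and `FANOUT` are hypothetical names for this illustration; freezing each copied node emphasizes that committed versions are never mutated.

```ruby
FANOUT_BITS = 2           # a tiny 4-way trie keeps the example readable
FANOUT = 1 << FANOUT_BITS

# Return a NEW root: only the nodes on the path to `index` are copied;
# every untouched subtree is shared with the previous version.
def persistent_set(node, index, depth, value)
  copy = (node || Array.new(FANOUT)).dup
  digit = (index >> (depth * FANOUT_BITS)) & (FANOUT - 1)
  copy[digit] = depth.zero? ? value : persistent_set(copy[digit], index, depth - 1, value)
  copy.freeze
end

v1 = persistent_set(nil, 5, 1, :old)  # index 5 = digits [1, 1]
v2 = persistent_set(v1, 5, 1, :new)   # copies only the root and one leaf block

p v1[1][1]  # :old  -- the original version is untouched
p v2[1][1]  # :new
```

An update touches one node per level, so a modification copies O(log₃₂N) nodes while the rest of the structure is shared between versions.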
Time and Space Complexity
Persistence inherently increases memory usage and slightly slows queries due to structural complexity. However, HAMT achieves near-constant time complexity and high space efficiency by omitting empty tree nodes. Optimizations like path compression and node compaction further enhance performance.
Conclusion
Both Hash Tables and HAMT offer distinct advantages depending on the use case. Traditional Hash Tables provide simplicity and speed, as seen in Ruby MRI, while HAMT supports immutability and concurrency in functional languages. Understanding these implementations provides a solid foundation for selecting the right data structure for specific programming paradigms.
Liulishuo Tech Team