When to use hash tables?

Use when fast search is needed. Can find values in O(1) average time complexity.

How to resolve hash collisions?

Use Chaining (linked lists) or Open Addressing (find another empty slot).

Is Python dict a hash table?

Yes, Python dict is implemented as a hash table.

Hash Table | O(1) Search Data Structure Complete Guide

2026년 3월 28일 · 20분 읽기 · 수정 2026년 3월 28일 Intermediate Tutorial

이 글의 핵심

Practical guide to hash tables. Complete coverage of O(1) search data structure with detailed examples and problem-solving patterns.

Introduction

What is a Hash Table?

A hash table is a data structure that stores key-value pairs, enabling search/insertion/deletion in average O(1) time complexity. O(1) means nearly constant time regardless of input size.

Other names:

Hash Map
Dictionary (Python)
Associative Array
Map (C++, Java)

Why is it fast?

Arrays allow O(1) access by index. Hash tables use a hash function (converts key to fixed-length number/slot) to transform keys into indices, enabling array-like fast access.

# Array: access by index
arr = [10, 20, 30, 40, 50]
print(arr[2])  # 30 (O(1))

# Hash table: access by key
hash_table = {"apple": 100, "banana": 200}
print(hash_table["apple"])  # 100 (O(1))
# Internally: hash("apple") % size → index → value

Time Complexity Comparison:

Data Structure	Search	Insert	Delete
Array (unsorted)	O(n)	O(1)	O(n)
Array (sorted)	O(log n)	O(n)	O(n)
Linked List	O(n)	O(1)	O(n)
Hash Table	O(1)	O(1)	O(1)

1. Hash Function

What is a Hash Function?

A hash function converts arbitrary-size data into a fixed-size value (hash value).

Good Hash Function Properties:

✅ Deterministic: Same input → same output
✅ Fast Computation: O(1)
✅ Uniform Distribution: Hash values evenly distributed
✅ Minimize Collisions: Different inputs → different outputs (ideally)

Simple Hash Function Example

# Method 1: Python built-in hash() + modulo
def simple_hash(key, size):
    return hash(key) % size

print(simple_hash("apple", 10))   # 8 (store at index 8)
print(simple_hash("banana", 10))  # 3 (store at index 3)
print(simple_hash("cherry", 10))  # 6 (store at index 6)

# Method 2: String hash (manual implementation)
def string_hash(s, size):
    hash_value = 0
    for char in s:
        hash_value = (hash_value * 31 + ord(char)) % size
    return hash_value

print(string_hash("apple", 10))

# Python built-in hash()
print(hash("apple"))   # -6076896708304529682 (example)
print(hash(42))        # 42 (integers hash to themselves)
print(hash((1, 2)))    # 3713081631934410656 (tuple hash)

# ❌ Mutable objects cannot be hashed
# print(hash([1, 2]))  # TypeError: unhashable type: 'list'

Hash Collision

Collision occurs when different keys produce the same hash value.

# Collision example
def bad_hash(key, size):
    return len(key) % size  # Bad hash function

print(bad_hash("apple", 10))   # 5
print(bad_hash("grape", 10))   # 5 (collision!)
print(bad_hash("peach", 10))   # 5 (collision!)

2. Python dict Usage

Basic Operations

# Creation
hash_table = {}
hash_table = dict()

# Insert O(1)
hash_table["apple"] = 100
hash_table["banana"] = 200
hash_table["cherry"] = 300

# Search O(1)
print(hash_table["apple"])  # 100
print(hash_table.get("grape", 0))  # 0 (default)

# Update O(1)
hash_table["apple"] = 150
print(hash_table["apple"])  # 150

# Delete O(1)
del hash_table["apple"]
print("apple" in hash_table)  # False

# Existence check O(1)
if "banana" in hash_table:
    print("exists")

# Iteration O(n)
for key in hash_table:
    print(key, hash_table[key])

for key, value in hash_table.items():
    print(f"{key}: {value}")

# Keys/values lists
keys = list(hash_table.keys())
values = list(hash_table.values())

3. Collision Resolution

Method 1: Chaining

Store values with same index in a linked list.

class HashTableChaining:
    def __init__(self, size=10):
        self.size = size
        self.table = [[] for _ in range(size)]
    
    def _hash(self, key):
        return hash(key) % self.size
    
    def insert(self, key, value):
        idx = self._hash(key)
        
        # Update if key exists
        for i, (k, v) in enumerate(self.table[idx]):
            if k == key:
                self.table[idx][i] = (key, value)
                return
        
        # Add if not exists
        self.table[idx].append((key, value))
    
    def get(self, key):
        idx = self._hash(key)
        
        for k, v in self.table[idx]:
            if k == key:
                return v
        
        return None

# Test
ht = HashTableChaining(size=5)
ht.insert("apple", 100)
ht.insert("banana", 200)
print(ht.get("apple"))  # 100

Method 2: Open Addressing

Find another empty slot on collision.

class HashTableLinearProbing:
    def __init__(self, size=10):
        self.size = size
        self.keys = [None] * size
        self.values = [None] * size
    
    def _hash(self, key):
        return hash(key) % self.size
    
    def insert(self, key, value):
        idx = self._hash(key)
        
        while self.keys[idx] is not None:
            if self.keys[idx] == key:
                self.values[idx] = value
                return
            idx = (idx + 1) % self.size
        
        self.keys[idx] = key
        self.values[idx] = value
    
    def get(self, key):
        idx = self._hash(key)
        
        while self.keys[idx] is not None:
            if self.keys[idx] == key:
                return self.values[idx]
            idx = (idx + 1) % self.size
        
        return None

4. Problem Solving

Problem 1: Two Sum

def two_sum(arr, target):
    """
    Find indices of two numbers that sum to target
    [2, 7, 11, 15], target=9 → [0, 1]
    
    Time: O(n), Space: O(n)
    """
    seen = {}  # value: index
    
    for i, num in enumerate(arr):
        complement = target - num
        
        if complement in seen:
            return [seen[complement], i]
        
        seen[num] = i
    
    return []

# Test
arr = [2, 7, 11, 15]
print(two_sum(arr, 9))   # [0, 1]
print(two_sum(arr, 13))  # [0, 2]

Problem 2: Group Anagrams

def group_anagrams(strs):
    """
    Group anagrams together
    ["eat", "tea", "tan", "ate", "nat", "bat"]
    → [["eat", "tea", "ate"], ["tan", "nat"], ["bat"]]
    
    Time: O(n * k log k), Space: O(n * k)
    """
    anagrams = {}
    
    for word in strs:
        key = ''.join(sorted(word))
        
        if key not in anagrams:
            anagrams[key] = []
        anagrams[key].append(word)
    
    return list(anagrams.values())

# Test
strs = ["eat", "tea", "tan", "ate", "nat", "bat"]
print(group_anagrams(strs))
# [['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]

Problem 3: Longest Substring Without Repeating Characters

def length_of_longest_substring(s):
    """
    "abcabcbb" → 3 ("abc")
    "bbbbb" → 1 ("b")
    
    Time: O(n), Space: O(min(n, m))
    """
    char_index = {}
    max_length = 0
    start = 0
    
    for i, char in enumerate(s):
        if char in char_index and char_index[char] >= start:
            start = char_index[char] + 1
        
        char_index[char] = i
        max_length = max(max_length, i - start + 1)
    
    return max_length

# Test
print(length_of_longest_substring("abcabcbb"))  # 3
print(length_of_longest_substring("bbbbb"))     # 1
print(length_of_longest_substring("pwwkew"))    # 3

5. Python Collections Module

Counter

from collections import Counter

# Create
counter = Counter([1, 2, 2, 3, 3, 3])
print(counter)  # Counter({3: 3, 2: 2, 1: 1})

# String count
text = "hello world"
counter = Counter(text)
print(counter)  # Counter({'l': 3, 'o': 2, ...})

# most_common(): most frequent elements
print(counter.most_common(3))  # [('l', 3), ('o', 2), ('h', 1)]

# Operations
c1 = Counter(['a', 'b', 'c', 'a'])
c2 = Counter(['a', 'b', 'b'])

print(c1 + c2)  # Counter({'a': 3, 'b': 3, 'c': 1})
print(c1 - c2)  # Counter({'a': 1, 'c': 1})
print(c1 & c2)  # Counter({'a': 1, 'b': 1}) (intersection)
print(c1 | c2)  # Counter({'a': 2, 'b': 2, 'c': 1}) (union)

defaultdict

from collections import defaultdict

# int default (0)
word_count = defaultdict(int)
for word in ["apple", "banana", "apple"]:
    word_count[word] += 1
print(dict(word_count))  # {'apple': 2, 'banana': 1}

# list default ([])
groups = defaultdict(list)
for name, grade in [("Alice", "A"), ("Bob", "B"), ("Charlie", "A")]:
    groups[grade].append(name)
print(dict(groups))  # {'A': ['Alice', 'Charlie'], 'B': ['Bob']}

LRU Cache Implementation

from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.cache = OrderedDict()
        self.capacity = capacity
    
    def get(self, key):
        if key not in self.cache:
            return -1
        
        self.cache.move_to_end(key)
        return self.cache[key]
    
    def put(self, key, value):
        if key in self.cache:
            self.cache.move_to_end(key)
        
        self.cache[key] = value
        
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)

# Test
cache = LRUCache(3)
cache.put("a", 1)
cache.put("b", 2)
cache.put("c", 3)

cache.get("a")
cache.put("d", 4)  # Removes 'b'

6. Usage Patterns

Pattern 1: Frequency Counting

def char_frequency(s):
    freq = {}
    for char in s:
        freq[char] = freq.get(char, 0) + 1
    return freq

print(char_frequency("hello"))  # {'h': 1, 'e': 1, 'l': 2, 'o': 1}

# Using Counter
from collections import Counter
print(dict(Counter("hello")))

Pattern 2: Grouping

from collections import defaultdict

words = ["apple", "pie", "banana", "cat", "dog", "cherry"]

groups = defaultdict(list)
for word in words:
    groups[len(word)].append(word)

print(dict(groups))
# {5: ['apple'], 3: ['pie', 'cat', 'dog'], 6: ['banana', 'cherry']}

Pattern 3: Caching (Memoization)

# Fibonacci with caching
def fib_fast(n, cache={}):
    if n in cache:
        return cache[n]
    
    if n <= 1:
        return n
    
    cache[n] = fib_fast(n-1, cache) + fib_fast(n-2, cache)
    return cache[n]

print(fib_fast(30))  # 832040 (fast)

# Using functools.lru_cache (recommended)
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    if n <= 1:
        return n
    return fib(n-1) + fib(n-2)

print(fib(30))  # 832040 (very fast)

Pattern 4: Index Mapping

def find_indices(arr, target):
    """Find all indices of target value"""
    index_map = {}
    
    for i, num in enumerate(arr):
        if num not in index_map:
            index_map[num] = []
        index_map[num].append(i)
    
    return index_map.get(target, [])

arr = [1, 2, 3, 2, 4, 2, 5]
print(find_indices(arr, 2))  # [1, 3, 5]

7. Common Mistakes

Mistake 1: Using Mutable Objects as Keys

# ❌ Wrong
# hash_table = {[1, 2]: "value"}  # TypeError: unhashable type: 'list'

# ✅ Correct
hash_table = {(1, 2): "value"}  # Use tuple
print(hash_table[(1, 2)])  # value

Mistake 2: Missing KeyError Handling

# ❌ Wrong
hash_table = {"a": 1}
# print(hash_table["b"])  # KeyError: 'b'

# ✅ Correct 1: Use get()
value = hash_table.get("b", 0)
print(value)  # 0

# ✅ Correct 2: Check with in
if "b" in hash_table:
    print(hash_table["b"])
else:
    print("Key not found")

Mistake 3: Modifying During Iteration

# ❌ Wrong
hash_table = {"a": 1, "b": 2, "c": 3}
# for key in hash_table:
#     if hash_table[key] == 2:
#         del hash_table[key]  # RuntimeError

# ✅ Correct 1: Convert to list
hash_table = {"a": 1, "b": 2, "c": 3}
for key in list(hash_table.keys()):
    if hash_table[key] == 2:
        del hash_table[key]

# ✅ Correct 2: Dictionary comprehension
hash_table = {"a": 1, "b": 2, "c": 3}
hash_table = {k: v for k, v in hash_table.items() if v != 2}

8. Performance Optimization

Load Factor Management

class DynamicHashTable:
    def __init__(self, initial_size=8):
        self.size = initial_size
        self.count = 0
        self.table = [[] for _ in range(self.size)]
    
    def _load_factor(self):
        return self.count / self.size
    
    def _resize(self):
        """Double size when load factor > 0.7"""
        old_table = self.table
        self.size *= 2
        self.table = [[] for _ in range(self.size)]
        self.count = 0
        
        for bucket in old_table:
            for key, value in bucket:
                self.insert(key, value)
    
    def insert(self, key, value):
        if self._load_factor() > 0.7:
            self._resize()
        
        idx = self._hash(key)
        
        for i, (k, v) in enumerate(self.table[idx]):
            if k == key:
                self.table[idx][i] = (key, value)
                return
        
        self.table[idx].append((key, value))
        self.count += 1

Summary

Key Points

Hash Table: Key-value storage, average O(1) search/insert/delete
Hash Function: Convert key to index, uniform distribution important
Collision Resolution:
- Chaining: Store in linked list
- Open Addressing: Find another empty slot
Python Tools:
- dict: Basic hash table
- Counter: Frequency counting
- defaultdict: Auto-create default values
- OrderedDict: Maintain order + extra features

When to Use

Situation	Data Structure	Time Complexity
Fast search	Hash Table	O(1)
Frequency count	Counter	O(n)
Duplicate check	set	O(n)
Grouping	defaultdict	O(n)
Caching	dict / LRU Cache	O(1)

Time Complexity Improvement Pattern

# Before: O(n²) - nested loops
def has_duplicate_slow(arr):
    for i in range(len(arr)):
        for j in range(i+1, len(arr)):
            if arr[i] == arr[j]:
                return True
    return False

# After: O(n) - hash table
def has_duplicate_fast(arr):
    seen = set()
    for num in arr:
        if num in seen:
            return True
        seen.add(num)
    return False

# Or simpler
def has_duplicate(arr):
    return len(arr) != len(set(arr))

Keywords

Hash Table, HashMap, Dictionary, Data Structure, O(1) Search, Hash Function, Collision Resolution, Chaining, Open Addressing, Counter, defaultdict, Coding Interview, Algorithm