Hash Table | O(1) Search Data Structure Complete Guide
이 글의 핵심
Practical guide to hash tables. Complete coverage of O(1) search data structure with detailed examples and problem-solving patterns.
Introduction
What is a Hash Table?
A hash table is a data structure that stores key-value pairs, enabling search/insertion/deletion in average O(1) time complexity. O(1) means nearly constant time regardless of input size.
Other names:
- Hash Map
- Dictionary (Python)
- Associative Array
- Map (C++, Java)
Why is it fast?
Arrays allow O(1) access by index. Hash tables use a hash function (converts key to fixed-length number/slot) to transform keys into indices, enabling array-like fast access.
# Array: access by index
arr = [10, 20, 30, 40, 50]
print(arr[2]) # 30 (O(1))
# Hash table: access by key
hash_table = {"apple": 100, "banana": 200}
print(hash_table["apple"]) # 100 (O(1))
# Internally: hash("apple") % size → index → value
Time Complexity Comparison:
| Data Structure | Search | Insert | Delete |
|---|---|---|---|
| Array (unsorted) | O(n) | O(1) | O(n) |
| Array (sorted) | O(log n) | O(n) | O(n) |
| Linked List | O(n) | O(1) | O(n) |
| Hash Table | O(1) | O(1) | O(1) |
1. Hash Function
What is a Hash Function?
A hash function converts arbitrary-size data into a fixed-size value (hash value).
Good Hash Function Properties:
- ✅ Deterministic: Same input → same output
- ✅ Fast Computation: O(1)
- ✅ Uniform Distribution: Hash values evenly distributed
- ✅ Minimize Collisions: Different inputs → different outputs (ideally)
Simple Hash Function Example
# Method 1: Python built-in hash() + modulo
def simple_hash(key, size):
return hash(key) % size
print(simple_hash("apple", 10)) # 8 (store at index 8)
print(simple_hash("banana", 10)) # 3 (store at index 3)
print(simple_hash("cherry", 10)) # 6 (store at index 6)
# Method 2: String hash (manual implementation)
def string_hash(s, size):
hash_value = 0
for char in s:
hash_value = (hash_value * 31 + ord(char)) % size
return hash_value
print(string_hash("apple", 10))
# Python built-in hash()
print(hash("apple")) # -6076896708304529682 (example)
print(hash(42)) # 42 (integers hash to themselves)
print(hash((1, 2))) # 3713081631934410656 (tuple hash)
# ❌ Mutable objects cannot be hashed
# print(hash([1, 2])) # TypeError: unhashable type: 'list'
Hash Collision
Collision occurs when different keys produce the same hash value.
# Collision example
def bad_hash(key, size):
return len(key) % size # Bad hash function
print(bad_hash("apple", 10)) # 5
print(bad_hash("grape", 10)) # 5 (collision!)
print(bad_hash("peach", 10)) # 5 (collision!)
2. Python dict Usage
Basic Operations
# Creation
hash_table = {}
hash_table = dict()
# Insert O(1)
hash_table["apple"] = 100
hash_table["banana"] = 200
hash_table["cherry"] = 300
# Search O(1)
print(hash_table["apple"]) # 100
print(hash_table.get("grape", 0)) # 0 (default)
# Update O(1)
hash_table["apple"] = 150
print(hash_table["apple"]) # 150
# Delete O(1)
del hash_table["apple"]
print("apple" in hash_table) # False
# Existence check O(1)
if "banana" in hash_table:
print("exists")
# Iteration O(n)
for key in hash_table:
print(key, hash_table[key])
for key, value in hash_table.items():
print(f"{key}: {value}")
# Keys/values lists
keys = list(hash_table.keys())
values = list(hash_table.values())
3. Collision Resolution
Method 1: Chaining
Store values with same index in a linked list.
class HashTableChaining:
def __init__(self, size=10):
self.size = size
self.table = [[] for _ in range(size)]
def _hash(self, key):
return hash(key) % self.size
def insert(self, key, value):
idx = self._hash(key)
# Update if key exists
for i, (k, v) in enumerate(self.table[idx]):
if k == key:
self.table[idx][i] = (key, value)
return
# Add if not exists
self.table[idx].append((key, value))
def get(self, key):
idx = self._hash(key)
for k, v in self.table[idx]:
if k == key:
return v
return None
# Test
ht = HashTableChaining(size=5)
ht.insert("apple", 100)
ht.insert("banana", 200)
print(ht.get("apple")) # 100
Method 2: Open Addressing
Find another empty slot on collision.
class HashTableLinearProbing:
def __init__(self, size=10):
self.size = size
self.keys = [None] * size
self.values = [None] * size
def _hash(self, key):
return hash(key) % self.size
def insert(self, key, value):
idx = self._hash(key)
while self.keys[idx] is not None:
if self.keys[idx] == key:
self.values[idx] = value
return
idx = (idx + 1) % self.size
self.keys[idx] = key
self.values[idx] = value
def get(self, key):
idx = self._hash(key)
while self.keys[idx] is not None:
if self.keys[idx] == key:
return self.values[idx]
idx = (idx + 1) % self.size
return None
4. Problem Solving
Problem 1: Two Sum
def two_sum(arr, target):
"""
Find indices of two numbers that sum to target
[2, 7, 11, 15], target=9 → [0, 1]
Time: O(n), Space: O(n)
"""
seen = {} # value: index
for i, num in enumerate(arr):
complement = target - num
if complement in seen:
return [seen[complement], i]
seen[num] = i
return []
# Test
arr = [2, 7, 11, 15]
print(two_sum(arr, 9)) # [0, 1]
print(two_sum(arr, 13)) # [0, 2]
Problem 2: Group Anagrams
def group_anagrams(strs):
"""
Group anagrams together
["eat", "tea", "tan", "ate", "nat", "bat"]
→ [["eat", "tea", "ate"], ["tan", "nat"], ["bat"]]
Time: O(n * k log k), Space: O(n * k)
"""
anagrams = {}
for word in strs:
key = ''.join(sorted(word))
if key not in anagrams:
anagrams[key] = []
anagrams[key].append(word)
return list(anagrams.values())
# Test
strs = ["eat", "tea", "tan", "ate", "nat", "bat"]
print(group_anagrams(strs))
# [['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]
Problem 3: Longest Substring Without Repeating Characters
def length_of_longest_substring(s):
"""
"abcabcbb" → 3 ("abc")
"bbbbb" → 1 ("b")
Time: O(n), Space: O(min(n, m))
"""
char_index = {}
max_length = 0
start = 0
for i, char in enumerate(s):
if char in char_index and char_index[char] >= start:
start = char_index[char] + 1
char_index[char] = i
max_length = max(max_length, i - start + 1)
return max_length
# Test
print(length_of_longest_substring("abcabcbb")) # 3
print(length_of_longest_substring("bbbbb")) # 1
print(length_of_longest_substring("pwwkew")) # 3
5. Python Collections Module
Counter
from collections import Counter
# Create
counter = Counter([1, 2, 2, 3, 3, 3])
print(counter) # Counter({3: 3, 2: 2, 1: 1})
# String count
text = "hello world"
counter = Counter(text)
print(counter) # Counter({'l': 3, 'o': 2, ...})
# most_common(): most frequent elements
print(counter.most_common(3)) # [('l', 3), ('o', 2), ('h', 1)]
# Operations
c1 = Counter(['a', 'b', 'c', 'a'])
c2 = Counter(['a', 'b', 'b'])
print(c1 + c2) # Counter({'a': 3, 'b': 3, 'c': 1})
print(c1 - c2) # Counter({'a': 1, 'c': 1})
print(c1 & c2) # Counter({'a': 1, 'b': 1}) (intersection)
print(c1 | c2) # Counter({'a': 2, 'b': 2, 'c': 1}) (union)
defaultdict
from collections import defaultdict
# int default (0)
word_count = defaultdict(int)
for word in ["apple", "banana", "apple"]:
word_count[word] += 1
print(dict(word_count)) # {'apple': 2, 'banana': 1}
# list default ([])
groups = defaultdict(list)
for name, grade in [("Alice", "A"), ("Bob", "B"), ("Charlie", "A")]:
groups[grade].append(name)
print(dict(groups)) # {'A': ['Alice', 'Charlie'], 'B': ['Bob']}
LRU Cache Implementation
from collections import OrderedDict
class LRUCache:
def __init__(self, capacity):
self.cache = OrderedDict()
self.capacity = capacity
def get(self, key):
if key not in self.cache:
return -1
self.cache.move_to_end(key)
return self.cache[key]
def put(self, key, value):
if key in self.cache:
self.cache.move_to_end(key)
self.cache[key] = value
if len(self.cache) > self.capacity:
self.cache.popitem(last=False)
# Test
cache = LRUCache(3)
cache.put("a", 1)
cache.put("b", 2)
cache.put("c", 3)
cache.get("a")
cache.put("d", 4) # Removes 'b'
6. Usage Patterns
Pattern 1: Frequency Counting
def char_frequency(s):
freq = {}
for char in s:
freq[char] = freq.get(char, 0) + 1
return freq
print(char_frequency("hello")) # {'h': 1, 'e': 1, 'l': 2, 'o': 1}
# Using Counter
from collections import Counter
print(dict(Counter("hello")))
Pattern 2: Grouping
from collections import defaultdict
words = ["apple", "pie", "banana", "cat", "dog", "cherry"]
groups = defaultdict(list)
for word in words:
groups[len(word)].append(word)
print(dict(groups))
# {5: ['apple'], 3: ['pie', 'cat', 'dog'], 6: ['banana', 'cherry']}
Pattern 3: Caching (Memoization)
# Fibonacci with caching
def fib_fast(n, cache={}):
if n in cache:
return cache[n]
if n <= 1:
return n
cache[n] = fib_fast(n-1, cache) + fib_fast(n-2, cache)
return cache[n]
print(fib_fast(30)) # 832040 (fast)
# Using functools.lru_cache (recommended)
from functools import lru_cache
@lru_cache(maxsize=None)
def fib(n):
if n <= 1:
return n
return fib(n-1) + fib(n-2)
print(fib(30)) # 832040 (very fast)
Pattern 4: Index Mapping
def find_indices(arr, target):
"""Find all indices of target value"""
index_map = {}
for i, num in enumerate(arr):
if num not in index_map:
index_map[num] = []
index_map[num].append(i)
return index_map.get(target, [])
arr = [1, 2, 3, 2, 4, 2, 5]
print(find_indices(arr, 2)) # [1, 3, 5]
7. Common Mistakes
Mistake 1: Using Mutable Objects as Keys
# ❌ Wrong
# hash_table = {[1, 2]: "value"} # TypeError: unhashable type: 'list'
# ✅ Correct
hash_table = {(1, 2): "value"} # Use tuple
print(hash_table[(1, 2)]) # value
Mistake 2: Missing KeyError Handling
# ❌ Wrong
hash_table = {"a": 1}
# print(hash_table["b"]) # KeyError: 'b'
# ✅ Correct 1: Use get()
value = hash_table.get("b", 0)
print(value) # 0
# ✅ Correct 2: Check with in
if "b" in hash_table:
print(hash_table["b"])
else:
print("Key not found")
Mistake 3: Modifying During Iteration
# ❌ Wrong
hash_table = {"a": 1, "b": 2, "c": 3}
# for key in hash_table:
# if hash_table[key] == 2:
# del hash_table[key] # RuntimeError
# ✅ Correct 1: Convert to list
hash_table = {"a": 1, "b": 2, "c": 3}
for key in list(hash_table.keys()):
if hash_table[key] == 2:
del hash_table[key]
# ✅ Correct 2: Dictionary comprehension
hash_table = {"a": 1, "b": 2, "c": 3}
hash_table = {k: v for k, v in hash_table.items() if v != 2}
8. Performance Optimization
Load Factor Management
class DynamicHashTable:
def __init__(self, initial_size=8):
self.size = initial_size
self.count = 0
self.table = [[] for _ in range(self.size)]
def _load_factor(self):
return self.count / self.size
def _resize(self):
"""Double size when load factor > 0.7"""
old_table = self.table
self.size *= 2
self.table = [[] for _ in range(self.size)]
self.count = 0
for bucket in old_table:
for key, value in bucket:
self.insert(key, value)
def insert(self, key, value):
if self._load_factor() > 0.7:
self._resize()
idx = self._hash(key)
for i, (k, v) in enumerate(self.table[idx]):
if k == key:
self.table[idx][i] = (key, value)
return
self.table[idx].append((key, value))
self.count += 1
Summary
Key Points
- Hash Table: Key-value storage, average O(1) search/insert/delete
- Hash Function: Convert key to index, uniform distribution important
- Collision Resolution:
- Chaining: Store in linked list
- Open Addressing: Find another empty slot
- Python Tools:
dict: Basic hash tableCounter: Frequency countingdefaultdict: Auto-create default valuesOrderedDict: Maintain order + extra features
When to Use
| Situation | Data Structure | Time Complexity |
|---|---|---|
| Fast search | Hash Table | O(1) |
| Frequency count | Counter | O(n) |
| Duplicate check | set | O(n) |
| Grouping | defaultdict | O(n) |
| Caching | dict / LRU Cache | O(1) |
Time Complexity Improvement Pattern
# Before: O(n²) - nested loops
def has_duplicate_slow(arr):
for i in range(len(arr)):
for j in range(i+1, len(arr)):
if arr[i] == arr[j]:
return True
return False
# After: O(n) - hash table
def has_duplicate_fast(arr):
seen = set()
for num in arr:
if num in seen:
return True
seen.add(num)
return False
# Or simpler
def has_duplicate(arr):
return len(arr) != len(set(arr))
Recommended Problems
Beginner
- LeetCode 1: Two Sum
- LeetCode 217: Contains Duplicate
- LeetCode 242: Valid Anagram
Intermediate
- LeetCode 49: Group Anagrams
- LeetCode 3: Longest Substring Without Repeating Characters
- LeetCode 146: LRU Cache
Advanced
- LeetCode 76: Minimum Window Substring
- LeetCode 149: Max Points on a Line
- LeetCode 380: Insert Delete GetRandom O(1)
Related Posts
- Arrays and Lists | Essential Data Structures
- Stack and Queue | LIFO and FIFO Data Structures
- Python Data Types | Complete Guide
Keywords
Hash Table, HashMap, Dictionary, Data Structure, O(1) Search, Hash Function, Collision Resolution, Chaining, Open Addressing, Counter, defaultdict, Coding Interview, Algorithm