Python Comprehensions | List, Dict, Set, and Generator Expressions
Key points
Practical guide to Python comprehensions: concise loops for lists, dicts, and sets, plus generator expressions for memory-efficient iteration.
Introduction
“Build a list in one line”
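Comprehensions let you build lists, dicts, and sets (and lazy generators) in a single readable expression. They are one of the hallmarks of idiomatic, "Pythonic" code.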
1. List comprehensions
What they are
A list comprehension builds a list in a single expression. In many cases it is clearer, and usually somewhat faster, than a manual for loop that calls append.
Syntax: [expr for var in iterable if condition] (the if clause is optional)
Basics
# Traditional loop
squares = []
for i in range(10):
    squares.append(i ** 2)
print(squares)
# List comprehension
squares = [i ** 2 for i in range(10)]
print(squares)
Rough timing comparison (a single run; results depend on Python version and hardware):
import time
# perf_counter is a high-resolution clock intended for measuring short intervals
start = time.perf_counter()
result1 = []
for i in range(1000000):
    result1.append(i ** 2)
print(f"for loop: {time.perf_counter() - start:.4f}s")
start = time.perf_counter()
result2 = [i ** 2 for i in range(1000000)]
print(f"comprehension: {time.perf_counter() - start:.4f}s")
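The timing above is a single measurement, so it can be noisy. A somewhat more reliable sketch uses the standard-library `timeit` module to repeat each run and keep the best time. The function names `with_loop` and `with_comprehension` are illustrative, and the absolute numbers will vary by machine.

```python
import timeit

def with_loop(n=100_000):
    # Manual loop with append
    result = []
    for i in range(n):
        result.append(i ** 2)
    return result

def with_comprehension(n=100_000):
    # Same work expressed as a list comprehension
    return [i ** 2 for i in range(n)]

# Sanity check: both approaches build the same list
assert with_loop(1000) == with_comprehension(1000)

# repeat() runs each callable `number` times per trial, `repeat` trials in total;
# the minimum of the trials is the least noisy estimate
loop_best = min(timeit.repeat(with_loop, number=5, repeat=3))
comp_best = min(timeit.repeat(with_comprehension, number=5, repeat=3))
print(f"for loop:      {loop_best:.4f}s")
print(f"comprehension: {comp_best:.4f}s")
```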
Filtering with if
evens = [i for i in range(10) if i % 2 == 0]
print(evens)
multiples = [i for i in range(30) if i % 3 == 0 and i > 10]
print(multiples)
words = ['apple', 'banana', 'cherry', 'date', 'elderberry']
long_words = [word for word in words if len(word) > 5]
print(long_words)
Conditional expressions (if / else)
labels = ['even' if i % 2 == 0 else 'odd' for i in range(5)]
print(labels)
# The if/else is a conditional expression that produces the output value,
# so it goes before `for` (a filtering `if` goes after it):
# [expr_if_true if cond else expr_if_false for x in iterable]
numbers = [-2, -1, 0, 1, 2]
signs = [
'positive' if n > 0 else ('negative' if n < 0 else 'zero')
for n in numbers
]
print(signs)
scores = [95, 85, 75, 65, 55]
grades = [
'A' if s >= 90 else 'B' if s >= 80 else 'C' if s >= 70 else 'D' if s >= 60 else 'F'
for s in scores
]
print(grades)
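When a chain of conditional expressions grows this long, moving the logic into a named function usually reads better. One possible sketch, assuming the same cut-offs as above, uses the standard-library `bisect` module for a table lookup; `to_grade`, `BREAKPOINTS`, and `GRADES` are illustrative names, not part of any library.

```python
from bisect import bisect_right

# Illustrative lookup table: lower bounds for D, C, B, A in ascending order
BREAKPOINTS = [60, 70, 80, 90]
GRADES = 'FDCBA'

def to_grade(score):
    # bisect_right counts how many breakpoints are <= score,
    # which is exactly the index into GRADES
    return GRADES[bisect_right(BREAKPOINTS, score)]

scores = [95, 85, 75, 65, 55]
grades = [to_grade(s) for s in scores]
print(grades)  # ['A', 'B', 'C', 'D', 'F']
```

The comprehension stays a simple map, and the cut-offs live in one place where they are easy to change.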
Filter (if only) vs map (if/else)
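# Filter: a trailing if drops items
evens = [i for i in range(10) if i % 2 == 0]
# Map: if/else picks a value for every item
labels = ['even' if i % 2 == 0 else 'odd' for i in range(10)]
# Both: if/else transforms (negatives become 0), the trailing if drops 0
positive_squares = [i ** 2 if i > 0 else 0 for i in range(-5, 6) if i != 0]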
print(positive_squares)
Nested loops
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flat = []
for row in matrix:
    for num in row:
        flat.append(num)
print(flat)
flat = [num for row in matrix for num in row]
print(flat)
# Left `for` is outer, right `for` is inner — same order as nested fors
multiplication_table = [
f"{i} x {j} = {i*j}"
for i in range(2, 10)
for j in range(1, 10)
]
print(multiplication_table[:5])
coordinates = [(x, y) for x in range(3) for y in range(3)]
print(coordinates)
diagonal = [(x, y) for x in range(5) for y in range(5) if x == y]
print(diagonal)
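To see that the order of the for clauses matches ordinary nested loops, here is a small check. It compares the comprehension against explicit loops and against `itertools.product`, which also varies the rightmost element fastest.

```python
from itertools import product

# Two for clauses: the left one is the outer loop
coordinates = [(x, y) for x in range(3) for y in range(3)]

# Same order as itertools.product
assert coordinates == list(product(range(3), range(3)))

# Same order as explicit nested loops
nested = []
for x in range(3):
    for y in range(3):
        nested.append((x, y))
assert coordinates == nested

print(coordinates[:4])  # [(0, 0), (0, 1), (0, 2), (1, 0)]
```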
Nested comprehensions for 2-D lists
matrix = [[i * j for j in range(5)] for i in range(5)]
print(matrix)
matrix = []
for i in range(5):
    row = []
    for j in range(5):
        row.append(i * j)
    matrix.append(row)
2. Dictionary comprehensions
Syntax
{key_expr: value_expr for var in iterable if condition}
squares_dict = {}
for i in range(5):
    squares_dict[i] = i ** 2
squares_dict = {i: i ** 2 for i in range(5)}
print(squares_dict)
names = ['Alice', 'Bob', 'Charlie']
name_dict = {i: name for i, name in enumerate(names)}
print(name_dict)
Filtering and transforms
even_squares = {i: i ** 2 for i in range(10) if i % 2 == 0}
print(even_squares)
scores = {'Alice': 85, 'Bob': 92, 'Carol': 78, 'Dana': 95}
high_scores = {name: score for name, score in scores.items() if score >= 90}
print(high_scores)
data = {'apple': 5, 'banana': 3, 'cherry': 8, 'date': 2}
filtered = {
k.upper(): v * 2
for k, v in data.items()
if len(k) > 4 and v > 3
}
print(filtered)
Swapping keys and values
original = {'a': 1, 'b': 2, 'c': 3}
swapped = {v: k for k, v in original.items()}
print(swapped)
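# Duplicate values: later keys overwrite earlier ones, so 'a' is lost
original = {'a': 1, 'b': 2, 'c': 1}
swapped = {v: k for k, v in original.items()}
print(swapped)  # {1: 'c', 2: 'b'}
# To keep every key, group them into lists
from collections import defaultdict
swapped_multi = defaultdict(list)
for k, v in original.items():
    swapped_multi[v].append(k)
print(dict(swapped_multi))  # {1: ['a', 'c'], 2: ['b']}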
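If losing keys silently would be a bug, one option is to detect the collision instead. This is a minimal sketch with a hypothetical helper, `invert_unique`; it assumes the values are hashable and relies on the inverted dict having fewer entries whenever two keys shared a value.

```python
def invert_unique(mapping):
    """Swap keys and values; raise instead of silently dropping keys."""
    inverted = {v: k for k, v in mapping.items()}
    # Fewer entries after inverting means at least two keys shared a value
    if len(inverted) != len(mapping):
        raise ValueError("values are not unique; inversion would lose keys")
    return inverted

print(invert_unique({'a': 1, 'b': 2, 'c': 3}))  # {1: 'a', 2: 'b', 3: 'c'}

try:
    invert_unique({'a': 1, 'b': 2, 'c': 1})
except ValueError as e:
    print(e)
```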
Practical snippets
words = ['apple', 'banana', 'cherry', 'date']
word_lengths = {word: len(word) for word in words}
print(word_lengths)
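env_str = "DEBUG=True,PORT=8000,HOST=localhost"
# split('=', 1) splits each pair once, at the first '=', so values may contain '='
env_dict = {
    key: value
    for key, value in (pair.split('=', 1) for pair in env_str.split(','))
}
print(env_dict)  # values stay strings: {'DEBUG': 'True', 'PORT': '8000', 'HOST': 'localhost'}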
keys = ['name', 'age', 'city']
values = ['Alice', 25, 'Seoul']
person = {k: v for k, v in zip(keys, values)}  # same result as dict(zip(keys, values))
print(person)
products = {'apple': 1000, 'banana': 500, 'cherry': 2000, 'date': 800}
discounted = {
name: price * 0.9
for name, price in products.items()
if price >= 1000
}
print(discounted)
3. Set comprehensions
Syntax
{expr for var in iterable if condition}: produces a set, so elements are unique and unordered.
numbers = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
unique = set(numbers)  # simplest when no transform is needed
unique = {n for n in numbers}  # equivalent; useful once you also transform n
print(unique)
numbers = [1, -2, 2, -3, 3, -4, 4]
abs_unique = {abs(n) for n in numbers}
print(abs_unique)
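One gotcha worth knowing: set comprehensions and dict literals share curly braces, and empty braces mean a dict. A short check:

```python
# Empty braces create a dict, not a set
empty_braces = {}
print(type(empty_braces))  # <class 'dict'>

# Use set() for an empty set
empty_set = set()
print(type(empty_set))  # <class 'set'>

# A set comprehension over an empty iterable still yields a set
nothing = {n for n in []}
print(type(nothing), len(nothing))  # <class 'set'> 0
```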
Conditional sets
even_set = {i for i in range(10) if i % 2 == 0}
print(even_set)
text = "Hello World"
vowels = {char.lower() for char in text if char.lower() in 'aeiou'}
print(vowels)
words = ['hi', 'hello', 'hey', 'hello', 'world', 'hi']
long_words = {word for word in words if len(word) >= 3}
print(long_words)
Examples
emails = [
    # placeholder addresses on reserved example domains
    'alice@example.com',
    'bob@example.org',
    'carol@example.com',
    'dave@example.net',
    'eve@example.org'
]
domains = {email.split('@')[1] for email in emails}
print(domains)
files = ['image.jpg', 'doc.pdf', 'photo.jpg', 'video.mp4', 'report.pdf']
extensions = {file.split('.')[-1] for file in files}
print(extensions)
numbers = [123, 456, 789, 111, 222, 333]
last_digits = {n % 10 for n in numbers}
print(last_digits)
Set operations
list1 = [1, 2, 3, 4, 5]
list2 = [4, 5, 6, 7, 8]
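# set(list1) is the simpler way to build these; comprehensions help when you also transform
common = {x for x in list1} & {x for x in list2}  # intersection
print(common)
print(set(list1) & set(list2))  # same result
diff = {x for x in list1} - {x for x in list2}  # in list1 but not in list2
print(diff)
union = {x for x in list1} | {x for x in list2}  # in either list
print(union)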
4. Generator expressions
Syntax
(expr for var in iterable if condition): lazy; values are produced one at a time, on demand.
squares_list = [i ** 2 for i in range(1000000)]
print(type(squares_list))
squares_gen = (i ** 2 for i in range(1000000))
print(type(squares_gen))
print(next(squares_gen))
print(next(squares_gen))
for square in (i ** 2 for i in range(5)):
    print(square, end=' ')
print()
gen = (i for i in range(3))
print(list(gen))  # [0, 1, 2]
print(list(gen))  # [] (a generator can be consumed only once)
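Because a generator is exhausted after one pass, you either create a fresh one for each pass or materialize the values once. A sketch using a hypothetical factory function, `squares`:

```python
def squares(n):
    # Each call returns a brand-new generator
    return (i ** 2 for i in range(n))

gen = squares(3)
print(list(gen))  # [0, 1, 4]
print(list(gen))  # [] (already exhausted)

# Option 1: re-create the generator for a second pass
print(list(squares(3)))  # [0, 1, 4]

# Option 2: materialize once if you need several passes
cached = list(squares(3))
print(sum(cached), max(cached))  # 5 4
```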
Memory footprint (illustrative)
import sys
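list_comp = [i for i in range(100000)]
print(sys.getsizeof(list_comp))  # grows with the item count (counts the pointer array, not the ints)
gen_expr = (i for i in range(100000))
print(sys.getsizeof(gen_expr))  # small and constant, whatever the range size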
When generators shine
total = sum(i ** 2 for i in range(1000000))
maximum = max(i ** 2 for i in range(1000))
has_large = any(i ** 2 > 10000 for i in range(1000000))
with open('large_file.txt') as f:
    non_empty_lines = sum(1 for line in f if line.strip())
numbers = range(1000000)
evens = (x for x in numbers if x % 2 == 0)
squares = (x ** 2 for x in evens)
large = (x for x in squares if x > 100)
result = sum(large)
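To see that a chain of generator expressions really is lazy, the sketch below records when values are produced and consumed. The `source` generator function and the `events` list are illustrative additions; the point is that each value flows through the whole pipeline before the next one is produced.

```python
events = []

def source(n):
    # Illustrative generator that logs each value as it is produced
    for i in range(n):
        events.append(f"produce {i}")
        yield i

evens = (x for x in source(4) if x % 2 == 0)
squares = (x ** 2 for x in evens)
print(events)  # [] (building the pipeline runs nothing yet)

for value in squares:
    events.append(f"consume {value}")

print(events)
# ['produce 0', 'consume 0', 'produce 1', 'produce 2', 'consume 4', 'produce 3']
```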
Generator vs list
# Generator: one pass, low memory
total = sum(i ** 2 for i in range(1000000))
# List: multiple passes, indexing, len()
squares = [i ** 2 for i in range(10)]
print(squares[5])
print(len(squares))
print(sum(squares))
print(max(squares))
5. Practical examples
Example 1: CSV-like string to dict rows
csv_data = "name,age,city\nAlice,25,Seoul\nBob,30,Busan\nCarol,28,Daejeon"
lines = csv_data.strip().split('\n')
header = lines[0].split(',')
data = [
dict(zip(header, line.split(',')))
for line in lines[1:]
]
print(data)
Typed conversion
data_typed = [
{
'name': parts[0],
'age': int(parts[1]),
'city': parts[2]
}
for line in lines[1:]
for parts in [line.split(',')]  # one-element list: binds parts once per line
]
print(data_typed)
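Splitting on commas works for this toy string but breaks on quoted fields that contain commas. For real CSV text, the standard-library `csv` module is usually the safer parser; a comprehension can still handle the type conversion. A sketch, assuming the same sample data:

```python
import csv
import io

csv_data = "name,age,city\nAlice,25,Seoul\nBob,30,Busan\nCarol,28,Daejeon"

# DictReader treats the first row as the header and handles quoting
reader = csv.DictReader(io.StringIO(csv_data))

# Values arrive as strings, so convert 'age' while copying each row
rows = [{**row, 'age': int(row['age'])} for row in reader]
print(rows[0])  # {'name': 'Alice', 'age': 25, 'city': 'Seoul'}
```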
Example 2: student records
students = [
{'name': 'Alice', 'score': 85},
{'name': 'Bob', 'score': 92},
{'name': 'Carol', 'score': 78},
{'name': 'Dana', 'score': 95},
{'name': 'Eve', 'score': 88}
]
high_scores = [s['name'] for s in students if s['score'] >= 90]
print(high_scores)
graded = [
{**s, 'grade': 'A' if s['score'] >= 90 else 'B' if s['score'] >= 80 else 'C'}
for s in students
]
print(graded)
passed = [
{**s, 'grade': 'A' if s['score'] >= 90 else 'B'}
for s in students
if s['score'] >= 80
]
print(passed)
Example 3: cleaning strings
names = [' alice ', 'BOB', ' Charlie', 'david ']
cleaned = [name.strip().lower() for name in names]
print(cleaned)
capitalized = [name.strip().capitalize() for name in names]
print(capitalized)
filtered = [name.strip() for name in names if len(name.strip()) >= 3]
print(filtered)
Example 4: file paths
import os
files = ['data.txt', 'image.png', 'report.txt', 'video.mp4', 'notes.txt']
txt_files = [f for f in files if f.endswith('.txt')]
print(txt_files)
names_only = [os.path.splitext(f)[0] for f in txt_files]
print(names_only)
base_path = '/home/user/documents'
full_paths = [os.path.join(base_path, f) for f in txt_files]
print(full_paths)
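The same filtering can be written with `pathlib`, which exposes the extension and base name as attributes. This sketch uses `PurePosixPath` only so the output is identical on every operating system; in everyday code `Path` is the usual choice.

```python
from pathlib import PurePosixPath

files = ['data.txt', 'image.png', 'report.txt', 'video.mp4', 'notes.txt']
base = PurePosixPath('/home/user/documents')

# .suffix is the extension (including the dot); .stem is the name without it
txt_files = [f for f in files if PurePosixPath(f).suffix == '.txt']
names_only = [PurePosixPath(f).stem for f in txt_files]

# The / operator joins path segments
full_paths = [str(base / f) for f in txt_files]

print(names_only)     # ['data', 'report', 'notes']
print(full_paths[0])  # /home/user/documents/data.txt
```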
Example 5: JSON API payload
api_response = {
'users': [
{'id': 1, 'name': 'Alice', 'active': True, 'age': 25},
{'id': 2, 'name': 'Bob', 'active': False, 'age': 30},
{'id': 3, 'name': 'Charlie', 'active': True, 'age': 35},
{'id': 4, 'name': 'David', 'active': True, 'age': 28}
]
}
active_ids = [
user['id']
for user in api_response['users']
if user['active']
]
print(active_ids)
active_users = [
{'name': user['name'], 'age': user['age']}
for user in api_response['users']
if user['active']
]
print(active_users)
senior_active = [
user['name']
for user in api_response['users']
if user['active'] and user['age'] >= 30
]
print(senior_active)
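A dict comprehension is also handy for indexing API records by a key field, so later lookups do not need a scan. A sketch reusing the sample users from above:

```python
users = [
    {'id': 1, 'name': 'Alice', 'active': True, 'age': 25},
    {'id': 2, 'name': 'Bob', 'active': False, 'age': 30},
    {'id': 3, 'name': 'Charlie', 'active': True, 'age': 35},
    {'id': 4, 'name': 'David', 'active': True, 'age': 28},
]

# Index by id: average constant-time lookups instead of a linear search
by_id = {user['id']: user for user in users}
print(by_id[3]['name'])  # Charlie

# Combined with a filter: id -> name for active users only
active_names = {u['id']: u['name'] for u in users if u['active']}
print(active_names)  # {1: 'Alice', 3: 'Charlie', 4: 'David'}
```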
6. Performance notes
Memory: list vs generator
import sys
squares_list = [i ** 2 for i in range(1000000)]
squares_gen = (i ** 2 for i in range(1000000))
print(sys.getsizeof(squares_list))
print(sys.getsizeof(squares_gen))
Micro-benchmark (illustrative)
import time
data = list(range(1000000))
start = time.perf_counter()
result1 = [x * 2 for x in data if x % 2 == 0]
print(f"comprehension: {time.perf_counter() - start:.4f}s")
start = time.perf_counter()
result2 = []
for x in data:
    if x % 2 == 0:
        result2.append(x * 2)
print(f"for loop: {time.perf_counter() - start:.4f}s")
start = time.perf_counter()
result3 = list(map(lambda x: x * 2, filter(lambda x: x % 2 == 0, data)))
print(f"map+filter: {time.perf_counter() - start:.4f}s")
Deeply nested comprehensions
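def is_ascending(x, y, z):
    return x < y < z
# Hard to follow: three nested loops (1,000 iterations) plus a filter
result = [
    z
    for x in range(10)
    for y in range(10)
    for z in range(10)
    if is_ascending(x, y, z)
]
# Clearer: combinations() yields only the ascending triples
from itertools import combinations
result = [c[2] for c in combinations(range(10), 3)]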
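If you want to confirm the two versions above really agree, a self-contained check follows. `combinations()` emits tuples in lexicographic order with strictly increasing elements, which is the same order the nested loops visit the ascending triples.

```python
from itertools import combinations

def is_ascending(x, y, z):
    return x < y < z

nested = [
    z
    for x in range(10)
    for y in range(10)
    for z in range(10)
    if is_ascending(x, y, z)
]
combo = [c[2] for c in combinations(range(10), 3)]

assert nested == combo
print(len(combo))  # 120, i.e. C(10, 3), versus 1,000 loop iterations
```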
7. Style and best practices
Readability first
# Good: short and obvious
squares = [x ** 2 for x in range(10)]
# Branching plus an intermediate value: a plain loop reads better
result = []
for x in range(10):
    if x % 2 == 0:
        temp = x ** 2
        if temp > 20:
            result.append(temp)
        else:
            result.append(temp * 2)
# Avoid overly dense one-liners that hide intent
Common pitfalls
1) Accidental shared rows in a matrix
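matrix = [[0] * 3] * 3  # wrong: three references to the SAME row
matrix[0][0] = 1
print(matrix)  # [[1, 0, 0], [1, 0, 0], [1, 0, 0]]
matrix = [[0] * 3 for _ in range(3)]  # right: a new row per iteration
matrix[0][0] = 1
print(matrix)  # [[1, 0, 0], [0, 0, 0], [0, 0, 0]]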
2) Building a list just to sum
total = sum([i ** 2 for i in range(1000000)])  # builds a full list first
total = sum(i ** 2 for i in range(1000000))  # generator: no intermediate list
3) Side effects inside comprehensions
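# Avoid: the comprehension builds a throwaway list of None values
results = []
[results.append(x * 2) for x in range(10)]
# OK: a plain loop for side effects
results = []
for x in range(10):
    results.append(x * 2)
# Best here: build the list directly
results = [x * 2 for x in range(10)]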
Debug strategy
# Split a dense comprehension into named steps and inspect each one
data = [1, 2, 3, 4, 5]
filtered = [x for x in data if x % 2 == 0]
print(filtered)
result = [x ** 2 for x in filtered]
print(result)
Patterns
raw_names = [' ALICE ', 'bob', ' Charlie ', 'DAVID']
normalized = [name.strip().title() for name in raw_names]
print(normalized)
numbers = range(1, 11)
even_sum = sum(x for x in numbers if x % 2 == 0)
odd_sum = sum(x for x in numbers if x % 2 == 1)
print(even_sum, odd_sum)
8. Troubleshooting
“list index out of range”
data = [[1, 2], [3, 4, 5], [6]]
result = [row[2] for row in data if len(row) > 2]  # skip short rows
print(result)  # [5]
result = [row[2] if len(row) > 2 else None for row in data]  # keep a placeholder
print(result)  # [None, 5, None]
Duplicate keys in dict comprehensions
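items = [('a', 1), ('b', 2), ('a', 3)]
d = {k: v for k, v in items}
print(d)  # {'a': 3, 'b': 2}: the last value for a duplicate key wins
from collections import defaultdict
d = defaultdict(list)
for k, v in items:  # a plain loop, since append is a side effect
    d[k].append(v)
print(dict(d))  # {'a': [1, 3], 'b': [2]}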
Exceptions inside comprehensions
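data = ['1', '2', 'three', '4', 'five']
# Filter before converting (note: isdigit() also rejects negatives such as '-1')
result = [int(x) for x in data if x.isdigit()]
# Or convert with a helper that returns None on failure, then drop the Nones
def safe_int(x):
    try:
        return int(x)
    except ValueError:
        return None
result = [safe_int(x) for x in data]
result_filtered = [x for x in result if x is not None]
print(result_filtered)  # [1, 2, 4]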
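On Python 3.8 and later, the two passes can be merged with an assignment expression (the walrus operator), so each string is converted exactly once. This sketch adds '-7' to the sample data to show a value that the isdigit() filter would drop but safe_int() accepts.

```python
def safe_int(x):
    try:
        return int(x)
    except ValueError:
        return None

data = ['1', '2', 'three', '4', 'five', '-7']

# := binds the converted value so the condition and the output share it
result = [n for x in data if (n := safe_int(x)) is not None]
print(result)  # [1, 2, 4, -7]
```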
9. Quick reference table
| Situation | Prefer | Why |
|---|---|---|
| Simple map/filter | Comprehension | Short and fast |
| Complex branching | for loop | Clarity |
| Side effects (I/O, DB) | for loop | Obvious intent |
| One-pass huge data | Generator expression | Memory |
| Need index, len, many passes | List comprehension | Reusable list |
10. Exercises
Exercise 1
Squares of multiples of 3 from 1 through 20:
# Expected: [9, 36, 81, 144, 225, 324]
Exercise 2
Turn people = [('Alice', 25), ('Bob', 30), ('Charlie', 35)] into {name: age}.
Exercise 3
From a 2-D list, flatten even numbers only.
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
# Expected: [2, 4, 6, 8]
Exercise 4
Classify temperatures as "cold" (<10), "mild" (10–25), or "hot" (>25).
temps = [5, 15, 30, 8, 22, 28]
Answers
multiples_of_3 = [x ** 2 for x in range(1, 21) if x % 3 == 0]
people = [('Alice', 25), ('Bob', 30), ('Charlie', 35)]
people_dict = {name: age for name, age in people}
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
evens_flat = [num for row in matrix for num in row if num % 2 == 0]
temps = [5, 15, 30, 8, 22, 28]
labels = [
'cold' if t < 10 else 'mild' if t <= 25 else 'hot'
for t in temps
]
print(labels)
Summary
Key takeaways
- List comprehension: [expr for x in iterable if cond] for concise list construction.
- Dict comprehension: {k: v for ...} to build mappings in one expression.
- Set comprehension: {expr for ...} for unique values with optional transforms.
- Generator expression: (expr for ...) for lazy iteration with a tiny memory footprint.
- Readability wins: reach for a plain loop when the comprehension becomes cryptic.
After you master comprehensions
- Code tends to be shorter and idiomatic
- Data prep tasks feel lighter
- You can choose list vs generator deliberately
Next steps
- Decorators
- Generator functions with yield
- Functions: lambdas and higher-order functions