Python Comprehensions | List, Dict, Set, and Generator Expressions

Key points of this article

Practical guide to Python comprehensions: concise loops for lists, dicts, and sets, plus generator expressions for memory-efficient iteration.

Introduction

“Build a list in one line”

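Comprehensions build a list, dict, or set from an iterable in a single expression. They are usually shorter than the equivalent loop, often somewhat faster, and a staple of idiomatic Python.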


1. List comprehensions

What they are

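A list comprehension builds a list in a single expression. In many cases it is clearer than a manual for loop with append, and in CPython it is usually somewhat faster because it avoids looking up and calling append on every iteration.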

Syntax: [expr for var in iterable if condition]

Basics

# Traditional loop
squares = []
for i in range(10):
    squares.append(i ** 2)

print(squares)

# List comprehension
squares = [i ** 2 for i in range(10)]
print(squares)

Rough timing comparison (order-of-magnitude; depends on Python version and hardware):

import time

# perf_counter() is a monotonic, high-resolution clock meant for timing code
start = time.perf_counter()
result1 = []
for i in range(1000000):
    result1.append(i ** 2)
print(f"for loop: {time.perf_counter() - start:.4f}s")

start = time.perf_counter()
result2 = [i ** 2 for i in range(1000000)]
print(f"comprehension: {time.perf_counter() - start:.4f}s")
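A single wall-clock measurement like the one above is noisy. As a rough sketch of a more repeatable comparison, the standard-library timeit module can run each version several times and keep the best result. The function names with_loop and with_comprehension and the sizes chosen here are illustrative assumptions, not part of the original benchmark.

```python
import timeit

def with_loop(n=10_000):
    """Build the list of squares with an explicit loop and append."""
    result = []
    for i in range(n):
        result.append(i ** 2)
    return result

def with_comprehension(n=10_000):
    """Build the same list with a list comprehension."""
    return [i ** 2 for i in range(n)]

# Call each function 20 times per measurement, take 5 measurements,
# and keep the fastest one to reduce the influence of background noise.
loop_time = min(timeit.repeat(with_loop, number=20, repeat=5))
comp_time = min(timeit.repeat(with_comprehension, number=20, repeat=5))

print(f"for loop:      {loop_time:.4f}s")
print(f"comprehension: {comp_time:.4f}s")
```

The absolute numbers still depend on the Python version and hardware, so treat the ratio between the two as the interesting part.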

Filtering with if

evens = [i for i in range(10) if i % 2 == 0]
print(evens)

multiples = [i for i in range(30) if i % 3 == 0 and i > 10]
print(multiples)

words = ['apple', 'banana', 'cherry', 'date', 'elderberry']
long_words = [word for word in words if len(word) > 5]
print(long_words)

Conditional expressions (if / else)

labels = ['even' if i % 2 == 0 else 'odd' for i in range(5)]
print(labels)

# if/else sits before the final `for`:
# [expr_if_true if cond else expr_if_false for x in iterable]

numbers = [-2, -1, 0, 1, 2]
signs = [
    'positive' if n > 0 else ('negative' if n < 0 else 'zero')
    for n in numbers
]
print(signs)

scores = [95, 85, 75, 65, 55]
grades = [
    'A' if s >= 90 else 'B' if s >= 80 else 'C' if s >= 70 else 'D' if s >= 60 else 'F'
    for s in scores
]
print(grades)

Filter (if only) vs map (if/else)

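# Filter: a trailing `if` drops elements, so the result can be shorter
evens = [i for i in range(10) if i % 2 == 0]

# Map: `if/else` in the expression keeps every element and changes its value
labels = ['even' if i % 2 == 0 else 'odd' for i in range(10)]

# Both together: skip 0, then map negatives to 0 and square the positives
positive_squares = [i ** 2 if i > 0 else 0 for i in range(-5, 6) if i != 0]
print(positive_squares)  # [0, 0, 0, 0, 0, 1, 4, 9, 16, 25]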

Nested loops

matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

flat = []
for row in matrix:
    for num in row:
        flat.append(num)
print(flat)

flat = [num for row in matrix for num in row]
print(flat)

# Left `for` is outer, right `for` is inner — same order as nested fors

multiplication_table = [
    f"{i} x {j} = {i*j}"
    for i in range(2, 10)
    for j in range(1, 10)
]
print(multiplication_table[:5])

coordinates = [(x, y) for x in range(3) for y in range(3)]
print(coordinates)

diagonal = [(x, y) for x in range(5) for y in range(5) if x == y]
print(diagonal)

Nested comprehension vs nested loops

matrix = [[i * j for j in range(5)] for i in range(5)]
print(matrix)

matrix = []
for i in range(5):
    row = []
    for j in range(5):
        row.append(i * j)
    matrix.append(row)

2. Dictionary comprehensions

Syntax

{key_expr: value_expr for var in iterable if condition}

squares_dict = {}
for i in range(5):
    squares_dict[i] = i ** 2

squares_dict = {i: i ** 2 for i in range(5)}
print(squares_dict)

names = ['Alice', 'Bob', 'Charlie']
name_dict = {i: name for i, name in enumerate(names)}
print(name_dict)

Filtering and transforms

even_squares = {i: i ** 2 for i in range(10) if i % 2 == 0}
print(even_squares)

scores = {'Alice': 85, 'Bob': 92, 'Carol': 78, 'Dana': 95}
high_scores = {name: score for name, score in scores.items() if score >= 90}
print(high_scores)

data = {'apple': 5, 'banana': 3, 'cherry': 8, 'date': 2}
filtered = {
    k.upper(): v * 2
    for k, v in data.items()
    if len(k) > 4 and v > 3
}
print(filtered)

Swapping keys and values

original = {'a': 1, 'b': 2, 'c': 3}
swapped = {v: k for k, v in original.items()}
print(swapped)

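# Duplicate values collide after the swap: the later key overwrites the earlier one
original = {'a': 1, 'b': 2, 'c': 1}
swapped = {v: k for k, v in original.items()}
print(swapped)  # {1: 'c', 2: 'b'}

# To keep every key, group the keys by value instead
from collections import defaultdict
swapped_multi = defaultdict(list)
for k, v in original.items():
    swapped_multi[v].append(k)
print(dict(swapped_multi))  # {1: ['a', 'c'], 2: ['b']}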

Practical snippets

words = ['apple', 'banana', 'cherry', 'date']
word_lengths = {word: len(word) for word in words}
print(word_lengths)

env_str = "DEBUG=True,PORT=8000,HOST=localhost"
env_dict = {
    pair.split('=')[0]: pair.split('=')[1]
    for pair in env_str.split(',')
}
print(env_dict)
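The env_dict snippet above calls split('=') twice per pair and would cut a value short if the value itself contained an equals sign. One possible variant, sketched below, splits each pair once with maxsplit=1 inside an inner generator expression. The DB_URL entry is a made-up value added only to show the difference.

```python
# Hypothetical input: the DB_URL value contains '=' on purpose
env_str = "DEBUG=True,PORT=8000,DB_URL=postgres://host/db?sslmode=require"

env_dict = {
    key: value
    # split('=', 1) splits at the first '=' only, so the value stays intact
    for key, value in (pair.split('=', 1) for pair in env_str.split(','))
}
print(env_dict)
# {'DEBUG': 'True', 'PORT': '8000', 'DB_URL': 'postgres://host/db?sslmode=require'}
```

A pair with no equals sign at all would still raise a ValueError during unpacking, so malformed input needs separate handling.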

keys = ['name', 'age', 'city']
values = ['Alice', 25, 'Seoul']
person = {k: v for k, v in zip(keys, values)}
print(person)

products = {'apple': 1000, 'banana': 500, 'cherry': 2000, 'date': 800}
discounted = {
    name: price * 0.9
    for name, price in products.items()
    if price >= 1000
}
print(discounted)

3. Set comprehensions

Syntax

{expr for var in iterable if condition} — unordered, unique elements.

numbers = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]

unique = set(numbers)
unique = {n for n in numbers}
print(unique)

numbers = [1, -2, 2, -3, 3, -4, 4]
abs_unique = {abs(n) for n in numbers}
print(abs_unique)

Conditional sets

even_set = {i for i in range(10) if i % 2 == 0}
print(even_set)

text = "Hello World"
vowels = {char.lower() for char in text if char.lower() in 'aeiou'}
print(vowels)

words = ['hi', 'hello', 'hey', 'hello', 'world', 'hi']
long_words = {word for word in words if len(word) >= 3}
print(long_words)

Examples

emails = [
    '[email protected]',
    '[email protected]',
    '[email protected]',
    '[email protected]',
    '[email protected]'
]
domains = {email.split('@')[1] for email in emails}
print(domains)

files = ['image.jpg', 'doc.pdf', 'photo.jpg', 'video.mp4', 'report.pdf']
extensions = {file.split('.')[-1] for file in files}
print(extensions)

numbers = [123, 456, 789, 111, 222, 333]
last_digits = {n % 10 for n in numbers}
print(last_digits)

Set operations

list1 = [1, 2, 3, 4, 5]
list2 = [4, 5, 6, 7, 8]

common = {x for x in list1} & {x for x in list2}
print(common)
print(set(list1) & set(list2))

diff = {x for x in list1} - {x for x in list2}
print(diff)

union = {x for x in list1} | {x for x in list2}
print(union)

4. Generator expressions

Syntax

(expr for var in iterable if condition) — lazy, one value at a time.

squares_list = [i ** 2 for i in range(1000000)]
print(type(squares_list))

squares_gen = (i ** 2 for i in range(1000000))
print(type(squares_gen))

print(next(squares_gen))
print(next(squares_gen))

for square in (i ** 2 for i in range(5)):
    print(square, end=' ')
print()

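# A generator can be consumed only once
gen = (i for i in range(3))
print(list(gen))  # [0, 1, 2]
print(list(gen))  # [] because the generator is already exhausted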

Memory footprint (illustrative)

import sys

list_comp = [i for i in range(100000)]
print(sys.getsizeof(list_comp))

gen_expr = (i for i in range(100000))
print(sys.getsizeof(gen_expr))
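Note that sys.getsizeof reports only the size of the container object itself, not the integers it refers to, so the comparison above understates the real gap. As a rough sketch, the standard-library tracemalloc module can record the peak allocation while each version is summed. The helper name peak_bytes is an assumption made for this example.

```python
import tracemalloc

def peak_bytes(func):
    """Return the peak number of bytes allocated while func() runs."""
    tracemalloc.start()
    try:
        func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak

# Same computation twice: once through a full list, once through a generator
list_peak = peak_bytes(lambda: sum([i * 2 for i in range(100_000)]))
gen_peak = peak_bytes(lambda: sum(i * 2 for i in range(100_000)))

print(f"list comprehension peak:   {list_peak:,} bytes")
print(f"generator expression peak: {gen_peak:,} bytes")
```

Exact byte counts vary between Python versions; the point is that the list version must hold every element at once while the generator version does not.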

When generators shine

total = sum(i ** 2 for i in range(1000000))
maximum = max(i ** 2 for i in range(1000))

has_large = any(i ** 2 > 10000 for i in range(1000000))

with open('large_file.txt') as f:
    non_empty_lines = sum(1 for line in f if line.strip())

numbers = range(1000000)
evens = (x for x in numbers if x % 2 == 0)
squares = (x ** 2 for x in evens)
large = (x for x in squares if x > 100)
result = sum(large)
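Chained generator expressions like the pipeline above pull values through one at a time, so nothing is computed until a consumer asks for it. The sketch below tries to make that visible with an infinite source that records what it hands out. The names tracked_source and pulled are illustrative assumptions.

```python
from itertools import count, islice

pulled = []  # every value the pipeline actually reads from the source

def tracked_source():
    """Infinite counter that logs each value it yields."""
    for n in count():
        pulled.append(n)
        yield n

evens = (x for x in tracked_source() if x % 2 == 0)
squares = (x ** 2 for x in evens)
large = (x for x in squares if x > 100)

# Ask for only the first three results; the source is never run to the end
first_three = list(islice(large, 3))
print(first_three)  # [144, 196, 256]
print(len(pulled))  # 17 (the source was read from 0 through 16 only)
```

The same pipeline written with list comprehensions would never finish here, because the first stage would try to materialize an infinite list.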

Generator vs list

# Generator: one pass, low memory
total = sum(i ** 2 for i in range(1000000))

# List: multiple passes, indexing, len()
squares = [i ** 2 for i in range(10)]
print(squares[5])
print(len(squares))
print(sum(squares))
print(max(squares))

5. Practical examples

Example 1: CSV-like string to dict rows

csv_data = "name,age,city\nAlice,25,Seoul\nBob,30,Busan\nCarol,28,Daejeon"

lines = csv_data.strip().split('\n')
header = lines[0].split(',')

data = [
    dict(zip(header, line.split(',')))
    for line in lines[1:]
]
print(data)

Typed conversion

data_typed = [
    {
        'name': parts[0],
        'age': int(parts[1]),
        'city': parts[2]
    }
    for line in lines[1:]
    for parts in [line.split(',')]
]
print(data_typed)
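Splitting on commas by hand works for simple data like this but breaks on quoted fields that contain commas. For real CSV input, one option is to let the standard-library csv module do the parsing and keep the comprehension for the type conversion only. This sketch reuses the same sample string; the variable names reader and rows are illustrative.

```python
import csv
import io

csv_data = "name,age,city\nAlice,25,Seoul\nBob,30,Busan\nCarol,28,Daejeon"

# DictReader uses the first line as the header and yields one dict per row
reader = csv.DictReader(io.StringIO(csv_data))

# Copy each row and convert the age field from str to int
rows = [{**row, 'age': int(row['age'])} for row in reader]
print(rows)
```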

Example 2: student records

students = [
    {'name': 'Alice', 'score': 85},
    {'name': 'Bob', 'score': 92},
    {'name': 'Carol', 'score': 78},
    {'name': 'Dana', 'score': 95},
    {'name': 'Eve', 'score': 88}
]

high_scores = [s['name'] for s in students if s['score'] >= 90]
print(high_scores)

graded = [
    {**s, 'grade': 'A' if s['score'] >= 90 else 'B' if s['score'] >= 80 else 'C'}
    for s in students
]
print(graded)

passed = [
    {**s, 'grade': 'A' if s['score'] >= 90 else 'B'}
    for s in students
    if s['score'] >= 80
]
print(passed)

Example 3: cleaning strings

names = ['  alice  ', 'BOB', '  Charlie', 'david  ']

cleaned = [name.strip().lower() for name in names]
print(cleaned)

capitalized = [name.strip().capitalize() for name in names]
print(capitalized)

filtered = [name.strip() for name in names if len(name.strip()) >= 3]
print(filtered)

Example 4: file paths

import os

files = ['data.txt', 'image.png', 'report.txt', 'video.mp4', 'notes.txt']

txt_files = [f for f in files if f.endswith('.txt')]
print(txt_files)

names_only = [os.path.splitext(f)[0] for f in txt_files]
print(names_only)

base_path = '/home/user/documents'
full_paths = [os.path.join(base_path, f) for f in txt_files]
print(full_paths)

Example 5: JSON API payload

api_response = {
    'users': [
        {'id': 1, 'name': 'Alice', 'active': True, 'age': 25},
        {'id': 2, 'name': 'Bob', 'active': False, 'age': 30},
        {'id': 3, 'name': 'Charlie', 'active': True, 'age': 35},
        {'id': 4, 'name': 'David', 'active': True, 'age': 28}
    ]
}

active_ids = [
    user['id']
    for user in api_response['users']
    if user['active']
]
print(active_ids)

active_users = [
    {'name': user['name'], 'age': user['age']}
    for user in api_response['users']
    if user['active']
]
print(active_users)

senior_active = [
    user['name']
    for user in api_response['users']
    if user['active'] and user['age'] >= 30
]
print(senior_active)

6. Performance notes

Memory: list vs generator

import sys

squares_list = [i ** 2 for i in range(1000000)]
squares_gen = (i ** 2 for i in range(1000000))

print(sys.getsizeof(squares_list))
print(sys.getsizeof(squares_gen))

Micro-benchmark (illustrative)

import time

data = list(range(1000000))

# perf_counter() is a monotonic, high-resolution clock meant for timing code
start = time.perf_counter()
result1 = [x * 2 for x in data if x % 2 == 0]
print(f"comprehension: {time.perf_counter() - start:.4f}s")

start = time.perf_counter()
result2 = []
for x in data:
    if x % 2 == 0:
        result2.append(x * 2)
print(f"for loop: {time.perf_counter() - start:.4f}s")

start = time.perf_counter()
result3 = list(map(lambda x: x * 2, filter(lambda x: x % 2 == 0, data)))
print(f"map+filter: {time.perf_counter() - start:.4f}s")

Deeply nested comprehensions

def is_ascending(x, y, z):
    return x < y < z

result = [
    z
    for x in range(10)
    for y in range(10)
    for z in range(10)
    if is_ascending(x, y, z)
]

from itertools import combinations
result = [c[2] for c in combinations(range(10), 3)]
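The triple loop above visits all 1,000 combinations of x, y, and z and discards most of them, while itertools.combinations generates only the ascending triples. A quick check, sketched here with full tuples instead of just z, suggests the two approaches produce the same triples in the same order.

```python
from itertools import combinations

# Brute force: 10 * 10 * 10 iterations, filtered down to ascending triples
nested = [
    (x, y, z)
    for x in range(10)
    for y in range(10)
    for z in range(10)
    if x < y < z
]

# Direct: combinations() yields each ascending triple exactly once
combos = list(combinations(range(10), 3))

print(nested == combos)  # True
print(len(combos))       # 120
```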

7. Style and best practices

Readability first

squares = [x ** 2 for x in range(10)]

result = []
for x in range(10):
    if x % 2 == 0:
        temp = x ** 2
        if temp > 20:
            result.append(temp)
        else:
            result.append(temp * 2)

# Avoid overly dense one-liners that hide intent
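To illustrate the trade-off, here is the branching loop above squeezed into one comprehension, next to a version that moves the branch into a small helper. The helper name scale_small_square is an assumption made for this sketch.

```python
# Dense: correct, but the reader has to untangle three copies of x ** 2
dense = [x ** 2 if x ** 2 > 20 else x ** 2 * 2 for x in range(10) if x % 2 == 0]

def scale_small_square(x):
    """Square x, and double the square when it is 20 or less."""
    square = x ** 2
    return square if square > 20 else square * 2

# Readable: the comprehension only filters, the helper carries the logic
readable = [scale_small_square(x) for x in range(10) if x % 2 == 0]

print(dense)     # [0, 8, 32, 36, 64]
print(readable)  # [0, 8, 32, 36, 64]
```

Both give the same result as the explicit loop; which form reads best is a judgment call, and the plain loop remains a perfectly good choice.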

Common pitfalls

1) Accidental shared rows in a matrix

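# Wrong: the outer * 3 repeats a reference to the same inner list
matrix = [[0] * 3] * 3
matrix[0][0] = 1
print(matrix)  # [[1, 0, 0], [1, 0, 0], [1, 0, 0]]

# Right: the comprehension creates a new inner list for every row
matrix = [[0] * 3 for _ in range(3)]
matrix[0][0] = 1
print(matrix)  # [[1, 0, 0], [0, 0, 0], [0, 0, 0]]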

2) Building a list just to sum

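# Wasteful: builds a full list in memory only to add it up
total = sum([i ** 2 for i in range(1000000)])
# Better: the generator expression feeds sum() one value at a time
total = sum(i ** 2 for i in range(1000000))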

3) Side effects inside comprehensions

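# Avoid: the comprehension is used only for its side effect
# and builds a throwaway list of None values
results = []
[results.append(x * 2) for x in range(10)]

# Fine: a plain loop makes the side effect explicit
results = []
for x in range(10):
    results.append(x * 2)

# Best here: let the comprehension produce the list directly
results = [x * 2 for x in range(10)]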

Debug strategy

data = [1, 2, 3, 4, 5]

filtered = [x for x in data if x % 2 == 0]
print(filtered)

result = [x ** 2 for x in filtered]
print(result)

Patterns

raw_names = ['  ALICE  ', 'bob', '  Charlie  ', 'DAVID']
normalized = [name.strip().title() for name in raw_names]
print(normalized)

numbers = range(1, 11)
even_sum = sum(x for x in numbers if x % 2 == 0)
odd_sum = sum(x for x in numbers if x % 2 == 1)
print(even_sum, odd_sum)

8. Troubleshooting

“list index out of range”

data = [[1, 2], [3, 4, 5], [6]]

result = [row[2] for row in data if len(row) > 2]
print(result)

result = [row[2] if len(row) > 2 else None for row in data]
print(result)

Duplicate keys in dict comprehensions

items = [('a', 1), ('b', 2), ('a', 3)]
d = {k: v for k, v in items}
print(d)

# To keep every value, group with a plain loop (append is a side effect)
from collections import defaultdict
d = defaultdict(list)
for k, v in items:
    d[k].append(v)
print(dict(d))  # {'a': [1, 3], 'b': [2]}

Exceptions inside comprehensions

data = ['1', '2', 'three', '4', 'five']

result = [int(x) for x in data if x.isdigit()]

def safe_int(x):
    try:
        return int(x)
    except ValueError:
        return None

result = [safe_int(x) for x in data]
result_filtered = [x for x in result if x is not None]
print(result_filtered)
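The two-step version above builds an intermediate list that still contains the None values. On Python 3.8 or later, an assignment expression (the := operator) can convert and filter in a single pass. This is one possible sketch and redefines safe_int so that it runs on its own.

```python
def safe_int(x):
    """Return int(x), or None when x is not a valid integer string."""
    try:
        return int(x)
    except ValueError:
        return None

data = ['1', '2', 'three', '4', 'five']

# n is bound inside the condition and reused as the output expression
result = [n for x in data if (n := safe_int(x)) is not None]
print(result)  # [1, 2, 4]
```

If the walrus operator feels too dense for the codebase, the two-step version remains a reasonable choice.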

9. Quick reference table

Situation | Prefer | Why
Simple map/filter | Comprehension | Short and fast
Complex branching | for loop | Clarity
Side effects (I/O, DB) | for loop | Obvious intent
One-pass huge data | Generator expression | Memory
Need index, len, many passes | List comprehension | Reusable list

10. Exercises

Exercise 1

Squares of multiples of 3 from 1 through 20:

# Expected: [9, 36, 81, 144, 225, 324]

Exercise 2

Turn people = [('Alice', 25), ('Bob', 30), ('Charlie', 35)] into {name: age}.

Exercise 3

From a 2-D list, flatten even numbers only.

matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
# Expected: [2, 4, 6, 8]

Exercise 4

Classify temperatures as "cold" (<10), "mild" (10–25), or "hot" (>25).

temps = [5, 15, 30, 8, 22, 28]

Answers

multiples_of_3 = [x ** 2 for x in range(1, 21) if x % 3 == 0]

people = [('Alice', 25), ('Bob', 30), ('Charlie', 35)]
people_dict = {name: age for name, age in people}

matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
evens_flat = [num for row in matrix for num in row if num % 2 == 0]

temps = [5, 15, 30, 8, 22, 28]
labels = [
    'cold' if t < 10 else 'mild' if t <= 25 else 'hot'
    for t in temps
]
print(labels)

Summary

Key takeaways

  1. List comprehension: [expr for x in iterable if cond] — concise list construction.
  2. Dict comprehension: {k: v for ...} — build mappings in one expression.
  3. Set comprehension: {expr for ...} — unique values with optional transforms.
  4. Generator expression: (expr for ...) — lazy iteration, tiny memory footprint.
  5. Readability wins: reach for a plain loop when the comprehension becomes cryptic.

After you master comprehensions

  • Code tends to be shorter and idiomatic
  • Data prep tasks feel lighter
  • You can choose list vs generator deliberately

Next steps