If sorting not needed, always use unordered_set?

Average lookup is close to O(1) but varies by hash/collision. Consider worst case, memory, and traversal needs together.

How to insert custom types?

unordered_set needs hash and equality comparison, set needs comparison operator (or custom comparator). Define consistently.

Can I delete during iteration?

Iterator invalidates if rules not followed. Use safe idioms like erase return iterator pattern or copy-then-delete.

How to do set operations (union/intersection)?

Apply STL algorithms like set_union to sorted sequences, or choose method of iterating one side and querying the other depending on size.

C++ set/unordered_set | "Duplicate Removal" Complete Guide

2026년 3월 12일 · 17분 읽기 · 수정 2026년 3월 30일 Beginner Tutorial

이 글의 핵심

Covers selection criteria for set and unordered_set, multiset·hash·iterator rules, and practical set operations.

set vs unordered_set

Feature	set	unordered_set
Sorting	O (auto-sorted)	X
Speed	O(log n)	O(1)
Duplicates	Not allowed	Not allowed
Traversal	Sorted order	Random

set vs unordered_set Performance Comparison

From time complexity (average/amortized) perspective, set is based on balanced binary search tree (typically red-black), so insert/delete/search is O(log n). unordered_set is a hash table, so average O(1), but with many hash collisions can approach worst case O(n).

Practical differences are:

Small n·order needed: If element count is hundreds~thousands and need sorted traversal or range search like lower_bound, set is simpler.
Large n·key lookup only: For millions+ where you only need fast “exists/not exists” check and order doesn’t matter, unordered_set is often advantageous.
Memory: Hash table has additional overhead from buckets/load factor, tree has pointer overhead per node. Varies by n and key size, so measurement is safe.
Cache locality: Neither uses contiguous memory, both have frequent skip access causing cache misses. Hard to say “unordered is always faster” based on traversal alone.

Selection Summary: If need order/range search use set, if keys are hash-friendly (integers, well-distributed strings) and mostly lookups, consider unordered_set first.

multiset and unordered_multiset

set / unordered_set store only one of each key. If need same value multiple times, use multiset, unordered_multiset.

Feature	multiset	unordered_multiset
Order	Maintains sort (same value insertion order is implementation-defined)	No order
Lookup	`count`, `equal_range` for same key range	`count`, `equal_range`
Delete	`erase(key)` removes all of that key, use iterator to remove one	Same

#include <set>
#include <unordered_set>
#include <iostream>

int main() {
    std::multiset<int> ms{1, 1, 2, 2, 2};
    std::cout << ms.count(2) << '\n';  // 3

    auto [first, last] = ms.equal_range(2);
    for (auto it = first; it != last; ++it) { /* ... */ }

    std::unordered_multiset<std::string> ums;
    ums.insert("a"); ums.insert("a");
    std::cout << ums.count("a") << '\n';  // 2
}

When only top k needed, multiset maintains sorted state making it easy to handle max/min at ends. If only counting frequency and order not needed, unordered_multiset or unordered_map<T, int> is often better.

Custom Comparator and Hash Function

set: `Compare` and `operator<`

set sorting criterion must satisfy strict weak ordering. Can use operator< or std::less for custom types.

struct Person {
    std::string name;
    int id;
};

struct ById {
    bool operator()(const Person& a, const Person& b) const {
        return a.id < b.id;
    }
};

std::set<Person, ById> by_id_set;

unordered_set: `Hash` and `KeyEqual`

In unordered_set<Key, Hash, KeyEqual>, same key must always produce same hash, and KeyEqual determines actual equality on hash collision. Custom types typically provide both.

struct Point {
    int x, y;
    bool operator==(const Point& o) const { return x == o.x && y == o.y; }
};

struct PointHash {
    std::size_t operator()(const Point& p) const noexcept {
        // Simple combination example (consider boost::hash_combine for projects)
        return std::hash<int>{}(p.x) ^ (std::hash<int>{}(p.y) << 1);
    }
};

std::unordered_set<Point, PointHash> pts;

Caution: When putting strings in unordered_set, bad custom hash causes severe performance drop. If key is std::string, typically use default hash.

Practice: Duplicate Removal·Intersection·Union Summary

Duplicate removal only with original order preservation: Putting in set then moving to vector changes order. If order preservation needed, consider sort then unique or unordered_set + separate order array.
Intersection·union·difference: Above set_intersection / set_union / set_difference patterns require both ranges sorted. Using set automatically satisfies sorted order via iterators.
unordered_set operations: If library doesn’t provide one-shot operations, collect one side into vector and sort, or iterate smaller side and check inclusion in other with count/find (choose better algorithm by size).

Iterator Invalidation

set / multiset

insert / emplace: Existing element iterators and references not invalidated.
erase(it): Only that iterator invalid. Other iterators maintained.
erase(key) / clear: Iterators to deleted elements invalid.

unordered_set / unordered_multiset

Rehash changes bucket structure, all iterators may be invalid. insert exceeding load factor can trigger rehash, so indiscriminate insertion during iteration is dangerous.
reserve(n) (or appropriate bucket count reservation) reduces rehash frequency, making iterator invalidation less frequent.
erase: Only iterator at deleted position invalid, other iterators usually valid after C++11, but implementation-dependent, so using erase’s return iterator pattern is safe.

for (auto it = s.begin(); it != s.end(); ) {
    if (condition) it = s.erase(it);  // erase returns next iterator
    else ++it;
}

set Basic Usage

#include <set>
#include <iostream>
using namespace std;

int main() {
    set<int> s;
    
    // Insert
    s.insert(3);
    s.insert(1);
    s.insert(2);
    s.insert(1);  // Duplicate ignored
    
    // Traverse (auto-sorted: 1, 2, 3)
    for (int x : s) {
        cout << x << " ";
    }
    
    // Search
    if (s.find(2) != s.end()) {
        cout << "\n2 exists" << endl;
    }
    
    // Delete
    s.erase(2);
    
    // Size
    cout << "Size: " << s.size() << endl;
    
    return 0;
}

unordered_set Basic Usage

#include <unordered_set>
#include <iostream>
using namespace std;

int main() {
    unordered_set<string> words;
    
    words.insert("apple");
    words.insert("banana");
    words.insert("apple");  // Duplicate ignored
    
    // Check existence (fast)
    if (words.count("apple") > 0) {
        cout << "apple exists" << endl;
    }
    
    // Traverse (order not guaranteed)
    for (const string& word : words) {
        cout << word << " ";
    }
    
    return 0;
}

Practical Examples

Example 1: Remove Duplicates from Array

#include <iostream>
#include <vector>
#include <set>
using namespace std;

vector<int> removeDuplicates(const vector<int>& arr) {
    set<int> s(arr.begin(), arr.end());
    return vector<int>(s.begin(), s.end());
}

int main() {
    vector<int> arr = {5, 2, 8, 2, 9, 5, 1, 8};
    
    cout << "Original: ";
    for (int x : arr) cout << x << " ";
    
    vector<int> unique = removeDuplicates(arr);
    
    cout << "\nDuplicate removed (sorted): ";
    for (int x : unique) cout << x << " ";
    
    return 0;
}

Explanation: Most basic pattern utilizing set’s auto-sorting and duplicate removal.

Example 2: Intersection/Union of Two Arrays

#include <iostream>
#include <set>
#include <algorithm>
using namespace std;

int main() {
    set<int> a = {1, 2, 3, 4, 5};
    set<int> b = {3, 4, 5, 6, 7};
    
    // Intersection
    set<int> intersection;
    set_intersection(a.begin(), a.end(),
                     b.begin(), b.end(),
                     inserter(intersection, intersection.begin()));
    
    cout << "Intersection: ";
    for (int x : intersection) cout << x << " ";  // 3 4 5
    
    // Union
    set<int> union_set;
    set_union(a.begin(), a.end(),
              b.begin(), b.end(),
              inserter(union_set, union_set.begin()));
    
    cout << "\nUnion: ";
    for (int x : union_set) cout << x << " ";  // 1 2 3 4 5 6 7
    
    // Difference
    set<int> difference;
    set_difference(a.begin(), a.end(),
                   b.begin(), b.end(),
                   inserter(difference, difference.begin()));
    
    cout << "\nDifference (A-B): ";
    for (int x : difference) cout << x << " ";  // 1 2
    
    return 0;
}

Explanation: Can implement mathematical set operations directly. Frequently used in algorithm problems.

Example 3: Visit Check (Graph Traversal)

#include <iostream>
#include <unordered_set>
#include <queue>
#include <vector>
using namespace std;

void bfs(int start, vector<vector<int>>& graph) {
    unordered_set<int> visited;
    queue<int> q;
    
    q.push(start);
    visited.insert(start);
    
    while (!q.empty()) {
        int node = q.front();
        q.pop();
        
        cout << node << " ";
        
        for (int neighbor : graph[node]) {
            if (visited.count(neighbor) == 0) {
                visited.insert(neighbor);
                q.push(neighbor);
            }
        }
    }
}

int main() {
    // Graph: 0-1, 0-2, 1-3, 2-3
    vector<vector<int>> graph = {
        {1, 2},    // Node 0's neighbors
        {0, 3},    // Node 1's neighbors
        {0, 3},    // Node 2's neighbors
        {1, 2}     // Node 3's neighbors
    };
    
    cout << "BFS traversal: ";
    bfs(0, graph);
    
    return 0;
}

Explanation: Use unordered_set to check visited status at O(1) speed. Essential pattern for graph algorithms.

Common Problems

Problem 1: Cannot Modify set Elements

Symptom: Compile error when trying to modify set element directly

Cause: set elements are const to maintain sorting

Solution:

// ❌ Wrong code
set<int> s = {1, 2, 3};
auto it = s.find(2);
*it = 5;  // Compile error! const int&

// ✅ Correct code (delete then reinsert)
set<int> s = {1, 2, 3};
auto it = s.find(2);
if (it != s.end()) {
    s.erase(it);
    s.insert(5);
}

// ✅ For structs (use mutable)
struct Item {
    int id;
    mutable int count;  // mutable can be modified
    
    bool operator<(const Item& other) const {
        return id < other.id;
    }
};

set<Item> items;
auto it = items.find(Item{1, 0});
if (it != items.end()) {
    it->count++;  // OK (mutable)
}

Problem 2: Custom Comparison Function

Symptom: Compile error when inserting custom type into set

Cause: Needs operator< or comparison function

Solution:

// ❌ Compile error
struct Point {
    int x, y;
};

set<Point> points;  // Error! No operator<

// ✅ Method 1: Define operator<
struct Point {
    int x, y;
    
    bool operator<(const Point& other) const {
        if (x != other.x) return x < other.x;
        return y < other.y;
    }
};

set<Point> points;  // OK

// ✅ Method 2: Comparison function object
struct PointCompare {
    bool operator()(const Point& a, const Point& b) const {
        if (a.x != b.x) return a.x < b.x;
        return a.y < b.y;
    }
};

set<Point, PointCompare> points;  // OK

// ✅ Method 3: Lambda (C++11)
auto cmp = [](const Point& a, const Point& b) {
    if (a.x != b.x) return a.x < b.x;
    return a.y < b.y;
};

set<Point, decltype(cmp)> points(cmp);  // OK

Problem 3: Confusion with multiset

Symptom: Want to allow duplicates but set doesn’t

Cause: set doesn’t allow duplicates, multiset does

Solution:

// ❌ set doesn't allow duplicates
set<int> s;
s.insert(1);
s.insert(1);
s.insert(1);
cout << s.size();  // 1 (duplicates ignored)

// ✅ Use multiset
#include <set>
multiset<int> ms;
ms.insert(1);
ms.insert(1);
ms.insert(1);
cout << ms.size();  // 3 (duplicates allowed)

// Count specific value
cout << ms.count(1);  // 3

// Delete all of specific value
ms.erase(1);  // All 1s deleted

// Delete only one
auto it = ms.find(1);
if (it != ms.end()) {
    ms.erase(it);  // Delete only one
}

Performance Optimization

Optimization Strategies

Choose efficient data structure
- How: Use appropriate STL container for situation
- Effect: Improve time complexity
Prevent unnecessary copying
- How: Use reference passing
- Effect: Reduce memory usage
Compiler optimization
- How: Use -O2, -O3 flags
- Effect: Improve execution speed

Benchmark Results

Method	Execution Time	Memory Usage	Notes
Basic implementation	100ms	10MB	-
Optimization 1	80ms	8MB	Reference passing
Optimization 2	50ms	5MB	STL algorithms

Conclusion: Appropriate optimization can improve performance 2x or more

Caution: Absolute time/memory numbers in above table are examples. In real services, remeasure before/after with same input and compile options.

Advanced: `reserve`·`rehash`·Load Distribution

unordered_set triggers rehash when exceeding load factor, at which point all iterators may be invalid. If approximate element count known before insertion:

std::unordered_set<int> s;
s.reserve(1'000'000); // Reduce rehash frequency, stabilize benchmark

set doesn’t have reserve, but reducing unnecessary copies/temporary objects helps perceived performance.

Advanced: Heterogeneous Lookup (C++20, `unordered_set`)

When key is std::string, to prevent find("literal") from creating temporary std::string, need custom policy with transparent hash/transparent equality (std::hash<std::string_view>, std::equal_to<>). Standard specialization varies by implementation/version, so check project’s standard library documentation.

// Concept example: lookup with string_view (must verify project policy/standard version)
// std::unordered_set<std::string, SVHash, SVEqual> etc.

Advanced: `node_type` / `extract` (C++17)

set/unordered_set can extract nodes and move to other containers, used for patterns that update keys without reallocation.

std::set<int> a{1, 2, 3};
auto nh = a.extract(2);
if (!nh.empty()) {
    nh.value() = 5;
    a.insert(std::move(nh));
}

unordered_set similar but watch rehash timing.

Advanced: Benchmark Method (Practically)

Fix: Compiler optimization (-O2/-O3), CPU fixed frequency (if possible), same seed.
Warmup: Start timer after cache/hash table warmup.
Metrics: Not just average but p95/p99, especially for unordered_* include worst-case hash input as separate case.

#include <chrono>

template<typename F>
auto bench(F&& f, int reps) {
    using clock = std::chrono::steady_clock;
    auto t0 = clock::now();
    for (int i = 0; i < reps; ++i) f();
    return clock::now() - t0;
}

Advanced: Debugging Guide

Symptom	Check
Slow in `unordered_set`	Hash quality, key collision, load factor, wrong `operator==`
Abnormal behavior during iteration	Rehash invalidates iterators — use `reserve` or copy-then-iterate
`set` sorting is strange	Check if comparison operation satisfies strict weak ordering

Advanced: Common Mistake Patterns (Additional)

Using vector as key in unordered_set with only default hash → Hash inefficient or definition missing. Use immutable key as identifier or define custom hash.
Wrong iterator increment during erase while iterating → Use it = container.erase(it) pattern.
Assuming const member read-only is safe in multithreading → unordered_set needs synchronization even for reads (read-only snapshot or shared_mutex design).

FAQ

Q1: Can beginners learn this?

A: Yes, this guide is written for beginners. Basic C++ syntax knowledge is sufficient.

Q2: Frequently used in practice?

A: Yes, very frequently. Essential concept in production projects.

Q3: Comparison with other languages?

A: C++ advantage is performance and control. Faster than Python, more flexible than Java.

Q4: How long to learn?

A: Basic concepts 1-2 hours, mastery about 1-2 weeks.

Q5: Recommended learning order?

Learn basic syntax
Follow simple examples
Apply to practical projects
Learn advanced techniques

Q6: Common mistakes?

Not initializing
Memory management mistakes
Not considering time complexity
Missing exception handling

Posts connected to this topic.

C++ string | “String Processing” Complete Guide [Practical Function Summary]
C++ Algorithm Set | “Set Algorithms” Guide
C++ STL vector | “More Convenient Than Arrays” Vector Complete Guide [Practical Examples]

C++ map vs unordered_map (STL Series)
C++ map·set Complete Guide | ordered vs unordered· Custom Keys

Practical Tips

Tips you can apply immediately in practice.

Debugging Tips

Check compiler warnings first when problems occur
Reproduce problem with simple test cases
Use print debugging to trace execution

Performance Tips

Choose correct data structure for use case
Measure before optimizing
Profile to find bottlenecks

Code Review Tips

Pre-check frequently pointed out parts
Follow team coding conventions
Write clear variable names

Practical Checklist

When Choosing Container

Need sorted order?
Need range queries (lower_bound)?
Only existence check?
What is expected element count?

When Using Custom Types

Defined operator< or comparator?
Defined hash and operator==?
Tested with sample data?

When Iterating

Modifying container during iteration?
Using safe erase pattern?
Considered iterator invalidation?

Keywords

C++, set, unordered_set, Set, STL, Duplicate Removal, Hash Table, Binary Search Tree, Iterator Invalidation, Custom Comparator, multiset

이 글의 핵심

set vs unordered_set

set vs unordered_set Performance Comparison

multiset and unordered_multiset

Custom Comparator and Hash Function

set: Compare and operator<

unordered_set: Hash and KeyEqual

Practice: Duplicate Removal·Intersection·Union Summary

Iterator Invalidation

set Basic Usage

unordered_set Basic Usage

Practical Examples

Example 1: Remove Duplicates from Array

Example 2: Intersection/Union of Two Arrays

Example 3: Visit Check (Graph Traversal)

Common Problems

Problem 1: Cannot Modify set Elements

Problem 2: Custom Comparison Function

Problem 3: Confusion with multiset

Performance Optimization

Optimization Strategies

Benchmark Results

Advanced: reserve·rehash·Load Distribution

Advanced: Heterogeneous Lookup (C++20, unordered_set)

Advanced: node_type / extract (C++17)

Advanced: Benchmark Method (Practically)

Advanced: Debugging Guide

Advanced: Common Mistake Patterns (Additional)

FAQ

Q1: Can beginners learn this?

Q2: Frequently used in practice?

Q3: Comparison with other languages?

Q4: How long to learn?

Q5: Recommended learning order?

Q6: Common mistakes?

Related Posts (Internal Links)

Related Posts

Practical Tips

Debugging Tips

Performance Tips

Code Review Tips

Practical Checklist

When Choosing Container

When Using Custom Types

When Iterating

Keywords

set: `Compare` and `operator<`

unordered_set: `Hash` and `KeyEqual`

Advanced: `reserve`·`rehash`·Load Distribution

Advanced: Heterogeneous Lookup (C++20, `unordered_set`)

Advanced: `node_type` / `extract` (C++17)