C++ set/unordered_set | "Duplicate Removal" Complete Guide
이 글의 핵심
Covers selection criteria for set and unordered_set, multiset·hash·iterator rules, and practical set operations.
set vs unordered_set
| Feature | set | unordered_set |
|---|---|---|
| Sorting | O (auto-sorted) | X |
| Speed | O(log n) | O(1) |
| Duplicates | Not allowed | Not allowed |
| Traversal | Sorted order | Random |
set vs unordered_set Performance Comparison
From time complexity (average/amortized) perspective, set is based on balanced binary search tree (typically red-black), so insert/delete/search is O(log n). unordered_set is a hash table, so average O(1), but with many hash collisions can approach worst case O(n).
Practical differences are:
- Small n·order needed: If element count is hundreds~thousands and need sorted traversal or range search like lower_bound,
setis simpler. - Large n·key lookup only: For millions+ where you only need fast “exists/not exists” check and order doesn’t matter,
unordered_setis often advantageous. - Memory: Hash table has additional overhead from buckets/load factor, tree has pointer overhead per node. Varies by n and key size, so measurement is safe.
- Cache locality: Neither uses contiguous memory, both have frequent skip access causing cache misses. Hard to say “unordered is always faster” based on traversal alone.
Selection Summary: If need order/range search use set, if keys are hash-friendly (integers, well-distributed strings) and mostly lookups, consider unordered_set first.
multiset and unordered_multiset
set / unordered_set store only one of each key. If need same value multiple times, use multiset, unordered_multiset.
| Feature | multiset | unordered_multiset |
|---|---|---|
| Order | Maintains sort (same value insertion order is implementation-defined) | No order |
| Lookup | count, equal_range for same key range | count, equal_range |
| Delete | erase(key) removes all of that key, use iterator to remove one | Same |
#include <set>
#include <unordered_set>
#include <iostream>
int main() {
std::multiset<int> ms{1, 1, 2, 2, 2};
std::cout << ms.count(2) << '\n'; // 3
auto [first, last] = ms.equal_range(2);
for (auto it = first; it != last; ++it) { /* ... */ }
std::unordered_multiset<std::string> ums;
ums.insert("a"); ums.insert("a");
std::cout << ums.count("a") << '\n'; // 2
}
When only top k needed, multiset maintains sorted state making it easy to handle max/min at ends. If only counting frequency and order not needed, unordered_multiset or unordered_map<T, int> is often better.
Custom Comparator and Hash Function
set: Compare and operator<
set sorting criterion must satisfy strict weak ordering. Can use operator< or std::less for custom types.
struct Person {
std::string name;
int id;
};
struct ById {
bool operator()(const Person& a, const Person& b) const {
return a.id < b.id;
}
};
std::set<Person, ById> by_id_set;
unordered_set: Hash and KeyEqual
In unordered_set<Key, Hash, KeyEqual>, same key must always produce same hash, and KeyEqual determines actual equality on hash collision. Custom types typically provide both.
struct Point {
int x, y;
bool operator==(const Point& o) const { return x == o.x && y == o.y; }
};
struct PointHash {
std::size_t operator()(const Point& p) const noexcept {
// Simple combination example (consider boost::hash_combine for projects)
return std::hash<int>{}(p.x) ^ (std::hash<int>{}(p.y) << 1);
}
};
std::unordered_set<Point, PointHash> pts;
Caution: When putting strings in unordered_set, bad custom hash causes severe performance drop. If key is std::string, typically use default hash.
Practice: Duplicate Removal·Intersection·Union Summary
- Duplicate removal only with original order preservation: Putting in
setthen moving to vector changes order. If order preservation needed, consider sort then unique orunordered_set+ separate order array. - Intersection·union·difference: Above
set_intersection/set_union/set_differencepatterns require both ranges sorted. Usingsetautomatically satisfies sorted order via iterators. - unordered_set operations: If library doesn’t provide one-shot operations, collect one side into
vectorand sort, or iterate smaller side and check inclusion in other withcount/find(choose better algorithm by size).
Iterator Invalidation
set / multiset
insert/emplace: Existing element iterators and references not invalidated.erase(it): Only that iterator invalid. Other iterators maintained.erase(key)/clear: Iterators to deleted elements invalid.
unordered_set / unordered_multiset
- Rehash changes bucket structure, all iterators may be invalid.
insertexceeding load factor can trigger rehash, so indiscriminate insertion during iteration is dangerous. reserve(n)(or appropriate bucket count reservation) reduces rehash frequency, making iterator invalidation less frequent.erase: Only iterator at deleted position invalid, other iterators usually valid after C++11, but implementation-dependent, so using erase’s return iterator pattern is safe.
for (auto it = s.begin(); it != s.end(); ) {
if (condition) it = s.erase(it); // erase returns next iterator
else ++it;
}
set Basic Usage
#include <set>
#include <iostream>
using namespace std;
int main() {
set<int> s;
// Insert
s.insert(3);
s.insert(1);
s.insert(2);
s.insert(1); // Duplicate ignored
// Traverse (auto-sorted: 1, 2, 3)
for (int x : s) {
cout << x << " ";
}
// Search
if (s.find(2) != s.end()) {
cout << "\n2 exists" << endl;
}
// Delete
s.erase(2);
// Size
cout << "Size: " << s.size() << endl;
return 0;
}
unordered_set Basic Usage
#include <unordered_set>
#include <iostream>
using namespace std;
int main() {
unordered_set<string> words;
words.insert("apple");
words.insert("banana");
words.insert("apple"); // Duplicate ignored
// Check existence (fast)
if (words.count("apple") > 0) {
cout << "apple exists" << endl;
}
// Traverse (order not guaranteed)
for (const string& word : words) {
cout << word << " ";
}
return 0;
}
Practical Examples
Example 1: Remove Duplicates from Array
#include <iostream>
#include <vector>
#include <set>
using namespace std;
vector<int> removeDuplicates(const vector<int>& arr) {
set<int> s(arr.begin(), arr.end());
return vector<int>(s.begin(), s.end());
}
int main() {
vector<int> arr = {5, 2, 8, 2, 9, 5, 1, 8};
cout << "Original: ";
for (int x : arr) cout << x << " ";
vector<int> unique = removeDuplicates(arr);
cout << "\nDuplicate removed (sorted): ";
for (int x : unique) cout << x << " ";
return 0;
}
Explanation: Most basic pattern utilizing set’s auto-sorting and duplicate removal.
Example 2: Intersection/Union of Two Arrays
#include <iostream>
#include <set>
#include <algorithm>
using namespace std;
int main() {
set<int> a = {1, 2, 3, 4, 5};
set<int> b = {3, 4, 5, 6, 7};
// Intersection
set<int> intersection;
set_intersection(a.begin(), a.end(),
b.begin(), b.end(),
inserter(intersection, intersection.begin()));
cout << "Intersection: ";
for (int x : intersection) cout << x << " "; // 3 4 5
// Union
set<int> union_set;
set_union(a.begin(), a.end(),
b.begin(), b.end(),
inserter(union_set, union_set.begin()));
cout << "\nUnion: ";
for (int x : union_set) cout << x << " "; // 1 2 3 4 5 6 7
// Difference
set<int> difference;
set_difference(a.begin(), a.end(),
b.begin(), b.end(),
inserter(difference, difference.begin()));
cout << "\nDifference (A-B): ";
for (int x : difference) cout << x << " "; // 1 2
return 0;
}
Explanation: Can implement mathematical set operations directly. Frequently used in algorithm problems.
Example 3: Visit Check (Graph Traversal)
#include <iostream>
#include <unordered_set>
#include <queue>
#include <vector>
using namespace std;
void bfs(int start, vector<vector<int>>& graph) {
unordered_set<int> visited;
queue<int> q;
q.push(start);
visited.insert(start);
while (!q.empty()) {
int node = q.front();
q.pop();
cout << node << " ";
for (int neighbor : graph[node]) {
if (visited.count(neighbor) == 0) {
visited.insert(neighbor);
q.push(neighbor);
}
}
}
}
int main() {
// Graph: 0-1, 0-2, 1-3, 2-3
vector<vector<int>> graph = {
{1, 2}, // Node 0's neighbors
{0, 3}, // Node 1's neighbors
{0, 3}, // Node 2's neighbors
{1, 2} // Node 3's neighbors
};
cout << "BFS traversal: ";
bfs(0, graph);
return 0;
}
Explanation: Use unordered_set to check visited status at O(1) speed. Essential pattern for graph algorithms.
Common Problems
Problem 1: Cannot Modify set Elements
Symptom: Compile error when trying to modify set element directly
Cause: set elements are const to maintain sorting
Solution:
// ❌ Wrong code
set<int> s = {1, 2, 3};
auto it = s.find(2);
*it = 5; // Compile error! const int&
// ✅ Correct code (delete then reinsert)
set<int> s = {1, 2, 3};
auto it = s.find(2);
if (it != s.end()) {
s.erase(it);
s.insert(5);
}
// ✅ For structs (use mutable)
struct Item {
int id;
mutable int count; // mutable can be modified
bool operator<(const Item& other) const {
return id < other.id;
}
};
set<Item> items;
auto it = items.find(Item{1, 0});
if (it != items.end()) {
it->count++; // OK (mutable)
}
Problem 2: Custom Comparison Function
Symptom: Compile error when inserting custom type into set
Cause: Needs operator< or comparison function
Solution:
// ❌ Compile error
struct Point {
int x, y;
};
set<Point> points; // Error! No operator<
// ✅ Method 1: Define operator<
struct Point {
int x, y;
bool operator<(const Point& other) const {
if (x != other.x) return x < other.x;
return y < other.y;
}
};
set<Point> points; // OK
// ✅ Method 2: Comparison function object
struct PointCompare {
bool operator()(const Point& a, const Point& b) const {
if (a.x != b.x) return a.x < b.x;
return a.y < b.y;
}
};
set<Point, PointCompare> points; // OK
// ✅ Method 3: Lambda (C++11)
auto cmp = [](const Point& a, const Point& b) {
if (a.x != b.x) return a.x < b.x;
return a.y < b.y;
};
set<Point, decltype(cmp)> points(cmp); // OK
Problem 3: Confusion with multiset
Symptom: Want to allow duplicates but set doesn’t
Cause: set doesn’t allow duplicates, multiset does
Solution:
// ❌ set doesn't allow duplicates
set<int> s;
s.insert(1);
s.insert(1);
s.insert(1);
cout << s.size(); // 1 (duplicates ignored)
// ✅ Use multiset
#include <set>
multiset<int> ms;
ms.insert(1);
ms.insert(1);
ms.insert(1);
cout << ms.size(); // 3 (duplicates allowed)
// Count specific value
cout << ms.count(1); // 3
// Delete all of specific value
ms.erase(1); // All 1s deleted
// Delete only one
auto it = ms.find(1);
if (it != ms.end()) {
ms.erase(it); // Delete only one
}
Performance Optimization
Optimization Strategies
-
Choose efficient data structure
- How: Use appropriate STL container for situation
- Effect: Improve time complexity
-
Prevent unnecessary copying
- How: Use reference passing
- Effect: Reduce memory usage
-
Compiler optimization
- How: Use -O2, -O3 flags
- Effect: Improve execution speed
Benchmark Results
| Method | Execution Time | Memory Usage | Notes |
|---|---|---|---|
| Basic implementation | 100ms | 10MB | - |
| Optimization 1 | 80ms | 8MB | Reference passing |
| Optimization 2 | 50ms | 5MB | STL algorithms |
Conclusion: Appropriate optimization can improve performance 2x or more
Caution: Absolute time/memory numbers in above table are examples. In real services, remeasure before/after with same input and compile options.
Advanced: reserve·rehash·Load Distribution
unordered_set triggers rehash when exceeding load factor, at which point all iterators may be invalid. If approximate element count known before insertion:
std::unordered_set<int> s;
s.reserve(1'000'000); // Reduce rehash frequency, stabilize benchmark
set doesn’t have reserve, but reducing unnecessary copies/temporary objects helps perceived performance.
Advanced: Heterogeneous Lookup (C++20, unordered_set)
When key is std::string, to prevent find("literal") from creating temporary std::string, need custom policy with transparent hash/transparent equality (std::hash<std::string_view>, std::equal_to<>). Standard specialization varies by implementation/version, so check project’s standard library documentation.
// Concept example: lookup with string_view (must verify project policy/standard version)
// std::unordered_set<std::string, SVHash, SVEqual> etc.
Advanced: node_type / extract (C++17)
set/unordered_set can extract nodes and move to other containers, used for patterns that update keys without reallocation.
std::set<int> a{1, 2, 3};
auto nh = a.extract(2);
if (!nh.empty()) {
nh.value() = 5;
a.insert(std::move(nh));
}
unordered_set similar but watch rehash timing.
Advanced: Benchmark Method (Practically)
- Fix: Compiler optimization (
-O2/-O3), CPU fixed frequency (if possible), same seed. - Warmup: Start timer after cache/hash table warmup.
- Metrics: Not just average but p95/p99, especially for
unordered_*include worst-case hash input as separate case.
#include <chrono>
template<typename F>
auto bench(F&& f, int reps) {
using clock = std::chrono::steady_clock;
auto t0 = clock::now();
for (int i = 0; i < reps; ++i) f();
return clock::now() - t0;
}
Advanced: Debugging Guide
| Symptom | Check |
|---|---|
Slow in unordered_set | Hash quality, key collision, load factor, wrong operator== |
| Abnormal behavior during iteration | Rehash invalidates iterators — use reserve or copy-then-iterate |
set sorting is strange | Check if comparison operation satisfies strict weak ordering |
Advanced: Common Mistake Patterns (Additional)
- Using
vectoras key inunordered_setwith only default hash → Hash inefficient or definition missing. Use immutable key as identifier or define custom hash. - Wrong iterator increment during
erasewhile iterating → Useit = container.erase(it)pattern. - Assuming const member read-only is safe in multithreading →
unordered_setneeds synchronization even for reads (read-only snapshot orshared_mutexdesign).
FAQ
Q1: Can beginners learn this?
A: Yes, this guide is written for beginners. Basic C++ syntax knowledge is sufficient.
Q2: Frequently used in practice?
A: Yes, very frequently. Essential concept in production projects.
Q3: Comparison with other languages?
A: C++ advantage is performance and control. Faster than Python, more flexible than Java.
Q4: How long to learn?
A: Basic concepts 1-2 hours, mastery about 1-2 weeks.
Q5: Recommended learning order?
A:
- Learn basic syntax
- Follow simple examples
- Apply to practical projects
- Learn advanced techniques
Q6: Common mistakes?
A:
- Not initializing
- Memory management mistakes
- Not considering time complexity
- Missing exception handling
Related Posts (Internal Links)
Posts connected to this topic.
- C++ string | “String Processing” Complete Guide [Practical Function Summary]
- C++ Algorithm Set | “Set Algorithms” Guide
- C++ STL vector | “More Convenient Than Arrays” Vector Complete Guide [Practical Examples]
Related Posts
- C++ map vs unordered_map (STL Series)
- C++ map·set Complete Guide | ordered vs unordered· Custom Keys
Practical Tips
Tips you can apply immediately in practice.
Debugging Tips
- Check compiler warnings first when problems occur
- Reproduce problem with simple test cases
- Use print debugging to trace execution
Performance Tips
- Choose correct data structure for use case
- Measure before optimizing
- Profile to find bottlenecks
Code Review Tips
- Pre-check frequently pointed out parts
- Follow team coding conventions
- Write clear variable names
Practical Checklist
When Choosing Container
- Need sorted order?
- Need range queries (lower_bound)?
- Only existence check?
- What is expected element count?
When Using Custom Types
- Defined operator< or comparator?
- Defined hash and operator==?
- Tested with sample data?
When Iterating
- Modifying container during iteration?
- Using safe erase pattern?
- Considered iterator invalidation?
Keywords
C++, set, unordered_set, Set, STL, Duplicate Removal, Hash Table, Binary Search Tree, Iterator Invalidation, Custom Comparator, multiset