C++ sregex_iterator | Regex iterators for all matches
## Key points of this article
Practical guide to regex iterators: walking every non-overlapping match, tokenizing strings, performance notes, and log-parsing patterns.
## What is regex_iterator?
`std::regex_iterator` traverses every match of a pattern in a sequence (C++11).
```cpp
#include <iostream>
#include <regex>
#include <string>

int main() {
    std::string text = "C++ 11, C++ 14, C++ 17";
    std::regex pattern{R"(\d+)"};

    auto begin = std::sregex_iterator(text.begin(), text.end(), pattern);
    auto end = std::sregex_iterator();  // default-constructed: end of sequence

    for (auto it = begin; it != end; ++it) {
        std::cout << it->str() << std::endl;
    }
    // 11
    // 14
    // 17
}
```
## Basic usage
```cpp
#include <iostream>
#include <regex>
#include <string>

int main() {
    std::string text = "abc 123 def 456";
    std::regex pattern{R"(\d+)"};

    // Create the begin and end iterators
    auto begin = std::sregex_iterator(text.begin(), text.end(), pattern);
    auto end = std::sregex_iterator();

    // Traverse the matches: prints 123, then 456
    for (auto it = begin; it != end; ++it) {
        std::smatch match = *it;
        std::cout << match.str() << std::endl;
    }
}
```
### Example 1: Word extraction
```cpp
#include <iostream>
#include <regex>
#include <string>
#include <vector>

// Collect every run of word characters in the text.
std::vector<std::string> extractWords(const std::string& text) {
    std::regex pattern{R"(\b\w+\b)"};
    auto begin = std::sregex_iterator(text.begin(), text.end(), pattern);
    auto end = std::sregex_iterator();

    std::vector<std::string> words;
    for (auto it = begin; it != end; ++it) {
        words.push_back(it->str());
    }
    return words;
}

int main() {
    auto words = extractWords("Hello, World! C++ 2026");
    for (const auto& word : words) {
        std::cout << word << std::endl;
    }
    // Hello
    // World
    // C      ('+' is not a word character, so "C++" yields only "C")
    // 2026
}
```
### Example 2: Capture groups
```cpp
#include <iostream>
#include <regex>
#include <string>

int main() {
    // Placeholder addresses for illustration
    std::string text = "alice@example.com, bob@example.org";
    std::regex pattern{R"((\w+)@(\w+\.\w+))"};
    auto begin = std::sregex_iterator(text.begin(), text.end(), pattern);
    auto end = std::sregex_iterator();

    for (auto it = begin; it != end; ++it) {
        std::smatch match = *it;
        std::cout << "Email: " << match[0] << std::endl;   // whole match
        std::cout << "User: " << match[1] << std::endl;    // first group
        std::cout << "Domain: " << match[2] << std::endl;  // second group
        std::cout << std::endl;
    }
}
```
### Example 3: URL parsing
```cpp
#include <regex>
#include <string>
#include <vector>

struct URL {
    std::string protocol;
    std::string host;
    std::string path;
};

// Extract http/https URLs from free text.
std::vector<URL> extractURLs(const std::string& text) {
    // The host stops at '/' or whitespace; the path is optional.
    std::regex pattern{R"((https?)://([^/\s]+)(/\S*)?)"};
    auto begin = std::sregex_iterator(text.begin(), text.end(), pattern);
    auto end = std::sregex_iterator();

    std::vector<URL> urls;
    for (auto it = begin; it != end; ++it) {
        std::smatch match = *it;
        urls.push_back({
            match[1].str(),  // protocol
            match[2].str(),  // host
            match[3].str()   // path (empty if the URL has none)
        });
    }
    return urls;
}
```
### Example 4: Token splitting
```cpp
#include <iostream>
#include <regex>
#include <string>
#include <vector>

// Split text on runs of whitespace.
std::vector<std::string> tokenize(const std::string& text) {
    std::regex pattern{R"(\s+)"};  // one or more whitespace characters
    // -1 selects the text between matches, i.e. the tokens themselves
    std::sregex_token_iterator begin(text.begin(), text.end(), pattern, -1);
    std::sregex_token_iterator end;
    return {begin, end};
}

int main() {
    auto tokens = tokenize("Hello World C++");
    for (const auto& token : tokens) {
        std::cout << "[" << token << "]" << std::endl;
    }
    // [Hello]
    // [World]
    // [C++]
}
```
## regex_token_iterator
```cpp
#include <iostream>
#include <regex>
#include <string>

int main() {
    std::string text = "a,b,c,d";
    std::regex pattern{","};

    // -1: iterate over the text between matches, dropping the delimiters
    std::sregex_token_iterator begin(text.begin(), text.end(), pattern, -1);
    std::sregex_token_iterator end;

    for (auto it = begin; it != end; ++it) {
        std::cout << *it << std::endl;
    }
    // a
    // b
    // c
    // d
}
```
## Common problems
### Problem 1: Iterator lifetime
```cpp
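// ❌ Dangling: text and pattern are destroyed when the function returns,
// but the returned iterator still refers to both
std::sregex_iterator getIterator() {
    std::string text = "hello 123";
    std::regex pattern{R"(\d+)"};
    return std::sregex_iterator(text.begin(), text.end(), pattern);
}

// ✅ Keep the string and the regex alive for as long as the iterator is used
std::string text = "hello 123";
std::regex pattern{R"(\d+)"};
auto it = std::sregex_iterator(text.begin(), text.end(), pattern);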
```
### Problem 2: Performance
```cpp
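// Constructing a std::regex compiles the pattern, which is slow.
// `texts` is assumed to be a container of std::string.

// ❌ Compiles the pattern on every iteration
for (const auto& text : texts) {
    std::regex pattern{R"(\d+)"};  // recompiled each time
    std::regex_search(text, pattern);
}

// ✅ Compile once, then reuse
std::regex pattern{R"(\d+)"};
for (const auto& text : texts) {
    std::regex_search(text, pattern);
}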
```
### Problem 3: Empty matches
```cpp
std::string text = "hello";
std::regex pattern{R"(\d*)"};  // zero or more digits

// The pattern can match the empty string, so the iterator
// yields empty matches here; skip them explicitly
auto begin = std::sregex_iterator(text.begin(), text.end(), pattern);
auto end = std::sregex_iterator();
for (auto it = begin; it != end; ++it) {
    if (it->length() > 0) {
        std::cout << it->str() << std::endl;
    }
}
```
### Problem 4: Group indices
```cpp
std::string text = "user@example.com";  // placeholder address
std::regex pattern{R"((\w+)@(\w+)\.(\w+))"};
std::smatch matches;
if (std::regex_match(text, matches, pattern)) {
    // matches[0]: the entire match ("user@example.com")
    // matches[1]: first group  ("user")
    // matches[2]: second group ("example")
    // matches[3]: third group  ("com")
}
```
## Usage patterns
```cpp
// 1. Validation: does the whole string match?
bool isValid = std::regex_match(text, pattern);

// 2. Search: find the first match
std::smatch matches;
std::regex_search(text, matches, pattern);

// 3. Replacement
auto result = std::regex_replace(text, pattern, replacement);

// 4. All matches
auto it = std::sregex_iterator(text.begin(), text.end(), pattern);
```
## Behavior of iterative regex matching
`std::regex_search` finds **only one match per call**. To iterate over **all non-overlapping matches** in a string, use `std::regex_iterator` (for character sequences in general) or `std::sregex_iterator` (for `std::string` iterators). Conceptually, the iterator calls `regex_search` again **starting from the end of the previous match**.
- **Not a whole-string match**: the iterator reports **partial (substring) matches**. Use `regex_match` to check whether an entire string matches a pattern exactly.
- **Overlapping matches**: because the next search starts at the end of the previous match, **overlapping occurrences** (e.g. the pattern `aa` in the text `aaaa`) are not all reported. Finding them requires a separate design that depends on your requirements.
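To make the difference concrete, here is a minimal sketch that contrasts the two behaviors. `std::sregex_iterator` reports non-overlapping matches, while a hand-written `regex_search` loop that restarts one character after each match start also reports overlapping ones. The helper names `nonOverlappingOffsets` and `overlappingOffsets` are hypothetical, and the manual loop is only one possible design: it does not pass flags such as `match_prev_avail`, so patterns that rely on `\b` or `^` may need extra care.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>
#include <vector>

// Start offsets of the matches reported by std::sregex_iterator
// (each search resumes at the end of the previous match).
std::vector<std::size_t> nonOverlappingOffsets(const std::string& text,
                                               const std::regex& re) {
    std::vector<std::size_t> offsets;
    const std::sregex_iterator end;
    for (std::sregex_iterator it(text.cbegin(), text.cend(), re); it != end; ++it) {
        offsets.push_back(static_cast<std::size_t>(it->position(0)));
    }
    return offsets;
}

// Start offsets including overlapping matches: after each hit, restart the
// search one character past the START of that hit (hypothetical helper).
std::vector<std::size_t> overlappingOffsets(const std::string& text,
                                            const std::regex& re) {
    std::vector<std::size_t> offsets;
    std::smatch m;
    auto from = text.cbegin();
    while (std::regex_search(from, text.cend(), m, re)) {
        const auto matchStart = m[0].first;
        offsets.push_back(
            static_cast<std::size_t>(std::distance(text.cbegin(), matchStart)));
        if (matchStart == text.cend()) {
            break;  // empty match at the very end: nothing left to scan
        }
        from = std::next(matchStart);
    }
    return offsets;
}
```

For the pattern `aa` and the text `aaaa`, `nonOverlappingOffsets` returns the offsets 0 and 2, while `overlappingOffsets` returns 0, 1, and 2.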
## `sregex_iterator` type family
- **`sregex_iterator`**: iterates over a `std::string::const_iterator` range and is constructed with a `std::regex`.
- **`cregex_iterator`**: for `const char*` ranges.
- **`wsregex_iterator`**: for `std::wstring` (used with `std::wregex`).
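As a rough illustration of how the family lines up, the sketch below runs the same digit-counting loop over a `const char*` range with `std::cregex_iterator` and over a `std::wstring` with `std::wsregex_iterator`. The helper names `countNumbers` and `countNumbersWide` are made up for this example.

```cpp
#include <cstddef>
#include <cstring>
#include <iterator>
#include <regex>
#include <string>

// Count runs of digits in a C string using the const char* flavor.
std::size_t countNumbers(const char* text) {
    static const std::regex digits{R"(\d+)"};
    std::cregex_iterator begin(text, text + std::strlen(text), digits);
    std::cregex_iterator end;
    return static_cast<std::size_t>(std::distance(begin, end));
}

// Same idea for wide strings: std::wsregex_iterator needs a std::wregex.
std::size_t countNumbersWide(const std::wstring& text) {
    static const std::wregex digits{LR"(\d+)"};
    std::wsregex_iterator begin(text.cbegin(), text.cend(), digits);
    std::wsregex_iterator end;
    return static_cast<std::size_t>(std::distance(begin, end));
}
```

The loop shape is identical in both functions; only the character type of the regex and of the iterated range changes.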
The common idiom is to use a default-constructed **`std::sregex_iterator()`** as the end iterator.
```cpp
auto begin = std::sregex_iterator(text.begin(), text.end(), pattern);
auto end = std::sregex_iterator();
for (auto it = begin; it != end; ++it) {
    const std::smatch& m = *it;
    // m.ready(), m.size(), m.str(n)
}
```
## In practice: log parsing example
The example below extracts the **timestamp, level, and message** from a single log line (adjust the pattern to suit your log format).
```cpp
#include <iostream>
#include <regex>
#include <string>
#include <vector>

struct LogLine {
    std::string timestamp;
    std::string level;
    std::string message;
};

// Example: "2026-03-30 12:00:00 ERROR something failed"
bool parseLogLine(const std::string& line, LogLine& out) {
    // Compiled once and reused across calls
    static const std::regex re(
        R"((\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+) (.*))");
    std::smatch m;
    if (!std::regex_match(line, m, re)) {
        return false;
    }
    out.timestamp = m[1].str();
    out.level = m[2].str();
    out.message = m[3].str();
    return true;
}

// Collect only entries of specific levels from multi-line text
std::vector<std::string> extractErrors(const std::string& text) {
    // '.' does not match a newline, so each match runs from the
    // level keyword to the end of its line
    static const std::regex levelLine(R"(\b(ERROR|CRITICAL)\b.*)");
    auto begin = std::sregex_iterator(text.begin(), text.end(), levelLine);
    auto end = std::sregex_iterator();

    std::vector<std::string> errors;
    for (auto it = begin; it != end; ++it) {
        errors.push_back(it->str());
    }
    return errors;
}
```
**Tip**: If you load a whole file into a `std::string`, split it into **lines** first. The **line-level loop** then handles record boundaries, and `regex_iterator` handles multiple tokens within a line.
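One way to combine the two loops is sketched below: `std::getline` splits the buffer into records, and `std::sregex_iterator` pulls every `key=value` token out of each line. The `key=value` format, the `KeyValue` struct, and the `parseKeyValues` function are assumptions made for this illustration, not part of any particular log format.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

struct KeyValue {
    std::string key;
    std::string value;
};

// Outer loop: one record per line. Inner loop: every key=value token in it.
std::vector<std::vector<KeyValue>> parseKeyValues(const std::string& text) {
    static const std::regex token{R"((\w+)=(\S+))"};
    std::vector<std::vector<KeyValue>> records;

    std::istringstream input(text);
    std::string line;
    while (std::getline(input, line)) {
        std::vector<KeyValue> fields;
        const std::sregex_iterator end;
        for (std::sregex_iterator it(line.cbegin(), line.cend(), token);
             it != end; ++it) {
            // Copy the captures out while `line` is still alive.
            fields.push_back({(*it)[1].str(), (*it)[2].str()});
        }
        records.push_back(std::move(fields));
    }
    return records;
}
```

Because the captures are copied into `KeyValue` before the next `std::getline` call overwrites `line`, no iterator outlives the string it points into.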
## Performance precautions
- **Compilation cost**: The `std::regex` constructor **compiles** the pattern. Construct it **once**, outside the loop, and reuse it.
- **Engine**: The `std::regex` implementations shipped with GCC and LLVM may be slower than expected on very large inputs or complex patterns. On a hot path, **profile first**, then consider Boost.Regex, an RE2-style library, or a **hand-written parser**.
- **Allocation**: `smatch` and `sregex_iterator` can allocate internally, and `str()` copies each match into a new **substring**. For bulk logs, a custom scanner based on **`string_view`** or a **fixed-buffer** parser may be better.
- **`std::regex_constants::optimize`**: Hints that the implementation should favor matching speed, but it is **not guaranteed to be faster**. Measurement takes precedence.
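The first and third points can be sketched together: compile the pattern once, optionally with the `optimize` hint, and record each match as an offset/length pair instead of copying it out with `str()`. The `Span` struct and the `findNumberSpans` function are hypothetical names for this example, and whether this is actually faster on your inputs is something only measurement can tell.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Position of a match inside the original text; no characters are copied.
struct Span {
    std::size_t offset;
    std::size_t length;
};

std::vector<Span> findNumberSpans(const std::string& text) {
    // Compiled once, on the first call, and reused by every later call.
    static const std::regex digits(
        R"(\d+)", std::regex::ECMAScript | std::regex::optimize);

    std::vector<Span> spans;
    const std::sregex_iterator end;
    for (std::sregex_iterator it(text.cbegin(), text.cend(), digits);
         it != end; ++it) {
        spans.push_back({static_cast<std::size_t>(it->position(0)),
                         static_cast<std::size_t>(it->length(0))});
    }
    return spans;
}
```

A caller that needs the matched text later can index back into `text` with `offset` and `length`, for example through a `std::string_view` in C++17, as long as `text` outlives the spans.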
## FAQ
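### Q1: What is `std::regex`?
**A**: The standard library's regular expression facility, available since C++11.
### Q2: What does `regex_iterator` do?
**A**: It iterates over every match of a pattern in a sequence.
### Q3: How do I use capture groups?
**A**: Wrap the sub-pattern in `()` and read it with `matches[N]`.
### Q4: How is the performance?
**A**: `std::regex` can be slow. Compile the pattern once and reuse it.
### Q5: How do I split a string into tokens?
**A**: Use `regex_token_iterator`.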
### Q6: What are the learning resources?
**A**:
- "Mastering Regular Expressions"
- cppreference.com
- "C++ Primer"
---
## Related reading (internal links)
Other articles related to this topic:
- [C++ Regex | “Regular Expressions” Guide](/blog/cpp-regex/)
- [C++ subrange | "Subrange" guide](/blog/cpp-subrange/)
- [C++ regular expression | "regex" complete guide](/blog/cpp-regex-guide/)
## Practical tips
These are tips that can be applied right away in practice.
### Debugging tips
- If you run into a problem, check the compiler warnings first.
- Reproduce the problem with a simple test case.
### Performance Tips
- Don't optimize without profiling.
- Define measurable metrics first.
### Code review tips
- Check in advance for issues that are frequently raised in code reviews.
- Follow your team's coding conventions.
---
## Practical checklist
This is what you need to check when applying this concept in practice.
### Before writing code
- [ ] Is this technique the best way to solve the current problem?
- [ ] Can team members understand and maintain this code?
- [ ] Does it meet the performance requirements?
### While writing code
- [ ] Have you resolved all compiler warnings?
- [ ] Have you considered edge cases?
- [ ] Is error handling appropriate?
### When reviewing code
- [ ] Is the intent of the code clear?
- [ ] Are there enough test cases?
- [ ] Is it documented?
Use this checklist to reduce mistakes and improve code quality.