C++ sregex_iterator | Regex iterators for all matches

## Key points

Practical guide to regex iterators: walking every non-overlapping match, tokenizing strings, performance notes, and log-parsing patterns.

## What is regex_iterator?

`std::sregex_iterator` (C++11) traverses every match of a pattern in a string:

```cpp
#include <iostream>
#include <regex>
#include <string>

int main() {
    std::string text = "C++ 11, C++ 14, C++ 17";
    std::regex pattern{R"(\d+)"};

    auto begin = std::sregex_iterator(text.begin(), text.end(), pattern);
    auto end = std::sregex_iterator();

    for (auto it = begin; it != end; ++it) {
        std::cout << it->str() << std::endl;
    }
    // 11
    // 14
    // 17
}
```

## Basic usage

```cpp
#include <iostream>
#include <regex>
#include <string>

int main() {
    std::string text = "abc 123 def 456";
    std::regex pattern{R"(\d+)"};

    // Create the iterators
    auto begin = std::sregex_iterator(text.begin(), text.end(), pattern);
    auto end = std::sregex_iterator();

    // Traverse the matches
    for (auto it = begin; it != end; ++it) {
        std::smatch match = *it;
        std::cout << match.str() << std::endl;
    }
}
```


### Example 1: Word extraction

```cpp
#include <iostream>
#include <regex>
#include <string>
#include <vector>

std::vector<std::string> extractWords(const std::string& text) {
    std::regex pattern{R"(\b\w+\b)"};
    
    auto begin = std::sregex_iterator(text.begin(), text.end(), pattern);
    auto end = std::sregex_iterator();
    
    std::vector<std::string> words;
    for (auto it = begin; it != end; ++it) {
        words.push_back(it->str());
    }
    
    return words;
}

int main() {
    auto words = extractWords("Hello, World! C++ 2026");
    
    for (const auto& word : words) {
        std::cout << word << std::endl;
    }
    // Hello
    // World
    // C
    // 2026
}
```

### Example 2: Capture groups

```cpp
#include <iostream>
#include <regex>
#include <string>

int main() {
    // Placeholder addresses for illustration
    std::string text = "alice@example.com, bob@example.org";
    std::regex pattern{R"((\w+)@(\w+\.\w+))"};
    
    auto begin = std::sregex_iterator(text.begin(), text.end(), pattern);
    auto end = std::sregex_iterator();
    
    for (auto it = begin; it != end; ++it) {
        std::smatch match = *it;
        std::cout << "Email: " << match[0] << std::endl;   // whole match
        std::cout << "User: " << match[1] << std::endl;    // first group
        std::cout << "Domain: " << match[2] << std::endl;  // second group
        std::cout << std::endl;
    }
}
```

### Example 3: URL parsing

```cpp
#include <regex>
#include <string>
#include <vector>

struct URL {
    std::string protocol;
    std::string host;
    std::string path;
};

std::vector<URL> extractURLs(const std::string& text) {
    // Groups: protocol, host (no '/' or whitespace), path (up to the next whitespace)
    std::regex pattern{R"((https?)://([^/\s]+)(/[^\s]*))"};
    
    auto begin = std::sregex_iterator(text.begin(), text.end(), pattern);
    auto end = std::sregex_iterator();
    
    std::vector<URL> urls;
    for (auto it = begin; it != end; ++it) {
        std::smatch match = *it;
        urls.push_back({
            match[1].str(),  // protocol
            match[2].str(),  // host
            match[3].str()   // path
        });
    }
    
    return urls;
}
```

### Example 4: Token splitting

```cpp
#include <iostream>
#include <regex>
#include <string>
#include <vector>

std::vector<std::string> tokenize(const std::string& text) {
    std::regex pattern{R"(\s+)"};  // one or more whitespace characters
    
    std::sregex_token_iterator begin(text.begin(), text.end(), pattern, -1);
    std::sregex_token_iterator end;
    
    return {begin, end};
}

int main() {
    auto tokens = tokenize("Hello   World  C++");
    
    for (const auto& token : tokens) {
        std::cout << "[" << token << "]" << std::endl;
    }
    // [Hello]
    // [World]
    // [C++]
}
```

## regex_token_iterator

```cpp
std::string text = "a,b,c,d";
std::regex pattern{","};

// -1: yield the text between matches (the delimiters themselves are excluded)
std::sregex_token_iterator begin(text.begin(), text.end(), pattern, -1);
std::sregex_token_iterator end;

for (auto it = begin; it != end; ++it) {
    std::cout << *it << std::endl;
}
// a
// b
// c
// d
```

## Common problems

### Problem 1: Iterator lifetime

```cpp
// ❌ Dangling: the returned iterator refers to locals that no longer exist
auto getIterator() {
    std::string text = "hello 123";
    std::regex pattern{R"(\d+)"};
    return std::sregex_iterator(text.begin(), text.end(), pattern);
    // text and pattern are destroyed here
}

// ✅ Keep both the string and the regex alive while the iterator is in use
std::string text = "hello 123";
std::regex pattern{R"(\d+)"};
auto it = std::sregex_iterator(text.begin(), text.end(), pattern);
```

### Problem 2: Performance

```cpp
// std::regex is slow, and constructing one compiles the pattern

// ❌ Compiles on every iteration
for (const auto& text : texts) {
    std::regex pattern{R"(\d+)"};  // recompiled each time
    std::regex_search(text, pattern);
}

// ✅ Compile once and reuse
std::regex pattern{R"(\d+)"};
for (const auto& text : texts) {
    std::regex_search(text, pattern);
}
```

### Problem 3: Empty match

```cpp
std::string text = "hello";
std::regex pattern{R"(\d*)"};  // zero or more digits

// Empty matches are possible, so skip them when iterating
auto begin = std::sregex_iterator(text.begin(), text.end(), pattern);
auto end = std::sregex_iterator();

for (auto it = begin; it != end; ++it) {
    if (!it->str().empty()) {
        std::cout << it->str() << std::endl;
    }
}
```

### Problem 4: Group index

```cpp
std::string text = "user@example.com";  // placeholder address for illustration
std::regex pattern{R"((\w+)@(\w+)\.(\w+))"};

std::smatch matches;
if (std::regex_match(text, matches, pattern)) {
    // matches[0]: the whole match
    // matches[1]: first group
    // matches[2]: second group
    // ...
}
```

## Usage patterns

```cpp
// 1. Validation: the whole string must match
bool isValid = std::regex_match(text, pattern);

// 2. Search: find the first match
std::smatch matches;
std::regex_search(text, matches, pattern);

// 3. Replacement
auto result = std::regex_replace(text, pattern, replacement);

// 4. All matches
auto it = std::sregex_iterator(text.begin(), text.end(), pattern);
```

## How iterative matching behaves

`std::regex_search` finds **only one match per call** (the first one). To iterate over **all non-overlapping matches** in a string, use the `std::regex_iterator` class template, or its alias `std::sregex_iterator` for `std::string` iterators. The iterator behaves as if it called `regex_search` again **starting from the end of the previous match**.

- **Not a whole-string match**: the iterator enumerates **substring matches**. Use `regex_match` to check whether an entire string matches a pattern exactly.
- **Overlapping matches**: the iterator moves the **next search start to the end of the previous match**, so it never reports **overlapping matches** (for example, `aa` in `aaaa` is reported at offsets 0 and 2, not 1). If you need them, you have to drive the search yourself.
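
If overlapping matches are required, one possible approach is to call `regex_search` in a loop and advance the start by a single character instead of jumping past the match. The sketch below is only an illustration: `overlappingMatchOffsets` and `iteratorMatchOffsets` are hypothetical helper names, and the `match_prev_avail` flag is passed on the assumption that `\b` and `^` should still see the preceding character.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: start offsets of all matches, overlapping ones included.
std::vector<std::size_t> overlappingMatchOffsets(const std::string& text,
                                                 const std::regex& re) {
    std::vector<std::size_t> offsets;
    std::smatch m;
    auto start = text.cbegin();
    while (start != text.cend()) {
        // After the first search a character precedes `start`; say so, so that
        // \b and ^ are not evaluated as if `start` were the beginning of the text.
        const auto flags = (start == text.cbegin())
                               ? std::regex_constants::match_default
                               : std::regex_constants::match_prev_avail;
        if (!std::regex_search(start, text.cend(), m, re, flags)) {
            break;
        }
        offsets.push_back(static_cast<std::size_t>(m[0].first - text.cbegin()));
        if (m[0].first == text.cend()) {
            break;  // empty match at the very end: nothing left to scan
        }
        start = m[0].first + 1;  // advance one character, not past the match
    }
    return offsets;
}

// For comparison: offsets reported by the standard (non-overlapping) iterator.
std::vector<std::size_t> iteratorMatchOffsets(const std::string& text,
                                              const std::regex& re) {
    std::vector<std::size_t> offsets;
    for (auto it = std::sregex_iterator(text.begin(), text.end(), re);
         it != std::sregex_iterator(); ++it) {
        offsets.push_back(static_cast<std::size_t>(it->position()));
    }
    return offsets;
}
```

For the text `aaaa` and the pattern `aa`, the first helper reports offsets 0, 1, and 2, while the iterator-based one reports 0 and 2. Restarting the search at every character costs more than a single pass, so this is worth it only when overlaps really matter.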

## `sregex_iterator` type family

- **`sregex_iterator`**: iterates over a `std::string::const_iterator` range with a `std::regex`.
- **`cregex_iterator`**: for `const char*` ranges.
- **`wsregex_iterator`**: for `std::wstring` ranges, used with `std::wregex`.
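
As a small illustration of the family, the sketch below uses `std::cregex_iterator` on a NUL-terminated C string, so no `std::string` copy of the input is needed. The helper name `extractNumbers` is hypothetical and exists only for this example.

```cpp
#include <cstring>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collect every run of digits from a NUL-terminated C string.
std::vector<std::string> extractNumbers(const char* text) {
    static const std::regex digits{R"(\d+)"};     // compiled once
    const char* last = text + std::strlen(text);  // end of the character range

    std::vector<std::string> numbers;
    for (auto it = std::cregex_iterator(text, last, digits);
         it != std::cregex_iterator(); ++it) {
        const std::cmatch& m = *it;  // cmatch is the match type for const char* ranges
        numbers.push_back(m.str());
    }
    return numbers;
}
```

The match type changes with the iterator: `std::cmatch` pairs with `cregex_iterator` in the same way that `std::smatch` pairs with `sregex_iterator`.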

A widely used idiom is to use a default-constructed **`std::sregex_iterator()`** as the end iterator.

```cpp
auto begin = std::sregex_iterator(text.begin(), text.end(), pattern);
auto end   = std::sregex_iterator();
for (auto it = begin; it != end; ++it) {
    const std::smatch& m = *it;
    // m.ready(), m.size(), m.str(n)
}
```

## In practice: log parsing example

Below is an example that extracts the **timestamp, level, and message** from a single log line (adjust the pattern to your log format).

```cpp
#include <regex>
#include <string>
#include <iostream>
#include <vector>

struct LogLine {
    std::string timestamp;
    std::string level;
    std::string message;
};

// Example: "2026-03-30 12:00:00 ERROR something failed"
bool parseLogLine(const std::string& line, LogLine& out) {
    static const std::regex re(
        R"((\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+) (.*))");
    std::smatch m;
    if (!std::regex_match(line, m, re)) {
        return false;
    }
    out.timestamp = m[1].str();
    out.level = m[2].str();
    out.message = m[3].str();
    return true;
}

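// Collect only the lines at specific levels from multi-line text
std::vector<std::string> extractErrors(const std::string& text) {
    // static const: compiled once; '.' does not match a newline, so each match ends at the line end
    static const std::regex levelLine(R"(\b(ERROR|CRITICAL)\b.*)");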
    auto begin = std::sregex_iterator(text.begin(), text.end(), levelLine);
    auto end = std::sregex_iterator();
    std::vector<std::string> errors;
    for (auto it = begin; it != end; ++it) {
        errors.push_back(it->str());
    }
    return errors;
}
```

**Tip**: If you load a whole file into a `std::string`, split it into **lines** first. Use a **line-level loop** for record boundaries and `regex_iterator` for multiple tokens within a line.
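
The split described in the tip could look like the sketch below. It assumes a hypothetical log format of whitespace-separated `key=value` tokens, which is not taken from the article, and `parseKeyValueLines` and `Record` are illustrative names.

```cpp
#include <map>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

using Record = std::map<std::string, std::string>;

// Hypothetical helper: one Record per line, one entry per key=value token.
std::vector<Record> parseKeyValueLines(const std::string& text) {
    static const std::regex pair{R"((\w+)=(\S+))"};  // assumed token format

    std::vector<Record> records;
    std::istringstream input(text);
    std::string line;
    while (std::getline(input, line)) {  // record boundary: one line
        Record record;
        for (auto it = std::sregex_iterator(line.begin(), line.end(), pair);
             it != std::sregex_iterator(); ++it) {  // tokens within the line
            record[(*it)[1].str()] = (*it)[2].str();
        }
        records.push_back(record);
    }
    return records;
}
```

Keeping the regex scoped to a single line also keeps patterns simple, because they never have to reason about newlines.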

## Performance precautions

- **Compilation cost**: The `std::regex` constructor **compiles** the pattern. Construct it **once** outside the loop and reuse it.
- **Engine**: The `std::regex` implementations shipped with GCC (libstdc++) and LLVM (libc++) may be slower than expected on very large inputs or complex patterns. On a hot path, **profile first**, then consider Boost.Regex, RE2-style libraries, or a **hand-written parser**.
- **Allocation**: Each `smatch` allocates storage for its sub-matches, and `str()` copies the matched text into a new `std::string`. For bulk logs, a custom scanner based on **`string_view`** or a **fixed-buffer** parser may be better.
- **`std::regex_constants::optimize`**: A hint whose effect depends on the implementation. It is **not guaranteed to be faster**, so measure before relying on it.
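
One way to reduce per-match copies while still using `sregex_iterator` is to build `std::string_view` objects from the sub-match iterators instead of calling `str()`. The sketch below assumes C++17 and uses a hypothetical helper name, `matchViews`. It does not remove the allocations made by the match object itself, only the extra `std::string` per match.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: views into `text` for every match, without a
// std::string per match. The views are valid only while `text` is alive
// and unmodified.
std::vector<std::string_view> matchViews(const std::string& text,
                                         const std::regex& re) {
    std::vector<std::string_view> views;
    for (auto it = std::sregex_iterator(text.begin(), text.end(), re);
         it != std::sregex_iterator(); ++it) {
        const std::ssub_match& whole = (*it)[0];  // iterator pair, no copy
        const auto offset = static_cast<std::size_t>(whole.first - text.begin());
        const auto length = static_cast<std::size_t>(whole.length());
        views.emplace_back(text.data() + offset, length);
    }
    return views;
}
```

Whether this helps in practice depends on the workload, so it is worth measuring against the plain `str()` version before adopting it.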

## FAQ

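### Q1: What is `std::regex`?

**A**: The regular expression library added to the standard in C++11 (header `<regex>`).

### Q2: What does the iterator do?

**A**: `regex_iterator` walks every non-overlapping match of a pattern in a range.

### Q3: How do I use capture groups?

**A**: Wrap the part to capture in `()` and read it with `matches[N]`; `matches[0]` is the whole match.

### Q4: What about performance?

**A**: `std::regex` tends to be slow. Compile the pattern once and reuse it.

### Q5: How do I split a string into tokens?

**A**: Use `regex_token_iterator` with a submatch index of `-1`.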

### Q6: What are the learning resources?

**A**: 
- “Mastering Regular Expressions”
- cppreference.com
- "C++ Primer"

---

## Related reading (internal links)

Other articles related to this topic:

- [C++ Regex | “Regular Expressions” Guide](/blog/cpp-regex/)
- [C++ subrange | "Subrange" guide](/blog/cpp-subrange/)
- [C++ regular expression | "regex" complete guide](/blog/cpp-regex-guide/)

## Practical tips

These are tips that can be applied right away in practice.

### Debugging tips
- If you run into a problem, check the compiler warnings first.
- Reproduce the problem with a simple test case

### Performance Tips
- Don't optimize without profiling
- Set measurable indicators first

### Code review tips
- Check in advance for areas that are frequently pointed out in code reviews.
- Follow your team's coding conventions

---
## Practical checklist

This is what you need to check when applying this concept in practice.

### Before writing code
- [ ] Is this technique the best way to solve the current problem?
- [ ] Can team members understand and maintain this code?
- [ ] Does it meet the performance requirements?

### Writing code
- [ ] Have you resolved all compiler warnings?
- [ ] Have you considered edge cases?
- [ ] Is error handling appropriate?

### When reviewing code
- [ ] Is the intent of the code clear?
- [ ] Are there enough test cases?
- [ ] Is it documented?

Use this checklist to reduce mistakes and improve code quality.
