C++ std::regex | Regular expressions in C++11

Key takeaways

Hands-on guide to std::regex: validating emails and URLs, extracting matches, replacing text, and avoiding common pitfalls like recompiling patterns in a loop.

Introduction

C++11 <regex> adds regular expressions to the standard library. You can match patterns, search, replace, and build robust string-processing pipelines.


1. Regex basics

Basic usage

A minimal example that checks an email-shaped string:

#include <regex>
#include <iostream>
#include <string>

int main() {
    std::string text = "user@example.com";
    
    // Build regex pattern
    // R"(...)" : raw string literal (no extra escaping)
    // (\w+) : first capture — one or more word chars
    // @ : literal @
    // (\w+\.\w+) : second capture — domain (e.g. example.com)
    std::regex pattern{R"((\w+)@(\w+\.\w+))"};
    
    // regex_match: entire string must match the pattern
    if (std::regex_match(text, pattern)) {
        std::cout << "Valid email shape" << std::endl;
    }
    
    return 0;
}

Pattern pieces:

  • \w : word character (a-z, A-Z, 0-9, _)
  • + : one or more
  • \. : literal dot (escaped)
  • () : capture group (extract later)

Raw string literals

#include <regex>

// ❌ heavy escaping
std::regex pattern1{"\\d+\\.\\d+"};

// ✅ raw string (recommended)
std::regex pattern2{R"(\d+\.\d+)"};

std::regex email{R"(^[\w\.-]+@[\w\.-]+\.\w+$)"};
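
As a quick sanity check that the two spellings above describe the same pattern, a small sketch (the helper name spellingsAgree is hypothetical) could compare them on a given input:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when the escaped and raw-string spellings
// of the same decimal-number pattern give the same answer for `input`.
bool spellingsAgree(const std::string& input) {
    static const std::regex escaped{"\\d+\\.\\d+"};  // ordinary string literal
    static const std::regex raw{R"(\d+\.\d+)"};      // raw string literal
    return std::regex_match(input, escaped) == std::regex_match(input, raw);
}
```

Both objects hold the same pattern text after the compiler processes the literals, so the only difference is how readable the source is.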

2. Regex algorithms

regex_match (full match)

regex_match returns true only if the entire string matches the pattern:

#include <regex>
#include <iostream>
#include <string>

int main() {
    std::string text1 = "123";
    std::string text2 = "abc123";
    
    // \d+ : one or more digits
    std::regex pattern{R"(\d+)"};
    
    // "123" — whole string is digits → match
    std::cout << std::regex_match(text1, pattern) << std::endl;  // 1 (true)
    
    // "abc123" — prefix letters → no full match
    std::cout << std::regex_match(text2, pattern) << std::endl;  // 0 (false)
    
    // regex_match requires the entire string to match (no partial match)
    
    return 0;
}

When to use:

  • Validate phone, email, postal codes against a full-string pattern
  • Check filenames against a pattern end-to-end
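
As an illustration of full-string validation, here is a minimal sketch. The postal-code format (five digits with an optional four-digit extension) and the function name isPostalCode are assumptions chosen for the example, not something the article prescribes:

```cpp
#include <regex>
#include <string>

// Assumed format for illustration: "12345" or "12345-6789".
bool isPostalCode(const std::string& s) {
    // \d{5}      : exactly five digits
    // (-\d{4})?  : optional hyphen plus four digits
    static const std::regex pattern{R"(\d{5}(-\d{4})?)"};
    // regex_match rejects any leading or trailing extra characters.
    return std::regex_match(s, pattern);
}
```

Because regex_match already requires the whole string to match, the pattern does not need ^ and $ anchors.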

regex_search (partial match)

regex_search returns true if the pattern matches somewhere in the string:

#include <regex>
#include <iostream>
#include <string>

int main() {
    std::string text = "C++ 2026";
    
    // \d+ : one or more digits
    std::regex pattern{R"(\d+)"};
    
    // "C++ 2026" contains "2026" → success
    if (std::regex_search(text, pattern)) {
        std::cout << "found digits" << std::endl;
    }
    
    // regex_search finds the first match; use sregex_iterator for all matches
    
    return 0;
}

regex_match vs regex_search:

Function      Behavior             Example
regex_match   Full string matches  "123" with \d+ → true; "abc123" with \d+ → false
regex_search  Substring matches    "123" with \d+ → true; "abc123" with \d+ → true

When to use:

  • Log parsing: find error lines in logs
  • Search: find a pattern in a document
  • Extraction: pull data from HTML/XML
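
The log-parsing case might be sketched as follows. The "[ERROR]" tag format and the function name errorLines are assumptions made for the example:

```cpp
#include <regex>
#include <string>
#include <vector>

// Returns the lines that contain the literal tag "[ERROR]" anywhere.
std::vector<std::string> errorLines(const std::vector<std::string>& lines) {
    // Square brackets are escaped because they normally open a character class.
    static const std::regex pattern{R"(\[ERROR\])"};
    std::vector<std::string> hits;
    for (const auto& line : lines) {
        if (std::regex_search(line, pattern)) {
            hits.push_back(line);
        }
    }
    return hits;
}
```

regex_search is the right tool here because the tag can appear at any position in the line.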

regex_replace (replace)

#include <regex>
#include <iostream>
#include <string>

int main() {
    std::string text = "Hello World 2026";
    std::regex pattern{R"(\d+)"};
    
    // wrap each run of digits in brackets; $& expands to the whole match
    std::string result = std::regex_replace(text, pattern, "[$&]");
    std::cout << result << std::endl;  // Hello World [2026]
    
    return 0;
}
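
Besides $&, the replacement string can refer to capture groups as $1, $2, and so on. A minimal sketch that reorders date fields (the function name toDayFirst is hypothetical):

```cpp
#include <regex>
#include <string>

// Rewrites every YYYY-MM-DD date in the text as DD/MM/YYYY.
std::string toDayFirst(const std::string& text) {
    static const std::regex date{R"((\d{4})-(\d{2})-(\d{2}))"};
    // $1 = year, $2 = month, $3 = day
    return std::regex_replace(text, date, "$3/$2/$1");
}
```

By default every match is replaced. Passing std::regex_constants::format_first_only as a fourth argument limits the replacement to the first match.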

3. Capture groups

Basic captures

#include <regex>
#include <iostream>
#include <string>

int main() {
    std::string text = "2026-03-29";
    std::regex pattern{R"((\d{4})-(\d{2})-(\d{2}))"};
    
    std::smatch matches;
    if (std::regex_match(text, matches, pattern)) {
        std::cout << "full: " << matches[0] << std::endl;  // 2026-03-29
        std::cout << "year: " << matches[1] << std::endl;    // 2026
        std::cout << "month: " << matches[2] << std::endl;    // 03
        std::cout << "day: " << matches[3] << std::endl;    // 29
    }
    
    return 0;
}
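
Captures also work with regex_search, where the match object additionally exposes the text before and after the match. A sketch under assumed names (FoundDate and findDate are hypothetical):

```cpp
#include <regex>
#include <string>

struct FoundDate {       // hypothetical result type
    bool found = false;
    std::string before;  // text preceding the match
    std::string year;    // first capture group
    std::string after;   // text following the match
};

FoundDate findDate(const std::string& text) {
    static const std::regex pattern{R"((\d{4})-(\d{2})-(\d{2}))"};
    FoundDate result;
    std::smatch m;
    if (std::regex_search(text, m, pattern)) {
        result.found = true;
        result.before = m.prefix().str();
        result.year = m[1].str();
        result.after = m.suffix().str();
    }
    return result;
}
```

The smatch holds iterators into the searched string, so that string must stay alive and unmodified while the match object is in use. Copying the pieces into std::string members, as done here, avoids that lifetime concern.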

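Named captures (not in C++11 regex)

std::regex's default ECMAScript grammar has no named-group syntax such as (?<name>...), so groups are addressed by number. A common workaround is to copy each numbered submatch into a descriptively named variable: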

#include <regex>
#include <iostream>
#include <string>

int main() {
    std::string text = "user@example.com";
    std::regex pattern{R"((\w+)@(\w+\.\w+))"};
    
    std::smatch matches;
    if (std::regex_match(text, matches, pattern)) {
        std::string username = matches[1];
        std::string domain = matches[2];
        
        std::cout << "user: " << username << std::endl;  // user
        std::cout << "domain: " << domain << std::endl;    // example.com
    }
    
    return 0;
}
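
Another way to approximate named groups is to give the group indices names with an enum. This is only a sketch; EmailGroup and emailPart are hypothetical names:

```cpp
#include <regex>
#include <string>

// Hypothetical names for the capture-group indices of the pattern below.
enum EmailGroup { Whole = 0, User = 1, Domain = 2 };

// Returns the requested part of the address, or "" if it does not match.
std::string emailPart(const std::string& email, EmailGroup group) {
    static const std::regex pattern{R"((\w+)@(\w+\.\w+))"};
    std::smatch m;
    if (!std::regex_match(email, m, pattern)) {
        return "";
    }
    return m[group].str();  // the enum converts to the numeric index
}
```

The enum must be kept in sync with the pattern by hand: adding or removing a group shifts the indices.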

4. Iterators

Find every match

#include <regex>
#include <iostream>
#include <string>

int main() {
    std::string text = "C++ 11, C++ 14, C++ 17, C++ 20";
    std::regex pattern{R"(C\+\+ (\d+))"};
    
    // iterate all matches
    auto begin = std::sregex_iterator(text.begin(), text.end(), pattern);
    auto end = std::sregex_iterator();
    
    for (auto it = begin; it != end; ++it) {
        std::smatch match = *it;
        std::cout << "full: " << match.str() << std::endl;
        std::cout << "version: " << match[1] << std::endl;
    }
    
    // Output:
    // full: C++ 11
    // version: 11
    // ... and likewise for C++ 14, C++ 17, and C++ 20
    
    return 0;
}
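
The same loop can collect results into a container instead of printing them. A minimal sketch (extractVersions is a hypothetical name) that converts each captured version to an int:

```cpp
#include <regex>
#include <string>
#include <vector>

std::vector<int> extractVersions(const std::string& text) {
    static const std::regex pattern{R"(C\+\+ (\d+))"};
    std::vector<int> versions;
    // A default-constructed sregex_iterator acts as the end marker.
    for (std::sregex_iterator it(text.begin(), text.end(), pattern), end;
         it != end; ++it) {
        // (*it)[1] is the first capture group of the current match.
        versions.push_back(std::stoi((*it)[1].str()));
    }
    return versions;
}
```

The iterator refers to both the regex and the searched string, so both must outlive the loop. std::stoi throws std::out_of_range for digit runs too large for an int, which a production version would need to handle.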

Token iterator

#include <regex>
#include <iostream>
#include <string>

int main() {
    std::string text = "apple,banana,cherry";
    std::regex pattern{R"(,)"};
    
    // split on delimiter
    std::sregex_token_iterator begin(text.begin(), text.end(), pattern, -1);
    std::sregex_token_iterator end;
    
    for (auto it = begin; it != end; ++it) {
        std::cout << *it << std::endl;
    }
    
    // Output:
    // apple
    // banana
    // cherry
    
    return 0;
}
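
The last constructor argument selects what each step yields: -1 gives the text between matches, while a non-negative index gives that capture group of each match. A sketch of the second form, assuming a "key=value" input format and a hypothetical function name valuesOf:

```cpp
#include <regex>
#include <string>
#include <vector>

// Returns the values of all key=value pairs in a string such as "a=1;b=22".
std::vector<std::string> valuesOf(const std::string& text) {
    static const std::regex pair{R"((\w+)=(\w+))"};
    // 2 selects capture group 2 (the value) of every match.
    std::sregex_token_iterator it(text.begin(), text.end(), pair, 2), end;
    // Each sub_match converts to std::string when the vector is filled.
    return std::vector<std::string>(it, end);
}
```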

5. Practical examples

Example 1: Email validation

#include <regex>
#include <iostream>
#include <string>

bool isValidEmail(const std::string& email) {
    // simple email pattern; static so it is compiled only once
    static const std::regex pattern{R"(^[\w\.-]+@[\w\.-]+\.\w{2,}$)"};
    return std::regex_match(email, pattern);
}

int main() {
    std::string emails[] = {
        "user@example.com",
        "invalid.email",
        "user.name@sub.example.org",
        "@example.com"
    };
    
    for (const auto& email : emails) {
        std::cout << email << ": " 
                  << (isValidEmail(email) ? "valid" : "invalid") 
                  << std::endl;
    }
    
    return 0;
}

Example 2: Parse URL

#include <regex>
#include <iostream>
#include <string>

struct URL {
    std::string protocol;
    std::string domain;
    std::string path;
};

URL parseURL(const std::string& url) {
    std::regex pattern{R"(^(https?)://([^/]+)(/.*)?$)"};
    std::smatch matches;
    
    if (std::regex_match(url, matches, pattern)) {
        return {
            matches[1].str(),  // protocol
            matches[2].str(),  // domain
            matches[3].str()   // path
        };
    }
    
    return {};
}

int main() {
    std::string url = "https://example.com/path/to/page";
    URL parsed = parseURL(url);
    
    std::cout << "protocol: " << parsed.protocol << std::endl;
    std::cout << "domain: " << parsed.domain << std::endl;
    std::cout << "path: " << parsed.path << std::endl;
    
    return 0;
}
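
The path group in this pattern is optional, so it may not participate in the match at all. In that case its str() is empty and its matched member is false. A sketch that uses this to supply a default (pathOrRoot is a hypothetical name, and defaulting to "/" is an assumption for the example):

```cpp
#include <regex>
#include <string>

// Returns the path component, or "/" when the URL has none.
// Returns "" when the input is not an http(s) URL.
std::string pathOrRoot(const std::string& url) {
    static const std::regex pattern{R"(^(https?)://([^/]+)(/.*)?$)"};
    std::smatch m;
    if (!std::regex_match(url, m, pattern)) {
        return "";
    }
    // m[3].matched is false when the optional group was skipped.
    return m[3].matched ? m[3].str() : "/";
}
```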

Example 3: Log parsing

#include <regex>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

struct LogEntry {
    std::string timestamp;
    std::string level;
    std::string message;
};

std::vector<LogEntry> parseLog(const std::string& log) {
    std::vector<LogEntry> entries;
    
    // pattern: [timestamp] [LEVEL] message
    std::regex pattern{R"(\[([^\]]+)\] \[(\w+)\] (.+))"};
    
    std::istringstream stream(log);
    std::string line;
    
    while (std::getline(stream, line)) {
        std::smatch matches;
        if (std::regex_match(line, matches, pattern)) {
            entries.push_back({
                matches[1].str(),  // timestamp
                matches[2].str(),  // level
                matches[3].str()   // message
            });
        }
    }
    
    return entries;
}

int main() {
    std::string log = 
        "[2026-03-29 14:30:25] [INFO] Server started\n"
        "[2026-03-29 14:30:26] [ERROR] Connection failed\n"
        "[2026-03-29 14:30:27] [DEBUG] Retrying...";
    
    auto entries = parseLog(log);
    
    for (const auto& entry : entries) {
        std::cout << entry.timestamp << " [" << entry.level << "] " 
                  << entry.message << std::endl;
    }
    
    return 0;
}

6. Common problems

Problem 1: Escaping

#include <regex>

// ❌ painful escaping
std::regex pattern1{"\\d+\\.\\d+"};

// ✅ raw string literal
std::regex pattern2{R"(\d+\.\d+)"};

// complex patterns stay readable
std::regex email{R"(^[\w\.-]+@[\w\.-]+\.\w+$)"};

Problem 2: Performance

#include <regex>
#include <iostream>
#include <chrono>
#include <string>
#include <vector>

int main() {
    std::vector<std::string> texts = {"test1", "test2", "test3"};
    
    // ❌ create regex every iteration (slow)
    auto start1 = std::chrono::high_resolution_clock::now();
    for (const auto& text : texts) {
        std::regex pattern{R"(\d+)"};  // created each time!
        std::regex_search(text, pattern);
    }
    auto end1 = std::chrono::high_resolution_clock::now();
    
    // ✅ reuse one regex (fast)
    auto start2 = std::chrono::high_resolution_clock::now();
    std::regex pattern{R"(\d+)"};  // create once
    for (const auto& text : texts) {
        std::regex_search(text, pattern);
    }
    auto end2 = std::chrono::high_resolution_clock::now();
    
    auto duration1 = std::chrono::duration_cast<std::chrono::microseconds>(end1 - start1).count();
    auto duration2 = std::chrono::duration_cast<std::chrono::microseconds>(end2 - start2).count();
    
    std::cout << "create each time: " << duration1 << " μs" << std::endl;
    std::cout << "reuse: " << duration2 << " μs" << std::endl;
    
    return 0;
}

Fix: reuse one std::regex object.
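
When the pattern lives inside a helper function, one way to reuse it is a function-local static. A minimal sketch (containsDigits is a hypothetical name):

```cpp
#include <regex>
#include <string>

bool containsDigits(const std::string& text) {
    // Constructed once, on the first call; later calls reuse the same object.
    static const std::regex pattern{R"(\d+)"};
    return std::regex_search(text, pattern);
}
```

Since C++11 the initialization of a function-local static is thread-safe, and the object is const afterwards, so this form is a reasonable default for fixed patterns.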

Problem 3: Exceptions

#include <regex>
#include <iostream>

int main() {
    try {
        // ❌ invalid pattern
        std::regex pattern{"[invalid"};  // std::regex_error
        
    } catch (const std::regex_error& e) {
        std::cerr << "regex error: " << e.what() << std::endl;
        std::cerr << "error code: " << e.code() << std::endl;
    }
    
    return 0;
}
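
When the pattern text comes from configuration or user input, the same try/catch can be wrapped into a small check. A sketch, with isValidPattern as a hypothetical name:

```cpp
#include <regex>
#include <string>

// Returns true if the text compiles as a pattern in the default
// (ECMAScript) grammar, false if construction throws std::regex_error.
bool isValidPattern(const std::string& patternText) {
    try {
        std::regex compiled(patternText);
        (void)compiled;  // only successful construction matters here
        return true;
    } catch (const std::regex_error&) {
        return false;
    }
}
```

This only checks syntax. A pattern that compiles can still be slow or match more than intended.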

Problem 4: Flags

#include <regex>
#include <iostream>
#include <string>

int main() {
    std::string text = "Hello World";
    
    // case sensitive (default)
    std::regex pattern1{"hello"};
    std::cout << std::regex_search(text, pattern1) << std::endl;  // 0 (false)
    
    // case insensitive
    std::regex pattern2{"hello", std::regex::icase};
    std::cout << std::regex_search(text, pattern2) << std::endl;  // 1 (true)
    
    return 0;
}
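
Flags are bitmask values and can be combined with bitwise OR. A sketch that adds std::regex::optimize, a hint asking the implementation to favor matching speed over construction speed (mentionsHello is a hypothetical name):

```cpp
#include <regex>
#include <string>

bool mentionsHello(const std::string& text) {
    // icase    : ignore case while matching
    // optimize : hint to spend more time at construction for faster matching
    static const std::regex pattern{
        "hello", std::regex::icase | std::regex::optimize};
    return std::regex_search(text, pattern);
}
```

How much optimize helps, if at all, depends on the standard library implementation, so it is worth measuring before relying on it.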

7. ECMAScript regex syntax

Character classes

\d  // digit [0-9]
\D  // non-digit
\w  // word char
\W  // non-word
\s  // whitespace
\S  // non-whitespace
.   // any char except newline

Quantifiers

*   // zero or more
+   // one or more
?   // zero or one
{n}   // exactly n
{n,}  // n or more
{n,m} // n to m times

Anchors

^   // start of string
$   // end of string
\b  // word boundary
\B  // non-word boundary
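
To illustrate \b, here is a sketch that finds "cat" only as a whole word (containsWordCat is a hypothetical name):

```cpp
#include <regex>
#include <string>

bool containsWordCat(const std::string& text) {
    // \b on both sides rejects "cat" embedded in a longer word.
    static const std::regex pattern{R"(\bcat\b)"};
    return std::regex_search(text, pattern);
}
```

Without the \b anchors the same search would also succeed on "concatenate".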

Groups

()    // capturing group
(?:)  // non-capturing group
|     // OR
[]    // character class
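
A non-capturing group is useful when parentheses are needed only for grouping, because it does not shift the numbering of the real captures. A minimal sketch (digitsAfterAb is a hypothetical name):

```cpp
#include <regex>
#include <string>

// Returns the digits that follow one or more "ab" repetitions, or "".
std::string digitsAfterAb(const std::string& text) {
    // (?:ab)+ groups without capturing, so (\d+) is capture group 1.
    static const std::regex pattern{R"((?:ab)+(\d+))"};
    std::smatch m;
    return std::regex_match(text, m, pattern) ? m[1].str() : std::string();
}
```

std::regex::mark_count() reports the number of capturing groups, which is 1 for this pattern.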

8. Example: text utilities

#include <regex>
#include <iostream>
#include <string>
#include <vector>

class TextProcessor {
public:
    // extract emails
    static std::vector<std::string> extractEmails(const std::string& text) {
        std::vector<std::string> emails;
        std::regex pattern{R"([\w\.-]+@[\w\.-]+\.\w+)"};
        
        auto begin = std::sregex_iterator(text.begin(), text.end(), pattern);
        auto end = std::sregex_iterator();
        
        for (auto it = begin; it != end; ++it) {
            emails.push_back(it->str());
        }
        
        return emails;
    }
    
    // phone numbers (example pattern)
    static std::vector<std::string> extractPhones(const std::string& text) {
        std::vector<std::string> phones;
        std::regex pattern{R"(\d{2,3}-\d{3,4}-\d{4})"};
        
        auto begin = std::sregex_iterator(text.begin(), text.end(), pattern);
        auto end = std::sregex_iterator();
        
        for (auto it = begin; it != end; ++it) {
            phones.push_back(it->str());
        }
        
        return phones;
    }
    
    // strip HTML tags
    static std::string removeHTMLTags(const std::string& html) {
        std::regex pattern{R"(<[^>]*>)"};
        return std::regex_replace(html, pattern, "");
    }
    
    // collapse whitespace
    static std::string normalizeSpaces(const std::string& text) {
        std::regex pattern{R"(\s+)"};
        return std::regex_replace(text, pattern, " ");
    }
    
    // validate URL
    static bool isValidURL(const std::string& url) {
        std::regex pattern{R"(^https?://[\w\.-]+\.\w{2,}(/.*)?$)"};
        return std::regex_match(url, pattern);
    }
};

int main() {
    std::string text = "Contact: user@example.com or call 010-1234-5678";
    
    // extract emails
    auto emails = TextProcessor::extractEmails(text);
    std::cout << "emails:" << std::endl;
    for (const auto& email : emails) {
        std::cout << "  " << email << std::endl;
    }
    
    // extract phones
    auto phones = TextProcessor::extractPhones(text);
    std::cout << "phones:" << std::endl;
    for (const auto& phone : phones) {
        std::cout << "  " << phone << std::endl;
    }
    
    // strip HTML tags
    std::string html = "<p>Hello <b>World</b></p>";
    std::string plain = TextProcessor::removeHTMLTags(html);
    std::cout << "stripped: " << plain << std::endl;
    
    // normalize spaces
    std::string messy = "Hello    World   !";
    std::string clean = TextProcessor::normalizeSpaces(messy);
    std::cout << "normalized spaces: " << clean << std::endl;
    
    // validate URL
    std::cout << "URL valid: " 
              << (TextProcessor::isValidURL("https://example.com") ? "yes" : "no") 
              << std::endl;
    
    return 0;
}

Summary

Key points

  1. <regex>: the regular-expression library added in C++11
  2. regex_match: full-string match
  3. regex_search: substring search
  4. regex_replace: replace matches
  5. Captures: () extract submatches
  6. Iterators: enumerate all matches

Function cheat sheet

Function         Role         Returns   When
regex_match      Full match   bool      Validation
regex_search     Substring    bool      Search
regex_replace    Replace      string    Transform
sregex_iterator  All matches  iterator  Extract many

Tips

Performance:

  • Reuse std::regex objects (construction is expensive)
  • For simple tasks, plain std::string operations such as find are often enough
  • For heavy throughput, consider a dedicated engine such as RE2 or PCRE

Readability:

  • Prefer raw string literals (R"(...)")
  • Comment non-obvious patterns
  • Copy submatches into clearly named variables

Safety:

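  • Catch std::regex_error whenever a pattern can come from outside the program
  • Cap input length before matching
  • Watch for ReDoS (catastrophic backtracking) when patterns or inputs are user-supplied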
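
The safety points can be combined in one wrapper. This is a sketch under stated assumptions: safeMatch and kMaxInputLength are hypothetical names, and the 1024-character limit is an arbitrary placeholder:

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical limit; choose one that fits the application.
constexpr std::size_t kMaxInputLength = 1024;

// Matches untrusted input against an untrusted pattern, treating
// oversized input and any regex error as "no match".
bool safeMatch(const std::string& input, const std::string& patternText) {
    if (input.size() > kMaxInputLength) {
        return false;  // bound the amount of work up front
    }
    try {
        const std::regex pattern(patternText);    // may throw on bad syntax
        return std::regex_match(input, pattern);  // may throw on resource limits
    } catch (const std::regex_error&) {
        return false;
    }
}
```

A length cap reduces the worst case but does not remove it: a pathological pattern can still backtrack heavily on short input, so fully untrusted patterns may call for an engine with linear-time guarantees such as RE2.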

Next steps

  • C++ Regex Iterator
  • C++ String
  • C++ Algorithm Replace
