C++ std::regex | Regular expressions in C++11
Key takeaways
Hands-on guide to std::regex: validating emails and URLs, extracting matches, replacing text, and avoiding common pitfalls like recompiling patterns in a loop.
Introduction
C++11 <regex> adds regular expressions to the standard library. You can match patterns, search, replace, and build robust string-processing pipelines.
1. Regex basics
Basic usage
A minimal example that checks an email-shaped string:
#include <regex>
#include <iostream>
#include <string>
int main() {
std::string text = "user@example.com";
// Build regex pattern
// R"(...)" : raw string literal (no extra escaping)
// (\w+) : first capture — one or more word chars
// @ : literal @
// (\w+\.\w+) : second capture — domain (e.g. example.com)
std::regex pattern{R"((\w+)@(\w+\.\w+))"};
// regex_match: entire string must match the pattern
if (std::regex_match(text, pattern)) {
std::cout << "Valid email shape" << std::endl;
}
return 0;
}
Pattern pieces:
- \w: word character (a-z, A-Z, 0-9, _)
- +: one or more
- \.: literal dot (escaped)
- (): capture group (extract later)
Raw string literals
#include <regex>
// ❌ heavy escaping
std::regex pattern1{"\\d+\\.\\d+"};
// ✅ raw string (recommended)
std::regex pattern2{R"(\d+\.\d+)"};
std::regex email{R"(^[\w\.-]+@[\w\.-]+\.\w+$)"};
2. Regex algorithms
regex_match (full match)
regex_match returns true only if the entire string matches the pattern:
#include <regex>
#include <iostream>
#include <string>
int main() {
std::string text1 = "123";
std::string text2 = "abc123";
// \d+ : one or more digits
std::regex pattern{R"(\d+)"};
// "123" — whole string is digits → match
std::cout << std::regex_match(text1, pattern) << std::endl; // 1 (true)
// "abc123" — prefix letters → no full match
std::cout << std::regex_match(text2, pattern) << std::endl; // 0 (false)
// regex_match requires the entire string to match (no partial match)
return 0;
}
When to use:
- Validate phone, email, postal codes against a full-string pattern
- Check filenames against a pattern end-to-end
regex_search (partial match)
regex_search returns true if the pattern matches somewhere in the string:
#include <regex>
#include <iostream>
#include <string>
int main() {
std::string text = "C++ 2026";
// \d+ : one or more digits
std::regex pattern{R"(\d+)"};
// "C++ 2026" contains "2026" → success
if (std::regex_search(text, pattern)) {
std::cout << "found digits" << std::endl;
}
// regex_search finds the first match; use sregex_iterator for all matches
return 0;
}
regex_match vs regex_search:
| Function | Behavior | Example |
|---|---|---|
| regex_match | The whole string must match | "123" with \d+ → true; "abc123" with \d+ → false |
| regex_search | A substring may match | "123" with \d+ → true; "abc123" with \d+ → true |
When to use:
- Log parsing: find error lines in logs
- Search: find a pattern in a document
- Extraction: pull fields out of simple HTML/XML snippets (use a real parser for full documents)
regex_replace (replace)
#include <regex>
#include <iostream>
#include <string>
int main() {
std::string text = "Hello World 2026";
std::regex pattern{R"(\d+)"};
// wrap each run of digits in brackets; $& stands for the whole match
std::string result = std::regex_replace(text, pattern, "[$&]");
std::cout << result << std::endl; // Hello World [2026]
return 0;
}
3. Capture groups
Basic captures
#include <regex>
#include <iostream>
#include <string>
int main() {
std::string text = "2026-03-29";
std::regex pattern{R"((\d{4})-(\d{2})-(\d{2}))"};
std::smatch matches;
if (std::regex_match(text, matches, pattern)) {
std::cout << "full: " << matches[0] << std::endl; // 2026-03-29
std::cout << "year: " << matches[1] << std::endl; // 2026
std::cout << "month: " << matches[2] << std::endl; // 03
std::cout << "day: " << matches[3] << std::endl; // 29
}
return 0;
}
Named captures (not supported in std::regex; use numbered groups)
#include <regex>
#include <iostream>
#include <string>
int main() {
std::string text = "user@example.com";
std::regex pattern{R"((\w+)@(\w+\.\w+))"};
std::smatch matches;
if (std::regex_match(text, matches, pattern)) {
std::string username = matches[1];
std::string domain = matches[2];
std::cout << "user: " << username << std::endl; // user
std::cout << "domain: " << domain << std::endl; // example.com
}
return 0;
}
4. Iterators
Find every match
#include <regex>
#include <iostream>
#include <string>
int main() {
std::string text = "C++ 11, C++ 14, C++ 17, C++ 20";
std::regex pattern{R"(C\+\+ (\d+))"};
// iterate all matches
auto begin = std::sregex_iterator(text.begin(), text.end(), pattern);
auto end = std::sregex_iterator();
for (auto it = begin; it != end; ++it) {
std::smatch match = *it;
std::cout << "full: " << match.str() << std::endl;
std::cout << "version: " << match[1] << std::endl;
}
// Output:
// full: C++ 11
// version: 11
// full: C++ 14
// version: 14
// full: C++ 17
// version: 17
// full: C++ 20
// version: 20
return 0;
}
Token iterator
#include <regex>
#include <iostream>
#include <string>
int main() {
std::string text = "apple,banana,cherry";
std::regex pattern{R"(,)"};
// split on delimiter: -1 selects the text between matches instead of the matches
std::sregex_token_iterator begin(text.begin(), text.end(), pattern, -1);
std::sregex_token_iterator end;
for (auto it = begin; it != end; ++it) {
std::cout << *it << std::endl;
}
// Output:
// apple
// banana
// cherry
return 0;
}
5. Practical examples
Example 1: Email validation
#include <regex>
#include <iostream>
#include <string>
bool isValidEmail(const std::string& email) {
// simple email pattern (not a full RFC 5322 check); static, so it is compiled only once
static const std::regex pattern{R"(^[\w\.-]+@[\w\.-]+\.\w{2,}$)"};
return std::regex_match(email, pattern);
}
int main() {
std::string emails[] = {
"[email protected]",
"invalid.email",
"[email protected]",
"@example.com"
};
for (const auto& email : emails) {
std::cout << email << ": "
<< (isValidEmail(email) ? "valid" : "invalid")
<< std::endl;
}
return 0;
}
Example 2: Parse URL
#include <regex>
#include <iostream>
#include <string>
struct URL {
std::string protocol;
std::string domain;
std::string path;
};
URL parseURL(const std::string& url) {
std::regex pattern{R"(^(https?)://([^/]+)(/.*)?$)"};
std::smatch matches;
if (std::regex_match(url, matches, pattern)) {
return {
matches[1].str(), // protocol
matches[2].str(), // domain
matches[3].str() // path
};
}
return {};
}
int main() {
std::string url = "https://example.com/path/to/page";
URL parsed = parseURL(url);
std::cout << "protocol: " << parsed.protocol << std::endl;
std::cout << "domain: " << parsed.domain << std::endl;
std::cout << "path: " << parsed.path << std::endl;
return 0;
}
Example 3: Log parsing
#include <regex>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
struct LogEntry {
std::string timestamp;
std::string level;
std::string message;
};
std::vector<LogEntry> parseLog(const std::string& log) {
std::vector<LogEntry> entries;
// pattern: [timestamp] [LEVEL] message
std::regex pattern{R"(\[([^\]]+)\] \[(\w+)\] (.+))"};
std::istringstream stream(log);
std::string line;
while (std::getline(stream, line)) {
std::smatch matches;
if (std::regex_match(line, matches, pattern)) {
entries.push_back({
matches[1].str(), // timestamp
matches[2].str(), // level
matches[3].str() // message
});
}
}
return entries;
}
int main() {
std::string log =
"[2026-03-29 14:30:25] [INFO] Server started\n"
"[2026-03-29 14:30:26] [ERROR] Connection failed\n"
"[2026-03-29 14:30:27] [DEBUG] Retrying...";
auto entries = parseLog(log);
for (const auto& entry : entries) {
std::cout << entry.timestamp << " [" << entry.level << "] "
<< entry.message << std::endl;
}
return 0;
}
6. Common problems
Problem 1: Escaping
#include <regex>
// ❌ painful escaping
std::regex pattern1{"\\d+\\.\\d+"};
// ✅ raw string literal
std::regex pattern2{R"(\d+\.\d+)"};
// complex patterns stay readable
std::regex email{R"(^[\w\.-]+@[\w\.-]+\.\w+$)"};
Problem 2: Performance
#include <regex>
#include <iostream>
#include <chrono>
#include <string>
#include <vector>
int main() {
std::vector<std::string> texts = {"test1", "test2", "test3"};
// ❌ create regex every iteration (slow)
auto start1 = std::chrono::high_resolution_clock::now();
for (const auto& text : texts) {
std::regex pattern{R"(\d+)"}; // created each time!
std::regex_search(text, pattern);
}
auto end1 = std::chrono::high_resolution_clock::now();
// ✅ reuse one regex (fast)
auto start2 = std::chrono::high_resolution_clock::now();
std::regex pattern{R"(\d+)"}; // create once
for (const auto& text : texts) {
std::regex_search(text, pattern);
}
auto end2 = std::chrono::high_resolution_clock::now();
auto duration1 = std::chrono::duration_cast<std::chrono::microseconds>(end1 - start1).count();
auto duration2 = std::chrono::duration_cast<std::chrono::microseconds>(end2 - start2).count();
std::cout << "create each time: " << duration1 << " μs" << std::endl;
std::cout << "reuse: " << duration2 << " μs" << std::endl;
return 0;
}
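Fix: construct the std::regex once (outside the loop, as a static const local, or as a class member) and reuse it. Construction parses and compiles the pattern, which typically costs far more than one search over a short string.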
Problem 3: Exceptions
#include <regex>
#include <iostream>
int main() {
try {
// ❌ invalid pattern
std::regex pattern{"[invalid"}; // std::regex_error
} catch (const std::regex_error& e) {
std::cerr << "regex error: " << e.what() << std::endl;
std::cerr << "error code: " << e.code() << std::endl;
}
return 0;
}
Problem 4: Flags
#include <regex>
#include <iostream>
#include <string>
int main() {
std::string text = "Hello World";
// case sensitive (default)
std::regex pattern1{"hello"};
std::cout << std::regex_search(text, pattern1) << std::endl; // 0 (false)
// case insensitive
std::regex pattern2{"hello", std::regex::icase};
std::cout << std::regex_search(text, pattern2) << std::endl; // 1 (true)
return 0;
}
7. ECMAScript regex syntax
Character classes
\d // digit [0-9]
\D // non-digit
\w // word char [A-Za-z0-9_]
\W // non-word
\s // whitespace
\S // non-whitespace
. // any char except a line terminator (\n or \r)
Quantifiers
* // zero or more
+ // one or more
? // zero or one
{n} // exactly n
{n,} // n or more
{n,m} // n to m times
Anchors
^ // start of string
$ // end of string
\b // word boundary
\B // non-word boundary
Groups
() // capturing group
(?:) // non-capturing group
| // OR
[] // character class
8. Example: text utilities
#include <regex>
#include <iostream>
#include <string>
#include <vector>
class TextProcessor {
public:
// extract emails
static std::vector<std::string> extractEmails(const std::string& text) {
std::vector<std::string> emails;
static const std::regex pattern{R"([\w\.-]+@[\w\.-]+\.\w+)"};
auto begin = std::sregex_iterator(text.begin(), text.end(), pattern);
auto end = std::sregex_iterator();
for (auto it = begin; it != end; ++it) {
emails.push_back(it->str());
}
return emails;
}
// phone numbers (Korean-style pattern, e.g. 010-1234-5678)
static std::vector<std::string> extractPhones(const std::string& text) {
std::vector<std::string> phones;
static const std::regex pattern{R"(\d{2,3}-\d{3,4}-\d{4})"};
auto begin = std::sregex_iterator(text.begin(), text.end(), pattern);
auto end = std::sregex_iterator();
for (auto it = begin; it != end; ++it) {
phones.push_back(it->str());
}
return phones;
}
// strip HTML tags
static std::string removeHTMLTags(const std::string& html) {
static const std::regex pattern{R"(<[^>]*>)"};
return std::regex_replace(html, pattern, "");
}
// collapse whitespace
static std::string normalizeSpaces(const std::string& text) {
static const std::regex pattern{R"(\s+)"};
return std::regex_replace(text, pattern, " ");
}
// validate URL
static bool isValidURL(const std::string& url) {
static const std::regex pattern{R"(^https?://[\w\.-]+\.\w{2,}(/.*)?$)"};
return std::regex_match(url, pattern);
}
};
int main() {
std::string text = "Contact: user@example.com or call 010-1234-5678";
// extract emails
auto emails = TextProcessor::extractEmails(text);
std::cout << "emails:" << std::endl;
for (const auto& email : emails) {
std::cout << " " << email << std::endl;
}
// extract phones
auto phones = TextProcessor::extractPhones(text);
std::cout << "phones:" << std::endl;
for (const auto& phone : phones) {
std::cout << " " << phone << std::endl;
}
// strip HTML tags
std::string html = "<p>Hello <b>World</b></p>";
std::string plain = TextProcessor::removeHTMLTags(html);
std::cout << "stripped: " << plain << std::endl;
// normalize spaces
std::string messy = "Hello    World    !";
std::string clean = TextProcessor::normalizeSpaces(messy);
std::cout << "normalized spaces: " << clean << std::endl;
// validate URL
std::cout << "URL valid: "
<< (TextProcessor::isValidURL("https://example.com") ? "yes" : "no")
<< std::endl;
return 0;
}
Summary
Key points
- regex: C++11 regular-expression library
- regex_match: full-string match
- regex_search: substring search
- regex_replace: replace matches
- Captures: () extracts submatches
- Iterators: enumerate all matches
Function cheat sheet
| Facility | Role | Returns | When |
|---|---|---|---|
| regex_match | Full match | bool | Validation |
| regex_search | Substring match | bool | Search |
| regex_replace | Replace matches | string | Transform |
| sregex_iterator | All matches (iterator type) | iterator | Extract many |
Tips
Performance:
- Reuse std::regex objects (construction is expensive)
- For simple tasks, plain string functions may be enough
- For heavy throughput, consider RE2 or PCRE
Readability:
- Prefer raw string literals (R"(...)")
- Comment non-obvious patterns
- Use clear names for submatches
Safety:
- Catch std::regex_error
- Validate input length
- Watch out for ReDoS (catastrophic backtracking), especially with user-supplied patterns
Next steps
- C++ Regex Iterator
- C++ String
- C++ Algorithm Replace