C++ Observability: Prometheus and Grafana for Server Monitoring
The gist of this article
Build a pipeline: expose metrics from C++ servers, scrape with Prometheus, and visualize in Grafana.
Introduction: see why it got slow—with data
You need metrics to respond
Parts 43-1 and 43-2 covered RPC and security. In operations, metrics (request counts, latency, CPU usage, etc.) are essential. Prometheus uses a pull model: the Prometheus server scrapes targets over HTTP and stores the results as time series. Grafana visualizes them on dashboards.
To expose Prometheus text format from C++, define Counter, Gauge, and Histogram, and serve them as text on a path such as /metrics. You can use prometheus-cpp or implement a minimal exporter yourself.
This article covers:
- Prometheus metric types: Counter, Gauge, Histogram, labels
- Exposing metrics from C++: library vs manual, thread safety
- Grafana: Prometheus data source and dashboard examples
- Scenarios, common errors, production patterns
Real-world scenarios
Scenario 1: API suddenly slow, root cause unknown
Situation: C++ gRPC server latency spikes to 10s at 2 AM
Problem: Logs alone do not show *where* it blocks
Result: Without Prometheus metrics you cannot see RPS, latency distribution, or error-rate trends
→ Hours to triage, incident response drags
Scenario 2: Memory usage creeps up
Situation: C++ server memory rises for 3 days straight
Problem: No time series for heap, connections, or queue depth
Result: Without Gauge metrics you only suspect leaks, with no evidence
→ Restart as a band-aid, root cause unfixed
Scenario 3: One endpoint has high errors
Situation: Overall error rate 1%, but /api/payment is 30%
Problem: Without per-path metrics you cannot pinpoint the route
Result: Only a global Counter → no fine-grained analysis
→ You optimize the wrong thing or miss the bad path
Scenario 4: Regression after deploy
Situation: Users feel slowness after a new release
Problem: No p99 or RPS to compare before/after
Result: Without Histograms you cannot compare percentiles
→ Roll back blindly or leave the outage running
This article shows how to prevent those issues with a Prometheus + Grafana pipeline and complete examples.
Table of contents
- Prometheus metrics
- Exposing metrics from C++
- Prometheus configuration and scraping
- Grafana integration
- End-to-end Prometheus + Grafana examples
- Common errors and fixes
- Best practices
- Production patterns
- Implementation checklist
- Summary
1. Prometheus metrics
Counter, Gauge, Histogram
- Counter: monotonically increasing (requests, bytes). Use rate() for per-second increase.
- Gauge: goes up and down (connections, queue length, memory).
- Histogram: distributions (latency). Expose buckets plus sum and count; use histogram_quantile in Prometheus for percentiles.
- Labels: attach labels (e.g. method, path, status) for filtering and grouping. Keep cardinality bounded.
Histogram bucket hints (seconds)
| Service type | Suggested buckets (s) | Notes |
|---|---|---|
| Low-latency API | 0.001, 0.005, 0.01, 0.025, 0.05, 0.1 | ms-scale latency |
| Typical API | 0.005, 0.025, 0.1, 0.5, 1.0, 2.5 | REST/gRPC |
| Batch | 1, 5, 10, 30, 60, 120 | Long jobs |
Prometheus text format example
# HELP http_requests_total Total number of HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",path="/api"} 1234
http_requests_total{method="POST",path="/api"} 567
# HELP http_request_duration_seconds Request duration in seconds
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{le="0.05"} 50
http_request_duration_seconds_bucket{le="0.1"} 100
http_request_duration_seconds_bucket{le="0.5"} 200
http_request_duration_seconds_bucket{le="1.0"} 250
http_request_duration_seconds_bucket{le="+Inf"} 300
http_request_duration_seconds_sum 45.2
http_request_duration_seconds_count 300
Collection architecture
flowchart LR
subgraph Cpp["C++ server"]
M[/metrics endpoint]
end
subgraph Prom["Prometheus"]
S[Scrape]
TS[Time series DB]
end
subgraph Graf["Grafana"]
D[Dashboards]
A[Alerts]
end
M -->|HTTP GET| S
S --> TS
TS -->|PromQL| D
TS -->|Alert rules| A
Scrape sequence
sequenceDiagram
participant P as Prometheus
participant C as C++ server
loop scrape_interval (e.g. 15s)
P->>C: GET /metrics
C->>C: call export_metrics()
C->>P: 200 OK, text/plain
P->>P: parse and store in TSDB
end
2. Exposing metrics from C++
Library vs manual
- prometheus-cpp: register Counter/Gauge/Histogram and serialize to text. Under multi-threaded access, protect with atomics or locks.
- Manual: std::atomic counters and per-bucket counts; assemble the text in the /metrics handler. Set Content-Type: text/plain; charset=utf-8 as Prometheus expects.
- Placement: put metrics on an admin port or separate path, and use auth and network isolation so they are not public.
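As one way to keep /metrics off the public surface, the handler can require a Basic Auth header before serving the body. A hedged sketch: the function name and the credential (base64 of a made-up prom:secret) are illustrative, and real deployments should prefer network isolation or mTLS on top.

```cpp
#include <string>

// Reject /metrics requests that do not carry the expected Authorization
// header. "cHJvbTpzZWNyZXQ=" is base64("prom:secret") -- a placeholder
// credential; store and rotate real credentials securely.
bool metrics_authorized(const std::string& auth_header) {
    static const std::string expected = "Basic cHJvbTpzZWNyZXQ=";
    return auth_header == expected;
}
```

On the Prometheus side, the matching scrape_config would carry a basic_auth section with the same credentials.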
Minimal manual example
Increment request_count with fetch_add(1, memory_order_relaxed) on each request; export_metrics() returns a Prometheus line (name value\n). The /metrics handler returns that body. memory_order_relaxed is enough for a simple counter; use seq_cst if you need ordering across metrics.
#include <atomic>
#include <string>
// Conceptual single Counter
std::atomic<uint64_t> request_count{0};
void on_request() {
request_count.fetch_add(1, std::memory_order_relaxed);
}
std::string export_metrics() {
return "http_requests_total " + std::to_string(request_count.load()) + "\n";
}
Manual Counter, Gauge, Histogram
#include <atomic>
#include <string>
#include <sstream>
#include <mutex>
#include <chrono>
// Per-label counters (path, method) — simplified global example
struct Metrics {
std::atomic<uint64_t> requests_total{0};
std::atomic<uint64_t> errors_total{0};
std::atomic<uint64_t> active_connections{0};
std::atomic<uint64_t> queue_length{0};
// Histogram buckets: 5ms, 25ms, 100ms, 500ms, 1s, +Inf
static constexpr double buckets[] = {0.005, 0.025, 0.1, 0.5, 1.0, -1}; // -1 = +Inf
std::atomic<uint64_t> duration_bucket_5ms{0};
std::atomic<uint64_t> duration_bucket_25ms{0};
std::atomic<uint64_t> duration_bucket_100ms{0};
std::atomic<uint64_t> duration_bucket_500ms{0};
std::atomic<uint64_t> duration_bucket_1s{0};
std::atomic<uint64_t> duration_bucket_inf{0};
std::atomic<double> duration_sum{0};
std::atomic<uint64_t> duration_count{0};
void record_request(bool error, double duration_sec) {
requests_total.fetch_add(1, std::memory_order_relaxed);
if (error) errors_total.fetch_add(1, std::memory_order_relaxed);
auto add_bucket = [this](std::atomic<uint64_t>& b) {
b.fetch_add(1, std::memory_order_relaxed);
};
// Prometheus buckets are cumulative: increment every bucket whose le >= value
if (duration_sec <= 0.005) add_bucket(duration_bucket_5ms);
if (duration_sec <= 0.025) add_bucket(duration_bucket_25ms);
if (duration_sec <= 0.1) add_bucket(duration_bucket_100ms);
if (duration_sec <= 0.5) add_bucket(duration_bucket_500ms);
if (duration_sec <= 1.0) add_bucket(duration_bucket_1s);
add_bucket(duration_bucket_inf); // +Inf counts every observation
// std::atomic<double> has no fetch_add before C++20, so CAS-loop the sum
double expected = duration_sum.load(std::memory_order_relaxed);
while (!duration_sum.compare_exchange_weak(
expected, expected + duration_sec, std::memory_order_relaxed)) {
// compare_exchange_weak refreshes expected on failure
}
duration_count.fetch_add(1, std::memory_order_relaxed);
}
void connection_opened() {
active_connections.fetch_add(1, std::memory_order_relaxed);
}
void connection_closed() {
active_connections.fetch_sub(1, std::memory_order_relaxed);
}
void queue_inc() { queue_length.fetch_add(1, std::memory_order_relaxed); }
void queue_dec() { queue_length.fetch_sub(1, std::memory_order_relaxed); }
std::string export_prometheus() const {
std::ostringstream out;
out << "# HELP http_requests_total Total HTTP requests\n";
out << "# TYPE http_requests_total counter\n";
out << "http_requests_total " << requests_total.load() << "\n";
out << "# HELP http_errors_total Total HTTP errors\n";
out << "# TYPE http_errors_total counter\n";
out << "http_errors_total " << errors_total.load() << "\n";
out << "# HELP http_active_connections Active connections\n";
out << "# TYPE http_active_connections gauge\n";
out << "http_active_connections " << active_connections.load() << "\n";
out << "# HELP http_queue_length Current queue length\n";
out << "# TYPE http_queue_length gauge\n";
out << "http_queue_length " << queue_length.load() << "\n";
out << "# HELP http_request_duration_seconds Request duration\n";
out << "# TYPE http_request_duration_seconds histogram\n";
out << "http_request_duration_seconds_bucket{le=\"0.005\"} " << duration_bucket_5ms.load() << "\n";
out << "http_request_duration_seconds_bucket{le=\"0.025\"} " << duration_bucket_25ms.load() << "\n";
out << "http_request_duration_seconds_bucket{le=\"0.1\"} " << duration_bucket_100ms.load() << "\n";
out << "http_request_duration_seconds_bucket{le=\"0.5\"} " << duration_bucket_500ms.load() << "\n";
out << "http_request_duration_seconds_bucket{le=\"1\"} " << duration_bucket_1s.load() << "\n";
out << "http_request_duration_seconds_bucket{le=\"+Inf\"} " << duration_bucket_inf.load() << "\n";
out << "http_request_duration_seconds_sum " << duration_sum.load() << "\n";
out << "http_request_duration_seconds_count " << duration_count.load() << "\n";
return out.str();
}
};
prometheus-cpp example
#include <prometheus/counter.h>
#include <prometheus/gauge.h>
#include <prometheus/histogram.h>
#include <prometheus/registry.h>
#include <prometheus/exposer.h>
#include <memory>
int main() {
// Expose /metrics on port 8080
prometheus::Exposer exposer{"127.0.0.1:8080"};
auto registry = std::make_shared<prometheus::Registry>();
// Counter with labels for path and method
auto& request_counter = prometheus::BuildCounter()
.Name("http_requests_total")
.Help("Total HTTP requests")
.Labels({{"service", "cpp-server"}})
.Register(*registry);
auto& get_requests = request_counter.Add({{"method", "GET"}, {"path", "/api"}});
auto& post_requests = request_counter.Add({{"method", "POST"}, {"path", "/api"}});
// Gauge: active connections
auto& conn_gauge = prometheus::BuildGauge()
.Name("http_active_connections")
.Help("Active connections")
.Register(*registry);
// Histogram: latency buckets (5ms, 25ms, 100ms, 500ms, 1s) are passed per
// label set via Add(), not on the builder
auto& duration_hist = prometheus::BuildHistogram()
.Name("http_request_duration_seconds")
.Help("Request duration")
.Register(*registry);
auto& get_duration = duration_hist.Add({{"method", "GET"}}, prometheus::Histogram::BucketBoundaries{0.005, 0.025, 0.1, 0.5, 1.0});
exposer.RegisterCollectable(registry);
// During request handling
get_requests.Increment();
conn_gauge.Increment();
auto start = std::chrono::steady_clock::now();
// ... handle request ...
auto elapsed = std::chrono::duration<double>(std::chrono::steady_clock::now() - start).count();
get_duration.Observe(elapsed);
conn_gauge.Decrement();
return 0;
}
Building prometheus-cpp
# vcpkg (recommended)
vcpkg install prometheus-cpp
# CMakeLists.txt
find_package(prometheus-cpp CONFIG REQUIRED)
target_link_libraries(my_server prometheus::prometheus)
# Or FetchContent without a submodule
include(FetchContent)
FetchContent_Declare(
prometheus-cpp
GIT_REPOSITORY https://github.com/jupp0r/prometheus-cpp.git
GIT_TAG v1.2.2
)
FetchContent_MakeAvailable(prometheus-cpp)
target_link_libraries(my_server prometheus::prometheus)
3. Prometheus configuration and scraping
Basic prometheus.yml
global:
scrape_interval: 15s # default scrape interval
evaluation_interval: 15s # alert rule evaluation
alerting:
alertmanagers:
- static_configs:
- targets: []
rule_files: []
scrape_configs:
- job_name: 'cpp-server'
scrape_interval: 10s # scrape C++ server every 10s
scrape_timeout: 5s
static_configs:
- targets: ['localhost:8080']
labels:
env: 'production'
service: 'cpp-api'
Dynamic targets (service discovery)
# Scrape C++ pods in Kubernetes
scrape_configs:
- job_name: 'cpp-pods'
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
action: replace
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: ${1}:${2}
target_label: __address__
4. Grafana integration
Data source
- Add Prometheus as a Grafana data source and query with PromQL.
- URL: http://prometheus:9090 (Docker/K8s) or http://localhost:9090
Useful PromQL examples
# Requests per second
rate(http_requests_total[5m])
# p99 latency (seconds)
histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))
# Error rate (%)
100 * sum(rate(http_errors_total[5m])) / sum(rate(http_requests_total[5m]))
# Active connections (Gauge — no rate)
http_active_connections
# Queue length
http_queue_length
Dashboard panels
- Graphs: RPS, latency percentiles (p50, p95, p99), error rate over time
- Single stat: current connections, queue length
- Table: requests by path, errors by method
- Alerts: e.g. p99 > 1s, error rate > 5% → Slack/email
More PromQL
# p50, p95, p99
histogram_quantile(0.50, rate(http_request_duration_seconds_bucket[5m]))
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))
histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))
# Average latency (sum/count)
rate(http_request_duration_seconds_sum[5m]) / rate(http_request_duration_seconds_count[5m])
# RPS per instance
sum by (instance) (rate(http_requests_total[5m]))
# Errors in 5 minutes
increase(http_errors_total[5m])
Grafana alert channel (Slack example)
# Configuration → Alerting → Contact points → New contact point
# Type: Slack
# Webhook URL: https://hooks.slack.com/services/xxx/yyy/zzz
# Channel: #alerts-cpp-server
Dashboard variables (filter by instance)
# Dashboard Settings → Variables → New variable
# Name: instance
# Type: Query
# Data source: Prometheus
# Query: label_values(http_requests_total, instance)
# Multi-value: Yes
# In panel queries: {instance=~"$instance"}
5. End-to-end Prometheus + Grafana examples
Full stack with Docker Compose
# docker-compose.yml
version: '3.8'
services:
cpp-server:
build: .
ports:
- "8080:8080"
environment:
- METRICS_PORT=8080
prometheus:
image: prom/prometheus:v2.47.0
volumes:
- ./prometheus.yml:/etc/prometheus/prometheus.yml
ports:
- "9090:9090"
command:
- '--config.file=/etc/prometheus/prometheus.yml'
- '--storage.tsdb.retention.time=15d'
grafana:
image: grafana/grafana:10.2.0
ports:
- "3000:3000"
environment:
- GF_SECURITY_ADMIN_PASSWORD=admin
- GF_USERS_ALLOW_SIGN_UP=false
volumes:
- grafana-data:/var/lib/grafana
depends_on:
- prometheus
volumes:
grafana-data:
C++ server + /metrics (Boost.Beast sketch)
#include <boost/beast/core.hpp>
#include <boost/beast/http.hpp>
#include <boost/asio.hpp>
#include <atomic>
#include <chrono>
#include <string>
#include <thread>
namespace beast = boost::beast;
namespace http = beast::http;
namespace net = boost::asio;
// Global metrics (prefer singleton or DI in production)
std::atomic<uint64_t> g_requests_total{0};
std::atomic<uint64_t> g_errors_total{0};
std::atomic<uint64_t> g_active_connections{0};
void handle_metrics(http::request<http::string_body> const& req,
http::response<http::string_body>& res) {
res.set(http::field::content_type, "text/plain; charset=utf-8");
res.body() = "# HELP http_requests_total Total requests\n"
"# TYPE http_requests_total counter\n"
"http_requests_total " + std::to_string(g_requests_total.load()) + "\n"
"# HELP http_errors_total Total errors\n"
"# TYPE http_errors_total counter\n"
"http_errors_total " + std::to_string(g_errors_total.load()) + "\n"
"# HELP http_active_connections Active connections\n"
"# TYPE http_active_connections gauge\n"
"http_active_connections " + std::to_string(g_active_connections.load()) + "\n";
res.prepare_payload();
}
// For /metrics call handle_metrics; other paths run business logic
Grafana dashboard JSON (core panels)
{
"panels": [
{
"title": "RPS",
"type": "timeseries",
"targets": [{
"expr": "rate(http_requests_total[5m])",
"legendFormat": "{{instance}}"
}]
},
{
"title": "p99 latency (s)",
"type": "timeseries",
"targets": [{
"expr": "histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))",
"legendFormat": "p99"
}]
},
{
"title": "Error rate (%)",
"type": "timeseries",
"targets": [{
"expr": "100 * sum(rate(http_errors_total[5m])) / sum(rate(http_requests_total[5m]))",
"legendFormat": "error_rate"
}]
},
{
"title": "Active connections",
"type": "stat",
"targets": [{
"expr": "http_active_connections",
"legendFormat": "connections"
}]
}
]
}
6. Common errors and fixes
1. Prometheus: “connection refused” or “context deadline exceeded”
Cause: /metrics port closed, firewall, or network isolation.
Fix:
curl -v http://localhost:8080/metrics
docker exec prometheus wget -qO- http://cpp-server:8080/metrics
# Use Docker service names or K8s Service names
scrape_configs:
- job_name: 'cpp-server'
static_configs:
- targets: ['cpp-server:8080']
2. “parse error” or “invalid character”
Cause: Text format does not match Prometheus exposition format.
Fix:
# Bad: commas, spaces, bad escaping
http_requests_total 1234,567
# Good
# HELP http_requests_total Total requests
# TYPE http_requests_total counter
http_requests_total 1234
http_requests_total{path="/api"} 100
- Include HELP and TYPE where appropriate
- Escape " and \ inside label values
- One sample per line: name{labels} value or name value
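The escaping rule above fits in a tiny helper; the function name is illustrative:

```cpp
#include <string>

// Escape a label value for the Prometheus text exposition format:
// backslash -> \\, double quote -> \", newline -> \n.
std::string escape_label_value(const std::string& value) {
    std::string out;
    out.reserve(value.size());
    for (char c : value) {
        switch (c) {
            case '\\': out += "\\\\"; break;
            case '"':  out += "\\\""; break;
            case '\n': out += "\\n";  break;
            default:   out += c;
        }
    }
    return out;
}
```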
3. Grafana: “No data”
Cause: PromQL typo, time range, or metric name mismatch.
Fix:
# First confirm the metric names exist
{__name__=~"http_.*"}
# Then verify the full expressions
rate(http_requests_total[5m])
histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))
4. Label cardinality explosion
Cause: Using raw paths or user IDs as labels.
Fix:
// Risky: thousands of paths → thousands of series
request_counter.Add({{"path", user_provided_path}});
// Safer: normalize templates
std::string normalize_path(const std::string& path) {
if (path.find("/api/users/") == 0) return "/api/users/:id";
if (path.find("/api/orders/") == 0) return "/api/orders/:id";
return path;
}
5. Histogram races with atomics
Cause: Bucket updates and sum/count not consistent as a group.
Fix: use atomics per bucket and CAS loop for sum, or one mutex around record_request.
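The mutex variant can look like this sketch: one lock makes buckets, sum, and count move as a unit. Class and member names are illustrative.

```cpp
#include <array>
#include <cstdint>
#include <mutex>

// Histogram whose buckets, sum, and count always update together under one
// lock. Simpler than per-field atomics; fine unless the lock is contended.
class LockedHistogram {
public:
    void observe(double v) {
        std::lock_guard<std::mutex> lock(mu_);
        for (std::size_t i = 0; i < kBounds.size(); ++i)
            if (v <= kBounds[i]) ++buckets_[i];  // cumulative buckets
        ++inf_bucket_;
        sum_ += v;
        ++count_;
    }
    std::uint64_t count() const {
        std::lock_guard<std::mutex> lock(mu_);
        return count_;
    }
    std::uint64_t bucket(std::size_t i) const {  // for export and testing
        std::lock_guard<std::mutex> lock(mu_);
        return buckets_[i];
    }
private:
    static constexpr std::array<double, 5> kBounds{0.005, 0.025, 0.1, 0.5, 1.0};
    mutable std::mutex mu_;
    std::array<std::uint64_t, 5> buckets_{};
    std::uint64_t inf_bucket_ = 0;
    double sum_ = 0;
    std::uint64_t count_ = 0;
};
```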
6. Slow /metrics (hundreds of ms)
Cause: Heavy allocation or lock contention during export.
Fix: cache the serialized text and rebuild it at most once per short interval instead of serializing on every scrape.
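One way to implement that caching is a TTL guard around the serializer. A sketch with illustrative names; the 1s default TTL is an assumption, tune it to stay below your scrape interval.

```cpp
#include <chrono>
#include <functional>
#include <mutex>
#include <string>
#include <utility>

// Cache the serialized /metrics body and rebuild it at most once per TTL,
// so frequent scrapes never pay the full serialization cost.
class MetricsCache {
public:
    explicit MetricsCache(std::function<std::string()> build,
                          std::chrono::milliseconds ttl = std::chrono::milliseconds(1000))
        : build_(std::move(build)), ttl_(ttl) {}

    std::string get() {
        auto now = std::chrono::steady_clock::now();
        std::lock_guard<std::mutex> lock(mu_);
        if (!valid_ || now - built_at_ > ttl_) {
            cached_ = build_();  // expensive serialization happens here
            built_at_ = now;
            valid_ = true;
        }
        return cached_;
    }

private:
    std::function<std::string()> build_;
    std::chrono::milliseconds ttl_;
    std::mutex mu_;
    std::string cached_;
    std::chrono::steady_clock::time_point built_at_{};
    bool valid_ = false;
};
```

The lock also serializes concurrent rebuilds, so only one scrape ever pays the build cost per TTL window.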
7. “Out of order” or duplicate samples
Cause: Clock jumps after restart, or duplicate scrape jobs for the same target.
Fix: make sure each target appears in exactly one scrape job; out-of-order errors usually come from duplicate jobs or from emitting explicit timestamps in the exposition output.
7. Best practices
Naming
- Counters end in _total (e.g. http_requests_total)
- Encode units in the name: _seconds, _bytes, etc.
- Use lowercase snake_case throughout
Labels
- Bound cardinality (hundreds of combinations, not millions)
- Static labels: env, service, region
- Avoid high-cardinality dynamic values (user_id, request_id)
Scrape intervals
- App: 10–15s
- Infra: 30s–1m
- Expensive metrics: 1–5m
Securing /metrics
- Internal networks only
- Basic Auth or mTLS
- Separate port from user-facing API (e.g. 8080 API, 9090 metrics)
Performance
| Topic | Recommendation |
|---|---|
| Atomics | memory_order_relaxed when order across metrics does not matter |
| Histogram | Prefer atomic buckets on hot paths over a global mutex |
| Export | Minimize string building on each scrape |
| Labels | Keep label count small; cardinality < 100 for typical setups |
8. Production patterns
Pattern 1: separate metrics port
// API on :8080, metrics on :9090; bind the metrics acceptor to an internal IP
void run_metrics_server(net::io_context& ctx, const std::string& bind_addr, uint16_t port) {
net::ip::tcp::acceptor acceptor(ctx, {net::ip::make_address(bind_addr), port});
// accept loop omitted: serve GET /metrics only (reuse handle_metrics above)
}
Pattern 2: initialize metrics at startup (series exist from the first scrape, so rate() has a baseline)
void init_metrics() {
g_requests_total.store(0);
g_errors_total.store(0);
}
Pattern 3: RAII request scope
struct ScopedRequestMetrics {
Metrics& m;
std::chrono::steady_clock::time_point start;
bool error = false;
ScopedRequestMetrics(Metrics& metrics) : m(metrics), start(std::chrono::steady_clock::now()) {
m.connection_opened();
}
~ScopedRequestMetrics() {
auto dur = std::chrono::duration<double>(
std::chrono::steady_clock::now() - start).count();
m.record_request(error, dur);
m.connection_closed();
}
};
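To show the RAII pattern end to end, here is a self-contained miniature: DemoMetrics is a stand-in that simplifies the Metrics struct from section 2, and the handler records exactly one request on every exit path.

```cpp
#include <chrono>
#include <cstdint>

// Minimal stand-in for the full Metrics struct, just enough to demonstrate
// the RAII scope; real code would plug in the struct from section 2.
struct DemoMetrics {
    std::uint64_t requests = 0, errors = 0;
    void record_request(bool error, double /*duration_sec*/) {
        ++requests;
        if (error) ++errors;
    }
};

struct ScopedDemo {
    DemoMetrics& m;
    std::chrono::steady_clock::time_point start{std::chrono::steady_clock::now()};
    bool error = false;
    explicit ScopedDemo(DemoMetrics& metrics) : m(metrics) {}
    ~ScopedDemo() {
        double dur = std::chrono::duration<double>(
            std::chrono::steady_clock::now() - start).count();
        m.record_request(error, dur);  // runs on every exit path
    }
};

// Early return or exception: the destructor still records the request.
bool handle(DemoMetrics& m, bool valid) {
    ScopedDemo scope(m);
    if (!valid) { scope.error = true; return false; }
    return true;
}
```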
Pattern 4: Prometheus alert rules
groups:
- name: cpp-server
rules:
- alert: HighErrorRate
expr: 100 * sum(rate(http_errors_total[5m])) / sum(rate(http_requests_total[5m])) > 5
for: 2m
labels:
severity: critical
annotations:
summary: "C++ server error rate {{ $value | humanize }}% exceeded"
- alert: HighLatency
expr: histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m])) > 1
for: 5m
labels:
severity: warning
annotations:
summary: "p99 latency exceeded 1s"
9. Implementation checklist
- Implement /metrics on the C++ server
- Expose Counters (requests, errors)
- Expose Gauges (connections, queue length)
- Expose Histogram (latency) with sensible buckets
- Set Content-Type: text/plain; charset=utf-8
- Add a scrape_config in prometheus.yml
- Add Prometheus data source in Grafana
- Panels for RPS, p99, error rate, connections
- Alert rules (errors, latency)
- Secure metrics port/path
- Review label cardinality
10. Summary
| Topic | Summary |
|---|---|
| Prometheus | Counter/Gauge/Histogram, labels, pull, text exposition |
| C++ | Atomics, library or manual serialization, /metrics handler |
| Grafana | PromQL, dashboards, alerts |
| Production | Split ports, alert rules, bounded labels |
Series 43 covered gRPC/Protobuf → secure coding/OpenSSL → Observability (Prometheus + Grafana) for large distributed systems.
Related posts (internal links)
- Rust vs C++ memory safety #47-3
- C++ network errors #28-3
- Clang-Tidy and Cppcheck #41-1
Practical tips
- Start from compiler warnings and minimal reproducers when debugging.
- Measure before optimizing; define SLOs and metrics first.
- Align code review with team conventions.
Keywords (SEO)
Prometheus, Grafana, C++ monitoring, prometheus-cpp, metrics, Observability
FAQ
When is this useful in production?
A. Whenever you run C++ services in production and need scrapeable metrics, dashboards, and alerts. Use the examples above as templates.
prometheus-cpp vs manual?
A. prometheus-cpp: full feature set and labels when you can take the dependency. Manual: minimal deps, embedded systems, or very small metric sets.
What should I read next?
A. Follow the Previous/Next links at the bottom of each article, or the C++ series index.
Where to go deeper?
A. Prometheus docs, Grafana docs, prometheus-cpp.
One-line summary: Prometheus scrapes your C++ /metrics; Grafana turns them into dashboards and alerts. Next: C++26 preview #44-1.
Previous: Secure coding & OpenSSL #43-2
Next: C++26 preview #44-1
Related posts
- constexpr basics #43-1
- gRPC & Protobuf #43-1
- Advanced constexpr #43-2
- Secure coding & OpenSSL #43-2
- Monitoring dashboard #50-6