Observability Strategy: Metrics, Logs, Traces
An observability strategy for modern systems: the three pillars, how to correlate them, the tools involved, and an implementation plan.
Three Pillars
- Metrics — numerical values over time (Prometheus). Fast, cheap, aggregated.
- Logs — text records of events (Loki, ELK). Detailed context.
- Traces — a request’s path through the system (Tempo, Jaeger). Cross-service debugging.
No single pillar is sufficient on its own. The power lies in correlation.
Correlation
Connect the three pillars through shared identifiers:
# In Grafana: exemplars link metric → trace
# In Loki: a derived field on trace_id links log → trace
# (keep trace_id in the log line, not as a stream label: it is high-cardinality)
# In Tempo: service.name links trace → metrics
# Example: structured log with trace_id
{"level":"error","msg":"payment failed",
"trace_id":"abc123","span_id":"def456",
"service":"order-service","user_id":"u789"}
# LogQL: keep only lines that carry a trace_id and print it for lookup in Tempo
{app="order-service"} | json | trace_id != ""
| line_format "{{.trace_id}}"
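On the application side, the structured log shown above could be produced roughly as follows. This is a minimal sketch using only the Python standard library. The `JsonFormatter`, `build_logger`, and `trace_ids` names are hypothetical, and the random IDs stand in for the trace context that a tracing library such as OpenTelemetry would normally supply. The `trace_ids` helper mimics the LogQL pipeline locally: parse JSON, keep lines with a non-empty `trace_id`, and output only that value.

```python
import io
import json
import logging
import secrets


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object that carries trace context."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname.lower(),
            "msg": record.getMessage(),
            # These attributes arrive via `extra=`; empty string if absent.
            "trace_id": getattr(record, "trace_id", ""),
            "span_id": getattr(record, "span_id", ""),
            "service": getattr(record, "service", ""),
        }
        return json.dumps(payload)


def build_logger(stream) -> logging.Logger:
    """Hypothetical helper: a logger that writes JSON lines to `stream`."""
    logger = logging.getLogger("order-service")
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


def trace_ids(lines):
    """Local stand-in for: | json | trace_id != "" | line_format "{{.trace_id}}"."""
    found = []
    for line in lines:
        record = json.loads(line)
        if record.get("trace_id", "") != "":
            found.append(record["trace_id"])
    return found


stream = io.StringIO()
log = build_logger(stream)

# Assumption: random IDs replace real trace context for this illustration.
ctx = {
    "trace_id": secrets.token_hex(16),  # 32 hex characters
    "span_id": secrets.token_hex(8),    # 16 hex characters
    "service": "order-service",
}
log.error("payment failed", extra=ctx)
log.info("health check ok")  # no trace context attached

ids = trace_ids(stream.getvalue().splitlines())
print(ids)
```

Only the first log line survives the filter, because the second one has no trace context. Keeping the ID inside the JSON body rather than in a label is what lets a log backend extract it at query time.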
Implementation Plan
- Phase 1: Metrics + alerting (Prometheus + Alertmanager)
- Phase 2: Centralized logs (Loki + Promtail)
- Phase 3: Distributed tracing (OTel + Tempo)
- Phase 4: Correlation and dashboards (Grafana)
- Phase 5: SLO/SLI + Error Budgets
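For Phase 5, the arithmetic behind an error budget can be sketched as below. This is an illustration under stated assumptions, not a prescribed method: it uses a request-based availability SLI (good requests divided by total requests), and the `error_budget` function and the sample numbers are hypothetical.

```python
def error_budget(slo_target: float, total: int, failed: int) -> dict:
    """Compute SLI and error-budget figures for a request-based SLO.

    slo_target: required fraction of good requests, e.g. 0.999
    total:      requests observed in the SLO window
    failed:     requests that violated the SLI in that window
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total <= 0 or failed < 0 or failed > total:
        raise ValueError("need total > 0 and 0 <= failed <= total")

    sli = (total - failed) / total            # measured success ratio
    allowed = (1 - slo_target) * total        # failures the SLO tolerates
    return {
        "sli": sli,
        "allowed_failures": allowed,
        "remaining_failures": allowed - failed,
        "budget_consumed": failed / allowed,  # 1.0 means budget exhausted
        "slo_met": sli >= slo_target,
    }


# Hypothetical window: 1,000,000 requests, 250 failures, 99.9% target.
budget = error_budget(slo_target=0.999, total=1_000_000, failed=250)
for key, value in budget.items():
    print(f"{key}: {value}")
```

With these sample numbers the service has used about a quarter of its budget. In practice the counts would come from the Phase 1 metrics rather than from hard-coded values.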
Summary
Implement your observability strategy iteratively: metrics first, then logs, then traces. Correlation between pillars is key for fast debugging.