Observability

Distributed Tracing with OpenTelemetry


A request hits your API gateway, fans out to three microservices, queries two databases, and returns in 800ms. One of those services is slow. Logs tell you each service received and returned a request. Metrics tell you the p99 latency per service. But neither tells you which path this specific slow request took, or how long each hop contributed. Distributed tracing does.

Why Tracing

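Tracing answers questions that logs and metrics can't: which path a specific request took through your services, and how much each hop contributed to its latency.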

Traces, Spans, Context

A trace represents a single end-to-end request. It is composed of spans — individual units of work (an HTTP handler, a DB query, a cache lookup). Spans form a tree: the root span is the entry point; child spans are downstream calls.

Trace anatomy: one request, multiple spans (waterfall view, time axis 0 to 400ms)
gateway: POST /order            400ms
  ↳ order-svc: create           240ms
  ↳ db: INSERT orders           120ms ⚠
  ↳ inventory-svc: reserve      100ms
  ↳ notify-svc: email           120ms
⚠ db: INSERT orders (120ms) — immediately visible as the hot span. Without tracing you'd see "order-svc is slow" in metrics and hunt through logs to find the DB.
A trace is a tree of spans. Each span has a start time, duration, service name, and metadata. The waterfall view makes latency contributors immediately visible.
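Backends receive spans as flat records that point at their parent, then rebuild the tree from those links. The stdlib-only sketch below illustrates that step with the spans from the waterfall above. `Span` and `render_waterfall` are hypothetical helpers, not OTel APIs; the names and durations come from the figure, while the start offsets and the exact parent/child nesting are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None marks the root span
    name: str
    start_ms: int             # offset from the start of the trace
    duration_ms: int


# Names and durations from the figure; start offsets and nesting are assumed.
SPANS = [
    Span("a1", None, "gateway: POST /order", 0, 400),
    Span("b2", "a1", "order-svc: create", 10, 240),
    Span("c3", "b2", "db: INSERT orders", 20, 120),
    Span("d4", "b2", "inventory-svc: reserve", 145, 100),
    Span("e5", "a1", "notify-svc: email", 260, 120),
]


def render_waterfall(spans: List[Span]) -> List[str]:
    """Rebuild the tree from parent_id links; one indented line per span."""
    children: Dict[str, List[Span]] = {}
    root = None
    for s in spans:
        if s.parent_id is None:
            root = s
        else:
            children.setdefault(s.parent_id, []).append(s)

    lines: List[str] = []

    def walk(span: Span, depth: int) -> None:
        lines.append(f"{'  ' * depth}{span.name} @{span.start_ms}ms +{span.duration_ms}ms")
        # Visit children in start-time order, as a waterfall view does.
        for child in sorted(children.get(span.span_id, []), key=lambda c: c.start_ms):
            walk(child, depth + 1)

    if root is not None:
        walk(root, 0)
    return lines


if __name__ == "__main__":
    print("\n".join(render_waterfall(SPANS)))
```

Each span only knows its parent, so the tree (and therefore the waterfall) exists only after the backend has collected every span sharing the trace ID.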

Trace context is propagated between services in HTTP headers; the W3C Trace Context traceparent header is the standard format. Every service that receives a request extracts the trace ID and the caller's span ID, starts a child span under it, and injects the updated context (same trace ID, its own span ID) into every downstream call.
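A minimal sketch of that extract, create-child, inject cycle, assuming the version `00` traceparent layout from the W3C Trace Context spec (`version-traceid-parentid-flags`). It is a simplification for illustration: real SDKs also handle the `tracestate` header and future header versions, and `extract`, `start_child`, and `inject` are hypothetical names, not the OTel propagator API.

```python
import re
import secrets
from typing import NamedTuple, Optional

# traceparent, version 00: "00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>"
TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class SpanContext(NamedTuple):
    trace_id: str  # shared by every span in the trace
    span_id: str   # unique per span
    flags: int     # bit 0x01 = sampled


def extract(header: str) -> Optional[SpanContext]:
    """Parse an incoming traceparent header; None means start a new trace."""
    m = TRACEPARENT.fullmatch(header.strip())
    if m is None:
        return None
    trace_id, parent_id, flags = m.groups()
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None  # all-zero IDs are invalid
    return SpanContext(trace_id, parent_id, int(flags, 16))


def start_child(parent: SpanContext) -> SpanContext:
    """Same trace ID and flags as the caller, fresh random 8-byte span ID."""
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.flags)


def inject(ctx: SpanContext) -> str:
    """Header value to send on the outgoing call to the next service."""
    return f"00-{ctx.trace_id}-{ctx.span_id}-{ctx.flags:02x}"


if __name__ == "__main__":
    incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
    child = start_child(extract(incoming))
    print(inject(child))
```

Feeding it the spec's example header yields a child context with the same trace ID and sampled flag but a new span ID, which the next service then treats as its parent.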

OTel Stack Overview

OpenTelemetry (OTel) is the CNCF standard for telemetry instrumentation. It unifies traces, metrics, and logs under one API so you don't depend on vendor SDKs.

Component | Role
--- | ---
OTel SDK | Library added to the application. Creates spans and exports telemetry.
OTel Operator | Kubernetes operator that injects the OTel SDK automatically via webhook; no code changes.
OTel Collector | Receives, processes, and exports telemetry. Can fan out to multiple backends.
Backend (Jaeger/Tempo) | Stores and queries traces. Jaeger is self-contained; Tempo integrates with Grafana.

Auto-Instrumentation

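The OTel Operator can inject auto-instrumentation into pods and configure it via environment variables, with no application code changes, for several languages (Java, Python, Node.js, .NET, Go). For most of them, the injection is an init container that copies the language's instrumentation agent or libraries into the pod.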

Instrumentation CR — auto-instrument Java pods
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: myapp-instrumentation
  namespace: production
spec:
  exporter:
    endpoint: http://otel-collector.monitoring:4317   # gRPC OTLP
  propagators:
    - tracecontext    # W3C standard
    - baggage
  sampler:
    type: parentbased_traceidratio
    argument: "0.1"   # sample 10% of root spans
  java:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:latest
annotate a Deployment to opt in
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-svc
  namespace: production
spec:
  template:
    metadata:
      annotations:
        instrumentation.opentelemetry.io/inject-java: "myapp-instrumentation"
        # For Python:   instrumentation.opentelemetry.io/inject-python: "myapp-instrumentation"
        # For Node.js:  instrumentation.opentelemetry.io/inject-nodejs: "myapp-instrumentation"
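The CR fields correspond to standard OTel SDK environment variables (`OTEL_EXPORTER_OTLP_ENDPOINT`, `OTEL_PROPAGATORS`, `OTEL_TRACES_SAMPLER`, `OTEL_TRACES_SAMPLER_ARG`, `OTEL_SERVICE_NAME`). The sketch below shows that mapping. `instrumentation_env` is a hypothetical helper, the service name is assumed to be the Deployment name, and the exact set the operator injects (plus language-specific settings that load the agent) can differ by operator version and language.

```python
from typing import Dict


def instrumentation_env(spec: dict, service_name: str) -> Dict[str, str]:
    """Hypothetical: standard OTEL_* variables matching an Instrumentation spec."""
    env = {
        "OTEL_SERVICE_NAME": service_name,  # assumed: the workload name
        "OTEL_EXPORTER_OTLP_ENDPOINT": spec["exporter"]["endpoint"],
        "OTEL_PROPAGATORS": ",".join(spec["propagators"]),  # comma-separated list
    }
    sampler = spec.get("sampler")
    if sampler:
        env["OTEL_TRACES_SAMPLER"] = sampler["type"]
        env["OTEL_TRACES_SAMPLER_ARG"] = sampler["argument"]
    return env


# Mirrors the myapp-instrumentation CR above.
SPEC = {
    "exporter": {"endpoint": "http://otel-collector.monitoring:4317"},
    "propagators": ["tracecontext", "baggage"],
    "sampler": {"type": "parentbased_traceidratio", "argument": "0.1"},
}

if __name__ == "__main__":
    for key, value in instrumentation_env(SPEC, "order-svc").items():
        print(f"{key}={value}")
```

Because these are the SDK's own standard variables, the same settings can be applied by hand to a container that is not managed by the operator.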

OTel Collector

The Collector acts as a telemetry hub. Pipelines are composed of receivers (how data comes in), processors (transformations), and exporters (where data goes).

OTel Collector config — receive OTLP, export to Tempo + Jaeger
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 5s
    send_batch_size: 512
  memory_limiter:
    check_interval: 1s     # required; how often memory usage is checked
    limit_mib: 512
    spike_limit_mib: 128
  resource:
    attributes:
    - action: insert
      key: k8s.cluster.name
      value: production

exporters:
  otlp/tempo:
    endpoint: tempo.monitoring:4317
    tls:
      insecure: true
  otlp/jaeger:             # the dedicated jaeger exporter was removed; Jaeger ingests OTLP natively
    endpoint: jaeger-collector.monitoring:4317
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch, resource]
      exporters: [otlp/tempo, otlp/jaeger]
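To see how the pieces of a pipeline fit together, here is a toy in-process model of the `traces` pipeline: spans handed over by a receiver are batched, run through processors, and fanned out to every exporter. It is a simplification, not Collector code: the batch flushes only on size (the real `batch` processor also flushes on `timeout`), `memory_limiter` is omitted, and the hypothetical `resource_insert` only mimics the `insert` action's add-if-absent rule on plain dicts.

```python
from typing import Callable, Dict, List

Span = Dict[str, str]                      # toy span: attribute name -> value
Processor = Callable[[List[Span]], List[Span]]
Exporter = Callable[[List[Span]], None]


def resource_insert(key: str, value: str) -> Processor:
    """Mimics the `insert` action: add the key only where it is absent."""
    def process(batch: List[Span]) -> List[Span]:
        return [span if key in span else {**span, key: value} for span in batch]
    return process


class TracesPipeline:
    """Toy model: receiver -> batch (size-triggered) -> processors -> exporters."""

    def __init__(self, batch_size: int, processors: List[Processor], exporters: List[Exporter]):
        self.batch_size = batch_size
        self.processors = processors
        self.exporters = exporters
        self.buffer: List[Span] = []

    def receive(self, span: Span) -> None:
        """Called for each span the receiver decodes."""
        self.buffer.append(span)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """The real batch processor also flushes when `timeout` expires."""
        if not self.buffer:
            return
        batch, self.buffer = self.buffer, []
        for process in self.processors:       # processors run in list order
            batch = process(batch)
        for export in self.exporters:         # fan-out: each exporter gets the batch
            export(list(batch))


if __name__ == "__main__":
    tempo: List[Span] = []
    jaeger: List[Span] = []
    pipe = TracesPipeline(2, [resource_insert("k8s.cluster.name", "production")],
                          [tempo.extend, jaeger.extend])
    pipe.receive({"name": "db: INSERT orders"})
    pipe.receive({"name": "inventory-svc: reserve", "k8s.cluster.name": "staging"})
    print(tempo == jaeger, tempo)
```

With a batch size of 2, the second `receive` triggers a flush; both list-backed exporters get the same two spans, and the span that already carried `k8s.cluster.name` keeps its value because `insert` never overwrites.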

Backends — Jaeger & Tempo

Aspect | Jaeger | Grafana Tempo
--- | --- | ---
UI | Built-in Jaeger UI | Grafana (same dashboards as metrics/logs)
Storage | Elasticsearch, Cassandra, Badger | Object storage (S3, GCS); very cheap
Best for | Standalone tracing, existing Elasticsearch | Unified Grafana stack (logs + metrics + traces)
Querying | Search by service, operation, tags, and duration | TraceQL, similar in style to PromQL/LogQL

Sampling Strategies

Tracing every request is expensive: each span costs CPU and network in the service that emits it and storage in the backend. Sampling controls what fraction of traces are recorded.

Strategy | How it works | Tradeoff
--- | --- | ---
Head-based (ratio) | Decision made at the root span, e.g. sample 10% of all requests. | Simple. May miss rare errors if their traces are not sampled.
Tail-based | Decision made after the trace completes, so slow or errored traces can always be kept. | More useful signal. Requires buffering full traces in the collector, and all spans of a trace must reach the same collector instance.
Always-on | Record everything. | Full fidelity. Only viable at low request volume.
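Head-based ratio sampling can be decided independently by every service because it is a pure function of the trace ID. The sketch below is modeled on the approach the OpenTelemetry Python SDK uses for `TraceIdRatioBased` (compare the low 64 bits of the trace ID with `ratio * 2**64`), wrapped in the parent-based rule that `parentbased_traceidratio` in the Instrumentation CR selects. The function names are hypothetical.

```python
import random
from typing import Optional

TRACE_ID_LIMIT = (1 << 64) - 1  # mask for the low 64 bits of a 128-bit trace ID


def ratio_bound(ratio: float) -> int:
    """Threshold for a sampling ratio in [0, 1]."""
    return round(ratio * (TRACE_ID_LIMIT + 1))


def traceidratio(trace_id: int, ratio: float) -> bool:
    """Pure function of the trace ID, so every service reaches the same answer."""
    return (trace_id & TRACE_ID_LIMIT) < ratio_bound(ratio)


def parentbased_traceidratio(trace_id: int, ratio: float,
                             parent_sampled: Optional[bool] = None) -> bool:
    """Root spans (no parent) use the ratio; child spans copy the parent's flag."""
    if parent_sampled is not None:
        return parent_sampled
    return traceidratio(trace_id, ratio)


if __name__ == "__main__":
    rng = random.Random(7)
    ids = [rng.getrandbits(128) for _ in range(100_000)]
    kept = sum(parentbased_traceidratio(t, 0.1) for t in ids)
    print(f"kept {kept / len(ids):.3f} of root spans")
```

Over many random trace IDs, roughly 10% pass at `ratio=0.1`. A child span never re-rolls; it inherits the parent's decision, so traces are kept or dropped whole.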
tail sampling in OTel Collector — always keep errors and slow traces
processors:
  tail_sampling:
    decision_wait: 10s     # buffer spans for 10s after a trace's first span, then decide
    policies:
    - name: errors-policy
      type: status_code
      status_code: {status_codes: [ERROR]}
    - name: slow-traces
      type: latency
      latency: {threshold_ms: 500}
    - name: probabilistic-sample
      type: probabilistic
      probabilistic: {sampling_percentage: 5}    # 5% of remaining traces
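The three policies above can be read as one rule: keep the trace if any policy matches. Here is a simplified, hypothetical model of that decision for one buffered trace. It treats the latency threshold as inclusive and uses SHA-256 of the trace ID as a stand-in for the collector's own hash in the probabilistic policy; it is not the processor's actual code.

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass
class FinishedSpan:
    trace_id: str
    start_ms: float
    end_ms: float
    status: str  # "UNSET", "OK", or "ERROR"


def keep_trace(spans: List[FinishedSpan], threshold_ms: float = 500,
               sampling_percentage: float = 5) -> bool:
    """Simplified model of the three policies: keep if ANY policy matches."""
    # errors-policy: any span with ERROR status
    if any(s.status == "ERROR" for s in spans):
        return True
    # slow-traces: earliest start to latest end, threshold treated as inclusive
    duration = max(s.end_ms for s in spans) - min(s.start_ms for s in spans)
    if duration >= threshold_ms:
        return True
    # probabilistic-sample: hash the trace ID so the decision is stable per trace
    digest = hashlib.sha256(spans[0].trace_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < sampling_percentage * 100
```

A fast, error-free trace falls through to the last policy, so roughly 5% of such traces survive; any trace with an ERROR span, or spanning 500ms or more, is always kept.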

Log–Trace Correlation

The real power is correlation: click a slow trace span, jump to the exact log lines emitted during that span. This requires the application to include trace_id and span_id in every log line.

inject trace context into logs (Python/structlog example)
from opentelemetry import trace
import structlog

def add_trace_context(logger, method, event_dict):
    # structlog processor: copy the active span's IDs into every log event
    span = trace.get_current_span()
    ctx = span.get_span_context()
    if ctx.is_valid:  # false when no span is active
        # zero-padded lowercase hex, the same form used in the traceparent header
        event_dict["trace_id"] = format(ctx.trace_id, "032x")
        event_dict["span_id"] = format(ctx.span_id, "016x")
    return event_dict

structlog.configure(
    processors=[add_trace_context, structlog.processors.JSONRenderer()]
)

In Grafana, configure the Loki datasource to derive fields from trace_id and link to Tempo — this creates a clickable "View Trace" button directly in log lines.
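For the derived field, Grafana runs a regex over each log line and uses the captured value as the trace ID. Before pasting a pattern into the datasource settings, it can be checked in Python against a line shaped like the structlog output above. This assumes `JSONRenderer`'s default `json.dumps` formatting (which puts a space after each colon) and uses only regex features that also exist in the JavaScript and Go flavors; `derived_trace_id` is a hypothetical helper.

```python
import json
import re
from typing import Optional

# Candidate pattern for the Loki derived field. The single capture group is the
# trace ID; \s* tolerates the space json.dumps puts after ":" by default.
TRACE_ID_PATTERN = re.compile(r'"trace_id":\s*"([0-9a-f]{32})"')


def derived_trace_id(log_line: str) -> Optional[str]:
    """Return the value the derived field would capture, or None."""
    m = TRACE_ID_PATTERN.search(log_line)
    return m.group(1) if m else None


if __name__ == "__main__":
    line = json.dumps({
        "event": "order created",
        "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
        "span_id": "00f067aa0ba902b7",
    })
    print(derived_trace_id(line))
```

The compact form without a space after the colon matches too, so the same pattern keeps working if the renderer is later configured with tighter separators.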