
Distributed Tracing with OpenTelemetry


A request hits your API gateway, fans out to three microservices, queries two databases, and returns in 800ms. One of those services is slow. Logs tell you that each service received and returned a request. Metrics tell you the p99 latency per service. But neither tells you which path this specific slow request took, or how much time each hop contributed. Distributed tracing does.

Why Tracing

Tracing answers questions that logs and metrics can't:

Which service is responsible for the tail latency on this request?
What was the full call graph for request ID abc123?
Did the slowness come from the database query or the network hop?
Which downstream dependency is degraded right now?

Traces, Spans, Context

A trace represents a single end-to-end request. It is composed of spans — individual units of work (an HTTP handler, a DB query, a cache lookup). Spans form a tree: the root span is the entry point; child spans are downstream calls.

Trace anatomy: one request, multiple spans (waterfall, time axis 0ms to 400ms)

gateway: POST /order (400ms)
↳ order-svc: create (240ms)
↳ db: INSERT orders (120ms) ⚠
↳ inventory-svc: reserve (100ms)
↳ notify-svc: email (120ms)

⚠ db: INSERT orders (120ms) — immediately visible as the hot span. Without tracing you'd see "order-svc is slow" in metrics and hunt through logs to find the DB.

A trace is a tree of spans. Each span has a start time, duration, service name, and metadata. The waterfall view makes latency contributors immediately visible.
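
To make the span tree concrete, here is a minimal Python sketch that computes each span's self time, meaning its duration minus the time covered by its direct children. This is roughly what you read off a waterfall by eye. The `Span` class, the span IDs, the start offsets, and the parent links are all hypothetical: the diagram gives durations only, so the nesting used here is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Span:
    span_id: str
    parent_id: Optional[str]  # None marks the root span
    name: str
    start_ms: int
    duration_ms: int

    @property
    def end_ms(self) -> int:
        return self.start_ms + self.duration_ms


def self_time_ms(span: Span, spans: List[Span]) -> int:
    """Duration of `span` that is not covered by any of its direct children."""
    children = sorted(
        (s for s in spans if s.parent_id == span.span_id),
        key=lambda s: s.start_ms,
    )
    covered = 0
    cursor = span.start_ms  # everything before the cursor is already counted
    for child in children:
        start = max(child.start_ms, cursor)
        end = min(child.end_ms, span.end_ms)
        if end > start:  # skip children fully inside an already counted range
            covered += end - start
            cursor = end
    return span.duration_ms - covered


# Hypothetical trace: durations follow the diagram, offsets and parents are assumed.
spans = [
    Span("g", None, "gateway: POST /order", 0, 400),
    Span("o", "g", "order-svc: create", 10, 240),
    Span("d", "o", "db: INSERT orders", 60, 120),
    Span("i", "g", "inventory-svc: reserve", 260, 100),
    Span("n", "g", "notify-svc: email", 270, 120),  # overlaps inventory-svc
]

self_times = {s.name: self_time_ms(s, spans) for s in spans}
for name, ms in self_times.items():
    print(f"{name}: {ms}ms self time")
```

Under these assumed offsets, half of order-svc's 240ms is spent waiting on the database insert, which is the kind of attribution per-service metrics cannot give you.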

Trace context is propagated between services via HTTP headers (W3C traceparent header is the standard). Every service that receives a request extracts the trace ID and span ID, creates a child span, and forwards the context to downstream calls.
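
The extract-and-forward step can be sketched with the standard library alone. A `traceparent` value has four dash-separated lowercase hex fields: version, 32-character trace ID, 16-character parent span ID, and flags, where the lowest flag bit means "sampled". The function and class names below are hypothetical, and the parser is deliberately strict (it rejects anything that is not exactly four fields), so treat it as an illustration rather than a full implementation of the specification. In practice the OTel SDK propagators do this for you.

```python
import re
import secrets
from typing import NamedTuple, Optional, Tuple

_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


class TraceContext(NamedTuple):
    trace_id: str        # shared by every span in the trace
    parent_span_id: str  # the caller's span, which becomes our parent
    sampled: bool


def parse_traceparent(header: str) -> Optional[TraceContext]:
    """Extract trace context from an incoming header, or None if invalid."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    version, trace_id, span_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid per the W3C format.
    if version == "ff" or trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return TraceContext(trace_id, span_id, bool(int(flags, 16) & 0x01))


def outgoing_traceparent(ctx: TraceContext) -> Tuple[str, str]:
    """Create a child span ID and the header to send on downstream calls."""
    span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
    flags = "01" if ctx.sampled else "00"
    return span_id, f"00-{ctx.trace_id}-{span_id}-{flags}"


incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
ctx = parse_traceparent(incoming)
child_span_id, header = outgoing_traceparent(ctx)
print(header)
```

The trace ID stays the same on every hop; only the span ID field changes, which is how the backend reassembles the tree.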

OTel Stack Overview

OpenTelemetry (OTel) is a CNCF project that has become the de facto standard for telemetry instrumentation. It unifies traces, metrics, and logs under one API so you don't depend on vendor SDKs.

Component	Role
OTel SDK	Library added to the application. Creates spans and exports telemetry.
OTel Operator	Kubernetes operator that injects OTel SDK automatically via webhook — no code changes.
OTel Collector	Receives, processes, and exports telemetry. Can fan out to multiple backends.
Backend (Jaeger/Tempo)	Stores and queries traces. Jaeger is self-contained; Tempo integrates with Grafana.

Auto-Instrumentation

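The OTel Operator can inject the OTel SDK through an init container and configure it via environment variables, so no application code changes are required for supported languages (Java, Python, Node.js, .NET). Go is also supported, but through a separate eBPF-based agent that runs as a sidecar rather than an injected SDK.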

Instrumentation CR — auto-instrument Java pods

apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: myapp-instrumentation
  namespace: production
spec:
  exporter:
    endpoint: http://otel-collector.monitoring:4317   # OTLP over gRPC; use port 4318 if the agent exports OTLP over HTTP
  propagators:
    - tracecontext    # W3C standard
    - baggage
  sampler:
    type: parentbased_traceidratio
    argument: "0.1"   # sample 10% of root spans
  java:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:latest   # consider pinning a version tag in production

Annotate a Deployment to opt in

apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-svc
  namespace: production
spec:
  template:
    metadata:
      annotations:
        instrumentation.opentelemetry.io/inject-java: "myapp-instrumentation"
        # For Python:   instrumentation.opentelemetry.io/inject-python: "myapp-instrumentation"
        # For Node.js:  instrumentation.opentelemetry.io/inject-nodejs: "myapp-instrumentation"

OTel Collector

The Collector acts as a telemetry hub. Pipelines are composed of receivers (how data comes in), processors (transformations), and exporters (where data goes).

OTel Collector config — receive OTLP, export to Tempo + Jaeger

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 5s
    send_batch_size: 512
  memory_limiter:
    check_interval: 1s      # required; how often memory usage is measured
    limit_mib: 512
    spike_limit_mib: 128
  resource:
    attributes:
    - action: insert
      key: k8s.cluster.name
      value: production

exporters:
  otlp/tempo:
    endpoint: tempo.monitoring:4317
    tls:
      insecure: true
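  otlp/jaeger:              # Jaeger ingests OTLP natively; the dedicated jaeger exporter was removed from the Collector
    endpoint: jaeger-collector.monitoring:4317
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, resource, batch]   # memory_limiter first, batch last
      exporters: [otlp/tempo, otlp/jaeger]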
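
As a mental model for the pipeline above, the following toy Python sketch chains processors in order and then fans the result out to every exporter. It is not how the Collector is implemented; `insert_attribute`, `run_pipeline`, and the dict-based spans are hypothetical stand-ins. The one behavior it mirrors on purpose is the `insert` action, which adds an attribute only where the key is not already set.

```python
from typing import Callable, Dict, List

Span = Dict[str, str]  # toy span: a flat dict of attributes
Processor = Callable[[List[Span]], List[Span]]
Exporter = Callable[[List[Span]], None]


def insert_attribute(key: str, value: str) -> Processor:
    """Mimic `action: insert`: add the attribute only where it is missing."""
    def process(spans: List[Span]) -> List[Span]:
        return [{**span, key: span.get(key, value)} for span in spans]
    return process


def run_pipeline(
    spans: List[Span],
    processors: List[Processor],
    exporters: List[Exporter],
) -> None:
    for processor in processors:  # processors run in the listed order
        spans = processor(spans)
    for exporter in exporters:    # every exporter receives the full batch
        exporter(list(spans))


# Two lists stand in for the two trace backends.
tempo: List[Span] = []
jaeger: List[Span] = []

run_pipeline(
    [
        {"name": "POST /order"},
        {"name": "INSERT orders", "k8s.cluster.name": "staging"},
    ],
    processors=[insert_attribute("k8s.cluster.name", "production")],
    exporters=[tempo.extend, jaeger.extend],
)
print(tempo)
```

The second span keeps its existing `staging` value, which is the difference between `insert` and an overwriting action.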

Backends — Jaeger & Tempo

	Jaeger	Grafana Tempo
UI	Built-in Jaeger UI	Grafana (same dashboard as metrics/logs)
Storage	Elasticsearch, Cassandra, Badger	Object storage (S3, GCS) — very cheap
Best for	Standalone tracing, existing ES	Unified Grafana stack (logs + metrics + traces)
Query language	None; search by service, operation, tags, and duration	TraceQL, modeled on PromQL/LogQL

Sampling Strategies

Tracing every request is expensive. Sampling controls what fraction of traces are recorded.

Strategy	How it works	Tradeoff
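Head-based (ratio)	Decision made at the root span, before the outcome is known. Sample 10% of all requests.	Simple and cheap. Rare errors and slow requests are missed unless they fall in the sampled fraction.
Tail-based	Decision made after the trace completes. Always record slow or errored traces.	More useful signal. Requires buffering full traces in the collector, and all spans of a trace must reach the same collector instance.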
Always-on	Record everything.	Full fidelity. Only viable at low request volume.
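
Head-based ratio sampling is usually deterministic rather than a coin flip per service: the decision is derived from the trace ID, so every service that sees the same trace reaches the same answer. The sketch below illustrates that idea and the "parent-based" wrapper used by `parentbased_traceidratio`. The function names are hypothetical and the bit arithmetic is an assumption modeled on how ratio samplers commonly work, not a copy of any SDK's code.

```python
import random
from typing import Optional

TRACE_ID_LIMIT = (1 << 64) - 1  # compare only the low 64 bits of the 128-bit ID


def ratio_sampled(trace_id: int, ratio: float) -> bool:
    """Keep the trace if its ID falls in the lowest `ratio` share of the ID space."""
    bound = round(ratio * (TRACE_ID_LIMIT + 1))
    return (trace_id & TRACE_ID_LIMIT) < bound


def parent_based_sampled(
    trace_id: int, ratio: float, parent_sampled: Optional[bool]
) -> bool:
    """Follow the caller's decision if there is one; otherwise apply the ratio."""
    if parent_sampled is not None:
        return parent_sampled  # child spans never override the root's decision
    return ratio_sampled(trace_id, ratio)


# Rough check of the sampled fraction over random 128-bit trace IDs.
rng = random.Random(0)
trace_ids = [rng.getrandbits(128) for _ in range(10_000)]
kept = sum(ratio_sampled(t, 0.1) for t in trace_ids)
print(f"kept {kept} of {len(trace_ids)} traces")
```

Because child spans inherit the root's decision, a trace is either recorded end to end or not at all, which avoids waterfalls with missing hops.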

Tail sampling in OTel Collector: always keep errors and slow traces

processors:
  tail_sampling:
    decision_wait: 10s     # time to wait after a trace's first span before deciding
    policies:
    - name: errors-policy
      type: status_code
      status_code: {status_codes: [ERROR]}
    - name: slow-traces
      type: latency
      latency: {threshold_ms: 500}
    - name: probabilistic-sample
      type: probabilistic
      probabilistic: {sampling_percentage: 5}    # 5% of remaining traces
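
The three policies combine as a logical OR: a trace is kept if any one of them says keep. The following Python sketch shows that decision for a single buffered trace. The `Span` class and `keep_trace` function are hypothetical, the trace is assumed to be non-empty, and the exact boundary behavior at the threshold is not meant to match the Collector; it only illustrates the shape of the logic.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Span:
    start_ms: int
    end_ms: int
    status: str = "OK"  # "OK" or "ERROR"


def keep_trace(
    spans: List[Span],
    threshold_ms: int = 500,
    sampling_percentage: float = 5.0,
    rng: Callable[[], float] = random.random,
) -> bool:
    """Decide whether to keep a completed, non-empty trace."""
    # Policy 1: any span with an error status keeps the whole trace.
    if any(span.status == "ERROR" for span in spans):
        return True
    # Policy 2: trace duration, measured from earliest start to latest end.
    duration = max(s.end_ms for s in spans) - min(s.start_ms for s in spans)
    if duration > threshold_ms:
        return True
    # Policy 3: a small random share of everything else, as a baseline.
    return rng() * 100 < sampling_percentage


fast_ok = [Span(0, 120), Span(10, 90)]
slow_ok = [Span(0, 800), Span(20, 700)]
fast_error = [Span(0, 120), Span(10, 90, status="ERROR")]

for name, trace_spans in [("fast_ok", fast_ok), ("slow_ok", slow_ok), ("fast_error", fast_error)]:
    print(name, keep_trace(trace_spans))
```

The `rng` parameter exists only so the random policy can be pinned in tests; the decision needs the whole trace in memory, which is why tail sampling costs collector memory.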

Log–Trace Correlation

The real power is correlation: click a slow trace span, jump to the exact log lines emitted during that span. This requires the application to include trace_id and span_id in every log line.

Inject trace context into logs (Python/structlog example)

from opentelemetry import trace
import structlog

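def add_trace_context(logger, method, event_dict):
    """structlog processor: attach the active span's IDs to the log event."""
    span = trace.get_current_span()
    ctx = span.get_span_context()
    if ctx.is_valid:  # False when no span is active
        # Hex-encode to match the W3C traceparent format (32 and 16 hex chars)
        event_dict["trace_id"] = format(ctx.trace_id, "032x")
        event_dict["span_id"] = format(ctx.span_id, "016x")
    return event_dict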

structlog.configure(
    processors=[add_trace_context, structlog.processors.JSONRenderer()]
)

In Grafana, add a derived field on the Loki data source that extracts trace_id from each log line and links it to the Tempo data source. This creates a clickable link to the trace directly in log lines.