Observability

Logging Architecture & Aggregation


Kubernetes has no built-in log aggregation. Pods write to stdout/stderr; the container runtime captures those streams; the kubelet exposes them via kubectl logs. That's it. Logs are only retained on the node until the container is deleted or the log rotation limit is hit. Everything beyond that — shipping to a central store, querying across pods, retention — is your responsibility.

How K8s Logs Work

When a container writes to stdout or stderr, the container runtime (containerd/CRI-O) captures it and writes it to a log file on the node at /var/log/pods/<namespace>_<pod>_<uid>/<container>/0.log, where the number in the file name is the container's restart count. The kubelet also creates a symlink to that file under /var/log/containers/ for convenience.
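To make the path layout concrete, here is a minimal Python sketch that splits a pod log path into its parts. The names `PodLogPath` and `parse_pod_log_path` are hypothetical helpers for illustration, and the sketch assumes the layout described above; it is not how any particular agent implements this.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class PodLogPath(NamedTuple):
    namespace: str
    pod: str
    uid: str
    container: str
    restart_count: int


def parse_pod_log_path(path: str) -> PodLogPath:
    """Split /var/log/pods/<namespace>_<pod>_<uid>/<container>/<n>.log into its parts."""
    parts = PurePosixPath(path).parts
    pod_dir, container, filename = parts[-3], parts[-2], parts[-1]
    # Namespace names, pod names, and UIDs cannot contain "_",
    # so the directory name splits cleanly into three fields.
    namespace, pod, uid = pod_dir.split("_")
    # Taking the text before the first "." also tolerates rotated
    # files that carry a suffix after ".log".
    restart_count = int(filename.split(".")[0])
    return PodLogPath(namespace, pod, uid, container, restart_count)


if __name__ == "__main__":
    print(parse_pod_log_path(
        "/var/log/pods/production_api-7d9f6-xkw2p_0b1c2d3e/app/0.log"
    ))
```

Agents typically recover the same fields from the file path or tag before enriching records with data from the Kubernetes API.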

Figure: Kubernetes log pipeline, from container to aggregator.
POD: the app writes to stdout/stderr; the container runtime captures the streams.
NODE: /var/log/pods/…/0.log; log rotation by the kubelet (containerLogMaxSize and containerLogMaxFiles; defaults: 10 MiB per file, 5 files per container).
AGGREGATOR: a Fluent Bit DaemonSet tails the log files and ships to Loki / ES / S3.

kubectl logs reads directly from the node log file, so no aggregator is needed for live tailing. Once the pod is deleted, the log file is gone.

Containers write to stdout/stderr → runtime writes to node log file → DaemonSet agent tails and ships to central store.

Node-Level Logging Agent

The standard pattern: run a log-shipping agent as a DaemonSet so one agent instance runs on every node. The agent tails /var/log/containers/, adds Kubernetes metadata (namespace, pod name, labels), and forwards to a backend.

| Agent | Written in | Memory | Best for |
| --- | --- | --- | --- |
| Fluent Bit | C | ~20 MB | High-throughput, low footprint. Preferred for K8s. |
| Fluentd | Ruby | ~50–200 MB | Rich plugin ecosystem. Better for complex routing logic. |
| Promtail | Go | ~30 MB | Loki-native. Auto-discovers pods via K8s API. |
| Vector | Rust | ~15 MB | Unified logs + metrics pipeline. Fast. |

Sidecar Logging Patterns

When an application writes logs to a file (not stdout), or when you need to split a single stream into multiple destinations, use a sidecar container that reads the file and re-emits to stdout.

sidecar log streamer — emit file as stdout
spec:
  containers:
  - name: app
    image: myapp:1.0.0
    volumeMounts:
    - name: logs
      mountPath: /var/log/app

  - name: log-streamer          # sidecar: tail the file to stdout
    image: busybox:1.36
    args: [/bin/sh, -c, "tail -n+1 -F /var/log/app/app.log"]
    volumeMounts:
    - name: logs
      mountPath: /var/log/app

  volumes:
  - name: logs
    emptyDir: {}                 # shared between containers

Tip: Prefer stdout over file logging

If you control the application, configure it to write to stdout/stderr. The sidecar pattern adds a container, a shared volume, and a tail process. Stdout is simpler, cheaper, and works with kubectl logs natively.

Structured Logging

Emit logs as JSON. Aggregators can parse fields without fragile regex patterns, and backends can index and query individual fields efficiently.

structured log — JSON format
# Good — every field queryable
{"time":"2024-01-15T10:23:41Z","level":"info","msg":"request handled",
 "method":"GET","path":"/api/users","status":200,"latency_ms":12,
 "trace_id":"abc123","pod":"api-7d9f6-xkw2p","namespace":"production"}

# Bad — free-form text, hard to parse and filter
2024-01-15 10:23:41 INFO GET /api/users 200 12ms

Include these fields in every log line: time (RFC3339), level, msg, and trace_id (for correlation with distributed tracing).
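As one way to produce lines in this shape, the sketch below uses Python's standard `logging` module with a custom formatter. `JsonFormatter`, `build_logger`, and the list of extra fields are hypothetical names chosen for illustration; a real service would more likely use its framework's structured logger.

```python
import json
import logging
import sys
from datetime import datetime, timezone

# Extra fields copied into the JSON entry when the caller supplies them.
EXTRA_FIELDS = ("method", "path", "status", "latency_ms")


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object with time, level, msg, trace_id."""

    def format(self, record: logging.LogRecord) -> str:
        timestamp = datetime.fromtimestamp(record.created, tz=timezone.utc)
        entry = {
            # RFC3339 in UTC, e.g. 2024-01-15T10:23:41Z
            "time": timestamp.isoformat(timespec="seconds").replace("+00:00", "Z"),
            "level": record.levelname.lower(),
            "msg": record.getMessage(),
            "trace_id": getattr(record, "trace_id", None),
        }
        for key in EXTRA_FIELDS:
            if hasattr(record, key):
                entry[key] = getattr(record, key)
        return json.dumps(entry)


def build_logger(name: str = "app", stream=sys.stdout) -> logging.Logger:
    """Return a logger that writes one JSON object per line to the given stream."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = build_logger()
    log.info(
        "request handled",
        extra={"trace_id": "abc123", "method": "GET", "path": "/api/users",
               "status": 200, "latency_ms": 12},
    )
```

Writing to stdout keeps the output visible to kubectl logs and to any node-level agent without extra configuration.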

Fluent Bit DaemonSet

fluent-bit configmap — tail K8s logs and forward to Loki
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: logging
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush        1
        Log_Level    info
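        # Parsers_File is omitted: the built-in docker and cri multiline parsers need no parsers file,
        # and mounting this ConfigMap over /fluent-bit/etc/ hides the image's own parsers.conf.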

    [INPUT]
        Name              tail
        Path              /var/log/containers/*.log
        multiline.parser  docker, cri
        Tag               kube.*
        Refresh_Interval  5
        Mem_Buf_Limit     50MB

    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
        # Merge parsed JSON log fields into the record (classic-mode comments must sit on their own line)
        Merge_Log           On
        Keep_Log            Off
        K8S-Logging.Parser  On

    [OUTPUT]
        Name        loki
        Match       kube.*
        Host        loki.logging.svc.cluster.local
        Port        3100
        Labels      job=fluentbit,namespace=$kubernetes['namespace_name'],pod=$kubernetes['pod_name']
        Auto_Kubernetes_Labels On

fluent-bit daemonset (abbreviated)
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: logging
spec:
  selector:
    matchLabels:
      app: fluent-bit
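  template:
    metadata:
      labels:
        app: fluent-bit                 # must match spec.selector.matchLabels
    spec:
      serviceAccountName: fluent-bit   # needs get/list/watch pods
      tolerations:
      - key: node-role.kubernetes.io/control-plane
        operator: Exists                # the taint carries no value
        effect: NoSchedule              # run on control-plane nodes too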
      containers:
      - name: fluent-bit
        image: fluent/fluent-bit:3.0
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: config
          mountPath: /fluent-bit/etc/
      volumes:
      - name: varlog
        hostPath:
          path: /var/log               # read node log files
      - name: config
        configMap:
          name: fluent-bit-config
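
The cri multiline parser named in the tail input exists because containerd and CRI-O prefix every line in the node log file with a timestamp, the stream name, and a P (partial) or F (full) tag, and split long lines into several partial entries. The Python sketch below reassembles such lines to show what the agent has to undo. `CriRecord` and `parse_cri_lines` are hypothetical names, and this is an illustration of the format, not Fluent Bit's implementation.

```python
from typing import Dict, Iterable, Iterator, List, NamedTuple


class CriRecord(NamedTuple):
    time: str    # timestamp of the final (F) line
    stream: str  # "stdout" or "stderr"
    log: str     # the reassembled application line


def parse_cri_lines(lines: Iterable[str]) -> Iterator[CriRecord]:
    """Parse '<time> <stream> <P|F> <message>' lines, joining partial entries."""
    pending: Dict[str, List[str]] = {}
    for raw in lines:
        parts = raw.rstrip("\n").split(" ", 3)
        if len(parts) < 3:
            continue  # not a CRI-formatted line; skip it
        time, stream, tag = parts[0], parts[1], parts[2]
        message = parts[3] if len(parts) == 4 else ""
        pending.setdefault(stream, []).append(message)
        if tag == "F":
            # F marks the end of a logical line: emit everything buffered for this stream.
            yield CriRecord(time, stream, "".join(pending.pop(stream)))


if __name__ == "__main__":
    sample = [
        '2024-01-15T10:23:41.000000001Z stdout P {"level":"info",',
        '2024-01-15T10:23:41.000000002Z stdout F "msg":"request handled"}',
    ]
    for record in parse_cri_lines(sample):
        print(record.log)
```

Once the original line is restored, Merge_Log can parse it as JSON and lift its fields into the record.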

Grafana Loki Stack

Loki stores logs indexed only by labels (not full-text), making it significantly cheaper than Elasticsearch for high-volume log storage. The query language is LogQL, which is modeled on PromQL.

LogQL — common queries in Grafana
# Stream all logs from a namespace
{namespace="production"}

# Filter for error lines
{namespace="production"} |= "ERROR"

# JSON parsing — filter on a specific field value
{namespace="production"} | json | status >= 500

# Per-second rate of error log lines, averaged over a 1-minute window
rate({namespace="production"} |= "ERROR" [1m])

# Count log lines by pod
sum by (pod) (count_over_time({namespace="production"}[5m]))
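
The same queries can be run outside Grafana through Loki's HTTP API. The sketch below only builds the request URL, assuming the `/loki/api/v1/query_range` endpoint and the Loki service address used in the Fluent Bit output above; `loki_query_range_url` is a hypothetical helper, and authentication or tenant headers are left out.

```python
from urllib.parse import urlencode


def loki_query_range_url(
    base_url: str,
    logql: str,
    start_ns: int,
    end_ns: int,
    limit: int = 100,
) -> str:
    """Build a query_range URL; start and end are Unix timestamps in nanoseconds."""
    params = {"query": logql, "start": start_ns, "end": end_ns, "limit": limit}
    # urlencode percent-escapes the braces, quotes, and pipes in the LogQL expression.
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{urlencode(params)}"


if __name__ == "__main__":
    print(loki_query_range_url(
        "http://loki.logging.svc.cluster.local:3100",
        '{namespace="production"} |= "ERROR"',
        start_ns=1705312800000000000,
        end_ns=1705316400000000000,
    ))
```

The resulting URL can be fetched with any HTTP client from inside the cluster, or through a port-forward to the Loki service.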

Retention & Cost

Log volume grows with traffic, so storage cost depends on the backend's pricing model and on how long logs stay queryable. The common backends compare as follows:

| Backend | Cost model | Retention sweet spot |
| --- | --- | --- |
| Loki | Object storage (S3/GCS). Very cheap. ~$0.02/GB/month on S3. | 30–90 days queryable; archive indefinitely. |
| Elasticsearch | Compute + disk. Expensive at scale. | 7–14 days hot; ILM to warm/cold after. |
| CloudWatch / Cloud Logging | Per GB ingested + stored. | Managed, but costs spike without filtering. |
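
For a rough sense of scale, steady-state object storage cost is daily volume times retention times the per-GB price. The sketch below uses the ~$0.02/GB/month figure from the table; the 50 GB/day volume is a hypothetical input, and the estimate ignores compression, request charges, and index storage.

```python
def monthly_storage_cost(
    daily_gb: float,
    retention_days: int,
    price_per_gb_month: float = 0.02,
) -> float:
    """Estimate the monthly bill once the retention window is full."""
    stored_gb = daily_gb * retention_days  # data held at steady state
    return stored_gb * price_per_gb_month


if __name__ == "__main__":
    # Hypothetical cluster: 50 GB/day kept for 30 days = 1500 GB stored.
    print(round(monthly_storage_cost(50, 30), 2))  # about 30 (USD per month)
```

Doubling retention doubles the stored volume, which is why the retention window is usually the first lever to adjust.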

Warning: Filter before shipping

Drop health-check logs, debug-level lines, and high-cardinality noise at the agent before they hit storage. A Fluent Bit grep filter ([FILTER] with Name grep and an Exclude rule) that drops GET /healthz lines can cut log volume by roughly 20–40% in typical clusters.
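
Before adding a drop rule to the agent, it can help to measure how much a candidate rule would remove from a sample of real logs. The Python sketch below does that offline; `DROP_PATTERNS` and `drop_ratio` are hypothetical names, and the patterns are examples rather than a recommended rule set. In production the filtering itself belongs in the agent configuration, not in a script.

```python
import re
from typing import Iterable, Pattern, Sequence

# Example rules: health-check requests and debug-level JSON lines.
DROP_PATTERNS = [
    re.compile(r"GET /healthz"),
    re.compile(r'"level":\s*"debug"'),
]


def drop_ratio(
    lines: Iterable[str],
    patterns: Sequence[Pattern[str]] = DROP_PATTERNS,
) -> float:
    """Return the fraction of lines that match at least one drop pattern."""
    lines = list(lines)
    if not lines:
        return 0.0
    dropped = sum(1 for line in lines if any(p.search(line) for p in patterns))
    return dropped / len(lines)


if __name__ == "__main__":
    sample = [
        '{"level":"info","msg":"GET /healthz 200"}',
        '{"level":"debug","msg":"cache miss"}',
        '{"level":"info","msg":"request handled"}',
        '{"level":"error","msg":"upstream timeout"}',
    ]
    print(drop_ratio(sample))  # 0.5
```

Running this against an export of a day's logs gives a measured reduction for your own cluster instead of a rule-of-thumb figure.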

kubectl Commands

# Stream logs from a pod (all containers)
kubectl logs -f mypod -n production --all-containers

# Logs from a specific container
kubectl logs -f mypod -c mycontainer -n production

# Logs from all pods matching a label selector
# (with -l, --tail defaults to 10 lines per pod; add --tail=-1 for full history)
kubectl logs -f -l app=myapp -n production --all-containers

# Previous container's logs (after crash/restart)
kubectl logs mypod -c mycontainer --previous -n production

# Tail last 100 lines
kubectl logs mypod --tail=100 -n production

# Logs since a time (RFC3339 or duration)
kubectl logs mypod --since=1h -n production
kubectl logs mypod --since-time=2024-01-15T10:00:00Z -n production