Logging Architecture & Aggregation
Kubernetes has no built-in log aggregation. Pods write to stdout/stderr; the container runtime captures those streams; the kubelet serves them through kubectl logs. That's it. Logs stay on the node only until the pod is removed from it or log rotation discards the oldest files. Everything beyond that (shipping to a central store, querying across pods, long-term retention) is your responsibility.
How K8s Logs Work
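When a container writes to stdout or stderr, the container runtime (containerd/CRI-O) captures it and writes it to a log file on the node at /var/log/pods/<namespace>_<pod>_<uid>/<container>/<restart-count>.log, so 0.log is the container's first run. The kubelet also creates a symlink under /var/log/containers/, named <pod>_<namespace>_<container>-<container-id>.log, so an agent can pick up every container with a single glob.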
[Diagram: container stdout/stderr → log file on the node (rotated by logrotate / kubelet) → logging agent tails log files]
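Rotation on the node is controlled by the kubelet. A minimal sketch of the relevant KubeletConfiguration fields, shown with the upstream default values; the location of the kubelet config file varies by distribution, and managed clusters may use different values or not expose these settings at all:

```yaml
# Fragment of the kubelet configuration file (assumed to be editable on your nodes).
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
containerLogMaxSize: 10Mi   # rotate a container's log file once it reaches this size
containerLogMaxFiles: 5     # keep at most this many log files per container
```

With these values a chatty container keeps roughly 50Mi of logs on the node. Anything older is gone unless an agent shipped it first.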
Node-Level Logging Agent
The standard pattern: run a log-shipping agent as a DaemonSet so one agent instance runs on every node. The agent tails /var/log/containers/, adds Kubernetes metadata (namespace, pod name, labels), and forwards to a backend.
| Agent | Written in | Memory | Best for |
|---|---|---|---|
| Fluent Bit | C | ~20 MB | High-throughput, low footprint. Preferred for K8s. |
| Fluentd | Ruby | ~50–200 MB | Rich plugin ecosystem. Better for complex routing logic. |
| Promtail | Go | ~30 MB | Loki-native. Auto-discovers pods via K8s API. |
| Vector | Rust | ~15 MB | Unified logs + metrics pipeline. Fast. |
Sidecar Logging Patterns
When an application writes logs to a file (not stdout), or when you need to split a single stream into multiple destinations, use a sidecar container that reads the file and re-emits to stdout.
spec:
  containers:
    - name: app
      image: myapp:1.0.0
      volumeMounts:
        - name: logs
          mountPath: /var/log/app
    - name: log-streamer  # sidecar: tail the file to stdout
      image: busybox:1.36
      args: [/bin/sh, -c, "tail -n+1 -F /var/log/app/app.log"]
      volumeMounts:
        - name: logs
          mountPath: /var/log/app
  volumes:
    - name: logs
      emptyDir: {}  # shared between containers
If you control the application, configure it to write to stdout/stderr. The sidecar pattern adds a container, a shared volume, and a tail process. Stdout is simpler, cheaper, and works with kubectl logs natively.
Structured Logging
Emit logs as JSON. Aggregators can parse fields without fragile regex patterns, and backends can index and query individual fields efficiently.
# Good — every field queryable
{"time":"2024-01-15T10:23:41Z","level":"info","msg":"request handled",
"method":"GET","path":"/api/users","status":200,"latency_ms":12,
"trace_id":"abc123","pod":"api-7d9f6-xkw2p","namespace":"production"}
# Bad — free-form text, hard to parse and filter
2024-01-15 10:23:41 INFO GET /api/users 200 12ms
Include these fields in every log line: time (RFC3339), level, msg, and trace_id (for correlation with distributed tracing).
Fluent Bit DaemonSet
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: logging
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush         1
        Log_Level     info
        Parsers_File  parsers.conf

    [INPUT]
        Name              tail
        Path              /var/log/containers/*.log
        multiline.parser  docker, cri
        Tag               kube.*
        Refresh_Interval  5
        Mem_Buf_Limit     50MB

    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
        # Merge JSON log fields into the record.
        # (Classic-format comments must sit on their own line, not after a value.)
        Merge_Log           On
        Keep_Log            Off
        K8S-Logging.Parser  On

    [OUTPUT]
        Name                    loki
        Match                   kube.*
        Host                    loki.logging.svc.cluster.local
        Port                    3100
        Labels                  job=fluentbit,namespace=$kubernetes['namespace_name'],pod=$kubernetes['pod_name']
        Auto_Kubernetes_Labels  On
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: logging
spec:
  selector:
    matchLabels:
      app: fluent-bit
  template:
    metadata:
      labels:
        app: fluent-bit  # must match spec.selector.matchLabels
    spec:
      serviceAccountName: fluent-bit  # needs get/list/watch pods
      tolerations:
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule  # run on control-plane nodes too
      containers:
        - name: fluent-bit
          image: fluent/fluent-bit:3.0
          volumeMounts:
            - name: varlog
              mountPath: /var/log
            - name: config
              mountPath: /fluent-bit/etc/
      volumes:
        - name: varlog
          hostPath:
            path: /var/log  # read node log files
        - name: config
          configMap:
            name: fluent-bit-config
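The DaemonSet references a fluent-bit service account that needs get/list/watch on pods. A minimal sketch of the matching RBAC objects follows; the ClusterRole name fluent-bit-read is hypothetical, and including namespaces alongside pods is an assumption (useful if you enrich records with namespace metadata), so trim it if you do not need it:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluent-bit
  namespace: logging
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluent-bit-read          # hypothetical name
rules:
  - apiGroups: [""]              # core API group
    resources: ["pods", "namespaces"]   # "namespaces" is an assumption, see above
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: fluent-bit-read
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: fluent-bit-read
subjects:
  - kind: ServiceAccount
    name: fluent-bit
    namespace: logging
```

A ClusterRole rather than a namespaced Role is used because the agent reads metadata for pods in every namespace scheduled on its node.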
Grafana Loki Stack
Loki stores logs indexed only by labels (not full-text), making it significantly cheaper than Elasticsearch for high-volume log storage. Query language is LogQL — similar to PromQL.
# Stream all logs from a namespace
{namespace="production"}
# Filter for error lines
{namespace="production"} |= "ERROR"
# JSON parsing — filter on a specific field value
{namespace="production"} | json | status >= 500
# Per-second rate of error log lines, averaged over the last minute
rate({namespace="production"} |= "ERROR" [1m])
# Count log lines by pod
sum by (pod) (count_over_time({namespace="production"}[5m]))
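Metric-style LogQL queries like the ones above can also drive alerts through Loki's ruler, which accepts Prometheus-style rule groups. A minimal sketch, assuming the ruler is enabled in your Loki deployment; the group name, alert name, threshold, and durations are all hypothetical placeholders:

```yaml
groups:
  - name: production-log-alerts        # hypothetical group name
    rules:
      - alert: HighErrorLogRate        # hypothetical alert name
        # More than 1 ERROR line per second across the namespace (threshold is illustrative)
        expr: 'sum(rate({namespace="production"} |= "ERROR" [5m])) > 1'
        for: 10m                       # condition must hold this long before firing
        labels:
          severity: warning
        annotations:
          summary: Elevated rate of ERROR log lines in production
```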
Retention & Cost
Log volume grows with traffic, and storage cost grows with it. How the common backends compare on cost model and practical retention:
| Backend | Cost model | Retention sweet spot |
|---|---|---|
| Loki | Object storage (S3/GCS). Very cheap. ~$0.02/GB/month on S3. | 30–90 days queryable; archive indefinitely. |
| Elasticsearch | Compute + disk. Expensive at scale. | 7–14 days hot; ILM to warm/cold after. |
| CloudWatch / Cloud Logging | Per GB ingested + stored. | Managed, but costs spike without filtering. |
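For Loki, retention is enforced by the compactor rather than by the object store. A hedged sketch of the relevant config fragment, assuming a recent Loki release with an S3 backend; the working directory path and the 30-day period are illustrative, and option names should be checked against the docs for your version:

```yaml
# Fragment of the Loki configuration file.
compactor:
  working_directory: /var/loki/compactor   # illustrative path
  retention_enabled: true                  # without this, retention_period is not enforced
  delete_request_store: s3                 # assumed backend; recent versions require this when retention is on
limits_config:
  retention_period: 720h                   # 30 days, the low end of the range in the table
```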
Drop health-check logs, debug-level lines, and high-cardinality noise at the agent before they hit storage. A Fluent Bit [FILTER] Name grep that drops GET /healthz lines can cut log volume by 20–40% in typical clusters.
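The health-check filter described above could look like the following: a sketch of an extra key in the fluent-bit-config ConfigMap. The file name drop-healthz.conf is hypothetical, and the pattern assumes the health endpoint is /healthz as in the text:

```yaml
# Fragment: one additional key under data in the fluent-bit-config ConfigMap.
data:
  drop-healthz.conf: |
    [FILTER]
        Name     grep
        Match    kube.*
        # Drop any record whose raw log line mentions /healthz
        Exclude  log /healthz
```

To take effect, fluent-bit.conf would need an @INCLUDE drop-healthz.conf line. Placing that line before the kubernetes filter means the pattern is matched against the raw log key, before Merge_Log restructures the record, and dropped lines skip metadata enrichment entirely.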
kubectl Commands
# Stream logs from a pod (all containers)
kubectl logs -f mypod -n production --all-containers
# Logs from a specific container
kubectl logs -f mypod -c mycontainer -n production
# Logs from all pods matching a label selector
kubectl logs -f -l app=myapp -n production --all-containers
# Previous container's logs (after crash/restart)
kubectl logs mypod -c mycontainer --previous -n production
# Tail last 100 lines
kubectl logs mypod --tail=100 -n production
# Logs since a time (RFC3339 or duration)
kubectl logs mypod --since=1h -n production
kubectl logs mypod --since-time=2024-01-15T10:00:00Z -n production