Liveness, Readiness & Startup Probes
Kubernetes needs to know if your container is alive, ready to accept traffic, and finished starting up. These are three distinct questions — and Kubernetes has three separate probes to answer them. Getting probes right is one of the highest-leverage reliability improvements you can make to a Kubernetes workload. Getting them wrong causes restart cascades, traffic black-holes, and slow-rolling outages that are notoriously hard to diagnose.
Why Probes Matter
Without probes, Kubernetes can only observe whether a container's main process is running. It cannot tell if the process has deadlocked, if the app is stuck in an error loop consuming CPU, or if a slow-starting JVM is ready to serve requests. Probes bridge this gap — they let you tell Kubernetes what "healthy" and "ready" actually mean for your specific application.
| Probe | Question answered | On failure |
|---|---|---|
| Liveness | Is the container still alive, or is it stuck? | Container is killed and restarted |
| Readiness | Is the container ready to receive traffic? | Pod is removed from Service endpoints — no new traffic |
| Startup | Has a slow-starting container finished initialising? | Container is killed and restarted (per the pod's restartPolicy); liveness and readiness probes do not run until the startup probe succeeds |
Liveness Probe
A liveness probe detects containers that are running but broken — deadlocked, in an infinite error loop, or otherwise unable to make progress. When a liveness probe fails failureThreshold times in a row, kubelet kills the container and restarts it.
```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 15  # wait before first check
  periodSeconds: 10        # check every 10s
  failureThreshold: 3      # kill after 3 consecutive failures
```
A failing liveness probe is a blunt instrument. It kills and restarts the container even if the issue is transient (a slow database query, a brief downstream timeout). Only use liveness to detect unrecoverable states — deadlocks, memory corruption, infinite loops. For transient failures, use readiness instead.
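The consecutive-failure rule can be modelled in a few lines of Python. This is a simplified sketch, not kubelet's code: it treats each probe result as a boolean and ignores initialDelaySeconds, timeouts, and restart back-off. The function name liveness_restart_points is hypothetical.

```python
from typing import Iterable, List


def liveness_restart_points(results: Iterable[bool], failure_threshold: int = 3) -> List[int]:
    """Return the indices of probe results at which a restart would be triggered.

    Simplified model: `results` are probe outcomes in the order they are observed
    (True = success). A success resets the failure count; reaching
    `failure_threshold` consecutive failures triggers a restart, after which the
    count starts over for the new container.
    """
    restarts: List[int] = []
    consecutive_failures = 0
    for index, ok in enumerate(results):
        if ok:
            consecutive_failures = 0  # any success breaks the run
            continue
        consecutive_failures += 1
        if consecutive_failures >= failure_threshold:
            restarts.append(index)
            consecutive_failures = 0  # fresh container, fresh count
    return restarts


# Two isolated blips never reach the threshold; a sustained failure does.
print(liveness_restart_points([True, False, True, False, True]))  # -> []
print(liveness_restart_points([True, False, False, False, True]))  # -> [3]
```

A single success anywhere in the run resets the count, which is why short, isolated failures never trigger a restart on their own.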
Readiness Probe
A readiness probe determines whether a container should receive traffic. When it fails, the pod is removed from its Service's endpoints — load balancers stop sending new requests to it. The container is not restarted. When the probe passes again, the pod is added back to endpoints automatically.
```yaml
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
  failureThreshold: 3
  successThreshold: 1  # must pass once to be considered ready
```
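A similar sketch shows how failureThreshold and successThreshold move a pod in and out of Service endpoints without ever restarting it. It is an illustrative model with a hypothetical function name, assuming the pod starts NotReady and that each probe result is a boolean.

```python
from typing import Iterable, List


def readiness_timeline(
    results: Iterable[bool],
    failure_threshold: int = 3,
    success_threshold: int = 1,
) -> List[bool]:
    """Return the pod's Ready state after each readiness probe result.

    Simplified model (not kubelet's code): the pod starts NotReady. A Ready pod
    turns NotReady after `failure_threshold` consecutive failures; a NotReady
    pod turns Ready after `success_threshold` consecutive successes. Nothing
    here restarts the container; only endpoint membership changes.
    """
    ready = False
    failures = 0
    successes = 0
    states: List[bool] = []
    for ok in results:
        if ok:
            successes += 1
            failures = 0
            if not ready and successes >= success_threshold:
                ready = True  # added back to Service endpoints
        else:
            failures += 1
            successes = 0
            if ready and failures >= failure_threshold:
                ready = False  # removed from Service endpoints
        states.append(ready)
    return states


print(readiness_timeline([True, True, False, False, False, True, True],
                         failure_threshold=3, success_threshold=2))
# -> [False, True, True, True, False, False, True]
```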
A common pattern is to have separate health endpoints:
- /healthz: for liveness. Returns 200 if the process itself can still function. Never depends on external services.
- /ready: for readiness. Returns 200 only if the app can serve requests (database connected, caches warm, etc.).
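A minimal sketch of the two endpoints using only Python's standard library http.server. The app_state flags are hypothetical stand-ins for your application's real checks, and port 8080 simply matches the probe examples; a real service would wire these into its own startup code and connection pool.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical application state; a real app would update these itself.
app_state = {"loop_alive": True, "db_connected": False, "cache_warm": False}


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path == "/healthz":
            # Liveness: inspects only the process itself, never external services.
            ok = app_state["loop_alive"]
        elif self.path == "/ready":
            # Readiness: true only when the app can actually serve requests.
            ok = app_state["db_connected"] and app_state["cache_warm"]
        else:
            self.send_error(404)
            return
        self.send_response(200 if ok else 503)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"ok\n" if ok else b"not ok\n")

    def log_message(self, format: str, *args) -> None:
        pass  # keep probe traffic out of the logs


def serve(port: int = 8080) -> ThreadingHTTPServer:
    # Bind on all interfaces so the kubelet can reach the pod IP.
    server = ThreadingHTTPServer(("0.0.0.0", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Until the hypothetical db_connected and cache_warm flags are set, /healthz answers 200 while /ready answers 503, so the pod stays alive but receives no traffic.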
During a rolling update, the Deployment controller counts a new pod as available only once it is Ready, and it scales down old pods as new ones become available. Without a readiness probe, a new pod is marked Ready as soon as its containers start, so old pods are terminated while the new ones may still be warming up or connecting to dependencies.
Startup Probe
A startup probe handles containers with slow initialization — JVMs, apps that run migrations on startup, or services that need to load large datasets into memory. While the startup probe is running, liveness and readiness probes are disabled. The startup probe gets failureThreshold × periodSeconds total time to succeed before the container is killed.
```yaml
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  failureThreshold: 30  # 30 failures allowed
  periodSeconds: 10     # checked every 10s = 300s (5 min) total
```
After the startup probe succeeds, it stops running and liveness and readiness take over. Without a startup probe, you'd need a large initialDelaySeconds on your liveness probe, which leaves the container unmonitored for that whole window on every start, even when the app finishes starting quickly.
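The startup budget is simple arithmetic, and it can be run backwards to choose failureThreshold from a measured worst-case start time. A sketch with hypothetical helper names, assuming the first probe runs right after initialDelaySeconds; the exact cut-off also depends on timeoutSeconds and probe scheduling, so leave some headroom.

```python
import math


def startup_budget_seconds(failure_threshold: int, period_seconds: int,
                           initial_delay_seconds: int = 0) -> int:
    """Approximate time a startup probe allows before the container is killed."""
    return initial_delay_seconds + failure_threshold * period_seconds


def failure_threshold_for(worst_case_startup_seconds: float, period_seconds: int) -> int:
    """Smallest failureThreshold whose budget covers a measured worst-case startup."""
    return max(1, math.ceil(worst_case_startup_seconds / period_seconds))


print(startup_budget_seconds(30, 10))  # -> 300 (5 minutes)
print(failure_threshold_for(245, 10))  # -> 25
```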
Probe Mechanisms
All three probe types support the same four mechanisms:
| Mechanism | How it works | Best for |
|---|---|---|
| httpGet | kubelet makes an HTTP GET to the specified path and port. 2xx–3xx = success. | HTTP services with dedicated health endpoints |
| tcpSocket | kubelet attempts a TCP connection to the specified port. Connection succeeds = healthy. | Databases, message brokers, any TCP service without HTTP |
| exec | kubelet runs a command inside the container. Exit code 0 = success. | Custom health checks; apps without HTTP or TCP health endpoints |
| grpc | kubelet makes a gRPC health check (standard gRPC Health Checking Protocol). | gRPC services (Kubernetes 1.27+, stable) |
```yaml
# HTTP GET
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
    httpHeaders:
      - name: X-Health-Check
        value: "1"
```

```yaml
# TCP socket, e.g. MySQL
livenessProbe:
  tcpSocket:
    port: 3306
```

```yaml
# Exec command
livenessProbe:
  exec:
    command:
      - /bin/sh
      - -c
      - "redis-cli ping | grep PONG"
```

```yaml
# gRPC (Kubernetes 1.27+)
livenessProbe:
  grpc:
    port: 50051
    # Optional "service" field: the service name sent in the gRPC HealthCheckRequest.
    # Omit it to check the health of the server as a whole.
```
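The success rules for three of the mechanisms can be approximated in Python to see what each one actually tests. This is a sketch with hypothetical function names; it runs from the local machine, whereas kubelet runs exec probes inside the container and connects to the pod's IP for httpGet and tcpSocket.

```python
import socket
import subprocess
from typing import Sequence


def http_status_ok(status_code: int) -> bool:
    # httpGet: any status from 200 up to (but not including) 400 counts as success.
    return 200 <= status_code < 400


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    # tcpSocket: success means a TCP connection could be opened; nothing is sent.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def exec_probe(command: Sequence[str], timeout: float = 1.0) -> bool:
    # exec: success means the command exited with status 0 within the timeout.
    try:
        completed = subprocess.run(
            list(command),
            timeout=timeout,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return completed.returncode == 0
    except (subprocess.TimeoutExpired, OSError):
        return False
```

Note that tcpSocket only proves something is listening: a process that accepts connections but never answers still passes.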
Configuration Fields
| Field | Default | Description |
|---|---|---|
| initialDelaySeconds | 0 | Seconds to wait after the container starts before the first probe. If you have no startup probe, set this high enough for the app to start. |
| periodSeconds | 10 | How often, in seconds, to run the probe. |
| timeoutSeconds | 1 | Seconds after which the probe times out; a timeout counts as a failure. Increase it for slow health checks. |
| failureThreshold | 3 | Consecutive failures before the action is taken (restart for liveness and startup; removal from endpoints for readiness). |
| successThreshold | 1 | Consecutive successes needed, after a failure, to mark the probe as passing. Must be 1 for liveness and startup probes; only readiness accepts a higher value. |
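These fields combine into a rough detection window: how long a check can be failing before Kubernetes acts. failureThreshold × periodSeconds is the usual headline figure; the sketch below brackets it, assuming the failure can begin at any point between probes and that the last failing probe may run for the full timeoutSeconds. The function name is hypothetical.

```python
from typing import Tuple


def detection_window_seconds(period_seconds: int, failure_threshold: int,
                             timeout_seconds: int = 1) -> Tuple[int, int]:
    """Rough (fastest, slowest) seconds from 'check starts failing' to 'action taken'.

    Fastest: the first failing probe runs right as the problem starts, and each
    probe fails immediately. Slowest: the problem starts just after a successful
    probe, and the final failing probe runs for the full timeout.
    """
    fastest = (failure_threshold - 1) * period_seconds
    slowest = failure_threshold * period_seconds + timeout_seconds
    return fastest, slowest


print(detection_window_seconds(15, 3, 5))  # -> (30, 50)
print(detection_window_seconds(10, 3))     # -> (20, 31), using the defaults
```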
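When to Use Each
- Readiness: on any container that receives traffic through a Service, so it only gets requests once it can handle them.
- Liveness: when the process can end up stuck in a state it will not recover from on its own, such as a deadlock or an infinite loop. Keep the check local to the process.
- Startup: when startup time is long or unpredictable (JVMs, migrations on startup, large datasets loaded into memory), so that liveness can keep tight thresholds once the app is running.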
Common Mistakes
Using liveness for transient failures. If your liveness probe checks a database connection and the database is unavailable for longer than failureThreshold × periodSeconds, every pod fails liveness at roughly the same time and restarts, turning a brief database outage into a restart storm. Liveness should check only whether the container's own process is functional, never external dependencies.
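No startup probe on slow-starting apps. Without a startup probe, you have to give liveness a large initialDelaySeconds, and that fixed delay applies on every container start, however quickly the app actually comes up. After a crash, liveness stays silent for the full delay, so a deadlock during recovery goes unnoticed for that long. A startup probe hands over to liveness as soon as it succeeds.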
Leaving timeoutSeconds at 1 on a readiness check that hits the database. The default 1-second timeout is aggressive for checks that involve external calls. If the check sometimes takes 1.5s under load, each slow run counts as a failure. Set timeoutSeconds to something realistic for the check you're running.
No readiness probe. Without a readiness probe, a pod is marked Ready, and starts receiving traffic, as soon as its containers start: before the app has connected to the database, loaded config, or finished any startup work. This typically shows up as request errors during every deploy.
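For the timeoutSeconds mistake, one way to pick a realistic value is to measure how long the health check actually takes and add headroom. A sketch with a hypothetical helper; the 99th percentile and the 2× headroom factor are assumptions chosen for illustration, not Kubernetes recommendations.

```python
import math
from typing import Sequence


def suggested_timeout_seconds(latencies_s: Sequence[float], percentile: float = 0.99,
                              headroom: float = 2.0) -> int:
    """Suggest a whole-second timeoutSeconds from observed health-check latencies."""
    if not latencies_s:
        raise ValueError("need at least one latency sample")
    if not 0 < percentile <= 1:
        raise ValueError("percentile must be in (0, 1]")
    ordered = sorted(latencies_s)
    # Nearest-rank percentile: smallest sample covering `percentile` of observations.
    rank = max(1, math.ceil(percentile * len(ordered)))
    observed = ordered[rank - 1]
    # timeoutSeconds is an integer with a minimum of 1.
    return max(1, math.ceil(observed * headroom))


# Mostly 0.2s, but 5% of checks take 1.5s under load: the 1s default would fail those.
print(suggested_timeout_seconds([0.2] * 95 + [1.5] * 5))  # -> 3
```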
Putting it together, a container that uses all three probes might look like this:

```yaml
containers:
  - name: api
    image: myapp:1.0
    startupProbe:
      httpGet:
        path: /healthz
        port: 8080
      failureThreshold: 30
      periodSeconds: 10      # 5 min total to start
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 15
      timeoutSeconds: 5
      failureThreshold: 3    # restart after ~45s of liveness failure
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      periodSeconds: 5
      timeoutSeconds: 3
      failureThreshold: 3
      successThreshold: 1
```