
Liveness, Readiness & Startup Probes


Kubernetes needs to know if your container is alive, ready to accept traffic, and finished starting up. These are three distinct questions — and Kubernetes has three separate probes to answer them. Getting probes right is one of the highest-leverage reliability improvements you can make to a Kubernetes workload. Getting them wrong causes restart cascades, traffic black-holes, and slow-rolling outages that are notoriously hard to diagnose.

Why Probes Matter

Without probes, Kubernetes can only observe whether a container's main process is running. It cannot tell if the process has deadlocked, if the app is stuck in an error loop consuming CPU, or if a slow-starting JVM is ready to serve requests. Probes bridge this gap — they let you tell Kubernetes what "healthy" and "ready" actually mean for your specific application.

Probe	Question answered	On failure
Liveness	Is the container still alive, or is it stuck?	Container is killed and restarted
Readiness	Is the container ready to receive traffic?	Pod is removed from Service endpoints — no new traffic
Startup	Has a slow-starting container finished initialising?	Container is killed and restarted (per restartPolicy); until it passes, liveness and readiness probes do not run

Liveness Probe

A liveness probe detects containers that are running but broken: deadlocked, stuck in an infinite error loop, or otherwise unable to make progress. When a liveness probe fails failureThreshold times in a row, the kubelet kills the container, which is then restarted according to the Pod's restartPolicy.

Liveness via HTTP

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 15   # wait before first check
  periodSeconds: 10          # check every 10s
  failureThreshold: 3        # kill after 3 consecutive failures
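To make the failureThreshold rule concrete, here is a small Python model of how consecutive liveness results turn into restarts. It is a sketch of the behaviour described above, not kubelet code: the function name `simulate_liveness` is hypothetical, each list entry stands for one probe taken every periodSeconds, and any success resets the failure count.

```python
from typing import List


def simulate_liveness(results: List[bool], failure_threshold: int = 3) -> List[int]:
    """Return the indices of probe rounds after which the container is restarted.

    Each entry in `results` is one probe outcome (True = success), taken
    every periodSeconds. Only *consecutive* failures count: any success
    resets the counter. A restart also resets it, because the restarted
    container starts probing from a clean slate.
    """
    restarts = []
    consecutive_failures = 0
    for i, ok in enumerate(results):
        if ok:
            consecutive_failures = 0
            continue
        consecutive_failures += 1
        if consecutive_failures >= failure_threshold:
            restarts.append(i)          # the kubelet kills the container here
            consecutive_failures = 0    # fresh container, fresh count
    return restarts


if __name__ == "__main__":
    # Isolated blips never reach 3 in a row; the final stall does.
    probes = [True, False, True, False, False, True, False, False, False]
    print(simulate_liveness(probes))  # [8]
```

With the YAML above (periodSeconds: 10, failureThreshold: 3), this works out to a restart roughly 20 to 30 seconds after the container hangs, depending on where the hang falls between probes.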

⚠️

Liveness failures restart the container — use carefully

A failing liveness probe is a blunt instrument. It kills and restarts the container even if the issue is transient (a slow database query, a brief downstream timeout). Only use liveness to detect unrecoverable states — deadlocks, memory corruption, infinite loops. For transient failures, use readiness instead.

Readiness Probe

A readiness probe determines whether a container should receive traffic. When it fails failureThreshold times in a row, the pod is marked not ready and removed from the endpoints of every Service that selects it, so load balancers stop sending it new requests. The container is not restarted. Once the probe succeeds successThreshold times in a row, the pod is added back to the endpoints automatically.

Readiness via HTTP

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
  failureThreshold: 3
  successThreshold: 1    # must pass once to be considered ready
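The readiness rules can be sketched as a small state machine whose output is endpoint membership rather than a restart. The helper `readiness_timeline` below is hypothetical; it assumes the pod starts out not ready, joins the endpoints after successThreshold consecutive passes, and leaves after failureThreshold consecutive failures. Nothing in it ever restarts the container.

```python
from typing import List


def readiness_timeline(
    results: List[bool], failure_threshold: int = 3, success_threshold: int = 1
) -> List[bool]:
    """Return, for each probe round, whether the pod is in the Service endpoints.

    The pod starts NOT ready. It joins after `success_threshold` consecutive
    successes and leaves after `failure_threshold` consecutive failures.
    Unlike liveness, a failure never restarts the container.
    """
    ready = False
    successes = failures = 0
    timeline = []
    for ok in results:
        if ok:
            successes += 1
            failures = 0
            if not ready and successes >= success_threshold:
                ready = True
        else:
            failures += 1
            successes = 0
            if ready and failures >= failure_threshold:
                ready = False
        timeline.append(ready)
    return timeline


if __name__ == "__main__":
    # Pod passes, a downstream outage causes 4 failures, then it recovers.
    rounds = [True, True, False, False, False, False, True, True]
    print(readiness_timeline(rounds))
    # [True, True, True, True, False, False, True, True]
```

The pod tolerates two failed rounds, drops out on the third, and rejoins on the first success because successThreshold is 1.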

A common pattern is to have separate health endpoints:

/healthz — for liveness. Returns 200 if the process can still function. Never depends on external services.
/ready — for readiness. Returns 200 only if the app can serve requests (database connected, caches warm, etc.).
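A minimal sketch of this two-endpoint pattern, using only Python's standard library http.server. The AppState fields (a heartbeat timestamp, `db_connected`, `cache_warm`) and the 30-second heartbeat timeout are hypothetical stand-ins for whatever your app actually tracks. The design point is that /healthz looks only at in-process state, while /ready also reflects dependencies.

```python
import time
from dataclasses import dataclass, field
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple


@dataclass
class AppState:
    # Hypothetical in-process signals; replace with your app's real ones.
    last_heartbeat: float = field(default_factory=time.monotonic)  # bumped by the work loop
    db_connected: bool = False
    cache_warm: bool = False
    heartbeat_timeout: float = 30.0  # seconds without progress = "stuck"


def route(path: str, state: AppState) -> Tuple[int, str]:
    """Map a probe path to (HTTP status, body)."""
    if path == "/healthz":
        # Liveness: only in-process state. A stale heartbeat suggests the
        # work loop is deadlocked; a database outage must NOT fail this.
        stuck = time.monotonic() - state.last_heartbeat > state.heartbeat_timeout
        return (500, "stuck") if stuck else (200, "ok")
    if path == "/ready":
        # Readiness: may depend on external services and warm-up work.
        if state.db_connected and state.cache_warm:
            return 200, "ready"
        return 503, "not ready"
    return 404, "not found"


STATE = AppState()


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status, body = route(self.path.split("?", 1)[0], STATE)
        payload = body.encode()
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8080) -> None:
    """Start the probe server on all interfaces (blocks forever)."""
    HTTPServer(("0.0.0.0", port), ProbeHandler).serve_forever()
```

Calling serve() exposes both endpoints on port 8080, matching the probe examples in this section. In a real service the main work loop would refresh STATE.last_heartbeat, and connection code would flip the dependency flags.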

💡

Readiness also controls rolling updates

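During a rolling update, a Deployment counts a new pod as available only once its readiness probe passes, and it scales down old pods only as new ones become available (within the limits set by maxSurge and maxUnavailable). Without a readiness probe, a new pod counts as ready as soon as its containers start, so old pods are replaced and traffic arrives before the app has finished warming up or connecting to its dependencies.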

Startup Probe

A startup probe handles containers with slow initialisation: JVMs, apps that run migrations on startup, or services that need to load large datasets into memory. Until the startup probe succeeds, liveness and readiness probes do not run. The startup probe gets roughly failureThreshold × periodSeconds in total to succeed before the container is killed.

Startup probe — gives app up to 5 minutes to start

startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  failureThreshold: 30    # 30 failures allowed
  periodSeconds: 10       # checked every 10s = 300s (5 min) total

Once the startup probe succeeds, it stops running and liveness/readiness take over. Without a startup probe, you'd need a large initialDelaySeconds on your liveness probe, which delays detection of real failures after startup completes.
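The cost of a large initialDelaySeconds can be put in numbers. The sketch below is a simplified timing model, not the kubelet's exact scheduler: it assumes probes fire exactly every periodSeconds from their first run, that liveness begins as soon as the startup probe succeeds, and it uses a hypothetical app that is healthy at t=20s and deadlocks at t=25s. The helper names are hypothetical.

```python
import math


def startup_budget(failure_threshold: int, period_seconds: int) -> int:
    """Approximate time a startup probe allows before the container is killed."""
    return failure_threshold * period_seconds


def kill_time(deadlock_at: float, first_probe_at: float,
              period_seconds: float, failure_threshold: int) -> float:
    """When liveness kills a container that hangs at `deadlock_at`.

    Probes run at first_probe_at + k * period_seconds. The first failing
    probe is the first one at or after the hang; the kill happens after
    failure_threshold consecutive failures.
    """
    k = max(0, math.ceil((deadlock_at - first_probe_at) / period_seconds))
    first_failure = first_probe_at + k * period_seconds
    return first_failure + (failure_threshold - 1) * period_seconds


if __name__ == "__main__":
    print(startup_budget(30, 10))  # 300, as in the YAML above

    # Option A: no startup probe, liveness initialDelaySeconds=300.
    print(kill_time(deadlock_at=25, first_probe_at=300,
                    period_seconds=10, failure_threshold=3))  # 320
    # Option B: startup probe passes at t=20s, liveness starts right away.
    print(kill_time(deadlock_at=25, first_probe_at=20,
                    period_seconds=10, failure_threshold=3))  # 50
```

Under these assumptions, the same deadlock is caught after about 50 seconds with a startup probe, versus about 320 seconds when liveness simply waits out a 300-second initial delay.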

Probe Mechanisms

All three probe types support the same four mechanisms:

Mechanism	How it works	Best for
httpGet	kubelet makes an HTTP GET to the specified path and port. 2xx–3xx = success.	HTTP services with dedicated health endpoints
tcpSocket	kubelet attempts a TCP connection to the specified port. Connection succeeds = healthy.	Databases, message brokers, any TCP service without HTTP
exec	kubelet runs a command inside the container. Exit code 0 = success.	Custom health checks; apps without HTTP or TCP health endpoints
grpc	kubelet makes a gRPC health check (standard gRPC Health Checking Protocol).	gRPC services (Kubernetes 1.27+, stable)

All four mechanisms

# HTTP GET
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
    httpHeaders:
    - name: X-Health-Check
      value: "1"

# TCP socket — e.g. MySQL
livenessProbe:
  tcpSocket:
    port: 3306

# Exec command
livenessProbe:
  exec:
    command:
    - /bin/sh
    - -c
    - "redis-cli ping | grep PONG"

# gRPC (Kubernetes 1.27+)
livenessProbe:
  grpc:
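    port: 50051   # must be a number; gRPC probes do not accept named ports
    # "service" is optional: it names the service to check in the
    # HealthCheckRequest. Omit it to check the server's overall health.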
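To illustrate how each mechanism decides pass or fail, here is a rough Python approximation of the success rules in the table above: an HTTP status from 200 to 399, a completed TCP connect, or exit code 0. It is not how the kubelet is implemented (for one thing, the kubelet runs exec commands inside the container), the function names are hypothetical, and gRPC is left out because it would need the third-party grpcio package. Each check treats a timeout as a failure, mirroring timeoutSeconds.

```python
import socket
import subprocess
import sys
import urllib.error
import urllib.request
from typing import Sequence


def http_get_ok(url: str, timeout_seconds: float = 1.0) -> bool:
    """httpGet: success if the status code is in [200, 400)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as resp:
            return 200 <= resp.status < 400
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx (and unhandled redirects).
        return 200 <= err.code < 400
    except (urllib.error.URLError, OSError):
        return False  # connection refused, DNS failure, or timeout


def tcp_socket_ok(host: str, port: int, timeout_seconds: float = 1.0) -> bool:
    """tcpSocket: success if a TCP connection can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        return False


def exec_ok(command: Sequence[str], timeout_seconds: float = 1.0) -> bool:
    """exec: success if the command exits with status 0 within the timeout."""
    try:
        result = subprocess.run(command, timeout=timeout_seconds,
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
    except (subprocess.TimeoutExpired, OSError):
        return False  # too slow, or the command could not be started
    return result.returncode == 0


if __name__ == "__main__":
    # exec-style checks: exit code 0 passes, anything else fails.
    print(exec_ok([sys.executable, "-c", "raise SystemExit(0)"], timeout_seconds=5))  # True
    print(exec_ok([sys.executable, "-c", "raise SystemExit(1)"], timeout_seconds=5))  # False
```

Note how little the TCP check proves: a process that accepts connections but never answers requests still passes it, which is why httpGet or exec checks are preferred when the app can expose them.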

Configuration Fields

Field	Default	Description
`initialDelaySeconds`	0	Seconds to wait after container start before the first probe. Set this high enough for the app to start if you have no startup probe.
`periodSeconds`	10	How often to run the probe.
`timeoutSeconds`	1	Probe times out after this many seconds. Counts as a failure. Increase for slow health checks.
`failureThreshold`	3	Consecutive failures before the action is taken (kill for liveness/startup; remove from endpoints for readiness).
`successThreshold`	1	Consecutive successes needed to mark a failing probe as passing again. Must be 1 for liveness and startup probes; only readiness can use a higher value.
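The defaults and constraints in this table can be captured in a small checker. This is a hypothetical helper operating on a probe already parsed into a Python dict (for example with a YAML library). It fills in the defaults listed above, flags values below the documented minimum of 1 and a successThreshold other than 1 on liveness or startup probes, and estimates how long a probe must keep failing before its action fires as failureThreshold × periodSeconds. It is a sketch, not a substitute for the API server's own defaulting and validation.

```python
from typing import Dict, List, Tuple

# Defaults from the table above (seconds or counts).
DEFAULTS = {
    "initialDelaySeconds": 0,
    "periodSeconds": 10,
    "timeoutSeconds": 1,
    "failureThreshold": 3,
    "successThreshold": 1,
}
MIN_ONE = ("periodSeconds", "timeoutSeconds", "failureThreshold", "successThreshold")


def check_probe(kind: str, probe: Dict[str, int]) -> Tuple[Dict[str, int], List[str]]:
    """Fill defaults and return (effective settings, list of problems).

    `kind` is "liveness", "readiness" or "startup"; only the timing fields
    are considered (httpGet/tcpSocket/exec/grpc keys are ignored).
    """
    effective = {**DEFAULTS, **{k: v for k, v in probe.items() if k in DEFAULTS}}
    problems = []
    for name in MIN_ONE:
        if effective[name] < 1:
            problems.append(f"{name} must be at least 1")
    if effective["initialDelaySeconds"] < 0:
        problems.append("initialDelaySeconds must not be negative")
    if kind in ("liveness", "startup") and effective["successThreshold"] != 1:
        problems.append(f"successThreshold must be 1 for {kind} probes")
    return effective, problems


def time_to_act(effective: Dict[str, int]) -> int:
    """Approximate seconds of continuous failure before the probe's action fires."""
    return effective["failureThreshold"] * effective["periodSeconds"]


if __name__ == "__main__":
    eff, errs = check_probe("liveness", {"periodSeconds": 15, "timeoutSeconds": 5})
    print(time_to_act(eff), errs)  # 45 []
    print(check_probe("startup", {"successThreshold": 2})[1])
    # ['successThreshold must be 1 for startup probes']
```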

When to Use Each

Startup Probe

Use when: the app can take longer to initialise than liveness would tolerate (as a rule of thumb, more than about 30s). Gives the app time to start without triggering liveness kills.

Liveness Probe

Use when: the app can deadlock or enter unrecoverable error states. If a startup probe is configured, liveness only starts after it passes.

Readiness Probe

Use for every container that receives traffic through a Service. Controls traffic routing and rolling update pacing. Should check that the app can actually serve requests.

Startup runs first, then liveness and readiness run side by side: three probes, three distinct responsibilities

Common Mistakes

Using liveness for transient failures. If your liveness probe checks a database connection and the database has a brief hiccup, all your pods restart — turning a 5-second outage into a rolling restart storm. Liveness should only check if the container's own process is functional, never external dependencies.

No startup probe on slow-starting apps. Without a startup probe, you have to set a large initialDelaySeconds on liveness, and that delay applies on every container start. After a crash, liveness stays idle for the full delay, so a deadlock during recovery goes undetected.

Setting timeoutSeconds: 1 on a database readiness check. The default 1-second timeout is aggressive for external checks. If the check sometimes takes 1.5s under load, it counts as a failure. Set timeoutSeconds to something realistic for the check you're running.
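A quick way to see this effect is to replay observed check latencies against two timeoutSeconds values. The latencies below are made-up illustrative numbers and the helper name is hypothetical; the model assumes each check that runs longer than timeoutSeconds counts as one readiness failure, with the default failureThreshold of 3.

```python
from typing import List


def removed_from_endpoints(latencies: List[float], timeout_seconds: float,
                           failure_threshold: int = 3) -> bool:
    """True if timeouts alone produce failure_threshold consecutive failures."""
    streak = 0
    for latency in latencies:
        if latency > timeout_seconds:   # check did not answer in time
            streak += 1
            if streak >= failure_threshold:
                return True
        else:
            streak = 0                  # a timely answer resets the streak
    return False


if __name__ == "__main__":
    # Hypothetical latencies (seconds) of a DB-backed /ready check under load.
    under_load = [0.4, 1.2, 1.5, 1.6, 0.3]
    print(removed_from_endpoints(under_load, timeout_seconds=1))  # True
    print(removed_from_endpoints(under_load, timeout_seconds=3))  # False
```

The app behaves identically in both runs; only the timeout changes whether the pod is pulled from the Service.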

No readiness probe on newly deployed pods. Without a readiness probe, a pod starts receiving traffic the moment its container process starts — before the app has connected to the database, loaded config, or finished any startup work. This causes request errors during every deploy.

Full probe configuration — production template

containers:
- name: api
  image: myapp:1.0
  startupProbe:
    httpGet:
      path: /healthz
      port: 8080
    failureThreshold: 30
    periodSeconds: 10          # 5 min total to start
  livenessProbe:
    httpGet:
      path: /healthz
      port: 8080
    periodSeconds: 15
    timeoutSeconds: 5
    failureThreshold: 3        # restart after 45s of liveness failure
  readinessProbe:
    httpGet:
      path: /ready
      port: 8080
    periodSeconds: 5
    timeoutSeconds: 3
    failureThreshold: 3
    successThreshold: 1