StatefulSets for Stateful Apps

A StatefulSet manages pods that require stable, persistent identity. Unlike a Deployment where pods are interchangeable, each StatefulSet pod has a fixed name (mysql-0, mysql-1, …), its own persistent storage, and a predictable DNS hostname. This makes StatefulSets the right tool for databases, message brokers, distributed coordination systems, and any application where "which pod" matters.

Why StatefulSets?

Most Kubernetes primitives treat pods as cattle — interchangeable, disposable, replaceable. This works great for stateless services. But stateful applications break that model:

A database replica needs to know which instance is the primary and which is a replica: identity matters.
A Kafka broker stores partitions on local disk: storage must follow the pod, not be shared.
A ZooKeeper server must be reachable at a predictable hostname: other servers reference it by name.

StatefulSets provide three guarantees that Deployments cannot:

Stable pod names: <name>-0, <name>-1, … that survive restarts and rescheduling.
Ordered operations: pods are created in ordinal order (0, then 1, then 2) and removed in reverse order (2, then 1, then 0).
Per-pod persistent volumes: each pod gets its own PVC that is never shared or reused by another pod.

StatefulSet vs Deployment

Property	Deployment	StatefulSet
Pod names	Random suffix (`nginx-7b9f5d6-xkz4q`)	Ordinal index (`mysql-0`, `mysql-1`)
Pod identity	Interchangeable	Unique and stable
Storage	Shared volumes or ephemeral	Per-pod PVC via volumeClaimTemplates
Scaling order	Parallel (any order)	Sequential (0→1→2 up; 2→1→0 down)
DNS	One Service name, load-balanced across pods	Stable per-pod DNS via headless Service
Use case	Stateless web/API servers	Databases, brokers, distributed systems

StatefulSet YAML

statefulset.yaml

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql          # must match the headless Service name
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:8.0
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: root-password
        ports:
        - containerPort: 3306
          name: mysql
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
        resources:
          requests:
            memory: "512Mi"
            cpu: "500m"
          limits:
            memory: "1Gi"
            cpu: "1000m"
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: standard
      resources:
        requests:
          storage: 10Gi

The volumeClaimTemplates field is unique to StatefulSets. When the StatefulSet creates pod mysql-0, it also creates a PVC named data-mysql-0. Pod mysql-1 gets data-mysql-1, and so on. By default, these PVCs are not deleted when you scale down or delete the StatefulSet, which protects your data.

Stable Network Identity

Each pod in a StatefulSet gets a stable identity that survives rescheduling. The pod name is always <statefulset-name>-<ordinal>. If pod mysql-1 crashes and Kubernetes recreates it on a different node, it still comes back as mysql-1 — with the same PVC attached.

Combined with a headless Service (see below), each pod gets a stable DNS hostname:

# Pod DNS pattern:
# <pod-name>.<service-name>.<namespace>.svc.cluster.local

mysql-0.mysql.default.svc.cluster.local
mysql-1.mysql.default.svc.cluster.local
mysql-2.mysql.default.svc.cluster.local

This means application code and other services can address specific pods by name — critical for primary/replica topologies where only mysql-0 is the write primary.
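
One way to confirm that these names resolve is a throwaway pod that runs a lookup from inside the cluster. The manifest below is a minimal sketch rather than part of the setup described here: the pod name dns-test is hypothetical, the busybox:1.28 image is an assumption (any image that ships nslookup should do), and it presumes the StatefulSet and its headless Service run in the default namespace.

```yaml
# Hypothetical one-off pod for checking per-pod DNS; delete it afterwards.
apiVersion: v1
kind: Pod
metadata:
  name: dns-test              # illustrative name, not from the original setup
spec:
  restartPolicy: Never        # run the lookup once, then stop
  containers:
  - name: lookup
    image: busybox:1.28       # assumed image; it only needs nslookup
    command:
    - nslookup
    - mysql-0.mysql.default.svc.cluster.local
```

After applying it, kubectl logs dns-test should show the address of mysql-0, provided the pod is Ready and the headless Service exists.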

Headless Service

A StatefulSet requires a headless Service — a Service with clusterIP: None. Unlike a regular Service that load-balances across pods, a headless Service creates individual DNS A records for each pod, enabling direct pod-to-pod addressing.

headless-service.yaml

apiVersion: v1
kind: Service
metadata:
  name: mysql         # must match StatefulSet spec.serviceName
spec:
  clusterIP: None     # headless — no VIP, direct pod DNS records
  selector:
    app: mysql
  ports:
  - port: 3306
    name: mysql

Tip: Two Services for StatefulSets

You typically need two Services: the headless Service (for pod-to-pod addressing) and a regular Service pointing only to the primary pod (for application writes). Apps that need read replicas connect to individual pod DNS names directly.
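
A write Service of this kind can pick out the primary through the statefulset.kubernetes.io/pod-name label, which the StatefulSet controller adds to every pod it creates. The sketch below assumes mysql-0 is the primary; the Service name mysql-primary is hypothetical.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: mysql-primary         # hypothetical name for the write endpoint
spec:
  selector:
    app: mysql
    # Label set automatically by the StatefulSet controller on each pod
    statefulset.kubernetes.io/pod-name: mysql-0
  ports:
  - port: 3306
    name: mysql
```

Because the selector pins ordinal 0, this Service does not follow a failover. If another pod is promoted to primary, the selector has to be changed to match.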

Ordered Pod Operations

StatefulSets are deliberate about ordering — by default, every operation on pods happens one at a time, in order:

Scale up: Pods are created from ordinal 0 upward. Pod N is not started until pod N-1 is Running and Ready.
Scale down: Pods are terminated in reverse ordinal order (highest first). Pod N must be fully terminated before pod N-1 is deleted.
Rolling update: Pods are updated from the highest ordinal down to 0.

Scale up from 0 → 3 replicas
mysql-0: created first; must be Ready before mysql-1 starts.
mysql-1: created second; must be Ready before mysql-2 starts.
mysql-2: created last.

Scale down from 3 → 0 replicas (reverse order)
mysql-2: terminated first.
mysql-1: terminated after mysql-2 is gone.
mysql-0: terminated last.

Ordered operations protect data integrity: the primary (ordinal 0) is always the last to go down.

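If you need parallel operations (accepting the risk), set spec.podManagementPolicy: Parallel. The controller then creates or deletes pods all at once instead of waiting for each one to become Ready or to terminate. This speeds up scale operations at the cost of the ordering guarantees; it affects scaling only, and rolling updates still proceed one pod at a time.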
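
As a sketch, the field sits at the top level of the StatefulSet spec. The fragment reuses the mysql example and omits the unchanged parts:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql
  podManagementPolicy: Parallel   # default is OrderedReady
  replicas: 3
  # selector, template, and volumeClaimTemplates as in statefulset.yaml
```

podManagementPolicy cannot be changed on an existing StatefulSet, so it has to be chosen when the StatefulSet is created.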

Volume Claim Templates

volumeClaimTemplates is the mechanism that gives each pod its own PVC. When a pod is created, Kubernetes automatically provisions a PVC named <template-name>-<pod-name>.

Warning: PVCs are NOT deleted when you scale down

If you scale a StatefulSet from 3 to 1, pods mysql-1 and mysql-2 are deleted, but by default their PVCs (data-mysql-1 and data-mysql-2) remain. This is intentional data protection: if you scale back up, pod mysql-1 reattaches to data-mysql-1 and its data is still there. To reclaim the storage, you must delete the PVCs manually.

# List PVCs created by a StatefulSet
kubectl get pvc -l app=mysql

# Delete PVCs manually when no longer needed
kubectl delete pvc data-mysql-1 data-mysql-2
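
If manual cleanup is not wanted, newer Kubernetes releases offer spec.persistentVolumeClaimRetentionPolicy to change the default. The fragment below is a sketch that assumes your cluster version supports the field (older releases do not); the chosen values are only an illustration.

```yaml
spec:
  persistentVolumeClaimRetentionPolicy:
    whenScaled: Delete     # delete the PVC of a pod removed by a scale-down
    whenDeleted: Retain    # keep PVCs when the StatefulSet itself is deleted (the default)
```

With whenScaled: Delete, scaling from 3 to 1 would also delete data-mysql-1 and data-mysql-2, so a later scale-up starts those pods on empty volumes.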

Update Strategies

StatefulSets support two update strategies via spec.updateStrategy.type:

Strategy	Behaviour
RollingUpdate (default)	Updates pods from the highest ordinal down to 0. Each pod must be Ready before the next is updated.
OnDelete	Pods are only updated when you manually delete them. Gives full control over the update sequence — useful for complex failover choreography.

RollingUpdate also supports partition — only pods with ordinal ≥ partition value are updated. This enables canary-style staged rollouts of StatefulSets:

# Only update pods with ordinal >= 2 (i.e., mysql-2 and above)
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 2
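
Once the canary pod looks healthy, lowering the partition widens the rollout. As a sketch, setting it back to 0 (the default) lets the controller update the remaining pods in the usual highest-to-lowest order:

```yaml
# Update all pods (ordinal >= 0), which finishes the rollout
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 0
```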

Common Pitfalls

Using a Deployment for a database. This is the most common mistake. All replicas of a Deployment use the same pod template, so any PVC it references is shared between replicas (which a database cannot safely do), and the randomly named pods give you no stable hostnames for replication setup. Use a StatefulSet whenever each replica needs its own storage and identity.

Forgetting the headless Service. A StatefulSet without a headless Service has no per-pod DNS records. Pods can't address each other by stable hostname. Always create the headless Service first.

Assuming PVCs are cleaned up. Scaling down orphans PVCs. Budget storage accordingly and script PVC cleanup if needed.

Not setting a readiness probe. Without a readiness probe, a pod counts as Ready as soon as its containers have started, so Kubernetes may start pod N+1 before pod N can actually serve traffic. For databases, the readiness probe should check that the DB is accepting connections, not just that the process is running.
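
For the MySQL example, a probe along these lines could be added to the mysql container in the pod template. It is a sketch under assumptions: it relies on mysqladmin being present in the image and on the MYSQL_ROOT_PASSWORD variable from the manifest, and the timing values are illustrative rather than tuned.

```yaml
containers:
- name: mysql
  image: mysql:8.0
  readinessProbe:            # assumed addition, not in the original manifest
    exec:
      command:
      - sh
      - -c
      # Succeeds only once the server answers over TCP on localhost
      - mysqladmin ping -h 127.0.0.1 -uroot -p"$MYSQL_ROOT_PASSWORD"
    initialDelaySeconds: 10  # illustrative values
    periodSeconds: 5
    timeoutSeconds: 2
```

A ping only shows that the server accepts connections. A replica that must also have caught up with the primary would need a stricter check.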

# Check StatefulSet status
kubectl get statefulset mysql
kubectl describe statefulset mysql

# Watch pods come up in order
kubectl get pods -l app=mysql -w

# Check per-pod PVCs
kubectl get pvc -l app=mysql