StatefulSets for Stateful Apps
A StatefulSet manages pods that require stable, persistent identity. Unlike a Deployment where pods are interchangeable, each StatefulSet pod has a fixed name (mysql-0, mysql-1, …), its own persistent storage, and a predictable DNS hostname. This makes StatefulSets the right tool for databases, message brokers, distributed coordination systems, and any application where "which pod" matters.
Why StatefulSets?
Most Kubernetes primitives treat pods as cattle — interchangeable, disposable, replaceable. This works great for stateless services. But stateful applications break that model:
- A database replica needs to know which node is the primary and which is a replica — identity matters.
- A Kafka broker stores partitions on local disk — storage must follow the pod, not be shared.
- A ZooKeeper node must be reachable at a predictable hostname — other nodes reference it by name.
StatefulSets provide three guarantees that Deployments cannot:
- Stable pod names: <name>-0, <name>-1, … that survive restarts and rescheduling.
- Ordered operations: pods are created in ascending ordinal order (0, then 1, then 2) and are terminated or updated in descending order.
- Per-pod persistent volumes: each pod gets its own PVC that is never shared or reused by another pod.
StatefulSet vs Deployment
| Property | Deployment | StatefulSet |
|---|---|---|
| Pod names | Random suffix (nginx-7b9f5d6-xkz4q) | Ordinal index (mysql-0, mysql-1) |
| Pod identity | Interchangeable | Unique and stable |
| Storage | Shared volumes or ephemeral | Per-pod PVC via volumeClaimTemplates |
| Scaling order | Parallel (any order) | Sequential (0→1→2 up; 2→1→0 down) |
| DNS | Service round-robin | Stable per-pod DNS via headless Service |
| Use case | Stateless web/API servers | Databases, brokers, distributed systems |
StatefulSet YAML
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql # must match the headless Service name
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
        - name: mysql
          image: mysql:8.0
          env:
            - name: MYSQL_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mysql-secret
                  key: root-password
          ports:
            - containerPort: 3306
              name: mysql
          volumeMounts:
            - name: data
              mountPath: /var/lib/mysql
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "1Gi"
              cpu: "1000m"
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: standard
        resources:
          requests:
            storage: 10Gi
```
The volumeClaimTemplates field is specific to StatefulSets. When the StatefulSet creates pod mysql-0, it also creates a PVC named data-mysql-0. Pod mysql-1 gets data-mysql-1, and so on. By default, these PVCs are not deleted when you scale down or delete the StatefulSet, which protects your data.
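To make the naming concrete, the claim generated for mysql-0 from the template above would look roughly like the sketch below. The controller creates this object for you, so it is shown only for illustration; the exact metadata on a real cluster may differ.

```yaml
# Illustrative only: created automatically by the StatefulSet controller,
# not something you apply yourself.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-mysql-0 # <template-name>-<pod-name>
  labels:
    app: mysql # assumed to be copied from the StatefulSet selector
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: standard
  resources:
    requests:
      storage: 10Gi
```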
Stable Network Identity
Each pod in a StatefulSet gets a stable identity that survives rescheduling. The pod name is always <statefulset-name>-<ordinal>. If pod mysql-1 crashes and Kubernetes recreates it on a different node, it still comes back as mysql-1 — with the same PVC attached.
Combined with a headless Service (see below), each pod gets a stable DNS hostname:
```text
# Pod DNS pattern:
# <pod-name>.<service-name>.<namespace>.svc.cluster.local
mysql-0.mysql.default.svc.cluster.local
mysql-1.mysql.default.svc.cluster.local
mysql-2.mysql.default.svc.cluster.local
```
This means application code and other services can address specific pods by name — critical for primary/replica topologies where only mysql-0 is the write primary.
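As a minimal sketch of how a client might consume these names, the ConfigMap below routes writes to mysql-0 and reads to the other pods. The ConfigMap name and its keys are hypothetical and not part of any standard. It also assumes the default namespace and that mysql-0 really is the primary in your topology.

```yaml
# Hypothetical client configuration; the name and keys are illustrative.
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-db-config
data:
  # Writes go to the pod assumed to be the primary.
  DB_WRITE_HOST: mysql-0.mysql.default.svc.cluster.local
  # Reads can be spread across the remaining pods.
  DB_READ_HOSTS: mysql-1.mysql.default.svc.cluster.local,mysql-2.mysql.default.svc.cluster.local
  DB_PORT: "3306"
```

An application pod could load these values with envFrom and a configMapRef. Note that hard-coding the primary's hostname does not follow a failover.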
Headless Service
A StatefulSet requires a headless Service — a Service with clusterIP: None. Unlike a regular Service that load-balances across pods, a headless Service creates individual DNS A records for each pod, enabling direct pod-to-pod addressing.
```yaml
apiVersion: v1
kind: Service
metadata:
  name: mysql # must match StatefulSet spec.serviceName
spec:
  clusterIP: None # headless: no VIP, direct per-pod DNS records
  selector:
    app: mysql
  ports:
    - port: 3306
      name: mysql
```
You typically need two Services: the headless Service (for pod-to-pod addressing) and a regular Service pointing only to the primary pod (for application writes). Apps that need read replicas connect to individual pod DNS names directly.
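One possible way to build the second Service is to select on the statefulset.kubernetes.io/pod-name label, which the StatefulSet controller adds to each pod. The sketch below assumes mysql-0 is the primary; the Service name mysql-primary is a hypothetical choice.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: mysql-primary # hypothetical name for the write endpoint
spec:
  selector:
    app: mysql
    # Label set by the StatefulSet controller on each pod it manages.
    statefulset.kubernetes.io/pod-name: mysql-0
  ports:
    - port: 3306
      name: mysql
```

This selector is static. If another pod is promoted to primary, the Service keeps pointing at mysql-0 until you change the selector or have an operator manage it.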
Ordered Pod Operations
StatefulSets are deliberate about ordering — by default, every operation on pods happens one at a time, in order:
- Scale up: Pods are created from ordinal 0 upward. Pod N is not started until pod N-1 is Running and Ready.
- Scale down: Pods are terminated in reverse ordinal order (highest first). Pod N must be fully terminated before pod N-1 is deleted.
- Rolling update: Pods are updated from the highest ordinal down to 0.
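If your application tolerates pods starting and stopping in any order, set spec.podManagementPolicy: Parallel. The controller then launches or terminates all pods at once during scaling instead of waiting for each one to become Ready, which speeds up scale operations at the cost of losing ordering guarantees. This setting affects only scaling; rolling updates still proceed in ordinal order.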
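As a fragment, the setting sits at the top level of the StatefulSet spec. The sketch below reuses the mysql example and omits the fields that stay unchanged.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  podManagementPolicy: Parallel # default is OrderedReady
  serviceName: mysql
  replicas: 3
  # selector, template, and volumeClaimTemplates as in the full manifest
```

Choose the policy when you create the StatefulSet; the API server typically rejects attempts to change this field afterwards.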
Volume Claim Templates
volumeClaimTemplates is the mechanism that gives each pod its own PVC. When a pod is created, Kubernetes automatically provisions a PVC named <template-name>-<pod-name>.
If you scale a StatefulSet from 3 to 1, pods mysql-1 and mysql-2 are deleted — but their PVCs (data-mysql-1 and data-mysql-2) remain. This is intentional data protection. If you scale back up, pod mysql-1 reattaches to data-mysql-1 — its data is still there. To reclaim the storage you must delete the PVCs manually.
```shell
# List PVCs created by a StatefulSet
kubectl get pvc -l app=mysql
# Delete PVCs manually when no longer needed
kubectl delete pvc data-mysql-1 data-mysql-2
```
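Retaining PVCs is the default behavior. Newer Kubernetes releases also expose spec.persistentVolumeClaimRetentionPolicy, which lets the controller delete them for you. The fragment below is a sketch that assumes your cluster version supports the field; check this before relying on it.

```yaml
# Fragment of the StatefulSet manifest; requires a cluster that supports this field.
spec:
  persistentVolumeClaimRetentionPolicy:
    whenScaled: Delete # delete a pod's PVC when a scale-down removes that pod
    whenDeleted: Retain # keep all PVCs if the StatefulSet itself is deleted
```

Both fields default to Retain, which matches the manual-cleanup behavior described above. Setting whenScaled to Delete gives up the "scale back up and find your data" safety net, so use it only when the data can be rebuilt.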
Update Strategies
StatefulSets support two update strategies via spec.updateStrategy.type:
| Strategy | Behaviour |
|---|---|
| RollingUpdate (default) | Updates pods from the highest ordinal down to 0. Each pod must be Ready before the next is updated. |
| OnDelete | Pods are only updated when you manually delete them. Gives full control over the update sequence — useful for complex failover choreography. |
RollingUpdate also supports partition — only pods with ordinal ≥ partition value are updated. This enables canary-style staged rollouts of StatefulSets:
```yaml
# Only update pods with ordinal >= 2 (i.e., mysql-2 and above)
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 2
```
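For comparison, the OnDelete strategy from the table is configured in the same place. This is a minimal fragment, not a complete manifest.

```yaml
spec:
  updateStrategy:
    type: OnDelete # pods pick up the new template only after you delete them
```

With this setting, changing the pod template does nothing by itself. You then delete pods in whatever order suits your failover plan (for example, replicas first and the primary last), and each recreated pod comes back with the updated template.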
Common Pitfalls
Using a Deployment for a database. This is the most common mistake. Every replica of a Deployment references the same PVC, so replicas end up sharing one volume (which a database cannot safely do), and pod names change whenever a pod is recreated, so there are no stable hostnames for replication setup. Use a StatefulSet for any replicated workload that needs per-pod storage or a stable identity.
Forgetting the headless Service. A StatefulSet without a headless Service has no per-pod DNS records. Pods can't address each other by stable hostname. Always create the headless Service first.
Assuming PVCs are cleaned up. Scaling down orphans PVCs. Budget storage accordingly and script PVC cleanup if needed.
Not setting a readiness probe. Without a readiness probe, a pod counts as Ready as soon as its containers have started, so the controller may start pod N+1 before the database in pod N can serve traffic. For databases, the readiness probe should check that the DB is accepting connections, not just that the process is running.
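A readiness probe for the mysql container might look like the sketch below. It assumes the mysqladmin client is available in the image, and the timing values are placeholders to tune for your workload.

```yaml
# Fragment of spec.template.spec in the StatefulSet
containers:
  - name: mysql
    image: mysql:8.0
    readinessProbe:
      exec:
        # Succeeds once the server answers on TCP. Per the MySQL docs,
        # "ping" exits 0 even when the server rejects the credentials.
        command: ["mysqladmin", "ping", "-h", "127.0.0.1"]
      initialDelaySeconds: 10 # placeholder values; tune for your database
      periodSeconds: 5
      timeoutSeconds: 2
      failureThreshold: 3
```

This only confirms that the server accepts connections. A stricter check, such as running a query or verifying replication status, needs credentials and is left out of this sketch.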
```shell
# Check StatefulSet status
kubectl get statefulset mysql
kubectl describe statefulset mysql
# Watch pods come up in order
kubectl get pods -l app=mysql -w
# Check per-pod PVCs
kubectl get pvc -l app=mysql
```