Storage Patterns for Stateful Apps

● Advanced ⏱ 20 min read

Running stateless apps on Kubernetes is straightforward. Running stateful workloads — databases, message queues, distributed caches — requires careful thought about storage identity, replication, availability, and recovery. This guide covers the patterns that actually work in production, and the cases where "just use managed" is the right answer.

StatefulSet + PVC Templates

A StatefulSet's volumeClaimTemplates field automatically provisions a dedicated PVC for each pod, named <template-name>-<pod-name>. Unlike Deployment pods that all share one PVC, StatefulSet pods each own private storage — essential for databases where each replica has its own data directory.

statefulset-with-pvc.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: production
spec:
  serviceName: postgres          # headless Service for stable DNS
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:16
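        env:
        - name: PGDATA
          value: /var/lib/postgresql/data/pgdata
        - name: POSTGRES_PASSWORD      # the official image refuses to initialize without it
          valueFrom:
            secretKeyRef:
              name: postgres-credentials   # hypothetical Secret; create it separately
              key: password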
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:          # one PVC created per pod
  - metadata:
      name: data
    spec:
      accessModes: [ReadWriteOnce]
      storageClassName: aws-gp3-retain   # use Retain policy for prod data!
      resources:
        requests:
          storage: 100Gi
Figure: StatefulSet with per-pod PVCs and stable identity.
postgres-0 (primary): PVC data-postgres-0, 100 GiB, EBS vol-aaa
postgres-1 (replica): PVC data-postgres-1, 100 GiB, EBS vol-bbb
postgres-2 (replica): PVC data-postgres-2, 100 GiB, EBS vol-ccc
PVCs survive pod deletion: if postgres-0 crashes and restarts, it binds to the same data-postgres-0 PVC.
Each StatefulSet pod owns its PVC. Pod identity (ordinal) stays stable across restarts, so the pod always reattaches to the same volume.
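
The serviceName: postgres field in the manifest refers to a headless Service that is not shown. A minimal sketch of what it could look like follows; the port name and number are assumptions based on the PostgreSQL default.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: postgres             # must match spec.serviceName in the StatefulSet
  namespace: production
spec:
  clusterIP: None            # headless: DNS resolves to individual pod IPs
  selector:
    app: postgres
  ports:
  - name: postgres           # assumed port name
    port: 5432               # PostgreSQL default port
```

With a Service like this in place, each pod gets a stable DNS name of the form postgres-0.postgres.production.svc.cluster.local.
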
⚠️
Deleting a StatefulSet does not delete its PVCs

PVCs created by volumeClaimTemplates are intentionally left in place when you delete the StatefulSet, so Kubernetes won't delete your database along with it. By default, you must delete the PVCs manually after confirming the data is no longer needed. To remove everything cleanly: scale to 0 first, then delete the StatefulSet, then delete the PVCs.
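
Newer Kubernetes releases also offer an opt-in alternative through the StatefulSet field persistentVolumeClaimRetentionPolicy. The fragment below is a sketch for a disposable environment, assuming your cluster version supports the field. Both settings default to Retain, which is the behavior described above.

```yaml
# Fragment of a StatefulSet spec (not a complete manifest).
spec:
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Delete   # remove PVCs when the StatefulSet itself is deleted
    whenScaled: Retain    # keep PVCs of pods removed by a scale-down
```

Whether the underlying volume is also removed once a PVC is deleted still depends on the reclaim policy of the PersistentVolume, so a Retain StorageClass keeps the data either way.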

RWO Limitations

Most cloud block storage (EBS, GCP PD, Azure Disk) supports only ReadWriteOnce: the volume can be attached to one node at a time. This constrains StatefulSets: if a node fails, the replacement pod cannot start until the volume has been detached from the old node. When the old node cannot confirm the unmount, Kubernetes waits about 6 minutes before force-detaching, so together with failure detection and pod eviction, recovery typically takes 6–10 minutes.

Strategies to reduce RWO attachment delays:
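
One option that is commonly used, offered here as an illustration rather than a reconstruction of the original list, is to shorten how long a pod tolerates a failed node before eviction starts. The sketch below assumes the default toleration of 300 seconds is in effect; the 60-second value is an arbitrary example.

```yaml
# Fragment of the pod template (spec.template.spec of the StatefulSet).
tolerations:
- key: node.kubernetes.io/unreachable
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 60    # example value; the usual default is 300
- key: node.kubernetes.io/not-ready
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 60
```

This only shortens the eviction trigger. On a node that is unreachable but still registered, a StatefulSet pod can remain in Terminating until the node is confirmed gone, so treat this as one part of a failover plan rather than a complete fix.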

Data Replication Patterns

Where replication lives depends on the technology stack:

| Pattern | Who replicates | Example | Trade-off |
| --- | --- | --- | --- |
| Application-level replication | The database itself | PostgreSQL streaming replication, Redis Sentinel, Kafka ISR | Battle-tested; storage can be plain RWO blocks |
| Storage-level replication | The distributed storage layer | Ceph/Rook, Longhorn, Portworx | Works for any app; adds storage cluster overhead |
| Cloud-native snapshots | Cloud provider | EBS snapshots, GCP disk snapshots | Point-in-time; not real-time HA |
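
Application-level replication only protects you if the replicas do not share a failure domain. A hedged sketch of spreading the StatefulSet pods across availability zones follows, assuming nodes carry the standard topology.kubernetes.io/zone label and the app: postgres label from the earlier manifest.

```yaml
# Fragment of the pod template (spec.template.spec of the StatefulSet).
topologySpreadConstraints:
- maxSkew: 1                                 # zones may differ by at most one pod
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: DoNotSchedule           # keep the pod Pending rather than co-locate
  labelSelector:
    matchLabels:
      app: postgres
```

Because RWO block volumes are usually zonal, a pod stays tied to the zone of its volume once the PVC is bound, so the spread matters most at first scheduling.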

Volume Expansion

You can grow (but not shrink) PVCs on a StorageClass with allowVolumeExpansion: true. For StatefulSets, volumeClaimTemplates is immutable once the object exists, so you can't change the size there directly. Instead, patch the PVCs individually, then update the StatefulSet template so that pods created later request the new size.

# Expand all data PVCs in a StatefulSet with 3 replicas
for i in 0 1 2; do
  kubectl patch pvc data-postgres-$i -n production \
    -p '{"spec":{"resources":{"requests":{"storage":"200Gi"}}}}'
done

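# Update the volumeClaimTemplate so future pods (scale-up) also get 200Gi.
# The field is immutable, so delete the StatefulSet object while orphaning its
# pods and PVCs, then re-create it from the manifest edited to request 200Gi.
kubectl delete statefulset postgres -n production --cascade=orphan
kubectl apply -f statefulset-with-pvc.yaml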
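
Expansion only works if the StorageClass behind the PVCs permits it. Below is a sketch of what the aws-gp3-retain class referenced earlier might look like, assuming the AWS EBS CSI driver (ebs.csi.aws.com) is installed. The parameters are illustrative, not taken from the original cluster.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: aws-gp3-retain
provisioner: ebs.csi.aws.com              # assumes the EBS CSI driver
parameters:
  type: gp3
reclaimPolicy: Retain                     # keep the EBS volume when the PVC is deleted
allowVolumeExpansion: true                # required for the PVC patch to succeed
volumeBindingMode: WaitForFirstConsumer   # create the volume in the pod's zone
```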

Volume Snapshots

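Kubernetes volume snapshots (GA in 1.20) let you take point-in-time copies of a PVC using the storage provider's snapshot mechanism. They require a CSI driver that supports snapshots, plus the snapshot CRDs and snapshot controller installed in the cluster. Kubernetes itself does not quiesce the application: a snapshot is crash-consistent by default, and making it application-consistent (for example, by flushing or pausing writes first) is up to the app.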

volume-snapshot.yaml
# Create a snapshot
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: postgres-snap-2024-01
  namespace: production
spec:
  volumeSnapshotClassName: csi-aws-vsc   # VolumeSnapshotClass for EBS
  source:
    persistentVolumeClaimName: data-postgres-0
---
# Restore from snapshot — create a new PVC pre-populated with snapshot data
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-restored
  namespace: production
spec:
  accessModes: [ReadWriteOnce]
  storageClassName: aws-gp3
  resources:
    requests:
      storage: 100Gi
  dataSource:
    name: postgres-snap-2024-01
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
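
The VolumeSnapshot above refers to a VolumeSnapshotClass named csi-aws-vsc that is not defined in this guide. A minimal sketch follows, assuming the AWS EBS CSI driver; the deletionPolicy shown is one reasonable choice, not a requirement.

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-aws-vsc
driver: ebs.csi.aws.com    # must match the CSI driver that provisioned the PVC
deletionPolicy: Retain     # keep the EBS snapshot if the VolumeSnapshot object is deleted
```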

Backup with Velero

Velero is a widely used open-source backup tool for Kubernetes. It backs up Kubernetes resource manifests (to object storage) and optionally volume data (via filesystem backup or volume snapshots).

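# Install Velero with AWS backend
# (./credentials-velero is a placeholder path to your AWS credentials file)
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.10.0 \
  --bucket my-velero-backups \
  --secret-file ./credentials-velero \
  --backup-location-config region=us-east-1 \
  --snapshot-location-config region=us-east-1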

# Create a backup of the production namespace (resources + volume snapshots)
velero backup create prod-backup-$(date +%F) \
  --include-namespaces production \
  --snapshot-volumes

# Restore from a backup
velero restore create --from-backup prod-backup-2024-01-15

# Schedule daily backups at 02:00
velero schedule create daily-prod \
  --schedule="0 2 * * *" \
  --include-namespaces production \
  --ttl 720h   # keep for 30 days
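
If you manage cluster configuration declaratively, the same schedule can be expressed as a Velero Schedule resource instead of a CLI call. This is a sketch, assuming Velero is installed in its default velero namespace.

```yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-prod
  namespace: velero          # assumes the default Velero install namespace
spec:
  schedule: "0 2 * * *"      # daily at 02:00
  template:
    includedNamespaces:
    - production
    snapshotVolumes: true
    ttl: 720h0m0s            # keep for 30 days
```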

Managed vs Self-Hosted

Before running a database on Kubernetes, honestly assess the operational cost:

| Aspect | Self-hosted on K8s | Managed service |
| --- | --- | --- |
| HA setup | You configure replication, failover, and fencing | Automatic |
| Upgrades | You manage rolling upgrades across replicas | Automatic or one-click |
| Backups | You run Velero or custom jobs, test restores | Automatic, point-in-time restore included |
| Encryption | You configure TLS and etcd Secret encryption | Automatic, usually with KMS integration |
| Cost | Cluster compute; no license fee | Service premium (typically 2–3× raw compute) |
| Expertise needed | Deep DB + Kubernetes knowledge required | SQL/API only |
💡
When to self-host on Kubernetes

Self-hosting makes sense when: you need to run in a private network without cloud egress, you're already using a Kubernetes operator (Zalando postgres-operator, Strimzi Kafka, CloudNativePG) that handles HA and upgrades, or cost at scale makes the managed service premium prohibitive. For most teams starting out, managed databases buy back engineering time that compounds over years.
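
To give a sense of how much an operator takes over, here is a minimal sketch of a CloudNativePG cluster, assuming the operator and its CRDs are already installed. The cluster name is hypothetical, and the StorageClass is the one used earlier in this guide.

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-cluster           # hypothetical name
  namespace: production
spec:
  instances: 3               # one primary plus two replicas, managed by the operator
  storage:
    size: 100Gi
    storageClass: aws-gp3-retain
```

The operator creates the pods and PVCs itself and handles replication and failover, so a hand-written StatefulSet is not needed in this setup.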

kubectl Commands

# List all PVCs created by a StatefulSet's volumeClaimTemplates
kubectl get pvc -n production -l app=postgres

# Check which node a StatefulSet pod is running on (for AZ awareness)
kubectl get pods -n production -o wide -l app=postgres

# Describe a PVC to see capacity, StorageClass, bound PV
kubectl describe pvc data-postgres-0 -n production

# Scale a StatefulSet down (PVCs are preserved)
kubectl scale statefulset postgres --replicas=0 -n production

# View volume attachment status for a node
kubectl get volumeattachment | grep node-1
kubectl get node node-1 -o jsonpath='{.status.volumesAttached[*].name}'

# Force-detach a stuck RWO volume (node offline) — use with care
kubectl delete volumeattachment <attachment-name>

# List VolumeSnapshots
kubectl get volumesnapshot -n production