Multi-Cluster Patterns
A single cluster is simpler to operate. Multiple clusters add complexity — more control planes, more kubeconfigs, cross-cluster networking to solve. But beyond a certain scale or compliance requirement, multiple clusters become necessary. This guide covers when to split, the common patterns, and the tooling that makes fleet management tractable.
Why Multiple Clusters
| Driver | Why a separate cluster |
|---|---|
| Environment isolation | Dev, staging, and prod run on separate clusters, so a misconfigured RBAC rule in dev cannot touch prod. |
| Blast radius | A cluster-level incident (etcd corruption, bad upgrade) affects only that cluster's workloads. |
| Compliance / data residency | GDPR or regulatory requirements mandate workloads and data stay in a specific region or jurisdiction. |
| Multi-region availability | Run active clusters in us-east-1 and eu-west-1; route traffic to the healthy region during outages. |
| Team autonomy | Large orgs give individual teams their own cluster to avoid contention on a shared control plane. |
| Specialised hardware | GPU clusters for ML teams; ARM clusters for cost savings; FIPS clusters for defence workloads. |
Common Topologies
Figure: Multi-cluster topologies, hub-and-spoke vs. active-active.
Hub-and-Spoke (GitOps): a management cluster running ArgoCD, Flux, or Rancher deploys to the member clusters (prod-us, prod-eu, staging) via GitOps. It gives a single pane for deploys; the management cluster pushes manifests to the member clusters.
Active-Active (Multi-Region): a global load balancer (AWS Route 53 or Cloudflare) fronts independent clusters in us-east-1 and eu-west-1. Both regions serve traffic for HA and low-latency routing, and health-check failover routes to the healthy region if one goes down.
Fleet Management
| Tool | Model | Best for |
|---|---|---|
| Cluster API (CAPI) | Declarative cluster lifecycle — create, upgrade, delete clusters via Kubernetes CRDs. | Homogeneous self-managed fleets. Infrastructure-as-code for clusters. |
| ArgoCD ApplicationSet | One ApplicationSet generates an ArgoCD Application per cluster from a template. | GitOps deployment to many clusters simultaneously. |
| Flux + Cluster API | Flux manages manifests; CAPI manages the clusters themselves. | Full lifecycle GitOps — both cluster and app config in git. |
| Rancher / Rancher MCM | UI + API for multi-cluster management, policies, and monitoring. | Ops teams needing a UI; mixed cloud environments. |
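As a rough illustration of the Cluster API row above, the `clusterctl` CLI can render the manifests for a new workload cluster, which can then be committed to git. This is a sketch under assumptions: a management cluster already initialised with `clusterctl init` and an infrastructure provider, the provider's required environment variables set, and an illustrative cluster name, version, and machine counts.

```shell
# Render a workload-cluster manifest with clusterctl (the Cluster API CLI).
# $1 = cluster name, $2 = Kubernetes version (both illustrative).
# The machine counts are example values, not a recommendation.
generate_cluster() {
  clusterctl generate cluster "$1" \
    --kubernetes-version "$2" \
    --control-plane-machine-count 3 \
    --worker-machine-count 3
}

# Example (hypothetical output path):
#   generate_cluster prod-eu v1.30.0 > clusters/prod-eu/cluster.yaml
```

The function only prints YAML; nothing is created until that output is applied to the management cluster, for example by a GitOps controller.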
ArgoCD ApplicationSet — deploy to all prod clusters
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: myapp
  namespace: argocd
spec:
  generators:
    - clusters:
        selector:
          matchLabels:
            environment: production   # targets all clusters labelled environment=production
  template:
    metadata:
      name: "myapp-{{name}}"          # {{name}} = cluster name
    spec:
      project: default
      source:
        repoURL: https://github.com/myorg/gitops
        targetRevision: main
        path: apps/myapp/overlays/production
      destination:
        server: "{{server}}"          # {{server}} = cluster API endpoint
        namespace: myapp
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
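The cluster generator matches labels on the Secrets that ArgoCD uses to store registered clusters, so a cluster is only targeted once its Secret carries `environment=production`. The helpers below sketch how to inspect and label those Secrets; the Secret name passed in is hypothetical and has to be looked up first.

```shell
# List the cluster Secrets ArgoCD knows about, together with their labels.
list_argocd_clusters() {
  kubectl get secrets -n argocd \
    -l argocd.argoproj.io/secret-type=cluster --show-labels
}

# Label one cluster Secret so the ApplicationSet selector matches it.
# $1 = Secret name (hypothetical; find it with list_argocd_clusters)
label_prod_cluster() {
  kubectl label secret "$1" -n argocd environment=production --overwrite
}

# Example:
#   list_argocd_clusters
#   label_prod_cluster cluster-prod-us
```

Run these against the cluster where ArgoCD itself is installed, not against the member clusters.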
GitOps Multi-Cluster
A common repo layout for multi-cluster GitOps keeps shared manifests in a base, adds one overlay per cluster or per environment, and holds the ArgoCD/Flux configuration for each cluster in its own folder.
gitops-repo/
├── apps/
│   └── myapp/
│       ├── base/                      ← shared manifests
│       │   ├── deployment.yaml
│       │   └── service.yaml
│       └── overlays/
│           ├── production-us/
│           │   └── kustomization.yaml ← us-specific patches (replicas, resources)
│           └── production-eu/
│               └── kustomization.yaml ← eu-specific patches
└── clusters/
    ├── prod-us/                       ← ArgoCD/Flux cluster config
    └── prod-eu/
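To make the overlay idea concrete, the helper below writes a minimal per-cluster `kustomization.yaml` that reuses the base and overrides only the replica count. It is a sketch under assumptions not stated in the tree above: the base directory contains its own `kustomization.yaml` listing the two manifests, the Deployment is named `myapp`, and the replica count is an example value.

```shell
# Write a minimal Kustomize overlay for one cluster.
# $1 = overlay directory, $2 = replica count (both illustrative)
write_overlay() {
  mkdir -p "$1"
  cat > "$1/kustomization.yaml" <<EOF
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base          # shared manifests
replicas:
  - name: myapp         # assumed Deployment name
    count: $2
EOF
}

# Example:
#   write_overlay apps/myapp/overlays/production-us 6
# Render the result without applying it:
#   kubectl kustomize apps/myapp/overlays/production-us
```

Keeping each overlay this small means a diff between `production-us` and `production-eu` shows exactly how the two clusters differ.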
Cross-Cluster Service Discovery
Kubernetes DNS (svc.cluster.local) is cluster-scoped — it cannot resolve services in other clusters. Options for cross-cluster service communication:
| Approach | How it works |
|---|---|
| External DNS + LoadBalancer | Each cluster exposes services via LoadBalancer; DNS resolves to the other cluster's LB IP. Simple but adds latency and LB cost. |
| Submariner | Connects cluster networks through encrypted tunnels (IPsec by default) and extends Service discovery across clusters. Exported services resolve from every connected cluster under svc.clusterset.local. |
| Istio multi-cluster | East-west gateway between clusters. Service mesh spans multiple clusters; mTLS transparent to apps. |
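| Multi-Cluster Services (MCS) API: ServiceExport/ServiceImport | Kubernetes SIG Multicluster API. Export a Service from one cluster and import it into the others, where it resolves under svc.clusterset.local. Requires an implementation such as Submariner. |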
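Exporting a Service through the SIG Multicluster API comes down to creating a ServiceExport with the same name and namespace as the Service. The sketch below prints such a manifest. It assumes an implementation of the API is already installed in the clusters, and it uses the `v1alpha1` API group and the `myapp` names purely for illustration.

```shell
# Print a ServiceExport for an existing Service (Multi-Cluster Services API).
# $1 = Service name, $2 = namespace
service_export_manifest() {
  cat <<EOF
apiVersion: multicluster.x-k8s.io/v1alpha1
kind: ServiceExport
metadata:
  name: $1
  namespace: $2
EOF
}

# Apply it in the exporting cluster (context name taken from this guide):
#   service_export_manifest myapp myapp | kubectl --context=prod-us apply -f -
# Clients in the other clusters then address the service as:
#   myapp.myapp.svc.clusterset.local
```

The `cluster.local` name keeps resolving only to the local cluster, so callers opt in to cross-cluster traffic by using the `clusterset.local` name.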
Traffic Splitting & Failover
AWS Route 53 — weighted multi-cluster routing
# Route 70% of traffic to the us-east-1 cluster, 30% to eu-west-1
aws route53 change-resource-record-sets --hosted-zone-id ZONE_ID \
  --change-batch '{
    "Changes": [
      {"Action": "UPSERT", "ResourceRecordSet": {
        "Name": "api.example.com", "Type": "A",
        "SetIdentifier": "us-east-1",
        "Weight": 70,
        "AliasTarget": {"DNSName": "us-east-1-alb.amazonaws.com", "EvaluateTargetHealth": true, "HostedZoneId": "Z35SXDOTRQ7X7K"}
      }},
      {"Action": "UPSERT", "ResourceRecordSet": {
        "Name": "api.example.com", "Type": "A",
        "SetIdentifier": "eu-west-1",
        "Weight": 30,
        "AliasTarget": {"DNSName": "eu-west-1-alb.amazonaws.com", "EvaluateTargetHealth": true, "HostedZoneId": "Z32O12XQLNTSW2"}
      }}
    ]
  }'
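Route 53 treats weights as relative: each record receives its weight divided by the sum of all weights for that name, so 70 and 30 happen to read as percentages. Before shifting traffic it helps to check what is currently configured. The helper below is a small sketch for that; the zone ID and record name are placeholders, as in the command above.

```shell
# Show SetIdentifier and Weight for every weighted record of one name.
# $1 = hosted zone ID
# $2 = record name with trailing dot, as Route 53 returns it (e.g. api.example.com.)
list_weights() {
  aws route53 list-resource-record-sets \
    --hosted-zone-id "$1" \
    --query "ResourceRecordSets[?Name=='$2'].[SetIdentifier,Weight]" \
    --output text
}

# Example:
#   list_weights ZONE_ID api.example.com.
```

To drain a region for maintenance, re-run the UPSERT with that region's weight set to 0 while the other record keeps a non-zero weight.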
kubeconfig & Context Management
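# Merge multiple kubeconfigs into one file.
# This replaces any existing ~/.kube/config, so write to a temporary file first
# and only move it into place if kubectl succeeded.
KUBECONFIG=~/.kube/prod-us.yaml:~/.kube/prod-eu.yaml:~/.kube/staging.yaml \
  kubectl config view --flatten > ~/.kube/config.merged \
  && mv ~/.kube/config.merged ~/.kube/config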
# List all contexts
kubectl config get-contexts
# Switch context
kubectl config use-context prod-us
# Run a command against a specific context without switching
kubectl --context=prod-eu get pods -n myapp
# Install kubectx and kubens for fast switching (Homebrew shown)
brew install kubectx
kubectx prod-eu # switch cluster
kubens myapp # switch namespace
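When the same check has to run on every cluster, looping over contexts with `--context` avoids switching the active context at all. A minimal sketch, using the context names from this guide; the function name is made up for illustration.

```shell
# Run the same kubectl command against several contexts in turn.
# $1 = space-separated list of context names; remaining args go to kubectl
for_each_context() {
  contexts=$1
  shift
  for ctx in $contexts; do
    echo "== $ctx =="
    kubectl --context="$ctx" "$@"
  done
}

# Example:
#   for_each_context "prod-us prod-eu staging" get pods -n myapp
```

Because the active context never changes, a later plain `kubectl` command still targets whatever cluster was selected before the loop.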