Resource Requests & Limits
Every pod in Kubernetes competes for node resources. Without guidance, the scheduler places pods blindly — overloading some nodes and starving others. Requests tell the scheduler how much CPU and memory a container needs. Limits cap how much it can consume. Together they control scheduling placement, QoS priority, and whether a container gets throttled or killed when the cluster is under pressure.
Why Resource Management Matters
Skipping resource declarations has predictable consequences:
- The scheduler cannot make informed placement decisions — pods land on already-overloaded nodes.
- A single runaway container can starve every other pod on the node.
- Pods get assigned BestEffort QoS, the first evicted under memory pressure.
- Auto-scaling (HPA) on CPU or memory utilization cannot work, because utilization is calculated as a percentage of each container's request.
Requests vs Limits
These two values serve entirely different purposes:
| | Requests | Limits |
|---|---|---|
| Purpose | Scheduler guarantee — node must have at least this available | Runtime cap — container cannot exceed this |
| CPU behaviour | Minimum CPU share guaranteed under contention | Container is throttled if it tries to use more (no kill) |
| Memory behaviour | Minimum memory the node reserves for this container | Container is OOMKilled if it exceeds this |
| Affects | Scheduling, QoS class, HPA utilization, eviction ranking | cgroup enforcement, QoS class |
CPU Units
CPU is measured in cores or millicores. One core = 1000 millicores (1000m).
resources:
  requests:
    cpu: "250m"   # 0.25 of a core (one quarter); equivalently "0.25"
  limits:
    cpu: "1"      # 1 full core; equivalently "1000m"
CPU is compressible. If a container tries to use more CPU than its limit, it is throttled — the kernel reduces its CPU time. The container keeps running but slows down. This can cause unexpected latency spikes even when the container appears healthy.
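To make the units and the throttling mechanism concrete, here is a minimal Python sketch. The helper names `cpu_to_millicores` and `cfs_quota_us` are hypothetical, and the sketch assumes the default CFS period of 100 ms (100,000 µs); it only handles plain decimal values and the `m` suffix.

```python
from decimal import Decimal

CFS_PERIOD_US = 100_000  # assumption: default CFS period of 100 ms


def cpu_to_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as "250m", "0.25" or "1" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    # Decimal avoids float rounding surprises for values like "0.1".
    return int(Decimal(quantity) * 1000)


def cfs_quota_us(cpu_limit: str, period_us: int = CFS_PERIOD_US) -> int:
    """CPU time in microseconds the container may consume in each period."""
    return cpu_to_millicores(cpu_limit) * period_us // 1000


print(cpu_to_millicores("250m"))  # 250
print(cpu_to_millicores("0.25"))  # 250
print(cfs_quota_us("500m"))       # 50000
```

Under these assumptions a 500m limit allows 50 ms of CPU time per 100 ms period, summed across all threads in the container. Once that budget is spent, the threads wait for the next period, which is where the latency spikes come from.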
Memory Units
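Memory is measured in bytes, written with binary suffixes Ki, Mi, Gi (powers of 1024) or decimal SI suffixes k, M, G (powers of 1000). Note that the decimal kilo suffix is a lowercase k.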
resources:
  requests:
    memory: "128Mi"   # 128 mebibytes (134,217,728 bytes)
  limits:
    memory: "256Mi"   # 256 mebibytes
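The difference between binary and decimal suffixes is easy to overlook, so here is a small Python sketch that converts a memory quantity to bytes. `memory_to_bytes` is a hypothetical helper; it covers only the suffixes named above (with lowercase `k` as the decimal kilo suffix) and assumes whole-number values.

```python
# Multipliers for a subset of the suffixes Kubernetes accepts.
SUFFIXES = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3,   # binary
    "k": 1000, "M": 1000**2, "G": 1000**3,      # decimal
}


def memory_to_bytes(quantity: str) -> int:
    """Convert a quantity such as "128Mi" or "128M" to bytes."""
    # Try the two-character binary suffixes before the one-character ones.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * SUFFIXES[suffix]
    return int(quantity)  # no suffix: plain bytes


print(memory_to_bytes("128Mi"))  # 134217728
print(memory_to_bytes("128M"))   # 128000000
```

Writing `128M` where `128Mi` was intended gives the container roughly 6 MB less than expected.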
Memory is incompressible. If a container exceeds its memory limit, the kernel's OOM killer terminates it and Kubernetes records the termination reason as OOMKilled. Kubernetes then restarts it according to the pod's restartPolicy. A container that repeatedly hits its memory limit shows OOMKilled as its last termination reason in kubectl describe pod.
Exceeding a memory limit kills the container immediately. Exceeding a CPU limit just slows it. This asymmetry means memory limits should be set with headroom (1.5–2× the typical working set), while CPU limits can be set closer to average usage since the penalty is latency, not death.
Setting Resources
spec:
  template:
    spec:
      containers:
        - name: api
          image: myapp:1.0
          resources:
            requests:
              memory: "128Mi"
              cpu: "250m"
            limits:
              memory: "256Mi"
              cpu: "500m"
To find good starting values, run the app under realistic load with no limits, then observe actual usage with:
# Requires metrics-server installed
kubectl top pods
kubectl top pods --containers
# Detailed per-container usage
kubectl top pod <pod-name> --containers
QoS Classes
Kubernetes assigns a Quality of Service class to each pod based on its resource configuration. This determines eviction order when a node runs out of memory.
| Class | Condition | Eviction order |
|---|---|---|
| Guaranteed | Every container has equal requests and limits for both CPU and memory | Last evicted |
| Burstable | At least one container has a CPU or memory request or limit, but the pod does not meet the Guaranteed criteria | Evicted after BestEffort |
| BestEffort | No container has any requests or limits | First evicted |
# Check assigned QoS class
kubectl get pod myapp -o jsonpath='{.status.qosClass}'
# → Guaranteed | Burstable | BestEffort
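The rules in the table can be sketched as a small Python function. This is a simplified model rather than the kubelet's actual code: `qos_class` is a hypothetical name, containers are plain dicts, and values are assumed to be already normalized numbers (millicores, bytes) so they can be compared directly. It relies on the fact that a limit without a request counts as a request equal to that limit.

```python
RESOURCES = ("cpu", "memory")


def qos_class(containers: list[dict]) -> str:
    """Classify a pod from its containers' requests and limits."""
    any_set = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        for res in RESOURCES:
            req, lim = requests.get(res), limits.get(res)
            if req is not None or lim is not None:
                any_set = True
            if lim is None:
                guaranteed = False  # Guaranteed needs a limit for CPU and memory
            elif req is not None and req != lim:
                guaranteed = False  # an explicit request must equal the limit
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


same = {"cpu": 250, "memory": 128}
print(qos_class([{"requests": same, "limits": same}]))  # Guaranteed
print(qos_class([{"requests": {"cpu": 250}}]))          # Burstable
print(qos_class([{}]))                                  # BestEffort
```

Note that a single container without limits is enough to drop the whole pod to Burstable, which is easy to miss with sidecars.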
For production workloads that should be the last to be evicted, set equal requests and limits to achieve Guaranteed QoS. For batch jobs or dev workloads that can tolerate eviction, Burstable is a reasonable trade-off.
LimitRange
A LimitRange is a namespace-scoped policy that sets default requests/limits and enforces min/max bounds. When a pod is created without resource declarations in that namespace, the LimitRange defaults are applied automatically.
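apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
    - type: Container
      default:            # limit applied when a container sets none
        cpu: "500m"
        memory: "256Mi"
      defaultRequest:     # request applied when a container sets none
        cpu: "100m"
        memory: "128Mi"
      max:                # upper bound per container
        cpu: "4"
        memory: "4Gi"
      min:                # lower bound per container
        cpu: "50m"
        memory: "64Mi"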
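How defaulting and bounds checking interact can be sketched in Python. This is a simplified model of the admission behaviour, not the real implementation: `apply_limit_range` is a hypothetical function, and the values mirror the LimitRange above, expressed in millicores and MiB to keep comparisons simple.

```python
LIMIT_RANGE = {
    "default":        {"cpu": 500,  "memory": 256},
    "defaultRequest": {"cpu": 100,  "memory": 128},
    "max":            {"cpu": 4000, "memory": 4096},
    "min":            {"cpu": 50,   "memory": 64},
}


def apply_limit_range(container: dict, lr: dict = LIMIT_RANGE):
    """Return (requests, limits) after defaulting; raise ValueError if rejected."""
    requests = dict(container.get("requests", {}))
    limits = dict(container.get("limits", {}))
    for res in ("cpu", "memory"):
        if res not in requests and res in limits:
            requests[res] = limits[res]  # limit without request: request copies it
        limits.setdefault(res, lr["default"][res])
        requests.setdefault(res, lr["defaultRequest"][res])
        if requests[res] < lr["min"][res]:
            raise ValueError(f"{res} request below min")
        if limits[res] > lr["max"][res]:
            raise ValueError(f"{res} limit above max")
        if requests[res] > limits[res]:
            raise ValueError(f"{res} request exceeds limit")
    return requests, limits


print(apply_limit_range({}))
# ({'cpu': 100, 'memory': 128}, {'cpu': 500, 'memory': 256})
print(apply_limit_range({"limits": {"cpu": 200}}))
# ({'cpu': 200, 'memory': 128}, {'cpu': 200, 'memory': 256})
```

One consequence worth noticing: a container that requests 800m of CPU but sets no limit receives the 500m default limit, and the pod is then rejected because its request exceeds its limit.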
ResourceQuota
A ResourceQuota caps total resource consumption across all pods in a namespace. It is useful in multi-team clusters where you need to prevent one team's workloads from monopolising cluster resources.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "10"
    requests.memory: "20Gi"
    limits.cpu: "20"
    limits.memory: "40Gi"
    pods: "50"
# Check quota usage
kubectl describe resourcequota team-quota -n team-a
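Conceptually, the quota check at pod creation is simple addition: current usage plus the new pod must stay within each hard cap. The Python sketch below models that idea with the team-quota values above, in millicores and MiB. `exceeded_quotas` and the usage figures are hypothetical, chosen only for illustration.

```python
HARD = {
    "requests.cpu": 10_000,        # 10 cores
    "requests.memory": 20 * 1024,  # 20Gi
    "limits.cpu": 20_000,          # 20 cores
    "limits.memory": 40 * 1024,    # 40Gi
    "pods": 50,
}


def exceeded_quotas(used: dict, pod: dict, hard: dict = HARD) -> list[str]:
    """Return the quota keys the new pod would exceed (empty list = admitted)."""
    return [key for key, cap in hard.items()
            if used.get(key, 0) + pod.get(key, 0) > cap]


# Hypothetical current usage in the namespace.
used = {"requests.cpu": 9_800, "requests.memory": 4_096,
        "limits.cpu": 12_000, "limits.memory": 8_192, "pods": 12}
new_pod = {"requests.cpu": 250, "requests.memory": 128,
           "limits.cpu": 500, "limits.memory": 256, "pods": 1}

print(exceeded_quotas(used, new_pod))  # ['requests.cpu']
```

In this example the namespace has plenty of memory and pod headroom, but the pod is still refused because CPU requests would reach 10,050m against a cap of 10,000m.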
Common Mistakes
Setting limits without requests. Kubernetes sets the request equal to the limit in this case — which may dramatically overstate what the scheduler reserves, making it harder to bin-pack pods.
Setting CPU limits too low. Runtimes such as the JVM, Python, or Node.js can spike CPU on startup. A limit of 200m may leave the app throttled at boot, appearing hung. Start with a generous limit and tune down with data.
Not setting memory limits at all. A memory leak will exhaust the node and trigger evictions of unrelated pods. Always set memory limits, even if generous.
Setting requests equal to peak usage. Requests should reflect typical usage, not peak. If you set requests to peak, the scheduler reserves that much permanently, and the node appears full even when pods are mostly idle.
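The cost of peak-sized requests shows up as simple division. The Python sketch below uses hypothetical numbers (a node with 4000m of allocatable CPU, a pod that peaks at 1000m but typically uses 250m) and considers CPU only.

```python
def pods_per_node(allocatable_m: int, request_m: int) -> int:
    """How many pods the scheduler can place, judged by CPU requests alone."""
    return allocatable_m // request_m


ALLOCATABLE_M = 4000  # hypothetical node: 4 cores allocatable

print(pods_per_node(ALLOCATABLE_M, 1000))  # 4  (requests sized to peak)
print(pods_per_node(ALLOCATABLE_M, 250))   # 16 (requests sized to typical usage)
```

With peak-sized requests the node is "full" at 4 pods even though their combined typical usage is only about one core.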