Resource Requests & Limits

Every pod in Kubernetes competes for node resources. Without guidance, the scheduler places pods blindly — overloading some nodes and starving others. Requests tell the scheduler how much CPU and memory a container needs. Limits cap how much it can consume. Together they control scheduling placement, QoS priority, and whether a container gets throttled or killed when the cluster is under pressure.

Why Resource Management Matters

Skipping resource declarations has predictable consequences:

The scheduler cannot make informed placement decisions — pods land on already-overloaded nodes.
A single runaway container can starve every other pod on the node.
Pods get assigned BestEffort QoS — the first evicted under memory pressure.
The Horizontal Pod Autoscaler (HPA) cannot scale on CPU or memory utilization, because utilization is calculated as a percentage of each container's request.
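To make the HPA dependency concrete, here is a minimal sketch of an autoscaler that targets CPU utilization. The names api-hpa and api are hypothetical, and the replica counts and the 70% target are illustrative values, not recommendations.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa              # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api                # hypothetical Deployment
  minReplicas: 2             # illustrative value
  maxReplicas: 10            # illustrative value
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        # Percentage of the pods' CPU requests, so the target
        # containers need a CPU request for this to be computable.
        averageUtilization: 70
```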

Requests vs Limits

These two values serve entirely different purposes:

	Requests	Limits
Purpose	Scheduler guarantee — node must have at least this available	Runtime cap — container cannot exceed this
CPU behaviour	Minimum CPU share guaranteed under contention	Container is throttled if it tries to use more (no kill)
Memory behaviour	Minimum memory the node reserves for this container	Container is OOMKilled if it exceeds this
Affects	Scheduling, QoS class, HPA utilization targets, eviction ranking	cgroup enforcement, QoS class

Figure: Node with 4 CPU, two pods scheduled. Pod A has a request of 1 CPU and a limit of 2 CPU. Pod B has a request of 1.5 CPU and a limit of 3 CPU. The scheduler sees 2.5 CPU reserved (requests). Both pods can burst up to their limits when the node is idle.

Requests reserve capacity for scheduling. Limits cap burst usage at runtime.
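The two-pod scenario can be written out as manifests. This is a sketch only: the pod names and image are hypothetical, and the arithmetic in the comments ignores any CPU the node reserves for system daemons.

```yaml
# Pod A: counts as 1 CPU against the node's schedulable capacity
apiVersion: v1
kind: Pod
metadata:
  name: pod-a                # hypothetical name
spec:
  containers:
  - name: app
    image: example/app:1.0   # hypothetical image
    resources:
      requests:
        cpu: "1"
      limits:
        cpu: "2"
---
# Pod B: counts as 1.5 CPU against the node's schedulable capacity
apiVersion: v1
kind: Pod
metadata:
  name: pod-b                # hypothetical name
spec:
  containers:
  - name: app
    image: example/app:1.0   # hypothetical image
    resources:
      requests:
        cpu: "1500m"
      limits:
        cpu: "3"
# Sum of requests: 1 + 1.5 = 2.5 CPU, leaving 1.5 CPU schedulable on a 4 CPU node.
# Sum of limits:   2 + 3   = 5 CPU, which is more than the node has.
```

Because the limits add up to more than 4 CPU, the node is overcommitted: either pod can reach its limit while the other is quiet, but not both at once. Under contention, CPU time is shared in proportion to the requests.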

CPU Units

CPU is measured in cores or millicores. One core = 1000 millicores (1000m).

resources:
  requests:
    cpu: "250m"     # 0.25 of a core (one quarter)
    # equivalently: cpu: "0.25"
  limits:
    cpu: "1"        # 1 full core
    # equivalently: cpu: "1000m"

CPU is compressible. If a container tries to use more CPU than its limit, it is throttled — the kernel reduces its CPU time. The container keeps running but slows down. This can cause unexpected latency spikes even when the container appears healthy.
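On Linux nodes, throttling is typically implemented as a quota of CPU time per scheduling period. The fragment below is a sketch of how a limit translates into that quota, assuming the default 100 ms period.

```yaml
resources:
  limits:
    # 0.5 core: roughly 50 ms of CPU time in every 100 ms period.
    # Once the quota is spent, the container waits for the next period.
    cpu: "500m"
```

Under that assumption, a container running four busy threads in parallel can spend its 50 ms quota in about 12.5 ms of wall-clock time and then sit idle for the rest of the period, which is one way the latency spikes described above arise.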

Memory Units

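Memory is measured in bytes, written with binary (power-of-two) suffixes Ki, Mi, Gi or decimal SI suffixes k, M, G. The two are not interchangeable: 128Mi is 134,217,728 bytes, while 128M is 128,000,000 bytes.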

resources:
  requests:
    memory: "128Mi"    # 128 mebibytes (134,217,728 bytes)
  limits:
    memory: "256Mi"    # 256 mebibytes

Memory is incompressible. If a container exceeds its memory limit, the kernel's OOM killer terminates it and Kubernetes records the reason as OOMKilled. The kubelet then restarts the container according to the pod's restartPolicy. A container that repeatedly hits its memory limit shows OOMKilled as the reason for its last terminated state in kubectl describe pod.

⚠️ OOMKill vs CPU throttle

Exceeding a memory limit kills the container immediately. Exceeding a CPU limit just slows it. This asymmetry means memory limits should be set with headroom (1.5–2× the typical working set), while CPU limits can be set closer to average usage since the penalty is latency, not death.
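As a worked example of the headroom rule, assume a container whose typical working set was measured at about 200Mi (a hypothetical figure). The suggested range of 1.5 to 2 times that is 300Mi to 400Mi:

```yaml
resources:
  requests:
    memory: "200Mi"   # assumed typical working set
  limits:
    memory: "400Mi"   # 2 x 200Mi, the upper end of the suggested headroom
```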

Setting Resources

deployment.yaml — resource configuration

spec:
  template:
    spec:
      containers:
      - name: api
        image: myapp:1.0
        resources:
          requests:
            memory: "128Mi"
            cpu: "250m"
          limits:
            memory: "256Mi"
            cpu: "500m"

To find good starting values, run the app under realistic load with no limits, then observe actual usage with:

# Requires metrics-server installed
kubectl top pods
kubectl top pods --containers

# Detailed per-container usage
kubectl top pod <pod-name> --containers

QoS Classes

Kubernetes assigns a Quality of Service class to each pod based on its resource configuration. This determines eviction order when a node runs out of memory.

Class	Condition	Eviction order
Guaranteed	Every container has equal requests and limits for both CPU and memory	Last evicted
Burstable	At least one container has a CPU or memory request or limit, but the pod does not meet the Guaranteed criteria	Evicted after BestEffort
BestEffort	No container has any requests or limits	First evicted

# Check assigned QoS class
kubectl get pod myapp -o jsonpath='{.status.qosClass}'
# → Guaranteed | Burstable | BestEffort

For production workloads that should be the last to be evicted, set equal requests and limits on every container to achieve Guaranteed QoS. For batch jobs or dev workloads that can tolerate eviction, Burstable is a reasonable trade-off.
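A minimal sketch of a pod that should be classified as Guaranteed is shown below. It reuses the myapp name and image from the earlier examples, and the resource values are illustrative.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  containers:
  - name: api
    image: myapp:1.0
    resources:
      requests:
        cpu: "500m"
        memory: "256Mi"
      limits:
        cpu: "500m"        # equal to the CPU request
        memory: "256Mi"    # equal to the memory request
```

If the pod has several containers, each one needs matching requests and limits for both CPU and memory. A single container without them makes the whole pod Burstable.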

LimitRange

A LimitRange is a namespace-scoped policy that sets default requests and limits and enforces min/max bounds. When a pod is created in that namespace with containers that declare no resources, the LimitRange defaults are applied to those containers at admission. Pods that were already running are not changed.

limitrange.yaml

apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
  - type: Container
    default:
      cpu: "500m"
      memory: "256Mi"
    defaultRequest:
      cpu: "100m"
      memory: "128Mi"
    max:
      cpu: "4"
      memory: "4Gi"
    min:
      cpu: "50m"
      memory: "64Mi"
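To illustrate the effect of this LimitRange, consider a pod submitted to the production namespace without a resources block. The pod name and image are hypothetical, and the commented values are what the defaults above would be expected to produce.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: worker                  # hypothetical name
  namespace: production
spec:
  containers:
  - name: worker
    image: example/worker:1.0   # hypothetical image
    # No resources declared. After admission the container is expected to carry:
    # resources:
    #   requests:
    #     cpu: "100m"           # from defaultRequest
    #     memory: "128Mi"       # from defaultRequest
    #   limits:
    #     cpu: "500m"           # from default
    #     memory: "256Mi"       # from default
```

A container that asks for more than the max (for example a memory limit of 8Gi) or less than the min would be rejected at creation time rather than adjusted.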

ResourceQuota

A ResourceQuota caps total resource consumption across all pods in a namespace. It is useful in multi-team clusters, where it prevents one team's workloads from monopolising cluster resources.

resourcequota.yaml

apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "10"
    requests.memory: "20Gi"
    limits.cpu: "20"
    limits.memory: "40Gi"
    pods: "50"

# Check quota usage
kubectl describe resourcequota team-quota -n team-a
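The sketch below shows how a single workload counts against this quota. The Deployment name, labels, image, and resource values are hypothetical. The totals in the comments are replicas multiplied by the per-pod values.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: reports                 # hypothetical name
  namespace: team-a
spec:
  replicas: 5
  selector:
    matchLabels:
      app: reports
  template:
    metadata:
      labels:
        app: reports
    spec:
      containers:
      - name: reports
        image: example/reports:1.0   # hypothetical image
        resources:
          requests:
            cpu: "500m"       # 5 x 500m = 2.5 of the 10 CPU in requests.cpu
            memory: "1Gi"     # 5 x 1Gi  = 5Gi of the 20Gi in requests.memory
          limits:
            cpu: "1"          # 5 x 1    = 5 of the 20 CPU in limits.cpu
            memory: "2Gi"     # 5 x 2Gi  = 10Gi of the 40Gi in limits.memory
# Pods used: 5 of 50.
```

Once a quota constrains CPU or memory requests or limits, pods in the namespace must declare those values (directly or through LimitRange defaults), otherwise their creation is rejected.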

Common Mistakes

Setting limits without requests. Kubernetes sets the request equal to the limit in this case — which may dramatically overstate what the scheduler reserves, making it harder to bin-pack pods.
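The difference is easiest to see side by side. In this sketch the values are illustrative. The first fragment declares limits only, the second adds explicit requests.

```yaml
# Limits only: requests are set to match the limits, so the
# scheduler reserves 2 CPU and 2Gi for this container.
resources:
  limits:
    cpu: "2"
    memory: "2Gi"
---
# Explicit requests: the scheduler reserves 250m and 512Mi,
# while the container can still burst up to the same limits.
resources:
  requests:
    cpu: "250m"
    memory: "512Mi"
  limits:
    cpu: "2"
    memory: "2Gi"
```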

Setting CPU limits too low. Applications on runtimes such as the JVM, Python, or Node.js often use far more CPU during startup than in steady state. A limit of 200m may leave the app throttled at boot, appearing hung. Start with a generous limit and tune down with data.

Not setting memory limits at all. A memory leak can grow until the node runs out of memory, triggering evictions of unrelated pods. Always set memory limits, even if generous.

Setting requests equal to peak usage. Requests should reflect typical usage, not peak. If you set requests to peak, the scheduler reserves that much permanently, and the node appears full even when pods are mostly idle.
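As a worked example, assume a container that typically uses about 250m of CPU and peaks near 900m (hypothetical measurements). Sizing the request from typical usage and the limit from the peak might look like this:

```yaml
resources:
  requests:
    cpu: "250m"   # assumed typical usage: about 16 such pods fit on a 4 CPU node
  limits:
    cpu: "1"      # a little above the assumed 900m peak
# With requests set to the 900m peak instead, only 4 pods would fit
# on the same node (4000m / 900m, rounded down).
```

Both counts ignore CPU reserved for system components, so real nodes fit slightly fewer pods.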