Resource Requests & Limits

Every pod in Kubernetes competes for node resources. Without guidance, the scheduler places pods blindly — overloading some nodes and starving others. Requests tell the scheduler how much CPU and memory a container needs. Limits cap how much it can consume. Together they control scheduling placement, QoS priority, and whether a container gets throttled or killed when the cluster is under pressure.

Why Resource Management Matters

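Skipping resource declarations has predictable consequences: the scheduler has no basis for placing the pod, the pod falls into the BestEffort QoS class and is evicted first under memory pressure, and a single leaking container can exhaust a node and trigger evictions of unrelated pods.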

Requests vs Limits

These two values serve entirely different purposes:

| | Requests | Limits |
|---|---|---|
| Purpose | Scheduler guarantee: the node must have at least this much unreserved capacity | Runtime cap: the container cannot exceed this |
| CPU behaviour | Minimum CPU share guaranteed under contention | Container is throttled if it tries to use more (no kill) |
| Memory behaviour | Memory the scheduler counts as reserved on the node for this container | Container is OOMKilled if it exceeds this |
| Affects | Scheduling, QoS class, HPA utilisation targets, eviction ranking | cgroup enforcement, QoS class |

Example: a node with 4 CPU and two pods scheduled
Pod A: request 1 CPU, limit 2 CPU
Pod B: request 1.5 CPU, limit 3 CPU
The scheduler sees 2.5 CPU reserved (the sum of requests). Both pods can burst up to their limits when the node is idle.
Requests reserve capacity for scheduling. Limits cap burst usage at runtime.
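
As a sketch, the two pods in the example above could be declared as follows. The pod names, container name, and image are illustrative assumptions; only the CPU figures come from the example.

```yaml
# Pod A: request 1 CPU, limit 2 CPU
apiVersion: v1
kind: Pod
metadata:
  name: pod-a            # hypothetical name
spec:
  containers:
  - name: app
    image: myapp:1.0     # placeholder image
    resources:
      requests:
        cpu: "1"
      limits:
        cpu: "2"
---
# Pod B: request 1.5 CPU, limit 3 CPU
apiVersion: v1
kind: Pod
metadata:
  name: pod-b            # hypothetical name
spec:
  containers:
  - name: app
    image: myapp:1.0     # placeholder image
    resources:
      requests:
        cpu: "1500m"
      limits:
        cpu: "3"
```

The limits add up to 5 CPU on a 4 CPU node, which is allowed because the scheduler only sums requests. If both pods try to burst at once, CPU time is shared roughly in proportion to their requests (1 : 1.5), so Pod A would get about 1.6 CPU and Pod B about 2.4 CPU, ignoring system daemons and any other pods on the node.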

CPU Units

CPU is measured in cores or millicores. One core = 1000 millicores (1000m).

resources:
  requests:
    cpu: "250m"     # 0.25 of a core (one quarter)
    # equivalently: cpu: "0.25"
  limits:
    cpu: "1"        # 1 full core
    # equivalently: cpu: "1000m"

CPU is compressible. If a container tries to use more CPU than its limit, it is throttled — the kernel reduces its CPU time. The container keeps running but slows down. This can cause unexpected latency spikes even when the container appears healthy.
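
To make throttling concrete: on Linux nodes a CPU limit is typically enforced as a CFS bandwidth quota, a budget of CPU time per scheduling period. Assuming the default 100 ms period (a cluster operator may have changed it), a 500m limit works out as shown in the comments below.

```yaml
resources:
  limits:
    cpu: "500m"   # quota = 0.5 x 100 ms = 50 ms of CPU time per 100 ms period
                  # 1 busy thread:  runs 50 ms, then waits 50 ms for the next period
                  # 4 busy threads: spend the 50 ms budget in 12.5 ms of wall time,
                  #                 then all wait the remaining 87.5 ms
```

This is why a multi-threaded service can show modest average CPU usage and still stall for most of each period.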

Memory Units

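Memory is measured in bytes. Quantities take binary (IEC) suffixes Ki, Mi, Gi (powers of 1024) or decimal (SI) suffixes k, M, G (powers of 1000). The two are not interchangeable: 128M is 128,000,000 bytes, while 128Mi is 134,217,728 bytes. Suffixes are case-sensitive, and a lowercase m means milli, so memory: "128m" asks for a fraction of a byte rather than 128 megabytes.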

resources:
  requests:
    memory: "128Mi"    # 128 mebibytes (134,217,728 bytes)
  limits:
    memory: "256Mi"    # 256 mebibytes

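Memory is incompressible. If a container exceeds its memory limit, the kernel's OOM killer terminates it and Kubernetes records the reason as OOMKilled (exit code 137). Kubernetes then restarts the container according to the pod's restartPolicy. A container that repeatedly hits its memory limit shows OOMKilled as the reason under Last State in kubectl describe pod, along with a rising restart count.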
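
For reference, the same information appears in the pod's status when it is read back as YAML. The excerpt below is a sketch with illustrative values: the container name and restart count are assumptions, not output from a real cluster.

```yaml
# kubectl get pod <pod-name> -o yaml   (status excerpt; values are illustrative)
status:
  containerStatuses:
  - name: api
    restartCount: 3
    lastState:
      terminated:
        reason: OOMKilled
        exitCode: 137        # 128 + 9 (SIGKILL)
```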

⚠️
OOMKill vs CPU throttle

Exceeding a memory limit kills the container immediately. Exceeding a CPU limit just slows it. This asymmetry means memory limits should be set with headroom (1.5–2× the typical working set), while CPU limits can be set closer to average usage since the penalty is latency, not death.

Setting Resources

deployment.yaml — resource configuration
spec:
  template:
    spec:
      containers:
      - name: api
        image: myapp:1.0
        resources:
          requests:
            memory: "128Mi"
            cpu: "250m"
          limits:
            memory: "256Mi"
            cpu: "500m"

To find good starting values, run the app under realistic load with no limits, then observe actual usage with:

# Requires metrics-server installed
kubectl top pods
kubectl top pods --containers

# Detailed per-container usage
kubectl top pod <pod-name> --containers

QoS Classes

Kubernetes assigns a Quality of Service class to each pod based on its resource configuration. This determines eviction order when a node runs out of memory.

| Class | Condition | Eviction order |
|---|---|---|
| Guaranteed | Every container has equal requests and limits for both CPU and memory | Last evicted |
| Burstable | Does not meet the Guaranteed criteria, but at least one container has a CPU or memory request or limit | Evicted after BestEffort |
| BestEffort | No container has any requests or limits | First evicted |

# Check assigned QoS class
kubectl get pod myapp -o jsonpath='{.status.qosClass}'
# → Guaranteed | Burstable | BestEffort

For production workloads that should be last in line for eviction, set equal requests and limits on every container to achieve Guaranteed QoS. For batch jobs or dev workloads that can tolerate eviction, Burstable is a reasonable trade-off.
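
A minimal sketch of a pod that would be classed as Guaranteed is shown below. The specific values are illustrative assumptions; what matters is that requests and limits match for both CPU and memory on every container in the pod.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  containers:
  - name: api
    image: myapp:1.0
    resources:
      requests:
        cpu: "500m"
        memory: "256Mi"
      limits:
        cpu: "500m"        # equal to the CPU request
        memory: "256Mi"    # equal to the memory request
```

Changing either limit so it differs from its request, or adding a second container without matching values, would move the pod to Burstable.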

LimitRange

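A LimitRange is a namespace-scoped policy that sets default requests/limits and enforces min/max bounds. When a container is created in that namespace without resource declarations, the LimitRange defaults are applied automatically at admission time. Containers that fall outside the min/max bounds are rejected, and pods that already exist are not changed.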

limitrange.yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
  - type: Container
    default:
      cpu: "500m"
      memory: "256Mi"
    defaultRequest:
      cpu: "100m"
      memory: "128Mi"
    max:
      cpu: "4"
      memory: "4Gi"
    min:
      cpu: "50m"
      memory: "64Mi"
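
As an illustration of how these defaults land: a container submitted to the production namespace with no resources section would, assuming the LimitRange above is the only one in the namespace, be read back with values like the following. The pod and container names are hypothetical.

```yaml
# kubectl get pod demo -n production -o yaml   (spec excerpt; pod name is hypothetical)
spec:
  containers:
  - name: app
    image: myapp:1.0
    resources:
      limits:
        cpu: 500m          # from the LimitRange "default"
        memory: 256Mi
      requests:
        cpu: 100m          # from the LimitRange "defaultRequest"
        memory: 128Mi
```

By the same policy, a container that asks for a CPU limit of 8 would be rejected because it exceeds the max of 4.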

ResourceQuota

A ResourceQuota caps total resource consumption across all pods in a namespace. Useful for multi-team clusters where you need to prevent one team's workloads from monopolising cluster resources.

resourcequota.yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "10"
    requests.memory: "20Gi"
    limits.cpu: "20"
    limits.memory: "40Gi"
    pods: "50"

# Check quota usage
kubectl describe resourcequota team-quota -n team-a
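
To show how workloads count against the quota above, here is a sketch of a Deployment in team-a. The Deployment name, labels, replica count, and per-container values are assumptions chosen to make the arithmetic easy to follow.

```yaml
# Hypothetical Deployment in the team-a namespace
apiVersion: apps/v1
kind: Deployment
metadata:
  name: worker
  namespace: team-a
spec:
  replicas: 8
  selector:
    matchLabels:
      app: worker
  template:
    metadata:
      labels:
        app: worker
    spec:
      containers:
      - name: worker
        image: myapp:1.0
        resources:
          requests:
            cpu: "500m"      # 8 x 500m = 4 CPU of the 10 CPU requests.cpu quota
            memory: "1Gi"    # 8 x 1Gi  = 8Gi of the 20Gi requests.memory quota
          limits:
            cpu: "1"         # 8 x 1    = 8 CPU of the 20 CPU limits.cpu quota
            memory: "2Gi"    # 8 x 2Gi  = 16Gi of the 40Gi limits.memory quota
```

Once a quota tracks CPU or memory requests and limits, pods that omit those values are rejected in that namespace, so a ResourceQuota is usually paired with a LimitRange that supplies defaults.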

Common Mistakes

Setting limits without requests. Kubernetes sets the request equal to the limit in this case — which may dramatically overstate what the scheduler reserves, making it harder to bin-pack pods.
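
A small sketch of the effect, using illustrative values: the first document is what gets submitted, the second is what the container ends up with when no request is given.

```yaml
# Submitted: limits only
resources:
  limits:
    cpu: "2"
    memory: "1Gi"
---
# Stored: requests copied from the limits
resources:
  limits:
    cpu: "2"
    memory: "1Gi"
  requests:
    cpu: "2"
    memory: "1Gi"
```

The scheduler now reserves 2 CPU and 1Gi for a container that may typically use far less. Declaring an explicit, smaller request alongside the limit avoids this.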

Setting CPU limits too low. Applications on runtimes such as the JVM, Python, or Node.js often spike CPU during startup and garbage collection. A limit of 200m may leave the app throttled at boot, so that it appears hung. Start with a generous limit and tune down with data.

Not setting memory limits at all. A memory leak will exhaust the node and trigger evictions of unrelated pods. Always set memory limits, even if generous.

Setting requests equal to peak usage. Requests should reflect typical usage, not peak. If you set requests to peak, the scheduler reserves that much permanently, and the node appears full even when pods are mostly idle.