Jobs & CronJobs

Deployments and StatefulSets manage long-running processes that should never stop. Jobs manage the opposite: tasks that run to completion and then stop. A Job tracks success and failure, handles retries, and can run multiple pods in parallel for throughput. A CronJob creates Jobs on a repeating schedule, making it the Kubernetes equivalent of a Unix crontab entry.

What Is a Job?

A Job creates one or more pods, runs them until the specified number complete successfully, and then stops. Unlike a Deployment where restartPolicy: Always is required, Jobs use restartPolicy: OnFailure or restartPolicy: Never.

Common use cases include one-off database migrations and batch processing of a fixed set of work items.

Job YAML

job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate
spec:
  completions: 1          # how many pods must succeed
  parallelism: 1          # how many pods run at once
  backoffLimit: 4         # retry up to 4 times on failure
  activeDeadlineSeconds: 600  # kill the job after 10 min
  template:
    spec:
      restartPolicy: OnFailure  # Never or OnFailure (not Always)
      containers:
      - name: migrate
        image: myapp:1.0
        command: ["./migrate", "--up"]
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: db-secret
              key: url
        resources:
          requests:
            memory: "128Mi"
            cpu: "250m"
          limits:
            memory: "256Mi"
            cpu: "500m"
| Field | Description |
| --- | --- |
| completions | Total successful pod completions required. Default 1. |
| parallelism | Max pods running simultaneously. Default 1. |
| backoffLimit | Retries before the Job is marked Failed. Default 6. |
| activeDeadlineSeconds | Hard timeout. Job is killed if it runs past this. Overrides backoffLimit. |
| ttlSecondsAfterFinished | Auto-delete the Job N seconds after it completes. Keeps the cluster tidy. |
⚠️
restartPolicy must not be Always

Jobs require restartPolicy: OnFailure or restartPolicy: Never. Always is for long-running services and will cause a validation error if used in a Job spec.

Parallelism & Completions

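Jobs support three execution patterns, controlled by completions and parallelism:

| Pattern | completions | parallelism | Use case |
| --- | --- | --- | --- |
| Single run | 1 (default) | 1 (default) | One task, one pod at a time; failures are retried up to backoffLimit |
| Fixed completions | N | M ≤ N | Process N work items with M workers; each pod does one item |
| Work queue | unset | M | Pods pull from an external queue; once any pod succeeds no new pods start, and the Job completes when the remaining pods exit |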
Parallel batch — process 10 items with 3 workers
spec:
  completions: 10
  parallelism: 3
  # Kubernetes runs 3 pods at a time; as each succeeds,
  # another starts until 10 total have completed.
completions=6, parallelism=3 — timeline
t=0
pod-1 ▶ pod-2 ▶ pod-3 ▶
t=1
pod-1 ✓ pod-2 ▶ pod-3 ▶ pod-4 ▶
t=2
pod-2 ✓ pod-3 ✓ pod-4 ▶ pod-5 ▶ pod-6 ▶
t=3 ✓
pod-4 ✓ pod-5 ✓ pod-6 ✓
✓ = succeeded · ▶ = running. Parallelism capped at 3; new pod starts as each finishes.
Fixed completions with parallelism: Kubernetes keeps up to M (parallelism) pods running until all N completions are done.
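
The work queue pattern from the table above could look roughly like the manifest below. This is a minimal sketch, not a reference setup: the Job name queue-worker, the image myapp/worker:1.0, and the QUEUE_URL variable are hypothetical, and it assumes each worker exits with code 0 once the external queue is empty.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: queue-worker              # hypothetical name
spec:
  # completions is deliberately left unset: that selects the work queue pattern
  parallelism: 3                  # run up to 3 workers at once
  template:
    spec:
      restartPolicy: OnFailure
      containers:
      - name: worker
        image: myapp/worker:1.0   # hypothetical image
        env:
        - name: QUEUE_URL         # hypothetical: how the worker locates its queue
          value: "redis://queue:6379"
```

Because Kubernetes does not know how many items the queue holds, the workers themselves have to decide when the work is finished.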

Failure Handling

When a pod in a Job fails (non-zero exit code or OOMKilled), the Job controller retries with an exponential back-off delay (10s, 20s, 40s…) that is capped at six minutes. Once the number of retries reaches backoffLimit, the Job is marked Failed and no more pods are created.

| restartPolicy | On failure |
| --- | --- |
| OnFailure | Container is restarted in-place on the same pod. Pod stays; container restarts. |
| Never | Pod is marked Failed and a new pod is created. Old pod remains (for log inspection). |

Use Never when you need to inspect failed pod logs. With OnFailure the container restarts in place, so the failure state can be overwritten, and the pod itself is terminated once backoffLimit is reached.
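
A debugging-oriented variant of the earlier db-migrate Job might look like this. It is a sketch under assumptions: the name db-migrate-debug is hypothetical, and the low backoffLimit is only there to keep the number of leftover pods small.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate-debug          # hypothetical name
spec:
  backoffLimit: 2                 # retry up to 2 times before marking the Job Failed
  template:
    spec:
      restartPolicy: Never        # each failure leaves its own Failed pod behind
      containers:
      - name: migrate
        image: myapp:1.0
        command: ["./migrate", "--up"]
```

Each failed attempt then shows up as a separate pod under kubectl get pods -l job-name=db-migrate-debug, and kubectl logs can be run against each one individually.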

💡
Clean up finished Jobs

Completed Jobs and their pods linger in the cluster consuming etcd space. Use ttlSecondsAfterFinished: 3600 to auto-delete after an hour, or add a regular cleanup job. Without this, old jobs accumulate indefinitely.
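
As a rough illustration, the TTL is set at the top level of the Job spec, next to the pod template. The manifest below reuses the db-migrate example and trims it to the relevant fields.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate
spec:
  ttlSecondsAfterFinished: 3600   # delete the Job and its pods 1 hour after it finishes
  template:
    spec:
      restartPolicy: OnFailure
      containers:
      - name: migrate
        image: myapp:1.0
        command: ["./migrate", "--up"]
```

The timer starts when the Job finishes, whether it ended as Complete or Failed, so copy any logs you need before the hour is up.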

What Is a CronJob?

A CronJob creates a new Job on a schedule you define as a cron expression. Every trigger creates a fresh Job object (and thus fresh pods). The CronJob itself is just a scheduler — the actual work is done by the Job it spawns.

Common use cases include scheduled backups and periodic cache refreshes.

CronJob YAML

cronjob.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: db-backup
spec:
  schedule: "0 2 * * *"          # daily at 02:00 UTC
  timeZone: "UTC"                 # explicit tz (Kubernetes 1.27+)
  concurrencyPolicy: Forbid       # skip if previous run still active
  startingDeadlineSeconds: 300    # give up if >5 min late to start
  successfulJobsHistoryLimit: 3   # keep 3 successful job records
  failedJobsHistoryLimit: 1       # keep 1 failed job record
  jobTemplate:
    spec:
      backoffLimit: 2
      activeDeadlineSeconds: 3600
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: backup
            image: myapp/backup:1.0
            command: ["./backup.sh"]
            resources:
              requests:
                memory: "256Mi"
                cpu: "200m"
              limits:
                memory: "512Mi"
                cpu: "500m"

Schedule Syntax

CronJob schedules use standard cron syntax: minute hour day-of-month month day-of-week.

| Schedule | Meaning |
| --- | --- |
| 0 * * * * | Every hour on the hour |
| 0 2 * * * | Every day at 02:00 |
| 0 2 * * 0 | Every Sunday at 02:00 |
| */15 * * * * | Every 15 minutes |
| 0 0 1 * * | First of each month at midnight |
| @daily | Shorthand for 0 0 * * * |
| @hourly | Shorthand for 0 * * * * |
💡
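Schedules use the controller's time zone by default

Without spec.timeZone, a schedule is interpreted in the local time zone of the kube-controller-manager, which is UTC on most clusters but is not guaranteed. The spec.timeZone field (stable since Kubernetes 1.27) accepts a tz database name (e.g. "America/New_York"). Always set it explicitly to avoid confusion, particularly around daylight-saving transitions.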
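
A CronJob pinned to a named time zone could be sketched as follows. The name nightly-report, the image, and the script are hypothetical; only schedule and timeZone matter here.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-report                # hypothetical name
spec:
  schedule: "0 2 * * *"               # 02:00 in the zone named below
  timeZone: "America/New_York"        # tz database name
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: report
            image: myapp/report:1.0   # hypothetical image
            command: ["./report.sh"]  # hypothetical script
```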

Concurrency Policy

spec.concurrencyPolicy controls what happens when a new Job would be triggered while the previous one is still running:

| Policy | Behaviour | Use case |
| --- | --- | --- |
| Allow (default) | New Job runs even if previous is still running. Can cause overlap. | Idempotent tasks with no shared state. |
| Forbid | Skip the new run if the previous Job is still active. | Backups, migrations (must not run concurrently). |
| Replace | Delete the running Job and start a fresh one. | Cache refresh (only the latest run matters). |
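
The cache refresh case from the table might be written like this. Treat it as a sketch: the name cache-refresh, the image, and the script are all hypothetical.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cache-refresh                        # hypothetical name
spec:
  schedule: "*/15 * * * *"                   # every 15 minutes
  concurrencyPolicy: Replace                 # a run still active at the next tick is deleted and replaced
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: refresh
            image: myapp/cache-refresh:1.0   # hypothetical image
            command: ["./refresh-cache.sh"]  # hypothetical script
```

Replace only suits tasks that can be interrupted safely, since the running Job is deleted partway through its work.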

kubectl Commands

# Apply a Job or CronJob
kubectl apply -f job.yaml
kubectl apply -f cronjob.yaml

# Check Job status
kubectl get job db-migrate
kubectl describe job db-migrate

# Watch Job pods
kubectl get pods -l job-name=db-migrate -w

# Check Job logs
kubectl logs job/db-migrate

# List CronJobs
kubectl get cronjobs

# Manually trigger a CronJob (create a Job from it)
kubectl create job --from=cronjob/db-backup manual-backup-$(date +%s)

# Delete a completed Job (and its pods)
kubectl delete job db-migrate

# Suspend a CronJob (stop future runs without deleting)
kubectl patch cronjob db-backup -p '{"spec":{"suspend":true}}'
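
The same suspension can also be expressed declaratively, which keeps the manifest in version control in step with the cluster. The fragment below assumes the cronjob.yaml shown earlier and omits the unchanged fields.

```yaml
# cronjob.yaml (fragment; the rest of the manifest is unchanged)
spec:
  schedule: "0 2 * * *"
  suspend: true          # no new Jobs are created while this is true
```

Suspending a CronJob does not affect Jobs that have already started. Setting suspend back to false and re-applying resumes the schedule.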