Jobs & CronJobs

Deployments and StatefulSets manage long-running processes that should never stop. Jobs manage the opposite: tasks that run to completion and then stop. A Job tracks success and failure, handles retries, and can run multiple pods in parallel for throughput. A CronJob creates Jobs on a repeating schedule, the Kubernetes equivalent of a Unix crontab entry.

What Is a Job?

A Job creates one or more pods, runs them until the specified number complete successfully, and then stops. Unlike a Deployment where restartPolicy: Always is required, Jobs use restartPolicy: OnFailure or restartPolicy: Never.

Common use cases:

Database migrations before a new app version deploys
One-time data imports or exports
Batch ML model training runs
Sending a mass email or notification
Report generation on demand

Job YAML

job.yaml

apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate
spec:
  completions: 1          # how many pods must succeed
  parallelism: 1          # how many pods run at once
  backoffLimit: 4         # retry up to 4 times on failure
  activeDeadlineSeconds: 600  # kill the job after 10 min
  template:
    spec:
      restartPolicy: OnFailure  # Never or OnFailure (not Always)
      containers:
      - name: migrate
        image: myapp:1.0
        command: ["./migrate", "--up"]
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: db-secret
              key: url
        resources:
          requests:
            memory: "128Mi"
            cpu: "250m"
          limits:
            memory: "256Mi"
            cpu: "500m"

Field	Description
`completions`	Total successful pod completions required. Default 1.
`parallelism`	Max pods running simultaneously. Default 1.
`backoffLimit`	Retries before the Job is marked Failed. Default 6.
`activeDeadlineSeconds`	Hard timeout in seconds for the whole Job, retries included. Running pods are terminated once it is exceeded. Takes precedence over `backoffLimit`.
`ttlSecondsAfterFinished`	Auto-delete the Job N seconds after it completes. Keeps the cluster tidy.

⚠️

restartPolicy must not be Always

Jobs require restartPolicy: OnFailure or restartPolicy: Never. Always is for long-running services and will cause a validation error if used in a Job spec.

Parallelism & Completions

Jobs support three patterns, controlled by completions and parallelism:

Pattern	completions	parallelism	Use case
Single run	1 (default)	1 (default)	One task, one pod at a time; retried up to backoffLimit
Fixed completions	N	M ≤ N	Process N work items with M workers; each pod does one item
Work queue	unset	M	Pods pull from an external queue; after the first pod succeeds no new pods start, and the Job completes when the rest exit

Parallel batch — process 10 items with 3 workers

spec:
  completions: 10
  parallelism: 3
  # Kubernetes runs 3 pods at a time; as each succeeds,
  # another starts until 10 total have completed.
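
For comparison, the work queue pattern leaves completions unset and lets the workers decide when the work is done. The manifest below is a minimal sketch, not a definitive setup: the Job name, the image myapp/worker:1.0, and the QUEUE_URL variable are hypothetical, and it assumes each worker pulls items from an external queue and exits with code 0 once the queue is empty.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: queue-worker              # hypothetical name
spec:
  parallelism: 3                  # three workers pull from the queue at once
  # completions is intentionally unset: the workers decide when they are done
  backoffLimit: 4
  template:
    spec:
      restartPolicy: OnFailure
      containers:
      - name: worker
        image: myapp/worker:1.0   # hypothetical image
        env:
        - name: QUEUE_URL         # hypothetical variable read by the worker
          value: "redis://queue:6379"
```

Because no new pods are started once one worker has succeeded, each worker should exit successfully only when the queue really is empty.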

completions=6, parallelism=3: timeline

Time	Pods
t=0	pod-1 ▶ pod-2 ▶ pod-3 ▶
t=1	pod-1 ✓ pod-2 ▶ pod-3 ▶ pod-4 ▶
t=2	pod-2 ✓ pod-3 ✓ pod-4 ▶ pod-5 ▶ pod-6 ▶
t=3	pod-4 ✓ pod-5 ✓ pod-6 ✓ (Job complete)

✓ = succeeded · ▶ = running. Parallelism is capped at 3; a new pod starts as each one finishes.

Fixed completions with parallelism: Kubernetes keeps up to 3 pods (the parallelism value) running until all 6 completions are done.

Failure Handling

When a pod in a Job fails (non-zero exit code or OOMKilled), the Job controller retries according to backoffLimit, waiting with exponential back-off between attempts (10s, 20s, 40s…, capped at six minutes). Once the number of retries reaches backoffLimit, the Job is marked Failed and no more pods are created.

restartPolicy	On failure
`OnFailure`	Container is restarted in-place on the same pod. Pod stays; container restarts.
`Never`	Pod is marked Failed and a new pod is created. Old pod remains (for log inspection).

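Use Never when you need to inspect failed pod logs. With OnFailure the container restarts inside the same pod, so only the previous attempt's logs are kept (kubectl logs --previous), and the pod itself is terminated once backoffLimit is reached.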
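
To make the Never behaviour concrete, here is a sketch of the migration Job set up for debugging. It reuses the image and command from job.yaml; the name db-migrate-debug and the lower backoffLimit are assumptions chosen for illustration.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate-debug          # hypothetical name
spec:
  backoffLimit: 2                 # stop after 2 retries
  template:
    spec:
      restartPolicy: Never        # each failed attempt leaves its own Failed pod
      containers:
      - name: migrate
        image: myapp:1.0
        command: ["./migrate", "--up"]
```

Each failed attempt stays listed under kubectl get pods -l job-name=db-migrate-debug, and kubectl logs on any of those pods shows that attempt's output.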

💡

Clean up finished Jobs

Completed Jobs and their pods linger in the cluster consuming etcd space. Use ttlSecondsAfterFinished: 3600 to auto-delete after an hour, or add a regular cleanup job. Without this, old jobs accumulate indefinitely.
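
A minimal sketch of a self-cleaning Job, assuming a hypothetical one-off report task (the name, image, and command are placeholders):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: report-once               # hypothetical name
spec:
  ttlSecondsAfterFinished: 3600   # delete the Job and its pods 1 hour after it finishes
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: report
        image: myapp/report:1.0   # hypothetical image
        command: ["./report.sh"]  # hypothetical command
```

The timer starts when the Job finishes, whether it succeeded or failed, so collect any logs you need within that window.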

What Is a CronJob?

A CronJob creates a new Job on a schedule you define as a cron expression. Every trigger creates a fresh Job object (and thus fresh pods). The CronJob itself is just a scheduler — the actual work is done by the Job it spawns.

Common use cases:

Daily database backups at 02:00
Hourly report generation
Weekly cache warmup
Periodic cleanup of old files or records

CronJob YAML

cronjob.yaml

apiVersion: batch/v1
kind: CronJob
metadata:
  name: db-backup
spec:
  schedule: "0 2 * * *"          # daily at 02:00 UTC
  timeZone: "UTC"                 # explicit tz (Kubernetes 1.27+)
  concurrencyPolicy: Forbid       # skip if previous run still active
  startingDeadlineSeconds: 300    # give up if >5 min late to start
  successfulJobsHistoryLimit: 3   # keep 3 successful job records
  failedJobsHistoryLimit: 1       # keep 1 failed job record
  jobTemplate:
    spec:
      backoffLimit: 2
      activeDeadlineSeconds: 3600
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: backup
            image: myapp/backup:1.0
            command: ["./backup.sh"]
            resources:
              requests:
                memory: "256Mi"
                cpu: "200m"
              limits:
                memory: "512Mi"
                cpu: "500m"

Schedule Syntax

CronJob schedules use standard cron syntax: minute hour day-of-month month day-of-week.

Schedule	Meaning
`0 * * * *`	Every hour on the hour
`0 2 * * *`	Every day at 02:00
`0 2 * * 0`	Every Sunday at 02:00
`*/15 * * * *`	Every 15 minutes
`0 0 1 * *`	First of each month at midnight
`@daily`	Shorthand for `0 0 * * *`
`@hourly`	Shorthand for `0 * * * *`

💡

Set the time zone explicitly

Without spec.timeZone, the kube-controller-manager interprets the schedule in its own local time zone, which is UTC on many clusters but not guaranteed. The spec.timeZone field, stable since Kubernetes 1.27, accepts a tz database name (e.g. "America/New_York"). Set it explicitly to avoid confusion, particularly around daylight-saving transitions.
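
As an illustration, the sketch below pins the nightly backup to New York local time. It assumes a cluster where spec.timeZone is available; the name db-backup-ny is hypothetical, and the image and command are taken from cronjob.yaml.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: db-backup-ny              # hypothetical name
spec:
  schedule: "0 2 * * *"           # 02:00 in the zone named below
  timeZone: "America/New_York"    # tz database name
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: backup
            image: myapp/backup:1.0
            command: ["./backup.sh"]
```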

Concurrency Policy

spec.concurrencyPolicy controls what happens when a new Job would be triggered while the previous one is still running:

Policy	Behaviour	Use case
Allow (default)	New Job runs even if previous is still running. Can cause overlap.	Idempotent tasks with no shared state.
Forbid	Skip the new run if the previous Job is still active.	Backups, migrations — must not run concurrently.
Replace	Delete the running Job and start a fresh one.	Cache refresh — only the latest run matters.
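
The Replace row can be sketched as follows for a cache refresh that runs every 15 minutes. The name, image, and command are hypothetical; only concurrencyPolicy and schedule matter for the example.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cache-refresh             # hypothetical name
spec:
  schedule: "*/15 * * * *"        # every 15 minutes
  concurrencyPolicy: Replace      # a still-running refresh is replaced by the new run
  jobTemplate:
    spec:
      backoffLimit: 1
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: refresh
            image: myapp/cache-refresh:1.0   # hypothetical image
            command: ["./refresh.sh"]        # hypothetical command
```

Replace suits tasks where an interrupted run leaves nothing half-written; for backups or migrations, Forbid is the safer choice.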

kubectl Commands

# Apply a Job or CronJob
kubectl apply -f job.yaml
kubectl apply -f cronjob.yaml

# Check Job status
kubectl get job db-migrate
kubectl describe job db-migrate

# Watch Job pods
kubectl get pods -l job-name=db-migrate -w

# Check Job logs
kubectl logs job/db-migrate

# List CronJobs
kubectl get cronjobs

# Manually trigger a CronJob (create a Job from it)
kubectl create job --from=cronjob/db-backup manual-backup-$(date +%s)

# Delete a completed Job (and its pods)
kubectl delete job db-migrate

# Suspend a CronJob (stop future runs without deleting)
kubectl patch cronjob db-backup -p '{"spec":{"suspend":true}}'
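
If you manage manifests declaratively, the same suspension can be expressed in the CronJob spec instead of a patch. A sketch based on cronjob.yaml, trimmed to the relevant fields:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: db-backup
spec:
  suspend: true                   # no new Jobs are created while this is true
  schedule: "0 2 * * *"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: backup
            image: myapp/backup:1.0
            command: ["./backup.sh"]
```

Suspending does not stop Jobs that have already started. Set suspend back to false and re-apply to resume the schedule.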