Knative & Serverless on Kubernetes
Serverless on Kubernetes means scale-to-zero and event-driven scaling without managing the plumbing yourself. Knative adds this capability on top of any Kubernetes cluster. KEDA takes a complementary approach: it scales ordinary workloads based on external event sources such as Kafka lag, SQS depth, or Prometheus metrics. Neither requires a managed cloud function service.
Serverless on K8s
The two main capabilities serverless adds to Kubernetes:
- Scale-to-zero: idle services run zero pods and consume no compute resources. Incoming traffic wakes them up again, typically within 1–3 seconds for small images (the cold start). Essential for cost efficiency on bursty or infrequent workloads.
- Event-driven scaling: scale on event queue depth, not just CPU/memory. A Kafka consumer scales from 0 to 50 pods when messages pile up and returns to 0 when the queue is empty.
Knative Serving
Knative Serving manages the lifecycle of HTTP workloads. A Service CRD (not a core K8s Service) wraps your container and manages revisions, routing, and autoscaling automatically.
[Diagram: a Knative Service creates Revisions, shown as (old) and (current).]
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello
  namespace: default
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/class: kpa.autoscaling.knative.dev  # KPA = Knative Pod Autoscaler
        autoscaling.knative.dev/metric: concurrency
        autoscaling.knative.dev/target: "10"     # soft target: aim for about 10 concurrent requests per pod
        autoscaling.knative.dev/minScale: "0"    # allow scale to zero when idle
        autoscaling.knative.dev/maxScale: "50"   # cap at 50 pods
        # scale-to-zero-grace-period is a cluster-wide setting in the config-autoscaler
        # ConfigMap, not a per-revision annotation. Per revision, set the retention period:
        autoscaling.knative.dev/scale-to-zero-pod-retention-period: "30s"  # keep the last pod at least 30s after traffic stops
    spec:
      containers:
        - image: ghcr.io/myorg/hello:latest
          resources:
            requests: {cpu: "100m", memory: "128Mi"}
            limits: {cpu: "1", memory: "256Mi"}
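To make the effect of the target and the min/max bounds concrete, here is a minimal Python sketch of concurrency-based sizing. It is a simplification and not the KPA's actual algorithm: the real autoscaler averages metrics over time windows and applies a target utilisation percentage, both ignored here. The function name `desired_pods` is hypothetical.

```python
import math


def desired_pods(in_flight: float, target: float, min_scale: int, max_scale: int) -> int:
    """Simplified sizing: ceil(in-flight requests / per-pod target), clamped to the bounds."""
    if target <= 0:
        raise ValueError("target must be positive")
    raw = math.ceil(in_flight / target)
    return max(min_scale, min(raw, max_scale))


# Values mirror the manifest above: target=10, minScale=0, maxScale=50.
for load in (0, 7, 10, 95, 1200):
    print(load, "->", desired_pods(load, target=10, min_scale=0, max_scale=50))
# 0 -> 0, 7 -> 1, 10 -> 1, 95 -> 10, 1200 -> 50 (capped by maxScale)
```

With no requests in flight the result is 0, which is the scale-to-zero case; 1200 concurrent requests would call for 120 pods, but the cap holds the revision at 50.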
Scale to Zero
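When a revision has received no requests for its idle window, Knative scales the underlying Deployment to 0 replicas. An activator component buffers incoming requests while a pod cold-starts, then forwards them. Cold-start latency is dominated by pod scheduling, image pull (if the image is not already cached on the node), and application startup: typically 1–3 s for small images and 10–30 s for large ML models.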
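The buffering behaviour can be pictured with a toy model. This is only an illustration of the idea, assuming a single queue and in-order delivery; the class `ActivatorSketch` and its methods are hypothetical and do not reflect Knative's internal code.

```python
from collections import deque


class ActivatorSketch:
    """Toy model: hold requests while no pod is ready, then forward them in order."""

    def __init__(self):
        self.ready_pods = 0
        self.buffered = deque()
        self.forwarded = []
        self.scale_up_requested = False

    def handle(self, request):
        if self.ready_pods > 0:
            self.forwarded.append(request)   # warm path: pass straight through
        else:
            self.buffered.append(request)    # cold path: hold the request
            self.scale_up_requested = True   # signal that a pod is needed

    def pod_ready(self):
        self.ready_pods += 1
        self.scale_up_requested = False
        while self.buffered:                 # flush held requests in arrival order
            self.forwarded.append(self.buffered.popleft())


activator = ActivatorSketch()
activator.handle("req-1")      # arrives while scaled to zero
activator.handle("req-2")
activator.pod_ready()          # cold start finishes
activator.handle("req-3")      # served directly by the warm pod
print(activator.forwarded)     # ['req-1', 'req-2', 'req-3']
```

The first two callers pay the cold-start delay because their requests wait in the buffer; the third does not.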
| Metric | What it measures | Best for |
|---|---|---|
| concurrency | Simultaneous requests in flight per pod | Latency-sensitive HTTP APIs |
| rps | Requests per second per pod | High-throughput, fast requests |
| cpu | CPU utilisation (like HPA) | CPU-bound workloads; requires the HPA autoscaler class (hpa.autoscaling.knative.dev) and disables scale-to-zero |
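The difference between the concurrency and rps metrics shows up once request latency changes. By Little's law, requests in flight equal arrival rate times latency, so the same request rate needs more pods under a concurrency target when requests get slower. The sketch below works through that arithmetic; the targets (10 concurrent requests, 100 rps per pod) and the function names are assumptions chosen for illustration.

```python
import math


def pods_by_concurrency(rps: int, latency_ms: int, target_concurrency: int) -> int:
    """Little's law: in-flight requests = arrival rate x time in system."""
    in_flight = rps * latency_ms / 1000
    return math.ceil(in_flight / target_concurrency)


def pods_by_rps(rps: int, target_rps: int) -> int:
    """Sizing on request rate alone ignores how long each request takes."""
    return math.ceil(rps / target_rps)


# 200 rps of fast requests (50 ms each): 10 requests in flight
print(pods_by_concurrency(200, 50, 10))    # 1
print(pods_by_rps(200, 100))               # 2

# Same 200 rps, but each request now takes 2 s: 400 requests in flight
print(pods_by_concurrency(200, 2000, 10))  # 40
print(pods_by_rps(200, 100))               # 2 (unchanged)
```

This is why concurrency suits latency-sensitive APIs: it reacts when requests slow down, while an rps target only reacts to the arrival rate.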
Traffic Splitting
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello
spec:
  # spec.template omitted for brevity; keep it when applying this manifest
  traffic:
    - revisionName: hello-00001   # previous revision
      percent: 80
    - revisionName: hello-00002   # new revision
      percent: 20
    - latestRevision: true        # automatically updated on each new deploy
      percent: 0                  # receives 0% until explicitly weighted
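The percentages behave like weights in a weighted random choice per request. The Python sketch below illustrates that idea using the same 80/20/0 split; it is a model of the outcome, not of how Knative's networking layer implements routing, and the helper names are hypothetical.

```python
import random
from collections import Counter


def validate(traffic):
    """Traffic percentages must add up to exactly 100."""
    total = sum(t["percent"] for t in traffic)
    if total != 100:
        raise ValueError(f"percents must sum to 100, got {total}")


def pick_revision(traffic, rng):
    """Choose one revision for a request, weighted by its percent."""
    names = [t["revision"] for t in traffic]
    weights = [t["percent"] for t in traffic]
    return rng.choices(names, weights=weights, k=1)[0]


traffic = [
    {"revision": "hello-00001", "percent": 80},
    {"revision": "hello-00002", "percent": 20},
    {"revision": "latest", "percent": 0},   # stand-in for the latestRevision entry
]
validate(traffic)

rng = random.Random(42)                      # seeded so the run is repeatable
counts = Counter(pick_revision(traffic, rng) for _ in range(10_000))
print(counts)   # roughly 8000 / 2000, and no requests for the 0% entry
```

A 0% entry never receives traffic through the main route, which is what makes it safe to deploy a new revision before shifting any load to it.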
Knative Eventing
Knative Eventing routes CloudEvents from sources (Kafka, PubSub, GitHub webhooks, cron) to sinks (Knative Services, Kubernetes Services, channels). It decouples event producers from consumers.
apiVersion: sources.knative.dev/v1
kind: ApiServerSource
metadata:
  name: pod-events
  namespace: default
spec:
  serviceAccountName: events-sa
  mode: Resource            # send the full resource, not just a reference
  resources:
    - apiVersion: v1
      kind: Pod
  sink:
    ref:
      apiVersion: serving.knative.dev/v1
      kind: Service
      name: event-processor   # Knative Service that processes the event
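On the receiving side, the sink gets each event as an HTTP POST. Assuming binary content mode from the CloudEvents HTTP binding, the event attributes arrive as `ce-` prefixed headers and the payload is the request body. The following sketch parses such a request; the function name and the sample header values are illustrative assumptions, not output captured from a real cluster.

```python
REQUIRED = {"id", "source", "specversion", "type"}


def parse_binary_cloudevent(headers: dict, body: bytes) -> dict:
    """Collect ce-* headers into an event dict; HTTP header names are case-insensitive."""
    lowered = {k.lower(): v for k, v in headers.items()}
    event = {k[3:]: v for k, v in lowered.items() if k.startswith("ce-")}
    missing = REQUIRED - set(event)
    if missing:
        raise ValueError(f"missing CloudEvents attributes: {sorted(missing)}")
    event["datacontenttype"] = lowered.get("content-type")
    event["data"] = body
    return event


# Illustrative request, shaped like an ApiServerSource "add" event for a Pod.
sample_headers = {
    "Ce-Id": "a1b2c3",
    "Ce-Source": "https://10.96.0.1:443",
    "Ce-Specversion": "1.0",
    "Ce-Type": "dev.knative.apiserver.resource.add",
    "Content-Type": "application/json",
}
event = parse_binary_cloudevent(sample_headers, b'{"kind": "Pod"}')
print(event["type"])   # dev.knative.apiserver.resource.add
```

In practice a CloudEvents SDK would handle this parsing, including structured mode; the sketch only shows what the sink has to look at.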
# Broker: event bus
apiVersion: eventing.knative.dev/v1
kind: Broker
metadata:
  name: default
  namespace: production
---
# Trigger: subscribe a service to specific event types
apiVersion: eventing.knative.dev/v1
kind: Trigger
metadata:
  name: order-created
  namespace: production
spec:
  broker: default
  filter:
    attributes:
      type: com.myapp.order.created   # only this CloudEvent type
  subscriber:
    ref:
      apiVersion: serving.knative.dev/v1
      kind: Service
      name: order-processor
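The Trigger's attribute filter can be read as "deliver the event only if every listed attribute matches exactly". A minimal sketch of that rule follows, assuming plain exact-match semantics and ignoring any wildcard or advanced filter behaviour; the function `matches` and the sample `source` value are hypothetical.

```python
def matches(filter_attributes: dict, event: dict) -> bool:
    """True when every filter attribute equals the event's attribute of the same name."""
    return all(event.get(key) == value for key, value in filter_attributes.items())


trigger_filter = {"type": "com.myapp.order.created"}

created = {"type": "com.myapp.order.created", "source": "/orders", "id": "1"}
cancelled = {"type": "com.myapp.order.cancelled", "source": "/orders", "id": "2"}

print(matches(trigger_filter, created))     # True  -> delivered to order-processor
print(matches(trigger_filter, cancelled))   # False -> skipped by this Trigger
```

Several Triggers can point at the same Broker with different filters, so each consumer sees only the event types it cares about.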
KEDA — Event-Driven Autoscaling
KEDA (Kubernetes Event-Driven Autoscaling) scales standard Deployments and Jobs based on external event sources; no Knative required. It ships with more than 50 scalers, including Kafka, RabbitMQ, SQS, Azure Service Bus, Prometheus, and cron.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-consumer-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: kafka-consumer     # Deployment to scale
  minReplicaCount: 0         # scale to zero when no messages
  maxReplicaCount: 30
  pollingInterval: 15        # check lag every 15 seconds
  cooldownPeriod: 300        # wait 5 min before scaling down
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka.production:9092
        consumerGroup: order-processors
        topic: orders
        lagThreshold: "100"  # 1 pod per 100 messages of lag
        offsetResetPolicy: latest
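The "1 pod per 100 messages of lag" rule can be worked through numerically. The sketch below is an approximation: in a real cluster KEDA handles activation from zero and hands the metric to the HPA, which adds its own tolerance and stabilisation. It also assumes the Kafka scaler's default of not exceeding the topic's partition count; the partition count of 12 is an assumption, since the manifest does not state it.

```python
import math


def kafka_replicas(total_lag: int, lag_threshold: int, max_replicas: int, partitions: int) -> int:
    """Approximate replica count: one consumer per lag_threshold messages, with caps."""
    if total_lag <= 0:
        return 0                                   # nothing to consume: scale to zero
    wanted = math.ceil(total_lag / lag_threshold)
    return min(wanted, max_replicas, partitions)   # extra consumers would sit idle


# lagThreshold=100 and maxReplicaCount=30 come from the manifest; partitions=12 is assumed.
for lag in (0, 250, 5000):
    print(lag, "->", kafka_replicas(lag, lag_threshold=100, max_replicas=30, partitions=12))
# 0 -> 0, 250 -> 3, 5000 -> 12
```

With only 12 partitions the configured maximum of 30 is never reached, so it is worth checking the partition count before raising maxReplicaCount.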
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: image-processor
spec:
  jobTargetRef:
    template:
      spec:
        containers:
          - name: processor
            image: myapp/image-processor:latest
        restartPolicy: Never
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123/image-queue
        queueLength: "1"     # 1 Job per message
        awsRegion: us-east-1
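For a ScaledJob, each polling cycle decides how many new Jobs to start rather than how many replicas to keep. The sketch below approximates what KEDA documents as its default scaling strategy: the queue-derived job count is capped by maxReplicaCount (assumed here to be its default of 100, since the manifest does not set it) and reduced by the Jobs already running. Treat it as an approximation; the function name is hypothetical.

```python
import math


def jobs_to_create(queue_length: int, per_job: int, running_jobs: int, max_replicas: int = 100) -> int:
    """Approximate number of new Jobs to start in one polling cycle."""
    max_scale = min(max_replicas, math.ceil(queue_length / per_job))
    return max(0, max_scale - running_jobs)


# queueLength "1" in the manifest means per_job=1.
print(jobs_to_create(queue_length=25, per_job=1, running_jobs=5))    # 20
print(jobs_to_create(queue_length=500, per_job=1, running_jobs=5))   # 95
print(jobs_to_create(queue_length=0, per_job=1, running_jobs=5))     # 0
```

Because each Job runs to completion and exits, this pattern suits long-running tasks that should not be interrupted by a scale-down.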
When to Use Serverless
| Good fit | Poor fit |
|---|---|
| Infrequent or bursty HTTP APIs (webhooks, background tasks) | Low-latency services where a cold start of more than 100 ms is unacceptable |
| Event consumers that process Kafka/SQS messages | Stateful workloads (databases, caches) |
| ML inference endpoints that spike then idle | Services with persistent WebSocket connections |
| Dev/staging environments where idle cost matters | High-traffic services that never drop below one replica anyway |
kn Commands
# Install kn CLI
brew install kn
# List Knative Services
kn service list -n production
# Deploy a new Knative Service
kn service create hello --image ghcr.io/myorg/hello:latest --scale-min 0 --scale-max 10
# Update image (creates new revision)
kn service update hello --image ghcr.io/myorg/hello:v2
# Split traffic 80/20
kn service update hello --traffic hello-00001=80 --traffic hello-00002=20
# Check revision list
kn revision list
# Check KEDA ScaledObjects
kubectl get scaledobject -A
kubectl describe scaledobject kafka-consumer-scaler -n production