Knative & Serverless on Kubernetes

Serverless on Kubernetes means scale-to-zero and event-driven scaling without managing the plumbing yourself. Knative adds both on top of any Kubernetes cluster. KEDA is a separate, complementary project that scales ordinary workloads based on external event sources such as Kafka lag, SQS queue depth, or Prometheus metrics. Neither requires a managed cloud function service.

Serverless on K8s

The two main capabilities serverless adds to Kubernetes:

Scale-to-zero: idle services run zero pods and consume no compute resources. The first request wakes them up after a cold start, typically 1–3 seconds for a small container image. Essential for cost efficiency on bursty or infrequent workloads.
Event-driven scaling: scale on event queue depth, not just CPU/memory. A Kafka consumer can scale from 0 to 50 pods when messages pile up, then return to 0 when the queue is empty.

Knative Serving

Knative Serving manages the lifecycle of HTTP workloads. A Service CRD (not a core K8s Service) wraps your container and manages revisions, routing, and autoscaling automatically.

Knative Serving — resource hierarchy

Knative Service (top-level resource; manages everything below)
├─ Configuration (desired pod template; creates Revisions)
│  ├─ Revision v1 (old)
│  └─ Revision v2 (current)
└─ Route (splits traffic across Revisions)

A Knative Service owns a Configuration (what to run) and a Route (how to split traffic). Each change to the Service's template produces a new immutable Revision, and a Route can split traffic across several Revisions.
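
The ownership model can be sketched in a few lines of Python. This is a toy model, not Knative code: the class names `Revision` and `ServiceModel` and the `deploy` method are hypothetical, and only the image is tracked. It shows two things: revisions are immutable, and by default all traffic moves to the newest one.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Revision:
    """Immutable snapshot of the pod template (only the image is modelled here)."""
    name: str
    image: str


@dataclass
class ServiceModel:
    """Toy stand-in for a Knative Service: owns its revisions and its route."""
    name: str
    revisions: list = field(default_factory=list)  # history produced by the Configuration
    traffic: dict = field(default_factory=dict)    # the Route: revision name -> percent

    def deploy(self, image: str) -> Revision:
        # Every template change yields a new revision; older ones are left untouched.
        rev = Revision(name=f"{self.name}-{len(self.revisions) + 1:05d}", image=image)
        self.revisions.append(rev)
        self.traffic = {rev.name: 100}  # default: all traffic follows the latest revision
        return rev


svc = ServiceModel("hello")
svc.deploy("ghcr.io/myorg/hello:v1")
svc.deploy("ghcr.io/myorg/hello:v2")
print([r.name for r in svc.revisions])  # ['hello-00001', 'hello-00002']
print(svc.traffic)                      # {'hello-00002': 100}
```

Because `Revision` is frozen, trying to change `svc.revisions[0].image` raises an error, which mirrors why a rollback in Knative is a traffic change rather than an edit to an old revision.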

Knative Service — scale-to-zero HTTP workload

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello
  namespace: default
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/class: kpa.autoscaling.knative.dev   # KPA = Knative Pod Autoscaler
        autoscaling.knative.dev/metric: concurrency
        autoscaling.knative.dev/target: "10"      # scale up when >10 concurrent requests per pod
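        autoscaling.knative.dev/min-scale: "0"    # allow scale to zero when idle
        autoscaling.knative.dev/max-scale: "50"   # cap at 50 pods
        # Per-revision knob: keep the last pod for at least 30s after traffic stops.
        # (scale-to-zero-grace-period is cluster-wide, set in the config-autoscaler ConfigMap.)
        autoscaling.knative.dev/scale-to-zero-pod-retention-period: "30s"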
    spec:
      containers:
      - image: ghcr.io/myorg/hello:latest
        resources:
          requests: {cpu: "100m", memory: "128Mi"}
          limits: {cpu: "1", memory: "256Mi"}

Scale to Zero

When no requests arrive for the autoscaler's stable window (60 seconds by default), Knative scales the revision's Deployment to 0 replicas. The activator component then sits in the request path and buffers incoming requests while a pod cold-starts. Cold start time is dominated by pod scheduling, image pull, and container startup: typically 1–3s for small images, 10–30s for large ML models.

Metric	What it measures	Best for
`concurrency`	Simultaneous requests in flight per pod	Latency-sensitive HTTP APIs
`rps`	Requests per second per pod	High-throughput, fast requests
`cpu`	CPU utilisation (like HPA)	CPU-bound workloads; requires the HPA autoscaler class (`hpa.autoscaling.knative.dev`), which disables scale-to-zero
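
To make the `concurrency` target concrete, here is a minimal sketch of the core arithmetic: in-flight requests divided by the per-pod target, rounded up, then clamped to the scale bounds. The function name and parameters are hypothetical, and the real Knative Pod Autoscaler also applies a target-utilisation percentage and averages over stable and panic windows, so treat the result as an approximation.

```python
import math


def desired_pods(in_flight: float, target: float, min_scale: int = 0, max_scale: int = 50) -> int:
    """Approximate pod count for a concurrency target (simplified; not the actual KPA code)."""
    if target <= 0:
        raise ValueError("target must be positive")
    raw = math.ceil(in_flight / target)         # pods needed to keep each pod at or below target
    return max(min_scale, min(raw, max_scale))  # clamp to the min/max scale bounds


print(desired_pods(0, 10))    # 0  (idle: scale to zero)
print(desired_pods(95, 10))   # 10
print(desired_pods(900, 10))  # 50 (capped by max_scale)
```

With the manifest above (target 10, maximum 50), 95 concurrent requests would call for roughly 10 pods under this simplified model.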

Traffic Splitting

canary — split traffic across two revisions

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello
spec:
  # spec.template is omitted here; it is the same as in the Service manifest above
  traffic:
  - revisionName: hello-00001    # previous revision
    percent: 80
  - revisionName: hello-00002    # new revision
    percent: 20
  - latestRevision: true         # automatically updated on each new deploy
    percent: 0                   # receive 0% until explicitly weighted
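
The effect of a percentage split can be simulated with a weighted random choice. This is only an illustration of the behaviour: in a real cluster the split is enforced by Knative's networking layer, and the helper names `validate` and `pick_revision` are hypothetical. The revision names and weights mirror the manifest above.

```python
import random
from collections import Counter

# Revision name -> percent, mirroring the traffic block above.
TRAFFIC = [("hello-00001", 80), ("hello-00002", 20)]


def validate(traffic):
    """Traffic percentages are expected to add up to exactly 100."""
    total = sum(percent for _, percent in traffic)
    if total != 100:
        raise ValueError(f"traffic percentages sum to {total}, expected 100")


def pick_revision(traffic, rng):
    """Choose a revision for one request, weighted by its percentage."""
    names = [name for name, _ in traffic]
    weights = [percent for _, percent in traffic]
    return rng.choices(names, weights=weights, k=1)[0]


validate(TRAFFIC)
rng = random.Random(42)  # fixed seed so the simulation is repeatable
counts = Counter(pick_revision(TRAFFIC, rng) for _ in range(10_000))
print(counts)            # roughly 80% / 20% of the 10,000 simulated requests
```

The split is statistical, so a canary at 20% only receives a meaningful sample once enough requests have flowed through the Route.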

Knative Eventing

Knative Eventing routes CloudEvents from sources (Kafka, Pub/Sub, GitHub webhooks, cron) to sinks (Knative Services, Kubernetes Services, channels), decoupling event producers from consumers.
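
For orientation, this is roughly what a CloudEvent looks like when it is delivered over HTTP in binary content mode: the event attributes travel as `ce-` headers and the payload is the request body. The sketch below uses only the standard library; the function name `make_cloudevent` and the event type, source, and payload are illustrative assumptions, not values Knative prescribes.

```python
import json
import uuid
from datetime import datetime, timezone


def make_cloudevent(event_type: str, source: str, data: dict):
    """Return (headers, body) for a CloudEvents 1.0 HTTP request in binary content mode."""
    headers = {
        "ce-specversion": "1.0",     # required attribute
        "ce-id": str(uuid.uuid4()),  # required, unique per source
        "ce-type": event_type,       # required, the attribute filters usually match on
        "ce-source": source,         # required, URI-reference identifying the producer
        "ce-time": datetime.now(timezone.utc).isoformat(),  # optional, RFC 3339 timestamp
        "content-type": "application/json",
    }
    body = json.dumps(data).encode("utf-8")
    return headers, body


# Hypothetical event: type, source, and payload are example values.
headers, body = make_cloudevent("com.myapp.order.created", "/myapp/orders", {"orderId": "A-1001"})
print(headers["ce-type"], body.decode("utf-8"))
```

A sink such as a Knative Service receives exactly this kind of request, so an event consumer is just an HTTP handler that reads the `ce-` headers and the body.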

ApiServerSource — react to K8s API events

apiVersion: sources.knative.dev/v1
kind: ApiServerSource
metadata:
  name: pod-events
  namespace: default
spec:
  serviceAccountName: events-sa
  resources:
  - apiVersion: v1
    kind: Pod
  mode: Resource                   # send full resource, not just reference
  sink:
    ref:
      apiVersion: serving.knative.dev/v1
      kind: Service
      name: event-processor        # Knative Service that processes the event

Broker + Trigger — fan-out event routing

# Broker: event bus
apiVersion: eventing.knative.dev/v1
kind: Broker
metadata:
  name: default
  namespace: production

---
# Trigger: subscribe a service to specific event types
apiVersion: eventing.knative.dev/v1
kind: Trigger
metadata:
  name: order-created
  namespace: production
spec:
  broker: default
  filter:
    attributes:
      type: com.myapp.order.created    # only this CloudEvent type
  subscriber:
    ref:
      apiVersion: serving.knative.dev/v1
      kind: Service
      name: order-processor
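
The fan-out behaviour of a Broker with several Triggers can be sketched as attribute matching. This is a simplified model under stated assumptions: it covers exact-match attribute filters only, the `audit-all` trigger and `audit-log` subscriber are hypothetical additions for illustration, and the real Broker also handles delivery, retries, and dead-lettering.

```python
def matches(filter_attributes: dict, event_attributes: dict) -> bool:
    """True when every filter attribute equals the event's attribute (exact match only)."""
    return all(event_attributes.get(key) == value for key, value in filter_attributes.items())


# The first trigger mirrors the manifest above; the second is a hypothetical catch-all.
TRIGGERS = [
    {"name": "order-created", "filter": {"type": "com.myapp.order.created"}, "subscriber": "order-processor"},
    {"name": "audit-all", "filter": {}, "subscriber": "audit-log"},  # no filter: receives every event
]


def route(event_attributes: dict, triggers=TRIGGERS) -> list:
    """Return the subscribers that would receive a copy of this event."""
    return [t["subscriber"] for t in triggers if matches(t["filter"], event_attributes)]


print(route({"type": "com.myapp.order.created", "source": "/myapp/orders"}))
# ['order-processor', 'audit-log']
print(route({"type": "com.myapp.order.cancelled", "source": "/myapp/orders"}))
# ['audit-log']
```

Each matching Trigger gets its own copy of the event, which is what lets new consumers subscribe without any change to the producer.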

KEDA — Event-Driven Autoscaling

KEDA (Kubernetes Event-Driven Autoscaling) scales standard Deployments and Jobs based on external event sources — no Knative required. Over 50 scalers: Kafka, RabbitMQ, SQS, Azure Service Bus, Prometheus, cron, and more.

ScaledObject — scale on Kafka lag

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-consumer-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: kafka-consumer             # Deployment to scale
  minReplicaCount: 0                 # scale to zero when no messages
  maxReplicaCount: 30
  pollingInterval: 15                # check lag every 15 seconds
  cooldownPeriod: 300                # wait 5 min after the last active trigger before scaling to zero
  triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka.production:9092
      consumerGroup: order-processors
      topic: orders
      lagThreshold: "100"            # 1 pod per 100 messages of lag
      offsetResetPolicy: latest
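
The relationship between lag and replica count can be approximated as total lag divided by `lagThreshold`, rounded up and capped. The sketch below is an assumption-laden simplification: the function name is hypothetical, KEDA actually feeds the lag metric to a Horizontal Pod Autoscaler rather than computing replicas itself, and by default the Kafka scaler does not scale beyond the number of topic partitions.

```python
import math


def kafka_replicas(total_lag: int, lag_threshold: int = 100, max_replicas: int = 30, partitions=None) -> int:
    """Approximate replica count for a given consumer-group lag (simplified model)."""
    if lag_threshold <= 0:
        raise ValueError("lag_threshold must be positive")
    if total_lag <= 0:
        return 0                             # no lag: eligible to scale to zero
    wanted = math.ceil(total_lag / lag_threshold)
    if partitions is not None:
        wanted = min(wanted, partitions)     # extra consumers beyond the partition count would sit idle
    return min(wanted, max_replicas)


print(kafka_replicas(0))                     # 0
print(kafka_replicas(250))                   # 3
print(kafka_replicas(10_000))                # 30 (capped by max_replicas)
print(kafka_replicas(10_000, partitions=12)) # 12 (capped by partition count)
```

In practice the partition count is often the real ceiling, so `maxReplicaCount` above the number of partitions buys nothing for a single consumer group.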

ScaledJob — process each message as a separate Job

apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: image-processor
spec:
  jobTargetRef:
    template:
      spec:
        containers:
        - name: processor
          image: myapp/image-processor:latest
        restartPolicy: Never
  triggers:
  - type: aws-sqs-queue
    metadata:
      queueURL: https://sqs.us-east-1.amazonaws.com/123/image-queue
      queueLength: "1"               # 1 Job per message
      awsRegion: us-east-1
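
How many Jobs a ScaledJob starts on each polling cycle can be approximated as follows. This reflects one reading of KEDA's default scaling strategy (pending work divided by the per-Job target, capped at the maximum, minus Jobs already running); the function and parameter names are hypothetical, and the exact behaviour should be checked against the KEDA documentation for the version in use.

```python
import math


def jobs_to_create(queue_length: int, target_per_job: int = 1,
                   running_jobs: int = 0, max_replica_count: int = 100) -> int:
    """Approximate number of new Jobs to start in one polling cycle (simplified model)."""
    if target_per_job <= 0:
        raise ValueError("target_per_job must be positive")
    wanted = min(math.ceil(queue_length / target_per_job), max_replica_count)
    return max(0, wanted - running_jobs)   # never negative; running Jobs count against the total


print(jobs_to_create(5))                   # 5
print(jobs_to_create(5, running_jobs=2))   # 3
print(jobs_to_create(500))                 # 100 (capped by max_replica_count)
print(jobs_to_create(0))                   # 0
```

With `queueLength: "1"` as in the manifest above, each visible message maps to one Job until the cap is reached.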

When to Use Serverless

Good fit	Poor fit
Infrequent or bursty HTTP APIs (webhooks, background tasks)	Latency-critical services where a cold start of more than 100 ms is unacceptable
Event consumers that process Kafka/SQS messages	Stateful workloads (databases, caches)
ML inference endpoints that spike then idle	Services with persistent WebSocket connections
Dev/staging environments where idle cost matters	High-traffic services that never drop below 1 replica anyway

kn Commands

# Install kn CLI
brew install knative/client/kn

# List Knative Services
kn service list -n production

# Create a Knative Service
kn service create hello --image ghcr.io/myorg/hello:latest --scale-min 0 --scale-max 10

# Update image (creates new revision)
kn service update hello --image ghcr.io/myorg/hello:v2

# Split traffic 80/20
kn service update hello --traffic hello-00001=80 --traffic hello-00002=20

# Check revision list
kn revision list

# Check KEDA ScaledObjects
kubectl get scaledobject -A
kubectl describe scaledobject kafka-consumer-scaler -n production