Knative & Serverless on Kubernetes

Serverless on Kubernetes means scale-to-zero and event-driven scaling without managing the plumbing. Knative adds these capabilities to any Kubernetes cluster. KEDA is an independent project that scales workloads on external event sources such as Kafka lag, SQS queue depth, or Prometheus metrics. Neither requires a managed cloud function service.

Serverless on K8s

Serverless adds two main capabilities to Kubernetes: scale-to-zero for idle workloads and event-driven scaling.

Knative Serving

Knative Serving manages the lifecycle of HTTP workloads. A Service CRD (not a core K8s Service) wraps your container and manages revisions, routing, and autoscaling automatically.

Knative Serving: resource hierarchy

Knative Service (top-level resource; manages everything below)
  creates Configuration (desired pod template), which creates Revisions
    Revision v1 (old)
    Revision v2 (current)
  creates Route (traffic split across revisions)

Knative Service owns Configuration (what to run) and Route (how to split traffic). Each deployment creates an immutable Revision. Routes can split traffic across revisions.
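
The ownership chain above can be pictured with a toy in-memory model. This is only an illustrative sketch, not Knative code: the class `ToyService` and its `deploy` method are hypothetical, and the zero-padded revision names simply mimic the `hello-00001` style used in the examples on this page.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Revision:
    """Immutable snapshot of the pod template (here reduced to the image)."""
    name: str
    image: str


@dataclass
class ToyService:
    """Hypothetical stand-in for a Knative Service; not the real controller."""
    name: str
    revisions: list = field(default_factory=list)

    def deploy(self, image: str) -> Revision:
        # An unchanged template does not produce a new Revision.
        if self.revisions and self.revisions[-1].image == image:
            return self.revisions[-1]
        # Any template change stamps out a new, numbered, immutable Revision.
        rev = Revision(f"{self.name}-{len(self.revisions) + 1:05d}", image)
        self.revisions.append(rev)
        return rev


svc = ToyService("hello")
svc.deploy("ghcr.io/myorg/hello:latest")
svc.deploy("ghcr.io/myorg/hello:v2")
print([r.name for r in svc.revisions])  # → ['hello-00001', 'hello-00002']
```

Because old Revisions are never mutated, a Route can keep pointing at them, which is what makes rollbacks and traffic splits cheap.
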
Knative Service — scale-to-zero HTTP workload
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello
  namespace: default
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/class: kpa.autoscaling.knative.dev   # KPA = Knative Pod Autoscaler
        autoscaling.knative.dev/metric: concurrency
        autoscaling.knative.dev/target: "10"      # scale up when >10 concurrent requests per pod
        autoscaling.knative.dev/min-scale: "0"    # scale to zero when idle
        autoscaling.knative.dev/max-scale: "50"   # cap at 50 pods
        autoscaling.knative.dev/window: "30s"     # stable window: load is averaged over 30s before scaling down
    spec:
      containers:
      - image: ghcr.io/myorg/hello:latest
        resources:
          requests: {cpu: "100m", memory: "128Mi"}
          limits: {cpu: "1", memory: "256Mi"}

Scale to Zero

When no requests arrive for the configured idle window, Knative scales the revision's Deployment to 0 replicas. The activator component buffers incoming requests while a pod cold-starts. Cold-start time covers pod scheduling, image pull, and container startup (typically 1–3s for small images, 10–30s for large ML models).

Metric | What it measures | Best for
concurrency | Simultaneous requests in flight per pod | Latency-sensitive HTTP APIs
rps | Requests per second per pod | High-throughput, fast requests
cpu | CPU utilisation (like HPA) | CPU-bound workloads; disables scale-to-zero
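
As a rough worked example of how the `target` annotation translates into pod counts, the sketch below divides the observed load by the per-pod target and clamps the result to the scale bounds. It is a simplification under stated assumptions: the real Knative Pod Autoscaler averages metrics over time windows and has a panic mode, none of which is modelled here. The function name `desired_pods` is hypothetical.

```python
import math


def desired_pods(observed: float, target: float,
                 min_scale: int = 0, max_scale: int = 50) -> int:
    """Simplified sizing rule: ceil(observed load / per-pod target), clamped.

    `observed` is the total in-flight requests (metric: concurrency) or the
    total requests per second (metric: rps) across the whole revision.
    """
    if target <= 0:
        raise ValueError("target must be positive")
    wanted = math.ceil(observed / target)
    return max(min_scale, min(max_scale, wanted))


# Values mirror the manifest above: target of 10, floor of 0, cap of 50 pods.
print(desired_pods(0, 10))     # → 0   (idle, eligible for scale-to-zero)
print(desired_pods(35, 10))    # → 4   (35 in-flight requests)
print(desired_pods(1200, 10))  # → 50  (burst beyond the cap)
```
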

Traffic Splitting

canary — split traffic across two revisions
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello
spec:
  # spec.template omitted for brevity; keep the same template as the Service above
  traffic:
  - revisionName: hello-00001    # previous revision
    percent: 80
  - revisionName: hello-00002    # new revision
    percent: 20
  - latestRevision: true         # automatically updated on each new deploy
    percent: 0                   # receive 0% until explicitly weighted
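
To make the percentages concrete, here is a small sketch of weighted routing across revisions. It is not how Knative's networking layer is implemented; `pick_revision` is a hypothetical helper that maps a roll in [0, 100) onto the traffic list, and it checks the one rule the manifest must satisfy, namely that the percentages sum to 100.

```python
import random
from collections import Counter

TRAFFIC = [
    {"revisionName": "hello-00001", "percent": 80},
    {"revisionName": "hello-00002", "percent": 20},
]


def pick_revision(traffic, roll: float) -> str:
    """Map a roll in [0, 100) onto the cumulative percent ranges."""
    if sum(t["percent"] for t in traffic) != 100:
        raise ValueError("traffic percentages must sum to 100")
    upper = 0
    for t in traffic:
        upper += t["percent"]
        if roll < upper:
            return t["revisionName"]
    raise ValueError("roll must be in [0, 100)")


# Simulate 10,000 requests; the split should land close to 80/20.
rng = random.Random(42)
hits = Counter(pick_revision(TRAFFIC, rng.randrange(100)) for _ in range(10_000))
print(hits)
```

A target with `percent: 0` never wins a roll in this model, which matches the intent of parking the latest revision at 0% until it is explicitly weighted.
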

Knative Eventing

Knative Eventing routes CloudEvents from sources (Kafka, PubSub, GitHub webhooks, cron) to sinks (Knative Services, Kubernetes Services, channels). It decouples event producers from consumers.

ApiServerSource — react to K8s API events
apiVersion: sources.knative.dev/v1
kind: ApiServerSource
metadata:
  name: pod-events
  namespace: default
spec:
  serviceAccountName: events-sa
  resources:
  - apiVersion: v1
    kind: Pod
  mode: Resource                   # send the full resource, not just a reference
  sink:
    ref:
      apiVersion: serving.knative.dev/v1
      kind: Service
      name: event-processor        # Knative Service that processes the event
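
On the receiving side, the sink is an ordinary HTTP endpoint. The sketch below shows how a handler might pull CloudEvent attributes out of a binary-mode HTTP request, where attributes arrive as `ce-` prefixed headers and the payload is the body. The helper `parse_binary_cloudevent` is hypothetical, and the sample header values (including the `ce-type` and `ce-source` strings) are illustrative assumptions rather than captured output; a production service would more likely use a CloudEvents SDK.

```python
import json


def parse_binary_cloudevent(headers: dict, body: bytes) -> dict:
    """Extract CloudEvent attributes from binary-mode HTTP headers."""
    # HTTP header names are case-insensitive, so normalise before matching.
    attrs = {
        k.lower()[3:]: v for k, v in headers.items() if k.lower().startswith("ce-")
    }
    missing = {"id", "source", "specversion", "type"} - set(attrs)
    if missing:
        raise ValueError(f"missing required attributes: {sorted(missing)}")
    attrs["data"] = json.loads(body) if body else None
    return attrs


# Illustrative request; the header values are assumptions, not real output.
event = parse_binary_cloudevent(
    {
        "Ce-Id": "1234",
        "Ce-Source": "https://10.0.0.1:443",
        "Ce-Specversion": "1.0",
        "Ce-Type": "dev.knative.apiserver.resource.add",
        "Content-Type": "application/json",
    },
    b'{"kind": "Pod", "metadata": {"name": "web-0"}}',
)
print(event["type"], event["data"]["metadata"]["name"])
```
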
Broker + Trigger — fan-out event routing
# Broker: event bus
apiVersion: eventing.knative.dev/v1
kind: Broker
metadata:
  name: default
  namespace: production

---
# Trigger: subscribe a service to specific event types
apiVersion: eventing.knative.dev/v1
kind: Trigger
metadata:
  name: order-created
  namespace: production
spec:
  broker: default
  filter:
    attributes:
      type: com.myapp.order.created    # only this CloudEvent type
  subscriber:
    ref:
      apiVersion: serving.knative.dev/v1
      kind: Service
      name: order-processor
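
The fan-out behaviour can be sketched as a filter check per Trigger: the Broker hands an event to every Trigger whose filter attributes all equal the event's attributes. The code below is a toy model with hypothetical names (`matches`, `fan_out`). The second trigger, `audit-log`, and the `com.myapp.order.cancelled` event type are invented for illustration; the unfiltered trigger receives everything.

```python
def matches(filter_attrs: dict, event: dict) -> bool:
    """Exact-match filter: every listed attribute must equal the event's value."""
    return all(event.get(k) == v for k, v in filter_attrs.items())


def fan_out(triggers: dict, event: dict) -> list:
    """Return the subscribers whose trigger filter accepts the event."""
    return [
        t["subscriber"] for t in triggers.values() if matches(t["filter"], event)
    ]


TRIGGERS = {
    "order-created": {
        "filter": {"type": "com.myapp.order.created"},
        "subscriber": "order-processor",
    },
    # Hypothetical second trigger with an empty filter: matches all events.
    "audit-log": {"filter": {}, "subscriber": "audit-service"},
}

created = {"type": "com.myapp.order.created", "source": "/orders"}
cancelled = {"type": "com.myapp.order.cancelled", "source": "/orders"}
print(fan_out(TRIGGERS, created))    # → ['order-processor', 'audit-service']
print(fan_out(TRIGGERS, cancelled))  # → ['audit-service']
```
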

KEDA — Event-Driven Autoscaling

KEDA (Kubernetes Event-Driven Autoscaling) scales standard Deployments and Jobs based on external event sources, with no Knative required. It ships more than 50 scalers, including Kafka, RabbitMQ, SQS, Azure Service Bus, Prometheus, and cron.

ScaledObject — scale on Kafka lag
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-consumer-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: kafka-consumer             # Deployment to scale
  minReplicaCount: 0                 # scale to zero when no messages
  maxReplicaCount: 30
  pollingInterval: 15                # check lag every 15 seconds
  cooldownPeriod: 300                # wait 5 min before scaling down
  triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka.production:9092
      consumerGroup: order-processors
      topic: orders
      lagThreshold: "100"            # 1 pod per 100 messages of lag
      offsetResetPolicy: latest
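
The comment on `lagThreshold` can be turned into arithmetic. The sketch below assumes the replica count is the total consumer-group lag divided by `lagThreshold`, rounded up, clamped to the min/max replica counts, and additionally capped at the topic's partition count (consumers beyond the partition count would sit idle). The function `kafka_replicas` and the partition count of 12 are hypothetical; the real calculation is carried out by KEDA together with the HPA and includes stabilisation behaviour not shown here.

```python
import math


def kafka_replicas(total_lag: int, lag_threshold: int = 100, partitions: int = 12,
                   min_replicas: int = 0, max_replicas: int = 30) -> int:
    """Rough replica estimate for a lag-based scaler (illustrative only)."""
    wanted = math.ceil(total_lag / lag_threshold)
    # More consumers than partitions would sit idle, so cap there as well.
    ceiling = min(max_replicas, partitions)
    return max(min_replicas, min(ceiling, wanted))


for lag in (0, 250, 5000):
    print(lag, "->", kafka_replicas(lag))
# → 0 -> 0
# → 250 -> 3
# → 5000 -> 12
```

The last case shows why raising `maxReplicaCount` alone may not help a lagging consumer group: under this assumption the partition count is the tighter limit.
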
ScaledJob — process each message as a separate Job
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: image-processor
spec:
  jobTargetRef:
    template:
      spec:
        containers:
        - name: processor
          image: myapp/image-processor:latest
        restartPolicy: Never
  triggers:
  - type: aws-sqs-queue
    metadata:
      queueURL: https://sqs.us-east-1.amazonaws.com/123/image-queue
      queueLength: "1"               # 1 Job per message
      awsRegion: us-east-1
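
With `queueLength: "1"`, each polling cycle aims for one Job per message. A hedged sketch of that bookkeeping: the target is the queue depth divided by `queueLength`, capped at a maximum, and Jobs already running are subtracted. `jobs_to_create` is a hypothetical helper, `max_replicas=100` is an assumed cap, and KEDA's configurable scaling strategies may differ from this simplification.

```python
import math


def jobs_to_create(queue_depth: int, running_jobs: int,
                   queue_length_target: int = 1, max_replicas: int = 100) -> int:
    """Estimate how many new Jobs one polling cycle would start."""
    wanted = min(math.ceil(queue_depth / queue_length_target), max_replicas)
    return max(0, wanted - running_jobs)


print(jobs_to_create(queue_depth=7, running_jobs=2))    # → 5
print(jobs_to_create(queue_depth=500, running_jobs=0))  # → 100 (capped)
print(jobs_to_create(queue_depth=0, running_jobs=3))    # → 0
```
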

When to Use Serverless

Good fit | Poor fit
Infrequent or bursty HTTP APIs (webhooks, background tasks) | Low-latency services where cold start > 100ms matters
Event consumers that process Kafka/SQS messages | Stateful workloads (databases, caches)
ML inference endpoints that spike then idle | Services with persistent WebSocket connections
Dev/staging environments where idle cost matters | High-traffic services always above 1 replica anyway

kn Commands

# Install kn CLI (Homebrew, from the Knative client tap)
brew install knative/client/kn

# List Knative Services
kn service list -n production

# Create a Knative Service
kn service create hello --image ghcr.io/myorg/hello:latest --scale-min 0 --scale-max 10

# Update image (creates new revision)
kn service update hello --image ghcr.io/myorg/hello:v2

# Split traffic 80/20
kn service update hello --traffic hello-00001=80 --traffic hello-00002=20

# Check revision list
kn revision list

# Check KEDA ScaledObjects
kubectl get scaledobject -A
kubectl describe scaledobject kafka-consumer-scaler -n production