Observability

Metrics Server & Prometheus Integration


Kubernetes clusters typically run two separate metrics systems that often get confused. Metrics Server serves the near-real-time resource snapshot that the HPA and kubectl top need. Prometheus is a time-series database that scrapes metrics from components across the cluster and stores them for querying and alerting. You almost always need both.

Two Metric Systems

	Metrics Server	Prometheus
What it is	In-memory aggregator of kubelet resource stats	Time-series database with pull-based scraping
Retention	None: only the latest sample per node and pod, held in memory	Configurable, from days to years (15 days by default)
Used by	HPA, VPA, kubectl top	Grafana dashboards, Alertmanager, custom tooling
Install	Single deployment, 1–2 replicas	Full stack: prometheus, alertmanager, exporters
Query	Kubernetes Metrics API	PromQL

Metrics Server

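Metrics Server scrapes CPU and memory usage from the kubelet on each node and serves it through the metrics.k8s.io API group. The official manifest sets the scrape interval to 15 seconds (--metric-resolution=15s). Recent releases read the kubelet's /metrics/resource endpoint; older ones used the Summary API. Install it with the official manifest: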

# Install Metrics Server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Verify it's running
kubectl get apiservice v1beta1.metrics.k8s.io

# Use it
kubectl top nodes
kubectl top pods -n production --sort-by=memory
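
The metrics.k8s.io API reports usage as Kubernetes quantity strings (CPU typically in nanocores such as 156340n, memory in Ki), which kubectl top converts for display. The sketch below shows one way to do that conversion yourself in Python. The pod_metrics dict is a hypothetical fragment shaped like a PodMetrics item, and the parser handles only a subset of suffixes.

```python
from decimal import Decimal

# Subset of the suffixes Kubernetes quantities can carry (assumption: the
# larger P/E/Pi/Ei suffixes are not needed for this illustration).
_DECIMAL = {"n": Decimal("1e-9"), "u": Decimal("1e-6"), "m": Decimal("1e-3"),
            "k": Decimal("1e3"), "M": Decimal("1e6"), "G": Decimal("1e9")}
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_quantity(text: str) -> Decimal:
    """Convert a quantity string such as '250m' or '128Mi' to a plain number."""
    for suffix, factor in _BINARY.items():
        if text.endswith(suffix):
            return Decimal(text[:-2]) * factor
    if text and text[-1] in _DECIMAL:
        return Decimal(text[:-1]) * _DECIMAL[text[-1]]
    return Decimal(text)


def pod_usage(pod_metrics: dict) -> dict:
    """Sum CPU (cores) and memory (bytes) across all containers of one pod."""
    cpu = sum(parse_quantity(c["usage"]["cpu"]) for c in pod_metrics["containers"])
    mem = sum(parse_quantity(c["usage"]["memory"]) for c in pod_metrics["containers"])
    return {"cpu_cores": cpu, "memory_bytes": mem}


# Hypothetical PodMetrics fragment; names and values are made up.
pod_metrics = {
    "metadata": {"name": "myapp-abc", "namespace": "production"},
    "containers": [
        {"name": "app", "usage": {"cpu": "250m", "memory": "128Mi"}},
        {"name": "sidecar", "usage": {"cpu": "12500000n", "memory": "65536Ki"}},
    ],
}

usage = pod_usage(pod_metrics)
print(float(usage["cpu_cores"]), int(usage["memory_bytes"]))
```

Decimal is used instead of float so that nanocore values add up without rounding noise.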

⚠️

TLS in local clusters

On local clusters such as kind or minikube, the kubelet serving certificates are often self-signed, so Metrics Server cannot verify them and fails to scrape. Adding --kubelet-insecure-tls to the Metrics Server container args skips verification; use it in dev environments only.

Prometheus Architecture

Prometheus metrics pipeline in Kubernetes

TARGETS (expose /metrics): kubelet (cAdvisor), kube-apiserver, kube-state-metrics, node-exporter, your app pods

PROMETHEUS: scrapes every 15s, stores in TSDB, evaluates rules, fires alerts

CONSUMERS: Grafana dashboards, Alertmanager, HPA custom metrics, PromQL API clients

Prometheus pulls (scrapes) from targets; targets do not push.

The kube-prometheus-stack Helm chart installs all of this in one command: Prometheus Operator, Prometheus, Alertmanager, Grafana, kube-state-metrics, node-exporter, and pre-built dashboards.

Prometheus scrapes /metrics endpoints from all targets on a fixed interval. Grafana queries Prometheus for dashboards; Alertmanager routes fired alerts to PagerDuty, Slack, etc.
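
To make the pull model concrete, here is a minimal sketch of what a target does: it answers GET /metrics with plain text in the Prometheus exposition format. It uses only the Python standard library. The REQUEST_COUNTS dict, the label names, and port 8080 are assumptions for illustration; a real application would normally use an official Prometheus client library rather than hand-rendering the format.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counters; a real app would update these per request.
REQUEST_COUNTS = {("GET", "200"): 1027, ("GET", "500"): 3}


def render_metrics(counts: dict) -> str:
    """Render counters in the Prometheus text exposition format."""
    lines = [
        "# HELP http_requests_total Total HTTP requests handled.",
        "# TYPE http_requests_total counter",
    ]
    for (method, status), value in sorted(counts.items()):
        lines.append(
            f'http_requests_total{{method="{method}",status="{status}"}} {value}'
        )
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves the current counter values on /metrics and 404s everything else."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUEST_COUNTS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Block forever, answering scrapes on the given port."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve() starts the endpoint. The application never contacts Prometheus; it only waits to be scraped.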

kube-prometheus-stack — install via Helm

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --set grafana.adminPassword=changeme \
  --set prometheus.prometheusSpec.retention=30d \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi

kube-state-metrics

The kubelet exposes container CPU and memory usage, but it exposes no metrics about Deployment replicas, Pod phase, HPA target ratios, or node conditions. kube-state-metrics fills that gap: it watches the Kubernetes API and exposes object-level state as Prometheus metrics, mostly gauges.

Metric	What it tells you
`kube_deployment_status_replicas_available`	Available replicas per Deployment; compare with `kube_deployment_spec_replicas` to spot degraded Deployments.
`kube_pod_status_phase`	One 0/1 series per pod and phase (Pending/Running/Failed/Succeeded/Unknown); sum by namespace and phase to count pods.
`kube_node_status_condition`	Node Ready, DiskPressure, MemoryPressure conditions.
`kube_persistentvolumeclaim_status_phase`	Pending/Bound/Lost PVCs.
`kube_job_status_failed`	Number of failed pods per Job; useful for CronJob alerting.
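
Because kube_pod_status_phase emits one series per pod and phase, with value 1 for the current phase and 0 for the others, counting pods means summing values rather than counting series. The Python sketch below illustrates this on made-up samples; the pod names and namespaces are hypothetical.

```python
from collections import Counter

# Hypothetical samples shaped like kube_pod_status_phase series:
# (labels, value), where value is 1 only for the pod's current phase.
samples = [
    ({"namespace": "production", "pod": "api-1", "phase": "Running"}, 1),
    ({"namespace": "production", "pod": "api-1", "phase": "Pending"}, 0),
    ({"namespace": "production", "pod": "api-2", "phase": "Running"}, 0),
    ({"namespace": "production", "pod": "api-2", "phase": "Pending"}, 1),
    ({"namespace": "batch", "pod": "job-1", "phase": "Failed"}, 1),
    ({"namespace": "batch", "pod": "job-1", "phase": "Running"}, 0),
]


def pods_per_phase(series):
    """Sum the 0/1 values per (namespace, phase), dropping empty groups."""
    totals = Counter()
    for labels, value in series:
        totals[(labels["namespace"], labels["phase"])] += value
    return {key: total for key, total in totals.items() if total > 0}


def series_per_phase(series):
    """Count series per (namespace, phase); this overcounts, shown for contrast."""
    return Counter((labels["namespace"], labels["phase"]) for labels, _ in series)


print(pods_per_phase(samples))
print(series_per_phase(samples)[("production", "Running")])
```

Summing yields one Running pod in production, while counting series reports two because the zero-valued series for api-2 is included.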

Scrape Configs & ServiceMonitor

The Prometheus Operator introduces ServiceMonitor and PodMonitor CRDs. Instead of editing Prometheus config files, you declare what to scrape in a Kubernetes object.

ServiceMonitor — scrape your app's /metrics

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp
  namespace: production
  labels:
    release: kube-prometheus-stack    # must match Prometheus selector
spec:
  selector:
    matchLabels:
      app: myapp                      # select Services with this label
  endpoints:
  - port: http                        # named port on the Service
    path: /metrics
    interval: 15s
    scheme: http

Expose /metrics through your app's Service

apiVersion: v1
kind: Service
metadata:
  name: myapp
  namespace: production
  labels:
    app: myapp                        # must match ServiceMonitor selector
spec:
  ports:
  - name: http                        # port name must match ServiceMonitor
    port: 8080
    targetPort: 8080
  selector:
    app: myapp
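
The two manifests only connect if the labels and the port name line up. The following Python sketch models that matching on plain dicts, as a way to reason about why a target does or does not appear. It is a simplification under stated assumptions: it handles matchLabels only, ignores matchExpressions and namespaceSelector, and the function names are hypothetical rather than part of the Prometheus Operator.

```python
def labels_match(match_labels: dict, labels: dict) -> bool:
    """True when every matchLabels pair is present on the object."""
    return all(labels.get(key) == value for key, value in match_labels.items())


def matched_endpoints(service_monitor: dict, service: dict) -> list:
    """Return the endpoint port names that resolve to a named Service port."""
    selector = service_monitor["spec"]["selector"].get("matchLabels", {})
    if not labels_match(selector, service["metadata"].get("labels", {})):
        return []
    port_names = {port.get("name") for port in service["spec"]["ports"]}
    return [
        endpoint["port"]
        for endpoint in service_monitor["spec"]["endpoints"]
        if endpoint["port"] in port_names
    ]


# Trimmed-down versions of the ServiceMonitor and Service shown above.
service_monitor = {
    "spec": {
        "selector": {"matchLabels": {"app": "myapp"}},
        "endpoints": [{"port": "http", "path": "/metrics"}],
    }
}
service = {
    "metadata": {"labels": {"app": "myapp"}},
    "spec": {"ports": [{"name": "http", "port": 8080}]},
}

print(matched_endpoints(service_monitor, service))
```

An empty result corresponds to the common failure mode where the ServiceMonitor exists but no target shows up in Prometheus: either the Service labels or the port name do not match.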

PromQL Basics

essential PromQL queries for K8s

# CPU usage per pod (cores)
sum by (pod, namespace) (
  rate(container_cpu_usage_seconds_total{container!=""}[5m])
)

# Memory usage per pod (bytes)
sum by (pod, namespace) (
  container_memory_working_set_bytes{container!=""}
)

# Deployment availability ratio
kube_deployment_status_replicas_available /
kube_deployment_spec_replicas

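# HTTP error ratio per job (requires app to expose http_requests_total)
# Aggregate both sides first; otherwise only the 5xx series match each other.
sum by (job) (rate(http_requests_total{status=~"5.."}[5m])) /
sum by (job) (rate(http_requests_total[5m]))

# Pods not running (the metric is 0/1 per pod and phase, so sum instead of count)
sum by (namespace, phase) (kube_pod_status_phase{phase!~"Running|Succeeded"}) > 0

# Fraction of node memory still available (low values indicate memory pressure)
node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes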
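
Several of these queries depend on rate(), which turns an ever-increasing counter into a per-second value and compensates for counter resets (for example after a pod restart). The Python sketch below is a simplified model of that idea. It deliberately omits the extrapolation to window boundaries that Prometheus applies, so treat it as an illustration rather than a reimplementation; simple_rate is a hypothetical name.

```python
def simple_rate(samples):
    """Per-second increase of one counter series.

    samples: list of (timestamp_seconds, value) tuples in time order, all
    assumed to lie inside the query window (e.g. the last 5 minutes).
    Returns None when fewer than two samples are available.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        if current < previous:
            # Counter reset: the process restarted and counted up from zero.
            increase += current
        else:
            increase += current - previous
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# Hypothetical scrapes every 15s; the counter resets between t=30 and t=45.
scrapes = [(0, 100), (15, 130), (30, 160), (45, 10), (60, 40)]
print(simple_rate(scrapes))
```

Without the reset handling, the drop from 160 to 10 would show up as a large negative rate instead of a small positive one.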

Recording Rules & Alerting

PrometheusRule — alert on high error rate

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: myapp-alerts
  namespace: production
  labels:
    release: kube-prometheus-stack
spec:
  groups:
  - name: myapp
    interval: 30s
    rules:
    # Recording rule: pre-compute the per-job error ratio
    - record: job:http_error_rate:ratio5m
      expr: |
        sum by (job) (rate(http_requests_total{status=~"5.."}[5m])) /
        sum by (job) (rate(http_requests_total[5m]))

    # Alert rule — fire when error rate exceeds 1%
    - alert: HighErrorRate
      expr: job:http_error_rate:ratio5m > 0.01
      for: 5m                            # must be true for 5 min before firing
      labels:
        severity: warning
        team: backend
      annotations:
        summary: "High error rate on {{ $labels.job }}"
        description: "Error rate is {{ $value | humanizePercentage }} over 5m."
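
The for: clause means an alert passes through a pending state before it fires, and a single evaluation where the expression is false resets the clock. The Python sketch below models that state machine for one alert. The evaluation timestamps are hypothetical, and the model ignores details such as keep_firing_for and Alertmanager grouping.

```python
def alert_states(evaluations, for_seconds):
    """Return the alert state after each rule evaluation.

    evaluations: list of (timestamp_seconds, condition_is_true) in time order.
    for_seconds: the rule's `for` duration expressed in seconds.
    """
    states = []
    active_since = None  # when the condition most recently became true
    for timestamp, is_true in evaluations:
        if not is_true:
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = timestamp
        held = timestamp - active_since
        states.append("firing" if held >= for_seconds else "pending")
    return states


# Hypothetical evaluations every 60s with for: 5m (300s).
# The dip at t=120 resets the pending timer.
evaluations = [
    (0, True), (60, True), (120, False), (180, True), (240, True),
    (300, True), (360, True), (420, True), (480, True),
]
print(alert_states(evaluations, for_seconds=300))
```

In this run the alert only fires at t=480, five minutes after the condition became true again at t=180.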

kubectl Commands

# Check Metrics Server API availability
kubectl get apiservice v1beta1.metrics.k8s.io -o yaml | grep -A5 status

# Top nodes/pods
kubectl top nodes
kubectl top pods -A --sort-by=cpu
kubectl top pods -n production --sort-by=memory

# Port-forward Prometheus UI
kubectl port-forward svc/kube-prometheus-stack-prometheus 9090 -n monitoring

# Port-forward Grafana (the chart's Service listens on port 80 by default)
kubectl port-forward svc/kube-prometheus-stack-grafana 3000:80 -n monitoring

# List all ServiceMonitors
kubectl get servicemonitor -A

# Check Prometheus targets (also visible in the UI at /targets after port-forward)
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | {job:.labels.job, health:.health}'
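
The same targets check can be scripted when jq is not available. The Python sketch below filters the /api/v1/targets response down to unhealthy targets. It assumes the Prometheus port-forward above is running on localhost:9090; the function names and the sample payload are illustrative.

```python
import json
from urllib.request import urlopen


def unhealthy_targets(payload: dict) -> list:
    """Return job, health and last error for every target that is not up."""
    return [
        {
            "job": target["labels"].get("job"),
            "health": target["health"],
            "error": target.get("lastError", ""),
        }
        for target in payload["data"]["activeTargets"]
        if target["health"] != "up"
    ]


def fetch_targets(base_url: str = "http://localhost:9090") -> dict:
    """Fetch the targets list from the Prometheus HTTP API."""
    with urlopen(f"{base_url}/api/v1/targets") as response:
        return json.load(response)


# Hypothetical response fragment, trimmed to the fields used above.
sample = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"labels": {"job": "myapp"}, "health": "up", "lastError": ""},
            {"labels": {"job": "node-exporter"}, "health": "down",
             "lastError": "context deadline exceeded"},
        ]
    },
}

print(unhealthy_targets(sample))
```

Against a live cluster, unhealthy_targets(fetch_targets()) returns the same structure for real targets.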