Metrics Server & Prometheus Integration

Kubernetes clusters typically run two separate metrics pipelines that are often confused. Metrics Server serves the near-real-time CPU and memory snapshot that the HPA and kubectl top need. Prometheus is a time-series database that scrapes metrics from cluster components and workloads and stores them for querying and alerting. Most production clusters need both.

Two Metric Systems

| | Metrics Server | Prometheus |
| --- | --- | --- |
| What it is | In-memory aggregator of kubelet resource stats | Time-series database with pull-based scraping |
| Retention | None (latest samples only, held in memory) | Configurable, from days to years |
| Used by | HPA, VPA, kubectl top | Grafana dashboards, Alertmanager, custom tooling |
| Install | Single deployment, 1–2 replicas | Full stack: prometheus, alertmanager, exporters |
| Query | Kubernetes Metrics API (metrics.k8s.io) | PromQL |
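
To make the "Used by" row concrete, here is a minimal sketch of an HPA that scales on CPU utilization, which it reads from Metrics Server through the resource metrics API. The Deployment name myapp, the production namespace, the replica bounds, and the 70% target are illustrative assumptions, not values from a real workload.

```yaml
# Hypothetical HPA: all names and numbers below are placeholders.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp              # assumed Deployment to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource           # resource metrics come from metrics.k8s.io
    resource:
      name: cpu
      target:
        type: Utilization    # percentage of the pods' CPU requests
        averageUtilization: 70
```

Utilization targets are computed against container CPU requests, so the target pods need requests set for this to work. If Metrics Server is down, the HPA cannot read current usage and stops making scaling decisions from this metric.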

Metrics Server

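Metrics Server collects CPU and memory usage from the kubelet on each node (every 15 seconds with the official manifest's --metric-resolution=15s setting) and serves it through the metrics.k8s.io API group. Current releases read the kubelet's /metrics/resource endpoint; older releases used the Summary API. Install it with the official manifest: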

# Install Metrics Server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Verify it's running
kubectl get apiservice v1beta1.metrics.k8s.io

# Use it
kubectl top nodes
kubectl top pods -n production --sort-by=memory
⚠️ TLS in local clusters

On local clusters such as kind or minikube, kubelet serving certificates are usually self-signed, so Metrics Server cannot verify them and fails to scrape any metrics. Add --kubelet-insecure-tls to the Metrics Server container args to skip verification. Use this in dev environments only.
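
One way to add the flag without hand-editing the manifest is a Kustomize patch. The sketch below is for dev clusters only. It assumes the Deployment is named metrics-server in the kube-system namespace and that the Metrics Server container is the first container in the pod, as in the official manifest.

```yaml
# kustomization.yaml (dev clusters only)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
patches:
- target:
    kind: Deployment
    name: metrics-server
    namespace: kube-system
  # JSON 6902 patch: append one flag to the first container's args
  patch: |-
    - op: add
      path: /spec/template/spec/containers/0/args/-
      value: --kubelet-insecure-tls
```

Apply it with kubectl apply -k . from the directory that contains the file. Keeping the flag in a separate overlay makes it harder to carry into a production install by accident.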

Prometheus Architecture

Prometheus metrics pipeline in Kubernetes

TARGETS (expose /metrics): kubelet (cAdvisor), kube-apiserver, kube-state-metrics, node-exporter, your app pods
PROMETHEUS: scrapes every 15s, stores in TSDB, evaluates rules, fires alerts
CONSUMERS: Grafana dashboards, Alertmanager, HPA custom metrics, PromQL API clients

Prometheus pulls (scrapes) from targets; targets do not push.
kube-prometheus-stack (Helm chart) installs all of this in one command: Prometheus Operator, Grafana, kube-state-metrics, node-exporter, and pre-built dashboards.
Prometheus scrapes /metrics endpoints from all targets on a fixed interval. Grafana queries Prometheus for dashboards; Alertmanager routes fired alerts to PagerDuty, Slack, etc.
kube-prometheus-stack — install via Helm
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --set grafana.adminPassword=changeme \
  --set prometheus.prometheusSpec.retention=30d \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi
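
Long --set chains get hard to review, so the same settings are often kept in a values file instead. The fragment below is a sketch that mirrors the three flags above one-to-one. The file name values.yaml is an assumption, and changeme remains a placeholder that should be replaced.

```yaml
# values.yaml: equivalent of the --set flags above
grafana:
  adminPassword: changeme        # placeholder, replace before real use
prometheus:
  prometheusSpec:
    retention: 30d               # how long Prometheus keeps samples
    storageSpec:
      volumeClaimTemplate:
        spec:
          resources:
            requests:
              storage: 50Gi      # size of the PVC backing the TSDB
```

Pass it with -f values.yaml in place of the --set flags. A values file can be committed to version control, which makes later helm upgrade runs reproducible.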

kube-state-metrics

The kubelet exposes container CPU and memory usage, but it knows nothing about Deployment replicas, Pod phase, HPA targets, or node conditions. kube-state-metrics fills that gap: it watches the Kubernetes API and exposes object-level state as Prometheus metrics, mostly gauges.

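| Metric | What it tells you |
| --- | --- |
| kube_deployment_status_replicas_available | Available replicas per Deployment. Compare with kube_deployment_spec_replicas to spot degraded deployments. |
| kube_pod_status_phase | One series per pod and phase (Pending, Running, Succeeded, Failed, Unknown), set to 1 for the pod's current phase and 0 otherwise. Sum by namespace to count pods per phase. |
| kube_node_status_condition | Node conditions such as Ready, DiskPressure, and MemoryPressure. |
| kube_persistentvolumeclaim_status_phase | PVC phase: Pending, Bound, or Lost. |
| kube_job_status_failed | Number of failed pods per Job. Useful for CronJob alerting. |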
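
As a sketch of how these metrics turn into alerts, the PrometheusRule below fires on two of the rows above. The rule names, the 10-minute hold, and the monitoring namespace are assumptions for illustration. kube-prometheus-stack already ships comparable default rules, so treat this as a pattern rather than something you necessarily need to add.

```yaml
# Hypothetical rules built on kube-state-metrics; names and durations are placeholders.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: workload-state-alerts
  namespace: monitoring
  labels:
    release: kube-prometheus-stack   # must match the Prometheus rule selector
spec:
  groups:
  - name: workload-state
    rules:
    # Fewer available replicas than the spec asks for
    - alert: DeploymentReplicasUnavailable
      expr: |
        kube_deployment_status_replicas_available
          < kube_deployment_spec_replicas
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "{{ $labels.namespace }}/{{ $labels.deployment }} has unavailable replicas"
    # At least one failed pod recorded for a Job
    - alert: JobFailed
      expr: kube_job_status_failed > 0
      labels:
        severity: warning
      annotations:
        summary: "Job {{ $labels.namespace }}/{{ $labels.job_name }} has failed pods"
```

The for clause on the first rule keeps normal rolling updates, where available replicas dip briefly, from paging anyone.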

Scrape Configs & ServiceMonitor

The Prometheus Operator introduces ServiceMonitor and PodMonitor CRDs. Instead of editing Prometheus config files, you declare what to scrape in a Kubernetes object.

ServiceMonitor — scrape your app's /metrics
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp
  namespace: production
  labels:
    release: kube-prometheus-stack    # must match Prometheus selector
spec:
  selector:
    matchLabels:
      app: myapp                      # select Services with this label
  endpoints:
  - port: http                        # named port on the Service
    path: /metrics
    interval: 15s
    scheme: http
expose /metrics from your app's Service
apiVersion: v1
kind: Service
metadata:
  name: myapp
  namespace: production
  labels:
    app: myapp                        # must match ServiceMonitor selector
spec:
  ports:
  - name: http                        # port name must match ServiceMonitor
    port: 8080
    targetPort: 8080
  selector:
    app: myapp
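
The PodMonitor CRD mentioned above follows the same pattern but selects pods directly, which helps when a workload has no Service. This is a minimal sketch: the object name myapp-pods and the container port name metrics are assumptions, and the pod template must declare a containerPort with that name.

```yaml
# Hypothetical PodMonitor: scrape pods labeled app=myapp without a Service
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: myapp-pods
  namespace: production
  labels:
    release: kube-prometheus-stack    # must match Prometheus selector
spec:
  selector:
    matchLabels:
      app: myapp                      # select pods with this label
  podMetricsEndpoints:
  - port: metrics                     # named containerPort on the pod (assumed)
    path: /metrics
    interval: 15s
```

Use either a ServiceMonitor or a PodMonitor for a given workload. Declaring both for the same pods produces duplicate scrape targets.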

PromQL Basics

essential PromQL queries for K8s
# CPU usage per pod (cores)
sum by (pod, namespace) (
  rate(container_cpu_usage_seconds_total{container!=""}[5m])
)

# Memory usage per pod (bytes)
sum by (pod, namespace) (
  container_memory_working_set_bytes{container!=""}
)

# Deployment availability ratio
kube_deployment_status_replicas_available /
kube_deployment_spec_replicas

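# HTTP error ratio per job (requires app to expose http_requests_total with a status label)
# Aggregate both sides first; dividing the raw series matches each 5xx series with itself and always yields 1
sum by (job) (rate(http_requests_total{status=~"5.."}[5m])) /
sum by (job) (rate(http_requests_total[5m]))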

# Pods not running (the metric is 1 for a pod's current phase and 0 otherwise, so sum rather than count)
sum by (namespace, phase) (kube_pod_status_phase{phase!~"Running|Succeeded"}) > 0

# Fraction of node memory still available (low values indicate memory pressure)
node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes

Recording Rules & Alerting

PrometheusRule — alert on high error rate
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: myapp-alerts
  namespace: production
  labels:
    release: kube-prometheus-stack
spec:
  groups:
  - name: myapp
    interval: 30s
    rules:
    # Recording rule: pre-compute the per-job 5xx ratio
    - record: job:http_error_rate:ratio5m
      expr: |
        sum by (job) (rate(http_requests_total{status=~"5.."}[5m])) /
        sum by (job) (rate(http_requests_total[5m]))

    # Alert rule — fire when error rate exceeds 1%
    - alert: HighErrorRate
      expr: job:http_error_rate:ratio5m > 0.01
      for: 5m                            # must be true for 5 min before firing
      labels:
        severity: warning
        team: backend
      annotations:
        summary: "High error rate on {{ $labels.job }}"
        description: "Error rate is {{ $value | humanizePercentage }} over 5m."
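
The severity and team labels on the alert exist so Alertmanager can route on them. With the Prometheus Operator, one way to declare that routing is an AlertmanagerConfig object. The sketch below sends team=backend alerts to Slack. The receiver name, the channel, and the Secret slack-webhook with key url are all hypothetical, and it assumes your Alertmanager resource is configured to select AlertmanagerConfig objects.

```yaml
# Hypothetical routing: names, channel, and Secret are placeholders.
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: backend-routing
  namespace: production
spec:
  route:
    receiver: backend-slack
    groupBy: ['alertname', 'job']
    matchers:
    - name: team              # matches the team label set on the alert
      value: backend
      matchType: "="
  receivers:
  - name: backend-slack
    slackConfigs:
    - apiURL:
        name: slack-webhook   # assumed Secret in the same namespace
        key: url              # key holding the webhook URL
      channel: '#backend-alerts'
      sendResolved: true
```

By default the operator scopes each AlertmanagerConfig to alerts whose namespace label equals the object's own namespace. An alert that aggregates the namespace label away, for example with sum by (job), will not match unless you keep or add that label.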

kubectl Commands

# Check Metrics Server API availability
kubectl get apiservice v1beta1.metrics.k8s.io -o yaml | grep -A5 status

# Top nodes/pods
kubectl top nodes
kubectl top pods -A --sort-by=cpu
kubectl top pods -n production --sort-by=memory

# Port-forward Prometheus UI
kubectl port-forward svc/kube-prometheus-stack-prometheus 9090 -n monitoring

# Port-forward Grafana (the chart's Service listens on port 80 by default)
kubectl port-forward svc/kube-prometheus-stack-grafana 3000:80 -n monitoring

# List all ServiceMonitors
kubectl get servicemonitor -A

# Check Prometheus targets (also visible in the UI at /targets after port-forward)
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | {job:.labels.job, health:.health}'