Kubernetes Architecture
A Kubernetes cluster consists of two types of nodes: control plane nodes, which manage the cluster, and worker nodes, which run your workloads. Every component is designed to be independently replaceable and horizontally scalable. This guide walks through each component, what it does, and how the pieces interact.
Cluster Overview
At the highest level, a Kubernetes cluster is a set of machines (nodes) that run containerised applications. Every cluster has at minimum:
- A control plane — the brain of the cluster. Manages state, makes scheduling decisions, and exposes the Kubernetes API.
- At least one worker node — runs the pods (groups of containers) that make up your applications.
In production, the control plane runs on dedicated nodes (often 3 for high availability) and worker nodes are separate. In development clusters (like minikube), everything runs on a single node.
Control Plane
The control plane is responsible for maintaining the desired state of the cluster — what applications are running, which container images they use, and how many replicas of each. It consists of four main components.
kube-apiserver
The API server is the front door to Kubernetes. Every operation in the cluster — whether triggered by kubectl, a CI/CD pipeline, or an internal controller — goes through the API server. It:
- Validates and processes REST API requests
- Is the only component that reads from and writes to etcd
- Authenticates and authorises every request (RBAC)
- Listens on port 6443 (HTTPS)
The API server is designed to scale horizontally — you can run multiple instances behind a load balancer for high availability.
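To make the request pipeline concrete, here is a toy sketch of the three steps above: validate, authorise, persist. Every name in it (the rule table, the store layout, the functions) is invented for illustration and is not a real Kubernetes API.

```python
# Toy sketch of the API server's request pipeline. All names here are
# illustrative, not real Kubernetes APIs.

store = {}  # stands in for etcd: the API server is its only writer

def authorised(user: str, verb: str, resource: str) -> bool:
    # A drastically simplified RBAC check: a static rule table.
    rules = {("alice", "create", "deployments"), ("alice", "get", "deployments")}
    return (user, verb, resource) in rules

def handle_request(user: str, verb: str, resource: str, name: str, manifest: dict):
    # 1. Validate the manifest (here: just require apiVersion and kind).
    if "apiVersion" not in manifest or "kind" not in manifest:
        return 400, "invalid manifest"
    # 2. Authorise the request (RBAC).
    if not authorised(user, verb, resource):
        return 403, "forbidden"
    # 3. Persist the object (only the API server touches the store).
    store[f"/registry/{resource}/default/{name}"] = manifest
    return 201, "created"

status, _ = handle_request("alice", "create", "deployments", "web",
                           {"apiVersion": "apps/v1", "kind": "Deployment"})
```

The ordering matters: a request that fails validation or RBAC never reaches the store, which is why the API server can be the single gatekeeper for etcd.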
etcd
etcd is a consistent, distributed key-value store used as Kubernetes' backing store for all cluster data. It holds the complete state of the cluster: what pods exist, what services are configured, what secrets are stored, what nodes have joined.
If you lose etcd without a backup, you lose your cluster's entire state. In production, etcd should run as a 3- or 5-node cluster for fault tolerance, and you should take regular snapshots: etcdctl snapshot save snapshot.db.
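Kubernetes lays objects out in etcd under hierarchical keys, typically of the form /registry/&lt;resource&gt;/&lt;namespace&gt;/&lt;name&gt;. A toy model with invented data shows why that layout is useful: listing all pods in a namespace becomes a single prefix read.

```python
# Toy model of the etcd key layout Kubernetes uses. Real clusters store
# objects under /registry/<resource>/<namespace>/<name>; the entries
# below are invented for illustration.

etcd = {
    "/registry/pods/default/web-7d4b9c": {"kind": "Pod", "node": "worker-1"},
    "/registry/pods/default/web-9f2a1e": {"kind": "Pod", "node": "worker-2"},
    "/registry/services/default/web":    {"kind": "Service"},
}

def list_prefix(prefix: str) -> list[str]:
    # etcd supports efficient range reads over a key prefix; this is how
    # "list all pods in namespace default" is served.
    return sorted(k for k in etcd if k.startswith(prefix))

pods = list_prefix("/registry/pods/default/")
```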
kube-scheduler
The scheduler watches for newly created pods that have no node assigned, and selects a node for them to run on. It evaluates multiple factors:
- Resource requirements (CPU, memory requests/limits)
- Hardware/software/policy constraints (node selectors, affinity/anti-affinity rules)
- Data locality and inter-workload interference
- Deadlines and taints/tolerations
The scheduler does not run pods — it just decides where they should run, writing the decision back to the API server.
kube-controller-manager
The controller manager runs a collection of control loops (controllers) that watch cluster state and make changes to move the actual state toward the desired state. Key controllers include:
| Controller | Responsibility |
|---|---|
| Node controller | Notices and responds when nodes go down |
| Job controller | Watches Job objects and creates pods to run one-off tasks |
| EndpointSlice controller | Populates EndpointSlice objects (linking Services to Pods) |
| ServiceAccount controller | Creates default ServiceAccounts for new namespaces |
| ReplicaSet controller | Maintains the correct number of pod replicas |
| Deployment controller | Manages Deployments, creating/updating ReplicaSets |
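All of these controllers share one shape: observe actual state, compare it to desired state, act to close the gap. A sketch of a ReplicaSet-style loop, with invented pod names, shows the pattern:

```python
# Sketch of a control loop like the ReplicaSet controller's: compare the
# desired replica count to the pods that actually exist and act on the
# difference. Pod names are invented.

def reconcile(desired_replicas: int, actual_pods: list[str]) -> list[str]:
    diff = desired_replicas - len(actual_pods)
    if diff > 0:
        # Too few pods: create the missing ones.
        actual_pods = actual_pods + [
            f"web-{i}" for i in range(len(actual_pods), desired_replicas)
        ]
    elif diff < 0:
        # Too many pods: delete the surplus.
        actual_pods = actual_pods[:desired_replicas]
    return actual_pods

state = ["web-0"]
state = reconcile(3, state)  # scale up to three replicas
```

Because each pass starts from observed state, the loop is self-correcting: it converges on the desired count whether a pod crashed, a node died, or a user scaled the workload.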
Worker Nodes
Worker nodes are the machines that actually run your workloads. Every worker node runs three core components.
kubelet
The kubelet is an agent that runs on every worker node. It receives pod specifications (PodSpecs) from the API server and ensures the containers described in them are running and healthy. Specifically:
- Talks to the container runtime via the Container Runtime Interface (CRI)
- Reports node and pod status back to the API server
- Runs liveness, readiness, and startup probes
- Manages pod lifecycle (creation, restart, deletion)
The kubelet does not manage containers that were not created by Kubernetes — it only manages pods.
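One iteration of the kubelet's work can be sketched as a sync loop: probe each container, restart the ones that fail, report status. The container data and the assumption that a restart restores health are both invented for illustration.

```python
# Toy model of the kubelet's liveness-probe handling: probe each
# container and restart failures. Probe results are hard-coded here.

containers = {
    "web": {"healthy": True,  "restarts": 0},
    "api": {"healthy": False, "restarts": 0},
}

def sync_loop() -> dict:
    # One iteration of the kubelet's sync: run probes, restart failed
    # containers, and report status back (here, by returning it).
    for name, c in containers.items():
        if not c["healthy"]:
            c["restarts"] += 1
            c["healthy"] = True  # pretend the restart fixed it
    return {name: c["restarts"] for name, c in containers.items()}

status = sync_loop()
```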
kube-proxy
kube-proxy runs on each node and maintains the network rules that let traffic reach pods from clients inside or outside the cluster. It implements part of the Kubernetes Service concept: when you create a Service, kube-proxy installs iptables (or IPVS) rules that route traffic addressed to the Service to the correct pod endpoints.
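The effect of those rules can be sketched in a few lines: packets sent to a Service's ClusterIP are spread across the pod endpoints backing it. The addresses below are invented, and the real mechanism is kernel-level iptables/IPVS rules rather than userspace code.

```python
# Sketch of what kube-proxy's rules accomplish: a Service's ClusterIP
# fans out to its pod endpoints. Addresses are invented.
import itertools

endpoints = {"10.96.0.10": ["10.244.1.5", "10.244.2.7", "10.244.2.9"]}
_rr = {svc: itertools.cycle(pods) for svc, pods in endpoints.items()}

def route(cluster_ip: str) -> str:
    # Pick the next backend pod. Round-robin is used here for clarity;
    # iptables mode picks backends randomly, while IPVS supports
    # several balancing algorithms.
    return next(_rr[cluster_ip])

first, second = route("10.96.0.10"), route("10.96.0.10")
```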
Container Runtime
The container runtime is the software responsible for pulling container images and running them. Kubernetes supports any runtime that implements the CRI (Container Runtime Interface). Common choices:
| Runtime | Notes |
|---|---|
| containerd | Default for most managed K8s offerings (EKS, GKE, AKS). Lightweight, OCI-compliant. |
| CRI-O | Purpose-built for Kubernetes, used by OpenShift. Minimal footprint. |
| Docker Engine (via cri-dockerd) | Docker support was deprecated in K8s 1.20 and removed in 1.24. Uses cri-dockerd shim. |
Add-ons
Add-ons extend the functionality of a Kubernetes cluster. They use cluster resources (DaemonSets, Deployments, etc.) to implement cluster features. Essential add-ons include:
- CoreDNS — provides DNS for the cluster. Every Service gets a DNS name. Required for service discovery.
- CNI plugin (Calico, Flannel, Cilium) — provides pod networking, implementing the Kubernetes network model.
- Metrics Server — provides resource metrics (CPU/memory) for the Horizontal Pod Autoscaler and kubectl top.
- Dashboard — optional web UI for cluster management.
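The DNS names CoreDNS hands out follow a fixed pattern, &lt;service&gt;.&lt;namespace&gt;.svc.&lt;cluster-domain&gt;, where cluster.local is the common default domain. A one-line helper makes the scheme explicit:

```python
# Every Service gets a predictable in-cluster DNS name of the form
# <service>.<namespace>.svc.<cluster-domain>. "cluster.local" is the
# usual default cluster domain, though it is configurable.

def service_fqdn(service: str, namespace: str,
                 cluster_domain: str = "cluster.local") -> str:
    return f"{service}.{namespace}.svc.{cluster_domain}"

name = service_fqdn("web", "default")
```

This predictability is what makes service discovery work: an application can be configured with web.default.svc.cluster.local before the Service's ClusterIP is even known.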
API Request Flow
Understanding how a kubectl apply command flows through the cluster helps demystify Kubernetes. When you run kubectl apply -f deployment.yaml:
1. kubectl reads your kubeconfig, authenticates, and sends an HTTP request to the kube-apiserver.
2. The API server validates the manifest, authorises the request via RBAC, and persists the object to etcd.
3. The Deployment controller (in kube-controller-manager) notices the new Deployment and creates a ReplicaSet.
4. The ReplicaSet controller notices it needs N pods and creates Pod objects via the API server.
5. The kube-scheduler notices the unscheduled pods and assigns each one to a node.
6. The kubelet on the chosen node notices the pod assignment and instructs containerd to pull the image and start the container.
7. containerd starts the container; the kubelet reports running status back to the API server.
The entire process is event-driven and eventually consistent. Each component watches for its specific changes and reacts — no component directly calls another.
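The whole flow can be simulated in miniature to make the "no component calls another" point tangible: each toy component below reads and writes only shared state, and the pods still end up running. Everything here is a heavy simplification with invented names (validation, RBAC, watches, and real scheduling are all omitted).

```python
# Toy, event-driven walkthrough of the kubectl apply flow: each
# "component" reacts only to state in the store, never calls another.

store = {}  # stands in for etcd (written via the "API server" below)

def api_create(key: str, obj: dict):
    store[key] = obj  # validation and RBAC omitted for brevity

def deployment_controller():
    # Notices Deployments with no ReplicaSet and creates one.
    for key, obj in list(store.items()):
        if obj["kind"] == "Deployment" and not any(
                o["kind"] == "ReplicaSet" and o["owner"] == key
                for o in store.values()):
            api_create(key.replace("deployments", "replicasets"),
                       {"kind": "ReplicaSet", "owner": key,
                        "replicas": obj["replicas"]})

def replicaset_controller():
    # Notices ReplicaSets short on pods and creates Pod objects.
    for key, obj in list(store.items()):
        if obj["kind"] == "ReplicaSet":
            owned = [k for k, o in store.items() if o.get("owner") == key]
            for i in range(len(owned), obj["replicas"]):
                api_create(f"/pods/web-{i}",
                           {"kind": "Pod", "owner": key, "node": None})

def scheduler():
    # Notices unscheduled pods and assigns a node (real scoring omitted).
    for obj in store.values():
        if obj["kind"] == "Pod" and obj["node"] is None:
            obj["node"] = "worker-1"

def kubelet():
    # Notices pods assigned to its node and "starts" them.
    for obj in store.values():
        if obj["kind"] == "Pod" and obj["node"] == "worker-1":
            obj["phase"] = "Running"  # pretend containerd started it

api_create("/deployments/web", {"kind": "Deployment", "replicas": 2})
for component in (deployment_controller, replicaset_controller,
                  scheduler, kubelet):
    component()  # in reality these all watch and react concurrently

running = sum(1 for o in store.values() if o.get("phase") == "Running")
```

Running the components in sequence is a simplification; in a real cluster they watch the API server concurrently, which is exactly why the system is eventually consistent rather than transactional.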