Building Kubernetes Operators

An operator is a controller that encodes human operational knowledge into software. A database operator knows how to provision a replica set, take backups, handle failover, and run schema migrations — all triggered by changes to a Custom Resource. This guide covers the reconcile loop pattern, kubebuilder scaffolding, and the mechanics that make controllers reliable.

What Is an Operator

Kubernetes controllers are control loops: observe current state, compare it to desired state, act to close the gap. The Deployment controller does this for ReplicaSets, which the ReplicaSet controller in turn reconciles into pods. An operator does the same thing for your application-specific concepts: a Database CR, a KafkaCluster CR, an MLTrainingJob CR.

operator maturity levels
# Level 1 — Basic Install
# Operator provisions the app from a CR. No operational logic.

# Level 2 — Seamless Upgrades
# Operator handles version upgrades without downtime.

# Level 3 — Full Lifecycle
# Operator handles backup, restore, failure recovery.

# Level 4 — Deep Insights
# Operator exposes metrics, SLO checks, anomaly detection.

# Level 5 — Auto Pilot
# Operator auto-scales, auto-tunes, self-heals without human input.

The Reconcile Loop

Figure: Reconcile loop, the heart of every controller. Watch for changes → compare desired vs current state → act to converge.
- Watch (informer): a CR is created, updated, or deleted → its key is enqueued.
- Reconcile(): fetch the desired state from the CR, then fetch the current state from the cluster.
- In sync: return a nil error (optionally requeue after an interval).
- Drift or error: apply changes; if something fails, return the error so the key is requeued.

Idempotent: Reconcile() must produce the same result whether it is called once or a hundred times. Events can fire multiple times, so the controller must handle duplicates safely.
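To make the loop and the idempotency rule concrete, here is a stdlib-only Go toy rather than controller-runtime code. It assumes a work queue that ignores a key already waiting to be processed (similar in spirit to client-go's queue), and a reconcile step that re-reads desired and current replica counts and acts only on drift. The names `queue`, `cluster`, and `reconcile` are hypothetical.

```go
package main

import "fmt"

// queue is a toy work queue (hypothetical, not client-go): adding a key that
// is already waiting is a no-op, so a burst of events for one object
// collapses into a single reconcile.
type queue struct {
    order   []string
    pending map[string]bool
}

func newQueue() *queue { return &queue{pending: map[string]bool{}} }

func (q *queue) Add(key string) {
    if q.pending[key] {
        return
    }
    q.pending[key] = true
    q.order = append(q.order, key)
}

func (q *queue) Get() (string, bool) {
    if len(q.order) == 0 {
        return "", false
    }
    key := q.order[0]
    q.order = q.order[1:]
    delete(q.pending, key)
    return key, true
}

// cluster holds desired state (from CR specs) and current state (what runs).
type cluster struct {
    desired map[string]int // key -> spec.replicas
    current map[string]int // key -> replicas actually running
    actions int            // number of changes applied
}

// reconcile re-reads both states and acts only on drift, so calling it once
// or a hundred times for the same key leaves the cluster in the same state.
func (c *cluster) reconcile(key string) {
    want, exists := c.desired[key]
    have, running := c.current[key]
    switch {
    case !exists && running: // CR deleted: remove what we created
        delete(c.current, key)
        c.actions++
    case exists && (!running || have != want): // missing or drifted: converge
        c.current[key] = want
        c.actions++
    }
    // Otherwise in sync: nothing to do.
}

func main() {
    c := &cluster{
        desired: map[string]int{"default/production-db": 3},
        current: map[string]int{},
    }
    q := newQueue()
    for i := 0; i < 3; i++ { // three events arrive before the worker runs
        q.Add("default/production-db")
    }
    for key, ok := q.Get(); ok; key, ok = q.Get() {
        c.reconcile(key)
    }
    c.reconcile("default/production-db") // a periodic resync changes nothing
    fmt.Println(c.current["default/production-db"], c.actions) // 3 1
}
```

Three events produce one change, and the extra resync is a no-op: the outcome depends only on the two states, never on how many times the key was delivered.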

kubebuilder Scaffolding

kubebuilder generates the boilerplate for a controller project: CRD types, controller skeleton, RBAC markers, and Makefile targets for building and deploying.

kubebuilder — scaffold a new operator
# Init a new Go module + operator project
kubebuilder init --domain myapp.io --repo github.com/myorg/database-operator

# Create a new API (CRD + controller).
# The full API group is <group>.<domain>, here myapp.myapp.io
kubebuilder create api --group myapp --version v1 --kind Database

# Generated structure:
# api/v1/database_types.go      ← CRD struct definition
# internal/controller/          ← controller logic
# config/crd/                   ← generated CRD YAML
# config/rbac/                  ← generated RBAC YAML

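# Regenerate CRD and RBAC manifests from Go types and +kubebuilder markers
make manifests

# Install the CRDs into the cluster in your current kubeconfig
make install

# Run the controller locally against that cluster (no image build or deploy needed)
make run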

Writing a Controller

reconciler skeleton — Go with controller-runtime
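package controller

import (
    "context"
    "time"

    appsv1 "k8s.io/api/apps/v1"
    apierrors "k8s.io/apimachinery/pkg/api/errors"
    "k8s.io/apimachinery/pkg/runtime"
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"

    myappv1 "github.com/myorg/database-operator/api/v1"
)

type DatabaseReconciler struct {
    client.Client
    Scheme *runtime.Scheme
}

// +kubebuilder:rbac:groups=myapp.myapp.io,resources=databases,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=myapp.myapp.io,resources=databases/status,verbs=get;update;patch
// +kubebuilder:rbac:groups=myapp.myapp.io,resources=databases/finalizers,verbs=update
// +kubebuilder:rbac:groups=apps,resources=statefulsets,verbs=get;list;watch;create;update;patch

func (r *DatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    log := ctrl.LoggerFrom(ctx)

    // 1. Fetch the desired state (the CR). NotFound means it was deleted: nothing to do.
    db := &myappv1.Database{}
    if err := r.Get(ctx, req.NamespacedName, db); err != nil {
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    // 2. Fetch current state (the StatefulSet we manage, named after the CR)
    sts := &appsv1.StatefulSet{}
    err := r.Get(ctx, req.NamespacedName, sts)
    switch {
    case apierrors.IsNotFound(err):
        // 3. Reconcile: create the StatefulSet if it is missing.
        // buildStatefulSet (not shown) renders the desired StatefulSet from db.Spec.
        log.Info("Creating StatefulSet", "name", db.Name)
        sts = r.buildStatefulSet(db)
        // Owner reference, so the StatefulSet is garbage-collected with the CR
        if err := ctrl.SetControllerReference(db, sts, r.Scheme); err != nil {
            return ctrl.Result{}, err
        }
        return ctrl.Result{}, r.Create(ctx, sts)
    case err != nil:
        return ctrl.Result{}, err
    }
    // A full controller would also compare sts.Spec with the desired spec
    // and update the StatefulSet here if they differ.

    // 4. Update status (simplified: a real controller derives this from sts.Status)
    db.Status.Phase = "Running"
    if err := r.Status().Update(ctx, db); err != nil {
        return ctrl.Result{}, err
    }

    // Periodic resync as a safety net for drift that produces no watch event
    return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
}

// SetupWithManager registers the controller: reconcile when a Database changes
// and when a StatefulSet owned by a Database changes.
func (r *DatabaseReconciler) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
        For(&myappv1.Database{}).
        Owns(&appsv1.StatefulSet{}).
        Complete(r)
}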

Owner References

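Set an owner reference from each managed resource (StatefulSet, Service) back to the CR. When the CR is deleted, the Kubernetes garbage collector deletes the owned resources automatically, so the controller needs no cleanup code for them. Owner references cannot cross namespaces: a namespaced resource can only be owned by an object in the same namespace or by a cluster-scoped object.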

// Set owner reference so StatefulSet is GC'd when Database CR is deleted
if err := ctrl.SetControllerReference(db, sts, r.Scheme); err != nil {
    return ctrl.Result{}, err
}

// This sets:
// sts.OwnerReferences = [{
//   apiVersion: myapp.myapp.io/v1,
//   kind: Database,
//   name: production-db,
//   uid: ...,
//   controller: true,
//   blockOwnerDeletion: true
// }]
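A toy model of what the garbage collector does with these references (background cascading deletion), stdlib-only; `object`, `store`, and `deleteCascade` are hypothetical. The real collector runs asynchronously, and when a dependent still has another live owner it removes the dangling reference instead of deleting the dependent; the toy mirrors that by keeping such objects.

```go
package main

import "fmt"

// object is a toy stand-in for a Kubernetes object: a UID and the UIDs
// listed in its ownerReferences.
type object struct {
    uid    string
    owners []string
}

// store maps UID -> object, like a tiny API server.
type store map[string]object

func (s store) hasLiveOwner(o object) bool {
    for _, owner := range o.owners {
        if _, ok := s[owner]; ok {
            return true
        }
    }
    return false
}

// deleteCascade deletes uid, then every dependent of it whose owners are
// now all gone, recursively. A dependent with a live owner is kept.
func (s store) deleteCascade(uid string) {
    delete(s, uid)
    for _, o := range s {
        for _, owner := range o.owners {
            if owner == uid && !s.hasLiveOwner(o) {
                s.deleteCascade(o.uid)
                break
            }
        }
    }
}

func main() {
    s := store{
        "db":  {uid: "db"},                           // Database CR
        "sts": {uid: "sts", owners: []string{"db"}},  // StatefulSet owned by the CR
        "pod": {uid: "pod", owners: []string{"sts"}}, // Pod owned by the StatefulSet
        "cfg": {uid: "cfg"},                          // unrelated ConfigMap
    }
    s.deleteCascade("db")
    _, kept := s["cfg"]
    fmt.Println(len(s), kept) // 1 true
}
```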

Finalizers

Finalizers prevent Kubernetes from deleting a CR until your controller has run cleanup logic. Without a finalizer, the CR is deleted immediately — you never get a chance to clean up external resources (cloud databases, DNS records, S3 buckets).

// Also imports "sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
const dbFinalizer = "myapp.io/database-finalizer"

func (r *DatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    db := &myappv1.Database{}
    if err := r.Get(ctx, req.NamespacedName, db); err != nil {
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    // Being deleted — run cleanup before allowing deletion
    if !db.DeletionTimestamp.IsZero() {
        if controllerutil.ContainsFinalizer(db, dbFinalizer) {
            // Run cleanup: delete RDS instance, remove DNS records, etc.
            if err := r.cleanupExternalResources(ctx, db); err != nil {
                return ctrl.Result{}, err
            }
            // Remove finalizer — K8s will now delete the CR
            controllerutil.RemoveFinalizer(db, dbFinalizer)
            return ctrl.Result{}, r.Update(ctx, db)
        }
        return ctrl.Result{}, nil
    }

    // Not being deleted — ensure finalizer is present
    if !controllerutil.ContainsFinalizer(db, dbFinalizer) {
        controllerutil.AddFinalizer(db, dbFinalizer)
        return ctrl.Result{}, r.Update(ctx, db)
    }
    // ... rest of reconcile
    return ctrl.Result{}, nil
}

Status Conditions

Use the standard metav1.Condition type for status conditions — it integrates with kubectl wait and tooling that understands the Kubernetes condition convention.

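// Imports: meta "k8s.io/apimachinery/pkg/api/meta", metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
// Set a condition on the CR status (in memory)
meta.SetStatusCondition(&db.Status.Conditions, metav1.Condition{
    Type:               "Ready",
    Status:             metav1.ConditionTrue,
    ObservedGeneration: db.Generation,
    Reason:             "DatabaseRunning",
    Message:            "All replicas are ready",
})
// Persist it through the status subresource
if err := r.Status().Update(ctx, db); err != nil {
    return ctrl.Result{}, err
}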

// kubectl can wait on it:
// kubectl wait database/production-db --for=condition=Ready --timeout=5m

When NOT to Write One

Operators have real costs: a Go binary to maintain, CRD schema to version, RBAC to audit. Before writing one, check: