The Homelab

Production practices. Residential square footage.

It deploys from git. It pages me when it breaks. It serves real traffic from my house. Everything else I build is virtual — I wanted to run real cables. So I did.

3

M4 Mac mini k3s nodes

apps in production

100%

deployed from git (Flux CD)

500/500

Mbps, symmetric, to the street

The homelab rack: a UniFi switch and gateway glowing above three M4 Mac minis seated in blue 3D-printed alignment racks

From the street to a pod

Every request to an app on this page travels the same road: Starry’s fixed-wireless link to the roof, the UniFi gateway, a Traefik ingress, a pod on one of the minis. The part you can’t see is the second network riding on top — a Tailscale mesh that every Mac and VM joins, so cluster traffic moves on encrypted WireGuard paths no matter which physical box a pod lands on.

One rack, two networks: the gray path is what's physically plugged in; the blue dashes are the WireGuard mesh the cluster actually talks over.

Starry fixed wireless

500 Mbps symmetric. The upload matters more than the download when your house is the origin server.

UniFi gateway

Hands out the LAN, forwards exactly the ports the front door needs, and nothing else.

Tailscale mesh

Every Mac and VM is a node. Cluster traffic rides WireGuard, and the VMs hold real LAN IPs so peers connect directly — no DERP relay.

Cloudflare + cert-manager

Public names and TLS renew themselves. I haven’t thought about a certificate in months.

The fleet

Three M4 Mac minis. They were AI agent workstations before they were cluster nodes, and Apple Silicon’s performance-per-watt means the whole rack is silent and barely registers on the power bill.

jake-mini

control plane

M4 Mac mini, headless
24 GB RAM · 20 GiB to the k3s server VM
macOS keeps ~4 GiB and stays out of the way

agent5

worker

M4 Mac mini, headless
16 GB RAM · 12 GiB to the worker VM
Named for its first job: running agents

agentnan

worker

M4 Mac mini, headless
16 GB RAM · 12 GiB to the worker VM
Same story, different name

Front view of the rack cart: the three M4 Mac minis side by side in their 3D-printed alignment racks beneath the UniFi gear

The NOC in the room

An Apple Silicon iMac runs as a dedicated kiosk: always on, always showing Grafana. It’s provisioned by the same Ansible as the fleet — hardening and a node_exporter, nothing else — and it is deliberately not a cluster node. Its only job is to make the cluster’s health ambient.

You notice problems differently when the dashboard is furniture. A pod stuck in CrashLoopBackOff isn’t an email you read tomorrow; it’s a red panel you walk past on the way to the kitchen.

An iMac on a desk glowing with the Grafana cluster dashboard — the same 'all systems go' panels, on all the time

The platform

This is a platform, not a pile of containers. Git is the source of truth: Flux reconciles every manifest in the repo, secrets live encrypted next to the code they configure, and nothing gets kubectl apply’d by hand. If the rack burned down, the cluster would be an Ansible run and a Flux bootstrap away from existing again.
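The reconcile idea can be sketched as a toy model in Python. This is not how Flux is implemented: the repo and the cluster are stand-in dictionaries mapping a manifest name to its content, and `plan_reconcile` is a hypothetical helper.

```python
def plan_reconcile(desired: dict[str, str], actual: dict[str, str]) -> dict[str, list[str]]:
    """Compare what git declares with what is running and list the changes needed."""
    create = sorted(name for name in desired if name not in actual)
    update = sorted(
        name for name in desired if name in actual and desired[name] != actual[name]
    )
    # Pruning (assumed to be enabled here): anything running that git
    # no longer declares gets removed.
    delete = sorted(name for name in actual if name not in desired)
    return {"create": create, "update": update, "delete": delete}


# Hypothetical state: manifest name -> rendered content.
git_repo = {"traefik": "v2", "grafana": "v1", "loki": "v1"}
cluster = {"traefik": "v1", "grafana": "v1", "hand-applied": "v1"}

plan = plan_reconcile(git_repo, cluster)
print(plan)
```

In this model a hand-applied resource shows up under `delete`, which is why changes made outside git do not last.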

Foundation

k3s — all of Kubernetes, none of the ceremony
Lima + socket_vmnet — Linux VMs on macOS with real LAN IPs
Tailscale — the mesh every node and service rides on
Ansible — host prep, VM bootstrap, day-2 ops

GitOps & secrets

Flux CD — every manifest reconciled from git
SOPS + age — secrets encrypted in the repo, decrypted in-cluster
Infisical — runtime secrets for the apps themselves
Reloader — pods roll when config changes

Storage & data

Longhorn — volumes replicated across the minis, so any one Mac can die
Garage — the S3 API, but the bytes stay in the house
CloudNativePG — Postgres with backups and failover as manifests

Observability

Prometheus + Grafana — metrics, dashboards, custom alerts
Loki — logs
Blackbox exporter — uptime probes on the public endpoints
ntfy — alerts push straight to my phone
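For a sense of what a push through ntfy involves, here is a minimal sketch of the publish call: an HTTP POST to a topic URL, with the message as the body and `Title`, `Priority`, and `Tags` headers. The server URL, topic name, and `build_ntfy_request` helper are hypothetical, and how alerts actually reach ntfy in this cluster is not described here.

```python
import urllib.request


def build_ntfy_request(base_url: str, topic: str, title: str, message: str,
                       priority: str = "default") -> urllib.request.Request:
    """Build (but do not send) an HTTP POST that publishes a message to an ntfy topic."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{topic}",
        data=message.encode("utf-8"),
        headers={"Title": title, "Priority": priority, "Tags": "warning"},
        method="POST",
    )


# Hypothetical server and topic; a self-hosted instance would use its own URL.
req = build_ntfy_request(
    "https://ntfy.example.com",
    "homelab-alerts",
    title="Pod in CrashLoopBackOff",
    message="A pod has restarted 5 times in 10 minutes",
    priority="high",
)
print(req.get_method(), req.full_url)
# Sending it would be: urllib.request.urlopen(req, timeout=5)
```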

Edge & identity

Traefik — ingress
cert-manager + Cloudflare — TLS without thinking about it
oauth2-proxy — SSO in front of the private dashboards
Tailscale operator — internal services exposed to the tailnet, not the internet

Operability

Headlamp — cluster UI for when a terminal is the wrong tool
Flux webhooks — push to main, reconcile now, not on the next poll
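A webhook that can trigger a reconcile should only be honored if it really came from the git host. The sketch below shows the usual check, assuming GitHub-style signing (an HMAC-SHA256 of the request body sent as `sha256=<hex>` in the `X-Hub-Signature-256` header). The git host is not named above, so treat that choice, the secret, and the payload as assumptions.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Compute a signature in the 'sha256=<hex>' form used for X-Hub-Signature-256."""
    return "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()


def is_authentic(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Accept the webhook only if the header matches an HMAC of the body."""
    # compare_digest runs in constant time, so it does not leak how much matched.
    return hmac.compare_digest(sign(secret, body), signature_header)


# Hypothetical shared secret and payload.
secret = b"not-a-real-secret"
body = b'{"ref": "refs/heads/main"}'
header = sign(secret, body)

print(is_authentic(secret, body, header))         # True
print(is_authentic(secret, body + b" ", header))  # False: body was altered
```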

Grafana cluster dashboard: an 'all systems go' banner, zero firing alerts, green subsystem checks, and per-node CPU, RAM, and disk for all three Macs and their k3s VMs

Grafana: the NOC dashboard.

Headlamp's cluster map: every namespace laid out as a block (monitoring, loki, tailscale, fountain, grocery-aid, guild, mem0, garage, bambuddy) with zero errors or warnings

Headlamp: the whole cluster as a map, filtered to errors (none).

Terminal output of flux get kustomizations: two dozen kustomizations from cert-manager to fountain, every one of them Ready

`flux get kustomizations`: everything Ready, nothing suspended.

What it actually runs

The lab isn’t an aquarium. Everything below serves real users — some of them human, some of them AI agents, one of them a 3D printer.

ai.jakegaylor.com

The live MCP server that teaches AI assistants about me — resume, fit-scoring, even emailing me from the conversation. Served from this rack. More on the projects page.

Fountain

The control plane for my fleet of sandboxed coding agents. The agents that build Fountain run on the cluster Fountain is deployed to.

OTFL

A free, open service for shareable checklists — every list a UUID, the link the only key. A JSON REST API and a built-in MCP server let any assistant create and drive lists. Built agent-first, runs here. More on the projects page.

Guild

Work management that connects ticketed work to agent runners — tickets go in, agents pick them up.

Convoy

A real-time strategy game you play by writing code.

Bambuddy

Manages the Bambu Lab 3D printer. Yes, the cluster runs the printer. Yes, the printer printed parts of the cluster’s rack. More below.

Grocery Aid

Shopping and meal planning for our household. The least glamorous app here, and the one with the most demanding stakeholder.

mcp-echo

A public MCP server that helps MCP creators debug what their clients are actually sending.

The Bambu Lab P1S mid-print: chamber light on, toolhead over the plate, a red print taking shape

The cluster has a hand in the physical world

Bambuddy runs in the cluster and drives the Bambu Lab printer. The printer, in turn, has printed hardware for the cluster: the power holder for the UniFi switch and the alignment racks the Mac minis sit in.

Which means the rack is partially self-hosting in a way software never gets to be — the infrastructure manufactured some of its own mounting hardware.

Decisions & scars

The choices that weren’t obvious, and the one that cost a debugging session.

Why Mac minis?

Because they were already here. These machines started life as workstations for my AI agent fleet, and Apple Silicon turned out to be a genuinely good cluster substrate: an M4 mini idles in single-digit watts, makes no noise, and fits three-wide on a shelf. If I were buying hardware from scratch for a cluster, I’d buy Linux boxes — but I wasn’t, and repurposing beats purchasing.

Why VMs on macOS instead of bare-metal Linux?

Because I couldn’t. Linux doesn’t yet boot on bare-metal M4 Macs; if it did, these would be Linux boxes. So k3s gets the next best thing: one Linux VM per mini. I started on OrbStack and switched to Lima after hitting a networking issue I couldn’t work around. Lima with socket_vmnet gives each VM a bridged interface with a real LAN IP from the UniFi gateway’s DHCP, which turns out to be the detail everything else depends on.
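A quick way to tell the two networking modes apart is to look at the address a VM ends up with. The sketch below assumes 192.168.5.0/24 for Lima's default user-mode network and uses 192.168.1.0/24 as a stand-in for the home LAN. Both subnets and the `classify` helper are illustrative, not taken from this setup.

```python
import ipaddress

# Assumptions: adjust both subnets for a real setup.
LIMA_USER_NAT = ipaddress.ip_network("192.168.5.0/24")  # assumed Lima user-mode default
HOME_LAN = ipaddress.ip_network("192.168.1.0/24")       # hypothetical gateway DHCP range


def classify(addr: str) -> str:
    """Say whether a VM address is bridged onto the LAN or hidden behind NAT."""
    ip = ipaddress.ip_address(addr)
    if ip in HOME_LAN:
        return "bridged"
    if ip in LIMA_USER_NAT:
        return "nat"
    return "unknown"


print(classify("192.168.1.42"))  # bridged
print(classify("192.168.5.15"))  # nat
```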

Why three storage systems?

Because each answers a different question. Longhorn replicates block volumes across the minis, so any single Mac can die without taking data with it. Garage gives apps the standard S3 API while the bytes stay on hardware I own. CloudNativePG turns Postgres into manifests — provisioning, backups, failover — instead of a hand-fed database somewhere. The common thread: no workload gets to care which physical box it landed on.
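The "any single Mac can die" property is easy to state as a check. This is a minimal sketch with made-up volume names and placements. It is not Longhorn's scheduler, and the actual replica counts in this cluster are not stated above.

```python
def survives_any_single_failure(placement: dict[str, set[str]], nodes: set[str]) -> bool:
    """True if every volume still has at least one replica after any one node is lost."""
    for failed in nodes:
        for replicas in placement.values():
            if not (replicas - {failed}):
                return False
    return True


nodes = {"jake-mini", "agent5", "agentnan"}

# Hypothetical placements: volume name -> nodes holding a replica.
replicated = {
    "grafana-data": {"jake-mini", "agent5"},
    "pg-data": {"agent5", "agentnan"},
}
single_copy = {"grafana-data": {"jake-mini"}}

print(survives_any_single_failure(replicated, nodes))   # True
print(survives_any_single_failure(single_copy, nodes))  # False
```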

The DERP detour

Tailscale makes everything reachable, which is exactly why it can hide a problem: when peers can’t connect directly, traffic silently falls back to relaying through Tailscale’s DERP servers. With the VMs behind Lima’s userspace NAT, that’s what happened — cluster traffic between two machines sitting a foot apart was round-tripping through a relay on the public internet. Everything worked, just worse than it should have. The fix is the socket_vmnet setup above: give the VMs real LAN addresses and WireGuard forms direct paths across the shelf. The lesson stuck — “it works” and “it works the way you think it does” are different claims, and only the second one survives load.
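One way to catch a silent relay fallback is to inspect the output of `tailscale status --json` and flag active peers that have no direct address. The field names used here (`Peer`, `HostName`, `Active`, `CurAddr`, `Relay`) are assumptions about that JSON's shape and should be checked against the installed version. The sample data is hand-written, and `relayed_peers` is a hypothetical helper.

```python
import json


def relayed_peers(status: dict) -> list[str]:
    """Return hostnames of active peers with no direct address, i.e. going via a relay."""
    relayed = []
    for peer in status.get("Peer", {}).values():
        # An empty CurAddr is taken to mean no direct path is in use.
        if peer.get("Active") and not peer.get("CurAddr"):
            relayed.append(peer.get("HostName", "?"))
    return sorted(relayed)


# Hand-written sample in the assumed shape; real input would come from
#   tailscale status --json
sample = json.loads("""
{
  "Peer": {
    "key1": {"HostName": "agent5", "Active": true, "CurAddr": "192.168.1.42:41641", "Relay": "nyc"},
    "key2": {"HostName": "agentnan", "Active": true, "CurAddr": "", "Relay": "nyc"}
  }
}
""")

print(relayed_peers(sample))  # ['agentnan']
```

An empty list for peers on the same shelf is the result to look for.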

This is just how I work.

Nothing on this page was required. I run it this way because git-driven deploys, real monitoring, and encrypted secrets are habits — and habits come with the hire.
