Homelab 2024

2025-01-25

This is the (belated) second post in a series of snapshots about the state of my homelab.

The homelab actually made two moves between last year's introductory post and this one. The first was to a new apartment, where the main changes were to my networking stack, plus some compute experimentation. The second was to our first house!

Homelab or Homeprod?

I still stand by the homelab as a playground and testbed. However, the house needs a backbone of solid internet connectivity, and that overlaps with the needs of regular, non-lab computing in our home. I previously used a 5G gateway as a failover WAN, but the move to a location without 5G Ultrawide coverage removed that possibility. Although I no longer have a backup internet connection, I upgraded my networking stack to multiplex the common household needs and my homelab needs over the same internet connection, getting the isolation benefits without just routing some devices over a separate WAN.

This was done with VLAN segmentation. Because I can tag a Wi-Fi network with a VLAN, and because Wi-Fi devices will pick up the same SSID regardless of the hardware or any other implementation details, I was able to swap in a new network stack without requiring changes to any of the other clients in the house. So while I think the next year will focus more on building tooling that is used by more than just myself in the household, this year was a good exercise in supporting the infrastructure needs of all our work-from-home equipment and making changes with as little disruption as possible. This is really more of a home networking infrastructure post.

Apartmentlab

The main focus of the apartment lab in the last year was setting up OPNsense as my router and firewall. I've enjoyed having full control over my router, and the main motivation was to get VLAN support. To avoid interfering with other devices in the house, I set it up with one Wi-Fi SSID per VLAN. The "commons" is the 1-1 replacement for our old Wi-Fi network, with the same SSID, so all devices migrated seamlessly. Then there's one network I use to connect to the router and management interfaces, a separate homelab VLAN for all of my equipment, and another for my work laptop and phone so they are cordoned off from the rest of the house, since I don't own or manage those devices. I also created an IoT network without any internet connectivity. So far I haven't done much with that VLAN except test some smart outlets.
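
For illustration, here's roughly what that segmentation looks like expressed as Linux VLAN interfaces. OPNsense does the equivalent through its web UI on FreeBSD; the parent interface and VLAN IDs below are hypothetical:

    # Hypothetical parent interface and VLAN IDs.
    ip link add link eth0 name eth0.10 type vlan id 10   # commons
    ip link add link eth0 name eth0.20 type vlan id 20   # management
    ip link add link eth0 name eth0.30 type vlan id 30   # homelab
    ip link add link eth0 name eth0.40 type vlan id 40   # work devices
    ip link add link eth0 name eth0.50 type vlan id 50   # IoT, no internet

Each SSID on the access point is tagged with the matching VLAN ID, so wireless clients land in the right segment without knowing anything about it.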

I had some success with tethering my iPhone to my OPNsense router and using it as a secondary WAN gateway for failover in the event the internet went out, but it was pretty rough and only worked sometimes, after a few reboots. I still like this idea and might circle back to it.

Proxmox

I collected a set of 4 Beelink EQ12 boxes slowly as they went on sale. These have Intel N100 processors, which means they have 4 of Intel's "E" cores (for efficiency, as opposed to "P" for performance). They also have dual 2.5 Gb ethernet, which is why I chose this model. One works great as an OPNsense router, and they have room for an internal NVMe SSD as well as an internal 2.5" SATA SSD. This made them a good choice for building out a cheap experimental cluster.

I set up a Proxmox cluster on the remaining 3 nodes, including adding a SATA SSD in each one to use for a Ceph storage cluster. The benefit of a Proxmox cluster is you can manage all of the nodes in the cluster from a single administration web UI, create VMs (or LXC containers) on any of the nodes, and even migrate the VMs between nodes.
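
Most of this is point-and-click in the web UI, but the equivalent CLI steps sketch the shape of it (cluster name, node address, Ceph network, and device path below are all hypothetical):

    # On the first node: create the cluster.
    pvecm create homelab

    # On each additional node: join, pointing at the first node's address.
    pvecm add 10.0.30.11

    # Ceph: install on every node, initialize once, then add a monitor
    # and the SATA SSD as an OSD on each node.
    pveceph install
    pveceph init --network 10.0.30.0/24
    pveceph mon create
    pveceph osd create /dev/sda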

I was even able to get OPNsense virtualized and running on the Proxmox cluster, since these boxes have two ethernet ports. One port on each node was connected to a layer 2 switch, which was connected to my ISP modem. Forwarding at layer 2 is done by MAC address, and the VM retains the same MAC address for its ethernet interface when moving across nodes, so the upstream equipment doesn't notice the move. The other ethernet port on each node was used for the LAN side, connected to a layer 3 switch. Surprisingly, all that was required to migrate from bare metal to the virtualized install with the same config was to change the WAN and LAN interface names. I exported the config, did a find-and-replace of the interface names, restored the virtualized OPNsense install from the config, and everything worked.
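
The find-and-replace amounts to one sed pass over the exported config.xml. The names below are illustrative: Intel 2.5GbE NICs typically appear as igc* on bare-metal FreeBSD, while VirtIO NICs appear as vtnet*:

    # Hypothetical names: igc0/igc1 on bare metal, vtnet0/vtnet1 under VirtIO.
    sed -e 's/igc0/vtnet0/g' -e 's/igc1/vtnet1/g' config.xml > config-vm.xml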

Before adding Ceph, the only storage available was the SSD on each node. This meant that migrating from one node to another required downtime: Proxmox sets up a schedule to sync a VM's on-disk data to the other nodes, and when you migrate, the machine comes up on the destination in that synced state. With shared storage, where no external sync is required, you can instead do live migrations: Proxmox copies the memory from the source node to the destination node, then pauses and resumes the VM quickly enough that there's no downtime. From the VM's perspective, nothing changed and it keeps running.
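
Once shared storage is in place, a live migration is a one-liner (VM ID and node name here are hypothetical):

    # Live-migrate VM 100 to node pve2 while it keeps running.
    qm migrate 100 pve2 --online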

This even worked for OPNsense: I was able to migrate the router between nodes to do maintenance or rewiring. While this let me do some maintenance without interrupting our internet connectivity, it did require triggering the migration manually. If a node was disconnected or otherwise died, it turns out that Proxmox has some hardcoded internal timers and configuration that mean it takes about 5 minutes for a node to be detected as down and for a failover migration to a healthy node to trigger. This is okay for some workloads, but it means 5 minutes of internet downtime. It's certainly faster than I could restore OPNsense to new hardware on my own if a node died, but it was a bit disappointing from a high-availability perspective, given that I'm not aware of any technical limitation here other than Proxmox's lack of available tuning.
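
For reference, opting a VM into that automatic failover is done through Proxmox's HA manager (VM ID hypothetical); the roughly 5-minute detection window applies on top of this:

    # Ask the HA stack to keep VM 100 running somewhere in the cluster.
    ha-manager add vm:100 --state started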

[Image: migrating a VM]

Grafana

I set up a Prometheus instance on my NAS to collect metrics from OPNsense, and used Grafana to visualize the bandwidth across the different VLANs.
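
A minimal sketch of the Prometheus side, assuming OPNsense's node exporter plugin on its default port (the router address is hypothetical):

    # prometheus.yml (fragment): scrape the router's node exporter.
    scrape_configs:
      - job_name: 'opnsense'
        static_configs:
          - targets: ['192.168.20.1:9100']

Per-VLAN bandwidth then comes from a PromQL query along the lines of rate(node_network_receive_bytes_total[5m]), filtered to the VLAN interfaces and multiplied by 8 for bits per second.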

[Image: speed test]

[Image: backups from the NAS uploading overnight]

[Image: work meetings during the day]

netboot.xyz

I successfully got netboot.xyz working in tandem with OPNsense, whose DHCP server points clients at the right files for PXE booting. I can bring up any VM or physical machine connected to the homelab network with a remote image. netboot.xyz doesn't seem very compatible with custom images, but for basics like Ubuntu or Debian it's been nice to have.
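
Conceptually, the DHCP server hands out a TFTP server address plus a boot filename that depends on whether the client is booting BIOS or UEFI. In ISC dhcpd syntax (OPNsense exposes equivalent fields in its DHCP settings; the server address is hypothetical):

    # DHCP option 93 identifies the client architecture.
    option arch code 93 = unsigned integer 16;

    next-server 192.168.30.5;
    if option arch = 00:07 {
        filename "netboot.xyz.efi";    # UEFI x86-64
    } else {
        filename "netboot.xyz.kpxe";   # legacy BIOS
    }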

Network-Attached Storage

I need to do a separate "how I back up" post, but not much changed regarding how I use the storage on these devices. I still run Nebula to connect two Synologys, one at home and one at my parents' house, and use the snapshot features to sync backups in both directions.
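
As a sketch of how the two ends find each other, a minimal Nebula config for the home NAS might look like this, assuming certificates already issued with nebula-cert and one node acting as a lighthouse (all names and addresses are hypothetical):

    # config.yml (fragment) on the home NAS.
    pki:
      ca: /etc/nebula/ca.crt
      cert: /etc/nebula/nas-home.crt
      key: /etc/nebula/nas-home.key
    static_host_map:
      "192.168.100.1": ["lighthouse.example.com:4242"]
    lighthouse:
      am_lighthouse: false
      hosts:
        - "192.168.100.1"
    punchy:
      punch: true

The lighthouse end flips am_lighthouse to true and listens on a port reachable from both sites.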

I did do some upgrades on my 1821+, adding two NVMe SSDs and increasing the memory to 32GB. The Virtual Machine Manager app is pretty great, and I've been running one VM for monitoring (Prometheus + Grafana) and another for compute jobs that benefit from direct access to the storage pools: Jellyfin for video and Navidrome for audio, although I really haven't had time to do much with these lately.

I did have one scare with my Synology. We lost power at our apartment, which was not uncommon, so I had a large UPS attached. The Synology was configured to shut down after 5 minutes on UPS power so that the battery could be used for powering the networking equipment instead of async tasks like backup replication.

However, I found that once power was restored, the Synology did not come up cleanly. I was unable to log in, with the UI giving an error that it was waiting for services to start. As I had SSH access, I was able to look up some commands to bypass this, but then found that a couple of major services, including the Virtual Machine Manager and Hyper Backup (which does snapshot replication), were down. Rebooting put it back into the same loop. Luckily I knew I had plenty of copies of this data, so I started one of the "restore" jobs that was supposed to reinstall the OS layer that Synology adds while retaining the storage pools. This seemed to go into a failed state pretty quickly, and I figured I would be wiping the system; however, a reboot after this brought the system up cleanly and it has been working ever since. None of this was confidence-inspiring, but I'll monitor and see how it handles future events.

Hosting this site

I decided to move this site onto cloud VMs. I've got one running in DigitalOcean and another running in Vultr, load balanced simply by putting both IPs in the DNS records for the site. It was cool to have it served entirely from my homelab, and it was a fun exercise in keeping the site available through upgrades and hardware moves. However, I had two reasons for moving it to the cloud. One was that I wanted it to not be fronted by anything, controlling the entire flow as part of being a blog on the open web. If I have Denial of Service (DoS/DDoS) issues I can always throw the domain behind Cloudflare temporarily, but not having to do that, and letting the server deal with the open web's flood of traffic, is a good exercise. I also had some issues with the cloudflared tunnel dropping out and requiring a restart despite reporting itself healthy, so there were availability issues. I haven't had any availability issues since hosting directly from VMs in the cloud. As they are just Debian boxes and I'm running a light static binary, I can move to any hosting platform with minimal requirements.
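
The "load balancing" is just DNS round robin: two A records on the same name, one per VM (hypothetical addresses from the documentation ranges):

    ; Zone file fragment: clients rotate between the two addresses.
    www   300   IN   A   203.0.113.10    ; DigitalOcean VM
    www   300   IN   A   198.51.100.20   ; Vultr VM

There's no health checking, but resolvers shuffle the records and clients will generally retry the other address if one fails, which is enough resilience for a blog.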

Homelab

When you can put holes in the walls you can really make the changes you want to see. I wanted the rack of networking equipment to live in the basement so it was out of sight and earshot. Putting the rack in the center of the basement up high worked well, and I was able to fish five runs of CAT6 up from the basement to our upstairs offices. One run was for the access point - up high worked best for its broadcast pattern and made it close to our offices for Wi-Fi 6E coverage. The other four runs I terminated in a wall panel.

[Images: fishing CAT6 up through the walls]

We replaced the flooring on the landing of our second floor and removed an electric baseboard heater in the process. Most of the holes in the studs were already there, and luckily one of them worked for reaching down to the basement, two floors up but directly above the rack. This required a borescope in addition to wire fishing tape to route the cable down through the floor and wall cavities. I put metal plates between the wires and the drywall to protect them before sealing it up.

[Image: before]

[Image: after]

[Image: the backside, where the wires come into a closet in the office]

I forgot to take a good finished picture, but the wires are routed around the inside of the closet door to hide them, and protected by a track.

The Ubiquiti access points like being ceiling mounted, so this has worked well to cover the whole house with only one AP. The slanted ceiling it's mounted on in the closet follows the roofline of the second floor.

[Images: four 10-gig ethernet runs into the office]

[Image: making lots of cables for the keystone patch panel]

I used an audio rack, which works well as a low-depth rack for network equipment since I don't need room for full-depth servers. I ran a power outlet to feed the UPS and extended battery pack, which in turn feed the rack. The rack is located in the center of the house near the stairwell. We've got a dehumidifier and have improved the insulation of the basement, so it sits under 50% humidity and around 60 degrees Fahrenheit year-round.

Switching to fiber

Despite my IPv6 adventures in the apartmentlab, I opted for the bandwidth of fiber at the new house over having native IPv6 connectivity. The only fiber provider on our street, Fidium Fiber, does not provide IPv6 in 2024. I debated this for quite a while, but a few things swayed me to switch from cable with IPv6 to fiber without. The first was just wanting to experiment with the higher bandwidth, being able to saturate my equipment and see what it was capable of, since most of it can pull around 2 Gbps now. The second was knowing it wouldn't affect our work-from-home requirements. I do need IPv6 for some things, but I just flip on Cloudflare Warp on my laptop for those cases. Last, I wanted the buried fiber connection to the house, and to get off flaky cable. We had so many outages on Spectrum cable in this area that I was ready to switch, and we haven't had a single internet outage yet with fiber. Fidium has also been very accommodating of customers running their own routers: they provide an ONT with 1 gig and 10 gig copper ports, and I just have to register the MAC address of the NIC (an SFP module at this point) that I'm connecting on the other end. I also was able to remove a lot of the coaxial cable in the basement.

I added conduit and a 50-foot extension for the fiber coming into the house, to run it over to the rack.

10 Gigabit

With 2.5GbE ports on my router, 2 Gbps fiber service, 10GbE ports on my NAS and laptop (via an adapter), and 2.5GbE plus Wi-Fi 6E (capable of about 1.4 Gbps wirelessly) at the access point, it was time to get above the 1 Gbps limit of my main managed ethernet switch. I searched for a long time and ended up with a QNAP switch which, while it doesn't have any Power over Ethernet (PoE), has 8x 10GbE RJ45 ports as well as 8x 10 gigabit SFP+ ports at a reasonable price point. This gives me enough copper (RJ45) ports to support my existing 10 gigabit devices, a couple of SFP+ ports to bridge my switches, and a bunch of extra SFP+ ports to expand in the future.

The switch has been rock solid. The only downside is the web interface, namely the VLAN selector: it took multiple forum posts to figure out how to get my ports tagged successfully, along with the trunk ports for the router and access point.
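
For anyone fighting a similar UI, the end state is conventional: trunk ports carry all the tagged VLANs, and access ports are untagged members of a single VLAN. A hypothetical sketch of the port map (IDs and assignments are illustrative, not my actual layout):

    Port 1 (router):        trunk, tagged VLANs 10, 20, 30, 40, 50
    Port 2 (access point):  trunk, tagged VLANs 10, 20, 30, 40, 50
    Port 3 (NAS):           access, untagged VLAN 30
    Port 4 (office run):    access, untagged VLAN 10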

[Image: initial setup. The names are Marathon-class heavy cruisers. This organization didn't last long!]

Back to Ubiquiti

After a year or so of enjoying the flexibility of OPNsense over my old AmpliFi HD router, I started having an issue. All of our laptops and phones -- anything not being routed through a tunnel like Cloudflare Warp -- started to periodically fail to load some web pages. Most importantly, this was reported by my wife, so it was happening on the common subnets and not just my experimental ones. It seemed to mostly be Google properties that were affected, but that distinction was only helpful for debugging; we use too many services for home and work for me to ignore it. The symptom on our phones was that a webpage would start connecting and never get to rendering any of the page: just a blank page was shown. I could reproduce it, less often, on my laptop: a browser starting to connect but never loading the page. YouTube was a common example; reloading often helped, but it was still untenable. An error message was never displayed. I eventually found a reproducer on my laptop with Google Maps, and could periodically catch curl failing to complete the TLS handshake:

    kerby@tycho % curl -vvvvv https://www.google.com:443/maps
    *   Trying 142.251.41.4:443...
    * Connected to www.google.com (142.251.41.4) port 443 (#0)
    * ALPN: offers h2,http/1.1
    * (304) (OUT), TLS handshake, Client hello (1):
    *  CAfile: /etc/ssl/cert.pem
    *  CApath: none
    ^C
    kerby@tycho %

    kerby@tycho % openssl s_client -showcerts -connect maps.google.com:443 < /dev/null
    Connecting to 142.251.40.206
    CONNECTED(00000006)
    write:errno=60
    ---
    no peer certificate available
    ---
    No client certificate CA names sent
    ---
    SSL handshake has read 0 bytes and written 323 bytes
    Verification: OK
    ---
    New, (NONE), Cipher is (NONE)
    This TLS version forbids renegotiation.
    Compression: NONE
    Expansion: NONE
    No ALPN negotiated
    Early data was not sent
    Verify return code: 0 (ok)
    ---
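
If I were digging further, the next step would have been a packet capture on the WAN interface to see whether the server's handshake response ever arrived (interface name hypothetical):

    # Capture the TLS exchange with one of the affected Google IPs.
    tcpdump -i igc0 -w handshake.pcap host 142.251.41.4 and tcp port 443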

As best I could tell, something was preventing the server's TLS handshake response from reaching the client after the client hello. I still don't know what the cause was. I spent a few tries resetting everything I could think of in OPNsense, trying to get it back to a working configuration. Eventually, due to the impact, I tried replacing the router with a Ubiquiti Dream Machine Pro Max. I haven't had a single issue since. This was unsatisfying from a curiosity perspective, but sometimes things just need to work. That has been a bit of a theme with this post: mostly spending time on house projects and not root cause analysis.

I've been running the Dream Machine since and plan to continue to do so -- the performance has been great, and with recent updates the WireGuard options are sufficient for my needs.

Rack at the end of 2024

Plenty of cable management left to do after the router and switch changes.

Next up

Things I'm working on:
  • Z-Wave relays to fill some gaps where we don't have a good way to wire up switches to lights. For example, being able to turn off the outside garage lights from the mudroom.
  • paperless-ngx and a scanner to handle incoming household documents
  • ErsatzTV with Jellyfin for commercial-free reruns