A Self-Hosted SSH Mesh: Keyless Access to Every Machine With Headscale

I have a small fleet of machines that don't live in the same place: a Mac Studio that does
the heavy lifting, a Linux laptop I use for device automation, and a VPS out in a
datacenter. They sit behind different routers, on different continents, and at least one of them
gets a new public IP whenever its ISP feels like it. For a long time, getting an SSH session from
one to another was a small ritual of pain: open a port on the router, hope the IP hadn't changed,
copy yet another public key into yet another authorized_keys, and remember which of the six keys
on my disk this particular box wanted.

That works until it doesn't. NAT traversal, dynamic IPs, and key sprawl are each annoying on their
own; together they're the reason "just SSH into the laptop" turns into a ten-minute detective story.

So I stopped fighting it and built a mesh — a private overlay network where every machine has a
stable address that never changes, NAT is solved for me, and I don't manage SSH keys at all.
This is how it works and how I'd set it up again.

The idea: an overlay network, self-hosted

If you've heard of Tailscale, you already know the shape of the solution. Tailscale builds a
flat WireGuard network on top of whatever messy physical networks your machines are actually on.
Each device gets a stable 100.x.y.z address inside the overlay, and connections between devices
are end-to-end encrypted and punch through NAT automatically. When a direct connection isn't
possible — both peers stuck behind hostile NATs, for example — traffic falls back to a relay so the
connection still works, just with a little more latency.

Tailscale the product is excellent. But it depends on Tailscale's coordination server — the
central brain that tells each node who its peers are and how to reach them. I wanted to own that
piece, partly on principle and partly because some of my machines live on networks where I'd rather
not depend on a third party being reachable.

That's what Headscale is: an open-source,
self-hostable implementation of the Tailscale control plane. You run it on a server you control,
point the standard Tailscale clients at it instead of at Tailscale's servers, and you get the same
mesh — just with the brain in your own hands.

The architecture

The whole thing has three moving parts:

A control server. A small VPS running Headscale. It coordinates the mesh and also runs an
embedded DERP relay — the fallback path for when two nodes can't reach each other directly.
A single cheap VPS is plenty; the control plane is lightweight and the relay only carries traffic
that can't go peer-to-peer.
The overlay. Headscale hands out addresses in the 100.64.0.0/10 range (the block that
RFC 6598 reserves for carrier-grade NAT, which Tailscale borrows because it rarely collides with
ordinary home or office LAN ranges). Every node I enroll gets one, and it stays the same for the
life of the node regardless of what happens to its real-world IP.
The clients. The ordinary Tailscale client on each machine — Mac, Linux, whatever — pointed
at my Headscale server instead of the default.

Once a machine joins, it can reach every other machine by its stable overlay address, no matter
which continent it's on or what NAT it's hiding behind.
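
To make the address range concrete, here is a minimal sketch of a check for whether an IPv4 address falls inside 100.64.0.0/10. The function name in_overlay_range is my own invention, not part of any Tailscale or Headscale tooling. A /10 pins the first octet to 100 and the top two bits of the second octet, so the second octet has to land between 64 and 127.

```shell
# Return success if $1 is an IPv4 address inside 100.64.0.0/10
# (100.64.0.0 through 100.127.255.255). Hypothetical helper name.
in_overlay_range() {
    ip=$1
    # Reject anything containing characters other than digits and dots,
    # or with empty fields.
    case $ip in
        *[!0-9.]* | *..* | .* | *.) return 1 ;;
    esac
    # Split on dots into the function's positional parameters.
    old_ifs=$IFS
    IFS=.
    set -- $ip
    IFS=$old_ifs
    [ "$#" -eq 4 ] || return 1
    # First octet fixed at 100, second octet in 64..127, rest any valid octet.
    [ "$1" -eq 100 ] && [ "$2" -ge 64 ] && [ "$2" -le 127 ] \
        && [ "$3" -le 255 ] && [ "$4" -le 255 ]
}
```

On an enrolled machine, something like in_overlay_range "$(tailscale ip -4)" is a quick sanity check that the node really picked up an overlay address rather than falling back to something else.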

Standing up the control server

On the VPS, Headscale runs as a service with a small YAML config. The pieces that matter:

A public URL with TLS. The clients talk to the control server over HTTPS, so it needs a
certificate. If you don't want to wire up a domain, a service like
sslip.io gives you a hostname that embeds an IP address and resolves back to it, which Let's
Encrypt will happily issue a certificate for. That is handy when the box is "just an IP."
A coordination port and a STUN port. The HTTPS control endpoint, plus UDP 3478 for STUN
(the NAT-discovery protocol each node uses to learn its own public address and port, so that
peers can attempt a direct path to each other).
MagicDNS. Optional but lovely: it gives every node a name, so you address laptop instead of
memorizing 100.64.0.2.
An ACL file. Headscale uses a JSON access-control policy to decide which nodes may talk to
which. For a personal fleet that's all yours, "allow all" is fine; the moment you have more than
one user or you want least-privilege, this is where you tighten it.

Port gotcha worth knowing. On my server, 443 was already taken by another service, so I ran
the Headscale control endpoint on 8443 instead. The clients don't care — you give them the full
URL including the port — but it's the kind of thing that's invisible until something refuses to
connect.
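
As a rough illustration of how the hostname and the port fit together, this sketch builds the login URL for a server that only has an IP. It relies on the sslip.io convention that A-B-C-D.sslip.io resolves to A.B.C.D. The function name control_url and the address 203.0.113.10 (a documentation-only range) are placeholders of mine, not values from my setup.

```shell
# Build the HTTPS URL clients should use for a control server that has
# an IPv4 address but no domain. Hypothetical helper name.
control_url() {
    ip=$1
    port=${2:-443}
    # sslip.io maps "203-0-113-10.sslip.io" back to 203.0.113.10.
    host="$(printf '%s' "$ip" | tr '.' '-').sslip.io"
    if [ "$port" -eq 443 ]; then
        # Default HTTPS port: no suffix needed.
        printf 'https://%s\n' "$host"
    else
        # Non-standard port: it must appear explicitly in the URL.
        printf 'https://%s:%s\n' "$host" "$port"
    fi
}
```

Calling control_url 203.0.113.10 8443 prints https://203-0-113-10.sslip.io:8443, which is the shape of string that ends up both in the server's configured public URL and in the --login-server flag on every client.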

After the server is up, you create a user and issue a pre-auth key — a reusable token, with
an expiry, that lets new devices enroll without an interactive browser login. That key is the only
secret a new machine needs to join.
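
For reference, the two server-side steps look roughly like the commands this sketch prints. It is a dry run by design: nothing is executed, so you can read the output before pasting it on the server. The user name fleet and the 24h lifetime are placeholders, and the flag spellings are the ones I know from the Headscale CLI. Some releases expect a numeric user ID for --user rather than a name, so treat this as an assumption and check headscale preauthkeys create --help on your version.

```shell
# Print (do not run) the headscale commands that create a user and a
# reusable, expiring pre-auth key for it. Hypothetical helper name.
preauth_commands() {
    user=$1
    ttl=${2:-24h}
    # Step 1: the user that will own the enrolled nodes.
    printf 'headscale users create %s\n' "$user"
    # Step 2: a key that can enroll several machines until it expires.
    printf 'headscale preauthkeys create --user %s --reusable --expiration %s\n' \
        "$user" "$ttl"
}

preauth_commands fleet 24h
```

A short expiry is the sensible default here: the key only matters during enrollment, and an expired key cannot be used to add new nodes.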

Enrolling a machine

This is the part that still feels a little magical. On any new box, you install the standard
Tailscale client and then run a single command:

tailscale up \
  --login-server https://my-headscale-server:8443 \
  --authkey <the-preauth-key> \
  --ssh

Three flags do everything:

--login-server points the client at my control plane instead of Tailscale's.
--authkey hands it the pre-auth token so it enrolls non-interactively.
--ssh turns on Tailscale SSH — and this is the flag that retired my key management
entirely. More on that next.

That's it. The node appears in headscale nodes list, gets its stable overlay address, and is
immediately reachable from every other node in the mesh.

Tailscale SSH: no keys, no authorized_keys

Here's the feature that changed how the whole thing feels. With --ssh enabled, the Tailscale
client itself answers SSH connections that arrive over the mesh, and it authorizes them based on
who you are in the tailnet plus the ACL policy — not based on a public key sitting in a file.

The practical consequence: from the Mac Studio I type

ssh joche@100.64.0.2

and I'm in. No key was copied anywhere. No authorized_keys entry was edited. The identity check
happens at the network layer, governed by the same ACL that governs everything else. When I add a
new machine, it can SSH to the others the moment it joins — there's no per-pair key dance at all.

I keep a tiny ~/.ssh/config alias so it's even shorter:

Host linux-laptop
    HostName 100.64.0.2
    User joche

…and now reaching a laptop that's behind a residential NAT, on a different continent, is just
ssh linux-laptop. From anywhere. That's the entire payoff of this project in one command.
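
Once there are more than two or three machines, writing those stanzas by hand gets tedious. Here is a small sketch that generates them from a list of alias and overlay-IP pairs. The function name ssh_aliases and the input format (one pair per line, with # for comments) are my own convention, not anything Tailscale or OpenSSH defines.

```shell
# Read "alias overlay-ip" pairs on stdin and print one ~/.ssh/config
# stanza per pair, all using the login name given as $1.
ssh_aliases() {
    user=$1
    while read -r name ip; do
        # Skip blank lines and comment lines.
        case $name in
            '' | '#'*) continue ;;
        esac
        printf 'Host %s\n    HostName %s\n    User %s\n\n' "$name" "$ip" "$user"
    done
}
```

Feeding it a line such as linux-laptop 100.64.0.2 with the user joche reproduces the stanza above. I would review the output and then append it to ~/.ssh/config, rather than letting a script edit that file unattended.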

The rough edges I hit

No infrastructure project is real until something refuses to work. A few that cost me time, in case
they save you some:

Permissions on the server's state directory. I ran a headscale configtest as root early
on, which created some files in Headscale's data directory owned by root. The service runs as
its own unprivileged user, so on the next restart it got permission denied reading its own
relay and noise keys and refused to start. The fix was a one-liner — chown the whole state
directory back to the service user — but the error message pointed at the keys, not at the
ownership, so it took a minute to see.
macOS doesn't always wire MagicDNS into the system resolver. On the Mac, the open-source
tailscaled didn't register the MagicDNS names with the OS resolver, so tailscale ssh laptop
failed on name resolution even though the mesh was perfectly healthy. The workaround is trivial
once you know it: address the node by its overlay IP (or a plain ~/.ssh/config alias
pointing at that IP) instead of the MagicDNS name. The connection itself was never the problem.
Relayed hops are slower — and that's fine. When two of my nodes are far apart and can't get a
direct path, traffic rides the DERP relay. An interactive SSH session over a long relay hop has
noticeable latency — fine for a shell, fine for running commands, but you feel it. Where I can get
a direct peer-to-peer connection, it's snappy; where I can't, the relay quietly keeps things
working instead of failing. I'll take "a bit laggy" over "doesn't connect" every single time.
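
For the ownership problem in the first item, the check I wish I had run straight away is "what in this directory is not owned by the service user?" A minimal sketch, with a function name of my own choosing:

```shell
# List everything under directory $1 that is NOT owned by user $2
# (a user name or numeric ID). Empty output means ownership is consistent.
not_owned_by() {
    dir=$1
    owner=$2
    find "$dir" ! -user "$owner" -print
}
```

On the server this would be something like not_owned_by /var/lib/headscale headscale. That path and user name are assumptions based on common packaging, so substitute whatever your unit file and config actually use. Any file it prints is a candidate for the chown fix described above.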

Why I'd do it again

What I have now is a flat, private network where:

every machine has a permanent address, immune to ISP IP churn;
NAT is solved — I never touch a router or a port forward;
there are no SSH keys to manage, because identity lives in the tailnet and the ACL;
and the coordination brain is mine, running on a box I control.

Adding the next machine is one tailscale up command, and from that moment it's reachable from
everything else as if they were all on the same LAN. For a fleet that's genuinely scattered — across
homes, datacenters, and time zones — that's exactly the abstraction I wanted: stop thinking about
where the machines are, and just talk to them by name.

If you've been limping along with port forwards and a folder full of keys, a self-hosted mesh is a
weekend project that pays for itself the first time you ssh into a laptop on another continent
without a second thought.