It started innocently enough.
"Can you check the server?"
A ping to 192.168.192.52 — the home lab beast sitting on a ZeroTier virtual network, running Ollama, LiquidBrain, n8n, MinIO, ComfyUI, PostgreSQL, and about a dozen other services. The kind of box that runs hot and does everything.
SSH was dead. Not "connection refused" dead. Not "timeout" dead. The sneaky kind of dead — where the connection starts, the handshake begins, and then... silence. The server closes the connection mid-key-exchange without a word.
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
Connection closed by 192.168.192.52 port 22
First instinct: fail2ban. We'd been hammering the box. The symptoms fit. But we were wrong.
The fundamental catch-22 of remote server administration: you need SSH to fix SSH.
The server was 1,000+ miles away. No IPMI. No KVM. No one on-site to restart sshd. No Cockpit or Webmin running.
Just a locked door and a bunch of services still happily running behind it.
Every tool in the standard playbook required the one thing we didn't have — a shell. We could see the server on the network. We could ping it. We could watch its services respond to HTTP requests. But we couldn't talk to it. Not in any way that mattered.
The clock was ticking. Not because anything was on fire, but because the longer SSH stayed broken, the more likely we were to compound the problem by trying things that would make it worse. Every failed connection attempt was another data point the server could use against us.
An nmap scan revealed the attack surface:
| Port | Service | Status |
| --- | --- | --- |
| 22 | SSH | Broken |
| 80/443 | Nginx | Redirect loop |
| 5678 | n8n v2.9.4 | Wide open |
| 7777 | LiquidBrain | Running |
| 8080 | Custom API | Running |
| 9001 | MinIO Console | Running |
| 11434 | Ollama | Running |
| 5432 | PostgreSQL | Auth required |
Everything was alive except the one thing we needed.
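nmap did the real work here, but the first signal in a table like this is just a TCP connect. The following is a minimal standard-library sketch of that idea. The helper names are hypothetical, and a plain connect probe stands in for whatever nmap scan mode was actually used. It also shows the limit of this kind of check: port 22 would have reported as open, because the TCP handshake completed and the failure came later, during key exchange.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP handshake with host:port completes within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP three-way handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out (socket.timeout subclasses OSError) or unreachable
        return False


def scan(host: str, ports: list[int], timeout: float = 2.0) -> dict[int, bool]:
    """Probe each port in turn and map it to open (True) or closed/filtered (False)."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}
```

For example, `scan("192.168.192.52", [22, 80, 443, 5432, 5678, 7777, 8080, 9001, 11434])` covers the ports in the table. An open result only means something accepted the connection. It does not mean the service behind it is healthy.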
Then we spotted it: n8n — the workflow automation tool — sitting on port 5678, accessible through a Cloudflare tunnel at n8n.americannex.com. n8n has an SSH node. If we could log in, we could use n8n to SSH from the server to itself and run commands.
A workflow automation tool with SSH capabilities, accessible from the public internet. Our way back in was hiding in plain sight.
We drove Playwright — a browser automation framework — directly into the n8n web interface. Navigated to the login page, authenticated, and started building a workflow.
Attempt 1: Code Node
First we tried the Code node with a shell command. n8n sandboxes its Code node — no raw Node.js modules allowed. Fair play, n8n. Fair play.
Attempt 2: SSH Node to localhost
We added an SSH node, configured credentials to connect to the server's LAN IP (10.0.0.43) from n8n's container network, and pointed it at fail2ban-client set sshd unbanip 192.168.192.199.
It "executed successfully." SSH still dead.
Attempt 3: The n8n UI Fight
The n8n interface was fighting us. A broken node sat between the trigger and SSH node, eating the data flow. Output panels showed nothing. Sessions expired mid-operation. We were driving a browser through Playwright, clicking accessibility tree refs, wrestling with dialog modals.
It was getting ugly. We were automating a browser to automate a workflow engine to automate an SSH connection to fix the SSH connection we couldn't make directly. Three layers of indirection deep and sinking.
We abandoned the UI. Went to n8n's settings, created an API key, and started driving everything through REST calls.
First, we created a clean workflow via the API — just a webhook trigger wired directly to an SSH node. No broken nodes in between. Activated it. Hit the webhook URL:
curl 'https://n8n.americannex.com/webhook/ssh-diag-run'
And for the first time, we got actual output from the server:
Timeout before authentication for connection from 192.168.192.199
srclimit_penalise: ipv4: new 192.168.192.199/32 deferred penalty of 10 seconds
A webhook shell. We'd built a web-accessible command executor out of a workflow automation tool's SSH node and a webhook trigger. Crude. Effective. Terrifying from a security perspective.
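The curl call is all the client side needs. Below is a minimal sketch of the same trigger using Python's standard library. The URL is the one above and the helper name is hypothetical. It assumes the Webhook node is set to respond when the last node finishes rather than immediately, which is what makes the SSH node's output come back in the response body.

```python
import urllib.request

# Production webhook URL from the text; the path is whatever the Webhook node was given.
WEBHOOK_URL = "https://n8n.americannex.com/webhook/ssh-diag-run"


def run_webhook(url: str = WEBHOOK_URL, timeout: float = 60.0) -> str:
    """Trigger the n8n workflow with a GET and return the response body as text.

    Assumes the Webhook node responds only after the SSH node finishes, so the
    body carries the command's output rather than an immediate acknowledgement.
    """
    request = urllib.request.Request(url, method="GET")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)
```

The generous default timeout leaves room for an SSH command that takes a while. Anyone who knows the URL gets the same power, so the webhook path is effectively a password.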
The moment everything changed
PerSourcePenalties, added in OpenSSH 9.8 and enabled by default in OpenSSH 10, is a built-in rate limiter that penalizes source addresses that fail authentication or never complete it. Our IP had been accumulating penalties from every failed SSH attempt we'd made during debugging. The server was deliberately delaying and dropping our connections.
It was never fail2ban. It was SSH itself, punishing us for trying to fix it.
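sshd logs each penalty it hands out, and that logging is what exposed the cause in the second line of webhook output. Below is a small sketch for spotting penalised sources in a log dump. The regex is fitted only to the single `srclimit_penalise` line shown in this post, so any other wording sshd might use is an assumption to check against your own logs. The helper name is hypothetical.

```python
import re
from typing import Iterable, Iterator

# Fitted to the one sample line shown above; other sshd messages may differ.
PENALTY_RE = re.compile(
    r"srclimit_penalise: (?P<family>ipv[46]): new (?P<source>\S+) "
    r"deferred penalty of (?P<seconds>\d+) seconds"
)


def find_penalties(lines: Iterable[str]) -> Iterator[tuple[str, str, int]]:
    """Yield (address family, source prefix, penalty seconds) for each new penalty logged."""
    for line in lines:
        match = PENALTY_RE.search(line)
        if match:
            yield match["family"], match["source"], int(match["seconds"])
```

Feed it the output of `journalctl -u ssh` (the unit is `sshd` on some distributions), split into lines. If your own address keeps showing up, stop retrying before you dig the hole deeper.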
We disabled it via the webhook shell and restarted sshd.
# Disable PerSourcePenalties
# Restart sshd
# Check status
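The commands themselves aren't shown above. For the first step, one approach is the `PerSourcePenalties no` directive documented in sshd_config(5) for OpenSSH 9.8 and later. This is a sketch, not necessarily what we ran. `PerSourcePenaltyExemptList` with the ZeroTier subnet is a narrower alternative. The Python below only rewrites config text, so you can review the result before it replaces /etc/ssh/sshd_config. The helper name is hypothetical.

```python
import re

DISABLE = "PerSourcePenalties no"
# sshd_config keywords are case-insensitive; commented-out lines are left alone
_EXISTING = re.compile(r"^\s*PerSourcePenalties\b", re.IGNORECASE)


def disable_per_source_penalties(config_text: str) -> str:
    """Return sshd_config text with per-source penalties switched off.

    sshd keeps the first value it reads for most keywords, so existing
    PerSourcePenalties lines are dropped and the override goes on line 1,
    ahead of any Include or Match block that could otherwise win.
    """
    kept = [line for line in config_text.splitlines() if not _EXISTING.match(line)]
    return "\n".join([DISABLE, *kept]) + "\n"
```

Placing the directive at the top, rather than appending it, is the key design choice. An appended line would silently lose to any earlier setting. After writing the file, `sudo sshd -t` validates it before `sudo systemctl restart ssh` (`sshd` on some distributions).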
Still broken.
We checked iptables. Found Kubernetes's kube-router had set the INPUT chain to policy DROP with its own netpol rules. Added explicit ACCEPT rules for the ZeroTier subnet.
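The exact rules aren't listed. Here is a sketch of one such rule. It assumes the ZeroTier subnet is 192.168.192.0/24, which is inferred from the two ZeroTier addresses in this post rather than stated. Inserting at position 1 instead of appending matters on a kube-router host: with policy DROP and netpol jumps already in the chain, an appended rule may never be reached. The helper names are hypothetical.

```python
import ipaddress
import shlex


def accept_subnet_rule(cidr: str, chain: str = "INPUT") -> list[str]:
    """Build an iptables/ip6tables command that accepts all traffic from `cidr`.

    -I <chain> 1 inserts at the top of the chain, so the rule is evaluated
    before anything kube-router has already placed there.
    """
    network = ipaddress.ip_network(cidr)  # ValueError if malformed or host bits are set
    binary = "iptables" if network.version == 4 else "ip6tables"
    return [binary, "-I", chain, "1", "-s", str(network), "-j", "ACCEPT"]


def as_shell(argv: list[str]) -> str:
    """Render the command for review before running it as root."""
    return shlex.join(argv)
```

`as_shell(accept_subnet_rule("192.168.192.0/24"))` gives `iptables -I INPUT 1 -s 192.168.192.0/24 -j ACCEPT`. Rules added this way do not survive a reboot.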
Still broken.
Two hypotheses down. Each one plausible. Each one wrong. The real answer was hiding in a place we hadn't thought to look yet — the cryptographic handshake itself.
After exhausting every firewall and ban theory, we tried something different:
ssh -o KexAlgorithms=curve25519-sha256 192.168.192.52 "echo WORKS"
One flag. That's all it took.
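When a handshake hangs, pinning each candidate algorithm in turn replaces a hunch with a direct comparison. The sketch below wraps the one-flag test above. The candidate list and helper names are assumptions. mlkem768x25519-sha256 needs an OpenSSH 9.9+ client, and an algorithm name the client doesn't know makes ssh exit with an error. BatchMode assumes key-based authentication is already in place.

```python
import subprocess

# Candidate algorithms, post-quantum hybrids first (names as OpenSSH spells them)
KEX_CANDIDATES = [
    "mlkem768x25519-sha256",
    "sntrup761x25519-sha512@openssh.com",
    "curve25519-sha256",
]


def kex_command(host: str, kex: str, connect_timeout: int = 10) -> list[str]:
    """ssh invocation pinned to one key exchange algorithm, non-interactive."""
    return [
        "ssh",
        "-o", f"KexAlgorithms={kex}",
        "-o", "BatchMode=yes",  # fail instead of prompting for a password
        "-o", f"ConnectTimeout={connect_timeout}",
        host,
        "echo WORKS",
    ]


def probe_kex(host: str, kex: str, connect_timeout: int = 10) -> str:
    """Return 'ok', 'hung', or 'failed (exit N)' for one algorithm."""
    try:
        result = subprocess.run(
            kex_command(host, kex, connect_timeout),
            capture_output=True,
            text=True,
            timeout=connect_timeout + 5,  # backstop in case ssh itself never gives up
        )
    except subprocess.TimeoutExpired:
        return "hung"
    return "ok" if "WORKS" in result.stdout else f"failed (exit {result.returncode})"
```

Running `{kex: probe_kex("192.168.192.52", kex) for kex in KEX_CANDIDATES}` returns one verdict per algorithm. If the hybrids fail while curve25519-sha256 returns ok, the problem is the transport rather than authentication or the firewall.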
The climax
OpenSSH 10 defaults to a post-quantum hybrid key exchange: mlkem768x25519-sha256 as of 10.0, after sntrup761x25519-sha512 held the default through the 9.x releases. Hybrid post-quantum key exchange is the future of cryptography. It's also large. The initial key exchange packets are ~1.5 KB, which is fine on regular networks but gets fragmented traversing ZeroTier's tunnel. The fragments were getting silently dropped, leaving the handshake hanging at SSH2_MSG_KEX_ECDH_REPLY indefinitely.
The classic curve25519-sha256 produces smaller packets that fit cleanly through the tunnel.
The fix, a Host stanza in the client's SSH config (~/.ssh/config):
Host 192.168.192.52
KexAlgorithms curve25519-sha256
Permanent. Clean. One line.
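To confirm the stanza is actually picked up, `ssh -G` prints the configuration ssh would use for a host after evaluating Host and Match blocks, without connecting. Below is a small Python wrapper, offered as a sketch. The helper names are hypothetical.

```python
import subprocess


def parse_ssh_g(output: str) -> dict[str, str]:
    """Turn `ssh -G` output ("keyword value" per line) into a dict.

    Some keywords (e.g. identityfile) can repeat; only the first value is kept here.
    """
    config: dict[str, str] = {}
    for line in output.splitlines():
        key, _, value = line.partition(" ")
        if key:
            config.setdefault(key.lower(), value)
    return config


def effective_kex(host: str) -> list[str]:
    """Key exchange algorithms ssh would offer to `host`, after Host/Match evaluation."""
    output = subprocess.run(
        ["ssh", "-G", host], capture_output=True, text=True, check=True
    ).stdout
    return parse_ssh_g(output).get("kexalgorithms", "").split(",")
```

With the stanza in place, `effective_kex("192.168.192.52")` should come back as `["curve25519-sha256"]`. If it doesn't, check for shadowing: ssh_config keeps the first value it sees for each keyword, so an earlier `Host *` block that sets KexAlgorithms will override the stanza.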
We went through fail2ban, PerSourcePenalties, iptables INPUT rules, OUTPUT rules, kube-router netpol, conntrack, and UFW — all dead ends. The actual fix was one line in SSH config. The diagnosis was harder than the cure.
With SSH restored, we went further — setting up a full ZeroTier bridge to the home LAN (10.0.0.0/24).
Server side (persistent via UFW):
# NAT masquerade between ZeroTier and LAN interfaces
# Forwarding rules between ztwfufk6e4 and enp14s0
# ZeroTier subnet accept in input chain
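The server-side rules are only summarized above. As a sketch of what those three comments usually amount to, here are equivalent raw iptables commands generated in Python. The interface names come from the text. The 192.168.192.0/24 subnet is inferred from the ZeroTier addresses in this post, and the exact rule shapes are assumptions. These commands do not survive a reboot, and making them persistent through UFW is not shown.

```python
import shlex

ZT_IF = "ztwfufk6e4"            # ZeroTier interface, from the text
LAN_IF = "enp14s0"              # LAN interface, from the text
ZT_SUBNET = "192.168.192.0/24"  # assumed; inferred from the ZeroTier addresses in this post


def bridge_commands(zt_if: str = ZT_IF, lan_if: str = LAN_IF,
                    zt_subnet: str = ZT_SUBNET) -> list[list[str]]:
    """Raw, non-persistent commands for routing a ZeroTier subnet onto the LAN."""
    return [
        # Let the kernel forward packets between interfaces
        ["sysctl", "-w", "net.ipv4.ip_forward=1"],
        # NAT: ZeroTier clients appear on the LAN as the server's own LAN address
        ["iptables", "-t", "nat", "-A", "POSTROUTING", "-s", zt_subnet,
         "-o", lan_if, "-j", "MASQUERADE"],
        # Forward new connections from ZeroTier to the LAN...
        ["iptables", "-I", "FORWARD", "-i", zt_if, "-o", lan_if, "-j", "ACCEPT"],
        # ...and only replies in the other direction
        ["iptables", "-I", "FORWARD", "-i", lan_if, "-o", zt_if,
         "-m", "conntrack", "--ctstate", "RELATED,ESTABLISHED", "-j", "ACCEPT"],
        # Accept the ZeroTier subnet on the server itself
        ["iptables", "-I", "INPUT", "-s", zt_subnet, "-j", "ACCEPT"],
    ]


def render(commands: list[list[str]]) -> str:
    """One shell-quoted command per line, for review before running as root."""
    return "\n".join(shlex.join(cmd) for cmd in commands)
```

Masquerading lets the home router stay untouched. LAN devices reply to the server's own LAN address, so nothing on the LAN needs a route back to the ZeroTier subnet.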
Network side (auto-pushed to all devices):
# Managed route: 10.0.0.0/24 via 192.168.192.52
# Pushed to all ZeroTier members automatically
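A managed route goes to every member at once, so a quick check before saving it is cheap. The sketch below assumes a ZeroTier subnet of 192.168.192.0/24, inferred from the addresses in this post rather than stated. It applies three checks. The gateway must be a ZeroTier member address. The target must not overlap the ZeroTier subnet itself. Clients whose local network already uses 10.0.0.0/24 are worth flagging, because for them the pushed route competes with a directly connected one. The helper name is hypothetical.

```python
import ipaddress
from typing import Iterable


def route_problems(target: str, via: str, zt_subnet: str,
                   client_networks: Iterable[str] = ()) -> list[str]:
    """Sanity-check a ZeroTier managed route before pushing it to every member.

    Returns human-readable problems; an empty list means nothing obvious is wrong.
    """
    target_net = ipaddress.ip_network(target)
    gateway = ipaddress.ip_address(via)
    zt_net = ipaddress.ip_network(zt_subnet)
    problems = []
    if gateway not in zt_net:
        problems.append(f"gateway {gateway} is not inside the ZeroTier subnet {zt_net}")
    if target_net.overlaps(zt_net):
        problems.append(f"target {target_net} overlaps the ZeroTier subnet {zt_net}")
    for local in map(ipaddress.ip_network, client_networks):
        if target_net.overlaps(local):
            problems.append(f"target {target_net} collides with a client's local network {local}")
    return problems
```

`route_problems("10.0.0.0/24", "192.168.192.52", "192.168.192.0/24")` returns an empty list for the route described here.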
Now every device on the ZeroTier network — phone, tablet, laptop, wherever in the world — can reach every device on the home LAN.
From locked out to full LAN bridge in one hour.
- Post-quantum crypto is here and it breaks things. OpenSSH 10's default kex algorithm produces packets too large for some VPN tunnels. If SSH hangs during key exchange over a tunnel, force curve25519-sha256 before blaming the firewall.
- n8n is a legitimate remote access tool. If you have n8n running with SSH credentials configured, you have a web-accessible shell. With the webhook pattern, you can even script it from curl. Secure accordingly.
- The diagnosis was harder than the fix. We went through fail2ban, PerSourcePenalties, iptables INPUT rules, OUTPUT rules, kube-router netpol, conntrack, and UFW. All dead ends. The actual fix was one line in SSH config. The attack surface analysis, lateral movement through n8n, and systematic elimination of hypotheses are what got us there.
- Always have a second way in. If SSH is your only remote access, you're one misconfiguration away from a drive to the data center. n8n, Cockpit, Tailscale, WireGuard, a webhook-triggered script — have something else running.
- Red-team your own infrastructure. We found that n8n was accessible through Cloudflare with full admin access. The SSH node could run arbitrary commands as a user with sudo. MinIO's console was exposed. PostgreSQL was listening on all interfaces. This was a fun exercise. It would not be fun if someone else did it first.
| Time | Event |
| --- | --- |
| 0:00 | "Can you check the server" |
| 0:02 | SSH dead. Ollama alive. |
| 0:05 | fail2ban hypothesis |
| 0:08 | Port scan reveals n8n on 5678 |
| 0:12 | Playwright navigates to n8n login |
| 0:15 | Code node blocked by sandbox |
| 0:20 | SSH node configured, UI fight begins |
| 0:35 | Session expires mid-operation |
| 0:40 | API key created, webhook shell built |
| 0:42 | First server output: PerSourcePenalties logs |
| 0:45 | Penalties disabled, sshd restarted — still broken |
| 0:48 | iptables rules added — still broken |
| 0:50 | KexAlgorithms curve25519-sha256 — WE'RE IN |
| 0:55 | ZeroTier bridge configured |
| 1:00 | Managed route pushed to all devices |
Total time from lockout to full LAN bridge: ~1 hour.