The SSH Heist

Chapter I

The Setup

It started innocently enough.

"Can you check the server."

A ping to 192.168.192.52 — the home lab beast sitting on a ZeroTier virtual network, running Ollama, LiquidBrain, n8n, MinIO, ComfyUI, PostgreSQL, and about a dozen other services. The kind of box that runs hot and does everything.

SSH was dead. Not "connection refused" dead. Not "timeout" dead. The sneaky kind of dead — where the connection starts, the handshake begins, and then... silence. The server closes the connection mid-key-exchange without a word.

ssh debug output

debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
Connection closed by 192.168.192.52 port 22
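
A trace like this comes from running the client in verbose mode. Here is a minimal sketch for capturing it and pulling out the handshake message the client was still waiting for when the connection died. The helper name `last_kex_stage` is made up for illustration and was not part of the original session.

```shell
# Hypothetical helper: read `ssh -v` output on stdin and print the last
# message type the client reported it was "expecting".
last_kex_stage() {
  grep 'debug1: expecting ' | tail -n 1 | sed 's/^debug1: expecting //'
}

# Usage (not run here). -v writes the debug trace to stderr, hence 2>&1:
#   ssh -v 192.168.192.52 true 2>&1 | last_kex_stage
```

If the last stage printed is a key exchange reply, the failure is happening before authentication ever starts, which is worth knowing before blaming a ban list.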

First instinct: fail2ban. We'd been hammering the box. The symptoms fit. But we were wrong.

Chapter II

No Shell, No Fix

The fundamental catch-22 of remote server administration: you need SSH to fix SSH.

The server was 1,000+ miles away. No IPMI. No KVM. No one on-site to restart sshd. No Cockpit or Webmin running.

Just a locked door and a bunch of services still happily running behind it.

Every tool in the standard playbook required the one thing we didn't have — a shell. We could see the server on the network. We could ping it. We could watch its services respond to HTTP requests. But we couldn't talk to it. Not in any way that mattered.

The clock was ticking. Not because anything was on fire, but because the longer SSH stayed broken, the more likely we were to compound the problem by trying things that would make it worse. Every failed connection attempt was another data point the server could use against us.

Chapter III

Recon

An nmap scan revealed the attack surface:

Port	Service	Status
22	SSH	Broken
80/443	Nginx	Redirect loop
5678	n8n v2.9.4	Wide open
7777	LiquidBrain	Running
8080	Custom API	Running
9001	MinIO Console	Running
11434	Ollama	Running
5432	PostgreSQL	Auth required

Everything was alive except the one thing we needed.

Then we spotted it: n8n — the workflow automation tool — sitting on port 5678, accessible through a Cloudflare tunnel at n8n.americannex.com. n8n has an SSH node. If we could log in, we could use n8n to SSH from the server to itself and run commands.

A workflow automation tool with SSH capabilities, accessible from the public internet. Our way back in was hiding in plain sight.

Chapter IV

Lateral Movement

We drove Playwright — a browser automation framework — directly into the n8n web interface. Navigated to the login page, authenticated, and started building a workflow.

Attempt 1: Code Node

First we tried the Code node with a shell command. n8n sandboxes its Code node — no raw Node.js modules allowed. Fair play, n8n. Fair play.

Attempt 2: SSH Node to the Host

We added an SSH node, configured credentials to connect to the server's LAN IP (10.0.0.43) from n8n's container network, and had it run fail2ban-client set sshd unbanip 192.168.192.199.

It "executed successfully." SSH still dead.

Attempt 3: The n8n UI Fight

The n8n interface was fighting us. A broken node sat between the trigger and SSH node, eating the data flow. Output panels showed nothing. Sessions expired mid-operation. We were driving a browser through Playwright, clicking accessibility tree refs, wrestling with dialog modals.

It was getting ugly. We were automating a browser to automate a workflow engine to automate an SSH connection to fix the SSH connection we couldn't make directly. Three layers of indirection deep and sinking.

Chapter V

The Pivot

We abandoned the UI. Went to n8n's settings, created an API key, and started driving everything through REST calls.

First, we created a clean workflow via the API — just a webhook trigger wired directly to an SSH node. No broken nodes in between. Activated it. Hit the webhook URL:

terminal

curl 'https://n8n.americannex.com/webhook/ssh-diag-run'

And for the first time, we got actual output from the server:

server response

Timeout before authentication for connection from 192.168.192.199
srclimit_penalise: ipv4: new 192.168.192.199/32 deferred penalty of 10 seconds

A webhook shell. We'd built a web-accessible command executor out of a workflow automation tool's SSH node and a webhook trigger. Crude. Effective. Terrifying from a security perspective.
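
For the curious, the two-node workflow can be sketched roughly like this. Treat it as an approximation under assumptions rather than the exact payload we sent: the field names follow n8n's public REST API as we understand it, and the workflow name, the diagnostic command, and the `N8N_URL` / `N8N_API_KEY` variables are illustrative. The SSH node also needs stored credentials attached, which this sketch leaves out.

```shell
# Sketch only. Workflow name, command and env var names are assumptions.
N8N_URL="${N8N_URL:-https://n8n.americannex.com}"

# Emit a workflow: Webhook trigger wired straight into an SSH node.
workflow_json() {
  cat <<'EOF'
{
  "name": "ssh-diag",
  "nodes": [
    {
      "name": "Webhook",
      "type": "n8n-nodes-base.webhook",
      "typeVersion": 1,
      "position": [0, 0],
      "parameters": { "path": "ssh-diag-run", "httpMethod": "GET", "responseMode": "lastNode" }
    },
    {
      "name": "SSH",
      "type": "n8n-nodes-base.ssh",
      "typeVersion": 1,
      "position": [250, 0],
      "parameters": { "command": "journalctl -u ssh -n 20 --no-pager" }
    }
  ],
  "connections": {
    "Webhook": { "main": [[ { "node": "SSH", "type": "main", "index": 0 } ]] }
  },
  "settings": {}
}
EOF
}

# POST the workflow to n8n (not called here; needs N8N_API_KEY set).
create_workflow() {
  workflow_json | curl -sS -X POST "$N8N_URL/api/v1/workflows" \
    -H "X-N8N-API-KEY: $N8N_API_KEY" \
    -H 'Content-Type: application/json' \
    --data-binary @-
}
```

Once the workflow is activated, every GET to the webhook path runs the SSH node's command and returns its output, which is exactly why this pattern deserves to be locked down.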

Chapter VI

The Real Culprit

OpenSSH 9.8 introduced PerSourcePenalties, a built-in rate limiter (on by default, and present in the OpenSSH 10 on this box) that penalizes source addresses for failed or timed-out authentication. Our IP had been accumulating penalties from every failed SSH attempt we'd made during debugging. The server was deliberately delaying and dropping our connections.

It was never fail2ban. It was SSH itself, punishing us for trying to fix it.

We disabled it via the webhook shell and restarted sshd.

webhook shell

# Disable PerSourcePenalties
# Restart sshd
# Check status
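
The exact commands aren't recorded above. What follows is a hedged sketch of what those three steps could look like on a systemd host, not a transcript. The drop-in directory, the file name, the `/usr/sbin/sshd` path, and the `ssh` unit name are all assumptions and vary by distribution.

```shell
# Hypothetical helper: write a drop-in that turns PerSourcePenalties off.
# Only effective if the main sshd_config includes this directory; sshd uses
# the first value it sees, hence the low-numbered file name.
write_penalty_dropin() {
  dir="${1:-/etc/ssh/sshd_config.d}"
  printf 'PerSourcePenalties no\n' > "$dir/00-no-penalties.conf"
}

# On the server, as root (not run here):
#   write_penalty_dropin
#   /usr/sbin/sshd -t && systemctl restart ssh   # unit may be "sshd" instead
#   systemctl status ssh --no-pager
```

Running `sshd -t` before the restart matters when the only way back in is a webhook: a config typo would otherwise leave sshd down entirely.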

Still broken.

We checked iptables. Found Kubernetes's kube-router had set the INPUT chain to policy DROP with its own netpol rules. Added explicit ACCEPT rules for the ZeroTier subnet.
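
A sketch of the kind of rule involved, written as a dry-run helper that prints the command instead of running it. The `/24` mask on the ZeroTier subnet and the helper name are assumptions; we only know the addresses sit in 192.168.192.x.

```shell
# Hypothetical helper: print (not run) a rule letting a subnet reach sshd.
# -I INPUT 1 inserts at the top, ahead of kube-router's own chains.
accept_ssh_rule() {
  printf 'iptables -I INPUT 1 -s %s -p tcp --dport 22 -j ACCEPT\n' "$1"
}

# Inspect first, then apply on the server as root (not run here):
#   iptables -S INPUT | head -n 5     # first line shows the chain policy
#   accept_ssh_rule 192.168.192.0/24 | sh
```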

Still broken.

Two hypotheses down. Each one plausible. Each one wrong. The real answer was hiding in a place we hadn't thought to look yet — the cryptographic handshake itself.

Chapter VII

Post-Quantum vs. VPN

After exhausting every firewall and ban theory, we tried something different:

terminal

ssh -o KexAlgorithms=curve25519-sha256 192.168.192.52 "echo WORKS"

output

WORKS

One flag. That's all it took.

OpenSSH has preferred a post-quantum hybrid key exchange by default since 9.0 (sntrup761x25519-sha512, joined in OpenSSH 10 by mlkem768x25519-sha256 as first choice). It's the future of cryptography. It's also large. The initial key exchange packets are ~1.5KB, which is fine on regular networks but gets fragmented traversing ZeroTier's tunnel. The fragments were getting silently dropped, so the handshake stalled at SSH2_MSG_KEX_ECDH_REPLY until the server gave up and closed the connection.

The classic curve25519-sha256 produces smaller packets that fit cleanly through the tunnel.
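
One way to test the packet-size theory directly is to ping across the tunnel with increasing payload sizes and see where replies stop. This is a minimal sketch assuming Linux iputils `ping`. The helper name and the MTU values tried are illustrative, not measurements from this network.

```shell
# ICMP echo payload that exactly fills a given MTU:
# 20 bytes of IPv4 header plus 8 bytes of ICMP header come off the top.
payload_for_mtu() {
  echo $(( $1 - 28 ))
}

# Probe with inner fragmentation prohibited (not run here):
#   ping -M do -c 1 -s "$(payload_for_mtu 1280)" 192.168.192.52
#   ping -M do -c 1 -s "$(payload_for_mtu 1500)" 192.168.192.52
```

If the small probe gets a reply and the large one times out, large packets are being lost somewhere along the tunnel, which matches a handshake that dies only when the key exchange is big.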

The fix:

~/.ssh/config

Host 192.168.192.52
  KexAlgorithms curve25519-sha256

Permanent. Clean. One line.
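
To confirm the client actually picks up the override, `ssh -G` prints the effective configuration for a host without connecting. A small sketch follows; the `effective_kex` helper is a made-up name for illustration.

```shell
# Hypothetical helper: read `ssh -G` output on stdin and print the
# effective key exchange list for that host.
effective_kex() {
  awk 'tolower($1) == "kexalgorithms" { print $2 }'
}

# Usage (not run here):
#   ssh -G 192.168.192.52 | effective_kex
```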

Every firewall and ban theory had been a dead end. The actual fix was one line in SSH config. The diagnosis was harder than the cure.

Chapter VIII

Victory Lap

With SSH restored, we went further — setting up a full ZeroTier bridge to the home LAN (10.0.0.0/24).

Server side (persistent via UFW):

network configuration

# NAT masquerade between ZeroTier and LAN interfaces
# Forwarding rules between ztwfufk6e4 and enp14s0
# ZeroTier subnet accept in input chain
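
The rules themselves aren't listed above. As a rough sketch of the usual shape, here is a dry-run helper that prints forwarding and NAT commands for the two interface names mentioned. It is an assumption about the general recipe, not the configuration actually deployed, and raw iptables commands like these do not survive a reboot. Persisting them through UFW would mean putting the equivalents in UFW's own rule files (for example /etc/ufw/before.rules) instead.

```shell
# Hypothetical helper: print (not run) commands that forward and masquerade
# traffic from a ZeroTier interface ($1) out through a LAN interface ($2).
bridge_rules() {
  zt_if="$1"
  lan_if="$2"
  echo "sysctl -w net.ipv4.ip_forward=1"
  echo "iptables -t nat -A POSTROUTING -o $lan_if -j MASQUERADE"
  echo "iptables -A FORWARD -i $zt_if -o $lan_if -j ACCEPT"
  echo "iptables -A FORWARD -i $lan_if -o $zt_if -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT"
}

# Review the output, then apply as root (not run here):
#   bridge_rules ztwfufk6e4 enp14s0 | sh
```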

Network side (auto-pushed to all devices):

zerotier central

# Managed route: 10.0.0.0/24 via 192.168.192.52
# Pushed to all ZeroTier members automatically

Now every device on the ZeroTier network — phone, tablet, laptop, wherever in the world — can reach every device on the home LAN.

From locked out to full LAN bridge in one hour.

Chapter IX

Lessons Learned

Post-quantum crypto is here and it breaks things. OpenSSH 10's default kex algorithm produces packets too large for some VPN tunnels. If SSH hangs during key exchange over a tunnel, force curve25519-sha256 before blaming the firewall.
n8n is a legitimate remote access tool. If you have n8n running with SSH credentials configured, you have a web-accessible shell. With the webhook pattern, you can even script it from curl. Secure accordingly.
The diagnosis was harder than the fix. We went through fail2ban, PerSourcePenalties, iptables INPUT rules, OUTPUT rules, kube-router netpol, conntrack, and UFW, and all of them were dead ends. The actual fix was one line in SSH config. The attack surface analysis, lateral movement through n8n, and systematic elimination of hypotheses are what got us there.
Always have a second way in. If SSH is your only remote access, you're one misconfiguration away from a drive to the data center. n8n, Cockpit, Tailscale, WireGuard, a webhook-triggered script — have something else running.
Red-team your own infrastructure. We found that n8n was accessible through Cloudflare with full admin access. The SSH node could run arbitrary commands as a user with sudo. MinIO's console was exposed. PostgreSQL was listening on all interfaces. This was a fun exercise. It would not be fun if someone else did it first.

Chapter X

Timeline

Time	Event
0:00	"Can you check the server"
0:02	SSH dead. Ollama alive.
0:05	fail2ban hypothesis
0:08	Port scan reveals n8n on 5678
0:12	Playwright navigates to n8n login
0:15	Code node blocked by sandbox
0:20	SSH node configured, UI fight begins
0:35	Session expires mid-operation
0:40	API key created, webhook shell built
0:42	First server output: PerSourcePenalties logs
0:45	Penalties disabled, sshd restarted — still broken
0:48	iptables rules added — still broken
0:50	KexAlgorithms curve25519-sha256 — WE'RE IN
0:55	ZeroTier bridge configured
1:00	Managed route pushed to all devices

Total time from lockout to full LAN bridge: ~1 hour.