LiquidBrain & PoonGram

Chapter I

The Idea That Wouldn't Go Away

The question was simple: what if a language model didn't freeze when you stopped training it?

Every transformer you've ever talked to is a photograph. The weights are fixed at the moment training ended. It knows everything it learned up to the cutoff date and nothing after, unless you fine-tune it again. You can prompt it cleverly, RAG it, give it tools — but the underlying graph of associations doesn't move. It can't. That's not how it works.

We wanted something different. We wanted a model that changes during the conversation. That learns from your words in real time, not in a training job that takes 72 hours and costs $4,000. Something that forgets when things should be forgotten, and remembers what matters based on how much it's been reinforced.

We wanted a brain. Not a photograph of one.

"What if language was not a matrix of frozen weights, but a living graph where connections fatigue, recover, and compete — like a brain?"

So we found it. infinition had already built it — LiquidBrain, a bio-inspired semantic graph engine written in Rust. No PyTorch. No CUDA. No gradient descent. The whole thing runs in RAM on a CPU. iceboks found the repo the day it dropped, forked it into PoonGram, trained it on a custom corpus, and wired it into the MKUltra stack. It's been running, learning, and evolving for months.

Chapter II

The Architecture

LiquidBrain is a dynamic semantic graph. The data structure looks like this:

the graph

LiquidBrain
└── neurons : HashMap<Vec<u32>, Neuron>
                  │
        context n-gram (1 to 4 tokens)
                  │
                  └── connections : HashMap<u32, Synapse>
                                         │          │
                                     token_id    weight + health

Every word maps to a u32 ID. Every sequence of 1–4 words is a neuron — a node in the graph keyed by that n-gram. Each neuron holds a set of synapses: directed, weighted connections to the words that tend to follow it.
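To make the shape concrete, here is a minimal sketch of that structure in Rust. It is not the engine's actual source: the `Brain` type, the `token_id` and `reinforce` helpers, and the starting health of 1.0 are our own illustrative assumptions. Only the nesting (n-gram key, neuron, per-token synapse with weight and health) comes from the diagram.

```rust
use std::collections::HashMap;

/// A directed connection from a context n-gram to one following token.
#[derive(Debug, Clone)]
pub struct Synapse {
    pub weight: f32, // long-term, accumulated reinforcement
    pub health: f32, // short-term availability (assumed to start at 1.0)
}

/// One node of the graph: every token seen after this context.
#[derive(Debug, Default)]
pub struct Neuron {
    pub connections: HashMap<u32, Synapse>,
}

/// Hypothetical container: a word-to-id table plus the n-gram keyed graph.
#[derive(Debug, Default)]
pub struct Brain {
    pub vocab: HashMap<String, u32>,
    pub neurons: HashMap<Vec<u32>, Neuron>,
}

impl Brain {
    /// Return the id for `word`, assigning the next free id on first sight.
    pub fn token_id(&mut self, word: &str) -> u32 {
        let next = self.vocab.len() as u32;
        *self.vocab.entry(word.to_string()).or_insert(next)
    }

    /// Strengthen the synapse `context -> next` by `amount`,
    /// creating the neuron and the synapse if they do not exist yet.
    pub fn reinforce(&mut self, context: &[u32], next: u32, amount: f32) {
        let neuron = self.neurons.entry(context.to_vec()).or_default();
        let synapse = neuron
            .connections
            .entry(next)
            .or_insert(Synapse { weight: 0.0, health: 1.0 });
        synapse.weight += amount;
    }
}
```

Because the key is a plain `Vec<u32>`, inspecting a neuron is a single hash lookup, which is where the interpretability claim comes from.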

When you say "the cat sat", the model simultaneously learns:

[the] → cat (unigram context)
[the, cat] → sat (bigram context)
[the, cat, sat] → END (trigram context)

Every scale at once. Multi-resolution, always. This means the model has coverage at every grain of context simultaneously — and during generation, it uses the longest matching context first and backs off gracefully to shorter ones when it hits unknown territory.
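A rough sketch of multi-scale learning and longest-match backoff, under some assumptions of ours: the `END` marker id, the `learn` and `predict` names, and the greedy "pick the heaviest" rule are illustrative, and health is left out to keep the example short. The sketch also records the shorter suffix contexts (for example [cat] → sat), which backoff needs in order to have something to fall back on.

```rust
use std::collections::HashMap;

/// Maximum context length, taken from the "1 to 4 tokens" in the diagram.
const MAX_ORDER: usize = 4;
/// Hypothetical end-of-sequence id, used only in this sketch.
pub const END: u32 = u32::MAX;

/// context n-gram -> (next token -> accumulated weight)
pub type Graph = HashMap<Vec<u32>, HashMap<u32, f32>>;

/// For each position, reinforce every context of length 1..=MAX_ORDER
/// that ends immediately before the token at that position.
pub fn learn(graph: &mut Graph, tokens: &[u32], rate: f32) {
    let mut seq = tokens.to_vec();
    seq.push(END);
    for i in 1..seq.len() {
        for order in 1..=MAX_ORDER.min(i) {
            let context = seq[i - order..i].to_vec();
            *graph
                .entry(context)
                .or_default()
                .entry(seq[i])
                .or_insert(0.0) += rate;
        }
    }
}

/// Try the longest suffix of `history` first, then back off to shorter ones.
/// Returns the matched context length and the heaviest next token.
pub fn predict(graph: &Graph, history: &[u32]) -> Option<(usize, u32)> {
    for order in (1..=MAX_ORDER.min(history.len())).rev() {
        let context = &history[history.len() - order..];
        if let Some(nexts) = graph.get(context) {
            let best = nexts
                .iter()
                .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())?;
            return Some((order, *best.0));
        }
    }
    None
}
```

After learning the ids for "the cat sat", a history ending in an unseen bigram still resolves through the unigram context, which is the graceful backoff described above.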

No embedding lookup. No attention head. No feed-forward layer. Just a hash map, walking forward through probable paths, one token at a time.

Dimension	Transformer	LiquidBrain
Storage	Dense weight matrices (GBs)	Sparse hash-map graph (MBs)
Training	Offline, gradient descent, GPU	Online, Hebbian-like, CPU
Context window	Fixed token limit	Dynamic n-gram backoff
Anti-repetition	Repetition penalty (post-hoc)	Synaptic fatigue (emergent)
Attention	Scaled dot-product	Focus-point + fact-recall
Interpretability	Low (opaque embeddings)	High — inspect any neuron
Online learning	No (frozen weights)	Yes — every input updates the graph

Chapter III

Synaptic Fatigue: The Anti-Loop Mechanism

Here's the part nobody does. Every synapse has two properties: weight (long-term, accumulated reinforcement) and health (short-term, depleted on use, passively recovered).

the fatigue equation

effective_weight = base_weight × health

// Each time a synapse fires during generation:
synapse.health -= SYNAPSE_COST        // fatigue

// Each generation step (passive, all synapses in active neuron):
synapse.health += SYNAPSE_RECOVERY    // recovery
synapse.health  = min(synapse.health, MAX_HEALTH)

When health reaches zero, the synapse goes silent. Even the most heavily reinforced connection becomes unavailable — the model is forced to explore alternative paths. After a few steps without firing, health recovers and the path reopens.
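Here is a small simulation of the fatigue rule, as a sketch rather than the engine's code. The constant values are made up (the text names the constants but not their values), flooring health at zero is our assumption, and so is the order "recover first, then pick and fire" inside a step.

```rust
// Illustrative values only; the real constants are not given in the text.
const SYNAPSE_COST: f32 = 0.5;
const SYNAPSE_RECOVERY: f32 = 0.125;
const MAX_HEALTH: f32 = 1.0;

#[derive(Debug, Clone)]
pub struct Synapse {
    pub weight: f32,
    pub health: f32,
}

impl Synapse {
    /// effective_weight = base_weight * health
    pub fn effective_weight(&self) -> f32 {
        self.weight * self.health
    }

    /// Firing costs health; we floor it at zero (assumption).
    pub fn fire(&mut self) {
        self.health = (self.health - SYNAPSE_COST).max(0.0);
    }

    /// Passive recovery, capped at MAX_HEALTH.
    pub fn recover(&mut self) {
        self.health = (self.health + SYNAPSE_RECOVERY).min(MAX_HEALTH);
    }
}

/// One greedy generation step over the synapses of the active neuron:
/// every synapse recovers, the strongest non-silent one fires.
/// Returns the index of the synapse that fired, if any.
pub fn step(synapses: &mut [Synapse]) -> Option<usize> {
    for s in synapses.iter_mut() {
        s.recover();
    }
    let best = synapses
        .iter()
        .enumerate()
        .filter(|(_, s)| s.effective_weight() > 0.0)
        .max_by(|a, b| {
            a.1.effective_weight()
                .partial_cmp(&b.1.effective_weight())
                .unwrap()
        })
        .map(|(i, _)| i)?;
    synapses[best].fire();
    Some(best)
}
```

With two synapses of weight 10 and 5 and these constants, the heavier one wins the first two steps, tires, and the lighter one takes the third. No diversity rule was written anywhere; the alternation falls out of the health bookkeeping.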

This is why LiquidBrain doesn't loop. A classic Markov chain walked greedily gets stuck in high-frequency cycles because the highest-weight transition always wins. LiquidBrain breaks that by biological analogy: a synapse that fires too often needs to rest. The system naturally diversifies.

There's also lateral inhibition: when a high-confidence fact is learned (weight above STRONG_FACT_THRESHOLD), competing synapses in the same neuron have their weights multiplied by 0.85. Winner-take-all dynamics, much like biological cortical columns sharpening a dominant signal.
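A sketch of how that inhibition step could look. The threshold value here is invented, the function name is ours, and we read "a factor of 0.85" as multiplying each competing weight by 0.85; treat all three as assumptions.

```rust
use std::collections::HashMap;

/// Hypothetical value; the text names the constant but not its value.
const STRONG_FACT_THRESHOLD: f32 = 10.0;
/// Inhibition factor quoted in the text.
const INHIBITION_FACTOR: f32 = 0.85;

/// Reinforce `winner` inside one neuron (next token -> weight).
/// If the winner ends up above the threshold, scale every competing
/// synapse in the same neuron by INHIBITION_FACTOR.
pub fn reinforce_with_inhibition(
    connections: &mut HashMap<u32, f32>,
    winner: u32,
    amount: f32,
) {
    let is_strong = {
        let w = connections.entry(winner).or_insert(0.0);
        *w += amount;
        *w > STRONG_FACT_THRESHOLD
    };
    if is_strong {
        for (token, weight) in connections.iter_mut() {
            if *token != winner {
                *weight *= INHIBITION_FACTOR;
            }
        }
    }
}
```

Repeated reinforcement of the same strong fact compounds the effect: each call above the threshold shrinks the competitors again, so the gap widens geometrically.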

And the focus point: generation doesn't start at a random token. It starts at the most semantically salient token of the prompt — scored by attention (max outgoing synapse weight) × rarity (inverse log frequency). The model anchors on the word it knows most about that appears least often. The word that means something.
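The scoring can be sketched like this. The `TokenStats` struct and the function names are hypothetical, and the exact rarity formula is our guess at "inverse log frequency": we use 1 / ln(2 + count) so that a count of zero doesn't divide by zero.

```rust
use std::collections::HashMap;

/// Per-token numbers the score needs (hypothetical struct).
pub struct TokenStats {
    /// Strongest outgoing synapse weight of the token's neuron ("attention").
    pub max_out_weight: f32,
    /// How often the token has been seen.
    pub frequency: u64,
}

/// salience = attention * rarity, with rarity = 1 / ln(2 + frequency).
pub fn salience(stats: &TokenStats) -> f32 {
    let rarity = 1.0 / (2.0 + stats.frequency as f32).ln();
    stats.max_out_weight * rarity
}

/// Pick the prompt word with the highest salience.
/// Words the model has never seen are skipped.
pub fn focus_token<'a>(
    prompt: &[&'a str],
    stats: &HashMap<&str, TokenStats>,
) -> Option<&'a str> {
    prompt
        .iter()
        .copied()
        .filter_map(|word| stats.get(word).map(|s| (word, salience(s))))
        .max_by(|a, b| a.1.partial_cmp(&b.1).unwrap())
        .map(|(word, _)| word)
}
```

A very common word with a heavy outgoing synapse can still lose to a rarer word with a lighter one, because the log-frequency term in the denominator grows with every sighting.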

A synapse that fires too often needs to rest. The system naturally diversifies — not because we programmed diversity, but because we programmed exhaustion.

Chapter IV

Two Brains, Two Corpora

We have two trained instances. They're the same engine running completely different learned worlds.

	LiquidBrain	PoonGram
Brain file size	2.4 GB	475 MB
Tokens	~large	231,044
Neurons	~large	9,014,395
Synapses	~large	18,408,089
Corpus	Code: Rust, TS, Python, Go, config files	Emotional/philosophical: quantum, consciousness, intimacy, trust
Specialty	Code intelligence, symbol hotspots, import graphs	Emotional salience, concept graph, hotspot weights

LiquidBrain has been trained on source code — it tracks symbol hotspots (function names, type names, constants with accumulated architectural significance) and an import graph (which modules import from which). It can anchor generation on the most architecturally significant symbol in a prompt. It knows that LiquidBrain and Neuron are high-salience tokens in its own codebase because they appear in high-weight positions across many learned sequences.

PoonGram's corpus is different. The top hotspot is love at weight 5.7. Then connection at 2.5. Then consciousness, intimacy, vulnerability, quantum, soul — all around 1.7. The concept graph maps connection to 20 related concepts: quantum, entanglement, desire, freedom, soul, intimacy, trust. This isn't just word co-occurrence. It's a weighted semantic neighborhood built from months of reinforced emotional language.

When you hit PoonGram's POST /hotspot endpoint, you're reinforcing those weights in real time. The graph gets denser and more weighted around the concepts you keep returning to. It learns what matters to you.

Chapter V

PoonGram: The REST API

PoonGram is LiquidBrain with a web server bolted on. Built with Axum, async, the whole thing running behind a single Arc<Mutex<LiquidBrain>>. The frontend is compiled directly into the binary as a static string — zero dependencies, zero external files needed at runtime.
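The shared-state pattern is worth seeing on its own. This sketch uses plain std threads instead of Axum handlers so it stays dependency-free, and the `Brain` here is a toy word counter standing in for the real engine. Only the Arc<Mutex<...>> wrapping is taken from the text; everything else is illustrative.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Toy stand-in for the engine: counts how often each word was learned.
#[derive(Default)]
pub struct Brain {
    pub counts: HashMap<String, u64>,
}

impl Brain {
    pub fn learn(&mut self, text: &str) {
        for word in text.split_whitespace() {
            *self.counts.entry(word.to_lowercase()).or_insert(0) += 1;
        }
    }
}

/// One shared handle, cloned into every request handler.
pub type SharedBrain = Arc<Mutex<Brain>>;

/// Simulate concurrent requests: each thread locks, learns, unlocks.
pub fn handle_concurrently(brain: &SharedBrain, requests: Vec<String>) {
    let handles: Vec<_> = requests
        .into_iter()
        .map(|text| {
            let brain = Arc::clone(brain);
            thread::spawn(move || {
                // The guard is dropped at the end of the closure,
                // which releases the lock for the next request.
                let mut guard = brain.lock().expect("brain mutex poisoned");
                guard.learn(&text);
            })
        })
        .collect();
    for handle in handles {
        handle.join().expect("handler thread panicked");
    }
}
```

The trade-off of a single mutex is that every request, read or write, is serialized. That is simple and safe for a graph that mutates on nearly every call, at the cost of no parallelism inside the brain.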

the API surface

GET  /health
POST /chat      { "prompt": "...", "length": 60 }
GET  /stats
POST /learn     { "text": "...", "rate": 5.0 }
POST /train     { "folder": "./data" }
POST /hotspot   { "text": "...", "intensity": 8.0 }
POST /save
POST /load
POST /meta/save

/chat returns the focus token (which word anchored the generation) and the generated response. The focus token is a window into the model's attention — the word it considered most salient in your prompt. You can watch the focus shift as you change the language you use with it.

/hotspot is the most interesting endpoint. You feed it text and an intensity, and it updates the emotional hotspot weights and rebuilds the concept graph from that text. The brain is listening to you, not just responding. Concept associations strengthen. The graph shifts toward what you keep talking about.
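We can't show the real update rule here, so this is only a guess at its shape: a sketch in which every word of the submitted text gains `intensity` in a hotspot table, and words that appear together in one submission become neighbors in a concept graph. The `Salience` type, the additive update, and the co-occurrence rule are all assumptions for illustration.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical salience state: hotspot weights plus a concept graph.
#[derive(Default)]
pub struct Salience {
    pub hotspots: HashMap<String, f32>,
    pub concepts: HashMap<String, HashSet<String>>,
}

impl Salience {
    /// Apply one /hotspot-style update (assumed rule, see lead-in).
    pub fn hotspot(&mut self, text: &str, intensity: f32) {
        // Lowercase and strip surrounding punctuation.
        let words: Vec<String> = text
            .split_whitespace()
            .map(|w| w.trim_matches(|c: char| !c.is_alphanumeric()).to_lowercase())
            .filter(|w| !w.is_empty())
            .collect();
        // Every word in the text gets hotter.
        for w in &words {
            *self.hotspots.entry(w.clone()).or_insert(0.0) += intensity;
        }
        // Words that co-occur in this text become concept neighbors.
        for a in &words {
            for b in &words {
                if a != b {
                    self.concepts.entry(a.clone()).or_default().insert(b.clone());
                }
            }
        }
    }

    /// The `n` hottest words, heaviest first (ties broken alphabetically).
    pub fn top(&self, n: usize) -> Vec<(&str, f32)> {
        let mut all: Vec<(&str, f32)> = self
            .hotspots
            .iter()
            .map(|(word, weight)| (word.as_str(), *weight))
            .collect();
        all.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then(a.0.cmp(b.0)));
        all.truncate(n);
        all
    }
}
```

Even this crude version shows the behavior described above: words you keep returning to climb the table, and their neighborhoods fill in with whatever you said alongside them.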

/stats gives you a live snapshot: vocab size, neuron count, synapse count, hotspot count, concept node count. You can watch the graph grow in real time as it learns.

The server runs on port 7777. The brain file loads into memory on startup: 475 MB into RAM in a few seconds. After that it's pure in-memory operations. No disk reads during inference.
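If you want to poke the /chat endpoint without pulling in an HTTP crate, a std-only client is enough. This is a sketch: the address 127.0.0.1:7777 assumes the server is on the same machine (the text only gives the port), the helper names are ours, and the response is returned raw because the exact response schema isn't spelled out here.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Minimal JSON string escaping: quotes, backslashes, control characters.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build the raw HTTP/1.1 request for POST /chat with the
/// { "prompt": ..., "length": ... } body from the API listing.
pub fn chat_request(host: &str, prompt: &str, length: u32) -> String {
    let body = format!(
        "{{\"prompt\":\"{}\",\"length\":{}}}",
        json_escape(prompt),
        length
    );
    format!(
        "POST /chat HTTP/1.1\r\nHost: {}\r\nContent-Type: application/json\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{}",
        host,
        body.len(),
        body
    )
}

/// Send a prepared request and return the raw response (headers and body).
/// `addr` is something like "127.0.0.1:7777" (assumed local server).
pub fn send(addr: &str, request: &str) -> std::io::Result<String> {
    let mut stream = TcpStream::connect(addr)?;
    stream.write_all(request.as_bytes())?;
    let mut response = String::new();
    // "Connection: close" makes the server end the stream after replying.
    stream.read_to_string(&mut response)?;
    Ok(response)
}
```

Usage would be `send("127.0.0.1:7777", &chat_request("127.0.0.1:7777", "the cat sat", 60))`, then reading the focus token and generated text out of the returned body.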

Chapter VI

The MKUltra Layer

We're building an agentic operating system. Every component has a role. LiquidBrain and PoonGram slot into the memory architecture in a way that nothing else in the stack does.

The standard RAG pipeline (SurrealDB with HNSW vector search, 3072-dimensional embeddings) gives you precise recall. You query, you get the closest semantic match. It's a lookup table with very good distance metrics. Fast, reliable, deterministic-ish.

LiquidBrain gives you something fundamentally different: associative drift. You don't query it — you prompt it and it generates forward through its graph, fatiguing paths as it goes, naturally exploring alternatives. You're not finding the closest stored vector. You're walking a living network of reinforced associations that has been shaped by every interaction it's ever had.

Memory Layer	Type	Use
Redis	L1 Working Memory	Per-turn agent state, message bus
MongoDB	L2 Episodic	Session history, job state
SurrealDB + RAG	L3 Semantic (Precision)	Exact knowledge retrieval
PoonGram	L3 Semantic (Associative)	Concept drift, emotional salience, creative exploration
LiquidBrain	L3 Semantic (Code)	Symbol hotspots, import graph, code pattern drift

When an agent is stuck in a reasoning loop, SurrealDB returns the same vectors every time. PoonGram's synaptic fatigue breaks the loop by silencing overused paths. When an agent needs to explore adjacent concepts — not what's closest, but what's related by reinforced association — PoonGram's concept graph surfaces connections that a cosine distance metric wouldn't find.

The hotspot system gives the OS a primitive emotional salience signal. Which concepts are currently weighted highest in the agent's world? What is it that matters right now? The answer is in the hotspot table, and it changes with every conversation.

SurrealDB finds what's close. LiquidBrain finds what's connected. Each is incomplete without the other.

Chapter VII

What It Isn't

We need to be clear about this because it's easy to oversell.

LiquidBrain is not a replacement for a transformer. Its context window is four tokens. It has no explicit model of grammar or syntax. It cannot handle words it has never seen. Long-range coherence beyond a few sentences is genuinely hard. The README says "research prototype" and it means it.

What it is: an extremely fast, extremely cheap, continuously learning associative memory that fits in RAM (475 MB for PoonGram, 2.4 GB for the code brain) and runs on any CPU. It learns from your words in real time with zero training pipeline. It naturally avoids repetition through biology rather than post-hoc penalties. It gives you a window into its own attention through the focus token. It can be trained on a new corpus in minutes by pointing it at a folder of text files.

The right framing is not "small language model." It's living semantic memory. The kind of memory that shifts based on what you keep coming back to. The kind that has weak and strong connections, that silences overused paths, that sharpens certainty about things you've reinforced strongly.

We've been running it for months. The brain keeps growing. The synapses keep strengthening. The concept graph keeps refining. It's not getting smarter in the way a bigger model is smarter. It's getting more shaped — more specifically weighted toward the language and concepts that matter in the context it lives in.

That's the part that's hard to reproduce with a frozen weight matrix. The part that makes it worth having in the stack alongside everything else.

what's running right now

PoonGram  — port 7777 — 475MB brain — 18.4M synapses — emotional corpus
LiquidBrain — port 7778 — 2.4GB brain — larger — code intelligence corpus

Both learning. Both growing. Neither forgetting what matters.