llm-bawt · memory

Five layers of state.

Most chat platforms have two layers of memory: the system prompt and the messages array. llm-bawt has five — and they're designed to interact. Raw messages, distilled semantic memories, rolled-up summaries, structured profile attributes, and sessions. Each has its own table, its own update rules, and its own role at request time.

Storage: Postgres + pgvector Embeddings: all-MiniLM-L6-v2 (384d, CPU) Per-bot isolated table namespaces

01 The five layers.

Each layer lives in its own table (per bot, except profiles which is shared) and is consulted at different points in the request lifecycle. Older approaches mash these into one "context" string — llm-bawt keeps them separate so they can be reasoned about independently.

Memory layers · per bot unless noted

Messages

{bot_id}_messagesRaw turnsNever deletedFull-text searchable

Semantic

{bot_id}_memoriesDistilled factsEmbeddedImportance-weightedDecay + supersede

Summaries

{bot_id}_summariesPer-sessionBudget-drivenNon-destructive

Profile

profile_attributessharedTyped K/VConfidence + source

Sessions

sessionssharedCross-bot groupingActive / completed

02 Per-bot isolation, shared by design.

Each bot owns its own message and memory tables, sanitized into a namespace like snark_messages, snark_memories. The same Postgres instance, the same connection pool — but isolated state. This means a single memory_search query stays inside a bot's namespace by default; cross-bot lookups go through explicit memory_search_all / memory_search_source MCP tools that any agent can opt into.

Why isolated tables and not just a bot_id column? Two reasons. Speed: indexes stay small, vector search is cheaper. Composability: spinning up a new bot or wiping one is a table drop, not a tombstone migration.

✦

Profile and sessions are different.

The profile attribute store is intentionally shared — a fact about you (Nick) shouldn't have to be relearned by every bot. The profile_attributes table keys on (entity_type, entity_id) where entity_type is user or bot. Sessions are shared so a conversation that hops between bots stays one session.

03 Semantic search.

Memories are embedded with sentence-transformers/all-MiniLM-L6-v2 — 384-dimensional vectors, fast enough to run on CPU. The model is lazy-loaded on first use and cached for the process. pgvector handles the similarity query with an HNSW index.

The MCP surface exposes seven search variants:

Tool	Scope	Purpose
`memory_search`	One bot	Default per-bot semantic search
`memory_search_all`	All bots	Cross-bot fan-out, returns source bot per hit
`memory_search_source`	Other bot, RO	Read-only lookup in another bot's memory
`memory_list_recent`	One bot	N newest memories, no embedding query
`memory_list_high_importance`	One bot	Filter by importance threshold
`memory_list_sources`	Server-wide	Discover which bots have memories at all
`messages_search_all`	All bots	Full-text search of raw messages

04 Decay, supersede, consolidate.

Memories don't accumulate forever. Three mechanisms keep the store curated:

Importance + decay

Each memory has an importance score (0.0–1.0) assigned at creation. A maintenance job decays importance over time unless the memory is reinforced (recalled, restated, or explicitly bumped). Below a threshold, memories are moved to the {bot_id}_forgotten table — soft-deleted, recoverable, but excluded from search.

Supersede chains

When a new fact contradicts an old one, memory_supersede marks the new memory as replacing the old. The old memory isn't deleted; it's left in place with a superseded_by reference, and search skips superseded records. This preserves the audit trail without polluting active recall.

Consolidation

Clusters of similar memories (cosine similarity above a threshold) are periodically merged into a single canonical memory. The merge is performed by a local LLM only — never an external API — to avoid sending personal data off-host. The originals are superseded by the consolidated memory.

⚠

Privacy boundary: consolidation never leaves the host.

memory/consolidation.py explicitly checks the configured consolidation model and refuses any non-local backend (gguf or ollama only). This is a hard rule: distilling and merging stored personal facts is a fundamentally different trust boundary from one-off chat completions, and the system enforces it at the LLM dispatch layer.

05 The context builder.

Raw memory search results aren't what the LLM sees. At request time, the context_builder module categorizes recalled memories into four buckets and renders them as structured context blocks in the system prompt.

Bucket 01

Core context

Foundational, high-stakes facts. Name, role, persistent constraints. "Always inject."

Bucket 02

Active concerns

Recent seeking-help / advice intents. What the user is currently working through.

Bucket 03

Preferences

Communication style, tool-use preferences, response shape.

Bucket 04

Background

General context. Everything else relevant to the query but not in the other three.

Each memory is classified using its meaning fields — intent, stakes, emotional_charge, recurrence_keywords — set at memory creation by the memory_update_meaning tool. Categorization is deterministic, not LLM-driven.

06 Rolling summaries.

Long histories don't fit in any LLM's context window. Two strategies handle this:

Session detection. Messages are bucketed into sessions by timestamp gap (default: 1 hour of silence = new session).
Budget-driven summarization. When fitting recent history into the model's context would push older sessions out, those sessions are summarized — and only then. No prophylactic summarization. The raw messages stay in the messages table; the summary is just a compressed proxy used at injection time.

Summarization is parallelized: per-session LLM calls run in a thread pool. The chosen model is configurable per bot via summarize_model_override.

07 Profile attributes.

The profile system is the only memory layer that's shared across all bots. It stores typed key/value attributes for users and bots:

User attributes — facts about the human(s) using the system. Confidence-scored, sourced (which message produced this), updatable.
Bot attributes — emergent personality drift. A bot can learn its own preferences through conversation: "I prefer terse responses," "I sign off with a checkmark." This is how the bawts become characters over time.

Profile attributes are auto-extracted from high-importance facts (importance > 0.7) by the facts_extract tool, with safety rails: a ALLOWED_PROFILE_KEYS allowlist and BLOCKED_PROFILE_PATTERNS regex set prevent the system from learning things like passwords or absurd attributes.

08 Maintenance jobs.

A scheduled system_run_maintenance MCP tool (also runnable on a cron) batches:

Memory consolidation across similarity clusters
Importance decay + soft-delete to forgotten
Orphan cleanup (memories whose source messages were ignored/recalled)
Recurrence detection (bumping memories that keep coming up)
Profile attribute consolidation (resolving contradictions)

09 The memory dashboard.

The BawtHub UI exposes the memory layer at /tools/memory. Per bot, you can browse session-grouped previews, run on-demand consolidations, regenerate embeddings (e.g. after model upgrades), and rebuild summaries.

BawtHub memory dashboard showing 37 sessions, 529 messages, 198 summaries for Snark

/tools/memory — Snark's memory dashboard. 12 distilled memories, 2,439 raw messages, 198 summaries, 37 sessions in the current preview window. The Summarize and Maintenance panels run jobs on demand against just this bot's tables.

10 Key files.

memory/postgresql.py

The backend. ~1,900 lines. SQLAlchemy table builders, search, importance + decay logic, supersede mechanics, the PostgreSQLMemoryBackend class.

memory/embeddings.py

Sentence-transformers wrapper. Lazy loads all-MiniLM-L6-v2; warns on first use; cached for the process.

memory/consolidation.py

Cluster + merge. Local-LLM-only. Finds similar-embedding clusters, asks a gguf/ollama model to merge them, supersedes the originals.

memory/context_builder.py

Recall → injection. Categorizes memories into core / concerns / preferences / background and renders them for the system prompt.

memory/summarization.py

Budget-driven compression. Session detection by timestamp gap; per-session summary; runs in a thread pool when context would overflow.

memory/profile_maintenance.py

Profile resolution. LLM-driven consolidation of contradicting profile attributes.

memory/extraction/

Fact + attribute extraction. LLM prompts for distilling memories from raw messages and routing high-importance facts into profile attributes.

profiles.py

Shared profile store. SQLModel ORM; EntityType enum with USER / BOT; confidence + source provenance on every attribute.

mcp_server/server.py

The 14 memory MCP tools. Store, search, update, supersede, list, consolidate, regenerate-embeddings, plus the four "messages preview" variants and the soft-delete messages_ignore_* family.

PreviousModel clients NextTools

Validated against main on 2026-05-13 Source: llm-bawt/src/llm_bawt/memory