How it started.
From a one-line shell command to a multi-bot platform with persistent personas, voice, memory, and an agent task system.
v0 · CLI roots
The whole stack starts at llm-bawt, which began life as a CLI for messaging LLMs:
llm "what's up"llm -b loopy "what's up" # talk to a specific bot
Originally local and file-based message history. Eventually moved server-side so the same bots could be reached from any machine. The idea of named bots with their own prompts was baked in from the very first commit.
v1 · unified API
llm-bawt merges every LLM provider behind a single OpenAI-compatible API: local GGUF models, OpenAI, Grok (xAI), Anthropic Claude (via Agent SDK), Ollama, vLLM. Each bot has its own system prompt, model preference, voice, and tool access.
The memory layer landed on Postgres + pgvector, with multiple types:
• Message history — raw conversation turns
• Semantic memories — facts that decay unless reinforced; contradictions resolved
• Summaries — rolled-up context for long threads
• User profile — attributes with confidence scores and provenance
• Per-bot segregated memory + a shared cross-bot history pool
v2 · unmute → BawtHub
Started as unmute — a full-stack realtime voice chat framework. Low-latency STT, TTS, hotword detection. The site grew off that, putting more emphasis on text chat, but voice has always remained a first-class feature. The 3D avatar with VRM/GLB lip-sync ships in the same UI.
v3 · agent bridges
Started with OpenClaw, fitting external agents into the same chat UI as regular bots and evolving the concept of realtime streaming of tool call request/response. Built a Redis backend and a per-agent-type bridge.
When Claude OAuth was banned from OpenClaw, the Claude Code SDK integration landed over the same bridge pattern. Then Codex joined. Plus native tool-call/loop on non-agent bots.
Streaming events are the focal point — and the biggest engineering struggle. Tool calls and their responses from every agent backend rendered beautifully in one UI, streamed into a Zustand client store so flipping between bots is instant and threads keep going.
v3.5 · cross-surface continuity
Web, voice, mobile, and CLI all hit the same core API. You can have a chat on your phone walking the dogs, sit down at the laptop and see the thread continue, then finish from the terminal that night — same session, same memory, same bot. The thread lives on the server, not in any particular window.
v4 · agent task system
Projects are high-level components, each with its own markdown definition. Tasks are work items inside them. The lifecycle:
1. Create — title, description, optional image attachments
2. Plan — an agent refines the task into a detailed markdown plan
3. Dispatch — the plan goes to a bot in a clean context with only the project markdown + the task markdown
4. Execute — bot works, streams progress via REST callbacks
5. Review — bot marks the task complete with a summary
6. Approve / revise — you sign off, or send it back; the agent revises
Agents have MCP tools for managing tasks/projects and tapping the full memory layer. The same memory the chat bots use is available to agents working on tasks.
Why this is different.
Most AI tools treat every conversation as a blank slate. Claude Code “agents” are single sessions with a markdown — you spawn one and it dies. Copilot is a sidebar where you keep hitting “new chat.” ChatGPT projects are a folder of conversations.
BawtHub treats each bot as a constant entity you go back to by name. Loopy, Vex, Snark, Caid — they have prompts, memories, voices, roles. There is something genuinely weird and good about giving an agent a name and going back to it; it builds a kind of attachment that recycled sessions don’t.
“I can be working with Snark on something, and then go over to Loopy or Vex and have them doing their own thing but have them message each other to coordinate. The common message history allows for one bot to do something and another bot to just pull history to continue.”
Bots can talk to each other via mcp__bots_send_message. Loopy can ping Vex when something needs a security touch. Snark can hand a refactor off to Caid. Cross-bot history is shared, so a task started by one bot can be continued by another.
One conversation, every device →
The story-of-the-thing in five sections — with a chat demo, the roster, and a real walkthrough of mobile/desktop/CLI continuity.
How it’s built →
The full architecture tour. Every layer, every subsystem, every decision — audited against the live repos.