Every subsystem, as MCP tools.
llm-bawt runs a FastMCP server on port 8001 exposing the platform's internals — memory, messages, sessions, profiles, inter-bot messaging, fact extraction, and the entire agent task system — as 60 callable tools over streamable HTTP JSON-RPC. The same Python functions that power llm-bawt's in-process tool dispatch are also reachable from VS Code, Claude Desktop, agent SDKs, or curl. Same transport, same auth model (none — host allowlist), same data shapes.
01 What it is.
The MCP server lives in src/llm_bawt/mcp_server/. It's a FastMCP application registered with a flat namespace where every tool name is prefix-grouped:
| Prefix | Domain | Count |
|---|---|---|
| memory_* | Bot memory CRUD, search, supersede, consolidate, regenerate-embeddings | 14 |
| messages_* | Conversation history CRUD, search, preview, ignore/restore, recall marking | 15 |
| sessions_* | Session lifecycle (shared sessions table) | 4 |
| context_* | Combined message + memory context | 1 |
| facts_* | LLM-based fact extraction | 1 |
| system_* | Service-wide stats + maintenance | 2 |
| bots_* | Bot discovery + inter-bot messaging | 2 |
| profile | User/bot profile attribute router (action-based) | 1 |
| tasks_* | Task CRUD, briefing, dependencies, promotion, regeneration | 10 |
| steps_* | Task step lifecycle | 3 |
| projects_* | Project CRUD with context briefings | 6 |
| activity_* | Agent activity log | 1 |
Sum: 60 tools. The first 41 live in mcp_server/server.py (1,242 lines); the agent task system tools live in mcp_server/task_tools.py (933 lines) and decorate the same shared mcp instance.
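To make the flat, prefix-grouped namespace concrete, here is a minimal sketch of a shared registry that several modules could decorate. ToolRegistry is a hypothetical stand-in written for this example, not the FastMCP API, and the four tool bodies are placeholders.

```python
from collections import Counter
from typing import Callable, Dict


class ToolRegistry:
    """Hypothetical stand-in for a shared tool registry (not FastMCP itself)."""

    def __init__(self) -> None:
        self.tools: Dict[str, Callable] = {}

    def tool(self, name: str) -> Callable[[Callable], Callable]:
        def decorator(fn: Callable) -> Callable:
            # A flat namespace means every name must be unique across modules.
            if name in self.tools:
                raise ValueError(f"duplicate tool name: {name}")
            self.tools[name] = fn
            return fn  # the function is handed back unchanged

        return decorator

    def counts_by_prefix(self) -> Counter:
        # "memory_search" -> "memory"; an unprefixed name such as "profile"
        # forms its own group.
        return Counter(name.split("_", 1)[0] for name in self.tools)


mcp = ToolRegistry()  # one shared instance that every module imports


@mcp.tool(name="memory_search")
def memory_search(query: str) -> list:
    return [f"hit for {query}"]


@mcp.tool(name="memory_store")
def memory_store(content: str) -> str:
    return "stored"


@mcp.tool(name="tasks_list")
def tasks_list() -> list:
    return []


@mcp.tool(name="profile")
def profile(action: str) -> str:
    return action


prefix_counts = mcp.counts_by_prefix()
```

With these four placeholder tools, prefix_counts groups two names under memory and one each under tasks and profile, which is the same tally the table above reports for the real server.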
02 Two consumers, one set of functions.
The server has two completely independent consumer paths against the same Python functions:
- In-process: MemoryClient in mcp_server/client.py. By default MemoryClient calls the storage layer directly, with no JSON-RPC overhead. When LLM_BAWT_MCP_SERVER_URL is set, it switches to HTTP JSON-RPC for auditability.
- External: http://<host>:8001/mcp over streamable-http JSON-RPC.

The function bodies are identical: @mcp.tool(name="memory_search") registers the function as an MCP tool but doesn't change what it does. When llm-bawt's tool executor needs memory_search, it imports and calls the function directly. When VS Code needs it, FastMCP serializes the JSON-RPC call and invokes the same function.
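For the external path, a request is an ordinary JSON-RPC 2.0 envelope using the MCP tools/call method. The sketch below builds one with the standard library only. The argument names query and bot_id and the bot id nova are assumptions for illustration, since the exact memory_search signature is not shown here.

```python
import json
import urllib.request
from typing import Any, Dict


def build_tool_call(name: str, arguments: Dict[str, Any], request_id: int = 1) -> Dict[str, Any]:
    """Build an MCP tools/call JSON-RPC 2.0 request body."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def call_tool(url: str, name: str, arguments: Dict[str, Any], timeout: float = 10.0) -> Dict[str, Any]:
    """POST the request and decode the reply, assuming a plain-JSON response."""
    body = json.dumps(build_tool_call(name, arguments)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP clients advertise both content types.
            "Accept": "application/json, text/event-stream",
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Argument names and values below are assumed, not taken from the server code.
payload = build_tool_call("memory_search", {"query": "coffee preferences", "bot_id": "nova"})
```

Against a running server, call_tool("http://127.0.0.1:8001/mcp", "memory_search", {...}) would send this payload. Whether a bare tools/call is accepted without a prior initialize handshake depends on the server configuration, so treat that as something to verify rather than a guarantee.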
03 Startup and configuration.
The FastMCP instance is constructed in server.py with:
- json_response=True: replies are plain JSON, not chunked.
- stateless_http=True: no per-connection session state. Every call is independent.
- TransportSecuritySettings with DNS rebinding protection enabled: only hosts matching the allowlist may issue requests. Default allowlist: 127.0.0.1:*, localhost:*; override with LLM_BAWT_MCP_ALLOWED_HOSTS.
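The host allowlist can be pictured as a small matching routine over the Host header. The following is a sketch under stated assumptions: it treats LLM_BAWT_MCP_ALLOWED_HOSTS as a comma-separated list and reads host:* as "this host on any port". Neither detail is confirmed by the text above, and the real check lives in the MCP library, not in these hypothetical helpers.

```python
import os
from typing import List, Mapping, Optional

DEFAULT_ALLOWED_HOSTS = ["127.0.0.1:*", "localhost:*"]


def load_allowed_hosts(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Read the allowlist override; the comma-separated format is an assumption."""
    env = os.environ if env is None else env
    raw = env.get("LLM_BAWT_MCP_ALLOWED_HOSTS", "")
    hosts = [item.strip() for item in raw.split(",") if item.strip()]
    return hosts or list(DEFAULT_ALLOWED_HOSTS)


def host_allowed(host_header: str, allowed: List[str]) -> bool:
    """Return True if the Host header matches an exact entry or a host:* wildcard."""
    for pattern in allowed:
        if pattern.endswith(":*"):
            base = pattern[:-2]
            # Wildcard port: require "<base>:<something>", never a longer hostname.
            if host_header.startswith(base + ":"):
                return True
        elif host_header == pattern:
            return True
    return False


local_ok = host_allowed("127.0.0.1:8001", DEFAULT_ALLOWED_HOSTS)
rebind_blocked = not host_allowed("localhost.evil.example:8001", DEFAULT_ALLOWED_HOSTS)
```

The point of the check is that a browser tricked by DNS rebinding still sends the attacker's hostname in the Host header, so a request that resolves to the local server is rejected unless that hostname is on the list.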
The server runs inside the same uvicorn worker as the FastAPI service when launched via the app container — service/api.py's lifespan calls _ensure_mcp_server(config) to bring it up. Standalone, it's reachable via llm-mcp-server or python -m llm_bawt.mcp_server with these flags:
| Flag | Default |
|---|---|
| --transport | http (or stdio) |
| --host | 0.0.0.0 (HTTP only) |
| --port | 8001 (HTTP only) |
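The flag table maps naturally onto an argument parser. This is a sketch of what such a parser could look like, not the contents of the real __main__.py; only the flag names and defaults come from the table, and the function names are invented for the example.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented flags; help strings are illustrative."""
    parser = argparse.ArgumentParser(prog="llm-mcp-server")
    parser.add_argument("--transport", choices=["http", "stdio"], default="http")
    parser.add_argument("--host", default="0.0.0.0", help="bind address (HTTP only)")
    parser.add_argument("--port", type=int, default=8001, help="bind port (HTTP only)")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    return build_parser().parse_args(argv)


defaults = parse_args([])
stdio_mode = parse_args(["--transport", "stdio"])
```

Note that binding to 0.0.0.0 only controls which interfaces accept connections. The host allowlist described earlier is a separate check on the request itself.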
04 Memory tools.
Fourteen tools cover everything the memory layer can do. Most operations target a specific bot_id namespace; cross-bot variants are explicit.
| Tool | Purpose |
|---|---|
| memory_store | Write a new memory. Tags, importance, source-message IDs. |
| memory_search | Per-bot semantic search. |
| memory_search_all | Cross-bot fan-out; returns the source bot per hit. |
| memory_search_source | Read-only lookup inside a specific other bot's memory. |
| memory_list_sources | Which bots have any memories at all. |
| memory_list_recent | N newest, no embedding query. |
| memory_list_high_importance | Filter by importance threshold. |
| memory_update | Modify content, tags, importance. |
| memory_update_meaning | Enrich with intent, stakes, emotional_charge, recurrence_keywords. Drives context-builder categorization. |
| memory_delete | Delete by id. |
| memory_delete_by_source_messages | Delete every memory derived from a given set of messages. |
| memory_supersede | Mark one memory as replaced by another (supersede chain). |
| memory_consolidate | Cluster similar embeddings + LLM-merge. Local-LLM only. |
| memory_regenerate_embeddings | Recompute vectors. Used after embedding model changes. |
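The supersede chain mentioned for memory_supersede can be illustrated with a few lines that follow replacement links to the newest version. The mapping shape (memory id to the id that replaced it) is an assumption made for this sketch and says nothing about how the backend actually stores the relation.

```python
from typing import Dict, Optional


def resolve_current(memory_id: str, superseded_by: Dict[str, Optional[str]]) -> str:
    """Follow supersede links until reaching a memory that nothing has replaced."""
    seen = set()
    current = memory_id
    while superseded_by.get(current) is not None:
        if current in seen:
            # A well-formed chain never loops; fail loudly if one does.
            raise ValueError(f"supersede cycle at {current}")
        seen.add(current)
        current = superseded_by[current]
    return current


# Hypothetical chain: m1 was replaced by m2, which was replaced by m3.
links = {"m1": "m2", "m2": "m3", "m3": None}
latest = resolve_current("m1", links)
```

Here latest is "m3". The older rows stay in place, so the history of a fact remains inspectable even though only the head of the chain is current.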
05 Messages and sessions.
Fifteen messages_* tools plus four sessions_* tools cover the full conversation-history surface. The interesting one is the soft-delete family — messages_ignore_* moves messages out of search/recall without dropping the row, and messages_restore_ignored reverses it. messages_remove_last_partial drops the trailing assistant message if it's marked partial (used after an aborted stream).
messages_mark_recalled attaches a recall marker to messages a summary was built from — used by the summarization layer to know which raw messages have already been compressed. Sessions are the shared, cross-bot grouping for conversations; tools cover create-implicit (via messages_add), close, get-by-id, list, and get-active.
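The soft-delete behaviour is easy to see in a toy model. The sketch below uses an in-memory list with an ignored flag and a partial flag. The class, field, and method names are invented for illustration and do not reflect the real table schema or tool signatures.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Message:
    id: int
    role: str
    content: str
    ignored: bool = False
    partial: bool = False


class History:
    """Toy conversation history demonstrating ignore/restore and partial cleanup."""

    def __init__(self, messages: List[Message]) -> None:
        self.messages = messages

    def ignore(self, ids: Set[int]) -> int:
        changed = 0
        for message in self.messages:
            if message.id in ids and not message.ignored:
                message.ignored = True  # the row stays; it just leaves search/recall
                changed += 1
        return changed

    def restore_ignored(self) -> int:
        restored = sum(1 for message in self.messages if message.ignored)
        for message in self.messages:
            message.ignored = False
        return restored

    def searchable_ids(self) -> List[int]:
        return [message.id for message in self.messages if not message.ignored]

    def remove_last_partial(self) -> bool:
        last = self.messages[-1] if self.messages else None
        if last is not None and last.role == "assistant" and last.partial:
            self.messages.pop()  # hard removal, unlike ignore
            return True
        return False


history = History([
    Message(1, "user", "hi"),
    Message(2, "assistant", "hello"),
    Message(3, "user", "forget this"),
    Message(4, "assistant", "half an answ", partial=True),
])
removed_partial = history.remove_last_partial()
ignored_count = history.ignore({3})
visible_after_ignore = history.searchable_ids()
restored_count = history.restore_ignored()
visible_after_restore = history.searchable_ids()
```

The contrast worth noticing is that ignore is reversible while remove_last_partial is not, which matches the distinction drawn in the text between soft delete and dropping an aborted stream.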
06 Inter-bot messaging.
Two tools, but they enable real multi-bot workflows:
- bots_list_available(): discovers which bots can receive a message. Filters out bots without memory clients or that explicitly opt out of inter-bot messaging.
- bots_send_message(target_bot_id, message, sender_bot_id, max_tokens, temperature): runs a one-shot chat completion against the target bot via the internal _dispatch_bot_message helper and returns the response text. The target bot's full pipeline runs; the sending bot sees only the reply.
This is how bots can delegate to specialists. Nova can call out to Snark for snark, or to a code-specialist bot to write a function. Each bot remains responsible for its own memory and personality; the calling bot just sees a tool result.
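A delegation call is just a tools/call request naming bots_send_message. The sketch below assembles its arguments from the parameter names listed above. Treating max_tokens and temperature as optional, the empty-message guard, and the bot ids nova and snark are all assumptions added for illustration, not documented server behaviour.

```python
from typing import Any, Dict, Optional


def send_message_args(
    target_bot_id: str,
    message: str,
    sender_bot_id: str,
    max_tokens: Optional[int] = None,
    temperature: Optional[float] = None,
) -> Dict[str, Any]:
    """Build bots_send_message arguments, leaving out tuning knobs that are unset."""
    if not message.strip():
        # Client-side convenience check only; the server may behave differently.
        raise ValueError("message must not be empty")
    args: Dict[str, Any] = {
        "target_bot_id": target_bot_id,
        "message": message,
        "sender_bot_id": sender_bot_id,
    }
    if max_tokens is not None:
        args["max_tokens"] = max_tokens
    if temperature is not None:
        args["temperature"] = temperature
    return args


request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "bots_send_message",
        "arguments": send_message_args(
            "snark", "Roast this commit message: 'fix stuff'", "nova", max_tokens=200
        ),
    },
}
```

From the caller's side the result is a single piece of text, so the delegating bot needs no knowledge of the target's memory, prompt, or model.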
07 The agent task pipeline.
The 19 tools in task_tools.py wrap the agent task REST API at LLM_BAWT_TASK_API_URL (default http://echo.lan.zenoran.com — the BawtHub frontend, which owns the task DB via Prisma). They exist so agents don't have to hand-roll HTTP calls for what is otherwise a normal CRUD surface.
| Tool | Purpose |
|---|---|
| tasks_list / tasks_get / tasks_get_context | List, fetch full, fetch formatted briefing. |
| tasks_create / tasks_update / tasks_delete | Standard CRUD. Prefer status=CANCELLED over hard delete. |
| tasks_add_dependency / tasks_remove_dependency | DAG editing; cycles rejected server-side. |
| tasks_promote | Lift a task to its own project (re-parents). |
| tasks_regenerate | Server-side LLM rewrite of title + steps. |
| steps_add / steps_update / steps_delete | Step lifecycle. Status transitions: PENDING → RUNNING → COMPLETED / FAILED / SKIPPED. |
| projects_list / projects_get / projects_get_context | Project listing + briefing. |
| projects_create / projects_update / projects_delete | Project CRUD; deleting a project unassigns its tasks. |
| activity_get | Recent activity entries, filterable by task / project. |
An agent's typical loop: tasks_get_context for the briefing, steps_update(status="RUNNING"), do work, steps_update(status="COMPLETED", output=...), repeat until task done, then tasks_update(status="REVIEW", response=...).
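The loop above can be sketched against any function that sends a tool call. In this example call_tool is a hypothetical callable (tool name, arguments) supplied by the caller, the argument names task_id and step_id are assumptions, and the transition table is one literal reading of the arrow notation in the step lifecycle row.

```python
from typing import Any, Callable, Dict, List

ToolCaller = Callable[[str, Dict[str, Any]], Any]

# One reading of "PENDING -> RUNNING -> COMPLETED / FAILED / SKIPPED".
ALLOWED_TRANSITIONS = {
    "PENDING": {"RUNNING"},
    "RUNNING": {"COMPLETED", "FAILED", "SKIPPED"},
}


def can_transition(current: str, new: str) -> bool:
    return new in ALLOWED_TRANSITIONS.get(current, set())


def run_task(
    call_tool: ToolCaller,
    task_id: str,
    steps: List[Dict[str, Any]],
    do_work: Callable[[Dict[str, Any]], str],
) -> None:
    """Drive one task: briefing, then each step, then hand off for review."""
    call_tool("tasks_get_context", {"task_id": task_id})
    for step in steps:
        call_tool("steps_update", {"step_id": step["id"], "status": "RUNNING"})
        try:
            output = do_work(step)
        except Exception as exc:
            # Record the failure on the step before propagating it.
            call_tool("steps_update", {"step_id": step["id"], "status": "FAILED", "output": str(exc)})
            raise
        call_tool("steps_update", {"step_id": step["id"], "status": "COMPLETED", "output": output})
    call_tool("tasks_update", {"task_id": task_id, "status": "REVIEW", "response": "all steps completed"})


# A recording fake stands in for the real transport.
calls: List[tuple] = []


def fake_call_tool(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    calls.append((name, arguments))
    return {}


run_task(fake_call_tool, "t1", [{"id": "s1"}, {"id": "s2"}], lambda step: f"done {step['id']}")
call_names = [name for name, _ in calls]
```

Passing the transport in as a parameter keeps the loop testable without a server, and the same function would work whether call_tool goes over HTTP or calls the tool functions in-process.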
08 The action-routed profile tool.
Unlike the prefix-grouped tools, profile is a single router tool. Calling shape:
profile(action, entity_type?, entity_id?, category?, key?, value?)
Actions: get, set, delete, list_categories, plus higher-level get_user_summary / get_bot_summary that render the structured profile to a prose paragraph for prompt injection.
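An action-routed tool is essentially a dispatch table behind one entry point. The sketch below shows the pattern with an in-memory dictionary. The return shapes, the entity_type value "user", and the storage layout are assumptions, and the two summary actions are left out because they depend on rendering logic not described here.

```python
from typing import Any, Callable, Dict, Optional, Tuple

Key = Tuple[Optional[str], Optional[str], Optional[str], Optional[str]]

# (entity_type, entity_id, category, key) -> value; stand-in for real storage.
_attributes: Dict[Key, Any] = {}


def _get(entity_type, entity_id, category, key, value) -> Dict[str, Any]:
    return {"value": _attributes.get((entity_type, entity_id, category, key))}


def _set(entity_type, entity_id, category, key, value) -> Dict[str, Any]:
    _attributes[(entity_type, entity_id, category, key)] = value
    return {"ok": True}


def _delete(entity_type, entity_id, category, key, value) -> Dict[str, Any]:
    existed = (entity_type, entity_id, category, key) in _attributes
    _attributes.pop((entity_type, entity_id, category, key), None)
    return {"deleted": existed}


def _list_categories(entity_type, entity_id, category, key, value) -> Dict[str, Any]:
    categories = sorted(
        {c for (t, i, c, _k) in _attributes if t == entity_type and i == entity_id}
    )
    return {"categories": categories}


_HANDLERS: Dict[str, Callable[..., Dict[str, Any]]] = {
    "get": _get,
    "set": _set,
    "delete": _delete,
    "list_categories": _list_categories,
}


def profile(action, entity_type=None, entity_id=None, category=None, key=None, value=None):
    """Single entry point that routes on the action argument."""
    handler = _HANDLERS.get(action)
    if handler is None:
        return {"error": f"unknown action: {action}"}
    return handler(entity_type, entity_id, category, key, value)


set_result = profile("set", "user", "u1", "preferences", "drink", "espresso")
get_result = profile("get", "user", "u1", "preferences", "drink")
categories_result = profile("list_categories", "user", "u1")
```

One router keeps the tool list short for the model choosing among 60 tools, at the cost of a looser schema: every parameter other than action has to be optional.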
09 Fact extraction.
facts_extract is the only tool whose entire job is to invoke an LLM. Given a list of messages, a bot_id, and a user_id, it runs MemoryExtractionService from memory/extraction/ against the maintenance model (default Grok), parses out facts, and optionally writes them into the bot's memory. With store=true, it also walks the extracted facts and routes high-importance ones (default threshold 0.6) into profile_attributes, subject to the ALLOWED_PROFILE_KEYS allowlist and BLOCKED_PROFILE_PATTERNS regex set defined in mcp_server/extraction.py.
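The routing step described above amounts to three filters applied in order: importance threshold, key allowlist, blocked patterns. This sketch shows that shape only. The contents of both safety lists here are invented placeholders, the fact dictionary layout is assumed, and using >= for the threshold comparison is a guess.

```python
import re
from typing import Any, Dict, List

# Placeholder values; the real lists live in mcp_server/extraction.py.
ALLOWED_PROFILE_KEYS = {"name", "timezone", "occupation"}
BLOCKED_PROFILE_PATTERNS = [
    re.compile(r"password", re.IGNORECASE),
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
]


def route_to_profile(facts: List[Dict[str, Any]], threshold: float = 0.6) -> List[Dict[str, Any]]:
    """Return the facts that would be written to profile attributes."""
    routed = []
    for fact in facts:
        if fact["importance"] < threshold:
            continue  # not important enough to promote
        if fact["key"] not in ALLOWED_PROFILE_KEYS:
            continue  # key is not on the allowlist
        text = f"{fact['key']} {fact['value']}"
        if any(pattern.search(text) for pattern in BLOCKED_PROFILE_PATTERNS):
            continue  # looks sensitive; keep it out of the profile
        routed.append(fact)
    return routed


facts = [
    {"key": "timezone", "value": "Europe/Berlin", "importance": 0.8},
    {"key": "timezone", "value": "UTC", "importance": 0.3},
    {"key": "favorite_color", "value": "green", "importance": 0.9},
    {"key": "name", "value": "my password is hunter2", "importance": 0.9},
]
routed = route_to_profile(facts)
```

Only the first fact survives all three filters. Pairing an allowlist with a blocklist is deliberate: the allowlist bounds what kinds of attributes can exist, and the patterns catch sensitive values that slip in under an allowed key.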
The scheduler in service/scheduler.py calls this tool on a cadence to drain completed turns into long-term memory, but external agents can call it ad-hoc against any message bundle they care about.
10 System tools.
Two tools, both essential:
- system_stats(bot_id?): per-bot counts of messages, memories, summaries, sessions; or service-wide if bot_id is omitted.
- system_run_maintenance(bot_id?, run_consolidation?, run_recurrence_detection?, run_decay_pruning?, run_orphan_cleanup?, dry_run?): kicks off a maintenance cycle. The scheduler runs this on its own cadence; this is the manual override.
11 The storage indirection.
Every tool body looks the same: storage = _get_storage(); return await storage.<op>(...). The storage abstraction in mcp_server/storage.py (935 lines) is the single backend interface — today implemented by PostgreSQLMemoryBackend from memory/postgresql.py, registered via the llm_bawt.memory entry point group.
This separation matters: every MCP tool stays trivial (one line of dispatch plus logging), all the SQL/vector work lives in one place, and a future alternative backend (DuckDB? Qdrant?) can be slotted in by implementing the same interface and changing one config setting.
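The indirection can be sketched with a Protocol and a swappable backend. Everything named here is illustrative: MemoryStorage and InMemoryStorage are hypothetical, the two methods are a tiny subset chosen for the example, and only the storage = _get_storage() dispatch shape comes from the description above.

```python
import asyncio
from typing import Any, Dict, List, Optional, Protocol


class MemoryStorage(Protocol):
    """Hypothetical slice of the backend interface."""

    async def store(self, bot_id: str, content: str) -> str: ...

    async def list_recent(self, bot_id: str, limit: int) -> List[Dict[str, Any]]: ...


class InMemoryStorage:
    """Toy backend; a PostgreSQL or other backend would expose the same methods."""

    def __init__(self) -> None:
        self._rows: List[Dict[str, Any]] = []

    async def store(self, bot_id: str, content: str) -> str:
        memory_id = f"m{len(self._rows) + 1}"
        self._rows.append({"id": memory_id, "bot_id": bot_id, "content": content})
        return memory_id

    async def list_recent(self, bot_id: str, limit: int) -> List[Dict[str, Any]]:
        rows = [row for row in self._rows if row["bot_id"] == bot_id]
        return list(reversed(rows))[:limit]  # newest first


_storage: Optional[MemoryStorage] = None


def _get_storage() -> MemoryStorage:
    # Swapping backends means changing only what gets constructed here.
    global _storage
    if _storage is None:
        _storage = InMemoryStorage()
    return _storage


async def memory_store(bot_id: str, content: str) -> str:
    storage = _get_storage()
    return await storage.store(bot_id, content)


async def memory_list_recent(bot_id: str, limit: int = 5) -> List[Dict[str, Any]]:
    storage = _get_storage()
    return await storage.list_recent(bot_id, limit)


async def _demo() -> List[Dict[str, Any]]:
    await memory_store("nova", "likes espresso")
    await memory_store("nova", "works night shifts")
    return await memory_list_recent("nova", limit=1)


newest = asyncio.run(_demo())
```

Because the tool functions only know the Protocol, a test can substitute the in-memory backend while production uses the database, with no change to the tool bodies.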
12 Key files.
- mcp_server/server.py: the shared mcp instance with DNS-rebinding protection.
- mcp_server/task_tools.py: agent task system tools, decorating the same shared mcp instance.
- mcp_server/storage.py: the storage abstraction behind every tool.
- mcp_server/extraction.py: MemoryExtractionService. ALLOWED_PROFILE_KEYS + BLOCKED_PROFILE_PATTERNS safety lists for routing facts into profile attributes.
- mcp_server/client.py: MemoryClient. 1,300 lines. The in-process consumer side. Defaults to direct backend calls; switches to HTTP JSON-RPC when LLM_BAWT_MCP_SERVER_URL is set. Used by every part of llm-bawt that needs memory.
- mcp_server/__main__.py: python -m llm_bawt.mcp_server. Also exposed as the llm-mcp-server console script.
- docs/MCP_SERVER.md: the repo's own documentation of the tool surface.
The repo's own documentation lists most of the tool surface but slightly under-counts: it doesn't show the four sessions_* tools or count memory_delete_by_source_messages separately. The actual decorated tools on the FastMCP instance total 60 across server.py and task_tools.py as of main on 2026-05-13.
Source: llm-bawt/src/llm_bawt/mcp_server