BawtHub

Every subsystem, as MCP tools.

llm-bawt runs a FastMCP server on port 8001 that exposes the platform's internals (memory, messages, sessions, profiles, inter-bot messaging, fact extraction, and the entire agent task system) as 60 callable tools over streamable HTTP JSON-RPC. The same Python functions that power llm-bawt's in-process tool dispatch are also reachable from VS Code, Claude Desktop, agent SDKs, or curl. Every external client gets the same transport, the same auth model (none beyond a host allowlist), and the same data shapes.

Implementation: FastMCP (mcp[cli]) · Port: 8001 · Transport: streamable-http (default), stdio (optional) · Tools: 60 (41 in server.py + 19 in task_tools.py)

01 What it is.

The MCP server lives in src/llm_bawt/mcp_server/. It's a FastMCP application registered with a flat namespace where every tool name is prefix-grouped:

| Prefix | Domain | Count |
| --- | --- | --- |
| memory_* | Bot memory CRUD, search, supersede, consolidate, regenerate-embeddings | 14 |
| messages_* | Conversation history CRUD, search, preview, ignore/restore, recall marking | 15 |
| sessions_* | Session lifecycle (shared sessions table) | 4 |
| context_* | Combined message + memory context | 1 |
| facts_* | LLM-based fact extraction | 1 |
| system_* | Service-wide stats + maintenance | 2 |
| bots_* | Bot discovery + inter-bot messaging | 2 |
| profile | User/bot profile attribute router (action-based) | 1 |
| tasks_* | Task CRUD, briefing, dependencies, promotion, regeneration | 10 |
| steps_* | Task step lifecycle | 3 |
| projects_* | Project CRUD with context briefings | 6 |
| activity_* | Agent activity log | 1 |

Sum: 60 tools. The first 41 live in mcp_server/server.py (1,242 lines); the agent task system tools live in mcp_server/task_tools.py (933 lines) and decorate the same shared mcp instance.
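
The prefix grouping can be checked mechanically. The sketch below is illustrative only: `domain_of` and `count_by_domain` are hypothetical helpers, not functions from the repo, and the counts are copied from the table above.

```python
from collections import Counter

# Per-prefix tool counts, copied from the table above.
TOOL_COUNTS = {
    "memory": 14, "messages": 15, "sessions": 4, "context": 1,
    "facts": 1, "system": 2, "bots": 2, "profile": 1,
    "tasks": 10, "steps": 3, "projects": 6, "activity": 1,
}


def domain_of(tool_name: str) -> str:
    """Return the prefix group of a tool name (hypothetical helper)."""
    # "memory_search" -> "memory"; the unprefixed "profile" maps to itself.
    return tool_name.split("_", 1)[0]


def count_by_domain(tool_names):
    """Tally a list of tool names by their prefix group."""
    return Counter(domain_of(name) for name in tool_names)


total = sum(TOOL_COUNTS.values())
```

Because the namespace is flat, a client can derive the domain from the tool name alone, with no extra metadata lookup.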

02 Two consumers, one set of functions.

The server has two completely independent consumer paths against the same Python functions:

Same functions, two entry points:

In-process: llm-bawt's own tool loop dispatches via MemoryClient in mcp_server/client.py. By default MemoryClient calls the storage layer directly, with no JSON-RPC overhead. When LLM_BAWT_MCP_SERVER_URL is set, it switches to HTTP JSON-RPC for auditability.

External: VS Code MCP extensions, Claude Desktop, agent bridges, and curl all talk to the same FastMCP HTTP endpoint at http://<host>:8001/mcp over streamable-http JSON-RPC.
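
To make the two entry points concrete, here is a minimal sketch using only the standard library. `dispatch_mode`, `build_tool_call`, and `build_http_request` are hypothetical names, the `query` argument is an assumption about the tool's parameters, and the request is built but never sent. A real streamable-HTTP client also performs an MCP `initialize` handshake before calling tools, which is omitted here.

```python
import json
import os
import urllib.request


def dispatch_mode(env=os.environ) -> str:
    """Pick the in-process path: 'http' only when the server URL is set."""
    return "http" if env.get("LLM_BAWT_MCP_SERVER_URL") else "direct"


def build_tool_call(name: str, arguments: dict, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 `tools/call` request body."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def build_http_request(base_url: str, payload: dict) -> urllib.request.Request:
    """Wrap the payload in a POST to the /mcp endpoint (not sent here)."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/mcp",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP replies may be plain JSON or an SSE stream.
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


# Argument names other than bot_id are assumptions for illustration.
payload = build_tool_call("memory_search", {"bot_id": "nova", "query": "coffee"})
request = build_http_request("http://localhost:8001", payload)
```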

The function bodies are identical — @mcp.tool(name="memory_search") registers the function as an MCP tool but doesn't change what it does. When llm-bawt's tool executor needs memory_search, it imports and calls the function directly. When VS Code needs it, FastMCP serializes the JSON-RPC call and invokes the same function.

03 Startup and configuration.

The FastMCP instance is constructed in server.py with DNS-rebinding protection, using a host allowlist in place of authentication.

The server runs inside the same uvicorn worker as the FastAPI service when launched via the app container — service/api.py's lifespan calls _ensure_mcp_server(config) to bring it up. Standalone, it's reachable via llm-mcp-server or python -m llm_bawt.mcp_server with these flags:

| Flag | Default |
| --- | --- |
| --transport | http (or stdio) |
| --host | 0.0.0.0 (HTTP only) |
| --port | 8001 (HTTP only) |
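
As a rough illustration of that flag surface, the sketch below mirrors the table with `argparse`. It is an assumption about shape only, not the contents of the real `__main__.py`; `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(prog="llm-mcp-server")
    parser.add_argument("--transport", choices=["http", "stdio"], default="http")
    parser.add_argument("--host", default="0.0.0.0", help="HTTP transport only")
    parser.add_argument("--port", type=int, default=8001, help="HTTP transport only")
    return parser


# No flags: every documented default applies.
defaults = build_parser().parse_args([])
# Switching to stdio leaves host/port at their (unused) defaults.
stdio = build_parser().parse_args(["--transport", "stdio"])
```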

04 Memory tools.

Fourteen tools cover everything the memory layer can do. Most operations target a specific bot_id namespace; cross-bot variants are explicit.

| Tool | Purpose |
| --- | --- |
| memory_store | Write a new memory. Tags, importance, source-message IDs. |
| memory_search | Per-bot semantic search. |
| memory_search_all | Cross-bot fan-out; returns the source bot per hit. |
| memory_search_source | Read-only lookup inside a specific other bot's memory. |
| memory_list_sources | Which bots have any memories at all. |
| memory_list_recent | N newest, no embedding query. |
| memory_list_high_importance | Filter by importance threshold. |
| memory_update | Modify content, tags, importance. |
| memory_update_meaning | Enrich with intent, stakes, emotional_charge, recurrence_keywords. Drives context-builder categorization. |
| memory_delete | Delete by id. |
| memory_delete_by_source_messages | Delete every memory derived from a given set of messages. |
| memory_supersede | Mark one memory as replaced by another (supersede chain). |
| memory_consolidate | Cluster similar embeddings + LLM-merge. Local-LLM only. |
| memory_regenerate_embeddings | Recompute vectors. Used after embedding model changes. |
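
A supersede chain is easiest to picture as a linked list of replacements. The sketch below resolves a memory id to its newest version; the `superseded_by` field name and `resolve_current` helper are assumptions for illustration, not the backend's actual schema.

```python
def resolve_current(memories: dict, memory_id: str) -> str:
    """Follow superseded_by links until reaching a memory with no successor."""
    seen = set()
    current = memory_id
    while memories[current].get("superseded_by") is not None:
        if current in seen:
            # Guard against a malformed chain that loops back on itself.
            raise ValueError(f"supersede cycle at {current}")
        seen.add(current)
        current = memories[current]["superseded_by"]
    return current


# Three versions of the same fact, oldest first.
memories = {
    "m1": {"content": "Likes tea", "superseded_by": "m2"},
    "m2": {"content": "Prefers coffee", "superseded_by": "m3"},
    "m3": {"content": "Drinks espresso only", "superseded_by": None},
}
```

Calling `resolve_current(memories, "m1")` walks m1, m2, m3 and stops at the entry with no successor.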

05 Messages and sessions.

Fifteen messages_* tools plus four sessions_* tools cover the full conversation-history surface. The interesting one is the soft-delete family — messages_ignore_* moves messages out of search/recall without dropping the row, and messages_restore_ignored reverses it. messages_remove_last_partial drops the trailing assistant message if it's marked partial (used after an aborted stream).
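
The soft-delete semantics can be sketched against an in-memory list. Everything here is hypothetical (the `Message` dataclass, the `ignored` and `partial` flags, the helper names); it only illustrates the behavior described above, not the storage schema.

```python
from dataclasses import dataclass


@dataclass
class Message:
    id: int
    role: str
    content: str
    ignored: bool = False
    partial: bool = False


def ignore(messages, ids):
    """Soft-delete: flag rows so search skips them; nothing is dropped."""
    for message in messages:
        if message.id in ids:
            message.ignored = True


def restore_ignored(messages) -> int:
    """Reverse every soft-delete and return how many rows were restored."""
    restored = 0
    for message in messages:
        if message.ignored:
            message.ignored = False
            restored += 1
    return restored


def search(messages, term):
    """Substring search that excludes ignored rows."""
    return [m for m in messages if not m.ignored and term in m.content]


def remove_last_partial(messages) -> bool:
    """Drop the trailing assistant message only if it is marked partial."""
    if messages and messages[-1].role == "assistant" and messages[-1].partial:
        messages.pop()
        return True
    return False


history = [
    Message(1, "user", "book a flight"),
    Message(2, "assistant", "flight booked"),
    Message(3, "assistant", "flight det", partial=True),
]
```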

messages_mark_recalled attaches a recall marker to messages a summary was built from — used by the summarization layer to know which raw messages have already been compressed. Sessions are the shared, cross-bot grouping for conversations; tools cover create-implicit (via messages_add), close, get-by-id, list, and get-active.

06 Inter-bot messaging.

Two bots_* tools, covering bot discovery and inter-bot messaging, but they enable real multi-bot workflows.

This is how bots can delegate to specialists. Nova can call out to Snark for snark, or to a code-specialist bot to write a function. Each bot remains responsible for its own memory and personality; the calling bot just sees a tool result.
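
A toy model of that delegation, assuming nothing about the real tool names or signatures: `Bot`, `send_to_bot`, and the result shape are all hypothetical. The point is only that the callee keeps its own memory and the caller receives a plain result.

```python
from dataclasses import dataclass, field


@dataclass
class Bot:
    slug: str
    style: str
    memory: list = field(default_factory=list)

    def reply(self, message: str) -> str:
        # The exchange is recorded in this bot's own memory only.
        self.memory.append(message)
        return f"[{self.style}] {message}"


def send_to_bot(bots: dict, target: str, message: str) -> dict:
    """Hypothetical messaging tool body: route a message, return a tool result."""
    if target not in bots:
        return {"ok": False, "error": f"unknown bot: {target}"}
    return {"ok": True, "bot": target, "reply": bots[target].reply(message)}


bots = {"nova": Bot("nova", "helpful"), "snark": Bot("snark", "snarky")}
# Nova delegates; only Snark's memory changes.
result = send_to_bot(bots, "snark", "rate my code")
```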

07 The agent task pipeline.

The 19 tools in task_tools.py wrap the agent task REST API at LLM_BAWT_TASK_API_URL (default http://echo.lan.zenoran.com — the BawtHub frontend, which owns the task DB via Prisma). They exist so agents don't have to hand-roll HTTP calls for what is otherwise a normal CRUD surface.

| Tool | Purpose |
| --- | --- |
| tasks_list / tasks_get / tasks_get_context | List, fetch full, fetch formatted briefing. |
| tasks_create / tasks_update / tasks_delete | Standard CRUD. Prefer status=CANCELLED over hard delete. |
| tasks_add_dependency / tasks_remove_dependency | DAG editing; cycles rejected server-side. |
| tasks_promote | Lift a task to its own project (re-parents). |
| tasks_regenerate | Server-side LLM rewrite of title + steps. |
| steps_add / steps_update / steps_delete | Step lifecycle. Status transitions: PENDING → RUNNING → COMPLETED / FAILED / SKIPPED. |
| projects_list / projects_get / projects_get_context | Project listing + briefing. |
| projects_create / projects_update / projects_delete | Project CRUD; deleting a project unassigns its tasks. |
| activity_get | Recent activity entries, filterable by task / project. |
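
Cycle rejection for dependency edits can be done with a reachability check. The sketch below is one plausible approach, not the server's actual code; `would_create_cycle` and `add_dependency` are hypothetical helpers operating on an in-memory adjacency map.

```python
def would_create_cycle(deps: dict, task: str, depends_on: str) -> bool:
    """True if adding the edge task -> depends_on would close a cycle."""
    # A cycle appears exactly when `task` is already reachable from `depends_on`.
    stack, seen = [depends_on], set()
    while stack:
        node = stack.pop()
        if node == task:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(deps.get(node, ()))
    return False


def add_dependency(deps: dict, task: str, depends_on: str) -> None:
    """Add the edge, or raise if it would make the graph cyclic."""
    if would_create_cycle(deps, task, depends_on):
        raise ValueError(f"{task} -> {depends_on} would create a cycle")
    deps.setdefault(task, set()).add(depends_on)


deps = {}
add_dependency(deps, "deploy", "test")
add_dependency(deps, "test", "build")
```

With those two edges in place, making `build` depend on `deploy` would be rejected, while `deploy` depending directly on `build` is still allowed.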

An agent's typical loop: tasks_get_context for the briefing, steps_update(status="RUNNING"), do work, steps_update(status="COMPLETED", output=...), repeat until task done, then tasks_update(status="REVIEW", response=...).
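
That loop, reduced to a local state machine. This is a sketch under stated assumptions: the transition table is a literal reading of PENDING → RUNNING → COMPLETED / FAILED / SKIPPED with terminal states having no outgoing edges, and `transition` and `run_task` are hypothetical stand-ins for the steps_update and tasks_update calls.

```python
ALLOWED = {
    "PENDING": {"RUNNING"},
    "RUNNING": {"COMPLETED", "FAILED", "SKIPPED"},
    # Assumed terminal: no transitions out.
    "COMPLETED": set(),
    "FAILED": set(),
    "SKIPPED": set(),
}


def transition(step: dict, new_status: str, output=None) -> dict:
    """Stand-in for steps_update: enforce the status graph, then mutate."""
    if new_status not in ALLOWED[step["status"]]:
        raise ValueError(f"illegal transition {step['status']} -> {new_status}")
    step["status"] = new_status
    if output is not None:
        step["output"] = output
    return step


def run_task(task: dict, do_work) -> dict:
    """Walk every step through RUNNING to COMPLETED, then hand off for review."""
    for step in task["steps"]:
        transition(step, "RUNNING")
        transition(step, "COMPLETED", output=do_work(step))
    # Stand-in for tasks_update(status="REVIEW", response=...).
    task["status"] = "REVIEW"
    task["response"] = f"{len(task['steps'])} steps completed"
    return task


task = {"steps": [
    {"title": "write tests", "status": "PENDING"},
    {"title": "fix bug", "status": "PENDING"},
]}
done = run_task(task, do_work=lambda step: f"did: {step['title']}")
```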

08 The action-routed profile tool.

Unlike the prefix-grouped tools, profile is a single router tool. Calling shape:

profile(action, entity_type?, entity_id?, category?, key?, value?)

Actions: get, set, delete, list_categories, plus higher-level get_user_summary / get_bot_summary that render the structured profile to a prose paragraph for prompt injection.
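
An action-routed tool is a single function that branches on its first argument. The sketch below covers only the four basic actions against an in-memory dict; the return shapes and the `_STORE` layout are assumptions, and the summary actions are omitted.

```python
# (entity_type, entity_id) -> {category: {key: value}}
_STORE = {}


def profile(action, entity_type=None, entity_id=None,
            category=None, key=None, value=None):
    """Toy router mirroring the documented calling shape."""
    entity = _STORE.setdefault((entity_type, entity_id), {})
    if action == "set":
        entity.setdefault(category, {})[key] = value
        return {"ok": True}
    if action == "get":
        return {"value": entity.get(category, {}).get(key)}
    if action == "delete":
        # Report whether anything was actually removed.
        removed = entity.get(category, {}).pop(key, None) is not None
        return {"ok": removed}
    if action == "list_categories":
        return {"categories": sorted(entity)}
    raise ValueError(f"unknown action: {action}")
```

The trade-off of this shape is one tool slot instead of several, at the cost of optional parameters whose relevance depends on the action.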

09 Fact extraction.

facts_extract is the only tool whose entire job is to invoke an LLM. Given a list of messages, a bot_id, and a user_id, it runs MemoryExtractionService from memory/extraction/ against the maintenance model (default Grok), parses out facts, and optionally writes them into the bot's memory. With store=true, it also walks the extracted facts and routes high-importance ones (default threshold 0.6) into profile_attributes, subject to the ALLOWED_PROFILE_KEYS allowlist and BLOCKED_PROFILE_PATTERNS regex set defined in mcp_server/extraction.py.
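
The routing step can be pictured as three filters in sequence. In the sketch below the allowlist entries, the blocked patterns, the fact dict shape, and the inclusive threshold comparison are all assumptions; the real lists live in mcp_server/extraction.py and are not reproduced here.

```python
import re

# Hypothetical stand-ins for the real safety lists.
ALLOWED_PROFILE_KEYS = {"name", "timezone", "occupation"}
BLOCKED_PROFILE_PATTERNS = [
    re.compile(r"password", re.IGNORECASE),
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
]


def route_to_profile(facts, threshold=0.6):
    """Return (key, value) pairs that pass importance, allowlist, and blocklist."""
    routed = []
    for fact in facts:
        if fact["importance"] < threshold:
            continue  # not important enough for the profile
        if fact["key"] not in ALLOWED_PROFILE_KEYS:
            continue  # key is not on the allowlist
        if any(p.search(fact["value"]) for p in BLOCKED_PROFILE_PATTERNS):
            continue  # value matches a blocked pattern
        routed.append((fact["key"], fact["value"]))
    return routed


facts = [
    {"key": "timezone", "value": "Europe/Berlin", "importance": 0.9},
    {"key": "timezone", "value": "UTC", "importance": 0.3},
    {"key": "favorite_color", "value": "green", "importance": 0.8},
    {"key": "name", "value": "my password is hunter2", "importance": 0.95},
]
```

Only the first fact survives: the second is below the threshold, the third has a key that is not allowlisted, and the fourth matches a blocked pattern.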

The scheduler in service/scheduler.py calls this tool on a cadence to drain completed turns into long-term memory, but external agents can call it ad-hoc against any message bundle they care about.

10 System tools.

Two system_* tools, both essential: between them they cover service-wide stats and maintenance.

11 The storage indirection.

Every tool body looks the same: storage = _get_storage(); return await storage.<op>(...). The storage abstraction in mcp_server/storage.py (935 lines) is the single backend interface — today implemented by PostgreSQLMemoryBackend from memory/postgresql.py, registered via the llm_bawt.memory entry point group.

This separation matters: every MCP tool stays trivial (one line of dispatch plus logging), all the SQL/vector work lives in one place, and a future alternative backend (DuckDB? Qdrant?) can be slotted in by implementing the same interface and changing one config setting.

12 Key files.

mcp_server/server.py
FastMCP server. 1,242 lines. 41 tools covering memory/messages/sessions/context/facts/system/bots/profile. Constructs the mcp instance with DNS-rebinding protection.
mcp_server/task_tools.py
Task system tools. 933 lines. 19 tools for the agent task REST API (tasks, steps, projects, activity). Decorates the same mcp instance.
mcp_server/storage.py
Storage facade. 935 lines. Single abstraction over the memory + messages + sessions + profile layer. Lazy-loads to avoid import-time DB connections.
mcp_server/extraction.py
Fact extraction wrapper. 348 lines. Wraps MemoryExtractionService. ALLOWED_PROFILE_KEYS + BLOCKED_PROFILE_PATTERNS safety lists for routing facts into profile attributes.
mcp_server/client.py
MemoryClient. 1,300 lines. The in-process consumer side. Defaults to direct backend calls; switches to HTTP JSON-RPC when LLM_BAWT_MCP_SERVER_URL is set. Used by every part of llm-bawt that needs memory.
mcp_server/__main__.py
Standalone entry point. 6 lines. python -m llm_bawt.mcp_server. Also exposed as the llm-mcp-server console script.
docs/MCP_SERVER.md
In-repo reference. The canonical tool reference inside the llm-bawt repo. It has drifted slightly from the code; this page reconciles the differences (tool counts, VS Code integration syntax).

Validation note on docs/MCP_SERVER.md.

The repo's own documentation lists most of the tool surface but slightly under-counts: it doesn't show the four sessions_* tools or count memory_delete_by_source_messages separately. The actual decorated tools on the FastMCP instance total 60 across server.py and task_tools.py as of main on 2026-05-13.

Validated against main on 2026-05-13 Source: llm-bawt/src/llm_bawt/mcp_server