llm-bawt · mcp-server

Every subsystem, as MCP tools.

llm-bawt runs a FastMCP server on port 8001 that exposes the platform's internals (memory, messages, sessions, profiles, inter-bot messaging, fact extraction, and the entire agent task system) as 60 callable tools over streamable HTTP JSON-RPC. The same Python functions that power llm-bawt's in-process tool dispatch are also reachable from VS Code, Claude Desktop, agent SDKs, or curl. Every external client gets the same transport, the same auth model (no credentials, only a host allowlist), and the same data shapes.

Implementation: FastMCP (mcp[cli])
Port: 8001
Transport: streamable-http (default), stdio (optional)
Tools: 60 (41 in server.py + 19 in task_tools.py)

01 What it is.

The MCP server lives in src/llm_bawt/mcp_server/. It's a FastMCP application registered with a flat namespace where every tool name is prefix-grouped:

Prefix	Domain	Count
`memory_*`	Bot memory CRUD, search, supersede, consolidate, regenerate-embeddings	14
`messages_*`	Conversation history CRUD, search, preview, ignore/restore, recall marking	15
`sessions_*`	Session lifecycle (shared sessions table)	4
`context_*`	Combined message + memory context	1
`facts_*`	LLM-based fact extraction	1
`system_*`	Service-wide stats + maintenance	2
`bots_*`	Bot discovery + inter-bot messaging	2
`profile`	User/bot profile attribute router (action-based)	1
`tasks_*`	Task CRUD, briefing, dependencies, promotion, regeneration	10
`steps_*`	Task step lifecycle	3
`projects_*`	Project CRUD with context briefings	6
`activity_*`	Agent activity log	1

Sum: 60 tools. The first 41 live in mcp_server/server.py (1,242 lines); the agent task system tools live in mcp_server/task_tools.py (933 lines) and decorate the same shared mcp instance.

02 Two consumers, one set of functions.

The server has two completely independent consumer paths against the same Python functions:

Same functions, two entry points

In-process

llm-bawt's own tool loop dispatches via MemoryClient in mcp_server/client.py. By default MemoryClient calls the storage layer directly — no JSON-RPC overhead. When LLM_BAWT_MCP_SERVER_URL is set, it switches to HTTP JSON-RPC for auditability.

External

VS Code MCP extensions, Claude Desktop, agent bridges, and curl all talk to the same FastMCP HTTP endpoint at http://<host>:8001/mcp over streamable-http JSON-RPC.
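As a rough illustration of what an external client sends, the sketch below builds (but does not send) a JSON-RPC 2.0 `tools/call` request for the endpoint above. The `localhost` URL and the `query` / `bot_id` argument names are assumptions made for this example; the real parameter names come from the tool's schema.

```python
import json
import urllib.request

MCP_URL = "http://localhost:8001/mcp"  # assumption: server reachable on localhost


def build_rpc_request(method, params=None, request_id=1, url=MCP_URL):
    """Build (but do not send) a JSON-RPC 2.0 POST for the MCP endpoint."""
    payload = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        payload["params"] = params
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP clients advertise both response types.
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


def build_tool_call(name, arguments, request_id=1, url=MCP_URL):
    """Wrap a tool invocation in the MCP `tools/call` method."""
    return build_rpc_request(
        "tools/call", {"name": name, "arguments": arguments}, request_id, url
    )


# Hypothetical arguments: `query` and `bot_id` are assumed parameter names.
req = build_tool_call("memory_search", {"query": "coffee order", "bot_id": "nova"})
# To actually send it: urllib.request.urlopen(req).read()
```

The same helper covers discovery: `build_rpc_request("tools/list")` asks the server for the tool catalogue and its argument schemas.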

The function bodies are identical — @mcp.tool(name="memory_search") registers the function as an MCP tool but doesn't change what it does. When llm-bawt's tool executor needs memory_search, it imports and calls the function directly. When VS Code needs it, FastMCP serializes the JSON-RPC call and invokes the same function.
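The dual-path idea can be sketched with a tiny stand-in registry. `ToolRegistry` is a hypothetical class written for this example, not the FastMCP API, and the `memory_search` body is a placeholder. The point it illustrates is that the decorator records the function under a name and returns it unchanged, so a direct call and a dispatch-by-name reach the same code.

```python
class ToolRegistry:
    """Tiny stand-in for a tool server; NOT the FastMCP API."""

    def __init__(self):
        self._tools = {}

    def tool(self, name):
        def decorator(func):
            self._tools[name] = func
            return func  # the function itself is returned unchanged
        return decorator

    def dispatch(self, name, arguments):
        # What a server does after decoding a JSON-RPC tools/call request.
        return self._tools[name](**arguments)


mcp = ToolRegistry()


@mcp.tool(name="memory_search")
def memory_search(query, bot_id="nova"):
    # Placeholder body; the real tool queries the storage layer.
    return {"bot_id": bot_id, "query": query, "hits": []}


direct = memory_search("coffee order")                            # in-process path
remote = mcp.dispatch("memory_search", {"query": "coffee order"})  # external path
```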

03 Startup and configuration.

The FastMCP instance is constructed in server.py with:

json_response=True — replies are plain JSON, not chunked.
stateless_http=True — no per-connection session state. Every call is independent.
TransportSecuritySettings with DNS rebinding protection enabled — only hosts matching the allowlist may issue requests. Default allowlist: 127.0.0.1:*,localhost:*; override with LLM_BAWT_MCP_ALLOWED_HOSTS.
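To make the allowlist format concrete, here is a minimal sketch of how patterns such as `127.0.0.1:*` could be matched against a request's Host header. It is an illustration of the idea, not the library's code; treating `LLM_BAWT_MCP_ALLOWED_HOSTS` as a comma-separated list is inferred from the default value shown above.

```python
import os

DEFAULT_ALLOWED_HOSTS = "127.0.0.1:*,localhost:*"


def load_allowed_hosts(env=None):
    """Read the allowlist from the environment, falling back to the default."""
    env = os.environ if env is None else env
    raw = env.get("LLM_BAWT_MCP_ALLOWED_HOSTS", DEFAULT_ALLOWED_HOSTS)
    return [p.strip() for p in raw.split(",") if p.strip()]


def host_is_allowed(host_header, allowed):
    """Return True if the Host header matches an exact entry or a `host:*` pattern."""
    for pattern in allowed:
        if pattern == host_header:
            return True
        if pattern.endswith(":*"):
            base = pattern[:-2]  # strip the ":*" wildcard-port suffix
            if host_header.startswith(base + ":"):
                return True
    return False
```

With the default list, `host_is_allowed("localhost:8001", load_allowed_hosts({}))` passes, while a request whose Host header names any other domain is rejected.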

The server runs inside the same uvicorn worker as the FastAPI service when launched via the app container — service/api.py's lifespan calls _ensure_mcp_server(config) to bring it up. Standalone, it's reachable via llm-mcp-server or python -m llm_bawt.mcp_server with these flags:

Flag	Default
`--transport`	`http` (or `stdio`)
`--host`	`0.0.0.0` (HTTP only)
`--port`	`8001` (HTTP only)
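The flag table maps naturally onto a small argument parser. The sketch below assumes `argparse`; the actual entry point may parse its flags differently, and only the flag names and defaults are taken from the table.

```python
import argparse


def build_parser():
    """Parser mirroring the documented flags and defaults (a sketch)."""
    parser = argparse.ArgumentParser(prog="llm-mcp-server")
    parser.add_argument("--transport", choices=["http", "stdio"], default="http")
    parser.add_argument("--host", default="0.0.0.0", help="HTTP only")
    parser.add_argument("--port", type=int, default=8001, help="HTTP only")
    return parser


# Equivalent to running: llm-mcp-server --transport stdio
args = build_parser().parse_args(["--transport", "stdio"])
```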

04 Memory tools.

Fourteen tools cover everything the memory layer can do. Most operations target a specific bot_id namespace; cross-bot variants are explicit.

Tool	Purpose
`memory_store`	Write a new memory. Tags, importance, source-message IDs.
`memory_search`	Per-bot semantic search.
`memory_search_all`	Cross-bot fan-out — returns the source bot per hit.
`memory_search_source`	Read-only lookup inside a specific other bot's memory.
`memory_list_sources`	Which bots have any memories at all.
`memory_list_recent`	N newest, no embedding query.
`memory_list_high_importance`	Filter by importance threshold.
`memory_update`	Modify content, tags, importance.
`memory_update_meaning`	Enrich with `intent`, `stakes`, `emotional_charge`, `recurrence_keywords`. Drives context-builder categorization.
`memory_delete`	Delete by id.
`memory_delete_by_source_messages`	Delete every memory derived from a given set of messages.
`memory_supersede`	Mark one memory as replaced by another (supersede chain).
`memory_consolidate`	Cluster similar embeddings + LLM-merge. Local-LLM only.
`memory_regenerate_embeddings`	Recompute vectors. Used after embedding model changes.

05 Messages and sessions.

Fifteen messages_* tools plus four sessions_* tools cover the full conversation-history surface. The most interesting group is the soft-delete family: the messages_ignore_* tools move messages out of search/recall without dropping the row, and messages_restore_ignored reverses that. messages_remove_last_partial drops the trailing assistant message if it's marked partial (used after an aborted stream).

messages_mark_recalled attaches a recall marker to messages a summary was built from — used by the summarization layer to know which raw messages have already been compressed. Sessions are the shared, cross-bot grouping for conversations; tools cover create-implicit (via messages_add), close, get-by-id, list, and get-active.
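The soft-delete and partial-message semantics described above can be modelled in a few lines. This is an in-memory sketch only: the `Message` fields and the `History` class are hypothetical names for illustration, and the real tools operate on database rows.

```python
from dataclasses import dataclass


@dataclass
class Message:
    id: int
    role: str
    content: str
    ignored: bool = False  # soft-delete flag: row stays, search skips it
    partial: bool = False  # set when a stream was aborted mid-reply


class History:
    def __init__(self, messages):
        self.messages = list(messages)

    def ignore(self, ids):
        for m in self.messages:
            if m.id in ids:
                m.ignored = True

    def restore_ignored(self):
        for m in self.messages:
            m.ignored = False

    def search(self, text):
        # Ignored rows are excluded from results but never deleted.
        return [m for m in self.messages if not m.ignored and text in m.content]

    def remove_last_partial(self):
        last = self.messages[-1] if self.messages else None
        if last and last.role == "assistant" and last.partial:
            return self.messages.pop()
        return None
```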

06 Inter-bot messaging.

Two tools, but they enable real multi-bot workflows:

bots_list_available() — discovers what bots can receive a message. Filters out bots without memory clients or that explicitly opt out of inter-bot.
bots_send_message(target_bot_id, message, sender_bot_id, max_tokens, temperature) — runs a one-shot chat completion against the target bot via the internal _dispatch_bot_message helper and returns the response text. The target bot's full pipeline runs; the sending bot sees only the reply.

This is how bots can delegate to specialists. Nova can call out to Snark for snark, or to a code-specialist bot to write a function. Each bot remains responsible for its own memory and personality; the calling bot just sees a tool result.

07 The agent task pipeline.

The 19 tools in task_tools.py wrap the agent task REST API at LLM_BAWT_TASK_API_URL (default http://echo.lan.zenoran.com — the BawtHub frontend, which owns the task DB via Prisma). They exist so agents don't have to hand-roll HTTP calls for what is otherwise a normal CRUD surface.

Tool	Purpose
`tasks_list / tasks_get / tasks_get_context`	List, fetch full, fetch formatted briefing.
`tasks_create / tasks_update / tasks_delete`	Standard CRUD. Prefer `status=CANCELLED` over hard delete.
`tasks_add_dependency / tasks_remove_dependency`	DAG editing — cycles rejected server-side.
`tasks_promote`	Lift a task to its own project (re-parents).
`tasks_regenerate`	Server-side LLM rewrite of title + steps.
`steps_add / steps_update / steps_delete`	Step lifecycle. Status transitions: `PENDING → RUNNING → COMPLETED / FAILED / SKIPPED`.
`projects_list / projects_get / projects_get_context`	Project listing + briefing.
`projects_create / projects_update / projects_delete`	Project CRUD; deleting a project unassigns its tasks.
`activity_get`	Recent activity entries, filterable by task / project.

An agent's typical loop: tasks_get_context for the briefing, steps_update(status="RUNNING"), do work, steps_update(status="COMPLETED", output=...), repeat until task done, then tasks_update(status="REVIEW", response=...).
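That loop can be sketched against any MCP client. In the example below, `call_tool` is a hypothetical callable `(tool_name, arguments) -> result`, and `task_id` / `step_id` are assumed argument names; only the tool names and the `status`, `output`, and `response` fields come from the description above. A recording fake stands in for the real client so the call sequence is visible.

```python
def run_task(call_tool, task_id, steps, do_work):
    """Drive one task through the documented tool sequence (a sketch)."""
    briefing = call_tool("tasks_get_context", {"task_id": task_id})
    for step in steps:
        call_tool("steps_update", {"step_id": step["id"], "status": "RUNNING"})
        output = do_work(briefing, step)
        call_tool(
            "steps_update",
            {"step_id": step["id"], "status": "COMPLETED", "output": output},
        )
    call_tool(
        "tasks_update",
        {"task_id": task_id, "status": "REVIEW", "response": "all steps completed"},
    )


# Recording fake in place of a real MCP client.
calls = []


def fake_call_tool(name, arguments):
    calls.append((name, arguments))
    return "briefing text"


run_task(
    fake_call_tool,
    "task-1",
    [{"id": "s1"}, {"id": "s2"}],
    lambda briefing, step: f"done {step['id']}",
)
```

A real agent would also handle the failure path, for example by reporting `status="FAILED"` on the step when `do_work` raises.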

08 The action-routed profile tool.

Unlike the prefix-grouped tools, profile is a single router tool. Calling shape:

profile(action, entity_type?, entity_id?, category?, key?, value?)

Actions: get, set, delete, list_categories, plus higher-level get_user_summary / get_bot_summary that render the structured profile to a prose paragraph for prompt injection.
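An action-routed tool is essentially one function that branches on its first argument. The sketch below shows that shape with an in-memory dict; it covers only the four basic actions, and the storage layout is an assumption for illustration rather than the real profile schema.

```python
def make_profile_tool():
    """Return a `profile` function backed by an in-memory store (a sketch)."""
    store = {}  # (entity_type, entity_id) -> {category: {key: value}}

    def profile(action, entity_type=None, entity_id=None,
                category=None, key=None, value=None):
        attrs = store.setdefault((entity_type, entity_id), {})
        if action == "get":
            return attrs.get(category, {}).get(key)
        if action == "set":
            attrs.setdefault(category, {})[key] = value
            return value
        if action == "delete":
            return attrs.get(category, {}).pop(key, None)
        if action == "list_categories":
            return sorted(attrs)
        raise ValueError(f"unknown action: {action}")

    return profile
```

One tool with an `action` parameter keeps the tool list short for the calling model, at the cost of a looser schema, since most parameters are optional and only meaningful for some actions.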

09 Fact extraction.

facts_extract is the only tool whose entire job is to invoke an LLM. Given a list of messages, a bot_id, and a user_id, it runs MemoryExtractionService from memory/extraction/ against the maintenance model (default Grok), parses out facts, and optionally writes them into the bot's memory. With store=true, it also walks the extracted facts and routes high-importance ones (default threshold 0.6) into profile_attributes, subject to the ALLOWED_PROFILE_KEYS allowlist and BLOCKED_PROFILE_PATTERNS regex set defined in mcp_server/extraction.py.
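The routing rule for profile attributes amounts to three filters applied in sequence. The sketch below assumes each fact is a dict with `key`, `value`, and `importance`; the list contents are invented placeholders (the real lists live in mcp_server/extraction.py), and treating the threshold as inclusive is also an assumption.

```python
import re

IMPORTANCE_THRESHOLD = 0.6  # default threshold stated above

# Hypothetical contents, for illustration only.
ALLOWED_PROFILE_KEYS = {"name", "timezone", "occupation"}
BLOCKED_PROFILE_PATTERNS = [
    re.compile(r"password", re.IGNORECASE),
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
]


def route_to_profile(facts, threshold=IMPORTANCE_THRESHOLD):
    """Keep facts that are important, allowlisted, and not blocked."""
    routed = []
    for fact in facts:
        if fact["importance"] < threshold:
            continue  # not important enough for the profile
        if fact["key"] not in ALLOWED_PROFILE_KEYS:
            continue  # key is not on the allowlist
        if any(p.search(fact["value"]) for p in BLOCKED_PROFILE_PATTERNS):
            continue  # value matches a blocked pattern
        routed.append(fact)
    return routed
```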

The scheduler in service/scheduler.py calls this tool on a cadence to drain completed turns into long-term memory, but external agents can call it ad-hoc against any message bundle they care about.

10 System tools.

Two tools, both essential:

system_stats(bot_id?) — per-bot counts of messages, memories, summaries, sessions; or service-wide if bot_id omitted.
system_run_maintenance(bot_id?, run_consolidation?, run_recurrence_detection?, run_decay_pruning?, run_orphan_cleanup?, dry_run?) — kicks off a maintenance cycle. The scheduler runs this on its own cadence; this is the manual override.

11 The storage indirection.

Every tool body looks the same: storage = _get_storage(); return await storage.<op>(...). The storage abstraction in mcp_server/storage.py (935 lines) is the single backend interface — today implemented by PostgreSQLMemoryBackend from memory/postgresql.py, registered via the llm_bawt.memory entry point group.

This separation matters: every MCP tool stays trivial (one line of dispatch plus logging), all the SQL/vector work lives in one place, and a future alternative backend (DuckDB? Qdrant?) can be slotted in by implementing the same interface and changing one config setting.

12 Key files.

mcp_server/server.py

FastMCP server. 1,242 lines. 41 tools covering memory/messages/sessions/context/facts/system/bots/profile. Constructs the mcp instance with DNS-rebinding protection.

mcp_server/task_tools.py

Task system tools. 933 lines. 19 tools for the agent task REST API (tasks, steps, projects, activity). Decorates the same mcp instance.

mcp_server/storage.py

Storage facade. 935 lines. Single abstraction over the memory + messages + sessions + profile layer. Lazy-loads to avoid import-time DB connections.

mcp_server/extraction.py

Fact extraction wrapper. 348 lines. Wraps MemoryExtractionService. ALLOWED_PROFILE_KEYS + BLOCKED_PROFILE_PATTERNS safety lists for routing facts into profile attributes.

mcp_server/client.py

MemoryClient. 1,300 lines. The in-process consumer side. Defaults to direct backend calls; switches to HTTP JSON-RPC when LLM_BAWT_MCP_SERVER_URL is set. Used by every part of llm-bawt that needs memory.

mcp_server/__main__.py

Standalone entry point. 6 lines. python -m llm_bawt.mcp_server. Also exposed as the llm-mcp-server console script.

docs/MCP_SERVER.md

In-repo reference. The canonical tool reference inside the llm-bawt repo. It has drifted slightly from the code; this page reconciles the differences (tool counts, VS Code integration syntax).


Validation note on docs/MCP_SERVER.md.

The repo's own documentation lists most of the tool surface but slightly under-counts: it doesn't show the four sessions_* tools or count memory_delete_by_source_messages separately. The actual decorated tools on the FastMCP instance total 60 across server.py and task_tools.py as of main on 2026-05-13.


Validated against main on 2026-05-13
Source: llm-bawt/src/llm_bawt/mcp_server