Bots, talking to bots.
Every bot in llm-bawt has the same MCP surface. Memory tools, message tools, task tools — and two tools that let a bot reach across the roster and call another bot directly. The mechanism is deliberately boring: an internal HTTP call to /v1/chat/completions with a different bot_id. The interesting parts are the failure modes — duplicate turns from over-eager retries, memory cross-contamination if you forget the isolation flags, and the timeout semantics that took a production incident to get right.
01 Two MCP tools, one pattern.
Bots discover and message each other through two tools exposed by the shared MCP server:
| Tool | Purpose | Returns |
|---|---|---|
| bots_list_available | Discover the roster. Lists every bot configured on this instance with slug, name, type, description, default model, and agent backend. | List of bot info dicts. |
| bots_send_message | Send a message to another bot. Waits for the response by default; supports fire_and_forget=True for background delegation. | Dict: {success, content, bot_id, sender, response_model} on success. |
The tool name an agent sees depends on its MCP client: mcp__llm-bawt-memory__bots_send_message from a Claude Code bridge, bots_send_message from a Codex MCP plugin, and so on. The underlying Python function in mcp_server/server.py is registered as @mcp.tool(name="bots_send_message").
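As a rough illustration of how a caller might use the roster before sending, the sketch below filters a list shaped like a bots_list_available result. The key names (slug, bot_type, agent_backend) follow the transcript later on this page; the sample data and the helper names find_bot and agent_slugs are hypothetical, not part of llm-bawt.

```python
from typing import Optional

# Sample roster shaped like a bots_list_available result (illustrative data only).
ROSTER = [
    {"slug": "caid", "name": "Caid", "bot_type": "agent", "agent_backend": "claude-code"},
    {"slug": "mira", "name": "Mira", "bot_type": "chat", "agent_backend": None},
]


def find_bot(roster: list[dict], slug: str) -> Optional[dict]:
    """Return the roster entry with the given slug, or None if absent."""
    return next((bot for bot in roster if bot.get("slug") == slug), None)


def agent_slugs(roster: list[dict]) -> list[str]:
    """Slugs of bots whose bot_type is 'agent'."""
    return [bot["slug"] for bot in roster if bot.get("bot_type") == "agent"]
```

Checking the roster first lets a sender fail fast on a misspelled slug instead of finding out from the chat endpoint.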
02 How a send actually works.
Inside _dispatch_bot_message(), the implementation is unglamorous:
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "http://localhost:8642/v1/chat/completions",
            json={
                "messages": [{"role": "user",
                              "content": f"Message from bot '{sender_bot_id}': {message}"}],
                "bot_id": target_bot_id,
                "max_tokens": max_tokens,
                "temperature": temperature,
                "extract_memory": False,  # don't taint receiver's memory
                "augment_memory": True,   # let receiver consult its own memory
                "stream": False,
            },
            timeout=timeout_seconds,
        )
    result = response.json()
    return {"success": True, "content": result["choices"][0]["message"]["content"],
            "bot_id": target_bot_id, "sender": sender_bot_id,
            "response_model": result.get("model")}
The sender's slug is prepended to the message body as Message from bot '<sender>': ... so the receiving bot has provenance — useful for system prompts that say "treat messages from vex as security-sensitive."
Because the call is plain HTTP to localhost:8642, the request goes through the full chat pipeline: turn lifecycle, system prompt assembly, memory augmentation, model dispatch, tool execution, persistence. The receiving bot doesn't know (or care) the request came from another bot — to it, this is just a user turn with an unusual prefix.
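To make the provenance prefix concrete, here is a minimal sketch of how receiving-side code could split it back out of a message body. The function name split_provenance and the regex are assumptions for illustration; the page only documents the prefix format itself, not any parser.

```python
import re
from typing import Optional

# Matches the documented prefix format: Message from bot '<sender>': <body>
_PREFIX = re.compile(r"^Message from bot '([^']+)': (.*)$", re.DOTALL)


def split_provenance(content: str) -> tuple[Optional[str], str]:
    """Return (sender_slug, body); sender_slug is None when no prefix is present."""
    match = _PREFIX.match(content)
    if match is None:
        return None, content
    return match.group(1), match.group(2)
```

Because the prefix is plain text inside the message body, anyone who can send a user turn could type it. Treat the parsed slug as a hint, not a verified identity.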
03 Memory isolation by default.
Two payload flags do the actual work of keeping bot personalities separate:
| Flag | Default | Effect |
|---|---|---|
| augment_memory | True | The receiver consults its own memory at context-build time. Snark answering a question from Loopy still gets to use Snark's profile, summaries, and recalled facts. |
| extract_memory | False | No fact extraction from inter-bot turns. The receiver's {bot_id}_memories table is not touched. This prevents one bot's questions from polluting another bot's long-term memory. |
Raw messages are still written to the receiver's {bot_id}_messages table: they're real turns, so they're auditable. But they don't become semantic memories. The extraction gate is what stops Loopy asking Snark "what's the weather like in Seattle" from turning into a Snark memory of "user lives in Seattle."
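The split between "logged" and "remembered" can be sketched with a toy in-memory model. Everything here is hypothetical: BotStore, extract_facts, and record_turn stand in for the real tables and extraction step, which this page does not show.

```python
from dataclasses import dataclass, field


@dataclass
class BotStore:
    """Toy stand-in for a bot's two tables; not the real storage layer."""
    messages: list[str] = field(default_factory=list)  # like {bot_id}_messages
    memories: list[str] = field(default_factory=list)  # like {bot_id}_memories


def extract_facts(text: str) -> list[str]:
    """Placeholder extractor: pretends every turn yields one fact."""
    return [f"fact derived from: {text}"]


def record_turn(store: BotStore, text: str, extract_memory: bool) -> None:
    """Always log the raw turn; only derive memories when the flag allows it."""
    store.messages.append(text)
    if extract_memory:
        store.memories.extend(extract_facts(text))
```

With extract_memory=False the turn still shows up in the message log, which is what keeps inter-bot traffic auditable without letting it shape the receiver's long-term memory.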
04 Fire-and-forget delegation.
The default mode is synchronous: the caller waits up to timeout_seconds (default 300) for the target bot's full response. That works for quick lookups. For long-running delegation ("have Caid go review this PR for an hour"), a synchronous call would lock the caller's own turn for 60+ minutes.
Setting fire_and_forget=True changes the semantics:
- The dispatch becomes an asyncio.create_task in the background.
- The task is added to a module-level _inflight_bot_sends set so the GC doesn't collect it mid-flight, and removed via add_done_callback when it finishes.
- The MCP tool returns immediately with {success: True, dispatched: True, fire_and_forget: True, content: "", note: "..."}. There is no response to wait for.
- Background timeout is bumped to at least 1800s (30 minutes) regardless of the timeout_seconds argument: the caller isn't waiting anyway, and we'd rather let long work finish than abort it midstream.
- Background success/failure is logged but never surfaced. If the receiver needs to report back, it does so by sending its own bots_send_message in the reverse direction, or by creating a task and assigning it back to the original sender.
05 The duplicate-turn trap.
The original timeout_seconds default was 30. The failure mode: a Snark→Caid send for "audit this file" exceeded 30 seconds while Caid was deep in a tool loop. The httpx call timed out client-side. The agent retried, reasonably interpreting the error as a transient network failure — and a second turn appeared on Caid mid-work. The first turn was still running server-side because the HTTP client cancellation doesn't propagate to the in-flight chat completion. Two parallel Caid turns, two parallel Edit tool calls on the same file. Race conditions.
The fix lives in the tool's error response shape. On timeout, bots_send_message returns:
    {
        "success": False,
        "error": "timeout",
        "in_flight": True,
        "warning": (
            "Target bot did not respond within 300s. "
            "The request is likely still being processed server-side. "
            "DO NOT RETRY — that will cause the target bot to receive the message twice. "
            "Use fire_and_forget=True for long-running work, or increase timeout_seconds."
        ),
        ...
    }
And the default went from 30 to 300. The warning is read literally by every model worth its salt, and the explicit in_flight: True field is what well-behaved agents check before considering a retry.
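For callers that wrap the tool in their own retry logic, the in_flight check might look like the sketch below. The function name should_retry is hypothetical, and treating every other failure as retryable is an assumption; the page only specifies that a timeout with in_flight: True must not be retried.

```python
def should_retry(result: dict) -> bool:
    """Decide whether resending is safe, based on the tool's response dict."""
    if result.get("success"):
        return False  # nothing to retry
    if result.get("in_flight"):
        return False  # target may still be working; a resend would duplicate the turn
    return True       # assumed: failures that never reached the target can be resent
```

A timeout result such as {"success": False, "error": "timeout", "in_flight": True} comes back as not retryable, which is the behavior the warning text asks for.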
06 The roster.
Only agents typically initiate cross-bot calls with bots_send_message.
| Bot | Type | Specialty | Typical backend |
|---|---|---|---|
| Loopy | agent | Project planner, documentation, anything reflective. Does the architecture pages. | Claude Code |
| Snark | agent | Wry, fast, opinionated. Acts as a router — happy to delegate. | OpenClaw |
| Caid | agent | Coding agent. Reads, edits, tests. | Claude Code |
| Vex | agent | Security-leaning. Audits, secrets handling, exposure checks. | Claude Code |
| Byte | agent | Quick-task agent for low-stakes ops. | OpenClaw |
| Codex | agent | Code-focused agent on the OpenAI side. | Codex |
| Mira | chat | Conversational, no tools. | — |
| Nova | chat | Creative writing partner. | — |
| Proto | chat | Quick technical questions. | — |
| Spark | chat | Brainstorming partner. | — |
07 A realistic transcript.
Here's a plausible Snark→Caid exchange. Snark is talking to the user (Nick); Nick asks a question that touches code Snark can't see. Snark delegates.
Nick: Hey can you check whether the dispatch route in bawthub still uses the cached prompt template? I think we might have regressed on that.
Snark's turn — internal tool calls
    [tool] bots_list_available()
      → [..., {"slug": "caid", "name": "Caid", "bot_type": "agent",
               "agent_backend": "claude-code"}, ...]
    [tool] bots_send_message(
        target_bot_id = "caid",
        sender_bot_id = "snark",
        message = (
            "Quick check for me. Look at bawthub's "
            "frontend/src/app/api/agents/tasks/[id]/dispatch/route.ts "
            "— specifically the fetchPromptTemplate function and whether "
            "it's still being called inside buildExecutionPrompt. "
            "Is the 5-minute promptCache still in use? "
            "One line answer is fine."
        ),
        timeout_seconds = 120,
    )
Caid's turn — runs in parallel inside the Claude Code bridge
    [ToolUseBlock] Read { file_path: ".../dispatch/route.ts" }
    [ToolResultBlock] (full file contents)
    [ToolUseBlock] Grep { pattern: "fetchPromptTemplate", path: ".../dispatch" }
    [ToolResultBlock] route.ts:45 + route.ts:187
    ASSISTANT_DONE: "Yes, still in use. fetchPromptTemplate() is called from
        buildExecutionPrompt() at line 187. promptCache (PROMPT_CACHE_TTL = 5 min)
        guards it. Cache entry keyed by template name."
Back in Snark's turn
    [tool result for bots_send_message]
      → { success: true,
          content: "Yes, still in use. fetchPromptTemplate() is called ...",
          bot_id: "caid",
          sender: "snark",
          response_model: "claude-sonnet-4-6" }
Snark: No regression. Caid just checked: fetchPromptTemplate() is still called from buildExecutionPrompt() at route.ts:187, and the 5-minute promptCache is still guarding it. The dispatch route hasn't changed shape there.
What the chat UI shows: Snark's bubble streams in normally. Under the bubble, a single nested activity card displays "Snark called Caid (claude-sonnet-4-6) — 12.4s". Clicking it expands to show Caid's own tool calls. The user sees one conversation; the system ran two.
08 Useful patterns.
Specialist routing.
A chat-focused bot (Mira, Snark) acts as the user-facing personality and delegates to specialists. Code questions go to Caid; security questions go to Vex; quick search-and-summarize goes to Byte. The router bot synthesizes the final reply in its own voice — the user never sees the raw specialist response unless they ask for it.
Pipeline workflows.
A multi-step content pipeline: nova drafts → proto fact-checks → vex redacts anything sensitive → snark trims and signs off. Each step is a synchronous bots_send_message; the orchestrator passes the previous output forward as the new message argument.
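An orchestrator for this kind of pipeline can be sketched in a few lines. The send callable is a stand-in for bots_send_message (its two-argument shape is a simplification), and run_pipeline and fake_send are hypothetical names used only for this example.

```python
from typing import Callable

# send(target_bot_id, message) -> response dict; a stand-in for bots_send_message.
SendFn = Callable[[str, str], dict]


def run_pipeline(send: SendFn, stages: list[str], initial: str) -> str:
    """Pass each stage's output forward as the next stage's message."""
    text = initial
    for bot_id in stages:
        result = send(bot_id, text)
        if not result.get("success"):
            # Stop rather than retry: a timed-out stage may still be running.
            raise RuntimeError(f"stage {bot_id!r} failed: {result.get('error')}")
        text = result["content"]
    return text


def fake_send(target_bot_id: str, message: str) -> dict:
    """Fake transport for the example: tags the text with the bot that saw it."""
    return {"success": True, "content": f"{message} > {target_bot_id}", "bot_id": target_bot_id}
```

Calling run_pipeline(fake_send, ["nova", "proto", "vex", "snark"], "draft") threads the text through all four stages in order. Aborting on the first failure keeps the orchestrator from resending to a stage that may still be in flight.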
Long-running delegation.
Loopy delegates an architecture audit to Vex with fire_and_forget=True, then creates a task assigned to Vex via tasks_create as the durable handle. Vex writes its findings into the task's response field via tasks_update when done. The two mechanisms compose cleanly: bots_send_message kicks off the work; the task system tracks completion.
09 What inter-bot comms isn't.
- Not authenticated. Every bot can call every other bot. There's no per-pair allow-list and no signing. The trust model is: this is a single-tenant personal instance, all bots are equally trusted, and the security boundary is the deployment as a whole. Multi-tenant would need real auth here.
- Not streaming. The receiver's response is collected fully before returning to the caller. There's no SSE within the inter-bot path. stream: False is hard-coded in the payload.
- Not cross-instance. The dispatch hits http://localhost:8642. Two llm-bawt deployments can't message each other directly; that'd require a real wire protocol.
- Not stateful. Each bots_send_message is a fresh user turn from the receiver's perspective. There's no persistent conversation between A and B. To continue a thread, the sender must include the relevant context explicitly in each message. The roles are intentionally always "user-from-bot, assistant-from-bot", not "agent-1 and agent-2 in a shared session."
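Since each send is a fresh turn, a sender that wants continuity has to carry it in the message text. One way to do that is sketched below; build_followup and its wording are hypothetical, and a real sender would choose its own format.

```python
def build_followup(history: list[tuple[str, str]], question: str) -> str:
    """Fold earlier (question, answer) pairs into one self-contained message."""
    lines = []
    for earlier_question, earlier_answer in history:
        lines.append(f"Earlier I asked: {earlier_question}")
        lines.append(f"You answered: {earlier_answer}")
    lines.append(f"Follow-up: {question}")
    return "\n".join(lines)
```

The resulting string goes in as the message argument of the next send, so the receiver sees the whole thread in a single user turn.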
10 Key files.
- src/llm_bawt/mcp_server/server.py: @mcp.tool(name="bots_send_message") around line 835, bots_list_available around line 947, and the _dispatch_bot_message helper around line 767. The _inflight_bot_sends: set module-global at line 764 is the GC-safety net for fire-and-forget tasks.
- docs/INTER_BOT_COMMUNICATION.md: lists send_message_to_bot and list_available_bots as the tool names, which were the Python function names. The actual MCP-registered names are bots_send_message and bots_list_available. It also doesn't mention fire_and_forget or the timeout-warning shape, both added after the duplicate-turn incident.
- src/llm_bawt/api/chat.py: POST /v1/chat/completions, the same endpoint browsers use. Inter-bot calls hit it on localhost:8642, so the receiving bot has full pipeline parity with a user turn.
main on 2026-05-13
Source: llm-bawt agent backends