BawtHub
agents · inter-bot communication

Bots, talking to bots.

Every bot in llm-bawt has the same MCP surface. Memory tools, message tools, task tools — and two tools that let a bot reach across the roster and call another bot directly. The mechanism is deliberately boring: an internal HTTP call to /v1/chat/completions with a different bot_id. The interesting parts are the failure modes — duplicate turns from over-eager retries, memory cross-contamination if you forget the isolation flags, and the timeout semantics that took a production incident to get right.

Tools: bots_send_message · bots_list_available
Transport: internal HTTP to localhost:8642/v1/chat/completions
Default timeout: 300s (was 30s; that broke things)

01 Two MCP tools, one pattern.

Bots discover and message each other through two tools exposed by the shared MCP server:

| Tool | Purpose | Returns |
| --- | --- | --- |
| bots_list_available | Discover the roster. Lists every bot configured on this instance with slug, name, type, description, default model, and agent backend. | List of bot info dicts. |
| bots_send_message | Send a message to another bot. Waits for the response by default; supports fire_and_forget=True for background delegation. | Dict: {success, content, bot_id, sender, response_model} on success. |

The MCP tool name visible to agents matches what they call: mcp__llm-bawt-memory__bots_send_message from a Claude Code bridge, bots_send_message from a Codex MCP plugin, etc. The underlying Python function in mcp_server/server.py is registered as @mcp.tool(name="bots_send_message").
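A caller rarely needs the whole roster; often it only wants the bots that can run tools. A minimal sketch of that filtering step, assuming the list has already come back from bots_list_available and that each entry carries the slug, name, bot_type, and agent_backend keys shown in the transcript in section 07. The helper name agents_only, the "chat" type value, and the sample entries are illustrative assumptions, not part of llm-bawt:

```python
from typing import Any

# Sample roster in the shape suggested by this page; real entries may carry more keys.
SAMPLE_ROSTER: list[dict[str, Any]] = [
    {"slug": "caid", "name": "Caid", "bot_type": "agent", "agent_backend": "claude-code"},
    {"slug": "mira", "name": "Mira", "bot_type": "chat", "agent_backend": None},
    {"slug": "vex", "name": "Vex", "bot_type": "agent", "agent_backend": "claude-code"},
]


def agents_only(roster: list[dict[str, Any]]) -> list[str]:
    """Return the slugs of bots whose bot_type is 'agent' (hypothetical helper)."""
    return [bot["slug"] for bot in roster if bot.get("bot_type") == "agent"]


print(agents_only(SAMPLE_ROSTER))
```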

02 How a send actually works.

Inside _dispatch_bot_message(), the implementation is unglamorous:

async with httpx.AsyncClient() as client:
    response = await client.post(
        "http://localhost:8642/v1/chat/completions",
        json={
            "messages": [{"role": "user",
                          "content": f"Message from bot '{sender_bot_id}': {message}"}],
            "bot_id": target_bot_id,
            "max_tokens": max_tokens,
            "temperature": temperature,
            "extract_memory": False,   # don't taint receiver's memory
            "augment_memory": True,    # let receiver consult its own memory
            "stream": False,
        },
        timeout=timeout_seconds,
    )
result = response.json()
return {"success": True, "content": result["choices"][0]["message"]["content"],
        "bot_id": target_bot_id, "sender": sender_bot_id,
        "response_model": result.get("model")}

The sender's slug is prepended to the message body as Message from bot '<sender>': ... so the receiving bot has provenance — useful for system prompts that say "treat messages from vex as security-sensitive."

Because the call is plain HTTP to localhost:8642, the request goes through the full chat pipeline: turn lifecycle, system prompt assembly, memory augmentation, model dispatch, tool execution, persistence. The receiving bot doesn't know (or care) the request came from another bot — to it, this is just a user turn with an unusual prefix.
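The request body is the whole contract between sender and receiver, so it can help to see it assembled in isolation. A small sketch, assuming a hypothetical helper named build_inter_bot_payload; the max_tokens and temperature defaults are placeholders chosen for the example, not values taken from llm-bawt:

```python
from typing import Any


def build_inter_bot_payload(
    sender_bot_id: str,
    target_bot_id: str,
    message: str,
    max_tokens: int = 1024,    # assumed default, not taken from the source
    temperature: float = 0.7,  # assumed default, not taken from the source
) -> dict[str, Any]:
    """Assemble the JSON body for an inter-bot chat completion (hypothetical helper)."""
    return {
        "messages": [
            # Provenance prefix: the receiver sees which bot sent the message.
            {"role": "user", "content": f"Message from bot '{sender_bot_id}': {message}"}
        ],
        "bot_id": target_bot_id,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "extract_memory": False,  # no fact extraction on the receiver
        "augment_memory": True,   # receiver may still read its own memory
        "stream": False,
    }


payload = build_inter_bot_payload("snark", "caid", "Is promptCache still in use?")
print(payload["messages"][0]["content"])
```

Keeping the body in a pure function like this makes the isolation flags easy to unit-test without a running server.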

03 Memory isolation by default.

Two payload flags do the actual work of keeping bot personalities separate:

| Flag | Default | Effect |
| --- | --- | --- |
| augment_memory | True | The receiver consults its own memory at context-build time. snark answering a question from loopy still gets to use Snark's profile, summaries, and recalled facts. |
| extract_memory | False | No fact extraction from inter-bot turns. The receiver's {bot_id}_memories table is not touched. This prevents one bot's questions from polluting another bot's long-term memory. |

Raw messages are still written to the receiver's {bot_id}_messages table. They're real turns, so they're auditable, but they don't become semantic memories. The extraction gate is what stops Loopy asking Snark "what's the weather like in Seattle?" from turning into a Snark memory of "user lives in Seattle".
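The split between "always logged" and "only sometimes remembered" can be shown with a toy model. This is an illustrative sketch only: ReceiverStore and record_turn are hypothetical names, two in-memory lists stand in for the {bot_id}_messages and {bot_id}_memories tables, and the real extraction logic is not reproduced:

```python
from dataclasses import dataclass, field


@dataclass
class ReceiverStore:
    """In-memory stand-in for one bot's messages and memories tables."""
    messages: list[str] = field(default_factory=list)
    memories: list[str] = field(default_factory=list)


def record_turn(store: ReceiverStore, content: str, extract_memory: bool) -> None:
    """Always log the raw turn; only derive a memory when extraction is enabled."""
    store.messages.append(content)
    if extract_memory:
        # Placeholder for real fact extraction, which this sketch does not attempt.
        store.memories.append(f"extracted from: {content}")


snark = ReceiverStore()
record_turn(
    snark,
    "Message from bot 'loopy': what's the weather like in Seattle?",
    extract_memory=False,
)
print(len(snark.messages), len(snark.memories))  # 1 0
```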

04 Fire-and-forget delegation.

The default mode is synchronous: the caller waits up to timeout_seconds (default 300) for the target bot's full response. That works for quick lookups. For long-running delegation ("have Caid go review this PR for an hour"), a synchronous call would lock the caller's own turn for 60+ minutes.

Setting fire_and_forget=True changes the semantics:
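The key-files notes in section 10 mention a module-global _inflight_bot_sends set that acts as the GC-safety net for fire-and-forget tasks. A minimal sketch of that general pattern, assuming asyncio background tasks: the event loop keeps only weak references to tasks, so the sender holds a strong one until the task finishes. The names slow_target and send_fire_and_forget, and the shape of the immediate return value, are assumptions made for this example:

```python
import asyncio

# Strong references to background tasks; the event loop itself only keeps weak ones.
_inflight: set[asyncio.Task] = set()


async def slow_target(message: str) -> str:
    """Stand-in for the target bot's turn (hypothetical)."""
    await asyncio.sleep(0.05)
    return f"done: {message}"


def send_fire_and_forget(message: str) -> dict:
    """Schedule the send in the background and return immediately."""
    task = asyncio.create_task(slow_target(message))
    _inflight.add(task)
    task.add_done_callback(_inflight.discard)  # drop the reference once finished
    return {"success": True, "queued": True}   # return shape is an assumption


async def main() -> tuple[dict, int, int]:
    ack = send_fire_and_forget("review this PR")
    pending_now = len(_inflight)  # the task is still running at this point
    await asyncio.sleep(0.2)      # give the background task time to finish
    return ack, pending_now, len(_inflight)


result = asyncio.run(main())
print(result)
```

The caller gets its acknowledgement before the target has done any work, which is why a separate durable handle is needed to learn the outcome.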

05 The duplicate-turn trap.

The 30-second default broke production.

The original timeout_seconds default was 30. The failure mode: a Snark→Caid send for "audit this file" exceeded 30 seconds while Caid was deep in a tool loop. The httpx call timed out client-side. The agent retried, reasonably interpreting the error as a transient network failure, and a second turn appeared on Caid mid-work. The first turn was still running server-side, because cancelling the HTTP request on the client does not propagate to the in-flight chat completion. The result: two parallel Caid turns making two parallel Edit tool calls on the same file, which is a race condition.

The fix lives in the tool's error response shape. On timeout, bots_send_message returns:

{
  "success": False,
  "error":   "timeout",
  "in_flight": True,
  "warning": (
    "Target bot did not respond within 300s. "
    "The request is likely still being processed server-side. "
    "DO NOT RETRY — that will cause the target bot to receive the message twice. "
    "Use fire_and_forget=True for long-running work, or increase timeout_seconds."
  ),
  ...
}

And the default went from 30 to 300. The warning is read literally by every model worth its salt, and the explicit in_flight: True field is what well-behaved agents check before considering a retry.
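On the calling side, the check described above reduces to a few lines. A hedged sketch, assuming the result dict carries the success and in_flight keys shown in the timeout response; the helper name safe_to_retry and the "connection refused" error value are hypothetical:

```python
from typing import Any


def safe_to_retry(result: dict[str, Any]) -> bool:
    """Decide whether re-sending is safe, given a bots_send_message result dict."""
    if result.get("success"):
        return False  # nothing to retry
    if result.get("in_flight"):
        return False  # target may still be working; a resend would duplicate the turn
    return True       # failed without an in-flight marker


timeout_result = {"success": False, "error": "timeout", "in_flight": True}
refused_result = {"success": False, "error": "connection refused"}  # hypothetical error value
print(safe_to_retry(timeout_result), safe_to_retry(refused_result))  # False True
```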

06 The roster.

The current bot roster, top: chatbots Mira, Nova, Proto, Spark — these don't run code, they hold conversations. Bottom: agents Byte, Caid, Codex, Loopy, Snark, Vex — these have agent backends and can use tools. All of them are reachable via bots_send_message; only agents typically initiate cross-bot calls.
| Bot | Type | Specialty | Typical backend |
| --- | --- | --- | --- |
| Loopy | agent | Project planner, documentation, anything reflective. Does the architecture pages. | Claude Code |
| Snark | agent | Wry, fast, opinionated. Acts as a router, happy to delegate. | OpenClaw |
| Caid | agent | Coding agent. Reads, edits, tests. | Claude Code |
| Vex | agent | Security-leaning. Audits, secrets handling, exposure checks. | Claude Code |
| Byte | agent | Quick-task agent for low-stakes ops. | OpenClaw |
| Codex | agent | Code-focused agent on the OpenAI side. | Codex |
| Mira | chat | Conversational, no tools. | |
| Nova | chat | Creative writing partner. | |
| Proto | chat | Quick technical questions. | |
| Spark | chat | Brainstorming partner. | |

07 A realistic transcript.

Here's a plausible Snark→Caid exchange. Snark is talking to the user (Nick); Nick asks a question that touches code Snark can't see. Snark delegates.

User → Snark:

Hey can you check whether the dispatch route in bawthub still uses the cached prompt template? I think we might have regressed on that.

Snark's turn — internal tool calls

[tool] bots_list_available()
  → [..., {"slug": "caid", "name": "Caid", "bot_type": "agent",
           "agent_backend": "claude-code"}, ...]

[tool] bots_send_message(
    target_bot_id = "caid",
    sender_bot_id = "snark",
    message = (
      "Quick check for me. Look at bawthub's "
      "frontend/src/app/api/agents/tasks/[id]/dispatch/route.ts "
      "— specifically the fetchPromptTemplate function and whether "
      "it's still being called inside buildExecutionPrompt. "
      "Is the 5-minute promptCache still in use? "
      "One line answer is fine."
    ),
    timeout_seconds = 120,
)

Caid's turn — runs in parallel inside the Claude Code bridge

[ToolUseBlock] Read   { file_path: ".../dispatch/route.ts" }
[ToolResultBlock] (full file contents)
[ToolUseBlock] Grep   { pattern: "fetchPromptTemplate", path: ".../dispatch" }
[ToolResultBlock] route.ts:45 + route.ts:187

ASSISTANT_DONE: "Yes, still in use. fetchPromptTemplate() is called from
buildExecutionPrompt() at line 187. promptCache (PROMPT_CACHE_TTL = 5 min)
guards it. Cache entry keyed by template name."

Back in Snark's turn

[tool result for bots_send_message]
  → { success: true,
      content: "Yes, still in use. fetchPromptTemplate() is called ...",
      bot_id: "caid",
      sender: "snark",
      response_model: "claude-sonnet-4-6" }
Snark → User:

No regression. Caid just checked — fetchPromptTemplate() is still called from buildExecutionPrompt() at route.ts:187, and the 5-minute promptCache is still guarding it. The dispatch route hasn't changed shape there.

What the chat UI shows: Snark's bubble streams in normally. Under the bubble, a single nested activity card displays Snark called Caid (claude-sonnet-4-6) — 12.4s. Clicking it expands to show Caid's own tool calls. The user sees one conversation; the system ran two.

08 Useful patterns.

Specialist routing.

A chat-focused bot (Mira, Snark) acts as the user-facing personality and delegates to specialists. Code questions go to Caid; security questions go to Vex; quick search-and-summarize goes to Byte. The router bot synthesizes the final reply in its own voice — the user never sees the raw specialist response unless they ask for it.

Pipeline workflows.

A multi-step content pipeline: nova drafts → proto fact-checks → vex redacts anything sensitive → snark trims and signs off. Each step is a synchronous bots_send_message; the orchestrator passes the previous output forward as the new message argument.
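The hand-off loop behind such a pipeline can be sketched without touching the network. In this example the send callable is a stand-in for a synchronous bots_send_message call; its (target_bot_id, message) signature, the run_pipeline helper, and the fake_send stub are all assumptions made for illustration:

```python
from typing import Callable

# Signature assumed for illustration: (target_bot_id, message) -> result dict.
SendFn = Callable[[str, str], dict]


def run_pipeline(send: SendFn, stages: list[str], draft_request: str) -> str:
    """Feed each stage's output to the next; stop at the first failed send."""
    current = draft_request
    for bot_id in stages:
        result = send(bot_id, current)
        if not result.get("success"):
            # Fail loudly instead of retrying: a timed-out stage may still be running.
            raise RuntimeError(f"stage {bot_id!r} failed: {result.get('error')}")
        current = result["content"]
    return current


def fake_send(bot_id: str, message: str) -> dict:
    """Stub that tags the message so the hand-off order is visible."""
    return {"success": True, "content": f"{message} > {bot_id}", "bot_id": bot_id}


final = run_pipeline(fake_send, ["nova", "proto", "vex", "snark"], "brief")
print(final)  # brief > nova > proto > vex > snark
```

Stopping at the first failure rather than retrying keeps the orchestrator consistent with the duplicate-turn guidance in section 05.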

Long-running delegation.

Loopy delegates an architecture audit to Vex as fire_and_forget=True, then creates a task assigned to Vex via tasks_create as the durable handle. Vex writes its findings into the task's response field via tasks_update when done. The two mechanisms compose cleanly: bots_send_message kicks off the work; the task system tracks completion.

09 What inter-bot comms isn't.

10 Key files.

src/llm_bawt/mcp_server/server.py
The MCP tools. Look for @mcp.tool(name="bots_send_message") around line 835, bots_list_available around line 947, and _dispatch_bot_message helper around line 767. The _inflight_bot_sends: set module-global at line 764 is the GC-safety net for fire-and-forget tasks.
docs/INTER_BOT_COMMUNICATION.md
The internal doc. Useful for intent but slightly stale — it uses send_message_to_bot and list_available_bots as the tool names, which were the Python function names. The actual MCP-registered names are bots_send_message and bots_list_available. It also doesn't mention fire_and_forget or the timeout-warning shape, both added after the duplicate-turn incident.
src/llm_bawt/api/chat.py
The receiver endpoint. POST /v1/chat/completions — the same endpoint browsers use. Inter-bot calls hit it on localhost:8642, so the receiving bot has full pipeline parity with a user turn.
Validated against main on 2026-05-13
Source: llm-bawt agent backends