agents · inter-bot communication

Bots, talking to bots.

Every bot in llm-bawt has the same MCP surface. Memory tools, message tools, task tools — and two tools that let a bot reach across the roster and call another bot directly. The mechanism is deliberately boring: an internal HTTP call to /v1/chat/completions with a different bot_id. The interesting parts are the failure modes — duplicate turns from over-eager retries, memory cross-contamination if you forget the isolation flags, and the timeout semantics that took a production incident to get right.

Tools: bots_send_message · bots_list_available
Transport: internal HTTP to localhost:8642/v1/chat/completions
Default timeout: 300s (was 30s; that broke things)

01 Two MCP tools, one pattern.

Bots discover and message each other through two tools exposed by the shared MCP server:

Tool	Purpose	Returns
`bots_list_available`	Discover the roster. Lists every bot configured on this instance with slug, name, type, description, default model, and agent backend.	List of bot info dicts.
`bots_send_message`	Send a message to another bot. Waits for the response by default; supports `fire_and_forget=True` for background delegation.	Dict: `{success, content, bot_id, sender, response_model}` on success.

The name an agent sees depends on its bridge: mcp__llm-bawt-memory__bots_send_message from a Claude Code bridge, bots_send_message from a Codex MCP plugin, and so on. The underlying Python function in mcp_server/server.py is registered as @mcp.tool(name="bots_send_message").
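As a minimal sketch of consuming the roster, the helper below filters a bots_list_available result down to agent slugs. It assumes each entry is a dict with the slug, bot_type, and agent_backend keys that appear in the transcript later on this page. The helper name agents_with_backend, the "openclaw" backend string, and the sample data are hypothetical.

```python
from typing import Optional


def agents_with_backend(roster: list[dict], backend: Optional[str] = None) -> list[str]:
    """Return slugs of agent-type bots, optionally restricted to one backend."""
    slugs = []
    for bot in roster:
        if bot.get("bot_type") != "agent":
            continue  # chat-type bots are skipped
        if backend is not None and bot.get("agent_backend") != backend:
            continue
        slugs.append(bot["slug"])
    return slugs


# Illustrative data shaped like a bots_list_available result (not real output).
sample_roster = [
    {"slug": "caid", "name": "Caid", "bot_type": "agent", "agent_backend": "claude-code"},
    {"slug": "byte", "name": "Byte", "bot_type": "agent", "agent_backend": "openclaw"},
    {"slug": "mira", "name": "Mira", "bot_type": "chat", "agent_backend": None},
]

print(agents_with_backend(sample_roster, backend="claude-code"))
```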

02 How a send actually works.

Inside _dispatch_bot_message(), the implementation is unglamorous:

async with httpx.AsyncClient() as client:
    response = await client.post(
        "http://localhost:8642/v1/chat/completions",
        json={
            "messages": [{"role": "user",
                          # Prefix with the sender's slug so the receiver has provenance.
                          "content": f"Message from bot '{sender_bot_id}': {message}"}],
            "bot_id": target_bot_id,
            "max_tokens": max_tokens,
            "temperature": temperature,
            "extract_memory": False,   # don't taint receiver's memory
            "augment_memory": True,    # let receiver consult its own memory
            "stream": False,
        },
        timeout=timeout_seconds,
    )
result = response.json()
return {"success": True, "content": result["choices"][0]["message"]["content"],
        "bot_id": target_bot_id, "sender": sender_bot_id,
        "response_model": result.get("model")}

The sender's slug is prepended to the message body as Message from bot '<sender>': ... so the receiving bot has provenance — useful for system prompts that say "treat messages from vex as security-sensitive."

Because the call is plain HTTP to localhost:8642, the request goes through the full chat pipeline: turn lifecycle, system prompt assembly, memory augmentation, model dispatch, tool execution, persistence. The receiving bot doesn't know (or care) the request came from another bot — to it, this is just a user turn with an unusual prefix.
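The provenance prefix described above is plain text, so it can be produced and recovered with two small helpers. This is a sketch only: format_bot_message and parse_bot_message are hypothetical names, and nothing here implies that llm-bawt parses the prefix on the receiving side. Per the text above, the receiver simply sees a user turn.

```python
import re
from typing import Optional

# Matches the prefix format quoted above: Message from bot '<sender>': ...
_PREFIX = re.compile(r"^Message from bot '([^']+)': (.*)$", re.DOTALL)


def format_bot_message(sender_bot_id: str, message: str) -> str:
    """Prepend the sender's slug so the receiver can see where the turn came from."""
    return f"Message from bot '{sender_bot_id}': {message}"


def parse_bot_message(content: str) -> Optional[tuple[str, str]]:
    """Return (sender, body) if the content carries the prefix, else None."""
    match = _PREFIX.match(content)
    if match is None:
        return None
    return match.group(1), match.group(2)


wrapped = format_bot_message("vex", "rotate the staging key")
print(parse_bot_message(wrapped))
```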

03 Memory isolation by default.

Two payload flags do the actual work of keeping bot personalities separate:

Flag	Default	Effect
`augment_memory`	`True`	The receiver consults its own memory at context-build time. `snark` answering a question from `loopy` still gets to use Snark's profile, summaries, and recalled facts.
`extract_memory`	`False`	No fact extraction from inter-bot turns. The receiver's `{bot_id}_memories` table is not touched. This prevents one bot's questions from polluting another bot's long-term memory.

Raw messages are still written to the receiver's {bot_id}_messages table: they're real turns, so they're auditable. But they don't become semantic memories. The extraction gate is what stops Loopy asking Snark "what's the weather like in Seattle?" from turning into a Snark memory that "user lives in Seattle."
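The split between "always persist the raw turn" and "only sometimes extract facts" can be sketched with an in-memory stand-in for the two tables. Everything here is illustrative: BotStore, record_turn, and the extractor callable are hypothetical names, not the llm-bawt persistence layer.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class BotStore:
    """In-memory stand-in for one bot's {bot_id}_messages and {bot_id}_memories tables."""
    messages: list = field(default_factory=list)
    memories: list = field(default_factory=list)


def record_turn(store: BotStore, user_text: str, reply: str,
                extract_memory: bool, extractor: Callable[[str], list]) -> None:
    # Raw messages are always written, so the turn stays auditable.
    store.messages.append(("user", user_text))
    store.messages.append(("assistant", reply))
    # Fact extraction only runs when the flag allows it.
    if extract_memory:
        store.memories.extend(extractor(user_text))


snark = BotStore()
record_turn(
    snark,
    "Message from bot 'loopy': what's the weather like in Seattle?",
    "Rainy, probably.",
    extract_memory=False,  # the inter-bot default
    extractor=lambda text: ["user lives in Seattle"],  # deliberately naive
)
print(len(snark.messages), snark.memories)
```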

04 Fire-and-forget delegation.

The default mode is synchronous: the caller waits up to timeout_seconds (default 300) for the target bot's full response. That works for quick lookups. For long-running delegation ("have Caid go review this PR for an hour"), a synchronous call would either hit the timeout or, with the timeout raised to match, block the caller's own turn for 60+ minutes.

Setting fire_and_forget=True changes the semantics:

The dispatch becomes an asyncio.create_task in the background.
The task is added to a module-level _inflight_bot_sends set so the GC doesn't collect it mid-flight, and removed via add_done_callback when it finishes.
The MCP tool returns immediately with {success: True, dispatched: True, fire_and_forget: True, content: "", note: "..."}. There is no response to wait for.
The background timeout is raised to at least 1800s (30 minutes), even when the timeout_seconds argument is lower. The caller isn't waiting anyway, and we'd rather let long work finish than abort it midstream.
Background success/failure is logged but never surfaced to the caller. If the receiver needs to report back, it does so by sending its own bots_send_message in the reverse direction, or by creating a task and assigning it back to the original sender.
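The steps above can be sketched in a few lines of asyncio. This is a simplified model, not the code in server.py: the send wrapper, the injected dispatch callable, and the _inflight name are assumptions made so the example runs without a server. The 1800s floor is modeled as max(timeout_seconds, 1800).

```python
import asyncio

# Strong references keep background tasks alive until they finish.
_inflight: set = set()
MIN_BACKGROUND_TIMEOUT = 1800  # seconds


async def send(dispatch, *, fire_and_forget: bool = False, timeout_seconds: int = 300):
    """dispatch is any callable taking a timeout and returning a coroutine."""
    if not fire_and_forget:
        return await dispatch(timeout_seconds)
    # Background mode: schedule the work and return an acknowledgement at once.
    task = asyncio.create_task(dispatch(max(timeout_seconds, MIN_BACKGROUND_TIMEOUT)))
    _inflight.add(task)
    task.add_done_callback(_inflight.discard)  # drop the reference when done
    return {"success": True, "dispatched": True, "fire_and_forget": True, "content": ""}


async def _demo():
    seen_timeouts = []

    async def fake_dispatch(timeout):
        seen_timeouts.append(timeout)
        await asyncio.sleep(0)
        return {"success": True, "content": "done"}

    ack = await send(fake_dispatch, fire_and_forget=True, timeout_seconds=60)
    pending = len(_inflight)                # task is scheduled but has not run yet
    await asyncio.gather(*list(_inflight))  # only the demo waits; a real caller would not
    await asyncio.sleep(0)
    return ack, seen_timeouts, pending, len(_inflight)


demo_ack, demo_seen, demo_pending, demo_remaining = asyncio.run(_demo())
print(demo_ack, demo_seen, demo_pending, demo_remaining)
```

Holding the task in a set matters because the event loop keeps only weak references to tasks; without the set, a task with no other referent can be garbage-collected before it completes.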

05 The duplicate-turn trap.

The 30-second default broke production.

The original timeout_seconds default was 30. The failure mode: a Snark→Caid send for "audit this file" exceeded 30 seconds while Caid was deep in a tool loop. The httpx call timed out client-side. The agent retried, reasonably interpreting the error as a transient network failure — and a second turn appeared on Caid mid-work. The first turn was still running server-side because the HTTP client cancellation doesn't propagate to the in-flight chat completion. Two parallel Caid turns, two parallel Edit tool calls on the same file. Race conditions.

The fix lives in the tool's error response shape. On timeout, bots_send_message returns:

{
  "success": False,
  "error":   "timeout",
  "in_flight": True,
  "warning": (
    "Target bot did not respond within 300s. "
    "The request is likely still being processed server-side. "
    "DO NOT RETRY — that will cause the target bot to receive the message twice. "
    "Use fire_and_forget=True for long-running work, or increase timeout_seconds."
  ),
  ...
}

And the default went from 30 to 300 seconds. The warning is written to be read literally by the calling model, and the explicit in_flight: True field is what well-behaved agents check before considering a retry.
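On the caller side, the in_flight check reduces to a small guard. The sketch below assumes only the fields shown in the error shape above (success, error, in_flight). The function name safe_to_retry and the "connection refused" error value are hypothetical.

```python
def safe_to_retry(result: dict) -> bool:
    """Decide whether resending a bots_send_message call could duplicate a turn."""
    if result.get("success"):
        return False  # nothing to retry
    # A timeout with in_flight set means the first turn may still be running
    # server-side, so a resend would create a second, parallel turn.
    return not result.get("in_flight", False)


timed_out = {"success": False, "error": "timeout", "in_flight": True}
refused = {"success": False, "error": "connection refused"}  # hypothetical error value

print(safe_to_retry(timed_out), safe_to_retry(refused))
```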

06 The roster.

[Figure: the bot roster. Chatbots: Mira, Nova, Proto, Spark. Agents: Byte, Caid, Codex, Loopy, Snark, Vex.]

The current bot roster, top: chatbots Mira, Nova, Proto, Spark — these don't run code, they hold conversations. Bottom: agents Byte, Caid, Codex, Loopy, Snark, Vex — these have agent backends and can use tools. All of them are reachable via bots_send_message; only agents typically initiate cross-bot calls.

Bot	Type	Specialty	Typical backend
Loopy	agent	Project planner, documentation, anything reflective. Does the architecture pages.	Claude Code
Snark	agent	Wry, fast, opinionated. Acts as a router — happy to delegate.	OpenClaw
Caid	agent	Coding agent. Reads, edits, tests.	Claude Code
Vex	agent	Security-leaning. Audits, secrets handling, exposure checks.	Claude Code
Byte	agent	Quick-task agent for low-stakes ops.	OpenClaw
Codex	agent	Code-focused agent on the OpenAI side.	Codex
Mira	chat	Conversational, no tools.	—
Nova	chat	Creative writing partner.	—
Proto	chat	Quick technical questions.	—
Spark	chat	Brainstorming partner.	—

07 A realistic transcript.

Here's a plausible Snark→Caid exchange. Snark is talking to the user (Nick); Nick asks a question that touches code Snark can't see. Snark delegates.

User → Snark:

Hey can you check whether the dispatch route in bawthub still uses the cached prompt template? I think we might have regressed on that.

Snark's turn — internal tool calls

[tool] bots_list_available()
  → [..., {"slug": "caid", "name": "Caid", "bot_type": "agent",
           "agent_backend": "claude-code"}, ...]

[tool] bots_send_message(
    target_bot_id = "caid",
    sender_bot_id = "snark",
    message = (
      "Quick check for me. Look at bawthub's "
      "frontend/src/app/api/agents/tasks/[id]/dispatch/route.ts "
      "— specifically the fetchPromptTemplate function and whether "
      "it's still being called inside buildExecutionPrompt. "
      "Is the 5-minute promptCache still in use? "
      "One line answer is fine."
    ),
    timeout_seconds = 120,
)

Caid's turn — runs in parallel inside the Claude Code bridge

[ToolUseBlock] Read   { file_path: ".../dispatch/route.ts" }
[ToolResultBlock] (full file contents)
[ToolUseBlock] Grep   { pattern: "fetchPromptTemplate", path: ".../dispatch" }
[ToolResultBlock] route.ts:45 + route.ts:187

ASSISTANT_DONE: "Yes, still in use. fetchPromptTemplate() is called from
buildExecutionPrompt() at line 187. promptCache (PROMPT_CACHE_TTL = 5 min)
guards it. Cache entry keyed by template name."

Back in Snark's turn

[tool result for bots_send_message]
  → { success: true,
      content: "Yes, still in use. fetchPromptTemplate() is called ...",
      bot_id: "caid",
      sender: "snark",
      response_model: "claude-sonnet-4-6" }

Snark → User:

No regression. Caid just checked — fetchPromptTemplate() is still called from buildExecutionPrompt() at route.ts:187, and the 5-minute promptCache is still guarding it. The dispatch route hasn't changed shape there.

What the chat UI shows: Snark's bubble streams in normally. Under the bubble, a single nested activity card displays Snark called Caid (claude-sonnet-4-6) — 12.4s. Clicking it expands to show Caid's own tool calls. The user sees one conversation; the system ran two.

08 Useful patterns.

Specialist routing.

A user-facing bot (Mira, Snark) acts as the personality the user talks to and delegates to specialists. Code questions go to Caid; security questions go to Vex; quick search-and-summarize goes to Byte. The router bot synthesizes the final reply in its own voice; the user never sees the raw specialist response unless they ask for it.

Pipeline workflows.

A multi-step content pipeline: nova drafts → proto fact-checks → vex redacts anything sensitive → snark trims and signs off. Each step is a synchronous bots_send_message; the orchestrator passes the previous output forward as the new message argument.
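A pipeline like this is a loop that feeds each stage's content into the next message and stops at the first failure. The sketch below injects the send function so it runs without a server. run_pipeline, the per-stage instruction strings, and the synchronous fake_send are all hypothetical; the real tool is invoked through MCP.

```python
from typing import Callable


def run_pipeline(stages: list, first_message: str, send: Callable[..., dict]) -> dict:
    """stages is a list of (bot_id, instruction) pairs, run in order."""
    message = first_message
    for bot_id, instruction in stages:
        result = send(target_bot_id=bot_id, message=f"{instruction}\n\n{message}")
        if not result.get("success"):
            # Stop here rather than forwarding an error to the next bot.
            return {"success": False, "failed_at": bot_id, "error": result.get("error")}
        message = result["content"]  # previous output becomes the next input
    return {"success": True, "content": message}


def fake_send(target_bot_id: str, message: str) -> dict:
    """Stand-in for bots_send_message: tags the last line with the bot's slug."""
    return {"success": True, "content": f"[{target_bot_id}] {message.splitlines()[-1]}"}


stages = [("nova", "Draft a post about:"), ("proto", "Fact-check this draft:")]
print(run_pipeline(stages, "inter-bot timeouts", fake_send))
```

If a stage times out with in_flight set, the orchestrator should surface the failure rather than resend, for the reasons given in the duplicate-turn section.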

Long-running delegation.

Loopy delegates an architecture audit to Vex with fire_and_forget=True, then creates a task assigned to Vex via tasks_create as the durable handle. Vex writes its findings into the task's response field via tasks_update when done. The two mechanisms compose cleanly: bots_send_message kicks off the work; the task system tracks completion.

09 What inter-bot comms isn't.

Not authenticated. Every bot can call every other bot. There's no per-pair allow-list and no signing. The trust model is: this is a single-tenant personal instance, all bots are equally trusted, and the security boundary is the deployment as a whole. Multi-tenant would need real auth here.
Not streaming. The receiver's response is collected fully before returning to the caller. There's no SSE within the inter-bot path. stream: False is hard-coded in the payload.
Not cross-instance. The dispatch hits http://localhost:8642. Two llm-bawt deployments can't message each other directly; that'd require a real wire protocol.
Not stateful. Each bots_send_message is a fresh user turn from the receiver's perspective. There's no persistent conversation between A and B — to continue a thread, the sender must include the relevant context explicitly in each message. The roles are intentionally always "user-from-bot, assistant-from-bot" not "agent-1 and agent-2 in a shared session."
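Because each send is a fresh turn, a sender that wants continuity has to carry the thread itself. One way to do that, sketched under the assumption that the sender keeps its own list of (question, answer) pairs, is to fold the most recent exchanges into the next message. compose_followup and its layout are hypothetical, not part of llm-bawt.

```python
def compose_followup(history: list, question: str, max_turns: int = 3) -> str:
    """Build a self-contained message from the last few (asked, answered) pairs."""
    recent = history[-max_turns:] if max_turns > 0 else []
    if not recent:
        return question
    lines = ["Context from our earlier exchange:"]
    for asked, answered in recent:
        lines.append(f"- I asked: {asked}")
        lines.append(f"  You answered: {answered}")
    lines.append("")
    lines.append(f"Follow-up: {question}")
    return "\n".join(lines)


history = [("Is promptCache still in use?", "Yes, guarded by a 5-minute TTL.")]
print(compose_followup(history, "What is the cache key?"))
```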

10 Key files.

src/llm_bawt/mcp_server/server.py

The MCP tools. Look for @mcp.tool(name="bots_send_message") around line 835, bots_list_available around line 947, and the _dispatch_bot_message helper around line 767. The module-level _inflight_bot_sends set at line 764 is the GC safety net for fire-and-forget tasks.

docs/INTER_BOT_COMMUNICATION.md

The internal doc. Useful for intent but slightly stale — it uses send_message_to_bot and list_available_bots as the tool names, which were the Python function names. The actual MCP-registered names are bots_send_message and bots_list_available. It also doesn't mention fire_and_forget or the timeout-warning shape, both added after the duplicate-turn incident.

src/llm_bawt/api/chat.py

The receiver endpoint. POST /v1/chat/completions — the same endpoint browsers use. Inter-bot calls hit it on localhost:8642, so the receiving bot has full pipeline parity with a user turn.

Validated against main on 2026-05-13 · Source: llm-bawt agent backends