One event bus. Four backends.
A bot in BawtHub might be a native OpenAI-tool-loop running inside llm-bawt-app, an OpenClaw agent on a remote host, Claude Code spawned by a sidecar, or Codex via a Rust binary. The frontend doesn't know — and it doesn't have to. Every backend publishes the same AgentEvent shape to a Redis Stream, and one SSE endpoint translates that into OpenAI-compatible chat completion chunks.
01 The shape of the problem.
The native tool loop in llm-bawt streams text tokens as they arrive from OpenAI or xAI, executes tools inline, and continues streaming. The OpenClaw gateway runs on a remote host and pushes events over WebSocket. The Claude Code SDK spawns a child process and sends typed events over stdio. Codex talks JSON-RPC 2.0 over stdio to a Rust binary.
Four wildly different upstream protocols. One frontend. The bridges all collapse them into a single AgentEvent dataclass, publish to Redis, and let the main FastAPI app translate to whatever wire format the client wants. Today that means OpenAI streaming chat completions, plus a sidecar event stream for tool activity that survives a page reload.
02 The unifying type: AgentEvent.
Every bridge translates its native upstream notifications into a single dataclass defined in src/agent_bridge/events.py. Nine kinds cover everything the frontend renders:
| Kind | When it fires | Notable fields |
|---|---|---|
| RUN_STARTED | Backend accepted the user message and a run is in flight | run_id, model |
| ASSISTANT_DELTA | Streamed text token from the assistant | text, seq |
| TOOL_START | Agent is about to execute a tool | tool_name, tool_arguments, provider |
| TOOL_END | Tool finished, result attached | tool_name, tool_result |
| ASSISTANT_DONE | Final assistant message complete | text, token_usage |
| USER_MESSAGE | Echoes the user's prompt (passive-subscriber mode) | text, origin |
| RUN_COMPLETED | Backend signalled end-of-run (matches run_id) | run_id |
| SYSTEM_NOTE | Out-of-band info from the harness (e.g. model switched) | text |
| ERROR | Bridge or upstream failure surfaced to the UI | text |
Each event carries a deterministic event_id — SHA-256 of session, kind, payload, and stream sequence — so the Postgres event store can dedupe on replay, and a provider string ("claude-code", "codex", "openclaw") stamped at the publish boundary. The provider is what the UI uses to dispatch tool rendering, since each harness has its own native tool vocabulary.
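A minimal sketch of what such a dataclass and its deterministic ID could look like. The nine kind names come from the table above; the lowercase enum values, the payload dict, the field layout, and the helper name synthesize_event_id are assumptions made for illustration, not the contents of events.py.

```python
import hashlib
import json
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, Optional


class AgentEventKind(str, Enum):
    # Names match the table; the string values are assumed.
    RUN_STARTED = "run_started"
    ASSISTANT_DELTA = "assistant_delta"
    TOOL_START = "tool_start"
    TOOL_END = "tool_end"
    ASSISTANT_DONE = "assistant_done"
    USER_MESSAGE = "user_message"
    RUN_COMPLETED = "run_completed"
    SYSTEM_NOTE = "system_note"
    ERROR = "error"


def synthesize_event_id(session_key: str, kind: AgentEventKind,
                        payload: Dict[str, Any], stream_seq: int) -> str:
    """Hash session, kind, canonical payload, and sequence into a stable ID."""
    # Sorted keys and fixed separators make equal payloads serialize identically.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    raw = f"{session_key}:{kind.value}:{canonical}:{stream_seq}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


@dataclass
class AgentEvent:
    session_key: str
    kind: AgentEventKind
    seq: int
    payload: Dict[str, Any] = field(default_factory=dict)
    provider: Optional[str] = None  # stamped at the publish boundary

    @property
    def event_id(self) -> str:
        return synthesize_event_id(self.session_key, self.kind, self.payload, self.seq)
```

Because the payload is canonicalized before hashing, two publishes of the same logical event produce the same ID regardless of dict key order, which is what makes dedupe on replay possible.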
The reason for a single shape is that the alternative is N×M: every frontend renderer would need to know the quirks of every backend. With the unified shape, the Claude tool-call card component in frontend/src/.../ClaudeToolCallCard.tsx renders Codex's commandExecution items the same way it renders Claude Code's Bash tool calls, because the codex bridge aliases its event types to Claude tool names before publishing. Provider-aware dispatch (TASK-212) is the longer-term plan, but the unified shape buys today's UI a lot of leverage.
03 Redis stream keyspace.
The bridges and the main app coordinate through six Redis Stream key patterns, defined in openclaw_bridge/publisher.py:
| Stream key | Producers | Consumers | Maxlen |
|---|---|---|---|
| agent:commands | llm-bawt-app (chat.send, chat.abort, RPCs) | All three bridges (filtered by backend field) | (implicit) |
| agent:events:{session_key} | Bridges | FanoutHub + EventStore | 10,000 |
| agent:run:{request_id} | Bridges | The single SSE generator for that turn | 5,000, 5-min TTL |
| events:{bot_id}:{user_id} | Main app worker thread + bridges | Frontend EventSource (one per user) | 5,000 |
| agent:rpc:{request_id} | Bridges | Main app awaiting RPC reply | 10, 60-sec TTL |
| agent:history | Bridges | Main app history persister | 1,000 |
The chat.send command on agent:commands carries a backend field; each bridge runs in its own consumer group (bridge, claude-code-bridge, codex-bridge) and ACKs commands that don't match its backend without processing them. This is what lets all three bridges hang off the same stream without stepping on each other.
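The filter-then-ACK behaviour can be sketched without a Redis connection. In the sketch below, entries stands in for what a consumer-group read (XREADGROUP) would return and the acked list stands in for XACK calls; handle_command and drain are hypothetical names, not functions from the bridges.

```python
from typing import Callable, Dict, List, Tuple

Command = Dict[str, str]


def handle_command(fields: Command, my_backend: str,
                   process: Callable[[Command], None]) -> str:
    """Process a command only if it targets this bridge's backend."""
    if fields.get("backend") != my_backend:
        return "skipped"
    process(fields)
    return "processed"


def drain(entries: List[Tuple[str, Command]], my_backend: str,
          process: Callable[[Command], None]) -> List[str]:
    """Handle each entry and return the IDs to acknowledge.

    Every entry is acknowledged, matching or not, so foreign commands
    never pile up in this consumer group's pending list.
    """
    acked = []
    for entry_id, fields in entries:
        handle_command(fields, my_backend, process)
        acked.append(entry_id)  # stands in for XACK on agent:commands
    return acked
```

Since each bridge has its own consumer group, every group sees every command; acknowledging the non-matching ones is what keeps the groups independent.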
The per-session stream (agent:events:{session_key}) is the canonical event log; the per-request stream (agent:run:{request_id}) is a short-lived response channel scoped to one HTTP request. The SSE handler in chat_streaming.py subscribes to the per-request stream so it doesn't accidentally pick up unrelated events from a long-lived session.
04 Three SSE spouts, not one.
The frontend actually consumes events over three different transports — each one solves a different problem:
1. OpenAI-compatible chat completion SSE
The primary chat stream. POST /v1/chat/completions with stream: true returns text/event-stream chunks in the standard OpenAI shape: {"choices":[{"delta":{"content":"..."}}]}. Tool calls come through as delta.tool_calls entries. This is what most third-party OpenAI clients can already consume — pointing the official Python client at http://localhost:8642/v1 Just Works.
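As an illustration of the wire format, here is a small sketch that builds one SSE line in the OpenAI chat.completion.chunk shape. The function name chat_chunk and the constant DONE_LINE are hypothetical; the real translation lives in the main app and handles more cases (tool calls, errors).

```python
import json
import time
from typing import Optional


def chat_chunk(completion_id: str, model: str,
               content: Optional[str] = None,
               finish_reason: Optional[str] = None) -> str:
    """Serialize one streaming chunk as a server-sent event line."""
    # A text delta carries content; the final chunk has an empty delta.
    delta = {"content": content} if content is not None else {}
    body = {
        "id": completion_id,
        "object": "chat.completion.chunk",
        "created": int(time.time()),
        "model": model,
        "choices": [{"index": 0, "delta": delta, "finish_reason": finish_reason}],
    }
    return f"data: {json.dumps(body)}\n\n"


# Terminator that OpenAI-compatible clients expect at the end of the stream.
DONE_LINE = "data: [DONE]\n\n"
```

An ASSISTANT_DELTA would map to chat_chunk(..., content=text), and the end of a run to chat_chunk(..., finish_reason="stop") followed by DONE_LINE.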
2. Per-bot/per-user event stream
An EventSource connection scoped to events:{bot_id}:{user_id}. Tool start/end pairs, turn-complete signals, and animation hints flow here. The frontend uses this to drive the tool activity cards under the originating user message — bucketed by trigger_message_id, which every bridge stamps onto every event. The connection is in useUnifiedEventStream.ts and reconnects on visibility/focus/online.
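The bucketing itself happens in the frontend, but the logic is simple enough to sketch in Python. This assumes each event is a dict with a trigger_message_id key; bucket_by_trigger is a hypothetical name and the other fields are illustrative.

```python
from collections import defaultdict
from typing import Any, Dict, List

Event = Dict[str, Any]


def bucket_by_trigger(events: List[Event]) -> Dict[str, List[Event]]:
    """Group tool events under the user message that triggered them."""
    buckets: Dict[str, List[Event]] = defaultdict(list)
    for event in events:
        # Arrival order is preserved inside each bucket.
        buckets[event["trigger_message_id"]].append(event)
    return dict(buckets)
```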
3. Diagnostic gateway feed (admin-only)
A separate SSE endpoint at /v1/openclaw/ws exposes raw AgentEvents for the gateway dashboard at /tools/openclaw. It's debug instrumentation, not the user-facing surface.
The chat completions stream is bound to one HTTP request. If the client disconnects (page reload, network hiccup), that stream dies — but the turn keeps running on the backend. Tool events published only to the per-request stream would be lost. So every bridge also publishes tool start/end and the final turn_complete sentinel to events:{bot_id}:{user_id}, which the frontend's persistent EventSource picks up. Reload the page mid-stream and the tool cards repopulate from the unified event stream while the assistant text repopulates from the turn-log API.
05 Lifecycle of a single turn.
Tracing what happens when a user sends a message to a Claude Code bot:
- Frontend POSTs to /api/chat/proxy/v1/chat/completions with stream: true and a frontend-generated user_message_id UUID.
- BawtHub backend proxies to llm-bawt at http://host.docker.internal:8642.
- ChatStreamingMixin.chat_completion_stream persists a turn-log row with status="streaming", builds the message list (history + memory + system prompt), and dispatches via AgentBackendClient.
- ClaudeCodeBackend writes a chat.send command to agent:commands with backend: "claude-code" and a new request_id.
- The Claude Code bridge picks it up (filtering by backend field), spawns or reuses an SDK query(), and the bundled Claude binary handles the conversation.
- SDK events stream back: assistant deltas, tool calls, tool results. The bridge translates each into AgentEvents and publishes to agent:run:{request_id}, to the per-session stream, and (for tool events specifically) to the unified events:{bot_id}:{user_id} stream.
- The main app's SSE generator reads from the per-request stream and translates each event into a chat completion chunk. Tool starts/ends are also re-published to the unified event stream from the main app's worker thread, so they reach Redis even if the SSE generator is cancelled because the user closed the tab.
- ASSISTANT_DONE + RUN_COMPLETED arrive; the SSE generator emits a finish_reason: "stop" chunk and a [DONE] sentinel.
- A turn_complete event is published to the unified stream with status, animation, and token_usage. The frontend uses this to flip the assistant bubble from streaming to final.
- The turn-log row is finalized with the full response text, tool call details, and elapsed time.
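The fan-out rule in the bridge's publish step can be written down as a small routing function. This is a sketch under the assumption that event kinds are compared by name; target_streams is a hypothetical helper, and the stream key formats are the ones from the keyspace table.

```python
from typing import List

# Kinds that must survive a dropped HTTP request.
TOOL_KINDS = {"TOOL_START", "TOOL_END"}


def target_streams(kind: str, session_key: str, request_id: str,
                   bot_id: str, user_id: str) -> List[str]:
    """Return the Redis Stream keys a bridge would publish this event to."""
    targets = [
        f"agent:run:{request_id}",       # short-lived response channel
        f"agent:events:{session_key}",   # canonical per-session log
    ]
    if kind in TOOL_KINDS:
        # Tool activity also goes to the persistent per-user stream.
        targets.append(f"events:{bot_id}:{user_id}")
    return targets
```

Text deltas stay on the per-request and per-session streams; only tool activity is duplicated onto the stream that the frontend's persistent EventSource reads.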
06 Aborts and session keys.
Aborting a turn is a separate RPC: POST /v1/chat/abort with a turn_id. The handler looks up the agent_session_key stamped on the turn-log row and writes a chat.abort command to agent:commands. Each bridge tracks active streams keyed by that session_key.
There's a subtle wrinkle: the routing key for aborts is not the same as the persisted session_key for Claude Code and Codex. Those backends use SDK-internal thread/session UUIDs in agent_backend_config.session_key, while abort routing uses f"{bot_id}:{user_id}". OpenClaw uses the persisted session_key directly because its gateway-side abstraction matches that key 1:1. The chat-streaming code in chat_streaming.py resolves this per-backend before persisting the turn log.
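The per-backend resolution can be sketched as a single function. The backend strings match the provider names used elsewhere in this document; abort_routing_key is a hypothetical name, and the actual logic in chat_streaming.py may be structured differently.

```python
def abort_routing_key(backend: str, bot_id: str, user_id: str,
                      persisted_session_key: str) -> str:
    """Pick the key that a chat.abort command should be routed by."""
    if backend in ("claude-code", "codex"):
        # The persisted key is an SDK-internal thread/session UUID,
        # so aborts route by the bot/user pair instead.
        return f"{bot_id}:{user_id}"
    if backend == "openclaw":
        # The gateway's session abstraction matches the persisted key 1:1.
        return persisted_session_key
    raise ValueError(f"unknown backend: {backend!r}")
```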
07 Idempotency and replay.
Events are deduped at two layers:
- Redis Stream IDs: XADD returns a monotonic ID, and the Postgres event store keeps a per-session cursor (agent_session_state.last_event_id) so a restart replays only the gap, not the whole history.
- Event dedupe key: the event_id in AgentEvent is a SHA-256 hash of {session_key}:{kind}:{canonical_payload}:{stream_seq}. The agent_events table has a UNIQUE constraint on event_dedupe_key with ON CONFLICT DO NOTHING, so even a double-publish from a re-spawned bridge is harmless.
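The second layer can be demonstrated with an in-memory SQLite database standing in for Postgres; both accept the ON CONFLICT ... DO NOTHING clause. The table is cut down to three columns for illustration, and store_event is a hypothetical helper rather than the store's actual API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE agent_events (
        id INTEGER PRIMARY KEY,
        session_key TEXT NOT NULL,
        event_dedupe_key TEXT NOT NULL UNIQUE,
        payload TEXT NOT NULL
    )
    """
)


def store_event(db: sqlite3.Connection, session_key: str,
                dedupe_key: str, payload: str) -> bool:
    """Insert an event; return False if its dedupe key was already stored."""
    cur = db.execute(
        "INSERT INTO agent_events (session_key, event_dedupe_key, payload) "
        "VALUES (?, ?, ?) ON CONFLICT(event_dedupe_key) DO NOTHING",
        (session_key, dedupe_key, payload),
    )
    # rowcount is 0 when the conflict clause suppressed the insert.
    return cur.rowcount == 1
```

A re-spawned bridge that republishes an event calls the same insert with the same key, and the second call is a no-op instead of an error.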
08 The in-process FanoutHub.
Before Redis, llm-bawt used an in-process FanoutHub (agent_bridge_fanout.py) — a dict of asyncio.Queue subscribers per session. It still exists for the in-process WebSocket gateway path: the WS client receives an event, persists it via the store, and broadcasts to local subscribers. With Redis enabled, that path is mostly inert; the main app subscribes via RedisSubscriber instead.
The FanoutHub is still useful for tests and for the rare case where someone runs llm-bawt without Redis. Both paths produce the same AgentEvents, so the consumers don't care which is wired up.
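For readers unfamiliar with the pattern, here is a minimal sketch of a per-session fan-out built on asyncio.Queue. It shares only the class name and the general idea with agent_bridge_fanout.py; the method names and the absence of persistence are simplifications.

```python
import asyncio
from collections import defaultdict
from typing import Any, DefaultDict, Set


class FanoutHub:
    """In-process pub/sub: one set of subscriber queues per session."""

    def __init__(self) -> None:
        self._subs: DefaultDict[str, Set[asyncio.Queue]] = defaultdict(set)

    def subscribe(self, session_key: str) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self._subs[session_key].add(queue)
        return queue

    def unsubscribe(self, session_key: str, queue: asyncio.Queue) -> None:
        self._subs[session_key].discard(queue)
        if not self._subs[session_key]:
            del self._subs[session_key]  # drop empty sessions

    def broadcast(self, session_key: str, event: Any) -> int:
        """Deliver the event to every subscriber of the session."""
        subscribers = self._subs.get(session_key, ())
        for queue in subscribers:
            queue.put_nowait(event)  # queues are unbounded, so this never blocks
        return len(subscribers)
```

Each subscriber drains its own queue with await queue.get(), so a slow consumer delays only itself.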
09 Token accounting passes through too.
Each backend reports per-turn token usage differently. Claude Code's SDK exposes a ResultMessage with usage + modelUsage. Codex doesn't surface cost at all. The OpenAI native loop captures usage from the upstream stream's final chunk. All of them land on ASSISTANT_DONE.token_usage as the same dict shape:
{
"input_tokens": 1247,
"cache_read_tokens": 892,
"cache_creation_tokens": 0,
"output_tokens": 318,
"context_window": 200000,
"total_cost_usd": null
}
The frontend reads it off the turn_complete sidecar event and renders the pill in the lower-right of the assistant bubble. total_cost_usd is None for Codex turns because the upstream doesn't report it — that's a known asymmetry.
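The shared shape can be expressed as a dataclass, which also shows how Python's None becomes JSON null on the wire. The field names are taken from the example above; the TokenUsage class itself is an assumption, since the source describes the value only as a dict.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class TokenUsage:
    input_tokens: int
    cache_read_tokens: int
    cache_creation_tokens: int
    output_tokens: int
    context_window: int
    # Stays None when the upstream reports no cost, as with Codex.
    total_cost_usd: Optional[float] = None


codex_usage = TokenUsage(
    input_tokens=1247,
    cache_read_tokens=892,
    cache_creation_tokens=0,
    output_tokens=318,
    context_window=200000,
)

# json.dumps renders None as null, matching the example payload.
wire = json.dumps(asdict(codex_usage))
```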
10 Key files.
- src/agent_bridge/events.py: AgentEvent dataclass, AgentEventKind enum, deterministic event_id synthesizer. Every bridge imports this directly.
- src/agent_bridge/publisher.py: Stream keys (agent:events:, agent:run:, agent:commands, events:) and the maxlen caps. Stamps the provider field at publish time.
- src/agent_bridge/subscriber.py
- src/agent_bridge/store.py: agent_events, agent_session_state, agent_runs. Dedupes by event_dedupe_key, tracks per-session cursors for replay.
- src/llm_bawt/service/chat_streaming.py: ChatStreamingMixin bridges native-loop streaming, OpenClaw bridge streaming, and the OpenAI chat.completion.chunk wire format. ~1,500 lines; the longest single file in the streaming path.
- src/claude_code_bridge/bridge.py: Consumes agent:commands with backend=claude-code, drives the Agent SDK, maps each SDK item to an AgentEvent.
- src/codex_bridge/bridge.py: Aliases Codex items (commandExecution, fileChange, etc.) to Claude tool names so the existing UI cards render them.
- bawthub/frontend/src/app/chat/useUnifiedEventStream.ts: One EventSource per logged-in user, watchdog-reconnects on visibility/focus/online/idle. Dispatches tool_event and turn_complete through a late-bound handler table so HMR edits land without forcing a reconnect.