BawtHub
⌕ Search ⌘K Source ↗ Open app →
cross-cutting · streaming

One event bus. Four backends.

A bot in BawtHub might be a native OpenAI-tool-loop running inside llm-bawt-app, an OpenClaw agent on a remote host, Claude Code spawned by a sidecar, or Codex via a Rust binary. The frontend doesn't know — and it doesn't have to. Every backend publishes the same AgentEvent shape to a Redis Stream, and one SSE endpoint translates that into OpenAI-compatible chat completion chunks.

Transport: Redis 7 Streams (redis:7-alpine) Wire format: OpenAI chat.completion.chunk SSE Out-of-band events: EventSource at /api/chat/proxy/v1/ws

01 The shape of the problem.

The native tool loop in llm-bawt streams text tokens as they arrive from OpenAI or xAI, executes tools inline, and continues streaming. The OpenClaw gateway runs on a remote host and pushes events over WebSocket. The Claude Code SDK spawns a child process and sends typed events over stdio. Codex talks JSON-RPC 2.0 over stdio to a Rust binary.

Four wildly different upstream protocols. One frontend. The bridges all collapse them into a single AgentEvent dataclass, publish to Redis, and let the main FastAPI app translate to whatever wire format the client wants — currently OpenAI streaming chat completions plus a sidecar event stream for tool activity that survives the page reload.

Provider fan-in · backend fan-out
Native loop
OpenAI SSExAI / GrokIn-processDirect → queue
OpenClaw
WebSocket → remote gatewayAnthropic via gatewayEd25519 device auth
Claude Code
claude-agent-sdkstdio childOAuth subscription
Codex
openai-codex-sdkRust binary, JSON-RPCChatGPT OAuth

02 The unifying type: AgentEvent.

Every bridge translates its native upstream notifications into a single dataclass defined in src/agent_bridge/events.py. Nine kinds cover everything the frontend renders:

KindWhen it firesNotable fields
RUN_STARTEDBackend accepted the user message and a run is in flightrun_id, model
ASSISTANT_DELTAStreamed text token from the assistanttext, seq
TOOL_STARTAgent is about to execute a tooltool_name, tool_arguments, provider
TOOL_ENDTool finished — result attachedtool_name, tool_result
ASSISTANT_DONEFinal assistant message completetext, token_usage
USER_MESSAGEEchoes the user's prompt (passive-subscriber mode)text, origin
RUN_COMPLETEDBackend signalled end-of-run (matches run_id)run_id
SYSTEM_NOTEOut-of-band info from the harness (e.g. model switched)text
ERRORBridge or upstream failure surfaced to the UItext

Each event carries a deterministic event_id — SHA-256 of session, kind, payload, and stream sequence — so the Postgres event store can dedupe on replay, and a provider string ("claude-code", "codex", "openclaw") stamped at the publish boundary. The provider is what the UI uses to dispatch tool rendering, since each harness has its own native tool vocabulary.

Why one shape across four wildly different protocols?

Because the alternative is N×M: every frontend renderer would need to know the quirks of every backend. With the unified shape, the Claude tool-call card component in frontend/src/.../ClaudeToolCallCard.tsx renders Codex's commandExecution items the same way it renders Claude Code's Bash tool calls — the codex bridge aliases its event types to Claude tool names before publishing. Provider-aware dispatch (TASK-212) is the longer-term plan, but the unified shape buys today's UI a lot of leverage.

03 Redis stream keyspace.

The bridges and the main app coordinate through four families of Redis Streams, defined in openclaw_bridge/publisher.py:

Stream keyProducersConsumersMaxlen
agent:commandsllm-bawt-app (chat.send, chat.abort, RPCs)All three bridges (filtered by backend field)(implicit)
agent:events:{session_key}BridgesFanoutHub + EventStore10,000
agent:run:{request_id}BridgesThe single SSE generator for that turn5,000, 5-min TTL
events:{bot_id}:{user_id}Main app worker thread + bridgesFrontend EventSource (one per user)5,000
agent:rpc:{request_id}BridgesMain app awaiting RPC reply10, 60-sec TTL
agent:historyBridgesMain app history persister1,000

The chat.send command on agent:commands carries a backend field; each bridge runs in its own consumer group (bridge, claude-code-bridge, codex-bridge) and ACKs commands that don't match its backend without processing them. This is what lets all three bridges hang off the same stream without stepping on each other.

The per-session stream (agent:events:{session_key}) is the canonical event log; the per-request stream (agent:run:{request_id}) is a short-lived response channel scoped to one HTTP request. The SSE handler in chat_streaming.py subscribes to the per-request stream so it doesn't accidentally pick up unrelated events from a long-lived session.

04 Three SSE spouts, not one.

The frontend actually consumes events over three different transports — each one solves a different problem:

1. OpenAI-compatible chat completion SSE

The primary chat stream. POST /v1/chat/completions with stream: true returns text/event-stream chunks in the standard OpenAI shape: {"choices":[{"delta":{"content":"..."}}]}. Tool calls come through as delta.tool_calls entries. This is what most third-party OpenAI clients can already consume — pointing the official Python client at http://localhost:8642/v1 Just Works.

2. Per-bot/per-user event stream

An EventSource connection scoped to events:{bot_id}:{user_id}. Tool start/end pairs, turn-complete signals, and animation hints flow here. The frontend uses this to drive the tool activity cards under the originating user message — bucketed by trigger_message_id, which every bridge stamps onto every event. The connection is in useUnifiedEventStream.ts and reconnects on visibility/focus/online.

3. Diagnostic gateway feed (admin-only)

A separate SSE endpoint at /v1/openclaw/ws exposes raw AgentEvents for the gateway dashboard at /tools/openclaw. It's debug instrumentation, not the user-facing surface.

Why the dual-channel design (chat stream + sidecar event stream)?

The chat completions stream is bound to one HTTP request. If the client disconnects (page reload, network hiccup), that stream dies — but the turn keeps running on the backend. Tool events published only to the per-request stream would be lost. So every bridge also publishes tool start/end and the final turn_complete sentinel to events:{bot_id}:{user_id}, which the frontend's persistent EventSource picks up. Reload the page mid-stream and the tool cards repopulate from the unified event stream while the assistant text repopulates from the turn-log API.

05 Lifecycle of a single turn.

Tracing what happens when a user sends a message to a Claude Code bot:

  1. Frontend POSTs to /api/chat/proxy/v1/chat/completions with stream: true and a frontend-generated user_message_id UUID.
  2. BawtHub backend proxies to llm-bawt at http://host.docker.internal:8642.
  3. ChatStreamingMixin.chat_completion_stream persists a turn-log row with status="streaming", builds the message list (history + memory + system prompt), and dispatches via AgentBackendClient.
  4. ClaudeCodeBackend writes a chat.send command to agent:commands with backend: "claude-code" and a new request_id.
  5. The Claude Code bridge picks it up (filtering by backend field), spawns or reuses an SDK query(), and the bundled Claude binary handles the conversation.
  6. SDK events stream back: assistant deltas, tool calls, tool results. The bridge translates each into AgentEvents and publishes to agent:run:{request_id} and the per-session stream and (for tool events specifically) the unified events:{bot_id}:{user_id} stream.
  7. The main app's SSE generator reads from the per-request stream and translates each event into a chat completion chunk. Tool starts/ends are also re-published to the unified event stream from the main app's worker thread — that way they reach Redis even if the SSE generator is cancelled because the user closed the tab.
  8. ASSISTANT_DONE + RUN_COMPLETED arrive; the SSE generator emits a finish_reason: "stop" chunk and a [DONE] sentinel.
  9. A turn_complete event is published to the unified stream with status, animation, and token_usage. The frontend uses this to flip the assistant bubble from streaming to final.
  10. The turn-log row is finalized with the full response text, tool call details, and elapsed time.

06 Aborts and session keys.

Aborting a turn is a separate RPC: POST /v1/chat/abort with a turn_id. The handler looks up the agent_session_key stamped on the turn-log row and writes a chat.abort command to agent:commands. Each bridge tracks active streams keyed by that session_key.

There's a subtle wrinkle: the routing key for aborts is not the same as the persisted session_key for Claude Code and Codex. Those backends use SDK-internal thread/session UUIDs in agent_backend_config.session_key, while abort routing uses f"{bot_id}:{user_id}". OpenClaw uses the persisted session_key directly because its gateway-side abstraction matches that key 1:1. The chat-streaming code in chat_streaming.py resolves this per-backend before persisting the turn log.

07 Idempotency and replay.

Events are deduped at two layers:

08 The in-process FanoutHub.

Before Redis, llm-bawt used an in-process FanoutHub (agent_bridge_fanout.py) — a dict of asyncio.Queue subscribers per session. It still exists for the in-process WebSocket gateway path: the WS client receives an event, persists it via the store, and broadcasts to local subscribers. With Redis enabled, that path is mostly inert; the main app subscribes via RedisSubscriber instead.

The FanoutHub is still useful for tests and for the rare case where someone runs llm-bawt without Redis. Both paths produce the same AgentEvents, so the consumers don't care which is wired up.

09 Token accounting passes through too.

Each backend reports per-turn token usage differently. Claude Code's SDK exposes a ResultMessage with usage + modelUsage. Codex doesn't surface cost at all. The OpenAI native loop captures usage from the upstream stream's final chunk. All of them land on ASSISTANT_DONE.token_usage as the same dict shape:

{
  "input_tokens": 1247,
  "cache_read_tokens": 892,
  "cache_creation_tokens": 0,
  "output_tokens": 318,
  "context_window": 200000,
  "total_cost_usd": null
}

The frontend reads it off the turn_complete sidecar event and renders the pill in the lower-right of the assistant bubble. total_cost_usd is None for Codex turns because the upstream doesn't report it — that's a known asymmetry.

10 Key files.

src/agent_bridge/events.py
The shared event type. AgentEvent dataclass, AgentEventKind enum, deterministic event_id synthesizer. Every bridge imports this directly.
src/agent_bridge/publisher.py
Redis Streams publisher. Defines the four stream key prefixes (agent:events:, agent:run:, agent:commands, events:) and the maxlen caps. Stamps the provider field at publish time.
src/agent_bridge/subscriber.py
Consumer-side. Reads per-session and per-request streams, replays gaps from Postgres on subscriber attach.
src/agent_bridge/store.py
Postgres event store. agent_events, agent_session_state, agent_runs. Dedupes by event_dedupe_key, tracks per-session cursors for replay.
src/llm_bawt/service/chat_streaming.py
Where everything converges. The ChatStreamingMixin bridges native-loop streaming, OpenClaw bridge streaming, and the OpenAI chat.completion.chunk wire format. ~1,500 lines; the longest single file in the streaming path.
src/claude_code_bridge/bridge.py
Claude Code translation layer. Subscribes to agent:commands with backend=claude-code, drives the Agent SDK, maps each SDK item to an AgentEvent.
src/codex_bridge/bridge.py
Codex translation layer. Same role for Codex. Aliases Codex's native tool types (commandExecution, fileChange, etc.) to Claude tool names so the existing UI cards render them.
bawthub/frontend/src/app/chat/useUnifiedEventStream.ts
Frontend SSE consumer. One persistent EventSource per logged-in user, watchdog-reconnects on visibility/focus/online/idle. Dispatches tool_event and turn_complete through a late-bound handler table so HMR edits land without forcing a reconnect.
Validated against main on 2026-05-13 Source: llm-bawt + bawthub