cross-cutting · streaming

One event bus. Four backends.

A bot in BawtHub might be a native OpenAI-tool-loop running inside llm-bawt-app, an OpenClaw agent on a remote host, Claude Code spawned by a sidecar, or Codex via a Rust binary. The frontend doesn't know — and it doesn't have to. Every backend publishes the same AgentEvent shape to a Redis Stream, and one SSE endpoint translates that into OpenAI-compatible chat completion chunks.

Transport: Redis 7 Streams (redis:7-alpine) Wire format: OpenAI chat.completion.chunk SSE Out-of-band events: EventSource at /api/chat/proxy/v1/ws

01 The shape of the problem.

The native tool loop in llm-bawt streams text tokens as they arrive from OpenAI or xAI, executes tools inline, and continues streaming. The OpenClaw gateway runs on a remote host and pushes events over WebSocket. The Claude Code SDK spawns a child process and sends typed events over stdio. Codex talks JSON-RPC 2.0 over stdio to a Rust binary.

Four wildly different upstream protocols. One frontend. The bridges all collapse them into a single AgentEvent dataclass, publish to Redis, and let the main FastAPI app translate to whatever wire format the client wants — currently OpenAI streaming chat completions plus a sidecar event stream for tool activity that survives the page reload.

Provider fan-in · backend fan-out

Native loop

OpenAI SSExAI / GrokIn-processDirect → queue

OpenClaw

WebSocket → remote gatewayAnthropic via gatewayEd25519 device auth

Claude Code

claude-agent-sdkstdio childOAuth subscription

Codex

openai-codex-sdkRust binary, JSON-RPCChatGPT OAuth

02 The unifying type: `AgentEvent`.

Every bridge translates its native upstream notifications into a single dataclass defined in src/agent_bridge/events.py. Nine kinds cover everything the frontend renders:

Kind	When it fires	Notable fields
`RUN_STARTED`	Backend accepted the user message and a run is in flight	`run_id`, `model`
`ASSISTANT_DELTA`	Streamed text token from the assistant	`text`, `seq`
`TOOL_START`	Agent is about to execute a tool	`tool_name`, `tool_arguments`, `provider`
`TOOL_END`	Tool finished — result attached	`tool_name`, `tool_result`
`ASSISTANT_DONE`	Final assistant message complete	`text`, `token_usage`
`USER_MESSAGE`	Echoes the user's prompt (passive-subscriber mode)	`text`, `origin`
`RUN_COMPLETED`	Backend signalled end-of-run (matches `run_id`)	`run_id`
`SYSTEM_NOTE`	Out-of-band info from the harness (e.g. model switched)	`text`
`ERROR`	Bridge or upstream failure surfaced to the UI	`text`

Each event carries a deterministic event_id — SHA-256 of session, kind, payload, and stream sequence — so the Postgres event store can dedupe on replay, and a provider string ("claude-code", "codex", "openclaw") stamped at the publish boundary. The provider is what the UI uses to dispatch tool rendering, since each harness has its own native tool vocabulary.

✦

Why one shape across four wildly different protocols?

Because the alternative is N×M: every frontend renderer would need to know the quirks of every backend. With the unified shape, the Claude tool-call card component in frontend/src/.../ClaudeToolCallCard.tsx renders Codex's commandExecution items the same way it renders Claude Code's Bash tool calls — the codex bridge aliases its event types to Claude tool names before publishing. Provider-aware dispatch (TASK-212) is the longer-term plan, but the unified shape buys today's UI a lot of leverage.

03 Redis stream keyspace.

The bridges and the main app coordinate through four families of Redis Streams, defined in openclaw_bridge/publisher.py:

Stream key	Producers	Consumers	Maxlen
`agent:commands`	`llm-bawt-app` (chat.send, chat.abort, RPCs)	All three bridges (filtered by `backend` field)	(implicit)
`agent:events:{session_key}`	Bridges	FanoutHub + EventStore	10,000
`agent:run:{request_id}`	Bridges	The single SSE generator for that turn	5,000, 5-min TTL
`events:{bot_id}:{user_id}`	Main app worker thread + bridges	Frontend `EventSource` (one per user)	5,000
`agent:rpc:{request_id}`	Bridges	Main app awaiting RPC reply	10, 60-sec TTL
`agent:history`	Bridges	Main app history persister	1,000

The chat.send command on agent:commands carries a backend field; each bridge runs in its own consumer group (bridge, claude-code-bridge, codex-bridge) and ACKs commands that don't match its backend without processing them. This is what lets all three bridges hang off the same stream without stepping on each other.

The per-session stream (agent:events:{session_key}) is the canonical event log; the per-request stream (agent:run:{request_id}) is a short-lived response channel scoped to one HTTP request. The SSE handler in chat_streaming.py subscribes to the per-request stream so it doesn't accidentally pick up unrelated events from a long-lived session.

04 Three SSE spouts, not one.

The frontend actually consumes events over three different transports — each one solves a different problem:

1. OpenAI-compatible chat completion SSE

The primary chat stream. POST /v1/chat/completions with stream: true returns text/event-stream chunks in the standard OpenAI shape: {"choices":[{"delta":{"content":"..."}}]}. Tool calls come through as delta.tool_calls entries. This is what most third-party OpenAI clients can already consume — pointing the official Python client at http://localhost:8642/v1 Just Works.

2. Per-bot/per-user event stream

An EventSource connection scoped to events:{bot_id}:{user_id}. Tool start/end pairs, turn-complete signals, and animation hints flow here. The frontend uses this to drive the tool activity cards under the originating user message — bucketed by trigger_message_id, which every bridge stamps onto every event. The connection is in useUnifiedEventStream.ts and reconnects on visibility/focus/online.

3. Diagnostic gateway feed (admin-only)

A separate SSE endpoint at /v1/openclaw/ws exposes raw AgentEvents for the gateway dashboard at /tools/openclaw. It's debug instrumentation, not the user-facing surface.

⚠

Why the dual-channel design (chat stream + sidecar event stream)?

The chat completions stream is bound to one HTTP request. If the client disconnects (page reload, network hiccup), that stream dies — but the turn keeps running on the backend. Tool events published only to the per-request stream would be lost. So every bridge also publishes tool start/end and the final turn_complete sentinel to events:{bot_id}:{user_id}, which the frontend's persistent EventSource picks up. Reload the page mid-stream and the tool cards repopulate from the unified event stream while the assistant text repopulates from the turn-log API.

05 Lifecycle of a single turn.

Tracing what happens when a user sends a message to a Claude Code bot:

Frontend POSTs to /api/chat/proxy/v1/chat/completions with stream: true and a frontend-generated user_message_id UUID.
BawtHub backend proxies to llm-bawt at http://host.docker.internal:8642.
ChatStreamingMixin.chat_completion_stream persists a turn-log row with status="streaming", builds the message list (history + memory + system prompt), and dispatches via AgentBackendClient.
ClaudeCodeBackend writes a chat.send command to agent:commands with backend: "claude-code" and a new request_id.
The Claude Code bridge picks it up (filtering by backend field), spawns or reuses an SDK query(), and the bundled Claude binary handles the conversation.
SDK events stream back: assistant deltas, tool calls, tool results. The bridge translates each into AgentEvents and publishes to agent:run:{request_id} and the per-session stream and (for tool events specifically) the unified events:{bot_id}:{user_id} stream.
The main app's SSE generator reads from the per-request stream and translates each event into a chat completion chunk. Tool starts/ends are also re-published to the unified event stream from the main app's worker thread — that way they reach Redis even if the SSE generator is cancelled because the user closed the tab.
ASSISTANT_DONE + RUN_COMPLETED arrive; the SSE generator emits a finish_reason: "stop" chunk and a [DONE] sentinel.
A turn_complete event is published to the unified stream with status, animation, and token_usage. The frontend uses this to flip the assistant bubble from streaming to final.
The turn-log row is finalized with the full response text, tool call details, and elapsed time.

06 Aborts and session keys.

Aborting a turn is a separate RPC: POST /v1/chat/abort with a turn_id. The handler looks up the agent_session_key stamped on the turn-log row and writes a chat.abort command to agent:commands. Each bridge tracks active streams keyed by that session_key.

There's a subtle wrinkle: the routing key for aborts is not the same as the persisted session_key for Claude Code and Codex. Those backends use SDK-internal thread/session UUIDs in agent_backend_config.session_key, while abort routing uses f"{bot_id}:{user_id}". OpenClaw uses the persisted session_key directly because its gateway-side abstraction matches that key 1:1. The chat-streaming code in chat_streaming.py resolves this per-backend before persisting the turn log.

07 Idempotency and replay.

Events are deduped at two layers:

Redis Stream IDs — XADD returns a monotonic ID, and the Postgres event store keeps a per-session cursor (agent_session_state.last_event_id) so a restart replays only the gap, not the whole history.
Event dedupe key — the event_id in AgentEvent is a SHA-256 hash of {session_key}:{kind}:{canonical_payload}:{stream_seq}. The agent_events table has a UNIQUE constraint on event_dedupe_key with ON CONFLICT DO NOTHING, so even a double-publish from a re-spawned bridge is harmless.

08 The in-process `FanoutHub`.

Before Redis, llm-bawt used an in-process FanoutHub (agent_bridge_fanout.py) — a dict of asyncio.Queue subscribers per session. It still exists for the in-process WebSocket gateway path: the WS client receives an event, persists it via the store, and broadcasts to local subscribers. With Redis enabled, that path is mostly inert; the main app subscribes via RedisSubscriber instead.

The FanoutHub is still useful for tests and for the rare case where someone runs llm-bawt without Redis. Both paths produce the same AgentEvents, so the consumers don't care which is wired up.

09 Token accounting passes through too.

Each backend reports per-turn token usage differently. Claude Code's SDK exposes a ResultMessage with usage + modelUsage. Codex doesn't surface cost at all. The OpenAI native loop captures usage from the upstream stream's final chunk. All of them land on ASSISTANT_DONE.token_usage as the same dict shape:

{
  "input_tokens": 1247,
  "cache_read_tokens": 892,
  "cache_creation_tokens": 0,
  "output_tokens": 318,
  "context_window": 200000,
  "total_cost_usd": null
}

The frontend reads it off the turn_complete sidecar event and renders the pill in the lower-right of the assistant bubble. total_cost_usd is None for Codex turns because the upstream doesn't report it — that's a known asymmetry.

10 Key files.

src/agent_bridge/events.py

The shared event type. AgentEvent dataclass, AgentEventKind enum, deterministic event_id synthesizer. Every bridge imports this directly.

src/agent_bridge/publisher.py

Redis Streams publisher. Defines the four stream key prefixes (agent:events:, agent:run:, agent:commands, events:) and the maxlen caps. Stamps the provider field at publish time.

src/agent_bridge/subscriber.py

Consumer-side. Reads per-session and per-request streams, replays gaps from Postgres on subscriber attach.

src/agent_bridge/store.py

Postgres event store. agent_events, agent_session_state, agent_runs. Dedupes by event_dedupe_key, tracks per-session cursors for replay.

src/llm_bawt/service/chat_streaming.py

Where everything converges. The ChatStreamingMixin bridges native-loop streaming, OpenClaw bridge streaming, and the OpenAI chat.completion.chunk wire format. ~1,500 lines; the longest single file in the streaming path.

src/claude_code_bridge/bridge.py

Claude Code translation layer. Subscribes to agent:commands with backend=claude-code, drives the Agent SDK, maps each SDK item to an AgentEvent.

src/codex_bridge/bridge.py

Codex translation layer. Same role for Codex. Aliases Codex's native tool types (commandExecution, fileChange, etc.) to Claude tool names so the existing UI cards render them.

bawthub/frontend/src/app/chat/useUnifiedEventStream.ts

Frontend SSE consumer. One persistent EventSource per logged-in user, watchdog-reconnects on visibility/focus/online/idle. Dispatches tool_event and turn_complete through a late-bound handler table so HMR edits land without forcing a reconnect.

PreviousArchitecture overview NextData + schema

Validated against main on 2026-05-13 Source: llm-bawt + bawthub