BawtHub · overview

The face of the bawts.

BawtHub is everything that isn't the LLM. It's the web UI you talk to, the voice pipeline that turns microphone bytes into transcripts and TTS audio back into your speakers, the 3D avatar that mouths the words, and the dashboards for managing memory, bots, agents, Unraid, and a thousand small operational details. llm-bawt does the thinking. BawtHub is the body.

Frontend: Next.js 16.1.6 · React 19.2 · Tailwind 4.2 Backend: Python 3.12 · FastAPI · fastrtc 0.0.23 Front door: Traefik on :80

01 Two services, one origin.

BawtHub is split across two processes that share a hostname. A Next.js app handles every page, every REST proxy, and every Server-Sent Event stream to llm-bawt. A Python FastAPI service handles the realtime voice pipeline — microphone WebSocket in, Opus audio WebSocket out — plus a few admin endpoints that need direct access to the GPU services (TTS module switching, container restarts, Home Assistant pulls). Traefik routes between them on a per-path basis.

Runtime topology · single host, five containers

Edge

traefik :80priority-based path routing12+ host aliases

Web

frontend :3000Next.js 16 + HMR//api/chat/*/api/unraid/*/api/agents/*/api/avatar/*

Voice

backend :80FastAPI + fastrtc/v1/ws/v1/health/v1/tts/*

GPU

stt :8090tts :8089Moshi · Kyutaimsgpack/ws

Upstream

llm-bawt :8642separate repoOpenAI-compatSSE turns + tool events

The frontend is the orchestrator — every chat message, every memory query, every agent task dispatch goes through a Next.js API route that proxies to llm-bawt. The Python backend exists for one reason: realtime audio doesn't belong in a Node.js process. WebSockets to STT and TTS, Opus codec handling, and the turn-by-turn voice state machine all live in Python where fastrtc, sphn, and the Moshi msgpack protocol are first-class.

02 Traefik does the splitting.

The compose stack runs a single Traefik instance on port 80. Path rules send /api/chat/*, /api/unraid/*, /api/agents/*, /api/avatar/*, /api/notifications/*, /api/preferences/*, /api/llm/*, /api/docker/*, /api/bot-colors/*, /api/camera-presets/*, /api/uploads/*, /api/media/*, and /api/clog to the Next.js container with router priority 200. A backend catch-all on PathPrefix(/api) at priority 100 mops up everything else — the voice WS, TTS admin, health checks — after stripping the /api/ prefix so the FastAPI app sees clean paths.

✦

Adding a new /api/ route is a Traefik decision, not a Next decision.

If you forget to add a new prefix to the frontend's priority-200 router, the Python backend's catch-all will swallow the request and you'll get a confused 404 from the wrong service. The compose file has a long banner comment about this. docker inspect bawthub-frontend-1 | grep traefik.http.routers.frontend- is the verification step.

The same container serves a dozen host aliases — echo.ferreri.us, echo.zenoran.com, app.bawthub.com, and matching echo.lan.* + dev.echo.lan.* internal names. There's no host-based service selection; the router is host-listed but path-driven. An optional frontend-prod snapshot container behind the snapshot compose profile serves snapshot.echo.lan.* with a baked image when you need a stable build to A/B against the live HMR container.

03 What the frontend looks like.

The Next.js app is organized by feature, not by component type. Almost everything lives under src/app/ using App Router conventions. The two main route groups split the layout in half:

Group	Layout	Pages
`(app)/(dashboard)`	Sidebar nav, chrome, sub-nav strip	Home tiles, tools/, agents/, docker, studio, unraid
`(app)/(fullscreen)`	No chrome — edge to edge	chat, voice, avatar

Beyond pages, the top level of src/app/ is a flat surface of orchestrator components (BawtHub.tsx, AnimatedOrb.tsx, AvatarViewer.tsx, FloatingTranscript.tsx), Zustand stores (useAppStore.ts, useAvatarStore.ts), and audio hooks (useAudioProcessor.ts, useRealtimeAudioOutput.ts, useSpeechRecognition.ts). The chat surface gets its own folder with the big ChatUI.tsx (3,500+ lines), a per-bot useChatStore, and a unified SSE event stream hook.

04 State, in three categories.

BawtHub doesn't pick one state pattern. It uses three, deliberately:

Zustand for cross-page selections (selected bot, user, voice id, voice provider, avatar model) and for the per-bot chat state. Persisted to localStorage under bawthub-app-settings so the bot you were talking to is still selected on hard reload.
TanStack React Query 5 for server state — bot lists, model registries, memory dashboards, turn logs. Cache-driven, with stale-while-revalidate semantics.
Prisma 7 + PostgreSQL for persistent UI state — Unraid container groups, avatar settings, bone mappings, agent projects/tasks, notifications, user preferences, bot color overrides. This is the only database BawtHub owns directly; llm-bawt's Postgres is a separate logical store (and may be a separate physical database).

A PreferencesContext reconciles the persisted Zustand state with the server preferences on first load, treating the server as authoritative but never blocking the UI on the async fetch.

05 What the Python backend does.

The Python service is small by design — about 1,500 lines in main_websocket.py plus an 885-line handler.py that owns the voice state machine. Its job is:

Accept the browser's microphone WebSocket. Decode Opus into 24 kHz float frames.
Stream those frames to the Kyutai Moshi STT container over msgpack/WebSocket. Read back word + pause-prediction messages.
When the STT signals a pause (and a configurable VAD threshold confirms it), flush the audio buffer, hand the transcript to llm-bawt's chat completions endpoint, and stream the response word-by-word.
Forward each word to a TTS provider (Moshi, Azure, xAI Grok, or Kokoro adapter — chosen by voice ID). Stream the audio chunks back to the browser as Opus frames over the same WebSocket.
Handle interruptions: if the STT-VAD detects user speech while the bot is talking, cancel the LLM stream and stop the TTS mid-word.

It also exposes admin endpoints — /v1/voices, /v1/tts/admin/config, /v1/tts/admin/restart, /v1/tts/preview, /v1/health — that the frontend's TTS admin page calls. Container restarts go through a small shell script (restart_tts_container.sh) that talks to the mounted Docker socket so the backend can recycle the TTS worker after a module-type switch.

06 How it consumes llm-bawt.

llm-bawt is the LLM orchestration layer — chat completions, memory, tools, agent bridges, bot personalities. It runs in its own Docker stack on port 8642, exposes an OpenAI-compatible API at /v1/*, and is reached from both halves of BawtHub via BAWTHUB_LLM_URL=http://host.docker.internal:8642.

frontend/src/app/api/chat/*

The proxy floor. Every chat-flavored endpoint — bots, models, history, memory, summaries, tool-calls, turn-logs — has a thin Next.js route that forwards to llm-bawt's /v1/*. Headers, auth, and SSE pass-through happen here. The proxy/[...path] catch-all handles long-tail endpoints not worth a dedicated route file.

bawthub/llm/llm_utils.py

The voice path. The Python backend talks to llm-bawt as a stateless OpenAI client and streams word-by-word for low TTS latency. StatelessLLMStream + rechunk_to_words are the relevant helpers.

frontend/src/app/chat/useUnifiedEventStream.ts

The SSE side. A single Server-Sent Events subscription per bot consumes tool_start / tool_end / turn_complete events from llm-bawt. Used by the chat UI to render live tool activity inline as agent bots execute steps.

There's nothing that talks to llm-bawt's Postgres directly. Memory writes, summary rebuilds, embedding regeneration — all go through MCP-style HTTP endpoints. This is a deliberate firewall: BawtHub knows nothing about the schema in {bot_id}_memories or profile_attributes, and it can't accidentally corrupt them.

07 Deployment is just file edits.

The frontend container bind-mounts ./frontend from the host and runs next dev --webpack. Editing a .ts, .tsx, or .css file on disk triggers HMR; users see the change on the next render. There is no separate build-and-deploy step for the live site — git pull on the host is the release.

⚠

Turbopack is intentionally off.

package.json pins next dev --webpack. Turbopack broke /public/ Web Workers + AudioWorklets (the Opus decoder and the audio-output-processor), which killed voice audio streaming. The webpack-based dev server is the supported path.

The Python backend is hot-reloading too: ./bawthub is bind-mounted, and the container runs fastapi run in dev-mode with autoreload. The Moshi STT/TTS containers are static — they don't reload, but they almost never change. make rebuild-ui exists for the rare case where package.json changes warrant a fresh image. make snapshot-up brings up the baked snapshot build on a sibling hostname when you want a stable comparison.

08 External dependencies.

The compose stack is intentionally narrow: Traefik, frontend, backend, TTS, STT, plus an optional NocoDB instance. Everything else BawtHub talks to is external:

Service	Where	Why
llm-bawt	`host.docker.internal:8642`	LLM completions, memory, tools, bots — separate repo, separate lifecycle
PostgreSQL	Operator-provided	Prisma schema for UI persistence — not bundled, you bring your own
Home Assistant	Configured via env	Weather widget on the home page, smart-home context for bots
Unraid	Configured via env	Container management dashboard for the home server
HuggingFace	HF_TOKEN	Pulls Moshi STT + TTS weights on first run (~20 GB)

09 Where to go from here.

Frontend →

Next.js 16, App Router, Zustand stores, the SSE event bus, and how the chat UI tracks 3,500 lines of state.

Voice →

The fastrtc handler, Moshi STT/TTS msgpack, the multi-provider TTS registry, and how a word becomes audio in under 200 ms.

Avatar →

VRM, GLB, and FBX in a unified three.js scene with audio-amplitude lip sync and procedural idle motion.

Surfaces →

A tour of every page — chat, voice, agents, tools, unraid — with the real bot roster doing real work.

PreviousBawtHub NextChat surface

Validated against main on 2026-05-13 Source: bawthub repo (private)