llm-bawt · tools

Native when you can, text-parse when you must.

Bots call tools to search memory, fetch web pages, control smart-home devices, switch models, and search the web. src/llm_bawt/tools/ implements the multi-turn tool loop, two format handlers (native OpenAI function-calling and ReAct text parsing) plus a legacy XML fallback, a consolidated nine-tool catalog, a normalizer for the names models hallucinate, and a streaming variant that emits tool events as SSE.

Files: tools/ + tools/formats/ Total: ~4,400 lines Catalog: 9 consolidated tools (action-based)

01 The tool loop.

ToolLoop.run() in tools/loop.py is the heart of in-turn tool dispatch. The same loop handles both native and text-parsed tools — the difference is in where the tool calls come from in the model's response.

One iteration of the tool loop

1. Query the LLM with the current message list. Pass tool schema (native) or rely on the system-prompt tool catalog (ReAct). Stop sequences from format handler + adapter are combined.

2. Inspect the response. Dict with tool_calls? Native path. Text containing ReAct markers? Parse path. Plain text? Return as the final answer.

3. For each tool call: log it, run it through ToolExecutor, format the result via the active handler.

4. Append the assistant message (with truncation to prevent hallucinated continuations) and each tool result message. Continue to the next iteration.

5. Loop ends when the model produces a non-tool response, or hits MAX_TOOL_CALLS_PER_TURN (default 20, capped at 3 in HA-mode for device commands).

The loop tracks two parallel records:

tool_context — a list of dicts {tools_called: [...], results: "..."} used to build the history-persisted summary block (so the next turn sees a "Tool Results @ ..." system message).
tool_call_details — per-call dicts with name, arguments, result, and iteration. Used by debug logging and by the turn-log writer.

02 Format handlers.

tools/formats/ defines a small interface, ToolFormatHandler, with three handlers:

Handler	File	How tool calls are encoded
`NativeOpenAIFormatHandler`	`native_openai.py` (208 lines)	Standard OpenAI `tools` schema + `tool_calls` response field. JSON arguments.
`ReActFormatHandler`	`react.py` (530 lines)	Text format: `Thought: ... Action: tool_name Action Input: {json}`. Stop sequences trim the model after its tool call.
`LegacyXMLFormatHandler`	`xml_legacy.py` (48 lines)	Fallback only — `<tool_call>` / `<function_call>` tags. Detected and parsed when a model emits these despite being configured for another format.

The active handler is picked once per turn from config.get_tool_format(model_alias=..., model_def=...). OpenAI and Grok use native; local models (llama.cpp, vLLM, Ollama) use ReAct.

03 Why ReAct for local models.

⚠

Local-model native tool calling is unreliable for multi-turn loops.

llama-cpp-python's chatml-function-calling chat format works for one-shot tool calls but returns empty content after the first round-trip when a tool result is fed back. The same is partially true of some vLLM tokenizer configs. The ReAct text format sidesteps this entirely: every turn is plain text, the loop parses tool calls out of the text, and the model continues generating naturally with the observation injected as the next message. See ToolLoop._should_use_native_tools() — it explicitly returns False for react even when the client claims to support native.

The trade-off is that text-parsed tool calls are messier: models invent variant names, malform JSON, hallucinate observations, or trail off into commentary. react.py handles all of this with a fallback ladder.

04 ReAct parsing.

The ReAct handler looks for the standard markers (Thought:, Action:, Action Input:, Observation:) plus a set of alternative markers (# Tool:, Tool:, Function:) that some models use. Tool names are normalized through a 30-entry alias table — retrieve_conversation_history, search_memory, save_memory, google, etc. all map to canonical tool names so the executor doesn't fail on naming drift.

Arguments are JSON — but loose JSON. _try_fix_json walks a series of repair attempts: unbalanced braces, trailing commas, single quotes, unquoted keys. If json5 is installed it's tried as a last resort. If everything fails, the call is recorded with an error result and the model sees the failure in the next iteration.

Stop sequences from ReActFormatHandler.get_stop_sequences() include Observation: and a few variants — they cut the model off the moment it tries to hallucinate a tool result, before it can wander into fiction. Adapters like DolphinAdapter add their own model-specific stops that get concatenated.

05 Cross-format fallbacks inside the loop.

Even with a format picked, models go off-script. ToolLoop.run() has a short fallback ladder: if the configured handler finds no tool calls but the response contains <tool_call>, <function_call>, or action: + action input:, it instantiates the matching handler dynamically, parses again, and logs a warning. This catches models that "leak" a different format than was requested without failing the turn.

After a tool runs, the loop appends an assistant message and the tool result message. For ReAct, the assistant message is truncated at the end of the last detected tool-call's raw text — otherwise hallucinated continuations like "the result is X, so I'll also..." poison the next iteration. Native mode appends the message verbatim with tool_calls attached, since the SDK handles structuring.

06 The executor.

ToolExecutor in tools/executor.py (1,943 lines) is where the call actually runs. It owns references to every backend client (memory, profile, search, home, HA-native, news, web-fetch, model lifecycle, history) and a dispatch table mapping tool names to handler methods.

The consolidated catalog is intentionally small. Most "tools" are action-based — one tool with an action parameter that selects the operation:

Tool	Actions / parameters	Backend
`memory`	`search`, `store`, `update`, `delete`	`MemoryClient` → MCP server
`history`	`search`, `recent`, `forget`	History manager + memory client
`profile`	`get`, `set`, `delete`	`ProfileManager`
`self`	Bot personality reflection + trait development	`ProfileManager` (entity_type=BOT)
`search`	`web`, `news`, `reddit`	`SearchClient` (Brave / Tavily / DDGS / Reddit)
`web_fetch`	Fetch + extract a page	Crawl4AI
`home`	`query`, `get`, `set`	Home Assistant MCP
`model`	`list`, `current`, `switch`	`ModelLifecycleManager`
`time`	Current local time	—

Legacy single-purpose names (search_memories, store_memory, web_search, get_current_time, …) are kept in the dispatch table as aliases routing to the consolidated handlers. Existing prompts and fine-tunes that emit the old names still work.

A per-turn call counter (default max 20, configurable via MAX_TOOL_CALLS_PER_TURN) prevents runaway loops. HA-mode caps it at 3 — device commands almost never need more, and the cap stops a chatty model from spamming the smart-home graph.

07 Home Assistant native tools.

When the configured HomeAssistantNativeClient is initialized, llm-bawt also exposes HA's own catalog — HassTurnOn, HassTurnOff, HassLightSet, HassSetPosition, GetLiveContext, GetDateTime — converted into Tool objects via ha_tools_to_tool_definitions. These pass through the same format handler and executor, but the executor delegates to the HA MCP client rather than the consolidated home tool. The native catalog gives the model finer-grained control when configured, with extensive prompt guidance baked into the format handler about device-control semantics (cover open/close vs on/off, friendly names vs entity IDs, area/floor targeting).

08 Streaming tool events.

The non-streaming tool loop in loop.py is the simpler case — every iteration runs synchronously, then the final answer is returned. For SSE streaming, tools/streaming.py (352 lines) implements stream_with_tools:

The model's response is streamed token-by-token into a buffer.
The buffer is checked for tool-call markers as it grows. Until a decision threshold (~80 chars) it's not yet clear whether the response is text or a tool call.
If markers appear, the stream is consumed silently until the call completes, then the tool runs and emits tool_call / tool_result events to the SSE channel.
If the response is clearly text, buffered chunks flush to the SSE stream as content deltas and the model continues streaming normally.

This is how the BawtHub UI shows tool calls inline with the answer: the SSE event types tool_call and tool_result are intercepted by the frontend's chunk parser and rendered as their own card UI inside the assistant bubble. Frontends that don't care just ignore them — the standard content deltas still arrive in the right order.

09 Output sanitization.

Three layers of cleanup run before the final response is returned to the caller:

Adapter cleaning. Per the adapters page, model-specific quirks (Pygmalion BBCode, Dolphin hallucinated observations) are stripped. Runs after streaming, before format-handler sanitization.
Handler sanitization. Each handler implements sanitize_response — removes trailing tool markers, residual Observation: stubs, and any other format-specific scaffolding from the final answer.
Cross-format leakage. shared/output_sanitizer.strip_tool_protocol_leakage catches stray tool-protocol fragments that snuck through both layers — e.g. an OpenAI model that emitted a ReAct-style Thought: by mistake.

10 Key files.

tools/loop.py

ToolLoop. 550 lines. The multi-turn dispatch state machine. Native + ReAct branching, cross-format fallback ladder, per-iteration message assembly, response truncation for ReAct.

tools/executor.py

ToolExecutor. 1,943 lines — the biggest file in tools/. Owns every backend client; dispatch table for consolidated + legacy tool names; per-turn call counter; the per-action handler methods.

tools/definitions.py

Tool catalog. 792 lines. Tool/ToolParameter dataclasses; MEMORY_TOOL, HISTORY_TOOL, etc.; get_tools_list and get_tools_prompt for system-prompt rendering.

tools/parser.py

Shared parsing utilities. 361 lines. ToolCall dataclass, generic parse_tool_calls, result formatting helpers, the KNOWN_TOOLS set.

tools/streaming.py

stream_with_tools. 352 lines. The streaming variant of the loop with decision-threshold buffering and SSE event emission for tool calls/results.

tools/formats/native_openai.py

Native OpenAI handler. 208 lines. Emits the OpenAI tools schema; parses tool_calls from response; assembles the HA-native tool guidance block when those tools are present.

tools/formats/react.py

ReAct handler. 530 lines. Thought/Action/Action Input parsing, 30-entry tool-name alias table, loose-JSON repair ladder, stop sequences, response sanitization.

tools/formats/xml_legacy.py

Legacy XML. 48 lines. Fallback for <tool_call> / <function_call> markers that leak from older prompts.

PreviousMemory NextAgent backends

Validated against main on 2026-05-13 Source: llm-bawt/src/llm_bawt/tools