llm-bawt · adapters

Cleanup, per model.

Different base models have different output habits. Some leak chat-template tokens. Some emit BBCode formatting nobody asked for. Some hallucinate fake tool observations after a real tool call. The src/llm_bawt/adapters/ module is a small registry of per-model stop-sequence sets and output sanitizers. The stop sequences are added to each non-native query in the tool loop; the sanitizers run after streaming completes but before the format handler's own sanitization. There are three concrete adapters today: default, dolphin, pygmalion. The auto-detection is intentionally narrow.

Path: src/llm_bawt/adapters/
Adapters: 3 (default, dolphin, pygmalion)
Total: ~210 lines including registry

01 The contract.

adapters/base.py defines ModelAdapter in 42 lines. It's an abstract class with four hookable methods, all with sensible defaults:

Method	Default	Purpose
`get_stop_sequences() -> list[str]`	`[]`	Model-specific stops, concatenated with the tool-format handler's stops in the tool loop.
`clean_output(response: str) -> str`	passthrough	Strip model-specific artifacts. Runs after streaming completes, before the format handler's `sanitize_response`.
`supports_system_role() -> bool`	`True`	Whether the model natively understands a `system` role.
`transform_messages(messages) -> list`	passthrough	Apply message-list rewrites — e.g. merging the system message into the first user message for models without system support.

Adapters never re-run inference and never touch token logits — they're string-level cleanup. The right place to fix a chat template is the model definition's chat_format field, not here. Adapters exist for the artifacts that the chat template alone can't prevent.
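The table above can be read as the following minimal sketch. It is a reconstruction from the documented signatures and defaults, not a copy of adapters/base.py; the `name = "base"` value and the `TrailingTokenAdapter` subclass are hypothetical and exist only to show an override.

```python
from abc import ABC


class ModelAdapter(ABC):
    """Sketch of the adapter contract: four hooks, each with a default."""

    name = "base"  # assumed value; the real base class may differ

    def get_stop_sequences(self) -> list[str]:
        # No model-specific stop sequences by default.
        return []

    def clean_output(self, response: str) -> str:
        # Passthrough by default: string in, same string out.
        return response

    def supports_system_role(self) -> bool:
        # Most models understand a `system` role natively.
        return True

    def transform_messages(self, messages: list) -> list:
        # Passthrough by default: no message-list rewriting.
        return messages


class TrailingTokenAdapter(ModelAdapter):
    """Hypothetical adapter: strips one leaked end-of-sequence token."""

    name = "trailing-token"

    def clean_output(self, response: str) -> str:
        # String-level cleanup only; nothing here touches inference.
        return response.removesuffix("</s>").rstrip()
```

A subclass overrides only the hooks it needs; everything else falls back to the defaults.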

02 The registry and auto-detection.

adapters/registry.py is 68 lines. Adapters register themselves at import time via _register_builtins() — currently default, pygmalion (with mythomax as an alias for the same class), and dolphin.

Resolution order in get_adapter(model_alias, model_def):

1. Explicit adapter: "pygmalion" field in the model definition. If set and the name exists, use it.
2. Auto-detection from the model alias or from model_def.repo_id. The detector is small: it looks for the substrings pygmalion, mytho, mythomax, lewd in either source. If matched, it returns "pygmalion".
3. Otherwise: DefaultAdapter().
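The resolution order can be sketched roughly as below. This is an illustration under assumptions, not the code in adapters/registry.py: the model definition is modelled as a plain dict, matching is assumed to be case-insensitive, an unknown explicit name is assumed to fall through to auto-detection, and the adapter classes are empty stand-ins.

```python
from typing import Optional


class ModelAdapter:
    name = "base"


class DefaultAdapter(ModelAdapter):
    name = "default"


class PygmalionAdapter(ModelAdapter):
    name = "pygmalion"


class DolphinAdapter(ModelAdapter):
    name = "dolphin"


# Name -> class table; "mythomax" is an alias for the Pygmalion class.
_ADAPTERS = {
    "default": DefaultAdapter,
    "pygmalion": PygmalionAdapter,
    "mythomax": PygmalionAdapter,
    "dolphin": DolphinAdapter,
}

_PYGMALION_HINTS = ("pygmalion", "mytho", "mythomax", "lewd")


def _auto_detect_adapter(model_alias: str, model_def: dict) -> Optional[str]:
    """Return an adapter name if the alias or repo id contains a known hint."""
    sources = (model_alias or "", str(model_def.get("repo_id") or ""))
    for source in sources:
        lowered = source.lower()  # case-insensitive match is an assumption
        if any(hint in lowered for hint in _PYGMALION_HINTS):
            return "pygmalion"
    return None


def get_adapter(model_alias: str, model_def: dict) -> ModelAdapter:
    # 1. An explicit `adapter` field wins when it names a registered adapter.
    explicit = model_def.get("adapter")
    if explicit and explicit in _ADAPTERS:
        return _ADAPTERS[explicit]()
    # 2. Narrow auto-detection from the alias or repo id.
    detected = _auto_detect_adapter(model_alias, model_def)
    if detected:
        return _ADAPTERS[detected]()
    # 3. Everything else gets the no-op adapter.
    return DefaultAdapter()
```

Note that an alias such as "my-dolphin" resolves to the default adapter in this sketch, because no Dolphin hint is part of the detector.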

Auto-detection is deliberately conservative. Dolphin models, for example, are not auto-detected — they have to be configured explicitly. The Dolphin adapter only adds a safety-net cleanup for hallucinated observations, and even that is rarely needed once the ReAct stop sequences are correctly applied. Auto-detecting it would risk silently masking the real problem (a bad chat template).

03 Default — no-op for well-behaved models.

adapters/default.py is six lines. It's a named no-op:

from .base import ModelAdapter

class DefaultAdapter(ModelAdapter):
    """Default no-op adapter for well-behaved models."""
    name = "default"

OpenAI, Grok, Claude (via the agent bridge), and well-templated GGUF models get this one. It exists for symmetry: every LLMClient always has an adapter, so no code path has to deal with None.

04 Pygmalion — BBCode and role-marker stripping.

adapters/pygmalion.py is 64 lines. It targets Pygmalion-family models (MythoMax, character-tuned Llama derivatives, the "Lewd" series). These models pick up a fistful of conversational and formatting habits from their training data that don't belong in a chat-completion response.

Stop sequences: Pygmalion models love to roll into a second turn unprompted. The adapter adds [HUMAN], [/HUMAN], [INST], [/INST], ### Instruction:, ### Human:, and <|im_start|>user as stops, on top of whatever the tool-format handler is using. The moment the model starts a fake turn, generation halts.

Output cleaning in clean_output runs four passes:

1. Paired role-block removal. [HUMAN]fake message[/HUMAN] and [INST]fake[/INST] blocks, including their content, are scrubbed entirely.
2. Unpaired role-marker truncation. If [INST], [HUMAN], [/INST], [/HUMAN], ### Instruction:, ### Human:, or <|im_start|>user appears without a matching closer, everything from that marker to the end is dropped.
3. BBCode strip. [FONT=Arial], [/FONT], [B], [/B], and similar tags are matched by regex (\[\w+(?:=[^\]]+)?\] and \[/\w+\]) and removed. The content between them is preserved.
4. Whitespace normalization. \n{3,} collapses to \n\n; trailing whitespace is trimmed.
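The four passes can be sketched as a single function. The two BBCode patterns and the whitespace pattern are the ones quoted above; the paired-block pattern, the case-sensitive matching, and the "cut at the earliest marker" strategy are assumptions made for illustration and may differ from adapters/pygmalion.py.

```python
import re

# Pass 1: a role block and everything inside it (pattern is an assumption).
_PAIRED_BLOCKS = re.compile(r"\[(HUMAN|INST)\].*?\[/\1\]", re.DOTALL)

# Pass 2: markers that signal the model has started a fake turn.
_UNPAIRED_MARKERS = (
    "[INST]", "[HUMAN]", "[/INST]", "[/HUMAN]",
    "### Instruction:", "### Human:", "<|im_start|>user",
)

# Pass 3: BBCode open and close tags, as quoted in the text.
_BBCODE_OPEN = re.compile(r"\[\w+(?:=[^\]]+)?\]")
_BBCODE_CLOSE = re.compile(r"\[/\w+\]")


def clean_output(response: str) -> str:
    # 1. Remove paired role blocks, content included.
    text = _PAIRED_BLOCKS.sub("", response)

    # 2. Truncate at the earliest leftover role marker, if any.
    positions = [text.find(marker) for marker in _UNPAIRED_MARKERS]
    found = [pos for pos in positions if pos != -1]
    if found:
        text = text[: min(found)]

    # 3. Strip BBCode tags but keep the text between them.
    text = _BBCODE_OPEN.sub("", text)
    text = _BBCODE_CLOSE.sub("", text)

    # 4. Collapse runs of blank lines and trim trailing whitespace.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.rstrip()
```

The order matters in this sketch: the BBCode pattern would also match a bare [INST], so the role-marker passes have to run first to remove the fake turn rather than just its tag.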

When the adapter actually changes the output, it logs the byte count removed at DEBUG level — useful for spotting models whose template is misconfigured (lots of cleanup per turn means the chat format should be fixed upstream).

05 Dolphin — hallucinated observation safety net.

adapters/dolphin.py is 36 lines. Dolphin models (Dolphin3.0-Llama3.1, Dolphin-Qwen) tend to follow up a real tool call with an invented Observation: block — they "complete the ReAct pattern" rhetorically instead of waiting for the actual observation to arrive in the next message. The ReAct format handler stops on Observation:, which catches most of these. The adapter is a backstop for when one slips through:

obs_match = re.search(r"\n+Observation\s*:", response, re.IGNORECASE)
if obs_match:
    response = response[: obs_match.start()]

Stop sequences are not added — the format handler's stops already cover Dolphin's surface. The adapter is purely a regex-based safety net for the tail.
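To see what the backstop does and does not catch, here is the same regex wrapped in a standalone function. The function name is hypothetical; the body is the excerpt shown above with a return added.

```python
import re


def truncate_hallucinated_observation(response: str) -> str:
    """Drop everything from the first newline-prefixed 'Observation:' onward."""
    obs_match = re.search(r"\n+Observation\s*:", response, re.IGNORECASE)
    if obs_match:
        response = response[: obs_match.start()]
    return response


reply = "Action: search\nAction Input: cats\n\nObservation: 3 results found"
trimmed = truncate_hallucinated_observation(reply)
```

Because the pattern requires at least one preceding newline, a response that begins with Observation: is left untouched by this pass.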

06 Where adapters fit in the call chain.

Adapters are applied at three points in the request flow:

Adapter touch points

1. BaseLLMBawt.__init__ calls get_adapter(model_alias, model_def) and attaches the result to the LLMBawt instance. One adapter per client lifetime.

2. ToolLoop._query_llm concatenates adapter.get_stop_sequences() onto the format handler's stops before each non-native query. The model sees combined stops.

3. Streaming completes → adapter.clean_output(response) → format handler's sanitize_response → shared strip_tool_protocol_leakage. Three cleanup layers, applied in that order.
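Touch points 2 and 3 amount to a list concatenation and an ordered chain of string functions. The sketch below shows only that shape: `build_stops` and `run_cleanup` are hypothetical helper names, and the three cleanup bodies are toy stand-ins that borrow the real function names without reproducing their logic.

```python
from typing import Callable


def build_stops(handler_stops: list[str], adapter_stops: list[str]) -> list[str]:
    """Touch point 2: adapter stops are appended to the format handler's stops."""
    return [*handler_stops, *adapter_stops]


def run_cleanup(response: str, stages: list[Callable[[str], str]]) -> str:
    """Touch point 3: each layer receives the previous layer's output."""
    for stage in stages:
        response = stage(response)
    return response


# Toy stand-ins for the three layers (bodies are illustrative only).
def adapter_clean_output(text: str) -> str:
    return text.replace("[B]", "").replace("[/B]", "")


def sanitize_response(text: str) -> str:
    return text.split("\nObservation:")[0]


def strip_tool_protocol_leakage(text: str) -> str:
    return text.replace("Final Answer:", "").strip()


stops = build_stops(["Observation:"], ["[INST]", "### Human:"])
cleaned = run_cleanup(
    "Final Answer: [B]42[/B]\nObservation: made up",
    [adapter_clean_output, sanitize_response, strip_tool_protocol_leakage],
)
```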

transform_messages and supports_system_role are hooks for models that need pre-flight message munging. None of the shipped adapters use them today — every supported model handles system messages — but the seams are there for, e.g., older Llama-2 chat formats that needed the system message folded into the first user turn.
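As an illustration of what those two hooks are for, here is a sketch of a system-message merge. No shipped adapter does this; the function name, the dict-based message shape (role and content keys), and the blank-line separator are all assumptions.

```python
def merge_system_into_first_user(messages: list[dict]) -> list[dict]:
    """Fold all system messages into the first user message.

    Returns a new list; the input messages are not modified.
    """
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    rest = [dict(m) for m in messages if m["role"] != "system"]
    if not system_parts:
        return rest

    prefix = "\n\n".join(system_parts)
    for message in rest:
        if message["role"] == "user":
            message["content"] = f"{prefix}\n\n{message['content']}"
            return rest

    # No user turn to merge into: send the system text as a user turn.
    return [{"role": "user", "content": prefix}, *rest]
```

An adapter for such a model would return False from supports_system_role and call a helper like this from transform_messages.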

07 Adapters vs format-handler sanitization.

The two layers do similar-looking things and the boundary matters. Format handlers (tools/formats/) are tool-protocol aware — they know what ReAct or native or legacy-XML tool calls look like and how to strip their scaffolding. Adapters are model-output aware — they know that Pygmalion emits BBCode and Dolphin hallucinates observations.

A rule of thumb: a format handler's sanitization stays correct if you swap models within the same format. An adapter's cleaning stays correct if you swap formats within the same model. The two layers compose cleanly because they target different sources of noise.

08 Adding an adapter.

Three steps:

1. Create a class in a new file under adapters/ inheriting ModelAdapter. Set name = "your-adapter" and override clean_output / get_stop_sequences as needed.
2. Register it in adapters/registry._register_builtins(). If it ships in a separate package, the intended route is an entry-point group, which is not wired up yet but would be trivial to add.
3. Reference it from a model definition with adapter: "your-adapter", or extend _auto_detect_adapter() if it should apply automatically based on model alias / repo id.

The existing non-trivial adapters, at 36 and 64 lines, are a good size to aim for. Don't over-engineer: if it can be fixed in the chat template or via a stop sequence, do that first.

09 Key files.

adapters/base.py

ModelAdapter. 42 lines. Abstract base. Four hookable methods, all defaulted. Adapters are string-level cleanup, never logits.

adapters/default.py

DefaultAdapter. 6 lines. The no-op. Used by every model that doesn't need cleanup, which is most of them.

adapters/dolphin.py

DolphinAdapter. 36 lines. Truncates hallucinated Observation: blocks as a safety net behind the ReAct stop sequences. Not auto-detected; configure explicitly with adapter: "dolphin".

adapters/pygmalion.py

PygmalionAdapter. 64 lines. Strips paired and unpaired [HUMAN]/[INST]/###/<|im_start|> markers, BBCode formatting tags, and excess whitespace. Adds matching stop sequences. Auto-detected from pygmalion/mytho/mythomax/lewd substrings.

adapters/registry.py

Registry + auto-detection. 68 lines. _ADAPTERS name → class table. get_adapter(alias, def) with the explicit/auto/default resolution order. register_adapter for runtime registration.

adapters/__init__.py

Re-export surface. 7 lines. Exposes ModelAdapter, the three concrete adapters, and the registry functions.


Validated against main on 2026-05-13
Source: llm-bawt/src/llm_bawt/adapters