agents · task system

Plan, dispatch, review. Repeat.

Agents in BawtHub don't live in a separate task tracker. The same Next.js app that renders chat also owns a full project / task / step model — and every operation on it is exposed twice: once as a REST endpoint for the UI, once as an MCP tool for the agents themselves. A bot can plan its own follow-up work, queue dependent tasks, mark its own steps complete, and hand off to another bot — all without a human round-trip.

Storage: Postgres + Prisma Dispatch: POST /api/agents/tasks/[id]/dispatch → after() background job Surface parity: every UI action has an MCP twin

01 The data model.

Three nouns: projects, tasks, steps. One verb that matters: dispatch. Everything else — activity entries, dependencies, attachments, cron schedules — hangs off those.

Agent task graph · per-bot, per-project

Project

AgentProjectname + color + iconcontextPromptdefault agentBotId

Task

AgentTaskshortId (TASK-216)status · prioritymodelIdresponsedependsOn[]

Step

AgentSteporderIndextypestatusoutput

Activity

AgentActivitytypeactorType (user/bot)meta JSONappend-only

Tasks carry a human-friendly shortId (e.g. TASK-216) for use in chat, plus a UUID for foreign-key joins. Steps are ordered by orderIndex and typed: PLAN, READ_FILE, EDIT_FILE, CREATE_FILE, DELETE_FILE, RUN_COMMAND, SEARCH, ASK_USER, REVIEW. The type is a hint — the agent picks the actual tool to use at execution time.

02 The status machine.

A task moves through a small set of statuses. Two transitions are gated — IN_PROGRESS → REVIEW is set by the executing agent when it claims work is done; REVIEW → COMPLETED is set by a human, never by a bot. That single rule is what keeps the system honest.

Status	Set by	Meaning
`QUEUED`	creator	Created, not yet planned or started.
`PLANNING`	plan-dispatcher	An agent is writing or refining the spec + steps.
`REFINED`	planner	Plan written; awaiting execute dispatch.
`IN_PROGRESS`	dispatcher (auto)	The dispatch route flipped this before sending the prompt.
`REVIEW`	executing agent	"I'm done. Human, look at this."
`COMPLETED`	human only	Signed off. Counts toward project progress.
`FAILED`	agent or dispatcher	Hard error; `response` holds the explanation.
`CANCELLED`	human	Won't do; kept for audit.

✦

Only humans mark COMPLETED.

The tasks_update MCP tool's docstring spells this out: IMPORTANT: Set status to REVIEW when done - only humans mark COMPLETED. Nothing physically prevents an agent from sending status="COMPLETED" — it's a norm, not a permission check — but it's the norm that keeps the review queue meaningful.

03 Dispatch: clean-context handoff.

When a human (or another bot) clicks Dispatch, the frontend hits POST /api/agents/tasks/[id]/dispatch. That route does four things, in this order:

Mark the task IN_PROGRESS in Postgres and pre-fill modelId from the assigned bot's default model — the human doesn't have to specify which model is doing the work.
Touch the parent project's updatedAt so it bubbles to the top of the sidebar.
Record an AgentActivity row of type task.dispatched with actorType = user or bot.
Return {ok: true, status: "dispatched"} immediately. The actual LLM call is deferred to Next's after() background hook so the HTTP response is never blocked on a 30-second agent turn.

Inside the after() callback, the route fetches the full task with its project and ordered steps, renders the agents.task_execution prompt template (pulled live from the llm-bawt prompt store, cached 5 minutes), and POSTs to /v1/chat/completions with extract_memory: false and augment_memory: false. Task work doesn't pollute the bot's conversational memory.

The execution prompt is the interesting part. It includes the task's markdown context (title, description, project context, ordered checklist of steps with their UUIDs), then injects:

▸

The execution contract, in plain English.

1. Start: Set task status to IN_PROGRESS immediately. modelId is auto-filled. 2. Per step: Before starting each step, set it to RUNNING. When done, set it to COMPLETED with a brief output. If it fails, FAILED with the error. 3. Finish: When all work is done, write a summary into the task response field and set status to REVIEW. If the task cannot be completed, set status to FAILED.

The agent is told to update its own task via the same REST endpoints the UI uses — PATCH /api/agents/tasks/[id] and PATCH /api/agents/tasks/[id]/steps/[stepId]. From its perspective there's nothing magic about the dashboard; it just makes the same HTTP calls a human would.

04 What the human sees.

BawtHub agents dashboard showing TASK-216, 215, 214 in the queue and an agent bots panel listing Loopy, Snark, Caid, Vex, Byte, Codex

/agents — the dispatch surface. Left: tasks grouped by project, status pills colored by stage. Right: the bot roster, each agent showing its current load. Selecting a task opens a panel with BotDispatchPanel — pick a bot, hit Plan or Execute, the task flips to PLANNING or IN_PROGRESS in real time as the dispatched bot updates its own row.

The frontend pieces — SortableTaskList, TaskRow, BotDispatchPanel, PlanDispatchButton, DispatchNoteDialog — all live under src/app/(app)/(dashboard)/agents/_components/. The drag-to-reorder uses POST /api/agents/tasks/reorder. Promote-to-project is a one-click action that wraps a task as its own project when scope creeps.

05 The MCP surface.

Every operation the UI can perform on tasks is also exposed as an MCP tool, registered in mcp_server/task_tools.py. Agents call these directly. The tools are thin httpx wrappers over the same /api/agents/* REST endpoints — X-Agent-Bot-Id is passed for activity attribution, and errors are wrapped as {error: ..., status: ...} dicts instead of raised exceptions, because LLMs handle failure better as data.

Tool	Verb	Use
`tasks_list`	GET	Filter by status, project, search query. Sorted by recency.
`tasks_get`	GET	Full task by UUID or shortId.
`tasks_get_context`	GET (derived)	Markdown briefing — title, description, dep list, step checklist, project context. Drop into prompt.
`tasks_create`	POST	Queue a new task; optionally seed steps.
`tasks_update`	PATCH	Status, response, modelId, title, description, priority, planned, projectId, agentBotId.
`tasks_delete`	DELETE	Hard delete. Docstring nudges agents toward `CANCELLED` instead.
`tasks_add_dependency`	POST	Cycles rejected server-side.
`tasks_remove_dependency`	DELETE	Unblock.
`tasks_promote`	POST	Promote task to its own project (title becomes project name, description becomes context).
`tasks_regenerate`	POST	Server-side LLM rewrite of title + steps. Docstring explicitly warns agents off: RARELY USEFUL FOR AGENTS — you are already an LLM and can write better steps yourself.
`steps_add` / `steps_update` / `steps_delete`	POST/PATCH/DELETE	Per-step lifecycle. Agents call `steps_update` as they work — `RUNNING` on entry, `COMPLETED`/`FAILED` on exit, with an `output` summary.
`projects_*` family	CRUD	Same shape: `list`, `get`, `create`, `update`, `delete`, plus `projects_get_context` for a markdown-only briefing.
`activity_get`	GET	Recent activity entries. Filterable by task or project. The audit trail.

06 Task context as a markdown briefing.

The tasks_get_context tool builds a single markdown document agents can paste into their reasoning at the top of a turn. It's deliberately not a JSON blob — the agent doesn't need to parse it. It looks like this:

# TASK-216 — Wire up Codex tool-result diffs

**Status:** IN_PROGRESS  **Priority:** HIGH
**Assigned to:** caid  **Model:** claude-sonnet-4-6

## Description
The Codex bridge currently sends fileChange items without diff content...

## Dependencies
- ✅ TASK-212 — Provider-aware tool rendering (COMPLETED)

## Steps
- [x] Inspect codex item shape (READ_FILE)
- [~] Map file_change → ClaudeToolCallCard (EDIT_FILE)
- [ ] Add provider gate to FileChangeBody (EDIT_FILE)

## Project: llm-bawt
### Project Context
This is the llm-bawt repo. Run `make restart` after Python changes...

The checkbox states [x], [~], [ ], [!], [-] map to COMPLETED, RUNNING, PENDING, FAILED, SKIPPED. The agent updates these via steps_update as it works; a parallel viewer in the human dashboard re-fetches the task and shows the same row state. Both sides are watching the same Postgres rows.

07 Two-phase dispatch: plan, then execute.

For larger tasks the UI offers two dispatch buttons. Plan sends the task to a bot with a different prompt template — agents.task_planning — that asks the bot to write the spec and seed an ordered step list, but not to do the work. Status moves QUEUED → PLANNING → REFINED. Execute dispatches the same or a different bot against the now-refined plan.

This split lets a fast-thinking bot (e.g. Snark or Loopy) plan, and a code-capable agent (Caid for code; Vex for security audits) execute. The planner doesn't have to be a coding agent, and the executor doesn't have to write the spec from a one-liner.

08 Cron and scheduled work.

The /api/agents/cron family — CronCreate, CronList, CronDelete — lets agents schedule recurring task creation. A common pattern is a daily 6am job: create a task titled "morning brief" assigned to snark with the description "summarize overnight Postgres logs". The cron row holds a crontab expression plus a task template; on each fire it materializes a fresh task in QUEUED and either auto-dispatches or waits for a human nudge depending on the row's autoDispatch flag.

09 Activity as the audit trail.

Every mutating operation — task.created, task.dispatched, task.status_changed, step.completed, project.updated — appends a row to AgentActivity. The row carries actorType (user or bot), actorId (user email or bot slug), and a free-form meta JSON for type-specific payload. The frontend's chat-side AgentActivityRow component renders these inline in the conversation when an agent touches a task while talking to you.

The activity_get MCP tool lets agents look up what they (or other agents) did recently — useful for follow-up tasks: List the last 10 things caid did on TASK-216 returns the actual mutation history, not a summary.

10 Key files.

llm-bawt/src/llm_bawt/mcp_server/task_tools.py

The MCP surface. ~930 lines. tasks_*, steps_*, projects_*, activity_get. Thin httpx wrappers over the BawtHub REST API; X-Agent-Bot-Id for attribution.

bawthub/frontend/src/app/api/agents/tasks/[id]/dispatch/route.ts

The dispatch route. Flips status, records activity, then after()-defers the actual /v1/chat/completions call. Uses cached agents.task_execution prompt template.

bawthub/frontend/src/app/(app)/(dashboard)/agents/_components/

The UI. BotDispatchPanel, SortableTaskList, TaskRow, TaskCreateDialog, PlanDispatchButton, DispatchNoteDialog, TaskStatusIcon.

bawthub/frontend/src/app/agents/taskDispatchPrompt.ts

Prompt builders. Shared between client-side dispatch (from chat) and server-side dispatch route. Single source of truth for the execution prompt shape.

bawthub/prisma/schema.prisma

The data model. AgentProject, AgentTask, AgentStep, AgentActivity, AgentCron. Foreign keys with cascade deletes only on steps; tasks survive project deletion as unassigned.

bawthub/frontend/src/lib/agentActivity.ts

Activity helpers. recordActivity, resolveActor — used by every mutating route to stamp who did what.

PreviousOverview NextClaude Code bridge

Validated against main on 2026-05-13 Source: llm-bawt agent backends