Reasoning Loop¶
The reasoning loop is an autonomous agent cycle that decomposes goals into plans, executes skills, evaluates results, and iterates until the goal is met. Unlike the Workflow Engine which runs pre-defined step graphs, the reasoning loop builds and adapts its plan at runtime.
Core Cycle¶
User Goal
↓
Prior Knowledge (Memory + Knowledge search)
↓
PLANNING ──→ LLM produces step plan
↓
┌─→ EXECUTE ──→ Invoke whitelisted skill
│ ↓
│ EVALUATE ──→ LLM scores result (rubric 1-5)
│ ↓
│ DECIDE ──→ continue | adjust | ask_user | done | stuck
│ ↓
│ ┌── continue ──→ next step ──┐
│ ├── adjust ────→ new plan ───┤
│ ├── ask_user ──→ pause ──────┤
│ ├── done ──────→ deliver ────┤
│ └── stuck ─────→ terminate ──┘
│ │
└────────────────────────────────┘
How It Works¶
The loop runs as a GenServer under a DynamicSupervisor. LLM calls run as Tasks so the GenServer stays responsive to user intervention (pause, abort, steer) during slow model inference.
Planning — The LLM receives the goal, available whitelisted skills with descriptions, and prior knowledge from Memory/Knowledge semantic search. It returns a JSON plan with ordered steps.
Execution — For each step, the LLM prepares concrete skill input, then the skill runs through the standard security stack: whitelist check, SkillRegistry resolve, capability token mint, SafeExecutor, CircuitBreaker, ContentSanitizer.
Evaluation — The LLM scores the skill output on four criteria (relevance, completeness, usability, goal progress) each 1-5. Quality is derived: good (avg >= 3.5), partial (>= 2.0), failed.
Decision — A mix of deterministic logic and LLM judgment. Obvious cases are handled without an LLM call:
| Condition | Decision | LLM call? |
|---|---|---|
| All steps done, last eval good | Force summary | No |
| Steps remaining, no failures | Continue | No |
| Consecutive failures at threshold | Stuck | No |
| Everything else | LLM decides | Yes |
Working Memory¶
A single string threaded through all four prompts. Each LLM response includes an updated working_memory field. This is the loop's train of thought — it prevents context fragmentation across phases.
Every 3 iterations, a dedicated LLM call compresses the working memory to essential facts, discarding stale process notes.
User Intervention¶
All interventions are real-time via PubSub — the GenServer processes them between phases.
| Action | Effect |
|---|---|
| Pause | Completes current task, then halts |
| Resume | Continues from where it paused |
| Steer | Free text injected into working memory with [USER GUIDANCE] prefix |
| Step override | Forces a specific skill + input, bypasses decision |
| Add context | Appended to working memory, persists across iterations |
| Abort | Kills running task, terminates session |
Plan Validation¶
Before execution begins, every plan step is validated deterministically:
- Skill name must be present (not nil or empty)
- Skill must exist in the configured whitelist
- Steps failing validation are rejected and planning retries with the error injected
This catches malformed LLM output before it reaches the execution phase.
Safety Mechanisms¶
| Mechanism | Default | Configurable |
|---|---|---|
| Max iterations | 15 | reasoning.max_iterations |
| Max LLM calls | 60 | reasoning.max_llm_calls |
| Time budget | ~300s per step | Proportional to plan size |
| Stuck threshold | 3 consecutive failures | reasoning.stuck_threshold |
| Confidence threshold | 0.7 | reasoning.done_confidence_threshold |
| Duplicate detection | 3x same | Automatic |
| Adjust oscillation | 3+ adjusts at high confidence | Forces final summary |
| Orphan cleanup | Boot sweep + terminate callback | Automatic |
Score Trend¶
The last 3 evaluation scores are averaged and the slope is injected into the decision prompt as "improving", "stable", or "DEGRADING". This gives the model a signal about whether the current approach is working.
Result Delivery¶
On completion:
- Result stored to Memory (
kind: :reasoning) with pgvector embedding - Optional Telegram/Discord notification via
reasoning.default_deliveryconfig - Skill outputs from each step are also embedded for future session context
Configuration¶
All settings are editable from the Admin UI config page.
| Key | Default | Description |
|---|---|---|
reasoning.enabled | true | Feature flag |
reasoning.llm_tier | local | LLM tier: local, light, medium, heavy |
reasoning.max_iterations | 15 | Max loop iterations |
reasoning.max_llm_calls | 60 | Max LLM calls per session |
reasoning.skill_whitelist | JSON array | Skills the loop may invoke |
reasoning.done_confidence_threshold | 0.7 | Min confidence to accept done |
reasoning.stuck_threshold | 3 | Consecutive failures before stuck |
reasoning.step_timeout_seconds | 120 | Per-skill execution timeout |
reasoning.max_plan_steps | 8 | Max steps in a plan |
reasoning.default_delivery | ["memory"] | Delivery channels on completion |
Prompt templates (prompts.reasoning.planning, .execution, .evaluation, .decision) are also editable at runtime. Leave empty to use defaults.
Audit Trail¶
Every phase is recorded in the reasoning_steps table:
- LLM prompt and raw response
- Skill name, input, and output
- Rubric scores and quality assessment
- Decision and confidence
- Working memory snapshot
- Duration in milliseconds
- Errors
Sessions are tracked in reasoning_sessions with goal, status, plan, final result, iteration count, and LLM call count.
Chat Page Integration¶
The chat page (/chat) operates in two modes:
- Chat mode — standard LLM conversation with memory context (unchanged)
- Reasoning mode — toggle to start a reasoning session with goal input, plan view, step timeline, intervention controls, and result panel
Limitations¶
- Output quality is model-dependent. Local models (7B-14B) hallucinate when source material is thin. Routing to a stronger tier via
reasoning.llm_tierimproves quality. - No parallel skill execution. Steps run sequentially.
- Working memory degrades over long sessions despite compression every 3 iterations.
- Evaluation is LLM-judged. Local models tend to rate their own output favorably. The deterministic pre-filter mitigates this for obvious cases.