mc-multimodal-agent Local Implementation Insights
Target: mc-multimodal-agent
This implementation relies on a runtime-owned tool loop rather than LLM code generation. The LLM outputs structured JSON, and the runtime handles schema validation, execution, and memory. This aligns closely with our /probe architecture.
Core Takeaways
Post-Tool Observation
After specific actions (move, wait, craft), the agent loop immediately appends a new state observation (postToolState) to the tool result.
- For /probe: Extend our simple { tool, status } results. After an action (like move_to), automatically attach the new position, visible actors, and recent chat to the result so the LLM understands why it succeeded or failed without wasting a turn on observe.
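A minimal sketch of how a post-tool observation could be attached to a result. The ToolResult fields follow the { ok, status, text, data? } shape from the action items and postToolState is the name used above; the WorldSnapshot type, the withPostToolState helper, the list of observed tools, and the five-message chat cut-off are assumptions made for illustration, not part of mc-multimodal-agent or /probe.

```typescript
type JsonValue =
  | string | number | boolean | null
  | JsonValue[]
  | { [key: string]: JsonValue };

interface ToolResult {
  ok: boolean;
  status: string;
  text: string;
  data?: JsonValue;
}

// Hypothetical view of the world from one actor's perspective.
interface WorldSnapshot {
  position: { x: number; y: number; z: number };
  visibleActors: string[];
  recentChat: string[];
}

// Assumed set of tools whose results carry a fresh observation.
const OBSERVED_TOOLS = new Set(["move_to", "wait", "craft"]);

function withPostToolState(
  tool: string,
  result: ToolResult,
  snapshot: () => WorldSnapshot,
): ToolResult {
  if (!OBSERVED_TOOLS.has(tool)) return result;

  const s = snapshot();
  const postToolState: JsonValue = {
    position: { x: s.position.x, y: s.position.y, z: s.position.z },
    visibleActors: [...s.visibleActors],
    recentChat: s.recentChat.slice(-5), // assumed cut-off: last five messages
  };

  // Keep existing object data as-is; wrap any other value under "value".
  const prior = result.data;
  const base: { [key: string]: JsonValue } =
    prior === undefined
      ? {}
      : prior !== null && typeof prior === "object" && !Array.isArray(prior)
        ? prior
        : { value: prior };

  return { ...result, data: { ...base, postToolState } };
}
```

The snapshot is taken lazily through a callback so that tools outside the observed set pay no observation cost.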
Strict Tool Schema
The LLM interaction is constrained to specific JSON shapes (tool_call, tool_calls, final), forbidding prose-based action planning.
- For /probe: Enforce oneToolPerTurn via strict JSON parsing. If the LLM returns invalid JSON or names a tool that does not exist, feed a structured parser error back into the loop.
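The parsing step could be sketched as follows. The wire shapes are assumptions: a single call is taken to be { "tool_call": { "tool", "arguments" } } and a final answer { "final": "..." }; the notes only name the tool_call, tool_calls, and final keys. The parseTurn function, the ParsedTurn type, and the error codes are hypothetical names.

```typescript
type ParsedTurn =
  | { kind: "tool_call"; tool: string; arguments: Record<string, unknown> }
  | { kind: "final"; text: string }
  | { kind: "parser_error"; code: string; message: string };

function parserError(code: string, message: string): ParsedTurn {
  return { kind: "parser_error", code, message };
}

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function parseTurn(raw: string, knownTools: ReadonlySet<string>): ParsedTurn {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return parserError("invalid_json", "Reply was not valid JSON.");
  }
  if (!isPlainObject(value)) {
    return parserError("not_object", "Reply must be a JSON object.");
  }
  // oneToolPerTurn: reject the batched form outright.
  if ("tool_calls" in value) {
    return parserError("one_tool_per_turn", "Only a single tool_call is allowed per turn.");
  }
  if (typeof value.final === "string") {
    return { kind: "final", text: value.final };
  }
  const call = value.tool_call;
  if (!isPlainObject(call)) {
    return parserError("missing_tool_call", "Expected a tool_call or final key.");
  }
  const tool = call.tool;
  const args = call.arguments;
  if (typeof tool !== "string" || !knownTools.has(tool)) {
    return parserError("unknown_tool", `Unknown tool: ${String(tool)}`);
  }
  if (!isPlainObject(args)) {
    return parserError("invalid_arguments", "arguments must be a JSON object.");
  }
  return { kind: "tool_call", tool, arguments: args };
}
```

Returning the error as a value rather than throwing lets the loop hand it to the LLM as an ordinary tool result on the next turn.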
Anti-Repeat Detection
The runtime hashes tool names and arguments. If the same arguments yield the same result repeatedly, it escalates from a warning to a critical failure, aborting the loop.
- For /probe: Implement a lightweight memory Map tracking the last 8 { tool, argsHash, resultStatus } entries per actor. If repeated failures hit a threshold, inject a prompt warning; abort on critical thresholds.
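One way the tracker might look. The window of 8 and the { tool, argsHash, resultStatus } entry shape come from the notes; the thresholds (3 for a warning, 5 for critical), the FNV-1a hash, the convention that any status other than "ok" counts as a failure, and the RepeatTracker name are all assumptions.

```typescript
interface RepeatEntry {
  tool: string;
  argsHash: string;
  resultStatus: string;
}

type RepeatLevel = "ok" | "warning" | "critical";

const WINDOW = 8;       // from the notes
const WARN_AT = 3;      // assumed threshold
const CRITICAL_AT = 5;  // assumed threshold

// Serialize with sorted keys so that equal arguments always hash equally.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`)
      .join(",");
    return `{${body}}`;
  }
  return JSON.stringify(value) ?? "null";
}

// 32-bit FNV-1a over UTF-16 code units; enough for loop detection, not security.
function fnv1a(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}

class RepeatTracker {
  private readonly history = new Map<string, RepeatEntry[]>();

  record(actor: string, tool: string, args: unknown, resultStatus: string): RepeatLevel {
    const entry: RepeatEntry = { tool, argsHash: fnv1a(stableStringify(args)), resultStatus };
    const recent = [...(this.history.get(actor) ?? []), entry].slice(-WINDOW);
    this.history.set(actor, recent);

    if (resultStatus === "ok") return "ok"; // assumed: only failures escalate
    const repeats = recent.filter(
      (e) =>
        e.tool === entry.tool &&
        e.argsHash === entry.argsHash &&
        e.resultStatus === entry.resultStatus,
    ).length;
    if (repeats >= CRITICAL_AT) return "critical";
    if (repeats >= WARN_AT) return "warning";
    return "ok";
  }
}
```

The caller would inject a prompt warning on "warning" and abort the loop on "critical".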
Skill Recording (Traces, Not Code)
Skills are saved as ordered atomic traces ({ tool, arguments }) with preconditions and success criteria, rather than raw JavaScript blobs.
- For /probe: Adopt this trace-based recording. Successful dialogue or movement sequences should be saved as JSON traces (build/generated-skills/*.json). Ignore replay functionality for now.
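A rough sketch of what a recorded trace could look like. The { tool, arguments } steps, preconditions, success criteria, and the build/generated-skills/ directory come from the notes; the SkillTrace and TranscriptEvent types, the buildSkillTrace and skillFileName helpers, and the rule that a sequence is recorded only when every step succeeded are assumptions.

```typescript
interface TraceStep {
  tool: string;
  arguments: Record<string, unknown>;
}

interface SkillTrace {
  name: string;
  preconditions: string[];
  successCriteria: string[];
  steps: TraceStep[];
}

// Hypothetical shape of one executed call as seen in the transcript.
interface TranscriptEvent {
  tool: string;
  arguments: Record<string, unknown>;
  ok: boolean;
}

// Returns null unless the whole sequence succeeded (assumed policy).
function buildSkillTrace(
  name: string,
  events: TranscriptEvent[],
  preconditions: string[],
  successCriteria: string[],
): SkillTrace | null {
  if (events.length === 0 || events.some((e) => !e.ok)) return null;
  return {
    name,
    preconditions: [...preconditions],
    successCriteria: [...successCriteria],
    // Round-trip through JSON so the trace holds plain data only.
    steps: events.map((e) => ({
      tool: e.tool,
      arguments: JSON.parse(JSON.stringify(e.arguments)) as Record<string, unknown>,
    })),
  };
}

// Derive a file path under the directory named in the notes.
function skillFileName(name: string): string {
  const slug = name
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  return `build/generated-skills/${slug}.json`;
}
```

Writing the file is left to the caller, for example by passing JSON.stringify(trace, null, 2) to whatever file API the runtime already uses.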
Memory & Transcript Separation
Transcripts are append-only JSONL files (the ledger), while memory is a LevelDB store split into semantic, episodic, and working layers.
- For /probe: Maintain the append-only transcript as evidence. For memory, don't use a DB yet; just split the current createMemory() array into a working context and a short episodic list injected into the prompt.
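The split could be as small as the sketch below. It does not reproduce the existing createMemory(); createLayeredMemory, its method names, the default limit of five episodes, and the prompt section labels are assumptions chosen for illustration.

```typescript
function createLayeredMemory(episodicLimit = 5) {
  let working: string[] = [];     // current notes, replaced wholesale
  const episodic: string[] = [];  // short rolling list of past events

  return {
    setWorking(notes: string[]): void {
      working = [...notes];
    },
    recordEpisode(event: string): void {
      episodic.push(event);
      // Drop the oldest event once the list exceeds the limit.
      if (episodic.length > episodicLimit) episodic.shift();
    },
    // Render both layers as one block of text for the prompt.
    toPromptSection(): string {
      return [
        "Working notes:",
        ...working.map((n) => `- ${n}`),
        "Recent episodes:",
        ...episodic.map((e) => `- ${e}`),
      ].join("\n");
    },
  };
}
```

The transcript stays outside this structure entirely, so trimming the episodic list never loses evidence.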
Goal & Task Planning
Goals are defined with explicit successCriteria and blockers. State updates (blocked/failed) are recorded to prevent looping.
- For /probe: Keep goal trees minimal. A scenario only needs 2-3 subgoals (e.g., conversation, move, remember), tracked simply by status and successCriteria. Discard complex material/crafting planners for now.
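A flat list is probably enough for this. In the sketch below, status, successCriteria, and blockers are the fields named in the notes; the Subgoal type, the status values, and the three helper functions are hypothetical.

```typescript
type GoalStatus = "pending" | "done" | "blocked" | "failed";

interface Subgoal {
  id: string;
  status: GoalStatus;
  successCriteria: string;
  blockers: string[];
}

// Return a new list with one subgoal updated; the input is left untouched.
function updateSubgoal(
  goals: Subgoal[],
  id: string,
  status: GoalStatus,
  blocker?: string,
): Subgoal[] {
  return goals.map((g) =>
    g.id !== id
      ? g
      : { ...g, status, blockers: blocker ? [...g.blockers, blocker] : g.blockers },
  );
}

// Subgoals run in order: the first pending one is next.
function nextSubgoal(goals: Subgoal[]): Subgoal | undefined {
  return goals.find((g) => g.status === "pending");
}

function scenarioOutcome(goals: Subgoal[]): "running" | "done" | "stuck" {
  if (goals.every((g) => g.status === "done")) return "done";
  if (goals.some((g) => g.status === "blocked" || g.status === "failed")) return "stuck";
  return "running";
}

// Example scenario using the three subgoals mentioned in the notes.
const exampleScenario: Subgoal[] = [
  { id: "conversation", status: "pending", successCriteria: "actor replied", blockers: [] },
  { id: "move", status: "pending", successCriteria: "actor reached target", blockers: [] },
  { id: "remember", status: "pending", successCriteria: "episode recorded", blockers: [] },
];
```

Recording the blocker text alongside the status gives the prompt something concrete to show the LLM, which is what keeps it from retrying the same blocked subgoal.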
Action Items for /probe
- Standardize Tool Results: Enforce { ok: boolean, status: string, text: string, data?: JsonValue }.
- Rich Action Feedback: Append post-action observations to movement and social tool results.
- Anti-Repeat Loop: Track action hashes and inject prompt warnings upon repeated failures.
- Memory Layering: Separate working notes from episodic events in the prompt context.
- Trace-Based Skills: Record successful sequences as JSON step traces, not code.
- Minimal Goals: Define scenarios via simple successCriteria and blockers, avoiding deep planners.
- Defer Complexity: Ignore vision models, subagents, and crafting trees for the initial headless proof.