Mutual NPC Interaction Probe Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Build mutual_npc_interaction_probe_v1, a headless Bun + TypeScript proof where both Minecraft bots act, exchange visible dialogue, react to approach/attention, perform one small item handoff, and write a transcript that lets a reviewer verify the interaction without opening a client.

Architecture: Keep the existing agent_loop_probe_v0 working and add a separate v1 path under probe/src/mutual/. Reuse the existing Docker server and bot boot logic, but introduce a small mutual-turn runtime with actor-specific deterministic providers, a richer transcript shape, and one reliable world action (drop_item("paper", 1)) so the second bot is an actor rather than a passive state object.

Tech Stack: Bun 1.3.x, TypeScript, Mineflayer ^4.37.1, mineflayer-pathfinder, Docker vanilla Minecraft 1.21.11, node:test

File Structure

Shared files to modify

probe/package.json
- Add the new probe:v1 script and the mineflayer-pathfinder dependency.
probe/src/config.ts
- Add loadMutualProbeConfig() and the new mutual_npc_interaction_probe_v1 defaults.
probe/src/runtime/createBots.ts
- Load shared bot setup needed by the new runtime, including pathfinder plugin registration.
probe/src/tools/moveTo.ts
- Replace the current lookAt + forward probe with a reliable near-target movement helper.
probe/test/runtimeLogic.test.ts
- Add the mutual runtime and tool behavior tests.
probe/test/transcript.test.ts
- Add the richer transcript acceptance test.
README.md
- Document the new proof command and what v1 proves beyond v0.
docs/docs/Agent-Search-Index.md
- Route future agents to the new mutual-interaction plan and proof script.

New files to create

probe/src/mutual/personas.ts
- Stable persona cards for Mara and Jun.
probe/src/mutual/types.ts
- Shared v1 actor ids, transcript records, verdict types, and final result types.
probe/src/mutual/provider.ts
- Deterministic next-step selection for each actor, including the local createSequenceProvider helper.
probe/src/mutual/runtimeState.ts
- Runtime-owned heard-message queue, busy/available state, and item handoff state.
probe/src/mutual/transcript.ts
- Transcript writer for the richer actorAction / targetObservation / targetResponse / worldStateChange / memoryNote payload.
probe/src/mutual/tools/observeWorld.ts
- Observation helper that includes heard chat, distance, facing, and dropped marker state.
probe/src/mutual/tools/replyTo.ts
- Target-side response tool for npc_b.
probe/src/mutual/tools/lookAtActor.ts
- Attention tool that turns one bot toward another.
probe/src/mutual/tools/dropItem.ts
- Minimal material action using paper.
probe/src/mutual/tools/index.ts
- Compose the observe_world, move_to, say, wait, reply_to, look_at_actor, drop_item, and remember tool implementations for the mutual loop, and reject unsupported proposals.
probe/src/mutual/mutualLoop.ts
- Alternating actor runtime that records cause-and-effect across both bots, including the concrete RunMutualLoopArgs type.
probe/src/mutual/runMutualProbe.ts
- End-to-end orchestration for v1, including try/finally cleanup mirroring runProbe.ts.
probe/src/mutual/cli.ts
- CLI entrypoint for the new proof, including cleanup warning formatting.
scripts/run-mutual-interaction-probe.sh
- Primary proof command for the new slice.
docs/superpowers/reports/2026-05-19-mutual-npc-interaction-probe-review.md
- Human-readable report with the real transcript path and category verdicts.
docs/superpowers/reports/artifacts/2026-05-19-mutual-npc-interaction-probe-transcript.json
- Committed copy of the proof transcript so later review does not depend on an ephemeral local data/evidence/ file.

Locked Design Choices

Keep v0 runnable. Do not fold the v1 behavior into runProbe.ts.
Use paper as the marker item to avoid chest setup, mining, or crafting.
Use mineflayer-pathfinder for move_to; the current forward-only movement is not good enough for the spatial category.
Make npc_b act through its own deterministic provider and runtime turn, not by having npc_a simulate npc_b.
Keep provider output proposal-only. Busy/available, heard-message delivery, and world-state changes remain runtime-owned.
Acceptance comes from the live transcript artifact plus the small tests below, not from adding a broad suite.

Test Strategy

Keep the test surface small:

probe/test/runtimeLogic.test.ts
- provider sequence for both actors
- runtime-owned busy/heard state
- mutual loop cause-and-effect recording
- movement helper result contract
- item handoff result contract
probe/test/transcript.test.ts
- transcript JSON shape and snapshot behavior
existing server/config coverage remains in probe/test/serverConfig.test.ts

Task 1: Add failing mutual transcript and runtime tests

Files:

Create: probe/src/mutual/types.ts
Create: probe/src/mutual/transcript.ts
Modify: probe/test/runtimeLogic.test.ts
Modify: probe/test/transcript.test.ts
Test: probe/test/runtimeLogic.test.ts
Test: probe/test/transcript.test.ts
Step 1: Add the failing transcript acceptance test

test("mutual transcript writes category verdicts and causal step records", async () => {
  const evidenceDir = path.resolve("probe/test-artifacts/mutual-transcript");
  const transcript = createMutualTranscript({
    evidenceDir,
    probeId: "mutual_npc_interaction_probe_v1",
    personas: {
      npc_a: "Mara, anxious quartermaster",
      npc_b: "Jun, distracted runner"
    }
  });

  transcript.recordStep({
    category: "conversationTurnState",
    actorAction: { actor: "npc_a", tool: "say", result: "heard_by_npc_b" },
    targetObservation: { actor: "npc_b", heardText: "Jun, can you confirm the marker?" },
    targetResponse: { actor: "npc_b", tool: "reply_to", result: "busy_reply" },
    causedNext: { actor: "npc_a", tool: "wait" }
  });

  const outputPath = await transcript.write({
    conversationTurnState: "passed",
    spatialAttentionApproach: "passed",
    materialEnvironmentHandoff: "passed"
  }, {
    status: "success",
    why: "both NPCs responded to dialogue and world actions"
  });

  const payload = JSON.parse(await fs.readFile(outputPath, "utf8"));

  assert.equal(payload.probe, "mutual_npc_interaction_probe_v1");
  assert.equal(payload.categories.materialEnvironmentHandoff, "passed");
  assert.equal(payload.steps[0].targetResponse.tool, "reply_to");
});

Step 2: Add the failing mutual runtime test

function createFakeMutualBots() {
  return {
    npc_a: { username: "npc_a" },
    npc_b: { username: "npc_b" }
  };
}

function createFakeMutualTools() {
  let executionCount = 0;

  return {
    lastResult() {
      return null;
    },
    validateProposal(proposal: Proposal) {
      return proposal;
    },
    async observe_world() {
      return { status: "ok", heardMessages: [], markerEntitySeen: executionCount >= 2 };
    },
    async execute() {
      executionCount += 1;

      if (executionCount === 1) {
        return {
          category: "conversationTurnState",
          actorAction: { actor: "npc_a", tool: "say", result: "said" },
          targetResponse: { actor: "npc_b", tool: "reply_to", result: "busy_reply" }
        };
      }

      if (executionCount === 2) {
        return {
          category: "spatialAttentionApproach",
          actorAction: { actor: "npc_b", tool: "look_at_actor", result: "looked_at_target" },
          worldStateChange: { arrived: true }
        };
      }

      return {
        category: "materialEnvironmentHandoff",
        actorAction: { actor: "npc_a", tool: "drop_item", result: "dropped" },
        targetObservation: { markerEntitySeen: true },
        targetResponse: { actor: "npc_b", tool: "reply_to", result: "replied" },
        worldStateChange: { itemName: "paper" },
        causedNext: { actor: "npc_b", tool: "reply_to" }
      };
    }
  };
}

test("mutual loop makes npc_b act after npc_a changes chat or world state", async () => {
  const transcriptSteps: unknown[] = [];

  const result = await runMutualLoop({
    bots: createFakeMutualBots(),
    providers: createMutualProviders(),
    transcript: { recordStep(step) { transcriptSteps.push(step); } },
    tools: createFakeMutualTools()
  });

  assert.equal(result.status, "success");
  assert.ok(transcriptSteps.some((step) => {
    return typeof step === "object"
      && step !== null
      && "targetResponse" in step
      && (step as { targetResponse?: { actor?: string } }).targetResponse?.actor === "npc_b";
  }));
});

Step 3: Run the two focused tests and verify they fail

Run:

cd probe && bun test test/transcript.test.ts test/runtimeLogic.test.ts

Expected: FAIL because createMutualTranscript, runMutualLoop, or the mutual provider/runtime files do not exist yet.

Step 4: Create the minimal shared mutual types

export type MutualActorId = "npc_a" | "npc_b";

export type InteractionCategory =
  | "conversationTurnState"
  | "spatialAttentionApproach"
  | "materialEnvironmentHandoff";

export type CategoryVerdict = "passed" | "failed";

export type Proposal = {
  tool: string;
  args?: Record<string, unknown>;
};

export type HeardMessage = {
  from: MutualActorId;
  text: string;
};

export type MutualStepRecord = {
  category: InteractionCategory;
  actorAction: {
    actor: MutualActorId;
    tool: string;
    result: string;
  };
  targetObservation?: Record<string, unknown>;
  targetResponse?: {
    actor: MutualActorId;
    tool: string;
    result: string;
  };
  worldStateChange?: Record<string, unknown>;
  memoryNote?: {
    actor: MutualActorId;
    note: string;
  };
  causedNext?: {
    actor: MutualActorId;
    tool: string;
  };
};

export type TranscriptFinal = {
  status: "success" | "failed";
  why: string;
};

export type CreateMutualTranscriptOptions = {
  evidenceDir: string;
  probeId: string;
  personas: Record<MutualActorId, string>;
};

Step 5: Add the mutual transcript writer

export function createMutualTranscript({ evidenceDir, probeId, personas }: CreateMutualTranscriptOptions) {
  const steps: MutualStepRecord[] = [];

  return {
    recordStep(step: MutualStepRecord) {
      steps.push(structuredClone(step));
    },
    async write(categories: Record<InteractionCategory, CategoryVerdict>, final: TranscriptFinal) {
      await fs.mkdir(evidenceDir, { recursive: true });
      const outputPath = path.join(evidenceDir, `${probeId}-${Date.now()}.json`);
      await fs.writeFile(outputPath, JSON.stringify({
        probe: probeId,
        personas: structuredClone(personas),
        categories: structuredClone(categories),
        steps,
        final: structuredClone(final)
      }, null, 2));
      return outputPath;
    }
  };
}

Step 6: Re-run the focused tests until the failures move to the missing mutual runtime

Run:

cd probe && bun test test/transcript.test.ts test/runtimeLogic.test.ts

Expected: transcript test passes or reaches deeper runtime failures; runtime test still fails because the loop and providers are not implemented yet.

Step 7: Commit the red-to-green transcript groundwork

git add probe/test/runtimeLogic.test.ts probe/test/transcript.test.ts probe/src/mutual/transcript.ts
git commit -m "test: add mutual transcript contract"

Task 2: Implement two-actor deterministic runtime state and providers

Files:

Create: probe/src/mutual/personas.ts
Create: probe/src/mutual/provider.ts
Create: probe/src/mutual/runtimeState.ts
Create: probe/src/mutual/mutualLoop.ts
Modify: probe/test/runtimeLogic.test.ts
Test: probe/test/runtimeLogic.test.ts
Step 1: Add failing provider and runtime-state assertions

test("mutual providers keep actor-specific deterministic order", () => {
  const providers = createMutualProviders();

  assert.deepEqual(providers.npc_a.next({ lastResult: null }), {
    tool: "observe_world",
    args: {}
  });

  assert.deepEqual(providers.npc_b.next({ lastResult: null }), {
    tool: "reply_to",
    args: { source: "npc_a", text: "Busy. Give me a second." }
  });

  assert.deepEqual(providers.npc_b.next({
    lastResult: { tool: "reply_to", status: "busy_reply" }
  }), {
    tool: "look_at_actor",
    args: { target: "npc_a" }
  });
});

test("runtime state owns heard chat, busy replies, and marker visibility", () => {
  const state = createMutualRuntimeState({ busyRepliesBeforeAvailable: 1, markerItemName: "paper" });

  state.recordHeardMessage("npc_b", { from: "npc_a", text: "Jun, can you confirm the marker?" });

  assert.equal(state.consumeHeardMessages("npc_b").length, 1);
  assert.deepEqual(state.requestReply("npc_b", "npc_a"), {
    status: "busy",
    reason: "npc_b is busy"
  });
});

Step 2: Run the runtime test and verify the new assertions fail

Run:

cd probe && bun test test/runtimeLogic.test.ts

Expected: FAIL because the mutual providers and runtime state do not exist yet.

Step 3: Add persona cards

export const mutualPersonas = {
  npc_a: {
    name: "Mara",
    summary: "anxious quartermaster",
    goal: "ask whether the marker item should be moved to the shared chest"
  },
  npc_b: {
    name: "Jun",
    summary: "distracted runner",
    goal: "reply only after finishing or pausing the current task"
  }
} as const;

Step 4: Add runtime state for heard chat, busy ownership, and marker state

export function createMutualRuntimeState({ busyRepliesBeforeAvailable, markerItemName }: MutualRuntimeStateOptions) {
  let remainingBusyReplies = busyRepliesBeforeAvailable;
  let markerDropped = false;
  const heard: Record<MutualActorId, HeardMessage[]> = { npc_a: [], npc_b: [] };
  const lastResults: Record<MutualActorId, { tool: string; status: string } | null> = {
    npc_a: null,
    npc_b: null
  };

  return {
    recordHeardMessage(target: MutualActorId, message: HeardMessage) {
      heard[target].push(message);
    },
    consumeHeardMessages(target: MutualActorId) {
      const messages = [...heard[target]];
      heard[target] = [];
      return messages;
    },
    requestReply(actor: MutualActorId, target: MutualActorId) {
      if (actor !== "npc_b" || target !== "npc_a") {
        return { status: "unavailable" as const, reason: `${target} is unavailable` };
      }
      if (remainingBusyReplies > 0) {
        remainingBusyReplies -= 1;
        return { status: "busy" as const, reason: "npc_b is busy" };
      }
      return { status: "available" as const };
    },
    markDroppedItem(actor: MutualActorId, itemName: string) {
      markerDropped = actor === "npc_a" && itemName === markerItemName;
    },
    hasDroppedMarker() {
      return markerDropped;
    },
    markerItemName() {
      return markerItemName;
    },
    lastResult(actorId: MutualActorId) {
      return lastResults[actorId];
    },
    recordLastResult(actorId: MutualActorId, result: { tool: string; status: string }) {
      lastResults[actorId] = result;
    }
  };
}

export type MutualRuntimeStateOptions = {
  busyRepliesBeforeAvailable: number;
  markerItemName: string;
};

Step 5: Add the deterministic actor providers

function createSequenceProvider(sequence: Proposal[]) {
  let index = 0;

  return {
    next(_input: { observation?: unknown; lastResult: { tool: string; status: string } | null }) {
      const proposal = sequence[Math.min(index, sequence.length - 1)];
      index += 1;
      return structuredClone(proposal);
    }
  };
}

export function createMutualProviders() {
  return {
    npc_a: createSequenceProvider([
      { tool: "observe_world", args: {} },
      { tool: "move_to", args: { target: "npc_b" } },
      { tool: "say", args: { target: "npc_b", text: "Jun, can you confirm the marker?" } },
      { tool: "wait", args: { ticks: 20, reason: "npc_b was busy" } },
      { tool: "drop_item", args: { itemName: "paper", count: 1 } },
      { tool: "remember", args: { note: "Jun answered after the marker drop" } }
    ]),
    npc_b: {
      next(input: { observation?: Record<string, unknown>; lastResult: { tool: string; status: string } | null }) {
        if (input.lastResult === null) {
          return { tool: "reply_to", args: { source: "npc_a", text: "Busy. Give me a second." } };
        }

        if (input.lastResult.tool === "reply_to" && input.lastResult.status === "busy_reply") {
          return { tool: "look_at_actor", args: { target: "npc_a" } };
        }

        if (input.lastResult.tool === "look_at_actor") {
          return { tool: "observe_world", args: {} };
        }

        if (input.lastResult.tool === "observe_world" && input.observation?.markerEntitySeen === true) {
          return { tool: "reply_to", args: { source: "npc_a", text: "I see the paper. Leave it with me." } };
        }

        return { tool: "reply_to", args: { source: "npc_a", text: "I still do not see the marker." } };
      }
    }
  };
}

Step 6: Add the alternating mutual loop

type MutualRuntimeActor = {
  username: string;
};

export type RunMutualLoopArgs<TActor extends MutualRuntimeActor> = {
  bots: Record<MutualActorId, TActor>;
  providers: ReturnType<typeof createMutualProviders>;
  tools: {
    lastResult(actorId: MutualActorId): { tool: string; status: string } | null;
    validateProposal(proposal: Proposal): Proposal;
    observe_world(input: {
      actorId: MutualActorId;
      actor: TActor;
      targetId: MutualActorId;
      target: TActor;
    }): Promise<Record<string, unknown>>;
    execute(input: {
      actorId: MutualActorId;
      actor: TActor;
      targetId: MutualActorId;
      target: TActor;
      proposal: Proposal;
      observation: Record<string, unknown>;
    }): Promise<MutualStepRecord>;
  };
  transcript: {
    recordStep(step: MutualStepRecord): void;
  };
};

function deriveCategoryVerdicts(steps: MutualStepRecord[]): Record<InteractionCategory, CategoryVerdict> {
  const droppedMarker = steps.some((step) => {
    return step.category === "materialEnvironmentHandoff"
      && step.actorAction.tool === "drop_item"
      && step.worldStateChange?.itemName === "paper";
  });
  const observedMarker = steps.some((step) => {
    return step.category === "materialEnvironmentHandoff"
      && step.targetObservation?.markerEntitySeen === true;
  });
  const repliedAfterMarker = steps.some((step) => {
    return step.category === "materialEnvironmentHandoff"
      && step.targetResponse?.tool === "reply_to"
      && step.causedNext?.actor === "npc_b";
  });

  return {
    conversationTurnState: steps.some((step) => step.category === "conversationTurnState" && step.targetResponse)
      ? "passed"
      : "failed",
    spatialAttentionApproach: steps.some((step) => step.category === "spatialAttentionApproach" && step.worldStateChange?.arrived === true)
      ? "passed"
      : "failed",
    materialEnvironmentHandoff: droppedMarker && observedMarker && repliedAfterMarker
      ? "passed"
      : "failed"
  };
}

export async function runMutualLoop<TActor extends MutualRuntimeActor>({
  bots,
  providers,
  tools,
  transcript
}: RunMutualLoopArgs<TActor>) {
  const plan: MutualActorId[] = ["npc_a", "npc_a", "npc_a", "npc_b", "npc_a", "npc_b", "npc_a", "npc_b", "npc_b", "npc_a"];
  const recordedSteps: MutualStepRecord[] = [];

  for (const actorId of plan) {
    const actor = bots[actorId];
    const targetId = actorId === "npc_a" ? "npc_b" : "npc_a";
    const target = bots[targetId];
    const observation = await tools.observe_world({ actorId, actor, targetId, target });
    const proposal = tools.validateProposal(
      providers[actorId].next({ observation, lastResult: tools.lastResult(actorId) })
    );
    const step = await tools.execute({ actorId, actor, targetId, target, proposal, observation });
    recordedSteps.push(step);
    transcript.recordStep(step);
  }

  const categories = deriveCategoryVerdicts(recordedSteps);
  const passedAll = Object.values(categories).every((value) => value === "passed");

  return {
    status: passedAll ? ("success" as const) : ("failed" as const),
    why: passedAll
      ? "both NPCs responded to each other's dialogue and world actions"
      : "one or more interaction categories did not reach acceptance",
    categories
  };
}

Step 7: Re-run the runtime test until the provider and loop assertions pass

Run:

cd probe && bun test test/runtimeLogic.test.ts

Expected: PASS for the new provider/runtime cases, with any remaining failures now isolated to the new tool implementations.

Step 8: Commit the mutual runtime core

git add probe/src/mutual/personas.ts probe/src/mutual/provider.ts probe/src/mutual/runtimeState.ts probe/src/mutual/mutualLoop.ts probe/test/runtimeLogic.test.ts
git commit -m "feat: add mutual npc runtime core"

Task 3: Add reliable movement, attention, item handoff, and the live entrypoint

Files:

Modify: probe/package.json
Modify: probe/src/config.ts
Modify: probe/src/runtime/createBots.ts
Modify: probe/src/tools/moveTo.ts
Create: probe/src/mutual/tools/observeWorld.ts
Create: probe/src/mutual/tools/replyTo.ts
Create: probe/src/mutual/tools/lookAtActor.ts
Create: probe/src/mutual/tools/dropItem.ts
Create: probe/src/mutual/tools/index.ts
Create: probe/src/mutual/runMutualProbe.ts
Create: probe/src/mutual/cli.ts
Create: scripts/run-mutual-interaction-probe.sh
Modify: probe/test/runtimeLogic.test.ts
Test: probe/test/runtimeLogic.test.ts
Step 1: Add failing movement and item-handoff tests

function createPathfindingFakeBot(username: string, position: { x: number; y: number; z: number }) {
  return {
    username,
    entity: {
      position: {
        ...position,
        distanceTo(other: { x: number; y: number; z: number }) {
          return Math.hypot(position.x - other.x, position.y - other.y, position.z - other.z);
        }
      }
    },
    pathfinder: {
      async goto() {
        position.x = 1.5;
      }
    }
  };
}

function createInventoryFakeBot(username: string) {
  return {
    username,
    version: "1.21.11",
    creative: {
      async setInventorySlot() {}
    },
    async toss() {}
  };
}

test("move_to reports before and after distance and reaches the near threshold", async () => {
  const actor = createPathfindingFakeBot("npc_a", { x: 0, y: 0, z: 0 });
  const target = createPathfindingFakeBot("npc_b", { x: 3, y: 0, z: 0 });

  assert.deepEqual(await moveTo({ actor, target, targetId: "npc_b" }), {
    status: "arrived",
    beforeDistance: 3,
    afterDistance: 1.5,
    arrived: true
  });
});

test("drop_item marks world state so npc_b can observe and answer", async () => {
  const state = createMutualRuntimeState({ busyRepliesBeforeAvailable: 1, markerItemName: "paper" });

  const result = await dropItem({
    actor: createInventoryFakeBot("npc_a"),
    runtimeState: state,
    itemName: "paper",
    count: 1
  });

  assert.equal(result.status, "dropped");
  assert.equal(state.hasDroppedMarker(), true);
});

Step 2: Run the runtime test and verify the tool assertions fail

Run:

cd probe && bun test test/runtimeLogic.test.ts

Expected: FAIL because mineflayer-pathfinder is not installed, the moveTo result shape is different, and dropItem does not exist yet.

Step 3: Add the pathfinding dependency and the new script entry

{
  "scripts": {
    "probe:v0": "bun run src/cli.ts",
    "probe:v1": "bun run src/mutual/cli.ts"
  },
  "dependencies": {
    "mineflayer": "^4.37.1",
    "minecraft-data": "^3.87.0",
    "mineflayer-pathfinder": "^2.4.5",
    "minecraft-protocol": "^1.66.2",
    "prismarine-item": "^1.15.0",
    "vec3": "^0.1.10"
  }
}

Run:

cd probe && bun add mineflayer-pathfinder minecraft-data prismarine-item

Expected: lockfile updates and dependency install succeeds.

Step 4: Register pathfinder and implement the reliable movement helper

bot.loadPlugin(pathfinder);

export async function moveTo({ actor, target, targetId }: MoveToArgs): Promise<MoveToResult> {
  if (targetId !== target.username) {
    throw new Error(`Unsupported move target: ${targetId}`);
  }

  const beforeDistance = roundDistance(actor.entity.position.distanceTo(target.entity.position));
  const goal = new goals.GoalNear(target.entity.position.x, target.entity.position.y, target.entity.position.z, 1);

  await actor.pathfinder.goto(goal);

  const afterDistance = roundDistance(actor.entity.position.distanceTo(target.entity.position));

  return {
    status: afterDistance <= 1.5 ? "arrived" : "moved",
    beforeDistance,
    afterDistance,
    arrived: afterDistance <= 1.5
  };
}

Step 5: Add the attention, reply, observe, and drop-item tools

const mutualAllowedTools = [
  "observe_world",
  "move_to",
  "say",
  "wait",
  "reply_to",
  "look_at_actor",
  "drop_item",
  "remember"
] as const;

export function validateMutualProposal(proposal: Proposal) {
  if (!mutualAllowedTools.includes(proposal.tool as (typeof mutualAllowedTools)[number])) {
    throw new Error(`Unsupported mutual tool: ${proposal.tool}`);
  }

  return {
    tool: proposal.tool,
    args: proposal.args ?? {}
  };
}

export async function lookAtActor({ actor, target }: LookAtActorArgs) {
  await actor.lookAt(target.entity.position, true);
  return { status: "looked_at_target", target: target.username };
}

export async function replyTo({ actor, source, runtimeState, text }: ReplyToArgs) {
  const result = runtimeState.requestReply(actor.username as MutualActorId, source.username as MutualActorId);
  actor.chat(text);
  runtimeState.recordHeardMessage(source.username as MutualActorId, {
    from: actor.username as MutualActorId,
    text
  });
  return result.status === "busy"
    ? { status: "busy_reply", reason: result.reason }
    : { status: "replied", text };
}

export async function dropItem({ actor, runtimeState, itemName, count }: DropItemArgs) {
  const mcData = minecraftData(actor.version);
  const itemId = mcData.itemsByName[itemName].id;
  const ItemCtor = Item(actor.version);
  await actor.creative.setInventorySlot(36, new ItemCtor(itemId, count));
  await actor.toss(itemId, null, count);
  runtimeState.markDroppedItem("npc_a", itemName);
  return { status: "dropped", itemName, count };
}

function findNearbyMarkerEntity({ observer, itemName }: { observer: { entities: Record<string, { name?: string; displayName?: string }> }; itemName: string }) {
  return Object.values(observer.entities).find((entity) => {
    return entity.name === itemName
      || entity.displayName?.toLowerCase() === itemName
      || (entity.name === "item" && JSON.stringify(entity).toLowerCase().includes(itemName));
  });
}

export function observeWorld({ actor, target, runtimeState, memory }: ObserveWorldArgs) {
  const markerEntitySeen = Boolean(
    findNearbyMarkerEntity({
      observer: actor,
      itemName: runtimeState.markerItemName()
    })
  );

  return {
    status: "ok",
    visibleActors: [{
      id: target.username,
      distance: Number(actor.entity.position.distanceTo(target.entity.position).toFixed(2))
    }],
    heardMessages: runtimeState.consumeHeardMessages(actor.username as MutualActorId),
    markerEntitySeen,
    memory: memory.list()
  };
}

type CreateMutualToolsArgs = {
  runtimeState: ReturnType<typeof createMutualRuntimeState>;
  memories: Record<MutualActorId, ReturnType<typeof createMemory>>;
};

type ObserveWorldToolArgs<TActor extends { username: string } = { username: string }> = {
  actorId: MutualActorId;
  actor: TActor;
  targetId: MutualActorId;
  target: TActor;
};

type ExecuteMutualToolArgs<TActor extends { username: string } = { username: string }> =
  ObserveWorldToolArgs<TActor> & {
    proposal: Proposal;
    observation: Record<string, unknown>;
  };

async function executeMutualTool<TActor extends { username: string }>({
  actorId,
  actor,
  targetId,
  target,
  proposal,
  observation,
  runtimeState
}: ExecuteMutualToolArgs<TActor> & CreateMutualToolsArgs) {
  switch (proposal.tool) {
    case "observe_world":
      return {
        category: "conversationTurnState" as const,
        actorAction: { actor: actorId, tool: proposal.tool, result: "observed" }
      };
    case "move_to":
      return {
        category: "spatialAttentionApproach" as const,
        actorAction: { actor: actorId, tool: proposal.tool, result: "moved" },
        worldStateChange: await moveTo({ actor, target, targetId })
      };
    case "say":
      actor.chat(String(proposal.args?.text ?? ""));
      runtimeState.recordHeardMessage(targetId, {
        from: actorId,
        text: String(proposal.args?.text ?? "")
      });
      return {
        category: "conversationTurnState" as const,
        actorAction: { actor: actorId, tool: proposal.tool, result: "said" },
        targetObservation: observation,
        causedNext: { actor: targetId, tool: "reply_to" }
      };
    case "wait":
      return {
        category: "conversationTurnState" as const,
        actorAction: { actor: actorId, tool: proposal.tool, result: "waited" },
        worldStateChange: await wait({ ticks: Number(proposal.args?.ticks ?? 20) })
      };
    case "look_at_actor":
      return {
        category: "spatialAttentionApproach" as const,
        actorAction: { actor: actorId, tool: proposal.tool, result: "looked_at_target" },
        worldStateChange: await lookAtActor({ actor, target })
      };
    case "reply_to":
      const replyResult = await replyTo({
        actor,
        source: target,
        runtimeState,
        text: String(proposal.args?.text ?? "")
      });

      return {
        category: runtimeState.hasDroppedMarker() ? "materialEnvironmentHandoff" as const : "conversationTurnState" as const,
        actorAction: { actor: actorId, tool: proposal.tool, result: replyResult.status },
        targetObservation: observation,
        targetResponse: {
          actor: actorId,
          tool: proposal.tool,
          result: replyResult.status
        },
        causedNext: { actor: actorId, tool: proposal.tool }
      };
    case "drop_item":
      return {
        category: "materialEnvironmentHandoff" as const,
        actorAction: { actor: actorId, tool: proposal.tool, result: "dropped" },
        worldStateChange: await dropItem({
          actor,
          runtimeState,
          itemName: String(proposal.args?.itemName ?? "paper"),
          count: Number(proposal.args?.count ?? 1)
        })
      };
    case "remember":
      return {
        category: runtimeState.hasDroppedMarker() ? "materialEnvironmentHandoff" as const : "conversationTurnState" as const,
        actorAction: { actor: actorId, tool: proposal.tool, result: "remembered" },
        memoryNote: {
          actor: actorId,
          note: String(proposal.args?.note ?? "")
        }
      };
    default:
      return {
        category: "conversationTurnState" as const,
        actorAction: { actor: actorId, tool: proposal.tool, result: "ok" }
      };
  }
}

export function createMutualTools({ runtimeState, memories }: CreateMutualToolsArgs) {
  return {
    lastResult(actorId: MutualActorId) {
      return runtimeState.lastResult(actorId);
    },
    validateProposal: validateMutualProposal,
    observe_world(input: ObserveWorldToolArgs) {
      return observeWorld({ ...input, runtimeState, memory: memories[input.actorId] });
    },
    async execute(input: ExecuteMutualToolArgs) {
      const step = await executeMutualTool({ ...input, runtimeState, memories });
      runtimeState.recordLastResult(input.actorId, {
        tool: input.proposal.tool,
        status: step.targetResponse?.result ?? step.actorAction.result
      });
      return step;
    }
  };
}

Step 6: Wire the live v1 probe and script

export function loadMutualProbeConfig(): ProbeConfig {
  return {
    ...loadProbeConfig(),
    probeId: "mutual_npc_interaction_probe_v1"
  };
}

export async function runMutualProbe() {
  const config = loadMutualProbeConfig();
  let server: ServerHandle | null = null;
  let bots: Awaited<ReturnType<typeof createBots>> | null = null;
  let result: { transcriptPath: string } | null = null;
  let caughtError: unknown;
  const cleanupErrors: unknown[] = [];

  try {
    server = await startDockerServer(config);
    bots = await createBots(config, { host: server.host, port: server.port });
    const memories = {
      npc_a: createMemory(config.memoryLimit),
      npc_b: createMemory(config.memoryLimit)
    };
    const runtimeState = createMutualRuntimeState({
      busyRepliesBeforeAvailable: config.dialogue.busyRepliesBeforeAvailable,
      markerItemName: "paper"
    });
    const transcript = createMutualTranscript({
      evidenceDir: config.evidenceDir,
      probeId: config.probeId,
      personas: {
        npc_a: "Mara, anxious quartermaster",
        npc_b: "Jun, distracted runner"
      }
    });

    const final = await runMutualLoop({
      bots,
      providers: createMutualProviders(),
      tools: createMutualTools({ runtimeState, memories }),
      transcript
    });

    result = {
      transcriptPath: await transcript.write(final.categories, {
        status: final.status,
        why: final.why
      })
    };
  } catch (error) {
    caughtError = error;
  } finally {
    if (bots) {
      try {
        await closeBots(bots);
      } catch (error) {
        cleanupErrors.push(error);
      }
    }
    if (server) {
      try {
        await server.stop();
      } catch (error) {
        cleanupErrors.push(error);
      }
    }
  }

  return finalizeRunProbe({
    result,
    caughtError,
    cleanupErrors
  });
}

function formatError(error: unknown): string {
  if (error instanceof AggregateError) {
    return `${error.message}\n${error.errors.map((entry) => formatError(entry)).join("\n")}`;
  }

  if (error instanceof Error) {
    return error.message;
  }

  return String(error);
}

async function main() {
  try {
    const { transcriptPath } = await runMutualProbe();
    console.log(transcriptPath);
  } catch (error) {
    console.error(formatError(error));
    process.exitCode = 1;
  }
}

#!/usr/bin/env bash
set -euo pipefail

bun run --cwd probe probe:v1

Run:

chmod +x scripts/run-mutual-interaction-probe.sh

Expected: the new proof script is executable in the working tree.

Step 7: Run the tests, typecheck, and live proof

Run:

bun run --cwd probe test
bun run --cwd probe typecheck
./scripts/run-mutual-interaction-probe.sh

Expected:

all Bun tests pass;
typecheck passes;
the script exits 0 and prints a data/evidence/mutual_npc_interaction_probe_v1-*.json path.
Step 8: Commit the live runtime slice

git add probe/package.json probe/bun.lock probe/src/config.ts probe/src/runtime/createBots.ts probe/src/tools/moveTo.ts probe/src/mutual probe/test/runtimeLogic.test.ts scripts/run-mutual-interaction-probe.sh
git commit -m "feat: add mutual npc interaction probe"

Task 4: Update docs and write the evidence report

Files:

Modify: README.md
Modify: docs/docs/Agent-Search-Index.md
Create: docs/superpowers/reports/2026-05-19-mutual-npc-interaction-probe-review.md
Create: docs/superpowers/reports/artifacts/2026-05-19-mutual-npc-interaction-probe-transcript.json
Step 1: Add the new README proof section

## Mutual NPC interaction probe

Run the second headless proof:

```sh
./scripts/run-mutual-interaction-probe.sh
```

This proof adds:

1. two acting bots rather than one acting bot plus runtime-gated state
2. transcript category verdicts for conversation, spatial attention, and material handoff
3. a small item action (`paper`) that changes the next NPC response

Step 2: Update the search index route

- `MUTUAL_NPC_INTERACTION_PROBE_V1`
  - `docs/superpowers/plans/2026-05-19-mutual-npc-interaction-probe.md`
  - `scripts/run-mutual-interaction-probe.sh`
  - `docs/superpowers/reports/2026-05-19-mutual-npc-interaction-probe-review.md`

Step 3: Write the evidence report from the actual transcript

# Mutual NPC Interaction Probe Review

## Bottom line

`mutual_npc_interaction_probe_v1` ran against a real local Docker-backed server.

- Fill this report only after Step 4 with the exact transcript path printed by the proof run.

## Category verdicts

- Copy the exact `categories` object from the preserved transcript JSON.

## Durable evidence

- committed transcript copy:
  `docs/superpowers/reports/artifacts/2026-05-19-mutual-npc-interaction-probe-transcript.json`
- live run path:
  add the exact filename printed by Step 4

## What the transcript shows

1. `npc_a` approached `npc_b` and closed distance to the near threshold.
2. `npc_b` replied first with a busy response, then with a confirmation after the paper drop.
3. The dropped marker item changed a later response rather than only adding decoration.

Step 4: Re-run the proof command, inspect the artifact, and preserve the evidence

Run:

artifact_path=$(./scripts/run-mutual-interaction-probe.sh)
mkdir -p docs/superpowers/reports/artifacts
cp "$artifact_path" docs/superpowers/reports/artifacts/2026-05-19-mutual-npc-interaction-probe-transcript.json
printf '%s\n' "$artifact_path"

Expected: the printed artifact path exists under data/evidence/, the committed copy is refreshed, and the report is updated from that copied JSON with the exact live path and verdict values.

Step 5: Run the full acceptance command set

Run:

bun run --cwd probe test
bun run --cwd probe typecheck
./scripts/run-mutual-interaction-probe.sh
npm run build --prefix docs

Expected: all commands succeed and the docs build still passes.

Step 6: Commit the docs and report

git add README.md docs/docs/Agent-Search-Index.md docs/superpowers/reports/2026-05-19-mutual-npc-interaction-probe-review.md docs/superpowers/reports/artifacts/2026-05-19-mutual-npc-interaction-probe-transcript.json
git commit -m "docs: record mutual probe proof"

Self-Review

Spec coverage

Two acting NPCs: covered by Task 2 mutual providers and alternating loop.
Conversation/turn-state category: covered by Task 2 and Task 3 reply/heard-message tools.
Spatial attention/approach category: covered by Task 3 pathfinder-backed move_to and look_at_actor.
Material/environment handoff category: covered by Task 3 drop_item("paper", 1) and the runtime-state marker observation.
Transcript acceptance: covered by Task 1 and Task 4.
Live proof and review handoff: covered by Task 3 and Task 4.

Placeholder scan

No TODO, TBD, or "implement later" placeholders remain.
The item choice, script name, and tool set are locked so the implementer does not need to guess.

Type consistency

Actor ids stay npc_a | npc_b.
The new transcript categories use conversationTurnState, spatialAttentionApproach, and materialEnvironmentHandoff consistently.
The world-action slice uses drop_item + marker visibility, not a mix of chest/place/drop approaches.

Plan complete and saved to docs/superpowers/plans/2026-05-19-mutual-npc-interaction-probe.md.

Execution mode selected: Inline Execution in this session using executing-plans.

File Structure​

Shared files to modify​

New files to create​

Locked Design Choices​

Test Strategy​

Task 1: Add failing mutual transcript and runtime tests​

Task 2: Implement two-actor deterministic runtime state and providers​

Task 3: Add reliable movement, attention, item handoff, and the live entrypoint​

Task 4: Update docs and write the evidence report​

Self-Review​

Spec coverage​

Placeholder scan​

Type consistency​

File Structure

Shared files to modify

New files to create

Locked Design Choices

Test Strategy

Task 1: Add failing mutual transcript and runtime tests

Task 2: Implement two-actor deterministic runtime state and providers

Task 3: Add reliable movement, attention, item handoff, and the live entrypoint

Task 4: Update docs and write the evidence report

Self-Review

Spec coverage

Placeholder scan

Type consistency