Commit b32c2ca

shinohara-rinnekomeowww authored and committed
feat(minecraft): add llm trace endpoint and compact prompt payloads
1 parent 10251a9 commit b32c2ca

File tree

6 files changed: +292 −3 lines changed
Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
---
name: minecraft-debug-mcp
description: Operate and debug the live Minecraft bot through its built-in MCP REPL server. Use when work requires starting the bot with `pnpm dev`, connecting to the local MCP endpoint, inspecting cognitive state/logs/history, injecting synthetic chat/events, or running targeted REPL code against the running brain during investigation and development.
---

# Minecraft Debug MCP

## Overview

Use this skill to run the local bot and interact with its MCP debug interface safely and quickly.

## Quick Start Workflow

1. Run `pnpm dev` from `/Users/rinshinohara/Repo/airi/services/minecraft` and keep it running.
2. Wait for `MCP REPL server running at http://localhost:3001` in the logs.
3. Connect an MCP client to `http://localhost:3001/sse`.
4. Verify readiness with a read-only call:
   - Read resource `brain://state`, or
   - Call tool `get_state`.
5. Continue with the smallest tool/action that answers the task.
## Execution Rules

- Start read-only, then escalate to mutation tools only when needed.
- Prefer `get_state`, `get_last_prompt`, and `get_logs` for diagnostics before `execute_repl`.
- Prefer `get_llm_trace` for structured per-attempt reasoning/content inspection.
- Keep `execute_repl` snippets minimal and reversible.
- Use `inject_chat` for conversational simulation and `inject_event` only when specific event-shape testing is required.
- Treat `inject_chat` as side-effectful: it can trigger actual in-game bot replies/actions.
- If the MCP connection fails, check that `pnpm dev` is still running and that port `3001` is free.

## Tooling Strategy

- Use `get_state` to inspect queue/processing state.
- Use `get_logs` with a small `limit` first.
- Use `get_last_prompt` to inspect the latest LLM input.
- Use `execute_repl` for deep object inspection or one-off targeted calls on the running brain.
- Use `inject_chat` to simulate player chat and verify the behavior loop.
- Use `get_llm_trace` to assert planner behavior in automation (for example, to detect repeated `await skip()` on specific events).

Read `references/mcp-surface.md` for exact tool/resource names and argument schemas.
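The planner-behavior assertion mentioned above can be sketched as a small pure helper over `get_llm_trace` output. The entry shape here is an assumption trimmed down from this commit's `LlmTraceEntry` interface; only `turnId` and `content` are used:

```typescript
// Minimal sketch: detect repeated bare `await skip()` responses in trace
// entries, e.g. to catch a planner that keeps skipping an event it should
// act on. Entry shape is a reduced assumption from LlmTraceEntry.
interface TraceEntry {
  turnId: number
  content: string
}

// Returns true when `threshold` or more entries are skip responses.
function hasRepeatedSkip(entries: TraceEntry[], threshold = 3): boolean {
  const skips = entries.filter(e => e.content.includes('await skip()'))
  return skips.length >= threshold
}

const sample: TraceEntry[] = [
  { turnId: 1, content: 'await skip()' },
  { turnId: 1, content: 'await skip()' },
  { turnId: 2, content: 'await chat("hi")' },
  { turnId: 2, content: 'await skip()' },
]

console.log(hasRepeatedSkip(sample)) // true (3 skips >= default threshold 3)
```

In automation, feed it the parsed text payload of a `get_llm_trace` call, optionally pre-filtered by `turnId`.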
## Live-Tested Notes

- `get_state` returns a large variable snapshot; prefer it over REPL for first-pass health checks.
- `get_last_prompt` can return very large payloads; call it only when prompt-level debugging is needed.
- `execute_repl` returns a structured result where `returnValue` is stringified; treat it as display output, not typed JSON.
- `get_logs(limit=10)` is enough to verify whether an injected event reached the planner/executor.
- `get_llm_trace(limit, turnId?)` gives structured attempt-level trace data (messages, content, reasoning, usage, duration).
- `get_last_prompt` and `get_llm_trace` are compacted for MCP: the system prompt and system-role messages are omitted to reduce token cost.
- If the environment summary shows `"SOMETHING WENT WRONG, YOU SHOULD NOTIFY THE USER OF THIS"`, treat it as degraded runtime context and avoid high-confidence world actions.
## Live Testing Workflow

1. Confirm MCP health:
   - Call `get_state`.
2. Capture a baseline inventory:
   - `execute_repl` with `query.inventory().list().map(i => ({ name: i.name, count: i.count }))`.
3. Trigger a task through the normal cognition path:
   - Call `inject_chat` with a clear instruction (example: "please gather 3 dirt blocks").
4. Verify the execution trace:
   - Call `get_logs(limit=10)` and check for:
     - bot acknowledgement chat
     - action tool feedback (for example `collectBlocks`)
     - planner result summary
   - Call `get_llm_trace(limit=5)` when you need exact model output/reasoning for assertions.
5. Re-check inventory using the same REPL snippet and compare against the baseline.

Use this workflow when validating behavior changes, tool wiring, or regressions in planning/execution loops.
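The baseline/re-check comparison in steps 2 and 5 can be automated with a small diff helper. The item shape mirrors the `{ name, count }` objects produced by the REPL snippet; the function name is illustrative:

```typescript
// Compare two inventory snapshots and report per-item count deltas.
// Item shape matches the REPL snippet's `{ name, count }` output.
interface Item { name: string, count: number }

function diffInventory(baseline: Item[], after: Item[]): Record<string, number> {
  const totals = new Map<string, number>()
  // Subtract baseline counts, add post-task counts; zero means unchanged.
  for (const i of baseline) totals.set(i.name, (totals.get(i.name) ?? 0) - i.count)
  for (const i of after) totals.set(i.name, (totals.get(i.name) ?? 0) + i.count)
  const delta: Record<string, number> = {}
  for (const [name, n] of totals) {
    if (n !== 0) delta[name] = n
  }
  return delta
}

const before: Item[] = [{ name: 'dirt', count: 1 }, { name: 'stick', count: 4 }]
const later: Item[] = [{ name: 'dirt', count: 4 }, { name: 'stick', count: 4 }]
console.log(diffInventory(before, later)) // { dirt: 3 }
```

For the dirt-gathering example, a passing run is simply `delta.dirt >= 3`.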
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
interface:
  display_name: 'Minecraft Debug MCP'
  short_description: 'Operate the live Minecraft debug MCP bot safely'
  default_prompt: 'Start the bot with pnpm dev, connect to the minecraft-debug MCP, inspect brain state/logs, and execute focused debug actions.'
Lines changed: 98 additions & 0 deletions
@@ -0,0 +1,98 @@
# Minecraft Debug MCP Surface

Implementation source: `/Users/rinshinohara/Repo/airi/services/minecraft/src/debug/mcp-repl-server.ts`.

## Endpoint

- Base server: `http://localhost:3001`
- MCP endpoint: `http://localhost:3001/sse`
- SSE fallback endpoint: `GET /sse` + `POST /messages`

The bot starts this server during normal runtime from:
- `/Users/rinshinohara/Repo/airi/services/minecraft/src/cognitive/index.ts`

## Resources

- `brain://state`
  - Summary state: processing, queue length, turn, give-up timer.
- `brain://context`
  - Current context view text.
- `brain://history`
  - Conversation history JSON.
- `brain://logs`
  - Latest LLM log entries JSON (last 50 in resource output).

## Tools

- `get_state()`
  - Returns current REPL/brain state JSON.

- `get_last_prompt()`
  - Returns the latest LLM input JSON.
  - Returns an error when no prompt exists yet.
  - Compacted payload: omits `systemPrompt` and drops `messages` items with `role: "system"`.

- `get_logs(limit?: number)`
  - Returns recent LLM logs; start with small limits.

- `get_llm_trace(limit?: number, turnId?: number)`
  - Returns structured LLM trace entries captured per attempt.
  - Includes: turn/source metadata, messages, generated content, reasoning (if available), token usage, and duration.
  - Use `turnId` to isolate the trace for one injected test event.
  - Compacted payload: drops `messages` items with `role: "system"` to save tokens.

- `execute_repl(code: string)`
  - Executes debug REPL code in the running brain context.
  - Use for focused inspection/action only.

- `inject_chat(username: string, message: string)`
  - Injects a synthetic chat perception event.

- `inject_event(type, payload, source)`
  - `type`: `perception | feedback | world_update | system_alert`
  - `source.type`: `minecraft | airi | system`
  - `source.id`: string
  - Use only with deliberate, test-specific payloads.
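The compaction applied to `get_last_prompt` and `get_llm_trace` reduces to dropping `systemPrompt` and filtering system-role messages, along these lines (the payload/message shapes here are assumptions; the real server code in this commit uses the same `role !== 'system'` filter):

```typescript
// Sketch of the MCP compaction step: strip the system prompt and remove
// system-role messages before serializing. Shapes are assumptions.
interface Message { role: string, content: string }
interface PromptPayload {
  systemPrompt?: string
  messages: Message[]
  [key: string]: unknown
}

function compactPrompt(payload: PromptPayload): Omit<PromptPayload, 'systemPrompt'> {
  const { systemPrompt: _systemPrompt, messages, ...rest } = payload
  return { ...rest, messages: messages.filter(m => m.role !== 'system') }
}

const compact = compactPrompt({
  systemPrompt: 'sys',
  messages: [
    { role: 'system', content: 'sys' },
    { role: 'user', content: 'user' },
  ],
  attempt: 1,
})
console.log(JSON.stringify(compact))
// {"attempt":1,"messages":[{"role":"user","content":"user"}]}
```

This is why automation checks can assert the serialized payload never contains `"role":"system"` or the key `systemPrompt`.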
## Troubleshooting

- Connection refused:
  - Ensure `pnpm dev` is running in the service directory.
  - Confirm logs include `MCP REPL server running at http://localhost:3001`.
- 404/invalid endpoint:
  - Use `/sse` as the MCP entrypoint.
- Empty prompt/logs:
  - Trigger activity first (for example via `inject_chat`) and retry `get_last_prompt` or `get_logs`.

## Live-Tested Behavior Notes

- `inject_chat` is not a passive write: it enters the normal cognition pipeline and can cause the bot to send chat/actions.
- `get_last_prompt` may be very large (full system prompt + history); avoid repeated calls unless needed.
- `get_last_prompt` is now MCP-compacted (no raw system prompt text), which makes it cheaper for automation checks.
- `execute_repl` responses include metadata (`source`, `durationMs`, `actions`, `logs`) and a stringified `returnValue`.
- Log verification pattern that worked reliably:
  1. `inject_chat(...)`
  2. `get_logs(limit: 10)`
  3. Confirm sequence: `turn_input` -> `llm_attempt` -> `feedback` -> `planner_result`
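The sequence confirmation in step 3 can be automated with a subsequence check over the log entry types (the log shape is assumed; real entries carry more fields, and other types may be interleaved):

```typescript
// Check that the expected log types appear in order, allowing unrelated
// entries in between, e.g. turn_input -> llm_attempt -> feedback -> planner_result.
function hasOrderedSequence(logTypes: string[], expected: string[]): boolean {
  let next = 0
  for (const type of logTypes) {
    if (type === expected[next]) next++
    if (next === expected.length) return true
  }
  return next === expected.length
}

const observed = ['turn_input', 'llm_attempt', 'tool_call', 'feedback', 'planner_result']
console.log(hasOrderedSequence(
  observed,
  ['turn_input', 'llm_attempt', 'feedback', 'planner_result'],
)) // true
```

Feed it the `type` fields parsed out of a `get_logs(limit: 10)` response after an `inject_chat`.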
## Repeatable Smoke Test Recipe

Use this exact sequence for fast live validation:

1. Baseline
   - `get_state()`
   - `execute_repl("query.inventory().list().map(i => ({ name: i.name, count: i.count }))")`
2. Task trigger
   - `inject_chat({ username: "codex-live-test", message: "please gather 3 dirt blocks" })`
3. Execution proof
   - `get_logs({ limit: 10 })`
     - Expect acknowledgement chat + `collectBlocks` success feedback + planner summary.
   - `get_llm_trace({ limit: 5 })`
     - Assert expected LLM behavior (for example response code, or repeated `await skip()`).
     - Assert the trace payload does not include `role: "system"` entries.
4. Outcome proof
   - Run the same inventory `execute_repl` call again and compare item counts.

## Runtime Caveat Seen Live

- If a turn includes `Environment: SOMETHING WENT WRONG, YOU SHOULD NOTIFY THE USER OF THIS`, treat the world snapshot as degraded and avoid issuing risky autonomous actions until context stabilizes.
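A guard for this caveat can be as simple as scanning the environment summary before allowing risky actions. The marker string is taken verbatim from the notes above; the function name is illustrative:

```typescript
// Marker string observed live in degraded turns (copied from the notes above).
const DEGRADED_MARKER = 'SOMETHING WENT WRONG, YOU SHOULD NOTIFY THE USER OF THIS'

// Returns false when the environment summary signals degraded runtime
// context, so callers can hold off on high-risk autonomous actions.
function isWorldContextTrusted(environmentSummary: string): boolean {
  return !environmentSummary.includes(DEGRADED_MARKER)
}

console.log(isWorldContextTrusted('Environment: biome=plains, time=day')) // true
console.log(isWorldContextTrusted(`Environment: ${DEGRADED_MARKER}`)) // false
```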

services/minecraft/src/cognitive/conscious/brain.ts

Lines changed: 54 additions & 0 deletions

```diff
@@ -75,6 +75,26 @@ interface LlmInputSnapshot {
   attempt: number
 }

+interface LlmTraceEntry {
+  id: number
+  turnId: number
+  timestamp: number
+  eventType: string
+  sourceType: string
+  sourceId: string
+  attempt: number
+  model: string
+  messages: Message[]
+  content: string
+  reasoning?: string
+  usage?: {
+    prompt_tokens?: number
+    completion_tokens?: number
+    total_tokens?: number
+  }
+  durationMs: number
+}
+
 interface RuntimeInputEnvelope {
   id: number
   turnId: number
@@ -137,6 +157,8 @@ export class Brain {
   private runtimeMineflayer: MineflayerWithAgents | null = null
   private readonly llmLogEntries: LlmLogEntry[] = []
   private llmLogIdCounter = 0
+  private readonly llmTraceEntries: LlmTraceEntry[] = []
+  private llmTraceIdCounter = 0
   private turnCounter = 0
   private currentInputEnvelope: RuntimeInputEnvelope | null = null
   private readonly llmLogRuntime = createLlmLogRuntime(() => this.llmLogEntries)
@@ -277,6 +299,20 @@ export class Brain {
     return entries.slice(-Math.floor(limit))
   }

+  public getLlmTrace(limit?: number, turnId?: number): LlmTraceEntry[] {
+    let entries = [...this.llmTraceEntries]
+    if (typeof turnId === 'number' && Number.isFinite(turnId)) {
+      const normalizedTurnId = Math.floor(turnId)
+      entries = entries.filter(entry => entry.turnId === normalizedTurnId)
+    }
+
+    if (typeof limit === 'number' && Number.isFinite(limit) && limit > 0) {
+      entries = entries.slice(-Math.floor(limit))
+    }
+
+    return JSON.parse(JSON.stringify(entries)) as LlmTraceEntry[]
+  }
+
   public async injectDebugEvent(event: BotEvent): Promise<void> {
     if (!this.runtimeMineflayer) {
       throw new Error('Brain runtime is not initialized yet')
@@ -636,6 +672,24 @@ export class Brain {
       model: config.openai.model,
       duration: Date.now() - traceStart,
     })
+    this.llmTraceEntries.push({
+      id: ++this.llmTraceIdCounter,
+      turnId,
+      timestamp: Date.now(),
+      eventType: event.type,
+      sourceType: event.source.type,
+      sourceId: event.source.id,
+      attempt,
+      model: config.openai.model,
+      messages: this.cloneMessages(messages),
+      content,
+      reasoning,
+      usage: llmResult.usage,
+      durationMs: Date.now() - traceStart,
+    })
+    if (this.llmTraceEntries.length > 500) {
+      this.llmTraceEntries.shift()
+    }
     this.currentInputEnvelope.llm = {
       attempt,
       model: config.openai.model,
```

services/minecraft/src/debug/mcp-repl-server.test.ts

Lines changed: 38 additions & 2 deletions

```diff
@@ -53,8 +53,27 @@ describe('mcpReplServer', () => {
       executeDebugRepl: vi.fn().mockResolvedValue({ result: 'success' }),
       injectDebugEvent: vi.fn().mockResolvedValue(undefined),
       getReplState: vi.fn().mockReturnValue({ variables: [], updatedAt: 0 }),
-      getLastLlmInput: vi.fn().mockReturnValue({ systemPrompt: 'sys', userMessage: 'user' }),
+      getLastLlmInput: vi.fn().mockReturnValue({
+        systemPrompt: 'sys',
+        userMessage: 'user',
+        messages: [
+          { role: 'system', content: 'sys' },
+          { role: 'user', content: 'user' },
+        ],
+        conversationHistory: [],
+        updatedAt: 0,
+        attempt: 1,
+      }),
       getLlmLogs: vi.fn().mockReturnValue([{ id: 1, text: 'log' }]),
+      getLlmTrace: vi.fn().mockReturnValue([{
+        id: 1,
+        turnId: 1,
+        content: 'await skip()',
+        messages: [
+          { role: 'system', content: 'sys' },
+          { role: 'user', content: 'u' },
+        ],
+      }]),
     } as unknown as Brain

     server = new McpReplServer(brain)
@@ -75,6 +94,7 @@ describe('mcpReplServer', () => {
     expect(mocks.tool).toHaveBeenCalledWith('get_state', expect.anything(), expect.any(Function))
     expect(mocks.tool).toHaveBeenCalledWith('get_last_prompt', expect.anything(), expect.any(Function))
     expect(mocks.tool).toHaveBeenCalledWith('get_logs', expect.anything(), expect.any(Function))
+    expect(mocks.tool).toHaveBeenCalledWith('get_llm_trace', expect.anything(), expect.any(Function))
   })

   it('executes repl via tool handler', async () => {
@@ -117,9 +137,12 @@ describe('mcpReplServer', () => {
     const handler = toolCall[2]

     const result = await handler({})
+    const text = result.content[0].text as string

     expect(brain.getLastLlmInput).toHaveBeenCalled()
-    expect(result.content[0].text).toContain('sys')
+    expect(text).toContain('user')
+    expect(text).not.toContain('systemPrompt')
+    expect(text).not.toContain('"role":"system"')
   })

   it('gets logs via tool handler', async () => {
@@ -131,4 +154,17 @@ describe('mcpReplServer', () => {
     expect(brain.getLlmLogs).toHaveBeenCalledWith(10)
     expect(result.content[0].text).toContain('log')
   })
+
+  it('gets llm trace via tool handler', async () => {
+    const toolCall = mocks.tool.mock.calls.find(call => call[0] === 'get_llm_trace')
+    const handler = toolCall[2]
+
+    const result = await handler({ limit: 5, turnId: 3 })
+    const text = result.content[0].text as string
+
+    expect(brain.getLlmTrace).toHaveBeenCalledWith(5, 3)
+    expect(text).toContain('await skip()')
+    expect(text).toContain('"role":"user"')
+    expect(text).not.toContain('"role":"system"')
+  })
 })
```

services/minecraft/src/debug/mcp-repl-server.ts

Lines changed: 29 additions & 1 deletion

```diff
@@ -198,8 +198,17 @@ export class McpReplServer {
             isError: true,
           }
         }
+        const {
+          systemPrompt: _systemPrompt,
+          messages,
+          ...rest
+        } = result
+        const compactMessages = messages.filter(message => message.role !== 'system')
         return {
-          content: [{ type: 'text', text: JSON.stringify(result) }],
+          content: [{ type: 'text', text: JSON.stringify({
+            ...rest,
+            messages: compactMessages,
+          }) }],
         }
       },
     )
@@ -216,6 +225,25 @@ export class McpReplServer {
         }
       },
     )
+
+    this.mcpServer.tool(
+      'get_llm_trace',
+      {
+        limit: z.number().optional(),
+        turnId: z.number().optional(),
+      },
+      async ({ limit, turnId }) => {
+        const result = this.brain
+          .getLlmTrace(limit, turnId)
+          .map(entry => ({
+            ...entry,
+            messages: entry.messages.filter(message => message.role !== 'system'),
+          }))
+        return {
+          content: [{ type: 'text', text: JSON.stringify(result) }],
+        }
+      },
+    )
   }

   start(): void {
```
