- Cerebras 120B: Outputting raw `{"tool":"browser_navigate","params":{...}}` as visible text
- Llama 3: Reporting "this model produced incoherent output" even though it was working fine (it created files successfully)
- DeepSeek R1: "No response generated" after 51 lines of thinking
- Tool execution: Reports success, but the created folders/files are empty
- Tool JSON: Appearing inline without code fences
- Merged/distill models: Producing genuinely garbled output, but also triggering false positives
- Location: `main/llmEngine.js` lines 693-720
- Problem: Aggressive heuristics (alpha ratio < 0.25, special chars > 0.12, etc.) were ABORTING valid model output
- Evidence: Llama created files successfully but detection killed it
- Fix: REMOVED ENTIRELY - Let models output naturally, no abort logic
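The false positives are easy to reproduce. The sketch below is an illustrative reconstruction, not the deleted code: only the two thresholds come from these notes, while the ratio definitions (letters and non-alphanumeric characters, each divided by the count of non-whitespace characters) are assumptions. Under those assumptions, a perfectly valid tool call trips the special-character check.

```javascript
// Illustrative reconstruction, NOT the deleted llmEngine.js code.
// Assumption: both ratios are computed over non-whitespace characters.
function gibberishHeuristic(text) {
  const chars = text.replace(/\s/g, '');
  if (chars.length === 0) {
    return { alphaRatio: 0, specialRatio: 0, flagged: false };
  }
  const letters = (chars.match(/[A-Za-z]/g) || []).length;
  const specials = (chars.match(/[^A-Za-z0-9]/g) || []).length;
  const alphaRatio = letters / chars.length;
  const specialRatio = specials / chars.length;
  // Thresholds from the notes: alpha ratio < 0.25 or special chars > 0.12.
  return { alphaRatio, specialRatio, flagged: alphaRatio < 0.25 || specialRatio > 0.12 };
}

const toolCall = '{"tool":"write_file","params":{"path":"src/index.js","content":"x"}}';
console.log(gibberishHeuristic(toolCall).flagged); // true: roughly 40% of the characters are punctuation
console.log(gibberishHeuristic('The file was created successfully.').flagged); // false
```

JSON, code, and paths are punctuation-heavy by nature, so any fixed special-character threshold low enough to catch real noise will also fire on structured output like this.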
- Location: `main/llmEngine.js` lines 267-273, plus 4 other session creation points
- Problem: Forcing `ChatMLChatWrapper` on Qwen-family models OVERRODE their native GGUF chat template
- Evidence: Merged models likely have correct templates in their GGUF metadata, so forcing ChatML corrupted the output
- Fix: REMOVED ENTIRELY - Let node-llama-cpp use the native GGUF chat template
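One way to keep the native template is to never set a wrapper unless the user explicitly chose one. The helper below is hypothetical (it is not in the codebase), and it assumes that in node-llama-cpp v3, leaving `chatWrapper` unset on `LlamaChatSession` (its default is `"auto"`, as we understand it) lets the library resolve the wrapper from the model's GGUF metadata.

```javascript
// Hypothetical helper: builds the options object for node-llama-cpp's
// LlamaChatSession. Assumption: omitting `chatWrapper` ("auto" by default)
// makes the library pick the wrapper from the model's GGUF metadata.
function buildChatSessionOptions(contextSequence, settings = {}) {
  const options = { contextSequence };
  // Pass a wrapper only if the user explicitly picked one in settings;
  // never infer ChatML from the model family name.
  if (settings.chatWrapper !== undefined) {
    options.chatWrapper = settings.chatWrapper;
  }
  return options;
}

// Usage inside the engine (needs a loaded model, so shown as a comment):
//   const session = new LlamaChatSession(
//     buildChatSessionOptions(context.getSequence(), userSettings));
```

Routing all session creation points through a single helper like this would also keep them from drifting apart.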
- Location: `electron-main.js` lines 1341-1347 (cloud loop) and `electron-main.js` lines 2107-2113 (local loop)
- Problem:
- Stripped ALL fenced `json`/`tool` blocks from the accumulated response
- Stripped ALL bare `{"tool":...}` JSON with a regex
- Stripped tool result summaries like "browser_navigate done"
- This DESTROYED legitimate response content, leaving empty strings
- Evidence: DeepSeek thought for 51 lines and then showed "No response generated", meaning the thinking content had been stripped out
- Fix: REMOVED AGGRESSIVE STRIPPING - Only collapse excessive newlines
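After the fix, the only display cleanup is collapsing runs of blank lines. The function name below is hypothetical, and treating "excessive" as three or more consecutive newlines is an assumption. The point is that thinking text and tool JSON pass through untouched.

```javascript
// Hypothetical helper: the only display cleanup kept after the fix.
// Assumption: "excessive" means 3+ consecutive newlines, collapsed to one
// blank line. No other content is removed.
function collapseExcessiveNewlines(text) {
  return text.replace(/\n{3,}/g, '\n\n');
}

const raw = 'Let me think...\n\n\n\n\nStep 1 done.\n```json\n{"tool":"write_file","params":{}}\n```';
console.log(collapseExcessiveNewlines(raw));
```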
- Location: `main/llmEngine.js` lines 342-360 (`_getModelSpecificParams`)
- Problem:
- Merged/distill models forced to temp=0.2, topK=10 (extremely conservative)
- Small models (≤3B) forced to temp=0.5, topK=15
- This KILLED creativity and made models produce repetitive/garbage output
- Evidence: Near-greedy sampling (very low temperature, small topK) is prone to repetition loops and degenerate output in many models
- Fix: REMOVED ALL OVERRIDES - Let users control params via settings
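To see why the old overrides were so restrictive, the sketch below applies the standard definitions of the two knobs: a softmax over logits divided by temperature, then keeping only the k most likely tokens and renormalizing. It is not node-llama-cpp's sampler. The logits are made up, and 0.8/40 stands in for an arbitrary looser user setting.

```javascript
// Textbook temperature scaling + top-k truncation over a toy vocabulary.
// Made-up logits; real vocabularies have tens of thousands of tokens,
// where topK=10 would discard almost all of them.
function samplingDistribution(logits, temperature, topK) {
  // Divide logits by temperature; lower temperature sharpens the distribution.
  const scaled = logits.map((l) => l / temperature);
  const max = Math.max(...scaled); // subtract the max for numerical stability
  const exps = scaled.map((s) => Math.exp(s - max));
  const total = exps.reduce((a, b) => a + b, 0);
  const probs = exps.map((e) => e / total);

  // Keep only the topK most probable tokens, then renormalize.
  const ranked = probs.map((p, i) => [p, i]).sort((a, b) => b[0] - a[0]);
  const keep = new Set(ranked.slice(0, topK).map(([, i]) => i));
  const kept = probs.map((p, i) => (keep.has(i) ? p : 0));
  const keptTotal = kept.reduce((a, b) => a + b, 0);
  return kept.map((p) => p / keptTotal);
}

const logits = [2.0, 1.5, 1.0, 0.5, 0.0];
const forced = samplingDistribution(logits, 0.2, 10);  // old merged-model override
const relaxed = samplingDistribution(logits, 0.8, 40); // illustrative looser setting
console.log(forced[0].toFixed(3), relaxed[0].toFixed(3)); // prints 0.918 0.486
```

At temp=0.2 the top token takes about 92% of the probability mass, so the model keeps choosing the same continuation. This is the mechanism behind repetitive output. The toy vocabulary is too small for topK to matter here, and only temperature drives the difference.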
- Before: Tool JSON was stripped after parsing, but only from code fences
- After: No stripping → the UI will parse and display tool calls correctly
- Root Cause: Models were not using code fences, but the parser DOES handle bare JSON
- Before: Gibberish detection aborted generation mid-stream
- After: No detection → the model runs to completion naturally
- Root Cause: Heuristics were too aggressive (false positives on valid JSON/technical output)
- Before: Thinking content was stripped by the tool JSON regex
- After: Only newlines are collapsed → thinking is preserved
- Root Cause: Aggressive stripping removed legitimate response text
- Needs Investigation: This may be a separate file system permissions issue
- Action Required: Test actual file creation with the fixed models
- Before: The parser handled bare JSON, but stripping missed it → it showed as text
- After: No stripping → the UI renders tool calls correctly from parsed data
- Root Cause: Two problems: models did not use fences, and stripping did not catch inline JSON
- Before: ChatML override + conservative sampling forced poor output
- After: Native GGUF template + user-controlled sampling
- Root Cause: Forcing the wrong chat template + killing sampling diversity
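The fixes assume the parser already accepts bare JSON. The sketch below shows one way this can work. It is not the project's parser, and the function names and the `{ tool, params }` shape (taken from the Cerebras example) are assumptions. The scanner ignores fences entirely. It finds each `{`, matches braces while skipping string literals, and keeps any object that parses with a string `tool` field. The returned offsets let the UI render a call from parsed data instead of showing raw text.

```javascript
// Hypothetical sketch: extract tool calls whether or not they are fenced.
// Returns the index of the brace closing the one at `start`, or -1.
function matchBrace(text, start) {
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      // Inside a JSON string: braces don't count; honor backslash escapes.
      if (escaped) escaped = false;
      else if (ch === '\\') escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === '{') depth++;
    else if (ch === '}' && --depth === 0) return i;
  }
  return -1;
}

function extractToolCalls(text) {
  const calls = [];
  let i = text.indexOf('{');
  while (i !== -1) {
    const end = matchBrace(text, i);
    if (end !== -1) {
      try {
        const obj = JSON.parse(text.slice(i, end + 1));
        if (obj && typeof obj.tool === 'string') {
          calls.push({ tool: obj.tool, params: obj.params ?? {}, start: i, end: end + 1 });
          i = text.indexOf('{', end + 1); // skip the nested params object
          continue;
        }
      } catch {
        // Not valid JSON (e.g. prose braces): keep scanning.
      }
    }
    i = text.indexOf('{', i + 1);
  }
  return calls;
}

const sample = 'Sure.\n```json\n{"tool":"write_file","params":{"path":"a.txt"}}\n```\nNow {"tool":"browser_navigate","params":{"url":"https://example.com"}}';
console.log(extractToolCalls(sample).map((c) => c.tool)); // [ 'write_file', 'browser_navigate' ]
```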
- Test Cerebras 120B with tool calls (browser navigation)
- Test local Llama 3 with file creation (verify no "incoherent" abort)
- Test DeepSeek R1 thinking model (verify response after reasoning)
- Test merged/distill models (check output quality improvement)
- Verify tool execution creates actual files on disk
- Check UI tool call rendering (both fenced and bare JSON)
- `main/llmEngine.js`:
- Removed gibberish detection (lines 693-720 deleted)
- Removed ChatML wrapper forcing (lines 267-273, multiple session creations)
- Removed model-specific param overrides (lines 342-360 simplified)
- `electron-main.js`:
- Removed aggressive tool JSON stripping (cloud loop lines 1341-1347)
- Removed aggressive tool JSON stripping (local loop lines 2107-2113)
- Kept only newline collapse for clean display
OLD APPROACH:
- Detect problems → abort/strip → band-aid UI fixes
- Override model behavior → force conservative sampling
- Assume models need hand-holding
NEW APPROACH:
- Trust models to output correctly with native templates
- Let users control sampling parameters
- Parser handles multiple formats (fenced + bare JSON)
- UI displays parsed tool data, not raw text
- Only minimal cleanup (whitespace), no content removal
- Tool execution failures: Check file system permissions and verify the `write_file` IPC handler
- UI showing raw JSON: Verify the ChatPanel tool parsing logic in `renderContentParts()`
- Actual garbage output: May require adjusting the user's temperature/topP in settings, not code overrides
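For the "claims success but folders/files are empty" symptom, one option is a handler that reports success only after reading the file back. This is a minimal sketch built on Node's `fs/promises` and `path`. The function name, the `{ filePath, content }` argument shape, and the return fields are all hypothetical.

```javascript
const fs = require('node:fs/promises');
const path = require('node:path');

// Hypothetical write_file handler body: success is reported only after the
// file is read back and matches, so "success but empty file" cannot slip by.
async function handleWriteFile({ filePath, content }) {
  if (typeof filePath !== 'string' || filePath.length === 0) {
    return { success: false, error: 'filePath must be a non-empty string' };
  }
  if (typeof content !== 'string') {
    return { success: false, error: 'content must be a string' };
  }
  try {
    // Create missing parent folders; recursive mkdir is a no-op if they exist.
    await fs.mkdir(path.dirname(filePath), { recursive: true });
    await fs.writeFile(filePath, content, 'utf8');
    // Verify what actually landed on disk.
    const written = await fs.readFile(filePath, 'utf8');
    if (written !== content) {
      return { success: false, error: 'content on disk does not match' };
    }
    const { size } = await fs.stat(filePath);
    return { success: true, path: filePath, bytes: size };
  } catch (err) {
    // Permission problems surface here, typically as EACCES or EPERM.
    return { success: false, error: `${err.code ?? 'ERR'}: ${err.message}` };
  }
}
```

In Electron it could be registered with `ipcMain.handle(channel, (_event, args) => handleWriteFile(args))`, using whatever channel name the existing `write_file` handler already uses.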