Summary
Add support for returning images (and other binary content) from tool results, enabling a fetch_screenshot tool that returns page screenshots for Claude to interpret visually.
Background
While implementing Chrome profile support for fetch_html, we discovered that:
- The readability algorithm strips content from feed-style pages (Twitter/X, etc.)
- Screenshots would be ideal for visual content Claude can interpret directly
- chromiumoxide supports full-page screenshots via page.screenshot()
Current Limitation
The genai library's ToolResponse only supports string content:
// lib/genai/src/chat/tool/tool_response.rs
pub struct ToolResponse {
    pub call_id: String,
    pub content: String, // <-- String only
}
And the Anthropic adapter serializes it as a simple string:
// lib/genai/src/adapter/adapters/anthropic/adapter_impl.rs:570-574
values.push(json!({
    "type": "tool_result",
    "content": tool_response.content, // <-- Just a string
    "tool_use_id": tool_response.call_id,
}));
What Anthropic API Actually Supports
Anthropic's API accepts rich content in tool_result, including images:
{
  "type": "tool_result",
  "tool_use_id": "toolu_...",
  "content": [
    {
      "type": "image",
      "source": {
        "type": "base64",
        "media_type": "image/png",
        "data": "iVBORw0KGgo..."
      }
    },
    {
      "type": "text",
      "text": "Screenshot of the page"
    }
  ]
}
Proposed Changes
1. Modify genai's ToolResponse
Change content from String to support rich content:
pub struct ToolResponse {
    pub call_id: String,
    pub content: ToolResponseContent,
}

pub enum ToolResponseContent {
    Text(String),
    Parts(Vec<ContentPart>),
}

Or simply use MessageContent:

pub struct ToolResponse {
    pub call_id: String,
    pub content: MessageContent,
}
2. Update Anthropic Adapter
Serialize images properly when present in tool results.
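A minimal sketch of what that serialization could look like, written against the standard library only so it stands alone. `ToolContent`, `Part`, `json_escape`, `part_json`, and `tool_result_json` are hypothetical names for illustration, not genai's actual types or functions; the real adapter would more likely keep building the value with the `json!` macro as in the snippet above. Plain text is kept as a bare string so existing tool results serialize exactly as before.

```rust
// Hypothetical stand-ins for illustration; genai's real types differ.
pub enum Part {
    Text(String),
    /// Base64-encoded image data plus its MIME type.
    Image { media_type: String, data: String },
}

pub enum ToolContent {
    Text(String),
    Parts(Vec<Part>),
}

/// Escape a string for use inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Serialize one content block in the shape shown in the API example above.
fn part_json(part: &Part) -> String {
    match part {
        Part::Text(t) => format!(r#"{{"type":"text","text":"{}"}}"#, json_escape(t)),
        Part::Image { media_type, data } => format!(
            r#"{{"type":"image","source":{{"type":"base64","media_type":"{}","data":"{}"}}}}"#,
            json_escape(media_type),
            json_escape(data)
        ),
    }
}

/// Build a tool_result block: a bare string for text, an array for rich content.
pub fn tool_result_json(call_id: &str, content: &ToolContent) -> String {
    let content_json = match content {
        ToolContent::Text(t) => format!("\"{}\"", json_escape(t)),
        ToolContent::Parts(parts) => {
            let items: Vec<String> = parts.iter().map(part_json).collect();
            format!("[{}]", items.join(","))
        }
    };
    format!(
        r#"{{"type":"tool_result","tool_use_id":"{}","content":{}}}"#,
        json_escape(call_id),
        content_json
    )
}
```

Branching on the variant rather than always emitting an array keeps the text-only path byte-for-byte compatible with what the adapter sends today.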
3. Update Codey's Agent
Change submit_tool_result signature:
// Current
pub fn submit_tool_result(&mut self, call_id: &str, content: String)
// New
pub fn submit_tool_result(&mut self, call_id: &str, content: impl Into<MessageContent>)
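The point of `impl Into<MessageContent>` is that existing callers passing a `String` keep compiling unchanged. The sketch below illustrates this with hypothetical stand-ins: `MessageContent`, `ContentPart`, and `Agent` here are simplified placeholders, not the real genai or Codey types, and the `results` accessor is added only for the example.

```rust
// Simplified placeholders; the real genai types have more variants and helpers.
#[derive(Debug, Clone, PartialEq)]
pub enum ContentPart {
    Text(String),
    /// Base64-encoded binary payload with its MIME type.
    Binary { media_type: String, data: String },
}

#[derive(Debug, Clone, PartialEq)]
pub enum MessageContent {
    Text(String),
    Parts(Vec<ContentPart>),
}

impl From<String> for MessageContent {
    fn from(s: String) -> Self {
        MessageContent::Text(s)
    }
}

impl From<&str> for MessageContent {
    fn from(s: &str) -> Self {
        MessageContent::Text(s.to_string())
    }
}

impl From<Vec<ContentPart>> for MessageContent {
    fn from(parts: Vec<ContentPart>) -> Self {
        MessageContent::Parts(parts)
    }
}

/// Placeholder agent that only records submitted tool results.
pub struct Agent {
    pending: Vec<(String, MessageContent)>,
}

impl Agent {
    pub fn new() -> Self {
        Agent { pending: Vec::new() }
    }

    /// Accepts anything convertible into MessageContent, so String callers are unaffected.
    pub fn submit_tool_result(&mut self, call_id: &str, content: impl Into<MessageContent>) {
        self.pending.push((call_id.to_string(), content.into()));
    }

    /// Accessor used only by this example to inspect what was recorded.
    pub fn results(&self) -> &[(String, MessageContent)] {
        &self.pending
    }
}
```

With these conversions in place, a text-only tool passes a `String` as before, while a screenshot tool passes a `Vec<ContentPart>` holding the image and an optional caption.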
4. Add fetch_screenshot Tool
pub async fn fetch_screenshot(url: &str) -> Result<Vec<u8>, String> {
    // Uses same browser infrastructure as fetch_html
    // Returns PNG bytes
    todo!()
}
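Because the API's image block carries base64 text rather than raw bytes, the PNG bytes returned by fetch_screenshot have to be encoded before they go into a tool result. The sketch below is a standard-library-only encoder for illustration; `base64_encode` is a hypothetical helper, and in practice an existing base64 crate would likely be used instead.

```rust
/// Standard base64 alphabet (RFC 4648), with '=' used for padding.
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Encode raw bytes (for example PNG data) as padded base64 text.
pub fn base64_encode(bytes: &[u8]) -> String {
    let mut out = String::with_capacity((bytes.len() + 2) / 3 * 4);
    for chunk in bytes.chunks(3) {
        // Pack up to three bytes into a 24-bit group, zero-filling missing bytes.
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;

        // Emit four 6-bit symbols, replacing unused trailing symbols with '='.
        out.push(ALPHABET[(n >> 18) as usize & 63] as char);
        out.push(ALPHABET[(n >> 12) as usize & 63] as char);
        if chunk.len() > 1 {
            out.push(ALPHABET[(n >> 6) as usize & 63] as char);
        } else {
            out.push('=');
        }
        if chunk.len() > 2 {
            out.push(ALPHABET[n as usize & 63] as char);
        } else {
            out.push('=');
        }
    }
    out
}
```

Encoding the eight-byte PNG signature this way yields a string beginning with iVBORw0KGgo, the same prefix that appears in the API example above.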
Use Cases
- Twitter/X feeds - Readability strips most content; screenshots preserve full context
- Dashboards - Visual layouts don't convert well to markdown
- Charts/graphs - Better interpreted visually
- Any SPA - Complex rendered content
Related
References
- Anthropic Tool Use Docs
- genai ContentPart already supports Binary: ContentPart::from_binary_base64("image/png", data, None)
- chromiumoxide screenshot: page.screenshot(ScreenshotParams::builder().full_page(true).build())