feat: add tool calling support to m serve #850
markstur wants to merge 18 commits into generative-computing:main
Conversation
The PR description has been updated. Please fill out the template for your PR to be reviewed.
@markstur Do you want review comments yet, or is this still WIP?
Comments would be great! It is a draft because I need to do more review/testing myself on the generated code. I don't want to waste your time, but early comments would be very welcome.
psschwei
left a comment
Code Review: feat: add tool calling support to m serve
Good feature PR — the core plumbing is correct and the OpenAI-compatible response format looks right. A couple of bugs to fix before merge, plus some improvements.
Summary
The implementation correctly wires tool calling through the serve endpoint: tools maps to ModelOption.TOOLS, tool_choice passes through as-is, and the response extracts tool calls from ModelOutputThunk into the OpenAI format. The Pydantic models mirror the OpenAI types well, and tests cover the main paths.
Two bugs need fixing (see inline comments):
- Empty `tool_calls` dict produces incorrect `finish_reason: "tool_calls"` with an empty array
- Client example's multi-turn loop duplicates the assistant message for each tool call
Other improvements (see inline comments):
- Unused loop variable `tool_name`
- `eval()` in example code with `# noqa` suppressing the security lint for copy-pasters
- Missing test for the empty dict edge case
- `hasattr` check is always true for `ModelOutputThunk` — defensive but masks upstream bugs
What's working well
- Pydantic models (`ToolCallFunction`, `ChatCompletionMessageToolCall`) closely match OpenAI types
- `_build_model_options` change is clean — `tools` removed from exclusion set, mapped to `ModelOption.TOOLS`
- 8 well-structured tests covering single/multiple tool calls, finish reasons, model_options passthrough, complex args, usage info, and backward compat
- Existing test updated consistently from "excluded" to "passed"
planetf1
left a comment
Two additional items not yet covered in existing review comments.
Fixed all these.
planetf1
left a comment
Most issues are fine. I just have one concern: I don't think the finish reason is what you intended for streaming?
With the CI fix and rebase, this is ready again. All feedback is addressed, including 1 issue created.
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
…y dict
Fixed the bug where an empty tool_calls dict ({}) incorrectly produced finish_reason="tool_calls" with an empty array instead of finish_reason="stop" with tool_calls=None.
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
…xample
Issue: The assistant message was being added inside the loop for each tool call, causing duplication when multiple tool calls were present.
Fix: Moved the assistant message append outside the loop (before processing tool calls), so it's only added once. Now the loop only adds tool responses.
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
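A minimal sketch of the corrected history-building loop. The message shapes follow the OpenAI chat format; the function name and sample data are hypothetical:

```python
def extend_history(messages, assistant_msg, tool_results):
    """Append the assistant message once, then one tool message per call.

    tool_results maps tool_call_id -> tool output string. Appending the
    assistant message inside the loop would duplicate it per tool call.
    """
    messages.append(assistant_msg)  # once, before the loop
    for call_id, result in tool_results.items():
        messages.append({"role": "tool", "tool_call_id": call_id, "content": result})
    return messages

history = extend_history(
    [{"role": "user", "content": "weather and stock?"}],
    {"role": "assistant", "tool_calls": ["c1", "c2"]},
    {"c1": "sunny", "c2": "189.84"},
)
```

With two tool calls, the history ends up with exactly one assistant message followed by two tool messages.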
The dict key tool_name is never used — the function name comes from model_tool_call.name. Using .values() instead.
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
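An illustrative before/after of that loop change; the data below is made up:

```python
tool_calls = {
    "get_weather": {"name": "get_weather", "arguments": "{}"},
    "get_stock_price": {"name": "get_stock_price", "arguments": "{}"},
}

# Before: `for tool_name, model_tool_call in tool_calls.items():`
# left tool_name unused, since the name is already on the value.
names = [call["name"] for call in tool_calls.values()]
```

Iterating `.values()` drops the unused loop variable that the linter flagged.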
Replaced hasattr() with direct __dict__ membership tests to correctly distinguish:
1. Typed instances (ModelOutputThunk[float](...)) - have __orig_class__ in their instance dict
2. Untyped instances (ModelOutputThunk(...)) - do NOT have __orig_class__ in their instance dict
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
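The distinction can be demonstrated with a stand-in generic class (ModelOutputThunk itself is not reproduced here):

```python
from typing import Generic, TypeVar

T = TypeVar("T")

class Thunk(Generic[T]):  # stand-in for ModelOutputThunk
    pass

typed = Thunk[float]()  # typing machinery records __orig_class__ on the instance
untyped = Thunk()       # plain instantiation records nothing

# Direct instance-dict membership is precise and cannot be satisfied by
# class-level attributes or descriptors the way hasattr() can.
assert "__orig_class__" in typed.__dict__
assert "__orig_class__" not in untyped.__dict__
```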
Security issue resolved in `m_serve_example_tool_calling.py`:
**Changes made:**
- Replaced `CalculatorTool` (which used unsafe `eval()` with `# noqa: S307`) with `GetStockPriceTool`
- New tool demonstrates API-calling pattern with mock stock prices (AAPL, GOOGL, MSFT, TSLA)
- Updated all references: `calculator_tool` → `stock_price_tool`
- Maintains the same tool calling demonstration with two tools (weather + stock price)
**Why this is better:**
- Eliminates security risk entirely (no `eval()` or suppressed lints)
- Still demonstrates multiple tools effectively
- Uses safe, realistic API-calling pattern that users can copy
- No dangerous code that could be copy-pasted into production
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
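A hedged sketch of what the replacement tool body might look like. The tickers come from the commit message, but the prices and function shape are invented here, not taken from the repo:

```python
# Mock data stands in for a real quote API; nothing here can execute
# arbitrary input, unlike the removed eval()-based calculator.
MOCK_PRICES = {"AAPL": 189.84, "GOOGL": 141.80, "MSFT": 417.32, "TSLA": 175.79}

def get_stock_price(ticker: str) -> str:
    price = MOCK_PRICES.get(ticker.upper())
    if price is None:
        return f"No mock price for {ticker!r}"
    return f"{ticker.upper()}: ${price:.2f}"
```

A lookup table is the safe pattern for a copy-pasteable example: the input only ever selects from known data.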
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
The pass-through behavior was not clear enough, so this adds it to ModelOptions, where important options are known. Most of these are sentinels which are removed (because @@@), but this one is like TEMPERATURE, which is passed through to the backends. No behavior change, but this gives a handy constant and a place to look for these. This does not address all the other possible pass-through args.
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Assisted-by: IBM Bob
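The shape of the change is roughly the following; the constant values are illustrative, since the actual ModelOption string values are not shown in this PR:

```python
class ModelOption:
    # Pass-through options: forwarded to backends unchanged, rather than
    # stripped as sentinels before the backend call.
    TEMPERATURE = "temperature"
    TOOLS = "tools"
    TOOL_CHOICE = "tool_choice"  # new: first-class, like TEMPERATURE
```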
- switch server example to OpenAIBackend
- align tool-calling example with tested Granite model setup
- narrow advertised tools when `tool_choice` selects a specific function
- enable `tool_calls=True` in the serve path
- replace calculator example with stock-price tool
- examples 1/2 as tool-call-only demos
- example 4 as the full tool execution round-trip
- improve client diagnostics for empty/no-tool responses
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Assisted-by: IBM Bob
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com> Assisted-by: IBM Bob
The OpenAI streaming spec requires each item in delta.tool_calls to carry an index field. Clients including the openai Python SDK, LangChain, and LiteLLM key their delta-reassembly state machine on this field. Without it, they silently drop tool calls, coalesce them incorrectly, or raise a TypeError depending on version.
Changes:
- Add ChatCompletionMessageToolCallDelta model with required index field
- Add ToolCallFunctionDelta model for streaming function deltas
- Update ChatCompletionChunkDelta to use delta models
- Update streaming.py to populate index field using enumerate()
- Add comprehensive tests verifying index field presence
- Update existing test to check for index field
The bundled client_streaming_tool_calling.py example masked this issue because it reads delta.tool_calls verbatim rather than going through SDK delta reassembly.
Fixes compatibility with OpenAI SDK, LangChain, and LiteLLM streaming tool call consumers.
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Assisted-by: IBM Bob
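A dependency-free sketch of the delta models and the enumerate()-based index population. The real models use Pydantic; dataclasses keep this self-contained, and the sample calls are made up:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ToolCallFunctionDelta:
    name: Optional[str] = None
    arguments: Optional[str] = None

@dataclass
class ChatCompletionMessageToolCallDelta:
    index: int  # required: SDKs key their delta reassembly on this field
    id: Optional[str] = None
    type: Optional[str] = None
    function: Optional[ToolCallFunctionDelta] = None

calls = [("call_0", "get_weather"), ("call_1", "get_stock_price")]
deltas = [
    ChatCompletionMessageToolCallDelta(
        index=i, id=call_id, type="function",
        function=ToolCallFunctionDelta(name=name, arguments=""),
    )
    for i, (call_id, name) in enumerate(calls)
]
```

Making `index` a required field (no default) means a chunk that omits it fails at construction time rather than breaking downstream SDK reassembly.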
build_tool_calls was called before the streaming block and then not used in the streaming case. Rearranged the condition and the call to avoid the wasted call.
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
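The rearrangement can be illustrated with stubs (the names and shapes here are hypothetical, not the serve code):

```python
built = []

def build_tool_calls(output):  # stub standing in for the real helper
    built.append(output)
    return []

def respond(stream: bool, output="thunk"):
    if stream:
        return "streamed"                      # streaming path never needs the list
    return ("full", build_tool_calls(output))  # built only when actually used

respond(stream=True)
assert built == []  # no wasted call on the streaming path
respond(stream=False)
assert built == ["thunk"]
```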
…ing and tool calling Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Use str for non-serializable types. This should effectively avoid TypeError (in normal situations).
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
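Assuming the serialization goes through the standard library, the usual way to get this behavior is json.dumps's default hook, which is called for any value the encoder cannot handle natively:

```python
import json
from datetime import datetime

payload = {"when": datetime(2025, 1, 1), "count": 3}

# Without default=, json.dumps raises TypeError on the datetime.
# default=str stringifies any non-serializable value instead.
text = json.dumps(payload, default=str)
```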
New tests for tooling improved coverage, but the significant rewrite caused too much divergence from main. Keeping the old tests in place while adding new tests in a new file will help sort this out.
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Rebased and now the new tests need updating. Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
More new tests need fixing after rebase. Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Type of PR
Misc PR
Description
Feature: OpenAI-Compatible Tool Calling Support
This feature adds comprehensive tool calling support to the `m serve` CLI command, making it fully compatible with OpenAI's tool/function calling API.
Core Capabilities Added
1. Tool Calling Protocol Support
- `tools`, `tool_choice` parameters
- `tool_call_id` tracking
2. Streaming Tool Calls
- `index` field tracking
- `finish_reason` handling (`tool_calls` vs `stop`)
- `stream_options.include_usage`
3. Enhanced Model Options
- Added `TOOL_CHOICE` to `ModelOptions` as a first-class option (like `TEMPERATURE`)
4. Helper Utilities
- `build_tool_calls()` - Converts Mellea tool outputs to OpenAI format
- `build_completion_usage()` - Extracts token usage from model output
Testing Coverage
Added comprehensive test suites (~1,500+ lines):
Key Files Modified
- `cli/serve/app.py` - Tool call handling in completion endpoint
- `cli/serve/streaming.py` - Streaming tool call delta generation
- `cli/serve/models.py` - OpenAI-compatible request/response models
- `mellea/helpers/openai_compatible_helpers.py` - Tool formatting utilities
- `mellea/backends/model_options.py` - Added `TOOL_CHOICE` option
Testing