feat: add tool calling support to m serve#850

Open
markstur wants to merge 18 commits into generative-computing:main from markstur:issue_825

Conversation

@markstur
Contributor

@markstur markstur commented Apr 13, 2026

Misc PR

Type of PR

  • Bug Fix
  • New Feature
  • Documentation
  • Other

Description

Feature: OpenAI-Compatible Tool Calling Support

This feature adds comprehensive tool calling support to the m serve CLI command, making it fully compatible with OpenAI's tool/function calling API.

Core Capabilities Added

1. Tool Calling Protocol Support

  • Full OpenAI-compatible tool calling in both streaming and non-streaming modes
  • Support for tools, tool_choice parameters
  • Proper handling of tool call responses with tool_call_id tracking
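
As a sketch of what these request-side parameters look like in the OpenAI schema (the `get_weather` tool and the `call_abc123` id below are illustrative, not taken from this PR's code):

```python
import json

# Hypothetical tool definition in the OpenAI function-calling schema.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# tool_choice can be "auto", "none", or force a specific function:
tool_choice = {"type": "function", "function": {"name": "get_weather"}}

# A tool result goes back to the model as a "tool" message, correlated
# with the original call via tool_call_id:
tool_result_message = {
    "role": "tool",
    "tool_call_id": "call_abc123",  # must match the id the model returned
    "content": json.dumps({"city": "Paris", "temp_c": 18}),
}
```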

2. Streaming Tool Calls

  • Incremental streaming of tool call deltas with proper index field tracking
  • Correct finish_reason handling (tool_calls vs stop)
  • Optional usage statistics in final streaming chunk via stream_options.include_usage
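
The index field matters because streaming clients reassemble each tool call from its fragments. A minimal sketch of that reassembly, keyed on index the way SDK consumers do (function and field names here are illustrative, not from the PR):

```python
def merge_tool_call_deltas(deltas):
    """Reassemble streamed tool-call fragments, keyed on each delta's index."""
    calls = {}
    for d in deltas:
        slot = calls.setdefault(d["index"], {"id": None, "name": "", "arguments": ""})
        if d.get("id"):
            slot["id"] = d["id"]
        fn = d.get("function") or {}
        slot["name"] += fn.get("name") or ""
        slot["arguments"] += fn.get("arguments") or ""
    return [calls[i] for i in sorted(calls)]

# Two chunks of the same call (index 0): arguments arrive in pieces.
deltas = [
    {"index": 0, "id": "call_1", "function": {"name": "get_weather", "arguments": '{"ci'}},
    {"index": 0, "function": {"arguments": 'ty": "Paris"}'}},
]
merged = merge_tool_call_deltas(deltas)
```

Without the index field on each delta, a consumer has no reliable way to know which call a fragment belongs to when several tool calls stream concurrently.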

3. Enhanced Model Options

  • Added TOOL_CHOICE to ModelOptions as a first-class option (like TEMPERATURE)
  • Proper integration with Mellea's backend system

4. Helper Utilities

  • build_tool_calls() - Converts Mellea tool outputs to OpenAI format
  • build_completion_usage() - Extracts token usage from model output
  • Robust JSON serialization with fallback handling
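
The fallback idea can be sketched with `json.dumps(default=str)` (a minimal stand-in, not the helper's actual implementation):

```python
import json
from datetime import datetime

def safe_json_dumps(obj):
    """Serialize to JSON, stringifying anything json can't handle natively.

    default=str is only invoked for non-serializable values, so ordinary
    payloads are unchanged while datetimes, sets, etc. avoid TypeError.
    """
    return json.dumps(obj, default=str)

print(safe_json_dumps({"when": datetime(2026, 4, 13), "n": 3}))
```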

Testing Coverage

Added comprehensive test suites (~1,500 lines):

  • Unit tests: Tool call formatting, index verification, helper functions
  • Integration tests: Full request/response cycles with TestClient
  • Streaming tests: Tool call delta generation and usage statistics
  • Examples: Three working client examples demonstrating tool calling patterns

Key Files Modified

  • cli/serve/app.py - Tool call handling in completion endpoint
  • cli/serve/streaming.py - Streaming tool call delta generation
  • cli/serve/models.py - OpenAI-compatible request/response models
  • mellea/helpers/openai_compatible_helpers.py - Tool formatting utilities
  • mellea/backends/model_options.py - Added TOOL_CHOICE option

Testing

  • Tests added to the respective file if code was changed
  • New code has 100% coverage if code was added
  • Ensure existing tests and GitHub automation pass (a maintainer will kick off the GitHub automation when the rest of the PR is populated)

@markstur markstur requested a review from a team as a code owner April 13, 2026 23:38
@markstur markstur marked this pull request as draft April 13, 2026 23:38
@github-actions github-actions Bot added the enhancement New feature or request label Apr 13, 2026
@github-actions
Contributor

The PR description has been updated. Please fill out the template for your PR to be reviewed.

@planetf1
Contributor

@markstur Do you want review comments yet or still WIP?

@markstur
Contributor Author

@markstur Do you want review comments yet or still WIP?

Comments would be great! It is a draft because I need to do more review/testing myself on the generated code. I don't want to waste your time, but early comments would be very welcome.

Member

@psschwei psschwei left a comment


Code Review: feat: add tool calling support to m serve

Good feature PR — the core plumbing is correct and the OpenAI-compatible response format looks right. A couple of bugs to fix before merge, plus some improvements.

Summary

The implementation correctly wires tool calling through the serve endpoint: tools maps to ModelOption.TOOLS, tool_choice passes through as-is, and the response extracts tool calls from ModelOutputThunk into the OpenAI format. The Pydantic models mirror the OpenAI types well, and tests cover the main paths.

Two bugs need fixing (see inline comments):

  1. Empty tool_calls dict produces incorrect finish_reason: "tool_calls" with an empty array
  2. Client example's multi-turn loop duplicates the assistant message for each tool call
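
A minimal sketch of the first bug's fix, with hypothetical names (the real logic lives in cli/serve/app.py): an empty tool-call collection must yield "stop", not "tool_calls" with an empty array.

```python
def finalize(tool_calls_dict):
    """Pick finish_reason and the tool_calls payload for the response."""
    if tool_calls_dict:  # truthiness rejects both None and an empty dict
        return "tool_calls", list(tool_calls_dict.values())
    return "stop", None  # no calls: finish normally, omit tool_calls

assert finalize({}) == ("stop", None)
assert finalize(None) == ("stop", None)
```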

Other improvements (see inline comments):

  • Unused loop variable tool_name
  • eval() in example code with # noqa suppressing the security lint for copy-pasters
  • Missing test for the empty dict edge case
  • hasattr check is always true for ModelOutputThunk — defensive but masks upstream bugs

What's working well

  • Pydantic models (ToolCallFunction, ChatCompletionMessageToolCall) closely match OpenAI types
  • _build_model_options change is clean — tools removed from exclusion set, mapped to ModelOption.TOOLS
  • 8 well-structured tests covering single/multiple tool calls, finish reasons, model_options passthrough, complex args, usage info, and backward compat
  • Existing test updated consistently from "excluded" to "passed"

Comment thread cli/serve/app.py Outdated
Comment thread cli/serve/app.py Outdated
Comment thread test/cli/test_serve_tool_calling.py
Comment thread docs/examples/m_serve/client_tool_calling.py Outdated
Contributor

@planetf1 planetf1 left a comment


Two additional items not yet covered in existing review comments.

Comment thread cli/serve/app.py Outdated
Comment thread docs/examples/m_serve/m_serve_example_tool_calling.py Outdated
Comment thread docs/examples/m_serve/m_serve_example_tool_calling.py Outdated
@markstur
Contributor Author

Code Review: feat: add tool calling support to m serve
Fixed all these.
The eval() one goes away with the removal of the calculator example (replaced by a stock "look-up").

@markstur markstur requested a review from psschwei April 17, 2026 20:49
@markstur markstur marked this pull request as ready for review April 17, 2026 20:49
@markstur markstur requested a review from planetf1 April 17, 2026 20:49
Member

@psschwei psschwei left a comment


LGTM
Since @planetf1 had also reviewed, will leave final approval to him

@markstur markstur dismissed psschwei’s stale review April 20, 2026 22:42

dismissing the requested change which was resolved
@psschwei approved but waiting for @planetf1 to approve

Contributor

@planetf1 planetf1 left a comment


Most issues are fine - I just have one concern: I don't think the finish reason is what you intended for streaming?

Comment thread cli/serve/app.py
Comment thread cli/serve/streaming.py
Comment thread cli/serve/app.py
Comment thread test/cli/test_serve_tool_calling.py
Comment thread docs/examples/m_serve/client_tool_calling.py
Comment thread mellea/helpers/openai_compatible_helpers.py
@markstur
Contributor Author

With the CI fix and rebase this is ready again. All feedback is addressed, including one issue created.

markstur added 18 commits April 30, 2026 17:25
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
…y dict

Fixed the bug where an empty tool_calls dict ({}) incorrectly produced finish_reason="tool_calls" with an empty array instead of finish_reason="stop" with tool_calls=None.

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
…xample

Issue: The assistant message was being added inside the loop for each tool call, causing duplication when multiple tool calls were present.
Fix: Moved the assistant message append outside the loop (before processing tool calls), so it's only added once. Now the loop only adds tool responses.

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
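
The corrected loop shape can be sketched like this (illustrative names, not the example's actual code):

```python
def run_tool_turn(messages, assistant_msg, execute):
    """Append the assistant message once, then one tool message per call."""
    messages.append(assistant_msg)  # outside the loop: added exactly once
    for call in assistant_msg["tool_calls"]:
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],  # correlate result with its call
            "content": execute(call),
        })
    return messages

# Two tool calls yield one assistant message plus two tool messages.
msgs = run_tool_turn(
    [],
    {"role": "assistant", "tool_calls": [{"id": "call_1"}, {"id": "call_2"}]},
    lambda call: "ok",
)
```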
The dict key tool_name is never used — the function name comes from model_tool_call.name. Using .values() instead.

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Replaced hasattr() with direct __dict__ membership tests to correctly distinguish:

1. Typed instances (ModelOutputThunk[float](...)) - have __orig_class__ in their instance dict
2. Untyped instances (ModelOutputThunk(...)) - do NOT have __orig_class__ in their instance dict

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
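
The distinction can be demonstrated with a plain generic class (a standalone sketch, not Mellea's actual ModelOutputThunk): CPython's typing machinery sets __orig_class__ on the instance only when the class is instantiated through a parameterized alias.

```python
from typing import Generic, TypeVar

T = TypeVar("T")

class Thunk(Generic[T]):
    """Stand-in for a generic result container."""

typed = Thunk[float]()   # typing sets __orig_class__ on this instance
untyped = Thunk()        # no __orig_class__ in the instance dict

assert "__orig_class__" in typed.__dict__
assert "__orig_class__" not in untyped.__dict__
```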
Security issue resolved in `m_serve_example_tool_calling.py`:

**Changes made:**
- Replaced `CalculatorTool` (which used unsafe `eval()` with `# noqa: S307`) with `GetStockPriceTool`
- New tool demonstrates API-calling pattern with mock stock prices (AAPL, GOOGL, MSFT, TSLA)
- Updated all references: `calculator_tool` → `stock_price_tool`
- Maintains the same tool calling demonstration with two tools (weather + stock price)

**Why this is better:**
- Eliminates security risk entirely (no `eval()` or suppressed lints)
- Still demonstrates multiple tools effectively
- Uses safe, realistic API-calling pattern that users can copy
- No dangerous code that could be copy-pasted into production

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
The pass-through behavior was not clear enough, so adding it to ModelOptions
where important options are known. Most of these are sentinels which are
removed (because @@@) but this one will be like TEMPERATURE, which is passed
through to the backends.

No behavior change, but this gives a handy constant and a place to look for these.
This does not address all the other possible pass-through args.

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Assisted-by: IBM Bob
- switch server example to OpenAIBackend
- align tool-calling example with tested Granite model setup
- narrow advertised tools when `tool_choice` selects a specific function
- enable `tool_calls=True` in the serve path
- replace calculator example with stock-price tool
- examples 1/2 as tool-call-only demos
- example 4 as the full tool execution round-trip
- improve client diagnostics for empty/no-tool responses

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Assisted-by: IBM Bob
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Assisted-by: IBM Bob
The OpenAI streaming spec requires each item in delta.tool_calls to carry
an index field. Clients including the openai Python SDK, LangChain, and
LiteLLM key their delta-reassembly state machine on this field.

Without it, they silently drop tool calls, coalesce them incorrectly, or
raise a TypeError depending on version.

Changes:
- Add ChatCompletionMessageToolCallDelta model with required index field
- Add ToolCallFunctionDelta model for streaming function deltas
- Update ChatCompletionChunkDelta to use delta models
- Update streaming.py to populate index field using enumerate()
- Add comprehensive tests verifying index field presence
- Update existing test to check for index field

The bundled client_streaming_tool_calling.py example masked this issue
because it reads delta.tool_calls verbatim rather than going through
SDK delta reassembly.

Fixes compatibility with OpenAI SDK, LangChain, and LiteLLM streaming
tool call consumers.

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Assisted-by: IBM Bob
build_tool_calls was called before the streaming block and then not used in the streaming case.
Rearranged the condition and the call to avoid the wasted call.

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
…ing and tool calling

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Use str() for non-serializable types. This should effectively avoid TypeError in normal situations.

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
New tests for tooling improved coverage, but the significant
rewrite caused too much divergence from main. Keeping the old
tests in place while adding new tests in a new file will help
sort this out.

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Rebased and now the new tests need updating.

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
More new tests need fixing after rebase.

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
@markstur markstur enabled auto-merge May 1, 2026 01:02
@markstur markstur requested a review from psschwei May 1, 2026 18:11

Labels

enhancement New feature or request


Development

Successfully merging this pull request may close these issues.

m serve OpenAI API tool calling round-trip

3 participants