refactor(tests): framework-agnostic test infrastructure #1790
Conversation
Greptile Summary

This PR decouples the test infrastructure from LangChain by introducing a framework-agnostic `FakeLLMModel`.
| Filename | Overview |
|---|---|
| tests/utils.py | New FakeLLMModel cleanly implements the LLMModel protocol; streaming flag is stored but unused in both async methods — minor dead-code concern. |
| tests/conftest.py | Adds collect_ignore_glob to skip LangChain tests when langchain_core is absent; prior-thread concerns were partially addressed by adding both *.py and **/*.py patterns. |
| tests/integrations/langchain/utils.py | Relocated FakeLLM (LangChain) and get_bound_llm_magic_mock from tests/utils.py — now correctly isolated to the LangChain integration layer. |
| tests/test_llmrails.py | All FakeLLM → FakeLLMModel swaps; assertion updated to assert llm_rails.llm is injected_llm reflecting no-adapter wrapping for protocol-compliant models. |
| tests/test_output_rails_tool_calls.py | Rewritten to use FakeLLMModel with LLMResponse/ToolCall protocol types instead of LangChain AIMessage; logic unchanged. |
| tests/integrations/langchain/runnable_rails/test_streaming.py | New comprehensive streaming test suite for LangChain RunnableRails; correctly imports FakeLLM from the integration utils.py. |
Sequence Diagram
sequenceDiagram
participant CoreTest as Core Test (tests/test_*.py)
participant IntegTest as Integration Test (tests/integrations/langchain/)
participant FakeLLMModel as FakeLLMModel (tests/utils.py)
participant FakeLLM as FakeLLM (tests/integrations/langchain/utils.py)
participant LLMRails as LLMRails
participant LLMModel as LLMModel Protocol
CoreTest->>FakeLLMModel: FakeLLMModel(responses=[...])
FakeLLMModel-->>CoreTest: instance (implements LLMModel)
CoreTest->>LLMRails: LLMRails(config, llm=fake_llm)
LLMRails->>LLMModel: generate_async(prompt) / stream_async(prompt)
LLMModel-->>LLMRails: LLMResponse / AsyncIterator[LLMResponseChunk]
LLMRails-->>CoreTest: result (no LangChain dependency)
IntegTest->>FakeLLM: FakeLLM(responses=[...])
FakeLLM-->>IntegTest: instance (LangChain BaseLLM)
IntegTest->>LLMRails: RunnableRails(config, llm=fake_llm)
LLMRails->>FakeLLM: ainvoke / _astream (LangChain API)
FakeLLM-->>LLMRails: AIMessage / GenerationChunk
LLMRails-->>IntegTest: result
Prompt To Fix All With AI
This is a comment left during a code review.
Path: tests/utils.py
Line: 45-61
Comment:
**`streaming` flag is stored but never consulted**
`FakeLLMModel.__init__` accepts and stores `self.streaming`, but neither `generate_async` nor `stream_async` reads it. In the original `FakeLLM`, `streaming=True` activated LangChain's `_astream` code path; in the new design the framework is responsible for choosing which method to call, so the flag has no effect. Any test that passes `streaming=True` expecting a behavioural difference (e.g., chunked output instead of a full string) will silently exercise the same path as `streaming=False`.
If the attribute is intentionally a no-op (because the framework decides independently), a doc-comment to that effect would prevent confusion for future contributors.
How can I resolve this? If you propose a fix, please make it concise.

Reviews (7): Last reviewed commit: "apply coderabbit review suggestion"
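The comment above can be resolved concisely with a doc-comment plus a clarifying inline comment. A minimal sketch of the fake under discussion (class and method names assumed from the PR description; the real implementation may differ):

```python
import asyncio
from typing import AsyncIterator, List


class FakeLLMModel:
    """Sketch of the fake discussed above (names assumed from the PR).

    ``streaming`` is intentionally a no-op: in the new design the framework
    decides whether to call ``generate_async`` or ``stream_async``, so the
    flag is kept only for signature compatibility with the old FakeLLM.
    """

    def __init__(self, responses: List[str], streaming: bool = False):
        self.responses = responses
        self.streaming = streaming  # stored but never consulted (see doc-comment)
        self._index = 0

    async def generate_async(self, prompt: str) -> str:
        # Return the next canned response in order, cycling if exhausted
        response = self.responses[self._index % len(self.responses)]
        self._index += 1
        return response

    async def stream_async(self, prompt: str) -> AsyncIterator[str]:
        response = self.responses[self._index % len(self.responses)]
        self._index += 1
        for word in response.split():  # naive word-level chunking
            yield word


fake = FakeLLMModel(responses=["hello world"], streaming=True)
full = asyncio.run(fake.generate_async("hi"))
print(full)  # hello world


async def collect_chunks():
    return [chunk async for chunk in fake.stream_async("hi again")]


chunks = asyncio.run(collect_chunks())
print(chunks)  # ['hello', 'world']
```

Note that `streaming=True` changes nothing here: the caller picks `generate_async` or `stream_async`, which is exactly the behaviour the doc-comment documents.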
Force-pushed from 9c17d45 to a77a7f6
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Force-pushed from a89e626 to fbf78bb
…o-skip Add FakeLLMModel implementing LLMModel protocol to tests/utils.py. TestChat uses FakeLLMModel instead of LangChain FakeLLM. Add pytest conftest to auto-skip tests in tests/integrations/langchain/ when LangChain is not installed.
…langchain/ Pure file moves with git mv, no content changes. GitHub should detect these as renames.
…ixes
- Create tests/integrations/langchain/utils.py with LangChain FakeLLM
- Add core-friendly rewrites using FakeLLMModel for tool call, output rail, reasoning, and passthrough tests
- Migrate FakeLLM→FakeLLMModel in all core test files
- Remove LangChainLLMAdapter wrapping from test_llama_guard, test_patronus_lynx
- Update integration test imports to use FakeLLM from new location
- Fix Windows timer resolution in test_logging
Force-pushed from 3739452 to e19d879
📝 Walkthrough

This PR refactors test infrastructure to decouple from LangChain-specific test doubles by introducing a framework-agnostic `FakeLLMModel`.

Changes
Sequence Diagram(s)

No sequence diagrams generated. While the changes introduce new integration tests with multiple components, the scope is primarily test infrastructure refactoring (import path changes, test double replacement) and new test modules that implement testing logic rather than new production features with significant control flow interactions.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

The PR involves substantial cross-cutting changes: refactoring the core test infrastructure with two competing LLM test doubles, updating 31+ test files with systematic (but mixed-pattern) import and instantiation changes, and introducing 5 new integration test modules with moderately complex test logic. While many individual changes are repetitive imports/instantiations (lower effort), the heterogeneity of change patterns, the number of affected files, and the need to verify test isolation and correctness across the new integration test suite warrant careful review.

🚥 Pre-merge checks | ✅ 2 | ❌ 2

❌ Failed checks (2 warnings)
✅ Passed checks (2 passed)
Actionable comments posted: 9
🧹 Nitpick comments (2)
tests/test_content_safety_integration.py (1)
Line 353: Use the public action-param registration API instead of mutating runtime internals. Assigning into `runtime.registered_action_params` makes this test depend on internal storage details. Registering the fake through `register_action_param` keeps the setup aligned with the supported API and is less brittle.

♻️ Suggested cleanup

- chat.app.runtime.registered_action_params["llms"] = {"content_safety_reasoning": content_safety_llm}
+ chat.app.register_action_param(
+     "llms",
+     {"content_safety_reasoning": content_safety_llm},
+ )

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/test_content_safety_integration.py` at line 353, The test directly mutates internal storage via chat.app.runtime.registered_action_params["llms"] = {"content_safety_reasoning": content_safety_llm}; instead use the public API by calling chat.app.runtime.register_action_param (or the app-level helper if available) to register the fake LLM param under the "llms" namespace with key "content_safety_reasoning" and value content_safety_llm so the test relies on the supported registration mechanism rather than internal dict structure.

tests/test_llmrails.py (1)
Lines 769-775: Assert the full `LLMModel` contract here. This regression test now checks only `generate_async`, so a partial fake could still pass while missing other members the core pipeline relies on. Since this file is validating the framework-agnostic migration, make the assertion match the protocol more directly.

♻️ Suggested assertion tightening

  from nemoguardrails.types import LLMModel

  @action(name="test_llm_action")
  async def test_llm_action(llm: LLMModel):
      assert llm is not None
-     assert hasattr(llm, "generate_async")
+     assert isinstance(llm, LLMModel)
      return "llm_action_success"

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_llmrails.py` around lines 769 - 775, The test currently only checks for generate_async on the LLMModel and can be bypassed by a partial fake; update test_llm_action to assert the full LLMModel contract by iterating over the members declared on the LLMModel type and asserting the provided llm has each member (e.g. use LLMModel.__annotations__ or dir(LLMModel) to enumerate expected attribute names) and for those that should be callables assert callable(getattr(llm, name)); keep references to LLMModel and test_llm_action so the check explicitly validates the protocol surface rather than a single method.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/conftest.py`:
- Around line 38-50: pytest_collection_modifyitems runs too late to prevent
top-level imports from failing; add a collection-time gate (e.g., implement
pytest_ignore_collect or collect_ignore_glob) that checks for langchain
availability and skips matching modules before they are imported. Specifically,
implement a pytest_ignore_collect(path) function that tries to import
langchain_core (or uses pytest.importorskip) and returns True to ignore
collection for any path whose parts include "integrations" and "langchain" or
the helper config files under tests/test_configs/with_custom_llm,
tests/test_configs/with_custom_chat_model, and
tests/test_configs/with_custom_llm_prompt_action_v2_x when langchain_core is
missing; keep pytest_collection_modifyitems only for post-collection markers if
desired.
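The collection-time gate described in the prompt above can be sketched as follows. The path check is factored into a pure helper so the logic is testable without pytest; the hook signature matches recent pytest versions (older ones pass `path` instead of `collection_path`), and the exact config-directory checks from the prompt are omitted here for brevity:

```python
from pathlib import Path


def should_ignore(path: str, has_langchain: bool) -> bool:
    """Pure helper so the gate logic is testable without running pytest."""
    if has_langchain:
        return False
    parts = Path(path).parts
    # Ignore LangChain integration modules when langchain_core is missing
    return "integrations" in parts and "langchain" in parts


def pytest_ignore_collect(collection_path, config):
    """Collection-time gate: returning True skips the module before import."""
    import importlib.util

    has_langchain = importlib.util.find_spec("langchain_core") is not None
    # None defers to pytest's default collection behaviour
    return True if should_ignore(str(collection_path), has_langchain) else None


print(should_ignore("tests/integrations/langchain/test_streaming.py", False))  # True
print(should_ignore("tests/test_llmrails.py", False))  # False
```

Because the hook runs before the module is imported, top-level `from langchain_core import ...` statements in the skipped files never execute, which is the failure mode `pytest_collection_modifyitems` cannot prevent.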
In `@tests/integrations/langchain/test_reasoning_trace_extraction.py`:
- Around line 289-307: The test exposes that llm_call currently returns raw
"<think>...</think>" markup when additional_kwargs["reasoning_content"] is
present; change llm_call (used with LangChainLLMAdapter and AIMessage) so it
still prefers storing additional_kwargs["reasoning_content"] into
reasoning_trace_var but always strips any <think>...</think> segments from the
returned AIMessage.content before returning it; locate the logic in llm_call
that determines stored_trace vs output content and ensure stored_trace selection
(from additional_kwargs or embedded tags) is kept, then apply a sanitize step to
the message content (removing <think>...</think> blocks) so user-visible content
never contains raw think tags.
In `@tests/integrations/langchain/test_tool_calls_event_extraction.py`:
- Around line 119-138: The test currently only asserts call_count >= 1 so
multiple reads would still pass; modify the assertion to enforce the single-pop
contract by checking call_count == 1 for the mocked
get_and_clear_tool_calls_contextvar (mock_get_and_clear), keeping the rest of
the expectations (result from TestChat.generate_async and result["tool_calls"]
assertions) unchanged so the test verifies exactly one read/clear occurred.
- Around line 167-189: The test mutates the shared ContextVar tool_calls_var and
never restores it; change test_tool_rails_cannot_clear_context_variable to
capture the token returned by tool_calls_var.set(...) and ensure
tool_calls_var.reset(token) is called in a finally block (or teardown) so the
ContextVar is restored even on assertion failures; reference the tool_calls_var
and validate_tool_parameters symbols when making this change.
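The token-and-finally pattern the prompt asks for looks like this in isolation (the `ContextVar` below is a stand-in for the real `tool_calls_var`, not an import from the project):

```python
import contextvars

# Stand-in for the module-level ContextVar in the code under test
tool_calls_var = contextvars.ContextVar("tool_calls", default=None)


def exercise_with_restore():
    token = tool_calls_var.set([{"name": "test_tool", "id": "call_1"}])
    try:
        # ... assertions against the code under test would go here ...
        assert tool_calls_var.get()[0]["name"] == "test_tool"
    finally:
        tool_calls_var.reset(token)  # restored even if an assertion above fails


exercise_with_restore()
print(tool_calls_var.get())  # None
```

`reset(token)` restores whatever value the variable had before the matching `set`, so later tests in the same process see a clean state regardless of whether the assertions passed.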
- Around line 27-49: The helper functions validate_tool_parameters and
self_check_tool_calls currently only handle a nested function.arguments schema
and miss LangChain's top-level tool-call shape (with top-level "name" and
"args"), so dangerous strings never get inspected; update both functions to
normalize the incoming call shape first by accepting either call.get("function",
{}).get("arguments", {}) or call.get("args") (and likewise accept
call.get("function", {}).get("name") or call.get("name")), then iterate
parameter values from that normalized args mapping when checking dangerous
patterns in validate_tool_parameters, and update self_check_tool_calls to
consider a call valid if it has either a "function" dict with "id"/"function" or
the top-level "name" and "args" fields (and still require an "id" if expected),
ensuring both schemas are supported before performing the existing validations.
In `@tests/integrations/langchain/test_tool_output_rails.py`:
- Around line 27-49: The validators assume an OpenAI nested format but tests use
LangChain-style tool_calls; update validate_tool_parameters and
self_check_tool_calls to accept both formats by normalizing tool_calls entries:
for each call in validate_tool_parameters extract parameters from either
call.get("function", {}).get("arguments", {}) or call.get("args", {}) (falling
back appropriately) and continue checking strings against dangerous_patterns,
and in self_check_tool_calls return True when each call is a dict that contains
either the OpenAI keys ("function" and "id") or the LangChain keys ("name",
"args", and "id"); reference the existing functions validate_tool_parameters,
self_check_tool_calls, the dangerous_patterns list, and the tool_calls variable
when making the changes.
In `@tests/integrations/langchain/utils.py`:
- Around line 104-118: get_bound_llm_magic_mock is incorrectly wrapping dict
responses with MagicMock(**...), which makes dicts behave like objects; change
both places handling the dict branch so they return a real dict instead (e.g.,
use dict(ainvoke_return_value) or ainvoke_return_value directly) for
bound_llm_mock.ainvoke.return_value and for mock_llm.ainvoke = AsyncMock(...),
while keeping the existing AsyncMock/AIMessage branch behavior; update
references to bound_llm_mock, mock_llm, and ainvoke_return_value accordingly.
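The behavioural difference behind this finding is easy to demonstrate: wrapping a dict in `MagicMock(**...)` converts its keys into attributes, so any caller doing dict-style lookups breaks. A small illustration (the `response` dict is invented for the example):

```python
from unittest.mock import MagicMock

response = {"content": "hello", "tool_calls": []}

# MagicMock(**dict) turns keys into *attributes*, so dict semantics are lost:
wrapped = MagicMock(**response)
assert wrapped.content == "hello"      # attribute access works
assert not isinstance(wrapped, dict)   # but it is no longer a dict

# Returning the dict itself preserves dict semantics for the caller:
plain = dict(response)
assert plain["content"] == "hello"
print(type(wrapped).__name__, type(plain).__name__)  # MagicMock dict
```

This is why the fix keeps the `AsyncMock`/`AIMessage` branch but returns a real dict in the dict branch.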
In `@tests/test_tool_calls_event_extraction.py`:
- Around line 321-335: The test sets provider_metadata on the fake LLM response
but never asserts it, so add an assertion to verify metadata propagation: after
calling LLMRails.generate_async (using FakeLLMModel, LLMResponse, LLMRails)
assert that the returned result includes the provider metadata (e.g.,
result["tool_calls"][0] contains the same provider_metadata dict or a public
metadata field) to ensure metadata isn't dropped; update or add an assertion
comparing the expected {"model": "test-model", "usage": {"tokens": 50}} to the
value returned by result so the test actually verifies metadata propagation.
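The missing assertion the prompt describes amounts to one comparison; the sketch below uses invented stand-in dicts (the actual result shape and `provider_metadata` field name are assumptions from the prompt, not confirmed API):

```python
# Illustrative dicts only: result shape and field name are assumptions.
expected_metadata = {"model": "test-model", "usage": {"tokens": 50}}

# Stand-in for what LLMRails.generate_async might return in this test
result = {"tool_calls": [{"name": "test_tool", "provider_metadata": expected_metadata}]}

# The assertion the prompt asks to add: metadata must survive the pipeline
assert result["tool_calls"][0]["provider_metadata"] == expected_metadata
print("metadata propagated")
```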
In `@tests/utils.py`:
- Around line 158-172: The constructor logic in TestChat silently sets self.llm
= None when both llm and llm_completions are missing, causing tests to hit the
real LLM; change the else branch to either raise a clear error or instantiate a
default stubbed FakeLLMModel instead. Specifically, in the TestChat
initialization where llm and llm_completions are checked, replace "else:
self.llm = None" with one of: (a) raise ValueError("No llm or llm_completions
provided to TestChat; tests must supply a fake"), or (b) self.llm =
FakeLLMModel(responses=[], streaming=False, exception=None, token_usage=None,
should_return_token_usage=False) so tests fail-fast or use an empty fake
respectively; refer to the TestChat constructor and the FakeLLMModel symbol when
applying the change.
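Option (a) from the prompt, the fail-fast constructor guard, can be sketched like this (`ChatFixtureSketch` is an illustrative stand-in for `TestChat`, not the real class):

```python
class ChatFixtureSketch:
    """Illustrative stand-in for the TestChat constructor guard (names assumed)."""

    def __init__(self, llm=None, llm_completions=None):
        if llm is not None:
            self.llm = llm
        elif llm_completions is not None:
            # The real fixture would build FakeLLMModel(responses=llm_completions)
            self.llm = ("fake-llm", list(llm_completions))
        else:
            # Fail fast instead of silently falling through to a real LLM
            raise ValueError(
                "No llm or llm_completions provided to TestChat; tests must supply a fake"
            )


chat = ChatFixtureSketch(llm_completions=["hi"])
try:
    ChatFixtureSketch()
    error_message = ""
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

Raising is usually preferable to defaulting to an empty fake: a test that forgot to supply completions fails at setup with a clear message rather than producing confusing empty responses.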
---
Nitpick comments:
In `@tests/test_content_safety_integration.py`:
- Line 353: The test directly mutates internal storage via
chat.app.runtime.registered_action_params["llms"] = {"content_safety_reasoning":
content_safety_llm}; instead use the public API by calling
chat.app.runtime.register_action_param (or the app-level helper if available) to
register the fake LLM param under the "llms" namespace with key
"content_safety_reasoning" and value content_safety_llm so the test relies on
the supported registration mechanism rather than internal dict structure.
In `@tests/test_llmrails.py`:
- Around line 769-775: The test currently only checks for generate_async on the
LLMModel and can be bypassed by a partial fake; update test_llm_action to assert
the full LLMModel contract by iterating over the members declared on the
LLMModel type and asserting the provided llm has each member (e.g. use
LLMModel.__annotations__ or dir(LLMModel) to enumerate expected attribute names)
and for those that should be callables assert callable(getattr(llm, name)); keep
references to LLMModel and test_llm_action so the check explicitly validates the
protocol surface rather than a single method.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: d2e27196-c546-46d7-8416-3b256fa8d9af
📒 Files selected for processing (60)
tests/conftest.py
tests/integrations/langchain/llm/__init__.py
tests/integrations/langchain/llm/models/__init__.py
tests/integrations/langchain/llm/models/test_langchain_init_scenarios.py
tests/integrations/langchain/llm/models/test_langchain_initialization_methods.py
tests/integrations/langchain/llm/models/test_langchain_initializer.py
tests/integrations/langchain/llm/models/test_langchain_special_cases.py
tests/integrations/langchain/llm/providers/__init__.py
tests/integrations/langchain/llm/providers/test_deprecated_providers.py
tests/integrations/langchain/llm/providers/test_providers.py
tests/integrations/langchain/llm/providers/test_trtllm_provider.py
tests/integrations/langchain/llm/test_langchain_integration.py
tests/integrations/langchain/llm/test_version_compatibility.py
tests/integrations/langchain/runnable_rails/__init__.py
tests/integrations/langchain/runnable_rails/test_basic_operations.py
tests/integrations/langchain/runnable_rails/test_batch_as_completed.py
tests/integrations/langchain/runnable_rails/test_batching.py
tests/integrations/langchain/runnable_rails/test_composition.py
tests/integrations/langchain/runnable_rails/test_format_output.py
tests/integrations/langchain/runnable_rails/test_history.py
tests/integrations/langchain/runnable_rails/test_message_utils.py
tests/integrations/langchain/runnable_rails/test_metadata.py
tests/integrations/langchain/runnable_rails/test_piping.py
tests/integrations/langchain/runnable_rails/test_runnable_rails.py
tests/integrations/langchain/runnable_rails/test_streaming.py
tests/integrations/langchain/runnable_rails/test_tool_calling.py
tests/integrations/langchain/runnable_rails/test_transform_input.py
tests/integrations/langchain/runnable_rails/test_types.py
tests/integrations/langchain/test_actions_llm_utils.py
tests/integrations/langchain/test_langchain_llm_adapter.py
tests/integrations/langchain/test_output_rails_tool_calls.py
tests/integrations/langchain/test_reasoning_trace_extraction.py
tests/integrations/langchain/test_server_streaming.py
tests/integrations/langchain/test_streaming.py
tests/integrations/langchain/test_tool_calling_passthrough_only.py
tests/integrations/langchain/test_tool_calling_utils.py
tests/integrations/langchain/test_tool_calls_event_extraction.py
tests/integrations/langchain/test_tool_output_rails.py
tests/integrations/langchain/utils.py
tests/test_content_safety_actions.py
tests/test_content_safety_cache.py
tests/test_content_safety_integration.py
tests/test_context_updates.py
tests/test_execute_action.py
tests/test_guardrails_ai_e2e_v1.py
tests/test_integration_cache.py
tests/test_jailbreak_cache.py
tests/test_llama_guard.py
tests/test_llmrails.py
tests/test_output_rails_tool_calls.py
tests/test_patronus_lynx.py
tests/test_prompt_generation.py
tests/test_reasoning_trace_extraction.py
tests/test_system_message_conversion.py
tests/test_tool_calling_passthrough_only.py
tests/test_tool_calls_event_extraction.py
tests/test_tool_output_rails.py
tests/test_topic_safety_cache.py
tests/tracing/spans/test_span_v2_integration.py
tests/utils.py
    async def test_llm_call_prefers_additional_kwargs_over_think_tags(self):
        reasoning_from_kwargs = "This should be used"
        reasoning_from_tags = "This should be ignored"

        mock_llm = AsyncMock()
        mock_response = AIMessage(
            content=f"<think>{reasoning_from_tags}</think>Response",
            additional_kwargs={"reasoning_content": reasoning_from_kwargs},
        )
        mock_llm.ainvoke = AsyncMock(return_value=mock_response)

        from nemoguardrails.actions.llm.utils import llm_call

        reasoning_trace_var.set(None)
        result = await llm_call(LangChainLLMAdapter(mock_llm), "Query")

        assert result.content == f"<think>{reasoning_from_tags}</think>Response"
        stored_trace = reasoning_trace_var.get()
        assert stored_trace == reasoning_from_kwargs
Don't lock in <think> tags as user-visible output.
This assertion makes the mixed case preserve raw <think>...</think> content whenever additional_kwargs["reasoning_content"] is present. That hardens the exact leakage path the neighboring tests sanitize. The precedence rule should only decide which trace gets stored; the rendered content should still be stripped.
Possible fix
- assert result.content == f"<think>{reasoning_from_tags}</think>Response"
+ assert result.content == "Response"
+ assert "<think>" not in result.content
stored_trace = reasoning_trace_var.get()
assert stored_trace == reasoning_from_kwargs

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Original:

    async def test_llm_call_prefers_additional_kwargs_over_think_tags(self):
        reasoning_from_kwargs = "This should be used"
        reasoning_from_tags = "This should be ignored"
        mock_llm = AsyncMock()
        mock_response = AIMessage(
            content=f"<think>{reasoning_from_tags}</think>Response",
            additional_kwargs={"reasoning_content": reasoning_from_kwargs},
        )
        mock_llm.ainvoke = AsyncMock(return_value=mock_response)
        from nemoguardrails.actions.llm.utils import llm_call
        reasoning_trace_var.set(None)
        result = await llm_call(LangChainLLMAdapter(mock_llm), "Query")
        assert result.content == f"<think>{reasoning_from_tags}</think>Response"
        stored_trace = reasoning_trace_var.get()
        assert stored_trace == reasoning_from_kwargs

Suggested:

    async def test_llm_call_prefers_additional_kwargs_over_think_tags(self):
        reasoning_from_kwargs = "This should be used"
        reasoning_from_tags = "This should be ignored"
        mock_llm = AsyncMock()
        mock_response = AIMessage(
            content=f"<think>{reasoning_from_tags}</think>Response",
            additional_kwargs={"reasoning_content": reasoning_from_kwargs},
        )
        mock_llm.ainvoke = AsyncMock(return_value=mock_response)
        from nemoguardrails.actions.llm.utils import llm_call
        reasoning_trace_var.set(None)
        result = await llm_call(LangChainLLMAdapter(mock_llm), "Query")
        assert result.content == "Response"
        assert "<think>" not in result.content
        stored_trace = reasoning_trace_var.get()
        assert stored_trace == reasoning_from_kwargs
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/integrations/langchain/test_reasoning_trace_extraction.py` around lines
289 - 307, The test exposes that llm_call currently returns raw
"<think>...</think>" markup when additional_kwargs["reasoning_content"] is
present; change llm_call (used with LangChainLLMAdapter and AIMessage) so it
still prefers storing additional_kwargs["reasoning_content"] into
reasoning_trace_var but always strips any <think>...</think> segments from the
returned AIMessage.content before returning it; locate the logic in llm_call
that determines stored_trace vs output content and ensure stored_trace selection
(from additional_kwargs or embedded tags) is kept, then apply a sanitize step to
the message content (removing <think>...</think> blocks) so user-visible content
never contains raw think tags.
    @action(is_system_action=True)
    async def validate_tool_parameters(tool_calls, context=None, **kwargs):
        """Test implementation of tool parameter validation."""
        tool_calls = tool_calls or (context.get("tool_calls", []) if context else [])

        dangerous_patterns = ["eval", "exec", "system", "../", "rm -", "DROP", "DELETE"]

        for tool_call in tool_calls:
            func = tool_call.get("function", {})
            args = func.get("arguments", {})
            for param_value in args.values():
                if isinstance(param_value, str):
                    if any(pattern.lower() in param_value.lower() for pattern in dangerous_patterns):
                        return False
        return True


    @action(is_system_action=True)
    async def self_check_tool_calls(tool_calls, context=None, **kwargs):
        """Test implementation of tool call validation."""
        tool_calls = tool_calls or (context.get("tool_calls", []) if context else [])

        return all(isinstance(call, dict) and "function" in call and "id" in call for call in tool_calls)
🧩 Analysis chain
🌐 Web query:
In langchain-core 0.2.14, what is the schema of each item in AIMessage.tool_calls? Does it use top-level "name"/"args"/"id"/"type", or a nested "function": {"name","arguments"} object?
💡 Result:
In langchain-core 0.2.14, each item in AIMessage.tool_calls is a ToolCall (TypedDict) with top-level keys: "name" (str), "args" (dict[str, Any]), "id" (str | None), and optionally "type" (Literal["tool_call"]). It does not use a nested "function": {"name", "arguments"} object. This schema was introduced in the PR adding tool calls to AIMessage (merged April 2024), and issues from late 2024 (e.g., #27938 with langchain-core==0.2.33) confirm LangChain's ToolCall uses top-level "name"/"args"/"id", distinguishing it from OpenAI's nested format. Source code examples across versions consistently show this structure.
Citations:
- 1: https://reference.langchain.com/python/langchain-core/messages/ai/AIMessage/tool_calls
- 2: [core]: add tool calls message langchain-ai/langchain#18947
- 3: https://reference.langchain.com/python/langchain-core/messages/ai/AIMessage
- 4: AIMessage enforces ToolCall model incompatible with OpenAI langchain-ai/langchain#27938
- 5: https://github.com/langchain-ai/langchain/blob/master/libs/core/langchain_core/messages/tool.py
- 6: https://github.com/langchain-ai/langchain/blob/273d282a/libs/core/langchain_core/messages/tool.py
- 7: https://github.com/langchain-ai/langchain/blob/50febb79/libs/core/langchain_core/messages/tool.py
- 8: https://github.com/langchain-ai/langchain/blob/feb992abfe01c28a669980dad022c5eed58e878d/libs/core/langchain_core/messages/tool.py
- 9: https://newreleases.io/project/github/langchain-ai/langchain/release/langchain%3D%3D0.2.14
- 10: langchain: release 0.2.14 langchain-ai/langchain#25416
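The two tool-call shapes the query result distinguishes can be shown side by side; the dicts and the normalizing helper below are invented for illustration, not taken from either library's API:

```python
import json

# LangChain-core ToolCall: top-level keys (per the query result above)
lc_call = {"name": "get_weather", "args": {"city": "Paris"}, "id": "call_1", "type": "tool_call"}

# OpenAI-style nested shape, with JSON-string arguments
openai_call = {"id": "call_1", "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'}}


def extract_name_and_args(call):
    """Hypothetical normalizer accepting both shapes."""
    if "function" in call:
        func = call["function"]
        raw = func.get("arguments", {})
        # OpenAI serializes arguments as a JSON string; parse when needed
        args = json.loads(raw) if isinstance(raw, str) else raw
        return func.get("name"), args
    return call.get("name"), call.get("args", {})


print(extract_name_and_args(lc_call))      # ('get_weather', {'city': 'Paris'})
print(extract_name_and_args(openai_call))  # ('get_weather', {'city': 'Paris'})
```

This mismatch is exactly why the helpers below, which only read the nested `function.arguments` path, never inspect arguments supplied in LangChain's top-level format.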
🏁 Script executed:
# First, let's find the test file and read it
find . -name "test_tool_calls_event_extraction.py" -type f

Repository: NVIDIA-NeMo/Guardrails
Length of output: 176
🏁 Script executed:
# Let's read the test file that's referenced in the review
wc -l ./tests/integrations/langchain/test_tool_calls_event_extraction.py

Repository: NVIDIA-NeMo/Guardrails
Length of output: 136
🏁 Script executed:
# Read the helper functions at lines 27-49
sed -n '27,49p' ./tests/integrations/langchain/test_tool_calls_event_extraction.py

Repository: NVIDIA-NeMo/Guardrails
Length of output: 1092
🏁 Script executed:
# Check the test at line 54
sed -n '50,80p' ./tests/integrations/langchain/test_tool_calls_event_extraction.py

Repository: NVIDIA-NeMo/Guardrails
Length of output: 891
🏁 Script executed:
# Check the test at line 196
sed -n '190,220p' ./tests/integrations/langchain/test_tool_calls_event_extraction.py

Repository: NVIDIA-NeMo/Guardrails
Length of output: 914
🏁 Script executed:
# Check the test at line 384
sed -n '378,410p' ./tests/integrations/langchain/test_tool_calls_event_extraction.py

Repository: NVIDIA-NeMo/Guardrails
Length of output: 929
Fix helper functions to handle LangChain's top-level tool-call schema.
The helpers at lines 35–36 and 49 read a nested function.arguments structure, but the tests at lines 54, 196, and 384 use LangChain's actual schema with top-level name/args fields. Since the helpers look for the wrong schema, the dangerous argument strings are never inspected—these blocking-path tests don't actually exercise the security checks they claim to validate.
The fix requires normalizing both schemas:
Proposed fix
@action(is_system_action=True)
async def validate_tool_parameters(tool_calls, context=None, **kwargs):
"""Test implementation of tool parameter validation."""
tool_calls = tool_calls or (context.get("tool_calls", []) if context else [])
dangerous_patterns = ["eval", "exec", "system", "../", "rm -", "DROP", "DELETE"]
for tool_call in tool_calls:
- func = tool_call.get("function", {})
- args = func.get("arguments", {})
+ if "function" in tool_call:
+ args = tool_call.get("function", {}).get("arguments", {}) or {}
+ else:
+ args = tool_call.get("args", {}) or {}
for param_value in args.values():
if isinstance(param_value, str):
if any(pattern.lower() in param_value.lower() for pattern in dangerous_patterns):
return False
return True
@@
@action(is_system_action=True)
async def self_check_tool_calls(tool_calls, context=None, **kwargs):
"""Test implementation of tool call validation."""
tool_calls = tool_calls or (context.get("tool_calls", []) if context else [])
- return all(isinstance(call, dict) and "function" in call and "id" in call for call in tool_calls)
+ return all(
+ isinstance(call, dict)
+ and "id" in call
+ and ("function" in call or ("name" in call and "args" in call))
+ for call in tool_calls
+ )

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Original:

    @action(is_system_action=True)
    async def validate_tool_parameters(tool_calls, context=None, **kwargs):
        """Test implementation of tool parameter validation."""
        tool_calls = tool_calls or (context.get("tool_calls", []) if context else [])
        dangerous_patterns = ["eval", "exec", "system", "../", "rm -", "DROP", "DELETE"]
        for tool_call in tool_calls:
            func = tool_call.get("function", {})
            args = func.get("arguments", {})
            for param_value in args.values():
                if isinstance(param_value, str):
                    if any(pattern.lower() in param_value.lower() for pattern in dangerous_patterns):
                        return False
        return True

    @action(is_system_action=True)
    async def self_check_tool_calls(tool_calls, context=None, **kwargs):
        """Test implementation of tool call validation."""
        tool_calls = tool_calls or (context.get("tool_calls", []) if context else [])
        return all(isinstance(call, dict) and "function" in call and "id" in call for call in tool_calls)

Suggested:

    @action(is_system_action=True)
    async def validate_tool_parameters(tool_calls, context=None, **kwargs):
        """Test implementation of tool parameter validation."""
        tool_calls = tool_calls or (context.get("tool_calls", []) if context else [])
        dangerous_patterns = ["eval", "exec", "system", "../", "rm -", "DROP", "DELETE"]
        for tool_call in tool_calls:
            if "function" in tool_call:
                args = tool_call.get("function", {}).get("arguments", {}) or {}
            else:
                args = tool_call.get("args", {}) or {}
            for param_value in args.values():
                if isinstance(param_value, str):
                    if any(pattern.lower() in param_value.lower() for pattern in dangerous_patterns):
                        return False
        return True

    @action(is_system_action=True)
    async def self_check_tool_calls(tool_calls, context=None, **kwargs):
        """Test implementation of tool call validation."""
        tool_calls = tool_calls or (context.get("tool_calls", []) if context else [])
        return all(
            isinstance(call, dict)
            and "id" in call
            and ("function" in call or ("name" in call and "args" in call))
            for call in tool_calls
        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/integrations/langchain/test_tool_calls_event_extraction.py` around
lines 27 - 49, The helper functions validate_tool_parameters and
self_check_tool_calls currently only handle a nested function.arguments schema
and miss LangChain's top-level tool-call shape (with top-level "name" and
"args"), so dangerous strings never get inspected; update both functions to
normalize the incoming call shape first by accepting either call.get("function",
{}).get("arguments", {}) or call.get("args") (and likewise accept
call.get("function", {}).get("name") or call.get("name")), then iterate
parameter values from that normalized args mapping when checking dangerous
patterns in validate_tool_parameters, and update self_check_tool_calls to
consider a call valid if it has either a "function" dict with "id"/"function" or
the top-level "name" and "args" fields (and still require an "id" if expected),
ensuring both schemas are supported before performing the existing validations.
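The normalization the prompt describes can be sketched as a standalone helper. The function names below are illustrative, not part of the repository:

```python
# Illustrative helpers (not from the PR) showing the shape normalization the
# prompt asks for: accept both the OpenAI-style nested schema
# ({"function": {"name", "arguments"}}) and LangChain's top-level ToolCall
# schema ({"name", "args"}).

def normalize_tool_call(call: dict) -> tuple:
    """Return (name, args) regardless of which schema the call uses."""
    if "function" in call:
        func = call.get("function") or {}
        return func.get("name"), func.get("arguments") or {}
    return call.get("name"), call.get("args") or {}


def has_dangerous_args(call: dict, patterns: list) -> bool:
    """Check normalized argument values against the dangerous patterns."""
    _, args = normalize_tool_call(call)
    return any(
        isinstance(value, str)
        and any(p.lower() in value.lower() for p in patterns)
        for value in args.values()
    )
```

With this in place, both validators can share one code path instead of special-casing each schema inline.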
```python
call_count = 0

def mock_get_and_clear():
    nonlocal call_count
    call_count += 1
    if call_count == 1:
        return test_tool_calls
    return None

with patch(
    "nemoguardrails.actions.llm.utils.get_and_clear_tool_calls_contextvar",
    side_effect=mock_get_and_clear,
):
    chat = TestChat(config, llm_completions=[""])

    result = await chat.app.generate_async(messages=[{"role": "user", "content": "Test"}])

    assert call_count >= 1, "get_and_clear_tool_calls_contextvar should be called"
    assert result["tool_calls"] is not None
    assert result["tool_calls"][0]["name"] == "test_tool"
```
Assert the “once” contract explicitly.
Line 136 only checks that the hook was called at least once, so an implementation that reads and clears the context multiple times would still pass. If this test is meant to lock down single-pop behavior, it should assert call_count == 1.
Proposed fix

```diff
- assert call_count >= 1, "get_and_clear_tool_calls_contextvar should be called"
+ assert call_count == 1, "get_and_clear_tool_calls_contextvar should only be called once"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/integrations/langchain/test_tool_calls_event_extraction.py` around
lines 119 - 138, The test currently only asserts call_count >= 1 so multiple
reads would still pass; modify the assertion to enforce the single-pop contract
by checking call_count == 1 for the mocked get_and_clear_tool_calls_contextvar
(mock_get_and_clear), keeping the rest of the expectations (result from
TestChat.generate_async and result["tool_calls"] assertions) unchanged so the
test verifies exactly one read/clear occurred.
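As an aside, the exactly-once contract can also be asserted on the patched mock itself rather than a manual counter. A minimal sketch, independent of the repository's code:

```python
# A list side_effect yields one value per call, and assert_called_once()
# raises if the mock was invoked more than once.
from unittest.mock import MagicMock

get_and_clear = MagicMock(side_effect=[["call_1"], None])

first = get_and_clear()
assert first == ["call_1"]

get_and_clear.assert_called_once()  # fails once a second call happens
```

Either style works; the mock-based assertion also reports the actual call count in its failure message.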
```python
@pytest.mark.asyncio
async def test_tool_rails_cannot_clear_context_variable():
    from nemoguardrails.context import tool_calls_var

    test_tool_calls = [
        {
            "id": "call_blocked",
            "type": "function",
            "function": {
                "name": "blocked_tool",
                "arguments": {"param": "rm -rf /"},
            },
        }
    ]

    tool_calls_var.set(test_tool_calls)

    context = {"tool_calls": test_tool_calls}
    result = await validate_tool_parameters(test_tool_calls, context=context)

    assert result is False
    assert tool_calls_var.get() is not None, "Context variable should not be cleared by tool rails"
    assert tool_calls_var.get()[0]["function"]["name"] == "blocked_tool"
```
Restore tool_calls_var after the test.
Line 182 mutates shared ContextVar state and never resets it. That can leak stale tool calls into later tests that read tool_calls_var, making failures order-dependent.
Proposed fix

```diff
-    tool_calls_var.set(test_tool_calls)
-
-    context = {"tool_calls": test_tool_calls}
-    result = await validate_tool_parameters(test_tool_calls, context=context)
-
-    assert result is False
-    assert tool_calls_var.get() is not None, "Context variable should not be cleared by tool rails"
-    assert tool_calls_var.get()[0]["function"]["name"] == "blocked_tool"
+    token = tool_calls_var.set(test_tool_calls)
+    try:
+        context = {"tool_calls": test_tool_calls}
+        result = await validate_tool_parameters(test_tool_calls, context=context)
+
+        assert result is False
+        assert tool_calls_var.get() is not None, "Context variable should not be cleared by tool rails"
+        assert tool_calls_var.get()[0]["function"]["name"] == "blocked_tool"
+    finally:
+        tool_calls_var.reset(token)
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
@pytest.mark.asyncio
async def test_tool_rails_cannot_clear_context_variable():
    from nemoguardrails.context import tool_calls_var

    test_tool_calls = [
        {
            "id": "call_blocked",
            "type": "function",
            "function": {
                "name": "blocked_tool",
                "arguments": {"param": "rm -rf /"},
            },
        }
    ]

    token = tool_calls_var.set(test_tool_calls)
    try:
        context = {"tool_calls": test_tool_calls}
        result = await validate_tool_parameters(test_tool_calls, context=context)

        assert result is False
        assert tool_calls_var.get() is not None, "Context variable should not be cleared by tool rails"
        assert tool_calls_var.get()[0]["function"]["name"] == "blocked_tool"
    finally:
        tool_calls_var.reset(token)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/integrations/langchain/test_tool_calls_event_extraction.py` around
lines 167 - 189, The test mutates the shared ContextVar tool_calls_var and never
restores it; change test_tool_rails_cannot_clear_context_variable to capture the
token returned by tool_calls_var.set(...) and ensure tool_calls_var.reset(token)
is called in a finally block (or teardown) so the ContextVar is restored even on
assertion failures; reference the tool_calls_var and validate_tool_parameters
symbols when making this change.
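The token-based restore the fix relies on follows directly from the stdlib `contextvars` API; a self-contained illustration:

```python
# ContextVar.set() returns a Token; reset(token) restores the previous value.
# Wrapping the body in try/finally guarantees the restore even when an
# assertion inside the block fails.
import contextvars

tool_calls_var = contextvars.ContextVar("tool_calls", default=None)

token = tool_calls_var.set([{"id": "call_blocked"}])
try:
    assert tool_calls_var.get() == [{"id": "call_blocked"}]
finally:
    tool_calls_var.reset(token)

assert tool_calls_var.get() is None  # back to the default
```

This is why capturing the token (instead of calling `set(None)` in teardown) is the right fix: `reset` restores whatever value was current before the test, not just a hardcoded default.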
```python
@action(is_system_action=True)
async def validate_tool_parameters(tool_calls, context=None, **kwargs):
    """Test implementation of tool parameter validation."""
    tool_calls = tool_calls or (context.get("tool_calls", []) if context else [])

    dangerous_patterns = ["eval", "exec", "system", "../", "rm -", "DROP", "DELETE"]

    for tool_call in tool_calls:
        func = tool_call.get("function", {})
        args = func.get("arguments", {})
        for param_value in args.values():
            if isinstance(param_value, str):
                if any(pattern.lower() in param_value.lower() for pattern in dangerous_patterns):
                    return False
    return True


@action(is_system_action=True)
async def self_check_tool_calls(tool_calls, context=None, **kwargs):
    """Test implementation of tool call validation."""
    tool_calls = tool_calls or (context.get("tool_calls", []) if context else [])

    return all(isinstance(call, dict) and "function" in call and "id" in call for call in tool_calls)
```
🧩 Analysis chain
🌐 Web query:
In langchain-core 0.2.14, what is the schema of each item in AIMessage.tool_calls? Does it use top-level "name"/"args"/"id"/"type", or a nested "function": {"name","arguments"} object?
💡 Result:
In langchain-core 0.2.14, each item in AIMessage.tool_calls is a ToolCall object (TypedDict) with top-level keys: name (str), args (dict[str, Any]), id (str | None), and optionally type (Literal["tool_call"]). It does not use a nested "function" object with "name" and "arguments".
Citations:
- 1: https://reference.langchain.com/python/langchain-core/messages/ai/AIMessage/tool_calls
- 2: [core]: add tool calls message langchain-ai/langchain#18947
- 3: AIMessage enforces ToolCall model incompatible with OpenAI langchain-ai/langchain#27938
- 4: https://github.com/langchain-ai/langchain/blob/master/libs/core/langchain_core/messages/tool.py
- 5: https://blog.langchain.dev/tool-calling-with-langchain/
🏁 Script executed:

```shell
fd "test_tool_output_rails.py" -t f
```

Repository: NVIDIA-NeMo/Guardrails
Length of output: 152
🏁 Script executed:
cat -n tests/integrations/langchain/test_tool_output_rails.pyRepository: NVIDIA-NeMo/Guardrails
Length of output: 8777
Fix validators to handle LangChain-style tool_calls format.
The test fixtures at lines 56, 107, and 167 provide LangChain-style tool_calls with top-level "name", "args", "id", and "type" keys. However, validate_tool_parameters (line 35–36) and self_check_tool_calls (line 49) are hardcoded for OpenAI-style nested format with "function" and "arguments". This causes:
- `validate_tool_parameters` to silently skip dangerous patterns (always returns True)
- `self_check_tool_calls` to reject valid tool_calls (always returns False)
Tests currently assert the opposite of intended behavior.
Proposed fix

```diff
 @action(is_system_action=True)
 async def validate_tool_parameters(tool_calls, context=None, **kwargs):
     """Test implementation of tool parameter validation."""
     tool_calls = tool_calls or (context.get("tool_calls", []) if context else [])
     dangerous_patterns = ["eval", "exec", "system", "../", "rm -", "DROP", "DELETE"]
     for tool_call in tool_calls:
-        func = tool_call.get("function", {})
-        args = func.get("arguments", {})
+        if "function" in tool_call:
+            args = tool_call.get("function", {}).get("arguments", {}) or {}
+        else:
+            args = tool_call.get("args", {}) or {}
         for param_value in args.values():
             if isinstance(param_value, str):
                 if any(pattern.lower() in param_value.lower() for pattern in dangerous_patterns):
                     return False
     return True
@@
 @action(is_system_action=True)
 async def self_check_tool_calls(tool_calls, context=None, **kwargs):
     """Test implementation of tool call validation."""
     tool_calls = tool_calls or (context.get("tool_calls", []) if context else [])
-    return all(isinstance(call, dict) and "function" in call and "id" in call for call in tool_calls)
+    return all(
+        isinstance(call, dict)
+        and "id" in call
+        and ("function" in call or ("name" in call and "args" in call))
+        for call in tool_calls
+    )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/integrations/langchain/test_tool_output_rails.py` around lines 27 - 49,
The validators assume an OpenAI nested format but tests use LangChain-style
tool_calls; update validate_tool_parameters and self_check_tool_calls to accept
both formats by normalizing tool_calls entries: for each call in
validate_tool_parameters extract parameters from either call.get("function",
{}).get("arguments", {}) or call.get("args", {}) (falling back appropriately)
and continue checking strings against dangerous_patterns, and in
self_check_tool_calls return True when each call is a dict that contains either
the OpenAI keys ("function" and "id") or the LangChain keys ("name", "args", and
"id"); reference the existing functions validate_tool_parameters,
self_check_tool_calls, the dangerous_patterns list, and the tool_calls variable
when making the changes.
```python
def get_bound_llm_magic_mock(ainvoke_return_value: Union[AIMessage, dict]) -> MagicMock:
    mock_llm = MagicMock()
    mock_llm.return_value = mock_llm

    bound_llm_mock = AsyncMock()
    if isinstance(ainvoke_return_value, dict):
        bound_llm_mock.ainvoke.return_value = MagicMock(**ainvoke_return_value)
    else:
        bound_llm_mock.ainvoke.return_value = ainvoke_return_value

    mock_llm.bind.return_value = bound_llm_mock
    if isinstance(ainvoke_return_value, dict):
        mock_llm.ainvoke = AsyncMock(return_value=MagicMock(**ainvoke_return_value))
    else:
        mock_llm.ainvoke = AsyncMock(return_value=ainvoke_return_value)
```
Return a real dict from the helper's dict path.
MagicMock(**ainvoke_return_value) turns dict keys into attributes, not mapping behavior. Any code path that does isinstance(res, dict), res["key"], or res.get(...) will see a different shape than production, so this helper can mask bugs in the exact branch it claims to support.
Possible fix

```diff
 def get_bound_llm_magic_mock(ainvoke_return_value: Union[AIMessage, dict]) -> MagicMock:
     mock_llm = MagicMock()
     mock_llm.return_value = mock_llm
     bound_llm_mock = AsyncMock()
     if isinstance(ainvoke_return_value, dict):
-        bound_llm_mock.ainvoke.return_value = MagicMock(**ainvoke_return_value)
+        bound_llm_mock.ainvoke.return_value = ainvoke_return_value
     else:
         bound_llm_mock.ainvoke.return_value = ainvoke_return_value
     mock_llm.bind.return_value = bound_llm_mock
     if isinstance(ainvoke_return_value, dict):
-        mock_llm.ainvoke = AsyncMock(return_value=MagicMock(**ainvoke_return_value))
+        mock_llm.ainvoke = AsyncMock(return_value=ainvoke_return_value)
     else:
         mock_llm.ainvoke = AsyncMock(return_value=ainvoke_return_value)
     return mock_llm
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/integrations/langchain/utils.py` around lines 104 - 118,
get_bound_llm_magic_mock is incorrectly wrapping dict responses with
MagicMock(**...), which makes dicts behave like objects; change both places
handling the dict branch so they return a real dict instead (e.g., use
dict(ainvoke_return_value) or ainvoke_return_value directly) for
bound_llm_mock.ainvoke.return_value and for mock_llm.ainvoke = AsyncMock(...),
while keeping the existing AsyncMock/AIMessage branch behavior; update
references to bound_llm_mock, mock_llm, and ainvoke_return_value accordingly.
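The behavioral gap the review points out is easy to demonstrate in isolation:

```python
# MagicMock(**payload) maps dict keys to attributes, so code that treats the
# response as a mapping sees a completely different shape than production.
from unittest.mock import MagicMock

payload = {"content": "hi", "tool_calls": []}
mocked = MagicMock(**payload)

assert mocked.content == "hi"          # attribute access works
assert not isinstance(mocked, dict)    # but isinstance(res, dict) branches fail
assert mocked.get("content") != "hi"   # .get() returns a child mock, not the value

assert payload.get("content") == "hi"  # the real dict behaves as callers expect
```

Returning the dict unchanged, as the fix does, keeps the helper faithful to what the production `ainvoke` path would actually receive.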
```diff
 fake_llm = FakeLLMModel(
     llm_responses=[
         LLMResponse(
             content="Processing with metadata.",
             tool_calls=test_tool_calls,
             provider_metadata={"model": "test-model", "usage": {"tokens": 50}},
         )
     ]
 )
 rails = LLMRails(config, llm=fake_llm)
 result = await rails.generate_async(messages=[{"role": "user", "content": "Process with metadata"}])

-assert result.tool_calls is not None
-assert result.tool_calls[0]["name"] == "preserve_test"
-assert result.content == ""
-assert hasattr(result, "response_metadata")
+assert result["tool_calls"] is not None
+assert result["tool_calls"][0]["function"]["name"] == "preserve_test"
+assert result["content"] == ""
```
This test no longer verifies metadata propagation.
provider_metadata is part of the fixture here, but none of the assertions inspect it. A regression that drops metadata entirely would still pass, so the test name and the behavior it actually covers have drifted apart. Either assert the public metadata output in this test or remove/rename the metadata setup.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_tool_calls_event_extraction.py` around lines 321 - 335, The test
sets provider_metadata on the fake LLM response but never asserts it, so add an
assertion to verify metadata propagation: after calling LLMRails.generate_async
(using FakeLLMModel, LLMResponse, LLMRails) assert that the returned result
includes the provider metadata (e.g., result["tool_calls"][0] contains the same
provider_metadata dict or a public metadata field) to ensure metadata isn't
dropped; update or add an assertion comparing the expected {"model":
"test-model", "usage": {"tokens": 50}} to the value returned by result so the
test actually verifies metadata propagation.
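A subset-style check would close the gap. This sketch assumes the rails result exposes metadata as a plain dict; the `response_metadata` key is a stand-in for whatever public field the result actually uses, not a confirmed API:

```python
# Hypothetical shape: the fixture's provider metadata must survive into the
# result. Checking key by key tolerates extra metadata a provider might add.
expected = {"model": "test-model", "usage": {"tokens": 50}}
result = {
    "content": "",
    "response_metadata": {"model": "test-model", "usage": {"tokens": 50}},
}

for key, value in expected.items():
    assert result["response_metadata"][key] == value
```

Unlike a bare `hasattr` check, this fails when the metadata values themselves are dropped or mangled.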
```diff
 if llm is not None:
     self.llm = llm
 elif llm_completions is not None:
     main_model = next((model for model in config.models if model.type == "main"), None)
     should_return_token_usage = bool(main_model and main_model.engine in _TEST_PROVIDERS_WITH_TOKEN_USAGE)

-    self.llm = FakeLLM(
+    self.llm = FakeLLMModel(
         responses=llm_completions,
         streaming=streaming,
         exception=llm_exception,
         token_usage=token_usage,
         should_return_token_usage=should_return_token_usage,
     )
     if llm_exception:
         self.llm.exception = llm_exception
 else:
     self.llm = None
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
python - <<'PY'
import ast
from pathlib import Path
for path in Path(".").rglob("*.py"):
try:
tree = ast.parse(path.read_text())
except Exception:
continue
for node in ast.walk(tree):
if isinstance(node, ast.Call) and getattr(node.func, "id", None) == "TestChat":
kw_names = {kw.arg for kw in node.keywords if kw.arg}
if len(node.args) == 1 and "llm" not in kw_names and "llm_completions" not in kw_names:
print(f"{path}:{node.lineno}: TestChat called without llm or llm_completions")
PYRepository: NVIDIA-NeMo/Guardrails
Length of output: 940
Address TestChat silent fallback to None LLM.
The new `else: self.llm = None` at line 171 silently switches instances created without `llm` or `llm_completions` to bypass the test double entirely. This affects at least 11 existing test calls:

- `tests/test_ai_defense.py` (lines 72, 642, 1431, 1465, 1502, 1540, 1584, 1625)
- `tests/test_jailbreak_heuristics.py` (lines 52, 61)
- `tests/test_jailbreak_models.py` (line 56)
Instead of fail-fast behavior, tests now delegate to real LLMRails without a fake, hiding unexpected model calls and changing failure modes from "missing stub" to config/provider-dependent errors. Either provide an empty FakeLLMModel as the default or raise an error when both parameters are missing.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/utils.py` around lines 158 - 172, The constructor logic in TestChat
silently sets self.llm = None when both llm and llm_completions are missing,
causing tests to hit the real LLM; change the else branch to either raise a
clear error or instantiate a default stubbed FakeLLMModel instead. Specifically,
in the TestChat initialization where llm and llm_completions are checked,
replace "else: self.llm = None" with one of: (a) raise ValueError("No llm or
llm_completions provided to TestChat; tests must supply a fake"), or (b)
self.llm = FakeLLMModel(responses=[], streaming=False, exception=None,
token_usage=None, should_return_token_usage=False) so tests fail-fast or use an
empty fake respectively; refer to the TestChat constructor and the FakeLLMModel
symbol when applying the change.
tgasser-nv
left a comment
There are some tests outside of tests/integrations/langchain so they're not covered by conftest.py skip logic. These would need moving to the new location.
- tests/test_configs/with_custom_llm/config.py → imports CustomLLM
- tests/test_configs/with_custom_llm/custom_llm.py → extends langchain_core.language_models.BaseLLM
- tests/test_configs/with_custom_chat_model/config.py → imports CustomChatModel
- tests/test_configs/with_custom_chat_model/custom_chat_model.py → extends langchain_core.language_models.BaseChatModel
Could you also take a look at the conftest.py comment from CodeRabbit?
These would surface when changing the default framework. I've fixed these in the last stack. I'll bring them here.
Yes. This one is critical. Fixed it, will push 👍🏻
…integrations/langchain/ test_custom_llm.py is LangChain-specific — it asserts that custom_llm and custom_chat_model hooks register against the LangChain provider registries. It belongs alongside the other LangChain-only tests already under tests/integrations/langchain/. The three LangChain-based test configs (with_custom_llm, with_custom_chat_model, with_custom_llm_prompt_action_v2_x) are duplicated into tests/integrations/langchain/test_configs/ so the moved test continues to resolve them via its sibling ./test_configs path. The originals under tests/test_configs/ remain in place as the default-framework versions — they will be rewritten for the LLMModel protocol in the framework cutover that introduces the new default.
Replace pytest_collection_modifyitems with collect_ignore_glob so LangChain-gated test modules are skipped before pytest imports them.
Force-pushed ca11b55 to 344c8f7
Summary
- `FakeLLMModel` in `tests/utils.py` as a framework-agnostic test double implementing the `LLMModel` protocol (replaces LangChain `FakeLLM` in core tests)
- `tests/integrations/langchain/conftest.py` auto-skips LangChain tests when `langchain` is not installed
- Test configs (`with_custom_llm`, `with_custom_chat_model`) rewritten to use the `LLMModel` protocol, keeping LangChain copies under `tests/integrations/langchain/test_configs/`
- Fixed `FakeLLM` import paths in runnable_rails integration tests
Tests that used LangChain types directly are split into two versions:
- Core tests (`tests/test_*.py`): rewritten to use `FakeLLMModel` and canonical types (`LLMResponse`, `ToolCall`). No LangChain dependency.
- Integration tests (`tests/integrations/langchain/test_*.py`): copy of the original LangChain version, preserved for LangChain-specific coverage.
This means some files show as full rewrites (core) and their counterparts show as new files (integration copy). The test logic is the same, only the types and test doubles changed.
Review guidance
No new tests were added in this PR. The work is purely structural: moving, copying, and rewriting existing tests to use framework-agnostic types. The test logic and assertions are unchanged.
When reviewing:
- `tests/integrations/langchain/`: these are copies of existing tests, preserved as-is. A quick scan to confirm they match the originals is sufficient.
- Core tests (`tests/test_*.py`): focus on the type changes (`AIMessage` -> `LLMResponse`, `FakeLLM` -> `FakeLLMModel`, LangChain tool call dicts -> `ToolCall`/`ToolCallFunction`). The test structure and assertions should be equivalent.
- `tests/utils.py`: the `FakeLLMModel` implementation is the key new code. Verify it correctly implements the `LLMModel` protocol (`generate_async`, `stream_async`, `model_name`, `provider_name`, `provider_url`).

Test plan
- `poetry run pytest tests/ -x --ignore=tests/integrations/langchain` (core tests pass without LangChain)
- `poetry run pytest tests/integrations/langchain/ -x` (LangChain tests pass)
- `poetry run pytest tests/ -x` (full suite)

Summary by CodeRabbit