
refactor(tests): framework-agnostic test infrastructure#1790

Open
Pouyanpi wants to merge 9 commits into develop from
refactor/langchain-decouple/stack-8-test-infrastructure

Conversation

Collaborator

@Pouyanpi Pouyanpi commented Apr 14, 2026

Summary

  • Add FakeLLMModel in tests/utils.py as a framework-agnostic test double implementing LLMModel protocol (replaces LangChain FakeLLM in core tests)
  • Move approximately 20 LangChain-specific test files to tests/integrations/langchain/
  • Add conftest.py auto-skip for LangChain tests when langchain is not installed
  • Rewrite core test configs (with_custom_llm, with_custom_chat_model) to use LLMModel protocol, keep LangChain copies under tests/integrations/langchain/test_configs/
  • Fix FakeLLM import paths in runnable_rails integration tests
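For reviewers unfamiliar with the new test double, a minimal sketch of its shape, assuming names from this description (the real FakeLLMModel in tests/utils.py takes more configuration, and the real LLMResponse lives in the library):

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, List, Optional

@dataclass
class LLMResponse:
    # Illustrative stand-in for the canonical response type named in this PR.
    content: str

class FakeLLMModel:
    """Framework-agnostic test double: returns queued responses in order."""

    def __init__(self, responses: List[str]):
        self.responses = list(responses)
        # Protocol surface the review guidance asks to verify.
        self.model_name = "fake-model"
        self.provider_name = "fake-provider"
        self.provider_url: Optional[str] = None

    async def generate_async(self, prompt: str) -> LLMResponse:
        # Pop the next canned response, like the LangChain FakeLLM it replaces.
        return LLMResponse(content=self.responses.pop(0))

    async def stream_async(self, prompt: str) -> AsyncIterator[LLMResponse]:
        # Stream the next canned response as word-sized chunks.
        for chunk in self.responses.pop(0).split():
            yield LLMResponse(content=chunk)

fake = FakeLLMModel(responses=["Hello there!"])
print(asyncio.run(fake.generate_async("Hi")).content)  # Hello there!
```

Because the fake depends only on plain Python types, core tests written against it need no LangChain import at all.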

Why the large diffs

Tests that used LangChain types directly are split into two versions:

  • Core (tests/test_*.py): rewritten to use FakeLLMModel and canonical
    types (LLMResponse, ToolCall). No LangChain dependency.
  • Integration (tests/integrations/langchain/test_*.py): copy of the
    original LangChain version, preserved for LangChain-specific coverage.

This means some files show as full rewrites (core) and their counterparts show as new files (integration copy). The test logic is the same, only the types and test doubles changed.

Review guidance

No new tests were added in this PR. The work is purely structural: moving, copying, and rewriting existing tests to use framework-agnostic types. The test logic and assertions are unchanged.

When reviewing:

  • New files under tests/integrations/langchain/: these are copies of
    existing tests, preserved as-is. A quick scan to confirm they match the
    originals is sufficient.
  • Rewritten core files (tests/test_*.py): focus on the type changes
    (AIMessage -> LLMResponse, FakeLLM -> FakeLLMModel, LangChain
    tool call dicts -> ToolCall/ToolCallFunction). The test structure
    and assertions should be equivalent.
  • tests/utils.py: the FakeLLMModel implementation is the key new
    code. Verify it correctly implements the LLMModel protocol (
    generate_async, stream_async, model_name, provider_name,
    provider_url).

Test plan

  • poetry run pytest tests/ -x --ignore=tests/integrations/langchain (core tests pass without LangChain)
  • poetry run pytest tests/integrations/langchain/ -x (LangChain tests pass)
  • poetry run pytest tests/ -x (full suite)

Summary by CodeRabbit

  • Tests
    • Added automatic skipping of integration tests when optional dependencies are unavailable.
    • Expanded test coverage for tool calling, reasoning trace extraction, and output rail validation in LangChain scenarios.
    • Refactored test infrastructure for improved maintainability and consistency.

@Pouyanpi Pouyanpi modified the milestones: v0.23.0, v0.22.0 Apr 14, 2026
@Pouyanpi Pouyanpi self-assigned this Apr 14, 2026

greptile-apps bot commented Apr 14, 2026

Greptile Summary

This PR decouples the test infrastructure from LangChain by introducing FakeLLMModel — a framework-agnostic test double implementing the LLMModel protocol — and reorganising ~20 LangChain-specific test files under tests/integrations/langchain/. Core tests are rewritten to use LLMResponse/ToolCall protocol types; LangChain copies are preserved as integration tests with their original LangChain types intact.

Confidence Score: 5/5

Safe to merge; only one P2 style concern remains (unused streaming flag on FakeLLMModel).

All prior P0/P1 concerns from previous threads have been addressed (copy-mutation fixed with copy.copy, collect_ignore_glob extended to cover both flat and nested patterns). The sole remaining finding is a no-op streaming attribute — dead code that does not affect correctness or test coverage.

tests/utils.py — streaming stored but never consulted in generate_async/stream_async.

Important Files Changed

Filename Overview
tests/utils.py New FakeLLMModel cleanly implements the LLMModel protocol; streaming flag is stored but unused in both async methods — minor dead-code concern.
tests/conftest.py Adds collect_ignore_glob to skip LangChain tests when langchain_core is absent; prior-thread concerns were partially addressed by adding both *.py and **/*.py patterns.
tests/integrations/langchain/utils.py Relocated FakeLLM (LangChain) and get_bound_llm_magic_mock from tests/utils.py — now correctly isolated to the LangChain integration layer.
tests/test_llmrails.py All FakeLLM → FakeLLMModel swaps; assertion updated to assert llm_rails.llm is injected_llm, reflecting no-adapter wrapping for protocol-compliant models.
tests/test_output_rails_tool_calls.py Rewritten to use FakeLLMModel with LLMResponse/ToolCall protocol types instead of LangChain AIMessage; logic unchanged.
tests/integrations/langchain/runnable_rails/test_streaming.py New comprehensive streaming test suite for LangChain RunnableRails; correctly imports FakeLLM from the integration utils.py.

Sequence Diagram

sequenceDiagram
    participant CoreTest as Core Test (tests/test_*.py)
    participant IntegTest as Integration Test (tests/integrations/langchain/)
    participant FakeLLMModel as FakeLLMModel (tests/utils.py)
    participant FakeLLM as FakeLLM (tests/integrations/langchain/utils.py)
    participant LLMRails as LLMRails
    participant LLMModel as LLMModel Protocol

    CoreTest->>FakeLLMModel: FakeLLMModel(responses=[...])
    FakeLLMModel-->>CoreTest: instance (implements LLMModel)
    CoreTest->>LLMRails: LLMRails(config, llm=fake_llm)
    LLMRails->>LLMModel: generate_async(prompt) / stream_async(prompt)
    LLMModel-->>LLMRails: LLMResponse / AsyncIterator[LLMResponseChunk]
    LLMRails-->>CoreTest: result (no LangChain dependency)

    IntegTest->>FakeLLM: FakeLLM(responses=[...])
    FakeLLM-->>IntegTest: instance (LangChain BaseLLM)
    IntegTest->>LLMRails: RunnableRails(config, llm=fake_llm)
    LLMRails->>FakeLLM: ainvoke / _astream (LangChain API)
    FakeLLM-->>LLMRails: AIMessage / GenerationChunk
    LLMRails-->>IntegTest: result
Prompt To Fix All With AI
This is a comment left during a code review.
Path: tests/utils.py
Line: 45-61

Comment:
**`streaming` flag is stored but never consulted**

`FakeLLMModel.__init__` accepts and stores `self.streaming`, but neither `generate_async` nor `stream_async` reads it. In the original `FakeLLM`, `streaming=True` activated LangChain's `_astream` code path; in the new design the framework is responsible for choosing which method to call, so the flag has no effect. Any test that passes `streaming=True` expecting a behavioural difference (e.g., chunked output instead of a full string) will silently exercise the same path as `streaming=False`.

If the attribute is intentionally a no-op (because the framework decides independently), a doc-comment to that effect would prevent confusion for future contributors.

How can I resolve this? If you propose a fix, please make it concise.
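One concise resolution is the doc-comment option the finding itself suggests; a sketch (constructor shape assumed from the review, not the actual tests/utils.py):

```python
class FakeLLMModel:
    def __init__(self, responses, streaming: bool = False):
        self.responses = list(responses)
        # Intentional no-op: unlike LangChain's FakeLLM, where streaming=True
        # switched generation onto the _astream code path, here the framework
        # chooses generate_async vs stream_async directly. The attribute is
        # kept only for signature compatibility with existing tests and has
        # no behavioural effect.
        self.streaming = streaming

fake = FakeLLMModel(["hi"], streaming=True)
print(fake.streaming)  # True
```

The alternative resolution would be to delete the parameter entirely, at the cost of touching every call site that still passes it.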

Reviews (7): Last reviewed commit: "apply coderabbit review suggestion"

@Pouyanpi Pouyanpi force-pushed the refactor/langchain-decouple/stack-8-test-infrastructure branch 3 times, most recently from 9c17d45 to a77a7f6 Compare April 14, 2026 13:49

codecov bot commented Apr 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


@Pouyanpi Pouyanpi force-pushed the feat/langchain-decouple/stack-11-streaming-parity branch from a89e626 to fbf78bb Compare April 16, 2026 09:00
Base automatically changed from feat/langchain-decouple/stack-11-streaming-parity to develop April 16, 2026 09:09
…o-skip

Add FakeLLMModel implementing LLMModel protocol to tests/utils.py.
TestChat uses FakeLLMModel instead of LangChain FakeLLM. Add pytest
conftest to auto-skip tests in tests/integrations/langchain/ when
LangChain is not installed.
…langchain/

Pure file moves with git mv, no content changes. GitHub should
detect these as renames.
…ixes

- Create tests/integrations/langchain/utils.py with LangChain FakeLLM
- Add core-friendly rewrites using FakeLLMModel for tool call,
  output rail, reasoning, and passthrough tests
- Migrate FakeLLM→FakeLLMModel in all core test files
- Remove LangChainLLMAdapter wrapping from test_llama_guard,
  test_patronus_lynx
- Update integration test imports to use FakeLLM from new location
- Fix Windows timer resolution in test_logging
@Pouyanpi Pouyanpi force-pushed the refactor/langchain-decouple/stack-8-test-infrastructure branch from 3739452 to e19d879 Compare April 16, 2026 09:13

coderabbitai bot commented Apr 16, 2026

📝 Walkthrough

This PR refactors test infrastructure to decouple from LangChain-specific test doubles by introducing a framework-agnostic FakeLLMModel. It adds pytest hooks to conditionally skip LangChain integration tests when the dependency is unavailable, creates a dedicated LangChain integration test utilities module, and introduces comprehensive new integration tests covering tool calling, reasoning trace extraction, and output rail validation.

Changes

Cohort / File(s) Summary
Test Configuration & Pytest Hooks
tests/conftest.py
Added pytest_collection_modifyitems hook to conditionally skip LangChain integration tests when langchain_core is not installed.
LangChain Integration Test Package Structure
tests/integrations/langchain/**/__init__.py, tests/integrations/langchain/utils.py
Created directory structure with four new __init__.py files and a new utils.py module exporting FakeLLM (LangChain-specific) and get_bound_llm_magic_mock() for integration tests.
LangChain Runnable Rails Integration Tests
tests/integrations/langchain/runnable_rails/__init__.py, tests/integrations/langchain/runnable_rails/test_*.py
Added package structure and updated nine runnable rails test files to import FakeLLM from tests.integrations.langchain.utils instead of tests.utils.
New LangChain Integration Tests
tests/integrations/langchain/test_output_rails_tool_calls.py, test_reasoning_trace_extraction.py, test_tool_calling_passthrough_only.py, test_tool_calls_event_extraction.py, test_tool_output_rails.py
Added five comprehensive integration test modules (1,563 lines total) covering LangChain tool-call handling, reasoning trace extraction and storage, tool-call passthrough gating, event-driven tool call extraction, and tool output rail validation with system actions and complex scenarios.
Core Test Utilities Refactoring
tests/utils.py
Removed LangChain-specific FakeLLM class and added framework-agnostic FakeLLMModel class supporting both string and strongly-typed LLMResponse backends; updated TestChat.__init__ to accept optional llm parameter; removed get_bound_llm_magic_mock() helper.
Existing Tests Migration to FakeLLMModel
tests/test_content_safety_*.py, test_context_updates.py, test_execute_action.py, test_guardrails_ai_e2e_v1.py, test_integration_cache.py, test_jailbreak_cache.py, test_llama_guard.py, test_llmrails.py, test_output_rails_tool_calls.py, test_patronus_lynx.py, test_prompt_generation.py, test_reasoning_trace_extraction.py, test_system_message_conversion.py, test_tool_calling_passthrough_only.py, test_tool_calls_event_extraction.py, test_tool_output_rails.py, tests/tracing/spans/test_span_v2_integration.py
Updated 31 test files to use FakeLLMModel instead of FakeLLM, removing LangChainLLMAdapter wrapping where applicable; adjusted assertions and tool-call representations to match new framework-agnostic API (dict-based results, typed LLMResponse/ToolCall objects); updated one test assertion from BaseLLM type to LLMModel type.

Sequence Diagram(s)

No sequence diagrams generated. While the changes introduce new integration tests with multiple components, the scope is primarily test infrastructure refactoring (import path changes, test double replacement) and new test modules that implement testing logic rather than new production features with significant control flow interactions.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

The PR involves substantial cross-cutting changes: refactoring the core test infrastructure with two competing LLM test doubles, updating 31+ test files with systematic (but mixed-pattern) import and instantiation changes, and introducing 5 new integration test modules with moderately complex test logic. While many individual changes are repetitive imports/instantiations (lower effort), the heterogeneity of change patterns, the number of affected files, and the need to verify test isolation and correctness across the new integration test suite warrant careful review.

🚥 Pre-merge checks | ✅ 2 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 29.26% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Results For Major Changes ⚠️ Warning PR lacks documented test results, coverage metrics, and CI/CD verification for major test infrastructure refactoring affecting 35+ files. Include test execution summary with pass/fail counts, coverage changes, and link to CI/CD pipeline results confirming all tests pass.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly describes the main change: refactoring test infrastructure to be framework-agnostic, which is the primary objective of moving core tests away from LangChain and introducing FakeLLMModel.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 9

🧹 Nitpick comments (2)
tests/test_content_safety_integration.py (1)

353-353: Use the public action-param registration API instead of mutating runtime internals.

Assigning into runtime.registered_action_params makes this test depend on internal storage details. Registering the fake through register_action_param keeps the setup aligned with the supported API and is less brittle.

♻️ Suggested cleanup
-        chat.app.runtime.registered_action_params["llms"] = {"content_safety_reasoning": content_safety_llm}
+        chat.app.register_action_param(
+            "llms",
+            {"content_safety_reasoning": content_safety_llm},
+        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_content_safety_integration.py` at line 353, The test directly
mutates internal storage via chat.app.runtime.registered_action_params["llms"] =
{"content_safety_reasoning": content_safety_llm}; instead use the public API by
calling chat.app.runtime.register_action_param (or the app-level helper if
available) to register the fake LLM param under the "llms" namespace with key
"content_safety_reasoning" and value content_safety_llm so the test relies on
the supported registration mechanism rather than internal dict structure.
tests/test_llmrails.py (1)

769-775: Assert the full LLMModel contract here.

This regression test now checks only generate_async, so a partial fake could still pass while missing other members the core pipeline relies on. Since this file is validating the framework-agnostic migration, make the assertion match the protocol more directly.

♻️ Suggested assertion tightening
     from nemoguardrails.types import LLMModel

     @action(name="test_llm_action")
     async def test_llm_action(llm: LLMModel):
         assert llm is not None
-        assert hasattr(llm, "generate_async")
+        assert isinstance(llm, LLMModel)
         return "llm_action_success"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_llmrails.py` around lines 769 - 775, The test currently only
checks for generate_async on the LLMModel and can be bypassed by a partial fake;
update test_llm_action to assert the full LLMModel contract by iterating over
the members declared on the LLMModel type and asserting the provided llm has
each member (e.g. use LLMModel.__annotations__ or dir(LLMModel) to enumerate
expected attribute names) and for those that should be callables assert
callable(getattr(llm, name)); keep references to LLMModel and test_llm_action so
the check explicitly validates the protocol surface rather than a single method.
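If the protocol is (or can be made) runtime-checkable, the whole member surface collapses into one isinstance check; a sketch with an illustrative protocol, since the real nemoguardrails.types.LLMModel definition is not shown here:

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class LLMModel(Protocol):
    # Illustrative members, mirroring those listed in the review guidance.
    model_name: str
    provider_name: str

    async def generate_async(self, prompt: str): ...
    async def stream_async(self, prompt: str): ...

class FullFake:
    model_name = "fake"
    provider_name = "fake"
    async def generate_async(self, prompt): ...
    async def stream_async(self, prompt): ...

class PartialFake:
    async def generate_async(self, prompt): ...

# isinstance on a runtime-checkable Protocol checks member presence only
# (not signatures), which is exactly the "partial fake" gap described above.
print(isinstance(FullFake(), LLMModel))     # True
print(isinstance(PartialFake(), LLMModel))  # False
```

Note that runtime-checkable protocols verify that members exist, not that their signatures match, so this is a stronger check than hasattr on one method but still not a full contract test.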
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/conftest.py`:
- Around line 38-50: pytest_collection_modifyitems runs too late to prevent
top-level imports from failing; add a collection-time gate (e.g., implement
pytest_ignore_collect or collect_ignore_glob) that checks for langchain
availability and skips matching modules before they are imported. Specifically,
implement a pytest_ignore_collect(path) function that tries to import
langchain_core (or uses pytest.importorskip) and returns True to ignore
collection for any path whose parts include "integrations" and "langchain" or
the helper config files under tests/test_configs/with_custom_llm,
tests/test_configs/with_custom_chat_model, and
tests/test_configs/with_custom_llm_prompt_action_v2_x when langchain_core is
missing; keep pytest_collection_modifyitems only for post-collection markers if
desired.
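A minimal sketch of the collection-time gate described above (directory layout and the langchain_core probe are taken from this PR's description; exact patterns may differ):

```python
# tests/conftest.py (sketch): drop LangChain test modules at collection time,
# before pytest imports them and fails on a missing langchain_core.
import importlib.util

HAS_LANGCHAIN = importlib.util.find_spec("langchain_core") is not None

if not HAS_LANGCHAIN:
    # Both patterns are needed: a single `*` does not cross "/" in glob
    # patterns, so nested runnable_rails/ modules require `**/*.py` too.
    collect_ignore_glob = [
        "integrations/langchain/*.py",
        "integrations/langchain/**/*.py",
    ]
```

Module-level collect_ignore_glob runs during collection, which is why it avoids the too-late import failures that pytest_collection_modifyitems cannot prevent.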

In `@tests/integrations/langchain/test_reasoning_trace_extraction.py`:
- Around line 289-307: The test exposes that llm_call currently returns raw
"<think>...</think>" markup when additional_kwargs["reasoning_content"] is
present; change llm_call (used with LangChainLLMAdapter and AIMessage) so it
still prefers storing additional_kwargs["reasoning_content"] into
reasoning_trace_var but always strips any <think>...</think> segments from the
returned AIMessage.content before returning it; locate the logic in llm_call
that determines stored_trace vs output content and ensure stored_trace selection
(from additional_kwargs or embedded tags) is kept, then apply a sanitize step to
the message content (removing <think>...</think> blocks) so user-visible content
never contains raw think tags.
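The sanitize step this prompt describes amounts to a small regex helper; a sketch (the real llm_call logic in nemoguardrails.actions.llm.utils is not reproduced here):

```python
import re

# DOTALL so reasoning that spans multiple lines is still removed.
_THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def strip_think_tags(content: str) -> str:
    """Remove <think>...</think> blocks from user-visible output."""
    return _THINK_RE.sub("", content)

mixed = "<think>This should be ignored</think>Response"
print(strip_think_tags(mixed))  # Response
```

Applying such a helper to the message content after the trace-precedence decision keeps the stored reasoning trace intact while preventing the tag leakage the test currently locks in.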

In `@tests/integrations/langchain/test_tool_calls_event_extraction.py`:
- Around line 119-138: The test currently only asserts call_count >= 1 so
multiple reads would still pass; modify the assertion to enforce the single-pop
contract by checking call_count == 1 for the mocked
get_and_clear_tool_calls_contextvar (mock_get_and_clear), keeping the rest of
the expectations (result from TestChat.generate_async and result["tool_calls"]
assertions) unchanged so the test verifies exactly one read/clear occurred.
- Around line 167-189: The test mutates the shared ContextVar tool_calls_var and
never restores it; change test_tool_rails_cannot_clear_context_variable to
capture the token returned by tool_calls_var.set(...) and ensure
tool_calls_var.reset(token) is called in a finally block (or teardown) so the
ContextVar is restored even on assertion failures; reference the tool_calls_var
and validate_tool_parameters symbols when making this change.
- Around line 27-49: The helper functions validate_tool_parameters and
self_check_tool_calls currently only handle a nested function.arguments schema
and miss LangChain's top-level tool-call shape (with top-level "name" and
"args"), so dangerous strings never get inspected; update both functions to
normalize the incoming call shape first by accepting either call.get("function",
{}).get("arguments", {}) or call.get("args") (and likewise accept
call.get("function", {}).get("name") or call.get("name")), then iterate
parameter values from that normalized args mapping when checking dangerous
patterns in validate_tool_parameters, and update self_check_tool_calls to
consider a call valid if it has either a "function" dict with "id"/"function" or
the top-level "name" and "args" fields (and still require an "id" if expected),
ensuring both schemas are supported before performing the existing validations.

In `@tests/integrations/langchain/test_tool_output_rails.py`:
- Around line 27-49: The validators assume an OpenAI nested format but tests use
LangChain-style tool_calls; update validate_tool_parameters and
self_check_tool_calls to accept both formats by normalizing tool_calls entries:
for each call in validate_tool_parameters extract parameters from either
call.get("function", {}).get("arguments", {}) or call.get("args", {}) (falling
back appropriately) and continue checking strings against dangerous_patterns,
and in self_check_tool_calls return True when each call is a dict that contains
either the OpenAI keys ("function" and "id") or the LangChain keys ("name",
"args", and "id"); reference the existing functions validate_tool_parameters,
self_check_tool_calls, the dangerous_patterns list, and the tool_calls variable
when making the changes.
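Both schema findings above reduce to one normalization helper; a sketch (shapes taken from the review's descriptions of the OpenAI-nested and LangChain top-level formats):

```python
def normalize_tool_call(call: dict) -> tuple:
    """Return (name, args) for either tool-call schema.

    OpenAI-nested:       {"id": ..., "function": {"name": ..., "arguments": {...}}}
    LangChain top-level: {"id": ..., "name": ..., "args": {...}}
    """
    func = call.get("function", {})
    name = func.get("name") or call.get("name")
    args = func.get("arguments") or call.get("args") or {}
    return name, args

openai_style = {"id": "1", "function": {"name": "run", "arguments": {"cmd": "rm -rf /"}}}
langchain_style = {"id": "2", "name": "run", "args": {"cmd": "rm -rf /"}}

for call in (openai_style, langchain_style):
    name, args = normalize_tool_call(call)
    # The dangerous-pattern scan now inspects both schemas.
    assert any("rm -" in v for v in args.values() if isinstance(v, str))
```

With the shapes normalized up front, validate_tool_parameters and self_check_tool_calls can share one code path instead of silently skipping LangChain-style calls.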

In `@tests/integrations/langchain/utils.py`:
- Around line 104-118: get_bound_llm_magic_mock is incorrectly wrapping dict
responses with MagicMock(**...), which makes dicts behave like objects; change
both places handling the dict branch so they return a real dict instead (e.g.,
use dict(ainvoke_return_value) or ainvoke_return_value directly) for
bound_llm_mock.ainvoke.return_value and for mock_llm.ainvoke = AsyncMock(...),
while keeping the existing AsyncMock/AIMessage branch behavior; update
references to bound_llm_mock, mock_llm, and ainvoke_return_value accordingly.

In `@tests/test_tool_calls_event_extraction.py`:
- Around line 321-335: The test sets provider_metadata on the fake LLM response
but never asserts it, so add an assertion to verify metadata propagation: after
calling LLMRails.generate_async (using FakeLLMModel, LLMResponse, LLMRails)
assert that the returned result includes the provider metadata (e.g.,
result["tool_calls"][0] contains the same provider_metadata dict or a public
metadata field) to ensure metadata isn't dropped; update or add an assertion
comparing the expected {"model": "test-model", "usage": {"tokens": 50}} to the
value returned by result so the test actually verifies metadata propagation.

In `@tests/utils.py`:
- Around line 158-172: The constructor logic in TestChat silently sets self.llm
= None when both llm and llm_completions are missing, causing tests to hit the
real LLM; change the else branch to either raise a clear error or instantiate a
default stubbed FakeLLMModel instead. Specifically, in the TestChat
initialization where llm and llm_completions are checked, replace "else:
self.llm = None" with one of: (a) raise ValueError("No llm or llm_completions
provided to TestChat; tests must supply a fake"), or (b) self.llm =
FakeLLMModel(responses=[], streaming=False, exception=None, token_usage=None,
should_return_token_usage=False) so tests fail-fast or use an empty fake
respectively; refer to the TestChat constructor and the FakeLLMModel symbol when
applying the change.
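Option (a) above, failing fast, might look like this; TestChat and FakeLLMModel are simplified stand-ins for the real classes in tests/utils.py:

```python
class FakeLLMModel:
    # Simplified stand-in; the real fake takes more configuration.
    def __init__(self, responses):
        self.responses = list(responses)

class TestChat:
    def __init__(self, config=None, llm=None, llm_completions=None):
        if llm is not None:
            self.llm = llm
        elif llm_completions is not None:
            self.llm = FakeLLMModel(responses=llm_completions)
        else:
            # Fail fast instead of silently falling through to a real LLM.
            raise ValueError(
                "No llm or llm_completions provided to TestChat; "
                "tests must supply a fake"
            )

chat = TestChat(llm_completions=["Hello!"])
print(chat.llm.responses)  # ['Hello!']
```

Raising here turns an expensive, flaky real-LLM call into an immediate, self-explanatory test failure.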

---

Nitpick comments:
In `@tests/test_content_safety_integration.py`:
- Line 353: The test directly mutates internal storage via
chat.app.runtime.registered_action_params["llms"] = {"content_safety_reasoning":
content_safety_llm}; instead use the public API by calling
chat.app.runtime.register_action_param (or the app-level helper if available) to
register the fake LLM param under the "llms" namespace with key
"content_safety_reasoning" and value content_safety_llm so the test relies on
the supported registration mechanism rather than internal dict structure.

In `@tests/test_llmrails.py`:
- Around line 769-775: The test currently only checks for generate_async on the
LLMModel and can be bypassed by a partial fake; update test_llm_action to assert
the full LLMModel contract by iterating over the members declared on the
LLMModel type and asserting the provided llm has each member (e.g. use
LLMModel.__annotations__ or dir(LLMModel) to enumerate expected attribute names)
and for those that should be callables assert callable(getattr(llm, name)); keep
references to LLMModel and test_llm_action so the check explicitly validates the
protocol surface rather than a single method.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: d2e27196-c546-46d7-8416-3b256fa8d9af

📥 Commits

Reviewing files that changed from the base of the PR and between f2d4492 and e19d879.

📒 Files selected for processing (60)
  • tests/conftest.py
  • tests/integrations/langchain/llm/__init__.py
  • tests/integrations/langchain/llm/models/__init__.py
  • tests/integrations/langchain/llm/models/test_langchain_init_scenarios.py
  • tests/integrations/langchain/llm/models/test_langchain_initialization_methods.py
  • tests/integrations/langchain/llm/models/test_langchain_initializer.py
  • tests/integrations/langchain/llm/models/test_langchain_special_cases.py
  • tests/integrations/langchain/llm/providers/__init__.py
  • tests/integrations/langchain/llm/providers/test_deprecated_providers.py
  • tests/integrations/langchain/llm/providers/test_providers.py
  • tests/integrations/langchain/llm/providers/test_trtllm_provider.py
  • tests/integrations/langchain/llm/test_langchain_integration.py
  • tests/integrations/langchain/llm/test_version_compatibility.py
  • tests/integrations/langchain/runnable_rails/__init__.py
  • tests/integrations/langchain/runnable_rails/test_basic_operations.py
  • tests/integrations/langchain/runnable_rails/test_batch_as_completed.py
  • tests/integrations/langchain/runnable_rails/test_batching.py
  • tests/integrations/langchain/runnable_rails/test_composition.py
  • tests/integrations/langchain/runnable_rails/test_format_output.py
  • tests/integrations/langchain/runnable_rails/test_history.py
  • tests/integrations/langchain/runnable_rails/test_message_utils.py
  • tests/integrations/langchain/runnable_rails/test_metadata.py
  • tests/integrations/langchain/runnable_rails/test_piping.py
  • tests/integrations/langchain/runnable_rails/test_runnable_rails.py
  • tests/integrations/langchain/runnable_rails/test_streaming.py
  • tests/integrations/langchain/runnable_rails/test_tool_calling.py
  • tests/integrations/langchain/runnable_rails/test_transform_input.py
  • tests/integrations/langchain/runnable_rails/test_types.py
  • tests/integrations/langchain/test_actions_llm_utils.py
  • tests/integrations/langchain/test_langchain_llm_adapter.py
  • tests/integrations/langchain/test_output_rails_tool_calls.py
  • tests/integrations/langchain/test_reasoning_trace_extraction.py
  • tests/integrations/langchain/test_server_streaming.py
  • tests/integrations/langchain/test_streaming.py
  • tests/integrations/langchain/test_tool_calling_passthrough_only.py
  • tests/integrations/langchain/test_tool_calling_utils.py
  • tests/integrations/langchain/test_tool_calls_event_extraction.py
  • tests/integrations/langchain/test_tool_output_rails.py
  • tests/integrations/langchain/utils.py
  • tests/test_content_safety_actions.py
  • tests/test_content_safety_cache.py
  • tests/test_content_safety_integration.py
  • tests/test_context_updates.py
  • tests/test_execute_action.py
  • tests/test_guardrails_ai_e2e_v1.py
  • tests/test_integration_cache.py
  • tests/test_jailbreak_cache.py
  • tests/test_llama_guard.py
  • tests/test_llmrails.py
  • tests/test_output_rails_tool_calls.py
  • tests/test_patronus_lynx.py
  • tests/test_prompt_generation.py
  • tests/test_reasoning_trace_extraction.py
  • tests/test_system_message_conversion.py
  • tests/test_tool_calling_passthrough_only.py
  • tests/test_tool_calls_event_extraction.py
  • tests/test_tool_output_rails.py
  • tests/test_topic_safety_cache.py
  • tests/tracing/spans/test_span_v2_integration.py
  • tests/utils.py

Comment on lines +289 to +307

    async def test_llm_call_prefers_additional_kwargs_over_think_tags(self):
        reasoning_from_kwargs = "This should be used"
        reasoning_from_tags = "This should be ignored"

        mock_llm = AsyncMock()
        mock_response = AIMessage(
            content=f"<think>{reasoning_from_tags}</think>Response",
            additional_kwargs={"reasoning_content": reasoning_from_kwargs},
        )
        mock_llm.ainvoke = AsyncMock(return_value=mock_response)

        from nemoguardrails.actions.llm.utils import llm_call

        reasoning_trace_var.set(None)
        result = await llm_call(LangChainLLMAdapter(mock_llm), "Query")

        assert result.content == f"<think>{reasoning_from_tags}</think>Response"
        stored_trace = reasoning_trace_var.get()
        assert stored_trace == reasoning_from_kwargs

⚠️ Potential issue | 🟠 Major

Don't lock in <think> tags as user-visible output.

This assertion makes the mixed case preserve raw <think>...</think> content whenever additional_kwargs["reasoning_content"] is present. That hardens the exact leakage path the neighboring tests sanitize. The precedence rule should only decide which trace gets stored; the rendered content should still be stripped.

Possible fix
-        assert result.content == f"<think>{reasoning_from_tags}</think>Response"
+        assert result.content == "Response"
+        assert "<think>" not in result.content
         stored_trace = reasoning_trace_var.get()
         assert stored_trace == reasoning_from_kwargs
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integrations/langchain/test_reasoning_trace_extraction.py` around lines
289 - 307, The test exposes that llm_call currently returns raw
"<think>...</think>" markup when additional_kwargs["reasoning_content"] is
present; change llm_call (used with LangChainLLMAdapter and AIMessage) so it
still prefers storing additional_kwargs["reasoning_content"] into
reasoning_trace_var but always strips any <think>...</think> segments from the
returned AIMessage.content before returning it; locate the logic in llm_call
that determines stored_trace vs output content and ensure stored_trace selection
(from additional_kwargs or embedded tags) is kept, then apply a sanitize step to
the message content (removing <think>...</think> blocks) so user-visible content
never contains raw think tags.
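The sanitize step the prompt describes can be sketched as a small standalone helper. This is illustrative only: the name strip_think_tags is hypothetical, and the real logic lives inside llm_call in nemoguardrails.actions.llm.utils.

```python
import re

# Matches <think>...</think> blocks, including multi-line reasoning content.
_THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_think_tags(content: str) -> str:
    """Strip reasoning markup from user-visible content.

    The reasoning trace itself should be captured separately (e.g. into
    reasoning_trace_var) before this runs; this only cleans the output.
    """
    return _THINK_RE.sub("", content).strip()
```

With this in place, the stored trace can still come from additional_kwargs while the returned content never carries raw tags.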

Comment on lines +27 to +49
@action(is_system_action=True)
async def validate_tool_parameters(tool_calls, context=None, **kwargs):
    """Test implementation of tool parameter validation."""
    tool_calls = tool_calls or (context.get("tool_calls", []) if context else [])

    dangerous_patterns = ["eval", "exec", "system", "../", "rm -", "DROP", "DELETE"]

    for tool_call in tool_calls:
        func = tool_call.get("function", {})
        args = func.get("arguments", {})
        for param_value in args.values():
            if isinstance(param_value, str):
                if any(pattern.lower() in param_value.lower() for pattern in dangerous_patterns):
                    return False
    return True


@action(is_system_action=True)
async def self_check_tool_calls(tool_calls, context=None, **kwargs):
    """Test implementation of tool call validation."""
    tool_calls = tool_calls or (context.get("tool_calls", []) if context else [])

    return all(isinstance(call, dict) and "function" in call and "id" in call for call in tool_calls)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

In langchain-core 0.2.14, what is the schema of each item in AIMessage.tool_calls? Does it use top-level "name"/"args"/"id"/"type", or a nested "function": {"name","arguments"} object?

💡 Result:

In langchain-core 0.2.14, each item in AIMessage.tool_calls is a ToolCall (TypedDict) with top-level keys: "name" (str), "args" (dict[str, Any]), "id" (str | None), and optionally "type" (Literal["tool_call"]). It does not use a nested "function": {"name", "arguments"} object. This schema was introduced in the PR adding tool calls to AIMessage (merged April 2024), and issues from late 2024 (e.g., #27938 with langchain-core==0.2.33) confirm LangChain's ToolCall uses top-level "name"/"args"/"id", distinguishing it from OpenAI's nested format. Source code examples across versions consistently show this structure.


🏁 Script executed:

# First, let's find the test file and read it
find . -name "test_tool_calls_event_extraction.py" -type f

Repository: NVIDIA-NeMo/Guardrails



🏁 Script executed:

# Let's read the test file that's referenced in the review
wc -l ./tests/integrations/langchain/test_tool_calls_event_extraction.py



🏁 Script executed:

# Read the helper functions at lines 27-49
sed -n '27,49p' ./tests/integrations/langchain/test_tool_calls_event_extraction.py



🏁 Script executed:

# Check the test at line 54
sed -n '50,80p' ./tests/integrations/langchain/test_tool_calls_event_extraction.py



🏁 Script executed:

# Check the test at line 196
sed -n '190,220p' ./tests/integrations/langchain/test_tool_calls_event_extraction.py



🏁 Script executed:

# Check the test at line 384
sed -n '378,410p' ./tests/integrations/langchain/test_tool_calls_event_extraction.py



Fix helper functions to handle LangChain's top-level tool-call schema.

The helpers at lines 35–36 and 49 read a nested function.arguments structure, but the tests at lines 54, 196, and 384 use LangChain's actual schema with top-level name/args fields. Since the helpers look for the wrong schema, the dangerous argument strings are never inspected—these blocking-path tests don't actually exercise the security checks they claim to validate.

The fix requires normalizing both schemas:

Proposed fix
 @action(is_system_action=True)
 async def validate_tool_parameters(tool_calls, context=None, **kwargs):
     """Test implementation of tool parameter validation."""
     tool_calls = tool_calls or (context.get("tool_calls", []) if context else [])
 
     dangerous_patterns = ["eval", "exec", "system", "../", "rm -", "DROP", "DELETE"]
 
     for tool_call in tool_calls:
-        func = tool_call.get("function", {})
-        args = func.get("arguments", {})
+        if "function" in tool_call:
+            args = tool_call.get("function", {}).get("arguments", {}) or {}
+        else:
+            args = tool_call.get("args", {}) or {}
         for param_value in args.values():
             if isinstance(param_value, str):
                 if any(pattern.lower() in param_value.lower() for pattern in dangerous_patterns):
                     return False
     return True
@@
 @action(is_system_action=True)
 async def self_check_tool_calls(tool_calls, context=None, **kwargs):
     """Test implementation of tool call validation."""
     tool_calls = tool_calls or (context.get("tool_calls", []) if context else [])
 
-    return all(isinstance(call, dict) and "function" in call and "id" in call for call in tool_calls)
+    return all(
+        isinstance(call, dict)
+        and "id" in call
+        and ("function" in call or ("name" in call and "args" in call))
+        for call in tool_calls
+    )
📝 Committable suggestion


Suggested change
@action(is_system_action=True)
async def validate_tool_parameters(tool_calls, context=None, **kwargs):
    """Test implementation of tool parameter validation."""
    tool_calls = tool_calls or (context.get("tool_calls", []) if context else [])

    dangerous_patterns = ["eval", "exec", "system", "../", "rm -", "DROP", "DELETE"]

    for tool_call in tool_calls:
        if "function" in tool_call:
            args = tool_call.get("function", {}).get("arguments", {}) or {}
        else:
            args = tool_call.get("args", {}) or {}
        for param_value in args.values():
            if isinstance(param_value, str):
                if any(pattern.lower() in param_value.lower() for pattern in dangerous_patterns):
                    return False
    return True


@action(is_system_action=True)
async def self_check_tool_calls(tool_calls, context=None, **kwargs):
    """Test implementation of tool call validation."""
    tool_calls = tool_calls or (context.get("tool_calls", []) if context else [])

    return all(
        isinstance(call, dict)
        and "id" in call
        and ("function" in call or ("name" in call and "args" in call))
        for call in tool_calls
    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integrations/langchain/test_tool_calls_event_extraction.py` around
lines 27 - 49, The helper functions validate_tool_parameters and
self_check_tool_calls currently only handle a nested function.arguments schema
and miss LangChain's top-level tool-call shape (with top-level "name" and
"args"), so dangerous strings never get inspected; update both functions to
normalize the incoming call shape first by accepting either call.get("function",
{}).get("arguments", {}) or call.get("args") (and likewise accept
call.get("function", {}).get("name") or call.get("name")), then iterate
parameter values from that normalized args mapping when checking dangerous
patterns in validate_tool_parameters, and update self_check_tool_calls to
consider a call valid if it has either a "function" dict with "id"/"function" or
the top-level "name" and "args" fields (and still require an "id" if expected),
ensuring both schemas are supported before performing the existing validations.
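The normalization the prompt asks for can be factored into two tiny helpers. A minimal sketch, assuming the two schemas described above; the names extract_tool_args and is_well_formed are illustrative, not from the PR:

```python
def extract_tool_args(tool_call: dict) -> dict:
    """Return the arguments mapping from either tool-call schema.

    OpenAI-style:    {"id": ..., "function": {"name": ..., "arguments": {...}}}
    LangChain-style: {"id": ..., "name": ..., "args": {...}}
    """
    if "function" in tool_call:
        return tool_call.get("function", {}).get("arguments", {}) or {}
    return tool_call.get("args", {}) or {}


def is_well_formed(tool_call: dict) -> bool:
    """Accept either schema, requiring an id in both cases."""
    return (
        isinstance(tool_call, dict)
        and "id" in tool_call
        and ("function" in tool_call or ("name" in tool_call and "args" in tool_call))
    )
```

Both validators can then share these helpers instead of hardcoding one schema each.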

Comment on lines +119 to +138
call_count = 0

def mock_get_and_clear():
    nonlocal call_count
    call_count += 1
    if call_count == 1:
        return test_tool_calls
    return None

with patch(
    "nemoguardrails.actions.llm.utils.get_and_clear_tool_calls_contextvar",
    side_effect=mock_get_and_clear,
):
    chat = TestChat(config, llm_completions=[""])

    result = await chat.app.generate_async(messages=[{"role": "user", "content": "Test"}])

assert call_count >= 1, "get_and_clear_tool_calls_contextvar should be called"
assert result["tool_calls"] is not None
assert result["tool_calls"][0]["name"] == "test_tool"

⚠️ Potential issue | 🟡 Minor

Assert the “once” contract explicitly.

Line 136 only checks that the hook was called at least once, so an implementation that reads and clears the context multiple times would still pass. If this test is meant to lock down single-pop behavior, it should assert call_count == 1.

Proposed fix
-        assert call_count >= 1, "get_and_clear_tool_calls_contextvar should be called"
+        assert call_count == 1, "get_and_clear_tool_calls_contextvar should only be called once"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integrations/langchain/test_tool_calls_event_extraction.py` around
lines 119 - 138, The test currently only asserts call_count >= 1 so multiple
reads would still pass; modify the assertion to enforce the single-pop contract
by checking call_count == 1 for the mocked get_and_clear_tool_calls_contextvar
(mock_get_and_clear), keeping the rest of the expectations (result from
TestChat.generate_async and result["tool_calls"] assertions) unchanged so the
test verifies exactly one read/clear occurred.
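The single-pop contract the review wants pinned down amounts to this ContextVar pattern. A standalone sketch, not the actual nemoguardrails implementation:

```python
import contextvars

_tool_calls_var = contextvars.ContextVar("tool_calls", default=None)


def get_and_clear():
    """Read the stored tool calls exactly once; subsequent reads see None."""
    value = _tool_calls_var.get()
    _tool_calls_var.set(None)
    return value


_tool_calls_var.set([{"name": "test_tool"}])
first = get_and_clear()   # pops the stored value
second = get_and_clear()  # already cleared
```

An implementation that reads the variable twice would return the value only on the first call, which is exactly what asserting call_count == 1 locks down.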

Comment on lines +167 to +189
@pytest.mark.asyncio
async def test_tool_rails_cannot_clear_context_variable():
    from nemoguardrails.context import tool_calls_var

    test_tool_calls = [
        {
            "id": "call_blocked",
            "type": "function",
            "function": {
                "name": "blocked_tool",
                "arguments": {"param": "rm -rf /"},
            },
        }
    ]

    tool_calls_var.set(test_tool_calls)

    context = {"tool_calls": test_tool_calls}
    result = await validate_tool_parameters(test_tool_calls, context=context)

    assert result is False
    assert tool_calls_var.get() is not None, "Context variable should not be cleared by tool rails"
    assert tool_calls_var.get()[0]["function"]["name"] == "blocked_tool"

⚠️ Potential issue | 🟠 Major

Restore tool_calls_var after the test.

Line 182 mutates shared ContextVar state and never resets it. That can leak stale tool calls into later tests that read tool_calls_var, making failures order-dependent.

Proposed fix
-    tool_calls_var.set(test_tool_calls)
-
-    context = {"tool_calls": test_tool_calls}
-    result = await validate_tool_parameters(test_tool_calls, context=context)
-
-    assert result is False
-    assert tool_calls_var.get() is not None, "Context variable should not be cleared by tool rails"
-    assert tool_calls_var.get()[0]["function"]["name"] == "blocked_tool"
+    token = tool_calls_var.set(test_tool_calls)
+    try:
+        context = {"tool_calls": test_tool_calls}
+        result = await validate_tool_parameters(test_tool_calls, context=context)
+
+        assert result is False
+        assert tool_calls_var.get() is not None, "Context variable should not be cleared by tool rails"
+        assert tool_calls_var.get()[0]["function"]["name"] == "blocked_tool"
+    finally:
+        tool_calls_var.reset(token)
📝 Committable suggestion


Suggested change
@pytest.mark.asyncio
async def test_tool_rails_cannot_clear_context_variable():
    from nemoguardrails.context import tool_calls_var

    test_tool_calls = [
        {
            "id": "call_blocked",
            "type": "function",
            "function": {
                "name": "blocked_tool",
                "arguments": {"param": "rm -rf /"},
            },
        }
    ]

    token = tool_calls_var.set(test_tool_calls)
    try:
        context = {"tool_calls": test_tool_calls}
        result = await validate_tool_parameters(test_tool_calls, context=context)

        assert result is False
        assert tool_calls_var.get() is not None, "Context variable should not be cleared by tool rails"
        assert tool_calls_var.get()[0]["function"]["name"] == "blocked_tool"
    finally:
        tool_calls_var.reset(token)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integrations/langchain/test_tool_calls_event_extraction.py` around
lines 167 - 189, The test mutates the shared ContextVar tool_calls_var and never
restores it; change test_tool_rails_cannot_clear_context_variable to capture the
token returned by tool_calls_var.set(...) and ensure tool_calls_var.reset(token)
is called in a finally block (or teardown) so the ContextVar is restored even on
assertion failures; reference the tool_calls_var and validate_tool_parameters
symbols when making this change.
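The token-based restore pattern from the proposed fix generalizes to any ContextVar. A self-contained sketch of the idiom:

```python
import contextvars

var = contextvars.ContextVar("demo", default=None)


def run_with_temporary_value(value):
    """Set the var for the duration of a block, restoring it even on failure."""
    token = var.set(value)
    try:
        return var.get()
    finally:
        var.reset(token)  # restores the previous value (here: the default)


seen_inside = run_with_temporary_value([{"id": "call_blocked"}])
seen_after = var.get()
```

Because the reset runs in finally, the variable is restored even when an assertion inside the block fails, which removes the order-dependence between tests.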

Comment on lines +27 to +49
@action(is_system_action=True)
async def validate_tool_parameters(tool_calls, context=None, **kwargs):
    """Test implementation of tool parameter validation."""
    tool_calls = tool_calls or (context.get("tool_calls", []) if context else [])

    dangerous_patterns = ["eval", "exec", "system", "../", "rm -", "DROP", "DELETE"]

    for tool_call in tool_calls:
        func = tool_call.get("function", {})
        args = func.get("arguments", {})
        for param_value in args.values():
            if isinstance(param_value, str):
                if any(pattern.lower() in param_value.lower() for pattern in dangerous_patterns):
                    return False
    return True


@action(is_system_action=True)
async def self_check_tool_calls(tool_calls, context=None, **kwargs):
    """Test implementation of tool call validation."""
    tool_calls = tool_calls or (context.get("tool_calls", []) if context else [])

    return all(isinstance(call, dict) and "function" in call and "id" in call for call in tool_calls)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

In langchain-core 0.2.14, what is the schema of each item in AIMessage.tool_calls? Does it use top-level "name"/"args"/"id"/"type", or a nested "function": {"name","arguments"} object?

💡 Result:

In langchain-core 0.2.14, each item in AIMessage.tool_calls is a ToolCall object (TypedDict) with top-level keys: name (str), args (dict[str, Any]), id (str | None), and optionally type (Literal["tool_call"]). It does not use a nested "function" object with "name" and "arguments".



🏁 Script executed:

fd "test_tool_output_rails.py" -t f



🏁 Script executed:

cat -n tests/integrations/langchain/test_tool_output_rails.py



Fix validators to handle LangChain-style tool_calls format.

The test fixtures at lines 56, 107, and 167 provide LangChain-style tool_calls with top-level "name", "args", "id", and "type" keys. However, validate_tool_parameters (line 35–36) and self_check_tool_calls (line 49) are hardcoded for OpenAI-style nested format with "function" and "arguments". This causes:

  • validate_tool_parameters to silently skip dangerous patterns (always returns True)
  • self_check_tool_calls to reject valid tool_calls (always returns False)

Tests currently assert the opposite of intended behavior.

Proposed fix
 @action(is_system_action=True)
 async def validate_tool_parameters(tool_calls, context=None, **kwargs):
     """Test implementation of tool parameter validation."""
     tool_calls = tool_calls or (context.get("tool_calls", []) if context else [])
 
     dangerous_patterns = ["eval", "exec", "system", "../", "rm -", "DROP", "DELETE"]
 
     for tool_call in tool_calls:
-        func = tool_call.get("function", {})
-        args = func.get("arguments", {})
+        if "function" in tool_call:
+            args = tool_call.get("function", {}).get("arguments", {}) or {}
+        else:
+            args = tool_call.get("args", {}) or {}
         for param_value in args.values():
             if isinstance(param_value, str):
                 if any(pattern.lower() in param_value.lower() for pattern in dangerous_patterns):
                     return False
     return True
@@
 @action(is_system_action=True)
 async def self_check_tool_calls(tool_calls, context=None, **kwargs):
     """Test implementation of tool call validation."""
     tool_calls = tool_calls or (context.get("tool_calls", []) if context else [])
 
-    return all(isinstance(call, dict) and "function" in call and "id" in call for call in tool_calls)
+    return all(
+        isinstance(call, dict)
+        and "id" in call
+        and ("function" in call or ("name" in call and "args" in call))
+        for call in tool_calls
+    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integrations/langchain/test_tool_output_rails.py` around lines 27 - 49,
The validators assume an OpenAI nested format but tests use LangChain-style
tool_calls; update validate_tool_parameters and self_check_tool_calls to accept
both formats by normalizing tool_calls entries: for each call in
validate_tool_parameters extract parameters from either call.get("function",
{}).get("arguments", {}) or call.get("args", {}) (falling back appropriately)
and continue checking strings against dangerous_patterns, and in
self_check_tool_calls return True when each call is a dict that contains either
the OpenAI keys ("function" and "id") or the LangChain keys ("name", "args", and
"id"); reference the existing functions validate_tool_parameters,
self_check_tool_calls, the dangerous_patterns list, and the tool_calls variable
when making the changes.

Comment on lines +104 to +118
def get_bound_llm_magic_mock(ainvoke_return_value: Union[AIMessage, dict]) -> MagicMock:
    mock_llm = MagicMock()
    mock_llm.return_value = mock_llm

    bound_llm_mock = AsyncMock()
    if isinstance(ainvoke_return_value, dict):
        bound_llm_mock.ainvoke.return_value = MagicMock(**ainvoke_return_value)
    else:
        bound_llm_mock.ainvoke.return_value = ainvoke_return_value

    mock_llm.bind.return_value = bound_llm_mock
    if isinstance(ainvoke_return_value, dict):
        mock_llm.ainvoke = AsyncMock(return_value=MagicMock(**ainvoke_return_value))
    else:
        mock_llm.ainvoke = AsyncMock(return_value=ainvoke_return_value)

⚠️ Potential issue | 🟡 Minor

Return a real dict from the helper's dict path.

MagicMock(**ainvoke_return_value) turns dict keys into attributes, not mapping behavior. Any code path that does isinstance(res, dict), res["key"], or res.get(...) will see a different shape than production, so this helper can mask bugs in the exact branch it claims to support.

Possible fix
 def get_bound_llm_magic_mock(ainvoke_return_value: Union[AIMessage, dict]) -> MagicMock:
     mock_llm = MagicMock()
     mock_llm.return_value = mock_llm

     bound_llm_mock = AsyncMock()
     if isinstance(ainvoke_return_value, dict):
-        bound_llm_mock.ainvoke.return_value = MagicMock(**ainvoke_return_value)
+        bound_llm_mock.ainvoke.return_value = ainvoke_return_value
     else:
         bound_llm_mock.ainvoke.return_value = ainvoke_return_value

     mock_llm.bind.return_value = bound_llm_mock
     if isinstance(ainvoke_return_value, dict):
-        mock_llm.ainvoke = AsyncMock(return_value=MagicMock(**ainvoke_return_value))
+        mock_llm.ainvoke = AsyncMock(return_value=ainvoke_return_value)
     else:
         mock_llm.ainvoke = AsyncMock(return_value=ainvoke_return_value)
     return mock_llm
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integrations/langchain/utils.py` around lines 104 - 118,
get_bound_llm_magic_mock is incorrectly wrapping dict responses with
MagicMock(**...), which makes dicts behave like objects; change both places
handling the dict branch so they return a real dict instead (e.g., use
dict(ainvoke_return_value) or ainvoke_return_value directly) for
bound_llm_mock.ainvoke.return_value and for mock_llm.ainvoke = AsyncMock(...),
while keeping the existing AsyncMock/AIMessage branch behavior; update
references to bound_llm_mock, mock_llm, and ainvoke_return_value accordingly.
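The shape difference is easy to demonstrate with the standard library alone: MagicMock(**d) exposes the keys as attributes, while production dict handling needs mapping behavior.

```python
from unittest.mock import MagicMock

payload = {"content": "hi", "tool_calls": []}

mocked = MagicMock(**payload)   # keys become attributes, not mapping entries
real = dict(payload)            # what the helper's dict path should return

attr_ok = mocked.content == "hi"       # attribute access works on the mock
is_mapping = isinstance(mocked, dict)  # but the mock is not a dict
real_ok = real["content"] == "hi" and isinstance(real, dict)
```

Any code under test that branches on isinstance(res, dict) or subscripts the result sees a different shape from the mock than from production data.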

Comment on lines +321 to +335
fake_llm = FakeLLMModel(
    llm_responses=[
        LLMResponse(
            content="Processing with metadata.",
            tool_calls=test_tool_calls,
            provider_metadata={"model": "test-model", "usage": {"tokens": 50}},
        )
    ]
)
rails = LLMRails(config, llm=fake_llm)
result = await rails.generate_async(messages=[{"role": "user", "content": "Process with metadata"}])

assert result.tool_calls is not None
assert result.tool_calls[0]["name"] == "preserve_test"
assert result.content == ""
assert hasattr(result, "response_metadata")
assert result["tool_calls"] is not None
assert result["tool_calls"][0]["function"]["name"] == "preserve_test"
assert result["content"] == ""

⚠️ Potential issue | 🟡 Minor

This test no longer verifies metadata propagation.

provider_metadata is part of the fixture here, but none of the assertions inspect it. A regression that drops metadata entirely would still pass, so the test name and the behavior it actually covers have drifted apart. Either assert the public metadata output in this test or remove/rename the metadata setup.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_tool_calls_event_extraction.py` around lines 321 - 335, The test
sets provider_metadata on the fake LLM response but never asserts it, so add an
assertion to verify metadata propagation: after calling LLMRails.generate_async
(using FakeLLMModel, LLMResponse, LLMRails) assert that the returned result
includes the provider metadata (e.g., result["tool_calls"][0] contains the same
provider_metadata dict or a public metadata field) to ensure metadata isn't
dropped; update or add an assertion comparing the expected {"model":
"test-model", "usage": {"tokens": 50}} to the value returned by result so the
test actually verifies metadata propagation.

Comment thread tests/utils.py
Comment on lines +158 to +172
     if llm is not None:
         self.llm = llm
     elif llm_completions is not None:
         main_model = next((model for model in config.models if model.type == "main"), None)
         should_return_token_usage = bool(main_model and main_model.engine in _TEST_PROVIDERS_WITH_TOKEN_USAGE)

-        self.llm = FakeLLM(
+        self.llm = FakeLLMModel(
             responses=llm_completions,
             streaming=streaming,
             exception=llm_exception,
             token_usage=token_usage,
             should_return_token_usage=should_return_token_usage,
         )
         if llm_exception:
             self.llm.exception = llm_exception
     else:
         self.llm = None

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
python - <<'PY'
import ast
from pathlib import Path

for path in Path(".").rglob("*.py"):
    try:
        tree = ast.parse(path.read_text())
    except Exception:
        continue

    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and getattr(node.func, "id", None) == "TestChat":
            kw_names = {kw.arg for kw in node.keywords if kw.arg}
            if len(node.args) == 1 and "llm" not in kw_names and "llm_completions" not in kw_names:
                print(f"{path}:{node.lineno}: TestChat called without llm or llm_completions")
PY



Address TestChat silent fallback to None LLM.

The new else: self.llm = None at line 171 means instances created without llm or llm_completions silently bypass the test double entirely. This affects at least 11 existing test calls:

  • tests/test_ai_defense.py (lines 72, 642, 1431, 1465, 1502, 1540, 1584, 1625)
  • tests/test_jailbreak_heuristics.py (lines 52, 61)
  • tests/test_jailbreak_models.py (line 56)

Instead of fail-fast behavior, tests now delegate to real LLMRails without a fake, hiding unexpected model calls and changing failure modes from "missing stub" to config/provider-dependent errors. Either provide an empty FakeLLMModel as the default or raise an error when both parameters are missing.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/utils.py` around lines 158 - 172, The constructor logic in TestChat
silently sets self.llm = None when both llm and llm_completions are missing,
causing tests to hit the real LLM; change the else branch to either raise a
clear error or instantiate a default stubbed FakeLLMModel instead. Specifically,
in the TestChat initialization where llm and llm_completions are checked,
replace "else: self.llm = None" with one of: (a) raise ValueError("No llm or
llm_completions provided to TestChat; tests must supply a fake"), or (b)
self.llm = FakeLLMModel(responses=[], streaming=False, exception=None,
token_usage=None, should_return_token_usage=False) so tests fail-fast or use an
empty fake respectively; refer to the TestChat constructor and the FakeLLMModel
symbol when applying the change.
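Option (a), fail-fast, can be sketched as follows. TestChatSketch and FakeLLMStandIn are stand-ins for illustration, not the real tests/utils.py code:

```python
class FakeLLMStandIn:
    """Placeholder for FakeLLMModel; just records the canned responses."""

    def __init__(self, responses):
        self.responses = responses


class TestChatSketch:
    def __init__(self, llm=None, llm_completions=None):
        if llm is not None:
            self.llm = llm
        elif llm_completions is not None:
            self.llm = FakeLLMStandIn(responses=llm_completions)
        else:
            # Fail fast instead of silently delegating to a real LLM.
            raise ValueError(
                "No llm or llm_completions provided to TestChat; tests must supply a fake"
            )
```

With this shape, the 11 affected call sites fail immediately with a clear message rather than producing config- or provider-dependent errors later.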

Collaborator

@tgasser-nv tgasser-nv left a comment


There are some tests outside of tests/integrations/langchain so they're not covered by conftest.py skip logic. These would need moving to the new location.

  • tests/test_configs/with_custom_llm/config.py → imports CustomLLM
  • tests/test_configs/with_custom_llm/custom_llm.py → extends langchain_core.language_models.BaseLLM
  • tests/test_configs/with_custom_chat_model/config.py → imports CustomChatModel
  • tests/test_configs/with_custom_chat_model/custom_chat_model.py → extends langchain_core.language_models.BaseChatModel

Could you also take a look at the conftest.py comment from CodeRabbit

Comment thread tests/test_llmrails.py
@Pouyanpi
Collaborator Author

There are some tests outside of tests/integrations/langchain so they're not covered by conftest.py skip logic. These would need moving to the new location.

  • tests/test_configs/with_custom_llm/config.py → imports CustomLLM
  • tests/test_configs/with_custom_llm/custom_llm.py → extends langchain_core.language_models.BaseLLM
  • tests/test_configs/with_custom_chat_model/config.py → imports CustomChatModel
  • tests/test_configs/with_custom_chat_model/custom_chat_model.py → extends langchain_core.language_models.BaseChatModel

These would surface when changing the default framework. I've fixed these in the last stack. I'll bring them here.

Could you also take a look at the conftest.py comment from CodeRabbit

Yes. This one is critical. Fixed it, will push 👍🏻

…integrations/langchain/

test_custom_llm.py is LangChain-specific — it asserts that custom_llm
and custom_chat_model hooks register against the LangChain provider
registries. It belongs alongside the other LangChain-only tests
already under tests/integrations/langchain/.

The three LangChain-based test configs (with_custom_llm,
with_custom_chat_model, with_custom_llm_prompt_action_v2_x) are
duplicated into tests/integrations/langchain/test_configs/ so the
moved test continues to resolve them via its sibling ./test_configs
path. The originals under tests/test_configs/ remain in place as the
default-framework versions — they will be rewritten for the LLMModel
protocol in the framework cutover that introduces the new default.
Comment thread tests/conftest.py Outdated
Replace pytest_collection_modifyitems with collect_ignore_glob so LangChain-gated test modules are skipped before pytest imports them.
@Pouyanpi Pouyanpi force-pushed the refactor/langchain-decouple/stack-8-test-infrastructure branch from ca11b55 to 344c8f7 Compare April 17, 2026 14:01
