
feat: groundedness requirement #773

Open
akihikokuroda wants to merge 19 commits into generative-computing:main from akihikokuroda:citation

Conversation

@akihikokuroda
Member

@akihikokuroda akihikokuroda commented Apr 1, 2026

Misc PR

Type of PR

  • Bug Fix
  • New Feature
  • Documentation
  • Other

Description

Testing

  • Tests added to the respective file if code was changed
  • New code has 100% coverage if code was added
  • Ensure existing tests and github automation passes (a maintainer will kick off the github automation when the rest of the PR is populated)

@akihikokuroda akihikokuroda requested a review from a team as a code owner April 1, 2026 20:07
@github-actions
Contributor

github-actions Bot commented Apr 1, 2026

The PR description has been updated. Please fill out the template for your PR to be reviewed.

@akihikokuroda akihikokuroda changed the title groundedness requirement feat: groundedness requirement Apr 1, 2026
@github-actions github-actions Bot added the enhancement New feature or request label Apr 1, 2026
Contributor

@jakelorocco jakelorocco left a comment


This is a very interesting requirement. I think it's a good opportunity to show off Mellea intrinsics and requirement checking. I'm not sure we have many other requirements with as many LLM calls.

One suggestion broader than the comments I left below: could we parallelize the steps? Could we generate citations at the same time we check spans for needing citations? As we generate spans that need to be checked, could we check each in parallel, or as they are produced? If so, I think we should make this requirement work more asynchronously and have an early-exit mode if a span fails the check (even if not all citations have been generated / not all spans have been checked).
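The early-exit idea sketched in this comment could look roughly like the following. This is an illustrative sketch only, not Mellea's API: `check_span` is a stand-in for an LLM-backed groundedness check of a single span.

```python
import asyncio


async def check_span(span: str) -> bool:
    """Stand-in for an LLM-backed groundedness check of one span."""
    await asyncio.sleep(0)  # a real check would await a backend call here
    return "ungrounded" not in span


async def check_all_spans(spans: list[str]) -> bool:
    """Check spans concurrently and exit early as soon as any span fails."""
    tasks = [asyncio.create_task(check_span(s)) for s in spans]
    try:
        for done in asyncio.as_completed(tasks):
            if not await done:
                return False  # early exit: one span already failed
        return True
    finally:
        for t in tasks:
            t.cancel()  # cancel any checks still in flight
```

Because `as_completed` yields results in completion order, the requirement can report failure as soon as the first failing span finishes, without waiting for the remaining LLM calls.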

Comment thread docs/examples/groundedness_requirement_example.py Outdated
Comment thread mellea/stdlib/requirements/rag.py Outdated
Comment thread mellea/stdlib/requirements/rag.py Outdated
Comment thread mellea/stdlib/requirements/rag.py
Comment thread mellea/stdlib/requirements/rag.py Outdated
Comment thread mellea/stdlib/requirements/rag.py
Comment thread mellea/stdlib/requirements/rag.py
Comment thread mellea/stdlib/requirements/rag.py
Comment thread mellea/stdlib/requirements/rag.py
Comment thread mellea/stdlib/requirements/rag.py Outdated
@psschwei
Member

psschwei commented Apr 2, 2026

cc @generative-computing/mellea-intrinsics

@akihikokuroda
Member Author

@jakelorocco Thanks for the review. I addressed all your comments except "Could we parallelize the steps?". I'm working on it.

@akihikokuroda
Member Author

akihikokuroda commented Apr 2, 2026

@jakelorocco There are 2 ideas to improve the requirement.
For this one:
OPTIMIZED_PIPELINE_DESIGN.md
I'm checking whether the citation intrinsic works in this usage.

This one does not parallelize the processing, but it makes a batch call for the citation support step:
COMBINED_SUPPORT_ASSESSMENT_DESIGN.md

@akihikokuroda
Member Author

Parallelizing seems to need some more work/investigation, so I improved the "citation support" step to make only one LLM call instead of calling the LLM for each span.

@psschwei
Member

psschwei commented Apr 7, 2026

cc @yannisk2

Comment thread mellea/stdlib/requirements/rag.py
Comment thread test/stdlib/requirements/test_groundedness_requirement.py Outdated
Comment thread mellea/stdlib/requirements/rag.py Outdated
Comment thread mellea/stdlib/requirements/rag.py
@akihikokuroda akihikokuroda requested a review from a team as a code owner April 7, 2026 19:37
@akihikokuroda akihikokuroda requested a review from planetf1 April 7, 2026 19:37
@akihikokuroda
Member Author

@planetf1 Thanks for the review. I addressed all your comments.

@akihikokuroda akihikokuroda self-assigned this Apr 8, 2026
Contributor

@planetf1 planetf1 left a comment


LGTM, and thanks for addressing the comments. I'll leave this as a comment, as there are some outstanding items from @jakelorocco that I think may need addressing.

@akihikokuroda
Member Author

@psschwei this is the PR.

Contributor

@yannisk2 yannisk2 left a comment


@akihikokuroda Thank you for putting this together! Please see below for a few additional comments on improvements/changes.

if self.documents is not None:
    documents = self.documents
else:
    documents = last_message._docs or []
Contributor


For the case where the documents are not directly provided to the requirement checker and are read off the context instead, do we have a standard design pattern for RAG showing where the documents should be attached to? I see at least three options:

I think that it would help to standardize the way that documents are passed in RAG scenarios and then align the code in this PR to read them from the corresponding location, so that the users that employ the requirement checker as part of an RAG IVR pattern would not have to pass the documents twice (once for response generation and a second time for the requirement check).

Member Author


I agree that standardizing document passing in RAG scenarios would improve the developer experience and avoid requiring documents to be passed twice (once for response generation and once for requirement validation).
The current design supports documents in the constructor or attached to the assistant message, which works but isn't aligned with the grounding_context pattern used in existing RAG examples like simple_rag_with_filter.py.

I'd like to defer this design alignment to a follow-on PR. That work should:

  1. Standardize on the grounding_context pattern for document passing
  2. Refactor ChatContext (or add grounding context support) to make documents available throughout the pipeline
  3. Update GroundednessRequirement to read documents from the context instead of requiring explicit constructor/message passing
  4. Update examples and documentation to show the unified pattern

This deserves dedicated attention, discussion with the core team and testing to ensure it works well across all RAG use cases. I'll open a tracking issue to capture this improvement.

Contributor


I agree that this will require a broader discussion and coordination with the core team. If we can capture this as a separate issue, I am completely fine with addressing it in a separate PR.

Comment thread mellea/stdlib/requirements/rag.py Outdated
try:
    # Step 1: Citation Generation
    # Call intrinsic directly for explicit control over model options
    from ..components.intrinsic._util import call_intrinsic
Contributor


Can the import statement be moved to the top of the file?

Member Author


Inline comment added to explain.

The lazy import is actually necessary due to a circular dependency:

  • mellea.stdlib.requirements.rag imports from mellea.stdlib.components
  • Which transitively imports from mellea.backends
  • If the import of call_intrinsic is at module level, it triggers loading of the backends module before its initialization completes

Comment thread mellea/stdlib/requirements/rag.py Outdated
citation_context = context_before_response.add(
    Message("assistant", response, documents=list(documents))
)
citations: list[dict] = call_intrinsic(
Contributor


I would suggest using the find_citations function instead of call_intrinsic (which find_citations calls internally), to avoid replicating the code of find_citations here. It should also make the code cleaner: with find_citations, we would not need to add back to the context the last assistant message that we had just separated from the original context.

Member Author


I'll make changes. Thanks!

Comment thread mellea/stdlib/requirements/rag.py Outdated
covered_ranges.sort()
merged_ranges: list[tuple[int, int]] = []
for begin, end in covered_ranges:
    if merged_ranges and begin <= merged_ranges[-1][1]:
Contributor


Ensure that consecutive non-overlapping spans are not merged (may have to replace <= with < above).
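A minimal sketch of the suggested fix, assuming spans where equal endpoints mean adjacency rather than overlap (the function name is illustrative, not the PR's exact code):

```python
def merge_ranges(covered: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge overlapping ranges; keep consecutive non-overlapping ones apart."""
    merged: list[tuple[int, int]] = []
    for begin, end in sorted(covered):
        # Strict `<`: (0, 5) and (5, 9) are adjacent, not overlapping,
        # so they stay separate; `<=` would incorrectly fuse them.
        if merged and begin < merged[-1][1]:
            prev_begin, prev_end = merged[-1]
            merged[-1] = (prev_begin, max(prev_end, end))
        else:
            merged.append((begin, end))
    return merged


print(merge_ranges([(0, 5), (5, 9), (3, 4)]))  # → [(0, 5), (5, 9)]
```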

Member Author


I fixed it. Thanks!

Comment thread mellea/stdlib/requirements/rag.py Outdated
current_span_start = 0
current_is_covered = is_covered(0) if response else False

for i in range(1, len(response) + 1):
Contributor


Identifying the spans could be done more efficiently by iterating over the merged_ranges instead of iterating over every single character in the response.
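One way to implement this suggestion, sketched with illustrative names: since the merged ranges are sorted and disjoint, the uncovered spans are just the gaps between consecutive ranges, plus the head and tail of the response.

```python
def uncovered_spans(
    response_len: int, merged_ranges: list[tuple[int, int]]
) -> list[tuple[int, int]]:
    """Compute the gaps not covered by sorted, disjoint merged ranges."""
    spans: list[tuple[int, int]] = []
    cursor = 0
    for begin, end in merged_ranges:
        if cursor < begin:
            spans.append((cursor, begin))  # gap before this covered range
        cursor = max(cursor, end)
    if cursor < response_len:
        spans.append((cursor, response_len))  # uncovered tail of the response
    return spans


print(uncovered_spans(20, [(2, 5), (9, 12)]))  # → [(0, 2), (5, 9), (12, 20)]
```

This runs in O(number of ranges) rather than O(length of response).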

Comment thread mellea/stdlib/requirements/rag.py Outdated
result, _ = await backend.generate_from_context(
    action,
    context,
    model_options={"temperature": 0.0, "max_new_tokens": 500},
Contributor


Could we have a higher default of max_new_tokens (or make it configurable)?

Member Author


Made it configurable. Thanks!

Comment thread mellea/stdlib/requirements/rag.py Outdated
result, _ = await backend.generate_from_context(
    action,
    context,
    model_options={"temperature": 0.0, "max_new_tokens": 500},
Contributor


Could we have a higher default of max_new_tokens (or make it configurable)?

Member Author


Made it configurable. Thanks!

@@ -0,0 +1,514 @@
"""Tests for GroundednessRequirement."""
Contributor


In addition to the current tests, can we also add a few tests that check the correctness of the requirement checker end to end? Test cases could include simple examples of grounded, ungrounded, or partially grounded responses, responses that do not need citations (e.g., I-do-not-know), etc.

Member Author


I agree that end-to-end correctness tests validating real grounded/ungrounded/partially-grounded responses would strengthen the test suite.

I'd like to defer adding those tests to a follow-on PR. The reason is that comprehensive correctness tests require careful data engineering to craft responses that are genuinely grounded vs. ungrounded by the provided documents, which deserves focused attention.

I'll open a follow-on issue/PR to track adding:

  • Tests for fully grounded responses (should pass)
  • Tests for ungrounded responses (should fail)
  • Tests for partially grounded responses
  • Tests for responses that don't need citations (I-don't-know, disclaimers, etc.)

These can be marked with `@pytest.mark.slow` since they'll require GPU inference and real backend validation.

)
return prompt

def _build_batch_support_prompt(
Contributor


This prompt should also include the documents, as in order for an LLM to decide if a citation supports a response span, it may need to reason about the citation in the context of the document in which the citation appears. For instance, consider the following example:

  • Document: "IBM ... Its headquarters are in Armonk, NY"
  • Response sentence: "IBM is headquartered in Armonk, NY"
  • Citation for response sentence: "Its headquarters are in Armonk, NY"

In this example, for an LLM to verify that the citation supports the response sentence, it has to be aware of the document where the citation appears, so that it can verify that the word "its" in the citation indeed refers to IBM.

Based on the above, the support prompt has to also include the documents.
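For illustration only, a batch prompt builder that prepends the documents could look like this; the function name and the prompt wording are hypothetical, not the PR's actual implementation:

```python
def build_batch_support_prompt(
    documents: list[str], pairs: list[tuple[str, str]]
) -> str:
    """Build a batch support prompt that includes the source documents."""
    lines = ["Documents:"]
    for i, doc in enumerate(documents):
        lines.append(f"[{i}] {doc}")
    lines.append("")
    lines.append("For each pair, answer YES if the citation supports the span:")
    for j, (span, citation) in enumerate(pairs, start=1):
        lines.append(f"{j}. Span: {span}")
        lines.append(f"   Citation: {citation}")
    return "\n".join(lines)
```

With the IBM example above, the model would then see document [0] alongside the span/citation pair and could resolve "Its" to IBM when judging support.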

Member Author


I'll fix it. Thanks!

Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
@akihikokuroda akihikokuroda requested a review from yannisk2 April 30, 2026 16:40

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

IVR Groundedness Validator Using Citations

5 participants