Commit 35df77f

docs(stdlib): fix example for delta semantics and note validator latency
Two documentation fixes following the per-chunk semantics correction:

- streaming_chunking.py: MaxSentencesReq previously counted sentence-end
  punctuation in the chunk, which worked under the old accumulated-text
  behaviour but returns at most 1 per sentence under delta semantics.
  Rewritten to increment self._count once per chunk -- the canonical
  pattern for a requirement that needs context beyond a single chunk.
- stream_with_chunking docstring: add a Note that chunks are emitted to
  the consumer only after every active validator returns for that chunk.
  A slow stream_validate (e.g. an LLM-based one) therefore adds latency
  to every chunk. The invariant preserved is that the consumer never
  sees unvalidated content; a concurrent-emission fast path may be added
  in future if a concrete use case calls for it.

Assisted-by: Claude Code
1 parent ea6bdb0 commit 35df77f

2 files changed: 22 additions & 3 deletions

docs/examples/streaming/streaming_chunking.py (9 additions, 3 deletions)
```diff
@@ -23,7 +23,13 @@
 
 
 class MaxSentencesReq(Requirement):
-    """Fails if the model generates more than *limit* sentences mid-stream."""
+    """Fails if the model generates more than *limit* sentences mid-stream.
+
+    Each ``stream_validate`` call receives one complete sentence from the
+    :class:`~mellea.stdlib.chunking.SentenceChunker`. The running count is
+    maintained on ``self`` — this is the standard pattern for requirements
+    that need context beyond a single chunk.
+    """
 
     def __init__(self, limit: int) -> None:
         self._limit = limit
@@ -35,8 +41,8 @@ def format_for_llm(self) -> str:
     async def stream_validate(
         self, chunk: str, *, backend: Backend, ctx: Context
     ) -> PartialValidationResult:
-        sentence_count = chunk.count(".") + chunk.count("!") + chunk.count("?")
-        if sentence_count > self._limit:
+        self._count += 1
+        if self._count > self._limit:
             return PartialValidationResult(
                 "fail",
                 reason=f"Response exceeded {self._limit} sentence limit mid-stream",
```
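The commit message's point can be seen in a minimal standalone sketch of the two counting strategies under delta semantics, where each call receives one new sentence rather than the accumulated text. The `PunctuationCounter` and `StatefulCounter` class names are illustrative, not the mellea API:

```python
# Hypothetical standalone demo (not the mellea API): why punctuation
# counting fails under delta semantics, while per-chunk counting works.

class PunctuationCounter:
    """Old pattern: count sentence-end punctuation in the chunk."""

    def __init__(self, limit: int) -> None:
        self._limit = limit

    def check(self, chunk: str) -> bool:
        # Under delta semantics each chunk is a single sentence, so this
        # count is at most 1 per call and the limit is never exceeded.
        count = chunk.count(".") + chunk.count("!") + chunk.count("?")
        return count <= self._limit


class StatefulCounter:
    """New pattern: increment once per chunk, keeping state on self."""

    def __init__(self, limit: int) -> None:
        self._limit = limit
        self._count = 0

    def check(self, chunk: str) -> bool:
        self._count += 1
        return self._count <= self._limit


sentences = ["One.", "Two.", "Three.", "Four."]
punct = PunctuationCounter(limit=3)
stateful = StatefulCounter(limit=3)
punct_results = [punct.check(s) for s in sentences]
stateful_results = [stateful.check(s) for s in sentences]
print(punct_results)     # [True, True, True, True] -- never fails
print(stateful_results)  # [True, True, True, False] -- fails on the 4th
```

The stateful version is what the diff above adopts: the count lives on the requirement instance because no single chunk carries enough context to decide.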

mellea/stdlib/streaming.py (13 additions, 0 deletions)
```diff
@@ -247,6 +247,19 @@ async def stream_with_chunking(
     ``self._seen = self._seen + chunk``). They must not read ``mot.astream()``
     directly — this orchestrator is the single consumer of the MOT stream.
 
+    Note:
+        Chunks are emitted to the consumer (via
+        :meth:`StreamChunkingResult.astream`) only after every requirement's
+        ``stream_validate`` has returned for that chunk. A slow validator
+        (for example, one that invokes an LLM) therefore adds latency to
+        every chunk — the consumer sees a chunk at most as quickly as the
+        slowest active validator. This trade is deliberate in v1: it
+        preserves the invariant that the consumer never sees content that
+        has not been validated, which matters for UIs displaying generated
+        text live. A future fast-path mode that emits chunks to the
+        consumer concurrently with validation (at the cost of that
+        invariant) may be added if a concrete use case calls for it.
+
     Note:
         v1 retry is simple re-invocation of this function. Plugin hooks
         (``SAMPLING_LOOP_START``, ``SAMPLING_REPAIR``, etc.) do not fire
```
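The latency property documented in the new Note can be sketched with plain `asyncio`. This is an illustrative simulation, not the mellea orchestrator: `fast_validator`, `slow_validator`, and `emit_with_validation` are hypothetical names, and the sleeps stand in for real validation work such as an LLM call.

```python
# Hedged sketch: a chunk reaches the consumer only after *all*
# validators return for it, so per-chunk latency is bounded below by
# the slowest validator. Names here are illustrative, not mellea's API.
import asyncio
import time


async def fast_validator(chunk: str) -> str:
    await asyncio.sleep(0.01)  # cheap, local check
    return "pass"


async def slow_validator(chunk: str) -> str:
    await asyncio.sleep(0.2)  # stands in for an LLM-based check
    return "pass"


async def emit_with_validation(chunks, validators):
    emitted = []
    for chunk in chunks:
        # Run validators concurrently, but emit only once every one of
        # them has returned for this chunk (the v1 invariant).
        results = await asyncio.gather(*(v(chunk) for v in validators))
        if all(r == "pass" for r in results):
            emitted.append(chunk)
    return emitted


start = time.monotonic()
out = asyncio.run(
    emit_with_validation(["a", "b"], [fast_validator, slow_validator])
)
elapsed = time.monotonic() - start
print(out)             # ['a', 'b']
print(elapsed >= 0.4)  # True: two chunks, each gated on the 0.2 s validator
```

Even though the validators run concurrently per chunk, each emission waits on the slowest one, which is exactly the trade the Note describes.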
