You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
These are the shared vocabulary between the streaming orchestrator and its consumers — UIs, m serve, and plugins. Without a standard set of types in mellea, every consumer defines its own parallel types or takes a dependency on mellea-partial. The sampling strategy plugin hooks (SAMPLING_LOOP_START, SAMPLING_REPAIR, etc.) do not fire during streaming — these event types are the Phase 1 observability substitute. Note that on early exit via cancel_generation(), post_processing() does not run, so mot.generation.usage (token counts) will be None; the CompletedEvent should capture whether generation completed normally or was cancelled.
These are plain dataclasses with no logic and no transport dependency — the minimum mellea needs to own. Once this lands, Hendrik can update mellea-partial to import and emit these types directly.
Implementation
Add to mellea/stdlib/streaming.py. timestamp should be auto-populated in __post_init__ using time.time() so callers do not need to supply it. Export all event types from mellea.stdlib.
Acceptance criteria
All seven event types exist in mellea/stdlib/streaming.py
StreamEvent.timestamp is auto-populated in __post_init__ — callers do not set it
QuickCheckEvent.results typed as list[PartialValidationResult]
FullValidationEvent.results typed as list[ValidationResult]
All types exported from mellea.stdlib
Unit tests in test/stdlib/test_streaming.py covering construction of each event type
Google-style docstring on each event type describing when it is emitted
Additional scope: orchestrator emits events and bridges to OTEL
Deferred from #901 per the epic's explicit instruction that streaming observability lives in the event types, not in the sampling plugin hooks. This issue is the right home because event emission and OTEL recording happen at the same call sites.
Event emission from stream_with_chunking
The orchestrator must emit each event type at the appropriate point:
ChunkEvent — on each validated chunk yielded to the consumer (post-validation, including the flushed trailing fragment)
QuickCheckEvent — after each per-chunk stream_validate batch (one per chunk, covering all active requirements)
StreamingDoneEvent — when the raw token stream ends, before final validate()
FullValidationEvent — after the final validate() call on non-failed requirements
CompletedEvent — on orchestrator exit, including early-exit cases where completed=False
New: ErrorEvent — if stream_validate raises or any other unplanned exit. Replaces the temporary MelleaLogger.warning stopgap in the exception path of _orchestrate_streaming (introduced in feat(stdlib): add stream_with_chunking() with per-chunk validation (#901) #942 to satisfy "fail loud" while this issue lands).
RetryEvent stays deferred — v1 retry is caller-driven re-invocation of stream_with_chunking, so there is no orchestrator-side emission point yet. Revisit when an orchestrator-side retry mechanism is introduced.
OTEL bridge at the event-emission sites
Open an application trace span for stream_with_chunking (mellea.application scope). No orchestrator-level span exists today. Close with set_span_error on early-exit fail. This is separate from the backend span, which cancel_generation() already closes correctly (base.py:413-419).
Each event additionally becomes a span event on the application span, with event fields attached as attributes.
QuickCheckEvent records record_requirement_check; QuickCheckEvent with passed=False additionally records record_requirement_failure with the fail reason.
CompletedEvent records record_sampling_outcome("stream_with_chunking", success=...).
ErrorEvent records record_error(error_type, provider, model).
Open design questions to resolve during implementation
Delivery mechanism. Two candidates: (a) StreamChunkingResult.events() async iterator mirroring the existing astream(); (b) stream_with_chunking(..., on_event=callback). (a) is more consistent with the existing API but requires a second queue and consumer-single-consumer discipline. (b) is simpler to bridge synchronously into mellea-partial UI consumers. Pick one during implementation based on whichever is cleaner to bridge to mellea-partial.
RetryEvent scope. Confirmed deferred here because v1 retry is caller-driven. When/if an orchestrator-side retry lands (likely a future issue, not in the current epic), decide at that point whether RetryEvent fires from within the orchestrator or from the caller that drives the retry loop.
Additional acceptance criteria
Add ErrorEvent dataclass with exception_type: str and detail: str fields (in addition to inherited timestamp)
stream_with_chunking emits ChunkEvent, QuickCheckEvent, StreamingDoneEvent, FullValidationEvent, CompletedEvent, and ErrorEvent at the documented points
Exception cleanup in _orchestrate_streaming replaces the MelleaLogger.warning stopgap with an ErrorEvent emission (remove the # TODO(#902) comment at the same time)
Application trace span opened and closed by the orchestrator, set_span_error called on early-exit fail
record_requirement_check, record_requirement_failure, record_sampling_outcome, record_error called at the documented points
Unit tests for event emission order and content via a capturing subscriber
OTEL side-effect tests via existing metric/span test fixtures
Delivery mechanism (async iterator vs. callback) decided and documented in the orchestrator docstring
What
Add a standard event dataclass hierarchy to
mellea/stdlib/streaming.py(the module created in #901).Why
These are the shared vocabulary between the streaming orchestrator and its consumers — UIs,
m serve, and plugins. Without a standard set of types in mellea, every consumer defines its own parallel types or takes a dependency onmellea-partial. The sampling strategy plugin hooks (SAMPLING_LOOP_START,SAMPLING_REPAIR, etc.) do not fire during streaming — these event types are the Phase 1 observability substitute. Note that on early exit viacancel_generation(),post_processing()does not run, somot.generation.usage(token counts) will beNone; theCompletedEventshould capture whether generation completed normally or was cancelled.These are plain dataclasses with no logic and no transport dependency — the minimum mellea needs to own. Once this lands, Hendrik can update
mellea-partialto import and emit these types directly.Implementation
Add to
mellea/stdlib/streaming.py.timestampshould be auto-populated in__post_init__usingtime.time()so callers do not need to supply it. Export all event types frommellea.stdlib.Acceptance criteria
mellea/stdlib/streaming.pyStreamEvent.timestampis auto-populated in__post_init__— callers do not set itQuickCheckEvent.resultstyped aslist[PartialValidationResult]FullValidationEvent.resultstyped aslist[ValidationResult]mellea.stdlibtest/stdlib/test_streaming.pycovering construction of each event typeAdditional scope: orchestrator emits events and bridges to OTEL
Deferred from #901 per the epic's explicit instruction that streaming observability lives in the event types, not in the sampling plugin hooks. This issue is the right home because event emission and OTEL recording happen at the same call sites.
Event emission from
stream_with_chunkingThe orchestrator must emit each event type at the appropriate point:
ChunkEvent— on each validated chunk yielded to the consumer (post-validation, including the flushed trailing fragment)QuickCheckEvent— after each per-chunkstream_validatebatch (one per chunk, covering all active requirements)StreamingDoneEvent— when the raw token stream ends, before finalvalidate()FullValidationEvent— after the finalvalidate()call on non-failed requirementsCompletedEvent— on orchestrator exit, including early-exit cases wherecompleted=FalseErrorEvent— ifstream_validateraises or any other unplanned exit. Replaces the temporaryMelleaLogger.warningstopgap in the exception path of_orchestrate_streaming(introduced in feat(stdlib): add stream_with_chunking() with per-chunk validation (#901) #942 to satisfy "fail loud" while this issue lands).RetryEventstays deferred — v1 retry is caller-driven re-invocation ofstream_with_chunking, so there is no orchestrator-side emission point yet. Revisit when an orchestrator-side retry mechanism is introduced.OTEL bridge at the event-emission sites
stream_with_chunking(mellea.applicationscope). No orchestrator-level span exists today. Close withset_span_erroron early-exit fail. This is separate from the backend span, whichcancel_generation()already closes correctly (base.py:413-419).QuickCheckEventrecordsrecord_requirement_check;QuickCheckEventwithpassed=Falseadditionally recordsrecord_requirement_failurewith the fail reason.CompletedEventrecordsrecord_sampling_outcome("stream_with_chunking", success=...).ErrorEventrecordsrecord_error(error_type, provider, model).Open design questions to resolve during implementation
StreamChunkingResult.events()async iterator mirroring the existingastream(); (b)stream_with_chunking(..., on_event=callback). (a) is more consistent with the existing API but requires a second queue and consumer-single-consumer discipline. (b) is simpler to bridge synchronously intomellea-partialUI consumers. Pick one during implementation based on whichever is cleaner to bridge tomellea-partial.RetryEventscope. Confirmed deferred here because v1 retry is caller-driven. When/if an orchestrator-side retry lands (likely a future issue, not in the current epic), decide at that point whetherRetryEventfires from within the orchestrator or from the caller that drives the retry loop.Additional acceptance criteria
ErrorEventdataclass withexception_type: stranddetail: strfields (in addition to inheritedtimestamp)stream_with_chunkingemitsChunkEvent,QuickCheckEvent,StreamingDoneEvent,FullValidationEvent,CompletedEvent, andErrorEventat the documented points_orchestrate_streamingreplaces theMelleaLogger.warningstopgap with anErrorEventemission (remove the# TODO(#902)comment at the same time)set_span_errorcalled on early-exit failrecord_requirement_check,record_requirement_failure,record_sampling_outcome,record_errorcalled at the documented pointsBlocked by #901
Part of #891