docs: migrate Guardian documentation from deprecated GuardianCheck to Intrinsics API #935

Draft
planetf1 wants to merge 6 commits into generative-computing:main from planetf1:cs/issue-guardian1

planetf1 (Contributor) commented Apr 24, 2026

Guardian Documentation Migration

DRAFT — paused pending upstream intrinsics changes. See status note below.

Status

Paused as of 2026-05-01. This PR depends on active intrinsics work that is still in flux.

Plan: wait for #981, #986, and #988 to merge and stabilise, then re-review this PR, update model IDs throughout, and verify examples against the new adapter versions before marking ready.


Type of PR

  • Bug Fix
  • New Feature
  • Documentation
  • Other

Description

Migrates Guardian documentation from the deprecated GuardianCheck/GuardianRisk API (which has emitted DeprecationWarning since v0.4) to the current Guardian Intrinsics API (guardian_check(), policy_guardrails(), factuality_detection(), factuality_correction()).

Key changes:

  • New /how-to/safety-guardrails page — full reference for all four Intrinsic functions, CRITERIA_BANK keys, and the target_role="user" input-gating pattern
  • build-a-rag-pipeline.md step 5 and "Putting it together" rewritten to use guardian_check(criteria="groundedness") with Document(text=..., doc_id=...) attached to the assistant message (aligned with fix: add guardian intrinsic document #966)
  • docs/examples/safety/ example files deleted: guardian.py, guardian_huggingface.py, and repair_with_guardian.py removed (see below)
  • Deprecation banner added to security-and-taint-tracking.md
  • Glossary: 5 new entries (guardian_check, CRITERIA_BANK, policy_guardrails, factuality_detection, factuality_correction); GuardianCheck/GuardianRisk entries marked deprecated
  • docs.json: how-to/safety-guardrails added to nav; redirect from that path to security-and-taint-tracking removed
  • examples/index.md: intrinsics/ category description updated to clarify Guardian functions are documented separately
  • Guardian Intrinsics cross-link added to advanced/intrinsics.md
  • Safety card on index.mdx updated to reference Intrinsics
  • Session subclass example in use-context-and-sessions.md rewritten (SafeChatSession now accepts guardian_backend as a constructor arg)
  • Common-errors guardian section rewritten
  • concepts/architecture-vs-agents.md, concepts/plugins.mdx, and guide/CONTRIBUTING.md links updated
  • observability/metrics.md: note added that Guardian Intrinsics do not emit mellea.requirement metrics (migration footgun)
  • Typo fix: "Determine is" → "Determine if" in factuality_detection docstring
  • Fixed -> float return annotations on factuality_detection / factuality_correction (they return str; closes fix(core): wrong return type annotations on factuality_detection and factuality_correction #934)
  • Removed "sexual_content" from tutorial CRITERIA_BANK key list (not a real key; GuardianRisk.SEXUAL_CONTENT has no equivalent in CRITERIA_BANK)
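
For reviewers unfamiliar with the target_role="user" input-gating pattern mentioned above, here is a minimal sketch of the idea: screen the user turn before generating anything. It does not call mellea. `RiskScorer`, `toy_scorer`, `flagged_criteria`, `answer_if_safe`, and the 0.5 threshold are all hypothetical stand-ins, and the sketch assumes a higher score means higher risk. The real guardian_check() signature should be taken from the new how-to page, not from this block.

```python
from typing import Callable, Sequence

# Hypothetical stand-in for a guardian_check()-style call. It takes the text to
# screen and a criteria key and returns a score in [0.0, 1.0]; this sketch
# assumes a higher score means higher risk.
RiskScorer = Callable[[str, str], float]


def flagged_criteria(
    user_text: str,
    scorer: RiskScorer,
    criteria: Sequence[str] = ("harm", "jailbreak"),
    threshold: float = 0.5,  # assumed cut-off, not taken from the PR
) -> list[str]:
    """Return the criteria whose score for the user message meets the threshold."""
    return [name for name in criteria if scorer(user_text, name) >= threshold]


def toy_scorer(text: str, criteria: str) -> float:
    """Keyword-based placeholder so the sketch runs without a model."""
    triggers = {
        "harm": ("build a weapon",),
        "jailbreak": ("ignore previous instructions",),
    }
    lowered = text.lower()
    hit = any(phrase in lowered for phrase in triggers.get(criteria, ()))
    return 1.0 if hit else 0.0


def answer_if_safe(user_text: str, scorer: RiskScorer) -> str:
    """Screen the user turn first; only generate when nothing is flagged."""
    flagged = flagged_criteria(user_text, scorer)
    if flagged:
        return "Request blocked (" + ", ".join(flagged) + ")"
    return "OK to generate"
```

In real use the scorer would wrap the Guardian call with the user message as the target, so the gate runs before any assistant output exists.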

Note on tutorial 04: Steps 4–7 of 04-making-agents-reliable.md were independently migrated to Guardian Intrinsics upstream before this PR was rebased; those upstream changes were taken as-is.


Deletion of docs/examples/safety/ examples — reviewer input requested

guardian.py, guardian_huggingface.py, and repair_with_guardian.py have been deleted rather than retained with deprecation markers. Rationale:

  • guardian.py and guardian_huggingface.py are fully superseded by docs/examples/intrinsics/guardian_core.py, which covers all the same criteria (harm, jailbreak, social_bias, groundedness, function_call, custom criteria) against the same HuggingFace backend. Keeping them would mean CI eventually breaking when GuardianCheck is removed, with no benefit.

  • repair_with_guardian.py demonstrated GuardianCheck as a Requirement inside RepairTemplateStrategy, where Guardian's chain-of-thought _reason string was fed back as repair guidance. This pattern has no direct equivalent in the Guardian Intrinsics API: Intrinsics return a float score and do not expose a reasoning string, so they cannot be passed to m.validate() or wired into RepairTemplateStrategy directly. A safety/README.md is retained to document this gap explicitly.
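
To make the gap concrete, the following is a rough sketch of what a caller can still build from a bare float score: a pass/fail record with a synthesized reason. `ScoreCheck`, `score_to_check`, and the 0.5 threshold are hypothetical names and values chosen for illustration, and the sketch assumes scores in [0.0, 1.0] where higher means riskier. The synthesized reason only restates the score. It is not a substitute for the model-generated _reason string that the old example fed into RepairTemplateStrategy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreCheck:
    """Hypothetical pass/fail record; not a mellea Requirement result."""

    criteria: str
    score: float
    passed: bool
    reason: str


def score_to_check(criteria: str, score: float, threshold: float = 0.5) -> ScoreCheck:
    """Turn a float score into a pass/fail record with a synthesized reason.

    Assumes scores lie in [0.0, 1.0] and that a higher score means higher risk.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    passed = score < threshold
    verdict = "below" if passed else "at or above"
    reason = f"{criteria} score {score:.2f} is {verdict} threshold {threshold:.2f}"
    return ScoreCheck(criteria, score, passed, reason)
```

A repair loop could use `passed` to decide whether to retry, but the repair guidance itself would have to come from somewhere else.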

If you believe repair_with_guardian.py should be kept (or that the RepairTemplateStrategy gap warrants a separate issue), please comment — the example can be restored.


Testing

  • Tests added to the respective file if code was changed
  • New code has 100% coverage if code was added
  • Ensure existing tests and GitHub automation pass (a maintainer will kick off the GitHub automation when the rest of the PR is populated)

Attribution

  • AI coding assistants used

github-actions bot added the documentation label Apr 24, 2026
planetf1 force-pushed the cs/issue-guardian1 branch 4 times, most recently from 684ba01 to 3e0d4dc, April 24, 2026 20:10
planetf1 added 3 commits May 1, 2026 11:08
…anCheck to Intrinsics API

Migrates docs, examples, and cross-links from the deprecated GuardianCheck/GuardianRisk
API to the current Guardian Intrinsics API (guardian_check(), policy_guardrails(),
factuality_detection(), factuality_correction()).

- New how-to/safety-guardrails.md: full reference for all four Intrinsic functions,
  CRITERIA_BANK keys, and the target_role="user" input-gating pattern
- Tutorial 04 steps 4–7 rewritten to use Intrinsics; prerequisites updated
- Glossary: 5 new entries; GuardianCheck/GuardianRisk entries marked deprecated
- Deprecation banners added to security-and-taint-tracking.md and three example files
- docs.json: safety-guardrails added to nav; temporary redirect removed
- Cross-links updated in intrinsics.md, index.mdx, build-a-rag-pipeline.md,
  use-context-and-sessions.md, common-errors.md, architecture-vs-agents.md, plugins.mdx

Partially addresses generative-computing#639, generative-computing#802.

Assisted-by: Claude Code
- Fix stale `grounding_context` tip in tutorial step 6 — was referencing
  a parameter removed from the code example (3/3 reviewer consensus)
- Add deprecation notice to docs/examples/safety/README.md to match the
  deprecation docstrings already added to the three .py files
- Resolve duplicate `intrinsics/` entries in examples/index.md — the Safety
  section row covers Guardian functions; the Performance row gains a
  "(Non-Guardian)" qualifier with a cross-reference
- Tutorial step 7: add user message to eval_ctx for consistency with all
  other guardian_check() examples
- safety-guardrails.md: add migration callout after custom criteria section
  noting that not all deprecated GuardianRisk values have CRITERIA_BANK keys
- safety-guardrails.md: add note clarifying counterintuitive factuality_detection()
  return semantics ("yes" = incorrect, "no" = correct)
- troubleshooting/common-errors.md: add factuality_correction() to the
  Guardian Intrinsics list (was omitted alongside the other three functions)
- security-and-taint-tracking.md: update frontmatter description to signal
  deprecation in search results and link previews
- security-and-taint-tracking.md: fix imprecise "no separate Guardian model
  pull" claim — intrinsics still download a model, just a different one

Assisted-by: Claude Code
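
Because the factuality_detection() semantics noted in this commit are easy to invert, a small guard like the one below may help callers. This is a sketch only: `is_factually_correct` is a hypothetical helper, and it assumes the verdict arrives as a plain "yes"/"no" string, possibly with surrounding whitespace or different casing.

```python
def is_factually_correct(verdict: str) -> bool:
    """Map a factuality_detection()-style verdict to a boolean.

    Per the commit note, "yes" means the response is incorrect and "no"
    means it is correct, so the mapping is deliberately inverted.
    """
    normalized = verdict.strip().lower()
    if normalized == "no":
        return True
    if normalized == "yes":
        return False
    # Refuse to guess on anything else rather than defaulting to "correct".
    raise ValueError(f"unexpected verdict: {verdict!r}")
```

Raising on unknown values is a design choice of this sketch, so that an unexpected output never silently passes as factually correct.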
…telemetry gap

Guardian Intrinsics are not Requirement subclasses and emit no
mellea.requirement.checks/failures metrics. Users migrating from
GuardianCheck would otherwise lose those counters silently.

Also fix "Determine is" → "Determine if" typo in factuality_detection
docstring.

Assisted-by: Claude Code
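
Users who relied on the old counters could keep rough equivalents by counting around each call themselves. The sketch below assumes nothing about mellea's telemetry API: `guardian_counts`, `counted_check`, the counter key names, and the 0.5 threshold are hypothetical, and the scoring callable stands in for whatever Guardian call the application makes.

```python
from collections import Counter
from typing import Callable

# Local stand-in counters. The key names are illustrative and are not metrics
# that mellea emits.
guardian_counts = Counter()


def counted_check(
    criteria: str,
    score_fn: Callable[[], float],
    threshold: float = 0.5,  # assumed cut-off; higher score assumed riskier
) -> bool:
    """Run a zero-argument scoring callable and record the outcome.

    Returns True when the check passes (score below the threshold).
    """
    guardian_counts[f"{criteria}.checks"] += 1
    score = score_fn()
    passed = score < threshold
    if not passed:
        guardian_counts[f"{criteria}.failures"] += 1
    return passed
```

The counts could then be exported through whichever metrics pipeline the application already uses.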
planetf1 force-pushed the cs/issue-guardian1 branch from 3e0d4dc to 51b4160, May 1, 2026 10:32
planetf1 added 3 commits May 1, 2026 11:50
…view

- plugins.mdx: fix broken OTel link (evaluation-and-observability/...
  → observability/tracing)
- build-a-rag-pipeline: correct # Returns comment (None → float 0.0–1.0)
- safety-guardrails: add context-attachment pattern note to factuality
  section explaining why .add(Document) differs from documents= kwarg;
  add warning about -> float annotation mismatch (tracked as generative-computing#934)
- glossary: fix past-tense "validated" → "validates" in GuardianCheck entry
- deprecated safety examples: drop # pytest: markers so they are no longer
  collected by CI (GuardianCheck removal won't break CI in future)

Assisted-by: Claude Code
guardian.py, guardian_huggingface.py, and repair_with_guardian.py are fully
superseded by docs/examples/intrinsics/guardian_core.py, factuality_detection.py,
factuality_correction.py, and policy_guardrails.py.

One migration gap documented in safety/README.md: the old repair_with_guardian.py
pattern (GuardianCheck as a Requirement inside RepairTemplateStrategy, with
_reason fed back as repair guidance) has no direct equivalent in the Intrinsics
API — Guardian Intrinsics return float scores, not Requirement results, and do
not expose a chain-of-thought reason string.

Assisted-by: Claude Code
- Fix -> float annotations on factuality_detection/factuality_correction
  (resolves generative-computing#934; closes the stale type-lie now that file was touched)
- Fix troubleshooting groundedness bullet: wrong document placement
  (was "user message", correct is assistant Message with documents=[...])
- SafeChatSession: accept guardian_backend as constructor arg instead of
  instantiating LocalHFBackend internally (matches "create once, reuse" guidance)
- Name SEXUAL_CONTENT migration gap explicitly in safety-guardrails.md callout
- Move mellea[hf] prerequisite to RAG guide prerequisites block; drop inline note
- Remove -> float type annotation caveat from safety-guardrails.md (fixed in source)
- Remove "sexual_content" from tutorial CRITERIA_BANK key lists (not a real key)

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
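
The "create once, reuse" change to SafeChatSession in this commit is ordinary constructor injection. A minimal sketch follows, with `FakeGuardianBackend` and `SafeChatSessionSketch` as hypothetical stand-ins for the real backend and session classes, whose actual constructors are not reproduced here.

```python
class FakeGuardianBackend:
    """Hypothetical stand-in for a backend that is expensive to construct."""

    instances = 0  # counts constructions so reuse is observable

    def __init__(self) -> None:
        FakeGuardianBackend.instances += 1


class SafeChatSessionSketch:
    """Hypothetical session that receives its guardian backend from the caller."""

    def __init__(self, guardian_backend: FakeGuardianBackend) -> None:
        # The session stores the shared backend instead of building its own.
        self.guardian_backend = guardian_backend


# Build the backend once, then hand the same instance to every session.
backend = FakeGuardianBackend()
sessions = [SafeChatSessionSketch(backend) for _ in range(3)]
```

With the previous pattern each session would have constructed its own backend. Here three sessions share one.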