fix: resilient manifest parsing for disabled field and unknown versions #95

Open
Sourabhchrs93 wants to merge 1 commit into main from fix/resilient-manifest-parsing


@Sourabhchrs93

Summary

  • Fixes the production dbt_ingestion Celery task failure (ManifestV12 raised 810 validation errors on the disabled field)
  • The disabled field uses strict Pydantic discriminated unions (Disabled through Disabled13) that break when dbt Cloud changes its manifest schema. This field is never consumed by any downstream wrapper or backend code.
  • Adds fallback: on validation failure, strips unused disabled field and retries parsing
  • Adds forward-compatibility: unknown manifest versions (e.g. v13+) attempt parse with latest known class instead of hard failing with ValueError
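The forward-compatibility behaviour in the last bullet can be illustrated roughly as follows. This is a minimal sketch, not the PR's code: `parse_v11`, `parse_v12`, `_PARSERS`, `_LATEST`, and `route` are hypothetical stand-ins, and the pattern assigned to `_MANIFEST_VERSION_RE` is an assumption, since the PR excerpt shows only the name.

```python
import re
from typing import Callable, Dict


# Hypothetical stand-ins for the generated model classes (e.g. ManifestV12).
def parse_v11(manifest: dict) -> str:
    return "v11"


def parse_v12(manifest: dict) -> str:
    return "v12"


# Dict lookup in place of an if/elif chain, keyed by the schema URL.
_PARSERS: Dict[str, Callable[[dict], str]] = {
    "https://schemas.getdbt.com/dbt/manifest/v11.json": parse_v11,
    "https://schemas.getdbt.com/dbt/manifest/v12.json": parse_v12,
}
_LATEST = parse_v12  # assumed: the newest parser this code knows about

# Assumed pattern; only the name _MANIFEST_VERSION_RE appears in the PR.
_MANIFEST_VERSION_RE = re.compile(
    r"^https://schemas\.getdbt\.com/dbt/manifest/v(\d+)\.json$"
)


def route(dbt_schema_version: str, manifest: dict) -> str:
    parser = _PARSERS.get(dbt_schema_version)
    if parser is not None:
        return parser(manifest)
    # Unknown manifest version: try the latest known parser.
    if _MANIFEST_VERSION_RE.match(dbt_schema_version):
        return _LATEST(manifest)
    # Not a manifest URL at all: still fail loudly.
    raise ValueError(f"Unsupported dbt schema version: {dbt_schema_version!r}")
```

With this shape, a `manifest/v13.json` URL is routed to the v12 stand-in, while a non-manifest URL still raises `ValueError`.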

Root Cause

dbt Cloud updated their manifest artifact, and the auto-generated ManifestV12 Pydantic model's disabled field couldn't validate the new structure. Each disabled node failed against all 14 union variants, producing 810 validation errors.

Changes

  • _try_parse_manifest(): attempts strict parse first, falls back to stripping unused fields
  • _strip_unused_fields(): removes disabled (and any future unused strict fields) from manifest dict
  • Forward-compat: regex-matches unknown manifest/vN.json URLs and tries latest known parser
  • Refactored version routing from if/elif chain to dict lookup

Test plan

  • Verify parsing succeeds with the failing Docusign manifest (Batch ID 1374333)
  • Verify parsing still works for manifests that already parse cleanly (no regression)
  • Verify unknown manifest version (e.g. v13) falls back gracefully instead of erroring

🤖 Generated with Claude Code

…ions

- Add fallback that strips unused `disabled` field on Pydantic validation
  failure — the field uses strict discriminated unions that break on dbt
  Cloud schema changes but is never consumed by downstream wrappers
- Add forward-compatibility for unknown manifest versions (e.g. v13+) by
  attempting parse with latest known class instead of hard failing
- Refactor version routing to dict lookup for clarity

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
return _try_parse_manifest(manifest, model_class)

# Forward-compatibility: unknown manifest version — try latest known class
match = _MANIFEST_VERSION_RE.match(dbt_schema_version)

Bug: The code may raise a TypeError when parsing a manifest. re.match is called on dbt_schema_version, which can be None if the manifest contains a null value for it.
Severity: MEDIUM

Suggested Fix

Add a check to ensure dbt_schema_version is a non-None string before passing it to re.match(). This validation can be added directly before the call or within the get_dbt_schema_version utility function.
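One way the suggested guard could look, as a sketch only: `match_manifest_version` is a hypothetical helper, and the regex pattern is an assumption because the PR excerpt does not show it.

```python
import re
from typing import Optional

# Assumed pattern; the PR excerpt shows only the name _MANIFEST_VERSION_RE.
_MANIFEST_VERSION_RE = re.compile(
    r"^https://schemas\.getdbt\.com/dbt/manifest/v(\d+)\.json$"
)


def match_manifest_version(dbt_schema_version: Optional[str]) -> Optional[int]:
    # Guard: re.match raises TypeError on None, so reject non-strings first.
    if not isinstance(dbt_schema_version, str):
        return None
    match = _MANIFEST_VERSION_RE.match(dbt_schema_version)
    return int(match.group(1)) if match else None
```

A caller can then treat a `None` result the same way as any other unsupported version instead of crashing with `TypeError`.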

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: src/vendor/dbt_artifacts_parser/parser.py#L159

Potential issue: The function `get_dbt_schema_version` can return `None` if an input
manifest contains `"metadata": {"dbt_schema_version": null}`. This `None` value is then
passed to `re.match()` within the `parse_manifest` function. Since `re.match()` expects
a string or bytes-like object, this will raise a `TypeError` and crash the parser.
Although manifests with a null schema version may be uncommon, the corresponding
Pydantic model explicitly defines this field as optional, making this a possible edge
case that is not handled.
