feat: add get_dataframe_report() method to Validate#341

Merged: rich-iannone merged 31 commits into posit-dev:main from Meghansaha:add-get_dataframe on Apr 15, 2026

Meghansaha (Contributor) commented Jan 8, 2026

Summary

Thank you for contributing to Pointblank! To make this process easier for everyone, please explain the context and purpose of your contribution. Also, list the changes made to the existing code or documentation.

Related GitHub Issues and PRs

Checklist

Comment thread on pointblank/validate.py (outdated)

Meghansaha (Contributor, Author) commented Feb 25, 2026

NTS:

  • polars and pandas dfs
  • Add duckdb/ibis backend
  • Turn Ruff back on
  • Fix Ruff errors
  • Code review first pass
  • Code review second pass (in progress)
  • Make tests (in progress)
  • Recheck examples
  • Update Changelog

Meghansaha (Contributor, Author) commented

Hey @rich-iannone!

I'm nearly done adding #163 (still finishing up tests), but I'm blocked by ruff failures in CI. When my branch generates pointblank/validate.pyi, I get syntax errors:

invalid-syntax: Unexpected indentation
   --> pointblank\validate.pyi:165:1
    |
163 | def get_dataframe(self, tbl_type: Literal['polars', 'pandas', 'duckdb'] = 'polars') -> Any: ...
164 |     # === GENERATED START ===
165 |     def col_sum_eq(
    | ^^^^

invalid-syntax: Expected a statement
   --> pointblank\validate.pyi:706:1
    |
705 |     # === GENERATED END ===

Looks like the # === GENERATED START === block is being inserted outside of a class body, so the indented def col_sum_eq on line 165 is invalid. This seems to trace back to scripts/generate_agg_validate_pyi.py (L112–L144):

## Create grid of aggs and comparators
with VALIDATE_PYI_PATH.open("w") as f:
    f.write(content)
    f.write("    # === GENERATED START ===\n")
    for agg_name, comp_name in itertools.product(
        AGGREGATOR_REGISTRY.keys(), COMPARATOR_REGISTRY.keys()
    ):
        method = f"col_{agg_name}_{comp_name}"
        # Extract examples from the doctest registry using robust AST parsing
        doctest_fn = _TEST_FUNCTION_REGISTRY[method]
        body: str = _extract_body(doctest_fn)
        # Add >>> to each line in the body so doctest can run it
        body_with_arrows: str = "\n".join(f"\t>>> {line}" for line in body.split("\n"))
        # Build docstring
        meth_body = (
            f'"""Assert the values in a column '
            f"{agg_name.replace('_', ' ')} to a value "
            f"{comp_name.replace('_', ' ')} some `value`.\n"
            f"{DOCSTRING}"
            f"{body_with_arrows}\n"
            f'"""\n'
        )
        # Build the .pyi stub method
        temp = f"    def {method}({SIGNATURE}\t) -> {CLS}:\n        {meth_body}        ...\n\n"
        f.write(temp)
    f.write("    # === GENERATED END ===\n")
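The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the project's generator: the two stub strings are simplified stand-ins for the generated `validate.pyi`, and `parses` is a hypothetical helper. It only shows that an indented `def` following a module-level `def` is rejected, while the same block inside a class body is accepted.

```python
import ast

# Simplified stand-in: the generated block follows a module-level function,
# so the indented `def` has no enclosing class body.
broken_stub = (
    "def get_dataframe(self) -> Any: ...\n"
    "    # === GENERATED START ===\n"
    "    def col_sum_eq(self) -> Any: ...\n"
    "    # === GENERATED END ===\n"
)

# Same generated block, but the preceding method sits inside a class body.
fixed_stub = (
    "class Validate:\n"
    "    def get_dataframe(self) -> Any: ...\n"
    "    # === GENERATED START ===\n"
    "    def col_sum_eq(self) -> Any: ...\n"
    "    # === GENERATED END ===\n"
)


def parses(source: str) -> bool:
    """Return True if `source` is syntactically valid Python (hypothetical helper)."""
    try:
        ast.parse(source)
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False
    return True


print(parses(broken_stub), parses(fixed_stub))
```

If this matches the real cause, the likely fix is to make sure the content written before the generated block ends inside the class body, with the new method indented like its siblings.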

I saw you may have done related work here. I'm not sure whether there's anything I need to address on my branch or whether I can ignore it. Either way, I'll finish out the tests and mark the PR as "ready" once those are done.

rich-iannone (Member) commented

@Meghansaha first off, just want to say thank you for doing all this work! Been following it and it'll be a wonderful contribution to the package. The ruff/validate.pyi part is relatively new/complex and, if you're okay with it, I could take it from here to the finish line.

My only additional change beyond that in this PR would be to change the name of the method from .get_dataframe() to .get_dataframe_report() (mostly for consistency with .get_tabular_report()).

Meghansaha (Contributor, Author) commented

Hi @rich-iannone! No problem at all, I appreciate you letting me contribute. So sorry it took so long, life was "lifing" and I think as you're aware, duckdb is a pain on my Windows potato machine due to the permission error stuff. 😭

You can take over if you'd like, as I'm sure you'll be faster than me. I do feel this needs more behavioral tests, though, such as confirming correct column names. I think the rename makes sense too!
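A behavioral test of the kind mentioned above might compare the report's column names against an expected list. The sketch below is illustrative only: `diff_columns` is a hypothetical helper, and the column names are placeholders rather than the real schema of the new method. In practice the `actual` list would presumably come from the returned DataFrame's `.columns`.

```python
def diff_columns(actual, expected):
    """Return (missing, unexpected) column names, each in input order."""
    missing = [name for name in expected if name not in actual]
    unexpected = [name for name in actual if name not in expected]
    return missing, unexpected


# Placeholder names, not the method's real output columns.
expected = ["step", "column", "n_passed"]
actual = ["step", "column", "n_failed"]

missing, unexpected = diff_columns(actual, expected)
print("missing:", missing, "unexpected:", unexpected)
```

Reporting both lists separately makes a failing test easier to read than a bare equality check on the two column lists.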

rich-iannone changed the title from "WIP - add get_dataframe method to Validate" to "feat: add get_dataframe_report() method to Validate" on Apr 14, 2026
rich-iannone marked this pull request as ready for review on April 15, 2026 00:04
rich-iannone (Member) left a comment

LGTM, thanks!

rich-iannone merged commit 3c298a5 into posit-dev:main on Apr 15, 2026
9 checks passed