[v2.0] Nice-to-Haves — post-release backlog #29

@fsecada01

Approved enhancements to consider after the core v2.0 Rust backend ships. None of them block the v2.0 release.

Wiki: Nice-to-Haves

Backlog

  • Memory-mapped file processing for very large PDFs (memmap2 crate) — avoids loading entire file into RAM
  • SIMD-accelerated separator detection — opt-in via [features] simd = [] / pip install "textspitter[simd]"
  • Streaming iterator API — yield chunks instead of collecting all; enables processing before extraction completes
  • cargo bench integration with criterion — reproducible micro-benchmarks replacing the ad-hoc bench_splitting.py
  • PyPI publish job for manylinux wheels (already wired in Phase 6 CI — just needs the publish trigger on release)
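
For the streaming iterator item, here is a minimal sketch of the "yield chunks instead of collecting all" idea using only the Rust standard library. `ChunkIter`, `max_bytes`, and `separator` are hypothetical names for illustration, not TextSpitter's actual API, and the sketch works on an in-memory `&str` rather than on live PDF extraction output.

```rust
/// Hypothetical streaming chunker (illustrative only, not the TextSpitter API).
/// Yields one chunk per `next()` call instead of building a Vec of all chunks.
pub struct ChunkIter<'a> {
    rest: &'a str,     // text not yet emitted
    max_bytes: usize,  // upper bound on chunk size, in bytes
    separator: char,   // preferred break point
}

impl<'a> ChunkIter<'a> {
    pub fn new(text: &'a str, max_bytes: usize, separator: char) -> Self {
        assert!(max_bytes > 0, "max_bytes must be non-zero");
        Self { rest: text, max_bytes, separator }
    }
}

impl<'a> Iterator for ChunkIter<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<&'a str> {
        if self.rest.is_empty() {
            return None;
        }
        // The remainder fits in one chunk: emit it and finish.
        if self.rest.len() <= self.max_bytes {
            let chunk = self.rest;
            self.rest = "";
            return Some(chunk);
        }
        // Find the largest UTF-8 char boundary that is <= max_bytes.
        let mut limit = self.max_bytes;
        while !self.rest.is_char_boundary(limit) {
            limit -= 1;
        }
        // If the first char alone is wider than max_bytes, emit it anyway
        // so the iterator always makes progress.
        if limit == 0 {
            limit = self.rest.chars().next().map(char::len_utf8).unwrap_or(0);
        }
        // Prefer to cut just after the last separator inside the window;
        // fall back to a hard cut at the window limit.
        let cut = match self.rest[..limit].rfind(self.separator) {
            Some(i) => i + self.separator.len_utf8(),
            None => limit,
        };
        let (chunk, rest) = self.rest.split_at(cut);
        self.rest = rest;
        Some(chunk)
    }
}
```

Because each chunk is a borrowed slice produced on demand, a caller can start consuming with `for chunk in ChunkIter::new(text, 512, '\n')` before the rest of the input has been examined. Feeding it from an incremental extractor would need an owned-buffer variant, which this sketch does not attempt.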

Notes

These are all approved and on the roadmap. Pick up after v2.0.0 ships to main.

Metadata

Assignees

No one assigned

    Labels

    enhancement (New feature or request), tracking (Parent tracking issue with sub-tasks), v2.0 (TextSpitter v2.0 Rust backend)
