[v2.0] Phase 5: Testing — parity tests and benchmarks #27

@fsecada01

Description

Ensure the Rust and Python backends produce identical output for every tested input, including edge cases. Quantify the performance gain of the Rust backend over the Python baseline.

Wiki: Phase 5 detail
Branch: feature/rust-backend

Tasks

  • 5.1 tests/test_parity.py — happy-path parity: split_text and split_texts produce identical output from both backends
  • 5.2 Edge case parity tests: empty string, no separator present, separator-only input, single chunk exceeding chunk_size
  • 5.3 tests/bench_splitting.py — benchmark script: Python baseline vs Rust batch at 10,000 docs; prints speedup factor
  • 5.4 All parity tests pass in CI (both use_rust=True and use_rust=False paths exercised)
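A minimal sketch of how the parity checks in 5.1 and 5.2 could be structured. Everything here is hypothetical: `split_stand_in` is a toy splitter, not the TextSpitter implementation, and the "backends" are plain callables standing in for the `use_rust=True` and `use_rust=False` paths. The point is only the shape of the comparison: run the same named inputs through both callables and report any case where the outputs differ.

```python
from typing import Callable, Dict, List

Splitter = Callable[[str], List[str]]


def split_stand_in(text: str, sep: str = "\n", chunk_size: int = 20) -> List[str]:
    """Toy splitter (an assumption, not the library's algorithm).

    Greedily packs separator-delimited pieces into chunks of at most
    chunk_size characters. A single piece longer than chunk_size is
    emitted as its own oversized chunk.
    """
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        if not piece:
            continue  # skip empty pieces produced by repeated separators
        candidate = piece if not current else current + sep + piece
        if not current or len(candidate) <= chunk_size:
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


# Named inputs mirroring the edge cases listed in task 5.2, plus one happy path.
CASES: Dict[str, str] = {
    "empty string": "",
    "no separator present": "one line without any newline",
    "separator-only input": "\n\n\n",
    "single oversized chunk": "x" * 50,
    "happy path": "alpha\nbeta\ngamma\ndelta",
}


def find_mismatches(backend_a: Splitter, backend_b: Splitter,
                    cases: Dict[str, str]) -> List[str]:
    """Return the names of cases where the two backends disagree."""
    return [name for name, text in cases.items()
            if backend_a(text) != backend_b(text)]


def buggy_backend(text: str) -> List[str]:
    """Deliberately divergent backend, used to show the check can fail."""
    return [chunk.upper() for chunk in split_stand_in(text)]


if __name__ == "__main__":
    print(find_mismatches(split_stand_in, split_stand_in, CASES))
    print(find_mismatches(split_stand_in, buggy_backend, CASES))
```

In a real test module the two callables would wrap the actual backends, and each named case would likely become one parametrized test so that a CI failure points at the specific input.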

Expected benchmark result

| Backend | Time  | Speedup |
|---------|-------|---------|
| Python  | ~45s  | 1x      |
| Rust    | ~1.2s | ~37x    |
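The speedup column is the baseline time divided by the candidate time, so the expected figures give 45 / 1.2 = 37.5, consistent with the "~37x" estimate. A rough sketch of how a benchmark script along the lines of 5.3 might time the two paths and print that factor follows. The two splitter functions are placeholders (assumptions for illustration only) and do not call the real backends, so the number this sketch prints says nothing about actual Rust performance.

```python
import time
from typing import Callable, List, Sequence


def time_call(fn: Callable[[Sequence[str]], object], docs: Sequence[str]) -> float:
    """Return wall-clock seconds taken by one call of fn(docs)."""
    start = time.perf_counter()
    fn(docs)
    return time.perf_counter() - start


def speedup(baseline_s: float, candidate_s: float) -> float:
    """Speedup factor of the candidate relative to the baseline."""
    if candidate_s <= 0:
        raise ValueError("candidate time must be positive")
    return baseline_s / candidate_s


def split_per_document(docs: Sequence[str]) -> List[List[str]]:
    """Placeholder for the Python baseline: one call per document."""
    result = []
    for doc in docs:
        result.append(doc.split("\n"))
    return result


def split_batch(docs: Sequence[str]) -> List[List[str]]:
    """Placeholder for the batch path; a real script would call the Rust backend."""
    return [doc.split("\n") for doc in docs]


if __name__ == "__main__":
    # 10,000 documents, matching the corpus size named in task 5.3.
    docs = ["line one\nline two\nline three"] * 10_000
    baseline_s = time_call(split_per_document, docs)
    candidate_s = time_call(split_batch, docs)
    print(f"baseline:  {baseline_s:.4f}s")
    print(f"candidate: {candidate_s:.4f}s")
    print(f"speedup:   {speedup(baseline_s, candidate_s):.1f}x")
```

A single timed run is noisy; repeating each measurement and reporting the minimum or median would give a more stable speedup figure.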

Metadata

Assignees

No one assigned

    Labels

    testing (Test coverage and quality), tracking (Parent tracking issue with sub-tasks), v2.0 (TextSpitter v2.0 Rust backend)

