Commit 68facfd
fix(DTPL-364): Fix pdf_split_hook issues (#340)
<!-- CURSOR_SUMMARY -->
> [!NOTE]
> **Medium Risk**
> Changes core request/hook dispatch and split-PDF async
execution/cancellation paths, which could affect concurrency behavior
and cleanup semantics under load. Added tests mitigate risk, but
regressions could surface in real-world event-loop/thread-local hook
usage.
>
> **Overview**
> **Fixes `partition_async()` split-PDF execution to be truly async.**
Split chunk collection now runs via awaited async hook dispatch rather
than spinning up a nested event loop in a worker thread, with response
reassembly offloaded so it doesn’t block the event loop.
>
> **Adds robust cancellation and resource cleanup.** Introduces async
hook dispatch in `SDKHooks` (sync hooks run in `to_thread` with per-hook
serialization), adds cancellation-aware cleanup in
`BaseSDK.do_request_async`, and hardens `SplitPdfHook` with process-wide
PDF-split setup locking, lazy sync executor creation, chunk-task
cancellation/draining, and cleanup of tempdirs/unconsumed chunk files.
Extensive new unit tests cover ordering, concurrency limits,
cancellation, strict-failure behavior, and non-blocking guarantees.
>
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
da0352c. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
1 parent 5171475 commit 68facfd
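
The dispatch and cleanup pattern described in the overview can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `HookDispatcher`, `cancel_and_drain`, and `slow_hook` are hypothetical names, and the real `SDKHooks` and `SplitPdfHook` implementations may differ in structure and detail. It assumes Python 3.9+ for `asyncio.to_thread`.

```python
import asyncio
import time
from typing import Any, Callable, Dict, Iterable


class HookDispatcher:
    """Hypothetical dispatcher; illustrates the idea, not the SDK's SDKHooks."""

    def __init__(self) -> None:
        # One asyncio.Lock per hook object: a given sync hook never runs
        # concurrently with itself, while different hooks may still overlap.
        self._locks: Dict[int, asyncio.Lock] = {}

    def _lock_for(self, hook: Callable[[Any], Any]) -> asyncio.Lock:
        return self._locks.setdefault(id(hook), asyncio.Lock())

    async def run(self, hook: Callable[[Any], Any], payload: Any) -> Any:
        async with self._lock_for(hook):
            # The blocking hook runs in a worker thread, so the event loop
            # stays free to service other tasks while it executes.
            return await asyncio.to_thread(hook, payload)


async def cancel_and_drain(tasks: Iterable["asyncio.Task[Any]"]) -> None:
    """Cancel every task, then wait until each one has actually finished."""
    tasks = list(tasks)
    for task in tasks:
        task.cancel()
    # return_exceptions=True collects each CancelledError instead of raising,
    # so the caller can continue with cleanup (temp files, directories).
    await asyncio.gather(*tasks, return_exceptions=True)


def slow_hook(payload: str) -> str:
    """Stand-in for a blocking sync hook."""
    time.sleep(0.05)
    return payload.upper()


async def demo() -> list:
    dispatcher = HookDispatcher()
    # Both calls target the same hook, so they run one after the other.
    return await asyncio.gather(
        dispatcher.run(slow_hook, "a"),
        dispatcher.run(slow_hook, "b"),
    )


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

One caveat of this approach: cancelling a task that is awaiting `asyncio.to_thread` does not interrupt the worker thread itself, so any cleanup that depends on the hook having stopped has to account for the thread finishing in the background.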
8 files changed
Lines changed: 1927 additions & 174 deletions
File tree
- _test_unstructured_client/unit
- src/unstructured_client
  - _hooks
    - custom