| 1 | +# Output Restoration: Compatibility with Real Build Tools |
| 2 | + |
| 3 | +## Background |
| 4 | + |
| 5 | +Output restoration automatically archives files produced by a cached task and restores them on cache hit. |
| 6 | +When a task runs and creates `dist/`, those files are saved as a `tar.zst` archive in the cache directory. |
| 7 | +On subsequent cache hits the archive is extracted, so the task does not need to run again. |
| 8 | + |
| 9 | +The feature works end-to-end when tested with simple write-only commands (`vtt write-file dist/out.txt built`), |
| 10 | +but **fails with real build tools** (Vite 8, tsdown) in the default auto-detection mode. |
| 11 | + |
| 12 | +## The Problem |
| 13 | + |
| 14 | +Deleting the output directory (`dist/`) between runs causes a **cache miss** instead of triggering output restoration. |
| 15 | + |
| 16 | +``` |
| 17 | +~/packages/vite-app$ vite build |
| 18 | +... |
| 19 | +✓ built in 34ms |
| 20 | + |
| 21 | +# delete dist, run again: |
| 22 | + |
| 23 | +~/packages/vite-app$ vite build ○ cache miss: 'assets' removed from 'packages/vite-app/dist', executing |
| 24 | +``` |
| 25 | + |
| 26 | +The archived output files exist in the cache and could be restored, but cache validation rejects the entry before restoration has a chance to run. |
| 27 | + |
| 28 | +## Root Cause |
| 29 | + |
| 30 | +Build tools **read from their output directory** during execution. fspy captures these reads and records them as inferred inputs. When the output directory is later deleted, the inferred input fingerprint no longer matches, producing a cache miss. |
| 31 | + |
| 32 | +### How the cache validation pipeline works |
| 33 | + |
| 34 | +1. **Cache entry lookup** — match by `CacheEntryKey` (spawn fingerprint + input config + output config) |
| 35 | +2. **Globbed input validation** — compare stored file hashes against current state for explicit input globs |
| 36 | +3. **Post-run fingerprint validation** — compare stored fspy-inferred input fingerprints against current filesystem state |
| 37 | +4. **If all pass** → cache hit → replay terminal output → **extract output archive** |
| 38 | + |
| 39 | +The failure occurs at step 3. The stored fingerprint records `packages/app/dist` as `Folder(Some({assets: Dir, index.html: File}))`. After deletion the current fingerprint is `NotFound`. The mismatch is reported before output restoration at step 4 can execute. |
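
The step 3 failure can be reproduced in isolation. The following is a minimal sketch, not the project's implementation: `PathFingerprint`, `EntryKind`, `Validation`, `dist_after_build`, and `validate_inferred_inputs` are hypothetical names that only mirror the `Folder(...)` / `NotFound` values quoted above.

```rust
use std::collections::BTreeMap;

/// Kind of a directory entry, mirroring the `Dir` / `File` labels above.
#[derive(Debug, Clone, PartialEq, Eq)]
enum EntryKind {
    Dir,
    File,
}

/// Hypothetical stand-in for a stored path fingerprint.
#[derive(Debug, Clone, PartialEq, Eq)]
enum PathFingerprint {
    NotFound,
    Folder(Option<BTreeMap<String, EntryKind>>),
}

/// Result of step 3. `Miss` carries the first path whose fingerprint changed.
#[derive(Debug, PartialEq, Eq)]
enum Validation {
    Hit,
    Miss(String),
}

/// Fingerprint of `dist` as recorded right after a successful build.
fn dist_after_build() -> PathFingerprint {
    let entries = BTreeMap::from([
        ("assets".to_string(), EntryKind::Dir),
        ("index.html".to_string(), EntryKind::File),
    ]);
    PathFingerprint::Folder(Some(entries))
}

/// Step 3: compare each stored inferred-input fingerprint with the current
/// filesystem state. Step 4 (archive extraction) is only reached on `Hit`.
fn validate_inferred_inputs(
    stored: &BTreeMap<String, PathFingerprint>,
    current: impl Fn(&str) -> PathFingerprint,
) -> Validation {
    for (path, old) in stored {
        if *old != current(path.as_str()) {
            return Validation::Miss(path.clone());
        }
    }
    Validation::Hit
}
```

Storing `dist_after_build()` under `packages/app/dist` and validating against a lookup that returns `NotFound` yields `Miss("packages/app/dist")`, so under this model the extraction step is never reached.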
| 40 | + |
| 41 | +### Why build tools read from the output directory |
| 42 | + |
| 43 | +Both Vite 8 and tsdown follow the same pattern: **clean the output directory before writing new files**, then **report compressed sizes after writing**. |
| 44 | + |
| 45 | +#### Vite 8 (`vite build`) |
| 46 | + |
| 47 | +1. **Directory cleanup** (`emptyDir()`, called from `prepareOutDirPlugin` during `renderStart`) |
| 48 | + - `fs.readdirSync(outDir)` enumerates entries so they can be deleted before the new build |
| 49 | + - Controlled by `emptyOutDir` (default: `true`) |
| 50 | + - This is the primary read that causes `packages/app/dist` to appear in fspy's `path_reads` |
| 51 | + |
| 52 | +2. **Compressed size reporting** (rolldown's builtin `viteReporterPlugin`, Rust-native) |
| 53 | + - After writing output files, the reporter reads each file back to compute gzip/brotli sizes |
| 54 | + - Produces the `dist/index.html 0.15 kB │ gzip: 0.14 kB` lines |
| 55 | + - Controlled by `reportCompressedSize` (default: `true`) |
| 56 | + - This causes reads on individual output files like `dist/index.html`, `dist/assets/index-xxx.js` |
| 57 | + |
| 58 | +3. **Public directory copy** (`copyDir()`) |
| 59 | + - If `copyPublicDir` is enabled (default: `true`), reads the public dir to copy into outDir |
| 60 | + |
| 61 | +#### tsdown |
| 62 | + |
| 63 | +1. **Directory cleanup** (`cleanOutDir()` via `tinyglobby`'s `glob()`) |
| 64 | + - Enumerates all files in the output directory with `onlyFiles: false` before deleting them |
| 65 | + - Default `clean: true` triggers this |
| 66 | + - This is the primary read that causes `packages/lib/dist` to appear in fspy's `path_reads` |
| 67 | + |
| 68 | +2. **Shebang permission check** (`ShebangPlugin`'s `writeBundle` hook) |
| 69 | + - Calls `access()` on output files to check existence before setting execute permissions |
| 70 | + - Only applies to entry chunks with shebang directives |
| 71 | + |
| 72 | +### Why the read-write overlap check doesn't catch this |
| 73 | + |
| 74 | +The overlap check in `execute_spawn` (mod.rs:486-488) looks for exact path matches between `path_reads` and `path_writes`: |
| 75 | + |
| 76 | +```rust |
| 77 | +pa.path_reads.keys().find(|p| pa.path_writes.contains(*p)) |
| 78 | +``` |
| 79 | + |
| 80 | +fspy reports these as separate paths: |
| 81 | + |
| 82 | +- **Read**: `packages/app/dist` (the directory itself, via `readdirSync`) |
| 83 | +- **Write**: `packages/app/dist/index.html`, `packages/app/dist/assets/index-xxx.js` (individual files) |
| 84 | + |
| 85 | +Since `packages/app/dist` ≠ `packages/app/dist/index.html`, no overlap is detected, and caching proceeds. |
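
The difference between exact and ancestor-aware matching can be shown with the paths from this example. This is an illustrative sketch only: it models `path_reads` and `path_writes` as plain sets of paths (an assumption about their shape), and `ancestor_overlap` is a hypothetical contrast, not an existing function or a proposed fix.

```rust
use std::collections::BTreeSet;
use std::path::PathBuf;

/// Build a path set from string literals (helper for this sketch only).
fn paths(items: &[&str]) -> BTreeSet<PathBuf> {
    items.iter().map(|s| PathBuf::from(*s)).collect()
}

/// Exact-match overlap, the same shape as the check quoted above:
/// a read only counts if the identical path was also written.
fn exact_overlap<'a>(
    reads: &'a BTreeSet<PathBuf>,
    writes: &BTreeSet<PathBuf>,
) -> Option<&'a PathBuf> {
    reads.iter().find(|p| writes.contains(*p))
}

/// Hypothetical contrast: a read also counts if any written path lies at or
/// below it. `Path::starts_with` compares whole components, so `dist` does
/// not match a sibling such as `dist2`.
fn ancestor_overlap<'a>(
    reads: &'a BTreeSet<PathBuf>,
    writes: &BTreeSet<PathBuf>,
) -> Option<&'a PathBuf> {
    reads
        .iter()
        .find(|r| writes.iter().any(|w| w.starts_with(r.as_path())))
}
```

With `packages/app/dist` as the only read and the two output files as writes, `exact_overlap` returns `None`, while the ancestor-aware variant returns the `dist` directory.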
| 86 | + |
| 87 | +### Why `should_ignore_entry` doesn't help |
| 88 | + |
| 89 | +`fingerprint.rs:185-187` filters `dist` when listed as a directory entry of a **parent**: |
| 90 | + |
| 91 | +```rust |
| 92 | +fn should_ignore_entry(name: &[u8]) -> bool { |
| 93 | + matches!(name, b"." | b".." | b".DS_Store") || name.eq_ignore_ascii_case(b"dist") |
| 94 | +} |
| 95 | +``` |
| 96 | + |
| 97 | +This prevents the fingerprint of `packages/app/` from changing when `dist` appears or disappears inside it. But it does not help when `packages/app/dist` itself is a direct key in `inferred_inputs` — that path is fingerprinted independently, and its transition from `Folder(...)` to `NotFound` is a mismatch. |
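
A small sketch makes the two cases concrete. `should_ignore_entry` is copied from the snippet above. `folder_fingerprint` and `direct_fingerprint` are hypothetical simplifications that reduce a folder fingerprint to its set of surviving entry names, with `None` standing in for `NotFound`.

```rust
use std::collections::BTreeSet;

/// Copied from the snippet above.
fn should_ignore_entry(name: &[u8]) -> bool {
    matches!(name, b"." | b".." | b".DS_Store") || name.eq_ignore_ascii_case(b"dist")
}

/// Hypothetical folder fingerprint: the entry names that survive the filter.
fn folder_fingerprint(entries: &[&str]) -> BTreeSet<String> {
    entries
        .iter()
        .copied()
        .filter(|name| !should_ignore_entry(name.as_bytes()))
        .map(str::to_owned)
        .collect()
}

/// Fingerprint of a path that is itself a key in `inferred_inputs`.
/// `None` models `NotFound`; `Some` models `Folder(...)`.
fn direct_fingerprint(entries: Option<&[&str]>) -> Option<BTreeSet<String>> {
    entries.map(folder_fingerprint)
}
```

In this model the fingerprint of `packages/app/` is the same whether or not `dist` is listed, but `direct_fingerprint` for `dist` itself changes from `Some(...)` to `None` once the directory is deleted, and the filter plays no part in that comparison.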
| 98 | + |
| 99 | +### Why the existing e2e test passes |
| 100 | + |
| 101 | +The `output-cache-test` fixture uses `vtt write-file dist/out.txt built`, a simple write-only operation. `vtt write-file` never reads from `dist/`, so fspy only records it as a write. The directory never appears in `path_reads`, the post-run fingerprint doesn't include it, and cache validation succeeds after deletion. |
| 102 | + |
| 103 | +The fixture therefore does not exercise the clean-then-write pattern of real build tools. |
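
The contrast between the two access patterns can be sketched as follows. The `Access` type and both trace functions are hypothetical; they only model which paths a task reads and writes, not fspy's actual record format.

```rust
use std::collections::BTreeSet;

/// One filesystem access observed while a task runs (simplified model).
#[derive(Debug, Clone, Copy)]
enum Access<'a> {
    Read(&'a str),
    Write(&'a str),
}

/// Paths that become inferred inputs: everything the task read.
fn inferred_inputs<'a>(trace: &[Access<'a>]) -> BTreeSet<&'a str> {
    trace
        .iter()
        .filter_map(|access| match access {
            Access::Read(path) => Some(*path),
            Access::Write(_) => None,
        })
        .collect()
}

/// `vtt write-file dist/out.txt built`: a single write, no reads.
fn write_only_trace() -> Vec<Access<'static>> {
    vec![Access::Write("dist/out.txt")]
}

/// A build tool that lists `dist` in order to clean it, then writes output.
fn clean_then_write_trace() -> Vec<Access<'static>> {
    vec![Access::Read("dist"), Access::Write("dist/index.html")]
}
```

For the write-only trace the inferred input set is empty, so nothing about `dist` is validated later. For the clean-then-write trace it contains `dist`, which is the entry that fails validation after the directory is deleted.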
| 104 | + |
| 105 | +## Behavior Matrix |
| 106 | + |
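| 107 | +All tests were performed with Vite 8.0.8 and tsdown 0.12.9. A "yes" in the "Cache hit after deleting dist" column means cache validation still passes once `dist/` has been deleted, which is the precondition for output restoration. |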
| 108 | + |
| 109 | +| `input` | `output` | fspy enabled | Cache hit after deleting dist | |
| 110 | +| -------------- | --------------- | ------------ | ----------------------------- | |
| 111 | +| auto (default) | auto (default) | yes | **no** | |
| 112 | +| auto (default) | `["dist/**"]` | yes | **no** | |
| 113 | +| `["src/**"]` | auto (default) | yes | **no** | |
| 114 | +| `["src/**"]` | `["dist/**"]` | no | **yes** | |
| 115 | +| `["src/**"]` | `[]` (disabled) | no | yes (but no restoration) | |
| 116 | + |
| 117 | +The only configuration that works requires **both** explicit input globs **and** explicit output globs, which disables fspy entirely. Any configuration that enables fspy (for either auto-input or auto-output detection) causes the output directory reads to pollute the inferred input set. |
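
The matrix can be condensed into a few predicates. The sketch below encodes only the observed results in the table, not the tool's actual decision logic; `Setting` and the three function names are hypothetical.

```rust
/// An `input` or `output` setting: auto-detected, or an explicit glob list
/// (an empty explicit list models `[]`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Setting<'a> {
    Auto,
    Explicit(&'a [&'a str]),
}

/// Per the matrix, fspy runs whenever either side relies on auto-detection.
fn fspy_enabled(input: Setting<'_>, output: Setting<'_>) -> bool {
    matches!(input, Setting::Auto) || matches!(output, Setting::Auto)
}

/// Observed outcome: validation survives deleting `dist` only without fspy.
fn cache_hit_after_deleting_dist(input: Setting<'_>, output: Setting<'_>) -> bool {
    !fspy_enabled(input, output)
}

/// Restoration additionally needs a non-empty explicit output glob list.
fn restores_outputs(input: Setting<'_>, output: Setting<'_>) -> bool {
    cache_hit_after_deleting_dist(input, output)
        && matches!(output, Setting::Explicit(globs) if !globs.is_empty())
}
```

Evaluating the five rows with these predicates reproduces the last two columns: only `Explicit(["src/**"])` combined with `Explicit(["dist/**"])` both hits and restores, and the `[]` row hits without restoring anything.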