Skip to content

Glob Match Rewrite#16824

Merged
Jarred-Sumner merged 26 commits intooven-sh:mainfrom
probably-neb:probably-neb/fast-glob-port
Feb 14, 2025
Merged

Glob Match Rewrite#16824
Jarred-Sumner merged 26 commits intooven-sh:mainfrom
probably-neb:probably-neb/fast-glob-port

Conversation

@probably-neb
Copy link
Copy Markdown
Contributor

@probably-neb probably-neb commented Jan 28, 2025

What does this PR do?

Rewrites the glob match implementation (both ascii and non-ascii) by porting the fast-glob crate

This fixes #14934 as well as a few other issues I found along the way.
Additionally, based on my local testing it results in a large perf win.

The perf win I am referring to was measured by running each test in test/js/bun/glob/match.test.ts 10,000 times each and comparing the total runtime for the current and my new implementation. On my machine I get an average of a little over 200x faster than the current implementation. I'd be happy to upload the code for this comparison to github and share if it is of interest.

I didn't do very much more in depth perf testing because my main focus with the PR was to fix the aforementioned issues with the current implementation.

The high-level overview of the change is instead of keeping a track of a BraceStack to deal with braces {foo,bar}, it recursively creates new globs for each branch (i.e. 1{foo,bar}2 -> [1foo2,1bar2]).
This does mean, however, that matchImpl must now take in an allocator parameter in order to create the sub globs. For better performance the implementation basically uses a std.stackFallback to stack allocate sub globs less than a predetermined length defined by the GLOB_STACK_BUF_SIZE variable in both src/glob/ascii.zig and src/glob/GlobWalker.zig (currently 64 in both places).

You will notice in the PR that I updated all callsites of matchImpl to pass in an instance of Allocator. I would appreciate if someone more familiar with the codebase than I would double check each place that I did so to make sure I'm using the allocator that makes the most sense in each context. Please note that in places where an allocator was not available in the context I used bun.default_allocator rather than change more function signatures, but I'm happy to change this if desired.

  • Code changes

How did you verify your code works?

I wrote automated tests

  • I checked the lifetime of memory allocated to verify it's (1) freed and (2) only freed when it should be
  • I included a test for the new code, or an existing test covers it
  • I wrote TypeScript/JavaScript tests and they pass locally (bun-debug test test-file-name.test)

@probably-neb
Copy link
Copy Markdown
Contributor Author

Here is the hyperfine output for my perf test.
original: current glob match implementation
rewrite-v0: rewritten implementation before perf improvements commit
rewrite-v3: rewritten implementation after perf improvements commit

❱ hyperfine -N ./original ./rewrite-v3 ./rewrite-v0
Benchmark 1: ./original
  Time (mean ± σ):     117.691 s ±  0.910 s    [User: 38.036 s, System: 79.520 s]
  Range (min … max):   116.630 s … 119.453 s    10 runs

Benchmark 2: ./rewrite-v3
  Time (mean ± σ):     566.2 ms ±  32.4 ms    [User: 565.0 ms, System: 0.0 ms]
  Range (min … max):   540.2 ms … 619.7 ms    10 runs

Benchmark 3: ./rewrite-v0
  Time (mean ± σ):      2.753 s ±  0.087 s    [User: 0.808 s, System: 1.783 s]
  Range (min … max):    2.680 s …  2.916 s    10 runs

Summary
  ./rewrite-v3 ran
    4.86 ± 0.32 times faster than ./rewrite-v0
  207.86 ± 11.99 times faster than ./original

@RiskyMH
Copy link
Copy Markdown
Collaborator

RiskyMH commented Jan 28, 2025

I'm not sure if the tests are flaky, but they do seem related: https://buildkite.com/bun/bun/builds/10675

@zackradisic
Copy link
Copy Markdown
Contributor

Thanks a lot @probably-neb, porting the fast-glob crate is something that I've been meaning to do for a while but never had the time to do so!

I'll take a closer look at this PR when I get the chance, but thanks once again!

@probably-neb
Copy link
Copy Markdown
Contributor Author

@RiskyMH they do seem related. I'll look into it

@probably-neb
Copy link
Copy Markdown
Contributor Author

probably-neb commented Jan 28, 2025

Ok something is very wrong with the Apple x64 output. Won't show it in buildkite and when I download it it's full of null bytes. Not sure what to do about that one.
Seems like the glob ./** is failing to match any paths on the windows tests that are failing. Unfortunately my Windows install decided it is unable to connect to the internet a few weeks ago so I can't test it on Windows ATM.
I'd be happy to join a call on Discord with someone who does and help them debug it tomorrow, otherwise I will just try to get Windows working again this weekend and attempt to debug it myself

Comment thread src/glob/ascii.zig Outdated
Comment on lines +325 to +327
var buf: [GLOB_STACK_BUF_SIZE]u8 = undefined;
var fixed_buf_alloc = std.heap.FixedBufferAllocator.init(&buf);
const buf_alloc = fixed_buf_alloc.allocator();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These 3 lines should probably be above and happen unconditionally. Haven't verified but it looks like it will cause undefined behavior as the code is exiting the scope in which they were defined

@zackradisic
Copy link
Copy Markdown
Contributor

Hey @probably-neb thanks once again for the PR. Bun's Glob is in need of some refactoring so I'm gonna take it from here and do some changes like removing the ascii and unicode matchers as only the ascii matcher is necessary, as well as some other changes.

There are two options:

  1. you could give me write access to your fork so I can make the changes their and we can use this same PR
  2. I push these changes into a branch in the Bun repo and make a PR and reference this one

Which one would you like to proceed with?

@probably-neb
Copy link
Copy Markdown
Contributor Author

Thanks a bunch, I've really been meaning to get back around to this, but I've been swamped with work and unable to get Windows working again.

I'm fine with either approach. I'd love if my name was attached in the git log somewhere but I don't feel that strongly about it.

I invited you as a collaborator to my fork. Feel free to proceed with whichever approach is easier for you

@probably-neb
Copy link
Copy Markdown
Contributor Author

Also feel free to ask questions if you have any. Happy to continue helping to the extent I can

Comment thread src/glob/GlobWalker.zig
var width: u32 = 0;
while (i < pattern.len) : (i += 1) {
const c = pattern[i];
width = bun.strings.utf8ByteSequenceLength(c);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should use the CodepointIterator because this code will not handle invalid surrogate pairs correctly on Windows

import { isWindows } from "harness";

describe("Glob.match", () => {
test("WTF", () => {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you delete this test

Comment thread src/cli/outdated_command.zig Outdated
else => {},
}
pub fn deinit(_: @This(), _: std.mem.Allocator) void {
// switch (this) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you delete this function so that the next person working on this code doesn't think it's a memory leak?

Comment thread src/glob/GlobWalker.zig
else => {},
}
}
i -|= 1;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems likely to be incorrect?

Comment thread bench/glob/rewrite.mjs Outdated
@@ -0,0 +1,27 @@
import { Glob } from "bun";
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you give this a better filename?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Recursive globs fail too early on partial path match

4 participants