yeast: Align query semantics more closely with tree-sitter #21810
Open
Pull request overview
This PR updates the shared/yeast query language and matcher to behave closer to tree-sitter query semantics, particularly around unnamed tokens and positional child matching.
Changes:
- Extend query syntax to support bare `_` (match any node, including unnamed) and bare string literals (shorthand for `("...")`), including in field positions.
- Update positional matching to support forward-scan behavior (skipping over non-matching children rather than requiring exact positional alignment).
- Refine schema kind import to use canonical tree-sitter IDs for unnamed kinds, aligning schema resolution with how the AST visitor assigns kind IDs.
Show a summary per file
| File | Description |
|---|---|
| shared/yeast/tests/test.rs | Adds regression tests for capturing unnamed tokens, bare _, field-position sugar, and forward-scan behavior. |
| shared/yeast/src/schema.rs | Imports canonical IDs for unnamed kinds and tracks kind names more consistently. |
| shared/yeast/src/query.rs | Adds match_unnamed to wildcard nodes and implements forward-scan positional matching. |
| shared/yeast/doc/yeast.md | Updates user-facing query-language documentation to describe the new wildcard/token behavior. |
| shared/yeast-macros/src/parse.rs | Extends proc-macro parsing to accept bare _ and bare literals; allows intermixing fields and positional patterns. |
| shared/yeast-macros/src/lib.rs | Updates macro-level syntax documentation to reflect the new query forms and ordering rules. |
Copilot's findings
- Files reviewed: 6/6 changed files
- Comments generated: 3
Three improvements to the query parser, all aimed at allowing query
patterns to refer to unnamed tokens:
1. Bare-literal capture: `"=" @op` now captures the unnamed `=` token,
matching the parenthesized form `("=") @op`. Previously the literal
branch in parse_query_list skipped the maybe_wrap_capture call, so
the `@op` was a leftover token and would error.
2. Bare `_` matches any node, named or unnamed. Previously bare `_` and
`(_)` both produced QueryNode::Any with the same matches_named_only
behaviour, so bare `_` would skip unnamed children. Now Any carries a
match_unnamed flag: false for `(_)` (named-only, tree-sitter default)
and true for bare `_` (any node).
3. Named fields and bare child patterns may be intermixed in any order.
Previously, once parse_query_fields saw a bare pattern it would stop
accepting named fields. The fix accumulates bare patterns into the
implicit `child` field and keeps parsing.
Each named field independently selects its target field for matching, so
the source-order of fields in the query is purely cosmetic and intermixing
is safe.
Add tests covering parenthesized capture, bare-literal capture, and the
named-vs-any distinction between `(_)` and bare `_`. Update query-syntax
docs to reflect all three.
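
The distinction between `(_)` and bare `_` can be sketched as follows. This is a minimal illustration, not yeast's actual code: the `Node` and `Any` types and the `first_match` helper are hypothetical stand-ins, and only the `match_unnamed` flag is taken from the description above.

```rust
/// Simplified stand-in for an AST node (hypothetical, not yeast's node type).
#[derive(Debug, Clone, PartialEq)]
pub struct Node {
    pub kind: &'static str,
    pub named: bool,
}

/// Simplified stand-in for the wildcard query node described above.
#[derive(Debug, Clone, Copy)]
pub struct Any {
    /// false for `(_)` (named nodes only), true for bare `_` (any node).
    pub match_unnamed: bool,
}

impl Any {
    /// A named node always matches; an unnamed one only if the flag is set.
    pub fn matches(&self, node: &Node) -> bool {
        node.named || self.match_unnamed
    }
}

/// Hypothetical helper: return the first child the wildcard accepts.
pub fn first_match<'a>(wildcard: Any, children: &'a [Node]) -> Option<&'a Node> {
    children.iter().find(|child| wildcard.matches(child))
}
```

Given the children `=` (unnamed) followed by `identifier` (named), the `(_)` form skips the `=` token and selects the identifier, while the bare `_` form selects the `=` token.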
Schema::from_language registered unnamed kinds via or_insert(id), where
`id` came from iterating 0..node_kind_count. For names with multiple
unnamed IDs (notably "end" in tree-sitter-ruby has IDs 0 and 13, where
ID 0 is the reserved error token), this picked the first encountered
ID — typically the wrong one.
The visitor sets node.kind via language.id_for_node_kind(name, false),
which returns the canonical ID. So a query for ("end") would compare
node.kind=13 against schema=0 and silently fail to match, with no
diagnostic.
Use language.id_for_node_kind(name, false) to obtain the canonical ID
when registering, mirroring the named-kind path that already does the
same with id_for_node_kind(name, true).
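
The difference between the two registration strategies can be sketched without a real grammar. Everything here is an assumption made for illustration: the `Kind` table, its `canonical` flag, and the local `id_for_node_kind` function only model the behaviour of the language's name lookup, and the schema is reduced to a plain map.

```rust
use std::collections::HashMap;

/// One row of a hypothetical grammar kind table.
pub struct Kind {
    pub id: u16,
    pub name: &'static str,
    pub named: bool,
    /// Modeling assumption: marks the ID that the name lookup returns.
    pub canonical: bool,
}

/// Stand-in for the language's name-to-ID lookup.
pub fn id_for_node_kind(kinds: &[Kind], name: &str, named: bool) -> Option<u16> {
    kinds
        .iter()
        .find(|k| k.canonical && k.named == named && k.name == name)
        .map(|k| k.id)
}

/// Old behaviour: the first ID seen for an unnamed name wins.
pub fn register_first_seen(kinds: &[Kind]) -> HashMap<&'static str, u16> {
    let mut schema = HashMap::new();
    for k in kinds.iter().filter(|k| !k.named) {
        schema.entry(k.name).or_insert(k.id);
    }
    schema
}

/// New behaviour: always store the ID the lookup reports for the name.
pub fn register_canonical(kinds: &[Kind]) -> HashMap<&'static str, u16> {
    let mut schema = HashMap::new();
    for k in kinds.iter().filter(|k| !k.named) {
        if let Some(id) = id_for_node_kind(kinds, k.name, false) {
            schema.insert(k.name, id);
        }
    }
    schema
}
```

With a table in which "end" appears at IDs 0 and 13 and only 13 is canonical, the first strategy stores 0 and the second stores 13, which is the value the visitor would assign to `node.kind`.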
Previously, a bare child pattern in a query took whatever the next
child of the iterator was and either matched or failed: it would not
scan ahead to find a match. So `(foo ("baz"))` against a `foo` whose
implicit `child` field was `["bar", "baz"]` would fail (the pattern
took "bar" first).
Switch to forward-scan semantics: a SingleNode matcher advances through
the iterator until it finds a child that matches its sub-query. Patterns
that are named-only continue to skip past unnamed children for free.
Order is preserved across multiple bare patterns at the same level —
each pattern advances the shared iterator past whatever it consumed —
so a query cannot match children out of source order.
Captures from a failed match attempt are rolled back via a snapshot, so
partial captures from a complex sub-query do not leak across attempts.
Add two regression tests against the `do` body wrapper in a Ruby
for-loop, whose implicit `child` field contains [do, identifier, end]:
- a query for ("end") matches by skipping past `do` and the identifier
- a query for ("end") then ("do") fails, demonstrating order preservation
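
Forward-scan matching with capture rollback can be sketched as below. This is a simplified model under stated assumptions, not the matcher in `query.rs`: `Child`, `Pattern`, `Captures`, `try_match`, and `match_children` are hypothetical names, patterns match on kind only, and a capture is recorded speculatively to imitate a sub-query that captures before it fails.

```rust
/// Simplified child node: a kind name plus whether it is named.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Child {
    pub kind: &'static str,
    pub named: bool,
}

/// A positional pattern: match one kind, optionally recording a capture.
pub struct Pattern {
    pub kind: &'static str,
    pub capture: Option<&'static str>,
}

pub type Captures = Vec<(&'static str, Child)>;

/// Try one pattern against one child. The capture is pushed before the
/// check, so a failed attempt leaves a stale entry for the caller to undo.
fn try_match(pattern: &Pattern, child: Child, captures: &mut Captures) -> bool {
    if let Some(name) = pattern.capture {
        captures.push((name, child));
    }
    child.kind == pattern.kind
}

/// Match `patterns` in order against `children`, scanning forward for each.
pub fn match_children(patterns: &[Pattern], children: &[Child]) -> Option<Captures> {
    // One iterator shared by all patterns, so source order is preserved.
    let mut iter = children.iter();
    let mut captures = Captures::new();
    for pattern in patterns {
        let mut found = false;
        for child in iter.by_ref() {
            // Snapshot the capture list so a failed attempt can be rolled back.
            let snapshot = captures.len();
            if try_match(pattern, *child, &mut captures) {
                found = true;
                break;
            }
            captures.truncate(snapshot);
        }
        if !found {
            return None;
        }
    }
    Some(captures)
}
```

Against the children `[do, identifier, end]`, a single `end` pattern succeeds after skipping two children and keeps only the capture from the successful attempt, whereas `end` followed by `do` fails because the shared iterator is already exhausted.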
In particular:
- Wildcards (`_`) now match any node, named or unnamed, just as in tree-sitter. This also applies to fields: `(foo bar: _ @baz)` is now valid.
- Previously, if a node had the form `(foo "bar" "baz")`, then the query `(foo "baz")` would fail because it would try to match against `"bar"`. Now it skips over `"bar"` and continues to try to match.