
Commit 72a7478

tausbn and Copilot committed
yeast: Add YAML node-types format and converter
Human-friendly YAML alternative to tree-sitter node-types.json with three sections: supertypes, named, unnamed. Supports bidirectional conversion and building Schema objects from YAML. Includes CLI binary (node_types_yaml) and documentation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent c6c2e12 commit 72a7478

3 files changed

Lines changed: 1014 additions & 0 deletions


Lines changed: 241 additions & 0 deletions
@@ -0,0 +1,241 @@
# YAML Node Types Format

The YAML node-types format is a human-friendly alternative to tree-sitter's
`node-types.json`. It can be converted to and from JSON using the
`node_types_yaml` tool.

## Overview

A YAML node-types file has three top-level sections:

```yaml
supertypes:
  # Abstract union types

named:
  # Concrete AST nodes and leaf tokens

unnamed:
  # Punctuation and keyword tokens
```

All three sections are optional. If omitted, they default to empty.

## Supertypes

Supertypes are abstract groupings of node types (unions). Each supertype maps
to a list of its members:

```yaml
supertypes:
  _expression:
    - assignment
    - binary
    - identifier
    - call
```

This corresponds to the following JSON:

```json
{
  "type": "_expression",
  "named": true,
  "subtypes": [
    { "type": "assignment", "named": true },
    { "type": "binary", "named": true },
    { "type": "identifier", "named": true },
    { "type": "call", "named": true }
  ]
}
```

Members are resolved as named or unnamed using the
[type reference rules](#type-references) described below.

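The mapping from a supertype entry to its JSON object can be sketched in a few lines of standard-library Rust. This is only an illustration, not the converter's actual code: `supertype_to_json` and `json_string` are hypothetical names, the members are assumed to arrive already resolved as named or unnamed, and the output is compact rather than pretty-printed.

```rust
/// Quotes a string as a JSON string literal, escaping quotes, backslashes
/// and control characters. Hypothetical helper for this sketch.
fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Renders one supertype as a compact JSON object.
/// `members` pairs each member name with its resolved `named` flag.
pub fn supertype_to_json(name: &str, members: &[(&str, bool)]) -> String {
    let subtypes: Vec<String> = members
        .iter()
        .map(|(ty, named)| format!("{{\"type\":{},\"named\":{}}}", json_string(ty), named))
        .collect();
    // A supertype entry itself is always a named type in the JSON shown above.
    format!(
        "{{\"type\":{},\"named\":true,\"subtypes\":[{}]}}",
        json_string(name),
        subtypes.join(",")
    )
}
```

Calling `supertype_to_json("_expression", &[("assignment", true), ("call", true)])` yields a single-line object with the same keys as the JSON example above.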
## Named nodes

Named nodes are concrete AST node types. Each entry is a node kind mapping to
its fields. A node with no fields (a leaf token like `identifier`) uses an
empty value:

```yaml
named:
  identifier:
  constant:
```

```json
{"type": "identifier", "named": true, "fields": {}},
{"type": "constant", "named": true, "fields": {}}
```

### Fields

Each field has a name, a multiplicity suffix, and a list of allowed types.
The suffix is appended directly to the field name, as in `right?`.

| Suffix | Meaning      | JSON `multiple` | JSON `required` |
| ------ | ------------ | --------------- | --------------- |
| (none) | exactly one  | `false`         | `true`          |
| `?`    | zero or one  | `false`         | `false`         |
| `+`    | one or more  | `true`          | `true`          |
| `*`    | zero or more | `true`          | `false`         |

Example:

```yaml
named:
  assignment:
    left: _lhs
    right: _expression
```

```json
{
  "type": "assignment",
  "named": true,
  "fields": {
    "left": {
      "multiple": false,
      "required": true,
      "types": [{ "type": "_lhs", "named": true }]
    },
    "right": {
      "multiple": false,
      "required": true,
      "types": [{ "type": "_expression", "named": true }]
    }
  }
}
```

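The suffix table above amounts to a small lookup on the last character of the field key. The following Rust sketch shows one way to express it. `Multiplicity` and `split_field_key` are hypothetical names chosen for illustration and are not taken from the crate.

```rust
/// The two JSON flags that a multiplicity suffix stands for.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub struct Multiplicity {
    pub multiple: bool,
    pub required: bool,
}

/// Splits a YAML field key such as `right?` or `$children*` into the bare
/// field name and the multiplicity encoded by its suffix.
pub fn split_field_key(key: &str) -> (&str, Multiplicity) {
    if let Some(name) = key.strip_suffix('?') {
        // zero or one
        (name, Multiplicity { multiple: false, required: false })
    } else if let Some(name) = key.strip_suffix('+') {
        // one or more
        (name, Multiplicity { multiple: true, required: true })
    } else if let Some(name) = key.strip_suffix('*') {
        // zero or more
        (name, Multiplicity { multiple: true, required: false })
    } else {
        // no suffix: exactly one
        (key, Multiplicity { multiple: false, required: true })
    }
}
```

The sketch does not special-case any field name, so the same function would also handle a key like `$children*`.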
A field with multiple allowed types uses a list:

```yaml
named:
  binary:
    left: [_expression, _simple_numeric]
    operator: ["!=", "+", "&&"]
    right: _expression
```

A singleton list can be written as a bare value (as shown with `right` above).

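One way to model the "bare value or list" shorthand is an enum with two variants that is normalized to a list before further processing. The sketch below is illustrative only: `TypeList` and its methods are hypothetical, and whether the real converter emits the bare form when writing YAML is an assumption, not something stated here.

```rust
/// A field's allowed types as written in YAML: a bare value or a list.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TypeList {
    One(String),
    Many(Vec<String>),
}

impl TypeList {
    /// Normalizes both spellings to a list, so `right: _expression` and
    /// `right: [_expression]` are treated identically downstream.
    pub fn into_vec(self) -> Vec<String> {
        match self {
            TypeList::One(ty) => vec![ty],
            TypeList::Many(types) => types,
        }
    }

    /// Chooses the shorter spelling for a one-element list (assumed writer
    /// behavior for this sketch).
    pub fn from_vec(mut types: Vec<String>) -> TypeList {
        if types.len() == 1 {
            TypeList::One(types.remove(0))
        } else {
            TypeList::Many(types)
        }
    }
}
```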
### Unnamed children

Unnamed children (nodes that appear as children without a field name) are
specified using the special `$children` field name, with the same suffixes:

```yaml
named:
  argument_list:
    $children*: [_expression, block_argument, splat_argument]
```

```json
{
  "type": "argument_list",
  "named": true,
  "fields": {},
  "children": {
    "multiple": true,
    "required": false,
    "types": [
      { "type": "_expression", "named": true },
      { "type": "block_argument", "named": true },
      { "type": "splat_argument", "named": true }
    ]
  }
}
```

## Unnamed tokens

Unnamed tokens are punctuation, operators, and keywords that appear in the
parse tree but don't have their own AST node type. They are listed as simple
strings:

```yaml
unnamed:
  - "="
  - "end"
  - "+"
  - "&&"
```

```json
{"type": "=", "named": false},
{"type": "end", "named": false},
{"type": "+", "named": false},
{"type": "&&", "named": false}
```

When converting to YAML, unnamed tokens are always wrapped in quotes for
visual clarity. This is purely cosmetic: YAML treats `end` and `"end"` as
the same string.

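Always quoting a token means producing a YAML double-quoted scalar, which in turn requires escaping the characters that are special inside double quotes. The Rust sketch below shows the idea. `quote_token` is a hypothetical helper, and the real converter may well delegate this to its YAML library instead.

```rust
/// Wraps a token in double quotes as a YAML double-quoted scalar,
/// escaping quotes, backslashes, newlines and tabs.
pub fn quote_token(token: &str) -> String {
    let mut out = String::with_capacity(token.len() + 2);
    out.push('"');
    for c in token.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\t' => out.push_str("\\t"),
            other => out.push(other),
        }
    }
    out.push('"');
    out
}
```

Tokens such as `end` or `&&` pass through unchanged apart from the surrounding quotes, while a token that is itself a quote character or a backslash is escaped.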
## Type references

When a type name appears in a field's type list or a supertype's member list,
it needs to be resolved as either named or unnamed. The rules are:

1. If the name only appears in `named` or `supertypes`, it is **named**.
2. If the name only appears in `unnamed`, it is **unnamed**.
3. If the name appears in both, it defaults to **named**.
4. To explicitly reference an unnamed type in the ambiguous case, use the
   map form:

   ```yaml
   named:
     example:
       field: { unnamed: foo }
   ```

In practice, ambiguity is rare: names like `end`, `+`, `if` are almost
always only unnamed, while names like `identifier`, `assignment` are only
named.

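The four rules above can be sketched as a single lookup function. In this Rust illustration, `Kind`, `TypeRef`, and `resolve` are hypothetical names, and `named` is assumed to hold every key from both the `named` and `supertypes` sections. The rules do not say what happens to a name that appears in no section, so the sketch returns `None` in that case as an assumption.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Kind {
    Named,
    Unnamed,
}

/// A type reference as written in YAML: a bare name, or the explicit
/// map form `{ unnamed: name }`.
pub enum TypeRef<'a> {
    Bare(&'a str),
    ExplicitUnnamed(&'a str),
}

/// Resolves a reference to named or unnamed.
/// `named` holds the keys of `named` and `supertypes`; `unnamed` holds
/// the entries of the `unnamed` list.
pub fn resolve(
    reference: &TypeRef,
    named: &HashSet<&str>,
    unnamed: &HashSet<&str>,
) -> Option<Kind> {
    match *reference {
        // Rule 4: the map form always selects the unnamed type.
        TypeRef::ExplicitUnnamed(_) => Some(Kind::Unnamed),
        // Rules 1 and 3: a bare name known as named wins, even if it is
        // also listed under `unnamed`.
        TypeRef::Bare(name) if named.contains(name) => Some(Kind::Named),
        // Rule 2: only listed under `unnamed`.
        TypeRef::Bare(name) if unnamed.contains(name) => Some(Kind::Unnamed),
        // Unknown name: behavior not specified above, assumed to be an error.
        TypeRef::Bare(_) => None,
    }
}
```

Checking the named set first is what makes rule 3 fall out naturally: an ambiguous bare name never reaches the unnamed lookup.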
## Complete example

```yaml
supertypes:
  _expression:
    - assignment
    - binary
    - identifier

named:
  assignment:
    left: _expression
    right?: _expression
  binary:
    left: [_expression, _simple_numeric]
    operator: ["!=", "+"]
    right: _expression
  argument_list:
    $children*: [_expression, block_argument]
  identifier:
  constant:

unnamed:
  - "!="
  - "+"
  - "="
  - "end"
```

## CLI usage

Convert YAML to JSON:

```
node_types_yaml input.yaml > node-types.json
```

Convert JSON to YAML:

```
node_types_yaml --from-json node-types.json > node-types.yaml
```

Both commands also accept input from stdin if no file argument is given.
Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
use clap::Parser;
use std::io::Read;

#[derive(Parser)]
#[clap(
    name = "node-types-yaml",
    about = "Convert between YAML and JSON node-types formats"
)]
struct Cli {
    /// Input file (reads from stdin if not provided)
    input: Option<String>,

    /// Convert from JSON to YAML (default is YAML to JSON)
    #[arg(long)]
    from_json: bool,
}

fn main() {
    let args = Cli::parse();

    let input = match &args.input {
        Some(path) => std::fs::read_to_string(path).unwrap_or_else(|e| {
            eprintln!("Error reading {path}: {e}");
            std::process::exit(1);
        }),
        None => {
            let mut buf = String::new();
            std::io::stdin()
                .read_to_string(&mut buf)
                .unwrap_or_else(|e| {
                    eprintln!("Error reading stdin: {e}");
                    std::process::exit(1);
                });
            buf
        }
    };

    let result = if args.from_json {
        yeast::node_types_yaml::convert_from_json(&input)
    } else {
        yeast::node_types_yaml::convert(&input)
    };

    match result {
        Ok(output) => print!("{output}"),
        Err(e) => {
            eprintln!("Error: {e}");
            std::process::exit(1);
        }
    }
}
