Implement StringScanner for TruffleRuby in pure Ruby#195
Implement StringScanner for TruffleRuby in pure Ruby#195eregon wants to merge 4 commits intoruby:masterfrom
Conversation
ca6ba18 to
3fe9dbe
Compare
| if RUBY_ENGINE == 'truffleruby' && RUBY_ENGINE_VERSION.to_i < 34 | ||
| omit 'not supported on TruffleRuby < 34 due to using strscan from stdlib' | ||
| end | ||
| end |
There was a problem hiding this comment.
This will be useful for #193.
It's not practical to implement new methods on TruffleRuby < 34, but it's easy to do so on TruffleRuby >= 34 by modifying lib/strscan/truffleruby_strscan.rb.
There was a problem hiding this comment.
When can we drop support for TruffleRuby < 34?
Do we need to run tests with TruffleRuby < 34 in this repository because TruffleRuby < 34 doesn't use implementation in this repository.
3fe9dbe to
87bc55e
Compare
* Fixes ruby#194. * This is a fresh new implementation from scratch, contributed directly to ruby/strscan, under BSD-2-Clause. This was implemented using the strscan tests, specs from ruby/spec and the documentation (https://docs.ruby-lang.org/en/master/StringScanner.html). * lib/strscan.rb now handles the loading for all implementations. * Test on TruffleRuby 33 to ensure it keeps working on older TruffleRuby releases, which do not have the necessary Primitives used by this new implementation.
87bc55e to
bc0fd1e
Compare
|
I extracted the documentation fix to #196 since there is no dependency on it and it's an orthogonal change. |
|
@eregon IANAL but my understanding is that if you've had exposure to the licensed code, it's pretty hard to claim "rewrote from scratch". I understand the difficulty of relicensing code from experience, but I also respect that contributors can require people to respect their IP rights for whatever reason they choose. |
|
I get your point (though see e.g. this on the subject), but it is a fresh new implementation from scratch (i.e. starting with an empty file) that I wrote between FWIW it's probably not even possible to get the list of contributors to |
I noticed this while working on #195.
|
If implementing by eregon may have any problem for licensing, I can implement this. I haven't seen the |
|
There is no problem, as I said I have implemented this carefully based on the tests/specs/docs alone. If it's helpful I could merge this new implementation in truffleruby/truffleruby first, and delete the rubysl-strscan-based implementation in truffleruby/truffleruby, let me know if you prefer that. |
|
@brixen Do you have any more comments? Can we proceed this with eregon's implementation? |
|
If anyone wants to compare, this is the original rubysl-strscan code and this is the new implementation. And this is the current implementation of strscan in truffleruby. The state, approach, structure, style, etc are clearly different. As an example the implementation of |
|
@kou If you prefer to reimplement it yourself, that's fine by me, as long as I can review it for performance and simplicity (minimal state is best). If you decide to reimplement it, here are some useful tips:
What I did is get the tests passing, then the specs. |
|
Let's wait for at least a few weeks before we proceed this. It's for collecting comments from others. |
* This is a fresh new implementation from scratch under BSD-2-Clause. This was implemented using the ruby/strscan tests, specs from ruby/spec and the official documentation (https://docs.ruby-lang.org/en/master/StringScanner.html). * No AI involved, I fully wrote this myself on March 8-9 2026. * This implementation uses minimal state and has good performance. * Originally contributed to ruby/strscan#195 but given slow progress it will be merged in truffleruby/truffleruby first.
* To test all conditions in lib/strscan.rb.
kou
left a comment
There was a problem hiding this comment.
No objection. Let's proceed this with the eregon's implementation.
There was a problem hiding this comment.
Can we remove _strscan suffix because it's redundant?
|
|
||
| def rest = @string.byteslice(@pos..) | ||
|
|
||
| def rest_size = rest.bytesize |
There was a problem hiding this comment.
@string.bytesize - @pos will be faster.
| alias_method :bol?, :beginning_of_line? | ||
|
|
||
| def eos? | ||
| raise ArgumentError, 'uninitialized StringScanner object' unless @string |
There was a problem hiding this comment.
Do we need this check? It seems that @string must not be nil.
| def matched = @last_match&.to_s | ||
|
|
||
| def [](group) | ||
| raise TypeError, 'no implicit conversion of Range into Integer' if Primitive.is_a?(group, Range) |
There was a problem hiding this comment.
Why do we need this check? Does @last_match[group] already have this check?
| end | ||
|
|
||
| def scan_byte | ||
| if rest? |
There was a problem hiding this comment.
How about using return nil if eos? like get_byte does?
| if rest.start_with?(pattern) | ||
| to = prev + pattern.bytesize | ||
| @last_match = Primitive.matchdata_create_single_group(pattern, @string, prev, to) | ||
| to - prev |
There was a problem hiding this comment.
This works but pattern.bytesize may be better for readability.
| to - prev | |
| pattern.bytesize |
| if from = @string.byteindex(pattern, prev) | ||
| to = from + pattern.bytesize | ||
| @last_match = Primitive.matchdata_create_single_group(pattern, @string, from, to) | ||
| @string.byteslice(prev, to - prev) |
There was a problem hiding this comment.
Can we use pattern instead? I think that pattern is faster than @string.byteslice.
| end | ||
| Primitive.always_split self, :search_full | ||
|
|
||
| # Keep the following 8 methods in sync, they are small variations of one another |
There was a problem hiding this comment.
Why did you implement these methods separately instead of implementing it in scan_full/search_full like the C implementation does? There are many duplicated code in these methods.
Performance? How much faster?
| require 'strscan' | ||
| end | ||
| else | ||
| raise "Unknown Ruby: #{RUBY_ENGINE}" |
There was a problem hiding this comment.
How about using NotImplementedError?
| if RUBY_ENGINE == 'truffleruby' && RUBY_ENGINE_VERSION.to_i < 34 | ||
| omit 'not supported on TruffleRuby < 34 due to using strscan from stdlib' | ||
| end | ||
| end |
There was a problem hiding this comment.
When can we drop support for TruffleRuby < 34?
Do we need to run tests with TruffleRuby < 34 in this repository because TruffleRuby < 34 doesn't use implementation in this repository.
|
Could you also update Lines 9 to 23 in 4243751 |
This was implemented using the strscan tests, specs from ruby/spec and the documentation (https://docs.ruby-lang.org/en/master/StringScanner.html).
which do not have the necessary Primitives used by this new implementation.