-
Notifications
You must be signed in to change notification settings - Fork 396
Clean up j.u.regex.* and use the native 'd' flag when available #4470
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
c5e056f
to
f032114
Compare
def find(): Boolean = if (canStillFind) { | ||
lastMatch = pattern().execFind(regexp, inputstr) | ||
def find(): Boolean = { | ||
val mtchAndEnd = pattern().execFind(inputstr, position) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use destructuring assignment?
while (i != len) { | ||
groupStartMap(i) = -1 | ||
indices(i) = js.undefined | ||
i += 1 | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't a js.Array
contain js.undefined
by default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really. It creates an array of empty slots, not of slots with actual undefined
values.. That causes the array to become a sparse array, which is less efficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added a comment about that.
} | ||
val regexp = if (forMatches) jsRegExpForMatches else jsRegExpForFind | ||
regexp.lastIndex = lastMatch.index | ||
lastMatchDyn.indices = regexp.exec(lastMatch.input).asInstanceOf[js.Dynamic].indices |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider avoiding the second execution if lastMatch.indices
is already defined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hum, I'm not sure I understand. Isn't that what the following line, above, does:
if (js.isUndefined(lastMatchDyn.indices)) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand either... Seems fine :P
Previously, each `Matcher` had its own copy of a "blueprint" `RegExp`, to isolate its `lastIndex` state. We now explicitly store that state as a separate `position` field in `Matcher`, which we assign to `lastIndex` before `exec`, and read from `lastIndex` after `exec`. This allows to reuse the same `RegExp` for all matchers of a given `Pattern`, and to confine its use to `Pattern.execFind()`.
Previously, the constructor of `Matcher` accepted region params, but the only call site always used the full string. This commit confines the use of `subSequence` to the `region()` method.
Since we have to convert the input `CharSequence`s to `String`s anyway to be able to give them to `js.RegExp`, it is better to do that upon entry into our implementation. We can force the `toString()`s to be at the call site, which can then be inlined and removed when the arguments are already Strings, which should be the vast majority of use cases.
And `undefined` for unmatched groups. In addition, store the start and end of the whole match at index 0. The indices array is cached directly as a field `indices` on the `MatchResult`. This allows the cache to be better shared between a `Matcher` and a `MatchResult`. This corresponds to the format of the `indices` field of the `regexp-match-indices` ECMAScript proposal, scheduled for inclusion in ECMAScript 2022.
That feature is enabled by a flag, and not by default, because of performance concerns. JS engine implementers have shown that enabling the feature has a significant performance impact. Therefore, we only enable it the first time group indices are required for any given `Pattern` instance. This means that the first match for which group indices are required will be run twice, but this is probably the better trade-off.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
while (i != len) { | ||
groupStartMap(i) = -1 | ||
indices(i) = js.undefined | ||
i += 1 | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added a comment about that.
Based on #4455.Manually tested with
testSuite2_12/testHtml
in Firefox, which already supports the'd'
flag by default.