Skip to content

Clean up j.u.regex.* and use the native 'd' flag when available #4470

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jul 9, 2021

Conversation

sjrd
Copy link
Member

@sjrd sjrd commented Apr 27, 2021

Based on #4455.

Manually tested with testSuite2_12/testHtml in Firefox, which already supports the 'd' flag by default.

@sjrd sjrd force-pushed the regex-followup branch from ae2b5fa to 5f4ce14 Compare May 22, 2021 17:12
@sjrd sjrd force-pushed the regex-followup branch 2 times, most recently from c5e056f to f032114 Compare June 10, 2021 16:54
@sjrd sjrd mentioned this pull request Jun 15, 2021
4 tasks
@sjrd sjrd force-pushed the regex-followup branch from f032114 to 3a805cc Compare July 8, 2021 09:12
@sjrd sjrd marked this pull request as ready for review July 8, 2021 09:12
@sjrd sjrd requested a review from gzm0 July 8, 2021 09:12
@sjrd sjrd force-pushed the regex-followup branch from 3a805cc to aba2712 Compare July 8, 2021 09:35
def find(): Boolean = if (canStillFind) {
lastMatch = pattern().execFind(regexp, inputstr)
def find(): Boolean = {
val mtchAndEnd = pattern().execFind(inputstr, position)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use destructuring assignment?

while (i != len) {
groupStartMap(i) = -1
indices(i) = js.undefined
i += 1
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't a js.Array contain js.undefined by default?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really. It creates an array of empty slots, not of slots with actual undefined values.. That causes the array to become a sparse array, which is less efficient.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a comment about that.

}
val regexp = if (forMatches) jsRegExpForMatches else jsRegExpForFind
regexp.lastIndex = lastMatch.index
lastMatchDyn.indices = regexp.exec(lastMatch.input).asInstanceOf[js.Dynamic].indices
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider avoiding the second execution if lastMatch.indices is already defined.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hum, I'm not sure I understand. Isn't that what the following line, above, does:

if (js.isUndefined(lastMatchDyn.indices)) {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand either... Seems fine :P

sjrd added 5 commits July 9, 2021 10:43
Previously, each `Matcher` had its own copy of a "blueprint"
`RegExp`, to isolate its `lastIndex` state.

We now explicitly store that state as a separate `position` field
in `Matcher`, which we assign to `lastIndex` before `exec`, and
read from `lastIndex` after `exec`.

This allows to reuse the same `RegExp` for all matchers of a given
`Pattern`, and to confine its use to `Pattern.execFind()`.
Previously, the constructor of `Matcher` accepted region params,
but the only call site always used the full string. This commit
confines the use of `subSequence` to the `region()` method.
Since we have to convert the input `CharSequence`s to `String`s
anyway to be able to give them to `js.RegExp`, it is better to do
that upon entry into our implementation. We can force the
`toString()`s to be at the call site, which can then be inlined and
removed when the arguments are already Strings, which should be the
vast majority of use cases.
And `undefined` for unmatched groups. In addition, store the start
and end of the whole match at index 0.

The indices array is cached directly as a field `indices` on the
`MatchResult`. This allows the cache to be better shared between a
`Matcher` and a `MatchResult`.

This corresponds to the format of the `indices` field of the
`regexp-match-indices` ECMAScript proposal, scheduled for inclusion
in ECMAScript 2022.
That feature is enabled by a flag, and not by default, because of
performance concerns. JS engine implementers have shown that
enabling the feature has a significant performance impact.
Therefore, we only enable it the first time group indices are
required for any given `Pattern` instance. This means that the
first match for which group indices are required will be run twice,
but this is probably the better trade-off.
@sjrd sjrd force-pushed the regex-followup branch from aba2712 to 5c76e19 Compare July 9, 2021 09:04
Copy link
Member Author

@sjrd sjrd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

while (i != len) {
groupStartMap(i) = -1
indices(i) = js.undefined
i += 1
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a comment about that.

@sjrd sjrd requested a review from gzm0 July 9, 2021 09:05
@sjrd sjrd merged commit a73746b into scala-js:master Jul 9, 2021
@sjrd sjrd deleted the regex-followup branch July 9, 2021 14:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants