Regex differences #1201

Closed
japgolly opened this issue Oct 28, 2014 · 20 comments · Fixed by #4455
Labels: bug (Confirmed bug. Needs to be fixed.)

@japgolly
Contributor

(Continuing discussion from #1198)

There are differences between how regex behaves on the JVM vs generated JS. Some things shouldn't be too hard to fix, others will be near impossible without either creative hacks or switching to a bundled regex engine. It would also be beneficial to have a document somewhere stating the differences so that devs are aware. If you don't object, let's use this issue to at least start compiling a list of differences and see what (if anything) to do about it.

Off the top of my head we've got:

  • Look-behinds ((?<=), (?<!)). JS regex doesn't support this.
  • Java-style quotes/literals. Pattern.quote uses \Q and \E, which don't work in JS.
  • Java-style character sets, example: /[\p{S}\p{P}]/.

I'll add more as I think of them.

A semi-fix for Pattern.quote would be to use a regex escaping function from a mature JS library; Google Closure's is short and sweet. Handling \Q..\E, on the other hand, would be harder.
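To make the escaping idea concrete, here is a minimal sketch. It is not Closure's code; `JsRegexEscape` and `quote` are hypothetical names, and the character set is simply the ECMAScript syntax characters, which is an assumption about what needs escaping outside a character class.

```scala
object JsRegexEscape {
  // ECMAScript syntax characters: each one is special outside a character class.
  private val syntaxChars = "\\^$.*+?()[]{}|"

  /** Returns `literal` with every syntax character prefixed by a backslash,
    * so the result matches `literal` verbatim when used as regex source.
    */
  def quote(literal: String): String = {
    val sb = new StringBuilder
    literal.foreach { c =>
      if (syntaxChars.indexOf(c) >= 0) sb.append('\\')
      sb.append(c)
    }
    sb.toString
  }
}
```

Because the output uses only backslash escapes, the same string is valid regex source on the JVM and in JS, unlike the \Q..\E form.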

Funny story about [\p{S}\p{P}]: I actually needed that in my JS, so I wrote a little converter for that kind of regex so that it would work in Scala.js. I replaced it with:

[\u0021-\u002f\u003a-\u0040\\[-\u0060\u007b-\u007e\u00a1-\u00a9\u00ab\u00ac\u00ae-\u00b1\u00b4\u00b6-\u00b8\u00bb\u00bf\u00d7\u00f7\u02c2-\u02c5\u02d2-\u02df\u02e5-\u02eb\u02ed\u02ef-\u02ff\u0375\u037e\u0384\u0385\u0387\u03f6\u0482\u055a-\u055f\u0589\u058a\u058f\u05be\u05c0\u05c3\u05c6\u05f3\u05f4\u0606-\u060f\u061b\u061e\u061f\u066a-\u066d\u06d4\u06de\u06e9\u06fd\u06fe\u0700-\u070d\u07f6-\u07f9\u0830-\u083e\u085e\u0964\u0965\u0970\u09f2\u09f3\u09fa\u09fb\u0af0\u0af1\u0b70\u0bf3-\u0bfa\u0c7f\u0d79\u0df4\u0e3f\u0e4f\u0e5a\u0e5b\u0f01-\u0f17\u0f1a-\u0f1f\u0f34\u0f36\u0f38\u0f3a-\u0f3d\u0f85\u0fbe-\u0fc5\u0fc7-\u0fcc\u0fce-\u0fda\u104a-\u104f\u109e\u109f\u10fb\u1360-\u1368\u1390-\u1399\u1400\u166d\u166e\u169b\u169c\u16eb-\u16ed\u1735\u1736\u17d4-\u17d6\u17d8-\u17db\u1800-\u180a\u1940\u1944\u1945\u19de-\u19ff\u1a1e\u1a1f\u1aa0-\u1aa6\u1aa8-\u1aad\u1b5a-\u1b6a\u1b74-\u1b7c\u1bfc-\u1bff\u1c3b-\u1c3f\u1c7e\u1c7f\u1cc0-\u1cc7\u1cd3\u1fbd\u1fbf-\u1fc1\u1fcd-\u1fcf\u1fdd-\u1fdf\u1fed-\u1fef\u1ffd\u1ffe\u2010-\u2027\u2030-\u205e\u207a-\u207e\u208a-\u208e\u20a0-\u20ba\u2100\u2101\u2103-\u2106\u2108\u2109\u2114\u2116-\u2118\u211e-\u2123\u2125\u2127\u2129\u212e\u213a\u213b\u2140-\u2144\u214a-\u214d\u214f\u2190-\u23f3\u2400-\u2426\u2440-\u244a\u249c-\u24e9\u2500-\u26ff\u2701-\u2775\u2794-\u2b4c\u2b50-\u2b59\u2ce5-\u2cea\u2cf9-\u2cfc\u2cfe\u2cff\u2d70\u2e00-\u2e2e\u2e30-\u2e3b\u2e80-\u2e99\u2e9b-\u2ef3\u2f00-\u2fd5\u2ff0-\u2ffb\u3001-\u3004\u3008-\u3020\u3030\u3036\u3037\u303d-\u303f\u309b\u309c\u30a0\u30fb\u3190\u3191\u3196-\u319f\u31c0-\u31e3\u3200-\u321e\u322a-\u3247\u3250\u3260-\u327f\u328a-\u32b0\u32c0-\u32fe\u3300-\u33ff\u4dc0-\u4dff\ua490-\ua4c6\ua4fe\ua4ff\ua60d-\ua60f\ua673\ua67e\ua6f2-\ua6f7\ua700-\ua716\ua720\ua721\ua789\ua78a\ua828-\ua82b\ua836-\ua839\ua874-\ua877\ua8ce\ua8cf\ua8f8-\ua8fa\ua92e\ua92f\ua95f\ua9c1-\ua9cd\ua9de\ua9df\uaa5c-\uaa5f\uaa77-\uaa79\uaade\uaadf\uaaf0\uaaf1\uabeb\ufb29\ufbb2-\ufbc1\ufd3e\ufd3f\ufdfc\ufdfd\ufe10-\ufe19\ufe30-\ufe52\ufe54-\ufe66\ufe68-\ufe6b\uff01-\uff0f\uff1a-\uff20\uff3b-\uff40\uff5b-\uff65\uffe0-\uffe6\uffe8-\uffee\ufffc\ufffd]
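A character class like the one above could be generated on the JVM along these lines. This is only a sketch of one possible approach, not the converter mentioned in the comment: `CategoryRanges` is a hypothetical name, it covers the BMP only, and the exact output depends on the Unicode version of the JVM it runs on.

```scala
object CategoryRanges {
  // General categories behind \p{S} (symbols) and \p{P} (punctuation).
  private val wanted: Set[Int] = Set[Byte](
    Character.MATH_SYMBOL, Character.CURRENCY_SYMBOL,
    Character.MODIFIER_SYMBOL, Character.OTHER_SYMBOL,
    Character.CONNECTOR_PUNCTUATION, Character.DASH_PUNCTUATION,
    Character.START_PUNCTUATION, Character.END_PUNCTUATION,
    Character.INITIAL_QUOTE_PUNCTUATION, Character.FINAL_QUOTE_PUNCTUATION,
    Character.OTHER_PUNCTUATION
  ).map(_.toInt)

  /** Maximal runs of consecutive BMP code points whose category is wanted. */
  def ranges: List[(Int, Int)] = {
    val buf = List.newBuilder[(Int, Int)]
    var start = -1
    var cp = 0
    // One extra iteration past 0xFFFF closes a run that reaches the end.
    while (cp <= 0x10000) {
      val in = cp <= 0xFFFF && wanted(Character.getType(cp))
      if (in && start < 0) start = cp
      else if (!in && start >= 0) { buf += ((start, cp - 1)); start = -1 }
      cp += 1
    }
    buf.result()
  }

  // Four-digit hex escape; built by concatenation to keep the source portable.
  private def esc(cp: Int): String = "\\" + "u" + "%04x".format(cp)

  /** Body of a JS character class: one escape or escaped range per run. */
  def classBody: String =
    ranges.map { case (a, b) => if (a == b) esc(a) else esc(a) + "-" + esc(b) }.mkString
}
```

Wrapping `CategoryRanges.classBody` in square brackets gives a class that a plain ES 5.1 RegExp can parse.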

@japgolly
Contributor Author

Reported not to work: "([\w&&[\D]][\w]*)".r
https://groups.google.com/forum/#!folder/Scala/scala-js/u43HVcF__yo
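For reference, a character class intersection such as the one reported here can usually be rewritten with a look-ahead, which JS regexes do support. The sketch below runs on the JVM and only checks that the two spellings agree there; `IntersectionSketch` and its members are hypothetical names.

```scala
import scala.util.matching.Regex

object IntersectionSketch {
  // Java-only syntax: a word character that is also a non-digit.
  val withIntersection: Regex = "([\\w&&[\\D]][\\w]*)".r

  // The same set written as "next char is a non-digit, then a word char".
  val withLookahead: Regex = "((?=\\D)\\w[\\w]*)".r

  /** All matches of `re` in `input`, in order. */
  def identifiers(re: Regex, input: String): List[String] =
    re.findAllIn(input).toList
}
```

Both patterns skip the leading digit in `1bar` and start the match at `b`.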

@ebruchez

The XRegExp library could be used to narrow the gap between JavaScript and Java regular expressions.

I can see a case being made either way:

  1. leave things as they are: just use plain JavaScript regexps and document the gaps
  2. try to reduce the gaps by bundling XRegExp or something like it

To me option 2 sounds better for Scala developers. I see that XRegExp is tiny and compiles down to native JS regexps, which sounds great, but there might be drawbacks I can't think of.

Maybe a compilation option to produce XRegExp regexps at first would be a good idea to experiment with it in the context of Scala.js.

@sjrd
Member

sjrd commented Dec 18, 2014

The issue with using something like XRegExp to implement java.util.regex.* is that it will be required by virtually any Scala.js program. So far we've been very careful to be entirely self-contained, depending only on standardized ECMAScript 5.1 (both language and libraries). This allows extreme portability of Scala.js code across JS environments.

@japgolly
Contributor Author

A Scala.js SBT flag could control that though. So default is plain JS regex but switching it on changes the engine to XRegExp. I think that argument is easily addressed.

However, I'm actually not advocating that approach. I imagine that using XRegExp will enable more regex features (like look-behinds), but will it get us all the way to full Scala-regex compatibility? If not, then we're back where we started, except now we have two incomplete regex implementations and the same problem to be solved.

Although, if a library (big or small) exists that will enable complete 100% compatibility with Scala regex, then this dual engine thing might be a great idea.

@sjrd
Member

sjrd commented Dec 18, 2014

An sbt setting can't do that. But a separate library can, by overriding the java.util.regex.* classes, as long as somehow you can guarantee that it comes first on the classpath.

The only way to enable 100% compatibility is probably to reimplement a complete regex engine in Scala.js.

@ebruchez

I don't have a definitive opinion on the topic, but here is some food for thought:

  1. One benefit of the XRegExp approach is that it compiles down to native JavaScript regexps. The expectation is that performance will be identical to using a native regexp if only native JavaScript features are used, and reasonable for the rest (probably better than a native Scala.js implementation not compiling down to JavaScript regexps).
  2. It would be reasonable for a fully-compatible Scala.js implementation to use the same approach of compiling down to JavaScript regexps, at least when possible.
  3. If we knew that regexps making use of non-native features always threw exceptions in JavaScript, or just did not work meaningfully, there would be little drawback using a progressive approach with XRegExp or similar, as nobody could use such expressions in JavaScript. But I don't know that this is the case.

@gzm0
Contributor

gzm0 commented May 22, 2015

I think I have never written down my take on this (although I have discussed it with @sjrd).

  1. Implement a regex parser in Scala.js.
  2. Translate Java regex constructs to JS constructs where we can, fail where we can't. This would all happen at regex "compilation" time. (i.e. inside Pattern.compile).
  3. In a later step, provide a separate artifact that can be linked in which provides full regex support.

This would allow most projects to use Java regexes safely without a big size overhead.
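Steps 1 and 2 could look roughly like the sketch below for a tiny subset of the syntax. It is an illustration under stated assumptions, not the Scala.js implementation: `TranslateSketch.toJsSource` is a hypothetical helper that only understands \Q..\E sections and backslash escapes, and it rejects look-behinds because ES 5.1 cannot express them.

```scala
import java.util.regex.PatternSyntaxException

object TranslateSketch {
  private val syntaxChars = "\\^$.*+?()[]{}|"

  /** Translates a small subset of Java regex syntax to JS regex source,
    * failing at "compilation" time for constructs it cannot translate.
    */
  def toJsSource(javaPattern: String): String = {
    val out = new StringBuilder
    var i = 0
    while (i < javaPattern.length) {
      if (javaPattern.startsWith("\\Q", i)) {
        // Literal section: copy up to \E (or the end), escaping as we go.
        val end = javaPattern.indexOf("\\E", i + 2)
        val stop = if (end < 0) javaPattern.length else end
        javaPattern.substring(i + 2, stop).foreach { c =>
          if (syntaxChars.indexOf(c) >= 0) out.append('\\')
          out.append(c)
        }
        i = if (end < 0) stop else end + 2
      } else if (javaPattern.startsWith("(?<=", i) || javaPattern.startsWith("(?<!", i)) {
        throw new PatternSyntaxException("Look-behind is not supported", javaPattern, i)
      } else if (javaPattern.charAt(i) == '\\' && i + 1 < javaPattern.length) {
        // Any other escape: copy both characters unchanged.
        out.append(javaPattern.substring(i, i + 2))
        i += 2
      } else {
        out.append(javaPattern.charAt(i))
        i += 1
      }
    }
    out.toString
  }
}
```

A real translator needs a proper parser: this sketch would, for instance, wrongly reject `(?<=` appearing inside a character class.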

@yawnt
Contributor

yawnt commented May 22, 2015

we should probably document this on the website, or did I miss it?

@gzm0
Contributor

gzm0 commented May 22, 2015

@martinring

I didn't find anything about this, so please ignore me if I just missed it:

Regexes even behave differently when they match the same thing, if you use groups and their positions:

Minimal example:

import scala.util.matching.Regex
new Regex("a.*(b)c").findFirstMatchIn("abbc").get.start(1)

returns 2 on the JVM but 1 in JS

I guess the reason is some kind of indexOf hack to determine the position of groups (assuming that this information is not available in js).

@martinring

And here it is:

groupStr => inputstr.indexOf(groupStr, last.index)

"sound behaviour" is an exaggeration here ;)
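The discrepancy can be reproduced on the JVM alone by comparing the engine's answer with an indexOf-based guess like the one quoted above. `GroupStartSketch.starts` is a hypothetical helper written for this comparison.

```scala
import scala.util.matching.Regex

object GroupStartSketch {
  /** Start of group 1 as reported by the regex engine, paired with the
    * position guessed by searching for the group's text from the start
    * of the whole match.
    */
  def starts(pattern: String, input: String): (Int, Int) = {
    val m = new Regex(pattern).findFirstMatchIn(input).get
    val real = m.start(1)
    val guessed = input.indexOf(m.group(1), m.start)
    (real, guessed)
  }
}
```

For `starts("a.*(b)c", "abbc")` the engine reports 2, while the search finds the first `b` at index 1, because the group's text also occurs earlier in the match.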

@gzm0
Contributor

gzm0 commented Oct 12, 2015

Note to self: This is one case we cannot solve with the regex translator approach.

@sjrd
Member

sjrd commented Feb 3, 2018

ECMAScript 2018 contains some new RegExp features that could help here.

@ritschwumm

hm - how difficult can it be to reimplement java's regexps?

@sjrd
Member

sjrd commented Feb 3, 2018

It's not difficult. Regexes are based on well-known theory. But it requires a lot of code, and we don't want to bundle that much code in the .js of virtually any Scala.js code base.

In addition, using the native regex engine is more efficient. The regex engine is probably the one thing in JS for which native code actually outperforms hand-written JS code that is JIT'ed.

@nightscape
Contributor

I have a small case where Scala.js (see ScalaFiddle) behaves differently than the JVM (tested in Ammonite):

// JVM
println("(ab)|(a)".r.replaceAllIn("ab", "[1=$1][2=$2]")) 
[1=ab][2=]

// JS
println("(ab)|(a)".r.replaceAllIn("ab", "[1=$1][2=$2]")) 
[1=ab][2=null]

My naive guess is that the Matcher.group method returns null and null is formatted as "" on JVM and "null" on JS.
If this is actually the problem, then it could be fixed by a rather precise change in Matcher that should not involve changes in code size.
@sjrd do you agree, or am I missing something?
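The guess above can be illustrated with a small stand-alone expansion routine. This is a sketch of the idea only, not the code in `Matcher`: `ReplacementSketch.expand` is a hypothetical helper that handles single-digit `$n` references and nothing else.

```scala
object ReplacementSketch {
  /** Expands `$0`..`$9` in `template` using `groups`, which returns null
    * for a group that did not take part in the match.
    */
  def expand(template: String, groups: Int => String): String = {
    val sb = new StringBuilder
    var i = 0
    while (i < template.length) {
      val c = template.charAt(i)
      val isRef = c == '$' && i + 1 < template.length &&
        template.charAt(i + 1) >= '0' && template.charAt(i + 1) <= '9'
      if (isRef) {
        val g = groups(template.charAt(i + 1) - '0')
        // Appending a null String would produce the text "null", so skip it.
        if (g != null) sb.append(g)
        i += 2
      } else {
        sb.append(c)
        i += 1
      }
    }
    sb.toString
  }
}
```

With group 1 set to `ab` and group 2 unmatched, expanding `[1=$1][2=$2]` gives `[1=ab][2=]`, which is the JVM output shown above.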

@sjrd
Member

sjrd commented Dec 27, 2020

That sounds plausible, indeed. I think we can fix this, yes. Would you mind filing a separate bug report for this one?

@sjrd
Member

sjrd commented Mar 14, 2021

> I think I have never written down my take on this (although I have discussed it with @sjrd).
>
>   1. Implement a regex parser in Scala.js.
>   2. Translate Java regex constructs to JS constructs where we can, fail where we can't. This would all happen at regex "compilation" time. (i.e. inside Pattern.compile).
>   3. In a later step, provide a separate artifact that can be linked in which provides full regex support.
>
> This would allow most projects to use Java regexes safely without a big size overhead.

FTR, I have started working on this (1. and 2.) as a week-end project. My (very) wip branch is at master...sjrd:almost-correct-regex. It relies on support for the `u` flag in the JS RegExp (ES2015) for most things, and some advanced features even rely on look-behinds and/or \p{} Unicode character classes (ES2018).

sjrd self-assigned this Mar 14, 2021
@ebruchez

ebruchez commented Mar 15, 2021 via email

@sjrd
Member

sjrd commented Mar 26, 2021

WiP PR: #4455

sjrd added a commit to sjrd/scala-js that referenced this issue Apr 18, 2021
Previously, `java.util.regex.Pattern` was implemented without much
concern for correctness wrt. the Java semantics of regular
expressions. Patterns were passed through to the native `RegExp`
with minimal preprocessing.

This could cause several kinds of incompatibilities:

- Throwing `ParseError`s for features not supported by JS regexes,
- Or worse, silently compile with different semantics.

In this commit, we correctly implement virtually all the features
of Java regular expressions by compiling Java patterns down to JS
patterns with the same semantics.

This change introduces a significant code size regression for code
bases that were already using `Pattern` and/or Scala's `Regex`.
Therefore, we went to great lengths to optimize the compiler for
code size, in particular in the default ES 2015 configuration. This
means that some code is not as idiomatic as it could be. The impact
of this commit on a typical fullOpt output is approximately 12 KB.

The source of `PatternCompiler` contains an extensive explanation
of the design of the compiler, and of how it compiles each kind of
Java pattern.

In addition to fixing the mega-issue scala-js#1201, this commit fixes the
smaller issues scala-js#105, scala-js#1677, scala-js#1847, scala-js#2082 and scala-js#3959, which had been
closed as duplicates of scala-js#1201.
sjrd added a commit to sjrd/scala-js that referenced this issue Apr 19, 2021
Previously, `java.util.regex.Pattern` was implemented without much
concern for correctness wrt. the Java semantics of regular
expressions. Patterns were passed through to the native `RegExp`
with minimal preprocessing.

This could cause several kinds of incompatibilities:

- Throwing `ParseError`s for features not supported by JS regexes,
- Or worse, silently compile with different semantics.

In this commit, we correctly implement virtually all the features
of Java regular expressions by compiling Java patterns down to JS
patterns with the same semantics.

This change introduces a significant code size regression for code
bases that were already using `Pattern` and/or Scala's `Regex`.
Therefore, we went to great lengths to optimize the compiler for
code size, in particular in the default ES 2015 configuration. This
means that some code is not as idiomatic as it could be. The impact
of this commit on a typical output is approximately 65 KB for
fastOpt and 12 KB for fullOpt.

The `README.md` file contains an extensive explanation of the
design of the compiler, and of how it compiles each kind of Java
pattern.

In addition to fixing the mega-issue scala-js#1201, this commit fixes the
smaller issues scala-js#105, scala-js#1677, scala-js#1847, scala-js#2082 and scala-js#3959, which had been
closed as duplicates of scala-js#1201.
sjrd added a commit to sjrd/scala-js that referenced this issue Apr 19, 2021
Previously, `java.util.regex.Pattern` was implemented without much
concern for correctness wrt. the Java semantics of regular
expressions. Patterns were passed through to the native `RegExp`
with minimal preprocessing.

This could cause several kinds of incompatibilities:

- Throwing `ParseError`s for features not supported by JS regexes,
- Or worse, silently compile with different semantics.

In this commit, we correctly implement virtually all the features
of Java regular expressions by compiling Java patterns down to JS
patterns with the same semantics.

This change introduces a significant code size regression for code
bases that were already using `Pattern` and/or Scala's `Regex`.
Therefore, we went to great lengths to optimize the compiler for
code size, in particular in the default ES 2015 configuration. This
means that some code is not as idiomatic as it could be. The impact
of this commit on a typical output is approximately 65 KB for
fastOpt and 12 KB for fullOpt.

The `README.md` file contains an extensive explanation of the
design of the compiler, and of how it compiles each kind of Java
pattern.

In addition to fixing the mega-issue scala-js#1201, this commit fixes the
smaller issues scala-js#105, scala-js#1677, scala-js#1847, scala-js#2082 and scala-js#3959, which had been
closed as duplicates of scala-js#1201.
sjrd added a commit to sjrd/scala-js that referenced this issue Apr 20, 2021
Previously, `java.util.regex.Pattern` was implemented without much
concern for correctness wrt. the Java semantics of regular
expressions. Patterns were passed through to the native `RegExp`
with minimal preprocessing.

This could cause several kinds of incompatibilities:

- Throwing `ParseError`s for features not supported by JS regexes,
- Or worse, silently compile with different semantics.

In this commit, we correctly implement virtually all the features
of Java regular expressions by compiling Java patterns down to JS
patterns with the same semantics.

This change introduces a significant code size regression for code
bases that were already using `Pattern` and/or Scala's `Regex`.
Therefore, we went to great lengths to optimize the compiler for
code size, in particular in the default ES 2015 configuration. This
means that some code is not as idiomatic as it could be. The impact
of this commit on a typical output is approximately 65 KB for
fastOpt and 12 KB for fullOpt.

The `README.md` file contains an extensive explanation of the
design of the compiler, and of how it compiles each kind of Java
pattern.

In addition to fixing the mega-issue scala-js#1201, this commit fixes the
smaller issues scala-js#105, scala-js#1677, scala-js#1847, scala-js#2082 and scala-js#3959, which had been
closed as duplicates of scala-js#1201.
sjrd added a commit to sjrd/scala-js that referenced this issue Apr 21, 2021
Previously, `java.util.regex.Pattern` was implemented without much
concern for correctness wrt. the Java semantics of regular
expressions. Patterns were passed through to the native `RegExp`
with minimal preprocessing.

This could cause several kinds of incompatibilities:

- Throwing `ParseError`s for features not supported by JS regexes,
- Or worse, silently compile with different semantics.

In this commit, we correctly implement virtually all the features
of Java regular expressions by compiling Java patterns down to JS
patterns with the same semantics.

This change introduces a significant code size regression for code
bases that were already using `Pattern` and/or Scala's `Regex`.
Therefore, we went to great lengths to optimize the compiler for
code size, in particular in the default ES 2015 configuration. This
means that some code is not as idiomatic as it could be. The impact
of this commit on a typical output is approximately 65 KB for
fastOpt and 12 KB for fullOpt.

The `README.md` file contains an extensive explanation of the
design of the compiler, and of how it compiles each kind of Java
pattern.

In addition to fixing the mega-issue scala-js#1201, this commit fixes the
smaller issues scala-js#105, scala-js#1677, scala-js#1847, scala-js#2082 and scala-js#3959, which had been
closed as duplicates of scala-js#1201.
sjrd added a commit to sjrd/scala-js that referenced this issue Apr 21, 2021
Previously, `java.util.regex.Pattern` was implemented without much
concern for correctness wrt. the Java semantics of regular
expressions. Patterns were passed through to the native `RegExp`
with minimal preprocessing.

This could cause several kinds of incompatibilities:

- Throwing `ParseError`s for features not supported by JS regexes,
- Or worse, silently compile with different semantics.

In this commit, we correctly implement virtually all the features
of Java regular expressions by compiling Java patterns down to JS
patterns with the same semantics.

This change introduces a significant code size regression for code
bases that were already using `Pattern` and/or Scala's `Regex`.
Therefore, we went to great lengths to optimize the compiler for
code size, in particular in the default ES 2015 configuration. This
means that some code is not as idiomatic as it could be. The impact
of this commit on a typical output is approximately 65 KB for
fastOpt and 12 KB for fullOpt.

The `README.md` file contains an extensive explanation of the
design of the compiler, and of how it compiles each kind of Java
pattern.

In addition to fixing the mega-issue scala-js#1201, this commit fixes the
smaller issues scala-js#105, scala-js#1677, scala-js#1847, scala-js#2082 and scala-js#3959, which had been
closed as duplicates of scala-js#1201.
sjrd added a commit to sjrd/scala-js that referenced this issue Apr 22, 2021
Previously, `java.util.regex.Pattern` was implemented without much
concern for correctness wrt. the Java semantics of regular
expressions. Patterns were passed through to the native `RegExp`
with minimal preprocessing.

This could cause several kinds of incompatibilities:

- Throwing `ParseError`s for features not supported by JS regexes,
- Or worse, silently compile with different semantics.

In this commit, we correctly implement virtually all the features
of Java regular expressions by compiling Java patterns down to JS
patterns with the same semantics.

This change introduces a significant code size regression for code
bases that were already using `Pattern` and/or Scala's `Regex`.
Therefore, we went to great lengths to optimize the compiler for
code size, in particular in the default ES 2015 configuration. This
means that some code is not as idiomatic as it could be. The impact
of this commit on a typical output is approximately 65 KB for
fastOpt and 12 KB for fullOpt.

The `README.md` file contains an extensive explanation of the
design of the compiler, and of how it compiles each kind of Java
pattern.

In addition to fixing the mega-issue scala-js#1201, this commit fixes the
smaller issues scala-js#105, scala-js#1677, scala-js#1847, scala-js#2082 and scala-js#3959, which had been
closed as duplicates of scala-js#1201.
sjrd added a commit to sjrd/scala-js that referenced this issue Apr 22, 2021
Previously, `java.util.regex.Pattern` was implemented without much
concern for correctness wrt. the Java semantics of regular
expressions. Patterns were passed through to the native `RegExp`
with minimal preprocessing.

This could cause several kinds of incompatibilities:

- Throwing `ParseError`s for features not supported by JS regexes,
- Or worse, silently compile with different semantics.

In this commit, we correctly implement virtually all the features
of Java regular expressions by compiling Java patterns down to JS
patterns with the same semantics.

This change introduces a significant code size regression for code
bases that were already using `Pattern` and/or Scala's `Regex`.
Therefore, we went to great lengths to optimize the compiler for
code size, in particular in the default ES 2015 configuration. This
means that some code is not as idiomatic as it could be. The impact
of this commit on a typical output is approximately 65 KB for
fastOpt and 12 KB for fullOpt.

The `README.md` file contains an extensive explanation of the
design of the compiler, and of how it compiles each kind of Java
pattern.

In addition to fixing the mega-issue scala-js#1201, this commit fixes the
smaller issues scala-js#105, scala-js#1677, scala-js#1847, scala-js#2082 and scala-js#3959, which had been
closed as duplicates of scala-js#1201.
sjrd added a commit to sjrd/scala-js that referenced this issue Apr 23, 2021
Previously, `java.util.regex.Pattern` was implemented without much
concern for correctness wrt. the Java semantics of regular
expressions. Patterns were passed through to the native `RegExp`
with minimal preprocessing.

This could cause several kinds of incompatibilities:

- Throwing `ParseError`s for features not supported by JS regexes,
- Or worse, silently compile with different semantics.

In this commit, we correctly implement virtually all the features
of Java regular expressions by compiling Java patterns down to JS
patterns with the same semantics.

This change introduces a significant code size regression for code
bases that were already using `Pattern` and/or Scala's `Regex`.
Therefore, we went to great lengths to optimize the compiler for
code size, in particular in the default ES 2015 configuration. This
means that some code is not as idiomatic as it could be. The impact
of this commit on a typical output is approximately 65 KB for
fastOpt and 12 KB for fullOpt.

The `README.md` file contains an extensive explanation of the
design of the compiler, and of how it compiles each kind of Java
pattern.

In addition to fixing the mega-issue scala-js#1201, this commit fixes the
smaller issues scala-js#105, scala-js#1677, scala-js#1847, scala-js#2082 and scala-js#3959, which had been
closed as duplicates of scala-js#1201.
sjrd added a commit to sjrd/scala-js that referenced this issue Apr 25, 2021
Previously, `java.util.regex.Pattern` was implemented without much
concern for correctness wrt. the Java semantics of regular
expressions. Patterns were passed through to the native `RegExp`
with minimal preprocessing.

This could cause several kinds of incompatibilities:

- Throwing `ParseError`s for features not supported by JS regexes,
- Or worse, silently compile with different semantics.

In this commit, we correctly implement virtually all the features
of Java regular expressions by compiling Java patterns down to JS
patterns with the same semantics.

This change introduces a significant code size regression for code
bases that were already using `Pattern` and/or Scala's `Regex`.
Therefore, we went to great lengths to optimize the compiler for
code size, in particular in the default ES 2015 configuration. This
means that some code is not as idiomatic as it could be. The impact
of this commit on a typical output is approximately 65 KB for
fastOpt and 12 KB for fullOpt.

The `README.md` file contains an extensive explanation of the
design of the compiler, and of how it compiles each kind of Java
pattern.

In addition to fixing the mega-issue scala-js#1201, this commit fixes the
smaller issues scala-js#105, scala-js#1677, scala-js#1847, scala-js#2082 and scala-js#3959, which had been
closed as duplicates of scala-js#1201.
sjrd added a commit to sjrd/scala-js that referenced this issue Apr 25, 2021
Previously, `java.util.regex.Pattern` was implemented without much
concern for correctness wrt. the Java semantics of regular
expressions. Patterns were passed through to the native `RegExp`
with minimal preprocessing.

This could cause several kinds of incompatibilities:

- Throwing `ParseError`s for features not supported by JS regexes,
- Or worse, silently compile with different semantics.

In this commit, we correctly implement virtually all the features
of Java regular expressions by compiling Java patterns down to JS
patterns with the same semantics.

This change introduces a significant code size regression for code
bases that were already using `Pattern` and/or Scala's `Regex`.
Therefore, we went to great lengths to optimize the compiler for
code size, in particular in the default ES 2015 configuration. This
means that some code is not as idiomatic as it could be. The impact
of this commit on a typical output is approximately 65 KB for
fastOpt and 12 KB for fullOpt.

The `README.md` file contains an extensive explanation of the
design of the compiler, and of how it compiles each kind of Java
pattern.

In addition to fixing the mega-issue scala-js#1201, this commit fixes the
smaller issues scala-js#105, scala-js#1677, scala-js#1847, scala-js#2082 and scala-js#3959, which had been
closed as duplicates of scala-js#1201.
sjrd added a commit to sjrd/scala-js that referenced this issue Apr 26, 2021
Previously, `java.util.regex.Pattern` was implemented without much
concern for correctness wrt. the Java semantics of regular
expressions. Patterns were passed through to the native `RegExp`
with minimal preprocessing.

This could cause several kinds of incompatibilities:

- Throwing `ParseError`s for features not supported by JS regexes,
- Or worse, silently compile with different semantics.

In this commit, we correctly implement virtually all the features
of Java regular expressions by compiling Java patterns down to JS
patterns with the same semantics.

This change introduces a significant code size regression for code
bases that were already using `Pattern` and/or Scala's `Regex`.
Therefore, we went to great lengths to optimize the compiler for
code size, in particular in the default ES 2015 configuration. This
means that some code is not as idiomatic as it could be. The impact
of this commit on a typical output is approximately 65 KB for
fastOpt and 12 KB for fullOpt.

The `README.md` file contains an extensive explanation of the
design of the compiler, and of how it compiles each kind of Java
pattern.

In addition to fixing the mega-issue scala-js#1201, this commit fixes the
smaller issues scala-js#105, scala-js#1677, scala-js#1847, scala-js#2082 and scala-js#3959, which had been
closed as duplicates of scala-js#1201.
sjrd added a commit to sjrd/scala-js that referenced this issue Apr 26, 2021
Previously, `java.util.regex.Pattern` was implemented without much
concern for correctness wrt. the Java semantics of regular
expressions. Patterns were passed through to the native `RegExp`
with minimal preprocessing.

This could cause several kinds of incompatibilities:

- Throwing `ParseError`s for features not supported by JS regexes,
- Or worse, silently compile with different semantics.

In this commit, we correctly implement virtually all the features
of Java regular expressions by compiling Java patterns down to JS
patterns with the same semantics.

This change introduces a significant code size regression for code
bases that were already using `Pattern` and/or Scala's `Regex`.
Therefore, we went to great lengths to optimize the compiler for
code size, in particular in the default ES 2015 configuration. This
means that some code is not as idiomatic as it could be. The impact
of this commit on a typical output is approximately 65 KB for
fastOpt and 12 KB for fullOpt.

The `README.md` file contains an extensive explanation of the
design of the compiler, and of how it compiles each kind of Java
pattern.

In addition to fixing the mega-issue scala-js#1201, this commit fixes the
smaller issues scala-js#105, scala-js#1677, scala-js#1847, scala-js#2082 and scala-js#3959, which had been
closed as duplicates of scala-js#1201.
sjrd added a commit to sjrd/scala-js that referenced this issue Apr 26, 2021
Previously, `java.util.regex.Pattern` was implemented without much
concern for correctness wrt. the Java semantics of regular
expressions. Patterns were passed through to the native `RegExp`
with minimal preprocessing.

This could cause several kinds of incompatibilities:

- Throwing `ParseError`s for features not supported by JS regexes,
- Or worse, silently compile with different semantics.

In this commit, we correctly implement virtually all the features
of Java regular expressions by compiling Java patterns down to JS
patterns with the same semantics.

This change introduces a significant code size regression for code
bases that were already using `Pattern` and/or Scala's `Regex`.
Therefore, we went to great lengths to optimize the compiler for
code size, in particular in the default ES 2015 configuration. This
means that some code is not as idiomatic as it could be. The impact
of this commit on a typical output is approximately 65 KB for
fastOpt and 12 KB for fullOpt.

The `README.md` file contains an extensive explanation of the
design of the compiler, and of how it compiles each kind of Java
pattern.

In addition to fixing the mega-issue scala-js#1201, this commit fixes the
smaller issues scala-js#105, scala-js#1677, scala-js#1847, scala-js#2082 and scala-js#3959, which had been
closed as duplicates of scala-js#1201.
sjrd added a commit to sjrd/scala-js that referenced this issue May 2, 2021
Previously, `java.util.regex.Pattern` was implemented without much
concern for correctness wrt. the Java semantics of regular
expressions. Patterns were passed through to the native `RegExp`
with minimal preprocessing.

This could cause several kinds of incompatibilities:

- Throwing `ParseError`s for features not supported by JS regexes,
- Or worse, silently compile with different semantics.

In this commit, we correctly implement virtually all the features
of Java regular expressions by compiling Java patterns down to JS
patterns with the same semantics.

This change introduces a significant code size regression for code
bases that were already using `Pattern` and/or Scala's `Regex`.
Therefore, we went to great lengths to optimize the compiler for
code size, in particular in the default ES 2015 configuration. This
means that some code is not as idiomatic as it could be. The impact
of this commit on a typical output is approximately 65 KB for
fastOpt and 12 KB for fullOpt.

The `README.md` file contains an extensive explanation of the
design of the compiler, and of how it compiles each kind of Java
pattern.

In addition to fixing the mega-issue scala-js#1201, this commit fixes the
smaller issues scala-js#105, scala-js#1677, scala-js#1847, scala-js#2082 and scala-js#3959, which had been
closed as duplicates of scala-js#1201.
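
As a rough illustration of what "compiling Java patterns down to JS patterns" can mean, the sketch below rewrites only the `\Q...\E` quoting construct mentioned at the top of this issue into backslash-escaped literals that a JS `RegExp` would accept. It is not the code from this commit: the class `QuoteRewriteSketch` and its methods are hypothetical names, and the sketch deliberately ignores corner cases such as an escaped backslash directly before `Q`.

```java
// Hypothetical sketch, not the Scala.js implementation.
final class QuoteRewriteSketch {
    // Characters that carry special meaning in a JS regular expression.
    private static final String JS_SPECIAL = "\\^$.|?*+()[]{}/";

    /** Escapes one literal run so that a JS RegExp matches it verbatim. */
    static String escapeLiteral(String literal) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < literal.length(); i++) {
            char c = literal.charAt(i);
            if (JS_SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /**
     * Rewrites every \Q...\E block of a Java pattern into an escaped literal
     * and copies everything else unchanged.
     * Simplification (assumption): a "\Q" preceded by an escaped backslash
     * is not detected, so this is not a full Java pattern parser.
     */
    static String rewriteQuotes(String javaPattern) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < javaPattern.length()) {
            int start = javaPattern.indexOf("\\Q", i);
            if (start < 0) {
                // No more quoted blocks: copy the remainder as-is.
                out.append(javaPattern, i, javaPattern.length());
                break;
            }
            out.append(javaPattern, i, start);
            int end = javaPattern.indexOf("\\E", start + 2);
            if (end < 0) {
                // In Java, an unterminated \Q quotes up to the end of the pattern.
                end = javaPattern.length();
            }
            out.append(escapeLiteral(javaPattern.substring(start + 2, end)));
            i = Math.min(end + 2, javaPattern.length());
        }
        return out.toString();
    }
}
```

For example, `QuoteRewriteSketch.rewriteQuotes("a\\Q1+1=2\\Eb*")` yields the pattern text `a1\+1=2b*`, which uses no construct outside the syntax shared by Java and JS regexes. A real compiler has to do this kind of rewriting for every Java-only feature, which is consistent with the code size cost described above.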
sjrd added a commit to sjrd/scala-js that referenced this issue May 6, 2021
sjrd added a commit to sjrd/scala-js that referenced this issue May 19, 2021
sjrd added a commit to sjrd/scala-js that referenced this issue May 22, 2021
sjrd added a commit to sjrd/scala-js that referenced this issue Jun 9, 2021
sjrd added a commit to sjrd/scala-js that referenced this issue Jun 10, 2021
sjrd added a commit to sjrd/scala-js that referenced this issue Jun 10, 2021
sjrd added a commit to sjrd/scala-js that referenced this issue Jun 14, 2021
sjrd added a commit to sjrd/scala-js that referenced this issue Jun 24, 2021
sjrd added a commit to sjrd/scala-js that referenced this issue Jun 28, 2021
sjrd added a commit to sjrd/scala-js that referenced this issue Jul 6, 2021
sjrd added a commit to sjrd/scala-js that referenced this issue Jul 7, 2021
sjrd closed this as completed in #4455 Jul 8, 2021
sjrd added a commit that referenced this issue Jul 8, 2021
sjrd added the bug label Jul 8, 2021
sjrd added this to the v1.7.0 milestone Jul 8, 2021