Regex differences #1201
Reported not to work: |
The XRegExp library could be used to narrow the gap between JavaScript and Java regular expressions. I can see a case being made either way:
To me option 2 sounds better for Scala developers. I see that XRegExp is tiny and compiles down to native JS regexps, which sounds great, but there might be drawbacks I can't think of. Maybe a compilation option to produce XRegExp regexps at first would be a good idea to experiment with it in the context of Scala.js. |
The issue with using something as |
A Scala.js SBT flag could control that though: the default would be the plain JS regex, and switching the flag on would change the engine to XRegExp. I think that argument is easily addressed. However, I'm actually not advocating that approach. I imagine that using XRegExp will enable more regex features (like look-behinds), but will it get us all the way to full Scala-regex compatibility? If not, then we're back where we started, except now we have two incomplete regex implementations and the same problem to be solved. Although, if a library (big or small) exists that will enable complete 100% compatibility with Scala regex, then this dual engine thing might be a great idea. |
An sbt setting can't do that. But a separate library can, by overriding the The only way to enable 100% compatibility is probably to reimplement a complete regex engine in Scala.js. |
I don't have a definitive opinion on the topic, but here is some food for thought:
|
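I think I have never written down my take on this (although I have discussed it with @sjrd).
1. Implement a regex parser in Scala.js.
2. Translate Java regex constructs to JS constructs where we can, fail where we can't. This would all happen at regex "compilation" time (i.e. inside `Pattern.compile`).
3. In a later step, provide a separate artifact that can be linked in which provides full regex support.
This would allow most projects to use Java regexes safely without a big size overhead. |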
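To make the "translate where we can, fail where we can't" idea concrete, here is a minimal sketch of such a step. It is not the Scala.js implementation: `ToyTranslator`, `translate` and `quoteForJS` are hypothetical names, the sketch only rewrites Java's `\Q...\E` quoting into escaped literals, and it simply rejects atomic groups as an example of a construct it chooses not to handle. It also does not track character classes, so it is only meant to show the shape of the approach.

```scala
import java.util.regex.PatternSyntaxException

object ToyTranslator {
  // Characters that need a backslash to be literal in a JS RegExp source.
  private val jsSpecial = "\\^$.|?*+()[]{}/"

  // Escape every special character so the text matches literally.
  private def quoteForJS(literal: String): String =
    literal.map(c => if (jsSpecial.indexOf(c) >= 0) "\\" + c else c.toString).mkString

  /** Translates a Java pattern to a JS pattern source, or throws. */
  def translate(javaPattern: String): String = {
    val out = new StringBuilder
    var i = 0
    while (i < javaPattern.length) {
      if (javaPattern.startsWith("\\Q", i)) {
        // Java quoting: everything up to \E (or end of pattern) is literal.
        val end = javaPattern.indexOf("\\E", i + 2)
        val stop = if (end < 0) javaPattern.length else end
        out ++= quoteForJS(javaPattern.substring(i + 2, stop))
        i = if (end < 0) stop else end + 2
      } else if (javaPattern.startsWith("(?>", i)) {
        // Assumption of this sketch: fail at "compile" time instead of
        // passing the construct through to the JS engine.
        throw new PatternSyntaxException(
          "atomic groups are not handled by this sketch", javaPattern, i)
      } else if (javaPattern.charAt(i) == '\\' && i + 1 < javaPattern.length) {
        // Copy any other escape sequence unchanged.
        out ++= javaPattern.substring(i, i + 2)
        i += 2
      } else {
        out += javaPattern.charAt(i)
        i += 1
      }
    }
    out.toString
  }
}
```

For instance, `ToyTranslator.translate("a\\Q.*\\Eb")` yields the JS source `a\.\*b`, while `ToyTranslator.translate("(?>a)")` throws a `PatternSyntaxException` rather than producing a pattern with different semantics.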
we should probably document this on the website, or did I miss it? |
I didn't find anything about this, so please ignore me if I just missed it: regexes can behave differently even when they match the same thing, namely in the positions reported for groups. Minimal example: `import scala.util.matching.Regex` followed by `new Regex("a.*(b)c").findFirstMatchIn("abbc").get.start(1)` returns

I guess the reason is some kind of indexOf hack to determine the position of groups (assuming that this information is not available in JS). |
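For reference, a small sketch of the two ways of computing the group position. On the JVM, the engine reports that group 1 starts at index 2, because the greedy `.*` keeps the first `b`. The second value shows what a search-based lookup would produce; that such a lookup is what Scala.js did is only the guess made above, and `GroupStartDemo` and `guessedStart` are hypothetical names.

```scala
import scala.util.matching.Regex

object GroupStartDemo {
  val input = "abbc"
  val m: Regex.Match = new Regex("a.*(b)c").findFirstMatchIn(input).get

  // Position reported by the regex engine itself: 2 on the JVM.
  val engineStart: Int = m.start(1)

  // Hypothetical fallback: look for the group's text inside the whole match.
  // It finds the first "b" and therefore yields 1.
  val guessedStart: Int = m.start + m.matched.indexOf(m.group(1))
}
```

The two values differ whenever the group's text also occurs earlier in the match, which is exactly the situation in the minimal example.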
And here it is:
"sound behaviour" is an exaggeration here ;) |
Note to self: This is one case we cannot solve with the regex translator approach. |
ECMAScript 2018 contains some new |
hm - how difficult can it be to reimplement java's regexps? |
It's not difficult. Regexes are based on well-known theory. But it requires a lot of code, and we don't want to bundle that much code in the .js of virtually any Scala.js code base. In addition, using the native regex engine is more efficient. The regex engine is probably the one thing in JS for which native code actually outperforms hand-written JS code that is JIT'ed. |
I have a small case where Scala-JS (see ScalaFiddle) behaves differently than the JVM (tested in Ammonite):

```scala
// JVM
println("(ab)|(a)".r.replaceAllIn("ab", "[1=$1][2=$2]"))
// prints [1=ab][2=]

// JS
println("(ab)|(a)".r.replaceAllIn("ab", "[1=$1][2=$2]"))
// prints [1=ab][2=null]
```

My naive guess is that the |
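The JVM side of this can be reproduced directly with `java.util.regex`. The sketch below (with the hypothetical object name `UnmatchedGroupDemo`) shows that a group that did not participate in the match is reported as `null`, that `replaceAll` substitutes nothing for it, and that plain string concatenation of such a `null` renders as the text `null`. The last point is only an illustration of how the JS output could arise, not a confirmed diagnosis.

```scala
import java.util.regex.Pattern

object UnmatchedGroupDemo {
  private val pattern = Pattern.compile("(ab)|(a)")

  // Group 2 does not participate when "ab" is matched by the first alternative.
  val group2: String = {
    val m = pattern.matcher("ab")
    m.find()
    m.group(2) // null
  }

  // The replacement machinery inserts nothing for an unmatched group.
  val replaced: String = pattern.matcher("ab").replaceAll("[1=$1][2=$2]")

  // Naive concatenation turns the null reference into the text "null".
  val naive: String = "[2=" + group2 + "]"
}
```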
That sounds plausible, indeed. I think we can fix this, yes. Would you mind filing a separate bug report for this one? |
FTR, I have started working on this (1. and 2.) as a week-end project. My (very) wip branch is at master...sjrd:almost-correct-regex. It relies on |
This is timely. I recently had some Java regexes failing on Scala.js. The
reasons I identified are:
- lack of support for "atomic groups" [1]
- lack of support for unicode character classes [2]
The latter are inconsistent between Java and JavaScript, by the way. For
example, "Alnum" is supported in Java but not in JavaScript.
…-Erik
[1] https://www.regular-expressions.info/atomic.html
[2] https://tc39.es/ecma262/#sec-runtime-semantics-unicodematchproperty-p
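As a reference point for the two features mentioned in this message, here is how they behave on the JVM. This is only a sketch of the Java semantics, with a hypothetical object name; it says nothing about how a translation to JS would be done.

```scala
import java.util.regex.Pattern

object JvmOnlyFeaturesDemo {
  // Atomic group: once `(?>bc|b)` has matched "bc", the engine may not
  // backtrack into the group to try the shorter alternative "b".
  val atomicMatchesAbc: Boolean = Pattern.matches("a(?>bc|b)c", "abc")   // false
  val atomicMatchesAbcc: Boolean = Pattern.matches("a(?>bc|b)c", "abcc") // true

  // The same pattern with a plain non-capturing group can backtrack.
  val plainMatchesAbc: Boolean = Pattern.matches("a(?:bc|b)c", "abc")    // true

  // POSIX-style class name accepted inside Java's \p{...}.
  val alnumRun: Option[String] = "\\p{Alnum}+".r.findFirstIn("abc123!")  // Some("abc123")
}
```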
On Sun, Mar 14, 2021 at 9:32 AM Sébastien Doeraene ***@***.***> wrote:
I think I have never written down my take on this (although I have
discussed it with @sjrd <https://github.com/sjrd>).
1. Implement a regex parser in Scala.js.
2. Translate Java regex constructs to JS constructs where we can, fail where we can't. This would all happen at regex "compilation" time. (i.e. inside `Pattern.compile`).
3. In a later step, provide a separate artifact that can be linked in which provides full regex support.
This would allow most projects to use Java regexes safely without a big
size overhead.
FTR, I have started working on this (1. and 2.) as a week-end project. My
(very) wip branch is at master...sjrd:almost-correct-regex.
It relies on u support from the JS RegExp (ES2015) for most things, and
some advanced features even rely on look-behinds and/or \p{} Unicode
character classes (ES2018).
|
WiP PR: #4455 |
Previously, `java.util.regex.Pattern` was implemented without much concern for correctness wrt. the Java semantics of regular expressions. Patterns were passed through to the native `RegExp` with minimal preprocessing. This could cause several kinds of incompatibilities:

- Throwing `ParseError`s for features not supported by JS regexes,
- Or worse, silently compile with different semantics.

In this commit, we correctly implement virtually all the features of Java regular expressions by compiling Java patterns down to JS patterns with the same semantics.

This change introduces a significant code size regression for code bases that were already using `Pattern` and/or Scala's `Regex`. Therefore, we went to great lengths to optimize the compiler for code size, in particular in the default ES 2015 configuration. This means that some code is not as idiomatic as it could be. The impact of this commit on a typical fullOpt output is approximately 12 KB.

The source of `PatternCompiler` contains an extensive explanation of the design of the compiler, and of how it compiles each kind of Java pattern.

In addition to fixing the mega-issue scala-js#1201, this commit fixes the smaller issues scala-js#105, scala-js#1677, scala-js#1847, scala-js#2082 and scala-js#3959, which had been closed as duplicates of scala-js#1201.
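As an illustration of the "silently compile with different semantics" category, consider `$`. In Java, without `MULTILINE`, `$` also matches just before a final line terminator, whereas a JavaScript `RegExp` without the `m` flag only matches `$` at the very end of the input, so passing `a$` through unchanged would change the result for the input below. The sketch shows the Java behaviour only; the object name is hypothetical.

```scala
import java.util.regex.Pattern

object DollarSemanticsDemo {
  // Java: `$` matches before the trailing "\n", so this finds a match.
  val dollarBeforeFinalNewline: Boolean =
    Pattern.compile("a$").matcher("a\n").find() // true

  // `\z` is the Java construct that matches only at the very end of input.
  val onlyAtVeryEnd: Boolean =
    Pattern.compile("a\\z").matcher("a\n").find() // false
}
```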
Previously, `java.util.regex.Pattern` was implemented without much concern for correctness wrt. the Java semantics of regular expressions. Patterns were passed through to the native `RegExp` with minimal preprocessing. This could cause several kinds of incompatibilities:

- Throwing `ParseError`s for features not supported by JS regexes,
- Or worse, silently compile with different semantics.

In this commit, we correctly implement virtually all the features of Java regular expressions by compiling Java patterns down to JS patterns with the same semantics.

This change introduces a significant code size regression for code bases that were already using `Pattern` and/or Scala's `Regex`. Therefore, we went to great lengths to optimize the compiler for code size, in particular in the default ES 2015 configuration. This means that some code is not as idiomatic as it could be. The impact of this commit on a typical output is approximately 65 KB for fastOpt and 12 KB for fullOpt.

The `README.md` file contains an extensive explanation of the design of the compiler, and of how it compiles each kind of Java pattern.

In addition to fixing the mega-issue scala-js#1201, this commit fixes the smaller issues scala-js#105, scala-js#1677, scala-js#1847, scala-js#2082 and scala-js#3959, which had been closed as duplicates of scala-js#1201.
Previously, `java.util.regex.Pattern` was implemented without much concern for correctness wrt. the Java semantics of regular expressions. Patterns were passed through to the native `RegExp` with minimal preprocessing. This could cause several kinds of incompatibilities: - Throwing `ParseError`s for features not supported by JS regexes, - Or worse, silently compile with different semantics. In this commit, we correctly implement virtually all the features of Java regular expressions by compiling Java patterns down to JS patterns with the same semantics. This change introduces a significant code size regression for code bases that were already using `Pattern` and/or Scala's `Regex`. Therefore, we went to great lengths to optimize the compiler for code size, in particular in the default ES 2015 configuration. This means that some code is not as idiomatic as it could be. The impact of this commit on a typical output is approximately 65 KB for fastOpt and 12 KB for fullOpt. The `README.md` file contains an extensive explanation of the design of the compiler, and of how it compiles each kind of Java pattern. In addition to fixing the mega-issue scala-js#1201, this commit fixes the smaller issues scala-js#105, scala-js#1677, scala-js#1847, scala-js#2082 and scala-js#3959, which had been closed as duplicates of scala-js#1201.
Previously, `java.util.regex.Pattern` was implemented without much concern for correctness wrt. the Java semantics of regular expressions. Patterns were passed through to the native `RegExp` with minimal preprocessing. This could cause several kinds of incompatibilities: - Throwing `ParseError`s for features not supported by JS regexes, - Or worse, silently compile with different semantics. In this commit, we correctly implement virtually all the features of Java regular expressions by compiling Java patterns down to JS patterns with the same semantics. This change introduces a significant code size regression for code bases that were already using `Pattern` and/or Scala's `Regex`. Therefore, we went to great lengths to optimize the compiler for code size, in particular in the default ES 2015 configuration. This means that some code is not as idiomatic as it could be. The impact of this commit on a typical output is approximately 65 KB for fastOpt and 12 KB for fullOpt. The `README.md` file contains an extensive explanation of the design of the compiler, and of how it compiles each kind of Java pattern. In addition to fixing the mega-issue scala-js#1201, this commit fixes the smaller issues scala-js#105, scala-js#1677, scala-js#1847, scala-js#2082 and scala-js#3959, which had been closed as duplicates of scala-js#1201.
Previously, `java.util.regex.Pattern` was implemented without much concern for correctness wrt. the Java semantics of regular expressions. Patterns were passed through to the native `RegExp` with minimal preprocessing. This could cause several kinds of incompatibilities: - Throwing `ParseError`s for features not supported by JS regexes, - Or worse, silently compile with different semantics. In this commit, we correctly implement virtually all the features of Java regular expressions by compiling Java patterns down to JS patterns with the same semantics. This change introduces a significant code size regression for code bases that were already using `Pattern` and/or Scala's `Regex`. Therefore, we went to great lengths to optimize the compiler for code size, in particular in the default ES 2015 configuration. This means that some code is not as idiomatic as it could be. The impact of this commit on a typical output is approximately 65 KB for fastOpt and 12 KB for fullOpt. The `README.md` file contains an extensive explanation of the design of the compiler, and of how it compiles each kind of Java pattern. In addition to fixing the mega-issue scala-js#1201, this commit fixes the smaller issues scala-js#105, scala-js#1677, scala-js#1847, scala-js#2082 and scala-js#3959, which had been closed as duplicates of scala-js#1201.
Previously, `java.util.regex.Pattern` was implemented without much concern for correctness wrt. the Java semantics of regular expressions. Patterns were passed through to the native `RegExp` with minimal preprocessing. This could cause several kinds of incompatibilities: - Throwing `ParseError`s for features not supported by JS regexes, - Or worse, silently compile with different semantics. In this commit, we correctly implement virtually all the features of Java regular expressions by compiling Java patterns down to JS patterns with the same semantics. This change introduces a significant code size regression for code bases that were already using `Pattern` and/or Scala's `Regex`. Therefore, we went to great lengths to optimize the compiler for code size, in particular in the default ES 2015 configuration. This means that some code is not as idiomatic as it could be. The impact of this commit on a typical output is approximately 65 KB for fastOpt and 12 KB for fullOpt. The `README.md` file contains an extensive explanation of the design of the compiler, and of how it compiles each kind of Java pattern. In addition to fixing the mega-issue scala-js#1201, this commit fixes the smaller issues scala-js#105, scala-js#1677, scala-js#1847, scala-js#2082 and scala-js#3959, which had been closed as duplicates of scala-js#1201.
Previously, `java.util.regex.Pattern` was implemented without much concern for correctness wrt. the Java semantics of regular expressions. Patterns were passed through to the native `RegExp` with minimal preprocessing. This could cause several kinds of incompatibilities: - Throwing `ParseError`s for features not supported by JS regexes, - Or worse, silently compile with different semantics. In this commit, we correctly implement virtually all the features of Java regular expressions by compiling Java patterns down to JS patterns with the same semantics. This change introduces a significant code size regression for code bases that were already using `Pattern` and/or Scala's `Regex`. Therefore, we went to great lengths to optimize the compiler for code size, in particular in the default ES 2015 configuration. This means that some code is not as idiomatic as it could be. The impact of this commit on a typical output is approximately 65 KB for fastOpt and 12 KB for fullOpt. The `README.md` file contains an extensive explanation of the design of the compiler, and of how it compiles each kind of Java pattern. In addition to fixing the mega-issue scala-js#1201, this commit fixes the smaller issues scala-js#105, scala-js#1677, scala-js#1847, scala-js#2082 and scala-js#3959, which had been closed as duplicates of scala-js#1201.
Previously, `java.util.regex.Pattern` was implemented without much concern for correctness wrt. the Java semantics of regular expressions. Patterns were passed through to the native `RegExp` with minimal preprocessing. This could cause several kinds of incompatibilities: - Throwing `ParseError`s for features not supported by JS regexes, - Or worse, silently compile with different semantics. In this commit, we correctly implement virtually all the features of Java regular expressions by compiling Java patterns down to JS patterns with the same semantics. This change introduces a significant code size regression for code bases that were already using `Pattern` and/or Scala's `Regex`. Therefore, we went to great lengths to optimize the compiler for code size, in particular in the default ES 2015 configuration. This means that some code is not as idiomatic as it could be. The impact of this commit on a typical output is approximately 65 KB for fastOpt and 12 KB for fullOpt. The `README.md` file contains an extensive explanation of the design of the compiler, and of how it compiles each kind of Java pattern. In addition to fixing the mega-issue scala-js#1201, this commit fixes the smaller issues scala-js#105, scala-js#1677, scala-js#1847, scala-js#2082 and scala-js#3959, which had been closed as duplicates of scala-js#1201.
Previously, `java.util.regex.Pattern` was implemented without much concern for correctness wrt. the Java semantics of regular expressions. Patterns were passed through to the native `RegExp` with minimal preprocessing. This could cause several kinds of incompatibilities: - Throwing `ParseError`s for features not supported by JS regexes, - Or worse, silently compile with different semantics. In this commit, we correctly implement virtually all the features of Java regular expressions by compiling Java patterns down to JS patterns with the same semantics. This change introduces a significant code size regression for code bases that were already using `Pattern` and/or Scala's `Regex`. Therefore, we went to great lengths to optimize the compiler for code size, in particular in the default ES 2015 configuration. This means that some code is not as idiomatic as it could be. The impact of this commit on a typical output is approximately 65 KB for fastOpt and 12 KB for fullOpt. The `README.md` file contains an extensive explanation of the design of the compiler, and of how it compiles each kind of Java pattern. In addition to fixing the mega-issue scala-js#1201, this commit fixes the smaller issues scala-js#105, scala-js#1677, scala-js#1847, scala-js#2082 and scala-js#3959, which had been closed as duplicates of scala-js#1201.
Previously, `java.util.regex.Pattern` was implemented without much concern for correctness wrt. the Java semantics of regular expressions. Patterns were passed through to the native `RegExp` with minimal preprocessing. This could cause several kinds of incompatibilities: - Throwing `ParseError`s for features not supported by JS regexes, - Or worse, silently compile with different semantics. In this commit, we correctly implement virtually all the features of Java regular expressions by compiling Java patterns down to JS patterns with the same semantics. This change introduces a significant code size regression for code bases that were already using `Pattern` and/or Scala's `Regex`. Therefore, we went to great lengths to optimize the compiler for code size, in particular in the default ES 2015 configuration. This means that some code is not as idiomatic as it could be. The impact of this commit on a typical output is approximately 65 KB for fastOpt and 12 KB for fullOpt. The `README.md` file contains an extensive explanation of the design of the compiler, and of how it compiles each kind of Java pattern. In addition to fixing the mega-issue scala-js#1201, this commit fixes the smaller issues scala-js#105, scala-js#1677, scala-js#1847, scala-js#2082 and scala-js#3959, which had been closed as duplicates of scala-js#1201.
Previously, `java.util.regex.Pattern` was implemented without much concern for correctness wrt. the Java semantics of regular expressions. Patterns were passed through to the native `RegExp` with minimal preprocessing. This could cause several kinds of incompatibilities: - Throwing `ParseError`s for features not supported by JS regexes, - Or worse, silently compile with different semantics. In this commit, we correctly implement virtually all the features of Java regular expressions by compiling Java patterns down to JS patterns with the same semantics. This change introduces a significant code size regression for code bases that were already using `Pattern` and/or Scala's `Regex`. Therefore, we went to great lengths to optimize the compiler for code size, in particular in the default ES 2015 configuration. This means that some code is not as idiomatic as it could be. The impact of this commit on a typical output is approximately 65 KB for fastOpt and 12 KB for fullOpt. The `README.md` file contains an extensive explanation of the design of the compiler, and of how it compiles each kind of Java pattern. In addition to fixing the mega-issue scala-js#1201, this commit fixes the smaller issues scala-js#105, scala-js#1677, scala-js#1847, scala-js#2082 and scala-js#3959, which had been closed as duplicates of scala-js#1201.
Previously, `java.util.regex.Pattern` was implemented without much concern for correctness wrt. the Java semantics of regular expressions. Patterns were passed through to the native `RegExp` with minimal preprocessing. This could cause several kinds of incompatibilities: - Throwing `ParseError`s for features not supported by JS regexes, - Or worse, silently compile with different semantics. In this commit, we correctly implement virtually all the features of Java regular expressions by compiling Java patterns down to JS patterns with the same semantics. This change introduces a significant code size regression for code bases that were already using `Pattern` and/or Scala's `Regex`. Therefore, we went to great lengths to optimize the compiler for code size, in particular in the default ES 2015 configuration. This means that some code is not as idiomatic as it could be. The impact of this commit on a typical output is approximately 65 KB for fastOpt and 12 KB for fullOpt. The `README.md` file contains an extensive explanation of the design of the compiler, and of how it compiles each kind of Java pattern. In addition to fixing the mega-issue scala-js#1201, this commit fixes the smaller issues scala-js#105, scala-js#1677, scala-js#1847, scala-js#2082 and scala-js#3959, which had been closed as duplicates of scala-js#1201.
Previously, `java.util.regex.Pattern` was implemented without much concern for correctness wrt. the Java semantics of regular expressions. Patterns were passed through to the native `RegExp` with minimal preprocessing. This could cause several kinds of incompatibilities: - Throwing `ParseError`s for features not supported by JS regexes, - Or worse, silently compile with different semantics. In this commit, we correctly implement virtually all the features of Java regular expressions by compiling Java patterns down to JS patterns with the same semantics. This change introduces a significant code size regression for code bases that were already using `Pattern` and/or Scala's `Regex`. Therefore, we went to great lengths to optimize the compiler for code size, in particular in the default ES 2015 configuration. This means that some code is not as idiomatic as it could be. The impact of this commit on a typical output is approximately 65 KB for fastOpt and 12 KB for fullOpt. The `README.md` file contains an extensive explanation of the design of the compiler, and of how it compiles each kind of Java pattern. In addition to fixing the mega-issue scala-js#1201, this commit fixes the smaller issues scala-js#105, scala-js#1677, scala-js#1847, scala-js#2082 and scala-js#3959, which had been closed as duplicates of scala-js#1201.
Previously, `java.util.regex.Pattern` was implemented without much concern for correctness wrt. the Java semantics of regular expressions. Patterns were passed through to the native `RegExp` with minimal preprocessing. This could cause several kinds of incompatibilities: - Throwing `ParseError`s for features not supported by JS regexes, - Or worse, silently compile with different semantics. In this commit, we correctly implement virtually all the features of Java regular expressions by compiling Java patterns down to JS patterns with the same semantics. This change introduces a significant code size regression for code bases that were already using `Pattern` and/or Scala's `Regex`. Therefore, we went to great lengths to optimize the compiler for code size, in particular in the default ES 2015 configuration. This means that some code is not as idiomatic as it could be. The impact of this commit on a typical output is approximately 65 KB for fastOpt and 12 KB for fullOpt. The `README.md` file contains an extensive explanation of the design of the compiler, and of how it compiles each kind of Java pattern. In addition to fixing the mega-issue scala-js#1201, this commit fixes the smaller issues scala-js#105, scala-js#1677, scala-js#1847, scala-js#2082 and scala-js#3959, which had been closed as duplicates of scala-js#1201.
Previously, `java.util.regex.Pattern` was implemented without much concern for correctness wrt. the Java semantics of regular expressions. Patterns were passed through to the native `RegExp` with minimal preprocessing. This could cause several kinds of incompatibilities: - Throwing `ParseError`s for features not supported by JS regexes, - Or worse, silently compile with different semantics. In this commit, we correctly implement virtually all the features of Java regular expressions by compiling Java patterns down to JS patterns with the same semantics. This change introduces a significant code size regression for code bases that were already using `Pattern` and/or Scala's `Regex`. Therefore, we went to great lengths to optimize the compiler for code size, in particular in the default ES 2015 configuration. This means that some code is not as idiomatic as it could be. The impact of this commit on a typical output is approximately 65 KB for fastOpt and 12 KB for fullOpt. The `README.md` file contains an extensive explanation of the design of the compiler, and of how it compiles each kind of Java pattern. In addition to fixing the mega-issue scala-js#1201, this commit fixes the smaller issues scala-js#105, scala-js#1677, scala-js#1847, scala-js#2082 and scala-js#3959, which had been closed as duplicates of scala-js#1201.
Previously, `java.util.regex.Pattern` was implemented without much concern for correctness wrt. the Java semantics of regular expressions. Patterns were passed through to the native `RegExp` with minimal preprocessing. This could cause several kinds of incompatibilities: - Throwing `ParseError`s for features not supported by JS regexes, - Or worse, silently compile with different semantics. In this commit, we correctly implement virtually all the features of Java regular expressions by compiling Java patterns down to JS patterns with the same semantics. This change introduces a significant code size regression for code bases that were already using `Pattern` and/or Scala's `Regex`. Therefore, we went to great lengths to optimize the compiler for code size, in particular in the default ES 2015 configuration. This means that some code is not as idiomatic as it could be. The impact of this commit on a typical output is approximately 65 KB for fastOpt and 12 KB for fullOpt. The `README.md` file contains an extensive explanation of the design of the compiler, and of how it compiles each kind of Java pattern. In addition to fixing the mega-issue scala-js#1201, this commit fixes the smaller issues scala-js#105, scala-js#1677, scala-js#1847, scala-js#2082 and scala-js#3959, which had been closed as duplicates of scala-js#1201.
Previously, `java.util.regex.Pattern` was implemented without much concern for correctness wrt. the Java semantics of regular expressions. Patterns were passed through to the native `RegExp` with minimal preprocessing. This could cause several kinds of incompatibilities: - Throwing `ParseError`s for features not supported by JS regexes, - Or worse, silently compile with different semantics. In this commit, we correctly implement virtually all the features of Java regular expressions by compiling Java patterns down to JS patterns with the same semantics. This change introduces a significant code size regression for code bases that were already using `Pattern` and/or Scala's `Regex`. Therefore, we went to great lengths to optimize the compiler for code size, in particular in the default ES 2015 configuration. This means that some code is not as idiomatic as it could be. The impact of this commit on a typical output is approximately 65 KB for fastOpt and 12 KB for fullOpt. The `README.md` file contains an extensive explanation of the design of the compiler, and of how it compiles each kind of Java pattern. In addition to fixing the mega-issue scala-js#1201, this commit fixes the smaller issues scala-js#105, scala-js#1677, scala-js#1847, scala-js#2082 and scala-js#3959, which had been closed as duplicates of scala-js#1201.
Previously, `java.util.regex.Pattern` was implemented without much concern for correctness wrt. the Java semantics of regular expressions. Patterns were passed through to the native `RegExp` with minimal preprocessing. This could cause several kinds of incompatibilities: - Throwing `ParseError`s for features not supported by JS regexes, - Or worse, silently compile with different semantics. In this commit, we correctly implement virtually all the features of Java regular expressions by compiling Java patterns down to JS patterns with the same semantics. This change introduces a significant code size regression for code bases that were already using `Pattern` and/or Scala's `Regex`. Therefore, we went to great lengths to optimize the compiler for code size, in particular in the default ES 2015 configuration. This means that some code is not as idiomatic as it could be. The impact of this commit on a typical output is approximately 65 KB for fastOpt and 12 KB for fullOpt. The `README.md` file contains an extensive explanation of the design of the compiler, and of how it compiles each kind of Java pattern. In addition to fixing the mega-issue scala-js#1201, this commit fixes the smaller issues scala-js#105, scala-js#1677, scala-js#1847, scala-js#2082 and scala-js#3959, which had been closed as duplicates of scala-js#1201.
Previously, `java.util.regex.Pattern` was implemented without much concern for correctness wrt. the Java semantics of regular expressions. Patterns were passed through to the native `RegExp` with minimal preprocessing. This could cause several kinds of incompatibilities: - Throwing `ParseError`s for features not supported by JS regexes, - Or worse, silently compile with different semantics. In this commit, we correctly implement virtually all the features of Java regular expressions by compiling Java patterns down to JS patterns with the same semantics. This change introduces a significant code size regression for code bases that were already using `Pattern` and/or Scala's `Regex`. Therefore, we went to great lengths to optimize the compiler for code size, in particular in the default ES 2015 configuration. This means that some code is not as idiomatic as it could be. The impact of this commit on a typical output is approximately 65 KB for fastOpt and 12 KB for fullOpt. The `README.md` file contains an extensive explanation of the design of the compiler, and of how it compiles each kind of Java pattern. In addition to fixing the mega-issue scala-js#1201, this commit fixes the smaller issues scala-js#105, scala-js#1677, scala-js#1847, scala-js#2082 and scala-js#3959, which had been closed as duplicates of scala-js#1201.
(Continuing discussion from #1198)
There are differences between how regex behaves on the JVM vs generated JS. Some things shouldn't be too hard to fix, others will be near impossible without either creative hacks or switching to a bundled regex engine. It would also be beneficial to have a document somewhere stating the differences so that devs are aware. If you don't object, let's use this issue to at least start compiling a list of differences and see what (if anything) to do about it.
Off the top of my head we've got:
- Look-behinds (`(?<=)`, `(?<!)`). JS regex doesn't support this.
- `Pattern.quote` uses `\Q` and `\E`, which don't work in JS.
- Unicode category classes, e.g. `/[\p{S}\p{P}]/`.

I'll add more as I think of them.
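To check how a given JS engine treats the constructs listed above, a small probe along these lines could help. This is only a sketch: the helper name `compiles` and the probe patterns are made up for illustration, and the accepted/rejected results depend on the engine (newer engines accept look-behinds, and `\p{...}` with the `u` flag, which older ones reject). The last part shows the "silently different" case: without the `u` flag, `\Q` and `\E` are read as the plain letters `Q` and `E`.

```typescript
// Hypothetical helper: true if the engine accepts the pattern,
// false if constructing the RegExp throws.
function compiles(source: string, flags: string = ""): boolean {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    return false;
  }
}

// Illustrative probes: [label, pattern source, flags].
const probes: Array<[string, string, string]> = [
  ["look-behind (?<=)", "(?<=a)b", ""],
  ["negative look-behind (?<!)", "(?<!a)b", ""],
  ["[\\p{S}\\p{P}] with the u flag", "[\\p{S}\\p{P}]", "u"],
];

probes.forEach(function (probe) {
  const accepted = compiles(probe[1], probe[2]);
  console.log(probe[0] + ": " + (accepted ? "accepted" : "rejected"));
});

// Without the u flag, \Q and \E are not quoting markers in JS:
// they are identity escapes for the letters Q and E.
const quoted = new RegExp("\\Qa.b\\E");
console.log(quoted.test("a.b"));   // false
console.log(quoted.test("Qa.bE")); // true
```

A pattern that is rejected at least fails loudly; the `\Q..\E` case is the more dangerous kind, because it compiles and then matches something else.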
A semi-fix for `Pattern.quote` would be to use a regex escaping function from a mature JS library; Google Closure's is short and sweet. Handling `\Q..\E`, on the other hand, would be harder.

Funny story about `[\p{S}\p{P}]`: I actually needed that in my JS, so I wrote a little converter for that kind of regex so that it would work in Scala.js. I replaced it with:

`[\u0021-\u002f\u003a-\u0040\\[-\u0060\u007b-\u007e\u00a1-\u00a9\u00ab\u00ac\u00ae-\u00b1\u00b4\u00b6-\u00b8\u00bb\u00bf\u00d7\u00f7\u02c2-\u02c5\u02d2-\u02df\u02e5-\u02eb\u02ed\u02ef-\u02ff\u0375\u037e\u0384\u0385\u0387\u03f6\u0482\u055a-\u055f\u0589\u058a\u058f\u05be\u05c0\u05c3\u05c6\u05f3\u05f4\u0606-\u060f\u061b\u061e\u061f\u066a-\u066d\u06d4\u06de\u06e9\u06fd\u06fe\u0700-\u070d\u07f6-\u07f9\u0830-\u083e\u085e\u0964\u0965\u0970\u09f2\u09f3\u09fa\u09fb\u0af0\u0af1\u0b70\u0bf3-\u0bfa\u0c7f\u0d79\u0df4\u0e3f\u0e4f\u0e5a\u0e5b\u0f01-\u0f17\u0f1a-\u0f1f\u0f34\u0f36\u0f38\u0f3a-\u0f3d\u0f85\u0fbe-\u0fc5\u0fc7-\u0fcc\u0fce-\u0fda\u104a-\u104f\u109e\u109f\u10fb\u1360-\u1368\u1390-\u1399\u1400\u166d\u166e\u169b\u169c\u16eb-\u16ed\u1735\u1736\u17d4-\u17d6\u17d8-\u17db\u1800-\u180a\u1940\u1944\u1945\u19de-\u19ff\u1a1e\u1a1f\u1aa0-\u1aa6\u1aa8-\u1aad\u1b5a-\u1b6a\u1b74-\u1b7c\u1bfc-\u1bff\u1c3b-\u1c3f\u1c7e\u1c7f\u1cc0-\u1cc7\u1cd3\u1fbd\u1fbf-\u1fc1\u1fcd-\u1fcf\u1fdd-\u1fdf\u1fed-\u1fef\u1ffd\u1ffe\u2010-\u2027\u2030-\u205e\u207a-\u207e\u208a-\u208e\u20a0-\u20ba\u2100\u2101\u2103-\u2106\u2108\u2109\u2114\u2116-\u2118\u211e-\u2123\u2125\u2127\u2129\u212e\u213a\u213b\u2140-\u2144\u214a-\u214d\u214f\u2190-\u23f3\u2400-\u2426\u2440-\u244a\u249c-\u24e9\u2500-\u26ff\u2701-\u2775\u2794-\u2b4c\u2b50-\u2b59\u2ce5-\u2cea\u2cf9-\u2cfc\u2cfe\u2cff\u2d70\u2e00-\u2e2e\u2e30-\u2e3b\u2e80-\u2e99\u2e9b-\u2ef3\u2f00-\u2fd5\u2ff0-\u2ffb\u3001-\u3004\u3008-\u3020\u3030\u3036\u3037\u303d-\u303f\u309b\u309c\u30a0\u30fb\u3190\u3191\u3196-\u319f\u31c0-\u31e3\u3200-\u321e\u322a-\u3247\u3250\u3260-\u327f\u328a-\u32b0\u32c0-\u32fe\u3300-\u33ff\u4dc0-\u4dff\ua490-\ua4c6\ua4fe\ua4ff\ua60d-\ua60f\ua673\ua67e\ua6f2-\ua6f7\ua700-\ua716\ua720\ua721\ua789\ua78a\ua828-\ua82b\ua836-\ua839\ua874-\ua877\ua8ce\ua8cf\ua8f8-\ua8fa\ua92e\ua92f\ua95f\ua9c1-\ua9cd\ua9de\ua9df\uaa5c-\uaa5f\uaa77-\uaa79\uaade\uaadf\uaaf0\uaaf1\uabeb\ufb29\ufbb2-\ufbc1\ufd3e\ufd3f\ufdfc\ufdfd\ufe10-\ufe19\ufe30-\ufe52\ufe54-\ufe66\ufe68-\ufe6b\uff01-\uff0f\uff1a-\uff20\uff3b-\uff40\uff5b-\uff65\uffe0-\uffe6\uffe8-\uffee\ufffc\ufffd]`
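Coming back to the `Pattern.quote` semi-fix, here is a rough sketch of what I have in mind. The names `escapeRegex` and `expandQuotes` are made up for illustration, and the escaping function is the common "escape every metacharacter" idiom rather than Google Closure's actual code. The second function is a naive take on `\Q..\E`: it rewrites each quoted section into its escaped form and, as an assumption about the Java behaviour, treats an unterminated `\Q` as quoting to the end of the pattern. It does not try to handle quoting inside character classes, which is part of why the general case is harder.

```typescript
// Escapes every character that is special in a JS regex (outside
// character classes), so the result matches the input literally.
function escapeRegex(literal: string): string {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Naive \Q..\E expansion: replaces each quoted section with its
// escaped equivalent and copies everything else unchanged.
function expandQuotes(pattern: string): string {
  let out = "";
  let i = 0;
  while (i < pattern.length) {
    if (pattern.slice(i, i + 2) === "\\Q") {
      const end = pattern.indexOf("\\E", i + 2);
      // Assumption: an unterminated \Q quotes to the end of the pattern.
      const stop = end === -1 ? pattern.length : end;
      out += escapeRegex(pattern.slice(i + 2, stop));
      i = end === -1 ? pattern.length : end + 2;
    } else if (pattern[i] === "\\" && i + 1 < pattern.length) {
      // Copy an escape pair as-is, so an escaped backslash followed
      // by Q is not mistaken for the start of a quoted section.
      out += pattern.slice(i, i + 2);
      i += 2;
    } else {
      out += pattern[i];
      i += 1;
    }
  }
  return out;
}

console.log(escapeRegex("a.b"));            // a\.b
console.log(expandQuotes("a\\Q.*\\Eb"));    // a\.\*b
console.log(new RegExp(expandQuotes("a\\Q.*\\Eb")).test("a.*b")); // true
```

With something like this, a `Pattern.quote` replacement could return `escapeRegex(s)` directly instead of wrapping `s` in `\Q` and `\E`.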