Closed
Labels
bug: Something isn't working
Description
It seems that the longestCommonSubstring algorithm does not handle Unicode characters properly:
longestCommonSubstr('𐌵𐌵**ABC', '𐌵𐌵--ABC') === '𐌵𐌵'
// whereas the longest one should be ABC (in terms of number of code points)
// Number of code points:
[...'𐌵𐌵'].length === 2
[...'ABC'].length === 3
// Number of UTF-16 code units (what String.prototype.length counts):
'𐌵𐌵'.length === 4
'ABC'.length === 3
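It may be worth adding a note to the algorithm about this. The problem can occur whenever the strings contain characters outside the Basic Multilingual Plane (BMP), i.e. code points greater than 0xFFFF. JavaScript stores each such character as a surrogate pair of two UTF-16 code units, so an algorithm that indexes the string by code unit counts it twice.
Feel free to close the issue whenever you want. The aim was just to flag the problem in case you want to patch it in some way.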
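For illustration, here is a minimal sketch of one possible code-point-aware variant. It is not the repository's implementation: the name longestCommonSubstrByCodePoint is hypothetical, and the dynamic-programming approach is just the standard one applied to arrays of code points instead of UTF-16 code units. It still ignores grapheme clusters and Unicode normalization.

```javascript
// Hypothetical helper, not part of the repository: measures substrings in
// code points rather than UTF-16 code units.
function longestCommonSubstrByCodePoint(a, b) {
  // Spreading a string iterates by code point, so characters outside the
  // BMP stay whole instead of being split into surrogate halves.
  const s1 = [...a];
  const s2 = [...b];

  // prev[j] holds the length of the common suffix ending at s1[i - 2] and
  // s2[j - 1]; curr[j] holds the same for s1[i - 1].
  let prev = new Array(s2.length + 1).fill(0);
  let best = 0; // length of the best match so far, in code points
  let bestEnd = 0; // exclusive end index of that match in s1

  for (let i = 1; i <= s1.length; i += 1) {
    const curr = new Array(s2.length + 1).fill(0);
    for (let j = 1; j <= s2.length; j += 1) {
      if (s1[i - 1] === s2[j - 1]) {
        curr[j] = prev[j - 1] + 1;
        if (curr[j] > best) {
          best = curr[j];
          bestEnd = i;
        }
      }
    }
    prev = curr;
  }

  return s1.slice(bestEnd - best, bestEnd).join('');
}
```

With this sketch, longestCommonSubstrByCodePoint('𐌵𐌵**ABC', '𐌵𐌵--ABC') returns 'ABC', because '𐌵𐌵' now counts as two units rather than four.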
JXWJS