
[String]: ->match() with flag PREG_OFFSET_CAPTURE giving wrong results (i.e. ignoring Unicode) #45032

Closed
@ThomasLandauer


Symfony version(s) affected

5.4.2

Description

When using the flag PREG_OFFSET_CAPTURE with $string->match() on a UnicodeString, the offsets of the matches are reported in bytes, not in characters (graphemes).
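
To make the difference between the units concrete, here is a small sketch using only core PHP. It measures the same string in bytes (strlen()), in Unicode code points (counting `.` matches with the `s` and `u` modifiers) and in extended grapheme clusters (counting `\X` matches). The function name lengths() is hypothetical. The subjects use `\u{...}` escapes so that the precomposed `ö` (U+00F6) and the decomposed form (`o` followed by the combining diaeresis U+0308) are unambiguous.

```php
<?php
// Hypothetical helper (not part of PHP or Symfony): report the length of a
// UTF-8 string in bytes, code points and extended grapheme clusters.
function lengths(string $s): array
{
    return [
        'bytes'      => strlen($s),                  // raw UTF-8 bytes
        'codepoints' => preg_match_all('/./su', $s), // one match per code point
        'graphemes'  => preg_match_all('/\X/u', $s), // one match per user-perceived character
    ];
}

print_r(lengths("\u{00F6}a"));  // precomposed ö: 3 bytes, 2 code points, 2 graphemes
print_r(lengths("o\u{0308}a")); // o + combining diaeresis: 4 bytes, 3 code points, 2 graphemes
```

Only the grapheme count matches what a reader would call "2 characters" in both cases, which is why the report asks for offsets in graphemes.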

This is actually a limitation of PHP's preg_match_all(); I reported it at https://bugs.php.net/bug.php?id=80166. Outcome: they didn't change anything, but they at least documented the status quo ;-)
So I was hoping that Symfony's String component would "fix" it...
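
The underlying PHP behavior can be reproduced without Symfony. A minimal sketch calling preg_match_all() directly shows that, even with the `u` (UTF-8) modifier, the offset stored by PREG_OFFSET_CAPTURE is a byte offset:

```php
<?php
// Plain PHP, no Symfony: PREG_OFFSET_CAPTURE reports byte offsets,
// even when the pattern uses the u (UTF-8) modifier.
$subject = "\u{00F6}a"; // precomposed ö (2 bytes in UTF-8) followed by a

preg_match_all('/a/u', $subject, $matches, PREG_PATTERN_ORDER | PREG_OFFSET_CAPTURE);

// $matches[0] holds the full-pattern matches; each entry is [text, byteOffset]
[$text, $offset] = $matches[0][0];
var_dump($text, $offset); // string(1) "a" and int(2)
```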

How to reproduce

use Symfony\Component\String\UnicodeString;

$string = new UnicodeString('öa');
// PREG_PATTERN_ORDER is only there to make Symfony use preg_match_all() instead of preg_match()
$result = $string->match('/a/', \PREG_PATTERN_ORDER | \PREG_OFFSET_CAPTURE);
dd($result); // dd() is provided by symfony/var-dumper

Actual result (in the innermost array):

1 => 2 // `ö` takes 2 bytes in UTF-8, so `a` is reported at offset 2

Expected result:

1 => 1 // `ö` should count as 1 Unicode character (grapheme), so `a` would be at offset 1
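
Until offsets are reported in graphemes, one possible workaround is to convert the byte offsets after matching. The sketch below assumes UTF-8 input and uses a hypothetical helper, byteOffsetToGraphemeOffset(), which counts the extended grapheme clusters (`\X`) in the bytes that precede each match. It is written against the plain array produced by preg_match_all(); that it fits the array returned by Symfony's match() for every flag combination is an assumption, not something verified here.

```php
<?php
// Hypothetical helper (not part of PHP or Symfony): convert a byte offset
// produced by PREG_OFFSET_CAPTURE into an offset counted in graphemes.
function byteOffsetToGraphemeOffset(string $subject, int $byteOffset): int
{
    if ($byteOffset < 0) {
        return $byteOffset; // -1 marks a group that did not participate; keep it
    }
    // Count extended grapheme clusters (\X) in the bytes before the match.
    $count = preg_match_all('/\X/u', substr($subject, 0, $byteOffset));
    if ($count === false) {
        throw new InvalidArgumentException('Prefix is not valid UTF-8');
    }
    return $count;
}

$subject = "\u{00F6}a"; // the string from the report (precomposed ö)
preg_match_all('/a/u', $subject, $matches, PREG_PATTERN_ORDER | PREG_OFFSET_CAPTURE);

// Rewrite every [text, byteOffset] pair in place.
foreach ($matches as &$group) {
    foreach ($group as &$match) {
        $match[1] = byteOffsetToGraphemeOffset($subject, $match[1]);
    }
    unset($match); // break the inner reference
}
unset($group); // break the outer reference

var_dump($matches[0][0][1]); // int(1)
```

If offsets in code points rather than graphemes are wanted, mb_strlen(substr($subject, 0, $byteOffset), 'UTF-8') is the analogous conversion (it requires the mbstring extension).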

