Closed
Description
The following code:
<?php
// Percent-encoded CSV row; %C4%81 is the UTF-8 byte sequence for "ā" (U+0101).
$str = 'Total%20M%C4%81ori%2C31.5%2C33.3%2C31.8%2C33%2C36.4%2C33.2%2C33.2';
$rawstr = rawurldecode($str);
var_dump(
    // Detection with the same three candidates in different orders.
    mb_detect_encoding($rawstr, ['UTF-8', 'ISO-8859-1', 'WINDOWS-1251']),
    mb_detect_encoding($rawstr, ['ISO-8859-1', 'WINDOWS-1251', 'UTF-8']),
    mb_detect_encoding($rawstr, ['WINDOWS-1251', 'UTF-8', 'ISO-8859-1']),
    // Validity of the decoded bytes in each candidate encoding.
    mb_check_encoding($rawstr, 'ISO-8859-1'),
    mb_check_encoding($rawstr, 'UTF-8'),
    mb_check_encoding($rawstr, 'WINDOWS-1251'),
);
Resulted in this output:
string(12) "Windows-1251"
string(12) "Windows-1251"
string(12) "Windows-1251"
bool(true)
bool(true)
bool(true)
But I expected this output instead:
string(5) "UTF-8"
string(5) "UTF-8"
string(5) "UTF-8"
bool(false)
bool(true)
bool(false)
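Since the actual output above shows the decoded string passing mb_check_encoding() for all three candidates, one possible workaround is to skip the heuristic detection and take the first candidate that validates, listing UTF-8 first. The sketch below is only an illustration: first_valid_encoding() is a hypothetical helper, not part of mbstring, and the input is a shortened version of the string from this report.

```php
<?php
// Hypothetical helper (not an mbstring function): return the first
// candidate encoding in which $s is valid, honouring the caller's order.
function first_valid_encoding(string $s, array $candidates): ?string
{
    foreach ($candidates as $encoding) {
        if (mb_check_encoding($s, $encoding)) {
            return $encoding;
        }
    }
    return null; // no candidate accepted the byte sequence
}

// Shortened sample input; %C4%81 decodes to the UTF-8 bytes for "ā".
$rawstr = rawurldecode('Total%20M%C4%81ori%2C31.5');

// UTF-8 is listed first, so it is returned whenever the bytes are valid UTF-8.
var_dump(first_valid_encoding($rawstr, ['UTF-8', 'ISO-8859-1', 'WINDOWS-1251']));
```

Order matters with this approach: as the output above shows, the single-byte encodings also accept this byte sequence, so placing ISO-8859-1 or WINDOWS-1251 ahead of UTF-8 would return that encoding instead.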
Related issues
- mb_detect_encoding() results for UTF-7 differ between PHP 8.0 and 8.1 (if UTF-7 is present in the encodings list and the string contains '+' character) #10192
- mb_detect_encoding() detects UTF-8 emoji byte sequence as ISO-8859-1 since PHP 8.1 #7871
- Improve mb_detect_encoding's recognition of Slavic names #8439
- wrong mb_detect_encoding since php8.1 for very simple utf-8 strings #10481
- mb_detect_encoding does not return the first matching encoding anymore #8279
PHP Version
PHP 8.1.22
Operating System
No response