[Intl][Emoji] Arrows instead of emoji (CLDR hierarchical ↑↑↑) #53116
Comments
Top 100 emojis according to the first click-bait site I found: 😂 ❤️ 🤣 👍 😭 🙏 😘 🥰 😍 😊
/cc @lyrixx ;)
I started to look at it, and I have some news :) Almost everything concerns the Resources/emoji/build.php script.

1/ The merge algorithm with the parent locales does not work, because there is an intermediate level in $mapsByLocale (keyed by mb_strlen). Since the merge is not recursive, we lose data there.

foreach ($results as $result) {
// ...
$codePointsCount = mb_strlen($emoji);
$mapsByLocale[$locale][$codePointsCount][$emoji] = $name;
}
// ...
$maps += $mapsByLocale[$parentLocale] ?? [];

2/ The arrows can simply be ignored 😃 So something like this should be enough:

// 263A FE0F ; fully-qualified # ☺️ E0.6 smiling face
preg_match('{^(?<codePoints>[\w ]+) +; [\w-]+ +# (?<emoji>.+) E\d+\.\d+ ?(?<name>.+)$}Uu', $line, $matches);
if (!$matches) {
throw new \DomainException("Could not parse line: \"$line\".");
}
+if (str_contains($matches['name'], '↑')) {
+ continue;
+}

3/ Some code points were added "manually", and they interfered with the build.
That created 2 code points for the same emoji, yet we need to build a 1:1 map in the end. This caused all the warnings during the build, due to false-negative tests.

4/ Good news: JSON repository
I played a bit with the script and discovered there is a CLDR-JSON repository, which is much easier to manipulate during the build. One locale file could be generated like this (once the data is loaded):

foreach (self::getCldrAnnotations() as $locale => $data) {
$rules = array_map(fn (array $data) => $data['tts'][0] ?? null, $data);
$localeRules[$locale] = [...$localeRules[$locale] ?? [], ...array_filter($rules)];
}

And seeing how complete it is and how simple it is to parse/reuse, I think we should use it to build the Intl data too.

5/ I also looked a bit more deeply into the derived annotations (75% of the emoji data weight) ... and if we accept just a "second pass" when we build the transliterator map, we could almost make it seamless. Something like:

// BEFORE
'👨🏻❤💋👨🏼' => 'bisou : homme, homme, peau claire et peau moyennement claire',
'👨🏻❤💋👨🏽' => 'bisou : homme, homme, peau claire et peau légèrement mate',
'👨🏻❤💋👨🏾' => 'bisou : homme, homme, peau claire et peau mate',
'👨🏻❤💋👨🏿' => 'bisou : homme, homme, peau claire et peau foncée',
'👨🏼❤💋👨🏻' => 'bisou : homme, homme, peau moyennement claire et peau claire',
'👨🏼❤💋👨🏼' => 'bisou : homme, homme et peau moyennement claire',
'👨🏼❤💋👨🏽' => 'bisou : homme, homme, peau moyennement claire et peau légèrement mate',
'👨🏼❤💋👨🏾' => 'bisou : homme, homme, peau moyennement claire et peau mate',
'👨🏼❤💋👨🏿' => 'bisou : homme, homme, peau moyennement claire et peau foncée',
// AFTER
'👨🏻❤💋👨🏼' => '💋: 👨🏻, 👨🏻, 🏼 et 🏼',
'👨🏻❤💋👨🏽' => '💋: 👨🏻, 👨🏻, 🏼 et 🏽',
'👨🏻❤💋👨🏾' => '💋: 👨🏻, 👨🏻, 🏼 et 🏾',
'👨🏻❤💋👨🏿' => '💋: 👨🏻, 👨🏻, 🏼 et 🏿',
'👨🏼❤💋👨🏻' => '💋: 👨🏻, 👨🏻, 🏼 et 🏼',
'👨🏼❤💋👨🏼' => '💋: 👨🏻, 👨🏻 et 🏼',
'👨🏼❤💋👨🏽' => '💋: 👨🏻, 👨🏻, 🏼 et 🏽',
'👨🏼❤💋👨🏾' => '💋: 👨🏻, 👨🏻, 🏼 et 🏾',
'👨🏼❤💋👨🏿' => '💋: 👨🏻, 👨🏻, 🏼 et 🏿',
'👨🏽❤💋👨🏻' => '💋: 👨🏻, 👨🏻, 🏽 et 🏼',
'👨🏽❤💋👨🏼' => '💋: 👨🏻, 👨🏻, 🏽 et 🏼',

As they are "modifiers", we may not even need them at all and could build that map following the specs!

I'll work on it tomorrow (at least the first part, to fix the bug currently impacting ... around half the world's population 😅). But if someone wants to do all (or some) of that today, feel really, really free to start without me :)
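To illustrate the "second pass" idea from point 5/, here is a minimal sketch. It assumes the derived annotations are stored as compact templates that reference component emojis, and that a base map of component names is available. The function name `expandDerivedAnnotation`, the template shape, and the `$baseNames` map are hypothetical and not part of build.php.

```php
<?php

/**
 * Hypothetical second pass: expands a compact template into a full description
 * by replacing every component emoji with its name from the base map.
 */
function expandDerivedAnnotation(string $template, array $baseNames): string
{
    // strtr() with an array tries the longest keys first and never
    // re-scans text it has already replaced.
    return strtr($template, $baseNames);
}

// Assumed base map (French names taken from the BEFORE listing above).
$baseNames = [
    '💋' => 'bisou',
    '👨' => 'homme',
    '🏻' => 'peau claire',
    '🏼' => 'peau moyennement claire',
];

echo expandDerivedAnnotation('💋 : 👨, 👨, 🏻 et 🏼', $baseNames), "\n";
// prints "bisou : homme, homme, peau claire et peau moyennement claire"
```

The trade-off would be a smaller generated file at the cost of one extra replacement step when the transliterator map is built.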
…add missing data) (smnandre)

This PR was squashed before being merged into the 6.4 branch.

Discussion
----------

[Intl] [Emoji] Fix emoji files (remove wrong characters / add missing data)

| Q             | A
| ------------- | ---
| Branch?       | 6.4
| Bug fix?      | yes
| New feature?  | no
| Deprecations? | no
| Issues        | Fix #53116
| License       | MIT

Fix two things
* unwanted characters (`↑↑↑`) instead of expected translations
* merging between child and parent locales

Also adapted the code to maintain a shared / reproducible order in the generated files.

Commits
-------

e9fcff5 [Intl] [Emoji] Fix emoji files (remove wrong characters / add missing data)
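The two fixes and the stable ordering described in this PR can be sketched roughly as follows. This is not the merged code: `buildMaps`, its `[locale, emoji, name]` input rows, and the `$parents` map are assumptions made for illustration. Only the `↑` check and the grouping by `mb_strlen` come from the snippets discussed in the thread.

```php
<?php

/**
 * Hypothetical helper: builds [locale][codePointsCount][emoji] => name maps.
 *
 * @param array $rows    list of [locale, emoji, name] triples (assumed input shape)
 * @param array $parents child locale => parent locale (assumed input shape)
 */
function buildMaps(array $rows, array $parents): array
{
    $mapsByLocale = [];
    foreach ($rows as [$locale, $emoji, $name]) {
        $mapsByLocale[$locale] ??= [];
        // Fix 1: "↑↑↑" means "inherit from the parent locale"; it is not a name.
        if (str_contains($name, '↑')) {
            continue;
        }
        $mapsByLocale[$locale][mb_strlen($emoji)][$emoji] = $name;
    }

    $merged = [];
    foreach ($mapsByLocale as $locale => $maps) {
        // Fix 2: merge level by level while walking up the parent chain.
        // A plain `$maps += $parentMaps` only compares the top-level keys
        // (the code point counts), so inherited emojis would be dropped.
        for ($parent = $parents[$locale] ?? null; null !== $parent; $parent = $parents[$parent] ?? null) {
            foreach ($mapsByLocale[$parent] ?? [] as $count => $parentMap) {
                $maps[$count] = ($maps[$count] ?? []) + $parentMap;
            }
        }

        // Keep a reproducible order in the generated data.
        krsort($maps);
        foreach ($maps as &$map) {
            ksort($map);
        }
        unset($map);

        $merged[$locale] = $maps;
    }

    return $merged;
}
```

With rows where `fr_CA` carries `↑↑↑` for an emoji that `fr` defines, the `fr_CA` map ends up with the `fr` name instead of the arrows, while names that `fr_CA` defines itself are kept.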
Symfony version(s) affected
6.4,7.0,7.1
Description
Hierarchical metadata markers (the characters ↑↑↑) are copied into the resource files, leading to arrows being inserted in the texts.
I took a list of "top 100 emojis" to see what proportion was affected and went through all the xx_YY files.
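A check like the one described above could look like this minimal sketch. It assumes each locale resource is available as a plain emoji => name array. The helper name `findArrowEntries` and the sample data are made up for illustration and do not reflect the real files.

```php
<?php

/**
 * Hypothetical helper: returns the emojis from $emojis whose name in $map
 * still contains the "↑" inheritance marker.
 */
function findArrowEntries(array $map, array $emojis): array
{
    return array_values(array_filter(
        $emojis,
        fn (string $emoji): bool => str_contains($map[$emoji] ?? '', '↑'),
    ));
}

// Made-up sample data, not taken from the real resource files.
$map = ['😂' => '↑↑↑', '👍' => 'pouce levé'];
$top = ['😂', '👍', '🙏'];

$affected = findArrowEntries($map, $top);
printf("%d of %d emojis affected\n", count($affected), count($top));
```

Running this over every xx_YY map with the same top list would give the proportion of affected emojis per locale.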
How to reproduce
Obviously it impacts the Slugger with Emoji
Possible Solution
Something to adapt in the files generation.
Additional Context
No response