-
-
Notifications
You must be signed in to change notification settings - Fork 9.6k
[Intl] Add EmojiTransliterator
to translate emoji to many locales
#46755
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
src/Symfony/Component/Intl/Resources/bin/transliterator-emoji-rules.php
Outdated
Show resolved
Hide resolved
src/Symfony/Component/Intl/Transliterator/EmojiTransliterator.php
Outdated
Show resolved
Hide resolved
src/Symfony/Component/Intl/Resources/bin/transliterator-emoji-rules.php
Outdated
Show resolved
Hide resolved
src/Symfony/Component/Intl/Transliterator/EmojiTransliterator.php
Outdated
Show resolved
Hide resolved
src/Symfony/Component/Intl/Transliterator/EmojiTransliterator.php
Outdated
Show resolved
Hide resolved
src/Symfony/Component/Intl/Tests/Transliterator/EmojiTransliteratorTest.php
Outdated
Show resolved
Hide resolved
Did you want to also provide a map for short names that could work both ways? |
c6c5bb2
to
7cd2a11
Compare
@alamirault you review the PR a bit too soon :) It was a WIP (I pushed in order to backup my work).
That's already the case, or I missed something ? public function testTransliterate()
{
$tr = EmojiTransliterator::getInstance('en');
$this->assertSame(
'a grinning cat, black cat, and a lion go to national park️... smiling face with heart-eyes party popper yellow heart',
$tr->transliterate('a 😺, 🐈⬛, and a 🦁 go to 🏞️... 😍 🎉 💛')
);
} |
src/Symfony/Component/Intl/Transliterator/EmojiTransliterator.php
Outdated
Show resolved
Hide resolved
src/Symfony/Component/Intl/Transliterator/EmojiTransliterator.php
Outdated
Show resolved
Hide resolved
@stof Thanks for the review, I've addressed your comments. I think the PR is now ready. |
ddc8dd7
to
066e257
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks nice :)
I just have minor comments.
In a follow up, it'd be great to add two more locales:
en_github
sourced from https://api.github.com/emojisen_slack
sourced from https://github.com/iamcal/emoji-data
src/Symfony/Component/Intl/Transliterator/EmojiTransliterator.php
Outdated
Show resolved
Hide resolved
src/Symfony/Component/Intl/Transliterator/EmojiTransliterator.php
Outdated
Show resolved
Hide resolved
I missed that point! What do you recommend? we can merge en into en_ca? |
that might increase the size of maps, better run the str_replace twice when needed (with the php-based implem) |
I went this way actually, not much to merge. The PR on your fork is up to date. |
@nicolas-grekas Feel free to merge this PR, and then you can open a new PR to improve the situation if it needs to |
@lyrixx Let's finish this PR before merging it. I don't see why we would merge something that we know is not finished yet. |
Especially when this means bloating the git history :) |
So I think the best way It to merge the PR of @nicolas-grekas (in my PR) (I'll do that in few minutes) So we can discuss about it. But Unfortunately, I don't understand the new code, and I miss some time to dig into it. |
e44e73d
to
205587f
Compare
I talked a bit with @nicolas-grekas on slack, and we decided to share our thought on the subject here. Nikos rewrite all the code to be able to rely on Opcache preloading, in order to increase the performance. To be honest, I don't like this new implementation (that explains my previous^2 comment). So the rest of this comment might be subjective. But I let you judge :) The previous code was easier to understand, maintain and work with (see exception message for instance) 😄 :
About the performance, I wrote this Unit Test in both branch public function testBench()
{
$t = microtime(true);
for ($i=0; $i < 1000; $i++) {
$tr = EmojiTransliterator::create('en'); // I also tried with this line outside the loop. Same result ($tr is cached internally)
$tr->transliterate('un 😺, 🐈⬛, et a 🦁 vont au 🏞️');
}
dump((microtime(true) - $t));
} Here are the results: Previous code: 0.0145s It looks like the previous code is 6 times faster than the new code when using only one transliterator instance. It's legit to me, since the new code does not use the translitator anymore but strtr. But what about many transliterator? To know that, I used the following command that run for all locales
old code 00:01.746 So here the old code is also faster php -v
php -i | grep opca
|
205587f
to
de65d43
Compare
Thanks for challenging my proposal @lyrixx |
My bench: Results with 100 transliterations:
Results with 1000 transliterations:
<?php
use Symfony\Component\Intl\Transliterator\EmojiTransliterator;
require __DIR__.'/vendor/autoload.php';
$tr = EmojiTransliterator::create('en');
//$tr = EmojiTransliterator::getInstance('en');
for ($i=0; $i < 100; $i++) {
$tr->transliterate('un 😺, 🐈<200d>⬛, et a 🦁 vont au 🏞️ ');
}
echo $tr->transliterate('un 😺, 🐈<200d>⬛, et a 🦁 vont au 🏞️ '); |
how would we link AsciiSlugger + EmojiTransliterator? is it desired? |
In a follow up PR, and yes, I think it's desired. |
EmojiTransliterator
to translate emoji to many locales
de65d43
to
55d57ff
Compare
Thank you @lyrixx. |
Today, I used the AsciiSlugger and I was really surprised that it replaced
my emoji with nothing (a blank string). I dig a bit, and I think the best way
to solve that is to add a EmojiTransliterator, and then wire it (another PR)
Current API:
To do that, I built some transliterator rules, and I committed them to the repository.
The sources come from the official data at https://github.com/unicode-org/cldr