-
-
Notifications
You must be signed in to change notification settings - Fork 9.6k
[Intl] Allow compressing emoji and data maps #49970
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
16bb293
to
a4679ac
Compare
Given that dependents of |
Given that zlib is now required to load any data in symfony/intl, it must be declared as a dependency in composer.json (there is no feature that works without it, and existing features now require it to work) |
You might have missed my update @stof: The dependency to ext-zlib is made optional by not-compressing data for the "en" and the "meta" locales, which match our defaults. |
It still does not preserve BC for existing projects that work without zlib and would break when upgrading to 6.3 without notice. If we do that, we should probably have the dependency on zlib in 6.3 and 6.4 (so that projects don't upgrade in the semver range while breaking support for their transliteration or ICU data usage) and make it actually optional only in 7.0 |
I think I'm fine either ways:
Our BC policy doesn't cover this topic so we can decide either way IMHO. I have a slight preference for approach 2. 🤘 /cc @symfony/mergers in case you want to chime in here. |
a4679ac
to
b38fb8d
Compare
I prefer this PR instead of #49947 as having a "full" component for "just" the emoji transliterator looks weird to me. |
I like option 2 (about zlib req). |
Option 1 would not break BC as Another argument for keeping the component split in addition of reducing the size of their data: they don't even have the same platform dependency. EmojiTransliterator depends on ext-intl while other parts of symfony/intl has nothing to do with that extension |
Why not create a new component, but having one just for the emoji tranliterator seems weird to me (too small, too specific). We would need to come up with a more generic component for which the first part would be the emoji transliterator. |
Well, to me, the reason to keep that component focused on the emoji transliterator is precisely the fact that it needs a bunch of data to operate. Mixing other features in the same component would force people to download a big package if they want to use those other features and not the transliteration. Even with the zlib compression, the data is still 4MB. And as reported in #49943, there are environments where the package size is an issue. |
I think, but I could be wrong, Fabien meant to create a dedicated component then |
Quick idea: isn't this a new "data" concept that could live next to bridges, bundles, components and contracts? |
@yguedidi separating the data from the code needing it would create more packages, not less. And it would be more complex to locate the data or would require adding autoloadable PHP classes in those data packages that can return the data (at which point we would have half of the existing symfony/intl code moved there) |
@OskarStark but what else would go there ? The Transliterator class itself is already part of PHP itself. And if we have other special transliterators needing their data, the question is whether projects using one will also use others or no. If the answer is no, putting their data together still leads to wasted space (which is what the issue is about). |
@yguedidi thus, if we have packages meant to expose this data, it means that the structure of this data is now the public API of those packages and must be covered by BC. This is a big no-go to me. Today, the data files in symfony/intl are purely internal (and we changed their format in the past, and the current PR changing it again is something that would be a no-go if they were not internal) |
On my side, I see the theory, but the pragmatic aspect of keeping everything here still wins: splitting 4M out of 11MB doesn't look worth the costs of the split to me. |
Most projects I know have a user base with more than 90% of them using a different locale, which makes it mandatory and therefor my vote goes to option 1: adding zlib as a composer dependency. It would also allow to remove that special treatment by handling all locales the same way and compressing english as well. |
What about:
|
@kevinpapst dealing with multiple locales does not imply that you need the translated language names and country names. |
@stof the value of |
@stof it was an alternative suggestion to somewhat find a middle ground between what you're proposing and Fabien's opinion. If an EmojiTranslator is thought to be too little value for a new component, but we want to have a separate package to avoid lots of unused disk space, let's create a package that adds zero value standalone. |
f4b81dd
to
d339300
Compare
Let's sum up the data sizes in the last few versions:
This PR would reduce the data size to 12M, way less than 6.1 (and the previous versions). To me, this PR is enough to fix the size issue of this component (as I still think that an emoji transliterator is not a big enough feature to warrant a full component). |
Maybe wild idea: what about keeping all the code infrastructure in this PR, but NOT compressing any files? |
What is the reason to not compress? |
Convenience mostly, on two aspects:
|
👍 Do you think it would make sense to do it when installing if
Sounds like it would be a good candidate to use LFS here if the data files are expected to change often, it helps significantly with the Git index size. |
Compressing after downloading would not reduce the bandwidth usage of the download (though I'm not sure how big the zip itself is on both cases) and would require a composer plugin to perform the compression and delete the original files (btw, this would create a mess for installations from source as composer would detect the files as modified) |
The data files will change once a year, when Unicode releases a new version. |
Personally I don't care about that at all, I'm just looking forward to being able to minimize Docker images size. |
About the bandwidth, this is a mistake in the original report: the data is always compressed on the network. Either via ZIP when dist, or git compression when source. |
I updated the PR to ship only the compression+decompression logic, but not the pre-compressed files. |
d339300
to
b541917
Compare
} | ||
|
||
foreach (glob(dirname(__DIR__).'/data/*/*.php') as $file) { | ||
if (in_array(basename($file), ['en.php', 'meta.php'], true)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given that this script requires zlib anyway, is there any benefit to keeping those uncompressed ?
Builder::saveRules(Builder::buildGitHubRules($emojisCodePoints)); | ||
Builder::saveRules(Builder::buildSlackRules($emojisCodePoints)); | ||
Builder::saveRules(Builder::buildStripRules($emojisCodePoints)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why changing the order ?
@@ -192,7 +192,7 @@ public static function saveRules(iterable $rulesByLocale): void | |||
{ | |||
$firstChars = []; | |||
foreach ($rulesByLocale as $filename => $rules) { | |||
file_put_contents(self::TARGET_DIR."/$filename.php", "<?php\n\nreturn ".VarExporter::export($rules).";\n"); | |||
file_put_contents('compress.zlib://'.self::TARGET_DIR."/$filename.php.gz", "<?php\n\nreturn ".VarExporter::export($rules).";\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it expected that you still build them as compressed here ?
@nicolas-grekas can you add tests covering the loading of the compressed data ? Now that the repo does not pre-compress them anymore, this is not tested on CI anymore. |
oups, closed it by mistake. And apparently, github does not want me to re-open it |
…grekas) This PR was merged into the 6.3 branch. Discussion ---------- [Intl] Allow compressing emoji and data maps | Q | A | ------------- | --- | Branch? | 6.3 | Bug fix? | no | New feature? | yes | Deprecations? | no | Tickets | Fix #49943 | License | MIT | Doc PR | - Resuming #49970, which was closed by mistake :) Commits ------- de6a74c [Intl] Allow compressing emoji and data maps
This PR proposes to compress data and emoji maps into
.php.gz
files that are uncompressed on the fly.Compatibility with opcache is preserved so that this will have no impact on performance.
I see this PR as an alternative to #49947.
du -sh src/Symfony/Component/Intl/
:du -sh src/Symfony/Component/Intl/Resources/data/transliterator/
:The dependency to
ext-zlib
is made optional by not-compressing data for the "en" and the "meta" locales. The "en" locale is the default one in our recipes so this should help first timers. A LogicException is throw when using any other locales or when dealing with emojis andext-zlib
is missing.For the record, this is how I compressed all files: