-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
MNT Replace PDF build by ZIP of the HTML #17564
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hey @rth @thomasjpfan! Do you mind to review this PR (the log confirms that the ZIP of the HTML is properly uploaded and this link provides the downloadable file? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder how much running optipng reduces the size of the zip file...
Nice, thanks @alfaro96 . Since we're not gonna have a ton of these files laying around, I don't think we'd need to worry about the size too much, since it requires cpu cycles to compress them. |
Generally we have an issue with our deployment docs repo size right now: I tried to clone it and at 6% it was at 3.7GB. Because we store the all files for each commit. Even with a clone at So I think anything we can do to reduce the size of files we put there could be worth doing, even at the cost of some CPU time. Though maybe compressing files in previous releases could be a start. The PDF was 48MB in size the ZIP of the PDF is 75MB, so not that different but still a bit more. |
The actual size of the ZIP file is 75MB (1.56 times the size of the PDF file). Running If repository size is a problem, I think that compressing the images with |
According to my calculations, running |
Great. How long does it take? We should probably optimize png and create this Zip in the deploy job not in the job that is building docs (as it runs only for commits on master and only once per commit, unlike the build docs job). Edit: opened #17568 for additional things we could try. |
Compressing the images takes 6:02 (using a docker container with the same hardware resources than CircleCI). I think that is quite affordable.
IIUC, I think that we can compress the images in the scikit-learn/build_tools/circle/build_doc.sh Lines 117 to 120 in a13ead2
are executed under the same condition than the deploy job scikit-learn/.circleci/config.yml Lines 129 to 133 in a13ead2
I have committed these changes to optimize the images ( Thanks @rth! |
Would it be appropriate to use git lfs to at least keep the git out of the server? Or should we look into less append-only usage of the repo? |
git lfs doesn't seem to work with github pages git-lfs/git-lfs#791 |
Should we move on with this PR? (#18151 solves the case insensitive filesystems) |
Okay, lets try to move this forward. The zipped file ends up to have a nested folder structure: For the future, we can work around the github pages being huge as follows:
This way we have a backup I suspect this is what matplotlib does with its devdocs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Just wanted to know if @thomasjpfan has any other comments before merging this PR?
I have merged with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since there aren't any HTML versions for the older versions, the links for the older PDF versions are not generated by build_tools/circle/list_versions.py
. I think we should continue to link to the PDF versions for <=0.24 and then have the HTML versions linked for >=1.0.0. (Unfortunately, this means having two code paths in list_versions.py
depending on the version.
You are absolutely right. I think that the current solution fixes the problem. |
The generated
Note that the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Reference Issues/PRs
Closes #17051.
What does this implement/fix? Explain your changes.
This PR replaces the build of PDF documentation by a ZIP of the HTML for the offline documentation.