-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
DOC: Recommend using conda-forge for scikit-learn installation (#30130) #30193
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DOC: Recommend using conda-forge for scikit-learn installation (#30130) #30193
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be okay with a slight nudge for users to use conda-forge
, but I don't think the language should be this strong.
Note that we completely support pypi/wheels and issues regarding installing wheels are never considered second class. There are also many packages which only exist on pypi
and therefore the user would have to have a mix anyway.
Also, the conda
/mamba
ecosystem is extremely heavy compared to using a system python with venv
or uv
or any similar toolset dealing with wheels. There are many people who use wheel based toolsets and they don't, and shouldn't have an issue with them as far as we are concerned.
So I think the change here should be minimal.
While we're at it, I think mentions to downloads from Anaconda should be removed since we don't want people to get into legal issues for downloading anaconda distros.
Note on the diff here itself, please keep the lines at 88 characters max.
cc @scikit-learn/core-devs @scikit-learn/documentation-team
+1 on everything @adrinjalali said
|
I also agree that the wording can be softer. I think there is value in having people understand the difference between the model of PyPI and the model of conda-forge and have them separate them from the various package managers which exist. Also Anaconda is disjoint from conda-forge, and it is worth that users understand that. |
To add a bit of context/reference to this PR below a screenshot of the install page after I selected MacOs and changed from We already point people to miniforge/conda-forge when they select conda. So for me there isn't anything to do on that front. The conda instructions already point you to the things we think people should be using. One question that is worth discussing is what the default selected option should be: pip or conda? I am not sure there is a good answer to this question, I can think of reasons for each being the default. Given that we support both at the same level and there are pros and cons to each it seems like we will not converge. Or that if we were to have this discussion again in three months time we will converge on a different answer. For me this means it isn't worth changing. I'd rather see an improvement to the miniforge page we link to. Right now that looks pretty off putting to the average user and me. What should I be clicking? Am I about to install some weird thing that will hack me because it is made by five people in a shed? Improving this might mean making a nicer landing page for miniforge or finding out that one already exists and improving our link. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the line of what Adrin said, could you:
- Revert the formatting to use 88 characters
- Only mention that we recommend using conda package from conda-forge for reasons that are given in external resources such as PyPackaging Native
- Remove all the comprehensive pieces of information given on this page (about PyPI). This page must remain short IMO.
?
Thank you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the detailed suggestions, @jjerphan.
I will:
- Reformat the text to comply with the 88-character line limit.
- Simplify the installation instructions to briefly mention that we recommend using conda packages from conda-forge, and refer users to external resources like PyPackaging Native for more in-depth explanations.
- Remove the extensive information about PyPI to keep the page concise, as per your recommendation.
I appreciate your guidance and will update the PR to reflect these changes.
We already point people to miniforge/conda-forge when they select conda. So for me there isn't anything to do on that front.
People get lost. The overall ecosystem and narration conflate conda main channels and conda forge.
Can we point to mamba? I don't realize if it would lead to a clearer message and a more suitable configuration by default?
|
As if to prove the point that it is confusing: miniforge installs mamba (and conda) and uses conda-forge as the default channel. I am not sure that mentioning mamba explicitly will improve things, because it introduces yet another tool "no one has ever heard of" (assuming that the average reader of the install page probably hasn't heard of conda either). I concede that the way the docs are written right now we do nothing to help people get conda/andaconda/conda-forge/mamba/etc untangled in their head. But I think if people just blindly follow the instructions they will get the right answer (using conda-forge). They might even be using mamba by default, but I am not sure (we'd need an actual expert to confirm this). |
I agree that the state of the scientific python projects' distribution is quite tremendously unreadable at first. Mentioning using mamba in place of conda is one simple option as mamba (since 2.0) warns if the |
As if to prove the point that it is confusing: miniforge installs mamba (and conda) and uses conda-forge as the default channel.
OK
I am not sure that mentioning mamba explicitly will improve things, because it introduces yet another tool "no one has ever heard of"
Fair enough, but the title of the tab is "conda". Maybe rename it to "conda/mamba"?
|
Thank you for your thorough review, @adrinjalali. I appreciate your feedback. I understand that the current wording may be too strong in favor of conda-forge. I'll revise the language to present both installation methods—using pip and conda—as equally supported options. I'll ensure that we provide a gentle recommendation without discouraging users who prefer PyPI/wheels. I'll also remove any content that might suggest PyPI installations are less supported and will shorten the explanations to keep the documentation concise. Regarding the mentions of downloads from Anaconda, I will remove those to avoid any potential legal issues. Lastly, I apologize for exceeding the 88-character line limit. I'll reformat the text to comply with the project's style guidelines. Thank you again for your valuable input. I'll update the PR accordingly. |
I am not sure I understand what people would like to see changed and what the problem is that we are trying to solve. From the title of the PR and the text in the linked issue it sounds like the goal is to recommend conda-forge for people who choose the conda/mamba way of installing scikit-learn. We already do that. We do not advertise it/evangelise for it, but the default way we show results in people using conda-forge. This means we are not steering users towards doing something that might land them in trouble in terms of licensing. I am not sure scikit-learn should be in the business of telling people how to install scikit-learn. (1) I think there are reasonable arguments to be made for using wheels and for using conda packages. And even for using your OS level package manager. There are also arguments for not using wheels or conda or your OS's package manager. (2) I think the install page should get people "off the page" as quickly as possible, so they can go back to doing what they really want to be doing. This means it has to be concise, visually clear and easy to follow. (3) scikit-learn supports both methods as first class citizens. Lecturing the reader about what they should and shouldn't do is unhelpful if they are trying to quickly find the information and move on in life. For me there are two possible changes to be discussed:
|
+1 for not getting involved in the business of pros and cons of different possibilities of installation. We just support several.
-1
+0 What we might add is
@Maddouchhamza I would recommend to not make changes to this PR until it becomes clear what to change. |
Just directing attention to a special part of the proposed formulation: A formulation like "For most users we recommend ...[because of issues otherwise]" sends many people who could be interested in contributing off right away. It sounds like you either need to know this one way or you'll be in terrible trouble. Keep in mind that in data science there are many career changers, who might not be aware of what conda and the like are. We don't need to put too many details (author-led release, administrator permissions) in the main text, maybe in a separate section which a heading that makes it clear it is additional information that doesn't need to be digested in order to continue with the installation. Personally, the current introduction "There are different ways to install scikit-learn" has led me to try install scikit-learn the only way that I knew back then (pyenv). "For most users" and "strongly recommend" in bold letters sounds terribly exclusive. |
Was this a good thing or not? As in, do you wish we had given more opinionated advice so that you didn't end up using pyenv? |
Oh no, the contrary: Am saying that communicating openness / various ways to archive a thing includes more people. This is my whole point. |
Thank you all for your detailed reviews and thoughtful input on this pull request. Here’s a summary of the key feedback points:
Proposed Solution: Based on the above feedback, here’s my plan for revising the pull request:
Next Steps: I will proceed with these changes and update the pull request shortly. Please let me know if there are any additional points I should consider or if you have further suggestions. Thank you all again for your time and valuable feedback! |
48e7140
to
2852127
Compare
…implified content
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the detailed feedback, @jjerphan and @adrinjalali!
I’ve made the following updates based on your reviews:
- Reverted text formatting to adhere to the 88-character line limit.
- Softened the language recommending
conda-forge
to ensure a balanced tone. - Simplified the explanations to avoid overwhelming users.
- Removed mentions of Anaconda as requested.
Please let me know if there are additional adjustments needed. Thank you again for your guidance!
Hi @Maddouchhamza, you have entirely reorganized the page (which is not necessarily bad), but this goes well beyond the original scope of this PR and of the associated issue. I am fine if the scope of this PR is explicitly repurposed as an entire reorganization of the page, otherwise I would just go with minimal change. Also, I would let other people involved in the project phrase their opinion and make decision for the project; I have little ownership on the documentation whilst other do (I am thinking of @scikit-learn/documentation-team), and I think everyone would benefit from having more people authoring decisive reviews. 🙂 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pinging @jjerphan and thanks for the work @Maddouchhamza. As said this PR is currently including to many changes and is a bit hard to review. I would suggest the following:
- Limit the scope of this PR to reorganizing only the Installing the latest release section so that (hopefully) the diff is easier to read. I actually like the changes in the Third-party distributions of scikit-learn because the original version is too verbose, but it might be better to move that part into a different PR.
- Please revert unnecessary changes to rst anchors, tab synchronization, heading levels, CSS styles, etc. which breaks formatting of the page, unless you have specific reasons for doing so. I would suggest focusing only on contents and keep the changes minimal.
- Please stick to
..include:: ./install_instructions_conda.rst
for reusing conda instructions, as determined in DOC tweak appearance of installation instructions #29160 (comment). Changes can be made in that file if needed. - Please do not remove the dependency table
.. include:: min_dependency_table.rst
and the warning about minimum supported Python versions. I believe those are still relevant today.
Sorry @Maddouchhamza you found yourself in a seemingly not too hard but actually quite tricky issue, this does happen sometimes 😅. I would recommend closing this PR as this seems in a too complicated state, mixing too many topics (and maybe too many people if you ask me 🥲). Pick a single change that seems not controversial, for example pointing to conda-forge rather than Anaconda and start a new PR with this single change. You should aim for changing say 10 lines max 1. While you are working on this simpler PR, you will get more familiar with scikit-learn contribution. Once this simpler PR is merged you be in a better position to create other smaller PRs modifying the install page incrementally. Now my personal take on the original issue #30130 (optional reading): IMO, when people go to the install page, they want to quickly find a command that looks familiar, execute it and get running with scikit-learn. What they definitely don't want is having to learn more about the complicated state of Python packaging ... Personally I am a conda user for historical reasons and to avoid some of the pip limitations that were already discussed 2. I would be fine nudging slightly towards conda-forge (vs pip) if there is a way to do this with a diff changing roughly 10 lines max 3. If this is not possible with a 10 lines change, let's forget about it, because honestly this seems exactly the kind of issue that takes too much work for what it brings. Do I find it slightly sad, maybe, but at one point you have to be strong enough and cut your losses 💪 🧠. Also using the github issue tracker rather than the doc may be a better place to raise awareness of these tricky packaging issues. Footnotes
|
@jjerphan @Charlie-XIAO @lesteve, Thank you all for your detailed feedback and thoughtful suggestions! After reviewing the comments, I’ve decided to follow the recommendations to simplify the changes. I will close this PR and open a new one with a more focused scope. Specifically, the new PR will address pointing to Thank you for your patience and guidance—I appreciate your support in helping me learn and contribute effectively! Looking forward to your feedback on the new PR. |
Reference Issues/PRs
Fixes #30130
What does this implement/fix? Explain your changes.
This pull request updates the
install.rst
documentation to recommend usingconda-forge
for installing scikit-learn. The changes include:Emphasizing
conda-forge
as the preferred installation method: Updated the installation instructions to prioritizeconda-forge
, highlighting its advantages in dependency management and cross-platform compatibility.Providing detailed instructions for installing via
conda-forge
: Added step-by-step guidance for users to install scikit-learn usingconda
with theconda-forge
channel, including environment setup and activation.Clarifying the limitations of PyPI wheels for scientific packages: Included notes explaining why installing scikit-learn via PyPI (
pip
) can lead to issues due to dependency conflicts and lack of integration testing.Removing redundancies and improving clarity: Restructured the installation guide to avoid repetition, making it clearer and more user-friendly.
These updates address the concerns raised in issue #30130 about installation problems when using PyPI wheels and aim to provide users with a more reliable installation experience.
Any other comments?
I have rebuilt the documentation locally using
make doc
and verified that the changes display correctly without any warnings or errors. Please let me know if there are any suggestions or improvements I can make to this contribution.