Skip to content

DOC: Recommend using conda-forge for scikit-learn installation (#30130) #30193

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

Maddouchhamza
Copy link

Reference Issues/PRs

Fixes #30130

What does this implement/fix? Explain your changes.

This pull request updates the install.rst documentation to recommend using conda-forge for installing scikit-learn. The changes include:

  • Emphasizing conda-forge as the preferred installation method: Updated the installation instructions to prioritize conda-forge, highlighting its advantages in dependency management and cross-platform compatibility.

  • Providing detailed instructions for installing via conda-forge: Added step-by-step guidance for users to install scikit-learn using conda with the conda-forge channel, including environment setup and activation.

  • Clarifying the limitations of PyPI wheels for scientific packages: Included notes explaining why installing scikit-learn via PyPI (pip) can lead to issues due to dependency conflicts and lack of integration testing.

  • Removing redundancies and improving clarity: Restructured the installation guide to avoid repetition, making it clearer and more user-friendly.

These updates address the concerns raised in issue #30130 about installation problems when using PyPI wheels and aim to provide users with a more reliable installation experience.

Any other comments?

I have rebuilt the documentation locally using make doc and verified that the changes display correctly without any warnings or errors. Please let me know if there are any suggestions or improvements I can make to this contribution.

Copy link

github-actions bot commented Nov 1, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 220d4fd. Link to the linter CI: here

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be okay with a slight nudge for users to use conda-forge, but I don't think the language should be this strong.

Note that we completely support pypi/wheels and issues regarding installing wheels are never considered second class. There are also many packages which only exist on pypi and therefore the user would have to have a mix anyway.

Also, the conda/mamba ecosystem is extremely heavy compared to using a system python with venv or uv or any similar toolset dealing with wheels. There are many people who use wheel based toolsets and they don't, and shouldn't have an issue with them as far as we are concerned.

So I think the change here should be minimal.

While we're at it, I think mentions to downloads from Anaconda should be removed since we don't want people to get into legal issues for downloading anaconda distros.

Note on the diff here itself, please keep the lines at 88 characters max.

cc @scikit-learn/core-devs @scikit-learn/documentation-team

@GaelVaroquaux
Copy link
Member

GaelVaroquaux commented Nov 4, 2024 via email

@jjerphan
Copy link
Member

jjerphan commented Nov 4, 2024

I also agree that the wording can be softer.

I think there is value in having people understand the difference between the model of PyPI and the model of conda-forge and have them separate them from the various package managers which exist. Also Anaconda is disjoint from conda-forge, and it is worth that users understand that.

@betatim
Copy link
Member

betatim commented Nov 4, 2024

To add a bit of context/reference to this PR below a screenshot of the install page after I selected MacOs and changed from pip to conda:
Screenshot 2024-11-04 at 15 15 00

We already point people to miniforge/conda-forge when they select conda. So for me there isn't anything to do on that front. The conda instructions already point you to the things we think people should be using.

One question that is worth discussing is what the default selected option should be: pip or conda? I am not sure there is a good answer to this question, I can think of reasons for each being the default. Given that we support both at the same level and there are pros and cons to each it seems like we will not converge. Or that if we were to have this discussion again in three months time we will converge on a different answer. For me this means it isn't worth changing.

I'd rather see an improvement to the miniforge page we link to. Right now that looks pretty off putting to the average user and me. What should I be clicking? Am I about to install some weird thing that will hack me because it is made by five people in a shed? Improving this might mean making a nicer landing page for miniforge or finding out that one already exists and improving our link.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the line of what Adrin said, could you:

  • Revert the formatting to use 88 characters
  • Only mention that we recommend using conda package from conda-forge for reasons that are given in external resources such as PyPackaging Native
  • Remove all the comprehensive pieces of information given on this page (about PyPI). This page must remain short IMO.

?

Thank you.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the detailed suggestions, @jjerphan.

I will:

  • Reformat the text to comply with the 88-character line limit.
  • Simplify the installation instructions to briefly mention that we recommend using conda packages from conda-forge, and refer users to external resources like PyPackaging Native for more in-depth explanations.
  • Remove the extensive information about PyPI to keep the page concise, as per your recommendation.

I appreciate your guidance and will update the PR to reflect these changes.

@GaelVaroquaux
Copy link
Member

GaelVaroquaux commented Nov 4, 2024 via email

@betatim
Copy link
Member

betatim commented Nov 4, 2024

We already point people to miniforge/conda-forge when they select conda. So for me there isn't anything to do on that front.
People get lost. The overall ecosystem and narration conflate conda main channels and conda forge. Can we point to mamba? I don't realize if it would lead to a clearer message and a more suitable configuration by default?

As if to prove the point that it is confusing: miniforge installs mamba (and conda) and uses conda-forge as the default channel.

I am not sure that mentioning mamba explicitly will improve things, because it introduces yet another tool "no one has ever heard of" (assuming that the average reader of the install page probably hasn't heard of conda either).

I concede that the way the docs are written right now we do nothing to help people get conda/andaconda/conda-forge/mamba/etc untangled in their head. But I think if people just blindly follow the instructions they will get the right answer (using conda-forge). They might even be using mamba by default, but I am not sure (we'd need an actual expert to confirm this).

@jjerphan
Copy link
Member

jjerphan commented Nov 4, 2024

I agree that the state of the scientific python projects' distribution is quite tremendously unreadable at first.

Mentioning using mamba in place of conda is one simple option as mamba (since 2.0) warns if the defaults channels are used.

@GaelVaroquaux
Copy link
Member

GaelVaroquaux commented Nov 4, 2024 via email

@Maddouchhamza
Copy link
Author

I'd be okay with a slight nudge for users to use conda-forge, but I don't think the language should be this strong.

Note that we completely support pypi/wheels and issues regarding installing wheels are never considered second class. There are also many packages which only exist on pypi and therefore the user would have to have a mix anyway.

Also, the conda/mamba ecosystem is extremely heavy compared to using a system python with venv or uv or any similar toolset dealing with wheels. There are many people who use wheel based toolsets and they don't, and shouldn't have an issue with them as far as we are concerned.

So I think the change here should be minimal.

While we're at it, I think mentions to downloads from Anaconda should be removed since we don't want people to get into legal issues for downloading anaconda distros.

Note on the diff here itself, please keep the lines at 88 characters max.

cc @scikit-learn/core-devs @scikit-learn/documentation-team

Thank you for your thorough review, @adrinjalali. I appreciate your feedback.

I understand that the current wording may be too strong in favor of conda-forge. I'll revise the language to present both installation methods—using pip and conda—as equally supported options. I'll ensure that we provide a gentle recommendation without discouraging users who prefer PyPI/wheels.

I'll also remove any content that might suggest PyPI installations are less supported and will shorten the explanations to keep the documentation concise.

Regarding the mentions of downloads from Anaconda, I will remove those to avoid any potential legal issues.

Lastly, I apologize for exceeding the 88-character line limit. I'll reformat the text to comply with the project's style guidelines.

Thank you again for your valuable input. I'll update the PR accordingly.

@betatim
Copy link
Member

betatim commented Nov 5, 2024

I am not sure I understand what people would like to see changed and what the problem is that we are trying to solve. From the title of the PR and the text in the linked issue it sounds like the goal is to recommend conda-forge for people who choose the conda/mamba way of installing scikit-learn. We already do that. We do not advertise it/evangelise for it, but the default way we show results in people using conda-forge. This means we are not steering users towards doing something that might land them in trouble in terms of licensing.

I am not sure scikit-learn should be in the business of telling people how to install scikit-learn. (1) I think there are reasonable arguments to be made for using wheels and for using conda packages. And even for using your OS level package manager. There are also arguments for not using wheels or conda or your OS's package manager. (2) I think the install page should get people "off the page" as quickly as possible, so they can go back to doing what they really want to be doing. This means it has to be concise, visually clear and easy to follow. (3) scikit-learn supports both methods as first class citizens. Lecturing the reader about what they should and shouldn't do is unhelpful if they are trying to quickly find the information and move on in life.

For me there are two possible changes to be discussed:

  1. change the default from pip to conda
  2. rename the conda option to conda/mamba

@lorentzenchr
Copy link
Member

+1 for not getting involved in the business of pros and cons of different possibilities of installation. We just support several.

  1. change the default from pip to conda

-1

  1. rename the conda option to conda/mamba

+0

What we might add is

@Maddouchhamza I would recommend to not make changes to this PR until it becomes clear what to change.

@StefanieSenger
Copy link
Contributor

Just directing attention to a special part of the proposed formulation:

A formulation like "For most users we recommend ...[because of issues otherwise]" sends many people who could be interested in contributing off right away.

It sounds like you either need to know this one way or you'll be in terrible trouble. Keep in mind that in data science there are many career changers, who might not be aware of what conda and the like are. We don't need to put too many details (author-led release, administrator permissions) in the main text, maybe in a separate section which a heading that makes it clear it is additional information that doesn't need to be digested in order to continue with the installation.

Personally, the current introduction "There are different ways to install scikit-learn" has led me to try install scikit-learn the only way that I knew back then (pyenv).

"For most users" and "strongly recommend" in bold letters sounds terribly exclusive.

@betatim
Copy link
Member

betatim commented Nov 26, 2024

Personally, the current introduction "There are different ways to install scikit-learn" has led me to try install scikit-learn the only way that I knew back then (pyenv).

Was this a good thing or not? As in, do you wish we had given more opinionated advice so that you didn't end up using pyenv?

@StefanieSenger
Copy link
Contributor

Was this a good thing or not? As in, do you wish we had given more opinionated advice so that you didn't end up using pyenv?

Oh no, the contrary: Am saying that communicating openness / various ways to archive a thing includes more people. This is my whole point.

@Maddouchhamza
Copy link
Author

Thank you all for your detailed reviews and thoughtful input on this pull request. Here’s a summary of the key feedback points:

  1. General Tone and Language:

    • The current language recommending conda-forge may be too strong.
    • There’s a preference for balanced wording that equally supports both pip and conda installation methods without making one seem inferior.
    • Highlighting the advantages of conda-forge is important, but not at the expense of other methods.
  2. Complexity and Length:

    • The documentation should remain concise and avoid overwhelming users with too many details.
    • While it’s valuable to mention key differences between PyPI and conda-forge, a more succinct and accessible presentation is preferable.
  3. Mamba and Conda:

    • There’s agreement that mentioning mamba as an alternative could add value, but it should be introduced carefully to avoid further confusion.
    • Renaming the “Conda” tab to “Conda/Mamba” was suggested for clarity.
  4. Anaconda Mentions:

    • Mentions of Anaconda should be removed to avoid potential licensing or legal issues.
  5. Formatting:

    • The text should adhere to the 88-character line limit.
  6. Inclusivity and Usability:

    • Strong recommendations like “We strongly recommend” can be off-putting to beginners and may exclude users unfamiliar with specific tools.
    • It’s important to communicate that multiple installation methods are supported and valid.

Proposed Solution:

Based on the above feedback, here’s my plan for revising the pull request:

  1. Balanced Recommendation:

    • I will rephrase the documentation to provide a more neutral recommendation. For example:

      “Both pip and conda are valid ways to install scikit-learn. For users who prefer conda, we recommend using the conda-forge channel for better compatibility and dependency management.”

  2. Simplified Content:

    • I will consolidate the details about PyPI and conda-forge into a shorter section that provides just enough context without overwhelming the user.
    • I will include links to external resources (e.g., PyPackaging Native) for users who wish to dive deeper.
  3. Mamba:

    • I will add a note about mamba as an efficient alternative to conda, with a brief explanation:

      “For faster installation, users can also consider using mamba, a drop-in replacement for conda that is pre-installed with miniforge.”

    • I will rename the tab from “Conda” to “Conda/Mamba” for clarity.

  4. Removal of Anaconda References:

    • All mentions of Anaconda will be removed as per the feedback.
  5. Formatting and Style:

    • I will ensure all lines adhere to the 88-character limit and follow the documentation style guide.
  6. Inclusive Language:

    • I will revise phrases like “For most users we strongly recommend” to more inclusive and open wording, such as:

      “There are multiple ways to install scikit-learn. Depending on your environment and preferences, you may choose pip or conda.”


Next Steps:

I will proceed with these changes and update the pull request shortly. Please let me know if there are any additional points I should consider or if you have further suggestions. Thank you all again for your time and valuable feedback!

@Maddouchhamza Maddouchhamza force-pushed the issue30130-conda-forge-recommendation branch from 48e7140 to 2852127 Compare December 23, 2024 23:29
Copy link
Author

@Maddouchhamza Maddouchhamza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the detailed feedback, @jjerphan and @adrinjalali!

I’ve made the following updates based on your reviews:

  • Reverted text formatting to adhere to the 88-character line limit.
  • Softened the language recommending conda-forge to ensure a balanced tone.
  • Simplified the explanations to avoid overwhelming users.
  • Removed mentions of Anaconda as requested.

Please let me know if there are additional adjustments needed. Thank you again for your guidance!

@jjerphan
Copy link
Member

jjerphan commented Jan 2, 2025

Hi @Maddouchhamza, you have entirely reorganized the page (which is not necessarily bad), but this goes well beyond the original scope of this PR and of the associated issue.

I am fine if the scope of this PR is explicitly repurposed as an entire reorganization of the page, otherwise I would just go with minimal change.


Also, I would let other people involved in the project phrase their opinion and make decision for the project; I have little ownership on the documentation whilst other do (I am thinking of @scikit-learn/documentation-team), and I think everyone would benefit from having more people authoring decisive reviews. 🙂

Copy link
Contributor

@Charlie-XIAO Charlie-XIAO left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pinging @jjerphan and thanks for the work @Maddouchhamza. As said this PR is currently including to many changes and is a bit hard to review. I would suggest the following:

  • Limit the scope of this PR to reorganizing only the Installing the latest release section so that (hopefully) the diff is easier to read. I actually like the changes in the Third-party distributions of scikit-learn because the original version is too verbose, but it might be better to move that part into a different PR.
  • Please revert unnecessary changes to rst anchors, tab synchronization, heading levels, CSS styles, etc. which breaks formatting of the page, unless you have specific reasons for doing so. I would suggest focusing only on contents and keep the changes minimal.
  • Please stick to ..include:: ./install_instructions_conda.rst for reusing conda instructions, as determined in DOC tweak appearance of installation instructions #29160 (comment). Changes can be made in that file if needed.
  • Please do not remove the dependency table .. include:: min_dependency_table.rst and the warning about minimum supported Python versions. I believe those are still relevant today.

@lesteve
Copy link
Member

lesteve commented Jan 4, 2025

Sorry @Maddouchhamza you found yourself in a seemingly not too hard but actually quite tricky issue, this does happen sometimes 😅.

I would recommend closing this PR as this seems in a too complicated state, mixing too many topics (and maybe too many people if you ask me 🥲).

Pick a single change that seems not controversial, for example pointing to conda-forge rather than Anaconda and start a new PR with this single change. You should aim for changing say 10 lines max 1.

While you are working on this simpler PR, you will get more familiar with scikit-learn contribution. Once this simpler PR is merged you be in a better position to create other smaller PRs modifying the install page incrementally.


Now my personal take on the original issue #30130 (optional reading):

IMO, when people go to the install page, they want to quickly find a command that looks familiar, execute it and get running with scikit-learn. What they definitely don't want is having to learn more about the complicated state of Python packaging ...

Personally I am a conda user for historical reasons and to avoid some of the pip limitations that were already discussed 2. I would be fine nudging slightly towards conda-forge (vs pip) if there is a way to do this with a diff changing roughly 10 lines max 3.

If this is not possible with a 10 lines change, let's forget about it, because honestly this seems exactly the kind of issue that takes too much work for what it brings. Do I find it slightly sad, maybe, but at one point you have to be strong enough and cut your losses 💪 🧠. Also using the github issue tracker rather than the doc may be a better place to raise awareness of these tricky packaging issues.

Footnotes

  1. I mean it can be 20 or maybe 30. Certainly a diff of +140/-260 as it is right now is definitely too big, even if the PR was opened by a very experienced scikit-learn developer.

  2. for example mixing different OpenMP seems one of the main culprit see https://github.com/microsoft/LightGBM/issues/6595 as the most recent issue I was involved in I can remember.

  3. I would also admit that it's quite likely you can live all your life as a pip user and and bump into none of the limitations. I have heard of issues with conda-forge mixing PyTorch and scikit-learn where IIRC the import order matters, but since I never faced them myself I thought oh well let's look at this if I see it happening one day.

@Maddouchhamza
Copy link
Author

@jjerphan @Charlie-XIAO  @lesteve, Thank you all for your detailed feedback and thoughtful suggestions!

After reviewing the comments, I’ve decided to follow the recommendations to simplify the changes. I will close this PR and open a new one with a more focused scope. Specifically, the new PR will address pointing to conda-forge instead of Anaconda with minimal changes (approximately 10 lines). This will make the review process easier and more aligned with the project's needs.

Thank you for your patience and guidance—I appreciate your support in helping me learn and contribute effectively! Looking forward to your feedback on the new PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DOC Motivate preferably using conda-forge's distribution of scikit-learn
9 participants