Use OS-specific cache directories instead of home directory #31295

Open: wants to merge 11 commits into base: main

Conversation

norgera
Contributor

@norgera norgera commented May 2, 2025

Resolves #31267

The get_data_home function now uses standard OS cache directories:

  • Linux/Unix: $XDG_CACHE_HOME/scikit-learn (~/.cache/scikit-learn)
  • macOS: ~/Library/Caches/scikit-learn
  • Windows: %LOCALAPPDATA%/scikit-learn (~/AppData/Local/scikit-learn)

Previously, data was stored in ~/scikit_learn_data by default.
This change follows OS conventions for cache storage and improves
maintainability.
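
The per-OS lookup listed above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `default_cache_dir` and its `platform`, `env`, and `home` parameters are hypothetical names introduced here so the logic can be exercised without touching the real environment.

```python
import os
import sys
from pathlib import Path


def default_cache_dir(platform=None, env=None, home=None):
    """Return the OS-conventional cache directory (illustrative helper only)."""
    # All three parameters are assumptions made for testability; by default
    # they fall back to the running interpreter's platform and environment.
    platform = sys.platform if platform is None else platform
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    if platform.startswith("win"):
        # Windows: %LOCALAPPDATA%, falling back to ~/AppData/Local
        base = env.get("LOCALAPPDATA") or home / "AppData" / "Local"
    elif platform == "darwin":
        # macOS: ~/Library/Caches
        base = home / "Library" / "Caches"
    else:
        # Linux/Unix: $XDG_CACHE_HOME, falling back to ~/.cache
        base = env.get("XDG_CACHE_HOME") or home / ".cache"

    return Path(base) / "scikit-learn"
```

Calling `default_cache_dir()` with no arguments resolves the directory for the current machine; passing explicit values makes each branch easy to check in a test.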

Implemented the deprecation protocol and added tests in test_base.py.


github-actions bot commented May 2, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 31720d0. Link to the linter CI: here

@lucyleeow
Member

I think this warrants a new changelog entry; see https://github.com/scikit-learn/scikit-learn/blob/main/doc/whats_new/upcoming_changes/README.md for instructions.

There are quite a few other places that need to be updated. I would suggest searching for "scikit_learn_data" in the codebase.

@norgera
Contributor Author

norgera commented May 3, 2025

I believe I have covered all the files now.

@jeremiedbb
Member

Thanks for the PR @norgera. This is a breaking change, so I think it needs a deprecation cycle.

@glemaitre
Member

Indeed. Here is a link describing how we handle deprecations: https://scikit-learn.org/dev/developers/contributing.html#deprecation

@norgera
Contributor Author

norgera commented May 5, 2025

Thank you all for the information.

I've added a deprecation path: the original home directory is still used if it is detected or manually set, with a warning and instructions to move the files to the new cache folder. Otherwise, the new cache folder is used.
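
A rough sketch of that fallback, under stated assumptions: `resolve_data_home` and its parameters are hypothetical names rather than the PR's actual code, and "manually set" is interpreted here as an explicit `data_home` argument or `SCIKIT_LEARN_DATA` value that points at the legacy folder. `FutureWarning` is used because that is the category scikit-learn's contributing guide prescribes for deprecations.

```python
import os
import warnings
from pathlib import Path


def resolve_data_home(new_default, legacy_dir, data_home=None, env=None):
    """Pick the data directory, preferring the legacy one while deprecated."""
    env = os.environ if env is None else env
    new_default, legacy_dir = Path(new_default), Path(legacy_dir)

    # An explicit argument wins over the SCIKIT_LEARN_DATA variable.
    explicit = data_home or env.get("SCIKIT_LEARN_DATA")
    if explicit is not None:
        explicit = Path(explicit).expanduser()

    # Assumption: the legacy folder counts as "in use" when the caller
    # points at it explicitly, or when nothing is set and it exists on disk.
    if explicit is not None:
        uses_legacy = explicit == legacy_dir
    else:
        uses_legacy = legacy_dir.is_dir()

    if uses_legacy:
        warnings.warn(
            f"Storing data in {legacy_dir} is deprecated; "
            f"move its contents to {new_default}.",
            FutureWarning,
        )
        return legacy_dir

    return explicit if explicit is not None else new_default
```

The function only decides which path to use; creating the directory and moving files are left out to keep the sketch short.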

Let me know if this approach is appropriate.

Member

@lucyleeow lucyleeow left a comment

Just a flyby note: the current release is 1.7, so this should be 'deprecated in 1.7, removed in 1.9'.

Development

Successfully merging this pull request may close these issues.

Change the default data directory
4 participants