Skip to content

Accept pathlib.Path for data_home in fetch_openml #27447

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
riedgar-ms opened this issue Sep 22, 2023 · 9 comments · Fixed by #27468
Closed

Accept pathlib.Path for data_home in fetch_openml #27447

riedgar-ms opened this issue Sep 22, 2023 · 9 comments · Fixed by #27468
Assignees

Comments

@riedgar-ms
Copy link

Describe the workflow you want to enable

When using fetch_openml() it would be nice if pathlib.Path objects were supported. Currently, there is a type check for str | None, so I have to convert my path objects first.

Describe your proposed solution

Change the accepted type to pathlib.Path | str | None

Describe alternatives you've considered, if relevant

No response

Additional context

No response

@riedgar-ms riedgar-ms added Needs Triage Issue requires triage New Feature labels Sep 22, 2023
@adrinjalali adrinjalali added help wanted and removed Needs Triage Issue requires triage labels Sep 22, 2023
@adrinjalali
Copy link
Member

Is this something that used to work but doesn't @riedgar-ms ?

@riedgar-ms
Copy link
Author

I expect it was working circa 2020, since that's when the change to our repo (Fairlearn) which started using pathlib.Path was merged. However, the builds are long deleted, so I can't confirm that, or see what version of SciKit-Learn we were pulling.

@adrinjalali
Copy link
Member

yeah we've added the param validation recently, so this is probably a regression.

@glemaitre
Copy link
Member

I think it makes sense to have this option but we should have a proper testing.
If I recall properly, we have a private functions that handles all the data_home shared by all fetchers (or at least I hope so) so we should only make sure that it works there.

So some unit testing, updating the documentation and parameter validation should be enough.

@Charlie-XIAO
Copy link
Contributor

Can we just directly support all os.Pathlike? Currently get_data_home already does so.

@adrinjalali
Copy link
Member

That would probably be okay I think

@glemaitre
Copy link
Member

@Charlie-XIAO Indeed, it seems fine with me.

@Charlie-XIAO
Copy link
Contributor

Thanks for the confirmation, I'll try to work on this.

@Charlie-XIAO
Copy link
Contributor

/take

riedgar-ms added a commit to fairlearn/fairlearn that referenced this issue Sep 25, 2023
<!--- If relevant, make sure to add a description of your changes in docs/user_guide/installation_and_version_guide/v<major>.<minor>.<patch>.rst -->

## Description

It appears that `fetch_openml()` no longer likes `pathlib.Path` objects. I have filed [an issue on SciKit-Learn](scikit-learn/scikit-learn#27447) about this, but in the meantime.....

## Tests
<!--- Select all that apply by putting an x between the brackets: [x] -->
- [x] no new tests required
- [ ] new tests added
- [ ] existing tests adjusted

## Documentation
<!--- Select all that apply. -->
- [x] no documentation changes needed
- [ ] user guide added or updated
- [ ] API docs added or updated
- [ ] example notebook added or updated

## Screenshots
<!--- If your PR makes changes to visualizations (e.g., matplotlib code) or the website, please include a screenshot of the result. -->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants