-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
ENH Better error for corrupted files in fetch_kddcup99 #19669
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Just a side note: when reviewing #10027 I noticed that this fetcher is extremely memory inefficient and slow. For all practical purposes I think it's much better to use fetch_openml
.
Therefore, I wonder if we should not deprecate our own fetcher in favor of fetch_openml
although it will not return exactly the same datastructure so is not a drop-in replacement...
Which is also slow (ARFF parsing in pure python) but I imagine less so :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, LGTM!
The (near?) future, we can use Parquet files! A parquet parser would be harder to vendor, so it may lead us down a path of using optional dependencies. (Like how XREF: openml/openml-python#1032 |
Great! Looking forward to that. |
Really good news :) |
I cannot even use the scikit-learn loader without getting my 16GB RAM system to start swapping while https://github.com/scikit-learn/scikit-learn/pull/10027/files#r595370382 |
Reference Issues/PRs
Resolves #19667
What does this implement/fix? Explain your changes.
This PR adds a nicer error message when the file can not be loaded from cache.