ENH adding as_frame functionality for CA housing dataset loader #15950
Conversation
Removed unneeded parens Co-Authored-By: Joel Nothman <joel.nothman@gmail.com>
…into df_housing
This LGTM. The coverage is failing because the dataset is not downloaded on the CI machine but the tests pass as expected on my local machine with the prefetched data.
Just a few comments / improvements:
assert hasattr(bunch, 'frame') is True
assert frame.shape == (20640, 9)
assert isinstance(bunch.data, pd.DataFrame)
assert isinstance(bunch.target, pd.DataFrame)
Could you please add a test that checks that an ImportError is raised if pandas is missing, and that the message is "fetch_california_housing with as_frame=True requires pandas."?
frame : pandas DataFrame
    Only present when `as_frame=True`. DataFrame with ``data`` and
    ``target``.
Please add the directive:

.. versionadded:: 0.23
Is this the right way to do it?
def _pandas_is_missing():
    try:
        import pandas
    except ImportError:
        raise SkipTest('fetch_california_housing with as_frame=True'
                       ' requires pandas')


@pytest.mark.skipif(
    _pandas_is_missing(),
    reason='Import pandas library to run this test'
)
In #15950 (comment) I mentioned adding a new test that checks the error is raised, without skipping the test when pandas is missing. Something like the following (untested):
@pytest.mark.skipif(
    _is_california_housing_dataset_not_available(),
    reason='Download California Housing dataset to run this test'
)
def test_pandas_dependency_message():
    try:
        import pandas  # noqa
        pytest.skip("This test requires pandas to be not installed")
    except ImportError:
        # Check that pandas is imported lazily and that an informative
        # error message is raised when pandas is missing:
        expected_msg = ("fetch_california_housing with as_frame=True"
                        " requires pandas")
        with pytest.raises(ImportError, match=expected_msg):
            fetch_california_housing(as_frame=True)
…into df_housing
…into df_housing
The CI linter failed for no good reason. Let's ignore the code style checks for that specific line: import pandas  # noqa
Yes, this is it. If you append " # noqa" the linter will ignore that line. When I said "the CI linter failed for no good reason", I meant the Continuous Integration linter tasks that display red crosses in the GitHub interface on this PR.
…it-learn#15950) Co-authored-by: Stephanie Andrews <byronic.string@gmail.com>
Reference Issues/PRs
Handles part of #10733
(Loosely following conventions found in #13902)
Closes #15486
What does this implement/fix? Explain your changes.
Adds as_frame functionality for the California Housing dataset loader (fetch_california_housing)
Any other comments?
Continuing work begun by @gitsteph at the SF scikit-learn sprint (Nov 2019)