Skip to content

MNT Fix E721 linting issues to do type comparisons with is #29501

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Aug 30, 2024

Conversation

adrinjalali
Copy link
Member

Got these from a ruff check ., and hope they don't change anything.

Copy link

github-actions bot commented Jul 16, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 3f4bdb8. Link to the linter CI: here

@jeremiedbb
Copy link
Member

Is that because you a more recent version of ruff than the min 0.2.1 ?
Latest is 0.5.2, maybe it's time to bump the min version.

@@ -324,7 +324,7 @@ def fit(self, X, y=None):
Returns a fitted instance of self.
"""
dtype = bool if self.metric in PAIRWISE_BOOLEAN_FUNCTIONS else float
if dtype == bool and X.dtype != bool:
if dtype is bool and X.dtype != bool:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

X.dtype is also a type. I would expect:

Suggested change
if dtype is bool and X.dtype != bool:
if dtype is bool and X.dtype is not bool:

@@ -2388,7 +2388,7 @@ def pairwise_distances(

dtype = bool if metric in PAIRWISE_BOOLEAN_FUNCTIONS else "infer_float"

if dtype == bool and (X.dtype != bool or (Y is not None and Y.dtype != bool)):
if dtype is bool and (X.dtype != bool or (Y is not None and Y.dtype != bool)):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

Comment on lines 589 to 592
assert type(cv_results["test_r2"]) is np.ndarray
assert type(cv_results["test_neg_mean_squared_error"]) is np.ndarray
assert type(cv_results["fit_time"]) is np.ndarray
assert type(cv_results["score_time"]) is np.ndarray
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to use isinstance here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert type(cv_results["test_r2"]) is np.ndarray
assert type(cv_results["test_neg_mean_squared_error"]) is np.ndarray
assert type(cv_results["fit_time"]) is np.ndarray
assert type(cv_results["score_time"]) is np.ndarray
assert isinstance(cv_results["test_r2"], np.ndarray)
assert isinstance(cv_results["test_neg_mean_squared_error"], np.ndarray)
assert isinstance(cv_results["fit_time"], np.ndarray)
assert isinstance(cv_results["score_time"], np.ndarray)

@@ -1501,7 +1501,7 @@ def _apply_on_subsets(func, X):
result_by_batch = [func(batch.reshape(1, n_features)) for batch in X]

# func can output tuple (e.g. score_samples)
if type(result_full) == tuple:
if type(result_full) is tuple:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if type(result_full) is tuple:
if isinstance(result_full, tuple):

@@ -1341,7 +1341,7 @@ def test_check_scalar_invalid(
include_boundaries=include_boundaries,
)
assert str(raised_error.value) == str(err_msg)
assert type(raised_error.value) == type(err_msg)
assert type(raised_error.value) is type(err_msg)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert type(raised_error.value) is type(err_msg)
assert isinstance(raised_error.value, type(err_msg))

@glemaitre
Copy link
Member

glemaitre commented Jul 25, 2024

This is strange that we did not catch this up before. I always thought that flake8 (or pycodestyle) was complaining about those pattern.

@adrinjalali
Copy link
Member Author

@glemaitre I remember why I hadn't changed the ones you suggested, cause it results in the following errors in the CI:

FAILED cluster/tests/test_optics.py::test_nowarn_if_metric_bool_data_bool - sklearn.exceptions.DataConversionWarning: Data will be converted to boolean...
FAILED cluster/tests/test_optics.py::test_warn_if_metric_bool_data_no_bool - AssertionError
FAILED metrics/tests/test_pairwise.py::test_pairwise_boolean_distance[dice] - sklearn.exceptions.DataConversionWarning: Data was converted to boolean for...
FAILED metrics/tests/test_pairwise.py::test_pairwise_boolean_distance[jaccard] - sklearn.exceptions.DataConversionWarning: Data was converted to boolean for...
FAILED metrics/tests/test_pairwise.py::test_pairwise_boolean_distance[rogerstanimoto] - sklearn.exceptions.DataConversionWarning: Data was converted to boolean for...
FAILED metrics/tests/test_pairwise.py::test_pairwise_boolean_distance[russellrao] - sklearn.exceptions.DataConversionWarning: Data was converted to boolean for...
FAILED metrics/tests/test_pairwise.py::test_pairwise_boolean_distance[sokalmichener] - sklearn.exceptions.DataConversionWarning: Data was converted to boolean for...
FAILED metrics/tests/test_pairwise.py::test_pairwise_boolean_distance[sokalsneath] - sklearn.exceptions.DataConversionWarning: Data was converted to boolean for...
FAILED metrics/tests/test_pairwise.py::test_pairwise_boolean_distance[yule] - sklearn.exceptions.DataConversionWarning: Data was converted to boolean for...
FAILED utils/tests/test_validation.py::test_check_array_multiple_extensions[True-boolean-bool] - ValueError: could not convert string to float: 'a'
FAILED utils/tests/test_validation.py::test_check_array_multiple_extensions[True-Int64-int64] - ValueError: could not convert string to float: 'a'
FAILED utils/tests/test_validation.py::test_check_array_multiple_extensions[True-Float64-float64] - ValueError: could not convert string to float: 'a'

giphy

@adrinjalali
Copy link
Member Author

Here's where the issue comes from for instance:

X.dtype == bool
Out[2]: True
X.dtype != bool
Out[3]: False
X.dtype is not bool
Out[4]: True
X.dtype
Out[5]: dtype('bool')

@@ -919,7 +919,7 @@ def is_sparse(dtype):
)
if all(isinstance(dtype_iter, np.dtype) for dtype_iter in dtypes_orig):
dtype_orig = np.result_type(*dtypes_orig)
elif pandas_requires_conversion and any(d == object for d in dtypes_orig):
elif pandas_requires_conversion and any(d is object for d in dtypes_orig):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently here we change the behaviour because of the following rule:

In [32]: np.dtype("object") is object
Out[32]: False

In [33]: np.dtype("object") == object
Out[33]: True

@adrinjalali
Copy link
Member Author

CI is now green here @glemaitre

@adrinjalali adrinjalali merged commit 68d8c2c into scikit-learn:main Aug 30, 2024
29 of 30 checks passed
@adrinjalali adrinjalali deleted the ruff/e721 branch August 30, 2024 13:13
MarcBresson pushed a commit to MarcBresson/scikit-learn that referenced this pull request Sep 2, 2024
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Sep 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants