Skip to content

FIX Incorrect warning when clustering boolean data #19046

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
Jan 21, 2021

Conversation

jdsuryap
Copy link
Contributor

@jdsuryap jdsuryap commented Dec 20, 2020

Fixes #18996

Input X was being converted to float at the beginning of fit() method. However when a pairwise boolean metric was requested, X was again converted to boolean with multiple warning.

This caused were two issues:

  1. The data, if it was already of boolean type, was first converted to float and later to boolean
  2. The low level utility function printed the warning multiple times

As a FIX, a check has been put in place in the fit() method itself to let boolean data pass as such. However, when the data needs to be converted to boolean, it is done at this level only with the same warning that was printed earlier, but only once.

Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the PR @Prakashsur , this looks good so far. Could you please also need a non-regression test making sure that:

  • no warning is raised if the data passed o fit is already converted to bool
  • a single warning (the one you just added) is raised if the data passed to fit isn't converted to bool

thanks!

@jdsuryap jdsuryap changed the title FIX- Incorrect warning when clustering boolean data scikit-learn#18996 [MRG] FIX- Incorrect warning when clustering boolean data scikit-learn#18996 Dec 21, 2020
@jdsuryap
Copy link
Contributor Author

@NicolasHug
Both points have been resolved. Shall we take another look and take it forward?

Thanks

Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jdsurya ! made some minor comments but LGTM

jdsuryap and others added 5 commits January 18, 2021 11:51
Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>
Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>
Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>
Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>
@cmarmo
Copy link
Contributor

cmarmo commented Jan 18, 2021

Hi @jdsurya , thanks for your pull request.
Do you mind fixing the lint errors so that all the check could be run? Thanks!

sklearn/cluster/tests/test_optics.py:219:80: E501 line too long (87 > 79 characters)
    # non-regression test for https://github.com/scikit-learn/scikit-learn/issues/18996
                                                                               ^
sklearn/cluster/tests/test_optics.py:231:80: E501 line too long (87 > 79 characters)
    # non regression test for https://github.com/scikit-learn/scikit-learn/issues/18996

@jdsuryap
Copy link
Contributor Author

Hi @cmarmo, ntmu!

Thanks for sharing the details! The two linting issues have been fixed and are passing now.

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR @jdsurya !

LGTM

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add an entry to the change log at doc/whats_new/v1.0.rst with tag |Fix|. Like the other entries there, please reference this pull request with :pr: and credit yourself (and other contributors if applicable) with :user:.

@thomasjpfan thomasjpfan changed the title [MRG] FIX- Incorrect warning when clustering boolean data scikit-learn#18996 FIX Incorrect warning when clustering boolean data Jan 21, 2021
@jdsuryap
Copy link
Contributor Author

Thank you for details regarding the entry into 'whats_new' Thomas @thomasjpfan! The same has been added.

Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>
@thomasjpfan thomasjpfan merged commit 364b1e3 into scikit-learn:master Jan 21, 2021
@glemaitre glemaitre mentioned this pull request Apr 22, 2021
12 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incorrect warning when clustering boolean data
5 participants