ENH Improve regularization messages for QuadraticDiscriminantAnalysis #19731
Conversation
Thank you for the PR @azihna !
sklearn/discriminant_analysis.py
Outdated
S2 = (S ** 2) / (len(Xg) - 1)
S2 = ((1 - self.reg_param) * S2) + self.reg_param
cov_reg = np.dot(S2 * Vt.T, Vt)
det = linalg.det(cov_reg)
if det < self.tol:
Traditionally, `tol` is compared to the singular values for determining the rank. Can we use `np.linalg.matrix_rank` here and pass in `tol`? (Internally, `matrix_rank` computes the SVD.)
`matrix_rank` is going to be expensive as well. I think we should just use the results of the SVD, as explained in the comment above.
In your comment here (#14997 (comment)) you said: "I think that when the covariance matrix is not full rank we should probably raise a LinAlgError at fit time explicitly recommending the user to increase regularization."
From looking at the code, `S` does not depend on `reg_param`, which means increasing the regularization would still result in the same error message.
How about using `S2` in the same way as `S`?
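A hedged sketch of that idea, with names taken from the snippet above and an illustrative message:

```python
import numpy as np
from scipy import linalg

# Hypothetical sketch: check the regularized singular values S2 rather than S,
# so that increasing reg_param can actually make the check pass.
rank = np.sum(S2 > self.tol)
if rank < Xg.shape[1]:
    raise linalg.LinAlgError(
        "The covariance matrix of one of the classes is not full rank. "
        "Increasing the value of reg_param might help."
    )
```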
I experimented with the proposed change. Using `S2` instead of `S`, the solution works as intended, but there are two tests that use random data which the changes couldn't pass:
- test_common.py/test_estimators[QuadraticDiscriminantAnalysis-check_estimators_dtype]
- test_common.py/test_estimators[QuadraticDiscriminantAnalysis-check_dtype_object]

Both of them fit the model with random data and then check for the existence of methods or attributes, and this random data triggers the error. As quick fixes, I could:
- change the LinAlgError to a LinAlgWarning, which would still warn the user of the problem, or
- mark the tests as expected to fail.

Do you have any further recommendations?
As a workaround, we can update

def _set_checking_parameters(estimator):

to set `reg_param`.
sklearn/utils/estimator_checks.py
Outdated
X, y = make_classification(random_state=seed, n_samples=40,
                           n_informative=8, n_features=10,
                           n_classes=4)
We try not to special-case estimators here. Is there no `reg_param` we can use to make this work?
Thanks a lot for the review @thomasjpfan. Sadly, no. With random data the check always fails no matter how large `reg_param` is; I tried quite a few different values. That's why I also mentioned in the earlier comment that changing the error to a warning might be a better solution. I think users might be annoyed if the model throws an error when they try to fit on toy datasets like that.
I changed the error to a warning in the latest commit and reverted the estimator checks back to their original state.
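For reference, a sketch of what the warning variant might look like at the point where the error used to be raised (illustrative wording only, not the exact merged code):

```python
import warnings

from scipy import linalg

# Hypothetical sketch: emit a LinAlgWarning instead of raising, so fit still
# completes while the user is told how to regularize the model.
warnings.warn(
    "The covariance matrix of one of the classes is not full rank. "
    "Increasing the value of reg_param might help.",
    linalg.LinAlgWarning,
)
```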
It seems like `LinAlgWarning` was introduced in SciPy 1.1.0, and the CI pipelines for Python 3.6 are using SciPy 0.19. SciPy actually supported Python 3.6 until version 1.6.0. Is it possible to upgrade the CI requirements?
@ogrisel
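If upgrading the CI floor were not an option, a minimal compatibility shim would be conceivable (purely a sketch, not what the PR ended up doing):

```python
# Hypothetical fallback for SciPy < 1.1.0, where scipy.linalg.LinAlgWarning
# does not exist yet.
try:
    from scipy.linalg import LinAlgWarning
except ImportError:
    class LinAlgWarning(UserWarning):
        """Stand-in warning category for older SciPy versions."""
```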
If there is an issue with `test_common.py`, you can add some regularization by updating `_set_checking_parameters`:
scikit-learn/sklearn/utils/estimator_checks.py
Lines 566 to 568 in 36915ae
def _set_checking_parameters(estimator):
    # set parameters to speed up some estimators and
    # avoid deprecated behaviour
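A sketch of what such an addition to `_set_checking_parameters` could look like (the `reg_param` value is a hypothetical placeholder that would need tuning):

```python
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

def _set_checking_parameters(estimator):
    # set parameters to speed up some estimators and
    # avoid deprecated behaviour
    ...
    # Hypothetical addition: regularize QDA so that the random data used by
    # the common checks does not trigger the rank-deficiency warning.
    if isinstance(estimator, QuadraticDiscriminantAnalysis):
        estimator.set_params(reg_param=0.1)
```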
doc/whats_new/v1.0.rst
Outdated
- |Enhancement| :class:`discriminant_analysis.QuadraticDiscriminantAnalysis`
  will now cause 'RuntimeWarning' in case of collinear variables. These errors
  can be silenced by the 'reg_param' attribute.:pr:`19731` by
  :user:`Alihan Zihna <azihna>`
May we move the `discriminant_analysis` section higher in this list? (The modules are in alphabetical order.)
Suggested change:

- |Enhancement| :class:`discriminant_analysis.QuadraticDiscriminantAnalysis`
  will now cause 'LinAlgWarning' in case of collinear variables. These errors
  can be silenced by the 'reg_param' attribute. :pr:`19731` by
  :user:`Alihan Zihna <azihna>`
with pytest.warns(RuntimeWarning, match="divide by zero"):
    y_pred = clf.predict(X2)
assert np.any(y_pred != y6)
With the `LinAlgWarning`, is this `RuntimeWarning` still being raised?
So it means that we need to record the warnings and check that both are raised, or shall we instead catch the division-by-zero warning, which is not informative?
I've put back the test, catching all the raised warnings.
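One way to record everything and assert that both warnings were emitted could look roughly like this (hypothetical; `clf`, `X2`, and `y6` are the fixtures from the surrounding test):

```python
import warnings

from scipy import linalg

# Hypothetical sketch: capture every warning raised during fit and predict,
# then check that both the rank-deficiency warning and the divide-by-zero
# RuntimeWarning are among them.
with warnings.catch_warnings(record=True) as records:
    warnings.simplefilter("always")
    y_pred = clf.fit(X2, y6).predict(X2)

categories = [type(w.message) for w in records]
assert linalg.LinAlgWarning in categories
assert RuntimeWarning in categories
```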
Co-authored-by: Guillaume Lemaitre <guillaume@probabl.ai>
@thomasjpfan @glemaitre @ogrisel I gave this a refresh and applied the suggestions; could you please have another go?
Somehow the warnings are NOT raised in some of our CI, so I removed them in 8854f58 (they were raised locally, though).
doc/whats_new/v1.5.rst
Outdated
@@ -217,6 +217,14 @@ Changelog
have the `n_features_in_` and `feature_names_in_` attributes after `fit`.
:pr:`27937` by :user:`Marco vd Boom <tvdboom>`.

:mod:`sklearn.discriminant_analysis`
We will need to move this to 1.6 :)
LGTM
Reference Issues/PRs
References #14997
What does this implement/fix? Explain your changes.
- Remove the "variables are collinear" warning.
- Check the covariance matrix after regularization and raise a LinAlgError prompting the user to increase regularization.
- I left the `tol` argument as a way to control the error; the user can turn it off if need be.
- Updated the documentation for the `tol` argument.
- Changed the tests to check for the correct errors.
Any other comments?
The last test checks the case where n_samples_class < n_features. I tried many different configurations but was unable to produce a set of variables that didn't throw an error, so the test only contains negative examples. I am open to suggestions for positive examples in this case.
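For illustration, a hedged sketch of the negative case described above (hypothetical data and names; the exact warning class and message depend on the final implementation):

```python
import numpy as np
import pytest
from scipy import linalg

from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis


def test_qda_warns_when_classes_are_rank_deficient():
    # Five features but only three samples per class, so each per-class
    # covariance matrix cannot be full rank.
    rng = np.random.RandomState(0)
    X = rng.randn(6, 5)
    y = np.array([0, 0, 0, 1, 1, 1])

    with pytest.warns(linalg.LinAlgWarning):
        QuadraticDiscriminantAnalysis(reg_param=0.0).fit(X, y)
```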