ENH Add `chain_method` to `ClassifierChain` #27700

lucyleeow · 2023-11-01T02:27:08Z

Reference Issues/PRs

Fixes #9247
Closes #9316 (supersedes)

What does this implement/fix? Explain your changes.

Add chain_method to ClassifierChain. Supports {'predict', 'predict_proba', 'predict_log_proba', 'decision_function'} (as suggested in #9316 (review))

Any other comments?

Was not sure about naming to distinguish feature input prediction vs output prediction variables, happy to change.

github-actions · 2023-11-01T02:28:41Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 969d889. Link to the linter CI: here

glemaitre

A couple of comments regarding the code.

sklearn/multioutput.py

glemaitre · 2023-11-01T11:35:35Z

sklearn/multioutput.py

@@ -844,6 +889,10 @@ class labels for each estimator in the chain.
    order_ : list
        The order of labels in the classifier chain.

+    chain_method_ : str


This would be provided when using `return_response_method_used` in `_get_response_values`.

glemaitre · 2023-11-01T11:42:56Z

sklearn/multioutput.py

+            if sp.issparse(X):
+                X_aug = sp.hstack((X, previous_predictions))
+            else:
+                X_aug = np.hstack((X, previous_predictions))


Suggested change

-            if sp.issparse(X):
-                X_aug = sp.hstack((X, previous_predictions))
-            else:
-                X_aug = np.hstack((X, previous_predictions))
+            hstack = sp.hstack if sp.issparse(X) else np.hstack
+            X_aug = hstack([X, previous_predictions])

Is it possible for previous_predictions to be dense while X is sparse, so that the stacking produces something unexpected?

AFAICT it seems stacking a sparse with dense via sp.hstack gives you a sparse array (even though sp.hstack is not documented to support dense):

In [34]: from scipy.sparse import coo_matrix, hstack
    ...:
    ...: A = coo_matrix([[1, 2], [3, 4]])

In [35]: B = np.zeros((2,2))

In [36]: hstack([A,B])
Out[36]:
<2x4 sparse matrix of type '<class 'numpy.float64'>'
    with 4 stored elements in COOrdinate format>

Maybe from here: https://github.com/scipy/scipy/blob/f990b1d2471748c79bc4260baf8923db0a5248af/scipy/sparse/_construct.py#L654 ?

Interesting. So it is probably np.hstack that did not behave properly.

Yes you may be right. The documentation does not mention that it would support sparse + dense.

I can't find any reference about whether this is intentional. This scipy PR mentions that the stack functions convert everything to COO format (but we knew this from the code). Also, I found this stackoverflow answer saying that you can stack sparse + dense.

Happy to add a sparse conversion of previous_predictions? (Though this would be difficult to test since sp.hstack allows sparse + dense.)

Also, from a quick look at our code, I could not find any other cases where we could end up stacking dense + sparse, if that helps decide whether we need to convert to sparse.
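The behaviour discussed in this thread can be checked with a small sketch. The helper name `augment` and the toy arrays are hypothetical and only serve the example; the stacking logic mirrors the suggested change, and the sparse + dense case relies on the undocumented behaviour observed above rather than on a documented guarantee.

```python
import numpy as np
import scipy.sparse as sp


def augment(X, previous_predictions):
    """Append prediction columns to X, picking hstack based on X only."""
    hstack = sp.hstack if sp.issparse(X) else np.hstack
    return hstack([X, previous_predictions])


X_dense = np.array([[1.0, 2.0], [3.0, 4.0]])
X_sparse = sp.coo_matrix(X_dense)
previous_predictions = np.zeros((2, 2))  # dense, as returned by most estimators

aug_dense = augment(X_dense, previous_predictions)    # stays a NumPy array
aug_sparse = augment(X_sparse, previous_predictions)  # sparse X + dense block
```

In this sketch the sparse input stays sparse after stacking, and both results hold the same values, which matches the IPython session quoted above.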

lucyleeow · 2023-11-02T22:47:53Z

@glemaitre tests are currently failing because _get_response_values does not properly support 'predict_log_proba'. Maybe we could add support for this in another PR (or alternatively not support 'predict_log_proba' in chain_method in this PR)?

glemaitre · 2023-11-03T21:07:57Z

I did not think about predict_log_proba. I will open a PR to add the support. I think it should behave exactly like predict_proba (with an additional log when reverting the proba :))
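As a rough illustration of "exactly like predict_proba with an additional log", here is a NumPy-only sketch. The helper `positive_class_column` is hypothetical and is not scikit-learn's internal code; it only shows that selecting the positive-class column works the same way on probabilities and on log-probabilities.

```python
import numpy as np

# Toy binary predict_proba output: one row per sample, one column per class.
proba = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])
log_proba = np.log(proba)  # what predict_log_proba would return here


def positive_class_column(scores, classes, pos_label):
    """Pick the column of `scores` that belongs to `pos_label` (hypothetical helper)."""
    col = np.flatnonzero(classes == pos_label)[0]
    return scores[:, col]


classes = np.array([0, 1])
pos_proba = positive_class_column(proba, classes, pos_label=1)
pos_log_proba = positive_class_column(log_proba, classes, pos_label=1)
```

Exponentiating `pos_log_proba` recovers `pos_proba`, so the two response methods differ only by the elementwise log.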

glemaitre · 2023-11-03T21:43:28Z

The funny part about predict_log_proba is that ClassifierChain does not support it. So it is a bit funny to be able to request it as a chain_method but not to get it as a prediction method :). I added the support in #27720 because it is straightforward.

lucyleeow · 2023-11-04T00:46:10Z

Yes I noticed that too, I mostly added it because it was suggested by Joel here: #9316 (review)

I can add support for predict_log_proba here? Maybe that was intended?

glemaitre · 2023-11-04T09:56:47Z

I can add support for predict_log_proba here? Maybe that was intended?

I opened #27720 already.

lucyleeow · 2023-11-04T10:26:35Z

Ah amazing, thanks!

lucyleeow · 2023-11-04T10:27:02Z

Sorry I missed that in your comment!

glemaitre · 2023-12-04T10:22:53Z

Since we merged the blocking PR, I can give a review to this one once this is ready. Ping me @lucyleeow.

glemaitre

Another round of review but it looks good on my side.

lucyleeow · 2023-12-05T06:19:38Z

Thanks @glemaitre, changes made. I think we did not decide if we should convert to sparse: #27700 (comment)

I am fine with changing this in a separate PR as well, since this is not strictly related to this PR

glemaitre · 2023-12-05T18:32:23Z

I am fine with changing this in a separate PR as well, since this is not strictly related to this PR

Let's keep the same behaviour here.

glemaitre

LGTM. Thanks @lucyleeow

thomasjpfan

Minor comments on moving to v1.5, otherwise LGTM

thomasjpfan · 2024-02-23T00:20:36Z

doc/whats_new/v1.4.rst

+:mod:`sklearn.multioutput`
+..........................
+
+- |Enhancement| `chain_method` parameter added to :class:`multioutput.ClassifierChain`.


This needs to be moved to 1.5

thomasjpfan · 2024-02-23T00:20:50Z

sklearn/multioutput.py

+          preference. The method used corresponds to the first method in
+          the list that is implemented by `base_estimator`.
+
+        .. versionadded:: 1.4


Suggested change

-        .. versionadded:: 1.4
+        .. versionadded:: 1.5

Looks like the CI is failing. It has been a while, so I do not know if it is related.

lucyleeow · 2024-02-23T00:50:06Z

Updated, thanks @thomasjpfan !

lucyleeow · 2024-02-24T05:22:06Z

Thanks @glemaitre and @thomasjpfan for the reviews!

lesteve · 2024-02-24T08:06:45Z

It seems like this PR broke main somehow, see https://dev.azure.com/scikit-learn/scikit-learn/_build/results?buildId=64417&view=results.

The error reminds me of some issues that were seen in #27576.

NotImplementedError: We have not yet implemented 1D sparse slices; please index using explicit indices, e.g. `x[:, [0]]`

Maybe @StefanieSenger has some insights into this.

StefanieSenger · 2024-02-24T10:00:41Z

Yes, I do :) Since merging this PR, estimator_checks also check for sparse arrays, and for RegressorChain we had to convert X to a sparse COO array to circumvent a bug in scipy.

See this diff for more info.

Can I fix that?

StefanieSenger · 2024-02-24T10:25:11Z

I made #28524, please have a look @lesteve.

lucyleeow added 2 commits November 1, 2023 13:20

add chain method

b45a370

black

e4753e9

lucyleeow added 5 commits November 1, 2023 13:28

whats new

347dcbb

rm duplicated test

d5752bf

fix, rm chain_method param from regressorchain

4dde42c

black

8aab5d6

Merge branch 'main' into chain_method

56ac5fe

glemaitre self-requested a review November 1, 2023 11:36

glemaitre reviewed Nov 1, 2023

lucyleeow added 3 commits November 2, 2023 16:02

use _get_response_values

9e50488

merge main

2b49df0

fix-give output_method

7ca530e

glemaitre self-requested a review November 2, 2023 10:19

lucyleeow added 2 commits November 3, 2023 10:15

fix fit chain_method_ attr

6d1eb7d

black

dbe7ef2

This was referenced Nov 3, 2023

ENH _get_response_values handles predict_log_proba #27719

Merged

ENH add predict_log_proba to ClassifierChain #27720

Merged

lucyleeow mentioned this pull request Nov 17, 2023

MAINT Refactor: use _get_response_values in CalibratedClassifierCV #27796

Merged

lucyleeow added 2 commits December 4, 2023 21:01

Merge branch 'main' into chain_method

bf2b10e

lint

4141451

glemaitre reviewed Dec 4, 2023

lucyleeow added 2 commits December 5, 2023 17:11

review formatting

c554ac5

review comment, hstack

aa09f21

black

443bcdb

glemaitre approved these changes Dec 5, 2023

lucyleeow mentioned this pull request Dec 6, 2023

Ensure predictions sparse before sp.hstack in ClassifierChain #27905

Closed

lucyleeow added the Waiting for Second Reviewer and module:multioutput labels Feb 1, 2024

merge main

dd49dd9

thomasjpfan previously approved these changes Feb 23, 2024

lucyleeow added 2 commits February 23, 2024 11:46

Merge branch 'main' into chain_method

50b16d0

update v

969d889

thomasjpfan approved these changes Feb 23, 2024

thomasjpfan merged commit 4e82537 into scikit-learn:main Feb 23, 2024

lucyleeow deleted the chain_method branch February 24, 2024 05:21

This was referenced Feb 24, 2024

BUILD/CI Switch to Meson as main build backend #28506

Merged

ENH Add retry mechanism to fetch_xx functions. #28160

Merged

StefanieSenger mentioned this pull request Feb 24, 2024

FIX fix scipy bug with sp.hstack in ClassifierChain and RegressorChain #28524

Merged

luis261 mentioned this pull request Feb 24, 2024

FEAT Introduce DBCV as new cluster metric #28244

Closed

13 tasks
