[WIP] First step in enforcing the minimum of 2 required samples, fixing the MLPRegressor attribute error #24788


Merged

Conversation

mohitthakur13
Contributor

Reference Issues/PRs

Fixes #24713

What does this implement/fix? Explain your changes.

Added an if check to ensure that 10% of the input training data (X) amounts to at least 2 samples, based on the comment by @glemaitre, in the _validate_data function in the BaseEstimator class in sklearn/base.py.
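
A minimal sketch of what that check may have looked like (hypothetical reconstruction; the exact code is not shown in this thread):

# Hypothetical sketch: 0.1 is the hard-coded fraction discussed below.
if 0.1 * X.shape[0] < 2:
    raise ValueError("At least 2 samples are required for the validation split.")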

If this is the correct approach for designing the fix, here is what still needs to be done:

  1. manual selection in the case 2 <= len(X) < 10
  2. testing

Any other comments?

Any further comments or directions for designing a proper fix would be appreciated.

@betatim
Member

betatim commented Oct 31, 2022

Thanks for making this PR!

I think a better place to perform this check is in Base.fit() in sklearn/neural_network/_multilayer_perceptron.py. There you can use self.validation_fraction instead of having to hard-code 0.1. The check should only happen if early_stopping is being used.
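
For concreteness, a rough sketch of the check being described (placement and message are hypothetical; self.early_stopping and self.validation_fraction are the estimator's existing parameters):

# Hypothetical sketch of the suggested check inside fit():
if self.early_stopping and self.validation_fraction * X.shape[0] < 2:
    raise ValueError("validation_fraction leaves fewer than 2 validation samples.")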

@mohitthakur13
Contributor Author

mohitthakur13 commented Oct 31, 2022

Thanks @betatim, great tip. Will get back on this.


@mohitthakur13
Contributor Author

Hi @betatim, I updated Base.fit() in sklearn/neural_network/_multilayer_perceptron.py with an if check on self.validation_fraction, as you suggested. You should be able to see it now.

Of course, this implies that a ValueError will be raised for any training input where len(X) is less than 11. Is that the intention? Could there be some angle of the bigger picture that I am missing here?

Thank you.

@betatim
Member

betatim commented Nov 2, 2022

Apologies, I didn't read the code carefully enough; fit() isn't a good place. I've now read the code and think adding

if X_val.shape[0] < 2:
    raise ValueError("The validation set is too small. Increase the validation_fraction or the size of your dataset.")

after line 581 (where X_val is defined) in _multilayer_perceptron.py is the thing to do.

This way we can directly check that X_val contains enough samples (at least two).

I've also made a suggestion for the error message.

@mohitthakur13
Contributor Author

mohitthakur13 commented Nov 2, 2022

Thanks for the swift feedback @betatim

X_val is unfortunately not used in the fit or _fit functions of BaseMultilayerPerceptron. Also, fit does not check the early_stopping criterion. Could this be a possibility in _fit (at the cost of some repetition, though):

early_stopping = self.early_stopping
X, X_val, y, y_val = train_test_split(
    X,
    y,
    random_state=self._random_state,
    test_size=self.validation_fraction,
    stratify=stratify,
)
if early_stopping and X_val.shape[0] < 2:
    raise ValueError("The validation set is too small. Increase the validation_fraction or the size of your dataset.")

@betatim
Member

betatim commented Nov 3, 2022

X_val is created in _fit_stochastic(), so the check that it contains enough samples needs to be there as well. As far as I can see _fit_stochastic is the only place where a validation dataset is created, because it is the only place where early_stopping = True changes the behaviour of the estimator.

@mohitthakur13 force-pushed the MLPRegressor_attribute_error branch from b8ec6ec to 29f15d8 on November 3, 2022
@mohitthakur13
Contributor Author

mohitthakur13 commented Nov 3, 2022

I see. Then, if I have understood correctly, here is what could work:

  1. Early stopping matters only for the _fit_stochastic solver, so it is essentially independent of checking the input dataset size or validation_fraction in general.
  2. But _fit_stochastic is called by _fit in BaseMultilayerPerceptron.
  3. Therefore, putting the if check right after self._validate_input in the _fit function should solve the whole issue, and we only do it once. No extra check is needed in _fit_stochastic for early_stopping, since _fit already takes care of it.

You should already see it in the PR update. Please let me know if it makes sense or if I am missing something in the issue.

@mohitthakur13
Contributor Author

Hi @betatim, I wonder if you have had a chance to look at the solution above? Let me know if it addresses the issue. Thanks.

@betatim
Member

betatim commented Nov 8, 2022

Hi, unfortunately we misunderstood each other. To restate my earlier point:

X_val is created in _fit_stochastic(), so the check that it contains enough samples needs to be there as well.

@mohitthakur13
Contributor Author

mohitthakur13 commented Nov 8, 2022

Ok, so the recent commit contains the following two additions:

  1. an if check in the _fit function of the BaseMultilayerPerceptron class:

if self.validation_fraction * X.shape[0] < 2:
    raise ValueError("The validation set is too small. Increase the validation_fraction or the size of your dataset.")

  2. an additional if check within the _fit_stochastic function:

if X_val.shape[0] < 2:
    raise ValueError("The validation set is too small. Increase the validation_fraction or the size of your dataset.")

I suspect the reason for adding this additional check in _fit_stochastic is that _fit_stochastic is called from other places in the code than just _fit, right?

@betatim
Member

betatim commented Nov 8, 2022

I would remove your first addition and only check in _fit_stochastic. The reason being that there you can test what you really want to test: does the validation dataset contain enough samples? In _fit you'd have to replicate the exact behaviour of how train_test_split() (which is used to create the validation set) computes how many samples will be in the validation set. This is why I think checking in _fit_stochastic is the better thing to do.
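
To make the point concrete, a small standalone example (the numbers are arbitrary, but train_test_split and its test_size parameter are the actual scikit-learn API): with ten samples and a fraction of 0.1, the hold-out set ends up with a single row, which the direct X_val.shape[0] check catches without re-implementing any size arithmetic.

import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(10).reshape(-1, 1)
y = np.arange(10)

# train_test_split determines the hold-out size internally
# (ceil(test_size * n_samples) when test_size is a float).
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.1, random_state=0)
print(X_val.shape[0])  # 1 -- a single validation sample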

@mohitthakur13
Contributor Author

I see, got it @betatim, thanks for the clarification. Done: the check now lives only in _fit_stochastic.
Would you know what the next steps for this issue should be?
Thanks.

@betatim
Member

betatim commented Nov 9, 2022

Two more things are needed:

  1. a non-regression test to make sure the bug doesn't come back, maybe based on the example reported in AttributeError: 'MLPRegressor' object has no attribute '_best_coefs' #24713 (a sketch follows after this list)
    • the test should fail on main and pass with the changes from this PR
  2. an entry in the "what's new" change log that reports that this bug has been fixed
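
For illustration, a hedged sketch of such a non-regression test (the test name, data, and matched error message are assumptions based on this thread, not necessarily the merged code):

import numpy as np
import pytest

from sklearn.neural_network import MLPRegressor


def test_mlp_regressor_small_validation_set_raises():
    # 10 samples with validation_fraction=0.1 leave a single validation
    # row; the fix should raise a clear ValueError instead of failing
    # later with AttributeError (issue #24713).
    rng = np.random.RandomState(0)
    X = rng.rand(10, 2)
    y = rng.rand(10)
    mlp = MLPRegressor(early_stopping=True, validation_fraction=0.1)
    with pytest.raises(ValueError, match="validation set is too small"):
        mlp.fit(X, y)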

@mohitthakur13
Contributor Author

@betatim - sorry for the slight delay in getting back to this. I have added a test (the last function) in sklearn/neural_network/tests/test_mlp.py, which now checks the minimum input sample size.

The test passes when running pytest sklearn/neural_network/tests/test_mlp.py, but when I run pytest over the whole test suite, the following two (seemingly unrelated) errors show up:

  1. E AttributeError: module 'sklearn.metrics._pairwise_distances_reduction._base' has no attribute 'BaseDistanceReducer64'
  2. E AttributeError: module 'sklearn.metrics._pairwise_distances_reduction._base' has no attribute 'BaseDistanceReducer64'

Could it be due to some deprecated code?
Thank you.

@mohitthakur13
Contributor Author

@betatim - could you take a look at the code (the test submitted above)? Any feedback is welcome.

@betatim
Member

betatim commented Nov 28, 2022

The linting tests that check the formatting of the code are failing. Take a look at step 9 of https://scikit-learn.org/stable/developers/contributing.html#how-to-contribute or the section after that for the tools you need to run to have the formatting fixed automatically. The easiest way is to use pre-commit.
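
For reference, the pre-commit workflow from that guide boils down to running pip install pre-commit once, then pre-commit install in the repository root; the formatters then run automatically on each commit, and pre-commit run --all-files applies them to the whole tree in one go.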

I'm not sure why the other two tests fail; they shouldn't be related to your changes. If you fix the linter errors, we can see whether the problems also happen in the CI itself.

@mohitthakur13
Contributor Author

You are right @betatim, they shouldn't come from the fix I suggested. Let me have a look. Thanks for the feedback.

@mohitthakur13
Contributor Author

Done!

@glemaitre self-requested a review September 2, 2024 09:49
@glemaitre removed their request for review November 6, 2024 17:47

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: c3ca73b.

Member

@jeremiedbb left a comment


LGTM. Thanks @mohitthakur13

@jeremiedbb merged commit eec4449 into scikit-learn:main on Apr 13, 2025
36 checks passed
lucyleeow pushed a commit to EmilyXinyi/scikit-learn that referenced this pull request Apr 23, 2025
…gressor (scikit-learn#24788)

Co-authored-by: Mohit Singh Thakur <mohit.thakur@tum.de>
Co-authored-by: Mohit Singh Thakur <thakur.mohit@gmail.com>
Co-authored-by: Jérémie du Boisberranger <jeremie@probabl.ai>