MAINT parameter validation in Perceptron #23521
Conversation
Thanks for the PR @Nwanna-Joseph. It looks like you have an auto-formatting tool that messes up the indentation of the function definitions. Can you revert these changes?
It appears that BaseSGD already defines a method named _validate_params. I think we should rename it for now, otherwise calling self._validate_params() will invoke this one instead of the one we're expecting (we need to call both).
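The shadowing issue above can be sketched in plain Python. The class names echo the discussion, but the method bodies are purely illustrative, not the actual scikit-learn code:

```python
class CommonBase:
    def _validate_params(self):
        return "common validation"

class BaseSGD(CommonBase):
    # Shadows CommonBase._validate_params: plain attribute lookup on any
    # BaseSGD subclass finds this definition first in the MRO.
    def _validate_params(self):
        return "SGD-specific validation"

class Perceptron(BaseSGD):
    pass

print(Perceptron()._validate_params())  # the BaseSGD version wins
```

This is why the base-class method needs renaming (or an explicit qualified call such as `CommonBase._validate_params(self)`) when both validations must run.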
Hey there @Nwanna-Joseph, thanks for the PR! Note that I'm not a core maintainer, so you'll still need two approvals from core maintainers before your PR can be merged, but I'm hoping my review can help you along the way and speed up the process :)
Overall, one thing worth noting here is that this could be significantly simplified/streamlined by making more use of inheritance, as I mention in one of my comments. In particular, some constraints ([fit_intercept, max_iter, tol, ...]) can go all the way in BaseSGD, but you can also include some in the intermediate BaseSGD{Classifier, Regressor} classes (as you've done with loss). These would be params like early_stopping that we don't want in SGDOneClassSVM. Finally, there are the params specific to PassiveAggressive{Classifier, Regressor}, which should be fairly few.
Let me know if you have any questions/concerns.
    _parameter_constraints = {
        "loss": [StrOptions({"epsilon_insensitive", "squared_epsilon_insensitive"})],
        "C": [Interval(Real, None, None, closed="neither")],
        "fit_intercept": [bool],
        "max_iter": [Interval(Integral, 1, None, closed="left")],
        "tol": [Interval(Real, None, None, closed="neither"), None],
        "shuffle": [bool],
        "verbose": [Interval(Integral, 0, None, closed="left")],
        "random_state": ["random_state"],
        "early_stopping": [bool],
        "validation_fraction": [Interval(Real, 0, 1, closed="neither")],
        "n_iter_no_change": [Interval(Integral, 1, None, closed="left")],
        "warm_start": [bool],
        "average": [Interval(Integral, 0, None, closed="left"), bool],
        "epsilon": [Interval(Real, 0, None, closed="left")],
    }
I think we should try to map out which parameters reside in which classes. A lot of this can be greatly simplified via inheritance. For example, PassiveAggressiveRegressor inherits from BaseSGDRegressor, but the parameter constraints here are separate. I think BaseSGDRegressor has the same constraints for all the parameters except [loss, C, epsilon].
This could then be simplified as:
    _parameter_constraints = {
        **BaseSGDRegressor._parameter_constraints,
        "loss": [StrOptions({"epsilon_insensitive", "squared_epsilon_insensitive"})],
        "C": [Interval(Real, 0, None, closed="neither")],
        "epsilon": [Interval(Real, 0, None, closed="left")],
    }
@Nwanna-Joseph I directly pushed some changes to take into account recent improvements in the validation mechanism and to better leverage inheritance. The inheritance between these classes clearly has some flaws, but let's keep that for a future PR.
    from ..utils._param_validation import Interval
    from ..utils._param_validation import StrOptions
    from ..utils._param_validation import Hidden
Suggested change:

    from ..utils._param_validation import Hidden, Interval, StrOptions
I'd rather rely on isort #23362 😄
It is to be consistent with the other imports in the other files :)
that you changed unilaterally 🦊
Oops :)
Otherwise LGTM
    @@ -188,19 +186,11 @@ def _get_loss_function(self, loss):
                raise ValueError("The loss %s is not supported. " % loss) from e

        def _get_learning_rate_type(self, learning_rate):
In a future PR, I think that we can just remove those _get functions and directly get the argument.
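As a hedged sketch of what "directly get the argument" could look like (the dict name and integer codes here are illustrative, not the real scikit-learn internals): once parameters are validated centrally, a `_get_*`-style helper reduces to a plain mapping lookup, with no try/except needed to produce a friendly error message.

```python
# Hypothetical mapping from learning-rate name to an internal integer code.
LEARNING_RATE_TYPES = {"constant": 1, "optimal": 2, "invscaling": 3, "adaptive": 4}

def learning_rate_code(learning_rate):
    # learning_rate is assumed already checked by up-front parameter
    # validation, so a bare lookup can never raise KeyError in practice.
    return LEARNING_RATE_TYPES[learning_rate]

print(learning_rate_code("optimal"))  # 2
```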
    @@ -1167,6 +1178,20 @@ class SGDClassifier(BaseSGDClassifier):
        [1]
        """

        _parameter_constraints = {
            **BaseSGDClassifier._parameter_constraints,
            "penalty": [StrOptions({"l2", "l1", "elasticnet"}), None],
It seems that None is not documented. I know that in LogisticRegression, we request the string "none" instead of the Python None.
I opened #23749 to discuss the issue. We can leave it as-is here; maybe only updating the documentation is required.
None was not documented, but it is documented in PassiveAggressive, so it was just a mistake. I added it to the docstring.
    "l1_ratio": [Interval(Real, 0, 1, closed="both")],
    "power_t": [Interval(Real, None, None, closed="neither")],
Can we just use Real if we don't have any min and max?
The difference is that -inf and inf are not in the interval, while they are instances of Real, so this is more accurate.
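The distinction can be shown with a minimal sketch in plain Python (the helper below only mimics the behavior attributed to Interval(Real, None, None, closed="neither"); it is not the scikit-learn implementation):

```python
from numbers import Real
import math

def in_open_real_interval(x):
    # An open interval over the reals with no bounds still excludes
    # -inf, inf, and nan, whereas an isinstance check against Real does not.
    return isinstance(x, Real) and math.isfinite(x)

print(isinstance(float("inf"), Real))       # True: inf is a Real instance
print(in_open_real_interval(float("inf")))  # False: excluded by the interval
print(in_open_real_interval(0.5))           # True
```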
LGTM
Co-authored-by: Meekail Zain <34613774+Micky774@users.noreply.github.com>
LGTM
…cikit-learn#23521) Co-authored-by: jeremie du boisberranger <jeremiedbb@yahoo.fr> Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> Co-authored-by: Meekail Zain <34613774+Micky774@users.noreply.github.com>
Reference Issues/PRs
See PR #23462
What does this implement/fix? Explain your changes.
Add validators for Perceptron. Towards #23462
Any other comments?