Make check_parameters_default_constructible more lenient regarding accepted types #25467

glemaitre · 2023-01-24T11:28:31Z

With the nightly build of numpy, we have the following error raised:

               allowed_types = {
                    str,
                    int,
                    float,
                    bool,
                    tuple,
                    type(None),
                    type,
                    types.FunctionType,
                    joblib.Memory,
                }
                # Any numpy numeric such as np.int32.
                allowed_types.update(np.core.numerictypes.allTypes.values())
>               assert type(init_param.default) in allowed_types, (
                    f"Parameter '{init_param.name}' of estimator "
                    f"'{Estimator.__name__}' is of type "
                    f"{type(init_param.default).__name__} which is not "
                    "allowed. All init parameters have to be immutable to "
                    "make cloning possible. Therefore we restrict the set of "
                    "legal types to "
                    f"{set(type.__name__ for type in allowed_types)}."
                )
E               AssertionError: Parameter 'pooling_func' of estimator 'FeatureAgglomeration' is of type _ArrayFunctionDispatcher which is not allowed. All init parameters have to be immutable to make cloning possible. Therefore we restrict the set of legal types to {'bool_', 'int64', 'datetime64', 'complex128', 'int8', 'object_', 'bool', 'float16', 'float32', 'tuple', 'str', 'longlong', 'complex64', 'generic', 'int', 'uint64', 'float128', 'int32', 'floating', 'complex256', 'uint32', 'float', 'integer', 'timedelta64', 'type', 'uint16', 'function', 'int16', 'void', 'Memory', 'flexible', 'complexfloating', 'float64', 'unsignedinteger', 'bytes_', 'signedinteger', 'NoneType', 'number', 'uint8', 'character', 'inexact', 'ulonglong', 'str_'}.

I think this is linked with the integration of: numpy/numpy#23020

Now, the np.func are of type _ArrayFunctionDispatcher and not function anymore. This is a kind of regression but I assume that we could make our test more lenient and accept those types of input.

The only issue is that this dispatcher is private so I don't think this is a bright idea to add it to our test as is.

The text was updated successfully, but these errors were encountered:

glemaitre · 2023-01-24T11:29:03Z

Solving this issue will solve the remaining failure of scipy-dev.

shamzos · 2023-01-25T03:12:17Z

Hello, can I work on this issue ?

lesteve · 2023-01-25T07:52:52Z

@shamzos probably not the best issue to work on, you can look at the issues tagged "good first issue". Have a look at https://scikit-learn.org/dev/developers/contributing.html for more details.

betatim · 2023-01-25T14:20:25Z

Is there a reason why it has to be a types.FunctionType? Would it be ok if we checked if is callable()? callable doesn't check the type, but I wonder if that difference matters? Does function promise more regarding to immutability than callable?

glemaitre · 2023-01-25T14:24:28Z

Does function promise more regarding to immutability than callable?

No, but this is not the same pattern used. Right now we check if the type is in a given set of types. callable is equivalent to isinstance. So we would need to rewrite the statement.

assert type(init_param.default) in allowed_types or callable(init_param.default)

betatim · 2023-01-25T15:01:16Z

Yes, you'd have to rewrite it. But it seemed like that would be easy (in terms of making the edit) compared to understanding if this is "the right thing to do".

lesteve · 2023-01-25T16:41:05Z

It seems like we are trying to avoid the Python gotcha with default mutable arguments but at the same time we accept functions as default arguments which are potentially mutable so ...

Allowing callables would accept an even wider class of mutable objects.

Honestly I don't know what to do about this test, maybe two main options:

be permissive: accept callables and trust that nothing too weird happens
be restrictive: do not accept functions as default arguments and follow the usual pattern of setting the default to None and then in fit transforming this None into the actual default function.

Quickly checking there are 7 estimators that use functions as default arguments (removing types.FunctionType from allowed_types and running pytest sklearn/tests/test_common.py -k default_constructible)

FAILED sklearn/tests/test_common.py::test_estimators[FeatureAgglomeration()-check_parameters_default_constructible] - AssertionError: Parameter 'pooling_f...
FAILED sklearn/tests/test_common.py::test_estimators[GenericUnivariateSelect()-check_parameters_default_constructible] - AssertionError: Parameter 'score_...
FAILED sklearn/tests/test_common.py::test_estimators[SelectFdr()-check_parameters_default_constructible] - AssertionError: Parameter 'score_func' of estim...
FAILED sklearn/tests/test_common.py::test_estimators[SelectFpr()-check_parameters_default_constructible] - AssertionError: Parameter 'score_func' of estim...
FAILED sklearn/tests/test_common.py::test_estimators[SelectFwe()-check_parameters_default_constructible] - AssertionError: Parameter 'score_func' of estim...
FAILED sklearn/tests/test_common.py::test_estimators[SelectKBest()-check_parameters_default_constructible] - AssertionError: Parameter 'score_func' of est...
FAILED sklearn/tests/test_common.py::test_estimators[SelectPercentile()-check_parameters_default_constructible] - AssertionError: Parameter 'score_func' o...

betatim · 2023-01-26T08:51:23Z

I think I'd be permissive. The main trap(?) is mutable default arguments, so things like lists that end up being shared. I'm not sure the same trap exists for functions, for functions (or callables) the problem is much worse: who knows what they are doing, independent of them being defined as default value.

Another aspect that could be relevant: is there a constraint on the default arguments of an estimator because of how joblib works? I'm thinking of things like Python lambda's not being picklable and such. Any thoughts?

lesteve · 2023-01-26T16:27:45Z

Another aspect that could be relevant: is there a constraint on the default arguments of an estimator because of how joblib works? I'm thinking of things like Python lambda's not being picklable and such. Any thoughts?

I don't think so but I could be wrong. Also I am quite sure that there are common tests checking that estimators can be pickled which would catch such an issue.

glemaitre · 2023-01-27T10:21:34Z

So let's go by checking if this is a callable. It seems the most appropriate thing to do.

See discussions and decision of: scikit-learn#25467 Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> Co-authored-by: Loïc Estève <loic.esteve@ymail.com> Co-authored-by: From: Tim Head <betatim@gmail.com>

github-actions bot added the Needs Triage Issue requires triage label Jan 24, 2023

glemaitre removed the Needs Triage Issue requires triage label Jan 24, 2023

lesteve mentioned this issue Jan 27, 2023

TST Allow callables as valid parameter regarding cloning estimator #25498

Merged

lesteve closed this as completed in #25498 Jan 27, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make check_parameters_default_constructible more lenient regarding accepted types #25467

Make check_parameters_default_constructible more lenient regarding accepted types #25467

glemaitre commented Jan 24, 2023 •

edited

Loading

glemaitre commented Jan 24, 2023

shamzos commented Jan 25, 2023

lesteve commented Jan 25, 2023

betatim commented Jan 25, 2023

glemaitre commented Jan 25, 2023 •

edited

Loading

betatim commented Jan 25, 2023

lesteve commented Jan 25, 2023 •

edited

Loading

betatim commented Jan 26, 2023

lesteve commented Jan 26, 2023

glemaitre commented Jan 27, 2023 •

edited

Loading

Make check_parameters_default_constructible more lenient regarding accepted types #25467

Make check_parameters_default_constructible more lenient regarding accepted types #25467

Comments

glemaitre commented Jan 24, 2023 • edited Loading

glemaitre commented Jan 24, 2023

shamzos commented Jan 25, 2023

lesteve commented Jan 25, 2023

betatim commented Jan 25, 2023

glemaitre commented Jan 25, 2023 • edited Loading

betatim commented Jan 25, 2023

lesteve commented Jan 25, 2023 • edited Loading

betatim commented Jan 26, 2023

lesteve commented Jan 26, 2023

glemaitre commented Jan 27, 2023 • edited Loading

glemaitre commented Jan 24, 2023 •

edited

Loading

glemaitre commented Jan 25, 2023 •

edited

Loading

lesteve commented Jan 25, 2023 •

edited

Loading

glemaitre commented Jan 27, 2023 •

edited

Loading