Skip to content

Make check_parameters_default_constructible more lenient regarding accepted types #25467

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
glemaitre opened this issue Jan 24, 2023 · 10 comments · Fixed by #25498
Closed

Make check_parameters_default_constructible more lenient regarding accepted types #25467

glemaitre opened this issue Jan 24, 2023 · 10 comments · Fixed by #25498

Comments

@glemaitre
Copy link
Member

glemaitre commented Jan 24, 2023

With the nightly build of numpy, we have the following error raised:

               allowed_types = {
                    str,
                    int,
                    float,
                    bool,
                    tuple,
                    type(None),
                    type,
                    types.FunctionType,
                    joblib.Memory,
                }
                # Any numpy numeric such as np.int32.
                allowed_types.update(np.core.numerictypes.allTypes.values())
>               assert type(init_param.default) in allowed_types, (
                    f"Parameter '{init_param.name}' of estimator "
                    f"'{Estimator.__name__}' is of type "
                    f"{type(init_param.default).__name__} which is not "
                    "allowed. All init parameters have to be immutable to "
                    "make cloning possible. Therefore we restrict the set of "
                    "legal types to "
                    f"{set(type.__name__ for type in allowed_types)}."
                )
E               AssertionError: Parameter 'pooling_func' of estimator 'FeatureAgglomeration' is of type _ArrayFunctionDispatcher which is not allowed. All init parameters have to be immutable to make cloning possible. Therefore we restrict the set of legal types to {'bool_', 'int64', 'datetime64', 'complex128', 'int8', 'object_', 'bool', 'float16', 'float32', 'tuple', 'str', 'longlong', 'complex64', 'generic', 'int', 'uint64', 'float128', 'int32', 'floating', 'complex256', 'uint32', 'float', 'integer', 'timedelta64', 'type', 'uint16', 'function', 'int16', 'void', 'Memory', 'flexible', 'complexfloating', 'float64', 'unsignedinteger', 'bytes_', 'signedinteger', 'NoneType', 'number', 'uint8', 'character', 'inexact', 'ulonglong', 'str_'}.

I think this is linked with the integration of: numpy/numpy#23020

Now, the np.func are of type _ArrayFunctionDispatcher and not function anymore. This is a kind of regression but I assume that we could make our test more lenient and accept those types of input.

The only issue is that this dispatcher is private so I don't think this is a bright idea to add it to our test as is.

@github-actions github-actions bot added the Needs Triage Issue requires triage label Jan 24, 2023
@glemaitre
Copy link
Member Author

Solving this issue will solve the remaining failure of scipy-dev.

@glemaitre glemaitre removed the Needs Triage Issue requires triage label Jan 24, 2023
@shamzos
Copy link

shamzos commented Jan 25, 2023

Hello, can I work on this issue ?

@lesteve
Copy link
Member

lesteve commented Jan 25, 2023

@shamzos probably not the best issue to work on, you can look at the issues tagged "good first issue". Have a look at https://scikit-learn.org/dev/developers/contributing.html for more details.

@betatim
Copy link
Member

betatim commented Jan 25, 2023

Is there a reason why it has to be a types.FunctionType? Would it be ok if we checked if is callable()? callable doesn't check the type, but I wonder if that difference matters? Does function promise more regarding to immutability than callable?

@glemaitre
Copy link
Member Author

glemaitre commented Jan 25, 2023

Does function promise more regarding to immutability than callable?

No, but this is not the same pattern used. Right now we check if the type is in a given set of types. callable is equivalent to isinstance. So we would need to rewrite the statement.

assert type(init_param.default) in allowed_types or callable(init_param.default)

@betatim
Copy link
Member

betatim commented Jan 25, 2023

Yes, you'd have to rewrite it. But it seemed like that would be easy (in terms of making the edit) compared to understanding if this is "the right thing to do".

@lesteve
Copy link
Member

lesteve commented Jan 25, 2023

It seems like we are trying to avoid the Python gotcha with default mutable arguments but at the same time we accept functions as default arguments which are potentially mutable so ...

Allowing callables would accept an even wider class of mutable objects.

Honestly I don't know what to do about this test, maybe two main options:

  • be permissive: accept callables and trust that nothing too weird happens
  • be restrictive: do not accept functions as default arguments and follow the usual pattern of setting the default to None and then in fit transforming this None into the actual default function.

Quickly checking there are 7 estimators that use functions as default arguments (removing types.FunctionType from allowed_types and running pytest sklearn/tests/test_common.py -k default_constructible)

FAILED sklearn/tests/test_common.py::test_estimators[FeatureAgglomeration()-check_parameters_default_constructible] - AssertionError: Parameter 'pooling_f...
FAILED sklearn/tests/test_common.py::test_estimators[GenericUnivariateSelect()-check_parameters_default_constructible] - AssertionError: Parameter 'score_...
FAILED sklearn/tests/test_common.py::test_estimators[SelectFdr()-check_parameters_default_constructible] - AssertionError: Parameter 'score_func' of estim...
FAILED sklearn/tests/test_common.py::test_estimators[SelectFpr()-check_parameters_default_constructible] - AssertionError: Parameter 'score_func' of estim...
FAILED sklearn/tests/test_common.py::test_estimators[SelectFwe()-check_parameters_default_constructible] - AssertionError: Parameter 'score_func' of estim...
FAILED sklearn/tests/test_common.py::test_estimators[SelectKBest()-check_parameters_default_constructible] - AssertionError: Parameter 'score_func' of est...
FAILED sklearn/tests/test_common.py::test_estimators[SelectPercentile()-check_parameters_default_constructible] - AssertionError: Parameter 'score_func' o...

@betatim
Copy link
Member

betatim commented Jan 26, 2023

I think I'd be permissive. The main trap(?) is mutable default arguments, so things like lists that end up being shared. I'm not sure the same trap exists for functions, for functions (or callables) the problem is much worse: who knows what they are doing, independent of them being defined as default value.

Another aspect that could be relevant: is there a constraint on the default arguments of an estimator because of how joblib works? I'm thinking of things like Python lambda's not being picklable and such. Any thoughts?

@lesteve
Copy link
Member

lesteve commented Jan 26, 2023

Another aspect that could be relevant: is there a constraint on the default arguments of an estimator because of how joblib works? I'm thinking of things like Python lambda's not being picklable and such. Any thoughts?

I don't think so but I could be wrong. Also I am quite sure that there are common tests checking that estimators can be pickled which would catch such an issue.

@glemaitre
Copy link
Member Author

glemaitre commented Jan 27, 2023

So let's go by checking if this is a callable. It seems the most appropriate thing to do.

jjerphan added a commit to jjerphan/scikit-learn that referenced this issue Jan 27, 2023
See discussions and decision of:
scikit-learn#25467

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Loïc Estève <loic.esteve@ymail.com>
Co-authored-by: From: Tim Head <betatim@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants