-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
MAINT validate parameter in sklearn.preprocessing._encoders #23579
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MAINT validate parameter in sklearn.preprocessing._encoders #23579
Conversation
… successfully run tests
…tic/scikit-learn into ohe_validate_params
I'm currently stuck, because _InstancesOf doesn't support numpy data types, see #23599 |
…ass numpy dtypes as string and internally converts them
I think we leave off validation from sklearn.preprocessing import OrdinalEncoder
enc = OrdinalEncoder(dtype="i4")
X = [['dog'], ['cat'], ['snake']]
enc.fit_transform(X)
# array([[1],
# [0],
# [2]], dtype=int32) We can let NumPy handle bad dtypes. For example, this will raise already: from sklearn.preprocessing import OrdinalEncoder
enc = OrdinalEncoder(dtype="i423e12")
X = [['dog'], ['cat'], ['snake']]
enc.fit_transform(X)
# TypeError: data type 'i423e12' not understood For reference, what NumPy considers a valid "dtype" is quite complex according to NumPy's typing system |
Since the parameter validation requires at least one constraint for each parameter. What is the best way to approach the |
I think we can use |
Unfortunately this collides with
Maybe this check should be disabled for |
The current design of |
@thomasjpfan I guess we can add a special case |
Okay, then I'll resume working on this, when the PR is done :) |
ea27732
to
e3f230b
Compare
@Diadochokinetic I directly pushed some changes to take into account the recent improvements in the validation mechanism. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise LGTM
Thanks @Diadochokinetic |
…learn#23579) Co-authored-by: jeremie du boisberranger <jeremiedbb@yahoo.fr>
Reference Issues/PRs
Make all estimators use _validate_params #23462
What does this implement/fix? Explain your changes.
Implements _validate_params for sklear.preprocessing._encoders. Shared parameters will be implemented in the _BaseEncoder.
Any other comments?
This is my first contribution. Tips and Feedback are highly appreciated.