[MRG-0] Make LabelEncoder more friendly to new labels #3483
Conversation
You have a rebase issue. I ran into this problem once. If I remember correctly, I think that I fixed
Hope it helps.
@arjoly, that appears to have worked. Thanks for the recommendation and sorry for the mistake.
No problem, I ran into that issue once and it was frustrating. I am happy that it works for you!
But it's important to have a consistent API across the scikit. Could you use a property instead?
@mblondel, thanks for the recommendation. Implemented as suggested.
- If ``"raise"``, then raise ValueError.
- If ``"update"``, then re-map the new labels to
  classes ``[N, ..., N+m-1]``, where ``m`` is the number of new labels.
- If an integer value is passed, then re-label with this value.
Could it work with string labels?
@mjbommar, should we expect you won't be completing this any time soon and label it "needs contributor" for someone to adopt? I will do so, but you should say if you'd rather complete it.
@jnothman, my recollection is fuzzy, but I think this issue was primarily blocked by design disagreements. If we can come to an agreement about desired behavior, I could see how easily the work could be completed and merged into master.
This is a final, cleanly rebased version of PR 3243 (#3243) incorporating discussions.

Summary:

This PR intends to make `preprocessing.LabelEncoder` more friendly for production/pipeline usage by adding a `new_labels` constructor argument.

Instead of always raising `ValueError` for unseen/new labels in transform, `LabelEncoder` may be initialized with `new_labels` as:

- `"raise"`: current behavior, i.e., raise `ValueError`; to remain default behavior
- `"update"`: update classes with new IDs `[N, ..., N+m-1]` for m new labels and assign

N.B.: `.classes_` is not a property to support the `new_labels="update"` behavior.

Tests and documentation updates included.
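
For readers who want to see the proposed semantics in action, here is a minimal pure-Python sketch of the three `new_labels` modes discussed in this thread. It is not the code from this PR and not scikit-learn's `LabelEncoder`: the class name `SketchLabelEncoder` is hypothetical, and the choice to append unseen labels in sorted order is an assumption made only for illustration.

```python
class SketchLabelEncoder:
    """Illustrative stand-in (hypothetical), not scikit-learn's LabelEncoder."""

    def __init__(self, new_labels="raise"):
        # "raise", "update", or an integer fallback code, as described above.
        self.new_labels = new_labels

    def fit(self, y):
        # Classes seen during fit get the codes 0, ..., N-1.
        self.classes_ = sorted(set(y))
        return self

    def transform(self, y):
        index = {label: i for i, label in enumerate(self.classes_)}
        # Assumption: unseen labels are processed in sorted order.
        unseen = sorted(set(y) - set(index))
        if unseen:
            if self.new_labels == "raise":
                raise ValueError(f"y contains new labels: {unseen}")
            elif self.new_labels == "update":
                # Append the m unseen labels as classes N, ..., N+m-1.
                for label in unseen:
                    index[label] = len(self.classes_)
                    self.classes_.append(label)
            elif isinstance(self.new_labels, int):
                # Map every unseen label to the fixed integer code.
                for label in unseen:
                    index[label] = self.new_labels
            else:
                raise ValueError(f"invalid new_labels: {self.new_labels!r}")
        return [index[label] for label in y]


# "update": the two unseen labels receive the next free codes, 3 and 4.
enc = SketchLabelEncoder(new_labels="update").fit(["a", "b", "c"])
updated = enc.transform(["a", "d", "b", "e"])   # [0, 3, 1, 4]

# Integer: unseen labels collapse to the given code, classes stay unchanged.
fixed = SketchLabelEncoder(new_labels=-1).fit(["a", "b"]).transform(["b", "z"])  # [1, -1]
```

Note that in this sketch `"update"` makes `transform` stateful: `classes_` grows as new labels arrive, which is the kind of behavior the design discussion above was concerned with.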