[MRG-0] Make LabelEncoder more friendly to new labels #3483

Closed

Conversation

mjbommar
Contributor

This is a final, cleanly rebased version of #3243, incorporating the discussion there.

Summary:

This PR aims to make preprocessing.LabelEncoder friendlier for production/pipeline usage by adding a new_labels constructor argument.

Instead of always raising ValueError for unseen/new labels in transform, LabelEncoder may be initialized with new_labels set to one of:

  • "raise": raise ValueError, as currently; this remains the default behavior
  • "update": extend classes_ with the m new labels and encode them with new IDs [N, ..., N+m-1]
  • an integer value: encode all newly seen labels with this fixed integer value

N.B.: .classes_ is now exposed as a property in order to support the new_labels="update" behavior.

Tests and documentation updates included.
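
For illustration, a minimal usage sketch of the proposed behavior (the new_labels argument exists only on this branch, not in released scikit-learn, and the label strings below are made-up examples):

from sklearn.preprocessing import LabelEncoder

# Default behavior (new_labels="raise"): unseen labels still raise ValueError.
le = LabelEncoder()
le.fit(["paris", "tokyo"])
# le.transform(["rome"])  # raises ValueError

# new_labels="update": the m new labels are appended to classes_ and encoded
# as [N, ..., N+m-1], where N is the number of classes seen during fit.
le = LabelEncoder(new_labels="update")
le.fit(["paris", "tokyo"])        # paris -> 0, tokyo -> 1
le.transform(["tokyo", "rome"])   # "rome" is added and encoded as 2

# new_labels=<int>: every unseen label is encoded with this fixed value.
le = LabelEncoder(new_labels=-1)
le.fit(["paris", "tokyo"])
le.transform(["rome", "oslo"])    # both encoded as -1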

@arjoly
Member

arjoly commented Jul 24, 2014

You have a rebase issue. I ran into this problem once; if I remember correctly, I fixed it
with the following strategy:

git checkout master
git fetch upstream
git rebase upstream/master   # bring your local master up to date with upstream
git checkout new-branch
git rebase master            # replay the feature branch onto the updated master
git push -f                  # force-push the rewritten branch

Hope it helps.

@mjbommar
Contributor Author

@arjoly, that appears to have worked. Thanks for the recommendation, and sorry for the mistake.

@coveralls

Coverage Status

Coverage decreased (-0.0%) when pulling 866e939 on mjbommar:label-encoder-unseen-final into 376ac51 on scikit-learn:master.

@arjoly
Member

arjoly commented Jul 24, 2014

No problem, I ran into that issue once and it was frustrating. I'm happy that it works for you!

@coveralls

Coverage Status

Coverage decreased (-0.0%) when pulling da4cafb on mjbommar:label-encoder-unseen-final into 376ac51 on scikit-learn:master.

@mblondel
Member

N.B.: Direct access to .classes_ should be replaced with .get_classes() in order to properly handle the new_labels="update".

But it's important to have a consistent API across the scikit. Could you use a property instead?
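
For illustration only, here is a toy sketch (not the PR's actual code; the class and attribute names are assumptions) of how classes_ can be exposed as a property over an internal list, so it stays in sync when "update" appends new labels:

import numpy as np

class LabelEncoderSketch:
    def __init__(self, new_labels="raise"):
        self.new_labels = new_labels
        self._classes = []

    def fit(self, y):
        self._classes = sorted(set(y))
        return self

    def transform(self, y):
        index = {label: i for i, label in enumerate(self._classes)}
        encoded = []
        for label in y:
            if label in index:
                encoded.append(index[label])
            elif self.new_labels == "update":
                # Append the unseen label; it receives the next free ID.
                index[label] = len(self._classes)
                self._classes.append(label)
                encoded.append(index[label])
            elif isinstance(self.new_labels, int):
                # Fixed integer fallback for unseen labels.
                encoded.append(self.new_labels)
            else:
                raise ValueError("y contains new labels: %r" % (label,))
        return np.asarray(encoded)

    @property
    def classes_(self):
        # Read-only view that also reflects labels added by "update".
        return np.asarray(self._classes)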

@coveralls

Coverage Status

Coverage decreased (-0.01%) when pulling 0f3e3d3 on mjbommar:label-encoder-unseen-final into 376ac51 on scikit-learn:master.

@mjbommar
Contributor Author

@mblondel , thanks for the recommendation. Implemented as suggested.

@mjbommar changed the title from "Make LabelEncoder more friendly to new labels" to "[MRG-0] Make LabelEncoder more friendly to new labels" on Jul 25, 2014
@mjbommar
Contributor Author

mjbommar commented Aug 3, 2014

@arjoly, @jnothman, is there anything I can do to make this easier for you to review?

- If ``"raise"``, then raise ValueError.
- If ``"update"``, then re-map the new labels to
classes ``[N, ..., N+m-1]``, where ``m`` is the number of new labels.
- If an integer value is passed, then re-label with this value.
Member

Could this also work with a string label?

@jnothman
Member

@mjbommar, should we expect you won't be completing this any time soon and label it "needs contributor" for someone to adopt? I will do so, but you should say if you'd rather complete it.

@mjbommar
Contributor Author

mjbommar commented Apr 28, 2016

@jnothman, my recollection is fuzzy, but I think this was primarily blocked by design disagreements. If we can come to an agreement about the desired behavior, I could see how easily the work could be completed and merged into master.

@mjbommar closed this on Jul 6, 2016