Skip to content

Make labelencoder use hashtable #7455

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

themrmax
Copy link
Contributor

@themrmax themrmax commented Sep 19, 2016

Reference Issue

Fixes #7432

What does this implement/fix? Explain your changes.

Use a dict lookup instead of np.sortedsearch for the label lookups. Also removed the custom fit_transform method, since the dict lookup is quicker than the np.unique(return_index=True) trick that was used here before.

Any other comments?

@themrmax themrmax closed this Sep 19, 2016
@jnothman
Copy link
Member

Not sure why you closed, but we can't have pandas as a dependency.

On 19 September 2016 at 17:06, themrmax notifications@github.com wrote:

Closed #7455 #7455.


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#7455 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAEz65BEw_OKo5RNkAJbiFglEKCVwxtHks5qrjR7gaJpZM4KAMEg
.

@themrmax
Copy link
Contributor Author

@jnothman i closed because i noticed all the commits from my other unmerged pull request (can you take a look at that again btw? i fixed the issue with the sentinel.) how come we can't use pandas as a dependency?

@jnothman
Copy link
Member

i noticed all the commits from my other unmerged pull request

I'm not sure what that means.

how come we can't use pandas as a dependency?

Why should we? Minimising dependencies is generally a good idea to avoid issues with versioning, compatibility testing, installation overhead, etc...

@themrmax
Copy link
Contributor Author

the pull request has 13 commits on it but i only want the last one. the reason i would use pandas is because pandas.algos and pandas.core.base has some very fast implementations of hashtables, and unique, 10-20x speedup on numpy and, 3x speedup on the my simplistic implemenation using dict

@jnothman
Copy link
Member

Sometimes you have to compromise or reimplement.

On 20 September 2016 at 10:09, themrmax notifications@github.com wrote:

the pull request has 13 commits on it but i only want the last one. the
reason i would use pandas is because pandas.algos and pandas.core.base
has some very fast implementations of hashtables, and unique, 10-20x
speedup on numpy and, 3x speedup on the my simplistic implemenation using
dict


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#7455 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAEz64YCficjRFbGm91jALN5A3kRKXWuks5qryQ6gaJpZM4KAMEg
.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make LabelEncoder use a hash table
2 participants