Skip to content

Rebase of barnes-hut TSNE #4887

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Sep 20, 2015
Merged

Rebase of barnes-hut TSNE #4887

merged 3 commits into from
Sep 20, 2015

Conversation

amueller
Copy link
Member

Trying to rebase #4025.

@amueller amueller force-pushed the tsne_fixes branch 2 times, most recently from 57c3f06 to 8fabe50 Compare June 22, 2015 22:14
@amueller amueller force-pushed the tsne_fixes branch 4 times, most recently from 4dafbbb to 2366fcb Compare August 3, 2015 20:47
@amueller
Copy link
Member Author

amueller commented Aug 3, 2015

If this implements a transform method, it should inherit from TransformerMixin which will give it additional tests.

@gatapia
Copy link

gatapia commented Aug 6, 2015

I'm trying this pull request out just to see if it does reduce memory and am not having any luck. This is the code I'm using, you'll see that my data is pretty conservative:

X = manifold.TSNE(2, method='barnes_hut').fit_transform(np.random.random(size=(40000, 2)))

This eats up 32GB ram in short order, I'm pretty sure I applied the pull request correctly otherwise method='barnes_hut' would throw an error no?

I was under the impression that the barnes hut implementation was much more memory friendly, am I correct?

Edit: this is on Windows 64bit

@amueller
Copy link
Member Author

amueller commented Aug 6, 2015

@gatapia in principle yes, this PR unfortunately not. There is still work to be done to use sparse matrices to reduce the memory footprint. In the current form, it "only" speeds up the computation about one or two orders of magnitude. Any help with going to sparse matrices is very welcome.

@amueller amueller force-pushed the tsne_fixes branch 2 times, most recently from 7ebf045 to 141cf52 Compare September 9, 2015 21:48
@amueller amueller added this to the 0.17 milestone Sep 9, 2015
@amueller
Copy link
Member Author

amueller commented Sep 9, 2015

I'd really like to have this in 0.17. I removed the transform method because I'm not sure what the API should be there. While the current version could be improved, it gives 10x-100x speedups. I'll post some benchmarks soon.

@amueller
Copy link
Member Author

amueller commented Sep 9, 2015

digits_tsne_barnes_huts
digits_tsne_exact
manifold_example
manifold_example_barnes_huts

@amueller
Copy link
Member Author

amueller commented Sep 9, 2015

On 5 classes of the digits its a factor of 2, on all classes it's a factor of 3, for the s-curve it's a factor of 10.

@amueller
Copy link
Member Author

amueller commented Sep 9, 2015

Not sure I'm entirely convinced the new example says a lot, though.

@amueller
Copy link
Member Author

checks pass :) I'm thinking about polishing or removing the faces example, otherwise this should be good.

@ogrisel
Copy link
Member

ogrisel commented Sep 11, 2015

I'm thinking about polishing or removing the faces example, otherwise this should be good.

+1: either simplify it to only keep the best / fastest performing model or just remove it (a polished version can be re-added later).

@ogrisel
Copy link
Member

ogrisel commented Sep 11, 2015

In particular it's important to have some explanation that analyze and explain the obtained results: are they good, is the identity or the pose of faces preserved by the 2D embedding? Here it's not very clear to me.

@amueller
Copy link
Member Author

what do you think about the rest?

@ogrisel
Copy link
Member

ogrisel commented Sep 11, 2015

I think it's already a good improvement upon what we currently have in master. We can work on the memory usage fix in another PR.

@ogrisel
Copy link
Member

ogrisel commented Sep 11, 2015

+1 for merge once the face example is removed (or simplified and explained).

@amueller
Copy link
Member Author

OK I'll remove it for now and merge.

cemoody and others added 2 commits September 11, 2015 13:19
Renamed example file & added comments

FEAT Barnes-Hut t-SNE

Vectorized the math calls -- 3x faster on neg grad

Draft of n-dimensional barnes-hut; failing tests

Tests pass, error being computed in cython

Added error calculation to pos gradient

Changed cases where points are very close to be clearer

Clearer variable names
amueller added a commit that referenced this pull request Sep 20, 2015
@amueller amueller merged commit 770d089 into scikit-learn:master Sep 20, 2015
@amueller amueller deleted the tsne_fixes branch May 19, 2017 20:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants