t-SNE has inefficient memory structure #7089

amueller · 2016-07-26T19:05:38Z

The Barnes-Hut implementation of t-SNE currently uses a dense matrix representation for the distances, but it should be using a sparse matrix.
For discussion, see #4025.
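
For a rough sense of scale, the back-of-the-envelope sketch below compares the memory needed for a full pairwise distance matrix with that of a sparse k-nearest-neighbor matrix in CSR layout. It is not taken from the scikit-learn code: the function names, the float64 values, the 32-bit indices, and the choice of k = 3 * perplexity are all assumptions made for illustration.

```python
def dense_bytes(n_samples, itemsize=8):
    """Bytes for a full n x n matrix of distances (float64 assumed)."""
    return n_samples * n_samples * itemsize


def sparse_knn_bytes(n_samples, n_neighbors, itemsize=8, index_size=4):
    """Bytes for a CSR matrix holding n_neighbors entries per row.

    CSR keeps three arrays: data (nnz values), indices (nnz column ids)
    and indptr (n_samples + 1 row offsets). 32-bit indices are assumed.
    """
    nnz = n_samples * n_neighbors
    return nnz * itemsize + nnz * index_size + (n_samples + 1) * index_size


n_samples = 100_000
perplexity = 30
# Assumption: keep roughly 3 * perplexity neighbors per point.
n_neighbors = 3 * perplexity

dense = dense_bytes(n_samples)
sparse = sparse_knn_bytes(n_samples, n_neighbors)
print(f"dense:  {dense / 1e9:.1f} GB")
print(f"sparse: {sparse / 1e6:.1f} MB")
print(f"ratio:  {dense / sparse:.0f}x")
```

The dense cost grows quadratically with the number of samples, while the sparse cost grows linearly for a fixed number of neighbors, which is why the dense representation becomes the bottleneck on large inputs.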

shanglun · 2016-07-30T16:06:52Z

Does this still need a contributor? I'll work on this issue.

amueller · 2016-08-03T19:58:54Z

It does. It's somewhat non-trivial but you're very welcome to give it a go!

shanglun · 2016-08-04T13:00:30Z

Ok, I will give it a go then. Glad to be helping on a 1.0 milestone!

zhexuany · 2016-08-08T15:30:26Z

@shanglun Have you made any progress? I am also willing to contribute some code for this issue. :)

shanglun · 2016-08-08T15:40:17Z

Yes, I am still looking into it and I am making progress. I will reach out
on this thread if I need help. Thank you!

zhexuany · 2016-08-08T15:47:43Z

Great. Please let me know if you need any help. :)

DataWaveAnalytics · 2016-08-10T05:38:08Z

In this work (http://dl.acm.org/citation.cfm?id=2883041), the authors proposed a new method for visualizing high-dimensional data, LargeVis (https://github.com/lferry007/LargeVis), using ideas from a previous study (https://arxiv.org/abs/1503.03578), LINE (https://github.com/tangjianpku/LINE). I ran some tests and the method is really fast. They proposed an interesting algorithm for building a fairly accurate kNN graph. In the experiments section, the authors mention that they parallelized parts of t-SNE, so maybe you could ask them to contribute.

BTW, I'm looking forward to seeing LINE and LargeVis in future versions ;)

shanglun · 2016-08-10T12:16:39Z

Interesting. I think these studies might be out of scope for this particular ticket, but it would be interesting to investigate further and include additional methods in the future. Delving into the implementation details of t-SNE over the past few days has been quite interesting.

Was your benchmarking written in Python? Or another language?

DataWaveAnalytics · 2016-08-11T02:01:23Z

I did the tests in C++.

What are you specifically modifying in t-SNE? Memory management?

shanglun · 2016-08-11T02:11:03Z

Yeah, I'm still investigating and experimenting, but based on the discussion in the linked thread we will probably move to a sparse matrix implementation and make sure that we are smart enough about the data to handle the edge cases.

If you have the C++ code, I'd very much love to collaborate on a new implementation. If you are already comfortable with Python, though, I don't think there would be any barrier to you implementing it and opening a PR yourself.

shanglun · 2016-08-11T02:22:04Z

If you just want to see another implementation option for t-SNE, you can point me to your C++ code. I'll read the papers, look at the C++ code, implement a Python version, and open a PR if it's a good fit for the library. I always look forward to writing interesting code.

DataWaveAnalytics · 2016-08-11T02:38:16Z

The original implementation of t-SNE was done in C++. You can find the original sources here: https://github.com/lvdmaaten/bhtsne/.

I just adapted it to:

- Accept parameters on the command line.
- Read data points from a text file.
- Fix some small memory management issues.

Originally, all the input data is contained in a specially formatted binary file created by the wrappers.

shanglun · 2016-08-11T02:44:17Z

Oh, I see. The improvements you mentioned in your original post were just about parallelizing t-SNE, and the paper wasn't really about that.

I was under the impression that there was some new, faster implementation of t-SNE.

Let's follow up when I finish this ticket, and we can look into optimizing t-SNE and implementing other algorithms that might be useful, such as LargeVis.

shanglun · 2016-08-18T04:03:00Z

I'm not very happy about this, but I have hit a busy period at work and will not be able to devote enough time to this ticket for a little while. If someone else would like to work on it, please feel free.

jnothman · 2017-05-24T14:44:59Z

More discussion of this issue and its potential solution over at #8582

jnothman · 2017-05-27T11:52:47Z

I'm so annoyed by this false advertising that I'm tempted to fix it myself. But given my lack of availability, I'm going to mark it for the sprint and hope that someone in Paris can give it a go.

DataWaveAnalytics · 2017-05-28T02:02:27Z

Maybe this can help to improve things.
https://github.com/DmitryUlyanov/Multicore-TSNE

jnothman · 2017-05-28T05:13:16Z

It is not so hard to improve our implementation. It just needs someone confident and available to do it. That implementation may remain faster; we could consider adopting its code, with permission, but not with a cffi dependency (and perhaps not other dependencies it builds on).

jnothman · 2017-07-13T00:19:17Z

Fixed in #9032

amueller added Enhancement Need Contributor labels Jul 26, 2016

amueller added this to the 1.0 milestone Jul 26, 2016

This was referenced Jul 26, 2016

[MRG] Cemoody/bhtsne Barnes-Hut t-SNE #4025

Closed

Python crashes when calculating large t-SNE #4619

Closed

sonjageorgievska mentioned this issue Aug 22, 2016

Currently there is no implementation that supports large matrices (for a flat, non-hierarchical embedding) sonjageorgievska/Embeddings#1

Closed

jnothman mentioned this issue May 24, 2017

t-SNE results in errors when reducing dim to default 2 #8582

Closed

lesteve added the Sprint label Jun 1, 2017

tomMoral mentioned this issue Jun 7, 2017

[MRG+1] Reducing t-SNE memory usage #9032

Merged

jnothman closed this as completed Jul 13, 2017
