t-SNE results in errors when reducing dim to default 2 #8582
Please post a stand-alone snippet reproducing the problem and the full stack trace you get.
@lesteve thanks for the reply.
The example above was run on a PC with 24 cores and 64 GB of RAM and it still gives memory errors.
@lesteve
Notice that Isomap also produces a memory error, even when running with default parameters:

In [57]: X_train_iso = Isomap(n_jobs=-1).fit_transform(X_train)
---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-57-26dde712b79e> in <module>()
----> 1 X_train_iso = Isomap(n_jobs=-1).fit_transform(X_train)

/home/user/miniconda2/lib/python2.7/site-packages/sklearn/manifold/isomap.pyc in fit_transform(self, X, y)
    178         X_new: array-like, shape (n_samples, n_components)
    179         """
--> 180         self._fit_transform(X)
    181         return self.embedding_
    182

/home/user/miniconda2/lib/python2.7/site-packages/sklearn/manifold/isomap.pyc in _fit_transform(self, X)
    122         G *= -0.5
    123
--> 124         self.embedding_ = self.kernel_pca_.fit_transform(G)
    125
    126     def reconstruction_error(self):

/home/jmitro/miniconda2/lib/python2.7/site-packages/sklearn/decomposition/kernel_pca.pyc in fit_transform(self, X, y, **params)
    258         X_new: array-like, shape (n_samples, n_components)
    259         """
--> 260         self.fit(X, **params)
    261
    262         X_transformed = self.alphas_ * np.sqrt(self.lambdas_)

/home/user/miniconda2/lib/python2.7/site-packages/sklearn/decomposition/kernel_pca.pyc in fit(self, X, y)
    233             Returns the instance itself.
    234         """
--> 235         X = check_array(X, accept_sparse='csr', copy=self.copy_X)
    236         K = self._get_kernel(X)
    237         self._fit_transform(K)

/home/user/miniconda2/lib/python2.7/site-packages/sklearn/utils/validation.pyc in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    380                                           force_all_finite)
    381     else:
--> 382         array = np.array(array, dtype=dtype, order=order, copy=copy)
    383
    384     if ensure_2d:

MemoryError:

It would be best if you could provide or generate some data that causes the failure.
@jnothman the data is simply the MNIST dataset.
If you have Keras installed it is as simple as:

from keras.datasets import mnist
(X_train, y_train), (X_test, y_test) = mnist.load_data()

Otherwise you'd have to download and load the dataset manually.

So you don't need to transform it to get the memory error?
@jnothman the only thing that I've done is ...
Then feed that to Isomap or t-SNE.
So when @lesteve suggested you post a standalone snippet, he hoped you would write:

from sklearn.manifold import TSNE
from keras.datasets import mnist

(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train.reshape(-1, 784)
X_train = X_train / 255.
X_train_reduced = TSNE(n_components=2, random_state=0).fit_transform(X_train)

Yes, this blows up in memory consumption.
The line where the traceback reports the MemoryError ...
Can the implementation be made to have a lower memory consumption?

I am a bit surprised to see that ...
@jnothman I'm not really aware of the internals of sklearn, but would you say that roughly the same amount of memory is also required if I use Isomap or LLE?

Isomap definitely has an n**2 floats consumption too. I can't see in LLE whether this is necessarily the case. I'm no expert on manifold learning. I think in all cases it would be a good idea to transform only a sample of the dataset.
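For a rough sense of scale (back-of-the-envelope only, assuming float64 and the full 60000-point MNIST training set), a single dense n_samples x n_samples array is already tens of gigabytes:

n_samples = 60000
bytes_per_float64 = 8
print(n_samples ** 2 * bytes_per_float64 / 1e9)  # ~28.8, i.e. about 28.8 GB for one dense pairwise matrix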
@jnothman thanks a lot.

So this makes the sklearn TSNE broken? Other implementations work without this issue.

@rjurney The algorithms are working. However, if your input is huge and your memory is small in comparison, it will lead to a MemoryError.

Would increasing the swap size fix it?
I doubt it. Sampling your dataset is a much more appropriate solution,
presuming you're trying to visualise something.
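A minimal sketch of that suggestion (assuming the X_train array from the snippet above; the sample size of 5000 is arbitrary):

import numpy as np
from sklearn.manifold import TSNE

rng = np.random.RandomState(0)
idx = rng.choice(X_train.shape[0], size=5000, replace=False)  # keep 5000 of the 60000 points
X_embedded = TSNE(n_components=2, random_state=0).fit_transform(X_train[idx])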
My two cents: if you don't split the data in an appropriate manner, so that the classes stay balanced, you might end up with very unexpected results.
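One way to subsample while keeping the class proportions (a sketch, assuming y_train holds the MNIST labels) is train_test_split with stratify:

from sklearn.model_selection import train_test_split

# keep a stratified 5000-point sample; the discarded split is simply ignored
X_small, _, y_small, _ = train_test_split(
    X_train, y_train, train_size=5000, stratify=y_train, random_state=0)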
@jnothman Barnes-Hut t-SNE should run in O(N log N) time and O(N) memory (see Section 1 of the paper https://arxiv.org/abs/1301.3342).
@artcg indeed. In regards to your 2nd question I would say that what @jnothman and others have said about data size is true. Therefore, regardless of ...
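For reference (parameter names as in scikit-learn 0.18, where method='barnes_hut' is already the default): the Barnes-Hut gradient can be requested explicitly, but as discussed below, the affinity computation in this version still materialises dense pairwise distances, which is where the memory goes:

from sklearn.manifold import TSNE

# 'barnes_hut' approximates the gradient in O(N log N); 'exact' computes it in O(N^2)
tsne = TSNE(n_components=2, method='barnes_hut', angle=0.5, random_state=0)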
@kirk86
(Not related to this issue, but to solve your personal problem I reckon you could implement t-SNE in TensorFlow pretty easily, which would allow you to use multiple cores... Might be a fun project.)

Although I'm not very familiar with this code, I think it's quite clear that our implementation still has O(N^2) memory requirements, though it may not be hard to reduce that to O(N log N). Without looking at the paper, I don't understand how one can perform exact nearest-neighbour calculations without that memory cost.
Instead of finding exact nearest neighbours, Barnes-Hut uses a sparse approximation: for a point p, we compute the similarities to its N nearest neighbours, and for all neighbours further away we call the similarity 0.
So with a sparse representation, for n points we only need to compute N * n similarities (N is constant, so overall O(n)).
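A minimal, self-contained sketch of that idea (my own illustration, not the scikit-learn internals; it uses a fixed Gaussian kernel, whereas t-SNE calibrates a per-point bandwidth to match the perplexity):

import numpy as np
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors

X = np.random.RandomState(0).rand(5000, 50)  # stand-in for the (subsampled) data
K = 30                                       # roughly 3 * perplexity neighbours per point

nn = NearestNeighbors(n_neighbors=K).fit(X)
distances, indices = nn.kneighbors(X)        # both of shape (n, K)

n = X.shape[0]
rows = np.repeat(np.arange(n), K)
P = csr_matrix((np.exp(-distances.ravel() ** 2), (rows, indices.ravel())), shape=(n, n))
# P stores only n * K similarities; all other entries are implicitly 0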
Sorry, you're right. We could have a theoretical O(N) even if we have to look up each point in an O(log N) data structure to find nearest neighbours. The implementation is definitely not kind in memory. A PR using kneighbors_graph is welcome. I also suggest that we offer support for passing such a sparse graph in precomputed mode, as we do for DBSCAN.
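kneighbors_graph already produces such a CSR graph, so a fix along those lines could build on something like this (hypothetical usage, not the eventual PR):

import numpy as np
from sklearn.neighbors import kneighbors_graph

X = np.random.RandomState(0).rand(5000, 50)
D_sparse = kneighbors_graph(X, n_neighbors=30, mode='distance', include_self=False)
print(D_sparse.nnz)  # 5000 * 30 stored distances instead of a dense 5000 x 5000 matrix

(DBSCAN with metric='precomputed' already accepts a sparse distance matrix of this form, which is the precedent mentioned above.)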
Apart from pairwise distances, it would be worth scouring the code for other suboptimal complexity issues, and using a benchmark script to plot memory with respect to n_samples. (In fact, it would be great to have this for all estimators, alongside expected space and time complexity.)
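A benchmark along those lines could start from a sketch like this (assuming the third-party memory_profiler package; absolute numbers will depend on the machine and scikit-learn version):

import numpy as np
from memory_profiler import memory_usage
from sklearn.manifold import TSNE

def run_tsne(n):
    X = np.random.RandomState(0).rand(n, 50)
    TSNE(n_components=2, random_state=0).fit_transform(X)

for n in [500, 1000, 2000, 4000]:
    peak = memory_usage((run_tsne, (n,)), max_usage=True)
    print(n, peak)  # peak memory (MiB) as a function of n_samples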
@glemaitre I found the Barnes-Hut implementation on GitHub and I've been able to run it on my data using Python 2.7 (but not Python 3, there's a bug). It was complex to get my Docker image to have both Anaconda 2 and 3, but it finally worked. So, it would be good if sklearn had the Barnes-Hut implementation. The existing one is so limiting it is almost a trick that it is in the library at all. I wish it hadn't been there to waste my time. It should be replaced with BH-TSNE.
As for my data, the file was not large. It was a few megabytes of data, so it is surprising that sklearn's TSNE could eat nearly 300GB of RAM. I really think this should be fixed, to use BH-TSNE.
The bh-tsne project is https://github.com/lvdmaaten/bhtsne
Fixing this will also require changing or duplicating ...

@rjurney, I agree that our current implementation is falsely advertised, and apologise that our reviews were not sufficient to ensure the implementation was true to the algorithm.
I am having a similar issue on my Windows PC with 8 GB of RAM (it completely kills it dead). No such issue on my Linux Ubuntu PC with 32 GB of RAM. Not sure what versions of sklearn etc. are on my Linux machine, but Windows is 0.18.1.
This is actually a duplicate of #7089. I would be very keen to mentor someone in fixing this using kneighbors_graph.
Apart from using ...

Getting technical: since the indices of the sparse neighbors don't matter to ...

For dense neighborhoods, the output of ...

if using_neighbors:
    for k in range(K):
        j = neighbors[i, k]
        P[i, j] /= sum_Pi
        sum_disti_Pi += affinities[i, j] * P[i, j]
else:
    for j in range(K):
        P[i, j] /= sum_Pi
        sum_disti_Pi += affinities[i, j] * P[i, j]

could be contracted to:

for k in range(row_offsets[i], row_offsets[i + 1]):
    P[k] /= sum_Pi
    sum_disti_Pi += data[k] * P[k]
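For context, that contraction is just the standard CSR layout: row i's nonzeros live in data[indptr[i]:indptr[i + 1]]. A small scipy illustration (my own example, not the scikit-learn code):

import numpy as np
from scipy.sparse import csr_matrix

P = csr_matrix(np.array([[0., 2., 0.],
                         [1., 0., 3.],
                         [0., 0., 4.]]))
i = 1
for k in range(P.indptr[i], P.indptr[i + 1]):
    print(P.indices[k], P.data[k])  # column index and value of each nonzero in row 1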
Should be fixed in #9032.
Original issue description:
Hi everyone, I was wondering whether anyone knows why t-SNE results in memory errors when reducing the dimension of the data to the default number, which is 2.
I have the MNIST dataset and I apply some transformation on it, resulting in a reduced MNIST dataset.
Original: 60000x784
After transformation: 60000x200
When I apply t-SNE on the transformed MNIST (60000x200), I always get a memory error. Also, I was wondering why the multicore option n_jobs=-1 doesn't exist?