t-SNE results in errors when reducing dim to default 2 #8582


Closed
kirk86 opened this issue Mar 13, 2017 · 32 comments
Labels
Bug, Moderate (anything that requires some knowledge of conventions and best practices)

Comments

@kirk86 commented Mar 13, 2017

Hi everyone, does anyone know why t-SNE results in memory errors when reducing the dimension of the data to the default number, which is 2?
I have the MNIST dataset and apply a transformation to it, resulting in a reduced MNIST dataset.
Original: 60000x784
After transformation: 60000x200

When I apply t-SNE to the transformed MNIST data (60000x200), I always get a memory error. Also, why is there no multicore option such as n_jobs=-1?

@lesteve (Member) commented Mar 14, 2017

Please post a stand-alone snippet reproducing the problem and the full stack trace you get.

@kirk86 (Author) commented Mar 14, 2017

@lesteve thanks for the reply. Here's the output of a simple example:

In [47]: X_train.shape
Out[47]: (60000, 200)

In [50]: X_train_reduced = TSNE(n_components=2, random_state=0).fit_transform(X_train)
---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-50-6848c4dfd81b> in <module>()
----> 1 X_train_reduced = TSNE(n_components=2, random_state=0).fit_transform(X_train)

/home/user/miniconda2/lib/python2.7/site-packages/sklearn/manifold/t_sne.pyc in fit_transform(self, X, y)
    882             Embedding of the training data in low-dimensional space.
    883         """
--> 884         embedding = self._fit(X)
    885         self.embedding_ = embedding
    886         return self.embedding_

/home/user/miniconda2/lib/python2.7/site-packages/sklearn/manifold/t_sne.pyc in _fit(self, X, skip_num_points)
    764                 neighbors_nn = neighbors_nn[:, 1:]
    765             P = _joint_probabilities_nn(distances, neighbors_nn,
--> 766                                         self.perplexity, self.verbose)
    767         else:
    768             P = _joint_probabilities(distances, self.perplexity, self.verbose)

/home/user/miniconda2/lib/python2.7/site-packages/sklearn/manifold/t_sne.pyc in _joint_probabilities_nn(distances, neighbors, desired_perplexity, verbose)
     96     m = "All probabilities should be finite"
     97     assert np.all(np.isfinite(conditional_P)), m
---> 98     P = conditional_P + conditional_P.T
     99     sum_P = np.maximum(np.sum(P), MACHINE_EPSILON)
    100     P = np.maximum(squareform(P) / sum_P, MACHINE_EPSILON)

MemoryError:

The above example was run on a PC with 24 cores and 64 GB of RAM and still produces memory errors.

@kirk86 (Author) commented Mar 14, 2017

@lesteve
Note that Isomap also produces a memory error, even when running with default parameters.

 In [57]: X_train_iso = Isomap(n_jobs=-1).fit_transform(X_train)
---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-57-26dde712b79e> in <module>()
----> 1 X_train_iso = Isomap(n_jobs=-1).fit_transform(X_train)

/home/user/miniconda2/lib/python2.7/site-packages/sklearn/manifold/isomap.pyc in fit_transform(self, X, y)
    178         X_new: array-like, shape (n_samples, n_components)
    179         """
--> 180         self._fit_transform(X)
    181         return self.embedding_
    182

/home/user/miniconda2/lib/python2.7/site-packages/sklearn/manifold/isomap.pyc in _fit_transform(self, X)
    122         G *= -0.5
    123
--> 124         self.embedding_ = self.kernel_pca_.fit_transform(G)
    125
    126     def reconstruction_error(self):

/home/jmitro/miniconda2/lib/python2.7/site-packages/sklearn/decomposition/kernel_pca.pyc in fit_transform(self, X, y, **params)
    258         X_new: array-like, shape (n_samples, n_components)
    259         """
--> 260         self.fit(X, **params)
    261
    262         X_transformed = self.alphas_ * np.sqrt(self.lambdas_)

/home/user/miniconda2/lib/python2.7/site-packages/sklearn/decomposition/kernel_pca.pyc in fit(self, X, y)
    233             Returns the instance itself.
    234         """
--> 235         X = check_array(X, accept_sparse='csr', copy=self.copy_X)
    236         K = self._get_kernel(X)
    237         self._fit_transform(K)

/home/user/miniconda2/lib/python2.7/site-packages/sklearn/utils/validation.pyc in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    380                                       force_all_finite)
    381     else:
--> 382         array = np.array(array, dtype=dtype, order=order, copy=copy)
    383
    384         if ensure_2d:

MemoryError:

@jnothman (Member) commented Mar 14, 2017 via email

@kirk86 (Author) commented Mar 14, 2017

@jnothman the data is simply the MNIST dataset.
If you have keras installed, it's as simple as:

from keras.datasets import mnist
(X_train, y_train), (X_test, y_test) = mnist.load_data()

Otherwise you'd have to download and load the dataset manually.

@jnothman (Member) commented Mar 15, 2017 via email

@kirk86 (Author) commented Mar 15, 2017

@jnothman the only thing I've done is:

X_train = X_train.reshape(-1, 784)
X_test = X_test.reshape(-1, 784)

X_train = X_train / 255.
X_test = X_test / 255.

Then I feed that to Isomap or t-SNE.

@jnothman (Member) commented:

So when @lesteve suggested you post a standalone snippet, he hoped you would write:

from sklearn.manifold import TSNE
from keras.datasets import mnist
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train.reshape(-1, 784)
X_train = X_train / 255.
X_train_reduced = TSNE(n_components=2, random_state=0).fit_transform(X_train)

Yes, this blows up in memory consumption.

@jnothman (Member) commented Mar 15, 2017

The line where the traceback reports the MemoryError should require (n_samples**2 * 2) floats, so roughly 55 GB of memory, on top of the original distances matrix of the same n_samples**2 size. I find it strange that the docs suggest that barnes_hut can "scale to millions of examples" when there's this kind of quadratic memory consumption (@cemoody?).
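
For concreteness, a back-of-envelope sketch of that arithmetic (assuming float64 values, 8 bytes each):

n_samples = 60000
bytes_per_float = 8  # float64
n_matrices = 2       # the estimate above: two dense (n, n) arrays for that line
print(n_samples ** 2 * n_matrices * bytes_per_float / 1e9)  # ~57.6 GB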

Can the implementation be made to have a lower memory consumption?

@jnothman (Member) commented:

I am a bit surprised to see that barnes_hut computes all-pairs distances, when it could be using kneighbors_graph to sparsely calculate distances where a binary tree is appropriate.
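
A minimal sketch of that idea (X here is a hypothetical stand-in for real data; the 3 * perplexity neighborhood size matches what Barnes-Hut t-SNE typically needs):

import numpy as np
from sklearn.neighbors import kneighbors_graph

X = np.random.RandomState(0).rand(1000, 50)
perplexity = 30
# A sparse CSR distance graph stores only ~3 * perplexity distances per row,
# instead of a dense (n_samples, n_samples) matrix.
D = kneighbors_graph(X, n_neighbors=3 * perplexity, mode='distance')
print(D.shape, D.nnz)  # (1000, 1000), but only 1000 * 90 stored distances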

@kirk86 (Author) commented Mar 15, 2017

@jnothman I'm not really aware of the internals of sklearn, but would you say that roughly the same amount of memory is also required if I use Isomap or LocallyLinearEmbedding? All of them blow up in terms of memory consumption.

@jnothman (Member) commented:

Isomap definitely has an n**2-floats memory consumption too. I can't tell whether this is necessarily the case for LLE; I'm no expert on manifold learning.

I think in all cases it would be a good idea to transform only a sample of the dataset.
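
For example (a sketch, assuming X_train from the snippet above and an arbitrary sample size of 5000):

import numpy as np
from sklearn.manifold import TSNE

# Embed a random 5000-point subsample instead of all 60000 points
rng = np.random.RandomState(0)
idx = rng.choice(X_train.shape[0], size=5000, replace=False)
X_embedded = TSNE(n_components=2, random_state=0).fit_transform(X_train[idx])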

@kirk86 (Author) commented Mar 15, 2017

@jnothman thanks a lot.

@rjurney commented Mar 19, 2017

So does this make the sklearn TSNE broken? Other implementations work without this issue.

@glemaitre (Member) commented:

@rjurney The algorithms are working. However, if your input is huge and your memory is small in comparison, it will lead to a MemoryError.

@abhigenie92 commented Mar 21, 2017

Would increasing the swap-size fix it?

@jnothman (Member) commented Mar 21, 2017 via email

@kirk86 (Author) commented Mar 21, 2017

My two cents: if you don't split the data in an appropriate manner that keeps the classes balanced, you might end up with very unexpected results.
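
One way to keep such a subsample class-balanced is a stratified split (a sketch, assuming X_train and y_train from the MNIST snippet above; the 5000-point size is arbitrary):

from sklearn.model_selection import train_test_split

# Stratifying on the labels keeps the class proportions of the sample
# close to those of the full dataset
X_sample, _, y_sample, _ = train_test_split(
    X_train, y_train, train_size=5000, stratify=y_train, random_state=0)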

@artcg commented Mar 22, 2017

@jnothman Barnes-Hut t-SNE should run in O(N log N) time and O(N) memory (see Section 1 of the paper: https://arxiv.org/abs/1301.3342).

@artcg commented Mar 22, 2017

BTW @kirk86, I am curious why you specified 'when reducing dim to default 2' -- have you experienced different behavior when not using n_components=2?
I think the sklearn implementation is still broken for n_components != 2 (see #7885).

@kirk86 (Author) commented Mar 22, 2017

@artcg indeed bhtsne is a better alternative, but most of the implementations I've seen so far don't utilize multiprocessing. I've tried a couple of them and they use only 1 out of n cores.

Regarding your second question, I would say that what @jnothman and others have said about data size holds. Therefore, regardless of n_components, if your data is too large it will probably throw a memory error.

@artcg commented Mar 22, 2017

@kirk86
(Not related to this issue, but to solve your personal problem I reckon you could implement t-SNE in TensorFlow fairly easily, which would allow you to use multiple cores/GPUs/whatever... might be a fun project.)

@jnothman (Member) commented Mar 22, 2017 via email

@artcg commented Mar 22, 2017 via email

@jnothman (Member) commented Mar 22, 2017 via email

@jnothman (Member) commented Mar 22, 2017 via email

@rjurney commented Mar 23, 2017 via email

@jnothman (Member) commented:

Fixing this will also require changing or duplicating _binary_search_perplexity to allow a sparse distances input and sparse output. I'd be happy (time permitting) to guide someone through the changes if they started working on one or another part of it.

@rjurney, I agree that our current implementation is falsely advertised, and apologise that our reviews were not sufficient to ensure the implementation was true to the algorithm.

@jnothman added the Moderate label on Mar 23, 2017
@JoshuaC3 commented:

I am having a similar issue on my Windows PC with 8 GB of RAM (it completely kills it). No such issue on my Ubuntu Linux PC with 32 GB of RAM. Not sure which versions of scikit-learn etc. are on my Linux machine, but Windows has 0.18.1.

@jnothman (Member) commented:

This is actually a duplicate of #7089. I would be very keen to mentor someone in fixing this using kneighbors_graph.

@jnothman (Member) commented:

Apart from using kneighbors_graph to get the neighborhoods, _joint_probabilities_nn needs to correspondingly take a sparse matrix as input and return a sparse matrix as output. Similarly, _binary_search_perplexity would have to have different input/output.

Getting technical: Since the indices of the sparse neighbors don't matter to _binary_search_perplexity, only the distances, I could imagine it taking data, row_offsets where, for sparse CSR Barnes-Hut neighborhoods:

data, row_offsets = D.data, D.indptr

For dense neighborhoods:

data, row_offsets = D.ravel(), np.arange(0, D.size + 1, D.shape[1])

The output of _binary_search_perplexity would be like data, and the current implementation's

            if using_neighbors:
                for k in range(K):
                    j = neighbors[i, k]
                    P[i, j] /= sum_Pi
                    sum_disti_Pi += affinities[i, j] * P[i, j]
            else:
                for j in range(K):
                    P[i, j] /= sum_Pi
                    sum_disti_Pi += affinities[i, j] * P[i, j]

could be contracted to:

            for k in range(row_offsets[i], row_offsets[i + 1]):
                P[k] /= sum_Pi
                sum_disti_Pi += data[k] * P[k]
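
To make the proposed (data, row_offsets) contract concrete, here is a toy illustration (hypothetical code, not scikit-learn's) of iterating a CSR neighborhood matrix that way:

import numpy as np
from scipy.sparse import csr_matrix

# Toy CSR matrix standing in for the sparse neighborhood distances
D = csr_matrix(np.array([[0., 1., 0.],
                         [2., 0., 3.],
                         [0., 4., 0.]]))
data, row_offsets = D.data, D.indptr
for i in range(D.shape[0]):
    # same iteration pattern as the contracted loop above
    for k in range(row_offsets[i], row_offsets[i + 1]):
        print(i, data[k])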

@jnothman (Member) commented:

Should be fixed in #9032
