[MRG+1] Fixes bugs in t-SNE argument passing, and a bug with running using n dfs #7885
Conversation
Thanks. Please give this PR a more informative title
# degrees_of_freedom = n_components - 1 comes from
# "Learning a Parametric Embedding by Preserving Local Structure"
# Laurens van der Maaten, 2009.
degrees_of_freedom = max(self.n_components - 1.0, 1)
We need to retain this behaviour by default. I.e. make degrees_of_freedom=None
by default, and handle that case with this logic.
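A minimal sketch of that suggested default handling, reusing the `max(n_components - 1, 1)` logic from the snippet above (the helper name here is hypothetical, not from the PR):

```python
def resolve_degrees_of_freedom(degrees_of_freedom, n_components):
    """Hypothetical helper: keep the old behaviour when the user
    passes degrees_of_freedom=None, otherwise honour their value."""
    if degrees_of_freedom is None:
        # degrees_of_freedom = n_components - 1 comes from
        # "Learning a Parametric Embedding by Preserving Local Structure",
        # Laurens van der Maaten, 2009.
        degrees_of_freedom = max(n_components - 1.0, 1)
    return degrees_of_freedom
```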
I have followed your suggestion!
@@ -645,10 +645,10 @@ class TSNE(BaseEstimator):
     """

     def __init__(self, n_components=2, perplexity=30.0,
-                 early_exaggeration=4.0, learning_rate=1000.0, n_iter=1000,
+                 early_exaggeration=4.0, learning_rate=200.0, n_iter=2000,
We will not change default values without a deprecation cycle. I think we're best not changing these.
I have put the defaults back as you suggested, although I do believe the default learning rate is too high at the moment!
Maybe not best to do in this PR, but these lower learning rates may be a better default choice (https://www.reddit.com/r/MachineLearning/comments/47kf7w/scikitlearn_tsne_implementation/)
Yes, maybe we can set the default to a lower value. But we should at least compare the result on the first 10,000 images from the MNIST dataset with the result from the original paper. For example, in this case it works with learning_rate=1000, however, the gaps between classes in the original t-SNE paper are a little bit larger.
By the way, in the original pull request I wrote down some notes on the learning rate and compared several implementations (link):
In the literature:
- original paper: initialization with standard deviation 1e-4, 1000 episodes, learning rate 100, momentum 0.5 for 250 episodes, 0.8 for the rest, early exaggeration with 4 for 50 episodes
- matlab implementation: learning rate 500, early exaggeration for 100 episodes
- python implementation: initialization with standard deviation 1, learning rate 500, early exaggeration for 100 episodes, momentum 0.5 for 20 episodes
- divvy: initialization with standard deviation 1e-4, 1000 episodes, learning rate 1000, momentum 0.5 for 100 episodes, 0.8 for the rest, early exaggeration with 4 for 100 episodes
- parametric t-sne (not comparable): conjugate gradient
- barnes-hut t-sne: initialization with standard deviation 1e-4, 1000 episodes, learning rate 200, momentum 0.5 for 250 episodes, 0.8 for the rest, early exaggeration with 12 for 250 episodes
My experiences:
- the learning rate has to be set manually for optimal performance, something between 100 and 1000
- a high momentum (0.8) during early exaggeration improves the result
This implementation uses the following schedule:
- initialization with standard deviation 1e-4, 1000 episodes, learning rate 1000, momentum 0.5 for 50 episodes, 0.8 for the rest, early exaggeration with 4 for 100 episodes
I wonder why we decided to use 1000 which is at the edge of the recommended range [100, 1000]?
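The schedule pattern those implementations share (early momentum, late momentum, early exaggeration) can be sketched like this. This is illustrative only: the function name and signature are made up, and real t-SNE exaggerates the P distribution rather than scaling the whole gradient as done here.

```python
import numpy as np

def descend(grad, p0, n_iter=500, learning_rate=0.1,
            momentum_switch=250, exaggeration_iter=100, exaggeration=4.0):
    """Sketch of the schedule style described above: momentum 0.5 for the
    first momentum_switch iterations, 0.8 afterwards, and the gradient
    amplified during the early-exaggeration phase."""
    p = p0.astype(float).copy()
    update = np.zeros_like(p)
    for i in range(n_iter):
        momentum = 0.5 if i < momentum_switch else 0.8
        scale = exaggeration if i < exaggeration_iter else 1.0
        # momentum gradient descent step
        update = momentum * update - learning_rate * scale * grad(p)
        p += update
    return p
```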
Should we open this as an issue perhaps so we can check for deprecation etc. and discuss further?
@@ -142,10 +142,12 @@ def _kl_divergence(params, P, degrees_of_freedom, n_samples, n_components,

     # Q is a heavy-tailed distribution: Student's t-distribution
     n = pdist(X_embedded, "sqeuclidean")
     n += 1.
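For context, the heavy-tailed Q distribution this function builds can be sketched for a general number of degrees of freedom as follows (a standalone illustration, not the sklearn code; at dof = 1 the kernel reduces to the familiar (1 + d²)⁻¹ form):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def student_t_q(X_embedded, dof=1.0):
    """Student-t similarity matrix Q over an embedding (illustrative)."""
    d2 = pdist(X_embedded, "sqeuclidean")
    # q_ij is proportional to (1 + d_ij^2 / dof) ** -((dof + 1) / 2)
    n = (1.0 + d2 / dof) ** (-(dof + 1.0) / 2.0)
    # Normalise over all ordered pairs (each condensed entry counts twice).
    return squareform(n / (2.0 * n.sum()))
```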
I may look at this closer later, but would appreciate someone else's comments on these changes to KL-divergence. @ssaeger, @AlexanderFabisch, @cemoody, @lesteve?
Let me know if you would like me to post an explanation of KL -- correction, I have done the calculus to confirm -- although you can tell that my code works simply by running the program if you wish! (make sure to pass method='exact')
KL is common enough, I'm just trying to run through a long backlog of issues in my commutes to work, and don't have time to check some details.
@jnothman on your phone? I haven't really found a good way to do that and would be very interested.
No, this isn't NYC. We sit on our public transport, most of the time. I use a laptop without an internet connection. On the other hand, I have about 60 issue pages open...
It doesn't look like you have a test for n_components > 2 yet...?
Sorry for the slow reply @jnothman
The test case works by checking degrees_of_freedom > 1; "alpha" in the test case refers to the number of degrees of freedom. This is because the calculation of KL divergence doesn't depend on n_components (that just changes the dimensions of the matrix).
However, I could also create a test case for n_components > 2 if you like? It might be a sensible idea!
Well, @AlexanderFabisch did say:
- Probably nobody used n_components > 2 before. :) We need some kind of unit test for that
Yes, that is because by default degrees_of_freedom = n_components - 1.
However, you can also run t-SNE with other degrees_of_freedom on a 2D embedding.
adding an additional test for larger n_components wouldn't hurt though ;)
                 n_iter_without_progress=30, min_grad_norm=1e-7,
                 metric="euclidean", init="random", verbose=0,
-                random_state=None, method='barnes_hut', angle=0.5):
+                random_state=None, method='barnes_hut', angle=0.5, degrees_of_freedom=1, min_error_diff=1e-7):
PEP8 line length
I have fixed this PEP8 violation in my latest commit
@@ -1351,6 +1351,42 @@ def make_s_curve(n_samples=100, noise=0.0, random_state=None):

     return X, t

 def make_trefoil_knot(n_samples=100, noise=0.0, random_state=None):
Please split this into a separate pull request.
I have deleted this function from this pull request
def make_trefoil_knot(n_samples=100, noise=0.0, random_state=None):
    """Generate a uniform trefoil knot.

    Read more in the :ref:`User Guide <sample_generators>`.
Only true if you modify the user guide (which you should). Sample generators are clearest with an example, too.
I have deleted this function from this pull request
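For reference, a generator along the lines of the removed one could look like this. This is a sketch using the standard trefoil parametrisation, not the PR's actual code:

```python
import numpy as np

def make_trefoil_knot(n_samples=100, noise=0.0, random_state=None):
    """Sample points uniformly in parameter along a trefoil knot in 3D.

    Returns (X, t) like the other sample generators: X is the point
    cloud, t the parameter value of each point.
    """
    rng = np.random.RandomState(random_state)
    t = rng.uniform(0, 2 * np.pi, size=n_samples)
    X = np.empty((n_samples, 3))
    # Standard trefoil parametrisation.
    X[:, 0] = np.sin(t) + 2 * np.sin(2 * t)
    X[:, 1] = np.cos(t) - 2 * np.cos(2 * t)
    X[:, 2] = -np.sin(3 * t)
    X += noise * rng.standard_normal(X.shape)
    return X, t
```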
Thanks for the comments Joel, I'll make the changes you mentioned this afternoon!
I've noticed there is also a bug in barnes_hut t-SNE with n degrees of freedom -- I will fix this then mark my pull request as MRG again.
is there a test for the barnes-hut change?
Yep the gradient is checked numerically
I mean there is no regression test for that, is there?
I'm afraid I'm unfamiliar with software regression testing! I will look into it.
I don’t think there should be any software regression (from what I understand of the term through reading Wikipedia): the original code had a bug that didn’t make any difference when n=1 (which encompasses the vast majority of run cases), so the fix doesn’t make any difference for n=1.
Regression testing just means making a test so that if someone rewrote the
code in the future, there would be a test to ensure this behaviour is
correct.
Ok I’ve added the n_components != 2 test
RE: The Barnes-Hut test,
There is a test wherein the gradient and KL are checked against the exact method, the function is called 'test_barnes_hut_angle()' in /manifold/tests/
@artcg thanks, but that test only covers the exact t-SNE code, right? Or does it do both? Maybe I'm misunderstanding something. You fixed a bug in both algorithms, the exact one and the approximate one, right? If you fixed two bugs, you should introduce a test for each of them.
@amueller I see what you mean!
I believe the tests currently work as follows:
- test_gradient() checks that the exact method is numerically correct
- test_barnes_hut_angle() checks that the barnes_hut method agrees with the exact method when angle is zero (thereby indirectly checking that the barnes_hut method is numerically correct)
Originally these tests did not cover n_components != 2, which is why the bug slid through. I have adapted both tests in this pull request so that the n_components != 2 case is checked, which should ensure the bug is present in neither the exact nor the Barnes-Hut implementation. Please let me know if there’s anything else I should look into or explain!
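The style of numerical check described above, comparing an analytic gradient against finite differences, can be sketched as follows (a hypothetical helper in the spirit of the tests discussed, not the actual sklearn test code):

```python
import numpy as np

def check_gradient(f, grad, x, eps=1e-5, tol=1e-4):
    """Compare grad(x) against central finite differences of f at x."""
    num = np.empty_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        # Central difference along coordinate i.
        num[i] = (f(x + step) - f(x - step)) / (2.0 * eps)
    return np.allclose(num, grad(x), atol=tol)
```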
@artcg ok, thanks :)
Hi all, hope you have had a good holiday,
Relying on some of the technical review from @AlexanderFabisch (thanks!), this LGTM now, apart from the minor nitpicks.
Please add a whats_new.rst entry.
 for j in range(l[0]):
-    qijZ = ((1.0 + dist2s[j]) / dof) ** exponent
+    oijZ = ((1.0 + (dist2s[j]) / dof)) ** -1
(dist2s[j]) -> dist2s[j]
@@ -528,8 +528,9 @@ cdef float compute_gradient_positive(float[:,:] val_P,
     for ax in range(n_dimensions):
         buff[ax] = pos_reference[i, ax] - pos_reference[j, ax]
         D += buff[ax] ** 2.0
-    Q = (((1.0 + D) / dof) ** exponent)
+    Q = ((((D) / dof) + 1.0) ** -1)
((D) / dof) -> D / dof
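As a quick, purely illustrative sanity check: the corrected dof = 1 expression `(D / dof + 1.0) ** -1` agrees with the general Student-t kernel `(1 + D/dof) ** ((dof + 1) / -2)` at dof = 1, which is why the original bug was invisible in the default 2D case:

```python
def kernel(D, dof):
    # General Student-t kernel used by t-SNE (illustrative sketch).
    return (1.0 + D / dof) ** ((dof + 1.0) / -2.0)

# At dof = 1 the general kernel reduces to the (D + 1) ** -1 form.
D, dof = 0.7, 1.0
assert abs(kernel(D, dof) - (D / dof + 1.0) ** -1) < 1e-12
```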
…strating properties of Manifold Learning algorithms. An example has been added to /examples
…hod), and set up the input parameter to allow for user-selected number of degrees of freedom
n_iter 1000 -> 2000: t-SNE has built-in functionality to stop if it isn't learning (n_iter_without_progress and min_error_diff); with these in place, it seems reasonable to allow it to iterate a little longer if it is still learning. learning_rate 1000 -> 200: a learning rate of 100 is used in the 2008 paper introducing t-SNE, and Van der Maaten uses a value of 200 in his implementation of bh_tsne; with that in mind, it seems sensible to use a value of 200.
Fixed bug in barnes hut t-SNE
Thanks for the review @jnothman, I made those changes and rebased the pull request so it is synced with master.
@@ -121,14 +121,15 @@ Enhancements

 Bug fixes
 .........

- Fixed a bug where :class:`sklearn.manifold.t_sne` behaved incorrectly with degrees of freedom other than 1 (default), by :user:`Arthur Goldberg <artcg>`.
Can we say n_components other than 2? Also, please keep to 80 chars per line.
Sure, thanks for pointing that out, changed.
Should be fixed in #9032
Thanks @artcg for your contributions.
And thanks for the help @jnothman! I learned a lot about good code practice from your reviews.
What does this implement/fix? Explain your changes.
There was a bug in the implementation of t-SNE with n degrees of freedom for the 'exact' method.
There was a bug where some arguments were not being passed in t-SNE
I added support for a trefoil knot generator
I changed a couple of default arguments for t-SNE
(Details given in the commit message)
Any other comments?
Please let me know if you have any questions about the changes!