
[MRG+1] Fixes bugs in t-SNE argument passing, and a bug with running using n dfs #7885


Closed
wants to merge 13 commits

Conversation


@artcg artcg commented Nov 16, 2016

What does this implement/fix? Explain your changes.

  1. There was a bug in the implementation of t-SNE with n degrees of freedom for the 'exact' method.

  2. There was a bug where some arguments were not being passed through in t-SNE.

  3. I added support for a trefoil knot generator.

  4. I changed a couple of default arguments for t-SNE
    (details are given in the commit messages).

Any other comments?

Please let me know if you have any questions about the changes!

@artcg artcg changed the title Tsne [RMG] Tsne Nov 16, 2016
Member

@jnothman jnothman left a comment

Thanks. Please give this PR a more informative title

# degrees_of_freedom = n_components - 1 comes from
# "Learning a Parametric Embedding by Preserving Local Structure"
# Laurens van der Maaten, 2009.
degrees_of_freedom = max(self.n_components - 1.0, 1)
Member

We need to retain this behaviour by default. I.e. make degrees_of_freedom=None by default, and handle that case with this logic.
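A minimal sketch of that None-handling, with a hypothetical helper name (not the PR's actual diff), might look like:

# Hypothetical sketch: keep the old behaviour when the caller does not
# pass degrees_of_freedom explicitly.
def _resolve_degrees_of_freedom(degrees_of_freedom, n_components):
    if degrees_of_freedom is None:
        # Default from "Learning a Parametric Embedding by Preserving
        # Local Structure", van der Maaten, 2009.
        return max(n_components - 1.0, 1)
    return degrees_of_freedom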

Author

@artcg artcg Nov 16, 2016

I have followed your suggestion!

@@ -645,10 +645,10 @@ class TSNE(BaseEstimator):
"""

def __init__(self, n_components=2, perplexity=30.0,
early_exaggeration=4.0, learning_rate=1000.0, n_iter=1000,
early_exaggeration=4.0, learning_rate=200.0, n_iter=2000,
Member

We will not change default values without a deprecation cycle. I think we're best not changing these.

Author

I have put the defaults back as you suggested, although I do believe the default learning rate is too high at the moment!

Contributor

Maybe not best to do in this PR, but these lower learning rates may be a better default choice (https://www.reddit.com/r/MachineLearning/comments/47kf7w/scikitlearn_tsne_implementation/)

Member

Yes, maybe we can set the default to a lower value. But we should at least compare the result on the first 10,000 images from the MNIST dataset with the result from the original paper. For example, in this case it works with learning_rate=1000; however, the gaps between classes in the original t-SNE paper are a little bit larger.

Member

By the way, in the original pull request I wrote down some notes on the learning rate and compared several implementations (link):

In the literature:

  • original paper: initialization with standard deviation 1e-4, 1000 episodes, learning rate 100, momentum 0.5 for 250 episodes, 0.8 for the rest, early exaggeration with 4 for 50 episodes
  • matlab implementation: learning rate 500, early exaggeration for 100 episodes
  • python implementation: initialization with standard deviation 1, learning rate 500, early exaggeration for 100 episodes, momentum 0.5 for 20 episodes
  • divvy: initialization with standard deviation 1e-4, 1000 episodes, learning rate 1000, momentum 0.5 for 100 episodes, 0.8 for the rest, early exaggeration with 4 for 100 episodes
  • parametric t-sne (not comparable): conjugate gradient
  • barnes-hut t-sne: initialization with standard deviation 1e-4, 1000 episodes, learning rate 200, momentum 0.5 for 250 episodes, 0.8 for the rest, early exaggeration with 12 for 250 episodes

My experiences:

  • the learning rate has to be set manually for optimal performance, something between 100 and 1000
  • a high momentum (0.8) during early exaggeration improves the result

This implementation uses the following schedule:

  • initialization with standard deviation 1e-4, 1000 episodes, learning rate 1000, momentum 0.5 for 50 episodes, 0.8 for the rest, early exaggeration with 4 for 100 episodes

I wonder why we decided to use 1000, which is at the edge of the recommended range [100, 1000].
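For what it's worth, the kind of comparison suggested above could be sketched roughly like this (the fetch_openml loader and the PCA preprocessing are assumptions, not part of this PR):

from sklearn.datasets import fetch_openml
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# First 10,000 MNIST digits, reduced to 50 dimensions as is commonly done
# before running t-SNE.
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X = PCA(n_components=50).fit_transform(X[:10000])

for lr in (100.0, 200.0, 500.0, 1000.0):
    emb = TSNE(n_components=2, learning_rate=lr,
               random_state=0).fit_transform(X)
    # Inspect emb (e.g. a scatter plot coloured by y[:10000]) and compare
    # the gaps between classes with the figures in the original paper.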

Author

@artcg artcg Nov 17, 2016

Should we perhaps open this as an issue so we can check for deprecation, etc., and discuss further?

@@ -142,10 +142,12 @@ def _kl_divergence(params, P, degrees_of_freedom, n_samples, n_components,

# Q is a heavy-tailed distribution: Student's t-distribution
n = pdist(X_embedded, "sqeuclidean")
n += 1.
Member

I may look at this closer later, but would appreciate someone else's comments on these changes to KL-divergence. @ssaeger, @AlexanderFabisch, @cemoody, @lesteve?

Author

Let me know if you would like me to post an explanation of the KL correction; I have done the calculus to confirm it. Although you can also tell that my code works simply by running the program if you wish! (make sure to pass method='exact')
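For reference, the intended formula for the exact method's low-dimensional affinities with a general number of degrees of freedom can be sketched as follows (an illustrative sketch, not this PR's literal diff):

import numpy as np
from scipy.spatial.distance import pdist

def student_t_affinities(X_embedded, degrees_of_freedom):
    # q_ij is proportional to (1 + ||y_i - y_j||^2 / dof) ** (-(dof + 1) / 2);
    # with dof = 1 this reduces to the familiar (1 + d^2) ** -1 kernel.
    d2 = pdist(X_embedded, "sqeuclidean")
    n = (1.0 + d2 / degrees_of_freedom) ** (-(degrees_of_freedom + 1.0) / 2.0)
    # Normalise over all ordered pairs and floor to avoid log(0) in the KL term.
    return np.maximum(n / (2.0 * n.sum()), 1e-12)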

Member

KL is common enough, I'm just trying to run through a long backlog of issues in my commutes to work, and don't have time to check some details.

Member

@jnothman on your phone? I haven't really found a good way to do that and would be very interested.

Member

No, this isn't NYC. We sit on our public transport, most of the time. I use a laptop without an internet connection. On the other hand, I have about 60 issue pages open...

Member

It doesn't look like you have a test for n_components > 2 yet...?

Author

@artcg artcg Nov 22, 2016

Sorry for the slow reply @jnothman
The test case works by testing degrees_of_freedom > 1; "alpha" in the test case refers to the number of degrees of freedom. This is because the calculation of the KL divergence doesn't depend on n_components (that just changes the dimensions of the matrix).
However, I could also create a test case for n_components > 2 if you like. It might be a sensible idea!

Member

Well, @AlexanderFabisch did say:

  1. Probably nobody used n_components > 2 before. :) We need some kind of unit test for that

Author

Yes, that is because by default degrees_of_freedom = n_components - 1.
However, you can also run t-SNE with other degrees_of_freedom on a 2D embedding.

Member

adding an additional test for larger n_components wouldn't hurt though ;)
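Something along these lines, perhaps (a hedged sketch; the test name and the choice of method='exact' are assumptions):

import numpy as np
from sklearn.manifold import TSNE

def test_tsne_n_components_greater_than_two():
    rng = np.random.RandomState(0)
    X = rng.randn(50, 10)
    tsne = TSNE(n_components=3, method='exact', random_state=0)
    X_embedded = tsne.fit_transform(X)
    # A 3D embedding should be produced and the objective should be finite.
    assert X_embedded.shape == (50, 3)
    assert np.isfinite(tsne.kl_divergence_)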

n_iter_without_progress=30, min_grad_norm=1e-7,
metric="euclidean", init="random", verbose=0,
random_state=None, method='barnes_hut', angle=0.5):
random_state=None, method='barnes_hut', angle=0.5, degrees_of_freedom=1, min_error_diff=1e-7):
Member

PEP8 line length

Author

I have fixed this PEP8 violation in my latest commit

@@ -1351,6 +1351,42 @@ def make_s_curve(n_samples=100, noise=0.0, random_state=None):

return X, t

def make_trefoil_knot(n_samples=100, noise=0.0, random_state=None):
Member

Please split this into a separate pull request.

Author

@artcg artcg Nov 16, 2016

I have deleted this function from this pull request

def make_trefoil_knot(n_samples=100, noise=0.0, random_state=None):
"""Generate an Uniform Trefoil Knot

Read more in the :ref:`User Guide <sample_generators>`.
Member

Only true if you modify the user guide (which you should). Sample generators are clearest with an example, too.

Author

@artcg artcg Nov 16, 2016

I have deleted this function from this pull request
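For reference, a generator along those lines could be sketched with the standard trefoil parametrisation (a hedged sketch, not the code that was removed from this PR):

import numpy as np
from sklearn.utils import check_random_state

def make_trefoil_knot(n_samples=100, noise=0.0, random_state=None):
    generator = check_random_state(random_state)
    # Standard trefoil parametrisation, sampled uniformly in t.
    t = 2 * np.pi * generator.rand(n_samples)
    X = np.empty((n_samples, 3))
    X[:, 0] = np.sin(t) + 2 * np.sin(2 * t)
    X[:, 1] = np.cos(t) - 2 * np.cos(2 * t)
    X[:, 2] = -np.sin(3 * t)
    X += noise * generator.randn(n_samples, 3)
    return X, t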

@artcg
Author

artcg commented Nov 16, 2016

Thanks for the comments Joel, I'll make the changes you mentioned this afternoon!


@artcg artcg changed the title [RMG] Tsne [RMG] Fixes bugs in t-SNE argument passing, and a bug with running using n dfs Nov 16, 2016
@artcg artcg changed the title [RMG] Fixes bugs in t-SNE argument passing, and a bug with running using n dfs [MRG] Fixes bugs in t-SNE argument passing, and a bug with running using n dfs Nov 16, 2016
@artcg artcg changed the title [MRG] Fixes bugs in t-SNE argument passing, and a bug with running using n dfs [WIP] Fixes bugs in t-SNE argument passing, and a bug with running using n dfs Nov 16, 2016
@artcg
Author

artcg commented Nov 16, 2016

I've noticed there is also a bug in barnes_hut t-SNE with n degrees of freedom; I will fix this and then mark my pull request as MRG again.

@artcg artcg changed the title [WIP] Fixes bugs in t-SNE argument passing, and a bug with running using n dfs [MRG] Fixes bugs in t-SNE argument passing, and a bug with running using n dfs Nov 18, 2016
@amueller
Member

Is there a test for the barnes-hut change?

@artcg
Author

artcg commented Nov 29, 2016 via email

@amueller
Member

I mean there is no regression test for that, is there?

@artcg
Author

artcg commented Nov 30, 2016 via email

@artcg
Author

artcg commented Nov 30, 2016 via email

@jnothman
Member

jnothman commented Nov 30, 2016 via email

@artcg
Author

artcg commented Nov 30, 2016 via email

@amueller
Member

@artcg thanks, but that test only covers the exact t-SNE code, right? Or does it cover both?

Maybe I'm misunderstanding something. You fixed a bug in both algorithms, the exact one and the approximate one, right? If you fixed two bugs, you should introduce a test for each of them.
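A regression-style check exercising both code paths might look roughly like this (a hedged sketch; the test name and tolerance are assumptions, not the tests actually added here):

import numpy as np
from sklearn.manifold import TSNE, trustworthiness

def test_exact_and_barnes_hut_preserve_structure_similarly():
    rng = np.random.RandomState(0)
    X = rng.randn(100, 5)
    scores = {}
    for method in ('exact', 'barnes_hut'):
        emb = TSNE(n_components=2, method=method, perplexity=10,
                   random_state=0).fit_transform(X)
        scores[method] = trustworthiness(X, emb, n_neighbors=5)
    # Both methods should preserve local structure to a similar degree.
    assert abs(scores['exact'] - scores['barnes_hut']) < 0.15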

@artcg
Author

artcg commented Dec 8, 2016 via email

@amueller
Member

amueller commented Dec 8, 2016

@artcg ok, thanks :)

@artcg
Author

artcg commented Dec 29, 2016

Hi all, hope you have had a good holiday.
Any update on the review for this? Sorry the PR has been a mess! Still getting used to GitHub...

Member

@jnothman jnothman left a comment

Relying on some of the technical review from @AlexanderFabisch (thanks!), this LGTM now, apart from the minor nitpicks.

Please add a whats_new.rst entry.

for j in range(l[0]):
qijZ = ((1.0 + dist2s[j]) / dof) ** exponent
oijZ = ((1.0 + (dist2s[j]) / dof)) ** -1
Member

(dist2s[j]) -> dist2s[j]

@@ -528,8 +528,9 @@ cdef float compute_gradient_positive(float[:,:] val_P,
for ax in range(n_dimensions):
buff[ax] = pos_reference[i, ax] - pos_reference[j, ax]
D += buff[ax] ** 2.0
Q = (((1.0 + D) / dof) ** exponent)
Q = ((((D) / dof) + 1.0) ** -1)
Member

((D) / dof) -> D / dof

@jnothman jnothman changed the title [MRG] Fixes bugs in t-SNE argument passing, and a bug with running using n dfs [MRG+1] Fixes bugs in t-SNE argument passing, and a bug with running using n dfs Jan 1, 2017
artcg added 9 commits January 1, 2017 17:40
…strating properties of Manifold Learning algorithms.

An example has been added to /examples
…hod), and set up the input parameter to allow for user-selected number of degrees of freedom
n_iter 1000 -> 2000

t-SNE has built-in functionality to stop if it isn't learning (n_iter_without_progress and min_error_diff).
With these in place, it seems reasonable to allow it to iterate a little longer if it is still learning.

learning rate 1000 -> 200

A learning rate of 100 is used in the 2008 paper introducing t-SNE, and van der Maaten uses a value of 200 in
his implementation of bh_tsne. With that in mind, it seems sensible to use a value of 200.
@artcg
Author

artcg commented Jan 1, 2017

Thanks for the review @jnothman. I made those changes and rebased the pull request so it is synced with master.

@@ -121,14 +121,15 @@ Enhancements
Bug fixes
.........

- Fixed a bug where :class:`sklearn.manifold.t_sne` behaved incorrectly with degrees of freedom other than 1 (default). by :user:`Arthur Goldberg <artcg>`.
Member

Can we say n_components other than 2? Also, please keep to 80 chars per line

@artcg
Author

artcg commented Jan 4, 2017

Sure, thanks for pointing that out; changed.

@jnothman
Member

Should be fixed in #9032

@jnothman jnothman closed this Jul 13, 2017
@jnothman
Member

Thanks @artcg for your contributions.

@artcg
Author

artcg commented Jul 18, 2017

And thanks for the help @jnothman! I learned a lot about good code practice from your reviews.
