
[MRG] Metric Learning Tutorial Notebook #27


Merged (5 commits) on Nov 9, 2016

Conversation

@bhargavvader (Contributor) commented Sep 8, 2016

@perimosocordiae, thought this might be handy for visualising what the algorithms are doing. It's been very useful for me, so I thought I'd clean up the code I have and make a notebook.

Let me know if this is ok, and whether you might want more things added to this.
Once #26 is completed, we can update this to demonstrate fit_transform as well.

Edit: also wanted to know where the right place for this would be, whether in the examples folder or in the docs folder.

@perimosocordiae (Contributor)

Very cool! I think this would fit nicely in examples/.

Some of the code in the notebook can be cleaned up, and it might be better to use an alternate dataset to really show how the methods differ. Some of the papers have illustrative examples that we might be able to use.

The one big omission from the notebook is how you might specify constraints manually for each method, rather than just using the _Supervised variants. This might be a bit harder to show clearly, but I think it's worthwhile.

@bhargavvader (Contributor, Author) commented Sep 9, 2016

Could you mention which parts exactly could be cleaned up?
I'd ideally like to use a different data set too, but I chose a single one for all the examples to make it easier to compare the algorithms. Do you have any particular data sets in mind?

I like the idea of specifying constraints manually, I'll add what I can regarding that to the notebook.

Edit: On that note, is there an example or documentation for specifying constraints?

@perimosocordiae (Contributor)

Regarding code cleanup, you keep reassigning the X variable, which makes things confusing to read. Maybe have each method assign to a separate variable (e.g., x_lmnn or x_itml) when calling .transform()?

The docs aren't very clear about manual constraint specification yet, but the docstrings for each method should give you a starting point. For example, ITML's constraints are a "4-tuple of arrays (a,b,c,d)" which are "indices into X, such that d(X[a],X[b]) < d(X[c],X[d])". You can also look at the constraints class to find examples of making each kind of constraint.
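To make that 4-tuple format concrete, here is a minimal NumPy-only sketch of building such constraint arrays from class labels. The helper name `make_constraints` is hypothetical (it is not metric-learn's API); it just produces arrays whose shape matches the docstring quoted above, with similar pairs in (a, b) and dissimilar pairs in (c, d).

```python
import numpy as np

def make_constraints(y, num_constraints, seed=0):
    """Sample ITML-style constraints (a, b, c, d): index arrays into X
    such that d(X[a], X[b]) < d(X[c], X[d]) should hold after learning.
    Pairs (a[k], b[k]) share a label; pairs (c[k], d[k]) do not.
    (Hypothetical helper for illustration, not metric-learn's own code.)"""
    rng = np.random.RandomState(seed)
    y = np.asarray(y)
    a, b, c, d = [], [], [], []
    while len(a) < num_constraints or len(c) < num_constraints:
        i, j = rng.randint(len(y), size=2)
        if i == j:
            continue
        if y[i] == y[j] and len(a) < num_constraints:
            a.append(i); b.append(j)       # similar pair
        elif y[i] != y[j] and len(c) < num_constraints:
            c.append(i); d.append(j)       # dissimilar pair
    return np.array(a), np.array(b), np.array(c), np.array(d)

y = np.asarray([0, 0, 1, 1, 2])
a, b, c, d = make_constraints(y, num_constraints=4)
assert (y[a] == y[b]).all()   # (a, b) pairs share labels
assert (y[c] != y[d]).all()   # (c, d) pairs differ
```

The resulting tuple `(a, b, c, d)` is in the shape ITML's docstring describes; the library's own constraints class is the authoritative reference for each method's expected format.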

@bhargavvader (Contributor, Author)

TODO:

  • clean up notebook with regard to old comments
  • add example of fit_transform as well
  • briefly describe each algorithm as well as link to paper
  • give example of manual constraints
  • maybe add a nice motivating example using one of them (e.g. clustering)

Just some notes for myself. I'll get to this soon :)
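On the fit_transform item: since the project (mostly) follows scikit-learn conventions, fit_transform is simply fit followed by transform on the same data. A toy transformer (hypothetical, not part of metric-learn) sketches the convention:

```python
import numpy as np

class CenteringTransformer:
    """Toy transformer following the scikit-learn fit/transform
    convention that metric-learn methods (mostly) share.
    (Hypothetical example class, not part of metric-learn.)"""

    def fit(self, X, y=None):
        self.mean_ = X.mean(axis=0)   # learned state gets a trailing underscore
        return self                   # fit returns self, enabling chaining

    def transform(self, X):
        return X - self.mean_

    def fit_transform(self, X, y=None):
        # Convenience method: fit on X, then transform the same X.
        return self.fit(X, y).transform(X)

X = np.array([[1.0, 2.0], [3.0, 4.0]])
X1 = CenteringTransformer().fit(X).transform(X)
X2 = CenteringTransformer().fit_transform(X)
assert np.allclose(X1, X2)  # both paths give the same embedding
```

Once #26 lands, the notebook's separate fit-then-transform calls can collapse into single fit_transform calls in exactly this way.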

@perimosocordiae (Contributor)

This is looking good! A few lines of description about each method will be nice, as well as a mention that (for the most part) we follow scikit-learn conventions.

It might also be good to explain why running the same learner multiple times might not always give the same results. The current API for the supervised methods hides all of the randomness, which might be confusing to a new user.
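To illustrate the hidden randomness, here is a sketch of why unseeded constraint sampling gives different results across runs, and how fixing the seed restores reproducibility. The function name and signature are hypothetical, not the supervised wrappers' real internals:

```python
import numpy as np

def sample_pairs(n, num_pairs, seed=None):
    """Randomly sample index pairs, as the _Supervised wrappers do
    internally when generating constraints (sketch, not the real code)."""
    rng = np.random.RandomState(seed)
    return rng.randint(n, size=(num_pairs, 2))

# Without a fixed seed, two runs generally draw different constraints,
# so the learned metric (and the transformed data) can differ.
run1 = sample_pairs(100, 20)
run2 = sample_pairs(100, 20)

# Fixing the seed makes the sampling, and thus the result, reproducible.
fixed1 = sample_pairs(100, 20, seed=42)
fixed2 = sample_pairs(100, 20, seed=42)
assert (fixed1 == fixed2).all()
```

A short note like this in the notebook would explain to new users why repeated runs of the same supervised learner need not agree.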

@bhargavvader (Contributor, Author)

Added descriptions and links for all the algorithms, and mentioned the scikit-learn conventions and the use of random seeds.

Only have to do the manual constraints bit, just going to spend some time coming up with a good example for it.

@bhargavvader (Contributor, Author) commented Nov 3, 2016

@perimosocordiae, apart from ITML and LSML, all the methods use labels as the way to supervise the learning, right?

So the example of manual constraints only applies to those two, right?

edit: Ah, I noticed the connectivity graph for SDML and chunks for RCA too.

@bhargavvader (Contributor, Author)

@perimosocordiae I'm done with the tutorial!
If you think it needs any other additions, let me know; otherwise it's ready to merge from my side.

@bhargavvader bhargavvader changed the title [WIP] Metric Learning Tutorial Notebook [MRG] Metric Learning Tutorial Notebook Nov 7, 2016
@perimosocordiae perimosocordiae merged commit 3c6b951 into scikit-learn-contrib:master Nov 9, 2016
@perimosocordiae (Contributor)

Great, thanks! This will be a useful resource for people new to the project.

@terrytangyuan (Member)

Awesome! It would be good to include the associated .py file as part of the test suite.

@perimosocordiae (Contributor)

We could have an automated test run jupyter nbconvert --to script on it, then make sure it runs without errors. I think that could be added to the TravisCI tests without too much difficulty.
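A sketch of what that Travis step could look like (the notebook path and filenames here are hypothetical placeholders):

```yaml
# .travis.yml fragment (sketch): convert the notebook to a script,
# then run it as a smoke test so a broken notebook fails the build.
script:
  - pip install jupyter
  - jupyter nbconvert --to script examples/metric_learning_tutorial.ipynb
  - python examples/metric_learning_tutorial.py
```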

@bhargavvader (Contributor, Author)

I haven't made an associated .py file, though I could if needed; adding the notebook itself to the TravisCI tests would be better, right?

@terrytangyuan (Member)

CJ's idea sounds good, since it avoids maintaining a separate .py file. And yes, continuously testing it would be valuable for future users.

@wdevazelhes mentioned this pull request Mar 8, 2019