
[MRG] Adding fit_transform #26


Merged

Conversation

bhargavvader
Contributor

bhargavvader commented Sep 8, 2016

@perimosocordiae , could you have a look and see if this is the correct idea/direction to go in?
And this sort of fit_transform would be wanted for all the algorithms, correct?

This is with respect to #25.

@bhargavvader
Contributor Author

bhargavvader commented Sep 8, 2016

I think what I've currently done is a bit verbose - maybe just something like this:

def fit_transform(self, X, Y=None):
    self.fit(X, Y)
    return self.transform(X)

would be better?

@perimosocordiae
Contributor

I think you might be able to define a generic fit_transform() method on the base class, similar to your example:

def fit_transform(self, *args, **kwargs):
    self.fit(*args, **kwargs)
    return self.transform()

Some subclasses might need special treatment, but that should be a good start.

We'll also want to add a test case for each algorithm's subclass.

@bhargavvader
Contributor Author

@perimosocordiae which subclasses might need special treatment? I made the change and tried it out on all the algorithms and it seems fine.

Should the test format be similar to what is already done in metric_learn_test?

@perimosocordiae
Contributor

The docstring needs some work. Maybe just describe the function as calling .fit() and then returning the result of .transform(). There should be a note pointing the reader to .fit() for the expected arguments, specifically because the generic *args, **kwargs parameters don't tell the user anything. Also, the docstring should specify that it returns the metric-transformed input data, not the transformation matrix.
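
A minimal sketch of what such a docstring could look like, assuming the generic base-class signature proposed above (the wording here is illustrative, not final):

def fit_transform(self, *args, **kwargs):
    """Calls .fit() and then returns the result of .transform().

    Parameters
    ----------
    *args, **kwargs : passed straight through to this estimator's .fit();
        see that method's documentation for the expected arguments.

    Returns
    -------
    X_transformed : array-like
        The input data transformed by the learned metric (not the
        transformation matrix itself).
    """
    self.fit(*args, **kwargs)
    return self.transform()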

For testing, it would be nice to have a simple loop that checks the result of calling .fit_transform() vs manually calling .fit() and then .transform() for all methods. Be sure to use some small set of test data, so that the test case doesn't take very long to run. This could be another file in the test/ directory. I'll probably rename the existing test to something like test_iris.py, and your new test file could be named test_fit_transform.py.
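
For illustration, a rough sketch of such a loop; the stand-in estimator and the random data below are placeholders for the project's actual learners and a small fixed dataset:

import numpy as np

class CenteringTransformer:
    # Stand-in for a metric learner, used only to show the test pattern.
    def fit(self, X, y=None):
        self.X_ = X
        self.mean_ = X.mean(axis=0)
        return self

    def transform(self, X=None):
        if X is None:
            X = self.X_
        return X - self.mean_

    def fit_transform(self, *args, **kwargs):
        self.fit(*args, **kwargs)
        return self.transform()

def test_fit_transform_matches_fit_then_transform():
    rng = np.random.RandomState(1234)
    X = rng.rand(20, 4)                            # small data keeps the test fast
    y = rng.randint(0, 2, 20)
    for make_estimator in [CenteringTransformer]:  # swap in each algorithm's class
        expected = make_estimator().fit(X, y).transform()
        result = make_estimator().fit_transform(X, y)
        assert np.allclose(expected, result)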

@bhargavvader
Contributor Author

bhargavvader commented Sep 13, 2016

Funnily enough, the tests fail for ITML, SDML, RCA, LSML and LFDA when I compare doing .fit() and then .transform() against doing fit_transform(). I'm still going to push with the failing tests and the reworked docstrings - could you check and see why they might be failing? Is it to do with a random seed, maybe?

edit: I'll change the specifics of the tests; this is just a rough outline of what they might look like.

@perimosocordiae
Contributor

The test failures look like a random seed issue, though I haven't verified on my end.

@bhargavvader
Contributor Author

How do you suggest getting around the random seed issue? Can I pass a seed to the algorithm, or would declaring it outside before running .fit() and .transform() work as well?

@bhargavvader
Contributor Author

bhargavvader commented Sep 15, 2016

@perimosocordiae could you please help verify the source of the errors?
While lsml, sdml, and itml use random_subset for constraints, rca and lfda don't.
So, some questions:

  1. Could we allow passing a random seed to the algorithms which use random_subset? It could always come in handy when users want to replicate results (and for testing, of course) - see the sketch after these questions.

  2. What is going on with rca and lfda? Also, funnily, lfda returns the same matrix but with the signs reversed when I use fit_transform. Any clue why this may be so?
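
As a sketch of what passing a seed through might look like (this borrows scikit-learn's check_random_state helper; the random_subset signature below is hypothetical, not the project's actual one):

import numpy as np
from sklearn.utils import check_random_state

def random_subset(n_samples, num_constraints, random_state=None):
    # Hypothetical helper: sample constraint indices reproducibly.
    rng = check_random_state(random_state)   # accepts None, an int seed, or a RandomState
    return rng.choice(n_samples, size=num_constraints, replace=False)

# Passing the same seed twice yields the same constraints:
assert (random_subset(100, 10, random_state=42) == random_subset(100, 10, random_state=42)).all()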

@perimosocordiae
Contributor

I made a new issue (#33) for your first point. RCA is affected by the same issue because the chunks method also uses random numbers.

LFDA is based on an eigendecomposition, and eigenvectors are only determined up to sign, so that's why you're seeing the sign flipping. Both +v and -v are equivalent, which makes things trickier to test. We could choose a convention, say that the first value of each eigenvector must be positive, which would make LFDA reproducible without changing its results.
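
A minimal sketch of that convention, assuming the learned components are stored as rows of a NumPy array (the function name is made up for illustration):

import numpy as np

def fix_eigenvector_signs(components):
    # Flip each eigenvector (row) so its first entry is non-negative.
    # +v and -v span the same direction, so this only picks a reproducible
    # representative; it doesn't change the learned metric.
    components = np.asarray(components, dtype=float)
    signs = np.sign(components[:, 0])
    signs[signs == 0] = 1.0           # leave zero-leading rows as they are
    return components * signs[:, np.newaxis]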

@bhargavvader
Contributor Author

Thanks - this answers everything I wanted to know. Will get to this over the weekend.

@bhargavvader
Contributor Author

bhargavvader commented Sep 29, 2016

@perimosocordiae, for some reason the tests still fail despite passing a seed (np.random.RandomState(1234)). Can you please have a look and see if I am doing something wrong?

@bhargavvader
Contributor Author

bhargavvader commented Sep 29, 2016

I thought I would fix it by adding random_state parameters to the adjacency-matrix and positive_negative_pairs calls in ITML and the others, but the failure still comes up. Needs a closer look.

@bhargavvader
Contributor Author

ping @perimosocordiae could you have a look please?

@perimosocordiae
Contributor

You need to reset the random state between the two times you use it (in the fit_transform test cases).
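
In the test, that pattern might look something like this (assuming the learners draw from NumPy's global random state; the helper name is a placeholder):

import numpy as np

def check_fit_transform_consistency(make_estimator, X, y, seed=1234):
    # Re-seed before each run so both code paths see identical random draws.
    np.random.seed(seed)
    expected = make_estimator().fit(X, y).transform()

    np.random.seed(seed)               # reset the state between the two uses
    result = make_estimator().fit_transform(X, y)

    assert np.allclose(expected, result)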

@bhargavvader
Contributor Author

Ah, okay. The tests are fixed now.

@bhargavvader changed the title from [WIP] Adding fit_transform to [MRG] Adding fit_transform on Oct 4, 2016
@bhargavvader
Contributor Author

@perimosocordiae , could you see if this is fine?
I could redo the Notebook with examples of fit_transform.

@perimosocordiae merged commit c5087d7 into scikit-learn-contrib:master on Oct 6, 2016
@perimosocordiae
Contributor

Looks good, thanks!
