
[MRG] MAINT: remove variables not needed to store #159


Merged

Conversation

wdevazelhes
Member

Fixes #134

``d(c, d) > neg`` for all given pairs of dissimilar points ``c`` and
``d``, with ``bounds=[pos, neg]``, and ``d`` the learned distance. If
not provided at initialization, these are the ones derived at train
time.
Member Author

I am not familiar with ITML, but the above is based on what I understood of it. Feel free to tell me if I'm wrong.

Contributor

As currently implemented, bounds_ should be an ndarray-like of shape (2,).

We should probably update the code to accept list or tuple types for the bounds as well.
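For concreteness, a minimal sketch of the kind of coercion being suggested, assuming a hypothetical `check_bounds` helper (not metric-learn's actual code):

```python
import numpy as np

def check_bounds(bounds):
    """Hypothetical helper: accept a list, tuple or ndarray of two numbers
    and return a float ndarray of shape (2,)."""
    bounds = np.asarray(bounds, dtype=float)
    if bounds.shape != (2,):
        raise ValueError("bounds should contain exactly two elements "
                         "(pos, neg), got shape {}".format(bounds.shape))
    return bounds

# These would then all be accepted interchangeably:
check_bounds([0.2, 5.0])
check_bounds((0.2, 5.0))
check_bounds(np.array([0.2, 5.0]))
```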

Contributor

"these are the ones" -> "these are"

all given pairs of similar points ``a`` and ``b``, and
``d(c, d) > neg`` for all given pairs of dissimilar points ``c`` and
``d``, with ``bounds=[pos, neg]``, and ``d`` the learned distance.
If not provided at initialization, these will be derived at train time.
Member Author

I also updated the description of the bounds argument in fit to match that of self.bounds_ and to be more detailed. Same comment: feel free to tell me if I'm wrong.

``d(c, d) > neg`` for all given pairs of dissimilar points ``c`` and
``d``, with ``bounds=[pos, neg]``, and ``d`` the learned distance. If
not provided at initialization, these are the ones derived at train
time.
Contributor

As currently implemented, bounds_ should be an ndarray-like of shape (2,).

We should probably update the code to accept list or tuple types for the bounds as well.

``d(c, d) > neg`` for all given pairs of dissimilar points ``c`` and
``d``, with ``bounds=[pos, neg]``, and ``d`` the learned distance. If
not provided at initialization, these are the ones derived at train
time.
Contributor

"these are the ones" -> "these are"

time.

n_iter_ : `int`
The number of iterations the solver has ran.
Contributor

"has ran" -> "has run"

Member Author

As currently implemented, bounds_ should be an ndarray-like of shape (2,).

We should probably update the code to accept list or tuple types for the bounds as well.

That's right, I forgot about that. For now I'll just change the docstring, but I'll open an issue to get this sorted out.

"these are the ones" -> "these are"

Thanks, will do

"has ran" -> "has run"

That's right, thanks

@@ -181,16 +181,16 @@ class ITML_Supervised(_BaseITML, TransformerMixin):

Attributes
----------
bounds_ : `list` of two numbers
bounds_ : array-like, shape=(2,)
Bounds on similarity, aside slack variables, s.t. ``d(a, b) < pos`` for
all given pairs of similar points ``a`` and ``b``, and
``d(c, d) > neg`` for all given pairs of dissimilar points ``c`` and
``d``, with ``bounds=[pos, neg]``, and ``d`` the learned distance. If
Member Author

I keep saying bounds=[pos, neg] here because it can still be a list at this point, and I think this is more general than bounds=array([pos, neg]), which would suggest a numpy array only.

@wdevazelhes changed the title from "MAINT: remove variables not needed to store" to "[MRG] MAINT: remove variables not needed to store" on Jan 24, 2019
all given pairs of similar points ``a`` and ``b``, and
``d(c, d) > neg`` for all given pairs of dissimilar points ``c`` and
``d``, with ``bounds=[ pos, neg]``, and ``d`` the learned distance. If
not provided at initialization, these are derived at train
Member

Maybe it would be nice to explain how it is done then: bounds_[0] and bounds_[1] are set to the 5th and 95th percentiles of the pairwise distances among all points available at train time? For the supervised version, that means all points in the training data X; for the weakly supervised one, all points present in the pairs.
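As a rough illustration of that derivation (the `default_bounds` helper is hypothetical and uses plain Euclidean distances; the actual ITML code may differ in details):

```python
import numpy as np
from scipy.spatial.distance import pdist

def default_bounds(X):
    # Take the 5th and 95th percentiles of the pairwise Euclidean distances
    # among the available points as the similarity (pos) and dissimilarity
    # (neg) thresholds.
    distances = pdist(X)  # condensed vector of all pairwise distances
    pos, neg = np.percentile(distances, (5, 95))
    return np.array([pos, neg])

# e.g. on the training data X for the supervised version, or on the unique
# points appearing in the pairs for the weakly supervised one:
X = np.random.RandomState(0).randn(50, 3)
bounds_ = default_bounds(X)
```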

Member Author

Yes I agree it's more precise

all given pairs of similar points ``a`` and ``b``, and
``d(c, d) > neg`` for all given pairs of dissimilar points ``c`` and
``d``, with ``bounds=[pos, neg]``, and ``d`` the learned distance.
If not provided at initialization, these will be derived at train time.
Member

again

Member Author

will do

``d(c, d) > neg`` for all given pairs of dissimilar points ``c`` and
``d``, with ``bounds=[pos, neg]``, and ``d`` the learned distance. If
not provided at initialization, these are derived at train
time.
Member

again

Member Author

will do

all given pairs of similar points ``a`` and ``b``, and ``d(c, d) > neg``
for all given pairs of dissimilar points ``c`` and ``d``, with
``bounds=[pos, neg]``, and ``d`` the learned distance. If not provided at
initialization, these will be derived at train time.
Member

again

Member Author

will do

@wdevazelhes
Member Author

Regarding bounds, I also noticed that for the supervised version it has to be given at initialization, while for the weakly supervised version it has to be passed to fit. I think we should make that uniform. Since it is a data-dependent parameter (indeed the default value is deduced from the data), I think we should put it in fit. Since this is out of the scope of this PR and would need a deprecation warning (which would add yet more code to this PR), I'll open another PR for that; for this one I'll leave the parameter where it is.
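A hedged sketch of the deprecation pattern this would involve, with illustrative names only (not the final API):

```python
import warnings

class ITMLSupervisedSketch:
    """Illustrative only: accept `bounds` in __init__ (deprecated) and in fit,
    where it belongs as a data-dependent parameter."""

    def __init__(self, bounds='deprecated'):
        # 'deprecated' is a sentinel so we can tell whether the user
        # actually passed something at initialization.
        self.bounds = bounds

    def fit(self, pairs, y, bounds=None):
        if not (isinstance(self.bounds, str) and self.bounds == 'deprecated'):
            warnings.warn("Passing bounds at initialization is deprecated; "
                          "pass it to fit instead.", DeprecationWarning)
            bounds = self.bounds
        # ... if bounds is still None, derive it from the data, then run ITML
        return self
```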

@wdevazelhes
Member Author

I addressed all the comments, so we should be good to merge this one.

Bounds on similarity, aside slack variables, s.t. ``d(a, b) < pos`` for
all given pairs of similar points ``a`` and ``b``, and
``d(c, d) > neg`` for all given pairs of dissimilar points ``c`` and
``d``, with ``bounds=[ pos, neg]``, and ``d`` the learned distance. If
Member

My suggestion was to get rid of bounds=[ pos, neg] altogether and replace pos and neg in the inequalities by bounds_[0] and bounds_[1].

Member Author

Ah yes, I agree it's even clearer. Done.

William de Vazelhes added 2 commits January 29, 2019 13:49
@bellet merged commit b336eba into scikit-learn-contrib:master on Jan 29, 2019
@wdevazelhes deleted the maint/dont_store_variables branch on January 29, 2019 14:44
bellet pushed a commit that referenced this pull request Jan 29, 2019
* MAINT: remove variables not needed to store

* Address review #159 (review)

* DOC: add more precise docstring

* API: put parameter in fit, deprecate it in init, and also change previous deprecation test names

* Change remaining test names