[MRG] Fix RCA_Supervised sklearn compat test #198

Merged
merged 3 commits on May 13, 2019

Conversation

wdevazelhes
Member

@wdevazelhes wdevazelhes commented May 2, 2019

I noticed that we didn't test RCA_Supervised's scikit-learn compatibility.

A first problem in scikit-learn's check_estimator that prevented the test from passing was that RCA_Supervised couldn't form the default num_chunks number of chunks on one of the toy datasets that scikit-learn runs. Fixing num_chunks=2 ensures a normal use of the algorithm is tested (if we had put 1 chunk it would have become a corner case) and is robust when we cannot form many chunks from the data.

A second problem was just a bug when the input was int: we couldn't subtract the mean from chunk_data in place (it was an array of ints, and numpy raises an error in this case). Casting this array to float fixes it.
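The second bug can be reproduced in isolation; this is a hypothetical minimal sketch, not the PR's code:

```python
import numpy as np

# chunk_data is an int array; subtracting its (float) mean in place
# fails because numpy cannot cast the float result back into the
# existing int array.
chunk_data = np.array([[1, 2], [3, 4]])
try:
    chunk_data -= chunk_data.mean(axis=0)
except TypeError:
    error_raised = True

# Casting to float first makes the in-place subtraction valid.
chunk_data = chunk_data.astype(float)
chunk_data -= chunk_data.mean(axis=0)
```

numpy's ufunc casting error is a subclass of TypeError, which is why the except clause above catches it.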

@wdevazelhes wdevazelhes changed the title Add checks for labels when having pairs [MRG] Fix RCA_Supervised sklearn compat test May 2, 2019
wdevazelhes pushed a commit to wdevazelhes/metric-learn that referenced this pull request May 2, 2019
# through slices hence we do a copy. We will also need to
# ensure the data is float so that we can subtract the
# mean on it
chunk_data = data[chunk_mask].astype(float, copy=True)
Contributor

Logical indexing always creates a copy, so the copy=True isn't necessary here. In fact, astype() defaults to returning a new copy each time, so what we really want here is copy=False to avoid unnecessary copies.

Member Author

That's right, thanks
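The copy semantics discussed above can be checked directly; a small sketch, not part of the PR:

```python
import numpy as np

data = np.arange(12, dtype=float).reshape(4, 3)
chunk_mask = np.array([True, False, True, True])

# Boolean-mask indexing returns a new array, never a view...
selected = data[chunk_mask]
assert not np.shares_memory(selected, data)

# ...so with copy=False, astype can return the selection as-is when
# the dtype already matches, avoiding a second, redundant copy.
assert selected.astype(float, copy=False) is selected

# The default copy=True always allocates a fresh array.
assert selected.astype(float) is not selected
```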

# check_estimator(RCA_Supervised)
def test_rca(self):
    def stable_init(self, num_dims=None, pca_comps=None,
                    chunk_size=2, preprocessor=None):
Contributor

I'd use **kwargs here

Member Author

I agree that would be better in general, but here I think it might be useful for dRCA to have the same argument names as the real RCA, so that if scikit-learn has checks that depend on RCA's arguments, they are taken into account. What do you think?
Though that's not the case here, so maybe yes, we can put **kwargs for simplicity.

Contributor

I'm not sure how scikit-learn's testing works, but that seems plausible. This is fine as-is.
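The wrapping pattern under discussion can be sketched with a dummy stand-in for RCA_Supervised (the class, argument names, and defaults below are illustrative, not metric-learn's actual signature):

```python
# Dummy estimator standing in for RCA_Supervised.
class RCA_Supervised:
    def __init__(self, num_chunks=100, chunk_size=2, preprocessor=None):
        self.num_chunks = num_chunks
        self.chunk_size = chunk_size
        self.preprocessor = preprocessor

# stable_init mirrors RCA's signature but pins num_chunks=2 so that
# check_estimator's tiny toy datasets can always form the chunks.
def stable_init(self, num_dims=None, pca_comps=None,
                chunk_size=2, preprocessor=None):
    RCA_Supervised.__init__(self, num_chunks=2, chunk_size=chunk_size,
                            preprocessor=preprocessor)

# The mocked class would then be passed to check_estimator.
dRCA_Supervised = type('dRCA_Supervised', (RCA_Supervised,),
                       {'__init__': stable_init})
est = dRCA_Supervised()
```

Keeping the same parameter names as the real estimator is what preserves any signature-based checks scikit-learn might run, which is the trade-off against a plain **kwargs forwarding.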

@bellet
Member

bellet commented May 9, 2019

Can you briefly explain the problem and its solution in the PR description?

@wdevazelhes
Member Author

Can you briefly explain the problem and its solution in the PR description?

Yes, sorry. Done.

@bellet
Member

bellet commented May 9, 2019

Thanks. Why is this stable_init thing needed?

@wdevazelhes
Member Author

wdevazelhes commented May 9, 2019

Thanks. Why is this stable_init thing needed?

In fact there was also another bug: the default n_chunks was too big for some toy problems in scikit-learn's check_estimator, so stable_init fixing n_chunks=2 is a way to fix it (I updated the PR description; I had forgotten there was this bug too).

@bellet
Member

bellet commented May 9, 2019

OK. I guess this could be fixed more robustly by making sure that n_chunks*chunk_size is not larger than the number of points, or by having the chunk generation procedure break gracefully when no new chunk can be created. But let's keep this for another PR.
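The first suggestion could look like the following; `safe_n_chunks` is a hypothetical helper, not what this PR (or metric-learn) implements:

```python
def safe_n_chunks(n_points, chunk_size, requested_n_chunks):
    # Cap the number of chunks so that n_chunks * chunk_size never
    # exceeds the number of available points.
    return min(requested_n_chunks, n_points // chunk_size)

# With 7 points and chunks of size 2, at most 3 full chunks fit,
# regardless of how many were requested.
n = safe_n_chunks(7, chunk_size=2, requested_n_chunks=100)
```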


@wdevazelhes
Member Author

Merging, since the Travis tests are green and there are 2 approvals.

@wdevazelhes wdevazelhes merged commit 9f73250 into scikit-learn-contrib:master May 13, 2019
bellet pushed a commit that referenced this pull request Jun 12, 2019
* Remove initialization of the data for RCA

* Add deprecated flag for supervised version too

* Remove comment saying we'll do PCA

* Add ChangedBehaviorWarning and do tests

* improve change behavior warning

* Update message in case covariance matrix is not invertible

* FIX: still ignore testing RCA while fixed in #198

* Some reformatting

* Fix test string

* TST: add test for warning message when covariance is not definite

* Address #194 (comment)