[MRG] Fix RCA_Supervised sklearn compat test #198
Conversation
metric_learn/rca.py
# through slices hence we do a copy. We will also need to
# ensure the data is float so that we can subtract the
# mean on it
chunk_data = data[chunk_mask].astype(float, copy=True)
Logical indexing always creates a copy, so the `copy=True` isn't necessary here. In fact, `astype()` defaults to returning a new copy each time, so what we really want here is `copy=False` to avoid unnecessary copies.
That's right, thanks
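The copy semantics under discussion can be checked with a small standalone snippet (illustrative only, not part of the PR):

```python
import numpy as np

data = np.arange(6).reshape(3, 2)          # int array, like the chunk data
chunk_mask = np.array([True, False, True])

# Boolean (logical) indexing already returns a new array, not a view:
selected = data[chunk_mask]
selected[0, 0] = 99
assert data[0, 0] == 0                      # original is untouched

# astype(copy=False) copies only when a conversion is actually needed,
# so it avoids a second, redundant copy once the dtype already matches:
as_float = selected.astype(float, copy=False)           # int -> float: must copy
assert as_float.astype(float, copy=False) is as_float   # already float: no copy
```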
# check_estimator(RCA_Supervised)
def test_rca(self):
    def stable_init(self, num_dims=None, pca_comps=None,
                    chunk_size=2, preprocessor=None):
I'd use `**kwargs` here.
I agree that would be better in general, but here I think it might be useful for the simulated RCA to have the same argument names as the real RCA, so that if scikit-learn has checks that depend on RCA's arguments, they will be taken into account. What do you think?
Though that's not the case here, so maybe yes, we can use `**kwargs` for simplicity.
I'm not sure how scikit-learn's testing works, but that seems plausible. This is fine as-is.
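For reference, the `**kwargs` variant being suggested could look like this sketch (the class and its parameters are illustrative stand-ins, not metric-learn's actual API):

```python
class FakeRCA:
    # Stand-in for RCA_Supervised; the parameter names are illustrative.
    def __init__(self, num_dims=None, chunk_size=None, preprocessor=None):
        self.num_dims = num_dims
        self.chunk_size = chunk_size
        self.preprocessor = preprocessor

_original_init = FakeRCA.__init__

def stable_init(self, chunk_size=2, **kwargs):
    # Pin chunk_size so scikit-learn's tiny toy datasets can still form
    # chunks; **kwargs forwards every other constructor argument untouched.
    _original_init(self, chunk_size=chunk_size, **kwargs)

# Patch the constructor before running the compatibility checks:
FakeRCA.__init__ = stable_init

est = FakeRCA(num_dims=3)
assert est.chunk_size == 2 and est.num_dims == 3
```

The trade-off discussed above: with `**kwargs` the wrapper stays short, but the patched signature no longer advertises the real estimator's argument names to introspection-based checks.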
Can you briefly explain the problem and its solution in the PR description?
Yes, sorry
Thanks. Why is this
In fact there was also another bug because the default
OK. I guess this could be fixed in a more robust way by making sure that
Merging, since the Travis tests are green and there are 2 approvals.
* Remove initialization of the data for RCA
* Add deprecated flag for supervised version too
* Remove comment saying we'll do PCA
* Add ChangedBehaviorWarning and do tests
* Improve change behavior warning
* Update message in case covariance matrix is not invertible
* FIX: still ignore testing RCA while fixed in #198
* Some reformatting
* Fix test string
* TST: add test for warning message when covariance is not definite
* Address #194 (comment)
I noticed that we didn't test RCA_Supervised scikit-learn compatibility.

A first problem in scikit-learn's check_estimator that prevented the test from passing was that RCA_Supervised couldn't form the default n_chunks number of chunks for one of the toy examples that scikit-learn runs. Fixing num_chunks=2 ensures a normal use of the algorithm is tested (if we had put 1 chunk it would have become a corner case) and is robust when we cannot form many chunks from the data.

A second problem was a bug when the input was int: we couldn't subtract the mean in place from chunk_data (which was an array of ints, and numpy returns an error in this case). Casting this object to float fixes it.
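The int-input bug described above is easy to reproduce in isolation (a standalone sketch, not the PR's actual test code):

```python
import numpy as np

chunk_data = np.array([[1, 2], [3, 4]])     # int input, as in the failing check

# In-place subtraction of the (float) mean from an int array fails:
# numpy refuses to cast the float result back into the int buffer.
try:
    chunk_data -= chunk_data.mean(axis=0)
    raised = False
except TypeError:
    raised = True
assert raised

# Casting to float first (the fix in this PR) makes the update valid:
chunk_data = chunk_data.astype(float)
chunk_data -= chunk_data.mean(axis=0)
assert (chunk_data == np.array([[-1., -1.], [1., 1.]])).all()
```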