-
Notifications
You must be signed in to change notification settings - Fork 228
Repairs chunk generation for unknown labels, solves issue #260 #263
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Some tests would be nice, in particular one which involves running RCA on a dataset which contains unlabeled points (which would crash before this PR). I think a natural candidate is to test whether RCA returns the same thing when fitted on a dataset with unlabeled point and the same dataset without the unlabeled points.
A second test to check that unlabeled points are not assigned to any chunk would be good too.
I wrote, the one on RCA in The tests of code coverage did not pass because of the fact that the method After a grep on all the
For the moment, I opted for deleting it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. There is a problem in one of the tests. Otherwise LGTM. Indeed adjacency_matrix
is not used anywhere anymore since we rewrote the API. I do not think we will ever need it again so it is safe to delete it.
The test on RCA is a natural one given the change made in this PR: it checks that unlabeled points are ignored. Note also that the test would fail in the previous version (since RCA would crash in the presence of unlabeled points). Generally when fixing a bug one should always write a test that would fail before to make sure the bug is not introduced again later without notice. The test to check that no unlabeled points are assigned to any chunks is related to the fact that the change in this PR keeps entries corresponding to unlabeled points in the chunk array. Therefore it makes sense to check that there is no indexing issue, i.e. no such point is actually assigned to a chunk. |
I spent some thinking to understand why I always have a covariance matrix that is composed of
I think both points should be addressed with proper errors. Should I raise an issue, or try and address those in this PR ? |
Good observations. Note that we already have a warning which is raised in this case, advising the user to first reduce the dimension of the data:
Regarding how to improve the errors:
|
For the first point, we could simply adapt the current error:
|
I did all the expected modifications and added the remark with the bound on the rank. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
The discussions in pull request #254 were about the generations of chunks in
constraints.py
not providing a map of the indice of the element to the chunk in the presence of unknown labels.See the example below:
This pull request solves that, which resolves #260 about RCA crashing when used with unknown labels. Since unknown labels seem to be handled in
rca.py
with a variable namedchunk_mask
I expectrca.py
to work with unknown labels now.I had to save the partial labels in
constraints.py
because I need to locate the locations of the unknown labels (label < 0) inpartial_labels
. I also corrected the remains ofunittest
intest_constraints.py
.