Break chunks generation in RCA when not enough possible chunks, fixes issue #200 #254

RobinVogel · 2019-10-29T10:43:51Z

I count for each set of instances of each label how many chunks it holds, and sum those.
If it is lower than the number of requested chunks, I raise an error.

bellet · 2019-10-29T17:14:27Z

Thanks for getting started on this. The solution looks fine but you need to write some tests to ensure that everything works as intended. You should test that the error is raised when we expect it, but it would also be useful to make sure everything goes through in cases when there is enough points. Generally it is good to test edge cases (e.g., just enough points, one point missing, etc).

bellet · 2019-10-29T17:36:13Z

Regarding where these tests should be put, I think they could either go to the class Test_RCA in metric_learn_test.py (as RCA_Supervised is currently the only algorithm using chunks), or to a new file test_constraints.py.

I think the second option is better for two reasons:

You can test the behavior of the method chunks directly, which will cover all other algorithms using chunks if we add any in the future
test_constraints.py would be the place where we can add more tests in the future for the methods in constraints.py (currently I think we only test the other methods indirectly through calls to supervised variants of specific algorithms, which is not ideal)

metric_learn/constraints.py

RobinVogel · 2019-10-30T18:05:42Z

I took some time to get familiar with tests and wrote one for the two untested cases in the chunk generation that you discussed. I tried to model what I wrote on the existing tests, and wrote something relatively general, at the risk of being verbose.

I separated the generation of labels from the tests and wrote it myself, since it needs to satisfy some constraints for edge cases.

I hesitated to group the tests in a class, as in test_sklearn_compat.py, since test_constraints.py is supposed to test the class Constraints.

bellet

Nice! Indeed you could put the tests into a class, maybe simply TestChunks or something

bellet · 2019-10-31T09:24:30Z

test/test_constraints.py

+  constraints = Constraints(labels)
+  chunks = constraints.chunks(num_chunks=num_chunks, chunk_size=chunk_size,
+                              random_state=SEED)
+  return chunks


Instead of simply returning chunks, it would be nice to test that chunks contains num_chunks chunks, each composed of chunk_size points

I will add it.

A remark, I expected chunks to map the datapoint index (input order) to the chunk number, but chunks removes -1 instances (unknown labels, according to line 21), which means that one has to consider datapoint indexes when removing -1's. But it must be taken into account in the implementation of RCA.

Can you elaborate on this? Not sure I understand what you mean.

Do we test at all RCA_Supervised in the presence of unlabeled points? At first sight, it looks like it might fail due to unlabeled points being removed from the chunk array as pointed out by @RobinVogel. @wdevazelhes, any comment?

I'll just add an example to make it clearer. Below is a comparison in the presence of unknown labels between the output that I expected from Constraints.chunks, i.e. expected_chunk and the one that I obtain from Constraints.chunks, i.e. chunks:

In [1]: from metric_learn.constraints import Constraints In [2]: partial_labels = [1, 2, 2, 1, -1, 3, 3] In [3]: cons = Constraints(partial_labels) In [4]: cons.chunks(num_chunks=3, chunk_size=2) Out[4]: array([0, 1, 1, 0, 2, 2]) In [5]: chunks = cons.chunks(num_chunks=3, chunk_size=2) In [6]: len(chunks), len(partial_labels) Out[6]: (6, 7) In [7]: expected_chunk = [0, 1, 1, 0, -1, 2, 2]

The output is not a map such that expected_chunk[i] is the chunk of element i, which I expected, but might not matter in RCA.

Thanks. And this is indeed a problem for RCA_Supervised, see the issue #260 that I just added. I have also added another issue (#261) to make clear in the doc that -1 is interpreted as unlabeled. You're very welcome to address those @RobinVogel while you have everything well in mind :-)

metric_learn/constraints.py

test/test_constraints.py

bellet

Thanks @RobinVogel ! I think this will be good to merge when the two small points above are fixed

test/test_constraints.py

bellet

Thanks, merging

wdevazelhes · 2019-11-13T15:53:15Z

This looks great !

fixes issue 200

c36d998

RobinVogel changed the title ~~fixes issue 200~~ Break chunks generation in RCA when not enough possible chunks, fixes issue #200 Oct 29, 2019

maj

a286d5c

bellet reviewed Oct 29, 2019

View reviewed changes

metric_learn/constraints.py Outdated Show resolved Hide resolved

RobinVogel added 2 commits October 30, 2019 13:45

add max_chunks in error message

93fbe80

tests the building of chunks in constraints.py

0d8a521

RobinVogel added 2 commits October 30, 2019 19:37

corrected faulty generation

d663608

still small mistake at generation

053da02

bellet reviewed Oct 31, 2019

View reviewed changes

encapsulate tests, modified message

4f8c247

wdevazelhes reviewed Nov 4, 2019

View reviewed changes

test/test_constraints.py Outdated Show resolved Hide resolved

bellet reviewed Nov 13, 2019

View reviewed changes

test/test_constraints.py Outdated Show resolved Hide resolved

Testing chunk generation in constraints

8a6d8bc

bellet approved these changes Nov 13, 2019

View reviewed changes

bellet merged commit 7700529 into scikit-learn-contrib:master Nov 13, 2019

RobinVogel deleted the issue_200 branch November 13, 2019 21:01

RobinVogel mentioned this pull request Nov 14, 2019

Repairs chunk generation for unknown labels, solves issue #260 #263

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Break chunks generation in RCA when not enough possible chunks, fixes issue #200 #254

Break chunks generation in RCA when not enough possible chunks, fixes issue #200 #254

RobinVogel commented Oct 29, 2019 •

edited by bellet

Loading

bellet commented Oct 29, 2019 •

edited

Loading

bellet commented Oct 29, 2019 •

edited

Loading

RobinVogel commented Oct 30, 2019

bellet left a comment

bellet Oct 31, 2019

RobinVogel Oct 31, 2019

bellet Nov 13, 2019 •

edited

Loading

RobinVogel Nov 13, 2019 •

edited

Loading

bellet Nov 13, 2019

bellet left a comment

bellet left a comment

wdevazelhes commented Nov 13, 2019

Break chunks generation in RCA when not enough possible chunks, fixes issue #200 #254

Break chunks generation in RCA when not enough possible chunks, fixes issue #200 #254

Conversation

RobinVogel commented Oct 29, 2019 • edited by bellet Loading

bellet commented Oct 29, 2019 • edited Loading

bellet commented Oct 29, 2019 • edited Loading

RobinVogel commented Oct 30, 2019

bellet left a comment

Choose a reason for hiding this comment

bellet Oct 31, 2019

Choose a reason for hiding this comment

RobinVogel Oct 31, 2019

Choose a reason for hiding this comment

bellet Nov 13, 2019 • edited Loading

Choose a reason for hiding this comment

RobinVogel Nov 13, 2019 • edited Loading

Choose a reason for hiding this comment

bellet Nov 13, 2019

Choose a reason for hiding this comment

bellet left a comment

Choose a reason for hiding this comment

bellet left a comment

Choose a reason for hiding this comment

wdevazelhes commented Nov 13, 2019

RobinVogel commented Oct 29, 2019 •

edited by bellet

Loading

bellet commented Oct 29, 2019 •

edited

Loading

bellet commented Oct 29, 2019 •

edited

Loading

bellet Nov 13, 2019 •

edited

Loading

RobinVogel Nov 13, 2019 •

edited

Loading