
[Bug] RCA_Supervised crashes when fit on dataset with unlabeled points #260


Closed
bellet opened this issue Nov 13, 2019 · 0 comments · Fixed by #263


bellet commented Nov 13, 2019

Description

While the Constraints class supports unlabeled points as input (labeled -1), its chunks method removes the unlabeled points from the returned chunks array. That array therefore has a different length than the dataset X passed to the fit method of RCA_Supervised.
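The mismatch can be illustrated without metric-learn. The sketch below is a hypothetical simplification, not the library's code: it computes chunk ids over the labeled points only (here simply one chunk per class), so the unlabeled point gets no entry and 5 samples end up with 4 chunk ids, the same [5, 4] pair reported by scikit-learn's length check in the traceback.

```python
import numpy as np

X = np.random.rand(5, 2)
y = np.array([1, 1, -1, 2, 2])

# Hypothetical simplification of the buggy behaviour (not metric-learn code):
# chunk ids are computed over the labeled points only, so the unlabeled
# point (label -1) has no entry in `chunks`.
labeled_y = y[y != -1]
chunks = np.unique(labeled_y, return_inverse=True)[1]  # one chunk per class

print(len(X), len(chunks))  # 5 4
```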

I think the most natural and simple solution is to keep the unlabeled points in the chunks array with value -1, which RCA already interprets as "not belonging to any chunk".
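A minimal sketch of the proposed behaviour, assuming a hypothetical helper `chunks_keep_unlabeled` (metric-learn's actual `Constraints.chunks` method and its parameters may differ): it returns one chunk id per input point, so its length always matches X, and every point that is unlabeled or not drawn into any chunk keeps the value -1.

```python
import numpy as np


def chunks_keep_unlabeled(y, num_chunks, chunk_size=2, random_state=None):
    """Hypothetical helper (not metric-learn's API).

    Returns an array with one chunk id per point in y. Points that end up in
    no chunk, including every unlabeled point (label -1), get the value -1.
    """
    y = np.asarray(y)
    rng = np.random.RandomState(random_state)
    chunks = -np.ones(len(y), dtype=int)  # -1 = "not in any chunk"

    # Shuffled indices of the not-yet-used points of each labeled class.
    remaining = {c: list(rng.permutation(np.flatnonzero(y == c)))
                 for c in np.unique(y) if c != -1}

    for chunk_id in range(num_chunks):
        # Only classes with enough unused points can supply a full chunk.
        eligible = [c for c, idx in remaining.items() if len(idx) >= chunk_size]
        if not eligible:
            raise ValueError("not enough labeled points to form %d chunks"
                             % num_chunks)
        c = eligible[rng.randint(len(eligible))]
        members = remaining[c][:chunk_size]
        remaining[c] = remaining[c][chunk_size:]
        chunks[members] = chunk_id
    return chunks


y = [1, 1, -1, 2, 2]
chunks = chunks_keep_unlabeled(y, num_chunks=2, random_state=0)
print(chunks)  # same length as y; the unlabeled point keeps -1
```

Because the unlabeled point stays in the array as -1 instead of being dropped, `len(chunks) == len(X)` holds and the length check in fit no longer fails.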

Steps/Code to Reproduce

from metric_learn import RCA_Supervised
import numpy as np

X = np.random.rand(5, 2)
y = [1, 1, -1, 2, 2]

rca = RCA_Supervised(num_chunks=2)
rca.fit(X, y)

Expected Results

Fit without error

Actual Results

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/aurelien/Documents/research/github/metric-learn/metric_learn/rca.py", line 244, in fit
    return RCA.fit(self, X, chunks)
  File "/home/aurelien/Documents/research/github/metric-learn/metric_learn/rca.py", line 132, in fit
    X, chunks = self._prepare_inputs(X, chunks, ensure_min_samples=2)
  File "/home/aurelien/Documents/research/github/metric-learn/metric_learn/base_metric.py", line 101, in _prepare_inputs
    **kwargs)
  File "/home/aurelien/Documents/research/github/metric-learn/metric_learn/_util.py", line 131, in check_input
    y_numeric=y_numeric)
  File "/home/aurelien/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py", line 729, in check_X_y
    check_consistent_length(X, y)
  File "/home/aurelien/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py", line 205, in check_consistent_length
    " samples: %r" % [int(l) for l in lengths])
ValueError: Found input variables with inconsistent numbers of samples: [5, 4]

Versions

bellet added this to the v0.6.0 milestone Dec 13, 2019
bellet pushed a commit that referenced this issue Jan 3, 2020
* chunks return a map of index to chunk

* maj (update)

* maj (update)

* remove storing of known labels

* typo

* no self.num_points

* tests for unlabeled, repairs chunk generation

* maj (update)

* testing diff features

* corrected test

* diff warning

* maj (update)

* added parameter bound test