
DecisionTreeClassifier unknown label type: 'continuous-multioutput' #7801

Closed
KamodaP opened this issue Oct 31, 2016 · 20 comments

@KamodaP

KamodaP commented Oct 31, 2016

Description

DecisionTreeClassifier crashes with "Unknown label type: 'continuous-multioutput'". I've tried loading the CSV file using csv.reader, pandas.read_csv, and other approaches such as parsing it line by line.

Steps/Code to Reproduce

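import os
import pandas as pd
from sklearn import tree
_PATH = 'data'  # hypothetical: directory holding features.txt and target.txt
feature_df = pd.read_csv(os.path.join(_PATH, 'features.txt'))
target_df = pd.read_csv(os.path.join(_PATH, 'target.txt'))
# Keep only numeric columns (_get_numeric_data is a private pandas method)
feature_df = feature_df._get_numeric_data()
target_df = target_df._get_numeric_data()
feature_df = feature_df.fillna(0)
target_df = target_df.fillna(0)
clf = tree.DecisionTreeClassifier()
# Raises ValueError: Unknown label type: 'continuous-multioutput'
clf_o = clf.fit(feature_df, target_df)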

Attached data files: features.txt, target.txt

Expected Results

The error message should tell the user what is REALLY wrong, e.g. that the data set does not follow the estimator's assumptions (and what those assumptions are).

Actual Results

Traceback (most recent call last):
  File "D:\Piotr\Documents\uni\bap\BAPFingerprintLocalisation\main.py", line 19, in <module>
    decision_tree.treeClassification()
  File "D:\Piotr\Documents\uni\bap\BAPFingerprintLocalisation\code\decision_tree.py", line 56, in treeClassification
    clf_o = clf.fit(feature_df, target_df)
  File "C:\Python35\lib\site-packages\sklearn\tree\tree.py", line 182, in fit
    check_classification_targets(y)
  File "C:\Python35\lib\site-packages\sklearn\utils\multiclass.py", line 172, in check_classification_targets
    raise ValueError("Unknown label type: %r" % y_type)
ValueError: Unknown label type: 'continuous-multioutput'

Versions

Windows-10-10.0.14393-SP0
Python 3.5.1 (v3.5.1:37a07cee5969, Dec  6 2015, 01:54:25) [MSC v.1900 64 bit (AMD64)]
NumPy 1.11.0
SciPy 0.17.1
Scikit-Learn 0.18

Update:

I've changed the number of target variables to one, just to simplify things:

clf_o = clf.fit(feature_df, target_df.iloc[:, 1])  # second column by position (originally target_df.ix[:,1]; .ix was removed in pandas 1.0)

Output: Unknown label type: 'continuous'

@jnothman
Member

You should be using DecisionTreeRegressor
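A minimal sketch of that suggestion, using hypothetical synthetic data rather than the reporter's files: a regressor accepts float targets, including several target columns at once, which is exactly what type_of_target reports as 'continuous-multioutput'.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.utils.multiclass import type_of_target

# Hypothetical synthetic data: 100 samples, 3 features, 2 continuous targets
rng = np.random.RandomState(0)
X = rng.rand(100, 3)
y = np.column_stack([X[:, 0] * 2.5, X[:, 1] - 0.3])
print(type_of_target(y))  # continuous-multioutput: rejected by classifiers

# Regressors accept continuous (and multi-output) targets
reg = DecisionTreeRegressor(random_state=0)
reg.fit(X, y)
pred = reg.predict(X[:5])
print(pred.shape)  # (5, 2): one prediction per target column
```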

@KamodaP
Author

KamodaP commented Nov 1, 2016

Again, the documentation lacks information on how many classes classification can handle. I can see that my data set has way too many classes, but the error message mentioned 'labels', which was confusing enough that I forgot what the data set actually looks like and started meddling with the ways of passing the data in.
I've updated the issue and ask you to reopen it.

@jnothman
Member

jnothman commented Nov 1, 2016

Classification targets should be represented as integers or as strings. You can ask Pandas to read the target data in as a string and you'll be fine.
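A minimal sketch of reading the target as strings, assuming hypothetical in-memory CSV contents in place of features.txt and target.txt (the column names f1, f2 and room are invented). The same values are 'continuous' when parsed as floats but 'multiclass' when read with dtype=str.

```python
import io
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.multiclass import type_of_target

# Hypothetical CSV contents standing in for features.txt / target.txt
features_csv = io.StringIO("f1,f2\n0.1,1.0\n0.4,0.2\n0.9,0.7\n0.3,0.5\n")
target_csv = io.StringIO("room\n1.5\n2.5\n1.5\n3.5\n")

X = pd.read_csv(features_csv)
y_float = pd.read_csv(target_csv)["room"]  # parsed as float64
print(type_of_target(y_float))  # continuous

target_csv.seek(0)  # rewind and read the same values as strings
y_str = pd.read_csv(target_csv, dtype=str)["room"]
print(type_of_target(y_str))  # multiclass

clf = DecisionTreeClassifier(random_state=0).fit(X, y_str)
print(clf.classes_)  # the string labels '1.5', '2.5', '3.5'
```

Note that every distinct value becomes its own class this way, so this only makes sense when the values really are category labels.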

@jnothman
Member

jnothman commented Nov 1, 2016

Or use a DecisionTreeRegressor

@KamodaP
Author

KamodaP commented Nov 1, 2016

That's not my problem

@KamodaP
Author

KamodaP commented Nov 1, 2016

See the 'Expected Results' section of my issue.

@jnothman
Member

jnothman commented Nov 1, 2016

You're right that the error message could be more useful, but the documentation for fit does say "class labels in classification". Feel free to submit a clearer issue about needing to document the expected data type for classification ys, and another for raising appropriate error messages when float data is passed as y to a classifier.
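To illustrate the kind of clearer error message being discussed, here is a minimal sketch of a hypothetical wrapper around scikit-learn's type_of_target. The function name check_classifier_target and the message wording are invented for illustration; this is not scikit-learn's actual check. The accepted types mirror the list in check_classification_targets shown in a traceback later in this thread.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Target types accepted for classification (as listed in check_classification_targets)
CLASSIFICATION_TYPES = {"binary", "multiclass", "multiclass-multioutput",
                        "multilabel-indicator", "multilabel-sequences"}


def check_classifier_target(y):
    """Hypothetical helper: explain *why* y is not usable as class labels."""
    y_type = type_of_target(y)
    if y_type in CLASSIFICATION_TYPES:
        return y_type
    if y_type.startswith("continuous"):
        raise ValueError(
            f"Target y has type {y_type!r}: it contains non-integer floats, "
            "which a classifier cannot treat as class labels. Use a regressor "
            "(e.g. DecisionTreeRegressor) or convert y to integers or strings."
        )
    raise ValueError(f"Unsupported target type for classification: {y_type!r}")


try:
    check_classifier_target(np.array([0.5, 1.5, 2.5]))
except ValueError as exc:
    print(exc)
```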

@KamodaP
Author

KamodaP commented Nov 1, 2016

Let me quote the entire documentation of the y parameter of DecisionTreeClassifier.fit:

The target values (class labels in classification, real numbers in regression). In the regression case, use dtype=np.float64 and order='C' for maximum efficiency.

That does not say that the number of classes has a cap. What makes a target variable labeled continuous? How many classes does it take for a target variable to be considered a regression target? If the docstring mentions regression, can I do regression with DecisionTreeClassifier? Why not? Etc.
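One way to see what makes a target "continuous" is to ask scikit-learn's type_of_target directly. In this sketch (toy arrays, not the reporter's data) the decision depends on the values and dtype, not on how many distinct classes there are: integer-valued data stays 'multiclass' even with 1000 distinct values, while a single non-integral float makes the whole target 'continuous'.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Integer-valued targets are class labels, however many distinct values
print(type_of_target(np.arange(1000)))            # multiclass
print(type_of_target(np.array([1.0, 2.0, 3.0])))  # multiclass: floats, but integral
print(type_of_target(np.array([1.0, 2.0])))       # binary

# A single non-integral float makes the target continuous
print(type_of_target(np.array([1.0, 2.5, 3.0])))  # continuous
print(type_of_target(np.array([[0.5, 1.0], [1.5, 2.0]])))  # continuous-multioutput

# Strings are treated as class labels
print(type_of_target(np.array(["a", "b", "c"])))  # multiclass
```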

As for your previous comment:

Classification targets should be represented as integers or as strings. You can ask Pandas to read the target data in as a string and you'll be fine.

Does that mean that classes can't be represented as floats? Or as dicts? Lists? Tuples? Longs? Doubles? Bytes? I know it is logical to represent classes as integers or strings, since there should not be many of them. But do they have to be? What are the limitations?

And as for creating a new ticket, isn't that pointless, since we've already had quite a discussion here? Creating a new ticket just to explain the same thing to someone else?

@jnothman
Member

jnothman commented Nov 1, 2016

It's not the number of classes. It's the use of non-integers and non-strings.

I like issue descriptions to be focused. Your concern as raised here seemed to be more of a usage problem.

And please don't hassle me about what I suggest. This isn't the only issue I'm dealing with.


@KamodaP
Copy link
Author

KamodaP commented Nov 1, 2016

You don't have to solve it today; I'm only trying to make the problem of unhelpful error messages and documentation for the tree classifier and regressor an active task for future releases.

@amueller
Copy link
Member

amueller commented Nov 1, 2016

For the error message, would "Unsupported output type: 'continuous-multioutput'" be better? That is the real issue. Also see #7809 for the docstring.

@KamodaP
Copy link
Author

KamodaP commented Nov 2, 2016

That's better. But I still don't understand why you won't call it what it is. The literature mostly calls these 'target' variables, and 'output' could be mistaken for a function's output. The exception was thrown from the function 'check_classification_targets', so even you call it a 'target' variable, yet you want to call it 'label' or 'output'. I'm not a scikit-learn member, so you will do as you please, but I would recommend using the words 'target variable' in the docstring and the error message. And I ask you to describe somewhere the rules that the input data (or target) should follow. A short sentence would do: 'Target variable (parameter y) has to be int or str'.

@jnothman
Copy link
Member

jnothman commented Nov 2, 2016

Maybe it's worth mentioning in/alongside the new section (45cb11d / #7519) on multiclass and multilabel fitting in the tutorial. Or maybe this all belongs in a section of the user guide on data representation conventions, describing input/output formats for all standard methods...?


@KamodaP
Author

KamodaP commented Nov 2, 2016

I've scanned the document, and it seems a good place to mention those conventions. Also, if you don't want to clutter the error messages too much, putting that information in the user guide is a good idea as well.
Well, the final solution (if any) will be as you wish; I'm just saying that the idea seems OK, but you have your conventions. I won't make you do anything.

@amueller
Copy link
Member

'Target variable (parameter y) has to be int or str' is not right, because we support multi-label and multi-output (multi-target) targets.
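A small sketch of what those two cases look like, using hypothetical toy arrays: neither target is a single column of ints or strings, yet type_of_target classifies both as classification targets and DecisionTreeClassifier fits both.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.multiclass import type_of_target

# Hypothetical toy data: 4 samples with one feature each
X = np.array([[0.0], [1.0], [2.0], [3.0]])

# Multi-label: a 0/1 indicator matrix, one column per label
y_multilabel = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]])
print(type_of_target(y_multilabel))   # multilabel-indicator

# Multi-output (multi-target): one column of class labels per output
y_multioutput = np.array([[0, 2], [1, 0], [2, 1], [0, 2]])
print(type_of_target(y_multioutput))  # multiclass-multioutput

# Both are accepted by DecisionTreeClassifier
ml_clf = DecisionTreeClassifier(random_state=0).fit(X, y_multilabel)
mo_clf = DecisionTreeClassifier(random_state=0).fit(X, y_multioutput)
print(ml_clf.predict(X).shape)        # (4, 3)
print(mo_clf.predict(X).shape)        # (4, 2)
```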

@amueller
Copy link
Member

Also, arbitrary objects that are not floats are supported as class labels; they don't have to be integers or strings.

@rajeshjnv
Copy link

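If we pass training_data_X and training_scores_Y as input to the fit method, it causes this error. To avoid it, we can convert the scores by encoding them as labels:

import numpy as np
from sklearn import preprocessing
from sklearn.utils.multiclass import type_of_target
# Hypothetical continuous scores standing in for training_scores_Y
y_train = np.array([0.5, 1.7, 0.5, 3.2])
print(type_of_target(y_train))                # continuous: rejected by classifiers
print(type_of_target(y_train.astype('int')))  # truncating to int gives multiclass
lab_enc = preprocessing.LabelEncoder()
# Each distinct score becomes its own class index (here [0 1 0 2])
y_encoded = lab_enc.fit_transform(y_train)
print(y_encoded)
print(type_of_target(y_encoded))              # multiclass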

@alexrindone
Copy link

I'm having the same issue; is there a fix for it?

@satish-bot
Copy link

ValueError Traceback (most recent call last)
in
1 Dec_tree_class = DecisionTreeClassifier()
----> 2 Dec_tree_class.fit(X_train,y_train)

~\Anaconda3\lib\site-packages\sklearn\tree\tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
814 sample_weight=sample_weight,
815 check_input=check_input,
--> 816 X_idx_sorted=X_idx_sorted)
817 return self
818

~\Anaconda3\lib\site-packages\sklearn\tree\tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
152
153 if is_classification:
--> 154 check_classification_targets(y)
155 y = np.copy(y)
156

~\Anaconda3\lib\site-packages\sklearn\utils\multiclass.py in check_classification_targets(y)
167 if y_type not in ['binary', 'multiclass', 'multiclass-multioutput',
168 'multilabel-indicator', 'multilabel-sequences']:
--> 169 raise ValueError("Unknown label type: %r" % y_type)
170
171

ValueError: Unknown label type: 'continuous'

@vaitybharati
Copy link

vaitybharati commented Mar 31, 2021

@alexrindone @satish-bot you may refer to https://stackoverflow.com/questions/56380097/cant-understand-this-error-unknown-label-type-continuous-multioutput/66890653#66890653. It describes one solution to this problem. Check if it helps.
