DecisionTreeClassifier unknown label type: 'continuous-multioutput' #7801

KamodaP · 2016-10-31T17:30:17Z

Description

DecisionTreeClassifier crashes with unknown label type: 'continuous-multioutput'. I've tried loading the CSV file using csv.reader, pandas.read_csv, and other approaches such as parsing it line by line.

Steps/Code to Reproduce

import os
import pandas as pd
from sklearn import tree

# _PATH is the directory that contains features.txt and target.txt
feature_df = pd.read_csv(os.path.join(_PATH, 'features.txt'))
target_df = pd.read_csv(os.path.join(_PATH, 'target.txt'))
# keep only the numeric columns and replace missing values with 0
feature_df = feature_df._get_numeric_data()
target_df = target_df._get_numeric_data()
feature_df = feature_df.fillna(0)
target_df = target_df.fillna(0)
clf = tree.DecisionTreeClassifier()
clf_o = clf.fit(feature_df, target_df)

features.txt
target.txt

Expected Results

The error thrown should tell the user what is REALLY wrong, e.g. that the data set does not follow the estimator's assumptions (and what those assumptions are).

Actual Results

Traceback (most recent call last):
  File "D:\Piotr\Documents\uni\bap\BAPFingerprintLocalisation\main.py", line 19, in <module>
    decision_tree.treeClassification()
  File "D:\Piotr\Documents\uni\bap\BAPFingerprintLocalisation\code\decision_tree.py", line 56, in treeClassification
    clf_o = clf.fit(feature_df, target_df)
  File "C:\Python35\lib\site-packages\sklearn\tree\tree.py", line 182, in fit
    check_classification_targets(y)
  File "C:\Python35\lib\site-packages\sklearn\utils\multiclass.py", line 172, in check_classification_targets
    raise ValueError("Unknown label type: %r" % y_type)
ValueError: Unknown label type: 'continuous-multioutput'

Versions

Windows-10-10.0.14393-SP0
Python 3.5.1 (v3.5.1:37a07cee5969, Dec  6 2015, 01:54:25) [MSC v.1900 64 bit (AMD64)]
NumPy 1.11.0
SciPy 0.17.1
Scikit-Learn 0.18

Update:

I've reduced the number of target variables to one, just to simplify things:

clf_o = clf.fit(feature_df, target_df.ix[:,1])

Output: Unknown label type: 'continuous'

jnothman · 2016-10-31T21:29:08Z

You should be using DecisionTreeRegressor
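
As a minimal sketch of this suggestion: the snippet below uses synthetic data in place of the attached features.txt and target.txt (the shapes and random values are assumptions, not the reporter's data). It shows that real-valued targets with several columns are what scikit-learn calls 'continuous-multioutput', and that a regressor accepts them directly.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.utils.multiclass import type_of_target

# Synthetic stand-in for the reporter's files: 20 samples, 3 features,
# and 2 real-valued target columns.
rng = np.random.RandomState(0)
X = rng.rand(20, 3)
y = rng.rand(20, 2)

# Real-valued targets with more than one column are reported as
# 'continuous-multioutput', the type named in the error message.
target_type = type_of_target(y)
print(target_type)

# A regressor is designed for such targets and fits without complaint.
reg = DecisionTreeRegressor(random_state=0).fit(X, y)
pred = reg.predict(X[:5])
print(pred.shape)  # one row per sample, one column per target
```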

KamodaP · 2016-11-01T06:35:42Z

Again, the documentation lacks information on how many classes classification can handle. I can see that my dataset has waaaay too many classes, but your error message mentioned something like 'labels', which was confusing enough to make me forget what the dataset actually looks like and start meddling with ways of passing datasets.
I've updated the issue and ask you to reopen it.

jnothman · 2016-11-01T10:36:54Z

Classification targets should be represented as integers or as strings. You can ask Pandas to read the target data in as a string and you'll be fine.
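
A rough illustration of reading the target as strings, assuming a hypothetical one-column file with a header named room (the column name and values are made up for this sketch). With dtype=str, pandas keeps each value as text, so the classifier treats '1.5' and '2.5' as two class labels instead of real numbers.

```python
import io

import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.multiclass import type_of_target

# Hypothetical stand-in for a single-column target file.
target_csv = io.StringIO("room\n1.5\n2.5\n1.5\n2.5\n")

# dtype=str stops pandas from parsing the column as floats.
# tolist() gives a plain Python list of str values.
y = pd.read_csv(target_csv, dtype=str)["room"].tolist()
X = [[0.0], [1.0], [0.1], [0.9]]

# String values are treated as discrete class labels.
label_type = type_of_target(y)

clf = DecisionTreeClassifier(random_state=0).fit(X, y)
classes = list(clf.classes_)
print(label_type, classes)
```

Note that this only makes sense when the values really are category names; if they are measurements, a regressor is the better fit.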

jnothman · 2016-11-01T10:37:10Z

Or use a DecisionTreeRegressor

KamodaP · 2016-11-01T12:26:07Z

That's not my problem

KamodaP · 2016-11-01T12:26:44Z

See 'Expected Results' section of my issue

jnothman · 2016-11-01T12:35:50Z

You're right that the error message could be more useful, but the documentation for fit does say "class labels in classification". Feel free to submit a clearer issue about needing to document the expected data type for classification ys, and another for raising appropriate error messages when float data is passed as y to a classifier.

KamodaP · 2016-11-01T13:24:09Z

Let me cite the whole section of documentation documenting parameter y of function fit in class DecisionTreeClassifier

The target values (class labels in classification, real numbers in regression). In the regression case, use dtype=np.float64 and order='C' for maximum efficiency.

That does not say that classes have a cap. What makes a target variable get labeled continuous? How many classes have to be there for it to be considered a regression-type target variable? If it says something about regression, then can I do regression using DecisionTreeClassifier? Why not? Etc...

As for your previous comment:

Classification targets should be represented as integers or as strings. You can ask Pandas to read the target data in as a string and you'll be fine.

Does that mean that classes can't be represented as floats? Or as dicts? Lists? Tuples? Longs? Doubles? Bytes? I know it is logical to represent classes as integers or strings, since there should not be many of them. But do they have to be? What are the limitations?

And as for creating a new ticket, isn't that useless since we've had quite a talk in here? Creating a new ticket just to explain the same thing to another person?

jnothman · 2016-11-01T13:27:02Z

It's not the number of classes. It's the use of non-integers and non-strings.

I like the issue descriptions to be focused. Your concern as raised here seemed to be more of a usage problem.

And please don't hassle me about what I suggest. This isn't the only issue I'm dealing with.
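
The point about class count versus value type can be checked with type_of_target, the function behind the error message. The arrays below are made-up examples for this sketch; the variable names are not from the thread.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# 500 distinct integer labels, each appearing four times.
many_int_classes = np.tile(np.arange(500), 4)
# Floats whose values are all whole numbers.
whole_floats = np.array([1.0, 2.0, 3.0])
# Floats with at least one fractional value.
fractional = np.array([1.0, 2.5, 3.0])
# Plain string labels.
strings = np.array(["a", "b", "c"])

kinds = {
    "many_int_classes": type_of_target(many_int_classes),
    "whole_floats": type_of_target(whole_floats),
    "fractional": type_of_target(fractional),
    "strings": type_of_target(strings),
}
for name, kind in kinds.items():
    print(name, "->", kind)
```

Only the array with a fractional value is classified as 'continuous'; the number of distinct labels plays no part in that decision.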

KamodaP · 2016-11-01T13:32:16Z

You don't have to solve it today. I'm only trying to get the issue of bad error descriptions and bad documentation for the tree classifier and regressor tracked as a task for future releases.

amueller · 2016-11-01T19:19:04Z

For the error message would "Unsupported output type: 'continuous-multioutput'" be better? That is the real issue. Also see #7809 for the docstring.
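
To make the proposal concrete, here is a sketch of a user-side check that raises a more explicit message before calling fit. The helper check_discrete_target and its wording are hypothetical and not part of scikit-learn; only type_of_target is a real library function.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target


def check_discrete_target(y):
    """Hypothetical helper: reject real-valued targets with an explicit hint."""
    y_type = type_of_target(y)
    if y_type.startswith("continuous"):
        raise ValueError(
            "Unsupported target type: %r. Classifiers expect discrete class "
            "labels in y; for real-valued targets use a regressor." % y_type
        )
    return y_type


# Discrete labels pass through and the detected type is returned.
print(check_discrete_target([0, 1, 1]))

# Real-valued, two-column targets trigger the clearer message.
try:
    check_discrete_target(np.array([[0.1, 0.2], [0.3, 0.4]]))
except ValueError as exc:
    print(exc)
```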

KamodaP · 2016-11-02T09:56:57Z

That's better. But I still don't understand why you won't call it what it is. The literature mostly calls these 'target' variables, and 'output' could be mistaken for a function's output. The exception was thrown from the function 'check_classification_targets', so even your own code calls it a 'target' variable, yet you still want to call it 'label' or 'output'. I'm not a member of the scikit-learn team, so you will do as you please, but I would recommend using the words 'Target variable' in the docstring and the error message. I also ask you to describe somewhere the rules that the input data (or target) should follow. A short sentence would do: 'Target variable (parameter y) has to be int or str'.

jnothman · 2016-11-02T10:38:20Z

Maybe it's worth mentioning in/alongside the new section (45cb11d / #7519) on multiclass and multilabel fitting in the tutorial. Or maybe this all belongs in a section of the user guide on data representation conventions, describing input/output formats for all standard methods...?

KamodaP · 2016-11-02T10:53:52Z

I've scanned the document and it seems a good place to mention those conventions. Also, if you don't want to clutter the error messages too much, then the idea of putting that information in the user guide isn't bad either.
Well, the final solution (if any) will be as you wish it to be. I'm just saying that the idea seems OK, but you have your conventions. I won't make you do something.

amueller · 2016-11-17T17:22:59Z

'Target variable (parameter y) has to be int or str' is not right, because we support multi-label and multi-output multi-target.

amueller · 2016-11-17T17:23:32Z

Also, arbitrary objects that are not floats are supported as class labels, they don't have to be integers or strings.

rajeshjnv · 2019-06-03T19:14:51Z

If we pass training_data_X and training_scores_Y to the fit method, it causes an error. To avoid it, we convert and encode the labels:

from sklearn import preprocessing
from sklearn.utils.multiclass import type_of_target

# y_train is assumed to already hold the continuous training scores
lab_enc = preprocessing.LabelEncoder()
y_train = lab_enc.fit_transform(y_train)
print(y_train)
print(type_of_target(y_train))
print(type_of_target(y_train.astype('int')))
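
To show what this workaround actually does, here is a small self-contained sketch with made-up scores (y_scores and the other names are assumptions for illustration). LabelEncoder maps every distinct float to its own integer class, so the error goes away, but each unique score becomes a separate category.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.utils.multiclass import type_of_target

# Made-up continuous scores; 0.5 appears twice.
y_scores = np.array([0.5, 1.25, 0.5, 3.0])
before = type_of_target(y_scores)

# Each distinct value gets an integer code in sorted order.
enc = LabelEncoder()
y_codes = enc.fit_transform(y_scores)
after = type_of_target(y_codes)

# The original scores can be recovered from the codes.
restored = enc.inverse_transform(y_codes)

print(before, "->", after)
print(y_codes, restored)
```

With many distinct scores this produces one class per value, which is rarely what is intended; a regressor keeps the ordering and distance between scores.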

alexrindone · 2019-09-08T03:52:42Z

I'm having this same issue, is there a fix for it?

satish-bot · 2021-01-19T05:17:48Z

ValueError Traceback (most recent call last)
in
1 Dec_tree_class = DecisionTreeClassifier()
----> 2 Dec_tree_class.fit(X_train,y_train)

~\Anaconda3\lib\site-packages\sklearn\tree\tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
814 sample_weight=sample_weight,
815 check_input=check_input,
--> 816 X_idx_sorted=X_idx_sorted)
817 return self
818

~\Anaconda3\lib\site-packages\sklearn\tree\tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
152
153 if is_classification:
--> 154 check_classification_targets(y)
155 y = np.copy(y)
156

~\Anaconda3\lib\site-packages\sklearn\utils\multiclass.py in check_classification_targets(y)
167 if y_type not in ['binary', 'multiclass', 'multiclass-multioutput',
168 'multilabel-indicator', 'multilabel-sequences']:
--> 169 raise ValueError("Unknown label type: %r" % y_type)
170
171

ValueError: Unknown label type: 'continuous'

vaitybharati · 2021-03-31T15:29:30Z

@alexrindone @satish-bot you may refer to https://stackoverflow.com/questions/56380097/cant-understand-this-error-unknown-label-type-continuous-multioutput/66890653#66890653. It describes one solution to this problem. Check if it helps.

jnothman closed this as completed Oct 31, 2016

jnothman mentioned this issue Nov 1, 2016

loading my own datasets #3808

Closed

amueller mentioned this issue Nov 1, 2016

Explicit docstrings for DecisionTreeClassifier.fit and DecisionTreeRegressor.fit #7809

Closed

vaitybharati mentioned this issue Mar 31, 2021

'Unknown label type: 'continuous-multioutput'' #19801

Closed
