DecisionTreeClassifier unknown label type: 'continuous-multioutput' #7801
Comments
You should be using |
Again, the documentation lacks information on how many classes classification can handle. I can see that my dataset has way too many classes, but your error message mentioned something like 'labels', which was confusing enough that I forgot what the dataset actually looks like and started meddling with ways of passing the data in. |
Classification targets should be represented as integers or as strings. You can ask Pandas to read the target data in as a string and you'll be fine. |
Or use a |
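The suggestion above about reading the target as strings could be sketched roughly as follows. This is only an illustration: the in-memory CSV, the column name `label`, and the feature values are made up, since the real `target.txt` is not shown in this thread.

```python
import io

import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.multiclass import type_of_target

# Hypothetical stand-in for a target file; not the reporter's actual data.
target_csv = io.StringIO("label\n0.5\n1.5\n0.5\n2.5\n")

# dtype=str keeps every value as text, so each distinct value becomes one class.
y = pd.read_csv(target_csv, dtype=str)["label"].tolist()

X = [[0.0], [1.0], [0.1], [2.0]]  # made-up features, one row per target value

clf = DecisionTreeClassifier(random_state=0).fit(X, y)

target_kind = type_of_target(y)            # how scikit-learn categorises y
classes = [str(c) for c in clf.classes_]   # the distinct string labels seen in fit
```

Note that this treats every distinct number as its own category, which is probably only sensible when the values really are class names rather than measurements.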
That's not my problem |
See 'Expected Results' section of my issue |
You're right that the error message could be more useful, but the documentation for |
Let me cite the whole section of documentation documenting parameter y of function fit in class DecisionTreeClassifier
That does not say that classes have a cap. What makes a target variable count as continuous? How many classes must there be for it to be considered a regression-type target variable? If it says something about regression, then can I do regression using DecisionTreeClassifier? Why not? Etc. As for your previous comment:
Does that mean that classes can't be represented as floats? Or as dicts? Lists? Tuples? Longs? Doubles? Bytes? I know it is logical to represent classes as integers or strings, since there should not be many of them. But do they have to be? What are the limitations? And as for creating a new ticket, isn't that useless since we've had quite a talk in here? Creating a new ticket just to explain the same thing to someone else? |
It's not number of classes. It's use of non-integers and non-strings. I like the issue descriptions to be focused. Your concern as raised here And please don't hassle me about what I suggest. This isn't the only issue On 2 November 2016 at 00:24, Piotr Kamoda notifications@github.com wrote:
|
You don't have to solve it today. I'm only trying to get the issue of bad error descriptions and bad documentation for the tree classifier and regressor treated as active and as a task for future releases. |
For the error message would "Unsupported output type: 'continuous-multioutput'" be better? That is the real issue. Also see #7809 for the docstring. |
That's better. But I still don't understand why you won't call it what it is. The literature mostly calls these 'target' variables, and 'output' could be mistaken for function output. The exception was thrown from the function 'check_classification_targets', so even you say it's a 'target' variable, and still you want to call it 'label' or 'output'. I'm not a member of scikit-learn, so you will do as you please, but I would recommend using the words 'Target variable' in the docstring and the error message. And I ask you to describe somewhere the rules that the input data (or target) should follow. A short sentence: 'Target variable (parameter y) has to be int or str'. |
Maybe it's worth mentioning in/alongside the new section (45cb11d / #7519) On 2 November 2016 at 20:56, Piotr Kamoda notifications@github.com wrote:
|
I've scanned the document and it seems a good place to mention those conventions. Also, if you don't want to clutter the error messages too much, then putting that information in the user guide isn't a bad idea either. |
'Target variable (parameter y) has to be int or str' is not right, because we support multi-label and multi-output (multi-target) problems. |
Also, arbitrary objects that are not floats are supported as class labels, they don't have to be integers or strings. |
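To see which category a given `y` falls into, one option is `sklearn.utils.multiclass.type_of_target`, which returns the kind of string that appears in the error message. A minimal sketch with made-up toy targets (the dictionary keys are just descriptive names chosen for this example):

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Toy targets only; each entry shows how one representation of y is categorised.
kinds = {
    "integers": type_of_target([0, 1, 2, 1]),
    "strings": type_of_target(["a", "b", "c", "a"]),
    "integer-valued floats": type_of_target([1.0, 2.0, 3.0]),
    "non-integer floats": type_of_target([0.1, 0.6, 1.7]),
    "2-D non-integer floats": type_of_target(np.array([[1.5, 2.0], [3.0, 1.6]])),
}

for name, kind in kinds.items():
    print(f"{name}: {kind}")
```

In this sketch it is the presence of non-integer float values, not the number of distinct values, that moves a target into the 'continuous' categories.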
If we pass training_data_X, training_scores_Y as input to the fit method, it causes an error. To avoid it, we convert and encode the labels: from sklearn import preprocessing |
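The comment's code is cut off after the import, so the following is only a guess at what such an encoding step might look like, using `preprocessing.LabelEncoder`. The variable names follow the comment; the data is invented.

```python
import numpy as np
from sklearn import preprocessing
from sklearn.tree import DecisionTreeClassifier

# Invented data standing in for the commenter's arrays.
training_data_X = np.array([[1.0], [2.0], [3.0], [4.0]])
training_scores_Y = np.array([0.5, 1.5, 0.5, 2.5])

# LabelEncoder maps each distinct value to an integer class index.
encoder = preprocessing.LabelEncoder()
encoded_Y = encoder.fit_transform(training_scores_Y)

clf = DecisionTreeClassifier(random_state=0).fit(training_data_X, encoded_Y)

# Map predicted class indices back to the original score values.
predicted_scores = encoder.inverse_transform(clf.predict(training_data_X))
```

This silences the error, but it also turns every distinct score into a separate, unordered class. If the scores are genuinely continuous, a regressor is likely the better fit.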
I'm having this same issue, is there a fix for it? |
ValueError Traceback (most recent call last)
~\Anaconda3\lib\site-packages\sklearn\tree\tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
~\Anaconda3\lib\site-packages\sklearn\tree\tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
~\Anaconda3\lib\site-packages\sklearn\utils\multiclass.py in check_classification_targets(y)
ValueError: Unknown label type: 'continuous' |
@alexrindone @satish-bot you may refer to https://stackoverflow.com/questions/56380097/cant-understand-this-error-unknown-label-type-continuous-multioutput/66890653#66890653. It describes one solution to this problem. Check if it helps. |
Description
DecisionTreeClassifier crashes with unknown label type: 'continuous-multioutput'. I've tried loading csv file using csv.reader, pandas.read_csv and some other stuff like parsing line-by-line.
Steps/Code to Reproduce
features.txt
target.txt
Expected Results
The error thrown tells the user what is REALLY wrong, e.g. that their data set does not follow the assumptions (and what those assumptions are).
Actual Results
Versions
Update:
I've changed the number of target variables to one, just to simplify things
Output:
Unknown label type: 'continuous'
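The single-target case can be reproduced with a small sketch like the one below. It uses random synthetic data rather than the attached `features.txt` and `target.txt`, and it does not assert the exact message text, which may differ between scikit-learn versions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Synthetic data, not the files attached to this issue.
rng = np.random.RandomState(0)
X = rng.rand(20, 3)
y = rng.rand(20)  # non-integer floats, i.e. a 'continuous' target

# The classifier rejects a continuous target with a ValueError.
try:
    DecisionTreeClassifier().fit(X, y)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# The regressor accepts the same target.
regressor = DecisionTreeRegressor(random_state=0).fit(X, y)
```

With a target of this kind, `DecisionTreeRegressor` fits without complaint, which suggests the error is about the type of the values in `y` rather than about how the file was loaded.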