sklearn.tree.export_text failing when feature_names supplied #26265

Closed
NickKanellos opened this issue Apr 23, 2023 · 8 comments · Fixed by #26289
Comments

@NickKanellos
NickKanellos commented Apr 23, 2023

Folks, I'm not sure why this works for

from sklearn import tree
print(my_feature_names)
['0' '0 trump' '0 trump versus' ... 'zur' 'zur ckhalten' 'zur ckhalten muss']

tree.export_graphviz(clf, out_file=None, max_depth=4, feature_names=my_feature_names)

but not for

from sklearn import tree
print(my_feature_names)
['0' '0 trump' '0 trump versus' ... 'zur' 'zur ckhalten' 'zur ckhalten muss']

tree.export_text(clf, max_depth=4, feature_names=my_feature_names)

Traceback (most recent call last):
  File "./sample-python-projects/machine-learning/HW1_Q2a.py", line 72, in <module>
    print(tree.export_text(clf, max_depth=4, feature_names=my_feature_names))
  File "C:\Users\sam\python\lib\site-packages\sklearn\tree\_export.py", line 1016, in export_text
    if feature_names:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Can anyone help?

@github-actions github-actions bot added the Needs Triage Issue requires triage label Apr 23, 2023
@adrinjalali
Member

Could you please post a minimal reproducible example? (Something we can copy and paste in its entirety to reproduce the issue.)

@adrinjalali adrinjalali added Needs Reproducible Code Issue requires reproducible code and removed Needs Triage Issue requires triage labels Apr 25, 2023
@Charlie-XIAO
Contributor

@NickKanellos From the error message, it seems that the feature names you passed in are a NumPy array, but as documented, feature_names must be either a list of strings or None.

feature_names : list of str, default=None
Names of each of the features. If None, generic names will be used (“x[0]”, “x[1]”, …).
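
A possible workaround in the meantime, given that the documented type is a list: convert the array to a plain list before passing it in. This is only a sketch on the iris data; the variable names are illustrative and not taken from the original script.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X, y = iris["data"], iris["target"]

# Feature names held in a NumPy array, as in the report above.
names_array = np.array(iris["feature_names"])

clf = DecisionTreeClassifier(random_state=0, max_depth=2).fit(X, y)

# Workaround: hand export_text a plain Python list, the documented type.
names_list = names_array.tolist()
report = export_text(clf, feature_names=names_list)
print(report)
```

`list(names_array)` should work as well, although `.tolist()` also turns the NumPy string scalars into ordinary `str` objects.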

@Charlie-XIAO
Contributor

Charlie-XIAO commented Apr 26, 2023

@adrinjalali I can give a piece of code that possibly produces the same error:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.tree import export_text
import numpy as np

import sklearn; print("Sklearn version:", sklearn.__version__)

# Loading data, make feature_names a numpy array on purpose
iris = load_iris()
X, y = iris["data"], iris["target"]
feature_names = np.array(iris["feature_names"])

# Training decision tree
decision_tree = DecisionTreeClassifier(random_state=0, max_depth=2).fit(X, y)

# Exporting text
r = export_text(decision_tree, feature_names=feature_names)
print(r)

With the current stable version (1.2.2 or earlier), we will have

Sklearn version: 1.2.2
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-7-e072cc517196> in <cell line: 17>()
     15 
     16 # Exporting text
---> 17 r = export_text(decision_tree, feature_names=feature_names)
     18 print(r)

/usr/local/lib/python3.9/dist-packages/sklearn/tree/_export.py in export_text(decision_tree, feature_names, max_depth, spacing, decimals, show_weights)
   1014         value_fmt = "{}{} value: {}\n"
   1015 
-> 1016     if feature_names:
   1017         feature_names_ = [
   1018             feature_names[i] if i != _tree.TREE_UNDEFINED else None

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
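
The failing line is a truthiness check, which behaves differently for lists and NumPy arrays. A small standalone illustration using only NumPy (the variable names are made up for this example):

```python
import numpy as np

names_list = ["sepal length (cm)", "sepal width (cm)"]
names_array = np.array(names_list)

# A non-empty list is simply truthy.
list_is_truthy = bool(names_list)

# A multi-element array refuses to collapse into a single bool.
try:
    bool(names_array)
    array_error = None
except ValueError as exc:
    array_error = str(exc)

# An identity check against None never looks at the elements,
# so it is safe for lists and arrays alike.
array_is_given = names_array is not None

print(list_is_truthy, array_is_given)
print(array_error)
```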

In the current development branch (main branch), the parameters are validated (see #25867) for feature_names to accept only list of strings or None, so we will have

Sklearn version: 1.3.dev0
Traceback (most recent call last):
  File "D:\Academic\03 - NYU Shanghai\8_Spring 2023\Open Source Software Development\scikit-learn-ossd\test1.py", line 17, in <module>
    r = export_text(decision_tree, feature_names=feature_names)
  File "D:\Academic\03 - NYU Shanghai\8_Spring 2023\Open Source Software Development\scikit-learn-ossd\sklearn\utils\_param_validation.py", line 186, in wrapper
    validate_parameter_constraints(
  File "D:\Academic\03 - NYU Shanghai\8_Spring 2023\Open Source Software Development\scikit-learn-ossd\sklearn\utils\_param_validation.py", line 97, in validate_parameter_constraints
    raise InvalidParameterError(
sklearn.utils._param_validation.InvalidParameterError: The 'feature_names' parameter of export_text must be an instance of 'list' or None. Got array(['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)',
       'petal width (cm)'], dtype='<U17') instead.

However, although export_graphviz is also documented to accept feature_names only as a list of strings or None (see also #26034, not merged yet), it actually works with NumPy arrays without a problem. I think we can make export_graphviz and export_text consistent (i.e., both accept NumPy arrays). To do this, I can edit the docstring and parameter validation in #26034, and for export_text, by changing the line

if feature_names:

to if feature_names is not None:, and changing the line
"feature_names": [list, None],

to "feature_names": ["array-like", None], and then running the same code snippet as above, we get the desired result:

Sklearn version: 1.3.dev0
|--- petal width (cm) <= 0.80
|   |--- class: 0
|--- petal width (cm) >  0.80
|   |--- petal width (cm) <= 1.75
|   |   |--- class: 1
|   |--- petal width (cm) >  1.75
|   |   |--- class: 2

If this is the desired behavior, I'm happy to make a PR for this.

  • Change docstring and parameter validation in MAINT Parameters validation for sklearn.tree.export_graphviz #26034 (this PR is made by me so it is convenient to make a change)
  • Change docstring and parameter validation in sklearn.tree.export_text (accept "array-like")
  • Change if feature_names: to if feature_names is not None: in sklearn.tree.export_text
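
To illustrate the idea behind these changes, here is a rough sketch of how array-like input could be normalized. Note that `normalize_feature_names` is a hypothetical helper written for this comment, not a scikit-learn function, and the placeholder name format is an assumption.

```python
import numpy as np


def normalize_feature_names(feature_names, n_features):
    """Hypothetical helper (not part of scikit-learn).

    Accepts None, a list, or any other array-like of names and
    always returns a plain list of str with n_features entries.
    """
    # Identity check instead of a truthiness check, so arrays are fine.
    if feature_names is None:
        # Placeholder names; the exact format is an assumption.
        return [f"x[{i}]" for i in range(n_features)]

    names = [str(name) for name in feature_names]
    if len(names) != n_features:
        raise ValueError(
            f"expected {n_features} feature names, got {len(names)}"
        )
    return names


from_array = normalize_feature_names(np.array(["a", "b", "c"]), 3)
from_list = normalize_feature_names(["a", "b", "c"], 3)
from_none = normalize_feature_names(None, 2)
print(from_array, from_list, from_none)
```

Converting to a list up front keeps the rest of the function independent of the input type, which matches the "convert it to a list or an array" approach.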

@adrinjalali
Member

We're not really doing anything that definitely requires feature names to be a list, so I'm happy with a fix that accepts an array-like and converts it to a list or an array. @Charlie-XIAO would you have time to submit a PR for your suggestions?

@adrinjalali adrinjalali added Documentation Enhancement and removed Needs Reproducible Code Issue requires reproducible code labels Apr 27, 2023
@Charlie-XIAO
Contributor

@adrinjalali Sure, I will do it right now.

@Charlie-XIAO
Contributor

/take

@Nazishzaib

This comment was marked as spam.

@Charlie-XIAO
Contributor

@Nazish12345 Thanks for your advice, but as you can see above, I've already opened a PR so that both export_text and export_graphviz will be able to accept array-like.
