Random Forest Prediction and np.nan #19767

cap-jmk · 2021-03-25T19:39:01Z

Describe the bug

When I run my processing workflow on a "dirty" dataset containing np.nan values, the algorithm does not handle the case.

Steps/Code to Reproduce

Example:

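import numpy as np

# random_forest is assumed to be an already-fitted estimator (the fitting code is not part of this report)
list_with_nan_values = [np.nan] * 10  # flat list of ten NaN values
prediction = random_forest.predict(list_with_nan_values)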

Expected Results

No error is thrown, or at least a warning is issued that an np.nan value is present.

Actual Results

setting an array element with a sequence. The requested array 
has an inhomogeneous shape after 1 dimensions. 
The detected shape was (10,) + inhomogeneous part.

Versions


System:
    python: 3.7.10 | packaged by conda-forge | (default, Feb 19 2021, 15:59:12)  [Clang 11.0.1 ]
executable: /Users/julian/opt/anaconda3/envs/cadd-course/bin/python
   machine: Darwin-20.3.0-x86_64-i386-64bit

Python dependencies:
          pip: 21.0.1
   setuptools: 49.6.0.post20210108
      sklearn: 0.24.1
        numpy: 1.20.1
        scipy: 1.6.1
       Cython: None
       pandas: 1.2.3
   matplotlib: 3.3.4
       joblib: 1.0.1
threadpoolctl: 2.1.0

Built with OpenMP: True

NicolasHug · 2021-03-25T21:15:31Z

Random forests don't support nans, in fit or in predict. Is this a feature request or a bug report? The HistGradientBoosting estimators are the only ones to natively support nans (I think). You'll need to impute the data if you want to use RFs.
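
The imputation route mentioned above could look roughly like the sketch below. It is not taken from the reporter's code: the toy arrays, the mean strategy, and the names `X_train`, `y_train`, `X_new`, and `model` are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: 6 samples, 2 features, with a few NaN entries.
X_train = np.array([
    [1.0, 2.0],
    [2.0, np.nan],
    [3.0, 6.0],
    [np.nan, 8.0],
    [5.0, 10.0],
    [6.0, 12.0],
])
y_train = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# The imputer learns each column's mean during fit and fills NaN with it,
# so the forest itself never sees a missing value.
model = make_pipeline(
    SimpleImputer(strategy="mean"),
    RandomForestRegressor(n_estimators=10, random_state=0),
)
model.fit(X_train, y_train)

# The prediction input is 2D, shaped (n_samples, n_features); NaN is imputed first.
X_new = np.array([[np.nan, 4.0], [4.0, np.nan]])
predictions = model.predict(X_new)
```

Putting the imputer inside the pipeline means the same column means are reused at predict time, so the fill values are not recomputed from the new data.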

The error message you get is strange. How did you fit the estimator? You should get something like

ValueError: Expected 2D array, got 1D array instead:
array=[nan nan nan nan nan nan nan nan nan nan].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
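
As a small illustration of the two reshape options named in that message, here is a sketch using the flat list from the report. The variable names are assumptions for this example.

```python
import numpy as np

values = [np.nan] * 10  # flat list of ten values, as in the report

# Interpretation 1: ten samples with a single feature each.
as_one_feature = np.asarray(values).reshape(-1, 1)

# Interpretation 2: a single sample with ten features.
as_one_sample = np.asarray(values).reshape(1, -1)

print(as_one_feature.shape)  # (10, 1)
print(as_one_sample.shape)   # (1, 10)
```

Reshaping only fixes the dimensionality. The NaN values are still there, so an estimator without missing-value support would still reject the array.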

glemaitre · 2021-03-26T14:27:00Z

X is expected to be (n_samples, n_features) and there is no support for missing values in random forest. We have some work going on to support missing values natively in decision trees in scikit-learn: #5974

I am closing this issue because it looks like a duplicate.

glemaitre · 2021-03-26T14:27:48Z

@MQSchleich Feel free to comment if we missed something in your original report, in which case we would reopen this issue.

cap-jmk · 2021-03-27T20:48:57Z

Honestly, I was just reproducing a case without giving you the whole code base. The bug occurred when the input came from a data frame that had been converted to a list and contained one np.nan value. However, it is not bad at all that the forest throws an error. Only the message was so confusing that I needed some time to figure out what was going on. Therefore, I suggest it would actually be better not to throw an error but to continue processing and emit a warning.
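
Until the library reports this more clearly, a caller-side check along these lines may give an earlier and more readable signal. This is only a sketch: `to_2d_checked` is a hypothetical helper, not a scikit-learn function, and the input list stands in for values taken from a data frame.

```python
import warnings

import numpy as np


def to_2d_checked(values, n_features):
    """Hypothetical helper: build an (n_samples, n_features) array and warn about NaN."""
    X = np.asarray(values, dtype=float).reshape(-1, n_features)
    n_missing = int(np.isnan(X).sum())
    if n_missing:
        warnings.warn(
            f"Input contains {n_missing} NaN value(s); impute or drop them before predict()."
        )
    return X


# Record the warning so it can be inspected instead of only printed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    X = to_2d_checked([1.0, np.nan, 3.0], n_features=1)
```

The helper only warns and returns the array unchanged, so the NaN values still have to be imputed or dropped before the array is passed to a random forest.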

cap-jmk added the Bug: triage label Mar 25, 2021

glemaitre closed this as completed Mar 26, 2021
