Description
sklearn.dummy.DummyRegressor requires X to be numeric and have at least 1 feature in predict. IMO, this is too strict. Also, the same check is not performed in fit.
Steps/Code to Reproduce
import numpy as np
import pandas as pd
import sklearn.dummy
cls = sklearn.dummy.DummyRegressor(strategy='mean')
df = pd.DataFrame(data={'A': ['foo', 'bar', 'baz']})
X = df.loc[:, ['A']].values # X's data is of type object
y = [1, 2, 3]
cls.fit(X, y)
cls.predict(X)
Expected Result
cls.predict(X) returns [2, 2, 2] and raises no ValueError.
Actual Result
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File ".../python3.6/site-packages/sklearn/dummy.py", line 468, in predict
    X = check_array(X, accept_sparse=['csr', 'csc', 'coo'])
  File ".../python3.6/site-packages/sklearn/utils/validation.py", line 382, in check_array
    array = np.array(array, dtype=dtype, order=order, copy=copy)
ValueError: could not convert string to float: 'baz'
Versions
Python 3.6.1 | packaged by conda-forge | (default, Mar 23 2017, 21:57:00)
[GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.53)] on darwin
NumPy 1.12.1
SciPy 0.19.0
Scikit-Learn 0.18.1
Proposal
In sklearn.dummy.DummyRegressor, method predict, call check_array with kwargs dtype=None and ensure_min_features=0.
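A minimal sketch of what the proposed validation call might look like, outside of the estimator itself. The helper name validate_for_dummy_predict is hypothetical and not part of scikit-learn; the accept_sparse value is copied from the traceback above. It assumes a scikit-learn version whose check_array accepts the dtype and ensure_min_features keyword arguments. The constant prediction is computed by hand here only to show that nothing but the number of rows of X is needed.

```python
import numpy as np
from sklearn.utils import check_array


def validate_for_dummy_predict(X):
    """Hypothetical helper: validate X the way the proposal suggests."""
    # dtype=None keeps the input dtype, so object/string data is not
    # coerced to float. ensure_min_features=0 allows zero-column input.
    return check_array(
        X,
        accept_sparse=['csr', 'csc', 'coo'],
        dtype=None,
        ensure_min_features=0,
    )


# Same kind of data as in the reproduction: one column of strings.
X = np.array([['foo'], ['bar'], ['baz']], dtype=object)
X_checked = validate_for_dummy_predict(X)

# An input with three rows and no columns also passes the relaxed check.
X_empty = validate_for_dummy_predict(np.empty((3, 0)))

# With strategy='mean', only the number of rows matters for the output.
y_mean = np.mean([1, 2, 3])
y_pred = np.full(X_checked.shape[0], y_mean)
print(y_pred)
```

With the default arguments (dtype="numeric", ensure_min_features=1), the first call would fail on the string column in the same way as the traceback above.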