input validation in predict of DummyRegressor is too strict #9832

Closed
@dcatteeu

Description

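sklearn.dummy.DummyRegressor requires X to be numeric and to have at least one feature in predict. IMO, this is too strict, since the prediction does not depend on the values in X.

Also, this validation is not performed in fit, so the two methods treat the same X differently.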
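
To illustrate why the check seems unnecessary, here is a minimal sketch of a mean-strategy dummy regressor in plain Python. `MeanDummy` is a hypothetical stand-in written for this illustration, not scikit-learn's implementation; it only shows that the prediction depends on the number of rows in X, never on their contents.

```python
from statistics import mean


class MeanDummy:
    """Hypothetical stand-in for a mean-strategy dummy regressor."""

    def fit(self, X, y):
        # X is ignored: only the targets determine the constant.
        self.constant_ = mean(y)
        return self

    def predict(self, X):
        # Only the number of samples matters, not dtype or feature count.
        return [self.constant_] * len(X)


X = [["foo"], ["bar"], ["baz"]]  # non-numeric features
preds = MeanDummy().fit(X, [1, 2, 3]).predict(X)
print(preds)
```

Nothing in this sketch needs X to be numeric or to have any columns at all, which is the behaviour I would expect from the real estimator.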

Steps/Code to Reproduce

import numpy as np
import pandas as pd
import sklearn.dummy

cls = sklearn.dummy.DummyRegressor(strategy='mean')
df = pd.DataFrame(data={'A': ['foo', 'bar', 'baz']})
X = df.loc[:, ['A']].values  # X's data is of type object
y = [1, 2, 3]
cls.fit(X, y)
cls.predict(X)

Expected Result

predict returns [2, 2, 2] and raises no ValueError.

Actual Result

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File ".../python3.6/site-packages/sklearn/dummy.py", line 468, in predict
    X = check_array(X, accept_sparse=['csr', 'csc', 'coo'])
  File ".../python3.6/site-packages/sklearn/utils/validation.py", line 382, in check_array
    array = np.array(array, dtype=dtype, order=order, copy=copy)
ValueError: could not convert string to float: 'baz'

Versions

Python 3.6.1 | packaged by conda-forge | (default, Mar 23 2017, 21:57:00)
[GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.53)] on darwin
NumPy 1.12.1
SciPy 0.19.0
Scikit-Learn 0.18.1

Proposal

In the predict method of sklearn.dummy.DummyRegressor, call check_array with the keyword arguments dtype=None and ensure_min_features=0.
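
A rough sketch of what the proposed call would accept, using `check_array` from `sklearn.utils` directly rather than patching `DummyRegressor`. It assumes a scikit-learn version in which `dtype=None` preserves the input dtype and `ensure_min_features=0` permits zero columns; behaviour may differ between releases.

```python
import numpy as np
from sklearn.utils import check_array

# Object-dtype input like the one in the reproduction above.
X_obj = np.array([["foo"], ["bar"], ["baz"]], dtype=object)

# dtype=None keeps the original dtype instead of converting to float.
checked = check_array(
    X_obj,
    accept_sparse=["csr", "csc", "coo"],
    dtype=None,
    ensure_min_features=0,
)

# A 2D array with zero features also passes with ensure_min_features=0.
X_empty = np.empty((3, 0))
checked_empty = check_array(X_empty, dtype=None, ensure_min_features=0)

print(checked.dtype, checked.shape, checked_empty.shape)
```

With these arguments the string data is passed through unchanged, so predict could still use the number of samples without ever converting X to float.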
