
MAINT Raise NotFittedError when using DictVectorizer without prior fitting #24838

Merged on Dec 7, 2023 (11 commits)
11 changes: 11 additions & 0 deletions doc/whats_new/v1.4.rst
@@ -424,6 +424,17 @@ Changelog
support missing values if all `estimators` support missing values.
:pr:`27710` by :user:`Guillaume Lemaitre <glemaitre>`.

:mod:`sklearn.feature_extraction`
.................................

- |API| Unfitted instances of :class:`feature_extraction.DictVectorizer` now
raise :class:`exceptions.NotFittedError` instead of :class:`AttributeError`
when the following methods are called:
:func:`feature_extraction.DictVectorizer.inverse_transform`,
:func:`feature_extraction.DictVectorizer.restrict`,
:func:`feature_extraction.DictVectorizer.transform`.
:pr:`24838` by :user:`Lorenz Hertel <LoHertel>`.
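In scikit-learn, `NotFittedError` inherits from both `ValueError` and `AttributeError`, so callers that previously caught the incidental `AttributeError` should keep working after this change. The sketch below mimics that hierarchy with a hypothetical stand-in (`NotFittedErrorSketch`) and a hypothetical `ToyVectorizer`; it is an illustration under those assumptions, not scikit-learn's code.

```python
class NotFittedErrorSketch(ValueError, AttributeError):
    """Hypothetical stand-in mirroring the bases of sklearn.exceptions.NotFittedError."""


class ToyVectorizer:
    """Hypothetical toy estimator; only `fit` sets the learned attributes."""

    def fit(self, X):
        # Learn a sorted vocabulary from the keys of all input dicts.
        self.feature_names_ = sorted({key for row in X for key in row})
        self.vocabulary_ = {name: i for i, name in enumerate(self.feature_names_)}
        return self

    def transform(self, X):
        # Fail early with an explicit error instead of an incidental AttributeError.
        if not hasattr(self, "vocabulary_"):
            raise NotFittedErrorSketch("This ToyVectorizer instance is not fitted yet.")
        return [[row.get(name, 0) for name in self.feature_names_] for row in X]


def transform_or_none(vectorizer, X):
    # Legacy caller written against the old behaviour: it only catches AttributeError.
    try:
        return vectorizer.transform(X)
    except AttributeError:
        return None


print(transform_or_none(ToyVectorizer(), [{"a": 1}]))  # None: still caught
print(transform_or_none(ToyVectorizer().fit([{"a": 1, "b": 2}]), [{"a": 3}]))  # [[3, 0]]
```

Subclassing both exception types keeps the new error type precise while preserving compatibility with existing `except AttributeError` handlers.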

:mod:`sklearn.feature_selection`
................................

5 changes: 5 additions & 0 deletions sklearn/feature_extraction/_dict_vectorizer.py
@@ -338,6 +338,8 @@ def inverse_transform(self, X, dict_type=dict):
D : list of dict_type objects of shape (n_samples,)
Feature mappings for the samples in X.
"""
check_is_fitted(self, "feature_names_")

# COO matrix is not subscriptable
X = check_array(X, accept_sparse=["csr", "csc"])
n_samples = X.shape[0]
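In `inverse_transform`, the fitted check runs before `check_array`, so an unfitted instance fails on its missing state rather than on input validation. When given attribute names, `check_is_fitted` roughly verifies that those attributes exist and raises `NotFittedError` otherwise. The function below, `check_is_fitted_sketch`, is a simplified, hypothetical re-implementation of only that attribute-based path; the real function handles more cases (for example `attributes=None` and a custom `msg`).

```python
class NotFittedErrorSketch(ValueError, AttributeError):
    """Hypothetical stand-in for sklearn.exceptions.NotFittedError."""


def check_is_fitted_sketch(estimator, attributes, all_or_any=all):
    """Simplified, hypothetical version of the attribute-based fitted check."""
    # Accept a single attribute name as well as a list of names.
    if isinstance(attributes, str):
        attributes = [attributes]
    if not all_or_any(hasattr(estimator, attr) for attr in attributes):
        name = type(estimator).__name__
        raise NotFittedErrorSketch(
            f"This {name} instance is not fitted yet. Call 'fit' with "
            "appropriate arguments before using this estimator."
        )


class Unfitted:
    pass


class Fitted:
    feature_names_ = ["bar", "foo"]


check_is_fitted_sketch(Fitted(), "feature_names_")  # passes silently
try:
    check_is_fitted_sketch(Unfitted(), "feature_names_")
except NotFittedErrorSketch as exc:
    print(exc)
```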
@@ -373,6 +375,7 @@ def transform(self, X):
Xa : {array, sparse matrix}
Feature vectors; always 2-d.
"""
check_is_fitted(self, ["feature_names_", "vocabulary_"])
return self._transform(X, fitting=False)

def get_feature_names_out(self, input_features=None):
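`transform` passes two attribute names, `feature_names_` and `vocabulary_`. By default `check_is_fitted` requires all listed attributes to be present (its `all_or_any` parameter defaults to `all`). The hypothetical sketch below shows what that means for a half-initialised object; `check_is_fitted_sketch`, `is_fitted` and `HalfFitted` are illustrative names, not scikit-learn APIs.

```python
class NotFittedErrorSketch(ValueError, AttributeError):
    """Hypothetical stand-in for sklearn.exceptions.NotFittedError."""


def check_is_fitted_sketch(estimator, attributes, all_or_any=all):
    """Simplified, hypothetical attribute-based fitted check."""
    if isinstance(attributes, str):
        attributes = [attributes]
    if not all_or_any(hasattr(estimator, attr) for attr in attributes):
        raise NotFittedErrorSketch(f"{type(estimator).__name__} is not fitted yet.")


def is_fitted(estimator, attributes, all_or_any=all):
    # Convenience wrapper: report the outcome instead of raising.
    try:
        check_is_fitted_sketch(estimator, attributes, all_or_any)
    except NotFittedErrorSketch:
        return False
    return True


class HalfFitted:
    # Only one of the two attributes that `transform` checks is present.
    feature_names_ = ["bar", "foo"]


required = ["feature_names_", "vocabulary_"]
print(is_fitted(HalfFitted(), required))                  # False: `all` needs both
print(is_fitted(HalfFitted(), required, all_or_any=any))  # True: `any` needs one
```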
@@ -428,6 +431,8 @@ def restrict(self, support, indices=False):
>>> v.get_feature_names_out()
array(['bar', 'foo'], ...)
"""
check_is_fitted(self, "feature_names_")

if not indices:
support = np.where(support)[0]
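`restrict` reads the learned `feature_names_` right after converting the mask, so on an unfitted instance that lookup used to surface as an `AttributeError`; the new check reports it as `NotFittedError` up front. The hypothetical `restrict_sketch` below is a pure-Python approximation of how a boolean mask or index list could be turned into a reduced, renumbered vocabulary. It is an assumption-based illustration, not the actual method body.

```python
def restrict_sketch(feature_names, support, indices=False):
    """Hypothetical sketch: keep only the features selected by `support`.

    `support` is either a boolean mask (indices=False) or a list of
    integer positions into `feature_names` (indices=True).
    """
    if not indices:
        # Pure-Python equivalent of np.where(support)[0] for a 1-d mask.
        support = [i for i, keep in enumerate(support) if keep]
    # Re-number the kept features consecutively, in the order given by `support`.
    new_vocab = {}
    for i in support:
        new_vocab[feature_names[i]] = len(new_vocab)
    new_names = sorted(new_vocab, key=new_vocab.get)
    return new_names, new_vocab


names = ["bar", "baz", "foo"]  # would come from a fitted `feature_names_`
print(restrict_sketch(names, [True, False, True]))
# (['bar', 'foo'], {'bar': 0, 'foo': 1})
print(restrict_sketch(names, [2, 0], indices=True))
```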

21 changes: 21 additions & 0 deletions sklearn/feature_extraction/tests/test_dict_vectorizer.py
@@ -9,6 +9,7 @@
import scipy.sparse as sp
from numpy.testing import assert_allclose, assert_array_equal

from sklearn.exceptions import NotFittedError
from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_selection import SelectKBest, chi2

@@ -239,3 +240,23 @@ def test_dict_vectorizer_get_feature_names_out():
assert isinstance(feature_names, np.ndarray)
assert feature_names.dtype == object
assert_array_equal(feature_names, ["1", "2", "3"])


@pytest.mark.parametrize(
"method, input",
[
("transform", [{1: 2, 3: 4}, {2: 4}]),
("inverse_transform", [{1: 2, 3: 4}, {2: 4}]),
("restrict", [True, False, True]),
],
)
def test_dict_vectorizer_not_fitted_error(method, input):
"""Check that unfitted DictVectorizer instance raises NotFittedError.

This should be part of the common tests, but they currently do not cover
estimators that accept text input.
"""
dv = DictVectorizer(sparse=False)

with pytest.raises(NotFittedError):
getattr(dv, method)(input)
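For readers without pytest at hand, the same table-driven pattern can be sketched with plain Python. `StubVectorizer` is a hypothetical stub whose three methods guard on fitted state the way the patched methods do; the `(method, input)` cases are copied from the parametrization above.

```python
class NotFittedErrorSketch(ValueError, AttributeError):
    """Hypothetical stand-in for sklearn.exceptions.NotFittedError."""


class StubVectorizer:
    """Hypothetical stub: each public method checks fitted state first."""

    def _check_fitted(self, *attributes):
        if not all(hasattr(self, attr) for attr in attributes):
            raise NotFittedErrorSketch("StubVectorizer is not fitted yet.")

    def transform(self, X):
        self._check_fitted("feature_names_", "vocabulary_")
        return X

    def inverse_transform(self, X):
        self._check_fitted("feature_names_")
        return X

    def restrict(self, support, indices=False):
        self._check_fitted("feature_names_")
        return self


# Same (method, input) cases as the parametrized pytest test.
CASES = [
    ("transform", [{1: 2, 3: 4}, {2: 4}]),
    ("inverse_transform", [{1: 2, 3: 4}, {2: 4}]),
    ("restrict", [True, False, True]),
]


def run_not_fitted_cases(cases):
    """Return the names of methods that raised on a fresh, unfitted stub."""
    raised = []
    for method, X in cases:
        stub = StubVectorizer()  # never fitted
        try:
            getattr(stub, method)(X)
        except NotFittedErrorSketch:
            raised.append(method)
    return raised


print(run_not_fitted_cases(CASES))
# ['transform', 'inverse_transform', 'restrict']
```

Note that the `inverse_transform` case passes dicts, which a fitted vectorizer would reject; the case still exercises the intended path because the fitted check runs before any input validation.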