Commit ef6612f

DOC Fix typos/nitpicks in TargetEncoder docstring (scikit-learn#26645)
1 parent 23ff51c commit ef6612f

1 file changed: +13 -10 lines changed

sklearn/preprocessing/_target_encoder.py

Lines changed: 13 additions & 10 deletions
@@ -30,21 +30,21 @@ class TargetEncoder(OneToOneFeatureMixin, _BaseEncoder):
3030
.. note::
3131
`fit(X, y).transform(X)` does not equal `fit_transform(X, y)` because a
3232
cross-validation scheme is used in `fit_transform` for encoding. See the
33-
:ref:`User Guide <target_encoder>`. for details.
33+
:ref:`User Guide <target_encoder>` for details.
3434
3535
.. versionadded:: 1.3
3636
3737
Parameters
3838
----------
39-
categories : "auto" or a list of array-like, default="auto"
39+
categories : "auto" or list of shape (n_features,) of array-like, default="auto"
4040
Categories (unique values) per feature:
4141
4242
- `"auto"` : Determine categories automatically from the training data.
4343
- list : `categories[i]` holds the categories expected in the i-th column. The
4444
passed categories should not mix strings and numeric values within a single
4545
feature, and should be sorted in case of numeric values.
4646
47-
The used categories is stored in the `categories_` fitted attribute.
47+
The used categories are stored in the `categories_` fitted attribute.
4848
4949
target_type : {"auto", "continuous", "binary"}, default="auto"
5050
Type of target.
@@ -56,16 +56,17 @@ class TargetEncoder(OneToOneFeatureMixin, _BaseEncoder):
5656
5757
.. note::
5858
The type of target inferred with `"auto"` may not be the desired target
59-
type used for modeling. For example, if the target consistent of integers
59+
type used for modeling. For example, if the target consisted of integers
6060
between 0 and 100, then :func:`~sklearn.utils.multiclass.type_of_target`
6161
will infer the target as `"multiclass"`. In this case, setting
62-
`target_type="continuous"` will understand the target as a regression
62+
`target_type="continuous"` will specify the target as a regression
6363
problem. The `target_type_` attribute gives the target type used by the
6464
encoder.
6565
6666
smooth : "auto" or float, default="auto"
67-
The amount of mixing of the categorical encoding with the global target mean. A
68-
larger `smooth` value will put more weight on the global target mean.
67+
The amount of mixing of the target mean conditioned on the value of the
68+
category with the global target mean. A larger `smooth` value will put
69+
more weight on the global target mean.
6970
If `"auto"`, then `smooth` is set to an empirical Bayes estimate.
7071
7172
cv : int, default=5
@@ -75,7 +76,7 @@ class TargetEncoder(OneToOneFeatureMixin, _BaseEncoder):
7576
7677
shuffle : bool, default=True
7778
Whether to shuffle the data in :meth:`fit_transform` before splitting into
78-
batches. Note that the samples within each split will not be shuffled.
79+
folds. Note that the samples within each split will not be shuffled.
7980
8081
random_state : int, RandomState instance or None, default=None
8182
When `shuffle` is True, `random_state` affects the ordering of the
@@ -87,11 +88,13 @@ class TargetEncoder(OneToOneFeatureMixin, _BaseEncoder):
8788
Attributes
8889
----------
8990
encodings_ : list of shape (n_features,) of ndarray
90-
For feature `i`, `encodings_[i]` is the encoding matching the
91+
Encodings learnt on all of `X`.
92+
For feature `i`, `encodings_[i]` are the encodings matching the
9193
categories listed in `categories_[i]`.
9294
9395
categories_ : list of shape (n_features,) of ndarray
94-
The categories of each feature determined during fitting
96+
The categories of each feature determined during fitting or specified
97+
in `categories`
9598
(in order of the features in `X` and corresponding with the output
9699
of :meth:`transform`).
97100
