DOC Ensures that SelfTrainingClassifier passes numpydoc validation #21277

Merged
1 change: 0 additions & 1 deletion maint_tools/test_docstrings.py
@@ -23,7 +23,6 @@
"PassiveAggressiveClassifier",
"PassiveAggressiveRegressor",
"QuadraticDiscriminantAnalysis",
"SelfTrainingClassifier",
"SparseRandomProjection",
"SpectralBiclustering",
"SpectralCoclustering",
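Dropping `SelfTrainingClassifier` from this ignore list means the docstring test in `maint_tools/test_docstrings.py` now validates the class. As a rough illustration (not part of this diff, and the exact test wiring may differ), numpydoc's validator can be run by hand against the fully qualified name:

```python
# Hedged sketch: run numpydoc validation manually against the class.
# `validate` is numpydoc's public checker; it returns a dict whose
# "errors" entry lists (code, message) pairs for every violation found.
from numpydoc.validate import validate

result = validate("sklearn.semi_supervised.SelfTrainingClassifier")
for code, message in result["errors"]:
    print(code, message)
```

With this PR applied, the errors it fixes (e.g. the missing See Also section) should no longer appear in that list.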
55 changes: 30 additions & 25 deletions sklearn/semi_supervised/_self_training.py
@@ -37,30 +37,30 @@ class SelfTrainingClassifier(MetaEstimatorMixin, BaseEstimator):
Parameters
----------
base_estimator : estimator object
-An estimator object implementing ``fit`` and ``predict_proba``.
-Invoking the ``fit`` method will fit a clone of the passed estimator,
-which will be stored in the ``base_estimator_`` attribute.
+An estimator object implementing `fit` and `predict_proba`.
+Invoking the `fit` method will fit a clone of the passed estimator,
+which will be stored in the `base_estimator_` attribute.

threshold : float, default=0.75
The decision threshold for use with `criterion='threshold'`.
-Should be in [0, 1). When using the 'threshold' criterion, a
+Should be in [0, 1). When using the `'threshold'` criterion, a
:ref:`well calibrated classifier <calibration>` should be used.

criterion : {'threshold', 'k_best'}, default='threshold'
The selection criterion used to select which labels to add to the
-training set. If 'threshold', pseudo-labels with prediction
-probabilities above `threshold` are added to the dataset. If 'k_best',
+training set. If `'threshold'`, pseudo-labels with prediction
+probabilities above `threshold` are added to the dataset. If `'k_best'`,
the `k_best` pseudo-labels with highest prediction probabilities are
added to the dataset. When using the 'threshold' criterion, a
:ref:`well calibrated classifier <calibration>` should be used.

k_best : int, default=10
The amount of samples to add in each iteration. Only used when
-`criterion` is k_best'.
+`criterion='k_best'`.

max_iter : int or None, default=10
Maximum number of iterations allowed. Should be greater than or equal
-to 0. If it is ``None``, the classifier will continue to predict labels
+to 0. If it is `None`, the classifier will continue to predict labels
until no new pseudo-labels are added, or all unlabeled samples have
been labeled.
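The three parameters above interact: `criterion` picks the selection rule, while `threshold` and `k_best` tune it. A minimal sketch of both configurations (illustrative only, not part of this diff):

```python
# Minimal sketch of the two selection criteria described above.
from sklearn.svm import SVC
from sklearn.semi_supervised import SelfTrainingClassifier

base = SVC(probability=True, gamma="auto")

# 'threshold': keep pseudo-labels whose predicted probability exceeds 0.75
clf_threshold = SelfTrainingClassifier(base, criterion="threshold", threshold=0.75)

# 'k_best': keep the 50 most confident pseudo-labels per iteration and
# loop until no unlabeled samples remain (max_iter=None)
clf_k_best = SelfTrainingClassifier(base, criterion="k_best", k_best=50, max_iter=None)
```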

@@ -74,7 +74,7 @@ class SelfTrainingClassifier(MetaEstimatorMixin, BaseEstimator):

classes_ : ndarray or list of ndarray of shape (n_classes,)
Class labels for each output. (Taken from the trained
-``base_estimator_``).
+`base_estimator_`).

transduction_ : ndarray of shape (n_samples,)
The labels used for the final fit of the classifier, including
@@ -104,11 +104,24 @@ class SelfTrainingClassifier(MetaEstimatorMixin, BaseEstimator):
termination_condition_ : {'max_iter', 'no_change', 'all_labeled'}
The reason that fitting was stopped.

-- 'max_iter': `n_iter_` reached `max_iter`.
-- 'no_change': no new labels were predicted.
-- 'all_labeled': all unlabeled samples were labeled before `max_iter`
+- `'max_iter'`: `n_iter_` reached `max_iter`.
+- `'no_change'`: no new labels were predicted.
+- `'all_labeled'`: all unlabeled samples were labeled before `max_iter`
was reached.

+See Also
+--------
+LabelPropagation : Label propagation classifier.
+LabelSpreading : Label spreading model for semi-supervised learning.
+
+References
+----------
+David Yarowsky. 1995. Unsupervised word sense disambiguation rivaling
+supervised methods. In Proceedings of the 33rd annual meeting on
+Association for Computational Linguistics (ACL '95). Association for
+Computational Linguistics, Stroudsburg, PA, USA, 189-196. DOI:
+https://doi.org/10.3115/981658.981684

Examples
--------
>>> import numpy as np
@@ -123,14 +136,6 @@ class SelfTrainingClassifier(MetaEstimatorMixin, BaseEstimator):
>>> self_training_model = SelfTrainingClassifier(svc)
>>> self_training_model.fit(iris.data, iris.target)
SelfTrainingClassifier(...)

-References
-----------
-David Yarowsky. 1995. Unsupervised word sense disambiguation rivaling
-supervised methods. In Proceedings of the 33rd annual meeting on
-Association for Computational Linguistics (ACL '95). Association for
-Computational Linguistics, Stroudsburg, PA, USA, 189-196. DOI:
-https://doi.org/10.3115/981658.981684
"""

_estimator_type = "classifier"
@@ -153,7 +158,7 @@ def __init__(

def fit(self, X, y):
"""
-Fits this ``SelfTrainingClassifier`` to a dataset.
+Fit self-training classifier using `X`, `y` as training data.

Parameters
----------
@@ -167,7 +172,7 @@ def fit(self, X, y):
Returns
-------
self : object
-Returns an instance of self.
+Fitted estimator.
"""
# we need row slicing support for sparse matrices, but costly finiteness check
# can be delegated to the base estimator.
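The comment above captures a real constraint: `fit` slices rows out of `X` when it moves newly pseudo-labeled samples into the training set, so sparse inputs must support row indexing, while finite-value checks are left to the base estimator. A hedged sketch with CSR input (illustrative; the set of formats actually accepted is defined inside `fit`):

```python
# Hedged sketch: CSR matrices support row slicing, so they work as input.
import numpy as np
from scipy import sparse
from sklearn.svm import SVC
from sklearn.semi_supervised import SelfTrainingClassifier

rng = np.random.RandomState(0)
X = sparse.csr_matrix(rng.rand(100, 4))
y = np.full(100, -1)            # -1 marks unlabeled samples
y[:30] = rng.randint(0, 2, 30)  # label the first 30 samples

SelfTrainingClassifier(SVC(probability=True, gamma="auto")).fit(X, y)
```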
@@ -281,7 +286,7 @@ def fit(self, X, y):

@if_delegate_has_method(delegate="base_estimator")
def predict(self, X):
"""Predict the classes of X.
"""Predict the classes of `X`.

Parameters
----------
@@ -326,7 +331,7 @@ def predict_proba(self, X):

@if_delegate_has_method(delegate="base_estimator")
def decision_function(self, X):
"""Calls decision function of the `base_estimator`.
"""Call decision function of the `base_estimator`.

Parameters
----------
@@ -372,7 +377,7 @@ def predict_log_proba(self, X):

@if_delegate_has_method(delegate="base_estimator")
def score(self, X, y):
"""Calls score on the `base_estimator`.
"""Call score on the `base_estimator`.

Parameters
----------
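The methods touched in this hunk (`predict`, `predict_proba`, `decision_function`, `predict_log_proba`, `score`) are all decorated with `if_delegate_has_method(delegate="base_estimator")`, so each is only exposed when the wrapped estimator provides it. A hedged sketch of the resulting behavior (illustrative only, not part of this diff):

```python
# if_delegate_has_method makes the method raise AttributeError when the
# base estimator lacks it, so hasattr() reflects the delegate's API.
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.semi_supervised import SelfTrainingClassifier

print(hasattr(SelfTrainingClassifier(SVC()), "decision_function"))
# True: SVC implements decision_function
print(hasattr(SelfTrainingClassifier(GaussianNB()), "decision_function"))
# False: GaussianNB does not
```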