
MAINT Fix several typos in src and doc files #26187

Merged

doc/computing/computational_performance.rst: 1 addition & 1 deletion

@@ -195,7 +195,7 @@ support vectors.
 .. centered:: |nusvr_model_complexity|

 For :mod:`sklearn.ensemble` of trees (e.g. RandomForest, GBT,
-ExtraTrees etc) the number of trees and their depth play the most
+ExtraTrees, etc.) the number of trees and their depth play the most
 important role. Latency and throughput should scale linearly with the number
 of trees. In this case we used directly the ``n_estimators`` parameter of
 :class:`~ensemble.GradientBoostingRegressor`.
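The linear scaling in the number of trees described in this excerpt is easy to check empirically. A minimal sketch, not part of the documentation page itself; the dataset shape and parameter values are arbitrary:

    import time
    from sklearn.datasets import make_regression
    from sklearn.ensemble import GradientBoostingRegressor

    X, y = make_regression(n_samples=1000, n_features=20, random_state=0)

    # Prediction latency should grow roughly linearly with the number of trees.
    for n_estimators in (10, 100, 1000):
        model = GradientBoostingRegressor(n_estimators=n_estimators).fit(X, y)
        start = time.perf_counter()
        model.predict(X[:1])
        print(n_estimators, time.perf_counter() - start)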

doc/developers/contributing.rst: 2 additions & 2 deletions

@@ -548,8 +548,8 @@ message, the following actions are taken.
 [cd build gh] CD is run only for GitHub Actions
 [cd build cirrus] CD is run only for Cirrus CI
 [lint skip] Azure pipeline skips linting
-[scipy-dev] Build & test with our dependencies (numpy, scipy, etc ...) development builds
-[nogil] Build & test with the nogil experimental branches of CPython, Cython, NumPy, SciPy...
+[scipy-dev] Build & test with our dependencies (numpy, scipy, etc.) development builds
+[nogil] Build & test with the nogil experimental branches of CPython, Cython, NumPy, SciPy, ...
 [pypy] Build & test with PyPy
 [azure parallel] Run Azure CI jobs in parallel
 [float32] Run float32 tests by setting `SKLEARN_RUN_FLOAT32_TESTS=1`. See :ref:`environment_variable` for more details
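Per the surrounding section, these markers are picked up from the commit message itself, so no extra tooling is needed; a hypothetical message could read:

    DOC Fix typo in the contributing guide [scipy-dev]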

doc/getting_started.rst: 2 additions & 2 deletions

@@ -37,8 +37,8 @@ The :term:`fit` method generally accepts 2 inputs:
   represented as rows and features are represented as columns.
 - The target values :term:`y` which are real numbers for regression tasks, or
   integers for classification (or any other discrete set of values). For
-  unsupervized learning tasks, ``y`` does not need to be specified. ``y`` is
-  usually 1d array where the ``i`` th entry corresponds to the target of the
+  unsupervised learning tasks, ``y`` does not need to be specified. ``y`` is
+  usually a 1d array where the ``i`` th entry corresponds to the target of the
   ``i`` th sample (row) of ``X``.

 Both ``X`` and ``y`` are usually expected to be numpy arrays or equivalent
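A minimal sketch of the shapes being described (the estimator choice is arbitrary; the data mirrors the conventions in this excerpt):

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    X = np.array([[1, 2, 3],      # 2 samples (rows), 3 features (columns)
                  [11, 12, 13]])
    y = np.array([0, 1])          # y[i] is the target of row i of X

    clf = RandomForestClassifier(random_state=0).fit(X, y)
    print(clf.predict(X))         # -> [0 1]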

doc/modules/cross_decomposition.rst: 1 addition & 1 deletion

@@ -28,7 +28,7 @@ PLS draws similarities with `Principal Component Regression
 <https://en.wikipedia.org/wiki/Principal_component_regression>`_ (PCR), where
 the samples are first projected into a lower-dimensional subspace, and the
 targets `y` are predicted using `transformed(X)`. One issue with PCR is that
-the dimensionality reduction is unsupervized, and may lose some important
+the dimensionality reduction is unsupervised, and may lose some important
 variables: PCR would keep the features with the most variance, but it's
 possible that features with a small variances are relevant from predicting
 the target. In a way, PLS allows for the same kind of dimensionality
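The PCR/PLS contrast described here can be sketched in a few lines; a hedged example, with the dataset and number of components chosen arbitrarily rather than taken from the documentation:

    from sklearn.cross_decomposition import PLSRegression
    from sklearn.datasets import make_regression
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline

    X, y = make_regression(n_samples=200, n_features=10, random_state=0)

    # PCR: the projection is unsupervised (keeps high-variance directions).
    pcr = make_pipeline(PCA(n_components=2), LinearRegression()).fit(X, y)

    # PLS: the projection itself is chosen to be predictive of y.
    pls = PLSRegression(n_components=2).fit(X, y)

    print(pcr.score(X, y), pls.score(X, y))  # PLS often scores higher here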

doc/modules/feature_extraction.rst: 1 addition & 1 deletion

@@ -846,7 +846,7 @@ Note that the dimensionality does not affect the CPU training time of
 algorithms which operate on CSR matrices (``LinearSVC(dual=True)``,
 ``Perceptron``, ``SGDClassifier``, ``PassiveAggressive``) but it does for
 algorithms that work with CSC matrices (``LinearSVC(dual=False)``, ``Lasso()``,
-etc).
+etc.).

 Let's try again with the default setting::

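The dimensionality knob in question can be sketched with ``HashingVectorizer`` (the toy documents are made up; this is an illustration, not the documentation's own example):

    from sklearn.feature_extraction.text import HashingVectorizer

    docs = ["the quick brown fox", "jumped over the lazy dog"]

    # A small n_features keeps CSC-based estimators cheap but risks hash
    # collisions; the default (2**20) makes collisions unlikely, and CSR-based
    # estimators are unaffected by the extra width.
    for n_features in (2**4, 2**20):
        X = HashingVectorizer(n_features=n_features).fit_transform(docs)
        print(X.shape)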

doc/modules/lda_qda.rst: 1 addition & 1 deletion

@@ -137,7 +137,7 @@ Mathematical formulation of LDA dimensionality reduction
 First note that the K means :math:`\mu_k` are vectors in
 :math:`\mathcal{R}^d`, and they lie in an affine subspace :math:`H` of
 dimension at most :math:`K - 1` (2 points lie on a line, 3 points lie on a
-plane, etc.).
+plane, etc.).

 As mentioned above, we can interpret LDA as assigning :math:`x` to the class
 whose mean :math:`\mu_k` is the closest in terms of Mahalanobis distance,
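A worked instance of the K - 1 bound, as a sketch using the iris dataset (which this excerpt does not itself reference):

    from sklearn.datasets import load_iris
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    X, y = load_iris(return_X_y=True)  # K = 3 classes, 4 features

    # The class means span an affine subspace of dimension at most
    # K - 1 = 2, so LDA can reduce to at most 2 components here.
    lda = LinearDiscriminantAnalysis(n_components=2).fit(X, y)
    print(lda.transform(X).shape)  # (150, 2)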

sklearn/ensemble/_hist_gradient_boosting/splitting.pyx: 2 additions & 2 deletions

@@ -499,9 +499,9 @@ cdef class Splitter:
             split_infos[split_info_idx].feature_idx = feature_idx

             # For each feature, find best bin to split on
-            # Start with a gain of -1 (if no better split is found, that
+            # Start with a gain of -1 if no better split is found, that
             # means one of the constraints isn't respected
-            # (min_samples_leaf, etc) and the grower will later turn the
+            # (min_samples_leaf, etc.) and the grower will later turn the
             # node into a leaf.
             split_infos[split_info_idx].gain = -1
             split_infos[split_info_idx].is_categorical = is_categorical[feature_idx]
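The fixed comment describes internal grower behavior; its user-visible effect can be sketched from the public API (the parameter values below are arbitrary):

    from sklearn.datasets import make_regression
    from sklearn.ensemble import HistGradientBoostingRegressor

    X, y = make_regression(n_samples=100, random_state=0)

    # With min_samples_leaf close to n_samples, few candidate splits satisfy
    # the constraint, so many nodes keep the initial gain of -1 and are
    # turned into leaves early, yielding very shallow trees.
    model = HistGradientBoostingRegressor(min_samples_leaf=40, max_iter=10)
    model.fit(X, y)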

sklearn/metrics/_classification.py: 1 addition & 1 deletion

@@ -316,7 +316,7 @@ def confusion_matrix(
            [0, 0, 1],
            [1, 0, 2]])

-    In the binary case, we can extract true positives, etc as follows:
+    In the binary case, we can extract true positives, etc. as follows:

     >>> tn, fp, fn, tp = confusion_matrix([0, 1, 0, 1], [1, 1, 1, 0]).ravel()
     >>> (tn, fp, fn, tp)

sklearn/model_selection/tests/test_search.py: 1 addition & 1 deletion

@@ -379,7 +379,7 @@ def test_no_refit():
         and hasattr(grid_search, "best_params_")
     )

-    # Make sure the functions predict/transform etc raise meaningful
+    # Make sure the functions predict/transform etc. raise meaningful
     # error messages
     for fn_name in (
         "predict",
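What this test checks, as a standalone sketch (the estimator and grid are arbitrary):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import LinearSVC

    X, y = make_classification(random_state=0)
    grid = GridSearchCV(LinearSVC(), {"C": [0.1, 1]}, refit=False).fit(X, y)

    # With refit=False no best estimator is fitted, so predict must raise a
    # meaningful error rather than fail cryptically. The exact exception type
    # may vary across versions, hence the broad except clause.
    try:
        grid.predict(X)
    except Exception as exc:
        print(exc)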

sklearn/neural_network/_multilayer_perceptron.py: 1 addition & 1 deletion

@@ -360,7 +360,7 @@ def _backprop(self, X, y, activations, deltas, coef_grads, intercept_grads):
         return loss, coef_grads, intercept_grads

     def _initialize(self, y, layer_units, dtype):
-        # set all attributes, allocate weights etc for first call
+        # set all attributes, allocate weights etc. for first call
         # Initialize parameters
         self.n_iter_ = 0
         self.t_ = 0
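The effect of ``_initialize`` is visible through public attributes after the first ``fit``; a sketch with arbitrary layer sizes:

    from sklearn.datasets import make_classification
    from sklearn.neural_network import MLPClassifier

    X, y = make_classification(random_state=0)  # 20 features, 2 classes
    clf = MLPClassifier(hidden_layer_sizes=(5,), random_state=0).fit(X, y)

    # One weight matrix and one bias vector were allocated per layer, and the
    # iteration counters were set up on the first call.
    print([w.shape for w in clf.coefs_])  # [(20, 5), (5, 1)]
    print(clf.n_iter_, clf.t_)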

sklearn/utils/tests/test_class_weight.py: 1 addition & 1 deletion

@@ -274,7 +274,7 @@ def test_compute_sample_weight_more_than_32():
     assert_array_almost_equal(weight, np.ones(y.shape[0]))


-def test_class_weight_does_not_contains_more_classses():
+def test_class_weight_does_not_contains_more_classes():
     """Check that class_weight can contain more labels than in y.

     Non-regression test for #22413
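What the non-regression test guards, as a standalone sketch (assuming the behavior stated in its docstring; the labels and weights below are arbitrary):

    import numpy as np
    from sklearn.utils.class_weight import compute_sample_weight

    y = np.asarray([0, 0, 1, 1])

    # The mapping mentions class 2, which never occurs in y; per the test's
    # docstring this must be accepted rather than rejected.
    weights = compute_sample_weight({0: 1.0, 1: 2.0, 2: 3.0}, y)
    print(weights)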

sklearn/utils/tests/test_estimator_html_repr.py: 1 addition & 1 deletion

@@ -205,7 +205,7 @@ def test_estimator_html_repr_pipeline():


 @pytest.mark.parametrize("final_estimator", [None, LinearSVC()])
-def test_stacking_classsifer(final_estimator):
+def test_stacking_classifier(final_estimator):
     estimators = [
         ("mlp", MLPClassifier(alpha=0.001)),
         ("tree", DecisionTreeClassifier()),
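A standalone sketch of what the renamed test covers (``final_estimator=None`` falls back to the default, a ``LogisticRegression``):

    from sklearn.ensemble import StackingClassifier
    from sklearn.neural_network import MLPClassifier
    from sklearn.svm import LinearSVC
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.utils import estimator_html_repr

    estimators = [
        ("mlp", MLPClassifier(alpha=0.001)),
        ("tree", DecisionTreeClassifier()),
    ]

    # The HTML diagram should render for both final_estimator settings.
    for final_estimator in (None, LinearSVC()):
        clf = StackingClassifier(estimators, final_estimator=final_estimator)
        assert "MLPClassifier" in estimator_html_repr(clf)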