Commit e0ebc78

DOC Update documentation of gradient boosting estimators w/ ranges (#22153)
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
1 parent 8286f02 commit e0ebc78

File tree

1 file changed: +49 −32 lines changed

sklearn/ensemble/_gb.py

Lines changed: 49 additions & 32 deletions
@@ -989,18 +989,21 @@ class GradientBoostingClassifier(ClassifierMixin, BaseGradientBoosting):
     learning_rate : float, default=0.1
         Learning rate shrinks the contribution of each tree by `learning_rate`.
         There is a trade-off between learning_rate and n_estimators.
+        Values must be in the range `(0.0, inf)`.
 
     n_estimators : int, default=100
         The number of boosting stages to perform. Gradient boosting
         is fairly robust to over-fitting so a large number usually
         results in better performance.
+        Values must be in the range `[1, inf)`.
 
     subsample : float, default=1.0
         The fraction of samples to be used for fitting the individual base
         learners. If smaller than 1.0 this results in Stochastic Gradient
         Boosting. `subsample` interacts with the parameter `n_estimators`.
         Choosing `subsample < 1.0` leads to a reduction of variance
         and an increase in bias.
+        Values must be in the range `(0.0, 1.0]`.
 
     criterion : {'friedman_mse', 'squared_error', 'mse'}, \
             default='friedman_mse'
@@ -1019,10 +1022,9 @@ class GradientBoostingClassifier(ClassifierMixin, BaseGradientBoosting):
     min_samples_split : int or float, default=2
         The minimum number of samples required to split an internal node:
 
-        - If int, then consider `min_samples_split` as the minimum number.
-        - If float, then `min_samples_split` is a fraction and
-          `ceil(min_samples_split * n_samples)` are the minimum
-          number of samples for each split.
+        - If int, values must be in the range `[2, inf)`.
+        - If float, values must be in the range `(0.0, 1.0]` and `min_samples_split`
+          will be `ceil(min_samples_split * n_samples)`.
 
         .. versionchanged:: 0.18
            Added float values for fractions.
@@ -1034,10 +1036,9 @@ class GradientBoostingClassifier(ClassifierMixin, BaseGradientBoosting):
         right branches. This may have the effect of smoothing the model,
         especially in regression.
 
-        - If int, then consider `min_samples_leaf` as the minimum number.
-        - If float, then `min_samples_leaf` is a fraction and
-          `ceil(min_samples_leaf * n_samples)` are the minimum
-          number of samples for each node.
+        - If int, values must be in the range `[1, inf)`.
+        - If float, values must be in the range `(0.0, 1.0]` and `min_samples_leaf`
+          will be `ceil(min_samples_leaf * n_samples)`.
 
         .. versionchanged:: 0.18
            Added float values for fractions.
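The fractional forms above resolve to absolute sample counts at fit time. A minimal sketch of that conversion, computing the `ceil(... * n_samples)` formula from the docstring by hand (the dataset size is a made-up example):

```python
import math

n_samples = 250  # hypothetical training-set size

# min_samples_split=0.01, a float in (0.0, 1.0], becomes an absolute count:
min_samples_split = math.ceil(0.01 * n_samples)
print(min_samples_split)  # → 3

# min_samples_leaf=0.1 resolves the same way:
min_samples_leaf = math.ceil(0.1 * n_samples)
print(min_samples_leaf)  # → 25
```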
@@ -1046,16 +1047,19 @@ class GradientBoostingClassifier(ClassifierMixin, BaseGradientBoosting):
         The minimum weighted fraction of the sum total of weights (of all
         the input samples) required to be at a leaf node. Samples have
         equal weight when sample_weight is not provided.
+        Values must be in the range `[0.0, 0.5]`.
 
     max_depth : int, default=3
         The maximum depth of the individual regression estimators. The maximum
         depth limits the number of nodes in the tree. Tune this parameter
         for best performance; the best value depends on the interaction
         of the input variables.
+        Values must be in the range `[1, inf)`.
 
     min_impurity_decrease : float, default=0.0
         A node will be split if this split induces a decrease of the impurity
         greater than or equal to this value.
+        Values must be in the range `[0.0, inf)`.
 
         The weighted impurity decrease equation is the following::
@@ -1090,10 +1094,9 @@ class GradientBoostingClassifier(ClassifierMixin, BaseGradientBoosting):
     max_features : {'auto', 'sqrt', 'log2'}, int or float, default=None
        The number of features to consider when looking for the best split:
 
-        - If int, then consider `max_features` features at each split.
-        - If float, then `max_features` is a fraction and
-          `int(max_features * n_features)` features are considered at each
-          split.
+        - If int, values must be in the range `[1, inf)`.
+        - If float, values must be in the range `(0.0, 1.0]` and the features
+          considered at each split will be `int(max_features * n_features)`.
         - If 'auto', then `max_features=sqrt(n_features)`.
         - If 'sqrt', then `max_features=sqrt(n_features)`.
         - If 'log2', then `max_features=log2(n_features)`.
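The float and string forms of `max_features` also resolve to integer counts, per the formulas in the docstring. A hand-computed sketch (the feature count is a made-up example):

```python
import math

n_features = 64  # hypothetical feature count

# Float in (0.0, 1.0]: int(max_features * n_features) features per split.
print(int(0.25 * n_features))      # → 16

# The 'sqrt' and 'log2' string options from the docstring:
print(int(math.sqrt(n_features)))  # → 8
print(int(math.log2(n_features)))  # → 6
```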
@@ -1110,11 +1113,13 @@ class GradientBoostingClassifier(ClassifierMixin, BaseGradientBoosting):
         Enable verbose output. If 1 then it prints progress and performance
         once in a while (the more trees the lower the frequency). If greater
         than 1 then it prints progress and performance for every tree.
+        Values must be in the range `[0, inf)`.
 
     max_leaf_nodes : int, default=None
         Grow trees with ``max_leaf_nodes`` in best-first fashion.
         Best nodes are defined as relative reduction in impurity.
-        If None then unlimited number of leaf nodes.
+        Values must be in the range `[2, inf)`.
+        If `None`, then unlimited number of leaf nodes.
 
     warm_start : bool, default=False
         When set to ``True``, reuse the solution of the previous call to fit
@@ -1123,7 +1128,7 @@ class GradientBoostingClassifier(ClassifierMixin, BaseGradientBoosting):
 
     validation_fraction : float, default=0.1
         The proportion of training data to set aside as validation set for
-        early stopping. Must be between 0 and 1.
+        early stopping. Values must be in the range `(0.0, 1.0)`.
         Only used if ``n_iter_no_change`` is set to an integer.
 
         .. versionadded:: 0.20
@@ -1136,21 +1141,24 @@ class GradientBoostingClassifier(ClassifierMixin, BaseGradientBoosting):
         data as validation and terminate training when validation score is not
         improving in all of the previous ``n_iter_no_change`` numbers of
         iterations. The split is stratified.
+        Values must be in the range `[1, inf)`.
 
         .. versionadded:: 0.20
 
     tol : float, default=1e-4
         Tolerance for the early stopping. When the loss is not improving
         by at least tol for ``n_iter_no_change`` iterations (if set to a
         number), the training stops.
+        Values must be in the range `(0.0, inf)`.
 
         .. versionadded:: 0.20
 
     ccp_alpha : non-negative float, default=0.0
         Complexity parameter used for Minimal Cost-Complexity Pruning. The
         subtree with the largest cost complexity that is smaller than
-        ``ccp_alpha`` will be chosen. By default, no pruning is performed. See
-        :ref:`minimal_cost_complexity_pruning` for details.
+        ``ccp_alpha`` will be chosen. By default, no pruning is performed.
+        Values must be in the range `[0.0, inf)`.
+        See :ref:`minimal_cost_complexity_pruning` for details.
 
         .. versionadded:: 0.22
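Taken together, the documented ranges can be exercised directly. A minimal sketch on toy data, with arbitrary but in-range values for the parameters this commit annotates:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=200, random_state=0)

# Every value below sits inside the range its docstring entry now states.
clf = GradientBoostingClassifier(
    learning_rate=0.1,        # (0.0, inf)
    n_estimators=50,          # [1, inf)
    subsample=0.8,            # (0.0, 1.0]
    min_samples_split=2,      # [2, inf) as an int
    max_depth=3,              # [1, inf)
    validation_fraction=0.1,  # (0.0, 1.0)
    n_iter_no_change=5,       # [1, inf); enables early stopping
    tol=1e-4,                 # (0.0, inf)
    random_state=0,
).fit(X, y)

print(clf.predict(X[:3]).shape)  # → (3,)
```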
@@ -1548,18 +1556,21 @@ class GradientBoostingRegressor(RegressorMixin, BaseGradientBoosting):
     learning_rate : float, default=0.1
         Learning rate shrinks the contribution of each tree by `learning_rate`.
         There is a trade-off between learning_rate and n_estimators.
+        Values must be in the range `(0.0, inf)`.
 
     n_estimators : int, default=100
         The number of boosting stages to perform. Gradient boosting
         is fairly robust to over-fitting so a large number usually
         results in better performance.
+        Values must be in the range `[1, inf)`.
 
     subsample : float, default=1.0
         The fraction of samples to be used for fitting the individual base
         learners. If smaller than 1.0 this results in Stochastic Gradient
         Boosting. `subsample` interacts with the parameter `n_estimators`.
         Choosing `subsample < 1.0` leads to a reduction of variance
         and an increase in bias.
+        Values must be in the range `(0.0, 1.0]`.
 
     criterion : {'friedman_mse', 'squared_error', 'mse'}, \
             default='friedman_mse'
@@ -1578,10 +1589,9 @@ class GradientBoostingRegressor(RegressorMixin, BaseGradientBoosting):
     min_samples_split : int or float, default=2
         The minimum number of samples required to split an internal node:
 
-        - If int, then consider `min_samples_split` as the minimum number.
-        - If float, then `min_samples_split` is a fraction and
-          `ceil(min_samples_split * n_samples)` are the minimum
-          number of samples for each split.
+        - If int, values must be in the range `[2, inf)`.
+        - If float, values must be in the range `(0.0, 1.0]` and `min_samples_split`
+          will be `ceil(min_samples_split * n_samples)`.
 
         .. versionchanged:: 0.18
            Added float values for fractions.
@@ -1593,10 +1603,9 @@ class GradientBoostingRegressor(RegressorMixin, BaseGradientBoosting):
         right branches. This may have the effect of smoothing the model,
         especially in regression.
 
-        - If int, then consider `min_samples_leaf` as the minimum number.
-        - If float, then `min_samples_leaf` is a fraction and
-          `ceil(min_samples_leaf * n_samples)` are the minimum
-          number of samples for each node.
+        - If int, values must be in the range `[1, inf)`.
+        - If float, values must be in the range `(0.0, 1.0]` and `min_samples_leaf`
+          will be `ceil(min_samples_leaf * n_samples)`.
 
         .. versionchanged:: 0.18
            Added float values for fractions.
@@ -1605,16 +1614,19 @@ class GradientBoostingRegressor(RegressorMixin, BaseGradientBoosting):
         The minimum weighted fraction of the sum total of weights (of all
         the input samples) required to be at a leaf node. Samples have
         equal weight when sample_weight is not provided.
+        Values must be in the range `[0.0, 0.5]`.
 
     max_depth : int, default=3
         Maximum depth of the individual regression estimators. The maximum
         depth limits the number of nodes in the tree. Tune this parameter
         for best performance; the best value depends on the interaction
         of the input variables.
+        Values must be in the range `[1, inf)`.
 
     min_impurity_decrease : float, default=0.0
         A node will be split if this split induces a decrease of the impurity
         greater than or equal to this value.
+        Values must be in the range `[0.0, inf)`.
 
         The weighted impurity decrease equation is the following::
@@ -1650,10 +1662,9 @@ class GradientBoostingRegressor(RegressorMixin, BaseGradientBoosting):
     max_features : {'auto', 'sqrt', 'log2'}, int or float, default=None
         The number of features to consider when looking for the best split:
 
-        - If int, then consider `max_features` features at each split.
-        - If float, then `max_features` is a fraction and
-          `int(max_features * n_features)` features are considered at each
-          split.
+        - If int, values must be in the range `[1, inf)`.
+        - If float, values must be in the range `(0.0, 1.0]` and the features
+          considered at each split will be `int(max_features * n_features)`.
         - If "auto", then `max_features=n_features`.
         - If "sqrt", then `max_features=sqrt(n_features)`.
         - If "log2", then `max_features=log2(n_features)`.
@@ -1669,16 +1680,19 @@ class GradientBoostingRegressor(RegressorMixin, BaseGradientBoosting):
     alpha : float, default=0.9
         The alpha-quantile of the huber loss function and the quantile
         loss function. Only if ``loss='huber'`` or ``loss='quantile'``.
+        Values must be in the range `(0.0, 1.0)`.
 
     verbose : int, default=0
         Enable verbose output. If 1 then it prints progress and performance
         once in a while (the more trees the lower the frequency). If greater
         than 1 then it prints progress and performance for every tree.
+        Values must be in the range `[0, inf)`.
 
     max_leaf_nodes : int, default=None
         Grow trees with ``max_leaf_nodes`` in best-first fashion.
         Best nodes are defined as relative reduction in impurity.
-        If None then unlimited number of leaf nodes.
+        Values must be in the range `[2, inf)`.
+        If None, then unlimited number of leaf nodes.
 
     warm_start : bool, default=False
         When set to ``True``, reuse the solution of the previous call to fit
@@ -1687,7 +1701,7 @@ class GradientBoostingRegressor(RegressorMixin, BaseGradientBoosting):
 
     validation_fraction : float, default=0.1
         The proportion of training data to set aside as validation set for
-        early stopping. Must be between 0 and 1.
+        early stopping. Values must be in the range `(0.0, 1.0)`.
         Only used if ``n_iter_no_change`` is set to an integer.
 
         .. versionadded:: 0.20
@@ -1700,21 +1714,24 @@ class GradientBoostingRegressor(RegressorMixin, BaseGradientBoosting):
         data as validation and terminate training when validation score is not
         improving in all of the previous ``n_iter_no_change`` numbers of
         iterations.
+        Values must be in the range `[1, inf)`.
 
         .. versionadded:: 0.20
 
     tol : float, default=1e-4
         Tolerance for the early stopping. When the loss is not improving
         by at least tol for ``n_iter_no_change`` iterations (if set to a
         number), the training stops.
+        Values must be in the range `(0.0, inf)`.
 
         .. versionadded:: 0.20
 
     ccp_alpha : non-negative float, default=0.0
         Complexity parameter used for Minimal Cost-Complexity Pruning. The
         subtree with the largest cost complexity that is smaller than
-        ``ccp_alpha`` will be chosen. By default, no pruning is performed. See
-        :ref:`minimal_cost_complexity_pruning` for details.
+        ``ccp_alpha`` will be chosen. By default, no pruning is performed.
+        Values must be in the range `[0.0, inf)`.
+        See :ref:`minimal_cost_complexity_pruning` for details.
 
         .. versionadded:: 0.22
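The regressor's ranges can be exercised the same way, including the `alpha` quantile that only applies to the 'huber' and 'quantile' losses. A minimal sketch on toy data with arbitrary in-range values:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=200, random_state=0)

reg = GradientBoostingRegressor(
    loss="huber",
    alpha=0.9,              # (0.0, 1.0); only used for 'huber'/'quantile'
    learning_rate=0.05,     # (0.0, inf)
    n_estimators=100,       # [1, inf)
    min_samples_leaf=0.05,  # float form: ceil(0.05 * 200) = 10 samples
    ccp_alpha=0.0,          # [0.0, inf); 0.0 means no pruning
    random_state=0,
).fit(X, y)

print(reg.predict(X[:2]).shape)  # → (2,)
```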
