Merged
39 changes: 22 additions & 17 deletions sklearn/ensemble/_hist_gradient_boosting/grower.py
@@ -1,7 +1,7 @@
"""
This module contains the TreeGrower class.

TreeGrowee builds a regression tree fitting a Newton-Raphson step, based on
TreeGrower builds a regression tree fitting a Newton-Raphson step, based on
the gradients and hessians of the training data.
"""
# Author: Nicolas Hug
@@ -33,20 +33,20 @@ class TreeNode:
----------
depth : int
The depth of the node, i.e. its distance from the root.
sample_indices : ndarray of unsigned int, shape (n_samples_at_node,)
sample_indices : ndarray of shape (n_samples_at_node,), dtype=np.uint
The indices of the samples at the node.
sum_gradients : float
The sum of the gradients of the samples at the node.
sum_hessians : float
The sum of the hessians of the samples at the node.
parent : TreeNode or None, optional (default=None)
parent : TreeNode, default=None
The parent of the node. None for root.

Attributes
----------
depth : int
The depth of the node, i.e. its distance from the root.
sample_indices : ndarray of unsigned int, shape (n_samples_at_node,)
sample_indices : ndarray of shape (n_samples_at_node,), dtype=np.uint
The indices of the samples at the node.
sum_gradients : float
The sum of the gradients of the samples at the node.
@@ -134,45 +134,50 @@ class TreeGrower:

Parameters
----------
X_binned : ndarray of int, shape (n_samples, n_features)
X_binned : ndarray of shape (n_samples, n_features), dtype=np.uint8
The binned input samples. Must be Fortran-aligned.
gradients : ndarray, shape (n_samples,)
gradients : ndarray of shape (n_samples,)
The gradients of each training sample. Those are the gradients of the
loss w.r.t the predictions, evaluated at iteration ``i - 1``.
hessians : ndarray, shape (n_samples,)
hessians : ndarray of shape (n_samples,)
The hessians of each training sample. Those are the hessians of the
loss w.r.t the predictions, evaluated at iteration ``i - 1``.
max_leaf_nodes : int or None, optional (default=None)
max_leaf_nodes : int, default=None
The maximum number of leaves for each tree. If None, there is no
maximum limit.
max_depth : int or None, optional (default=None)
max_depth : int, default=None
The maximum depth of each tree. The depth of a tree is the number of
edges to go from the root to the deepest leaf.
Depth isn't constrained by default.
min_samples_leaf : int, optional (default=20)
min_samples_leaf : int, default=20
The minimum number of samples per leaf.
min_gain_to_split : float, optional (default=0.)
min_gain_to_split : float, default=0.
The minimum gain needed to split a node. Splits with lower gain will
be ignored.
n_bins : int, optional (default=256)
n_bins : int, default=256
The total number of bins, including the bin for missing values. Used
to define the shape of the histograms.
n_bins_non_missing_ : array of uint32
n_bins_non_missing : ndarray, dtype=np.uint32, default=None
For each feature, gives the number of bins actually used for
non-missing values. For features with a lot of unique values, this
is equal to ``n_bins - 1``. If it's an int, all features are
considered to have the same number of bins. If None, all features
are considered to have ``n_bins - 1`` bins.
has_missing_values : ndarray of bool or bool, optional (default=False)
has_missing_values : bool or ndarray, dtype=bool, default=False
Whether each feature contains missing values (in the training data).
If it's a bool, the same value is used for all features.
l2_regularization : float, optional (default=0)
monotonic_cst : array-like of shape (n_features,), dtype=int, default=None
Indicates the monotonic constraint to enforce on each feature. -1, 1
and 0 respectively correspond to a negative constraint, positive
constraint and no constraint. Read more in the :ref:`User Guide
<monotonic_cst_gbdt>`.
l2_regularization : float, default=0.
The L2 regularization parameter.
min_hessian_to_split : float, optional (default=1e-3)
min_hessian_to_split : float, default=1e-3
The minimum sum of hessians needed in each node. Splits that result in
at least one child having a sum of hessians less than
``min_hessian_to_split`` are discarded.
shrinkage : float, optional (default=1)
shrinkage : float, default=1.
The shrinkage parameter to apply to the leaves values, also known as
learning rate.
"""
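
Note on how the documented quantities fit together: the module docstring says each tree fits a Newton-Raphson step, and the parameters above (``sum_gradients``, ``sum_hessians``, ``l2_regularization``, ``shrinkage``) are the ingredients of a leaf value under that scheme. The sketch below is only an illustration of the standard regularized Newton step, not the code in grower.py; the helper name ``newton_leaf_value`` and the toy inputs are assumptions made for this example.

```python
def newton_leaf_value(sum_gradients, sum_hessians,
                      l2_regularization=0.0, shrinkage=1.0):
    """Hypothetical helper: value of a leaf under a regularized Newton step.

    The step minimizes a second-order approximation of the loss, which gives
    -G / (H + lambda); shrinkage (the learning rate) then scales the result.
    """
    return -shrinkage * sum_gradients / (sum_hessians + l2_regularization)


# Toy per-sample gradients and hessians for the samples falling in one leaf.
gradients = [0.5, -1.5, 2.0]
hessians = [1.0, 1.0, 1.0]

value = newton_leaf_value(
    sum(gradients),          # sum_gradients at the node
    sum(hessians),           # sum_hessians at the node
    l2_regularization=1.0,
    shrinkage=0.5,
)
print(value)
```

A larger ``l2_regularization`` or a smaller ``shrinkage`` both pull the leaf value toward zero, which is why the two are usually tuned together.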