
Adding L2 regularization in gradient boosting #8784

@ppallesen

Description

During a project I have been working a lot with the scikit-learn and xgboost source code, and I have been missing L2 regularization in scikit-learn, since my experience with xgboost is that including it gives performance improvements. Their implementation looks like the following:

// Leaf value calculation - param.h line 285
dw = -sum_grad / (sum_hess + p.reg_lambda);
// Gain calculation / impurity_improvement - param.h line 250
gain = Sqr(sum_grad) / (sum_hess + p.reg_lambda);
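
For reference, these formulas come from minimizing the second-order approximation of the loss at a leaf with an L2 penalty on the leaf weight: with G = sum_grad and H = sum_hess, the per-leaf objective G*w + (1/2)*(H + lambda)*w^2 is minimized at w* = -G / (H + lambda), which also yields the G^2 / (H + lambda) score used to compare splits. A minimal Python sketch of the same calculation (hypothetical helper names, not xgboost or scikit-learn code):

import numpy as np

def leaf_weight(grad, hess, reg_lambda):
    # Optimal leaf weight -G / (H + lambda) for the samples in a leaf.
    return -np.sum(grad) / (np.sum(hess) + reg_lambda)

def leaf_gain(grad, hess, reg_lambda):
    # Score G^2 / (H + lambda) used when comparing candidate splits.
    return np.sum(grad) ** 2 / (np.sum(hess) + reg_lambda)

grad = np.array([-0.4, -0.6, 0.2])   # toy per-sample gradients
hess = np.array([0.25, 0.25, 0.25])  # toy per-sample hessians

print(leaf_weight(grad, hess, reg_lambda=0.0))  # ~1.067, unregularized
print(leaf_weight(grad, hess, reg_lambda=1.0))  # ~0.457, shrunk toward 0

Increasing reg_lambda shrinks leaf values toward zero and flattens the gains, which is where the regularizing effect comes from.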

There is some additional handling for L1 regularization, maximum improvement, etc., but the code above seems to be enough given scikit-learn's existing hyperparameters. The full implementation is at https://github.com/dmlc/xgboost/blob/master/src/tree/param.h

In scikit-learn, the leaf value calculation seems to be handled by adding self.reg_lambda to all the denominators and adding the attribute to the class, while the impurity improvement seems to be handled by changing proxy_impurity_improvement in all the criterion classes in _criterion.pyx and adding an extra hyperparameter to it; a rough sketch follows below.
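
To make that concrete, here is a rough sketch (hypothetical Python, not the actual Cython in _criterion.pyx; reg_lambda is the proposed new hyperparameter) of what a regularized proxy improvement could look like for the MSE criterion. For squared error the per-sample hessian is 1, so the weighted sample count of each child plays the role of sum_hess:

def proxy_impurity_improvement(sum_left, n_left, sum_right, n_right, reg_lambda=0.0):
    # sum_left / sum_right: sums of y in each child;
    # n_left / n_right: weighted sample counts (sum_hess for squared error).
    return (sum_left ** 2 / (n_left + reg_lambda)
            + sum_right ** 2 / (n_right + reg_lambda))

On the leaf side, the value would correspondingly become sum_y / (n + reg_lambda) instead of the plain mean, matching the dw formula above; with reg_lambda = 0 everything reduces to the current behavior.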

Does this seem reasonable or relevant?
