FEA add (single) Cholesky Newton solver to GLMs #24637
Conversation
An unrelated question: why don't we link codecov in the CI anymore?
Great. So let's hope the CI gets 🟢.
All 💚! I'll let @ogrisel merge if everything looks good to him.
Wait a minute.
One open question to go...
sklearn/linear_model/_glm/glm.py
Outdated
# 2.2 Deal with relative gradient differences around machine precision.
tiny_grad = sum_abs_grad_old * eps
abs_grad_improvement = np.abs(sum_abs_grad - sum_abs_grad_old)
check = abs_grad_improvement <= tiny_grad
if is_verbose:
    print(
        " check |sum(|gradient|) - sum(|gradient_old|)| <= eps * "
        "sum(|gradient_old|):"
        f" {abs_grad_improvement} <= {tiny_grad} {check}"
    )
if check:
    break
# 2.3 This is really the last resort.
# Check that sum(|gradient_{i-1}|) < sum(|gradient_{i-2}|)
# = has_improved_sum_abs_grad_previous
# If now sum(|gradient_{i}|) >= sum(|gradient_{i-1}|), this iteration
# made things worse and we should have stopped at i-1.
I'm undecided whether we need 2.2 and 2.3 in the line search. We could just fail and resort to lbfgs.
I added those 2 criteria when I was trying out fallback strategies (e.g. simple gradient steps) for situations like an indefinite Hessian, negative pointwise Hessian values, or collinear features.
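For readers, a minimal sketch of the collinear-features case mentioned above (not the PR's code; names and shapes are made up): an exactly duplicated feature makes the unpenalized Hessian X.T @ W @ X singular, so a plain Cholesky Newton step is not well defined, while a small L2 penalty restores positive definiteness.

import numpy as np
from scipy.linalg import cho_factor

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
X = np.hstack([X, X[:, [0]]])            # duplicate the first column -> collinear design
W = np.diag(rng.uniform(0.5, 2.0, 50))   # positive per-sample Hessian weights
H = X.T @ W @ X                          # 4 x 4 but only rank 3

print(np.linalg.matrix_rank(H))          # 3
print(np.linalg.eigvalsh(H).min())       # ~0 up to rounding

try:
    cho_factor(H)                        # may raise LinAlgError (not positive definite)
except np.linalg.LinAlgError as exc:
    print("Cholesky failed:", exc)

cho_factor(H + 1e-6 * np.eye(4))         # the penalized Hessian factorizes fine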
Indeed. I think we can first validate this solver by merging this PR and do this study in a follow-up PR. What do you think?
Suggested change: add a TODO above the 2.2 check.

# TODO: study removing 2.2 and 2.3 for instead resorting to lbfgs-b.
# 2.2 Deal with relative gradient differences around machine precision.
tiny_grad = sum_abs_grad_old * eps
abs_grad_improvement = np.abs(sum_abs_grad - sum_abs_grad_old)
check = abs_grad_improvement <= tiny_grad
if is_verbose:
    print(
        " check |sum(|gradient|) - sum(|gradient_old|)| <= eps * "
        "sum(|gradient_old|):"
        f" {abs_grad_improvement} <= {tiny_grad} {check}"
    )
if check:
    break
# 2.3 This is really the last resort.
# Check that sum(|gradient_{i-1}|) < sum(|gradient_{i-2}|)
# = has_improved_sum_abs_grad_previous
# If now sum(|gradient_{i}|) >= sum(|gradient_{i-1}|), this iteration
# made things worse and we should have stopped at i-1.
My main problem is that I can't find the settings anymore that led me to implement those conditions. I could also remove them now and we can reinsert them when problems arise. Those lines are still present in #23314.
Can you please try to remove them and re-run the 2 PR examples and the benchmarks of this comment?
If the benchmark curves still look the same with no new convergence warnings, then I am fine with merging the PR without those checks and re-adding them if we observe the ConvergenceWarning on a real dataset.
Also, I still think we could extract the line search code into a standalone function with dedicated unit tests. That would probably help trigger those conditions on arbitrary optimization problems (such as multidimensional generalisations of the Rosenbrock function).
This can be done in a follow-up PR if what I said in the previous comment does not reveal any problem by itself.
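For illustration, here is a rough sketch of what such an extracted line-search helper with a Rosenbrock-based unit test could look like (the function name and signature are made up for this sketch and are not scikit-learn's API).

import numpy as np
from scipy.optimize import rosen, rosen_der

def backtracking_line_search(fun, grad, x, direction, beta=0.5, sigma=1e-4, max_iter=30):
    """Plain Armijo backtracking: shrink the step until sufficient decrease."""
    f0 = fun(x)
    slope = grad(x) @ direction  # directional derivative; must be negative
    t = 1.0
    for _ in range(max_iter):
        if fun(x + t * direction) <= f0 + sigma * t * slope:
            return t
        t *= beta
    return t  # did not converge; a caller would warn or fall back here

# Unit-test style check on a 5-dimensional Rosenbrock problem with a
# steepest-descent direction: the accepted step must strictly decrease f.
x = np.full(5, -1.0)
d = -rosen_der(x)
t = backtracking_line_search(rosen, rosen_der, x, d)
assert rosen(x + t * d) < rosen(x)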
d304ce9 remove those 2 line search criteria.
Check of final changes:
Let's wait for the CI to complete and check that there is no new ConvergenceWarning.
No new ConvergenceWarning.
Merged! Thank you very much for bringing this solver in, @lorentzenchr! Thanks @ogrisel for your comprehensive review and analysis.
+1 for binary logistic regression. Multinomial LR will probably require significantly more work and will be problematic when both n_classes and n_features are not small at the same time.
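A back-of-the-envelope sketch of that concern, assuming the full multinomial Hessian is formed densely without exploiting any block structure or using a reference class:

n_features, n_classes = 2_000, 100
hessian_dim = n_features * n_classes          # 200_000 coefficients couple across classes
memory_gb = hessian_dim**2 * 8 / 1e9          # dense float64 Hessian storage
cholesky_flops = hessian_dim**3 / 3           # rough cost of one factorization
print(f"{hessian_dim} x {hessian_dim} Hessian, ~{memory_gb:.0f} GB, ~{cholesky_flops:.1e} flops")
# -> ~320 GB and ~2.7e15 flops per Newton step, versus a 2_000 x 2_000 Hessian
#    (~32 MB) for the binary case with the same n_features.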
* FEA add NewtonSolver, CholeskyNewtonSolver and QRCholeskyNewtonSolver * ENH better singular hessian special solve * CLN fix some typos found by reviewer * TST assert ConvergenceWarning is raised * MNT add BaseCholeskyNewtonSolver * WIP colinear design in GLMs * FIX _solve_singular * FIX false unpacking in * TST add tests for unpenalized GLMs * TST fix solutions of glm_dataset * ENH add SVDFallbackSolver * CLN remove SVDFallbackSolver * ENH use gradient step for singular hessians * ENH print iteration number in warnings * TST improve test_linalg_warning_with_newton_solver * CLN LinAlgWarning fron scipy.linalg * ENH more robust hessian * ENH increase maxls for lbfgs to make it more robust * ENH add hessian_warning for too many negative hessian values * CLN some warning messages * ENH add lbfgs_step * ENH use lbfgs_step for hessian_warning * TST make them pass * TST tweek rtol for lbfgs * TST add rigoros test for GLMs * TST improve test_warm_start * ENH improve lbfgs options for better convergence * CLN fix test_warm_start * TST fix assert singular values in datasets * CLN address most review comments * ENH enable more vebosity levels for lbfgs * DOC add whatsnew * CLN remove xfail and clean a bit * CLN docstring about minimum norm * More informative repr for the glm_dataset fixture cases * Forgot to run black * CLN remove unnecessary filterwarnings * CLN address review comments * Trigger [all random seeds] on the following tests: test_glm_regression test_glm_regression_hstacked_X test_glm_regression_vstacked_X test_glm_regression_unpenalized test_glm_regression_unpenalized_hstacked_X test_glm_regression_unpenalized_vstacked_X test_warm_start * CLN add comment for lbfgs ftol=64 * machine precision * CLN XXX code comment * Trigger [all random seeds] on the following tests: test_glm_regression test_glm_regression_hstacked_X test_glm_regression_vstacked_X test_glm_regression_unpenalized test_glm_regression_unpenalized_hstacked_X test_glm_regression_unpenalized_vstacked_X test_warm_start * CLN link issue and remove code snippet in comment * Trigger [all random seeds] on the following tests: test_glm_regression test_glm_regression_hstacked_X test_glm_regression_vstacked_X test_glm_regression_unpenalized test_glm_regression_unpenalized_hstacked_X test_glm_regression_unpenalized_vstacked_X test_warm_start * CLN add catch_warnings * Trigger [all random seeds] on the following tests: test_glm_regression test_glm_regression_hstacked_X test_glm_regression_vstacked_X test_glm_regression_unpenalized test_glm_regression_unpenalized_hstacked_X test_glm_regression_unpenalized_vstacked_X test_warm_start * Trigger [all random seeds] on the following tests: test_glm_regression test_glm_regression_hstacked_X test_glm_regression_vstacked_X test_glm_regression_unpenalized test_glm_regression_unpenalized_hstacked_X test_glm_regression_unpenalized_vstacked_X test_warm_start * [all random seeds] test_glm_regression test_glm_regression_hstacked_X test_glm_regression_vstacked_X test_glm_regression_unpenalized test_glm_regression_unpenalized_hstacked_X test_glm_regression_unpenalized_vstacked_X test_warm_start * Trigger with -Werror [all random seeds] test_glm_regression test_glm_regression_hstacked_X test_glm_regression_vstacked_X test_glm_regression_unpenalized test_glm_regression_unpenalized_hstacked_X test_glm_regression_unpenalized_vstacked_X test_warm_start * ENH increase maxls to 50 * [all random seeds] test_glm_regression test_glm_regression_hstacked_X test_glm_regression_vstacked_X 
test_glm_regression_unpenalized test_glm_regression_unpenalized_hstacked_X test_glm_regression_unpenalized_vstacked_X test_warm_start * Revert "Trigger with -Werror [all random seeds]" This reverts commit 99f4cf9. * TST add catch_warnings to filterwarnings * TST adapt tests for newton solvers * CLN cleaner gradient step with gradient_times_newton * DOC add whatsnew * ENH always use lbfgs as fallback * TST adapt rtol * TST fix test_linalg_warning_with_newton_solver * CLN address some review comments * Improve tests related to convergence warning on collinear data * overfit -> fit * Typo in comment * Apply suggestions from code review * ENH fallback_lbfgs_solve - Do not use lbfgs steps, fall back complete to lbfgs * ENH adapt rtol * Improve test_linalg_warning_with_newton_solver * Better comments * Fixed Hessian casing and improved warning messages * [all random seeds] test_linalg_warning_with_newton_solver * Ignore ConvergenceWarnings for now if convergence is good * CLN remove counting of warnings * ENH fall back to lbfgs if line search did not converge * DOC better comment on performance bottleneck * Update GLM related examples to use the new solver * CLN address reviewer comments * EXA improve some wordings * CLN do not pop "solver in parameter constraints * CLN fix typos * DOC fix docstring * CLN remove solver newton-qr-cholesky * DOC update PR number in whatsnew * CLN address review comments * CLN remove unnecessary catch_warnings * CLN address some review comments * DOC more precise whatsnew * CLN use init_zero_coef * CLN use and test init_zero_coef * CLN address some review comments * CLN mark NewtonSolver as private by leading underscore * CLN exact comments for inner_solve * TST add test_newton_solver_verbosity * TST extend test_newton_solver_verbosity * TST logic in test_glm_regression_unpenalized * TST use count_nonzero * CLN remove super rare line search checks * MNT move Newton solver to new file _newton_solver.py Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
Reference Issues/PRs
Fixes #16634.
What does this implement/fix? Explain your changes.
This PR adds one Newton solver where the Newton step is obtained via a Cholesky decomposition. This solver is a good choice for n_samples >> n_features in well-behaved settings (positive definite Hessian, usually attained with at least a small L2 penalty). This is basically the same as iteratively reweighted least squares (IRLS) with an inner Cholesky-based solve of the normal equations.
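A minimal sketch of one iteration of the kind of solver described above (illustrative only: it assumes a Poisson deviance with log link, a made-up helper name, and omits the intercept, line search and fallback handling that the actual implementation needs):

import numpy as np
from scipy.linalg import cho_factor, cho_solve

def newton_cholesky_step(X, y, coef, alpha=1e-4):
    """One full Newton step for a Poisson GLM with log link and L2 penalty alpha."""
    eta = X @ coef                        # linear predictor
    mu = np.exp(eta)                      # inverse log link
    grad = X.T @ (mu - y) + alpha * coef  # gradient of the penalized objective
    W = mu                                # per-sample Hessian weights for Poisson/log
    H = X.T @ (W[:, None] * X) + alpha * np.eye(X.shape[1])
    c_and_lower = cho_factor(H)           # Cholesky of the penalized Hessian
    return coef + cho_solve(c_and_lower, -grad)

rng = np.random.default_rng(0)
X = rng.standard_normal((1_000, 5))
true_coef = np.array([0.3, -0.2, 0.1, 0.0, 0.5])
y = rng.poisson(np.exp(X @ true_coef))
coef = np.zeros(5)
for _ in range(10):
    coef = newton_cholesky_step(X, y, coef)
print(coef)  # should be close to true_coef on this well-conditioned toy problem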
Any other comments?
PoissonRegressor, GammaRegressor and TweedieRegressor have the new solver. It would be very easy to extend it to LogisticRegression as well.