FEA add (single) Cholesky Newton solver to GLMs #24637


Merged: 115 commits into scikit-learn:main from lorentzenchr:glm_newton_cholesky_only on Oct 24, 2022

Conversation

@lorentzenchr (Member) commented Oct 11, 2022

Reference Issues/PRs

Fixes #16634.

What does this implement/fix? Explain your changes.

This PR adds one Newton solver where the Newton step is obtained via a Cholesky decomposition. This solver is a good choice for n_samples >> n_features in well-behaved settings (positive definite Hessian, usually attained with at least a small L2 penalty).
This is essentially iteratively reweighted least squares (IRLS) with an inner Cholesky-based solver applied to the normal equations.
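For illustration, here is a minimal sketch of a single Newton step computed from a Cholesky factorization of the Hessian. The helper name is hypothetical and this is not the PR's actual NewtonSolver code; it only assumes NumPy and SciPy.

import numpy as np
from scipy.linalg import cho_factor, cho_solve

def newton_cholesky_step(gradient, hessian):
    # Solve hessian @ step = -gradient using the factorization H = L L^T.
    # Assumes a symmetric positive definite Hessian (e.g. thanks to a small
    # L2 penalty); cho_factor raises LinAlgError otherwise.
    c, lower = cho_factor(hessian)
    return cho_solve((c, lower), -gradient)

# Toy usage on a small positive definite problem.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 3))
hessian = A.T @ A + 1e-3 * np.eye(3)
gradient = rng.standard_normal(3)
step = newton_cholesky_step(gradient, hessian)

In the IRLS view, solving this linear system corresponds to solving the normal equations of a weighted least squares problem.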

Any other comments?

@ogrisel (Member) commented Oct 24, 2022

The pylatest_pip_openblas_pandas failure has been fixed in main.

@lorentzenchr (Member Author):

An unrelated question: why don't we link codecov in the CI anymore?

@lorentzenchr (Member Author):

> The pylatest_pip_openblas_pandas failure has been fixed in main.

Great. So let's hope CI gets 🟢.

@jjerphan (Member):

All 💚! I'll let @ogrisel merge if everything looks good to him.

@lorentzenchr (Member Author) commented Oct 24, 2022

wait a minute

@lorentzenchr (Member Author) left a comment

One open question to go...

Comment on lines 301 to 317
# 2.2 Deal with relative gradient differences around machine precision.
tiny_grad = sum_abs_grad_old * eps
abs_grad_improvement = np.abs(sum_abs_grad - sum_abs_grad_old)
check = abs_grad_improvement <= tiny_grad
if is_verbose:
    print(
        " check |sum(|gradient|) - sum(|gradient_old|)| <= eps * "
        "sum(|gradient_old|):"
        f" {abs_grad_improvement} <= {tiny_grad} {check}"
    )
if check:
    break
# 2.3 This is really the last resort.
# Check that sum(|gradient_{i-1}|) < sum(|gradient_{i-2}|)
# = has_improved_sum_abs_grad_previous
# If now sum(|gradient_{i}|) >= sum(|gradient_{i-1}|), this iteration
# made things worse and we should have stopped at i-1.
@lorentzenchr (Member Author) commented Oct 24, 2022

I'm undecided whether we need 2.2 and 2.3 in the line search. We could just fail and resort to lbfgs.
I added those 2 criteria when I was trying fallback strategies (e.g. simple gradient steps) for situations like an indefinite Hessian, negative pointwise Hessian values, or collinear features.
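As a rough illustration of the "just fail and resort to lbfgs" alternative, here is a hedged sketch using scipy.optimize.minimize. The function name and signature are made up for this example; the PR's actual fallback lives inside the solver class.

from scipy.optimize import minimize

def fallback_lbfgs_solve(loss_and_grad, coef, max_iter=100, gtol=1e-4):
    # Hand the whole problem over to L-BFGS-B when the Newton step or the
    # line search cannot make further progress.
    result = minimize(
        loss_and_grad,
        coef,
        method="L-BFGS-B",
        jac=True,  # loss_and_grad returns (loss, gradient)
        options={"maxiter": max_iter, "gtol": gtol},
    )
    return result.x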

Member:

Indeed. I think we can first validate this solver by merging this PR and do this study in a follow-up PR. What do you think?

Suggested change (prepend a TODO comment to the existing block):

# TODO: study removing 2.2 and 2.3 for instead resorting to lbfgs-b.
# 2.2 Deal with relative gradient differences around machine precision.
tiny_grad = sum_abs_grad_old * eps
abs_grad_improvement = np.abs(sum_abs_grad - sum_abs_grad_old)
check = abs_grad_improvement <= tiny_grad
if is_verbose:
    print(
        " check |sum(|gradient|) - sum(|gradient_old|)| <= eps * "
        "sum(|gradient_old|):"
        f" {abs_grad_improvement} <= {tiny_grad} {check}"
    )
if check:
    break
# 2.3 This is really the last resort.
# Check that sum(|gradient_{i-1}|) < sum(|gradient_{i-2}|)
# = has_improved_sum_abs_grad_previous
# If now sum(|gradient_{i}|) >= sum(|gradient_{i-1}|), this iteration
# made things worse and we should have stopped at i-1.

@lorentzenchr (Member Author):

My main problem is that I can't find the settings anymore that led me to implement those conditions. I could also remove them now and we can reinsert them if problems arise. Those lines are still present in #23314.

Member:

Can you please try to remove them and re-run the 2 PR examples and the benchmarks of this comment?

#23314 (comment)

If the benchmark curves still look the same with no new convergence warnings, then I am fine with merging the PR without those checks and re-adding them if we observe a ConvergenceWarning on a real dataset.

Member:

Also, I still think we could extract the line search code into a standalone function with dedicated unit tests. That would probably help trigger those conditions on arbitrary optimization problems (such as multidimensional generalisations of the Rosenbrock function).

This can be done in a follow-up PR if what I suggested in the previous comment does not reveal any problem by itself.
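For instance, a dedicated unit test could exercise a standalone line-search helper on the Rosenbrock function. A hedged sketch follows; the line_search signature below is hypothetical and not part of the PR.

import numpy as np
from scipy.optimize import rosen, rosen_der

def test_line_search_decreases_rosenbrock(line_search):
    # `line_search` is assumed to take (fun, grad, x, direction) and return
    # a step size t such that fun(x + t * direction) < fun(x).
    x = np.array([-1.2, 1.0, 1.0, 1.0])
    direction = -rosen_der(x)  # steepest-descent direction
    t = line_search(rosen, rosen_der, x, direction)
    assert rosen(x + t * direction) < rosen(x)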

@lorentzenchr (Member Author):

d304ce9 removes those 2 line search criteria.

@lorentzenchr (Member Author):

Check of final changes:

  • SKLEARN_TESTS_GLOBAL_RANDOM_SEED="all" pytest -n auto -x sklearn/linear_model/_glm/tests/test_glm.py passes locally
  • Benchmark plots as in FEA add Cholesky based Newton solver to GLMs #23314 (comment)
    [two benchmark plot images]
    • "lbfgs" is LogisticRegression(solver="lbfgs")
    • "lbfgs2" is using

          class BinomialRegressor(_GeneralizedLinearRegressor):
              def _get_loss(self):
                  return HalfBinomialLoss()

      with solver="lbfgs" (see the usage sketch below)
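A minimal sketch of that benchmark setup, assuming the private import paths sklearn.linear_model._glm.glm and sklearn._loss.loss and the "newton-cholesky" solver name introduced by this PR:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model._glm.glm import _GeneralizedLinearRegressor
from sklearn._loss.loss import HalfBinomialLoss

class BinomialRegressor(_GeneralizedLinearRegressor):
    def _get_loss(self):
        return HalfBinomialLoss()

X, y = make_classification(n_samples=5_000, n_features=100, random_state=0)

# Note: LogisticRegression is penalized via C while the GLM uses alpha,
# so the regularization strengths are not exactly matched here.
estimators = {
    "lbfgs": LogisticRegression(solver="lbfgs"),
    "lbfgs2": BinomialRegressor(solver="lbfgs"),
    "newton-cholesky": BinomialRegressor(solver="newton-cholesky"),
}
for name, est in estimators.items():
    est.fit(X, y)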

@lorentzenchr (Member Author):

The plot becomes even more convincing with train_size=50_000 instead of just 5000:
[benchmark plot image]

@ogrisel (Member) commented Oct 24, 2022

Let's wait for the CI to complete; if there is no new ConvergenceWarning in the examples, please feel free to merge!

@jjerphan (Member):

No ConvergenceWarning, hence…

@jjerphan merged commit ff9344f into scikit-learn:main on Oct 24, 2022
@jjerphan (Member):

Merged!

Thank you very much for bringing this solver in, @lorentzenchr! Thanks @ogrisel for your comprehensive review and analysis.

@lorentzenchr deleted the glm_newton_cholesky_only branch on October 24, 2022 at 21:39
@lorentzenchr (Member Author):

@jjerphan @ogrisel That was a great endeavor together with you two. Thanks for bringing this over the finish line before the 1.2 release!

I'm wondering if we should enable this solver in LogisticRegression, too, at least for binary classification problems.

@ogrisel (Member) commented Oct 25, 2022

+1 for binary logistic regression. Multinomial LR will probably require significantly more work and will be problematic when both n_classes and n_features are not small at the same time.
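For binary problems, usage could look like the sketch below, assuming the "newton-cholesky" solver name is also exposed on LogisticRegression (as later happened in scikit-learn 1.2):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# n_samples >> n_features with an L2 penalty: the sweet spot for this solver.
X, y = make_classification(n_samples=100_000, n_features=30, random_state=0)
clf = LogisticRegression(solver="newton-cholesky").fit(X, y)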

glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Oct 31, 2022
* FEA add NewtonSolver, CholeskyNewtonSolver and QRCholeskyNewtonSolver

* ENH better singular hessian special solve

* CLN fix some typos found by reviewer

* TST assert ConvergenceWarning is raised

* MNT add BaseCholeskyNewtonSolver

* WIP collinear design in GLMs

* FIX _solve_singular

* FIX false unpacking in

* TST add tests for unpenalized GLMs

* TST fix solutions of glm_dataset

* ENH add SVDFallbackSolver

* CLN remove SVDFallbackSolver

* ENH use gradient step for singular hessians

* ENH print iteration number in warnings

* TST improve test_linalg_warning_with_newton_solver

* CLN LinAlgWarning from scipy.linalg

* ENH more robust hessian

* ENH increase maxls for lbfgs to make it more robust

* ENH add hessian_warning for too many negative hessian values

* CLN some warning messages

* ENH add lbfgs_step

* ENH use lbfgs_step for hessian_warning

* TST make them pass

* TST tweak rtol for lbfgs

* TST add rigorous test for GLMs

* TST improve test_warm_start

* ENH improve lbfgs options for better convergence

* CLN fix test_warm_start

* TST fix assert singular values in datasets

* CLN address most review comments

* ENH enable more verbosity levels for lbfgs

* DOC add whatsnew

* CLN remove xfail and clean a bit

* CLN docstring about minimum norm

* More informative repr for the glm_dataset fixture cases

* Forgot to run black

* CLN remove unnecessary filterwarnings

* CLN address review comments

* Trigger [all random seeds] on the following tests:
test_glm_regression
test_glm_regression_hstacked_X
test_glm_regression_vstacked_X
test_glm_regression_unpenalized
test_glm_regression_unpenalized_hstacked_X
test_glm_regression_unpenalized_vstacked_X
test_warm_start

* CLN add comment for lbfgs ftol=64 * machine precision

* CLN XXX code comment

* Trigger [all random seeds] on the following tests:

test_glm_regression
test_glm_regression_hstacked_X
test_glm_regression_vstacked_X
test_glm_regression_unpenalized
test_glm_regression_unpenalized_hstacked_X
test_glm_regression_unpenalized_vstacked_X
test_warm_start

* CLN link issue and remove code snippet in comment

* Trigger [all random seeds] on the following tests:

test_glm_regression
test_glm_regression_hstacked_X
test_glm_regression_vstacked_X
test_glm_regression_unpenalized
test_glm_regression_unpenalized_hstacked_X
test_glm_regression_unpenalized_vstacked_X
test_warm_start

* CLN add catch_warnings

* Trigger [all random seeds] on the following tests:

test_glm_regression
test_glm_regression_hstacked_X
test_glm_regression_vstacked_X
test_glm_regression_unpenalized
test_glm_regression_unpenalized_hstacked_X
test_glm_regression_unpenalized_vstacked_X
test_warm_start

* Trigger [all random seeds] on the following tests:

test_glm_regression
test_glm_regression_hstacked_X
test_glm_regression_vstacked_X
test_glm_regression_unpenalized
test_glm_regression_unpenalized_hstacked_X
test_glm_regression_unpenalized_vstacked_X
test_warm_start

* [all random seeds]

test_glm_regression
test_glm_regression_hstacked_X
test_glm_regression_vstacked_X
test_glm_regression_unpenalized
test_glm_regression_unpenalized_hstacked_X
test_glm_regression_unpenalized_vstacked_X
test_warm_start

* Trigger with -Werror [all random seeds]

test_glm_regression
test_glm_regression_hstacked_X
test_glm_regression_vstacked_X
test_glm_regression_unpenalized
test_glm_regression_unpenalized_hstacked_X
test_glm_regression_unpenalized_vstacked_X
test_warm_start

* ENH increase maxls to 50

* [all random seeds]

test_glm_regression
test_glm_regression_hstacked_X
test_glm_regression_vstacked_X
test_glm_regression_unpenalized
test_glm_regression_unpenalized_hstacked_X
test_glm_regression_unpenalized_vstacked_X
test_warm_start

* Revert "Trigger with -Werror [all random seeds]"

This reverts commit 99f4cf9.

* TST add catch_warnings to filterwarnings

* TST adapt tests for newton solvers

* CLN cleaner gradient step with gradient_times_newton

* DOC add whatsnew

* ENH always use lbfgs as fallback

* TST adapt rtol

* TST fix test_linalg_warning_with_newton_solver

* CLN address some review comments

* Improve tests related to convergence warning on collinear data

* overfit -> fit

* Typo in comment

* Apply suggestions from code review

* ENH fallback_lbfgs_solve
- Do not use lbfgs steps, fall back complete to lbfgs

* ENH adapt rtol

* Improve test_linalg_warning_with_newton_solver

* Better comments

* Fixed Hessian casing and improved warning messages

* [all random seeds]

test_linalg_warning_with_newton_solver

* Ignore ConvergenceWarnings for now if convergence is good

* CLN remove counting of warnings

* ENH fall back to lbfgs if line search did not converge

* DOC better comment on performance bottleneck

* Update GLM related examples to use the new solver

* CLN address reviewer comments

* EXA improve some wordings

* CLN do not pop "solver" in parameter constraints

* CLN fix typos

* DOC fix docstring

* CLN remove solver newton-qr-cholesky

* DOC update PR number in whatsnew

* CLN address review comments

* CLN remove unnecessary catch_warnings

* CLN address some review comments

* DOC more precise whatsnew

* CLN use init_zero_coef

* CLN use and test init_zero_coef

* CLN address some review comments

* CLN mark NewtonSolver as private by leading underscore

* CLN exact comments for inner_solve

* TST add test_newton_solver_verbosity

* TST extend test_newton_solver_verbosity

* TST logic in test_glm_regression_unpenalized

* TST use count_nonzero

* CLN remove super rare line search checks

* MNT move Newton solver to new file _newton_solver.py

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
andportnoy pushed a commit to andportnoy/scikit-learn that referenced this pull request Nov 5, 2022