FIX ConvergenceWarning in plot_gpr_on_structured_data (#31164) #31289


Open, wants to merge 7 commits into base: main

Conversation

EngineerDanny (Contributor)

Reference Issues/PRs

Closes #31164

What does this implement/fix? Explain your changes.

This PR fixes the ConvergenceWarning and subsequent L-BFGS abort in the structured-sequence Gaussian Process example by freezing baseline_similarity_bounds, exactly as core tests already do in test_gpr.py.
No API change.
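For context, the mechanism this fix relies on can be sketched with a standard kernel: in scikit-learn, passing `"fixed"` as a hyperparameter's bounds excludes it from the L-BFGS optimization of the log-marginal likelihood. This is a minimal sketch using an `RBF` kernel (not the example's custom `SequenceKernel`; the data here is made up for illustration):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

X = np.linspace(0, 5, 20).reshape(-1, 1)
y = np.sin(X).ravel()

# Passing "fixed" as the bounds removes length_scale from the set of
# hyperparameters tuned during fit, so the optimizer never touches it.
kernel = RBF(length_scale=1.0, length_scale_bounds="fixed")
gpr = GaussianProcessRegressor(kernel=kernel).fit(X, y)

# The length scale is unchanged after fitting because it was frozen.
print(gpr.kernel_.length_scale)
```

Freezing `baseline_similarity_bounds` in the example works the same way: the problematic hyperparameter is simply never handed to the optimizer.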

Any other comments?

Added a one-word typo correction (“operate”) in the example narrative.


github-actions bot commented May 1, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: a377b3e. Link to the linter CI: here

@EngineerDanny (Contributor, Author) left a comment

Thanks @StefanieSenger and @ogrisel for your detailed investigation.

I’ve kept baseline_similarity_bounds="fixed" and added a note in the example explaining why this parameter is frozen.
Please let me know if more context is needed!

@StefanieSenger (Contributor) left a comment

Thanks for your work, @EngineerDanny!

I have added a suggestion on how to make this part of the example a bit simpler. The rationale behind this is not to burden users with details that distract them from the actual intent of the example (which is not to demonstrate how to tune the baseline similarity, but rather to focus on model accuracy). They don't need to worry about convergence here, because you have fixed the problem so they can see what is really important. It is enough to let them know what "fixed" does and that they might want to tune this hyperparameter if they apply the example to their own use case. This suggestion is not meant to diminish your work (I can't even judge it without spending a few days on it), but to keep the example a simple read.

Looking forward to seeing this merged.

#
# %%
# Freeze baseline_similarity to avoid ill-conditioned optimisation
kernel = SequenceKernel(baseline_similarity_bounds="fixed")
Contributor

Now baseline_similarity_bounds="fixed" overrides the current baseline_similarity_bounds=(1e-5, 1) set in SequenceKernel.__init__(), which is still present in the code at line 57; having both is unnecessarily complex.

Let's simplify by using either one of these.
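The duplication being discussed can be sketched like this (a simplified stand-in for the example's class, not its real definition; the attribute names follow the PR discussion):

```python
# Hypothetical sketch: the default bounds in __init__ are overridden
# again at instantiation, so the (1e-5, 1) default is dead code.
class SequenceKernel:
    def __init__(self, baseline_similarity=0.5,
                 baseline_similarity_bounds=(1e-5, 1)):
        self.baseline_similarity = baseline_similarity
        self.baseline_similarity_bounds = baseline_similarity_bounds

# Passing "fixed" here shadows the (1e-5, 1) default above; setting the
# value in only one place keeps the example simpler to read.
kernel = SequenceKernel(baseline_similarity_bounds="fixed")
```

Moving `"fixed"` into the `__init__` default (as done later in this thread) makes the instantiation-time override unnecessary.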

Contributor (Author)

I have updated line 57 to use baseline_similarity_bounds="fixed" instead.
What do you think?

Contributor

Thank you. I think we could then leave out setting the param again upon instantiation.

EngineerDanny and others added 4 commits May 8, 2025 08:42
Co-authored-by: Stefanie Senger <91849487+StefanieSenger@users.noreply.github.com>
Co-authored-by: Stefanie Senger <91849487+StefanieSenger@users.noreply.github.com>
Co-authored-by: Stefanie Senger <91849487+StefanieSenger@users.noreply.github.com>
@StefanieSenger (Contributor) left a comment

Thanks for your updates, @EngineerDanny!

There is a rendering issue related to a character directly following the backticks in the note, but apart from that, this PR looks very good to me.

What do you think, @ogrisel?

Comment on lines +108 to +110
# to show this example without ``ConvergenceWarning``s that would otherwise raise.
# In another use case, you probably want to optimise on
# ``baseline_similarity_bounds`` and should set bounds by passing a tuple of
Contributor

Suggested change
# to show this example without ``ConvergenceWarning``s that would otherwise raise.
# In another use case, you probably want to optimise on
# ``baseline_similarity_bounds`` and should set bounds by passing a tuple of
# to show this example without ``ConvergenceWarning`` that would otherwise raise.
# In another use case, you probably want to optimise on
# ``baseline_similarity_bounds`` and should set bounds by passing a tuple of

This should fix the rendering issues that show up in the CI build.

Comment on lines +106 to +110
# .. note::
# Here, we freeze ``baseline_similarity_bounds`` by setting it to `"fixed"` in order
# to show this example without ``ConvergenceWarning``s that would otherwise raise.
# In another use case, you probably want to optimise on
# ``baseline_similarity_bounds`` and should set bounds by passing a tuple of
@ogrisel (Member) commented May 9, 2025

Let's keep it simple:

Suggested change
# .. note::
# Here, we freeze ``baseline_similarity_bounds`` by setting it to `"fixed"` in order
# to show this example without ``ConvergenceWarning``s that would otherwise raise.
# In another use case, you probably want to optimise on
# ``baseline_similarity_bounds`` and should set bounds by passing a tuple of
# .. note::
# Here, we freeze the value of ``baseline_similarity`` by setting
# `baseline_similarity_bounds="fixed"` as LBFGS would otherwise fail
# to optimize the value of this kernel parameter for some unknown reason.

I think the observed lack of convergence when passing non-fixed bounds reveals a bug, but I am not familiar enough with the GP literature w.r.t. kernel parameter tuning guarantees to be 100% sure whether this is expected in this particular case. And if it is a bug, I don't know whether it lies in the kernel code, in the choice of parametrization, or in the way we use LBFGS in the tuning code.

Maybe @snath-xoc has an idea? See the discussion in the linked issue for more background.

@snath-xoc (Contributor)

Hi all. At first I thought it might be due to the implementation of the GPC (it uses a binary Laplace approximation of the posterior); however, the same issue occurs with the GPR. Monitoring the log-marginal likelihood values, there also seems to be no change (I get steady values of 3.88 if I increase the number of iterations). Moreover, I get an "ABNORMAL_TERMINATION_IN_LNSRCH" error from L-BFGS with more iterations. This typically occurs when the gradient does not correctly match the function, so the line search fails. I therefore think the problem may lie in the way we calculate _f and _g, i.e., in the kernel specification, and is not a bug in the GPR or GPC code (phew).

I am not entirely sure what the best fix is here. Fixing the baseline similarity bounds seems to give a good enough solution; otherwise, we would need to try a different way of calculating the gradient/jac (any ideas?)
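The function/gradient mismatch suspected above is exactly what `scipy.optimize.check_grad` can surface: it compares an analytic gradient against a finite-difference approximation. A toy sketch (the objective and the deliberately wrong gradient are illustrative stand-ins, not the actual kernel code):

```python
import numpy as np
from scipy.optimize import check_grad

# Toy objective standing in for the log-marginal likelihood as a function
# of a single kernel hyperparameter theta.
def f(theta):
    return np.cosh(theta[0]) + 0.1 * theta[0] ** 2

def grad_correct(theta):
    # d/dtheta of f: sinh(theta) + 0.2 * theta
    return np.array([np.sinh(theta[0]) + 0.2 * theta[0]])

def grad_buggy(theta):
    # Deliberately inconsistent with f, mimicking a kernel whose analytic
    # gradient does not match its value computation.
    return np.array([np.cosh(theta[0]) + 0.2 * theta[0]])

x0 = np.array([1.5])
# check_grad returns the norm of (analytic - finite-difference) gradient:
# near zero for a correct gradient, visibly large for the buggy one.
err_ok = check_grad(f, grad_correct, x0)
err_bad = check_grad(f, grad_buggy, x0)
print(err_ok, err_bad)
```

When the analytic gradient disagrees with the function like this, L-BFGS line searches can fail with the same "ABNORMAL_TERMINATION_IN_LNSRCH" message mentioned above, so running such a check on the kernel's gradient would be one way to localize the bug.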

@ogrisel (Member) commented May 13, 2025

Thanks for your analysis @snath-xoc. @EngineerDanny I am +1 for merging this PR with the more direct message suggested in #31289 (comment) for the time being, but if someone understands how to fix the gradient computation of the kernel to make the optimization work in this example, feel free to say so or open a follow-up PR.

@StefanieSenger StefanieSenger added the Waiting for Second Reviewer First reviewer is done, need a second one! label May 22, 2025
@adrinjalali (Member)

Seems like we have a solution proposal in #31366. Shall we wait for that issue to be closed then?

@StefanieSenger (Contributor)

We could also merge this PR for now and replace the "fixed" bounds with real bounds in the actual fix.

Labels
Waiting for Second Reviewer First reviewer is done, need a second one!
Development

Successfully merging this pull request may close these issues.

Fix ConvergenceWarning in plot_gpr_on_structured_data.py example
5 participants