Skip to content

Don't keep the store open in by_store_ctrl_ex #28198

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

mattcaswell
Copy link
Member

Previously #27529 made a change to by_store_ctrl_ex in order to open the OSSL_STORE early. The reason given in that PR is:

"This way, we can call OSSL_STORE_open_ex() in by_store_ctrl_ex(), and get to see possible errors when the URI is loaded"

That PR then kept the store open until cache_objects is called and then reused it. Unfortunately by the time cache_objects() is called we could be in a multi-threaded scenario where the X509_STORE is being shared by multiple threads. We then get a race condition where multiple threads are all using (and ultimately closing) the same OSSL_STORE_CTX.

The purpose of keeping the OSSL_STORE object between by_store_ctrl_ex() and cache_objects is presumably an optimisation to avoid having to open the store twice. But this does not work because of the above issue.

We just take the hit and open it again.

Fixes #28171

Previously openssl#27529 made a change to `by_store_ctrl_ex` in order to open
the OSSL_STORE early. The reason given in that PR is:

"This way, we can call OSSL_STORE_open_ex() in by_store_ctrl_ex(), and
get to see possible errors when the URI is loaded"

That PR then kept the store open until cache_objects is called and then
reused it. Unfortunately by the time cache_objects() is called we could be
in a multi-threaded scenario where the X509_STORE is being shared by multiple
threads. We then get a race condition where multiple threads are all using
(and ultimately closing) the same `OSSL_STORE_CTX`.

The purpose of keeping the `OSSL_STORE` object between by_store_ctrl_ex() and
`cache_objects` is presumably an optimisation to avoid having to open the
store twice. But this does not work because of the above issue.

We just take the hit and open it again.

Fixes openssl#28171
@esyr
Copy link
Contributor

esyr commented Aug 7, 2025

I guess it also fixes #28065.

Copy link
Contributor

@jogme jogme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@t8m
Copy link
Member

t8m commented Aug 8, 2025

I assume this will revert the performance improvement reported in #27529 (comment) though. Unfortunately I am afraid there is no way around that.

Will you take it out of draft @mattcaswell ?

@levitte
Copy link
Member

levitte commented Aug 11, 2025

Unfortunately, there's the risk that there are errors when reopening the URI in cache_objects, i.e. that this effectively reverts what #27529 was meant to fix (invisible errors, 'cause cache_objects can't report errors in any sensible way).

Mind you, I do understand the reasons for this PR, but it seems we're weighing one bad outcome against another.

@kroeckx
Copy link
Member

kroeckx commented Aug 11, 2025

Can you clarify why cache_objects can't report errors in any sensible way?

@levitte
Copy link
Member

levitte commented Aug 11, 2025

Can you clarify why cache_objects can't report errors in any sensible way?

Sure. It goes back to #27461, which was about STORE URIs not being reported on when giving them (when by_store_ctrl_ex is called), but later, when trying to use objects. It's even more exasperating when multiple stores are given, and tried through one after the other, and errors are ignored on the assumption that the desired object simply wasn't found (i.e. that finding it continues in the next store).

So cache_objects can report errors, but depending on where it's called, those reports may be confusing

@kroeckx
Copy link
Member

kroeckx commented Aug 11, 2025

And there is no cache per location?

@levitte
Copy link
Member

levitte commented Aug 11, 2025

Not sure I understand the question... CACHED_STORE is indeed a per-location struct. However, the error queue isn't cached per location.

@mattcaswell
Copy link
Member Author

Unfortunately, there's the risk that there are errors when reopening the URI in cache_objects, i.e. that this effectively reverts what #27529 was meant to fix

That risk existed even in #27529. In #27529 it only reused the currently open store once. After that it reopened it again anyway (because it closed the store at the end of cache_objects and set store->ctx = NULL). Since cache_objects is called regularly there was plenty of opportunity for errors.

@mattcaswell
Copy link
Member Author

Will you take it out of draft @mattcaswell ?

I think there is more work to be done. I'd like to confirm the performance impact (which I hope is negligible) - and I'd like to write a test. Unfortunately that won't happen until next week at the earliest.

@levitte
Copy link
Member

levitte commented Aug 11, 2025

Unfortunately, there's the risk that there are errors when reopening the URI in cache_objects, i.e. that this effectively reverts what #27529 was meant to fix

That risk existed even in #27529. In #27529 it only reused the currently open store once. After that it reopened it again anyway (because it closed the store at the end of cache_objects and set store->ctx = NULL). Since cache_objects is called regularly there was plenty of opportunity for errors.

Oh, good point

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

concurrent TLS connection segfault in x509 storage (regression on 3.0.17)
7 participants