Skip to content

openssl: fix Valgrind issues in nightly builds #5398

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Feb 18, 2020
Merged

Conversation

pks-t
Copy link
Member

@pks-t pks-t commented Feb 11, 2020

Our nightly builds are reporting quite a lot of issues due to use of uninitialized bytes, as reported in #5394. The root cause is inconsistencies between our normal and our nightly pipelines, where our nightly jobs missed "-DVALGRIND". We thus didn't use the sanitizing macro provided by Valgrind for these bytes, causing the warnings.

This PR fixes two issues: first, it consolidates -DVALGRIND into -DUSE_LEAKS_CHECKER=valgrind to avoid above errors going forward. Fixing this breaks our Bionic builds, though, as they use OpenSSL 1.1, which made struct SSL opaque. The second commit thus switches us over to using our own allocator that initializes all bytes and sets it up for use by OpenSSL, removing the old Valgrind macros. Added benefit is that this works generically across leaks checkers, even though it's not yet wired up for the others.

Closes #5394.

OpenSSL doesn't initialize bytes on purpose in order to generate
additional entropy. Valgrind isn't too happy about that though, causing
it to generate warninings about various issues regarding use of
uninitialized bytes.

We traditionally had some infrastructure to silence these errors in our
OpenSSL stream implementation, where we invoke the Valgrind macro
`VALGRIND_MAKE_MEMDEFINED` in various callbacks that we provide to
OpenSSL. Naturally, we only include these instructions if a preprocessor
define "VALGRIND" is set, and that in turn is only set if passing
"-DVALGRIND" to CMake. We do that in our usual Azure pipelines, but we
in fact forgot to do this in our nightly build. As a result, we get a
slew of warnings for these nightly builds, but not for our normal
builds.

To fix this, we could just add "-DVALGRIND" to our nightly builds. But
starting with commit d827b11 (tests: execute leak checker via CTest
directly, 2019-06-28), we do have a secondary variable that directs
whether we want to use memory sanitizers for our builds. As such, every
user wishing to use Valgrind for our tests needs to pass both options
"VALGRIND" and "USE_LEAK_CHECKER", which is cumbersome and error prone,
as can be seen by our own builds.

Instead, let's consolidate this into a single option, removing the old
"-DVALGRIND" one. Instead, let's just add the preprocessor directive if
USE_LEAK_CHECKER equals "valgrind" and remove "-DVALGRIND" from our own
pipelines.
As OpenSSL loves using uninitialized bytes as another source of entropy,
we need to mark them as defined so that Valgrind won't complain about
use of these bytes. Traditionally, we've been using the macro
`VALGRIND_MAKE_MEM_DEFINED` provided by Valgrind, but starting with
OpenSSL 1.1 the code doesn't compile anymore due to `struct SSL` having
become opaque. As such, we also can't set it as defined anymore, as we
have no way of knowing its size.

Let's change gears instead by just swapping out the allocator functions
of OpenSSL with our own ones. The twist is that instead of calling
`malloc`, we just call `calloc` to have the bytes initialized
automatically. Next to soothing Valgrind, this approach has the benefit
of being completely agnostic of the memory sanitizer and is neatly
contained at a single place.

Note that we shouldn't do this for non-Valgrind builds. As we cannot
set up memory functions for a given SSL context, only, we need to swap
them at a global context. Furthermore, as it's possible to call
`OPENSSL_set_mem_functions` once only, we'd prevent users of libgit2 to
set up their own allocators.
@ethomson
Copy link
Member

We thus didn't use the sanitizing macro provided by Valgrind for these bytes, causing the warnings.

Aha. 😭 Well, that makes sense as least.

@pks-t
Copy link
Member Author

pks-t commented Feb 11, 2020

Btw, failure in nightlies is currently due to ARM32. They're the last thing on my CI spring cleanup list, after that nightlies should be up to task again.

@ethomson
Copy link
Member

Regrettably, the error in the ARM builds is not actually in our code, it's a problem in QEMU when running Go images. Which docker is. One can, as best as I can tell, no longer run docker multiarch images that are ARM32 or ARM64.

I propose that we remove those from the nightly matrix for now. I like testing nightlies on ARM, so we might want to consider putting real hardware here. I had considered this before, and it's probably time to reconsider.

Let's get this in and then release 0.99?

@pks-t
Copy link
Member Author

pks-t commented Feb 18, 2020

Regrettably, the error in the ARM builds is not actually in our code, it's a problem in QEMU when running Go images. Which docker is. One can, as best as I can tell, no longer run docker multiarch images that are ARM32 or ARM64.

There's some good news here: it's not the Qemu/Docker/Go combination in general, but this has been broken by me when introducing gosu for unprivileged builds, which is a Go-based binary. I've replaced it with sudo in #5406, fixing our ARM builds. There's one test failure due to insufficient RAM on ARM32 which I fix in that PR, too, as well as some smallish cleanups to our CI infrastructure.

I propose that we remove those from the nightly matrix for now. I like testing nightlies on ARM, so we might want to consider putting real hardware here. I had considered this before, and it's probably time to reconsider.

So luckily, this is not required. We should just merge #5406 to fix them properly.

Let's get this in and then release 0.99?

Yup, agreed. Let's merge this here and #5406 and then go for v0.99. Care to give the other PR a review so that we can fast-track it? I'll then handle the release on Wednesday or Thursday, depending on how fast we are.

@pks-t pks-t merged commit 5254c9b into master Feb 18, 2020
@pks-t pks-t deleted the pks/valgrind-openssl branch February 18, 2020 17:49
@ethomson
Copy link
Member

There's some good news here: it's not the Qemu/Docker/Go combination in general, but this has been broken by me when introducing gosu for unprivileged builds, which is a Go-based binary. I've replaced it with sudo in #5406, fixing our ARM builds. There's one test failure due to insufficient RAM on ARM32 which I fix in that PR, too, as well as some smallish cleanups to our CI infrastructure.

That's great. I was thinking that the go errors were docker, not gosu -- docker is written in go, IIRC, and I was definitely hitting this locally, even when (trying to) override the entrypoint for the container. Possible that my dockerfu is just weak and I was doing this wrong and still hitting gosu somehow. Thanks for taking care of this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

httpclient: reads on uninitialized memory
2 participants