Skip to content

Uhashlib fix multiple digest calls #4488

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

Jongy
Copy link
Contributor

@Jongy Jongy commented Feb 10, 2019

Fixes #4119 .

Tested on unix and ESP32. On the ESP32, code size increase by 16 bytes. Worth it IMO to fix this very unintuitive problem.

Concept was taken from CPython, specifically from `Modules/sha1module.c:SHA1Type_digest_impl`.

I deliberately removed the `mbedtls_*_free` calls, as they won't allow continious
digesting, and anyway we can omit them because all they do is `memset(0)` (And it's safe to say
we favor the code size decrease over the possible crypto harm done by not zeroing the hash
structure).
@Jongy Jongy force-pushed the uhashlib-fix-multiple-digest-calls branch from 0ad5ba4 to ae7ada4 Compare February 10, 2019 22:14
@pfalcon
Copy link
Contributor

pfalcon commented Feb 10, 2019

Well, this doesn't just increase code size by 16 bytes, this increases stack usage noticeably. And with all that, there's still no guarantee that this is going to work in the general case, e.g. when real context is stored in hardware, and you copy just some dummy data in memory.

@Jongy
Copy link
Contributor Author

Jongy commented Feb 10, 2019

And with all that, there's still no guarantee that this is going to work in the general case, e.g. when real context is stored in hardware, and you copy just some dummy data in memory.

Well, that's why I didn't modify cc3200's impl which is HW based (and which appears to take care of the multiple digest and partial hashing itself, anyway).

About the stack size - the context structures go from 80~ to 110~ bytes. I don't know what we consider legit here (regarding increasing the stack usage).

Anyway I think this really sucks that you just get wrong results. We should either apply this fix, or, if we don't find this to our liking, perhaps raise an exception on multiple/partial digests?

@pfalcon
Copy link
Contributor

pfalcon commented Feb 10, 2019

About the stack size - the context structures go from 80~ to 110~ bytes.

I myself would consider that big. Everything is relative of course. I also would consider it's sad that for normal usercase, where first all needed data is fed, then digest taken once, there's this extra copy overhead. That's why I myself never considered a solution like that, but at one point I considered to add "finalized" flag, which would allow to call .digest() multiple times, but not feed more data.

@Jongy
Copy link
Contributor Author

Jongy commented Feb 14, 2019

there's this extra copy overhead

memcpy(..., ..., 100~ bytes) is not really comparable to the _final operation in any typical hashing algorithm these days..

I considered to add "finalized" flag, which would allow to call .digest() multiple times

Yeah that's what I started with, but apparently that can't be easily done either. If you don't store the digest yourself, the API of existing hash impls don't expose it very nicely, and you need to do some extra work to "re-finish" the hash, per type.

@dpgeorge
Copy link
Member

Thanks @Jongy for posting this PR. Although it does make it more user friendly, the stack size usage is a real concern, mainly because the final functions (eg mbedtls_sha256_finish_ret) already use a lot of stack internally (eg 300 bytes).

And I'm not sure about the use case of supporting multiple calls to digest, and also partial digests. Is it really needed to do this? If it is needed then I think it's much better to add .copy() (per #4539) so that a copy can be made before calling digest, to support multiple calls to digest.

Given all that, see #7316 which just adds a flag to prevent update/digest from being called after digest is called the first time.

@Jongy
Copy link
Contributor Author

Jongy commented May 25, 2021

Replaced by #7316 which is a different approach to avoid the same issue of getting the wrong result. I think that the core issue was the un-intuitiveness, and not the real need to provide stream hashing.

@Jongy Jongy closed this May 25, 2021
@dpgeorge
Copy link
Member

I think that the core issue was the un-intuitiveness, and not the real need to provide stream hashing.

Yes, that's what I feel as well.

Thanks for closing!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
extmod Relates to extmod/ directory in source
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ESP32 SHA1 on uhashlib is not correct at 2nd or more times.
3 participants