Fix GH-16870: gmp_pow(64, 11) throws overflow exception #16884

cmb69 · 2024-11-21T11:18:32Z

The current guard to prevent FPEs is way too restrictive; 64 ** 11 is a perfectly reasonable operation. Instead, we now estimate the number of bytes of the resulting GMP (assuming that numbers are stored base 256 encoded), and fail if that exceeds a given threshold. The chosen threshold is somewhat arbitrary.

We also ensure that we do not prematurely convert a given non int base to an int to avoid overflow which could circumvent our early check.

Note that the threshold of 2000 seems way too small. To avoid triggering overflow handling during allocations it should rather be close to SIZE_MAX (or whatever libgmp/mpir assume there), but even 10,000 would cause

php-src/ext/gmp/tests/gmp_pow_fpe.phpt

Lines 20 to 29 in ff3b4ec

    
           try { 
        
               gmp_pow(gmp_add(gmp_mul(gmp_init(PHP_INT_MAX), gmp_init(PHP_INT_MAX)), 3), 256); 
        
           } catch (\ValueError $e) { 
        
           	echo $e->getMessage() . PHP_EOL; 
        
           } 
        
           try { 
        
               gmp_pow(gmp_init(PHP_INT_MAX), 256); 
        
           } catch (\ValueError $e) { 
        
           	echo $e->getMessage(); 
        
           }

to no longer throw, but instead yield reasonable results: numbers with a bit less than 10,000 and 5,000 digits, respectively. It's hard for me to tell whether these examples would actually cause libgmp's overflow check to trigger, since mpir doesn't even have these overflow checks.

In any way, it might not be the best idea to push the limits of our early overflow check just to avoid overflow checks to be triggered by libgmp, since we're likely hitting OOM, which causes abort.

cmb69 · 2024-11-21T12:39:32Z

The threshold logic should be unified with #16880 and other places in gmp.c.

Girgias · 2024-11-22T23:11:49Z

ext/gmp/gmp.c


 	if (Z_TYPE_P(base_arg) == IS_LONG && Z_LVAL_P(base_arg) >= 0) {
 		INIT_GMP_RETVAL(gmpnum_result);
-		if ((log(Z_LVAL_P(base_arg)) * exp) > powmax) {
+		if ((log(Z_LVAL_P(base_arg)) / log(256) * exp) > powmax) {


Don't we just care if it can be "allocated reasonably" ?
Which is actually somewhat described in section 5.14 of the GMP manual (mpz_export)

Wonder if we shouldn't use a heuristic based on that?

This is basically what is already happening in this patch. The difference is that I'm counting bytes, not bits. The logic is to estimate the number of bytes the result would have, and if this is greater than some threshold (powmax is badly named; should be rather max_size_in_bytes or so), trigger a value error.

mpz_sizeinbase() is only used in the else branch, because if the input is a zend_long, that wouldn't work. So I'm calculating log256(base_arg) * exp instead.

I'll update to counting bytes.

cmb69 · 2024-11-24T15:02:57Z

Oh, I had only checked with 64bit builds, but now noticed that two test cases are failing on 32bit. I could further reduce max_bits, but I think gmp_pow(20,10) should actually succeed (result is a number with only 14 decimal digits; even 20**10 works); same for the others that are now failing. Maybe we can agree on a threshold (max_bits) first; I will then adapt test cases as needed (and also rebase, if we do not want to target any stable branches).

Girgias · 2024-11-25T13:43:05Z

Oh, I had only checked with 64bit builds, but now noticed that two test cases are failing on 32bit. I could further reduce max_bits, but I think gmp_pow(20,10) should actually succeed (result is a number with only 14 decimal digits; even 20**10 works); same for the others that are now failing. Maybe we can agree on a threshold (max_bits) first; I will then adapt test cases as needed (and also rebase, if we do not want to target any stable branches).

I think this fix is correct, so we may as well apply it to lower branches. We could also have different thresholds for 32 and 64 bit.

cmb69 · 2024-11-25T14:22:14Z

The current threshold is very low; it would only allow for numbers with roughly 2^14 bits (while in theory 2^34 bits on 32bit systems, and 2^37 bits on 64bit systems would be supported). While that would be sufficient for the proposed test cases #16898 and #16896, I think we should actually raise that threshold (and fix the failing test cases). Even 2^24 bits shouldn't cause serious issues regarding memory consumption (would take about 2MB per number; default memory_limit is 128MB). And for users doing some scientific computing, even that might be too low.

cmb69 · 2024-11-25T18:19:52Z

I think this fix is correct,

No, not really. The problem is that mpz_sizeinbase(x, 2) == ceil(log2(x)) for all x >= 2 (I'm ignoring negative bases for now). So there is already some difference regarding the threshold (passing ints would allow for slightly higher bases to be accepted). But the big bummer are zero and one; while log2(0) == log2(1) == 0), mpz_sizeinbase(0, 2) == mpz_sizeinbase(1, 2) == 1, so raising "0" or "1" to the power of x would only be supported for x <= maxbits, while raising 0 or 1 to the power of x would be supported for all x. We could use mpz_sizeinbase(x, 2) - 1 instead (what is floor(log2(x)), but that would still leave a small difference for int bases and other bases (although this would make more sense; you get a bigger range of supported values when not using ints). Alternative would be to drop the special casing for int bases altogether (probably not a big performance improvement anyway).

Negative bases will never take the special casing for ints, but otherwise have the same issue regarding the miscalculation for -1 (mpz_sizeinbase() ignores the sign). Would also be solved if we use mpz_sizeinbase(x, 2) - 1.

PS: from the mpz_sizeinbase() docs (emphasis mine):

Return the size of op measured in number of digits in the given base. base can vary from 2 to 62. The sign of op is ignored, just the absolute value is used. The result will be either exact or 1 too big. If base is a power of 2, the result is always exact. If op is zero the return value is always 1.

So subtracting 1 indeed makes sense.

That is equivalent to `floor(log2(Z_LVAL_P(base_arg)))` and as such more suitable[1]. Since this now fails gmp_pow_fpe.phpt, we use an int there. [1] <php#16884 (comment)>

cmb69 · 2024-11-25T19:09:26Z

Added two commits, rebased onto PHP-8.2 and force-pushed.

The current guard to prevent FPEs is way too restrictive; `64 ** 11` is a perfectly reasonable operation. Instead, we now estimate the number of bytes of the resulting GMP (assuming that numbers are stored base 256 encoded), and fail if that exceeds a given threshold. The chosen threshold is somewhat arbitrary. We also ensure that we do not prematurely convert a given non int base to an int to avoid overflow which could circumvent our early check.

That is equivalent to `floor(log2(Z_LVAL_P(base_arg)))` and as such more suitable[1]. Since this now fails gmp_pow_fpe.phpt, we use an int there. [1] <php#16884 (comment)>

`[-1, 1]` are special, since the result always has a single digit (ignoring the sign). We also test the edge cases regarding the difference between `log2(x)` and `mpz_sizeinbase(x, 2) - 1`, where the latter may support higher numbers (always less than an order of binary magnitude). These tests need to be adapted if we change `max_bits` (what we likely will do).

cmb69 · 2024-11-25T21:36:25Z

Rebased and force-pushed again (it's a pain to switch target branches; I'm not even sure this will now run on PHP-8.2 ; note to self: first change target branch, then force-push rebased PR).

cmb69 · 2024-11-25T23:04:46Z

ext/gmp/gmp.c


 	if (Z_TYPE_P(base_arg) == IS_LONG && Z_LVAL_P(base_arg) >= 0) {
 		INIT_GMP_RETVAL(gmpnum_result);
-		if ((log(Z_LVAL_P(base_arg)) * exp) > powmax) {
+		if ((log2(Z_LVAL_P(base_arg)) * exp) > max_bits) {


For parity with the else branch, this should be

Suggested change

if ((log2(Z_LVAL_P(base_arg)) * exp) > max_bits) {

if ((floor(log2(Z_LVAL_P(base_arg))) * exp) > max_bits) {

But that would fail the last check in gmp_pow_fpe.phpt.

cmb69 · 2024-11-25T23:08:59Z

ext/gmp/gmp.c

@@ -1350,22 +1350,20 @@ ZEND_FUNCTION(gmp_pow)
 		RETURN_THROWS();
 	}

-    double powmax = log((double)ZEND_LONG_MAX);
+	double max_bits = 16000; // TODO: value is very small, but passed current test suite


max_bits is a misnomer now; it's actually something like max_double_bits. Maybe someone has an idea for a good name. Alternatively, we could double the value here, and check against > max_bits / 2.0 below. Shouldn't be a performance issue, if we declare const or use a macro.

Girgias · 2024-11-26T00:02:55Z

ext/gmp/tests/gh16870.phpt

+var_dump((string) gmp_pow(64, 11));
+var_dump((string) gmp_pow(-1, PHP_INT_MAX));
+var_dump((string) gmp_pow(0, PHP_INT_MAX));
+var_dump((string) gmp_pow(1, PHP_INT_MAX));
+var_dump((string) gmp_pow("-1", PHP_INT_MAX));
+var_dump((string) gmp_pow("0", PHP_INT_MAX));
+var_dump((string) gmp_pow("1", PHP_INT_MAX));
+var_dump((string) gmp_pow(3, 10094));


Please add tests that use the ** operator, as those would also be affected I guess? Or did we somehow not covered this in the initial bug fix?

Girgias · 2024-11-26T00:12:45Z

Maybe we shouldn't try to determine the threshold, scientific computation should be allowed to do whatever they want within their memory bounds.

I must say the whole bits computation and comparison is confusing to me, like I understand the logic of it, but the implementation of that is another story, which is why I think your previous fix was correct (ignoring the threshold).

I don't understand why you would want to double the max_bits value to then divide it by 2 afterwards.

cmb69 · 2024-11-26T11:01:05Z

I don't understand why you would want to double the max_bits value to then divide it by 2 afterwards.

We're trying to estimate the number of bits the result needs, but we're actually estimating the number of double-bits (= number of bits divided by 2), since we're using mpz_sizeinbase(x, 2) - 1 (aka. floor(log2(x))). So max_bits is a confusing misnomer.

Maybe we shouldn't try to determine the threshold, scientific computation should be allowed to do whatever they want within their memory bounds.

Usually I'm very fine with this approach, but libgmp aborts() on out of memory conditions, what is bad for NTS, but particularly bad for ZTS SAPIs, where all worker threads will be killed. Okay, maybe not that much of an issue in practice, since on OOM you've already lost, although libgmp aborts when malloc fails. So there may be ample memory available, but just not enough to calculate a very huge number.

Anyhow, this is not a hill I'm willing to die on. I think we can just go with #16934 instead.

Girgias · 2024-11-26T21:00:11Z

I think the issue found by the fuzzer is just very theoretical, and unlikely TM to happen.

Ideally, setting up GMP to use ZendMM would be best, but that apparently also has issues which is annoying...

github-actions bot added the Extension: gmp label Nov 21, 2024

cmb69 linked an issue Nov 21, 2024 that may be closed by this pull request

gmp_pow(64, 11) throws overflow exception #16870

Closed

This was referenced Nov 21, 2024

gmp_pow(64, 11) throws overflow exception #16870

Closed

ext/gmp: add test for uses of gmp_pow with number sizes commonly used in cryptography #16896

Closed

cmb69 mentioned this pull request Nov 22, 2024

Add test for bug GH-16870 (Overly restrictive gmp_pow overflow checks) #16898

Open

Girgias reviewed Nov 22, 2024

View reviewed changes

cmb69 marked this pull request as ready for review November 24, 2024 15:02

cmb69 force-pushed the cmb/gh16870 branch from 1940580 to 5cb70a7 Compare November 25, 2024 19:08

cmb69 changed the base branch from master to PHP-8.2 November 25, 2024 19:09

cmb69 added 5 commits November 25, 2024 22:34

Remove now unused variable

4acc1b2

Count bits instead of bytes

9ce9a4e

Use mpz_sizeinbase(gmpnum_base, 2) - 1

333524e

That is equivalent to `floor(log2(Z_LVAL_P(base_arg)))` and as such more suitable[1]. Since this now fails gmp_pow_fpe.phpt, we use an int there. [1] <php#16884 (comment)>

cmb69 force-pushed the cmb/gh16870 branch from 5cb70a7 to 3998435 Compare November 25, 2024 21:35

cmb69 commented Nov 25, 2024

View reviewed changes

Girgias reviewed Nov 26, 2024

View reviewed changes

cmb69 closed this Nov 26, 2024

cmb69 deleted the cmb/gh16870 branch November 26, 2024 11:02

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix GH-16870: gmp_pow(64, 11) throws overflow exception #16884

Fix GH-16870: gmp_pow(64, 11) throws overflow exception #16884

cmb69 commented Nov 21, 2024

cmb69 commented Nov 21, 2024

Girgias Nov 22, 2024

cmb69 Nov 24, 2024

cmb69 commented Nov 24, 2024

Girgias commented Nov 25, 2024

cmb69 commented Nov 25, 2024

cmb69 commented Nov 25, 2024 •

edited

Loading

cmb69 commented Nov 25, 2024

cmb69 commented Nov 25, 2024 •

edited

Loading

cmb69 Nov 25, 2024

cmb69 Nov 25, 2024

Girgias Nov 26, 2024

Girgias commented Nov 26, 2024

cmb69 commented Nov 26, 2024

Girgias commented Nov 26, 2024

	try {
	gmp_pow(gmp_add(gmp_mul(gmp_init(PHP_INT_MAX), gmp_init(PHP_INT_MAX)), 3), 256);
	} catch (\ValueError $e) {
	echo $e->getMessage() . PHP_EOL;
	}
	try {
	gmp_pow(gmp_init(PHP_INT_MAX), 256);
	} catch (\ValueError $e) {
	echo $e->getMessage();
	}

	if ((log2(Z_LVAL_P(base_arg)) * exp) > max_bits) {
	if ((floor(log2(Z_LVAL_P(base_arg))) * exp) > max_bits) {

Fix GH-16870: gmp_pow(64, 11) throws overflow exception #16884

Fix GH-16870: gmp_pow(64, 11) throws overflow exception #16884

Conversation

cmb69 commented Nov 21, 2024

cmb69 commented Nov 21, 2024

Girgias Nov 22, 2024

Choose a reason for hiding this comment

cmb69 Nov 24, 2024

Choose a reason for hiding this comment

cmb69 commented Nov 24, 2024

Girgias commented Nov 25, 2024

cmb69 commented Nov 25, 2024

cmb69 commented Nov 25, 2024 • edited Loading

cmb69 commented Nov 25, 2024

cmb69 commented Nov 25, 2024 • edited Loading

cmb69 Nov 25, 2024

Choose a reason for hiding this comment

cmb69 Nov 25, 2024

Choose a reason for hiding this comment

Girgias Nov 26, 2024

Choose a reason for hiding this comment

Girgias commented Nov 26, 2024

cmb69 commented Nov 26, 2024

Girgias commented Nov 26, 2024

cmb69 commented Nov 25, 2024 •

edited

Loading

cmb69 commented Nov 25, 2024 •

edited

Loading