Skip to content

Detect heap freelist corruption #14054

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from
Closed

Conversation

arnaud-lb
Copy link
Member

@arnaud-lb arnaud-lb commented Apr 26, 2024

This detects corruptions of freelist pointers in the heap by using a shadow pointer.

The pointer is checked for equality with its shadow before dereferencing or before moving it to zend_mm_heap->free_slots. The shadow is obfuscated by XORing with a random key. The key is refreshed every time the heap is shutdown, and also re-seeded if the heap is shutdown in a fork, so different processes or requests always have different keys.

The shadow pointer is stored at the end of the slot. This requires to increase the minimum block size from 8 to 16.

This is a common technique found in various allocators [1] [2] [3]. Edit: also in the suhosin patch.

This makes exploitation of buffer overflows via freelist corruption less practical, as it is now required to override more bytes, and to have knowledge of the the per-request key.

Overhead on the Symfony benchmark is ±0.30% wall time, or +0.64% valgrind Ir.

I've tried a few things to reduce the overhead. For instance we can remove one branch if shadows are also copied to heap->free_slots, because we can then XOR unconditionally:

if (!(heap->free_slots[bin_size].slot ^ heap->free_slots[bin_size].shadow ^ heap->key)) {
    // fast path: pointer is not corrupt and there is a free slot
} else {
    // slow path
}

However the extra copies required to maintain heap->free_slots[bin_size].shadow cancel any benefit.

Related: #13943 (cc @jvoisin)

Comment on lines 1361 to 1431
if (EXPECTED(heap->free_slot[bin_num] != NULL)) {
zend_mm_free_slot *p = heap->free_slot[bin_num];
heap->free_slot[bin_num] = p->next_free_slot;
heap->free_slot[bin_num] = zend_mm_check_next_free_slot(heap, bin_num, p);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

next_free_slot is checked when moving from p->next_free_slot to heap->free_slot[bin_num] (and also in zend_mm_gc()).

@arnaud-lb arnaud-lb force-pushed the freelist-corrupt branch 4 times, most recently from 61c54a8 to 5473c3b Compare April 27, 2024 15:03
@arnaud-lb arnaud-lb marked this pull request as ready for review April 29, 2024 11:52
@jvoisin
Copy link
Contributor

jvoisin commented Apr 29, 2024

The key is refreshed every time the heap is shutdown, and also re-seeded if the heap is shutdown in a fork, so different processes or requests always have different keys.

I don't think that's right. Maybe you meant initialized/created instead of shutdown?

@jvoisin
Copy link
Contributor

jvoisin commented Apr 29, 2024

Do you want to add the bitswap done in partitionaAlloc in this pull-request as well, or keep it as a followup?

Copy link
Contributor

@jvoisin jvoisin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent idea <3

@@ -1248,6 +1272,45 @@ static zend_always_inline int zend_mm_small_size_to_bin(size_t size)

#define ZEND_MM_SMALL_SIZE_TO_BIN(size) zend_mm_small_size_to_bin(size)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be nice to have a comment detailing/explaining the idea behind the shadow.

Comment on lines 1280 to 1347
static zend_always_inline zend_mm_free_slot* zend_mm_decode_free_slot(zend_mm_heap *heap, zend_mm_free_slot *slot)
{
return (zend_mm_free_slot*)((uintptr_t)slot ^ heap->key);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
static zend_always_inline zend_mm_free_slot* zend_mm_decode_free_slot(zend_mm_heap *heap, zend_mm_free_slot *slot)
{
return (zend_mm_free_slot*)((uintptr_t)slot ^ heap->key);
}
#define zend_mm_decode_free_slot zend_mm_encode_free_slot

This explicitly shows that it's the same operation and simplifies a bit the code, I think.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed (I changed the encode/decode functions, so this doesn't apply anymore)

static zend_always_inline void zend_mm_set_next_free_slot(zend_mm_heap *heap, uint32_t bin_num, zend_mm_free_slot *slot, zend_mm_free_slot *next)
{
slot->next_free_slot = next;
*(zend_mm_free_slot**)((char*)slot + bin_shadow_offset[bin_num]) = zend_mm_encode_free_slot(heap, next);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a tad unreadable and not self-explainatory. It would be nice to either hide it behind a SET_SHADOW macro or refactor it a bit, especially since it's used a couple of times.

*(zend_mm_free_slot**)((char*)dest + bin_shadow_offset[bin_num]) = *(zend_mm_free_slot**)((char*)from + bin_shadow_offset[bin_num]);
}

static ZEND_COLD ZEND_NORETURN void zend_mm_free_slot_corrupted(void)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for a dedicated function, it can be replaced with something like zend_mm_panic("zend_mm_heap shadow corrupted");

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reduces the code size a bit because the user of this function is inlined many times, but not that much actually, and zend_mm_panic is also noreturn/cold, so yes we can use it directly.

static zend_result zend_mm_refresh_key(zend_mm_heap *heap)
{
php_random_result result = php_random_algo_xoshiro256starstar.generate(&heap->random_state);
ZEND_ASSERT(result.size == sizeof(uint64_t));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

php_random_result.size is defined as an uint64_t, why this assert?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the ZEND_ASSERT is not necessary here: xoshiro256** is a 64-bit engine and that will not change.

As for the question: You probably misread the assertion. It's not asserting that size of result.size matches that of uint64_t, but rather that the contents of it match a uint64_t. The result.size field indicates how many bytes of result.result actually contain randomness (required to support 32-bit engines, such as Mt19937).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see, thanks!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intent here was to ensure that the engine was indeed 64-bit, and that this invariant survives refactorings in zend_alloc or in ext/random.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the ext/random side, the behavior of xoshiro256** is guaranteed by https://github.com/php/php-src/blob/master/ext/random/tests/02_engine/xoshiro256starstar_value.phpt. If either the output size or implementation would change, that test would fail. That's the whole point of using a well-defined engine.

If you feel more safe with the assert in place, I won't object, just wanted to point out the guarantees of the randomness API for you to make an informed decision.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static php_random_result generate(void *state)
{
	return (php_random_result){
		.size = sizeof(uint64_t),
		.result = generate_state(state),
	};
}

it's indeed pretty obvious from the code, but I guess the assert doesn't hurt, if only to save a look into ext/random/engine_xoshiro256starstar.c to ensure that this PRNG is always returning sizeof(uint64_t) bytes of entropy.

@@ -245,6 +248,7 @@ struct _zend_mm_heap {
size_t size; /* current memory usage */
size_t peak; /* peak memory usage */
#endif
uintptr_t key; /* free slot shadow ptr key */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
uintptr_t key; /* free slot shadow ptr key */
uintptr_t key; /* free slot shadow ptr xor key */

To explain its type: it needs to be a uintptr_t since it's xored with pointers. Also, the name shadow ptr is a bit confusing, since it's not really a pointer it self, but a key to encrypt the free slot ptr.

Copy link
Member

@TimWolla TimWolla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand anything about the allocator, but I've got some comments about the use of randomness 😄

As for the change in ext/random: Is that actually necessary here? If it is, then I probably did #13568 incorrectly and would like to request that you add the ZEND_API for all engines. Thanks!

@jvoisin jvoisin mentioned this pull request Apr 30, 2024
6 tasks
@arnaud-lb
Copy link
Member Author

Thank you for the reviews @jvoisin @TimWolla!

As for the change in ext/random: Is that actually necessary here? If it is, then I probably did #13568 incorrectly and would like to request that you add the ZEND_API for all engines. Thanks!

Honestly I'm not sure the PHPAPI / ZEND_API is actually required, I added that out of habit. I've just noticed now that all algos had PHPAPI in ext/random/php_random.h, and based on your suggestion I've added the flag in all definitions as well. I don't mind reverting that.

The key is refreshed every time the heap is shutdown, and also re-seeded if the heap is shutdown in a fork, so different processes or requests always have different keys.

I don't think that's right. Maybe you meant initialized/created instead of shutdown?

The terminology is confusing, but we use zend_mm_shutdown to clean up and re-init the heap between requests. NB: This also means that the first shutdown may happen in a parent process, so the key is only refreshed after the first request in a child. I plan to change that in a followup.

Do you want to add the bitswap done in partitionaAlloc in this pull-request as well, or keep it as a followup?

This is not as necessary as in partition alloc: In partition alloc the shadow is not enabled in all builds, so the bitswap is the only protection (along with a range check). In our case it increases the minimum overflow by at least 2 bytes, so I've added that now. It adds a bit more overhead but I was able to cancel it by replacing the load of bin_shadow_offset[bin_num] with bin_data_size[bin_num]-8, as it is already loaded in all code paths.

Copy link
Contributor

@jvoisin jvoisin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Comment on lines 1352 to 1366
zend_mm_free_slot *next = slot->next_free_slot;
zend_mm_free_slot *shadow = ZEND_MM_FREE_SLOT_PTR_SHADOW(slot, bin_num);
if (EXPECTED(next != NULL)) {
if (UNEXPECTED(next != zend_mm_decode_free_slot(heap, shadow))) {
zend_mm_panic("zend_mm_heap corrupted");
}
}
return (zend_mm_free_slot*)next;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
zend_mm_free_slot *next = slot->next_free_slot;
zend_mm_free_slot *shadow = ZEND_MM_FREE_SLOT_PTR_SHADOW(slot, bin_num);
if (EXPECTED(next != NULL)) {
if (UNEXPECTED(next != zend_mm_decode_free_slot(heap, shadow))) {
zend_mm_panic("zend_mm_heap corrupted");
}
}
return (zend_mm_free_slot*)next;
zend_mm_free_slot *next = slot->next_free_slot;
if (EXPECTED(next != NULL)) {
zend_mm_free_slot *shadow = ZEND_MM_FREE_SLOT_PTR_SHADOW(slot, bin_num);
if (UNEXPECTED(next != zend_mm_decode_free_slot(heap, shadow))) {
zend_mm_panic("zend_mm_heap corrupted");
}
}
return (zend_mm_free_slot*)next;

#define ZEND_MM_FREE_SLOT_PTR_SHADOW(free_slot, bin_num) \
*((zend_mm_free_slot**)((char*)(free_slot) + bin_data_size[(bin_num)] - sizeof(zend_mm_free_slot*)))

static zend_always_inline zend_mm_free_slot* zend_mm_encode_free_slot(zend_mm_heap *heap, zend_mm_free_slot *slot)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
static zend_always_inline zend_mm_free_slot* zend_mm_encode_free_slot(zend_mm_heap *heap, zend_mm_free_slot *slot)
static zend_always_inline zend_mm_free_slot* zend_mm_encode_free_slot(const zend_mm_heap *heap, const zend_mm_free_slot *slot)

@TimWolla
Copy link
Member

Honestly I'm not sure the PHPAPI / ZEND_API is actually required, I added that out of habit. I've just noticed now that all algos had PHPAPI in ext/random/php_random.h, and based on your suggestion I've added the flag in all definitions as well. I don't mind reverting that.

Ah, I wasn't aware that they had the flag in the header, but not in the C file. While making the adjustments, you missed the “User” engine and Mt19937. I've created #14088 to make this change independently of this PR, because it makes sense as a stand-alone fix.

@arnaud-lb arnaud-lb force-pushed the freelist-corrupt branch from c8aecd9 to 5cbfa37 Compare May 1, 2024 11:28
Copy link
Contributor

@zeriyoshi zeriyoshi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not familiar with the ZendMM implementation, but the use of ext-random seems appropriate!

Copy link
Member

@dstogov dstogov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I probably won't be able to make a deep review until May 13.
Please don't merge this before.
Also think if it's possible to do this without ext/random dependency.

Comment on lines 62 to 63
#include "ext/random/php_random_csprng.h"
#include "ext/random/php_random.h"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not allowed to intraduce Zend -> ext dependencies

Copy link
Member

@dstogov dstogov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The patch makes sense, but I would like to make it a bit better.
Actually, we already had some heap protection in PHP-5 but decided to remove it when optimized MM for speed. Please take a look into PHP-5.6 code.

  1. ext/random dependency should be removed (PHP passes necessary functions to Zend Engine through zend_utility_functions, but I'm not sure if this will work for MM).
  2. the heap protection should be optional (use ZEND_MM_HEAP_PROTECTION macro, similar to ZEND_MM_STAT and others).
  3. try to not change the zend_alloc_sizes.h numbers, but increase the allocated size in zend_mm_alloc_heap if necessary (like we do for ZEND_DEBUG)

Comment on lines 1344 to 1348
static zend_always_inline void zend_mm_copy_next_free_slot(zend_mm_free_slot* dest, uint32_t bin_num, zend_mm_free_slot* from)
{
dest->next_free_slot = from->next_free_slot;
ZEND_MM_FREE_SLOT_PTR_SHADOW(dest, bin_num) = ZEND_MM_FREE_SLOT_PTR_SHADOW(from, bin_num);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't understand why do you need this function. Why don't you use zend_mm_set_next_free_slot()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The difference between the two functions is that zend_mm_copy_next_free_slot() is a pure copy of next_free_slot and the shadow value, so that invalid pointer/shadow pairs remain invalid, while zend_mm_set_next_free_slot() creates a valid shadow out of a pointer value.

In zend_mm_gc(), using zend_mm_copy_next_free_slot() avoids decoding, checking, and re-encoding the pointer, but I'm not sure it matters in this context, and it is actually desirable to check the pointer against its shadow here, so that corruptions are detected earlier. I will replace zend_mm_copy_next_free_slot() by zend_mm_set_next_free_slot(..., zend_mm_check_next_free_slot()).

ZEND_MM_FREE_SLOT_PTR_SHADOW(dest, bin_num) = ZEND_MM_FREE_SLOT_PTR_SHADOW(from, bin_num);
}

static zend_always_inline zend_mm_free_slot *zend_mm_check_next_free_slot(zend_mm_heap *heap, uint32_t bin_num, zend_mm_free_slot* slot)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would name this zend_mm_get_next_free_slot or zend_mm_next_free_slot.
The heap verification is encapsulated and not important the the rest of the MM implementation.

May be it make sense to stay only two functions zend_mm_set_next_free_slot and ````zend_mm_get_next_free_slot`` instead of all the functions above.

Comment on lines 321 to 398
#define _BIN_DATA_SIZE(num, size, elements, pages, x, y) size,
#define _BIN_DATA_SIZE(num, size, elements, pages, x, y) \
/* Need two words for free slot pointer and shadow */ \
MAX(size, sizeof(zend_mm_free_slot*)*2)
#define _BIN_DATA_SIZE_C(num, size, elements, pages, x, y) \
_BIN_DATA_SIZE(num, size, elements, pages, x, y),
static const uint32_t bin_data_size[] = {
ZEND_MM_BINS_INFO(_BIN_DATA_SIZE, x, y)
ZEND_MM_BINS_INFO(_BIN_DATA_SIZE_C, x, y)
};

#define _BIN_DATA_ELEMENTS(num, size, elements, pages, x, y) elements,
#define _BIN_DATA_ELEMENTS(num, size, elements, pages, x, y) \
/* Adjusting size requires adjusting elements */ \
(elements / (_BIN_DATA_SIZE(num, size, elements, pages, x, y) / size))
#define _BIN_DATA_ELEMENTS_C(num, size, elements, pages, x, y) \
_BIN_DATA_ELEMENTS(num, size, elements, pages, x, y),
static const uint32_t bin_elements[] = {
ZEND_MM_BINS_INFO(_BIN_DATA_ELEMENTS, x, y)
ZEND_MM_BINS_INFO(_BIN_DATA_ELEMENTS_C, x, y)
};

#define _BIN_DATA_PAGES(num, size, elements, pages, x, y) pages,
#define _BIN_DATA_PAGES(num, size, elements, pages, x, y) pages
#define _BIN_DATA_PAGES_C(num, size, elements, pages, x, y) \
_BIN_DATA_PAGES(num, size, elements, pages, x, y),
static const uint32_t bin_pages[] = {
ZEND_MM_BINS_INFO(_BIN_DATA_PAGES, x, y)
ZEND_MM_BINS_INFO(_BIN_DATA_PAGES_C, x, y)
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great to avoid these changes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the branch to avoid changing these: 5c29ad5. However, this needs more special cases than updating the bin_ tables as I did before to reflect the effective size and elements.

Copy link
Member

@TimWolla TimWolla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes to ext/random LGTM (except small nit) and the use of randomness appears to be sound. Can't comment on the allocator changes themselves.

goto fallback;
}
} else {
fallback:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fallback:
fallback:

Small nit, not sure if there's any coding standard rules, but I prefer indenting labels with a single space.

This keeps them near the left of the file, making them stand out, while not looking like a function definition to naive diff tools.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ran a quick grep to find what we use most often. We are not super consistent, but we indent labels with tabs in 72% of the case in Zend/ and main/ when excluding generated files (170 labels indented with tabs, 19 with spaces, 46 without indentation).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've did a quick look with GitHub's search using repo:php/php-src /^\s*[a-z]+:/ language:C. Most of the results appear to be default:.

This raises the question: Did you exclude default: from your quick grep? Otherwise that might bias the results.

In any case, I don't care too strongly, but have a preference for the space 😄

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This raises the question: Did you exclude default: from your quick grep? Otherwise that might bias the results.

Ah! There is indeed a lot of default:. After excluding them I get almost exclusively labels without indentation, except in zend_strtod.c, which indents labels with one space.

We keep track of free slots by organizing them in a linked list, with the
first word of every free slot being a pointer to the next one.

In order to make corruptions more difficult to exploit, we check the consistency
of these pointers before dereference by comparing them with a shadow. The shadow
is a copy of the pointer, stored at the end of the slot.

Before this change, an off-by-1 write is enough to produce a valid freelist
pointer. After this change, a bigger out of bound write is required for that.
The difficulty is increase further by XOR'ing the shadow with a secret, and
byte-swapping it, which increases the minimal required out of bound write
length.
arnaud-lb added a commit that referenced this pull request Jun 12, 2024
Co-authored-by: Tim Düsterhus <tim@bastelstu.be>
@arnaud-lb arnaud-lb closed this in 25360ef Jun 12, 2024
@arnaud-lb
Copy link
Member Author

Thank you @dstogov @jvoisin @TimWolla @zeriyoshi!

@mvorisek
Copy link
Contributor

The performance degradation on "Wordpress 6.2 JIT" benchmark is 2%, is there any way it can be improved?

@iluuu1994
Copy link
Member

@arnaud-lb Did you make any real-time performance measurements? It would be good to know how big the impact is (Valgrind is not always reliable).

@iluuu1994
Copy link
Member

@kocsismate Maybe you can run your benchmark suite for this change? That would be really helpful.

@kocsismate
Copy link
Member

@iluuu1994 yes of course! I'll try to run it later today, possibly at night

@arnaud-lb
Copy link
Member Author

The performance degradation on "Wordpress 6.2 JIT" benchmark is 2%, is there any way it can be improved?

Is this the nightly Valgrind measurement?

@arnaud-lb Did you make any real-time performance measurements? It would be good to know how big the impact is (Valgrind is not always reliable).

Yes, I was getting consistently less than 0.30% wall-clock time regression on the symfony-demo and bench.php benchmarks, with a stddev < 0.001.

I didn't check wordpress until now, however. I did just now, and I'm getting very inconsistent results ranging from -0.90% to +2.50% wall-clock time when averaging 5 runs of php-cgi -T50,50, even when comparing the base branch to itself. stddev on this benchmark is around 0.025 after 20 runs.

My benchmarking protocol is this, on a i9:

for f in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo performance > "$f"; done
echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
echo 0 > /proc/sys/kernel/randomize_va_space

# repeated 5 times
# cpus 0-15 are the performance cores 
taskset --cpu-list 0-15 php-cgi -T50,50 ...

With opcache enabled. I take the Elapsed time: output from php-cgi and average the results. From the output of perf stat I can see that task-time, instructions, cycles differ in the same proportions as php-cgi's Elapsed time:. Cache misses, RSS, page faults, L1 metrics, branches, branch-misses differ in much smaller proportions.

@kocsismate
Copy link
Member

kocsismate commented Jun 13, 2024

@iluuu1994 @arnaud-lb I run the benchmark twice, results for Symfony and Laravel are extremely consistent, bench.php is someho very inconsistent again:

https://gist.github.com/kocsismate/1de2eb67fe8725cf287554e36b0f7b76

@iluuu1994
Copy link
Member

Ok! That doesn't sound too bad. Some more conclusive Wordpress tests would be nice. @arnaud-lb Might you get some more consistent results with one core? Although that's likely less representative.

@arnaud-lb
Copy link
Member Author

I'm getting less variance with a single core. The worse run was +1.08% with stddev 0.009, and the best was +0.41% with stddev 0.0026.

@mvorisek
Copy link
Contributor

Do you think the performance can be improved somehow? The benchmarks do not focus on allocation, so allocation heavy tests would probably have even worse results...

@arnaud-lb
Copy link
Member Author

I will check, but I've already spent a considerable amount of time trying to minimize the overhead of this change so I don't expect big improvements.

@kocsismate
Copy link
Member

@arnaud-lb Could you please share some details how you benchmark wordpress? I'd like to add it to my benchmark suite, but I'm not very familiar what config it needs, which page to run the benchmark against, or whether php-cgi needs any environment variable to be passed.

@arnaud-lb
Copy link
Member Author

@kocsismate for the measurements above I've used the same setup as

function runWordpress(bool $jit): array {
, but without valgrind. This benchmark requires a mysql server (there is a docker-compose in benchmark/), and you need the mysqli extension. I've used SCRIPT_FILENAME=$(pwd)/index.php REQUEST_URI=/ as environment, but I believe that benchmark.php doesn't pass any environment variables to php-cgi.

@arnaud-lb
Copy link
Member Author

arnaud-lb commented Jun 14, 2024

@mvorisek do you have some specific code that I could benchmark?

@mvorisek
Copy link
Contributor

I do not have any specific code, but I have in mind some code that will do a lot of allocation. If the Wordpress /w JIT is 1 - 2 % slower, I guess it is possible to put together some edge case code that will be impacted by this PR much more.

@kocsismate
Copy link
Member

@arnaud-lb Thank you very much for the info! It has just come to my mind that I was concerned about Wordpress because of its MySQL dependency which may skew the data, so the measurements represent I/O more than they should. Or shouldn't I be worried about this?

@arnaud-lb
Copy link
Member Author

@kocsismate really I'm not sure. What I observed earlier is that php-cgi was using about 0.9 CPUs during this benchmark according to perf stat, so we are not benchmarking only MySQL here. Also, I was under the impression that CPU time was varying in the same proportion as elapsed time, but looking back at my results I'm not sure anymore.

I would say that if we can reduce the variance to an acceptable amount, it's fine. Otherwise, and if the variance is caused by MySQL / I/O, maybe we can measure the CPU time instead of wall-clock time.

arnaud-lb added a commit that referenced this pull request Mar 31, 2025
Debugging memory corruption issues in production can be difficult when it's not possible to use a debug build or ASAN/MSAN/Valgrind (e.g. for performance reasons).

This change makes it possible to enable some basic heap debugging helpers without rebuilding PHP. This is controlled by the environment variable ZEND_MM_DEBUG. The env var takes a comma-separated list of parameters:

- poison_free=byte: Override freed blocks with the specified byte value (represented as a number)
- poison_alloc=byte: Override newly allocated blocks with the specified byte value (represented as a number)
- padding=bytes: Pad allocated blocks with the specified amount of bytes (if non-zero, a value >= 16 is recommended to not break alignments) 
- check_freelists_on_shutdown=0|1: Enable checking freelist consistency [1] on shutdown

Example:

    ZEND_MM_DEBUG=poison_free=0xbe,poison_alloc=0xeb,padding=16,check_freelists_on_shutdown=1 php ...

This is implemented by installing custom handlers when ZEND_MM_DEBUG is set.

This has zero overhead when ZEND_MM_DEBUG is not set. When ZEND_MM_DEBUG is set, the overhead is about 8.5% on the Symfony Demo benchmark.

Goals:

 - Crash earlier after a memory corruption, to extract a useful backtrace
 - Be usable in production with reasonable overhead
 - Having zero overhead when not enabled

Non-goals:

 - Replace debug builds, valgrind, ASAN, MSAN or other sanitizers

[1] #14054

Co-authored-by: Tim Düsterhus <timwolla@googlemail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants