-
Notifications
You must be signed in to change notification settings - Fork 1.1k
add pdep/pext compress implementation #2242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
We do not currently target BMI2, see this line...
We do not do so because we don't use BMI2 in the haswell kernel. Note that if we are going to use pdep/pext, we are going to have to omit pre-Zen3 AMD processors were these instructions are super slow. It may require an entirely new kernel which is a bit complicated. It is entirely safe to use pdep/pext in the icelake kernel however. |
Why not target BMI2 for Haswell? Even if you want to avoid PDEP/PEXT because of AMD, don't you still think the other instructions are nice? Does this mean that the compiler can't insert BMI2 instructions at all or does it just mean that you don't use BMI2 intrinsics? |
Currently, GCC and LLVM won't use BMI2 in the haswell kernel. The reason we do not do it is because we do not expect benefits. Of course, that's an empirical issue. I will investigate later. :-) |
Note that the initial versions of simdjson used BMI2 (pdep/pext). At least the way we were using it was not clearly beneficial on Intel processors, and, at the time, quite bad for AMD processors. So we progressively removed it. |
(We do use BMI2 in the icelake kernel.) |
Well, according to Agner Fog's instruction tables, (I assume you know what reciprocal throughput is, but honestly I forget which is which sometimes)
The other instructions So it seems to me the only drawback of BMI2 would be pre-Zen3 AMD processors being really slow at So I don't really see the point of disabling BMI2. Just because the custom |
Update: I added |
It is not enabled by default. So we are not disabling it. However, it is interesting to check whether enabling it would be beneficial, and I plan to investigate. As your analysis suggests, the important instruction is
We need the tables because we can't assume that BMI2 is present. |
Thanks. I will review and assess. We might introduce a kernel between icelake and haswell if the performance gains are sufficiently clear. Of course, a Zen 4 or Zen 5 will use the icelake kernel (where we do use pdep and pext). |
My message was an attempt to explain everything that is included in BMI2, not say which instructions are most important to simdjson. Based on a objdump, here are the number of occurrences for each instruction:
They aren't used much, but they do appear in simdjson. And they can eliminate unnecessary internal flag register allocation and possibly eliminate an unnecessary
Haswell has BMI2 though, right? So you can assume that BMI2 is present. LLVM gives you BMI2 for the Haswell cpu target. |
It is an empirical question. By how much does enabling BMI2 in LLVM or GCC improves performance. As I wrote, I plan to review this issue experimentally.
We do not assume that the processor supports the equivalent of Haswell. This is determined at runtime by CPU detection. |
Feel free, but I think you're unlikely to notice a big difference in practice, these are sort of micro-optimizations. Yes, they save the CPU a little work, but it might be able to do that extra work without adding to the runtime.
Right, but I'm only talking about the Haswell kernel? I added BMI2 to the list of features that Haswell supports, so if we dispatch to the Haswell kernel, that code can use BMI2 instructions, especially I am trying to get you to answer the question of why BMI2 is not enabled for the Haswell kernel. You said you would do it if it's beneficial, which you would test empirically. But my question isn't "Should it be enabled?", my question is, "Why not?" I'm asking what's the harm in saving the CPU a small amount of work or power consumption? I'm basically implying that I think the default position is to have BMI2 enabled for the Haswell kernel, whereas you're saying that the default position is to have BMI2 disabled for the Haswell kernel. I'm questioning why you think that way. I think that these instructions were added to the CPU as an optimization, so if the compiler agrees, i.e., wants to insert it in your Haswell specific code, why not just let it? Again, I am not sure that the specific circumstances in which those instructions are inserted are sufficient to see a performance improvement, but the potential is there, so why not? I definitely don't think you'll see performance degradation. Why not just enable it for the Haswell kernel? |
That's sad that GCC does worse with it on. Well, c'est la vie. |
Had this idea, but unfortunately I can't get it to compile so I'm a bit stuck.
I get a couple compile errors that look like this:
HAS_FAST_PDEP_AND_PEXT
become runtime-dispatched code?