[WebAssembly] Fix lowering of (extending) loads from addrspace(1) globals #155937
Conversation
@llvm/pr-subscribers-backend-webassembly

Author: None (QuantumSegfault)

Changes

Implements custom EXTLOAD lowering logic so that extending loads from WASM (address space 1) globals are lowered correctly, similarly to the existing LOAD logic. These fixes also ensure that the global.get is emitted at the type of the underlying global (accounting for globals declared smaller than i32) and that the loaded value is then extended or truncated to the desired result width (preventing invalid instruction sequences). Loads from integral or floating-point globals of widths other than 8, 16, 32, or 64 bits are explicitly disallowed.

Full diff: https://github.com/llvm/llvm-project/pull/155937.diff

2 Files Affected:
diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp b/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
index 5a45134692865..343f85e475016 100644
--- a/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
@@ -19,6 +19,7 @@
#include "WebAssemblyTargetMachine.h"
#include "WebAssemblyUtilities.h"
#include "llvm/CodeGen/CallingConvLower.h"
+#include "llvm/CodeGen/ISDOpcodes.h"
#include "llvm/CodeGen/MachineFrameInfo.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/MachineJumpTableInfo.h"
@@ -111,6 +112,17 @@ WebAssemblyTargetLowering::WebAssemblyTargetLowering(
}
}
+ // Likewise, transform extending loads for address space 1
+ for (auto T : {MVT::i32, MVT::i64}) {
+ for (auto S : {MVT::i8, MVT::i16, MVT::i32}) {
+ if (T != S) {
+ setLoadExtAction(ISD::EXTLOAD, T, S, Custom);
+ setLoadExtAction(ISD::ZEXTLOAD, T, S, Custom);
+ setLoadExtAction(ISD::SEXTLOAD, T, S, Custom);
+ }
+ }
+ }
+
setOperationAction(ISD::GlobalAddress, MVTPtr, Custom);
setOperationAction(ISD::GlobalTLSAddress, MVTPtr, Custom);
setOperationAction(ISD::ExternalSymbol, MVTPtr, Custom);
@@ -1707,6 +1719,11 @@ static bool IsWebAssemblyGlobal(SDValue Op) {
if (const GlobalAddressSDNode *GA = dyn_cast<GlobalAddressSDNode>(Op))
return WebAssembly::isWasmVarAddressSpace(GA->getAddressSpace());
+ if (Op->getOpcode() == WebAssemblyISD::Wrapper)
+ if (const GlobalAddressSDNode *GA =
+ dyn_cast<GlobalAddressSDNode>(Op->getOperand(0)))
+ return WebAssembly::isWasmVarAddressSpace(GA->getAddressSpace());
+
return false;
}
@@ -1764,16 +1781,110 @@ SDValue WebAssemblyTargetLowering::LowerLoad(SDValue Op,
LoadSDNode *LN = cast<LoadSDNode>(Op.getNode());
const SDValue &Base = LN->getBasePtr();
const SDValue &Offset = LN->getOffset();
+ ISD::LoadExtType ExtType = LN->getExtensionType();
+ EVT ResultType = LN->getValueType(0);
if (IsWebAssemblyGlobal(Base)) {
if (!Offset->isUndef())
report_fatal_error(
"unexpected offset when loading from webassembly global", false);
- SDVTList Tys = DAG.getVTList(LN->getValueType(0), MVT::Other);
- SDValue Ops[] = {LN->getChain(), Base};
- return DAG.getMemIntrinsicNode(WebAssemblyISD::GLOBAL_GET, DL, Tys, Ops,
- LN->getMemoryVT(), LN->getMemOperand());
+ EVT GT = MVT::INVALID_SIMPLE_VALUE_TYPE;
+
+ if (const GlobalAddressSDNode *GA = dyn_cast<GlobalAddressSDNode>(Base))
+ GT = EVT::getEVT(GA->getGlobal()->getValueType());
+ if (Base->getOpcode() == WebAssemblyISD::Wrapper)
+ if (const GlobalAddressSDNode *GA =
+ dyn_cast<GlobalAddressSDNode>(Base->getOperand(0)))
+ GT = EVT::getEVT(GA->getGlobal()->getValueType());
+
+ if (GT != MVT::i8 && GT != MVT::i16 && GT != MVT::i32 && GT != MVT::i64 &&
+ GT != MVT::f32 && GT != MVT::f64)
+ report_fatal_error("encountered unexpected global type for Base when "
+ "loading from webassembly global",
+ false);
+
+ EVT PromotedGT = (GT == MVT::i8 || GT == MVT::i16) ? MVT::i32 : GT;
+
+ if (ExtType == ISD::NON_EXTLOAD) {
+ // A normal, non-extending load may try to load more or less than the
+ // underlying global, which is invalid. We lower this to a load of the
+ // global (i32 or i64) then truncate or extend as needed
+
+ // Modify the MMO to load the full global
+ MachineMemOperand *OldMMO = LN->getMemOperand();
+ MachineMemOperand *NewMMO = DAG.getMachineFunction().getMachineMemOperand(
+ OldMMO->getPointerInfo(), OldMMO->getFlags(),
+ LLT(PromotedGT.getSimpleVT()), OldMMO->getBaseAlign(),
+ OldMMO->getAAInfo(), OldMMO->getRanges(), OldMMO->getSyncScopeID(),
+ OldMMO->getSuccessOrdering(), OldMMO->getFailureOrdering());
+
+ SDVTList Tys = DAG.getVTList(PromotedGT, MVT::Other);
+ SDValue Ops[] = {LN->getChain(), Base};
+ SDValue GlobalGetNode = DAG.getMemIntrinsicNode(
+ WebAssemblyISD::GLOBAL_GET, DL, Tys, Ops, PromotedGT, NewMMO);
+
+ if (ResultType.bitsEq(PromotedGT)) {
+ return GlobalGetNode;
+ }
+
+ SDValue ValRes;
+ if (ResultType.isFloatingPoint())
+ ValRes = DAG.getFPExtendOrRound(GlobalGetNode, DL, ResultType);
+ else
+ ValRes = DAG.getAnyExtOrTrunc(GlobalGetNode, DL, ResultType);
+
+ return DAG.getMergeValues({ValRes, LN->getChain()}, DL);
+ }
+
+ if (ExtType == ISD::ZEXTLOAD || ExtType == ISD::SEXTLOAD) {
+ // Turn the unsupported load into an EXTLOAD followed by an
+ // explicit zero/sign extend inreg. Same as Expand
+
+ SDValue Result =
+ DAG.getExtLoad(ISD::EXTLOAD, DL, ResultType, LN->getChain(), Base,
+ LN->getMemoryVT(), LN->getMemOperand());
+ SDValue ValRes;
+ if (ExtType == ISD::SEXTLOAD)
+ ValRes = DAG.getNode(ISD::SIGN_EXTEND_INREG, DL, Result.getValueType(),
+ Result, DAG.getValueType(LN->getMemoryVT()));
+ else
+ ValRes = DAG.getZeroExtendInReg(Result, DL, LN->getMemoryVT());
+
+ return DAG.getMergeValues({ValRes, LN->getChain()}, DL);
+ }
+
+ if (ExtType == ISD::EXTLOAD) {
+ // Expand the EXTLOAD into a regular LOAD of the global, and if
+ // needed, a zero-extension
+
+ EVT OldLoadType = LN->getMemoryVT();
+ EVT NewLoadType = (OldLoadType == MVT::i8 || OldLoadType == MVT::i16)
+ ? MVT::i32
+ : OldLoadType;
+
+ // Modify the MMO to load a whole WASM "register"'s worth
+ MachineMemOperand *OldMMO = LN->getMemOperand();
+ MachineMemOperand *NewMMO = DAG.getMachineFunction().getMachineMemOperand(
+ OldMMO->getPointerInfo(), OldMMO->getFlags(),
+ LLT(NewLoadType.getSimpleVT()), OldMMO->getBaseAlign(),
+ OldMMO->getAAInfo(), OldMMO->getRanges(), OldMMO->getSyncScopeID(),
+ OldMMO->getSuccessOrdering(), OldMMO->getFailureOrdering());
+
+ SDValue Result =
+ DAG.getLoad(NewLoadType, DL, LN->getChain(), Base, NewMMO);
+
+ if (NewLoadType != ResultType) {
+ SDValue ValRes = DAG.getNode(ISD::ANY_EXTEND, DL, ResultType, Result);
+ return DAG.getMergeValues({ValRes, LN->getChain()}, DL);
+ }
+
+ return Result;
+ }
+
+ report_fatal_error(
+ "encountered unexpected ExtType when loading from webassembly global",
+ false);
}
if (std::optional<unsigned> Local = IsWebAssemblyLocal(Base, DAG)) {
diff --git a/llvm/test/CodeGen/WebAssembly/lower-load-wasm-global.ll b/llvm/test/CodeGen/WebAssembly/lower-load-wasm-global.ll
new file mode 100644
index 0000000000000..0112296df1aa8
--- /dev/null
+++ b/llvm/test/CodeGen/WebAssembly/lower-load-wasm-global.ll
@@ -0,0 +1,185 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s | FileCheck %s
+
+; Test various loads from WASM (address space 1) globals lower as intended
+
+target triple = "wasm32-unknown-unknown"
+
+
+@globalI8 = local_unnamed_addr addrspace(1) global i8 undef
+@globalI32 = local_unnamed_addr addrspace(1) global i32 undef
+@globalI64 = local_unnamed_addr addrspace(1) global i64 undef
+
+
+define i32 @zext_i8_i32() {
+; CHECK-LABEL: zext_i8_i32:
+; CHECK: .functype zext_i8_i32 () -> (i32)
+; CHECK-NEXT: # %bb.0:
+; CHECK-NEXT: global.get globalI32
+; CHECK-NEXT: i32.const 255
+; CHECK-NEXT: i32.and
+; CHECK-NEXT: # fallthrough-return
+ %v = load i8, ptr addrspace(1) @globalI32
+ %e = zext i8 %v to i32
+ ret i32 %e
+}
+
+define i32 @sext_i8_i32() {
+; CHECK-LABEL: sext_i8_i32:
+; CHECK: .functype sext_i8_i32 () -> (i32)
+; CHECK-NEXT: # %bb.0:
+; CHECK-NEXT: global.get globalI32
+; CHECK-NEXT: i32.extend8_s
+; CHECK-NEXT: # fallthrough-return
+ %v = load i8, ptr addrspace(1) @globalI32
+ %e = sext i8 %v to i32
+ ret i32 %e
+}
+
+define i32 @zext_i16_i32() {
+; CHECK-LABEL: zext_i16_i32:
+; CHECK: .functype zext_i16_i32 () -> (i32)
+; CHECK-NEXT: # %bb.0:
+; CHECK-NEXT: global.get globalI32
+; CHECK-NEXT: i32.const 65535
+; CHECK-NEXT: i32.and
+; CHECK-NEXT: # fallthrough-return
+ %v = load i16, ptr addrspace(1) @globalI32
+ %e = zext i16 %v to i32
+ ret i32 %e
+}
+
+define i32 @sext_i16_i32() {
+; CHECK-LABEL: sext_i16_i32:
+; CHECK: .functype sext_i16_i32 () -> (i32)
+; CHECK-NEXT: # %bb.0:
+; CHECK-NEXT: global.get globalI32
+; CHECK-NEXT: i32.extend16_s
+; CHECK-NEXT: # fallthrough-return
+ %v = load i16, ptr addrspace(1) @globalI32
+ %e = sext i16 %v to i32
+ ret i32 %e
+}
+
+
+define i64 @zext_i8_i64() {
+; CHECK-LABEL: zext_i8_i64:
+; CHECK: .functype zext_i8_i64 () -> (i64)
+; CHECK-NEXT: # %bb.0:
+; CHECK-NEXT: global.get globalI64
+; CHECK-NEXT: i64.const 255
+; CHECK-NEXT: i64.and
+; CHECK-NEXT: # fallthrough-return
+ %v = load i8, ptr addrspace(1) @globalI64
+ %e = zext i8 %v to i64
+ ret i64 %e
+}
+
+define i64 @sext_i8_i64() {
+; CHECK-LABEL: sext_i8_i64:
+; CHECK: .functype sext_i8_i64 () -> (i64)
+; CHECK-NEXT: # %bb.0:
+; CHECK-NEXT: global.get globalI64
+; CHECK-NEXT: i64.extend8_s
+; CHECK-NEXT: # fallthrough-return
+ %v = load i8, ptr addrspace(1) @globalI64
+ %e = sext i8 %v to i64
+ ret i64 %e
+}
+
+define i64 @zext_i16_i64() {
+; CHECK-LABEL: zext_i16_i64:
+; CHECK: .functype zext_i16_i64 () -> (i64)
+; CHECK-NEXT: # %bb.0:
+; CHECK-NEXT: global.get globalI64
+; CHECK-NEXT: i64.const 65535
+; CHECK-NEXT: i64.and
+; CHECK-NEXT: # fallthrough-return
+ %v = load i16, ptr addrspace(1) @globalI64
+ %e = zext i16 %v to i64
+ ret i64 %e
+}
+
+define i64 @sext_i16_i64() {
+; CHECK-LABEL: sext_i16_i64:
+; CHECK: .functype sext_i16_i64 () -> (i64)
+; CHECK-NEXT: # %bb.0:
+; CHECK-NEXT: global.get globalI64
+; CHECK-NEXT: i64.extend16_s
+; CHECK-NEXT: # fallthrough-return
+ %v = load i16, ptr addrspace(1) @globalI64
+ %e = sext i16 %v to i64
+ ret i64 %e
+}
+
+define i64 @zext_i32_i64() {
+; CHECK-LABEL: zext_i32_i64:
+; CHECK: .functype zext_i32_i64 () -> (i64)
+; CHECK-NEXT: # %bb.0:
+; CHECK-NEXT: global.get globalI64
+; CHECK-NEXT: i64.const 4294967295
+; CHECK-NEXT: i64.and
+; CHECK-NEXT: # fallthrough-return
+ %v = load i32, ptr addrspace(1) @globalI64
+ %e = zext i32 %v to i64
+ ret i64 %e
+}
+
+define i64 @sext_i32_i64() {
+; CHECK-LABEL: sext_i32_i64:
+; CHECK: .functype sext_i32_i64 () -> (i64)
+; CHECK-NEXT: # %bb.0:
+; CHECK-NEXT: global.get globalI64
+; CHECK-NEXT: i64.extend32_s
+; CHECK-NEXT: # fallthrough-return
+ %v = load i32, ptr addrspace(1) @globalI64
+ %e = sext i32 %v to i64
+ ret i64 %e
+}
+
+
+define i64 @load_i64_from_i32() {
+; CHECK-LABEL: load_i64_from_i32:
+; CHECK: .functype load_i64_from_i32 () -> (i64)
+; CHECK-NEXT: # %bb.0:
+; CHECK-NEXT: global.get globalI32
+; CHECK-NEXT: i64.extend_i32_u
+; CHECK-NEXT: # fallthrough-return
+ %v = load i64, ptr addrspace(1) @globalI32
+ ret i64 %v
+}
+
+define i32 @load_i32_from_i64() {
+; CHECK-LABEL: load_i32_from_i64:
+; CHECK: .functype load_i32_from_i64 () -> (i32)
+; CHECK-NEXT: # %bb.0:
+; CHECK-NEXT: global.get globalI64
+; CHECK-NEXT: i32.wrap_i64
+; CHECK-NEXT: # fallthrough-return
+ %v = load i32, ptr addrspace(1) @globalI64
+ ret i32 %v
+}
+
+define i8 @load_i8() {
+; CHECK-LABEL: load_i8:
+; CHECK: .functype load_i8 () -> (i32)
+; CHECK-NEXT: # %bb.0:
+; CHECK-NEXT: global.get globalI8
+; CHECK-NEXT: # fallthrough-return
+ %v = load i8, ptr addrspace(1) @globalI8
+ ret i8 %v
+}
+
+define i64 @load_i16_from_i8_zext_to_i64() {
+; CHECK-LABEL: load_i16_from_i8_zext_to_i64:
+; CHECK: .functype load_i16_from_i8_zext_to_i64 () -> (i64)
+; CHECK-NEXT: # %bb.0:
+; CHECK-NEXT: global.get globalI8
+; CHECK-NEXT: i64.extend_i32_u
+; CHECK-NEXT: i64.const 65535
+; CHECK-NEXT: i64.and
+; CHECK-NEXT: # fallthrough-return
+ %v = load i16, ptr addrspace(1) @globalI8
+ %e = zext i16 %v to i64
+ ret i64 %e
+}
I've never worked with the LLVM code before, so please let me know if there was a simpler or cleaner way to do any of that.
I ran the test-suite, and my changes introduced regressions in three tests.
The SIMD test fails now because there are various ANDs that are no longer being folded into the zext loads. I'm guessing that's because I marked extending loads "Custom" and it can no longer reason about it? The other two tests seem to be failing for similar reasons. What should I do about it? Is my code flawed, or do the tests need to be regenerated?
(force-pushed from 6e3d317 to ce78c24)
Hey! Good work on your first PR to LLVM. I've only contributed a few patches to WebAssembly, but I figured you'd want some reviews on the PR. I just have a few comments. Also, if you can add some description of what the PR is doing, that'd be great; I'm a bit hazy about what it's doing. I'll approve the CI to run. Seems you have a few bugs with SIMD; I'll check it out later and see if I can come up with anything.
// global (i32 or i64) then truncate or extend as needed

// Modify the MMO to load the full global
MachineMemOperand *OldMMO = LN->getMemOperand();
seems like this can be made into a local helper since you're reusing it at the bottom as well?
So is this the correct approach (and worth making into a helper)? I wasn't sure whether or not it's safe to modify the existing MMO, so I figured I'd better clone it. Otherwise, I could call MachineMemOperand::setType on the existing MMO.
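If cloning does turn out to be the right call, a minimal sketch of such a helper could look like the following. It is illustrative only (the helper name is mine); it just wraps the same getMachineMemOperand call the patch already uses in both places:

static MachineMemOperand *cloneMMOWithType(SelectionDAG &DAG,
                                           const MachineMemOperand *OldMMO,
                                           EVT NewVT) {
  // Clone an existing MMO, changing only the memory type, mirroring the
  // getMachineMemOperand call used in the patch above.
  return DAG.getMachineFunction().getMachineMemOperand(
      OldMMO->getPointerInfo(), OldMMO->getFlags(),
      LLT(NewVT.getSimpleVT()), OldMMO->getBaseAlign(), OldMMO->getAAInfo(),
      OldMMO->getRanges(), OldMMO->getSyncScopeID(),
      OldMMO->getSuccessOrdering(), OldMMO->getFailureOrdering());
}

Both call sites in LowerLoad could then pass PromotedGT or NewLoadType as NewVT.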
looks safe enough, LLT in each MMO is per SDNode with no references
But could the MMO as a whole ever be shared? Or is that also created per node at first?
Hmm, this I actually don't know; it's possible. I'll tag someone in after finishing the review.
(force-pushed from ce78c24 to 37e324d)
(force-pushed from 7701611 to 24b14e4)
The issue (#155880) I opened explains the problems I'm solving with this PR. Is there something in particular you need explained further? Should I edit the PR description with some details from the issue?
I think I grasp the general idea now after taking some time to review. Ty! Looks good to me. One thing I noticed locally: yes, the combine of the load and extend is no longer happening. You can run the update test checks again on simd.ll so that CI actually passes all checks for now, and redo it after you have added PerformDAGCombine. Tagging @sparker-arm @dschuff for some reviews.
Thanks for the ping, I did actually take a look at it, and in particular the issue with the simd tests (since as you both observed, the load+extend is not getting turned into load_s). It's not quite obvious to me yet what is happening, since I would have thought that if this custom lowering declines to actually lower the instruction, it would fall back on the default lowering path including all of the DAG combines. But obviously that's not what's happening. I didn't get as far as trying to figure out exactly which code is running in each case and why.
The odd thing: it's only when +simd128 is enabled that it triggers this problem.

define i32 @extract_var_v16i8_s(<16 x i8> %v, i32 %i) {
%elem = extractelement <16 x i8> %v, i32 %i
%a = sext i8 %elem to i32
ret i32 %a
}

Without any flags (no SIMD):

.globl extract_var_v16i8_s # -- Begin function extract_var_v16i8_s
.type extract_var_v16i8_s,@function
extract_var_v16i8_s: # @extract_var_v16i8_s
.functype extract_var_v16i8_s (i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32) -> (i32)
.local i32
# %bb.0:
global.get __stack_pointer
i32.const 16
i32.sub
local.tee 17
local.get 15
i32.store8 15
local.get 17
local.get 14
i32.store8 14
# ... some more stores, omitted for brevity
local.get 17
local.get 1
i32.store8 1
local.get 17
local.get 0
i32.store8 0
local.get 17
local.get 16
i32.const 15
i32.and
i32.or
i32.load8_s 0
# fallthrough-return
end_function

With SIMD128 enabled:

.globl extract_var_v16i8_s # -- Begin function extract_var_v16i8_s
.type extract_var_v16i8_s,@function
extract_var_v16i8_s: # @extract_var_v16i8_s
.functype extract_var_v16i8_s (v128, i32) -> (i32)
.local i32
# %bb.0:
global.get __stack_pointer
i32.const 16
i32.sub
local.tee 2
local.get 0
v128.store 0
local.get 2
local.get 1
i32.const 15
i32.and
i32.or
i32.load8_u 0
i32.extend8_s
# fallthrough-return
end_function

Note the difference between the ends... Does that imply a bug with WASM's SIMD128 handling?
Okay, so examining the DAGs, it seems to be a problem with when our instructions are lowered. Without SIMD, the type-legalized DAG already has a sign-extending load. When SIMD128 is enabled, the load is not introduced until the full legalization phase (since that is where the extractelement gets lowered), so the combine never sees it in time. So how can I tell it it's allowed to recombine after legalization, as long as the load is not from addrspace(1)?

No SIMD: type-legalized, optimized type-legalized, and legalized DAG dumps (omitted here).

+SIMD128: type-legalized, optimized type-legalized, legalized, and optimized legalized DAG dumps (omitted here).
I think a more direct comparison is with and without your patch. I did this yesterday for this case.

Without patch: (DAG dumps omitted)

With patch: (DAG dumps omitted)

Here everything is the same until the last step, where it seems the DAG combiner is turning the load and extend into a single extending load only without the patch.
Okay, I'm not quite finished digging, but I think I roughly know why it's failing now. Before my patch, the combination of the extend and the load into an extending load happened in the generic DAG combiner. But now that ZEXTLOAD is handled Custom, it can't. With the patch applied, and SIMD disabled, the combination falls through and gets handled instead by an earlier combiner run. But for some reason (despite appearing otherwise the same), with SIMD enabled it doesn't.
Okay.

if (LegalOperations &&
    !TLI.isLoadExtLegal(ExtType, Load->getValueType(0), MemVT))
  return false;

It all comes down to the same check: once the extending loads are marked Custom, isLoadExtLegal returns false and the combine bails. Not sure what to do about it.
(force-pushed from 24b14e4 to 8a03ea5)
@@ -3637,6 +3755,184 @@ static SDValue performMulCombine(SDNode *N,
}
}

static SDValue performANDCombine(SDNode *N,
Well, I found one solution, but I don't like it. Duplicate the combiner logic here.
It works for simd.ll, but instruction cost calculation for the LoopVectorize related tests is still off.
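Roughly, the duplicated logic has the following shape. This is an illustrative sketch of the idea only, not necessarily the exact code in this revision of the PR; it assumes a WebAssembly-specific AND combine (living next to the other perform*Combine helpers in WebAssemblyISelLowering.cpp and dispatched from PerformDAGCombine on ISD::AND), and the guards are mine:

// Sketch: re-fold (and (extending load), low-bit mask) into a zero-extending
// load for ordinary (address space 0) loads, since the generic combiner now
// skips this fold once the ext-load action is marked Custom.
static SDValue performANDCombine(SDNode *N, SelectionDAG &DAG) {
  EVT VT = N->getValueType(0);
  auto *Mask = dyn_cast<ConstantSDNode>(N->getOperand(1));
  auto *LN = dyn_cast<LoadSDNode>(N->getOperand(0));
  if (!Mask || !LN || !LN->isSimple() || !N->getOperand(0).hasOneUse())
    return SDValue();
  // Leave addrspace(1) wasm-global loads to the custom lowering above.
  if (LN->getAddressSpace() != 0)
    return SDValue();
  ISD::LoadExtType ExtTy = LN->getExtensionType();
  if (ExtTy != ISD::EXTLOAD && ExtTy != ISD::ZEXTLOAD)
    return SDValue();
  EVT MemVT = LN->getMemoryVT();
  // The mask must cover exactly the bits loaded from memory.
  if (Mask->getAPIntValue() != APInt::getLowBitsSet(VT.getScalarSizeInBits(),
                                                    MemVT.getScalarSizeInBits()))
    return SDValue();
  SDValue ZExt = DAG.getExtLoad(ISD::ZEXTLOAD, SDLoc(N), VT, LN->getChain(),
                                LN->getBasePtr(), MemVT, LN->getMemOperand());
  DAG.ReplaceAllUsesOfValueWith(SDValue(LN, 1), ZExt.getValue(1));
  return ZExt;
}

The point is simply to redo the (and (extload), mask) -> zextload fold for ordinary memory loads that the generic combiner now refuses to touch.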
Yes, I agree... this shouldn't be the way to go.
DAGCombiner::reduceLoadWidth doesn't work with vector types. Certainly when I was working in this area, it was all about optimisations for CoreMark :) Does this need to be fixed during DAGCombine / Lowering? Could some tablegen patterns not just be added?
The load in question is not a vector load. It's a scalar load of one of the elements of the vector from the stack, created as the lowering of the extractelement. Folding the extend into that load is exactly what's being blocked. It feels like we're missing something obvious here. Maybe the addrspace(1) load handling would be better suited elsewhere (other than ISelLowering)? Somewhere that won't interfere with these other optimizations?
Out of curiosity, would the new GlobalISel system have handled this situation better?
How about making TargetLowering consider the address space value for isLoadExtLegal?
Maybe, it might have better handling of address spaces due to its popularity with GPUs. But I'd expect the backend would still have some implementation work to do.
Yeah, that'd be perfect! But I was under the impression that making such changes to a long-standing system to solve a minor issue arising from an unusual use case would be undesirable.
It looks like it natively supports understanding other address spaces (you must define the pointer's address space in the LLT you declare as legal for the op), and in general has better support for handling complex logic without needing custom, optimization-breaking code. Like, address space 0 could be marked "legal" for SEXT/ZEXT, but "lowered" for address space 1 (no custom logic required). Then for the LOAD itself: legal for addrspace 0, custom for addrspace 1. Are there any plans to migrate WASM to GlobalISel any time soon? If there's interest but not enough manpower, I wouldn't mind trying to set up the foundation myself (but that will still take up time from reviewers down the line). Or is it simply not worth it at all for this backend? It seems like the plan is to eventually phase out SelectionDAG and FastISel in favor of it... or am I misunderstanding its purpose?
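For illustration, here is a rough sketch of the kind of legality rules I mean. It is entirely hypothetical: the WebAssembly backend has no GlobalISel legalizer today, and the class name and exact rules below are made up, but it shows how actions can be keyed on the pointer's address space rather than on custom lowering hooks:

#include "llvm/CodeGen/GlobalISel/LegalizerInfo.h"
#include "llvm/CodeGen/TargetOpcodes.h"
using namespace llvm;

// Hypothetical legalizer sketch, not code from this PR.
struct HypotheticalWebAssemblyLegalizerInfo : public LegalizerInfo {
  HypotheticalWebAssemblyLegalizerInfo() {
    const LLT P0 = LLT::pointer(0, 32); // linear memory
    const LLT P1 = LLT::pointer(1, 32); // wasm globals
    const LLT S32 = LLT::scalar(32);
    const LLT S64 = LLT::scalar(64);

    // Plain loads: legal from linear memory, custom-lowered from wasm globals.
    getActionDefinitionsBuilder(TargetOpcode::G_LOAD)
        .legalFor({{S32, P0}, {S64, P0}})
        .customFor({{S32, P1}, {S64, P1}});

    // Extending loads: legal from linear memory, expanded (load + extend)
    // when the pointer is in address space 1.
    getActionDefinitionsBuilder({TargetOpcode::G_SEXTLOAD, TargetOpcode::G_ZEXTLOAD})
        .legalFor({{S32, P0}, {S64, P0}})
        .lowerFor({{S32, P1}, {S64, P1}});

    getLegacyLegalizerInfo().computeTables();
  }
};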
Not currently. It is indeed more a matter of lack of engineering time (and unclear how much the benefit would be) than anything else. I wouldn't mind having someone work on this, but I expect it would be a long project to get it up to parity with SelectionDAG (although I would hope not as large as it's been for other architectures like ARM64, as the wasm backend is simpler... I'm still not sure what the status there is, probably @sparker-arm knows). I would hope we could do some investigation on the front end to have some confidence that there would be some benefit in the end (i.e. code quality, maintainability, etc). None of us on the wasm side are experts on GlobalISel, but if you're serious about this, then I wouldn't mind putting in the time it would probably take for competent review :)
Oh, and regarding an eventual full migration: IIUC the purpose of GlobalISel is indeed to be a full replacement for SDAG, but it's a huge undertaking to migrate everything. Hard to say whether it will ever fully happen across all the backends.
My goal would be to do it somewhat quick and dirty at first, and get a PoC going before cleaning it up. I started on this yesterday out of curiosity, and it doesn't seem so bad yet. In some ways, WASM is simpler than other architectures due to the lack of physical registers and stack spilling. Once I get the swifterror kink worked out, I'll open a draft PR (worst case it gets scrapped *shrug*).
Alright, #157161 is underway. But that doesn't solve our issues here. I liked @sparker-arm's idea, but I'm not sure that's feasible. I had no idea this would end up being such a rabbit hole.
Implements custom EXTLOAD lowering logic so that extending loads from WASM (address space 1) globals are lowered correctly, similarly to the existing LOAD logic.
These fixes also ensure that the global.get is emitted at the type of the underlying global (accounting for globals declared smaller than i32) and that the loaded value is then extended or truncated to the desired result width (preventing invalid instruction sequences).
Loads from integral or floating-point globals of widths other than 8, 16, 32, or 64 bits are explicitly disallowed.
Fixes: #155880