-
Notifications
You must be signed in to change notification settings - Fork 14.9k
[CodeGen, CHERI] Add capability types to MVT. #156616
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Open question: should we omit the |
Additional note: I've opted to omit the |
Tagging @jrtc27 @arichardson |
Yeah it is not needed upstream. |
You mean cPTR? |
Sorry, yes. Brain glitch. |
Authorship probably shouldn’t claim to be solely you |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add @davidchisnall and @jrtc27 to Co-authored-by:
but otherwise this looks good to me once c256 is gone.
Question to other LLVM maintainers, is the "Capability" name okay here, or does it need to be something more explicit like "CheriCapability"?
This adds value types for representing capability types, enabling their use in instruction selection and other parts of the backend. These types are distinguished from each other only by size. This is sufficient, at least today, because no existing CHERI configuration supports multiple capability sizes simultaneously. Hybrid configurations supporting intermixed integral pointers and capabilities do exist, and are one of the reasons why these value types are needed beyond existing integral types. Co-authored-by: David Chisnall <theraven@theravensnest.org> Co-authored-by: Jessica Clarke <jrtc27@jrtc27.com>
@llvm/pr-subscribers-tablegen Author: Owen Anderson (resistor) ChangesThis adds value types for representing capability types, enabling their use in instruction selection and other parts of the backend. These types are distinguished from each other only by size. This is sufficient, at least today, because no existing CHERI configuration supports multiple capability sizes simultaneously. Hybrid configurations supporting intermixed integral pointers and capabilities do exist, and are one of the reasons why these value types are needed beyond existing integral types. Full diff: https://github.com/llvm/llvm-project/pull/156616.diff 3 Files Affected:
diff --git a/llvm/include/llvm/CodeGen/ValueTypes.td b/llvm/include/llvm/CodeGen/ValueTypes.td
index b06158d85f510..0d6875dd1b36b 100644
--- a/llvm/include/llvm/CodeGen/ValueTypes.td
+++ b/llvm/include/llvm/CodeGen/ValueTypes.td
@@ -28,6 +28,7 @@ class ValueType<int size, int value> {
// Indicates this VT should be included in the
// [FIRST_VALUETYPE,LAST_VALUETYPE] range.
bit isNormalValueType = true;
+ bit isCapability = false;
}
class VTAny<int value> : ValueType<0, value> {
@@ -65,6 +66,10 @@ class VTVecTup<int size, int nf, ValueType dummy_elt, int value>
let isRISCVVecTuple = true;
}
+class VTCapability<int size, int value> : ValueType<size, value> {
+ let isCapability = true;
+}
+
defset list<ValueType> ValueTypes = {
def OtherVT : ValueType<0, 1> { // "Other" value
@@ -357,6 +362,9 @@ def amdgpuBufferStridedPointer : ValueType<192, 252>;
def aarch64mfp8 : ValueType<8, 253>; // 8-bit value in FPR (AArch64)
+def c64 : VTCapability<64, 254>; // 64-bit capability value
+def c128 : VTCapability<128, 255>; // 128-bit capability value
+
let isNormalValueType = false in {
def token : ValueType<0, 504>; // TokenTy
def MetadataVT : ValueType<0, 505> { // Metadata
diff --git a/llvm/include/llvm/CodeGenTypes/MachineValueType.h b/llvm/include/llvm/CodeGenTypes/MachineValueType.h
index b8e91a022ec5e..b3ba1d7c3a568 100644
--- a/llvm/include/llvm/CodeGenTypes/MachineValueType.h
+++ b/llvm/include/llvm/CodeGenTypes/MachineValueType.h
@@ -178,6 +178,12 @@ namespace llvm {
return (isFixedLengthVector() && getFixedSizeInBits() == 2048);
}
+ /// Return true if this is a capability type.
+ bool isCapability() const {
+ return (SimpleTy >= MVT::FIRST_CAPABILITY_VALUETYPE) &&
+ (SimpleTy <= MVT::LAST_CAPABILITY_VALUETYPE);
+ }
+
/// Return true if this is an overloaded type for TableGen.
bool isOverloaded() const {
switch (SimpleTy) {
diff --git a/llvm/utils/TableGen/Basic/VTEmitter.cpp b/llvm/utils/TableGen/Basic/VTEmitter.cpp
index 040f37c3a5e1e..53cb53296e7c2 100644
--- a/llvm/utils/TableGen/Basic/VTEmitter.cpp
+++ b/llvm/utils/TableGen/Basic/VTEmitter.cpp
@@ -132,6 +132,7 @@ void VTEmitter::run(raw_ostream &OS) {
bool IsVector = VT->getValueAsBit("isVector");
bool IsScalable = VT->getValueAsBit("isScalable");
bool IsRISCVVecTuple = VT->getValueAsBit("isRISCVVecTuple");
+ bool IsCapability = VT->getValueAsBit("isCapability");
int64_t NF = VT->getValueAsInt("NF");
bool IsNormalValueType = VT->getValueAsBit("isNormalValueType");
int64_t NElem = IsVector ? VT->getValueAsInt("nElem") : 0;
@@ -152,6 +153,7 @@ void VTEmitter::run(raw_ostream &OS) {
UpdateVTRange("INTEGER_VALUETYPE", Name, IsInteger && !IsVector);
UpdateVTRange("FP_VALUETYPE", Name, IsFP && !IsVector);
UpdateVTRange("VALUETYPE", Name, IsNormalValueType);
+ UpdateVTRange("CAPABILITY_VALUETYPE", Name, IsCapability);
// clang-format off
OS << " GET_VT_ATTR("
|
Added co-author lines to the commit message. I thought about adding pointers to the downstream commits in the commit log, but maybe that's overkill? |
I've removed it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
LLVM is purely "Squash and merge" and configured to use the PR description as the commit message, so please update that. |
GitHub takes the Co-authored-by lines from the commits when generating the squashed commit message, it's not necessary to add them to the PR description as well. |
Either way, I've added them to the PR description :-) |
Good question. @nikic what's your view on nomenclature here? |
Can the new types be useful for other targets with similar architectural features? If so, I think it makes sense to give them more sensible names. Otherwise the term should be explained somewhere, as the current name does not give a clue that it is just a flavor of a pointer. |
@arichardson I pulled in the |
Well, CHERI is a portable concept; Morello is CHERI for Arm, the upcoming standard RVY and Microsoft/SCI's non-standard CHERIoT (hopefully converging with the upcoming standard) are CHERI for RISC-V, we've historically had CHERI-MIPS and we've also sketched CHERI-x86. So it's useful for other targets in the sense it's not a RISC-V thing, but how applicable to other kinds of pointers it is I don't know, we've never explored that. |
There's a moderate amount of code in the CHERI downstreams that uses While there's an extent to which the interpretation of a MVT is always up to the backend, given the above I don't think it would be advisable for a backend to attempt to overload these to represent some other tangential concept. I'll go ahead and add a block comment clarifying that these represent CHERI capabilities. Can you take a look once I push it? |
Sure. I'm not really familiar with CHERI, so I'd prefer the comment clarify what these are. |
✅ With the latest revision this PR passed the C/C++ code formatter. |
This is definitely quite similar in terms of the constraints that are imposed. Historically this MVT was actually called iFATPTR128, but in addition to fat-pointer semantics, CHERI capabilities also have the restriction that the rights they grant can only be reduced and never increased (monotonicity). We decided to rename it to capability to clarify this. I don't recall how much this difference actually matters at the SelectionDAG level, but I imagine there are a few cases where we do have to care. |
@@ -228,6 +228,9 @@ namespace llvm { | |||
return isSimple() ? V.is2048BitVector() : isExtended2048BitVector(); | |||
} | |||
|
|||
/// Return true if this is a capability type. | |||
bool isCapability() const { return isSimple() ? V.isCapability() : false; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We decided to rename it to capability to clarify this.
Sorry, this term doesn't make sense to me. The dictionary describes "capability" as "the power or ability to do something", but I don't see how a type or a pointer can be that power or ability to do something (and what is something?).
Can this be named isCheriPointer()
maybe? And the individual types cheri_ptr64
/cheri_ptr128
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isInteger means values of that type are integers. isCapability means values of that type are capabilities. This is capability in the capability-based security sense, which has a long history and is not some new term we've invented. See https://en.wikipedia.org/wiki/Capability-based_security for example:
A capability (known in some systems as a key) is a communicable, unforgeable token of authority
cheri_ptr64/cheri_ptr128 gets quite verbose and looks odd next to the iN counterparts. pAny for us is {iPTR, cPTR}, but you would be proposing having that mean {iPTR, cheri_ptrPTR}? That looks rather odd and silly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll also throw in this link re: capability as existing terminology https://en.wikipedia.org/wiki/Capability-based_addressing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about a compromise: keep using c64/c128 and change bool isCapability()
to bool isCheriCapability()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isInteger means values of that type are integers. isCapability means values of that type are capabilities. This is capability in the capability-based security sense, which has a long history and is not some new term we've invented. See https://en.wikipedia.org/wiki/Capability-based_security for example:
A capability (known in some systems as a key) is a communicable, unforgeable token of authority
Thanks, I wasn't aware of that.
cheri_ptr64/cheri_ptr128 gets quite verbose and looks odd next to the iN counterparts. pAny for us is {iPTR, cPTR}, but you would be proposing having that mean {iPTR, cheri_ptrPTR}? That looks rather odd and silly.
It doesn't look odd to me. We have x86mmx, a whole bunch of riscv_nxvMiNxK, aarch64svcount, spirvbuiltin, amdgpuBufferFatPointer, aarch64mfp8, etc. Contrary, c64
seems too short and ambiguous (the first impression would be that it is a complex number).
How about a compromise: keep using c64/c128 and change
bool isCapability()
tobool isCheriCapability()
?
I'll be happy with any naming llvm maintainers are happy. I'm just suggesting variants.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't look odd to me. We have x86mmx, a whole bunch of riscv_nxvMiNxK, aarch64svcount, spirvbuiltin, amdgpuBufferFatPointer, aarch64mfp8, etc. Contrary, c64 seems too short and ambiguous (the first impression would be that it is a complex number).
The difference is that all those are niche things that have a very narrow use. CHERI capabilities are much more pervasive; every pointer becomes one, and we need the full flexibility of c64/c128/cPTR (naming TBD) because we want to be general across 32-bit and 64-bit architectures just as with i32/i64/iPTR. Unlike all of those, pAny also needs to cover our types. So yes, LLVM has all kinds of cumbersome names for things, but they're for things that are architecture-specific and limited in use, neither of which apply to us. If you are targeting a CHERI system, cN are just as core types as iN, and having uniformity between the two is a good idea IMO.
To be clear though, I'm not wed to cN/cPTR and nothing else, but cheri_ptr64/cheri_ptr128/cheri_ptrPTR is going to make all the CHERI code far uglier than it needs to be (and the cheri_ptr version of iPTR, cheri_ptrPTR, is particularly nonsensical as a name).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed the predicate's name to isCheriCapability
throughout. I think that "namespacing" the term capability to be specifically its CHERI-verse interpretation is helpful.
This adds value types for representing capability types, enabling their use in instruction selection and other parts of the backend.
These types are distinguished from each other only by size. This is sufficient, at least today, because no existing CHERI configuration supports multiple capability sizes simultaneously. Hybrid configurations supporting intermixed integral pointers and capabilities do exist, and are one of the reasons why these value types are needed beyond existing integral types.
Co-authored-by: David Chisnall theraven@theravensnest.org
Co-authored-by: Jessica Clarke jrtc27@jrtc27.com