Skip to content

Conversation

resistor
Copy link
Collaborator

@resistor resistor commented Sep 3, 2025

This adds value types for representing capability types, enabling their use in instruction selection and other parts of the backend.

These types are distinguished from each other only by size. This is sufficient, at least today, because no existing CHERI configuration supports multiple capability sizes simultaneously. Hybrid configurations supporting intermixed integral pointers and capabilities do exist, and are one of the reasons why these value types are needed beyond existing integral types.

Co-authored-by: David Chisnall theraven@theravensnest.org
Co-authored-by: Jessica Clarke jrtc27@jrtc27.com

@resistor
Copy link
Collaborator Author

resistor commented Sep 3, 2025

Open question: should we omit the c256 type? I believe it is purely historical at this point?

@resistor
Copy link
Collaborator Author

resistor commented Sep 3, 2025

Additional note: I've opted to omit the cAny tblgen wildcard type from this patch to keep it minimal. I expect to add it in a subsequent patch.

@resistor resistor marked this pull request as ready for review September 3, 2025 08:30
@resistor
Copy link
Collaborator Author

resistor commented Sep 3, 2025

Tagging @jrtc27 @arichardson

@jrtc27
Copy link
Collaborator

jrtc27 commented Sep 3, 2025

Open question: should we omit the c256 type? I believe it is purely historical at this point?

Yeah it is not needed upstream.

@jrtc27
Copy link
Collaborator

jrtc27 commented Sep 3, 2025

Additional note: I've opted to omit the cAny tblgen wildcard type from this patch to keep it minimal. I expect to add it in a subsequent patch.

You mean cPTR?

@resistor
Copy link
Collaborator Author

resistor commented Sep 3, 2025

Additional note: I've opted to omit the cAny tblgen wildcard type from this patch to keep it minimal. I expect to add it in a subsequent patch.

You mean cPTR?

Sorry, yes. Brain glitch.

@jrtc27
Copy link
Collaborator

jrtc27 commented Sep 3, 2025

Authorship probably shouldn’t claim to be solely you

Copy link
Member

@arichardson arichardson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add @davidchisnall and @jrtc27 to Co-authored-by: but otherwise this looks good to me once c256 is gone.

Question to other LLVM maintainers, is the "Capability" name okay here, or does it need to be something more explicit like "CheriCapability"?

resistor and others added 2 commits September 3, 2025 23:00
This adds value types for representing capability types, enabling their use in instruction selection and other parts of the backend.

These types are distinguished from each other only by size. This is sufficient, at least today, because no existing CHERI configuration supports multiple capability sizes simultaneously. Hybrid configurations supporting intermixed integral pointers and capabilities do exist, and are one of the reasons why these value types are needed beyond existing integral types.

Co-authored-by: David Chisnall <theraven@theravensnest.org>
Co-authored-by: Jessica Clarke <jrtc27@jrtc27.com>
@llvmbot
Copy link
Member

llvmbot commented Sep 3, 2025

@llvm/pr-subscribers-tablegen

Author: Owen Anderson (resistor)

Changes

This adds value types for representing capability types, enabling their use in instruction selection and other parts of the backend.

These types are distinguished from each other only by size. This is sufficient, at least today, because no existing CHERI configuration supports multiple capability sizes simultaneously. Hybrid configurations supporting intermixed integral pointers and capabilities do exist, and are one of the reasons why these value types are needed beyond existing integral types.


Full diff: https://github.com/llvm/llvm-project/pull/156616.diff

3 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/ValueTypes.td (+8)
  • (modified) llvm/include/llvm/CodeGenTypes/MachineValueType.h (+6)
  • (modified) llvm/utils/TableGen/Basic/VTEmitter.cpp (+2)
diff --git a/llvm/include/llvm/CodeGen/ValueTypes.td b/llvm/include/llvm/CodeGen/ValueTypes.td
index b06158d85f510..0d6875dd1b36b 100644
--- a/llvm/include/llvm/CodeGen/ValueTypes.td
+++ b/llvm/include/llvm/CodeGen/ValueTypes.td
@@ -28,6 +28,7 @@ class ValueType<int size, int value> {
   // Indicates this VT should be included in the
   // [FIRST_VALUETYPE,LAST_VALUETYPE] range.
   bit isNormalValueType = true;
+  bit isCapability = false;
 }
 
 class VTAny<int value> : ValueType<0, value> {
@@ -65,6 +66,10 @@ class VTVecTup<int size, int nf, ValueType dummy_elt, int value>
   let isRISCVVecTuple = true;
 }
 
+class VTCapability<int size, int value> : ValueType<size, value> {
+  let isCapability = true;
+}
+
 defset list<ValueType> ValueTypes = {
 
 def OtherVT : ValueType<0,   1> {  // "Other" value
@@ -357,6 +362,9 @@ def amdgpuBufferStridedPointer : ValueType<192, 252>;
 
 def aarch64mfp8 : ValueType<8,  253>;  // 8-bit value in FPR (AArch64)
 
+def c64 : VTCapability<64, 254>;   // 64-bit capability value
+def c128 : VTCapability<128, 255>; // 128-bit capability value
+
 let isNormalValueType = false in {
 def token      : ValueType<0, 504>;  // TokenTy
 def MetadataVT : ValueType<0, 505> { // Metadata
diff --git a/llvm/include/llvm/CodeGenTypes/MachineValueType.h b/llvm/include/llvm/CodeGenTypes/MachineValueType.h
index b8e91a022ec5e..b3ba1d7c3a568 100644
--- a/llvm/include/llvm/CodeGenTypes/MachineValueType.h
+++ b/llvm/include/llvm/CodeGenTypes/MachineValueType.h
@@ -178,6 +178,12 @@ namespace llvm {
       return (isFixedLengthVector() && getFixedSizeInBits() == 2048);
     }
 
+    /// Return true if this is a capability type.
+    bool isCapability() const {
+      return (SimpleTy >= MVT::FIRST_CAPABILITY_VALUETYPE) &&
+             (SimpleTy <= MVT::LAST_CAPABILITY_VALUETYPE);
+    }
+
     /// Return true if this is an overloaded type for TableGen.
     bool isOverloaded() const {
       switch (SimpleTy) {
diff --git a/llvm/utils/TableGen/Basic/VTEmitter.cpp b/llvm/utils/TableGen/Basic/VTEmitter.cpp
index 040f37c3a5e1e..53cb53296e7c2 100644
--- a/llvm/utils/TableGen/Basic/VTEmitter.cpp
+++ b/llvm/utils/TableGen/Basic/VTEmitter.cpp
@@ -132,6 +132,7 @@ void VTEmitter::run(raw_ostream &OS) {
     bool IsVector = VT->getValueAsBit("isVector");
     bool IsScalable = VT->getValueAsBit("isScalable");
     bool IsRISCVVecTuple = VT->getValueAsBit("isRISCVVecTuple");
+    bool IsCapability = VT->getValueAsBit("isCapability");
     int64_t NF = VT->getValueAsInt("NF");
     bool IsNormalValueType =  VT->getValueAsBit("isNormalValueType");
     int64_t NElem = IsVector ? VT->getValueAsInt("nElem") : 0;
@@ -152,6 +153,7 @@ void VTEmitter::run(raw_ostream &OS) {
     UpdateVTRange("INTEGER_VALUETYPE", Name, IsInteger && !IsVector);
     UpdateVTRange("FP_VALUETYPE", Name, IsFP && !IsVector);
     UpdateVTRange("VALUETYPE", Name, IsNormalValueType);
+    UpdateVTRange("CAPABILITY_VALUETYPE", Name, IsCapability);
 
     // clang-format off
     OS << "  GET_VT_ATTR("

@resistor
Copy link
Collaborator Author

resistor commented Sep 3, 2025

Authorship probably shouldn’t claim to be solely you

Added co-author lines to the commit message. I thought about adding pointers to the downstream commits in the commit log, but maybe that's overkill?

@resistor
Copy link
Collaborator Author

resistor commented Sep 3, 2025

Open question: should we omit the c256 type? I believe it is purely historical at this point?

Yeah it is not needed upstream.

I've removed it.

Copy link
Member

@arichardson arichardson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jrtc27
Copy link
Collaborator

jrtc27 commented Sep 3, 2025

Authorship probably shouldn’t claim to be solely you

Added co-author lines to the commit message. I thought about adding pointers to the downstream commits in the commit log, but maybe that's overkill?

LLVM is purely "Squash and merge" and configured to use the PR description as the commit message, so please update that.

@nikic
Copy link
Contributor

nikic commented Sep 3, 2025

Authorship probably shouldn’t claim to be solely you

Added co-author lines to the commit message. I thought about adding pointers to the downstream commits in the commit log, but maybe that's overkill?

LLVM is purely "Squash and merge" and configured to use the PR description as the commit message, so please update that.

GitHub takes the Co-authored-by lines from the commits when generating the squashed commit message, it's not necessary to add them to the PR description as well.

@resistor
Copy link
Collaborator Author

resistor commented Sep 3, 2025

Either way, I've added them to the PR description :-)

@resistor
Copy link
Collaborator Author

resistor commented Sep 3, 2025

Question to other LLVM maintainers, is the "Capability" name okay here, or does it need to be something more explicit like "CheriCapability"?

Good question. @nikic what's your view on nomenclature here?

@s-barannikov
Copy link
Contributor

Can the new types be useful for other targets with similar architectural features? If so, I think it makes sense to give them more sensible names. Otherwise the term should be explained somewhere, as the current name does not give a clue that it is just a flavor of a pointer.

@resistor
Copy link
Collaborator Author

resistor commented Sep 3, 2025

LGTM

@arichardson I pulled in the getEVTString implementation that goes with this change. Please re-review if possible.

@jrtc27
Copy link
Collaborator

jrtc27 commented Sep 3, 2025

Can the new types be useful for other targets with similar architectural features? If so, I think it makes sense to give them more sensible names. Otherwise the term should be explained somewhere, as the current name does not give a clue that it is just a flavor of a pointer.

Well, CHERI is a portable concept; Morello is CHERI for Arm, the upcoming standard RVY and Microsoft/SCI's non-standard CHERIoT (hopefully converging with the upcoming standard) are CHERI for RISC-V, we've historically had CHERI-MIPS and we've also sketched CHERI-x86. So it's useful for other targets in the sense it's not a RISC-V thing, but how applicable to other kinds of pointers it is I don't know, we've never explored that.

@resistor
Copy link
Collaborator Author

resistor commented Sep 3, 2025

Can the new types be useful for other targets with similar architectural features? If so, I think it makes sense to give them more sensible names. Otherwise the term should be explained somewhere, as the current name does not give a clue that it is just a flavor of a pointer.

There's a moderate amount of code in the CHERI downstreams that uses isCapability (or a transitive caller of it) to implement CHERI-aware code generation. From some quick searching, that code is mostly in SelectionDAG, but it does get used in a few other bits of llvm/lib/CodeGen as well.

While there's an extent to which the interpretation of a MVT is always up to the backend, given the above I don't think it would be advisable for a backend to attempt to overload these to represent some other tangential concept.

I'll go ahead and add a block comment clarifying that these represent CHERI capabilities. Can you take a look once I push it?

@s-barannikov
Copy link
Contributor

s-barannikov commented Sep 3, 2025

I'll go ahead and add a block comment clarifying that these represent CHERI capabilities. Can you take a look once I push it?

Sure. I'm not really familiar with CHERI, so I'd prefer the comment clarify what these are.
I once worked on a target (not LLVM-based) that also has peculiar pointers (e.g. you can cast it to an integer, but the opposite isn't allowed in user mode, and it also encodes information about bounds of the object it originates from). Maybe it's the same thing. These fat pointers were called "pointer descriptors" there.

Copy link

github-actions bot commented Sep 3, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@arichardson
Copy link
Member

I'll go ahead and add a block comment clarifying that these represent CHERI capabilities. Can you take a look once I push it?

Sure. I'm not really familiar with CHERI, so I'd prefer the comment clarify what these are. I once worked on a target (not LLVM-based) that also has peculiar pointers (e.g. you can cast it to an integer, but the opposite isn't allowed in user mode, and it also encodes information about object bounds it originates from). Maybe it's the same thing. These fat pointers were called "pointer descriptors" there.

This is definitely quite similar in terms of the constraints that are imposed. Historically this MVT was actually called iFATPTR128, but in addition to fat-pointer semantics, CHERI capabilities also have the restriction that the rights they grant can only be reduced and never increased (monotonicity). We decided to rename it to capability to clarify this. I don't recall how much this difference actually matters at the SelectionDAG level, but I imagine there are a few cases where we do have to care.

@@ -228,6 +228,9 @@ namespace llvm {
return isSimple() ? V.is2048BitVector() : isExtended2048BitVector();
}

/// Return true if this is a capability type.
bool isCapability() const { return isSimple() ? V.isCapability() : false; }
Copy link
Contributor

@s-barannikov s-barannikov Sep 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We decided to rename it to capability to clarify this.

Sorry, this term doesn't make sense to me. The dictionary describes "capability" as "the power or ability to do something", but I don't see how a type or a pointer can be that power or ability to do something (and what is something?).

Can this be named isCheriPointer() maybe? And the individual types cheri_ptr64/cheri_ptr128?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isInteger means values of that type are integers. isCapability means values of that type are capabilities. This is capability in the capability-based security sense, which has a long history and is not some new term we've invented. See https://en.wikipedia.org/wiki/Capability-based_security for example:

A capability (known in some systems as a key) is a communicable, unforgeable token of authority

cheri_ptr64/cheri_ptr128 gets quite verbose and looks odd next to the iN counterparts. pAny for us is {iPTR, cPTR}, but you would be proposing having that mean {iPTR, cheri_ptrPTR}? That looks rather odd and silly.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll also throw in this link re: capability as existing terminology https://en.wikipedia.org/wiki/Capability-based_addressing

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about a compromise: keep using c64/c128 and change bool isCapability() to bool isCheriCapability()?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isInteger means values of that type are integers. isCapability means values of that type are capabilities. This is capability in the capability-based security sense, which has a long history and is not some new term we've invented. See https://en.wikipedia.org/wiki/Capability-based_security for example:

A capability (known in some systems as a key) is a communicable, unforgeable token of authority

Thanks, I wasn't aware of that.

cheri_ptr64/cheri_ptr128 gets quite verbose and looks odd next to the iN counterparts. pAny for us is {iPTR, cPTR}, but you would be proposing having that mean {iPTR, cheri_ptrPTR}? That looks rather odd and silly.

It doesn't look odd to me. We have x86mmx, a whole bunch of riscv_nxvMiNxK, aarch64svcount, spirvbuiltin, amdgpuBufferFatPointer, aarch64mfp8, etc. Contrary, c64 seems too short and ambiguous (the first impression would be that it is a complex number).

How about a compromise: keep using c64/c128 and change bool isCapability() to bool isCheriCapability()?

I'll be happy with any naming llvm maintainers are happy. I'm just suggesting variants.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't look odd to me. We have x86mmx, a whole bunch of riscv_nxvMiNxK, aarch64svcount, spirvbuiltin, amdgpuBufferFatPointer, aarch64mfp8, etc. Contrary, c64 seems too short and ambiguous (the first impression would be that it is a complex number).

The difference is that all those are niche things that have a very narrow use. CHERI capabilities are much more pervasive; every pointer becomes one, and we need the full flexibility of c64/c128/cPTR (naming TBD) because we want to be general across 32-bit and 64-bit architectures just as with i32/i64/iPTR. Unlike all of those, pAny also needs to cover our types. So yes, LLVM has all kinds of cumbersome names for things, but they're for things that are architecture-specific and limited in use, neither of which apply to us. If you are targeting a CHERI system, cN are just as core types as iN, and having uniformity between the two is a good idea IMO.

To be clear though, I'm not wed to cN/cPTR and nothing else, but cheri_ptr64/cheri_ptr128/cheri_ptrPTR is going to make all the CHERI code far uglier than it needs to be (and the cheri_ptr version of iPTR, cheri_ptrPTR, is particularly nonsensical as a name).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed the predicate's name to isCheriCapability throughout. I think that "namespacing" the term capability to be specifically its CHERI-verse interpretation is helpful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants