[CodeGen, CHERI] Add capability types to MVT. #156616

resistor · 2025-09-03T08:23:27Z

This adds value types for representing capability types, enabling their use in instruction selection and other parts of the backend.

These types are distinguished from each other only by size. This is sufficient, at least today, because no existing CHERI configuration supports multiple capability sizes simultaneously. Hybrid configurations supporting intermixed integral pointers and capabilities do exist, and are one of the reasons why these value types are needed beyond existing integral types.

Co-authored-by: David Chisnall <theraven@theravensnest.org>
Co-authored-by: Jessica Clarke <jrtc27@jrtc27.com>

resistor · 2025-09-03T08:24:38Z

Open question: should we omit the c256 type? I believe it is purely historical at this point?

resistor · 2025-09-03T08:29:29Z

Additional note: I've opted to omit the cAny tblgen wildcard type from this patch to keep it minimal. I expect to add it in a subsequent patch.

resistor · 2025-09-03T08:30:40Z

Tagging @jrtc27 @arichardson

llvm/include/llvm/CodeGen/ValueTypes.td

jrtc27 · 2025-09-03T13:55:14Z

> Open question: should we omit the c256 type? I believe it is purely historical at this point?

Yeah it is not needed upstream.

jrtc27 · 2025-09-03T13:55:25Z

> Additional note: I've opted to omit the cAny tblgen wildcard type from this patch to keep it minimal. I expect to add it in a subsequent patch.

You mean cPTR?

resistor · 2025-09-03T13:56:11Z

> > Additional note: I've opted to omit the cAny tblgen wildcard type from this patch to keep it minimal. I expect to add it in a subsequent patch.
>
> You mean cPTR?

Sorry, yes. Brain glitch.

jrtc27 · 2025-09-03T13:57:40Z

Authorship probably shouldn’t claim to be solely you

arichardson

I would add @davidchisnall and @jrtc27 to Co-authored-by: but otherwise this looks good to me once c256 is gone.

Question to other LLVM maintainers, is the "Capability" name okay here, or does it need to be something more explicit like "CheriCapability"?

llvmbot · 2025-09-03T15:02:59Z

@llvm/pr-subscribers-tablegen

Author: Owen Anderson (resistor)

Changes

This adds value types for representing capability types, enabling their use in instruction selection and other parts of the backend.

These types are distinguished from each other only by size. This is sufficient, at least today, because no existing CHERI configuration supports multiple capability sizes simultaneously. Hybrid configurations supporting intermixed integral pointers and capabilities do exist, and are one of the reasons why these value types are needed beyond existing integral types.

Full diff: https://github.com/llvm/llvm-project/pull/156616.diff

3 Files Affected:

(modified) llvm/include/llvm/CodeGen/ValueTypes.td (+8)
(modified) llvm/include/llvm/CodeGenTypes/MachineValueType.h (+6)
(modified) llvm/utils/TableGen/Basic/VTEmitter.cpp (+2)

diff --git a/llvm/include/llvm/CodeGen/ValueTypes.td b/llvm/include/llvm/CodeGen/ValueTypes.td
index b06158d85f510..0d6875dd1b36b 100644
--- a/llvm/include/llvm/CodeGen/ValueTypes.td
+++ b/llvm/include/llvm/CodeGen/ValueTypes.td
@@ -28,6 +28,7 @@ class ValueType<int size, int value> {
   // Indicates this VT should be included in the
   // [FIRST_VALUETYPE,LAST_VALUETYPE] range.
   bit isNormalValueType = true;
+  bit isCapability = false;
 }
 
 class VTAny<int value> : ValueType<0, value> {
@@ -65,6 +66,10 @@ class VTVecTup<int size, int nf, ValueType dummy_elt, int value>
   let isRISCVVecTuple = true;
 }
 
+class VTCapability<int size, int value> : ValueType<size, value> {
+  let isCapability = true;
+}
+
 defset list<ValueType> ValueTypes = {
 
 def OtherVT : ValueType<0,   1> {  // "Other" value
@@ -357,6 +362,9 @@ def amdgpuBufferStridedPointer : ValueType<192, 252>;
 
 def aarch64mfp8 : ValueType<8,  253>;  // 8-bit value in FPR (AArch64)
 
+def c64 : VTCapability<64, 254>;   // 64-bit capability value
+def c128 : VTCapability<128, 255>; // 128-bit capability value
+
 let isNormalValueType = false in {
 def token      : ValueType<0, 504>;  // TokenTy
 def MetadataVT : ValueType<0, 505> { // Metadata
diff --git a/llvm/include/llvm/CodeGenTypes/MachineValueType.h b/llvm/include/llvm/CodeGenTypes/MachineValueType.h
index b8e91a022ec5e..b3ba1d7c3a568 100644
--- a/llvm/include/llvm/CodeGenTypes/MachineValueType.h
+++ b/llvm/include/llvm/CodeGenTypes/MachineValueType.h
@@ -178,6 +178,12 @@ namespace llvm {
       return (isFixedLengthVector() && getFixedSizeInBits() == 2048);
     }
 
+    /// Return true if this is a capability type.
+    bool isCapability() const {
+      return (SimpleTy >= MVT::FIRST_CAPABILITY_VALUETYPE) &&
+             (SimpleTy <= MVT::LAST_CAPABILITY_VALUETYPE);
+    }
+
     /// Return true if this is an overloaded type for TableGen.
     bool isOverloaded() const {
       switch (SimpleTy) {
diff --git a/llvm/utils/TableGen/Basic/VTEmitter.cpp b/llvm/utils/TableGen/Basic/VTEmitter.cpp
index 040f37c3a5e1e..53cb53296e7c2 100644
--- a/llvm/utils/TableGen/Basic/VTEmitter.cpp
+++ b/llvm/utils/TableGen/Basic/VTEmitter.cpp
@@ -132,6 +132,7 @@ void VTEmitter::run(raw_ostream &OS) {
     bool IsVector = VT->getValueAsBit("isVector");
     bool IsScalable = VT->getValueAsBit("isScalable");
     bool IsRISCVVecTuple = VT->getValueAsBit("isRISCVVecTuple");
+    bool IsCapability = VT->getValueAsBit("isCapability");
     int64_t NF = VT->getValueAsInt("NF");
     bool IsNormalValueType =  VT->getValueAsBit("isNormalValueType");
     int64_t NElem = IsVector ? VT->getValueAsInt("nElem") : 0;
@@ -152,6 +153,7 @@ void VTEmitter::run(raw_ostream &OS) {
     UpdateVTRange("INTEGER_VALUETYPE", Name, IsInteger && !IsVector);
     UpdateVTRange("FP_VALUETYPE", Name, IsFP && !IsVector);
     UpdateVTRange("VALUETYPE", Name, IsNormalValueType);
+    UpdateVTRange("CAPABILITY_VALUETYPE", Name, IsCapability);
 
     // clang-format off
     OS << "  GET_VT_ATTR("
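For readers unfamiliar with how the FIRST_/LAST_ range markers are consumed, the following is a minimal standalone sketch of the closed-range test that the new predicate performs. The enum is a hypothetical stand-in, not the TableGen-generated MVT enum; only the numeric values 253, 254 and 255 are taken from the diff above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the generated value-type enum. The values mirror
// the diff (aarch64mfp8 = 253, c64 = 254, c128 = 255); nothing else is LLVM's.
enum SimpleValueType : uint16_t {
  aarch64mfp8 = 253,
  c64 = 254,
  c128 = 255,
  // Range markers of the kind the diff's UpdateVTRange call is expected to
  // produce for "CAPABILITY_VALUETYPE" (assumed here, not generated).
  FIRST_CAPABILITY_VALUETYPE = c64,
  LAST_CAPABILITY_VALUETYPE = c128,
};

// Same shape as the predicate added to MachineValueType.h: a closed-range
// test over contiguous enum values.
constexpr bool isCapability(SimpleValueType VT) {
  return VT >= FIRST_CAPABILITY_VALUETYPE && VT <= LAST_CAPABILITY_VALUETYPE;
}

static_assert(isCapability(c64) && isCapability(c128),
              "both capability types fall inside the range");
static_assert(!isCapability(aarch64mfp8),
              "the neighbouring non-capability type falls outside it");
```

The range test only works if the capability types are numbered contiguously, which is why c64 and c128 are defined next to each other in ValueTypes.td.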

resistor · 2025-09-03T15:03:41Z

> Authorship probably shouldn’t claim to be solely you

Added co-author lines to the commit message. I thought about adding pointers to the downstream commits in the commit log, but maybe that's overkill?

resistor · 2025-09-03T15:06:21Z

> > Open question: should we omit the c256 type? I believe it is purely historical at this point?
>
> Yeah it is not needed upstream.

I've removed it.

arichardson

LGTM

jrtc27 · 2025-09-03T16:01:52Z

> > Authorship probably shouldn’t claim to be solely you
>
> Added co-author lines to the commit message. I thought about adding pointers to the downstream commits in the commit log, but maybe that's overkill?

LLVM is purely "Squash and merge" and configured to use the PR description as the commit message, so please update that.

nikic · 2025-09-03T16:03:40Z

> > > Authorship probably shouldn’t claim to be solely you
> >
> > Added co-author lines to the commit message. I thought about adding pointers to the downstream commits in the commit log, but maybe that's overkill?
>
> LLVM is purely "Squash and merge" and configured to use the PR description as the commit message, so please update that.

GitHub takes the Co-authored-by lines from the commits when generating the squashed commit message, so it's not necessary to add them to the PR description as well.

resistor · 2025-09-03T16:04:50Z

Either way, I've added them to the PR description :-)

resistor · 2025-09-03T16:11:48Z

> Question to other LLVM maintainers, is the "Capability" name okay here, or does it need to be something more explicit like "CheriCapability"?

Good question. @nikic what's your view on nomenclature here?

s-barannikov · 2025-09-03T16:16:09Z

Can the new types be useful for other targets with similar architectural features? If so, I think it makes sense to give them more sensible names. Otherwise the term should be explained somewhere, as the current name does not give a clue that it is just a flavor of a pointer.

resistor · 2025-09-03T16:24:27Z

> LGTM

@arichardson I pulled in the getEVTString implementation that goes with this change. Please re-review if possible.

jrtc27 · 2025-09-03T16:27:59Z

> Can the new types be useful for other targets with similar architectural features? If so, I think it makes sense to give them more sensible names. Otherwise the term should be explained somewhere, as the current name does not give a clue that it is just a flavor of a pointer.

Well, CHERI is a portable concept: Morello is CHERI for Arm; the upcoming standard RVY and Microsoft/SCI's non-standard CHERIoT (hopefully converging with the upcoming standard) are CHERI for RISC-V; we've historically had CHERI-MIPS; and we've also sketched CHERI-x86. So it's useful for other targets in the sense that it's not a RISC-V thing, but I don't know how applicable it is to other kinds of pointers; we've never explored that.

resistor · 2025-09-03T16:32:56Z

> Can the new types be useful for other targets with similar architectural features? If so, I think it makes sense to give them more sensible names. Otherwise the term should be explained somewhere, as the current name does not give a clue that it is just a flavor of a pointer.

There's a moderate amount of code in the CHERI downstreams that uses isCapability (or a transitive caller of it) to implement CHERI-aware code generation. From some quick searching, that code is mostly in SelectionDAG, but it does get used in a few other bits of llvm/lib/CodeGen as well.

While there's an extent to which the interpretation of an MVT is always up to the backend, given the above I don't think it would be advisable for a backend to attempt to overload these to represent some other tangential concept.

I'll go ahead and add a block comment clarifying that these represent CHERI capabilities. Can you take a look once I push it?

s-barannikov · 2025-09-03T16:39:57Z

> I'll go ahead and add a block comment clarifying that these represent CHERI capabilities. Can you take a look once I push it?

Sure. I'm not really familiar with CHERI, so I'd prefer the comment clarify what these are.
I once worked on a target (not LLVM-based) that also has peculiar pointers (e.g. you can cast one to an integer, but the opposite isn't allowed in user mode, and it also encodes information about the bounds of the object it originates from). Maybe it's the same thing. These fat pointers were called "pointer descriptors" there.

github-actions · 2025-09-03T16:42:01Z

✅ With the latest revision this PR passed the C/C++ code formatter.

arichardson · 2025-09-03T16:44:08Z

> > I'll go ahead and add a block comment clarifying that these represent CHERI capabilities. Can you take a look once I push it?
>
> Sure. I'm not really familiar with CHERI, so I'd prefer the comment clarify what these are. I once worked on a target (not LLVM-based) that also has peculiar pointers (e.g. you can cast one to an integer, but the opposite isn't allowed in user mode, and it also encodes information about the bounds of the object it originates from). Maybe it's the same thing. These fat pointers were called "pointer descriptors" there.

This is definitely quite similar in terms of the constraints that are imposed. Historically this MVT was actually called iFATPTR128, but in addition to fat-pointer semantics, CHERI capabilities also have the restriction that the rights they grant can only be reduced and never increased (monotonicity). We decided to rename it to capability to clarify this. I don't recall how much this difference actually matters at the SelectionDAG level, but I imagine there are a few cases where we do have to care.
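The monotonicity property mentioned above can be illustrated with a small sketch. This is a toy model that assumes a made-up three-bit permission set and struct layout; Perm, Capability, restrictPerms and isSubsetOf are hypothetical names and do not correspond to any LLVM or CHERI API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical permission bits; real CHERI encodings differ per architecture.
enum Perm : uint32_t { Load = 1u << 0, Store = 1u << 1, Execute = 1u << 2 };

// Toy capability: bounds plus a permission mask (no tag bit, no compression).
struct Capability {
  uint64_t Base;
  uint64_t Length;
  uint32_t Perms;
};

// Deriving a capability can only clear permission bits: the result is the
// intersection of the parent's rights and the requested mask, so asking for
// a right the parent lacks has no effect.
constexpr Capability restrictPerms(Capability Parent, uint32_t Mask) {
  return Capability{Parent.Base, Parent.Length, Parent.Perms & Mask};
}

// True if Child grants no right that Parent lacks.
constexpr bool isSubsetOf(Capability Child, Capability Parent) {
  return (Child.Perms & ~Parent.Perms) == 0;
}
```

In this model a read-only capability derived from a read/write one stays read-only even if a later derivation requests Store or Execute, which is the sense in which rights "can only be reduced and never increased".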

s-barannikov · 2025-09-03T17:21:26Z

llvm/include/llvm/CodeGen/ValueTypes.h

@@ -228,6 +228,9 @@ namespace llvm {
      return isSimple() ? V.is2048BitVector() : isExtended2048BitVector();
    }

+    /// Return true if this is a capability type.
+    bool isCapability() const { return isSimple() ? V.isCapability() : false; }


> We decided to rename it to capability to clarify this.

Sorry, this term doesn't make sense to me. The dictionary describes "capability" as "the power or ability to do something", but I don't see how a type or a pointer can be that power or ability to do something (and what is something?).

Can this be named isCheriPointer() maybe? And the individual types cheri_ptr64/cheri_ptr128?

jrtc27

isInteger means values of that type are integers. isCapability means values of that type are capabilities. This is capability in the capability-based security sense, which has a long history and is not some new term we've invented. See https://en.wikipedia.org/wiki/Capability-based_security for example:

> A capability (known in some systems as a key) is a communicable, unforgeable token of authority

cheri_ptr64/cheri_ptr128 gets quite verbose and looks odd next to the iN counterparts. pAny for us is {iPTR, cPTR}, but you would be proposing having that mean {iPTR, cheri_ptrPTR}? That looks rather odd and silly.

resistor

I'll also throw in this link re: capability as existing terminology: https://en.wikipedia.org/wiki/Capability-based_addressing

arichardson

How about a compromise: keep using c64/c128 and change bool isCapability() to bool isCheriCapability()?

s-barannikov

> isInteger means values of that type are integers. isCapability means values of that type are capabilities. This is capability in the capability-based security sense, which has a long history and is not some new term we've invented. See https://en.wikipedia.org/wiki/Capability-based_security for example:
>
> > A capability (known in some systems as a key) is a communicable, unforgeable token of authority

Thanks, I wasn't aware of that.

> cheri_ptr64/cheri_ptr128 gets quite verbose and looks odd next to the iN counterparts. pAny for us is {iPTR, cPTR}, but you would be proposing having that mean {iPTR, cheri_ptrPTR}? That looks rather odd and silly.

It doesn't look odd to me. We have x86mmx, a whole bunch of riscv_nxvMiNxK, aarch64svcount, spirvbuiltin, amdgpuBufferFatPointer, aarch64mfp8, etc. On the contrary, c64 seems too short and ambiguous (the first impression would be that it is a complex number).

> How about a compromise: keep using c64/c128 and change bool isCapability() to bool isCheriCapability()?

I'll be happy with any naming the LLVM maintainers are happy with. I'm just suggesting variants.

jrtc27

> It doesn't look odd to me. We have x86mmx, a whole bunch of riscv_nxvMiNxK, aarch64svcount, spirvbuiltin, amdgpuBufferFatPointer, aarch64mfp8, etc. On the contrary, c64 seems too short and ambiguous (the first impression would be that it is a complex number).

The difference is that all those are niche things that have a very narrow use. CHERI capabilities are much more pervasive; every pointer becomes one, and we need the full flexibility of c64/c128/cPTR (naming TBD) because we want to be general across 32-bit and 64-bit architectures just as with i32/i64/iPTR. Unlike all of those, pAny also needs to cover our types. So yes, LLVM has all kinds of cumbersome names for things, but they're for things that are architecture-specific and limited in use, neither of which applies to us. If you are targeting a CHERI system, cN are just as much core types as iN, and having uniformity between the two is a good idea IMO.

To be clear though, I'm not wed to cN/cPTR and nothing else, but cheri_ptr64/cheri_ptr128/cheri_ptrPTR is going to make all the CHERI code far uglier than it needs to be (and the cheri_ptr version of iPTR, cheri_ptrPTR, is particularly nonsensical as a name).
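To make the iN/iPTR versus cN/cPTR parallel concrete, here is a rough sketch of how each pointer wildcard might resolve to a concrete type per target. It assumes that a capability is twice the address width (c64 on a 32-bit target, c128 on a 64-bit one); the VT enum and both resolve functions are hypothetical and not part of LLVM.

```cpp
#include <cassert>

// Hypothetical mini type universe; names follow the thread, not LLVM headers.
enum class VT { i32, i64, c64, c128 };

// Resolve the integer pointer wildcard (iPTR) for a given address width.
constexpr VT resolveIPtr(unsigned AddrBits) {
  return AddrBits == 32 ? VT::i32 : VT::i64;
}

// Resolve the capability pointer wildcard (cPTR). Assumption: the capability
// is twice as wide as the address it carries.
constexpr VT resolveCPtr(unsigned AddrBits) {
  return AddrBits == 32 ? VT::c64 : VT::c128;
}

static_assert(resolveIPtr(64) == VT::i64, "64-bit integer pointer");
static_assert(resolveCPtr(64) == VT::c128, "64-bit address, 128-bit capability");
```

In a hybrid configuration both functions would be meaningful for the same target, which is the case where a combined wildcard such as pAny has to cover both families.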

resistor

I've changed the predicate's name to isCheriCapability throughout. I think that "namespacing" the term capability to be specifically its CHERI-verse interpretation is helpful.

resistor marked this pull request as ready for review September 3, 2025 08:30

nikic reviewed Sep 3, 2025

llvm/include/llvm/CodeGen/ValueTypes.td (outdated)

arichardson reviewed Sep 3, 2025

resistor and others added 2 commits September 3, 2025 23:00

Remove c256 per review consensus.

5304f54

resistor force-pushed the mvt-cap branch from 817c16f to 5304f54 Compare September 3, 2025 15:02

llvmbot added the tablegen label Sep 3, 2025

arichardson approved these changes Sep 3, 2025

llvmbot added the llvm:codegen label Sep 3, 2025

arichardson approved these changes Sep 3, 2025

resistor force-pushed the mvt-cap branch from 17e0fed to 91d0302 Compare September 3, 2025 16:39

resistor added 2 commits September 4, 2025 00:43

Add getEVTString() implementation that goes with this patch logically.

2e409d8

Add comment explanatory comment.

81f938e

resistor force-pushed the mvt-cap branch from 91d0302 to 81f938e Compare September 3, 2025 16:43

s-barannikov reviewed Sep 3, 2025

Change isCapability to isCheriCapability

6d57989
