Skip to content

Conversation

mylai-mtk
Copy link
Contributor

@mylai-mtk mylai-mtk commented Aug 5, 2025

Expected Behavior:

Stop from adopting control flow integrity (CFI) behaviors, e.g. generating LPAD/LUI insns, use software guarded jumps, etc., by testing if the target has the Zicfilp extension and instead, do CFI if the LLVM module has all of these flags:

  • cf-protection-branch: Needs to be a non-zero integer (which means true)
  • cf-branch-label-scheme: Needs to be unlabeled or func-sig
    • // LPAD generation is only supported with unlabeled as of now

Context:

In clang, Zicfilp-based CFI (the unlabeled scheme) can now be enabled by giving the -fcf-protection=branch -mcf-branch-label-scheme=unlabeled options. With these options, the clang frontend adds the above-mentioned flags to LLVM modules. Here we want to align compiler CFI behaviors to be enabled by the semantics of those LLVM module flags, instead of relying on the inaccurate indicator of whether the Zicfilp extension is available, so the toolchain's behavior is more streamlined and expected.

Also, since LPAD insns can be executed regardless of whether Zicfilp extension is available on target or not (due to LPAD insn being encoded as a standard hint insn), clang accepts the above-mentioned CLI options even if Zicfilp is not available and expects backend to still generate LPAD insns. This patch enables LPAD insn generation and other CFI-tailored compiler changes in such cases.

Expected Behavior:

Stop codegening LPAD insns by testing if the target has the Zicfilp extension
and instead, codegen LPAD insns if the LLVM module has all of these flags:

+ `cf-protection-branch`: Needs to be a non-zero integer (which means `true`)
+ `cf-branch-label-scheme`: Needs to be `unlabeled`

Context:

In clang, Zicfilp-based control flow integrity (the `unlabeled` scheme) can now
be enabled by giving the `-fcf-protection=branch
-mcf-branch-label-scheme=unlabeled` options. With these options, the clang
frontend adds the above-mentioned flags to LLVM modules. Here we want to align
LPAD insn codegen to be enabled by the semantics of those LLVM module flags,
instead of relying on the inaccurate indicator of whether the Zicfilp extension
is available, so the toolchain's behavior is more streamlined and expected.

Also, since LPAD insns can be executed regardless of whether Zicfilp is
available in target or not (due to LPAD insn being encoded as a standard hint
insn), clang accepts the above-mentioned CLI options even if Zicfilp is not
enabled and expects backend to still generate LPAD insns. This patch enables
LPAD insn generation in such cases.
@llvmbot
Copy link
Member

llvmbot commented Aug 5, 2025

@llvm/pr-subscribers-llvm-support

@llvm/pr-subscribers-backend-risc-v

Author: Ming-Yi Lai (mylai-mtk)

Changes

Expected Behavior:

Stop codegening LPAD insns by testing if the target has the Zicfilp extension and instead, codegen LPAD insns if the LLVM module has all of these flags:

  • cf-protection-branch: Needs to be a non-zero integer (which means true)
  • cf-branch-label-scheme: Needs to be unlabeled

Context:

In clang, Zicfilp-based control flow integrity (the unlabeled scheme) can now be enabled by giving the -fcf-protection=branch -mcf-branch-label-scheme=unlabeled options. With these options, the clang frontend adds the above-mentioned flags to LLVM modules. Here we want to align LPAD insn codegen to be enabled by the semantics of those LLVM module flags, instead of relying on the inaccurate indicator of whether the Zicfilp extension is available, so the toolchain's behavior is more streamlined and expected.

Also, since LPAD insns can be executed regardless of whether Zicfilp is available in target or not (due to LPAD insn being encoded as a standard hint insn), clang accepts the above-mentioned CLI options even if Zicfilp is not enabled and expects backend to still generate LPAD insns. This patch enables LPAD insn generation in such cases.


Full diff: https://github.com/llvm/llvm-project/pull/152121.diff

3 Files Affected:

  • (modified) llvm/include/llvm/Support/RISCVISAUtils.h (+19)
  • (modified) llvm/lib/Target/RISCV/RISCVIndirectBranchTracking.cpp (+20-1)
  • (modified) llvm/test/CodeGen/RISCV/lpad.ll (+5)
diff --git a/llvm/include/llvm/Support/RISCVISAUtils.h b/llvm/include/llvm/Support/RISCVISAUtils.h
index 165bb08d66431..e2dd833de74fd 100644
--- a/llvm/include/llvm/Support/RISCVISAUtils.h
+++ b/llvm/include/llvm/Support/RISCVISAUtils.h
@@ -14,7 +14,9 @@
 #define LLVM_SUPPORT_RISCVISAUTILS_H
 
 #include "llvm/ADT/StringRef.h"
+#include "llvm/ADT/StringSwitch.h"
 #include "llvm/Support/Compiler.h"
+#include <cassert>
 #include <map>
 #include <string>
 
@@ -43,6 +45,23 @@ struct ExtensionComparator {
 typedef std::map<std::string, ExtensionVersion, ExtensionComparator>
     OrderedExtensionMap;
 
+enum class ZicfilpLabelSchemeKind {
+  Invalid,
+  Unlabeled,
+  FuncSig,
+};
+
+// See clang::getCFBranchLabelSchemeFlagVal() for possible CFBranchLabelScheme
+inline ZicfilpLabelSchemeKind
+getZicfilpLabelScheme(const StringRef CFBranchLabelScheme) {
+  const auto Ret = StringSwitch<ZicfilpLabelSchemeKind>(CFBranchLabelScheme)
+                       .Case("unlabeled", ZicfilpLabelSchemeKind::Unlabeled)
+                       .Case("func-sig", ZicfilpLabelSchemeKind::FuncSig)
+                       .Default(ZicfilpLabelSchemeKind::Invalid);
+  assert(Ret != ZicfilpLabelSchemeKind::Invalid);
+  return Ret;
+}
+
 } // namespace RISCVISAUtils
 
 } // namespace llvm
diff --git a/llvm/lib/Target/RISCV/RISCVIndirectBranchTracking.cpp b/llvm/lib/Target/RISCV/RISCVIndirectBranchTracking.cpp
index 43621b8f0f33d..554f7d5494731 100644
--- a/llvm/lib/Target/RISCV/RISCVIndirectBranchTracking.cpp
+++ b/llvm/lib/Target/RISCV/RISCVIndirectBranchTracking.cpp
@@ -19,11 +19,14 @@
 #include "llvm/CodeGen/MachineFunctionPass.h"
 #include "llvm/CodeGen/MachineInstrBuilder.h"
 #include "llvm/CodeGen/MachineModuleInfo.h"
+#include "llvm/IR/Module.h"
+#include "llvm/Support/RISCVISAUtils.h"
 
 #define DEBUG_TYPE "riscv-indrect-branch-tracking"
 #define PASS_NAME "RISC-V Indirect Branch Tracking"
 
 using namespace llvm;
+using namespace llvm::RISCVISAUtils;
 
 cl::opt<uint32_t> PreferredLandingPadLabel(
     "riscv-landing-pad-label", cl::ReallyHidden,
@@ -64,9 +67,25 @@ static void emitLpad(MachineBasicBlock &MBB, const RISCVInstrInfo *TII,
 bool RISCVIndirectBranchTracking::runOnMachineFunction(MachineFunction &MF) {
   const auto &Subtarget = MF.getSubtarget<RISCVSubtarget>();
   const RISCVInstrInfo *TII = Subtarget.getInstrInfo();
-  if (!Subtarget.hasStdExtZicfilp())
+
+  const Module *const M = MF.getFunction().getParent();
+  if (!M)
+    return false;
+  if (const Metadata *const Flag = M->getModuleFlag("cf-protection-branch");
+      !Flag || mdconst::extract<ConstantInt>(Flag)->isZero())
     return false;
 
+  StringRef CFBranchLabelScheme;
+  if (const Metadata *const MD = M->getModuleFlag("cf-branch-label-scheme"))
+    CFBranchLabelScheme = cast<MDString>(MD)->getString();
+  else
+    report_fatal_error("missing cf-branch-label-scheme module flag");
+
+  const ZicfilpLabelSchemeKind Scheme =
+      getZicfilpLabelScheme(CFBranchLabelScheme);
+  if (Scheme != ZicfilpLabelSchemeKind::Unlabeled)
+    report_fatal_error("unsupported cf-branch-label-scheme module flag");
+
   uint32_t FixedLabel = 0;
   if (PreferredLandingPadLabel.getNumOccurrences() > 0) {
     if (!isUInt<20>(PreferredLandingPadLabel))
diff --git a/llvm/test/CodeGen/RISCV/lpad.ll b/llvm/test/CodeGen/RISCV/lpad.ll
index 93eda6f10eedb..f69b4dad068b5 100644
--- a/llvm/test/CodeGen/RISCV/lpad.ll
+++ b/llvm/test/CodeGen/RISCV/lpad.ll
@@ -289,3 +289,8 @@ define void @interrupt() "interrupt"="machine" {
 ; FIXED-ONE-NEXT:    mret
   ret void
 }
+
+!llvm.module.flags = !{!0, !1}
+
+!0 = !{i32 8, !"cf-protection-branch", i32 1}
+!1 = !{i32 1, !"cf-branch-label-scheme", !"unlabeled"}

if (const Metadata *const MD = M->getModuleFlag("cf-branch-label-scheme"))
CFBranchLabelScheme = cast<MDString>(MD)->getString();
else
report_fatal_error("missing cf-branch-label-scheme module flag");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use reportFatalUsageError which won't generate a stack trace.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is inlined to somewhere (getSubtargetImpl()) that have multiple possible call paths, so I think it's better to print stack for this one.

@topperc
Copy link
Collaborator

topperc commented Aug 5, 2025

Do you plan to also address the code that uses Subtarget.hasStdExtZicfilp() in RISCVISelLowering and these patterns in RISCVInstrInfo.td

let Predicates = [NoStdExtZicfilp],                                              
    isBarrier = 1, isBranch = 1, isIndirectBranch = 1, isTerminator = 1 in       
def PseudoBRIND : Pseudo<(outs), (ins GPRJALR:$rs1, simm12:$imm12), []>,         
                  PseudoInstExpansion<(JALR X0, GPR:$rs1, simm12:$imm12)>;       
                                                                                 
let Predicates = [HasStdExtZicfilp],                                             
    isBarrier = 1, isBranch = 1, isIndirectBranch = 1, isTerminator = 1 in {     
def PseudoBRINDNonX7 : Pseudo<(outs), (ins GPRJALRNonX7:$rs1, simm12:$imm12), []>,
                       PseudoInstExpansion<(JALR X0, GPR:$rs1, simm12:$imm12)>;  
def PseudoBRINDX7 : Pseudo<(outs), (ins GPRX7:$rs1, simm12:$imm12), []>,         
                    PseudoInstExpansion<(JALR X0, GPR:$rs1, simm12:$imm12)>;     
}                                                                                
                                                                                 
// For Zicfilp, need to avoid using X7/T2 for indirect branches which need       
// landing pad.                                                                  
let Predicates = [HasStdExtZicfilp] in {                                         
def : Pat<(brind GPRJALRNonX7:$rs1), (PseudoBRINDNonX7 GPRJALRNonX7:$rs1, 0)>;   
def : Pat<(brind (add GPRJALRNonX7:$rs1, simm12:$imm12)),                        
          (PseudoBRINDNonX7 GPRJALRNonX7:$rs1, simm12:$imm12)>;                  
                                                                                 
def : Pat<(riscv_sw_guarded_brind GPRX7:$rs1), (PseudoBRINDX7 GPRX7:$rs1, 0)>;   
def : Pat<(riscv_sw_guarded_brind (add GPRX7:$rs1, simm12:$imm12)),              
          (PseudoBRINDX7 GPRX7:$rs1, simm12:$imm12)>;                            
}                                                                                
                                                                                 
let Predicates = [NoStdExtZicfilp] in {                                          
def : Pat<(brind GPRJALR:$rs1), (PseudoBRIND GPRJALR:$rs1, 0)>;                  
def : Pat<(brind (add GPRJALR:$rs1, simm12:$imm12)),                             
          (PseudoBRIND GPRJALR:$rs1, simm12:$imm12)>;                            
} 

@mylai-mtk
Copy link
Contributor Author

@topperc

Do you plan to also address the code that uses Subtarget.hasStdExtZicfilp() in RISCVISelLowering and these patterns in RISCVInstrInfo.td

It looks like I have to, otherwise tests would just fail. As a result, I reviewed all behaviors that depended on hasStdExtZicfilp() and changed them to hasZicfilpCFI(), which is a new predicate also defined on the RISCVSubtarget class but has its returned value determined solely by the Function's (Module's) cf flags (Zicfilp ext availability is not taken into consideration). I guess I would need to rewrite the PR commit messages as the modified scope is now significantly larger.

@mylai-mtk mylai-mtk changed the title [RISCV][Zicfilp] Codegen LPAD insns by looking at module flags [RISCV][Zicfilp] Enable Zicfilp CFI compiler behaviors by looking at module flags Aug 7, 2025
@lenary
Copy link
Member

lenary commented Aug 7, 2025

As soon as you have module flags, you need to ensure that when LTO merges two modules, that you get the right errors or merge behaviour.

I understand that, unlike subtarget features, when linking all modules (and functions inside that module) need the same protection configuration, right? This semantic is probably the easiest to implement.

Please write a test for this and put it in llvm/test/LTO/RISCV

@mylai-mtk
Copy link
Contributor Author

@lenary

you need to ensure that when LTO merges two modules, that you get the right errors or merge behaviour. ... Please write a test for this ...

Thanks for pointing this out. I had added the LTO tests.

@mylai-mtk
Copy link
Contributor Author

Update: Model ZICFILP-func-sig and ZICFILP-unlabeled as real target features

Reason: During my prototyping of some ELF header field emission (cf. riscv-non-isa/riscv-elf-psabi-doc#474), and I found that in the emission function only "real" target features could be accessed, so I think it's better to just consolidate the module flags into real target features.

@mylai-mtk mylai-mtk force-pushed the zicfilp-codegen-by-cf-flag branch from e0044a7 to f9f6bba Compare August 21, 2025 09:03
…sZicfilpCFI() even if new Subtarget is not created
Copy link
Member

@lenary lenary left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is broadly correct, but I would like another reviewer.

My comments below are details, not a fundamental problem with the overall approach.

Comment on lines 1805 to 1812
def FeatureZicfilpUnlabeled
: SubtargetFeature<
"zicfilp-unlabeled", "HasZicfilpUnlabeled", "true",
"Enforce forward-edge control-flow integrity with ZICFILP-unlabeled">;
def FeatureZicfilpFuncSig
: SubtargetFeature<
"zicfilp-func-sig", "HasZicfilpFuncSig", "true",
"Enforce forward-edge control-flow integrity with ZICFILP-func-sig">;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can do a target feature that's an enum, not a boolean. See the RISCVProcFamily enum for how this is done - fewer methods are automatically generated, but it's clearer IMO.

Copy link
Contributor Author

@mylai-mtk mylai-mtk Aug 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if the "enum" (or in the TableGen backend implementation, "integer") is a proper fit here. It looks like the integer version of SubtargetFeature actually models features with "leveled" semantics, e.g. -target-feature +feature-10 would override -target-feature +feature-0. This feels wrong as the leveled semantics does not suit ZicfilpCFI in that different options should err instead of override. The current TableGen usages of this integer-based SubtargetFeature may work, but I don't think its semantics guarantees future soundness.

// Value - Value the XXXSubtarget field to be set to by feature.
//
// A value of "true" or "false" implies the field is a bool. Otherwise,
// it is assumed to be an integer. the integer value may be the name of an
// enum constant. If multiple features use the same integer field, the
// field will be set to the maximum value of all enabled features that
// share the field.
//
string Value = v;

(

// Value - Value the XXXSubtarget field to be set to by feature.
)

@mylai-mtk
Copy link
Contributor Author

Hi, I have been modeling the cf-protection module flags as target features, and I found some problems so I'm not sure whether I should continue pursuing this path:

By modeling Zicfilp CFI module flags as subtarget features, I hope the following could be achieved:

  • Provide easy access to whether Zicfilp CFI is enabled.

-> References to MCSubtargetInfo is more widely available throughout the code base; Modules, on the hand, are somehow harder to obtain.


  • Avoid repeated parsing of module flags. (I mean no duplicated and scattered codes)
  • Provide correct and immutable indication of whether Zicfilp CFI is enabled since the very beginning of the backend pipeline.

-> By enforcing the consistency between the feature bits and module flags when creating RISCVSubtargets, these 2 goals could be achieved.


The problem arose when I was trying to handle the module-wide application of the Zicfilp CFI module flags: In TargetMachine, there's also a feature bitmap contained in the STI field (typed MCSubtargetInfo). This TM.STI does not belong to any Function, and is created without the influence of a specific Module instance, so I cannot sync (enforce) the information of module flags into it at creation, thus the above goal is somehow, broken.

In my experience with LLVM, I always assumed that TM.STI->hasFeature(XXX) represents whether a feature is enabled for the given Module we're compiling. However, the inability to sync module flags to TM.STI triggered a challenge to this assumption: Could a TargetMachine instance be reused to compile multiple Modules, and TM.STI actually represents what the target machine is capable of, not how it behaves? If this is indeed the correct understanding of TargetMachine, then it would be natural that no module flags could be synced to TM.STI, since module flags controls the Module's behavior, but not necessarily the TargetMachine's, and different Modules compiled with the same TargetMachine instance could exhibit different behaviors. It would further declare that my goal of "Avoid repeated parsing of module flags" is not suitable to implement with TM.STI, since it does not convey Module behaviors. (and thus parsing module flags when implementing module-wide behavior is preferred)

So in a nutshell, I want to ask:

  • Whether modeling per-Function behavior with RISCVSubtarget is suitable?
  • Whether modeling per-Module behavior with TM.STI is not suitable, since TM.STI is never intended to be in sync with and serve only one specific Module?

(By the way, I considered recommending this approach to Zicfiss in #152251, but given the uncertainty I'm currently facing, I think it's better that I get approval for this approach beforehand.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants