Skip to content

Conversation

jurahul
Copy link
Contributor

@jurahul jurahul commented Sep 4, 2025

Print the source location of the instruction definition in comment next to the enum value for each instruction. To make this more readable, change formatting of the instruction enums to be better aligned.

Example output:

    VLD4qWB_register_Asm_8                 = 573, // (ARMInstrNEON.td:8849)
    VMOVD0                                 = 574, // (ARMInstrNEON.td:6337)
    VMOVDcc                                = 575, // (ARMInstrVFP.td:2466)
    VMOVHcc                                = 576, // (ARMInstrVFP.td:2474)
    VMOVQ0                                 = 577, // (ARMInstrNEON.td:6341)

Print the source location of the instruction definition in comment
next to the enum value for each instruction. To make this more
readable, change formatting of the instruction enums to be better
aligned.
@jurahul jurahul force-pushed the nfc_inst_def_source_loc_in_comment branch from 9a5b639 to bd041cf Compare September 4, 2025 16:57
@topperc
Copy link
Collaborator

topperc commented Sep 4, 2025

How much file size does this add? These files are already pretty large. We removed comments from *GenDAGISel.inc by default years ago because they had a noticeable effect on compile time.

@jurahul
Copy link
Contributor Author

jurahul commented Sep 4, 2025

I didn't check because I didn't think it might be an issue. I can measure and post it here.

@jurahul
Copy link
Contributor Author

jurahul commented Sep 4, 2025

I guess also that this enum in particular is available throughout a target backend, so its included multiple times (once per .cpp file for that target)

@jurahul
Copy link
Contributor Author

jurahul commented Sep 4, 2025

                                       old        new     % increase
AArch64GenInstrInfo.inc            2655197    3258785     22.73%
AMDGPUGenInstrInfo.inc            23086409   25991133     12.58%
ARCGenInstrInfo.inc                 249828     290249     16.18%
ARMGenInstrInfo.inc                1544602    1782064     15.37%
AVRGenInstrInfo.inc                 185376     212756     14.77%
BPFGenInstrInfo.inc                 178051     205677     15.52%
CSKYGenInstrInfo.inc                237436     286008     20.46%
DirectXGenInstrInfo.inc             117364     133948     14.13%
HexagonGenInstrInfo.inc             998187    1208303     21.05%
LanaiGenInstrInfo.inc               147120     169398     15.14%
LoongArchGenInstrInfo.inc           525739     677574     28.88%
M68kGenInstrInfo.inc                486862     566026     16.26%
MSP430GenInstrInfo.inc              218135     255037     16.92%
MipsGenInstrInfo.inc                908964    1077987     18.60%
NVPTXGenInstrInfo.inc              1880227    2185637     16.24%
PPCGenInstrInfo.inc                 818220     992338     21.28%
R600GenInstrInfo.inc                228852     264643     15.64%
RISCVGenInstrInfo.inc              5542500    6326909     14.15%
SPIRVGenInstrInfo.inc               275777     315732     14.49%
SparcGenInstrInfo.inc               242143     291428     20.35%
SystemZGenInstrInfo.inc             897855    1099037     22.41%
VEGenInstrInfo.inc                 2403221    2945974     22.58%
WebAssemblyGenInstrInfo.inc         788063     911669     15.68%
X86GenInstrInfo.inc                8169180    9305799     13.91%
XCoreGenInstrInfo.inc               186533     215773     15.68%
XtensaGenInstrInfo.inc              214711     248995     15.97%

So average is a 20% increase in the size.

@jurahul jurahul marked this pull request as ready for review September 4, 2025 18:51
@jurahul jurahul requested a review from arsenm September 4, 2025 18:51
@jurahul jurahul requested a review from topperc September 4, 2025 18:51
@llvmbot
Copy link
Member

llvmbot commented Sep 4, 2025

@llvm/pr-subscribers-tablegen

Author: Rahul Joshi (jurahul)

Changes

Print the source location of the instruction definition in comment next to the enum value for each instruction. To make this more readable, change formatting of the instruction enums to be better aligned.

Example output:

    VLD4qWB_register_Asm_8                 = 573, // (ARMInstrNEON.td:8849)
    VMOVD0                                 = 574, // (ARMInstrNEON.td:6337)
    VMOVDcc                                = 575, // (ARMInstrVFP.td:2466)
    VMOVHcc                                = 576, // (ARMInstrVFP.td:2474)
    VMOVQ0                                 = 577, // (ARMInstrNEON.td:6341)

Full diff: https://github.com/llvm/llvm-project/pull/156927.diff

2 Files Affected:

  • (modified) llvm/utils/TableGen/Common/CodeGenInstruction.h (+2)
  • (modified) llvm/utils/TableGen/InstrInfoEmitter.cpp (+20-6)
diff --git a/llvm/utils/TableGen/Common/CodeGenInstruction.h b/llvm/utils/TableGen/Common/CodeGenInstruction.h
index 9372614f26e1a..ed0bfa7098eb7 100644
--- a/llvm/utils/TableGen/Common/CodeGenInstruction.h
+++ b/llvm/utils/TableGen/Common/CodeGenInstruction.h
@@ -320,6 +320,8 @@ class CodeGenInstruction {
     return RV && isa<DagInit>(RV->getValue());
   }
 
+  StringRef getName() const { return TheDef->getName(); }
+
 private:
   bool isOperandImpl(StringRef OpListName, unsigned i,
                      StringRef PropertyName) const;
diff --git a/llvm/utils/TableGen/InstrInfoEmitter.cpp b/llvm/utils/TableGen/InstrInfoEmitter.cpp
index 26d93fc13c9ba..afbd64e4c7012 100644
--- a/llvm/utils/TableGen/InstrInfoEmitter.cpp
+++ b/llvm/utils/TableGen/InstrInfoEmitter.cpp
@@ -25,6 +25,8 @@
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/StringExtras.h"
 #include "llvm/Support/Casting.h"
+#include "llvm/Support/FormattedStream.h"
+#include "llvm/Support/SourceMgr.h"
 #include "llvm/Support/raw_ostream.h"
 #include "llvm/TableGen/Error.h"
 #include "llvm/TableGen/Record.h"
@@ -1289,8 +1291,9 @@ void InstrInfoEmitter::emitRecord(
 
 // emitEnums - Print out enum values for all of the instructions.
 void InstrInfoEmitter::emitEnums(
-    raw_ostream &OS,
+    raw_ostream &RawOS,
     ArrayRef<const CodeGenInstruction *> NumberedInstructions) {
+  formatted_raw_ostream OS(RawOS);
   OS << "#ifdef GET_INSTRINFO_ENUM\n";
   OS << "#undef GET_INSTRINFO_ENUM\n";
 
@@ -1302,18 +1305,29 @@ void InstrInfoEmitter::emitEnums(
 
   OS << "namespace llvm::" << Namespace << " {\n";
 
+  auto II = llvm::max_element(
+      NumberedInstructions,
+      [](const CodeGenInstruction *InstA, const CodeGenInstruction *InstB) {
+        return InstA->getName().size() < InstB->getName().size();
+      });
+  size_t MaxNameSize = (*II)->getName().size();
+
   OS << "  enum {\n";
-  for (const CodeGenInstruction *Inst : NumberedInstructions)
-    OS << "    " << Inst->TheDef->getName()
-       << "\t= " << Target.getInstrIntValue(Inst->TheDef) << ",\n";
+  for (const CodeGenInstruction *Inst : NumberedInstructions) {
+    OS << "    " << Inst->TheDef->getName();
+    OS.PadToColumn(MaxNameSize + 5);
+    OS << " = " << Target.getInstrIntValue(Inst->TheDef) << ", // (";
+    OS << SrcMgr.getFormattedLocationNoOffset(Inst->TheDef->getLoc().front())
+       << ")\n";
+  }
   OS << "    INSTRUCTION_LIST_END = " << NumberedInstructions.size() << '\n';
-  OS << "  };\n\n";
+  OS << "  };\n";
   OS << "} // end namespace llvm::" << Namespace << '\n';
   OS << "#endif // GET_INSTRINFO_ENUM\n\n";
 
   OS << "#ifdef GET_INSTRINFO_SCHED_ENUM\n";
   OS << "#undef GET_INSTRINFO_SCHED_ENUM\n";
-  OS << "namespace llvm::" << Namespace << "::Sched {\n\n";
+  OS << "namespace llvm::" << Namespace << "::Sched {\n";
   OS << "  enum {\n";
   auto ExplictClasses = SchedModels.explicitSchedClasses();
   for (const auto &[Idx, Class] : enumerate(ExplictClasses))

@jurahul
Copy link
Contributor Author

jurahul commented Sep 4, 2025

How big a concern is this? Or is it one of those deaths by a thousand cuts cases adding to the incremental creep?

@topperc
Copy link
Collaborator

topperc commented Sep 4, 2025

How big a concern is this? Or is it one of those deaths by a thousand cuts cases adding to the incremental creep?

Probably just death by a thousand cuts. There appear to be several other silly things we could remove in this file to mitigate if need to.

I've never had much trouble finding instructions with grep on the targets I've worked on, but maybe other targets are worse? It's digging the classes and multiclasses needed to understand an instruction that I usually struggle with.

Copy link
Collaborator

@topperc topperc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@s-barannikov
Copy link
Contributor

I've never had much trouble finding instructions with grep on the targets I've worked on, but maybe other targets are worse? It's digging the classes and multiclasses needed to understand an instruction that I usually struggle with.

I do llvm-tblgen -I llvm/include -I llvm/lib/Target/SomeTarget SomeTarget.td if I need to know something specific about an instruction.

@jurahul
Copy link
Contributor Author

jurahul commented Sep 4, 2025

I've never had much trouble finding instructions with grep on the targets I've worked on, but maybe other targets are worse? It's digging the classes and multiclasses needed to understand an instruction that I usually struggle with.

I do llvm-tblgen -I llvm/include -I llvm/lib/Target/SomeTarget SomeTarget.td if I need to know something specific about an instruction.

Right, though not all folks may be familiar with this and its easier and more readily available if its right there.

it's digging the classes and multiclasses needed to understand an instruction that I usually struggle with.

+1, esp if these are auto-generated (which I am sure a lot of downstream backends do).

@topperc
Copy link
Collaborator

topperc commented Sep 4, 2025

I've never had much trouble finding instructions with grep on the targets I've worked on, but maybe other targets are worse? It's digging the classes and multiclasses needed to understand an instruction that I usually struggle with.

I do llvm-tblgen -I llvm/include -I llvm/lib/Target/SomeTarget SomeTarget.td if I need to know something specific about an instruction.

Right, though not all folks may be familiar with this and its easier and more readily available if its right there.

That command doesn’t print any location information anyway.

@jurahul
Copy link
Contributor Author

jurahul commented Sep 4, 2025

I've never had much trouble finding instructions with grep on the targets I've worked on, but maybe other targets are worse? It's digging the classes and multiclasses needed to understand an instruction that I usually struggle with.

I do llvm-tblgen -I llvm/include -I llvm/lib/Target/SomeTarget SomeTarget.td if I need to know something specific about an instruction.

Right, though not all folks may be familiar with this and its easier and more readily available if its right there.

That command doesn’t print any location information anyway.

Right, you need --print-detailed-records to get the location

@s-barannikov
Copy link
Contributor

To be clear, I have no objections to this change.

@jurahul jurahul merged commit 78fbca4 into llvm:main Sep 4, 2025
9 checks passed
@jurahul jurahul deleted the nfc_inst_def_source_loc_in_comment branch September 4, 2025 22:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants