Skip to content

Conversation

bdunkin
Copy link

@bdunkin bdunkin commented Jun 11, 2025

This change rewrites the implementation of AlignArrayOfStructures to address current limitations and formatting bugs. It now also handles non-rectangular arrays (arrays of arrays where the inner arrays do not all have the same number of elements).

There were a couple significant sources of formatting bugs in the previous implementation:

  • Modifying newlines during alignment instead of during the line break optimization pass. This caused many unexpected formatings because it overwrote the result of the line break optimization pass, ignoring almost all formatting rules.
  • Modifying spaces before computing alignment information. This caused cases like a single row to change format, even when there was no alignment to be made.
  • Not keeping all the bookkeeping around token columns up to date when alignment was made. This was not unique to AlignArrayOfStructures, and this change fixes all cases. This meant trailing comments, and escaped newlines would not always align correctly.
  • Incorrect handling of tabs for alignment within a line, as opposed to the beginning of a line.

All of these issues have been fixed, however to do so there is a significant backwards incompatible change to how rows with line breaks are handled. Previously the columns before and after the line break would attempt to be aligned with other rows. With this change, all elements in such a row will instead be placed on their own line, and no alignment will be attempted, similar to if a trailing comma is added to a list. Trying to align columns in this case had too many edge cases. Some could not reasonably be addressed without knowing alignment information during line break optimization, which is not possible because it would require knowing the formatting of future tokens. See the changed tests for examples of the difference.

Fixed Issues (some are fixed only in that they no longer produce wacky alignment, rather than what the user wished it produced):
#138151 #85937 #76803 #76803 #76717 #55477 #55154 #53442 #55485 #55477 #54354 #51766 #86439 #142072 #89721

Attribution Note - I have been authorized to contribute this change on behalf of my company: ArenaNet LLC

Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

"};",
Style);
verifyFormat("struct test demo[] = {\n"
" {1, 2, 3, 4, 5},\n"
" {3, 4, 5},\n"
" {1, 2, 3, 4, 5 },\n"
Copy link
Author

@bdunkin bdunkin Jun 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: the 5 was actually not properly aligned in this test before as the column in the last row has two characters.

@bdunkin bdunkin marked this pull request as ready for review June 11, 2025 20:36
@llvmbot
Copy link
Member

llvmbot commented Jun 11, 2025

@llvm/pr-subscribers-clang-format

Author: Ben Dunkin (bdunkin)

Changes

This change implements AlignArrayOfStructures for non-rectangular arrays (arrays of arrays where the inner arrays do not all have the same number of elements).

It is largely backwards compatible with existing rectangular arrays, with one exception (see here). Obviously, this is not backwards compatible with existing non-rectangular arrays which would have been skipped, but will now be formatted.

The tests for non-rectangular have been updated because they did not place the end brace in the same column for all rows like is done for rectangular arrays. Some new tests were also added to cover some missing, but interesting cases.

Attribution Note - I have been authorized to contribute this change on behalf of my company: ArenaNet LLC


Patch is 28.27 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/143781.diff

4 Files Affected:

  • (modified) clang/lib/Format/TokenAnnotator.cpp (+22-1)
  • (modified) clang/lib/Format/WhitespaceManager.cpp (+235-119)
  • (modified) clang/lib/Format/WhitespaceManager.h (+10-72)
  • (modified) clang/unittests/Format/FormatTest.cpp (+53-23)
diff --git a/clang/lib/Format/TokenAnnotator.cpp b/clang/lib/Format/TokenAnnotator.cpp
index aed1672afac66..636bf2efbb81d 100644
--- a/clang/lib/Format/TokenAnnotator.cpp
+++ b/clang/lib/Format/TokenAnnotator.cpp
@@ -4251,7 +4251,28 @@ FormatToken *TokenAnnotator::calculateInitializerColumnList(
       CurrentToken = CurrentToken->Next;
       if (!CurrentToken)
         break;
-      CurrentToken->StartsColumn = true;
+
+      // Right (closing) braces should not count as starting a column because
+      // they are aligned using separate logic.
+
+      // Note: This uses startsSequence() so that trailing comments are skipped
+      // when checking if the token after a comma/l-brace is a r_brace. We can't
+      // just ignore comments in general, because an inline comment with
+      // something else after it should still count as starting a column.
+      // IE:
+      //
+      //        { // a
+      //          4
+      //        }
+      //
+      //   vs.
+      //
+      //        { /* a */ 4 }
+      //
+      // In the first case, the comment does not start a column, but in the
+      // second it does
+      CurrentToken->StartsColumn = !CurrentToken->startsSequence(tok::r_brace);
+
       CurrentToken = CurrentToken->Previous;
     }
     CurrentToken = CurrentToken->Next;
diff --git a/clang/lib/Format/WhitespaceManager.cpp b/clang/lib/Format/WhitespaceManager.cpp
index cfaadf07edfd0..a458e3610db66 100644
--- a/clang/lib/Format/WhitespaceManager.cpp
+++ b/clang/lib/Format/WhitespaceManager.cpp
@@ -1318,75 +1318,141 @@ void WhitespaceManager::alignArrayInitializers(unsigned Start, unsigned End) {
 
 void WhitespaceManager::alignArrayInitializersRightJustified(
     CellDescriptions &&CellDescs) {
-  if (!CellDescs.isRectangular())
+
+  const int ColumnCount = CellDescs.ColumnStartingCellIndices.size();
+  if (ColumnCount < 2)
     return;
 
   const int BracePadding = Style.Cpp11BracedListStyle ? 0 : 1;
+  auto &ColumnStartingIndices = CellDescs.ColumnStartingCellIndices;
   auto &Cells = CellDescs.Cells;
-  // Now go through and fixup the spaces.
-  auto *CellIter = Cells.begin();
-  for (auto i = 0U; i < CellDescs.CellCounts[0]; ++i, ++CellIter) {
-    unsigned NetWidth = 0U;
-    if (isSplitCell(*CellIter))
-      NetWidth = getNetWidth(Cells.begin(), CellIter, CellDescs.InitialSpaces);
-    auto CellWidth = getMaximumCellWidth(CellIter, NetWidth);
-
-    if (Changes[CellIter->Index].Tok->is(tok::r_brace)) {
-      // So in here we want to see if there is a brace that falls
-      // on a line that was split. If so on that line we make sure that
-      // the spaces in front of the brace are enough.
-      const auto *Next = CellIter;
-      do {
-        const FormatToken *Previous = Changes[Next->Index].Tok->Previous;
-        if (Previous && Previous->isNot(TT_LineComment)) {
-          Changes[Next->Index].Spaces = BracePadding;
-          Changes[Next->Index].NewlinesBefore = 0;
-        }
-        Next = Next->NextColumnElement;
-      } while (Next);
-      // Unless the array is empty, we need the position of all the
-      // immediately adjacent cells
-      if (CellIter != Cells.begin()) {
-        auto ThisNetWidth =
-            getNetWidth(Cells.begin(), CellIter, CellDescs.InitialSpaces);
-        auto MaxNetWidth = getMaximumNetWidth(
-            Cells.begin(), CellIter, CellDescs.InitialSpaces,
-            CellDescs.CellCounts[0], CellDescs.CellCounts.size());
-        if (ThisNetWidth < MaxNetWidth)
-          Changes[CellIter->Index].Spaces = (MaxNetWidth - ThisNetWidth);
-        auto RowCount = 1U;
-        auto Offset = std::distance(Cells.begin(), CellIter);
-        for (const auto *Next = CellIter->NextColumnElement; Next;
-             Next = Next->NextColumnElement) {
-          if (RowCount >= CellDescs.CellCounts.size())
-            break;
-          auto *Start = (Cells.begin() + RowCount * CellDescs.CellCounts[0]);
-          auto *End = Start + Offset;
-          ThisNetWidth = getNetWidth(Start, End, CellDescs.InitialSpaces);
-          if (ThisNetWidth < MaxNetWidth)
-            Changes[Next->Index].Spaces = (MaxNetWidth - ThisNetWidth);
-          ++RowCount;
-        }
-      }
-    } else {
-      auto ThisWidth =
-          calculateCellWidth(CellIter->Index, CellIter->EndIndex, true) +
-          NetWidth;
-      if (Changes[CellIter->Index].NewlinesBefore == 0) {
-        Changes[CellIter->Index].Spaces = (CellWidth - (ThisWidth + NetWidth));
-        Changes[CellIter->Index].Spaces += (i > 0) ? 1 : BracePadding;
+
+  // Calculate column widths
+  SmallVector<unsigned> ColumnWidths;       // Widths from the previous column
+  SmallVector<unsigned> SummedColumnWidths; // Widths from the start of the row
+
+  unsigned CurrentWidth = 0;
+  for (unsigned CellIndex : ColumnStartingIndices) {
+    const CellDescription *current = &Cells[CellIndex];
+
+    // We keep track of the width of split cells separately because if all cells
+    // in a column are split, and none of them are wider than the previous
+    // column, we need to reset the "current width" so subsequent columns are
+    // not moved out farther than expected.
+    //
+    // Given this, where the string was split because of the column limit:
+    //     {
+    //      { test1, test2, "abc "
+    //        "def", e },
+    //      { test3, test4, "abc "
+    //        "def", e },
+    //     }
+    //
+    // The 'e' tokens should not be pushed out like this:
+    //     {
+    //      { test1, test2, "abc "
+    //        "def",        e },
+    //      { test3, test4, "abc "
+    //        "def",        e },
+    //     }
+    unsigned SplitMaxWidth = 0;
+    unsigned MaxWidth = 0;
+    while (current != nullptr) {
+      unsigned CellWidth = calculateCellWidth(*current);
+
+      // If there is a split, then the "width" calculated here is actually
+      // the relative column count from the opening brace, but we want it
+      // from the end of the previous cell
+      if (current->HasSplit) {
+        // Keep track of the width in case all cells are split in the column
+        SplitMaxWidth = std::max(SplitMaxWidth, CellWidth);
+
+        // If the width is less than the previous columns combined, we should
+        // skip it would result in the next column starting before the previous
+        // column ended. We skip it by setting it to 0 as that will never be
+        // more than the current max width
+        if (CellWidth > CurrentWidth)
+          CellWidth -= CurrentWidth;
+        else
+          CellWidth = 0;
+      } else {
+        // +1 for the space after the comma in the previous column in all but
+        // the first column in which we add the brace padding from the opening
+        // brace
+        CellWidth += (current->Cell > 0) ? 1 : BracePadding;
       }
-      alignToStartOfCell(CellIter->Index, CellIter->EndIndex);
-      for (const auto *Next = CellIter->NextColumnElement; Next;
-           Next = Next->NextColumnElement) {
-        ThisWidth =
-            calculateCellWidth(Next->Index, Next->EndIndex, true) + NetWidth;
-        if (Changes[Next->Index].NewlinesBefore == 0) {
-          Changes[Next->Index].Spaces = (CellWidth - ThisWidth);
-          Changes[Next->Index].Spaces += (i > 0) ? 1 : BracePadding;
-        }
-        alignToStartOfCell(Next->Index, Next->EndIndex);
+
+      MaxWidth = std::max(MaxWidth, CellWidth);
+      current = current->NextColumnElement;
+    }
+
+    // If all the cells had splits, and at least one was not empty, use the max
+    // split width as the width of this column so that alignment below will
+    // calculate the right number of spaces, and "reset" the current width
+    // because this column always moves on to a new line which invalidates the
+    // current width.
+    if (MaxWidth == 0 && SplitMaxWidth > 0) {
+      MaxWidth = SplitMaxWidth;
+      CurrentWidth = 0;
+    }
+
+    ColumnWidths.push_back(MaxWidth);
+
+    CurrentWidth += MaxWidth;
+    SummedColumnWidths.push_back(CurrentWidth);
+
+    // +1 for the comma between cells
+    CurrentWidth++;
+  }
+
+  // Fixup spaces
+  for (RowDescription &Row : CellDescs.Rows) {
+    unsigned WidthSoFarInRow = 0;
+
+    for (unsigned i = Row.StartCellIndex; i < Row.EndCellIndex; i++) {
+      auto &Cell = Cells[i];
+      unsigned CellWidth = calculateCellWidth(Cell);
+
+      if (Cell.HasSplit) {
+        WidthSoFarInRow = CellWidth;
+
+        // This cell has been split on to multiple lines, probably because it
+        // was too long to fit on a single line, so make sure each change in the
+        // cell is aligned to the start of the first change
+        for (unsigned j = Cell.Index; j < Cell.EndIndex; j++)
+          if (Changes[j].NewlinesBefore > 0)
+            Changes[j].Spaces = Changes[Cell.Index].Spaces;
+
+      } else {
+        auto &Change = Changes[Cell.Index];
+        unsigned AlignmentSpaces = ColumnWidths[Cell.Cell] - CellWidth;
+        // This change's left side is currently flush with the previous cell and
+        // we need to move it so its right side is flush with the end of the
+        // current column. If there are new lines before it, spaces have already
+        // been added in front of it so that it is aligned with the cell on the
+        // previous line.
+        if (Change.NewlinesBefore == 0)
+          Change.Spaces = AlignmentSpaces;
+        else
+          Change.Spaces += AlignmentSpaces;
+
+        WidthSoFarInRow = SummedColumnWidths[Cell.Cell];
       }
+
+      // +1 for the comma after columns in all but the last column
+      // Note: this can't check Cell.Cell because a row may not have a full set
+      // of columns
+      if (i < Row.EndCellIndex - 1)
+        WidthSoFarInRow++;
+    }
+
+    // Align the end brace. If there was a trailing comma, the brace is already
+    // aligned on the next line and we do not need to touch it.
+    if (!Row.hasTrailingComma) {
+      auto &EndBrace = Changes[Row.ClosingBraceChangeIndex];
+
+      unsigned AlignmentSpaces = SummedColumnWidths.back() - WidthSoFarInRow;
+      EndBrace.Spaces = AlignmentSpaces + BracePadding;
     }
   }
 }
@@ -1394,47 +1460,91 @@ void WhitespaceManager::alignArrayInitializersRightJustified(
 void WhitespaceManager::alignArrayInitializersLeftJustified(
     CellDescriptions &&CellDescs) {
 
-  if (!CellDescs.isRectangular())
+  const unsigned ColumnCount = CellDescs.ColumnStartingCellIndices.size();
+  if (ColumnCount < 2)
     return;
 
+  const unsigned LastColumnIndex = ColumnCount - 1;
   const int BracePadding = Style.Cpp11BracedListStyle ? 0 : 1;
+  auto &ColumnStartingIndices = CellDescs.ColumnStartingCellIndices;
   auto &Cells = CellDescs.Cells;
-  // Now go through and fixup the spaces.
-  auto *CellIter = Cells.begin();
-  // The first cell of every row needs to be against the left brace.
-  for (const auto *Next = CellIter; Next; Next = Next->NextColumnElement) {
-    auto &Change = Changes[Next->Index];
-    Change.Spaces =
-        Change.NewlinesBefore == 0 ? BracePadding : CellDescs.InitialSpaces;
-  }
-  ++CellIter;
-  for (auto i = 1U; i < CellDescs.CellCounts[0]; i++, ++CellIter) {
-    auto MaxNetWidth = getMaximumNetWidth(
-        Cells.begin(), CellIter, CellDescs.InitialSpaces,
-        CellDescs.CellCounts[0], CellDescs.CellCounts.size());
-    auto ThisNetWidth =
-        getNetWidth(Cells.begin(), CellIter, CellDescs.InitialSpaces);
-    if (Changes[CellIter->Index].NewlinesBefore == 0) {
-      Changes[CellIter->Index].Spaces =
-          MaxNetWidth - ThisNetWidth +
-          (Changes[CellIter->Index].Tok->isNot(tok::r_brace) ? 1
-                                                             : BracePadding);
+
+  // Calculate column starting widths
+  SmallVector<unsigned> StartWidths;
+
+  // The first column starts after the opening brace's padding
+  StartWidths.push_back(BracePadding);
+
+  for (unsigned i = 0; i < ColumnCount; i++) {
+    const CellDescription *current = &Cells[ColumnStartingIndices[i]];
+
+    unsigned MaxWidth = 0;
+    while (current != nullptr) {
+      unsigned CellWidth = calculateCellWidth(*current);
+
+      // +1 for the comma after the cell if it exists
+      if (Changes[current->EndIndex].Tok->is(tok::comma))
+        CellWidth++;
+
+      // +1 for the space after the column if this is not the last column
+      if (i < LastColumnIndex)
+        CellWidth++;
+
+      // If there is a split, then the "width" calculated here is the relative
+      // column count from the opening brace, otherwise this is the relative
+      // column count from the previous cell. We always want it relative to the
+      // opening brace.
+      if (!current->HasSplit)
+        CellWidth += StartWidths[i];
+
+      MaxWidth = std::max(MaxWidth, CellWidth);
+      current = current->NextColumnElement;
     }
-    auto RowCount = 1U;
-    auto Offset = std::distance(Cells.begin(), CellIter);
-    for (const auto *Next = CellIter->NextColumnElement; Next;
-         Next = Next->NextColumnElement) {
-      if (RowCount >= CellDescs.CellCounts.size())
-        break;
-      auto *Start = (Cells.begin() + RowCount * CellDescs.CellCounts[0]);
-      auto *End = Start + Offset;
-      auto ThisNetWidth = getNetWidth(Start, End, CellDescs.InitialSpaces);
-      if (Changes[Next->Index].NewlinesBefore == 0) {
-        Changes[Next->Index].Spaces =
-            MaxNetWidth - ThisNetWidth +
-            (Changes[Next->Index].Tok->isNot(tok::r_brace) ? 1 : BracePadding);
+
+    // If this is the last column, add the brace padding to the width so that
+    // the end brace gets the necessary padding
+    if (i == LastColumnIndex)
+      MaxWidth += BracePadding;
+
+    StartWidths.push_back(MaxWidth);
+  }
+
+  // Fixup spaces
+  for (RowDescription &Row : CellDescs.Rows) {
+    unsigned WidthSoFarInRow = 0;
+    for (unsigned i = Row.StartCellIndex; i < Row.EndCellIndex; i++) {
+      auto &Cell = Cells[i];
+      unsigned CellWidth = calculateCellWidth(Cell);
+
+      auto &Change = Changes[Cell.Index];
+
+      // We only want to modify the spaces in front of this cell if there are no
+      // new lines before it. If there are new lines, it has been formatted to
+      // be on its own line, and we are not to change that. We _do_ need to keep
+      // track of how far we are in the line for future columns, though
+      if (Change.NewlinesBefore == 0) {
+        if (WidthSoFarInRow <= StartWidths[Cell.Cell])
+          Change.Spaces = StartWidths[Cell.Cell] - WidthSoFarInRow;
+
+        WidthSoFarInRow += CellWidth + Change.Spaces;
+      } else if (Cell.HasSplit) {
+        WidthSoFarInRow = CellWidth;
+      } else {
+        WidthSoFarInRow += CellWidth;
       }
-      ++RowCount;
+
+      // +1 for the comma after columns in all but the last column
+      // Note: this can't check Cell.Cell because a row may not have a full set
+      // of columns
+      if (i < Row.EndCellIndex - 1)
+        WidthSoFarInRow += 1;
+    }
+
+    // Align the end brace. If there was a trailing comma, the brace is already
+    // aligned on the next line and we do not need to touch it.
+    if (!Row.hasTrailingComma) {
+      auto &EndBrace = Changes[Row.ClosingBraceChangeIndex];
+      EndBrace.Spaces = StartWidths.back() - WidthSoFarInRow;
     }
   }
 }
@@ -1455,11 +1565,12 @@ WhitespaceManager::CellDescriptions WhitespaceManager::getCells(unsigned Start,
 
   unsigned Depth = 0;
   unsigned Cell = 0;
-  SmallVector<unsigned> CellCounts;
   unsigned InitialSpaces = 0;
   unsigned InitialTokenLength = 0;
   unsigned EndSpaces = 0;
   SmallVector<CellDescription> Cells;
+  SmallVector<RowDescription> Rows;
+  SmallVector<unsigned> StartingCellIndices;
   const FormatToken *MatchingParen = nullptr;
   for (unsigned i = Start; i < End; ++i) {
     auto &C = Changes[i];
@@ -1470,6 +1581,7 @@ WhitespaceManager::CellDescriptions WhitespaceManager::getCells(unsigned Start,
     if (Depth == 2) {
       if (C.Tok->is(tok::l_brace)) {
         Cell = 0;
+        Rows.push_back(RowDescription{unsigned(Cells.size()), 0, 0, false});
         MatchingParen = C.Tok->MatchingParen;
         if (InitialSpaces == 0) {
           InitialSpaces = C.Spaces + C.TokenLength;
@@ -1494,11 +1606,19 @@ WhitespaceManager::CellDescriptions WhitespaceManager::getCells(unsigned Start,
       }
     } else if (Depth == 1) {
       if (C.Tok == MatchingParen) {
-        if (!Cells.empty())
-          Cells.back().EndIndex = i;
-        Cells.push_back(CellDescription{i, ++Cell, i + 1, false, nullptr});
-        CellCounts.push_back(C.Tok->Previous->isNot(tok::comma) ? Cell + 1
-                                                                : Cell);
+        Rows.back().ClosingBraceChangeIndex = i;
+        Rows.back().EndCellIndex = Cells.size();
+        // If this is an empty row, just push back the cell
+        if (Cell == 0) {
+          Cells.push_back(CellDescription{i, Cell, i + 1, false, nullptr});
+        } else {
+          if (!Cells.empty())
+            Cells.back().EndIndex = i;
+          Cells.push_back(CellDescription{i, ++Cell, i + 1, false, nullptr});
+        }
+        if (C.Tok->getPreviousNonComment()->is(tok::comma))
+          Rows.back().hasTrailingComma = true;
+
         // Go to the next non-comment and ensure there is a break in front
         const auto *NextNonComment = C.Tok->getNextNonComment();
         while (NextNonComment && NextNonComment->is(tok::comma))
@@ -1564,35 +1684,31 @@ WhitespaceManager::CellDescriptions WhitespaceManager::getCells(unsigned Start,
       }
       if (Changes[i].Tok != C.Tok)
         --i;
+
+      if (Cell >= StartingCellIndices.size())
+        StartingCellIndices.push_back(Cells.size());
       Cells.push_back(CellDescription{i, Cell, i, HasSplit, nullptr});
     }
   }
 
-  return linkCells({Cells, CellCounts, InitialSpaces});
+  return linkCells({Cells, Rows, StartingCellIndices, InitialSpaces});
 }
 
-unsigned WhitespaceManager::calculateCellWidth(unsigned Start, unsigned End,
-                                               bool WithSpaces) const {
+unsigned
+WhitespaceManager::calculateCellWidth(const CellDescription &Cell) const {
   unsigned CellWidth = 0;
-  for (auto i = Start; i < End; i++) {
+  for (auto i = Cell.Index; i < Cell.EndIndex; i++) {
     if (Changes[i].NewlinesBefore > 0)
       CellWidth = 0;
+
+    if (CellWidth != 0)
+      CellWidth += Changes[i].Spaces;
+
     CellWidth += Changes[i].TokenLength;
-    CellWidth += (WithSpaces ? Changes[i].Spaces : 0);
   }
   return CellWidth;
 }
 
-void WhitespaceManager::alignToStartOfCell(unsigned Start, unsigned End) {
-  if ((End - Start) <= 1)
-    return;
-  // If the line is broken anywhere in there make sure everything
-  // is aligned to the parent
-  for (auto i = Start + 1; i < End; i++)
-    if (Changes[i].NewlinesBefore > 0)
-      Changes[i].Spaces = Changes[Start].Spaces;
-}
-
 WhitespaceManager::CellDescriptions
 WhitespaceManager::linkCells(CellDescriptions &&CellDesc) {
   auto &Cells = CellDesc.Cells;
diff --git a/clang/lib/Format/WhitespaceManager.h b/clang/lib/Format/WhitespaceManager.h
index 5dd667fb4ba3b..068c447a9f01e 100644
--- a/clang/lib/Format/WhitespaceManager.h
+++ b/clang/lib/Format/WhitespaceManager.h
@@ -189,22 +189,18 @@ class WhitespaceManager {
     }
   };
 
+  struct RowDescription {
+    unsigned StartCellIndex = 0;
+    unsigned EndCellIndex = 0;
+    unsigned ClosingBraceChangeIndex = 0;
+    bool hasTrailingComma = false;
+  };
+
   struct CellDescriptions {
     SmallVector<CellDescription> Cells;
-    SmallVector<unsigned> CellCounts;
+    SmallVector<RowDescription> Rows;
+    SmallVector<unsigned> ColumnStartingCellIndices;
     unsigned InitialSpaces = 0;
-
-    // Determine if every row in the array
-    // has the same number of columns.
-    bool isRectangular() const {
-      if (CellCounts.size() < 2)
-        return false;
-
-      for (auto NumberOfColumns : CellCounts)
-        if (NumberOfColumns != CellCounts[0])
-          return false;
-      return true;
-    }
   };
 
   /// Calculate \c IsTrailingComment, \c TokenLength for the last tokens
@@ -274,8 +270,7 @@ class WhitespaceManager {
   void alignArrayInitializersLeftJustified(CellDescriptions &&CellDescs);
 
   /// Calculate the cell width between two indexes.
-  unsigned calculateCellWidth(unsigned Start, unsigned End,
-                              bool WithSpaces ...
[truncated]

// { /* a */ 4 }
//
// In the first case, the comment does not start a column, but in the
// second it does
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// second it does
// second it does.

Comments end in full stop. (Also below.)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added periods at the end of all the comments.

@@ -22276,7 +22276,7 @@ TEST_F(FormatTest, CatchAlignArrayOfStructuresLeftAlignment) {
verifyFormat("auto foo = Items{\n"
" Section{\n"
" 0, bar(),\n"
" }\n"
" }\n"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say this is fine.

@@ -27901,39 +27901,39 @@ TEST_F(FormatTest, AlignArrayOfStructuresLeftAlignmentNonSquare) {
// crashes, these tests assert that the array is not changed but will
// also act as regression tests for when it is properly fixed
verifyFormat("struct test demo[] = {\n"
" {1, 2},\n"
" {1, 2 },\n"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I'm not so sure about adding the spaces here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In CatchAlignArrayOfStructuresLeftAlignment(), where we test rectangular arrays, the end braces of all rows are aligned. I followed that precedent for non-rectangular arrays, but it is easy to change if necessary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a solid point. I personally think it looks nicer, just don't know about changing the formatting (and thus the tests).
Let's see what others think.

@bdunkin bdunkin force-pushed the bdunkin/non-rectangular-array-alignment branch from d15c79f to a4f9b91 Compare June 12, 2025 17:36
Copy link
Contributor

@mydeveloperday mydeveloperday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't change existing tests, you need an option if you want to pad with spaces

@bdunkin
Copy link
Author

bdunkin commented Jun 16, 2025

You can't change existing tests, you need an option if you want to pad with spaces

It is unclear to me if this should be considered a bug fix, or a new feature. Surely requiring new options are not required for every bug fix.

The public docs state:

As of clang-format 15 this option only applied to arrays with equal number of columns per row.

but the option was added in clang-format 13, so it seems this formatting has already had backwards incompatible changes without a new option, and this change is, perhaps, fixing whatever issues caused it to change in clang-format 15.

@HazardyKnusperkeks
Copy link
Contributor

You can't change existing tests, you need an option if you want to pad with spaces

It is unclear to me if this should be considered a bug fix, or a new feature. Surely requiring new options are not required for every bug fix.

It boils down to the question if this is seen as a bug or not. As already stated, in my opinion it looks nicer with the padding, but I have no opinion on wether it is currently a bug or not.

@bdunkin
Copy link
Author

bdunkin commented Jun 16, 2025

I will gladly add a new option, if that is deemed necessary. I just want to make sure that's the direction I should go. My experience has been it is very hard to take back an option once it has been released, so they should be added deliberately, and with careful consideration.

@bdunkin
Copy link
Author

bdunkin commented Jul 11, 2025

Just a friendly bump on this change :)

@owenca
Copy link
Contributor

owenca commented Jul 14, 2025

IIRC, we had so many issues with AlignArrayOfStructures that at one point we were discussing whether to disable it. Even after @mydeveloperday had reduced its functionality to only handling rectangular arrays, we still had to fix quite a few bugs including crashes before it was deemed (IMO barely) usable. I don't think we should make it handle non-rectangular arrays until the numerous existing bugs are fixed. Maybe what we really need is a redesign.

@bdunkin
Copy link
Author

bdunkin commented Jul 14, 2025

Which existing bugs need to be fixed? I tried to not change the underlying code too much, but if a redesign is on the table, does that mean backwards compatibility is a lesser concern?

@owenca
Copy link
Contributor

owenca commented Jul 15, 2025

Which existing bugs need to be fixed?

https://github.com/llvm/llvm-project/issues?q=is%3Aopen%20is%3Aissue%20in%3Atitle%20AlignArrayOfStructures. There may be more, e.g. when the option name is not in the issue title.

I tried to not change the underlying code too much, but if a redesign is on the table, does that mean backwards compatibility is a lesser concern?

Yes, if the new design fixes existing bugs or clearly improve the alignment.

@mydeveloperday
Copy link
Contributor

I couldn't approve this change with the test changing so... we'll just get beaten by people saying "you changed my code"

@bdunkin
Copy link
Author

bdunkin commented Jul 15, 2025

I will test my changes against those issues and see what happens. I would very much like to keep this feature, and am willing to spend some time to make it right. The issue of not changing existing tests still seems unresolved though. What happens if fixing some of those issues breaks existing tests?

@bdunkin bdunkin force-pushed the bdunkin/non-rectangular-array-alignment branch from a4f9b91 to f0a9d10 Compare August 11, 2025 18:15
// subsequent tokens so that trailing comments and escaped newlines can be
// aligned properly
static void
SetChangeSpaces(unsigned Start, unsigned Spaces,
Copy link
Author

@bdunkin bdunkin Aug 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updating Spaces, without also modifying PreviousEndOfTokenColumn for the next token means the spaces are not considered when aligning trailing comments or preprocessor line breaks. I changed all places in this file that modified Spaces directly to use this function instead, even when they were not related to AlignArrayOfStructures.

@@ -1318,281 +1343,253 @@ void WhitespaceManager::alignArrayInitializers(unsigned Start, unsigned End) {

void WhitespaceManager::alignArrayInitializersRightJustified(
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function, and the left justified version, are full rewrites. Comparing to the previous implementation may not be useful.

}
return CellWidth;
}

void WhitespaceManager::alignToStartOfCell(unsigned Start, unsigned End) {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is no longer used.

// between 2 tokens and another non-empty range at the start of the second
// token. We didn't implement logic to combine replacements for 2
// consecutive source ranges into a single replacement, because the
// between 2 tokens and another non-empty range at the start of the
Copy link
Author

@bdunkin bdunkin Aug 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unsure why this comment has different line wrapping now, but this is the current output of clang-format, so...

@bdunkin bdunkin force-pushed the bdunkin/non-rectangular-array-alignment branch from f0a9d10 to 20a8432 Compare August 11, 2025 18:46

/// Get a set of fully specified CellDescriptions between \p Start and
/// \p End of the change list.
CellDescriptions getCells(unsigned Start, unsigned End);

/// Does this \p Cell contain a split element?
static bool isSplitCell(const CellDescription &Cell);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These functions are no longer used.

@@ -118,9 +118,9 @@ const tooling::Replacements &WhitespaceManager::generateReplacements() {
alignConsecutiveTableGenDefinitions();
}
alignChainedConditionals();
alignArrayInitializers();
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aligning elements in arrays can changed the length of a line, so we must do it before trailing comments and escaped newlines, otherwise they will not be working with the final length of the line.

@bdunkin
Copy link
Author

bdunkin commented Aug 11, 2025

@owenca @mydeveloperday @HazardyKnusperkeks I have updated this change to address a significant number of formatting issues related to this option. It is almost a full rewrite of the feature, but I believe it now plays much more nicely with other formatting rules. Rows with line breaks in them can't reasonably have column alignment, and I made a choice to handle them as if they were a list with a trailing comma, but I am certainly open to other handling if there is a better suggestion.

Copy link
Contributor

@HazardyKnusperkeks HazardyKnusperkeks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't looked over everything. But we still have the issue of changed tests (and I think more then before?), before there isn't an agreement, a thorough review is maybe just a waste of time.

TT_DesignatedInitializerPeriod,
TT_DesignatedInitializerLSquare));

bool AlignedArrayInitializer = Current.opensAlignedArrayInitializer(Style);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
bool AlignedArrayInitializer = Current.opensAlignedArrayInitializer(Style);
const bool AlignedArrayInitializer = Current.opensAlignedArrayInitializer(Style);

unsigned Depth = 0;
while (CurrentToken && CurrentToken != Line.Last) {
if (CurrentToken->is(tok::l_brace)) {
CurrentToken->IsArrayInitializer = true;
if (CurrentToken->Next)
CurrentToken->Next->MustBreakBefore = true;

// Ensure the end brace of the outer array is on its own line
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Ensure the end brace of the outer array is on its own line
// Ensure the end brace of the outer array is on its own line.

Comment on lines +4259 to +4260
auto *NextNonComment = CurrentToken->getNextNonComment();
if (NextNonComment)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
auto *NextNonComment = CurrentToken->getNextNonComment();
if (NextNonComment)
if (auto *NextNonComment = CurrentToken->getNextNonComment())

Comment on lines +4277 to +4285
//
// { // a
// 4
// }
//
// vs.
//
// { /* a */ 4 }
//
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe put this next to each other, to not waste as much vertical space.

Tok.OriginalColumn, Tok.NewlinesBefore, "", "",
/*IsAligned=*/false, InPPDirective && !Tok.IsFirst,
/*IsInsideToken=*/false));
Changes.push_back(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is what clang-format does to this file now. I didn't make any changes to this line.

FirstChange.Spaces += ColumnChange;
FirstChange.StartOfTokenColumn += ColumnChange;

for (unsigned i = Start + 1; i < Changes.size(); i++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for (unsigned i = Start + 1; i < Changes.size(); i++) {
for (auto I = Start + 1; i < Changes.size(); i++) {


unsigned CurrentWidth = 0;
for (unsigned CellIndex : ColumnStartingIndices) {
const CellDescription *current = &Cells[CellIndex];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const CellDescription *current = &Cells[CellIndex];
const CellDescription *Current = &Cells[CellIndex];

const CellDescription *current = &Cells[CellIndex];

unsigned MaxWidth = 0;
while (current != nullptr) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
while (current != nullptr) {
for (; Current; Current = Current->NextColumnElement) {

// Align the end brace.
const unsigned AlignmentSpaces =
(SummedColumnWidths.back() - WidthSoFarInRow) + BracePadding;
setChangeSpaces(Row.ClosingBraceChangeIndex, AlignmentSpaces);
}
}

void WhitespaceManager::alignArrayInitializersLeftJustified(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can it be combined with the RightJustified version? Maybe a template on a bool so one can constexpr if the differences?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the differences between the functions are worth having separate functions. They seem like they are the same, but the details are actually different enough that combining it would make it harder to understand, imo.

@@ -1609,6 +1602,10 @@ WhitespaceManager::linkCells(CellDescriptions &&CellDesc) {
return std::move(CellDesc);
}

void WhitespaceManager::setChangeSpaces(unsigned Start, unsigned Spaces) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for a member and a non member function?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The member function is a convenience for using the non member function. The non member function is required because there are static helper functions that need this functionality as well.

@bdunkin
Copy link
Author

bdunkin commented Aug 28, 2025

@owenca just a friendly bump on this review. The main question is currently whether changing tests is acceptable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants