Skip to content

Conversation

philnik777
Copy link
Contributor

@philnik777 philnik777 commented Aug 7, 2025

Instead of just calling the single element erase on every element of the range, we can combine some of the operations in a custom implementation. Specifically, we don't need to search for the previous node or re-link the list every iteration. Removing this unnecessary work results in some nice performance improvements:

-----------------------------------------------------------------------------------------------------------------------
Benchmark                                                                                             old           new
-----------------------------------------------------------------------------------------------------------------------
std::unordered_set<int>::erase(iterator, iterator) (erase half the container)/0                    457 ns        459 ns
std::unordered_set<int>::erase(iterator, iterator) (erase half the container)/32                   995 ns        626 ns
std::unordered_set<int>::erase(iterator, iterator) (erase half the container)/1024               18196 ns       7995 ns
std::unordered_set<int>::erase(iterator, iterator) (erase half the container)/8192              124722 ns      70125 ns
std::unordered_set<std::string>::erase(iterator, iterator) (erase half the container)/0            456 ns        461 ns
std::unordered_set<std::string>::erase(iterator, iterator) (erase half the container)/32          1183 ns        769 ns
std::unordered_set<std::string>::erase(iterator, iterator) (erase half the container)/1024       27827 ns      18614 ns
std::unordered_set<std::string>::erase(iterator, iterator) (erase half the container)/8192      266681 ns     226107 ns
std::unordered_map<int, int>::erase(iterator, iterator) (erase half the container)/0               455 ns        462 ns
std::unordered_map<int, int>::erase(iterator, iterator) (erase half the container)/32              996 ns        659 ns
std::unordered_map<int, int>::erase(iterator, iterator) (erase half the container)/1024          15963 ns       8108 ns
std::unordered_map<int, int>::erase(iterator, iterator) (erase half the container)/8192         136493 ns      71848 ns
std::unordered_multiset<int>::erase(iterator, iterator) (erase half the container)/0               454 ns        455 ns
std::unordered_multiset<int>::erase(iterator, iterator) (erase half the container)/32              985 ns        703 ns
std::unordered_multiset<int>::erase(iterator, iterator) (erase half the container)/1024          16277 ns       9085 ns
std::unordered_multiset<int>::erase(iterator, iterator) (erase half the container)/8192         125736 ns      82710 ns
std::unordered_multimap<int, int>::erase(iterator, iterator) (erase half the container)/0          457 ns        454 ns
std::unordered_multimap<int, int>::erase(iterator, iterator) (erase half the container)/32        1091 ns        646 ns
std::unordered_multimap<int, int>::erase(iterator, iterator) (erase half the container)/1024     17784 ns       7664 ns
std::unordered_multimap<int, int>::erase(iterator, iterator) (erase half the container)/8192    127098 ns      72806 ns

Copy link

github-actions bot commented Aug 7, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@philnik777 philnik777 marked this pull request as ready for review August 11, 2025 19:53
@philnik777 philnik777 requested a review from a team as a code owner August 11, 2025 19:53
@llvmbot llvmbot added the libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. label Aug 11, 2025
@llvmbot
Copy link
Member

llvmbot commented Aug 11, 2025

@llvm/pr-subscribers-libcxx

Author: Nikolas Klauser (philnik777)

Changes
----------------------------------------------------------------------------------------------------------------
Benchmark                                                                                        old         new
----------------------------------------------------------------------------------------------------------------
std::unordered_map&lt;int, int&gt;::erase(iterator, iterator) (erase half the container)/0          450 ns      446 ns
std::unordered_map&lt;int, int&gt;::erase(iterator, iterator) (erase half the container)/32        1017 ns      614 ns
std::unordered_map&lt;int, int&gt;::erase(iterator, iterator) (erase half the container)/1024     16035 ns     7747 ns
std::unordered_map&lt;int, int&gt;::erase(iterator, iterator) (erase half the container)/8192    122107 ns    73020 ns

Full diff: https://github.com/llvm/llvm-project/pull/152471.diff

1 Files Affected:

  • (modified) libcxx/include/__hash_table (+49-5)
diff --git a/libcxx/include/__hash_table b/libcxx/include/__hash_table
index dacc152030e14..2f0f9457f1416 100644
--- a/libcxx/include/__hash_table
+++ b/libcxx/include/__hash_table
@@ -1848,12 +1848,56 @@ __hash_table<_Tp, _Hash, _Equal, _Alloc>::erase(const_iterator __p) {
 template <class _Tp, class _Hash, class _Equal, class _Alloc>
 typename __hash_table<_Tp, _Hash, _Equal, _Alloc>::iterator
 __hash_table<_Tp, _Hash, _Equal, _Alloc>::erase(const_iterator __first, const_iterator __last) {
-  for (const_iterator __p = __first; __first != __last; __p = __first) {
-    ++__first;
-    erase(__p);
+  if (__first == __last)
+    return iterator(__last.__node_);
+
+  // current node
+  __next_pointer __current = __first.__node_;
+  size_type __bucket_count = bucket_count();
+  size_t __chash = std::__constrain_hash(__current->__hash(), __bucket_count);
+  // find previous node
+  __next_pointer __before_first = __bucket_list_[__chash];
+  for (; __before_first->__next_ != __current; __before_first = __before_first->__next_)
+    ;
+
+  __next_pointer __end = __last.__node_;
+
+  // If __before_first is in the same bucket, clear this bucket first without re-linking it
+  if (__before_first != __first_node_.__ptr() &&
+      std::__constrain_hash(__before_first->__hash(), __bucket_count) == __chash) {
+    while (__current != __end) {
+      if (auto __next_chash = std::__constrain_hash(__current->__hash(), __bucket_count); __next_chash != __chash) {
+        __chash = __next_chash;
+        break;
+      }
+      auto __next = __current->__next_;
+      __node_traits::deallocate(__node_alloc(), __current->__upcast(), 1);
+      __current = __next;
+      --__size_;
+    }
   }
-  __next_pointer __np = __last.__node_;
-  return iterator(__np);
+
+  while (__current != __end) {
+    auto __next = __current->__next_;
+    __node_traits::deallocate(__node_alloc(), __current->__upcast(), 1);
+    __current = __next;
+    --__size_;
+
+    // When switching buckets, set the old bucket to be empty and update the next bucket to have __before_first as its
+    // before-first element
+    if (__next) {
+      if (auto __next_chash = std::__constrain_hash(__next->__hash(), __bucket_count); __next_chash != __chash) {
+        __bucket_list_[__chash] = nullptr;
+        __chash                 = __next_chash;
+        __bucket_list_[__chash] = __before_first;
+      }
+    }
+  }
+
+  // re-link __before_start with __last
+  __before_first->__next_ = __current;
+
+  return iterator(__last.__node_);
 }
 
 template <class _Tp, class _Hash, class _Equal, class _Alloc>

@philnik777 philnik777 force-pushed the optimize_hash_table_erase branch 2 times, most recently from 40ad452 to ff5a8ec Compare August 18, 2025 13:49
@philnik777 philnik777 force-pushed the optimize_hash_table_erase branch from ff5a8ec to e006595 Compare August 25, 2025 07:11
@philnik777 philnik777 force-pushed the optimize_hash_table_erase branch from e006595 to 7bc9893 Compare August 25, 2025 13:33
@philnik777 philnik777 merged commit e4eccd6 into llvm:main Aug 25, 2025
67 of 75 checks passed
@philnik777 philnik777 deleted the optimize_hash_table_erase branch August 25, 2025 19:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants