
Commit 9980d74

Michal Hocko authored and Linus Torvalds committed
mm, hugetlb: get rid of surplus page accounting tricks
alloc_surplus_huge_page increases the pool size and the number of surplus pages opportunistically to prevent races with pool size changes. See commit d1c3fb1 ("hugetlb: introduce nr_overcommit_hugepages sysctl") for more details. The resulting code is unnecessarily hairy, causes code duplication, and doesn't allow the allocation paths to be shared. Moreover, pool size changes tend to be very rare, so optimizing for them is not really reasonable.

Simplify the code: allow allocating a fresh surplus page as long as we are under the overcommit limit, then recheck the condition after the allocation and drop the new page if the situation has changed. This should provide a reasonable guarantee that abrupt allocation requests will not go way over the limit.

If we consider races with the pool shrinking and enlarging, then we should be reasonably safe as well. In the first case we are off by one in the worst case, and the second case should work OK because the page is not yet visible. We can waste CPU cycles on the allocation, but that should be acceptable for a relatively rare condition.

Link: http://lkml.kernel.org/r/20180103093213.26329-5-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andrea Reale <ar@linux.vnet.ibm.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
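The overcommit limit the message refers to is the tunable introduced by commit d1c3fb1. As a rough configuration illustration only (the value 64 is an example; writing the sysctl requires root on a hugetlb-enabled kernel):

```shell
# Allow up to 64 surplus huge pages beyond the static pool
# (the nr_overcommit_hugepages sysctl from commit d1c3fb1).
echo 64 > /proc/sys/vm/nr_overcommit_hugepages

# Inspect the current limit and the surplus pages actually in use.
cat /proc/sys/vm/nr_overcommit_hugepages
grep HugePages_Surp /proc/meminfo
```

Surplus pages are allocated on demand once the static pool is exhausted and are freed back to the buddy allocator when no longer in use, which is exactly the accounting this commit simplifies.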
1 parent ab5ac90 commit 9980d74

File tree

1 file changed: +23 additions, −39 deletions


mm/hugetlb.c

@@ -1540,62 +1540,46 @@ int dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
 static struct page *__alloc_surplus_huge_page(struct hstate *h, gfp_t gfp_mask,
 		int nid, nodemask_t *nmask)
 {
-	struct page *page;
-	unsigned int r_nid;
+	struct page *page = NULL;
 
 	if (hstate_is_gigantic(h))
 		return NULL;
 
-	/*
-	 * Assume we will successfully allocate the surplus page to
-	 * prevent racing processes from causing the surplus to exceed
-	 * overcommit
-	 *
-	 * This however introduces a different race, where a process B
-	 * tries to grow the static hugepage pool while alloc_pages() is
-	 * called by process A. B will only examine the per-node
-	 * counters in determining if surplus huge pages can be
-	 * converted to normal huge pages in adjust_pool_surplus(). A
-	 * won't be able to increment the per-node counter, until the
-	 * lock is dropped by B, but B doesn't drop hugetlb_lock until
-	 * no more huge pages can be converted from surplus to normal
-	 * state (and doesn't try to convert again). Thus, we have a
-	 * case where a surplus huge page exists, the pool is grown, and
-	 * the surplus huge page still exists after, even though it
-	 * should just have been converted to a normal huge page. This
-	 * does not leak memory, though, as the hugepage will be freed
-	 * once it is out of use. It also does not allow the counters to
-	 * go out of whack in adjust_pool_surplus() as we don't modify
-	 * the node values until we've gotten the hugepage and only the
-	 * per-node value is checked there.
-	 */
 	spin_lock(&hugetlb_lock);
-	if (h->surplus_huge_pages >= h->nr_overcommit_huge_pages) {
-		spin_unlock(&hugetlb_lock);
-		return NULL;
-	} else {
-		h->nr_huge_pages++;
-		h->surplus_huge_pages++;
-	}
+	if (h->surplus_huge_pages >= h->nr_overcommit_huge_pages)
+		goto out_unlock;
 	spin_unlock(&hugetlb_lock);
 
 	page = __hugetlb_alloc_buddy_huge_page(h, gfp_mask, nid, nmask);
+	if (!page)
+		goto out_unlock;
 
 	spin_lock(&hugetlb_lock);
-	if (page) {
+	/*
+	 * We could have raced with the pool size change.
+	 * Double check that and simply deallocate the new page
+	 * if we would end up overcommiting the surpluses. Abuse
+	 * temporary page to workaround the nasty free_huge_page
+	 * codeflow
+	 */
+	if (h->surplus_huge_pages >= h->nr_overcommit_huge_pages) {
+		SetPageHugeTemporary(page);
+		put_page(page);
+		page = NULL;
+	} else {
+		int r_nid;
+
+		h->surplus_huge_pages++;
+		h->nr_huge_pages++;
 		INIT_LIST_HEAD(&page->lru);
 		r_nid = page_to_nid(page);
 		set_compound_page_dtor(page, HUGETLB_PAGE_DTOR);
 		set_hugetlb_cgroup(page, NULL);
-		/*
-		 * We incremented the global counters already
-		 */
 		h->nr_huge_pages_node[r_nid]++;
 		h->surplus_huge_pages_node[r_nid]++;
-	} else {
-		h->nr_huge_pages--;
-		h->surplus_huge_pages--;
 	}
+
+out_unlock:
 	spin_unlock(&hugetlb_lock);
 
 	return page;
