
Commit 52abb27

Merge tag 'slab-for-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab fixes from Vlastimil Babka:

 - The "common kmalloc v4" series [1] by Hyeonggon Yoo.

   While the plan after LPC is to try again if it's possible to get rid
   of SLOB and SLAB (and if any critical aspect of those is not possible
   to achieve with SLUB today, modify it accordingly), it will take a
   while even in case there are no objections.

   Meanwhile this is a nice cleanup and some parts (e.g. to the
   tracepoints) will be useful even if we end up with a single slab
   implementation in the future:

   - Improves the mm/slab_common.c wrappers to allow deleting duplicated
     code between SLAB and SLUB.

   - Large kmalloc() allocations in SLAB are passed to page allocator
     like in SLUB, reducing number of kmalloc caches.

   - Removes the {kmem_cache_alloc,kmalloc}_node variants of
     tracepoints, node id parameter added to non-_node variants.

 - Addition of kmalloc_size_roundup()

   The first two patches from a series by Kees Cook [2] that introduce
   kmalloc_size_roundup(). This will allow merging of per-subsystem
   patches using the new function and ultimately stop (ab)using ksize()
   in a way that causes ongoing trouble for debugging functionality and
   static checkers.

 - Wasted kmalloc() memory tracking in debugfs alloc_traces

   A patch from Feng Tang that enhances the existing debugfs
   alloc_traces file for kmalloc caches with information about how much
   space is wasted by allocations that needs less space than the
   particular kmalloc cache provides.

 - My series [3] to fix validation races for caches with enabled
   debugging:

   - By decoupling the debug cache operation more from non-debug
     fastpaths, extra locking simplifications were possible and thus
     done afterwards.

   - Additional cleanup of PREEMPT_RT specific code on top, by Thomas
     Gleixner.

   - A late fix for slab page leaks caused by the series, by Feng Tang.

 - Smaller fixes and cleanups:

   - Unneeded variable removals, by ye xingchen

   - A cleanup removing a BUG_ON() in create_unique_id(), by Chao Yu

Link: https://lore.kernel.org/all/20220817101826.236819-1-42.hyeyoo@gmail.com/ [1]
Link: https://lore.kernel.org/all/20220923202822.2667581-1-keescook@chromium.org/ [2]
Link: https://lore.kernel.org/all/20220823170400.26546-1-vbabka@suse.cz/ [3]

* tag 'slab-for-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: (30 commits)
  mm/slub: fix a slab missed to be freed problem
  slab: Introduce kmalloc_size_roundup()
  slab: Remove __malloc attribute from realloc functions
  mm/slub: clean up create_unique_id()
  mm/slub: enable debugging memory wasting of kmalloc
  slub: Make PREEMPT_RT support less convoluted
  mm/slub: simplify __cmpxchg_double_slab() and slab_[un]lock()
  mm/slub: convert object_map_lock to non-raw spinlock
  mm/slub: remove slab_lock() usage for debug operations
  mm/slub: restrict sysfs validation to debug caches and make it safe
  mm/sl[au]b: check if large object is valid in __ksize()
  mm/slab_common: move declaration of __ksize() to mm/slab.h
  mm/slab_common: drop kmem_alloc & avoid dereferencing fields when not using
  mm/slab_common: unify NUMA and UMA version of tracepoints
  mm/sl[au]b: cleanup kmem_cache_alloc[_node]_trace()
  mm/sl[au]b: generalize kmalloc subsystem
  mm/slub: move free_debug_processing() further
  mm/sl[au]b: introduce common alloc/free functions without tracepoint
  mm/slab: kmalloc: pass requests larger than order-1 page to page allocator
  mm/slab_common: cleanup kmalloc_large()
  ...
2 parents 55be608 + 00a7829 commit 52abb27
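
The "Large kmalloc() allocations in SLAB are passed to page allocator" item above changes only which internal path a big request takes, not the calling convention. A minimal sketch of the caller-visible effect, assuming a 4 KiB PAGE_SIZE (so KMALLOC_MAX_CACHE_SIZE becomes 2 * PAGE_SIZE = 8 KiB per the KMALLOC_SHIFT_HIGH change in the include/linux/slab.h diff below); the function name is made up for illustration:

    #include <linux/slab.h>

    /*
     * Illustrative only: with the unified limit, a constant-size request
     * above KMALLOC_MAX_CACHE_SIZE is no longer served from a kmalloc
     * cache on SLAB either, but handed to the page allocator via the
     * kmalloc_large()/kmalloc_large_node() declarations added below.
     */
    static void *large_alloc_example(void)
    {
            return kmalloc(16 * 1024, GFP_KERNEL); /* 16 KiB > 8 KiB bucket limit */
    }

kfree() of such a pointer keeps working as before; only the backing allocation changes.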

File tree

11 files changed (+854, -921 lines)

Documentation/mm/slub.rst

Lines changed: 21 additions & 12 deletions
@@ -400,21 +400,30 @@ information:
    allocated objects. The output is sorted by frequency of each trace.

    Information in the output:
-   Number of objects, allocating function, minimal/average/maximal jiffies since alloc,
-   pid range of the allocating processes, cpu mask of allocating cpus, and stack trace.
+   Number of objects, allocating function, possible memory wastage of
+   kmalloc objects(total/per-object), minimal/average/maximal jiffies
+   since alloc, pid range of the allocating processes, cpu mask of
+   allocating cpus, numa node mask of origins of memory, and stack trace.

    Example:::

-       1085 populate_error_injection_list+0x97/0x110 age=166678/166680/166682 pid=1 cpus=1::
-       __slab_alloc+0x6d/0x90
-       kmem_cache_alloc_trace+0x2eb/0x300
-       populate_error_injection_list+0x97/0x110
-       init_error_injection+0x1b/0x71
-       do_one_initcall+0x5f/0x2d0
-       kernel_init_freeable+0x26f/0x2d7
-       kernel_init+0xe/0x118
-       ret_from_fork+0x22/0x30
-
+       338 pci_alloc_dev+0x2c/0xa0 waste=521872/1544 age=290837/291891/293509 pid=1 cpus=106 nodes=0-1
+           __kmem_cache_alloc_node+0x11f/0x4e0
+           kmalloc_trace+0x26/0xa0
+           pci_alloc_dev+0x2c/0xa0
+           pci_scan_single_device+0xd2/0x150
+           pci_scan_slot+0xf7/0x2d0
+           pci_scan_child_bus_extend+0x4e/0x360
+           acpi_pci_root_create+0x32e/0x3b0
+           pci_acpi_scan_root+0x2b9/0x2d0
+           acpi_pci_root_add.cold.11+0x110/0xb0a
+           acpi_bus_attach+0x262/0x3f0
+           device_for_each_child+0xb7/0x110
+           acpi_dev_for_each_child+0x77/0xa0
+           acpi_bus_attach+0x108/0x3f0
+           device_for_each_child+0xb7/0x110
+           acpi_dev_for_each_child+0x77/0xa0
+           acpi_bus_attach+0x108/0x3f0

 2. free_traces::

include/linux/compiler_attributes.h

Lines changed: 2 additions & 1 deletion
@@ -35,7 +35,8 @@

 /*
  * Note: do not use this directly. Instead, use __alloc_size() since it is conditionally
- * available and includes other attributes.
+ * available and includes other attributes. For GCC < 9.1, __alloc_size__ gets undefined
+ * in compiler-gcc.h, due to misbehaviors.
  *
  * gcc: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-alloc_005fsize-function-attribute
  * clang: https://clang.llvm.org/docs/AttributeReference.html#alloc-size

include/linux/compiler_types.h

Lines changed: 5 additions & 3 deletions
@@ -271,14 +271,16 @@ struct ftrace_likely_data {

 /*
  * Any place that could be marked with the "alloc_size" attribute is also
- * a place to be marked with the "malloc" attribute. Do this as part of the
- * __alloc_size macro to avoid redundant attributes and to avoid missing a
- * __malloc marking.
+ * a place to be marked with the "malloc" attribute, except those that may
+ * be performing a _reallocation_, as that may alias the existing pointer.
+ * For these, use __realloc_size().
  */
 #ifdef __alloc_size__
 # define __alloc_size(x, ...)		__alloc_size__(x, ## __VA_ARGS__) __malloc
+# define __realloc_size(x, ...)	__alloc_size__(x, ## __VA_ARGS__)
 #else
 # define __alloc_size(x, ...)		__malloc
+# define __realloc_size(x, ...)
 #endif

 #ifndef asm_volatile_goto

include/linux/slab.h

Lines changed: 89 additions & 99 deletions
@@ -29,6 +29,8 @@
 #define SLAB_RED_ZONE		((slab_flags_t __force)0x00000400U)
 /* DEBUG: Poison objects */
 #define SLAB_POISON		((slab_flags_t __force)0x00000800U)
+/* Indicate a kmalloc slab */
+#define SLAB_KMALLOC		((slab_flags_t __force)0x00001000U)
 /* Align objs on cache lines */
 #define SLAB_HWCACHE_ALIGN	((slab_flags_t __force)0x00002000U)
 /* Use GFP_DMA memory */
@@ -184,11 +186,25 @@ int kmem_cache_shrink(struct kmem_cache *s);
 /*
  * Common kmalloc functions provided by all allocators
  */
-void * __must_check krealloc(const void *objp, size_t new_size, gfp_t flags) __alloc_size(2);
+void * __must_check krealloc(const void *objp, size_t new_size, gfp_t flags) __realloc_size(2);
 void kfree(const void *objp);
 void kfree_sensitive(const void *objp);
 size_t __ksize(const void *objp);
+
+/**
+ * ksize - Report actual allocation size of associated object
+ *
+ * @objp: Pointer returned from a prior kmalloc()-family allocation.
+ *
+ * This should not be used for writing beyond the originally requested
+ * allocation size. Either use krealloc() or round up the allocation size
+ * with kmalloc_size_roundup() prior to allocation. If this is used to
+ * access beyond the originally requested allocation size, UBSAN_BOUNDS
+ * and/or FORTIFY_SOURCE may trip, since they only know about the
+ * originally allocated size via the __alloc_size attribute.
+ */
 size_t ksize(const void *objp);
+
 #ifdef CONFIG_PRINTK
 bool kmem_valid_obj(void *object);
 void kmem_dump_obj(void *object);
@@ -243,27 +259,17 @@ static inline unsigned int arch_slab_minalign(void)

 #ifdef CONFIG_SLAB
 /*
- * The largest kmalloc size supported by the SLAB allocators is
- * 32 megabyte (2^25) or the maximum allocatable page order if that is
- * less than 32 MB.
- *
- * WARNING: Its not easy to increase this value since the allocators have
- * to do various tricks to work around compiler limitations in order to
- * ensure proper constant folding.
+ * SLAB and SLUB directly allocates requests fitting in to an order-1 page
+ * (PAGE_SIZE*2). Larger requests are passed to the page allocator.
  */
-#define KMALLOC_SHIFT_HIGH	((MAX_ORDER + PAGE_SHIFT - 1) <= 25 ? \
-				(MAX_ORDER + PAGE_SHIFT - 1) : 25)
-#define KMALLOC_SHIFT_MAX	KMALLOC_SHIFT_HIGH
+#define KMALLOC_SHIFT_HIGH	(PAGE_SHIFT + 1)
+#define KMALLOC_SHIFT_MAX	(MAX_ORDER + PAGE_SHIFT - 1)
 #ifndef KMALLOC_SHIFT_LOW
 #define KMALLOC_SHIFT_LOW	5
 #endif
 #endif

 #ifdef CONFIG_SLUB
-/*
- * SLUB directly allocates requests fitting in to an order-1 page
- * (PAGE_SIZE*2). Larger requests are passed to the page allocator.
- */
 #define KMALLOC_SHIFT_HIGH	(PAGE_SHIFT + 1)
 #define KMALLOC_SHIFT_MAX	(MAX_ORDER + PAGE_SHIFT - 1)
 #ifndef KMALLOC_SHIFT_LOW
@@ -415,10 +421,6 @@ static __always_inline unsigned int __kmalloc_index(size_t size,
 	if (size <= 512 * 1024) return 19;
 	if (size <= 1024 * 1024) return 20;
 	if (size <=  2 * 1024 * 1024) return 21;
-	if (size <=  4 * 1024 * 1024) return 22;
-	if (size <=  8 * 1024 * 1024) return 23;
-	if (size <= 16 * 1024 * 1024) return 24;
-	if (size <= 32 * 1024 * 1024) return 25;

 	if (!IS_ENABLED(CONFIG_PROFILE_ALL_BRANCHES) && size_is_constant)
 		BUILD_BUG_ON_MSG(1, "unexpected size in kmalloc_index()");
@@ -428,6 +430,7 @@ static __always_inline unsigned int __kmalloc_index(size_t size,
 	/* Will never be reached. Needed because the compiler may complain */
 	return -1;
 }
+static_assert(PAGE_SHIFT <= 20);
 #define kmalloc_index(s) __kmalloc_index(s, true)
 #endif /* !CONFIG_SLOB */

@@ -456,51 +459,32 @@ static __always_inline void kfree_bulk(size_t size, void **p)
 	kmem_cache_free_bulk(NULL, size, p);
 }

-#ifdef CONFIG_NUMA
 void *__kmalloc_node(size_t size, gfp_t flags, int node) __assume_kmalloc_alignment
 							 __alloc_size(1);
 void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t flags, int node) __assume_slab_alignment
 									 __malloc;
-#else
-static __always_inline __alloc_size(1) void *__kmalloc_node(size_t size, gfp_t flags, int node)
-{
-	return __kmalloc(size, flags);
-}
-
-static __always_inline void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t flags, int node)
-{
-	return kmem_cache_alloc(s, flags);
-}
-#endif

 #ifdef CONFIG_TRACING
-extern void *kmem_cache_alloc_trace(struct kmem_cache *s, gfp_t flags, size_t size)
-				   __assume_slab_alignment __alloc_size(3);
-
-#ifdef CONFIG_NUMA
-extern void *kmem_cache_alloc_node_trace(struct kmem_cache *s, gfp_t gfpflags,
-					 int node, size_t size) __assume_slab_alignment
-					 __alloc_size(4);
-#else
-static __always_inline __alloc_size(4) void *kmem_cache_alloc_node_trace(struct kmem_cache *s,
-						 gfp_t gfpflags, int node, size_t size)
-{
-	return kmem_cache_alloc_trace(s, gfpflags, size);
-}
-#endif /* CONFIG_NUMA */
+void *kmalloc_trace(struct kmem_cache *s, gfp_t flags, size_t size)
+		    __assume_kmalloc_alignment __alloc_size(3);

+void *kmalloc_node_trace(struct kmem_cache *s, gfp_t gfpflags,
+			 int node, size_t size) __assume_kmalloc_alignment
+			 __alloc_size(4);
 #else /* CONFIG_TRACING */
-static __always_inline __alloc_size(3) void *kmem_cache_alloc_trace(struct kmem_cache *s,
-						gfp_t flags, size_t size)
+/* Save a function call when CONFIG_TRACING=n */
+static __always_inline __alloc_size(3)
+void *kmalloc_trace(struct kmem_cache *s, gfp_t flags, size_t size)
 {
 	void *ret = kmem_cache_alloc(s, flags);

 	ret = kasan_kmalloc(s, ret, size, flags);
 	return ret;
 }

-static __always_inline void *kmem_cache_alloc_node_trace(struct kmem_cache *s, gfp_t gfpflags,
-							  int node, size_t size)
+static __always_inline __alloc_size(4)
+void *kmalloc_node_trace(struct kmem_cache *s, gfp_t gfpflags,
+			 int node, size_t size)
 {
 	void *ret = kmem_cache_alloc_node(s, gfpflags, node);

@@ -509,25 +493,11 @@ static __always_inline void *kmem_cache_alloc_node_trace(struct kmem_cache *s, g
 }
 #endif /* CONFIG_TRACING */

-extern void *kmalloc_order(size_t size, gfp_t flags, unsigned int order) __assume_page_alignment
-									 __alloc_size(1);
+void *kmalloc_large(size_t size, gfp_t flags) __assume_page_alignment
+					      __alloc_size(1);

-#ifdef CONFIG_TRACING
-extern void *kmalloc_order_trace(size_t size, gfp_t flags, unsigned int order)
-				__assume_page_alignment __alloc_size(1);
-#else
-static __always_inline __alloc_size(1) void *kmalloc_order_trace(size_t size, gfp_t flags,
-								 unsigned int order)
-{
-	return kmalloc_order(size, flags, order);
-}
-#endif
-
-static __always_inline __alloc_size(1) void *kmalloc_large(size_t size, gfp_t flags)
-{
-	unsigned int order = get_order(size);
-	return kmalloc_order_trace(size, flags, order);
-}
+void *kmalloc_large_node(size_t size, gfp_t flags, int node) __assume_page_alignment
+							     __alloc_size(1);

 /**
  * kmalloc - allocate memory
@@ -597,31 +567,43 @@ static __always_inline __alloc_size(1) void *kmalloc(size_t size, gfp_t flags)
 		if (!index)
 			return ZERO_SIZE_PTR;

-		return kmem_cache_alloc_trace(
+		return kmalloc_trace(
 				kmalloc_caches[kmalloc_type(flags)][index],
 				flags, size);
 #endif
 	}
 	return __kmalloc(size, flags);
 }

+#ifndef CONFIG_SLOB
 static __always_inline __alloc_size(1) void *kmalloc_node(size_t size, gfp_t flags, int node)
 {
-#ifndef CONFIG_SLOB
-	if (__builtin_constant_p(size) &&
-		size <= KMALLOC_MAX_CACHE_SIZE) {
-		unsigned int i = kmalloc_index(size);
+	if (__builtin_constant_p(size)) {
+		unsigned int index;

-		if (!i)
+		if (size > KMALLOC_MAX_CACHE_SIZE)
+			return kmalloc_large_node(size, flags, node);
+
+		index = kmalloc_index(size);
+
+		if (!index)
 			return ZERO_SIZE_PTR;

-		return kmem_cache_alloc_node_trace(
-				kmalloc_caches[kmalloc_type(flags)][i],
-				flags, node, size);
+		return kmalloc_node_trace(
+				kmalloc_caches[kmalloc_type(flags)][index],
+				flags, node, size);
 	}
-#endif
 	return __kmalloc_node(size, flags, node);
 }
+#else
+static __always_inline __alloc_size(1) void *kmalloc_node(size_t size, gfp_t flags, int node)
+{
+	if (__builtin_constant_p(size) && size > KMALLOC_MAX_CACHE_SIZE)
+		return kmalloc_large_node(size, flags, node);
+
+	return __kmalloc_node(size, flags, node);
+}
+#endif

 /**
  * kmalloc_array - allocate memory for an array.
@@ -647,10 +629,10 @@ static inline __alloc_size(1, 2) void *kmalloc_array(size_t n, size_t size, gfp_
  * @new_size: new size of a single member of the array
  * @flags: the type of memory to allocate (see kmalloc)
  */
-static inline __alloc_size(2, 3) void * __must_check krealloc_array(void *p,
-								    size_t new_n,
-								    size_t new_size,
-								    gfp_t flags)
+static inline __realloc_size(2, 3) void * __must_check krealloc_array(void *p,
+								       size_t new_n,
+								       size_t new_size,
+								       gfp_t flags)
 {
 	size_t bytes;

@@ -671,6 +653,12 @@ static inline __alloc_size(1, 2) void *kcalloc(size_t n, size_t size, gfp_t flag
 	return kmalloc_array(n, size, flags | __GFP_ZERO);
 }

+void *__kmalloc_node_track_caller(size_t size, gfp_t flags, int node,
+				  unsigned long caller) __alloc_size(1);
+#define kmalloc_node_track_caller(size, flags, node) \
+	__kmalloc_node_track_caller(size, flags, node, \
+				    _RET_IP_)
+
 /*
  * kmalloc_track_caller is a special version of kmalloc that records the
  * calling function of the routine calling it for slab leak tracking instead
@@ -679,9 +667,9 @@ static inline __alloc_size(1, 2) void *kcalloc(size_t n, size_t size, gfp_t flag
  * allocator where we care about the real place the memory allocation
  * request comes from.
  */
-extern void *__kmalloc_track_caller(size_t size, gfp_t flags, unsigned long caller);
 #define kmalloc_track_caller(size, flags) \
-	__kmalloc_track_caller(size, flags, _RET_IP_)
+	__kmalloc_node_track_caller(size, flags, \
+				    NUMA_NO_NODE, _RET_IP_)

 static inline __alloc_size(1, 2) void *kmalloc_array_node(size_t n, size_t size, gfp_t flags,
 							   int node)
@@ -700,21 +688,6 @@ static inline __alloc_size(1, 2) void *kcalloc_node(size_t n, size_t size, gfp_t
 	return kmalloc_array_node(n, size, flags | __GFP_ZERO, node);
 }

-
-#ifdef CONFIG_NUMA
-extern void *__kmalloc_node_track_caller(size_t size, gfp_t flags, int node,
-					 unsigned long caller) __alloc_size(1);
-#define kmalloc_node_track_caller(size, flags, node) \
-	__kmalloc_node_track_caller(size, flags, node, \
-				    _RET_IP_)
-
-#else /* CONFIG_NUMA */
-
-#define kmalloc_node_track_caller(size, flags, node) \
-	kmalloc_track_caller(size, flags)
-
-#endif /* CONFIG_NUMA */
-
 /*
  * Shortcuts
  */
@@ -774,11 +747,28 @@ static inline __alloc_size(1, 2) void *kvcalloc(size_t n, size_t size, gfp_t fla
 }

 extern void *kvrealloc(const void *p, size_t oldsize, size_t newsize, gfp_t flags)
-		      __alloc_size(3);
+		      __realloc_size(3);
 extern void kvfree(const void *addr);
 extern void kvfree_sensitive(const void *addr, size_t len);

 unsigned int kmem_cache_size(struct kmem_cache *s);
+
+/**
+ * kmalloc_size_roundup - Report allocation bucket size for the given size
+ *
+ * @size: Number of bytes to round up from.
+ *
+ * This returns the number of bytes that would be available in a kmalloc()
+ * allocation of @size bytes. For example, a 126 byte request would be
+ * rounded up to the next sized kmalloc bucket, 128 bytes. (This is strictly
+ * for the general-purpose kmalloc()-based allocations, and is not for the
+ * pre-sized kmem_cache_alloc()-based allocations.)
+ *
+ * Use this to kmalloc() the full bucket size ahead of time instead of using
+ * ksize() to query the size after an allocation.
+ */
+size_t kmalloc_size_roundup(size_t size);
+
 void __init kmem_cache_init_late(void);

 #if defined(CONFIG_SMP) && defined(CONFIG_SLAB)
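
A short usage sketch for the kmalloc_size_roundup() declaration documented above, reusing the 126-to-128-byte example from its kerneldoc; the helper name is made up for illustration:

    #include <linux/slab.h>

    /*
     * Illustrative only: round the request up front so the whole kmalloc
     * bucket belongs to the declared allocation size, instead of probing
     * ksize() for slack after the fact.
     */
    static void *alloc_full_bucket_example(size_t needed, size_t *alloc_len)
    {
            *alloc_len = kmalloc_size_roundup(needed); /* e.g. 126 -> 128 */
            return kmalloc(*alloc_len, GFP_KERNEL);
    }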
