|
|
Subscribe / Log in / New account

Introducing maple trees

February 12, 2021

This article was contributed by Marta Rybczyńska

Seen from outside, the internals of the Linux kernel appear to be stable, especially in subsystems like the memory-management subsystem. However, from time to time, developers need to replace an internal interface to solve a longstanding problem. One such issue is contention on the lock used to protect essential memory-management structures, including the page tables and virtual memory areas (VMAs). Liam Howlett and Matthew Wilcox have been developing a new data structure, called a "maple tree", to replace the data structures currently used for VMAs. This potentially big change in internal kernel structures has been recently posted for a review in a massive patch set.

Linux is a virtual-memory system. The address space for each process contains multiple virtual memory areas represented by vm_area_struct structures. Each VMA represents a contiguous block of address space and represents a range of memory of the same type, which may be anonymous memory (not backed by a file), a memory-mapped file, or even device memory. A virtual memory area, as seen from the process, is contiguous, while the supporting physical memory may not be. In addition, the address space contains holes between the VMAs; the kernel fills those empty blocks (leaving space for unmapped "guard" pages to catch buffer overflows) when it needs to add a new mapped area, for example when loading a library or in a response to an mmap() call.

Almost anything one can do in the system involves memory, so the operations on the structures representing the VMAs must be fast. These operations include lookups (finding out which VMA corresponds to a given virtual address, finding out if the memory is mapped, or locating an empty gap for a new memory area), and modifications (growing a stack, for example).

VMAs are currently stored in a modified red-black tree (rbtree) with an additional, doubly-linked list that allows the kernel to traverse all of the VMAs in a process's address space. Kernel developers have been unhappy with this data structure for some time, for a number of reasons: rbtrees do not support ranges well, they are difficult to handle in a lockless manner (the balancing operation of the rbtree affects multiple items at the same time), and rbtree traversal is inefficient, which is why the additional linked list exists.

Operations on VMAs are protected by a lock (specifically a reader/writer semaphore), which is contained in struct mm_struct. This lock was called mmap_sem until it was renamed to mmap_lock for the 5.8 release in June 2020. This renaming was a part of an effort to wrap the lock in an API, hoping to ease its replacement in the future.

Users, especially those with threaded applications in large systems, often experience contention on this lock. The problem has been discussed by kernel developers numerous times, with at least three sessions discussing it at the 2019 Linux Storage, Filesystem, and Memory-Management Summit (LFSMM). The core of the problem is that the lock is required for many operations, including almost anything involving page tables and VMAs. There are other related structures that are effectively protected by mmap_lock (with the additional difficulty of lack of documentation of this fact). In addition to splitting the unrelated structures out from under mmap_lock, the developers were considering either using a structure that would allow lockless access to the VMAs or using some type of range lock. The maple tree was proposed as one of the solutions at that time, but at that time it was in an early state of development and the code was not available yet.

Introducing maple trees

Maple trees (the name might refer to the shape of the maple leaf, leading in multiple directions) differ in important aspects from rbtrees. They belong to the B-tree family, so their nodes can contain more than two elements — up to 16 elements in leaf nodes, or ten elements in internal nodes. The use of B-trees also means that there is less need to create new nodes, as nodes may include empty slots that can be filled over time without additional allocations. Each node requires at most 256 bytes, which is a multiple of popular cache line sizes. The increased number of elements in a node and the cache-aligned size means fewer cache misses when traversing the tree.

The improved support for searching in maple trees also comes from the B-tree structure. In a B-tree, each node holds keys, called "pivots", that separate the node into subtrees. A subtree before a given key contains only values lower or equal to the key, while subtree after the key contains only values higher than the key (and lower than the next key).

Of course, maple trees were designed to work in a lockless manner, using read-copy-update (RCU). The maple tree is a generic structure that can be used in different kernel subsystems; the first usage is replacing the rbtrees and linked lists to manage VMAs. One of the authors, Liam Howlett, explained the reasons for the design in a blog post.

Maple trees offer two APIs: the simple one and the advanced one. The simple API uses the mtree_ prefix for its functions, with the main structure, struct maple_tree, defined as follows:

    struct maple_tree {
	spinlock_t    ma_lock;
	unsigned int  ma_flags;
	void __rcu    *ma_root;
    };

The static initializers are DEFINE_MTREE(name) and MTREE_INIT(name, flags), with the latter taking a bitmask of flags from the two defined currently. MAPLE_ALLOC_RANGE indicates that the tree will be used to allocate ranges and that it should manage gaps between the allocations; MAPLE_USE_RCU activates the RCU mode for the case of multiple readers. Dynamic initialization is possible with the same flags using mtree_init():

    void mtree_init(struct maple_tree *mt, unsigned int ma_flags);

Developers can free the whole tree with:

    void mtree_destroy(struct maple_tree *mt);

Three functions exist to add entries to the tree; mtree_insert(), mtree_insert_range(), and mtree_store_range(). The first two functions only add an entry if it does not previously exist, while the third function can overwrite an existing entry. They are defined as follows:

    int mtree_insert(struct maple_tree *mt, unsigned long index,
	void *entry, gfp_t gfp);
    int mtree_insert_range(struct maple_tree *mt, unsigned long first,
	unsigned long last, void *entry, gfp_t gfp);
    int mtree_store_range(struct maple_tree *mt, unsigned long first,
	unsigned long last, void *entry, gfp_t gfp);

mtree_insert() takes a pointer to the tree mt, the integer key of the entry index, the pointer to the entry entry, and the memory allocation flags gfp for new tree nodes (if needed). mtree_insert_range() inserts a range from first to last with the given data entry. These functions return zero on success, and a negative value otherwise, including -EEXIST if the key already exists. Finally, mtree_store_range() takes the same arguments as mtree_insert_range(); the difference is that it replaces any existing entries for the corresponding keys.

Two functions exist to get an entry from the tree or remove an entry:

    void *mtree_load(struct maple_tree *mt, unsigned long index);
    void *mtree_erase(struct maple_tree *mt, unsigned long index);

To read an entry, the developer should use mtree_load(), which takes a pointer to the tree mt and the key of the requested value index. The function returns a pointer to the entry, or NULL if the key was not found in the tree. The same syntax is used to remove an entry from the tree using mtree_erase(). It removes the given key from the tree and returns the associated entry, returning NULL if no such value was found.

There is more to the simple API than the above, including mtree_alloc_range() to allocate a range from the key space. The advanced API (which uses the mas_ prefix) adds iterator functions for the traversal of the whole tree, or to access next and previous elements using a state variable. With this fine-grained control, developers can resume a search as needed. The API also allows developers to find empty areas or duplicate the tree.

Replacing the VMA rbtree (and more)

The patch set contains more than just the introduction of maple trees. It is worth noting that a big part of the set adds and updates tests, a welcome addition given the impact of the changes and the future importance of the new data structure.

In the 70 patches in this set, maple trees are used to replace rbtrees in all operations on VMAs, and one of the patches removes the usage of rbtrees from VMAs. Another part of the patch set leads to the removal of the VMA linked list. This removal requires modifications in various places in the kernel code that were using the VMA list directly: the architecture code, core dump and program loading routines, some device drivers, and of course the memory-management code. The patch set also removes the VMA cache (which tracks the VMAs that are most recently accessed by each process to speed lookup), as the implementation with maple trees is faster and the cache is no longer needed.

The cover letter includes some performance results, which are somewhat hard to interpret. Some microbenchmarks show a performance increase, and some (a smaller number) a decrease. Kernel compilation time is similar to that with the vanilla 5.10 kernel, with a few more instructions executed (probably linked to the added code). Howlett requested insights from developers into the results.

Current status and next steps

Maple trees are currently at the RFC stage, with limitations; the implementation does not support 32-bit systems or non-MMU platforms, for example. However, the code is functional and kernel developers may look into it to decide if it is the direction they want to go on the quest of removing mmap_lock (as the lock was not removed in this patch set). Based on the size of the patch set, it may take time until the review is finished.


Index entries for this article
KernelMaple trees
GuestArticlesRybczynska, Marta


to post comments

Introducing maple trees

Posted Feb 12, 2021 20:41 UTC (Fri) by zev (subscriber, #88455) [Link] (21 responses)

struct maple_tree {
	spinlock_t    ma_lock;
	unsigned int  ma_flags;
	void __rcu    *ma_root;
};
I've been under the impression that this style of struct-member naming (with a common prefix specific to the containing struct type) was an artifact of coding around the limitations of prehistoric C compilers that didn't put struct members in a separate namespace. Is there a particular reason to keep carrying the extra verbosity around in newly written code like this?

Introducing maple trees

Posted Feb 12, 2021 22:20 UTC (Fri) by Funcan (subscriber, #44209) [Link] (1 responses)

I've seen it justified as so that you don't have to remember the type of the pointer when you're reading code in some style guides

Introducing maple trees

Posted Feb 15, 2021 9:08 UTC (Mon) by metan (subscriber, #74107) [Link]

I've been avoiding such prefixes in my code but then I realized that prefixing members in certain cases makes the code easier to read. For example if you loop over a link list of structures the code is more readable when members are prefixed. Hence I do not frown upon prefixed structure members in generic structures and syscall API anymore.

Introducing maple trees

Posted Feb 12, 2021 22:35 UTC (Fri) by dave_malcolm (subscriber, #15013) [Link]

FWIW it's used in CPython, where I find it useful for distinguishing what part of an object is what type for implementing inheritance in C. See e.g. https://docs.python.org/3.10/c-api/typeobj.htmlwhere some fields are "ob_"-prefixed, some are "tp_"-prefixed, etc

Introducing maple trees

Posted Feb 13, 2021 4:56 UTC (Sat) by dw (guest, #12017) [Link] (14 responses)

The preprocessor doesn't have any concept of namespaces. This form allows definition of compatibility macros when fields change shape, particularly useful to preserve userspace APIs. Separately, it's juts a hell of a lot easier to grep. Sure there are semantic greps, but nothing as usable as grep itself.

Introducing maple trees

Posted Feb 13, 2021 14:11 UTC (Sat) by ibukanov (guest, #3942) [Link] (13 responses)

grepability is such underrated feature. At work I deal with full Chromium code and I often wished that C++ would not support function name overloading. Surely there are various indexer that supposed to help, but so far I have not find the one that works just as reliably with huge source trees as grep/grep/git grep, but grep is not particularly useful when everybody uses common short names.

My ideal language will be one where within package/module all names should be unique including fields and methods and outside that all names should use a package prefix. Sadly modern language trends go to the opposite direction trying to minimize the number of keystrokes programmers use at the price of increased maintainability cost.

Introducing maple trees

Posted Feb 13, 2021 14:48 UTC (Sat) by mathstuf (subscriber, #69389) [Link] (7 responses)

Barring glob imports, don't most module-using languages have this property (no, C++ won't gain it; the module namespace is completely disjoint from the symbol namespace)? At least to the effect of putting you at the import statement of the file where package names can be factored out.

Introducing maple trees

Posted Feb 13, 2021 16:57 UTC (Sat) by floppus (guest, #137245) [Link] (6 responses)

Not if the language has object methods/properties. If you have two unrelated classes Foo and Bar, and both define a "to_string" method, there's no way for grep to know whether "x.to_string()" is referring to Foo::to_string or Bar::to_string.

Introducing maple trees

Posted Feb 13, 2021 23:08 UTC (Sat) by mathstuf (subscriber, #69389) [Link] (5 responses)

Why would I care about uses of a trait/interface method on a given type using grep or something as naive as it? Given something like `print<T>()`'s implementation (which doesn't care whether it is Foo or Bar being given to it), grep is already at a loss. You'd need to 1) know that print<T> uses to_string, 2) find callers of it, 3) match the caller type on your query, and 4) return the result *in* print<T> with a reference to the instantiator. Doing this across some external library's API is just going to be untenable without something which *understands* the language. Or are you proposing that things like templated code, common interfaces, or the like are not supported by such a language?

Introducing maple trees

Posted Feb 14, 2021 18:58 UTC (Sun) by ibukanov (guest, #3942) [Link] (4 responses)

Common interfaces are OK since in practical cases I do not look for implementations of a a particular method for implementations of a particular interface. And those can be searched with grep by searching for the name of interface/abstract class. Plus any sensible code base would not use well-known names like to_string() to mean unrelated things.

What is problematic is things like x.size() versus, say hypothetical x.vector::size() or even x.std::size() where size() is not a part of any interface. I have no troubles with a few classes named Settings as long as all usages is like namespace::Settings or at least the source file containing "using namespace::Setting" without opening the whole namespace. And that can be enforced with a linter. Sadly C++ method/field names are heavily overloaded and there is no good way to namespace them except by using explicit prefixes.

Introducing maple trees

Posted Feb 14, 2021 22:26 UTC (Sun) by mathstuf (subscriber, #69389) [Link] (3 responses)

Ah, this sounds more like a knock on C++'s decidedly weird ADL or other strange name lookup rules. Yes, for C++, one is just stuck with some kind of smarter, C++-aware querying engine. I have no idea what one does for code that is not in the current build (either via configure options or platform checks) with such a thing though. Luckily, I don't know of any other language which has adopted such lookup rules (other than those importing C++ like Objective-C++).

Introducing maple trees

Posted Feb 15, 2021 8:15 UTC (Mon) by ibukanov (guest, #3942) [Link] (1 responses)

Hm, I suppose in most languages with a notion of an object calling a method does not require a namespace prefix and typically there is no way to use one even if one wants to. And yes, ADL made the problem with C++ even worse.

Introducing maple trees

Posted Feb 15, 2021 14:31 UTC (Mon) by mathstuf (subscriber, #69389) [Link]

FWIW, Rust has the (awkward) `<foo as Trait>::conflated_name()` syntax for explicitly using a given trait's method. This is only necessary if `foo` implements two traits (that are both "visible" within the scope) with the same method name. AFAIK, Rust won't even try to use the signature to disambiguate, but instead error on the ambiguity of the name itself. I think such an approach avoids SFINAE from sneaking into the language, which I can only see as a good thing.

Introducing maple trees

Posted Oct 3, 2022 22:51 UTC (Mon) by bartoc (guest, #124262) [Link]

Other languages do have the "dead code not found" problem. This tends to be resolved by having some kind of conditional compilation facility that still parses code, even in the "dead" branch.

Introducing maple trees

Posted Feb 16, 2021 22:03 UTC (Tue) by JohnVonNeumann (guest, #131609) [Link]

This is a funny comment, I'm about to start a new C++ course at Uni and was reading through the textbook and ran across function name overloading, I thought it sounded like a neat idea but figured it probably had a few issues. For you, is it just grepability that is the issue with FNO?

Introducing maple trees

Posted Feb 18, 2021 22:17 UTC (Thu) by Jandar (subscriber, #85683) [Link] (3 responses)

> I often wished that C++ would not support function name overloading.

Function name overloading is essential for writing generic template functions. In a template function I can call DoSomething(var) where the type of var depends on the template parameters. How should I do this without function name overloading? If one considers operator overloading as a kind of function name overloading, template functions simply can't work without it.

Introducing maple trees

Posted Feb 19, 2021 13:38 UTC (Fri) by farnz (subscriber, #17727) [Link]

There are more principled ways to support overloading than C++ uses; for example, Haskell has typeclasses, and Rust has traits, which provide the same functionality, but restrict the name lookup problem.

Introducing maple trees

Posted Feb 25, 2021 22:05 UTC (Thu) by scientes (guest, #83068) [Link]

Creating a programming language based on the "it's turtles all the way down" idiom.....

Introducing maple trees

Posted Oct 3, 2022 22:55 UTC (Mon) by bartoc (guest, #124262) [Link]

you can be more explicit about how names are resolved in various ways. The biggest problem c++ has here is some of the rules are _extremely_ surprising, such as pulling names from the scope of any type parameters of the function parameter list.

Just getting rid of this helps, and some languages also let you "open" or "close" symbols to control if they are looked up in the definition or instantiation context.

The other way people do this is to just group potential overloads under some named object (like rust traits).

Introducing maple trees

Posted Feb 13, 2021 23:25 UTC (Sat) by Paf (subscriber, #91811) [Link] (1 responses)

It’s not just greppability, it’s general searchability period. Without some full scale IDE, it’s pretty hard to find all the uses of “lock” when you mean specifically the lock in the maple tree structure. But all the uses of ma_lock? Oh, thank God, that’s easy with any tool, cscope, grep, vim, whatever.

This is very much considered good practice in at least large C code bases in my experience.

Introducing maple trees

Posted Feb 14, 2021 1:14 UTC (Sun) by mathstuf (subscriber, #69389) [Link]

Oh, for C yes. This makes complete sense. It's the desire for a "no two distinct things use the same name ever" in a language I was more questioning.

Introducing maple trees

Posted Feb 22, 2021 1:51 UTC (Mon) by ras (subscriber, #33059) [Link]

> Is there a particular reason to keep carrying the extra verbosity around in newly written code like this?

grep.

Introducing maple trees

Posted Feb 13, 2021 7:48 UTC (Sat) by Otus (subscriber, #67685) [Link] (3 responses)

> up to ten elements in leaf nodes, or 16 elements in internal nodes

The cover letter has these the other way around:

> has a branching factor of 16 for leaf/non-alloc nodes and 10 for internal alloc nodes

Introducing maple trees

Posted Feb 13, 2021 19:34 UTC (Sat) by jake (editor, #205) [Link] (2 responses)

> The cover letter has these the other way around:

indeed it does, fixed now, thanks.

jake

Introducing maple trees

Posted Feb 15, 2021 9:49 UTC (Mon) by weberm (guest, #131630) [Link] (1 responses)

Hmm, the blog post ( https://blogs.oracle.com/linux/the-maple-tree section 'the path forward' ) says the following:

"Leaves store up to 31 entries, which means the next/prev are almost always in the same node.

Non-leaves have a branching factor of 10 when tracking gaps and 16 when not tracking gaps, while leaves have a branching factor of 31."

Given the VMA use case would track gaps, that would mean 10 at non-leaves and .. 31 at leaves?

Introducing maple trees

Posted Feb 15, 2021 12:43 UTC (Mon) by willy (subscriber, #9762) [Link]

The '31' is incorrect in that blog post. We use 256-byte nodes and each entry uses an 8-byte pointer and an 8-byte pivot. One 8-byte entry points to the parent so we can walk up the tree, leaving 31 entries for use. The start index and end index of the node are implied by the parent's pivots, meaning we can store 15 pivots and 16 pointers in each leaf entry.

Introducing maple trees

Posted Feb 13, 2021 17:10 UTC (Sat) by dougg (guest, #1894) [Link] (1 responses)

Is this the team that gave us xarray? If so, this follow-up effort deserves fast tracking.

Introducing maple trees

Posted Feb 14, 2021 3:17 UTC (Sun) by willy (subscriber, #9762) [Link]

Very kind of you to say so!

XArray was primarily a one-man effort while the Maple Tree is a collaboration (with Liam doing the majority of the work).
The Maple Tree trades off a more complex implementation for a denser and more featureful representation than the XArray.
The APIs are deliberately similar, and I hope to merge the two eventually.

Introducing maple trees

Posted Feb 22, 2021 4:18 UTC (Mon) by ssmith32 (subscriber, #72404) [Link] (3 responses)

I kept reading this, and the linked article, trying to figure out what sets a maple tree apart from an ordinary B-tree, that warrants the new (and more difficult to search for information on..) name. Did I miss something, or are the developers just having a little fun with this one?

Introducing maple trees

Posted Feb 22, 2021 8:15 UTC (Mon) by Wol (subscriber, #4433) [Link] (2 responses)

Maybe the maple-tree trade-offs are different to the btree tradeoffs. Feed pathological data to a btree and you end up with a linked list that is expensive to re-balance. And that's the last thing you want to have to do in the middle of a critical section :-)

Cheers,
Wol

Introducing maple trees

Posted Feb 25, 2021 16:58 UTC (Thu) by jch (guest, #51929) [Link] (1 responses)

I think ssmith32 was speaking about B-trees, while you seem to refer to binary search trees, which, as you justly note, are not rebalanced and have pathological worst-case behaviour.

I too would like to know whether "Maple trees" improve upon B-trees, and in what way.

Introducing maple trees

Posted Mar 5, 2021 7:51 UTC (Fri) by ssmith32 (subscriber, #72404) [Link]

Exactly B-trees, not Binary trees. The description of Maple trees matched my memory of B-trees (a throwback to spinning rust, now returns cache-sized, instead of page-sized!).

On the upside, thus article prodded me to pull my CLRS off the floor, where it had sat after my last interview forced me to revive my dynamic programming skills (as limited as they ever were)

https://en.m.wikipedia.org/wiki/B-tree


Copyright © 2021, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds