-
Notifications
You must be signed in to change notification settings - Fork 859
Batch neighbour retrieval in single server case #21862
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: devel
Are you sure you want to change the base?
Batch neighbour retrieval in single server case #21862
Conversation
6b721a2
to
b3150f6
Compare
…trieval-single-server-case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good Code in general, I think this is a step in the right direction.
But as you noted it will not yet solve the problem, as it still iterates over all neighbors, it just adds context switches for now. (You want to use those to return so that is good 👍)
I am starting to think about this on a more high level.
Is the intermediate copy into the std::vector actually necessary?
Or should we aim for an iterator approach directly. The iterator we could just keep open.
And only consume the batch we want.
If we cache then the iterator can be feed from the cache on next round.
arangod/Graph/Providers/SingleServer/SingleServerNeighbourProvider.cpp
Outdated
Show resolved
Hide resolved
- add a use-cache plug to SingleServerNeighbourProvider because for some usages - e.g. a simple tree traversal - the cache will not be useful and just waste memory. The plug is not yet used. - reverse batch size memory batch for batch vector for more efficient memory usage - get batch size from ExecutionBlock::DefaultBatchSize instead of using a "magic number"
…our-retrieval-single-server-case
This is the first PR to retrieve neighbours in batches in traversals. This can drastically improve limited traversal runtimes on graphs with supernodes.
In the graph's
SingleServerProvider
, this PR adapts how neighbours of a specific vertex are read in theexpand
function. It now introduces a neighbour provider that is responsible for providing the neighbours in batches (which extracts a lot of code in theSingleServerProvider
). For now we loop over all batches to get all neighbours inexpand
. In a later PR (when the cluster case is also implemented),expand
should return one batch per call.The neighbour provider is set to a specific vertex and provides one batch of neighbours per call to its
next
function. Internally, it saves read neighbour batches to a cache. If the neighbour provider is set to a vertex for which the cache includes already all neighbours, the neighbour provider provides these cached batches instead of reading them again from memory.