
Batch neighbour retrieval in single server case #21862


Open
jvolmer wants to merge 17 commits into devel from feature/batch-neighbour-retrieval-single-server-case

Conversation

@jvolmer (Contributor) commented Jul 16, 2025

This is the first PR to retrieve neighbours in batches in traversals. This can drastically improve the runtime of limited traversals on graphs with supernodes.

In the graph's SingleServerProvider, this PR adapts how the neighbours of a specific vertex are read in the expand function. It introduces a neighbour provider that is responsible for providing the neighbours in batches (which moves a lot of code out of the SingleServerProvider). For now, expand loops over all batches to get all neighbours. In a later PR (when the cluster case is also implemented), expand should return one batch per call.

The neighbour provider is set to a specific vertex and provides one batch of neighbours per call to its next function. Internally, it saves the neighbour batches it has read to a cache. If the neighbour provider is set to a vertex for which the cache already contains all neighbours, it provides these cached batches instead of reading them again from memory.
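A minimal sketch of this interface, assuming hypothetical names (NeighbourProvider, setVertex, next) and a simplified cache keyed by vertex; the real classes introduced by this PR may differ:

```cpp
// Minimal sketch of the batch-wise neighbour provider described above, with a
// per-vertex cache. All names here are illustrative stand-ins, not the actual
// classes from this PR.
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

using VertexId = std::string;
using Batch = std::vector<VertexId>;  // one batch of neighbour vertex ids

class NeighbourProvider {
 public:
  explicit NeighbourProvider(std::size_t batchSize) : _batchSize(batchSize) {}

  // Point the provider at a vertex; subsequent next() calls yield that
  // vertex's neighbours batch by batch.
  void setVertex(VertexId vertex) {
    _current = std::move(vertex);
    _offset = 0;
  }

  // Return the next batch of neighbours, or std::nullopt when exhausted.
  // Once a vertex's neighbours are in the cache, batches are served from
  // there instead of being read again.
  std::optional<Batch> next() {
    auto& all = neighboursOf(_current);
    if (_offset >= all.size()) {
      return std::nullopt;
    }
    std::size_t end = std::min(_offset + _batchSize, all.size());
    Batch batch(all.begin() + _offset, all.begin() + end);
    _offset = end;
    return batch;
  }

 private:
  // Stand-in for the actual edge-index read: fills the cache lazily with
  // dummy neighbours so the sketch stays self-contained.
  std::vector<VertexId>& neighboursOf(VertexId const& v) {
    auto [it, inserted] = _cache.try_emplace(v);
    if (inserted) {
      for (int i = 0; i < 5; ++i) {
        it->second.push_back(v + "/neighbour" + std::to_string(i));
      }
    }
    return it->second;
  }

  std::size_t _batchSize;
  std::size_t _offset = 0;
  VertexId _current;
  std::unordered_map<VertexId, std::vector<VertexId>> _cache;
};

int main() {
  // As in the current state of the PR, the caller loops over all batches.
  NeighbourProvider provider(/*batchSize=*/2);
  provider.setVertex("v/1");
  while (auto batch = provider.next()) {
    std::cout << "batch of " << batch->size() << " neighbours\n";
  }
}
```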

@jvolmer jvolmer self-assigned this Jul 16, 2025
@cla-bot cla-bot bot added the cla-signed label Jul 16, 2025
@jvolmer jvolmer force-pushed the feature/batch-neighbour-retrieval-single-server-case branch from 6b721a2 to b3150f6 on July 17, 2025 12:51
@mchacki (Member) left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good code in general, I think this is a step in the right direction.
But as you noted, it will not yet solve the problem: it still iterates over all neighbours and for now just adds context switches. (You want to use those to return, so that is good 👍)

I am starting to think about this at a higher level.
Is the intermediate copy into the std::vector actually necessary?
Or should we aim for an iterator approach directly? We could just keep the iterator open and only consume the batch we want.

If we cache, the iterator can then be fed from the cache on the next round.
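A hedged sketch of this iterator idea, with illustrative names (NeighbourCursor, nextBatch) that are not taken from the PR: the cursor stays open across calls and hands out only one batch per call, so no intermediate std::vector copy is needed. A cached variant could feed the same callback from stored batches instead of the simulated index read.

```cpp
// Sketch only: a cursor that is kept open and consumed one batch at a time.
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>

class NeighbourCursor {
 public:
  using Callback = std::function<void(std::string const& neighbour)>;

  NeighbourCursor(std::string vertex, std::size_t total)
      : _vertex(std::move(vertex)), _total(total) {}

  // Push at most batchSize neighbours into the callback; return true while
  // more neighbours remain, so the caller can resume later from the same
  // position instead of re-reading from the start.
  bool nextBatch(std::size_t batchSize, Callback const& cb) {
    std::size_t end = std::min(_pos + batchSize, _total);
    for (; _pos < end; ++_pos) {
      cb(_vertex + "/neighbour" + std::to_string(_pos));  // stand-in for an index read
    }
    return _pos < _total;
  }

 private:
  std::string _vertex;
  std::size_t _total;  // stand-in for "the underlying iterator is exhausted"
  std::size_t _pos = 0;
};

int main() {
  NeighbourCursor cursor("v/1", /*total=*/5);
  std::size_t count = 0;
  // Consume only the first batch now; the cursor stays open for later calls.
  cursor.nextBatch(2, [&](std::string const&) { ++count; });
}
```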

jvolmer added 2 commits July 28, 2025 15:13
- add a use-cache plug to SingleServerNeighbourProvider, because for
some usages (e.g. a simple tree traversal) the cache will not be
useful and would just waste memory; the plug is not yet used
- reserve batch-size memory for the batch vector for more efficient
memory usage
- get batch size from ExecutionBlock::DefaultBatchSize instead of
using a "magic number" (see the sketch below)