@jbajic jbajic commented Aug 19, 2025

Scope & Purpose

Introduces filtering in the vector index. The filter expression is pushed down to the vector index iterator and evaluated there. A possible follow-up is to investigate whether batching documents would be beneficial and, if so, to implement it.
Queries like the following should now work, with the FilterNode merged into the EnumerateNearVectorIndexNode:

FOR d IN col
FILTER d.val > 3
LET dist = APPROX_NEAR_L2(d.vector, @q)
SORT dist LIMIT 3 RETURN d

This implementation does not yet support storedValues or compound indexes.
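
To illustrate why evaluating the filter inside the index iterator matters, here is a minimal Python sketch. It is not the ArangoDB implementation: an exact brute-force L2 scan stands in for the approximate vector index, and `Doc`, `by_distance`, `post_filter_search`, and `pushdown_search` are hypothetical names chosen for this example. The point is only that filtering after the top-k cut can return fewer than k results, while filtering inside the iterator keeps scanning until k matching documents are found.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Doc:
    key: str
    val: int
    vector: Sequence[float]


def l2_squared(a: Sequence[float], b: Sequence[float]) -> float:
    # Squared L2 distance; enough for ranking, no square root needed.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def by_distance(docs: List[Doc], query: Sequence[float]) -> List[Doc]:
    # Stand-in for the index iterator (assumption): exact, nearest first.
    return sorted(docs, key=lambda d: l2_squared(d.vector, query))


def post_filter_search(docs: List[Doc], query: Sequence[float], k: int,
                       predicate: Callable[[Doc], bool]) -> List[Doc]:
    # Take the k nearest first, then filter: may return fewer than k.
    nearest = by_distance(docs, query)[:k]
    return [d for d in nearest if predicate(d)]


def pushdown_search(docs: List[Doc], query: Sequence[float], k: int,
                    predicate: Callable[[Doc], bool]) -> List[Doc]:
    # Evaluate the filter while iterating; stop once k documents match.
    result: List[Doc] = []
    for d in by_distance(docs, query):
        if predicate(d):
            result.append(d)
            if len(result) == k:
                break
    return result


DOCS = [
    Doc("a", 1, [0.0, 0.0]),
    Doc("b", 5, [1.0, 0.0]),
    Doc("c", 2, [0.0, 2.0]),
    Doc("d", 7, [3.0, 0.0]),
    Doc("e", 9, [0.0, 4.0]),
]
QUERY = [0.0, 0.0]

# Mirrors FILTER d.val > 3 ... LIMIT 3 from the query above.
print([d.key for d in post_filter_search(DOCS, QUERY, 3, lambda d: d.val > 3)])
print([d.key for d in pushdown_search(DOCS, QUERY, 3, lambda d: d.val > 3)])
```

In this toy data set, the post-filter variant drops two of the three nearest documents, whereas the pushdown variant still returns three matches.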

  • 💩 Bugfix
  • 🍕 New feature
  • 🔥 Performance improvement
  • 🔨 Refactoring/simplification

Checklist

  • Tests
    • Regression tests
    • C++ Unit tests
    • integration tests
    • resilience tests
  • 📖 CHANGELOG entry made
  • 📚 documentation written (release notes, API changes, ...)
  • Backports
    • Backport for 3.12.0: (Please link PR)
    • Backport for 3.11: (Please link PR)
    • Backport for 3.10: (Please link PR)

Related Information

(Please reference tickets / specification / other PRs etc)

  • Docs PR:
  • Enterprise PR:
  • GitHub issue / Jira ticket:
  • Design document:

@jbajic jbajic self-assigned this Aug 19, 2025
@cla-bot cla-bot bot added the cla-signed label Aug 19, 2025
@maierlars

Interesting, so we are materializing the document twice? It appears to me that filtering based on stored values would be even easier and more efficient. Just my five cents 😁 but I like the progress 👏🏽

@jbajic

jbajic commented Sep 3, 2025

> Interesting, so we are materializing the document twice? It appears to me that filtering based on stored values would be even easier and more efficient. Just my five cents 😁 but I like the progress 👏🏽

Yes, we are materializing twice. This is the non-optimized case, which supports all FILTER clauses next to the vector index. We plan to work on supporting storedValues in the vector index 😃 to avoid the double materialization. Thanks for the comment and for keeping track of the vector index work!
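
The double materialization discussed above can be pictured with a small Python sketch. It is only an illustration under assumptions, not ArangoDB code: `CountingStore` is a hypothetical document store that counts full-document lookups, and the per-candidate `stored` dict stands in for values kept inside the index entry, which this PR does not support yet.

```python
class CountingStore:
    """Hypothetical document store that counts full-document lookups."""

    def __init__(self, docs):
        self._docs = docs
        self.lookups = 0

    def load(self, key):
        self.lookups += 1
        return self._docs[key]


def filter_on_documents(store, candidate_keys, predicate):
    # First lookup: the filter needs the full document to be evaluated.
    matching = [k for k in candidate_keys if predicate(store.load(k))]
    # Second lookup: matching documents are loaded again to be returned.
    return [store.load(k) for k in matching]


def filter_on_stored_values(store, candidates, predicate):
    # candidates: (key, stored) pairs; the filter reads only stored values.
    matching = [k for k, stored in candidates if predicate(stored)]
    # Single lookup per matching document.
    return [store.load(k) for k in matching]


DOCUMENTS = {"a": {"val": 1}, "b": {"val": 5}, "c": {"val": 7}}

twice = CountingStore(DOCUMENTS)
filter_on_documents(twice, ["a", "b", "c"], lambda d: d["val"] > 3)

once = CountingStore(DOCUMENTS)
filter_on_stored_values(
    once,
    [(k, {"val": d["val"]}) for k, d in DOCUMENTS.items()],
    lambda s: s["val"] > 3,
)

print(twice.lookups, once.lookups)
```

Under these assumptions, the first variant loads every candidate once and every match a second time, while the stored-values variant loads only the matches.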
