feat(community): Enhance SAP HANA Vector integration with metadata columns, keyword filtering, and internal embeddings #8001

Status: Open, 22 commits to merge into base branch main

yberber-sap commented Apr 10, 2025

Description

This PR enhances the SAP HANA Cloud Vector Store integration in langchainjs with three major features that align its behavior with the existing Python implementation:


Enhancements

  1. Dedicated Metadata Columns
    Metadata fields can now be stored in dedicated table columns in addition to the JSON metadata field. This enables more efficient filtering and indexing, and mirrors the behavior of the Python implementation (add_texts in hanavector.py).

  2. Keyword Search Filtering
    Added support for a $contains operator in metadata filters, which constructs SQL full-text search conditions such as:

    SCORE(? IN ("<column>" EXACT SEARCH MODE 'text')) > 0

    When filtering on fields stored in the JSON metadata, a WITH clause is used to project the relevant key into its own column.

  3. Internal Embedding Support
    Introduced a HanaInternalEmbeddings class that uses SAP HANA's native VECTOR_EMBEDDING function to compute document and query embeddings directly in the database. This allows fully in-database similarity search with reduced external dependencies.
    The query structure depends on the embedding mode: with external embeddings, a precomputed vector (from embedQuery()) is injected into the query; with internal embeddings, the native HANA VECTOR_EMBEDDING function is called inline.
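
To illustrate enhancement 1, here is a minimal sketch of how metadata might be split between dedicated columns and the JSON field. The helper names (splitMetadata, buildInsertSql) and the table and column names are hypothetical and are not taken from this PR; the sketch assumes the complete metadata object is still written to the JSON column, as the description says ("in addition to").

```typescript
type MetadataValue = string | number | boolean | null;
type Metadata = Record<string, MetadataValue>;

// Hypothetical helper: collect the values for the dedicated columns while
// keeping the complete metadata object for the JSON column.
function splitMetadata(
  metadata: Metadata,
  dedicatedColumns: string[]
): { json: string; columnValues: MetadataValue[] } {
  // A key missing from the metadata becomes NULL in its dedicated column.
  const columnValues = dedicatedColumns.map((name) => metadata[name] ?? null);
  return { json: JSON.stringify(metadata), columnValues };
}

// Hypothetical helper: build a parameterized INSERT with one extra
// placeholder per dedicated column. Identifiers are assumed to be trusted.
function buildInsertSql(
  table: string,
  contentColumn: string,
  metadataColumn: string,
  vectorColumn: string,
  dedicatedColumns: string[]
): string {
  const columns = [contentColumn, metadataColumn, vectorColumn, ...dedicatedColumns]
    .map((c) => `"${c}"`)
    .join(", ");
  const placeholders = [
    "?",
    "?",
    "TO_REAL_VECTOR(?)",
    ...dedicatedColumns.map(() => "?"),
  ].join(", ");
  return `INSERT INTO "${table}" (${columns}) VALUES (${placeholders})`;
}
```

The values returned by splitMetadata would be bound after the content, JSON, and vector parameters, in the same order as the dedicated columns.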

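
For enhancement 2, the following sketch shows how a $contains filter could be turned into the SCORE condition quoted above, with a WITH clause that projects JSON-only keys into their own columns. The function name, the intermediate_result alias, and the use of JSON_VALUE for the projection are assumptions made for illustration; the PR's actual code may differ.

```typescript
type ContainsFilter = Record<string, { $contains: string }>;

interface KeywordQueryParts {
  withClause: string; // empty when every filtered key has a dedicated column
  whereClause: string;
  params: string[];
}

// Hypothetical helper, not the PR's implementation.
function buildKeywordFilter(
  filter: ContainsFilter,
  table: string,
  metadataColumn: string,
  dedicatedColumns: string[]
): KeywordQueryParts {
  const conditions: string[] = [];
  const params: string[] = [];
  const projections: string[] = [];

  for (const [key, condition] of Object.entries(filter)) {
    // Keys are interpolated into SQL, so accept plain identifiers only.
    if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(key)) {
      throw new Error(`Invalid metadata key: ${key}`);
    }
    // Keys stored only in the JSON field are projected into a column first
    // (assumption: JSON_VALUE is used for the projection).
    if (!dedicatedColumns.includes(key)) {
      projections.push(`JSON_VALUE("${metadataColumn}", '$.${key}') AS "${key}"`);
    }
    conditions.push(`SCORE(? IN ("${key}" EXACT SEARCH MODE 'text')) > 0`);
    params.push(condition.$contains);
  }

  const withClause =
    projections.length > 0
      ? `WITH intermediate_result AS (SELECT *, ${projections.join(", ")} FROM "${table}")`
      : "";
  return { withClause, whereClause: conditions.join(" AND "), params };
}
```

The search term is always passed as a bound parameter; only the validated key name is written into the SQL text.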

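
For enhancement 3, the switch between external and internal embeddings could look roughly like the sketch below. Everything here is illustrative: buildSimilarityQuery, the QueryEmbedding type, and the column names are hypothetical, and the argument order VECTOR_EMBEDDING(text, 'QUERY', model) is an assumption to be checked against the SAP HANA Cloud SQL reference.

```typescript
type QueryEmbedding =
  | { kind: "external"; vector: number[] } // vector computed by embedQuery()
  | { kind: "internal"; query: string; modelId: string }; // computed in HANA

// Hypothetical helper: the vector expression is the only part of the query
// that differs between the two modes.
function buildSimilarityQuery(
  table: string,
  contentColumn: string,
  vectorColumn: string,
  k: number,
  embedding: QueryEmbedding
): { sql: string; params: string[] } {
  if (!Number.isInteger(k) || k <= 0) {
    throw new Error("k must be a positive integer");
  }

  let vectorExpr: string;
  let params: string[];
  if (embedding.kind === "external") {
    // Inject the precomputed vector as a bound parameter.
    vectorExpr = "TO_REAL_VECTOR(?)";
    params = [JSON.stringify(embedding.vector)];
  } else {
    // Let the database embed the query text inline (assumed signature).
    vectorExpr = "VECTOR_EMBEDDING(?, 'QUERY', ?)";
    params = [embedding.query, embedding.modelId];
  }

  const sql =
    `SELECT TOP ${k} "${contentColumn}", ` +
    `COSINE_SIMILARITY("${vectorColumn}", ${vectorExpr}) AS "score" ` +
    `FROM "${table}" ORDER BY "score" DESC`;
  return { sql, params };
}
```

With this shape, the rest of the search path (filters, ordering, result mapping) stays the same in both modes.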
Tests & Documentation


Issue: No associated issue.

Dependencies: No new dependencies were introduced in this PR.

Twitter handle: @sapopensource

dosubot bot added the size:XL label (This PR changes 500-999 lines, ignoring generated files.) on Apr 10, 2025
dosubot bot added the auto:enhancement label (A large net-new component, integration, or chain. Use sparingly. The largest features) on Apr 10, 2025
yberber-sap and others added 22 commits April 11, 2025 08:30
Co-authored-by: wsvoja <120735101+wsvoja@users.noreply.github.com>
…rnal embedding example

yberber-sap (Author) commented:

Hi @jacoblee93 👋 Can you take a look at this PR? I'd appreciate your feedback!
