Fix inconsistent top_k validation in SentenceTransformersDiversityRanker #9698

Open
wants to merge 1 commit into
base: main

SaurabhLingam

@SaurabhLingam SaurabhLingam commented Aug 10, 2025

  • fixes SentenceTransformersDiversityRanker with MMR - inconsistent top_k handling #9695

  • changed elif to if in run() method to ensure top_k validation always runs, regardless of whether top_k comes from init or runtime

  • Both scenarios now consistently raise ValueError with descriptive message format: 'top_k must be between 1 and X, but got Y'

  • Fixes inconsistency where init top_k gave confusing MMR error while runtime top_k gave clear validation error
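
As a rough illustration of the change described in these bullets, here is a minimal sketch rather than the actual Haystack source. The helper name `resolve_top_k` and its parameters are hypothetical; only the error message format is taken from the description above. The point is that the range check runs unconditionally, so an init-time `top_k` and a runtime `top_k` fail in the same way.

```python
from typing import Optional


def resolve_top_k(init_top_k: int, runtime_top_k: Optional[int], n_docs: int) -> int:
    """Hypothetical helper: choose the effective top_k, then always validate it."""
    # A runtime value overrides the value given at init.
    top_k = init_top_k if runtime_top_k is None else runtime_top_k

    # With `elif`, this check would be skipped whenever the init value is used.
    # With a plain `if`, both sources go through the same validation.
    if not 0 < top_k <= n_docs:
        raise ValueError(f"top_k must be between 1 and {n_docs}, but got {top_k}")
    return top_k
```

With three documents, both `resolve_top_k(10, None, 3)` and `resolve_top_k(2, 10, 3)` raise the same `ValueError` under this sketch.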

@SaurabhLingam SaurabhLingam requested a review from a team as a code owner August 10, 2025 08:12
@SaurabhLingam SaurabhLingam requested review from vblagoje and removed request for a team August 10, 2025 08:12
@CLAassistant

CLAassistant commented Aug 10, 2025

CLA assistant check
All committers have signed the CLA.

@anakin87 anakin87 self-requested a review August 10, 2025 09:44
@anakin87
Member

@SaurabhLingam thanks for this PR.

Unfortunately, the tests are failing because this component behaves very inconsistently: #9695 (comment)
Let me check internally how we want to fix that and get back to you.

@anakin87
Member

@SaurabhLingam could you please modify this PR according to #9695 (comment)?

  • in run, only check if top_k<=0 and raise an error
  • review the MMR computation algorithm. The while condition should be while len(selected) < min(top_k, len(documents)):
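
The two requested changes can be sketched roughly as follows. This is a simplified, standard-library-only illustration of greedy MMR selection, not the component's actual implementation: the function names `cosine` and `mmr_select`, the `lambda_threshold` default, and the plain-list embeddings are all assumptions made for the example. Only the `top_k <= 0` check and the `while` condition come from the comment above.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_emb, doc_embs, top_k, lambda_threshold=0.5):
    """Hypothetical greedy MMR: return indices of the selected documents, in order."""
    # Only reject non-positive values; a top_k larger than the number of
    # documents is allowed and simply yields all documents.
    if top_k <= 0:
        raise ValueError(f"top_k must be > 0, but got {top_k}")

    selected = []
    # Capping at len(doc_embs) guarantees the loop ends once every document is picked.
    while len(selected) < min(top_k, len(doc_embs)):
        best_idx, best_score = None, -math.inf
        for idx, emb in enumerate(doc_embs):
            if idx in selected:
                continue
            relevance = cosine(query_emb, emb)
            # Similarity to the closest already-selected document (0.0 if none yet).
            redundancy = max((cosine(emb, doc_embs[j]) for j in selected), default=0.0)
            score = lambda_threshold * relevance - (1 - lambda_threshold) * redundancy
            if score > best_score:
                best_idx, best_score = idx, score
        selected.append(best_idx)
    return selected
```

In this sketch, calling `mmr_select` with `top_k=10` on three documents returns all three indices instead of raising, which matches the behavior the release note below describes.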

Please also add a release note, as explained here. It should contain something like:

fixes:
  - |
    Ensure consistent behavior in `SentenceTransformersDiversityRanker`. Like other rankers, it now returns
    all documents instead of raising an error when `top_k` exceeds the number of available documents.

Development

Successfully merging this pull request may close these issues.

SentenceTransformersDiversityRanker with MMR - inconsistent top_k handling
3 participants