
Feat: Allowing evaluations using Ragas Metrics in EvalTask #5197


Open: sahusiddharth wants to merge 2 commits into main

Conversation


@sahusiddharth commented Apr 23, 2025

This PR enables evaluation with Ragas metrics alongside the existing Vertex AI metrics in EvalTask.

Implementation Details

Ragas metrics evaluation is executed in a separate loop after the main executor loop where Vertex metrics are evaluated. This separate implementation was necessary because:

Ragas performs evaluation asynchronously, while the existing evaluation infrastructure uses multi-threading.
Combining these approaches led to several runtime errors:

  • BlockingIOError: [Errno 35] Resource temporarily unavailable inside gRPC polling callbacks
  • "Future attached to a different loop" errors when async Ragas calls were invoked on one event loop but processed by another
  • Synchronous Ragas functions (wrappers around async implementations) caused similar conflicts

Attempted Solutions

Multiple approaches were tested to integrate Ragas within the existing evaluation loop (see the sketch after this list):

  • Using synchronous single_turn_score functions resulted in gRPC polling callback errors
  • Using asynchronous single_turn_ascore functions created coroutine processing challenges
  • Attempts to isolate asyncio event loops between threads were unsuccessful
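
For context, the two Ragas entry points named above look roughly like the sketch below. This is an illustrative sketch, not code from this PR: the metric choice, sample fields, and evaluator-LLM setup are assumptions.

```python
import asyncio

from ragas.dataset_schema import SingleTurnSample
from ragas.metrics import Faithfulness

# Illustrative metric and sample; Faithfulness also needs an evaluator LLM
# configured (e.g. via ragas.llms) before scoring will actually succeed.
metric = Faithfulness()
sample = SingleTurnSample(
    user_input="When was the first Super Bowl?",
    response="The first Super Bowl was held on January 15, 1967.",
    retrieved_contexts=[
        "The First AFL-NFL World Championship Game was played on January 15, 1967."
    ],
)

# Synchronous wrapper around the async implementation: it drives an event loop
# internally, which is the call form that collided with gRPC polling callbacks
# when invoked from executor worker threads.
score = metric.single_turn_score(sample)

# Native coroutine: it must be awaited on the loop that owns its futures, which
# is why scheduling it across threads raised "Future attached to a different loop".
score = asyncio.run(metric.single_turn_ascore(sample))
```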

Final Solution

The chosen implementation runs Ragas metrics separately, after the main evaluation loop completes. This preserves both the multi-threaded performance of the existing evaluation system and the asynchronous execution model of Ragas, while avoiding runtime conflicts between the two approaches.
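
Below is a minimal sketch of that structure, assuming hypothetical helper names (run_vertex_metrics, score_fn, and so on) that do not correspond to the actual functions in _evaluation.py:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def run_vertex_metrics(vertex_metrics, instances, score_fn):
    """Existing multi-threaded pass; score_fn stands in for the SDK's
    per-instance Vertex metric computation."""
    with ThreadPoolExecutor() as executor:
        futures = [
            executor.submit(score_fn, metric, instance)
            for metric in vertex_metrics
            for instance in instances
        ]
        return [future.result() for future in futures]


async def _score_ragas_metrics(ragas_metrics, samples):
    # Every Ragas coroutine is created and awaited on this single event loop,
    # so no future ends up attached to a foreign loop.
    coros = [m.single_turn_ascore(s) for m in ragas_metrics for s in samples]
    return await asyncio.gather(*coros)


def evaluate(vertex_metrics, ragas_metrics, instances, samples, score_fn):
    # 1) The threaded Vertex pass runs to completion first.
    vertex_results = run_vertex_metrics(vertex_metrics, instances, score_fn)
    # 2) The Ragas pass then runs in one dedicated asyncio loop.
    ragas_results = asyncio.run(_score_ragas_metrics(ragas_metrics, samples))
    return vertex_results, ragas_results
```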

The diagram below illustrates the functional organization within _evaluation.py where changes have been implemented; yellow boxes indicate functions that import from the Ragas framework.

[Screenshot: diagram of the _evaluation.py function organization]

Testing:

A complete end-to-end example demonstrating the implementation is available in the accompanying gist, which shows successful execution without runtime errors:

https://gist.github.com/sahusiddharth/39030eb6318a16b7cdc3d30c6a7c458b
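
For a quick sense of the call pattern without opening the gist, a hypothetical usage sketch is shown below. It assumes, based on the PR title, that Ragas metric instances can be passed in EvalTask's metrics list alongside built-in Vertex metrics; the project ID, column names, and exact mixing behaviour are placeholders and may differ from the gist:

```python
import pandas as pd
import vertexai
from vertexai.evaluation import EvalTask
from ragas.metrics import Faithfulness  # assumes its evaluator LLM is configured

vertexai.init(project="my-project", location="us-central1")  # placeholder values

eval_dataset = pd.DataFrame(
    {
        "prompt": ["When was the first Super Bowl?"],
        "response": ["The first Super Bowl was held on January 15, 1967."],
        "reference": ["The first Super Bowl was held on January 15, 1967."],
        "retrieved_contexts": [
            ["The First AFL-NFL World Championship Game was played on January 15, 1967."]
        ],
    }
)

eval_task = EvalTask(
    dataset=eval_dataset,
    # A built-in Vertex computation-based metric and a Ragas metric
    # side by side (assumed mixing behaviour).
    metrics=["exact_match", Faithfulness()],
)
result = eval_task.evaluate()
print(result.summary_metrics)
```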

@product-auto-label product-auto-label bot added size: m Pull request size is medium. api: vertex-ai Issues related to the googleapis/python-aiplatform API. labels Apr 23, 2025
@sahusiddharth sahusiddharth requested a review from jsondai April 26, 2025 09:54
@sahusiddharth (Author) commented:

Hi @jsondai, please let me know if there’s anything I can adjust to help move this PR forward!

@jsondai (Member) commented Apr 30, 2025

> Hi @jsondai, please let me know if there’s anything I can adjust to help move this PR forward!

Hi Siddharth,

Thank you very much for the PR!

Our team is discussing:

  1. Future support and maintenance for the external partnership and integration with the Vertex Eval SDK.
  2. The standard process for external partners to integrate with the Vertex Evaluation Service.

Regarding the code changes, could you please add some unit tests for the PR in tests/unit/vertexai/test_evaluation.py? Otherwise it looks good to me.

Thanks,
Jason

@jaycee-li jaycee-li added do not merge Indicates a pull request not ready for merge, due to either quality or timing. and removed do not merge Indicates a pull request not ready for merge, due to either quality or timing. labels May 8, 2025