feat: Add RunPipeline tool #253

Merged
vm-mishchenko merged 2 commits into search-skunkworks-2025 from add-run-playground-tool on May 15, 2025

Collaborator

@vm-mishchenko vm-mishchenko commented May 14, 2025

Add a new RunPipeline tool that can execute an aggregation pipeline without requiring an Atlas account, cluster, or collection.

The tool accepts a set of documents, an aggregation pipeline, and a search index definition, and runs them against the Search Playground. The Search Playground internally creates an ephemeral collection and executes the pipeline in a temporary environment.
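
To make the input concrete, here is a minimal sketch of what the tool's arguments might look like. The field names `documents`, `aggregationPipeline`, and `searchIndexDefinition` follow the schema discussed in the review; the `RunPipelineInput` type, the sample documents, and the index name `default` are assumptions for illustration and are not taken from the PR.

```typescript
// Hypothetical shape of the tool input. The field names mirror the PR's schema,
// but this type is an illustration, not the PR's actual definition.
type Doc = Record<string, unknown>;

interface RunPipelineInput {
  documents: Doc[]; // documents the pipeline runs against
  aggregationPipeline: Doc[]; // MongoDB aggregation pipeline stages
  searchIndexDefinition: Doc; // search index definition
}

const input: RunPipelineInput = {
  documents: [
    { _id: 1, title: "The Matrix" },
    { _id: 2, title: "Matrix Reloaded" },
  ],
  aggregationPipeline: [
    // "default" is an assumed index name for this example.
    { $search: { index: "default", text: { query: "matrix", path: "title" } } },
    { $limit: 5 },
  ],
  // Dynamic mappings index every field, which keeps the example short.
  searchIndexDefinition: { mappings: { dynamic: true } },
};

console.log(JSON.stringify(input, null, 2));
```

Nothing in this payload refers to a cluster or collection, which matches the description above: the documents travel with the request.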

Tested manually and with an integration test. More tests will be added in follow-up PRs.

What would be the result of the query?
image

Is the query syntax correct?
image

Why doesn't $search return the first doc?
image

@vm-mishchenko vm-mishchenko force-pushed the add-run-playground-tool branch from 6bee4bc to 256587a Compare May 14, 2025 18:55
@vm-mishchenko vm-mishchenko changed the base branch from main to search-skunkworks-2025 May 14, 2025 18:55
@vm-mishchenko vm-mishchenko marked this pull request as ready for review May 14, 2025 19:11
@vm-mishchenko vm-mishchenko requested a review from a team as a code owner May 14, 2025 19:11
export const RunPipelineOperationArgs = {
    documents: z
        .array(z.record(z.string(), z.unknown()))
        .describe("Documents to run the pipeline against. 500 is maximum.")
Collaborator

nit: Worth adding .max(500) to codify this?

Collaborator Author

nice idea, added
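
The check agreed on above can be sketched without any dependencies. In zod itself the equivalent is chaining `.max(500)` onto the array schema, as the reviewer suggests; the function name `validateDocuments`, the `DocumentRecord` type, and the error messages below are assumptions for illustration, not code from the PR.

```typescript
type DocumentRecord = Record<string, unknown>;

const MAX_DOCUMENTS = 500; // limit stated in the schema description

// Plain-TypeScript approximation of what an array schema with .max(500) enforces:
// the input must be an array of objects with at most 500 entries.
function validateDocuments(input: unknown): DocumentRecord[] {
  if (!Array.isArray(input)) {
    throw new Error("documents must be an array");
  }
  if (input.length > MAX_DOCUMENTS) {
    throw new Error(
      `documents must contain at most ${MAX_DOCUMENTS} items, got ${input.length}`
    );
  }
  for (const doc of input) {
    if (typeof doc !== "object" || doc === null || Array.isArray(doc)) {
      throw new Error("each document must be an object");
    }
  }
  return input as DocumentRecord[];
}
```

Enforcing the limit in the schema means an oversized input is rejected before any request is sent, rather than relying on the description text alone.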

export class RunPipeline extends ToolBase {
    protected name = "run-pipeline";
    protected description =
        "Run aggregation pipeline for provided documents without needing an Atlas account, cluster, or collection.";
Collaborator

I wonder if it would help to give the LLM some context in the description about when to use this tool, e.g. "can be useful in cases such as x, y, z", since the use cases seem more open-ended than those of the other tools?

Collaborator Author

Added a small clause: "The tool can be useful for running ad-hoc pipelines for testing or debugging."

I agree, it's quite an open-ended tool, so I would leave it to the LLM to decide when exactly to use it.

        .array(z.record(z.string(), z.unknown()))
        .describe("Aggregation pipeline to run on the provided documents.")
        .default(DEFAULT_PIPELINE),
    searchIndexDefinition: z
Collaborator

Are more specific types for aggregationPipeline/searchIndexDefinition/synonyms useful for the LLM, or is it already pretty good at determining the types from the description?

For example, the Search Playground looks limited to a subset of aggregation pipeline stages. Would those be helpful to include in the type?

Collaborator Author

I feel it would be hard to add more specific zod types here. Unfortunately, all these entities have a complex, dynamic structure.

I updated "Aggregation pipeline..." to "MongoDB aggregation pipeline..." (same for the other fields) to stress the MongoDB part, which hopefully nudges the LLM in the right direction.

Regarding supported stages, I'd avoid listing them here. If we hardcode them, the list will likely get out of sync over time between the Playground and MCP. I'd rather rely on the Playground's response to flag any unsupported stages. It actually supports more than what's in the public docs (product wants to position it as a Search-only playground for now).
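
The "rely on the response" approach could look roughly like the sketch below. The PR does not show the Playground's actual response format, so the `PlaygroundResponse` type, its fields, and the `toToolText` helper are all hypothetical; the point is only that errors are passed through to the LLM instead of being pre-checked against a hardcoded stage list.

```typescript
// Hypothetical response shape: these types are assumptions made for
// illustration only and do not describe the real Playground API.
type PlaygroundResponse =
  | { ok: true; documents: Record<string, unknown>[] }
  | { ok: false; errorMessage: string };

// Turn the response into text for the LLM. Errors (for example an unsupported
// stage) are forwarded verbatim rather than validated locally beforehand.
function toToolText(response: PlaygroundResponse): string {
  if (!response.ok) {
    return `Pipeline failed: ${response.errorMessage}`;
  }
  return JSON.stringify(response.documents);
}
```

With this design the set of supported stages lives in one place, the Playground, and the tool never needs updating when that set changes.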

@vm-mishchenko vm-mishchenko force-pushed the add-run-playground-tool branch from 2f25ada to 3df5eca Compare May 15, 2025 15:35
@vm-mishchenko vm-mishchenko requested a review from edgarw19 May 15, 2025 15:53
Collaborator

@edgarw19 edgarw19 left a comment

lgtm!

@vm-mishchenko vm-mishchenko merged commit 298adf4 into search-skunkworks-2025 May 15, 2025
9 checks passed
@vm-mishchenko vm-mishchenko deleted the add-run-playground-tool branch May 15, 2025 16:34
3 participants