Commit d12ba28

Add MongoDB Vector Search Tool (crewAIInc#319)
* INTPYTHON-580 Design and Implement MongoDBVectorSearchTool
* add implementation
* wip
* wip
* finish tests
* add todo
* refactor to wrap langchain-mongodb
* cleanup
* address review
* Fix usage of EnvVar class
* inline code
* lint
* lint
* fix usage of SearchIndexModel
* Refactor: Update EnvVar import path and remove unused tests.utils module
  - Changed import of EnvVar from tests.utils to crewai.tools in multiple files.
  - Updated README.md for MongoDB vector search tool with additional context.
  - Modified subprocess command in vector_search.py for package installation.
  - Cleaned up test_generate_tool_specs.py to improve mock patching syntax.
  - Deleted unused tests/utils.py file.
* update the crewai dep and the lockfile
* chore: update package versions and dependencies in uv.lock
  - Removed `auth0-python` package.
  - Updated `crewai` version to 0.140.0 and adjusted its dependencies.
  - Changed `json-repair` version to 0.25.2.
  - Updated `litellm` version to 1.72.6.
  - Modified dependency markers for several packages to improve compatibility with Python versions.
* refactor: improve MongoDB vector search tool with enhanced error handling and new dimensions field
  - Added logging for error handling in the _run method and during client cleanup.
  - Introduced a new 'dimensions' field in the MongoDBVectorSearchConfig for embedding vector size.
  - Refactored the _run method to return JSON formatted results and handle exceptions gracefully.
  - Cleaned up import statements and improved code readability.
* address review
* update tests
* debug
* fix test
* fix test
* fix test
* support azure openai

---------

Co-authored-by: lorenzejay <lorenzejaytech@gmail.com>
1 parent 55b59ef commit d12ba28

12 files changed: +3187 -1951 lines changed

README.md

Lines changed: 1 addition & 1 deletion
@@ -25,6 +25,7 @@ CrewAI provides an extensive collection of powerful tools ready to enhance your
 - **File Management**: `FileReadTool`, `FileWriteTool`
 - **Web Scraping**: `ScrapeWebsiteTool`, `SeleniumScrapingTool`
 - **Database Integrations**: `PGSearchTool`, `MySQLSearchTool`
+- **Vector Database Integrations**: `MongoDBVectorSearchTool`, `QdrantVectorSearchTool`, `WeaviateVectorSearchTool`
 - **API Integrations**: `SerperApiTool`, `EXASearchTool`
 - **AI-powered Tools**: `DallETool`, `VisionTool`, `StagehandTool`

@@ -226,4 +227,3 @@ Join our rapidly growing community and receive real-time support:
 - [Open an Issue](https://github.com/crewAIInc/crewAI/issues)

 Build smarter, faster, and more powerful AI solutions—powered by CrewAI Tools.
-

crewai_tools/__init__.py

Lines changed: 4 additions & 2 deletions
@@ -1,5 +1,6 @@
 from .adapters.enterprise_adapter import EnterpriseActionTool
 from .adapters.mcp_adapter import MCPServerAdapter
+from .adapters.zapier_adapter import ZapierActionTool
 from .aws import (
     BedrockInvokeAgentTool,
     BedrockKBRetrieverTool,
@@ -23,9 +24,9 @@
     DirectorySearchTool,
     DOCXSearchTool,
     EXASearchTool,
+    FileCompressorTool,
     FileReadTool,
     FileWriterTool,
-    FileCompressorTool,
     FirecrawlCrawlWebsiteTool,
     FirecrawlScrapeWebsiteTool,
     FirecrawlSearchTool,
@@ -35,6 +36,8 @@
     LinkupSearchTool,
     LlamaIndexTool,
     MDXSearchTool,
+    MongoDBVectorSearchConfig,
+    MongoDBVectorSearchTool,
     MultiOnTool,
     MySQLSearchTool,
     NL2SQLTool,
@@ -76,4 +79,3 @@
     YoutubeVideoSearchTool,
     ZapierActionTools,
 )
-from .adapters.zapier_adapter import ZapierActionTool

crewai_tools/tools/__init__.py

Lines changed: 6 additions & 1 deletion
@@ -16,10 +16,10 @@
 from .exa_tools.exa_search_tool import EXASearchTool
 from .file_read_tool.file_read_tool import FileReadTool
 from .file_writer_tool.file_writer_tool import FileWriterTool
+from .files_compressor_tool.files_compressor_tool import FileCompressorTool
 from .firecrawl_crawl_website_tool.firecrawl_crawl_website_tool import (
     FirecrawlCrawlWebsiteTool,
 )
-from .files_compressor_tool.files_compressor_tool import FileCompressorTool
 from .firecrawl_scrape_website_tool.firecrawl_scrape_website_tool import (
     FirecrawlScrapeWebsiteTool,
 )
@@ -30,6 +30,11 @@
 from .linkup.linkup_search_tool import LinkupSearchTool
 from .llamaindex_tool.llamaindex_tool import LlamaIndexTool
 from .mdx_search_tool.mdx_search_tool import MDXSearchTool
+from .mongodb_vector_search_tool import (
+    MongoDBToolSchema,
+    MongoDBVectorSearchConfig,
+    MongoDBVectorSearchTool,
+)
 from .multion_tool.multion_tool import MultiOnTool
 from .mysql_search_tool.mysql_search_tool import MySQLSearchTool
 from .nl2sql.nl2sql_tool import NL2SQLTool
Lines changed: 87 additions & 0 deletions
@@ -0,0 +1,87 @@
+# MongoDBVectorSearchTool
+
+## Description
+This tool runs vector searches over documents stored in a MongoDB database. Use it to find documents that are semantically similar to a given query.
+
+MongoDB can act as a vector database that stores and queries vector embeddings. See the Atlas Vector Search documentation:
+https://www.mongodb.com/docs/atlas/atlas-vector-search/vector-search-overview/
+
+## Installation
+Install the `crewai-tools` package with MongoDB support by running one of the following commands in your terminal:
+
+```shell
+pip install crewai-tools[mongodb]
+```
+
+or
+
+```shell
+uv add crewai-tools --extra mongodb
+```
+
+## Example
+The following examples show how to use the MongoDBVectorSearchTool:
+
+```python
+from crewai_tools import MongoDBVectorSearchTool
+
+# Create a tool that runs vector searches against the given collection
+tool = MongoDBVectorSearchTool(
+    database_name="example_database",
+    collection_name="example_collections",
+    connection_string="<your_mongodb_connection_string>",
+)
+```
+
+or
+
+```python
+from crewai import Agent
+from crewai_tools import MongoDBVectorSearchConfig, MongoDBVectorSearchTool
+
+# Set up a custom embedding model and customize the parameters.
+query_config = MongoDBVectorSearchConfig(limit=10, oversampling_factor=2)
+tool = MongoDBVectorSearchTool(
+    database_name="example_database",
+    collection_name="example_collections",
+    connection_string="<your_mongodb_connection_string>",
+    query_config=query_config,
+    index_name="my_vector_index",
+    generative_model="gpt-4o-mini"
+)
+
+# Adding the tool to an agent
+rag_agent = Agent(
+    name="rag_agent",
+    role="You are a helpful assistant that can answer questions with the help of the MongoDBVectorSearchTool.",
+    goal="...",
+    backstory="...",
+    llm="gpt-4o-mini",
+    tools=[tool],
+)
+```
+
+Preloading the MongoDB database with documents:
+
+```python
+import os
+
+from crewai_tools import MongoDBVectorSearchTool
+
+# Create the tool. The documents to load are read from the local
+# "knowledge" directory further below.
+tool = MongoDBVectorSearchTool(
+    database_name="example_database",
+    collection_name="example_collections",
+    connection_string="<your_mongodb_connection_string>",
+)
+
+# Add the text from a set of CrewAI knowledge documents.
+texts = []
+for d in os.listdir("knowledge"):
+    with open(os.path.join("knowledge", d), "r") as f:
+        texts.append(f.read())
+tool.add_texts(texts)
+
+# Create the vector search index (if it wasn't already created in Atlas).
+tool.create_vector_search_index(dimensions=3072)
+```
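
Note on the query parameters: the README sets `limit=10` and `oversampling_factor=2` but does not say how they reach MongoDB. The sketch below shows one plausible mapping onto the Atlas `$vectorSearch` aggregation stage. It assumes that the number of candidates is `limit * oversampling_factor`; that relationship, the helper name `build_vector_search_pipeline`, and the default index and field names are illustrative assumptions, not code from this commit.

```python
from typing import Any, Dict, List


def build_vector_search_pipeline(
    query_vector: List[float],
    index_name: str = "vector_index",  # hypothetical default
    path: str = "embedding",  # hypothetical field holding the vectors
    limit: int = 4,
    oversampling_factor: int = 10,
) -> List[Dict[str, Any]]:
    """Build an aggregation pipeline around a $vectorSearch stage (illustrative)."""
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": query_vector,
                # Assumption: candidates considered = limit * oversampling_factor.
                "numCandidates": limit * oversampling_factor,
                "limit": limit,
            }
        },
        # Expose the similarity score next to each returned document.
        {"$set": {"score": {"$meta": "vectorSearchScore"}}},
    ]


pipeline = build_vector_search_pipeline(
    [0.1, 0.2, 0.3], index_name="my_vector_index", limit=10, oversampling_factor=2
)
print(pipeline[0]["$vectorSearch"]["numCandidates"])  # 20
```

A pipeline like this could be passed to `collection.aggregate(...)` on a PyMongo collection backed by Atlas; the tool in this commit wraps langchain-mongodb, so its real query path may differ.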
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+from .vector_search import (
+    MongoDBToolSchema,
+    MongoDBVectorSearchConfig,
+    MongoDBVectorSearchTool,
+)
+
+__all__ = [
+    "MongoDBVectorSearchConfig",
+    "MongoDBVectorSearchTool",
+    "MongoDBToolSchema",
+]
Lines changed: 120 additions & 0 deletions
@@ -0,0 +1,120 @@
+from __future__ import annotations
+
+from time import monotonic, sleep
+from typing import TYPE_CHECKING, Any, Callable, Dict, List, Optional
+
+if TYPE_CHECKING:
+    from pymongo.collection import Collection
+
+
+def _vector_search_index_definition(
+    dimensions: int,
+    path: str,
+    similarity: str,
+    filters: Optional[List[str]] = None,
+    **kwargs: Any,
+) -> Dict[str, Any]:
+    # https://www.mongodb.com/docs/atlas/atlas-vector-search/vector-search-type/
+    fields = [
+        {
+            "numDimensions": dimensions,
+            "path": path,
+            "similarity": similarity,
+            "type": "vector",
+        },
+    ]
+    if filters:
+        for field in filters:
+            fields.append({"type": "filter", "path": field})
+    definition = {"fields": fields}
+    definition.update(kwargs)
+    return definition
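
Review note: to make the shape of the index definition concrete, the sketch below copies the helper above into a standalone script and calls it once. The argument values (`"embedding"`, `"cosine"`, `"source"`) are assumptions chosen for illustration; only `dimensions=3072` comes from the README in this commit.

```python
import json
from typing import Any, Dict, List, Optional


def vector_search_index_definition(
    dimensions: int,
    path: str,
    similarity: str,
    filters: Optional[List[str]] = None,
    **kwargs: Any,
) -> Dict[str, Any]:
    """Standalone copy of the helper from the diff, for illustration only."""
    fields: List[Dict[str, Any]] = [
        {
            "numDimensions": dimensions,
            "path": path,
            "similarity": similarity,
            "type": "vector",
        },
    ]
    # Each filter path becomes its own "filter" field entry.
    for field in filters or []:
        fields.append({"type": "filter", "path": field})
    definition: Dict[str, Any] = {"fields": fields}
    definition.update(kwargs)
    return definition


# Hypothetical values: one vector field plus one filterable metadata field.
definition = vector_search_index_definition(
    dimensions=3072, path="embedding", similarity="cosine", filters=["source"]
)
print(json.dumps(definition, indent=2))
```

The result holds one `"vector"` entry followed by one `"filter"` entry per filter path. Extra keyword arguments are merged into the top level of the definition, so a `fields=` keyword would overwrite the generated list.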
+
+
+def create_vector_search_index(
+    collection: Collection,
+    index_name: str,
+    dimensions: int,
+    path: str,
+    similarity: str,
+    filters: Optional[List[str]] = None,
+    *,
+    wait_until_complete: Optional[float] = None,
+    **kwargs: Any,
+) -> None:
+    """Experimental utility function to create a vector search index.
+
+    Args:
+        collection (Collection): MongoDB Collection
+        index_name (str): Name of Index
+        dimensions (int): Number of dimensions in embedding
+        path (str): field with vector embedding
+        similarity (str): The similarity score used for the index
+        filters (List[str]): Fields/paths to index to allow filtering in $vectorSearch
+        wait_until_complete (Optional[float]): If provided, number of seconds to wait
+            until search index is ready.
+        kwargs: Keyword arguments supplying any additional options to SearchIndexModel.
+    """
+    from pymongo.operations import SearchIndexModel
+
+    if collection.name not in collection.database.list_collection_names():
+        collection.database.create_collection(collection.name)
+
+    result = collection.create_search_index(
+        SearchIndexModel(
+            definition=_vector_search_index_definition(
+                dimensions=dimensions,
+                path=path,
+                similarity=similarity,
+                filters=filters,
+                **kwargs,
+            ),
+            name=index_name,
+            type="vectorSearch",
+        )
+    )
+
+    if wait_until_complete:
+        _wait_for_predicate(
+            predicate=lambda: _is_index_ready(collection, index_name),
+            err=f"{index_name=} did not complete in {wait_until_complete}!",
+            timeout=wait_until_complete,
+        )
+
+
+def _is_index_ready(collection: Collection, index_name: str) -> bool:
+    """Check for the index name in the list of available search indexes to see if the
+    specified index is of status READY
+
+    Args:
+        collection (Collection): MongoDB Collection to check for the search indexes
+        index_name (str): Vector Search Index name
+
+    Returns:
+        bool: True if the index is present and READY, False otherwise
+    """
+    for index in collection.list_search_indexes(index_name):
+        if index["status"] == "READY":
+            return True
+    return False
+
+
+def _wait_for_predicate(
+    predicate: Callable, err: str, timeout: float = 120, interval: float = 0.5
+) -> None:
+    """Generic utility to block until the predicate returns true
+
+    Args:
+        predicate (Callable[[], bool]): A function that returns a boolean value
+        err (str): Error message to raise if the predicate is not satisfied in time
+        timeout (float, optional): Wait time for predicate. Defaults to 120.
+        interval (float, optional): Interval to check predicate. Defaults to 0.5.
+
+    Raises:
+        TimeoutError: If the predicate does not return true within `timeout` seconds
+    """
+    start = monotonic()
+    while not predicate():
+        if monotonic() - start > timeout:
+            raise TimeoutError(err)
+        sleep(interval)
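
Review note: the polling helper above can be exercised without a database. The sketch below is a standalone copy of it, driven by a hypothetical predicate (`fake_index_ready`) that stands in for an index that reports READY on the third status check. The short `timeout` and `interval` values are assumptions chosen so the example runs quickly.

```python
from time import monotonic, sleep
from typing import Callable


def wait_for_predicate(
    predicate: Callable[[], bool], err: str, timeout: float = 120, interval: float = 0.5
) -> None:
    """Standalone copy of the polling helper from the diff above."""
    start = monotonic()
    while not predicate():
        if monotonic() - start > timeout:
            raise TimeoutError(err)
        sleep(interval)


# Hypothetical stand-in for an index that becomes READY on the third check.
state = {"calls": 0}


def fake_index_ready() -> bool:
    state["calls"] += 1
    return state["calls"] >= 3


wait_for_predicate(fake_index_ready, err="index not ready", timeout=5, interval=0.01)
print(state["calls"])  # 3

# A predicate that never succeeds ends in TimeoutError instead.
timed_out = False
try:
    wait_for_predicate(lambda: False, err="gave up", timeout=0.05, interval=0.01)
except TimeoutError:
    timed_out = True
print(timed_out)  # True
```

Because the timeout is only checked after a failed predicate call, the helper always evaluates the predicate at least once, and the total wait can exceed `timeout` by up to one `interval` plus the cost of a predicate call.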
