updating title on responses file search cookbook #2004

Merged 1 commit on Aug 6, 2025
14 changes: 7 additions & 7 deletions examples/File_Search_Responses.ipynb
@@ -13,7 +13,7 @@
"\n",
"_File search was previously available on the Assistants API. It's now available on the new Responses API, an API that can be stateful or stateless, with new features like metadata filtering_\n",
"\n",
-"### Set up"
+"# Creating Vector Store with our PDFs"
]
},
{
@@ -52,8 +52,6 @@
"id": "43e5cb9c-fc99-45e2-bd79-9c9ba5b410cc",
"metadata": {},
"source": [
-"### Creating Vector Store with our PDFs\n",
-"\n",
"We will create a Vector Store on the OpenAI API and upload our PDFs to it. OpenAI will read those PDFs, split the content into multiple chunks of text, run embeddings on those chunks, and store the embeddings and the text in the Vector Store. This will let us query the Vector Store to return relevant content for a given query."
]
},
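The upload step described above can be sketched roughly as follows. This is a minimal sketch assuming the official `openai` Python SDK and its `vector_stores` endpoints; the store name and the `pdfs/` directory are illustrative, not from the notebook.

```python
# A minimal sketch of creating a Vector Store and uploading local PDFs,
# assuming the official `openai` Python SDK; names and paths are illustrative.
from pathlib import Path

def find_pdfs(pdf_dir: str) -> list:
    """Collect the PDF files to upload, in a stable order."""
    return sorted(Path(pdf_dir).glob("*.pdf"))

def create_store_with_pdfs(name: str, pdf_dir: str) -> str:
    """Create a Vector Store and upload every local PDF to it."""
    from openai import OpenAI  # reads OPENAI_API_KEY from the environment
    client = OpenAI()
    store = client.vector_stores.create(name=name)
    for pdf_path in find_pdfs(pdf_dir):
        with open(pdf_path, "rb") as f:
            # upload_and_poll waits until the file is chunked and embedded
            client.vector_stores.files.upload_and_poll(
                vector_store_id=store.id, file=f
            )
    return store.id

# store_id = create_store_with_pdfs("openai_blog_store", "pdfs/")
```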
@@ -157,7 +155,7 @@
"id": "e5f4ade3-2b3e-4df6-a441-c1ee3ea73172",
"metadata": {},
"source": [
-"### Standalone vector search\n",
+"# Standalone vector search\n",
"\n",
"Now that our Vector Store is ready, we are able to query it directly and retrieve relevant content for a specific query. Using the new [vector search API](https://platform.openai.com/docs/api-reference/vector-stores/search), we're able to find relevant items from our knowledge base without necessarily integrating it into an LLM query."
]
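A direct query against the store might look like the sketch below, assuming the SDK's `vector_stores.search` endpoint and its `max_num_results` parameter; the helper that picks the best hit is our own addition for illustration.

```python
# Sketch of querying the Vector Store directly (no LLM involved), assuming
# the `vector_stores.search` endpoint of the official `openai` Python SDK.

def search_store(store_id: str, query: str, max_results: int = 5):
    from openai import OpenAI
    client = OpenAI()
    results = client.vector_stores.search(
        vector_store_id=store_id,
        query=query,
        max_num_results=max_results,
    )
    # Each hit carries its source filename and a hybrid-search relevance score
    return [(r.filename, r.score) for r in results.data]

def top_result(scored) -> str:
    """Pick the filename with the highest relevance score."""
    return max(scored, key=lambda pair: pair[1])[0]
```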
@@ -211,7 +209,7 @@
"source": [
"We can see that chunks of different sizes (and, under the hood, different texts) have been returned from the search query. They each have a different relevancy score, calculated by our ranker, which uses hybrid search.\n",
"\n",
-"### Integrating search results with LLM in a single API call\n",
+"# Integrating search results with LLM in a single API call\n",
"\n",
"However, instead of querying the Vector Store and then passing the data into a Responses or Chat Completions API call, an even more convenient way to use these search results in an LLM query is to use the file_search tool as part of the OpenAI Responses API."
]
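A single retrieve-and-answer call can be sketched as follows, assuming the Responses API's file_search tool; the model name and question are placeholders.

```python
# Sketch of one Responses API call that retrieves from the Vector Store and
# answers in the same step; model and wording are illustrative assumptions.

def file_search_tool(store_ids) -> dict:
    """Build the file_search tool definition for a Responses API call."""
    return {"type": "file_search", "vector_store_ids": list(store_ids)}

def ask_with_file_search(store_id: str, question: str) -> str:
    from openai import OpenAI
    client = OpenAI()
    response = client.responses.create(
        model="gpt-4o-mini",
        input=question,
        tools=[file_search_tool([store_id])],
    )
    return response.output_text
```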
@@ -273,7 +271,7 @@
"source": [
"We can see that `gpt-4o-mini` was able to answer a query that required more recent, specialised knowledge about OpenAI's Deep Research. It used content from the file `Introducing deep research _ OpenAI.pdf`, which had the most relevant chunks of text. If we want to go even deeper in the analysis of the chunks retrieved, we can also analyse the different texts returned by the search engine by adding `include=[\"output[*].file_search_call.search_results\"]` to our query.\n",
"\n",
-"## Evaluating performance\n",
+"# Evaluating performance\n",
"\n",
"What is key for these information retrieval systems is to also measure the relevance & quality of the files retrieved. The following steps of this cookbook will consist of generating an evaluation dataset and calculating different metrics over it. This is an imperfect approach and we always recommend having a human-verified evaluation dataset for your own use cases, but it will show you the methodology for evaluating these systems. It will be imperfect because some of the generated questions might be generic (e.g: What's said by the main stakeholder in this document) and our retrieval test will have a hard time figuring out which document that question was generated for."
]
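One simple metric for this setup is recall@k: does the document a question was generated from appear among the top-k files retrieved for that question? This is a sketch under the assumption that evaluation rows are `(expected_file, retrieved_files)` pairs; the notebook's own metric may differ.

```python
# Sketch of a retrieval metric for the evaluation described above: recall@k
# over rows pairing a question's source document with the files retrieved.

def recall_at_k(expected_file: str, retrieved_files, k: int = 5) -> float:
    """1.0 if the question's source document appears in the top-k results."""
    return 1.0 if expected_file in list(retrieved_files)[:k] else 0.0

def mean_recall_at_k(rows, k: int = 5) -> float:
    """Average recall@k over (expected_file, retrieved_files) pairs."""
    return sum(recall_at_k(e, r, k) for e, r in rows) / len(rows)
```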
@@ -283,7 +281,7 @@
"id": "93291578-d04a-4e71-8ecb-9f0f647e68c3",
"metadata": {},
"source": [
-"### Generating questions\n",
+"## Generating evaluations\n",
"\n",
"We will create functions that read through the PDFs we have locally and generate, for each document, a question that can only be answered by that document. This will create the evaluation dataset that we can use afterwards."
]
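The question-generation step can be sketched as below. The prompt wording and the 4000-character truncation are assumptions, not the notebook's exact values.

```python
# Sketch of generating one evaluation question per document; the prompt
# wording and the 4000-character truncation are illustrative assumptions.

def make_question_prompt(pdf_text: str, max_chars: int = 4000) -> str:
    """Prompt asking for a question answerable only from this document."""
    return (
        "Write one question that can only be answered using the document "
        "below.\n\n" + pdf_text[:max_chars]
    )

def build_eval_dataset(docs: dict) -> dict:
    """Map each filename to a generated question: {filename: question}."""
    from openai import OpenAI
    client = OpenAI()
    questions = {}
    for filename, text in docs.items():
        response = client.responses.create(
            model="gpt-4o-mini", input=make_question_prompt(text)
        )
        questions[filename] = response.output_text
    return questions
```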
@@ -431,6 +429,8 @@
"id": "dbda554b-c3d4-4b07-9028-b41670c2fa20",
"metadata": {},
"source": [
+"## Evaluating\n",
+"\n",
"We'll convert our dictionary into a dataframe and process it using gpt-4o-mini. We will look for the expected file "
]
},