
Commit ce7178c

Tilde Thurium, TuanaCelik, and bilgeyucel authored
Add tutorial for file preprocessing index pipeline (deepset-ai#274)
* add file type preprocessing tutorial
* update index.toml
* update workflows and README
* Update tutorials/30_File_Type_Preprocessing_Index_Pipeline.ipynb
  Co-authored-by: Tuana Çelik <tuana.celik@deepset.ai>
* Update index.toml
  Co-authored-by: Bilge Yücel <bilgeyucel96@gmail.com>
* Update index.toml
  Co-authored-by: Bilge Yücel <bilgeyucel96@gmail.com>
* Update README.md
  Co-authored-by: Bilge Yücel <bilgeyucel96@gmail.com>
* address feedback
* remove commented out code and add telemetry text
* adding a mini explanation on DocumentJoiner
* moving pipeline drawing up

---------

Co-authored-by: Tuana Çelik <tuana.celik@deepset.ai>
Co-authored-by: Bilge Yücel <bilgeyucel96@gmail.com>
1 parent 6f4c8dc commit ce7178c


5 files changed: +5139 -2 lines changed

.github/workflows/run_tutorials.yml

Lines changed: 1 addition & 0 deletions
@@ -70,6 +70,7 @@ jobs:
 "tutorials/27_First_RAG_Pipeline.ipynb"
 "tutorials/28_Structured_Output_With_Loop.ipynb"
 "tutorials/29_Serializing_Pipelines.ipynb"
+"tutorials/30_File_Type_Preprocessing_Index_Pipeline.ipynb"
 "tutorials/31_Metadata_Filtering.ipynb"
 )
 for changed_file in ${{ steps.files.outputs.all_changed_files }}; do
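
The hunk above registers the new notebook in the list of tutorial paths that the workflow loops over together with the changed files. The rest of the loop body is not shown in this diff, so the following is only a minimal Python sketch of one plausible filter, assuming the workflow keeps the changed files that appear in the list. The function name `tutorials_to_run` and the sample `changed` input are hypothetical, and the real workflow is written in shell.

```python
# Hypothetical sketch: select the changed files that are registered tutorials.
# The list mirrors the entries visible in the hunk; the real list is longer.
TUTORIALS = [
    "tutorials/27_First_RAG_Pipeline.ipynb",
    "tutorials/28_Structured_Output_With_Loop.ipynb",
    "tutorials/29_Serializing_Pipelines.ipynb",
    "tutorials/30_File_Type_Preprocessing_Index_Pipeline.ipynb",
    "tutorials/31_Metadata_Filtering.ipynb",
]


def tutorials_to_run(changed_files):
    """Return the changed files that are registered tutorials, in input order."""
    registered = set(TUTORIALS)
    return [path for path in changed_files if path in registered]


# Assumed example input: one unrelated file and the notebook added by this commit.
changed = [
    "README.md",
    "tutorials/30_File_Type_Preprocessing_Index_Pipeline.ipynb",
]
print(tutorials_to_run(changed))
```

Under this assumption, a notebook that is missing from the list would never be selected, which is why the one-line addition matters.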

README.md

Lines changed: 1 addition & 0 deletions
@@ -47,5 +47,6 @@ To contribute to the tutorials, please check out our [Contributing Guidelines](.
 | [Your First QA Pipeline with Retrieval-Augmentation](./tutorials/27_First_RAG_Pipeline.ipynb) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/27_First_RAG_Pipeline.ipynb) |
 | [Generating Structured Output with Loop-Based Auto-Correction](./tutorials/28_Structured_Output_With_Loop.ipynb) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/28_Structured_Output_With_Loop.ipynb) |
 | [Serializing Pipelines](./tutorials/29_Serializing_Pipelines.ipynb) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/29_Serializing_Pipelines.ipynb) |
+| [Preprocessing Different File Types](./tutorials/30_File_Type_Preprocessing_Index_Pipeline.ipynb) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/30_File_Type_Preprocessing_Index_Pipeline.ipynb) |
 | [Metadata Filtering](./tutorials/31_Metadata_Filtering.ipynb) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/31_Metadata_Filtering.ipynb) |

index.toml

Lines changed: 13 additions & 2 deletions
@@ -298,13 +298,24 @@ completion_time = "10 min"
 created_at = 2024-01-29
 haystack_2 = true

+[[tutorial]]
+title = "Preprocessing Different File Types"
+description = "Learn how to build an indexing pipeline that will preprocess files based on their file type"
+level = "beginner"
+weight = 72
+notebook = "30_File_Type_Preprocessing_Index_Pipeline.ipynb"
+aliases = []
+completion_time = "10 min"
+created_at = 2024-01-30
+haystack_2 = true
+
 [[tutorial]]
 title = "Filtering Documents with Metadata"
 description = "Learn how to filter down to specific documents at retrieval time using metadata"
 level = "beginner"
 weight = 6
 notebook = "31_Metadata_Filtering.ipynb"
 aliases = []
-completion_time = "10 min"
+completion_time = "15 min"
 created_at = 2024-01-30
-haystack_2 = true
+haystack_2 = true

tutorials/29_Serializing_Pipelines.ipynb

Lines changed: 14 additions & 0 deletions
@@ -396,6 +396,20 @@
 "source": [
 "new_pipeline.run(data={\"builder\": {\"sentence\": \"I love capybaras\"}})"
 ]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"## What's next\n",
+"\n",
+"🎉 Congratulations! You've serialized a pipeline into YAML, edited it, and run it again!\n",
+"\n",
+"If you liked this tutorial, you may also enjoy:\n",
+"- [Creating Your First QA Pipeline with Retrieval-Augmentation](https://haystack.deepset.ai/tutorials/27_first_rag_pipeline)\n",
+"\n",
+"To stay up to date on the latest Haystack developments, you can [sign up for our newsletter](https://landing.deepset.ai/haystack-community-updates?utm_campaign=developer-relations&utm_source=tutorial&utm_medium=serialization). Thanks for reading!"
+]
 }
 ],
 "metadata": {
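
The added lines form one markdown cell in the notebook's JSON, whose `source` field is a list of strings that each end in `\n` except the last. A minimal sketch of how such a cell maps to rendered text, using only the standard-library `json` module. The `cell` dict is abbreviated from the hunk above, and the helper name `cell_text` is hypothetical.

```python
import json

# Abbreviated copy of the markdown cell added in the hunk above.
cell = {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
        "## What's next\n",
        "\n",
        "If you liked this tutorial, you may also enjoy:\n",
        "- [Creating Your First QA Pipeline with Retrieval-Augmentation](https://haystack.deepset.ai/tutorials/27_first_rag_pipeline)\n",
    ],
}


def cell_text(cell):
    """Join the per-line source strings into the text the cell contains."""
    return "".join(cell["source"])


# Serializing and reloading the cell leaves its structure unchanged.
round_tripped = json.loads(json.dumps(cell))
print(cell_text(round_tripped))
```

Because each line is stored as a separate string, a one-word change in the cell shows up in a diff as a single changed line rather than a rewrite of the whole cell.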
