LlamaIndex Prompt Engineering Tutorial (FlowGPT)
Use Cases
● Question-Answering
● Text Generation
● Summarization
● Planning
[Diagram: these use cases sit on top of LLMs plus context]
● How do we best augment LLMs with our own private data?
[Diagram: the same use cases, with LLMs drawing context from vector stores and SQL DBs]
LlamaIndex: A data framework for LLM applications
● Data Management and Query Engine for your LLM application
● Offers components across the data lifecycle: ingest, index, and query over data
Data Ingestion (LlamaHub 🦙)
● Connect your existing data sources and data formats (APIs, PDFs, docs, SQL, etc.)
Data Structures
● Store and index your data for different use cases. Integrate with different DBs (vector DB, graph DB, KV DB).
Queries
● Retrieve and query over data. Includes: QA, Summarization, Agents, and more.
Data Connectors: powered by LlamaHub 🦙
● Easily ingest any kind of data, from anywhere
○ into unified document containers
● Powered by community-driven hub
○ rapidly growing (100+ loaders and counting!)
● Growing support for multimodal documents (e.g. with inline images)
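As a concrete sketch (assuming the download_loader helper and the community SimpleWebPageReader loader from LlamaHub; the URL is just a placeholder):

from llama_index import download_loader

# Fetch a community loader from LlamaHub at runtime
SimpleWebPageReader = download_loader("SimpleWebPageReader")

# Ingest a web page into unified Document containers
documents = SimpleWebPageReader(html_to_text=True).load_data(
    ["https://example.com"]
)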
Storage Abstractions
● Index data as vectors, knowledge graphs, or keywords, on top of pluggable storage backends:
KV Stores:
● In-memory
● MongoDB
● S3
Vector Stores:
● Pinecone
● Weaviate
● Chroma
● Milvus
● Faiss
● Qdrant
● Redis
● DeepLake
● Metal
● DynamoDB
● LanceDB
● OpenSearch
● etc.
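As an illustrative sketch of swapping in one of these backends (assuming the ChromaVectorStore integration and an in-process chromadb client; the "demo" collection name is a placeholder):

import chromadb
from llama_index import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.vector_stores import ChromaVectorStore

# Create an in-process Chroma collection ("demo" is a placeholder name)
chroma_client = chromadb.EphemeralClient()
chroma_collection = chroma_client.create_collection("demo")

# Point LlamaIndex at the external vector store via a StorageContext
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Build the index; embeddings now live in Chroma instead of in memory
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)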
RAG Stack for building a QA System
[Diagram: during data ingestion/parsing, a Doc is split into Chunks and loaded into a Vector Database; at query time, chunks are retrieved from the Vector Database and passed to an LLM]
Naive RAG Stack (Ingestion)
Current State:
● Load documents into a text representation (e.g. from LlamaHub)
● Split up document(s) into even chunks (by sentences, or by tokens)
● Load the chunks into a vector database
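A minimal sketch of this ingestion path (assuming the default in-memory vector store, a local "data" directory, and an arbitrary chunk size of 512 tokens):

from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex

# Load documents into a text representation
documents = SimpleDirectoryReader("data").load_data()

# Control how documents are split into even chunks (512 tokens is arbitrary)
service_context = ServiceContext.from_defaults(chunk_size=512)

# Chunk, embed, and load into the (default in-memory) vector store
index = VectorStoreIndex.from_documents(documents, service_context=service_context)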
Naive RAG Stack (Querying)
Current State:
● Find the top-k most similar chunks from the vector database collection
● Plug them into an LLM response synthesis module
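Continuing the sketch above (the question string and similarity_top_k=2 are illustrative choices):

# Retrieve the top-k most similar chunks, then synthesize a response with the LLM
query_engine = index.as_query_engine(similarity_top_k=2)
response = query_engine.query("What did the author work on before college?")
print(response)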
Response Synthesis
● Create and refine: go through each retrieved chunk sequentially, refining the answer at each step
● Tree summarize: recursively summarize chunks bottom-up into a final answer
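Both strategies are exposed through the response_mode parameter of a query engine (a minimal sketch):

# Sequentially refine an answer across the retrieved chunks
refine_engine = index.as_query_engine(response_mode="refine")

# Recursively summarize the retrieved chunks bottom-up
tree_engine = index.as_query_engine(response_mode="tree_summarize")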
Let’s Build LLM Response Synthesis!
https://colab.research.google.com/drive/15Qk6cXCj8U5RcvdykGSRdWemn497Dqcv?usp=sharing
Challenge with RAG Stack
● Top-k retrieval can be limiting; it works mostly for questions about specific facts
● What if we wanted to ask summarization questions?
Summary Index: Returns All Context
from llama_index import SummaryIndex, SimpleDirectoryReader

# Load documents and build a summary index over all of them
documents = SimpleDirectoryReader('data').load_data()
index = SummaryIndex.from_documents(documents)

# A summary index returns all context; tree_summarize folds it into one answer
query_engine = index.as_query_engine(response_mode="tree_summarize")
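Running a summarization query might then look like this (the question is illustrative; the answer below suggests the demo corpus is Paul Graham's essay):

response = query_engine.query("Summarize the author's life and career.")
print(response)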
Answer
● The author began writing and programming before college, and studied philosophy in college before switching to AI.
● He realized that AI, as practiced at the time, was a hoax and decided to focus on Lisp hacking instead.
● He wrote a book about Lisp hacking and graduated with a PhD in computer science.
● ….
Building a Unified Query Interface