Basic LLM Routing Tutorial
Day 1 of 4
ANSHUMAN JHA
Table of Contents
1. Introduction
2. What is Model Routing?
3. How Does Model Routing Work?
4. Why is Model Routing Important?
5. Detailed Sample Code
5.1 Set Up the Environment
5.2 Set Up API Keys
5.3 Define the Models
5.4 Implement the Router Logic
5.5 Test the Model Router
5.6 Analyze the Results
6. Conclusion
7. Example Google Colab Notebook
1. Introduction
2. What is Model Routing?
A simple query can be served by a smaller, cheaper model, while a query that requires deeper contextual understanding might be routed to a more complex model, such as GPT-3.5 or GPT-4.
3. How Does Model Routing Work?
Model Routing typically involves the following components:
1. Router: The decision-making unit that analyzes incoming requests and determines the appropriate model to handle them.
2. Models: Different LLMs or specialized models that can process requests.
3. Criteria: The parameters or rules that guide the router's decision, such as the nature of the input data, expected output, latency constraints, or resource availability.
4. Orchestrator: A system or framework that oversees the routing process, ensuring that requests are handled smoothly and efficiently.
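The four components above can be sketched as a minimal routing loop. This is an illustrative sketch only: the names (MODELS, criteria, route, handle) and the word-count rule are assumptions for demonstration, not part of any library.

```python
# Models: a registry of available model identifiers (illustrative names).
MODELS = {
    "simple": "gpt-4o-mini",
    "complex": "gpt-4o",
}

# Criteria: a rule that maps a request to a model key.
# Here: prompts over 50 words are treated as "complex" (assumed threshold).
def criteria(prompt: str) -> str:
    return "complex" if len(prompt.split()) > 50 else "simple"

# Router: picks the model for a request by applying the criteria.
def route(prompt: str) -> str:
    return MODELS[criteria(prompt)]

# Orchestrator: oversees the process, dispatching each request to its model.
def handle(prompts):
    return [(p, route(p)) for p in prompts]
```

The actual API call is deliberately left out here; sections 5.3 and 5.4 below wire the same idea to real model endpoints.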
4. Why is Model Routing
Important?
5.1 Set Up the Environment
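The installation output that originally accompanied this section shows the tutorial pinning the older `openai==0.28` client, which the `ChatCompletion` calls below depend on. In a Colab cell, prefix the command with `!`:

```shell
# Install the 0.28.x OpenAI client used throughout this tutorial.
pip install openai==0.28
```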
5.2 Set Up API Keys
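A minimal sketch of the key setup, assuming the key is stored in an environment variable named OPENAI_API_KEY (in Colab, you could also use its secrets feature). Avoid hard-coding secrets in notebooks.

```python
import os

# Read the API key from an environment variable rather than hard-coding it.
api_key = os.getenv("OPENAI_API_KEY", "")

# With openai==0.28 the key is set module-wide, e.g.:
# import openai
# openai.api_key = api_key
```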
5.3 Define the Models
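The router in section 5.4 looks models up by the keys "simple_model" and "complex_model"; a matching definition (model names taken from the tutorial's own comments, which use GPT-4o-mini and GPT-4o) might be:

```python
# Map routing roles to OpenAI model names.
models = {
    "simple_model": "gpt-4o-mini",   # cheap and fast: short prompts
    "complex_model": "gpt-4o",       # more capable: long prompts
}
```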
5.4 Implement the Router Logic
Next, we'll implement the router logic. Our simple router will check the length of the input text and decide which model to use.

def route_request(prompt):
    """
    Routes the request to the appropriate model based on the input length.

    Parameters:
        prompt (str): The input text for the model.

    Returns:
        str: The model response.
    """
    # If the prompt has more than 50 words, use GPT-4o
    if len(prompt.split()) > 50:
        model = models["complex_model"]
    else:  # Otherwise, use GPT-4o-mini
        model = models["simple_model"]
    # Both models are chat models, so use the ChatCompletion endpoint
    response = openai.ChatCompletion.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=100,
    )
    # Chat responses expose the generated text under message.content
    return response.choices[0].message.content.strip()
5.5 Test the Model Router
Let's test the router with different prompts to see how it works.

# Simple prompt (fewer than 50 words)
simple_response = route_request(simple_prompt)
print("Response from Simple Model:\n", simple_response)

# Complex prompt (more than 50 words)
complex_response = route_request(complex_prompt)
print("Response from Complex Model:\n", complex_response)

Sample output (cut off at max_tokens=100):

Response from Simple Model:
Model routing is a concept often discussed in the context of web application development, particularly in the frameworks that follow the Model-View-Controller (MVC) architectural pattern. While it might not be universally defined as a standalone term, it generally pertains to how data models and their associated routes are organized and managed within an application. Here's a breakdown of the concept, including its principles and relevance:
1. **MVC Architecture**:
- In the MVC pattern, the application is divided

Response from Complex Model:
Model routing involves the use of algorithms or rules to determine which machine
5.6 Analyze the Results
In this step, you'll see that the router directs the simpler prompt to GPT-4o-mini, and the more complex one to GPT-4o. This basic example demonstrates how routing can optimize the use of different models based on input characteristics.
6. Conclusion
In this first part of our series, we've introduced the concept of Model Routing and explored how it can be used to optimize the performance, cost, and resource management of AI systems.
We've also walked through a basic implementation in a Google Colab notebook, providing a hands-on example of how to get started with model routing.
In the next tutorial, we will take a deeper dive into the step-by-step process of setting up model routing in a more complex scenario, building upon the foundation laid today. Stay tuned!
7. Example Google Colab Notebook
Link to the Colab notebook: