Agentic AI Cloud - Investor Summary_vDraft (2) (2)

haimaker
Investment Summary
December 2024
Copyright © 2024 SambaNova Systems

Confidential & Proprietary | Internal Use Only
The total AI market size could reach $990 billion
within the next 2 years, driven by Agentic AI
2024 AI Market (“Single shot AI”): $70-90Bn TAM
2027 AI Market (“Agentic AI”): $780-$990Bn TAM
Growth: 40%-55% CAGR
Note: TAM represents AI Infrastructure including compute, inference services, tooling and infrastructure software, networking & storage.
Sources: IDC, Gartner, Morgan Stanley, Bain & Company

Trends in AI Cloud Infrastructure
● Today’s AI infrastructure can’t support the performance needs for the future of AI
● Agentic AI requires fast inference and concurrent support of larger and larger foundational models tuned with private data
● Compute must be optimized for lower provisioning requirements and energy consumption

What is haimaker
haimaker is the first Agentic AI Cloud purpose-built to support multi-model, agent-based AI, delivering high-speed inference built on top of SambaNova (1), the most efficient compute hardware in the world
1) haimaker is a spin-out of SambaNova Systems, with SambaNova serving as the anchor investor.

There Is Not a Good Agentic Cloud Solution on the Market Today
Current AI Clouds are Not Built for Agentic AI
● Inference services are single-model, single-shot
● Sub-par performance for multi-model agentic flows
● Focus is on massive infrastructure investment, not AI workflows
Agentic AI Cloud Requirements
● Ultra-low latency in multi-model applications
● Run 1000s of fine-tuned models and hotswap in milliseconds
● Efficient compute, measured in tokens per second per watt

Agentic AI will Drive an Exponential Increase in
Inference Demand
Low Compute, High Latency Acceptable → ~100x More Compute, Low Latency Required
Legend: unique model; unique step; data source (e.g., vector db)
● Simple Chatbot: Single-shot workflow with a single model (e.g. ChatGPT). Tokens generated per query: 1X. Models used per query: 1. Data per query: 1X.
● Knowledge Assistant: Chatbot with RAG, requiring embeddings and much more data. Tokens generated per query: 1X. Models used per query: 12. Data per query: 5X.
● Chain of Thought: Many intermediate steps from a single model to produce a higher quality response. Tokens generated per query: 10X. Models used per query: 1. Data per query: 10X.
● Agentic System: Many models, steps and huge amounts of data to enable autonomous agents. Tokens generated per query: 100X. Models used per query: ~5-10. Data per query: 1000X.
1. Sources: IDC, Gartner, Morgan Stanley, Bain & Company, SambaNova Systems.
These Systems Require Performance Optimized for
Specific Parameters
Agentic AI Category | Performance Metrics | Use Cases
Conversational AI | Time to First Token | Customer Support; AI Avatars; Human-like interactivity
Decision Support Systems | Total Latency | Autonomous Systems; Algorithmic Trading; Emergency Response
Knowledge Generation | Token Throughput Speed | Drug Discovery; Research; Complex Simulations
Maturity axis: from "Mostly in dev, nearing prod" (Conversational AI) to "AI Frontier" (Knowledge Generation)
Complexity axis: from "Focused data & simpler logic" to "Broader data & complex reasoning"
The Agentic AI Performance Gap
Metric | Typical NVIDIA H100 Speeds (1) | Agentic AI Requirement
Time to First Token | ~2-4 s (seconds) | < 400 ms (milliseconds)
Total Latency | ~0.5+ s | < 200 ms
Throughput Speed (tokens per second) | ~75 tps | 500+ tps
5-10x Optimization Gap
1. H100 speeds based on average serving profiles on the LLAMA 3.2 70b.
2. Data derived from averages of publicly available inference services, including Together, Fireworks, DeepInfra, Lambda, NovitaAI, Hyperbolic, Lepton.
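The 5-10x gap can be reproduced from the slide's own figures. The following is a minimal Python sketch, assuming the ranges above are taken at face value and that "~0.5+ s" is read at its lower bound; the dictionary and function names are illustrative only.

```python
# Figures copied from the slide; names are illustrative only.
# Latencies are in milliseconds, throughput in tokens per second.
H100_TYPICAL = {
    "time_to_first_token_ms": (2000, 4000),  # ~2-4 s
    "total_latency_ms": (500, 500),          # ~0.5+ s, read at its lower bound (assumption)
    "throughput_tps": (75, 75),              # ~75 tps
}
AGENTIC_TARGET = {
    "time_to_first_token_ms": 400,
    "total_latency_ms": 200,
    "throughput_tps": 500,
}
HIGHER_IS_BETTER = {"throughput_tps"}


def gap(metric: str) -> tuple[float, float]:
    """Return the (min, max) improvement factor needed to reach the target."""
    low, high = H100_TYPICAL[metric]
    target = AGENTIC_TARGET[metric]
    if metric in HIGHER_IS_BETTER:
        return target / high, target / low
    return low / target, high / target


if __name__ == "__main__":
    for name in H100_TYPICAL:
        lo, hi = gap(name)
        print(f"{name}: {lo:.1f}x to {hi:.1f}x")
```

On these inputs, time to first token needs a 5x to 10x improvement, which matches the headline gap; total latency and throughput come out at roughly 2.5x and 6.7x.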

Our Partnership With SambaNova Provides Us With the Fastest Speeds on the Largest Models
Llama 3.1 405B Inference Speeds: 10X Faster Using Only 16 chips and 10kW
Chart labels: NVIDIA HW; Do not support 405b
Model | Tokens/Second/User | SambaNova Chips | Power
Llama 3.1 405B 16-bit | 200 | 16 | 10kW
Llama 3.1 70B 16-bit | 580 | 16 | 10kW
Llama 3.1 8B 16-bit | 1115 | 16 | 10kW
Llama 3.2 3B 16-bit | 1500 | 16 | 10kW
Llama 3.2 1B 16-bit | 2500 | 16 | 10kW

New players in the market attempt to solve for speed but offer solutions that cannot scale and require new DCs to be built to support them
Hardware Configuration to run 70b
● 1 rack, 10 KW total: 100s of models
● 238 KW total: Single Model Capabilities Only
● 92 KW total: Single Model Capabilities Only

Enter haimaker: The Agentic AI Cloud
haimaker provides a purpose-built platform for the agentic AI revolution.
● High-Speed Inference: High-speed, low-latency performance at scale with HW optimized for model swapping and fine-tuned data.
● SambaNova Partnership: Spin-out from SambaNova provides access to the most energy-efficient AI compute HW at wholesale prices.
● Federated Infrastructure: A global network of existing data centers contributing capacity without requiring new builds.
● World-Class Orchestration: Combining high-performance computing with cloud-native efficiency.
● Sovereign AI: Pipeline of sovereign customers seeking a managed cloud solution TODAY!

A Unified Vision for Agentic AI
We are partnering with Lepton AI to provide the only end-to-end cloud solution for Agentic AI that is ready to enter the market today
Stack layers
● API Services
● Lepton for AI Cloud Architecture
● Federated by haimaker: haimaker Federated Data Centers, powered by SambaNova’s World’s Most Efficient Compute
Key points
● haimaker's compute backbone, federated globally
● Integrated compute to runtime, ready for agentic flows
● Lepton's orchestration on haimaker's distributed compute
● High performance, low latency, and cost efficiency at scale
● Specifically designed for multi-model, low-latency workflows
● Production ready, secure, and compliant

Business Model
Lines of Business
● Hardware sales: Sell SambaNova AI Compute HW to data centers where a portion of all HW sold gets contributed to our managed cloud for a revenue share.
● Agentic AI Cloud: Provide managed cloud inference service, in a federated model, enabling data centers to quickly enter the AI market and optimize per-rack profitability.
Revenue Streams
● Rack Sales: $350k margin per node/rack sold to data centers, with a target of 200 racks sold in year 1.
● Inference: 10% revenue share on inference services from data center-contributed capacity.
● SaaS: $25k SaaS revenue per rack per year for Lepton AI orchestration platform.
We have line of sight into more than $70M of contributed margin from hardware sales in year 1
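The year-1 hardware figure follows from the stated unit economics: 200 racks at $350k margin each comes to $70M. Below is a rough Python sketch of the three revenue streams. The gross inference revenue input and the assumption that every sold rack runs the orchestration platform are hypothetical placeholders, not figures from this summary.

```python
from dataclasses import dataclass


@dataclass
class YearOneInputs:
    racks_sold: int = 200                 # year-1 target from the slide
    margin_per_rack: int = 350_000        # USD, from the slide
    saas_per_rack_per_year: int = 25_000  # USD, from the slide
    inference_rev_share_pct: int = 10     # percent, from the slide
    # Hypothetical: gross inference revenue on contributed capacity (USD).
    contributed_inference_revenue: int = 0
    # Hypothetical: number of sold racks running the orchestration platform.
    racks_on_saas: int = 200


def hardware_margin(x: YearOneInputs) -> int:
    """Contributed margin from rack sales."""
    return x.racks_sold * x.margin_per_rack


def saas_revenue(x: YearOneInputs) -> int:
    """Annual SaaS revenue from racks on the orchestration platform."""
    return x.racks_on_saas * x.saas_per_rack_per_year


def inference_share(x: YearOneInputs) -> int:
    """Revenue share on inference served from contributed capacity."""
    return x.contributed_inference_revenue * x.inference_rev_share_pct // 100


if __name__ == "__main__":
    base = YearOneInputs()
    print(f"Hardware margin: ${hardware_margin(base):,}")
    print(f"SaaS revenue:    ${saas_revenue(base):,}")
    print(f"Inference share: ${inference_share(base):,}")
```

With the defaults, hardware margin is $70,000,000 and SaaS revenue is $5,000,000; the inference line stays at zero until a revenue assumption is supplied.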

We have a pipeline of deals ready to close and adopt a managed cloud solution
The leadership team at haimaker is the same team that is driving hardware sales at SambaNova
Operating Model with SambaNova
● NewCo will acquire SambaNova racks at a fixed price.
● NewCo will contractually provide a guaranteed minimum capacity for SambaNova and adhere to SLAs that require fulfilling any additional capacity requests through a purchase mechanism.
● NewCo will launch with SambaNova's existing technology, as is, to offer a white-label managed cloud. Funding will be used for hiring and optimization of the technology for the Agentic use case.

Partnership with Lepton

Graveyard
Enter haimaker: The New Agentic AI Cloud
● Performance Tokens: Inference services catered to agentic flows powered by the fastest, most efficient hardware
● SambaNova: Spin-out from SambaNova provides access to compute at below market rates
● Lepton AI Partnership: World-class orchestration technology to optimize the federated model across data centers and geos
An inference cloud purpose-built for Agentic AI, powered by SambaNova, the world’s most capable AI hardware

haimaker: The Agentic AI Cloud
haimaker provides a purpose-built platform for the agentic AI revolution.
● High-Speed Inference: High-speed, low-latency performance at scale with HW optimized for model swapping and fine-tuned data.
● SambaNova Partnership: Access to the most energy-efficient AI compute HW at wholesale prices.
● World-Class Orchestration: Combining high-performance computing with cloud-native efficiency.
● Federated Infrastructure: A global network of existing data centers without requiring new builds.
● Sovereign AI: Pipeline of sovereign customers seeking a managed cloud solution TODAY!

The Agentic AI Opportunity
“Single shot AI”: $70-90Bn TAM → “Agentic AI”: $780-$990 Bn TAM (40%-55% CAGR)
● Agentic AI Revolution: Agentic AI applications will require 100-1000x more inference capacity.
● Multi-Model Workflows: Agentic flows rely on multiple models with multiple steps, increasing the need for compute.
● Performance Demands: Low latency and high throughput are critical for real-time agentic applications.
● Infrastructure Limitations: Current cloud infrastructure is not designed for agentic AI's unique demands.
● Demand for Scalability: Businesses need a platform that can scale to meet the exponential growth of agentic AI.

Agentic AI will result in an exponential increase in inference demand, driving revenue for On Demand Cloud models that can meet the performance requirements
Low Compute, High Latency Acceptable → ~100x More Compute, Low Latency Required
Legend: unique model; unique step; data source (e.g., vector db)
● Simple Chatbot: Single-shot workflow with a single model (e.g. ChatGPT). Tokens generated per query: 1X. Models used per query: 1. Data per query: 1X.
● Knowledge Assistant: Chatbot with RAG, requiring embeddings and much more data. Tokens generated per query: 1X. Models used per query: 12. Data per query: 5X.
● Chain of Thought: Many intermediate steps from a single model to produce a higher quality response. Tokens generated per query: 10X. Models used per query: 1. Data per query: 10X.
● Agentic System: Many models, steps and huge amounts of data to enable autonomous agents. Tokens generated per query: 100X. Models used per query: ~5-10. Data per query: 1000X.
1. Sources: IDC, Gartner, Morgan Stanley, Bain & Company, SambaNova Systems.

The Agentic AI Performance Imperative
Agentic AI Category | Performance Metrics | Use Cases
Conversational AI | Time to First Token | Customer Support; AI Avatars; Human-like interactivity
Decision Support Systems | Total Latency | Autonomous Systems; Algorithmic Trading; Emergency Response
Knowledge Generation | Token Throughput Speed | Drug Discovery; Research; Complex Simulations
Maturity axis: from "Mostly in dev, nearing prod" (Conversational AI) to "AI Frontier" (Knowledge Generation)
Complexity axis: from "Focused data & simpler logic" to "Broader data & complex reasoning"

The Current Infrastructure Bottleneck
Metric | Typical NVIDIA H100 Speeds (1) | Agentic AI Requirement
Time to First Token | ~2-4 s | < 400 ms
Total Latency | ~0.5+ s | < 200 ms
Throughput Speed (tokens per second) | ~75 tps | 500+ tps
5-10x Optimization Gap
1. H100 speeds based on average serving profiles on the LLAMA 3.2 70b.
2. Data derived from averages of publicly available inference services, including Together, Fireworks, DeepInfra, Lambda, NovitaAI, Hyperbolic, Lepton.

New players in the market attempt to solve for speed but offer solutions that cannot scale and require new DCs to be built to support them
Hardware Configuration to run 70b
● 1 rack, 10 KW total: 100s of models
● 238 KW total: Single Model Capabilities Only
● 92 KW total: Single Model Capabilities Only

The total AI market size could reach $780 billion to $990 billion within the next 2 years, driven by Agentic AI
“Single shot AI”: $70-90Bn TAM
“Agentic AI”: $780-$990 Bn TAM
Growth: 40%-55% CAGR

Enter Agentic AI Cloud
● On Demand Compute: Inference services catered to agentic flows powered by the fastest, most efficient hardware
● Federated Model: Global install base comprised of contributed capacity from existing data centers (no new build)
● Managed Cloud Services: World-class orchestration technology to optimize the federated model across data centers and geos
An inference cloud purpose-built for Agentic AI, powered by SambaNova, the world’s most capable AI hardware

Agentic AI Cloud: The Only Way Forward
Compared: Agentic AI Cloud | Nvidia | Groq | Cerebras
● Fastest Tokens: Nvidia’s architecture makes fast inference impossible. This is where SambaNova, Groq and Cerebras thrive. Fast inference is fundamental for Agentic AI.
● Efficient Tokens: Groq and Cerebras require tens of racks to get this speed from one model.
● Models per System: Since Agentic AI requires lots of models, Groq and Cerebras become unviable, as more models means more racks. Nvidia, too, quickly runs out of memory.
● Data per System: Agentic AI also requires huge amounts of data from external sources and from data reused from previous steps. Groq and Cerebras run out of memory.

Agentic AI Cloud: We Start Where Others Fail
● We unlock vacant or older data centers without the need to retrofit or build out.
● We manage SambaNova’s Cloud and cater to the Agentic AI market.
● Complete E2E solution with turnkey managed services available for sovereign AI clouds.

Business Model
Objective: Establish an Agentic AI Cloud as a sales acceleration and hardware distribution platform for SambaNova, akin to CoreWeave's relationship with NVIDIA, via a white-label managed cloud.
Lines of Business
● HW Sales via Revenue Share: Sell SambaNova AI Compute HW to data centers where a portion of all HW sold gets contributed to the AI Agentic Cloud inference platform.
● Agentic AI Cloud: Provide managed cloud inference service, in a federated model, enabling data centers to quickly enter the AI market and optimize per-rack profitability.
Revenue Streams
● Rack Sales: $350k margin per node/rack sold to data centers, with a target of 200 racks sold in year 1.
● Inference: 10% revenue share on inference services from data center-contributed capacity.
● SaaS: $25k SaaS revenue per rack per year for orchestration platform.
We will launch the business by purchasing 85 racks in Year 1 from SambaNova.

Proposed Structure & Investment Thesis
Lines of Business
● Spin out NewCo as a managed AI Agentic Cloud and SambaNova’s distribution arm.
● SambaNova provides a $20M anchor investment within a ~$120M raise (option to increase).
● Funds to be used to purchase 85 racks from SambaNova during year 1 to launch the Agentic Cloud.
Investment Thesis
● Time to revenue is immediate via HW sales.
● Proven sales pipeline executed by the same management team.
○ Line of sight into 200 racks year 1.
● Wholesale agreements with SambaNova provide competitive advantage.
● Limited new software development drives time to market.
● Break-even and FCF positive within 2 years.
Operating Model with SambaNova
● NewCo will acquire SambaNova racks at a fixed price.
● NewCo will contractually provide a guaranteed minimum capacity for SambaNova and adhere to SLAs that require fulfilling any additional capacity requests through a purchase mechanism.
● NewCo will launch with SambaNova's existing technology, as is, to offer a white-label managed cloud. Funding will be used for hiring and optimization of the technology for the Agentic use case.

Financial Model ($000s)
Scenario: $120M funding with 85 racks purchased during year 1
Rack Sales Drive Immediate Profits, FCF Positive in Year 2

Sources & Uses ($000s)
Scenario 1: Sources and Uses (85 racks)
Notes:
An itemized breakdown of all uses of funds is currently under development.

Graveyard
Enter Agentic AI Cloud Powered by SambaNova
An Agentic AI requires a system that: | Agentic AI Cloud | SambaNova
Can run fast inference at efficient TPS per MW | On Demand Compute | 10 kW per rack
Can run many models at once and hotswap in milliseconds | Model Chaining & Orchestration | Host 100s of models per rack
And can cache huge amounts of data | SLAs on TTFT, Latency and Throughput | Large Context Capabilities
An inference cloud purpose-built for Agentic AI, powered by the world’s most capable AI hardware

Financing structure (diagram)
● Parties: SambaNova; Asset Lender + Equity Investors; Data Center; SPV; Agentic Cloud (“Newco”)
● SambaNova: $120M cash in exchange for delivery of 100 racks
● Funding: 80/20 debt/equity from the asset lender and equity investors
● Payments: principal, interest, colocation fees, and distributions
● Data center: colo services
● Newco: exclusive right to operate the 100 racks in the DC
● Revenue share: 100% to recovery, 50/50 after, over a 4 year term
● Recovery is the sum of: Capex + Interest + Colo

More info on AI Cloud: We talked a lot about the hardware; now let’s talk more about the cloud here before we get into the business model. This circles back to Slide 5 and makes it come to life. Pitch this slide at the CSP/DC level?
Introducing the only true Agentic AI Cloud - we start where others fail
● Quick start: we go into vacant datacenters without retrofit or need for new builds
● Expertise: We manage SambaNova’s cloud
● Complete solution: E2E solution with turnkey managed services available for sovereign AI clouds

Agentic AI requires a system that:
● Can run fast inference (tokens per second per user)
● But that fast inference needs to be efficient (tokens per second per watt)
● Can run lots of models and hotswap in milliseconds (models deployed per rack)
● And can cache huge amounts of data, as these systems need to remember a lot of things (total context window per rack)
Notes:
1) An itemized breakdown of all uses of funds is currently under development.

Enter SambaNova + Agentic AI Cloud (I think we keep this slide to the developer persona level?)
Agentic AI Cloud Inference Performance
● Token Speeds
● Serving Profiles
● Built for Agentic AI
No Data Center Buildout/Upgrade Required
● Operates at 11 kW for inference on a single rack
● Host 100s of models per rack
● 5x more tokens per MW than NVIDIA 100
● Works in traditional CPU datacenters
SN40L: SambaNova’s new CoE-optimized RDU
“Cerulean” Architecture-based Reconfigurable Dataflow Unit (Cerulean SN40L RDU)
● 5nm TSMC
● 102B Transistors
● 1,040 RDU Cores
● 638 TFLOPS (bf16)
● Generative AI Training and Inference
3-tier Dataflow Memory
● 520 MB On-Chip Memory
● 64 GB High Bandwidth Memory
● 1.5 TB High Capacity Memory

SN40L System - 3-tier Dataflow Memory
● On-Chip SRAM [4 GB, PBs per sec]: Dataflow enabled by large On-Chip Memory
↕ 12.8 TB/s
● RDU High Bandwidth Memory [512 GB]: Super Low Latency Model Switching (e.g. <0.02 sec for Llama V2 7B)
↕ 800 GB/s
● RDU High Capacity DDR Memory [12 TB]: Up to 5 Trillion Parameters!
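The system-level capacities line up with the per-chip figures on the SN40L slide if a system aggregates eight RDUs. That chip count is inferred here from the ratios and is not stated in this summary. The sketch below also assumes decimal units and 2 bytes per parameter (16-bit weights); all names are illustrative.

```python
MB, GB, TB = 10**6, 10**9, 10**12  # decimal units (assumption)

# Per-chip and per-system capacities as quoted on the slides.
PER_CHIP = {"on_chip_sram": 520 * MB, "hbm": 64 * GB, "ddr": 1500 * GB}
PER_SYSTEM = {"on_chip_sram": 4 * GB, "hbm": 512 * GB, "ddr": 12 * TB}

BYTES_PER_PARAM_16BIT = 2  # bf16 / fp16 weights (assumption)


def implied_chips(tier: str) -> float:
    """Ratio of system capacity to per-chip capacity for one memory tier."""
    return PER_SYSTEM[tier] / PER_CHIP[tier]


def max_parameters(capacity_bytes: int,
                   bytes_per_param: int = BYTES_PER_PARAM_16BIT) -> int:
    """Upper bound on parameters that fit in a tier, ignoring all overhead."""
    return capacity_bytes // bytes_per_param


if __name__ == "__main__":
    for tier in PER_CHIP:
        print(f"{tier}: {implied_chips(tier):.2f} chips implied")
    ddr_params = max_parameters(PER_SYSTEM["ddr"])
    print(f"DDR tier holds up to {ddr_params / 1e12:.1f}T 16-bit parameters")
```

HBM and DDR both give exactly 8, and SRAM gives about 7.7, consistent with the 4 GB figure being rounded. At 2 bytes per parameter, 12 TB holds 6 trillion parameters, so the "up to 5 trillion" figure leaves some headroom.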

Support Large Volume of Models with 10x Less Hardware
10 Nodes Memory Required [DGX H100] vs. 1 Node Memory Required
● Reduce Data Center Footprint: Space, HW, Power
● Simplified Hardware Capacity Planning: Throughput + Memory capacity → Throughput
● Auto-Optimize for Various Traffic Patterns: With less predictable model traffic patterns, system can self-optimize
Node utilization shown in diagram: 60%, 30%, 80%, 15%, 40% utilized
Illustrative Example: 175 expert models (~2.6T Parameters) including mix of: Llama 3 8B, Mistral 7B, Llama 3 70B, Llama 2 70B

The total AI market size could reach $780 billion to
$990 billion by 2027, driven by Agentic AI
2024: Single Shot AI: $70-90 Bn Total Addressable Market
2027: Agentic AI: $300-$500 Bn Total Addressable Market
Growth: 40%-55% CAGR
Notes:
1) TAM represents AI Infrastructure Enablers including compute tooling and infrastructure software, networking & storage.
2) Sources: IDC, Gartner, Morgan Stanley, Bain & Company
