In the world of large language models (LLMs) there have been relatively few upsets since OpenAI barged onto the scene with its transformer-based GPT models a few years ago, yet now it seems that Chinese company DeepSeek has upended the status quo. Its new DeepSeek-V3 model is not only open source, it is also claimed to have been trained with only a fraction of the effort required by competing models, while performing significantly better.
The full training of DeepSeek-V3’s 671B parameters is claimed to have taken only 2.788 million GPU hours on NVIDIA H800 (Hopper-based) GPUs, almost a factor of ten less than competing models. Naturally this has the LLM industry in a mild panic, but those who are not investors in LLM companies or NVIDIA can partake in this new OSS model, which has been released under the MIT license along with the DeepSeek-R1 reasoning model.
Both of these models can be run locally, on both AMD and NVIDIA GPUs, or accessed through the online APIs. If these models do indeed perform as efficiently as claimed, they stand to massively reduce the hardware and power required not only to train but also to query LLMs.
AI that doesn’t burn the planet down is a worthy goal, excluding for the moment the other negative impacts the technology can have on our lives. I kind of get the panic in the hardware stocks, because maybe you don’t need to buy nuclear power plants to do this now, but DeepSeek’s tech will surely be copied far and wide, so the widespread panic is probably unwarranted.
The amount of energy (and money, and hardware…) that went into training this is almost certainly a loooot more than advertised. It will be interesting to see what shakes out with a little time
Another bet is that they trained it on a subset of “targeted” data to perform well under testing.
Jevons paradox. When you improve the efficiency of a system, it becomes more popular and therefore uses more than before. Valuations, spend and consumption will sharply increase as this new efficiency propagates. Things are just getting started. Fusion here we come!
Destroy it.
Idk how you plan on making the CCP do that. And everyone else who has it now..
The amount of training required is off by a factor of 1 million. From the GitHub site: “DeepSeek-V3 requires only 2.788M H800 GPU hours for its full training”. A bit less than 3 hours of training would have been incredible.
Yeah I don’t see how that figure ever passed anybody’s sniff test ;)
At best I thought it was doing the (evil) thing of using a ‘.’ for the thousands separator.
2000-ish hours would still be amazing but far more reasonable.
Read it again. “2.788M (million) H800 GPU hours for its full training.”
To estimate the cost of 2.788 million H800 GPU hours, we need to consider the typical hourly cost of an H800 GPU. Pricing can vary based on the provider (e.g., AWS, Azure, Google Cloud) and region, but here’s a general breakdown:
1. Hourly Cost of an H800 GPU:
On cloud platforms, NVIDIA H100 GPUs (similar to H800) generally cost $2.50–$3.50/hour for on-demand instances. Let’s assume a midpoint of $3/hour for estimation.
2. Total Cost Calculation:
Multiply the hourly cost by the total hours:
2,788,000 hours × $3/hour = $8,364,000

Estimated Total Cost: $8.36 million
If you have discounts (e.g., reserved instances or spot pricing), the cost could be significantly lower, perhaps $5–6 million.
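The estimate above can be sketched in a few lines. Note that the $3/hour on-demand rate is the midpoint assumed above, and the $2/hour discounted rate is an assumption chosen to land in the stated $5–6 million range; neither is a published price.

```python
# Estimate training cost from the claimed GPU hours.
# Hourly rates are assumptions from the discussion above, not published prices.
gpu_hours = 2_788_000      # claimed H800 GPU hours for full training
on_demand_rate = 3.00      # assumed midpoint $/hour for an H800-class GPU
discount_rate = 2.00       # assumed spot/reserved $/hour

on_demand_cost = gpu_hours * on_demand_rate
discounted_cost = gpu_hours * discount_rate

print(f"On-demand estimate:  ${on_demand_cost / 1e6:.2f} million")   # $8.36 million
print(f"Discounted estimate: ${discounted_cost / 1e6:.2f} million")  # $5.58 million
```

Interestingly, the ~$5.58 million figure this produces is close to the training cost DeepSeek itself reported.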
So we’re doing garbage AI-generated answers now?
The startup gave an interview where they stated that they circumvented the NVIDIA export ban against China and acquired the GPUs anyway. I’m not sure they’re renting cloud space. I suspect they’re running training locally.
H100s were prohibited by the chip ban, but not H800s. Everyone assumed that training leading edge models required more interchip memory bandwidth, but that is exactly what DeepSeek optimized both their model structure and infrastructure around.
All of the decisions DeepSeek made in the design of this model only make sense if you are constrained to the H800; if DeepSeek had access to H100s, they probably would have used a larger training cluster with far fewer optimizations specifically focused on overcoming the lack of bandwidth.
If you’ve had the opportunity to work extensively with R1 since its release, as I have, you’d quickly discover that it wasn’t trained from scratch. Instead, it’s derived directly from other open-source models or from synthetic data generated by these models. While R1 might not represent the technical breakthrough some claim it to be, its real impact lies in the realm of accessibility, allowing individuals to self-host sophisticated AI intelligence on their personal hardware.
It’s also worth noting that the dataset used for training R1 has not been made public, which isn’t particularly surprising given the industry standards around proprietary data. However, R1’s significance in democratizing AI capabilities should not be underestimated.
The true game-changer in AI development will emerge when we achieve full integration of chain of thought reasoning with multi-sensory introspection, alongside infinite context capabilities. If significant funding is allocated, this could be realized this year by leveraging optical computing—a field where the research is already promising but requires engineering integration. This could usher in a new era of AI that’s not only more powerful but also more energy-efficient and quicker in processing complex tasks.
The potential of optical computing in AI could dramatically decrease the physical footprint and energy consumption of data centers while scaling up the cognitive abilities of AI models. Imagine AI with the capacity for nuanced reasoning and context awareness on a scale we’ve not yet seen, all powered by light instead of electricity. This isn’t just about speed; it’s about fundamentally redefining what AI can do.
Damn, didn’t realise the techbro investors had found hackaday. That’s some of the best corporate sales jargon I’ve read this month.
No that is just pure ignorance on your part.
Oh, good, cheaper and faster Grand Theft Autocorrect.
I don’t care how much sleeker the technology gets, they are still using tons of copyrighted material without the permission of the authors or owners.
Literally everyone started pretending to care about that like five minutes ago… Never was a problem with anyone back when you needed to get your own copies of media before streaming services. I don’t buy this idea that suddenly everyone cares about IP rights.. Especially in such a vague context as AI training material re-combined by a shadowy black box in unknown ways. Nah.
You don’t know me, TG. Don’t lump me in with your shady cheap-ass friends.
I have NEVER pirated copyrighted material – because, in spite of the abuse that corporations put it to, copyright is currently the best way to protect the rights of artists of every stripe.
For example, some years back I ordered from Big A a DVD set for an old TV series, and when it arrived, it had clearly been copied from an original set onto recordable DVDs, and the label on the case was a bad scan reproduced on an ink-jet printer. I sent it back to Amazon for a refund because I DO NOT SUPPORT COPYRIGHT PIRACY.
As far as “recombined … in unknown ways”, tell it to the many people who have identified distinctive features from their visual art, or exact phrases of a dozen words or more, in AI-produced crap.
And yes, when you say “Literally everyone” you explicitly included me in your accusatory comment.
Touched a nerve? Don’t take it personally, I’m speaking about an average of the people out there, not you. No need to be insulting.
I don’t buy that most people’s concern about AI comes from a genuine and deep respect for copyright law, because the cultural moment five minutes ago was (on average, with exceptions like yourself) a general scorn of intellectual property.
I don’t understand what you’re talking about, people have never(*) said “use my stuff for free and don’t pay me”.
That’s all just other people’s stuff though. Now, they’re taking MY stuff, and I’m not going to stand for it!
I think it’s a lot like photos of people in public places. If people put their stuff on the web, they’re putting it into a public space. Do you think you can stop someone from copying what you put on the web?
I don’t buy this idea that suddenly everyone cares about IP rights..
See how easy that was?
People who earn money from monetizing their content are not concerned specifically with IP/copyright.
They earn their money by having a search engine direct the user to their website where they can obtain monetary compensation in the form of ads, click-thru, selling other items, etc. The content is the loss leader, not the profit maker.
What AI does is scoop up that content and present it to the search engine user, allowing that user to skip going to the site, defeating the loss leader business model used by content creators.
Over time, as people lose their compensation, one could easily see new freely available content dwindling. Which then of course, shrinks the AI pool of data to train upon.
Big difference between someone becoming a billionaire by borrowing money to launder IP and me downloading a movie because I can’t afford gas and food to leave the house.
See! Yes! That’s what I mean. For most people it’s not actually about IP law, it’s this weird class resentment thing hiding just underneath that.
So it’s ok to steal as long as you are too poor to buy it.
Same. We all learn from others and by consuming media and whether it was copyrighted or not doesn’t stop us from picking up a book and reading it. Or reading a website or watching a youtube video or whatever. Our speech fragments all come from others. No one cared when search indexes indexed the web – how else could it even work?
I draw the line at repeating whole sections verbatim without attribution, but if you use a word or turn of phrase you heard elsewhere, that is obviously perfectly normal and literally how language works. And the models don’t store snippets of text in sequence, so they largely couldn’t recall things that way even if they wanted to.
And I am saying this as an AI sceptic that has never used an LLM.
They can be run on CPUs too, and decently, which people sometimes fail to remember.
AMD has had decent integrated GPUs for a while now, and you can get a BIOS that exposes more of the RAM to the iGPU. So running the larger models is a lot cheaper on $80 worth of RAM, a $100–150 CPU, and an $80–100 motherboard, versus a $1,000+ GPU that still needs the CPU, motherboard, and system RAM.
At a fraction of the cost of earlier models, post-DeepSeek technology promises a significantly smaller contribution to global warming, and/or a reduction in the spectre of non-zero-probability, nucular-oopsie-induced devastation of our habitat. It’s reason for celebration, I feel.
What it more directly means, is that investments in previously leading commercial LLM offerings are now worthless. Couldn’t give them away! Stock market disorder ensues.
As the cost to run AI falls, expect usage to rise, occupying those nuclear plants and utilizing the computer hardware. Basically, nothing changed with regard to electricity and hardware usage.
When was the last time an amazing productivity increase occurred and it was only used to deliver what had been delivered before? The Nvidia chips are so much more powerful than other entrants, that this “new” programming approach from DeepSeek will simply be applied on the hardware based on the Nvidia chips to take AI to even higher capability levels. A huge hardware advantage is still a huge hardware advantage … that advantage will be combined with the software innovation to power into new levels of performance.
the other commenters are right. this increase in efficiency will sharply increase demand for chips, electricity, etc.
https://en.wikipedia.org/wiki/Jevons_paradox
I get that with the greater efficiency, y’all expect models to expand to fill the available capacity and more.
A recent tweak to the latest Linux kernel has just cut 30% off datacentre running costs. Quick, build more nucular?
Will AI LLMs pay my taxes or buy my food now that they have taken over the markets? Yeah! Didn’t think so. This is not like any machine made in history. Using old models to describe how this is no different than any other industrial revolution is foolish. Also, with the energy demand burden we are already facing, do we need to consume large amounts of energy even faster?
Computer Use from Anthropic or OpenAI’s Operator will buy your food.
Government farms run by AI-piloted combines and tractors, with a horde of teslabots doing all the “illegal immigrant” farm and factory jobs. Walmart delivery bot distribution of your Foodstamps4ALL selections of econocuisine and Welfare4ALL UBI-funded econogoods, with premium options available if you’re still finding a way to earn.
If you’re educated, skilled, and lucky you’ll land a spot in a corporate communal smart city, or you’ll end up in a government-funded high rise teeming with jail-cell-sized microstudios (level one UBHousing), sitting for years on waiting lists for the two- or three-bedroom flats in Family Housing towers (L2, L3 UBH).
Just to clarify, since this model is open source can anyone make a copy and then alter the training data in their copy (for example make it supportive of an independent little island starting with a T and ending with aiwan) easily, or would partly retraining your own copy require you to spend on the order of 4.8 million just like the amount of high performance computing time needed for training the model as it presently is? Has anyone yet written up DIY guides on procedures for making and modifying a local copy?
Is a guide really necessary? Download and run the model same as any other.
He specifically asked about training the model. Maybe you know how to do that without a guide. Most people don’t.
The README on the GitHub repository is pretty straightforward.
Forgetting the big picture, all the duck and cover does not change the fact that it’s a CCP propaganda machine.
When asked who were the greatest killers of all time: #1 Genghis Khan, #2 Mao Zedong. Less than a minute later, it erased that answer and wrote that it’s not possible to determine accurately. A minute later it erased that answer and said it’s not within its ability to answer. Then, when confronted about the first answer, it came back with “I never said that.”
---> it is self-censoring <---. It’s a CCP propaganda machine.
You’re funny
Buddy, literally the same self-censoring is present in Copilot, ChatGPT, and other cloud and locally run AI models. You can get models with no censoring, or turn off filtering on locally run models.
Oh and other LLM creators, social media companies, etc aren’t doing the same and other worse things? At least it’s open source, and free. If you need your LLM to be decent at world history at least this one can be fine tune trained transparently.
Even HackaDay does censoring… and no AI behind that. Since AI (a misnomer) is just the creation of the programmers feeding it data, it will always be skewed to their idea of how history (or any subject) looks to them. I really don’t get why AI is applied in these areas, unless for propaganda purposes (or one’s point of view). For board routing and such, go for it. For face recognition, great. Robotics like picking up parts. You know, areas where it actually ‘improves’ the process, instead of spitting out one’s homework assignment :rolleyes: .
Training matters too. If you scrape the web for photos of doctors, don’t be too surprised when your AI keeps spitting out middle-aged white blokes when asked for a picture of one.
And telling your AI to be a bit less boring gets you the Gemini lol-fest of Black Nazis in WW2. At least they weren’t women.
It is not open source. What gave you that impression?
They link directly to the GitHub with an MIT license file.
If you’re going to say “not open source” with zero qualification you’re as bad as people who say it’s “open source” with no qualification. There are multiple parts, and they have different statuses. Specify which bit you’re talking about!
This article’s comment section seems to have a significantly higher percentage of uninformed commenters. Thanks Google news.
I’m doing my part (making the comment section stink)
2.77M hours… uh oh, expect skynet to become self aware in approximately 316 years…..😆
You should think of that as more like man-hours. If you have 20,000 networked GPUs (and discount network inefficiencies, which of course will be a thing) then you’d have under 200 hours of real-world time. But yes if you tried to train it on one GPU, your results would be predictably glacial
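That GPU-hours-to-wall-clock conversion can be sketched directly (the 20,000-GPU cluster size is the hypothetical figure above, and perfect linear scaling is assumed, so treat the result as a lower bound):

```python
# Convert total GPU hours into approximate wall-clock training time.
# num_gpus is a hypothetical cluster size; real scaling is never
# perfectly linear, so actual wall-clock time would be longer.
total_gpu_hours = 2_788_000
num_gpus = 20_000

wall_clock_hours = total_gpu_hours / num_gpus
print(f"~{wall_clock_hours:.0f} hours ≈ {wall_clock_hours / 24:.1f} days")  # ~139 hours ≈ 5.8 days
```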
You don’t need to “but ackshually…’ a joke.
And it’s an old joke – nine women can make a baby in a month and all that.
Perhaps this was achieved with the room temperature superconductors…
While everyone is distracted by the fear of AGI, our governments are building an automatic policing system powered by AI and aided by central bank digital currencies. Spotted jaywalking? $100 fine taken out of your bank. Taken £500 out in cash and driven past where a known drug dealer is? If AI thinks you purchased drugs, expect a knock on the door and to be marched off to prison for 30 days. No trial needed, AI knows and is always right. Someone that looks a lot like you at an illegal democracy march? Expect to vanish into some government ghost jail. AI said it was you, no need for a trial, AI is never wrong. This is our future. And it’s a lot closer than you think!
1984 was a long while back.
Okay but even without AI that’s been the American way for a long time. Automatic system detects you ran a red light, you pay a fine. Cellphone had you near the scene of a crime? Arrested with that as the only evidence, and convicted by a court of laymen who think the evidence has more weight than it does.
The future you describe has been going on for decades.
AI just speeds up the process while (allegedly) making it cheaper. Tougher on crime while saving money! Vote Quimby!
Everyone seems to be overlooking one critical fact.
TikTok was supposed to be banned because it’s a CCP spyware app.
Now, you have this new – I’ll call it what I see it as – spyware from the CCP being downloaded onto hundreds of thousands of smartphones all over the world, and all of the information going right back to the CCP!
Brilliant chess move on the part of their intelligence community.
It’s offline and open source dude.
Ah yes, the Spyware that is… open source, MIT licensed, and runs completely offline. Truly a diabolical miracle of Spyware, that is.
Some people have been upset about the security threat posed by TikTok because it is Chinese owned. How do you think they’re going to react to a Chinese owned AI?
I mean it is MIT licensed, open source, and you can run it on an air-gapped machine.
You own it.
I find it amusing that anyone believes these claims. It’s like saying I have a working cold fusion power plant to sell you. Without a whitepaper, patent, or some other real hint at what is revolutionary, well, everyone would laugh at a cold fusion plant too.
In this case, we have some counterarguments as well (just as with every cold fusion claim). For example, when answering questions about things the CCP would want censored (and thus never trained on), it begins to answer, and then self-censors. No, there is some funny business going on. Just as there was funny business with Stargate (a lack of funding). So we’ve had two bogus AI announcements that have dramatically moved markets around. I suspect it won’t be long until we have another “weapon” “fired”.
The source is published, hun. Investigate the claims yourself.
Amazing news IMO. The prospect for AI being used for good instead of evil is vastly higher if it can be developed and used by the common man, instead of being gatekept by nationstates and megacorps.
You’re not running R1 locally without ~400 GB of RAM… You can do a trimmed-down version, but it’s not the same for sure.
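For scale, here is a rough sketch of why a figure like ~400 GB comes up. The 671B parameter count is from the article; the bytes-per-parameter values are generic quantization rules of thumb, not DeepSeek-specific figures.

```python
# Back-of-the-envelope memory needed just to hold the model weights.
# Bytes-per-parameter values are generic rules of thumb for common
# precisions; real usage adds KV cache and runtime overhead on top.
params = 671e9  # total parameters, per the article

for name, bytes_per_param in [("FP16", 2.0), ("FP8", 1.0), ("4-bit", 0.5)]:
    gb = params * bytes_per_param / 1e9
    print(f"{name}: ~{gb:.0f} GB for weights alone")
```

A ~4–5-bit quantization plus cache and overhead lands in the neighborhood of the ~400 GB figure mentioned above.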
You’re not watching a movie without 5km of IMAX film… You can watch a compressed 4K digital version, but it’s not the same for sure.
pirating code is cheap….
It’s not pirating when it is open source
Perhaps the reference is to using other AI’s to train this AI?
You mean like how every modern AI model is trained? It is called synthetic data and it’s an established method.
Annnd OpenAI has already busted them for distilling off of a specific GPT-4 model. The whole thing about open source is you can do anything but sell it, and that is what they are doing with DeepSeek. As many others are tired of blatant corporate espionage, as well as civilian, I hope they get their pound of flesh this round. Dark em.
It is literally a GitHub repository you can download for free. Who is selling what?
Thanks for this article Maya! AutoKeybo now runs DeepSeek.