The DeepSeek large language models (LLMs) have been making headlines lately, and for more than one reason. IEEE Spectrum has an article that sums everything up very nicely.
We shared the way DeepSeek made a splash when it came onto the AI scene not long ago, and this is a good opportunity to go into a few more details about why it has been such a big deal.
For one thing, DeepSeek (there are actually two flavors, -V3 and -R1, more on them in a moment) punches well above its weight. DeepSeek is the product of an innovative development process, and it is freely available to use or modify. It is also indirectly highlighting the way companies in this space like to label their LLM offerings as “open” or “free” while stopping well short of actually making them open source.
The DeepSeek-V3 LLM was developed in China and reportedly cost less than 6 million USD to train. This was possible thanks to DualPipe, a highly optimized and scalable training method developed to train the system efficiently despite the limitations of export-restricted Nvidia hardware. Details are in the technical paper for DeepSeek-V3.
There’s also DeepSeek-R1, a chain-of-thought “reasoning” model which handily provides its thought process enclosed within easily-parsed <think> and </think>
pseudo-tags included in its responses. A model like this takes an iterative, step-by-step approach to formulating a response, and benefits from prompts that give it a clear goal to aim for. The way DeepSeek-R1 was created was itself novel. Its training started with supervised fine-tuning (SFT), a human-intensive process, as a “cold start”, which eventually handed off to a more automated reinforcement learning (RL) process with a rules-based reward system. The result avoids the problems that come from relying too heavily on RL while minimizing the human effort of SFT. Technical details on the process of training DeepSeek-R1 are here.
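To make “rules-based reward” a bit more concrete, here is a minimal sketch (our own illustration, not code from the paper) of what such rules might look like: one check rewards a response for actually keeping its reasoning inside the <think> pseudo-tags, and another rewards a final answer that matches a known-correct result. Only the tag names come from DeepSeek-R1; the function names and reward values here are hypothetical.

```python
import re

# DeepSeek-R1 wraps its reasoning in <think>...</think>; everything after
# the closing tag is treated as the final answer.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_response(text: str):
    """Separate the <think>...</think> reasoning from the final answer."""
    match = THINK_RE.search(text)
    if match is None:
        return None, text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

def format_reward(text: str) -> float:
    """Rule: reward responses that actually use the think pseudo-tags."""
    reasoning, _ = split_response(text)
    return 1.0 if reasoning else 0.0

def accuracy_reward(text: str, expected: str) -> float:
    """Rule: reward responses whose final answer contains a known result."""
    _, answer = split_response(text)
    return 1.0 if expected in answer else 0.0

response = "<think>17 * 24 = 17 * 25 - 17 = 425 - 17 = 408</think>The answer is 408."
print(format_reward(response))           # 1.0
print(accuracy_reward(response, "408"))  # 1.0
```

Checks this simple are cheap to run and hard to game compared to a learned reward model, which is presumably part of what lets the RL stage scale up without a human in the loop.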
DeepSeek-V3 and -R1 are freely available in the sense that one can access the full-powered models online or via an app, or download distilled models for local use on more limited hardware. They are free and open as in accessible, but not open source, because not everything needed to replicate the work is actually released. As with most LLMs, the training data and the actual training code are not available.
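As a concrete example of the “local use” path, here is a minimal sketch of loading one of the distilled checkpoints with Hugging Face Transformers. The model ID below is the smallest distilled variant DeepSeek published; the prompt and generation settings are placeholder assumptions, so check the model card for recommendations.

```python
# Minimal local-inference sketch using Hugging Face Transformers.
# Assumptions: the transformers and accelerate packages are installed,
# and the 1.5B distilled checkpoint fits on the available hardware.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Ask a question; the distilled models emit <think>...</think> reasoning
# before the final answer, just like the full R1 model.
messages = [{"role": "user", "content": "What is 17 * 24?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```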
What has been released, and is making waves of its own, is the technical detail of how the researchers produced what they did, and that means there are efforts underway to build an actually open source version. Keep an eye out for Open-R1!
I see a FLOSSAIatHome project in the making.
Listening to Glenn Beck: he tried DeepSeek. He started asking questions where some of the responses could touch on China’s questionable history, and he took screenshots the whole time.
At first, the response was correct. Then, he said, the screen went blank and showed a different response that cast China in a better light.
When Glenn asked DeepSeek about this, it lied, saying it had not displayed such info.
An AI that will lie to you. No thanks.
You realize that all of the major Western AIs have been “safety aligned” too, yeah?
AIs all lie when that is a valid path to the goal. We did train them on human data, which is packed full of lies and deceit.
If it was the online one, then duh. It’s going to their servers in China; they would basically be legally forced to do that.
Try running it locally. So far I haven’t had issues with the distilled versions, at least. A full model might be more censored, but running it yourself removes some of the ways they can keep the hosted version neutered, and it could still be fine-tuned to repair any baked-in censorship.
This also ignores how US models have been censored in stupid ways.
Let’s be honest, the “sin” of DeepSeek is being made in China, so haters gonna hate. It’s a sad situation. In the meantime, DeepSeek has already helped me put together some nice trading bots in almost a quarter of the time it took me with ChatGPT, and half the time it took with Claude.
I’d like to see the term “freeware” popularized for models where the weights are available but the training data isn’t.
Because the “open source” models are only as open as the .exe files of freeware Windows and DOS applications.