DeepSeek: Time to rethink your AI thesis?
Written: January 2025
As most people reading this may be aware, a Chinese-built large language model called DeepSeek-R1 launched last week and has created pandemonium in the markets and the AI community. For those who didn't spend most of the past week reading up on these developments, below is a summary of what is going on and how I think the implications may play out.
Background
Spun out of a hedge fund, DeepSeek emerged from relative obscurity last month when it released a chatbot called V3, which outperformed major rivals, despite being built on a shoestring budget. Experts estimate that it cost around $6 million to rent the hardware needed to train the model, compared with upwards of $60 million for Meta’s Llama 3.1 405B, which used 11 times the computing resources. Part of the buzz around DeepSeek is that it has succeeded in making R1 despite US export controls that limit Chinese firms’ access to the best computer chips designed for AI processing.
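A quick back-of-the-envelope comparison of those two budgets is telling (a sketch using the article's estimates, not official disclosures from either company):

```python
# Back-of-the-envelope comparison of the two quoted training budgets.
# Figures come from the article's estimates, not official disclosures.
deepseek_cost = 6e6   # ~$6M to rent hardware for DeepSeek's model
llama_cost = 60e6     # "upwards of $60M" for Meta's Llama 3.1 405B
compute_ratio = 11    # Llama used ~11x the computing resources

cost_ratio = llama_cost / deepseek_cost

# Cost scaled roughly in line with compute (10x cost vs 11x compute),
# so the savings came mainly from needing far less compute, not from
# renting each GPU hour more cheaply.
print(f"cost ratio: {cost_ratio:.0f}x, compute ratio: {compute_ratio}x")
```

In other words, DeepSeek didn't find cheaper hardware; it found a way to need an order of magnitude less of it.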
This marks a transformative moment in the artificial intelligence sector. With its development of cost-effective and high-performing models, such as DeepSeek-V3 and R1, the company has disrupted the traditional AI ecosystem dominated by Western tech giants like OpenAI, Google, and Meta. DeepSeek’s approach is emblematic of a broader shift in AI development, one that emphasizes efficiency, open-source collaboration, and cost accessibility over the scale-driven proprietary strategies of its Western counterparts.
This is a fundamental recalibration in the competitive dynamics of the AI space. By prioritizing efficiency and affordability, the company has challenged the notion that cutting-edge AI requires massive compute resources and exorbitant costs. It demonstrates that with the right optimization strategies, smaller teams can rival, and even surpass, the performance of larger players. This has profound implications for the democratization of AI, potentially enabling broader access to advanced AI tools for startups, mid-sized enterprises, and developing economies.
Moreover, DeepSeek’s commitment to open-source development has amplified its influence. Open-source models lower the barriers to entry for smaller players and accelerate innovation across industries. This, however, comes at a cost: the erosion of margins for AI companies reliant on proprietary models. The economic pressure this creates could reshape the financial structures of AI development, shifting the focus from monetizing models to deriving value through applications and services.
China’s advantage
If AI models turn into commodities, then the world will build its applications on those models. If China has the best one, then it will control the base layer of global AI applications.
What gives China its edge is the commoditization of goods and services. As the cost of deploying and utilizing advanced AI systems decreases, AI's value increasingly shifts from proprietary models to infrastructure, data, and applications. This plays directly into China's strengths. With its robust manufacturing ecosystem, lower labor costs, and well-established supply chains for hardware and components, China is uniquely positioned to capitalize on the commoditization trend.
It can undercut competitors on price while expanding its influence in global markets. This mirrors broader trends in industries like electronics and solar energy, where China has achieved dominance through scale and cost efficiency.
For the West, where AI companies often rely on high margins from proprietary technologies, this commoditization poses a challenge. It undermines the economic model that has historically fueled AI research and development, forcing companies to rethink their strategies to remain competitive. China’s ability to leverage cost structures and rapidly scale commoditized solutions could make it the dominant force in shaping AI’s global adoption and standards.
Market Reactions and Economic Implications
The current market turbulence reflects investor concerns about the future profitability of companies that have heavily invested in AI infrastructure. The realization that high-performing AI models can be developed at a fraction of the previously assumed costs challenges the anticipated return on investment for these firms. Consequently, there is a growing apprehension that the substantial capital expenditures by major tech companies may not yield the expected competitive advantages, leading to potential reevaluations of their investment strategies.
So what now?
There is a popular refrain in the AI world that none of these AI companies truly have a moat (or defensibility) against competitors. The DeepSeek news has turbocharged those views. On top of that, we now have to worry that China will be the place that leads AI innovation. There is a lot to think through; below are some interesting takes from very smart people.
Everyone will be using the phrase Jevons paradox:
If AI models keep dropping in price, then it's not so much that people will buy fewer chips and build fewer data centers, but rather that they will do more with their purchases.
This is the stance the big tech companies are taking. Last night Microsoft CEO Satya Nadella tweeted out:
Jevons paradox strikes again! As AI gets more efficient and accessible, we will see its use skyrocket, turning it into a commodity we just can't get enough of.
The Jevons paradox is an economic principle that states that increased efficiency can lead to increased consumption of a resource.
He said this to allay market fears around companies like Microsoft and Nvidia.
Writer, M.G. Siegler questions this stance:
Already, we're seemingly getting a memo sent around to bring up Jevons paradox – the notion that an increase in efficiency leads to an increase in consumption. Translation: DeepSeek is great because it will lift all boats as AI can scale faster and further. Sure, at the highest level that's undoubtedly true! But the details matter. Microsoft is about $80B deep in the weeds this year. The CapEx numbers that Nadella has been so busy touting may have just become an albatross around their neck. And if that's the case, he's lucky (and perhaps prescient) to offload OpenAI's 'Stargate' Project spend to Oracle and others.
Meanwhile, where all of this leaves NVIDIA – the current king of the hill thanks to being the key company at the center of all of this spend – is either catastrophic or ultimately okay. The revenue ramp was already slowing, as it must given the law of large numbers, but if all of Big Tech decides to slash their CapEx at once... NVIDIA's stock price may suffer a heart attack. And again, that would ripple through the entire market. But in the Jevons paradox equation above, NVIDIA would ultimately be fine as they'd undoubtedly remain a key provider of the underlying technology now being used at greater scale (albeit with lower individual entity spend).
The others in Big Tech will make a similar case: that all of the data centers built will still be needed as AI goes global, but the jury remains out in terms of technology depreciation in this world and future. One big question: as we shift from a world of pre-training to a world of inference, are the same servers and chips going to be just as useful/good versus racks built specifically for that purpose? And DeepSeek's breakthroughs just made all of that even more complicated. And while yes, just as in past booms, the build outs ended up being crucial for the future, those that spent on those build-outs usually didn't fare as well...
The real problem is that it won't be so simple to simply pull back spend. Beyond a lot of it already being committed, there's obviously still a very real risk that DeepSeek is just a blip on the radar and not the bomb that blows up everything. So none of Big Tech can really afford, quite literally, to let their feet off the gas just yet. Instead, they're going to have to try to recreate and study the methods the group used to create their models and see how replicable and scalable it is.
Speaking of Nvidia, since we all own a ton of it, the great Ben Thompson wrote this about the company in his newsletter this morning:
I own Nvidia! Am I screwed?
There are real challenges this news presents to the Nvidia story. Nvidia has two big moats:
CUDA is the language of choice for anyone programming these models, and CUDA only works on Nvidia chips.
Nvidia has a massive lead in terms of its ability to combine multiple chips together into one large virtual GPU.
These two moats work together. I noted above that if DeepSeek had access to H100s they probably would have used a larger cluster to train their model, simply because that would have been the easier option; the fact they didn't, and were bandwidth constrained, drove a lot of their decisions in terms of both model architecture and their training infrastructure. Just look at the U.S. labs: they haven't spent much time on optimization because Nvidia has been aggressively shipping ever more capable systems that accommodate their needs. The route of least resistance has simply been to pay Nvidia. DeepSeek, however, just demonstrated that another route is available: heavy optimization can produce remarkable results on weaker hardware and with lower memory bandwidth; simply paying Nvidia more isn't the only way to make better models.
That noted, there are three factors still in Nvidia's favor. First, how capable might DeepSeek's approach be if applied to H100s, or upcoming GB100s? Just because they found a more efficient way to use compute doesn't mean that more compute wouldn't be useful. Second, lower inference costs should, in the long run, drive greater usage.
Third, reasoning models like R1 derive their superior performance from using more compute. To the extent that increasing the power and capabilities of AI depends on more compute is the extent that Nvidia stands to benefit!
Still, it's not all rosy. At a minimum DeepSeek's efficiency and broad availability cast significant doubt on the most optimistic Nvidia growth story, at least in the near term. The payoffs from both model and infrastructure optimization also suggest there are significant gains to be had from exploring alternative approaches to inference in particular. For example, it might be much more plausible to run inference on a standalone AMD GPU, completely sidestepping AMD's inferior chip-to-chip communications capability. Reasoning models also increase the payoff for inference-only chips that are even more specialized than Nvidia's GPUs.
In short, Nvidia isn't going anywhere; the Nvidia stock, however, is suddenly facing a lot more uncertainty that hasn't been priced in. And that, by extension, is going to drag everyone down.
He also had a good take on the widely quoted figure (~$6M) for what it cost DeepSeek to build their model:
DeepSeek claimed the model training took 2,788 thousand H800 GPU hours, which, at a cost of $2/GPU hour, comes out to a mere $5.576 million.
DeepSeek is clear that these costs are only for the final training run, and exclude all other expenses; from the V3 paper:
Lastly, we emphasize again the economical training costs of DeepSeek-V3, summarized in Table 1, achieved through our optimized co-design of algorithms, frameworks, and hardware. During the pre-training stage, training DeepSeek-V3 on each trillion tokens requires only 180K H800 GPU hours, i.e., 3.7 days on our cluster with 2048 H800 GPUs. Consequently, our pre-training stage is completed in less than two months and costs 2664K GPU hours. Combined with 119K GPU hours for the context length extension and 5K GPU hours for post-training, DeepSeek-V3 costs only 2.788M GPU hours for its full training. Assuming the rental price of the H800 GPU is $2 per GPU hour, our total training costs amount to only $5.576M. Note that the aforementioned costs include only the official training of DeepSeek-V3, excluding the costs associated with prior research and ablation experiments on architectures, algorithms, or data.
So no, you can't replicate DeepSeek the company for $5.576 million.
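The paper's arithmetic is easy to reproduce (a quick sketch; the $2/GPU-hour figure is the paper's own assumed rental price, not a market quote):

```python
# Reproduce the training-cost arithmetic from the DeepSeek-V3 paper.
pretrain_hours = 2_664_000    # H800 GPU hours for pre-training
context_ext_hours = 119_000   # context length extension
posttrain_hours = 5_000       # post-training
rate = 2.0                    # paper's assumed rental price, $/GPU hour

total_hours = pretrain_hours + context_ext_hours + posttrain_hours
total_cost = total_hours * rate  # -> the headline $5.576M

# Sanity check the paper's "3.7 days per trillion tokens" claim:
# 180K GPU hours spread across a 2048-GPU cluster.
hours_per_trillion_tokens = 180_000
cluster_gpus = 2048
days_per_trillion = hours_per_trillion_tokens / cluster_gpus / 24

print(f"{total_hours:,} GPU hours, ${total_cost:,.0f}")
print(f"~{days_per_trillion:.1f} days per trillion tokens")
```

The numbers are internally consistent; the question, as Thompson notes, is everything the final-run figure leaves out.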
My Take: I personally do not give much credence to this number at all. There is a line in the paper itself which says: "Note that the aforementioned costs include only the official training of DeepSeek-V3, excluding the costs associated with prior research and ablation experiments on architectures, algorithms, or data."
That seems like a big number to leave out. Still, this is a huge change and will cause all the AI labs to take a long look at the way they build models. This is a true inflection point. If you can run high-end AI models on less powerful devices, then this augurs well for companies like Apple, which could build powerful AI products that run on high-end iPhones.
Another thing to note: DeepSeek's model used advanced techniques like data-efficient training, rigorous parameter tuning, and extensive manual intervention to adapt to the constraints of less powerful hardware. While this approach drastically reduced costs, it also introduces significant scalability challenges. The reliance on bespoke optimizations means that reproducing or scaling this method to train larger models or handle diverse tasks would require a high level of expertise and potentially an unsustainable amount of manual effort. In contrast, labs like OpenAI and Google have developed highly automated pipelines that rely on expansive infrastructure and powerful hardware, enabling them to scale up model training efficiently. How much this matters remains to be seen, but it should not be discounted.
This is a lot, so what's the overall take?
Seldom have I seen so many smart posts and articles come out about a tech development so quickly. It is all very head-spinning. As an optimist, I think nothing fundamentally changes in the long term, except that we may achieve the previously impossible sooner. My view on these large companies spending billions without any clear ROI has always been that they believe they are building a God-like product (what they call artificial superintelligence, or ASI), and if they succeed, the ROI for them is a good chunk of the economic output of the world. This development makes that likely to happen a little faster (if it can actually be done).
Now, in the short term, will there be significant impacts on the stocks of companies like Nvidia and Microsoft? Maybe. I would not be surprised if those stocks trend down quite a bit over the next couple of quarters, but if you were bullish on AI last month, this shouldn't shake your belief in its impact. AI will need all the fuel and infrastructure it can get.
Smaller, non-foundation-model AI companies may benefit from the lowered costs of training models, but further commoditization of the model layer will turn them into something like "CPG" companies more quickly. The savings from spending less on compute will go towards branding and distribution, sort of like how the large CPG companies function today. They are going to be selling a commodity, and that comes with a host of new challenges. Their investors and VCs may be the ones most impacted by today's news. Thing is, though, this was the case even before this past week's developments. It's just going to happen faster now.
So, a lot has changed, but maybe not that much, is my current take. Of course, that's not what is going to drive views and attention, so we are going to hear a lot about this development for a while. I say let's take a deep breath and see what the next few months bring. They will likely be rocky but should provide a lot more clarity on where we are headed.