The AI world is buzzing about DeepSeek R1, a new open-source reasoning model developed by a Chinese AI startup called DeepSeek. The company claims that R1 performs as well as, or even better than, OpenAI’s ChatGPT o1 on many benchmarks, at a much lower cost.
Experts believe that R1’s development is a big win for researchers and developers, especially those in countries with fewer resources. Hancheng Cao, a professor at Emory University, said this model could be a game-changer for the Global South, offering a more affordable way to access cutting-edge AI.
DeepSeek’s achievement is especially impressive because Chinese AI companies face challenges due to US restrictions on advanced chips. These sanctions were meant to slow down China’s AI progress, but instead, they’ve pushed companies like DeepSeek to innovate more efficiently and collaborate in new ways.
To make R1, DeepSeek had to rethink its training process. The company used Nvidia GPUs with limited performance, but it found ways to make the model work efficiently without needing as much computing power. Zihan Wang, a former employee at DeepSeek, explained that the team focused on getting accurate answers quickly rather than breaking down every step in detail, which reduced the time and computing power needed.
R1 has impressed researchers with its ability to solve complex problems, especially in math and coding. Like ChatGPT o1, the model uses a “chain of thought” approach to work through problems step by step. But what stands out is how simple and efficient R1’s engineering is, says Dimitris Papailiopoulos, a researcher at Microsoft.
DeepSeek has also released smaller versions of R1 that can run on regular laptops. One of them even outperforms OpenAI’s o1-mini in some tests.
DeepSeek was founded in 2023 by Liang Wenfeng, an alumnus of Zhejiang University, and was incubated by his hedge fund, High-Flyer. Despite being a relatively new company, DeepSeek has made significant strides in AI development. It also holds a large stockpile of Nvidia A100 chips, which are now banned for export to China. This stockpile allowed DeepSeek to train its models before the restrictions kicked in.
While big companies like Alibaba and ByteDance dominate the Chinese AI market, DeepSeek is unusual because it doesn’t plan to raise funds and works independently. This independence gives it more flexibility to experiment and innovate. Zihan Wang, the former employee, praised the company for giving researchers the freedom to try out new ideas.
Liang Wenfeng, DeepSeek’s founder, acknowledged the challenges Chinese companies face with chip shortages and inefficient engineering techniques. However, DeepSeek has found ways to overcome these hurdles by optimizing memory and speeding up calculations without sacrificing accuracy.
DeepSeek, like many other Chinese AI companies, is embracing open-source principles. For instance, Alibaba Cloud has released over 100 open-source AI models. This open-source culture is popular among young Chinese researchers, as it gives them access to valuable resources without financial barriers.
According to a report, China now accounts for 36% of the world’s AI large language models, making it the second-largest contributor after the United States.
Despite the challenges, the US export controls may have pushed Chinese companies to become more efficient, and more partnerships and collaborations may follow. For example, Alibaba Cloud recently partnered with 01.AI, another Chinese AI startup, on large AI models.
Overall, the AI industry in China is adapting to the changing landscape, and companies like DeepSeek are leading the way in making AI more accessible and efficient.