DeepSeek touts new training method as China pushes AI efficiency
Chinese startups continue to operate under significant constraints
[BENGALURU] DeepSeek published a paper outlining a more efficient approach to developing artificial intelligence (AI), illustrating the Chinese AI industry’s effort to compete with the likes of OpenAI despite a lack of free access to Nvidia chips.
The document, co-authored by founder Liang Wenfeng, introduces a framework it calls Manifold-Constrained Hyper-Connections. It’s designed to improve scalability while reducing the computational and energy demands of training advanced AI systems, according to the authors.
Such publications from DeepSeek have foreshadowed the release of major models in the past. The Hangzhou-based startup stunned the industry with the R1 reasoning model a year ago, developed at a fraction of the cost of its Silicon Valley rivals. DeepSeek has since released several smaller platforms, but anticipation is mounting for its next flagship system, widely dubbed the R2, expected around the Spring Festival in February.
Chinese startups continue to operate under significant constraints, with the US preventing access to the most advanced semiconductors essential to developing and running AI. Those restrictions have forced researchers to pursue unconventional methods and architectures.
DeepSeek, known for its unorthodox innovations, published its latest paper this week through the open repository arXiv and open-source platform Hugging Face. The paper lists 19 authors, with Liang’s name appearing last.
The founder, who’s consistently steered DeepSeek’s research agenda, has pushed his team to rethink how large-scale AI systems are conceived and built.
The latest research addresses challenges such as training instability and limited scalability, noting that the new method incorporates “rigorous infrastructure optimisation to ensure efficiency”. Tests were conducted on models ranging from three billion to 27 billion parameters, building on ByteDance’s 2024 research into hyper-connection architectures.
The technique holds promise “for the evolution of foundational models”, the authors said. BLOOMBERG