
Chinese AI Firm DeepSeek Releases Updated R1 Model on Hugging Face


Chinese AI start-up DeepSeek released an updated version of its R1 reasoning AI model, named R1‑0528, on the Hugging Face developer platform, the company announced on WeChat on Wednesday.

With 685 billion parameters, the updated R1 is a substantial model likely to require significant computational resources beyond consumer‑grade hardware.

According to the announcement, this update is a ‘minor’ upgrade and is licensed under the permissive MIT licence, allowing commercial use. The Hugging Face repository for R1‑0528 includes configuration files and weights, which guide the model’s behaviour, but lacks a detailed description.

This follows DeepSeek’s earlier R1 launch in January, which challenged industry leaders such as OpenAI with cost‑efficient performance: the model was reportedly developed for just $6 million, compared with the billions spent by competitors.

DeepSeek V3 Updates

DeepSeek recently released updates to its V3 model, named V3‑0324, aimed at enhancing programming capabilities to compete with its rivals.

The update was quietly posted on the AI community platform Hugging Face and on GitHub without a public statement. The model’s licence was also changed to the permissive MIT licence, an open‑source standard from the Massachusetts Institute of Technology, allowing commercial use.

DeepSeek‑V3, an earlier model from the Chinese start-up, is a mixture‑of‑experts language model with 671 billion parameters, of which 37 billion are activated per token. Trained on 14.8 trillion high‑quality tokens, it excels at complex tasks such as coding, mathematics and reasoning.
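To make the “37 billion activated per token” idea concrete, here is a minimal, illustrative top‑k routing layer in PyTorch. Every name and size in it is invented for the sketch (eight toy experts, two active per token); it is not DeepSeek’s implementation, only the general pattern by which a router activates a small slice of a much larger parameter pool.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Toy mixture-of-experts layer: a router picks the top-k experts per
    token, so only a fraction of total parameters runs for any input.
    Sizes here are arbitrary, chosen only for illustration."""

    def __init__(self, d_model=64, d_ff=256, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)  # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        scores = self.router(x)                     # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)  # keep only k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e            # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

moe = TopKMoE()
tokens = torch.randn(5, 64)
print(moe(tokens).shape)  # torch.Size([5, 64])
```

Scaled up, this is the pattern that lets a 671‑billion‑parameter model run only about 37 billion parameters for each token.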

The model features an innovative architecture, including MLA (multi‑head latent attention) for improved inference efficiency, a mixture‑of‑experts framework with auxiliary‑loss‑free load balancing, and MTP (multi‑token prediction), which trains the model to predict several future tokens at once.
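As a rough illustration of multi‑token prediction, the toy module below attaches extra output heads that score tokens one, two and three positions ahead. This is a deliberately simplified, hypothetical sketch: DeepSeek’s actual MTP design chains sequential prediction modules rather than independent heads.

```python
import torch
import torch.nn as nn

class MTPHeads(nn.Module):
    """Toy multi-token prediction: extra output heads score tokens several
    positions ahead, giving a denser training signal than a single
    next-token head. Simplified sketch, not DeepSeek's design."""

    def __init__(self, d_model=64, vocab=1000, horizon=3):
        super().__init__()
        # one head per future offset: t+1, t+2, ..., t+horizon
        self.heads = nn.ModuleList(nn.Linear(d_model, vocab) for _ in range(horizon))

    def forward(self, hidden):  # hidden: (batch, seq, d_model) from a trunk model
        # element i holds logits for the token at offset i+1 from each position
        return [head(hidden) for head in self.heads]

hidden = torch.randn(2, 16, 64)
logits = MTPHeads()(hidden)
print([tuple(l.shape) for l in logits])  # three (2, 16, 1000) tensors
```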

This release has fuelled discussions about whether cutting‑edge AI platforms can be developed for a fraction of the cost of those built by US companies, particularly after DeepSeek’s earlier R1 model reportedly cost $6 million.

DeepSeek Wave

DeepSeek recently gained attention in the AI industry for its large language models, which reportedly outperformed OpenAI’s top models.

The start-up’s R1 model, previewed in November 2024 and formally launched in January, and the newer V3‑0324 model surpassed OpenAI’s o1 Preview and GPT‑4o across multiple benchmarks, according to industry discussions.

DeepSeek disclosed financial figures indicating a theoretical daily profit of more than five times its operational costs for its V3 and R1 models, equivalent to a cost‑profit ratio of up to 545%.
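For readers decoding the percentage, the back‑of‑the‑envelope arithmetic below (with cost normalised to 1.0, not DeepSeek’s actual figures) shows how a 545% cost‑profit ratio corresponds to profit of roughly 5.45 times operational costs.

```python
# Back-of-the-envelope: what a 545% cost-profit ratio implies.
# Cost is normalised to 1.0; these are not DeepSeek's actual figures.
cost = 1.0
ratio = 5.45                  # 545% expressed as a fraction of cost
profit = cost * ratio         # theoretical daily profit
revenue = cost + profit       # revenue implied by that margin
print(f"profit ~ {profit:.2f}x cost, revenue ~ {revenue:.2f}x cost")
```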

However, the company emphasised that these figures are hypothetical, as actual revenue is likely lower owing to limited monetisation and discounts during off‑peak hours. The reported costs also exclude significant expenses for research, development and model training.
