DeepSeek, a Chinese AI start-up, is creating a buzz in the artificial intelligence (AI) industry as its large language models (LLMs) reportedly outperformed top models from pioneer OpenAI. The start-up's R1-Lite model, launched in November 2024, and its latest V3 model beat OpenAI's o1 Preview and GPT-4o on multiple benchmarks.
The DeepSeek-R1-Lite outperformed the OpenAI o1-preview on important benchmarks like AIME 2024 (the American Invitational Mathematics Examination), MATH and Codeforces, while the OpenAI model beat R1-Lite on benchmarks like ZebraLogic and LiveCodeBench.
Similarly, DeepSeek’s latest V3 model was on par with reputed models like GPT-4o, Llama 3.1 and Claude 3.5 Sonnet.
What is DeepSeek-V3?
DeepSeek-V3 is a mixture-of-experts (MoE) language model with 671 bn parameters, 37 bn of which are activated per token. It is trained on 14.8 trn high-quality tokens, enabling it to handle complex tasks in coding, mathematics and reasoning.
The model is built on an innovative architecture. It includes Multi-Head Latent Attention (MLA), which compresses the attention cache to improve inference efficiency; a mixture-of-experts design with an auxiliary-loss-free strategy for load balancing across experts; and Multi-Token Prediction (MTP), a training objective in which the model learns to predict multiple future tokens at once.
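To make the MoE idea concrete: only a small fraction of the network (37 bn of 671 bn parameters, roughly 5.5%) runs for any given token, because a router sends each token to a few "experts". Below is a minimal, illustrative top-k routing sketch in PyTorch; it is not DeepSeek's implementation (which adds, among other things, the auxiliary-loss-free balancing mentioned above), and all sizes are toy values.

```python
# Illustrative top-k mixture-of-experts routing. Toy sizes only; not
# DeepSeek-V3's actual architecture or balancing scheme.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, dim=64, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(dim, n_experts)   # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        )

    def forward(self, x):                          # x: (tokens, dim)
        scores = self.router(x)                    # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1) # keep only k experts per token
        weights = F.softmax(weights, dim=-1)       # normalise the selected gates
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e           # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

moe = TopKMoE()
print(moe(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```

Because each token touches only k of the n experts, total parameter count can grow far faster than per-token compute, which is how a 671 bn parameter model keeps inference costs closer to those of a 37 bn one.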
DeepSeek-V3 vs OpenAI o1
The o1 Preview is the older of the two models, released on September 12, 2024, while DeepSeek-V3 is far newer, released on December 27, 2024.
Both models feature a 128K token input context window, though o1 Preview offers a higher maximum output of 32.8K tokens compared to DeepSeek-V3's 8K tokens.
While o1 Preview is closed-source with a knowledge cut-off date of October 2023, DeepSeek-V3 is open-source with an unknown knowledge cut-off.
The pricing structure shows a significant difference, with DeepSeek-V3 being substantially more economical at $0.14 per mn input tokens and $0.28 per mn output tokens, compared to o1 Preview's $15.00 and $60.00 respectively.
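To put the gap in perspective, here is a quick back-of-the-envelope comparison using the list prices above and a hypothetical workload (prices change, so treat the numbers as illustrative):

```python
# Back-of-the-envelope cost comparison using the per-mn-token prices quoted
# above. Check each provider for current rates before relying on these.
PRICES = {                       # (input $/mn tokens, output $/mn tokens)
    "DeepSeek-V3": (0.14, 0.28),
    "o1 Preview": (15.00, 60.00),
}

input_tokens, output_tokens = 2_000_000, 500_000   # a hypothetical workload

for model, (p_in, p_out) in PRICES.items():
    cost = input_tokens / 1e6 * p_in + output_tokens / 1e6 * p_out
    print(f"{model}: ${cost:.2f}")

# DeepSeek-V3: $0.42
# o1 Preview: $60.00
```

On this workload, the same traffic costs over a hundred times more on o1 Preview than on DeepSeek-V3.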
What Makes DeepSeek-V3 Important?
Almost all of the world's major generative-AI companies are based in California's Bay Area. AI start-ups like OpenAI, Anthropic, Perplexity, Databricks and xAI all operate from the same region of the US. At a time when the power to shape the century's most crucial technology is concentrated among a handful of US players, DeepSeek-V3 represents a shift in the balance of power in the LLM space.
DeepSeek-V3 is an open-source model that allows anyone to access, modify, and deploy it to solve specific problems or perform tasks. This democratizes the technology and allows small companies or individual developers to take advantage of cutting-edge technology without the need for massive resources.
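As a sketch of what that open access looks like in practice, the snippet below loads the published checkpoint with Hugging Face transformers. It assumes the repo name deepseek-ai/DeepSeek-V3 (verify the exact identifier) and, given the model's size, server-class multi-GPU hardware; treat it as illustrative rather than a deployment guide.

```python
# Minimal sketch of loading the open weights via Hugging Face transformers.
# Assumes the checkpoint is published as "deepseek-ai/DeepSeek-V3"; a 671 bn
# parameter model requires multi-GPU server hardware, not a consumer machine.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-V3"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, device_map="auto"
)

inputs = tokenizer("Write a haiku about open-source AI.", return_tensors="pt")
outputs = model.generate(**inputs.to(model.device), max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```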
Along with strong technical specifications, what truly makes DeepSeek-V3 stand out is its affordability. The start-up has made it clear that low cost is at the core of its mission, keeping expenses minimal for both training and inference.
The mix of high performance and low cost could significantly impact industries that rely on AI for tasks like content creation, data analysis and customer support. Smaller start-ups now have the opportunity to leverage advanced AI at a fraction of the price of established models like GPT-4 or Claude 3.5 Sonnet.
DeepSeek-V3 Identifies as ChatGPT
Soon after its launch, multiple posts on X claimed that DeepSeek-V3 identifies itself as OpenAI's ChatGPT. Users posted screenshots showing that, when asked "What model are you?", DeepSeek-V3 replies, "I'm ChatGPT, a language model developed by OpenAI."
Users claimed that if you ask DeepSeek-V3 a question about DeepSeek's API, it will provide instructions on how to use OpenAI's API instead. The model even tells some of the same jokes as ChatGPT.
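There is an innocuous angle to the API confusion worth noting: DeepSeek deliberately exposes an OpenAI-compatible API, so client code for one service often works for the other with only a base URL change. A minimal sketch using the openai Python client follows; the endpoint and model name are taken from DeepSeek's public documentation at the time of writing, so verify them before use.

```python
# DeepSeek's API is OpenAI-compatible: the standard openai client works
# with a different base_url. Endpoint and model name per DeepSeek's docs;
# verify before use.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",        # issued by DeepSeek, not OpenAI
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-chat",                  # routes to DeepSeek-V3
    messages=[{"role": "user", "content": "What model are you?"}],
)
print(response.choices[0].message.content)
```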
One possible explanation is that DeepSeek-V3 was trained on public datasets containing text generated by GPT-4 via ChatGPT. The model may have memorised some of GPT-4's outputs and now reproduces them verbatim.
Another explanation is that the model may have been trained directly on the outputs of other models such as ChatGPT in an attempt to capitalise on their capabilities.
Sam Altman’s Indirect Dig at DeepSeek
OpenAI CEO Sam Altman took an indirect dig at DeepSeek and other competitors through an X post saying, “It is (relatively) easy to copy something that you know works. It is extremely hard to do something new, risky, and difficult when you don't know if it will work.”
The OpenAI CEO remarked on how difficult it is to build a team and carry it together towards a new venture. He also credited his research leaders, saying, "It's also extremely hard to rally a big talented research team to charge a new hill in the fog together. This is key to driving progress forward. Thanks to Ilya, Jakub, Bob, Mark, and many other exceptional research leaders who got us here."