
DeepSeek Unveils V3.1 Model with Think & Non-Think Mode, Says “First Step Toward Agent Era”

Chinese start-up DeepSeek has released its V3.1 AI model, featuring a hybrid Think/Non-Think inference design, a 128K-token context window and improved agent skills. The model is available on Hugging Face and GitHub with updated APIs and developer tools.

DeepSeek Unveils V3.1 Model
Summary
  • DeepSeek-V3.1 adds hybrid-inference “Think” and “Non-Think” modes for faster reasoning

  • Expands long-context window to 128,000 tokens, enabling much longer dialogues

  • Mixture-of-Experts (MoE) design: ~671B parameters, 37B activated per token

  • Agent upgrades: lower Think-mode latency, improved tool-calling, API and template support


Chinese AI start-up DeepSeek on Thursday announced its latest DeepSeek-V3.1 model. The start-up described the upgraded model on WeChat as a “first step toward the agent era.”

It introduces a hybrid-inference design that supports two modes, Think and Non-Think, within a single model, promising faster reasoning, stronger agent skills and an extended long-context window.

DeepSeek explained the hybrid-inference approach in its release and linked documentation, showing that a single model can operate in a deliberative, chain-of-thought Think mode or in a direct-answer Non-Think mode by switching chat templates or control tokens.

The model card and usage notes on DeepSeek’s model pages show the special tokens and templates used to switch modes; for example, the documentation references a dedicated think control token that the chat template uses to separate thinking from non-thinking flows.
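A minimal sketch of what that switch could look like in practice is shown below. It assumes the checkpoint is published under the repository id "deepseek-ai/DeepSeek-V3.1" and that its chat template exposes a boolean `thinking` flag; both are assumptions for illustration rather than details confirmed by the release itself.

```python
# Sketch: toggling Think vs Non-Think through the chat template.
# Assumptions (not confirmed here): the repo id "deepseek-ai/DeepSeek-V3.1"
# and a boolean "thinking" variable exposed by the template file.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V3.1")

messages = [{"role": "user", "content": "Summarise the V3.1 release in one sentence."}]

# Think mode: the template emits the control tokens that open a chain-of-thought block.
think_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, thinking=True
)

# Non-Think mode: same model and template file, different control tokens.
direct_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, thinking=False
)

print(think_prompt)
print(direct_prompt)
```

Comparing the two rendered prompts makes the mechanism visible: the model weights do not change, only the control tokens placed before the assistant turn.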

Better Performance, Larger Context Window

DeepSeek said the new V3.1-Think checkpoint matches the answer quality of its earlier R1 reasoning checkpoint while responding faster, yielding lower latency for Think-mode responses.


The release emphasises improved agent capabilities: post-training upgrades targeted specifically at tool-calling, better programmatic tool usage, and stronger performance on agent tasks such as search and tool orchestration.
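The announcement does not spell out the calling convention, but tool use of this kind is commonly exposed through an OpenAI-compatible chat endpoint. The sketch below assumes that shape; the base URL, the model name "deepseek-chat" and the `web_search` tool schema are illustrative assumptions, not details from the release.

```python
# Illustrative sketch of OpenAI-style tool calling against a hosted endpoint.
# Assumptions (not from the article): the base URL, the model name and the
# tool schema below are invented for illustration.
from openai import OpenAI

client = OpenAI(api_key="YOUR_KEY", base_url="https://api.deepseek.com")

tools = [{
    "type": "function",
    "function": {
        "name": "web_search",  # hypothetical tool for an agent-style search task
        "description": "Search the web and return the top results.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

response = client.chat.completions.create(
    model="deepseek-chat",  # assumed model identifier
    messages=[{"role": "user", "content": "Find the latest DeepSeek release notes."}],
    tools=tools,
)

# If the model decides to call the tool, the call arrives as structured JSON arguments.
print(response.choices[0].message.tool_calls)
```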

A major capacity upgrade in V3.1 is long-context support: the model expands the context window to 128,000 tokens.

DeepSeek’s model page and release notes describe a two-phase long-context extension pipeline used to build the V3.1-Base variant, with substantially enlarged extension datasets. According to the release and technical notes, the 32,000-token extension phase was enlarged about tenfold to roughly 630 billion tokens, and the 128,000-token extension phase was increased about 3.3× to about 209 billion tokens.

On architecture and scale, DeepSeek-V3 is a Mixture-of-Experts (MoE) family: the V3/V3.1 lineage totals approximately 671 billion parameters with roughly 37 billion activated parameters per token, as described in the technical report and GitHub materials.
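To make the total-versus-activated distinction concrete, the toy top-k Mixture-of-Experts layer below routes each token to only a small subset of experts, so only a fraction of the layer’s weights participate in any single forward pass. It is a generic illustration, not DeepSeek’s DeepSeekMoE implementation.

```python
# Toy top-k MoE layer, only to illustrate "total vs activated parameters";
# this is NOT DeepSeek's DeepSeekMoE code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=16, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):
        # x: (tokens, d_model). The router scores every expert for every token.
        scores = F.softmax(self.router(x), dim=-1)
        weights, idx = scores.topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for t in range(x.size(0)):
            for w, e in zip(weights[t], idx[t]):
                # Only top_k of n_experts actually run for this token.
                out[t] = out[t] + w * self.experts[int(e)](x[t])
        return out

layer = ToyMoE()
print(layer(torch.randn(3, 64)).shape)  # torch.Size([3, 64])
```

All 16 experts contribute to the layer’s parameter count, but each token touches only two of them, which is the same principle behind 671 billion total versus roughly 37 billion activated parameters per token.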

The company has published DeepSeek-V3.1 and a V3.1-Base checkpoint for download, with model files and checkpoints listed on Hugging Face and mirrored repositories.
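For readers who want to pull those assets locally, a minimal sketch using the Hugging Face Hub client follows. The repository id "deepseek-ai/DeepSeek-V3.1" is assumed from the naming in the release, and the full weights are far larger than the config and tokenizer files fetched here.

```python
# Sketch: fetching the published configs and templates for local inspection.
# The repo id is an assumption; check the model card for the exact name and
# weight format before downloading the full checkpoint.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="deepseek-ai/DeepSeek-V3.1",
    allow_patterns=["*.json", "*.py", "tokenizer*"],  # configs and templates only; weights are very large
)
print("Downloaded to:", local_dir)
```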


Developer Guidance

The announcement also includes practical developer guidance. The model card outlines chat templates and control tokens (tokenizer and template files are provided), explains how to switch modes by changing templates, and notes API compatibility and interface updates aimed at easing integration, including support for an Anthropic-style API format.
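A hedged sketch of what that Anthropic-style interface might look like from the official anthropic Python SDK is shown below; the endpoint URL and model identifier stand in as assumptions rather than documented values, so consult DeepSeek’s API docs for the real ones.

```python
# Sketch of the Anthropic-style interface mentioned in the developer guidance.
# The base URL and model name are assumptions for illustration only.
import anthropic

client = anthropic.Anthropic(
    api_key="YOUR_DEEPSEEK_KEY",
    base_url="https://api.deepseek.com/anthropic",  # assumed Anthropic-compatible endpoint
)

message = client.messages.create(
    model="deepseek-chat",  # assumed model identifier
    max_tokens=512,
    messages=[{"role": "user", "content": "What changed in DeepSeek-V3.1?"}],
)
print(message.content[0].text)
```

The practical appeal of such compatibility is that existing tooling written against the Anthropic message format can point at a different base URL without code changes.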

Reuters reporting and DeepSeek’s docs additionally note that the company will adjust API pricing for model usage effective 6 September 2025.

DeepSeek positioned V3.1 relative to its earlier R1 reasoning releases: R1 (for example, DeepSeek-R1-0528) emphasised improved reasoning via cold-start data and supported structured outputs and function calling.

V3.1 presents a hybrid that narrows the gap by offering a Think mode intended to match R1’s reasoning quality while lowering latency. The company’s GitHub and API pages link the R1 checkpoints and usage notes alongside the new V3.1 resources.


The technical report on arXiv and the project’s GitHub repository provide deeper detail: the V3 paper describes the MoE architecture, multi-head latent attention (MLA) and DeepSeekMoE designs, a multi-token prediction objective, and pre-training on an expanded token corpus that DeepSeek says achieves strong performance while using an efficient training budget.

Model cards on Hugging Face, linked GitHub repos and API docs together host the practical assets developers need to run local or hosted instances and to test the new templates and long-context features.
