DeepSeek has launched its latest model, DeepSeek V3.1, a hybrid model that supports both thinking and non-thinking modes. With it comes an update to API pricing and the end of discounted hours.
The open-weight model is available for download on Hugging Face, and DeepSeek has updated its official API to use the new V3.1 model, replacing the previous V3 and R1 models. You can also try the model on DeepSeek’s chat interface.
DeepSeek V3.1 – A Hybrid Model
DeepSeek’s V3 (non-thinking) and R1 (thinking) LLMs were a great choice for AI roleplay thanks to their affordable pricing, discounted off-peak hours, quality creative writing, and limited free access through proxy services like OpenRouter.
And while you can still use V3 and R1 from other providers, DeepSeek has replaced the previous models with its new V3.1 model on its official API.
Also Read: DeepSeek R1 vs. V3 – Which Is Better For AI Roleplay?
The new hybrid model supports both thinking and non-thinking modes, has an increased context size of 128K tokens, and is faster and consumes fewer tokens while thinking. It’s also optimized for tool usage and agent tasks.
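In practice, the two modes are selected per request. As of this writing, DeepSeek’s documentation maps the model name `deepseek-chat` to V3.1’s non-thinking mode and `deepseek-reasoner` to its thinking mode on the OpenAI-compatible API; the sketch below builds a request body for each mode (verify the model names against the current API reference before relying on them):

```python
# Sketch: selecting DeepSeek V3.1's two modes on the official API.
# The model names are taken from DeepSeek's docs ("deepseek-chat" = V3.1
# non-thinking, "deepseek-reasoner" = V3.1 thinking) and may change.

def build_request(prompt: str, thinking: bool) -> dict:
    """Return the JSON body for a chat completion in the chosen mode."""
    return {
        "model": "deepseek-reasoner" if thinking else "deepseek-chat",
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

# The payload shape is identical for both modes; only the model name changes.
roleplay_turn = build_request("Describe the tavern scene.", thinking=False)
reasoned_turn = build_request("Describe the tavern scene.", thinking=True)
print(roleplay_turn["model"], reasoned_turn["model"])
```

Because both names point at the same underlying V3.1 weights, switching modes mid-session is just a matter of changing the model string.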
Replaces V3 and R1 – What Does This Mean For AI Roleplay?
Since the model has only been available for a few hours, people are still forming their initial impressions of DeepSeek V3.1 for AI roleplay. So far, V3.1’s non-thinking mode seems to retain V3’s characteristics and behavior while slightly improving instruction following, and its thinking mode likewise retains R1’s characteristics and behavior.
In the coming days, prompts and presets optimized for DeepSeek V3.1 will be available, allowing us to accurately judge the new model’s performance in AI roleplay. However, we don’t expect significant changes in how the new model behaves compared to the previous V3 and R1 models.
Also Read: Gemini API Ban Wave – AI Roleplay And Google’s API Policies
DeepSeek V3.1 is built upon the original DeepSeek V3, so it’s not an entirely new model. It has shown significant improvements in benchmarks so far, but for AI roleplay it behaves very similarly to the previous models.
New API Pricing Effective 5th September
The launch of DeepSeek V3.1 also brings an update to DeepSeek’s official API pricing. Since it no longer offers V3 and R1 as separate models, DeepSeek has unified the pricing across V3.1’s non-thinking and thinking modes.
Also Read: Understanding Tokens And Context Size
The new pricing comes into effect on September 5th, 2025:

- $0.07 per 1 million input tokens on a cache hit (repeated tokens already in the input cache)
- $0.56 per 1 million input tokens on a cache miss (new tokens)
- $1.68 per 1 million output tokens
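With only three rates, estimating a bill is simple arithmetic. A minimal sketch using the figures quoted above (the example token counts are illustrative, not real usage data):

```python
# Estimate a bill under DeepSeek's unified V3.1 pricing, in USD per
# 1 million tokens, using the rates effective September 5th, 2025.
RATE_CACHE_HIT = 0.07   # input tokens served from the input cache
RATE_CACHE_MISS = 0.56  # input tokens processed fresh
RATE_OUTPUT = 1.68      # generated tokens

def estimate_cost(cache_hit: int, cache_miss: int, output: int) -> float:
    """Cost in USD for the given token counts."""
    return (cache_hit * RATE_CACHE_HIT
            + cache_miss * RATE_CACHE_MISS
            + output * RATE_OUTPUT) / 1_000_000

# Illustrative turn: a mostly cached long-roleplay prompt plus a short reply.
print(f"${estimate_cost(cache_hit=90_000, cache_miss=2_000, output=1_500):.4f}")
```

Note how the cache-hit rate dominates long sessions: the 90,000 cached tokens above cost far less than the 1,500 output tokens.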
DeepSeek will also stop offering discounted hours (off-peak pricing) starting September 5th. Despite the pricing changes, it remains one of the most affordable first-party APIs for accessing LLMs as powerful and capable as DeepSeek V3.1.
What Is DeepSeek’s Input Token Cache?
The official DeepSeek API has a nifty setting enabled by default that caches your input tokens.
- Permanent tokens (character definition, scenario, instructions, system/custom prompts, etc.) that you send to DeepSeek along with your first message to your AI character are cached.
- DeepSeek processes new tokens you send (new messages) and saves them in its cache. Unless any previously sent tokens are modified, they stay cached.
- DeepSeek recognizes cached tokens with each new input and charges a significantly lower price for processing them.
The cached input tokens feature is a blessing for those who enjoy long roleplays and constantly swipe to generate new responses. It significantly lowers the cost of processing duplicate input.
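The savings compound over a session: each new turn re-sends the whole conversation, but everything already sent bills at the cache-hit rate. A rough simulation of that effect (the turn sizes and the assumption that the full prefix always stays cached are idealizations, not measured behavior):

```python
# Rough sketch of how input caching cuts cost over a roleplay session.
# Assumes every previously sent token stays cached between turns, which
# matches the behavior described above but is an idealization.
HIT, MISS = 0.07, 0.56  # USD per 1M input tokens (cache hit / cache miss)

def input_cost(turn_sizes: list[int], cached: bool) -> float:
    """Total input cost in USD when each turn re-sends all prior tokens."""
    total, prefix = 0.0, 0
    for new_tokens in turn_sizes:
        if cached:
            # Prior tokens bill at the hit rate, new ones at the miss rate.
            total += (prefix * HIT + new_tokens * MISS) / 1_000_000
        else:
            # Without caching, the entire prompt bills at the miss rate.
            total += (prefix + new_tokens) * MISS / 1_000_000
        prefix += new_tokens
    return total

# Illustrative session: a large character/system prompt, then 50 short turns.
turns = [4_000] + [300] * 50
print(f"cached:   ${input_cost(turns, cached=True):.3f}")
print(f"uncached: ${input_cost(turns, cached=False):.3f}")
```

The longer the session, the larger the cached prefix grows relative to each new message, so the gap between the two totals keeps widening.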
Cost Of Using DeepSeek Official API
In September 2025, we used 798,963 tokens with DeepSeek V3.1’s non-thinking mode and 1,440,232 tokens with its thinking mode, sending a total of 398 messages to our AI characters. It cost us $0.61.

DeepSeek’s Pricing Compared To Others
All prices below are in USD per 1 million tokens.

| Model | Input (Cache Hit) | Input (Cache Miss) | Output |
|---|---|---|---|
| DeepSeek V3.1 | $0.07 | $0.56 | $1.68 |
| GPT-5 | $0.125 | $1.25 | $10 |
| GPT-5 mini | $0.025 | $0.25 | $2 |
| Claude Opus 4.1 | $1.50 | $15 | $75 |
| Claude Sonnet 4 | $0.30 | $3 | $15 |
| Grok 4 | $0.75 | $3.00 | $15 |
| Grok 3 | $0.75 | $3.00 | $15 |
| Grok 3 mini | $0.075 | $0.30 | $0.50 |
| Gemini 2.5 Pro | $0.31 | $1.25 | $10 |
| Gemini 2.5 Flash | $0.075 | $0.30 | $2.50 |
| Mistral Large | $2 | $2 | $6 |
| Mistral Medium 3 | $0.4 | $0.4 | $2 |
| Mistral Small 3.2 | $0.1 | $0.1 | $0.3 |
DeepSeek V3.1, A Step Ahead
DeepSeek’s latest model, V3.1, is a hybrid that supports both thinking and non-thinking modes. It demonstrates significant improvements in benchmarks and is a step toward DeepSeek’s future models.
The initial impression of DeepSeek V3.1’s performance in AI roleplay does not indicate any major change from the previous V3 and R1 models. We will be able to accurately judge the new model’s performance once optimized prompts and presets are available.
Along with the new model, there are changes to the official API’s pricing and the end of discounted hours. However, DeepSeek remains one of the most affordable options for AI roleplay compared to other similarly capable models.