During AI roleplay, every message you send to the LLM is one big prompt that includes the character definition, scenario, system or custom prompts, conversation history, and more. As your conversation with the LLM progresses, the amount of repetitive input also increases.
What Are Input Tokens?
When you roleplay with AI, you’re interacting with an LLM that converts your input into tokens, analyzes the context, and predicts the response one token at a time. It doesn’t communicate like a human; it follows patterns learned from training to generate the most probable response.
Learn More: Understanding Tokens And Context Size
The frontend you use structures each message you send to the LLM as one big prompt. This prompt includes:
- Permanent tokens that always remain within the context window, such as character definition/personality, scenario, system/custom prompts, etc.
- Temporary tokens that eventually exit the context window, such as the starting message, example dialogues, conversation history, etc.
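To make that concrete, here’s a minimal sketch in Python of how a frontend might assemble the prompt each turn. The function and example values are hypothetical and far simpler than SillyTavern’s real prompt builder, but the structure is the same: permanent pieces at the top, conversation history and your newest message after them.

```python
# Minimal sketch of per-turn prompt assembly (illustrative only, not SillyTavern's code).

def build_prompt(system_prompt, character_card, scenario, chat_history, user_message):
    """Concatenate the permanent pieces, then the chat history, then the new message."""
    permanent = [system_prompt, character_card, scenario]     # resent at the top every turn
    temporary = chat_history + [f"User: {user_message}"]      # grows until it no longer fits
    return "\n\n".join(permanent + temporary)

history = ["User: Hello!", "Aria: *waves* Welcome in."]
prompt = build_prompt(
    system_prompt="You are roleplaying as Aria.",             # hypothetical example values
    character_card="Aria is a cheerful librarian who...",
    scenario="A quiet library on a rainy evening.",
    chat_history=history,
    user_message="What are you reading?",
)
print(prompt)  # the whole string is tokenized and billed as Input Tokens, not just the last line
```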
Many people think only their most recent reply counts as Input Tokens, but that’s not true. Input Tokens include everything in the prompt you send to the LLM. You can control how many Input Tokens are sent by adjusting the Context Size setting in your frontend.
DeepSeek’s Input Tokens Cache
When you send the first message in a new chat, all the information is new to the LLM. It receives details such as the character definition, scenario, and system or custom prompts, along with your first message.

DeepSeek caches your initial input. When you send a second message, DeepSeek detects the repeated tokens in your prompt, retrieves them from its cache, and charges you a much lower price for processing them.

As your conversation progresses, DeepSeek processes your new tokens (the content of your latest message) and saves them in its cache. Unless any previously sent tokens are modified, they remain cached. With each new message, DeepSeek recognizes cached tokens in your prompt and charges a much lower price for processing them.

DeepSeek charges $0.028 per million tokens for Input Tokens it reads from its cache (cache hits), while it charges $0.28 per million tokens for new Input Tokens (cache misses).
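To see what that price gap means in practice, here’s a back-of-the-envelope calculation. The token counts are made up for illustration; only the two per-million prices come from the pricing above.

```python
# Rough cost comparison for one message, using the prices quoted above.
# The prompt and cache-hit sizes below are hypothetical.

CACHE_HIT_PRICE = 0.028 / 1_000_000    # USD per cached input token
CACHE_MISS_PRICE = 0.28 / 1_000_000    # USD per new input token

prompt_tokens = 10_000                 # full prompt sent with one message
cached_tokens = 9_500                  # repeated prefix served from the cache
new_tokens = prompt_tokens - cached_tokens

with_cache = cached_tokens * CACHE_HIT_PRICE + new_tokens * CACHE_MISS_PRICE
without_cache = prompt_tokens * CACHE_MISS_PRICE

print(f"with cache:    ${with_cache:.6f}")     # ~$0.000406
print(f"without cache: ${without_cache:.6f}")  # ~$0.002800
```

In this made-up example, the message costs roughly a seventh of what it would if every Input Token were billed as new.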
Additionally, when you reroll or regenerate a response, you send the same prompt again, so there are no new Input Tokens to process. DeepSeek serves the repeated tokens from its cache, making your swipes and regenerations significantly less expensive.
How Long Do Input Tokens Remain Cached?
We used SillyTavern during our tests, so your results may vary depending on the frontend you use. We paused our roleplay, closed SillyTavern, shut down our computer, and resumed our roleplay after approximately 12 hours.

Our Input Tokens remained cached for about 12 hours. The official documentation states that “unused cache entries are automatically cleared, typically within a few hours to days.” There’s no specific retention time mentioned, but your Input Tokens should remain cached even if you step away from your chat for a few hours.
We even branched our chat in SillyTavern, and the Input Tokens remained cached when we continued the other branch.
Does Deleting Messages Affect The Input Tokens Cache?
We sent four messages and received four responses from the LLM, then deleted those eight messages from our ongoing chat before sending a new message. Deleting these eight most recent messages didn’t affect DeepSeek’s Input Tokens cache.

However, deleting older messages from the middle of the conversation affected the cache. The content before the deleted messages remained cached, but DeepSeek treated everything after them as new input.
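A simple way to picture this is prefix matching: everything up to the first change in your prompt still counts as a cache hit, and everything after it is billed as new input. The sketch below is our own simplified model of the behavior we observed, not DeepSeek’s actual caching code.

```python
# Simplified model of the prefix-based behavior we observed (illustrative only).

def split_cache_hits(previous_prompt, new_prompt):
    """Return (cached, new) counts based on the longest shared prefix of two token lists."""
    shared = 0
    for old_tok, new_tok in zip(previous_prompt, new_prompt):
        if old_tok != new_tok:
            break
        shared += 1
    return shared, len(new_prompt) - shared

old_prompt = ["sys", "card", "msg1", "msg2", "msg3", "msg4"]
new_prompt = ["sys", "card", "msg1", "msg3", "msg4", "msg5"]  # "msg2" deleted from the middle

print(split_cache_hits(old_prompt, new_prompt))  # (3, 3): only the part before the deletion stays cached
```

The same prefix logic explains the lorebook and context size results below.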

DeepSeek’s Input Tokens Cache And Lorebooks
Just as deleting recent messages didn’t affect the Input Tokens cache, lorebook entries inserted at the bottom of the prompt (depth 0) didn’t affect it either.

However, lorebook entries inserted before or after the character definition affected the Input Tokens cache. The content before the inserted entry remained cached, but DeepSeek treated everything after it as new input.

Inserting lorebooks at the bottom of the prompt is the most effective way to use lorebooks with DeepSeek’s Input Tokens cache.
DeepSeek’s Input Tokens Cache And Context Size
During our testing, we used a context size of 8,192. When we reached the context size limit, SillyTavern started removing the earliest chat messages from the context window, as intended. However, this impacted the Input Tokens cache, since DeepSeek treated all content after the removed chat messages as new input.

When you reach your context size limit, you don’t get as much benefit from DeepSeek’s Input Tokens cache. The permanent tokens at the top of your prompt remain cached, but because older chat messages are constantly removed, DeepSeek treats the majority of your prompt as new input.
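With some made-up numbers, you can see how sharply the cache hit rate can drop once the window is full. The split between permanent tokens and chat history below is hypothetical; the behavior it models is the same prefix matching described above.

```python
# Hypothetical numbers showing why a full context window hurts the cache.

context_size = 8_192
permanent_tokens = 1_500                           # character card, scenario, system prompt
history_tokens = context_size - permanent_tokens   # chat history filling the rest (6,692)

# Before the limit: last turn's prompt is still a matching prefix, so nearly
# the whole prompt hits the cache.
cached_before_limit = permanent_tokens + history_tokens

# After the limit: the oldest messages get trimmed, the history no longer
# matches the cached copy, and only the permanent tokens at the top still hit.
cached_after_limit = permanent_tokens

print(cached_before_limit, cached_after_limit)     # 8192 1500
```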
Also Read: Context Rot: Large Context Size Negatively Impacts AI Roleplay
You could consider increasing your context size for longer roleplays, but depending on your budget, exceeding a context size of 16,384 could make AI roleplay an expensive hobby.
DeepSeek’s Input Tokens Cache And AI Roleplay
DeepSeek’s Input Tokens Cache is a feature available through the first-party API that reduces the cost of processing duplicate Input Tokens, such as repeated instructions and chat history.
For AI roleplay, the Input Tokens Cache feature helps reduce costs. Since your prompts often repeat tokens, DeepSeek retrieves duplicates from its cache and charges far less to process them.
If you make significant changes to the start or middle of your prompt, like adding a lorebook entry or deleting earlier messages, DeepSeek treats all input from that point onward as new. Changes at the end of your prompt don’t affect the Input Tokens cache.
Once you reach your context size limit, the oldest chat messages start dropping out of the context window, and you lose much of the benefit of the Input Tokens cache because your prompt is constantly changing.







