    DeepSeek’s Input Tokens Cache And AI Roleplay
    Guides & Tips


    By Wayfarer · September 17, 2025 · Updated: December 7, 2025 · 5 Mins Read

    During AI roleplay, every message you send to the LLM is packaged into one big prompt that includes the character definition, scenario, system or custom prompts, conversation history, and more. As your conversation with the LLM progresses, the amount of repeated input grows with it.

    Table of Contents
    1. What Are Input Tokens?
    2. DeepSeek’s Input Tokens Cache
      1. How Long Do Input Tokens Remain Cached?
      2. Does Deleting Messages Affect The Input Tokens Cache?
      3. DeepSeek’s Input Tokens Cache And Lorebooks
      4. DeepSeek’s Input Tokens Cache And Context Size
    3. DeepSeek’s Input Tokens Cache And AI Roleplay

    What Are Input Tokens?

    When you roleplay with AI, you’re interacting with an LLM that converts your input into tokens, analyzes the context, and generates a response one token at a time. It doesn’t communicate like a human; it follows patterns learned from training to generate the most probable response.

    Learn More: Understanding Tokens And Context Size

    The frontend you use structures each message you send to the LLM as one big prompt. This prompt includes:

    • Permanent tokens that always remain within the context window, such as character definition/personality, scenario, system/custom prompts, etc.
    • Temporary tokens that eventually exit the context window, such as the starting message, example dialogues, conversation history, etc.

    Many people think only their most recent reply counts as Input Tokens, but that’s not true. Input Tokens include everything in the prompt you send to the LLM. You can control how many Input Tokens are sent by adjusting the Context Size setting in your frontend.
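
    To make this concrete, below is a minimal Python sketch of how a frontend might assemble that single prompt and trim it to your Context Size. The prompt fields, function names, and the rough four-characters-per-token estimate are illustrative assumptions, not SillyTavern’s actual implementation.

```python
# Illustrative sketch of a frontend assembling one big prompt per message.
# All names and the trimming heuristic are assumptions for illustration.

def approx_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: roughly 4 characters per token."""
    return max(1, len(text) // 4)

def build_prompt(system_prompt, char_definition, scenario, history, context_size):
    # Permanent tokens: always kept at the top of the prompt.
    permanent = [system_prompt, char_definition, scenario]
    budget = context_size - sum(approx_tokens(p) for p in permanent)

    # Temporary tokens: walk the history newest-first; the earliest
    # messages fall out of the context window once the budget runs out.
    kept = []
    for message in reversed(history):
        cost = approx_tokens(message)
        if cost > budget:
            break
        kept.append(message)
        budget -= cost

    return "\n".join(permanent + list(reversed(kept)))
```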

    DeepSeek’s Input Tokens Cache

    When you send the first message in a new chat, all the information is new to the LLM. It receives details such as character definition, scenario, and system or custom prompts, along with your response.

    [Image: Input tokens in the first message]

    DeepSeek caches your initial input. When you send a second message, DeepSeek detects the repeated tokens in your prompt, retrieves them from its cache, and charges you a much lower price for processing them.

    [Image: Input tokens in the second message]

    As your conversation progresses, DeepSeek processes your new tokens (the content of your latest message) and saves them in its cache. Unless any previously sent tokens are modified, they remain cached. With each new message, DeepSeek recognizes cached tokens in your prompt and charges a much lower price for processing them.

    [Image: Input tokens in the third message]
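
    The behavior we observed matches prefix caching: only the longest unchanged prefix of your prompt counts as cached, and everything after the first difference is billed as new input. Here is a toy Python model of that rule; representing tokens as short strings is a simplification for illustration, not DeepSeek’s actual implementation.

```python
# Toy model of prefix caching: tokens are billed at the cache-hit rate
# only for the longest prefix shared with a previously sent prompt.

def split_cached_and_new(previous_prompt: list[str], current_prompt: list[str]):
    """Return (cached, new) token lists for the current prompt."""
    hit = 0
    for old, new in zip(previous_prompt, current_prompt):
        if old != new:
            break
        hit += 1
    return current_prompt[:hit], current_prompt[hit:]

turn_1 = ["<system>", "<char def>", "<scenario>", "Hello!"]
turn_2 = turn_1 + ["<reply 1>", "How are you?"]

cached, new = split_cached_and_new(turn_1, turn_2)
print(len(cached), "tokens from cache,", len(new), "processed as new input")
# -> 4 tokens from cache, 2 processed as new input
```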

    DeepSeek charges $0.028 per million tokens for processing tokens stored in its cache, while it charges $0.28 per million tokens for processing new Input Tokens.

    Additionally, when you reroll or regenerate responses, you are not sending any new Input Tokens. DeepSeek processes your request by retrieving information from its cache, making your swipes and regenerations significantly less expensive.
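
    A quick back-of-the-envelope calculation with those rates shows why this matters. The token counts below are hypothetical, and output tokens (which are billed separately) are left out:

```python
# Input cost per message at the rates quoted above:
# $0.28 per million new tokens, $0.028 per million cached tokens.
CACHE_HIT_RATE = 0.028 / 1_000_000   # USD per cached token
CACHE_MISS_RATE = 0.28 / 1_000_000   # USD per new token

def input_cost(cached_tokens: int, new_tokens: int) -> float:
    return cached_tokens * CACHE_HIT_RATE + new_tokens * CACHE_MISS_RATE

# A mid-roleplay message: ~7,900 repeated tokens, ~300 new ones.
print(f"${input_cost(7_900, 300):.6f}")   # -> $0.000305

# A reroll resends an identical prompt, so the input is all cache hits.
print(f"${input_cost(8_200, 0):.6f}")     # -> $0.000230
```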

    How Long Do Input Tokens Remain Cached?

    We used SillyTavern during our tests, so your results may vary depending on the frontend you use. We paused our roleplay, closed SillyTavern, shut down our computer, and resumed our roleplay after approximately 12 hours.

    [Image: DeepSeek input tokens cache after a 12-hour break]

    Our Input Tokens were still cached after roughly 12 hours. The official documentation states that “unused cache entries are automatically cleared, typically within a few hours to days.” No specific retention time is given, but your Input Tokens should remain cached even if you step away from your chat for a few hours.

    We even branched our chat in SillyTavern, and the Input Tokens remained cached when we continued the other branch.

    Does Deleting Messages Affect The Input Tokens Cache?

    We sent four messages and received four responses from the LLM, then deleted those eight messages from our ongoing chat before sending a new message. Deleting these eight recent messages didn’t affect DeepSeek’s Input Tokens Cache.

    [Image: DeepSeek input tokens cache before and after deleting recent messages]

    However, deleting previous chat messages from the middle of the conversation affected the cache. The content before the deleted messages remained cached, but DeepSeek treated everything after the deleted messages as new input.

    [Image: DeepSeek input tokens cache before and after deleting messages from the middle of the chat]
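
    The prefix model from earlier explains why: removing a message mid-chat shifts every token after it, so the shared prefix ends at the deletion point. A short self-contained sketch (the helper is repeated here so the snippet runs on its own):

```python
# Deleting a mid-chat message shortens the prefix shared with the cache.

def shared_prefix_len(old: list[str], new: list[str]) -> int:
    n = 0
    for a, b in zip(old, new):
        if a != b:
            break
        n += 1
    return n

before = ["<system>", "<char def>", "msg1", "msg2", "msg3", "msg4"]
after = ["<system>", "<char def>", "msg1", "msg3", "msg4"]  # msg2 deleted

print(shared_prefix_len(before, after))
# -> 3: only <system>, <char def>, and msg1 stay cached;
#    msg3 and msg4 are billed as new input again.
```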

    DeepSeek’s Input Tokens Cache And Lorebooks

    Just as deleting recent messages didn’t affect the Input Tokens cache, lorebook entries inserted at the bottom of the prompt (depth 0) left it untouched.

    [Image: DeepSeek input tokens cache before and after a lorebook entry at depth 0]

    However, lorebook entries inserted before or after the character definition affected the Input Tokens cache. The content before the inserted entries remained cached, but DeepSeek treated everything after them as new input.

    [Image: DeepSeek input tokens cache before and after a lorebook entry near the character definition]

    Inserting lorebook entries at the bottom of the prompt is the most effective way to use lorebooks with DeepSeek’s Input Tokens cache, as the sketch below illustrates.
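
    A minimal sketch of why placement matters, using the same illustrative prefix model (the entry names and prompt layout are assumptions):

```python
# Where a lorebook entry lands in the prompt decides how much of the
# cached prefix survives.

def shared_prefix_len(old, new):
    n = 0
    for a, b in zip(old, new):
        if a != b:
            break
        n += 1
    return n

base = ["<system>", "<char def>", "msg1", "msg2", "msg3"]

depth_0 = base + ["<lore entry>"]                  # appended at the bottom
near_top = base[:2] + ["<lore entry>"] + base[2:]  # right after the char def

print(shared_prefix_len(base, depth_0))   # -> 5: the whole old prompt stays cached
print(shared_prefix_len(base, near_top))  # -> 2: everything after the entry is new
```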

    DeepSeek’s Input Tokens Cache And Context Size

    During our testing, we used a context size of 8,192. When we reached the context size limit, SillyTavern started removing the earliest chat messages from the context window, as intended. However, this impacted the Input Tokens cache, since DeepSeek treated all content after the removed chat messages as new input.

    [Image: DeepSeek input tokens cache before and after reaching the context size limit]

    When you reach your context size limit, you don’t get as much benefit from DeepSeek’s Input Tokens cache. Some repeated tokens remain cached, but because older chat messages are constantly removed, DeepSeek treats the majority of your prompt as new input.
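
    Simulating this with the same toy model makes the collapse visible; the window size and message counts below are arbitrary:

```python
# Once the context window fills, the oldest messages are trimmed each
# turn, the prompt's prefix keeps shifting, and cache hits collapse.

WINDOW = 6                               # prompt capacity, in toy "tokens"
PERMANENT = ["<system>", "<char def>"]   # always kept at the top

def prompt_at_turn(turn: int) -> list[str]:
    history = [f"msg{i}" for i in range(turn)]
    return PERMANENT + history[-(WINDOW - len(PERMANENT)):]  # drop oldest

def shared_prefix_len(old, new):
    n = 0
    for a, b in zip(old, new):
        if a != b:
            break
        n += 1
    return n

for turn in range(4, 8):
    prev, cur = prompt_at_turn(turn - 1), prompt_at_turn(turn)
    print(f"turn {turn}: {shared_prefix_len(prev, cur)}/{len(cur)} tokens cached")
# turn 4: 5/6 tokens cached   (window not yet full)
# turn 5: 2/6 tokens cached   (trimming began; only permanent tokens hit)
# turn 6: 2/6 tokens cached
# turn 7: 2/6 tokens cached
```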

    Also Read: Context Rot: Large Context Size Negatively Impacts AI Roleplay

    You could consider increasing your context size for longer roleplays, but depending on your budget, exceeding a context size of 16,384 could make AI roleplay an expensive hobby.

    DeepSeek’s Input Tokens Cache And AI Roleplay

    DeepSeek’s Input Tokens Cache is a feature available through the first-party API that reduces the cost of processing duplicate Input Tokens, such as repeated instructions and chat history.

    For AI roleplay, the Input Tokens Cache feature helps reduce costs. Since your prompts often repeat tokens, DeepSeek retrieves duplicates from its cache and charges far less to process them.

    If you make significant changes to the start or middle of your prompt, like adding a lorebook entry or deleting earlier messages, DeepSeek treats all input from that point onward as new. Changes at the end of your prompt don’t affect the Input Tokens cache.

    Once you reach your context size limit, the oldest chat messages start dropping out of the context window, and you don’t get as much benefit from the Input Tokens cache because your prompt is constantly changing.


    Wayfarer is the founder of RPWithAI. He’s a former journalist who became interested in AI in 2023 and quickly developed a passion for AI roleplay. He enjoys medieval and fantasy settings, and his roleplays often involve politics, power struggles, and magic.
