Understanding Tokens And Context Size

Guides & Tips
By Wayfarer · July 18, 2025 (Updated: December 1, 2025) · 7 Mins Read

    Ever wondered why the prince you created with a detailed backstory and rich appearance turns into a generic medieval character after just a few messages? Or why the shy, clumsy barista who struggles with taking your order breaks character and isn’t the cute failure you want to roleplay with?

    If you’re roleplaying with AI and want a good experience with the characters you chat with and the worlds you build, you need to understand what tokens and context size are.

    Table of Contents
    1. What Are Tokens?
    2. What Is Context Size (Context Window)?
    3. Why Tokens And Context Size Matter In AI Roleplay
      1. Are Tokens The Same As Words?
      2. What Are Permanent Tokens and Temporary Tokens?
      3. Will An AI Character Be Better If It Has More Permanent Tokens?
      4. What Else Adds Tokens?
      5. Ideal Context Size (Context Window)
      6. How To Continue A Roleplay With Full Context
      7. Input and Output Tokens While Using API/Proxy

    What Are Tokens?

TLDR: When you roleplay with AI, you're interacting with an LLM that converts your input into tokens, analyzes the context, and generates its response one token at a time. It doesn't communicate like a human; it follows patterns learned during training to produce the most probable response.

    Explanation: When you roleplay with AI characters, you’re interacting with a Large Language Model (LLM). LLMs are trained on large amounts of text, which they process as tokens during training.

    When you send information to the LLM, such as character details or roleplay messages, it breaks that input into tokens. Then, using its training, it analyzes the context and predicts which output tokens to generate, forming a response one token at a time.

[Image: Token usage example]

    The LLM doesn’t talk to you like a human would. It relies on complex mathematical patterns to understand the context of your input and predict the most likely next token based on that context.

    The cute and clumsy barista you’re roleplaying with isn’t spontaneously deciding to drop the coffee cup. The LLM predicts that spilling coffee is the most appropriate response based on the information it receives from you and its training data.
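If you want to see how text actually splits into tokens, you can experiment with a tokenizer yourself. Below is a minimal sketch using OpenAI's open-source tiktoken library; the models used for AI roleplay each have their own tokenizers, so the exact splits and counts will differ from model to model.

```python
# A minimal look at tokenization using OpenAI's open-source tiktoken
# library. Other LLMs use their own tokenizers, so splits and counts vary.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

text = "The clumsy barista spills your coffee."
token_ids = enc.encode(text)

print(len(token_ids), "tokens")
print([enc.decode([t]) for t in token_ids])  # how the text was split
```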

    What Is Context Size (Context Window)?

    TLDR: Context size is your AI’s memory. Too low, and it forgets quickly or acts out of character. Too high, and it remembers too much, leading to incoherent responses.

Explanation: Context size, or context window, is the maximum number of tokens the LLM can hold and reference when generating new tokens. Think of it as your AI's memory; the context size determines how much your AI character can remember.

    As you roleplay with your AI character, you continually add new information to its memory. When the AI’s memory reaches its limit, it forgets the earliest memories to make room for new ones.

    So, even if you began your roleplay at a coffee shop, your AI character won’t remember meeting there later unless that detail remains within the context window.

    A low context size causes your AI character to forget information quickly and generate responses that seem out of character or irrelevant to your roleplay. However, an extremely high context size can cause your AI character to remember too much and generate incoherent responses.
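To make the memory analogy concrete, here is a toy sketch (not any platform's actual logic) of why your coffee-shop opening eventually falls out of an 8,192-token window. The token counts are invented for illustration.

```python
# A toy illustration of a context window as memory: the window holds only
# the most recent `context_size` tokens, so the earliest messages are the
# first to be forgotten. Token counts here are made up for the example.
context_size = 8192

# (message, token count) pairs, oldest first
history = [("*You first meet at the coffee shop.*", 120)]
history += [(f"message {i}", 90) for i in range(1, 200)]

# Walk backwards from the newest message, keeping whatever fits.
remaining, window = context_size, []
for msg, tokens in reversed(history):
    if tokens > remaining:
        break
    window.append(msg)
    remaining -= tokens

print("coffee shop opening still remembered?",
      "*You first meet at the coffee shop.*" in window)  # False
```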

    Why Tokens And Context Size Matter In AI Roleplay

    Every AI character consists of tokens, and your roleplay constantly produces new tokens for the LLM to process while generating responses. Your context size determines how many tokens the LLM considers relevant to the conversation.

    Also Read: Context Rot: Large Context Size Negatively Impacts AI Roleplay

    Tokens and context size are crucial to your roleplay experience because they influence how your AI character behaves and responds. If you’re casual and just enjoy chatting, they might not matter as much.

    But if you want a more immersive experience and to get the most out of your roleplay, you need to understand and optimize tokens and context size.

    Are Tokens The Same As Words?

    No, tokens aren’t the same as words. Tokens can be entire words, parts of words, punctuation, spaces, and sometimes entire common phrases. Additionally, different models process words into tokens differently, depending on the tokenization method used during training.

[Image: Tokens compared to words]

    For example, a character with 500 permanent tokens on JanitorAI might use between 580 and 600 tokens on another platform, depending on how that platform’s LLM processes your data.
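You can check this yourself by counting the same text with two different tokenizers. The sketch below uses two encodings that ship with the tiktoken library (assuming a reasonably recent version); platforms built on other models will differ again.

```python
# Same text, two different tokenizers: the token counts don't match.
# Both encodings ship with recent versions of tiktoken; other platforms'
# tokenizers differ again.
import tiktoken

text = "A shy, clumsy barista fumbles your order."

for name in ("cl100k_base", "o200k_base"):
    enc = tiktoken.get_encoding(name)
    print(name, "->", len(enc.encode(text)), "tokens")
```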

    What Are Permanent Tokens and Temporary Tokens?

    When you roleplay with an AI character, certain essential information must always remain within the context window. This essential information, called permanent tokens, is sent to the AI with every message.

    Examples of permanent tokens are Character Personality, Character Description, and Scenario. The platform or site you use for AI roleplay determines what qualifies as permanent tokens.

    In comparison, temporary tokens are pieces of information that will exit the context window during your roleplay to make room for new details. Examples of temporary tokens include the Initial or Starting Message, Example Dialogues, and your back-and-forth messages with the AI.
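Putting the two together, here is a rough, hypothetical sketch of how a prompt might be assembled each turn: permanent tokens are always included, while temporary tokens are trimmed oldest-first to fit the window. The field names and the word-based token counter are stand-ins, not any specific platform's format.

```python
# A rough, hypothetical sketch of per-turn prompt assembly: permanent
# tokens are always sent, temporary tokens are trimmed oldest-first.
def count_tokens(text: str) -> int:
    return len(text.split())  # crude stand-in for a real tokenizer

def build_prompt(permanent: str, history: list[str], context_size: int) -> str:
    budget = context_size - count_tokens(permanent)
    kept = list(history)
    while kept and sum(count_tokens(m) for m in kept) > budget:
        kept.pop(0)  # example dialogues and oldest messages fall out first
    return permanent + "\n\n" + "\n".join(kept)

permanent = "Personality: shy, clumsy barista.\nScenario: a small coffee shop."
history = ["*The barista waves nervously.*", "One latte, please!"]
print(build_prompt(permanent, history, context_size=8192))
```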

    Will An AI Character Be Better If It Has More Permanent Tokens?

Not necessarily. Permanent tokens should stay relevant to your AI character. Include traits and information that define the character, giving the LLM tokens it can use to make the AI feel unique and more human-like.

    Adding useless permanent tokens to your character won’t improve it. Permanent tokens that aren’t relevant to the roleplay might cause the LLM to overlook important details of your character.

    Also Read: Understanding Lorebooks In AI Roleplay

    For example, your cute and clumsy barista character can have permanent tokens that describe their work uniform if you plan to return to the coffee shop often. But if your roleplay shifts away from the coffee shop, you can include the uniform description in the initial message instead.

    Alternatively, if your platform supports the lorebooks feature, you can add details about your character’s work uniform and the coffee shop as a lorebook entry.

    What Else Adds Tokens?

Your AI character details and chat messages aren't the only things that generate tokens. Advanced prompts, your persona description and details, and other instructions added by the platform you're using also count toward the tokens the LLM must process.

    Ideal Context Size (Context Window)

    The ideal context window is between 8,192 and 16,384 tokens. If you’re using an online platform, it sets the maximum context size you can utilize. When running an LLM locally, your hardware and the model’s capabilities determine how large your context size can be.

    A larger context size doesn’t always improve the roleplay experience. As the context window expands, the LLM has more tokens to consider when generating responses. If there’s too much information, the LLM has to decide what to skip and what to reference, which can slow down response times and cause responses to become incoherent.

    How To Continue A Roleplay With Full Context

    A full context window doesn’t end your roleplay. As older messages drop out of the context, newer ones stay available to the LLM, and it keeps generating responses relevant to the roleplay.

    You can use features like Summarize (in SillyTavern) or Chat Memory (on JanitorAI) to generate a summary of all previous chat messages. The summary will be available to your LLM as permanent tokens until a new one is generated.

    In addition to creating a summary, you can manually add information and adjust what the AI should remember during extended roleplay sessions. If your platform has a Summarize or Chat Memory feature, make good use of it so your AI character doesn’t forget important details, even after 250 messages.
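The idea behind these features, heavily simplified, looks something like the sketch below: once the history outgrows a chosen limit, the oldest messages are replaced by a summary that then accompanies every request. The summarize() placeholder stands in for an LLM call or a summary you write yourself.

```python
# The idea behind Summarize / Chat Memory, heavily simplified: replace the
# oldest messages with a summary once the history outgrows a limit. The
# summary then rides along with every request, like permanent tokens.
def summarize(messages: list[str]) -> str:
    # Placeholder: in practice this is an LLM call or a hand-written summary.
    return f"[Summary of {len(messages)} earlier messages]"

def compact_history(history: list[str], keep_recent: int) -> list[str]:
    if len(history) <= keep_recent:
        return history
    older, recent = history[:-keep_recent], history[-keep_recent:]
    return [summarize(older)] + recent

chat = [f"message {i}" for i in range(250)]
print(compact_history(chat, keep_recent=50)[0])  # "[Summary of 200 earlier messages]"
```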

    Input and Output Tokens While Using API/Proxy

The total number of tokens you input and generate matters if you use an API or proxy service to access powerful LLMs for your AI roleplay. These services charge based on token usage, so a high context size can become expensive.

If your API or proxy service provider offers an input token caching feature, use it. DeepSeek's official API includes this feature (covered in our article DeepSeek's Input Tokens Cache And AI Roleplay), and it significantly lowers the cost of processing input tokens.
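As a back-of-the-envelope example of why caching matters, here is a small cost estimate. The per-token rates below are placeholders, not any provider's actual pricing; check your provider's current rate card before budgeting.

```python
# A back-of-the-envelope cost estimate. The per-token rates below are
# placeholders, not any provider's real pricing; cached input tokens are
# typically billed at a steep discount compared to fresh input tokens.
PRICE_INPUT = 0.30 / 1_000_000    # $/token for uncached input (hypothetical)
PRICE_CACHED = 0.08 / 1_000_000   # $/token for cache hits (hypothetical)
PRICE_OUTPUT = 1.20 / 1_000_000   # $/token for output (hypothetical)

def message_cost(input_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    fresh = input_tokens - cached_tokens
    return (fresh * PRICE_INPUT + cached_tokens * PRICE_CACHED
            + output_tokens * PRICE_OUTPUT)

# A long roleplay turn: 12,000 input tokens, 10,000 of them cache hits.
print(f"${message_cost(12_000, 10_000, 500):.5f} per message")
```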

Wayfarer
    Wayfarer is the founder of RPWithAI. He’s a former journalist who became interested in AI in 2023 and quickly developed a passion for AI roleplay. He enjoys medieval and fantasy settings, and his roleplays often involve politics, power struggles, and magic.
