Running LLMs locally requires a desktop or laptop with decent hardware. Those who have a gaming or productivity rig with a dedicated GPU can run small to medium models locally without breaking a sweat.
If you don’t have a dedicated GPU, or have one with 6GB of VRAM or less, running even small models locally at an acceptable quant can be challenging. But that doesn’t mean you should give up on KoboldCpp and the privacy benefits that come with it.
KoboldCpp On Google Colab
Google Colab is a cloud-based service that offers limited free access to computing resources, including GPUs. You can run KoboldCpp on Google Colab and use 24B or smaller models at Q4_K_S or IQ4_XS quantization for AI roleplay.
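To get a rough sense of whether a model and quant combination will fit on Colab's free-tier GPU (typically a 16GB T4), you can estimate the size of the quantized weights from the parameter count and the average bits per weight. The bits-per-weight figures below are approximate averages for each quant type, not exact values:

```python
# Approximate average bits per weight for common GGUF quant types.
# These are rough figures; actual file sizes vary by model architecture.
BITS_PER_WEIGHT = {
    "Q4_K_S": 4.58,
    "IQ4_XS": 4.25,
}

def model_size_gb(params_billions: float, quant: str) -> float:
    """Approximate size of the quantized model weights in GB."""
    bits = BITS_PER_WEIGHT[quant]
    return params_billions * 1e9 * bits / 8 / 1e9

# A 24B model at Q4_K_S needs roughly 13.7 GB for weights alone,
# leaving little headroom for the KV cache on a 16 GB T4.
print(round(model_size_gb(24, "Q4_K_S"), 1))
```

This is why 24B is about the practical ceiling on the free tier: the weights consume most of the VRAM, and the remainder has to hold the context's KV cache.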

Google offers Colab for machine learning, research, and educational purposes. The service is free, but resources are not guaranteed. Depending on available resources, your instance might disconnect, or you may not be able to use Colab.
If you want a reliable and low-cost alternative to access dedicated computing resources for AI roleplay, learn how to run KoboldCpp on Runpod.
How To Run KoboldCpp On Google Colab
It’s easy to run KoboldCpp on Google Colab using KoboldCpp’s Notebook.
- Open KoboldCpp’s Notebook.
- Choose a Model from the dropdown menu or enter the URL of a GGUF model from Hugging Face.
- Do not change the number of Layers. The default setting offloads all model layers onto the GPU.
- Choose your Context Size from the drop-down menu or enter your own value. For larger models (15B or bigger), setting a Context Size over 8,192 may not work, depending on available resources.
- Keep Flash Attention enabled (default setting). Don’t disable it unless necessary.
- Enable Multiplayer only if you want to share your session with others.
- Keep Delete Existing Models enabled (default setting).
- You can optionally load Image and Speech generation models. Keep in mind that these models also require computing resources. You might not be able to use all the models you want if they exceed the resources Colab offers for free.
- Enable Allow Save To Google Drive only if you want to save data from the KoboldAI Lite frontend. If you are using a different frontend that saves your conversations, like SillyTavern, you don’t need to turn this option on.
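To see why a large Context Size can fail on bigger models, it helps to estimate the KV cache footprint, which grows linearly with context length. The formula below is the standard keys-plus-values calculation; the layer and head dimensions are illustrative values for a hypothetical mid-size model, not taken from any specific one:

```python
def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                context: int, bytes_per_elem: int = 2) -> float:
    """KV cache size in GB: keys and values stored for every layer and token."""
    return 2 * n_layers * n_kv_heads * head_dim * context * bytes_per_elem / 1e9

# Illustrative dimensions (hypothetical model): 48 layers, 8 KV heads,
# head dimension 128, FP16 cache (2 bytes per element).
print(round(kv_cache_gb(48, 8, 128, 8192), 2))   # ~1.61 GB at 8,192 context
print(round(kv_cache_gb(48, 8, 128, 16384), 2))  # doubles at 16,384 context
```

If the weights already fill most of the GPU, even an extra gigabyte or two of KV cache can push the allocation over the limit, which is why lowering the Context Size is often the quickest fix.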

Once you finish configuring the options, click the play button and wait for Colab to complete setting up your virtual machine. You can scroll down to view the logs and progress.

Once setup is complete, Colab will provide you with Cloudflare tunnel links to access your KoboldCpp instance. You can use these links in your frontend to connect to KoboldCpp’s API.
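If you'd rather script against the API than use a frontend, you can send requests to KoboldCpp's KoboldAI-compatible generate endpoint through the tunnel link. The URL below is a placeholder, and the sampler settings are just example values; substitute the tunnel link Colab prints for you:

```python
import json
import urllib.request

# Placeholder: replace with the Cloudflare tunnel link Colab gives you.
API_URL = "https://your-tunnel.trycloudflare.com"

def build_generate_request(prompt: str, max_length: int = 200) -> urllib.request.Request:
    """Build a POST request for KoboldCpp's /api/v1/generate endpoint."""
    payload = {"prompt": prompt, "max_length": max_length, "temperature": 0.7}
    return urllib.request.Request(
        f"{API_URL}/api/v1/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

# Sending the request requires a live session, so it is commented out here:
# with urllib.request.urlopen(build_generate_request("Hello,")) as resp:
#     print(json.load(resp)["results"][0]["text"])
```

Any frontend that speaks the KoboldAI API, such as SillyTavern, does essentially this under the hood when you paste the tunnel link into its connection settings.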

You will need to keep the Colab page open and complete any CAPTCHA if prompted. Google will shut down your virtual machine if you fail to complete the CAPTCHA.
Incomplete Setup
If your logs end with “Could not load text model,” then there is something wrong with your configuration. Either the Context Size you chose was too high and there wasn’t enough VRAM available to allocate for KV Cache, or the model size/quant was too large.

Refer to the logs to identify what went wrong, update your configuration, and try again.
Terminate Your Virtual Machine
To terminate the virtual machine Colab set up for you, select the drop-down menu for Additional connection options > Manage sessions > Terminate current session (by clicking the trash icon).
Alternatively, you can click the stop button (where the play button was previously). This only stops your session and can be used to change your model or adjust other settings. It does not terminate your virtual machine.
Privacy While Using Google Colab
Although you are not running models “locally” when using Google Colab, you still have more control over your data compared to using cloud providers to access LLMs. KoboldCpp’s Notebook does not log any prompts or generations.
Google can see the model you are using and the resources it consumes. Once you terminate your session, by default, Colab deletes the machine and all its data. Google also has little incentive to monitor and store your prompts and generations, since doing so would consume more resources than it is worth.
However, it is still a cloud service, and 100% privacy cannot be guaranteed. Avoid sharing any personally identifiable information in your conversations, and adhere to Google Colab’s Terms of Service.
Troubleshooting And Help
The logs provide useful information to help you understand what’s going wrong if Colab fails to load your model. However, if you can’t figure it out on your own, you can ask for help on KoboldAI’s Discord server or the r/KoboldAI subreddit.
Run KoboldCpp On Google Colab
If your hardware can’t handle running LLMs locally, you can still enjoy private, free AI roleplay by running KoboldCpp on Google Colab. KoboldCpp’s Notebook simplifies setup, and you can connect any frontend to KoboldCpp’s API using the Cloudflare tunnel links Colab provides.
Since Google Colab is free, it’s not always available and offers limited resources. If you want a reliable and low-cost alternative, consider running KoboldCpp on Runpod.
Running KoboldCpp on Colab gives you more control over your data compared to other LLM providers, but since it’s still a cloud service, avoid sharing personal information and follow Google Colab’s Terms of Service.