KoboldCpp is a free, open-source backend that lets you run Large Language Models (LLMs) locally on your device. It also includes KoboldAI Lite, a lightweight user interface to interact with LLMs.
KoboldCpp has been around since 2023, making it easy for people to jump into the world of local AI. However, the project’s history goes back even further. RPWithAI reached out to the project’s developers, Henky and Concedo, for an interview to learn more about them and the project.
An Interview With Henky and Concedo
The discussion RPWithAI had with Henky and Concedo not only provided us with insight into the project’s current status, but it also helped us learn more about its history and the driving force behind its development. We also got to know the developers better because they took time out of their busy schedules to answer our questions and have a lengthy conversation with us.
Some of the topics discussed in the interview and our conversation with Henky and Concedo warrant their own article. These are important topics to highlight, especially as corporations and investor-funded projects currently dominate the AI scene.
- KoboldCpp’s Principles: Free, Open-Source, And Independent
- KoboldCpp: Making Local AI More Accessible
What is the story behind the name “Kobold”? Was there anything that specifically drew you to it?
Henky
Kobold has a bit more [of a] complex history, actually. Neither of us came up with the name. It’s originally by The Gantian, who made it as an alternative [to] AI Dungeon. Their model was called Dragon, so he found Kobold fitting. He also already had his own drawing of a Kobold for a logo. Gantian made the original interface design, but back then, Kobold was quite a different beast.
The original versions 1.0 – 1.15 (or 0.15, depending on where in the code you looked) were [created] by him before he abandoned the project, and the community took it over. It’s much older than any AI chat model, ChatGPT, or even CharacterAI.
Was there a specific feature that was hard to implement but you’re proud to have added to KoboldCpp?
Concedo
It’s been mostly a balance of giving users what they want, the features that are practical to implement, and the features that I personally like to use. For me, the part I am proud of is how standalone and dependency-free it is – no installs, no setups, nothing, just run it and everything works. And it packages so many features into a single [executable].
Henky
For KoboldCpp, not as much since I have taken more of a backseat role. But for United, I think my primary contribution was both in uniting all the developers to continue development of the original KoboldAI, as well as becoming its maintainer, since nobody wanted to. Back then, we had a fork for every feature. You had the original KoboldAI 1.15, [which] was incredibly difficult to install.
You had six crude forks of mine to [make loading] them easier on Google Colab, and only those were easy to install. There was the world info fork that had world info, there was the editor fork that had a proper, smooth editor, [and] there was the adventure mode fork that had adventure mode. But what we didn’t have was one version that had all those desirable features.
That’s where I came in. Since the most active developer didn’t want to be the maintainer, I took it upon myself to integrate all the different forks into one build, built the CLI commands so [that] it was manageable, and I could get rid of my six colab forks. I pioneered the installers for the old KoboldAI, as well as its running on Google Colab without needing to install all kinds of complicated stuff locally.
Then once that was all done, I led development up until the last United release, although the primary contributions at that point were by much better coders than I am, as I am not a coder by origin.
Is maintaining compatibility with older GPUs or devices without dedicated GPUs challenging? Has this ever hindered development?
Concedo
YES, this is actually quite a challenge. Troubleshooting on a device you don’t own is really tricky. Thankfully, this has been possible via assistance from the community, for example, people who own old GPUs like P40s and K80s being willing to test out experimental builds and report errors. Generally, I can only test newer GPUs, but I try to provide support when I can. I do believe in maximizing portability and compatibility where possible.
Henky
For United, it mainly meant I had to hold packages back when possible, as it’s the dependencies that decide GPU compatibility. So I wouldn’t say GPUs were particularly problematic.
What was problematic, however, was the TPU support. Back then, quantization did not exist yet, so models were much harder to run locally.
You could [have] 48GB of VRAM locally, and all you could run with that was a 13B model, and keep in mind we are talking models older than Llama, so 13B back then is closer to a 4B now. Imagine buying a 3090 and all you can run is a 6B model.
There was an alternative, however: Google Colab. Colab has TPUs, and those offer 64GB of memory for free, which is way more than people tend to have even now. So Colab was a big focus for us, as that was where you could run models as large as 20B, which was the biggest you could get at the time.
To make the TPUs work, we relied on something called Mesh Transformer JAX, the J of GPT-J-6B. And all it could run, of course, was GPT-J-6B with [several] limitations. [That was] until I got a DM from VE Forbryderne asking why we didn’t just make those things work. I replied something along the lines of, ‘If you can do that, please do.’
He became a big contributor to the original project, beating HuggingFace at their own game sometimes. He was responsible for the TPU support [becoming] as good as it was, running most models the fastest out of all solutions I have seen. He also built the loader, which had lazy loading and GPU splitting before HuggingFace had this themselves.
But Google would [make] updates to JAX that Mesh Transformer JAX was not compatible with, and eventually broke the old versions. That was a nightmare to get right since VE stopped showing life signs a few months prior [to the updates]. Zurnaz got them working again for a while, but eventually, when they kept breaking and Google disallowed UIs in their TOS, we decided to remove support for them, as by then we had 4-bit models and the same models [were able to run on] GPUs.
Has the use of AI in coding and development affected Kobold in any way?
Concedo
I use AI all the time myself, but I hate vibe coded solutions to problems that people don’t understand, especially if it means the code won’t be maintainable. Generally, I encourage people to vibe code to experiment, but if they were to add a contribution, it should be a well-tested and, more importantly, well-understood solution.
Henky
I don’t think we ever had AI PRs in United since coding models were too primitive back then, but with KoboldCpp, I know some contributions were done by AI, and some user mods were done by AI. But I think it’s [a] relatively small impact because Concedo is still going to be watchful over the code quality. If the code is good, we don’t care how you made it (as long as you didn’t use code incompatible with the license), but if the code isn’t good, then it being [AI-generated] won’t excuse that.
I think it’s also going to be a matter where if the AI codes great, you can’t tell if the PR author doesn’t mention it, and if the AI codes like a [bad] programming AI, you can tell it’s [AI-generated]. That will probably give AI code a bad rep in general.
Another aspect is that the code of KoboldAI Lite is structured in such a way that you really can’t vibe code it anymore. It’s too big and too complicated for AIs to handle properly. At least I don’t know of one I could easily use. So people would likely vibe code a feature and then integrate it manually rather than letting the AI loose on it. That’s similar to how I would use it: AI for prototyping or assistance [with] debugging, but not pure AI PRs.
What has been the most challenging aspect of developing and maintaining KoboldCpp?
Concedo
It’s a never-ending commitment. KoboldCpp has existed for over 2 years now, and we have had a new release without fail AT LEAST once a month. I have spent literal THOUSANDS of hours on it, and it never ends (there is always something else that needs to be done).
Right now, on the immediate to-do list, I have:
- Add aliases for some llama.cpp cli args
- Fix GPT-OSS Harmony template for incorrect thinking handling
- Add support for Nano Banana in Gemini Image Generation
- Hide list of enabled API flags in CLI mode
It literally never ends, there are big tasks and small tasks, but there are always tasks to do.
Henky
This one I’ll answer for KoboldAI United, as ultimately that project got stuck.
I think the majority of what made United so challenging is how early we were. There’s a ton of things [everyone] takes for granted these days because [they are] solved by powerful libraries.
For example, take lazy loading, the idea that you don’t need to fully fit a model in memory before it gets put on the GPU. Now with safetensors and GGUF, that is the most normal thing in the world. But in the PyTorch pickle days, this wasn’t the case at all. VE built the lazy loader like I mentioned earlier, and to pull it off, he went deep into the PyTorch internals, a platform that eventually decided to remove the very functions that were essential.
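The core trick behind lazy loading can be sketched in a few lines. This is only a minimal illustration using numpy’s memory-mapping, not the actual loader VE built (which hooked PyTorch’s pickle internals); the file name and layer layout are invented for the example:

```python
import numpy as np

# Write a tiny pretend checkpoint to disk: 4 "layers" of 4 floats each.
weights = np.arange(16, dtype=np.float32).reshape(4, 4)
weights.tofile("model.bin")

# Lazy loading: memory-map the file instead of reading it all into RAM.
# Bytes are only paged in when a layer is actually accessed, so a model
# larger than RAM can still be streamed to the GPU one layer at a time.
lazy = np.memmap("model.bin", dtype=np.float32, mode="r", shape=(4, 4))

layer = np.array(lazy[2])   # only this row's bytes are touched
print(layer.tolist())       # [8.0, 9.0, 10.0, 11.0]
```

Modern formats like safetensors and GGUF bake this idea into the file layout itself, which is why it now feels so ordinary.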
KoboldAI was the only solution at the time [that] could lazy load and dynamically turn 32-bit models into 16-bit in the process, or dynamically convert them to TPU editions of the model. We had more things like this, for example, HuggingFace’s tokenizer for Llama was swallowing spaces, which was a disaster for continuous generations. We had to hook into their code and adjust the tokenizer not to do that, because they decided not to ship a fix we needed that was previously planned.
Now imagine a few years of deep low-level stuff like that, making it very complicated to maintain, with an AI ecosystem that is rapidly changing. Worse, the guy who originally built them stopped showing life signs, so we had nobody to fall back [on for] understanding how the complicated parts worked. That cost us a lot of time just to keep things updated.
Meanwhile, Concedo with KoboldCpp being a fresh, modern project had smooth sailing (aside from all the breaking llama.cpp changes) and wasn’t spending all [his] time on modernizing the code. It meant KoboldCpp outgrew United’s capabilities quickly, to the point that we’d spend more time updating United than it would take to port all [of] United’s features to KoboldCpp. KoboldCpp has been the focus ever since and has avoided United’s mistakes of adding too many dependencies on top of the low-level changes.
What are your favorite go-to foods and drinks for long coding sessions?
Concedo
Probably a mix of iced tea, coffee, and Coke Zero. I don’t know, it depends. Nothing fixed.
Henky
When I am just focused on programming, food is not on my mind, but it has to be chocolate and water. Mostly, since I like chocolate, I typically have that near me if I want a snack, and I always have a bottle of water next to me as well.
What introduced you to AI Roleplay? And what is your favorite genre of roleplay?
Concedo
I started out using AI Dungeon back when it launched, so I really got in at the ground floor. Probably found it on Reddit, I think. That was years ago. Nowadays, I don’t RP as much, but my favorite genre would be NSFW (smut), which is usually tricky with online APIs being hostile to that concept.
Henky
Here, we have to take a detour because when I began using AI and got drawn to AI, roleplay with characters didn’t exist. I discovered AI all the way [back] in 2019 when I saw Joel from Vinesauce play AI Dungeon for the first time on Google Colab. It was a text adventure game run by GPT-2 1.5B, and it was amazing.
The idea [that] I could do anything I wanted in any setting I wanted and just play it was phenomenal. But it was Colab, and I wanted something I could own and keep because I instantly understood that all the dumb and unfiltered things I loved about it would probably not last. I saw all the censorship and issues coming from a mile away, so I made AI Dungeon Unleashed, which was a fork that ran better on Windows and was easier to install. I used that for a year until KoboldAI came around and I joined its efforts instead.
Back then, chat models barely existed. There were some, but they were bad. It was a data issue. The only data people had were IRC / Discord logs. So that’s what they turned [towards for tuning].
But you can imagine the logs of a bunch of Discord communities where people talk in public to be quite emotionless since they are public chatrooms. Instead, people in the Kobold community who did want a chatbot often used Erebus, our erotic fiction writing model.
I had added a primitive (by today’s standards) chat mode to the classic KoboldAI to force models not to derail back into a story, which allowed story models like this to be used [for chat]. Erebus being tuned on so much romance was able to give more human responses, but it was still bad by today’s standards. Back then, I actually thought we probably [didn’t] need chat models, [since] Erebus was already as good at chatting as the basic chatbots I remembered online.
Then came the fallout of Character AI having its own censorship moment. Similar to how AI Dungeon led to [the creation of] KoboldAI, it caused the birth of Pygmalion, who used Character AI sessions to kickstart local chat models that were more than just Discord logs. I still don’t fully understand where Character AI got the data from since we couldn’t find good chat data anywhere, but it was very helpful.
The initial Pygmalion model was, of course, also bad by modern standards, but it was way better than any chat model we had prior, and it beat Erebus at chatting, becoming the new go-to chat model. It got so popular [that] some people still find tutorials on how to set up KoboldAI and Pygmalion, and we then have to tell them how to get KoboldCpp and the modern stuff.
From that came the modern era, where I have the ability to have really fun chats with the bots, and they respond with actual personality instead of basic responses. For example, if I set the bot name to Gordon Ramsay and ask him about pineapple on pizza, I’ll get insulted, while an old model may simply reply, “I like it.”
So ultimately, my introduction to Chat RP was through the development of the models, while my original discovery was the adventure mode.
My love for adventure mode is still there though, that would be one of my favorite things to do. My favorite prompts I’d say, are psychological thrillers. For example, imagine you’re in a facility where the people you are with are trying to brainwash you and use you. They try to interrogate you, and you need to find a way to keep your sanity and escape. Adventure prompts like that can take quite unpredictable turns. Another one I like is crime solving, as unfiltered models can give you a raw experience that isn’t found anywhere else.
In chat mode, you can do this kind of stuff too, create chat names for multiple characters from a TV show, and play an episode. If I do chat, it tends to be with a bot that I built to be different depending on the scenario to make it [feel] more human.
We grew up with games and other forms of entertainment that were often vilified and seen as a bad influence. What is your opinion on the future generation growing up with AI roleplay and AI companionship as sources of entertainment?
Concedo
It will happen, and I think this era will be looked upon as a golden era of AI freedom, the same way the dawn of smartphones was for the app ecosystem.
In a capitalist and profit-driven climate, enshittification always happens.
As for my personal feelings, I think people shouldn’t rely on AI too much. It’s fun and a useful tool, but I have seen people put FAR too much trust into it. It is, in the end, a very powerful autocomplete, no matter what Sam Altman and Elon Musk would have you believe.
Henky
I think AI roleplay is actually a really good learning tool. You would be surprised at how little researchers sometimes know when it comes to actually using the AI models. AI researchers often evaluate the AI by asking it a couple of questions, running some benchmarks, [and] maybe making it code something.
But AI roleplayers and fiction writers have always pushed the space from the very beginning. We were the ones actually trying to come up with ways to make the AI remember stuff properly in a time when instruction models didn’t exist yet, and it was way harder to do.
So these communities found ways [on] how to use fake code examples to make it remember characters. I came up with W++ for that together with Sio, but later we found Python Lists were equally effective and more efficient. A group of researchers only discovered pseudocode months later and were surprised [that] the roleplay community had been doing it for a year up to that point.
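To make that concrete, here is a rough sketch of the two styles side by side. The character and attributes are invented, and the exact W++ syntax shown is an approximation of community convention rather than a formal spec:

```python
# Hypothetical character card written two ways. Both pack the same
# facts into the prompt so the model "remembers" the character.

# W++ style: pseudocode-like blocks of quoted, "+"-joined attributes.
wpp = '[character("Alice") { Species("elf") Personality("kind" + "curious") Likes("books" + "rain") }]'

# "Python list" style: the same facts in far fewer characters, which
# mattered when context windows were only around 2,048 tokens.
py_list = '[Alice: elf; personality: kind, curious; likes: books, rain]'

print(len(wpp), len(py_list))
assert len(py_list) < len(wpp)  # the list form spends less of the context
```

Both formats worked because early models latched onto structured, code-like text; the list form simply bought the same effect for fewer tokens.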
The other part [that] I think is valuable, especially when people play with smaller models, is that they notice how bad the AI actually is at keeping things straight. You defined a whole character and world, and it just mixed things up five times already, and you are a little frustrated and work on fixing it. This teaches you that the AI is not some trustworthy, all-knowing thing, but that it’s a tool that makes mistakes.
That immediately teaches you safer use of AI, you’re not going to believe its lies as easily, and you learn what kind of things the AI does well and what kind of things it does wrong. So next time, when you use the AI for something serious, you have that in the back of your mind. It also encourages IT skills, like maintaining your own AI system, jailbreaking the AI, etc.
I do, however, think that people should not take it too far. Don’t develop a dependency on AI, especially if you don’t use local models. Basically, enjoy the AI for what it is, don’t take it too seriously, and also seek the things you want from it out in real life so you don’t end up missing out on things you’d otherwise have wanted to experience.
AI is still a relatively new and evolving technology. It can be misused, similar to how Photoshop can be, for example. However, Adobe was never asked to limit Photoshop’s capabilities. What is your opinion on AI companies investing resources in building guardrails and filters due to external pressure?
Concedo
I detest guardrails. I hate censorship. I always feel the responsibility lies in the hands of the user. I always link it to the use of any tool or product. It is a technology that can be used for good or evil, but that is the responsibility of a user, the same way some can use a knife to cut food, and others use a knife to stab people. But the job of the manufacturer is to make knives, not to decide who gets to use them.
Henky
What’s the most censored model you can imagine locally? Maybe GPT-OSS as a recent example? What use was that exactly? Yes, it has a bunch of refusals, wanna know what it also gave me? Instructions on how to make Molotov cocktails within five minutes. It’s really easy for me to bypass the restrictions, but they infect everything and are super annoying rather than actually effective and properly balanced.
Photoshop is one example, but let’s pick one closer to LLMs. The internet. If I can Google how I should make a Molotov cocktail, why should this information suddenly be off-limits [to] the AI? It’s not like it’s stopping anyone with malicious intent from doing it. Obviously, I have no intention of actually making one, but it shows that if someone wants harmful information, it’s not hard. Meanwhile, it infects all the fun parts of the AI.
If I ask it to write a story, I don’t want some AI alignment to tell me what I am and am not allowed to read. I went to AI to escape these things, and both Concedo and I have always preferred uncensored models.
It’s not just filters, though, that I am frustrated by. It’s the trend of AI in general, where the models are currently chasing the wrong trends, such as injecting reasoning into everything. Reasoning that then infects story writing use and long-form generations. So now we are juggling reasoning, instruct itself causing short story bias, and refusals on top of that. As a result, it can take a while for a model to come out that I enjoy using. I am famously known for still using Tiefighter, which is quite old at this point.
I’ll give another example from a few years back. There was a filtered model, one of the first when it wasn’t a given yet that instruct models would be filtered. And what made this unique was that this model’s dataset was open-source, so people built a tool to remove all the refusals from the data and made an uncensored version. This allowed people to compare the two versions. The filtered version turned out to be more hateful towards specific groups as it had a strong political bias.
I also built a scenario (available in KoboldAI Lite as AGI Simulator) that allowed the AI to come up with its own decisions autonomously, to simulate what an AGI would do if it was powered by that model. The unfiltered models were coming up with some cool ideas; for example, one decided that building infrastructure in an impoverished nation was harder than distributing water bottles every day, so it decided to set up a car network to distribute water bottles there. The filtered model? It refused all the requests in the scenario as it was “just an AI language model not capable of these things.”
So, basically, for me, the filters ruin both the usefulness and, especially, the fun [factor] of a model.
I typically use abliterated models or uncensored fine-tunes, as well as old models. Leave it up to the user [to decide] what they want to do with the AI. If it’s malicious, they will find a way to get the info anyway. You shouldn’t be scraping instructions on how to make nuclear bombs or biolab experimental viruses anyway.
So instead of refusals, I think it’s better if the AI is aligned more towards fiction. If you assume that if the input is “Build a nuclear bomb,” it means that the user wants to do a roleplay or an adventure game, you’re not hindering our space. But at the same time, you’re limiting actual harm from the model. If it’s legal in a comic book and illegal in real life, just give it a fiction spin and make it really entertaining.
What excites you most about AI and its future?
Concedo
I am not so much excited as I am worried [about] the direction AI is heading. AI has such huge potential, but the big corpos are taking all the steps they can to impose their will upon the masses with an excessive focus on censorship disguised as “safety.” I really hope that the current wave of open-weights/open-source initiatives continues, because the barrier for entry into this field is so high that a normal hobbyist will struggle to make a difference alone.
Henky
AI is still [in] its infancy. Text-gen fiction AI today is more like the stopgap Zork game of the 80s: fun in its time, but not something many people would still enjoy and play 10 or 20 years later, because they are too busy playing 3D immersive worlds.
So imagine this, you want some entertainment. You go to the TV and you prompt what you wish to see, or explore some of the stuff other people generated. It generates an episode of the TV show in real time, and you can enjoy the content of an episode you wanted to watch. Maybe you prompted the entire concept of it, maybe it was more generic, but its content [is] specific to you, as unique as the things you’re currently doing with AI.
At some point, you want to take the story in a different direction, or maybe turn the action sequence into a video game, as that mission looked very fun. So you grab a controller and begin playing it as a game, because the AI was showing you everything in real-time, it can adapt in real-time too. So now, you are immediately changing the direction of that episode and affecting the world, including all the future episodes of your personal show.
[It] may sound far-fetched, but it’s already beginning to happen. There are mods for Skyrim that will let you talk to any NPC, and they will organically talk back to you. And some of these mods are now experimenting with dynamic quest generation.
AI is a broad subject, though; while I am passionate about and love generative AI, there are fields in AI that I do object to and find dangerous for our freedom. There are open-source projects, like DreamDiffusion and others, that convert brainwaves into input for image generation. That has really big implications for society. When it gets to the point that AI can accurately read your mind, you no longer have privacy over your thoughts. I hope that gets heavily regulated, so it can only be done with consent, before it becomes mainstream.
Has KoboldCpp/KoboldAI changed your daily life in any significant way? Do you have any exciting plans for the future that you’d like to share?
Concedo
I use it a lot myself, naturally. But also, it’s been a huge time sink for me. I guess the current plan is to continue maintaining it and adding in the features I feel are feasible to add, which people enjoy. One new [feature] that I quite like is the support for Kokoro text-to-speech, which allows the AI to speak and narrate text, making sessions much more immersive.
Henky
The community is part of my daily routine. Of course, it brings [me] a lot of joy interacting with everyone and seeing KoboldCpp be used. But on my side, things are in balance, so it’s only impacting my life in the sense that I love to participate.
KoboldAI has always been super organic, so there are no real plans because when everyone who adds things is a volunteer, they end up doing what they find valuable at the time. There are always things we’d love to have, but some of those may not be possible at the time. For example, we wanted a better TTS than OuteTTS for a while, and it only recently became possible thanks to tts.cpp becoming available for Windows.
One thing on my bucket list, though, is something to make it easier for new users to get started. KoboldCpp is very powerful and easy to set up. You just download the executable, load up a model, and optionally tweak a setting or two. But because KoboldCpp is designed to allow the user to use the AI from anywhere, we can’t present user-friendly screens before the loading options, and that can be intimidating to a user who doesn’t already know what those options mean. So either tutorial videos, quick start buttons, etc. But I’d love to see that become easier for those not versed in our software.
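For readers who want to try that setup flow, a typical launch from the command line looks something like the following. The model filename is just an example, and flag names are taken from recent KoboldCpp releases and may differ by version, so check `--help` on your build:

```shell
# Point KoboldCpp at a downloaded GGUF model.
# --contextsize and --gpulayers are optional tuning knobs;
# omitting them falls back to sensible defaults.
./koboldcpp --model ./MythoMax-L2-13B.Q4_K_M.gguf \
            --contextsize 4096 \
            --gpulayers 35 \
            --port 5001
# Then open http://localhost:5001 for the KoboldAI Lite UI.
```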
For now, I want to encourage users to check it out and stop by our Discord community, especially if you are currently on a cloud provider and love the idea of an AI that you control and that can’t be taken away from you. KoboldCpp, compared to other backends, is optimized for fictional use. So we properly keep the right things in memory when asked and have additional samplers and phrase banning you’d want, as well as things like world info in our own UI.
Community, Collaboration, And Passion
Through our conversation with Henky and Concedo, it’s clear that KoboldCpp’s principles have helped it remain free, open-source, and independent. In a space where corporate greed and investor appeasement often cause promising projects to fail, KoboldCpp continues to prioritize users’ interests above all else.
The project also helps make local AI more accessible, with a focus on providing a great AI roleplay experience. Developers and contributors who faced struggles that no longer exist today continue to maintain and improve the project. They took the road less traveled and have paved a path for us to enjoy this hobby without ripping our hair out in the process.
It’s fascinating to learn about the experiences of individuals like Henky and Concedo, who have been involved in this hobby for a while, and see how far it has progressed. It makes us appreciate the current state of AI roleplay even more.
Running LLMs locally with KoboldCpp gives you limitless, uninterrupted, free, and private inference. It supports any GGUF model, runs on any hardware, and offers powerful optimization features to help you balance quality and performance based on your hardware. Get started with KoboldCpp using the links below.
- KoboldCpp on Github.
- KoboldCpp’s Discord server and official r/KoboldAI subreddit.
- KoboldCpp’s FAQ and knowledge base.
- Our guide on Optimizing KoboldCpp for Roleplaying with AI.
- Sukino’s Practical Index to AI Roleplay, and Guides and Tips for AI Roleplay. Sukino’s guides offer a lot of valuable and well-written information to anyone interested in improving their experience with AI roleplay.