koboldcpp-1.96

- NEW: Audio input is now supported for models, in addition to the existing vision input. Specifically, support for Qwen 2.5 Omni 3B has been added.
- Use it like existing vision models: download the base model and the mmproj, then load both (see the launch sketch after these notes).
- You can then launch KoboldCpp, upload your images/audio in the KoboldAI Lite UI, and ask the AI questions about them.
- Multiple images and audio files can be used together, though be aware that you will need a large context size, especially for long audio files.
- The 3B seems to perform better than the 7B, which hallucinates heavily on music.
- Added miniaudio: `.wav`, `.mp3` and `.flac` files are now supported on all audio endpoints (Whisper transcribe and multimodal audio). A transcription example follows these notes.
- Fixes for gemma3n incoherence; it should be working out of the box now.
- Fixes to allow the new Jamba 1.7 models to work. Note that context shift and fast forwarding cannot be used on Jamba.
- Allow automatically resuming incomplete model downloads if aria2c is used.
- Prints some system information to the terminal on startup to aid future debugging.
- Added emulation for the OpenAI `/v1/images/generations` endpoint for image generation (an example call follows these notes).
- Fixed noscript image generation.
- Apply nsigma masking (thanks @Reithan)
- Allow flash attention to be used with image generation (thanks @wbruna)
- Backwards compatibility for the `json_schema` field has been improved.
- Ensured that `finish_reason` is always sent last, with no additional text content on the same chunk (see the streaming sketch after these notes).
- Important Change: Default context size is now 8k (up from 4k) to better represent modern models. This may affect your memory usage. Existing kcpps configs are unaffected.
- Important Change: The flag `--usecublas` has been renamed to `--usecuda`. Backwards compatibility for the old flag name is retained, but you're recommended to switch to the new name.
- Added new AutoGuess templates for Kimi K2, Jamba and Dots. A Hunyuan A13B template is not included, as the ideal template cannot be determined.
- Improved formatting of multimodal chunk handling
- Fixes for remotetunnel not starting on some Linux systems.
- Updated Kobold Lite, with multiple fixes and improvements:
  - The Aesthetic UI has been completely refactored and slightly simplified for easier management. Most functionality should be unchanged.
  - Allow connecting to OpenAI endpoints without a key.
  - Added more experimental flags to control audio compression, AutoGuess tags and unsaved file warnings.
  - Allow uploading audio files and embedding them into your saved stories; a lamejs mp3 encoder has been added.
  - Allow audio capture from the microphone to embed into the story.
  - Added a shortcut for inserting instructions into memory.
  - Allow disabling default stop sequences.
  - Breaking Change: Attached image and audio data is no longer stored inline in the story, but as metadata in the savefile. This is required to handle the large size of audio files.
  - Save files from past versions are 100% forwards compatible, but new media is only partially backwards compatible: media saved in future versions will not be accessible when the file is re-opened in past versions of the UI.
  - Fixed a few HTML parsing bugs.
- Merged new model support, fixes and improvements from upstream
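As a concrete example of the multimodal setup above, a typical launch pairs the base GGUF with its mmproj file. `--model`, `--mmproj` and `--contextsize` are existing KoboldCpp flags; the file names are placeholders for whichever Qwen 2.5 Omni quantization you downloaded. A minimal sketch:

```python
import subprocess

# Launch KoboldCpp with a multimodal pair: the base model plus its mmproj.
# File names are illustrative placeholders; download both for your model.
subprocess.run([
    "./koboldcpp-linux-x64",
    "--model", "Qwen2.5-Omni-3B-Q4_K_M.gguf",   # base language model
    "--mmproj", "Qwen2.5-Omni-3B-mmproj.gguf",  # audio/vision projector
    "--contextsize", "8192",                    # large audio files eat context fast
])
```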
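The transcription side can be exercised over the API as well. This sketch assumes KoboldCpp's Whisper endpoint at `/api/extra/transcribe` with a base64 `audio_data` field (the path and field names are assumptions from the existing transcribe API, not confirmed by these notes):

```python
import base64
import requests

# Transcribe a local file through the Whisper endpoint; with miniaudio,
# .wav, .mp3 and .flac should all be accepted.
with open("speech.mp3", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode()

resp = requests.post(
    "http://localhost:5001/api/extra/transcribe",  # assumed endpoint path
    json={"audio_data": audio_b64, "prompt": ""},  # assumed field names
)
print(resp.json())
```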
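The emulated image generation endpoint can be called like the OpenAI API it mirrors. The request and response fields below follow OpenAI's `b64_json` shape and are an assumption about the emulation; an image model must be loaded in KoboldCpp for this to return anything:

```python
import base64
import requests

# Generate an image via the newly emulated OpenAI-style endpoint.
resp = requests.post(
    "http://localhost:5001/v1/images/generations",
    json={"prompt": "a watercolor lighthouse at dusk", "n": 1, "size": "512x512"},
)
# Assumes the OpenAI-style base64 response shape.
img_b64 = resp.json()["data"][0]["b64_json"]
with open("out.png", "wb") as f:
    f.write(base64.b64decode(img_b64))
```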
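The `finish_reason` guarantee matters for streaming clients: the terminating chunk now carries no text, so it can be treated purely as an end marker. A sketch using the official `openai` Python client against KoboldCpp's OpenAI-compatible endpoint (the model name is a placeholder, and the key can be anything for a local server):

```python
from openai import OpenAI

# Any OpenAI-compatible client works against KoboldCpp's /v1 endpoints.
client = OpenAI(base_url="http://localhost:5001/v1", api_key="not-needed")

stream = client.chat.completions.create(
    model="koboldcpp",  # placeholder; the local server serves whatever is loaded
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
for chunk in stream:
    choice = chunk.choices[0]
    if choice.finish_reason is not None:
        break  # guaranteed to arrive last, with no extra text on this chunk
    print(choice.delta.content or "", end="")
```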
Download and run the koboldcpp.exe (Windows) or koboldcpp-linux-x64 (Linux), which is a one-file pyinstaller for NVIDIA GPU users.
If you have an older CPU or older NVIDIA GPU and koboldcpp does not work, try the oldpc version instead (CUDA 11 + AVX1).
If you don't have an NVIDIA GPU, or do not need CUDA, you can use the nocuda version which is smaller.
If you're using AMD, we recommend trying the Vulkan option in the nocuda build first for best support. Alternatively, you can try koboldcpp_rocm at YellowRoseCx's fork here if you are a Windows user, or download our rolling ROCm binary here if you use Linux.
If you're on modern macOS (M-series), you can use the koboldcpp-mac-arm64 macOS binary.
Click here for .gguf conversion and quantization tools
Run it from the command line with the desired launch parameters (see `--help`), or manually select the model in the GUI.
Once loaded, you can connect with your browser (or use the full KoboldAI client) at:
https://localhost:5001
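The same port also serves the native KoboldAI API, which makes for a quick smoke test that the model loaded (fields follow the standard KoboldAI `generate` API):

```python
import requests

# Minimal generation request against KoboldCpp's native KoboldAI API.
resp = requests.post(
    "http://localhost:5001/api/v1/generate",
    json={"prompt": "Once upon a time", "max_length": 50},
)
print(resp.json()["results"][0]["text"])
```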
For more information, be sure to run the program from the command line with the `--help` flag. You can also refer to the readme and the wiki.