High-performance, microservice-based AI inference server with Unity integration support.
- 🎯 Unity Ready: Seamless integration with Unity Assets like "LLM for Unity"
- ⚡ High Performance: Custom llama.cpp binary with GPU acceleration
- 📈 Scalable: Redis queue-based worker architecture
- 🐳 Easy Deploy: Docker Compose setup with auto-downloading models
- 📊 Monitoring: Built-in Grafana dashboards and observability
📖 Complete Documentation - Full guides, API reference, and examples
- 🚀 Getting Started - Installation and basic setup
- 🏗️ Architecture - System design and components
- 🚢 Deployment - Production deployment strategies
- 🔌 API Reference - Complete endpoint documentation
- 🛠️ Components - Individual service configuration
- 🔧 Troubleshooting - Common issues and solutions
Deploy instantly on RunPod GPU cloud:
- 🔗 One-Click Deploy of the GPU component
- Set `REMOTE=false` for a standalone inference endpoint (make sure to expose the llama.cpp and SD ports)
- Set `REMOTE=true` to connect to a remote Redis queue from the API component
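A minimal `.env` sketch of the two modes (only `REMOTE` is confirmed above; the Redis variables are hypothetical names for illustration):

```bash
# Standalone: the GPU component serves inference directly.
REMOTE=false

# Remote queue: the GPU component pulls jobs from the API's Redis.
# REMOTE=true
# REDIS_HOST=your-redis-host   # hypothetical variable name
# REDIS_PORT=6379              # hypothetical variable name
```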
# 0. Environment:
Linux
NVIDIA GPU with CUDA 12.2+ drivers installed
NVIDIA Container Toolkit
Docker Compose plugin
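# Optional sanity checks (standard NVIDIA/Docker commands, not project-specific):
nvidia-smi              # driver and CUDA version (needs 12.2+)
docker compose version  # Compose plugin installed?
# Container Toolkit wired into Docker?
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi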
# 1. Get the code
git clone https://github.com/velesio/velesio-aiserver.git
cd velesio-aiserver
# 2. Configure environment
cp .env.example .env # Edit tokens and model URLs
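# A sketch of typical .env contents; only REMOTE is confirmed elsewhere in
# this README, the other names are hypothetical, so check .env.example:
#   API_TOKEN=change-me    # auth token for the API (hypothetical name)
#   MODEL_URL=...          # auto-download model URL (hypothetical name)
#   REMOTE=false           # standalone vs remote Redis queue mode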
# 3. Launch!
docker-compose up -d
# 4. For local development, use the --build flag and place the undreamai_server binaries in the /gpu dir via the server_setup.sh script
docker-compose up -d --build
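# 5. Verify (standard Compose commands): list services and tail logs
docker-compose ps
docker-compose logs -f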
Your API will be available at http://localhost:8000 🎉
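As a quick smoke test, something like this should respond once the stack is up; the endpoint path, auth header, and payload are assumptions, so check the API Reference for the real routes:

```bash
# Hypothetical completion request; swap in the documented path and payload.
curl -X POST http://localhost:8000/completion \
  -H "Authorization: Bearer $API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello from Unity!", "n_predict": 64}'
```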
📖 Need more details? Check out the Getting Started Guide for comprehensive setup instructions.
Built specifically for Unity developers:
- LLM for Unity - Text Generation
- SD Integration for Unity - Image Generation
Distributed microservice design for maximum flexibility:
┌─────────────┐     ┌─────────┐     ┌─────────────┐
│     API     │─────│  Redis  │─────│ GPU Workers │
│  (FastAPI)  │     │  Queue  │     │  (LLM + SD) │
└─────────────┘     └─────────┘     └─────────────┘
       │                                   │
       │       ┌────────────────┐          │
       └───────│   Monitoring   │──────────┘
               │ (Grafana+Prom) │
               └────────────────┘
- API Service: FastAPI with token auth and job queuing
- GPU Workers: Custom llama.cpp + Stable Diffusion inference engines
- Redis Queue: Decoupled job processing for scalability
- Monitoring: Pre-configured Grafana dashboards
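A rough sketch of how these pieces could map onto Compose services (service names, images, and paths here are illustrative assumptions; the repo's actual docker-compose.yml is the source of truth):

```yaml
# Illustrative only; see the real docker-compose.yml in the repo.
services:
  api:
    build: ./api            # FastAPI front end with token auth
    ports: ["8000:8000"]
    depends_on: [redis]
  redis:
    image: redis:7          # job queue decoupling the API from workers
  gpu-worker:
    build: ./gpu            # llama.cpp + Stable Diffusion engines
    environment:
      - REMOTE=false
  grafana:
    image: grafana/grafana  # pre-configured dashboards
    ports: ["3000:3000"]
```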
📖 Learn more: Architecture Documentation
Credits:
- UndreamAI - Unity Asset and Server
- dobrado76 - SD Integration for Unity
- Automatic1111 - SD Web server
- llama.cpp
Questions? Check the Documentation or open an issue!