High-performance, microservice-based AI inference server with Unity integration support.
- 🎯 Unity Ready: Seamless integration with Unity Assets like "LLM for Unity"
- ⚡ High Performance: Custom llama.cpp binary with GPU acceleration
- 📈 Scalable: Redis queue-based worker architecture
- 🐳 Easy Deploy: Docker Compose setup with auto-downloading models
- 📊 Monitoring: Built-in Grafana dashboards and observability
📖 Complete Documentation - Full guides, API reference, and examples
- 🚀 Getting Started - Installation and basic setup
- 🏗️ Architecture - System design and components
- 🚢 Deployment - Production deployment strategies
- 🔌 API Reference - Complete endpoint documentation
- 🛠️ Components - Individual service configuration
- 🔧 Troubleshooting - Common issues and solutions
Deploy instantly on RunPod GPU cloud:
- 🔗 One-Click Deploy of the GPU component
- Set `REMOTE=false` for a standalone inference endpoint (make sure to expose the llama.cpp and SD ports)
- Set `REMOTE=true` to connect to a remote Redis queue from the API component
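A minimal `.env` sketch of the two modes (only `REMOTE` is confirmed above; the Redis variables are hypothetical names for illustration):

```bash
# Standalone: the GPU component serves inference directly.
REMOTE=false

# Remote queue: the GPU component pulls jobs from the API's Redis.
# REMOTE=true
# REDIS_HOST=your-redis-host   # hypothetical variable name
# REDIS_PORT=6379              # hypothetical variable name
```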
# 0. Environment:
Linux
NVIDIA GPU with CUDA 12.2+ drivers installed
NVIDIA Container Toolkit
Docker Compose plugin
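# Optional sanity checks (standard NVIDIA/Docker commands, not project-specific):
nvidia-smi              # driver and CUDA version (needs 12.2+)
docker compose version  # Compose plugin installed?
# Container Toolkit wired into Docker?
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi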
# 1. Get the code
git clone https://github.com/velesio/velesio-aiserver.git
cd velesio-aiserver
# 2. Configure environment
cp .env.example .env # Edit tokens and model URLs
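# A sketch of typical .env contents; only REMOTE is confirmed elsewhere in
# this README, the other names are hypothetical, so check .env.example:
#   API_TOKEN=change-me    # auth token for the API (hypothetical name)
#   MODEL_URL=...          # auto-download model URL (hypothetical name)
#   REMOTE=false           # standalone vs remote Redis queue mode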
# 3. Launch!
docker-compose up -d
# 4. For local development, use the --build flag and place the undreamai_server binaries in the /gpu dir via the server_setup.sh script
docker-compose up -d --build
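# 5. Verify (standard Compose commands): list services and tail logs
docker-compose ps
docker-compose logs -f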
Your API will be available at http://localhost:8000 🎉
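As a quick smoke test, something like this should respond once the stack is up; the endpoint path, auth header, and payload are assumptions, so check the API Reference for the real routes:

```bash
# Hypothetical completion request; swap in the documented path and payload.
curl -X POST http://localhost:8000/completion \
  -H "Authorization: Bearer $API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello from Unity!", "n_predict": 64}'
```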
📖 Need more details? Check out the Getting Started Guide for comprehensive setup instructions.
Built specifically for Unity developers:
- LLM for Unity - Text Generation
- SD Integration for Unity - Image Generation
Distributed microservice design for maximum flexibility:
┌─────────────┐     ┌─────────┐     ┌─────────────┐
│     API     │─────│  Redis  │─────│ GPU Workers │
│  (FastAPI)  │     │  Queue  │     │  (LLM + SD) │
└─────────────┘     └─────────┘     └─────────────┘
       │                                   │
       │       ┌────────────────┐          │
       └───────│   Monitoring   │──────────┘
               │ (Grafana+Prom) │
               └────────────────┘
- API Service: FastAPI with token auth and job queuing
- GPU Workers: Custom llama.cpp + Stable Diffusion inference engines
- Redis Queue: Decoupled job processing for scalability
- Monitoring: Pre-configured Grafana dashboards
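A rough sketch of how these pieces could map onto Compose services (service names, images, and paths here are illustrative assumptions; the repo's actual docker-compose.yml is the source of truth):

```yaml
# Illustrative only; see the real docker-compose.yml in the repo.
services:
  api:
    build: ./api            # FastAPI front end with token auth
    ports: ["8000:8000"]
    depends_on: [redis]
  redis:
    image: redis:7          # job queue decoupling the API from workers
  gpu-worker:
    build: ./gpu            # llama.cpp + Stable Diffusion engines
    environment:
      - REMOTE=false
  grafana:
    image: grafana/grafana  # pre-configured dashboards
    ports: ["3000:3000"]
```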
📖 Learn more: Architecture Documentation
Credits:
- UndreamAI - Unity Asset and Server
- dobrado76 - SD Integration for Unity
- Automatic1111 - SD Web server
- llama.cpp
Questions? Check the Documentation or open an issue!