Note that this project is under active development.
Some of the development is currently happening in the llama.cpp and whisper.cpp repos.
Features
Low-level cross-platform implementation
Integer quantization support
Broad hardware support
Automatic differentiation
ADAM and L-BFGS optimizers
No third-party dependencies
Zero memory allocations during runtime
Build
git clone https://github.com/ggml-org/ggml
cd ggml
# install python dependencies in a virtual environment
python3.10 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
# build the examples
mkdir build && cd build
cmake ..
cmake --build . --config Release -j 8
GPT inference (example)
# run the GPT-2 small 117M model
../examples/gpt-2/download-ggml-model.sh 117M
./bin/gpt-2-backend -m models/gpt-2-117M/ggml-model.bin -p "This is an example"
For more information, check out the corresponding programs in the examples folder.
Using CUDA
# fix the path to point to your CUDA compiler
cmake -DGGML_CUDA=ON -DCMAKE_CUDA_COMPILER=/usr/local/cuda-12.1/bin/nvcc ..
Download and unzip the Android NDK from the official download page. Set the NDK_ROOT_PATH environment variable or provide the absolute path via CMAKE_ANDROID_NDK in the command below.
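The referenced cmake invocation might look like the following sketch, using CMake's standard Android cross-compilation variables; the API level and ABI shown here are example values for your setup, not prescribed by the project:

```shell
# cross-compile the examples for arm64 Android
# (CMAKE_SYSTEM_VERSION and the ABI are example values)
cmake .. \
    -DCMAKE_SYSTEM_NAME=Android \
    -DCMAKE_SYSTEM_VERSION=33 \
    -DCMAKE_ANDROID_ARCH_ABI=arm64-v8a \
    -DCMAKE_ANDROID_NDK=$NDK_ROOT_PATH
cmake --build . --config Release -j 8
```

The resulting binaries can then be pushed to a device (e.g. with `adb push`) and run from a shell there.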