This set of benchmarks is meant to test USearch capabilities for billion-scale vector search.
It provides an alternative to ann-benchmarks and big-ann-benchmarks, which generally operate on much smaller collections.
The main objective is to understand the scaling laws of USearch compared to FAISS.
Supplementary adapters for other popular systems are also available under the index/ directory:
Alternative HNSW implementations, like hnswlib,
Alternative CPU-based libraries, like ScaNN,
Vector databases, like Qdrant and Weaviate.
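The adapter classes in index/ are not reproduced here. The sketch below is hypothetical and assumes that every backend is wrapped behind one small shared interface, so the same benchmark loop can drive USearch, FAISS, hnswlib, ScaNN or a database client. The names `IndexAdapter`, `BruteForceAdapter`, `add` and `search` are illustrative, not taken from the repository. The exact brute-force adapter is the kind of baseline that approximate backends can be checked against.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence, Tuple

Vector = Sequence[float]


class IndexAdapter(ABC):
    """Hypothetical shared interface that each backend adapter could implement."""

    @abstractmethod
    def add(self, keys: Sequence[int], vectors: Sequence[Vector]) -> None:
        """Insert a batch of vectors under integer keys."""

    @abstractmethod
    def search(self, query: Vector, k: int) -> List[int]:
        """Return the keys of the k (approximate) nearest neighbors of `query`."""


class BruteForceAdapter(IndexAdapter):
    """Exact squared-L2 search: slow, but a correctness baseline for other adapters."""

    def __init__(self) -> None:
        self._items: List[Tuple[int, Vector]] = []

    def add(self, keys: Sequence[int], vectors: Sequence[Vector]) -> None:
        self._items.extend(zip(keys, vectors))

    def search(self, query: Vector, k: int) -> List[int]:
        def squared_l2(vector: Vector) -> float:
            return sum((a - b) ** 2 for a, b in zip(query, vector))

        # Rank every stored vector by distance and keep the k closest keys.
        ranked = sorted(self._items, key=lambda item: squared_l2(item[1]))
        return [key for key, _ in ranked[:k]]
```

For example, after `add([0, 1, 2], [[0, 0], [1, 1], [5, 5]])`, calling `search([0.9, 0.9], 2)` returns `[1, 0]`. A real adapter would instead forward `add` and `search` to the library's own batch APIs.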
The primary dataset used for benchmarks is the Deep1B dataset of 1 billion 96-dimensional vectors, totalling 384 GB.
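The 384 GB figure follows from storing each of the 96 components as a 4-byte float32. The small calculation below makes that explicit. It also counts the 8-byte header (two int32 fields: vector count and dimensionality), assuming the `.fbin` layout used by the big-ann-benchmarks datasets.

```python
FLOAT32_BYTES = 4
FBIN_HEADER_BYTES = 8  # assumed layout: int32 vector count + int32 dimensionality


def payload_bytes(num_vectors: int, dims: int) -> int:
    """Raw size of the float32 vectors, excluding the file header."""
    return num_vectors * dims * FLOAT32_BYTES


def fbin_file_bytes(num_vectors: int, dims: int) -> int:
    """Expected on-disk size of an .fbin file under the assumed layout."""
    return FBIN_HEADER_BYTES + payload_bytes(num_vectors, dims)


deep1b = payload_bytes(1_000_000_000, 96)  # 384,000,000,000 bytes = 384 GB (about 357.6 GiB)
deep10m = payload_bytes(10_000_000, 96)    # 3,840,000,000 bytes = 3.84 GB
```

Comparing `fbin_file_bytes` with the size of a downloaded file is a cheap way to catch a truncated download.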
Ground-truth nearest neighbors are provided to calculate the recall metrics.
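One common way to turn the ground truth into a recall metric is recall@k: for each query, the fraction of the true top-k neighbors that also appear in the index's top-k results, averaged over all queries. The function below is a minimal sketch of that definition. The name `recall_at_k` is hypothetical. Other definitions exist, for example FAISS-style 1-recall@k, which only checks whether the single true nearest neighbor is among the k results.

```python
from typing import Sequence


def recall_at_k(found: Sequence[Sequence[int]],
                ground_truth: Sequence[Sequence[int]],
                k: int) -> float:
    """Average fraction of the true top-k neighbors present in the returned top-k."""
    if len(found) != len(ground_truth):
        raise ValueError("expected one result list per query")
    if not found:
        raise ValueError("at least one query is required")
    hits = 0
    for returned, expected in zip(found, ground_truth):
        # Order within the top-k does not matter for this metric, only membership.
        hits += len(set(returned[:k]) & set(expected[:k]))
    return hits / (k * len(found))
```

For instance, with results `[[1, 2, 3], [4, 5, 6]]` and ground truth `[[1, 2, 9], [4, 7, 8]]`, recall@3 is 3 hits out of 6, that is 0.5.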
Setup
First of all, we recommend creating a conda environment to isolate the dependencies. Then download the Deep1B base vectors into the data/ directory:
wget https://storage.yandexcloud.net/yandex-research/ann-datasets/DEEP/base.1B.fbin -P data
wget https://storage.yandexcloud.net/yandex-research/ann-datasets/DEEP/base.10M.fbin -P data # For a smaller 10M subset
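To check the downloaded files, or to take a slice of the 1B file without loading all 384 GB, it helps to read only the header and the rows you need. The stdlib-only reader below assumes the `.fbin` layout used by big-ann-benchmarks: a little-endian int32 vector count, an int32 dimensionality, then row-major float32 values. The function name `read_fbin` and its `count`/`offset` parameters are hypothetical.

```python
import struct
import sys
from array import array
from typing import List


def read_fbin(path: str, count: int = -1, offset: int = 0) -> List[List[float]]:
    """Read `count` vectors starting at row `offset` (all remaining rows if count < 0)."""
    with open(path, "rb") as f:
        total, dims = struct.unpack("<ii", f.read(8))  # assumed int32 header
        available = max(0, total - offset)
        count = available if count < 0 else min(count, available)
        f.seek(8 + offset * dims * 4)  # skip header and earlier float32 rows
        flat = array("f")
        flat.fromfile(f, count * dims)
    if sys.byteorder == "big":
        flat.byteswap()  # file data is little-endian; array("f") uses native order
    return [flat[i * dims:(i + 1) * dims].tolist() for i in range(count)]
```

For example, `read_fbin("data/base.10M.fbin", count=1000)` would return the first 1,000 vectors as Python lists. For large slices, a `numpy.memmap` over the same layout is far more practical than Python lists.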
To run the ANN benchmarks, pass a configuration file: