TorchDR is an open-source library for dimensionality reduction (DR) built on PyTorch. DR constructs low-dimensional representations (or embeddings) that best preserve the intrinsic geometry of an input dataset encoded via a pairwise affinity matrix. TorchDR provides GPU-accelerated implementations of popular DR algorithms in a unified framework, ensuring high performance by leveraging the latest advances of the PyTorch ecosystem.
🚀 Blazing Fast: engineered for speed with GPU acceleration, torch.compile support, and optimized algorithms leveraging sparsity and negative sampling.
🧩 Modular by Design: every component is designed to be easily customized, extended, or replaced to fit your specific needs.
🪶 Memory-Efficient: natively handles sparsity and memory-efficient symbolic operations to process massive datasets without memory overflows.
🤝 Seamless Integration: fully compatible with the scikit-learn and PyTorch ecosystems. Use familiar APIs and integrate effortlessly into your existing workflows.
📦 Minimal Dependencies: requires only PyTorch, NumPy, and scikit‑learn; optionally add Faiss for fast k‑NN or KeOps for symbolic computation.
TorchDR offers a user-friendly API similar to scikit-learn where dimensionality reduction modules can be called with the fit_transform method. It seamlessly accepts both NumPy arrays and PyTorch tensors as input, ensuring that the output matches the type and backend of the input.
from sklearn.datasets import fetch_openml
from torchdr import UMAP
x = fetch_openml("mnist_784").data.astype("float32")
z = UMAP(n_neighbors=30).fit_transform(x)
TorchDR is fully GPU compatible, enabling significant speed-ups when a GPU is available. To run computations on the GPU, simply set device="cuda" as shown in the example below:
z_gpu = UMAP(n_neighbors=30, device="cuda").fit_transform(x)
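When the same script has to run on machines with and without a GPU, the device string can be chosen at runtime. The sketch below is only an illustration: pick_device is a hypothetical helper, not part of TorchDR, and it assumes that the device keyword also accepts "cpu", as standard PyTorch device names do.

```python
def pick_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to probe; fall back to CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
# The value can then be passed on, e.g. UMAP(n_neighbors=30, device=device).
print(device)
```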
TorchDR supports torch.compile for an additional performance boost on modern PyTorch versions. Just add the compile=True flag as follows:
z_gpu_compile = UMAP(n_neighbors=30, device="cuda", compile=True).fit_transform(x)
The backend keyword specifies which tool to use for handling kNN computations and memory-efficient symbolic computations.
- Set backend="faiss" to rely on Faiss for fast kNN computations (recommended).
- To perform exact symbolic tensor computations on the GPU without memory limitations, you can leverage the KeOps library. This library also allows computing kNN graphs. To enable KeOps, set backend="keops".
- Finally, setting backend=None will use raw PyTorch for all computations.
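One way to choose among the three values above is to check which optional packages are importable. This is a minimal sketch, not TorchDR behavior: pick_backend is a hypothetical helper, and the preference order (Faiss first, then KeOps, then raw PyTorch) simply mirrors the recommendation in the list.

```python
from importlib.util import find_spec


def pick_backend():
    """Pick a value for the backend keyword from the installed packages."""
    if find_spec("faiss") is not None:
        return "faiss"  # fast kNN, the recommended option
    if find_spec("pykeops") is not None:
        return "keops"  # symbolic computations via KeOps
    return None  # raw PyTorch for all computations


backend = pick_backend()
# The value can then be passed on, e.g. UMAP(n_neighbors=30, backend=backend).
print(backend)
```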
TorchDR provides a suite of neighbor embedding methods.
Linear-time (Negative Sampling). State-of-the-art speed on large datasets: UMAP, LargeVis, InfoTSNE, PACMAP.
Quadratic-time (Exact Repulsion). Compute the full pairwise repulsion: SNE, TSNE, TSNEkhorn, COSNE.
Remark. For quadratic-time algorithms, TorchDR provides exact implementations that scale linearly in memory using backend="keops". For TSNE specifically, one can also explore fast approximations, such as FIt-SNE implemented in tsne-cuda, which bypass full pairwise repulsion.
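The difference between the two families can be sketched in NumPy. The code below is a Monte Carlo illustration of why sampling gives linear cost; it is not the loss that TorchDR's UMAP or LargeVis actually optimizes. The function names, the Cauchy kernel 1 / (1 + d^2), and the rescaling by n - 1 are all assumptions made for this example.

```python
import numpy as np


def exact_repulsion(z):
    """Per-point repulsion weight summed over all other points: O(n^2)."""
    diff = z[:, None, :] - z[None, :, :]       # (n, n, d)
    w = 1.0 / (1.0 + (diff ** 2).sum(-1))      # Cauchy kernel, (n, n)
    np.fill_diagonal(w, 0.0)                   # a point does not repel itself
    return w.sum(axis=1)


def sampled_repulsion(z, n_negatives, rng):
    """Estimate the same quantity from n_negatives random points: O(n * k)."""
    n = z.shape[0]
    # Offsets in 1..n-1 guarantee that a point never samples itself.
    offsets = rng.integers(1, n, size=(n, n_negatives))
    idx = (np.arange(n)[:, None] + offsets) % n
    diff = z[:, None, :] - z[idx]              # (n, k, d)
    w = 1.0 / (1.0 + (diff ** 2).sum(-1))      # (n, k)
    # Rescale the sample mean to estimate the sum over the n - 1 other points.
    return w.mean(axis=1) * (n - 1)


rng = np.random.default_rng(0)
z = rng.normal(size=(200, 2))
exact = exact_repulsion(z)
approx = sampled_repulsion(z, n_negatives=50, rng=rng)
rel_err = abs(approx.sum() - exact.sum()) / exact.sum()
print(rel_err)
```

The exact version materializes an n x n matrix, which is the memory cost that a symbolic backend avoids, while the sampled version only ever touches n x k entries.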
TorchDR provides various spectral embedding methods: PCA, IncrementalPCA, KernelPCA, PHATE.
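For readers new to spectral methods, the simplest member of this family, PCA, can be sketched with a singular value decomposition. This NumPy stand-in only illustrates the idea and is not TorchDR's implementation; pca_embed is a hypothetical name.

```python
import numpy as np


def pca_embed(x, n_components=2):
    """Project centered data onto its top right singular vectors."""
    x_centered = x - x.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(x_centered, full_matrices=False)
    return x_centered @ vt[:n_components].T


rng = np.random.default_rng(0)
# Five features with very different scales, so two directions dominate.
x = rng.normal(size=(100, 5)) * np.array([5.0, 2.0, 1.0, 0.5, 0.1])
z = pca_embed(x, n_components=2)
print(z.shape)
```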
Relying on TorchDR enables an orders-of-magnitude improvement in runtime performance compared to CPU-based implementations. See the code.
See the examples folder for all examples.
MNIST. (Code) A comparison of various neighbor embedding methods on the MNIST digits dataset.
CIFAR100. (Code) Visualizing the CIFAR100 dataset using DINO features and TSNE.
TorchDR features a wide range of affinities that can be used as building blocks for DR algorithms. It includes:
- Affinities based on k-NN normalizations: SelfTuningAffinity, MAGICAffinity, UMAPAffinity, PHATEAffinity, PACMAPAffinity.
- Doubly stochastic affinities: SinkhornAffinity, DoublyStochasticQuadraticAffinity.
- Adaptive affinities with entropy control: EntropicAffinity, SymmetricEntropicAffinity.
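To make the doubly stochastic family concrete, here is a minimal NumPy sketch of the classic Sinkhorn scaling applied to a Gaussian kernel. It illustrates the concept only and does not reproduce SinkhornAffinity; the names gaussian_kernel and sinkhorn_normalize, the bandwidth sigma, and the fixed iteration count are assumptions made for this example.

```python
import numpy as np


def gaussian_kernel(x, sigma=1.0):
    """Dense Gaussian affinity between all pairs of rows of x."""
    sq_dist = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def sinkhorn_normalize(k, n_iter=200):
    """Rescale rows and columns of a positive matrix until both sum to one."""
    u = np.ones(k.shape[0])
    v = np.ones(k.shape[1])
    for _ in range(n_iter):
        u = 1.0 / (k @ v)      # fix the row sums
        v = 1.0 / (k.T @ u)    # fix the column sums
    return u[:, None] * k * v[None, :]


rng = np.random.default_rng(0)
x = rng.uniform(size=(30, 2))
p = sinkhorn_normalize(gaussian_kernel(x))
print(p.sum(axis=0)[:3], p.sum(axis=1)[:3])
```

Because the kernel is symmetric and strictly positive, the scaled matrix is symmetric as well, so it can serve directly as an undirected affinity.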
TorchDR provides efficient GPU-compatible evaluation metrics: silhouette_score.
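As a reminder of what the metric measures, the silhouette coefficient can be written in a few lines of NumPy. This reference sketch follows the textbook definition with Euclidean distances and is not TorchDR's GPU implementation; the function name silhouette is hypothetical, and the code assumes at least two distinct labels.

```python
import numpy as np


def silhouette(x, labels):
    """Mean silhouette coefficient with Euclidean distances."""
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    scores = np.zeros(len(x))
    for i in range(len(x)):
        same = labels == labels[i]
        same[i] = False  # exclude the point itself
        if not same.any():
            continue  # singleton cluster: score 0 by convention
        a = dist[i, same].mean()  # mean distance within the own cluster
        b = min(  # mean distance to the nearest other cluster
            dist[i, labels == c].mean()
            for c in np.unique(labels)
            if c != labels[i]
        )
        scores[i] = (b - a) / max(a, b)
    return scores.mean()


points = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
score = silhouette(points, [0, 0, 1, 1])
print(score)
```

Values close to 1 indicate compact, well-separated clusters, which is why the score is a convenient check on an embedding when labels are available.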
Install the core torchdr library from PyPI:
pip install torchdr
torchdr does not install faiss-gpu or pykeops by default. You need to install them separately to use the corresponding backends.
- Faiss (Recommended): For the fastest k-NN computations, install Faiss. Please follow their official installation guide. A common method is using conda:
  conda install -c pytorch -c nvidia faiss-gpu
- KeOps: For memory-efficient symbolic computations, install PyKeOps.
  pip install pykeops
If you want to use the latest, unreleased version of torchdr, you can install it directly from GitHub:
pip install git+https://github.com/torchdr/torchdr
If you have any questions or suggestions, feel free to open an issue on the issue tracker or contact Hugues Van Assel directly.