TorchDR is an open-source dimensionality reduction (DR) library using PyTorch. Its goal is to provide fast GPU-compatible implementations of DR algorithms, as well as to accelerate the development of new DR methods by providing a common simplified framework.
DR aims to construct a low-dimensional representation (or embedding) of an input dataset that best preserves its geometry, encoded via a pairwise affinity matrix. To this end, DR methods optimize the embedding such that its associated pairwise affinity matrix matches the input affinity. TorchDR provides a general framework for solving problems of this form. Defining a DR algorithm requires only choosing or implementing an Affinity object for both the input and the embedding, as well as an objective function.
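To make this affinity-matching principle concrete, here is a minimal NumPy sketch (illustrative only, not the TorchDR API): a Gaussian affinity on the input, a heavy-tailed Student affinity on the embedding, and a KL-divergence objective minimized by gradient descent, as in t-SNE.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_affinity(X, sigma=1.0):
    # Input affinity: normalized Gaussian kernel on pairwise distances.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    P = np.exp(-d2 / (2 * sigma**2))
    np.fill_diagonal(P, 0.0)
    return P / P.sum()

def student_affinity(Z):
    # Embedding affinity: heavy-tailed Student kernel, as in t-SNE.
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    W = 1.0 / (1.0 + d2)
    np.fill_diagonal(W, 0.0)
    return W / W.sum()

def kl_loss(P, Z):
    # Objective: KL divergence between input and embedding affinities.
    Q = student_affinity(Z)
    mask = P > 0
    return (P[mask] * np.log(P[mask] / Q[mask])).sum()

def tsne_grad(P, Z):
    # Analytic gradient of the KL objective w.r.t. the embedding:
    # dC/dz_i = 4 * sum_j (P_ij - Q_ij) * W_ij * (z_i - z_j).
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    W = 1.0 / (1.0 + d2)
    np.fill_diagonal(W, 0.0)
    Q = W / W.sum()
    PQW = (P - Q) * W
    return 4.0 * (PQW.sum(1)[:, None] * Z - PQW @ Z)

X = rng.normal(size=(50, 10))        # high-dimensional input
Z = rng.normal(size=(50, 2)) * 1e-2  # low-dimensional embedding
P = gaussian_affinity(X)

losses = [kl_loss(P, Z)]
for _ in range(300):                 # plain gradient descent on the embedding
    Z -= 10.0 * tsne_grad(P, Z)
    losses.append(kl_loss(P, Z))
```

In TorchDR, the input affinity, the embedding affinity, and the objective are each swappable components; this is exactly the structure the framework abstracts.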
- Speed: supports GPU acceleration and leverages sparsity and sampling strategies from contrastive learning.
- Modularity: written entirely in Python in a highly modular way, making it easy to create or transform components.
- Memory efficiency: relies on sparsity and/or symbolic tensors to avoid memory overflows.
- Compatibility: implemented methods are fully compatible with the scikit-learn API and the torch ecosystem.
TorchDR offers a user-friendly API similar to scikit-learn, where dimensionality reduction modules can be called with the `fit_transform` method. It seamlessly accepts both NumPy arrays and PyTorch tensors as input, ensuring that the output matches the type and backend of the input.
```python
from sklearn.datasets import fetch_openml
from torchdr import PCA, TSNE

x = fetch_openml("mnist_784").data.astype("float32")
x_ = PCA(n_components=50).fit_transform(x)
z = TSNE(perplexity=30).fit_transform(x_)
```
TorchDR is fully GPU compatible, enabling significant speed-ups when a GPU is available. To run computations on the GPU, simply set `device="cuda"` as shown in the example below:

```python
z_gpu = TSNE(perplexity=30, device="cuda").fit_transform(x_)
```
The `backend` keyword specifies which tool to use for handling kNN computations and memory-efficient symbolic computations.

- To perform symbolic tensor computations on the GPU without memory limitations, you can leverage the KeOps library, which can also compute kNN graphs. To enable KeOps, set `backend="keops"`.
- Alternatively, you can set `backend="faiss"` to rely on Faiss for fast kNN computations.
- Finally, setting `backend=None` will use raw PyTorch for all computations.
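To clarify what these backends accelerate, here is a pure-NumPy sketch (illustrative only) of the brute-force kNN computation they replace. The dense (n, n) distance matrix built below is precisely the memory bottleneck that KeOps avoids materializing via symbolic tensors, and that Faiss sidesteps with optimized index structures.

```python
import numpy as np

def knn_graph(X, k):
    # Brute-force kNN on a dense (n, n) distance matrix. This is the
    # computation that backend="faiss" or backend="keops" accelerates;
    # it is shown here only to illustrate the semantics.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)          # a point is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]  # (n, k) neighbor indices

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
idx = knn_graph(X, k=10)                  # idx.shape == (100, 10)
```

The quadratic memory cost of the dense matrix is why raw PyTorch (`backend=None`) becomes limiting at large n, whereas the other backends scale to millions of samples.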
Relying on TorchDR enables an order-of-magnitude improvement in both runtime and memory performance compared to CPU-based implementations. See the code. Stay tuned for additional benchmarks.
| Dataset | Samples | Method | Runtime (sec) | Memory (MB) |
|---|---|---|---|---|
| Macosko | 44,808 | Classic UMAP (CPU) | 61.3 | 410.9 |
| Macosko | 44,808 | TorchDR UMAP (GPU) | 7.7 | 100.4 |
| 10x Mouse Zheng | 1,306,127 | Classic UMAP (CPU) | 1910.4 | 11278.1 |
| 10x Mouse Zheng | 1,306,127 | TorchDR UMAP (GPU) | 184.4 | 2699.7 |
See the examples folder for all examples.
MNIST. (Code) A comparison of various neighbor embedding methods on the MNIST digits dataset.
Single-cell genomics. (Code) Visualizing cells using LargeVis from TorchDR.
CIFAR100. (Code) Visualizing the CIFAR100 dataset using DINO features and TSNE.
TorchDR features a wide range of affinities which can be used as building blocks for DR algorithms. It includes:

- Usual affinities: `ScalarProductAffinity`, `GaussianAffinity`, `StudentAffinity`.
- Affinities based on k-NN normalizations: `SelfTuningAffinity`, `MAGICAffinity`.
- Doubly stochastic affinities: `SinkhornAffinity`, `DoublyStochasticQuadraticAffinity`.
- Adaptive affinities with entropy control: `EntropicAffinity`, `SymmetricEntropicAffinity`.
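As an illustration of the adaptive, entropy-controlled family, here is a NumPy sketch of the classic entropic affinity from SNE/t-SNE (illustrative only, not TorchDR's implementation): each point's bandwidth is found by bisection so that its conditional distribution has a prescribed perplexity.

```python
import numpy as np

def entropic_affinity(X, perplexity=30.0, tol=1e-5, max_iter=100):
    # For each point i, bisect the precision beta = 1/(2*sigma_i^2) so
    # that the conditional distribution P(.|i) has entropy log(perplexity),
    # as in SNE/t-SNE. Requires n > perplexity + 1 to be solvable.
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    target = np.log(perplexity)
    P = np.zeros((n, n))
    for i in range(n):
        di = np.delete(d2[i], i)                 # distances to other points
        lo, hi = 0.0, 1e6
        for _ in range(max_iter):
            beta = (lo + hi) / 2
            p = np.exp(-beta * (di - di.min()))  # shifted for stability
            p /= p.sum()
            H = -(p * np.log(p + 1e-300)).sum()  # Shannon entropy
            if abs(H - target) < tol:
                break
            if H > target:
                lo = beta                        # too flat: sharpen
            else:
                hi = beta                        # too peaked: flatten
        P[i, np.arange(n) != i] = p
    return P

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
P = entropic_affinity(X, perplexity=30.0)        # each row sums to 1
```

The per-point calibration is what makes these affinities adapt to varying local density, in contrast to a fixed-bandwidth `GaussianAffinity`.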
Spectral. TorchDR provides spectral embeddings computed via eigenvalue decomposition of the affinity matrix or its Laplacian: `PCA`, `KernelPCA`, `IncrementalPCA`.
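The spectral idea can be sketched in a few lines of NumPy (illustrative, not TorchDR's code): center the data, eigendecompose the covariance matrix, and project onto the leading eigenvectors.

```python
import numpy as np

def pca_eig(X, n_components=2):
    # PCA as a spectral method: eigendecomposition of the covariance
    # matrix, keeping the leading eigenvectors as principal directions.
    Xc = X - X.mean(axis=0)
    C = Xc.T @ Xc / (len(X) - 1)            # (d, d) covariance matrix
    evals, evecs = np.linalg.eigh(C)        # eigenvalues in ascending order
    top = evecs[:, ::-1][:, :n_components]  # leading principal directions
    return Xc @ top

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
Z = pca_eig(X, n_components=2)
```

Kernel and Laplacian variants follow the same recipe, with the covariance matrix replaced by a (centered) affinity matrix or its graph Laplacian.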
Neighbor Embedding. TorchDR includes various neighbor embedding methods: `SNE`, `TSNE`, `TSNEkhorn`, `UMAP`, `LargeVis`, `InfoTSNE`.
TorchDR provides efficient GPU-compatible evaluation metrics: `silhouette_score`.
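For reference, the silhouette score itself can be sketched in plain NumPy; the code below follows the standard definition and is not TorchDR's GPU implementation.

```python
import numpy as np

def silhouette_np(X, labels):
    # Standard silhouette score: for each point, s = (b - a) / max(a, b),
    # where a is its mean intra-cluster distance and b its mean distance
    # to the nearest other cluster. Assumes every cluster has >= 2 points.
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    n = len(X)
    scores = np.empty(n)
    for i in range(n):
        same = (labels == labels[i]) & (np.arange(n) != i)
        a = D[i, same].mean()
        b = min(D[i, labels == c].mean()
                for c in np.unique(labels) if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(3, 0.3, (30, 2))])
labels = np.array([0] * 30 + [1] * 30)
score = silhouette_np(X, labels)  # close to 1 for well-separated clusters
```

A score near 1 indicates compact, well-separated clusters; near 0, overlapping clusters; negative values suggest misassigned points.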
You can install the toolbox through PyPI with:

```bash
pip install torchdr
```

To get the latest version, you can install it from the source code as follows:

```bash
pip install git+https://github.com/torchdr/torchdr
```
If you have any questions or suggestions, feel free to open an issue on the issue tracker or contact Hugues Van Assel directly.