TorchDR is an open-source dimensionality reduction (DR) library using PyTorch. Its goal is to provide fast GPU-compatible implementations of DR algorithms, as well as to accelerate the development of new DR methods by providing a common simplified framework.
DR aims to construct a low-dimensional representation (or embedding) of an input dataset that best preserves its geometry, encoded via a pairwise affinity matrix. To this end, DR methods optimize the embedding such that its associated pairwise affinity matrix matches the input affinity. TorchDR provides a general framework for solving problems of this form. Defining a DR algorithm requires only choosing or implementing an Affinity object for both the input and the embedding, as well as an objective function.
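To make this affinity-matching principle concrete, here is a minimal NumPy sketch (illustrative only, not the TorchDR API): a Gaussian affinity on the input, a heavy-tailed Student affinity on the embedding, and a KL-divergence objective minimized by gradient descent, as in t-SNE.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_affinity(X, sigma=1.0):
    # Input affinity: normalized Gaussian kernel on pairwise distances.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    P = np.exp(-d2 / (2 * sigma**2))
    np.fill_diagonal(P, 0.0)
    return P / P.sum()

def student_affinity(Z):
    # Embedding affinity: heavy-tailed Student kernel, as in t-SNE.
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    W = 1.0 / (1.0 + d2)
    np.fill_diagonal(W, 0.0)
    return W / W.sum()

def kl_loss(P, Z):
    # Objective: KL divergence between input and embedding affinities.
    Q = student_affinity(Z)
    mask = P > 0
    return (P[mask] * np.log(P[mask] / Q[mask])).sum()

def tsne_grad(P, Z):
    # Analytic gradient of the KL objective w.r.t. the embedding:
    # dC/dz_i = 4 * sum_j (P_ij - Q_ij) * W_ij * (z_i - z_j).
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    W = 1.0 / (1.0 + d2)
    np.fill_diagonal(W, 0.0)
    Q = W / W.sum()
    PQW = (P - Q) * W
    return 4.0 * (PQW.sum(1)[:, None] * Z - PQW @ Z)

X = rng.normal(size=(50, 10))        # high-dimensional input
Z = rng.normal(size=(50, 2)) * 1e-2  # low-dimensional embedding
P = gaussian_affinity(X)

losses = [kl_loss(P, Z)]
for _ in range(300):                 # plain gradient descent on the embedding
    Z -= 10.0 * tsne_grad(P, Z)
    losses.append(kl_loss(P, Z))
```

In TorchDR, the input affinity, the embedding affinity, and the objective are each swappable components; this is exactly the structure the framework abstracts.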
- Speed: supports GPU acceleration and leverages sparsity and sampling strategies from contrastive learning.
- Modularity: written entirely in Python in a highly modular way, making it easy to create or transform components.
- Memory efficiency: relies on sparsity and/or symbolic tensors to avoid memory overflows.
- Compatibility: implemented methods are fully compatible with the scikit-learn API and the torch ecosystem.
TorchDR offers a user-friendly API similar to scikit-learn, where dimensionality reduction modules can be called with the `fit_transform` method. It seamlessly accepts both NumPy arrays and PyTorch tensors as input, ensuring that the output matches the type and backend of the input.
```python
from sklearn.datasets import fetch_openml
from torchdr import PCA, TSNE

x = fetch_openml("mnist_784").data.astype("float32")
x_ = PCA(n_components=50).fit_transform(x)
z = TSNE(perplexity=30).fit_transform(x_)
```
TorchDR is fully GPU compatible, enabling significant speed-ups when a GPU is available. To run computations on the GPU, simply set `device="cuda"` as shown in the example below:

```python
z_gpu = TSNE(perplexity=30, device="cuda").fit_transform(x_)
```
The `backend` keyword specifies which tool to use for handling kNN computations and memory-efficient symbolic computations.

- To perform symbolic tensor computations on the GPU without memory limitations, you can leverage the KeOps library, which can also compute kNN graphs. To enable KeOps, set `backend="keops"`.
- Alternatively, you can set `backend="faiss"` to rely on Faiss for fast kNN computations.
- Finally, setting `backend=None` will use raw PyTorch for all computations.
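To clarify what these backends accelerate, here is a pure-NumPy sketch (illustrative only) of the brute-force kNN computation they replace. The dense (n, n) distance matrix built below is precisely the memory bottleneck that KeOps avoids materializing via symbolic tensors, and that Faiss sidesteps with optimized index structures.

```python
import numpy as np

def knn_graph(X, k):
    # Brute-force kNN on a dense (n, n) distance matrix. This is the
    # computation that backend="faiss" or backend="keops" accelerates;
    # it is shown here only to illustrate the semantics.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)          # a point is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]  # (n, k) neighbor indices

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
idx = knn_graph(X, k=10)                  # idx.shape == (100, 10)
```

The quadratic memory cost of the dense matrix is why raw PyTorch (`backend=None`) becomes limiting at large n, whereas the other backends scale to millions of samples.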
Relying on TorchDR enables an order-of-magnitude improvement in both runtime and memory performance compared to CPU-based implementations. See the code. Stay tuned for additional benchmarks.
| Dataset | Samples | Method | Runtime (sec) | Memory (MB) |
|---|---|---|---|---|
| Macosko | 44,808 | Classic UMAP (CPU) | 61.3 | 410.9 |
| Macosko | 44,808 | TorchDR UMAP (GPU) | 7.7 | 100.4 |
| 10x Mouse Zheng | 1,306,127 | Classic UMAP (CPU) | 1910.4 | 11278.1 |
| 10x Mouse Zheng | 1,306,127 | TorchDR UMAP (GPU) | 184.4 | 2699.7 |
See the examples folder for all examples.
MNIST. (Code) A comparison of various neighbor embedding methods on the MNIST digits dataset.
Single-cell genomics. (Code) Visualizing cells using LargeVis from TorchDR.
CIFAR100. (Code) Visualizing the CIFAR100 dataset using DINO features and TSNE.
TorchDR features a wide range of affinities which can be used as building blocks for DR algorithms. It includes:

- Usual affinities: `ScalarProductAffinity`, `GaussianAffinity`, `StudentAffinity`.
- Affinities based on k-NN normalizations: `SelfTuningAffinity`, `MAGICAffinity`.
- Doubly stochastic affinities: `SinkhornAffinity`, `DoublyStochasticQuadraticAffinity`.
- Adaptive affinities with entropy control: `EntropicAffinity`, `SymmetricEntropicAffinity`.
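As an illustration of the adaptive, entropy-controlled family, here is a NumPy sketch of the classic entropic affinity from SNE/t-SNE (illustrative only, not TorchDR's implementation): each point's bandwidth is found by bisection so that its conditional distribution has a prescribed perplexity.

```python
import numpy as np

def entropic_affinity(X, perplexity=30.0, tol=1e-5, max_iter=100):
    # For each point i, bisect the precision beta = 1/(2*sigma_i^2) so
    # that the conditional distribution P(.|i) has entropy log(perplexity),
    # as in SNE/t-SNE. Requires n > perplexity + 1 to be solvable.
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    target = np.log(perplexity)
    P = np.zeros((n, n))
    for i in range(n):
        di = np.delete(d2[i], i)                 # distances to other points
        lo, hi = 0.0, 1e6
        for _ in range(max_iter):
            beta = (lo + hi) / 2
            p = np.exp(-beta * (di - di.min()))  # shifted for stability
            p /= p.sum()
            H = -(p * np.log(p + 1e-300)).sum()  # Shannon entropy
            if abs(H - target) < tol:
                break
            if H > target:
                lo = beta                        # too flat: sharpen
            else:
                hi = beta                        # too peaked: flatten
        P[i, np.arange(n) != i] = p
    return P

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
P = entropic_affinity(X, perplexity=30.0)        # each row sums to 1
```

The per-point calibration is what makes these affinities adapt to varying local density, in contrast to a fixed-bandwidth `GaussianAffinity`.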
Spectral. TorchDR provides spectral embeddings computed via eigenvalue decomposition of the affinity matrix or its Laplacian: `PCA`, `KernelPCA`, `IncrementalPCA`.
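The spectral idea can be sketched in a few lines of NumPy (illustrative, not TorchDR's code): center the data, eigendecompose the covariance matrix, and project onto the leading eigenvectors.

```python
import numpy as np

def pca_eig(X, n_components=2):
    # PCA as a spectral method: eigendecomposition of the covariance
    # matrix, keeping the leading eigenvectors as principal directions.
    Xc = X - X.mean(axis=0)
    C = Xc.T @ Xc / (len(X) - 1)            # (d, d) covariance matrix
    evals, evecs = np.linalg.eigh(C)        # eigenvalues in ascending order
    top = evecs[:, ::-1][:, :n_components]  # leading principal directions
    return Xc @ top

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
Z = pca_eig(X, n_components=2)
```

Kernel and Laplacian variants follow the same recipe, with the covariance matrix replaced by a (centered) affinity matrix or its graph Laplacian.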
Neighbor Embedding. TorchDR includes various neighbor embedding methods: `SNE`, `TSNE`, `TSNEkhorn`, `UMAP`, `LargeVis`, `InfoTSNE`.
TorchDR provides efficient GPU-compatible evaluation metrics: `silhouette_score`.
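For reference, the silhouette score itself can be sketched in plain NumPy; the code below follows the standard definition and is not TorchDR's GPU implementation.

```python
import numpy as np

def silhouette_np(X, labels):
    # Standard silhouette score: for each point, s = (b - a) / max(a, b),
    # where a is its mean intra-cluster distance and b its mean distance
    # to the nearest other cluster. Assumes every cluster has >= 2 points.
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    n = len(X)
    scores = np.empty(n)
    for i in range(n):
        same = (labels == labels[i]) & (np.arange(n) != i)
        a = D[i, same].mean()
        b = min(D[i, labels == c].mean()
                for c in np.unique(labels) if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(3, 0.3, (30, 2))])
labels = np.array([0] * 30 + [1] * 30)
score = silhouette_np(X, labels)  # close to 1 for well-separated clusters
```

A score near 1 indicates compact, well-separated clusters; near 0, overlapping clusters; negative values suggest misassigned points.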
You can install the toolbox through PyPI with:

```bash
pip install torchdr
```

To get the latest version, you can install it from the source code as follows:

```bash
pip install git+https://github.com/torchdr/torchdr
```
If you have any questions or suggestions, feel free to open an issue on the issue tracker or contact Hugues Van Assel directly.