[WIP] Metric Learning :: NCA #4789

Closed
wants to merge 31 commits into from
4065753
Vectorized implementation of NCA
May 29, 2015
3f7abd8
Merge branch 'master' of github.com:scikit-learn/scikit-learn into me…
May 29, 2015
16adf0c
Let SciPy decide which method to use
May 29, 2015
453f4e7
Adds adagrad
May 29, 2015
baf96e1
Adds loopy NCA optimizer (without memory overhead)
May 29, 2015
cae85f3
Bugfixes and fancy debug output
May 29, 2015
6fb2853
Semi-vectorized NCA oracle, merged adagrad and gd
May 30, 2015
7c78bc1
Major update
May 31, 2015
9c5449c
Simple benchmark
May 31, 2015
1bf9eb4
Replace tol with gtol for scipy's optimizer
May 31, 2015
251818a
Fixes PyCharm warnings
May 31, 2015
ea577c5
Replace expensive multiply-sum with fast np.einsum
Jun 1, 2015
b11c259
Make oracle non-nested function to facilitate profiling
Jun 2, 2015
777c441
Get rid of lambdas, vectorize logsumexp
Jun 3, 2015
f2cc168
Merge branch 'master' of github.com:scikit-learn/scikit-learn into me…
Jun 3, 2015
e7ba1e3
Vectorized np.exp in semivectorized version
Jun 3, 2015
5e30c7c
Reuse euclidean_distances
Jun 3, 2015
187f663
Adds euclidean_distances to vectorized version
Jun 4, 2015
fe8bf85
Arguments rename, basic implementation of stochastic solver
Jun 5, 2015
504e972
Some documentation
Jun 5, 2015
e06e634
Fixes
Jun 7, 2015
e7c76fb
Adds threshold parameter
Jun 9, 2015
7f02808
Reworked benchmark
Jun 9, 2015
76fe794
Merge branch 'master' of github.com:scikit-learn/scikit-learn into me…
Jun 9, 2015
92409fc
New benchmark, fix for loss calculation
Jun 14, 2015
4830a78
Mean execution time
Jun 22, 2015
ee2f95d
Propagate L inside
Jun 18, 2015
63997df
Propagate L flag, debug facilities, optimizations
Jun 24, 2015
ce5a9bd
Combine semivectorized and stochastic methods
Jun 27, 2015
8d6befe
Fix minor warnings
Jun 27, 2015
efce3f2
Merge branch 'master' of github.com:scikit-learn/scikit-learn into HEAD
Jun 27, 2015
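Commit ea577c5 above swaps a multiply-then-sum for np.einsum. The affected lines are not visible on this page, so the following is only a generic sketch of that kind of rewrite; the arrays A and B and their shapes are hypothetical and not taken from the PR.

```python
import numpy as np

rng = np.random.RandomState(0)
A = rng.rand(50, 8)  # hypothetical operands, not from the PR
B = rng.rand(50, 8)

# Multiply-then-sum first materializes a temporary (50, 8) array.
slow = (A * B).sum(axis=1)

# einsum computes the same row-wise dot products without that temporary.
fast = np.einsum('ij,ij->i', A, B)

assert np.allclose(slow, fast)
```

Whether einsum is actually faster depends on array sizes and the NumPy build, so it is worth timing both forms on the real data.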
61 changes: 61 additions & 0 deletions benchmarks/bench_nca.py
@@ -0,0 +1,61 @@
"""
A comparison of different optimizers for NCA

Data: UC Irvine's Wine dataset and a synthetic make_classification problem.

"""
import numpy as np

from sklearn.base import clone
from sklearn import metrics
from sklearn.metric_learning import NCATransformer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.cross_validation import KFold

from sklearn.utils.bench import total_seconds
from sklearn.datasets import fetch_mldata, make_classification

def get_params(learning_rate=1, loss='kl', max_iter=100, solver='adagrad',
               random_state=0, verbose=2, n_init=10, method='semivectorized',
               threshold=None):
    return {
        "learning_rate": learning_rate,
        "loss": loss,
        "max_iter": max_iter,
        "solver": solver,
        "random_state": random_state,
        "verbose": verbose,
        "n_init": n_init,
        "method": method,
        "threshold": threshold,
    }

if __name__ == '__main__':

    separator = "=" * 30 + "\n"
    wine_ds = fetch_mldata('wine')

    datasets = [
        ('UCI Wine', (wine_ds.data, wine_ds.target)),
        ('mk_cls', make_classification(n_samples=300, n_features=20,
                                       n_informative=2, n_classes=2))
    ]

    params = [
        get_params(method='semivectorized'),
        get_params(method='vectorized'),
        get_params(method='semivectorized', threshold=0),
        get_params(method='vectorized', threshold=0),
    ]

    for name, (X, y) in datasets:
        print(separator)
        print("Using {} dataset".format(name))
        print(separator)

        for args in params:
            nca = NCATransformer(**args)
            print(nca)
            nca.fit(X, y)
            print(separator)
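The loop above fits each configuration but does not record timings itself. If wall-clock numbers are wanted, a small helper along these lines could wrap the fit call. This is a sketch, not part of the PR: mean_fit_time and n_repeats are hypothetical names, and the workload is a stand-in because NCATransformer exists only on this branch.

```python
import time


def mean_fit_time(fit, n_repeats=3):
    """Return the mean wall-clock time, in seconds, of calling fit()."""
    timings = []
    for _ in range(n_repeats):
        start = time.perf_counter()
        fit()
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)


# Stand-in workload. In the benchmark this would be something like
# lambda: NCATransformer(**args).fit(X, y)
elapsed = mean_fit_time(lambda: sum(i * i for i in range(10000)))
print("mean fit time: {:.6f}s".format(elapsed))
```

Averaging over several repeats smooths out noise from caching and other processes; with verbose=2 the estimator's own output may dominate the measured time, so lowering verbosity while timing is probably sensible.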

9 changes: 9 additions & 0 deletions sklearn/metric_learning/__init__.py
@@ -0,0 +1,9 @@
"""
The :mod:`sklearn.metric_learning` module includes models that find
a metric that makes samples from the same class closer than samples
from different classes.
"""

from .nca import NCATransformer

__all__ = ['NCATransformer']
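The file sklearn/metric_learning/nca.py is not shown on this page, so the following is not the PR's implementation. It is a minimal NumPy sketch of the standard NCA objective (Goldberger et al., 2004), which makes the docstring's "same class closer" idea concrete: each sample picks a neighbour with softmax probability over negative squared distances in the projected space, and the objective sums the probability of picking a same-class neighbour. The function name nca_objective and the argument name L are illustrative assumptions.

```python
import numpy as np


def nca_objective(L, X, y):
    """Sum over samples of the probability of choosing a same-class
    neighbour under stochastic nearest-neighbour assignment."""
    Z = np.dot(X, L.T)  # project samples with the linear map L
    # Pairwise squared Euclidean distances in the projected space.
    sq = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(sq, np.inf)  # a sample never picks itself
    logits = -sq
    logits -= logits.max(axis=1, keepdims=True)  # stabilize the softmax
    P = np.exp(logits)
    P /= P.sum(axis=1, keepdims=True)  # row i holds p_ij over j
    same = y[:, None] == y[None, :]  # True where labels match
    return (P * same).sum()


X = np.array([[0.0, 0.0], [0.0, 0.1], [5.0, 5.0], [5.0, 5.1]])
y = np.array([0, 0, 1, 1])
print(nca_objective(np.eye(2), X, y))
```

The value is bounded above by the number of samples, and it approaches that bound when every sample's nearest neighbours share its label. This version builds an n x n x d array, so it is only suitable for small inputs.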