[MRG] Example Bag-of-Visual-Words #6509

glemaitre · 2016-03-08T20:18:06Z

This is a first draft of an example illustrating Bag of Visual Words (BoVW) with scikit-learn.

amueller · 2016-03-08T23:17:25Z

Using k-means++ often means that the initialization takes much longer than the actual algorithm (in particular for large n_clusters).
I wouldn't do cross-validation, and I'd see if you can make it faster by using fewer words or fewer iterations while still getting comparable performance.
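A minimal sketch of the "fewer words, fewer iterations" idea. Assumptions: random data stands in for flattened 9x9 patches, and the parameter values are illustrative, not the ones used in the PR. `init="random"` is one way to skip the k-means++ seeding pass; the thread does not say which initialization the example ended up using.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)
# Hypothetical stand-in for 5000 flattened 9x9 grayscale patches (81 features).
patches = rng.rand(5000, 81)

# init="random" avoids the k-means++ seeding cost, which grows with n_clusters;
# n_init=1 and a small max_iter bound the fitting time, possibly at the cost
# of a slightly worse codebook.
codebook = KMeans(n_clusters=50, init="random", n_init=1, max_iter=20,
                  random_state=0).fit(patches)
print(codebook.cluster_centers_.shape)  # (50, 81)
```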

amueller · 2016-03-08T23:18:33Z

sklearn/datasets/tudarmstadt.py

@@ -0,0 +1,126 @@
+"""TU Darmstadt dataset.


Maybe I'll just add this to the example, and not put it in the datasets folder. This is not really a widely used dataset, and once it is in the public API, it is hard to get rid of.

glemaitre · 2016-03-09T01:18:47Z

I moved the dataset into the example, as suggested.

Regarding the parameters, I decreased the number of words to 80. I also limited the maximum number of patches extracted to 20,000 per image (maybe we will end up classifying grass instead of cows), which is about 1/4 to 1/5 of the total number of possible patches.

The results seem reasonable, and the computation time has decreased a lot, to the point where it is starting to be acceptable.

Classification performed - the confusion matrix obtained is:
[[23  0  0]
 [ 1 19  0]
 [ 0  0 23]]
It took 85.6792821884 seconds.
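A small sketch of how `max_patches` caps patch extraction and how the number of possible patches is counted. Assumptions: a random 120x160 image stands in for a TU Darmstadt image (real sizes differ), and 2000 is an illustrative cap, not the 20,000 used here.

```python
import numpy as np
from sklearn.feature_extraction.image import extract_patches_2d

rng = np.random.RandomState(0)
image = rng.rand(120, 160)  # hypothetical grayscale image
patch_size = (9, 9)

# An (H, W) image contains (H - p_h + 1) * (W - p_w + 1) overlapping patches.
n_possible = ((image.shape[0] - patch_size[0] + 1)
              * (image.shape[1] - patch_size[1] + 1))

# An integer max_patches randomly samples at most that many patches.
patches = extract_patches_2d(image, patch_size, max_patches=2000,
                             random_state=rng)
print(n_possible, patches.shape)  # 17024 (2000, 9, 9)
```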

GaelVaroquaux · 2016-03-09T06:16:15Z

> It took 85.6792821884 seconds.

That's looking great! If you manage to get this below 1 min (you're almost there), we can have this as one of our standard examples.

glemaitre · 2016-03-09T08:38:09Z

Using 50 words lowers the accuracy a bit, but the run time goes down to about 35 s.

Classification performed - the confusion matrix obtained is:
[[19  4  0]
 [ 0 20  0]
 [ 0  0 23]]
It took 33.9749679565 seconds.

Also, I am taking advantage of my 4 available threads (i7 2620M). I don't know whether that should be taken into account.

glemaitre · 2016-03-11T15:50:50Z

@GaelVaroquaux @amueller

What further improvements/changes should I take care of?

Cheers,

glemaitre · 2016-04-23T13:50:25Z

@amueller @GaelVaroquaux

What else should I look at?

amueller · 2016-10-11T01:42:15Z

Sorry for the slow reply.
Please fix Python 3 compatibility and rename the example to `plot_bovw.py` (the `plot_` prefix is the important part; I just thought "example" was a bit redundant in the example folder ;).

Also add a header to the example and a text explaining what's happening.
Thanks!

glemaitre · 2016-10-11T12:28:19Z

OK, I am now up to date with the requirements. I'll do that over the weekend and ping you once done.



glemaitre · 2016-10-11T19:54:35Z

@amueller Shall I remove the timer?

glemaitre · 2016-10-18T22:05:42Z

@amueller @GaelVaroquaux Do you see any other additional improvements to bring?

Should tudarmstadt.py be moved into the sklearn.datasets module?

I decreased the number of patches to bring the processing time under 10 s.
I got the following confusion matrix, which I think is fine for an example:

[[21  2  0]
 [ 4 16  0]
 [ 0  0 23]]

We confuse cars and motorbikes, but we get the cows right ;D
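To compare the three runs reported in this thread, the overall accuracy is the trace of each confusion matrix divided by its total. A small sketch using those matrices, assuming scikit-learn's `confusion_matrix` convention (rows are true classes, columns are predicted classes); the run labels are descriptive names chosen here, not from the PR.

```python
import numpy as np

# Confusion matrices reported in this thread.
runs = {
    "80 words": np.array([[23, 0, 0], [1, 19, 0], [0, 0, 23]]),
    "50 words": np.array([[19, 4, 0], [0, 20, 0], [0, 0, 23]]),
    "under 10 s": np.array([[21, 2, 0], [4, 16, 0], [0, 0, 23]]),
}

accuracies = {}
for name, cm in runs.items():
    # Correctly classified samples sit on the diagonal.
    accuracies[name] = np.trace(cm) / cm.sum()
    print(f"{name}: {accuracies[name]:.3f}")
# 80 words: 0.985
# 50 words: 0.939
# under 10 s: 0.909
```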

amueller · 2016-10-19T15:14:01Z

examples/bovw/plot_bovw.py

+Bag of Visual Words
+===================
+
+An illustration of a Bag of Visual Words approach for image recognition


Maybe explain that this is not how one would actually solve this problem, but that it's simplified so that it doesn't need any vision library and runs fast enough ;)
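A minimal sketch of the simplified pipeline being described (raw-pixel patches, a k-means vocabulary, word-count histograms, a classifier), using only scikit-learn and NumPy. Assumptions: the toy images, their sizes, the vocabulary size, and the choice of `LogisticRegression` are illustrative and not taken from the PR; the real example uses the TU Darmstadt images.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.image import extract_patches_2d
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
patch_size, max_patches, n_words = (9, 9), 100, 20

# Hypothetical toy data: two classes of 32x32 images differing in brightness.
images = [rng.rand(32, 32) + (i % 2) for i in range(20)]
labels = np.array([i % 2 for i in range(20)])


def image_patches(image):
    # Sample raw-pixel patches and flatten each into a feature vector.
    patches = extract_patches_2d(image, patch_size, max_patches=max_patches,
                                 random_state=rng)
    return patches.reshape(len(patches), -1)


per_image = [image_patches(im) for im in images]

# 1. Learn the visual vocabulary by clustering the patches of all images.
vocabulary = KMeans(n_clusters=n_words, n_init=1, random_state=0)
vocabulary.fit(np.vstack(per_image))


# 2. Encode each image as a normalized histogram of visual-word counts.
def bovw_histogram(patches):
    words = vocabulary.predict(patches)
    counts = np.bincount(words, minlength=n_words)
    return counts / counts.sum()


X = np.array([bovw_histogram(p) for p in per_image])

# 3. Train a classifier on the histograms (training score only, not an evaluation).
clf = LogisticRegression().fit(X, labels)
print(X.shape, clf.score(X, labels))
```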

amueller · 2016-10-19T15:14:40Z

Can you check the pep8 errors please?

glemaitre · 2016-10-19T15:51:19Z

> Can you check the pep8 errors please?

@amueller I get the following errors because the docstring is printed before the imports:

plot_bovw.py:25:1: E402 module level import not at top of file
plot_bovw.py:26:1: E402 module level import not at top of file
plot_bovw.py:28:1: E402 module level import not at top of file
plot_bovw.py:29:1: E402 module level import not at top of file
plot_bovw.py:30:1: E402 module level import not at top of file
plot_bovw.py:31:1: E402 module level import not at top of file
plot_bovw.py:32:1: E402 module level import not at top of file
plot_bovw.py:33:1: E402 module level import not at top of file
plot_bovw.py:34:1: E402 module level import not at top of file
plot_bovw.py:36:1: E402 module level import not at top of file
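pycodestyle reports E402 when an executable statement such as `print(__doc__)` precedes module-level imports; a module docstring alone is allowed before them. A sketch of a layout that avoids it, in line with the later commit "Moved the print to avoid the E402". The header text follows the diff quoted in this thread; the single import and the `rng` line are illustrative.

```python
"""
Bag of Visual Words
===================

An illustration of a Bag of Visual Words approach for image recognition.
"""
# Imports come directly after the module docstring, so E402 is not raised.
import numpy as np

# Executable statements, including printing the docstring, follow the imports.
print(__doc__)

rng = np.random.RandomState(0)
```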

raghavrv · 2016-10-31T16:07:45Z

examples/bovw/plot_bovw.py

+from sklearn.metrics import confusion_matrix
+from sklearn.externals.joblib import Parallel, delayed
+
+from tudarmstadt import fetch_tu_darmstadt


This needs to be moved to sklearn/datasets

glemaitre · 2016-10-31T18:22:45Z

@raghavrv I moved it. But let me recall what @amueller mentioned earlier:

> Maybe I'll just add this to the example, and not put it in the datasets folder. This is not really a widely used dataset, and once it is in the public API, it is hard to get rid of.

That's why we kept it originally in the example folder. So let me know what is the best.

raghavrv · 2016-10-31T22:57:52Z

> That's why we kept it originally in the example folder. So let me know what is the best.

Argh. My bad! I did not notice that comment... Could you very kindly revert please?

glemaitre · 2016-10-31T23:05:23Z

It was hidden because I had previously addressed it. I remembered it when it popped up again after the revert.

glemaitre · 2016-12-01T10:48:00Z

@amueller ping

lesteve · 2016-12-01T15:36:37Z

> That's why we kept it originally in the example folder. So let me know what is the best.

Hmmm, but then it kind of defeats the implicit contract of an example, which is that you can copy and paste it into an IPython session and have it run without a hitch.

Not sure what the best way of doing this is (the dataset is 42 MB, so it is too big to be included in the scikit-learn repo). Here are a few alternatives I can think of:

- put the fetcher code in the example
- keep it like this, but explain in the example docstring that you need to copy the fetcher file locally (potentially linking to its example HTML URL)
- create a sklearn.example_datasets for datasets that are only used in the examples

Not convinced by any of these alternatives, so better suggestions welcome!

lesteve · 2016-12-01T16:06:28Z

Another alternative: download the fetcher file tudarmstadt.py from the example gallery "Download Python file" URL and exec it inside plot_bovw.py. This keeps the example simple, though it is maybe a little too much magic...

GaelVaroquaux · 2016-12-02T06:47:58Z

I think that it needs either to be added inside the example or moved to the datasets module.

If we move it to the datasets module, do you foresee other examples built upon it?

glemaitre · 2016-12-04T12:34:23Z

I think that there is room for additional examples using this dataset and core methods from sklearn.
I would probably focus on examples proposed in the PASCAL challenge.

amueller · 2016-12-06T21:12:39Z

I don't think we'll have other examples use visual datasets (either this or PASCAL VOC) because we don't have the tools to extract the right features - and because everything runs waaaay too long for a scikit-learn example.

amueller · 2016-12-06T21:14:04Z

So I'd rather not have it in the datasets folder: it's not very relevant to current computer vision, and it doesn't really help show anything else in scikit-learn. But the example file is already quite long...

amueller · 2016-12-06T21:16:07Z

examples/bovw/plot_bovw.py

+# Define the parameters in use afterwards
+patch_size = (9, 9)
+max_patches = 100
+n_jobs = -1


We can't really do that in an example, can we? We have n_jobs=3 and n_jobs=4 in some examples, though I'm not sure that is a good idea.

I modified it to run on a single core.
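A sketch of per-image patch extraction dispatched through joblib with a single worker. Assumptions: the random 64x64 images are stand-ins, and the imports use the standalone `joblib` package, because `sklearn.externals.joblib` (imported in the diff) was later removed from scikit-learn. With `n_jobs=1` the tasks run sequentially, and the `Parallel` wrapper only keeps the option of raising `n_jobs` later.

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn.feature_extraction.image import extract_patches_2d

patch_size = (9, 9)
max_patches = 100
n_jobs = 1  # single core, as in the revised example

rng = np.random.RandomState(0)
images = [rng.rand(64, 64) for _ in range(6)]  # hypothetical stand-in images

# One extraction task per image; a fixed seed per image keeps results reproducible.
patches = Parallel(n_jobs=n_jobs)(
    delayed(extract_patches_2d)(im, patch_size, max_patches=max_patches,
                                random_state=i)
    for i, im in enumerate(images))
patches = np.concatenate(patches)
print(patches.shape)  # (600, 9, 9)
```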

amueller · 2016-12-06T21:17:34Z

examples/bovw/plot_bovw.py

+patch_arr = patch_arr.reshape((patch_arr.shape[0] * patch_arr.shape[1],
+                               patch_arr.shape[2]))
+# Build a PCA dictionary
+dict_PCA = PCA(n_components=n_components, random_state=rng)


Do the results get much worse / are much slower if you work on the raw pixels instead of PCA components?

The results are equivalent. I removed the PCA since it shortens the example, and this is still a texton approach.
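A sketch of the two variants discussed: stacking the per-image patches into a 2-D array, then learning the vocabulary either on PCA components or directly on raw pixels. Assumptions: `patch_arr` is taken to be shaped (n_images, n_patches_per_image, n_pixels), which matches the reshape in the diff, and `n_components=20` and `n_clusters=30` are arbitrary illustrative values.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
# Hypothetical: 10 images x 100 flattened 9x9 patches each.
patch_arr = rng.rand(10, 100, 81)

# Stack the patches of all images into a (n_samples, n_features) array.
patch_arr = patch_arr.reshape((patch_arr.shape[0] * patch_arr.shape[1],
                               patch_arr.shape[2]))

# Variant 1: project the patches onto PCA components before clustering.
reduced = PCA(n_components=20, random_state=0).fit_transform(patch_arr)
words_pca = KMeans(n_clusters=30, n_init=1, random_state=0).fit(reduced)

# Variant 2: cluster the raw pixels directly, as in the final version.
words_raw = KMeans(n_clusters=30, n_init=1, random_state=0).fit(patch_arr)

print(words_pca.cluster_centers_.shape, words_raw.cluster_centers_.shape)
# (30, 20) (30, 81)
```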

glemaitre · 2016-12-06T22:35:47Z

> I don't think we'll have other examples use visual datasets (either this or PASCAL VOC) because we don't have the tools to extract the right features - and because everything runs waaaay too long for a scikit-learn example.

Fair enough. With this example, the only interesting point to me is the parallel between Bag of Words and Bag of Visual Words.
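To make the parallel concrete: both representations count occurrences over a vocabulary. In text, `CountVectorizer` builds the vocabulary from tokens. In images, the vocabulary is the set of k-means clusters, and each patch's cluster index plays the role of a word. In this sketch the word assignments for the image are hypothetical and stand in for the output of `KMeans.predict` on one image's patches.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Bag of Words: one count vector per document over the learned vocabulary.
docs = ["the cow eats grass", "the car passes the cow"]
bow = CountVectorizer().fit_transform(docs).toarray()

# Bag of Visual Words: one count vector per image over the visual vocabulary.
n_visual_words = 5
visual_words = np.array([0, 3, 3, 1, 4, 3, 0])  # hypothetical patch-to-word labels
bovw = np.bincount(visual_words, minlength=n_visual_words)

print(bow)   # columns: car, cow, eats, grass, passes, the
print(bovw)  # [2 1 0 3 1]
```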

glemaitre · 2016-12-15T13:05:55Z

@amueller are the last changes fine with you?

glemaitre · 2017-01-12T10:54:07Z

@amueller @GaelVaroquaux is this PR still of interest for merging, or should it be closed?

glemaitre · 2017-02-23T17:19:11Z

I am closing this PR. It seems that scikit-learn already has plenty of examples.

amueller reviewed Mar 8, 2016

Guillaume Lemaitre added 10 commits October 11, 2016 21:06

Add the first draft of the BoVW

7242783

Update the extraction

b0159bc

Change size of the batch

4dd4f66

Correct a stupid stacking mistake to create the training and testing sets

8f33995

Correct the error of conversion

5dd3f08

Move the dataset in example and set-up the parameters to speed-up the application

4bcf0fe

Reducing the number of words

60c7c74

Respect PEP8 standard

88a9415

Change stratifiedkfold from cross_validation to model_selection

3f36f25

Fix header, Python 3, and import

8e46d38

glemaitre force-pushed the example_bow branch from 6309899 to 8e46d38 on October 11, 2016 19:51

Remove a useless diff

13b8bb8

RPGOne approved these changes Oct 11, 2016


Remove timer

1fbef83

amueller reviewed Oct 19, 2016


Add more explanation and remove useless module

6e13c15

raghavrv suggested changes Oct 31, 2016


Update the doc

3c853eb

glemaitre force-pushed the example_bow branch from 3851d39 to 3c853eb on October 31, 2016 23:03

nelson-liu mentioned this pull request Nov 19, 2016

[MRG+1] Support Vector Data Description #7910


Moved the print to avoid the E402

df1f169

amueller reviewed Dec 6, 2016


Remove PCA and make it more pythonic

b5a5d7f

PEP8

c5f13ac

glemaitre force-pushed the example_bow branch from c9d04f9 to c5f13ac on December 7, 2016 08:54

glemaitre changed the title from [WIP] Example Bag-of-Visual-Words to [MRG] Example Bag-of-Visual-Words on Jan 12, 2017

glemaitre closed this Feb 23, 2017
