MNT Speed up of plot_face_recognition.py example #21725


Merged

Conversation

pramodatre
Contributor

Reference Issues/PRs

References: #21598

What does this implement/fix? Explain your changes.

GridSearchCV now leverages all the cores on the machine by specifying n_jobs=-1; this is the most time-consuming step in this example.

This example now runs in about 6s, down from around 20s (a ~3x speedup).

Before:

Fitting the classifier to the training set
done in 21.296s

After:

Fitting the classifier to the training set
done in 6.311s

…ning GridSearchCV. Speed up is around 3x (plot_face_recognition.py running time reduced from 21s to 6s)
@ghost

ghost commented Nov 21, 2021

@adrinjalali @ogrisel

Hey Pramod,

thanks for the PR. I appreciate the effort :)
However, we already had several discussions concerning the n_jobs parameter and decided against changing it to -1 for concurrency reasons. Let me quote @ogrisel, who argued:

> Running with -1 by default is problematic because on machines with a large number of CPUs (e.g. 64 or more), spawning the workers can dominate with concurrent access to the hard disk just to start the Python interpreters and import the modules. Furthermore it can also use too much memory and cause crashes.
>
> This is why we would rather use a small number of workers (e.g. 2 instead of -1) when we want to use parallelism in examples or tests in scikit-learn.

Therefore, we should either change the parameter to 2 or look for other ways of speeding up the execution.
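For reference, a minimal sketch of what the n_jobs=2 variant looks like. The data here is synthetic (make_classification), standing in for the LFW faces, and the tiny grid is illustrative rather than the example's actual grid; the point is only the n_jobs argument:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for the LFW face data used by the example.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Illustrative (not the example's) grid; the point is the n_jobs argument.
param_grid = {"C": [1.0, 10.0], "gamma": [0.01, 0.1]}
clf = GridSearchCV(
    SVC(kernel="rbf", class_weight="balanced"),
    param_grid,
    n_jobs=2,  # two workers instead of all cores (n_jobs=-1)
)
clf.fit(X, y)
print(clf.best_params_)
```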

@ghost

ghost commented Nov 21, 2021

@pramodatre You could maybe try and alter the n_components param and see if the results are stable or much worse (maybe go down from 150 to 120 or 100)
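One hedged way to sanity-check such a reduction (a sketch on random data standing in for the flattened face images, not the actual LFW matrix) is to compare the cumulative explained variance ratio for each candidate n_components:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.randn(300, 200)  # synthetic stand-in for flattened face images

# Cumulative explained variance for each candidate number of components.
cumvar = {}
for n_components in (100, 120, 150):
    pca = PCA(
        n_components=n_components, svd_solver="randomized",
        whiten=True, random_state=0,
    ).fit(X)
    cumvar[n_components] = pca.explained_variance_ratio_.sum()
print(cumvar)
```

If dropping from 150 to 120 or 100 components barely changes the retained variance on the real data, the downstream scores should stay stable.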

@pramodatre
Contributor Author

@sveneschlbeck Thank you for your note on n_jobs=-1. So, does that mean we can set n_jobs=2 then?

Also, I will try to play with n_components and report on performance changes.

@ghost

ghost commented Nov 21, 2021

@pramodatre I would wait for a second reviewer, but yes, I think that setting n_jobs to 2 is good. This should already improve the execution time. The point is just that we don't want to use ALL resources available...
Great, waiting for your report :)

@pramodatre
Contributor Author

@sveneschlbeck Sounds good for the second reviewer input for n_jobs. In parallel, I will be exploring other options as you suggested. BTW, this is my first PR for an open-source project and I'm thrilled to learn more about the process (I'm already learning!). Thank you for your help!

@ghost

ghost commented Nov 21, 2021

@pramodatre Sounds great! All help is welcome. :)

Here's a tip for you. If we make improvements or changes to the code, we always back it up by posting some kind of proof (proof that it did indeed increase speed in this case). This could be done by pasting in a screenshot of your execution showing both the execution time before/after and the output result (plots) before/after. This makes it easy to analyse whether the changes are helpful or not.

In general, we want to decrease the execution time while keeping the results the same, meaning

  • Plots should look the same or similar (small differences are no problem)
  • Accuracy, loss rate, or other metrics should not vary too much (0.98 instead of 0.99 would be acceptable, but not 0.5 instead of 0.9)
  • The main idea or learning goal behind an example should not be falsified (e.g. when the purpose of an example is to show the working of a specific function or parameter, then deleting or replacing that very parameter would counteract the learning effect)

@pramodatre
Contributor Author

@sveneschlbeck That's very helpful! Thank you for summarizing the goal for this effort :)

@ogrisel
Member

ogrisel commented Nov 22, 2021

Let's use n_jobs=2 instead of n_jobs=-1. Please also report the difference in speed you observe when running with n_jobs=2 vs the default (sequential run).

Sometimes the overhead of starting worker processes and of inter-process data communication makes it impossible to observe an actual speed improvement (depending on the size of the data and the duration of the parallel tasks).

@@ -108,7 +108,7 @@
    "C": [1e3, 5e3, 1e4, 5e4, 1e5],
    "gamma": [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.1],
}
clf = GridSearchCV(SVC(kernel="rbf", class_weight="balanced"), param_grid)
Member

Alternatively trying RandomizedSearchCV with n_iter=10 should work well here and be faster than an exhaustive grid search of the 5x5 combinations.
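As a rough illustration of the candidate-count difference, ParameterGrid and ParameterSampler (the helpers that GridSearchCV and RandomizedSearchCV use internally to enumerate candidates) can be compared directly on the grid from the diff above:

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

param_grid = {
    "C": [1e3, 5e3, 1e4, 5e4, 1e5],
    "gamma": [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.1],
}

# Exhaustive search fits every combination; randomized search fits n_iter.
n_exhaustive = len(ParameterGrid(param_grid))
n_randomized = len(list(ParameterSampler(param_grid, n_iter=10, random_state=0)))
print(n_exhaustive, n_randomized)  # 30 10
```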

Member

In addition, I think that it could be good to add a StandardScaler before applying the PCA.
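A hedged sketch of that suggestion, chaining a StandardScaler ahead of the PCA in a Pipeline; the unevenly scaled synthetic data here is only a stand-in for the face images:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
# Columns with very different scales, so the scaling step actually matters.
X = rng.randn(100, 50) * rng.uniform(0.1, 10.0, size=50)

# Scale each feature to zero mean / unit variance before the PCA step.
model = make_pipeline(StandardScaler(), PCA(n_components=10, whiten=True))
X_pca = model.fit_transform(X)
print(X_pca.shape)
```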

Member

The RandomizedSearch looks like a good idea using the loguniform to sample C and gamma.

Contributor Author

Thanks for this idea! Yeah, totally makes sense! I will try this option and report performance differences I register.

@glemaitre glemaitre changed the title Speed up of plot_face_recognition.py example MNT Speed up of plot_face_recognition.py example Nov 22, 2021
@adrinjalali adrinjalali mentioned this pull request Nov 22, 2021
@pramodatre
Contributor Author

> Let's use n_jobs=2 instead of n_jobs=-1. Please also report the difference in speed you observe when running with n_jobs=2 vs the default (sequential run).
>
> Sometimes the overhead of starting worker processes and of inter-process data communication makes it impossible to observe an actual speed improvement (depending on the size of the data and the duration of the parallel tasks).

Sounds good! Just curious about the choice of n_jobs=2 -- is this decision based on the most common hardware configuration of dual core?

@glemaitre
Member

The CircleCI machine allocates us only 2 cores, if I am not wrong. Here, it should be enough for the task that we want to do. In practice, it would be a bit of a waste to take 40 cores of a big machine for this example :)

In some ways, it is also more pedagogical to show users that n_jobs should be set to a meaningful value instead of just expanding to whatever is available.

@pramodatre
Contributor Author

@glemaitre Thank you for the explanation!

@pramodatre
Contributor Author

Here are the results of various experiments. In conclusion, n_components = 150, RandomizedSearchCV, n_iter=10 gives comparable results to the expected precision/recall/f-score mentioned in the example docstring. The running time for this configuration is around 6s compared to 21s (around 3x speedup).

Please look at the Averages for all experiments section in the excel sheet.

Scikit-learn example plot_face_recognition.py profiling.xlsx

@glemaitre
Member

glemaitre commented Nov 23, 2021

@pramodatre Thanks. Could you provide a markdown table with the results instead of an Excel spreadsheet? It would be easier to have a self-contained message in the PR.

Pushing the solution is also a good thing because we will run the example and check the speed-up that we get on CircleCI compared to the documentation built for dev.

@pramodatre
Contributor Author

> @pramodatre Thanks. Could you provide a markdown table with the results instead of an Excel spreadsheet? It would be easier to have a self-contained message in the PR.
>
> Pushing the solution is also a good thing because we will run the example and check the speed-up that we get on CircleCI compared to the documentation built for dev.

@glemaitre Thanks for the feedback! Yeah, I did not realize that the Excel file would hide all the details. Are screenshots of the evaluation tables an option? If not, I will create markdown tables for sure. Please let me know.

@glemaitre
Member

glemaitre commented Nov 23, 2021 via email

@pramodatre
Contributor Author

@glemaitre Yeah, good point! I will go ahead and create markdown tables then. Thanks!

@pramodatre
Contributor Author

pramodatre commented Nov 23, 2021

Here are expected precision, recall, and f-scores in the example docstring

|                    | precision | recall | f1-score |
|--------------------|-----------|--------|----------|
| Ariel Sharon       | 0.67      | 0.92   | 0.77     |
| Colin Powell       | 0.75      | 0.78   | 0.76     |
| Donald Rumsfeld    | 0.78      | 0.67   | 0.72     |
| George W Bush      | 0.86      | 0.86   | 0.86     |
| Gerhard Schroeder  | 0.76      | 0.76   | 0.76     |
| Hugo Chavez        | 0.67      | 0.67   | 0.67     |
| Tony Blair         | 0.81      | 0.69   | 0.75     |

Here are the results for average precision, recall, and f-score over five runs, i.e., each table below contains values averaged over five runs of the same experiment.

| n_components = 150, RandomizedSearchCV, n_iter=15 | precision | recall | f1-score |
|---------------------------------------------------|-----------|--------|----------|
| Ariel Sharon       | 0.708   | 0.538 | 0.6   |
| Colin Powell       | 0.792   | 0.866 | 0.826 |
| Donald Rumsfeld    | 0.886   | 0.658 | 0.754 |
| George W Bush      | 0.848   | 0.954 | 0.894 |
| Gerhard Schroeder  | 0.908   | 0.792 | 0.844 |
| Hugo Chavez        | 0.89    | 0.544 | 0.676 |
| Tony Blair         | 0.962   | 0.786 | 0.862 |
| Avg train time (s) | 11.1036 |       |       |

| n_components = 150, RandomizedSearchCV, n_iter=10 | precision | recall | f1-score |
|---------------------------------------------------|-----------|--------|----------|
| Ariel Sharon       | 0.672 | 0.63  | 0.624 |
| Colin Powell       | 0.764 | 0.868 | 0.812 |
| Donald Rumsfeld    | 0.784 | 0.688 | 0.728 |
| George W Bush      | 0.882 | 0.918 | 0.9   |
| Gerhard Schroeder  | 0.864 | 0.808 | 0.832 |
| Hugo Chavez        | 0.858 | 0.532 | 0.652 |
| Tony Blair         | 0.928 | 0.78  | 0.848 |
| Avg train time (s) | 6.449 |       |       |

| n_components = 120, GridSearchCV, sequential execution | precision | recall | f1-score |
|--------------------------------------------------------|-----------|--------|----------|
| Ariel Sharon       | 0.706  | 0.708 | 0.706 |
| Colin Powell       | 0.824  | 0.882 | 0.85  |
| Donald Rumsfeld    | 0.892  | 0.63  | 0.74  |
| George W Bush      | 0.874  | 0.972 | 0.924 |
| Gerhard Schroeder  | 0.896  | 0.824 | 0.856 |
| Hugo Chavez        | 0.918  | 0.586 | 0.716 |
| Tony Blair         | 0.93   | 0.792 | 0.854 |
| Avg train time (s) | 19.792 |       |       |

| n_components = 120, GridSearchCV, n_jobs=2 | precision | recall | f1-score |
|--------------------------------------------|-----------|--------|----------|
| Ariel Sharon       | 0.658 | 0.648 | 0.65  |
| Colin Powell       | 0.808 | 0.882 | 0.842 |
| Donald Rumsfeld    | 0.87  | 0.646 | 0.74  |
| George W Bush      | 0.876 | 0.968 | 0.916 |
| Gerhard Schroeder  | 0.91  | 0.816 | 0.858 |
| Hugo Chavez        | 0.936 | 0.544 | 0.692 |
| Tony Blair         | 0.936 | 0.798 | 0.862 |
| Avg train time (s) | 9.848 |       |       |

| n_components = 150, GridSearchCV, n_jobs=2 | precision | recall | f1-score |
|--------------------------------------------|-----------|--------|----------|
| Ariel Sharon       | 0.752   | 0.522 | 0.606 |
| Colin Powell       | 0.794   | 0.872 | 0.832 |
| Donald Rumsfeld    | 0.86    | 0.674 | 0.752 |
| George W Bush      | 0.85    | 0.964 | 0.902 |
| Gerhard Schroeder  | 0.932   | 0.792 | 0.858 |
| Hugo Chavez        | 0.902   | 0.518 | 0.66  |
| Tony Blair         | 0.948   | 0.778 | 0.852 |
| Avg train time (s) | 11.8114 |       |       |

Note that the average precision, recall, and f-scores for the configuration n_components = 150, RandomizedSearchCV, n_iter=10 match or exceed the expected performance in the first table, while giving a ~3x speedup (6s instead of 20s) for running this example.

Member

@adrinjalali adrinjalali left a comment

LGTM

Member

@ogrisel ogrisel left a comment

Very good! Thanks for reporting your experiment results with markdown tables in the comments of this PR.

While we are at it, can you please remove the table with the expected results (to avoid confusion if the values don't match the actual output).

And also: please replace lines like:

# #############################################################################
# Download the data, if not already on disk and load it as numpy arrays
lfw_people = fetch_lfw_people(min_faces_per_person=70, resize=0.4)

By:

# %%
# Download the data, if not already on disk and load it as numpy arrays

# %%
lfw_people = fetch_lfw_people(min_faces_per_person=70, resize=0.4)

This should split the example into rendered notebook cells that will render better with our modern version of sphinx-gallery. Feel free to also insert a line with:

# %%

at the top of the notebook, before the first import statement (from time import time), so that all the code cells are easy to execute using the Jupyter integration of VS Code or PyCharm, for instance.

@ogrisel
Member

ogrisel commented Nov 25, 2021

By looking at the rendered page, the new cell-based layout is much nicer to read. Thanks!

https://164173-843222-gh.circle-artifacts.com/0/doc/_changed.html

I believe the last code cell is too big and it would be worth splitting it to put the eigenface gallery in its own cell.

@ogrisel
Member

ogrisel commented Nov 25, 2021

The confusion matrix could also be split in its own code cell.

@ogrisel
Member

ogrisel commented Nov 25, 2021

And actually the confusion matrix could better be displayed with this utility:

https://scikit-learn.org/stable/modules/generated/sklearn.metrics.ConfusionMatrixDisplay.html#sklearn.metrics.ConfusionMatrixDisplay.from_predictions

Member

@glemaitre glemaitre left a comment

So usually we intend to use it as follows with sphinx-gallery:

# %%
# Some markdown
print("we start the code directly without a new marker")

I made a couple of suggestions to correct it.

@@ -108,7 +108,7 @@
    "C": [1e3, 5e3, 1e4, 5e4, 1e5],
    "gamma": [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.1],
}
clf = GridSearchCV(SVC(kernel="rbf", class_weight="balanced"), param_grid)
Member

In addition, I think that it could be good to add a StandardScaler before applying the PCA.

@@ -108,7 +108,7 @@
    "C": [1e3, 5e3, 1e4, 5e4, 1e5],
    "gamma": [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.1],
}
clf = GridSearchCV(SVC(kernel="rbf", class_weight="balanced"), param_grid)
Member

The RandomizedSearch looks like a good idea using the loguniform to sample C and gamma.

# Download the data, if not already on disk and load it as numpy arrays

# %%
Member

We will need to keep the sphinx-gallery style, which is slightly different from the VS Code style.

Suggested change
# %%

Comment on lines 56 to 60
# %%
# Split into a training set and a test set using a stratified k fold

# split into a training and testing set
# %%
Member

Suggested change
# %%
# Split into a training set and a test set using a stratified k fold
# split into a training and testing set
# %%
# %%
# Split into a training set and a test and keep 25% of the data for testing.

# Compute a PCA (eigenfaces) on the face dataset (treated as unlabeled
# dataset): unsupervised feature extraction / dimensionality reduction
# %%
Member

Suggested change
# %%

# Train a SVM classification model

# %%
Member

Suggested change
# %%

# Quantitative evaluation of the model quality on the test set

# %%
Member

Suggested change
# %%

# Qualitative evaluation of the predictions using matplotlib
# %%
Member

Suggested change
# %%

@glemaitre glemaitre self-requested a review November 25, 2021 09:58
Comment on lines 93 to 96
param_grid = {
    "C": [1e3, 5e3, 1e4, 5e4, 1e5],
    "gamma": [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.1],
}
Member

@glemaitre glemaitre Nov 25, 2021

I think that it would make sense to change the following with distributions:

param_distribution = {
    "C": loguniform(1e3, 1e5),
    "gamma": loguniform(1e-4, 1e-1),
}

where loguniform is imported as from sklearn.utils.fixes import loguniform.
I checked locally and it provides the same results on average.
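A hedged sketch of the suggested distributions, on synthetic stand-in data; note that in recent SciPy versions loguniform is available directly as scipy.stats.loguniform, which is what sklearn.utils.fixes re-exported:

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for the PCA-transformed face data.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Sample C and gamma log-uniformly over the same ranges as the original grid.
param_distributions = {
    "C": loguniform(1e3, 1e5),
    "gamma": loguniform(1e-4, 1e-1),
}
search = RandomizedSearchCV(
    SVC(kernel="rbf", class_weight="balanced"),
    param_distributions,
    n_iter=5,  # kept tiny for this sketch; the PR uses n_iter=10
    random_state=0,
).fit(X, y)
print(search.best_params_)
```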

@pramodatre
Contributor Author

pramodatre commented Nov 29, 2021

> By looking at the rendered page, the new cell-based layout is much nicer to read. Thanks!
>
> https://164173-843222-gh.circle-artifacts.com/0/doc/_changed.html
>
> I believe the last code cell is too big and it would be worth splitting it to put the eigenface gallery in its own cell.

@ogrisel @glemaitre Do you have any suggestions for building and viewing this documentation locally? That way, I could check the rendered documentation before I commit the changes.

No worries if such a setup is not possible. I found that the CI provides this link once I push my commit.

@pramodatre
Contributor Author

pramodatre commented Dec 1, 2021

@ogrisel @glemaitre Thank you for all the feedback; I have incorporated it in the last two commits. Please let me know if you want me to resolve the conversations, or whether that is something you will decide after looking at my commits.

Member

@ogrisel ogrisel left a comment

Looking great, thanks!

Just a final comment suggestion below. Also, could you please add a final comment at the end of the notebook noting that the face recognition problem would be much more efficiently addressed by training convolutional neural networks, but that this family of models is outside the scope of the scikit-learn library. Interested readers should instead try PyTorch or TensorFlow.

@@ -124,10 +115,11 @@
print("done in %0.3fs" % (time() - t0))

print(classification_report(y_test, y_pred, target_names=target_names))
print(confusion_matrix(y_test, y_pred, labels=range(n_classes)))
ConfusionMatrixDisplay.from_estimator(clf, X_test_pca, y_test)
Member

Nice! Could you please try with:

Suggested change
ConfusionMatrixDisplay.from_estimator(clf, X_test_pca, y_test)
ConfusionMatrixDisplay.from_estimator(
    clf, X_test_pca, y_test, display_labels=target_names
)

Contributor Author

Thanks! This makes it easy to read the confusion matrix :)

Comment on lines 30 to 32
# Display progress logs on stdout
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")

Member

I believe those lines no longer have any effect:

Suggested change
# Display progress logs on stdout
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")

Contributor Author

Thanks! Removed these lines.

@glemaitre glemaitre merged commit f853e78 into scikit-learn:main Dec 15, 2021
Member

@glemaitre glemaitre left a comment

LGTM. Thanks @pramodatre

glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Dec 24, 2021