
[MRG] Add plotting module with heatmaps for confusion matrix and grid search results #9173


Open
wants to merge 102 commits into base: main

Conversation

Contributor
@thismlguy commented Jun 20, 2017

Continues PR 8082

Changes made:

  • Added plot functions: plot_confusion_matrix, plot_gridsearch_results
  • Updated examples: plot_rbf_parameters.py, plot_confusion_matrix.py
  • Added unit tests for the new plot modules

@amueller (Member)
can you fix the flake8 issues?

@amueller (Member) left a comment

very first round ;)

:no-inherited-members:

This module is experimental. Use at your own risk.
Use of this module requires the matplotlib library.
Member

Maybe state version? 1.5?

@thismlguy (Contributor Author), Jun 21, 2017

Modified to: "Use of this module requires the matplotlib library,
version 1.5 or later (preferably 2.0)."

:toctree: generated/
:template: function.rst

plot_heatmap
Member

add functions here

Contributor Author

Added. This was WIP in the previous PR.

Member

I wouldn't count on users to read this message. My two cents: if you really want users to notice that the plotting module is experimental, you'd have to put it in a submodule "experimental" or "future", and only move it to the main namespace once the API is stable.

Member

It's a bit unclear to me what it would mean for the API to be stable, and I really don't like forcing people to change their code later. I would probably just remove the warning here and then do standard deprecation cycles.

Member

Doing deprecation cycles is forcing people to change their code eventually, with the additional risk that they won't know that this code is experimental :)

Member

Deprecation cycles only sometimes require users to change their code - if they are actually using the feature you're deprecating. That's not very common for most deprecations in scikit-learn. And that is only if there is actually a change.
And I'm not sure what's experimental about this code. The experiment is more having plotting inside scikit-learn at all. Since it's plotting and therefore user-facing, I'd rather have a warning on every call than put it in a different module.

I guess the thing we are trying to communicate is "don't build long-term projects relying on the presence of plotting in scikit-learn because we might remove it again".
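A per-call warning along those lines could be sketched as follows; the helper name, message text, and choice of FutureWarning are my assumptions for illustration, not anything decided in this PR:

```python
import warnings


def _warn_experimental(name):
    # Hypothetical helper: emit a warning on every call into sklearn.plot.
    warnings.warn(
        "sklearn.plot.%s is experimental and may be changed or removed "
        "in a future release without a deprecation cycle." % name,
        FutureWarning,
    )


def plot_heatmap(values, ax=None):
    # Sketch only: the warning fires each time the function is used, so
    # users see it even if they never read the module docstring.
    _warn_experimental("plot_heatmap")
    # ... actual heatmap drawing would happen here ...
    return ax
```

This keeps the function in the main namespace while still flagging, at use time, that it may go away.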

generate the heatmap.
"""

import matplotlib.pyplot as plt
Member
not sure if we want actual tests of the functionality (I'm leaning no), but I think we want at least smoke-tests.

Contributor Author

Added smoke-tests which just run the code without asserts.

Member

hm looks like the config we use for coverage doesn't have matplotlib. We should change that...

if normalize:
values = values.astype('float') / values.sum(axis=1)[:, np.newaxis]

print(title)
Member
print?
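As an aside, the row-wise normalization in the hunk above can be sanity-checked in isolation; this is just an illustrative sketch (the stray print(title) would presumably become something like ax.set_title(title)):

```python
import numpy as np

# A small example confusion matrix: rows are true classes, columns predictions.
values = np.array([[5, 2],
                   [1, 8]])

# Row-wise normalization as in the diff above: each row then sums to 1,
# i.e. entries become per-class fractions of true labels.
normalized = values.astype('float') / values.sum(axis=1)[:, np.newaxis]
```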

from sklearn.plot import plot_heatmap


def plot_gridsearch_results(cv_results, param_grid, metric='mean_test_score',
Member
This only works for 2d grid-searches, right? I think we should also support 1d, and error for anything else.

except ImportError:
raise SkipTest("Not testing plot_heatmap, matplotlib not installed.")

import matplotlib.pyplot as plt
Member
hm it looks like matplotlib is not installed for the service that computes coverage? hm...

Contributor Author
Yeah, this seems to be a problem because my tests cover much more than what Codecov is showing. I wasn't able to check coverage locally: I'm on a Mac, and running matplotlib from the terminal gives errors I haven't been able to resolve yet.

Member

have you resolved those?

Contributor Author

Nope, these are tricky to resolve. It would require re-installing Python using a different method and then setting up my scikit-learn development environment again. Since sklearn doesn't test much against matplotlib, I'm just using a Jupyter notebook for this PR.

Member

We should discuss this in person.

@codecov
codecov bot commented Jun 21, 2017

Codecov Report

Merging #9173 into master will decrease coverage by 0.17%.
The diff coverage is 23.68%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #9173      +/-   ##
==========================================
- Coverage    96.3%   96.13%   -0.18%     
==========================================
  Files         332      337       +5     
  Lines       60549    60754     +205     
==========================================
+ Hits        58314    58403      +89     
- Misses       2235     2351     +116
Impacted Files Coverage Δ
sklearn/plot/__init__.py 100% <100%> (ø)
sklearn/plot/tests/test_heatmap.py 32.43% <32.43%> (ø)
sklearn/plot/_confusion_matrix.py 33.33% <33.33%> (ø)
sklearn/plot/_heatmap.py 6.66% <6.66%> (ø)
sklearn/plot/_gridsearch_results.py 8.57% <8.57%> (ø)
sklearn/utils/testing.py 89.5% <0%> (-0.04%) ⬇️
sklearn/decomposition/tests/test_pca.py 100% <0%> (ø) ⬆️
sklearn/decomposition/pca.py 94.5% <0%> (ø) ⬆️
... and 5 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0d5d842...b5e5823. Read the comment docs.

@thismlguy (Contributor Author)

I think we should also give the user an option to plot everything as a single line graph. This would give a readable plot when there are 3 or more unique parameters but the total number of combinations is small, say within 20.

Adding x-axis labels will be a challenge, but we can give each combination an ID and print a table below the plot showing each parameter's value for each ID.

Thoughts?
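The combo-ID idea could be sketched like this (the helper name and layout are hypothetical, just to make the suggestion concrete):

```python
from itertools import product


def combo_id_table(param_grid):
    # Assign a numeric ID to each parameter combination, so the x-axis can
    # show compact IDs and a legend table below the plot maps each ID back
    # to its parameter values.
    keys = sorted(param_grid)
    rows = []
    for combo_id, values in enumerate(product(*(param_grid[k] for k in keys))):
        rows.append((combo_id, dict(zip(keys, values))))
    return rows
```

For example, combo_id_table({'C': [1, 10], 'kernel': ['rbf', 'linear']}) yields four (ID, params) rows; the IDs would label the x-axis of the line graph and the rows would form the table underneath.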

@thismlguy (Contributor Author)

Also, could you help me understand the CircleCI error (details here):

Exception occurred:
File "/home/ubuntu/miniconda/envs/testenv/lib/python2.7/site-packages/docutils/writers/_html_base.py", line 671, in depart_document
assert not self.context, 'len(context) = %s' % len(self.context)
AssertionError: len(context) = 1
The full traceback has been saved in /tmp/sphinx-err-RgKdVr.log, if you want to report the issue to the developers.
Please also report this if it was a user error, so that a better error message can be provided next time.
A bug report can be filed in the tracker at https://github.com/sphinx-doc/sphinx/issues. Thanks!
make: *** [html] Error 1

./build_tools/circle/build_doc.sh returned exit code 2

Action failed: ./build_tools/circle/build_doc.sh

@amueller (Member)

@aarshayj can you merge master? This is an issue with old sphinx, I think, which should be fixed in master.
