-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
DOC rework plot_document_classification_20newsgroups.py example #22928
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I triggered the CI again because the artifacts were not generated |
…arn into doc_classification
…nto doc_classification
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR, here is a first pass of feedback.
I also think you should append your name in the list of authors of this file in the header comment.
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
…nto doc_classification
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the nice refactoring. Here is a batch of feedback!
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
…nto doc_classification
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments.
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
Thank you for your contribution, @ArturoAmorQ. 🙂 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a final review and here are a few more nitpicks but otherwise LGTM.
This example is much more interesting to read that it used to be :) Thank you very much for this contribution @ArturoAmorQ !
…it-learn#22928) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
Reference Issues/PRs
What does this implement/fix? Explain your changes.
The 20newsgroups classification example is our only document classification
example and it can/should be more didactic. Therefore, this PR reworks it by:
ConfusionMatrixDisplay
to visualize resultsAny other comments?
The demo uses
RidgeClassifier
as it has a relatively low training time, but any classifier with acoef_
attribute may work.This example uses
TfidfVectorizer
only. A comparison of its performance with respect toHashingVectorizer
will be craft in another example.