patel-zeel
Contributor

Reference Issues/PRs

#21598

What does this implement/fix? Explain your changes.

  • Reduce the number of categories from 20 to 5, which cuts the example's run time from ~104 seconds to ~9 seconds.
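As a minimal sketch of how such a reduction could look, the snippet below restricts `fetch_20newsgroups` to a subset through its `categories` parameter. The five category names and the helper `load_newsgroups_subset` are assumptions for illustration; the description above only states that five categories are used, not which ones.

```python
# Assumed subset: the first five 20 newsgroups category names alphabetically.
# The PR description only says "five"; these exact names are an assumption.
CATEGORIES = [
    "alt.atheism",
    "comp.graphics",
    "comp.os.ms-windows.misc",
    "comp.sys.ibm.pc.hardware",
    "comp.sys.mac.hardware",
]


def load_newsgroups_subset(categories=CATEGORIES):
    """Hypothetical helper: load only the requested newsgroups.

    Downloads the dataset on first use, so it needs network access.
    """
    # Imported lazily so that defining the helper does not require scikit-learn.
    from sklearn.datasets import fetch_20newsgroups

    return fetch_20newsgroups(subset="train", categories=list(categories))
```

Calling `data = load_newsgroups_subset()` and then `len(data.data)` would give the number of documents in the reduced training subset.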

before: ~104 seconds

11314 documents
20 categories

Supervised SGDClassifier on 100% of the data:
Number of training samples: 8485
Unlabeled samples in training set: 0
Micro-averaged F1 score on test set: 0.913
----------

Supervised SGDClassifier on 20% of the training data:
Number of training samples: 1673
Unlabeled samples in training set: 0
Micro-averaged F1 score on test set: 0.793
----------

SelfTrainingClassifier on 20% of the training data (rest is unlabeled):
Number of training samples: 8485
Unlabeled samples in training set: 6812
End of iteration 1, added 2834 new labels.
End of iteration 2, added 678 new labels.
End of iteration 3, added 279 new labels.
End of iteration 4, added 98 new labels.
End of iteration 5, added 39 new labels.
End of iteration 6, added 27 new labels.
End of iteration 7, added 19 new labels.
End of iteration 8, added 12 new labels.
End of iteration 9, added 9 new labels.
End of iteration 10, added 8 new labels.
Micro-averaged F1 score on test set: 0.834
----------

LabelSpreading on 20% of the data (rest is unlabeled):
Number of training samples: 8485
Unlabeled samples in training set: 6812
Micro-averaged F1 score on test set: 0.651
----------

after: ~9 seconds

2823 documents
5 categories

Supervised SGDClassifier on 100% of the data:
Number of training samples: 2117
Unlabeled samples in training set: 0
Micro-averaged F1 score on test set: 0.888
----------

Supervised SGDClassifier on 20% of the training data:
Number of training samples: 442
Unlabeled samples in training set: 0
Micro-averaged F1 score on test set: 0.746
----------

SelfTrainingClassifier on 20% of the training data (rest is unlabeled):
Number of training samples: 2117
Unlabeled samples in training set: 1675
End of iteration 1, added 1103 new labels.
End of iteration 2, added 191 new labels.
End of iteration 3, added 67 new labels.
End of iteration 4, added 8 new labels.
End of iteration 5, added 6 new labels.
End of iteration 6, added 8 new labels.
End of iteration 7, added 6 new labels.
End of iteration 8, added 5 new labels.
End of iteration 9, added 2 new labels.
End of iteration 10, added 1 new labels.
Micro-averaged F1 score on test set: 0.831
----------

LabelSpreading on 20% of the data (rest is unlabeled):
Number of training samples: 2117
Unlabeled samples in training set: 1675
Micro-averaged F1 score on test set: 0.700
----------
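To make the trade-off between the two runs easier to read, the following sketch recomputes the speed-up and the change in micro-averaged F1 from the figures in the logs above. The timings are approximate ("~104" and "~9" seconds), so the speed-up is only indicative; the dictionary layout and key names are chosen here for illustration.

```python
# Figures copied from the "before" (20 categories) and "after" (5 categories) logs.
before = {
    "seconds": 104,
    "documents": 11314,
    "f1": {
        "supervised_100": 0.913,
        "supervised_20": 0.793,
        "self_training": 0.834,
        "label_spreading": 0.651,
    },
}
after = {
    "seconds": 9,
    "documents": 2823,
    "f1": {
        "supervised_100": 0.888,
        "supervised_20": 0.746,
        "self_training": 0.831,
        "label_spreading": 0.700,
    },
}

# Ratio of run times and dataset sizes (before / after).
speedup = before["seconds"] / after["seconds"]
doc_ratio = before["documents"] / after["documents"]

# Change in micro-averaged F1 per model (after minus before).
f1_delta = {name: after["f1"][name] - before["f1"][name] for name in before["f1"]}

print(f"speed-up: {speedup:.1f}x, documents: {doc_ratio:.1f}x fewer")
for name, delta in f1_delta.items():
    print(f"{name}: {delta:+.3f}")
```

On these numbers the example runs roughly 11.6 times faster on about a quarter of the documents. F1 drops by 0.025 and 0.047 for the two supervised runs, stays nearly flat for SelfTrainingClassifier (-0.003), and rises by 0.049 for LabelSpreading.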

@adrinjalali adrinjalali mentioned this pull request Nov 15, 2021
41 tasks
@adrinjalali
Member

Is this now 9s with LabelSpreading included? If so, you can re-enable it, since it is currently escaped in the CI at the end of the example.

Member

@adrinjalali adrinjalali left a comment

Thanks @patel-zeel

@patel-zeel
Contributor Author

Thanks a lot for approving my first PR on sklearn @adrinjalali :)

@adrinjalali adrinjalali changed the title from "Improve plot_semi_supervised_newsgroups example" to "Improve plot_semi_supervised_newsgroups.py example" Nov 23, 2021
@adrinjalali
Member

This should be an easy review, @jjerphan maybe?

Member

@thomasjpfan thomasjpfan left a comment

Thank you for the PR @patel-zeel

LGTM!

@thomasjpfan thomasjpfan changed the title from "Improve plot_semi_supervised_newsgroups.py example" to "ENH Improve plot_semi_supervised_newsgroups.py example" Nov 23, 2021
@thomasjpfan thomasjpfan merged commit 73d0b4f into scikit-learn:main Nov 23, 2021
@patel-zeel
Contributor Author

Thank you for approving @thomasjpfan.

glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Nov 29, 2021
samronsin pushed a commit to samronsin/scikit-learn that referenced this pull request Nov 30, 2021
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Dec 24, 2021