Fixes #11128 : Default n_estimator value should be 100 #11172
Closed
25 commits
b6117a3
Changed RandomForest and ExtraTreesClassifier default n_estimators to…
Olamyy 7253a7c
Changed RandomForest and ExtraTreesClassifier default n_estimators to…
Olamyy ec68bf2
Updated depracation logic to follow the development guideline
Olamyy 1ec2b4d
Updated depracation logic to follow the development guideline
Olamyy ac2041b
Moved logic to fit and updated documentation to API Changes
Olamyy 939ced3
Added test function to test depreaction warning
Olamyy a4fcc2e
Removed redundant test function and included test for correct values …
Olamyy e584e55
Changed DeprecationWarning to Future Warning. Updated version from 0.…
Olamyy 8b5cb95
Changed DeprecationWarning to Future Warning. Updated version from 0.…
Olamyy 3654c3e
Changed assert_warns to assert_no_wars
Olamyy 22dc7bd
Removed poldata.
Olamyy f90d675
Fixed assert_no_warnings call in test function
Olamyy 8d96f42
Made requested changes: Reduced warning line to meet 80 char requirem…
Olamyy e8147fc
Merge branch 'master' into n_estimator_should_be_100
Olamyy bdb2a3b
Reverted unintended commit back to original state
Olamyy 476baeb
Removing unused imports
Olamyy 6c3bac7
Implementing warn and use 10
Olamyy 4b63cb1
Implementing warn and use 10
Olamyy ce2e20a
Implementing warn and use 10
Olamyy 92e3489
Implementing warn and use 10
Olamyy d616b33
Rename test function to show intent
Olamyy f52c170
Reduced line numbers to meet required 79 characters
Olamyy a2b0d2d
Cleaning trailing spaces.
Olamyy cd10b03
Switching to `is None` from `if not`
Olamyy 5953a72
Update function name to show correct intent
Olamyy
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,118 @@ | ||
|
||
======= | ||
Review comment: This doesn't belong here... |
||
|
||
Introduction | ||
|
||
This README v2.0 (June, 2004) for the v2.0 polarity dataset comes from | ||
the URL http://www.cs.cornell.edu/people/pabo/movie-review-data . | ||
|
||
======= | ||
|
||
What's New -- June, 2004 | ||
|
||
This dataset represents an enhancement of the review corpus v1.0 | ||
described in README v1.1: it contains more reviews, and labels were | ||
created with an improved rating-extraction system. | ||
|
||
======= | ||
|
||
Citation Info | ||
|
||
This data was first used in Bo Pang and Lillian Lee, | ||
``A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization | ||
Based on Minimum Cuts'', Proceedings of the ACL, 2004. | ||
|
||
@InProceedings{Pang+Lee:04a, | ||
author = {Bo Pang and Lillian Lee}, | ||
title = {A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts}, | ||
booktitle = "Proceedings of the ACL", | ||
year = 2004 | ||
} | ||
|
||
======= | ||
|
||
Data Format Summary | ||
|
||
- review_polarity.tar.gz: contains this readme and data used in | ||
the experiments described in Pang/Lee ACL 2004. | ||
|
||
Specifically: | ||
|
||
Within the folder "txt_sentoken" are the 2000 processed down-cased | ||
text files used in Pang/Lee ACL 2004; the names of the two | ||
subdirectories in that folder, "pos" and "neg", indicate the true | ||
classification (sentiment) of the component files according to our | ||
automatic rating classifier (see section "Rating Decision" below). | ||
|
||
File names consist of a cross-validation tag plus the name of the | ||
original html file. The ten folds used in the Pang/Lee ACL 2004 paper's | ||
experiments were: | ||
|
||
fold 1: files tagged cv000 through cv099, in numerical order | ||
fold 2: files tagged cv100 through cv199, in numerical order | ||
... | ||
fold 10: files tagged cv900 through cv999, in numerical order | ||
|
||
Hence, the file neg/cv114_19501.txt, for example, was labeled as | ||
negative, served as a member of fold 2, and was extracted from the | ||
file 19501.html in polarity_html.zip (see below). | ||
|
||
Each line in each text file corresponds to a single sentence, as | ||
determined by Adwait Ratnaparkhi's sentence boundary detector | ||
MXTERMINATOR. | ||
|
||
Preliminary steps were taken to remove rating information from the | ||
text files, but only the rating information upon which the rating | ||
decision was based is guaranteed to have been removed. Thus, if the | ||
original review contains several instances of rating information, | ||
potentially given in different forms, those not recognized as valid | ||
ratings remain part of the review text. | ||
|
||
- polarity_html.zip: The original source files from which the | ||
processed, labeled, and (randomly) selected data in | ||
review_polarity.tar.gz was derived. | ||
|
||
Specifically: | ||
|
||
This data consists of unprocessed, unlabeled html files from the | ||
IMDb archive of the rec.arts.movies.reviews newsgroup, | ||
http://reviews.imdb.com/Reviews. The files in review_polarity.tar.gz | ||
represent a processed subset of these files. | ||
|
||
======= | ||
|
||
Rating Decision (Appendix A) | ||
|
||
This section describes how we determined whether a review was positive | ||
or negative. | ||
|
||
The original html files do not have consistent formats -- a review may | ||
not have the author's rating with it, and when it does, the rating can | ||
appear at different places in the file in different forms. We only | ||
recognize some of the more explicit ratings, which are extracted via a | ||
set of ad-hoc rules. In essence, a file's classification is determined | ||
based on the first rating we were able to identify. | ||
|
||
|
||
- In order to obtain more accurate rating decisions, the maximum | ||
rating must be specified explicitly, both for numerical ratings | ||
and star ratings. ("8/10", "four out of five", and "OUT OF | ||
****: ***" are examples of rating indications we recognize.) | ||
|
||
- With a five-star system (or compatible number systems): | ||
three-and-a-half stars and up are considered positive, | ||
two stars and below are considered negative. | ||
- With a four-star system (or compatible number system): | ||
three stars and up are considered positive, | ||
one-and-a-half stars and below are considered negative. | ||
- With a letter grade system: | ||
B or above is considered positive, | ||
C- or below is considered negative. | ||
|
||
We attempted to recognize half stars, but they are specified in an | ||
especially free way, which makes them difficult to recognize. Hence, | ||
we may lose a half star very occasionally; but this only results in 2.5 | ||
stars in five star system being categorized as negative, which is | ||
still reasonable. | ||
|
||
|
Review comment: Missing `:class:` for the extra trees. Also, the message should read something like "n_estimators will be changed to 100 in version 0.22. A FutureWarning is raised when the default value is used." or similar. `n_estimators=10` doesn't raise a warning (as is tested).
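
For readers following the review, here is a minimal sketch of the "warn and use 10" behaviour the comment describes. It is not the code from this PR: `ToyForest` is a hypothetical stand-in class, the `None` sentinel is an assumption based on the "Switching to `is None`" commit message, and the warning text is illustrative only.

```python
import warnings


class ToyForest:
    """Hypothetical stand-in for an ensemble estimator (not scikit-learn code)."""

    def __init__(self, n_estimators=None):
        # None is an assumed sentinel meaning "the user did not choose a value".
        self.n_estimators = n_estimators

    def fit(self):
        if self.n_estimators is None:
            # Default was used: warn about the upcoming change, keep the old value.
            warnings.warn(
                "The default value of n_estimators will change from 10 "
                "to 100 in version 0.22.",
                FutureWarning,
            )
            self.n_estimators_ = 10
        else:
            # An explicit value, including 10, is used silently.
            self.n_estimators_ = self.n_estimators
        return self
```

With this shape, `ToyForest().fit()` emits a `FutureWarning` and falls back to 10 trees, while `ToyForest(n_estimators=10).fit()` stays silent, which is the distinction the reviewer asks the documentation to spell out.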