-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG+1] ENH: dataset-fetching with use figshare and checksum #9240
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG+1] ENH: dataset-fetching with use figshare and checksum #9240
Conversation
Thanks for this. Much appreciated. |
do we want this for the release? |
it is labeled as such
…On Wed, Jun 28, 2017 at 5:13 PM Andreas Mueller ***@***.***> wrote:
do we want this for the release?
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#9240 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AGt-46pAF2h0SdTHX2wImITWh28__u0lks5sIm2lgaJpZM4OIGv_>
.
|
I would like it if it's close enough: it avoids people complaining when less reliable servers break. |
Before we merge, one of the problem with logging is that it does not have the same behaviour with python 2.7 and python 3 (I was not aware of that I have to say or maybe I forgot): import logging
logger = logging.getLogger('logger')
logger.warn('warning') Python 2.7:
Python 3:
For this PR in particular that means that on Python 2.7 no messages will be printed about the datasets being downloaded. Here is what I propose (after a quick chat with @ogrisel):
Note logging is only used in |
sklearn/__init__.py
Outdated
logger = logging.getLogger(__name__) | ||
if six.PY2: | ||
logger.addHandler(logging.StreamHandler()) | ||
logger.level = logging.INFO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a small comment to state that this ensures that we get a informative message on stdout by default when downloading a dataset on Python 2 (so as to replicate the default behavior of the logging module in Python 3).
0d0664d
to
2037f34
Compare
is there a way to make the handler only apply to logging emitted by
scikit-learn?
…On 2 Aug 2017 6:01 pm, "Olivier Grisel" ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In sklearn/__init__.py
<#9240 (comment)>
:
> @@ -17,6 +17,14 @@
import warnings
import os
from contextlib import contextmanager as _contextmanager
+import logging
+
+import six
+
+logger = logging.getLogger(__name__)
+if six.PY2:
+ logger.addHandler(logging.StreamHandler())
+logger.level = logging.INFO
Maybe add a small comment to state that this ensures that we get a
informative message on stdout by default when downloading a dataset on
Python 2 (so as to replicate the default behavior of the logging module in
Python 3).
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#9240 (review)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEz6--gjteY3_qG3NxYN9I8eVSds_nRks5sUCzugaJpZM4OIGv_>
.
|
This is already the case: logger = logging.getLogger(__name__) is only used by sklearn components ( |
I think this is going to have this behaviour as long as we use |
2037f34
to
91b7c3b
Compare
@@ -17,6 +17,11 @@ | |||
import warnings | |||
import os | |||
from contextlib import contextmanager as _contextmanager | |||
import logging | |||
|
|||
logger = logging.getLogger(__name__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not "sklearn"
instead of __name__
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this is the just the general convention right?
I found this from the Python doc
A good convention to use when naming loggers is to use a module-level logger, in each module which uses logging, named as follows:
logger = logging.getLogger(__name__)
and this from the Hitchhiker's guide to Python:
Best practice when instantiating loggers in a library is to only create them using the
__name__
global variable: the logging module creates a hierarchy of loggers using dot notation, so using__name__
ensures no name collisions.
Move logger.warning to logger.info and logger.info to logger.debug [doc build]
91b7c3b
to
6daa256
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. +1 for merge and backport to 0.19.X. @amueller any further comment?
Not sure whether it is entirely necessary but 5d040fc may need to be backported in 0.19.X (Add download_if_missing argument to fetch_20newsgroups_vectorized) |
Happy to merge now and then I'll go through all the necessary backports. |
Merged, will do the backport to 0.19.X. |
Cool, though I think it'd be good to do a final pass to see if everything is backported that we want. Though an issue in scikit-optimize kinda points toward re-thinking the decision of making |
+1 for just release but I will have to have a look at the pickling error of scikit-optimize/scikit-optimize#461. |
I think one of us should compile a PR with everything from master that is
safe to go into 0.19, to be reviewed, merged and released. I'm happy to
have a go.
…On 4 Aug 2017 9:30 pm, "Olivier Grisel" ***@***.***> wrote:
+1 for just release but I will have to have a look at the pickling error
of scikit-optimize/scikit-optimize#461
<scikit-optimize/scikit-optimize#461>.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#9240 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEz6-B4KwUOHyr2SZDkvOMsTzk43ekQks5sUwDrgaJpZM4OIGv_>
.
|
Reference Issue
Fixes #7425. Fixes #8089
What does this implement/fix? Explain your changes.
fetch_
...) to figshare and changed their URLs accordingly.Any other comments?
This PR takes over (Fixes #7429)
cc: @nelson-liu, @jnothman