[MRG+1] Deprecate fetch_mldata #11466
Changes from all commits
60d08b9
52491a1
ea8f31b
1b40d08
63c9509
ee3784f
e9a59c7
@@ -257,7 +257,6 @@ Loaders
    datasets.fetch_kddcup99
    datasets.fetch_lfw_pairs
    datasets.fetch_lfw_people
-   datasets.fetch_mldata
    datasets.fetch_olivetti_faces
    datasets.fetch_openml
    datasets.fetch_rcv1

Review comment on the removed datasets.fetch_mldata entry: Same here, maybe we should still keep the docs even though the function is deprecated? Who knows, maybe the website will work again in September, and there is still code using this function. Those users would no longer be able to find any documentation for it in the stable docs. A doc page with a large deprecation warning would save us from some issues opened by confused users.
@@ -1513,6 +1512,7 @@ To be removed in 0.22
    :template: deprecated_function.rst

    covariance.graph_lasso
+   datasets.fetch_mldata


 To be removed in 0.21

Review comment on the added datasets.fetch_mldata entry: This will generate the docs with the large deprecation warning.
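The reviewers want the deprecated loader to stay documented behind a prominent warning instead of disappearing from the API list. A common Python pattern for this is a decorator that warns on every call and prefixes the docstring with a deprecation notice. The following is a minimal sketch of that general pattern; `deprecated` and `fetch_mldata_stub` are hypothetical illustrations, not scikit-learn's own `sklearn.utils.deprecated` helper or the real `fetch_mldata`.

```python
import functools
import warnings


def deprecated(extra=""):
    """Hypothetical deprecation decorator (illustrative stand-in only)."""
    def decorator(func):
        message = "Function %s is deprecated" % func.__name__
        if extra:
            message += "; %s" % extra

        @functools.wraps(func)
        def wrapped(*args, **kwargs):
            # Warn on every call; stacklevel=2 attributes the warning to the caller.
            warnings.warn(message, category=DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)

        # Prefix the docstring so docs generated from it lead with the notice.
        notice = "DEPRECATED: %s" % extra if extra else "DEPRECATED"
        wrapped.__doc__ = notice + "\n\n" + (func.__doc__ or "")
        return wrapped
    return decorator


@deprecated("it will be removed in 0.22")
def fetch_mldata_stub(dataname):
    """Hypothetical stand-in for a dataset loader; it just echoes its argument."""
    return {"name": dataname}
```

To observe the warning, call the stub inside `warnings.catch_warnings(record=True)` after `warnings.simplefilter("always")`. The explicit filter matters because Python hides `DeprecationWarning` by default in most contexts, so the docstring notice is often the more visible of the two signals.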
@@ -8,7 +8,7 @@
 hyperparameter optimization using gradient ascent on the
 log-marginal-likelihood. The data consists of the monthly average atmospheric
 CO2 concentrations (in parts per million by volume (ppmv)) collected at the
-Mauna Loa Observatory in Hawaii, between 1958 and 1997. The objective is to
+Mauna Loa Observatory in Hawaii, between 1958 and 2001. The objective is to
 model the CO2 concentration as a function of the time t.

 The kernel is composed of several terms that are responsible for explaining
@@ -57,24 +57,59 @@
 explained by the model. The figure shows also that the model makes very
 confident predictions until around 2015.
 """
-print(__doc__)

 # Authors: Jan Hendrik Metzen <jhm@informatik.uni-bremen.de>
 #
 # License: BSD 3 clause

+from __future__ import division, print_function

 import numpy as np

 from matplotlib import pyplot as plt

 from sklearn.gaussian_process import GaussianProcessRegressor
 from sklearn.gaussian_process.kernels \
     import RBF, WhiteKernel, RationalQuadratic, ExpSineSquared
-from sklearn.datasets import fetch_mldata
+try:
+    from urllib.request import urlopen
+except ImportError:
+    # Python 2
+    from urllib2 import urlopen

+print(__doc__)


-data = fetch_mldata('mauna-loa-atmospheric-co2').data
-X = data[:, [1]]
-y = data[:, 0]
+def load_mauna_loa_atmospheric_c02():
+    url = ('http://cdiac.ess-dive.lbl.gov/'
+           'ftp/trends/co2/sio-keel-flask/maunaloa_c.dat')
+    months = []
+    ppmv_sums = []
+    counts = []
+    for line in urlopen(url):
+        line = line.decode('utf8')
+        if not line.startswith('MLO'):
+            # ignore headers
+            continue
+        station, date, weight, flag, ppmv = line.split()
+        y = date[:2]
+        m = date[2:4]
+        month_float = (int(('20' if y < '20' else '19') + y) +
+                       (int(m) - 1) / 12)
+        if not months or month_float != months[-1]:
+            months.append(month_float)
+            ppmv_sums.append(float(ppmv))
+            counts.append(1)
+        else:
+            # aggregate monthly sum to produce average
+            ppmv_sums[-1] += float(ppmv)
+            counts[-1] += 1

+    months = np.asarray(months).reshape(-1, 1)
+    avg_ppmvs = np.asarray(ppmv_sums) / counts
+    return months, avg_ppmvs


+X, y = load_mauna_loa_atmospheric_c02()

 # Kernel with parameters given in GPML book
 k1 = 66.0**2 * RBF(length_scale=67.0) # long term smooth rising trend

Review comment on the new load_mauna_loa_atmospheric_c02 function: Should we open an issue about getting this dataset uploaded to OpenML, as you suggested?
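The new loader averages every flask reading that falls in the same month. To check that logic without network access, the sketch below reproduces the loop body on an in-memory list of lines. `parse_monthly_averages` and the sample lines are hypothetical, and the ppmv values are made up. The sketch also makes two implicit assumptions of the original explicit. First, readings are merged only when they are consecutive, so the file must be sorted by date. Second, the `'20'` pivot maps two-digit years `'00'` to `'19'` to 20xx and the rest to 19xx, which covers the 1958 to 2001 range stated in the docstring. The sketch divides by `12.0`; the original's `(int(m) - 1) / 12` instead relies on the `from __future__ import division` added at the top of the example, without which Python 2 would truncate every month offset to 0.

```python
def parse_monthly_averages(lines):
    """Hypothetical offline variant of the loop in load_mauna_loa_atmospheric_c02.

    Takes already-decoded text lines instead of calling urlopen and returns
    plain lists: months as fractional years and the mean ppmv per month.
    """
    months, ppmv_sums, counts = [], [], []
    for line in lines:
        if not line.startswith('MLO'):
            continue  # header or comment line
        station, date, weight, flag, ppmv = line.split()
        yy, mm = date[:2], date[2:4]
        # Two-digit year pivot (string comparison): '00'-'19' -> 20xx, else 19xx.
        year = int(('20' if yy < '20' else '19') + yy)
        # Dividing by 12.0 forces true division even on Python 2.
        month_float = year + (int(mm) - 1) / 12.0
        if not months or month_float != months[-1]:
            # First reading of a new month: start a new running sum.
            months.append(month_float)
            ppmv_sums.append(float(ppmv))
            counts.append(1)
        else:
            # Another consecutive reading for the same month: accumulate it.
            ppmv_sums[-1] += float(ppmv)
            counts[-1] += 1
    averages = [total / n for total, n in zip(ppmv_sums, counts)]
    return months, averages


# Made-up lines shaped like the five fields the loop unpacks (not real data).
sample = [
    "header text without the station prefix",
    "MLO 580329 1 0 314.0",
    "MLO 580331 1 0 316.0",
    "MLO 580415 1 0 317.0",
    "MLO 010105 1 0 370.0",
]
months, averages = parse_monthly_averages(sample)
```

Wrapping the results as `np.asarray(months).reshape(-1, 1)` and `np.asarray(averages)` gives the same `X`/`y` shapes that `load_mauna_loa_atmospheric_c02` returns.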
Review comment: Do you think it's better to remove this section altogether rather than, say, put a deprecation note at the beginning?