Commit 4afccb9

MNT replace fetch_california_housing with make_regression in getting_started.rst and compose.rst (scikit-learn#31579)
1 parent 9bf3c41 commit 4afccb9

File tree

2 files changed: +22 −13 lines


doc/getting_started.rst

Lines changed: 7 additions & 3 deletions

@@ -167,13 +167,17 @@ a :class:`~sklearn.ensemble.RandomForestRegressor` that has been fitted with
     the best set of parameters. Read more in the :ref:`User Guide
     <grid_search>`::

-    >>> from sklearn.datasets import fetch_california_housing
+    >>> from sklearn.datasets import make_regression
     >>> from sklearn.ensemble import RandomForestRegressor
     >>> from sklearn.model_selection import RandomizedSearchCV
     >>> from sklearn.model_selection import train_test_split
     >>> from scipy.stats import randint
     ...
-    >>> X, y = fetch_california_housing(return_X_y=True)
+    >>> # create a synthetic dataset
+    >>> X, y = make_regression(n_samples=20640,
+    ...                        n_features=8,
+    ...                        noise=0.1,
+    ...                        random_state=0)
     >>> X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
     ...
     >>> # define the parameter space that will be searched over
@@ -196,7 +200,7 @@ the best set of parameters. Read more in the :ref:`User Guide
     >>> # the search object now acts like a normal random forest estimator
     >>> # with max_depth=9 and n_estimators=4
     >>> search.score(X_test, y_test)
-    0.73...
+    0.84...

     .. note::
doc/modules/compose.rst

Lines changed: 15 additions & 10 deletions

@@ -286,12 +286,17 @@ the regressor that will be used for prediction, and the transformer that will
     be applied to the target variable::

     >>> import numpy as np
-    >>> from sklearn.datasets import fetch_california_housing
+    >>> from sklearn.datasets import make_regression
     >>> from sklearn.compose import TransformedTargetRegressor
     >>> from sklearn.preprocessing import QuantileTransformer
     >>> from sklearn.linear_model import LinearRegression
     >>> from sklearn.model_selection import train_test_split
-    >>> X, y = fetch_california_housing(return_X_y=True)
+    >>> # create a synthetic dataset
+    >>> X, y = make_regression(n_samples=20640,
+    ...                        n_features=8,
+    ...                        noise=100.0,
+    ...                        random_state=0)
+    >>> y = np.exp(1 + (y - y.min()) * (4 / (y.max() - y.min())))
     >>> X, y = X[:2000, :], y[:2000] # select a subset of data
     >>> transformer = QuantileTransformer(output_distribution='normal')
     >>> regressor = LinearRegression()
@@ -300,11 +305,11 @@ be applied to the target variable::
     >>> X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
     >>> regr.fit(X_train, y_train)
     TransformedTargetRegressor(...)
-    >>> print('R2 score: {0:.2f}'.format(regr.score(X_test, y_test)))
-    R2 score: 0.61
+    >>> print(f"R2 score: {regr.score(X_test, y_test):.2f}")
+    R2 score: 0.67
     >>> raw_target_regr = LinearRegression().fit(X_train, y_train)
-    >>> print('R2 score: {0:.2f}'.format(raw_target_regr.score(X_test, y_test)))
-    R2 score: 0.59
+    >>> print(f"R2 score: {raw_target_regr.score(X_test, y_test):.2f}")
+    R2 score: 0.64

     For simple transformations, instead of a Transformer object, a pair of
     functions can be passed, defining the transformation and its inverse mapping::
@@ -321,8 +326,8 @@ Subsequently, the object is created as::
     ...                                   inverse_func=inverse_func)
     >>> regr.fit(X_train, y_train)
     TransformedTargetRegressor(...)
-    >>> print('R2 score: {0:.2f}'.format(regr.score(X_test, y_test)))
-    R2 score: 0.51
+    >>> print(f"R2 score: {regr.score(X_test, y_test):.2f}")
+    R2 score: 0.67

     By default, the provided functions are checked at each fit to be the inverse of
     each other. However, it is possible to bypass this checking by setting
@@ -336,8 +341,8 @@ each other. However, it is possible to bypass this checking by setting
     ...                                   check_inverse=False)
     >>> regr.fit(X_train, y_train)
     TransformedTargetRegressor(...)
-    >>> print('R2 score: {0:.2f}'.format(regr.score(X_test, y_test)))
-    R2 score: -1.57
+    >>> print(f"R2 score: {regr.score(X_test, y_test):.2f}")
+    R2 score: -3.02

     .. note::