@@ -227,67 +227,6 @@ alpha parameter, the fewer features selected.
Processing Magazine [120] July 2007
http://dsp.rice.edu/sites/dsp.rice.edu/files/cs/baraniukCSlecture07.pdf
- .. _randomized_l1:
-
- Randomized sparse models
- -------------------------
-
- .. currentmodule:: sklearn.linear_model
-
- In terms of feature selection, there are some well-known limitations of
- L1-penalized models for regression and classification. For example, it is
- known that the Lasso will tend to select an individual variable out of a group
- of highly correlated features. Furthermore, even when the correlation between
- features is not too high, the conditions under which L1-penalized methods
- consistently select "good" features can be restrictive in general.
-
- To mitigate this problem, it is possible to use randomization techniques such
- as those presented in [B2009]_ and [M2010]_. The latter technique, known as
- stability selection, is implemented in the module :mod:`sklearn.linear_model`.
- In the stability selection method, an L1-penalized model is fit to a subsample
- of the data, where the penalty of a random subset of coefficients has been
- scaled. Specifically, given a subsample of the data
- :math:`(x_i, y_i), i \in I`, where :math:`I \subset \{1, 2, \ldots, n\}` is a
- random subset of the data of size :math:`n_I`, the following modified Lasso
- fit is obtained:
-
- .. math:: \hat{w}_I = \mathrm{arg}\min_{w} \frac{1}{2 n_I} \sum_{i \in I} (y_i - x_i^T w)^2 + \alpha \sum_{j=1}^p \frac{\vert w_j \vert}{s_j},
-
- where :math:`s_j \in \{s, 1\}` are independent trials of a fair Bernoulli
- random variable, and :math:`0 < s < 1` is the scaling factor. By repeating this
- procedure across different random subsamples and Bernoulli trials, one can
- count the fraction of times the randomized procedure selected each feature,
- and use these fractions as scores for feature selection.
-
- :class:`RandomizedLasso` implements this strategy for regression
- settings, using the Lasso, while :class:`RandomizedLogisticRegression` uses
- logistic regression and is suitable for classification tasks. To get a full
- path of stability scores you can use :func:`lasso_stability_path`.
-
- .. figure:: ../auto_examples/linear_model/images/sphx_glr_plot_sparse_recovery_003.png
-    :target: ../auto_examples/linear_model/plot_sparse_recovery.html
-    :align: center
-    :scale: 60
-
- Note that for randomized sparse models to be more powerful than standard
- F statistics at detecting non-zero features, the ground truth model
- should be sparse; in other words, only a small fraction of the features
- should be non-zero.
-
- .. topic:: Examples:
-
-    * :ref:`sphx_glr_auto_examples_linear_model_plot_sparse_recovery.py`: An example
-      comparing different feature selection approaches and discussing in
-      which situation each approach is to be favored.
-
- .. topic:: References:
-
-    .. [B2009] F. Bach, "Model-Consistent Sparse Estimation through the
-       Bootstrap." https://hal.inria.fr/hal-00354771/
-
-    .. [M2010] N. Meinshausen, P. Buhlmann, "Stability selection",
-       Journal of the Royal Statistical Society, 72 (2010).
-       http://arxiv.org/pdf/0809.2932.pdf
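As a rough illustration of the stability-selection procedure described in the removed section, the sketch below fits a plain :class:`Lasso` on random subsamples with randomly rescaled feature columns; rescaling column :math:`j` by :math:`s_j` is equivalent to rescaling its penalty by :math:`1/s_j`, as in the modified Lasso objective above. The dataset, ``alpha``, and the other constants are illustrative choices, not parameters or defaults of the removed estimators::

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import Lasso

    # Toy data: only a few of the 30 features are truly informative.
    X, y = make_regression(n_samples=200, n_features=30, n_informative=5,
                           random_state=0)
    n_samples, n_features = X.shape
    rng = np.random.RandomState(0)
    n_resamplings, scaling, alpha = 100, 0.5, 0.1

    selection_counts = np.zeros(n_features)
    for _ in range(n_resamplings):
        # Random subsample I of the data.
        idx = rng.choice(n_samples, size=n_samples // 2, replace=False)
        # s_j in {scaling, 1}: Bernoulli rescaling of each column, which
        # corresponds to the per-coefficient penalty alpha / s_j above.
        s = np.where(rng.rand(n_features) < 0.5, scaling, 1.0)
        coef = Lasso(alpha=alpha).fit(X[idx] * s, y[idx]).coef_
        selection_counts += coef != 0

    # Fraction of resamplings in which each feature was selected.
    stability_scores = selection_counts / n_resamplings
    print(np.argsort(stability_scores)[::-1][:5])  # most stable features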
Tree-based feature selection
----------------------------