[WIP] Earth (MARS) #2285

Closed
23 commits
87db631
Made changes to pipeline to allow more general parameter passing
jcrudy Jun 13, 2013
c5b716a
Revert "Made changes to pipeline to allow more general parameter pass…
jcrudy Jun 30, 2013
67cd399
Merge branch 'master' of https://github.com/scikit-learn/scikit-learn
jcrudy Jun 30, 2013
bb88ee7
Merge remote-tracking branch 'upstream/master'
jcrudy Jul 21, 2013
883775b
Added earth. Not passing all tests yet.
jcrudy Jul 22, 2013
6e9fe4e
Added documentation. Still need to figure out how to deal with citat…
jcrudy Jul 24, 2013
3091cbc
Working on examples
jcrudy Jul 26, 2013
e25cbd0
Updated the earth code from latest pyearth
jcrudy Jul 26, 2013
dc1c2a1
Merged upstream changes
jcrudy Jul 26, 2013
73ae7d3
Fixe imports and got tests working
jcrudy Jul 26, 2013
0a693ed
Fixed imports again. Not really sure how they got un-fixed.
jcrudy Jul 28, 2013
8ad2521
Made the tests use relative imports
jcrudy Jul 28, 2013
a328d07
Made Earth pass the common tests
jcrudy Jul 28, 2013
fc8241b
merged upstream
jcrudy Jul 28, 2013
f048415
removed unused gcv
jcrudy Jul 28, 2013
7e63062
autopep8 on everything
jcrudy Jul 28, 2013
0ef9d18
Fixed white space
jcrudy Sep 8, 2013
e2c027e
Merged classifier comparison and removed benchmark vs the R package
jcrudy Oct 29, 2013
d0997d5
Changed name of Earth to EarthRegressor
jcrudy Oct 29, 2013
f32bfd5
Fixed the remaining examples to be more consistent with the rest of s…
jcrudy Oct 30, 2013
4e22199
Fixed titles of examples
jcrudy Oct 30, 2013
0db8e5a
Changed doc page so that example is pulled from plot_v_functio.py
jcrudy Oct 30, 2013
8c07970
Incorporated some new features and bug fixes from py-earth
jcrudy Nov 30, 2013
2 changes: 2 additions & 0 deletions .gitignore
@@ -47,4 +47,6 @@ benchmarks/bench_covertype_data/

*.prefs
.pydevproject
build/*
.project
.idea
Binary file added doc/images/hinge.png
Binary file added doc/images/piecewise_linear.png
20 changes: 20 additions & 0 deletions doc/modules/classes.rst
@@ -1151,6 +1151,26 @@ Low-level methods
tree.export_graphviz


.. _earth_ref:

:mod:`sklearn.earth`: Earth
===========================

.. automodule:: sklearn.earth
:no-members:
:no-inherited-members:

**User guide:** See the :ref:`earth` section for further details.

.. currentmodule:: sklearn

.. autosummary::
:toctree: generated/
:template: class.rst

earth.EarthRegressor


.. _utils_ref:

:mod:`sklearn.utils`: Utilities
84 changes: 84 additions & 0 deletions doc/modules/earth.rst
@@ -0,0 +1,84 @@
.. _earth:

========================================
Multivariate Adaptive Regression Splines
========================================

.. currentmodule:: sklearn.earth

Multivariate adaptive regression splines (MARS), implemented by the :class:`EarthRegressor` class, is a supervised
learning method that is most commonly used for feature extraction and selection. ``EarthRegressor`` models can be thought of as linear models in a higher-dimensional
basis space. ``EarthRegressor`` automatically searches for interactions and non-linear relationships. Each term in an ``EarthRegressor`` model is a
product of so-called "hinge functions". A hinge function is equal to its argument where that argument
is greater than zero and is zero everywhere else.

.. math::

    \text{h}\left(x-t\right)=\left[x-t\right]_{+}=\begin{cases}
    x-t, & x>t\\
    0, & x\leq t
    \end{cases}

.. image:: ../images/hinge.png
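
The hinge function defined above can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the code used by ``EarthRegressor``; the helper name ``hinge`` and its ``knot`` argument (the ``t`` in the formula) are assumptions made for this example.

```python
import numpy as np


def hinge(x, knot=0.0):
    """Return h(x - knot) = max(x - knot, 0), evaluated elementwise.

    `hinge` and `knot` are hypothetical names used for illustration only.
    """
    return np.maximum(np.asarray(x, dtype=float) - knot, 0.0)


# Evaluate h(x - 1) at a few points: zero up to the knot, linear after it.
x = np.array([-1.0, 0.0, 1.0, 2.5])
values = hinge(x, knot=1.0)
print(values)
```

The mirrored hinge ``h(t - x)`` can be obtained the same way with ``np.maximum(knot - x, 0.0)``.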

An ``EarthRegressor`` model is a linear combination of basis functions, each of which is a product of one
or more of the following:

1. A constant
2. Linear functions of input variables
3. Hinge functions of input variables
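
To illustrate what "a linear combination of basis functions" means in practice, the following sketch builds a small basis matrix from a constant and a mirrored pair of hinge functions, then recovers the coefficients by ordinary least squares. The helper ``basis_matrix``, the knot location, and the coefficients are hypothetical; ``EarthRegressor`` selects its own terms during fitting rather than taking them as given.

```python
import numpy as np


def basis_matrix(x, knot):
    """Columns: constant, h(x - knot), h(knot - x).

    Hypothetical helper for illustration; not part of sklearn.earth.
    """
    return np.column_stack([
        np.ones_like(x),
        np.maximum(x - knot, 0.0),
        np.maximum(knot - x, 0.0),
    ])


x = np.linspace(-3.0, 5.0, 81)
true_coef = np.array([2.0, 0.5, 1.5])  # made-up coefficients

# The response is linear in the basis columns, although nonlinear in x.
B = basis_matrix(x, knot=1.0)
y = B @ true_coef

# Ordinary least squares in the basis space recovers the coefficients.
coef, _, rank, _ = np.linalg.lstsq(B, y, rcond=None)
print(coef)
```

Once the basis functions are fixed, fitting reduces to a linear least-squares problem, which is why such a model can be viewed as a linear model in a transformed feature space.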

For example, a simple piecewise linear function in one variable can be expressed
as a linear combination of two hinge functions and a constant (see below). During fitting, the ``EarthRegressor`` class
automatically determines which variables and basis functions to use.
The algorithm has two stages. First, the
forward pass searches for terms that locally minimize squared error loss on the training set. Next, a pruning pass selects a subset of those
terms that produces a locally minimal generalized cross-validation (GCV) score. The GCV
score is not actually based on cross-validation, but rather is meant to approximate a true
cross-validation score by penalizing model complexity. The final result is a set of basis functions
that is nonlinear in the original feature space, may include interactions, and is likely to
generalize well.
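
As a rough sketch of how a GCV-style score penalizes complexity, the function below divides the mean squared error by a factor that shrinks as the effective number of parameters grows, following the general form given by Friedman (1991). How the effective parameter count is derived from the number of terms and knots is left to the caller here; the name ``gcv_score`` and its arguments are assumptions for illustration and do not describe the internals of ``EarthRegressor``.

```python
import numpy as np


def gcv_score(y, y_hat, effective_params):
    """Mean squared error inflated by a complexity penalty.

    GCV = MSE / (1 - C / N) ** 2, where C is the effective number of
    parameters and N the number of samples. Hypothetical helper.
    """
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    n = y.shape[0]
    if effective_params >= n:
        raise ValueError("effective_params must be smaller than the sample count")
    mse = np.mean((y - y_hat) ** 2)
    return mse / (1.0 - effective_params / n) ** 2


y = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.5, 1.5, 3.5, 3.5])

# Same residuals, different complexity: the larger model scores worse.
small = gcv_score(y, y_hat, effective_params=1)
large = gcv_score(y, y_hat, effective_params=2)
print(small, large)
```

Because the training error is identical in both calls, the difference in score comes entirely from the complexity penalty, which is the behaviour a pruning pass relies on when comparing subsets of terms.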


.. math::

    y=1-2\text{h}\left(1-x\right)+\frac{1}{2}\text{h}\left(x-1\right)


.. image:: ../images/piecewise_linear.png
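
The piecewise linear function above can be checked numerically. The sketch below evaluates it at a few points; the helper names ``h`` and ``piecewise`` are chosen for this example only.

```python
import numpy as np


def h(z):
    """Hinge function: max(z, 0), elementwise."""
    return np.maximum(z, 0.0)


def piecewise(x):
    """y = 1 - 2 h(1 - x) + 0.5 h(x - 1)."""
    x = np.asarray(x, dtype=float)
    return 1.0 - 2.0 * h(1.0 - x) + 0.5 * h(x - 1.0)


# One point on each side of the knot at x = 1, and the knot itself.
x = np.array([0.0, 1.0, 3.0])
y = piecewise(x)
print(y)
```

To the left of the knot only ``h(1 - x)`` is active, so the slope is 2; to the right only ``h(x - 1)`` is active, so the slope is 1/2. The constant term fixes the value at the knot.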


A Simple EarthRegressor Example
-------------------------------

.. literalinclude:: ../auto_examples/earth/plot_v_function.py
:lines: 13-


.. figure:: ../auto_examples/earth/images/plot_v_function_1.png
:target: ../auto_examples/earth/plot_v_function.html
:align: center
:scale: 75%


.. topic:: Bibliography:

1. Friedman, J. (1991). Multivariate adaptive regression splines. The Annals of Statistics,
19(1), 1–67. http://www.jstor.org/stable/10.2307/2241837
2. Milborrow, S. (2012). earth: Multivariate Adaptive Regression Spline Models.
Derived from mda:mars by Trevor Hastie and Rob Tibshirani. R package
version 3.2-3.
3. Friedman, J. (1993). Fast MARS. Stanford University Department of Statistics, Technical Report No. 110.
http://statistics.stanford.edu/~ckirby/techreports/LCS/LCS%20110.pdf
4. Friedman, J. (1991). Estimating functions of mixed ordinal and categorical variables using adaptive splines.
Stanford University Department of Statistics, Technical Report No. 108.
http://statistics.stanford.edu/~ckirby/techreports/LCS/LCS%20108.pdf
5. Stewart, G. W. (1998). Matrix Algorithms, Volume 1: Basic Decompositions. Society for Industrial and Applied
Mathematics.
6. Bjorck, A. (1996). Numerical Methods for Least Squares Problems. Society for Industrial and Applied
Mathematics.
7. Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning (2nd Edition).
Springer Series in Statistics.
8. Golub, G., & Van Loan, C. (1996). Matrix Computations (3rd Edition). Johns Hopkins University Press.


References 7, 2, 1, 3, and 4 contain discussions likely to be useful to users. References 1, 2, 6, 5,
8, 3, and 4 are useful in understanding the implementation.
63 changes: 63 additions & 0 deletions doc/modules/earth_bibliography.bib
@@ -0,0 +1,63 @@
@book{Bjorck1996,
address = {Philadelphia},
author = {Bjorck, Ake},
isbn = {0898713609},
publisher = {Society for Industrial and Applied Mathematics},
title = {{Numerical Methods for Least Squares Problems}},
year = {1996}
}
@techreport{Friedman1993,
author = {Friedman, Jerome H.},
institution = {Stanford University Department of Statistics},
title = {{Technical Report No. 110: Fast MARS.}},
url = {http://scholar.google.com/scholar?hl=en\&btnG=Search\&q=intitle:Fast+MARS\#0},
year = {1993}
}
@techreport{Friedman1991a,
author = {Friedman, JH},
institution = {Stanford University Department of Statistics},
publisher = {Stanford University Department of Statistics},
title = {{Technical Report No. 108: Estimating functions of mixed ordinal and categorical variables using adaptive splines}},
url = {http://scholar.google.com/scholar?hl=en\&btnG=Search\&q=intitle:Estimating+functions+of+mixed+ordinal+and+categorical+variables+using+adaptive+splines\#0},
year = {1991}
}
@article{Friedman1991,
author = {Friedman, JH},
journal = {The annals of statistics},
number = {1},
pages = {1--67},
title = {{Multivariate adaptive regression splines}},
url = {http://www.jstor.org/stable/10.2307/2241837},
volume = {19},
year = {1991}
}
@book{Golub1996,
author = {Golub, Gene and {Van Loan}, Charles},
edition = {3},
publisher = {Johns Hopkins University Press},
title = {{Matrix Computations}},
year = {1996}
}
@book{Hastie2009,
address = {New York},
author = {Hastie, Trevor and Tibshirani, Robert and Friedman, Jerome},
edition = {2},
publisher = {Springer Science+Business Media},
title = {{Elements of Statistical Learning: Data Mining, Inference, and Prediction}},
year = {2009}
}
@book{Stewart1998,
address = {Philadelphia},
author = {Stewart, G. W.},
isbn = {0898714141},
publisher = {Society for Industrial and Applied Mathematics},
title = {{Matrix Algorithms Volume 1: Basic Decompositions}},
year = {1998}
}
@misc{Millborrow2012,
author = {Milborrow, Stephen},
publisher = {CRAN},
title = {{earth: Multivariate Adaptive Regression Spline Models}},
url = {http://cran.r-project.org/web/packages/earth/index.html},
year = {2012}
}
1 change: 1 addition & 0 deletions doc/supervised_learning.rst
@@ -21,3 +21,4 @@ Supervised learning
modules/label_propagation.rst
modules/lda_qda.rst
modules/isotonic.rst
modules/earth.rst
6 changes: 6 additions & 0 deletions examples/earth/README.txt
@@ -0,0 +1,6 @@
.. _earth_examples:

Earth examples
----------------

Examples concerning the :mod:`sklearn.earth` package.
39 changes: 39 additions & 0 deletions examples/earth/plot_sine_wave.py
@@ -0,0 +1,39 @@
'''
==============================================
Fitting an EarthRegressor model to a sine wave
==============================================


In this example, a simple sine model is used to generate an artificial data set. An :class:`EarthRegressor` model
is then fitted to that data set and the resulting predictions are plotted against the original data.

'''
print(__doc__)

import numpy as np
import pylab as pl
from sklearn.earth import EarthRegressor

# Create some fake data
np.random.seed(2)
m = 10000
n = 10
X = 80 * np.random.uniform(size=(m, n)) - 40
y = 100 * \
np.abs(np.sin((X[:, 6]) / 10) - 4.0) + \
20 * np.random.normal(size=m)

# Fit an EarthRegressor model
model = EarthRegressor(max_degree=3, minspan_alpha=.5)
model.fit(X, y)

# Print the model
print(model.trace())
print(model.summary())

# Plot the model
pl.figure()
y_hat = model.predict(X)
pl.plot(X[:, 6], y, 'r.')
pl.plot(X[:, 6], y_hat, 'b.')
pl.show()
37 changes: 37 additions & 0 deletions examples/earth/plot_v_function.py
@@ -0,0 +1,37 @@
'''
======================================================
Fitting an EarthRegressor model to a v-shaped function
======================================================


In this example, a simple piecewise linear model is used to generate an artificial data set. An :class:`EarthRegressor` model
is then fitted to that data set and the resulting predictions are plotted against the original data.

'''
print(__doc__)

import numpy as np
from sklearn.earth import EarthRegressor
import pylab as pl

# Create some fake data
np.random.seed(2)
m = 1000
n = 10
X = 80 * np.random.uniform(size=(m, n)) - 40
y = np.abs(X[:, 6] - 4.0) + 5 * np.random.normal(size=m)

# Fit an EarthRegressor model
model = EarthRegressor(max_degree=1)
model.fit(X, y)

# Print the model
print(model.trace())
print(model.summary())

# Plot the model
y_hat = model.predict(X)
pl.figure()
pl.plot(X[:, 6], y, 'r.')
pl.plot(X[:, 6], y_hat, 'b.')
pl.show()
16 changes: 11 additions & 5 deletions examples/plot_classifier_comparison.py
@@ -26,6 +26,7 @@
# Code source: Gael Varoqueux
# Andreas Mueller
# Modified for Documentation merge by Jaques Grobler
# Modified to include EarthRegressor by Jason Rudy
# License: BSD 3 clause

import numpy as np
@@ -41,11 +42,14 @@
from sklearn.naive_bayes import GaussianNB
from sklearn.lda import LDA
from sklearn.qda import QDA
from sklearn.earth import EarthRegressor
from sklearn.linear_model.logistic import LogisticRegression
from sklearn.pipeline import Pipeline

h = .02 # step size in the mesh

names = ["Nearest Neighbors", "Linear SVM", "RBF SVM", "Decision Tree",
-         "Random Forest", "AdaBoost", "Naive Bayes", "LDA", "QDA"]
+         "Random Forest", "AdaBoost", "Naive Bayes", "LDA", "QDA", "Earth"]
classifiers = [
KNeighborsClassifier(3),
SVC(kernel="linear", C=0.025),
@@ -55,7 +59,9 @@
AdaBoostClassifier(),
GaussianNB(),
LDA(),
-QDA()]
+QDA(),
+Pipeline([('earth', EarthRegressor(max_degree=3, penalty=1.5)),
+          ('logistic', LogisticRegression())])]

X, y = make_classification(n_features=2, n_redundant=0, n_informative=2,
random_state=1, n_clusters_per_class=1)
@@ -104,10 +110,10 @@

# Plot the decision boundary. For that, we will assign a color to each
# point in the mesh [x_min, m_max]x[y_min, y_max].
-if hasattr(clf, "decision_function"):
-    Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
-else:
+try:
     Z = clf.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1]
+except NotImplementedError:
+    Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])

# Put the result into a color plot
Z = Z.reshape(xx.shape)
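
The fallback in this hunk, which prefers ``predict_proba`` and falls back to ``decision_function``, can be sketched in isolation as follows. This is a hedged illustration, not the example's actual code: ``mesh_scores`` and the two dummy classifiers are hypothetical, and the sketch also catches ``AttributeError`` (an assumption added here) so that estimators with no ``predict_proba`` method at all take the fallback path too.

```python
import numpy as np


def mesh_scores(clf, points):
    """Return one score per point, preferring class-1 probabilities.

    Hypothetical helper: falls back to decision_function when
    predict_proba is missing or not implemented.
    """
    try:
        return clf.predict_proba(points)[:, 1]
    except (AttributeError, NotImplementedError):
        return clf.decision_function(points)


class ProbaOnly:
    """Dummy classifier exposing only predict_proba."""

    def predict_proba(self, points):
        p = np.full(len(points), 0.25)
        return np.column_stack([1.0 - p, p])


class DecisionOnly:
    """Dummy classifier whose predict_proba is declared but unimplemented."""

    def predict_proba(self, points):
        raise NotImplementedError

    def decision_function(self, points):
        return np.asarray(points).sum(axis=1)


points = np.array([[0.0, 1.0], [2.0, 3.0]])
proba_scores = mesh_scores(ProbaOnly(), points)
decision_scores = mesh_scores(DecisionOnly(), points)
print(proba_scores, decision_scores)
```

Catching ``AttributeError`` this broadly can also hide an attribute error raised inside a working ``predict_proba``, so an explicit ``hasattr`` check may be preferable where that matters.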
8 changes: 8 additions & 0 deletions sklearn/earth/__init__.py
@@ -0,0 +1,8 @@
"""
The :mod:`sklearn.earth` module contains the EarthRegressor class for multivariate
adaptive regression splines.
"""

from .earth import EarthRegressor

__all__ = ['EarthRegressor']