-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[WIP] Earth (MARS) #2285
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Earth (MARS) #2285
Conversation
…ing" This reverts commit 87db631.
Getting recent changes from scikit-learn people.
I thought they were working before. Maybe I ran the wrong tests or something.
Thanks! I believe this is a great addition, but it will indeed need some work to be to our standards. Also, don't expect any review right now - we are in the middle of releasing the new 0.14 version. Everyone is busy at it :) (Note: don't hesitate to ping us again when 0.14 is released, to have a deeper look at this.) |
I haven't had time to look at this, but I am very very excited about |
same here ;) |
Thanks, guys:). I'm going to be very busy for the next two weeks anyway because I'm moving. After I'm settled I'll ping this thread again and probably put some more work into documentation. |
I'm going to have some time to work on this soon. Does anyone have time to give it a look and point me in the right direction? If not I will put time into documentation, but it would be great to get some feedback to move the code forward as well. |
This PR is really huge! Can you point the most important parts for which you need a initial review? Also, can you explain us in a few words what is each Cython file used for? (to help us have a good overview of your code) |
The file that's most likely needs attention is earth.py. It contains the Earth class, which is the only class that users are expected to interact with directly. I have tried to make it conform to the sklearn interface standards, but I am not sure how successful I was. My main question is whether I need to change anything to make it more compatible with the rest of sklearn. The pyx files contain the following stuff: _basis.pyx : classes that represent MARS basis functions and sets of basis functions Perhaps one place I should focus is on commenting the parts of the code that are not user-facing. The following other things might be issues:
|
If you need a reference for MARS, wikipedia has a great general description of the algorithm: http://en.wikipedia.org/wiki/Multivariate_adaptive_regression_splines |
I don't have time to have look tonight but you can start here: http://scikit-learn.org/stable/developers/index.html#contributing-code Also as the official name of the algorithm is MARS I think the name of the classes in scikit-learn should be one of:
I would also remove all references to "earth" and replace them by "mars" in module names, function names, variable names and so on. We should also not commit images in the doc but replace them with plot automatically generated from the examples when possible. For instance, the plot in this rendered paragraph: http://scikit-learn.org/stable/modules/ensemble.html#extremely-randomized-trees is rendered by this example: https://github.com/scikit-learn/scikit-learn/blob/master/examples/ensemble/plot_forest_iris.py which is included in the reST source: https://raw.github.com/scikit-learn/scikit-learn/master/doc/modules/ensemble.rst |
|
||
|
||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also please never insert more than 2 consecutive blank lines in reST or python code.
Great feedback, guys. Keep it coming. Regarding the name, the problem with MARS is that, as I understand the situation, it is under copyright (only based on a comment on CRAN by Steve Milborrow: http://cran.r-project.org/web/packages/earth/index.html). I don't know if that matters for the name of a class within a larger software project like scikit-learn. Perhaps someone has more knowledge of copyright law than I do. Given that information, if all of the core developers are okay with MARSRegressor then so am I. If anyone is worried about copyrights, then I think MultiVariateAdaptiveSplineRegressor might be a good choice. |
a quick look suggest there is a bit of work in py-earth to make sklearn
happy.
I would start with flake8 on all files, avoid public methods besides fit,
predict, transform, + checking import orders, starting examples with
plot_*, test coverage to reach > 90% as we have now in sklearn
|
Great, thank you @agramfort. A clarification: what do you mean by checking import orders? @mehdidc, is there any part of that you're interested in taking on? As one of these involves removing some functionality (avoiding public methods other than fit, predict, and transform), I would suggest that at least that change should be done as part of integration into sklearn. The rest can be done directly in py-earth. |
@jcrudy : In addition to what He suggested above, @agramfort gave a good todo list here : #4341, I am working on it. |
Hey everyone, I am planning to take this project of integrating pyearth into sklearn forward. I'll create a new branch, make a new |
We recently created a new organization called scikit-learn-contrib whose goal is to host scikit-learn compatible projects: This is very much a work in progress but our goal is to gather nice projects in the scikit-learn ecosystem. We just created it so only lightning is there but more projects will be added in the future. We are thinking py-earth could possibly be moved there. The advantage is that py-earth would become more collaborative. When lightning was hosted in my account I was a bottleneck since I was the only one who could merge PRs. But now others have joined and bugs can be fixed more quickly. Earth / Mars is sufficiently mature to warrant inclusion in scikit-learn but moving py-earth to scikit-learn-contrib would probably involve much less work. |
cc @jcrudy
|
So probably one of the first things we should do is add a description of this in the scikit learn readme so that the users get to know about it. |
it's very fresh. We'll do it when we really advertise the
scikit-learn-contrib org.
|
Yes it's too early to advertise it. I just informed you so you don't loose time. |
@agramfort just wanted to ask one more thing, are the remaining projects of adding GAM (and for LSS), SpAM and LISO to sklearn still of any interest? |
I don't consider them as priorities; The > 400 open PRs we have are a
bigger concern to me to be honest...
|
@jcrudy This is very much a work in progress but I have uploaded the basic workflow for inclusion in scikit-learn-contrib. If you're interested in moving py-earth there, we can provide some assistance. |
@mblondel Yes, I would like to do that. I think it's a great idea. I'm probably too busy to look at it much for the next couple weeks, but I will as soon as I can, and I'll start with the workflow you uploaded. Is there anything else I should keep in mind? |
@mehdidc are you up for it?
|
@agramfort Yes ! @jcrudy @mblondel Ok so I first go through the requirements then add a request in scikit-learn-contrib |
@mehdidc There is a branch of py-earth called |
@jcrudy Great ! it will be nice to have arbitrary sample weights. Ok I see, thanks I will let you know |
I see that @jcrudy made some progress with Py-Earth. Which I think make it even more appetising. I know there are lots of open PR's for Scikitlearn, but I think MARS should get in there. Has anyone made a todo list for what needs to be done for this port? |
If it's ok with @jcrudy and @mehdidc, I think we should transfer [*] py-earth to the contrib organization and collaboratively work on meeting the requirements there. We need a project base so as to make the contrib more attractive. Currently, with lighting only, this feels a bit sad. [*] Transferring rather than forking is important so as to not loose branches and stars. |
@mblondel Do I need to be added to the contrib organization in order to make the transfer? |
@jcrudy you're now a member of scikit-learn-contrib org :)
|
@agramfort @mehdidc @mblondel Transfer complete. The py-earth project is now a part of scikit-learn-contrib! How is progress on making it actually compliant with the requirements? It seems it might be actually there, since @mehdidc has addressed the problems it was having with |
@jcrudy Thanks a lot! You might also want to fork the project in your github. This way, the old project URL is still valid. |
Closed this pull-request. Let's create an issue directly in the project to follow the progress of meeting the requirements. |
I added a py-earth team. @jcrudy you should be able to add collaborators to the team. |
This pull request adds multivariate adaptive regression splines to scikit-learn.
It probably could still use more work, but I'm looking for feedback on how best to proceed. Currently my plan is to focus on documentation going forward.