
[MRG] MAINT Deprecated redundant and specific class from gbrt module #3822


Closed
wants to merge 2 commits

Conversation

arjoly
Member

@arjoly arjoly commented Nov 4, 2014

As promised with the merge of #3779, we can now start simplifying the gradient boosting code.

@arjoly arjoly changed the title [MRG] MAINT Deprecated redundant and specific class from gbrt module [WIP] MAINT Deprecated redundant and specific class from gbrt module Nov 4, 2014
@arjoly arjoly changed the title [WIP] MAINT Deprecated redundant and specific class from gbrt module [MRG] MAINT Deprecated redundant and specific class from gbrt module Nov 4, 2014
@arjoly
Member Author

arjoly commented Nov 4, 2014

(ping @pprett, @glouppe)

@amueller
Member

amueller commented Nov 5, 2014

Travis disagrees ;)

@arjoly
Member Author

arjoly commented Nov 6, 2014

Thanks! I forgot to check that test file before pushing.

I fixed the test. GBRT was indeed misusing the API (using the predict function instead of decision_function) with its dummy classifier. That is why DummyClassifier could not be used as a drop-in replacement.
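To illustrate the mismatch (a minimal sketch, not the actual gradient boosting internals): an internal init-style estimator returns continuous class priors from predict(), which is what the boosting code consumes as its initial raw prediction, whereas DummyClassifier's predict() returns class labels. The class name here is hypothetical.

```python
import numpy as np

class _PriorEstimator:
    """Illustrative internal-style init estimator: predict() yields
    continuous class priors rather than class labels."""

    def fit(self, X, y):
        # Empirical class frequencies from the training labels.
        self.priors_ = np.bincount(y) / float(len(y))
        return self

    def predict(self, X):
        # One row of continuous prior scores per sample -- the raw
        # values the boosting code starts from.  A DummyClassifier
        # would instead return a 1-D array of class labels here.
        return np.tile(self.priors_, (X.shape[0], 1))

X = np.zeros((4, 1))
y = np.array([0, 0, 1, 0])
init = _PriorEstimator().fit(X, y)
print(init.predict(X)[0])  # → [0.75 0.25]
```

Because the two predict() contracts differ (continuous scores vs. labels), swapping in DummyClassifier silently changes what the boosting loop receives.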

@pprett
Member

pprett commented Nov 6, 2014

@arjoly In general I'm -1 on this -- I see that code duplication is an argument, but having few dependencies is also valuable. Now any change in DummyRegressor needs to be checked against the gradient boosting code.

The main reason why those initial estimators in the gradient boosting modules are so quirky is because they are internal components of the gradient boosting module where I can make assumptions about inputs etc. I optimized the GBRT code for single datapoint predictions and one of the major performance killers there is redundant input checking (e.g. check_arrays is super costly).

I see that I made some poor design decisions here, but I think the init feature is so infrequently used that I don't want to break pickle compatibility just to iron out an infrequently used feature -- I would rather deprecate init.
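The single-datapoint performance point above can be sketched as follows. This is a hedged illustration, not the actual gradient boosting code: the class and helper names are hypothetical, and `_check_array` stands in for the per-call validation (dtype coercion, dimensionality and finiteness checks) that `check_arrays` performs.

```python
import numpy as np

def _check_array(X):
    # Stand-in for per-call input validation: dtype coercion plus
    # dimensionality and finiteness checks.
    X = np.asarray(X, dtype=np.float64)
    if X.ndim != 2:
        raise ValueError("Expected 2-D input")
    if not np.isfinite(X).all():
        raise ValueError("Input contains NaN or inf")
    return X

class MeanPredictor:
    """Illustrative init-style estimator predicting the training mean."""

    def fit(self, X, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        X = _check_array(X)  # public path: validate on every call
        return np.full(X.shape[0], self.mean_)

    def _fast_predict(self, X):
        # Private path: the caller guarantees a well-formed 2-D float
        # array, so the validation cost above is skipped entirely.
        return np.full(X.shape[0], self.mean_)

est = MeanPredictor().fit(np.zeros((3, 1)), np.array([1.0, 2.0, 3.0]))
print(est._fast_predict(np.zeros((1, 1))))  # → [2.]
```

For predictions on single datapoints, the validation overhead dominates the actual arithmetic, which is why an internal component that can assume clean inputs pays off.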

@arjoly
Member Author

arjoly commented Nov 7, 2014

> The main reason why those initial estimators in the gradient boosting modules are so quirky is because they are internal components of the gradient boosting module where I can make assumptions about inputs etc. I optimized the GBRT code for single datapoint predictions and one of the major performance killers there is redundant input checking (e.g. check_arrays is super costly).

Then that tells me all these estimators should be private (prefixed with _), with comments explaining all the gotchas.

> I see that I made some poor design decisions here but I think the init feature is so infrequently used that I don't want to break pickle compatibility just to iron out an infrequently used feature -- I would rather deprecate init.

There is an open issue complaining about not being able to use init for a starting estimate. Still, I think it's a good idea to deprecate this feature.

@arjoly
Member Author

arjoly commented Nov 7, 2014

At the moment, I find the implementation complex to understand.

@arjoly
Member Author

arjoly commented Nov 10, 2014

@pprett OK, then I am going to close this PR. I definitely think the gbrt code would be simpler with more private internals.

@arjoly arjoly closed this Nov 10, 2014
3 participants