Speed up tree construction #933

@glouppe

CC: @pprett @amueller @bdholt1

Hi folks,

Everyone will agree that tree-based methods have been shown to perform quite well (e.g., the recent achievement of Peter!) and are increasingly used by our users. However, the tree module still has a major drawback: it is slow as hell in comparison to other machine learning packages.

For that reason, I think we should put more effort into accelerating the tree module. In particular, I would like to suggest moving the whole Tree class (not the estimators, only our struct-of-arrays representation) from tree.py into Cython, in _tree.pyx. First, the code would be a lot faster. Second, it could also be more readable and maintainable if the whole tree construction process were packaged into a single file, in a single class. Currently, the construction process is split across two files, the estimator classes, the Tree class and all the Cython routines (imo, this is a mess).
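As a rough illustration of what "struct-of-arrays" means here, the sketch below stores a binary tree as parallel lists indexed by node id. It is a hypothetical, pure-Python stand-in, not the actual Tree class: the names `ArrayTree`, `add_leaf`, `add_split`, `predict_one` and the `LEAF` sentinel are all assumptions made for this example.

```python
# Hypothetical sketch of a struct-of-arrays tree; NOT scikit-learn's Tree class.
LEAF = -1  # sentinel child index marking a leaf (an assumption of this sketch)


class ArrayTree:
    """Binary decision tree stored as parallel lists, one entry per node."""

    def __init__(self):
        self.children_left = []   # node id of the left child, or LEAF
        self.children_right = []  # node id of the right child, or LEAF
        self.feature = []         # feature index tested at the node
        self.threshold = []       # split threshold at the node
        self.value = []           # prediction stored at a leaf (None for splits)

    def _add_node(self, left, right, feature, threshold, value):
        node_id = len(self.feature)
        self.children_left.append(left)
        self.children_right.append(right)
        self.feature.append(feature)
        self.threshold.append(threshold)
        self.value.append(value)
        return node_id

    def add_leaf(self, value):
        return self._add_node(LEAF, LEAF, LEAF, 0.0, value)

    def add_split(self, feature, threshold, left, right):
        return self._add_node(left, right, feature, threshold, None)

    def predict_one(self, x, root):
        """Walk from `root` to a leaf; samples with x[feature] <= threshold go left."""
        node = root
        while self.children_left[node] != LEAF:
            if x[self.feature[node]] <= self.threshold[node]:
                node = self.children_left[node]
            else:
                node = self.children_right[node]
        return self.value[node]


# Build a one-split tree bottom-up: the children first, then their parent.
tree = ArrayTree()
left = tree.add_leaf(0.0)
right = tree.add_leaf(1.0)
root = tree.add_split(feature=0, threshold=0.5, left=left, right=right)
```

Because every per-node attribute lives in a flat array, a layout of this kind should map naturally onto typed C arrays, which is what would make a port to Cython attractive.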

To show that the construction process could indeed be a lot faster, I profiled recursive_partition using line-profiler (see link below). Insignificant Python instructions actually take quite some time in comparison to the important parts of the algorithm. E.g., compare line 314 with line 320: a mere Python if-statement is only twice as fast as finding the best threshold!

I'll let you examine the rest of the profiling report yourselves, but as far as I am concerned, I am convinced that we could significantly speed up the tree module (and be at least 5-10x faster).

http://pastebin.com/0rC1QmPy (toggle text wrapping)
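For anyone who wants to run a similar measurement, here is a minimal sketch of collecting a profile from Python. It uses the standard-library cProfile, which reports per-function rather than per-line timings, so it is coarser than the line-profiler report linked above. `find_best_threshold` is a toy stand-in written for this example, not the routine from the tree module, and `profile_call` is a hypothetical helper.

```python
import cProfile
import io
import pstats


def find_best_threshold(xs, ys):
    """Toy exhaustive search: try each midpoint between consecutive sorted
    values and keep the one with the fewest misclassified samples
    (class 0 is expected on the left, x <= t)."""
    order = sorted(range(len(xs)), key=xs.__getitem__)
    best_t, best_err = None, len(xs) + 1
    for i in range(1, len(order)):
        a, b = xs[order[i - 1]], xs[order[i]]
        if a == b:
            continue  # no threshold can separate equal values
        t = (a + b) / 2.0
        err = sum((x <= t) != (y == 0) for x, y in zip(xs, ys))
        if err < best_err:
            best_t, best_err = t, err
    return best_t, best_err


def profile_call(func, *args):
    """Run func(*args) under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats()
    return result, stream.getvalue()


result, report = profile_call(
    find_best_threshold, [0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1]
)
```

Printing `report` shows where the time goes at function granularity; a line-level tool is still needed to see the cost of individual statements such as a single if.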

What's your opinion about this? Since I am increasingly using the module myself, I can actually work on that in the days to come.
