Replacing np.zeroes by np.empty for Gradient Boosting for better scaling by cores #14380

SmirnovEgorRu · 2019-07-15T19:32:27Z

I have prepared a PR that replaces np.zeros with np.empty to improve performance on multi-core systems.
Original issue: #14306
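
For readers unfamiliar with the difference: np.zeros returns zero-filled memory, while np.empty returns uninitialized memory, so the caller has to zero it before accumulating into it. The sketch below is only an illustration of that idea, not the code from this PR. `HIST_DTYPE` and `allocate_histograms` are hypothetical names, and the field layout is an assumption standing in for the HISTOGRAM_DTYPE seen in the diff. The per-feature loop is where the zeroing could be spread across threads.

```python
import numpy as np

# Hypothetical stand-in for HISTOGRAM_DTYPE; the field layout is an assumption.
HIST_DTYPE = np.dtype([
    ("sum_gradients", np.float64),
    ("sum_hessians", np.float64),
    ("count", np.uint32),
])


def allocate_histograms(n_features, n_bins):
    """Allocate with np.empty, then zero each feature's row explicitly."""
    # np.empty does not initialize the buffer; its contents are arbitrary.
    histograms = np.empty((n_features, n_bins), dtype=HIST_DTYPE)
    # Each iteration touches only its own row, so this loop is the part
    # that a parallel loop over features could take over.
    for feature_idx in range(n_features):
        row = histograms[feature_idx]  # a view, not a copy
        row["sum_gradients"] = 0.0
        row["sum_hessians"] = 0.0
        row["count"] = 0
    return histograms


histograms = allocate_histograms(n_features=3, n_bins=4)
```

After the loop the array holds the same values as one created with np.zeros; the only difference is where and when the zeroing happens.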

NicolasHug · 2019-07-15T19:43:35Z

Thanks for the PR @SmirnovEgorRu, there seem to be a few unrelated changes though?

NicolasHug

More detailed comments

NicolasHug · 2019-07-16T13:29:59Z

sklearn/ensemble/_hist_gradient_boosting/histogram.pyx

                shape=(self.n_features, self.max_bins),
                dtype=HISTOGRAM_DTYPE
            )

-        for feature_idx in prange(n_features, schedule='static', nogil=True):
+        # for feature_idx in prange(n_features, schedule='static', nogil=True):
+        for feature_idx in range(n_features):


Why replace prange with range?

NicolasHug · 2019-07-16T13:32:02Z

sklearn/ensemble/_hist_gradient_boosting/histogram.pyx

@@ -103,7 +103,9 @@ cdef class HistogramBuilder:

    def compute_histograms_brute(
            HistogramBuilder self,
-            const unsigned int [::1] sample_indices):  # IN
+            const unsigned int [::1] sample_indices,
+            hist_struct [:, ::1] parent):  # IN


I don't fully understand why you need to change the API here.

We have 2 routines to compute histograms:

compute_histograms_subtraction, used when we know the histogram of the parent and the sibling of a node (using the subtraction trick)

compute_histograms_brute, used in all other cases. With that function, we do not need the histogram of the parent.
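
To make the distinction between the two routines concrete, here is a minimal pure-Python sketch for a single feature. It is an illustration under simplifying assumptions, not the Cython implementation: `build_histogram_brute` and `build_histogram_subtraction` are hypothetical names, and each bin holds only a gradient sum and a sample count.

```python
def build_histogram_brute(binned_values, gradients, sample_indices, n_bins):
    """Scan the node's samples and accumulate per-bin statistics."""
    hist = [[0.0, 0] for _ in range(n_bins)]  # [sum_gradients, count] per bin
    for i in sample_indices:
        bin_idx = binned_values[i]
        hist[bin_idx][0] += gradients[i]
        hist[bin_idx][1] += 1
    return hist


def build_histogram_subtraction(parent_hist, sibling_hist):
    """Derive a node's histogram as parent minus sibling, bin by bin."""
    return [
        [p[0] - s[0], p[1] - s[1]]
        for p, s in zip(parent_hist, sibling_hist)
    ]


binned_values = [0, 1, 1, 2, 0, 2]
gradients = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
left_indices, right_indices = [0, 1, 4], [2, 3, 5]

parent = build_histogram_brute(binned_values, gradients, range(6), n_bins=3)
left = build_histogram_brute(binned_values, gradients, left_indices, n_bins=3)
right_by_subtraction = build_histogram_subtraction(parent, left)
right_by_brute = build_histogram_brute(binned_values, gradients, right_indices, n_bins=3)
```

In this sketch the brute routine takes only sample indices and never reads the parent histogram, while the subtraction routine never touches the samples at all.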

ogrisel · 2019-07-18T10:00:40Z

For reference, @NicolasHug did a similar PR here: #14392, but unfortunately it does not seem to improve the scalability profile.

NicolasHug · 2020-09-04T13:46:01Z

Closing as a duplicate of #18341 (credited you there).
Thanks @SmirnovEgorRu !

replacing np.zeroes by np.empty for GradientBoosting for better scalling

d72d8c6

SmirnovEgorRu changed the title ~~replacing np.zeroes by np.empty for GradientBoosting for better scalling~~ Replacing np.zeroes by np.empty for Gradient Boosting for better scaling by cores Jul 15, 2019

SmirnovEgorRu mentioned this pull request Jul 15, 2019

Multicore scalability of the Histogram-based GBDT #14306

Open

NicolasHug reviewed Jul 16, 2019

github-actions bot added the module:ensemble label Mar 2, 2020

NicolasHug mentioned this pull request Sep 4, 2020

[MRG] MNT Initialize histograms in parallel and don't call np.zero in Hist-GBDT #18341

Merged

NicolasHug closed this Sep 4, 2020
