Skip to content

Replacing np.zeroes by np.empty for Gradient Boosting for better scaling by cores #14380

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

SmirnovEgorRu
Copy link

I have prepared a PR with replacing np.zeroes to np.empty for performance improvement on multi-core systems.
Original issue: #14306

@SmirnovEgorRu SmirnovEgorRu changed the title replacing np.zeroes by np.empty for GradientBoosting for better scalling Replacing np.zeroes by np.empty for Gradient Boosting for better scaling by cores Jul 15, 2019
@NicolasHug
Copy link
Member

Thanks for the PR @SmirnovEgorRu, there seems to be a few unrelated changes though?

Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More detailed comments

shape=(self.n_features, self.max_bins),
dtype=HISTOGRAM_DTYPE
)

for feature_idx in prange(n_features, schedule='static', nogil=True):
# for feature_idx in prange(n_features, schedule='static', nogil=True):
for feature_idx in range(n_features):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why replace prange with range?

@@ -103,7 +103,9 @@ cdef class HistogramBuilder:

def compute_histograms_brute(
HistogramBuilder self,
const unsigned int [::1] sample_indices): # IN
const unsigned int [::1] sample_indices,
hist_struct [:, ::1] parent): # IN
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully understand why you need to change the API here.

We have 2 routines to compute histograms:

  • compute_histograms_subtraction, used when we know the histogram of the parent and the sibling of a node (using the subtraction trick)

  • compute_histograms_brute used in all other cases. With that function, we do not need the histogram of the parent.

@ogrisel
Copy link
Member

ogrisel commented Jul 18, 2019

For reference @NicolasHug did a similar PR here: #14392 but unfortunately this does not seem to solve the scalabilty profile.

@NicolasHug
Copy link
Member

NicolasHug commented Sep 4, 2020

Closing as a duplicate of #18341 (credited you there).
Thanks @SmirnovEgorRu !

@NicolasHug NicolasHug closed this Sep 4, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants