
Run itrees in parallel during prediction. #14001


Closed

Conversation

sergiormpereira

Reference Issues/PRs

#14000

What does this implement/fix? Explain your changes.

Isolation Forest is executed in parallel during fitting, but during prediction it runs single-threaded.

In this PR, I parallelised the execution during prediction, more precisely in the _compute_score_samples method. Each ITree was being called in sequence, using a for loop. I created an auxiliary internal function that executes each tree, and this function can be run in parallel. I used joblib for parallelisation.
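Roughly, the pattern is the following (a simplified sketch rather than the exact PR code; the helper name is illustrative and the scoring omits the average-path-length correction used for the real anomaly score):

import numpy as np
from joblib import Parallel, delayed


def _tree_depths(tree, X):
    # Illustrative per-tree helper: number of nodes each sample traverses in
    # one fitted iTree (the real score also adds a leaf-size correction).
    node_indicator = tree.decision_path(X)
    return np.ravel(node_indicator.sum(axis=1))


def parallel_depths(estimators, X, n_jobs=None):
    # One joblib task per tree, then sum the per-tree depths over all trees.
    per_tree = Parallel(n_jobs=n_jobs)(
        delayed(_tree_depths)(tree, X) for tree in estimators
    )
    return np.sum(per_tree, axis=0)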

Any other comments?

I ran the tests for Isolation Forest and they passed.

@amueller
Member

When is this actually faster? We did this for the random forests at some point but I think we found that it's slower. Can you provide some benchmarks?

@sergiormpereira
Author

sergiormpereira commented Jun 4, 2019

Hey @amueller! Thanks a lot for the feedback.

This is slower for small amounts of test data (1000 samples), but it still runs in around 120 ms in our tests. However, we can see that it gets faster than running single-threaded as we increase the amount of data and the number of trees.

Please, find the benchmark in the following notebook, with some comments:
https://github.com/TechhubLisbon/scikit-learn/blob/iforest-parallel-predict-benchmark/benchmarks/bench_isolation_forest_parallel_predict.ipynb
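For a quick local check, a timing harness along these lines reproduces the comparison (a sketch only, not the notebook code; it assumes this PR's branch, where n_jobs is also honoured at prediction time, and the shapes are arbitrary):

import time

import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(0)
X_train = rng.randn(10_000, 50)
X_test = rng.randn(100_000, 50)

for n_jobs in (1, 4):
    iso = IsolationForest(n_estimators=300, n_jobs=n_jobs, random_state=0).fit(X_train)
    tic = time.perf_counter()
    iso.score_samples(X_test)
    print(f"n_jobs={n_jobs}: {time.perf_counter() - tic:.2f} s")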

Let me know if I should run some more tests.

@sergiormpereira
Author

ping @amueller :)

@agramfort
Member

@sergiopasra see how it is done here: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/ensemble/bagging.py#L356
_partition_estimators allows parallelizing over batches of estimators, which leads to fewer threads and is faster
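For context, the chunked pattern looks roughly like this (a sketch; _depths_for_chunk is an illustrative worker, and the _partition_estimators import path is the private one inside scikit-learn, which has moved between versions):

import numpy as np
from joblib import Parallel, delayed
from sklearn.ensemble._base import _partition_estimators  # private helper


def _depths_for_chunk(trees, X):
    # Illustrative worker: score a whole chunk of trees sequentially, so only
    # n_jobs tasks are created instead of one task per tree.
    depths = np.zeros(X.shape[0])
    for tree in trees:
        depths += np.ravel(tree.decision_path(X).sum(axis=1))
    return depths


def parallel_depths_chunked(estimators, X, n_jobs):
    n_jobs, _, starts = _partition_estimators(len(estimators), n_jobs)
    chunk_depths = Parallel(n_jobs=n_jobs)(
        delayed(_depths_for_chunk)(estimators[starts[i]:starts[i + 1]], X)
        for i in range(n_jobs)
    )
    return np.sum(chunk_depths, axis=0)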

@sergiormpereira
Author

Ok @agramfort . I'll have a look and come back after. Thanks for the feedback!

@sergiormpereira
Author

sergiormpereira commented Jul 2, 2019

@agramfort thanks for the feedback!

I updated the PR with your suggestion of dividing the trees into chunks.

I also re-ran the benchmark with this approach. You can check the notebook at https://github.com/TechhubLisbon/scikit-learn/blob/iforest-parallel-predict-benchmark/benchmarks/bench_isolation_forest_parallel_predict_v2.ipynb

You can also compare with the first version in https://github.com/TechhubLisbon/scikit-learn/blob/iforest-parallel-predict-benchmark/benchmarks/bench_isolation_forest_parallel_predict.ipynb

In this PR I needed to check the version of joblib to make the tests pass.

@agramfort
Member

no more objections on my side. You'll need to add a what's new entry.

maybe @albertcthomas you want to have a look?

@albertcthomas
Contributor

Thanks for the nice benchmark @sergiormpereira, I will have a look at the code before the end of the week.

Contributor

@albertcthomas albertcthomas left a comment

It might be good to have a test checking that the output of score_samples is the same when n_jobs=1 and n_jobs=2, for instance in test_iforest_parallel_regression(). Otherwise LGTM.
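Something along these lines, for instance (a sketch; the test name and data are illustrative):

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.ensemble import IsolationForest


def test_iforest_score_samples_n_jobs():
    # Parallel scoring must give the same result as the sequential path.
    X, _ = make_blobs(n_samples=200, random_state=0)
    scores_seq = IsolationForest(n_jobs=1, random_state=0).fit(X).score_samples(X)
    scores_par = IsolationForest(n_jobs=2, random_state=0).fit(X).score_samples(X)
    np.testing.assert_allclose(scores_seq, scores_par)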

for i in range(n_jobs))

n_samples = X.shape[0]
depths = np.zeros(n_samples, order="f")
Contributor

I think you can use depths = np.sum(par_results, axis=0) instead of this initialization and the for loop below.

Author

Totally agree! I'll include this change.

@sergiormpereira
Author

Hey @albertcthomas, thanks a lot for the feedback!
I added both suggestions: the tests, and replacing the loop with np.sum.

Now, two of the tests are failing, but they are in test_ridge.py. It looks related to the following issue:
#14219

Do you think it was caused by my changes?

@albertcthomas
Contributor

Thanks @sergiormpereira! I don't think the failure is related to this PR. Could you push a new commit (or amend the last commit and force push it) to rerun CI?

@sergiormpereira sergiormpereira force-pushed the iforest-parallel-predict branch from 8e54aea to fbdbcce Compare July 17, 2019 16:13
@sergiormpereira
Author

Thanks @albertcthomas! I amended the last commit and force pushed it, as you suggested. Now it passes everything.

@sergiormpereira
Author

@agramfort I added an entry to What's new. I put it under Fix because the documentation says that n_jobs controls the number of parallel jobs both at training and at prediction time.

@sergiormpereira
Author

ping @agramfort @amueller :)

@thomasjpfan
Member

This provides really nice improvements for large samples!

I am concerned about users who set a high n_jobs during training and then move the model into production, where score_samples runs on <1k samples. They will experience a major performance hit.

@amueller
Member

Yeah, can you include n_samples=10? There could also be a threshold so that it only runs in parallel with enough samples, right?

@sergiormpereira
Author

@thomasjpfan @amueller Yeah, I agree that a threshold can be set. I'll include it in the next few days.

@sergiormpereira
Author

Hey @thomasjpfan @amueller! I was looking into the issue of predicting with a small number of samples. Please have a look at the following points.

  1. It appears that there is an impact both from the number of samples and from the number of features. But, in general, the best we can get with parallel prediction is around 100 ms per prediction call. This applies to, e.g., 1 sample and 10 features, or 100 samples and 500 features. In contrast, the current single-threaded prediction can achieve a best time of 30 ms per prediction call, e.g., for 1 sample with 10 features. So, my conclusion is that in this regime, the thread handling dominates the running time.

  2. Maybe we can impose a threshold. The relationship between the number of features, the number of samples, and the obtained speed-up looks non-linear. I could try to come up with a function to define the threshold, but I am afraid that it could be hardware-dependent. So, I would rather go for a simplified threshold, like >= 5000 samples to run in parallel (see the sketch after this list). This is more conservative than 1k samples, but in that regime the speed-up over the single-threaded mode is not that large anyway. What's your opinion on this?

  3. Also, the minimum runtime we can reach with the single-threaded IForest is around 30 ms, while with parallel threads it is around 100 ms. This is more than twice as slow, but as we increase the amount of data, we start saving seconds, minutes, or hours, which I think is a better saving than 70 ms. Of course, I understand that in production we may be interested in calling predict multiple times on small numbers of samples, as @thomasjpfan said.

  4. As a matter of curiosity, I also did a small test with the Boston house-prices dataset, using Random Forest regression and Isolation Forest. I observed that the RF regressor with 1 thread achieves predict times of around 3 ms, but when we start increasing the number of threads, the minimum predict call time is 100 ms. With the single-threaded Isolation Forest, in this scenario, the time for 1 sample is around 30 ms. When we use more parallel jobs during predict with Isolation Forest, we also get around 100 ms per predict call. So, I conclude that this issue is not really Isolation Forest-specific, but is more related to handling the parallel jobs.
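To make point 2 concrete, a minimal sketch of such a guard (the constant, the helper, and the simplified per-tree scoring are illustrative, not part of the PR):

import numpy as np
from joblib import Parallel, delayed

# Hypothetical cut-off below which the job-handling overhead dominates.
PARALLEL_PREDICT_MIN_SAMPLES = 5000


def _tree_depths(tree, X):
    # Simplified per-tree path lengths (the real score adds a leaf correction).
    return np.ravel(tree.decision_path(X).sum(axis=1))


def depths_maybe_parallel(estimators, X, n_jobs):
    if n_jobs in (None, 1) or X.shape[0] < PARALLEL_PREDICT_MIN_SAMPLES:
        # Small inputs: stay single-threaded to avoid the ~100 ms job overhead.
        return sum(_tree_depths(tree, X) for tree in estimators)
    per_tree = Parallel(n_jobs=n_jobs)(
        delayed(_tree_depths)(tree, X) for tree in estimators
    )
    return np.sum(per_tree, axis=0)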

The study regarding points 1, 2, and 3 can be found in: https://github.com/TechhubLisbon/scikit-learn/blob/iforest-parallel-predict-benchmark/benchmarks/bench_isolation_forest_parallel_predict_samples_study.ipynb

The study of point 4: https://github.com/TechhubLisbon/scikit-learn/blob/iforest-parallel-predict-benchmark/benchmarks/Parallel_IForest_VS_RF.ipynb

What are your thoughts on this?

@thomasjpfan
Member

@sergiormpereira Thank you for the benchmarks!

On 2:

I am okay with a threshold. I wonder if it is better to have a threshold, or to document that it may be good to call set_params(n_jobs=1) when the sample size is < 5000.
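If we go the documentation route, the advice amounts to something like this usage pattern (illustrative data and shapes):

import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(0)
X_train, X_small = rng.randn(100_000, 20), rng.randn(100, 20)

# Train with many jobs, but score small production batches sequentially.
iso = IsolationForest(n_estimators=500, n_jobs=-1).fit(X_train)
iso.set_params(n_jobs=1)  # avoid the parallel overhead for small inputs
scores = iso.score_samples(X_small)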

On 4:

The parallelism in IsolationForest first uses _compute_chunked_score_samples to break up the samples into batches and puts each batch through every tree in estimators_ inside _compute_score_samples. This PR parallelizes the calls to the tree methods that obtain the depths. I wonder if this PR will use too much memory, as reported in #12040.

The Random Forest does not break up the samples into batches. It passes all of X into predict_proba of each tree. The predict_proba call is parallelized in this case.
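For reference, the sample batching mentioned above amounts to something like this (a simplified sketch; the real code derives the chunk size from the working_memory config and calls the private _compute_score_samples on each batch, here replaced by the public score_samples):

import numpy as np
from sklearn.utils import gen_batches


def chunked_score_samples(iso, X, chunk_n_rows=4096):
    # Score the samples batch by batch; inside each batch, every tree is
    # applied sequentially (the loop this PR parallelizes).
    scores = np.empty(X.shape[0])
    for sl in gen_batches(X.shape[0], chunk_n_rows):
        scores[sl] = iso.score_samples(X[sl])  # public stand-in for the batch scoring
    return scores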

@sergiormpereira
Author

sergiormpereira commented Aug 6, 2019

@thomasjpfan thanks for the feedback, and info regarding RF!

On 2: I can easily implement either of them. But, somehow, I lean towards documenting it, for the sake of being more general and avoiding putting a constant in the code. We could also document that it may increase the memory footprint (see the next point). Perhaps @amueller can comment here, too?

On 4: running prediction in parallel with this PR increases the memory footprint by roughly a little more than 0.5 times the number of parallel jobs, relative to the current single-threaded method. Maybe this can be improved in the future? Please have a look at the memory benchmark that I conducted: https://github.com/TechhubLisbon/scikit-learn/blob/iforest-parallel-predict-benchmark/benchmarks/bench_isolation_forest_parallel_memory_consumption.ipynb
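For anyone who wants to reproduce that kind of measurement locally, something along these lines works (a sketch only, not the notebook code; it assumes memory_profiler is installed and this PR's branch, where n_jobs is used at prediction time):

import numpy as np
from memory_profiler import memory_usage
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(0)
X_train, X_test = rng.randn(10_000, 50), rng.randn(100_000, 50)

for n_jobs in (1, 4):
    iso = IsolationForest(n_estimators=300, n_jobs=n_jobs, random_state=0).fit(X_train)
    # Peak resident memory (MiB) while scoring the test set.
    peak = max(memory_usage((iso.score_samples, (X_test,), {})))
    print(f"n_jobs={n_jobs}: peak {peak:.0f} MiB")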

@sergiormpereira
Author

ping @amueller @thomasjpfan :)

@necosta

necosta commented Sep 10, 2019

Any updates? Would love to see this merged. Happy to help with any other investigations.

@thomasjpfan
Member

With the increase in memory usage, which opens #12040 back up again by using more memory than `sklearn.get_config()['working_memory']`, I would rather not have this feature activate automatically. There is essentially a trade-off between computation time and memory usage.

The way to work around this may be to parallelize one step higher at the following level:

def _compute_chunked_score_samples(self, X):
    ...
    for sl in slices:
        scores[sl] = self._compute_score_samples(X[sl], subsample_features)

(I am unsure if this will work)
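In joblib terms, that higher-level parallelization would look roughly like this (a sketch only, using the public score_samples as a stand-in for the per-batch scoring; n_jobs batches would still be processed at once):

import numpy as np
from joblib import Parallel, delayed
from sklearn.utils import gen_batches


def chunked_score_samples_parallel(iso, X, n_jobs, chunk_n_rows=4096):
    slices = list(gen_batches(X.shape[0], chunk_n_rows))
    # Each task scores one batch of samples, with all trees run in sequence.
    batch_scores = Parallel(n_jobs=n_jobs)(
        delayed(iso.score_samples)(X[sl]) for sl in slices
    )
    scores = np.empty(X.shape[0])
    for sl, batch in zip(slices, batch_scores):
        scores[sl] = batch
    return scores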

@sergiormpereira
Author

Hey @thomasjpfan! I was analyzing that suggestion about parallelizing one level higher and, from my understanding, it will not alleviate the memory issue. At the moment, when we parallelize over the trees, we have n_jobs parallel trees running. With your suggestion, we would be parallelizing at the batch level, meaning that with large data we would have n_jobs batches being predicted in parallel, with the trees run in series within each batch. So, we would still indirectly have n_jobs parallel trees running. Am I correct here?

One option we could consider is a parameter on IsolationForest that explicitly enables parallel prediction and defaults to False, for instance parallel_predict=False. In the documentation, we could warn about the current issues of setting it to True.

What do you think? Perhaps @amueller can comment, too :)

@necosta

necosta commented Dec 4, 2019

ping for comments :)

@sergiormpereira
Author

ping @amueller @thomasjpfan . I really would like to move this PR forward :)

@jnothman
Member

jnothman commented Jan 7, 2020

Generally prediction can be batched over samples. This could even be achieved with a mixin, and would maintain reasonably minimal memory requirements... Rather than providing a parameter to enable parallelisation across trees, I wonder if this kind of mixin would be similarly performant. See also dask_ml.wrappers.ParallelPostFit.
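A very rough sketch of what such a mixin could look like (class and method names are illustrative; this is not an existing scikit-learn or dask-ml class):

import numpy as np
from joblib import Parallel, delayed
from sklearn.ensemble import IsolationForest


class BatchedParallelScoreMixin:
    """Illustrative mixin: split the samples into batches and score them in parallel."""

    def score_samples_batched(self, X, n_jobs=None, batch_size=4096):
        batches = [X[i:i + batch_size] for i in range(0, X.shape[0], batch_size)]
        scores = Parallel(n_jobs=n_jobs)(
            delayed(self.score_samples)(batch) for batch in batches
        )
        return np.concatenate(scores)


class ParallelIsolationForest(BatchedParallelScoreMixin, IsolationForest):
    pass

# Usage sketch:
# iso = ParallelIsolationForest(n_estimators=200).fit(X_train)
# scores = iso.score_samples_batched(X_test, n_jobs=4)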

@cmarmo cmarmo added the Needs Decision label Aug 12, 2020
@svenvanhal

+1 for parallelized predictions.

In the meantime, readers may consider a parallel wrapper as a workaround:

from os import sched_getaffinity

import numpy as np
from sklearn.ensemble import IsolationForest

# Fit IsolationForest
iso = IsolationForest().fit(X_train)

# Split the test array into one chunk per available CPU core
n_chunks = len(sched_getaffinity(0))
chunks = np.array_split(X_test, n_chunks)

Multiprocessing:

from multiprocessing import Pool

# Predict in parallel
with Pool(n_chunks) as pool:
    y_score = np.concatenate(pool.map(iso.score_samples, chunks))

Joblib:

from joblib import Parallel, delayed

# Predict in parallel
par_exec = Parallel(n_jobs=n_chunks, max_nbytes='8G')
y_score = np.concatenate(par_exec(delayed(iso.score_samples)(_X) for _X in chunks))

Multiprocessing is slightly faster for me, but your mileage may vary.

Base automatically changed from master to main January 22, 2021 10:51
Member

@thomasjpfan thomasjpfan left a comment

Sorry for the delay. I left a suggestion with a possible way forward.

return batch_depths

n_jobs, n_estimators, starts = _partition_estimators(
    self.n_estimators, self.n_jobs)
Member

Looking at this issue again, I think we can do this:

Suggested change
-    self.n_estimators, self.n_jobs)
+    self.n_estimators, None)

which allows joblib.parallel_backend to control n_jobs. At a higher level, we can then set n_jobs through parallel_backend:

with parallel_backend("loky", n_jobs=6):
    iso.score_samples(X)

Details: Seeing that _partition_estimators uses effective_n_jobs:

n_jobs = min(effective_n_jobs(n_jobs), n_estimators)

effective_n_jobs queries configuration as follows:

with parallel_backend("loky", n_jobs=4):
    print(effective_n_jobs(None))
# 4

# default is 1
print(effective_n_jobs(None))
# 1

@adam2392
Member

adam2392 commented Mar 8, 2024

@adrinjalali I see you marked this as stalled and help-wanted. I would be interested in possibly helping finish this off, as I think this is an important performance gap within the tree/ensemble module.

From my understanding, the remaining work is just around configuring the right internal API using Thomas's suggestion, probably updating unit tests, and then running a few benchmarks to verify this works as intended.

@adrinjalali
Member

@adam2392 it would be very nice if you could open a PR to supersede this work.

@lesteve
Member

lesteve commented Dec 18, 2024

This has been done in #25186

@lesteve lesteve closed this Dec 18, 2024