What's the point in this line and this function? #5820

Closed · olologin opened this issue Nov 15, 2015 · 10 comments

@olologin (Contributor)

https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/ensemble/bagging.py#L344

I understand that it slices the list of tasks into batches, but why do we need it if joblib can automatically adjust the batch size (from the list of tasks) and group the tasks into batches for each process? Even if you don't want to spend time on the automatic adjustment, you can just set batch_size = int(n_tasks / n_jobs) and it will work the same way as explicit slicing.

https://pythonhosted.org/joblib/parallel.html#parallel-reference-documentation
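For illustration, a minimal sketch (not scikit-learn code; it assumes a joblib version with the batch_size parameter, and work is a toy stand-in for fitting one estimator) of the two dispatch strategies in question:

from joblib import Parallel, delayed

def work(i):
    # cheap CPU-bound stand-in for fitting a single estimator
    return sum(k * k for k in range(1000))

n_tasks, n_jobs = 400, 4

# Let joblib group tasks itself; 'auto' tunes the batch size at runtime.
out_auto = Parallel(n_jobs=n_jobs, batch_size='auto')(
    delayed(work)(i) for i in range(n_tasks))

# Fixed batches, roughly equivalent to slicing the task list by hand:
# one batch of n_tasks / n_jobs tasks per worker.
out_fixed = Parallel(n_jobs=n_jobs, batch_size=n_tasks // n_jobs)(
    delayed(work)(i) for i in range(n_tasks))

assert out_auto == out_fixed  # same results, different dispatch overhead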

@GaelVaroquaux (Member) commented Nov 15, 2015 via email

@olologin (Contributor, Author)

Yes, I've checked the sources now, and this function is used in only two files, bagging.py and forest.py; in the latter it's used only for computing the minimal number of threads:
n_jobs, _, _ = _partition_estimators(self.n_estimators, self.n_jobs)

Not a big deal, as it turned out.
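For reference, here is a simplified sketch of what _partition_estimators computes (a paraphrase, not the exact scikit-learn helper): it clips the number of jobs to the number of estimators, splits the estimators as evenly as possible, and returns the slice boundaries.

import numpy as np
from joblib import cpu_count

def partition_estimators(n_estimators, n_jobs):
    # Clip the effective number of jobs to the number of estimators.
    if n_jobs == -1:
        n_jobs = min(cpu_count(), n_estimators)
    else:
        n_jobs = min(n_jobs, n_estimators)

    # Distribute the estimators as evenly as possible between jobs.
    per_job = (n_estimators // n_jobs) * np.ones(n_jobs, dtype=int)
    per_job[:n_estimators % n_jobs] += 1

    # Slice boundaries into the flat list of estimators.
    starts = [0] + np.cumsum(per_job).tolist()
    return n_jobs, per_job.tolist(), starts

# e.g. partition_estimators(10, 4) -> (4, [3, 3, 2, 2], [0, 3, 6, 8, 10])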

I'll try to run some benchmarks.

@olologin (Contributor, Author)

from sklearn.ensemble import BaggingClassifier, BaggingRegressor
from sklearn.datasets import load_boston, load_iris
from sklearn.utils import check_random_state

rng = check_random_state(0)

# load the iris dataset and randomly permute it
iris = load_iris()
perm = rng.permutation(iris.target.size)
iris.data = iris.data[perm]
iris.target = iris.target[perm]

# load the boston dataset and randomly permute it
boston = load_boston()
perm = rng.permutation(boston.target.size)
boston.data = boston.data[perm]
boston.target = boston.target[perm]

reg = BaggingRegressor(n_estimators=100, bootstrap=True, oob_score=True, n_jobs=4, random_state=10)
clf = BaggingClassifier(n_estimators=100, bootstrap=True, oob_score=True, n_jobs=4, random_state=10)

%timeit -n 10 clf.fit(iris.data, iris.target)
%timeit -n 10 reg.fit(boston.data, boston.target)
%timeit -n 10 clf.predict(iris.data)
%timeit -n 10 reg.predict(boston.data)

reg = BaggingRegressor(n_estimators=400, bootstrap=True, oob_score=True, n_jobs=4, random_state=10)
clf = BaggingClassifier(n_estimators=400, bootstrap=True, oob_score=True, n_jobs=4, random_state=10)

%timeit -n 10 clf.fit(iris.data, iris.target)
%timeit -n 10 reg.fit(boston.data, boston.target)
%timeit -n 10 clf.predict(iris.data)
%timeit -n 10 reg.predict(boston.data)

Results on current master:

# 100 trees
10 loops, best of 3: 265 ms per loop
10 loops, best of 3: 409 ms per loop
10 loops, best of 3: 147 ms per loop
10 loops, best of 3: 150 ms per loop

# 400 trees
10 loops, best of 3: 635 ms per loop
10 loops, best of 3: 1.46 s per loop
10 loops, best of 3: 268 ms per loop
10 loops, best of 3: 392 ms per loop

Results with automatically adjusted batch sizes, from this branch: https://github.com/olologin/scikit-learn/blob/bagging_refactoring/sklearn/ensemble/bagging.py

# 100 trees
10 loops, best of 3: 261 ms per loop
10 loops, best of 3: 436 ms per loop
10 loops, best of 3: 146 ms per loop
10 loops, best of 3: 188 ms per loop

# 400 trees
10 loops, best of 3: 684 ms per loop
10 loops, best of 3: 1.33 s per loop
10 loops, best of 3: 266 ms per loop
10 loops, best of 3: 366 ms per loop

@GaelVaroquaux (Member) commented Nov 16, 2015 via email

@olologin (Contributor, Author)

I should say that in the current forest.py on master, all the code that works with joblib is written in the same manner (as bagging.py from the branch above), i.e.:

    n_jobs, _, _ = _partition_estimators(self.n_estimators, self.n_jobs)
    ... = Parallel(n_jobs=n_jobs, verbose=self.verbose,
                   backend="threading")(...)
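For completeness, a self-contained sketch of that pattern (the names below are illustrative, not the actual forest.py internals); with the threading backend the workers share memory, so X and y are never copied per batch:

from joblib import Parallel, delayed

def build_one_tree(tree_id, X, y):
    # stand-in for fitting a single tree on the shared X, y
    return tree_id

X, y = [[0.0], [1.0]], [0, 1]  # shared, never pickled with "threading"
n_estimators, n_jobs = 8, 4

trees = Parallel(n_jobs=n_jobs, backend="threading")(
    delayed(build_one_tree)(i, X, y) for i in range(n_estimators))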

@glouppe (Contributor) commented Nov 16, 2015

Can you run the same benchmarks on much larger datasets? The reason for writing things this way was to minimize overhead by transferring the function arguments (i.e., copying X and y) only once per core.
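To make that argument concrete, a hedged sketch (fit_chunk is hypothetical, and this ignores joblib's automatic memmapping of large arrays, which can hide the copies): with a multiprocessing backend the arguments of every dispatched batch are pickled, so one hand-made batch per core serializes X and y n_jobs times instead of up to n_estimators times.

import numpy as np
from joblib import Parallel, delayed

X = np.zeros((1000, 100))  # small enough to be pickled, not memmapped

def fit_chunk(X, n_estimators):
    # stand-in for fitting n_estimators estimators on one copy of X
    return n_estimators

n_jobs, n_estimators = 4, 100

# One hand-made batch per core: X crosses the process boundary n_jobs times.
Parallel(n_jobs=n_jobs)(
    delayed(fit_chunk)(X, n_estimators // n_jobs) for _ in range(n_jobs))

# One task per estimator: X may cross the boundary once per dispatched
# batch, up to n_estimators times if the batches end up small.
Parallel(n_jobs=n_jobs)(
    delayed(fit_chunk)(X, 1) for _ in range(n_estimators))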

@olologin (Contributor, Author)

> Can you run the same benchmarks on much larger datasets?

Is the digits dataset big enough?

> The reason for writing things this way was to minimize overhead by transferring the function arguments (i.e., copying X and y)

Hmm, interesting, I thought joblib automatically serializes them only once.

@olologin (Contributor, Author)

from sklearn.ensemble import BaggingClassifier, BaggingRegressor
from sklearn.datasets import load_digits, load_boston
from sklearn.utils import check_random_state

rng = check_random_state(0)

# load the digits dataset and randomly permute it
digits = load_digits()
perm = rng.permutation(digits.target.size)
digits.data = digits.data[perm]
digits.target = digits.target[perm]

# load the boston dataset and randomly permute it
boston = load_boston()
perm = rng.permutation(boston.target.size)
boston.data = boston.data[perm]
boston.target = boston.target[perm]

reg = BaggingRegressor(n_estimators=100, bootstrap=True, oob_score=True, n_jobs=4, random_state=10)
clf = BaggingClassifier(n_estimators=100, bootstrap=True, oob_score=True, n_jobs=4, random_state=10)

%timeit -n 10 clf.fit(digits.data, digits.target)
%timeit -n 10 reg.fit(boston.data, boston.target)
%timeit -n 10 clf.predict(digits.data)
%timeit -n 10 reg.predict(boston.data)

reg = BaggingRegressor(n_estimators=400, bootstrap=True, oob_score=True, n_jobs=4, random_state=10)
clf = BaggingClassifier(n_estimators=400, bootstrap=True, oob_score=True, n_jobs=4, random_state=10)

%timeit -n 10 clf.fit(digits.data, digits.target)
%timeit -n 10 reg.fit(boston.data, boston.target)
%timeit -n 10 clf.predict(digits.data)
%timeit -n 10 reg.predict(boston.data)

Results on current master:

# 100 trees
10 loops, best of 3: 1.41 s per loop
10 loops, best of 3: 406 ms per loop
10 loops, best of 3: 256 ms per loop
10 loops, best of 3: 148 ms per loop
# 400 trees
10 loops, best of 3: 4.87 s per loop
10 loops, best of 3: 1.35 s per loop
10 loops, best of 3: 734 ms per loop
10 loops, best of 3: 370 ms per loop

With automatically adjusted batch sizes:

# 100 trees
10 loops, best of 3: 1.38 s per loop
10 loops, best of 3: 468 ms per loop
10 loops, best of 3: 360 ms per loop
10 loops, best of 3: 192 ms per loop
# 400 trees
10 loops, best of 3: 5.02 s per loop
10 loops, best of 3: 1.38 s per loop
10 loops, best of 3: 983 ms per loop
10 loops, best of 3: 393 ms per loop

I tested both versions on the same machine with the same number of processes and the same CPU load average, but these results look like random noise, maybe because the difference between the versions is very small. I'm using IPython and Python 3.4.3 to obtain these results.

@olologin (Contributor, Author)

I'll add an additional script which performs the same test but uses the timeit module (because it seems that IPython's %timeit ignores the -n parameter); I think these results are more accurate.

import timeit
import numpy as np

setup = """
from sklearn.ensemble import BaggingClassifier, BaggingRegressor
from sklearn.datasets import load_digits, load_boston
from sklearn.utils import check_random_state

rng = check_random_state(0)

# load the digits dataset and randomly permute it
digits = load_digits()
perm = rng.permutation(digits.target.size)
digits.data = digits.data[perm]
digits.target = digits.target[perm]

# load the boston dataset and randomly permute it
boston = load_boston()
perm = rng.permutation(boston.target.size)
boston.data = boston.data[perm]
boston.target = boston.target[perm]

reg1 = BaggingRegressor(n_estimators=100, bootstrap=True, oob_score=True, n_jobs=4, random_state=10)
clf1 = BaggingClassifier(n_estimators=100, bootstrap=True, oob_score=True, n_jobs=4, random_state=10)
reg2 = BaggingRegressor(n_estimators=400, bootstrap=True, oob_score=True, n_jobs=4, random_state=10)
clf2 = BaggingClassifier(n_estimators=400, bootstrap=True, oob_score=True, n_jobs=4, random_state=10)
"""

fits = ["reg1.fit(boston.data, boston.target)", "reg2.fit(boston.data, boston.target)",
        "clf1.fit(digits.data, digits.target)", "clf2.fit(digits.data, digits.target)"]

predicts = ["reg1.predict(boston.data)", "reg2.predict(boston.data)",
            "clf1.predict(digits.data)", "clf2.predict(digits.data)"]

for fit, predict in zip(fits, predicts):
    # fit timing: best of 10 repeats of 3 runs each
    print("{0}\t: {1}".format(
        np.min(timeit.Timer(setup=setup, stmt=fit).repeat(repeat=10, number=3)),
        fit
    ))
    # predict timing: the model is fitted once in setup, then predict is timed
    print("{0}\t: {1}".format(
        np.min(timeit.Timer(setup=setup + fit, stmt=predict).repeat(repeat=10, number=3)),
        predict
    ))

Results:

Master:

1.1000740239396691  : reg1.fit(boston.data, boston.target)
0.4071497058030218  : reg1.predict(boston.data)
3.166529825888574   : reg2.fit(boston.data, boston.target)
0.9816317800432444  : reg2.predict(boston.data)
3.070393343223259   : clf1.fit(digits.data, digits.target)
0.7177243148908019  : clf1.predict(digits.data)
11.313239770941436  : clf2.fit(digits.data, digits.target)
1.6799056760501117  : clf2.predict(digits.data)

After removing batches:

1.0993680758401752  : reg1.fit(boston.data, boston.target)
0.4194249368738383  : reg1.predict(boston.data)
3.1378495171666145  : reg2.fit(boston.data, boston.target)
1.0702078870963305  : reg2.predict(boston.data)
3.3732776790857315  : clf1.fit(digits.data, digits.target)
1.0525918728671968  : clf1.predict(digits.data)
11.62876814394258   : clf2.fit(digits.data, digits.target)
2.387485134182498   : clf2.predict(digits.data)

Ratio After/Before:

[0.99935827, 1.03014918, 0.99094267, 1.09023354, 1.09864675, 1.4665685, 1.02789019, 1.4212019]
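These ratios can be reproduced from the timings above (values truncated to eight decimals):

import numpy as np

# timings copied from the two runs above, in the order printed
master = np.array([1.10007402, 0.40714971, 3.16652983, 0.98163178,
                   3.07039334, 0.71772431, 11.31323977, 1.67990568])
no_batches = np.array([1.09936808, 0.41942494, 3.13784952, 1.07020789,
                       3.37327768, 1.05259187, 11.62876814, 2.38748513])

print(no_batches / master)  # the After/Before ratios quoted above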

We can see that some degradation occurs in clf1.predict and clf2.predict. I don't know why; maybe because these operations are relatively fast compared to some of joblib's internal routines (batch-size adjustment, maybe some serialization overhead).

@amueller (Member)

I don't think this is relevant any more, feel free to reopen.
