-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG] Fix missing 'const' in a few memoryview declaration in trees. #13626
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG] Fix missing 'const' in a few memoryview declaration in trees. #13626
Conversation
7c8adbf
to
8080086
Compare
LGTM, except I guess adding your example as a test wouldn't hurt. |
I added a test. It does not involve |
Fails on windows, interesting! |
# check that random forest supports read-only buffer (#13626) | ||
X_orig = np.random.RandomState(0).random_sample((10, 2)).astype(np.float32) | ||
|
||
with NamedTemporaryFile() as tmp: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm
scikit-learn/sklearn/datasets/tests/test_svmlight_format.py
Lines 118 to 122 in e405505
with NamedTemporaryFile(prefix="sklearn-test", suffix=".gz") as tmp: | |
tmp.close() # necessary under windows | |
with open(datafile, "rb") as f: | |
with gzip.open(tmp.name, "wb") as fh_out: | |
shutil.copyfileobj(f, fh_out) |
X_mmap = np.memmap(tmp.name, dtype='float32', mode='r', shape=(10, 2)) | ||
y = np.zeros(10) | ||
|
||
RandomForestClassifier(n_estimators=2).fit(X_mmap, y) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the test could be under sklearn/tree/tests/test_tree.py
, and test a DecisionTreeRegressor
instead. It kinda feels like that's a more natural place for the test since it's actually testing the splitter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved the test, and cleaned it since we actually have a helper to test on memmap arrays.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I wondered why the common test |
Should we run the common test with both dtypes?
… |
Or perhaps there should be an estimator tag specifying what format (dtype,
order) is non-copying for some estimator
|
I'd prefer the second option since the common tests are already quite long. But it's out of scope of this PR I think. |
Thank you! @jeremiedbb |
…ees. (scikit-learn#13626)" This reverts commit 6889d2b.
…ees. (scikit-learn#13626)" This reverts commit 6889d2b.
Memory views were introduced in trees in #12886.
It misses the
const
keyword in a few declarations.A typical use case is doing cross validation on a
RandomForest
:Here X is more than 1Mb, which means it's mem-mapped by joblib in
cross_val_score
. This code breaks on master.What's happening is that for
cross_val_score
, the joblib backend is the sequential backend (as expected) but for the random forest it's loky backend, ignoringprefer='threads'
. So it seems that even if this PR fixes the bug in sklearn, there's also a bug in joblib. @ogrisel