[MRG+1] Read-only data compatibility for Lasso #4775

arthurmensch · 2015-05-27T12:59:39Z

Should fix issue #4772. Benchmarking the changes (calling cd_fast within a lasso regression) suggests that performance is not reduced compared to current master.

I added a regression test that fails on current master (it sets the read-only flag on the design matrix).

ogrisel · 2015-05-27T13:03:30Z

sklearn/decomposition/tests/test_dict_learning.py

+                              transform_alpha=0.001, random_state=0, n_jobs=-1)
+    code = dico.fit(X).transform(X)
+    assert_array_almost_equal(np.dot(code, dico.components_), X, decimal=2)
+    X.flags.writeable = True


~~This last line is useless as X will be garbage collected. Please remove it.~~

Sorry, I had not realised that you were sharing the same X as in other tests. It would be cleaner to do:

    X_readonly = X.copy()
    X_readonly.flags.writeable = False

and use that instead of X in the fit and transform calls.
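A minimal sketch of that suggestion, assuming a small random matrix as a stand-in for the shared X (the shape and the `write_is_rejected` helper are illustrative, not part of the PR). It shows that the copy rejects in-place writes while the shared array stays writeable for the other tests:

```python
import numpy as np

# Stand-in for the module-level X shared by the tests (shape is illustrative).
rng = np.random.RandomState(0)
X = rng.randn(10, 8)

# Work on a private copy so the shared X stays writeable for other tests.
X_readonly = X.copy()
X_readonly.flags.writeable = False


def write_is_rejected(arr):
    """Return True if NumPy refuses an in-place write to arr.

    The write stores the value that is already there, so a writeable
    array is left unchanged by the check.
    """
    try:
        arr[0, 0] = arr[0, 0]
    except ValueError:
        return True
    return False


print(write_is_rejected(X_readonly), write_is_rejected(X))
```

With this pattern there is no need to reset `X.flags.writeable` at the end of the test, because the shared array is never touched.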

ogrisel · 2015-05-27T13:07:53Z

I edited the title of this PR as a [WIP]. Please change it to [MRG] once you address the above comments.

amueller · 2015-05-27T15:14:09Z

sklearn/decomposition/tests/test_dict_learning.py

+    n_components = 12
+    X.flags.writeable = False
+    dico = DictionaryLearning(n_components, transform_algorithm='lasso_cd',
+                              transform_alpha=0.001, random_state=0, n_jobs=-1)


should we use n_jobs=-1 in tests?

I changed this so that the test only asserts that dict_learning works on memory-mapped read-only arrays.

ogrisel · 2015-05-28T09:17:06Z

sklearn/decomposition/tests/test_dict_learning.py

+
+if __name__ == '__main__':
+    import nose
+    nose.runmodule()


Please do not add such boilerplate. Use nosetests sklearn/decomposition/tests/test_dict_learning.py to run the tests of a specific test module.

OK, these lines still exist in some test files though.

ogrisel · 2015-05-28T13:18:25Z

Also, can you please squash the commits of this PR? Such intermediate commits have no historical value per se.

ogrisel · 2015-05-28T13:18:44Z

If you need help with squashing, please feel free to ask.

ogrisel · 2015-05-29T16:25:14Z

LGTM. I think we should write a new test in test_common that checks that we can fit any estimator on readonly data. But this can be done in another PR.
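A rough sketch of what such a common check could look like. The two estimator classes and the `check_fit_on_readonly_data` helper are hypothetical stand-ins written for this illustration; they are not scikit-learn classes, and the real test_common machinery is not shown here:

```python
import numpy as np


class LeastSquares(object):
    """Hypothetical stand-in estimator; not a scikit-learn class."""

    def fit(self, X, y):
        self.coef_ = np.linalg.lstsq(X, y, rcond=None)[0]
        return self


class ColumnMeans(object):
    """Second hypothetical estimator, ignoring y."""

    def fit(self, X, y=None):
        self.mean_ = X.mean(axis=0)
        return self


def check_fit_on_readonly_data(estimator, X, y):
    """Fit estimator on read-only copies of X and y and return it."""
    X_ro, y_ro = X.copy(), y.copy()
    X_ro.flags.writeable = False
    y_ro.flags.writeable = False
    estimator.fit(X_ro, y_ro)  # must not raise on read-only input
    # The inputs must also come back unchanged.
    np.testing.assert_array_equal(X_ro, X)
    np.testing.assert_array_equal(y_ro, y)
    return estimator


rng = np.random.RandomState(0)
X = rng.randn(20, 3)
y = X.dot(np.array([1.0, -2.0, 0.5]))

fitted = [check_fit_on_readonly_data(est, X, y)
          for est in (LeastSquares(), ColumnMeans())]
```

In a real common test the loop would run over the library's estimators rather than over hand-written classes.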

GaelVaroquaux · 2015-05-30T20:22:21Z

sklearn/decomposition/tests/test_dict_learning.py

@@ -59,6 +61,13 @@ def test_dict_learning_reconstruction_parallel():
    code = dico.transform(X)
    assert_array_almost_equal(np.dot(code, dico.components_), X, decimal=2)


PEP8: you need 2 empty lines between top-level functions

GaelVaroquaux · 2015-05-30T20:27:35Z

Does the change to the Cython code have any impact on run time?

ogrisel · 2015-06-24T12:58:52Z

@arthurmensch could you please rebase this on top of master? It's no longer mergeable according to GitHub.

I think the np.asarray vs np.array in input validation that triggers memory copy on np.memmap inputs should be tackled in a different pull request.

Instead, to test the fix, please add a unit test that calls the private Cython API directly with read-only np.memmap data.
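To illustrate the np.asarray versus np.array point, here is a small sketch using a temporary file (the file name and array contents are arbitrary choices for this example). By default np.array copies its input, so the result is a fresh writeable array, while np.asarray returns a view on the read-only mapping:

```python
import os
import tempfile

import numpy as np

# Write a small array to disk, then reopen it as a read-only memory map.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "X.dat")
data = np.arange(12, dtype=np.float64).reshape(3, 4)
writer = np.memmap(path, dtype=np.float64, mode="w+", shape=data.shape)
writer[:] = data
writer.flush()
del writer

X_mm = np.memmap(path, dtype=np.float64, mode="r", shape=data.shape)

as_view = np.asarray(X_mm)  # no copy: still backed by the read-only mapping
as_copy = np.array(X_mm)    # default copy=True: a new in-memory array

print(np.shares_memory(as_view, X_mm), as_view.flags.writeable)
print(np.shares_memory(as_copy, X_mm), as_copy.flags.writeable)
```

This is why a test that goes through copying input validation may never hand read-only data to the compiled code at all.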

ogrisel · 2015-06-24T14:05:08Z

Please remove the last commit; I think this should be addressed in a separate PR. In the meantime, write a unit test that uses the private API directly, as I did here: #4684
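The shape of such a test can be sketched as follows. Note the assumptions: `lasso_cd_reference` is a hypothetical pure-NumPy coordinate descent written for this illustration, not the private cd_fast routine, and `as_readonly_memmap` is likewise an illustrative helper. The point is only the pattern: feed read-only memory-mapped inputs to the solver and compare against the in-memory result.

```python
import os
import tempfile

import numpy as np


def lasso_cd_reference(X, y, alpha, n_iter=50):
    """Hypothetical pure-NumPy coordinate descent (not cd_fast) for
    (1 / (2 * n_samples)) * ||y - Xw||^2 + alpha * ||w||_1.
    It only ever reads from X and y.
    """
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    residual = np.array(y, dtype=np.float64)  # private copy; equals y - Xw at w = 0
    col_sq_norms = np.array([np.dot(X[:, j], X[:, j])
                             for j in range(n_features)])
    for _ in range(n_iter):
        for j in range(n_features):
            if col_sq_norms[j] == 0.0:
                continue
            x_j = X[:, j]
            residual += x_j * w[j]  # drop feature j from the current fit
            rho = np.dot(x_j, residual)
            # Soft-thresholding update for coordinate j.
            w[j] = (np.sign(rho) * max(abs(rho) - alpha * n_samples, 0.0)
                    / col_sq_norms[j])
            residual -= x_j * w[j]  # add feature j back with its new weight
    return w


def as_readonly_memmap(arr, path):
    """Dump arr to path and reopen it as a read-only np.memmap."""
    writer = np.memmap(path, dtype=arr.dtype, mode="w+", shape=arr.shape)
    writer[:] = arr
    writer.flush()
    del writer
    return np.memmap(path, dtype=arr.dtype, mode="r", shape=arr.shape)


rng = np.random.RandomState(0)
X = rng.randn(30, 5)
y = X.dot(np.array([1.0, 0.0, -2.0, 0.0, 0.5])) + 0.01 * rng.randn(30)

tmp_dir = tempfile.mkdtemp()
X_mm = as_readonly_memmap(X, os.path.join(tmp_dir, "X.dat"))
y_mm = as_readonly_memmap(y, os.path.join(tmp_dir, "y.dat"))

w_mem = lasso_cd_reference(X, y, alpha=0.1)
w_mm = lasso_cd_reference(X_mm, y_mm, alpha=0.1)
np.testing.assert_allclose(w_mm, w_mem)
```

Calling the solver directly avoids any copy made by input validation, so the read-only buffer really reaches the code under test.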

ogrisel · 2015-07-21T19:20:09Z

Can you please squash those commits together?

ogrisel · 2015-07-21T19:22:44Z

sklearn/utils/testing.py

+    Copy from joblib.pool (for independance)"""
+    try:
+        if os.path.exists(folder_path):
+            shutil.rmtree(folder_path)  # This can fail under windows, but will succeed when called by atexit
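For context, the cleanup pattern hinted at in this excerpt could look roughly like the sketch below. The rest of the helper is not visible in the excerpt, so the `except` branch, the warning message and the folder prefix are assumptions made for illustration only:

```python
import atexit
import os
import shutil
import tempfile
import warnings


def delete_folder(folder_path):
    """Remove folder_path if it exists, warning instead of raising on failure."""
    try:
        if os.path.exists(folder_path):
            shutil.rmtree(folder_path)
    except OSError as exc:
        # Assumed behaviour: report the failure and carry on.
        warnings.warn("Could not delete temporary folder %s: %s"
                      % (folder_path, exc))


# Create a scratch folder for memory-mapped test data and schedule its
# removal at interpreter exit, when no file handle should remain open.
temp_folder = tempfile.mkdtemp(prefix="readonly_test_")
atexit.register(delete_folder, temp_folder)
```

Registering the helper with atexit means a failed early deletion gets a second chance once the process is shutting down.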


arthurmensch · 2015-07-23T13:06:02Z

Done

ogrisel · 2015-07-23T14:08:54Z

Please squash commits that have trivial commit messages such as "Fix".

amueller · 2015-07-30T22:46:53Z

examples/decomposition/plot_image_denoising.py

@@ -74,7 +73,7 @@

 print('Learning the dictionary...')
 t0 = time()
-dico = MiniBatchDictionaryLearning(n_components=100, alpha=1, n_iter=500)
+dico = MiniBatchDictionaryLearning(n_components=100, alpha=1, n_iter=500, batch_size=100, n_jobs=4)


That will break on windows, right? Let's not do this.

Sorry, I left it in; I used it for tests. I don't understand why it would break on Windows though, could you explain?

amueller · 2015-07-30T22:49:36Z

LGTM apart from minor comments.

amueller · 2015-07-31T14:40:25Z

As far as I know, using multiprocessing requires you to wrap your call in an if __name__ == "__main__" block, as the file is repeatedly imported. Otherwise you get infinite recursion. As we don't do that for examples, we can't use multiprocessing. (@ogrisel correct me if I'm wrong)
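A minimal sketch of the guard being described, using the standard library only (the `square` and `main` names are illustrative). On platforms that start workers by launching a fresh interpreter, such as Windows, the main module is imported again in each worker, so any pool creation at module level would run again there:

```python
import multiprocessing


def square(x):
    """Toy task executed in the worker processes."""
    return x * x


def main():
    pool = multiprocessing.Pool(processes=2)
    try:
        results = pool.map(square, range(5))
    finally:
        pool.close()
        pool.join()
    print(results)


# Without this guard, re-importing the module in a worker would try to
# create a new pool, which in turn would start more workers.
if __name__ == "__main__":
    main()
```

The example scripts are plain top-level code without such a guard, which is the reason given above for keeping n_jobs out of them.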

amueller · 2015-07-31T14:42:57Z

Travis test failure from cec3bf9. Merging.


ogrisel reviewed May 27, 2015

ogrisel changed the title ~~cd_fast.pyx : changed ctype to np.ndarray to avoid cython read only error~~ [WIP] cd_fast.pyx : changed ctype to np.ndarray to avoid cython read only error May 27, 2015

amueller reviewed May 27, 2015

arthurmensch changed the title ~~[WIP] cd_fast.pyx : changed ctype to np.ndarray to avoid cython read only error~~ [MRG] cd_fast.pyx : changed ctype to np.ndarray to avoid cython read only error May 27, 2015

ogrisel reviewed May 28, 2015

arthurmensch changed the title ~~[MRG] cd_fast.pyx : changed ctype to np.ndarray to avoid cython read only error~~ [WIP] cd_fast.pyx : changed ctype to np.ndarray to avoid cython read only error May 28, 2015

arthurmensch force-pushed the cd_fast_readonly_array_brainfix branch 2 times, most recently from 73c98dd to 6094264 Compare May 29, 2015 08:09

arthurmensch changed the title ~~[WIP] cd_fast.pyx : changed ctype to np.ndarray to avoid cython read only error~~ [MRG] cd_fast.pyx : changed ctype to np.ndarray to avoid cython read only error May 29, 2015

ogrisel changed the title ~~[MRG] cd_fast.pyx : changed ctype to np.ndarray to avoid cython read only error~~ [MRG+1] cd_fast.pyx : changed ctype to np.ndarray to avoid cython read only error May 29, 2015

GaelVaroquaux reviewed May 30, 2015

arthurmensch force-pushed the cd_fast_readonly_array_brainfix branch from 976e28e to bc33901 Compare June 24, 2015 13:53

arthurmensch changed the title ~~[MRG+1] cd_fast.pyx : changed ctype to np.ndarray to avoid cython read only error~~ [MRG+1] Read only data compatibility for Lasso Jul 2, 2015

arthurmensch changed the title ~~[MRG+1] Read only data compatibility for Lasso~~ [MRG+1] Read-only data compatibility for Lasso Jul 2, 2015

arthurmensch mentioned this pull request Jul 8, 2015

Simple L1 dictionary learning nilearn/nilearn#638

Closed

1 task

ogrisel reviewed Jul 21, 2015

arthurmensch force-pushed the cd_fast_readonly_array_brainfix branch from 6869512 to caec867 Compare July 23, 2015 13:05

arthurmensch force-pushed the cd_fast_readonly_array_brainfix branch 4 times, most recently from 9683cef to 9a55eeb Compare July 27, 2015 13:02

amueller reviewed Jul 30, 2015

arthurmensch force-pushed the cd_fast_readonly_array_brainfix branch 3 times, most recently from 8dc0aa7 to e3435a1 Compare July 31, 2015 10:10

Bugfix : type in cd changed for read only memmap compatibility

fead69a

arthurmensch force-pushed the cd_fast_readonly_array_brainfix branch from e3435a1 to fead69a Compare July 31, 2015 10:12

amueller added a commit that referenced this pull request Jul 31, 2015

Merge pull request #4775 from arthurmensch/cd_fast_readonly_array_bra…

77ecf16

…infix [MRG+1] Read-only data compatibility for Lasso

amueller merged commit 77ecf16 into scikit-learn:master Jul 31, 2015

lesteve mentioned this pull request Dec 5, 2016

lasso_cd with multiprocessing fails on large dataset #4772

Closed

lesteve mentioned this pull request Mar 7, 2018

MAINT Use Cython memoryviews #10624

Closed
