Skip to content

MNT Replaced np.ndarray with memview where applicable in linear_model/_cd_fast.pyx #23147

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 28 commits into from
Jul 7, 2022

Conversation

Micky774
Copy link
Contributor

@Micky774 Micky774 commented Apr 17, 2022

Reference Issues/PRs

Addresses #10624

What does this implement/fix? Explain your changes.

Replaces np.ndarray typing with corresponding memviews in sparse_enet_coordinate_descent and enet_coordinate_descent_multi_task.

Any other comments?

Would it be worth disabling boundscheck for any of these functions? Afaik there's not much of a risk of an out-of-bounds access in these functions.

@Micky774 Micky774 changed the title MAINT Replaced np.ndarray with memview where applicable in linear_model/_cd_fast.pyx MNT Replaced np.ndarray with memview where applicable in linear_model/_cd_fast.pyx May 25, 2022
Copy link
Member

@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we also need to add the wrapper when in the calls of enet_coordinate_descent and enet_coordinate_descent_multitask since we currently have the same error when passing a dense read-only array. The test then can be extended to cover the dense case

Comment on lines 385 to 390
def make_read_only(obj, attr):
curr_attr = getattr(obj, attr)
RO_attr = create_memmap_backed_data(curr_attr)
setattr(obj, attr, RO_attr)

make_read_only(X, "data")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you only make "data" read-only, I think the helper is no longer necessary and you can write it as flat code

@Micky774
Copy link
Contributor Author

Micky774 commented Jun 6, 2022

I think we also need to add the wrapper when in the calls of enet_coordinate_descent and enet_coordinate_descent_multitask since we currently have the same error when passing a dense read-only array. The test then can be extended to cover the dense case

For the dense case, don't we modify X in-place in _preprocess_data? Do we still support read-only inputs in that case?

@jeremiedbb
Copy link
Member

For the dense case, don't we modify X in-place in _preprocess_data? Do we still support read-only inputs in that case?

right, my bad. We do support read-only, but only if copy=True.

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am overall +0.5 on this PR. I see ReadonlyArrayWrapper as a quick hack, so that we can use memoryviews in sklearn/_loss/_loss.pyx.tp. I prefer not to extend to use ReadonlyArrayWrapper too much in the library.

As for this PR, I think the tests are worth adding.

Copy link
Member

@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @Micky774.

I am also not a fan of ReadonlyArrayWrappers, but if this adds support for read-only datasets, this LTGTM.

To answer your question:

Would it be worth disabling boundscheck for any of these functions? Afaik there's not much of a risk of an out-of-bounds access in these functions.

#21512 centralised all the cython compiler directives. By default, all the checks are deactivated:

# Additional checks for Cython
cython_enable_debug_directives = (
os.environ.get("SKLEARN_ENABLE_DEBUG_CYTHON_DIRECTIVES", "0") != "0"
)
config.ext_modules = cythonize(
config.ext_modules,
nthreads=n_jobs,
compile_time_env={
"SKLEARN_OPENMP_PARALLELISM_ENABLED": sklearn._OPENMP_SUPPORTED
},
compiler_directives={
"language_level": 3,
"boundscheck": cython_enable_debug_directives,
"wraparound": False,
"initializedcheck": False,
"nonecheck": False,
"cdivision": True,
},
)

@jjerphan jjerphan merged commit 3bb4bad into scikit-learn:main Jul 7, 2022
@Micky774 Micky774 deleted the memview_refactor branch July 7, 2022 16:16
ogrisel pushed a commit to ogrisel/scikit-learn that referenced this pull request Jul 11, 2022
…del/_cd_fast.pyx` (scikit-learn#23147)

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
jjerphan added a commit to jjerphan/scikit-learn that referenced this pull request Dec 12, 2022
jjerphan added a commit to jjerphan/scikit-learn that referenced this pull request Jan 3, 2023
jjerphan added a commit to jjerphan/scikit-learn that referenced this pull request Jan 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants