MAINT Remove -Wcpp warnings when compiling sklearn.cluster._hierarchical_fast #24914

OmarManzoor · 2022-11-14T12:34:01Z

Reference Issues/PRs

Towards #24875

What does this implement/fix? Explain your changes.

Remove -Wcpp warnings when compiling sklearn.cluster._hierarchical_fast. The cnp.import_array() statement is kept, however.

Any other comments?


jjerphan

Thank you once again, @OmarManzoor!

Some more hints for this PR before approval.


jjerphan

LGTM. Thank you, @OmarManzoor!

glemaitre · 2022-11-17T10:11:18Z

sklearn/cluster/_hierarchical_fast.pyx

+    cnp.float64_t[::1] m_1,
+    cnp.float64_t[:, ::1] m_2,
+    cnp.npy_intp[::1] coord_row,
+    cnp.npy_intp[::1] coord_col,


Those could be declared as const, couldn't they?

Suggested change

-    cnp.float64_t[::1] m_1,
-    cnp.float64_t[:, ::1] m_2,
-    cnp.npy_intp[::1] coord_row,
-    cnp.npy_intp[::1] coord_col,
+    const cnp.float64_t[::1] m_1,
+    const cnp.float64_t[:, ::1] m_2,
+    const cnp.npy_intp[::1] coord_row,
+    const cnp.npy_intp[::1] coord_col,
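One practical reason for `const`, which the review does not spell out: Cython refuses to bind a non-const typed memoryview to a read-only buffer and raises a ValueError, whereas a const memoryview accepts it. The sketch below is plain NumPy rather than Cython, so it only illustrates what a read-only input looks like from the caller's side. The array contents are invented for illustration; only the name `coord_row` is taken from the diff.

```python
import numpy as np

# Hypothetical input; only the name comes from the diff above.
coord_row = np.array([0, 1, 2], dtype=np.intp)
coord_row.setflags(write=False)  # mark the buffer read-only

# Writing to a read-only array fails ...
try:
    coord_row[0] = 5
    write_failed = False
except ValueError:
    write_failed = True

# ... but reading, which is all a const memoryview needs, still works.
total = int(coord_row.sum())
```

A function that only reads its inputs loses nothing by declaring them const, and gains the ability to accept such arrays.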

glemaitre · 2022-11-17T10:12:38Z

sklearn/cluster/_hierarchical_fast.pyx

+    cnp.npy_intp[::1] coord_row,
+    cnp.npy_intp[::1] coord_col,
+    cnp.float64_t[::1] res
+) nogil:
    cdef INTP size_max = coord_row.shape[0]
    cdef INTP n_features = m_2.shape[1]
    cdef INTP i, j, row, col


For those cdef declarations, we could also replace INTP and DOUBLE with cnp.npy_intp and cnp.float64_t, for consistency within the function.

glemaitre · 2022-11-17T10:16:17Z

sklearn/cluster/_hierarchical_fast.pyx

+cpdef void _get_parents(
+    nodes,
+    heads,
+    cnp.npy_intp[:] parents,


Suggested change

-    cnp.npy_intp[:] parents,
+    const cnp.npy_intp[:] parents,


glemaitre · 2022-11-17T10:16:52Z

sklearn/cluster/_hierarchical_fast.pyx

@@ -95,7 +96,7 @@ def _hc_get_descendent(INTP node, children, INTP n_leaves):
    return descendent


-def hc_get_heads(cnp.ndarray[INTP, ndim=1] parents, copy=True):
+def hc_get_heads(cnp.npy_intp[:] parents, copy=True):
    """Returns the heads of the forest, as defined by parents.

    Parameters


We can also replace the INTP usages below.

glemaitre · 2022-11-17T10:18:23Z

sklearn/cluster/_hierarchical_fast.pyx

+def average_merge(
+    IntFloatDict a,
+    IntFloatDict b,
+    cnp.intp_t[:] mask,


Suggested change

-    cnp.intp_t[:] mask,
+    const cnp.intp_t[:] mask,

glemaitre · 2022-11-17T10:19:35Z

sklearn/cluster/_hierarchical_fast.pyx

+def max_merge(
+    IntFloatDict a,
+    IntFloatDict b,
+    cnp.intp_t[:] mask,


Suggested change

-    cnp.intp_t[:] mask,
+    const cnp.intp_t[:] mask,

glemaitre · 2022-11-17T10:31:01Z

sklearn/cluster/_hierarchical_fast.pyx

@@ -354,8 +366,7 @@ cdef class UnionFind(object):
        return n


-cpdef cnp.ndarray[DTYPE_t, ndim=2] _single_linkage_label(
-    cnp.ndarray[DTYPE_t, ndim=2] L):
+def _single_linkage_label(cnp.float64_t[:, :] L):


What is the reason for using def and not cpdef with output type?

Suggested change

-def _single_linkage_label(cnp.float64_t[:, :] L):
+cpdef DTYPE_t[:, :] _single_linkage_label(const cnp.float64_t[:, :] L):

L also seems to be constant.

I think this forces the output to be a memoryview, whereas we want to return the actual array. The calling code then fails with AttributeError: 'sklearn.cluster._hierarchical_fast._memoryviewslic' object has no attribute 'astype'.

OK I see. @jjerphan What is the Cython API for declaring that you want to return a NumPy array rather than a memoryview?

One can use cnp.ndarray, but it is better not to depend on this interface if we do not need to.

For functions that are only called from Python code, there is little value in declaring a return type, because everything is seen as an object in Python. This is the case here.
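The AttributeError discussed above can be reproduced in spirit without compiling any Cython. The sketch below uses Python's built-in memoryview as a stand-in for a Cython typed memoryview, which is an assumption made for illustration. The two are different types, but neither provides ndarray methods such as astype. The `labels` array is hypothetical.

```python
import numpy as np

# Hypothetical result buffer standing in for the function's output.
labels = np.zeros((3, 4), dtype=np.float64)

# Stand-in for a function that returns a memoryview instead of an ndarray.
view = memoryview(labels)

# A memoryview exposes the buffer but none of the ndarray methods.
has_astype = hasattr(view, "astype")

# Wrapping it with np.asarray gives callers a real ndarray again.
as_array = np.asarray(view)
converted = as_array.astype(np.intp)
```

This is why a Python-facing function is usually better off returning an ndarray: callers keep access to the full NumPy API.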


jjerphan · 2022-11-18T07:36:07Z

Is it possible to remove the aliases defined in this file, along with their usages, i.e.:

scikit-learn/sklearn/cluster/_hierarchical_fast.pyx

Lines 7 to 9 in 6dd1728

    
    ctypedef cnp.float64_t DOUBLE
    ctypedef cnp.npy_intp INTP
    ctypedef cnp.int8_t INT8

Thank you!


thomasjpfan

Minor nit, otherwise LGTM.

thomasjpfan · 2022-11-22T21:31:06Z

sklearn/cluster/_hierarchical_fast.pyx


        U.union(left_cluster, right_cluster)

-    return result_arr
+    return result_arr.base


Nit: I think using np.asarray is clearer here:

Suggested change

-    return result_arr.base
+    return np.asarray(result_arr)

Note that np.asarray does not make a copy because it uses the dtype and order from the memoryview.

Won't this still be an extra method call as compared to directly accessing the base attribute?

For future contributors, I think using base will be harder to reason about. If result_arr was a view into another array, then base can point to a bigger base array. For example:

    %%cython
    import numpy as np

    def func_base(double[:, :] X):
        cdef:
            double[:] X_small = X[0]
        return np.asarray(X_small), X_small.base

    import numpy as np

    X = np.asarray([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]], dtype=np.float64)
    X_asarray, X_base = func_base(X)

    X_asarray
    # array([1., 2., 3.])

    X_base
    # array([[1., 2., 3.],
    #        [3., 4., 5.]])

For this PR, to be sure that np.asarray(result_arr) == result_arr.base one needs to look through the entire function to confirm that result_arr is not a view into another memoryview. Although the function is not long, using result_arr.base does add more cognitive overhead compared to returning np.asarray(result_arr).

Secondly, I think the function call overhead of asarray is small compared to the function's actual computation. In these situations, I favor code that is easier to digest by using np.asarray.
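Both points in this comment (np.asarray does not copy, and base may refer to a larger array) can be checked in plain NumPy without compiling anything. The sketch below is an analogy under stated assumptions: it uses Python's built-in memoryview and an ordinary NumPy slice in place of Cython typed memoryviews, and the variable names are illustrative.

```python
import numpy as np

X = np.array([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]], dtype=np.float64)

# np.asarray over a buffer wraps the existing memory instead of copying it.
row = np.asarray(memoryview(X[0]))
shares = np.shares_memory(row, X)

# Writing through the wrapped row is therefore visible in X.
row[0] = -1.0

# The .base of a slice is the whole parent array, not just the selected row.
slice_base_is_parent = X[0].base is X
```

The last check is the pitfall described above: returning base is only equivalent to np.asarray when the memoryview is known not to be a slice of something larger.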

Thank you for the nice explanation. Looking at the provided example, I agree that it seems more reasonable to use np.asarray here.


jjerphan · 2022-11-28T09:05:23Z

Thank you, @OmarManzoor.

I also approve @thomasjpfan's comment (i.e. #24914 (comment)) regarding using np.asarray over the memoryview's base attribute.

…cal_fast (scikit-learn#24914) Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>

MAINT Remove -Wcpp warnings when compiling sklearn.cluster._hierarchical_fast

bb9ec11

github-actions bot added module:cluster cython labels Nov 14, 2022

Merge remote-tracking branch 'upstream/main' into cython_hierarchical_fast

f183b42

jjerphan reviewed Nov 14, 2022


OmarManzoor added 4 commits November 15, 2022 14:07

Merge remote-tracking branch 'upstream/main' into cython_hierarchical_fast

c4376ef

Improvements in compute_ward_dist

faaf0f7

Addressed PR improvements

661777a

Merge remote-tracking branch 'upstream/main' into cython_hierarchical_fast

d17d03c

jjerphan approved these changes Nov 15, 2022


Merge branch 'main' into cython_hierarchical_fast

913f2a4

glemaitre self-requested a review November 17, 2022 10:01

glemaitre reviewed Nov 17, 2022


OmarManzoor and others added 4 commits November 17, 2022 16:18

Addressed PR suggestions

e8a245b

Merge branch 'cython_hierarchical_fast' of https://github.com/OmarManzoor/scikit-learn into cython_hierarchical_fast

f702547

Merge remote-tracking branch 'upstream/main' into cython_hierarchical_fast

b1db35a

Merge branch 'main' into cython_hierarchical_fast

6dd1728

OmarManzoor added 2 commits November 18, 2022 12:41

Remove aliases

ec48a91

Merge branch 'cython_hierarchical_fast' of https://github.com/OmarManzoor/scikit-learn into cython_hierarchical_fast

fb5a085

jjerphan reviewed Nov 18, 2022


PR suggestion

8083d81

jeremiedbb reviewed Nov 18, 2022


Use def instead of cpdef for functions only used inside python

c856fb2

thomasjpfan approved these changes Nov 22, 2022


OmarManzoor added 3 commits November 23, 2022 06:35

Merge main into branch

8f0a5bf

Merge remote-tracking branch 'upstream/main' into cython_hierarchical_fast

c5ae2d8

Use np.asarray in place of the base attribute in _single_linkage_label

2eda8ff

jjerphan merged commit f9453d5 into scikit-learn:main Nov 28, 2022

OmarManzoor deleted the cython_hierarchical_fast branch November 28, 2022 11:44

jjerphan mentioned this pull request Nov 30, 2022

MAINT Remove all Cython, C and C++ compilations warnings #24875

Closed

22 tasks
