Migrate WeightVector to use tempita #13358


Closed

Conversation

massich
Contributor

@massich massich commented Mar 1, 2019

WeightVector is used in #13346 and has attributes that cannot be fused.

cdef np.ndarray w
cdef np.ndarray aw
cdef double *w_data_ptr
cdef double *aw_data_ptr
cdef double wscale
cdef double average_a
cdef double average_b
cdef int n_features
cdef double sq_norm

This PR uses Tempita to support both float32 and float64.

cross ref: #11000
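To illustrate the approach (a simplified sketch, not the exact PR code — class and attribute names are abbreviated): because `cdef` class attributes cannot use Cython fused types, Tempita instead expands a loop over (suffix, C type) pairs into one concrete class per dtype at build time. The plain-Python snippet below mimics what that expansion produces:

```python
# Simplified illustration of what a Tempita loop over dtypes generates.
# The real template lives in a .pyx.tp file; this just mimics the expansion.
dtypes = [('64', 'double'), ('32', 'float')]

template = """cdef class WeightVector{suffix}:
    cdef {c_type} wscale
    cdef {c_type} sq_norm
"""

generated = "\n".join(template.format(suffix=suffix, c_type=c_type)
                      for suffix, c_type in dtypes)
print(generated)
```

The result is two independent extension classes (`WeightVector64`, `WeightVector32`) rather than one class parameterized by a fused type.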

@massich
Contributor Author

massich commented Mar 1, 2019

num_iter is a double; I think it should be an int. For now I've left it templated as double.

cdef void add_average(self, double *x_data_ptr, int *x_ind_ptr,
int xnnz, double c, double num_iter) nogil

EDIT: num_iter comes from (t - average + 1), so it's not an integer but something else

@@ -22,7 +22,7 @@ from numpy.math cimport INFINITY
 cdef extern from "sgd_fast_helpers.h":
     bint skl_isfinite(double) nogil

-from sklearn.utils.weight_vector cimport WeightVector
+from sklearn.utils.weight_vector cimport WeightVector64 as WeightVector
Contributor Author


This should not be here but in the template. Maybe we have some sort of clash. I need to get back to that

Member


unless you also template sgd_fast, you'll have to deal with both, and switch with

if floating is float:
    do something with WeightVector32
else:
    do something with WeightVector64
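A hypothetical sketch of that compile-time switch (the helper names are illustrative, not from the PR): `floating` is Cython's built-in fused type covering `float` and `double`, and the `floating is float` test is resolved at compile time, so each generated specialization keeps only one branch.

```cython
# Sketch: branching on Cython's built-in fused type `floating`.
from cython cimport floating

cdef void step(floating x):
    if floating is float:
        handle32(x)   # hypothetical helper using WeightVector32
    else:
        handle64(x)   # hypothetical helper using WeightVector64
```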

Member

@jeremiedbb jeremiedbb Mar 4, 2019


Also, this diff doesn't belong in this PR.

@massich massich marked this pull request as ready for review March 1, 2019 16:59
@massich
Contributor Author

massich commented Mar 1, 2019

ping: @NicolasHug can you review as well? Thanks!

Member

@jeremiedbb jeremiedbb left a comment


LGTM


Member

@jnothman jnothman left a comment


Not sure about the import, but this LGTM.

Member

@NicolasHug NicolasHug left a comment


Nitpicks + questions.

Also:

  • is WeightVector32 (going to be) used somewhere else?
  • it'd be nice to have tests ensuring that the .pyx and .pxd files generated by tempita are git-ignored, or at least that we can't modify them accidentally. As @ogrisel noted, these tests should also pass (or be skipped) on pre-compiled wheels, where the Cython sources aren't available.

"""The L2 norm of the weight vector. """
return sqrt(self.sq_norm)

{{endfor}}
Member


Missing newline

cdef void reset_wscale(self) nogil
cdef {{c_type}} norm(self) nogil

{{endfor}}
Member


missing newline

dtypes = [('64', 'double'),
          ('32', 'float')]

def get_dispatch(dtypes):
Member


Sorry, I'm not familiar with tempita (and the tempita doc links are broken): why do you need get_dispatch? Can't you simply iterate over the dtypes list?
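For context (a hypothetical illustration, in plain Python, of the direct iteration the reviewer suggests — Tempita's `{{for}}` can consume a plain list, so a helper would only be needed to derive extra per-dtype values):

```python
# Iterating the dtypes list directly, without a get_dispatch helper:
# each (suffix, c_type) pair yields one specialized class name.
dtypes = [('64', 'double'), ('32', 'float')]

names = [f"WeightVector{suffix}" for suffix, _ in dtypes]
print(names)  # ['WeightVector64', 'WeightVector32']
```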

@glemaitre
Member

@massich Could you resolve the conflicts and address the comments?

@mbatoul
Contributor

mbatoul commented Jul 7, 2021

Hi @massich,

I saw that your PR has stalled.

Are you still interested in continuing this work?

@mbatoul
Contributor

mbatoul commented Jul 7, 2021

Hi @jeremiedbb @NicolasHug,

I started a PR #20481 to continue @massich's work.

I will address your comments there.

@cmarmo cmarmo added the Superseded label (PR has been replaced by a newer PR) and removed the help wanted label Jul 13, 2021
9 participants