[MRG] Fused type makedataset #9040
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,85 @@ | ||
{{py: | ||
|
||
""" | ||
Dataset abstractions for sequential data access. | ||
|
||
Template file to easily generate fused-types-consistent code using Tempita | ||
(https://github.com/cython/cython/blob/master/Cython/Tempita/_tempita.py). | ||
|
||
Generated file: seq_dataset.pxd | ||
|
||
Each class is duplicated for all dtypes (float and double). The keywords | ||
between double braces are substituted in setup.py. | ||
""" | ||
|
||
# name, c_type | ||
dtypes = [('', 'double'), | ||
('32', 'float')] | ||
Review comment: It is much clearer if we use: dtypes = [('64', 'double'), ('32', 'float')]. I found an error complaining about
Reply: To avoid refactoring. But I don't think it is a good idea. |
||
|
||
def get_dispatch(dtypes): | ||
for name, c_type in dtypes: | ||
yield name, c_type | ||
|
||
}} | ||
|
||
{{for name, c_type in get_dispatch(dtypes)}} | ||
|
||
#------------------------------------------------------------------------------ | ||
|
||
""" | ||
Dataset abstractions for sequential data access. | ||
WARNING: Do not edit .pxd file directly, it is generated from .pxd.tp | ||
""" | ||
|
||
cimport numpy as np | ||
|
||
# SequentialDataset and its two concrete subclasses are (optionally randomized) | ||
# iterators over the rows of a matrix X and corresponding target values y. | ||
|
||
|
||
cdef class SequentialDataset{{name}}: | ||
cdef int current_index | ||
cdef np.ndarray index | ||
cdef int *index_data_ptr | ||
cdef Py_ssize_t n_samples | ||
cdef np.uint32_t seed | ||
|
||
cdef void shuffle(self, np.uint32_t seed) nogil | ||
cdef int _get_next_index(self) nogil | ||
cdef int _get_random_index(self) nogil | ||
|
||
cdef void _sample(self, {{c_type}} **x_data_ptr, int **x_ind_ptr, | ||
int *nnz, {{c_type}} *y, {{c_type}} *sample_weight, | ||
int current_index) nogil | ||
cdef void next(self, {{c_type}} **x_data_ptr, int **x_ind_ptr, | ||
int *nnz, {{c_type}} *y, {{c_type}} *sample_weight) nogil | ||
cdef int random(self, {{c_type}} **x_data_ptr, int **x_ind_ptr, | ||
int *nnz, {{c_type}} *y, {{c_type}} *sample_weight) nogil | ||
|
||
|
||
cdef class ArrayDataset{{name}}(SequentialDataset{{name}}): | ||
cdef np.ndarray X | ||
cdef np.ndarray Y | ||
cdef np.ndarray sample_weights | ||
cdef Py_ssize_t n_features | ||
cdef np.npy_intp X_stride | ||
cdef {{c_type}} *X_data_ptr | ||
cdef {{c_type}} *Y_data_ptr | ||
cdef np.ndarray feature_indices | ||
cdef int *feature_indices_ptr | ||
cdef {{c_type}} *sample_weight_data | ||
|
||
|
||
cdef class CSRDataset{{name}}(SequentialDataset{{name}}): | ||
cdef np.ndarray X_data | ||
cdef np.ndarray X_indptr | ||
cdef np.ndarray X_indices | ||
cdef np.ndarray Y | ||
cdef np.ndarray sample_weights | ||
cdef {{c_type}} *X_data_ptr | ||
cdef int *X_indptr_ptr | ||
cdef int *X_indices_ptr | ||
cdef {{c_type}} *Y_data_ptr | ||
cdef {{c_type}} *sample_weight_data | ||
|
||
{{endfor}} |
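To make the expansion concrete, here is a minimal sketch of what the `{{for ...}}` loop over `dtypes` produces. It is a stand-in that uses plain `str.replace`, not Tempita itself, and the shortened `TEMPLATE` string and the `expand` helper are hypothetical names introduced only for illustration.

```python
# Stand-in for the Tempita loop in the template above.
# Assumption: plain string replacement is enough to illustrate the idea;
# the real build step uses Tempita, not this helper.

# (name suffix, C type), as in the template's py: block
dtypes = [("", "double"), ("32", "float")]

# Hypothetical, shortened excerpt of the template body
TEMPLATE = """\
cdef class SequentialDataset{{name}}:
    cdef void next(self, {{c_type}} **x_data_ptr, int **x_ind_ptr,
                   int *nnz, {{c_type}} *y, {{c_type}} *sample_weight) nogil
"""


def expand(template, dtypes):
    """Return one copy of the template per (name, c_type) pair."""
    chunks = []
    for name, c_type in dtypes:
        chunk = template.replace("{{name}}", name)
        chunk = chunk.replace("{{c_type}}", c_type)
        chunks.append(chunk)
    return "\n".join(chunks)


generated = expand(TEMPLATE, dtypes)
print(generated)
```

With the empty suffix for `double`, the double-precision class keeps the unsuffixed name `SequentialDataset`, which matches the "avoid refactoring" rationale given in the reply above. Switching the pair to `('64', 'double')` would instead generate `SequentialDataset64`.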
The codecov Chrome/Firefox extension tells me that those lines are not covered. Please add a test for that case (and install the codecov browser extension ;).
This test should probably fit an SGDClassifier on 32-bit float iris and check that the coef_ attribute is 32-bit float as well (and that decision_function also outputs a float32 array).

Currently, there isn't any specific test for make_dataset. It's supposed to be covered by the tests for the sag solver. So we can either defer the test to a later PR about the sag solver (#9020) or add the test you are proposing in this PR. What do you prefer?
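
The dtype check proposed in the review could be sketched roughly as follows. This is an illustration rather than the test that was eventually added: the helper name `fitted_dtypes` and the hyperparameters are assumptions, and whether float32 is actually preserved depends on the scikit-learn version in use, so the helper only reports the dtypes instead of asserting them.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier


def fitted_dtypes(dtype):
    """Fit SGDClassifier on iris cast to `dtype`.

    Returns the dtypes of coef_ and of the decision_function output.
    Hypothetical helper; hyperparameters are arbitrary choices.
    """
    X, y = load_iris(return_X_y=True)
    X = X.astype(dtype)
    clf = SGDClassifier(max_iter=1000, tol=1e-3, random_state=0)
    clf.fit(X, y)
    return clf.coef_.dtype, clf.decision_function(X).dtype


# Report what the installed version does for each input dtype.
for dtype in (np.float32, np.float64):
    coef_dtype, decision_dtype = fitted_dtypes(dtype)
    print(np.dtype(dtype).name, "->", coef_dtype, decision_dtype)
```

A real test would turn the float32 case into assertions, for example that both returned dtypes equal `np.float32`, once the fused-type code path is in place.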