
LogisticRegression convert to float64 (for SAG solver) #13243


Merged — 31 commits merged into scikit-learn:master from massich:henley_is_8769_minus_merged on Feb 27, 2019

Conversation

@massich (Contributor) commented Feb 25, 2019

Reference Issues/PRs

Works on #8769 and #11000 (for the SAG solver). Fixes #9040.

Closes #11155 (takes over that PR).

Joan Massich and others added 13 commits July 8, 2018 15:46:

- initial PR commit
- seq_dataset.pyx generated from template
- seq_dataset.pyx generated from template #2
- rename variables
- fused types consistency test for seq_dataset
- a
- sklearn/utils/tests/test_seq_dataset.py
- new if statement
- add doc
- sklearn/utils/seq_dataset.pyx.tp
- minor changes
- minor changes
- typo fix
- check numeric accuracy only up 5th decimal
- Address oliver's request for changing test name
- add test for make_dataset and rename a variable in test_seq_dataset
@massich changed the title from "Henley is 8769 minus merged" to "LogisticRegression convert to float64 (for SAG solver)" on Feb 25, 2019
```diff
@@ -245,8 +245,9 @@ def sag_solver(X, y, sample_weight=None, loss='log', alpha=1., beta=0.,
         max_iter = 1000

     if check_input:
-        X = check_array(X, dtype=np.float64, accept_sparse='csr', order='C')
-        y = check_array(y, dtype=np.float64, ensure_2d=False, order='C')
+        _dtype = [np.float64, np.float32]
+        X = check_array(X, dtype=_dtype, accept_sparse='csr', order='C')
+        y = check_array(y, dtype=_dtype, ensure_2d=False, order='C')
```
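For context on why this change preserves float32: when `check_array` receives a list of dtypes, it keeps the input's dtype if it is in the list and otherwise converts to the first entry. A minimal sketch using the public `sklearn.utils.check_array` (the arrays here are illustrative):

```python
import numpy as np
from sklearn.utils import check_array

_dtype = [np.float64, np.float32]

X32 = np.ones((3, 2), dtype=np.float32)
X_int = np.ones((3, 2), dtype=np.int64)

# float32 is in the allowed list, so the input dtype is preserved
assert check_array(X32, dtype=_dtype).dtype == np.float32

# int64 is not in the list, so the array is cast to the first entry, float64
assert check_array(X_int, dtype=_dtype).dtype == np.float64
```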
@massich (Contributor, Author) commented Feb 25, 2019:

@lesteve said (https://github.com/scikit-learn/scikit-learn/pull/11155/files#r192674850):

> This does not seem to be covered by any of the tests.

@glemaitre responded:

> I see a pattern that we have applied sometimes: the solvers always check the inputs while the class level does not (e.g. KMeans).
>
> Is this something we want to extend here, or do we leave these lines uncovered, or remove them?

@massich (Contributor, Author) replied:

I'm not really sure what you were pointing at. `_dtype=[bla.]` looks fine to me.

@massich force-pushed the henley_is_8769_minus_merged branch from 151fb1c to 5eb99b8 on February 25, 2019 15:35
```python
dataset2 = CSRDataset(X_csr.data, X_csr.indptr, X_csr.indices,
                      y, sample_weight, seed=42)

X64 = iris.data.astype(np.float64)
```
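As a side note for readers unfamiliar with the `CSRDataset` constructor used above: the three index arrays come straight from SciPy's CSR representation. A small illustrative sketch (assuming the same iris data as in the test):

```python
import numpy as np
import scipy.sparse as sp
from sklearn.datasets import load_iris

X64 = load_iris().data.astype(np.float64)
X_csr = sp.csr_matrix(X64)

print(X_csr.data.dtype)     # non-zero values, float64 here
print(X_csr.indices.dtype)  # column index of each stored value
print(X_csr.indptr.dtype)   # offsets where each row starts in data/indices
```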
A Member commented:
Could create a pytest fixture which returns either 64-bit or 32-bit datasets. It can return X, X_sparse, y, sample_weight.

We can then use this fixture in parametrize (see the sketch below).
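A minimal sketch of what such a fixture could look like (the fixture and test names are hypothetical; assumes the iris data used elsewhere in this test module):

```python
import numpy as np
import pytest
import scipy.sparse as sp
from sklearn.datasets import load_iris

iris = load_iris()


@pytest.fixture(params=[np.float32, np.float64])
def iris_data(request):
    # Returns one (X, X_sparse, y, sample_weight) tuple per dtype;
    # every test using this fixture runs once for 32 and once for 64 bit.
    dtype = request.param
    X = iris.data.astype(dtype)
    y = iris.target.astype(dtype)
    sample_weight = np.ones(X.shape[0], dtype=dtype)
    return X, sp.csr_matrix(X), y, sample_weight


def test_dtypes_are_consistent(iris_data):
    X, X_sparse, y, sample_weight = iris_data
    assert X.dtype == X_sparse.dtype == y.dtype == sample_weight.dtype
```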

@glemaitre (Member) commented:

The linter is not happy.

@massich (Contributor, Author) commented Feb 26, 2019:

The last commit should fix the linter.

I tried to call the constructors with named parameters to reduce verbosity, but when I do so I get `__cinit__` errors that I can't seem to get around.

If green, we can merge it as it is, or I could give a pass to sklearn/linear_model/tests/test_base.py.

@massich (Contributor, Author) commented Feb 26, 2019:

On second thought: since the test compares the 32-bit and 64-bit results against each other for consistency, it needs both dtypes at once, so you can't simply parametrize over dtype. We could instead define an EXPECTED_VALUES constant and compare each dtype against it (see the sketch below).
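To illustrate the trade-off (the `compute` function is a hypothetical stand-in for the code under test):

```python
import numpy as np
import pytest

X64 = np.arange(12, dtype=np.float64).reshape(4, 3)


def compute(X):
    # Hypothetical stand-in for the routine whose 32/64-bit outputs we compare.
    return X.sum(axis=0)


# Approach 1: the consistency check needs both dtypes in one test,
# so dtype cannot be a pytest parameter here.
def test_32_64_consistency():
    assert np.allclose(compute(X64.astype(np.float32)), compute(X64), rtol=1e-5)


# Approach 2: compare each dtype against precomputed expected values,
# which makes parametrizing over dtype possible again.
EXPECTED_VALUES = np.array([18.0, 22.0, 26.0])


@pytest.mark.parametrize("dtype", [np.float32, np.float64])
def test_against_expected(dtype):
    assert np.allclose(compute(X64.astype(dtype)), EXPECTED_VALUES, rtol=1e-5)
```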

```diff
@@ -137,6 +137,11 @@ Support for Python 3.4 and below has been officially dropped.
 :mod:`sklearn.linear_model`
 ...........................

+- |Enhancement| :class:`linear_model.make_dataset` now preserves
+  ``float32`` and ``float64`` dtypes. :issues:`8769` and `11000` by
```
A Contributor commented:

`11000` is missing `:issues:` in front.

@massich (Contributor, Author) replied with the corrected entry:
- |Enhancement| :class:`linear_model.make_dataset` now preserves ``float32``
  and ``float64`` dtypes. :issues:`8769` and :issues:`11000` by :user:`Nelle
  Varoquaux`_, :user:`Arthur Imbert`_, :user:`Guillaume Lemaitre <glemaitre>`,
  and :user:`Joan Massich <massich>`

@GaelVaroquaux (Member) commented:

LGTM. Merging. This is a net improvement; more improvements can follow later.

@GaelVaroquaux merged commit f02ef9f into scikit-learn:master Feb 27, 2019
@massich deleted the henley_is_8769_minus_merged branch February 27, 2019 10:45
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019:

LogisticRegression convert to float64 (for SAG solver) (#13243)

* Remove unused code

* Squash all the PR 9040 commits

initial PR commit

seq_dataset.pyx generated from template

seq_dataset.pyx generated from template #2

rename variables

fused types consistency test for seq_dataset

a

sklearn/utils/tests/test_seq_dataset.py

new if statement

add doc

sklearn/utils/seq_dataset.pyx.tp

minor changes

minor changes

typo fix

check numeric accuracy only up 5th decimal

Address oliver's request for changing test name

add test for make_dataset and rename a variable in test_seq_dataset

* FIX tests

* TST more numerically stable test_sgd.test_tol_parameter

* Added benchmarks to compare SAGA 32b and 64b

* Fixing gael's comments

* fix

* solve some issues

* PEP8

* Address lesteve comments

* fix merging

* avoid using assert_equal

* use all_close

* use explicit ArrayDataset64 and CSRDataset64

* fix: remove unused import

* Use parametrized to cover ArrayDaset-CSRDataset-32-64 matrix

* for consistency use 32 first then 64 + add 64 suffix to variables

* it would be cool if this worked !!!

* more verbose version

* revert SGD changes as much as possible.

* Add solvers back to bench_saga

* make 64 explicit in the naming

* remove checking native python type + add comparison between 32 64

* Add whatsnew with everyone with commits

* simplify a bit the testing

* simplify the parametrize

* update whatsnew

* fix pep8
xhluca pushed two further commits to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019, with the same squashed commit message as above.
6 participants