LogisticRegression convert to float64 (for SAG solver) #13243
Conversation
Squashed commits:
- initial PR commit
- seq_dataset.pyx generated from template
- seq_dataset.pyx generated from template #2
- rename variables
- fused types consistency test for seq_dataset
- a sklearn/utils/tests/test_seq_dataset.py
- new if statement
- add doc sklearn/utils/seq_dataset.pyx.tp
- minor changes
- minor changes
- typo fix
- check numeric accuracy only up 5th decimal
- Address oliver's request for changing test name
- add test for make_dataset and rename a variable in test_seq_dataset
@@ -245,8 +245,9 @@ def sag_solver(X, y, sample_weight=None, loss='log', alpha=1., beta=0.,
        max_iter = 1000

    if check_input:
        X = check_array(X, dtype=np.float64, accept_sparse='csr', order='C')
        y = check_array(y, dtype=np.float64, ensure_2d=False, order='C')
        _dtype = [np.float64, np.float32]
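The effect of passing a list of dtypes instead of a single `np.float64` can be sketched with a small numpy-only stand-in (the helper `_check_dtype` below is hypothetical, not sklearn's actual `check_array` implementation):

```python
import numpy as np

def _check_dtype(X, dtype=(np.float64, np.float32)):
    """Hypothetical stand-in for check_array's dtype-list handling:
    keep X's dtype if it is already in the accepted list, otherwise
    cast to the first (preferred) dtype."""
    if X.dtype in dtype:
        return X
    return np.ascontiguousarray(X, dtype=dtype[0])

X32 = np.ones((4, 2), dtype=np.float32)
Xint = np.arange(8).reshape(4, 2)       # integer input gets upcast

assert _check_dtype(X32).dtype == np.float32   # float32 preserved, not upcast
assert _check_dtype(Xint).dtype == np.float64  # cast to the default float64
```

With a single `dtype=np.float64`, the float32 input above would have been silently upcast, which is exactly what this PR avoids.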
@lesteve said (https://github.com/scikit-learn/scikit-learn/pull/11155/files#r192674850):
This does not seem to be covered by any of the tests.
@glemaitre responded:
I see a pattern that we have applied before: the solvers always check the inputs while the class level does not (e.g. KMeans).
Is this something we want to extend here, or should we leave these lines uncovered, or remove them?
I'm not really sure what you were pointing at; `_dtype = [...]` looks fine to me.
Force-pushed from 151fb1c to 5eb99b8.
dataset2 = CSRDataset(X_csr.data, X_csr.indptr, X_csr.indices,
                      y, sample_weight, seed=42)

X64 = iris.data.astype(np.float64)
We could create a pytest fixture which returns either 64-bit or 32-bit datasets. It can return X, X_sparse, y, sample_weight.
We can then use this fixture in parametrize.
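A minimal sketch of that idea (the helper name `make_dense_dataset` and the shapes are assumptions for illustration, not the PR's actual fixture):

```python
import numpy as np
import pytest

def make_dense_dataset(dtype, n_samples=10, n_features=4, seed=42):
    # Hypothetical helper: build X, y and sample_weight in the requested dtype.
    rng = np.random.RandomState(seed)
    X = rng.rand(n_samples, n_features).astype(dtype, copy=False)
    y = rng.rand(n_samples).astype(dtype, copy=False)
    sample_weight = np.ones(n_samples, dtype=dtype)
    return X, y, sample_weight

# The dtype then becomes a parametrize axis, so each precision
# runs as its own test case.
@pytest.mark.parametrize("dtype", [np.float32, np.float64])
def test_dataset_keeps_dtype(dtype):
    X, y, sample_weight = make_dense_dataset(dtype)
    assert X.dtype == y.dtype == sample_weight.dtype == dtype
```

A sparse variant could build `X_sparse` from the same `X` inside the helper so both paths see identical data.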
The linter is not happy.
The last commit should fix the linter. I tried to call the constructors with named parameters to avoid verbosity, but when I do so I get an error. If green, we can merge it as it is, or I could give it another pass.
On second thought: since you compare for consistency within the test, you need both the 32-bit and 64-bit variants at once, so you can't parametrize over dtype. We could always define an EXPECTED_VALUES constant and compare against that.
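The point about needing both precisions inside one test body can be illustrated with a toy consistency check (numpy-only sketch; the data and tolerance are illustrative, not the PR's actual test):

```python
import numpy as np

rng = np.random.RandomState(0)
X64 = rng.rand(20, 3)            # float64 reference data
X32 = X64.astype(np.float32)     # float32 variant of the same data

# Both variants are used together, so dtype cannot be a parametrize
# axis here: the test's whole purpose is comparing the two results.
mean64 = X64.mean(axis=0)
mean32 = X32.mean(axis=0)
np.testing.assert_allclose(mean32, mean64, rtol=1e-5)
```

The alternative mentioned above, comparing each dtype against a precomputed EXPECTED_VALUES constant, would restore the ability to parametrize at the cost of hard-coding reference numbers.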
doc/whats_new/v0.21.rst (outdated)

@@ -137,6 +137,11 @@ Support for Python 3.4 and below has been officially dropped.
:mod:`sklearn.linear_model`
...........................

- |Enhancement| :class:`linear_model.make_dataset` now preserves
  ``float32`` and ``float64`` dtypes. :issues:`8769` and `11000` by
`11000` is missing `:issues:` in front. Suggested fix:
- |Enhancement| :class:`linear_model.make_dataset` now preserves ``float32``
and ``float64`` dtypes. :issues:`8769` and :issues:`11000` by :user:`Nelle
Varoquaux`_, :user:`Arthur Imbert`_, :user:`Guillaume Lemaitre <glemaitre>`,
and :user:`Joan Massich <massich>`
LGTM. Merging. This is a net improvement. More improvements can follow later.
…13243)

* Remove unused code
* Squash all the PR 9040 commits (initial PR commit; seq_dataset.pyx generated from template; seq_dataset.pyx generated from template #2; rename variables; fused types consistency test for seq_dataset; a sklearn/utils/tests/test_seq_dataset.py; new if statement; add doc sklearn/utils/seq_dataset.pyx.tp; minor changes; minor changes; typo fix; check numeric accuracy only up 5th decimal; Address oliver's request for changing test name; add test for make_dataset and rename a variable in test_seq_dataset)
* FIX tests
* TST more numerically stable test_sgd.test_tol_parameter
* Added benchmarks to compare SAGA 32b and 64b
* Fixing gael's comments
* fix
* solve some issues
* PEP8
* Address lesteve comments
* fix merging
* avoid using assert_equal
* use all_close
* use explicit ArrayDataset64 and CSRDataset64
* fix: remove unused import
* Use parametrized to cover ArrayDaset-CSRDataset-32-64 matrix
* for consistency use 32 first then 64 + add 64 suffix to variables
* it would be cool if this worked !!!
* more verbose version
* revert SGD changes as much as possible.
* Add solvers back to bench_saga
* make 64 explicit in the naming
* remove checking native python type + add comparison between 32 64
* Add whatsnew with everyone with commits
* simplify a bit the testing
* simplify the parametrize
* update whatsnew
* fix pep8
…t-learn#13243)" This reverts commit 0b56ac8.
Reference Issues/PRs
Works on #8769 and #11000 (for the SAG solver). Fixes #9040.
Closes #11155 (takes over).