ENH: ufunc helper for variance #13263

0x0L · 2019-04-04T19:28:00Z

Hello

This is a PR associated with #13199. Might also fix #9631

It implements a two-pass mean variance algorithm with an error correction term (see discussion in #13199).

It's more of a hack than a perfect solution: instead of allocating one array for the demeaned input, we allocate two arrays (one if we drop the bias correction term), each with a single dimension already reduced. Memory use should never be worse than before, but it could be as bad.
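For readers unfamiliar with the technique, here is a minimal pure-NumPy sketch of a two-pass variance with a correction term. It is not the ufunc code from this PR; the function name `two_pass_var` and its signature are made up for illustration, and it only handles 1-D input.

```python
import numpy as np

def two_pass_var(x, ddof=0):
    """Corrected two-pass variance of a 1-D array (illustrative sketch only)."""
    x = np.asarray(x, dtype=np.float64)
    n = x.size
    mean = x.sum() / n           # first pass: estimate the mean
    d = x - mean                 # second pass: deviations from that mean
    ss = np.dot(d, d)            # sum of squared deviations
    # Correction term: sum(d) is zero in exact arithmetic, so this only
    # removes the rounding error left over in the computed mean.
    corr = d.sum() ** 2 / n
    return (ss - corr) / (n - ddof)

data = 1e9 + np.array([4.0, 7.0, 13.0, 16.0])
print(two_pass_var(data, ddof=1))
```

Note that this sketch still allocates a full-size temporary for the deviations, which is exactly the allocation the PR tries to avoid.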

0x0L · 2019-04-19T10:14:23Z

I'm not sure I understand why the test fails on windows. On a windows machine dtype 'g' gives me a float64 whereas on a Linux machine I get float128

charris · 2019-04-19T12:55:29Z

On a windows machine dtype 'g' gives me a float64 whereas on a Linux machine I get float128

Not sure what the question is, but for MSVC, long double is the same as double, it has always been that way. In general, long double can be IBM double double, IEEE extended precision, IEEE quad precision, or just double. Its type varies across platforms.
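A quick way to see what long double means on a given machine is to inspect the dtype directly. This is only a sketch; the helper name `describe_longdouble` is hypothetical, and the values it returns differ by platform and compiler.

```python
import numpy as np

def describe_longdouble():
    """Summarize what np.longdouble (dtype char 'g') is on this platform."""
    ld = np.dtype(np.longdouble)
    info = np.finfo(np.longdouble)
    return {
        "char": ld.char,              # type character for long double
        "itemsize": ld.itemsize,      # storage size in bytes, padding included
        "mantissa_bits": info.nmant,  # explicit mantissa bits
        # True when long double carries no more precision than double
        "same_precision_as_double": info.nmant == np.finfo(np.float64).nmant,
    }

print(describe_longdouble())
```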

0x0L · 2019-04-19T17:03:16Z

@charris thx for your reply

The real issue is that the test fails (only on windows) with the following error:
AssertionError: res <class 'numpy.float64'>, tgt <class 'numpy.float64'>
which I find a bit confusing :)

matthew-brett · 2019-04-19T17:20:42Z

I don't know if it's related, but we've had terrible troubles with something similar here: nipy/nibabel#665

effigies · 2019-05-20T13:47:31Z

Tagging in #12096. I guess it's good to see that it's not just us...

0x0L · 2019-06-09T00:42:34Z

@eric-wieser
Any idea why some tests fail on Windows?

EDIT @matthew-brett @effigies
Probably related to #10151

0x0L · 2019-09-12T18:40:26Z

@eric-wieser
I think I addressed your remarks. However I still haven't found why the tests failed on windows.

seberg · 2019-09-12T18:59:53Z

Looking at the test failures, this is due to np.dtype(np.longdouble) == np.dtype(np.double) being true on Windows, while the scalar types themselves are still distinct. This makes sense given how the type resolution works, and to some degree it is completely fine. We could fix it by writing a custom type resolver that does exactly the same as the SimpleUniformTypeResolver, but ensures that the second output is non-complex.

EDIT: To be honest, I tend towards just adapting the test or marking (that one part) as known-fail on Windows. These kinds of "inconsistencies" exist for many other ufuncs just the same if you use long double on Windows.
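The distinction between dtype equality and scalar-type identity can be checked with a few lines. This is a sketch rather than the failing test itself; the helper name `compare_longdouble_double` is hypothetical, and the first value depends on the platform.

```python
import numpy as np

def compare_longdouble_double():
    """Return (dtypes_equal, same_class) for long double versus double."""
    # Dtype comparison: can be True where long double is stored as a double.
    dtypes_equal = np.dtype(np.longdouble) == np.dtype(np.double)
    # Scalar classes are separate Python types regardless of storage size.
    same_class = np.longdouble is np.double
    return dtypes_equal, same_class

print(compare_longdouble_double())
```

A test that compares the scalar type of a result against a target type can therefore fail even though the two dtypes compare equal.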

Micky774 · 2022-02-22T18:39:03Z

Hey there, I was just wondering what the status of this feature was -- is this still being worked on? It would be a welcome change, since we're looking to use it as a significant improvement for a downstream task in scikit-learn :)

0x0L · 2022-02-22T19:09:14Z

Not really. I'm not too happy with the code and it's failing a few tests on Windows.

InessaPawson · 2024-04-08T01:16:16Z

@0x0L I'll close the PR since you are not planning on continuing the work. Thank you for giving it a go!

mattip changed the title ~~ufunc helper for variance~~ ENH: ufunc helper for variance Apr 7, 2019

mattip added 01 - Enhancement component: numpy._core component: numpy.ufunc labels Apr 7, 2019

effigies mentioned this pull request May 20, 2019

[WIN] Stochastic bug: Minc2 data may load as np.longdouble in h5py < 2.10 nipy/nibabel#665

Closed

0x0L force-pushed the var_ufunc branch 3 times, most recently from 91883c1 to 719660f Compare June 8, 2019 22:28