TST: Accuracy test float32 sin/cos/exp/log for AVX platforms #15549
Conversation
CI failure does not seem related to the PR.
Yeah, the recent builds all seem to fail. Is there a reason why we run the accuracy tests only on some platforms? I guess they should pass trivially for the non-AVX versions, etc., but that does not mean we need to skip them?
This is meant to test only the transcendental functions implemented within NumPy (currently the AVX implementations of float32 sin/cos/exp/log). For the other transcendental functions, NumPy relies on the standard C library of the respective platform, whose max ULP errors seem to vary wildly (as I found out in PR #14048). It's really hard to pin down a max ULP error across all platforms to use in the test.
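(For context, a minimal sketch of how a max-ULP accuracy check for the float32 routines could look; the input range and the 4-ULP tolerance here are illustrative assumptions, not the values used in NumPy's actual test suite.)

```python
import numpy as np

def max_ulp_error(computed, expected):
    # For finite float32 values of the same sign, the difference between
    # their bit patterns (viewed as int32) equals their distance in ULPs.
    c = computed.astype(np.float32).view(np.int32).astype(np.int64)
    e = expected.astype(np.float32).view(np.int32).astype(np.int64)
    return int(np.abs(c - e).max())

# Illustrative check: float32 sin against a float64 reference rounded
# back to float32 (range and tolerance are assumptions for this sketch).
x = np.linspace(-100.0, 100.0, 1_000_000, dtype=np.float32)
expected = np.sin(x.astype(np.float64)).astype(np.float32)
err = max_ulp_error(np.sin(x), expected)
assert err <= 4, f"max ULP error {err} exceeds tolerance"
```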
OK, thanks, not much we can do, I guess. If we knew where it fails, making it …
Force-pushed from cd1a1e1 to 1ff480d
Yeah, there is no neat way to do this. I could enable the test, list all the resulting CI failures, and then disable the test for those platforms, but I am not sure what benefit that provides.
Force-pushed from 1ff480d to 9da1d31
Thanks, I trust matti on this. I had just started to look at it; since this is a benchmark, I do not care too much about the PEP8 fixes.
The main thing is that I wonder whether reducing the number of epochs would give identical results much quicker.
benchmarks/benchmarks/bench_avx.py (outdated)
        self.alpha = 0.1

    def time_train(self):
        self.train(5000)
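(Not part of the PR diff, just for context: a hedged sketch of what an asv benchmark class along these lines might look like. The class name, data shapes, epoch count, and the `Benchmark` base-class import are assumptions for illustration, not the actual contents of bench_avx.py.)

```python
import numpy as np
from .common import Benchmark  # base class used by NumPy's asv benchmark suite


class LogisticRegressionSketch(Benchmark):
    """Hypothetical training loop dominated by float32 exp/log calls."""

    def setup(self):
        rng = np.random.default_rng(42)
        self.X = rng.random((1000, 50), dtype=np.float32)
        self.y = (rng.random(1000) > 0.5).astype(np.float32)
        self.W = np.zeros(50, dtype=np.float32)
        self.alpha = np.float32(0.1)

    def train(self, epochs):
        for _ in range(epochs):
            z = self.X @ self.W
            p = 1.0 / (1.0 + np.exp(-z))          # exercises float32 exp
            p = np.clip(p, 1e-6, 1.0 - 1e-6)      # keep log() arguments finite
            # cross-entropy loss, exercising float32 log
            loss = -np.mean(self.y * np.log(p) + (1.0 - self.y) * np.log(1.0 - p))
            grad = self.X.T @ (p - self.y) / len(self.y)
            self.W -= self.alpha * grad

    def time_train(self):
        self.train(1000)
```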
I personally like some real-world benchmarks, but IIRC asv will run the whole thing several times, and I am wondering whether you would not get identical results if you drastically reduced the number of epochs.
A real-world benchmark would be great too. I was using the census dataset here for benchmarking. What would be the best way to add that dataset to NumPy?
Ah, nvm... I doubt it changes much, right? I was thinking of a benchmark that does not just test a single function but exercises several of them (although I suppose YMMV when it comes to inclusion as a benchmark). Anyway, I think this is good for now, and I assume adding real data would not add much?
Are the 5000 epochs really necessary, though?
Yeah, the numbers were the same for the census dataset and for random data. The time really only depends on the number of epochs and the dimensions of the data.
5000 epochs runs in < 5 s on a Skylake-X CPU; 1000 epochs takes ~1.1 s. Perhaps I could reduce it to 1000 epochs?
I used 5000 epochs because it took that many to converge reasonably on the real dataset. For the sake of benchmarking it shouldn't matter.
I would say whatever you feel comfortable with to get a reasonable benchmark result :). If 100 epochs or fewer give fairly stable, consistent timings, that also seems fine? Not sure it is problematic; I just thought we should not make the benchmarks unnecessarily slow.
Force-pushed from 9da1d31 to b8c0e9b
Force-pushed from b8c0e9b to 0a5eb43
Thanks @r-devulap. This is a start; we should expand this to more ufuncs and more platforms.
Also added benchmarks for the transcendental functions, which can be used to benchmark the AVX code after rewriting it with the universal intrinsics framework.
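(As a rough illustration, not the actual contents of bench_avx.py: a parameterized asv benchmark over the AVX-accelerated float32 ufuncs might look something like the sketch below. The class name, array size, and contiguous layout are assumptions.)

```python
import numpy as np
from .common import Benchmark  # base class used by NumPy's asv benchmark suite


class AVXTranscendentalSketch(Benchmark):
    """Hypothetical timing of the float32 ufuncs with AVX implementations."""
    params = [['sin', 'cos', 'exp', 'log']]
    param_names = ['ufunc']

    def setup(self, ufunc):
        self.f = getattr(np, ufunc)
        # shift inputs away from zero so log stays well defined
        self.x = np.random.default_rng(0).random(100_000, dtype=np.float32) + 0.5

    def time_ufunc(self, ufunc):
        self.f(self.x)
```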