
Conversation

@r-devulap (Member)

Also added benchmarks for the transcendental functions, which can be used to benchmark the AVX code after it is rewritten using the Universal Intrinsics framework.
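A minimal sketch of what such an asv-style benchmark could look like; the class name, parameters, and array sizes below are illustrative, not the ones added by this PR:

```python
import numpy as np


class TranscendentalFloat32:
    # Hypothetical asv benchmark: times each float32 ufunc on the same input.
    params = [np.sin, np.cos, np.exp, np.log]
    param_names = ['ufunc']

    def setup(self, ufunc):
        # Values in (0.1, 1.0) are valid inputs for all four ufuncs
        # (log needs strictly positive arguments).
        rng = np.random.default_rng(0)
        self.x = rng.uniform(0.1, 1.0, 100000).astype(np.float32)

    def time_ufunc(self, ufunc):
        ufunc(self.x)
```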

@r-devulap (Member, Author)

CI failure does not seem related to the PR.

RuntimeError: failed to find interpreter for Builtin discover of python_spec='python3-dbg'
The command "./tools/travis-before-install.sh" failed and exited with 1 during .

@seberg (Member) commented Feb 11, 2020

Yeah, the recent builds all seem to fail. Is there a reason why we run the accuracy tests only on some platforms? I guess they should pass trivially for the non-AVX, etc. versions, but that does not mean we need to skip them, does it?

@r-devulap (Member, Author)

This is meant to test just the transcendental functions implemented within NumPy (currently the AVX implementations of float32 sin/cos/exp/log). For the other transcendental functions, NumPy relies on the platform's standard C library implementation, whose max ULP error seems to vary wildly (as I found out in PR #14048). It's really hard to pin down a max ULP error that holds across all platforms to use in the test.
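For context, one common way to estimate the max ULP error of a float32 routine is to compare it against a correctly rounded float64 reference; the helper below is a hypothetical sketch of that idea, not the test machinery this PR actually uses:

```python
import numpy as np


def max_ulp_error(ufunc, x32):
    # Hypothetical helper: estimate the max ULP error of a float32 ufunc
    # against a float64 reference rounded back to float32.
    got = ufunc(x32)                       # float32 result under test
    ref = ufunc(x32.astype(np.float64)).astype(np.float32)
    # np.spacing gives the size of 1 ULP at each reference value.
    ulp = np.spacing(np.abs(ref)).astype(np.float64)
    return np.max(np.abs(got.astype(np.float64) - ref.astype(np.float64)) / ulp)


# Example: max_ulp_error(np.sin, np.linspace(0, 2 * np.pi, 10**6, dtype=np.float32))
```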

@seberg (Member) commented Feb 11, 2020

OK, thanks, not much we can do, I guess. If we knew where it fails, making it xfail only on those platforms could be nice, but I suppose that is tricky? I am not sure that xfail vs. skip (as it was) adds much, so I am OK with just saying that it tests the manually implemented loops for now.

@r-devulap (Member, Author)

> OK, thanks, not much we can do, I guess. If we knew where it fails, making it xfail only on those platforms could be nice, but I suppose that is tricky?

Yeah, there is no neat way to do this. I could enable the test, list all the resulting CI failures, and then disable the test on those platforms, but I am not sure what benefit that provides.
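For reference, a per-platform xfail could look something like the sketch below; pytest's xfail marker is real, but the platform set and the ULP bound are hypothetical placeholders, since the failing platforms were never enumerated:

```python
import platform

import numpy as np
import pytest

# Hypothetical set of platforms whose libm misses the ULP bound; these
# names are placeholders, not findings from this PR's CI.
IMPRECISE_LIBM_PLATFORMS = {"ppc64le", "s390x"}


@pytest.mark.xfail(
    platform.machine() in IMPRECISE_LIBM_PLATFORMS,
    reason="platform libm max ULP error exceeds the test bound",
)
def test_float32_exp_max_ulp():
    x = np.linspace(-10.0, 10.0, 100000, dtype=np.float32)
    got = np.exp(x)
    ref = np.exp(x.astype(np.float64)).astype(np.float32)
    err = np.abs(got.astype(np.float64) - ref.astype(np.float64))
    # 4 ULP is an illustrative bound, not the one NumPy's tests use.
    assert np.all(err <= 4 * np.spacing(np.abs(ref)).astype(np.float64))
```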

@seberg (Member) left a review

Thanks, I trust Matti on this. I had just started to look at it; since this is a benchmark, I do not care too much about the PEP8 fixes.

The main thing is that I wonder whether reducing the number of epochs would give identical results much quicker.

        self.alpha = 0.1

    def time_train(self):
        self.train(5000)
@seberg (Member)

I personally like some real-world benchmarks, but IIRC asv runs the whole thing several times, and I am wondering whether you would get identical results if you drastically reduced the number of epochs.

@r-devulap (Member, Author)

A real world benchmark would be great too. I was using the census dataset here for benchmarking. What would be the best way to add that dataset to NumPy?

@seberg (Member)

Ah, never mind... I doubt it changes much, right? I thought a benchmark that tests more than just a single function would be nice (although I suppose YMMV when it comes to inclusion as a benchmark). Anyway, I think this is good for now, and I assume that adding real data would not add much.
Are the 5000 epochs necessary, though?

@r-devulap (Member, Author)

Yeah, the numbers were the same for the census dataset and for random data. The time really depends only on the number of epochs and the dimensions of the data.

5000 epochs runs in under 5 s on a Skylake-X CPU; 1000 epochs takes about 1.1 s. Perhaps I could reduce it to 1000 epochs?

@r-devulap (Member, Author)

I used 5000 epochs because it took that many to reasonably converge on the real dataset. For the sake of benchmarking, it shouldn't matter.

@seberg (Member)

I would say whatever you feel comfortable with to get a reasonable benchmark result :). If 100 epochs or fewer give fairly stable, consistent timings, that seems fine too? I am not sure the current setup is problematic; I just thought we should not make the benchmarks unnecessarily slow.
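To make the trade-off concrete, here is a hypothetical, self-contained sketch of this kind of training benchmark. The `number` and `repeat` attributes are real asv knobs controlling how often the timed method runs; the class name, data shapes, and epoch count are made up for illustration:

```python
import numpy as np


class TimeLogisticRegression:
    # asv knobs: time one call per measurement, repeat a few measurements.
    number = 1
    repeat = 3

    def setup(self):
        rng = np.random.default_rng(42)
        self.X = rng.standard_normal((1000, 20)).astype(np.float32)
        self.y = (rng.random(1000) > 0.5).astype(np.float32)
        self.alpha = np.float32(0.1)

    def train(self, epochs):
        # Plain gradient-descent logistic regression; the sigmoid's np.exp
        # is the float32 transcendental loop being benchmarked.
        w = np.zeros(20, dtype=np.float32)
        for _ in range(epochs):
            p = 1.0 / (1.0 + np.exp(-(self.X @ w)))
            w -= self.alpha * (self.X.T @ (p - self.y)) / np.float32(len(self.y))
        return w

    def time_train(self):
        self.train(1000)  # reduced from 5000; runtime scales roughly linearly
```

Since the benchmark only measures throughput, convergence quality at the lower epoch count does not matter, which matches the point made above.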

@charris charris closed this Feb 13, 2020
@charris charris reopened this Feb 13, 2020
@seberg seberg changed the title TEST: Enable accuracy tests for float32 sin/cos/exp/log for AVX platforms TST: Enable accuracy tests for float32 sin/cos/exp/log for AVX platforms Feb 14, 2020
@seberg seberg changed the title TST: Enable accuracy tests for float32 sin/cos/exp/log for AVX platforms TST: Accuracy test float32 sin/cos/exp/log for AVX platforms Feb 14, 2020
@mattip mattip merged commit cd02ddc into numpy:master Feb 16, 2020
@mattip (Member) commented Feb 16, 2020

Thanks @r-devulap. This is a start; we should expand this to more ufuncs and more platforms.

@rgommers rgommers added the component: SIMD label Jul 12, 2022
Labels: 05 - Testing, component: numpy._core, component: SIMD