-
Notifications
You must be signed in to change notification settings - Fork 12.4k
SYCL: use 1D kernel for set_rows #14618
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
After the merge I've seen these tests failing in our internal CI:
Do we know if we are missing something else or should I revert? Edit: I'm not 100% sure if this is the patch that broke things, our CI skipped #14617 |
This seems problem with MUL_MAT so I think this PR is unrelated. Might need to check 65a3ebb I'll test it in a moment. Although our ci seems passing |
Yeah sorry to disrupt you, I'm currently checking the other Commit as well, it's most likely the cause |
Confirmed to be 65a3ebb, again, sorry for the heads up. I got misled by the CI pointing at your PR. Seems to be specific to the Nvidia platform, so I think we can consider leaving it since Intel is the priority here. What do you think? Though probably the dev of the PR won't want anything merged that broke something. |
@Alcpz IMO, our priority should be Intel. If it doesn't effect Intel GPUs then it's up to you whether you want to fix it or not. We need to work on softmax kernel's broadcasting and batching support next and I prefer if you guys do it (I have a working implementation but somehow I had to remove all those templated block sizes in order to get it working, so I did not open a PR yet here). |
Continue from #14562
Got it at the same level of performance as the cpy kernel
With LLAMA_SET_ROWS=0
Thanks @AD2605