-
Notifications
You must be signed in to change notification settings - Fork 397
Collection of branchless algorithms from Hacker's Delight. #5202
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
dfd5b59
to
2f23075
Compare
2f23075
to
0d0b69f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only a couple of minor suggestions. Mostly on comments.
@@ -765,6 +765,10 @@ object RuntimeLong { | |||
|
|||
@inline | |||
def clz(a: RuntimeLong): Int = { | |||
/* Warning to the next adventurer to come here: the best branchless | |||
* algorithm I found was worse than the naive implementation here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add how it was worse? Performance wise? Or number of operations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Performance-wise. I added that to the comment.
@@ -765,6 +765,10 @@ object RuntimeLong { | |||
|
|||
@inline | |||
def clz(a: RuntimeLong): Int = { | |||
/* Warning to the next adventurer to come here: the best branchless | |||
* algorithm I found was worse than the naive implementation here. | |||
* The algorithm was `val hiz = nlz(hi); hiz + ((hiz << 26 >> 31) & nlz(lo))`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* The algorithm was `val hiz = nlz(hi); hiz + ((hiz << 26 >> 31) & nlz(lo))`. | |
* The algorithm was `val hiz = clz(hi); hiz + ((hiz << 26 >> 31) & clz(lo))`. |
Seems we always abbreviate that way? Only the JDK calls it numberOfLeadingZeros.
@inline def signum(i: scala.Int): scala.Int = | ||
if (i == 0) 0 else if (i < 0) -1 else 1 | ||
@inline def signum(i: scala.Int): scala.Int = { | ||
// Hacker's Delight, Section 2-7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Hacker's Delight, Section 2-7 | |
// Hacker's Delight, Section 2-8 |
? (I'm looking at second edition: https://doc.lagout.org/security/Hackers%20Delight.pdf)
if (hi < 0) -1 | ||
else if (hi == 0 && i.toInt == 0) 0 | ||
else 1 | ||
/* Hacker's Delight, Section 2-7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/* Hacker's Delight, Section 2-7 | |
/* Hacker's Delight, Section 2-8 |
else throw new ArithmeticException("Long overflow") | ||
if (((a ^ b) & (res ^ a)) < 0L) | ||
longOverflow() | ||
res |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not understand how the overflow checks for addition / subtraction up to here in the file relate to Hacker's Delight. But I understand that they essentially do the same as before, just with bitwise operations. (so no additional comments required IMO, but maybe remove the hacker's delight comments?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I elaborated the comments to point more specifically to the formulas we use. But you're right, it's basically applying De Morgan to the negation of our previous formulas.
@inline | ||
def toIntExact(a: scala.Long): scala.Int = { | ||
val res = a.toInt | ||
if (res.toLong != a) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment why this is better than checking against min / max value?
I realize the previous code had branches, but it seems they could easily be avoided (or do we not have a non short circuiting boolean and?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a comment. The long comparisons require 2 int comparisons each. With the new code, we only need 1 int comparison.
class MathTestOnJDK11 { | ||
|
||
@noinline | ||
private def hideFromOptimizer(x: Int): Int = x | ||
|
||
@Test def multiplyExactLongInt(): Unit = { | ||
for (n <- Seq(Long.MinValue, -1L, 0L, 1L, Long.MaxValue)) { | ||
val nInt = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of this, duplicate the loop and have the lhs / rhs cases separate?
As well as some related improvements. While we're there, we also normalize the shape of overflow checks in `Math.xExact` methods.
0d0b69f
to
b11b0ec
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rebased to address a benign conflict. Otherwise I only changed comments (and the test) to address the review.
@@ -765,6 +765,10 @@ object RuntimeLong { | |||
|
|||
@inline | |||
def clz(a: RuntimeLong): Int = { | |||
/* Warning to the next adventurer to come here: the best branchless | |||
* algorithm I found was worse than the naive implementation here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Performance-wise. I added that to the comment.
@inline | ||
def toIntExact(a: scala.Long): scala.Int = { | ||
val res = a.toInt | ||
if (res.toLong != a) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a comment. The long comparisons require 2 int comparisons each. With the new code, we only need 1 int comparison.
else throw new ArithmeticException("Long overflow") | ||
if (((a ^ b) & (res ^ a)) < 0L) | ||
longOverflow() | ||
res |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I elaborated the comments to point more specifically to the formulas we use. But you're right, it's basically applying De Morgan to the negation of our previous formulas.
As well as some related improvements.
While we're there, we also normalize the shape of overflow checks in
Math.xExact
methods.