Optimizing compiler is too clever #1

Open
JakubVanek opened this issue Jan 10, 2018 · 6 comments

@JakubVanek

Hi,

I've tried using System.nanoTime and I still get negative durations on my machine (OpenJDK 64-Bit Server VM, build 25.151-b12, mixed mode):

GeneralBench 1.2
[...]
  200000 byte add:                          1926082 ns    103837738 ops/sec
  200000 byte sub:                          -847484 ns   -235992655 ops/sec
  200000 byte mul:                          -818918 ns   -244224696 ops/sec
  200000 byte div:                          -597373 ns   -334799195 ops/sec
[...]

The compiler (source → bytecode or bytecode → x86) must somehow be reordering the program flow.

private static int benchArithByte(int count)
{
    byte a, b, c;

    long nullTime = BenchUtils.getIterationTime(count);
    long start, end;

    // Add
    a = (byte)0x77;
    b = (byte)0x11;
    start = System.nanoTime();
    for (int i = 0; i < count; i++)
        c = (byte) (a + b);
    end = System.nanoTime();
    report(count, "byte add", count, "ops", end - start - nullTime);

    // sub
    a = (byte) 0x88;
    b = (byte) 0x11;
    start = System.nanoTime();
    for (int i = 0; i < count; i++)
        c = (byte) (a - b);
    end = System.nanoTime();
    report(count, "byte sub", count, "ops", end - start - nullTime);

    // Mul
    a = (byte) 0x0F;
    b = (byte) 0x11;
    start = System.nanoTime();
    for (int i = 0; i < count; i++)
        c = (byte) (a * b);
    end = System.nanoTime();
    report(count, "byte mul", count, "ops", end - start - nullTime);

    // Div
    a = (byte) 0xFE;
    b = (byte) 0x0E;
    start = System.nanoTime();
    for (int i = 0; i < count; i++)
        c = (byte) (a / b);
    end = System.nanoTime();
    report(count, "byte div", count, "ops", end - start - nullTime);
    
    return count * 4;
}

The same thing happens on the brick with OpenJDK 9, with the exception that it's in the float benchmark.

GeneralBench 1.2
[...]
  200000 float add:                       153755469 ns      1300766 ops/sec
  200000 float sub:                          723385 ns    276477947 ops/sec
  200000 float mul:                         -891574 ns   -224322378 ops/sec
  200000 float div:                          -79218 ns  -2524678734 ops/sec
[...]
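
A minimal sketch of where the minus signs in both listings can come from, assuming report derives ops/sec as count * 10^9 / ns using long division. That formula is an assumption (report's source is not shown here), and the helper opsPerSec below is hypothetical, although it does reproduce the printed figures. If the timed loop finishes faster than the calibration loop behind nullTime, for example because the JIT shortened or removed it, end - start - nullTime drops below zero and the rate follows it:

```java
public final class NegativeDuration {
    // Hypothetical stand-in for the rate printed by report(); the real helper is not shown in this issue.
    static long opsPerSec(long ops, long ns) {
        return ops * 1_000_000_000L / ns;
    }

    public static void main(String[] args) {
        long count = 200_000;
        // "byte sub" value from the first listing, i.e. end - start - nullTime.
        long ns = -847_484;
        System.out.println(ns + " ns -> " + opsPerSec(count, ns) + " ops/sec");
    }
}
```

So a negative rate by itself only says that the measured loop took less time than the null loop, not that the clock ran backwards.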

Well, I know of two possible solutions. One is to isolate each benchmark in its own function and somehow limit the optimizer; the second is to use the Java Microbenchmark Harness (JMH). I've read on StackOverflow that it is the recommended way of running tests like this.
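
A rough sketch of the first option in plain Java, not a tested fix for GeneralBench. The class name IsolatedBench, the method timeByteSub and the sink field are hypothetical. The idea is to give the operation its own method, make one operand depend on the loop counter so the subtraction is not loop-invariant, and publish the result through a volatile field so the loop cannot be discarded as dead code. The extra add that varies the operand is overhead that gets measured too:

```java
public final class IsolatedBench {
    // Written after the loop so the JIT cannot treat the computed value as unused.
    static volatile int sink;

    // Times only the byte-subtraction loop; a and b arrive as parameters, not constants.
    static long timeByteSub(int count, byte a, byte b) {
        int acc = 0;
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            byte x = (byte) (a + i);  // varies per iteration (extra overhead, see note above)
            acc += (byte) (x - b);    // operation under test; result feeds the accumulator
        }
        long end = System.nanoTime();
        sink = acc;                   // publish the result
        return end - start;
    }

    public static void main(String[] args) {
        int count = 200_000;
        // Warm-up runs so the measured call is more likely to execute compiled code.
        for (int i = 0; i < 20; i++) {
            timeByteSub(count, (byte) 0x88, (byte) 0x11);
        }
        long ns = timeByteSub(count, (byte) 0x88, (byte) 0x11);
        System.out.println(count + " byte sub: " + ns + " ns");
    }
}
```

This reports the raw loop time without subtracting nullTime, so it cannot go negative, but it also includes the loop overhead.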

Jakub Vaněk

@JakubVanek
Author

Hmmm... inventing it on my own is a little bit too complicated.
Suppose we have this code:

float a = 0x88888888p0f;
float b = 0x11111111p0f;
float c = 0f;
start = System.nanoTime();
for (int i = 0; i < count; i++)
	c = (float)(a - b);
end = System.nanoTime();

The compiler might be clever enough to figure out that a and b don't change and might cache the result.
Well, let's make a and b volatile.
Now we have the problem of memory access jitter. I don't know the bytecode, but it might have something like load_variable, load_constant, tested_op and store_variable ops.
On each loop, we are doing this:

load_variable  to:reg0  from:mem_a
load_variable  to:reg1  from:mem_b
tested_op      to:reg2  from:reg0 from:reg1
store_variable to:mem_c from:reg2

But we want to measure only this:

tested_op      to:reg2  from:reg0 from:reg1

I'm afraid this can't be solved even by JMH.
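
One partial workaround, sketched here under assumptions and not as a real solution: time a second loop that performs the same loads and store but skips the operation, then subtract. The class BaselineDiff and all of its members are hypothetical names. The difference is still subject to jitter and can come out negative, so this only narrows the problem:

```java
import java.util.function.IntToLongFunction;

public final class BaselineDiff {
    // Volatile fields play the role of mem_a, mem_b and mem_c in the listing above.
    static volatile float memA = 0x88888888p0f;
    static volatile float memB = 0x11111111p0f;
    static volatile float memC;

    // load, load, tested_op, store
    static long timeWithOp(int count) {
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            float a = memA;
            float b = memB;
            memC = a - b;
        }
        return System.nanoTime() - start;
    }

    // load, load, store: the same memory accesses, but no arithmetic
    static long timeWithoutOp(int count) {
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            float a = memA;
            float b = memB; // read only to mirror the measured loop
            memC = a;
        }
        return System.nanoTime() - start;
    }

    // Keeps the fastest of several runs to damp jitter.
    static long best(IntToLongFunction bench, int count, int runs) {
        long best = Long.MAX_VALUE;
        for (int r = 0; r < runs; r++) {
            best = Math.min(best, bench.applyAsLong(count));
        }
        return best;
    }

    public static void main(String[] args) {
        int count = 200_000;
        long withOp = best(BaselineDiff::timeWithOp, count, 20);
        long withoutOp = best(BaselineDiff::timeWithoutOp, count, 20);
        System.out.println("estimated op cost: " + (withOp - withoutOp) + " ns (may be noisy or negative)");
    }
}
```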

@jabrena
Member

jabrena commented Jan 10, 2018

Experimenting with JMH could be very nice for the project.

@jabrena
Member

jabrena commented Jan 10, 2018

Do you have multiple bricks to experiment with?

@JakubVanek
Author

Hi,

no, I have only one education set and it's borrowed from FEE CTU.

Jakub

@jabrena
Member

jabrena commented Jan 10, 2018

If you could get another one for scientific purposes, we could experiment with RMI to synchronise Java processes running in different JVMs.

Good night

Juan Antonio
