Some ImageParam access can occur in bounds inference on host, but this ignores device_dirty #8481

Open
abadams opened this issue Nov 19, 2024 · 0 comments
abadams commented Nov 19, 2024

Bounds inference allows ImageParam accesses whose args are all constant to appear in the bounds inference expressions that are evaluated at pipeline entry. Because all ImageParams are accessed through trivial wrapper Funcs, the only way to trigger this is to inject a dependency on the FuncValueBounds of the wrapper Func itself.

This can create a situation where the host-side value doesn't match the device-side value, so the code that allocates or produces a Func doesn't match the code that accesses it, causing crashes or garbage output.

This is extremely contrived and unlikely to ever be triggered by real code, but there may be other bugs lurking due to places where ImageParam call nodes leak into start-up expressions.

Example from #8478 that produces garbage output:

#include "Halide.h"

#include <cassert>

using namespace Halide;

int main(int argc, char **argv) {

    ImageParam im(Int(32), 0);

    Func f("f"), g("g"), h("h");
    Var x;

    f = im.in();

    f.compute_root().gpu_single_thread();

    // FuncValueBounds of f will include the call to im, because there are no
    // vars in it. But f itself will only ever be computed on device, which only
    // requires im to be available on device.

    h(x) = x;
    h.compute_root();
    g(x) = h(x % f());
    // The bounds required of h depend on the func value bounds of f. These
    // bounds will be evaluated on the host, which needs to access im on the
    // host at the pipeline entry.

    // Make a buffer that's dirty on device
    auto buf = Buffer<int>::make_scalar();
    buf() = 3;
    Func make_big;
    make_big() = 256;
    make_big.gpu_single_thread();
    auto callable = make_big.compile_to_callable({});
    callable(buf);

    assert(buf.device_dirty());

    im.set(buf);

    // The call to g will access h at a coordinate up to 255, but h will only be
    // computed to be size 3, because the func value bounds of f include a
    // host-side access which ignores device_dirty.

    h.trace_realizations().trace_loads();
    g.trace_stores();
    g.realize({256});

    return 0;
}
@abadams abadams added the bug label Nov 19, 2024