-
Notifications
You must be signed in to change notification settings - Fork 14.9k
[Flang][OpenMP][Runtime] Minor Flang runtime for OpenMP AMDGPU modifications #152631
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Flang][OpenMP][Runtime] Minor Flang runtime for OpenMP AMDGPU modifications #152631
Conversation
✅ With the latest revision this PR passed the C/C++ code formatter. |
a3304f0
to
cc89cea
Compare
@@ -127,8 +127,8 @@ class Elementwise { | |||
const Descriptor &instance_, *from_{nullptr}; | |||
std::size_t elements_{instance_.InlineElements()}; | |||
std::size_t elementAt_{0}; | |||
SubscriptValue subscripts_[common::maxRank]; | |||
SubscriptValue fromSubscripts_[common::maxRank]; | |||
SubscriptValue subscripts_[maxRank]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can remove it if you wish, but this flags up as a warning in Clang I believe, that'll cause us to error out in with -werror @dpalermo will know a bit more about this alteration I think :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the warning from clang?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might be misremembering the reason, it's been a while, I've checked and it's no longer a problem, so I'm happy to revert this :-)
@@ -158,8 +158,8 @@ template <typename STORE, std::size_t minBuffer = 65536> class FileFrame { | |||
// Avoid passing a null pointer, since it would result in an undefined | |||
// behavior. | |||
if (old != nullptr) { | |||
std::memcpy(buffer_, old + start_, chunk); | |||
std::memcpy(buffer_ + chunk, old, length_ - chunk); | |||
Fortran::runtime::memcpy(buffer_, old + start_, chunk); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need the Fortran
prefix, and using it makes this code inconsistent with other namespace usage in the runtime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, will make the alteration in the next update after the other discussions are resolved
@@ -161,4 +161,4 @@ void std::__libcpp_verbose_abort(char const *format, ...) noexcept( | |||
} | |||
#endif | |||
|
|||
RT_EXT_API_GROUP_END | |||
RT_EXT_API_GROUP_END |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you just deleted the terminal newline here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch, thank you! Will re-add in subsequent update.
@@ -23,6 +23,16 @@ | |||
#define STD_FILL_N_UNSUPPORTED 1 | |||
#endif | |||
|
|||
#if !defined(STD_MEMSET_UNSUPPORTED) && \ | |||
(defined(__CUDACC__) || defined(__CUDA__)) && defined(__CUDA_ARCH__) | |||
#define STD_MEMSET_UNSUPPORTED 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't this change existing compilations that work with CUDA?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to remove it if we wish, it was to maintain consistency with the rest of the CUDA support below (e.g. memmove) when we added the other bits and pieces, @vzakhari may know better if this is a reasonable addition or not or you of course might as well :-) so please do advise if either of you wish it removed or altered, happy to do so!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe std::memset
and std::memcpy
are supported by at least nvcc, so I would prefer these additions to be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, will do so on the next pass!
@@ -63,6 +73,24 @@ | |||
#define STD_TOUPPER_UNSUPPORTED 1 | |||
#endif | |||
|
|||
#if defined(OMP_OFFLOAD_BUILD) || defined(OMP_NOHOST_BUILD) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to add this only for the particular OpenMP compiler you are using currently? I think some other OpenMP compilers may support some of these STD methods, so I would prefer not to disable them all for all compilers that may be used for out-of-tree flang-rt builds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition, I think we should not do this for the host part of the offload build.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to add this only for the particular OpenMP compiler you are using currently? I think some other OpenMP compilers may support some of these STD methods, so I would prefer not to disable them all for all compilers that may be used for out-of-tree flang-rt builds.
That makes sense, I believe it should be possible, I think Clang (which is what we use) should have an environment variable specified during compilation to notify what compiler is in use, so should be able to use that if I'm remembering correctly.
In addition, I think we should not do this for the host part of the offload build.
Makes sense as well, I'll change the || -> && and it should only be applicable to the offload build and fall back on the std or builtin implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some replies inline, I'll update the PR once the discussions have been concluded!
@@ -127,8 +127,8 @@ class Elementwise { | |||
const Descriptor &instance_, *from_{nullptr}; | |||
std::size_t elements_{instance_.InlineElements()}; | |||
std::size_t elementAt_{0}; | |||
SubscriptValue subscripts_[common::maxRank]; | |||
SubscriptValue fromSubscripts_[common::maxRank]; | |||
SubscriptValue subscripts_[maxRank]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can remove it if you wish, but this flags up as a warning in Clang I believe, that'll cause us to error out in with -werror @dpalermo will know a bit more about this alteration I think :-)
@@ -158,8 +158,8 @@ template <typename STORE, std::size_t minBuffer = 65536> class FileFrame { | |||
// Avoid passing a null pointer, since it would result in an undefined | |||
// behavior. | |||
if (old != nullptr) { | |||
std::memcpy(buffer_, old + start_, chunk); | |||
std::memcpy(buffer_ + chunk, old, length_ - chunk); | |||
Fortran::runtime::memcpy(buffer_, old + start_, chunk); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, will make the alteration in the next update after the other discussions are resolved
@@ -161,4 +161,4 @@ void std::__libcpp_verbose_abort(char const *format, ...) noexcept( | |||
} | |||
#endif | |||
|
|||
RT_EXT_API_GROUP_END | |||
RT_EXT_API_GROUP_END |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch, thank you! Will re-add in subsequent update.
@@ -23,6 +23,16 @@ | |||
#define STD_FILL_N_UNSUPPORTED 1 | |||
#endif | |||
|
|||
#if !defined(STD_MEMSET_UNSUPPORTED) && \ | |||
(defined(__CUDACC__) || defined(__CUDA__)) && defined(__CUDA_ARCH__) | |||
#define STD_MEMSET_UNSUPPORTED 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to remove it if we wish, it was to maintain consistency with the rest of the CUDA support below (e.g. memmove) when we added the other bits and pieces, @vzakhari may know better if this is a reasonable addition or not or you of course might as well :-) so please do advise if either of you wish it removed or altered, happy to do so!
@@ -63,6 +73,24 @@ | |||
#define STD_TOUPPER_UNSUPPORTED 1 | |||
#endif | |||
|
|||
#if defined(OMP_OFFLOAD_BUILD) || defined(OMP_NOHOST_BUILD) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to add this only for the particular OpenMP compiler you are using currently? I think some other OpenMP compilers may support some of these STD methods, so I would prefer not to disable them all for all compilers that may be used for out-of-tree flang-rt builds.
That makes sense, I believe it should be possible, I think Clang (which is what we use) should have an environment variable specified during compilation to notify what compiler is in use, so should be able to use that if I'm remembering correctly.
In addition, I think we should not do this for the host part of the offload build.
Makes sense as well, I'll change the || -> && and it should only be applicable to the offload build and fall back on the std or builtin implementations.
fc16dff
to
be67c22
Compare
Attempted to address all the feedback in the last commit (plus force pushed once more to fix some formatting) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there other APIs from std::
in the runtime that should be replaced?
flang-rt/lib/runtime/assign.cpp
Outdated
@@ -15,6 +15,7 @@ | |||
#include "flang-rt/runtime/tools.h" | |||
#include "flang-rt/runtime/type-info.h" | |||
#include "flang-rt/runtime/work-queue.h" | |||
#include "flang/Runtime/stop.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this additional include directive necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will double check if the includes are necessary!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No requirement for this one anymore, thank you for pointing it out!
flang-rt/lib/runtime/copy.cpp
Outdated
@@ -10,6 +10,7 @@ | |||
#include "stack.h" | |||
#include "flang-rt/runtime/descriptor.h" | |||
#include "flang-rt/runtime/terminator.h" | |||
#include "flang-rt/runtime/tools.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this additional include directive necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in this case it is unfortunately, due needing the memcpy from the fortran::runtime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But memcpy is defined in freestanding-tools.h, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, thank you! I've replaced tools with freestanding-tools.h in the most recent update.
Not that we've found so far, but there will likely be others as the runtime gets built on! So we're happy to address them as they come up. There is an issue with abort (C abort I think, rather than std::) currently, but the method we use to work around it downstream at the moment isn't particularly upstreamable, we create an local (to the .cpp source file with the issue) extern "C" function in-place of abort to forward to the fortran runtime's abort. Unfortunately, couldn't replicate similar functionality to the memset/memcpy in the PR as we end up with duplicate symbols. |
Checked the pointed out includes in the last comments and removed one in the last commit, the other is required for the fortran::runtime::memcpy functionality it seems. |
3353c60
to
0416575
Compare
Updated the PR stack, the main addition was new RT_OFFLOAD_API_GROUP wrappers in temporary-stack.cpp. But otherwise I rebased and reformatted the PR to fix the code formatting! But the PR should address all current comments and pass the CI now, please do feel free to add more comments however if you wish anything further changed! :-) |
Small ping for further review or approvals if possible please :-) |
flang-rt/lib/runtime/extensions.cpp
Outdated
@@ -204,8 +204,8 @@ void FORTRAN_PROCEDURE_NAME(getarg)( | |||
void FORTRAN_PROCEDURE_NAME(getlog)(char *arg, std::int64_t length) { | |||
#if _REENTRANT || _POSIX_C_SOURCE >= 199506L | |||
if (length >= 1 && getlogin_r(arg, length) == 0) { | |||
auto loginLen{std::strlen(arg)}; | |||
std::memset( | |||
auto loginLen{Fortran::runtime::strlen(arg)}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just runtime::
is sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Went back over these to double check and removed Fortran:: from this one, the others appear to need the namespace prefix at the moment or get undeclared identifier errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? The identifiers are declared in the runtime
namespace, no? Maybe you're missing an include or something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possible and I'm happy to investigate a bit, just assumed it's perhaps intended for those files as we do use Fortran::runtime:: elsewhere in it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I managed to get to the bottom of it, they reside outside of a namespace or in an anonymous namespace, so a using namespace Fortran is required. Should be addressed in the latest commit.
@@ -97,7 +99,7 @@ void DescriptorStorage<COPY_VALUES>::resize(size_type newCapacity) { | |||
// Avoid passing a null pointer, since it would result in an undefined | |||
// behavior. | |||
if (data_ != nullptr) { | |||
memcpy(newData, data_, capacity_ * sizeof(Descriptor *)); | |||
Fortran::runtime::memcpy(newData, data_, capacity_ * sizeof(Descriptor *)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just runtime::
suffices.
@@ -276,13 +276,13 @@ static void DateAndTimeUnavailable(Fortran::runtime::Terminator &terminator, | |||
char *zone, std::size_t zoneChars, | |||
const Fortran::runtime::Descriptor *values) { | |||
if (date) { | |||
std::memset(date, static_cast<int>(' '), dateChars); | |||
Fortran::runtime::memset(date, static_cast<int>(' '), dateChars); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just runtime::
is needed here and in the following lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it is necessary to have the Fortran:: prefix in the remaining locations, it incurs a compile time error otherwise. And the Fortran:: prefix is used elsewhere in these files e.g. one of the input parameter in the above case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
really? what happens?
while (count--) { | ||
*to++ = *from++; | ||
} | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need a return;
statement before the closing brace of a function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good! Happy to adjust.
0416575
to
de94a2c
Compare
rebased on top of main in the last PR and amended the requested changes into the last commit so had to do a force push, so my appologies for that. The only additional changes should be the remove of the requested return; and the removal of one more Fortran:: namespace prefix, the remaining 2 require it at the moment it seems. |
Last PR should address all the comments currently. I'll git squash and clang-format to make the CI happy once we're happy with the PR (so as to make reviewing any changes easier for the time being). |
Small ping for reviewer attention on this please :-) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, with one minor comment. Thank you!
@@ -420,7 +423,7 @@ static void GetDateAndTime(Fortran::runtime::Terminator &terminator, char *date, | |||
auto copyBufferAndPad{ | |||
[&](char *dest, std::size_t destChars, std::size_t len) { | |||
auto copyLen{std::min(len, destChars)}; | |||
std::memcpy(dest, buffer, copyLen); | |||
Fortran::runtime::memcpy(dest, buffer, copyLen); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fortran::runtime::memcpy(dest, buffer, copyLen); | |
runtime::memcpy(dest, buffer, copyLen); |
Thank you both very much! I'll address the last comment, squash and clang-format and land the PR in the next couple of days unless there's any further comments in between :-) |
We have some modifications downstream to compile the flang runtime for amdgpu using clang OpenMP, some more hacky than others to workaround (hopefully temporary) compiler issues. The additions here are the non-hacky alterations. Main changes: * Create freestanding versions of memcpy, strlen and memmove, and replace std:: references with these so that we can default to std:: when it's available, or our own Flang implementation when it's not. * Wrap more bits and pieces of the library in declare target wrappers (RT_* macros). * Fix some warnings that'll pose issues with werror on, in this case having the namespace infront of variables passed to templates. Another minor issues that'll likely still pop up depending on the program you're linking with is that abort will be undefined, it is perhaps possible to solve it with a freestanding implementation as with memcpy etc. but we end up with multiple definitions in this case. An alternative is to create an empty extern "c" version (which can be empty or forwrd on to the builtin). Co-author: Dan Palermo Dan.Palermo@amd.com
0e5ef24
to
e6b83ed
Compare
…cations (llvm#152631) We have some modifications downstream to compile the flang runtime for amdgpu using clang OpenMP, some more hacky than others to workaround (hopefully temporary) compiler issues. The additions here are the non-hacky alterations. Main changes: * Create freestanding versions of memcpy, strlen and memmove, and replace std:: references with these so that we can default to std:: when it's available, or our own Flang implementation when it's not. * Wrap more bits and pieces of the library in declare target wrappers (RT_* macros). * Fix some warnings that'll pose issues with werror on, in this case having the namespace infront of variables passed to templates. Another minor issues that'll likely still pop up depending on the program you're linking with is that abort will be undefined, it is perhaps possible to solve it with a freestanding implementation as with memcpy etc. but we end up with multiple definitions in this case. An alternative is to create an empty extern "c" version (which can be empty or forwrd on to the builtin). Co-author: Dan Palermo Dan.Palermo@amd.com
…U modifications (llvm#152631)" This reverts commit 30d2cb5.
…cations (llvm#152631) We have some modifications downstream to compile the flang runtime for amdgpu using clang OpenMP, some more hacky than others to workaround (hopefully temporary) compiler issues. The additions here are the non-hacky alterations. Main changes: * Create freestanding versions of memcpy, strlen and memmove, and replace std:: references with these so that we can default to std:: when it's available, or our own Flang implementation when it's not. * Wrap more bits and pieces of the library in declare target wrappers (RT_* macros). * Fix some warnings that'll pose issues with werror on, in this case having the namespace infront of variables passed to templates. Another minor issues that'll likely still pop up depending on the program you're linking with is that abort will be undefined, it is perhaps possible to solve it with a freestanding implementation as with memcpy etc. but we end up with multiple definitions in this case. An alternative is to create an empty extern "c" version (which can be empty or forwrd on to the builtin). Co-author: Dan Palermo Dan.Palermo@amd.com
We have some modifications downstream to compile the flang runtime for amdgpu using clang OpenMP, some more hacky than others to workaround (hopefully temporary) compiler issues. The additions here are the non-hacky alterations.
Main changes:
* Create freestanding versions of memcpy, strlen and memmove, and replace std:: references with these so that we can default to std:: when it's available, or our own Flang implementation when it's not. * Wrap more bits and pieces of the library in declare target wrappers (RT_* macros). * Fix some warnings that'll pose issues with werror on, in this case having the namespace infront of variables passed to templates.
Another minor issues that'll likely still pop up depending on the program you're linking with is that abort will be undefined, it is perhaps possible to solve it with a freestanding implementation as with memcpy etc. but we end up with multiple definitions in this case. An alternative is to create an empty extern "c" version (which can be empty or forwrd on to the builtin).
Co-author: Dan Palermo Dan.Palermo@amd.com