Initial f16 support #2413
Conversation
Why is there a "|" here?
I was thinking of making a table but decided against it. Fixed the message.
Looks great. I assume you are still adding more tests, for all the functions for which half support has been enabled.
/// \returns true if the \p device supports half precision operations,
/// false otherwise
/// \ingroup device_func_half
AFAPI bool isHalfAvailable(const int device);
Could you add more comments giving some examples of which devices will return true and which will return false?
I am not sure that would help users in a general way. Perhaps a new page in our documentation that lists known devices with half support, or a hyperlink to a page with this information, would be better.
On my side, I am more interested in knowing whether the device can handle half natively, with better performance than f32 (most of the time meaning Volta/Turing via Tensor Cores, and apparently the P100).
Isn't that method going to return true on older GPUs (GTX 10x0), where half is perhaps "supported" but performance is disastrous?
Half support for the 10x0 cards is implemented through floats.

Even on the Volta cards we need to improve the way we handle things to get better compute performance. To fully utilize the GPU's half compute throughput we need to use half2. That will require a major shift in some of the internal kernels, so I pushed the change off to later. We can talk about which functions are important to you and improve those down the line. All the library calls for half (e.g. matmul) will use half2, so you will get full compute performance there. For kernels that are bandwidth bound, moving to half2 serves no purpose because they will still be bandwidth bound.
@@ -44,7 +44,7 @@ class Join : public ::testing::Test {

 // create a list of types to be tested
 typedef ::testing::Types<float, double, cfloat, cdouble, int, unsigned int,
-                         intl, uintl, char, unsigned char, short, ushort>
+                         intl, uintl, char, unsigned char, short, ushort, af_half>
We should use the C++ typedef af::half here, to be consistent with cfloat and cdouble. This applies to all the tests where half is enabled.
I am worried we will have conflicts with the half data type. I want to discourage users from using this type because of its inefficiency. I am not sure this is the best way to handle this sort of thing.
@@ -537,6 +576,10 @@ const af::cfloat &operator+(const af::cfloat &val) { return val; }

 const af::cdouble &operator+(const af::cdouble &val) { return val; }

 const af_half& operator+(const af_half& val) {
af::half, to be consistent with the other af::* types.
    return lhs.data_ == rhs.data_;
}

std::ostream &operator<<(std::ostream &os, const af_half &val) {
Do we use half_float::half from extern/half/include/half.hpp only in tests?
Yes. Our half type is not fully functional; it would require us to reimplement the entire half library in order to implement it correctly. I don't think we should directly include other libraries in our API. Perhaps we can discuss it in a separate issue.
* Add half support for gemm
* Add half support for JIT
* Set the C++ standard to C++14 internally
* Add support functions for half support
* Add half support to reductions
* Add support for join
Can you open an issue to fix half testing?
Everything else looking good! Like the random cleanups. 👍
    ASSERT_ARRAYS_EQ(gold, out);
}

// TODO(umar): HalfMax
TODO?
EXPECT_TRUE(one.isreal());
EXPECT_FALSE(one.iscomplex());
EXPECT_FALSE(one.isbool());
EXPECT_FALSE(one.ishalf());
EXPECT_TRUE?
Initial support for fp16:

- [ ] convolution
- [ ] anything else