NVIDIA CUDA Computational Finance Geeks3D
S is the current stock price, X is the strike price, CND is the cumulative normal
distribution function, r is the risk-free interest rate, and ν is the volatility.
return CND;
}
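To make the roles of S, X, r, ν and CND concrete, here is a minimal host-side C++ sketch of the Black-Scholes price of a European call and put. It is an illustration under stated assumptions, not the code from the slides: the time to expiry `T` (in years) does not appear in the excerpt above and is the usual additional input, `v` stands for ν, the names `cnd`, `OptionPrice` and `blackScholes` are hypothetical, and the CND here uses `std::erfc`, whereas the slides' version may use a polynomial approximation.

```cpp
#include <cassert>
#include <cmath>

// Cumulative normal distribution, written with the complementary error function.
// Assumption: the original CND may be a polynomial approximation instead.
double cnd(double d) {
    return 0.5 * std::erfc(-d / std::sqrt(2.0));
}

struct OptionPrice {
    double call;
    double put;
};

// European option prices. T (years to expiry) is an assumed extra input;
// v is the volatility (written as nu in the text).
OptionPrice blackScholes(double S, double X, double T, double r, double v) {
    assert(S > 0.0 && X > 0.0 && T > 0.0 && v > 0.0);
    double sqrtT = std::sqrt(T);
    double d1 = (std::log(S / X) + (r + 0.5 * v * v) * T) / (v * sqrtT);
    double d2 = d1 - v * sqrtT;
    double discountedX = X * std::exp(-r * T);  // strike discounted to today
    OptionPrice p;
    p.call = S * cnd(d1) - discountedX * cnd(d2);
    p.put = discountedX * cnd(-d2) - S * cnd(-d1);
    return p;
}
```

For S = 100, X = 100, T = 1, r = 0.05 and ν = 0.2 this gives a call price of about 10.45 and a put price of about 5.57, and the two satisfy put-call parity.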
/* Compare results */
…….
/* Clean up memory on host and device */
free(hOptPrice);
………..
cudaFree(dOptPrice);
…….
THREAD_N = blockDim.x * gridDim.x
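THREAD_N is the total number of threads in the launch, which is the natural stride when each thread handles several options. The following host-side C++ sketch emulates that access pattern so it can be checked without a GPU; it assumes the kernel loops from the thread's global index in steps of THREAD_N, which the excerpt does not show. The names `visitCounts`, `eachVisitedOnce` and `optN` are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Emulates on the host which options each (block, thread) pair would process
// if every thread starts at its global index and advances by THREAD_N.
std::vector<int> visitCounts(int optN, int gridDim, int blockDim) {
    assert(optN >= 0 && gridDim > 0 && blockDim > 0);
    const int THREAD_N = blockDim * gridDim;  // total threads in the launch
    std::vector<int> visits(optN, 0);
    for (int blockIdx = 0; blockIdx < gridDim; ++blockIdx) {
        for (int threadIdx = 0; threadIdx < blockDim; ++threadIdx) {
            int tid = blockDim * blockIdx + threadIdx;  // global thread index
            for (int opt = tid; opt < optN; opt += THREAD_N) {
                ++visits[opt];
            }
        }
    }
    return visits;
}

// True when every option was processed by exactly one thread.
bool eachVisitedOnce(const std::vector<int>& visits) {
    for (int v : visits) {
        if (v != 1) return false;
    }
    return true;
}
```

With this stride, every option is handled exactly once whether there are more options than threads or fewer.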
© NVIDIA Corporation 2008 11
Compile and run
Compile the example BlackScholes.cu:
Copying input data to GPU mem. Transfer time: 3.714000 msecs. (was 10.4)
Reading back GPU results... Transfer time: 2.607000 msecs. (was 15.4)
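#define PI 3.14159265358979323846264338327950288f
// Box-Muller transform: converts two uniform samples into two standard
// normal samples, in place. u1 must lie in (0, 1] so that logf(u1) is finite.
__device__ void BoxMuller(float& u1, float& u2){
    float r = sqrtf(-2.0f * logf(u1));
    float phi = 2 * PI * u2;
    u1 = r * cosf(phi);
    u2 = r * sinf(phi);
}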
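The same transform can be tried on the host. The sketch below is a plain C++ port of the device function above, with an added precondition check; the name `boxMullerHost` is hypothetical and not part of the original sample.

```cpp
#include <cassert>
#include <cmath>

// Host-side port of the Box-Muller transform: on entry u1 and u2 are uniform
// samples, on exit they hold two independent standard normal samples.
void boxMullerHost(float& u1, float& u2) {
    assert(u1 > 0.0f && u1 <= 1.0f);  // log(0) would give -infinity
    const float pi = 3.14159265358979f;
    float r = std::sqrt(-2.0f * std::log(u1));  // radius from the first sample
    float phi = 2.0f * pi * u2;                 // angle from the second sample
    u1 = r * std::cos(phi);
    u2 = r * std::sin(phi);
}
```

A quick sanity check is that the squared radius is preserved: after the call, u1² + u2² equals -2 ln(u1) computed from the input value of u1.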
d_Sum[iAccum] = sum;
d_Sum2[iAccum] = sum2;
}
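Keeping both a sum and a sum of squares is enough to recover the Monte Carlo estimate and its statistical error once the values are back on the host. The sketch below assumes that `sum` and `sum2` hold the accumulated payoffs and squared payoffs over `pathN` paths; the names `Estimate` and `estimateFromSums` are hypothetical, and discounting of the payoff is left out.

```cpp
#include <cassert>
#include <cmath>

struct Estimate {
    double mean;    // Monte Carlo estimate of the expected payoff
    double stdErr;  // standard error of that estimate
};

// Turns an accumulated sum and sum of squares into a mean and standard error.
Estimate estimateFromSums(double sum, double sum2, long pathN) {
    assert(pathN > 1);
    double n = static_cast<double>(pathN);
    double mean = sum / n;
    // Unbiased sample variance from the two running totals.
    double var = (sum2 - sum * sum / n) / (n - 1.0);
    if (var < 0.0) var = 0.0;  // guard against a tiny negative from rounding
    return { mean, std::sqrt(var / n) };
}
```

A 95% confidence half-width is then roughly 1.96 times `stdErr`.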
Accurate Floating-Point Summation
The standard way of summing a sequence of N numbers a_i is the
recursive formula:
  S_0 = 0
  S_i = S_{i-1} + a_i,   i = 1, ..., N
  S = S_N
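In floating point, the recursive formula rounds at every step, so the error can grow with N. One well-known remedy is compensated (Kahan) summation, sketched below next to the naive loop. This is an illustration only: the excerpt does not say which scheme the slides adopt, and the names `naiveSum` and `kahanSum` are hypothetical. The compensation only works when the compiler does not reassociate floating-point operations, so it should not be built with options such as `-ffast-math`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Direct implementation of the recursive formula S_i = S_{i-1} + a_i.
float naiveSum(const std::vector<float>& a) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum += a[i];
    }
    return sum;
}

// Kahan summation: c carries the low-order bits lost in each addition.
float kahanSum(const std::vector<float>& a) {
    float sum = 0.0f;
    float c = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        float y = a[i] - c;   // apply the correction from the previous step
        float t = sum + y;    // low-order bits of y may be lost here
        c = (t - sum) - y;    // recover what was lost (negated)
        sum = t;
    }
    return sum;
}
```

Summing ten million copies of 0.1f shows the difference: the compensated result stays within 1 of 1,000,000, while the naive single-precision sum drifts much further away.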
[Figure: Samples per second (log scale, 1.0E+07 to 1.0E+10) versus number of paths, with one curve each for 8, 16, 32, 64, 128, and 256 underlying options. An annotated copy of the chart marks one region "Excellent!" and another "Poor!".]
[Figure: Options per second (log scale, 100 to 10000) versus number of paths, with one curve each for 8, 16, 32, 64, 128, and 256 underlying options. An annotated copy of the chart marks one region "Poor!" and another "Excellent!".]
Final summation on the GPU using parallel reduction is a …
Read back a single sum and sum of squares for each thread block
[Figure: Samples per second (log scale, 1.0E+07 to 1.0E+10) versus number of paths, comparing the "Original" version with "Reduce on GPU".]
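The per-block reduction mentioned above can be pictured as a tree: at each step the upper half of the partial sums is added onto the lower half until one value remains. The host-side C++ sketch below emulates that pattern for a single thread block. It is an assumption about the general technique, not the kernel from the slides; `blockReduce` is a hypothetical name, and on the GPU the inner loop would run across threads with a `__syncthreads()` barrier between steps.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Emulates a shared-memory tree reduction within one thread block.
// The input holds one partial sum per thread; its size must be a power of two.
float blockReduce(std::vector<float> s) {
    std::size_t n = s.size();
    assert(n > 0 && (n & (n - 1)) == 0);
    for (std::size_t stride = n / 2; stride > 0; stride >>= 1) {
        // On the GPU, threads with pos < stride execute this body in parallel.
        for (std::size_t pos = 0; pos < stride; ++pos) {
            s[pos] += s[pos + stride];
        }
    }
    return s[0];  // the single value read back per block
}
```

Running the same reduction once on the sums and once on the sums of squares leaves two values per block to copy back, instead of two per thread.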
[Figure: Monte Carlo Samples Per Second for 64 Options. Samples per second (log scale, 1.0E+08 to 1.0E+11) versus number of paths, comparing the "Original" version with "Reduce on GPU".]
Good performance for …
Combine Options into a Single Kernel Launch