CS 179: GPU Computing: Lecture 16: Simulations and Randomness

This document discusses simulations and randomness, focusing on Monte Carlo methods and parallel random number generation. It introduces Monte Carlo simulations as a way to solve problems that are hard to solve directly, like estimating probabilities. As an example, it shows how Monte Carlo can estimate pi by randomly generating points and calculating the fraction that fall within a circle. It then discusses how the general Monte Carlo method works and how its trials can potentially be parallelized. Finally, it covers challenges with parallel random number generation and introduces an approach using multiple generator sequences.

Uploaded by

Rajul

CS 179: GPU Computing

Lecture 16: Simulations and Randomness
Simulations

Exa Corporation, http://www.exa.com/images/f16.png

South Bay Simulations, http://www.panix.com/~brosen/graphics/iacc.400.jpg

Max-Planck Institut, http://www.mpa-garching.mpg.de/gadget/hydrosims/

Flysurfer Kiteboarding, http://www.flysurfer.com/wp-content/blogs.dir/3/files/gallery/research-and-development/zwischenablage07.jpg
Simulations
• But what if your problem is hard to solve? e.g.
– EM radiation attenuation
– Estimating complex probability distributions
– Complicated ODEs, PDEs
• (e.g. option pricing in last lecture)
– Geometric problems w/o closed-form solutions
• Volume of complicated shapes
Simulations
• Potential solution: Monte Carlo methods
– Run simulation with randomly chosen inputs
• (Possibly according to some distribution)
– Do it again… and again… and again…
– Aggregate results
Monte Carlo example
• Estimating the value of π
– Quarter-circle of radius r:
• Area = $\pi r^2 / 4$
– Enclosing square:
• Area = $r^2$
– Fraction of area: π/4 ≈ 0.79

• “Solution”: Randomly generate lots of points, calculate fraction within circle
– Answer should be pretty close!

"Pi 30K" by CaitlinJo - Own work. This mathematical image was created with Mathematica. Licensed under CC BY 3.0 via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:Pi_30K.gif#/media/File:Pi_30K.gif
Monte Carlo example
• Pseudocode:
(simulate on N points)
(assume r = 1)

points_in_circle = 0
for i = 0,…,N-1:
    randomly pick point (x,y) from
        uniform distribution in [0,1]^2
    if x^2 + y^2 < 1:    (i.e. (x,y) is in circle)
        points_in_circle++

return (points_in_circle / N) * 4
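The pseudocode above could be written in C roughly as follows. This is a minimal sketch, not the course's reference code: the function name `estimate_pi` and the use of the C library `rand()` as the uniform source are assumptions made for brevity.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Estimate pi from n_points uniform samples in the unit square.
   Using rand() as the random source is an assumption of this sketch. */
double estimate_pi(uint64_t n_points, unsigned seed)
{
    assert(n_points > 0);
    uint64_t points_in_circle = 0;
    srand(seed);
    for (uint64_t i = 0; i < n_points; i++) {
        /* Map rand() into [0, 1] for each coordinate. */
        double x = (double)rand() / RAND_MAX;
        double y = (double)rand() / RAND_MAX;
        if (x * x + y * y < 1.0)
            points_in_circle++;
    }
    /* The fraction inside the quarter-circle approximates pi/4. */
    return 4.0 * (double)points_in_circle / (double)n_points;
}
```

Calling `estimate_pi(1000000, 42)` should give a value close to 3.14; the exact digits depend on the platform's `rand()` implementation.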

Planetary Materials Microanalysis Facility, Northern Arizona University, http://www4.nau.edu/microanalysis/microprobe-sem/Images/Monte_Carlo.jpg

Center for Air Pollution Impact & Trend Analysis, Washington University in St. Louis, http://www4.nau.edu/microanalysis/microprobe-sem/Images/Monte_Carlo.jpg

http://www.cancernetwork.com/sites/default/files/cn_import/n0011bf1.jpg
General Monte Carlo method
• Pseudocode:
for (number of trials):
    randomly pick value from a probability distribution
    perform deterministic computation on inputs

(aggregate results)
General Monte Carlo method
• Why it works:
– Law of large numbers!
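As a worked instance for the π example, let $I_i$ be 1 when point i lands inside the quarter-circle and 0 otherwise (the symbols $I_i$ and $\hat{\pi}_N$ are notation introduced here, not from the slides). The law of large numbers gives convergence of the estimate, and the variance of a Bernoulli average gives a rough error size:

```latex
\hat{\pi}_N = \frac{4}{N}\sum_{i=1}^{N} I_i, \qquad
\mathbb{E}[I_i] = \frac{\pi}{4}
\;\Longrightarrow\;
\hat{\pi}_N \xrightarrow{\;N \to \infty\;} \pi \quad \text{(almost surely)}

\operatorname{Var}(\hat{\pi}_N)
  = \frac{16}{N}\cdot\frac{\pi}{4}\left(1 - \frac{\pi}{4}\right)
\;\Longrightarrow\;
\text{standard error} \approx \frac{1.64}{\sqrt{N}}
```

So each extra decimal digit of accuracy costs about 100 times more trials, which is one reason running many trials in parallel is attractive.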
General Monte Carlo method
• Pseudocode:
for (number of trials):
    randomly pick value from a probability distribution
    perform deterministic computation on inputs

(aggregate results)

• Can we parallelize this?
– Trials: yes, they are independent
– Aggregating results: usually so (e.g. with reduction)
– Randomly picking values: what about this?
Parallelized Random Number Generation
Early Credits
• Algorithm and presentation based on:

– “Parallel Random Numbers: As Easy as 1, 2, 3”

• (Salmon, Moraes, Dror, Shaw) at D. E. Shaw Research
• Developed for biomolecular simulations on Anton
(massively parallel ASIC-based supercomputer)
• Also applicable to CPUs, GPUs
Random Number Generation
• Generating random data computationally
is hard
– Computers are deterministic!

https://cdn.tutsplus.com/vector/uploads/legacy/tuts/165_Shiny_Dice/27.jpg
Random Number Generation
• Two methods:
– Hardware random number generator
• aka TRNG (“True” RNG)
• Uses data collected from environment (thermal,
optical, etc)
• Very slow!
– Pseudorandom number generator (PRNG)
• Algorithm that produces “random-looking”
numbers
• Faster – limited by computational power
Demonstration
Random Number Generation
• PRNG algorithm should be:
– High-quality
• Produce “good” random data
– Fast
• (In its own right)
– Parallelizable!

• Can we do it?
– (Assume selection from uniform distribution)
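A Very Basic PRNG
• “Linear congruential generator” (LCG)
– e.g. C’s rand() // from glibc

/* state[0] holds the previous output X_n */
int32_t val = state[0];

/* X_{n+1} = (a * X_n + c) mod 2^31, with a = 1103515245, c = 12345 */
val = ((state[0] * 1103515245) + 12345)
      & 0x7fffffff;
state[0] = val;    /* save X_{n+1} as the new state */
*result = val;     /* and hand it back to the caller */

– General formula:
$X_{n+1} = (a X_n + c) \bmod m$

• $X_0$ is the “seed” (e.g. system time)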
• Non-parallelizable recurrence relation: each value depends on the previous one!
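To make the serial dependency concrete, here is a small sketch of the same recurrence as a standalone step function. The names `lcg_next` and `lcg_fill` are hypothetical, and unsigned arithmetic is used here (unlike the quoted glibc fragment) so that overflow is well defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One LCG step with a = 1103515245, c = 12345, m = 2^31. */
uint32_t lcg_next(uint32_t x)
{
    return (x * 1103515245u + 12345u) & 0x7fffffffu;
}

/* Fill out[0..count-1] with successive values starting from seed. */
void lcg_fill(uint32_t seed, uint32_t *out, size_t count)
{
    assert(out != NULL || count == 0);
    uint32_t x = seed;
    for (size_t i = 0; i < count; i++) {
        x = lcg_next(x);   /* needs the previous x: a serial dependency */
        out[i] = x;
    }
}
```

Iteration i of the loop cannot start until iteration i-1 has finished, which is the obstacle the following slides work around.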
Linear congruential generators
$X_{n+1} = (a X_n + c) \bmod m$

• Not high quality!

– Clearly non-uniform

• Fast to compute
• Not parallelizable!

"Lcg 3d". Licensed under CC BY-SA 3.0 via Wikimedia Commons -

http://commons.wikimedia.org/wiki/File:Lcg_3d.gif#/media/File:Lcg_3d.gif
Measures of RNG quality
• Impossible to prove a sequence is “random”

• Possible tests:
– Frequency
– Periodicity - do the values repeat too early?
– Linear dependence
–…
PRNG Parallelizability
• Many PRNGs (like the LCG) have a non-parallelizable appearance:

$X_{n+1} = f(X_n)$

– (Better chance of good data when):

• All $X_i$ in some large state space
• Complicated function f
PRNG Parallelizability
• Possible “approach” to GPU parallelization:
– Assign a PRNG to each thread!
• Initialize with e.g. different $X_0$
• Thread 0 produces sequence $X_{n+1,0} = f(X_{n,0})$
• Thread 1 produces sequence $X_{n+1,1} = f(X_{n,1})$
• …

– In practice, often cannot get high quality
• Repeated values, lack of good, enumerable parameters
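One way the "repeated values" problem can show up: if one thread's seed happens to be a value another thread will produce, the two streams are the same sequence shifted in time. The sketch below illustrates this with the LCG constants quoted earlier in the lecture; `count_shifted_matches` is a hypothetical helper, not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same LCG step as the glibc-style rand() quoted earlier. */
uint32_t lcg_next(uint32_t x)
{
    return (x * 1103515245u + 12345u) & 0x7fffffffu;
}

/* Count positions where stream B (from seed_b) repeats stream A (from
   seed_a) delayed by one step, i.e. B[i] == A[i + 1], over len outputs. */
size_t count_shifted_matches(uint32_t seed_a, uint32_t seed_b, size_t len)
{
    assert(len > 0);
    uint32_t a = lcg_next(lcg_next(seed_a)); /* A[1] */
    uint32_t b = lcg_next(seed_b);           /* B[0] */
    size_t matches = 0;
    for (size_t i = 0; i < len; i++) {
        if (a == b)
            matches++;
        a = lcg_next(a);
        b = lcg_next(b);
    }
    return matches;
}
```

With `seed_b = lcg_next(seed_a)`, every one of the `len` positions matches, so the two "independent" threads would be feeding the simulation almost identical data.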
PRNG Parallelizability
• Instead of:
$X_{n+1} = f(X_n)$

• Suppose we had:
$X_{n+1} = b(n)$

– This is parallelizable! (Without our previous “trick”)

• Is this possible?
More General PRNG
• “Keyed” PRNG given by:
– Transition function: $f: S \to S$
– Output function: $g: K \times S \to U$

• S: Internal (hidden) state space
• U: Output space
• K: “Key space”
– Can “seed” output behavior without relying on $X_0$ alone; useful for scientific reproducibility!

• If S has J times more bits than U, can produce J outputs per transition.
– Assume J = 1 in this lecture
More General PRNG
• “Keyed” PRNG given by:
– Transition function: $f: S \to S$
– Output function: $g: K \times S \to U$

– “Trivial” example: LCG $X_{n+1} = (a X_n + c) \bmod m$

• $f(X_n) = (a X_n + c) \bmod m$
• $g(X_n) = X_n$

• S is (for example) the space of 32-bit integers
• U = S
• K is “trivial” (no keys used)
• f is more complicated than g!

More General PRNG
• “Keyed” PRNG given by:
– Transition function: $f: S \to S$
– Output function: $g: K \times S \to U$

– General theme: f is complicated, g is simple

• What if we flipped that?

• What if f were so simple that it could be evaluated explicitly?
More General PRNG
• i.e. what if we had:
– Simple transition function (p-bit integer state space):
$f(s) = (s + 1) \bmod 2^p$

• This is just a counter! Can expand into an explicit formula for the state after n steps, starting from $n_0$:

$s_n = (n + n_0) \bmod 2^p$

• These form counter-based PRNGs

– Complicated output function g

• Would this work?
More General PRNG
• i.e. what if we had:
– Simple transition function f
– Complicated output function g(k, n)
• Should be bijective with respect to n
– Guarantees period of $2^p$
• Shouldn’t be too difficult to compute
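The shape of a counter-based generator can be sketched in a few lines of C. Everything here is illustrative: `mix64` and `cbrng` are hypothetical names, and the mixer is a simple invertible bit-scrambler (its odd multipliers are the ones commonly quoted for the SplitMix64 finalizer), not AES or any generator from the paper. It only demonstrates the structure "trivial f, bijective g(k, n)".

```c
#include <stdint.h>

/* Bijective 64-bit mixer: each step (xor with a right shift, multiply by
   an odd constant modulo 2^64) is invertible, so the composition is too. */
uint64_t mix64(uint64_t z)
{
    z ^= z >> 30;
    z *= 0xbf58476d1ce4e5b9ULL;
    z ^= z >> 27;
    z *= 0x94d049bb133111ebULL;
    z ^= z >> 31;
    return z;
}

/* Toy output function g(k, n): bijective in the counter n for a fixed
   key k. This is NOT AES, Threefry, or Philox, and is not a vetted RNG. */
uint64_t cbrng(uint64_t key, uint64_t counter)
{
    return mix64(mix64(counter) ^ key);
}
```

Because the n-th output depends only on `key` and `counter`, thread t could compute `cbrng(key, t * trials_per_thread + i)` directly, with no shared state and no need to step through earlier values.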
Bijective Functions
• Cryptographic block ciphers!
– AES (Advanced Encryption Standard), Threefish, …

– Must be bijective!
• (Otherwise messages can’t be encrypted/decrypted)
AES-128 Algorithm
• 1) Key Expansion
– Determine all keys k from initial cipher key $k_B$
• Used to strengthen weak keys

Sohaib Majzoub and Hassan Diab, Reconfigurable Systems for Cryptography and Multimedia Applications, http://www.intechopen.com/source/html/38442/media/image19_w.jpg
AES-128 Algorithm
• 2) Add round key
– Bitwise XOR state s with key $k_0$

By User:Matt Crypto - Own work. Licensed under Public Domain via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:AES-AddRoundKey.svg#/media/File:AES-AddRoundKey.svg
AES-128 Algorithm
• 3) For each round… (10 rounds total)
– a) Substitute bytes
• Use lookup table (the S-box) to replace each byte with another value

AES-128 Algorithm
• 3) For each round…
– b) Shift rows

AES-128 Algorithm
• 3) For each round…
– c) Mix columns
• Multiply by constant matrix

AES-128 Algorithm
• 3) For each round…
– d) Add round key (as before)

AES-128 Algorithm
• 4) Final round
– Do everything in normal round except mix columns
AES-128 Algorithm
• Summary:
– 1) Expand keys
– 2) Add round key
– 3) For each round (10 rounds total)
• Substitute bytes
• Shift rows
• Mix columns
• Add round key
– 4) Final round:
• (do everything except mix columns)
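Three of the round steps summarized above are short enough to sketch directly. This is an illustrative fragment, not a complete or hardened AES: the substitute-bytes step and key expansion are omitted, and the function names are chosen here for readability. The state is assumed to be 16 bytes stored column by column, as in the AES specification.

```c
#include <stdint.h>
#include <string.h>

/* State layout: byte (row r, column c) lives at index r + 4*c. */

/* Add round key: XOR the state with a 16-byte round key. */
void add_round_key(uint8_t state[16], const uint8_t round_key[16])
{
    for (int i = 0; i < 16; i++)
        state[i] ^= round_key[i];
}

/* Shift rows: rotate row r left by r positions. */
void shift_rows(uint8_t state[16])
{
    uint8_t old[16];
    memcpy(old, state, 16);
    for (int r = 0; r < 4; r++)
        for (int c = 0; c < 4; c++)
            state[r + 4 * c] = old[r + 4 * ((c + r) % 4)];
}

/* Multiply by 2 in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1. */
static uint8_t xtime(uint8_t a)
{
    return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0x00));
}

/* Mix columns: multiply each column by the constant matrix
   [2 3 1 1; 1 2 3 1; 1 1 2 3; 3 1 1 2] over GF(2^8). */
void mix_columns(uint8_t state[16])
{
    for (int c = 0; c < 4; c++) {
        uint8_t *a = &state[4 * c];
        uint8_t a0 = a[0], a1 = a[1], a2 = a[2], a3 = a[3];
        a[0] = (uint8_t)(xtime(a0) ^ (xtime(a1) ^ a1) ^ a2 ^ a3);
        a[1] = (uint8_t)(a0 ^ xtime(a1) ^ (xtime(a2) ^ a2) ^ a3);
        a[2] = (uint8_t)(a0 ^ a1 ^ xtime(a2) ^ (xtime(a3) ^ a3));
        a[3] = (uint8_t)((xtime(a0) ^ a0) ^ a1 ^ a2 ^ xtime(a3));
    }
}
```

Each step is a bijection on the 16-byte state (XOR with a fixed key, a permutation of byte positions, and an invertible matrix multiply), which is why the whole cipher is bijective for a fixed key.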
Algorithmic Improvements
• We have a good PRNG!
– Simple transition function f
• Counter
– Complicated output function g(k, n)
• AES-128

– High quality!
• Passes Crush test suite (more on that later)
– Parallelizable!
• f and g only depend on k, n !
– Sort of slow to compute
• AES is sort of slow without special instructions (which GPUs don’t have)
Algorithmic Improvements
• Can we “make AES go faster”?
– AES is a cryptographic algorithm, but we’re using it
for PRNG
– Can we change the algorithm for our purposes?
AES-128 Algorithm
• Summary:
– 1) Expand keys
• Purpose of this step is to hide key from attacker using chosen plaintext. Not relevant here.
– 2) Add round key
– 3) For each round (10 rounds total)
• Do we really need this many rounds?
• Substitute bytes
• Shift rows
• Mix columns
• Add round key
– 4) Final round:
• (do everything except mix columns)
• Other changes?
Key Schedule Change
• New key schedule:
– $k_0 = k_B$
– $k_{i+1} = k_i + \text{constant}$
• e.g. golden ratio

• Old key schedule (copied from Wikipedia, Rijndael Key Schedule):
– The first n bytes of the expanded key are simply the encryption key.
– The rcon iteration value i is set to 1
– Until we have b bytes of expanded key, we do the following to generate n more bytes of expanded key:
• We do the following to create 4 bytes of expanded key:
– We create a 4-byte temporary variable, t
– We assign the value of the previous four bytes in the expanded key to t
– We perform the key schedule core (see above) on t, with i as the rcon iteration value
– We increment i by 1
– We exclusive-OR t with the four-byte block n bytes before the new expanded key. This becomes the next 4 bytes in the expanded key
• We then do the following three times to create the next twelve bytes of expanded key:
– We assign the value of the previous 4 bytes in the expanded key to t
– We exclusive-OR t with the four-byte block n bytes before the new expanded key. This becomes the next 4 bytes in the expanded key
• If we are processing a 256-bit key, we do the following to generate the next 4 bytes of expanded key:
– We assign the value of the previous 4 bytes in the expanded key to t
– We run each of the 4 bytes in t through Rijndael's S-box
– We exclusive-OR t with the 4-byte block n bytes before the new expanded key. This becomes the next 4 bytes in the expanded key.
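By comparison with the old schedule, the new one is almost trivial to write down. The sketch below is an assumption-laden illustration: the `key128` layout, the function name, and the choice to add the same 64-bit golden-ratio constant to both halves are all made up here for clarity and may differ from what the paper's generator actually does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 128-bit key held as two 64-bit halves (an illustrative layout). */
typedef struct { uint64_t lo, hi; } key128;

/* floor(2^64 / golden ratio), used here as the additive constant. */
#define GOLDEN_GAMMA 0x9e3779b97f4a7c15ULL

/* Fill round_keys[0..rounds] with k_0 = k_B and k_{i+1} = k_i + constant.
   Adding the same constant to both halves is an assumption of this sketch. */
void simple_key_schedule(key128 base, key128 *round_keys, size_t rounds)
{
    assert(round_keys != NULL);
    round_keys[0] = base;
    for (size_t i = 0; i < rounds; i++) {
        /* Unsigned addition wraps modulo 2^64. */
        round_keys[i + 1].lo = round_keys[i].lo + GOLDEN_GAMMA;
        round_keys[i + 1].hi = round_keys[i].hi + GOLDEN_GAMMA;
    }
}
```

Since $k_i = k_B + i \cdot \text{constant}$, a round key could also be computed on the fly without storing a table, and no S-box lookups are needed.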

AES-128 Algorithm
• Summary:
– 1) Expand keys using simplified algorithm
– 2) Add round key
– 3) For each round (5 rounds total, down from 10)
• Substitute bytes
• Shift rows
• Mix columns
• Add round key
– 4) Final round:
• (do everything except mix columns)
• Other simplifications possible!
Algorithmic Improvements
• We have a good PRNG!
– Simple transition function f
• Counter
– Complicated output function g(k, n)
• Modified AES-128 (known as ARS-5)

– High quality!
• Passes Crush test suite (more on that later)
– Parallelizable!
• f and g only depend on k, n !
– Moderately faster to compute
Even faster parallel PRNGs
• Use a different g, e.g.
– Threefish cipher
• Optimized for PRNG – known as “Threefry”
– “Philox”
• (see paper for details)
• 202 GB/s on GTX580!
– Fastest known PRNG in existence
General Monte Carlo method
• Pseudocode:
for (number of trials):
    randomly pick value from a probability distribution
    perform deterministic computation on inputs

(aggregate results)

• Can we parallelize this?
– Trials: yes, they are independent
– Aggregating results: usually so (e.g. with reduction)
– Randomly picking values: yes!

– Yes!
– Part of cuRAND
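To tie the pieces together, here is a CPU-side sketch of the π estimate in which every trial draws its random numbers from a counter-based function, so the work can be split into chunks in any way. It is written in plain C rather than CUDA or cuRAND; the loop over workers stands in for independent threads, and `mix64`, `cbrng`, `count_hits`, and `estimate_pi_chunked` are hypothetical names with a toy mixer, not a vetted generator.

```c
#include <assert.h>
#include <stdint.h>

/* Toy bijective mixer and keyed output function (not AES or Philox). */
uint64_t mix64(uint64_t z)
{
    z ^= z >> 30;  z *= 0xbf58476d1ce4e5b9ULL;
    z ^= z >> 27;  z *= 0x94d049bb133111ebULL;
    z ^= z >> 31;
    return z;
}

uint64_t cbrng(uint64_t key, uint64_t counter)
{
    return mix64(mix64(counter) ^ key);
}

/* Convert the top 53 bits of a 64-bit value to a double in [0, 1). */
static double to_unit(uint64_t bits)
{
    return (double)(bits >> 11) * (1.0 / 9007199254740992.0); /* 2^-53 */
}

/* Hits for trials [first, first + count): the work of one worker.
   Trial i always uses counters 2i and 2i+1, whoever runs it. */
uint64_t count_hits(uint64_t key, uint64_t first, uint64_t count)
{
    uint64_t hits = 0;
    for (uint64_t i = first; i < first + count; i++) {
        double x = to_unit(cbrng(key, 2 * i));
        double y = to_unit(cbrng(key, 2 * i + 1));
        if (x * x + y * y < 1.0)
            hits++;
    }
    return hits;
}

/* Split n_trials into n_workers equal chunks and reduce the counts. */
double estimate_pi_chunked(uint64_t key, uint64_t n_trials, uint64_t n_workers)
{
    assert(n_trials > 0 && n_workers > 0 && n_trials % n_workers == 0);
    uint64_t chunk = n_trials / n_workers, total = 0;
    for (uint64_t w = 0; w < n_workers; w++)
        total += count_hits(key, w * chunk, chunk);   /* reduction step */
    return 4.0 * (double)total / (double)n_trials;
}
```

Because each trial's inputs depend only on the key and the trial index, the result is identical whether the trials run in 1 chunk or 8, which also makes runs reproducible regardless of how many threads are used.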
Summary
• Monte Carlo methods
– Very useful in scientific simulations
– Parallelizable because of…

• Parallelized random number generation

– Another story of “parallel algorithm analysis”
Credits (again)
• Parallel RNG algorithm and presentation
based on:

– “Parallel Random Numbers: As Easy as 1, 2, 3”

• (Salmon, Moraes, Dror, Shaw) at D. E. Shaw Research
