0% found this document useful (0 votes)
42 views53 pages

DEM Lecture0224

DEM

Uploaded by

Karunyamaya
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
42 views53 pages

DEM Lecture0224

DEM

Uploaded by

Karunyamaya
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 53

Discrete Element

Method
Midterm Project: Option 2

Motivation


Industries:






Problem
examples:



NASA, loyalkng.com,
cowboymining.com

Mining
Food &
Pharmaceutics
Film & Game
etc.

Collapsing Silos
Mars Rover
etc.

Discrete Element Method




Collision detection determines pairs of colliding bodies

Contact forces computed based on constitutive relation


(spring-damper model)

Requires small time-steps

Newtons Second Law used to compute accelerations

Numerical integration (e.g., Velocity Verlet) used to compute


velocity, position of all bodies
3

Discrete Element Method


Loop
tstart to
tend
Particle
Initialization

Collision
Detection

Contact
Force
Calculation

Newtons
2nd Law

Velocity
and
Position
Analysis

Output
Data

Next time step

Collision
Detection

DEM



Contact
Force
Calculation

Newtons
2nd Law

Velocity
and
Position
Analysis

Spatial Subdivision
ri , rj
2 particles:
rij = ri rj

If rij d
ij = d rij

n ij =


rij
rij

Otherwise no collision
5

DEM

Collision
Detection

Contact
Force
Calculation

Newtons
2nd Law

Velocity
and
Position
Analysis

Collision detection code will be


provided to you

Input: Arrays of sphere positions and


radii

Output: Array of collision data

DEM


Contact
Force
Calculation

Newtons
2nd Law

Velocity
and
Position
Analysis

Contact force
components



Collision
Detection

normal
tangential

Four different
categories:





Continuous potential
models
Linear viscoelastic models
Non-linear viscoelastic
models

Collision
Detection

DEM

Contact
Force
Calculation

Velocity
and
Position
Analysis

Newtons
2nd Law

vij = vi v j
meff =

mi m j
(mi + m j )

v n = (vij n ij )n ij
ij

Fn
Normal Force
ij

ij

=
f
computed Fas:
n ij
d

(kn ij n ij n meff vn ij )

kn spring stiffness
n damping coefficient
8

DEM


Collision
Detection

Contact
Force
Calculation

Velocity
and
Position
Analysis

Newtons
2nd Law

Force on one particle is the sum of its contact forces and


gravity:

( )

Fitot = mi g + Fn
j

Calculation acceleration:
Fitot

= mi a i a i =

ij

Fitot
mi

DEM

Collision
Detection

Contact
Force
Calculation

Newtons
2nd Law

Velocity
and
Position
Analysis

Use explicit numerical integration methods like


Explicit Euler or Velocity Verlet Integration

Explicit Euler:

ri (t + t ) = ri (t ) + vi (t )t
vi (t + t ) = vi (t ) + a i (t )t

10

Parallelism





Parallel collision detection (provided)


(Per-contact): Compute collision forces
(Per-body): Reduction to resultant force per body
(Per-body): Solution of Newtons Second Law, time
integration

11

Example

1 million
spheres
0.5 sec long
simulation
~12,000 sec
computational
time
GPU

12

Suggested Code Structure




Class ParticleSystem







void initializeSim()
void performCD()
void computeForces()
void integrate()
void getGPUdata()
void outputState()

13

void initializeSim()



Set initial conditions of all bodies


Copy state data from host to device

void performCD()


Call GPU CD function (provided) to determine pairs of


colliding spheres
Returns array of contact_data structs
 data members: objectIdA, objectIdB

14

void computeForces()




Compute contact force for each contact


Compute resultant force acting on each body
Compute and add reaction force for contact
with boundary planes

void integrate()



Compute acceleration of each body


Update velocity and position of each body
15

void getGPUdata()


Copy state data back to host

void outputState()


Output sphere positions and radii to a text file

16

main function
int main(int argc, char* argv[])
{
float t_curr=0.0f;
float t_end=1.0f;
float h=0.00005f;
ParticleSystem *psystem = new ParticleSystem();
psystem->initializeSim();
while(t_curr<=t_end)
{
psystem->performCD();
psystem->computeForces();
psystem->integrate();
t_curr+=h;
}
delete psystem;
return 0;
}
17

Other Tips (Force computation)


1.

Compute force for each contact with one


thread per contact


2.

Store key-value array with body ID as key, force


as value
Note each contact should create a force on two
bodies

Sort by key (body ID)




thrust::sort_by_key()

18

Other Tips (Force computation)


3.

Sum all forces acting on a single body





4.

thrust::reduce_by_key()
One thread per entry in output, copy to
appropriate place in net force list

Add gravity force to each bodys net force




One thread per body

19

Other Tips (Force computation)


5.

Contact with planes


Assume infinite planes
A plane is defined by a point (p) and normal
direction (N)
One thread per sphere (at position r)









Compute d = N (r p)
Contact if d<radius
Compute force as before, add to net force

20

Parallel Collision Detection

Overview


Method 1: Brute Force





Easier implementation
O(N2) Complexity

Method 2: Parallel Binning





More involved
O(N) Complexity

22

Brute Force Approach




Three Steps:


Run preliminary pass to understand the memory


requirements by figuring out the number of contacts present

Allocate on the device the required amount of memory to


store the desired collision information

Run actual collision detection and populate the data structure


with the information desired

23

Step 1: Search for contacts




Create on the device an array of unsigned integers, equal in


size to the number N of bodies in the system



Call this array dB, initialize all its entries to zero


Array dB to store in entry j the number of contacts that body j will
have with bodies of higher index


If body 5 collides with body 9, no need to say that body 9 collides


with body 5 as well

Do in parallel, one thread per body basis


for body j, loop from k=j+1 to N
if bodies j and k collide, dB[j] += 1
endloop
endDo
24

Step 1, cont.

25

Step 2: Parallel Scan Operation




Allocate memory space for the collision information




Step 2.1: Define first a structure that might help (this is not the most
efficient approach, but well go along with it)
struct collisionInfo {
float3 rA;
float3 rB;
float3 normal;
unsigned int indxA;
unsigned int indxB;
}

Step 2.2: Run a parallel inclusive prefix scan on dB, which gets
overwritten during the process

Step 2.3: Based on the last entry in the dB array, which holds the total
number of contacts, allocate from the host on the device the amount of
memory required to store the desired collision information. To this end
youll have to use the size of the struct collisionInfo. Call this array
dCollisionInfo.

26

Step 3


Parallel pass on a per body basis (one thread per body similar
to step 1)


Thread j (associated with body j), computes its number of contacts as


dB[j]-dB[j-1], and sets the variable contactsProcessed=0

Thread j runs a loop for k=j+1 to N

If body j and k are in contact, populate entry


dCollisionInfo[dB[j-1]+contactsProcessed] with this contacts info and
increment contactsProcesed++


Note: you can break out of the look after k as soon as


contactsProcesed== dB[j]-dB[j-1]
27

Concluding Remarks, Brute Force




Level of effort for discussed approach




Step 1, O(N2) (checking body against the rest of the bodies)

Step 2: prefix scan is O(N)

Step 3, O(N2) (checking body against the rest of the bodies, basically a
repetition of Step 1)

No use of the atomicAdd, which is a big performance bottleneck

Numerous versions of this can be contrived to improve the overall


performance


Not discussed here for this brute force idea, rather moving on to a different
approach altogether, called binning
28

Parallel Binning

29

Collision Detection: Binning




Very similar to the idea presented by LeGrand in GPU-Gems 3

30,000 feet perspective:




Do a spatial partitioning of the volume occupied by the bodies




Place bodies in bins (cubes, for instance)

Do a brute force for all bodies that are touching a bin

Taking the bin to be small means that chances are youll not have
too many bodies inside any bin for the brute force stage

Taking the bins to be small means youll have a lot of them


30

Collision Detection (CD): Binning




Example: 2D collision detection, bins are squares

Body 4 touches bins A4, A5, B4, B5


Body 7 touches bins A3, A4, A5, B3, B4, B5, C3, C4, C5
In proposed algorithm, bodies 4 and 7 will be checked for collision
several times: by threads associated with bin A4, A5, B4.




31

CD: Binning


The method draws on




Parallel Sorting


Parallel Exclusive Prefix Scan




Implemented with O(N) work (NVIDIA tech report, also SDK


particle simulation demo)

Implemented with O(N) work (NVIDIA SDK example)

The extremely fast binning operation for the simple convex


geometries that well be dealing with


On a rectangular grid it is very easy to figure out where the CM


(center of mass) of a simple convex geometry will land

32

Binning: The Method




Notation Use:






N number of bodies
Nb number of bins
Na number of active bins
pi - body i
bj bin j

Stage 1: body parallel




Parallelism: one thread per body

Kernel arguments: grid definition






xmin, xmax, ymin, ymax, zmin, zmax


hx, hy, hz (grid size in 3D)
Can also be placed in constant memory,
will end up cached

zmax
hz

xmin
ymin
zmin

ymax

hx

hy
xmax

33

Stage 1: # Bin-Body Contacts







Purpose: find the number of bins touched by each


body in the problem
Store results in the T, array of N integers
Key observation: its easy to bin bodies

34

Stage 2: Parallel Exclusive Scan




Run a parallel exclusive scan on the array T




Save to the side the number of bins touched by the last body, needed
later, otherwise overwritten by the scan operation. Call this value blast
In our case, if you look carefully, blast = 6

Complexity of Stage: O(N), based on parallel scan algorithm of


Harris, see GPU Gem 3 and CUDA SDK

Purpose: determine the amount of


entries M needed to store the indices
of all the bins touched by each body
in the problem

35

Stage 3: Determine body-&-bin association

Allocate an array B of M pairs of integers.




The key (first entry of the pair), is the bin index

The value (second entry of pair) is the body


that touches that bin

Stage is parallel, on a per-body basis

36

Stage 4: Sort

In parallel, run a radix sort


to order the B array
according to the keys
Work load: O(N)

37

Stage 5-8: Find # of Bodies/Bin

38

Purpose: Find the number of bodies per each active


bin and the location of the active bins in B.

Stage 5-8: Find # of Bodies/Bin







Stage 5: Host allocates C, an array of unsigned


integers of length Nb , on device and Initializes it by
the largest possible integer.
Run in parallel, on a per bin basis, find the start
location of each sequence. Write the location to the
corresponding entry of C-value.
Stage 6: Run parallel radix sort to sort C-value.
Stage 7: Find the location of the first inactive bin.


39

To save memory, C can be resized.

Stage 8: Find out nbpbk (number of bodies per bin


k) and store it in entry k of C, as the key associated
with this pair.

Stage 9: Sort C for Load Balancing






Do a parallel radix sort on the array C based on the key


Purpose: balance the load during next stage
NOTE: this stage might or might not be carried out if the load
balancing does not offset the overhead associated with the
sorting job
Effort: O(Na)

The Value
C-array

10

...

The Key

...

A1

A2

A3

A4

A5

B1

...

40

Stage 10: Investigate Collisions in each Bin




Carried out in parallel, one thread per bin

To store information generated during this stage, host needs to


allocate an unsigned integer array D of length Nb



Array D stores the number of actual contacts occurring in each bin


D is in sync with (linked to) C, which in turn is sync with (linked to) B

Parallelism: one thread per bin






Thread k reads the pair key-value in entry k of array C


Thread k reads does rehearsal for brute force collision detection
Outcome: the number s of active collisions taking place in a bin


Value s stored in kth entry of the D array

41

Stage 10, details




In order to carry out this stage you need to keep in mind how C is
organized, which is a reflection of how B is organized

The drill: thread 0 relies on info at C[0], thread 1


relies on info at C[1], etc.
Lets see what thread 2 (goes with C[2]) does:

Read the first 2 bodies that start at offset 6 in B.




These bodies are 4 and 7, and as B indicates, they


touch bin A4
Bodies 4 and 7 turn out to have 1 contact in A4,
which means that entry 2 of D needs to reflect this

42

Stage 10, details




In order to carry out this stage you need to keep in mind how C is
organized, which is a reflection of how B is organized

The drill: thread 0 relies on info at C[0], thread 1


relies on info at C[1], etc.
Lets see what thread 2 (goes with C[2]) does:

Read the first 2 bodies that start at offset 6 in B.




These bodies are 4 and 7, and as B indicates, they


touch bin A4
Bodies 4 and 7 turn out to have 1 contact in A4,
which means that entry 2 of D needs to reflect this

43

Stage 10, details




Brute Force CD rehearsal




Carried out to understand the memory requirements associated with


collisions in each bin


Finds out the total number of contacts owned by a bin

Key question: which bin does a contact belong to?




Answer: It belongs to bin containing the CM of the Contact Volume (CMCV)

Zoom
in...

44

Stage 10, Comments




Two bodies can have multiple contacts, which is ok

Easy to define the CMCV for two spheres, two ellipsoids, and a couple of other
simple geometries


In general finding CMCV might be tricky




Notice picture below, CM of 4 is in A5, CM of 7 is in B4 and CMCV is in A4

Finding the CMCV is the subject of the so called narrow phase collision detection


Itll be simple in our case since we are going to work with simple geometry primitives

45

Stage 11: Exclusive Prefix Scan




Save to the side the number of contacts


in the last bin (last entry of D) dlast


Last entry of D will get overwritten

Run parallel exclusive prefix scan on D:

Total number of actual collisions:


Nc = D[Nb] + dlast

46

Stage 12: Populate Array E




From the host, allocate on the device memory for array E





Array E stores the required collision information: normal, two tangents, etc.
Number of entries in the array: Nc (see previous slide)

In parallel, on a per bin basis (one thread/bin):




Populate the E array with required info

Not discussed in greater detail, this is just like Stage 7, but now you have to
generate actual collision info (stage 7 was the rehearsal)

Thread for A4 will generate the info for contact c


Thread for C2 will generate the info for i and d
Etc.




47

Stage 12, details




B, C, D required to populate array E with collision information

C and B are needed to compute the


collision information
D is needed to understand where the
collision information will be stored in E
48

Stage 12, Comments




In this stage, parallelism is on a per bin basis




Each thread picks up one entry in the array C

Based on info in C you pick up from B the bin id and bodies touching this bin

Based on info in B you run brute force collision detection




You run brute force CD for as long as necessary to find the number of
collisions specified by array D
Note that in some cases there are no collisions, so you exit without doing
anything

As you compute collision information, you store it in array E


49

Parallel Binning: Summary of Stages




Stage 1: Find number of bins touched by each body, populate T (body parallel)

Stage 2: Parallel exclusive scan of T (length of T: N)

Stage 3: Determine body-to-bin association, populate B (body parallel)

Stage 4: Parallel sort of B (length of B: M)

Stage 5: Find active bins, poputale C-value (bin parallel)

Stage 6: Parallel sort of C-value (bin parallel)

Stage 7: Find and remove inactive bins (bin parallel)

Stage 8: Find number of bodies per active bin (bin parallel)

Stage 9: Parallel sort of C for load balancing (length of C: Na)

50

Parallel Binning: Summary of Stages




Stage 10: Determine # of collisions in each bin, store in D (bin parallel)

Stage 11: Parallel prefix scan of D (length of D: Na)

Stage 12: Run collision detection and populate E with required info (bin parallel)

51

Parallel Binning Concluding Remarks




Some unaddressed issues:




How big should the bins be?

Can you have bins of variable size?

How should the computation be organized such that memory access is


not trampled upon?

Can you eliminate stage 5 (the binary search) and use info from the sort
of stage 4?

Do you need stage 9 (sort for load balancing)?

Does it make sense to have a second sort for load balancing (as we
have right now)?

52

Parallel Binning Concluding Remarks

At the cornerstone of the proposed approach is the fact that one can very
easily find the bins that a simple geometry intersects



Method scales very well on multiple GPUs




First, its easy to bin bodies


Second, if you find a contact, its easy to allocate it to a bin and avoid double
counting

Each GPU handles a subvolume of the volume occupied by the bodies

CD algorithm relies on two key algorithms: sorting and prefix scan





Both these operations require O(N) on the GPU


NOTE: a small number of basic algorithms used in many applications.
53

You might also like