0% found this document useful (0 votes)
68 views

Lesson2 5DistributedMemorySorting PDF

The document summarizes algorithms for distributed sorting, including bitonic sort, bucket sort, and sample sort. Bitonic sort uses a bitonic merge that has work of O(n log n) but is not work-optimal. Distributed bitonic sort can use either a block distribution scheme with O(log P) communication stages or a cyclic scheme with O(log(n/P)) computation stages. Sample sort uses sampling to choose splitters that define variable-sized buckets, allowing a linear time sort by distributing elements between processes in parallel.

Uploaded by

Projectlouis
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
68 views

Lesson2 5DistributedMemorySorting PDF

The document summarizes algorithms for distributed sorting, including bitonic sort, bucket sort, and sample sort. Bitonic sort uses a bitonic merge that has work of O(n log n) but is not work-optimal. Distributed bitonic sort can use either a block distribution scheme with O(log P) communication stages or a cyclic scheme with O(log(n/P)) computation stages. Sample sort uses sampling to choose splitters that define variable-sized buckets, allowing a linear time sort by distributing elements between processes in parallel.

Uploaded by

Projectlouis
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 9

Lesson25DistributedMemorySorting

BitonicSort

Thepseudocodeforgeneratingabitonicsequence:

1. createtwobitonicsubsequences
2. Makeoneanincreasingsequence
3. Makeoneintoadecreasingsequence

2
Workforabitonicsort:W
(n)= (nlog
(n))thisalgorithmisnotworkoptimal
bs
2
Spanforabitonicsort:D
(n)= (log
(n))
bs

DistributedBitonicMergeviaBinaryExchange

Withregardstobitonicsorts,everythingboilsdowntodoingbitonicsortsefficiently.

Step1:abitonicsplit,performedinplace
Attheendofstage1therearetwobitonicsubsequences

Foradistributedalgorithm,dividetheprocessesbetweenthe
nodes.

Thisshowstheminimumbetweenthetwoinputs.(Thestartofthesplit.)

Thesecondredlineshowthemaximumbetweenthetwoinputs.Theedges
indicatethedependencebetweenthetwoinputsandoutputs.

Theendresultisabitonicsplit.

Thisshowstheinputs,theoutputs,andthepatternsof
dependencies.

Tolookatthedistributedversionofthethis:
Using4processingnodes.

Inthiscase,communication
Communicationoccursanywhereadependenceedgecrosses
aprocessboundary.
Binaryexchange=twoprocessesswappingdatawitheach
other.

Notethatallcommunicationoccursinthefirstlog(p)stages.


Thereareonlylog(p)stagesthatrequirecommunication.
Intheotherlog(n/P)stagesthereisnoneedforcommunicationbetweenprocesses.

Inacyclicdistributiontherowsofthenetworkareassignedtodifferentprocessesina
roundrobinfashion.Thecommunicationissimilartotheblockscheme.

PickaNetwork

Whichnetworkwouldallowforfullyconcurrentexchangeswithoutcongestion?
Hypercubeandfullyconnectedtopologies.
Thisisbecauseyouneedanetworkwithalinearorbetterbisectionwidth.Thefullyconnected
networkismorethannecessary.

CommunicationCostofaBitonicMerge

Whatisthecommunicationtimeofabitonicmerge,assumingablockdistributed,
binaryexchangeschemeonahypercube.

Communicationtime=(a+(b*(n/p))*log(p))
Recall,thebinaryexchangeschemeonlycommunicatesduringthefirstlog(P)stages.
Eachprocesshastosendn/Pwordsateachstage.

BitonicMergeviaTransposes

Twodistributedbitonicmergeschemeshavebeendiscussed:

Blockdistributionscheme:log(P)stagesofcommunication
andlog(n/P)stagesofcomputation.

Cyclicscheme:log(n/P)stagesofcomputationandlog(P)stages
ofcomputation

Therunningtimeforthetwoarethesame:T
(nP)= log(P)+ (n/P)log(P)
net

Istheresomethingthatwillreducethe term,evenatthecostofincreasingthe term?YES

Startwithacyclictopology:
Thismeansthereisnocommunicationinitially.Thenswitchtoblock,whichmeansno
communicationattheend.Tomakethisworkthedatawillhavetobetransposed(or
shuffled).

Thetransposecanbeseenasanalltoallexchangeorasa
matrixtranspose.

2
Ifwelookatoneprocessnote:itneedstosendP1messages,eachofsizen/P
,toalltheother
processes.

Todeterminetheprocesstime:
1. Assumethenetworkisfullyconnected.

T
(nP)= (P1)+ (n/P)((P1)/P)Fullyconnectednetwork.
trans

Thereisalatencybandwidthtradeoffbetweenthetwo
schemes.
Inpracticeisitveryhardtofortheblockorcyclicschemeto
beatthetransposescheme.

ButterflyTrivia
Nameanotherfamousalgorithmthatfollowsthesame
computationalpattern.
FastFourierTransform

BitonicSortCostComputation

= cost/compare

(n/P )K = totaltimetodothecomparisons,atmergingstageK

Therearelognmergingstages,sothetotalcostforcomputationis:

BitonicSortCostCommunication

Assume:n,ParepowersofnP/n

T
(m)= + m
msg

Whatisthecommunicationtime?
Thekeyquestion:wheredoes
communicationhappen?
ForeachstageK,thesizeofthe
K
bitonicmergeisn
=2
K

Stage4asinglebitonicmergeof
size16.

AssumeablockdistributionwithPprocesses.
CommunicationonlystartswhenK>log(n/P)

Klog(n/P)
P
=2
whenk==log(n)itissimplifiedtoP
K

BitonicMergeinPprocessesofnsizerequires

Timetocommunicate

Sothetimefortheentiresortis:

Thesimplifiedversionis:
O( log(P ) + (n/P )(log 2

P))

LinearTimeDistributedSort,Part1

Anycomparisonbasedalgorithmforsortingscalesto:O(nlogn)

ForthebucketsortO(n)

Todobucketsort:
1. Startbyassumingyouknowthepossiblevalues.R={0,1,2,3,...,m1}
2. Thevaluestobesortedareuniformlydistributedovertherange.
3. DivideRintokbuckets

4. Thebucketsortfirstfiguresoutwhichbucketeachvaluebelongs.

5. Sortwithinineachbucketandconcatenatetheresults.

Howisthisalineartimescheme?
Howmanyelementsareineachbucket?theexpectednumberofelementsineachbucketwill
ben/k.(Assumetheelementsareevenlydistributedacrosstherange)

Thetimetosorteachbucketis:O(n/klog(n/k))

DistributedBucketSort
Itistodistributeassigneachbuckettoacomputenode.
Makethefollowingassumptions:
k=P
elementspernode~n/Pelementspertable
Assumeallnodesknowbucketranges
Assumethenetworkisfullyconnected

Therunningtimeis:O(b*(n/P)+t*(n/P)+(a*P))
Thestepstogetthis:
1. Eachprocessneedstoscanitslistoflocalelementsanddecidewhichelementsgo
where.Thenodescandothisinparallelandtheworkislinear.
2. Thenodesneedtoexchangeelements(analltoalloperation).Eachnodehas~n/P
2
elements.Soexpecttosendn/P
elementstoeverynode.Assumethenetworkisfully
connected.
3. Noweachbucketmustdoalocalsort,ofcostO(n/P)

LineartimeDistributedSort,Part2:SampleSort

Thebucketsorthasamajorflawtheassumptionofauniformdistributionofvaluesacross
thebuckets.Ifyoudonothaveuniformdist.,youwillnotgetanequalnumberofelementsin
eachbucket.

SouseSampling.
Dobucketsort,buttheintervalsvaryaccordingtothedata.Todecidethesizeoftheintervals,
usesampling.

TodoSampleSort:
1. Beginwithdataand,inthiscase,3processes.


2. Assumetheelementsareequallydist.amongthe3processes:

3. Sortthemlocally

4. Eachprocesswillchooseasampleofelementsfromtheirlist.Eachshouldchoosethe
sameequallyspacedelements.

5. Gatherthesamplesintheroot.
6. Sortthesamplesontheroot.

7. SelectP1splitters(inthiscase2)

8. Thesplittersdefinetheglobalbucketboundaries.
9. Forthisexample:

Thefirstbucketwillgetthefirstsplit,the0elements.Thesecondbucket
willgetthesecondsplit,13.Thethirdbucketwillgetthelastsplit,4end.


10. Nowthesplitterswillneedtobebroadcast.
11. Eachnodecanpartitionitselementsusingthesplitters.

12. Thenthenodesexchangevalues.Eachnodewillgetonlythevaluesinitsrange.
13. Theneachnodewilldoalocalsort.

Intherunningtimeforthissamplesort,whatisthelargestasymptoticfunctionofP?(AssumeP
processes)

2
2
2
O(P
)orO(P
logP)...TheroothastosortP
samples.

Ifthesystemistrulymassive,thiscouldbeadelimitedtoscalability.

You might also like