0% found this document useful (0 votes)
20 views34 pages

Algorithms Pseudocode

The document discusses performance debugging for distributed systems composed of black box components, focusing on isolating performance bottlenecks caused by complex interactions. It presents two algorithms: the nesting algorithm, which analyzes RPC-style communications, and the convolution algorithm, applicable to broader message-based systems. Experimental results demonstrate the effectiveness of these algorithms in identifying high-impact causal paths and visualizing performance issues.

Uploaded by

HARSH VITHLANI
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
20 views34 pages

Algorithms Pseudocode

The document discusses performance debugging for distributed systems composed of black box components, focusing on isolating performance bottlenecks caused by complex interactions. It presents two algorithms: the nesting algorithm, which analyzes RPC-style communications, and the convolution algorithm, applicable to broader message-based systems. Experimental results demonstrate the effectiveness of these algorithms in identifying high-impact causal paths and visualizing performance issues.

Uploaded by

HARSH VITHLANI
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 34

Performance

Debugging for
Distributed
Systems of
Black Boxes
Haiyong Xie
Yinghua Wu,
Outline
 Problem statement & goals
 Overview of our approach

 Algorithms
 The nesting algorithm (RPC)
 The convolution algorithm (RPC or free-form)

 Experimental results
 Visualization GUI

 Related work

 Conclusions
Motivation
 Complex distributed systems
 Built from black box components

 Heavy communications traffic

 Bottlenecks at some specific nodes

 These systems may have performance problems

 High or erratic latency

 Caused by complex system interactions

 Isolating performance bottlenecks is hard


 We cannot always examine or modify system

components
 We need tools to infer where bottlenecks are

 Choose which black boxes to open


Example multi-tier
system
client client

web server web server web server

authentication
application application
server
server server

100ms

database database
server server
Goals
 Isolating performance bottlenecks
 Find high-impact causal path patterns
 Causal path: series of nodes that sent/received
messages. Each message is caused by receipt of
previous message, and Some causal paths occur
many times
 High-impact: occurs frequently, and contributes
significantly to overall latency
 Identify high-latency nodes on high-impact
patterns
 Add significant latency to these patterns

Then What should We do?


------- Messages Trace is enough
The Black Box

Complex Performance
distributed system bottlenecks
built from “black
boxes”
 Desired properties
 Zero-knowledge, zero-
instrumentation, zero-perturbation
 Scalability
 Accuracy
 Efficiency (time and space)
Outline
 Problem statement & goals
 Overview of our approach

 Algorithms
 The nesting algorithm (RPC)
 The convolution algorithm (RPC or free-form)

 Experimental results
 Visualization GUI

 Related work

 Conclusions
Overview of Approach
 Obtain traces of messages between components
 Ethernet packets, middleware messages, etc.
 Collect traces as non-invasively as possible
 Require very little information:
[timestamp, source, destination, call/return, call-id]

 Analyze traces using our algorithms


 Nesting: faster, more accurate, limited to RPC-style
systems
 Convolution: works for all message-based systems

 Visualize results and highlight high-impact paths


Recap. causal path
client client

web server web server web server

authentication
application application
server
server server

100ms

database database
server server
Challenges
 Trace contain interleaved messages
from many causal paths
 How to identify causal paths?
 Causality trace by Timestamp

 Want only statistically significant


causal paths
 How to differentiate significance?
 It is easy! They appear repeatedly
Outline
 Problem statement & goals
 Overview of our approach

 Algorithms
 The nesting algorithm (RPC)
 The convolution algorithm (RPC or free-form)

 Experimental results
 Visualization GUI

 Related work

 Conclusions
The nesting algorithm
 Depends on RPC-style communication
 Infers causality from “nesting” relationships by
message timestamps
 Suppose A calls B and B calls C before returning to A
 Then the BC call is “nested” in the AB call
 Uses statistical correlation

node A node B node C


call

call
time

return

return
Nesting: an example causal
path
Consider this system of 4 nodes
Looking for internal delays at each node

node A node B node C node D


C call

call

time
A B
return
call
D
return

return
Steps of the nesting
algorithm
1. Pair call and return messages
 (AB, BA), (BD, DB), (BC, CB)
2. Find and score all nesting relationships
 BC nested in AB node A node B node C node D
 BD also nested in AB call

3. Pick best parents call


 Here: unambiguous

time
4. Reconstruct call paths return
 AB[C ; D] call

return
O(m) run time
return
m = number of messages
Pseudo-code for the
nesting algorithm
 Detects calls pairs and find all possible
nestings of one call pair in another
 Pick the most likely candidate for the
causing call for each call pair
 Derive
procedure call paths from the causal
FindCallPairs
for each trace entry (t1, CALL/RET, sender A, receiver B, callid id)
relationships
case CALL:procedure ScoreNestings
procedure FindCallPaths
for each child (B, C, t2, t3) in Tcallpairs
store (t1,CALL,A,B,id) in Topencalls
for each parent (A, B, initialize
t1, t4) inhash table Tpaths
child.parents
case RETURN:
scoreboard[A,
find matching B, C,for
entry (t2, CALL,
each+=
t2-t1]
B, A, id)
callpair (A, B, t1, t2)
in(1/|child.parents|)
Topencalls
if callpair.parents = null then
if match is found then
procedure FindNestedPairs root := { CreatePathNode(callpair, t1) }
remove entry from Topencalls
for eachwith
child (B; C;messageif root
t2; t3) is in
intimestamp
call Tpaths then update its latencies
pairs
update entry return t2
maxscore := 0 else add root to Tpaths
add entry to Tcallpairs
for each
entry.parents := {all B, t1,function
p (A,callpairst4) CreatePathNode(callpair (A, B, t1, t4), tp)
in CALL,
(t3, child.parents
X, A, id2) in Topencalls with
node := new node with name B
score[p] := scoreboard[A, B, C, t2-t1]*penalty
t3 < t2}
if (score[p] > maxscore)node.latency
then := t4 - t1
maxscore := score[p] node.call_delay := t1 - tp
parent := p for each child in callpair.children
node.edges U
parent.children := parent.children :={child}
node.edges U
{ CreatePathNode(child, t1)}
Inferring nesting
 An example of Parallel calls
−Local info not enough
−Use aggregate info node A node B node C
−Histograms keep track
of possible latencies t1
t2
−Medium-length delay t3

time
will be selected t4
−Assign nesting
−Heuristic methods
Outline
 Problem statement & goals
 Overview of our approach

 Algorithms
 The nesting algorithm
 The convolution algorithm

 Experimental results
 Visualization GUI

 Related work

 Conclusions
The convolution
algorithm
 “Time signal” of messages for each
<source node, destination node>
A sent message to B at times 1,2,5,6,7

1 2 3 4 5 6 7 time

S1(t)= AB messages


The convolution
 Look
algorithm
for time-shifted similarities
 Compute convolution X(t) = S2(t)  S1(t)
 Use Fast Fourier Transforms
S1(t)
(AB)

S2(t) Peaks in X(t) suggest


(BC) causality between
AB and BC
X(t)
Time shift of a peak
indicates delay
Convolution details
 Time complexity: O(em+eVlogV)
m = messages
 e = output edges

 V = number of time steps in trace

 Need to choose time step size


 Must be shorter than delays of interest
 Too coarse: poor accuracy

 Too fine: long running time

 Robust to noise in trace


Algorithm comparison
 Nesting

 Looks at individual paths and then aggregates


 Finds rare paths

 Requires call/return style communication

 Fast enough for real-time analysis

 Convolution

 Applicable to a broader class of systems

 Slower: more work with less information

 May need to try different time steps to get good

results
 Reasonable for off-line analysis
Summarize

Nesting Algorithm Convolution Algorithm

Communication
RPC only RPC or free-form messages
style

Rare events Yes, but hard No


<timestamp, sender, receiver>
Level of
+ <timestamp, sender, receiver>
Trace detail call/return tag
Time and space Linear space Linear space
complexity Linear time Polynomial time

Visualization RPC call and return combined Less compact


 More compact
Outline
 Problem statement & goals
 Overview of our approach

 Algorithms

 Experimental results

 Maketrace: a trace generator

 Maketrace web server simulation

 Pet Store EJB traces

 Execution costs

 Visualization GUI

 Related work

 Conclusions
Maketrace
 Synthetic trace generator
 Needed for testing
 Validate output for known input
 Check corner cases
 Uses set of causal path templates
 All call and return messages, with latencies
 Delays are x ± y seconds, Gaussian normal
distribution
 Recipe to combine paths
 Parallelism, start/stop times for each path
 Duration of trace
Desired results for one
 Causal
trace
paths
 How often
 How much time spent

 Nodes
 Host/component name
 Time spent in node

and all of the nodes it


calls
 Edges
 Timeparent waits
before calling child
Measuring Added Delay
 Added 200msec
delay in WS2
 The nesting
algorithm detects
the added delay,
and so does the
convolution
algorithm
Results: Petstore
 Sample EJB application
 J2EE middleware for

Java
 Instrumentation from

Stanford’s PinPoint
project
 50msec delay added in

mylist.jsp
Results: running time
Trace Length Duration Memory CPU time
(messages) (sec) (MB) (sec)
Nesting
Multi-tier (short) 20,164 50 1.5 0.23
Multi-tier 202,520 500 13.8 2.27
Multi-tier (long) 2,026,658 5,000 136.8 23.97
PetStore 234,036 2,000 18.4 2.92
Convolution (20 ms time step)
PetStore 234,036 2,000 25.0 6,301.00
More details and results in paper
Accuracy vs. parallelism
 Increased parallelism degrades accuracy slightly
 Parallelism is number of paths active at same time
false positives

60
40
20
0
0 100 200 300 400 500
parallelism per node
Other results for nesting
 Clock skew algorithm
 Little
effect on accuracy with skew ≤ delays
of interest
 Drop rate
 Little effect on accuracy with drop rates ≤
5%
 Delay variance
 Robust to ≤ 30% variance
 Noise in the trace
 Only matters if same nodes send noise
 Little effect on accuracy with ≤ 15% noise
Visualization GUI
 Goal: highlight
dominant paths
 Paths sorted

 By frequency

 By total time

 Red highlights

 High-cost

nodes
 Timeline

 Nested calls

 Dominant

subcalls
 Time plots

 Node time

 Call delay
Related work
 Systems that trace end-to-end causality via
modified middleware using modified JVM or
J2EE layers
 Magpie (Microsoft Research), aimed at
performance debugging
 Pinpoint (Stanford/Berkeley), aimed at locating
faults
 Products such as AppAssure, PerformaSure,
OptiBench
 Systems that make inferences from traces
 Intrusion detection (Zhang & Paxson, LBL) uses
traces + statistics to find compromised systems
Future work
Automate trace gathering and
conversion
Sliding-window versions of algorithms

 Find phased behavior


 Reduce memory usage of nesting

algorithm
 Improve speed of convolution algorithm

Validateusefulness on more
complicated systems
Conclusions
 Looking for bottlenecks in black box systems
 Finding causal paths is enough to find

bottlenecks
 Algorithms to find paths in traces really

work
 We find correct latency distributions

 Two very different algorithms get similar

results
 Passively collected traces have sufficient

information

You might also like