Algorithms Pseudocode
Algorithms Pseudocode
Debugging for
Distributed
Systems of
Black Boxes
Haiyong Xie
Yinghua Wu,
Outline
Problem statement & goals
Overview of our approach
Algorithms
The nesting algorithm (RPC)
The convolution algorithm (RPC or free-form)
Experimental results
Visualization GUI
Related work
Conclusions
Motivation
Complex distributed systems
Built from black box components
components
We need tools to infer where bottlenecks are
authentication
application application
server
server server
100ms
database database
server server
Goals
Isolating performance bottlenecks
Find high-impact causal path patterns
Causal path: series of nodes that sent/received
messages. Each message is caused by receipt of
previous message, and Some causal paths occur
many times
High-impact: occurs frequently, and contributes
significantly to overall latency
Identify high-latency nodes on high-impact
patterns
Add significant latency to these patterns
Complex Performance
distributed system bottlenecks
built from “black
boxes”
Desired properties
Zero-knowledge, zero-
instrumentation, zero-perturbation
Scalability
Accuracy
Efficiency (time and space)
Outline
Problem statement & goals
Overview of our approach
Algorithms
The nesting algorithm (RPC)
The convolution algorithm (RPC or free-form)
Experimental results
Visualization GUI
Related work
Conclusions
Overview of Approach
Obtain traces of messages between components
Ethernet packets, middleware messages, etc.
Collect traces as non-invasively as possible
Require very little information:
[timestamp, source, destination, call/return, call-id]
authentication
application application
server
server server
100ms
database database
server server
Challenges
Trace contain interleaved messages
from many causal paths
How to identify causal paths?
Causality trace by Timestamp
Algorithms
The nesting algorithm (RPC)
The convolution algorithm (RPC or free-form)
Experimental results
Visualization GUI
Related work
Conclusions
The nesting algorithm
Depends on RPC-style communication
Infers causality from “nesting” relationships by
message timestamps
Suppose A calls B and B calls C before returning to A
Then the BC call is “nested” in the AB call
Uses statistical correlation
call
time
return
return
Nesting: an example causal
path
Consider this system of 4 nodes
Looking for internal delays at each node
call
time
A B
return
call
D
return
return
Steps of the nesting
algorithm
1. Pair call and return messages
(AB, BA), (BD, DB), (BC, CB)
2. Find and score all nesting relationships
BC nested in AB node A node B node C node D
BD also nested in AB call
time
4. Reconstruct call paths return
AB[C ; D] call
return
O(m) run time
return
m = number of messages
Pseudo-code for the
nesting algorithm
Detects calls pairs and find all possible
nestings of one call pair in another
Pick the most likely candidate for the
causing call for each call pair
Derive
procedure call paths from the causal
FindCallPairs
for each trace entry (t1, CALL/RET, sender A, receiver B, callid id)
relationships
case CALL:procedure ScoreNestings
procedure FindCallPaths
for each child (B, C, t2, t3) in Tcallpairs
store (t1,CALL,A,B,id) in Topencalls
for each parent (A, B, initialize
t1, t4) inhash table Tpaths
child.parents
case RETURN:
scoreboard[A,
find matching B, C,for
entry (t2, CALL,
each+=
t2-t1]
B, A, id)
callpair (A, B, t1, t2)
in(1/|child.parents|)
Topencalls
if callpair.parents = null then
if match is found then
procedure FindNestedPairs root := { CreatePathNode(callpair, t1) }
remove entry from Topencalls
for eachwith
child (B; C;messageif root
t2; t3) is in
intimestamp
call Tpaths then update its latencies
pairs
update entry return t2
maxscore := 0 else add root to Tpaths
add entry to Tcallpairs
for each
entry.parents := {all B, t1,function
p (A,callpairst4) CreatePathNode(callpair (A, B, t1, t4), tp)
in CALL,
(t3, child.parents
X, A, id2) in Topencalls with
node := new node with name B
score[p] := scoreboard[A, B, C, t2-t1]*penalty
t3 < t2}
if (score[p] > maxscore)node.latency
then := t4 - t1
maxscore := score[p] node.call_delay := t1 - tp
parent := p for each child in callpair.children
node.edges U
parent.children := parent.children :={child}
node.edges U
{ CreatePathNode(child, t1)}
Inferring nesting
An example of Parallel calls
−Local info not enough
−Use aggregate info node A node B node C
−Histograms keep track
of possible latencies t1
t2
−Medium-length delay t3
time
will be selected t4
−Assign nesting
−Heuristic methods
Outline
Problem statement & goals
Overview of our approach
Algorithms
The nesting algorithm
The convolution algorithm
Experimental results
Visualization GUI
Related work
Conclusions
The convolution
algorithm
“Time signal” of messages for each
<source node, destination node>
A sent message to B at times 1,2,5,6,7
1 2 3 4 5 6 7 time
Convolution
results
Reasonable for off-line analysis
Summarize
Communication
RPC only RPC or free-form messages
style
Algorithms
Experimental results
Execution costs
Visualization GUI
Related work
Conclusions
Maketrace
Synthetic trace generator
Needed for testing
Validate output for known input
Check corner cases
Uses set of causal path templates
All call and return messages, with latencies
Delays are x ± y seconds, Gaussian normal
distribution
Recipe to combine paths
Parallelism, start/stop times for each path
Duration of trace
Desired results for one
Causal
trace
paths
How often
How much time spent
Nodes
Host/component name
Time spent in node
Java
Instrumentation from
Stanford’s PinPoint
project
50msec delay added in
mylist.jsp
Results: running time
Trace Length Duration Memory CPU time
(messages) (sec) (MB) (sec)
Nesting
Multi-tier (short) 20,164 50 1.5 0.23
Multi-tier 202,520 500 13.8 2.27
Multi-tier (long) 2,026,658 5,000 136.8 23.97
PetStore 234,036 2,000 18.4 2.92
Convolution (20 ms time step)
PetStore 234,036 2,000 25.0 6,301.00
More details and results in paper
Accuracy vs. parallelism
Increased parallelism degrades accuracy slightly
Parallelism is number of paths active at same time
false positives
60
40
20
0
0 100 200 300 400 500
parallelism per node
Other results for nesting
Clock skew algorithm
Little
effect on accuracy with skew ≤ delays
of interest
Drop rate
Little effect on accuracy with drop rates ≤
5%
Delay variance
Robust to ≤ 30% variance
Noise in the trace
Only matters if same nodes send noise
Little effect on accuracy with ≤ 15% noise
Visualization GUI
Goal: highlight
dominant paths
Paths sorted
By frequency
By total time
Red highlights
High-cost
nodes
Timeline
Nested calls
Dominant
subcalls
Time plots
Node time
Call delay
Related work
Systems that trace end-to-end causality via
modified middleware using modified JVM or
J2EE layers
Magpie (Microsoft Research), aimed at
performance debugging
Pinpoint (Stanford/Berkeley), aimed at locating
faults
Products such as AppAssure, PerformaSure,
OptiBench
Systems that make inferences from traces
Intrusion detection (Zhang & Paxson, LBL) uses
traces + statistics to find compromised systems
Future work
Automate trace gathering and
conversion
Sliding-window versions of algorithms
algorithm
Improve speed of convolution algorithm
Validateusefulness on more
complicated systems
Conclusions
Looking for bottlenecks in black box systems
Finding causal paths is enough to find
bottlenecks
Algorithms to find paths in traces really
work
We find correct latency distributions
results
Passively collected traces have sufficient
information