Fault Tolerant Systems: Chapter 1: PRELIMINARIES


FAULT TOLERANT SYSTEMS

Chapter 1: PRELIMINARIES

OUTLINE

PRELIMINARIES
FAULT CLASSIFICATION
TYPES OF REDUNDANCY
BASIC MEASURES OF FAULT TOLERANCE
TRADITIONAL MEASURES
NETWORK MEASURES

PRELIMINARIES
Computer systems, hardware and software, are the most complex systems ever created by human beings.
Critical applications: Space Shuttle, financial systems, medical instruments, etc.
Fault tolerance: techniques to tolerate faults while still delivering an acceptable level of service for the intended objectives of the system.

FAULT CLASSIFICATION
FAULT -> ERROR -> FAILURE
Fault: hardware defect or software/programming mistake.
Error: manifestation of a fault.
Failure: the system does not achieve its intended objective.

A fault or error may spread through the system.
Containment zone: barrier that reduces the chance that a fault or error in one zone propagates to another.

FAULT CLASSIFICATION
FAULT CHARACTERISTICS
Permanent: a permanent defect.
Transient: malfunctions for some time, then functionality is restored.
Intermittent: oscillates between quiescent and active states.

OTHER CHARACTERISTICS
Benign.
Malicious: appears reasonable, but is incorrect.

TYPES OF REDUNDANCY
REDUNDANCY:
Property of having more of a resource than is minimally necessary to do the job.
When there is a fault, redundancy masks it or works around it.

FORMS OF REDUNDANCY:
Hardware redundancy (static and dynamic): incorporate extra hardware into the design to either detect or override the effects of a failed component.
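
As a rough illustration of how extra hardware can override a failed component, the sketch below votes over the outputs of replicated modules. Majority voting over three replicas (often called triple modular redundancy) is assumed here as one example of static redundancy. The slide does not name a specific scheme, and the function name `majority_vote` is hypothetical.

```python
from collections import Counter

def majority_vote(outputs):
    """Return the value agreed on by a strict majority of modules, else None."""
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) / 2 else None

# Three replicated modules compute the same result; one of them is faulty.
print(majority_vote([42, 42, 17]))  # the faulty output 17 is outvoted
# With no majority the fault can be detected but not masked.
print(majority_vote([1, 2, 3]))
```

A single faulty module is masked because the other two still agree. When no strict majority exists, the voter returns `None`, which signals a detected but unmasked fault.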

TYPES OF REDUNDANCY
FORMS OF REDUNDANCY (cont.):
Information redundancy: error detection and correction.
Time redundancy: re-execution using the same hardware or program.
Software redundancy: multiple versions of a program.
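
Information redundancy can be illustrated with a single even-parity bit, which is assumed here as the simplest possible example. The slide does not name a particular code, and the helper names `add_parity` and `parity_ok` are hypothetical. A parity bit detects any single-bit error but cannot correct it.

```python
def add_parity(bits):
    """Append one bit so that the total number of 1s in the word is even."""
    return bits + [sum(bits) % 2]

def parity_ok(word):
    """Check the even-parity property of a word produced by add_parity."""
    return sum(word) % 2 == 0

word = add_parity([1, 0, 1, 1])
print(parity_ok(word))        # True: the stored word is consistent

corrupted = word.copy()
corrupted[2] ^= 1             # flip one bit to model a single-bit error
print(parity_ok(corrupted))   # False: the error is detected
```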

BASIC MEASURES OF FT
MEASURE
Mathematical abstraction that expresses some relevant facet of the performance of an object. Usually captures only a subset of its properties.

TYPES:
Traditional.
Network.

BASIC MEASURES OF FT
RELIABILITY AND AVAILABILITY: Very limited in what they can express.
Reliability R(t): probability that the system has been up (operational) continuously over the time interval [0, t].
Mean Time To Failure (MTTF): average time the system operates until a failure occurs.
Mean Time Between Failures (MTBF): average time between two consecutive failures.
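
To make R(t) and MTTF concrete, the sketch below assumes a constant failure rate, so that time to failure is exponentially distributed. That assumption is not stated on the slide. Under it, R(t) = exp(-lambda * t) and MTTF = 1 / lambda. The rate value used is purely hypothetical.

```python
import math

def reliability(t, failure_rate):
    """R(t) under the assumed constant failure rate (exponential model)."""
    return math.exp(-failure_rate * t)

def mttf(failure_rate):
    """MTTF is the integral of R(t) from 0 to infinity, here 1 / failure_rate."""
    return 1.0 / failure_rate

lam = 1e-3  # hypothetical rate: one failure per 1000 hours on average
print(mttf(lam))               # mean time to failure, in hours
print(reliability(1000, lam))  # probability of surviving one MTTF, about 0.368
```

Under this model the probability of running failure-free for a full MTTF is only about 37%, which shows that MTTF alone says little about the reliability over a given mission time.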

BASIC MEASURES OF FT
Mean Time To Repair (MTTR): average time needed to repair the system following a failure.
MTBF = MTTF + MTTR
Availability A(t): average fraction of time over the interval [0, t] that the system is up (operational).
Point availability A_P(t): probability that the system is up at a particular time instant t.
Long-term (steady-state) availability:
$A = \lim_{t \to \infty} A(t)$


BASIC MEASURES OF FT
Long-term availability may be calculated from MTTF, MTBF, and MTTR:
$A = \frac{\mathrm{MTTF}}{\mathrm{MTBF}} = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}}$

It is possible for a low-reliability system to have high availability. A system that fails every hour on average but comes back up after only one second has an MTBF of one hour (low reliability), yet its availability is high: A = 3599/3600 ≈ 0.99972.
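
The arithmetic for this kind of example can be checked with a short sketch. The function name `steady_state_availability` is hypothetical. The inputs are an MTBF of one hour (3600 s) and an MTTR of one second, so MTTF = MTBF - MTTR = 3599 s.

```python
def steady_state_availability(mttf, mttr):
    """Long-term availability A = MTTF / (MTTF + MTTR) = MTTF / MTBF."""
    return mttf / (mttf + mttr)

mtbf = 3600.0       # seconds between consecutive failures (one hour)
mttr = 1.0          # seconds needed to come back up
mttf = mtbf - mttr  # from MTBF = MTTF + MTTR

a = steady_state_availability(mttf, mttr)
print(f"{a:.5f}")   # 0.99972
```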

BASIC MEASURES OF FT
NETWORK MEASURES:
Focus on the network that connects the processors together.
Node and line connectivity: minimum number of nodes and lines, respectively, that have to fail before the network becomes disconnected.
Connectivity can only distinguish two network states: connected and disconnected. It says nothing about how the network degrades as nodes fail before, or after, becoming disconnected.
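
The two definitions can be turned directly into a brute-force computation for small networks. This is a minimal sketch: it tries every subset of nodes (or lines) in order of increasing size and is exponential in network size. The function names and the four-node ring used as input are hypothetical.

```python
from itertools import combinations

def is_connected(nodes, edges):
    """Depth-first search over the surviving nodes and lines."""
    nodes = set(nodes)
    if len(nodes) <= 1:
        return True
    adj = {n: set() for n in nodes}
    for u, v in edges:
        if u in nodes and v in nodes:
            adj[u].add(v)
            adj[v].add(u)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for nb in adj[stack.pop()]:
            if nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return seen == nodes

def node_connectivity(nodes, edges):
    """Smallest number of node failures that disconnects the network."""
    nodes = list(nodes)
    for k in range(len(nodes) - 1):
        for removed in combinations(nodes, k):
            if not is_connected(set(nodes) - set(removed), edges):
                return k
    return len(nodes) - 1  # usual convention for a fully connected network

def line_connectivity(nodes, edges):
    """Smallest number of line failures that disconnects the network."""
    edges = list(edges)
    for k in range(len(edges) + 1):
        for removed in combinations(edges, k):
            kept = [e for e in edges if e not in removed]
            if not is_connected(nodes, kept):
                return k
    return len(edges)

ring = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
print(node_connectivity("ABCD", ring))  # 2
print(line_connectivity("ABCD", ring))  # 2
```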

BASIC MEASURES OF FT
NETWORK MEASURES:

Both networks have the same node connectivity of 1, but N1 is much more connected than N2: the probability of N1 being broken up is lower than for N2.
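
The point can be illustrated with two small networks that share node connectivity 1 but differ in how many single-node failures break them up. The slide's figure of N1 and N2 is not reproduced in this text, so the `bowtie` and `chain` networks below are hypothetical stand-ins, not the slide's networks.

```python
def is_connected(nodes, edges):
    """Depth-first search over the surviving nodes."""
    nodes = set(nodes)
    if len(nodes) <= 1:
        return True
    adj = {n: set() for n in nodes}
    for u, v in edges:
        if u in nodes and v in nodes:
            adj[u].add(v)
            adj[v].add(u)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for nb in adj[stack.pop()]:
            if nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return seen == nodes

def disconnecting_nodes(nodes, edges):
    """Nodes whose individual failure disconnects the remaining network."""
    return [n for n in nodes if not is_connected(set(nodes) - {n}, edges)]

nodes = ["A", "B", "C", "D", "E"]
# Hypothetical denser network: two triangles sharing node C.
bowtie = [("A", "B"), ("A", "C"), ("B", "C"),
          ("C", "D"), ("C", "E"), ("D", "E")]
# Hypothetical sparser network: a simple chain A-B-C-D-E.
chain = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]

print(disconnecting_nodes(nodes, bowtie))  # ['C']
print(disconnecting_nodes(nodes, chain))   # ['B', 'C', 'D']
```

Both networks have node connectivity 1, yet only one of five single-node failures disconnects the first, against three of five for the second. This is the kind of difference the connectivity measure cannot express.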

OUTLINE

HARDWARE FAULT TOLERANCE
INFORMATION REDUNDANCY
FAULT TOLERANT NETWORK
SOFTWARE FAULT TOLERANCE
CHECKPOINTING
CASE STUDIES
FAULT DETECTION IN CRYPTOGRAPHIC SYSTEMS