Software Reliability: CIS 376 Bruce R. Maxim UM-Dearborn
Software reliability
probability a software component will produce an
incorrect output
software does not wear out
software can continue to operate after a bad result
Operator reliability
probability system user makes an error
Failure Probabilities
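If there are two independent components A and B in a
system and the operation of the system depends on
them both then the probability of system failure is
P(S) = P(A) + P(B) - P(A)P(B)
which is approximately P(A) + P(B) when both
failure probabilities are small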
If a component is replicated n times then the
probability of failure is
P(S) = P(A)^n
meaning that all n replicas fail at once
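The two cases above can be checked numerically. The following is a minimal sketch, assuming independent failures; the function names and the sample probabilities are illustrative and do not come from the slides. For two components that are both required, the exact value is 1 - (1 - P(A))(1 - P(B)), which a simple sum P(A) + P(B) approximates closely when both probabilities are small.

```python
def series_failure_probability(p_a: float, p_b: float) -> float:
    """Failure probability when the system needs both independent components."""
    # The system works only if A works and B works.
    return 1.0 - (1.0 - p_a) * (1.0 - p_b)


def replicated_failure_probability(p_a: float, n: int) -> float:
    """Failure probability of n independent replicas of one component."""
    # The replicated system fails only if every replica fails at once.
    return p_a ** n


if __name__ == "__main__":
    p_a, p_b = 0.001, 0.002  # illustrative values, not from the slides
    print("both required:", series_failure_probability(p_a, p_b))
    print("sum approximation:", p_a + p_b)
    print("three replicas:", replicated_failure_probability(p_a, 3))
```

Replication only helps this much if the replicas really do fail independently; a shared design fault would break that assumption.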
Non-functional Reliability Specification
The required level of reliability must be
expressed quantitatively.
Reliability is a dynamic system attribute.
Source code reliability specifications are
meaningless (e.g. N faults/1000 LOC)
An appropriate metric should be chosen to
specify the overall system reliability.
Time Units
Raw Execution Time
non-stop system
Calendar Time
If the system has regular usage patterns
Number of Transactions
demand type transaction systems
Availability
Measures the fraction of time system is
really available for use
Takes repair and restart times into account
Relevant for non-stop continuously running
systems (e.g. traffic signal)
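One common way to turn this into a number is to divide the average time the system runs between failures by that time plus the average time needed to repair and restart it. The sketch below assumes that formulation; the names mttf_hours and mttr_hours (mean time to failure, mean time to repair) are not used in the slides and are introduced here only for illustration.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is available for use.

    mttf_hours: average operating time between failures (assumed input)
    mttr_hours: average repair-plus-restart time per failure (assumed input)
    """
    if mttf_hours < 0 or mttr_hours < 0:
        raise ValueError("times must be non-negative")
    total = mttf_hours + mttr_hours
    if total == 0:
        raise ValueError("at least one of the times must be positive")
    return mttf_hours / total


if __name__ == "__main__":
    # Illustrative figures: 999 hours of operation per 1 hour of repair.
    print(availability(999.0, 1.0))
```

Shorter repair and restart times raise availability even when the failure rate itself does not change.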
Failure Classification
Transient - only occurs with certain inputs
Permanent - occurs on all inputs
Recoverable - system can recover without
operator help
Unrecoverable - operator has to help
Non-corrupting - failure does not corrupt system
state or data
Corrupting - system state or data are altered
Examples
Failure class              Example                              Metric
Permanent, non-corrupting  ATM fails to operate with any card;  ROCOF = 0.0001
                           must be restarted to correct         Time unit = days
Transient, non-corrupting  Magnetic stripe can't be read on     POFOD = 0.0001
                           an undamaged card                    Time unit = transactions
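The two metrics in these examples are POFOD (probability of failure on demand) and ROCOF (rate of occurrence of failures). A minimal sketch of how each might be estimated from observed counts follows; the function names and the sample counts are hypothetical and were chosen only to reproduce the 0.0001 figures.

```python
def pofod(failures: int, demands: int) -> float:
    """Probability of failure on demand: failed demands / total demands."""
    if demands <= 0:
        raise ValueError("demands must be positive")
    return failures / demands


def rocof(failures: int, time_units: float) -> float:
    """Rate of occurrence of failures: failures per unit of time."""
    if time_units <= 0:
        raise ValueError("time_units must be positive")
    return failures / time_units


if __name__ == "__main__":
    # Hypothetical observations, not data from the slides.
    print("POFOD:", pofod(failures=1, demands=10_000))      # per transaction
    print("ROCOF:", rocof(failures=1, time_units=10_000))   # per day
```

POFOD suits demand-type systems where the unit is a transaction; ROCOF suits systems where the unit is elapsed time.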
Specification Validation
It is impossible to empirically validate high
reliability specifications
No database corruption really means
a POFOD of less than 1 in 200 million
If each transaction takes 1 second to verify,
simulation of one day's transactions takes
3.5 days
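The arithmetic behind this can be made explicit. The sketch below assumes transactions are verified one after another at 1 second each; the helper name days_to_run is hypothetical. A 3.5 day simulation implies a daily load of about 302,400 transactions, and running 200 million transactions just once would take roughly 2,315 days, more than six years.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def days_to_run(transactions: int, seconds_per_transaction: float = 1.0) -> float:
    """Days needed to verify the given number of transactions sequentially."""
    return transactions * seconds_per_transaction / SECONDS_PER_DAY


if __name__ == "__main__":
    # Daily volume implied by "one day's transactions takes 3.5 days".
    implied_daily_volume = int(3.5 * SECONDS_PER_DAY)
    print("implied transactions per day:", implied_daily_volume)
    print("days to simulate one day:", days_to_run(implied_daily_volume))
    # Time to execute 200 million transactions a single time.
    print("days for 200 million:", round(days_to_run(200_000_000), 1))
```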
Safety Specification
Each safety specification should be specified
separately
These requirements should be based on hazard and
risk analysis
Safety requirements usually apply to the system as
a whole rather than individual components
System safety is an emergent system property
Safety Processes
Hazard and risk analysis
assess the hazards and risks associated with the system
Safety validation
check overall system safety
Hazard decomposition
seek to discover potential root causes for each hazard
Fault-tree Analysis
Hazard analysis method that starts with an
identified fault and works backwards to the
cause of the fault
Can be used at all stages of hazard analysis
It is a top-down technique that may be
combined with bottom-up hazard analysis
techniques, which start with system failures that
lead to hazards
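A fault tree can be sketched as a small data structure that is walked from the identified fault down to its possible causes. The example below is a minimal illustration, assuming the usual AND/OR gates; the Node class, the function names, and every event in the sample tree are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    event: str
    gate: str = "LEAF"  # "AND", "OR", or "LEAF" for a basic cause
    children: List["Node"] = field(default_factory=list)


def root_causes(node: Node) -> List[str]:
    """Work backwards from a fault to the basic events underneath it."""
    if not node.children:
        return [node.event]
    causes: List[str] = []
    for child in node.children:
        causes.extend(root_causes(child))
    return causes


def occurs(node: Node, active: Set[str]) -> bool:
    """True if the fault at this node occurs given the active basic events."""
    if not node.children:
        return node.event in active
    results = [occurs(child, active) for child in node.children]
    return all(results) if node.gate == "AND" else any(results)


def example_tree() -> Node:
    # Hypothetical tree: the top fault occurs if the sensor fails, OR if a
    # software error happens while the watchdog is also disabled.
    return Node("door opens while moving", "OR", [
        Node("sensor failure"),
        Node("controller fault", "AND", [
            Node("software error"),
            Node("watchdog disabled"),
        ]),
    ])


if __name__ == "__main__":
    tree = example_tree()
    print(root_causes(tree))
    print(occurs(tree, {"software error"}))
    print(occurs(tree, {"software error", "watchdog disabled"}))
```

Listing the leaves gives the candidate root causes; evaluating the gates shows which combinations of them are enough to produce the top-level fault.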
Risk Assessment
Assess the hazard severity, hazard probability, and
accident probability
Outcome of risk assessment is a statement of
acceptability
Intolerable (must never occur)
ALARP (as low as reasonably practical given cost and
schedule constraints)
Acceptable (consequences are acceptable and no extra
cost should be incurred to reduce it further)
Risk Acceptability
Determined by human, social, and political
considerations
In most societies, the boundaries between the
intolerable, ALARP, and acceptable regions are
pushed upwards with time
(meaning risk becomes less acceptable)
Risk assessment is always subjective (what is
acceptable to one person is ALARP to
another)
Risk Reduction
System should be specified so that hazards do not
arise or result in an accident
Hazard avoidance
system designed so hazard can never arise during normal
operation
Damage limitation
system designed to minimize accident consequences
Security Specification
Similar to safety specification
not possible to specify quantitatively
usually stated in "system shall not" terms rather
than "system shall" terms
Differences
no well-defined security life cycle yet
security deals with generic threats rather than
system specific hazards
Threat assignment
identified threats are related to assets so that each asset has a
list of associated threats
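Threat assignment amounts to building a mapping from each asset to the threats that apply to it. The sketch below is one simple way to do that; the function name assign_threats and the sample threats and assets are hypothetical and serve only as an illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def assign_threats(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (threat, asset) pairs so each asset has a list of its threats."""
    threats_by_asset: Dict[str, List[str]] = defaultdict(list)
    for threat, asset in pairs:
        # Skip duplicates so each threat is listed once per asset.
        if threat not in threats_by_asset[asset]:
            threats_by_asset[asset].append(threat)
    return dict(threats_by_asset)


if __name__ == "__main__":
    # Hypothetical threats and assets, not taken from the slides.
    identified = [
        ("unauthorized access", "customer database"),
        ("data tampering", "customer database"),
        ("unauthorized access", "audit log"),
    ]
    for asset, threats in assign_threats(identified).items():
        print(asset, "->", threats)
```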