Reliability Blueprint PDF
Blueprint for a
Comprehensive Reliability
Program
Prepared by:
ReliaSoft Corporation
Phone: +1.520.886.0410
Fax: +1.520.886.0399
www.ReliaSoft.com
TABLE OF CONTENTS
1 Introduction
2 Foundations of Reliability
   2.1 Fostering a Culture of Reliability
   2.2 Product Mission
      2.2.1 Reliability Specifications
   2.3 Universal Failure Definitions
3 Reliability Testing
   3.1 Customer Usage Profiling
   3.2 Test Types
      3.2.1 Development Testing
      3.2.2 Manufacturing Testing
4 Field Data
   4.1 The “Disconnect” Between In-House and Field Data
   4.2 Sales and Forecasting (Shipping) Data
   4.3 Warranty Data
   4.4 Field Service Data
   4.5 Customer Support Data
   4.6 Returned Parts/Failure Analysis Data
5 Data Collection
   5.1 In-House Test Data Collection
      5.1.1 Test Log
      5.1.2 Failure Log
      5.1.3 Service Log
   5.2 Field Data Collection
      5.2.1 ReliaSoft’s Dashboard
6 Data Analysis and Reporting
   6.1 Non-Parametric Analysis
      6.1.1 Non-Parametric Reliability Analysis
   6.2 Parametric Analysis
      6.2.1 Examples of Reporting for Parametric Data Analysis
         6.2.1.1 Probability Plot
         6.2.1.2 Reliability Function
         6.2.1.3 Probability Density Function
         6.2.1.4 Failure Rate Function
         6.2.1.5 Life vs. Stress
         6.2.1.6 Likelihood Function
         6.2.1.7 Reliability Importance
         6.2.1.8 Reliability Growth
7 Bringing It All Together
   7.1 Connecting Field and Lab Data
   7.2 Reliability Growth
   7.3 Optimum Design Level Determination
   7.4 Competitive Assessment
   7.5 Marketing and Advertising
1 INTRODUCTION
At the highest level, the purpose of a reliability engineering program is to test and
report on the reliability of an organization's products. This information is then
used to assess the financial impact of the reliability of the products, and to
improve the overall product reliability and consequently the financial strength of
the organization.
Reliability assessment is based on the results of testing from in-house labs and
data pertaining to the performance of the product in the field. The data
produced by these sources should be used to accurately measure and improve
the reliability of the products being produced. This is particularly important as
market concerns drive a constant push for cost reduction. However, one must
keep "the big picture" in perspective instead of merely looking for a quick fix.
There is often a temptation to cut corners and save initial costs by using
cheaper parts or curtailing testing programs. Unfortunately, cheaper parts are
usually less reliable, and inadequate testing programs can allow products with
undiscovered flaws to get out into the field. A quick savings in the short term by
the use of cheaper components or small test sample sizes will usually result in
higher long-term costs in the form of warranty costs, or loss of customer
confidence. The proper balance must be struck between reliability, customer
satisfaction, time to market, sales and features. Figure 1 illustrates this concept.
The polygon on the left represents a properly balanced project. The polygon on
the right represents a project in which reliability and customer satisfaction have
been sacrificed for the sake of sales and time to market.
[Figure: two radar charts plotting Reliability, Time to Market, Sales, Features and Satisfaction; the left chart is evenly balanced, while the right chart is skewed toward Sales and Time to Market.]
Figure 1 - Graphical Representation of Balanced and Unbalanced Projects.
As the figure above illustrates, a proper balance must be struck between product
reliability and the other aspects of the business, including time-to-market,
manufacturing costs, sales, product features, and customer satisfaction. Through
proper testing and analysis in the in-house testing labs, as well as collection of
adequate and meaningful data on a product's performance in the field, the
reliability of any product can be measured, tracked and improved, leading to a
balanced organization with a financially healthy outlook for the future.
2 FOUNDATIONS OF RELIABILITY
There are a certain number of "up-front" activities that need to be undertaken in
order to be able to successfully implement a reliability program. Most of these
activities are relatively simple, but they are vital to the design and implementation
of a reliability program. It is like the foundation of a house: relatively inexpensive
and largely forgotten once the major construction has begun, but vital to the
structure nonetheless. The reliability "foundation building" concepts need to be
addressed in order to put a strong reliability program in place.
2.1 Fostering a Culture of Reliability
Achieving this culture of reliability may actually be more difficult than it seems, as
some organizations may not have the history or background that lends itself to
the support of a reliability program. This can be particularly true in situations
where the organization has had a niche market or little or no previous
competition with the products that it produces. In the past, the organization's
customers may have had to accept the reliability of the product, good or bad. As
a consequence, the organization may have developed a mentality that tends to
overlook the reliability of a product in favor of the "damn-the-torpedoes, full-
steam-ahead" method of product development. In this type of organization,
reliability engineering methods and practices tend to be viewed as superfluous or
even wasteful. "We don't need all of this reliability stuff, we'll just find the
problems and fix them," tends to be the attitude in these circumstances.
Unfortunately, this attitude often results in poorly-tested, unreliable products
being shipped to customers.
The first step in developing the necessary culture of reliability is to have the
support of the organization's top management. Without this, the implementation
of a reliability program will be a difficult and frustrating process. Adequate
education as to the benefits of a properly constructed reliability program will go a
long way towards building support in upper management for reliability techniques
and processes. Most important is the emphasis of the financial benefits that will
accrue from a good reliability program, particularly in the form of decreased
warranty costs and increased customer goodwill. This latter aspect of the
benefits of reliability engineering can sometimes be an elusive concept to
appreciate. An adage in the reliability field states that if customers are pleased
with a product, they will tell eight other people, but if they are dissatisfied, they
will tell twenty-two other people.1 While this anecdote is rather eye-opening, it
must be put in a financial context to have the full impact. It is possible to
construct a model that links reliability levels of a product with the probability of
repeat sales. It is therefore possible to calculate a loss of sales revenue based
on the unreliability of the product. This type of information is useful in educating
upper management on the financial importance of reliability. Once upper
management has been adequately educated and is supportive of the reliability
program, the foundation for a culture of reliability is in place.

1. This adage is rather dated. Given the popularity of the Internet and the proliferation of newsgroups and web sites dedicated to disgruntled customers and employees, this number is probably even higher than twenty-two.
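The text above mentions a model linking product reliability to the probability of repeat sales. A minimal sketch of such a model is shown below; the repeat-purchase rates and dollar figures are purely illustrative assumptions, not data from any actual product.

```python
def lost_repeat_revenue(units_sold, unreliability, revenue_per_sale,
                        repeat_rate_satisfied=0.6, repeat_rate_dissatisfied=0.1):
    """Estimate the sales revenue lost to unreliability.

    Customers who experience a failure (fraction = unreliability) are
    assumed to repurchase at a much lower rate than satisfied customers.
    Both repeat rates are illustrative assumptions, not industry figures.
    """
    dissatisfied = units_sold * unreliability
    satisfied = units_sold - dissatisfied
    expected_repeats = (satisfied * repeat_rate_satisfied
                        + dissatisfied * repeat_rate_dissatisfied)
    # Repeat sales that would occur if every customer were satisfied:
    potential_repeats = units_sold * repeat_rate_satisfied
    return (potential_repeats - expected_repeats) * revenue_per_sale

# 10,000 units sold, 5% failing in service, $500 revenue per sale:
loss = lost_repeat_revenue(10_000, 0.05, 500.0)  # roughly $125,000 lost
```

Even a rough model of this kind turns an anecdote about word-of-mouth into a dollar figure that upper management can weigh against the cost of the reliability program.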
However, one should not stop with upper management when it comes to
educating an organization on the benefits of a proposed reliability program. It is
vital to have the support and understanding of the rest of the organization as
well. Since the implementation of a reliability program will affect the day-to-day
activities of middle management, the engineers, and the technicians, it is also
necessary to convince these groups of the benefits of reliability-oriented
activities. It is important to demonstrate to them how these activities, which may
initially seem pointless or counterproductive to them, will ultimately benefit the
organization. For example, if test technicians have a good understanding of how
the test data are going to be put to use, they will be less likely to cut corners
while performing the test and recording the data. Overall, a reliability program
stands the greatest chance of success if everyone in the organization
understands the benefits and supports the techniques involved.
“The conditional probability, at a given confidence level, that the equipment will
perform its intended functions satisfactorily or without failure, i.e., within specified
performance limits, at a given age, for a specified length of time, function period,
or mission time, when used in the manner and for the purpose intended while
operating under the specified application and operation environments with their
associated stress levels.”2
With all of the conditions removed, this boils down to defining reliability as the
ability of a product to perform its intended mission without failing. The definition
of reliability springs directly from the product mission, in that product failure is the
inability of the product to perform its defined mission.
2. Kececioglu, Dimitri, Reliability Engineering Handbook, Vol. 1, Prentice-Hall, 1991.
Financial concerns will definitely have to be taken into account when formulating
reliability specifications. Planning for warranty and production part costs is a
significant part of financial planning for the release of a new product. Based on
financial inputs such as these, a picture of the required reliability for a new
product can be established. However, financial wishful thinking should not be the
sole determinant of the reliability specifications. It can lead to problems such as
unrealistic goals, specifications that change on a regular basis to fit test results,
or test results that get "fudged" in order to conform to unrealistic expectations. It
is necessary to couple the financial goals of the product with a good
understanding of product performance in order to get a realistic specification for
product reliability. A proper balance of financial goals and realistic performance
expectations is necessary to develop a detailed and balanced reliability
specification.
2.3 Universal Failure Definitions
One of the most important reasons for establishing a universal definition of
failure is that different groups within the organization may have different ideas
as to what sort of behavior actually constitutes a
failure. This is often the case when comparing the different practices of design
and manufacturing engineering groups. Identical tests performed on the same
product by these groups may produce radically different results simply because
the two groups have different definitions of product failure. In order for a reliability
program to be effective, there must be a commonly accepted definition of failure
for the entire organization. Of course, this definition may require a little flexibility
depending on the type of product, development phase, etc., but as long as
everyone is familiar with the commonly accepted definition of failure,
communications will be more effective and the reliability program will be easier to
manage.
3. This is closely related to the concept of product mission. A good baseline definition of failure is the inability of the product to perform its mission. From that basic definition, more detailed categorizations of failure can be developed.
During testing, all of these occurrences will be logged with codes to separate the
three failure types. Other test-process-related issues such as deviations from test
plans will be logged in a separate test log. There should be a timely review of
logged occurrences to ensure proper classification prior to metric calculation and
reporting. These and other procedural issues will be discussed in detail in the
next section.
3 RELIABILITY TESTING
Reliability testing is the cornerstone of a reliability engineering program. It
provides the most detailed forms of data in that the conditions under which the
data are collected can be carefully controlled and monitored. Furthermore, the
reliability tests can be designed to uncover particular suspected failure modes
and other problems. The type of reliability testing a product undergoes will
change along different points of its life cycle, but the overriding goal is to ensure
that data from all or most of the tests were generated under similar enough
conditions so that an "apples-to-apples" comparison can be made of the
product's reliability characteristics at different points in the product's life. It is for
this reason that consistent and thorough reliability specifications and a standard
definition of failure are up-front requirements to implementing reliability testing.
4 FIELD DATA
While reliability testing is vital to the implementation of a reliability program, it is
not the sole source of product reliability performance data. Indeed, the
information received from the field is the "true" measure of product performance,
and is directly linked to the financial aspects of a product. In fact, a significant
proportion of field data may be more finance-related than reliability-related.
However, given the importance of the link between reliability and income, it is
important to ensure that adequate reliability information can be gleaned from field
performance data. In many cases, it is not too difficult to adapt field data
collection programs to include information that is directly applicable to reliability
reporting.
Some of the most prevalent types of field data are: sales and forecasting data,
warranty data, field service data, customer support data, and returned
parts/failure analysis data, as discussed below. These discussions will tend
towards generalizations, as every organization has different methods of
monitoring the performance of its products once they are in the field. However,
the illustrations below give a good general overview of how different types of field
data may be collected and put to use for a reliability program.
4. This problem can be ameliorated by setting up a comprehensive field data collection system. This is discussed in more detail in the section on Field Data Collection.
field. Knowing how many units are being used at any given time period is
absolutely vital to performing any sort of reliability-oriented calculations. Having
an accurate measurement of the number of failures in the field is basically
useless if there is not a good figure for the total number of units in the field at that
time.
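The fielded-population calculation described above can be sketched in a few lines: the units in service in any period are the cumulative units shipped so far minus the cumulative units permanently removed. The shipment and removal counts below are hypothetical.

```python
from itertools import accumulate

def units_in_field(shipments, removals):
    """Units in service in each period: cumulative shipments minus
    cumulative permanent removals (scrapped, returned, retired)."""
    shipped = list(accumulate(shipments))
    removed = list(accumulate(removals))
    return [s - r for s, r in zip(shipped, removed)]

# Hypothetical monthly shipment and removal counts:
population = units_in_field([100, 150, 200, 180], [0, 5, 12, 20])
# population -> [100, 245, 433, 593]
```

Any field failure rate reported without an accurate denominator of this kind is essentially meaningless, which is why sales and shipping data are a reliability data source and not merely a financial one.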
5 DATA COLLECTION
Data collection is the framework for a good reliability engineering program. It is
necessary to have an accurate and comprehensive system of recording data
relating to a product's reliability performance in order to be able to produce
meaningful reliability reports. Although the nature of the data collection system
will differ based on the type of data being collected, there must be a certain
number of common elements in the type of data being collected and the way the
information is recorded. This is necessary in order to provide the continuity
necessary for developing an accurate assessment of the "cradle-to-grave"
reliability of a product.
For example, in one instance the field repair personnel were only collecting
information specific to the failure of the system and what they did to correct the
fault. No information was being collected on the time accumulated on the
systems at the time of failure. Fortunately, it was a simple matter to have the
service personnel access the usage information, which was stored on a
computer chip in the system. This information was then included with the rest of
the data collected by the service technician, which allowed for a much greater
resolution in the failure times used in the calculation of field reliability. Previously,
the failure time was calculated by subtracting the failure date from the date the
product was shipped. This could cause problems in that the product could remain
unused for months after it was shipped. By adding the relatively small step of
requiring the service technicians to record the accumulated use time at failure, a
much more accurate model of the field reliability of this unit could be made.
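The improvement described above can be sketched as a simple data-handling rule: prefer the accumulated usage reading when the technician has recorded it, and fall back to date subtraction only when it is missing. All record field names here are hypothetical, chosen only for illustration.

```python
from datetime import date

def failure_time_days(record):
    """Return the time-to-failure for one field failure record.

    Prefers the accumulated usage reading captured by the service
    technician; falls back to (failure date - ship date), which can
    badly overstate operating time if the unit sat unused after
    shipping. All field names are hypothetical.
    """
    if record.get("usage_days_at_failure") is not None:
        return record["usage_days_at_failure"]
    return (record["failure_date"] - record["ship_date"]).days

# A unit shipped in January but not installed until much later:
rec = {"ship_date": date(2024, 1, 10), "failure_date": date(2024, 6, 10),
       "usage_days_at_failure": 95}
meter = failure_time_days(rec)                                     # 95, from the on-board meter
crude = failure_time_days({**rec, "usage_days_at_failure": None})  # 152, by date subtraction
```

The gap between the two values (95 versus 152 days in this hypothetical record) is exactly the distortion that recording accumulated use time at failure removes.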
Another difficulty in using field data to perform reliability analyses is that the data
may reside in different places, and in very different forms. The field service data,
customer support data, and failure analysis data may be in different databases,
each of which may be tailored to the specific needs of the group recording the
data. The challenge in this case is in developing a method of gathering all of the
pertinent data from the various sources and databases and pulling it into one
central location where it can easily be processed and analyzed. These functions
can also be performed automatically, using ReliaSoft's Dashboard system.
The system is designed around a master database that captures product quality
and reliability data into one centralized location. Data are incorporated into the
master database from the various existing data sources.
Once the pertinent information has been loaded to a central storage area, it can
easily be processed and presented in a format that will be meaningful to the
users of the information. The Dashboard system presents an at-a-glance
overview of a variety of different analyses based on the data that have been
extracted. It is then possible to "drill down" to more detailed levels of information
from the top-level view.
Another advantage to using the Dashboard (or any other tool or methodology) to
pull together disparate sources of field information is that the planning involved in
bringing these previously separate sources of information together usually results
in a beneficial synergy. By bringing together key players and acting as a catalyst
to discussion and discovery, the planning process often helps an organization to
gain a new understanding of its current processes and to identify untapped
existing resources. Many organizations that have undergone this process have
found that the fresh perspective and ability to spark communication provided by
such consultations are invaluable to their organizations.
[Figure: diagram linking existing data sources (MFG, Warranty Costs, Corporate Repair Center(s), Detailed Failure Analysis by Product Engineers, Corporate Technical Hotline(s), 3rd-Party Field Repairs, and Product Disposition Center) to the Dashboard master database.]
Figure 2 - Example of Linking Existing Data Sources and Customized Data Sources for a Dashboard System
that were reported failed or had warranty hits in the subsequent weeks after
being shipped. This information can be used to calculate a simple percent
defective for each shipment week. Note that one must make certain to use a
weighting factor to account for the amount of time a particular week's worth of
units have spent in the field. Also, care should be taken to account for the delay
between shipping and actual installation, which can be a substantial time period
for some products. The average delay period (two weeks in this example, shown
in the gray diagonal of the table) should be removed from the data being
analyzed. Otherwise, a false appearance of a decreasing defect rate appears in
the final results of the analysis. Figure 3 shows the results of the non-parametric
defect rate calculation, unadjusted and adjusted for the two-week average delay
between shipping and installation.
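The adjusted calculation described above can be sketched as follows. The weekly shipment and failure counts are hypothetical, and for simplicity this sketch omits the field-exposure weighting factor mentioned earlier; it only illustrates dropping the most recent ship weeks to account for the installation delay.

```python
def percent_defective(shipped, failed, install_delay_weeks=2):
    """Percent defective per ship week, excluding the most recent weeks.

    Units shipped within the average shipping-to-installation delay
    have, on average, not yet been installed; including those weeks
    would create a false appearance of a decreasing defect rate.
    """
    usable = len(shipped) - install_delay_weeks
    return [100.0 * failed[i] / shipped[i] for i in range(usable)]

# Hypothetical weekly shipment and warranty-failure counts:
rates = percent_defective([400, 420, 390, 410, 405], [12, 21, 8, 2, 0])
# The last two ship weeks are excluded from the adjusted results.
```

Note how the two most recent weeks, with their misleadingly low failure counts, never reach the report at all.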
[Figure: percent defective (0% to 9%) by Ship Week (9901 to 9920), with Unadjusted and Adjusted series.]
Figure 3 - Percent Defective Results from Data in Table 1, Unadjusted and Adjusted for
Installation Delay
The origin of this type of data is usually in-house, from reliability testing done in
laboratories set up for that specific purpose. For that reason, a great deal more
detail will be associated with these data sets than with those that are collected
from the field. Unfortunately, in dealing with field data, it is often a matter of
taking what you can get, without being able to have much impact on the quality
of the data. Of course, setting up a good program for the collection of field data
will raise the quality of the field data collected, but generally it will not be nearly
as precise or detailed as the data collected in-house.
The exception to this generalization is field data that contains detailed time-of-
use information. For example, automotive repairs that have odometer
information, aircraft repairs that have associated flight hours or printer repairs
that have a related print count can lend themselves to parametric analysis.
Caution should be exercised when this type of analysis is performed, however, to
make sure that the data are consistent and complete enough to perform a
meaningful parametric analysis.
unnecessary confusion among the end users of the data reports. It is not unusual
for end-users who are not familiar with statistical analyses to become confused
and indignant when presented with seemingly contradictory data on a particular
product. The tendency in cases such as these is to accuse one or both sources
of data (field or in-house) of being inaccurate. This is, of course, not necessarily
true. As was discussed earlier, there will usually be a disparity between
field data and in-house reliability data.
Another reason for the segregation of the field data and the in-house data is the
need for human oversight when performing the calculations. Field data sets tend
to undergo relatively simple mathematical processing which can be safely
automated without having to worry about whether the analysis type is appropriate
for the data being analyzed. However, this can be a concern for in-house data
sets that are undergoing a more complicated statistical analysis. This is not to
say that parametric analysis should not be in any way automated. However, a
degree of human oversight should be included in the process to ensure that the
data sets are being analyzed in an appropriate manner. Furthermore, the data
should be cross-referenced against the Test Log and Service Log to make sure
that irrelevant or "outlier" information is not being included in the data.
Probability Plot
[Figure: Weibull probability plot of Unreliability, F(t) vs. Time, (t), with 19 failures fitted by rank regression on X; fitted parameters β=0.88, η=1220.90, ρ=1.00.]
Historically, the probability plot was a tool for estimating distribution parameter
values. With the use of computers that can precisely calculate parametric values,
the probability plot now serves as a graphical method of assessing the goodness
of fit of the data to a chosen distribution. Probability plots have nonlinear scales
that will essentially linearize the distribution function, and allow for assessment of
whether the data set is a good fit for that particular distribution based on how
close the data points come to following the straight line. The y-axis usually shows
the unreliability or probability of failure, while the x-axis shows the time or ages of
the units. Specific characteristics of the probability plot will change based on the
type of distribution.
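As an illustration of how such a fit can be computed, the sketch below implements rank regression on X with Benard's median-rank approximation for a complete (no suspensions) data set. It is a simplified stand-in for a full life data analysis package, which would also handle suspensions, confidence bounds and other distributions.

```python
import math

def weibull_rrx(failure_times):
    """Fit a 2-parameter Weibull by rank regression on X (a sketch).

    Uses Benard's median-rank approximation F_i = (i - 0.3)/(n + 0.4),
    linearizes F(t) = 1 - exp(-(t/eta)**beta) as
    x = ln(eta) + y/beta with x = ln(t), y = ln(-ln(1 - F)),
    then regresses x on y (RRX). Assumes complete data only.
    """
    t = sorted(failure_times)
    n = len(t)
    x = [math.log(ti) for ti in t]
    y = [math.log(-math.log(1 - (i - 0.3) / (n + 0.4)))
         for i in range(1, n + 1)]
    mx, my = sum(x) / n, sum(y) / n
    # Slope and intercept of x = a + b*y; then beta = 1/b, eta = exp(a).
    b = sum((yi - my) * (xi - mx) for xi, yi in zip(x, y)) \
        / sum((yi - my) ** 2 for yi in y)
    a = mx - b * my
    return 1.0 / b, math.exp(a)

# Illustrative failure times (hours) from a development test:
beta, eta = weibull_rrx([16, 34, 53, 75, 93, 120, 150, 191, 240, 339])
```

The straight line drawn through the points on the probability plot is exactly this regression line, which is why the closeness of the points to the line serves as a visual goodness-of-fit check.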
[Figures: reliability function, probability density function and failure rate function plots vs. Time, (t), for the fitted distribution.]
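The reliability function, probability density function and failure rate function reported in this section all follow directly from the fitted distribution. The sketch below evaluates all three for a two-parameter Weibull, using the illustrative parameter values from the probability plot above.

```python
import math

def weibull_metrics(t, beta, eta):
    """Reliability, pdf and failure rate for a 2-parameter Weibull.

    R(t) = exp(-(t/eta)**beta), f(t) = (beta/eta)*(t/eta)**(beta-1)*R(t),
    and lambda(t) = f(t)/R(t). Parameter values used below are the
    illustrative ones from the probability plot, not real product data.
    """
    r = math.exp(-(t / eta) ** beta)                  # R(t)
    pdf = (beta / eta) * (t / eta) ** (beta - 1) * r  # f(t)
    rate = pdf / r                                    # lambda(t)
    return r, pdf, rate

r, f, lam = weibull_metrics(500.0, 0.88, 1220.9)
```

With a shape parameter below one (β = 0.88 here), the failure rate decreases with time, which is the signature of early-life or "infant mortality" failures.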
Life vs. Stress
[Figure: Life vs. Stress plot (temperature-humidity/Weibull model) showing Eta and Mean Life over a stress range of 260 to 320, with 40 failures; the Life axis is logarithmic.]
This plot presents the results of accelerated testing or reliability testing that is
performed at different stress levels. It indicates how the life performance of the
product changes at different stress levels. The gray shaded areas are actually pdf
plots for the product at different stress levels. Note that it is difficult to make a
complete graphical comparison of the pdf plots due to the logarithmic scale of the
y-axis.
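The plot above is based on a temperature-humidity life-stress model. As a simpler single-stress illustration, an Arrhenius life-stress relationship can be sketched as follows; the B and C values are purely illustrative, not fitted from any data set.

```python
import math

def arrhenius_life(temp_k, B, C):
    """Arrhenius life-stress model L(T) = C * exp(B / T).

    Characteristic life shrinks as absolute temperature rises. In
    practice B and C would be estimated from accelerated test data;
    the values used below are illustrative only.
    """
    return C * math.exp(B / temp_k)

# Extrapolating from an elevated test temperature down to use stress:
life_hot = arrhenius_life(320.0, B=4000.0, C=0.5)  # accelerated condition
life_use = arrhenius_life(290.0, B=4000.0, C=0.5)  # longer life at lower stress
```

Plotted on the same axes as the figure above, this relationship is the line along which the characteristic life (Eta) moves as stress changes.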
[Figure: Reliability Importance bar chart for Parts A through E at Time = 300, with importance values ranging from roughly 0.031 to 0.037.]
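One common reliability importance measure for a series system (the Birnbaum importance) is the partial derivative of system reliability with respect to each component's reliability, which for a series system reduces to the system reliability divided by that component's reliability. A sketch, with illustrative component reliabilities:

```python
def reliability_importance_series(component_reliabilities):
    """Birnbaum importance for a series system: dRs/dRi = Rs / Ri.

    The least reliable component has the largest importance, i.e.
    improving it buys the most system reliability. The component
    values used below are illustrative only.
    """
    rs = 1.0
    for r in component_reliabilities:
        rs *= r                     # series system: Rs = product of Ri
    return [rs / r for r in component_reliabilities]

imp = reliability_importance_series([0.99, 0.95, 0.97])
# The 0.95 component dominates the list of importances.
```

A bar chart of these values, like the figure above, tells the design team at a glance where improvement effort should be concentrated.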
In the course of this outline, we have discussed some of the more immediate
benefits of having a good reliability program in place. Examples include feeding
information back to manufacturing organizations to aid in maximizing the
efficiency of the manufacturing process and performing system-level reliability
analyses that can benefit the early stages of a development program. There are
still other methods of putting reliability information to use in order to aid the
organization beyond the obvious uses, such as decreased warranty costs.
organization. For example, the detailed data generated during the development
phase of a product can be used with a parametric growth model in order to judge
whether the project will meet its reliability goal within the allotted time. Based on
the growth model results, more efficient allocation of resources on the project
could be implemented based on the expected performance of the product.
Similarly, a less complicated non-parametric growth model could be used to
assess the change of field reliability as a result of design or manufacturing
process changes once the product has been released. On a larger scale, the
reliability growth of specific product lines can be modeled over the course of
several generations of products in order to estimate the reliability and associated
warranty costs of future product lines or projects that have yet to be
implemented.
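As one illustration of a parametric growth model, the Duane model postulates that cumulative MTBF follows a power law in cumulative test time, so the growth slope can be fitted by simple regression on a log-log scale. The test-log figures below are hypothetical.

```python
import math

def duane_growth_slope(cum_times, cum_mtbfs):
    """Least-squares fit of the Duane model MTBF_c(T) = (1/lam) * T**alpha.

    Taking logs gives a straight line whose slope is the growth
    parameter alpha; alpha > 0 indicates reliability growth.
    """
    x = [math.log(t) for t in cum_times]
    y = [math.log(m) for m in cum_mtbfs]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) \
        / sum((xi - mx) ** 2 for xi in x)

# Hypothetical cumulative test hours and cumulative MTBF at review points:
alpha = duane_growth_slope([100.0, 300.0, 1000.0, 3000.0],
                           [20.0, 29.0, 42.0, 60.0])
```

Comparing the fitted slope against the slope needed to reach the reliability goal in the remaining test time is what allows resources to be reallocated before the program falls behind.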
[Figure: Cost vs. Reliability curves, in which the Total Cost curve reaches its minimum at the Optimum Reliability level.]
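The trade-off in the figure above can be sketched numerically with an illustrative (not product-specific) cost model in which production cost rises steeply as reliability approaches one while expected failure costs fall, the optimum being found here by a coarse grid search.

```python
def total_cost(R, production_scale=1.0, unit_failure_cost=10.0, fleet=100):
    """Illustrative cost model for optimum design level determination.

    Production cost rises steeply as reliability R approaches 1, while
    expected failure (warranty) cost falls in proportion to the
    unreliability. All coefficients are illustrative assumptions.
    """
    production = production_scale / (1.0 - R)
    failure = unit_failure_cost * (1.0 - R) * fleet
    return production + failure

# Coarse grid search for the reliability level that minimizes total cost:
grid = [i / 1000 for i in range(500, 1000)]
best = min(grid, key=total_cost)  # optimum near R = 0.968 for these values
```

The interior minimum is the point labeled "Optimum Reliability" in the figure: pushing reliability beyond it costs more in production than it saves in warranty expense.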