Stat Fit


Stat::Fit
User Guide
Version 2
statistically fit software
Geer Mountain Software Corporation
© 1995, 1996, 2001 Geer Mountain Software Corp. All rights reserved.
Printed in the United States of America.
Stat::Fit and Statistically Fit are registered trademarks of Geer Mountain Software Corp.
Windows is a trademark of Microsoft Corporation.
Software License and Limited Warranty Agreement
This document is a legal agreement between you, the end user, and Geer Mountain Software
Corporation. BY OPENING THE SEALED DISK PACKAGE, YOU AGREE TO BE BOUND BY THE TERMS OF THIS
AGREEMENT. IF YOU DO NOT AGREE TO THE TERMS OF THIS AGREEMENT, WHICH INCLUDE THE LICENSE AND
LIMITED WARRANTY, PROMPTLY RETURN THE UNOPENED PACKAGE AND ALL OF THE ACCOMPANYING ITEMS
(including documentation) FOR A FULL REFUND.
License
Geer Mountain Software grants to you, the end user, a non-exclusive license to use the
enclosed computer program (the Software) on a single computer system, subject to the terms
and conditions of this License and Limited Warranty Agreement.
Copyright and permitted use
The SOFTWARE is owned by Geer Mountain Software and is
protected by United States copyright law and international
treaty provisions. Treat the SOFTWARE exactly as if it were
a book, with one exception: You may make archival copies of
the SOFTWARE to protect it from loss. The SOFTWARE
may be moved from one computer to another, as long as there
is no possibility of two persons using it at the same time.
You may transfer the complete SOFTWARE and the accompanying written materials together on a
permanent basis provided you do not retain any copies and the recipient agrees to the terms
of this Agreement.
Other restrictions
You may not lease, rent or sublicense the SOFTWARE. You may not transfer the SOFTWARE or the
accompanying written materials except as provided above. You may not reverse engineer,
decompile, disassemble, or create derivative works from the SOFTWARE. If you later receive an
update to this SOFTWARE or if this SOFTWARE is an update to a prior version, any transfer
must include both the update and all accessible prior versions of the SOFTWARE.
Limited warranty and liability
Geer Mountain Software warrants only that (a) the SOFTWARE will perform substantially in
accordance with the accompanying written materials and (b) the SOFTWARE is properly recorded
on the disk media.
Your failure to return the enclosed registration card may result in Geer Mountain Software's
inability to provide you with updates to the SOFTWARE and you assume the entire risk of
performance and result in such event. This Limited Warranty extends for sixty (60) days from
the date of purchase. The above Limited Warranty is in lieu of all other warranties, whether
written, express, implied or otherwise. Geer Mountain Software specifically excludes all
implied warranties including, but not limited to, implied warranties of merchantability and
fitness for a particular purpose.
Geer Mountain Software shall not be liable with respect to the SOFTWARE or otherwise for
special, incidental, consequential, punitive, or exemplary damages even if advised of the
possibility of such damages. In no event shall liability for any reason and upon any cause of
action whatsoever exceed the purchase price.
U.S. government restricted rights
If you are acquiring the SOFTWARE on behalf of any unit or agency of the United States
Government, the following provisions apply:
The Government acknowledges Geer Mountain Software's representation that the SOFTWARE and its
documentation were developed at private expense and no part of them is in the public domain.
The SOFTWARE and documentation are provided with RESTRICTED RIGHTS. Use, duplication, or
disclosure by the Government is subject to restrictions as set forth in subparagraphs (c)(1)
and (I) of The Rights in Technical Data and Computer Software clause of DFARS 252.227-7013 or
subparagraphs (c)(1) and (2) of the Commercial Computer Software-Restricted Rights at
48 CFR 52.227-19, as applicable. Manufacturer is Geer Mountain Software Corporation,
104 Geer Mountain Road, South Kent, CT 06785.
This Agreement is governed by the laws of the State of Connecticut. In the event that you
breach the provisions of this Agreement and Geer Mountain Software resorts to legal action to
enforce its rights, you agree to reimburse Geer Mountain Software for the expense of doing
so, including its reasonable attorneys' fees.
Table of Contents
Introduction
    About the User's Guide
    Terms and Conventions
    Technical Support
Chapter 1: Overview
    Basic Operation
        Fitting a Distribution
Chapter 2: Data Entry and Manipulation
    Creating a New Project
    Opening Existing Projects
    Saving Files
    Data Table
    Input Options
        Operate
        Transform
        Filter
        Repopulate
        Generate
        Input Graph
        Input Data
Chapter 3: Statistical Analysis
    Descriptive Statistics
    Binned Data
    Independence Tests
        Scatter Plot
        Autocorrelation
        Distribution Fit
        Goodness of Fit Tests
    Distribution Fit - Auto::Fit
Chapter 4: Graphs
    Result Graphs
    Graphics Style
        Graph
        Scale
        Text
        Fonts
        Color
    Other Graphs
        Distribution Graph
        Difference Graph
        Box Plot
        Q-Q Plot
        P-P Plot
        Distribution Viewer
Chapter 5: Print and Output Files
    Printing
        Print Style
        Printer Set-up
        Print Preview
        Print
    File Output
Chapter 6: Tutorial
    Tutorial
Appendix: Distributions
    Beta Distribution (min, max, p, q)
    Binomial Distribution (n, p)
    Chi Squared Distribution (min, nu)
    Discrete Uniform Distribution (min, max)
    Erlang Distribution (min, m, beta)
    Exponential Distribution (min, beta)
    Extreme Value type 1A Distribution (tau, beta)
    Extreme Value type 1B Distribution (tau, beta)
    Gamma Distribution (min, alpha, beta)
    Geometric Distribution (p)
    Inverse Gaussian Distribution (min, alpha, beta)
    Inverse Weibull Distribution (min, alpha, beta)
    Johnson SB Distribution (min, lambda, gamma, delta)
    Johnson SU Distribution (xi, lambda, gamma, delta)
    Logistic Distribution (alpha, beta)
    Log-Logistic Distribution (min, p, beta)
    Lognormal Distribution (min, mu, sigma)
    Negative Binomial Distribution (p, k)
    Normal Distribution (mu, sigma)
    Pareto Distribution (min, alpha)
    Pearson 5 Distribution (min, alpha, beta)
    Pearson 6 Distribution (min, beta, p, q)
    Poisson Distribution (lambda)
    Power Function Distribution (min, max, alpha)
    Rayleigh Distribution (min, sigma)
    Triangular Distribution (min, max, mode)
    Uniform Distribution (min, max)
    Weibull Distribution (min, alpha, beta)
Bibliography
Index
Introduction
Stat::Fit, a Statistically Fit application which fits analytical distributions to user data, is
meant to be easy to use. We hope its operation is so intuitive that you never need this manual.
However, in case you want to look up an unfamiliar term or a specific operation, or you simply
enjoy reading software manuals, we provide a carefully organized document in which the
information is easy to find.
About the User's Guide
Chapter 1: Overview
Summarizes a Quick Start for using Stat::Fit. An overview of the basic operations using the
default settings is given.
Chapter 2: Data Entry and Manipulation
Provides the options for bringing data into Stat::Fit and for their manipulation.
Chapter 3: Statistical Analysis
Describes the distribution fitting process, the statistical calculations and the Goodness of
Fit tests.
Chapter 4: Graphs
Goes into the numerous options available for the types of graphs and graph styles.
Chapter 5: Print and Output Files
Provides details on how to print graphs and reports.
Chapter 6: Tutorial
Is a tutorial with an example.
Appendix: Distributions
Provides descriptions and equations of various distributions.
Terms and Conventions
This manual uses Windows-specific terminology and assumes that you know how to use Windows.
For help with Windows, see your Windows documentation. The terminology in this manual should
be familiar to anyone with basic statistics knowledge.
Technical Support
Technical support for Stat::Fit is available through PROMODEL Corporation for all licensed
users of ProModel, MedModel, and ServiceModel. PROMODEL technical support representatives
will be glad to help you with questions regarding Stat::Fit.
Phone: 1-(888)-PRO-MODEL
Fax: (801) 226-6046
E-mail: pmteam@promodel.com
mmteam@promodel.com
smteam@promodel.com
Monday through Friday, 6 am to 6 pm MST
Chapter 1: Overview
This section describes the basic operation of Stat::Fit using the program's default settings.
For this example, we assume that the data is available in a text file.
Basic Operation
The data is loaded by clicking on the Open File icon, or by selecting File on the menu bar and
then Open from the Submenu, as shown below. All icon commands are available in the menu.
A standard Windows dialog box appears, and allows a choice of drives, directories and files.
The data in an existing text file loads sequentially into a Data Table (see Chapter 2 for
features of the Data Table). Data may also be entered manually. Stat::Fit allows up to 8000
numbers.
The number of data points is shown on the upper right; the number of intervals for binning the
data is shown on the upper left. By default, Stat::Fit automatically chooses the minimum
number of intervals that avoids oversmoothing the data. Also by default, the data precision is
6 decimal places. (See Chapter 2 for other interval and precision options.)
A histogram of the input data is displayed by clicking on the Input Graph icon. (For
additional information on graph styles and options, see Chapter 4.)
Fitting a Distribution
Continuous and discrete analytical distributions can be automatically fit to the input data by
using the Auto::Fit command. This command follows nearly the same procedure described below
for manual fitting, but chooses all distributions appropriate for the input data. The
distributions are ranked according to their relative goodness of fit. An indication of their
acceptance as good representations of the input data is also given. A table, as shown below,
provides the results of the Auto::Fit procedure.
Manual fitting of analytical distributions to the input data requires a sequence of steps
starting with a setup of the intended calculations.
The setup dialog is entered by clicking on the Setup icon or by selecting Fit from the Menu
bar and Setup from the Submenu.
The first page of the setup dialog presents a list of analytical distributions. A
distribution, say Erlang, is chosen by clicking on its name in the list on the left. The
selected distribution then appears in the list on the right. The setup is selected for use by
clicking OK.
The goodness of fit tests are calculated
by clicking on the Fit icon. By default,
only the Kolmogorov Smirnov test is
performed; other tests and options may be
selected on the Calculations page of the setup
dialog, as shown below. (For details of the Chi
Squared, Kolmogorov Smirnov and Anderson
Darling tests, see Chapter 3.)
A summary of the goodness of fit tests appears in
a table, as shown below:
A graph comparing the fitted distribution
to the input data is viewed by clicking on
the Graph Fit icon. (Other results graphs
as well as modifications to each graph are
described in Chapter 4.)
The Stat::Fit project is saved by clicking
on the Save icon which records not only
the input data but also all calculations
and graphs.
Congratulations! You have mastered the Stat::Fit
basics.
Chapter 2: Data Entry and Manipulation
This chapter describes in more detail the options available to bring data into Stat::Fit and
then manipulate it.
Creating a New Project
A New Project is created by clicking on
the New Project icon on the control bar
or by selecting File from the menu bar
and then New from the Submenu.
The New Project command generates a new Stat::Fit document, and shows an empty Data Table with
the caption, document xx, where xx is a sequential number depending on the number of
previously generated documents. The document may be named by invoking the Save As command and
naming the project file. Thereafter, the document will be associated with this stored file.
The new document does not close any other document. Stat::Fit allows multiple documents to be
open at any time. The only limit is the confusion caused by the multitude of views that may be
opened.
An input table appears, as shown below, which allows manual data entry.
Alternatively, data may be pasted from the Clipboard.
Opening Existing Projects
An existing project is opened by choosing File on the Menu bar and then Open from the Submenu,
or by clicking on the Open icon on the control bar.
An Open Project Dialog box allows a choice of drives, directories and files.
Stat::Fit accepts 4 types of files:
    SFP    Stat::Fit project file
    DAT    Input data
    .*     User specified designation for input data
    BMP    Graphics bitmap file
Select the appropriate file type and click on OK.
If the filename has a .SFP extension indicating a Stat::Fit project file, the project file is
opened in a new document and associated with that document. If the filename has a .BMP
extension indicating a saved bitmap (graph...), the bitmap is displayed. Otherwise, a text
file is assumed and a new project is opened by reading the file for input data. The document
created from a text file has an association with a project file named after the text file but
with the .SFP extension. The project file has not been saved.
If the text of a number contains non-numeric characters, each non-numeric character ends the
number just before it, and that number is entered. For example, 15.45% would be entered as
15.45, but 16,452,375 would be entered as three numbers: 16, 452 and 375.
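The splitting rule above can be illustrated with a short Python sketch. This is not Stat::Fit's own reader: the function name `parse_numbers` and the regular expression are assumptions made for illustration, chosen so that any character that cannot belong to a number ends the number before it.

```python
import re

# Hypothetical pattern, not taken from Stat::Fit: an optional sign, digits
# with an optional decimal point, and an optional exponent such as e-1.
_NUMBER = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")


def parse_numbers(text):
    """Return every number found in text, in order, as floats."""
    return [float(token) for token in _NUMBER.findall(text)]


print(parse_numbers("15.45%"))      # the percent sign ends the number
print(parse_numbers("16,452,375"))  # each comma ends a number
```

Under this reading, thousands separators should be removed from a text file before it is opened, since each comma would otherwise split one value into several.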
Saving Files
The project file, the input data, or any graph is saved through one of the Save commands in
the File submenu, or by clicking on the Save icon on the control bar.
When input data is entered into Stat::Fit, whether through manual entry in a new document,
opening a data file, pasting data from the Clipboard, or reopening a Stat::Fit project file, a
Stat::Fit document is created which contains the data and all subsequent calculations and
graphs. If the document is initiated from an existing file, it assumes the name of that file
and the document can be saved automatically as a Stat::Fit project [.SFP extension] with the
Save command.
The Save command saves the Stat::Fit document to its project file. The existing file is
overwritten. If a project file does not exist (the document window will have a document xx
name), the Save As command will be called.
The Save command does NOT save the input data in a text file, but saves the full document,
that is, input data, calculations, and view information, to a binary project file, your
project name.SFP. This binary file can be reopened in Stat::Fit, but cannot be imported into
other applications. If a text file of the input data is desired, the Save Input command should
be used.
The Save As command is multipurpose. If the document is unnamed, it can be saved as either a
Stat::Fit project or a text data file with the Save As command. If a document is named, its
name can be changed by saving either the project or the input data to a file with a new name.
(In any situation, the document assumes the name of the file used.)
The Save Input command saves the input data in a separate text file, with each data point
separated by a carriage return. This maintains the integrity of your data separate from the
Stat::Fit project files and calculations. If an existing association with a text file exists,
a prompt will ask for overwrite permission. Otherwise, a Save As dialog will prompt for a file
name, save to that file, and associate that text file with the document. If no extension is
specified, the file will be saved with the extension .DAT.
Data Table
All data entry in Stat::Fit occurs through the Data Table. After a project is opened, data may
be entered manually, by pasting from the Clipboard, or by generating data points from the
random variate generator. An existing Stat::Fit project may be opened and data may be added
manually. An example of the Data Table is shown below:
All data are entered as single measurements, not cumulative data. The numbers on the left are
aids for location and scroll with the data. The total number of data points and intervals for
continuous data are shown at the top.
All data can be viewed by using the central scroll bar or the keyboard. The scroll bar handle
can be dragged to get to a data area quickly, or the scroll bar can be clicked above or below
the handle to step up or down a page of data. The arrows can be clicked to step up or down one
data point.
The Page Up and Page Down keys can be used to step up or down a data page. The up and down
arrow keys can be used to step up or down a data point. The Home key moves the Data Table to
the top of the data; the End key moves it to the bottom.
Manual data entry requires that the Data Table be the currently active window, which requires
clicking on the window if it does not already have the colored title bar. Manual data entry
begins when a number is typed. The current data in the Data Table is grayed and an input box
is opened. The input box will remain open until the Enter key is hit unless the Esc key is
used to abort the entry.
All numbers are floating point, and can be entered in straight decimal fashion, such as 0.972,
or in scientific notation, such as 9.72e-1, where exx stands for the power of ten by which the
preceding number is multiplied. Integers are stored as floating point numbers.
If Insert is off, the default condition, the data point is entered at the current highlighted
location (the dashed line box, not the colored box). A number may be highlighted with a click
of the mouse at that location. Note that the number is also selected (the colored box)
although this does not affect manual data entry. If Insert is on, the data point is entered
before the data point in the highlighted box, except at the end of the data set. If a data
point is entered in the highlighted box at the end of the data set, the data point is appended
to the data set and the highlighted box is moved to the next empty location. In this way data
may be entered continuously without relocating the data entry point. The empty position at the
end of the data set can be easily reached by using the End key unless the Data Table is full
(8000 numbers).
A single number or group of numbers may be selected in the Data Table by clicking or dragging
the mouse. The selected numbers are highlighted in a color, usually blue. If the Shift key is
used with a mouse click, a range of numbers is selected from the last selected number to the
current position. If the Ctrl key is used with a mouse click, the current position is added to
the current selections unless it was already selected, in which case it is deselected.
The Delete key deletes the currently selected area (the colored area) which can be a single
number or group of numbers. There is no undelete. The Delete command in the Edit menu may also
be used. The Cut command in the Edit menu deletes the selected numbers and places them in the
Clipboard. The Copy command copies the currently selected numbers into the Clipboard. The
Paste command pastes the numbers in the Clipboard before the number in the current highlighted
(dashed box) location, not the selected location.
The Clear command clears all input data and calculations in the current document, after a
confirming dialog. All views which depend on these data and calculations are closed. An empty
Data Table is left open and the document is left open. The underlying Stat::Fit project file,
if any, is left intact, but a Save command will clear it as well. Use this command carefully.
This command is NOT the same as the New command because it maintains the document's connection
to the disk file associated with it, if any.
Input Options
Input Options allows several data handling options to be set: the number of intervals for the
histogram and the chi-squared goodness of fit test, the precision with which the data will be
shown and stored, and the distribution types which will be allowed.
The Input Options dialog is entered by clicking on the Input Options icon or by selecting
Input from the menu bar and then Options from the Submenu.
An Input Options Dialog box is shown below:
The number of intervals specifies the number of bins into which the input data will be sorted.
These bins are used only for continuous distributions; discrete distributions are collected at
integer values. If the input data is forced to be treated as discrete, this choice will be
grayed.
Note that the name intervals is used in Stat::Fit to represent the classes for continuous data
in order to separate its use from the integer classes used for discrete data.
The number of intervals is used to display continuous data in a histogram and to compare the
input data with the fitted data through a chi-squared test. Please note that the intervals
will be of equal length for display, but may be of either equal length or equal probability
for the chi-squared test. Also, the number of intervals for a continuous representation of
discrete data will always default to the maximum number of discrete classes for the same data.
The five choices for deciding on the number of intervals are:
Auto  Automatic mode uses the minimum number of intervals possible without losing
information [1]. The number of intervals is then increased if the skewness of the sample is
large.
Sturges  An empirical rule for assessing the desirable number of intervals into which the
distribution of observed data should be classified. If N is the number of data points and k
the number of intervals, then:
$k = 1 + 3.3 \log_{10} N$
Lower Bounds  Lower Bounds mode uses the minimum number of intervals possible without losing
information. If N is the number of data points and k is the number of intervals, then:
$k = (2N)^{1/3}$
Scott  Scott model is based on using the Normal density as a reference density for
constructing histograms. If N is the number of data points, sigma is the standard deviation of
the sample, and k is the number of intervals, then:
$k = N^{1/3} \, \dfrac{\max - \min}{3.5\,\sigma}$
Manual  Allows arbitrary setting of the number of intervals, up to a limit of 1000.
[1] George R. Terrell & David W. Scott, "Oversmoothed Nonparametric Density Estimates",
J. American Statistical Association, Vol. 80, No. 389, March 1985, p. 209-214.
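As a rough illustration of the three formula-based rules, the Python sketch below assumes they read k = 1 + 3.3 log10(N) for Sturges, k = (2N)^(1/3) for Lower Bounds, and k = N^(1/3) (max - min) / (3.5 sigma) for Scott. The function names are hypothetical, the use of the sample (n - 1) standard deviation is an assumption, and so is rounding up to a whole interval; Stat::Fit may round differently, and the Auto mode's skewness adjustment is not reproduced here.

```python
import math
import statistics


def sturges_intervals(n):
    """Sturges rule: k = 1 + 3.3 * log10(N)."""
    return 1 + 3.3 * math.log10(n)


def lower_bounds_intervals(n):
    """Lower Bounds rule: k = (2N)^(1/3)."""
    return (2 * n) ** (1 / 3)


def scott_intervals(data):
    """Scott rule: k = N^(1/3) * (max - min) / (3.5 * sigma)."""
    n = len(data)
    # Assumption: sigma is the sample (n - 1) standard deviation.
    sigma = statistics.stdev(data)
    return n ** (1 / 3) * (max(data) - min(data)) / (3.5 * sigma)


sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 3.0]
for name, k in [
    ("Sturges", sturges_intervals(len(sample))),
    ("Lower Bounds", lower_bounds_intervals(len(sample))),
    ("Scott", scott_intervals(sample)),
]:
    # Rounding up to a whole interval is an assumption of this sketch.
    print(f"{name}: k = {k:.2f} -> {math.ceil(k)} intervals")
```

Sturges and Lower Bounds depend only on the number of data points, while Scott also depends on the range and spread of the data, so the rules can disagree noticeably on skewed or heavy-tailed samples.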
The precision of the data is the number of decimal places shown for the input data and all
subsequent calculations. The default precision is 6 decimal places and is initially set on.
The precision can be set between 0 and 15. Note that all discrete data is stored as a floating
point number.
Please note
While all calculations are performed at maximum precision, the input data and calculations
will be written to file with the precision chosen here. If the data has greater precision than
the precision chosen here, it will be rounded when stored.
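The effect of the precision setting on stored values can be pictured with the following Python sketch. The helper `format_for_storage` is hypothetical and only mimics the behavior described above (fixed number of decimal places, allowed range 0 to 15, default 6); the actual file format written by Stat::Fit is not specified here.

```python
def format_for_storage(value, precision=6):
    """Format a value with a fixed number of decimal places (0 to 15)."""
    if not 0 <= precision <= 15:
        raise ValueError("precision must be between 0 and 15")
    return f"{value:.{precision}f}"


print(format_for_storage(0.1234567))    # seventh decimal is rounded away
print(format_for_storage(9.72e-1, 2))   # two decimal places kept
```

Because the rounding happens when the data is written, a value saved at low precision cannot be recovered at higher precision by reopening the file.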
Distribution Type  The type of analytical distribution can be either continuous or discrete.
By default, the input data may be analyzed with distributions of either type. However, the
analysis may be forced to either continuous distributions or discrete distributions by
checking the appropriate box in the Input Options dialog.
In particular, discrete distributions are forced to be distributions with integer values only.
If the input data is discrete, but the data points are multiples of continuous values, divide
the data by the smallest common denominator before attempting to analyze it. Input truncation
to eliminate small round-off errors is also useful.
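For example, measurements recorded in steps of 0.25 can be mapped to integer classes before a discrete analysis. The Python sketch below is one way to do that outside Stat::Fit; the function name `to_integer_classes`, the `step` argument, and the tolerance are assumptions, and it rounds to the nearest integer (rather than truncating) to absorb floating point round-off.

```python
def to_integer_classes(data, step, tolerance=1e-9):
    """Divide each value by the common step and return integer classes."""
    classes = []
    for value in data:
        scaled = value / step
        nearest = round(scaled)
        # Reject values that are not (nearly) whole multiples of the step.
        if abs(scaled - nearest) > tolerance:
            raise ValueError(f"{value} is not a multiple of {step}")
        classes.append(nearest)
    return classes


print(to_integer_classes([0.5, 1.25, 0.75, 2.0], 0.25))
print(to_integer_classes([0.1, 0.3, 0.7], 0.1))
```

The second call shows why some round-off handling is needed: 0.3 / 0.1 is not exactly 3 in floating point, so comparing against the nearest integer within a small tolerance is safer than testing for an exact integer.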
The maximum number of classes for a discrete
distribution is limited to 5000. If the number of
classes to support the input data is greater than
this, the analysis will be limited to continuous
distributions.
Most of the discrete distributions start at 0. If the
data has negative values, an offset should be
added to it before analysis.
Operate
Mathematical operations on the input data are
chosen from the Operate dialog by selecting
Input from the Menu bar and then Operate from
the Submenu.
The Operate dialog allows the choice of a single
standard mathematical operation on the input
data. The operation will affect all input data
regardless of whether a subset of input data is
selected. Mathematical overflow, underflow or
other error will cause an error message and all the
input data will be restored.
The operations of addition, subtraction, multipli-
cation, division, floor and absolute value can be
performed. The operation of rounding will round
the input data points to their nearest integer. The
data can also be sorted into ascending or descending
order, or unsorted by randomly mixing the points.
Transform
Data transformations of the input data are chosen
from the Transform dialog by selecting Input
from the Menu bar and then Transform from the
submenu.
The Transform dialog allows the choice of a single
standard mathematical function to be used on
the input data. The operation will affect all input
data regardless of whether a subset of input data
is selected. Mathematical overflow, underflow or
other error will cause an error message and all the
input data will be restored.
The transform functions available are: natural
logarithm, log to base 10, exponential, cosine,
sine, square root, reciprocal, raise to any power,
difference and % change. Difference takes the
difference between adjacent data points with the
lower data point first. The total number of result-
ing data points is reduced by one. % change cal-
culates the percent change of adjacent data points
by dividing the difference, lower point first, by
the upper data point and then multiplying by 100.
The total number of data points is reduced by
one.
Filter
Filtering of the input data can be chosen from the
Filter dialog by selecting Input from the Menu
bar and then Filter from the submenu.
The Filter dialog allows the choice of a single fil-
ter to be applied to the input data, discarding data
outside the constraints of the filter. All filters
DISCARD unwanted data and change the statis-
tics. The appropriate input boxes are opened
with each choice of filter. With the exception of
the positive filter which excludes zero, all filters
are inclusive, that is, they always include num-
bers at the filter boundary.
The filters include a minimum cutoff, a maximum
cutoff, both minimum and maximum cutoffs,
keeping only positive numbers (a negative
and zero cutoff), a non-negative cutoff, and a
near mean cutoff. The near mean cutoff keeps only
data points close to the mean, excluding all data
points less than the mean minus the standard
deviation times the indicated multiplier or greater
than the mean plus the standard deviation times
the indicated multiplier.
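A minimal sketch of the near mean cutoff, for illustration only. The function name `near_mean_filter` is hypothetical, and the use of the sample standard deviation is an assumption; the boundaries are inclusive, as the text states for all filters other than the positive filter.

```python
import statistics

def near_mean_filter(data, multiplier):
    """Keep points within multiplier standard deviations of the mean."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)           # assumption: sample standard deviation
    low = mean - multiplier * sd
    high = mean + multiplier * sd
    # Inclusive at both boundaries; everything outside is discarded.
    return [x for x in data if low <= x <= high]

print(near_mean_filter([1, 2, 3, 4, 100], 1.0))
```

Because discarded points change the mean and standard deviation, applying the same filter a second time can remove further points.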
Repopulate
The Repopulate command allows the user to
expand rounded data about each integer. Each
point is randomly positioned about the integer
with its relative value weighted by the existing
shape of the input data distribution. If lower or
upper bounds are known, the points are restricted
to regions above and below these bounds,
respectively. The Repopulate command is restricted to
integer data only, and limited in range from
-1000 to +1000.
To use the repopulate function, select Input from
the Menu bar and then Repopulate from the Submenu.
The following dialog will be displayed.
The new data points will have a number of deci-
mal places specified by the generated precision.
The goodness of fit tests, the Maximum Likeli-
hood Estimates and the Moment Estimates
require at least three digits to give reasonable
results. The sequence of numbers is repeatable if
the same random number stream is used (e.g.
stream 0). However, the generated numbers, and
the resulting fit, can be varied by choosing a dif-
ferent random number stream, 0-99.
Please note
This repopulation of the decimal part of the data
is not the same as the original data was or would
have been, but only represents the information
not destroyed by rounding. The parameter esti-
mates are not as accurate as would be obtained
with unrounded original data. In order to get an
estimate of the variation of fitted parameters, try
regenerating the data set with several random
number streams.
Generate
Random variates can be generated from
the Generate dialog by selecting Input
from the Menu bar and then Generate
from the submenu, or by clicking on the Generate
icon.
The Generate dialog provides the choice of distri-
bution, parameters, and random number stream
for the generation of random variates from each
of the distributions covered by Stat::Fit. The
generation is limited to 8000 points maximum,
the limit of the input table used by Stat::Fit. The
sequence of numbers is repeatable for each distri-
bution because the same random number stream
is used (stream 0). However, the sequence of
numbers can be varied by choosing a different
random number stream, 0-99.
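The idea of repeatable random number streams can be illustrated with Python's standard generator. This is a sketch under stated assumptions: Python's generator is not the one Stat::Fit uses, using the stream number directly as a seed is an assumption, and `generate_exponential` is a hypothetical name.

```python
import random

MAX_POINTS = 8000  # input table limit quoted in the text

def generate_exponential(count, mean, stream=0):
    """Return count exponential variates from a repeatable stream."""
    if not 0 <= stream <= 99:
        raise ValueError("stream must be between 0 and 99")
    if not 0 < count <= MAX_POINTS:
        raise ValueError("count must be between 1 and 8000")
    rng = random.Random(stream)           # assumption: stream number used as seed
    return [rng.expovariate(1.0 / mean) for _ in range(count)]

first = generate_exponential(5, mean=2.0, stream=0)
again = generate_exponential(5, mean=2.0, stream=0)
other = generate_exponential(5, mean=2.0, stream=1)
print(first == again)   # the same stream repeats the sequence
print(first == other)   # a different stream gives different numbers
```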
The generator will not change existing data in the
Data Table, but will append the generated data
points up to the limit of 8000 points. In this man-
ner the sum of two or more distributions may be
tested. Sorting will not be preserved.
This generator can be used to provide a file of
random numbers for another program as well as
to test the variation of the distribution estimates
once the input data has been fit.
Input Graph
A graph of the input data can be viewed
by selecting Input from the Menu bar
and then Input Graph from the Submenu,
or clicking on the Input Graph icon.
A histogram of your data will be displayed. An
example is shown below.
If the input data in the Data Table is continuous
data, or is forced to be treated as continuous in
the Input Options dialog, the input graph will be a
histogram with the number of intervals being
given by the choice of interval type in the Input
Options. If the data is forced to be treated as dis-
crete, the input graph will be a line graph with the
number of classes being determined by the mini-
mum and maximum values. Note that discrete
data must be integer values. The data used to
generate the Input Graph can be viewed by using
the Binned Data command in the Statistics menu
(see Chapter 3).
This graph, as with all graphs in Stat::Fit, may be
modified, saved, copied, or printed with options
generally given in the Graph Style, Save As, and
Copy commands in the Graphics menu. See
Chapter 4 for information on Graph Styles.
Input Data
If the Data Table has been closed, then it can be
redisplayed by selecting Input from the menu bar
and Input Data from the submenu.
Chapter 3:
Statistical Analysis
This section describes the descriptive statistics, the statistical calculations on the input data, the distri-
bution fitting process, and the goodness of fit tests. This manual is not meant as a textbook on statisti-
cal analysis. For more information on the distributions, see Appendix: Distributions on page 55.
For further understanding, see the books referenced in the Bibliography on page 97.
Descriptive Statistics
The descriptive statistics for the input data can be
viewed by selecting Statistics on the Menu bar
and then Descriptive from the Submenu. The fol-
lowing window will appear:
The Descriptive Statistics command provides the
basic statistical observations and calculations on
the input data, and presents these in a simple
view as shown above. Please note that as long as
this window is open, the calculations will be
updated when the input data is changed. In gen-
eral, all open windows will be updated when the
information upon which they depend changes.
Therefore, it is a good idea, on slower machines,
to close such calculation windows before chang-
ing the data.
Binned Data
The histogram / class data is available by select-
ing Statistics on the Menu bar and then Binned
Data from the Submenu.
The number of intervals used for continuous data
is determined by the interval option in the Input
Options dialog. By default, this number is deter-
mined automatically from the total number of
data points. A typical output is shown below:
For convenience, frequency and relative fre-
quency are given. If the data is sensed to be dis-
crete (all integer), then the classes for the discrete
representation are also given, at least up to 1000
classes. The availability of interval or class data
can also be affected by forcing the distribution
type to be either continuous or discrete.
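The frequency and relative frequency columns for continuous data can be sketched as follows. This is an illustration only: it assumes equal-length intervals spanning the minimum to the maximum of the data, with each interval closed on the left and the maximum value placed in the last interval. The function name `bin_data` is hypothetical.

```python
def bin_data(data, k):
    """Return (start, end, frequency, relative frequency) for k intervals."""
    lo, hi = min(data), max(data)
    if hi == lo:
        return [(lo, hi, len(data), 1.0)]     # degenerate case: one interval
    width = (hi - lo) / k
    counts = [0] * k
    for x in data:
        # The maximum value would index one past the end, so clamp it.
        index = min(int((x - lo) / width), k - 1)
        counts[index] += 1
    n = len(data)
    return [(lo + i * width, lo + (i + 1) * width, c, c / n)
            for i, c in enumerate(counts)]

for row in bin_data([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 2):
    print(row)
```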
Because the table can be large, it is viewed best
expanded to full screen by selecting the up arrow
box in the upper right corner of the screen. A
scroll bar allows you to view the rest of the table.
This grouping of the input data is used to produce
representative graphs. For continuous data, the
ascending and descending cumulative distribu-
tions match the appropriate endpoints. The den-
sity matches the appropriate midpoints. For
discrete distributions, the data is grouped accord-
ing to individual classes, with increments of one
on the x-axis.
Independence Tests
All of the fitting routines assume that your data
are independent, identically distributed (IID), that
is, each point is independent of all the other data
points and all data points are drawn from identi-
cal distributions. Stat::Fit provides three types of
tests for independence.
The Independence Tests are chosen by selecting
Statistics on the Menu bar and then Independence
from the Submenu. The following submenu will
be shown:
Scatter Plot:
This is a plot of adjacent points in the sequence of
input data against each other. Thus each plotted
point represents a pair of data points [X_{i+1}, X_i].
This is repeated for all pairs of adjacent data
points. If the input data are somewhat dependent
on each other, then this plot will exhibit that
dependence. Time series, where the current data
point may depend on the nearest previous
value(s), will show that pattern here as a struc-
tured curve rather than a seemingly independent
scatter of points. An example is shown below.
The structure of dependent data can be visualized
graphically by starting with randomly generated
data, choosing this plot, and then putting the data
in ascending order with the Input / Operate com-
mands. The position of each point is now depen-
dent on the previous points and this plot would be
close to a straight line.
Autocorrelation:
The autocorrelation calculation used here
assumes that the data are taken from a stationary
process, that is, the data would appear the same
(statistically) for any reasonable subset of the
data. In the case of a time series, this implies that
the time origin may be shifted without affecting
the statistical characteristics of the series. Thus
the variance for the whole sample can be used to
represent the variance of any subset. For a simulation
study, this may mean discarding an early
warm-up period (see Law & Kelton¹). In many
other applications involving ongoing series,
including financial, a suitable transformation of
the data might have to be made.
1. "Simulation Modeling & Analysis", Averill M.
Law, W. David Kelton, 1991, McGraw-Hill, p. 293
If the process being studied is not stationary, the
calculation and discussion of autocorrelation is more
complex (see Box¹).
A graphical view of the autocorrelation can be
displayed by plotting the scatter of related data
points. The Scatter Plot, as previously described,
is a plot of adjacent data points, that is, of separa-
tion or lag 1. Scatter plots for data points further
removed from each other in the series, that is, for
lag j, could also be plotted, but the autocorrela-
tion is more instructive. The autocorrelation, rho,
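is calculated from the equation:

$$\rho_j = \frac{\sum_{i=1}^{n-j} (x_i - \bar{x})(x_{i+j} - \bar{x})}{\sigma^2\,(n - j)}$$

where j is the lag between data points, sigma is the
standard deviation of the population, approximated
by the standard deviation of the sample,
n is the number of data points,
and xbar is the sample mean. The calculation is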
carried out to 1/5 of the length of the data set
where diminishing pairs start to make the calcula-
tion unreliable.
The autocorrelation varies between 1 and -1,
between positive and negative correlation. If the
autocorrelation is near either extreme, the data
are autocorrelated. Note, however, that the
autocorrelation can assume nonzero values due to the
randomness of the data even though no significant
autocorrelation exists.
The numbers in parentheses along the x-axis are
the maximum positive and negative correlations.
For large data sets, this plot can take a while to
get to the screen. The overall screen redrawing
can be improved by viewing this plot and closing
it thereafter. The calculation is saved internally
and need not be recalculated unless the input data
changes.
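The autocorrelation calculation described above can be sketched as follows. This is an illustration rather than Stat::Fit's implementation: the function name `autocorrelation` is hypothetical, the sample variance stands in for the population variance, and the lag runs from 1 to one fifth of the data length as the text describes.

```python
import statistics

def autocorrelation(data, max_lag=None):
    """Return {lag j: rho_j} for lags 1..max_lag (default: len(data) // 5)."""
    n = len(data)
    xbar = statistics.mean(data)
    variance = statistics.variance(data)  # sample variance approximates sigma^2
    if max_lag is None:
        max_lag = n // 5                  # pairs become too few beyond this
    rho = {}
    for j in range(1, max_lag + 1):
        total = sum((data[i] - xbar) * (data[i + j] - xbar)
                    for i in range(n - j))
        rho[j] = total / (variance * (n - j))
    return rho

# An alternating series is strongly negatively correlated at lag 1.
print(autocorrelation([1.0, -1.0] * 10))
```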
Runs Tests
The Runs Test command calculates two different
runs tests for randomness of the data and displays
a view of the results. The result of each test is
either DO NOT REJECT the hypothesis that the
series is random or REJECT that hypothesis with
the level of significance given. The level of sig-
nificance is the probability that a rejected hypoth-
esis is actually true, that is, that the test rejects the
randomness of the series when the series is actu-
ally random.
A run in a series of observations is the occurrence
of an uninterrupted sequence of numbers with the
same attribute. For instance, a consecutive set of
increasing or decreasing numbers is said to pro-
vide runs up or down respectively. In particu-
lar, a single isolated occurrence is regarded as a
run of one.
The number of runs in a series of observations
indicates the randomness of those observations.
Too few runs indicate strong correlation, point to
point. Too many runs indicate cyclic behavior.
The first runs test is a median test which measures
the number of runs, that is, sequences of
numbers, above and below the median (see
Brunk²). The run can be a single number above
or below the median if the numbers adjacent to it
are in the opposite direction. If there are too
many or too few runs, the randomness of the
series is rejected. This median runs test uses a
normal approximation for acceptance/rejection
which requires that the number of data points
above/below the median be greater than 10. An
error message will be printed if this condition is
not met.
1. "Time Series Analysis", George E. P. Box, Gwilym
M. Jenkins, Gregory C. Reinsel, 1994, Prentice-Hall
2. "An Introduction to Mathematical Statistics",
H. D. Brunk, 1960, Ginn
The above/below median runs test will not work
if there are too few data points or for certain dis-
crete distributions.
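A sketch of an above/below median runs test follows. The mean and variance of the number of runs are the standard textbook (Wald-Wolfowitz) values; whether Stat::Fit uses exactly these expressions is an assumption, as is the choice to drop points equal to the median. The function name `median_runs_test` is hypothetical.

```python
import math
import statistics

def median_runs_test(data, alpha=0.05):
    """Return (number of runs, z statistic, decision)."""
    median = statistics.median(data)
    # True above the median, False below; ties are dropped (assumption).
    above = [x > median for x in data if x != median]
    n1 = sum(above)
    n2 = len(above) - n1
    if n1 <= 10 or n2 <= 10:
        raise ValueError("need more than 10 points above and below the median")
    runs = 1 + sum(1 for a, b in zip(above, above[1:]) if a != b)
    mean = 2.0 * n1 * n2 / (n1 + n2) + 1.0
    variance = (2.0 * n1 * n2 * (2.0 * n1 * n2 - n1 - n2)
                / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
    z = (runs - mean) / math.sqrt(variance)
    p_value = 2.0 * (1.0 - statistics.NormalDist().cdf(abs(z)))
    decision = "REJECT" if p_value < alpha else "DO NOT REJECT"
    return runs, z, decision

# Sorted data has only two runs, far too few for a random series.
print(median_runs_test(list(range(40))))
```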
The second runs test is a turning point test which
measures the number of times the series changes
direction (see Johnson¹). Again, if there are too
many turning points or too few, the randomness
of the series is rejected. This turning point runs
test uses a normal approximation for acceptance/
rejection which requires that the total number of
data points be greater than 12. An error message
will be printed if this condition is not met.
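A sketch of a turning point test follows. For n independent continuous observations the number of turning points has mean 2(n - 2)/3 and variance (16n - 29)/90; these are the standard textbook values and are assumed, not confirmed, to match Stat::Fit's calculation. The function name `turning_point_test` is hypothetical.

```python
import math
import statistics

def turning_point_test(data, alpha=0.05):
    """Return (number of turning points, z statistic, decision)."""
    n = len(data)
    if n <= 12:
        raise ValueError("need more than 12 data points")
    # A turning point is a strict local maximum or minimum.
    turns = sum(1 for a, b, c in zip(data, data[1:], data[2:])
                if (b > a and b > c) or (b < a and b < c))
    mean = 2.0 * (n - 2) / 3.0
    variance = (16.0 * n - 29.0) / 90.0
    z = (turns - mean) / math.sqrt(variance)
    p_value = 2.0 * (1.0 - statistics.NormalDist().cdf(abs(z)))
    decision = "REJECT" if p_value < alpha else "DO NOT REJECT"
    return turns, z, decision

# A steadily increasing series never changes direction.
print(turning_point_test(list(range(30))))
```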
While there are other runs tests for randomness,
some of the most sensitive require larger data
sets, in excess of 4000 numbers (see Knuth²).
Examples of the Runs Tests are shown below in
the table. The length of the runs and their distri-
bution is given.
1. "Univariate Discrete Distributions", Norman L.
Johnson, Samuel Kotz, Adrienne W. Kemp, 1992,
John Wiley & Sons, p. 425
2. "Seminumerical Algorithms", Donald E. Knuth,
1981, Addison-Wesley
Distribution Fit
Automatic fitting of continuous distributions can
be performed by using the Auto::Fit command.
This command follows the same procedure as
discussed below for manual fitting, but chooses
distributions appropriate for the input data. It
also ranks the distributions according to their
relative goodness of fit, and gives an indication of
their acceptance as good representations of the
input data. For more details, see the section on
Auto::Fit at the end of this chapter.
The manual fitting of analytical distributions to
the input data in the Data Table takes three steps.
First, distributions appropriate to the input data
must be chosen in the Fit Setup dialog along with
the desired goodness of fit tests. Then, estimates
of the parameters for each chosen distribution
must be calculated by using either the moment
equations or the maximum likelihood equation.
Finally the goodness of fit tests are calculated for
each fitted distribution in order to ascertain the
relative goodness of fit. (See Breiman¹, Law &
Kelton², Banks & Carson³, Stuart & Ord⁴.)
Begin the distribution fitting process by
selecting Fit on the Menu bar and then
Setup from the Submenu, or by clicking on the
Fit Setup icon.
The Distribution page of the Fit Setup dialog pro-
vides a distribution list for the choice of distribu-
tions for subsequent fitting. All distributions
chosen here will be used sequentially for esti-
mates and goodness of fit tests. Clicking on a
distribution name in the distribution list on the
left chooses that distribution and moves that dis-
tribution name to the distributions selected box
on the right unless it is already there. Clicking on
the distribution name in the distributions selected
box on the right removes the distribution. All
distributions may be moved to the distributions
selected box by clicking the Select All button.
The distributions selected box may be cleared by
clicking the Clear button.
If the choice of distributions is uncertain or the
data minimal, use the guides in the following
Help directories:
Guided choice of distributions
No data choice of distribution
These guides should give some ideas on appro-
priate models for the input data. Also, each dis-
tribution is described separately in the Appendix,
along with examples.
After selecting the distribution(s), go to the next
window of the dialog box to select the calcula-
tions to be performed.
Estimates can be obtained from either Moments
or Maximum Likelihood Estimates (MLEs). The
default setting for the calculation is MLE.
For continuous distributions with a lower bound
or minimum such as the Exponential, the lower
bound can be forced to assume a value at or
below the minimum data value. This lower
bound will be used for both the moments and
maximum likelihood estimates. By default, it is
left unknown which causes all estimating procedures
to vary the lower bound with the other
parameters. If new data is added below a preset
lower bound, the bound will be modified to
assume the closest integer value below all input
data.
1. "Statistics: With a View Toward Applications", Leo
Breiman, 1973, Houghton Mifflin
2. "Simulation Modeling & Analysis", Averill M.
Law, W. David Kelton, 1991, McGraw-Hill
3. "Discrete-Event System Simulation", Jerry Banks,
John S. Carson II, 1984, Prentice-Hall
4. "Kendall's Advanced Theory of Statistics, Volume
2", Alan Stuart, J. Keith Ord, 1991, Oxford University
Press
The Accuracy of Fit describes the level of precision
in iterative estimations. The default is
0.0003, but can be changed if greater accuracy is
desired. Note that greater accuracy can mean
much greater calculation time. Some distributions
have moment estimates and/or maximum
likelihood estimates which do not require
iterative estimation; in these cases, the accuracy
will not make any difference in the estimation.
The Level of Significance refers to the level of
significance of the test. The Chi-Squared,
Kolmogorov-Smirnov and Anderson-Darling tests all
ask to reject the fit to a given level of significance.
The default setting is 5%; however, this
can be changed to 1% or 10% or any value you
desire. This number is the probability that the
test rejects the distribution when it is in fact the
right distribution. Stated in a different manner, it is
the probability that you will make a mistake and
reject when you should not. Therefore, the
smaller this number, the less likely you are to
reject when you should accept.
The Goodness of Fit tests, described later in the
chapter, may be chosen. Kolmogorov-Smirnov
is the default test.
The maximum likelihood estimates and the
moment estimates can be viewed independent of
the goodness of fit tests. The MLE command is
chosen by selecting Fit from the Menu and then
Maximum Likelihood from the Submenu.
The maximum likelihood estimates of the param-
eters for all analytical distributions chosen in the
fit setup dialog are calculated using the log likeli-
hood equation and its derivatives for each choice.
The parameters thus estimated are displayed in a
new view as shown below:
Some distributions do not have maximum likeli-
hood estimates for given ranges of sample
moments because initial estimates of the
distribution's parameters are unreliable. This is
especially evident for many of the bounded
continuous distributions when the sample skew-
ness is negative. When such situations occur, an
error message, rather than the parameters, will be
displayed with the name of the analytical distri-
bution.
Many of the MLEs require significant calcula-
tion, and therefore, significant time, even on a
computer with a Math Coprocessor. Because of
this, a Cancel dialog, shown below, will appear
with each calculation.
Beside a Cancel button, it will display the values
of the parameters in the current maximum likelihood
calculation. If the Cancel button is clicked,
the calculations will cease at the next iteration
and an error message will be displayed in the
Maximum Likelihood view next to the appropriate
distribution.
The other choice for estimates is Moments.
When the Moment Estimates command is chosen,
the estimates of the parameters for all chosen
analytical distributions are calculated
using the moment equations for each choice
along with the sample moments from calculations
on the input data in the Data Table. The parame-
ters thus estimated are displayed as shown
below:
Some distributions do not have moment estimates
for given ranges of sample moments. This is
especially evident for many of the bounded con-
tinuous distributions when the sample skewness
is negative. When such situations occur, an error
message rather than the parameters will be dis-
played with the name of the analytical distribu-
tion.
Note that all chosen estimates (MLEs or
Moments) must be finished before the Result
Graphs can be displayed or the Goodness of Fit
tests can be done. Any time the choice of esti-
mates is changed, all visible views of the Result
Graphs and the Goodness of Fit tests will be
redisplayed with the new calculated estimates.
The moment estimates have been included as an
aid to the fitting process; except for the simplest
distributions, they do NOT give good estimates
of the parameters of a fitted distribution.
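As an illustration of what a moment estimate involves, the sketch below matches the sample mean and variance to a two-parameter Gamma distribution with a lower bound of 0, using the textbook relations mean = shape × scale and variance = shape × scale². The choice of the Gamma distribution, the parameter names, and the function name `gamma_moment_estimates` are assumptions made for illustration; the parameterization Stat::Fit uses is described in the Appendix and may differ.

```python
import statistics

def gamma_moment_estimates(data):
    """Return (shape, scale) of a Gamma distribution by matching moments."""
    mean = statistics.mean(data)
    variance = statistics.variance(data)  # sample variance
    if mean <= 0 or variance <= 0:
        raise ValueError("moment estimates do not exist for this sample")
    shape = mean * mean / variance        # from mean = shape * scale
    scale = variance / mean               # and variance = shape * scale**2
    return shape, scale

print(gamma_moment_estimates([1.0, 2.0, 3.0, 4.0, 5.0]))
```

The error branch mirrors the situation described above, where some ranges of sample moments give no usable estimate.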
Goodness of Fit Tests
The tests for goodness of fit are merely compari-
sons of the input data to the fitted distributions in
a statistically significant manner. Each test
makes the hypothesis that the fit is good and cal-
culates a test statistic for comparison to a stan-
dard. The Goodness of Fit tests include:
Chi-squared test
Kolmogorov Smirnov test
Anderson Darling test
If the choice of test is uncertain, even after con-
sulting the descriptions below, use the Kolmog-
orov Smirnov test which is applicable over the
widest range of data and fitted parameters.
Chi Squared Test
The Chi Squared test is a test of the goodness of
fit of the fitted density to the input data in the
Data Table, with that data appropriately separated
into intervals (continuous data) or classes (dis-
crete data). The test starts with the observed data
in classes (intervals). While the number of
classes for discrete data is set by the range of the
integers, the choice of the appropriate number of
intervals for continuous data is not well deter-
mined. Stat::Fit has an automatic calculation
which chooses the least number of intervals
which does not oversmooth the data. An empirical
rule of some popularity, Sturges' rule, can
also be used. If neither appears satisfactory, the
number of intervals may be set manually. The
intervals are set in the Input Options dialog of the
Input menu.
The test then calculates the expected value for
each interval from the fitted distribution, where
the expected values of the end intervals include
the sum or integral to infinity (+/-) or the nearest
bound. In order to make the test valid, intervals
(classes) with fewer than 5 data points are joined to
neighbors until remaining intervals have at least 5
data points. Then the Chi Squared statistic for
this data is calculated according to the equation:

$$\chi^2 = \sum_{i=1}^{k} \frac{(n_i - n p_i)^2}{n p_i}$$

where $\chi^2$ is the chi squared statistic, n is the total
number of data points, $n_i$ is the number of data
points in the ith continuous interval or ith discrete
class, k is the number of intervals or classes used,
and $p_i$ is the expected probability of occurrence
in the interval or class for the fitted distribution.
The resulting test statistic is then compared to a
standard value of Chi Squared with the appropri-
ate number of degrees of freedom and level of
significance, usually labeled alpha. In Stat::Fit,
the number of degrees of freedom is always taken
to be the net number of data bins (intervals,
classes) used in the calculation minus 1, because
this is the most conservative test, that is, the least
likely to reject the fit in error. The actual number
of degrees of freedom is somewhere between this
number and a similar number reduced by the
number of parameters fitted by the estimating
procedure. While the Chi Squared test is an
asymptotic test which is valid only as the number
of data points gets large, it may still be used in
the comparative sense (see Law & Kelton¹,
Brunk², Stuart & Ord³).
The goodness of fit view also reports a REJECT
or DO NOT REJECT decision for each Chi
Squared test based on the comparison between
the calculated test statistic and the standard statis-
tic for the given level of significance. The level
of significance can be changed in the Calculation
page of the Fit Setup dialog.
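The Chi Squared calculation can be sketched as follows. This is an illustration, not Stat::Fit's code: the function names are hypothetical, and joining neighboring intervals from left to right until the expected count reaches 5 is an assumption about how the merging is carried out. The degrees of freedom follow the text: the number of intervals used minus 1.

```python
def merge_small_bins(counts, probs, n, minimum=5.0):
    """Join neighboring bins until each has an expected count >= minimum."""
    merged_counts, merged_probs = [], []
    acc_count, acc_prob = 0, 0.0
    for count, prob in zip(counts, probs):
        acc_count += count
        acc_prob += prob
        if n * acc_prob >= minimum:
            merged_counts.append(acc_count)
            merged_probs.append(acc_prob)
            acc_count, acc_prob = 0, 0.0
    if acc_count > 0 or acc_prob > 0:
        if merged_counts:                 # leftover tail joins the last bin
            merged_counts[-1] += acc_count
            merged_probs[-1] += acc_prob
        else:
            merged_counts.append(acc_count)
            merged_probs.append(acc_prob)
    return merged_counts, merged_probs

def chi_squared(counts, probs):
    """Return (chi squared statistic, degrees of freedom)."""
    n = sum(counts)
    counts, probs = merge_small_bins(counts, probs, n)
    statistic = sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))
    return statistic, len(counts) - 1

print(chi_squared([18, 22, 20, 25, 15], [0.2] * 5))
```

The statistic would then be compared with the standard Chi Squared value for the chosen level of significance, taken from a table or a statistics library.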
To visualize this process for continuous data,
consider the two graphs below:
The first is the normal comparison graph of the
histogram of the input data versus a continuous
plot of the fitted density. Note that the frequency,
not the relative frequency is used; this is the
actual number of data points per interval. How-
ever, for the Chi Squared test, the comparison is
made between the histogram and the value of the
area under the continuous curve between each
interval end point. This is represented in the sec-
ond graph by comparing the observed data, the
top of each histogram interval, with the expected
data shown as square points. Notice that the
interval near 6 has fewer than 5 as an expected
value and would be combined with the adjacent
interval for the calculation. The result is the sum
of the normalized square of the error for each
interval.
1. "Simulation Modeling & Analysis", Averill M.
Law, W. David Kelton, 1991, McGraw-Hill, p. 382
2. "An Introduction to Mathematical Statistics", H. D.
Brunk, 1960, Ginn & Co., p. 261
3. "Kendall's Advanced Theory of Statistics, Volume
2", Alan Stuart & J. Keith Ord, 1991, Oxford University
Press, p. 1159
In this case, the data were separated into intervals
of equal length. This magnifies any error in the
center interval which has more data points and a
larger difference from the expected value. An
alternative, and more accurate, way to separate
the data is to choose intervals with equal proba-
bility so that the expected number of data points
in each interval is the same. Now the resulting
intervals are NOT equal length, in general, but
the errors are of the same relative size for each
interval. This equal probable technique gives a
better test, especially with highly peaked data.
The Chi Squared test can be calculated with inter-
vals of equal length or equal probability by
selecting the appropriate check box in the Calcu-
lation page of the Fit Setup dialog. The equal
probable choice is the default.
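Equal probable intervals can be sketched by placing the interval end points at evenly spaced quantiles of the fitted distribution. The example below uses a fitted Normal distribution from Python's standard library purely for illustration; the function names are hypothetical and the approach is an assumption about how such intervals could be constructed, not a description of Stat::Fit's internals.

```python
import bisect
from statistics import NormalDist

def equal_probability_edges(dist, k):
    """Interior end points that give each of k intervals probability 1/k."""
    return [dist.inv_cdf(i / k) for i in range(1, k)]

def count_per_interval(data, edges):
    """Observed number of data points in each interval defined by edges."""
    counts = [0] * (len(edges) + 1)
    for x in data:
        counts[bisect.bisect_right(edges, x)] += 1
    return counts

fitted = NormalDist(mu=0.0, sigma=1.0)
edges = equal_probability_edges(fitted, 4)
print(edges)
print(count_per_interval([-2.0, -0.5, 0.1, 0.2, 3.0], edges))
```

With k intervals and n data points, the expected count in every interval is n/k, so the errors entering the Chi Squared sum are of the same relative size.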
While the test statistic for the Chi Squared test
can be useful, the p-value is more useful in deter-
mining the goodness of fit. The p-value is
defined as the probability that another sample
will be as unusual as the current sample given
that the fit is appropriate. A small p-value indi-
cates that the current sample is highly unlikely,
and therefore, the fit should be rejected. Con-
versely, a high p-value indicates that the sample
is likely and would be repeated, and therefore, the
fit should not be rejected. Thus, the HIGHER the
p-value, the more likely that the fit is appropriate.
When comparing two different fitted distribu-
tions, the distribution with the higher p-value is
likely to be the better fit regardless of the level of
significance.
Kolmogorov Smirnov Test
The Kolmogorov Smirnov test (KS) is a statisti-
cal test of the goodness of fit of the fitted cumula-
tive distribution to the input data in the Data
Table, point by point. The KS test calculates the
largest absolute difference between the cumula-
tive distributions for the input data and the fitted
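distribution according to the equations:

$$D = \max(D^+, D^-)$$

$$D^+ = \max_i \left( \frac{i}{n} - F(x_i) \right), \quad i = 1, \ldots, n$$

$$D^- = \max_i \left( F(x_i) - \frac{i-1}{n} \right), \quad i = 1, \ldots, n$$

where D is the KS statistic, $x_i$ is the value of the
ith point, in ascending order, out of n total data
points, and F(x) is the
fitted cumulative distribution. Note that the difference
is determined separately for positive and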
negative discrepancies on a point by point basis.
The resulting test statistic is then compared to a
standard value of the Kolmogorov Smirnov sta-
tistic with the appropriate number of data points
and level of significance, usually labeled alpha.
While the KS test is only valid if none of the
parameters in the test have been estimated from
the data, it can be used for fitted distributions
because this is the most conservative test, that is,
least likely to reject the fit in error. The KS test
can be extended directly to some specific distri-
butions, and these specific, more stringent, tests
take the form of adjustment to the more general
KS statistic. (See Law & Kelton¹, Brunk², Stuart
& Ord³.)
The goodness of fit view also reports a REJECT
or DO NOT REJECT decision for each KS test
based on the comparison between the calculated
test statistic and the standard statistic for the
given level of significance.
1. Simulation Modeling & Analysis, Averill M.
Law, W. David Kelton, 1991, McGraw-Hill, p. 382
2. An Introduction to Mathematical Statistics, H.D.
Brunk, 1960, Ginn & Co., p. 261
3. Kendall's Advanced Theory of Statistics, Volume
2, Alan Stuart & J. Keith Ord, 1991, Oxford
University Press, p. 1159
To visualize this process for continuous data,
consider the two graphs below:
The first is the normal P-P plot, the cumulative
probability of the input data versus a continuous
plot of the fitted cumulative distribution. How-
ever, for the KS test, the comparison is made
between the probability of the input data having a
value at or below a given point and the probabil-
ity of the cumulative distribution at that point.
This is represented in the second graph by com-
paring the cumulative probability for the
observed data, the straight line, with the expected
probability from the fitted cumulative distribu-
tion as square points. The KS test measures the
largest difference between these, being careful to
account for the discrete nature of the measure-
ment.
Note that the KS test can be applied to discrete
data in a slightly different manner, and the resulting
test is even more conservative than the KS
test for continuous data. Also, the test may be
further strengthened for discrete data (see
Gleser¹).
While the test statistic for the Kolmogorov-
Smirnov test can be useful, the p-value is more
useful in determining the goodness of fit. The p-
value is defined as the probability that another
sample will be as unusual as the current sample
given that the fit is appropriate. A small p-value
indicates that the current sample is highly
unlikely, and therefore, the fit should be rejected.
Conversely, a high p-value indicates that the sam-
ple is likely and would be repeated, and therefore,
the fit should not be rejected. Thus, the HIGHER
the p-value, the more likely that the fit is appro-
priate. When comparing two different fitted dis-
tributions, the distribution with the higher p-
value is likely to be the better fit regardless of the
level of significance.
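The guide does not state how Stat::Fit computes the KS p-value. One common approach, shown here purely as an assumed illustration, is the asymptotic Kolmogorov series, which is strictly valid only for large samples and for distributions whose parameters were not estimated from the data. The function name and the number of series terms are hypothetical.

```python
import math


def ks_asymptotic_p_value(d, n, terms=100):
    """Asymptotic Kolmogorov p-value for KS statistic d with n data points.

    Uses Q(lam) = 2 * sum_{j>=1} (-1)^(j-1) * exp(-2 j^2 lam^2),
    with lam = sqrt(n) * d.
    """
    lam = math.sqrt(n) * d
    if lam < 0.1:
        # The series converges slowly here and Q is 1 to many digits.
        return 1.0
    total = 0.0
    for j in range(1, terms + 1):
        total += (-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
    return min(1.0, max(0.0, 2.0 * total))


# A larger discrepancy gives a smaller p-value for the same sample size.
print(ks_asymptotic_p_value(0.05, 100))
print(ks_asymptotic_p_value(0.20, 100))
```

Comparing the returned value with the chosen level of significance gives the same kind of REJECT or DO NOT REJECT decision as comparing the statistic with its standard value.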
Anderson Darling Test
The Anderson Darling test is a test of the good-
ness of fit of the fitted cumulative distribution to
the input data in the Data Table, weighted heavily
in the tails of the distributions. This test calcu-
lates the integral of the squared difference
between the input data and the fitted distribution,
with increased weighting for the tails of the dis-
tribution, by the equation:
$$W_n^2 = n \int_{-\infty}^{\infty} \frac{[F_n(x) - F(x)]^2}{F(x)\,[1 - F(x)]} \, dF(x)$$
where W_n^2 is the AD statistic, n is the number of
data points, F(x) is the fitted cumulative distribution,
and F_n(x) is the cumulative distribution of
1. Exact Power of Goodness-of-Fit Tests of Kolmogorov
Type for Discontinuous Distributions, Leon Jay
Gleser, J.Am.Stat.Assoc., 80 (1985) p. 954
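the input data. This can be reduced to the more
useful computational equation:
$$W_n^2 = -n - \frac{1}{n} \sum_{i=1}^{n} (2i - 1)\left[\ln F(x_i) + \ln\bigl(1 - F(x_{n-i+1})\bigr)\right]$$
where F(x_i) is the value of the fitted cumulative
distribution for the ith data point, with the data
taken in ascending order (see Law & Kelton¹,
Anderson & Darling²,³).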
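The computational form of the AD statistic can be sketched in a few lines of Python. This is an illustration of the standard formula, not Stat::Fit's implementation; the function name is hypothetical, and the sketch assumes every fitted cumulative value lies strictly between 0 and 1 (otherwise the logarithms are undefined, which is one reason the test needs care at bounded ends of a distribution).

```python
import math


def anderson_darling(data, cdf):
    """Anderson Darling statistic for a sample against a fitted CDF."""
    xs = sorted(data)
    n = len(xs)
    w = [cdf(x) for x in xs]  # fitted cumulative value at each ordered point
    total = 0.0
    for i in range(1, n + 1):
        # w[i - 1] is F(x_i); w[n - i] is F(x_{n-i+1}) in 1-based notation.
        total += (2 * i - 1) * (math.log(w[i - 1]) + math.log(1.0 - w[n - i]))
    return -n - total / n


# Example against a uniform(0, 1) distribution, whose CDF is F(x) = x.
print(anderson_darling([0.25, 0.5, 0.75], lambda x: x))
```

The logarithms grow without bound as the fitted cumulative value approaches 0 or 1, which is where the heavy weighting of the tails comes from.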
The resulting test statistic is then compared to a
standard value of the AD statistic with the appro-
priate number of data points and level of signifi-
cance, usually labeled alpha. The limitations of
the AD test are similar to the Kolmogorov
Smirnov test with the exception of the boundary
conditions discussed below. The AD test does not
rely on a limiting (large-sample) distribution; it is
appropriate for any sample size. While the AD test
is only valid if
none of the parameters in the test have been esti-
mated from the data, it can be used for fitted dis-
tributions with the understanding that it is then a
conservative test, that is, less likely to reject the
fit in error. The validity of the AD test can be
improved for some specific distributions. These
more stringent tests take the form of a multiplica-
tive adjustment to the general AD statistic.
The goodness of fit view also reports a REJECT
or DO NOT REJECT decision for each AD test
based on the comparison between the calculated
test statistic and the standard statistic for the
given level of significance. The AD test is very
sensitive to the tails of the distribution. For this
reason, the test must be used with discretion for
many of the continuous distributions with lower
bounds and finite values at that lower bound.
The test is inaccurate for discrete distributions as
the standard statistic is not easily calculated.
While the test statistic for the Anderson Darling
test can be useful, the p-value is more useful in
determining the goodness of fit. The p-value is
defined as the probability that another sample
will be as unusual as the current sample given
that the fit is appropriate. A small p-value indi-
cates that the current sample is highly unlikely,
and therefore, the fit should be rejected. Con-
versely, a high p-value indicates that the sample
is likely and would be repeated, and therefore, the
fit should not be rejected. Thus, the HIGHER the
p-value, the more likely that the fit is appropriate.
When comparing two different fitted distribu-
tions, the distribution with the higher p-value is
likely to be the better fit regardless of the level of
significance.
General
Each of these tests has its own regions of greater
sensitivity, but they all have one criterion in common.
The fit and the tests are totally insensitive
for fewer than 10 data points (Stat::Fit will not
respond to fewer data points), and will not achieve
much accuracy until 100 data points. On the order of
200 data points seems to be optimum. For large
data sets, greater than 4000 data points, the tests
can become too sensitive, occasionally rejecting a
proposed distribution when it is actually a useful
fit. This can be easily tested with the Generate
command in the Input menu.
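The effect of sample size on sensitivity can be illustrated outside Stat::Fit with a small deterministic experiment. The sketch below is an assumption-laden illustration: it builds an idealized sample (evenly spaced quantiles rather than random draws) from an exponential distribution with mean 1.05, tests it against a fitted exponential with mean 1.0, and compares the KS statistic with the approximate large-sample 5% standard value 1.358/sqrt(n). All names and numbers are hypothetical.

```python
import math


def ks_d(sorted_xs, cdf):
    """KS statistic for already sorted data against a fitted CDF."""
    n = len(sorted_xs)
    d_plus = max(i / n - cdf(x) for i, x in enumerate(sorted_xs, start=1))
    d_minus = max(cdf(x) - (i - 1) / n for i, x in enumerate(sorted_xs, start=1))
    return max(d_plus, d_minus)


def idealized_exponential_sample(n, mean):
    """Quantiles at probabilities (i - 0.5)/n; ascending by construction."""
    return [-mean * math.log(1.0 - (i - 0.5) / n) for i in range(1, n + 1)]


def ks_rejects(n, true_mean=1.05, fitted_mean=1.0, c_alpha=1.358):
    """True if the fitted exponential is rejected at roughly the 5% level."""
    xs = idealized_exponential_sample(n, true_mean)
    d = ks_d(xs, lambda x: 1.0 - math.exp(-x / fitted_mean))
    return d > c_alpha / math.sqrt(n)


for n in (200, 20000):
    print(n, ks_rejects(n))
```

With these assumed values the 5% mismatch in the mean is not rejected at 200 points but is rejected at 20000 points, even though the fitted distribution may be equally useful in both cases.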
While the calculations are being performed, a
window at the bottom of the screen shows its
progress and allows for a Cancel option at any
time.
1. Simulation Modeling & Analysis, Averill M.
Law, W. David Kelton, 1991, McGraw-Hill, p. 392
2. A Test of Goodness of Fit, T. W. Anderson, D. A.
Darling, J.Am.Stat.Assoc., 1954, p. 765
3. Asymptotic Theory of Certain "Goodness of Fit"
Criteria Based on Stochastic Processes, T. W.
Anderson, D. A. Darling, Ann.Math.Stat., 1952, p.
193
The results are shown in a table. An example is
given below:
In the summary section, the distributions you
have selected for fitting are shown along with the
results of the Goodness of Fit Test(s). The num-
bers in parentheses after the type of distribution
are the parameters and they are shown explicitly
in the detailed information, below the summary
table.
Please note
The above table shows results for the Chi-
Squared Test. The number in parentheses is the
degrees of freedom. When you want to compare
Chi-Squared from different distributions, you can
make a comparison only when they have the same
degrees of freedom.
The detailed information, following the summary
table, includes a section for each fitted distribu-
tion. This section includes:
parameter values
Chi Squared Test
Kolmogorov Smirnov Test
Anderson Darling Test
Please note
If an error occurred in the calculations, the error
message is displayed instead.
For the Chi Squared Test, the details show:
total classes [intervals]
interval type [equal length, equal probable]
net bins [reduced intervals]
chi**2 [the calculated statistic]
degrees of freedom [net bins-1 here]
alpha [level of significance]
chi**2(n, alpha) [the standard statistic]
p-value
result
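To make the Chi Squared details concrete, here is a minimal sketch of the statistic for equal probable intervals. It is not Stat::Fit's code: the function name is hypothetical, no interval reduction is performed (so net bins equals total classes), and the degrees of freedom follow the "net bins - 1" convention listed above.

```python
from bisect import bisect_right


def chi_squared_equal_probable(data, inverse_cdf, classes):
    """Return (chi**2, degrees of freedom, observed counts per class)."""
    n = len(data)
    # Interior class boundaries at fitted probabilities 1/k, 2/k, ..., (k-1)/k.
    edges = [inverse_cdf(j / classes) for j in range(1, classes)]
    observed = [0] * classes
    for x in data:
        observed[bisect_right(edges, x)] += 1
    expected = n / classes  # the same for every equal probable class
    stat = sum((o - expected) ** 2 / expected for o in observed)
    return stat, classes - 1, observed


# Example against a uniform(0, 1) fit, whose inverse CDF is the identity.
stat, dof, observed = chi_squared_equal_probable(
    [0.1, 0.2, 0.3, 0.9], lambda p: p, classes=2
)
print(stat, dof, observed)
```

The calculated statistic would then be compared with the standard statistic chi**2(n, alpha) for the same degrees of freedom.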
For both the Kolmogorov Smirnov and Anderson
Darling tests, the details show:
data points
stat [the calculated statistic]
alpha [level of significance]
stat (n, alpha) [the standard statistic]
p-value
result
Distribution Fit - Auto::Fit
Automatic fitting of continuous dis-
tributions can be performed by
clicking on the Auto::Fit icon or by
selecting Fit from the Menu bar and then
Auto::Fit from the Submenu.
This command follows the same procedure as
previously discussed for manual fitting.
Auto::Fit will automatically choose appropriate
continuous distributions to fit to the input data,
calculate Maximum Likelihood Estimates for
those distributions, test the results for Goodness
of Fit, and display the distributions in order of
their relative rank. The relative rank is deter-
mined by an empirical method which uses effec-
tive goodness of fit calculations. While a good
rank usually indicates that the fitted distribution
is a good representation of the input data, an
absolute indication of the goodness of fit is also
given.
An example is shown below:
For continuous distributions, the Auto::Fit dialog
limits the number of distributions by choosing
only those distributions with a lower bound or by
forcing a lower bound to a specific value as in Fit
Setup. Also, the number of distributions will be
limited if the skewness of the input data is nega-
tive. Many continuous distributions with lower
bounds do not have good parameter estimates in
this situation.
For discrete distributions, the Auto::Fit dialog
limits the distributions by choosing only those
distributions that can be fit to the data. The dis-
crete distributions must have a lower bound.
The acceptance of fit usually reflects the results
of the goodness of fit tests at the level of signifi-
cance chosen by the user. However, the accep-
tance may be modified if the fitted distribution
would generate significantly more data points in
the tails of the distribution than are indicated by
the input data.
Replication and Confidence Level Calculator
The Replications command allows the user to
calculate the number of independent data points,
or replications, of an experiment that are neces-
sary to provide a given range, or confidence
interval, for the estimate of a parameter. The con-
fidence interval is given for the confidence level
specified, with a default of 0.95. The resulting
number of replications is calculated using the t
distribution¹.
To use the Replications calculator, select Utilities
from the Menu bar and then Replications.
The following dialog will be displayed.
The expected variation of the parameter must be
specified by either its expected maximum range
or its expected standard deviation. Quite fre-
quently, this variation is calculated by pilot runs
of the experiment or simulation, but can be cho-
sen by experience if necessary. Be aware that this
is just an initial value for the required replica-
tions, and should be refined as further data are
available.
Alternatively, the confidence interval for a given
estimate of a parameter can be calculated from
the known number of replications and the
expected or estimated variation of the parameter.
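The two calculations can be sketched as follows. This is an approximation, not the Stat::Fit calculator: the guide says the t distribution is used, while this sketch substitutes the normal quantile from Python's standard library, which gives slightly smaller numbers for small replication counts. It also assumes the requested range is expressed as a half width around the estimate; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def replications_needed(std_dev, half_width, confidence=0.95):
    """Approximate replications so the interval is at most +/- half_width."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return math.ceil((z * std_dev / half_width) ** 2)


def interval_half_width(std_dev, replications, confidence=0.95):
    """Approximate half width for a known number of replications."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return z * std_dev / math.sqrt(replications)


n = replications_needed(std_dev=2.0, half_width=0.5)
print(n, interval_half_width(2.0, n))
```

Because the standard deviation usually comes from pilot runs, the result is a starting estimate to be refined as more data become available.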
1. Discrete-Event System Simulation,
Second Edition, Jerry Banks, John
S. Carson II, Barry L. Nelson, 1996,
Prentice-Hall, p. 447
Chapter 4:
Graphs
This chapter describes the types of graphs and the Graphics Style options. Graphical analysis and out-
put is an important part of Stat::Fit. The input data in the Data Table may be graphed as a histogram
or line chart and analyzed by a scatter plot or autocorrelation graph. The resulting fit of a distribution
may be compared to the input via a direct comparison, a difference plot, a Q-Q plot, and a P-P plot for
each analytical distribution chosen. The analytical distributions can be displayed for any set of param-
eters.
The resulting graphs can be modified in a variety of ways using the Graphics Style dialog in the Graph-
ics menu, which becomes active when a graph is the currently active window.
Result Graphs
A density graph of your input data and the fitted
density can be viewed by choosing Fit from the
Menu bar and then Result Graphs.
This graph displays a histogram of the input data
overlaid with the fitted densities for specific dis-
tributions.
From the next menu that appears (see above),
choose Comparison.
Quicker access to this graph is accom-
plished by clicking on the Graph icon on
the Control bar.
The graph will appear with the default settings
of the input data in a blue histogram and the fitted
data in a red polygon, as shown below.
The distribution being fit is listed in the lower
box on the right. If you have selected more than
one distribution to be fit, a list of the distributions
is given in the upper box on the right. Select addi-
tional distributions to be displayed, as compari-
sons, by clicking on the distribution name(s) in
the upper box. The additional fit(s) will be added
to the graph and the name of the distribution(s)
added to the box on the lower right. There will be
a Legend at the bottom of the graph, as shown
below:
To remove distributions from the graph, click on
the distribution name in the box on the lower
right side and it will be removed from the graphic
display.
Stat::Fit provides many options for graphs in the
Graphics Style dialog, including changes in the
graph character, the graph scales, the title texts,
the graph fonts and the graph colors.
This dialog can be activated by selecting Graph-
ics from the Menu bar and then Graphics Style
from the Submenu.
The graph remains modified as long as the docu-
ment is open, even if the graph itself is closed and
reopened. It will also be saved with the project as
modified. Note that any changes are singular to
that particular graph; they do not apply to any
other graph in that document or any other docu-
ment.
If a special style is always desired, the default
values may be changed by changing any graph to
suit, and checking the Save Apply button at the
bottom of the dialog.
Graphics Style
Graph
The Graphics Style dialog box has 5 tabs (or
pages). When you select a tab, the dialog box
changes to display the options and default set-
tings for that tab. You determine the settings for
any tab by selecting or clearing the check boxes
on the tab. The new settings take effect when
you close the dialog box. If you want your new
settings to be permanent, select Save to Default
and they will remain in effect until you wish to
change them again.
The dialog box for the graph type options is
shown below:
The Graph Type chooses between three types of
distribution functions:
Density indicates the probability density
function, f(x), for continuous random vari-
ables and the probability mass function, p(j),
for discrete random variables. Quite fre-
quently, f(x) is substituted for p(j) with the
understanding that x then takes on only inte-
ger values.
Ascending cumulative indicates the cumu-
lative distribution function, F(x), where x
can be either a continuous random variable
or a discrete random variable. F(x) is contin-
uous or discrete accordingly. F(x) varies
from 0 to 1.
Descending cumulative indicates the sur-
vival function, (1-F(x)).
Graph Type is not available for Scatter Plot,
Autocorrelation, Q-Q plot and P-P plot.
The Normalization area indicates whether the
graph represents actual counts or a relative frac-
tion of the total counts.
Frequency represents actual counts for each
interval (continuous random variable) or
class (discrete random variable).
Relative Frequency represents the relative
fraction of the total counts for each interval
(continuous random variable) or class (dis-
crete random variable).
Normalization is only available for distribu-
tion graph types, such as Comparison and
Difference.
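The relationship between the Normalization choices and the three graph types can be shown with a short sketch for binned input data. The function and key names are hypothetical, and the counts are made up for illustration.

```python
def summarize_counts(counts):
    """Derive the plotted series from per-interval (or per-class) counts."""
    total = sum(counts)
    running = 0
    cumulative_counts = []
    for c in counts:
        running += c
        cumulative_counts.append(running)
    ascending = [c / total for c in cumulative_counts]
    return {
        "frequency": list(counts),                          # actual counts
        "relative_frequency": [c / total for c in counts],  # fraction of total
        "ascending_cumulative": ascending,                  # F(x), ends at 1
        "descending_cumulative": [1.0 - a for a in ascending],  # 1 - F(x)
    }


series = summarize_counts([2, 3, 5])
print(series["relative_frequency"])
print(series["ascending_cumulative"])
print(series["descending_cumulative"])
```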
The graph style can be modified for both the
input data and the fitted distribution. Choices
include points, line, bar, polygon, filled polygon
and histogram. For Scatter Plots, the choices are
modified and limited to: points, cross, dots.
Scale
The dialog box for Scale is shown below:
The Scale page allows the x and y axes to be
scaled in various ways, as well as modifying the
use of a graph frame, a grid, or tick marks. The
default settings for Scale allow the data and fitted
distribution to be displayed. These settings can
be changed by deselecting the default and adding
Min and Max values.
Moreover, the printed graph will maintain that
aspect ratio as will the bitmap that can be saved
to file or copied to the Clipboard.
The Frame option allows you to have a full, par-
tial or no frame around your graph. A grid can be
added to your graph in both x and y, or just a hor-
izontal or vertical grid can be displayed. Tick
marks can be selected to be inside, outside, or
absent. Both ticks and the grid can overlay the
data.
Text
The dialog box for Text is shown below:
The Text function allows you to add text to your
graph. A Main Title, x-axis and y-axis titles, and
legends can be included. Scale factors can be
added. The layout of the y-axis title can be mod-
ified to be at the top, on the side or rotated along
the side of the y-axis. Some graphs load default
titles initially.
Fonts
The dialog box for Fonts is shown below:
The Fonts page of the dialog provides font selec-
tion for the text titles and scales in the currently
active graph. The font type is restricted to
TrueType, printer-ready fonts that can be scaled on
the display. The Font size is limited to a range
that can be contained in the same window as the
graph. Text colors can be changed in the Color
page; no underlining or strikeouts are available.
Color
The dialog box for Color is shown below:
The Colors page of the dialog provides color
options for all the fields of the currently active
graph. For each object in the graph, a button to
call the color dialog is located to the left and a
color patch is located on the right. Text refers to
all text including scales. Input refers to the first
displayed graph, the input data in comparison
graphs. Result refers to fitted data. Bar Shade
refers to the left and bottom of histogram boxes
and requires the check box be set on as well.
Background refers to the background color; full
white does not print.
Note that the colors are chosen to display well on
the screen. If a laser printer with gray scales is
used, the colors should be changed to brighter
colors or grays in order to generate appropriate
gray levels. Some of the colors will default to the
nearest of the 16 basic Windows colors in order
to display properly.
Other Graphs
Stat::Fit provides additional Result Graphs for
visualizing the fit of your data to a distribution.
Above we have described the Density Graph. The
other choices on the Submenu are Distribution,
Difference, Box Plot, Q-Q Plot and P-P Plot. All
of these graph types allow the comparison of
multiple distributions. If you have selected more
than one distribution to be fit, a list of the distri-
butions is given in the upper box on the right.
Select additional distributions to be displayed, as
comparisons, by clicking on the distribution
name(s) in the upper box. The additional fit(s)
will be added to the graph and the name of the
distribution(s) added to the box on the lower
right. There will be a Legend at the bottom of the
graph. To remove distributions from the graph,
click on the distribution name in the box on the
lower right side and it will be removed from the
graphic display.
Distribution Graph
The Distribution graph displays the cumulative
distribution of the input data overlaid with the fit-
ted cumulative distributions for specific distribu-
tions.
Difference Graph
The Difference Graph is a plot of the difference
between the cumulative input data in the Data
Table minus the fitted cumulative distribution.
Note that conservative error bars shown in the
graph are not a function of the number of inter-
vals for continuous data. Although the graph may
be modified to plot the difference between the
input density and the fitted density, the error bars
derived from the conservative Kolmogorov
Smirnov calculations, are only applicable for the
cumulative distribution.
Multiple distributions can be compared with
respect to their difference plots as shown below:
Box Plot
Box Plots are another way to compare the data
with the distributions. They are particularly good for
looking at the extremes. The center line is the
median. The box represents the quartiles (25%
and 75%). The next set of lines are the octiles and
the outer lines are the extremes of the data or dis-
tribution. A box plot gives a quick indication of
potential skewness in a data set by relating the
location of the box to the median. If one side of
the box is further away from the median than the
other side, the distribution tends to be skewed in
the direction furthest from the median.
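The lines of a box plot for the input data can be computed as in the sketch below. It assumes that "octiles" means the 12.5% and 87.5% points and uses the default interpolation of Python's statistics.quantiles, which may differ slightly from the rule Stat::Fit applies; the function and key names are hypothetical.

```python
from statistics import quantiles


def box_plot_lines(data):
    """Return the seven values drawn in a box plot of the data."""
    xs = sorted(data)
    cuts = quantiles(xs, n=8)  # seven cut points: 12.5%, 25%, ..., 87.5%
    return {
        "minimum": xs[0],
        "lower_octile": cuts[0],
        "lower_quartile": cuts[1],
        "median": cuts[3],
        "upper_quartile": cuts[5],
        "upper_octile": cuts[6],
        "maximum": xs[-1],
    }


lines = box_plot_lines(range(1, 16))
# The side of the box farther from the median suggests the direction of skew.
print(lines["median"] - lines["lower_quartile"],
      lines["upper_quartile"] - lines["median"])
```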
Q-Q Plot
The Q-Q Plot, as shown below, is a plot of the
input data (straight line) in the Data Table versus
the value of x that the fitted distribution must
have in order to give the same probability of
occurrence. This plot tends to be sensitive to
variations of the input data in the tails of the
distribution (see Law & Kelton¹).
Multiple distributions can be added to the graph
for comparison.
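The points of a Q-Q plot can be sketched as pairs of (data value, fitted quantile). The plotting position (i - 0.5)/n used for the probability of the ith ordered point is an assumption; the guide does not state which position Stat::Fit uses. The function name and the exponential example are hypothetical.

```python
import math


def qq_points(data, inverse_cdf):
    """Pair each ordered data value with the fitted value of equal probability."""
    xs = sorted(data)
    n = len(xs)
    return [(x, inverse_cdf((i - 0.5) / n)) for i, x in enumerate(xs, start=1)]


def exponential_inverse_cdf(mean):
    """Inverse CDF of a hypothetical fitted exponential distribution."""
    return lambda p: -mean * math.log(1.0 - p)


for observed, fitted in qq_points([0.2, 0.5, 0.9, 1.4, 2.3], exponential_inverse_cdf(1.0)):
    print(observed, fitted)
```

Points far from the diagonal at the largest values indicate a poor fit in the upper tail.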
P-P Plot
The P-P Plot, as shown below, is a plot of the
probability of the ith data point in the input data
(straight line) from the Data Table versus the
probability of that point from the fitted cumula-
tive distribution. This plot tends to be sensitive
to variations in the center of the fitted data (see
Law & Kelton²).
1. Simulation Modeling & Analysis, Averill M.
Law, W. David Kelton, 1991, McGraw-Hill, p. 374
2. ibid., p. 339
Multiple distributions can be added to the graph
for comparison.
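A P-P plot pairs the probability of each ordered data point with the fitted cumulative probability at that point. The sketch below assumes the plotting position (i - 0.5)/n for the data probability, which the guide does not specify; the function names are hypothetical.

```python
def pp_points(data, cdf):
    """Pair each point's empirical probability with its fitted probability."""
    xs = sorted(data)
    n = len(xs)
    return [((i - 0.5) / n, cdf(x)) for i, x in enumerate(xs, start=1)]


def largest_pp_gap(points):
    """Largest vertical distance of any point from the diagonal."""
    return max(abs(fitted - empirical) for empirical, fitted in points)


# Example against a uniform(0, 1) fit, whose CDF is F(x) = x.
points = pp_points([0.1, 0.2, 0.6, 0.9], lambda x: x)
print(points)
print(largest_pp_gap(points))
```

Because both axes run from 0 to 1, differences in the middle of the distribution are easier to see here than in a Q-Q plot.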
Distribution Viewer
The Distribution Viewer option allows you to dis-
play the functional form of any distribution with
specified parameters totally independent of data.
This is just a picture of what that analytical distri-
bution would look like with the parameters you
have selected. It can be used to visualize the
functional form of distributions and can be useful
in selecting a distribution for fitting. The Distri-
bution Viewer allows active viewing of a distri-
bution while the parameters or moments are
changed.
To display the Distribution Viewer, select Utili-
ties from the Menu bar and then Distribution
Viewer from the Submenu.
Quicker access to the Distribution
Viewer is available by clicking on the
Distribution Viewer icon.
A dialog box will allow you to select the type of
distribution and its parameters. An example of
the distribution graph viewer is shown below:
The Distribution Viewer uses the distribution and
parameters provided in this dialog to create a
graph of any analytical distribution supported by
Stat::Fit. This graph is not connected to any input
data or document. This graph of the distribution
may then be modified using the sliders for each
parameter, specifying the value, or specifying one
of the moments of the distribution. The number
of moments which may be modified is limited to
the number of free parameters for that distribu-
tion.
As the value of each parameter or moment is
changed, the graph is frozen at its previous repre-
sentation. The graph is updated when the slider is
released, or the edited value is entered with a
Return, Tab, or mouse click in another area. The
graph may also be updated with the Redraw
Graph button when active.
The bounds of the distribution, if any, can be
fixed; however this reduces the number of
moments that can be modified. A grayed moment
box can be viewed, but not modified.
Occasionally, the specified moments cannot be
calculated with the given parameters, such as
when the mean is beyond one of the bounds. In
this situation, an error message is given and the
moments are recalculated from the parameters.
Also, some distributions do not have finite
moments for all values of the parameters and the
appropriate moment boxes are shown empty.
As with all graphs in Stat::Fit, the Distribution
Viewer may be customized by using the Graphics
Style dialog in the Graphics menu. The graphs
may also be copied to the Clipboard or saved as a
graphic file [.bmp] by using the Copy or Save As
commands in the Graphics menu. Note that while
the graph view currently open can still be modi-
fied, the copied or saved version is a fixed bit-
map. The bitmap contains only the graph, and
excludes parameter boxes and sliders.
The distribution in the Distribution Viewer may
also be exported to another application by choos-
ing the Export Fit command while the Distribu-
tion Viewer is the active window. In this way, no-
data or minimal data descriptions can be trans-
lated from the form of the distribution in Stat::Fit
to the form of the distribution in a particular
application.
How to Copy and Save As
Any graph can be copied onto the Clipboard, so
that you can transfer it to another program, by
selecting Graphics from the Menu bar and then
Copy from the Submenu. The Graphics Copy
command places a copy of the current graph in
the Clipboard as a bitmap.
The Graphics Save As command saves a bitmap
(.bmp) file of the current graph. From there, it
can be loaded into another application if that
application supports the display of bitmaps. It
can also be loaded into Stat::Fit, but will no
longer be connected with a document. Note that
the copy can no longer be modified with the
Graphics Style dialog.
Chapter 5:
Print and Output Files
This chapter provides details on how to print graphs and reports tailored to meet your needs. Informa-
tion on exporting files will also be given.
Printing
The printing process is started by selecting File
from the Menu bar. The following Submenu is
displayed:
Print Style
Select Print Style from the Submenu. All print
output is a copy of selected windows of a
Stat::Fit document or of a stand alone Distribu-
tion graph. The type of print output, whether a
single active window, all active windows in the
document, or a selected report, is chosen in the
Print Style dialog. Unless changed by the user,
the default style is to print the active window
(colored title bar). Other options, such as fonts
and labels, may be chosen in the Print Style dia-
log as well.
The Print Style dialog box has 2 tabs as shown
below:
Print Type
Print Type is the first tab in the Print Style dialog
box. It allows you to select the items you want to
have printed. Your choices include:
Active Window: Print only the active win-
dow.
Active Document: Print all of the open win-
dows in a single document.
Other options available for printing are the inclu-
sion of a Header or Footer. In either you can put
the title of the document, a date and page num-
bers.
Fonts
The dialog box for Fonts is shown below:
The Font page of the File Print Style dialog pro-
vides font choice for the text pages to be printed.
This choice does not change the font(s) chosen
for the graphics pages. This allows you to select
the font type and size for the written text in the
report. The fonts for the graphs should have been
previously selected in the Graph Style/Fonts dia-
log box.
Printer Set-up
Return to the Submenu under File shown at the
beginning of this chapter, and select Printer
Set-up. The standard Windows dialog box for
printer set-up will be displayed.
This standard Print Setup dialog will allow speci-
fication of the printer, paper size, and orientation
of printed output. It will also allow access to the
Options dialog of the chosen printer. This setup
will subsequently be used by the Print command.
Print Preview
The Print Preview command opens a separate set
of windows to display each expected page of the
print output, using the options specified in the
Print Style command. These windows can be
closed by double clicking on the upper left Close
button.
The Print Preview windows give a scaled version
of the output to be generated by the Print com-
mand. Each graph will maintain the aspect ratio
of the on screen window while maximizing the
graph size to fit the printed page.
Print
Selecting Print on the Submenu under File will
display the standard Windows print dialog box:
The Print command initiates printing of the out-
put specified by the Print Style command, checks
the printer and asks final permission to print with
a standard dialog. This dialog also gives access
to the Printer Setup dialog to specify the printer
type, paper size and orientation, as well as the
printer Options dialog specific to the chosen
printer.
If you are uncertain of the expected output, exit
this dialog with the Cancel button and use the
Print Preview command to view a screen copy.
File Output
When input data is entered into Stat::Fit,
whether through manual entry in a new docu-
ment, opening a data file, pasting data from the
Clipboard, or reopening a Stat::Fit project file, a
Stat::Fit document is created which contains the
data and all subsequent calculations and graphs.
If the document is initiated from an existing file,
it assumes the name of that file and the document
can be saved automatically as a Stat::Fit project
(.SFP extension) with the Save command.
If the document is unnamed, it can be saved as
either a Stat::Fit project or a text data file with
the Save As command. In either situation, the
on-screen document assumes the name of the file
used.
A text file of the input data can be saved with the
Save Input command which will prompt if the
file already exists. Unless specifically changed,
the file will be saved with the .DAT extension.
All graphs can be saved independently as bitmap
files (.BMP format), by using the Save As com-
mand in the Graphics menu when the graph is the
active window.
Export Fit
The fitted distributions, ready for
inclusion in a specific Application,
may be saved in a text file by using
the Export command in the File menu or the
Export icon on the Control Bar.
The Export command provides the fitted distribu-
tion in the form required by the Application in
order to generate random variates from that dis-
tribution. The Export Fit dialog, as shown below,
allows a choice of Applications in order to deter-
mine the format of the output and the choice of
the analytical distribution. After both choices
have been made the expected output is shown
near the bottom of the dialog.
The Export command requires that the appropri-
ate estimates have been calculated, either manu-
ally or automatically with the Auto::Fit
command.
For some applications, the requested distribution
may not be supported. If the generator for the
unsupported distribution can be formed from a
Uniform distribution, the analytical form of this
generator is given instead. If the generator is not
straightforward, no output will be provided.
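As an illustration of a generator formed from a Uniform distribution, the following minimal sketch applies the inverse transform method to an Exponential distribution with parameters min and beta. The function name and the use of Python are assumptions made for illustration; this is not the text that Stat::Fit exports for any particular Application.

```python
import math
import random

def exponential_from_uniform(u, minimum, beta):
    """Map a Uniform(0,1) draw u to an Exponential(min, beta) variate
    by inverting the CDF F(x) = 1 - exp(-(x - min) / beta)."""
    return minimum - beta * math.log(1.0 - u)

rng = random.Random(12345)  # fixed seed so the run is repeatable
samples = [exponential_from_uniform(rng.random(), 2.0, 5.0)
           for _ in range(20000)]
# The sample mean should be close to min + beta.
sample_mean = sum(samples) / len(samples)
```

The same pattern works for any distribution whose cumulative distribution can be inverted in closed form, which is presumably what makes such a generator "straightforward".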
The output can be directed to either the Windows
clipboard or a text file.
Export of Empirical Distributions
The Export Empirical command allows the
export of an empirical distribution for the input
data to either the Windows clipboard or a text
file. The output is formatted with a value and a
probability, delimited by a space, on each line.
An empirical distribution is advisable if the input
data cannot be fit to the analytical distributions.
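A downstream program could read such an export with a few lines of code. The sketch below assumes a cumulative export and uses made-up sample text; the helper names are hypothetical, and the only detail taken from this guide is the "value probability" layout with a space delimiter.

```python
import bisect
import random

# Hypothetical exported text: one "value probability" pair per line,
# assumed here to be a cumulative distribution.
exported = """\
1.000 0.100
2.000 0.350
3.000 0.700
4.000 1.000
"""

def parse_empirical(text):
    """Return parallel lists of values and probabilities."""
    values, probs = [], []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        value, prob = line.split()
        values.append(float(value))
        probs.append(float(prob))
    return values, probs

def sample_cumulative(values, cum_probs, u):
    """Return the first value whose cumulative probability is >= u."""
    index = bisect.bisect_left(cum_probs, u)
    return values[min(index, len(values) - 1)]

values, cum_probs = parse_empirical(exported)
rng = random.Random(1)
draws = [sample_cumulative(values, cum_probs, rng.random()) for _ in range(5)]
```

This lookup treats the values as discrete points; a continuous export would more naturally be sampled by interpolating between adjacent values.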
To export an empirical distribution, select File
from the Menu bar, Export from the Submenu,
and then Export Empirical.
The following dialog will be displayed:
If any of the data are non-integers, then the
exported distribution can only be a continuous
distribution, with the minimum and maximum
values determining the range of data included in
the distribution. The number of data points
included is shown. The number of intervals may
be set as well, but should be small enough to
avoid bins with zero data and large enough to
avoid oversmoothing. The default interval value
is the same as the auto value set by Stat::Fit and
will avoid oversmoothing. The distribution may
be either a cumulative or density distribution.
If the data is integer, then the exported distribu-
tion may be either a continuous or discrete distri-
bution with the minimum and maximum values
determining the range of data included in the dis-
tribution. The discrete integer range is limited to
1000. The number of data points included is indi-
cated in the Points box. The distribution may be
either a cumulative or density distribution.
The Precision of the numbers in the output is set
by the Precision box whose default value is 3.
Chapter 6:
Tutorial
This chapter provides an example to illustrate some of the functions of Stat::Fit. It will also give some
insights for fitting your data and understanding the results of the tests and graphical outputs.
Tutorial
To begin, start a New File for data by
clicking on the New File icon. An input
data table will appear on the screen.
An easy way to provide some input data
is to use the built-in number generator
provided with Stat::Fit. Click on the
Generate icon.
Select the logistic distribution, enter 1000 for the
number of points and use the default settings.
Click on OK. The data table will be filled with
1000 points of generated data.
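For readers who want comparable data outside Stat::Fit, the sketch below generates 1000 logistic points by inverting the logistic cumulative distribution. The shift and scale values (alpha = 0, beta = 1) and the seed are assumptions; they are not necessarily the defaults of the Stat::Fit generator.

```python
import math
import random

def logistic_variate(rng, alpha=0.0, beta=1.0):
    """Draw from a Logistic(alpha, beta) distribution by inverting
    its CDF F(x) = 1 / (1 + exp(-(x - alpha) / beta))."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return alpha + beta * math.log(u / (1.0 - u))

rng = random.Random(2024)  # fixed seed so the run is repeatable
data = [logistic_variate(rng) for _ in range(1000)]
```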
A histogram of your new data can be
displayed by clicking on the Input Graph
icon. A graph very similar to the following will
be shown:
The Descriptive Statistics of the input data can be
viewed by selecting Statistics from the Menu bar
and then Descriptive from the Submenu. The fol-
lowing table will be shown:
The independence of your data can be checked by
selecting Statistics from the Menu bar and then
Independence from the Submenu. From the next
submenu which appears, select Scatter Plot.
If the data is independent, it will scatter all over
the graph.
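The idea behind such a scatter plot can be sketched numerically. The example below assumes the plot pairs each point with the one that follows it, and summarizes the cloud with a lag-1 correlation; the function names and the test data are illustrative assumptions, not Stat::Fit internals.

```python
import math
import random

def lag_pairs(data, lag=1):
    """Pair each point with the point `lag` positions later."""
    return list(zip(data[:-lag], data[lag:]))

def lag_correlation(data, lag=1):
    """Pearson correlation between the two coordinates of the lag pairs."""
    pairs = lag_pairs(data, lag)
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(7)
independent = [rng.gauss(0.0, 1.0) for _ in range(5000)]
trending = [0.01 * i + rng.gauss(0.0, 0.1) for i in range(5000)]

r_independent = lag_correlation(independent)  # near 0: points scatter widely
r_trending = lag_correlation(trending)        # near 1: points hug a line
```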
The distribution fitting process is started
by clicking on the Set-up icon. A dialog
box for Fit Setup will appear. Looking at
the graph of the input data, the shape resembles a
Normal distribution. Therefore, let us try to fit
the data to both a Logistic distribution and a Nor-
mal distribution.
Click on Logistic and then on Normal.
After selecting the distributions, go to the next
window of the dialog box, for Calculations. In
addition to the Chi-Squared Test, also select the
Kolmogorov-Smirnov and Anderson-Darling
Tests. Then click on OK.
The tests will be performed by clicking
on the Test icon. The calculations for the
Goodness of Fit tests will start. A sum-
mary of the results will be presented in a table as
shown next:
The results of the tests indicate that neither distri-
bution should be rejected. However you will
notice that the p-values for the Logistic distribu-
tion are higher than for the Normal distribution,
indicating a better fit.
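To make the comparison concrete, here is a minimal sketch of the Kolmogorov-Smirnov statistic computed for a Logistic and a Normal fit to the same logistic data. It estimates parameters by matching the sample mean and standard deviation, which is an assumption for brevity and may differ from the estimates Stat::Fit computes; p-values are not calculated.

```python
import math
import random
import statistics

def logistic_cdf(x, alpha, beta):
    return 1.0 / (1.0 + math.exp(-(x - alpha) / beta))

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ks_statistic(data, cdf):
    """Largest gap between the empirical CDF and the fitted CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # check the gap just after and just before the step at x
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

rng = random.Random(42)
data = []
while len(data) < 1000:
    u = rng.random()
    if u > 0.0:
        data.append(math.log(u / (1.0 - u)))  # Logistic(0, 1) variate

mean = statistics.fmean(data)
sd = statistics.stdev(data)
beta = sd * math.sqrt(3.0) / math.pi  # logistic scale with matching variance

d_logistic = ks_statistic(data, lambda x: logistic_cdf(x, mean, beta))
d_normal = ks_statistic(data, lambda x: normal_cdf(x, mean, sd))
```

A smaller statistic corresponds to a closer fit and therefore to a higher p-value.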
Graphical results can be viewed by click-
ing on the Graph Fit icon. The following
graph will be shown:
To compare this to the graph of the Normal distri-
bution, click on Normal in the upper right box. It
will be added to the graph as shown below:
Graphically both distributions appear to fit the
data. Remember that the Goodness of Fit tests
indicated that the Logistic distribution was a bet-
ter fit. To get a better visualization of this, let us
try some of the other Result Graphs.
Select Fit from the Menu bar and then Result
Graphs from the Submenu. From the next sub-
menu choose Q-Q Plot.
The Q-Q plot is sensitive to the tails of a distribu-
tion and you can see in the above graph that the
Normal distribution does not provide a good fit
for the tails of the data.
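The points of a Q-Q plot can be computed as in the sketch below, which pairs each sorted data value with the fitted quantile at the same probability. The plotting position (i + 0.5)/n is an assumption; Stat::Fit may use a different convention.

```python
import math
from statistics import NormalDist

def logistic_quantile(p, alpha, beta):
    """Inverse CDF of the Logistic(alpha, beta) distribution."""
    return alpha + beta * math.log(p / (1.0 - p))

def qq_points(data, quantile):
    """Return (fitted quantile, observed value) pairs for a Q-Q plot."""
    xs = sorted(data)
    n = len(xs)
    return [(quantile((i + 0.5) / n), x) for i, x in enumerate(xs)]

# With equal variance, the Logistic places its extreme quantiles farther
# out than the Normal, which is why tail misfit shows up in this plot.
unit_normal = NormalDist(0.0, 1.0)
beta_unit_variance = math.sqrt(3.0) / math.pi
normal_tail = unit_normal.inv_cdf(0.999)
logistic_tail = logistic_quantile(0.999, 0.0, beta_unit_variance)
```

A perfect fit places every point on the line y = x; systematic departures at the ends of the line indicate that the tails are not well described.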
Another useful graph for displaying the fit is the
Difference Graph. Select Fit from the Menu bar,
Result Graphs from the Submenu and then Dif-
ference Graph.
The Difference Graphs illustrate the difference of
the cumulative distribution you have fit com-
pared to the data.
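A difference curve of this kind can be sketched as the fitted cumulative distribution minus the empirical cumulative distribution at each sorted data point. The sign convention and the function name are assumptions for illustration.

```python
def difference_curve(data, cdf):
    """Return (x, fitted CDF - empirical CDF) at each sorted data point."""
    xs = sorted(data)
    n = len(xs)
    return [(x, cdf(x) - (i + 1) / n) for i, x in enumerate(xs)]

# Data that sit exactly on the quantiles of a Uniform(0,1) fit give a
# flat curve at zero.
flat = difference_curve([0.25, 0.5, 0.75, 1.0], lambda x: x)
```

The closer the curve stays to zero across the whole range, the better the fit.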
It is instructive to look at another distribution
which might fit. Let us try the Weibull distribu-
tion. Repeat the previous procedures: click on the
Setup icon and select the Weibull distribution for
fitting. Hit OK and then click on the Test icon.
The goodness of fit summary shows the results:
The results of these tests indicate that you should
reject this distribution. Let us look at the Com-
parison Graph to see if it is obvious from its
visual appearance. Click on the Graph Fit icon
and choose Weibull.
For a better visual display, try the Q-Q plot for
the Weibull distribution.
Here we see the tails of the distribution do not fit
well. Let us look at the P-P plot for the center of
the distribution.
The combination of these plots provides a good
visualization of how well the data is fit to a par-
ticular distribution.
Let us now print a Report with our results. Select
File from the menu bar and then Print Style from
the Submenu. In the first page of the dialog box,
select Print Active document in order to print all
of the test results and graphs.
Select Print Setup from the Submenu under File
in order to specify your printer, paper, and
orientation. Select Print Preview to display your
report before printing. If you are pleased with its
contents, select Print from the Submenu and hit OK.
Appendix:
Distributions
Beta Distribution (min, max, p, q)
min ≤ x ≤ max
min = minimum value of x
max = maximum value of x
p = lower shape parameter > 0
q = upper shape parameter > 0
B(p,q) = Beta function
Description
The Beta distribution is a continuous distribution that has both upper and lower finite bounds. Because
many real situations can be bounded in this way, the Beta distribution can be used empirically to esti-
mate the actual distribution before much data is available. Even when data is available, the Beta distri-
bution should fit most data in a reasonable fashion, although it may not be the best fit. The Uniform
distribution is a special case of the Beta distribution with p,q = 1.
As can be seen in the examples above, the Beta distribution can approach zero or infinity at either of its
bounds, with p controlling the lower bound and q controlling the upper bound. Values of p, q < 1 cause
the Beta distribution to approach infinity at that bound. Values of p, q > 1 cause the Beta distribution to
be finite at that bound.

$$f(x) = \frac{1}{B(p,q)} \, \frac{(x-\min)^{p-1} \, (\max-x)^{q-1}}{(\max-\min)^{p+q-1}}$$
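The behavior of the density can be explored numerically. The sketch below evaluates the standard four-parameter Beta density; the function names are illustrative, and the density is taken to be zero outside the open interval (min, max) as a simplifying assumption.

```python
import math

def beta_function(p, q):
    """B(p, q) computed through log-gamma for numerical stability."""
    return math.exp(math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q))

def beta_pdf(x, minimum, maximum, p, q):
    """Density of the Beta(min, max, p, q) distribution at x."""
    if not minimum < x < maximum:
        return 0.0
    width = maximum - minimum
    return ((x - minimum) ** (p - 1) * (maximum - x) ** (q - 1)
            / (beta_function(p, q) * width ** (p + q - 1)))

# With p = q = 1 the density is flat at 1 / (max - min): the Uniform case.
uniform_height = beta_pdf(3.0, 2.0, 6.0, 1.0, 1.0)
# With p < 1 the density grows without limit near the lower bound.
near_lower_bound = beta_pdf(2.0001, 2.0, 6.0, 0.5, 2.0)
```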
Beta distributions have many uses. As summarized in Johnson et al.¹, Beta distributions have
been used to model distributions of hydrologic variables, logarithm of aerosol sizes, activity time in
PERT analysis, insolation data in photovoltaic system analysis, porosity/void ratio of soil, phase
derivatives in communication theory, size of progeny in Escherichia coli, dissipation rate in breakage
models, proportions in gas mixtures, steady-state reflectivity, clutter and power of radar signals,
construction duration, particle size, tool wear, and others. Many of these uses occur because of the
doubly bounded nature of the Beta distribution.
1. "Continuous Univariate Distributions, Volume 2", Norman L. Johnson, Samuel Kotz, N. Balakrishnan, 1995,
John Wiley & Sons, p. 236-237
Binomial Distribution (n, p)
x = 0, 1, ..., n
n = number of trials
p = probability of the event occurring
Description
The Binomial distribution is a discrete distribution bounded by [0,n]. Typically, it is used where a sin-
gle trial is repeated over and over, such as the tossing of a coin. The parameter, p, is the probability of
the event, either heads or tails, either occurring or not occurring. Each single trial is assumed to be inde-
pendent of all others. For large n, the Binomial distribution may be approximated by the Normal distri-
bution, for example when np>9 and p<0.5 or when np(1-p)>9.
As shown in the examples above, low values of p give high probabilities for low values of x and vice
versa, so that the peak in the distribution may approach either bound. Note that the probabilities are
actually weights at each integer, but are represented by broader bars for visibility.
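The Normal approximation mentioned above can be checked with a short calculation. This sketch compares the exact Binomial probability with a Normal approximation that uses a continuity correction; the correction and the function names are assumptions added for illustration.

```python
import math

def binomial_pmf(x, n, p):
    """Exact probability of x events in n independent trials."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def normal_approximation(x, n, p):
    """Normal probability of the interval (x - 0.5, x + 0.5)
    with mean n*p and variance n*p*(1-p)."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))

    def cdf(t):
        return 0.5 * (1.0 + math.erf((t - mu) / (sigma * math.sqrt(2.0))))

    return cdf(x + 0.5) - cdf(x - 0.5)

# n = 100, p = 0.3 gives np(1-p) = 21 > 9, inside the stated range.
exact = binomial_pmf(30, 100, 0.3)
approx = normal_approximation(30, 100, 0.3)
```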
$$p(x) = \binom{n}{x} \, p^{x} (1-p)^{n-x}, \qquad \binom{n}{x} = \frac{n!}{x! \, (n-x)!}$$
The Binomial distribution has had extensive use in games, but is also useful in genetics, sampling of
defective parts in a stable process, and other event sampling tests where the probability of the event is
known to be constant or nearly so (see Johnson et al.¹).
1. "Univariate Discrete Distributions", Norman L. Johnson, Samuel Kotz, Adrienne W. Kemp, 1992, John Wiley &
Sons, p. 134
Chi Squared Distribution (min, nu)
min = minimum x value
nu = shape parameter
Description
The Chi Squared distribution is a continuous distribution bounded on the lower side. Note that the Chi
Squared distribution is a special case of the Gamma distribution with beta=2 and alpha=nu/2. Like the
Gamma distribution, it has three distinct regions. For nu=2, the Chi Squared distribution reduces to the
Exponential distribution, starting at a finite value at minimum x and decreasing monotonically
thereafter. For nu<2, the Chi Squared distribution tends to infinity at minimum x and decreases
monotonically for increasing x. For nu>2, the Chi Squared distribution is 0 at minimum x, peaks at a
value that depends on nu, and decreases monotonically thereafter.
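The relationship to the Gamma and Exponential distributions can be verified numerically. The sketch below uses the standard density forms for both distributions; the function names are illustrative assumptions.

```python
import math

def gamma_pdf(x, minimum, alpha, beta):
    """Density of the Gamma(min, alpha, beta) distribution at x."""
    y = x - minimum
    if y <= 0.0:
        return 0.0
    return y ** (alpha - 1) * math.exp(-y / beta) / (beta ** alpha * math.gamma(alpha))

def chi_squared_pdf(x, minimum, nu):
    """Density of the Chi Squared(min, nu) distribution at x."""
    y = x - minimum
    if y <= 0.0:
        return 0.0
    half = nu / 2.0
    return y ** (half - 1) * math.exp(-y / 2.0) / (2.0 ** half * math.gamma(half))

# The same point evaluated both ways: Chi Squared with nu = 5 and
# Gamma with alpha = nu/2 = 2.5 and beta = 2.
chi_value = chi_squared_pdf(4.0, 1.0, 5.0)
gamma_value = gamma_pdf(4.0, 1.0, 2.5, 2.0)
```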
Because the Chi Squared distribution does not have a scaling parameter, its utilization is somewhat
limited. Frequently, this distribution will try to represent data with a clustered distribution with nu
less than 2. However, it can be viewed as the distribution of the sum of squares of independent unit
normal variables with nu degrees of freedom and is used in many statistical tests (see Johnson et al.¹).
1. "Continuous Univariate Distributions, Volume 1", Norman L. Johnson, Samuel Kotz, N.
Balakrishnan, 1994, John Wiley & Sons, p. 415
$$f(x) = \frac{1}{2^{\nu/2} \, \Gamma(\nu/2)} \, (x-\min)^{\nu/2-1} \exp\left(-\frac{x-\min}{2}\right)$$
Examples of each of the regions of the Chi Squared distribution are shown above. Note that the peak of
the distribution moves away from the minimum value for increasing nu, but with a much broader
distribution. More examples can be viewed by using the Distribution Viewer.
Discrete Uniform Distribution (min, max)
x = min, min+1, ..., max
min = minimum x
max = maximum x
Description
The Discrete Uniform distribution is a discrete distribution bounded on [min, max] with constant proba-
bility at every value on or between the bounds. Sometimes called the discrete rectangular distribution,
it arises when an event can have a finite and equally probable number of outcomes (see Johnson et al.¹).
Note that the probabilities are actually weights at each integer, but are represented by broader bars for
visibility.
1. "Univariate Discrete Distributions", Norman L. Johnson, Samuel Kotz, Adrienne W. Kemp, 1992, John Wiley &
Sons, p. 272
$$p(x) = \frac{1}{\max - \min + 1}$$
Erlang Distribution (min, m, beta)
min = minimum x
m = shape factor = positive integer
beta = scale factor > 0
Description
The Erlang distribution is a continuous distribution bounded on the lower side. It is a special case of the
Gamma distribution where the parameter, m, is restricted to a positive integer. As such, the Erlang dis-
tribution has no region where f(x) tends to infinity at the minimum value of x [m<1], but does have a
special case at m=1, where it reduces to the Exponential distribution.
The Erlang distribution has been used extensively in reliability and in queuing theory, thus in discrete
event simulation, because it can be viewed as the sum of m exponentially distributed random variables,
each with mean beta. It can be further generalized (see Johnson¹, Banks & Carson²).
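The sum-of-exponentials view translates directly into a simple generator. The sketch below is an illustration under that assumption, with a hypothetical function name; it is not the generator used by Stat::Fit or by any particular simulation package.

```python
import random

def erlang_variate(rng, minimum, m, beta):
    """Draw an Erlang(min, m, beta) variate as min plus the sum of
    m independent exponential variates, each with mean beta."""
    # random.expovariate takes the rate, which is 1 / mean.
    return minimum + sum(rng.expovariate(1.0 / beta) for _ in range(m))

rng = random.Random(99)  # fixed seed so the run is repeatable
draws = [erlang_variate(rng, 1.0, 3, 2.0) for _ in range(20000)]
# The sample mean should be close to min + m * beta.
sample_mean = sum(draws) / len(draws)
```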
1. "Continuous Univariate Distributions, Volume 1", Norman L. Johnson, Samuel Kotz, N. Balakrishnan, 1994,
John Wiley & Sons
2. "Discrete-Event System Simulation", Jerry Banks, John S. Carson II, 1984, Prentice-Hall
$$f(x) = \frac{(x-\min)^{m-1}}{\beta^{m} \, \Gamma(m)} \exp\left(-\frac{x-\min}{\beta}\right)$$
As can be seen in the previous examples, the Erlang distribution follows the Exponential distribution at
m=1, has a positive skewness with a peak near 0 for m between 2 and 9, and tends to a symmetrical dis-
tribution offset from the minimum at larger m.
Exponential Distribution (min, beta)
min = minimum x value
beta = scale parameter = mean
Description
The Exponential distribution is a continuous distribution bounded on the lower side. Its shape is always
the same, starting at a finite value at the minimum and continuously decreasing at larger x. As shown in
the examples above, the Exponential distribution decreases rapidly for increasing x.
The Exponential distribution is frequently used to represent the time between random occurrences, such
as the time between arrivals at a specific location in a queuing model or the time between failures in
reliability models. It has also been used to represent the service times of a specific operation. Further,
it serves as an explicit manner in which the time dependence on noise may be treated. As such, these
models are making explicit use of the lack of history dependence of the exponential distribution; it has
the same set of probabilities when shifted in time. Even when Exponential models are known to be
inadequate to describe the situation, their mathematical tractability provides a good starting point.
Later, a more complex distribution such as Erlang or Weibull may be investigated (see Law & Kelton¹,
Johnson et al.²).
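The lack of history dependence can be stated as P(X > s + t | X > s) = P(X > t) when min = 0. The sketch below checks this identity numerically; the function name and the chosen values of beta, s, and t are assumptions for illustration.

```python
import math

def exponential_survival(x, minimum, beta):
    """P(X > x) for an Exponential(min, beta) distribution."""
    if x <= minimum:
        return 1.0
    return math.exp(-(x - minimum) / beta)

beta, s, t = 4.0, 3.0, 5.0
# Probability of lasting a further t, given survival to s already.
conditional = (exponential_survival(s + t, 0.0, beta)
               / exponential_survival(s, 0.0, beta))
# Probability of lasting t starting from scratch.
unconditional = exponential_survival(t, 0.0, beta)
```

The two probabilities agree to within rounding, which is the property that queuing and reliability models rely on.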
1. "Simulation Modeling & Analysis", Averill M. Law, W. David Kelton, 1991, McGraw-Hill, p. 330
2. "Continuous Univariate Distributions, Volume 1", Norman L. Johnson, Samuel Kotz, N. Balakrishnan, 1994,
John Wiley & Sons, p. 499
$$f(x) = \frac{1}{\beta} \exp\left(-\frac{x-\min}{\beta}\right)$$
Extreme Value type 1A Distribution (tau, beta)
tau = threshold/shift parameter
beta = scale parameter
Description
The Extreme Value 1A distribution is an unbounded continuous distribution. Its shape is always the
same but may be shifted or scaled to need. It is also called the Gumbel distribution.
The Extreme Value 1A distribution describes the limiting distribution of the extreme values of many
types of samples. Actually, the Extreme Value distribution given above is usually referred to as Type 1,
with Type 2 and Type 3 describing other limiting cases. If x is replaced by -x, then the resulting
distribution describes the limiting distribution for the least values of many types of samples. This
reflected pair of distributions is sometimes referred to as Type 1A and Type 1B.
The Extreme Value distribution has been used to represent parameters in growth models, astronomy,
human lifetimes, radioactive emissions, strength of materials, flood analysis, seismic analysis, and
rainfall analysis. It is also directly related to many learning models (see Johnson¹).
The Extreme Value 1A distribution starts below tau, is skewed in the positive direction, peaks at tau,
and decreases monotonically thereafter. Beta determines the breadth of the distribution.
1. "Continuous Univariate Distributions, Volume 2", Norman L. Johnson, Samuel Kotz, N. Balakrishnan, 1995,
John Wiley & Sons
$$f(x) = \frac{1}{\beta} \exp\left(-\frac{x-\tau}{\beta}\right) \exp\left(-\exp\left(-\frac{x-\tau}{\beta}\right)\right)$$
Extreme Value type 1B Distribution (tau, beta)
tau = threshold/shift parameter
beta = scale parameter
Description
The Extreme Value 1B distribution is an unbounded continuous distribution. Its shape is always the
same but may be shifted or scaled to need.
The Extreme Value 1B distribution describes the limiting distribution of the least values of many types
of samples. Actually, the Extreme Value distribution given above is usually referred to as Type 1, with
Type 2 and Type 3 describing other limiting cases. If x is replaced by -x, then the resulting distribution
describes the limiting distribution for the greatest values of many types of samples. This reflected pair
of distributions is sometimes referred to as Type 1A and Type 1B. Note that the complementary
distribution can be used to represent samples with positive skewness.
The Extreme Value distribution has been used to represent parameters in growth models, astronomy,
human lifetimes, radioactive emissions, strength of materials, flood analysis, seismic analysis, and
rainfall analysis. It is also directly related to many learning models (see Johnson et al.¹).
The Extreme Value 1B distribution starts below tau, is skewed in the negative direction, peaks at tau,
and decreases monotonically thereafter. Beta determines the breadth of the distribution.
1. "Continuous Univariate Distributions, Volume 2", Norman L. Johnson, Samuel Kotz, N.
Balakrishnan, 1995, John Wiley & Sons
$$f(x) = \frac{1}{\beta} \exp\left(\frac{x-\tau}{\beta}\right) \exp\left(-\exp\left(\frac{x-\tau}{\beta}\right)\right)$$
Gamma Distribution (min, alpha, beta)
min = minimum x
alpha = shape parameter > 0
beta = scale parameter > 0
Description
The Gamma distribution is a continuous distribution bounded at the lower side. It has three distinct
regions. For alpha=1, the Gamma distribution reduces to the Exponential distribution, starting at a
finite value at minimum x and decreasing monotonically thereafter. For alpha<1, the Gamma distribution
tends to infinity at minimum x and decreases monotonically for increasing x. For alpha>1, the Gamma
distribution is 0 at minimum x, peaks at a value that depends on both alpha and beta, and decreases
monotonically thereafter. If alpha is restricted to positive integers, the Gamma distribution reduces to
the Erlang distribution.
Note that the Gamma distribution also reduces to the Chi Squared distribution for min=0, beta=2, and
alpha=nu/2. It can then be viewed as the distribution of the sum of squares of independent unit normal
variables, with nu degrees of freedom, and is used in many statistical tests.
$$f(x) = \frac{(x-\min)^{\alpha-1}}{\beta^{\alpha} \, \Gamma(\alpha)} \exp\left(-\frac{x-\min}{\beta}\right)$$
The Gamma distribution can also be used to approximate the Normal distribution, for large alpha, while
maintaining its strictly positive values of x [actually (x-min)].
The Gamma distribution has been used to represent lifetimes, lead times, personal income data, a
population about a stable equilibrium, interarrival times, and service times. In particular, it can
represent lifetime with redundancy (see Johnson¹, Shooman²).
Examples of each of the regions of the Gamma distribution are shown above. Note the peak of the dis-
tribution moving away from the minimum value for increasing alpha, but with a much broader distribu-
tion.
1. "Continuous Univariate Distributions, Volume 1", Norman L. Johnson, Samuel Kotz, N. Balakrishnan, 1994,
John Wiley & Sons, p. 343
2. "Probabilistic Reliability: An Engineering Approach", Martin L. Shooman, 1990, Robert E. Krieger
Geometric Distribution (p)
p = probability of occurrence
Description
The Geometric distribution is a discrete distribution bounded at 0 and unbounded on the high side. It is
a special case of the Negative Binomial distribution. In particular, it is the direct discrete analog for the
continuous Exponential distribution. The Geometric distribution has no history dependence, its proba-
bility at any value being independent of a shift along the axis.
The Geometric distribution has been used for inventory demand, marketing survey returns, a ticket
control problem, and meteorological models (see Johnson¹, Law & Kelton²).
Several examples with decreasing probability are shown above. Note that the probabilities are actually
weights at each integer, but are represented by broader bars for visibility.
1. "Univariate Discrete Distributions", Norman L. Johnson, Samuel Kotz, Adrienne W. Kemp, 1992, John Wiley &
Sons, p. 201
2. "Simulation Modeling & Analysis", Averill M. Law, W. David Kelton, 1991, McGraw-Hill, p. 366
$$p(x) = p \, (1-p)^{x}$$
Inverse Gaussian Distribution (min, alpha, beta)
min = minimum x
alpha = shape parameter > 0
beta = mixture of shape and scale > 0
Description
The Inverse Gaussian distribution is a continuous distribution with a bound on the lower side. It is
uniquely zero at the minimum x, and always positively skewed. The Inverse Gaussian distribution is
also known as the Wald distribution.
The Inverse Gaussian distribution was originally used to model Brownian motion and diffusion
processes with boundary conditions. It has also been used to model the distribution of particle size in
aggregates, reliability and lifetimes, and repair time (see Johnson¹).
Examples of Inverse Gaussian distributions are shown above. In particular, notice the drastically
increased upper tail for increasing .
1. "Continuous Univariate Distributions, Volume 1", Norman L. Johnson, Samuel Kotz, N. Balakrishnan, 1994,
John Wiley & Sons, p. 290
$$f(x) = \left[\frac{\alpha}{2\pi \, (x-\min)^{3}}\right]^{1/2} \exp\left[-\frac{\alpha \, (x-\min-\beta)^{2}}{2\beta^{2} \, (x-\min)}\right]$$
Inverse Weibull Distribution (min, alpha, beta)
min = minimum x
alpha = shape parameter > 0
beta = mixture of shape and scale > 0
Description
The Inverse Weibull distribution is a continuous distribution with a bound on the lower side. It is
uniquely zero at the minimum x, and always positively skewed. In general, the Inverse Weibull distribu-
tion fits bounded, but very peaked, data with a long positive tail.
The Inverse Weibull distribution has been used to describe several failure processes as a distribution of
lifetime (see Calabria & Pulcini¹). It can also be used to fit data with abnormally large outliers on the
positive side of the peak.
Examples of Inverse Weibull distribution are shown above. In particular, notice the increased peaked-
ness and movement from the minimum for increasing .
1. R. Calabria, G. Pulcini, "On the maximum likelihood and least-squares estimation in the
Inverse Weibull Distribution", Statistica Applicata, Vol. 2, n.1, 1990, p.53
$$f(x) = \alpha\beta \left(\frac{1}{\beta \, (x-\min)}\right)^{\alpha+1} \exp\left(-\left(\frac{1}{\beta \, (x-\min)}\right)^{\alpha}\right)$$
Johnson SB Distribution (min, lambda, gamma, delta)
where:
min = minimum value of x
lambda = range of x above the minimum
gamma = skewness parameter
delta = shape parameter > 0
Description
The Johnson SB distribution is a continuous distribution that has both upper and lower finite bounds,
similar to the Beta distribution. The Johnson SB, Lognormal, and Johnson SU distributions are
transformations of the Normal distribution and can be used to describe most naturally occurring
unimodal sets of data. However, the Johnson SB and SU distributions are mutually exclusive,
each describing data in specific ranges of skewness and kurtosis. This leaves some cases where the
natural boundedness of the population cannot be matched.

$$f(x) = \frac{\delta}{\lambda \sqrt{2\pi} \; y \, (1-y)} \exp\left(-\frac{1}{2}\left[\gamma + \delta \ln\left(\frac{y}{1-y}\right)\right]^{2}\right), \qquad y = \frac{x-\min}{\lambda}$$
The family of Johnson distributions has been used in quality control to describe non-normal processes,
which can then be transformed to the Normal distribution for use with standard tests. As can be seen in
the following examples, the Johnson SB distribution goes to zero at both of its bounds, with gamma
controlling the skewness and delta controlling the shape. The distribution can be either unimodal or
bimodal (see Johnson et al.¹ and N. L. Johnson²).
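The transformation to the Normal distribution can be sketched as follows, using the standard Johnson SB translation z = gamma + delta ln(y / (1 - y)) with y = (x - min) / lambda. The function names and the parameter values are assumptions chosen for illustration.

```python
import math
import random

def sb_to_normal(x, minimum, lam, gamma, delta):
    """Transform a Johnson SB value x to a standard Normal score z."""
    y = (x - minimum) / lam
    return gamma + delta * math.log(y / (1.0 - y))

def normal_to_sb(z, minimum, lam, gamma, delta):
    """Inverse transform: map a standard Normal score z to a Johnson SB value."""
    y = 1.0 / (1.0 + math.exp(-(z - gamma) / delta))
    return minimum + lam * y

rng = random.Random(11)  # fixed seed so the run is repeatable
# Hypothetical parameters: min = 1, lambda = 10, gamma = 0.5, delta = 1.5.
sb_samples = [normal_to_sb(rng.gauss(0.0, 1.0), 1.0, 10.0, 0.5, 1.5)
              for _ in range(1000)]
```

Every generated value falls strictly between min and min + lambda, and applying the forward transform to such values returns Normal scores to which standard tests can be applied.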
1. "Continuous Univariate Distributions, Volume 1", Norman L. Johnson, Samuel Kotz, N.
Balakrishnan, 1994, John Wiley & Sons, p. 34
2. N. L. Johnson, "Systems of frequency curves generated by methods of translation",
Biometrika, Vol. 36, 1949, p. 149
Johnson SU Distribution (xi, lambda, gamma, delta)
where:
lambda = range of x above the minimum
gamma = skewness parameter
delta = shape parameter > 0
Description
The Johnson SU distribution is an unbounded continuous distribution. The Johnson SU distribution,
together with the Lognormal and the Johnson SB distributions, can be used to describe most naturally
occurring unimodal sets of data. However, the Johnson SB and SU distributions are mutually exclusive,
each describing data in specific ranges of skewness and kurtosis. This leaves some cases where the
natural boundedness of the population cannot be matched.

$$f(x) = \frac{\delta}{\lambda \sqrt{2\pi \, (y^{2}+1)}} \exp\left(-\frac{1}{2}\left[\gamma + \delta \ln\left(y + \sqrt{y^{2}+1}\right)\right]^{2}\right), \qquad y = \frac{x-\xi}{\lambda}$$
The family of Johnson distributions have been used in quality control to describe non-normal processes,
which can then be transformed to the Normal distribution for use with standard tests.
The Johnson SU distribution can be used in place of the notoriously unstable Pearson IV distribution,
with reasonably good fidelity over the most probable range of values.
As can be see in the examples above, the Johnson SU distribution is one of the few unbounded distribu-
tions that can vary its shape, with controlling the skewness and controlling the shape. The
scale is controlled by , , and . (see Johnson et al.
1
and N. L. Johnson
2
)
1. 'Continuous Univariate Distributions, Volume 1, Norman L. Johnson, Samuel Kotz, N.
Balakrishnan, 1994, John Wiley & Sons, p. 34toriouslyunstable oS,,
2. N. L. Johnson, 'Systems of frequency curves generated by methods of translation,
Biometrika, Vol. 36, 1949, p. 149
Stat::Fit 77 77 77 77
User Guide User Guide User Guide User Guide
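The transformation of Johnson SU data to the Normal distribution can be sketched in Python. This is a minimal illustration, not Stat::Fit's own code: it assumes the standard Johnson SU form z = gamma + delta * asinh((x - xi) / lambda), and the function names and parameter values are hypothetical.

```python
import math

def johnson_su_to_normal(x, xi, lam, gamma, delta):
    """Map a Johnson SU value x to a standard Normal score z."""
    y = (x - xi) / lam
    # asinh(y) equals ln(y + sqrt(y^2 + 1))
    return gamma + delta * math.asinh(y)

def johnson_su_pdf(x, xi, lam, gamma, delta):
    """Johnson SU density, written via the Normal density of z."""
    y = (x - xi) / lam
    z = johnson_su_to_normal(x, xi, lam, gamma, delta)
    normal_pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    # Chain rule: dz/dx = delta / (lam * sqrt(y^2 + 1))
    return normal_pdf * delta / (lam * math.sqrt(y * y + 1.0))

# Hypothetical parameter values, chosen only for illustration.
xi, lam, gamma, delta = 0.0, 1.0, -1.0, 1.5

# Crude midpoint-rule check that the density integrates to about 1.
step = 0.01
total = sum(
    johnson_su_pdf(-200.0 + (i + 0.5) * step, xi, lam, gamma, delta) * step
    for i in range(40000)
)
print("integral of the density:", total)
```

Once data have been mapped to z scores in this way, standard Normal-based tests can be applied to the transformed values.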
Logistic Distribution (alpha, beta)
$$ f(x) = \frac{\exp\left( -\frac{x - \alpha}{\beta} \right)}{\beta \left[ 1 + \exp\left( -\frac{x - \alpha}{\beta} \right) \right]^2} $$
α = shift parameter
β = scale parameter > 0
Description
The Logistic distribution is an unbounded continuous distribution which is symmetrical about its mean
(and shift parameter), α. As shown in the example above, the shape of the Logistic distribution is very
much like the Normal distribution, except that the Logistic distribution has broader tails.
The Logistic function is most often used as a growth model: for populations, for weight gain, for
business failure, etc. The Logistic distribution can be used to test for the suitability of such a model, with
transformation to get back to the minimum and maximum values for the Logistic function. Occasionally,
the Logistic function is used in place of the Normal function where exceptional cases play a larger
role (see Johnson¹).
1. 'Continuous Univariate Distributions, Volume 2', Norman L. Johnson, Samuel Kotz, N. Balakrishnan, 1995,
John Wiley & Sons, p. 113
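The claim that the Logistic distribution has broader tails than the Normal can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses the standard Logistic density, matches the Normal to the same mean and standard deviation (beta * pi / sqrt(3)), and the parameter values are arbitrary.

```python
import math

def logistic_pdf(x, alpha, beta):
    """Logistic density with shift alpha and scale beta > 0."""
    # abs() exploits the symmetry about alpha and avoids overflow in exp().
    e = math.exp(-abs(x - alpha) / beta)
    return e / (beta * (1.0 + e) ** 2)

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

alpha, beta = 10.0, 2.0                      # illustrative values
sigma = beta * math.pi / math.sqrt(3.0)      # standard deviation of the Logistic

# Compare the two densities at 0, 1, 2 and 4 standard deviations from the mean.
for k in (0, 1, 2, 4):
    x = alpha + k * sigma
    print(k, logistic_pdf(x, alpha, beta), normal_pdf(x, alpha, sigma))
```

Far from the mean the Logistic density is the larger of the two, which is the sense in which exceptional cases play a larger role.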
Log-Logistic Distribution (min, p, beta)
$$ f(x) = \frac{p \left( \frac{x - \min}{\beta} \right)^{p - 1}}{\beta \left[ 1 + \left( \frac{x - \min}{\beta} \right)^{p} \right]^2} $$
min = minimum x
p = shape parameter > 0
β = scale parameter > 0
Description
The Log-Logistic distribution is a continuous distribution bounded on the lower side. Like the Gamma
distribution, it has three distinct regions. For p=1, the Log-Logistic distribution resembles the
Exponential distribution, starting at a finite value at minimum x and decreasing monotonically thereafter. For
p<1, the Log-Logistic distribution tends to infinity at minimum x and decreases monotonically for
increasing x. For p>1, the Log-Logistic distribution is 0 at minimum x, peaks at a value that depends on
both p and β, then decreases monotonically thereafter.
By definition, the natural logarithm of a Log-Logistic random variable is a Logistic random variable,
and can be related to the included Logistic distribution in much the same way that the Lognormal
distribution can be related to the included Normal distribution. The parameters for the included Logistic
distribution, Lalpha and Lbeta, are given in terms of the Log-Logistic parameters, LLp and LLβ, by
Lalpha = ln(LLβ)
Lbeta = 1/LLp
The Log-Logistic distribution is used to model the output of complex processes such as business failure,
product cycle time, etc. (see Johnson¹).
Note that for p=1, the Log-Logistic distribution decreases more rapidly than the Exponential distribution but
has a broader tail. For large p, the distribution becomes more symmetrical and moves away from the
minimum.
1. 'Continuous Univariate Distributions, Volume 2', Norman L. Johnson, Samuel Kotz, N. Balakrishnan, 1995,
John Wiley & Sons, p. 151
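The relation between the Log-Logistic parameters and the included Logistic parameters can be verified with a short change-of-variable check. This sketch assumes min = 0 and the standard density forms; the function names and numeric values are illustrative only.

```python
import math

def log_logistic_pdf(x, minimum, p, beta):
    """Log-Logistic density; evaluate only for x > minimum."""
    r = (x - minimum) / beta
    return p * r ** (p - 1.0) / (beta * (1.0 + r ** p) ** 2)

def logistic_pdf(y, alpha, beta):
    """Logistic density with shift alpha and scale beta > 0."""
    e = math.exp(-abs(y - alpha) / beta)
    return e / (beta * (1.0 + e) ** 2)

ll_p, ll_beta = 2.5, 4.0        # illustrative Log-Logistic parameters, min = 0
l_alpha = math.log(ll_beta)     # Lalpha = ln(LLbeta)
l_beta = 1.0 / ll_p             # Lbeta = 1/LLp

# If Y = ln(X) is Logistic, the density of X is f_Y(ln x) / x.
for x in (0.5, 2.0, 4.0, 10.0):
    direct = log_logistic_pdf(x, 0.0, ll_p, ll_beta)
    via_logistic = logistic_pdf(math.log(x), l_alpha, l_beta) / x
    print(x, direct, via_logistic)
```

The two columns agree to rounding error, which is what the parameter mapping predicts.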
Lognormal Distribution (min, mu, sigma)
$$ f(x) = \frac{1}{(x - \min) \sqrt{2\pi\sigma^2}} \exp\left( -\frac{[\ln(x - \min) - \mu]^2}{2\sigma^2} \right) $$
min = minimum x
μ = mean of the included Normal
σ = standard deviation of the included Normal
Description
The Lognormal distribution is a continuous distribution bounded on the lower side. It is always 0 at
minimum x, rising to a peak that depends on both μ and σ, then decreasing monotonically for increasing
x.
By definition, the natural logarithm of a Lognormal random variable is a Normal random variable. Its
parameters are usually given in terms of this included Normal.
The Lognormal distribution can also be used to approximate the Normal distribution, for small σ, while
maintaining its strictly positive values of x [actually (x-min)].
The Lognormal distribution is used in many different areas including the distribution of particle size in
naturally occurring aggregates, dust concentration in industrial atmospheres, the distribution of minerals
present in low concentrations, duration of sickness absence, physicians' consultation time, lifetime
distributions in reliability, distribution of income, employee retention, and many applications modeling
weight, height, etc. (see Johnson¹).
The Lognormal distribution can provide very peaked distributions for increasing σ, indeed, far more
peaked than can be easily represented in graphical form.
1. 'Continuous Univariate Distributions, Volume 1', Norman L. Johnson, Samuel Kotz, N. Balakrishnan, 1994,
John Wiley & Sons, p. 207
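The definition in terms of the included Normal can be illustrated by sampling. The sketch below uses only the Python standard library; the seed, sample size, and parameter values are arbitrary choices for illustration and are not taken from Stat::Fit.

```python
import math
import random
import statistics

random.seed(12345)                    # fixed seed so the run is repeatable

minimum, mu, sigma = 5.0, 1.0, 0.5    # illustrative parameter values

# Draw Lognormal variates and shift them so that they start at the minimum.
samples = [minimum + random.lognormvariate(mu, sigma) for _ in range(20000)]

# Taking ln(x - min) should recover the included Normal(mu, sigma).
logs = [math.log(x - minimum) for x in samples]
print("mean of ln(x - min):", statistics.mean(logs))
print("std  of ln(x - min):", statistics.stdev(logs))
```

The sample mean and standard deviation of the logarithms should land close to mu and sigma, up to sampling noise.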
Negative Binomial Distribution (p, k)
$$ p(x) = \binom{k + x - 1}{x} \, p^{k} (1 - p)^{x} $$
x = number of trials to get k events...
p = probability of event = [0,1]
k = number of desired events = positive integer
Description
The Negative Binomial distribution is a discrete distribution bounded on the low side at 0 and
unbounded on the high side. The Negative Binomial distribution reduces to the Geometric distribution
for k=1. The Negative Binomial distribution gives the total number of trials, x, to get k events
(failures...), each with the constant probability, p, of occurring.
The Negative Binomial distribution has many uses; some occur because it provides a good
approximation for the sum or mixing of other discrete distributions. By itself, it is used to model accident
statistics, birth-and-death processes, market research and consumer expenditure, lending library data,
biometrics data, and many others (see Johnson¹).
Several examples with increasing k are shown above. With smaller probability, p, the number of classes
is so large that the distribution is best plotted as a filled polygon. Note that the probabilities are actually
weights at each integer, but are represented by broader bars for visibility.
1. 'Univariate Discrete Distributions', Norman L. Johnson, Samuel Kotz, Adrienne W. Kemp, 1992, John Wiley &
Sons, p. 223
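The reduction to the Geometric distribution for k=1 can be checked directly. The sketch below assumes the probability function p(x) = C(k + x - 1, x) p^k (1 - p)^x for x = 0, 1, 2, ..., which is consistent with the lower bound of 0 stated above; under that assumption x counts the non-event trials before the k-th event. Function names and values are illustrative.

```python
import math

def neg_binomial_pmf(x, p, k):
    """C(k + x - 1, x) * p**k * (1 - p)**x for x = 0, 1, 2, ..."""
    return math.comb(k + x - 1, x) * p ** k * (1.0 - p) ** x

def geometric_pmf(x, p):
    """p * (1 - p)**x for x = 0, 1, 2, ..."""
    return p * (1.0 - p) ** x

p = 0.3   # illustrative event probability

# k = 1 reproduces the Geometric probabilities.
for x in range(5):
    print(x, neg_binomial_pmf(x, p, 1), geometric_pmf(x, p))

# The probabilities sum to 1 (the infinite sum is truncated at 500 terms here).
total = sum(neg_binomial_pmf(x, p, 4) for x in range(500))
print("sum of probabilities for k = 4:", total)
```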
Normal Distribution (mu, sigma)
$$ f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{[x - \mu]^2}{2\sigma^2} \right) $$
μ = shift parameter
σ = scale parameter = standard deviation
Description
The Normal distribution is an unbounded continuous distribution. It is sometimes called a Gaussian
distribution or the bell curve. Because of its property of representing an increasing sum of small,
independent errors, the Normal distribution finds many, many uses in statistics. It is wrongly used in many
situations. Possibly, the most important test in the fitting of analytical distributions is the elimination of
the Normal distribution as a possible candidate (see Johnson¹).
The Normal distribution is used as an approximation for the Binomial distribution when the values of n,
p are in the appropriate range. The Normal distribution is frequently used to represent symmetrical
data, but suffers from being unbounded in both directions. If the data is known to have a lower bound,
it may be better represented by suitable parameterization of the Lognormal, Weibull or Gamma
distributions. If the data is known to have both upper and lower bounds, the Beta distribution can be used,
although much work has been done on truncated Normal distributions (not supported in Stat::Fit).
The Normal distribution, shown above, has the familiar bell shape. It is unchanged in shape with
changes in μ or σ.
1. 'Continuous Univariate Distributions, Volume 1', Norman L. Johnson, Samuel Kotz, N. Balakrishnan, 1994,
John Wiley & Sons, p. 80
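The use of the Normal distribution as an approximation for the Binomial can be illustrated numerically. The values n = 100 and p = 0.5 below are an assumption about what counts as an "appropriate range"; the guide does not state one. The Normal is matched by setting mu = n*p and sigma = sqrt(n*p*(1-p)).

```python
import math

def binomial_pmf(x, n, p):
    """Probability of exactly x events in n independent trials."""
    return math.comb(n, x) * p ** x * (1.0 - p) ** (n - x)

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

n, p = 100, 0.5                          # illustrative values only
mu = n * p
sigma = math.sqrt(n * p * (1.0 - p))

# Largest gap between the Binomial probability and the Normal density at each integer.
worst = max(abs(binomial_pmf(x, n, p) - normal_pdf(x, mu, sigma)) for x in range(n + 1))
print("largest pointwise difference:", worst)
```

For small n, or p close to 0 or 1, the same comparison shows a much larger gap, which is why the range of n and p matters.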
Pareto Distribution (min, alpha)
$$ f(x) = \frac{\alpha \, \min^{\alpha}}{x^{\alpha + 1}} $$
min = minimum x
α = scale parameter > 0
Description
The Pareto distribution is a continuous distribution bounded on the lower side. It has a finite value at
the minimum x and decreases monotonically for increasing x. A Pareto random variable is the
exponential of an Exponential random variable, and possesses many of the same characteristics.
The Pareto distribution has, historically, been used to represent the income distribution of a society. It is
also used to model many empirical phenomena with very long right tails, such as city population sizes,
occurrence of natural resources, stock price fluctuations, size of firms, brightness of comets, and error
clustering in communication circuits (see Johnson¹).
The shape of the Pareto curve changes slowly with α, but the tail of the distribution increases
dramatically with decreasing α.
1. 'Continuous Univariate Distributions, Volume 1', Norman L. Johnson, Samuel Kotz, N. Balakrishnan, 1994,
John Wiley & Sons, p. 607
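The link between the Pareto and Exponential distributions can be sketched by simulation. The code assumes the Exponential variable has rate alpha and that the result is multiplied by the minimum so the variates start there; the seed and values are arbitrary, and this is not Stat::Fit's generator.

```python
import math
import random

random.seed(2024)

minimum, alpha = 2.0, 3.0      # illustrative parameter values

def pareto_cdf(x, minimum, alpha):
    """P(X <= x) for the Pareto distribution."""
    return 1.0 - (minimum / x) ** alpha if x >= minimum else 0.0

# exp(E) with E ~ Exponential(rate alpha) is Pareto with minimum 1;
# multiplying by `minimum` rescales it to start at that value.
samples = [minimum * math.exp(random.expovariate(alpha)) for _ in range(20000)]

for x in (2.5, 3.0, 5.0):
    empirical = sum(s <= x for s in samples) / len(samples)
    print(x, empirical, pareto_cdf(x, minimum, alpha))
```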
Pearson 5 Distribution (min, alpha, beta)
$$ f(x) = \frac{\beta^{\alpha}}{\Gamma(\alpha) \, (x - \min)^{\alpha + 1}} \exp\left( -\frac{\beta}{x - \min} \right) $$
min = minimum x
α = shape parameter > 0
β = scale parameter > 0
Description
The Pearson 5 distribution is a continuous distribution with a bound on the lower side. The Pearson 5
distribution is sometimes called the Inverse Gamma distribution due to the reciprocal relationship
between a Pearson 5 random variable and a Gamma random variable.
The Pearson 5 distribution is useful for modeling time delays where some minimum delay value is
almost assured and the maximum time is unbounded and variably long, such as time to complete a
difficult task, time to respond to an emergency, time to repair a tool, etc. Similar space situations also exist,
such as manufacturing space for a given process (see Law & Kelton¹).
The Pearson 5 distribution starts slowly near its minimum and has a peak slightly removed from it, as
shown above. With decreasing α, the peak gets flatter (see vertical scale) and the tail gets much
broader.
1. 'Simulation Modeling & Analysis', Averill M. Law, W. David Kelton, 1991, McGraw-Hill, p. 339
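The reciprocal relationship with the Gamma distribution can be checked with a change of variable. The sketch assumes min = 0, the standard Inverse Gamma density, and a Gamma density parameterized by shape and scale; names and values are illustrative.

```python
import math

def pearson5_pdf(x, minimum, alpha, beta):
    """Pearson 5 (Inverse Gamma) density; evaluate only for x > minimum."""
    t = x - minimum
    return beta ** alpha / (math.gamma(alpha) * t ** (alpha + 1.0)) * math.exp(-beta / t)

def gamma_pdf(y, shape, scale):
    """Gamma density with the given shape and scale, for y > 0."""
    return y ** (shape - 1.0) * math.exp(-y / scale) / (math.gamma(shape) * scale ** shape)

alpha, beta = 3.0, 2.0         # illustrative values, min = 0

# If Y is Gamma(shape alpha, scale 1/beta), then X = 1/Y has density f_Y(1/x) / x**2.
for x in (0.2, 0.5, 1.0, 4.0):
    direct = pearson5_pdf(x, 0.0, alpha, beta)
    via_gamma = gamma_pdf(1.0 / x, alpha, 1.0 / beta) / x ** 2
    print(x, direct, via_gamma)
```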
Pearson 6 Distribution (min, beta, p, q)
$$ f(x) = \frac{\left( \frac{x - \min}{\beta} \right)^{p - 1}}{\beta \, B(p, q) \left[ 1 + \frac{x - \min}{\beta} \right]^{p + q}} $$
x > min
min in (-∞, ∞)
β > 0
p > 0
q > 0
Description
The Pearson 6 distribution is a continuous distribution bounded on the low side. The Pearson 6
distribution is sometimes called the Beta distribution of the second kind due to the relationship of a Pearson 6
random variable to a Beta random variable. When min=0, β=nu2/nu1, p=nu1/2, q=nu2/2, the Pearson 6
distribution reduces to the F distribution of nu1, nu2, which is used for many statistical tests of goodness of fit
(see Johnson¹).
Like the Gamma distribution, it has three distinct regions. For p=1, the Pearson 6 distribution resembles
the Exponential distribution, starting at a finite value at minimum x and decreasing monotonically
thereafter. For p<1, the Pearson 6 distribution tends to infinity at minimum x and decreases
monotonically for increasing x. For p>1, the Pearson 6 distribution is 0 at minimum x, peaks at a value that
depends on both p and q, then decreases monotonically thereafter.
The Pearson 6 distribution appears to have found little direct use, except in its reduced form as the F
distribution, where it serves as the distribution of the ratio of independent estimators of variance and
provides the final test for the analysis of variance.
The three regions of the Pearson 6 distribution are shown above. Also note that the distribution becomes
sharply peaked just off the minimum for increasing q.
1. 'Continuous Univariate Distributions, Volume 2', Norman L. Johnson, Samuel Kotz, N. Balakrishnan, 1995,
John Wiley & Sons, p. 322
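The reduction to the F distribution can be checked numerically. The sketch below compares the Pearson 6 density with the textbook F density; it assumes the standard Pearson 6 form and finds agreement when min = 0, p = nu1/2, q = nu2/2 and the scale is beta = nu2/nu1 (which equals 1 when nu1 = nu2). Function names and degrees of freedom are illustrative.

```python
import math

def beta_fn(a, b):
    """Complete Beta function B(a, b)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def pearson6_pdf(x, minimum, beta, p, q):
    """Pearson 6 density; evaluate only for x > minimum."""
    r = (x - minimum) / beta
    return r ** (p - 1.0) / (beta * beta_fn(p, q) * (1.0 + r) ** (p + q))

def f_pdf(x, nu1, nu2):
    """Textbook F density with nu1 and nu2 degrees of freedom."""
    a, b = nu1 / 2.0, nu2 / 2.0
    return ((nu1 / nu2) ** a * x ** (a - 1.0)
            / (beta_fn(a, b) * (1.0 + nu1 * x / nu2) ** (a + b)))

nu1, nu2 = 4.0, 10.0           # illustrative degrees of freedom
for x in (0.25, 1.0, 3.0):
    print(x, pearson6_pdf(x, 0.0, nu2 / nu1, nu1 / 2.0, nu2 / 2.0), f_pdf(x, nu1, nu2))
```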
Poisson Distribution (lambda)
$$ p(x) = \frac{e^{-\lambda} \lambda^{x}}{x!} $$
λ = rate of occurrence
Description
The Poisson distribution is a discrete distribution bounded at 0 on the low side and unbounded on the
high side. The Poisson distribution is a limiting form of the Hypergeometric distribution.
The Poisson distribution finds frequent use because it represents the infrequent occurrence of events
whose rate is constant. This includes many types of events in time or space such as arrivals of telephone
calls, defects in semiconductor manufacturing, defects in all aspects of quality control, molecular
distributions, stellar distributions, geographical distributions of plants, shot noise, etc. It is an important
starting point in queuing theory and reliability theory.¹ Note that the time between arrivals (defects) is
Exponentially distributed, which makes this distribution a particularly convenient starting point even
when the process is more complex.
The Poisson distribution peaks near λ and falls off rapidly on either side. Note that the probabilities are
actually weights at each integer, but are represented by broader bars for visibility.
1. 'Univariate Discrete Distributions', Norman L. Johnson, Samuel Kotz, Adrienne W. Kemp, 1992, John Wiley &
Sons, p. 151
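The connection between Exponential times between arrivals and Poisson counts can be illustrated by simulation. This is a sketch using the Python standard library; the rate, horizon, and seed are arbitrary assumptions chosen for illustration.

```python
import math
import random

random.seed(7)

rate = 3.0          # illustrative rate: average number of events per unit time
horizon = 5000      # number of unit-length intervals to simulate

# Generate arrival times with Exponential gaps, then count arrivals per interval.
counts = [0] * horizon
t = random.expovariate(rate)
while t < horizon:
    counts[int(t)] += 1
    t += random.expovariate(rate)

def poisson_pmf(x, lam):
    """exp(-lam) * lam**x / x! for x = 0, 1, 2, ..."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

# Compare observed relative frequencies with the Poisson probabilities.
for x in range(7):
    empirical = counts.count(x) / horizon
    print(x, round(empirical, 4), round(poisson_pmf(x, rate), 4))
```

The counts per interval follow the Poisson probabilities up to sampling noise, with the largest frequencies near the rate.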
Power Function Distribution (min, max, alpha)
$$ f(x) = \frac{\alpha \, (x - \min)^{\alpha - 1}}{(\max - \min)^{\alpha}} $$
min = minimum value of x
max = maximum value of x
α = shape parameter > 0
Description
The Power Function distribution is a continuous distribution that has both upper and lower finite
bounds, and is a special case of the Beta distribution with q=1 (see Johnson et al.¹). The Uniform
distribution is a special case of the Power Function distribution with alpha=1 (p=1 in the notation of the
Beta distribution).
As can be seen from the examples above, the Power Function distribution can approach zero or infinity
at its lower bound, but always has a finite value at its upper bound. Alpha controls the value at the lower
bound as well as the shape.
1. 'Continuous Univariate Distributions, Volume 2', Norman L. Johnson, Samuel Kotz, N.
Balakrishnan, 1995, John Wiley & Sons, p. 210
Rayleigh Distribution (min, sigma)
$$ f(x) = \frac{x - \min}{\sigma^2} \exp\left( -\frac{(x - \min)^2}{2\sigma^2} \right) $$
min = minimum x
σ = scale parameter > 0
Description
The Rayleigh distribution is a continuous distribution bounded on the lower side. It is a special case of
the Weibull distribution with alpha=2 and beta/sqrt(2)=sigma. Because of the fixed shape parameter,
the Rayleigh distribution does not change shape although it can be scaled.
The Rayleigh distribution is frequently used to represent lifetimes because its hazard rate increases
linearly with time, e.g. the lifetime of vacuum tubes. This distribution also finds application in noise
problems in communications (see Johnson et al.¹ and Shooman²).
1. 'Continuous Univariate Distributions, Volume 1', Norman L. Johnson, Samuel Kotz, N.
Balakrishnan, 1994, John Wiley & Sons, p. 456
2. 'Probabilistic Reliability: An Engineering Approach', Martin L. Shooman, 1990, Robert E.
Krieger, p. 48
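Both statements above, the Weibull special case and the linearly increasing hazard rate, can be checked in a few lines. The sketch assumes the standard Rayleigh and Weibull density forms; function names and values are illustrative.

```python
import math

def rayleigh_pdf(x, minimum, sigma):
    """Rayleigh density; evaluate for x >= minimum."""
    t = x - minimum
    return t / sigma ** 2 * math.exp(-t * t / (2.0 * sigma ** 2))

def weibull_pdf(x, minimum, alpha, beta):
    """Weibull density with shape alpha and scale beta; evaluate for x > minimum."""
    r = (x - minimum) / beta
    return alpha / beta * r ** (alpha - 1.0) * math.exp(-(r ** alpha))

def rayleigh_hazard(x, minimum, sigma):
    """f(x) / (1 - F(x)), using 1 - F(x) = exp(-(x - min)^2 / (2 sigma^2))."""
    t = x - minimum
    survival = math.exp(-t * t / (2.0 * sigma ** 2))
    return rayleigh_pdf(x, minimum, sigma) / survival

minimum, sigma = 1.0, 2.0      # illustrative values
for x in (1.5, 3.0, 6.0):
    print(x,
          rayleigh_pdf(x, minimum, sigma),
          weibull_pdf(x, minimum, 2.0, sigma * math.sqrt(2.0)),
          rayleigh_hazard(x, minimum, sigma))
```

The first two columns match, and the hazard column equals (x - min) / sigma^2, a straight line in x.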
Triangular Distribution (min, max, mode)
$$ f(x) = \begin{cases} \dfrac{2 (x - \min)}{(\max - \min)(\mathrm{mode} - \min)}, & \min < x \le \mathrm{mode} \\ \dfrac{2 (\max - x)}{(\max - \min)(\max - \mathrm{mode})}, & \mathrm{mode} < x \le \max \end{cases} $$
min = minimum x
max = maximum x
mode = most likely x
Description
The Triangular distribution is a continuous distribution bounded on both sides.
The Triangular distribution is often used when little or no data is available; it is rarely an accurate
representation of a data set (see Law & Kelton¹). However, it is employed as the functional form of regions
for fuzzy logic due to its ease of use.
The Triangular distribution can take on very skewed forms, as shown above, including negative
skewness. For the exceptional cases where the mode is either the min or max, the Triangular distribution
becomes a right triangle.
1. 'Simulation Modeling & Analysis', Averill M. Law, W. David Kelton, 1991, McGraw-Hill, p. 341
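The piecewise Triangular density, including the right-triangle cases where the mode sits on a bound, can be written as a small function. This is an illustrative sketch with hypothetical names and values, followed by a numerical check that the area under the density is 1.

```python
def triangular_pdf(x, minimum, maximum, mode):
    """Triangular density; zero outside [minimum, maximum]."""
    if x < minimum or x > maximum:
        return 0.0
    width = maximum - minimum
    if x <= mode:
        if mode == minimum:        # right triangle with its peak at the minimum
            return 2.0 / width
        return 2.0 * (x - minimum) / (width * (mode - minimum))
    return 2.0 * (maximum - x) / (width * (maximum - mode))

minimum, maximum, mode = 2.0, 10.0, 4.0    # illustrative values

# Midpoint-rule integration of the density over [minimum, maximum].
step = 0.001
n = int(round((maximum - minimum) / step))
area = sum(
    triangular_pdf(minimum + (i + 0.5) * step, minimum, maximum, mode) * step
    for i in range(n)
)
print("area:", area)
print("peak height:", triangular_pdf(mode, minimum, maximum, mode))
```

The peak height is always 2 / (max - min), whatever the mode, which is what keeps the total area equal to 1.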
Uniform Distribution (min, max)
$$ f(x) = \frac{1}{\max - \min} $$
min = minimum x
max = maximum x
Description
The Uniform distribution is a continuous distribution bounded on both sides. Its density does not
depend on the value of x. It is a special case of the Beta distribution. It is frequently called the
rectangular distribution (see Johnson¹). Most random number generators provide samples from the Uniform
distribution on (0,1) and then convert these samples to random variates from other distributions.
The Uniform distribution is used to represent a random variable with constant likelihood of being in any
small interval between min and max. Note that the probability of either the min or max value is 0; the
end points do NOT occur. If the end points are necessary, try the sum of two opposing right Triangular
distributions.
1. 'Continuous Univariate Distributions, Volume 2', Norman L. Johnson, Samuel Kotz, N. Balakrishnan, 1995,
John Wiley & Sons, p. 276
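The conversion of Uniform samples into variates from another distribution is commonly done by inverting the cumulative distribution. The sketch below applies that idea to the Power Function distribution, whose cumulative form ((x - min) / (max - min))^alpha inverts in closed form. It illustrates the general technique and is not a description of Stat::Fit's generator; the seed and values are arbitrary.

```python
import random

random.seed(99)

def power_function_variate(u, minimum, maximum, alpha):
    """Invert F(x) = ((x - min) / (max - min))**alpha at probability u."""
    return minimum + (maximum - minimum) * u ** (1.0 / alpha)

minimum, maximum, alpha = 0.0, 10.0, 3.0     # illustrative values

# random.random() returns Uniform samples on [0, 1).
uniforms = [random.random() for _ in range(20000)]
variates = [power_function_variate(u, minimum, maximum, alpha) for u in uniforms]

# Compare the empirical and theoretical probability of falling at or below x = 5.
x = 5.0
empirical = sum(v <= x for v in variates) / len(variates)
theoretical = ((x - minimum) / (maximum - minimum)) ** alpha
print(empirical, theoretical)
```

With alpha = 1 the same function returns Uniform variates on the interval from min to max.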
Weibull Distribution (min, alpha, beta)
$$ f(x) = \frac{\alpha}{\beta} \left( \frac{x - \min}{\beta} \right)^{\alpha - 1} \exp\left( -\left[ \frac{x - \min}{\beta} \right]^{\alpha} \right) $$
min = minimum x
α = shape parameter > 0
β = scale parameter > 0
Description
The Weibull distribution is a continuous distribution bounded on the lower side. Because it provides
one of the limiting distributions for extreme values, it is also referred to as the Frechet distribution and
the Weibull-Gnedenko distribution. Unfortunately, the Weibull distribution has been given various
functional forms in the many engineering references; the form above is the standard form given in
Johnson¹.
Like the Gamma distribution, it has three distinct regions. For α=1, the Weibull distribution is reduced
to the Exponential distribution, starting at a finite value at minimum x and decreasing monotonically
thereafter. For α<1, the Weibull distribution tends to infinity at minimum x and decreases
monotonically for increasing x. For α>1, the Weibull distribution is 0 at minimum x, peaks at a value that
depends on both α and β, then decreases monotonically thereafter. Uniquely, the Weibull distribution has
negative skewness for α>3.6.
The Weibull distribution can also be used to approximate the Normal distribution for α=3.6, while
maintaining its strictly positive values of x [actually (x-min)], although the kurtosis is slightly smaller
than 3, the Normal value.
The Weibull distribution derived its popularity from its use to model the strength of materials, and has
since been used to model just about everything. In particular, the Weibull distribution is used to
represent wearout lifetimes in reliability, wind speed, rainfall intensity, health related issues, germination,
duration of industrial stoppages, migratory systems, and thunderstorm data (see Johnson¹ and
Shooman²).
1. 'Continuous Univariate Distributions, Volume 1', Norman L. Johnson, Samuel Kotz, N. Balakrishnan,
1994, John Wiley & Sons, p. 628
2. 'Probabilistic Reliability: An Engineering Approach', Martin L. Shooman, 1990, Robert E. Krieger,
p. 190
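The change of sign in the Weibull skewness near a shape value of 3.6 can be reproduced from the moments. The sketch below uses the standard result that the n-th raw moment of the Weibull (with min = 0 and unit scale) is Gamma(1 + n/alpha); the function name and the bisection bracket are illustrative choices.

```python
import math

def weibull_skewness(alpha):
    """Skewness of the Weibull distribution; it depends only on the shape alpha."""
    g1 = math.gamma(1.0 + 1.0 / alpha)
    g2 = math.gamma(1.0 + 2.0 / alpha)
    g3 = math.gamma(1.0 + 3.0 / alpha)
    variance = g2 - g1 ** 2
    third_central = g3 - 3.0 * g1 * g2 + 2.0 * g1 ** 3
    return third_central / variance ** 1.5

# Bisection for the shape value at which the skewness changes sign.
lo, hi = 3.0, 4.0              # skewness is positive at 3 and negative at 4
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if weibull_skewness(mid) > 0.0:
        lo = mid
    else:
        hi = mid
zero_skew_shape = 0.5 * (lo + hi)

print("skewness at alpha = 1 (Exponential case):", weibull_skewness(1.0))
print("shape with zero skewness:", zero_skew_shape)
```

Near the zero-skewness shape the distribution is close to symmetrical, which is why that value is used when approximating the Normal distribution.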
Bibliography
An Introduction to Mathematical Statistics, H. D. Brunk, 1960, Ginn & Co.
Continuous Univariate Distributions, Volume 1, Norman L. Johnson, Samuel Kotz, N. Balakrishnan,
1994, John Wiley & Sons
Continuous Univariate Distributions, Volume 2, Norman L. Johnson, Samuel Kotz, N. Balakrishnan,
1995, John Wiley & Sons
Discrete Event System Simulation, Jerry Banks, John S. Carson II, 1984, Prentice-Hall
Introductory Statistical Analysis, Donald L. Harnett, James L. Murphy, 1975, Addison-Wesley
Kendall's Advanced Theory of Statistics, Volume 1 - Distribution Theory, Alan Stuart & J. Keith Ord,
1994, Edward Arnold
Kendall's Advanced Theory of Statistics, Volume 2, Alan Stuart & J. Keith Ord, 1991, Oxford
University Press
Seminumerical Algorithms, Volume 2, Donald E. Knuth, 1981, Addison-Wesley
Simulation Modeling & Analysis, Averill M. Law, W. David Kelton, 1991, McGraw-Hill
Univariate Discrete Distributions, Norman L. Johnson, Samuel Kotz, Adrienne W. Kemp, 1992, John
Wiley & Sons
Statistical Distributions, Second Edition, Merran Evans, Nicholas Hastings, Brian Peacock, 1993,
John Wiley & Sons
Index
A
absolute value 13
AD statistic 28, 29
analytical distributions 22, 25, 84
Anderson Darling test 25, 28, 29, 30
ascending cumulative 35
Auto 22, 31, 46
autocorrelation 20, 21, 33, 35
B
Beta distribution 56, 57, 84, 87, 93
binned data 16, 19
Binomial distribution 58, 59, 84
Box Plot 38
C
Chi Squared Distribution 60
chi squared test 25, 26, 27, 51
classes 11, 12, 16, 19, 25, 26, 30
comparison graph 26, 34, 37
continuous data 9, 11, 16, 19, 25, 26, 28
continuous distributions 11, 12, 22, 24, 25, 29,
31
cumulative distribution 19, 27, 28, 29, 35, 39,
52
D
Data Table 9, 50
density 35
descending cumulative 35
descriptive statistics 18, 50
difference graph 38, 52
discrete data 11, 12, 16, 25, 28
discrete distribution 11, 12, 19, 22, 29, 58, 62,
70, 82, 89
Discrete Uniform distribution 62
distribution fit 23, 51
Distribution Graph 38
distribution viewer 40
E
Erlang distribution 63, 68
Exponential distribution 63, 64, 65, 68, 78, 79,
85, 88, 94
Export 46
Export of Empirical Distributions 46
Extreme Value distribution 66
Extreme Value type 1B Distribution 67
F
file output 46
filter 14
fit setup 23, 26, 27, 51
fonts 36, 37, 45
frequency 35
G
Gamma distribution 63, 68, 69, 78, 84, 86, 88,
94
generate 15, 29, 50
Geometric distribution 70, 82
goodness of fit 11, 22, 23, 25, 26, 27, 28, 29, 31,
51, 87
graph colors 37
graph scale 36
graph text 36
graphics 16, 33, 34, 35, 41, 46
graphics style 33, 34, 35
H
histogram 11, 16, 19, 26, 33, 34, 35, 37
I
independence tests 20
input data 16
input graph 16, 50
input options 11, 16, 19, 25
intervals 9, 11, 12, 16, 19, 25, 26, 27, 30
Inverse gaussian distribution 71
Inverse Weibull Distribution 72
J
Johnson SB Distribution 73
Johnson SU Distribution 75
K
Kolmogorov Smirnov test 25, 27, 29
kurtosis 95
L
level of significance 21, 24, 26, 27, 28, 29, 30
Logistic distribution 51, 77
Log-Logistic distribution 78
Lognormal distribution 80, 81
lower bound 29, 56, 84
Lower Bounds 11
M
manual data entry 6, 9
maximum likelihood 23, 24
maximum likelihood estimates 23, 31
mean 20, 21, 65, 77, 80
mode 92
moments 23, 24, 25
N
Negative Binomial distribution 82
Normal distribution 51, 52, 58, 69, 77, 79, 80,
84, 95
normalization 35
O
operate 12, 20
P
Pareto distribution 85
Pearson 5 distribution 86
Pearson 6 distribution 87, 88
Poisson distribution 89
Power Function Distribution 90
P-P Plot 39, 53
precision 11, 12
print 44, 45
printer set-up 45
p-value 28
Q
Q-Q Plot 39, 52
R
random number stream 15
random variates 15, 46, 93
rank 31
Rayleigh Distribution 91
reciprocal 86
relative frequency 19, 26
Repopulate 14
report 44, 45
result graphs 25, 34, 52
runs test 21, 22
S
scatter plot 20, 21, 35, 51
Scott 12
skewness 24, 25, 64, 92, 95
standard deviation 21, 80, 84
statistics 16, 18, 19, 20, 21, 23, 26, 27, 50
Sturges 11, 25
T
test statistic 25, 26, 27, 28, 29
transform 13
Triangular distribution 92, 93
truncation 12
U
Uniform distribution 46, 56, 93
V
variance 20, 88
W
Weibull distribution 53, 65, 84, 94, 95