Midterm Exam Material (New)


WELCOME TO THE LECTURES ON
PROBABILITY AND STATISTICS

SOEWONO
LEKTOR, NIDN: 0408014101

SCHOOL OF ELECTRICAL ENGINEERING


TELKOM UNIVERSITY
PROBABILITY AND STATISTICS

WHAT IS PROBABILITY?

The word probability has no precise definition. Its meaning has been debated for several hundred years, and the debate has not produced a unique definition of "probability".

The term probability refers to the study of randomness and uncertainty.

Probability is a mathematical discipline whose aim is to describe random experiments.

03/06/2021 PROBABILITY AND STATISTICS 2


PROBABILITY IS DAUNTING AND CONFUSING

It is a hard subject: students find it hard.
It is a difficult subject: teachers find it hard.
Textbook writers find it hard.
It is difficult, very difficult, to apprehend.


THE STUFF OF PROBABILITY

EXPERIMENT: DETERMINISTIC or RANDOM/STOCHASTIC
RANDOM EXPERIMENT (CHANCE EXPERIMENT)
SAMPLE SPACE
EVENT
PROBABILITY
An experiment is a process that results in an outcome.

An experiment is called a random experiment if its outcome cannot be predicted in advance with certainty.

The set of all possible outcomes of a random experiment is called the sample space, and it is denoted by $\Omega$.

An element of $\Omega$ is called a sample point. Each outcome of a random experiment corresponds to a sample point.

A subset of a sample space is called an event.
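To make these definitions concrete, here is a small sketch (our own example, not from the slides) that models the sample space of one roll of a die as a Python set; events are then simply subsets of it. The names `omega`, `even`, `six`, and `is_event` are illustrative.

```python
# Sample space for one roll of a six-sided die (illustrative example).
omega = {1, 2, 3, 4, 5, 6}

# An event is any subset of the sample space.
even = {x for x in omega if x % 2 == 0}   # compound event {2, 4, 6}
six = {6}                                 # simple (elementary) event
impossible = set()                        # null event
sure = set(omega)                         # sure event

def is_event(subset, sample_space):
    """Return True if `subset` is an event, i.e. a subset of the sample space."""
    return subset <= sample_space

print(sorted(even))            # [2, 4, 6]
print(is_event(even, omega))   # True
print(is_event({7}, omega))    # False: 7 is not a possible outcome
```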


EVENT
Events are the primary elements of probability.

Events, and only events, have probabilities, so the concept of event is fundamental to the theory of probability.
We shall generally use a capital letter near the beginning of the alphabet to denote an EVENT.
Simple event (elementary event, or atomic event): an event that consists of exactly one outcome.
Compound event (composite event): an event that consists of more than one outcome.
SURE EVENT/CERTAIN EVENT
NULL EVENT/EMPTY EVENT/IMPOSSIBLE EVENT
MUTUALLY EXCLUSIVE EVENTS
EXHAUSTIVE EVENTS
INDEPENDENT EVENTS


INTERPRETATIONS OF PROBABILITY

The term probability has four interpretations.

THE APPROACHES OF PROBABILITY
AXIOMATIC
SUBJECTIVE
OBJECTIVE:
CLASSICAL (equally likely)
EMPIRICAL (relative frequency)
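A minimal sketch contrasting the classical and empirical approaches, assuming a fair six-sided die (a hypothetical example): the classical value counts equally likely outcomes, while the empirical value is a relative frequency from simulated rolls. The helper `relative_frequency` is an illustrative name, and the simulation is seeded so that it is reproducible.

```python
import random
from fractions import Fraction

# Classical (equally likely) probability of an even face on a fair die:
# number of favourable outcomes / number of possible outcomes.
omega = [1, 2, 3, 4, 5, 6]
even = [x for x in omega if x % 2 == 0]
p_classical = Fraction(len(even), len(omega))

# Empirical (relative frequency) probability: simulate many rolls and
# record the fraction of rolls on which the event occurs.
def relative_frequency(n_trials, seed=0):
    rng = random.Random(seed)  # seeded so the result is reproducible
    hits = sum(1 for _ in range(n_trials) if rng.randint(1, 6) % 2 == 0)
    return hits / n_trials

for n in (10, 1_000, 100_000):
    print(n, relative_frequency(n))
print("classical:", p_classical)
```

As the number of simulated rolls grows, the relative frequency should settle near the classical value 1/2.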


PROBABILITY SPACE
If the sample space $\Omega$ is countable, the set of all subsets of $\Omega$, which is known as the power set of $\Omega$, is then the set of all events.

A class $\mathcal{F}$ of subsets of $\Omega$ that contains $\Omega$ and $\emptyset$ and is closed under complements and countable unions is called a $\sigma$-field or $\sigma$-algebra of subsets of $\Omega$. The members of such a class are called events, and the class itself is called the event space.

To each $A \in \mathcal{F}$ is assigned a real number $P(A)$, called the probability of $A$.
We symbolize this by writing $P: \mathcal{F} \to \mathbb{R}$ and speak of $P$ mapping $\mathcal{F}$ into $\mathbb{R}$.
Definition: Let $\Omega$ be a given sample space and $\mathcal{F}$ the corresponding event space. A probability function $P$ is a real-valued function with domain $\mathcal{F}$ such that:

axiom 1: $P(A) \ge 0$ for every $A \in \mathcal{F}$
axiom 2: $P(\Omega) = 1$
axiom 3: If $A_1, A_2, A_3, \dots$ are mutually exclusive events, then $P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$

The triple $(\Omega, \mathcal{F}, P)$ is called a probability space.
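For a finite sample space the axioms can be checked by brute force. The sketch below assumes a fair die, takes the power set as the event space, and defines $P(A)$ as the sum of the point probabilities in $A$; all names are illustrative. On a finite space, axiom 3 reduces to additivity for disjoint events, which is what the code checks.

```python
from fractions import Fraction
from itertools import chain, combinations

# Assumed example: a fair die, each sample point has probability 1/6.
omega = frozenset(range(1, 7))
point_mass = {w: Fraction(1, 6) for w in omega}

def P(event):
    """Probability of an event = sum of the probabilities of its points."""
    return sum((point_mass[w] for w in event), Fraction(0))

def power_set(s):
    """All subsets of s: the event space when the sample space is finite."""
    items = list(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

events = power_set(omega)
axiom1 = all(P(A) >= 0 for A in events)          # non-negativity
axiom2 = P(omega) == 1                           # sure event has probability 1
axiom3 = all(P(A | B) == P(A) + P(B)             # additivity for disjoint events
             for A in events for B in events if not A & B)
print(len(events), axiom1, axiom2, axiom3)       # 64 True True True
```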


STATISTICS
EXPLORATORY DATA ANALYSIS: In EDA, the collected data are examined carefully, for example with a STEM PLOT or a BOX PLOT.

CONFIRMATORY DATA ANALYSIS: CDA, or statistical inference, offers us methods for drawing conclusions from data.
PROBABILITY AND STATISTICS

Probability and statistics are related in a most curious way. In essence, probability is the vehicle that enables the statistician to use information in a sample to make inferences about, or describe, the population from which the sample was obtained.
Probability reasons from the population to the sample, while statistics acts in reverse, moving from the sample to the population.

PROBABILITY (deductive reasoning): POPULATION → SAMPLE
STATISTICS (inductive reasoning): SAMPLE → POPULATION


WHAT IS STATISTICS?

STATISTICIAN: a person who studies or works with statistics.
STATISTIC: a statistic is a function of random variables, such as the observations in a sample. The statistic itself is a random variable.
STATISTICS:
The art and science of learning from data.
The science of gaining information from numerical data.
Welcome to the world of statistics, the art of drawing conclusions from imperfect data.

The objective of statistics is to make an inference about a population based on information contained in a sample.

Theorem 1:

Proof:

so,

from

Corollary:
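Theorem 5: If $A$ and $B$ are any two events, then:
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$

[Figure: Venn diagram of events A and B]

Proof:
As illustrated in the figure above, $A \cup B$ is the union of the mutually exclusive events $A$ and $A^c \cap B$, so
$P(A \cup B) = P(A) + P(A^c \cap B)$ ... (i)
But $B$ is the union of the mutually exclusive events $A \cap B$ and $A^c \cap B$, so
$P(B) = P(A \cap B) + P(A^c \cap B)$ ... (ii)
Combining equations (i) and (ii) gives the result:
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$ qed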
The probability of a union of more than two events can be computed analogously.

For three events A, B, C, the result is:
$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C)$
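The two inclusion-exclusion formulas can be checked by enumeration. This sketch assumes two fair dice (36 equally likely outcomes) and three illustrative events of our own choosing; exact fractions avoid any rounding error.

```python
from fractions import Fraction
from itertools import product

# Sample space for tossing two dice: 36 equally likely ordered pairs.
omega = set(product(range(1, 7), repeat=2))

def P(event):
    return Fraction(len(event), len(omega))

# Illustrative events:
A = {w for w in omega if w[0] == 6}        # first die shows 6
B = {w for w in omega if sum(w) >= 10}     # sum of the dots is at least 10
C = {w for w in omega if w[1] % 2 == 0}    # second die is even

# Two events: P(A u B) = P(A) + P(B) - P(A n B)
lhs2 = P(A | B)
rhs2 = P(A) + P(B) - P(A & B)

# Three events: inclusion-exclusion
lhs3 = P(A | B | C)
rhs3 = (P(A) + P(B) + P(C)
        - P(A & B) - P(A & C) - P(B & C)
        + P(A & B & C))
print(lhs2, rhs2)
print(lhs3, rhs3)
```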


EXERCISES
1. An experiment consists of tossing two dice.
a. Find the sample space $\Omega$.
b. Find the event A that the sum of the dots on the dice equals 7.
c. Find the event B that the sum of the dots on the dice is greater than 10.
d. Find the event C that the sum of the dots on the dice is greater than 12.

2. Given that
Find:
3. Show that:

4. Suppose
where A, B, C are mutually exclusive events.
Determine the numerical values of:


CONDITIONAL PROBABILITY, BAYES' THEOREM, AND INDEPENDENCE
• The ideas of independence and conditional probability play a central role in probability theory.
• Conditional probability is the study of how additional information can change our notion of how likely another event is to occur.
• The two notions of independence and conditional probability are closely related, and this relationship will also be considered.
• Bayes' Theorem is a particular application of conditional probability.


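THE CONDITIONAL PROBABILITY

Definition:
$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) > 0$

Analogue:
$P(B \mid A) = \frac{P(A \cap B)}{P(A)}, \quad P(A) > 0$

Hence:
$P(A \cap B) = P(B)\,P(A \mid B) = P(A)\,P(B \mid A)$
Law of Multiplication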
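A small sketch of the definition and the law of multiplication, assuming two fair dice; the events and the helper name `cond` are our own illustrative choices.

```python
from fractions import Fraction
from itertools import product

omega = set(product(range(1, 7), repeat=2))   # two fair dice

def P(event):
    return Fraction(len(event), len(omega))

def cond(A, B):
    """P(A | B) = P(A n B) / P(B), defined only when P(B) > 0."""
    if P(B) == 0:
        raise ValueError("P(B) must be positive")
    return P(A & B) / P(B)

A = {w for w in omega if sum(w) == 8}   # sum of the dots is 8
B = {w for w in omega if w[0] == 3}     # first die shows 3

print(cond(A, B))   # 1/6: of the 6 outcomes in B, only (3, 5) has sum 8
# Law of multiplication: P(A n B) = P(B) P(A|B) = P(A) P(B|A)
print(P(A & B) == P(B) * cond(A, B) == P(A) * cond(B, A))   # True
```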


For any event E,

For two events:


BAYES THEOREM

The second important result on conditional probability is called Bayes' Theorem. It is used in situations where quantities of the form $P(A \mid B_i)$ and $P(B_i)$ are known and we wish to determine $P(B_i \mid A)$.

Let $B_1, B_2, \dots, B_n$ be a partition of the sample space $\Omega$ and let $A$ be an event in $\Omega$ with $P(A) > 0$.
From the definition of conditional probability,
$P(B_i \mid A) = \frac{P(A \cap B_i)}{P(A)}$ and $P(A \cap B_i) = P(A \mid B_i)\,P(B_i)$.
From the theorem on total probability,
$P(A) = \sum_{j=1}^{n} P(A \mid B_j)\,P(B_j)$

Substituting the last two equations into the first, we find:
$P(B_i \mid A) = \frac{P(A \mid B_i)\,P(B_i)}{\sum_{j=1}^{n} P(A \mid B_j)\,P(B_j)}, \quad i = 1, 2, \dots, n$
which is known as Bayes' Theorem.

The original probabilities $P(B_i)$ are called prior probabilities, and the conditional probabilities $P(B_i \mid A)$ are often called posterior probabilities.


PROBABILITY TREE DIAGRAM (also called a TREE DIAGRAM or PROBABILITY TREE)
A diagram that can be used as an aid in computing probabilities. It depicts events, or sequences of events, as branches of a tree.

It is a useful way of indicating the possible outcomes of a probability experiment. If you know how to use a tree diagram, then you can work problems requiring Bayes' theorem.


A box contains 7 red and 13 blue balls. Two balls are selected at random and are discarded without their colors being seen. If a third ball is drawn randomly and observed to be red, what is the probability that both of the discarded balls were blue?
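One way to work the box problem is with a probability tree: the first level splits on the colours of the two discarded balls (a partition of the sample space), the second on the colour of the third ball. The sketch below applies the theorem on total probability and Bayes' theorem to that tree, then double-checks the answer by enumerating every ordered draw of three distinct balls. Variable names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

RED, BLUE = 7, 13
N = RED + BLUE

# First level of the tree: colours of the two discarded balls (the partition B_i).
prior = {
    "both blue": Fraction(BLUE, N) * Fraction(BLUE - 1, N - 1),
    "both red": Fraction(RED, N) * Fraction(RED - 1, N - 1),
    "one each": 2 * Fraction(RED, N) * Fraction(BLUE, N - 1),
}
# Second level: P(third ball red | branch), with N - 2 balls left.
red_left = {"both blue": RED, "both red": RED - 2, "one each": RED - 1}
likelihood = {k: Fraction(r, N - 2) for k, r in red_left.items()}

# Theorem on total probability, then Bayes' theorem.
p_red = sum(prior[k] * likelihood[k] for k in prior)
posterior_bb = prior["both blue"] * likelihood["both blue"] / p_red
print(p_red, posterior_bb)   # 7/20 26/57

# Brute-force check: enumerate every ordered draw of 3 distinct balls.
balls = ["R"] * RED + ["B"] * BLUE
draws = list(permutations(range(N), 3))
third_red = [d for d in draws if balls[d[2]] == "R"]
both_blue = [d for d in third_red if balls[d[0]] == balls[d[1]] == "B"]
print(Fraction(len(both_blue), len(third_red)))   # 26/57
```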


INDEPENDENT EVENTS

If two events are unrelated, so that the occurrence (or non-occurrence) of one of the events doesn't affect the likelihood of the other event, the events are called independent.

We want to express this idea mathematically.
Definition:
The events A and B will be called independent if
$P(A \mid B) = P(A)$
Mathematically:
$P(A \cap B) = P(A)\,P(B)$

Events that are independent are sometimes called:
statistically independent,
stochastically independent, or
independent in a probability sense.
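Theorem:
If A and B are independent events, then the following pairs of events are also independent:
a) $A$ and $B^c$
b) $A^c$ and $B$
c) $A^c$ and $B^c$
Proof: If $P(A \cap B) = P(A)\,P(B)$, then:
$P(A \cap B^c) = P(A) - P(A \cap B) = P(A) - P(A)\,P(B) = P(A)\,[1 - P(B)]$
Thus, $P(A \cap B^c) = P(A)\,P(B^c)$.

The proofs for parts b and c are left as exercises.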
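The product rule $P(A \cap B) = P(A)P(B)$ can be tested directly on a finite sample space. This sketch assumes two fair dice and illustrative events of our own choosing; it also checks the theorem on complements by testing the pairs $(A, B^c)$, $(A^c, B)$, and $(A^c, B^c)$.

```python
from fractions import Fraction
from itertools import product

omega = set(product(range(1, 7), repeat=2))   # two fair dice, 36 outcomes

def P(event):
    return Fraction(len(event), len(omega))

def independent(A, B):
    """A and B are independent iff P(A n B) = P(A) P(B)."""
    return P(A & B) == P(A) * P(B)

A = {w for w in omega if w[0] % 2 == 0}   # first die is even
B = {w for w in omega if sum(w) == 7}     # sum of the dots is 7
C = {w for w in omega if sum(w) == 8}     # sum of the dots is 8
Ac, Bc = omega - A, omega - B             # complements

print(independent(A, B))   # True:  3/36 == (1/2)(1/6)
print(independent(A, C))   # False: 3/36 != (1/2)(5/36)
# Independence carries over to the complement pairs.
print(independent(A, Bc), independent(Ac, B), independent(Ac, Bc))   # True True True
```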

Definition:
Events A, B, and C are mutually independent if the following two conditions hold:
a) They are pairwise independent; that is, $P(A \cap B) = P(A)\,P(B)$, $P(A \cap C) = P(A)\,P(C)$, and $P(B \cap C) = P(B)\,P(C)$.
b) $P(A \cap B \cap C) = P(A)\,P(B)\,P(C)$.
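Pairwise independence (condition a) alone is not enough. A classic example, assumed here with two fair coin tosses, gives three events that are pairwise independent but fail the triple product condition $P(A \cap B \cap C) = P(A)P(B)P(C)$.

```python
from fractions import Fraction
from itertools import product

omega = set(product("HT", repeat=2))   # two fair coin tosses, 4 outcomes

def P(event):
    return Fraction(len(event), len(omega))

A = {w for w in omega if w[0] == "H"}    # first toss is a head
B = {w for w in omega if w[1] == "H"}    # second toss is a head
C = {w for w in omega if w[0] == w[1]}   # both tosses agree

pairwise = (P(A & B) == P(A) * P(B) and
            P(A & C) == P(A) * P(C) and
            P(B & C) == P(B) * P(C))
triple = P(A & B & C) == P(A) * P(B) * P(C)
print(pairwise, triple)   # True False: 1/4 != 1/8
```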


MUTUALLY EXCLUSIVE EVENTS VERSUS PROBABILISTICALLY INDEPENDENT EVENTS

There is a tendency to equate the concepts "mutually exclusive" and "probabilistically independent". This is a FALLACY.

Mutually exclusive events with nonzero probabilities can never be probabilistically independent, and VICE VERSA.
To illustrate, suppose that A, B are events for which $P(A) > 0$ and $P(B) > 0$.
If A and B are mutually exclusive, then $A \cap B = \emptyset$ and $P(A \cap B) = 0$.
On the other hand, if A and B are probabilistically independent, then:
$P(A \cap B) = P(A)\,P(B) > 0$

Clearly, both of these equations cannot be true simultaneously.
i. If A, B are probabilistically independent, they are not mutually exclusive events (MEE).
ii. If A, B are MEE, they are not probabilistically independent.
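The argument can be seen numerically. This sketch assumes one fair die and two disjoint events with positive probability (our own choice): they are mutually exclusive, yet $P(A \cap B) = 0 \ne P(A)P(B) = 1/9$, so they are not independent.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}   # one fair die

def P(event):
    return Fraction(len(event & omega), len(omega))

A = {1, 2}   # P(A) = 1/3 > 0
B = {5, 6}   # P(B) = 1/3 > 0, and A n B is empty

mutually_exclusive = not (A & B)
independent = P(A & B) == P(A) * P(B)    # is 0 equal to 1/9?
print(mutually_exclusive, independent)   # True False
```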


Exercises

1. Events A and B are independent, with
Given that: and What is ?

2. If events S and T have equal probability and are independent with , then


APPLICATION OF INDEPENDENCE OF EVENTS

THE SIMPLEST CONFIGURATIONS:
THE SERIES SYSTEM [Figure: components C1, C2, ..., Cn connected in series]
THE PARALLEL SYSTEM [Figure: components C1, C2, ..., Cn connected in parallel]


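THE SERIES SYSTEM

[Figure: components C1, C2, ..., Cn connected in series]
A series system works only if every component works. If the components fail independently and $R_i$ is the reliability of component $C_i$, the system reliability is $R = R_1 R_2 \cdots R_n = \prod_{i=1}^{n} R_i$.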


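THE PARALLEL SYSTEM

[Figure: components C1, C2, ..., Cn connected in parallel]
Let $R_i$ be the reliability of component $C_i$, and assume the components fail independently. For $n = 2$, the system reliability is:
$R = 1 - (1 - R_1)(1 - R_2)$
Thus for $n = 2$, the system reliability is the probability that both components do not fail simultaneously.

The expression for general $n$ is:
$R = 1 - \prod_{i=1}^{n} (1 - R_i)$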
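A minimal sketch of the two reliability rules, assuming independent component failures: a series system works only if every component works (product of the reliabilities), and a parallel system fails only if every component fails. The function names and the reliability values are hypothetical, for illustration only.

```python
from math import prod

def series(*r):
    """Series system: works only if every component works."""
    return prod(r)

def parallel(*r):
    """Parallel system: fails only if every component fails."""
    return 1 - prod(1 - ri for ri in r)

# Hypothetical component reliabilities.
print(series(0.9, 0.9))     # about 0.81
print(parallel(0.9, 0.9))   # about 0.99

# Mixed systems can be built by nesting, e.g. two parallel pairs in series.
print(series(parallel(0.9, 0.9), parallel(0.8, 0.8)))   # about 0.99 * 0.96
```

Nesting the two functions follows the block structure of a diagram, which is one way to approach the reliability exercises that follow.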


EXERCISES

1. Find the reliability of the given system.

[Figure: system of components with reliabilities A = 0.95, B = 0.90, C = 0.90, D = 0.90, E = 0.90, F = 0.90, G = 0.95]


EXERCISES

2. A system consists of components connected as shown in the following figures. Find the value of R if the reliability of the system is given as 0.92. Assume that all components have equal reliability R.

[Figures (a), (b), (c): three configurations of components, each component with reliability R]


EXERCISES

3. Determine the reliability of the system in the following figure.

[Figure: system of components A, B, C, D, E, in which A, C, and D each appear twice]

Reliabilities: A = 0.90, B = 0.87, C = 0.92, D = 0.95, E = 0.85


EXERCISES

4. The circuit shown below operates only if there is a path of functional devices from left to right. The probability that each device functions is shown on the graph. Assume that devices fail independently. What is the probability that the circuit operates?

[Figure: circuit from L to R whose devices function with probabilities 0.9, 0.95, 0.9, 0.99, 0.95, 0.9]


5. The circuit below operates if and only if there is a path of functional devices from left to right. Assume that devices fail independently and that the probability of failure of each device is as shown. What is the probability that the circuit operates?

[Figure: circuit from L to R whose devices fail with probabilities 0.02, 0.01, 0.01, 0.01, 0.01, 0.02]


COUNTING TECHNIQUES

Counting techniques let us solve a probability problem by counting the number of points in the sample space without actually listing each element.

In a probability space where the outcomes are equally likely,
$P(A) = \frac{n(A)}{n(\Omega)}$
where $n(\cdot)$ denotes the number of sample points in a set.

The basic tool is the fundamental principle of counting, which we will also call the multiplication principle.
Theorem:
The Fundamental Principle
If operation A can be performed in $n_1$ different ways and operation B in $n_2$ different ways, the sequence (operation A, operation B) can be performed in $n_1 n_2$ different ways.

Corollary:
If operations $A_1, A_2, \dots, A_k$ can be performed in $n_1, n_2, \dots, n_k$ ways, respectively, then the sequence (operation $A_1$, operation $A_2$, ..., operation $A_k$) can be performed in $n_1 n_2 \cdots n_k$ ways.
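The multiplication principle can be confirmed by listing every sequence explicitly. This sketch uses a hypothetical three-step operation (choose a letter, then a digit, then a colour); the lists are our own example.

```python
from itertools import product

# Hypothetical operations: a letter from {A, B, C}, then a digit 0-9,
# then a colour from {red, blue}.
letters = ["A", "B", "C"]
digits = list(range(10))
colours = ["red", "blue"]

# Listing every sequence explicitly...
listed = list(product(letters, digits, colours))
# ...agrees with the multiplication principle n1 * n2 * n3.
counted = len(letters) * len(digits) * len(colours)
print(len(listed), counted)   # 60 60
```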


The basic counting results that have direct application to probability problems come from combinatorial mathematics (the calculus of counting). Combinatorics is a branch of mathematics that deals with various kinds of enumeration problems.


COMBINATORIAL MATHEMATICS (Calculus of Counting)

PERMUTATIONS (lineups): an arrangement of the objects of a set in a particular order. A permutation is an arrangement of objects in a definite order.
COMBINATIONS (selections or committees): permutations in which order is not important. A combination is a selection of objects without regard to order.


NOTATION
To denote the number of permutations of $r$ elements taken from a set of $n$ elements, we use the notation:

To denote the number of combinations of $r$ elements taken from a set of $n$ elements, we use the notation:


WR versus WOR

SAMPLING
WITHOUT REPLACEMENT (WOR): Sampling WOR means that the selected item is removed and cannot be selected again.
WITH REPLACEMENT (WR): Sampling WR means that the selected item is replaced and can be selected again.


Number of possible arrangements of size $r$ from $n$ objects

ORDERED, WOR: $\frac{n!}{(n-r)!}$
ORDERED, WR: $n^r$
UNORDERED, WOR: $\binom{n}{r} = \frac{n!}{r!\,(n-r)!}$
UNORDERED, WR: $\binom{n+r-1}{r}$

Proof: to be discussed in class.
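The four counting formulas can be checked against explicit enumeration with Python's `itertools`. This is a sketch: `counts` and `brute_force` are illustrative names, and `math.perm` and `math.comb` require Python 3.8 or later.

```python
from itertools import (combinations, combinations_with_replacement,
                       permutations, product)
from math import comb, perm

def counts(n, r):
    """The four counting formulas for arrangements of size r from n objects."""
    return {
        ("ordered", "WOR"): perm(n, r),            # n! / (n - r)!
        ("ordered", "WR"): n ** r,
        ("unordered", "WOR"): comb(n, r),          # n! / (r! (n - r)!)
        ("unordered", "WR"): comb(n + r - 1, r),
    }

def brute_force(n, r):
    """Count the same arrangements by listing them explicitly."""
    objs = range(n)
    return {
        ("ordered", "WOR"): sum(1 for _ in permutations(objs, r)),
        ("ordered", "WR"): sum(1 for _ in product(objs, repeat=r)),
        ("unordered", "WOR"): sum(1 for _ in combinations(objs, r)),
        ("unordered", "WR"): sum(1 for _ in combinations_with_replacement(objs, r)),
    }

print(counts(5, 3))
print(counts(5, 3) == brute_force(5, 3))   # True
```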


EXERCISES

1. A bag contains 9 balls, two of which are red, three blue, and four black. Three balls are drawn from the bag at random. Every ball has an equal chance of being included in the three. What is the chance (probability) that:
a) the three balls are all of different colours?
b) two balls are of the same colour and the third is different?
c) the balls are all of the same colour?

2. If five boys and five girls sit in a row in a random order, what is the probability that no two children of the same sex sit together?


3. A box contains 10 white balls, 20 red and 30 green. Draw 5 balls WOR. Find the probability that:
a) the sample contains 3 white or 2 red or 5 green balls;
b) all 5 balls are the same color.

4. Four married couples have bought 8 seats in a row for a concert. In how many different ways can they be seated
a) with no restrictions?
b) if each couple is to sit together?
c) if all the men sit together to the right of all the women?
5. If 3 people are picked from a group of 4 married couples, what is the probability of not including a pair of spouses?


RANDOM VARIABLE
The idea of a random variable started with GAMBLING. For a given trial, the experimental outcome determines how much money you win or lose. Thus, an experimental outcome determines a number. This is a random variable. Hopefully we can put this number to better use than gambling.
The outcome can come from a continuum, or it can come from a finite (discrete) set.
Thus, there are three possible types of random variable: discrete, continuous, and mixed, depending on whether the set of values it takes is discrete, continuous, or a mixture of both.


WHAT IS RANDOM ABOUT A RANDOM VARIABLE?

The term "random" is used because the value of a random variable usually cannot be predicted in advance.
By convention, random variables are usually denoted by uppercase letters near the end of the alphabet: $X, Y, Z$.

Lowercase letters (e.g., $x, y, z$) refer to an observation, a value taken by the function. Other terms in common use are: CHANCE VARIABLE, STOCHASTIC VARIABLE, VARIATE.
In German: ZUFÄLLIGE GRÖSSE
In French: VARIABLE ALÉATOIRE
Only events have probabilities, so we have to set up a structure of events.
If $X$ is a random variable, then for each real number $x$, we want $\{X \le x\}$ to have a probability.
Thus $\{X \le x\}$ must be interpreted as describing an event, a set of elementary events.

Definition: Let $\Omega$ be the sample space of an experiment. A real-valued function $X: \Omega \to \mathbb{R}$ is called a random variable of the experiment if, for each interval $I \subseteq \mathbb{R}$, $\{\omega \in \Omega : X(\omega) \in I\}$ is an event.

In probability, the set $\{\omega \in \Omega : X(\omega) \in I\}$ is often abbreviated as $\{X \in I\}$.
Another version:
Definition: A random variable $X$ is a real-valued function defined on a sample space $\Omega$. The value of the function at each sample point $\omega$ is denoted by $X(\omega)$.

The set of values $\{X(\omega) : \omega \in \Omega\}$ is called the range and is denoted $R_X$.
A random variable $X$ is a real-valued function defined on a sample space $\Omega$. Mathematically, a random variable is a mapping:
$X: \Omega \to \mathbb{R}$
where the domain $\Omega$ is a sample space and the codomain $\mathbb{R}$ is the set of real numbers. The collection of all numbers $\{X(\omega)\}$ is called the range of the random variable; it is a subset of the set of all real numbers.
Mathematically: $R_X = \{X(\omega) : \omega \in \Omega\} \subseteq \mathbb{R}$
Often the notation $\{\omega \in \Omega : X(\omega) = x\}$ is shortened to $\{X = x\}$, by which we mean the event that $X$ takes the value $x$.
We write $P(X = x)$ to represent the probability of the event $\{X = x\}$.
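EVENTS DEFINED BY RANDOM VARIABLES
If $X$ is a random variable and $x$ is a fixed real number, we can define the event
$(X = x) = \{\omega \in \Omega : X(\omega) = x\}$
Similarly, for fixed numbers $x$, $x_1$, and $x_2$, we can define the following events:
$(X \le x) = \{\omega \in \Omega : X(\omega) \le x\}$
$(X > x) = \{\omega \in \Omega : X(\omega) > x\}$
$(x_1 < X \le x_2) = \{\omega \in \Omega : x_1 < X(\omega) \le x_2\}$
These events have probabilities that are denoted by:
$P(X = x)$, $P(X \le x)$, $P(X > x)$, $P(x_1 < X \le x_2)$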


EXAMPLE
In the experiment of tossing a coin three times, the sample space $\Omega$ consists of eight equally likely sample points, $\Omega = \{HHH, \dots, TTT\}$. If $X$ is the random variable giving the number of heads obtained, find: (a) $P(X = 2)$; (b) $P(X < 2)$.

Solution:
(a) Let $A \subset \Omega$ be the event defined by $X = 2$; then we have $A = \{HHT, HTH, THH\}$.
Since the sample points are equally likely, we have $P(X = 2) = P(A) = \frac{3}{8}$.
(b) Let $B \subset \Omega$ be the event defined by $X < 2$; then we have $B = \{HTT, THT, TTH, TTT\}$ and $P(X < 2) = P(B) = \frac{4}{8} = \frac{1}{2}$.
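The example can be reproduced by treating $X$ literally as a function on the sample space. In this sketch (our own illustration) the outcomes are strings such as "HTH", and the events $(X = 2)$ and $(X < 2)$ are collected by evaluating $X$ at every sample point.

```python
from fractions import Fraction
from itertools import product

# Sample space: 8 equally likely outcomes of three tosses ("HHH", ..., "TTT").
omega = ["".join(t) for t in product("HT", repeat=3)]

def X(w):
    """The random variable: number of heads in the outcome w."""
    return w.count("H")

A = [w for w in omega if X(w) == 2]   # event (X = 2)
B = [w for w in omega if X(w) < 2]    # event (X < 2)
print(A, Fraction(len(A), len(omega)))   # ['HHT', 'HTH', 'THH'] 3/8
print(B, Fraction(len(B), len(omega)))   # ['HTT', 'THT', 'TTH', 'TTT'] 1/2
```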


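DISTRIBUTION FUNCTIONS

Definition: The distribution function (cumulative distribution function, denoted by CDF) of $X$ is the function defined by:
$F_X(x) = P(X \le x), \quad -\infty < x < \infty$

Properties of $F_X(x)$:
1. $0 \le F_X(x) \le 1$ (from the definition)
2. $F_X(x_1) \le F_X(x_2)$ if $x_1 < x_2$
3. $\lim_{x \to \infty} F_X(x) = 1$
4. $\lim_{x \to -\infty} F_X(x) = 0$
5. $\lim_{x \to a^+} F_X(x) = F_X(a)$

Property (5) indicates that $F_X(x)$ is continuous on the right.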
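A sketch of a CDF for a discrete random variable, assuming the pmf of $X$ = number of heads in three tosses of a fair coin (1/8, 3/8, 3/8, 1/8). It also computes the left-hand limit $F_X(a^-) = P(X < a)$, so the height of the jump at $a$, $F_X(a) - F_X(a^-)$, can be compared with $P(X = a)$. The helper names are illustrative.

```python
from fractions import Fraction

# pmf of X = number of heads in three tosses of a fair coin.
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def F(x):
    """CDF: F_X(x) = P(X <= x)."""
    return sum((p for k, p in pmf.items() if k <= x), Fraction(0))

def F_left(x):
    """Left-hand limit F_X(x-) = P(X < x)."""
    return sum((p for k, p in pmf.items() if k < x), Fraction(0))

for x in (-1, 0, 0.5, 1, 2, 2.9, 3, 10):
    print(x, F(x))

# Right-continuity: F(a) already includes P(X = a), so the jump at a is P(X = a).
print(F(2) - F_left(2))   # 3/8
```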

RANDOM VARIABLES

DISCRETE RANDOM VARIABLES: $X$ is a discrete random variable only if its range contains a finite or countably infinite number of points.
Properties: $p_X(x_k) = P(X = x_k) \ge 0$ and $\sum_k p_X(x_k) = 1$

CONTINUOUS RANDOM VARIABLES: $X$ is a continuous random variable only if its range contains an interval of real numbers.
pdf: $f_X(x) = \frac{dF_X(x)}{dx}$
Properties: $f_X(x) \ge 0$ and $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$

MIXED RANDOM VARIABLES


FORMULA FOR CALCULATING PROBABILITIES OF CERTAIN EVENTS USING THE CDF

No  Event  Formula for Probability of the Event
1  $X = a$  Height of the jump of the graph of $F$ at $a$
2
3
4
5
6
7
8
9

Relation between the CDF and the pdf:
$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt, \qquad \frac{d}{dx} F(x) = \frac{d}{dx} \int_{-\infty}^{x} f(t)\,dt = f(x)$
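The two relations can be checked numerically. This sketch assumes an exponential distribution with rate 1 (pdf $e^{-t}$ for $t \ge 0$, CDF $1 - e^{-x}$), which is our own choice of example; it integrates the pdf with the trapezoidal rule and differentiates the CDF with a central difference.

```python
import math

# Assumed example: exponential distribution with rate 1.
def f(t):
    return math.exp(-t) if t >= 0 else 0.0

def F(x):
    return 1.0 - math.exp(-x) if x >= 0 else 0.0

def integrate(g, a, b, n=10_000):
    """Composite trapezoidal rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * (0.5 * g(a) + sum(g(a + i * h) for i in range(1, n)) + 0.5 * g(b))

x = 1.5
# F(x) = integral of f from -inf to x (f is 0 below 0, so start at 0).
print(F(x), integrate(f, 0.0, x))
# dF/dx = f(x), checked with a central difference.
h = 1e-6
print((F(x + h) - F(x - h)) / (2 * h), f(x))
```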
Exercises

1. Consider the experiment of throwing a fair die. Let X be the random variable which assigns 1 if the number that appears is even and 0 if the number that appears is odd.
a. What is the range of X?
b. Find and

2. Consider the experiment of tossing a coin three times. Let X be the random variable giving the number of heads obtained. We assume that the tosses are independent and the probability of a head is p.
a. What is the range of X?
b. Find the probabilities , , , and


Exercises

3. Consider the function given by:

a. Sketch
b. If X is the random variable whose CDF is given by , find:


Exercises

4. Suppose a discrete random variable X has the following pmf:
a. Find and sketch the CDF
b. Find:


Exercises

5. The pdf of a continuous random variable X is given by:

a. Find the corresponding CDF
b. Sketch and


FUNCTION OF ONE RANDOM VARIABLE

Suppose that a random variable $X$ has a probability density function (pdf) $f_X(x)$. We often need to find the pdf of $Y = g(X)$ for some given function $g$.
We say that $g$ is a strictly monotonic-increasing function of $x$ if and only if $g(x_1) < g(x_2)$ whenever $x_1 < x_2$.
$g$ is a strictly monotonic-decreasing function if and only if $g(x_1) > g(x_2)$ whenever $x_1 < x_2$.
A strictly monotonic function is always increasing or always decreasing as a function of $x$.


The problem in this topic is:
Given a random variable $X$ with pdf $f_X(x)$ and a function $Y = g(X)$, find the pdf and CDF of $Y$. Next, we can calculate and .
STRICTLY MONOTONIC FUNCTION (has an INVERSE FUNCTION)
INCREASING: $y = \log x$ and $y = \sqrt{x}$ for $x > 0$; $y = a + bx$ if $b > 0$
DECREASING: $y = a + bx$ if $b < 0$; $y = \frac{1}{x}$ and $y = e^{-x}$ for $x > 0$
As long as $y = g(x)$ is a strictly monotonic function of $x$, it is possible to find the unique value of $x$ corresponding to any fixed value of $y$; this value will be denoted $x = g^{-1}(y)$.

$g^{-1}$ is the inverse function, and it will always exist when $g$ is strictly monotonic.
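Theorem:
Suppose $X$ is a discrete random variable and $Y = g(X)$ is a one-to-one transformation. Then,
$p_Y(y) = p_X\left(g^{-1}(y)\right)$

Proof:
$p_Y(y) = P(Y = y) = P(g(X) = y) = P\left(X \in \{x : g(x) = y\}\right)$

Since $g$ is one-to-one, $\{x : g(x) = y\}$ is a set with only a single element, $g^{-1}(y)$.

Hence, $p_Y(y) = P\left(X = g^{-1}(y)\right) = p_X\left(g^{-1}(y)\right)$ qed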
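A sketch of the theorem for a discrete random variable: the pmf of $Y = g(X)$ is built by adding $P(X = x)$ over all $x$ with $g(x) = y$. When $g$ is one-to-one each $y$ receives exactly one $x$, which is the theorem; the second call shows what changes when $g$ is not one-to-one. The pmf and the function name `pmf_of_g` are hypothetical, for illustration only.

```python
from collections import defaultdict
from fractions import Fraction

def pmf_of_g(pmf_x, g):
    """pmf of Y = g(X): add up P(X = x) over all x with g(x) = y.
    If g is one-to-one, each y gets exactly one x, so p_Y(y) = p_X(g^{-1}(y))."""
    pmf_y = defaultdict(Fraction)   # Fraction() is 0
    for x, p in pmf_x.items():
        pmf_y[g(x)] += p
    return dict(pmf_y)

# Hypothetical pmf of X.
pmf_x = {-1: Fraction(1, 4), 0: Fraction(1, 2), 1: Fraction(1, 4)}

# One-to-one: p_Y(1) = 1/4, p_Y(3) = 1/2, p_Y(5) = 1/4.
print(pmf_of_g(pmf_x, lambda x: 2 * x + 3))
# Not one-to-one: x = -1 and x = 1 both map to 1, so p_Y(1) = 1/2.
print(pmf_of_g(pmf_x, lambda x: x ** 2))
```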


Example

Suppose is the linear operator.

As an illustration, suppose possesses the pmf:

If , then possesses the pmf:


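THE AFFINE TRANSFORMATION

Consider the affine transformation $Y = a + bX$, where $b \ne 0$. For a given value of $x$, the corresponding value of $y$ lies on the line $y = a + bx$.

First, suppose $b > 0$; then:
$F_Y(y) = P(a + bX \le y) = P\left(X \le \frac{y - a}{b}\right) = F_X\left(\frac{y - a}{b}\right)$

For $b < 0$:
$F_Y(y) = P\left(X \ge \frac{y - a}{b}\right) = 1 - F_X\left(\frac{y - a}{b}\right)$
Why?

Next, differentiating with respect to $y$ gives the pdf for $Y$.

For $b > 0$: $f_Y(y) = \frac{1}{b}\, f_X\left(\frac{y - a}{b}\right)$

For $b < 0$: $f_Y(y) = -\frac{1}{b}\, f_X\left(\frac{y - a}{b}\right)$

Combining the two results yields:
$f_Y(y) = \frac{1}{|b|}\, f_X\left(\frac{y - a}{b}\right)$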
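The affine result can be checked numerically by differentiating the CDF of $Y = a + bX$ and comparing with $f_X((y-a)/b)/|b|$. This sketch assumes $X$ is exponential with rate 1 and uses one positive and one negative $b$; the function names and values are illustrative.

```python
import math

# Assumed example: X exponential with rate 1.
def f_X(x):
    return math.exp(-x) if x >= 0 else 0.0

def F_X(x):
    return 1.0 - math.exp(-x) if x >= 0 else 0.0

def f_Y(y, a, b):
    """pdf of Y = a + bX (b != 0): f_X((y - a) / b) / |b|."""
    return f_X((y - a) / b) / abs(b)

def F_Y(y, a, b):
    """CDF of Y = a + bX: the inequality flips when b < 0 (X is continuous)."""
    u = (y - a) / b
    return F_X(u) if b > 0 else 1.0 - F_X(u)

h = 1e-6
for a, b, y in [(1.0, 2.0, 3.0), (1.0, -2.0, -1.0)]:
    numeric = (F_Y(y + h, a, b) - F_Y(y - h, a, b)) / (2 * h)
    print(a, b, y, numeric, f_Y(y, a, b))
```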


EXERCISES

1. Given the pdf of :
If , find the pdf of

2. Let be a random variable with distribution function and let , where .
Derive the distribution for .
3. Given:
Find the distribution function for and pdf for
4. The pmf of a random variable is given as:

Determine the pmf of , if

5. For the same as given in problem 4, determine the pmf of , if
Theorem:
Suppose $X$ is a continuous random variable, $g(x)$ is differentiable for all $x$, and $g$ is either strictly increasing or strictly decreasing for all $x$. Then $Y = g(X)$ is a continuous random variable with pdf:
$f_Y(y) = f_X\left(g^{-1}(y)\right) \left|\frac{d}{dy} g^{-1}(y)\right|$
for $y$ between the two limiting values described prior to the statement of the theorem, and $f_Y(y) = 0$ otherwise.
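Proof:
To begin with, suppose $y$ lies between the limiting values. First, consider the case where $g$ is increasing. Then,
$F_Y(y) = P(g(X) \le y) = P\left(X \le g^{-1}(y)\right) = F_X\left(g^{-1}(y)\right)$

Because $g$ is increasing, the inequality sign did not flip when applying the inverse.
Differentiation with respect to $y$ yields:
$f_Y(y) = f_X\left(g^{-1}(y)\right) \frac{d}{dy} g^{-1}(y)$ ... (1)
Note: $\frac{d}{dy} g^{-1}(y)$ is known as the Jacobian of the transformation.

For $g$ decreasing, $F_Y(y) = P\left(X \ge g^{-1}(y)\right) = 1 - F_X\left(g^{-1}(y)\right)$, and differentiation yields:
$f_Y(y) = -f_X\left(g^{-1}(y)\right) \frac{d}{dy} g^{-1}(y)$ ... (2)
Here $\frac{d}{dy} g^{-1}(y) \le 0$, so the right-hand side is nonnegative.

Combining the two equations (1) and (2) yields:
$f_Y(y) = f_X\left(g^{-1}(y)\right) \left|\frac{d}{dy} g^{-1}(y)\right|$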
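A sketch of the theorem in action, assuming $X \sim$ Uniform(0, 1) and $Y = g(X) = -\ln X$ (our own example). This $g$ is strictly decreasing on (0, 1) with inverse $g^{-1}(y) = e^{-y}$, so the theorem predicts $f_Y(y) = 1 \cdot |{-e^{-y}}| = e^{-y}$ for $y > 0$, an exponential pdf. The helper `transformed_pdf` is a hypothetical name and approximates the Jacobian with a central difference.

```python
import math

def transformed_pdf(f_X, g_inv, y, h=1e-6):
    """f_Y(y) = f_X(g^{-1}(y)) * |d g^{-1}(y) / dy|, derivative by central difference."""
    jacobian = (g_inv(y + h) - g_inv(y - h)) / (2 * h)
    return f_X(g_inv(y)) * abs(jacobian)

# Assumed example: X ~ Uniform(0, 1), Y = -ln X, so g^{-1}(y) = e^{-y} for y > 0.
def f_X(x):
    return 1.0 if 0.0 < x < 1.0 else 0.0

def g_inv(y):
    return math.exp(-y)

for y in (0.5, 1.0, 2.0):
    # The theorem predicts f_Y(y) = e^{-y}.
    print(y, transformed_pdf(f_X, g_inv, y), math.exp(-y))
```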


INTRODUCTION TO STATISTICS
Welcome to the world of statistics, the art of drawing conclusions from imperfect data. In a nutshell, statistics is a subject in which we learn facts about the real world through observations.
1. Statistics is the science that deals with the organizing, summarizing, and interpreting of numerical information, called data.
2. Statistics is the art of making inferences and drawing conclusions from imperfect data. Data values are often imperfect, in that they convey useful information but do not tell the whole story. Statistical methods can be used in all parts of a study, from beginning to end.


Imperfections in data arise for many reasons:
(a) Measurement error
(b) Limitations of time
We can think of data as containing truth mixed with random variation.

DATA = TRUTH + RANDOMNESS
(what we have) = (what we want) + (imperfections and errors that get in the way)


The original meaning of the word "statistics" is "science of states", and in its early existence it was also called "political arithmetic".
"Statistics" comes from the Italian word "statista", meaning statesman.

STATISTICS
EXPLORATORY DATA ANALYSIS: traditionally called descriptive statistics ("one picture is worth more than ten thousand words")
CONFIRMATORY DATA ANALYSIS: traditionally called statistical inference


Basically, EDA is concerned with the organizing and summarizing of numerical observations.
As John Tukey has said, in EDA the statistician acts as a detective, finding and revealing clues in the data that may be used in the second step, CDA.
One useful tool that is often used in EDA is the stem-and-leaf plot, or simply stem plot.
Other names are:
Stem-and-leaf display
Stem-and-leaf diagram


DATA

Scientists make decisions on the basis of data.

Data are a collection of any number of related observations on one or more variables.
The term data refers to the actual measurements or observations that result from an investigation or survey.


Data analysis is the oldest aspect of STATISTICS.

Statistics is the science of gaining information from data.

DATA: Data are numbers with a context. Data are, of course, numbers, but they are more than that.

Data can be thought of as the numerical information needed to help us make a more informed decision in a particular situation.


CLASSIFICATION OF DATA

DATA can be measured on one of four scales:
NOMINAL
ORDINAL
INTERVAL
RATIO


STEM PLOT
One of the quickest ways of visualizing the shape of a distribution with a minimum of computational effort is the stem-and-leaf plot.
Assume that each observation $x_i$ consists of at least two digits. Each observation is divided into a stem, consisting of one or more of the leading digits, and a leaf, consisting of the remaining digits.

Example
The following are 32 scores on a statistics exam:
61 89 69 87 83 82 99 75 58 78 66 61 71 70 68 70
80 84 65 86 65 64 47 68 93 66 51 80 62 61 63 62


The stem-and-leaf display for this sample is:
Stem leaf
4 7
5 8 1
6 1 9 6 1 8 5 5 4 8 6 2 1 3 2
7 5 8 1 0 0
8 3 2 6 0 0 9 4 7
9 9 3


Ordered STEM PLOT

Stem leaf Frequency


4 7 1
5 1 8 2
6 1 1 1 2 2 3 4 5 5 6 6 8 8 9 14
7 0 0 1 5 8 5
8 0 0 2 3 4 6 7 9 8
9 3 9 2

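The ordered stem plot above can be generated with a few lines of code. This sketch assumes two-digit data, so the stem is the tens digit and the leaf is the units digit; `stem_plot` is an illustrative name. With `ordered=False` the leaves stay in the order the scores are listed.

```python
from collections import defaultdict

# The 32 exam scores from the example above.
scores = [61, 89, 69, 87, 83, 82, 99, 75, 58, 78, 66, 61, 71, 70, 68, 70,
          80, 84, 65, 86, 65, 64, 47, 68, 93, 66, 51, 80, 62, 61, 63, 62]

def stem_plot(data, ordered=True):
    """Return {stem: leaves} with stem = tens digit and leaf = units digit."""
    plot = defaultdict(list)
    for x in data:
        plot[x // 10].append(x % 10)
    if ordered:
        for leaves in plot.values():
            leaves.sort()
    return dict(sorted(plot.items()))

# Prints stem, leaves, and frequency, e.g. "6 1 1 1 2 2 3 4 5 5 6 6 8 8 9 14".
for stem, leaves in stem_plot(scores).items():
    print(stem, " ".join(map(str, leaves)), len(leaves))
```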


BOX PLOT

Other names: box-and-whisker display, box-and-whisker plot, box-and-whisker diagram.
A BOX PLOT emphasizes the spread of the data. It shows the median, the quartiles (which Tukey calls hinges), and the high and low values.

The two vertical ends of the box are the quartiles, or hinges. The median is denoted by +. $Q_3 - Q_1$ is called the interquartile range (IQR) or H-spread (hinge spread).


The five-number summary:
Lower extreme, lower quartile, median, upper quartile, upper extreme

QUICK BOX PLOT
[Figure: quick box plot drawn against a SCALE, marking the SMALLEST value, LQ, MD, UQ, and the LARGEST value]


THE (REGULAR) BOX PLOT
The box plot is very similar to the quick box plot we have just learned about, except that it also includes information about extreme outliers in the data.
What is an outlier?
A value that appears atypical, in that it seems to be far removed from the bulk of the data, is called an outlier or a "wild" number.

Outlier Criterion
1. Define ONE STEP as the number that is 1.5 times the interquartile range.
2. Define the upper outlier threshold (upper fence) to be the upper quartile plus one step.


The lower outlier threshold (lower fence) is defined correspondingly as the lower quartile minus one step.
3. Any data value that lies beyond one of the outlier thresholds will be declared an outlier.
To express this procedure as a mathematical formula, we would consider a data value $X$ to be an outlier if either:
$X > Q_3 + \text{step}$ or $X < Q_1 - \text{step}$
where $\text{step} = 1.5 \times (Q_3 - Q_1)$, with $Q_1$ the lower quartile and $Q_3$ the upper quartile.


The outlier criterion can be visualized as shown:

[Figure: number line marking the lower OF, lower IF, Q1, Q2, Q3, upper IF, and upper OF; the inner fences lie one step beyond Q1 and Q3, and the outer fences lie two steps beyond them]


Definition:
For a ranked data set, any points that lie outside the interval
$A = [Q_1 - 1.5\,\mathrm{IQR},\ Q_3 + 1.5\,\mathrm{IQR}]$ and yet are still inside the larger interval $B = [Q_1 - 3\,\mathrm{IQR},\ Q_3 + 3\,\mathrm{IQR}]$ are called mild outliers.
Furthermore, any points in the data set that lie outside interval $B$ are classified as extreme outliers.
The endpoints of interval $A$ are called the inner fences (IF), and the endpoints of interval $B$ are called the outer fences (OF).
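A sketch that computes the five-number summary and the inner and outer fences for the 32 exam scores in the stem plot example. It assumes one common quartile convention: $Q_1$ and $Q_3$ are the medians of the lower and upper halves of the sorted data, with the median left out of both halves when $n$ is odd. Other textbooks and software use slightly different rules, so quartiles may differ a little. The names `quartiles` and `classify` are illustrative.

```python
from statistics import median

scores = [61, 89, 69, 87, 83, 82, 99, 75, 58, 78, 66, 61, 71, 70, 68, 70,
          80, 84, 65, 86, 65, 64, 47, 68, 93, 66, 51, 80, 62, 61, 63, 62]

def quartiles(data):
    """Five-number summary; Q1 and Q3 are medians of the lower and upper
    halves (the median itself is left out of both halves when n is odd)."""
    x = sorted(data)
    n = len(x)
    half = n // 2
    lower, upper = x[:half], x[half + (n % 2):]
    return x[0], median(lower), median(x), median(upper), x[-1]

def classify(data):
    """Inner fences (interval A), outer fences (interval B), mild and extreme outliers."""
    _, q1, _, q3, _ = quartiles(data)
    step = 1.5 * (q3 - q1)
    inner = (q1 - step, q3 + step)            # interval A: 1.5 IQR beyond Q1, Q3
    outer = (q1 - 2 * step, q3 + 2 * step)    # interval B: 3 IQR beyond Q1, Q3
    mild = [v for v in data
            if not (inner[0] <= v <= inner[1]) and (outer[0] <= v <= outer[1])]
    extreme = [v for v in data if not (outer[0] <= v <= outer[1])]
    return inner, outer, mild, extreme

print(quartiles(scores))   # (47, 62.5, 68.5, 81.0, 99)
print(classify(scores))    # no outliers in this sample
```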
