MTH-262 Statistics and Probability Theory

Lecture 03
Outline of Today’s Lecture
• In this lecture we will discuss the following topics:
• Data Presentation: Frequency distribution.
• Measures of Central Tendency: Arithmetic Mean.
• Getting Started with R
Data Presentation
Frequency Distribution
Presenting the number of observations according to their class is known
as the frequency distribution.

How to Construct a Frequency Distribution
1: Sort the data in ascending order.
2: Calculate the range of the data.
3: Decide on the number of intervals in the frequency distribution.
4: Determine the intervals.
5: Tally and count the observations under each interval.

No. of Classes = Range / h
Range = Xm - X0 (largest observation minus smallest observation)
h = Class Interval

Example (Revisited).
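The five construction steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the lecture's own code: the function name `frequency_distribution` is hypothetical, the number of classes is rounded up, each class is taken as [lower, upper) with the maximum placed in the last class, and the data are assumed non-empty. The sample is the set of nine observations used later in this lecture.

```python
import math

def frequency_distribution(data, h):
    """Return (lower, upper, count) for each class of width h."""
    values = sorted(data)                        # Step 1: sort ascending
    x0, xm = values[0], values[-1]
    data_range = xm - x0                         # Step 2: Range = Xm - X0
    # Step 3: No. of Classes = Range / h, rounded up (assumed convention)
    n_classes = max(1, math.ceil(data_range / h))
    # Step 4: class i covers [x0 + i*h, x0 + (i+1)*h)
    counts = [0] * n_classes
    # Step 5: tally each observation; the maximum is put in the last class
    for x in values:
        i = min(int((x - x0) // h), n_classes - 1)
        counts[i] += 1
    return [(x0 + i * h, x0 + (i + 1) * h, counts[i]) for i in range(n_classes)]

sample = [16, 17, 10, 13, 20, 18, 13, 14, 18]
table = frequency_distribution(sample, h=2)
for lower, upper, count in table:
    print(f"{lower} - {upper}: {count}")
```

With h = 2 the range of 10 gives five classes, and the class counts add up to the nine observations.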
Measures of Central Tendency
Any measure of central tendency is a single value that attempts to
describe a set of data by identifying the central position within that set
of data.
Mathematical Averages
Mean (Arithmetic Mean for Ungrouped Data):
The mean of a data set is the sum of the observations divided by the
number of observations. The most commonly used measure of center is the
mean. When people speak of taking an average, they are most often
referring to the mean.

Mean
For a variable X, the mean of the observations for a sample is called the
sample mean and is denoted $\bar{X}$. Symbolically,

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$
Properties of Arithmetic Mean

i) The sum of the deviations of the observations from their mean is always zero.

$$\sum_{i=1}^{n} (X_i - \bar{X}) = 0$$

ii) The sum of squared deviations of the values from their mean is a minimum:
it is smaller than the sum of squared deviations from any other value A
(with $A \neq \bar{X}$).

$$\sum_{i=1}^{n} (X_i - \bar{X})^2 < \sum_{i=1}^{n} (X_i - A)^2$$

iii) A change of origin and scale affects the mean.

$$Y = a + bX \;\rightarrow\; \bar{Y} = a + b\bar{X}$$

iv) The combined mean of several group means is:

$$\bar{X}_c = \frac{n_1\bar{X}_1 + n_2\bar{X}_2 + \cdots + n_k\bar{X}_k}{n_1 + n_2 + \cdots + n_k}$$
Illustration of properties (i) and (ii), with A = 14:

X     X - Mean   Squared Errors      X     X - A   Squared Errors
16     0.556      0.309              16     2       4
17     1.556      2.421              17     3       9
10    -5.444     29.637              10    -4      16
13    -2.444      5.973              13    -1       1
20     4.556     20.757              20     6      36
18     2.556      6.533              18     4      16
13    -2.444      5.973              13    -1       1
14    -1.444      2.085              14     0       0
18     2.556      6.533              18     4      16

Deviations from the mean (Mean = 15.444): Sum of Errors = 0.004 (zero up to rounding), Sum of Squared Errors = 80.222
Deviations from A = 14: Sum of Errors = 13, Sum of Squared Errors = 99
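Properties (i) and (ii) can be checked numerically with the nine observations from the table. The following Python sketch is only an illustration; the helper names `sum_of_deviations` and `sum_of_squared_deviations` are hypothetical.

```python
x = [16, 17, 10, 13, 20, 18, 13, 14, 18]    # observations from the table
n = len(x)
mean = sum(x) / n                            # 139 / 9, about 15.444

def sum_of_deviations(values, center):
    """Sum of (value - center) over all values."""
    return sum(v - center for v in values)

def sum_of_squared_deviations(values, center):
    """Sum of (value - center)^2 over all values."""
    return sum((v - center) ** 2 for v in values)

# Property (i): deviations from the mean cancel out
dev_mean = sum_of_deviations(x, mean)        # zero up to floating-point error
dev_a = sum_of_deviations(x, 14)             # 13, not zero

# Property (ii): squared deviations are smallest around the mean
ss_mean = sum_of_squared_deviations(x, mean) # about 80.222
ss_a = sum_of_squared_deviations(x, 14)      # 99

print(round(dev_mean, 9), dev_a, round(ss_mean, 3), ss_a)
```

The gap between the two sums of squares is not accidental. For any value A, $\sum (X_i - A)^2 = \sum (X_i - \bar{X})^2 + n(\bar{X} - A)^2$, so the sum grows by $n(\bar{X} - A)^2$ as A moves away from the mean. Here that term is 9 × (15.444 − 14)² ≈ 18.778, which accounts for the difference between 80.222 and 99.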

Illustration of property (iii), with Y = 2 + 3X:

X     Y = 2 + 3X
16    50
17    53
10    32
13    41
20    62
18    56
13    41
14    44
18    56

Mean(X) = 15.444
Mean(Y) = 2 + 3(15.444) = 48.333

Illustration of property (iv):

X1    X2    X3
16    15    12
17    16    16
10    14    20
13    13    18
20    12    12
18    13    12
13    18    15
14    18    16
18    20    20

Mean(X1) = 15.444   Mean(X2) = 15.444   Mean(X3) = 15.667
Combined Mean = 419/27 ≈ 15.519
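Properties (iii) and (iv) can be verified with the same columns. This is a small Python sketch, not part of the original slides; `mean` and `combined_mean` are hypothetical helper names.

```python
x1 = [16, 17, 10, 13, 20, 18, 13, 14, 18]
x2 = [15, 16, 14, 13, 12, 13, 18, 18, 20]
x3 = [12, 16, 20, 18, 12, 12, 15, 16, 20]

def mean(values):
    """Arithmetic mean: sum of the observations divided by their number."""
    return sum(values) / len(values)

# Property (iii): Y = a + bX implies mean(Y) = a + b * mean(X)
a, b = 2, 3
y = [a + b * v for v in x1]
mean_y = mean(y)                          # about 48.333
mean_y_from_x = a + b * mean(x1)          # same value, obtained from mean(X)

def combined_mean(groups):
    """Group means weighted by group sizes n_1, ..., n_k."""
    total_n = sum(len(g) for g in groups)
    return sum(len(g) * mean(g) for g in groups) / total_n

# Property (iv): the combined mean equals the mean of the pooled data
pooled = combined_mean([x1, x2, x3])      # 419 / 27
print(round(mean_y, 3), round(pooled, 4))
```

Because the three groups here all have nine observations, the combined mean coincides with the simple average of the three group means. With unequal group sizes the weights n_1, ..., n_k matter.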
Mathematical Averages
Mean (Arithmetic Mean for Grouped Data)
When the data are presented in terms of classes and their
frequencies, a slightly different formula is used to find the
average: each class midpoint X is weighted by its frequency f.

Class Interval   Class Mid Point (X)   Frequency (f)   f*X
1.5 - 1.9        1.7                    2               3.4
2.0 - 2.4        2.2                    1               2.2
2.5 - 2.9        2.7                    4              10.8
3.0 - 3.4        3.2                   15              48.0
3.5 - 3.9        3.7                   10              37.0
4.0 - 4.4        4.2                    5              21.0
4.5 - 4.9        4.7                    3              14.1

Sum(f) = 40   Sum(f*X) = 136.5

$$\text{Mean} = \frac{\sum fX}{\sum f} = \frac{136.5}{40} = 3.4125$$
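The grouped-data calculation can be sketched in Python as below. The function name `grouped_mean` is hypothetical, and the midpoints are computed from the class limits rather than typed in, which guards against a mistyped midpoint.

```python
# (class limits, frequency) pairs taken from the table
classes = [
    ((1.5, 1.9), 2),
    ((2.0, 2.4), 1),
    ((2.5, 2.9), 4),
    ((3.0, 3.4), 15),
    ((3.5, 3.9), 10),
    ((4.0, 4.4), 5),
    ((4.5, 4.9), 3),
]

def grouped_mean(classes):
    """Mean for grouped data: sum(f * X) / sum(f), with X the class midpoint."""
    total_f = sum(f for _, f in classes)
    total_fx = sum(f * (lower + upper) / 2 for (lower, upper), f in classes)
    return total_fx / total_f

result = grouped_mean(classes)
print(round(result, 4))
```

The class 3.5 - 3.9 has midpoint (3.5 + 3.9) / 2 = 3.7, so Sum(f*X) = 136.5 and the grouped mean is 136.5 / 40 = 3.4125.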
Mathematical Averages
Trimmed Mean
A trimmed mean is computed by “trimming away” a certain percentage
of both the largest and the smallest values. For example, the 10% trimmed
mean is found by eliminating the largest 10% and the smallest 10% of the values
and computing the average of the remaining values.

10% trimmed mean for the with-nitrogen group:

$$\bar{X}_{tr(10)} = \frac{0.32 + 0.37 + 0.47 + 0.43 + 0.36 + 0.42 + 0.38 + 0.43}{8} = \frac{3.18}{8} = 0.3975$$

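A trimmed mean can be sketched in Python as follows. The function name `trimmed_mean` is hypothetical, and the rule for how many values to drop, floor(proportion × n) from each end, is an assumption; other rounding conventions exist. The first check reproduces the slide's calculation from the eight retained values. The second applies a 20% trim to the nine observations X from earlier in the lecture, since a 10% trim of nine values would drop floor(0.9) = 0 observations under this rule.

```python
import math

def trimmed_mean(values, proportion):
    """Mean after dropping floor(proportion * n) values from each end."""
    ordered = sorted(values)
    k = math.floor(proportion * len(ordered))     # values trimmed per end
    kept = ordered[k:len(ordered) - k]
    if not kept:
        raise ValueError("proportion too large: no observations left")
    return sum(kept) / len(kept)

# The eight values remaining after trimming in the with-nitrogen example
retained = [0.32, 0.37, 0.47, 0.43, 0.36, 0.42, 0.38, 0.43]
slide_value = sum(retained) / len(retained)       # 3.18 / 8

# Nine observations from earlier in the lecture; a 20% trim drops 10 and 20
x = [16, 17, 10, 13, 20, 18, 13, 14, 18]
trimmed = trimmed_mean(x, 0.2)                    # 109 / 7
print(round(slide_value, 4), round(trimmed, 3))
```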
Getting Started with R
Contents of First Chapter (Using R for Introductory Statistics)
Getting Started
• R’s Command Line
Variables
• R can be used like a calculator. But it really is an environment for statistical
computing and graphics. The power of R goes well beyond that of a graphing
calculator. One immediate difference is the ability to assign names to values.
• In R this is done with an assignment operator. We use the left arrow for
assignment. In RStudio, there is a keyboard shortcut to insert the two-character
<-, which for Windows users is Alt + -. For example, here we assign a value to x
and then refer to x in the subsequent command:
Built-in Variables
• R has very few built-in variables. One is pi, referring to the value π.
• Another is the variable T, referring to the logical value TRUE. These names may
have new values bound to them.

Case is Important
• The case of the letters in a variable name is important. There is a distinction
between x and X, or mydata and myData. This is the case with everyday language,
so shouldn’t be surprising, but isn’t always true when using computers.
Functions
• The R language is comprised of numerous built-in functions, providing a rich set
of actions. Several of these functions are for the familiar mathematical
operations:
The Workspace
• After interacting with R one typically has created several objects and perhaps
functions. Without doing anything special, R will maintain these objects in a
global Workspace. When R searches for an object at the command line, this is the
first place on its path that it will look.
Data Sets
• Many packages include accompanying data sets. The UsingR package has several
that we will see utilized in the text. This package also calls in, among others, the
HistData package that provides data sets from the history of statistics and data
visualization.
Any Questions?
