TP02 BasicStatistics p1


Data Analysis in Finance

Master in Finance

Class #2
Jorge Caiado, PhD, CEMAPRE/ISEG, University of Lisbon
Luís Silveira Santos, PhD, CEMAPRE/ISEG, University of Lisbon
II. Basic Statistics
Statistical Concepts
Prices and returns
[Figure, left panel: daily prices for the PSI20 Index over the period Jan 2, 1995 to Dec 31, 2009 (3914 obs.). Right panel: daily returns for the PSI20 Index over the same period.]

Simple return
\[ R_t = \frac{P_t - P_{t-1}}{P_{t-1}} = \frac{P_t}{P_{t-1}} - 1 \quad \text{(one-period simple return)} \]

\[ R_t(k) = \frac{P_t - P_{t-k}}{P_{t-k}} = \frac{P_t}{P_{t-k}} - 1 \quad \text{($k$-period simple return)} \]

Continuously compounded return

\[ r_t = \ln P_t - \ln P_{t-1} \quad \text{(one-period log return)} \]

\[
\begin{aligned}
r_t(k) &= r_t + r_{t-1} + \cdots + r_{t-k+1} \\
       &= (\ln P_t - \ln P_{t-1}) + (\ln P_{t-1} - \ln P_{t-2}) + \cdots + (\ln P_{t-k+1} - \ln P_{t-k}) \\
       &= \ln P_t - \ln P_{t-k} \quad \text{($k$-period log return)}
\end{aligned}
\]

Annualized return
\[ P_0 (1 + R_t^A)^k = P_n \iff R_t^A = \left( \frac{P_n}{P_0} \right)^{1/k} - 1 \]
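The return definitions above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names and the price path are hypothetical, and the functions assume `t >= k` and strictly positive prices.

```python
import math

def simple_return(prices, t, k=1):
    """k-period simple return: R_t(k) = P_t / P_{t-k} - 1 (assumes t >= k)."""
    return prices[t] / prices[t - k] - 1.0

def log_return(prices, t, k=1):
    """k-period log return: r_t(k) = ln P_t - ln P_{t-k} (assumes t >= k)."""
    return math.log(prices[t]) - math.log(prices[t - k])

def annualized_return(p0, pn, k):
    """Solve P_0 (1 + R)^k = P_n for R."""
    return (pn / p0) ** (1.0 / k) - 1.0

# Hypothetical price path, for illustration only.
prices = [100.0, 102.0, 99.0, 105.0]

# One-period log returns add up to the k-period log return.
one_period_logs = [log_return(prices, t) for t in range(1, len(prices))]
three_period_log = log_return(prices, 3, k=3)
three_period_simple = simple_return(prices, 3, k=3)

# A price that grows from 100 to 121 over k = 2 periods.
two_period_annualized = annualized_return(100.0, 121.0, 2)
```

The additivity of log returns over time is the main practical reason for preferring them when aggregating across periods; simple returns compound multiplicatively instead.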
Fundamental Concepts
- Population: all members of a specified group;
- Population parameter (or simply, parameter): a quantity computed from or used
to describe a population;
- Sample: a subset of a population;
- Sample statistic (or simply, statistic): a quantity computed from or used to
describe a sample.

Measurement scales
- Nominal: categorize data, but do not rank them (ex.: investment strategies);
- Ordinal: categorize data and order them with respect to some characteristic (ex.:
- Interval: ordinal scale characteristics + the differences between scale values are
equal, i.e., addition and subtraction are meaningful (ex.: Celsius and Fahrenheit scales);
- Ratio: interval scale characteristics + there is a true “zero point” defined as the
origin (ex.: money).

Exercise: State the scale of measurement for each of the following (Source:
DeFusco et al., 2015):
1. Credit ratings for bond issues;
2. Cash dividends per share;
3. Hedge fund classification types;
4. Bond maturity in years.

Construction of a frequency distribution
1. Sort the data in ascending order;
2. Calculate the range of the data, defined as $\text{Range} = \text{Max} - \text{Min}$;
3. Decide on the number of intervals ($k$) in the frequency distribution;
4. Determine the interval width as $\text{Range}/k$;
5. Determine the intervals by successively adding the interval width to the
minimum value, to determine the ending points of intervals, stopping after
reaching an interval that includes the maximum value;
6. Count the number of observations falling in each interval;
7. Construct a table of the intervals listed from the smallest to the largest that
shows the number of observations falling in each interval.

Exercise: Construct the frequency table using the following values (Source:
DeFusco et al., 2015):
−4.57, −4.04, −1.64, 0.28, 1.34, 2.35, 2.38, 4.28, 4.42, 4.68, 7.16 and 11.43
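The seven construction steps can be sketched in Python for the twelve values above. This is only an illustration under stated assumptions: the function name is hypothetical, the choice of `k = 4` intervals is an assumption (the slides leave `k` to the analyst), every interval is treated as closed on the left and open on the right except the last one, which also includes the maximum, and the data are assumed not to be all equal.

```python
def frequency_table(values, k):
    """Return a list of (lower, upper, count) rows for k equal-width intervals."""
    data = sorted(values)                      # step 1: sort ascending
    lo, hi = data[0], data[-1]
    width = (hi - lo) / k                      # steps 2-4: range and interval width
    counts = [0] * k
    for x in data:                             # step 6: count observations
        # The maximum would land in a (k+1)-th interval, so it is
        # assigned to the last interval instead.
        idx = min(int((x - lo) / width), k - 1)
        counts[idx] += 1
    # Steps 5 and 7: interval end points, listed from smallest to largest.
    return [(lo + j * width, lo + (j + 1) * width, counts[j]) for j in range(k)]

values = [-4.57, -4.04, -1.64, 0.28, 1.34, 2.35,
          2.38, 4.28, 4.42, 4.68, 7.16, 11.43]
table = frequency_table(values, k=4)
for lower, upper, count in table:
    print(f"{lower:7.2f} to {upper:7.2f}: {count}")
```

A different `k` changes the interval width and therefore the shape of the table, so it is worth trying a few values before settling on one.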

Measures of Central Tendency
- (Arithmetic) Mean: it is the sum of the observations divided by the number of
observations:
\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \]

- Median: it is the value of the middle item of a set of items that has been sorted
into ascending or descending order. In an odd-numbered set of $n$ items, the
median occupies the $(n + 1)/2$ position. In an even-numbered set of $n$ items,
the median is defined as the mean of the values of the items occupying the $n/2$
and $(n + 2)/2$ positions (the two middle items).

- Mode: it is the most frequently occurring value in a set.
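The three measures can be sketched directly from their definitions. The function names and the two small data sets below are hypothetical and chosen only for illustration; positions in the slides are 1-based, so the code subtracts 1 when indexing.

```python
from collections import Counter

def mean(xs):
    """Arithmetic mean: sum of the observations divided by their number."""
    return sum(xs) / len(xs)

def median(xs):
    """Middle item of the sorted set, following the position rule above."""
    s = sorted(xs)
    n = len(s)
    if n % 2 == 1:
        return s[(n + 1) // 2 - 1]                    # position (n + 1)/2
    return (s[n // 2 - 1] + s[(n + 2) // 2 - 1]) / 2  # positions n/2 and (n + 2)/2

def modes(xs):
    """All values tied for the highest frequency (a set may have several modes)."""
    counts = Counter(xs)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]

# Hypothetical data, for illustration only.
odd_sample = [3, 1, 2, 2, 10]
even_sample = [4, 1, 3, 2]
```

In `odd_sample` the single large value pulls the mean above the median, which is the usual reason for reporting both.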

Other concepts of the mean
- Weighted Mean:
$\bar{x}_W = \sum_{i=1}^{n} w_i x_i$, where $\sum_{i=1}^{n} w_i = 1$, with $w_1, w_2, \ldots, w_n$ being the weights

- Geometric Mean:
$\bar{x}_G = \sqrt[n]{\prod_{i=1}^{n} x_i} = \sqrt[n]{x_1 x_2 \cdots x_n}$, with $x_i \ge 0$, for $i = 1, 2, \ldots, n$
Note: If some $x_i < 0$ (for example, a negative return $r_i$), add a quantity such that
the new value becomes positive (or equal to zero).

- Harmonic Mean:
\[ \bar{x}_H = \frac{1}{(1/n) \sum_{i=1}^{n} 1/x_i} = \frac{n}{\sum_{i=1}^{n} 1/x_i}, \quad \text{with } x_i > 0, \text{ for } i = 1, 2, \ldots, n \]
The harmonic mean can be viewed as a special type of weighted mean, in which an
observation’s weight is inversely proportional to its magnitude. This type of mean is
more appropriate when averaging ratios, when they are repeatedly applied to a
fixed quantity to yield a variable number of units (ex.: cost averaging).
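A minimal sketch of the three means follows, together with a cost-averaging check for the harmonic mean. Function names, prices and the invested amount are hypothetical; the geometric mean is computed through logarithms, which is an implementation choice and not something the slides prescribe.

```python
import math

def weighted_mean(xs, ws):
    """Sum of w_i * x_i, with weights that must add up to 1."""
    if not math.isclose(sum(ws), 1.0):
        raise ValueError("weights must sum to 1")
    return sum(w * x for w, x in zip(ws, xs))

def geometric_mean(xs):
    """n-th root of the product of the x_i (requires x_i >= 0)."""
    if any(x < 0 for x in xs):
        raise ValueError("geometric mean needs non-negative values")
    if any(x == 0 for x in xs):
        return 0.0
    # exp of the average log equals the n-th root of the product.
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

def harmonic_mean(xs):
    """n divided by the sum of reciprocals (requires x_i > 0)."""
    if any(x <= 0 for x in xs):
        raise ValueError("harmonic mean needs strictly positive values")
    return len(xs) / sum(1.0 / x for x in xs)

# Cost averaging (hypothetical numbers): the same amount is invested
# at each of two prices, so the number of shares bought varies.
prices = [10.0, 15.0]
amount_per_purchase = 1000.0
shares_bought = sum(amount_per_purchase / p for p in prices)
average_cost = amount_per_purchase * len(prices) / shares_bought
```

Here `average_cost` coincides with `harmonic_mean(prices)`, while the arithmetic mean of the two prices would overstate the price actually paid per share.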

Measures of location: quantiles
A quantile is a value that divides a sorted set into equal-sized subsets, each
containing an (approximately) equal portion of the data. The most used quantiles
are:

- Quartiles, dividing the set into four equal parts;

- Quintiles, dividing the set into five equal parts;
- Deciles, dividing the set into ten equal parts;
- Percentiles, dividing the set into one hundred equal parts.

How to compute the percentiles?
1. Sort the set in ascending order;

2. Compute the location:

$L_y = (n + 1) \frac{y}{100}$, where $y$ is the percentage point at which we are dividing the set
- If the location is a whole number, it corresponds to an actual observation;
- If the location is not a whole number, it lies between the two closest integer
positions (the one above holds $x_{\text{above}}$, the one below holds $x_{\text{below}}$).

3. Compute the percentile:

- If the value of the location is a whole number, the percentile $P_y$ corresponds to
the value of the observation;
- If the value of the location is not a whole number, the percentile $P_y$ is calculated
using linear interpolation:
$P_y = x_{\text{below}} + \operatorname{frac}(L_y) \times (x_{\text{above}} - x_{\text{below}})$, where $\operatorname{frac}(L_y)$ is the fractional (or
decimal) part of $L_y$.
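The three steps can be sketched as follows. The function name and the data are hypothetical. One detail is an assumption not covered by the slides: for very small or very large $y$ the location $(n + 1)y/100$ falls outside the positions $1, \ldots, n$, and this sketch simply clamps it to that range.

```python
def percentile(values, y):
    """Percentile P_y using the location L_y = (n + 1) * y / 100."""
    data = sorted(values)                  # step 1: sort ascending
    n = len(data)
    loc = (n + 1) * y / 100                # step 2: location (1-based position)
    # Assumption: clamp locations that fall outside 1..n.
    loc = min(max(loc, 1.0), float(n))
    below = int(loc)                       # integer position just below (or at) loc
    frac = loc - below                     # fractional part of the location
    if frac == 0:
        return data[below - 1]             # whole number: an actual observation
    x_below, x_above = data[below - 1], data[below]
    return x_below + frac * (x_above - x_below)   # step 3: linear interpolation

# Hypothetical data, for illustration only (n = 9).
sample = [10, 20, 30, 40, 50, 60, 70, 80, 90]
first_quartile = percentile(sample, 25)    # location 2.5, between 20 and 30
median_value = percentile(sample, 50)      # location 5, the fifth observation
```

Statistical packages use several different location conventions, so a library routine may return slightly different percentiles for the same data.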

Measures of Dispersion
- Range: it is the difference between the maximum and the minimum in a set:
\[ \text{Range} = x_{\max} - x_{\min} \]
- Mean absolute deviation:
\[ \text{MAD} = \frac{1}{n} \sum_{i=1}^{n} |x_i - \bar{x}| \]

- Variance and standard deviation:

\[ s^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 \qquad s = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2} \]

- Bias-corrected Variance and bias-corrected standard deviation:

\[ s'^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \qquad s' = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2} \]

Note: the bias-corrected variance is an unbiased estimator of the population variance, whereas the variance with divisor $n$ underestimates it on average.
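The dispersion measures above can be computed side by side to see how the two divisors differ. This is a minimal sketch with a hypothetical function name and hypothetical data; the dictionary keys are labels chosen for this example.

```python
import math

def dispersion(xs):
    """Range, MAD, and both versions of the variance and standard deviation."""
    n = len(xs)
    xbar = sum(xs) / n
    sum_sq = sum((x - xbar) ** 2 for x in xs)   # sum of squared deviations
    return {
        "range": max(xs) - min(xs),
        "mad": sum(abs(x - xbar) for x in xs) / n,
        "variance": sum_sq / n,                          # divisor n
        "std": math.sqrt(sum_sq / n),
        "variance_corrected": sum_sq / (n - 1),          # divisor n - 1
        "std_corrected": math.sqrt(sum_sq / (n - 1)),
    }

# Hypothetical data, for illustration only (mean = 5).
sample = [2, 4, 4, 4, 5, 5, 7, 9]
stats = dispersion(sample)
```

The corrected variance is always the larger of the two, and the gap shrinks as $n$ grows, since the two differ only by the factor $n/(n - 1)$.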
Measures of Symmetry and Skewness
- Skewness
\[ S_K = \frac{n}{(n - 1)(n - 2)} \sum_{i=1}^{n} \frac{(x_i - \bar{x})^3}{s^3} \]

Positively skewed: $S_K > 0$, mode < median < mean

Negatively skewed: $S_K < 0$, mode > median > mean
No skewness (or symmetric): $S_K = 0$, mode = median = mean
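The skewness formula can be sketched as below. One assumption is made loudly here: the code takes $s$ to be the bias-corrected standard deviation (divisor $n - 1$), which is the common convention for this small-sample formula; with the divisor-$n$ version the values would differ. The function name and data are hypothetical.

```python
import math

def sample_skewness(xs):
    """S_K = n / ((n-1)(n-2)) * sum((x_i - xbar)^3) / s^3 (needs n >= 3)."""
    n = len(xs)
    if n < 3:
        raise ValueError("skewness needs at least 3 observations")
    xbar = sum(xs) / n
    # Assumption: s is the bias-corrected (n - 1) standard deviation.
    s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    cubed = sum((x - xbar) ** 3 for x in xs)
    return n / ((n - 1) * (n - 2)) * cubed / s ** 3

# Hypothetical data, for illustration only.
symmetric = [1, 2, 3, 4, 5]
right_tail = [1, 2, 3, 4, 10]          # one large value stretches the right tail
left_tail = [-10, -4, -3, -2, -1]      # mirror image of right_tail
```

Mirroring a data set around zero flips the sign of its skewness, which is a quick sanity check for any implementation.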

Measures of Kurtosis
- Kurtosis
\[ K = \frac{n(n + 1)}{(n - 1)(n - 2)(n - 3)} \sum_{i=1}^{n} \frac{(x_i - \bar{x})^4}{s^4} \]

- Excess Kurtosis
\[ K_E = \frac{n(n + 1)}{(n - 1)(n - 2)(n - 3)} \sum_{i=1}^{n} \frac{(x_i - \bar{x})^4}{s^4} - \frac{3(n - 1)^2}{(n - 2)(n - 3)} \]

Mesokurtic distribution: $K_E = 0$.

Leptokurtic distribution: $K_E > 0$, more peaked than a Normal distribution; fatter
tails, more frequent extremely large deviations from the mean than in a Normal
distribution.

Platykurtic distribution: $K_E < 0$, less peaked than a Normal distribution; thinner tails,
less frequent extremely large deviations from the mean than in a Normal
distribution.
Measures of Kurtosis (cont.)

Rule of thumb: if $|K_E| \ge 1$, then excess kurtosis is unusually large.
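Kurtosis and excess kurtosis can be sketched in the same way. As an explicit assumption, $s$ is taken to be the bias-corrected standard deviation (divisor $n - 1$), the usual convention for this small-sample formula. Function names and data are hypothetical.

```python
def sample_kurtosis(xs):
    """K = n(n+1) / ((n-1)(n-2)(n-3)) * sum((x_i - xbar)^4) / s^4 (needs n >= 4)."""
    n = len(xs)
    if n < 4:
        raise ValueError("kurtosis needs at least 4 observations")
    xbar = sum(xs) / n
    # Assumption: s^2 is the bias-corrected (n - 1) variance.
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    fourth = sum((x - xbar) ** 4 for x in xs) / s2 ** 2
    return n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth

def excess_kurtosis(xs):
    """K_E = K - 3(n-1)^2 / ((n-2)(n-3))."""
    n = len(xs)
    return sample_kurtosis(xs) - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))

# Hypothetical data, for illustration only.
flat = [1, 2, 3, 4, 5]                       # evenly spread values
spiky = [0, 0, 0, 0, 0, 0, 0, 0, 0, 10]      # one extreme deviation

flat_excess = excess_kurtosis(flat)
spiky_excess = excess_kurtosis(spiky)
is_unusually_large = abs(spiky_excess) >= 1  # rule of thumb from the slide
```

The evenly spread sample comes out platykurtic (negative excess kurtosis), while the sample with a single extreme value comes out strongly leptokurtic.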


Exercise: Consider the annual total returns on the MSCI German Index from 1993
to 2002 (Source: DeFusco et al., 2015)

