Econ 1006 Summary Notes 1

200052 INTRODUCTION TO ECONOMIC METHODS

SUMMARY NOTES - WEEK 1

Required Reading:
Ref. File 1: Section 1.13
Ref. File 3: Introduction and Sections 3.1 to 3.4, 3.7

KEYS TO PASSING THIS UNIT:

(i) Undertake the required reading from the reference
files each week. (It may be necessary to read some
sections more than once.) Allow approximately 4
hours per week.
(ii) Carefully study lecture material and take notice of
advice given in lectures.
(iii) Attempt tutorial exercises before tutorials and work
out where you have difficulties, which hopefully can
be resolved in tutorials.
(iv) Make a conscious effort to keep up with the material
presented.

1. INTRODUCTION TO UNIT

This subject gives an introduction to the basics of statistics
as used in many areas of business and economics. It also
includes an introduction to basic calculus.

1.1 How Can We Define Statistics?

Various definitions of statistics can be found, for
example in dictionaries or textbooks. However, most
definitions share common elements.

Statistics, for our purposes, encompasses the following
major activities:

(i) Collection and description of information, or data -
“descriptive statistics”. We will normally be dealing
with a subset of a larger collection or set of data. The
subset is called a sample, the larger set a population.

(ii) Using sample data to make inferences about a
population - “statistical inference”.

1.2 Why Study Statistics?

(i) (Major) It can be useful. It can help us to make
decisions in the face of uncertainty.

(ii) People are bombarded with statistics all the time.
Often statistics is used in ways that are not
warranted. It is important not to be fooled by people
who misuse statistics.

(iii) It is important to have a clear understanding of the
strengths and limitations of statistical analysis.

1.3 Structure of the Unit

• Descriptive Statistics:
How we summarise the characteristics of raw data
(using graphs, summary measures, etc.)

• Probability Theory and Probability Distributions
(“deductive statistics”):

Rules (or axioms) for calculating probabilities of
certain things (called events) happening.

Probability theory can be considered part of
descriptive statistics.

Here we will be concerned with making probability
statements about a given population.

• Sampling Theory and Sampling Distributions (the basis
of “inductive statistics”):

Here we will be concerned with making probability
statements about characteristics of samples, given
assumptions about the population from which the
sample was drawn.

• Point and Interval Estimation:

Point Estimation - Here we will be concerned with
producing a particular estimate (a number), based on
sample data, of a characteristic of a population.

(For example, using the average height of a sample of
women in Australia as an estimate of the average
height of all women in Australia.)

Interval Estimation - Here we will not give a single
estimate of a population characteristic, but rather a
range in which we are confident (to some degree) the
true value of the population characteristic lies.

• Hypothesis Testing:
Under this heading we will be looking at ways of
testing hypotheses about characteristics of
populations, based on sample data.

This is clearly an example of decision-making (i.e.
rejecting or accepting the hypothesis) under
uncertainty about the population characteristics.

• Regression Analysis:

(Especially relevant for Accounting, Finance and
Economics students; it forms the basis of Econometrics.)

In this case we will be concerned with estimating
linear relationships between different variables, i.e.
linear equations.

(For example, the relationship between a firm’s
advertising expenditure and its sales revenue.)

• Introduction to Differential Calculus

2. DESCRIPTIVE STATISTICS

2.1 Some Basic Definitions Relating to Data


(Required Reading: Ref. File 3 - Introduction and Sections
3.1, 3.2)

(i) Elementary Units and Frames:

Statistical data normally represents measurements or
observations of a certain characteristic or variable of
interest (e.g. height) for each member of a set of objects
or people.

Each object (or person) for which the characteristic is or
can be measured is called an elementary unit (e.g. a person
in Australia).

The set or listing of all possible elementary units is called a
frame.

(ii) Population/Sample:

A statistical population is the set of measurements or
observations of a characteristic of interest for all
elementary units in a frame. (For example, the heights of
all males in Australia, not the males themselves.)

A population may comprise a finite or infinite number of
elements (observations), depending on the context.

A statistical sample is a subset of a population.

Note: There is nothing intrinsic about elements in a
population that makes them a population. It is purely a
matter of how we choose to define a population. For
example, say we define a population to be the set of heights
of all people in Australia. Then the set of heights of people
in the university would represent a sample from that
population.

Hence whether we are talking about a population or a
sample depends on how the population has been defined.

(iii) Parameters/Statistics:

For our purposes:

The numerical characteristics which describe a population
(e.g. the average height of all women in Australia) are
called parameters of the population.

The numerical values calculated from sample data are
called sample statistics. These sample statistics can be
thought of as describing or characterising the sample.

(iv) Qualitative and Quantitative Variables:

Populations may be quantitative or qualitative. Data from
quantitative populations is called quantitative or interval
data. Data from qualitative populations is called
qualitative, nominal or categorical data.

Data from a quantitative population can be expressed
numerically in a meaningful way. The variable (or
characteristic) associated with a quantitative population is
called a quantitative variable.

Examples of quantitative variables: height of an
individual, income of a household, number of cars owned
by a household.

Data from qualitative populations cannot be expressed
numerically in a meaningful way. The variable (or
characteristic) associated with a qualitative population is
called a qualitative or categorical variable.

Examples of qualitative variables: gender of an individual,
hair colour of an individual, brand of car driven by an
individual.

Note: Just because we assign a numerical code to a
qualitative variable does not mean the variable is
quantitative. (For example, if a variable is gender, we
could code males 0 and females 1, but this coding conveys
no meaning in itself.)

(v) Discrete and Continuous Quantitative Variables:

A discrete quantitative variable can assume only certain
discrete numerical values (on the number line); i.e. there
are gaps between the various values. Depending on the
variable, there could be a finite or infinite number of these
discrete values.

Examples: number of children in a family, number of days
an individual works during a year.

A continuous quantitative variable can assume any value
in a specific range or interval. The interval can be of finite
or infinite width.

Example: height or weight of an individual.

Note: By definition, a continuous variable can take an
infinite number of values.

2.2 Frequency Distributions

(a) Introduction

Suppose we have a set of raw statistical data (i.e.
observations on some variable (or characteristic) for a
collection of elementary units). At this stage we will make
no distinction as to whether we are talking about a
statistical population or sample.

In studying the data it is often useful to initially group the
raw data into different classes or categories. A frequency
distribution for a set of data lists the number of
observations or ‘data points’ in each class used for
grouping (the class frequencies). The classes of a
frequency distribution must be mutually exclusive (an
observation cannot fall into two classes) and exhaustive
(any observation must belong to a class).

(b) Frequency Distributions for Quantitative Data

Each class of a frequency distribution of quantitative data
usually has a lower and an upper limit, although
sometimes it is necessary or convenient to have
open-ended classes, i.e. classes which have either an upper
or lower limit but not both.

Example 2.1:
Suppose we have data on the number of children in 100
households as follows:

Class                    Frequency
0 to under 2 children        30
2 to under 4 children        55
4 to under 6 children        13
6 or more children            2

The last class is open-ended. The class width of the other
classes is 2.

The class width is the difference between successive lower
class limits or upper class limits.

Note: An open-ended class has no class width.
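
As a rough illustration of grouping raw data into mutually exclusive and exhaustive classes, the following Python sketch tallies a small set of made-up household counts into the classes of Example 2.1. The raw values and the helper name class_label are assumptions for illustration only; the notes do not give the underlying data.

```python
from collections import Counter

# Hypothetical raw data: number of children in 12 households.
children = [0, 1, 3, 2, 2, 5, 1, 0, 7, 3, 2, 4]


def class_label(value):
    """Map a non-negative count to exactly one class of width 2.

    Values of 6 or more fall into the open-ended last class.
    """
    if value >= 6:
        return "6 or more"
    lower = (value // 2) * 2
    return f"{lower} to under {lower + 2}"


# Every observation receives exactly one label, so the classes are
# mutually exclusive and exhaustive for this kind of data.
frequencies = Counter(class_label(v) for v in children)

for label in ["0 to under 2", "2 to under 4", "4 to under 6", "6 or more"]:
    print(label, frequencies[label])
```

Because each value is mapped to a single label, the class frequencies always add up to the number of observations.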

General Advice for Forming Frequency Distributions:

• The number of classes should generally be between
5 and 20. (Although there are only 4 in the above
simple example.)

• Class widths are ideally equal, but this may not
always be possible, and open-ended classes may be
necessary.

• Class limits should be chosen such that the class
midpoint is close to the average of observations in
the class. This is because in calculating summary
statistics based on grouped data the midpoint is
used as representative of all observations in the
class.
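
To see why the choice of midpoint matters, the sketch below approximates a mean from grouped data by treating every observation in a class as if it sat at the class midpoint. The classes and frequencies are hypothetical, and the formula used (sum of midpoint times frequency, divided by n) is the usual grouped-data approximation rather than something stated in these notes.

```python
# Hypothetical grouped data: (lower limit, upper limit, frequency).
classes = [
    (0.5, 2.5, 10),
    (2.5, 4.5, 30),
    (4.5, 6.5, 50),
    (6.5, 8.5, 10),
]

n = sum(freq for _, _, freq in classes)

# Each class is represented by its midpoint.
midpoints = [(lower + upper) / 2 for lower, upper, _ in classes]

# Approximate mean: weight each midpoint by its class frequency.
grouped_mean = sum(m * freq for m, (_, _, freq) in zip(midpoints, classes)) / n

print(midpoints)     # [1.5, 3.5, 5.5, 7.5]
print(grouped_mean)  # approximately 4.7
```

The closer each midpoint is to the true average of the observations in its class, the closer this approximation will be to the mean of the raw data.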

(c) Relative, Cumulative and Cumulative Relative
Frequency Distributions

A relative frequency distribution shows the proportion of
all observations falling in each class. It is obtained by
dividing the class frequencies ($f_i$) by the total number of
observations in the data (‘n’).

A cumulative frequency distribution shows, for each class
i, the total of the first i frequencies.

A cumulative relative frequency distribution shows, for
each class i, the total of the first i relative frequencies.

For the previous example we have

Class (i)       Frequency   Cumulative   Relative    Cumulative
                (f_i)       Frequency    Frequency   Rel. Freq.
0 to under 2       30           30         0.30        0.30
2 to under 4       55           85         0.55        0.85
4 to under 6       13           98         0.13        0.98
6+ children         2          100         0.02        1.00
Total             100                      1.00
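
The columns of this table can be reproduced with a few lines of Python. This is a minimal sketch using the class frequencies of Example 2.1; the variable names are illustrative only.

```python
from itertools import accumulate

classes = ["0 to under 2", "2 to under 4", "4 to under 6", "6 or more"]
frequencies = [30, 55, 13, 2]

n = sum(frequencies)  # total number of observations

# Running total of the first i frequencies.
cumulative = list(accumulate(frequencies))

# Proportion of all observations falling in each class.
relative = [f / n for f in frequencies]

# Dividing the cumulative counts by n avoids adding rounded proportions.
cumulative_relative = [c / n for c in cumulative]

for row in zip(classes, frequencies, cumulative, relative, cumulative_relative):
    print(*row)
```

The last cumulative frequency equals n and the last cumulative relative frequency equals 1, which is a quick check on the arithmetic.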

2.3 Histograms

Histograms give us a convenient way of visualising the
distribution of observations over classes. They take the
form of a series of adjacent (contiguous) rectangles, one
for each class, with the base of each rectangle centred over
the corresponding class midpoint.

In a frequency histogram the areas of the rectangles are
proportional to the class frequencies, with the factor of
proportionality the same for all classes. Thus if all the
classes have the same width, each rectangle will have the
same base width and the class frequencies can be
represented by the rectangle heights.

In a relative frequency histogram the areas of the
rectangles are proportional to the relative frequencies.

Cumulative and cumulative relative frequency histograms
can be defined similarly.

Note: Frequency and relative frequency histograms will
have the same shape.

Example 2.2:
Consider the following distribution:

Class               Freq.   Rel. Freq.   Cum. Freq.
0.5 to under 2.5     10        0.1           10
2.5 to under 4.5     30        0.3           40
4.5 to under 6.5     50        0.5           90
6.5 to under 8.5     10        0.1          100
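
The statement that frequency and relative frequency histograms have the same shape can be checked numerically. The sketch below is an illustration, not part of the original notes: it computes rectangle heights for the classes of Example 2.2 so that each rectangle's area equals the class frequency (or relative frequency), and shows that the two sets of heights differ only by the constant factor n.

```python
# Classes of Example 2.2: (lower limit, upper limit, frequency).
classes = [
    (0.5, 2.5, 10),
    (2.5, 4.5, 30),
    (4.5, 6.5, 50),
    (6.5, 8.5, 10),
]

n = sum(freq for _, _, freq in classes)

# Height = frequency / width, so that area = height * width = frequency.
freq_heights = [freq / (upper - lower) for lower, upper, freq in classes]

# Same construction with relative frequencies in place of frequencies.
rel_heights = [(freq / n) / (upper - lower) for lower, upper, freq in classes]

# The ratio of the two heights is the same constant (n) in every class,
# which is why the two histograms have the same shape.
ratios = [fh / rh for fh, rh in zip(freq_heights, rel_heights)]

print(freq_heights)
print(rel_heights)
print(ratios)
```

With equal class widths, as here, the heights are simply proportional to the frequencies. With unequal widths the division by the width is what keeps the areas proportional.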

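[Figure: Frequency Histogram. Vertical axis: Frequency (ticks at 10, 30, 50); horizontal axis: class limits 0.5, 2.5, 4.5, 6.5, 8.5]

[Figure: Relative Frequency Histogram. Vertical axis: Relative Frequency (ticks at 0.1, 0.3, 0.5); horizontal axis: class limits 0.5, 2.5, 4.5, 6.5, 8.5]

[Figure: Cumulative Frequency Histogram. Vertical axis: Cumulative Frequency (ticks at 10, 40, 90, 100); horizontal axis: class limits 0.5, 2.5, 4.5, 6.5, 8.5]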

2.4 Shapes of Distributions

The frequency or relative frequency histogram gives us a
representation of the shape of the distribution of the data
being analysed (that is, how the data is distributed over
the possible values).

There are several terms commonly used to describe the
shapes of distributions.

A distribution is described as negatively skewed (skewed
to the left) if it has the following shape, with a longer
tail on the left.

[Figure: A Distribution that is Skewed to the Left. Vertical axis: Relative Frequency; horizontal axis: Variable Value]

A distribution is positively skewed (skewed to the right) if
it has the following shape, with a longer tail on the right.

[Figure: A Distribution that is Skewed to the Right. Vertical axis: Relative Frequency; horizontal axis: Variable Value]

A distribution is symmetric if it has the following shape.

[Figure: A Symmetric Distribution. Vertical axis: Relative Frequency; horizontal axis: Variable Value]

The above are all examples of unimodal distributions. A
bimodal distribution has two peaks.

Note that for a multimodal distribution, the peaks need
not be the same height.

2.5 Bivariate Frequency Distributions

Often it is of interest to classify observations of elementary
units according to two variables (characteristics). This
allows one to gauge the relationship between the two
variables.

Example 2.3:
Consider the final results of 50 students in a particular
subject. Each student’s final grade and gender are
recorded, allowing the derivation of the following
bivariate frequency distribution.

                          Grade
Gender          HD   Dist.   Credit   Pass   Fail   Row Total
Male             5     4       10       6      2       27
Female           2     3       11       2      5       23
Column Total     7     7       21       8      7       50

Each combination of grade and gender is represented by a
cell in the bivariate frequency distribution, which contains
the frequency of that combination in the data.

The row totals represent, in this example, the marginal
frequencies of males and females in the class (27 and 23,
respectively).

The column totals represent the marginal frequencies of
final grades.

Marginal frequencies, represented by the row and column
totals, each refer to one variable only.

We can express the information in a bivariate frequency
distribution as a relative frequency distribution by
dividing each entry in the distribution by the total number
of observations.

Example 2.4:
For the previous example, the bivariate relative frequency
distribution is given by (dividing each entry by 50)

                          Grade
Gender        HD     Dist.   Credit   Pass   Fail   Row Total
Male          0.10   0.08    0.20     0.12   0.04     0.54
Female        0.04   0.06    0.22     0.04   0.10     0.46
Col. Total    0.14   0.14    0.42     0.16   0.14     1.00

The row and column totals in the above table are called
the marginal relative frequencies.

Note: Knowledge of the row and column totals, i.e. the
respective univariate distributions, does not inform us
about any relationship between the variables.
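
The marginal and relative frequencies of Examples 2.3 and 2.4 can be recomputed from the cell counts. The following is a minimal sketch; the dictionary layout and variable names are illustrative choices, not part of the notes.

```python
grades = ["HD", "Dist.", "Credit", "Pass", "Fail"]

# Cell frequencies from Example 2.3, one list per gender.
counts = {
    "Male": [5, 4, 10, 6, 2],
    "Female": [2, 3, 11, 2, 5],
}

# Marginal frequencies: row totals (gender) and column totals (grade).
row_totals = {gender: sum(row) for gender, row in counts.items()}
column_totals = [sum(col) for col in zip(*counts.values())]

n = sum(row_totals.values())  # total number of students

# Bivariate relative frequencies: each cell divided by n.
relative = {gender: [c / n for c in row] for gender, row in counts.items()}

# Marginal relative frequencies.
row_relative = {gender: total / n for gender, total in row_totals.items()}
column_relative = [total / n for total in column_totals]

print(row_totals, column_totals, n)
print(relative)
print(row_relative, column_relative)
```

The row totals and the column totals each sum to n, which gives a simple consistency check on the table.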

3. MEASURES OF CENTRAL TENDENCY AND DISPERSION

In this section we shall look at important ways of
summarising data from both populations and samples.
We shall be concerned with measures of the

• ‘centre’ of a frequency distribution
• ‘dispersion’ of values in a frequency distribution

3.1 Summation Notation

Suppose we have ‘n’ numbers. By labelling the numbers
(1, 2, 3, ..., n), we can represent the numbers by

\[ x_i, \quad i = 1, \ldots, n \]

The sum of the numbers can be denoted

\[ \sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n \]

$\sum_{i=1}^{n} x_i$ is a shorthand way of writing the sum.

Theorem (Basic Properties of Summation Notation)

Given ‘c’ is some constant and $a_1, a_2, \ldots, a_n$ are ‘n’
numbers:

(i) $\sum_{i=1}^{n} c\,a_i = c \sum_{i=1}^{n} a_i$

(ii) $\sum_{i=1}^{n} (a_i + c) = \left( \sum_{i=1}^{n} a_i \right) + nc$

(iii) $\sum_{i=1}^{n} (a_i + c)^2 = \left( \sum_{i=1}^{n} a_i^2 \right) + 2c \sum_{i=1}^{n} a_i + nc^2$

(iv) $\sum_{i=1}^{n} (a_i - c)^2 = \left( \sum_{i=1}^{n} a_i^2 \right) - 2c \sum_{i=1}^{n} a_i + nc^2$
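
The two squared-sum properties follow from expanding the square term by term, then applying property (i) and the fact that adding a constant n times gives n times the constant. A short derivation, written with a ± sign so that it covers both the sum and the difference case:

```latex
\begin{align*}
\sum_{i=1}^{n} (a_i \pm c)^2
  &= \sum_{i=1}^{n} \left( a_i^2 \pm 2c\,a_i + c^2 \right) \\
  &= \sum_{i=1}^{n} a_i^2 \pm 2c \sum_{i=1}^{n} a_i + \sum_{i=1}^{n} c^2 \\
  &= \sum_{i=1}^{n} a_i^2 \pm 2c \sum_{i=1}^{n} a_i + nc^2
\end{align*}
```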

Example 3.1:
Consider the following four labelled numbers.

$a_1 = 1$, $a_2 = 3$, $a_3 = 2$, $a_4 = 1$

Use property (iii) of the above theorem to calculate

\[ \sum_{i=1}^{4} (a_i + 1)^2 . \]

(See video for solution)
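
The calculation can also be checked numerically. The sketch below is not the video solution. It assumes the four numbers are 1, 3, 2, 1 and the constant is c = 1, and it evaluates both the "plus" and the "minus" squared sums, once directly and once through the expanded form sum(a^2) ± 2c·sum(a) + n·c^2.

```python
# Numbers from Example 3.1 (assumed positive) and the constant c = 1.
a = [1, 3, 2, 1]
c = 1
n = len(a)

sum_a = sum(a)                     # 1 + 3 + 2 + 1 = 7
sum_a_sq = sum(x ** 2 for x in a)  # 1 + 9 + 4 + 1 = 15

# Direct evaluation of the two squared sums.
direct_plus = sum((x + c) ** 2 for x in a)
direct_minus = sum((x - c) ** 2 for x in a)

# The same sums via the expanded form.
expanded_plus = sum_a_sq + 2 * c * sum_a + n * c ** 2
expanded_minus = sum_a_sq - 2 * c * sum_a + n * c ** 2

print(direct_plus, expanded_plus)    # 33 33
print(direct_minus, expanded_minus)  # 5 5
```

The expanded form only needs the two totals sum(a) and sum(a^2), which is what makes these properties convenient for hand calculation.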


3.2 Measures of Central Tendency

For each measure considered there are population and
sample versions. We will suppose here there are ‘N’ values
in the population and ‘n’ values in a sample.

Note that at this stage we are only concerned with
quantitative variables, and we assume the population
contains a finite number of values.

Definition (Mean of a Finite Quantitative Population)

If $x_1, x_2, x_3, \ldots, x_N$ represents a finite population of ‘N’
quantitative data points, then the mean of this population
is given by

\[ \text{Population mean} = \mu = \frac{x_1 + x_2 + \cdots + x_N}{N} = \frac{\sum_{i=1}^{N} x_i}{N} \]

($\mu$ is the Greek letter ‘mu’)

Definition (Mean of a Sample from a Quantitative Population)

If $x_1, x_2, x_3, \ldots, x_n$ represents a particular sample of size
‘n’ from a quantitative population, then the mean of this
sample is given by

\[ \text{Sample mean} = \bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n} \]

Definition (Mode of a Set of Data)

The mode is the data value that occurs most frequently in
a set of data (population or sample).

Note: The mode need not be unique.

Definition (The Median of a Set of Data)

If quantitative data is arranged in ascending or
descending order, the middle value of the data is called the
median. If there is an even number of data points, the
median is typically taken to be the arithmetic average of
the two middle values.

We can of course talk of population and sample medians.

Example 3.2:
Consider the following set of data, which we can assume to
be a sample from a population.

 1    1    5    4   12    4
 3    1    2    7    6    6
 5    1    1    5    8    9
10    2    4    2    6   30

$n = 24$, $x_1 = 1$, $x_3 = 5$, $x_{11} = 6$, etc. (if we label across rows
then down)

(See video for solution)
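
For readers who want to check their working, the three measures can be computed with Python's standard statistics module. This is an illustrative sketch, not the video solution; the data are entered across rows then down, as labelled above.

```python
from statistics import mean, median, multimode

# Sample from Example 3.2, labelled across rows then down.
data = [
    1, 1, 5, 4, 12, 4,
    3, 1, 2, 7, 6, 6,
    5, 1, 1, 5, 8, 9,
    10, 2, 4, 2, 6, 30,
]

n = len(data)
sample_mean = mean(data)         # sum of the values (135) divided by n
sample_median = median(data)     # average of the 12th and 13th ordered values
sample_modes = multimode(data)   # a list, because the mode need not be unique

print(n, sample_mean, sample_median, sample_modes)  # 24 5.625 4.5 [1]
```

With an even number of observations, the median is the average of the two middle values (here 4 and 5), in line with the definition above.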

Comparison of the Mean, Median and Mode

The mean takes account of all observation values; therefore
it can be affected by extreme values or outliers, i.e. values
which differ greatly from the majority of values.

In the previous example the outlier 30 pushes the mean to
the right of the majority of the data. If it were omitted the
mean would be approximately 4.57.

The median and mode are unaffected by extremely high or
low values.

In the previous example, even if $x_{24}$ were 1,000,000 instead
of 30 the median and mode would be unchanged.
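
The effect of the outlier can be demonstrated directly. The sketch below, using the Example 3.2 data, is illustrative only: it recomputes the mean with the value 30 omitted, and then replaces 30 with 1,000,000 to show that the median and mode stay the same while the mean changes dramatically.

```python
from statistics import mean, median, mode

# Sample from Example 3.2, labelled across rows then down.
data = [
    1, 1, 5, 4, 12, 4,
    3, 1, 2, 7, 6, 6,
    5, 1, 1, 5, 8, 9,
    10, 2, 4, 2, 6, 30,
]

# Omit the outlier 30 and recompute the mean (105 / 23).
without_outlier = [x for x in data if x != 30]
mean_without = mean(without_outlier)

# Replace the 24th value (30) with 1,000,000.
extreme = data[:-1] + [1_000_000]

print(round(mean(data), 3), round(mean_without, 2))  # 5.625 4.57
print(median(data), median(extreme))                 # 4.5 4.5
print(mode(data), mode(extreme))                     # 1 1
print(mean(extreme) > 40_000)                        # True
```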

The mode may not represent a “central” value in the
distribution, as in the above example, but it may be useful,
for example, for qualitative data.

If the frequency (or relative frequency) distribution is
perfectly symmetric and unimodal, the mean, median and
mode will coincide.

[Figure: Symmetric Distribution. Vertical axis: Relative Frequency; horizontal axis: Variable Value. The mean, median and mode are marked at the same central point.]

If the distribution is skewed to the right (positively
skewed) and unimodal, typically mode < median < mean.

[Figure: Distribution that is Skewed to the Right. Vertical axis: Relative Frequency; horizontal axis: Variable Value. Marked from left to right: mode, median, mean.]

If the distribution is skewed to the left (negatively skewed)
and unimodal, typically mean < median < mode.

[Figure: Distribution that is Skewed to the Left. Vertical axis: Relative Frequency; horizontal axis: Variable Value. Marked from left to right: mean, median, mode.]

This gives us a way of deciding whether a distribution is
skewed to the left or right.
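
This comparison can be turned into a simple check. The following sketch (the function name skew_direction is hypothetical) compares the mean with the median and reports the direction suggested by the ordering above. It is only a rule of thumb for roughly unimodal data, not a formal test of skewness.

```python
from statistics import mean, median


def skew_direction(values):
    """Suggest a skew direction by comparing the mean with the median."""
    m = mean(values)
    med = median(values)
    if m > med:
        return "right (positive)"
    if m < med:
        return "left (negative)"
    return "no skew indicated"


# Sample from Example 3.2: mean 5.625 exceeds median 4.5.
data = [
    1, 1, 5, 4, 12, 4,
    3, 1, 2, 7, 6, 6,
    5, 1, 1, 5, 8, 9,
    10, 2, 4, 2, 6, 30,
]

print(skew_direction(data))  # right (positive)
```

For the Example 3.2 data the ordering is mode (1) < median (4.5) < mean (5.625), which matches the pattern given for a distribution skewed to the right.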

MAIN POINTS

• A statistical population is a set of measurements or
characteristics of elementary units of interest.

• Once a population is defined, a sample is a subset
of the population.

• Parameters are numerical characteristics of a
population.

• Sample statistics are numerical characteristics of a
sample.

• A frequency or relative frequency distribution
describes how data is distributed over different
classes or categories.

• A histogram shows graphically a frequency, relative
frequency or cumulative frequency distribution (the
areas of the ‘contiguous’ rectangles are proportional
to the frequencies or relative frequencies).

• The mean is affected by ‘extreme’ values; the median
and the mode are not affected by ‘extreme’ values.

• The population mean is denoted $\mu$; the sample mean
is denoted $\bar{x}$.

• The median divides an ordered set of quantitative data
into two equal halves.
