Ecs Notes


STA1501 NOTES

Statistics
Statistics is a science that involves collecting, summarising, analysing and interpreting data for the
purpose of making informed decisions.

Descriptive Statistics
Descriptive statistics deals with methods of collecting, organising, characterising, summarising, and
presenting data in a convenient and informative way, for example the mean, the median and the range.
A descriptive measure of a population is called a parameter.

Inferential Statistics
Inferential statistics is a body of methods (an estimation process) used to draw conclusions or inferences about
characteristics of populations based on sample data.

Inference is the act or process of inferring; to infer means to conclude or judge from premises or
evidence, that is, to derive by reasoning.

Statistical inference is the process of making an estimate, prediction, or decision about a population
based on sample data. Because such conclusions are not always correct, we build into the statistical
inference a measure of reliability. There are two such measures: the confidence level and the
significance level. The confidence level is the proportion of times that an estimating procedure will be
correct. The significance level measures how frequently the conclusion will be wrong.

PAGE: 7, 8, 9

A population is the collection of all items, elements, or objects that we wish to study. A population does
not necessarily mean a group of people living in a specific area, such as the population of South Africa.
In statistics, a population is the group of all individuals, elements, or items of interest to a statistics
practitioner; it is the entire set of observations under study.
A sample is a subset of the population, that is, a set of data drawn from the studied population.
When making conclusions about a population based on a sample, the conclusions and estimates may not be
perfect. To quantify the uncertainty when making decisions, two measures of reliability are used in statistical
inference: the confidence level and the significance level.
A parameter is a numerical value that describes or summarises a characteristic of a population, while a
statistic is a numerical value that describes or summarises a characteristic of a sample.
We use statistics to make inferences about parameters.
Types of variable and information
There are two types of data, namely quantitative and qualitative data.
Quantitative data generate numerical variables, which are usually reported numerically.
A quantitative variable can be either discrete or continuous. A discrete variable is countable and takes
whole-number values, while a continuous variable can assume any value within a given interval.
Qualitative data generate categorical variables, whose values are categories or options. Qualitative data are
usually summarised in tables and graphs such as bar charts.

Statistics practitioner
A person who uses statistical techniques properly.
Examples of statistics practitioners include the following:
1. a financial analyst who develops stock portfolios based on historical rates of return;
2. an economist who uses statistical models to help explain and predict variables such as inflation
rate, unemployment rate, and changes in the gross domestic product; and
3. a market researcher who surveys consumers and converts the responses into useful information.

A variable is some characteristic of a population or sample. For example, the mark on a statistics exam
is a characteristic of statistics exams that is certainly of interest to students. Not all students achieve
the same mark. The marks will vary from student to student, thus the name variable. The price of a stock
is another variable. The prices of most stocks vary daily. We usually represent the name of the variable
using uppercase letters such as X, Y, and Z.
The values of the variable are the possible observations of the variable. The values of statistics exam
marks are the integers between 0 and 100 (assuming the exam is marked out of 100). The values of a stock
price are real numbers that are usually measured in dollars and cents (sometimes in fractions of a cent).
The values range from 0 to hundreds of dollars.
Data
Data are the observed values of a variable, for example the midterm test marks observed for 10 students.
There are three types of data: interval, nominal, and ordinal.

Hierarchy of Data

Levels of Measurement
Interval
Data are real numbers, such as heights, weights, incomes, and distances. We also refer to this type
of data as quantitative or numerical.
 Values are real numbers.
 All calculations are valid.
 Data may be treated as ordinal or nominal
Ordinal
Data appear to be nominal, but the order of their values has meaning. The difference between nominal
and ordinal types of data is that the order of the values of the latter indicates a higher or lower rating.
The ordinal level of measurement presumes that one classification is ranked higher than another. The
items or objects differ from one another in that one has more or less of a characteristic than
another. At this level of measurement the order of the values is meaningful.
 Values must represent the ranked order of the data.
 Calculations based on an ordering process are valid.
 Data may be treated as nominal but not as interval.
Nominal pg:4 (21)
Data are categories. For example, responses to questions about marital status produce nominal
data. The values of this variable are single, married, divorced, and widowed. Nominal data are also
called qualitative or categorical.
For nominal data, the only permitted calculations are counting the occurrences of each category and
computing the percentages of those occurrences, that is, determining frequencies.
The nominal scale applies to names. This measurement scale is used for objects or elements whose values
are names or labels.
 Values are the arbitrary numbers that represent categories.
 Only calculations based on the frequencies or percentages of occurrence are valid.
 Data may not be treated as ordinal or interval.

Macroeconomics
Macroeconomics is a major branch of economics that deals with the behaviour of the economy as a whole.
Macroeconomists develop mathematical models that predict variables such as gross domestic
product, unemployment rates, and inflation.

The range
The range is the difference between the largest value and the smallest value. To compute the range,
we need to identify the lowest and the highest values in the distribution
The variance
The variance and its related measure, the standard deviation, measure the amount of variation of the
data around the mean. They measure how far each data value is from the mean. The sample variance is
$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$
The standard deviation
The sample standard deviation is simply the positive square root of the variance and is symbolised
by s:
$$s = \sqrt{s^2}$$
The coefficient of variation
The coefficient of variation of a distribution is the standard deviation of the data set divided by their
mean. It is a relative measure of dispersion, which is expressed as a percentage and symbolised by CV:
$$CV = \frac{\text{standard deviation}}{\text{mean}} \times 100\%$$
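The four measures of variability can be checked with a short Python sketch. The data values below are made up for illustration and do not come from the module material; the standard library `statistics` module is used only as a cross-check, since its `variance` and `stdev` functions use the same n - 1 divisor as the sample formulas.

```python
import statistics

# Hypothetical sample (illustrative values only, not from the notes)
data = [4, 8, 6, 2, 10]

n = len(data)
mean = sum(data) / n  # sample mean

# Range: largest value minus smallest value
data_range = max(data) - min(data)

# Sample variance: squared deviations from the mean, summed, divided by n - 1
variance = sum((x - mean) ** 2 for x in data) / (n - 1)

# Sample standard deviation: positive square root of the variance
std_dev = variance ** 0.5

# Coefficient of variation: standard deviation relative to the mean, in percent
cv = std_dev / mean * 100

# Cross-check the hand calculations against the standard library
assert abs(variance - statistics.variance(data)) < 1e-9
assert abs(std_dev - statistics.stdev(data)) < 1e-9

print(data_range, variance, round(std_dev, 4), round(cv, 2))
```

Because the CV is a ratio, it has no units, which is what makes it useful for comparing the spread of data sets measured on different scales.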
Measures of central tendency are measures of location. They are typical values that describe or
summarise the distribution. The most used measures of location are the arithmetic mean, the
median and the mode.
 The arithmetic mean (the average) is the sum of all the values in the data set divided by the
number of observations.

The population mean is symbolised by μ and is given by
$$\mu = \frac{\sum x}{N}$$
The sample mean is symbolised by x̄ and is given by
$$\bar{x} = \frac{\sum x}{n}$$
Steps on how to compute the mean:
1. Add all the given observations and find the total ∑ x (i.e., the sum)
2. Count the number of observations (n)
3. Divide the total (sum) by the number of observations (n)

 The median is the centre of the distribution when the data are arranged in ascending or descending
order. It divides an ordered data set into two halves, one below it and one above it. It
is the value with 50% of the observations less than or equal to it and 50% of the observations
greater than or equal to it.
Steps to follow when computing the median:
1. Arrange the data in numerical order, starting from the lowest value to the highest
2. Count the number of observations (n)
3. Determine the median position: $\frac{n+1}{2}$
4. Read the median value from the ordered data at that position; if the position falls between two
values (n is even), take the average of those two values
 The mode is the most frequent value in the data set.
NB: The mode is seldom the best measure of central location. For nominal and ordinal data, however, the
mean is not an appropriate measure of location, whereas the mode is.
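The steps for the mean, the median and the mode can be sketched in Python as follows. The data set is hypothetical and chosen only so that n is even, which exercises the case where the median position falls between two values.

```python
from collections import Counter

# Hypothetical data set (illustrative values only)
data = [7, 3, 9, 3, 5, 8]

# Mean: add the observations, then divide the sum by the number of observations
n = len(data)
mean = sum(data) / n

# Median: sort the data, find position (n + 1) / 2, and read off the value
ordered = sorted(data)
position = (n + 1) / 2  # 1-based position in the ordered data
if position.is_integer():
    median = ordered[int(position) - 1]
else:
    # Position falls between two values: average the two middle values
    lower = ordered[int(position) - 1]
    upper = ordered[int(position)]
    median = (lower + upper) / 2

# Mode: the value with the highest frequency
counts = Counter(data)
mode = counts.most_common(1)[0][0]

print(mean, median, mode)
```

Note that `most_common(1)` reports a single value, so a data set with two equally frequent values would need extra handling to report both modes.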

The measures of variability describe the amount of variation or spread in a data set. They are also
called measures of dispersion. The most used measures of variability are the range, the variance, the
standard deviation and the coefficient of variation.
 The range is the difference between the largest value and the smallest value. To compute the
range we need to identify the lowest and the highest values in the distribution:
Range = X Maximum - X Minimum

Probability Sampling Methods


Simple random sampling plan
A simple random sample is a sample selected in such a way that every possible sample with the same
number of observations has an equal chance of being selected. All members of the population have the
same chance of being selected for the sample.
Stratified sampling plan
A stratified random sample is obtained by dividing the population into several groups called strata,
and then drawing simple random samples from each stratum.
Cluster sampling
A cluster sample is a simple random sample in which each sampling unit is a collection (group or
cluster) of elements. The population is divided into primary units, then samples are drawn
from the primary units.
Systematic sample
A random starting point is selected, and then every kth item thereafter is selected for the sample.
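The four sampling plans can be illustrated with a minimal Python sketch. The population of 20 numbered units, the strata, the cluster size and the value of k are all assumptions made for illustration. The cluster example selects whole clusters at random, which is one common variant of cluster sampling.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population of 20 numbered units (illustrative only)
population = list(range(1, 21))

# Simple random sample: every possible sample of size 5 is equally likely
simple = random.sample(population, 5)

# Stratified sample: split into strata, then draw a simple random sample from each
strata = {"low": population[:10], "high": population[10:]}
stratified = [unit for group in strata.values() for unit in random.sample(group, 2)]

# Cluster sample: group units into clusters, then randomly select whole clusters
clusters = [population[i:i + 4] for i in range(0, len(population), 4)]
chosen_clusters = random.sample(clusters, 2)
cluster_sample = [unit for cluster in chosen_clusters for unit in cluster]

# Systematic sample: random starting point, then every k-th unit thereafter
k = 4
start = random.randrange(k)
systematic = population[start::k]

print(simple, stratified, cluster_sample, systematic)
```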

A random experiment is a process or an action which leads to one of several possible outcomes. For example:
1. Flipping a coin is an action which leads to a head or a tail.
2. Tossing a die is an action which generates 1, 2, 3, 4, 5 or 6.

A sample space is a list of all possible outcomes of a random experiment and is denoted by S. For example:
1. If you flip a coin once, there are 2 possible outcomes, namely the head and the tail.
The sample space: S = {head, tail}
2. If a die is tossed once, there are six possible outcomes: 1, 2, 3, 4, 5, 6.
The sample space of the toss of a fair die is: S = {1, 2, 3, 4, 5, 6}

The three approaches to assigning probability are:

 Classical approach
In rolling a fair die once, the event of obtaining an odd number is {1, 3, 5}, which contains 3 of the 6
equally likely outcomes in {1, 2, 3, 4, 5, 6}.
Number of successes is 3
Total number of outcomes is 6, and the probability is 3/6
$$P(E) = \frac{\text{Number of successes}}{\text{Total number of outcomes}} = \frac{3}{6}$$

 Relative frequency approach
If you toss a coin 15 times and get 3 heads, the relative frequency approach gives 3/15:
$$P(E) = \frac{\text{Number of times the event occurred}}{\text{Total number of observations}} = \frac{3}{15}$$

 Subjective approach
Probability is assigned based on intuition, past experience, an educated guess or opinion, giving a numerical
estimate of the likelihood that a particular outcome will occur.
 Given a student's assignment marks, a lecturer may feel that the student has a 50% chance of passing the
module.
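The classical and relative frequency calculations above can be reproduced in a few lines of Python. This is only a sketch; the variable names are illustrative, and `Fraction` is used so that the results stay as exact fractions.

```python
from fractions import Fraction

# Classical approach: equally likely outcomes of one roll of a fair die
sample_space = {1, 2, 3, 4, 5, 6}
odd = {outcome for outcome in sample_space if outcome % 2 == 1}
p_odd = Fraction(len(odd), len(sample_space))  # 3 successes out of 6 outcomes

# Relative frequency approach: 3 heads observed in 15 tosses (figures from the text)
heads_observed = 3
tosses = 15
p_heads_estimate = Fraction(heads_observed, tosses)

print(p_odd, p_heads_estimate)  # prints: 1/2 1/5
```

The relative frequency value is an estimate based on the observed tosses, so it can differ from the classical value of 1/2 for a fair coin, especially when the number of tosses is small.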

Sampling and Non Sampling errors


Because the sample is different from the population, two major types of error arise when a sample of
observations is taken from the population in order to make statements about the population based on the
characteristics of the sample:
sampling error and non-sampling error.
Sampling error refers to differences between the sample and the population that exist only because of
the observations that happened to be selected for the sample.
Non-sampling error results from mistakes made in the process of obtaining data, or from data being
selected without following a proper process. Examples are errors in data acquisition,
nonresponse error and selection bias.

Defining events
A simple event is an individual outcome of a sample space.
An event is a list or set of one or more simple events in a sample space.
The probability of an event is the total or sum of the probabilities of the simple events that constitute
the event

Joint, Marginal and Conditional Probability


 Intersection
The intersection of events A and B is the event that occurs when both A and B occur. It is
denoted by: A and B. The probability of the intersection is also called the joint probability.

 Marginal probability
Marginal probabilities are computed by adding across the rows or down the columns of a table of joint
probabilities. They are so named because they are calculated in the margins of the table.

 Conditional probability
The conditional probability is the probability of an event A given information about the
occurrence of the event B.

The probability of event A given event B, written P (A/B) or P(A | B), is given by:
$$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}, \quad \text{where } P(B) > 0$$
The probability of event B given event A, written P (B/A) or P(B | A), is given by:
$$P(B \mid A) = \frac{P(A \text{ and } B)}{P(A)}, \quad \text{where } P(A) > 0$$
If the probability of a particular outcome is equal to 0, that outcome
is impossible, while a probability of 1 implies that the outcome is certain.

 Independent events
Two events A and B are independent if P (A/B) = P (A) or P (B/A) = P (B)

Read the statement above as follows: If the probability of an event A, given that another event B had
taken place, is the same as the direct probability of the event A, then the event B has no effect on the
occurrence of A. The probability of event A is independent of whether event B took place or not.

 The union of events


The union of events A and B is the event that occurs when either A or B or both occur. It is
denoted by: A or B. The probability of the event A or B is given by:

P (A or B) = P (A) + P (B) - P (A and B)
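The definitions in this section can be tied together with a small Python sketch built on a table of joint probabilities. The four joint probabilities below are made-up values chosen for illustration; they are not taken from the module material.

```python
from fractions import Fraction as F

# Hypothetical joint probabilities for two events A and B (made-up values)
joint = {
    ("A", "B"): F(12, 100),
    ("A", "not B"): F(28, 100),
    ("not A", "B"): F(18, 100),
    ("not A", "not B"): F(42, 100),
}

# Marginal probabilities: add the joint probabilities across a row or down a column
p_a = joint[("A", "B")] + joint[("A", "not B")]
p_b = joint[("A", "B")] + joint[("not A", "B")]

# Joint probability of the intersection (A and B)
p_a_and_b = joint[("A", "B")]

# Conditional probabilities: joint probability divided by the marginal of the given event
p_a_given_b = p_a_and_b / p_b
p_b_given_a = p_a_and_b / p_a

# Independence: A and B are independent if P(A given B) equals P(A)
independent = p_a_given_b == p_a

# Union: P(A or B) = P(A) + P(B) - P(A and B)
p_a_or_b = p_a + p_b - p_a_and_b

print(p_a, p_b, p_a_given_b, p_b_given_a, independent, p_a_or_b)
```

In this made-up table the events turn out to be independent, because P(A given B) and P(A) are both 2/5. Changing any one joint probability (and adjusting another so the table still sums to 1) would generally break that equality.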


Probability rules and trees
 Complement rule
The complement of an event A is the event that occurs when A does not occur. It is symbolised by A^C,
and the complement rule states that
$$P(A^C) = 1 - P(A)$$

Example
If the probability of passing an assignment is 0.7, the probability of failing it is 1 - 0.7 = 0.3

 Multiplication rule
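The joint probability of any two events A and B is given by
$$P(A \text{ and } B) = P(B)\,P(A \mid B) = P(A)\,P(B \mid A)$$
where P(A | B), also written P (A/B), is the conditional probability of A given B.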

Random variable
A random variable is a function or rule that assigns a number to each outcome of an experiment. There
are two types of random variables, discrete and continuous.
A discrete random variable is a variable that can take on a countable number of values, in other
words, a discrete random variable can assume a countable number of possible outcomes.
Example
 The number of accidents that occur on the N1 highway every hour is a discrete random variable.
A continuous random variable is a random variable which can take on any value over a given interval
of values.
Example
 The delivery time of parcels to clients is a continuous random variable.
A probability distribution is a table, formula, or graph that describes the values of a random variable
and the probability associated with these values.

Discrete probability distribution


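A discrete probability distribution is a listing of all possible values a discrete random variable can
assume, together with the probability of each value. Each probability lies between 0 and 1, and the
probabilities sum to 1.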
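As an illustration, the sketch below builds the discrete probability distribution of a random variable X defined as the number of heads in two flips of a fair coin. The experiment and the variable names are assumptions chosen for the example; they do not come from the module material.

```python
from fractions import Fraction
from itertools import product

# Sample space for two flips of a fair coin: 4 equally likely outcomes
outcomes = list(product(["head", "tail"], repeat=2))

# Random variable X assigns to each outcome its number of heads
distribution = {}
for outcome in outcomes:
    x = outcome.count("head")
    distribution[x] = distribution.get(x, 0) + Fraction(1, len(outcomes))

# List each value of X with its probability
for x in sorted(distribution):
    print(x, distribution[x])
```

The resulting table assigns a probability to every value X can take, and those probabilities add up to 1.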