Inference Statistics Terminology
Statistics for Data Science: Course Objectives
COURSE OBJECTIVES
The course aims to:
1. Equip students with the skills to summarize and interpret data using descriptive
statistics and visualization techniques.
2. Develop a foundational understanding of probability and its applications in data
science.
3. Enable students to perform hypothesis testing and construct confidence intervals
for statistical inference.
4. Teach students how to build and assess linear and logistic regression models for
predictive analysis.
5. Provide hands-on experience with statistical software for data manipulation,
analysis, and visualization.
COURSE OUTCOMES
On completion of this course, students will be able to:
CO1: Summarize and describe the main features of a dataset using measures such as mean,
median, mode, variance, and standard deviation, as well as graphical representations
like histograms, box plots, and scatter plots.
CO2: Understand probability theory, including concepts such as random variables,
probability distributions, and the law of large numbers, enabling them to model and
reason about uncertainty in data.
CO3: Apply statistical inference, including hypothesis testing, confidence interval
estimation, and p-value computation, to draw valid conclusions from sample data about
larger populations.
CO5: Utilize statistical software tools to perform data analysis, including data cleaning,
transformation, visualization, and the implementation of various statistical methods.
Unit-3 Syllabus
SUGGESTIVE READINGS
TEXT BOOKS:
• T1. Hastie, Trevor, et al. The Elements of Statistical Learning, 2nd ed. New York: Springer, 2009.
ISBN: 978-0387848570
• T2. Montgomery, Douglas C., and George C. Runger. Applied Statistics and Probability for
Engineers. John Wiley & Sons, 2010.
• T3. Evans, Michael J., and Jeffrey S. Rosenthal. Probability and Statistics: The Science of
Uncertainty, 2nd ed.
REFERENCE BOOKS:
• R1. Bruce, Peter, et al. Practical Statistics for Data Scientists: 50 Essential Concepts, 2nd ed.
O'Reilly Media, 2020. ISBN: 978-1492072942
• R2. James, Gareth, et al. An Introduction to Statistical Learning: with Applications in R, 2nd ed.
Springer, 2021. ISBN: 978-1071614174
• R3. Downey, Allen B. Think Stats: Exploratory Data Analysis in Python, 2nd ed.
O'Reilly Media, 2014. ISBN: 978-1491907337
Overview of Inference
Statistical confidence
Confidence intervals
Confidence interval for a population mean
How confidence intervals behave
Choosing the sample size
Some Cautions
Statistical Inference
After we have selected a sample, we know the responses of the
individuals in the sample. However, the reason for taking the sample is
to infer from that data some conclusion about the wider population
represented by the sample.
[Figure: Population → collect data from a representative sample → make an inference about the population.]
Confidence Interval
A level C confidence interval for a parameter has two parts:
An interval calculated from the data, which has the form
estimate ± margin of error
A confidence level C, where C is the probability that the
interval will capture the true parameter value in repeated
samples. In other words, the confidence level is the success
rate for the method.
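The "success rate" reading of C can be checked by simulation. The sketch below is an illustration under assumptions not taken from the slides: a normal population with hypothetical values µ = 500 and σ = 100 (σ treated as known), samples of size n = 25, and the z interval x̄ ± z*σ/√n. It repeatedly draws samples and counts how often the interval captures the true µ; for a 95% level the fraction should come out near 0.95.

```python
import random
from statistics import NormalDist, fmean

def z_interval(sample, sigma, confidence=0.95):
    """z confidence interval for the mean, assuming sigma is known."""
    n = len(sample)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical value z*
    margin = z_star * sigma / n ** 0.5                          # margin of error
    xbar = fmean(sample)
    return xbar - margin, xbar + margin

def coverage_rate(mu=500.0, sigma=100.0, n=25, confidence=0.95, reps=10_000, seed=1):
    """Fraction of simulated intervals that capture the true mean mu (hypothetical values)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        low, high = z_interval(sample, sigma, confidence)
        if low <= mu <= high:
            hits += 1
    return hits / reps

print(coverage_rate())  # close to 0.95
```

Any single interval either captures µ or it does not; the 95% describes the long-run behaviour of the method.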
Statistical Estimation
We know that the sample mean x̄ is an unbiased estimator of
the (unknown) population mean µ.
So we can take x̄ = 495 as a good estimate.
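"Unbiased" means that, averaged over many possible samples, x̄ is centred on µ. A minimal simulation sketch, assuming a deliberately skewed (exponential) population whose true mean is set to a hypothetical µ = 495 and samples of size n = 10: the long-run average of the sample means should land close to µ even though individual x̄ values vary widely.

```python
import random
from statistics import fmean

def average_of_sample_means(mu=495.0, n=10, reps=20_000, seed=7):
    """Average of many sample means drawn from an exponential population with mean mu."""
    rng = random.Random(seed)
    sample_means = []
    for _ in range(reps):
        # Skewed population: exponential with rate 1/mu, so its true mean is mu
        sample = [rng.expovariate(1 / mu) for _ in range(n)]
        sample_means.append(fmean(sample))
    # If x-bar is unbiased, this long-run average is close to mu
    return fmean(sample_means)

print(average_of_sample_means())
```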
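Confidence Interval for a Population Mean
To calculate a confidence interval for µ, we use the formula:
$$\bar{x} \pm z^{*}\frac{\sigma}{\sqrt{n}}$$
[Figure: standard normal curve with the central area C lying between −z* and z*.]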
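The interval formula translates directly into code. This is a sketch that assumes σ is known and uses Python's `statistics.NormalDist` to get z* for any confidence level C. The inputs below are hypothetical: x̄ = 495 is reused from the estimation example, while σ = 100 and n = 500 are assumed purely for illustration.

```python
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, confidence=0.95):
    """Level-C z interval xbar ± z* * sigma / sqrt(n), assuming sigma is known."""
    # z* leaves (1 - C)/2 in each tail of the standard normal curve
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sigma / n ** 0.5
    return xbar - margin, xbar + margin

low, high = z_confidence_interval(495, sigma=100, n=500)
print(f"95% CI: ({low:.2f}, {high:.2f})")  # 95% CI: (486.23, 503.77)
```

Computing z* from C, rather than hard-coding 1.96, lets the same function produce 90% or 99% intervals.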
Choosing the Sample Size
You may need a certain margin of error (e.g., in drug trials or
manufacturing specs). In most cases, we have no control over the
population variability (σ), but we can choose the number of
measurements (n).
$$m = z^{*}\frac{\sigma}{\sqrt{n}} \quad\Longrightarrow\quad n = \left(\frac{z^{*}\sigma}{m}\right)^{2}$$
Remember, though, that sample size is not always stretchable at will. There are
typically costs and constraints associated with large samples. The best approach is to
use the smallest sample size that can give you useful results.
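Solving m = z*σ/√n for n gives the smallest sample size that achieves a desired margin of error. The helper below is a hypothetical sketch, assuming σ is known: it rounds up with `math.ceil` because rounding down would leave the margin slightly larger than requested. An optional `z_star` argument lets you plug in a rounded table value such as 1.96.

```python
import math
from statistics import NormalDist

def required_sample_size(sigma, margin, confidence=0.95, z_star=None):
    """Smallest n with z* * sigma / sqrt(n) <= margin (sigma assumed known)."""
    if z_star is None:
        # Exact critical value for a two-sided level-C interval
        z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = (z_star * sigma / margin) ** 2
    return math.ceil(n)  # always round up

print(required_sample_size(sigma=10, margin=2))  # 97
```

Because n grows with 1/m², halving the margin of error roughly quadruples the required sample size, which is one reason large samples quickly become costly.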
Sample Size Example
How many undergraduates should we survey?
Suppose we are planning a survey about college savings programs.
We want the margin of error of the amount contributed to be $30 with
95% confidence. Let us assume the population standard deviation, σ,
equals $1483.
How many measurements should you take?
For a 95% confidence interval, z* = 1.96.
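Plugging the example's numbers (m = $30, σ = $1483, z* = 1.96) into n = (z*σ/m)² gives the following worked arithmetic:

```latex
n = \left(\frac{z^{*}\sigma}{m}\right)^{2}
  = \left(\frac{1.96 \times 1483}{30}\right)^{2}
  = (96.889\ldots)^{2}
  \approx 9387.54
```

Since n must be a whole number and rounding down would make the margin of error slightly larger than $30, we round up: about 9388 undergraduates would need to be surveyed.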
6.2 Tests of Significance
Statistical Inference 2
The second common type of statistical inference, called a test of
significance, assesses the evidence provided by the data about some claim
concerning a population.
Four Steps of Tests of Significance
1. State the null and alternative hypotheses.
2. Calculate the value of the test statistic.
3. Find the P-value for the observed data.
4. State a conclusion.
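The four steps can be sketched for a one-sample z test of a population mean. This is an illustrative sketch, assuming σ is known so that the standardized statistic follows the standard normal distribution under H0; the function name, the `alternative` strings, and the example numbers (x̄ = 103, µ0 = 100, σ = 15, n = 36) are all hypothetical.

```python
from statistics import NormalDist

def one_sample_z_test(xbar, mu0, sigma, n, alternative="two-sided", alpha=0.05):
    # Step 1: H0: mu = mu0; Ha is mu > mu0 ("greater"), mu < mu0 ("less"),
    # or mu != mu0 ("two-sided")
    # Step 2: test statistic, with sigma assumed known
    z = (xbar - mu0) / (sigma / n ** 0.5)
    # Step 3: P-value, computed assuming H0 is true (z is standard normal)
    std = NormalDist()
    if alternative == "greater":
        p_value = 1 - std.cdf(z)
    elif alternative == "less":
        p_value = std.cdf(z)
    elif alternative == "two-sided":
        p_value = 2 * (1 - std.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    # Step 4: conclusion at significance level alpha
    decision = "reject H0" if p_value <= alpha else "fail to reject H0"
    return z, p_value, decision

z, p, decision = one_sample_z_test(xbar=103, mu0=100, sigma=15, n=36)
print(f"z = {z:.2f}, P = {p:.4f}: {decision}")  # z = 1.20, P = 0.2301: fail to reject H0
```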
1. Stating Hypotheses
A significance test starts with a careful statement of the claims we want to
compare.
The claim tested by a statistical test is called the null hypothesis (H0).
The test is designed to assess the strength of the evidence against the
null hypothesis. Often, the null hypothesis is a statement of “no effect”
or “no difference in the true means.”
The claim about the population for which we’re trying to find evidence
is the alternative hypothesis (Ha).
2. Test Statistic
A test of significance is based on a statistic that estimates the parameter that
appears in the hypotheses. When H0 is true, we expect the estimate to be
near the parameter value specified in H0.
Values of the estimate far from the parameter value specified by H0 give
evidence against H0.
A test statistic calculated from the sample data measures how far
the data diverge from what we would expect if the null hypothesis
H0 were true.
Large values of the statistic show that the data are not consistent
with H0.
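For a population mean with σ treated as known, as in the confidence intervals earlier in this unit, a standard choice of test statistic is the one-sample z statistic, where µ0 is the value specified by H0. It measures how many standard deviations of x̄ the observed sample mean lies from µ0:

```latex
z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}
```

Values of z near 0 are consistent with H0; large positive or negative values are evidence against it.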
3. P-Value
The probability, computed assuming H0 is true, that the
statistic would take a value as or more extreme than the one
actually observed is called the P-value of the test. The smaller
the P-value, the stronger the evidence against H0.
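The definition "computed assuming H0 is true" can be made concrete by simulation. Assuming the test statistic is standard normal under H0 (as for the z statistic with known σ), the sketch below estimates the two-sided P-value for a hypothetical observed value by counting how often simulated statistics are at least as extreme, and compares it with the exact normal-curve area.

```python
import random
from statistics import NormalDist

def exact_two_sided_p(z_obs):
    """P(|Z| >= |z_obs|) for Z standard normal, i.e. computed under H0."""
    return 2 * (1 - NormalDist().cdf(abs(z_obs)))

def simulated_two_sided_p(z_obs, reps=100_000, seed=0):
    """Estimate the same probability by simulating the statistic under H0."""
    rng = random.Random(seed)
    as_extreme = sum(1 for _ in range(reps) if abs(rng.gauss(0.0, 1.0)) >= abs(z_obs))
    return as_extreme / reps

for z_obs in (1.2, 2.4):
    print(z_obs, round(exact_two_sided_p(z_obs), 4), simulated_two_sided_p(z_obs))
```

The larger observed statistic yields the smaller P-value, matching the rule that smaller P-values give stronger evidence against H0.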
4. Conclusion
We make one of two decisions based on the strength of the evidence against
the null hypothesis: reject H0 or fail to reject H0.
Tests for a Population Mean
Confidence level C and significance level α for a two-sided test are related as follows:
C = 1 − α
[Figure: normal curve with area C in the centre and area α/2 in each tail.]
More About P-Values
6.3 Use and Abuse of Tests
Cautions About Significance Tests 1
Choosing the significance level
Factors often considered:
What are the consequences of rejecting the null hypothesis
when it is actually true?
• What might happen if we concluded that global warming was real when
it really wasn’t?
• Suppose an innocent person was convicted of a crime.
Choosing Significance Level
Some conventions:
Typically, the standards of our field of work are used.
Cautions About Significance Tests 2
Do not ignore lack of significance
Consider this provocative title from the British Medical Journal: “Absence of
evidence is not evidence of absence.”
Having no proof that a particular suspect committed a murder does not imply
that the suspect did not commit the murder.
Indeed, failing to find statistical significance in results means that “the null
hypothesis is not rejected.” This is very different from actually accepting the
null hypothesis. The sample size, for instance, could be too small to overcome
large variability in the population.
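The point about small samples can be quantified with the power of a test: the probability that it rejects H0 when a particular alternative is true. A sketch for the two-sided z test with known σ; all numbers (µ0 = 100, true mean 103, σ = 15) are hypothetical. With n = 10 the test usually fails to reject even though H0 is false, while with n = 200 it usually rejects.

```python
from statistics import NormalDist

def z_test_power(mu0, mu_a, sigma, n, alpha=0.05):
    """Power of the two-sided level-alpha z test of H0: mu = mu0 when the true mean is mu_a."""
    std = NormalDist()
    z_star = std.inv_cdf(1 - alpha / 2)
    # When the true mean is mu_a, the z statistic is normal with mean delta and SD 1
    delta = (mu_a - mu0) / (sigma / n ** 0.5)
    # Probability the statistic lands in either rejection region
    return (1 - std.cdf(z_star - delta)) + std.cdf(-z_star - delta)

for n in (10, 50, 200):
    print(n, round(z_test_power(mu0=100, mu_a=103, sigma=15, n=n), 3))
```

A non-significant result from a low-power test says little about whether H0 is true, which is why lack of significance should not be read as acceptance of H0.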
Cautions About Significance Tests 3
Statistical inference not valid for all sets of data
6.4 Power and Inference as a Decision
Power
Increasing the power
The common practice of testing hypotheses
Power of Test
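Type I and Type II Errors
When we draw a conclusion from a significance test, we hope our
conclusion will be correct. But sometimes it will be wrong. There are two
types of mistakes we can make: rejecting H0 when H0 is actually true
(a Type I error), and failing to reject H0 when H0 is actually false (a Type II error).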
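A Type I error means rejecting H0 when it is actually true, and for a level-α test its long-run probability is α. The simulation sketch below assumes a normal population with hypothetical µ0 = 100, σ = 15 (known), and n = 25; it generates data with H0 true, runs a two-sided z test each time, and reports the fraction of (mistaken) rejections, which should be close to α = 0.05.

```python
import random
from statistics import NormalDist, fmean

def type_one_error_rate(mu0=100.0, sigma=15.0, n=25, alpha=0.05, reps=20_000, seed=42):
    """Fraction of two-sided z tests that reject H0: mu = mu0 when H0 is true."""
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]  # data generated with H0 true
        z = (fmean(sample) - mu0) / (sigma / n ** 0.5)
        if abs(z) > z_star:
            rejections += 1  # rejecting a true H0 is a Type I error
    return rejections / reps

print(type_one_error_rate())  # close to 0.05
```

The Type II error probability for a specific alternative is 1 minus the power of the test against that alternative.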
THANK YOU
For queries
Email: madan.e13485@cumail.in