Inferential Analysis
1.3.3 t-Intervals
When sampling without replacement from a finite population of size N, the standard error σ/√n of the arithmetic mean must be replaced by σ/√n · √((N−n)/(N−1))
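As a quick sketch of the finite population correction above (the numeric values are made up for illustration):

```python
from math import sqrt

def corrected_se(sigma, n, N):
    """Standard error of the mean with the finite population correction:
    sigma/sqrt(n) * sqrt((N - n)/(N - 1))."""
    return sigma / sqrt(n) * sqrt((N - n) / (N - 1))

# Hypothetical values: sigma = 10, sample size n = 50, population size N = 500
se_plain = 10 / sqrt(50)            # uncorrected standard error
se_fpc = corrected_se(10, 50, 500)  # corrected version is slightly smaller
```

The correction factor is always below 1 for n > 1, so the corrected standard error is smaller than the plain one.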
Conditions for applying t-intervals:
1. Assume that the underlying distribution of the random variable in the sample space is a normal distribution (indispensable for small samples).
When the conditions are not met: eliminate extreme outliers (and record this), bootstrap, or use the median instead of the mean.
Dichotomous = binomial distribution
The interval x̄ − t(α/2, v)·s/√n to x̄ + t(α/2, v)·s/√n contains the arithmetic mean μ of the sample space with a probability of 1−α.
Calc: invT(1−α/2, df) => v = df = n−1
When using invT on the calc, always use the one-sided value for the area, e.g. for a 99% confidence level use invT(0.995, df).
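The t-interval above can be sketched in Python with scipy (the sample numbers are hypothetical); `stats.t.ppf(1 - (1 - conf)/2, df)` plays the role of invT with the one-sided area:

```python
from math import sqrt
from scipy import stats

def t_interval(xbar, s, n, conf=0.95):
    """Two-sided t-confidence interval for the mean, sigma unknown.
    conf=0.99 uses the one-sided area 0.995, matching invT(0.995, df)."""
    df = n - 1
    t_crit = stats.t.ppf(1 - (1 - conf) / 2, df)
    half = t_crit * s / sqrt(n)
    return xbar - half, xbar + half

# Hypothetical sample: mean 23, sd 1.5, n = 25, confidence level 99%
lo, hi = t_interval(23.0, 1.5, 25, conf=0.99)
```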
1.3.7 Confidence Intervals for Proportions, Nominal Attributes
T = (X̄ − μ)/(S/√n)    p = x/n    E(X) = n·π and Var(X) = n·π·(1−π)
Rule of thumb: for n·π·(1−π) > 9, a binomial distribution can be approximated by a normal distribution.
P = X/n, E(P) = E(X)/n = n·π/n = π, and standard error √Var(P) = √(Var(X)/n²) = √(n·π·(1−π)/n²) = √(π·(1−π)/n)
Absolute error = 2 · standard error
z-confidence interval for proportions:
Provided n·p·(1−p) > 9 holds, we have:
The probability that the proportion π of the sample space lies between p − z(α/2)·√(p·(1−p)/n) and p + z(α/2)·√(p·(1−p)/n) is 1−α.
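A minimal sketch of the z-interval for proportions, including the rule-of-thumb check (the sample values are made up):

```python
from math import sqrt
from scipy import stats

def z_interval_proportion(p, n, conf=0.95):
    """z-confidence interval for a proportion; valid when n*p*(1-p) > 9."""
    if n * p * (1 - p) <= 9:
        raise ValueError("rule of thumb n*p*(1-p) > 9 not met")
    z = stats.norm.ppf(1 - (1 - conf) / 2)   # two-sided critical value
    se = sqrt(p * (1 - p) / n)               # standard error of P
    return p - z * se, p + z * se

# Hypothetical sample: observed proportion 0.4 out of n = 200
lo, hi = z_interval_proportion(0.4, 200)
```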
Step 4: determine the region of rejection of the null hypothesis. Set up a decision rule.
Since our sample size is n = 400, our reasoning is as follows: provided H0: π ≤ 0.02 is true, at most 8 imperfect chips are expected on average (400 · 0.02 = 8).
Set up a decision rule and determine the region of rejection of the null hypothesis (find the region of rejection via Excel: c is the first value at which the tail probability drops below the risk α, so R = {c, c+1, c+2, …}).
(Reject H0 if and only if at least c «six dots» are observed; replace c with its value.)
We need to determine this critical value c, which implies a region of rejection (also called critical region) based on
probability theory
c is the smallest solution x of the inequation P(X ≥ x) < α, or equivalently P(X < x) > 1−α. In words: the probability that the RV X takes a value at least as large as the critical value c is smaller than the error risk fixed in advance. If the null hypothesis holds, we can conclude that, in our example, X has a binomial distribution («success» being an imperfect chip) with n = 400 and probability of success π = 0.02, in short: X ~ B(400, 0.02).
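The search for the critical value c described above can be sketched directly from its definition (smallest x with P(X ≥ x) < α), here applied to the chips example X ~ B(400, 0.02):

```python
from scipy.stats import binom

def critical_value(n, pi0, alpha):
    """Smallest x with P(X >= x) < alpha under X ~ B(n, pi0);
    the region of rejection is then R = {c, c+1, c+2, ...}."""
    x = 0
    while True:
        # P(X >= x) = 1 - P(X <= x - 1)
        if 1 - binom.cdf(x - 1, n, pi0) < alpha:
            return x
        x += 1

c = critical_value(400, 0.02, 0.05)  # chips example from the text
```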
Step 5: assess the data, determine the test statistic, find the decision => Type 1 and Type 2 errors can occur
Excel method and binom:
Alternative hypothesis HA: the die is loaded; the probability of «six dots» is bigger than 1/6.
a) Shorthand: HA: p > 1/6
Can be limited by making α very small (e.g., 1% or 0.1%).
Calc: β = binomcdf(n, p_true, c−1), i.e. the cumulative probability of at most c−1 successes under the true proportion
Excel: β = BINOM.DIST(c−1, n, p_true, TRUE)
Interpretation: the probability that our binomial test with n = 400 and α = 5% confirms the null hypothesis although the true proportion of imperfect chips is 4% is 27%.
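The Type 2 error from the interpretation above can be reproduced as a sketch: H0 is retained whenever X ≤ c−1, so β is a binomial CDF evaluated under the true proportion (c = 14 is the critical value from the chips example):

```python
from scipy.stats import binom

# Type 2 error for the chips test: H0 is retained when X <= c - 1.
# Values taken from the example: n = 400, critical value c = 14, true proportion 4%
n, c, p_true = 400, 14, 0.04
beta = binom.cdf(c - 1, n, p_true)  # P(X <= 13 | X ~ B(400, 0.04)), roughly 27%
```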
p-value: given that H0 is true, what is the probability that we get a result at least as extreme as the one we got for our sample?
Decision-making based on the p-value:
p-value ≤ α → reject H0; the test statistic is significant
p-value > α → retain H0; the test statistic is not significant
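The decision rule can be sketched with scipy's exact binomial test (the roll counts are hypothetical), here for the one-sided die example HA: p > 1/6:

```python
from scipy.stats import binomtest

# Hypothetical data: 90 «six dots» in 420 rolls, testing HA: p > 1/6
res = binomtest(90, 420, p=1/6, alternative='greater')

alpha = 0.05
decision = "reject H0" if res.pvalue <= alpha else "retain H0"
```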
p-values in SPSS
Note: SPSS always conducts a one-sided test! For a two sided test, multiply the p-value by two.
Analyze > Nonparametric Tests > Legacy Dialogs > Binomial
Depending on the type of test, the p-value is reported under different labels:
- Exact Sig (2-tailed)
- Exact Sig. (1-tailed)
- Asymptotic significance or similar
Make a histogram and a box plot to figure this out. In SPSS: always use Explore and Compare Means!
One-sample t-test, σ unknown
Question: is an expectancy bigger or smaller than a given reference value?
Example: a known reference claims an average length of 20 cm; a random sample (reference sd unknown) shows a mean length of 23 cm with an sd of 1.5 cm.
Hypotheses: H0: μ = μ0; HA: μ ≠ μ0 (bilateral / standard case) or μ > μ0 ∨ μ < μ0 (unilateral test, under specified circumstances)
Test distribution: T = (X̄ − μ0)/(S/√n) has a t-distribution with df = v = n−1 degrees of freedom
p-value of the sample: area below the t bell curve to the left of −|t| and to the right of +|t| (bilateral test); in case of a unilateral test, cut the p-value in half
Excel: T.DIST(t, df, TRUE), or =T.DIST.2T(t, df) for the two-sided value; or use tcdf on the calc (multiply by 2 to get the two-sided value)

2-sample t-test for paired samples
Question: does a certain measure have an impact on a certain attribute of a population? Does it change its expectancy? (exactly the same unit is measured twice)
Example: the same attribute measured once in winter and once in summer.
Hypotheses: H0: μA = μB ∨ μdiff = μA − μB = 0; HA: μA ≠ μB ∨ μdiff ≠ 0 (bilateral test)
Procedure: take the difference of the two samples for each unit and then use that difference in a one-sample t-test.

2-sample t-test for independent samples
Question: does a metric attribute have different expectancies in two groups of the population (e.g. men, women)? Are the differences we measure between them significant?
Example: comparing the same attribute in two separate groups — is there a big difference?
Hypotheses: H0: μ1 = μ2 ∨ μ1 − μ2 = 0; HA: μ1 ≠ μ2 ∨ μ1 − μ2 ≠ 0 (standard case, bilateral)
Test distribution: T = (X̄1 − X̄2)/S(X̄1−X̄2) follows a t-distribution
Estimating the standard error S(X̄1−X̄2):
Variant 1: homoscedasticity, that is, we assume the two variances are the same (σ1² = σ2²)
Variant 2: heteroscedasticity, that is, we assume the two variances are different (σ1² ≠ σ2²)

Level of significance: α (select in advance, standard value 5%)
Decision making: if p-value > α, retain H0; if p-value ≤ α, reject H0 and instead accept HA
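The three t-tests above map directly onto scipy (all measurement values below are made up; `equal_var` selects between the homoscedastic and heteroscedastic variant):

```python
from scipy import stats

# Hypothetical measurements for the three tests described above
sample = [22.1, 23.4, 24.0, 21.8, 23.9, 22.7, 23.5, 24.2]
winter = [5.1, 4.8, 6.0, 5.5, 5.2, 4.9]
summer = [5.9, 5.6, 6.8, 6.1, 6.0, 5.7]
group1 = [170, 168, 175, 172, 169, 171, 174]
group2 = [180, 178, 182, 179, 181, 177, 183]

# One-sample t-test against the reference value mu0 = 20
t1, p1 = stats.ttest_1samp(sample, 20)

# Paired t-test: the same unit measured twice (winter vs summer)
t2, p2 = stats.ttest_rel(winter, summer)

# Independent-samples t-test; equal_var=False is the heteroscedastic
# (Welch) variant, equal_var=True assumes homoscedasticity
t3, p3 = stats.ttest_ind(group1, group2, equal_var=False)
```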
2. To show frequencies in the output, select Observed and «Expected» in the crosstabs
menu
3. If the condition «expected frequencies > 5» is violated, select Method Exact (this means
«run Fisher’s exact test»). Then, use this test’s p-value shown in the output file. Or merge
columns.
4. If both attributes have only two different values, the cross tab will have four cases. This
reduces the df to 1, and the Yates correction will be carried out. The p-value shown in the
row «continuity correction» will then be the one to be used.
Chi-square test for two different sample spaces
H0: π 1=π 2, meaning proportions are the same
HA: π 1 ≠ π 2, meaning proportions are different
Samples: two samples with sample size n1, n2 and observed proportions p1, p2
            Success       No success
Sample 1    n1·p1         n1·(1−p1)
Sample 2    n2·p2         n2·(1−p2)
Error risk: α
Decision rule: if p ≤ α, reject H0; if p > α, retain H0
SPSS: χ²-test with continuity correction yields the bilateral p-value
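The two-sample chi-square test with Yates continuity correction can be sketched with scipy (the counts are hypothetical: 45 of 100 successes in sample 1, 30 of 100 in sample 2):

```python
from scipy.stats import chi2_contingency

# Hypothetical 2x2 table: success / no success in two samples
table = [[45, 55],   # sample 1
         [30, 70]]   # sample 2

# correction=True applies the Yates continuity correction (df = 1 case),
# matching the SPSS row «continuity correction»
chi2, p, df, expected = chi2_contingency(table, correction=True)
```

Check the `expected` array against the rule «expected frequencies > 5» before trusting the p-value; otherwise use Fisher's exact test.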
4. Linear Regression
Conditions: Ei ~ N(0, σE²)
1. Ei must follow a normal distribution with expectation 0 and variance σE².
2. The Ei's have the same variance across all observations i = 1 through n.
3. The Ei's don't influence each other, that is, they are independent RVs.
Yi = β0 + β1·xi(1) + β2·xi(2) + … + βm·xi(m) + Ei, with i = 1…n, where the unknown regression parameters β0, β1, β2, …, βm are to be estimated. Assume the error terms Ei follow a normal distribution with expectancy E(Ei) = 0 and unknown variance σE². Assume the error variables are independent.
Point estimate for the variance: sE² = 1/(n−m−1) · Σ(i=1…n) ei²
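A minimal sketch of this point estimate in Python (the data points are made up and lie almost exactly on y = 2x, so sE² comes out small):

```python
import numpy as np

# Point estimate s_E^2 = (1/(n-m-1)) * sum(e_i^2) after an OLS fit
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])  # roughly y = 2x

X = np.column_stack([np.ones_like(x), x])       # intercept + one regressor (m = 1)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ coef
n, m = len(y), X.shape[1] - 1
s2 = residuals @ residuals / (n - m - 1)        # unbiased estimate of sigma_E^2
```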
Adjusted R² states what proportion of the variance of the dependent variable is accounted for by the model.
Backward stepwise regression
1. Run regression with complete model
2. If a p-value is bigger than α, the regressor with the highest p-value must be dropped (only one at a time)
3. Run the regression again, but without the regressor from Step 2
4. Repeat Steps 2 & 3 until all regressors have a p-value smaller than α
5. Select the model whose regressors all have a significant impact on the dependent variable
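The steps above can be sketched as a small backward-elimination loop using plain numpy/scipy OLS (a sketch, not SPSS's implementation; in the demo the noise vector is constructed to be orthogonal to all columns, so the superfluous regressor x2 has an exactly zero coefficient and is guaranteed to be dropped):

```python
import numpy as np
from scipy import stats

def backward_stepwise(X, y, alpha=0.05):
    """Backward elimination sketch: drop the regressor with the highest
    p-value (one at a time) until all remaining p-values are below alpha.
    X: (n, m) matrix WITHOUT intercept column; returns kept column indices."""
    kept = list(range(X.shape[1]))
    while kept:
        Xd = np.column_stack([np.ones(len(y)), X[:, kept]])
        n, k = Xd.shape
        beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
        resid = y - Xd @ beta
        s2 = resid @ resid / (n - k)                  # residual variance
        cov = s2 * np.linalg.inv(Xd.T @ Xd)           # covariance of estimates
        t = beta / np.sqrt(np.diag(cov))
        p = 2 * stats.t.sf(np.abs(t), n - k)          # two-sided p-values
        worst = int(np.argmax(p[1:]))                 # skip the intercept
        if p[1:][worst] <= alpha:
            break                                     # all regressors significant
        kept.pop(worst)                               # drop the least significant
    return kept

# Deterministic demo: y depends on x1 only; the noise vector is made
# orthogonal to all columns, so x2's fitted coefficient is exactly zero
x1 = np.arange(1.0, 9.0)
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0])
A = np.column_stack([np.ones(8), x1, x2])
e0 = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
e = e0 - A @ np.linalg.lstsq(A, e0, rcond=None)[0]
e *= 0.01 / np.linalg.norm(e)
y = 5.0 * x1 + e
kept = backward_stepwise(np.column_stack([x1, x2]), y)  # x2 is dropped
```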
4.4 Transformations

Exponential model: y = b̃ · β̃^x
We linearize by logarithmising: ln(y) = ln(b̃) + x·ln(β̃), i.e. z = b + β·x, with z = ln(y), b = ln(b̃), β = ln(β̃)
SPSS:
1. Logarithmise the data for yi => obtain zi
2. Run a regression with zi as dependent and xi as independent variable
3. b̃ and β̃ can be calculated by retransforming the estimates b and β obtained from the linear regression: b̃ = e^b & β̃ = e^β

Power model: y = b̃ · x^β
We linearize by logarithmising: ln(y) = ln(b̃) + β·ln(x), i.e. z = b + β·w, with z = ln(y), w = ln(x), b = ln(b̃)
SPSS:
1. Logarithmise the data for xi => obtain wi; logarithmise the data for yi => obtain zi
2. Run a regression with zi as dependent and wi as independent variable
3. b̃ can be obtained by retransforming the estimate b of the linear regression: b̃ = e^b --- THIS is the constant (not beta)
Interpretation: if x goes up by 1%, y changes by about β%
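The exponential-model transformation above can be sketched end to end in Python (the data are generated from a hypothetical exact model y = 2 · 1.5^x, so the retransformed estimates recover the parameters):

```python
import numpy as np

# Exponential model y = b~ * beta~^x: fit z = ln(y) on x,
# then retransform b~ = e^intercept and beta~ = e^slope
x = np.arange(1.0, 9.0)
y = 2.0 * 1.5 ** x                       # exact exponential data (hypothetical)

z = np.log(y)                            # 1. logarithmise y
slope, intercept = np.polyfit(x, z, 1)   # 2. linear regression of z on x
b_tilde = np.exp(intercept)              # 3. retransform: the model's constant
beta_tilde = np.exp(slope)
```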
SPSS commands

Binomial tests: Analyze > Nonparametric Tests > Legacy Dialogs > Binomial
  Note: when using “Get from data”, Test proportion = α and all data must first be sorted in descending order.
  When using “Cut point”, Test proportion = 1−α and the data must not be sorted.
  For a bilateral binomial test: use α_modified = α/2, then compare the p-value with it.
p-value: Analyze > Nonparametric Tests > Legacy Dialogs > Binomial
Boxplot: Analyze > Descriptive Statistics > Explore
Chi-square test: Analyze > Descriptive Statistics > Crosstabs > Statistics > Chi-Square
t-test (one-sample test, σ unknown): Analyze > Compare Means > One-Sample T Test
  Note: Test Value is the reference value that we are comparing our data to.
Linear regression: Analyze > Regression > Linear
  Select the confidence interval under Statistics.
  Under Save: Mean when looking for the expectancy of the market share at a given price x0; Individual when looking for an extra realization at price x0.
t-test (two-samples test, paired): Analyze > Compare Means > Paired-Samples T Test
Testing the linear regression model: Analyze > Regression > Linear > Plots, with Y = *ZRESID and X = *ZPRED
t-test (two independent samples): Analyze > Compare Means > Summary Independent-Samples T Test
Multiple linear regression: Analyze > Regression > Linear, then select Backward
Excel commands
A metric variable can be measured with a well-defined unit of measurement. It can be discrete or continuous.
Discrete variable:
o It can only take a countable number of values on a scale (e.g. a family can have 1 or 2 children, but not 1.3)
Continuous variable:
o It can take any value on an interval (e.g. weight, height, time)
Box plot
Top of box = 75th percentile
Bottom of box = 25th percentile
Cross = mean
Line in the box = median
Points = outliers