Central Limit Theorem (CLT): the mean (or sum) of many independent random variables tends to follow a normal distribution.

Hypothesis testing: take n random samples from the population; since the sample mean is approximately normally distributed, we can use statistics of the n samples (sigma, SE, etc.) to estimate the population.

https://www.khanacademy.org/math/probability/statistics-inferential/hypothesis-testing/v/z-statistics-vs--t-statistics
If the observed value is normally distributed, then under the null hypothesis the z-statistic has a standard normal distribution.

The further the z-statistic is from zero, the less likely such an extreme value is under the standard normal distribution, and the stronger the evidence that the null hypothesis is false.

The P-value of the test is the chance of getting a test statistic as extreme as, or more extreme than, the one observed (the z-value), assuming the null hypothesis is true.
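The p-value definition above can be computed directly from the standard normal CDF. A minimal stdlib-only sketch in Python (the function name `p_value_two_sided` is chosen here for illustration):

```python
from statistics import NormalDist

def p_value_two_sided(z):
    # Chance of a z-statistic at least this extreme (in either tail)
    # under the standard normal distribution assumed by the null hypothesis.
    return 2 * (1 - NormalDist().cdf(abs(z)))

print(p_value_two_sided(1.0))  # ≈ 0.3173
print(p_value_two_sided(2.0))  # ≈ 0.0455
```

Note that a z-statistic of 2 already pushes the two-sided p-value below the usual 5% threshold.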
About 95% of a normal population lies within mean ± 2 sd; this corresponds to a 95% confidence level.

SE = sigma / sqrt(n); the 95% margin of error is roughly 2 * SE.
The CLT says we can infer the population from a sample, with a quantified probability: that probability is the confidence level.

So we draw a sample and get x-bar (the sample mean) and s (the sample standard deviation). The population mean u and standard deviation sigma can then be inferred via a z-test (sample size > 30) or a t-test.

Z = (x-bar - u) / (s / sqrt(n))

Z is in (-1, 1) with probability approximately 0.68
Z is in (-2, 2) with probability approximately 0.95

So if you relax Z to 2, you have very probably (95%) pinned down u by asserting that it must lie within a range; however, the range could be coarse.
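The interval implied by |Z| < 2 can be sketched in Python (stdlib only; the sample numbers below are hypothetical, chosen just to make the arithmetic visible):

```python
import math

# Hypothetical sample statistics (illustrative values, not real data).
n = 100        # sample size (> 30, so the normal approximation is reasonable)
x_bar = 68.0   # sample mean
s = 5.0        # sample standard deviation

se = s / math.sqrt(n)   # standard error of the mean
z = 2                   # |Z| < 2 holds with probability ~0.95

lower, upper = x_bar - z * se, x_bar + z * se
print(f"~95% CI for u: ({lower:.2f}, {upper:.2f})")  # (67.00, 69.00)
```

A larger n shrinks the SE and therefore the interval, which is exactly why the range can be coarse for small samples.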
Ex: U.S. life expectancy: mean 68, sd (sigma) 5, so about 95% of people live between 58 and 78 (68 ± 2·5).

z = (observed − expected) / standard_error
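As a numeric check on the life-expectancy example above, the share of a normal population within mean ± 2 sd can be computed from the error function (Python stdlib; `normal_cdf` is a helper defined here, not a library call):

```python
import math

def normal_cdf(x, mu, sigma):
    # CDF of a Normal(mu, sigma) distribution via the error function.
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

mu, sigma = 68, 5   # life-expectancy example: mean 68, sd 5
p = normal_cdf(78, mu, sigma) - normal_cdf(58, mu, sigma)
print(f"P(58 < X < 78) = {p:.4f}")  # ≈ 0.9545
```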
A P-value of less than 5% is statistically significant: the difference is unlikely to be explained by chance alone, so we reject the null hypothesis.

A P-value of more than 5% is not statistically significant: the difference could plausibly be explained by chance, so we don't have enough evidence to reject the null hypothesis.
Comparing confidence intervals from t and z:
One may be tempted to think that the confidence interval based on the t statistic is always wider than the one based on the z statistic, since t* > z*. However, the standard error SE for the t interval depends on the sample standard deviation s, which is variable and can sometimes be small enough to offset the difference.
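A small numeric illustration of this point, assuming z* from the standard normal and the standard t-table value t* = 2.262 for df = 9; the two sample standard deviations below are hypothetical:

```python
import math
from statistics import NormalDist

z_star = NormalDist().inv_cdf(0.975)  # ≈ 1.96, two-sided 95% z critical value
t_star = 2.262                        # two-sided 95% t critical value, df = 9

n = 10
s_for_z, s_for_t = 5.0, 4.0  # hypothetical sample standard deviations

width_z = 2 * z_star * s_for_z / math.sqrt(n)  # z-interval width
width_t = 2 * t_star * s_for_t / math.sqrt(n)  # t-interval width

print(f"z width: {width_z:.3f}, t width: {width_t:.3f}")
# Despite t* > z*, the t interval is narrower here because its s is smaller.
```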
## Confidence Level : how close the sample is to the population. The confidence level is the probability that z = diff(sample, population)/SE is near 0, e.g. Prob(−1 < (p − p̂)/SE < 1) ≈ 0.68.

Claim: 42 likes out of 100 surveyed, with a margin of error of about 9%. This is a 95% CI.

1. The sample mean approximates the true mean, with precision determined by the sample size n, the population std sigma, and the chosen confidence level.
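The 42-out-of-100 claim can be reproduced approximately with the standard error of a proportion (stdlib-only sketch):

```python
import math

n = 100
p_hat = 42 / n   # sample proportion of likes

se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of a proportion
margin = 1.96 * se                       # ~95% margin of error

print(f"margin of error = {margin:.3f}")
print(f"95% CI: ({p_hat - margin:.3f}, {p_hat + margin:.3f})")
```

This gives a margin of about 0.097, roughly the ~9% error quoted in the claim.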
# note: '-' is not valid inside R variable names, so use '_'
no_belt = c(65963, 4000, 2642, 303)
chisq.test(data.frame(yes_belt, no_belt))
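`chisq.test` above does the whole test in R; the chi-square statistic itself can also be computed by hand. A Python sketch on a small hypothetical 2x2 table (the counts are made up, not the seat-belt data):

```python
# Chi-square test of independence, computed by hand (stdlib only).
# Hypothetical 2x2 table: rows = belt use, cols = outcome.
observed = [[10, 20],
            [30, 40]]

row_tot = [sum(r) for r in observed]
col_tot = [sum(c) for c in zip(*observed)]
total = sum(row_tot)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        expected = row_tot[i] * col_tot[j] / total  # count if independent
        chi2 += (o - expected) ** 2 / expected

print(f"chi-square statistic: {chi2:.4f}")
# For df = 1 the 5% critical value is about 3.84; a chi2 below that
# means we cannot reject independence.
```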
## ANOVA

F-dist: the distribution of the ratio of two variances, i.e. a ratio of two (scaled) chi-square variables; this is the distribution used in ANOVA.

Analysis of variance. For two samples, hypothesis testing uses the t test to get a p-value and decide whether to reject the null hypothesis; ANOVA generalizes the comparison to more than two groups via the F test.
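The F statistic as a ratio of two variance estimates (between-group over within-group) can be sketched as follows; the three groups below are hypothetical data:

```python
# One-way ANOVA F statistic computed by hand (stdlib only).
groups = [[4, 5, 6], [6, 7, 8], [9, 10, 11]]  # hypothetical measurements

k = len(groups)                   # number of groups
n = sum(len(g) for g in groups)   # total observations
grand_mean = sum(sum(g) for g in groups) / n

# Between-group and within-group sums of squares.
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

ms_between = ss_between / (k - 1)  # variance estimate between groups
ms_within = ss_within / (n - k)    # variance estimate within groups
f_stat = ms_between / ms_within    # ratio of the two variances

print(f"F = {f_stat:.3f}")  # F = 19.000
```

A large F means the spread between group means is large relative to the spread within groups, which is evidence against the null hypothesis that all group means are equal.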