Big Data Analysis
Analysis Method
- Mathematical Method -
Curve-fitting, Curve-linear,
Non-linear Model, Linear Model,
Probability Theory, Simulator
It is not statistical analysis
Author
Kuan-Sian Wang
Mei-Yu Lee
2015/6/15
Announcement
Big data analysis is an important method applied in most fields. We wish to share
our research so far with anyone who is interested. It is our honor to contribute to
academic research on big data, and we hope to share our results freely with the whole
world and thereby introduce more correct analytical methods for the future.
Contents
Preface
Chapter 1. Basic analysis method
1.1. The frequency distribution table cannot analyze big data
1.2. Assuming the population is normally distributed is not a good idea
1.3. Hypothesis testing is not an analysis method for big data
Chapter 2. The population distribution test and the population mean and variance test
2.1. The population distribution test
2.2. One population mean and population variance test
2.3. Two independent population means and population variances test
2.4. Two dependent population means and population variances test
Chapter 3. The population proportion test
3.1. One population proportion test
3.2. Two independent population proportion test
Chapter 4. One way analysis
4.1. One way model
4.2. $\alpha_i = 0$, $i = 1, 2, \ldots, k$
4.3. $\alpha_i \neq 0$, $i = 1, 2, \ldots, k$
4.4. $\alpha_i \neq 0$, $i = 1, 2, \ldots, k$, and the error distribution is the Arcsin distribution
4.5. $\alpha_i \neq 0$, $i = 1, 2, \ldots, k$, and the error distribution of each category has a specific probability distribution
4.6. $\alpha_i = 0$, $i = 1, 2, \ldots, k$, and the error distribution of each category has a specific probability distribution
4.7. $\alpha_i = 0$, $i = 1, 2, \ldots, k$
Chapter 5. Simple linear model
5.1. Simple linear analysis
5.2. The parabola model analysis; the three basic assumptions are unchanged
5.3. Comparison of a Normal-distributed and an Arcsin-distributed independent variable; the three basic assumptions are unchanged
5.4. The error probability distribution is not the normal distribution; the other basic assumptions are unchanged
5.5. The error variances are not equal; the other basic assumptions are unchanged
5.6. The independent variable has a shifted exponential distribution and the model is non-linear; the three basic assumptions are unchanged
5.7. The random variable range is restricted to a specific region; the three basic assumptions are unchanged
5.8. The 3rd basic assumption is modified: the error follows the Durbin-Watson first-order autoregressive error model
Chapter 6. The general linear model and non-linear model
6.1. Multiple regression analysis
6.2. High collinearity; the other assumptions are unchanged
6.3. The probability distributions of the independent variable and the error are not normal; the other assumptions are unchanged
6.4. Non-linear model; the other assumptions are unchanged
6.5. Non-linear model where the independent variable is a sample statistic; the other assumptions are unchanged
6.6. A dummy variable is one of the independent variables; the other assumptions are unchanged
6.7. An endogenous variable in the linear model; the other assumptions are unchanged
Chapter 7. Multi-variate analysis using linear model
Appendix 1. The common probability distributions
Appendix 2. The Curve-linear of linear model analysis
Appendix 3. The mathematical formula of non-linear model analysis
Appendix 4. The limiting theory of the cumulative probability distribution function
Appendix 5. An application of Dow Jones
Appendix 6. The estimation of Cos model analysis
Appendix 7. The population of Logistic distribution
Appendix 8. The critical values of Logistic distribution
Appendix 9. The transformation of probability distribution by the simulator
Appendix 10. One way analysis when the error distribution is arcsin
Appendix 11. The errors and residuals when the distribution of the errors is shifted-exponential
Appendix 12. The critical values from two population means test of arcsin and semi-circle
Appendix 13. The critical values of Zr statistic
Preface
Big data is population data, and its analytical method belongs to mathematical
methods. The amount of data is huge, which makes the characteristics of big data very
hard to obtain. Before big data analysis can begin, the computer software must provide
the following functions:
(1) The curve-fitting method: it can formulate the pattern of big data.
(2) The probability distribution transformation simulator: it can generate any kind of
probability distribution and carry out transformations of probability distributions.
(3) SLLN software: it can analyze the central limit theorem and the law of large
numbers.
(4) The curve-linear method: it can find the relationship between two random variables,
one of which is a mathematical combination of many variables.
At present, statistical analysis is the usual tool for big data; however, this is an
incorrect approach. Statistics uses part of the data of a population to infer the
characteristics of that population. Big data is not a part of the population data but
the population itself, so statistical analysis is not the proper analysis tool for big data.
For ease of understanding, this book orders its chapters and methods in the same way
as a Statistics textbook. There are 36 examples for studying the difference between
statistical analysis and big data analysis. Readers can use the numerical output to
understand big data analysis skills.
Statistical analysis methods and theory cannot analyze big data. In particular, the
sampling distribution of a test statistic cannot be obtained if the population is not
normally distributed. Computing the critical values of the test statistic is then
always a problem, and the result of a hypothesis test does not answer the real
question. Small sample data can indeed be analyzed by statistical analysis, which
gives information about the assumed population distribution. Statistical analysis is
not suitable when the population itself is the big data.
Big data analysis belongs to the analysis methods of probability distributions. The
following courses are necessary to understand the process of big data analysis:
1) probability theory, 2) advanced calculus, 3) matrices, 4) mathematical statistics,
5) linear models. Big data analysis is not as easy as statistical analysis, and its
process is also not easy to learn. In general, an accurate analysis method always
relies on mathematical methods.
The computer software was designed and coded by the author. It includes a statistical
analysis package, a probability distribution transformation simulator, the sampling
distributions of test statistics and residuals, and the sampling distributions of the
Durbin-Watson test and the LM test. The software can run and analyze both small sample
data and big data.
The contents include 36 examples as follows.
Chapter 1 Basic analysis method
Section 1 The frequency distribution table cannot analyze big data
Example 1, $X_1 \sim \mathrm{Normal}(\mu_{X_1} = 0, \sigma^2_{X_1} = 2^2)$,
Section 2 Assuming the population is normally distributed is not a good idea
Example 2, The population is the shifted exponential distribution,
$X \sim \mathrm{Shifted\_exponential}(\lambda_X, c_X)$; the sample mean and the sample variance.
Section 3 Hypothesis testing is not an analysis method for big data
Example 3, $X_1 \sim \mathrm{Normal}(\mu_{X_1} = 100, \sigma^2_{X_1} = 10^2)$, simulated sample of size n,
n=500,000,000, hypothesis test.
Chapter 2 The population distribution test and the population mean and variance test
Section 1 The population distribution test
Example 4, Population is Normal(0,1), n=100, goodness of fit test.
Example 5, Population is U_quadratic(0,1) + U_quadratic(0,1), simulated sample data of
size 100,000,000, the curve-fitting method.
Section 2 One population mean and population variance test
Example 6, Population is the Logistic distribution, population mean=100, population
variance=4, simulated 100 samples.
Section 3 Two independent population means and population variances test
Example 7, 1st population is the Arcsin distribution, population mean=100, population
variance=25, simulated 50 samples.
2nd population is the Semi-circle distribution, population mean=100, population
variance=25, simulated 50 samples.
The two populations are independent.
Example 8, 1st population is the Arcsin distribution, population mean=100, population
variance=25, simulated 60,000,000 samples.
2nd population is the Semi-circle distribution, population mean=100, population
variance=25, simulated 60,000,000 samples.
The two populations are independent.
Let $X_1$ be the data set of the 1st population and $X_2$ the data set of the 2nd
population; both sample sizes are big data.
Example 9, 1st population is the Normal distribution, population mean=100, population
variance=25, simulated 20 samples.
2nd population is the Normal distribution, population mean=100, population
variance=9, simulated 15 samples.
The two populations are independent.
Section 4 Two dependent population means and population variances test
Example 10, 1st population is the Double exponential distribution, population
mean=100, population variance=8,
$X_1 \sim \mathrm{Double\ exponential}(\lambda_{X_1} = 0.5, \mu_{X_1} = 100)$,
2nd population is $X_2$, with
$X_2 \mid x_1 \sim \mathrm{Double\ exponential}(\lambda_{X_2} = 0.5, \mu_{X_2} = x_1)$,
population mean=100, population variance=16.
The two populations are dependent; simulated 20 paired samples.
Example 11, 1st population is the Double exponential distribution, population
mean=100, population variance=8,
$X_1 \sim \mathrm{Double\ exponential}(\lambda_{X_1} = 0.5, \mu_{X_1} = 100)$,
2nd population is $X_2$, with
$X_2 \mid x_1 \sim \mathrm{Double\ exponential}(\lambda_{X_2} = 0.5, \mu_{X_2} = x_1)$,
population mean=100, population variance=16.
The two populations are dependent; simulated 60,000,000 paired samples.
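$$W_3 = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{p(1-p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \quad p = \frac{X_1 + X_2}{n_1 + n_2},$$
$$W_5 = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_1(1-\hat{p}_1)}{n_2}}},$$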
the $\alpha_i \neq 0$, $i = 1, 2, \ldots, k$,
The Arcsin population is divided into 5 categories,
Category 1 population, $X_1 \sim \mathrm{Arcsin}(\mu_1 = 5, c_1 = 10)$,
Category 2 population, $X_2 \sim \mathrm{Arcsin}(\mu_2 = 15, c_2 = 10)$,
Category 3 population, $X_3 \sim \mathrm{Arcsin}(\mu_3 = 25, c_3 = 10)$,
Category 4 population, $X_4 \sim \mathrm{Arcsin}(\mu_4 = 35, c_4 = 10)$,
Category 5 population, $X_5 \sim \mathrm{Arcsin}(\mu_5 = 45, c_5 = 10)$,
Each category has n sample data; the one way model is
$X_{ij} = \mu + \alpha_i + \varepsilon_{ij}$, $i = 1, 2, \ldots, 5$, $j = 1, \ldots, n$, $\mu = 25$,
$\sigma_\varepsilon^2 = 50$,
Section 5 $\alpha_i \neq 0$, $i = 1, 2, \ldots, k$, and the error distribution of each category
has a specific probability distribution.
Example 21, the $\alpha_i \neq 0$, $i = 1, 2, \ldots, k$,
The population is divided into 5 categories,
Category 1 population, $X_1 \sim \mathrm{Arcsin}(\mu_1 = 5, c_1 = 10)$,
Category 2 population, $X_2 \sim \mathrm{Normal}(\mu_2 = 15, \sigma_2^2 = 50)$,
Category 3 population, $X_3 \sim \mathrm{Semi\_circle}(\mu_3 = 25, R_3 = 200)$,
Category 4 population, $X_4 \sim \mathrm{DE}(\lambda_4 = 0.2, \mu_4 = 35)$,
Category 5 population, $X_5 \sim \mathrm{Triangular1}(\mu_5 = 45, c_5 = 10)$,
Each category has n sample data; the one way model is
$X_{ij} = \mu + \alpha_i + \varepsilon_{ij}$, $i = 1, 2, \ldots, 5$, $j = 1, \ldots, n$, $\mu = 25$,
$\alpha_1 = -20$, $\alpha_2 = -10$, $\alpha_3 = 0$, $\alpha_4 = 10$, $\alpha_5 = 20$,
$\varepsilon_{3j} \overset{iid}{\sim} \mathrm{Semi\_circle}(0, R_{\varepsilon_3} = 200)$, $\sigma_{\varepsilon_3}^2 = 50$, $\varepsilon_{4j} \overset{iid}{\sim} \mathrm{DE}(\lambda_{\varepsilon_4} = 0.2, 0)$,
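$X_{ij} = \mu + \alpha_i + \varepsilon_{ij}$, $i = 1, 2, \ldots, 5$, $j = 1, \ldots, n$, $\mu = 25$,
$\alpha_1 = -20$, $\alpha_2 = -10$, $\alpha_3 = 0$, $\alpha_4 = 10$, $\alpha_5 = 20$,
$\varepsilon_{3j} \overset{iid}{\sim} \mathrm{Semi\_circle}(0, R_{\varepsilon_3} = 200)$, $\sigma_{\varepsilon_3}^2 = 50$, $\varepsilon_{4j} \overset{iid}{\sim} \mathrm{DE}(\lambda_{\varepsilon_4} = 0.2, 0)$,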
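$X_1 \sim \mathrm{Normal}(\mu_X = 10, \sigma_X^2 = 1^2)$,
the population conditional expectation line is
$E(X_2 \mid x_1) = \beta_0 + \beta_1 x_1 = 1 + 2x_1$, $\varepsilon \sim \mathrm{Normal}(0, \sigma^2 = X_1^4)$,
$X_{2i} = \beta_0 + \beta_1 X_{1i} + \varepsilon_i$, $i = 1, 2, \ldots, n$, $\beta_0$ is the intercept, $\beta_1$ is the slope,
$\varepsilon_i$ is the error. The three basic assumptions are
i) $\varepsilon_i \sim$ shifted exponential distribution,
ii) $E(\varepsilon_i) = 0$, $\mathrm{Var}(\varepsilon_i) = \sigma^2$ is affected by $X_1$,
iii) $\varepsilon_1, \ldots, \varepsilon_n$ are independent.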
Section 6 The independent variable has a shifted exponential distribution and the
model is non-linear; the three basic assumptions are unchanged.
Example 28, $X_1 \sim \mathrm{Shifted\_exponential}(\lambda_{X_1} = 1, c_{X_1} = 0.1)$,
the population conditional expectation line is
$E(X_2 \mid x_1) = \beta_0 + \beta_1 (x_1 + \log(x_1)) = 1 + 2(x_1 + \log(x_1))$,
$\varepsilon \sim \mathrm{Normal}(0, \sigma^2 = 1)$,
$X_{2i} = \beta_0 + \beta_1 H(X_{1i}) + \varepsilon_i$, $i = 1, 2, \ldots, n$, $\beta_0$ is the intercept,
$\beta_1$ is the slope, $\varepsilon_i$ is the error.
The three basic assumptions are
i) $\varepsilon_i \sim$ Normal distribution, ii) $E(\varepsilon_i) = 0$, $\mathrm{Var}(\varepsilon_i) = \sigma^2$,
iii) $\varepsilon_1, \ldots, \varepsilon_n$ are independent.
Section 7 The random variable range is restricted to a specific region; the three
basic assumptions are unchanged.
Example 29, $X_1 \sim \mathrm{Normal}(\mu_{X_1} = 2, \sigma^2_{X_1} = 5^2)$,
the population conditional expectation line is
$E(X_2 \mid x_1) = \beta_0 + \beta_1 x_1 = 1 + 2x_1$, $\varepsilon \sim \mathrm{Normal}(0, \sigma^2 = 1)$,
$-20 \le X_1, X_2 \le 20$, $X_{2i} = \beta_0 + \beta_1 X_{1i} + \varepsilon_i$, $i = 1, 2, \ldots, n$.
The three basic assumptions are
i) $\varepsilon_i \sim$ Normal distribution, ii) $E(\varepsilon_i) = 0$, $\mathrm{Var}(\varepsilon_i) = \sigma^2$,
iii) $\varepsilon_1, \ldots, \varepsilon_n$ are independent.
Section 8 The 3rd basic assumption is modified: the error follows the Durbin-Watson
first-order autoregressive error model.
Example 30, Durbin-Watson model,
$X_1 \sim \mathrm{Normal}(\mu_{X_1} = 2, \sigma^2_{X_1} = 5^2)$,
the population conditional expectation line is
$E(X_2 \mid x_1) = \beta_0 + \beta_1 x_1 = 1 + 2x_1$,
$\mu_t \sim \mathrm{Normal}(0, \sigma^2 = 1)$, there are n paired samples, T=n.
$X_{2t} = \beta_0 + \beta_1 X_{1t} + \varepsilon_t$, $t = 1, 2, \ldots, T$,
$\beta_0$ is the intercept, $\beta_1$ is the slope, $\varepsilon_t$ is the error,
$\varepsilon_t = \rho \varepsilon_{t-1} + \mu_t$, $t = 1, 2, 3, \ldots, T$, $\varepsilon_0 = 0$, $|\rho| < 1$, let $\rho = 0.5$.
The three basic assumptions are
i) $\mu_t \sim$ Normal distribution, ii) $E(\mu_t) = 0$, $\mathrm{Var}(\mu_t) = \sigma^2$,
iii) $\mu_1, \ldots, \mu_T$ are independent.
other assumptions are unchanged.
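Example 34,
$X_1, X_2, \ldots, X_{10} \overset{iid}{\sim} \mathrm{Normal}(\mu_{X_i} = 100, \sigma^2_{X_i} = 25)$,
$X_{11} = \text{sample Mid\_range}(X_1, X_2, \ldots, X_{10}) + \varepsilon$,
$\varepsilon \sim \mathrm{Normal}(\mu_\varepsilon = 0, \sigma_\varepsilon^2 = 16)$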
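Section 6 A dummy variable is one of the independent variables; the other assumptions
are unchanged.
Example 35,
Dummy=0,
$X_1 \sim \mathrm{Normal}(E(X_1) = 100, \mathrm{Var}(X_1) = 25)$,
$X_2 \mid x_1 \sim \mathrm{Normal}(E(X_2 \mid x_1) = 50 + 2x_1, \mathrm{Var}(X_2 \mid x_1) = 1)$,
$\varepsilon \sim \mathrm{Normal}(E(\varepsilon) = 0, \mathrm{Var}(\varepsilon) = 16)$,
$X_3 \mid \text{Dummy} = 0, x_1, x_2 = 50 + 2x_1 + 3x_2 + \varepsilon$
Dummy=1,
$X_1 \sim \mathrm{Normal}(E(X_1) = 100, \mathrm{Var}(X_1) = 25)$,
$X_2 \mid x_1 \sim \mathrm{Normal}(E(X_2 \mid x_1) = 50 + 2x_1, \mathrm{Var}(X_2 \mid x_1) = 1)$,
$\varepsilon \sim \mathrm{Normal}(E(\varepsilon) = 0, \mathrm{Var}(\varepsilon) = 16)$,
$X_3 \mid \text{Dummy} = 1, x_1, x_2 = 10 + x_1 + 5x_2 + \varepsilon$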
Section 7 An endogenous variable in the linear model; the other assumptions are
unchanged.
Example 36,
$X_2(t+1) = \beta_0 + \beta_1 X_1(t) + \beta_2 X_3(t) + \beta_3 X_4(t) + \varepsilon_1(t)$,
$X_1(t+1) = \alpha_0 + \alpha_1 X_2(t+1) + \alpha_2 X_3(t+1) + \alpha_3 X_4(t+1) + \varepsilon_2(t+1)$,
$X_3(t) \sim \mathrm{Normal}(\mu = 10, \sigma^2 = 4)$,
$X_4(t) \sim \mathrm{Normal}(\mu = 30 + 2X_3, \sigma^2 = 25)$,
Curve-fitting,
(4) The multi-variate analysis is substituted by non-linear analysis,
(4.1) Conclusion
(5) The mathematical model,
(6) Confirming the mathematical model using the probability distribution simulator,
appendix 9.2,
$X_1 \sim \mathrm{Shifted\_exponential}(\lambda_1 = 1, c_1 = 0)$,
$X_2 \sim \mathrm{DEl}(\lambda_2 = 1, \mu_2 = 0)$,
$X_1$ and $X_2$ are independent random variables,
appendix 9.3, $X_1 \sim \mathrm{Arcsin}(0, 1)$, $X_2 \mid x_1 \sim \mathrm{Uniform}(-x_1^2, x_1^2)$,
$f_{X_1}(x_1) = \frac{1}{\pi \sqrt{1 - x_1^2}}$, $-1 < x_1 < 1$, $f_{X_2 \mid x_1}(x_2 \mid x_1) = \frac{1}{2x_1^2}$, $|x_2| \le x_1^2$,
$X_1$ and $X_2$ are not independent random variables,
appendix 9.4,
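$X_1, X_2 \overset{iid}{\sim} \mathrm{Uniform}(-1, 1)$, $f_{X_i}(x_i) = 0.5$, $-1 < x_i < 1$, $i = 1, 2$,
$$W_1 = \mathrm{MAD} = \frac{\sum_{i=1}^{10} |X_i - \bar{X}|}{10}, \quad W_2 = S = \sqrt{\frac{\sum_{i=1}^{10} (X_i - \bar{X})^2}{9}}.$$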
Appendix 10 One way analysis, the sampling distribution of the test
statistic when the error distribution is the arcsin distribution.
Appendix 10.1) k=5, n=5,
Appendix 10.2) k=5, n=100,
Appendix 10.3) k=5, n=1000,
Chapter 1. Basic analysis method
The frequency distribution table is a method of arranging data; the process involves
the class number, the frequency of each class, and the class limits. The class number
formula is $k = \log_2(n) + 1$, where k is the class number and n is the sample size;
when n=100,000,000, k=26.
These 26 classes cannot convey the characteristics of a data set that has 100,000,000
records.
For accuracy, the probability method is a good method for big data.
Note: Big data is not a closed set, so Curve-linear analysis can be useful; please refer
to Appendix 5.
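The class-number formula above can be checked numerically. The following is a minimal sketch in Python; the helper name `class_number` is hypothetical and is not part of the author's software. Evaluated without rounding, the formula gives about 27.6 for n = 100,000,000, while the text reports 26 classes, so the rounding convention used by the software is not stated here and is left as an open assumption.

```python
import math


def class_number(n: int) -> float:
    """Class number k = log2(n) + 1 from the formula in the text (unrounded)."""
    return math.log2(n) + 1


n_big = 100_000_000
k_big = class_number(n_big)
print(f"n = {n_big:,}: k = {k_big:.3f}")

# Average number of records that fall into one class when 26 classes are
# used, as in the text. Each class then summarizes millions of records.
records_per_class = n_big / 26
print(f"records per class with 26 classes: {records_per_class:,.0f}")
```

Whatever the rounding, a few dozen classes summarize a hundred million records, which is the point the paragraph makes.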
Example 1, $X_1 \sim \mathrm{Normal}(\mu_{X_1} = 0, \sigma^2_{X_1} = 2^2)$, simulated sample of size n.
(1.1) n=1000, frequency distribution table,
X1 frequency distribution table
class class limit class midpoint frequency relative frequency cumulative frequency
[ 1 ] -6.31382~ -4.88201 -5.59792 10.00000 0.0100000 0.0100000
[ 2 ] -4.88201~ -3.45020 -4.16610 34.00000 0.0340000 0.0440000
[ 3 ] -3.45020~ -2.01839 -2.73429 128.00000 0.1280000 0.1720000
[ 4 ] -2.01839~ -0.58657 -1.30248 231.00000 0.2310000 0.4030000
[ 5 ] -0.58657~ 0.84524 0.12933 279.00000 0.2790000 0.6820000
[ 6 ] 0.84524~ 2.27705 1.56115 197.00000 0.1970000 0.8790000
[ 7 ] 2.27705~ 3.70886 2.99296 84.00000 0.0840000 0.9630000
[ 8 ] 3.70886~ 5.14068 4.42477 27.00000 0.0270000 0.9900000
[ 9 ] 5.14068~ 6.57249 5.85658 10.00000 0.0100000 1.0000000
frequency distribution: sample mean=-0.075416 , sample variance=4.355512 , sample sd=2.086986
[ 24 ] 9.36532~ 10.25902 9.81217 141.00000 0.0000014 0.9999998
[ 25 ] 10.25902~ 11.15272 10.70587 15.00000 0.0000001 1.0000000
[ 26 ] 11.15272~ 12.04642 11.59957 1.00000 0.0000000 1.0000000
frequency distribution: sample mean=-0.000169 , sample variance=4.066784 , sample sd=2.016627
The distribution function estimated line ------
F(X)= 0.08435101807117462200+
-0.08860223740339279200*(X- -1.82400258924639000000)^1+
0.25061420723795891000*(X- -1.82400258924639000000)^2+
-0.04219520930200815200*(X- -1.82400258924639000000)^3+
value range 0.3000003052<=F(x)<= 0.4000000000 ,
value range -0.5240759524<=X<= -0.2532618458 ,
Error=0.000000490106430389 MAX=0.000035887307207494 coefficient of
determination=0.999999820467032170,
The distribution function estimated line ------
F(X)= -1.24017958471085880000+
1.87075669982004910000*(X- -1.32396818420487010000)^1+
-0.58725876218522899000*(X- -1.32396818420487010000)^2+
0.08198952173552243000*(X- -1.32396818420487010000)^3+
-0.00428836389892239820*(X- -1.32396818420487010000)^4+
value range 0.9000003052<=F(x)<= 0.9999996948 ,
value range 1.2814384883<=X<= 5.0553297197 ,
Error=0.000012818821521072 MAX=0.001163414315560996 coefficient of
determination=0.999991132738400010
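The fitted pieces printed above are ordinary polynomials in (X - center), so they can be re-evaluated by hand. The following is a minimal sketch in Python that evaluates two of the printed pieces at the ends of their stated X ranges and compares the results with the stated F(x) ranges. The function name `fitted_cdf` is hypothetical, and the sketch assumes the center -1.82400258924639 applies to every term of the first piece.

```python
def fitted_cdf(x, center, coeffs):
    """Evaluate F(x) = sum_j coeffs[j] * (x - center)**j."""
    d = x - center
    return sum(a * d ** j for j, a in enumerate(coeffs))


# Piece reported for 0.3000003052 <= F(x) <= 0.4 (degree 3).
# Assumption: the same center is used in all three powers.
CENTER_A = -1.82400258924639
COEFFS_A = [0.084351018071174622, -0.088602237403392792,
            0.25061420723795891, -0.042195209302008152]

# Piece reported for 0.9000003052 <= F(x) <= 0.9999996948 (degree 4).
CENTER_B = -1.3239681842048701
COEFFS_B = [-1.2401795847108588, 1.8707566998200491,
            -0.58725876218522899, 0.08198952173552243,
            -0.0042883638989223982]

f_a_low = fitted_cdf(-0.5240759524, CENTER_A, COEFFS_A)   # lower end of X range
f_a_high = fitted_cdf(-0.2532618458, CENTER_A, COEFFS_A)  # upper end of X range
f_b_low = fitted_cdf(1.2814384883, CENTER_B, COEFFS_B)    # lower end of X range

print(f"piece A: F(low) = {f_a_low:.6f}, F(high) = {f_a_high:.6f}")
print(f"piece B: F(low) = {f_b_low:.6f}")
```

The values land close to 0.3, 0.4 and 0.9 respectively, within the MAX error reported for each piece.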
The image of estimated line
The probability distribution of big data is the population distribution, and the
characteristics of big data are the characteristics of the population. In statistics,
the population distribution is usually assumed to be the normal distribution. In fact,
the population distribution does not need to be set to a specific probability distribution.
The methods for finding the population distribution are
i) Curve-fitting, ii) SLLN (strong law of large numbers), iii) Curve-linear.
In big data, the curve-fitting method is more important than statistical analysis, and
finding the probability distribution of the big data is the first step in analyzing it.
Example 2, The population is the shifted exponential distribution,
$X \sim \mathrm{Shifted\_exponential}(\lambda_X, c_X)$,
$f_X(x) = \lambda_X \exp(-\lambda_X (x - c_X))$, $x > c_X$,
$E(X) = \mu_X = \frac{1}{\lambda_X} + c_X$, $\mathrm{Var}(X) = \sigma_X^2 = \frac{1}{(\lambda_X)^2}$,
so $\mu_X = \sigma_X + c_X$ is a function of $\sigma_X$.
Let $X \sim \mathrm{Shifted\_exponential}(\lambda_X = 1, c_X = -1)$,
$E(X) = \mu_X = \frac{1}{\lambda_X} + c_X = 0$, $\mathrm{Var}(X) = \sigma_X^2 = 1$, the sample size is n.
$Y_1 = \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$, the sample mean, $Y_2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$, the sample variance,
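The sampling distributions of Y1 and Y2 can be approximated by simulation. The following is a minimal sketch in Python using only the standard library, with the parameters of this example (lambda = 1, c = -1, so the population mean is 0 and the population variance is 1). The function names, the seed and the number of replicates are illustrative assumptions; this is not the author's simulator.

```python
import random


def draw_sample(rng, n, lam=1.0, c=-1.0):
    """Draw n values with density lam * exp(-lam * (x - c)) for x > c."""
    return [c + rng.expovariate(lam) for _ in range(n)]


def simulate_y1_y2(n, replicates, seed=12345):
    """Return simulated sample means (Y1) and sample variances (Y2)."""
    rng = random.Random(seed)
    y1, y2 = [], []
    for _ in range(replicates):
        sample = draw_sample(rng, n)
        mean = sum(sample) / n
        var = sum((x - mean) ** 2 for x in sample) / (n - 1)
        y1.append(mean)
        y2.append(var)
    return y1, y2


def average(values):
    return sum(values) / len(values)


y1, y2 = simulate_y1_y2(n=30, replicates=20_000)
mean_y1 = average(y1)
var_y1 = average([(v - mean_y1) ** 2 for v in y1])
mean_y2 = average(y2)

print(f"Y1: mean = {mean_y1:.5f}, variance = {var_y1:.5f}")
print(f"Y2: mean = {mean_y2:.5f}")
```

With n = 30 the simulated mean of Y1 should be near 0, its variance near 1/30, and the mean of Y2 near 1. More replicates tighten the agreement.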
(2.1)n=30,
f Y1 ( y1 ), FY1 ( y1 ) Coefficient
Mathematical Mean: 0.00001
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.03333
S.D. : 0.18256
Skewed Coef. : 0.36519
Kurtosis Coef. : 3.19965
MAD : 0.14527
Range : 2.07844
Mid_range : 0.28129
Median : -0.01107
Q1 : -0.12844
Q2 : -0.01107
Q3 : 0.11640
IQR : 0.24483
C.V. : none
f Y2 ( y2 ), FY2 ( y2 ) Coefficient
Mathematical Mean: 1.00003
Geometrical Mean : 0.88771
Harmonic Mean : 0.78723
Variance : 0.26920
S.D. : 0.51884
Skewed Coef. : 1.75228
Kurtosis Coef. : 9.28194
MAD : 0.38430
Range : 13.85028
Mid_range : 6.96980
Median : 0.88990
Q1 : 0.64057
Q2 : 0.88990
Q3 : 1.23306
IQR : 0.59249
C.V. : 0.51883
(2.2)n=200,
f Y1 ( y1 ), FY1 ( y1 ) Coefficient
Mathematical Mean: 0.00000
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.00500
S.D. : 0.07071
Skewed Coef. : 0.14138
Kurtosis Coef. : 3.03089
MAD : 0.05640
Range : 0.80854
Mid_range : 0.06479
Median : -0.00167
Q1 : -0.04855
Q2 : -0.00167
Q3 : 0.04675
IQR : 0.09531
C.V. : none
f Y2 ( y2 ), FY2 ( y2 ) Coefficient
Mathematical Mean: 1.00003
Geometrical Mean : 0.98071
Harmonic Mean : 0.96187
Variance : 0.04008
S.D. : 0.20021
Skewed Coef. : 0.67754
Kurtosis Coef. : 3.93750
MAD : 0.15714
Range : 2.86882
Mid_range : 1.77018
Median : 0.97946
Q1 : 0.85848
Q2 : 0.97946
Q3 : 1.11862
IQR : 0.26015
C.V. : 0.20021
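Several entries in the coefficient tables above can be cross-checked against standard closed-form results for independent samples: the variance of the sample mean is sigma^2/n, its skewness is the population skewness divided by sqrt(n), and the variance of the sample variance is (mu4 - sigma^4 (n-3)/(n-1))/n. The following is a minimal sketch in Python. It assumes lambda = 1 as in this example and uses the known facts that an exponential distribution has skewness 2 and fourth central moment 9 sigma^4; the function names are hypothetical.

```python
import math

SIGMA2 = 1.0               # population variance when lambda = 1
SKEWNESS = 2.0             # skewness of an exponential distribution
MU4 = 9.0 * SIGMA2 ** 2    # fourth central moment of an exponential distribution


def var_sample_mean(n):
    """Variance of the sample mean of n independent observations."""
    return SIGMA2 / n


def skew_sample_mean(n):
    """Skewness of the sample mean of n independent observations."""
    return SKEWNESS / math.sqrt(n)


def var_sample_variance(n):
    """Variance of the unbiased sample variance (divisor n - 1)."""
    return (MU4 - (n - 3) / (n - 1) * SIGMA2 ** 2) / n


for n in (30, 200):
    print(f"n = {n}: Var(Y1) = {var_sample_mean(n):.5f}, "
          f"Skew(Y1) = {skew_sample_mean(n):.5f}, "
          f"Var(Y2) = {var_sample_variance(n):.5f}")
```

These closed forms agree with the tabulated variance and skewness of Y1 and the tabulated variance of Y2 for n = 30 and n = 200 to about three decimal places.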
The following is the goodness of fit test (Pearson chi-square test statistic); 20 basic
probability distributions can be selected as the null hypothesis probability
distribution.
(2.3)n=30,
pearson goodness of fit
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ]
lower limit -0.96036 -0.74116 -0.45856 -0.06026 0.62065
upper limit -0.74116 -0.45856 -0.06026 0.62065
observed no 8.00000 4.00000 5.00000 6.00000 7.00000
probability 0.20000 0.20000 0.20000 0.20000 0.20000
expected no 6.00000 6.00000 6.00000 6.00000 6.00000
chi square 0.66667 0.66667 0.16667 0.00000 0.16667
degree of freedom=2
H0: X1~Shifted exponential(lamda,c), lamda,c are unknown
lamda point estimated value=1.017983 (MLE)
c point estimated value=-0.960361 (MLE)
pearson chi-square test statistic =1.666667
p-value=0.434500
lamda value from 0.848319 to 1.272478
c value from -0.826382 to -1.094340
pearson goodness of fit
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ]
lower limit -0.96036 -0.74116 -0.45856 -0.06026 0.62065
upper limit -0.74116 -0.45856 -0.06026 0.62065
observed no 8.00000 4.00000 5.00000 6.00000 7.00000
probability 0.20000 0.20000 0.20000 0.20000 0.20000
expected no 6.00000 6.00000 6.00000 6.00000 6.00000
chi square 0.66667 0.66667 0.16667 0.00000 0.16667
degree of freedom=2
H0: X1~Shifted exponential(lamda=1.017983,c=-0.960361),
pearson chi-square test statistic =1.666667
p-value=0.434500
Population is Shifted exponential(lamda=1.017983,c=-0.960361).
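The n = 30 output above can be reproduced step by step. The following is a minimal sketch in Python, not the author's program. It assumes the five classes are equal-probability classes whose limits come from the inverse CDF of the fitted shifted exponential, and it uses the closed-form chi-square survival function exp(-x/2), which is valid only for 2 degrees of freedom. The function names are hypothetical.

```python
import math


def shifted_exp_quantile(p, lam, c):
    """Inverse CDF of the shifted exponential, F(x) = 1 - exp(-lam * (x - c))."""
    return c - math.log(1.0 - p) / lam


def pearson_chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


lam_hat, c_hat = 1.017983, -0.960361   # MLEs reported in the output
observed = [8, 4, 5, 6, 7]             # observed counts from the table
n, k = sum(observed), len(observed)

# Lower class limits at cumulative probabilities 0, 0.2, 0.4, 0.6, 0.8.
lower_limits = [shifted_exp_quantile(i / k, lam_hat, c_hat) for i in range(k)]
expected = [n / k] * k

stat = pearson_chi_square(observed, expected)
df = k - 1 - 2                         # two parameters were estimated
p_value = math.exp(-stat / 2.0)        # survival function for df = 2 only

print("lower limits:", [round(x, 5) for x in lower_limits])
print(f"chi-square = {stat:.6f}, df = {df}, p-value = {p_value:.4f}")
```

Under these assumptions the computed class limits, test statistic and p-value match the printed output to the displayed precision.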
(2.4) n=200,
pearson goodness of fit
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ] [ 8 ]
lower limit -0.99517 -0.86123 -0.70661 -0.52374 -0.29991 -0.01136 0.39534 1.09060
upper limit -0.86123 -0.70661 -0.52374 -0.29991 -0.01136 0.39534 1.09060
observed no 23.00000 20.00000 28.00000 24.00000 23.00000 34.00000 26.00000 22.00000
probability 0.12500 0.12500 0.12500 0.12500 0.12500 0.12500 0.12500 0.12500
expected no 25.00000 25.00000 25.00000 25.00000 25.00000 25.00000 25.00000 25.00000
chi square 0.16000 1.00000 0.36000 0.04000 0.16000 3.24000 0.04000 0.36000
degree of freedom=5
H0: X1~Shifted exponential(lamda,c), lamda,c are unknown
lamda point estimated value=0.996968 (MLE)
c point estimated value=-0.995168 (MLE)
pearson chi-square test statistic =5.360000
p-value=0.373500
correction:
expected number>=5 in each cell, the frequency table is adjusted
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ]
lower limit -0.99517 -0.19477 0.60563 1.40604 2.20644
upper limit -0.19477 0.60563 1.40604 2.20644 5.40804
observed no 104.00000 58.00000 23.00000 8.00000 7.00000
probability 0.54976 0.24752 0.11145 0.05018 0.04109
expected no 109.95195 49.50479 22.28905 10.03543 8.21878
chi square 0.32219 1.45781 0.02268 0.41283 0.18073
degree of freedom=2
pearson chi-square test statistic =2.396247
p-value=0.301700
Population is Shifted exponential(lamda=0.996968,c=-0.995168).
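The adjusted five-class table for n = 200 can be recomputed from the fitted distribution. The following is a minimal sketch in Python, assuming the class limits are taken from the table as printed and the last class is treated as open-ended (this assumption reproduces the tabulated probability 0.04109). The helper name `cdf` is hypothetical.

```python
import math

lam_hat, c_hat = 0.996968, -0.995168   # MLEs reported in the output
lower_limits = [-0.99517, -0.19477, 0.60563, 1.40604, 2.20644]
observed = [104, 58, 23, 8, 7]
n = sum(observed)


def cdf(x):
    """CDF of the fitted shifted exponential, zero at or below c_hat."""
    return 1.0 - math.exp(-lam_hat * max(x - c_hat, 0.0))


# Probability of each class: CDF differences, with an open-ended last class.
probs = [cdf(hi) - cdf(lo) for lo, hi in zip(lower_limits, lower_limits[1:])]
probs.append(1.0 - cdf(lower_limits[-1]))

expected = [n * p for p in probs]
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
p_value = math.exp(-stat / 2.0)        # chi-square survival function, df = 2 only

print("probabilities:", [round(p, 5) for p in probs])
print("expected     :", [round(e, 5) for e in expected])
print(f"chi-square = {stat:.6f}, p-value = {p_value:.4f}")
```

Every expected count is now at least 5, which is the condition the correction note refers to.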
(2.5) n=100,000,000, it is big data, goodness of fit(Pearson chi square test statistic)
and the probability distribution.
pearson goodness of fit
class   lower limit  upper limit  observed no     probability  expected no     chi square
[ 1 ]   -1.00000     -0.94871     4999364.00000   0.05000      5000000.00000   0.08090
[ 2 ]   -0.94871     -0.89465     4996823.00000   0.05000      5000000.00000   2.01867
[ 3 ]   -0.89465     -0.83749     5004706.00000   0.05000      5000000.00000   4.42929
[ 4 ]   -0.83749     -0.77688     4999628.00000   0.05000      5000000.00000   0.02768
[ 5 ]   -0.77688     -0.71234     4999942.00000   0.05000      5000000.00000   0.00067
[ 6 ]   -0.71234     -0.64335     5001463.00000   0.05000      5000000.00000   0.42807
[ 7 ]   -0.64335     -0.56925     5001842.00000   0.05000      5000000.00000   0.67859
[ 8 ]   -0.56925     -0.48922     5002197.00000   0.05000      5000000.00000   0.96536
[ 9 ]   -0.48922     -0.40221     4999556.00000   0.05000      5000000.00000   0.03943
[ 10 ]  -0.40221     -0.30691     4999314.00000   0.05000      5000000.00000   0.09412
[ 11 ]  -0.30691     -0.20156     4999025.00000   0.05000      5000000.00000   0.19012
[ 12 ]  -0.20156     -0.08379     4995225.00000   0.05000      5000000.00000   4.56013
[ 13 ]  -0.08379     0.04973      4999502.00000   0.05000      5000000.00000   0.04960
[ 14 ]  0.04973      0.20387      5000939.00000   0.05000      5000000.00000   0.17634
[ 15 ]  0.20387      0.38618      5000360.00000   0.05000      5000000.00000   0.02592
[ 16 ]  0.38618      0.60930      5000155.00000   0.05000      5000000.00000   0.00481
[ 17 ]  0.60930      0.89696      5000682.00000   0.05000      5000000.00000   0.09302
[ 18 ]  0.89696      1.30239      4997930.00000   0.05000      5000000.00000   0.85698
[ 19 ]  1.30239      1.99548      4999445.00000   0.05000      5000000.00000   0.06161
[ 20 ]  1.99548                   5001902.00000   0.05000      5000000.00000   0.72352
degree of freedom=17
H0: X1~Shifted exponential(lamda,c), lamda,c are unknown
lamda point estimated value=1.000084 (MLE), c point estimated value=-1.000000 (MLE)
pearson chi-square test statistic =15.504827 , p-value=0.559100
Population is Shifted exponential(lamda=1.000084,c=-1.000000).
coefficient of determination=0.999999983600273760
The image of estimated line
Big data is the entire population data; the population distribution is not assumed but
is obtained directly from the curve-fitting method.
1.3. Hypothesis testing is not an analysis method for big data
Hypothesis testing is a method of statistics; it obtains information about the
population from the test. The test result is not always true, only sometimes, and the
sampling distribution of the test statistic sometimes cannot be linked to a critical value.
Big data is population data, so it is not necessary to test the parameters of the
population. The characteristics of the population can be obtained directly from the
big data, and the result is real and correct.
System integration and analysis | It is impossible to do. | The probability distribution can be transferred once the mathematical model is set.
Simulator | Outputs the simulated sample data. | Simulates data according to the model and compares the simulated data with the real data.
Comparison of designed systems | It is impossible to do. | SLLN and the transfer of probability distributions.
Example 3, $X_1 \sim \mathrm{Normal}(\mu_{X_1} = 100, \sigma^2_{X_1} = 10^2)$, simulated sample of size n,
n=500,000,000, it is big data.
(3.1)Hypothesis and test
* Suppose the population distribution is the normal distribution.
1. one population mean test and mu confidence interval when population sigma is
unknown
H0: mu=0 , mu is population mean
t(df=499999999)=223600.346338
which formula is t=(X1 sample mean-0)/standard error
the standard error =sample stand deviation/(n-1)^0.5, n is sample size=500000000
left tail test p-value= 1.0000, right tail test p-value= 0.0000
two tailes test p-value= 0.0000
90% confidence interval for mu, [99.999350 , 100.000822]
95% confidence interval for mu, [99.999209 , 100.000963]
99% confidence interval for mu, [99.998934 , 100.001238]
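The confidence intervals above are extremely narrow because the standard error shrinks with the square root of the sample size. The following is a minimal sketch in Python of that arithmetic. It assumes a sample standard deviation of about 10 (the output does not print it) and uses normal quantiles, since a t distribution with 499,999,999 degrees of freedom is numerically indistinguishable from the normal. The function name `half_width` is hypothetical.

```python
import math
from statistics import NormalDist

n = 500_000_000
sample_sd = 10.0   # assumption: close to the population sigma = 10

# Standard error as quoted in the output: sample sd / (n - 1)^0.5.
standard_error = sample_sd / math.sqrt(n - 1)


def half_width(level):
    """Half-width of a two-sided confidence interval for the mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return z * standard_error


hw90, hw95, hw99 = half_width(0.90), half_width(0.95), half_width(0.99)
print(f"standard error = {standard_error:.3e}")
print(f"half-widths: 90% = {hw90:.6f}, 95% = {hw95:.6f}, 99% = {hw99:.6f}")
```

The half-widths agree with the printed intervals, which are each less than 0.0025 wide around 100.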
two tailes test p-value= 0.8474
$$\bar{X}_n \xrightarrow[n \to \infty]{a.s.} \mu = 100, \quad S_n^2 \xrightarrow[n \to \infty]{a.s.} \sigma^2 = 100,$$
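The almost sure convergence stated above can be illustrated by tracking the running sample mean and sample variance of simulated Normal(100, 10^2) data. The following is a minimal sketch in Python; it is not the author's SLLN software, and the function name, seed, checkpoints and sample size are illustrative assumptions chosen to keep the run short.

```python
import random


def running_estimates(n, mu=100.0, sigma=10.0, seed=2015):
    """Return {sample size: (sample mean, sample variance)} at checkpoints."""
    rng = random.Random(seed)
    checkpoints = {100, 10_000, n}
    total = 0.0
    total_sq = 0.0
    results = {}
    for i in range(1, n + 1):
        x = rng.gauss(mu, sigma) - mu      # center at mu for numerical stability
        total += x
        total_sq += x * x
        if i in checkpoints:
            mean_dev = total / i
            variance = (total_sq - i * mean_dev ** 2) / (i - 1)
            results[i] = (mu + mean_dev, variance)
    return results


estimates = running_estimates(200_000)
for size in sorted(estimates):
    mean, variance = estimates[size]
    print(f"n = {size:>7}: mean = {mean:.4f}, variance = {variance:.4f}")

final_mean, final_variance = estimates[200_000]
```

As n grows, the estimates settle near 100 and 100. With n = 500,000,000 as in the example, the deviations would be far smaller still.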
Pr(| X1 distribution F() - X2 distribution F()|< 0.0100000000)= 1.000000
Pr(| X1 distribution F() - X2 distribution F()|< 0.0050000000)= 1.000000
Pr(| X1 distribution F() - X2 distribution F()|< 0.0010000000)= 1.000000
Pr(| X1 distribution F() - X2 distribution F()|< 0.0005000000)= 1.000000
Pr(| X1 distribution F() - X2 distribution F()|< 0.0001000000)= 1.000000
The probability limiting theory
E(| X1 distribution F() - X2 distribution F()|^2)= 0.0000000003
Pr(| X1 distribution F() - X2 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0500000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0100000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0050000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0010000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0005000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0001000000)= 0.000000
Red line is X1, blue line is X2.
value range 0.3000000020<=F(x)<= 0.4000000000 ,
value range 94.7556650642<=X<= 97.4667068903 ,
Error=0.000000088238904393 MAX=0.000014401248907725 coefficient of
determination=1.000000000000000000,
The comparison of estimated value and
the sample data.
The frequency distribution table is used: the specified population distribution
is converted into a table of k classes.
\chi^2_{df} = \sum_{i=1}^{k} \frac{(O_i - E(X_i))^2}{E(X_i)} = \sum_{i=1}^{k} \frac{(O_i - nP_{i0})^2}{nP_{i0}} > \chi^2_{\alpha,df}: reject the null hypothesis.
the distribution of big data.
Example 4. The population is Normal(0,1); a sample of size 100 is simulated,
(4.1) Normal(0,1) probability distribution,
Normal(0,1) Coefficient
Mathematical Mean: -0.00011
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.99994
S.D. : 0.99997
Skewed Coef. : -0.00004
Kurtosis Coef. : 3.00022
MAD : 0.79783
Range : 10.84608
Mid_range : -0.03259
Median : -0.00009
Q1 : -0.67455
Q2 : -0.00009
Q3 : 0.67426
IQR : 1.34881
C.V. : none
(4.2) Each of 20 kinds of probability distribution is assumed in turn as the population
distribution, and the goodness of fit test is done.
Pearson goodness of fit
class        [ 1 ]     [ 2 ]     [ 3 ]     [ 4 ]     [ 5 ]     [ 6 ]     [ 7 ]
lower limit  -inf      -1.17046  -0.60521  -0.17064  0.23492   0.66939   1.23442
upper limit  -1.17046  -0.60521  -0.17064  0.23492   0.66939   1.23442   +inf
observed no  12.00000  19.00000  11.00000  16.00000  9.00000   21.00000  12.00000
probability  0.14286   0.14286   0.14286   0.14286   0.14286   0.14286   0.14286
expected no  14.28571  14.28571  14.28571  14.28571  14.28571  14.28571  14.28571
chi square   0.36571   1.55571   0.75571   0.20571   1.95571   3.15571   0.36571
degree of freedom=4
H0: X1~Normal(mu,sigma*sigma), where mu and sigma are unknown
point estimate of the population mean (mu)=0.032257 (MLE, UMVUE)
point estimate of the population variance (sigma*sigma)=1.268638 (UMVUE)
Pearson chi-square test statistic=8.360000, p-value=0.079200
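The Pearson statistic and p-value reported above can be checked by hand from the observed counts. A minimal sketch: it assumes seven equal-probability classes (probability 1/7 each, as in the table) and uses the closed-form chi-square tail for 4 degrees of freedom, P(X > x) = exp(-x/2)(1 + x/2), so no statistics library is needed. Variable names are illustrative.

```python
import math

observed = [12, 19, 11, 16, 9, 21, 12]      # observed counts from the table
n = sum(observed)                            # 100 observations
k = len(observed)                            # 7 classes
expected = [n / k] * k                       # equal-probability classes

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = k - 1 - 2                               # two parameters (mu, sigma) estimated

# Chi-square survival function, valid for df = 4 only.
p_value = math.exp(-chi_square / 2.0) * (1.0 + chi_square / 2.0)
print(round(chi_square, 6), df, round(p_value, 4))
```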
f(x2),F(x2) Coefficient
Mathematical Mean: 0.03242
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.61572
S.D. : 1.27111
Skewed Coef. : 0.00010
Kurtosis Coef. : 2.18407
MAD : 1.06662
Range : 5.90724
Mid_range : 0.03241
Median : 0.03227
Q1 : -0.95217
Q2 : 0.03227
Q3 : 1.01697
IQR : 1.96914
C.V. : 39.20616
f(x3),F(x3) Coefficient
Mathematical Mean: 0.03227
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.26824
S.D. : 1.12616
Skewed Coef. : -0.00067
Kurtosis Coef. : 4.19418
MAD : 0.86076
Range : 17.14903
Mid_range : 0.03484
Median : 0.03235
Q1 : -0.64980
Q2 : 0.03235
Q3 : 0.71441
IQR : 1.36422
C.V. : 34.90190
(4.5)The comparison of two distribution functions,
X1~ Normal(0.032257, 1.268638),X2~Normal(0,1),Blue line
The probability limiting theory
E(| X1 distribution F() - X2 distribution F()|^2)= 0.0005151313
Pr(| X1 distribution F() - X2 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0500000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0100000000)= 0.795772
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0050000000)= 0.902920
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0010000000)= 0.981488
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0005000000)= 0.990881
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0001000000)= 0.998159
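The size of the gap between the two distribution functions can be explored numerically. This section does not state how the expectation and probabilities in the listing are weighted, so the sketch below is only an assumption: it evaluates |F1(x) - F2(x)| on an evenly spaced grid over [-6, 6] and reports the largest gap and the grid-averaged squared gap. The function name and grid are hypothetical.

```python
from statistics import NormalDist

def cdf_gap_summary(dist_a, dist_b, lo=-6.0, hi=6.0, steps=12_000):
    """Return (max gap, mean squared gap) of two CDFs on an even grid (assumed weighting)."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    gaps = [abs(dist_a.cdf(x) - dist_b.cdf(x)) for x in grid]
    return max(gaps), sum(g * g for g in gaps) / len(gaps)

fitted = NormalDist(mu=0.032257, sigma=1.268638 ** 0.5)   # estimated from the sample
reference = NormalDist(0.0, 1.0)                          # true population
max_gap, mean_sq_gap = cdf_gap_summary(fitted, reference)
print(round(max_gap, 4), round(mean_sq_gap, 6))
```

The largest gap stays below 0.05, which agrees with the listing's value of 0 for the 0.05 threshold; the averaged squared gap depends on the assumed grid and need not match the printed figure.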
X2~ Trapezoid(0.032257, 6.820000),X3~Normal(0,1)
The probability limiting theory
E(| X2 distribution F() - X3 distribution F()|^2)= 0.0040737700
Pr(| X2 distribution F() - X3 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X2 distribution F() - X3 distribution F()|>= 0.0500000000)= 0.638571
Pr(| X2 distribution F() - X3 distribution F()|>= 0.0100000000)= 0.929455
Pr(| X2 distribution F() - X3 distribution F()|>= 0.0050000000)= 0.961603
Pr(| X2 distribution F() - X3 distribution F()|>= 0.0010000000)= 0.990411
Pr(| X2 distribution F() - X3 distribution F()|>= 0.0005000000)= 0.995231
Pr(| X2 distribution F() - X3 distribution F()|>= 0.0001000000)= 0.999044
(5.2)The probability distribution
pdf,cdf Coefficient
Mathematical Mean: 1.00005
Geometrical Mean : 0.77150
Harmonic Mean : 0.42919
Variance : 0.30000
S.D. : 0.54773
Skewed Coef. : -0.00019
Kurtosis Coef. : 2.09515
MAD : 0.42858
Range : 1.99996
Mid_range : 1.00001
Median : 1.00002
Q1 : 0.63663
Q2 : 1.00002
Q3 : 1.36463
IQR : 0.72799
C.V. : 0.54770
value range 0.1000000100<=F(x)<= 0.2000000000 ,
value range 0.1975766814<=X<= 0.3655230421 ,
Error=0.000042976954628337 MAX=0.000338756703370580 coefficient of
determination=0.999984251538919790,
coefficient of determination=0.999999914843758720
4.32947254180908200000*tan((F(x)-0.5)*pi)^3+
67.88774108886718800000*tan((F(x)-0.5)*pi)^4+
589.52453613281250000000*tan((F(x)-0.5)*pi)^5+
2660.26855468750000000000*tan((F(x)-0.5)*pi)^6+
4830.36816406250000000000*tan((F(x)-0.5)*pi)^7+
0.450000<F(x)<=0.500000
Error=0.000000003958578990 MAX=0.000004794943887165
coefficient of determination=0.999999967588840240
Error=0.000283952128120614 MAX=0.001985943214733776
coefficient of determination=0.999927048293332570
The random variable value estimated line ------
X=-245.96403503417969000000+
913.20166015625000000000*tan((F(x)-0.5)*pi)^1+
-1283.43377685546870000000*tan((F(x)-0.5)*pi)^2+
793.42468261718750000000*tan((F(x)-0.5)*pi)^3+
-131.82031250000000000000*tan((F(x)-0.5)*pi)^4+
-67.65928649902343700000*tan((F(x)-0.5)*pi)^5+
23.61195373535156300000*tan((F(x)-0.5)*pi)^6+
0.750000<F(x)<=0.800000
Error=0.000618953406496594 MAX=0.003635112268771001
coefficient of determination=0.999923991490472620
The comparison of estimated value and
the sample data.
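H_0: \sigma = 2, \chi^2_{99} = \frac{(n-1)S^2}{4} = \frac{99 \times S^2}{4},
\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}, the sample mean; S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}, the sample variance,
(6.2) t_{99} = \frac{\bar{X} - 100}{S/\sqrt{n}} = \frac{\bar{X} - 100}{S/\sqrt{100}}, W2 = \frac{\bar{X} - 100}{S/\sqrt{100}}, it is the test statistic.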
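The two test statistics defined above can be computed directly from a sample. A minimal sketch: the function name and the four-point sample are hypothetical, chosen only so that the arithmetic is easy to follow.

```python
import math

def t_and_chi_square(sample, mu0, sigma0):
    """Return (t statistic for H0: mu = mu0, chi-square statistic for H0: sigma = sigma0)."""
    n = len(sample)
    mean = sum(sample) / n
    s2 = sum((x - mean) ** 2 for x in sample) / (n - 1)   # unbiased sample variance
    t_stat = (mean - mu0) / math.sqrt(s2 / n)
    chi2_stat = (n - 1) * s2 / sigma0 ** 2
    return t_stat, chi2_stat

# Illustrative data only: mean 101, S^2 = 20/3.
t_stat, chi2_stat = t_and_chi_square([98.0, 100.0, 102.0, 104.0], mu0=100.0, sigma0=2.0)
print(round(t_stat, 6), round(chi2_stat, 6))
```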
Mathematical Mean: -0.00002
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.02019
S.D. : 1.01004
Skewed Coef. : -0.00054
Kurtosis Coef. : 3.03977
MAD : 0.80462
Range : 11.20633
Mid_range : 0.02111
Median : 0.00001
Q1 : -0.67859
Q2 : 0.00001
Q3 : 0.67859
IQR : 1.35719
C.V. : none
student(df=99),
α 0.9 0.95 0.975 0.99 0.995
Critical value 1.2900 1.6610 1.9854 2.3651 2.6270
It can be seen that W2 is not exactly the student(df=99) distribution.
Z(standard normal)
α 0.9 0.95 0.975 0.99 0.995
Critical value 1.28 1.645 1.96 2.326 2.576
student(df=99) is not the Z distribution, but student(df) \to Z as df \to \infty.
Pr(| W2 distribution F() - W0 distribution F()|>= 0.0005000000)= 0.047223
Pr(| W2 distribution F() - W0 distribution F()|>= 0.0001000000)= 0.846550
W2 approaches t(df=99).
(6.3) \chi^2_{99} = \frac{(n-1)S^2}{4} = \frac{99 \times S^2}{4}, W3 = \frac{99 \times S^2}{4}, the test statistic.
Mathematical Mean: 99.00358
Geometrical Mean : 97.44302
Harmonic Mean : 95.89919
Variance : 314.93577
S.D. : 17.74643
Skewed Coef. : 0.50368
Kurtosis Coef. : 3.46742
MAD : 14.03831
Range : 202.31477
Mid_range : 135.86554
Median : 97.56694
Q1 : 86.48509
Q2 : 97.56694
Q3 : 109.94606
IQR : 23.46098
C.V. : 0.17925
W3 is not a symmetric distribution, P(\chi^2_{99} \le \chi^2_{1-\alpha,99}) = \alpha,
α 0.005 0.01 0.025 0.05 0.1
Critical value 60.995366 63.911996 68.402117 72.495428 77.480065
Pr(| W3 distribution F() - W0 distribution F()|< 0.0005000000)= 0.004866
Pr(| W3 distribution F() - W0 distribution F()|< 0.0001000000)= 0.001018
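\bar{X}_1 = \frac{\sum_{i=1}^{n_1} X_{1i}}{n_1}, \bar{X}_2 = \frac{\sum_{j=1}^{n_2} X_{2j}}{n_2}, the sample means,
S_1^2 = \frac{\sum_{i=1}^{n_1} (X_{1i} - \bar{X}_1)^2}{n_1 - 1}, S_2^2 = \frac{\sum_{j=1}^{n_2} (X_{2j} - \bar{X}_2)^2}{n_2 - 1}, the sample variances,
Pooled sample variance, S_{pool}^2 = \frac{\sum_{i=1}^{n_1} (X_{1i} - \bar{X}_1)^2 + \sum_{j=1}^{n_2} (X_{2j} - \bar{X}_2)^2}{n_1 + n_2 - 2},
\sigma_1 = \sigma_2 = \sigma,
H_0: \sigma = 5, \chi^2_{98} = \frac{(n_1 + n_2 - 2) S_{pool}^2}{25},
(7.2) t_{98} = \frac{\bar{X}_1 - \bar{X}_2}{S_{pool} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{\bar{X}_1 - \bar{X}_2}{S_{pool} \sqrt{\frac{1}{50} + \frac{1}{50}}}, W2 = \frac{\bar{X}_1 - \bar{X}_2}{S_{pool} \sqrt{\frac{1}{50} + \frac{1}{50}}},
It is the sampling distribution of the test statistic,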
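The pooled variance and the pooled two-sample t statistic can be sketched as follows. The function name and the tiny data sets are hypothetical and serve only to make the formula concrete; the example in the text uses n1 = n2 = 50.

```python
import math

def pooled_t(x1, x2):
    """Return (pooled variance, t statistic) for two independent samples with equal sigma."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    ss1 = sum((x - m1) ** 2 for x in x1)      # sum of squares, sample 1
    ss2 = sum((x - m2) ** 2 for x in x2)      # sum of squares, sample 2
    s2_pool = (ss1 + ss2) / (n1 + n2 - 2)
    t_stat = (m1 - m2) / math.sqrt(s2_pool * (1.0 / n1 + 1.0 / n2))
    return s2_pool, t_stat

s2_pool, t_pooled = pooled_t([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(s2_pool, round(t_pooled, 6))
```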
student(df=98),
α 0.9 0.95 0.975 0.99 0.995
Critical value 1.2897 1.66004 1.9837 2.3640 2.6258
Z(standard normal)
α 0.9 0.95 0.975 0.99 0.995
Critical value 1.28 1.645 1.96 2.326 2.576
student(df=98) is not Z, but student(df) \to Z as df \to \infty.
(7.3) F_{49,49} = \frac{S_1^2}{S_2^2} = W3, it is the test statistic,
Mathematical Mean: 1.02179
Geometrical Mean : 1.00517
Harmonic Mean : 0.98899
Variance : 0.03526
S.D. : 0.18778
Skewed Coef. : 0.68452
Kurtosis Coef. : 3.99789
MAD : 0.14708
Range : 2.65286
Mid_range : 1.70655
Median : 1.00245
Q1 : 0.88971
Q2 : 1.00245
Q3 : 1.13237
IQR : 0.24266
C.V. : 0.18378
W3, red line; W0~F distribution (df1=49, df2=49), blue line
E(| W3 distribution - W0 distribution |^2)= 0.0149602069
************ The | W3 distribution F() - W0 distribution F()| ****************
The almost surely limiting theory
E(| W3 distribution F() - W0 distribution F()|^2)= 0.0064238777
Pr(| W3 distribution F() - W0 distribution F()|< 0.1000000000)= 0.711018
Pr(| W3 distribution F() - W0 distribution F()|< 0.0500000000)= 0.282925
Pr(| W3 distribution F() - W0 distribution F()|< 0.0100000000)= 0.053264
Pr(| W3 distribution F() - W0 distribution F()|< 0.0050000000)= 0.026527
Pr(| W3 distribution F() - W0 distribution F()|< 0.0010000000)= 0.005327
Pr(| W3 distribution F() - W0 distribution F()|< 0.0005000000)= 0.002681
Pr(| W3 distribution F() - W0 distribution F()|< 0.0001000000)= 0.000556
The probability limiting theory
E(| W3 distribution F() - W0 distribution F()|^2)= 0.0064238777
Pr(| W3 distribution F() - W0 distribution F()|>= 0.1000000000)= 0.288982
Pr(| W3 distribution F() - W0 distribution F()|>= 0.0500000000)= 0.717075
Pr(| W3 distribution F() - W0 distribution F()|>= 0.0100000000)= 0.946736
Pr(| W3 distribution F() - W0 distribution F()|>= 0.0050000000)= 0.973473
Pr(| W3 distribution F() - W0 distribution F()|>= 0.0010000000)= 0.994673
Pr(| W3 distribution F() - W0 distribution F()|>= 0.0005000000)= 0.997319
Pr(| W3 distribution F() - W0 distribution F()|>= 0.0001000000)= 0.999444
W3 is not F(df1=49, df2=49).
W3 is not a symmetric distribution, P(\chi^2_{99} \le \chi^2_{1-\alpha,99}) = \alpha,
α 0.005 0.01 0.025 0.05 0.1
Critical value 76.197494 78.232165 81.220576 83.834295 86.890946
α 0.9 0.95 0.975 0.99 0.995
Critical value 109.234755 112.517459 115.387940 118.721108 121.007592
Comparison of the cumulative probability distribution functions of W3 and W0;
the analysis method is SLLN.
W3,Red line,W0~Chi square(df=99),Blue line
E(| W3 distribution - W0 distribution |^2)= 28.1950421877
************ The | W3 distribution F() - W0 distribution F()| ****************
The almost surely limiting theory
E(| W3 distribution F() - W0 distribution F()|^2)= 0.0065680926
Pr(| W3 distribution F() - W0 distribution F()|< 0.1000000000)= 0.693687
Pr(| W3 distribution F() - W0 distribution F()|< 0.0500000000)= 0.280758
Pr(| W3 distribution F() - W0 distribution F()|< 0.0100000000)= 0.053043
Pr(| W3 distribution F() - W0 distribution F()|< 0.0050000000)= 0.026300
Pr(| W3 distribution F() - W0 distribution F()|< 0.0010000000)= 0.005175
Pr(| W3 distribution F() - W0 distribution F()|< 0.0005000000)= 0.002556
Pr(| W3 distribution F() - W0 distribution F()|< 0.0001000000)= 0.000498
The probability limiting theory
E(| W3 distribution F() - W0 distribution F()|^2)= 0.0065680926
Pr(| W3 distribution F() - W0 distribution F()|>= 0.1000000000)= 0.306313
Pr(| W3 distribution F() - W0 distribution F()|>= 0.0500000000)= 0.719242
Pr(| W3 distribution F() - W0 distribution F()|>= 0.0100000000)= 0.946957
Pr(| W3 distribution F() - W0 distribution F()|>= 0.0050000000)= 0.973700
Pr(| W3 distribution F() - W0 distribution F()|>= 0.0010000000)= 0.994825
Pr(| W3 distribution F() - W0 distribution F()|>= 0.0005000000)= 0.997444
Pr(| W3 distribution F() - W0 distribution F()|>= 0.0001000000)= 0.999502
W3 is not chi square (df=98).
Mathematical Mean: 100.00098
Geometrical Mean : 99.87580
Harmonic Mean : 99.75063
Variance : 25.00367
S.D. : 5.00037
Skewed Coef. : -0.00028
Kurtosis Coef. : 1.49991
MAD : 4.50195
Range : 14.14214
Mid_range : 100.00000
Median : 100.00159
Q1 : 95.00027
Q2 : 100.00159
Q3 : 105.00150
IQR : 10.00123
C.V. : 0.05000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0500000000)= 0.346311
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0100000000)= 0.888842
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0050000000)= 0.944612
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0010000000)= 0.988972
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0005000000)= 0.994487
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0001000000)= 0.998902
X_1 and X_2 have different probability distributions.
Y2 = X 1 − X 2 ,
Mathematical Mean: 0.00005
Geometrical Mean : none
Harmonic Mean : none
Variance : 49.99462
S.D. : 7.07069
Skewed Coef. : -0.00009
Kurtosis Coef. : 2.37498
MAD : 5.78337
Range : 34.13656
Mid_range : -0.00033
Median : 0.00079
Q1 : -5.08802
Q2 : 0.00079
Q3 : 5.08761
IQR : 10.17563
C.V. : none
Y3 = X 1 × X 2 ,
Mathematical Mean: 9999.98883
Geometrical Mean : 9974.96383
Harmonic Mean : 9949.95429
Variance : 500650.64213
S.D. : 707.56671
Skewed Coef. : 0.10573
Kurtosis Coef. : 2.38218
MAD : 578.83920
Range : 3413.79617
Mid_range : 10070.67790
Median : 9977.05165
Q1 : 9485.48654
Q2 : 9977.05165
Q3 : 10503.52127
IQR : 1018.03473
C.V. : 0.07076
Y4 = Min( X 1 , X 2 ),
Mathematical Mean: 97.10863
Geometrical Mean : 97.02428
Harmonic Mean : 96.94127
Variance : 16.63579
S.D. : 4.07870
Skewed Coef. : 0.60474
Kurtosis Coef. : 2.39843
MAD : 3.42879
Range : 17.07097
Mid_range : 98.53558
Median : 96.21186
Q1 : 93.63726
Q2 : 96.21186
Q3 : 100.00155
IQR : 6.36429
C.V. : 0.04200
Y5 = Max( X 1 , X 2 ),
Mathematical Mean: 102.89098
Geometrical Mean : 102.80870
Harmonic Mean : 102.72501
Variance : 16.63924
S.D. : 4.07912
Skewed Coef. : -0.60492
Kurtosis Coef. : 2.39859
MAD : 3.42913
Range : 17.07099
Mid_range : 101.46443
Median : 103.78740
Q1 : 99.99853
Q2 : 103.78740
Q3 : 106.36321
IQR : 6.36468
C.V. : 0.03965
W1 = \frac{X_1 \times X_2}{X_1 + X_2} = \frac{1}{1/X_1 + 1/X_2},
Mathematical Mean: 49.93755
Geometrical Mean : 49.90619
Harmonic Mean : 49.87485
Variance : 3.13287
S.D. : 1.76999
Skewed Coef. : 0.06579
Kurtosis Coef. : 2.37060
MAD : 1.44915
Range : 8.53611
Mid_range : 49.98904
Median : 49.88532
Q1 : 48.66361
Q2 : 49.88532
Q3 : 51.21596
IQR : 2.55235
C.V. : 0.03544
Example 9. The 1st population is a Normal distribution with population mean=100 and
population variance=25; 20 samples are simulated.
The 2nd population is a Normal distribution with population mean=100 and
population variance=9; 15 samples are simulated.
The two populations are independent,
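H_0: \mu_1 = \mu_2, t_{df} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{S_1^2}{20} + \frac{S_2^2}{15}}}, df = \frac{\left(\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}\right)^2}{\frac{(S_1^2/n_1)^2}{n_1 - 1} + \frac{(S_2^2/n_2)^2}{n_2 - 1}},
S_1^2 = \frac{\sum_{i=1}^{n_1} (X_{1i} - \bar{X}_1)^2}{n_1 - 1}, S_2^2 = \frac{\sum_{j=1}^{n_2} (X_{2j} - \bar{X}_2)^2}{n_2 - 1}, the sample variances of the two populations.
(9.1) df = W5 = \frac{\left(\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}\right)^2}{\frac{(S_1^2/n_1)^2}{n_1 - 1} + \frac{(S_2^2/n_2)^2}{n_2 - 1}} is the estimated value,
(9.2) t_{df} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{S_1^2}{20} + \frac{S_2^2}{15}}}, W2 = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{S_1^2}{20} + \frac{S_2^2}{15}}}, the test statistic.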
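The unequal-variance t statistic and its estimated degrees of freedom can be sketched from summary statistics. This is an illustration only: the function name is hypothetical, the sample means are invented, and the population variances 25 and 9 are plugged in as if they were the observed sample variances.

```python
import math

def unequal_variance_t(mean1, var1, n1, mean2, var2, n2):
    """Return (t statistic, estimated df) for two independent samples with unequal variances."""
    a = var1 / n1
    b = var2 / n2
    t_stat = (mean1 - mean2) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t_stat, df

# Assumed inputs: variances set to the population values of Example 9.
t_welch, df_welch = unequal_variance_t(101.0, 25.0, 20, 100.0, 9.0, 15)
print(round(t_welch, 6), round(df_welch, 4))
```

Because the degrees of freedom depend on the sample variances, df is itself a random variable, which is why the text treats W5 as an estimated value.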
Mathematical Mean: -0.00002
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.06693
S.D. : 1.03292
Skewed Coef. : 0.00023
Kurtosis Coef. : 3.21413
MAD : 0.81728
Range : 14.32443
Mid_range : 0.38564
Median : -0.00002
Q1 : -0.68216
Q2 : -0.00002
Q3 : 0.68226
IQR : 1.36442
C.V. : none
student(df=27),
α 0.9 0.95 0.975 0.99 0.995
Critical value 1.3137 1.7033944 2.052 2.4726 2.7704
W2 is not student(df=27),
2.4. Two dependent population means and population variances test
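t_{n-1} = \frac{\bar{d}}{S_d/\sqrt{n}} = t_{19} = \frac{\bar{d}}{S_d/\sqrt{20}}, \bar{d} = \frac{\sum_{i=1}^{n} d_i}{n}, S_d^2 = \frac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n-1},
H_0: \rho(X_1, X_2) = \rho_0 = \sqrt{0.5},
Z_r = \frac{1}{2} \ln\frac{1+r}{1-r}, Z_{\rho_0} = \frac{1}{2} \ln\frac{1+\rho_0}{1-\rho_0},
Z test statistic (for n > 10) = \frac{Z_r - Z_{\rho_0}}{\sqrt{\frac{1}{n-3}}} = \frac{Z_r - Z_{0.70710678118}}{\sqrt{\frac{1}{17}}} = W9,
r = \frac{\sum_{i=1}^{n} (X_{1i} - \bar{X}_1)(X_{2i} - \bar{X}_2)}{\sqrt{\sum_{i=1}^{n} (X_{1i} - \bar{X}_1)^2 \sum_{i=1}^{n} (X_{2i} - \bar{X}_2)^2}}, \bar{X}_1 = \frac{\sum_{i=1}^{n} X_{1i}}{n}, \bar{X}_2 = \frac{\sum_{i=1}^{n} X_{2i}}{n},
Z_r = \frac{1}{2} \ln\frac{1+r}{1-r} is approximately normally distributed when n > 10, so the standardized statistic W9 approaches the standard normal distribution.
(10.1) t_{19} = \frac{\bar{d}}{S_d/\sqrt{20}} = W2, this is the test statistic,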
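Both statistics for dependent samples are short to compute. A minimal sketch with hypothetical function names and invented inputs: the paired t statistic works on the differences d_i, and the correlation test uses the Fisher transformation with variance 1/(n-3).

```python
import math

def paired_t(differences):
    """t statistic for H0: mean difference = 0, with df = n - 1."""
    n = len(differences)
    d_bar = sum(differences) / n
    s2_d = sum((d - d_bar) ** 2 for d in differences) / (n - 1)
    return d_bar / math.sqrt(s2_d / n)

def fisher_z_statistic(r, rho0, n):
    """Standardized Fisher z statistic for H0: rho = rho0."""
    z_r = 0.5 * math.log((1.0 + r) / (1.0 - r))
    z_0 = 0.5 * math.log((1.0 + rho0) / (1.0 - rho0))
    return (z_r - z_0) * math.sqrt(n - 3)

t_paired = paired_t([1.0, 2.0, 3.0, 4.0])                 # illustrative differences
w9 = fisher_z_statistic(r=0.8, rho0=math.sqrt(0.5), n=20)  # illustrative r
print(round(t_paired, 6), round(w9, 4))
```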
Mathematical Mean: 0.00091
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.10477
S.D. : 1.05108
Skewed Coef. : -0.00008
Kurtosis Coef. : 3.10022
MAD : 0.83664
Range : 15.32141
Mid_range : -0.19370
Median : 0.00102
Q1 : -0.70487
Q2 : 0.00102
Q3 : 0.70679
IQR : 1.41166
C.V. : none
student(df=19),
α 0.9 0.95 0.975 0.99 0.995
Critical value 1.3280 1.7293 2.0932 2.5388 2.8600
W2 is not student(df=19),
Comparison of the cumulative probability distribution functions of W2 and W0;
the analysis method is SLLN.
W2,Red line,W0~t (df=19),Blue line
E(| W2 distribution - W0 distribution |^2)= 0.0007868991
************ The | W2 distribution F() - W0 distribution F()| ****************
The almost surely limiting theory
E(| W2 distribution F() - W0 distribution F()|^2)= 0.0000138807
Pr(| W2 distribution F() - W0 distribution F()|< 0.1000000000)= 1.000000
Pr(| W2 distribution F() - W0 distribution F()|< 0.0500000000)= 1.000000
Pr(| W2 distribution F() - W0 distribution F()|< 0.0100000000)= 1.000000
Pr(| W2 distribution F() - W0 distribution F()|< 0.0050000000)= 0.752822
Pr(| W2 distribution F() - W0 distribution F()|< 0.0010000000)= 0.138949
Pr(| W2 distribution F() - W0 distribution F()|< 0.0005000000)= 0.066443
Pr(| W2 distribution F() - W0 distribution F()|< 0.0001000000)= 0.013042
The probability limiting theory
E(| W2 distribution F() - W0 distribution F()|^2)= 0.0000138807
Pr(| W2 distribution F() - W0 distribution F()|>= 0.1000000000)= 0.000000
Pr(| W2 distribution F() - W0 distribution F()|>= 0.0500000000)= 0.000000
Pr(| W2 distribution F() - W0 distribution F()|>= 0.0100000000)= 0.000000
Pr(| W2 distribution F() - W0 distribution F()|>= 0.0050000000)= 0.247178
Pr(| W2 distribution F() - W0 distribution F()|>= 0.0010000000)= 0.861051
Pr(| W2 distribution F() - W0 distribution F()|>= 0.0005000000)= 0.933557
Pr(| W2 distribution F() - W0 distribution F()|>= 0.0001000000)= 0.986958
W2 approaches t(df=19).
Mathematical Mean: 0.12932
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.55681
S.D. : 1.24772
Skewed Coef. : 0.08443
Kurtosis Coef. : 3.10007
MAD : 0.99178
Range : 14.31015
Mid_range : 0.23809
Median : 0.11060
Q1 : -0.71401
Q2 : 0.11060
Q3 : 0.95325
IQR : 1.66726
C.V. : 9.64807
W9 is not the Z distribution,
X 2 marginal probability
Mathematical Mean: 99.99976
Geometrical Mean : 99.91953
Harmonic Mean : 99.83890
Variance : 15.99534
S.D. : 3.99942
Skewed Coef. : 0.00014
Kurtosis Coef. : 4.49831
MAD : 2.99973
Range : 78.72053
Mid_range : 99.38273
Median : 99.99975
Q1 : 97.70688
Q2 : 99.99975
Q3 : 102.29224
IQR : 4.58536
C.V. : 0.03999
(11.2) Comparison of the cumulative probability distribution functions of X_1 and
X_2; the analysis method is SLLN.
X 1 ,Red line, X 2 ,Blue line
E(| X1 distribution - X2 distribution |^2)= 1.4460020756
************ The | X1 distribution F() - X2 distribution F()| ****************
The almost surely limiting theory
E(| X1 distribution F() - X2 distribution F()|^2)= 0.0046337057
Pr(| X1 distribution F() - X2 distribution F()|< 0.1000000000)= 1.000000
Pr(| X1 distribution F() - X2 distribution F()|< 0.0500000000)= 0.306545
Pr(| X1 distribution F() - X2 distribution F()|< 0.0100000000)= 0.049092
Pr(| X1 distribution F() - X2 distribution F()|< 0.0050000000)= 0.023699
Pr(| X1 distribution F() - X2 distribution F()|< 0.0010000000)= 0.004522
Pr(| X1 distribution F() - X2 distribution F()|< 0.0005000000)= 0.002241
Pr(| X1 distribution F() - X2 distribution F()|< 0.0001000000)= 0.000450
E(X1)= 99.9998, Var(X1)= 8.0009, E(X2)=99.9999, Var(X2)=16.0037,
Cov(X1,X2)= 8.0028, X1 and X2 correlation coefficient=0.7072.
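The printed moments are mutually consistent, which can be checked with the usual identities for the correlation coefficient and for the variance of a sum and a difference. The sketch below only re-uses the numbers printed above; the variable names are illustrative.

```python
import math

var_x1, var_x2, cov_x1_x2 = 8.0009, 16.0037, 8.0028   # values printed above

corr = cov_x1_x2 / math.sqrt(var_x1 * var_x2)
var_sum = var_x1 + var_x2 + 2.0 * cov_x1_x2           # Var(X1 + X2)
var_diff = var_x1 + var_x2 - 2.0 * cov_x1_x2          # Var(X1 - X2)
print(round(corr, 4), round(var_sum, 4), round(var_diff, 4))
```

The results are close to 40 and 8, the variances listed for Y1 = X1 + X2 and Y2 = X1 - X2 in (11.4).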
(11.4)The probability distribution transformation,
Y1 = X 1 + X 2 ,
Mathematical Mean: 200.00116
Geometrical Mean : 199.90092
Harmonic Mean : 199.80032
Variance : 40.00594
S.D. : 6.32502
Skewed Coef. : 0.00047
Kurtosis Coef. : 5.04787
MAD : 4.66678
Range : 140.09298
Mid_range : 198.22280
Median : 200.00043
Q1 : 196.52041
Q2 : 200.00043
Q3 : 203.48213
IQR : 6.96173
C.V. : 0.03162
Y2 = X 1 − X 2 ,
Mathematical Mean: -0.00017
Geometrical Mean : none
Harmonic Mean : none
Variance : 7.99976
S.D. : 2.82838
Skewed Coef. : -0.00107
Kurtosis Coef. : 5.99838
MAD : 2.00007
Range : 71.51912
Mid_range : 1.87186
Median : -0.00008
Q1 : -1.38633
Q2 : -0.00008
Q3 : 1.38652
IQR : 2.77285
C.V. : none
Y3 = Max( X 1 , X 2 ),
Mathematical Mean: 100.99972
Geometrical Mean : 100.94557
Harmonic Mean : 100.89166
Variance : 11.00200
S.D. : 3.31693
Skewed Coef. : 0.38404
Kurtosis Coef. : 5.33136
MAD : 2.42632
Range : 70.62467
Mid_range : 102.26586
Median : 100.71252
Q1 : 99.18867
Q2 : 100.71252
Q3 : 102.69491
IQR : 3.50624
C.V. : 0.03284
Y4 = Min( X 1 , X 2 ),
Mathematical Mean: 99.00029
Geometrical Mean : 98.94410
Harmonic Mean : 98.88718
Variance : 11.00050
S.D. : 3.31670
Skewed Coef. : -0.38328
Kurtosis Coef. : 5.33123
MAD : 2.42618
Range : 71.89303
Mid_range : 96.57137
Median : 99.28708
Q1 : 97.30518
Q2 : 99.28708
Q3 : 100.81114
IQR : 3.50596
C.V. : 0.03350
W2 = Max( X 1 , X 2 ) − Min( X 1 , X 2 ),
Mathematical Mean: 2.00007
Geometrical Mean : 1.12295
Harmonic Mean : 0.06645
Variance : 3.99947
S.D. : 1.99987
Skewed Coef. : 1.99940
Kurtosis Coef. : 8.99830
MAD : 1.47152
Range : 37.63142
Mid_range : 18.81571
Median : 1.38642
Q1 : 0.57545
Q2 : 1.38642
Q3 : 2.77257
IQR : 2.19712
C.V. : 0.99990
Note: please refer to Appendix 9.
Example 12. The population is B(1, p = 0.5) and n samples are simulated; the sum
of the sample is B(n, p = 0.5),
Pr(| X1 distribution F() - X2 distribution F()|< 0.0001000000)= 0.004118
The probability limiting theory
E(| X1 distribution F() - X2 distribution F()|^2)= 0.0010185097
Pr(| X1 distribution F() - X2 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0500000000)= 0.138500
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0100000000)= 0.737302
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0050000000)= 0.853907
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0010000000)= 0.964844
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0005000000)= 0.981575
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0001000000)= 0.995882
(12.2) n=31,
X_1 ~ Binomial(n = 31, p = 0.5), X_2 ~ Normal(\mu = np = 15.5, \sigma^2 = np(1-p) = \frac{31}{4}),
X_1 and X_2 are independent r.v.'s.
E(| X1 distribution - X2 distribution |^2)= 0.0839062936
************ The | X1 distribution F() - X2 distribution F()| ****************
The almost surely limiting theory
E(| X1 distribution F() - X2 distribution F()|^2)= 0.0009854525
Pr(| X1 distribution F() - X2 distribution F()|< 0.1000000000)= 1.000000
Pr(| X1 distribution F() - X2 distribution F()|< 0.0500000000)= 0.869215
Pr(| X1 distribution F() - X2 distribution F()|< 0.0100000000)= 0.268230
Pr(| X1 distribution F() - X2 distribution F()|< 0.0050000000)= 0.149426
Pr(| X1 distribution F() - X2 distribution F()|< 0.0010000000)= 0.035334
Pr(| X1 distribution F() - X2 distribution F()|< 0.0005000000)= 0.018888
Pr(| X1 distribution F() - X2 distribution F()|< 0.0001000000)= 0.004198
When n=30, the binomial distribution does not approach the standard normal
distribution, and the central limit theorem cannot be applied.
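The gap between the binomial distribution function and the matching normal distribution function can also be computed exactly rather than by simulation. This is a sketch under stated assumptions: the gap is evaluated only at the integer points k = 0, ..., n, no continuity correction is applied, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def max_cdf_gap(n, p=0.5):
    """Largest |F_binomial(k) - F_normal(k)| over k = 0..n, with matching mean and variance."""
    normal = NormalDist(n * p, math.sqrt(n * p * (1.0 - p)))
    log_p, log_q = math.log(p), math.log(1.0 - p)
    cdf, worst = 0.0, 0.0
    for k in range(n + 1):
        # Binomial pmf via log-gamma, so large n does not overflow or underflow.
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * log_p + (n - k) * log_q)
        cdf += math.exp(log_pmf)
        worst = max(worst, abs(cdf - normal.cdf(k)))
    return worst

gaps = {n: max_cdf_gap(n) for n in (31, 1000, 10000)}
for n, gap in gaps.items():
    print(n, round(gap, 5))
```

The largest gap shrinks roughly like 1/sqrt(n), because it is about half the probability mass at the mode; it falls between 0.05 and 0.1 for n=31 and below 0.005 for n=10000, in line with the thresholds at which the listings report probability 1.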
(12.3) n=1000,
X_1 ~ Binomial(n = 1000, p = 0.5), X_2 ~ Normal(\mu = np = \frac{1000}{2}, \sigma^2 = np(1-p) = \frac{1000}{4}),
X_1 and X_2 are independent r.v.'s.
E(| X1 distribution - X2 distribution |^2)= 0.0854899972
************ The | X1 distribution F() - X2 distribution F()| ****************
The almost surely limiting theory
E(| X1 distribution F() - X2 distribution F()|^2)= 0.0000309286
Pr(| X1 distribution F() - X2 distribution F()|< 0.1000000000)= 1.000000
Pr(| X1 distribution F() - X2 distribution F()|< 0.0500000000)= 1.000000
Pr(| X1 distribution F() - X2 distribution F()|< 0.0100000000)= 0.925866
Pr(| X1 distribution F() - X2 distribution F()|< 0.0050000000)= 0.601874
Pr(| X1 distribution F() - X2 distribution F()|< 0.0010000000)= 0.166524
Pr(| X1 distribution F() - X2 distribution F()|< 0.0005000000)= 0.091206
Pr(| X1 distribution F() - X2 distribution F()|< 0.0001000000)= 0.021299
When n=1000, the binomial distribution does not approach the standard normal
distribution, and the central limit theorem cannot be applied.
(12.4) n=10000,
X_1 ~ Binomial(n = 10000, p = 0.5), X_2 ~ Normal(\mu = np = \frac{10000}{2}, \sigma^2 = np(1-p) = \frac{10000}{4}),
X_1 and X_2 are independent r.v.'s.
E(| X1 distribution - X2 distribution |^2)= 0.0902553835
************ The | X1 distribution F() - X2 distribution F()| ****************
The almost surely limiting theory
E(| X1 distribution F() - X2 distribution F()|^2)= 0.0000031300
Pr(| X1 distribution F() - X2 distribution F()|< 0.1000000000)= 1.000000
Pr(| X1 distribution F() - X2 distribution F()|< 0.0500000000)= 1.000000
Pr(| X1 distribution F() - X2 distribution F()|< 0.0100000000)= 1.000000
Pr(| X1 distribution F() - X2 distribution F()|< 0.0050000000)= 1.000000
Pr(| X1 distribution F() - X2 distribution F()|< 0.0010000000)= 0.423546
Pr(| X1 distribution F() - X2 distribution F()|< 0.0005000000)= 0.243267
Pr(| X1 distribution F() - X2 distribution F()|< 0.0001000000)= 0.060481
When n=10000, the binomial distribution approaches the standard normal
distribution, and the central limit theorem can be applied.
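\phi_{\frac{\bar{X} - p}{\sqrt{p(1-p)/n}}}(t) = \left[ E\left( 1 - \frac{t^2}{2!\,n} + \frac{t^4}{4!\,n^2} - \frac{t^6}{6!\,n^3} + \cdots \right) \right]^n = \left[ E\left( 1 + \sum_{k=1}^{\infty} (-1)^k \frac{t^{2k}}{(2k)!\,n^k} \right) \right]^n
= \cos^n\left( \frac{t}{\sqrt{n}} \right) \to \exp\left( -\frac{t^2}{2} \right) as n \to \infty,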
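The limit of the characteristic function can be checked numerically. A minimal sketch: it evaluates cos(t/sqrt(n))^n at one illustrative point t = 1.5 (an arbitrary choice) and compares it with exp(-t^2/2); the function name is hypothetical.

```python
import math

def cf_standardized_sum(t, n):
    """Characteristic function cos(t / sqrt(n))^n of a standardized sum of n +/-1 variables."""
    return math.cos(t / math.sqrt(n)) ** n

t = 1.5
limit = math.exp(-t * t / 2.0)               # characteristic function of Normal(0,1)
errors = [abs(cf_standardized_sum(t, n) - limit) for n in (10, 100, 10_000)]
print([round(e, 6) for e in errors])
```

The error shrinks roughly in proportion to 1/n, so the convergence at a fixed t is fast even though the distribution itself stays discrete.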
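f_W(w) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \exp\left( -\frac{t^2}{2} \right) \exp(-itw) \, dt = \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{w^2}{2} \right), -\infty < w < \infty, W ~ Normal(0,1).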
\bar{X} = \hat{p} = \frac{\sum_{i=1}^{n} X_i}{n}, \sum_{i=1}^{n} X_i ~ Binomial(n, p); the sample proportion is a discrete random
variable.
\bar{X} is a discrete random variable with range 0 \le \bar{X} \le 1;
\frac{\bar{X} - p}{\sqrt{p(1-p)/n}} is a discrete random variable, but sometimes it behaves like a continuous
random variable.
P(Y^2 = 1) = 1: Y^2 has a point distribution; it is not a continuous random variable.
P(Y^{2k} = 1) = 1: Y^{2k} also has a point distribution, k = 1, 2, ..., \infty.
H_0: p = p_0, test statistic = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}, confidence interval formula = \frac{\hat{p} - p}{\sqrt{\hat{p}(1-\hat{p})/n}},
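The test statistic and the interval built from the second pivot can be sketched as follows. This is an illustration, not the authors' simulator: the function name and the counts (12 successes in 30 trials) are hypothetical, and the interval uses the normal critical value, which is exactly the approximation this chapter calls into question for small n.

```python
import math
from statistics import NormalDist

def proportion_test(successes, n, p0, level=0.95):
    """Return (z statistic for H0: p = p0, normal-approximation interval for p)."""
    p_hat = successes / n
    z_stat = (p_hat - p0) / math.sqrt(p0 * (1.0 - p0) / n)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z_crit * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return z_stat, (p_hat - half_width, p_hat + half_width)

z_stat, (p_low, p_high) = proportion_test(12, 30, 0.5)   # illustrative counts
print(round(z_stat, 6), round(p_low, 6), round(p_high, 6))
```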
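(13.1)
X_1 ~ Binomial(n = 30, p = 0.1), \hat{p} = \frac{X_1}{n},
W4 = \frac{\hat{p} - p}{\sqrt{p(1-p)/30}} = \frac{\hat{p} - 0.1}{\sqrt{0.1(1-0.1)/30}}, W5 = \frac{\hat{p} - p}{\sqrt{\hat{p}(1-\hat{p})/30}} = \frac{\hat{p} - 0.1}{\sqrt{\hat{p}(1-\hat{p})/30}},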
f W4 (w4 ), FW4 (w4 ) Coefficient
Mathematical Mean: 0.08089
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.89015
S.D. : 0.94348
Skewed Coef. : 0.66832
Kurtosis Coef. : 3.28358
MAD : 0.75072
Range : 8.52013
Mid_range : 3.04290
Median : 0.00000
Q1 : -0.60858
Q2 : 0.00000
Q3 : 0.60858
IQR : 1.21716
C.V. : 11.66352
f W5 (w5 ), FW5 (w5 ) Coefficient
Mathematical Mean: -0.15218
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.05820
S.D. : 1.02869
Skewed Coef. : -0.37631
Kurtosis Coef. : 2.54027
MAD : 0.83097
Range : 6.41597
Mid_range : 1.17379
Median : 0.00000
Q1 : -0.73193
Q2 : 0.00000
Q3 : 0.53709
IQR : 1.26901
C.V. : none
When n=30 and p=0.1, the binomial distribution does not approach the standard
normal distribution, and the central limit theorem cannot be applied.
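(13.2)
X_1 ~ Binomial(n = 30, p = 0.5), \hat{p} = \frac{X_1}{n},
W4 = \frac{\hat{p} - p}{\sqrt{p(1-p)/30}} = \frac{\hat{p} - 0.5}{\sqrt{0.5(1-0.5)/30}}, W5 = \frac{\hat{p} - p}{\sqrt{\hat{p}(1-\hat{p})/30}} = \frac{\hat{p} - 0.5}{\sqrt{\hat{p}(1-\hat{p})/30}},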
f W4 (w4 ), FW4 (w4 ) Coefficient
Mathematical Mean: 0.00002
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.00000
S.D. : 1.00000
Skewed Coef. : -0.00010
Kurtosis Coef. : 2.93241
MAD : 0.79134
Range : 10.22415
Mid_range : 0.00000
Median : 0.00000
Q1 : -0.73030
Q2 : 0.00000
Q3 : 0.73030
IQR : 1.46059
C.V. : none
When n=30 and p=0.5, the binomial distribution does not approach the standard
normal distribution, and the central limit theorem cannot be applied.
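(13.3)
X_1 ~ Binomial(n = 1000, p = 0.1), \hat{p} = \frac{X_1}{n},
W4 = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} = \frac{\hat{p} - 0.1}{\sqrt{0.1(1-0.1)/1000}}, W5 = \frac{\hat{p} - p}{\sqrt{\hat{p}(1-\hat{p})/n}} = \frac{\hat{p} - 0.1}{\sqrt{\hat{p}(1-\hat{p})/1000}},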
f W4 (w4 ), FW4 (w4 ) Coefficient
Mathematical Mean: 0.00009
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.00007
S.D. : 1.00003
Skewed Coef. : 0.08436
Kurtosis Coef. : 3.00261
MAD : 0.79733
Range : 10.54093
Mid_range : 0.42164
Median : 0.00000
Q1 : -0.63246
Q2 : 0.00000
Q3 : 0.63246
IQR : 1.26491
C.V. : none
W0~Normal(0,1),
E(| W4 distribution F() - W0 distribution F()|^2)= 0.0000969234
Pr(| W4 distribution F() - W0 distribution F()|>= 0.1000000000)= 0.000000
Pr(| W4 distribution F() - W0 distribution F()|>= 0.0500000000)= 0.000000
Pr(| W4 distribution F() - W0 distribution F()|>= 0.0100000000)= 0.318177
Pr(| W4 distribution F() - W0 distribution F()|>= 0.0050000000)= 0.602179
Pr(| W4 distribution F() - W0 distribution F()|>= 0.0010000000)= 0.907139
Pr(| W4 distribution F() - W0 distribution F()|>= 0.0005000000)= 0.954177
Pr(| W4 distribution F() - W0 distribution F()|>= 0.0001000000)= 0.991361
W0~Normal(0,1),
E(| W5 distribution F() - W0 distribution F()|^2)= 0.0001526296
Pr(| W5 distribution F() - W0 distribution F()|>= 0.1000000000)= 0.000000
Pr(| W5 distribution F() - W0 distribution F()|>= 0.0500000000)= 0.000000
Pr(| W5 distribution F() - W0 distribution F()|>= 0.0100000000)= 0.459881
Pr(| W5 distribution F() - W0 distribution F()|>= 0.0050000000)= 0.733614
Pr(| W5 distribution F() - W0 distribution F()|>= 0.0010000000)= 0.952783
Pr(| W5 distribution F() - W0 distribution F()|>= 0.0005000000)= 0.976930
Pr(| W5 distribution F() - W0 distribution F()|>= 0.0001000000)= 0.995458
When n=1000 and p=0.1, the binomial distribution does not approach the standard
normal distribution, and the central limit theorem cannot be applied.
(13.4)
X_1 ~ Binomial(n = 1000, p = 0.5), \hat{p} = \frac{X_1}{n},
W4 = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} = \frac{\hat{p} - 0.5}{\sqrt{0.5(1-0.5)/1000}}, W5 = \frac{\hat{p} - p}{\sqrt{\hat{p}(1-\hat{p})/n}} = \frac{\hat{p} - 0.5}{\sqrt{\hat{p}(1-\hat{p})/1000}},
f W4 (w4 ), FW4 (w4 ) Coefficient
Mathematical Mean: 0.00008
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.99998
S.D. : 0.99999
Skewed Coef. : 0.00015
Kurtosis Coef. : 2.99875
MAD : 0.79763
Range : 10.81499
Mid_range : -0.03162
Median : 0.00000
Q1 : -0.69570
Q2 : 0.00000
Q3 : 0.69570
IQR : 1.39140
C.V. : none
W0~Normal(0,1),
E(| W4 distribution F() - W0 distribution F()|^2)= 0.0000306337
Pr(| W4 distribution F() - W0 distribution F()|>= 0.1000000000)= 0.000000
Pr(| W4 distribution F() - W0 distribution F()|>= 0.0500000000)= 0.000000
Pr(| W4 distribution F() - W0 distribution F()|>= 0.0100000000)= 0.073411
Pr(| W4 distribution F() - W0 distribution F()|>= 0.0050000000)= 0.396256
Pr(| W4 distribution F() - W0 distribution F()|>= 0.0010000000)= 0.833248
Pr(| W4 distribution F() - W0 distribution F()|>= 0.0005000000)= 0.908597
Pr(| W4 distribution F() - W0 distribution F()|>= 0.0001000000)= 0.978473
Pr(| W5 distribution F() - W0 distribution F()|>= 0.0010000000)= 0.833817
Pr(| W5 distribution F() - W0 distribution F()|>= 0.0005000000)= 0.909107
Pr(| W5 distribution F() - W0 distribution F()|>= 0.0001000000)= 0.978662
When n=1000 and p=0.5, the binomial distribution does not approach the standard
normal distribution, and the central limit theorem cannot be applied.
Example 14. The population is B(1, p); the simulated sample size is n=1,000,000, so it is big
data (population data), and the sample proportion is the population proportion.
value   sample count   probability
0       n-X            1-X/n=1-p
1       X              p=X/n
ε = W1 = X 2 − X 1 ,
Mathematical Mean: -0.00003
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.22727
S.D. : 0.47673
Skewed Coef. : 0.00002
Kurtosis Coef. : 1.35380
MAD : 0.45454
Range : 1.96586
Mid_range : 0.00293
Median : -0.03940
Q1 : -0.45171
Q2 : -0.03940
Q3 : 0.45172
IQR : 0.90343
C.V. : none
3.2. Two independent population proportion test
There are two independent Bernoulli populations, and the two sample proportions
are discrete random variables. The central limit theorem may not be applicable when the
sample size is not very large. When the sample size is very large, the data are big data and the
analysis method is the probability distribution.
16.2) X 1 ~ Binomial (n1 = 30, p1 = 0.5), X 2 ~ Binomial (n2 = 30, p 2 = 0.5),
f W3 (w3 ), FW3 (w3 ) Coefficient
Mathematical Mean: 0.00003
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.01669
S.D. : 1.00831
Skewed Coef. : 0.00004
Kurtosis Coef. : 2.96522
MAD : 0.80121
Range : 11.19213
Mid_range : 0.09698
Median : 0.00000
Q1 : -0.77503
Q2 : 0.00000
Q3 : 0.77503
IQR : 1.55005
C.V. : none
W0~Normal(0,1),
E(| W3 distribution F() - W0 distribution F()|^2)= 0.0000139715
Pr(| W3 distribution F() - W0 distribution F()|>= 0.1000000000)= 0.000000
Pr(| W3 distribution F() - W0 distribution F()|>= 0.0500000000)= 0.000000
Pr(| W3 distribution F() - W0 distribution F()|>= 0.0100000000)= 0.031715
Pr(| W3 distribution F() - W0 distribution F()|>= 0.0050000000)= 0.170560
Pr(| W3 distribution F() - W0 distribution F()|>= 0.0010000000)= 0.471014
Pr(| W3 distribution F() - W0 distribution F()|>= 0.0005000000)= 0.581490
Pr(| W3 distribution F() - W0 distribution F()|>= 0.0001000000)= 0.844681
W0~Normal(0,1),
E(| W5 distribution F() - W0 distribution F()|^2)= 0.0000140026
Pr(| W5 distribution F() - W0 distribution F()|>= 0.1000000000)= 0.000000
Pr(| W5 distribution F() - W0 distribution F()|>= 0.0500000000)= 0.000000
Pr(| W5 distribution F() - W0 distribution F()|>= 0.0100000000)= 0.031616
Pr(| W5 distribution F() - W0 distribution F()|>= 0.0050000000)= 0.170142
Pr(| W5 distribution F() - W0 distribution F()|>= 0.0010000000)= 0.473208
Pr(| W5 distribution F() - W0 distribution F()|>= 0.0005000000)= 0.597400
Pr(| W5 distribution F() - W0 distribution F()|>= 0.0001000000)= 0.865530
The central limit theorem can be applied when n=1000.
$f_{W_5}(w_5)$, $F_{W_5}(w_5)$ Coefficient
Mathematical Mean: 0.00009
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.00185
S.D. : 1.00092
Skewed Coef. : 0.00017
Kurtosis Coef. : 3.00618
MAD : 0.79825
Range : 10.94953
Mid_range : -0.02342
Median : 0.00000
Q1 : -0.67099
Q2 : 0.00000
Q3 : 0.67102
IQR : 1.34201
C.V. : none
W0~Normal(0,1),
E(| W5 distribution F() - W0 distribution F()|^2)= 0.0000150363
Pr(| W5 distribution F() - W0 distribution F()|>= 0.1000000000)= 0.000000
Pr(| W5 distribution F() - W0 distribution F()|>= 0.0500000000)= 0.000000
Pr(| W5 distribution F() - W0 distribution F()|>= 0.0100000000)= 0.000000
Pr(| W5 distribution F() - W0 distribution F()|>= 0.0050000000)= 0.232002
Pr(| W5 distribution F() - W0 distribution F()|>= 0.0010000000)= 0.773681
Pr(| W5 distribution F() - W0 distribution F()|>= 0.0005000000)= 0.874502
Pr(| W5 distribution F() - W0 distribution F()|>= 0.0001000000)= 0.969979
The central limit theorem can be applied when n=1000.
X3 marginal probability distribution,
Mathematical Mean: 0.50013
Geometrical Mean : 0.25008
Harmonic Mean : 0.00000
Variance : 0.12500
S.D. : 0.35356
Skewed Coef. : -0.00062
Kurtosis Coef. : 1.49998
MAD : 0.31831
Range : 1.00000
Mid_range : 0.50000
Median : 0.50029
Q1 : 0.14655
Q2 : 0.50029
Q3 : 0.85369
IQR : 0.70714
C.V. : 0.70694
$\varepsilon_2 = W_2 = X_4 - X_3$,
Mathematical Mean: -0.00001
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.12495
S.D. : 0.35348
Skewed Coef. : -0.00059
Kurtosis Coef. : 3.50066
MAD : 0.24993
Range : 1.99998
Mid_range : 0.00000
Median : 0.00000
Q1 : -0.16316
Q2 : 0.00000
Q3 : 0.16314
IQR : 0.32630
C.V. : none
$U_1 = \varepsilon_1 + \varepsilon_2$,
Mathematical Mean: 0.00000
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.35231
S.D. : 0.59355
Skewed Coef. : -0.00008
Kurtosis Coef. : 2.37790
MAD : 0.50497
Range : 3.80177
Mid_range : 0.00499
Median : -0.00000
Q1 : -0.46153
Q2 : -0.00000
Q3 : 0.46154
IQR : 0.92307
C.V. : none
Chapter 4. One way analysis
4.2. The $\alpha_i = 0$, $i = 1, 2, \dots, k$,
Category 4 population, $X_4 \sim N(\mu_4 = 25,\ \sigma_4^2 = 5^2)$,
Category 5 population, $X_5 \sim N(\mu_5 = 25,\ \sigma_5^2 = 5^2)$,
Each category has n sample data, and the one way model is designed as
$X_{ij} = \mu + \alpha_i + \varepsilon_{ij}$, $i = 1,2,\dots,5$, $j = 1,\dots,n$, $\mu = 25$,
(18.1) n=100,
One way model analysis; the population distribution is the normal distribution.
One way model
X(ij)=mu+alpha(i)+e(ij), i=1,2...,5, j=1,2...,n(i)
1=A1, 2=A2, 3=A3, 4=A4, 5=A5
A1 A2 A3 A4 A5 Total
sample size 100 100 100 100 100 500
sample mean 24.40637 25.13159 24.90588 25.63750 24.43427 24.90312
sample variance 24.11047 25.44705 22.79769 20.40478 24.85717
alpha estimate value -0.49675 0.22847 0.00276 0.73438 -0.46885
summation of alpha(i)=0.000000
H0:alpha(1)=...=alpha(5)=0
ANOVA
Source df SS MS F
Treatment 4 105.8109657939 26.4527414485 1.1245272744
Error 495 11644.0990944496 23.5234325140
Total 499 11749.9100602435
The F test p value=0.348400
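The entries of the ANOVA table above are linked by simple arithmetic: each mean square is its sum of squares divided by its degrees of freedom, and F is the ratio of the two mean squares. A minimal sketch that recomputes them from the printed sums of squares (the function name is illustrative):

```python
def anova_f(ss_treatment, df_treatment, ss_error, df_error):
    """Return (MS treatment, MS error, F) from sums of squares and df."""
    ms_treatment = ss_treatment / df_treatment
    ms_error = ss_error / df_error
    return ms_treatment, ms_error, ms_treatment / ms_error

# Values copied from the ANOVA table of (18.1).
ms_tr, ms_err, f_stat = anova_f(105.8109657939, 4, 11644.0990944496, 495)
ss_total = 105.8109657939 + 11644.0990944496
print(f"MSTR = {ms_tr:.10f}, MSE = {ms_err:.10f}, F = {f_stat:.10f}")
```

The treatment and error sums of squares also add up to the total sum of squares, which gives a quick check that a table has been transcribed correctly.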
class            [ 1 ]     [ 2 ]     [ 3 ]     [ 4 ]     [ 5 ]     [ 6 ]     [ 7 ]     [ 8 ]     [ 9 ]
lower limit             -5.92056  -3.70917  -2.08938  -0.67794   0.67706   2.08788   3.70737   5.91757
upper limit   -5.92056  -3.70917  -2.08938  -0.67794   0.67706   2.08788   3.70737   5.91757
observed no   52.00000  55.00000  58.00000  66.00000  59.00000  48.00000  53.00000  51.00000  58.00000
probability    0.11111   0.11111   0.11111   0.11111   0.11111   0.11111   0.11111   0.11111   0.11111
expected no   55.55556  55.55556  55.55556  55.55556  55.55556  55.55556  55.55556  55.55556  55.55556
chi square     0.22756   0.00556   0.10756   1.96356   0.21356   1.02756   0.11756   0.37356   0.10756
degree of freedom=7
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =4.144000
p-value=0.763000
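The Pearson statistic printed above can be recomputed from the observed counts. A minimal sketch, assuming nine equiprobable classes so that every expected count is 500/9; the function name is illustrative.

```python
def pearson_chi_square(observed):
    """Sum of (O - E)^2 / E when all classes have the same probability."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)

# Observed class counts of the residuals in (18.1).
observed = [52, 55, 58, 66, 59, 48, 53, 51, 58]
statistic = pearson_chi_square(observed)
# 9 classes, minus 1, minus 1 estimated parameter (sigma) gives 7.
degrees_of_freedom = len(observed) - 1 - 1
print(f"chi-square = {statistic:.6f}, df = {degrees_of_freedom}")
```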
(18.2) n=100,000,000; this is big data and the method is the probability distribution.
(18.2.1) X1,…,X5 marginal probability distribution,
X1 marginal probability distribution,
Mathematical Mean: 24.99974
Geometrical Mean : none
Harmonic Mean : none
Variance : 25.00206
S.D. : 5.00021
Skewed Coef. : 0.00015
Kurtosis Coef. : 3.00035
MAD : 3.98959
Range : 59.70709
Mid_range : 26.21951
Median : 24.99979
Q1 : 21.62796
Q2 : 24.99979
Q3 : 28.37247
IQR : 6.74452
C.V. : 0.20001
X2 marginal probability distribution,
Mathematical Mean: 25.00019
Geometrical Mean : none
Harmonic Mean : none
Variance : 24.99649
S.D. : 4.99965
Skewed Coef. : -0.00005
Kurtosis Coef. : 2.99982
MAD : 3.98918
Range : 57.16562
Mid_range : 24.59357
Median : 25.00050
Q1 : 21.62799
Q2 : 25.00050
Q3 : 28.37249
IQR : 6.74450
C.V. : 0.19998
$X_1,\dots,X_5 \overset{iid}{\sim} \mathrm{Normal}(\mu_1 = 25,\ \sigma_1^2 = 5^2)$.
(18.2.2) The probability distribution obtained by merging X1, X2, X3, X4, X5: the probability
distributions of X1,...,X5 are conditional probability distributions, and the prior probability
is the proportion (each category's sample size ratio), which is 0.2.
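The marginal probability distribution,
$f_X(x) = P(\text{1st}) f(x \mid \text{1st}) + P(\text{2nd}) f(x \mid \text{2nd}) + P(\text{3rd}) f(x \mid \text{3rd}) + P(\text{4th}) f(x \mid \text{4th}) + P(\text{5th}) f(x \mid \text{5th}) = \frac{1}{\sqrt{50\pi}} \exp\left(-\frac{(x - 25)^2}{50}\right)$, $-\infty < x < \infty$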
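The merged density is a weighted sum of the five category densities. The sketch below evaluates that sum with weights 0.2 and compares it with the N(25, 25) density; because all five categories share the same mean and variance in this example, the two agree. The function names are illustrative.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of Normal(mu, sigma^2) at x."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma**2)) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, weights, means, sigma):
    """Sum of P(i-th category) * f(x | i-th category)."""
    return sum(w * normal_pdf(x, m, sigma) for w, m in zip(weights, means))

def closed_form_pdf(x):
    """exp(-(x - 25)^2 / 50) / sqrt(50 * pi), the N(25, 5^2) density."""
    return math.exp(-((x - 25.0) ** 2) / 50.0) / math.sqrt(50.0 * math.pi)

weights = [0.2] * 5          # each category holds one fifth of the data
means = [25.0] * 5           # all category means are 25 in (18.2)
for x in (10.0, 25.0, 31.5):
    print(x, mixture_pdf(x, weights, means, 5.0), closed_form_pdf(x))
```

When the category means differ, the same `mixture_pdf` call still applies, but the result is no longer a single normal density.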
Mathematical Mean: 24.99950
Geometrical Mean : none
Harmonic Mean : none
Variance : 25.00216
S.D. : 5.00022
Skewed Coef. : -0.00018
Kurtosis Coef. : 2.99806
MAD : 3.98988
Range : 56.66966
Mid_range : 24.40659
Median : 25.00039
Q1 : 21.62609
Q2 : 25.00039
Q3 : 28.37265
IQR : 6.74656
C.V. : 0.20001
Category 5 population, $X_5 \sim N(\mu_5 = 45,\ \sigma_5^2 = 5^2)$,
Each category has n sample data, and the one way model is designed as
$X_{ij} = \mu + \alpha_i + \varepsilon_{ij}$, $i = 1,2,\dots,5$, $j = 1,\dots,n$, $\mu = 25$,
(19.1) n=100,
One way model analysis; the population distribution is the normal distribution.
One way model
X(ij)=mu+alpha(i)+e(ij), i=1,2...,5, j=1,2...,n(i)
1=A1, 2=A2, 3=A3, 4=A4, 5=A5
A1 A2 A3 A4 A5 Total
sample size 100 100 100 100 100 500
sample mean 14.67626 35.11895 25.00049 4.90064 44.68392 24.87606
sample variance 35.35926 23.77747 27.54776 24.88746 19.30776
alpha estimate value -10.19979 10.24290 0.12444 -19.97541 19.80787
summation of alpha(i)=-0.000000
H0:alpha(1)=...=alpha(5)=0
ANOVA
Source df SS MS F
Treatment 4 100033.6931928730 25008.4232982183 955.3972743523
Error 495 12957.0911127101 26.1759416418
Total 499 112990.7843055832
The F test p value=0.000100
[checking the three basic assumptions]
class            [ 1 ]     [ 2 ]     [ 3 ]     [ 4 ]     [ 5 ]     [ 6 ]     [ 7 ]     [ 8 ]     [ 9 ]
lower limit             -6.24545  -3.91271  -2.20404  -0.71515   0.71421   2.20246   3.91081   6.24230
upper limit   -6.24545  -3.91271  -2.20404  -0.71515   0.71421   2.20246   3.91081   6.24230
observed no   57.00000  58.00000  47.00000  58.00000  57.00000  52.00000  58.00000  58.00000  55.00000
probability    0.11111   0.11111   0.11111   0.11111   0.11111   0.11111   0.11111   0.11111   0.11111
expected no   55.55556  55.55556  55.55556  55.55556  55.55556  55.55556  55.55556  55.55556  55.55556
chi square     0.03756   0.10756   1.31756   0.10756   0.03756   0.22756   0.10756   0.10756   0.00556
degree of freedom=7
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =2.056000, p-value=0.956600
The best parameters and goodness of fit(pearson chi square test)
mu point estimated value=0.000000 (MLE), sigma point estimated value=5.116243 (MLE)
mu value from -1.023249 to 1.023249, sigma value from 4.263536 to 6.395304
pearson goodness of fit
class            [ 1 ]     [ 2 ]     [ 3 ]     [ 4 ]     [ 5 ]     [ 6 ]     [ 7 ]     [ 8 ]     [ 9 ]
lower limit             -6.30129  -3.97826  -2.27671  -0.79403   0.62937   2.11142   3.81265   6.13443
upper limit   -6.30129  -3.97826  -2.27671  -0.79403   0.62937   2.11142   3.81265   6.13443
observed no   55.00000  58.00000  48.00000  56.00000  53.00000  56.00000  59.00000  57.00000  58.00000
probability    0.11111   0.11111   0.11111   0.11111   0.11111   0.11111   0.11111   0.11111   0.11111
expected no   55.55556  55.55556  55.55556  55.55556  55.55556  55.55556  55.55556  55.55556  55.55556
chi square     0.00556   0.10756   1.02756   0.00356   0.11756   0.00356   0.21356   0.03756   0.10756
degree of freedom=6
H0: A0~Normal(mu=-0.081860,sigma*sigma=25.958263), sigma=5.094925
pearson chi-square test statistic =1.624000, p-value=0.950800
(19.2) n=100,000,000; this is big data and the method is the probability distribution.
(19.2.1)X1,…,X5 marginal probability distribution,
X1 marginal probability distribution,
Mathematical Mean: 14.99971
Geometrical Mean : none
Harmonic Mean : none
Variance : 25.00033
S.D. : 5.00003
Skewed Coef. : 0.00022
Kurtosis Coef. : 2.99984
MAD : 3.98943
Range : 56.25291
Mid_range : 14.23740
Median : 14.99986
Q1 : 11.62697
Q2 : 14.99986
Q3 : 18.37208
IQR : 6.74511
C.V. : 0.33334
X2 marginal probability distribution,
Mathematical Mean: 34.99988
Geometrical Mean : 34.63291
Harmonic Mean : 34.25292
Variance : 24.99987
S.D. : 4.99999
Skewed Coef. : 0.00010
Kurtosis Coef. : 2.99929
MAD : 3.98947
Range : 55.94190
Mid_range : 34.09912
Median : 34.99930
Q1 : 31.62725
Q2 : 34.99930
Q3 : 38.37220
IQR : 6.74495
C.V. : 0.14286
X3 marginal probability distribution,
Mathematical Mean: 24.99974
Geometrical Mean : none
Harmonic Mean : none
Variance : 25.00206
S.D. : 5.00021
Skewed Coef. : 0.00015
Kurtosis Coef. : 3.00035
MAD : 3.98959
Range : 59.70709
Mid_range : 26.21951
Median : 24.99979
Q1 : 21.62795
Q2 : 24.99979
Q3 : 28.37247
IQR : 6.74451
C.V. : 0.20001
X1, X2, X3, X4, X5 are normally distributed; the population means are not equal and
the population variances are equal.
Mathematical Mean: 24.99995
Geometrical Mean : 24.89892
Harmonic Mean : 24.79662
Variance : 4.99994
S.D. : 2.23605
Skewed Coef. : 0.00020
Kurtosis Coef. : 3.00064
MAD : 1.78407
Range : 25.90353
Mid_range : 25.14977
Median : 24.99983
Q1 : 23.49187
Q2 : 24.99983
Q3 : 26.50803
IQR : 3.01616
C.V. : 0.08944
(20.1) n=100,
One way model analysis; the population distribution is the arcsin distribution.
X(ij)=mu+alpha(i)+e(ij), i=1,2...,5, j=1,2...,n(i)
1=A1, 2=A2, 3=A3, 4=A4, 5=A5
A1 A2 A3 A4 A5 Total
sample size 100 100 100 100 100 500
sample mean 5.95631 14.08830 24.75121 33.69864 44.68603 24.63610
sample variance 47.32315 43.33253 46.56744 53.27101 40.77840
alpha estimate value -18.67979 -10.54780 0.11511 9.06254 20.04993
summation of alpha(i)=-0.000000
H0:alpha(1)=...=alpha(5)=0
ANOVA
Source df SS MS F
Treatment 4 94433.3479159967 23608.3369789992 510.4007919148
Error 495 22895.9809422788 46.2545069541
Total 499 117329.3288582755
observed no   69.00000  57.00000  58.00000  51.00000  49.00000  45.00000  52.00000  51.00000  68.00000
probability    0.11111   0.11111   0.11111   0.11111   0.11111   0.11111   0.11111   0.11111   0.11111
expected no   55.55556  55.55556  55.55556  55.55556  55.55556  55.55556  55.55556  55.55556  55.55556
chi square     3.25356   0.03756   0.10756   0.37356   0.77356   2.00556   0.22756   0.37356   2.78756
degree of freedom=7
H0: error~Uniform(alpha,beta), alpha,beta are unknown
alpha point estimated value=-10.952996 (MLE), beta point estimated value=11.301311 (MLE)
degree of freedom=7
H0: error~Normal(mu=0,sigma*sigma), sigma are unknown
population variance(sigma*sigma) which point estimated value=45.647453 (UMVUE)
class            [ 1 ]     [ 2 ]     [ 3 ]     [ 4 ]     [ 5 ]     [ 6 ]     [ 7 ]     [ 8 ]     [ 9 ]
lower limit  -10.95300  -9.81323  -8.29369  -6.42427  -3.70905   3.70905   6.42427   8.29369   9.81323
upper limit   -9.81323  -8.29369  -6.42427  -3.70905   3.70905   6.42427   8.29369   9.81323  11.30131
observed no   13.00000  66.00000  41.00000  61.00000 144.00000  56.00000  31.00000  51.00000  37.00000
probability    0.11111   0.11111   0.11111   0.11111   0.11111   0.11111   0.11111   0.11111   0.11111
expected no   55.55556  55.55556  55.55556  55.55556  55.55556  55.55556  55.55556  55.55556  55.55556
chi square    32.59756   1.96356   3.81356   0.53356 140.80356   0.00356  10.85356   0.37356   6.19756
degree of freedom=7
H0: error~Triangular 1(mu=0,c), mu,c are unknown
c point estimated value=11.127154 (MLE)
pearson goodness of fit
class              [ 1 ]       [ 2 ]       [ 3 ]       [ 4 ]       [ 5 ]       [ 6 ]       [ 7 ]       [ 8 ]       [ 9 ]
lower limit               -173.02438  -104.23882   -57.67479   -18.56714    18.56714    57.67479   104.23882   173.02438
upper limit   -173.02438  -104.23882   -57.67479   -18.56714    18.56714    57.67479   104.23882   173.02438
observed no      0.00000     0.00000     0.00000     0.00000   500.00000     0.00000     0.00000     0.00000     0.00000
probability      0.11111     0.11111     0.11111     0.11111     0.11111     0.11111     0.11111     0.11111     0.11111
expected no     55.55556    55.55556    55.55556    55.55556    55.55556    55.55556    55.55556    55.55556    55.55556
chi square      55.55556    55.55556    55.55556    55.55556  3555.55556    55.55556    55.55556    55.55556    55.55556
degree of freedom=7
H0: error~Logistic(mu=0,sigma), mu,sigma are unknown
sigma point estimated value=83.207141 (MME)
H0: residual is random , H1: Increasing line or decreasing line
Z=-0.083824, p-value=0.466600
H0: residual is random , H1: Oscillation, Z=-0.083824, p-value=0.533400
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-0.083824, p-value=0.933200
55.55556 55.55556
chi square 37.35556 0.03756 4.29356 17.79756 2.35756 4.29356 0.11756
1.96356 25.38756
degree of freedom=7
H0: A0~Arcsin(mu=0.000000,c), c is unknown, c point estimated value=11.127154 (MLE),
pearson chi-square test statistic =93.604000, p-value=0.000000
55.55556 55.55556
chi square 0.04356 0.55556 0.00356 2.78756 0.99756 0.35556 2.83756
0.77356 0.03756
degree of freedom=6
H0: A0~Arcsin(mu=0.178034,c=9.458081),
pearson chi-square test statistic =8.392000
p-value=0.210700
(20.2) n=100 and the data are the same as in (20.1); one way analysis where the error follows the Arcsin
distribution,
(20.2.1) Each category probability distribution,
Category 1 data goodness of fit test,
mu point estimated value=5.956306, c point estimated value=9.998345
mu value from 3.956637 to 7.955975, c value from 8.331954 to 12.497931
pearson goodness of fit
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
lower limit -4.99669 -2.50363 0.02794 3.68618 7.74651 11.40475 13.93633
upper limit -2.50363 0.02794 3.68618 7.74651 11.40475 13.93633 15.00000
observed no 16.00000 11.00000 13.00000 12.00000 18.00000 15.00000 15.00000
probability 0.14286 0.14286 0.14286 0.14286 0.14286 0.14286 0.14286
expected no 14.28571 14.28571 14.28571 14.28571 14.28571 14.28571 14.28571
chi square 0.20571 0.75571 0.11571 0.36571 0.96571 0.03571 0.03571
degree of freedom=4
H0: X1~Arcsin(mu=5.716346,c=9.123490),
pearson chi-square test statistic =2.480000, p-value=0.648200
Category 2 data goodness of fit test,
mu point estimated value=14.088299, c point estimated value=9.977409
mu value from 12.092817 to 16.083781, c value from 8.314508 to 12.471762
pearson goodness of fit
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
lower limit 5.00034 6.22754 8.67307 12.20696 16.12928 19.66317 22.10870
upper limit 6.22754 8.67307 12.20696 16.12928 19.66317 22.10870 24.95515
observed no 15.00000 13.00000 16.00000 19.00000 12.00000 8.00000 17.00000
probability 0.14286 0.14286 0.14286 0.14286 0.14286 0.14286 0.14286
expected no 14.28571 14.28571 14.28571 14.28571 14.28571 14.28571 14.28571
chi square 0.03571 0.11571 0.20571 1.55571 0.36571 2.76571 0.51571
degree of freedom=4
H0: X2~Arcsin(mu=14.168118,c=8.813378),
pearson chi-square test statistic =5.560000, p-value=0.234500
Category 3 data goodness of fit test,
mu point estimated value=24.751212, c point estimated value=9.991408
mu value from 22.752931 to 26.749494, c value from 8.326173 to 12.489260
pearson goodness of fit
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
lower limit 15.01221 15.47688 18.28394 22.34026 26.84244 30.89876 33.70582
upper limit 15.47688 18.28394 22.34026 26.84244 30.89876 33.70582 34.99502
observed no 11.00000 14.00000 18.00000 16.00000 15.00000 13.00000 13.00000
probability 0.14286 0.14286 0.14286 0.14286 0.14286 0.14286 0.14286
expected no 14.28571 14.28571 14.28571 14.28571 14.28571 14.28571 14.28571
chi square 0.75571 0.00571 0.96571 0.20571 0.03571 0.11571 0.11571
degree of freedom=4
H0: X3~Arcsin(mu=24.591350,c=10.116300),
pearson chi-square test statistic =2.200000, p-value=0.699000
Category 4 data goodness of fit test,
mu point estimated value=33.698639, c point estimated value=9.999893
mu value from 31.698661 to 35.698618, c value from 8.333245 to 12.499867
pearson goodness of fit
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
lower limit 25.00016 25.41150 28.19782 32.22417 36.69309 40.71944 43.50576
upper limit 25.41150 28.19782 32.22417 36.69309 40.71944 43.50576 44.99995
observed no 17.00000 17.00000 16.00000 16.00000 9.00000 11.00000 14.00000
probability 0.14286 0.14286 0.14286 0.14286 0.14286 0.14286 0.14286
expected no 14.28571 14.28571 14.28571 14.28571 14.28571 14.28571 14.28571
chi square 0.51571 0.51571 0.20571 0.20571 1.95571 0.75571 0.00571
degree of freedom=4
H0: X4~Arcsin(mu=34.458631,c=10.041560),
pearson chi-square test statistic =4.160000, p-value=0.384700
Category 5 data goodness of fit test,
mu point estimated value=44.686033, c point estimated value=9.995422
mu value from 42.686949 to 46.685117, c value from 8.329518 to 12.494277
pearson goodness of fit
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
lower limit 35.00913 35.53594 38.19390 42.03475 46.29779 50.13865 52.79660
upper limit 35.53594 38.19390 42.03475 46.29779 50.13865 52.79660 54.99998
observed no 10.00000 12.00000 18.00000 18.00000 16.00000 12.00000 14.00000
probability 0.14286 0.14286 0.14286 0.14286 0.14286 0.14286 0.14286
expected no 14.28571 14.28571 14.28571 14.28571 14.28571 14.28571 14.28571
chi square 1.28571 0.36571 0.96571 0.96571 0.20571 0.36571 0.00571
degree of freedom=4
H0: X5~Arcsin(mu=44.166271,c=9.578946),
pearson chi-square test statistic =4.160000, p-value=0.384700
(20.2.2)
One way model analysis,
X(ij)=mu+alpha(i)+e(ij), i=1,2...,5, j=1,2...,n(i)
1=A1, 2=A2, 3=A3, 4=A4, 5=A5
A1 A2 A3 A4 A5 Total
sample size 100 100 100 100 100 500
sample mean 5.95631 14.08830 24.75121 33.69864 44.68603 24.63610
sample variance 47.32315 43.33253 46.56744 53.27101 40.77840
alpha estimate value -18.67979 -10.54780 0.11511 9.06254 20.04993
summation of alpha(i)=-0.000000
H0:alpha(1)=...=alpha(5)=0
ANOVA
Source df SS MS F
Treatment 4 94433.3479159967 23608.3369789992 510.4007919148
Error 495 22895.9809422788 46.2545069541
Total 499 117329.3288582755
The error probability distribution is the Arcsin distribution.
The F test p value=0.000000
95% C.I. for mu(2)-mu(5)
[ -32.4880350742, -28.70743258290], mu(2)<mu(5)
95% C.I. for mu(3)-mu(4)
[ -10.8377283679, -7.05712587660], mu(3)<mu(4)
95% C.I. for mu(3)-mu(5)
[ -21.8251218705, -18.04451937920], mu(3)<mu(5)
95% C.I. for mu(4)-mu(5)
[ -12.8776947483, -9.09709225700], mu(4)<mu(5)
The common population standard deviation and variance confidence interval
90% confidence interval for population variance [43.926183 , 48.846993]
90% confidence interval for population standard deviation [6.627683 , 6.989062]
95% confidence interval for population variance [43.507579 , 49.376819]
95% confidence interval for population standard deviation [6.596028 , 7.026864]
99% confidence interval for population variance [42.710061 , 50.448188]
99% confidence interval for population standard deviation [6.535293 , 7.102689]
sample scatter diagram residual plot
f(w5,w6) f(w6,w5)
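$X_{11},\dots,X_{1n} \overset{iid}{\sim} \mathrm{Arcsin}(5, 10)$, $\bar{X} = \frac{\sum_{j=1}^{n} X_{1j}}{n} \xrightarrow[n\to\infty]{CLT} N\left(5, \frac{50}{n}\right)$,
The Category 1 j-th residual, $X_{1j} - \bar{X}$, follows neither the Arcsin distribution nor the Normal
distribution, j = 1,2,..., n.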
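The moments of the Arcsin population can be checked by simulation. The sketch below assumes that Arcsin(mu, c) is the arcsine distribution on the interval (mu - c, mu + c), which matches the range of 20 reported for c = 10, and draws from it as mu + c*cos(pi*U) with U uniform on (0, 1). The sample size, seed and function names are arbitrary choices for illustration.

```python
import math
import random

def sample_arcsine(mu, c, rng):
    """One draw from the arcsine distribution on (mu - c, mu + c)."""
    return mu + c * math.cos(math.pi * rng.random())

def mean_variance_kurtosis(values):
    """Sample mean, variance (divisor n) and kurtosis m4 / m2^2."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return mean, m2, m4 / m2**2

rng = random.Random(2015)
draws = [sample_arcsine(5.0, 10.0, rng) for _ in range(200_000)]
mean, variance, kurtosis = mean_variance_kurtosis(draws)
print(f"mean = {mean:.3f}, variance = {variance:.3f}, kurtosis = {kurtosis:.3f}")
```

The simulated variance is close to c^2/2 = 50 and the kurtosis close to 1.5, far from the normal value of 3, which is why the residuals of a small sample cannot be treated as normal.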
Pr(| new distribution F(x) - F distribution F(x)|>= 0.0005000000)= 0.099165
Pr(| new distribution F(x) - F distribution F(x)|>= 0.0001000000)= 0.857994
MSTR/MSE approaches F(4,495), but it is not exactly F(4,495).
X4 marginal probability distribution,
Mathematical Mean: 35.00030
Geometrical Mean : 34.27066
Harmonic Mean : 33.54103
Variance : 50.00963
S.D. : 7.07175
Skewed Coef. : -0.00009
Kurtosis Coef. : 1.49980
MAD : 6.36697
Range : 20.00000
Mid_range : 35.00000
Median : 35.00040
Q1 : 27.92758
Q2 : 35.00040
Q3 : 42.07292
IQR : 14.14534
C.V. : 0.20205
X5 marginal probability distribution,
Mathematical Mean: 45.00056
Geometrical Mean : 44.43796
Harmonic Mean : 43.87534
Variance : 50.00112
S.D. : 7.07115
Skewed Coef. : -0.00016
Kurtosis Coef. : 1.49996
MAD : 6.36629
Range : 20.00000
Mid_range : 45.00000
Median : 45.00107
Q1 : 37.92928
Q2 : 45.00107
Q3 : 52.07130
IQR : 14.14202
C.V. : 0.15713
(20.3.3) The mean of X1,X2,X3,X4,X5, $Y_1 = \frac{X_1 + X_2 + X_3 + X_4 + X_5}{5}$,
Mathematical Mean: 25.00014
Geometrical Mean : 24.79644
Harmonic Mean : 24.58851
Variance : 10.00071
S.D. : 3.16239
Skewed Coef. : 0.00015
Kurtosis Coef. : 2.70000
MAD : 2.55694
Range : 19.96294
Mid_range : 25.00195
Median : 24.99959
Q1 : 22.80531
Q2 : 24.99959
Q3 : 27.19544
IQR : 4.39013
C.V. : 0.12649
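The variance and kurtosis reported for Y1 agree with what independence of the five categories would imply. The short derivation below assumes X1,...,X5 are independent with common variance 50 and common kurtosis 1.5, as the X4 and X5 summaries suggest; it is a consistency check and not the authors' derivation.

```latex
\operatorname{Var}(Y_1) = \frac{1}{5^2}\sum_{i=1}^{5}\operatorname{Var}(X_i)
                        = \frac{5 \times 50}{25} = 10,
\qquad
\operatorname{Kurt}(Y_1) = 3 + \frac{1.5 - 3}{5} = 2.7 .
```

Averaging five independent variables divides the excess kurtosis by five, so the mean is already much closer to normal than each category is.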
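$\varepsilon_{3j} \overset{iid}{\sim} \mathrm{Semi\_circle}(0,\ R_{\varepsilon_3} = \sqrt{200})$, $\sigma_{\varepsilon_3}^2 = 50$, $\varepsilon_{4j} \overset{iid}{\sim} DE(\lambda_{\varepsilon_4} = 0.2,\ 0)$, $\sigma_{\varepsilon_4}^2 = 50$,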
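Both error distributions listed here are scaled to the same variance. As a check, assuming Semi_circle(0, R) denotes the semicircle distribution with radius R, where R^2 = 200 and the variance is R^2/4, and DE(λ, 0) denotes the double exponential distribution with rate λ and location 0, whose variance is 2/λ^2:

```latex
\sigma_{\varepsilon_3}^2 = \frac{R_{\varepsilon_3}^2}{4} = \frac{200}{4} = 50,
\qquad
\sigma_{\varepsilon_4}^2 = \frac{2}{\lambda_{\varepsilon_4}^2} = \frac{2}{0.2^2} = 50 .
```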
(21.1) n=100. Each category has its own specific probability distribution and the variances
are equal; the error is assumed to be normally distributed when the data are analyzed.
One way model analysis,
One way model, X(ij)=mu+alpha(i)+e(ij), i=1,2...,5, j=1,2...,n(i)
1=A1, 2=A2, 3=A3, 4=A4, 5=A5
A1 A2 A3 A4 A5 Total
sample size 100 100 100 100 100 500
sample mean 4.11151 14.78073 23.99617 35.50465 44.53823 24.58626
sample variance 52.12294 48.92488 52.72852 63.07862 51.03545
alpha estimate value -20.47475 -9.80552 -0.59009 10.91840 19.95197
summation of alpha(i)=0.000000
H0:alpha(1)=...=alpha(5)=0
ANOVA
Source df SS MS F
Treatment 4 103300.4246150119 25825.1061537530 482.0087696471
Error 495 26521.1513796043 53.5780835952
Total 499 129821.5759946162
The F test p value=0.000100
class            [ 1 ]     [ 2 ]     [ 3 ]     [ 4 ]     [ 5 ]     [ 6 ]     [ 7 ]     [ 8 ]     [ 9 ]
lower limit             -8.93524  -5.59783  -3.15327  -1.02314   1.02181   3.15101   5.59511   8.93073
upper limit   -8.93524  -5.59783  -3.15327  -1.02314   1.02181   3.15101   5.59511   8.93073
observed no   51.00000  83.00000  48.00000  54.00000  42.00000  41.00000  52.00000  70.00000  59.00000
probability    0.11111   0.11111   0.11111   0.11111   0.11111   0.11111   0.11111   0.11111   0.11111
expected no   55.55556  55.55556  55.55556  55.55556  55.55556  55.55556  55.55556  55.55556  55.55556
chi square     0.37356  13.55756   1.02756   0.04356   3.30756   3.81356   0.22756   3.75556   0.21356
degree of freedom=7
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =26.320000
p-value=0.000400
99% confidence interval for population variance [46.039145 , 64.069506]
99% confidence interval for population standard deviation [6.785215 , 8.004343]
sample scatter diagram residual plot
X4 marginal probability distribution,
Mathematical Mean: 34.99860
Geometrical Mean : none
Harmonic Mean : none
Variance : 50.01376
S.D. : 7.07204
Skewed Coef. : -0.00033
Kurtosis Coef. : 6.00203
MAD : 5.00031
Range : 171.46965
Mid_range : 37.16623
Median : 34.99991
Q1 : 31.53313
Q2 : 34.99991
Q3 : 38.46399
IQR : 6.93085
C.V. : 0.20207
X5 marginal probability distribution,
Mathematical Mean: 44.99956
Geometrical Mean : 44.43818
Harmonic Mean : 43.87912
Variance : 49.99835
S.D. : 7.07095
Skewed Coef. : 0.00010
Kurtosis Coef. : 1.33335
MAD : 6.66655
Range : 20.00000
Mid_range : 45.00000
Median : 44.92970
Q1 : 37.92903
Q2 : 44.92970
Q3 : 52.07031
IQR : 14.14128
C.V. : 0.15713
Y1=X marginal probability distribution,
Mathematical Mean: 25.00241
Geometrical Mean : none
Harmonic Mean : none
Variance : 249.96187
S.D. : 15.81018
Skewed Coef. : -0.00004
Kurtosis Coef. : 2.15907
MAD : 13.43457
Range : 163.69237
Mid_range : 31.96496
Median : 25.14082
Q1 : 13.19401
Q2 : 25.14082
Q3 : 36.66261
IQR : 23.46860
C.V. : 0.63235
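The variance of the merged data, about 250, is much larger than the within-category variance of 50 because the category means differ. The sketch below applies the decomposition total variance = mean of the category variances + variance of the category means. The category means 5, 15, 25, 35, 45 are assumed from the sample means in (21.1), and the function name is illustrative.

```python
def merged_mean_and_variance(weights, means, variances):
    """Mean and variance of a mixture with the given category weights."""
    overall_mean = sum(w * m for w, m in zip(weights, means))
    within = sum(w * v for w, v in zip(weights, variances))
    between = sum(w * (m - overall_mean) ** 2 for w, m in zip(weights, means))
    return overall_mean, within + between

weights = [0.2] * 5                        # equal category sizes
means = [5.0, 15.0, 25.0, 35.0, 45.0]      # assumed population means
variances = [50.0] * 5                     # common variance of every category
merged_mean, merged_variance = merged_mean_and_variance(weights, means, variances)
print(f"merged mean = {merged_mean:.2f}, merged variance = {merged_variance:.2f}")
```

The within part contributes 50 and the between part 200, so the merged distribution has variance 250 even though every category has variance 50.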
(21.2.3) The mean of X1,X2,X3,X4,X5, $Y_1 = \frac{X_1 + X_2 + X_3 + X_4 + X_5}{5}$,
Mathematical Mean: 24.99933
Geometrical Mean : 24.79516
Harmonic Mean : 24.58566
Variance : 9.99995
S.D. : 3.16227
Skewed Coef. : 0.00040
Kurtosis Coef. : 2.95419
MAD : 2.53224
Range : 44.39859
Mid_range : 23.49363
Median : 24.99926
Q1 : 22.84348
Q2 : 24.99926
Q3 : 27.15464
IQR : 4.31117
C.V. : 0.12649
$\varepsilon_{3j} \overset{iid}{\sim} \mathrm{Semi\_circle}(0,\ R_{\varepsilon_3} = \sqrt{200})$, $\sigma_{\varepsilon_3}^2 = 50$, $\varepsilon_{4j} \overset{iid}{\sim} DE(\lambda_{\varepsilon_4} = 0.2,\ 0)$, $\sigma_{\varepsilon_4}^2 = 50$,
(22.1) n=100. Each category has its own specific probability distribution and the
variances are equal; the error is assumed to be normally distributed when the
data are analyzed.
One way model analysis,
One way model
X(ij)=mu+alpha(i)+e(ij), i=1,2...,5, j=1,2...,n(i)
1=A1, 2=A2, 3=A3, 4=A4, 5=A5
A1 A2 A3 A4 A5 Total
sample size 100 100 100 100 100 500
sample mean 24.47177 25.20717 24.55538 25.82802 25.92013 25.19649
sample variance 47.91952 39.94974 43.76623 43.07667 52.68748
alpha estimate value -0.72472 0.01068 -0.64111 0.63152 0.72364
summation of alpha(i)=-0.000000
H0:alpha(1)=...=alpha(5)=0
ANOVA
Source df SS MS F
Treatment 4 185.8843267195 46.4710816799 1.0217931828
Error 495 22512.5649871330 45.4799292669
Total 499 22698.4493138525
The F test p value=0.399200
class            [ 1 ]     [ 2 ]     [ 3 ]     [ 4 ]     [ 5 ]     [ 6 ]     [ 7 ]     [ 8 ]     [ 9 ]
lower limit             -8.23232  -5.15746  -2.90521  -0.94266   0.94142   2.90313   5.15496   8.22817
upper limit   -8.23232  -5.15746  -2.90521  -0.94266   0.94142   2.90313   5.15496   8.22817
observed no   66.00000  61.00000  51.00000  57.00000  42.00000  36.00000  48.00000  75.00000  64.00000
probability    0.11111   0.11111   0.11111   0.11111   0.11111   0.11111   0.11111   0.11111   0.11111
expected no   55.55556  55.55556  55.55556  55.55556  55.55556  55.55556  55.55556  55.55556  55.55556
chi square     1.96356   0.53356   0.37356   0.03756   3.30756   6.88356   1.02756   6.80556   1.28356
degree of freedom=7
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =22.216000
p-value=0.002300
H0: Variances are equal
The Bartlett chi-square test statistic =2.266693
p-value=0.686800
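The Bartlett statistic can be recomputed from the five sample variances. The sketch below assumes the usual form of Bartlett's test with its small-sample correction factor; the source does not state which form its program uses, although the result agrees closely. The p-value uses the closed form of the chi-square tail for 4 degrees of freedom. Function names are illustrative.

```python
import math

def bartlett_statistic(variances, sizes):
    """Bartlett's test statistic for equal variances across groups."""
    k = len(variances)
    dfs = [n - 1 for n in sizes]
    df_total = sum(dfs)
    pooled = sum(d * v for d, v in zip(dfs, variances)) / df_total
    numerator = df_total * math.log(pooled) - sum(
        d * math.log(v) for d, v in zip(dfs, variances))
    correction = 1.0 + (sum(1.0 / d for d in dfs) - 1.0 / df_total) / (3.0 * (k - 1))
    return numerator / correction

def chi_square_tail_df4(x):
    """P(chi-square with 4 df > x) = exp(-x/2) * (1 + x/2)."""
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)

# Sample variances of A1..A5 in (22.1), each from 100 observations.
variances = [47.91952, 39.94974, 43.76623, 43.07667, 52.68748]
statistic = bartlett_statistic(variances, [100] * 5)
p_value = chi_square_tail_df4(statistic)
print(f"Bartlett statistic = {statistic:.4f}, p-value = {p_value:.4f}")
```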
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=256, number of the positive of residual=244
Run=237
H0: residual is random , H1: Increasing line or decreasing line
Z=-1.241278, p-value=0.107300
H0: residual is random , H1: Oscillation
Z=-1.241278, p-value=0.892700
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-1.241278, p-value=0.214600
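The Z value of the run test follows from the counts of negative and positive residuals and the number of runs. A minimal sketch, assuming the standard large-sample runs test without a continuity correction (this assumption reproduces the printed Z); the function names are illustrative.

```python
import math

def runs_test_z(n_negative, n_positive, runs):
    """Normal approximation to the number of runs in a two-valued sequence."""
    n = n_negative + n_positive
    product = 2.0 * n_negative * n_positive
    expected_runs = product / n + 1.0
    variance_runs = product * (product - n) / (n**2 * (n - 1))
    return (runs - expected_runs) / math.sqrt(variance_runs)

def normal_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

z = runs_test_z(256, 244, 237)
p_too_few_runs = normal_cdf(z)          # H1: increasing or decreasing line
p_too_many_runs = 1.0 - normal_cdf(z)   # H1: oscillation
p_two_sided = 2.0 * min(p_too_few_runs, p_too_many_runs)
print(f"Z = {z:.6f}, p = {p_too_few_runs:.4f}, {p_too_many_runs:.4f}, {p_two_sided:.4f}")
```

Too few runs point to a trend and too many runs point to oscillation, which is why the same Z gives three different p-values.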
multiple comparison of population means
1. LSD (least significant difference), assuming each population is normally distributed,
The confidence coefficient=0.95
95% C.I. for mu(1)-mu(2)[ -2.6047178349, 1.13391193990] mu(1)=mu(2)
95% C.I. for mu(1)-mu(3)[ -1.9529250259, 1.78570474890] mu(1)=mu(3)
95% C.I. for mu(1)-mu(4)[ -3.2255615302, 0.51306824450] mu(1)=mu(4)
95% C.I. for mu(1)-mu(5)[ -3.3176800410, 0.42094973370] mu(1)=mu(5)
95% C.I. for mu(2)-mu(3)[ -1.2175220784, 2.52110769640] mu(2)=mu(3)
95% C.I. for mu(2)-mu(4)[ -2.4901585827, 1.24847119200] mu(2)=mu(4)
95% C.I. for mu(2)-mu(5)[ -2.5822770935, 1.15635268130] mu(2)=mu(5)
95% C.I. for mu(3)-mu(4)[ -3.1419513917, 0.59667838300] mu(3)=mu(4)
95% C.I. for mu(3)-mu(5)[ -3.2340699025, 0.50455987220] mu(3)=mu(5)
95% C.I. for mu(4)-mu(5)[ -1.9614333982, 1.77719637660] mu(4)=mu(5)
conclusion,mu(1)=mu(2)= mu(3)=mu(4)=mu(5),
90% confidence interval for population variance [41.174814 , 50.790425]
90% confidence interval for population standard deviation [6.416760 , 7.126740]
95% confidence interval for population variance [40.441479 , 51.952494]
95% confidence interval for population standard deviation [6.359362 , 7.207808]
99% confidence interval for population variance [39.080477 , 54.385607]
99% confidence interval for population standard deviation [6.251438 , 7.374660]
sample scatter diagram residual plot
(22.2) n=100,000,000; this is big data and the method is the probability distribution.
(22.2.1)X1,…,X5 marginal probability distribution,
The comparison of X1 and X2 The comparison of X1 and X3
The comparison of X2 and X5 The comparison of X3 and X4
Y1=X marginal probability distribution,
Mathematical Mean: 25.00116
Geometrical Mean : none
Harmonic Mean : none
Variance : 50.00985
S.D. : 7.07176
Skewed Coef. : -0.00055
Kurtosis Coef. : 2.77059
MAD : 5.93557
Range : 163.69237
Mid_range : 21.96496
Median : 25.00175
Q1 : 19.22921
Q2 : 25.00175
Q3 : 30.77336
IQR : 11.54415
C.V. : 0.28286
(22.2.3) The mean of X1,X2,X3,X4,X5, $Y_1 = \frac{X_1 + X_2 + X_3 + X_4 + X_5}{5}$.
Mathematical Mean: 25.00003
Geometrical Mean : 24.79586
Harmonic Mean : 24.58635
Variance : 10.00058
S.D. : 3.16237
Skewed Coef. : 0.00042
Kurtosis Coef. : 2.95455
MAD : 2.53237
Range : 40.88973
Mid_range : 25.29991
Median : 24.99994
Q1 : 22.84364
Q2 : 24.99994
Q3 : 27.15600
IQR : 4.31237
C.V. : 0.12649
Each category has n sample data, and the one way model is designed as
$X_{ij} = \mu + \alpha_i + \varepsilon_{ij}$, $i = 1,2,\dots,5$, $j = 1,\dots,n$, $\mu = 25$,
n=100,
One way model analysis,
One way model
X(ij)=mu+alpha(i)+e(ij), i=1,2...,5, j=1,2...,n(i)
1=X1, 2=X2, 3=X3, 4=X4, 5=X5
X1 X2 X3 X4 X5 Total
sample size 100 100 100 100 100 500
sample mean 25.83636 24.37861 25.14427 25.48965 24.80035 25.12985
sample variance 24.12428 28.19286 19.79491 27.18655 26.64595
alpha estimate value 0.70651 -0.75124 0.01442 0.35980 -0.32949
summation of alpha(i)=0.000000
H0:alpha(1)=...=alpha(5)=0
ANOVA
Source df SS MS F
Treatment 4 130.1739575636 32.5434893909 1.2919769848
Error 495 12468.5094530738 25.1889079860
Total 499 12598.6834106374
The F test p value=0.277200,
class            [ 1 ]     [ 2 ]     [ 3 ]     [ 4 ]     [ 5 ]     [ 6 ]     [ 7 ]     [ 8 ]     [ 9 ]
lower limit             -6.12657  -3.83823  -2.16208  -0.70153   0.70062   2.16053   3.83636   6.12348
upper limit   -6.12657  -3.83823  -2.16208  -0.70153   0.70062   2.16053   3.83636   6.12348
observed no   60.00000  53.00000  42.00000  55.00000  65.00000  56.00000  62.00000  56.00000  51.00000
probability    0.11111   0.11111   0.11111   0.11111   0.11111   0.11111   0.11111   0.11111   0.11111
expected no   55.55556  55.55556  55.55556  55.55556  55.55556  55.55556  55.55556  55.55556  55.55556
chi square     0.35556   0.11756   3.30756   0.00556   1.60556   0.00356   0.74756   0.00356   0.37356
degree of freedom=7
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =6.520000
p-value=0.480500
sample scatter diagram residual plot
$\alpha = 0.05$; the test statistic $\dfrac{\bar{X}_i - \bar{X}_j}{\sqrt{MSE\left(\frac{1}{n} + \frac{1}{n}\right)}}$ has a symmetric distribution, so only the right sided
critical value is shown.
P(|test statistic| ≤ right sided critical value)=0.95,
critical value Treatment number,k
n 2 3 4 5 6
2 4.3023 4.1774 4.0682 4.0120 3.9780
3 2.7745 3.0668 3.1999 3.2939 3.3600
4 2.4442 2.7922 2.9696 3.0870 3.1788
5 2.3028 2.6695 2.8624 2.9919 3.0905
8 2.1437 2.5208 2.7304 2.8769 2.9865
10 2.0997 2.4792 2.6944 2.8416 2.9540
15 2.0491 2.4300 2.6489 2.7993 2.9161
20 2.0247 2.4066 2.6280 2.7820 2.8984
25 2.0085 2.3917 2.6146 2.7703 2.8880
30 2.0007 2.3852 2.6074 2.7628 2.8821
20 2.9951 3.0740 3.1435 3.2017 3.2549
25 2.9855 3.0663 3.1323 3.1954 3.2474
30 2.9792 3.0600 3.1274 3.1900 3.2428
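Critical values of this kind can be approximated by simulation. The sketch below is one possible reading, not the authors' procedure: it assumes normal errors with equal means, takes the largest pairwise |test statistic| among the k treatments, and reports its 95th percentile. This reading is consistent with the k=2 column, which is close to Student's t quantiles. The replication count, seed and function names are arbitrary.

```python
import math
import random

def max_pairwise_statistic(k, n, rng):
    """Largest |mean_i - mean_j| / sqrt(MSE * (1/n + 1/n)) over all pairs."""
    groups = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]
    means = [sum(g) / n for g in groups]
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    mse = sse / (k * (n - 1))
    # The largest pairwise gap is the gap between the extreme group means.
    return (max(means) - min(means)) / math.sqrt(mse * (1.0 / n + 1.0 / n))

def simulated_critical_value(k, n, reps=20000, level=0.95, seed=1):
    """Empirical quantile of the largest pairwise statistic."""
    rng = random.Random(seed)
    stats = sorted(max_pairwise_statistic(k, n, rng) for _ in range(reps))
    return stats[round(level * reps) - 1]

critical = simulated_critical_value(k=3, n=10)
print(f"k=3, n=10: simulated critical value = {critical:.3f}")
```

For k=3 and n=10 the simulated value lands near the tabulated 2.4792, with Monte Carlo error in the second decimal place.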
Chapter 5. Simple linear model
(1.2) Big data
The simple linear model analysis can be applied to big data. The method is as follows:
$f_X(x)$ and $f_\varepsilon(\varepsilon)$ can be formed using curve-fitting or the SLLN.
$Y = H(x) + \varepsilon$, where $H(x)$ comes from the linear model analysis.
$X$ and $\varepsilon$ are independent random variables.
$f_{X,\varepsilon}(x, \varepsilon) = f_X(x) f_\varepsilon(\varepsilon)$, $f_{X,Y}(x, y) = f_{X,\varepsilon}(x, \varepsilon = y - H(x))$,
$f_Y(y) = \int f_{X,Y}(x, y)\,dx$,
$f_{Y|x}(y \mid x) = \frac{f_{X,Y}(x, y)}{f_X(x)}$, $f_{X|y}(x \mid y) = \frac{f_{X,Y}(x, y)}{f_Y(y)}$,
These are the marginal probability distributions, the conditional probability distributions and the joint
probability distribution.
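The marginal density of Y can be evaluated numerically from these formulas. The sketch below is a hypothetical example and does not use the book's data: it assumes X ~ N(0, 2^2), ε ~ N(0, 3^2) and H(x) = 1 + 2x, integrates f_X(x) f_ε(y - H(x)) over x with the trapezoid rule, and compares the result with the N(1, 5^2) density that these choices imply.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of Normal(mu, sigma^2) at x."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma**2)) / (sigma * math.sqrt(2.0 * math.pi))

def marginal_y_pdf(y, f_x, f_eps, h, lower, upper, steps=4000):
    """f_Y(y) = integral of f_X(x) * f_eps(y - H(x)) dx (trapezoid rule)."""
    width = (upper - lower) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lower + i * width
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * f_x(x) * f_eps(y - h(x))
    return total * width

# Hypothetical ingredients chosen so that the exact answer is known.
f_x = lambda x: normal_pdf(x, 0.0, 2.0)
f_eps = lambda e: normal_pdf(e, 0.0, 3.0)
h = lambda x: 1.0 + 2.0 * x

for y in (-4.0, 1.0, 7.0):
    numeric = marginal_y_pdf(y, f_x, f_eps, h, -20.0, 20.0)
    print(y, numeric, normal_pdf(y, 1.0, 5.0))
```

The same function works when f_X and f_ε come from curve-fitting and are not normal, as long as the integration limits cover the range of X.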
[ 2 ] -4.88201~ -3.45020 -4.16610 34.00000 0.0340000 0.0440000
[ 3 ] -3.45020~ -2.01839 -2.73429 128.00000 0.1280000 0.1720000
[ 4 ] -2.01839~ -0.58657 -1.30248 231.00000 0.2310000 0.4030000
[ 5 ] -0.58657~ 0.84524 0.12933 279.00000 0.2790000 0.6820000
[ 6 ] 0.84524~ 2.27705 1.56115 197.00000 0.1970000 0.8790000
[ 7 ] 2.27705~ 3.70886 2.99296 84.00000 0.0840000 0.9630000
[ 8 ] 3.70886~ 5.14068 4.42477 27.00000 0.0270000 0.9900000
[ 9 ] 5.14068~ 6.57249 5.85658 10.00000 0.0100000 1.0000000
frequency distribution: sample mean=-0.075416 , sample variance=4.355512 , sample sd=2.086986
(23.1.4)
$X_{2i} = \beta_0 + \beta_1 X_{1i} + \varepsilon_i$, $i = 1, 2, \ldots, n$, where $\beta_0$ is the intercept, $\beta_1$ is the slope, and $\varepsilon_i$ is the error,
(23.1.4.1)
The linear model analysis
The estimated line is X2=9.496367+-0.000008*X1
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1 1 0.0000002558 0.0000002558 0.0000000017
error 998 150956.1107438368 151.2586279998
total 999 150956.1107440926
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
H0: slope(X1)=0
The F test p value=1.000000
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 9.4963665700 0.3891511813 24.40277 0.00000
slope -0.0000077701 0.1889605459 -0.00004 1.00000
----------------------------------------------------------------------------------
MSE=151.2586279998 , R2=0.000000 , R2(adj)=-0.001002
X2(mean)= 9.4963671217, X2(variance)= 151.1072179621, X2(s.d.)= 12.2925675903
X1(mean)= -0.0710038541, X1(variance)= 4.2404544136, X1(s.d.)= 2.0592363666
SSX1= 4236.2139591564 , SS(X2*X1)= -0.0329158468, C.V.= 1.2950978508
[testing the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class     lower limit   upper limit   observed no   probability   expected no   chi square
[ 1 ]     -inf          -15.76201     0.00000       0.10000       100.00000     100.00000
[ 2 ]     -15.76201     -10.35077     8.00000       0.10000       100.00000     84.64000
[ 3 ]     -10.35077     -6.44917      351.00000     0.10000       100.00000     630.01000
[ 4 ]     -6.44917      -3.11546      213.00000     0.10000       100.00000     127.69000
[ 5 ]     -3.11546      0.00031       112.00000     0.10000       100.00000     1.44000
[ 6 ]     0.00031       3.11558       76.00000      0.10000       100.00000     5.76000
[ 7 ]     3.11558       6.44919       62.00000      0.10000       100.00000     14.44000
[ 8 ]     6.44919       10.34561      52.00000      0.10000       100.00000     23.04000
[ 9 ]     10.34561      15.76074      39.00000      0.10000       100.00000     37.21000
[ 10 ]    15.76074      +inf          87.00000      0.10000       100.00000     1.69000
degree of freedom=8
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =1025.920000
p-value=0.000000
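The Pearson chi-square statistic in the goodness-of-fit output above can be recomputed from the observed counts. The sketch below is an illustration, not the authors' software: it takes the printed degree of freedom (8 for 10 classes) as given and uses a closed form for the upper tail that holds only for an even number of degrees of freedom.

```python
import math

def pearson_chi_square(observed, probabilities):
    """Sum over classes of (observed - expected)^2 / expected."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probabilities))

def chi2_sf_even_df(x, df):
    """Upper tail P(chi-square_df >= x); this closed form requires even df."""
    if df % 2 != 0:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

# Observed counts copied from the goodness-of-fit output for the residual.
observed = [0, 8, 351, 213, 112, 76, 62, 52, 39, 87]
probabilities = [0.1] * 10

statistic = pearson_chi_square(observed, probabilities)
p_value = chi2_sf_even_df(statistic, df=8)
print(statistic, p_value)
```

The recomputed statistic agrees with the printed 1025.92, and the tail probability is far below any usual significance level, so normality of the residual is rejected for this fit.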
(23.1.4.2) residual analysis
X0= residual,residual frequency distribution table,
class class limit class midpoint frequency relative frequency cumulative frequency
[ 1 ] -11.44891~ -1.55595 -6.50243 632.00000 0.6320000 0.6320000
[ 2 ] -1.55595~ 8.33702 3.39053 217.00000 0.2170000 0.8490000
[ 3 ] 8.33702~ 18.22998 13.28350 75.00000 0.0750000 0.9240000
[ 4 ] 18.22998~ 28.12294 23.17646 32.00000 0.0320000 0.9560000
[ 5 ] 28.12294~ 38.01591 33.06942 19.00000 0.0190000 0.9750000
[ 6 ] 38.01591~ 47.90887 42.96239 11.00000 0.0110000 0.9860000
[ 7 ] 47.90887~ 57.80183 52.85535 6.00000 0.0060000 0.9920000
[ 8 ] 57.80183~ 67.69480 62.74831 6.00000 0.0060000 0.9980000
[ 9 ] 67.69480~ 77.58776 72.64128 2.00000 0.0020000 1.0000000
frequency distribution: sample mean=0.303929 , sample variance=151.568080 , sample sd=12.311299
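The summary line under the frequency table can be reproduced from the class midpoints and frequencies. This is a small sketch; dividing by n rather than n - 1 in the variance is an assumption, chosen because it matches the printed value.

```python
# Class midpoints and frequencies copied from the residual frequency table.
midpoints = [-6.50243, 3.39053, 13.28350, 23.17646, 33.06942,
             42.96239, 52.85535, 62.74831, 72.64128]
frequencies = [632, 217, 75, 32, 19, 11, 6, 6, 2]

def grouped_moments(midpoints, frequencies):
    """Mean and variance of grouped data, each class represented by its midpoint."""
    n = sum(frequencies)
    mean = sum(m * f for m, f in zip(midpoints, frequencies)) / n
    # Divisor n (not n - 1): an assumption that reproduces the printed variance.
    variance = sum(f * (m - mean) ** 2 for m, f in zip(midpoints, frequencies)) / n
    return mean, variance

mean, variance = grouped_moments(midpoints, frequencies)
print(mean, variance, variance ** 0.5)
```

Because every observation is replaced by its class midpoint, these values differ slightly from the moments of the raw residuals.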
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
H0: slope(X1)=0
The F test p value=0.000100
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 0.9706108969 0.0383249777 25.32580 0.00000
slope 2.0101963232 0.0051629974 389.34676 0.00000
----------------------------------------------------------------------------------
MSE=0.9892994144 , R2=0.993460 , R2(adj)=0.993453
X2(mean)= 9.4963671217, X2(variance)= 151.1072179621, X2(s.d.)= 12.2925675903
X1^2(mean)= 4.2412555065, X1^2(variance)= 37.1499685491, X1^2(s.d.)= 6.0950774030
SS(X1^2)=37112.8185805040 , SS(X2*X1^2)= 74604.0514540141, C.V.= 0.1047385073
[testing the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class     lower limit   upper limit   observed no   probability   expected no   chi square
[ 1 ]     -inf          -1.27472      94.00000      0.10000       100.00000     0.36000
[ 2 ]     -1.27472      -0.83710      112.00000     0.10000       100.00000     1.44000
[ 3 ]     -0.83710      -0.52156      96.00000      0.10000       100.00000     0.16000
[ 4 ]     -0.52156      -0.25196      89.00000      0.10000       100.00000     1.21000
[ 5 ]     -0.25196      0.00002       109.00000     0.10000       100.00000     0.81000
[ 6 ]     0.00002       0.25197       100.00000     0.10000       100.00000     0.00000
[ 7 ]     0.25197       0.52157       104.00000     0.10000       100.00000     0.16000
[ 8 ]     0.52157       0.83668       88.00000      0.10000       100.00000     1.44000
[ 9 ]     0.83668       1.27462       103.00000     0.10000       100.00000     0.09000
[ 10 ]    1.27462       +inf          105.00000     0.10000       100.00000     0.25000
degree of freedom=8
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =5.920000
p-value=0.656100
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=500, number of the positive of residual=500
H0: residual is random , H1: Increasing line or decreasing line, Z=-1.518654, p-value=0.064500
H0: residual is random , H1: Oscillation, Z=-1.518654, p-value=0.935500
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-1.518654, p-value=0.129000
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,1000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0, D.W. test=1.928344
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0, D.W. test=2.071656
The D.W. table depends on the values of the independent variable,
so computing the p value takes a long time.
The Durbin Watson table is incorrect at present.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
2. The population sigma of error confidence interval
90% confidence interval for population variance [0.921448 , 1.067938]
90% confidence interval for population standard deviation [0.959921 , 1.033411]
95% confidence interval for population variance [0.909498 , 1.084451]
95% confidence interval for population standard deviation [0.953676 , 1.041370]
99% confidence interval for population variance [0.887006 , 1.118262]
99% confidence interval for population standard deviation [0.941810 , 1.057479]
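The printed intervals for the error variance appear consistent, to about four decimal places, with a large-sample normal approximation to the chi-square distribution. That reading is an inference from the numbers, not something the source states, and the function name below is hypothetical.

```python
import math
from statistics import NormalDist

def variance_interval(mse, df, level):
    """Approximate CI for the error variance, using chi-square/df ~ 1 +/- z*sqrt(2/df)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * math.sqrt(2.0 / df)
    return mse / (1.0 + half_width), mse / (1.0 - half_width)

# MSE and error degrees of freedom copied from the output above (n = 1000, so df = 998).
low, high = variance_interval(mse=0.9892994144, df=998, level=0.95)
sd_low, sd_high = math.sqrt(low), math.sqrt(high)
print(low, high, sd_low, sd_high)
```

The interval for the standard deviation is obtained by taking square roots of the variance limits. For small degrees of freedom exact chi-square quantiles would be needed instead of this approximation.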
[Figure panels: estimated line (X1^2); residual plot]
(23.1.5.2)
X0=residual,residual frequency distribution table,
class class limit class midpoint frequency relative frequency cumulative frequency
[ 1 ] -2.96420~ -2.27648 -2.62034 9.00000 0.0090000 0.0090000
[ 2 ] -2.27648~ -1.58875 -1.93261 54.00000 0.0540000 0.0630000
[ 3 ] -1.58875~ -0.90102 -1.24489 121.00000 0.1210000 0.1840000
[ 4 ] -0.90102~ -0.21330 -0.55716 218.00000 0.2180000 0.4020000
[ 5 ] -0.21330~ 0.47443 0.13056 291.00000 0.2910000 0.6930000
[ 6 ] 0.47443~ 1.16215 0.81829 181.00000 0.1810000 0.8740000
[ 7 ] 1.16215~ 1.84988 1.50602 99.00000 0.0990000 0.9730000
[ 8 ] 1.84988~ 2.53761 2.19374 23.00000 0.0230000 0.9960000
[ 9 ] 2.53761~ 3.22533 2.88147 4.00000 0.0040000 1.0000000
frequency distribution: sample mean=-0.002854 , sample variance=1.013268 , sample sd
Conclusion,
the population conditional expectation line is $E(Y \mid x) = \beta_0 + \beta_1 H(x)$,
where $H(x)$ is a function of $x$ and $\varepsilon \sim \mathrm{Normal}(0, \sigma^2 = 1)$. There are $n$ paired samples, so
$Y_i = \beta_0 + \beta_1 H(X_i) + \varepsilon_i$, $i = 1, 2, \ldots, n$, where $\beta_0$ is the intercept, $\beta_1$ is the slope, and $\varepsilon_i$ is the error.
The three basic assumptions are:
i) $\varepsilon_i \sim$ Normal distribution, ii) $E(\varepsilon_i) = 0$, $\mathrm{Var}(\varepsilon_i) = \sigma^2$, iii) $\varepsilon_1, \ldots, \varepsilon_n$ are independent.
(23.2.1.2)X1 marginal probability distribution,
f(x1),F(x1) Coefficient
Mathematical Mean: -0.00013
Geometrical Mean : none
Harmonic Mean : none
Variance : 4.00003
S.D. : 2.00001
Skewed Coef. : -0.00020
Kurtosis Coef. : 2.99965
MAD : 1.59580
Range : 23.23623
Mid_range : 0.42831
Median : -0.00000
Q1 : -1.34943
Q2 : -0.00000
Q3 : 1.34898
IQR : 2.69841
C.V. : none
(23.2.2)
Non-linear model analysis
The relation is X2=1.0000038041+2.0000020130*X1^2 (this is the analysis of the population data)
ANOVA
----------------------------------------------------------------------------------
Source df SS MS
----------------------------------------------------------------------------------
X1^2 1 12797974037.9741990000 12797974037.9741990000
error 99999998 99969713.5608463290 0.9996971556
total 99999999 12897943751.5350460000
----------------------------------------------------------------------------------
F test value=12801851006.8304460000,
H0: slope(X1)=0
The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 1.0000038041 0.0001224595 8165.99698 0.00000
slope 2.0000020130 0.0000176764 113145.26507 0.00000
----------------------------------------------------------------------------------
MSE=0.9996971556 , R2=0.992249 , R2(adj)=0.992249
X2(mean)= 9.0000657709, X2(variance)= 128.9794388051, X2(s.d.)= 11.3569114994
X1^2(mean)= 4.0000269573, X1^2(variance)= 31.9948710078, X1^2(s.d.)= 5.6564008882
SS(X1^2)=3199487068.7844071000 , SS(X2*X1^2)=6398980578.2747154000,
C.V.= 0.1110934733
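The coefficients, standard error and R2 in the output above follow from the printed sums of squares. The sketch below recomputes them with the standard least-squares formulas; the variable names are illustrative, and the inputs are copied from the output with X1^2 as the regressor.

```python
import math

n = 100_000_000                      # error df 99999998 = n - 2
ss_xx = 3199487068.7844071           # SS(X1^2)
ss_xy = 6398980578.2747154           # SS(X2*X1^2)
mean_x = 4.0000269573                # X1^2(mean)
mean_y = 9.0000657709                # X2(mean)
ss_total = 12897943751.5350460       # total SS from the ANOVA table

slope = ss_xy / ss_xx                        # least-squares slope
intercept = mean_y - slope * mean_x          # line passes through the two means
ss_regression = slope * ss_xy
ss_error = ss_total - ss_regression
mse = ss_error / (n - 2)
se_slope = math.sqrt(mse / ss_xx)            # standard error of the slope
r_squared = ss_regression / ss_total

print(slope, intercept, mse, se_slope, r_squared)
```

Working from sums of squares means the 100,000,000 pairs never need to be held in memory at once; the sums can be accumulated in a single pass over the data.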
[Figure panels: The joint probability of x1^2 and residual; The joint probability of X2 estimated value and X2]
(23.2.4) Conclusion,
X1~Normal(0,4),X2=1.0000038041+2.0000020130*X1^2+error,
error~Normal(0,1).
[ 6 ] 1.13645~ 3.20520 2.17082 212.00000 0.2120000 0.8850000
[ 7 ] 3.20520~ 5.27395 4.23957 93.00000 0.0930000 0.9780000
[ 8 ] 5.27395~ 7.34269 6.30832 16.00000 0.0160000 0.9940000
[ 9 ] 7.34269~ 9.41144 8.37707 6.00000 0.0060000 1.0000000
frequency distribution: sample mean=-0.195823 , sample variance=8.359417 , sample sd=2.891266
(24.1.2) Linear model,
The linear model analysis
The estimated line is X2=0.914975+2.016337*X1
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1 1 33222.8669385391 33222.8669385391 34431.1819581484
error 998 962.9765613322 0.9649063741
total 999 34185.8434998714
----------------------------------------------------------------------------------
H0: slope(X1)=0
The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 0.9149751100 0.0311157337 29.40555 0.00000
slope 2.0163366364 0.0108664347 185.55641 0.00000
----------------------------------------------------------------------------------
MSE=0.9649063741 , R2=0.971831 , R2(adj)=0.971803
X2(mean)= 0.5787895399, X2(variance)= 34.2200635634, X2(s.d.)= 5.8497917539
X1(mean)= -0.1667308742, X1(variance)= 8.1798536980, X1(s.d.)= 2.8600443525
SSX1=8171.6738443145 , SS(X2*X1)= 16476.8453532465, C.V.= 1.6971565864
class     lower limit   upper limit   observed no   probability   expected no   chi square
[ 1 ]     -inf          -1.25891      95.00000      0.10000       100.00000     0.25000
[ 2 ]     -1.25891      -0.82671      121.00000     0.10000       100.00000     4.41000
[ 3 ]     -0.82671      -0.51509      95.00000      0.10000       100.00000     0.25000
[ 4 ]     -0.51509      -0.24883      97.00000      0.10000       100.00000     0.09000
[ 5 ]     -0.24883      0.00002       111.00000     0.10000       100.00000     1.21000
[ 6 ]     0.00002       0.24884       69.00000      0.10000       100.00000     9.61000
[ 7 ]     0.24884       0.51510       105.00000     0.10000       100.00000     0.25000
[ 8 ]     0.51510       0.82630       111.00000     0.10000       100.00000     1.21000
[ 9 ]     0.82630       1.25881       101.00000     0.10000       100.00000     0.01000
[ 10 ]    1.25881       +inf          95.00000      0.10000       100.00000     0.25000
degree of freedom=8
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =17.540000
p-value=0.024900
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=519
number of the positive of residual=481
H0: residual is random , H1: Increasing line or decreasing line
Z=0.299228, p-value=0.617700
H0: residual is random , H1: Oscillation
Z=0.299228, p-value=0.382300
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=0.299228, p-value=0.764600
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,1000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=2.138562
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=1.861438
The D.W. table depends on the values of the independent variable,
so computing the p value takes a long time.
The Durbin Watson table is incorrect at present.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
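The two D.W. values printed for each example always add up to 4, which is what the usual Durbin Watson statistic d and its complement 4 - d would give. The sketch below computes both with the textbook formula; the residuals are hypothetical, and the critical values still have to come from a table or separate software, as the output says.

```python
def durbin_watson(residuals):
    """d = sum over t of (e_t - e_{t-1})^2, divided by the sum of e_t^2."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e * e for e in residuals)
    return numerator / denominator

# Hypothetical residuals that alternate in sign (strong negative autocorrelation).
residuals = [1.0, -1.0, 1.0, -1.0]

d = durbin_watson(residuals)
d_complement = 4.0 - d   # the usual companion value when testing for negative autocorrelation
print(d, d_complement)
```

Values of d near 2 indicate little first-order autocorrelation, values well below 2 point to positive autocorrelation, and values well above 2 point to negative autocorrelation.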
[ 2 ] -2.41622~ -1.62533 -2.02078 34.00000 0.0340000 0.0390000
[ 3 ] -1.62533~ -0.83444 -1.22989 175.00000 0.1750000 0.2140000
[ 4 ] -0.83444~ -0.04355 -0.43900 281.00000 0.2810000 0.4950000
[ 5 ] -0.04355~ 0.74734 0.35190 282.00000 0.2820000 0.7770000
[ 6 ] 0.74734~ 1.53823 1.14279 163.00000 0.1630000 0.9400000
[ 7 ] 1.53823~ 2.32913 1.93368 52.00000 0.0520000 0.9920000
[ 8 ] 2.32913~ 3.12002 2.72457 5.00000 0.0050000 0.9970000
[ 9 ] 3.12002~ 3.91091 3.51546 3.00000 0.0030000 1.0000000
frequency distribution: sample mean=-0.011123 , sample variance=1.013525 , sample sd=1.006740
sample mean(X1)= 0.0002, sample variance(X1)=7.9996,
sample mean(X2)= 1.0003, sample variance(X2)=32.9977,
sample cov(X1,X2)=5.9990,
X1 and X2 sample correlation coefficient=0.9847.
[Figure panels: E(X2|x1) and x1 are in a linear relation; E(X1|x2) and x2 are in a linear relation]
Mathematical Mean: 1.00026
Geometrical Mean : none
Harmonic Mean : none
Variance : 32.99767
S.D. : 5.74436
Skewed Coef. : -0.00042
Kurtosis Coef. : 3.00002
MAD : 4.58337
Range : 62.34209
Mid_range : 1.75812
Median : 1.00061
Q1 : -2.87410
Q2 : 1.00061
Q3 : 4.87528
IQR : 7.74938
C.V. : 5.74288
(24.2.2)
linear model analysis
The estimated line is X2=0.999800+1.999973*X1
ANOVA
----------------------------------------------------------------------------------
Source df SS MS
----------------------------------------------------------------------------------
X1 1 3199757235.7981005000 3199757235.7981005000
error 99999998 100009863.4082655900 1.0000986541
total 99999999 3299767099.2063661000
----------------------------------------------------------------------------------
F test value=3199441597.8159437000,
H0: slope(X1)=0, The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 0.9998002603 0.0001000049 9997.50943 0.00000
slope 1.9999729773 0.0000353579 56563.60665 0.00000
----------------------------------------------------------------------------------
MSE=1.0000986541 , R2=0.969692 , R2(adj)=0.969692
X2(mean)= 1.0002574041, X2(variance)= 32.9976713220, X2(s.d.)= 5.7443599576
X1(mean)= 0.0002285750, X1(variance)=7.9996093391, X1(s.d.)= 2.8283580642
SSX1=799960925.9119683500 , SS(X2*X1)=1599900234.7154553000, C.V.= 0.9997919753
[testing the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class     lower limit   upper limit   observed no      probability   expected no      chi square
[ 1 ]     -inf          -1.64499      4997611.00000    0.05000       5000000.00000    1.14146
[ 2 ]     -1.64499      -1.28166      4998213.00000    0.05000       5000000.00000    0.63867
[ 3 ]     -1.28166      -1.03648      5000648.00000    0.05000       5000000.00000    0.08398
[ 4 ]     -1.03648      -0.84165      5003532.00000    0.05000       5000000.00000    2.49500
[ 5 ]     -0.84165      -0.67450      4995760.00000    0.05000       5000000.00000    3.59552
[ 6 ]     -0.67450      -0.52440      5003631.00000    0.05000       5000000.00000    2.63683
[ 7 ]     -0.52440      -0.38531      5003659.00000    0.05000       5000000.00000    2.67766
[ 8 ]     -0.38531      -0.25359      4991788.00000    0.05000       5000000.00000    13.48739
[ 9 ]     -0.25359      -0.12562      5008607.00000    0.05000       5000000.00000    14.81609
[ 10 ]    -0.12562      -0.00023      4988199.00000    0.05000       5000000.00000    27.85272
[ 11 ]    -0.00023      0.12544       5002254.00000    0.05000       5000000.00000    1.01610
[ 12 ]    0.12544       0.25334       5010054.00000    0.05000       5000000.00000    20.21658
[ 13 ]    0.25334       0.38530       4996379.00000    0.05000       5000000.00000    2.62233
[ 14 ]    0.38530       0.52440       5000935.00000    0.05000       5000000.00000    0.17485
[ 15 ]    0.52440       0.67446       4999903.00000    0.05000       5000000.00000    0.00188
[ 16 ]    0.67446       0.84159       5001543.00000    0.05000       5000000.00000    0.47617
[ 17 ]    0.84159       1.03643       4999865.00000    0.05000       5000000.00000    0.00364
[ 18 ]    1.03643       1.28156       4994052.00000    0.05000       5000000.00000    7.07574
[ 19 ]    1.28156       1.64494       5001195.00000    0.05000       5000000.00000    0.28561
[ 20 ]    1.64494       +inf          5002172.00000    0.05000       5000000.00000    0.94352
degree of freedom=18
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =102.241750, p-value=0.000000
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=50000677, number of the positive of residual=49999323
H0: residual is random , H1: Increasing line or decreasing line
Z=1.046802, p-value=0.852500
H0: residual is random , H1: Oscillation, Z=1.046802, p-value=0.147500
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=1.046802, p-value=0.295000
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t), t=2,3,...,100000000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0, D.W. test=2.000100
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0, D.W. test=1.999900
The D.W. table depends on the values of the independent variable,
so computing the p value takes a long time.
The Durbin Watson table is incorrect at present.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
2. The population sigma of error confidence interval
90% confidence interval for population variance [0.999866 , 1.000331]
90% confidence interval for population standard deviation [0.999933 , 1.000166]
95% confidence interval for population variance [0.999822 , 1.000376]
95% confidence interval for population standard deviation [0.999911 , 1.000188]
99% confidence interval for population variance [0.999734 , 1.000463]
99% confidence interval for population standard deviation [0.999867 , 1.000232]
[Figure panels: The joint probability of X1 and residual; The joint probability of X2 estimated value and X2]
E(| X0 distribution - X1 distribution |^2)= 0.0000000342
************ The | X0 distribution F() - X1 distribution F()| ****************
The almost surely limiting theory
E(| X0 distribution F() - X1 distribution F()|^2)= 0.0000000005
Pr(| X0 distribution F() - X1 distribution F()|< 0.1000000000)= 1.000000
Pr(| X0 distribution F() - X1 distribution F()|< 0.0500000000)= 1.000000
Pr(| X0 distribution F() - X1 distribution F()|< 0.0100000000)= 1.000000
Pr(| X0 distribution F() - X1 distribution F()|< 0.0050000000)= 1.000000
Pr(| X0 distribution F() - X1 distribution F()|< 0.0010000000)= 1.000000
Pr(| X0 distribution F() - X1 distribution F()|< 0.0005000000)= 1.000000
Pr(| X0 distribution F() - X1 distribution F()|< 0.0001000000)= 1.000000
(24.2.4)Conclusion,
X1~Normal(0,8), X2=0.999800+1.999973*X1+error, error~Normal(0,1),
X2~Normal(1,33), since Var(X2) = 2^2 * 8 + 1 = 33, in line with the sample variance 32.9977 of X2.
X1 frequency probability table
class class limit class midpoint frequency relative frequency cumulative frequency
[ 1 ] -3.99999~ -3.11111 -3.55555 219.00000 0.2190000 0.2190000
[ 2 ] -3.11111~ -2.22222 -2.66666 93.00000 0.0930000 0.3120000
[ 3 ] -2.22222~ -1.33334 -1.77778 73.00000 0.0730000 0.3850000
[ 4 ] -1.33334~ -0.44446 -0.88890 69.00000 0.0690000 0.4540000
[ 5 ] -0.44446~ 0.44442 -0.00002 65.00000 0.0650000 0.5190000
[ 6 ] 0.44442~ 1.33331 0.88887 83.00000 0.0830000 0.6020000
[ 7 ] 1.33331~ 2.22219 1.77775 80.00000 0.0800000 0.6820000
[ 8 ] 2.22219~ 3.11107 2.66663 113.00000 0.1130000 0.7950000
[ 9 ] 3.11107~ 3.99996 3.55551 205.00000 0.2050000 1.0000000
frequency distribution: sample mean=0.028428 , sample variance=7.427828 , sample sd=2.725404
(25.1.2)Linear model,
The linear model analysis
The estimated line is X2=1.003288+1.994835*X1
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1 1 31635.2079013432 31635.2079013432 30064.5594131703
error 998 1050.1380396651 1.0522425247
total 999 32685.3459410082
----------------------------------------------------------------------------------
H0: slope(X1)=0
The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 1.0032882298 0.0324391428 30.92832 0.00000
slope 1.9948353290 0.0115048147 173.39135 0.00000
----------------------------------------------------------------------------------
MSE=1.0522425247 , R2=0.967871 , R2(adj)=0.967839
X2(mean)= 1.0441211382, X2(variance)= 32.7180640050, X2(s.d.)= 5.7199706297
X1(mean)=0.0204693128, X1(variance)= 7.9577648652, X1(s.d.)= 2.8209510569
SSX1=7949.8071003290 , SS(X2*X1)= 15858.5560627216, C.V.= 0.9824422622
[testing the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class     lower limit   upper limit   observed no   probability   expected no   chi square
[ 1 ]     -inf          -1.31465      95.00000      0.10000       100.00000     0.25000
[ 2 ]     -1.31465      -0.86332      106.00000     0.10000       100.00000     0.36000
[ 3 ]     -0.86332      -0.53790      112.00000     0.10000       100.00000     1.44000
[ 4 ]     -0.53790      -0.25985      93.00000      0.10000       100.00000     0.49000
[ 5 ]     -0.25985      0.00003       86.00000      0.10000       100.00000     1.96000
[ 6 ]     0.00003       0.25986       118.00000     0.10000       100.00000     3.24000
[ 7 ]     0.25986       0.53790       97.00000      0.10000       100.00000     0.09000
[ 8 ]     0.53790       0.86289       81.00000      0.10000       100.00000     3.61000
[ 9 ]     0.86289       1.31454       114.00000     0.10000       100.00000     1.96000
[ 10 ]    1.31454       +inf          98.00000      0.10000       100.00000     0.04000
degree of freedom=8
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =13.440000, p-value=0.097500
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=492
number of the positive of residual=508
H0: residual is random , H1: Increasing line or decreasing line
Z=-0.498246, p-value=0.309200
H0: residual is random , H1: Oscillation
Z=-0.498246, p-value=0.690800
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-0.498246, p-value=0.618400
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,1000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0 , D.W. test=2.016499
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0 , D.W. test=1.983501
The D.W. table depends on the values of the independent variable,
so computing the p value takes a long time.
The Durbin Watson table is incorrect at present.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
2. The population sigma of error confidence interval
90% confidence interval for population variance [0.980074 , 1.135885]
90% confidence interval for population standard deviation [0.989987 , 1.065779]
95% confidence interval for population variance [0.967364 , 1.153448]
95% confidence interval for population standard deviation [0.983547 , 1.073987]
99% confidence interval for population variance [0.943441 , 1.189410]
99% confidence interval for population standard deviation [0.971309 , 1.090601]
[Figure panels: estimated line; residual plot]
(25.1.3) Residual analysis
X0=residual,residual frequency distribution table,
class class limit class midpoint frequency relative frequency cumulative frequency
[ 1 ] -3.41247~ -2.68336 -3.04792 3.00000 0.0030000 0.0030000
[ 2 ] -2.68336~ -1.95426 -2.31881 26.00000 0.0260000 0.0290000
[ 3 ] -1.95426~ -1.22515 -1.58970 84.00000 0.0840000 0.1130000
[ 4 ] -1.22515~ -0.49604 -0.86059 214.00000 0.2140000 0.3270000
[ 5 ] -0.49604~ 0.23307 -0.13149 271.00000 0.2710000 0.5980000
[ 6 ] 0.23307~ 0.96217 0.59762 227.00000 0.2270000 0.8250000
[ 7 ] 0.96217~ 1.69128 1.32673 124.00000 0.1240000 0.9490000
[ 8 ] 1.69128~ 2.42039 2.05584 40.00000 0.0400000 0.9890000
[ 9 ] 2.42039~ 3.14950 2.78494 11.00000 0.0110000 1.0000000
frequency distribution: sample mean=-0.009726 , sample variance=1.096746 , sam
(25.1.4) Conclusion,
In Example 24,
$X_1 \sim \mathrm{Normal}(\mu_{X_1} = 0, \sigma^2_{X_1} = 8)$, the population conditional expectation line is
$E(X_2 \mid x_1) = \beta_0 + \beta_1 x_1 = 1 + 2 x_1$,
$\varepsilon \sim \mathrm{Normal}(0, \sigma^2 = 1)$, paired samples, $n = 1000$.
In Example 25,
$X_1 \sim \mathrm{Arcsin}(\mu_{X_1} = 0, c_{X_1} = 4)$, the population conditional expectation line is
$E(X_2 \mid x_1) = \beta_0 + \beta_1 x_1 = 1 + 2 x_1$,
$\varepsilon \sim \mathrm{Normal}(0, \sigma^2 = 1)$, paired samples, $n = 1000$.
The two examples differ only in the distribution of $X_1$, and this difference changes the scatter diagram.
[Figure panels: E(X2|x1) and x1 are in a linear relation; E(X1|x2) and x2 are in a linear relation]
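The comparison between Example 24 and Example 25 can be imitated with a small simulation. The sketch below is not the authors' simulator: it assumes that Arcsin(0, c = 4) is the arcsine distribution on [-4, 4], sampled by the inverse CDF, and the seed and helper names are arbitrary.

```python
import math
import random

def fit_line(xs, ys):
    """Least-squares intercept and slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def simulate(draw_x1, n, rng):
    """Paired samples with X2 = 1 + 2*X1 + error, error ~ Normal(0, 1)."""
    xs = [draw_x1(rng) for _ in range(n)]
    ys = [1.0 + 2.0 * x + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys

def draw_normal(rng):        # Example 24: X1 ~ Normal(0, 8)
    return rng.gauss(0.0, math.sqrt(8.0))

def draw_arcsine(rng):       # Example 25: X1 ~ Arcsin(0, c=4), assumed support [-4, 4]
    return 4.0 * math.sin(math.pi * (rng.random() - 0.5))

rng = random.Random(0)       # arbitrary seed, for repeatability only
x_normal, y_normal = simulate(draw_normal, 1000, rng)
x_arcsine, y_arcsine = simulate(draw_arcsine, 1000, rng)

fit_normal = fit_line(x_normal, y_normal)
fit_arcsine = fit_line(x_arcsine, y_arcsine)
print(fit_normal, fit_arcsine)
```

Both fits recover a line close to 1 + 2*x1. What changes is where the points fall: the arcsine sample is bounded and piles up near the two ends of its range, while the normal sample is concentrated around zero.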
(25.2.1.2)X1 marginal probability distribution,
f(x1),F(x1) Coefficient
Mathematical Mean: -0.00009
Geometrical Mean : none
Harmonic Mean : none
Variance : 8.00016
S.D. : 2.82846
Skewed Coef. : 0.00009
Kurtosis Coef. : 1.49999
MAD : 2.54651
Range : 8.00000
Mid_range : 0.00000
Median : -0.00068
Q1 : -2.82871
Q2 : -0.00068
Q3 : 2.82824
IQR : 5.65696
C.V. : none
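The printed coefficients can be compared with closed-form values for an arcsine distribution. The sketch assumes that Arcsin(mu, c) has support [mu - c, mu + c] and that "MAD" means the mean absolute deviation; both readings are inferred from the printed numbers and are not stated in the source.

```python
import math

def arcsine_cdf(x, mu, c):
    """CDF of the arcsine distribution on [mu - c, mu + c]."""
    return 0.5 + math.asin((x - mu) / c) / math.pi

def arcsine_quantile(p, mu, c):
    """Inverse of arcsine_cdf."""
    return mu + c * math.sin(math.pi * (p - 0.5))

mu, c = 0.0, 4.0

variance = c ** 2 / 2.0              # printed: 8.00016
kurtosis = 1.5                       # fixed value for any arcsine distribution; printed: 1.49999
mean_abs_dev = 2.0 * c / math.pi     # printed MAD: 2.54651
value_range = 2.0 * c                # printed: 8.00000
q1 = arcsine_quantile(0.25, mu, c)   # printed: -2.82871
q3 = arcsine_quantile(0.75, mu, c)   # printed: 2.82824
iqr = q3 - q1                        # printed: 5.65696

print(variance, mean_abs_dev, q1, q3, iqr)
```

The small gaps between these values and the printed ones are of the size one would expect from estimating the distribution from data rather than evaluating it exactly.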
-0.00050362767001588780*(X- -7.88255373455696070000)^4+
-0.00016884419128152623*(X- -7.88255373455696070000)^5+
value range 0.0000000000<=F(x)<= 0.0500000000 ,
value range -12.1406624088<=X<= -7.2493987045 ,
Error=0.000001802185949689 MAX=0.000405076641770683 coefficient of
determination=0.999989517414489940,
-0.00246402792098705800*(X- -3.12696456286737190000)^2+
0.00054935093062757900*(X- -3.12696456286737190000)^3+
value range 0.3000000100<=F(x)<= 0.3500000000 ,
value range -3.6407210842<=X<= -2.5943772611 ,
Error=0.000000016150895336 MAX=0.000012288207797306 coefficient of
determination=0.999999952277162760,
0.00221499110969602950*(X-2.84948746905875840000)^3+
-0.00227351335534464740*(X-2.84948746905875840000)^4+
-0.02064939566048451500*(X-2.84948746905875840000)^5+
0.01077810503960563400*(X-2.84948746905875840000)^6+
0.07482507346776401400*(X-2.84948746905875840000)^7+
-0.01480126317869690000*(X-2.84948746905875840000)^8+
-0.09024417059117695300*(X- 2.84948746905875840000)^9+
value range 0.5500000100<=F(x)<= 0.6000000000 ,
value range 2.2408596227<=X<= 3.4498809780 ,
Error=0.000000013836781947 MAX=0.000009045522029627 coefficient of
determination=0.999999959351747570,
The distribution function estimated line ------
F(X)= 0.87502675117248407000+
0.08291309172761111800*(X-8.25074935649551480000)^1+
-0.00088256106199935402*(X-8.25074935649551480000)^2+
value range 0.8500000100<=F(x)<= 0.9000000000 ,
value range 7.9491564965<=X<= 8.5538951422 ,
Error=0.000001291310120531 MAX=0.000079548601771950 coefficient of
determination=0.999996216105796250,
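Each curve-fitting piece is a polynomial in (X - center) that is valid on its stated value range. The sketch below evaluates the piece printed just above with Horner's rule; the function name is hypothetical, and the coefficients, center and range are copied from the output.

```python
def eval_piece(x, center, coefficients):
    """Evaluate sum over k of coefficients[k] * (x - center)**k by Horner's rule."""
    dx = x - center
    result = 0.0
    for a in reversed(coefficients):
        result = result * dx + a
    return result

center = 8.25074935649551480
coefficients = [
    0.87502675117248407000,    # constant term
    0.08291309172761111800,    # (X - center)^1
    -0.00088256106199935402,   # (X - center)^2
]
x_low, x_high = 7.9491564965, 8.5538951422   # value range of X for this piece

f_low = eval_piece(x_low, center, coefficients)
f_high = eval_piece(x_high, center, coefficients)
print(f_low, f_high)
```

At the two ends of the range the piece returns values close to the stated bounds 0.85 and 0.90, with a discrepancy of the same order as the printed MAX error.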
(25.2.2)Linear model,
linear model analysis
The estimated line is X2=0.999889+2.000061*X1
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1 1 3200259101.8005004000 3200259101.8005004000 3199301615.1923523000
error 99999998 100029925.9875474000 1.0002992799
total 99999999 3300289027.7880478000
----------------------------------------------------------------------------------
H0: slope(X1)=0
The F test p value=0.000100
----------------------------------------------------------------------------------
Individual test
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 0.9998891566 0.0001000150 9997.39566 0.00000
slope 2.0000612022 0.0000353603 56562.36925 0.00000
----------------------------------------------------------------------------------
MSE= 1.0002992799 , R2=0.969691 , R2(adj)=0.969691
X2(mean)= 0.9997174806, X2(variance)= 33.0028906079, X2(s.d.)= 5.7448142362
X1(mean)= -0.0000858353, X1(variance)= 8.0001581996, X1(s.d.)= 2.8284550906
SSX1=800015811.9593379500 , SS(X2*X1)=1600080586.6603060000, C.V.=1.0004322702
[testing the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class     lower limit   upper limit   observed no      probability   expected no      chi square
[ 1 ]     -inf          -1.64515      5002390.00000    0.05000       5000000.00000    1.14242
[ 2 ]     -1.64515      -1.28179      4998681.00000    0.05000       5000000.00000    0.34795
[ 3 ]     -1.28179      -1.03658      4998148.00000    0.05000       5000000.00000    0.68598
[ 4 ]     -1.03658      -0.84174      4998710.00000    0.05000       5000000.00000    0.33282
[ 5 ]     -0.84174      -0.67457      4998083.00000    0.05000       5000000.00000    0.73498
[ 6 ]     -0.67457      -0.52446      5002449.00000    0.05000       5000000.00000    1.19952
[ 7 ]     -0.52446      -0.38535      5000725.00000    0.05000       5000000.00000    0.10513
[ 8 ]     -0.38535      -0.25361      4991509.00000    0.05000       5000000.00000    14.41942
[ 9 ]     -0.25361      -0.12563      5010771.00000    0.05000       5000000.00000    23.20289
[ 10 ]    -0.12563      -0.00023      4990005.00000    0.05000       5000000.00000    19.98000
[ 11 ]    -0.00023      0.12545       5000132.00000    0.05000       5000000.00000    0.00348
[ 12 ]    0.12545       0.25336       5011954.00000    0.05000       5000000.00000    28.57962
[ 13 ]    0.25336       0.38534       4999865.00000    0.05000       5000000.00000    0.00364
[ 14 ]    0.38534       0.52446       4996435.00000    0.05000       5000000.00000    2.54184
[ 15 ]    0.52446       0.67453       4998836.00000    0.05000       5000000.00000    0.27098
[ 16 ]    0.67453       0.84168       5000731.00000    0.05000       5000000.00000    0.10687
[ 17 ]    0.84168       1.03654       5001620.00000    0.05000       5000000.00000    0.52488
[ 18 ]    1.03654       1.28169       5002092.00000    0.05000       5000000.00000    0.87529
[ 19 ]    1.28169       1.64510       4996024.00000    0.05000       5000000.00000    3.16172
[ 20 ]    1.64510       +inf          5000840.00000    0.05000       5000000.00000    0.14112
degree of freedom=18
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =98.360563
p-value=0.000000
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=50000548, number of the positive of residual=49999452
H0: residual is random , H1: Increasing line or decreasing line, Z=0.238401, p-value=0.594300
H0: residual is random , H1: Oscillation, Z=0.238401, p-value=0.405700
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=0.238401, p-value=0.811400
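The three p-values of the run test above are consistent with a normal approximation in which the "increasing or decreasing line" alternative uses the lower tail of Z, "oscillation" uses the upper tail, and the combined alternative is two-sided. The sketch below uses the standard Wald-Wolfowitz mean and variance of the number of runs; that the software uses exactly these formulas is an assumption.

```python
import math
from statistics import NormalDist

def count_runs(residuals):
    """Number of runs of consecutive residuals with the same sign (zeros are skipped)."""
    signs = [e > 0 for e in residuals if e != 0]
    return 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)

def runs_z(n_neg, n_pos, runs):
    """Standardized number of runs under the hypothesis of a random sequence."""
    n = n_neg + n_pos
    mean = 2.0 * n_neg * n_pos / n + 1.0
    var = 2.0 * n_neg * n_pos * (2.0 * n_neg * n_pos - n) / (n * n * (n - 1.0))
    return (runs - mean) / math.sqrt(var)

def runs_p_values(z):
    phi = NormalDist().cdf(z)
    return {
        "trend": phi,                # too few runs: increasing or decreasing line
        "oscillation": 1.0 - phi,    # too many runs
        "two_sided": 2.0 * min(phi, 1.0 - phi),
    }

p = runs_p_values(0.238401)          # Z copied from the output above
print(p)

# Back-solved illustration (477 is not printed in the source): 500 negative and
# 500 positive residuals with 477 runs give the Z reported for an earlier n = 1000 fit.
z_small = runs_z(500, 500, 477)
print(z_small)
```

A toy sequence such as `[1, 2, -1, -2, 3, -1]` has four runs, which `count_runs` returns directly.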
[Figure panels: The joint probability of X1 and residual; The joint probability of X2 estimated value and X2]
(25.2.1.3)Conclusion,
X1~Arcsin(0.0006830781, 4.0000397221),X2=0.999889+2.000061*X1+error,
Error~Normal(0,1).
pearson chi-square test statistic =95.208147 p-value=0.000000
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=49996755, number of the positive of residual=50003245
H0: residual is random , H1: Increasing line or decreasing line, Z=-0.011758, p-value=0.495300
H0: residual is random , H1: Oscillation, Z=-0.011758, p-value=0.504700
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-0.011758, p-value=0.990600
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,100000000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=2.000392
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=1.999608
The D.W. table depends on the values of the independent variable,
so computing the p value takes a long time.
The Durbin Watson table is incorrect at present.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
2. The population sigma of error confidence interval
90% confidence interval for population variance [0.242424 , 0.242537]
90% confidence interval for population standard deviation [0.492366 , 0.492480]
95% confidence interval for population variance [0.242413 , 0.242548]
95% confidence interval for population standard deviation [0.492355 , 0.492491]
99% confidence interval for population variance [0.242392 , 0.242569]
99% confidence interval for population standard deviation [0.492333 , 0.492513]
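The intervals above have the usual chi-square form SSE/chi2(upper) to SSE/chi2(lower). The sketch below is an approximation only: it assumes n-2 degrees of freedom and replaces the exact chi-square quantile with the Wilson-Hilferty approximation so that it needs nothing beyond the Python standard library. The names `chi2_quantile_wh` and `variance_ci` are hypothetical, and the software may compute the quantiles differently.

```python
import math
from statistics import NormalDist


def chi2_quantile_wh(p, df):
    """Wilson-Hilferty approximation to the chi-square quantile."""
    z = NormalDist().inv_cdf(p)
    a = 2.0 / (9.0 * df)
    return df * (1.0 - a + z * math.sqrt(a)) ** 3


def variance_ci(sse, df, level):
    """Two-sided confidence interval for the error variance."""
    alpha = 1.0 - level
    lower = sse / chi2_quantile_wh(1.0 - alpha / 2.0, df)
    upper = sse / chi2_quantile_wh(alpha / 2.0, df)
    return lower, upper


df = 100_000_000 - 2          # assumed: sample size minus two coefficients
sse = 0.24248 * df            # residual variance taken from the output
lo, hi = variance_ci(sse, df, 0.90)
print(lo, hi, math.sqrt(lo), math.sqrt(hi))
```

The interval for the standard deviation is obtained by taking square roots of the variance limits.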
[Figures: the joint probability of X1 and residual; the joint probability of X1 estimated value and X1]
(25.2.3.1)
The residual of X1 estimated line,
X0=residual, residual marginal probability distribution
Mathematical Mean: -0.00000
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.24248
S.D. : 0.49242
Skewed Coef. : -0.00031
Kurtosis Coef. : 2.99892
MAD : 0.39291
Range : 5.78566
Mid_range : 0.04909
Median : 0.00004
Q1 : -0.33217
Q2 : 0.00004
Q3 : 0.33214
IQR : 0.66430
C.V. : none
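The coefficient list above can be reproduced with standard formulas. The sketch below is an assumption about the definitions: MAD is taken as the mean absolute deviation about the mean (which matches 0.39291 for a normal residual with S.D. 0.49242), the kurtosis is the non-excess version (about 3 for a normal distribution), and the variance uses the n-1 divisor. The function name `describe` is hypothetical, and the quartile rule of the software is not known.

```python
import math
import statistics


def describe(x):
    """Descriptive coefficients in the style of the output above."""
    n = len(x)
    mean = statistics.fmean(x)
    dev = [v - mean for v in x]
    m2 = sum(d ** 2 for d in dev) / n
    m3 = sum(d ** 3 for d in dev) / n
    m4 = sum(d ** 4 for d in dev) / n
    q1, q2, q3 = statistics.quantiles(x, n=4)  # default "exclusive" rule
    var = statistics.variance(x)               # n-1 divisor (assumption)
    return {
        "mean": mean,
        "variance": var,
        "sd": math.sqrt(var),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,              # normal distribution gives 3
        "mad": sum(abs(d) for d in dev) / n,   # mean absolute deviation
        "range": max(x) - min(x),
        "mid_range": (max(x) + min(x)) / 2.0,
        "median": q2,
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
        "cv": math.sqrt(var) / mean if mean > 0 else None,
    }


print(describe([1.0, 2.0, 3.0, 4.0, 5.0]))
```

The geometric and harmonic means are reported as "none" in the output when the data contain non-positive values, so they are left out of the sketch.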
SLLN analysis, X0=residual and Normal(0,0.24248), Note: X1~Normal(0,0.24248), X1 is
the code that represents Normal(0,0.24248),
The probability limiting theory
E(| X0 distribution F() - X1 distribution F()|^2)= 0.0000000054
Pr(| X0 distribution F() - X1 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0500000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0100000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0050000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0010000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0005000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0001000000)= 0.171208
(25.2.3.2) Conclusion,
X2~the curve-fitting estimated line,
X1=-0.4847793029+0.4848304416*X2+error, error~Normal(0,0.24248),
(ii)
X2 is the random variable, which has an a priori probability distribution, and X1 is the
dependent variable. The probability model is
X2~a special distribution,
X1=-0.4847793029+0.4848304416*X2+error, error~Normal(0,0.24248),
5.4. The error probability distribution is not a normal distribution and
the other basic assumptions are unchanged.
observed no 5.00000 46.00000 99.00000 206.00000 243.00000 216.00000 129.00000
41.00000 12.00000 3.00000
probability 0.00970 0.03590 0.10390 0.19970 0.25480 0.21590 0.12140
0.04540 0.01130 0.00200
expected no 9.70000 35.90000 103.90000 199.70000 254.80000 215.90000 121.40000
45.40000 11.30000 2.00000
chi square 2.27732 2.84150 0.23109 0.19875 0.54647 0.00005 0.47578
0.42643 0.04336 0.50000
pearson chi square test statistic=7.540751, degree of freedom=7, p-value=0.374800
correction:
expected number>=5 in each cell, the frequency table is adjusted
class lower limit upper limit observed no probability expected no chi square
[ 1 ] 970.70485 977.07952 5.00000 0.00970 9.70000 2.27732
[ 2 ] 977.07952 983.45419 46.00000 0.03590 35.90000 2.84150
[ 3 ] 983.45419 989.82886 99.00000 0.10390 103.90000 0.23109
[ 4 ] 989.82886 996.20353 206.00000 0.19970 199.70000 0.19875
[ 5 ] 996.20353 1002.57820 243.00000 0.25480 254.80000 0.54647
[ 6 ] 1002.57820 1008.95286 216.00000 0.21590 215.90000 0.00005
[ 7 ] 1008.95286 1015.32753 129.00000 0.12140 121.40000 0.47578
[ 8 ] 1015.32753 1021.70220 41.00000 0.04540 45.40000 0.42643
[ 9 ] 1021.70220 1034.45154 15.00000 0.01330 13.30000 0.21729
degree of freedom=6, pearson chi-square test statistic =7.214681
p-value=0.301400
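The correction step above (merging cells until every expected number is at least 5, then recomputing the Pearson statistic) can be sketched as follows. The merging rule used here, accumulating cells from left to right and folding any leftover tail into the last kept cell, is an assumption; the function names are hypothetical. The observed and expected numbers are copied from the tables above.

```python
def merge_small_cells(observed, expected, min_expected=5.0):
    """Merge adjacent cells so that each expected number >= min_expected."""
    obs_out, exp_out = [], []
    acc_o = acc_e = 0.0
    for o, e in zip(observed, expected):
        acc_o += o
        acc_e += e
        if acc_e >= min_expected:
            obs_out.append(acc_o)
            exp_out.append(acc_e)
            acc_o = acc_e = 0.0
    if acc_e > 0.0:
        if exp_out:                 # fold the leftover tail into the last cell
            obs_out[-1] += acc_o
            exp_out[-1] += acc_e
        else:                       # everything was too small: one single cell
            obs_out.append(acc_o)
            exp_out.append(acc_e)
    return obs_out, exp_out


def pearson_chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over the cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [5, 46, 99, 206, 243, 216, 129, 41, 12, 3]
expected = [9.7, 35.9, 103.9, 199.7, 254.8, 215.9, 121.4, 45.4, 11.3, 2.0]
print(pearson_chi_square(observed, expected))
obs9, exp9 = merge_small_cells(observed, expected)
print(len(obs9), pearson_chi_square(obs9, exp9))
```

In the output the degrees of freedom drop from 7 to 6 when the ten cells become nine, which is consistent with the number of cells minus 3.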
2043.97721 2056.57566 2069.17410
observed no 7.00000 48.00000 98.00000 205.00000 245.00000 210.00000 133.00000
38.00000 13.00000 3.00000
probability 0.01050 0.03740 0.10520 0.19890 0.25180 0.21390 0.12170
0.04650 0.01190 0.00220
expected no 10.50000 37.40000 105.20000 198.90000 251.80000 213.90000 121.70000
46.50000 11.90000 2.20000
chi square 1.16667 3.00428 0.49278 0.18708 0.18364 0.07111 1.04922
1.55376 0.10168 0.29091
pearson chi square test statistic=8.101118, degree of freedom=7
p-value=0.323700
correction:
expected number>=5 in each cell, the frequency table is adjusted
class lower limit upper limit observed no probability expected no chi square
[ 1 ] 1943.18963 1955.78808 7.00000 0.01050 10.50000 1.16667
[ 2 ] 1955.78808 1968.38652 48.00000 0.03740 37.40000 3.00428
[ 3 ] 1968.38652 1980.98497 98.00000 0.10520 105.20000 0.49278
[ 4 ] 1980.98497 1993.58342 205.00000 0.19890 198.90000 0.18708
[ 5 ] 1993.58342 2006.18187 245.00000 0.25180 251.80000 0.18364
[ 6 ] 2006.18187 2018.78031 210.00000 0.21390 213.90000 0.07111
[ 7 ] 2018.78031 2031.37876 133.00000 0.12170 121.70000 1.04922
[ 8 ] 2031.37876 2043.97721 38.00000 0.04650 46.50000 1.55376
[ 9 ] 2043.97721 2069.17410 16.00000 0.01410 14.10000 0.25603
degree of freedom=6, pearson chi-square test statistic =7.964556 ,p-value=0.240700
(26.1.2) Linear model,
The linear model analysis
The estimated line is X2=3.166510+1.997864*X1
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1 1 382218.8128254331 382218.8128254331 384923.1253450022
error 998 990.9884599890 0.9929744088
total 999 383209.8012854221
----------------------------------------------------------------------------------
H0: slope(X1)=0
The F test p value=0.000100 [assuming the error is normally distributed]
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 3.1665098965 3.2203203082 0.98329 0.32540
slope 1.9978640165 0.0032201709 620.42173 0.00000
[Note: The p values of the t test and the F test assume that the error is normally distributed.]
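The ANOVA table and the individual tests above follow the usual simple linear regression formulas. The sketch below computes the same quantities with ordinary least squares; the function name `simple_ols` and the small data set are hypothetical. For one independent variable the F value equals the square of the slope t value, which matches the table (620.42173 squared is about 384923).

```python
import math


def simple_ols(x, y):
    """Least squares line y = b0 + b1*x with ANOVA and t statistics."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sst = sum((b - ybar) ** 2 for b in y)      # total sum of squares
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    ssr = slope * sxy                          # regression sum of squares
    sse = sst - ssr                            # error sum of squares
    mse = sse / (n - 2)
    se_slope = math.sqrt(mse / sxx)
    se_intercept = math.sqrt(mse * (1.0 / n + xbar ** 2 / sxx))
    return {
        "intercept": intercept,
        "slope": slope,
        "ssr": ssr,
        "sse": sse,
        "sst": sst,
        "mse": mse,
        "f": ssr / mse,
        "t_intercept": intercept / se_intercept,
        "t_slope": slope / se_slope,
    }


fit = simple_ols([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(fit["intercept"], fit["slope"], fit["f"], fit["t_slope"] ** 2)
```

The p values are not computed here because they depend on the normality assumption that this section is testing.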
class lower limit upper limit observed no probability expected no chi square
[ 1 ] -inf -1.27709 0.00000 0.10000 100.00000 100.00000
[ 2 ] -1.27709 -0.83865 148.00000 0.10000 100.00000 23.04000
[ 3 ] -0.83865 -0.52253 224.00000 0.10000 100.00000 153.76000
[ 4 ] -0.52253 -0.25242 152.00000 0.10000 100.00000 27.04000
[ 5 ] -0.25242 0.00002 93.00000 0.10000 100.00000 0.49000
[ 6 ] 0.00002 0.25243 94.00000 0.10000 100.00000 0.36000
[ 7 ] 0.25243 0.52253 75.00000 0.10000 100.00000 6.25000
[ 8 ] 0.52253 0.83823 54.00000 0.10000 100.00000 21.16000
[ 9 ] 0.83823 1.27698 63.00000 0.10000 100.00000 13.69000
[ 10 ] 1.27698 +inf 97.00000 0.10000 100.00000 0.09000
degree of freedom=8
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =345.880000, p-value=0.000000
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=617
number of the positive of residual=383
H0: residual is random , H1: Increasing line or decreasing line
Z=0.293092, p-value=0.615300
H0: residual is random , H1: Oscillation Z=0.293092, p-value=0.384700
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=0.293092, p-value=0.769400
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,1000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0 D.W. test=2.050645
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0 D.W. test=1.949355
The D.W. critical values depend on the values of the independent variable,
so computing the p value takes a lot of time.
The existing Durbin Watson table is currently incorrect.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
2. The population sigma of error confidence interval
90% confidence interval for population variance [0.924871 , 1.071905]
90% confidence interval for population standard deviation [0.961702 , 1.035329]
95% confidence interval for population variance [0.912877 , 1.088480]
95% confidence interval for population standard deviation [0.955446 , 1.043302]
99% confidence interval for population variance [0.890301 , 1.122416]
99% confidence interval for population standard deviation [0.943558 , 1.059441]
[Figures: estimated line; residual plot]
(26.1.3) residual analysis,
X0=residual,residual frequency distribution table,
class class limit class midpoint frequency relative frequency cumulative frequency
[ 1 ] -1.06633~ -0.28840 -0.67736 512.00000 0.5120000 0.5120000
[ 2 ] -0.28840~ 0.48953 0.10056 270.00000 0.2700000 0.7820000
[ 3 ] 0.48953~ 1.26745 0.87849 121.00000 0.1210000 0.9030000
[ 4 ] 1.26745~ 2.04538 1.65641 49.00000 0.0490000 0.9520000
[ 5 ] 2.04538~ 2.82330 2.43434 27.00000 0.0270000 0.9790000
[ 6 ] 2.82330~ 3.60123 3.21227 11.00000 0.0110000 0.9900000
[ 7 ] 3.60123~ 4.37915 3.99019 5.00000 0.0050000 0.9950000
[ 8 ] 4.37915~ 5.15708 4.76812 1.00000 0.0010000 0.9960000
[ 9 ] 5.15708~ 5.93501 5.54604 4.00000 0.0040000 1.0000000
frequency distribution: sample mean=0.015769 , sample variance=0.964105 , sample sd=0.981889
correction:
expected number>=5 in each cell, the frequency table is adjusted
class lower limit upper limit observed no probability expected no chi square
[ 1 ] -1.06633 -0.36619 472.00000 0.48138 481.38047 0.18279
[ 2 ] -0.36619 0.33394 270.00000 0.24965 249.65331 1.65825
[ 3 ] 0.33394 1.03407 128.00000 0.12948 129.47508 0.01681
[ 4 ] 1.03407 1.73421 73.00000 0.06715 67.14831 0.50995
[ 5 ] 1.73421 2.43434 25.00000 0.03482 34.82442 2.77160
[ 6 ] 2.43434 3.13447 16.00000 0.01806 18.06063 0.23511
[ 7 ] 3.13447 3.83461 8.00000 0.00937 9.36659 0.19939
[ 8 ] 3.83461 5.93501 8.00000 0.01009 10.09118 0.43335
degree of freedom=5, pearson chi-square test statistic =6.007245 , p-value=0.305500
(26.1.4)Conclusion,
(26.2.1.2)X1 marginal probability distribution,
f(x1),F(x1) Coefficient
Mathematical Mean: 999.99822
Geometrical Mean : 999.94821
Harmonic Mean : 999.89818
Variance : 100.01798
S.D. : 10.00090
Skewed Coef. : 0.00009
Kurtosis Coef. : 2.99998
MAD : 7.97943
Range : 114.27209
Mid_range : 999.86497
Median : 999.99816
Q1 : 993.25136
Q2 : 999.99816
Q3 : 1006.74323
IQR : 13.49187
C.V. : 0.01000
SLLN analysis, X2=residual and Normal(2001,401), Note: X3~Normal(2001,401),
X3 is the code that represents Normal(2001,401)
E(| X2 distribution F() - X3 distribution F()|^2)= 0.0000000030
Pr(| X2 distribution F() - X3 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X2 distribution F() - X3 distribution F()|>= 0.0500000000)= 0.000000
Pr(| X2 distribution F() - X3 distribution F()|>= 0.0100000000)= 0.000000
Pr(| X2 distribution F() - X3 distribution F()|>= 0.0050000000)= 0.000000
Pr(| X2 distribution F() - X3 distribution F()|>= 0.0010000000)= 0.000000
Pr(| X2 distribution F() - X3 distribution F()|>= 0.0005000000)= 0.000000
Pr(| X2 distribution F() - X3 distribution F()|>= 0.0001000000)= 0.058114
3529322.00000 3249696.00000 3033308.00000 2883804.00000 2808472.00000 2837661.00000
3110262.00000 7099568.00000
probability 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000
0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000
0.05000 0.05000 0.05000 0.05000 0.05000
expected no 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000
5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000
5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000
5000000.00000 5000000.00000
chi square 5000000.00000 5000000.00000 5000000.00000 18588804.39990 13245460.88325
5133933.08852 1884871.93755 564035.22365 95868.59674 1335.53362 85432.31756 243623.67437
432578.75594 612712.81848 773575.48457 895657.10208 960558.99496 935141.99018 714221.94173
881637.15732
degree of freedom=18
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =61049449.900423 p-value=0.000000
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=63213603
number of the positive of residual=36786397
H0: residual is random , H1: Increasing line or decreasing line
Z=-2.339794, p-value=0.009700
H0: residual is random , H1: Oscillation Z=-2.339794, p-value=0.990300
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-2.339794, p-value=0.019400
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,100000000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=1.999609
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=2.000391
The D.W. critical values depend on the values of the independent variable,
so computing the p value takes a lot of time.
The existing Durbin Watson table is currently incorrect.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
2. The population sigma of error confidence interval
90% confidence interval for population variance [0.999797 , 1.000262]
90% confidence interval for population standard deviation [0.999898 , 1.000131]
95% confidence interval for population variance [0.999752 , 1.000307]
95% confidence interval for population standard deviation [0.999876 , 1.000153]
99% confidence interval for population variance [0.999665 , 1.000394]
99% confidence interval for population standard deviation [0.999833 , 1.000197]
[Figures: the joint probability of X1 and residual; the joint probability of X2 estimated value and X2]
(26.2.3) residual analysis,
X0=residual, residual marginal probability distribution
Mathematical Mean: -0.00000
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.00003
S.D. : 1.00001
Skewed Coef. : 2.00143
Kurtosis Coef. : 9.01341
MAD : 0.73569
Range : 17.46274
Mid_range : 7.73083
Median : -0.30685
Q1 : -0.71219
Q2 : -0.30685
Q3 : 0.38620
IQR : 1.09839
C.V. : none
(26.2.4) Conclusion,
X1~Normal(1000,100),
X2=1.014509+1.999985*X1+error, X2 is approximately Normal(2001,401),
error~Shifted exponential(1,-1).
Note 1:
The sum of an independent normal random variable and a shifted exponential random variable
does not follow a normal distribution.
X1~Normal(1000,100), error~Shifted exponential(1,-1),
X2=1.014509+1.999985*X1+error, X2 is approximately Normal(2001,401).
Because the variance of 2*X1 (about 400) is much larger than the variance of the error (1),
the probability distribution of X2 is close to the normal distribution.
Note 2: special case 1, X1~Normal(0,0.01), error~Shifted exponential(1,-1),
X2=1+2*X1+error, X2 is not Normal(1,1.04),
X2 marginal probability distribution
Mathematical Mean: 0.99999
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.04050
S.D. : 1.02005
Skewed Coef. : 1.88649
Kurtosis Coef. : 8.55547
MAD : 0.75078
Range : 19.30974
Mid_range : 8.70015
Median : 0.71305
Q1 : 0.29908
Q2 : 0.71305
Q3 : 1.40650
IQR : 1.10742
C.V. : 1.02006
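The skewness and kurtosis of X2 in the two cases can be checked with cumulants, because the cumulants of independent summands add. The sketch below assumes that Shifted exponential(1,-1) means an exponential distribution with rate 1 shifted by -1 (mean 0, variance 1, third cumulant 2, fourth cumulant 6), which agrees with the residual coefficients reported in (26.2.3). The function name `shape_of_normal_plus_exponential` is hypothetical.

```python
def shape_of_normal_plus_exponential(slope, var_x1, rate=1.0):
    """Variance, skewness and kurtosis of slope*X1 + error, where
    X1 is normal and error is a (shifted) exponential with the given rate.
    The normal part contributes only to the second cumulant."""
    k2 = slope ** 2 * var_x1 + 1.0 / rate ** 2
    k3 = 2.0 / rate ** 3
    k4 = 6.0 / rate ** 4
    skewness = k3 / k2 ** 1.5
    kurtosis = 3.0 + k4 / k2 ** 2
    return k2, skewness, kurtosis


# Special case 1: X1~Normal(0,0.01), so the error dominates.
print(shape_of_normal_plus_exponential(2.0, 0.01))
# Main case: X1~Normal(1000,100), so the normal part dominates.
print(shape_of_normal_plus_exponential(2.0, 100.0))
```

For the special case the sketch gives a variance of 1.04, a skewness of about 1.886 and a kurtosis of about 8.547, close to the sample values 1.04050, 1.88649 and 8.55547. For the main case the skewness is about 0.00025 and the kurtosis about 3.00004, which is why X2 looks normal there.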
[Figures: f(x1,x2); f(x2,x1)]
5.5. The variances of the error are not equal and the other basic
assumptions are unchanged.
frequency distribution: sample mean=21.520889 , sample variance=9685.392088 , sample sd=98.414390
(27.1.2) Linear model
The linear model analysis
The estimated line is X2=14.308033+0.708181*X1
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1 1 482.1211954987 482.1211954987 0.0511827938
error 998 9400755.9438872896 9419.5951341556
total 999 9401238.0650827885
----------------------------------------------------------------------------------
H0: slope(X1)=0
The F test p value=0.821500
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 14.3080334157 31.4308126066 0.45522 0.64880
slope 0.7081806427 3.1302718635 0.22624 0.82080
----------------------------------------------------------------------------------
MSE=9419.5951341556 , R2=0.000051 , R2(adj)=-0.000951
X2(mean)= 21.3848374337, X2(variance)= 9410.6487137966, X2(s.d.)= 97.0084981525
X1(mean)= 9.9929362531, X1(variance)= 0.9622826008, X1(s.d.)= 0.9809600404
SSX1=961.3203181839 , SS(X2*X1)= 680.7884407509, C.V.= 4.5384772752
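The summary line can be recomputed from the ANOVA table. In the sketch below, the formula for C.V. (square root of MSE divided by the mean of X2) is inferred from the printed numbers rather than stated in the text, and the function name `fit_summary` is hypothetical.

```python
import math


def fit_summary(ssr, sse, n, y_mean, n_params=2):
    """MSE, R2, adjusted R2 and C.V. from the ANOVA sums of squares."""
    sst = ssr + sse
    mse = sse / (n - n_params)
    r2 = ssr / sst
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - n_params)
    cv = math.sqrt(mse) / y_mean       # inferred definition of C.V.
    return mse, r2, r2_adj, cv


# Values copied from the ANOVA table and summary above (n = 1000).
print(fit_summary(482.1211954987, 9400755.9438872896, 1000, 21.3848374337))
```

The adjusted R2 is negative here because the slope explains almost nothing of the variation in X2.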
[testing the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class lower limit upper limit observed no probability expected no chi square
[ 1 ] -inf -124.38487 85.00000 0.10000 100.00000 2.25000
[ 2 ] -124.38487 -81.68244 109.00000 0.10000 100.00000 0.81000
[ 3 ] -81.68244 -50.89324 111.00000 0.10000 100.00000 1.21000
[ 4 ] -50.89324 -24.58548 106.00000 0.10000 100.00000 0.36000
[ 5 ] -24.58548 0.00243 89.00000 0.10000 100.00000 1.21000
[ 6 ] 0.00243 24.58640 112.00000 0.10000 100.00000 1.44000
[ 7 ] 24.58640 50.89335 103.00000 0.10000 100.00000 0.09000
[ 8 ] 50.89335 81.64170 93.00000 0.10000 100.00000 0.49000
[ 9 ] 81.64170 124.37486 92.00000 0.10000 100.00000 0.64000
[ 10 ] 124.37486 +inf 100.00000 0.10000 100.00000 0.00000
degree of freedom=8
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =8.500000 p-value=0.386200
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=500
number of the positive of residual=500
H0: residual is random , H1: Increasing line or decreasing line
Z=-1.455376, p-value=0.072800
H0: residual is random , H1: Oscillation
Z=-1.455376, p-value=0.927200
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-1.455376, p-value=0.145600
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,1000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=2.015410
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=1.984590
The D.W. critical values depend on the values of the independent variable,
so computing the p value takes a lot of time.
The existing Durbin Watson table is currently incorrect.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
2. The population sigma of error confidence interval
90% confidence interval for population variance [8773.545827 , 10168.352504]
90% confidence interval for population standard deviation [93.667208 , 100.838249]
95% confidence interval for population variance [8659.769991 , 10325.581868]
95% confidence interval for population standard deviation [93.057885 , 101.614870]
99% confidence interval for population variance [8445.612001 , 10647.510351]
99% confidence interval for population standard deviation [91.900011 , 103.186774]
[Figures: estimated line; residual plot; estimated line; the residual plot of the second estimated line]
|residual|=-242.5040219496+101.0672375803*|X1|^0.5,
E(residual*residual)=Var(residual)=101.0672376803*101.0672376803*E(|X1|),
X2=14.308033+0.708181*X1+residual,
|residual|=-242.5040219496+101.0672375803*|X1|^0.5+residual*,
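The two-stage idea above (fit X2 on X1, then fit the absolute residual on a function G of X1) can be sketched as follows. This resembles what is often called a Glejser-type regression for unequal error variances; whether the software proceeds exactly this way is an assumption. All function names and the simulated data are hypothetical and are not the data of this section.

```python
import math
import random


def ols(x, y):
    """Intercept and slope of the least squares line y = b0 + b1*x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def abs_residual_regression(x1, x2, g):
    """Stage 1: X2 on X1. Stage 2: |residual| on G(X1)."""
    b0, b1 = ols(x1, x2)
    abs_res = [abs(y - (b0 + b1 * x)) for x, y in zip(x1, x2)]
    gx = [g(x) for x in x1]
    a0, a1 = ols(gx, abs_res)
    return (b0, b1), (a0, a1)


def simulate(n, seed=0):
    """Hypothetical data: error s.d. grows like the square root of X1."""
    rng = random.Random(seed)
    x1 = [rng.uniform(1.0, 100.0) for _ in range(n)]
    x2 = [1.0 + 2.0 * x + rng.gauss(0.0, 1.0) * math.sqrt(x) for x in x1]
    return x1, x2


x1, x2 = simulate(2000)
first, second = abs_residual_regression(x1, x2, lambda x: math.sqrt(abs(x)))
print(first, second)
```

A clearly positive second-stage slope indicates that the spread of the residual grows with G(X1), so the equal variance assumption does not hold.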
(27.2) Sample size = 100,000,000; this is big data.
(27.2.1) Basic analysis
(27.2.1.1) X1 and X2 joint probability distribution,
[Figures: f(x1,x2); f(x2,x1)]
SLLN analysis, X1=residual and Normal(10,1), Note: X2~Normal(10,1), X2 is
the code that represents Normal(10,1),
E(| X1 distribution F() - X2 distribution F()|^2)= 0.0000000015
Pr(| X1 distribution F() - X2 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0500000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0100000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0050000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0010000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0005000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0001000000)= 0.000000
value range 0.1000000100<=F(x)<= 0.2000000000 ,
value range -107.5163092722<=X<= -62.0432012067 ,
Error=0.000000233547746683 MAX=0.000020890090338330 coefficient of
determination=0.999999914447524340,
The distribution function estimated line ------
F(X)= 0.24899682944380910000+
0.00318975348182066500*(X--45.62364773537726100000)^1+
0.00001213660629145654*(X--45.62364773537726100000)^2+
-0.00000002751632815062*(X--45.62364773537726100000)^3+
value range 0.2000000100<=F(x)<= 0.3000000000 ,
value range -62.0431986609<=X<= -30.4738637617 ,
Error=0.000000099673653176 MAX=0.000017640429306021 coefficient of
determination=0.999999963604179750,
The distribution function estimated line ------
F(X)= 0.34951062691196494000+
0.00379050655727160810*(X--16.97804083056261600000)^1+
0.00000839573872546268*(X--16.97804083056261600000)^2+
-0.00000004136959653219*(X--16.97804083056261600000)^3+
value range 0.3000000100<=F(x)<= 0.4000000000 ,
value range -30.4738612351<=X<= -4.0029828124 ,
Error=0.000000104478204304 MAX=0.000016252327573463 coefficient of
determination=0.999999961701279250,
The distribution function estimated line ------
F(X)= 0.44985527289905308000+
0.00408116139290391410*(X-8.35507859593608690000)^1+
0.00000287623085366576*(X-8.35507859593608690000)^2+
-0.00000007602363901208*(X-8.35507859593608690000)^3+
value range 0.4000000100<=F(x)<= 0.5000000000 ,
value range -4.0029806215<=X<= 20.5702906021 ,
Error=0.000000095294152266 MAX=0.000014290182198229 coefficient of
determination=0.999999965154154900,
The distribution function estimated line ------
F(X)= 0.55016728455410380000+
0.00407298611669351320*(X-32.79901676105938400000)^1+
-0.00000331194750528143*(X-32.79901676105938400000)^2+
-0.00000007132410243069*(X-32.79901676105938400000)^3+
value range 0.5000000100<=F(x)<= 0.6000000000 ,
value range 20.5702921348<=X<= 45.1923860199 ,
Error=0.000000032360259050 MAX=0.000008567423601447 coefficient of
determination=0.999999988154823270,
The distribution function estimated line ------
F(X)= 0.65050750867698381000+
0.00376151001768973770*(X-58.26221002426717600000)^1+
-0.00000856418163655686*(X-58.26221002426717600000)^2+
-0.00000005153877821425*(X-58.26221002426717600000)^3+
value range 0.6000000100<=F(x)<= 0.7000000000 ,
value range 45.1923926241<=X<= 71.8755376329 ,
Error=0.000000188511263551 MAX=0.000022297376381153 coefficient of
determination=0.999999930999149970,
The distribution function estimated line ------
F(X)= 0.75101213763260977000+
0.00315197743657036340*(X-87.20846187146278800000)^1+
-0.00001194425136915499*(X-87.20846187146278800000)^2+
-0.00000003236150879327*(X-87.20846187146278800000)^3+
value range 0.7000000100<=F(x)<= 0.8000000000 ,
value range 71.8755434489<=X<= 103.8430413815 ,
Error=0.000000135959046181 MAX=0.000017448187597746 coefficient of
determination=0.999999950153608100,
The distribution function estimated line ------
F(X)= 0.85212189000189875000+
0.00220249096772242440*(X-124.97668440835029000000)^1+
-0.00001217755270888239*(X-124.97668440835029000000)^2+
0.00000001566674319784*(X-124.97668440835029000000)^3+
value range 0.8000000100<=F(x)<= 0.9000000000 ,
value range 103.8430422181<=X<= 150.0839242573 ,
Error=0.000000092168122129 MAX=0.000015401028635065 coefficient of
determination=0.999999966246177600,
The distribution function estimated line ------
F(X)= 0.96133595246982062000+
0.00073725354816319474*(X-204.67622832737760000000)^1+
-0.00000571180585784637*(X-204.67622832737760000000)^2+
0.00000001913929754021*(X-204.67622832737760000000)^3+
-0.00000000002195783893*(X-204.67622832737760000000)^4+
value range 0.9000000100<=F(x)<= 0.9999999900 ,
value range 150.0839267729<=X<= 801.5959538084 ,
Error=0.000653183502864461 MAX=0.005920349578180328 coefficient of
determination=0.999617179281705900
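Each fitted piece of the distribution function is a polynomial in (X - centre), where "X--45.62..." means X minus a negative centre. The sketch below evaluates one piece; it assumes that the "value range" lines printed after a polynomial describe that polynomial, which is consistent with the centre lying inside the printed X range. The function name `poly_cdf` is hypothetical and the coefficients are copied from the first complete piece above.

```python
def poly_cdf(x, center, coeffs):
    """Evaluate F(x) = sum of coeffs[k] * (x - center)^k on one piece."""
    dx = x - center
    return sum(c * dx ** k for k, c in enumerate(coeffs))


# Piece whose printed value range is 0.2 <= F(x) <= 0.3.
center = -45.623647735377261
coeffs = [
    0.2489968294438091,
    0.003189753481820665,
    0.00001213660629145654,
    -0.00000002751632815062,
]
x_low, x_high = -62.0431986609, -30.4738637617

print(poly_cdf(x_low, center, coeffs))    # close to 0.2
print(poly_cdf(x_high, center, coeffs))   # close to 0.3
print(poly_cdf(center, center, coeffs))   # the constant term
```

At the two ends of the range the polynomial returns values within about 0.00002 of 0.2 and 0.3, in line with the printed MAX error of this piece.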
[Figure: the comparison of the estimated line and the sample data]
(27.2.2)
The linear model analysis
The estimated line is X2=1.057107+1.992369*X1
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1 1 396935813.1594531500 396935813.1594531500 37432.8359736426
Error 99999998 1060394690640.6692000000 10603.9471184856
total 99999999 1060791626453.8286000000
----------------------------------------------------------------------------------
H0: slope(X1)=0, The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 1.0571068904 0.1034915241 10.21443 0.00000
slope 1.9923693011 0.0102977768 193.47567 0.00000
----------------------------------------------------------------------------------
MSE=10603.9471184856 , R2=0.000374 , R2(adj)=0.000374
X2(mean)= 20.9808330771, X2(variance)= 10607.9163706175, X2(s.d.)= 102.9947395289
X1(mean)= 10.0000166514, X1(variance)= 0.9999553447, X1(s.d.)= 0.9999776721
SSX1= 99995533.4745774870 , SS(X2*X1)=199228031.1403110000, C.V.= 4.9080733901
[testing the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class lower limit upper limit observed no probability expected no chi square
[ 1 ] -inf -169.38485 4938599.00000 0.05000 5000000.00000 754.01656
[ 2 ] -169.38485 -131.97304 4548475.00000 0.05000 5000000.00000 40774.96513
[ 3 ] -131.97304 -106.72642 4678516.00000 0.05000 5000000.00000 20670.39245
[ 4 ] -106.72642 -86.66552 4822658.00000 0.05000 5000000.00000 6290.03699
[ 5 ] -86.66552 -69.45338 4952994.00000 0.05000 5000000.00000 441.91281
[ 6 ] -69.45338 -53.99801 5070986.00000 0.05000 5000000.00000 1007.80244
[ 7 ] -53.99801 -39.67537 5163751.00000 0.05000 5000000.00000 5362.87800
[ 8 ] -39.67537 -26.11194 5224911.00000 0.05000 5000000.00000 10116.99158
[ 9 ] -26.11194 -12.93506 5292317.00000 0.05000 5000000.00000 17089.84570
[ 10 ] -12.93506 -0.02331 5296576.00000 0.05000 5000000.00000 17591.46476
[ 11 ] -0.02331 12.91674 5311669.00000 0.05000 5000000.00000 19427.51311
[ 12 ] 12.91674 26.08630 5292980.00000 0.05000 5000000.00000 17167.45608
[ 13 ] 26.08630 39.67459 5233672.00000 0.05000 5000000.00000 10920.52072
[ 14 ] 39.67459 53.99812 5161407.00000 0.05000 5000000.00000 5210.44393
[ 15 ] 53.99812 69.44932 5068570.00000 0.05000 5000000.00000 940.36898
[ 16 ] 69.44932 86.65914 4952938.00000 0.05000 5000000.00000 442.96637
[ 17 ] 86.65914 106.72188 4820357.00000 0.05000 5000000.00000 6454.32149
[ 18 ] 106.72188 131.96242 4679514.00000 0.05000 5000000.00000 20542.25524
[ 19 ] 131.96242 169.37991 4548569.00000 0.05000 5000000.00000 40757.98955
[ 20 ] 169.37991 +inf 4940541.00000 0.05000 5000000.00000 707.07454
degree of freedom=18
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =242671.216418 p-value=0.000000
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=49999233
number of the positive of residual=50000767
H0: residual is random , H1: Increasing line or decreasing line
Z=-0.869998, p-value=0.192200
H0: residual is random , H1: Oscillation Z=-0.869998, p-value=0.807800
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-0.869998, p-value=0.384400
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t), t=2,3,...,100000000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0 D.W. test=2.000022
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0 D.W. test=1.999978
The D.W. critical values depend on the values of the independent variable,
so computing the p value takes a lot of time.
The existing Durbin Watson table is currently incorrect.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
2. The population sigma of error confidence interval
90% confidence interval for population variance [10601.480952 , 10606.414432]
90% confidence interval for population standard deviation [102.963493 , 102.987448]
95% confidence interval for population variance [10601.008659 , 10606.887208]
95% confidence interval for population standard deviation [102.961200 , 102.989743]
99% confidence interval for population variance [10600.085273 , 10607.811779]
99% confidence interval for population standard deviation [102.956716 , 102.994232]
(27.2.3) residual analysis I, the first line model residual,
$\text{residual}_i = X_{2i} - (1.057107 + 1.992369\,X_{1i})$. The absolute value of the residual is the dependent variable, X1 is the
independent variable, and the model is a non-linear model:
$|X_{2i} - (1.057107 + 1.992369\,X_{1i})| = |\text{residual}_i| = \alpha_0 + \alpha_1 G(X_{1i}) + \varepsilon_i^{*}, \quad i = 1,2,\ldots,n$,
|error|= 0.0147617544+ 0.7977439361*X1^2
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1^2 1 25582884352.8384210000 25582884352.8384210000
error 99999998 385383264709.4420800000 3853.8327241711
total 99999999 410966149062.2805200000
----------------------------------------------------------------------------------
F test value=6638296.5177454781,
H0: slope(X1)=0 The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 0.0147617544 0.0318823771 0.46301 0.64320
slope 0.7977439361 0.0003096244 2576.48918 0.00000
----------------------------------------------------------------------------------
MSE=3853.8327241711 , R2=0.062251 , R2(adj)=0.062251
|error|(mean)= 80.5871293433, |error|(variance)= 4109.6615317194, |error|(s.d.)= 64.1066418690
X1^2(mean)= 101.0002883634, X1^2(variance)= 401.9967005752, X1^2(s.d.)= 20.0498553754
SS(X1^2)=40199669655.5234070000 , SS(|error|*X1^2)=32069042701.9511030000,
C.V.= 0.7703369759
253832.44361 316161.16092 357102.65707 368863.21983 329924.34188 189290.14849 817857.75912
degree of freedom=18
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =14280442.722998
p-value=0.000000
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=57508356, number of the positive of residual=42491644
H0: residual is random , H1: Increasing line or decreasing line, Z=-0.271073, p-value=0.393200
H0: residual is random , H1: Oscillation, Z=-0.271073, p-value=0.606800
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-0.271073, p-value=0.786400
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,100000000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0 D.W. test=2.000399
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0 D.W. test=1.999601
The D.W. critical values depend on the values of the independent variable,
so computing the p value takes a lot of time.
The existing Durbin Watson table is currently incorrect.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
[Figures: the joint probability distribution of X1^2 and the residual of the second line model; the joint probability distribution of the absolute value of the residual (first estimated line) and the estimated value]
The residual analysis (the residual comes from the second estimated line)
X0 = the estimated value of $\varepsilon_i^{*}$, X0 the frequency distribution table
Mathematical Mean: -0.00000
Geometrical Mean : none
Harmonic Mean : none
Variance : 3853.83269
S.D. : 62.07925
Skewed Coef. : 1.05257
Kurtosis Coef. : 4.47947
MAD : 48.75117
Range : 911.65015
Mid_range : 275.73722
Median : -11.94476
Q1 : -46.39342
Q2 : -11.94476
Q3 : 34.48710
IQR : 80.88052
C.V. : none
(27.2.4) residual analysis II, the first line model residual,
The residual comes from the X2 estimated line.
$\text{residual}_i = X_{2i} - (1.057107 + 1.992369\,X_{1i})$. The square of the residual is the dependent variable, X1
is the independent variable, and the model is a non-linear model:
$(X_{2i} - (1.057107 + 1.992369\,X_{1i}))^2 = (\text{residual}_i)^2 = \alpha_0 + \alpha_1 G(X_{1i}) + \varepsilon_i^{*}, \quad i = 1,2,\ldots,n$,
The non-linear model does not have the model error^2=b0+b1*X1^4.
Please refer to Appendix 2.
error^2= -3531.6577789815+ 13.7238340284*X1^3
ANOVA
----------------------------------------------------------------------------------
Source    df          SS                               MS                              F
----------------------------------------------------------------------------------
X1^3      1           1763214739041286.0000000000      1763214739041286.0000000000     6769613.5938211801
error     99999998    26046016945285144.0000000000     260460174.6620549300
total     99999999    27809231684326428.0000000000
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
H0: slope(X1)=0
The F test p value=0.000100
161817.89121
degree of freedom=18
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =61665284.150412
p-value=0.000000
~~~~~ The run test of residual~~~~~~~~~~~~~
number of negative residuals=68076201
number of positive residuals=31923799
H0: residual is random , H1: Increasing line or decreasing line
Z=0.323444, p-value=0.626900
H0: residual is random , H1: Oscillation
Z=0.323444, p-value=0.373100
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=0.323444, p-value=0.746200
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,100000000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=2.000306
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=1.999694
The D.W. critical value table depends on the values of the independent variable,
so obtaining the p value takes a lot of time.
The Durbin Watson table currently in use is wrong.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
(27.2.5)residual analysis conclusion,
X2=1.057107+1.992369*X1+residual,
| residual |=0.0147617544+0.7977439361*X1^2,
residual ^2=-3531.6577789815+13.7238340284*X1^3,
W2 Coefficient
Mathematical Mean: 0.00342
Geometrical Mean : none
Harmonic Mean : none
Variance : 101.02253
S.D. : 10.05100
Skewed Coef. : -0.00003
Kurtosis Coef. : 3.11837
MAD : 7.97972
Range : 129.22523
Mid_range : 3.13574
Median : 0.00260
Q1 : -6.68951
Q2 : 0.00260
Q3 : 6.69701
IQR : 13.38652
C.V. : none
(27.2.6.2)
X2=1.057107+1.992369*X1+residual,
| residual |=0.0147617544+0.7977439361*X1^2,
let | residual |/(X1^2), W1= X1,W2=( X2-1.057107-1.992369*X1)/ (X1^2),
W1=Z1,W2=Z2/Z3,
[Figures: f(w1,w2) and f(w2,w1)]
5.6. The independent variable has a shifted exponential distribution
and the non-linear model, the three basic assumptions are
unchanged.
Example 28. $X_1 \sim \text{Shifted\_exponential}(\lambda_{X_1} = 1, c_{X_1} = 0.1)$,
the population conditional expectation line is
$E(X_2 \mid x_1) = \beta_0 + \beta_1 (x_1 + \log(x_1)) = 1 + 2(x_1 + \log(x_1))$,
$\varepsilon \sim \text{Normal}(0, \sigma^2 = 1)$,
$X_{2i} = \beta_0 + \beta_1 H(X_{1i}) + \varepsilon_i$, $i = 1, 2, \ldots, n$, where $\beta_0$ is the intercept, $\beta_1$ is the slope,
and $\varepsilon_i$ is the error.
The three basic assumptions are
i) $\varepsilon_i \sim$ Normal distribution, ii) $E(\varepsilon_i) = 0$, $Var(\varepsilon_i) = \sigma^2$,
iii) $\varepsilon_1, \ldots, \varepsilon_n$ are independent.
(28.1) paired samples, n=1000,
(28.1.1) Basic analysis,
[Figures: scatter diagram; scatter diagram using the linear model]
[ 8 ] 11.60192~ 14.07028 12.83610 19.00000 0.0190000 0.9940000
[ 9 ] 14.07028~ 16.53864 15.30446 6.00000 0.0060000 1.0000000
frequency distribution: sample mean=2.708426 , sample variance=15.002841 , sample s
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-0.861633, p-value=0.389000
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,1000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=1.946452
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=2.053548
The D.W. critical value table depends on the values of the independent variable,
so obtaining the p value takes a lot of time.
The Durbin Watson table currently in use is wrong.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
2. The population sigma of error confidence interval
90% confidence interval for population variance [1.488609 , 1.725267]
90% confidence interval for population standard deviation [1.220086 , 1.313494]
95% confidence interval for population variance [1.469305 , 1.751944]
95% confidence interval for population standard deviation [1.212149 , 1.323610]
99% confidence interval for population variance [1.432969 , 1.806565]
99% confidence interval for population standard deviation [1.197067 , 1.344085]
[Figures: estimated line; residual plot]
class     lower limit   upper limit   observed no   probability   expected no   chi square
[ 1 ]     -inf          -1.56114      108.00000     0.10000       100.00000     0.64000
[ 2 ]     -1.56114      -1.02345      83.00000      0.10000       100.00000     2.89000
[ 3 ]     -1.02345      -0.63577      97.00000      0.10000       100.00000     0.09000
[ 4 ]     -0.63577      -0.30451      79.00000      0.10000       100.00000     4.41000
[ 5 ]     -0.30451      0.00509       98.00000      0.10000       100.00000     0.04000
[ 6 ]     0.00509       0.31464       122.00000     0.10000       100.00000     4.84000
[ 7 ]     0.31464       0.64588       115.00000     0.10000       100.00000     2.25000
[ 8 ]     0.64588       1.03305       91.00000      0.10000       100.00000     0.81000
[ 9 ]     1.03305       1.57113       105.00000     0.10000       100.00000     0.25000
[ 10 ]    1.57113       +inf          102.00000     0.10000       100.00000     0.04000
degree of freedom=7
H0: X0~Normal(mu=0.005057,sigma*sigma=1.493452), sigma=1.222069
pearson chi-square test statistic =16.260000 p-value=0.022800
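The Pearson statistic above can be recomputed directly from the observed counts. The sketch below, in Python, assumes ten equal-probability classes whose boundaries are the 10%, ..., 90% quantiles of the Normal distribution stated in H0, and takes the degrees of freedom as the number of classes minus 1 minus the two estimated parameters. The variable names are illustrative, and the p-value is not computed because the standard library has no chi-square distribution function.

```python
from statistics import NormalDist

observed = [108, 83, 97, 79, 98, 122, 115, 91, 105, 102]
mu, sigma = 0.005057, 1.222069   # values printed in H0 above

k = len(observed)
n = sum(observed)
expected = n / k                 # equal-probability classes

# Class boundaries: the 10%, 20%, ..., 90% quantiles of the fitted Normal.
dist = NormalDist(mu, sigma)
boundaries = [dist.inv_cdf(j / k) for j in range(1, k)]

contributions = [(o - expected) ** 2 / expected for o in observed]
statistic = sum(contributions)
df = k - 1 - 2                   # two estimated parameters (mu and sigma)

print([round(b, 5) for b in boundaries])
print([round(c, 2) for c in contributions])
print(round(statistic, 2), df)
```

The computed boundaries agree with the printed class limits to about four decimals. For the residual tests elsewhere in the text the mean is fixed at 0 and only sigma is estimated, which matches the 8 degrees of freedom reported there for ten classes.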
(28.1.3)Non-linear model
(28.1.3.1) Non-linear model analysis
The relation is X2= -5.7019052126+ 8.6972906461*|X1|^0.5
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
|X1|^0.5 1 13615.4936331317 13615.4936331317 14263.6780241845
error 998 952.6478810603 0.9545569951
total 999 14568.1415141920
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
H0: slope(X1)=0
The F test p value=0.000100
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept -5.7019052126 0.0767990536 -74.24447 0.00000
slope 8.6972906461 0.0728229420 119.43064 0.00000
----------------------------------------------------------------------------------
MSE=0.9545569951 , R2=0.934607 , R2(adj)=0.934542
X2(mean)= 2.6952984452, X2(variance)= 14.5827242384, X2(s.d.)= 3.8187333291
|X1|^0.5(mean)= 0.9654964977, |X1|^0.5(variance)= 0.1801772425, |X1|^0.5(s.d.)= 0.4244728996
SS(|X1|^0.5)= 179.9970652660 , SS(X2*|X1|^0.5)= 1565.4867920592, C.V.= 0.3624883651
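Every entry of the ANOVA and individual test tables above follows from the printed sums of squares and means. The short Python check below uses the simple least squares formulas with |X1|^0.5 as the single regressor; the variable names are illustrative and the input numbers are copied from the output above.

```python
import math

n = 1000
ss_xx = 179.9970652660        # SS(|X1|^0.5)
ss_xy = 1565.4867920592       # SS(X2*|X1|^0.5)
ss_total = 14568.1415141920   # total sum of squares of X2
x_mean = 0.9654964977         # mean of |X1|^0.5
y_mean = 2.6952984452         # mean of X2

slope = ss_xy / ss_xx
intercept = y_mean - slope * x_mean
ss_reg = slope * ss_xy                 # regression sum of squares
ss_err = ss_total - ss_reg             # error sum of squares
mse = ss_err / (n - 2)
f_stat = ss_reg / mse
r2 = ss_reg / ss_total
r2_adj = 1 - (1 - r2) * (n - 1) / (n - 2)
se_slope = math.sqrt(mse / ss_xx)
t_slope = slope / se_slope             # t^2 equals F when there is one regressor

print(intercept, slope, mse, f_stat, r2, r2_adj, se_slope, t_slope)
```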
0.25000 1.21000 0.25000
degree of freedom=8
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =5.520000
p-value=0.700800
~~~~~ The run test of residual~~~~~~~~~~~~~
number of negative residuals=505
number of positive residuals=495
H0: residual is random , H1: Increasing line or decreasing line
Z=-1.515641, p-value=0.064900
H0: residual is random , H1: Oscillation
Z=-1.515641, p-value=0.935100
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-1.515641, p-value=0.129800
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,1000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=1.863513
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=2.136487
The D.W. critical value table depends on the values of the independent variable,
so obtaining the p value takes a lot of time.
The Durbin Watson table currently in use is wrong.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
2. The population sigma of error confidence interval
90% confidence interval for population variance [0.889088 , 1.030434]
90% confidence interval for population standard deviation [0.942915 , 1.015103]
95% confidence interval for population variance [0.877558 , 1.046367]
95% confidence interval for population standard deviation [0.936781 , 1.022921]
99% confidence interval for population variance [0.855856 , 1.078991]
99% confidence interval for population standard deviation [0.925125 , 1.038745]
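The variance intervals above have the usual form [SSE / upper chi-square quantile, SSE / lower chi-square quantile] with df = n - 2 = 998. The printed limits are numerically consistent with replacing the chi-square quantiles by the large-df normal approximation df ± z·sqrt(2·df); whether the authors' software does exactly this is an assumption of the Python sketch below. `variance_ci` is an illustrative name, and SSE is the error sum of squares from the ANOVA table of (28.1.3.1).

```python
import math
from statistics import NormalDist


def variance_ci(sse, df, level):
    """Interval for the error variance using chi2(df) ~ Normal(df, 2*df)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * math.sqrt(2.0 * df)
    return sse / (df + half_width), sse / (df - half_width)


if __name__ == "__main__":
    sse, df = 952.6478810603, 998
    for level in (0.90, 0.95, 0.99):
        lo, hi = variance_ci(sse, df, level)
        # The standard deviation interval is the square root of the variance interval.
        print(level, (lo, hi), (math.sqrt(lo), math.sqrt(hi)))
```

With exact chi-square quantiles the limits would shift slightly toward smaller values; the difference shrinks as the degrees of freedom grow.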
[Figures: estimated line; residual plot]
When n=1,000, the model $X_{2i} = \beta_0 + \beta_1 (X_{1i} + \log(X_{1i})) + \varepsilon_i$, $i = 1, 2, \ldots, n$, gives
the estimated line X2=-5.7019052126+8.6972906461*|X1|^0.5,
MSE=0.9545569951 , R2=0.934607,
so $X_1 + \log(X_1)$ can be replaced by $\sqrt{X_1}$.
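The replacement of X1 + log(X1) by the square root of X1 can be checked by simulation. The Python sketch below draws paired samples from the population of Example 28 and fits X2 = b0 + b1*|X1|^0.5 by least squares. The function names, the seed and the sample size of 20,000 are arbitrary illustrative choices, and the standard library random generator stands in for the authors' simulator.

```python
import math
import random


def simulate_example_28(n, seed=0):
    """x1 = 0.1 + Exp(1), x2 = 1 + 2*(x1 + log x1) + Normal(0, 1)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x1 = 0.1 + rng.expovariate(1.0)
        xs.append(x1)
        ys.append(1.0 + 2.0 * (x1 + math.log(x1)) + rng.gauss(0.0, 1.0))
    return xs, ys


def fit_sqrt_model(xs, ys):
    """Least squares fit of x2 = b0 + b1 * sqrt(x1); returns (b0, b1, r2)."""
    g = [math.sqrt(x) for x in xs]
    n = len(g)
    g_mean = sum(g) / n
    y_mean = sum(ys) / n
    ss_gg = sum((a - g_mean) ** 2 for a in g)
    ss_gy = sum((a - g_mean) * (b - y_mean) for a, b in zip(g, ys))
    ss_yy = sum((b - y_mean) ** 2 for b in ys)
    b1 = ss_gy / ss_gg
    b0 = y_mean - b1 * g_mean
    return b0, b1, b1 * ss_gy / ss_yy


if __name__ == "__main__":
    xs, ys = simulate_example_28(20000, seed=1)
    print(fit_sqrt_model(xs, ys))
```

The fitted intercept and slope should land near the values reported in this example, up to sampling error.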
(28.1.4)Curve-linear model
(28.1.4.1)Curve-linear model analysis,
The estimated line ------
X2=3.46357664007337010000+
3.67239577180589550000*(X1-1.11218055224209960000)^1+
-1.27835332491667940000*(X1-1.11218055224209960000)^2+
0.90127949018938125000*(X1-1.11218055224209960000)^3+
0.49003005831036717000*(X1-1.11218055224209960000)^4+
-0.29802305408520624000*(X1-1.11218055224209960000)^5+
-0.59487223676114809000*(X1-1.11218055224209960000)^6+
0.58458658553718124000*(X1-1.11218055224209960000)^7+
-0.20875884690030944000*(X1-1.11218055224209960000)^8+
0.03382465923368727100*(X1-1.11218055224209960000)^9+
-0.00208700331694444690*(X1-1.11218055224209960000)^10
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
regression 10 13640.4301041395 1364.0430104139 1454.1575350712
error 989 927.7114100525 0.9380297372
total 999 14568.1415141920
----------------------------------------------------------------------------------
The F test p value=0.000100
MSE= 0.9380297372 , R2=0.936319 , R2(adj)=0.935675
X2(Mean)= 2.6952984452, X2(Var)= 14.5827242384, X2(sd)= 3.8187333291
X1(Mean)= 1.1121805522, X1(Var)= 0.9483783370, X1(sd)= 0.9738471836
------------------- individual test -------------------------
parameter coefficient standard error t test p value
----------------------------------------------------------------------------------
b0 3.4635766401 0.0707195225 48.9762447154 0.0000000000
b1 3.6723957718 0.1983373678 18.5159045545 0.0000000000
b2 -1.2783533249 0.4801468941 -2.6624213144 0.0078000000
b3 0.9012794902 0.5152977320 1.7490461032 0.0802000000
b4 0.4900300583 0.8797847787 0.5569885615 0.5774000000
b5 -0.2980230541 0.4557298397 -0.6539467643 0.5132000000
b6 -0.5948722368 0.4481985075 -1.3272517128 0.1846000000
b7 0.5845865855 0.3624036389 1.6130814453 0.1066000000
b8 -0.2087588469 0.1211280822 -1.7234553967 0.0850000000
b9 0.0338246592 0.0188148083 1.7977679461 0.0722000000
b10 -0.0020870033 0.0011241658 -1.8564906575 0.0634000000
[testing the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class     lower limit   upper limit   observed no   probability   expected no   chi square
[ 1 ]     -inf          -1.24125      97.00000      0.10000       100.00000     0.09000
[ 2 ]     -1.24125      -0.81512      96.00000      0.10000       100.00000     0.16000
[ 3 ]     -0.81512      -0.50787      94.00000      0.10000       100.00000     0.36000
[ 4 ]     -0.50787      -0.24534      96.00000      0.10000       100.00000     0.16000
[ 5 ]     -0.24534      0.00002       131.00000     0.10000       100.00000     9.61000
[ 6 ]     0.00002       0.24535       87.00000      0.10000       100.00000     1.69000
[ 7 ]     0.24535       0.50787       106.00000     0.10000       100.00000     0.36000
[ 8 ]     0.50787       0.81471       104.00000     0.10000       100.00000     0.16000
[ 9 ]     0.81471       1.24115       88.00000      0.10000       100.00000     1.44000
[ 10 ]    1.24115       +inf          101.00000     0.10000       100.00000     0.01000
degree of freedom=8
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =14.040000 p-value=0.080700
~~~~~ The run test of residual~~~~~~~~~~~~~
number of negative residuals=514
number of positive residuals=486
H0: residual is random , H1: Increasing line or decreasing line
Z=-1.178388, p-value=0.119400
H0: residual is random , H1: Oscillation
Z=-1.178388, p-value=0.880600
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-1.178388, p-value=0.238800
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,1000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=1.860814
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=2.139186
The D.W. critical value table depends on the values of the independent variable,
so obtaining the p value takes a lot of time.
The Durbin Watson table currently in use is wrong.
(28.2) n = 100,000,000, it is big data.
(28.2.1) Basic analysis,
(28.2.1.1) X1 and X2 joint probability distribution
[Figures: f(x1,x2) and f(x2,x1)]
Curve-fitting estimate of the distribution function of X1,
The distribution function estimated line ------
F(X)=1- exp( -1*((X-0.1000000037)/ 0.9998860023 )^ 0.9999155137 )
SSE=0.000941193706477202 MAX error=0.000090575757443090
coefficient of determination=0.999999993540950370
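The printed formula has unbalanced parentheses; one plausible reading is a Weibull-type curve F(x) = 1 - exp(-((x - c)/s)^k) with c = 0.1000000037, s = 0.9998860023 and k = 0.9999155137. Under that reading (an assumption of this Python sketch, as are the function names and the evaluation grid), the fitted curve can be compared with the population distribution function of the shifted exponential distribution with λ = 1 and c = 0.1.

```python
import math

C_HAT, SCALE_HAT, POWER_HAT = 0.1000000037, 0.9998860023, 0.9999155137


def fitted_cdf(x):
    """Assumed reading of the fitted curve, with F(x) = 0 for x <= c."""
    u = max(x - C_HAT, 0.0) / SCALE_HAT
    return 1.0 - math.exp(-(u ** POWER_HAT))


def population_cdf(x, lam=1.0, c=0.1):
    """Distribution function of the shifted exponential distribution."""
    return 1.0 - math.exp(-lam * (x - c)) if x > c else 0.0


grid = [0.1 + 0.01 * j for j in range(1501)]   # x from 0.1 to 15.1
max_gap = max(abs(fitted_cdf(x) - population_cdf(x)) for x in grid)
print(max_gap)
```

Because the estimated shift, scale and power are all very close to 0.1, 1 and 1, the two curves are nearly indistinguishable on this grid.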
Left diagram: comparison of the estimated line and the sample data.
determination=0.999999970317733580,
determination=0.999999970885772080,
(28.2.2) Non-linear model analysis
The relation is X2= -5.6489811404+ 8.6404656140*|X1|^0.5
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
|X1|^0.5 1 1368250291.0205469000 1368250291.0205469000 1325230928.3527293000
error 99999998 103246176.5253460400 1.0324617859
total 99999999 1471496467.5458930000
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
H0: slope(X1)=0
The F test p value=0.000100
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept -5.6489811404 0.0002489346 -22692.62692 0.00000
slope 8.6404656140 0.0002373512 36403.72135 0.00000
----------------------------------------------------------------------------------
MSE=1.0324617859 , R2=0.929836 , R2(adj)=0.929836
X2(mean)= 2.6238670284, X2(variance)= 14.7149648226, X2(s.d.)= 3.8360089706
|X1|^0.5(mean)= 0.9574539774, |X1|^0.5(variance)= 0.1832699499, |X1|^0.5(s.d.)= 0.4281003970
SS(|X1|^0.5)= 18326994.8069597260 , SS(X2*|X1|^0.5)=158353768.4368600500,
C.V.= 0.3872533389
5005753.00000 5001931.00000 4995232.00000 4993472.00000 4981832.00000 4968200.00000
4960777.00000 4998337.00000
probability 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000
0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000
0.05000 0.05000 0.05000 0.05000 0.05000
expected no 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000
5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000
5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000
5000000.00000 5000000.00000
chi square 449.56058 5.19180 15.84912 46.53080 42.86006 38.21507 87.02792
30.41071 127.79546 2.93838 45.13811 58.00418 6.61940 0.74575 4.54676
8.52296 66.01524 202.24800 307.68875 0.55311
degree of freedom=18
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =1546.462174 p-value=0.000000
~~~~~ The run test of residual~~~~~~~~~~~~~
number of negative residuals=50071508
number of positive residuals=49928492
H0: residual is random , H1: Increasing line or decreasing line
Z=1.556457, p-value=0.940300
H0: residual is random , H1: Oscillation
Z=1.556457, p-value=0.059700
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=1.556457, p-value=0.119400
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,100000000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=2.000006
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=1.999994
The D.W. critical value table depends on the values of the independent variable,
so obtaining the p value takes a lot of time.
The Durbin Watson table currently in use is wrong.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
2. The population sigma of error confidence interval
90% confidence interval for population variance [1.032222 , 1.032702]
90% confidence interval for population standard deviation [1.015983 , 1.016219]
95% confidence interval for population variance [1.032176 , 1.032748]
95% confidence interval for population standard deviation [1.015960 , 1.016242]
99% confidence interval for population variance [1.032086 , 1.032838]
99% confidence interval for population standard deviation [1.015916 , 1.016286]
[Figure, left: the joint probability distribution of X1 and the residual]
[Figure, right: the joint probability distribution of the X2 estimated line and X2]
(28.2.3) residual analysis,
X0=residual, residual marginal probability distribution
Mathematical Mean: 0.00000
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.03246
S.D. : 1.01610
Skewed Coef. : 0.02346
Kurtosis Coef. : 3.07275
MAD : 0.80963
Range : 18.46113
Mid_range : 3.55036
Median : -0.00182
Q1 : -0.68489
Q2 : -0.00182
Q3 : 0.68216
IQR : 1.36705
C.V. : none
----------------------------------------------------------------------------------
MSE=0.3737779995 , R2=0.008425 , R2(adj)=0.008425
|error|(mean)= 0.8096343458, |error|(variance)= 0.3769539951, |error|(s.d.)= 0.6139657931
X1^3(mean)= 6.6339102335, X1^3(variance)= 755.2108190482, X1^3(s.d.)= 27.4810993057
SS(X1^3)=75521081149.6118320000 , SS(|error|*X1^3)=154872495.8848765800,
C.V.= 0.7551234275
$X_2 = -5.6489811404 + 8.6404656140 \times \sqrt{X_1}$ is close to $X_2 = 1 + 2(X_1 + \log(X_1))$
when $X_1 \sim \text{Shifted\_exponential}(\lambda_{X_1} = 1, c_{X_1} = 0.1)$.
Note:
(i) $X_1 \sim \text{Shifted\_exponential}(\lambda_{X_1} = 1, c_{X_1} = 0.1)$,
let $W_1 = 1 + 2(X_1 + \log(X_1))$, $W_2 = -5.6489811404 + 8.6404656140 \sqrt{X_1}$, with
the following probability distributions,
f(w1) Coefficient
Mathematical Mean: 2.62385
Geometrical Mean : none
Harmonic Mean : none
Variance : 13.71195
S.D. : 3.70297
Skewed Coef. : 0.88546
Kurtosis Coef. : 4.16572
MAD : 2.91641
Range : 46.88099
Mid_range : 20.03533
Median : 2.12241
Q1 : -0.11984
Q2 : 2.12241
Q3 : 4.76471
IQR : 4.88455
C.V. : 1.41127
f(w2) Coefficient
Mathematical Mean: 2.62416
Geometrical Mean : none
Harmonic Mean : none
Variance : 13.68024
S.D. : 3.69868
Skewed Coef. : 0.81283
Kurtosis Coef. : 3.50108
MAD : 2.97408
Range : 34.74667
Mid_range : 14.45671
Median : 2.04549
Q1 : -0.26919
Q2 : 2.04549
Q3 : 4.88529
IQR : 5.15448
C.V. : 1.40947
[Figures: f(w1,w2) and f(w2,w1)]
The comparison of the distribution functions of W1 and W2, the SLLN method.
E(| W1 distribution F() - W2 distribution F()|^2)= 0.0001098779
Pr(| W1 distribution F() - W2 distribution F()|>= 0.1000000000)= 0.000000
Pr(| W1 distribution F() - W2 distribution F()|>= 0.0500000000)= 0.000000
Pr(| W1 distribution F() - W2 distribution F()|>= 0.0100000000)= 0.359005
Pr(| W1 distribution F() - W2 distribution F()|>= 0.0050000000)= 0.730856
Pr(| W1 distribution F() - W2 distribution F()|>= 0.0010000000)= 0.943118
Pr(| W1 distribution F() - W2 distribution F()|>= 0.0005000000)= 0.972091
Pr(| W1 distribution F() - W2 distribution F()|>= 0.0001000000)= 0.994396
(ii) $X_1 \sim \text{Beta}(\alpha_{X_1} = 5, \beta_{X_1} = 5)$,
let $W_1 = 1 + 2(X_1 + \log(X_1))$, $W_2 = -5.6489811404 + 8.6404656140 \sqrt{X_1}$, with the
following probability distributions,
f(w1) Coefficient
Mathematical Mean: 0.50887
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.95557
S.D. : 0.97753
Skewed Coef. : -0.63352
Kurtosis Coef. : 3.50644
MAD : 0.77685
Range : 12.37201
Mid_range : -3.22364
Median : 0.61394
Q1 : -0.08908
Q2 : 0.61394
Q3 : 1.22121
IQR : 1.31029
C.V. : 1.92097
f(w2) Coefficient
Mathematical Mean: 0.38506
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.91912
S.D. : 0.95871
Skewed Coef. : -0.41435
Kurtosis Coef. : 2.94306
MAD : 0.77238
Range : 7.73477
Mid_range : -0.90347
Median : 0.46061
Q1 : -0.23948
Q2 : 0.46061
Q3 : 1.08852
IQR : 1.32800
C.V. : 2.48974
[Figures: f(w1,w2) and f(w2,w1)]
(iii) $X_1 \sim \text{U\_quadratic}(a_{X_1} = 0.1, b_{X_1} = 10.1)$,
let $W_1 = 1 + 2(X_1 + \log(X_1))$, $W_2 = -5.6489811404 + 8.6404656140 \sqrt{X_1}$, with the
following probability distributions,
f(w1) Coefficient
Mathematical Mean: 13.35891
Geometrical Mean : none
Harmonic Mean : none
Variance : 102.64776
S.D. : 10.13152
Skewed Coef. : -0.11405
Kurtosis Coef. : 1.29408
MAD : 9.68774
Range : 29.23024
Mid_range : 11.20995
Median : 13.96007
Q1 : 3.50935
Q2 : 13.96007
Q3 : 23.54578
IQR : 20.03644
C.V. : 0.75841
f(w2) Coefficient
Mathematical Mean: 11.85983
Geometrical Mean : none
Harmonic Mean : none
Variance : 74.17143
S.D. : 8.61228
Skewed Coef. : -0.20846
Kurtosis Coef. : 1.35965
MAD : 8.15813
Range : 24.72747
Mid_range : 9.44711
Median : 13.43310
Q1 : 3.54255
Q2 : 13.43310
Q3 : 20.37056
IQR : 16.82800
C.V. : 0.72617
[Figures: f(w1,w2) and f(w2,w1)]
5.7. The random variable range has a specific region and the three
basic assumptions are unchanged.
Example 29. $X_1 \sim \text{Normal}(\mu_{X_1} = 2, \sigma_{X_1}^2 = 5^2)$,
the population conditional expectation line is
$E(X_2 \mid x_1) = \beta_0 + \beta_1 x_1 = 1 + 2 x_1$, $\varepsilon \sim \text{Normal}(0, \sigma^2 = 1)$,
$-20 \le X_1 X_2 \le 20$, $X_{2i} = \beta_0 + \beta_1 X_{1i} + \varepsilon_i$, $i = 1, 2, \ldots, n$.
The three basic assumptions are
i) $\varepsilon_i \sim$ Normal distribution, ii) $E(\varepsilon_i) = 0$, $Var(\varepsilon_i) = \sigma^2$,
iii) $\varepsilon_1, \ldots, \varepsilon_n$ are independent.
(29.1) paired samples, n=1000,
(29.1.1) Basic analysis
[Figures: scatter diagram; scatter diagram using the linear model]
(29.1.2) The linear model analysis
The estimated line is X2=0.941446+1.939680*X1
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1 1 12444.1963216948 12444.1963216948 13576.3568375690
error 998 914.7747129542 0.9166079288
total 999 13358.9710346490
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
H0: slope(X1)=0
The F test p value=0.000100
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 0.9414462120 0.0302768708 31.09457 0.00000
slope 1.9396800517 0.0166470957 116.51762 0.00000
----------------------------------------------------------------------------------
MSE=0.9166079288 , R2=0.931524 , R2(adj)=0.931455
X2(mean)= 0.9746026257, X2(variance)= 13.3723433780, X2(s.d.)= 3.6568214857
X1(mean)= 0.0170937541, X1(variance)= 3.3108626684, X1(s.d.)= 1.8195776071
SSX1=3307.5518056979 , SS(X2*X1)= 6415.5922574834, C.V.= 0.9823454269
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,1000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=2.060949
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=1.939051
The D.W. critical value table depends on the values of the independent variable,
so obtaining the p value takes a lot of time.
The Durbin Watson table currently in use is wrong.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
2. The population sigma of error confidence interval
90% confidence interval for population variance [0.853742 , 0.989468]
90% confidence interval for population standard deviation [0.923981 , 0.994720]
95% confidence interval for population variance [0.842670 , 1.004768]
95% confidence interval for population standard deviation [0.917971 , 1.002381]
99% confidence interval for population variance [0.821831 , 1.036095]
99% confidence interval for population standard deviation [0.906549 , 1.017887]
[Figures: estimated line; residual plot]
The region $-20 \le X_1 X_2 \le 20$ cannot be displayed.
(29.2) n = 100,000,000, it is big data.
(29.2.1) Basic analysis
(29.2.1.1) X1 and X2 joint probability distribution,
[Figures: f(x1,x2) and f(x2,x1)]
-0.00966084117124410560*(X--3.25954375288234570000)^3+
-0.02777743313523883800*(X--3.25954375288234570000)^4+
value range 0.0000000000<=F(x)<= 0.0500000000 ,
value range -4.8603737625<=X<= -2.9598606193 ,
Error=0.000021032801105810 MAX=0.001055199533229372 coefficient of
determination=0.999882158334783560,
0.00897070770342134340*(X--0.90966334524730819000)^2+
-0.00088349228864359475*(X--0.90966334524730819000)^3+
value range 0.3000000100<=F(x)<= 0.3500000000 ,
value range -1.0721952808<=X<= -0.7492054344 ,
Error=0.000000011732264046 MAX=0.000007908876976825 coefficient of
determination=0.999999965671563910,
0.00406936298207664220*(X-0.87179656701912989000)^2+
-0.00005390853473841162*(X-0.87179656701912989000)^3+
value range 0.6000000100<=F(x)<= 0.6500000000 ,
value range 0.7317211982<=X<= 1.0112597272 ,
Error=0.000000013640841233 MAX=0.000008635026152226 coefficient of
determination=0.999999959820099700,
Error=0.000002779778600875 MAX=0.000126897610485011 coefficient of
determination=0.999991820409337540,
determination=0.999999840114692780,
determination=0.999999875576865980,
The distribution function estimated line ------
F(X)= 0.72498905470022679000+
0.09069476539588389200*(X-3.84448233551434180000)^1+
0.00043206523340668344*(X-3.84448233551434180000)^2+
-0.00038638921889067035*(X-3.84448233551434180000)^3+
value range 0.7000000100<=F(x)<= 0.7500000000 ,
value range 3.5684929476<=X<= 4.1200041420 ,
Error=0.000000025808463601 MAX=0.000012677298795838 coefficient of
determination=0.999999924307723890,
Left diagram, the comparison of the
estimated line and sample data.
number of negative residuals=50005644
number of positive residuals=49994356
H0: residual is random , H1: Increasing line or decreasing line
Z=0.234327, p-value=0.592700
H0: residual is random , H1: Oscillation
Z=0.234327, p-value=0.407300
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=0.234327, p-value=0.814600
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t), t=2,3,...,100000000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0, D.W. test=1.999884
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0, D.W. test=2.000116
The D.W. critical value table depends on the values of the independent variable,
so obtaining the p value takes a lot of time.
The Durbin Watson table currently in use is wrong.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
2. The population sigma of error confidence interval
90% confidence interval for population variance [0.988616 , 0.989076]
90% confidence interval for population standard deviation [0.994291 , 0.994523]
95% confidence interval for population variance [0.988571 , 0.989120]
95% confidence interval for population standard deviation [0.994269 , 0.994545]
99% confidence interval for population variance [0.988485 , 0.989206]
99% confidence interval for population standard deviation [0.994226 , 0.994588]
[Figure, left: the joint probability distribution of X1 and the residual]
[Figure, right: the joint probability distribution of the X2 estimated line and X2]
SLLN analysis, X0=residual and Normal(0, 0.98885),
Note: X1 ~ Normal(0, 0.98885), where X1 is the code that represents Normal(0, 0.98885),
E(| X0 distribution F() - X1 distribution F()|^2)= 0.0000000016
Pr(| X0 distribution F() - X1 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0500000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0100000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0050000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0010000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0005000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0001000000)= 0.000000
Note:
Case 1, $X_1 \sim \text{Normal}(\mu_{X_1} = 2, \sigma_{X_1}^2 = 5^2)$,
the population conditional expectation line is $E(X_2 \mid x_1) = \beta_0 + \beta_1 x_1 = 1 + 2 x_1$,
$\varepsilon \sim \text{Normal}(0, \sigma^2 = 1)$,
$f_{X_1}(x_1)$ Coefficient
Mathematical Mean: 2.00055
Geometrical Mean : none
Harmonic Mean : none
Variance : 25.00207
S.D. : 5.00021
Skewed Coef. : 0.00005
Kurtosis Coef. : 3.00083
MAD : 3.98942
Range : 55.92611
Mid_range : 2.64690
Median : 2.00030
Q1 : -1.37190
Q2 : 2.00030
Q3 : 5.37332
IQR : 6.74521
C.V. : 2.49941
$f_{X_2}(x_2)$ Coefficient
Mathematical Mean: 4.99941
Geometrical Mean : none
Harmonic Mean : none
Variance : 101.01026
S.D. : 10.05039
Skewed Coef. : -0.00001
Kurtosis Coef. : 3.00072
MAD : 8.01866
Range : 112.96368
Mid_range : 5.47472
Median : 5.00042
Q1 : -1.77931
Q2 : 5.00042
Q3 : 11.77619
IQR : 13.55550
C.V. : 2.01031
[Figures: $f_{X_1, X_2}(x_1, x_2)$ and $f_{X_2, X_1}(x_2, x_1)$]
Case 2,
$X_1 \sim \text{Normal}(\mu_{X_1} = 2, \sigma_{X_1}^2 = 5^2)$, the population conditional expectation line is
$E(X_2 \mid x_1) = \beta_0 + \beta_1 x_1 = 1 + 2 x_1$, $\varepsilon \sim \text{Normal}(0, \sigma^2 = 1)$, $-20 \le X_1 X_2 \le 20$,
$P(-20 \le X_1 X_2 \le 20) = 0.4349$,
$f_{X_1}(x_1 \mid -20 \le X_1 X_2 \le 20)$ Coefficient
Mathematical Mean: 0.03961
Geometrical Mean : none
Harmonic Mean : none
Variance : 3.19544
S.D. : 1.78758
Skewed Coef. : -0.18257
Kurtosis Coef. : 1.94858
MAD : 1.52948
Range : 9.09938
Mid_range : -0.26951
Median : 0.15974
Q1 : -1.40847
Q2 : 0.15974
Q3 : 1.56271
IQR : 2.97118
C.V. : 45.13446
$f_{X_2}(x_2 \mid -20 \le X_1 X_2 \le 20)$ Coefficient
Mathematical Mean: 1.06531
Geometrical Mean : none
Harmonic Mean : none
Variance : 12.92068
S.D. : 3.59453
Skewed Coef. : -0.17302
Kurtosis Coef. : 1.94628
MAD : 3.07533
Range : 18.15173
Mid_range : 0.91494
Median : 1.30158
Q1 : -1.85081
Q2 : 1.30158
Q3 : 4.12148
IQR : 5.97229
C.V. : 3.37416
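The region probability and the conditional coefficients of Case 2 can be checked by simulation. The Python sketch below draws pairs from the population of Example 29, keeps those inside the region, and summarizes them. The function names, the seed and the 200,000 draws are arbitrary illustrative choices, the standard library generator stands in for the authors' simulator, and population (divide by n) variances are assumed.

```python
import random


def simulate_region(n, in_region, seed=0):
    """Simulate Example 29 and keep the pairs (x1, x2) that fall in the region."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n):
        x1 = rng.gauss(2.0, 5.0)                      # X1 ~ Normal(2, 5^2)
        x2 = 1.0 + 2.0 * x1 + rng.gauss(0.0, 1.0)     # X2 = 1 + 2*X1 + error
        if in_region(x1, x2):
            kept.append((x1, x2))
    return kept


def mean_var(values):
    """Mean and population variance of a list of numbers."""
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / len(values)


n_draws = 200_000
kept = simulate_region(n_draws, lambda x1, x2: -20.0 <= x1 * x2 <= 20.0, seed=29)
prob = len(kept) / n_draws
x1_mean, x1_var = mean_var([p[0] for p in kept])
x2_mean, x2_var = mean_var([p[1] for p in kept])
print(prob, x1_mean, x1_var, x2_mean, x2_var)
```

Any other region can be checked the same way by passing a different predicate, for example `lambda x1, x2: 50.0 <= x1 * x1 + x2 * x2 <= 200.0` for a ring-shaped region.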
[Figures: $f_{X_1, X_2}(x_1, x_2 \mid -20 \le X_1 X_2 \le 20)$ and $f_{X_2, X_1}(x_2, x_1 \mid -20 \le X_1 X_2 \le 20)$]
Case 3,
$X_1 \sim \text{Normal}(\mu_{X_1} = 2, \sigma_{X_1}^2 = 5^2)$, the population conditional expectation line is
$E(X_2 \mid x_1) = \beta_0 + \beta_1 x_1 = 1 + 2 x_1$, $\varepsilon \sim \text{Normal}(0, \sigma^2 = 1)$, $50 \le X_1^2 + X_2^2 \le 200$,
$P(50 \le X_1^2 + X_2^2 \le 200) = 0.3164$,
$f_{X_1}(x_1 \mid 50 \le X_1^2 + X_2^2 \le 200)$ Coefficient
Mathematical Mean: 1.56229
Geometrical Mean : none
Harmonic Mean : none
Variance : 18.17945
S.D. : 4.26374
Skewed Coef. : -0.82863
Kurtosis Coef. : 1.93590
MAD : 3.78165
Range : 16.35907
Mid_range : -0.37002
Median : 3.58329
Q1 : -3.81694
Q2 : 3.58329
Q3 : 4.67325
IQR : 8.49018
C.V. : 2.72916
$f_{X_2}(x_2 \mid 50 \le X_1^2 + X_2^2 \le 200)$ Coefficient
Mathematical Mean: 4.11852
Geometrical Mean : none
Harmonic Mean : none
Variance : 73.26312
S.D. : 8.55939
Skewed Coef. : -0.83996
Kurtosis Coef. : 1.91924
MAD : 7.62476
Range : 26.79517
Mid_range : 0.11570
Median : 8.18657
Q1 : -6.79991
Q2 : 8.18657
Q3 : 10.37197
IQR : 17.17188
C.V. : 2.07827
[Figures: $f_{X_1, X_2}(x_1, x_2 \mid 50 \le X_1^2 + X_2^2 \le 200)$ and $f_{X_2, X_1}(x_2, x_1 \mid 50 \le X_1^2 + X_2^2 \le 200)$]
[Figures: $E(X_2 \mid x_1)$, $50 \le X_1^2 + X_2^2 \le 200$ and $E(X_1 \mid x_2)$, $50 \le X_1^2 + X_2^2 \le 200$]
$f_{W_1}(w_1 \mid 50 \le X_1^2 + X_2^2 \le 200)$, $W_1 = \varepsilon$, Coefficient
Mathematical Mean: 0.00026
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.99984
S.D. : 0.99992
Skewed Coef. : -0.00002
Kurtosis Coef. : 3.00011
MAD : 0.79780
Range : 11.19210
Mid_range : 0.04738
Median : 0.00024
Q1 : -0.67411
Q2 : 0.00024
Q3 : 0.67464
IQR : 1.34875
C.V. : none
5.8. The 3rd basic assumption is modified: the error follows the Durbin
Watson first order autoregressive error model.
[ 6 ] 20.38560~ 27.81797 24.10179 2.00000 0.0200000 1.0000000
frequency distribution: sample mean=6.784375 , sample variance=58.615228 , sample sd=7.656058
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=0.859603
Z=6.137602, p-value=0.000000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
Z=6.137602, p-value=1.000000
H0: auto correlation coefficient=0 , H1:against H0
Z=6.137602, p-value=0.000000
(The central limit theorem can be applied to the Durbin-Watson test statistic.)
H0:Variances are equal
The test statistic=Max(each residual*residual)/SSE
p value=0.197109
2. The population sigma of error confidence interval
90% confidence interval for population variance
[1.185621 , 1.900812]
90% confidence interval for population standard deviation
[1.088862 , 1.378699]
95% confidence interval for population variance
[1.137383 , 1.996875]
95% confidence interval for population standard deviation
[1.066482 , 1.413108]
99% confidence interval for population variance
[1.050533 , 2.203873]
99% confidence interval for population standard deviation
[1.024955 , 1.484545]
[Figure: estimated line and residual plot]
(30.1.4) Durbin-Watson analysis
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=0.859603
Z=6.137602, p-value=0.000000,
H0: the autocorrelation coefficient is 0, is rejected (see 30.1.2).
The Durbin Watson model analysis
[ The Durbin Watson information ]
The Durbin Watson Model
Y(t)=b0+b1*X(1,t) +error(t),t=1,..,100,
error(t+1)=rho*error(t)+mu(t+1),t=1,...,99,
mu(1),...,mu(100) are iid, error(1)=mu(1),
E(mu(t))=0,Var(mu(t))=1.000000,t=1,..,100,
The probability distribution of mu(t) are Normal distribution(the probability distribution),
t=1,...,100,
--- The sample size=100,lag=1,sigma=1.000000(variance is known),
--- independent variable number=1,
[ Durbin Watson test statistic ]
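$\hat{X}_{2t} = 0.9796787996 + 2.0299052130 \times X_{1t} + \hat{\varepsilon}_t$, $t = 1, 2, \ldots, 100$,
$\hat{\varepsilon}_t = 0.595 \times \hat{\varepsilon}_{t-1} + \hat{\mu}_t$, $t = 1, 2, \ldots, 100$, $\hat{\varepsilon}_0 = 0$,
$\mu$ (sample mean) = 0, $\mu$ (sample variance) = 1.0188720780,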
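The model above can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not the software that produced the output in this section: the coefficients b0 = 1 and b1 = 2, the regressor distribution Normal(2, 5^2), rho = 0.5, the sample size, and all function names are choices made here for illustration. It generates errors with error(t+1) = rho*error(t) + mu(t+1), fits the line by least squares, and computes the Durbin-Watson statistic and a lag-1 estimate of rho from the residuals.

```python
import random

def simulate_ar1_regression(n, b0, b1, rho, seed=0):
    """Simulate Y(t) = b0 + b1*X(t) + error(t), error(t+1) = rho*error(t) + mu(t+1)."""
    rng = random.Random(seed)
    x, y = [], []
    err = 0.0                                   # error(0) = 0, so error(1) = mu(1)
    for _ in range(n):
        err = rho * err + rng.gauss(0.0, 1.0)   # mu(t) ~ Normal(0, 1)
        xt = rng.gauss(2.0, 5.0)                # assumed regressor distribution
        x.append(xt)
        y.append(b0 + b1 * xt + err)
    return x, y

def ols_residuals(x, y):
    """Least-squares fit with one regressor; returns (b0_hat, b1_hat, residuals)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1, [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

def durbin_watson(e):
    """DW = sum_{t>=2} (e_t - e_{t-1})^2 / sum_t e_t^2."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    return num / sum(v * v for v in e)

def lag1_rho(e):
    """Least-squares slope of e_t on e_{t-1} without intercept."""
    num = sum(e[t] * e[t - 1] for t in range(1, len(e)))
    den = sum(e[t - 1] ** 2 for t in range(1, len(e)))
    return num / den

x, y = simulate_ar1_regression(n=20000, b0=1.0, b1=2.0, rho=0.5, seed=1)
b0_hat, b1_hat, resid = ols_residuals(x, y)
dw = durbin_watson(resid)
rho_hat = lag1_rho(resid)
print(b0_hat, b1_hat, dw, rho_hat)
```

With positively autocorrelated errors the statistic falls well below 2, and in large samples it is close to 2*(1 - rho_hat).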
(30.2) n = 100,000,000: this is big data, and the Durbin-Watson first-order autoregressive error model is applied.
(30.2.1) Basic analysis,
(30.2.1.1) The joint probability distribution of X1 and X2 when the autocorrelation coefficient is 0.
[Figure: joint probability distributions f(x1,x2) and f(x2,x1)]
f(x2),F(x2) Coefficient
Mathematical Mean: 4.99975
Geometrical Mean : none
Harmonic Mean : none
Variance : 101.33978
S.D. : 10.06677
Skewed Coef. : -0.00021
Kurtosis Coef. : 3.00024
MAD : 8.03195
Range : 113.33610
Mid_range : 7.23542
Median : 4.99982
Q1 : -1.78978
Q2 : 4.99982
Q3 : 11.79044
IQR : 13.58022
C.V. : 2.01346
degree of freedom=18
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =122.247413
p-value=0.000000
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=50001979
number of the positive ofresidual=49998021
H0: residualis random , H1: Increasing line or decreasing line
Z=-3332.395606, p-value=0.000000
H0: residual is random , H1: Oscillation
Z=-3332.395606, p-value=1.000000
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-3332.395606, p-value=0.000000
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,100000000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=1.000222
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=2.999778
The D.W. table dependents on the independent value,
getting the p value must be spending a lot of time.
The Durbin Watson table is wrong in present.
[ Please run the Durbin Watson critical value table software
to check the test value is rejected H0 or failed to reject H0.]
2. The population sigma of error confidence interval
90% confidence interval for population variance [1.333014 , 1.333634]
90% confidence interval for population standard deviation [1.154562 , 1.154831]
95% confidence interval for population variance [1.332954 , 1.333694]
95% confidence interval for population standard deviation [1.154536 , 1.154856]
99% confidence interval for population variance [1.332838 , 1.333810]
99% confidence interval for population standard deviation [1.154486 , 1.154907]
[testing the three basic assumptions]
[Figures: the joint probability distribution of X1 and the residual; the joint probability distribution of the X2 estimated line and X2]
(30.2.3) Residual analysis
X0=residual, residual marginal probability distribution
Mathematical Mean: 0.00000
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.33332
S.D. : 1.15470
Skewed Coef. : 0.00006
Kurtosis Coef. : 3.00014
MAD : 0.92130
Range : 13.15760
Mid_range : 0.07225
Median : -0.00006
Q1 : -0.77878
Q2 : -0.00006
Q3 : 0.77888
IQR : 1.55766
C.V. : none
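The reported Durbin-Watson statistic and residual variance can be cross-checked against each other. The sketch below uses two standard large-sample facts for a first-order autoregressive error: DW is approximately 2(1 - rho), and the stationary variance is Var(mu)/(1 - rho^2). The value of rho is not stated in this part of the output, so the value computed here is an inference from the reported statistic, not a figure from the source; Var(mu) = 1 is taken from the model specification.

```python
# Values copied from the n = 100,000,000 output above.
dw_reported = 1.000222          # D.W. test for H1: autocorrelation coefficient > 0
resid_var_reported = 1.33332    # variance of the residual
mu_var = 1.0                    # Var(mu(t)) from the model specification

# Large-sample relation DW ~ 2 * (1 - rho) gives an implied rho (an inference).
rho_implied = 1.0 - dw_reported / 2.0

# Stationary variance of a first-order autoregressive error.
var_implied = mu_var / (1.0 - rho_implied ** 2)

# The statistic reported for H1: autocorrelation coefficient < 0 is 4 - DW.
dw_negative_side = 4.0 - dw_reported

print(rho_implied, var_implied, dw_negative_side)
```

The implied rho is close to 0.5, and the implied variance agrees with the reported residual variance to about three decimal places.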
(30.2.4) Autocorrelation coefficient analysis; the residual is from the estimated line in (30.2.2).
(30.2.4.1) The joint probability distribution of t and error(t).
X1 = t = 1, 2, 3, ..., T, X2 = error(t), T = 100,000,000.
[Figure: joint probability distributions f(x1,x2) and f(x2,x1)]
[Figure: E(X2|x1) is linear in x1; E(X1|x2) is linear in x2]
pearson chi-square test statistic =87.533467
p-value=0.000000
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=49998734
number of the positive ofresidual=50001265
H0: residualis random , H1: Increasing line or decreasing line
Z=-1.389294, p-value=0.082400
H0: residual is random , H1: Oscillation
Z=-1.389294, p-value=0.917600
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-1.389294, p-value=0.164800
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,99999999
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=1.999839
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=2.000161
The D.W. table dependents on the independent value,
getting the p value must be spending a lot of time.
The Durbin Watson table is wrong in present.
[ Please run the Durbin Watson critical value table software
to check the test value is rejected H0 or failed to reject H0.]
2. The population sigma of error confidence interval
90% confidence interval for population variance
[0.999908 , 1.000373]
90% confidence interval for population standard deviation
[0.999954 , 1.000187]
95% confidence interval for population variance
[0.999864 , 1.000418]
95% confidence interval for population standard deviation
[0.999932 , 1.000209]
99% confidence interval for population variance
[0.999776 , 1.000505]
99% confidence interval for population standard deviation
[0.999888 , 1.000253]
[Figures: the joint probability distribution of X1=residual(t) and mu(t); the joint probability distribution of the X2=residual(t) estimated line and X2=residual(t)]
(30.2.4.3) Analysis of mu(t)
mu(t) = residual of the Durbin-Watson model, lag=1; marginal probability distribution,
Mathematical Mean: 0.00000
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.00014
S.D. : 1.00007
Skewed Coef. : -0.00034
Kurtosis Coef. : 3.00046
MAD : 0.79793
Range : 11.16799
Mid_range : -0.13834
Median : 0.00003
Q1 : -0.67445
Q2 : 0.00003
Q3 : 0.67458
IQR : 1.34903
C.V. : none
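The run test of the residual reported in this section can be sketched as follows. This is the standard Wald-Wolfowitz runs test on the signs of the residuals with a normal approximation; it is assumed, not confirmed by the source, that the software uses the same construction, and the function names are hypothetical. The three p-values correspond to the three alternatives in the output: too few runs (increasing or decreasing line), too many runs (oscillation), and either.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def runs_test(residuals):
    """Return (z, p_trend, p_oscillation, p_two_sided) for the sign runs test."""
    signs = [r > 0 for r in residuals if r != 0]      # zeros are dropped
    n_pos = sum(signs)
    n_neg = len(signs) - n_pos
    n = n_pos + n_neg
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    mean = 2.0 * n_pos * n_neg / n + 1.0
    var = 2.0 * n_pos * n_neg * (2.0 * n_pos * n_neg - n) / (n * n * (n - 1.0))
    z = (runs - mean) / math.sqrt(var)
    p_trend = normal_cdf(z)            # few runs: increasing or decreasing line
    p_oscillation = 1.0 - p_trend      # many runs: oscillation
    p_two_sided = 2.0 * min(p_trend, p_oscillation)
    return z, p_trend, p_oscillation, p_two_sided

# A strictly alternating series has as many runs as observations.
z, p_trend, p_osc, p_two = runs_test([1.0, -1.0] * 50)
print(z, p_trend, p_osc, p_two)

# The lower-tail probability for the reported Z = -1.389294.
print(normal_cdf(-1.389294))
```

For the reported Z = -1.389294 the lower-tail probability rounds to 0.0824, which matches the p-value printed for the "increasing line or decreasing line" alternative.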
Chapter 6. The general linear model and non-linear model
(1.1)Sample
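$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$, $i = 1, 2, \ldots, n$,
$\beta_0$ is the intercept, $\beta_1, \beta_2, \ldots, \beta_k$ are the slopes,
$X_{1i}, X_{2i}, \ldots, X_{ki}$ are the independent variables,
$Y_i$ is the dependent variable,
$\varepsilon_i$ is the error, for which there are three basic assumptions:
(a) $\varepsilon_i \sim N(0, \sigma_i^2)$, (b) $\sigma_1^2 = \cdots = \sigma_n^2$, (c) $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0$, $i \ne j$.
$f_{X_k \mid y}(x_k \mid y) = \int \cdots \int f_{x_1, \ldots, x_k \mid y}(x_1, \ldots, x_k \mid y)\, dx_1 \cdots dx_{k-1}$,
The analysis uses the marginal probability distribution, the conditional probability distribution and the joint probability distribution.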
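The quantities that appear in the ANOVA tables of this chapter (SS, MS, F, R2, adjusted R2) all follow from the least-squares fit of the model above. The following is a minimal sketch in plain Python, assuming the usual normal equations X'X b = X'y; the helper names `solve` and `fit_linear_model` and the small data set are hypothetical and only serve to illustrate the arithmetic.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]    # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_linear_model(xs, y):
    """xs: rows [x1, ..., xk]; y: responses. Returns coefficients and ANOVA quantities."""
    n, k = len(y), len(xs[0])
    design = [[1.0] + list(row) for row in xs]         # leading 1 for the intercept
    p = k + 1
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    beta = solve(xtx, xty)                             # normal equations
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in design]
    ybar = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - ybar) ** 2 for yi in y)
    ssr = sst - sse
    mse = sse / (n - k - 1)
    return {"beta": beta, "SSR": ssr, "SSE": sse, "SST": sst, "MSE": mse,
            "F": (ssr / k) / mse, "R2": ssr / sst,
            "R2_adj": 1.0 - mse / (sst / (n - 1))}

# Illustrative data: y = 1 + 2*x1 + 3*x2 plus a small disturbance.
xs = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 8.0], [6.0, 4.0]]
noise = [0.1, -0.1, 0.05, -0.05, 0.02, -0.02]
y = [1.0 + 2.0 * a + 3.0 * b + e for (a, b), e in zip(xs, noise)]
fit = fit_linear_model(xs, y)
print(fit["beta"], fit["R2"], fit["R2_adj"], fit["F"])
```

The degrees of freedom match the tables: k for regression, n - k - 1 for error, and n - 1 for the total.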
6.2. High collinearity; the other assumptions are unchanged.
Example 31,
A multivariate normal distribution with 5 random variables;
the population mean vector and covariance matrix are
$$\boldsymbol{\mu} = \begin{pmatrix} E(X_1) \\ E(X_2) \\ E(X_3) \\ E(X_4) \\ E(X_5) \end{pmatrix} = \begin{pmatrix} 100 \\ 0 \\ -100 \\ -120 \\ 180 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} 1 & 0.99 & 0.99 & 0.99 & 0.99 \\ 0.99 & 1 & 0.99 & 0.99 & 0.99 \\ 0.99 & 0.99 & 1 & 0.99 & 0.99 \\ 0.99 & 0.99 & 0.99 & 1 & 0.99 \\ 0.99 & 0.99 & 0.99 & 0.99 & 1 \end{pmatrix},$$
$X_i \sim \mathrm{Normal}(E(X_i), \mathrm{Var}(X_i))$, $\mathrm{Var}(X_i) = 1$, $i = 1, 2, \ldots, 5$,
$\mathrm{Cov}(X_i, X_j) = \rho(X_i, X_j) = 0.99$, $i, j = 1, 2, \ldots, 5$, $i \ne j$,
(31.1) Paired samples, n=1000,
(31.1.1) $X_1, X_2, X_3, X_4$ are the independent variables, $X_5$ is the dependent variable.
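Because the population is fully specified, the population regression line can be written down before looking at any sample. The sketch below assumes unit variances and a common correlation rho, as in Example 31; by symmetry every slope then equals rho / (1 + (k - 1) rho), where k is the number of independent variables. The function name `population_line` is hypothetical.

```python
means = {"X1": 100.0, "X2": 0.0, "X3": -100.0, "X4": -120.0, "X5": 180.0}
rho = 0.99   # common correlation; all variances are 1

def population_line(dependent, means, rho):
    """Population intercept, common slope, R^2 and error variance under equicorrelation."""
    predictors = [name for name in means if name != dependent]
    k = len(predictors)
    slope = rho / (1.0 + (k - 1) * rho)           # same for every predictor by symmetry
    intercept = means[dependent] - slope * sum(means[p] for p in predictors)
    r2 = k * rho * slope                          # explained share of Var(dependent) = 1
    error_var = 1.0 - r2
    return intercept, slope, r2, error_var

intercept, slope, r2, error_var = population_line("X5", means, rho)
print(intercept, slope, r2, error_var)
```

Each population slope is about 0.2494, the intercept about 209.92, and the error variance about 0.0125. With n = 1000 and highly collinear predictors, individual sample slopes can deviate from these values noticeably even though the fit as a whole is very good.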
The linear model analysis
Dependent variable is X5,
Independent variables are X1,X2,X3,X4
The correlation matrix is below
r(X5,X1)=0.990839,r(X5,X2)=0.990473,r(X5,X3)=0.990308,r(X5,X4)=0.991157,
r(X1,X2)=0.990072,r(X1,X3)=0.990595,r(X1,X4)=0.990603,r(X2,X3)=0.990136,
r(X2,X4)=0.990641,r(X3,X4)=0.990697,
The estimated line is X5=207.931419+0.268172*X1+0.240660*X2+0.207652*X3+0.283226*X4
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 4 1002.8234887438 250.7058721860 21539.0189884664
error 995 11.5814161712 0.0116396142
total 999 1014.4049049150
----------------------------------------------------------------------------------
The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 207.9314185679 6.1480819864 33.82053 0.00000
X1 0.2681718005 0.0300243283 8.93182 0.00000
X2 0.2406601650 0.0297202607 8.09751 0.00000
X3 0.2076518602 0.0302769893 6.85841 0.00000
X4 0.2832257049 0.0305070251 9.28395 0.00000
----------------------------------------------------------------------------------
MSE=0.0116396142 , R2=0.988583 , R2(adj)=0.988537
dependent variable:X5 , sample mean=180.0015808554 , sample variance=1.015420
independent variable:X1 , sample mean=100.0017783040 , sample variance=1.010565
independent variable:X2 , sample mean=0.0084181394 , sample variance=1.002530
independent variable:X3 , sample mean=-99.9910537157 , sample variance=1.004678
independent variable:X4 , sample mean=-119.9968491273 , sample variance=1.025461
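One way to quantify the collinearity in this output is the variance inflation factor, VIF = 1 / (1 - R_j^2), where R_j^2 comes from regressing one independent variable on the others. The source does not report VIFs, so the following is an added illustration: it uses the population value for k equicorrelated predictors with unit variance, combined with the sample MSE and sample variance of X1 printed above, to approximate the standard error of a slope.

```python
import math

rho = 0.99      # population correlation between any two independent variables
k = 4           # number of independent variables
n = 1000        # sample size

# Diagonal element of the inverse of the k x k equicorrelation matrix.
vif = (1.0 + (k - 2) * rho) / ((1.0 - rho) * (1.0 + (k - 1) * rho))

# Values copied from the output above.
mse = 0.0116396142
sample_var_x1 = 1.010565

# Var(b_j) = MSE * VIF_j / ((n - 1) * sample variance of X_j).
se_b1_approx = math.sqrt(mse * vif / ((n - 1) * sample_var_x1))

print(vif, se_b1_approx)
```

The population VIF is about 75, and the approximate standard error is close to the reported 0.0300 for X1. The small gap is expected because the sample correlations differ slightly from 0.99.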
Cov(b2,b3)= -0.0002781319, Cov(b2,b4)= -0.0003224420,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b3,b0)= 0.0863582216, Cov(b3,b1)= -0.0003205022, Cov(b3,b2)= -0.0002781319, Var(b3)=
0.0009166961, Cov(b3,b4)= -0.0003113108,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b4,b0)= 0.1108784652, Cov(b4,b1)= -0.0003032501, Cov(b4,b2)= -0.0003224420, Cov(b4,b3)=
-0.0003113108, Var(b4)= 0.0009306786,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ indivudal test ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
variable coefficient staradarad error t value F value
intercept 207.9314185679 6.1480819864 33.8205 1143.8285
X1 slope 0.2681718005 0.0300243283 8.9318 79.7774
X2 slope 0.2406601650 0.0297202607 8.0975 65.5697
X3 slope 0.2076518602 0.0302769893 6.8584 47.0377
X4 slope 0.2832257049 0.0305070251 9.2840 86.1917
====================
2. The population sigma of error confidence interval
90% confidence interval for population variance [0.010840 , 0.012566]
90% confidence interval for population standard deviation [0.104116 , 0.112100]
95% confidence interval for population variance [0.010699 , 0.012761]
95% confidence interval for population standard deviation [0.103438 , 0.112964]
99% confidence interval for population variance [0.010434 , 0.013160]
99% confidence interval for population standard deviation [0.102149 , 0.114715]
There are 4 independent variables.
The independent variables are: X4, X1, X2, X3,
X4
The estimated line is X5=298.353611+0.986293*X4
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 1 996.5434228552 996.5434228552 55681.2885224165
error 998 17.8614820598 0.0178972766
total 999 1014.4049049150
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test
----------------------------------------------------------------------------------
intercept 298.3536114944 0.5015757414 594.83262
X4 0.9862928194 0.0041797589 235.96883
----------------------------------------------------------------------------------
MSE=0.0178972766 , R2=0.982392 , R2(adj)=0.982375
dependent variable:X5 , sample mean=180.0015808554 , sample variance=1.015420
independent variable:X4 , sample mean=-119.9968491273 , sample variance=1.025461
-------- Regression CoefficientVariance and Covariance Matrix ---------------
Var(b0)= 0.2515782244, Cov(b0,b1)= 0.0020963911,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b1,b0)= 0.0020963911, Var(b1)= 0.0000174704,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-------- partial coefficient of determination and test ---------------
r(X5,X4) square= 0.9823921572, test value= 55681.2885224169
X4,X1
The estimated line is X5=193.253972+0.512206*X4+0.482098*X1
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 2 1000.9325215354 500.4662607677 37036.1240416042
X4 1 996.5434228552
X1 1 4.3890986801
error 997 13.4723833797 0.0135129221
total 999 1014.4049049150
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test
----------------------------------------------------------------------------------
intercept 193.2539716661 5.8478697371 33.04690
X4 0.5122059123 0.0265549382 19.28854
X1 0.4820984749 0.0267499348 18.02242
----------------------------------------------------------------------------------
MSE=0.0135129221 , R2=0.986719 , R2(adj)=0.986692
dependent variable:X5 , sample mean= 180.0015808554 , sample variance=1.015420
independent variable:X4 , sample mean= -119.9968491273 , sample variance=1.025461
independent variable:X1 , sample mean=100.0017783040 , sample variance=1.010565
-------- Regression CoefficientVariance and Covariance Matrix ---------------
Var(b0)= 34.1975804621, Cov(b0,b1)= 0.1549855754, Cov(b0,b2)=
-0.1559950883,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b1,b0)= 0.1549855754, Var(b1)= 0.0007051647, Cov(b1,b2)=
-0.0007036678,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b2,b0)= -0.1559950883, Cov(b2,b1)= -0.0007036678, Var(b2)=
0.0007155590,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-------- partial coefficient of determination and test ---------------
r(X5,X4) square= 0.9823921572, test value= 55681.2885224169
r(X5,X1|X4) square= 0.2457298149, test value= 324.8075162925
X4,X1,X2
The estimated line is X5=188.369379+0.353744*X4+0.340773*X1+0.303663*X2
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 3 1002.2759878036 334.0919959345 27434.8999909182
X4 1 996.5434228552
X1 1 4.3890986801
X2 1 1.3434662682
error 996 12.1289171115 0.0121776276
total 999 1014.4049049150
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test
----------------------------------------------------------------------------------
intercept 188.3693785632 5.5708685591 33.81329
X4 0.3537444690 0.0293783731 12.04098
X1 0.3407726171 0.0287383399 11.85777
X2 0.3036631765 0.0289107990 10.50345
----------------------------------------------------------------------------------
MSE=0.0121776276 , R2=0.988043 , R2(adj)=0.988007
dependent variable:X5 , sample mean=180.0015808554 , sample variance=1.015420
independent variable:X4 , sample mean=-119.9968491273 , sample variance=1.025461
independent variable:X1 , sample mean= 100.0017783040 , sample variance=1.010565
independent variable:X2 , sample mean=0.0084181394 , sample variance=1.002530
X4,X1,X2,X3
The estimated line is X5=207.931419+0.283226*X4+0.268172*X1+0.240660*X2+0.207652*X3
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 4 1002.8234887438 250.7058721860 21539.0189884775
X4 1 996.5434228552
X1 1 4.3890986801
X2 1 1.3434662682
X3 1 0.5475009402
error 995 11.5814161712 0.0116396142
total 999 1014.4049049150
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test
----------------------------------------------------------------------------------
intercept 207.9314185232 6.1480819864 33.82053
X4 0.2832257042 0.0305070251 9.28395
X1 0.2681717999 0.0300243283 8.93182
X2 0.2406601653 0.0297202607 8.09751
X3 0.2076518605 0.0302769893 6.85841
----------------------------------------------------------------------------------
MSE=0.0116396142 , R2=0.988583 , R2(adj)=0.988537
dependent variable:X5 , sample mean=180.0015808554 , sample variance=1.015420
independent variable:X4 , sample mean=-119.9968491273 , sample variance=1.025461
independent variable:X1 , sample mean=100.0017783040 , sample variance=1.010565
independent variable:X2 , sample mean= 0.0084181394 , sample variance=1.002530
independent variable:X3 , sample mean=-99.9910537157 , sample variance=1.004678
-------- Regression CoefficientVariance and Covariance Matrix ---------------
Var(b0)= 37.7989121120, Cov(b0,b1)= 0.1108784652, Cov(b0,b2)=
-0.1585817368, Cov(b0,b3)= -0.0390525466, Cov(b0,b4)= 0.0863582216,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b1,b0)= 0.1108784652, Var(b1)= 0.0009306786, Cov(b1,b2)=
-0.0003032501, Cov(b1,b3)= -0.0003224420, Cov(b1,b4)= -0.0003113108,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b2,b0)= -0.1585817368, Cov(b2,b1)= -0.0003032501, Var(b2)=
0.0009014603, Cov(b2,b3)= -0.0002745713, Cov(b2,b4)= -0.0003205022,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b3,b0)= -0.0390525466, Cov(b3,b1)= -0.0003224420, Cov(b3,b2)=
-0.0002745713, Var(b3)= 0.0008832939, Cov(b3,b4)= -0.0002781319,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b4,b0)= 0.0863582216, Cov(b4,b1)= -0.0003113108, Cov(b4,b2)=
-0.0003205022, Cov(b4,b3)= -0.0002781319, Var(b4)= 0.0009166961,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-------- partial coefficient of determination and test ---------------
r(X5,X4) square= 0.9823921572, test value= 55681.2885224169
r(X5,X1|X4) square= 0.2457298149, test value= 324.8075162925
r(X5,X2|X4,X1) square= 0.0997200147, test value= 110.3224954748
r(X5,X3|X4,X1,X2) square= 0.0451401337, test value= 47.0377221138
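The partial coefficients of determination and their test values can be reproduced from the ANOVA tables alone. In the sketch below, which only restates arithmetic on figures printed above, the partial coefficient is the sum of squares added by the new variable divided by the error sum of squares before it entered, and the test value is the added sum of squares divided by the MSE after it entered.

```python
sst = 1014.4049049150   # total sum of squares

steps = [
    # (label, SS added by the new variable, SSE before adding it, MSE after adding it)
    ("r(X5,X4)",          996.5434228552, sst,           0.0178972766),
    ("r(X5,X1|X4)",         4.3890986801, 17.8614820598, 0.0135129221),
    ("r(X5,X2|X4,X1)",      1.3434662682, 13.4723833797, 0.0121776276),
    ("r(X5,X3|X4,X1,X2)",   0.5475009402, 12.1289171115, 0.0116396142),
]

results = {}
for label, ss_added, sse_before, mse_after in steps:
    partial_r2 = ss_added / sse_before     # share of the remaining variation explained
    test_value = ss_added / mse_after      # F statistic with 1 numerator degree of freedom
    results[label] = (partial_r2, test_value)
    print(f"{label} square={partial_r2:.10f}, test value={test_value:.4f}")
```

The last test value, about 47.04, equals the square of the t statistic for X3 in the full model (6.85841 squared), as expected for a single added variable.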
X1 1 4.3890986801
X2 1 1.3434662682
X3 1 0.5475009402
error 995 11.5814161712 0.0116396142
total 999 1014.4049049150
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test
----------------------------------------------------------------------------------
intercept 207.9314185232 6.1480819864 33.82053
X4 0.2832257042 0.0305070251 9.28395
X1 0.2681717999 0.0300243283 8.93182
X2 0.2406601653 0.0297202607 8.09751
X3 0.2076518605 0.0302769893 6.85841
----------------------------------------------------------------------------------
MSE= 0.0116396142 , R2=0.988583 , R2(adj)=0.988537
dependent variable:X5 , sample mean= 180.0015808554 , sample variance=1.015420
independent variable:X4 , sample mean= -119.9968491273 , sample variance=1.025461
independent variable:X1 , sample mean= 100.0017783040 , sample variance=1.010565
independent variable:X2 , sample mean= 0.0084181394 , sample variance=1.002530
independent variable:X3 , sample mean= -99.9910537157 , sample variance=1.004678
-------- Regression CoefficientVariance and Covariance Matrix ---------------
Var(b0)= 37.7989121120, Cov(b0,b1)= 0.1108784652, Cov(b0,b2)=
-0.1585817368, Cov(b0,b3)= -0.0390525466, Cov(b0,b4)= 0.0863582216,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b1,b0)= 0.1108784652, Var(b1)= 0.0009306786, Cov(b1,b2)=
-0.0003032501, Cov(b1,b3)= -0.0003224420, Cov(b1,b4)= -0.0003113108,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b2,b0)= -0.1585817368, Cov(b2,b1)= -0.0003032501, Var(b2)=
0.0009014603, Cov(b2,b3)= -0.0002745713, Cov(b2,b4)= -0.0003205022,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b3,b0)= -0.0390525466, Cov(b3,b1)= -0.0003224420, Cov(b3,b2)=
-0.0002745713, Var(b3)= 0.0008832939, Cov(b3,b4)= -0.0002781319,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b4,b0)= 0.0863582216, Cov(b4,b1)= -0.0003113108, Cov(b4,b2)=
-0.0003205022, Cov(b4,b3)= -0.0002781319, Var(b4)= 0.0009166961,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Source df SS MS
----------------------------------------------------------------------------------
Regression 4 98755044.7592373790 24688761.1898093450
error 99999995 1249190.6841565552 0.0124919075
total 99999999 100004235.4433939300
----------------------------------------------------------------------------------
F test statistic=1976380409.2119820000
The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 209.9307396412 0.0196175495 10701.17037 0.00000
X1 0.2492945867 0.0000968252 2574.68747 0.00000
X2 0.2494729046 0.0000968258 2576.51194 0.00000
X3 0.2492945247 0.0000968282 2574.60603 0.00000
X4 0.2494228352 0.0000968482 2575.39951 0.00000
----------------------------------------------------------------------------------
MSE=0.0124919075 , R2=0.987509 , R2(adj)=0.987509
dependent variable:X5 , sample mean=180.0000108540 , sample variance=1.000042
independent variable:X1 , sample mean= 100.0000034896 , sample variance=1.000060
independent variable:X2 , sample mean=0.0000051233 , sample variance=1.000037
independent variable:X3 , sample mean=-99.9999936142 , sample variance=1.000006
independent variable:X4 , sample mean=-119.9999940189 , sample variance=1.000046
0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000
0.05000 0.05000 0.05000 0.05000 0.05000
expected no 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000
5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000
5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000
5000000.00000 5000000.00000
chi square 0.00714 0.65957 2.89256 0.16526 1.97443 0.95485 5.61376
21.46592 22.16355 41.25341 5.22038 8.22531 0.78171 0.22898 1.31892
2.85315 0.19130 0.27472 0.00551 0.08898
degree of freedom=18
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =116.339396
p-value=0.000000
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=49997485
number of the positive ofresidual=50002515
H0: residualis random , H1: Increasing line or decreasing line
Z=-0.712975, p-value=0.238000
H0: residual is random , H1: Oscillation
Z=-0.712975, p-value=0.762000
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-0.712975, p-value=0.476000
sample mean(X5 estimated value)= 180.0000,
sample variance(X5 estimated value)= 0.9876
sample mean(residual)= -0.0000, sample variance(residual)= 0.0125,
sample cov(X5 estimated value,residual)= -0.0000,
X5 estimated value and residual sample correlation coefficient=-0.0000.
sample mean(X5 estimated value)= 180.0000,
sample variance(X5 estimated value)= 0.9876,
sample mean(X5)= 180.0000, sample variance(X5)= 1.0000,
sample cov(X5 estimated value,X5)= 0.9876,
X5 estimated value and X5 sample correlation coefficient=0.9937.
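The relations among these summary values are properties of any least-squares fit with an intercept: the estimated value and the residual are uncorrelated, the covariance between the estimated value and the dependent variable equals the variance of the estimated value, and the squared correlation between them equals R2. The following sketch checks these identities on simulated data with one independent variable; the data-generating values are arbitrary assumptions chosen for illustration.

```python
import random

rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
y = [1.0 + 2.0 * xi + rng.gauss(0.0, 0.5) for xi in x]

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance with divisor n - 1."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

# Least-squares line and its fitted values and residuals.
b1 = cov(x, y) / cov(x, x)
b0 = mean(y) - b1 * mean(x)
fitted = [b0 + b1 * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]

var_fitted, var_resid, var_y = cov(fitted, fitted), cov(resid, resid), cov(y, y)
r2 = var_fitted / var_y
corr_fitted_y = cov(fitted, y) / (var_fitted * var_y) ** 0.5

print(cov(fitted, resid))                 # numerically zero
print(var_fitted + var_resid, var_y)      # equal up to rounding
print(corr_fitted_y ** 2, r2)             # equal up to rounding
```

In the output above the same identities hold: 0.9876 + 0.0125 is approximately 1.0000, and 0.9937 squared is approximately the reported R2 of 0.9875.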
(31.2.3) One of $X_1, X_2, X_3, X_4, X_5$ is the dependent variable and the others are the independent variables (refer to Chapter 7); this is multivariate analysis using the linear model.
Dependent variable is X1,
Independent variables are X2,X3,X4,X5
The estimated line is X1=109.985855+0.249283*X2+0.249269*X3+0.249561*X4+0.249380*X5
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 4 98756340.1276181040 24689085.0319045260 1975734119.9261568000
error 99999995 1249615.7022571960 0.0124961576
total 99999999 100005955.8298753100
----------------------------------------------------------------------------------
The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 109.9858552217 0.2375014503 463.09551 0.00000
X2 0.2492833210 0.0008663705 287.73292 0.00000
X3 0.2492686305 0.0008663542 287.72138 0.00000
X4 0.2495613558 0.0008664952 288.01240 0.00000
X5 0.2493798155 0.0008664593 287.81480 0.00000
----------------------------------------------------------------------------------
MSE=0.0124961576 , R2=0.987505 , R2(adj)=0.987505,C.V.= 0.0011178621
----------------------------------------------------------------------------------
Regression 4 98751088.0409933180 24687772.0102483290 1975745125.0907817000
error 99999995 1249542.2846975452 0.0124954235
total 99999999 100000630.3256908700
----------------------------------------------------------------------------------
The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept -139.8693654835 0.2245683476 -622.83651 0.00000
X1 0.2492539653 0.0008663287 287.71292 0.00000
X2 0.2493263735 0.0008663589 287.78647 0.00000
X4 0.2495145493 0.0008665043 287.95536 0.00000
X5 0.2493650829 0.0008664610 287.79723 0.00000
----------------------------------------------------------------------------------
MSE=0.0124954235 , R2=0.987505 , R2(adj)=0.987505,C.V.=-------
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept -164.8917723298 0.2105188956 -783.26353 0.00000
X1 0.2494341135 0.0008662743 287.93897 0.00000
X2 0.2492713225 0.0008663586 287.72302 0.00000
X3 0.2494020045 0.0008663088 287.89041 0.00000
X5 0.2493808935 0.0008664444 287.82099 0.00000
----------------------------------------------------------------------------------
MSE=0.0124898103 , R2=0.987511 , R2(adj)=0.987511,C.V.=-------
There are 5 random variables, X1, ..., X5, and any one of them can be the dependent variable, because the multivariate normal distribution is a joint probability distribution.
6.3. The probability distributions of the independent variables and the error are not normal; the other assumptions are unchanged.
Example 32,
$X_1 \sim \mathrm{Arcsin}(\mu = 100, c = 10)$, $X_2 \sim \mathrm{Double\_exponential}(\lambda = 0.1, \mu = 50)$,
$X_3 \sim \mathrm{Semi\_circle}(\mu = 100, R = 10)$, $X_4 \sim \mathrm{Logistic}(\mu = 100, \sigma = 10)$,
$X_5 \sim \mathrm{Gamma}(\alpha = 50, \beta = 2)$, $X_6 \sim \mathrm{U\_quadratic}(a = 90, b = 110)$,
$X_1, X_2, \ldots, X_6$ are independent random variables.
$X_7 = 1 + 2 X_1 + 3 X_3 + 4 X_4 + 5 X_5 + 6 X_6 + \varepsilon$, $\varepsilon \sim \mathrm{Raised\_secant}(0, s = 5)$,
independent variable:X6 , sample mean=99.9920425548 , sample variance=59.469904
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=508
number of the positive ofresidual=492
H0: residualis random , H1: Increasing line or decreasing line
Z=-0.624833, p-value=0.266100
H0: residual is random , H1: Oscillation
Z=-0.624833, p-value=0.733900
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-0.624833, p-value=0.532200
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,1000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=1.968793
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=2.031207
The D.W. table dependents on the independent value,
getting the p value must be spending a lot of time.
The Durbin Watson table is wrong in present.
[ Please run the Durbin Watson critical value table software
to check the test value is rejected H0 or failed to reject H0.]
(32.1.2)residual analysis
X0= residual,residual frequency distribution table,
class class limit class midpoint frequency relative frequency cumulative frequency
[ 1 ] -4.54698~ -3.50989 -4.02843 15.00000 0.0150000 0.0150000
[ 2 ] -3.50989~ -2.47280 -2.99134 75.00000 0.0750000 0.0900000
[ 3 ] -2.47280~ -1.43571 -1.95425 129.00000 0.1290000 0.2190000
[ 4 ] -1.43571~ -0.39862 -0.91716 210.00000 0.2100000 0.4290000
[ 5 ] -0.39862~ 0.63848 0.11993 191.00000 0.1910000 0.6200000
[ 6 ] 0.63848~ 1.67557 1.15702 189.00000 0.1890000 0.8090000
[ 7 ] 1.67557~ 2.71266 2.19411 121.00000 0.1210000 0.9300000
[ 8 ] 2.71266~ 3.74975 3.23120 61.00000 0.0610000 0.9910000
[ 9 ] 3.74975~ 4.78684 4.26829 9.00000 0.0090000 1.0000000
frequency distribution: sample mean=0.013109 , sample variance=3.222793 , sample sd=1.795214
X0= residual,goodness of fit( the best parameters)
mu point estimated value=-0.000000 (MLE), sigma point estimated value=1.769665 (MLE)
mu value from -0.353933 to 0.353933, sigma value from 1.474720 to 2.212081
pearson goodness of fit
class    lower limit   upper limit   observed no   probability   expected no   chi square
[ 1 ]    (open)        -2.44514      91.00000      0.10000       100.00000     0.81000
[ 2 ]    -2.44514      -1.61785      99.00000      0.10000       100.00000     0.01000
[ 3 ]    -1.61785      -1.02136      106.00000     0.10000       100.00000     0.36000
[ 4 ]    -1.02136      -0.51170      107.00000     0.10000       100.00000     0.49000
[ 5 ]    -0.51170      -0.03535      96.00000      0.10000       100.00000     0.16000
[ 6 ]    -0.03535      0.44093       80.00000      0.10000       100.00000     4.00000
[ 7 ]    0.44093       0.95058       108.00000     0.10000       100.00000     0.64000
[ 8 ]    0.95058       1.54628       106.00000     0.10000       100.00000     0.36000
[ 9 ]    1.54628       2.37416       104.00000     0.10000       100.00000     0.16000
[ 10 ]   2.37416       (open)        103.00000     0.10000       100.00000     0.09000
degree of freedom=7
H0: X0~Normal(mu=-0.035393,sigma*sigma=3.535410), sigma=1.880269
pearson chi-square test statistic =7.080000
p-value=0.420500
X2 goodness of fit( the best parameters)
lamda point estimated value=0.094262 (MLE), mu point estimated value=49.359194 (MLE)
lamda value from 5.304350 to 21.217400, mu value from 49.092580 to 49.625808
pearson goodness of fit
class    lower limit   upper limit   observed no   probability   expected no   chi square
[ 1 ]    (open)        32.08204      101.00000     0.10000       100.00000     0.01000
[ 2 ]    32.08204      39.47238      94.00000      0.10000       100.00000     0.36000
[ 3 ]    39.47238      43.79546      105.00000     0.10000       100.00000     0.25000
[ 4 ]    43.79546      46.86272      99.00000      0.10000       100.00000     0.01000
[ 5 ]    46.86272      49.24188      96.00000      0.10000       100.00000     0.16000
[ 6 ]    49.24188      51.62104      97.00000      0.10000       100.00000     0.09000
[ 7 ]    51.62104      54.68831      96.00000      0.10000       100.00000     0.16000
[ 8 ]    54.68831      59.01138      108.00000     0.10000       100.00000     0.64000
[ 9 ]    59.01138      66.40173      107.00000     0.10000       100.00000     0.49000
[ 10 ]   66.40173      (open)        97.00000      0.10000       100.00000     0.09000
degree of freedom=7
H0: X2~Double exponential(lambda=0.093791,mu=49.241884)
pearson chi-square test statistic =2.260000
p-value=0.944000
X4 goodness of fit( the best parameters)
mu point estimated value=100.084063
sigma point estimated value=5.276446
mu value from 99.028774 to 101.139352
sigma value from 4.058805 to 7.537780
pearson goodness of fit
class    lower limit  upper limit  observed no  probability  expected no  chi square
[ 1 ]           -inf     88.53504     99.00000      0.10000    100.00000     0.01000
[ 2 ]       88.53504     92.67281    102.00000      0.10000    100.00000     0.04000
[ 3 ]       92.67281     95.42304     94.00000      0.10000    100.00000     0.36000
[ 4 ]       95.42304     97.67749     98.00000      0.10000    100.00000     0.04000
[ 5 ]       97.67749     99.74637    100.00000      0.10000    100.00000     0.00000
[ 6 ]       99.74637    101.81525    112.00000      0.10000    100.00000     1.44000
[ 7 ]      101.81525    104.06971     85.00000      0.10000    100.00000     2.25000
[ 8 ]      104.06971    106.81993     99.00000      0.10000    100.00000     0.01000
[ 9 ]      106.81993    110.95770    106.00000      0.10000    100.00000     0.36000
[ 10 ]     110.95770         +inf    105.00000      0.10000    100.00000     0.25000
degree of freedom=7
H0: X4~Logistic(mu=99.746370,sigma=5.102497),
pearson chi-square test statistic =4.760000
p-value=0.689200
p-value=0.772800
(32.1.4) The linear model stepwise analysis
Dependent variable is X7,
Independent variables are X1,X2,X3,X4,X5,X6
The correlation matrix is below
The independent variables are sorted by their coefficient of determination with X7, from largest to smallest:
r(X7,X5) square=0.477971,
r(X7,X6) square=0.186725,
r(X7,X2) square=0.153496,
r(X7,X4) square=0.151640,
r(X7,X3) square=0.034345,
r(X7,X1) square=0.029612
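The ordering step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the small data set is made up to show the call, and the sketch covers only the ranking by squared sample correlation, not the authors' full stepwise procedure.

```python
def r_squared(x, y):
    """Squared sample correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def rank_predictors(y, predictors):
    """Return (name, r^2) pairs sorted from largest to smallest r^2."""
    scores = [(name, r_squared(values, y)) for name, values in predictors.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Tiny made-up data set (not from the text), just to show the call.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
predictors = {
    "A": [1.1, 1.9, 3.2, 3.9, 5.1],   # nearly linear in y
    "B": [2.0, 1.0, 4.0, 3.0, 2.5],   # weakly related to y
}
order = rank_predictors(y, predictors)
```

In the listing above the same ranking gives the entry order X5, X6, X2, X4, X3, X1.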
X5
The estimated line is X7=1947.582963+6.017263*X5
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 1 7539744.9039596841 7539744.9039596841 913.7697477860
error 998 8234749.9820202049 8251.2524869942
total 999 15774494.8859798890
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test
----------------------------------------------------------------------------------
intercept 1947.5829634149 20.1120969927 96.83639
X5 6.0172627135 0.1990584350 30.22862
----------------------------------------------------------------------------------
MSE= 8251.2524869942 , R2=0.477971 , R2(adj)=0.477448
-------- Regression Coefficient Variance and Covariance Matrix ---------------
Var(b0)= 404.4964454450, Cov(b0,b1)= -3.9624389920,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b1,b0)= -3.9624389920, Var(b1)= 0.0396242605,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
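The summary figures of the one-variable step (X5 only) follow from the sums of squares by simple arithmetic. The sketch below is not the authors' program; it recomputes MSE, F, R2, adjusted R2 and the slope t value from the printed numbers, using the standard textbook formulas.

```python
import math

n = 1000                      # sample size (total df = n - 1 = 999)
p = 1                         # number of independent variables in this step
ss_regression = 7539744.9039596841
ss_error = 8234749.9820202049
ss_total = 15774494.8859798890

ms_error = ss_error / (n - p - 1)             # MSE
f_value = (ss_regression / p) / ms_error      # overall F
r2 = ss_regression / ss_total
r2_adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)

b1, var_b1 = 6.0172627135, 0.0396242605       # slope and Var(b1) from the listing
se_b1 = math.sqrt(var_b1)                     # standard error of the slope
t_b1 = b1 / se_b1                             # t statistic; t^2 equals F when p = 1
```

With a single independent variable the squared t value of the slope reproduces the F value, which is a quick consistency check on the listing.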
X5,X6
The estimated line is X7=1204.004869+6.117982*X5+7.335645*X6
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 2 10734608.1139184050 5367304.0569592025 1061.7703108833
X5 1 7539744.9039596915
X6 1 3194863.2099587135
error 997 5039886.7720614839 5055.0519278450
total 999 15774494.8859798890
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test
----------------------------------------------------------------------------------
intercept 1204.0048687995 33.5059202032 35.93409
X5 6.1179821270 0.1558572424 39.25376
X6 7.3356449337 0.2917930735 25.13989
----------------------------------------------------------------------------------
MSE=5055.0519278450 , R2=0.680504 , R2(adj)=0.679863
X5,X6,X2
The estimated line is X7=1092.086402+6.142985*X5+6.908326*X6+3.089359*X2
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 3 12863423.2126139330 4287807.7375379773 1467.0392851062
X5 1 7539744.9039596915
X6 1 3194863.2099587135
X2 1 2128815.0986955278
error 996 2911071.6733659552 2922.7627242630
total 999 15774494.8859798890
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test
----------------------------------------------------------------------------------
intercept 1092.0864015411 25.8127178293 42.30807
X5 6.1429847459 0.1185152471 51.83286
X6 6.9083264658 0.2224395406 31.05710
X2 3.0893593685 0.1144712010 26.98809
----------------------------------------------------------------------------------
MSE=2922.7627242630 , R2=0.815457 , R2(adj)=0.814901
-------- Regression Coefficient Variance and Covariance Matrix ---------------
Var(b0)= 666.2964017370, Cov(b0,b1)= -1.4759332404, Cov(b0,b2)=
-4.9244035158, Cov(b0,b3)= -0.4747071819,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b1,b0)= -1.4759332404, Var(b1)= 0.0140458638, Cov(b1,b2)=
0.0006612473, Cov(b1,b3)= 0.0001060497,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b2,b0)= -4.9244035158, Cov(b2,b1)= 0.0006612473, Var(b2)=
0.0494793492, Cov(b2,b3)= -0.0018124904,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b3,b0)= -0.4747071819, Cov(b3,b1)= 0.0001060497, Cov(b3,b2)=
-0.0018124904, Var(b3)= 0.0131036559,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-------- partial coefficient of determination and test ---------------
r(X7,X5) square= 0.4779706075, test value= 913.7697477860
r(X7,X6|X5) square= 0.3879733103, test value= 632.0139249926
r(X7,X2|X5,X6) square= 0.4223934376, test value= 728.3571399838
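The partial coefficients of determination and their test values can be checked against the ANOVA tables. The sketch below assumes the usual definitions: the partial r square is the added regression sum of squares divided by the error sum of squares before the variable entered, and the test value is the added sum of squares divided by the MSE after it entered. The function names are illustrative.

```python
# Sequential sums of squares and error terms copied from the ANOVA tables above.
sse_after_x5 = 8234749.9820202049          # error SS with X5 only
sse_after_x5_x6 = 5039886.7720614839       # error SS with X5, X6
ss_added_x6 = 3194863.2099587135           # extra regression SS from X6
ss_added_x2 = 2128815.0986955278           # extra regression SS from X2
mse_x5_x6 = 5055.0519278450                # MSE of the X5, X6 model
mse_x5_x6_x2 = 2922.7627242630             # MSE of the X5, X6, X2 model

def partial_r2(ss_added, sse_before):
    """Share of the previously unexplained variation removed by the new variable."""
    return ss_added / sse_before

def partial_f(ss_added, mse_after):
    """Partial F value for one added variable (1 numerator df)."""
    return ss_added / mse_after

r2_x6_given_x5 = partial_r2(ss_added_x6, sse_after_x5)
f_x6_given_x5 = partial_f(ss_added_x6, mse_x5_x6)
r2_x2_given_x5_x6 = partial_r2(ss_added_x2, sse_after_x5_x6)
f_x2_given_x5_x6 = partial_f(ss_added_x2, mse_x5_x6_x2)
```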
X5,X6,X2,X4
The estimated line is X7=579.918568+6.092456*X5+7.110098*X6+2.992638*X2+5.013871*X4
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 4 15158991.6238933490 3789747.9059733371 6126.3674763649
X5 1 7539744.9039596915
X6 1 3194863.2099587135
X2 1 2128815.0986955278
X4 1 2295568.4112794157
error 995 615503.2620865402 618.5962433031
total 999 15774494.8859798890
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test
----------------------------------------------------------------------------------
intercept 579.9185677943 14.5501716217 39.85648
X5 6.0924561580 0.0545294774 111.72776
X6 7.1100981813 0.1023873294 69.44315
X2 2.9926379623 0.0526866266 56.80071
X4 5.0138705763 0.0823060270 60.91742
----------------------------------------------------------------------------------
MSE=618.5962433031 , R2=0.960981 , R2(adj)=0.960824
-------- Regression Coefficient Variance and Covariance Matrix ---------------
Var(b0)= 211.7074942222, Cov(b0,b1)= -0.3054042424, Cov(b0,b2)=
-1.0700867928, Cov(b0,b3)= -0.0871216228, Cov(b0,b4)= -0.6919942043,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b1,b0)= -0.3054042424, Var(b1)= 0.0029734639, Cov(b1,b2)=
0.0001372042, Cov(b1,b3)= 0.0000237622, Cov(b1,b4)= -0.0000682696,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b2,b0)= -1.0700867928, Cov(b2,b1)= 0.0001372042, Var(b2)=
0.0104831652, Cov(b2,b3)= -0.0003888685, Cov(b2,b4)= 0.0002726154,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b3,b0)= -0.0871216228, Cov(b3,b1)= 0.0000237622, Cov(b3,b2)=
-0.0003888685, Var(b3)= 0.0027758806, Cov(b3,b4)= -0.0001306811,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b4,b0)= -0.6919942043, Cov(b4,b1)= -0.0000682696, Cov(b4,b2)=
0.0002726154, Cov(b4,b3)= -0.0001306811, Var(b4)= 0.0067742821,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-------- partial coefficient of determination and test ---------------
r(X7,X5) square= 0.4779706075, test value= 913.7697477860
r(X7,X6|X5) square= 0.3879733103, test value= 632.0139249926
r(X7,X2|X5,X6) square= 0.4223934376, test value= 728.3571399838
r(X7,X4|X5,X6,X2) square= 0.7885647173, test value= 3710.9317040498
X5,X6,X2,X4,X3
The estimated line is
X7=185.504969+6.078340*X5+7.089015*X6+2.969676*X2+4.978852*X4+4.028495*X3
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 5 15572911.0866016300 3114582.2173203258 15357.8548155408
X5 1 7539744.9039596915
X6 1 3194863.2099587135
X2 1 2128815.0986955278
X4 1 2295568.4112794157
X3 1 413919.4627082814
error 994 201583.7993782585 202.8006029962
total 999 15774494.8859798890
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test
----------------------------------------------------------------------------------
intercept 185.5049693609 12.0674818975 15.37230
X5 6.0783397806 0.0312236784 194.67084
X6 7.0890153415 0.0586260935 120.91911
X2 2.9696759947 0.0301712293 98.42741
X4 4.9788516136 0.0471325959 105.63500
X3 4.0284947365 0.0891701501 45.17762
----------------------------------------------------------------------------------
MSE=202.8006029962 , R2=0.987221 , R2(adj)=0.987157
-------- Regression Coefficient Variance and Covariance Matrix ---------------
Var(b0)= 145.6241193453, Cov(b0,b1)= -0.0973958315, Cov(b0,b2)=
-0.3467431472, Cov(b0,b3)= -0.0241246995, Cov(b0,b4)= -0.2200961989,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b0,b5)= -0.7784811029,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b1,b0)= -0.0973958315, Var(b1)= 0.0009749181, Cov(b1,b2)=
0.0000451268, Cov(b1,b3)= 0.0000079490, Cov(b1,b4)= -0.0000221393,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b1,b5)= -0.0000278625,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b2,b0)= -0.3467431472, Cov(b2,b1)= 0.0000451268, Var(b2)=
0.0034370188, Cov(b2,b3)= -0.0001272495, Cov(b2,b4)= 0.0000897360,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b2,b5)= -0.0000416126,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b3,b0)= -0.0241246995, Cov(b3,b1)= 0.0000079490, Cov(b3,b2)=
-0.0001272495, Var(b3)= 0.0009103031, Cov(b3,b4)= -0.0000424485,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b3,b5)= -0.0000453216,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b4,b0)= -0.2200961989, Cov(b4,b1)= -0.0000221393, Cov(b4,b2)=
0.0000897360, Cov(b4,b3)= -0.0000424485, Var(b4)= 0.0022214816,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b4,b5)= -0.0000691193,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b5,b0)= -0.7784811029, Cov(b5,b1)= -0.0000278625, Cov(b5,b2)=
-0.0000416126, Cov(b5,b3)= -0.0000453216, Cov(b5,b4)= -0.0000691193,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Var(b5)= 0.0079513157,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-------- partial coefficient of determination and test ---------------
r(X7,X5) square= 0.4779706075, test value= 913.7697477860
r(X7,X6|X5) square= 0.3879733103, test value= 632.0139249926
r(X7,X2|X5,X6) square= 0.4223934376, test value= 728.3571399838
r(X7,X4|X5,X6,X2) square= 0.7885647173, test value= 3710.9317040498
r(X7,X3|X5,X6,X2,X4) square= 0.6724894703, test value= 2041.0169229919
X5,X6,X2,X4,X3,X1
The estimated line is
X7=1.725619+5.999391*X5+6.992869*X6+3.001740*X2+5.005397*X4+3.990032*X3
+2.003624*X1
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 6 15771369.4369300980 2628561.5728216828 835131.7203478590
X5 1 7539744.9039596915
X6 1 3194863.2099587135
X2 1 2128815.0986955278
X4 1 2295568.4112794157
X3 1 413919.4627082814
X1 1 198458.3503284678
error 993 3125.4490497915 3.1474814197
total 999 15774494.8859798890
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test
----------------------------------------------------------------------------------
intercept 1.7256189338 1.6720512567 1.03204
X5 5.9993912365 0.0039025193 1537.31237
X6 6.9928693758 0.0073136456 956.14004
X2 3.0017403101 0.0037608884 798.14661
X4 5.0053974070 0.0058727120 852.31447
X3 3.9900315733 0.0111098385 359.14398
X1 2.0036237968 0.0079792685 251.10369
----------------------------------------------------------------------------------
MSE= 3.1474814197 , R2=0.999802 , R2(adj)=0.999801
-------- Regression Coefficient Variance and Covariance Matrix ---------------
Var(b0)= 2.7957554051, Cov(b0,b1)= -0.0012814815, Cov(b0,b2)=
-0.0051012466, Cov(b0,b3)= -0.0004678744, Cov(b0,b4)= -0.0034932829,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b0,b5)= -0.0119699807, Cov(b0,b6)= -0.0058399172,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b1,b0)= -0.0012814815, Var(b1)= 0.0000152297, Cov(b1,b2)=
0.0000008208, Cov(b1,b3)= 0.0000000832, Cov(b1,b4)= -0.0000003768,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b1,b5)= -0.0000003843, Cov(b1,b6)= -0.0000025087,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b2,b0)= -0.0051012466, Cov(b2,b1)= 0.0000008208, Var(b2)=
0.0000534894, Cov(b2,b3)= -0.0000020238, Cov(b2,b4)= 0.0000013522,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b2,b5)= -0.0000005872, Cov(b2,b6)= -0.0000030552,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b3,b0)= -0.0004678744, Cov(b3,b1)= 0.0000000832, Cov(b3,b2)=
-0.0000020238, Var(b3)= 0.0000141443, Cov(b3,b4)= -0.0000006453,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b3,b5)= -0.0000007230, Cov(b3,b6)= 0.0000010189,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b4,b0)= -0.0034932829, Cov(b4,b1)= -0.0000003768, Cov(b4,b2)=
0.0000013522, Cov(b4,b3)= -0.0000006453, Var(b4)= 0.0000344887,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b4,b5)= -0.0000010889, Cov(b4,b6)= 0.0000008435,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b5,b0)= -0.0119699807, Cov(b5,b1)= -0.0000003843, Cov(b5,b2)=
-0.0000005872, Cov(b5,b3)= -0.0000007230, Cov(b5,b4)= -0.0000010889,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Var(b5)= 0.0001234285, Cov(b5,b6)= -0.0000012222,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b6,b0)= -0.0058399172, Cov(b6,b1)= -0.0000025087, Cov(b6,b2)=
-0.0000030552, Cov(b6,b3)= 0.0000010189, Cov(b6,b4)= 0.0000008435,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b6,b5)= -0.0000012222, Var(b6)= 0.0000636687,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-------- partial coefficient of determination and test ---------------
r(X7,X5) square= 0.4779706075, test value= 913.7697477860
r(X7,X6|X5) square= 0.3879733103, test value= 632.0139249926
r(X7,X2|X5,X6) square= 0.4223934376, test value= 728.3571399838
r(X7,X4|X5,X6,X2) square= 0.7885647173, test value= 3710.9317040498
r(X7,X3|X5,X6,X2,X4) square= 0.6724894703, test value= 2041.0169229919
r(X7,X1|X5,X6,X2,X4,X3) square= 0.9844955346, test value= 63053.0649313661
X1 1 198458.3503284678
error 993 3125.4490497915 3.1474814197
total 999 15774494.8859798890
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test
----------------------------------------------------------------------------------
intercept 1.7256189338 1.6720512567 1.03204
X5 5.9993912365 0.0039025193 1537.31237
X6 6.9928693758 0.0073136456 956.14004
X2 3.0017403101 0.0037608884 798.14661
X4 5.0053974070 0.0058727120 852.31447
X3 3.9900315733 0.0111098385 359.14398
X1 2.0036237968 0.0079792685 251.10369
----------------------------------------------------------------------------------
MSE=3.1474814197 , R2=0.999802 , R2(adj)=0.999801
-------- partial coefficient of determination and test ---------------
r(X7,X5) square= 0.4779706075, test value= 913.7697477860
r(X7,X6|X5) square= 0.3879733103, test value= 632.0139249926
r(X7,X2|X5,X6) square= 0.4223934376, test value= 728.3571399838
r(X7,X4|X5,X6,X2) square= 0.7885647173, test value= 3710.9317040498
r(X7,X3|X5,X6,X2,X4) square= 0.6724894703, test value= 2041.0169229919
r(X7,X1|X5,X6,X2,X4,X3) square= 0.9844955346, test value= 63053.0649313661
X5 5.9999843218 0.0000127805 469463.88122 0.00000
X6 7.0000395856 0.0000233334 300000.46358 0.00000
----------------------------------------------------------------------------------
MSE=3.2664876062 , R2=0.999776 , R2(adj)=0.999776
-------- Regression Coefficient Variance and Covariance Matrix ---------------
Var(b0)= 0.0000310819, Cov(b0,b1)= -0.0000000653, Cov(b0,b2)= -0.0000000082,
Cov(b0,b3)= -0.0000001306, Cov(b0,b4)= -0.0000000397,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b0,b5)= -0.0000000163, Cov(b0,b6)= -0.0000000545,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b1,b0)= -0.0000000653, Var(b1)= 0.0000000007, Cov(b1,b2)= 0.0000000000,
Cov(b1,b3)= -0.0000000000, Cov(b1,b4)= -0.0000000000,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b1,b5)= -0.0000000000, Cov(b1,b6)= -0.0000000000,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b2,b0)= -0.0000000082, Cov(b2,b1)= 0.0000000000, Var(b2)= 0.0000000002,
Cov(b2,b3)= -0.0000000000, Cov(b2,b4)= 0.0000000000,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b2,b5)= 0.0000000000, Cov(b2,b6)= 0.0000000000,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b3,b0)= -0.0000001306, Cov(b3,b1)= -0.0000000000, Cov(b3,b2)= -0.0000000000, Var(b3)=
0.0000000013, Cov(b3,b4)= -0.0000000000,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b3,b5)= 0.0000000000, Cov(b3,b6)= 0.0000000000,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b4,b0)= -0.0000000397, Cov(b4,b1)= -0.0000000000, Cov(b4,b2)= 0.0000000000, Cov(b4,b3)=
-0.0000000000, Var(b4)= 0.0000000004,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b4,b5)= -0.0000000000, Cov(b4,b6)= 0.0000000000,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b5,b0)= -0.0000000163, Cov(b5,b1)= -0.0000000000, Cov(b5,b2)= 0.0000000000, Cov(b5,b3)=
0.0000000000, Cov(b5,b4)= -0.0000000000,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Var(b5)= 0.0000000002, Cov(b5,b6)= 0.0000000000,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b6,b0)= -0.0000000545, Cov(b6,b1)= -0.0000000000, Cov(b6,b2)= 0.0000000000, Cov(b6,b3)=
0.0000000000, Cov(b6,b4)= 0.0000000000,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b6,b5)= 0.0000000000, Var(b6)= 0.0000000005,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ individual test ~~~~~~~~~~~~~~~~~~~~
variable coefficient standard error t value F value
intercept 0.9867531718 0.0055751175 176.9924 31326.2939
X1 slope 1.9999213639 0.0000255605 78242.5080 6121890058.7773
X2 slope 2.9999412544 0.0000127842 234660.2646 55065439797.1507
X3 slope 4.0001689600 0.0000361412 110681.7904 12250458721.4729
X4 slope 5.0000470778 0.0000199242 250953.7029 62977760979.4029
X5 slope 5.9999843218 0.0000127805 469463.8812 220396335769.2494
X6 slope 7.0000395856 0.0000233334 300000.4636 90000278149.9821
====================
[checking the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class    lower limit  upper limit       observed no  probability       expected no    chi square
[ 1 ]           -inf     -2.97291     5051613.00000      0.05000     5000000.00000     532.78035
[ 2 ]       -2.97291     -2.31628     5971972.00000      0.05000     5000000.00000  188945.91376
[ 3 ]       -2.31628     -1.87318     5542443.00000      0.05000     5000000.00000   58848.88165
[ 4 ]       -1.87318     -1.52108     5224203.00000      0.05000     5000000.00000   10053.39704
[ 5 ]       -1.52108     -1.21899     4980733.00000      0.05000     5000000.00000      74.24346
[ 6 ]       -1.21899     -0.94773     4823755.00000      0.05000     5000000.00000    6212.46000
[ 7 ]       -0.94773     -0.69635     4697688.00000      0.05000     5000000.00000   18278.50907
[ 8 ]       -0.69635     -0.45830     4601644.00000      0.05000     5000000.00000   31737.50055
[ 9 ]       -0.45830     -0.22703     4563161.00000      0.05000     5000000.00000   38165.66238
[ 10 ]      -0.22703     -0.00041     4528589.00000      0.05000     5000000.00000   44445.66618
[ 11 ]      -0.00041      0.22670     4541139.00000      0.05000     5000000.00000   42110.68346
[ 12 ]       0.22670      0.45785     4572172.00000      0.05000     5000000.00000   36607.35952
[ 13 ]       0.45785      0.69634     4614681.00000      0.05000     5000000.00000   29694.14635
[ 14 ]       0.69634      0.94773     4690735.00000      0.05000     5000000.00000   19128.96805
[ 15 ]       0.94773      1.21892     4808916.00000      0.05000     5000000.00000    7302.61901
[ 16 ]       1.21892      1.52097     4991488.00000      0.05000     5000000.00000      14.49083
[ 17 ]       1.52097      1.87310     5225001.00000      0.05000     5000000.00000   10125.09000
[ 18 ]       1.87310      2.31610     5546665.00000      0.05000     5000000.00000   59768.52445
[ 19 ]       2.31610      2.97282     5974645.00000      0.05000     5000000.00000  189986.57521
[ 20 ]       2.97282         +inf     5048757.00000      0.05000     5000000.00000     475.44901
degree of freedom=18
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =792508.920328
p-value=0.000000
~~~~~ The run test of residual~~~~~~~~~~~~~
number of negative residuals=49993684
number of positive residuals=50006316
H0: residual is random , H1: Increasing line or decreasing line
Z=1.752560, p-value=0.960200
H0: residual is random , H1: Oscillation
Z=1.752560, p-value=0.039800
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=1.752560, p-value=0.079600
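The run test above can be sketched as follows. This is a hedged illustration and not the authors' software: it assumes the usual large-sample normal approximation for the number of sign runs, and it assumes the three p-values are the lower tail, the upper tail and the two-sided probability of Z, which is consistent with the printed values 0.9602, 0.0398 and 0.0796. The function names are hypothetical.

```python
import math

def runs_test(residuals):
    """Return (number of runs, Z) for the signs of the residuals (zeros are dropped)."""
    signs = [r > 0 for r in residuals if r != 0]
    n_pos = sum(signs)
    n_neg = len(signs) - n_pos
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    n = n_pos + n_neg
    mean = 2 * n_pos * n_neg / n + 1
    var = 2 * n_pos * n_neg * (2 * n_pos * n_neg - n) / (n * n * (n - 1))
    return runs, (runs - mean) / math.sqrt(var)

def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def run_test_p_values(z):
    """p-values for the three alternatives, in the order they are printed above."""
    p_trend = normal_cdf(z)            # too few runs: increasing or decreasing line
    p_oscillation = 1 - normal_cdf(z)  # too many runs: oscillation
    p_either = 2 * min(p_trend, p_oscillation)
    return p_trend, p_oscillation, p_either

# Made-up residuals with 4 runs, which is exactly the expected number here.
runs, z = runs_test([1.0, 2.0, -1.0, -2.0, 3.0, -1.0])
p_values = run_test_p_values(1.752560)   # Z taken from the listing above
```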
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,100000000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=1.999746
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=2.000254
The D.W. critical values depend on the values of the independent variables,
so computing the p value takes a lot of time.
The existing Durbin Watson table is wrong at present.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
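The two D.W. values printed above sum to 4, which matches the usual convention of testing negative autocorrelation with 4 - d. A minimal Python sketch of the statistic, under that assumption and with made-up residuals, is:

```python
def durbin_watson(residuals):
    """d = sum_{t>=2} (e_t - e_{t-1})^2 / sum_t e_t^2."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(e * e for e in residuals)
    return num / den

e = [1.0, -1.0, 1.0, -1.0]        # made-up, strongly alternating residuals
d_positive = durbin_watson(e)     # statistic for H1: auto correlation coefficient > 0
d_negative = 4 - d_positive       # statistic for H1: auto correlation coefficient < 0
```

Since d is approximately 2*(1 - r), where r is the first-order sample autocorrelation of the residuals, values close to 2 such as 1.999746 point to very little first-order autocorrelation.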
X7 estimated value and residual sample correlation coefficient=-0.0000.
sample mean(X7 estimated value)= 2551.0235,
sample variance(X7 estimated value)= 14589.7898
sample mean(X7)= 2551.0235, sample variance(X7)= 14593.0562,
sample cov(X7 estimated value,X7)= 14589.7898,
X7 estimated value and X7 sample correlation coefficient=0.9999.
Mathematical Mean: -0.00000
Geometrical Mean : none
Harmonic Mean : none
Variance : 3.26649
S.D. : 1.80734
Skewed Coef. : -0.00034
Kurtosis Coef. : 2.40612
MAD : 1.48660
Range : 9.94034
Mid_range : 0.00275
Median : 0.00031
Q1 : -1.32379
Q2 : 0.00031
Q3 : 1.32410
IQR : 2.64788
C.V. : none
The distribution function estimated line ------
F(X)= 0.44987576559502285000
 +0.19858376182936632000*(X+0.25066119544383186000)^1
 +0.00585919768037360120*(X+0.25066119544383186000)^2
 -0.00523476762313990210*(X+0.25066119544383186000)^3,
value range 0.4000000100<=F(X)<= 0.5000000000 ,
value range -0.5040456445<=X<= 0.0003084673 ,
Error=0.000000504490819611, MAX=0.000033655695216739,
coefficient of determination=0.999999815607305110
Left diagram, the comparison of the
estimated line and sample data.
X2 marginal probability distribution
Mathematical Mean: 50.00185
Geometrical Mean : none
Harmonic Mean : none
Variance : 199.86414
S.D. : 14.13733
Skewed Coef. : 0.00002
Kurtosis Coef. : 5.98350
MAD : 9.99860
Range : 314.45206
Mid_range : 45.63741
Median : 50.00008
Q1 : 43.07283
Q2 : 50.00008
Q3 : 56.93560
IQR : 13.86277
C.V. : 0.28274
representable code of Semi circle(100,10),
E(| X3 distribution F() - X4 distribution F()|^2)= 0.0000000048
Pr(| X3 distribution F() - X4 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X3 distribution F() - X4 distribution F()|>= 0.0500000000)= 0.000000
Pr(| X3 distribution F() - X4 distribution F()|>= 0.0100000000)= 0.000000
Pr(| X3 distribution F() - X4 distribution F()|>= 0.0050000000)= 0.000000
Pr(| X3 distribution F() - X4 distribution F()|>= 0.0010000000)= 0.000000
Pr(| X3 distribution F() - X4 distribution F()|>= 0.0005000000)= 0.000000
Pr(| X3 distribution F() - X4 distribution F()|>= 0.0001000000)= 0.150758
X5 marginal probability distribution
Mathematical Mean: 100.00330
Geometrical Mean : 99.00503
Harmonic Mean : 98.00328
Variance : 199.97939
S.D. : 14.14141
Skewed Coef. : 0.28200
Kurtosis Coef. : 3.12027
MAD : 11.26371
Range : 153.08074
Mid_range : 116.88629
Median : 99.34229
Q1 : 90.13605
Q2 : 99.34229
Q3 : 109.14344
IQR : 19.00739
C.V. : 0.14141
X6 marginal probability distribution
Mathematical Mean: 99.99909
Geometrical Mean : 99.69848
Harmonic Mean : 99.39844
Variance : 59.99624
S.D. : 7.74572
Skewed Coef. : 0.00018
Kurtosis Coef. : 1.19046
MAD : 7.49981
Range : 20.00000
Mid_range : 100.00000
Median : 99.49599
Q1 : 92.06262
Q2 : 99.49599
Q3 : 107.93659
IQR : 15.87397
C.V. : 0.07746
Pr(| X6 distribution F() - X7 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X6 distribution F() - X7 distribution F()|>= 0.0500000000)= 0.000000
Pr(| X6 distribution F() - X7 distribution F()|>= 0.0100000000)= 0.000000
Pr(| X6 distribution F() - X7 distribution F()|>= 0.0050000000)= 0.000000
Pr(| X6 distribution F() - X7 distribution F()|>= 0.0010000000)= 0.000000
Pr(| X6 distribution F() - X7 distribution F()|>= 0.0005000000)= 0.000000
Pr(| X6 distribution F() - X7 distribution F()|>= 0.0001000000)= 0.466447
f(x3,x7) f(x7,x3)
f(x4,x7) f(x7,x4)
sample cov(X5,X7)= 1199.6967, X5 and X7 sample correlation coefficient=0.7023.
f(x6,x7) f(x7,x6)
Source df SS MS F
----------------------------------------------------------------------------------
Regression 6 19950183969.6544340000 3325030661.6090722000 9177578605.3239517000
error 99999993 36229931.3560557440 0.3622993389
total 99999999 19986413901.0104900000
----------------------------------------------------------------------------------
The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept -0.2377059175 0.0030849345 -77.05380 0.00000
X1 -0.6654452663 0.0000149064 -44641.49140 0.00000
X3 -1.3309958247 0.0000221106 -60197.06391 0.00000
X4 -1.6636944112 0.0000161308 -103137.67725 0.00000
X5 -1.9964087901 0.0000158033 -126328.44017 0.00000
X6 -2.3291637691 0.0000209387 -111237.03189 0.00000
X7 0.3327356069 0.0000023557 141245.07951 0.00000
----------------------------------------------------------------------------------
MSE=0.3622993389 , R2=0.998187 , R2(adj)=0.998187, C.V.= 0.0120378148,
----------------------------------------------------------------------------------
Regression 6 8215444625.9808950000 1369240770.9968159000 10496295470.1297400000
Error 99999993 13044989.8161359350 0.1304499073
total 99999999 8228489615.7970314000
----------------------------------------------------------------------------------
The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept -0.0384853409 0.0030851689 -12.47431 0.00000
X1 -0.3993461255 0.0000148136 -26958.10317 0.00000
X2 -0.5990316805 0.0000096793 -61887.85966 0.00000
X3 -0.7987577144 0.0000218532 -36551.05508 0.00000
X5 -1.1980831163 0.0000149912 -79918.99507 0.00000
X6 -1.3977758379 0.0000201090 -69509.97265 0.00000
X7 0.1996810515 0.0000022030 90639.08156 0.00000
----------------------------------------------------------------------------------
MSE= 0.1304499073 , R2=0.998415 , R2(adj)=0.998415, C.V.= 0.0036117202
Source df SS MS F
----------------------------------------------------------------------------------
Regression 6 5992964742.2179699000 998827457.0363283200 15000050782.4948750000
error 99999993 6658826.7040005112 0.0665882717
total 99999999 5999623568.9219704000
----------------------------------------------------------------------------------
The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept -0.0297658424 0.0030851657 -9.64805 0.00000
X1 -0.2853842400 0.0000146155 -19526.15614 0.00000
X2 -0.4280852131 0.0000089767 -47688.59117 0.00000
X3 -0.5708154020 0.0000213123 -26783.34505 0.00000
X4 -0.7134959238 0.0000143670 -49661.98808 0.00000
X5 -0.8561845389 0.0000131264 -65225.97779 0.00000
X7 0.1426977827 0.0000018433 77414.22985 0.00000
----------------------------------------------------------------------------------
MSE=0.0665882717 , R2=0.998890 , R2(adj)=0.998890, C.V.= 0.0025804938
X7=0.986753+1.999921*X1+2.999941*X2+4.000169*X3+5.000047*X4
+5.999984*X5+7.000040*X6+error,
X1 estimated line
X1=1.120834-1.475921*X2-1.968012*X3-2.459938*X4-2.951889*X5
-3.443901*X6+0.491983*X7,
X1,…,X6 are independent random variables.
The X1 estimated line differs from the X1 line obtained by converting the X7 estimated line
(solving it for X1). For example, the converted line has an X7 coefficient of
1/1.999921=0.500020, while the X1 estimated line has 0.491983.
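The difference between the two X1 lines can be made concrete by solving the X7 line for X1 and comparing coefficients. The following Python sketch uses only the coefficients printed above; the dictionary layout and the function name are illustrative assumptions.

```python
# Coefficients of the X7 estimated line, copied from the text above.
x7_line = {"intercept": 0.986753, "X1": 1.999921, "X2": 2.999941, "X3": 4.000169,
           "X4": 5.000047, "X5": 5.999984, "X6": 7.000040}
# Coefficients of the directly estimated X1 line, copied from the text above.
x1_line = {"intercept": 1.120834, "X2": -1.475921, "X3": -1.968012, "X4": -2.459938,
           "X5": -2.951889, "X6": -3.443901, "X7": 0.491983}

def solve_for_x1(line):
    """Rearrange X7 = b0 + b1*X1 + ... (error term dropped) into X1 = ..."""
    b1 = line["X1"]
    solved = {"intercept": -line["intercept"] / b1, "X7": 1 / b1}
    for name, coef in line.items():
        if name not in ("intercept", "X1"):
            solved[name] = -coef / b1
    return solved

converted = solve_for_x1(x7_line)
# Gap between the converted line and the directly estimated X1 line, per term.
differences = {name: converted[name] - x1_line[name] for name in x1_line}
```

A gap of this kind is expected in general: a least-squares fit of X1 on the other variables minimizes the error in X1, so it need not coincide with the algebraic inverse of the fit that minimizes the error in X7.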
6.4. Non-linear model and the other assumptions are unchanged.
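Example 33,
$X_1 \sim \mathrm{Normal}\left(E(X_1) = 100,\ \mathrm{Var}(X_1) = 25\right)$,
$X_2 \mid x_1 \sim \mathrm{Normal}\left(E(X_2 \mid x_1) = 50 + 0.5x_1,\ \mathrm{Var}(X_2 \mid x_1) = 16\right)$,
$X_3 \mid x_1, x_2 \sim \mathrm{Normal}\left(E(X_3 \mid x_1, x_2) = 10 + 0.5x_1 + 0.5x_2,\ \mathrm{Var}(X_3 \mid x_1, x_2) = 12.25\right)$,
$X_4 \mid x_1, x_2 \sim \mathrm{Normal}\left(E(X_4 \mid x_1, x_2) = 5 + 0.7x_1 + 0.3x_2,\ \mathrm{Var}(X_4 \mid x_1, x_2) = 16\right)$,
$\varepsilon \sim \mathrm{Normal}\left(E(\mathrm{error}) = 0,\ \mathrm{Var}(\mathrm{error}) = 16\right)$,
$X_5 = 1 + 2X_1 + 3\cos(X_2\pi) + 4X_3 + 5\log(X_4) + \varepsilon$,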
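The data-generating process of Example 33 can be simulated with the Python standard library. This is a sketch under stated assumptions, not the authors' simulator: it assumes the cosine argument is X2 times pi, that log is the natural logarithm, and that each variance is converted to a standard deviation by taking its square root. The function name and the seed are arbitrary.

```python
import math
import random

def simulate_example_33(n=1000, seed=0):
    """Draw n paired samples (X1, ..., X5) following the definitions above."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = rng.gauss(100, 5)                           # Var = 25
        x2 = rng.gauss(50 + 0.5 * x1, 4)                 # Var = 16
        x3 = rng.gauss(10 + 0.5 * x1 + 0.5 * x2, 3.5)    # Var = 12.25
        x4 = rng.gauss(5 + 0.7 * x1 + 0.3 * x2, 4)       # Var = 16
        error = rng.gauss(0, 4)                          # Var = 16
        # Assumptions: cosine argument is X2 * pi and log is the natural logarithm.
        x5 = (1 + 2 * x1 + 3 * math.cos(x2 * math.pi)
              + 4 * x3 + 5 * math.log(x4) + error)
        rows.append((x1, x2, x3, x4, x5))
    return rows

samples = simulate_example_33()
mean_x1 = sum(row[0] for row in samples) / len(samples)
```

Because X4 is centered near 105 with a standard deviation of only a few units, the logarithm is well defined for all practical draws.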
(33.1) paired samples, n=1000,
(33.1.1) Non-linear model analysis,
Dependent variable is X5,
Independent variables are X1, X2*X2*Cos(X2*pi), X3^2, X4*Sin(X4*pi),
The correlation matrix is below
r(X5,X1)=0.913424,
r(X5,X2*X2*Cos(X2*pi))=0.190844,
r(X5,X3^2)=0.669410,
r(X5,X4*Sin(X4*pi))=-0.004997,
r(X1,X2*X2*Cos(X2*pi))=-0.005078,
r(X1,X3^2)=0.661686,
r(X1,X4*Sin(X4*pi))=0.031870,
r(X2*X2*Cos(X2*pi),X3^2)=0.048152,
r(X2*X2*Cos(X2*pi),X4*Sin(X4*pi))=-0.007655,
r(X3^2,X4*Sin(X4*pi))=0.005973,
probability 0.10000 0.10000 0.10000 0.10000 0.10000 0.10000 0.10000
0.10000 0.10000 0.10000
expected no 100.00000 100.00000 100.00000 100.00000 100.00000 100.00000 100.00000
100.00000 100.00000 100.00000
chi square 0.36000 0.25000 3.61000 0.00000 0.16000 1.21000 0.01000
0.36000 1.69000 0.01000
degree of freedom=8
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =7.660000
p-value=0.467300
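The p-value above can be reproduced from the chi-square components and the degrees of freedom. For an even number of degrees of freedom the upper-tail probability has an exact closed form, so no statistics library is needed. The function name `chi2_sf_even_df` is ours; df=8 is taken from the output above.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(Chi2_df > x), exact closed form for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    # Sum of (x/2)^i / i! for i = 0 .. df/2 - 1
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

# Chi-square components of the ten classes, copied from the table above.
components = [0.36, 0.25, 3.61, 0.00, 0.16, 1.21, 0.01, 0.36, 1.69, 0.01]
stat = sum(components)
p_value = chi2_sf_even_df(stat, df=8)
print(stat, p_value)
```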
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=504
number of the positive of residual=496
H0: residual is random , H1: Increasing line or decreasing line
Z=0.634838, p-value=0.737300
H0: residual is random , H1: Oscillation
Z=0.634838, p-value=0.262700
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=0.634838, p-value=0.525400
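The three p-values of the run test follow from the single Z value through the standard normal distribution. The sketch below assumes the usual Wald-Wolfowitz form of the statistic (number of runs minus its expectation, divided by its standard deviation); the source does not print the formula, so this is our reading of the output. Function names are ours.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def runs_z(signs):
    """Wald-Wolfowitz Z statistic for a sequence of positive/negative values."""
    n_pos = sum(1 for s in signs if s > 0)
    n_neg = len(signs) - n_pos
    n = n_pos + n_neg
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if (a > 0) != (b > 0))
    mean = 2.0 * n_pos * n_neg / n + 1.0
    var = 2.0 * n_pos * n_neg * (2.0 * n_pos * n_neg - n) / (n * n * (n - 1.0))
    return (runs - mean) / math.sqrt(var)

def runs_p_values(z):
    """p-values for H1: trend (few runs), H1: oscillation (many runs), either."""
    trend = normal_cdf(z)
    oscillation = 1.0 - normal_cdf(z)
    either = 2.0 * min(trend, oscillation)
    return trend, oscillation, either

p_trend, p_osc, p_either = runs_p_values(0.634838)
print(p_trend, p_osc, p_either)

# A strictly alternating sequence has as many runs as possible, so Z is large.
z_alternating = runs_z([1, -1] * 10)
```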
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,1000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=2.084979
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=1.915021
The D.W. critical values depend on the values of the independent variables,
so obtaining the p value takes a lot of computing time.
The Durbin Watson table is wrong at present.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
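The Durbin Watson statistic itself is cheap to compute; only the critical values are costly. A minimal sketch of the standard formula follows. The two values printed above sum to 4, which matches the usual convention of testing d against positive autocorrelation and 4 - d against negative autocorrelation; that pairing is our reading of the output.

```python
def durbin_watson(residuals):
    """d = sum over t >= 2 of (e_t - e_{t-1})^2, divided by sum of e_t^2."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(e * e for e in residuals)
    return num / den

# Toy residuals that alternate in sign: strong negative autocorrelation.
d_toy = durbin_watson([1.0, -1.0, 1.0, -1.0])
print(d_toy, 4.0 - d_toy)

# The two statistics reported above are mirror images around 2.
d_positive_alt = 2.084979
d_negative_alt = 4.0 - d_positive_alt
print(d_negative_alt)
```

Values of d near 2 correspond to a first-order autocorrelation coefficient near 0, since d is approximately 2(1 - r).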
2. Confidence intervals for the population sigma of the error
90% confidence interval for population variance [14.936741 , 17.315226]
90% confidence interval for population standard deviation [3.864808 , 4.161157]
95% confidence interval for population variance [14.742772 , 17.583407]
95% confidence interval for population standard deviation [3.839632 , 4.193257]
99% confidence interval for population variance [14.377688 , 18.132552]
99% confidence interval for population standard deviation [3.791792 , 4.258233]
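These intervals have the form [SSE / upper chi-square quantile, SSE / lower chi-square quantile]. The printed limits appear consistent with a large-sample normal approximation to the chi-square quantiles, df ± z*sqrt(2*df), applied to the error line of the four-term ANOVA table reported for this example (df = 995, SS = 15958.0890680491). Both the approximation and that pairing are our inference from the numbers, not something the source states; the function name is ours.

```python
import math
from statistics import NormalDist

def variance_ci_normal_approx(sse, df, level):
    """CI for the error variance using chi2 quantiles ~ df +/- z*sqrt(2*df)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * math.sqrt(2.0 * df)
    lower = sse / (df + half_width)
    upper = sse / (df - half_width)
    return lower, upper

sse, df = 15958.0890680491, 995   # error SS and df, assumed to belong to this model
intervals = {}
for level in (0.90, 0.95, 0.99):
    lo, hi = variance_ci_normal_approx(sse, df, level)
    intervals[level] = (lo, hi)
    print(level, lo, hi, math.sqrt(lo), math.sqrt(hi))
```

The interval for the standard deviation is obtained by taking square roots of the variance limits.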
[Figures: residual plot; scatter diagram of (X5 estimated line, X5)]
The steps in which the independent variable functions enter the linear model
step 1, X1 into the linear model, SSR= 109970.4139046841
The estimated line ------
X5= 49.3856531699+2.168077*X1
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 1 109970.4139046841 109970.4139046841 5026.4878893839
error 998 21834.4250482869 21.8781814111
total 999 131804.8389529710
----------------------------------------------------------------------------------
The F test p value=0.000100
MSE=21.8781814111 , R2=0.834343 , R2(adj)=0.834177
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 49.3856531699 0.6555359596 75.33630 0.00000
X1 2.1680765441 0.0065378760 331.61787 0.00000
----------------------------------------------------------------------------------
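The summary figures of this step can be recomputed from the two sums of squares in the ANOVA table. The sketch below uses the textbook definitions of MSE, F, R2 and adjusted R2; the function name `anova_summary` is ours.

```python
def anova_summary(ssr, sse, n, k):
    """Summary statistics of a linear model with k regressors and n samples."""
    sst = ssr + sse
    df_reg, df_err = k, n - k - 1
    msr, mse = ssr / df_reg, sse / df_err
    r2 = ssr / sst
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / df_err
    return {"SST": sst, "MSE": mse, "F": msr / mse, "R2": r2, "R2_adj": r2_adj}

# Step 1 (X1 only), sums of squares copied from the ANOVA table above.
step1 = anova_summary(ssr=109970.4139046841, sse=21834.4250482869, n=1000, k=1)
for key, value in step1.items():
    print(key, value)
```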
Source df SS MS F
----------------------------------------------------------------------------------
Regression 1 59227.2618314910 59227.2618314910 814.4224380609
error 998 72577.5771214800 72.7230231678
total 999 131804.8389529710
----------------------------------------------------------------------------------
The F test p value=0.000100
MSE=72.7230231678 , R2=0.449356 , R2(adj)=0.448804
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 214.9710238916 0.2141637910 1003.76923 0.00000
X3^3 0.0000383022 0.0000001574 243.36652 0.00000
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
The F test p value=0.000100
MSE=16.8481544568 , R2=0.872557 , R2(adj)=0.872301
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 49.2429693450 0.6555390426 75.11829 0.00000
X1 2.1704326025 0.0065379603 331.97396 0.00000
X2*X2*Cos(X2*pi) 0.0003139506 0.0000044237 70.97052 0.00000
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 56.1186402713 0.7804182270 71.90842 0.00000
X1 2.0562247214 0.0096038114 214.10507 0.00000
X4^3 0.0000038173 0.0000002401 15.89963 0.00000
----------------------------------------------------------------------------------
intercept 193.8085737983 0.2888939230 670.86414 0.00000
X2*X2*Cos(X2*pi) 0.0002981273 0.0000044237 67.39256 0.00000
X4^2 0.0065736582 0.0000259285 253.53035 0.00000
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 53.3699085295 0.6735605615 79.23550 0.00000
X1 2.0161536896 0.0087303376 230.93651 0.00000
X2*X2*Cos(X2*pi) 0.0003058271 0.0000044342 68.97079 0.00000
X3^2 0.0009310891 0.0000349171 26.66574 0.00000
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 58.4351073897 0.7199516097 81.16533 0.00000
X1 1.9893993937 0.0087358097 227.72925 0.00000
X3^3 0.0000065801 0.0000002102 31.30427 0.00000
X4*Sin(X4*pi) -0.0049791827 0.0004191549 -11.87910 0.00000
----------------------------------------------------------------------------------
error 995 15958.0890680491 16.0382804704
total 999 131804.8389529710
----------------------------------------------------------------------------------
The F test p value=0.000100
MSE=16.0382804704 , R2=0.878926 , R2(adj)=0.878440
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 53.1089278285 0.6739540370 78.80200 0.00000
X1 2.0198062914 0.0087362837 231.19742 0.00000
X2*X2*Cos(X2*pi) 0.0003055191 0.0000044342 68.90005 0.00000
X3^2 0.0009232908 0.0000349238 26.43727 0.00000
X4*Sin(X4*pi) -0.0047511270 0.0004191928 -11.33399 0.00000
----------------------------------------------------------------------------------
class    lower limit  upper limit  observed no       probability  expected no       chi square
[ 1 ]                 -6.58003     4998027.00000     0.05000      5000000.00000     0.77855
[ 2 ]    -6.58003     -5.12671     5001867.00000     0.05000      5000000.00000     0.69714
[ 3 ]    -5.12671     -4.14596     4999790.00000     0.05000      5000000.00000     0.00882
[ 4 ]    -4.14596     -3.36666     5000871.00000     0.05000      5000000.00000     0.15173
[ 5 ]    -3.36666     -2.69803     5000036.00000     0.05000      5000000.00000     0.00026
[ 6 ]    -2.69803     -2.09764     4998466.00000     0.05000      5000000.00000     0.47063
[ 7 ]    -2.09764     -1.54125     5003401.00000     0.05000      5000000.00000     2.31336
[ 8 ]    -1.54125     -1.01436     4985695.00000     0.05000      5000000.00000     40.92661
[ 9 ]    -1.01436     -0.50248     5013290.00000     0.05000      5000000.00000     35.32482
[ 10 ]   -0.50248     -0.00091     4990655.00000     0.05000      5000000.00000     17.46581
[ 11 ]   -0.00091     0.50177      5001686.00000     0.05000      5000000.00000     0.56852
[ 12 ]   0.50177      1.01336      5006678.00000     0.05000      5000000.00000     8.91914
[ 13 ]   1.01336      1.54122      4997993.00000     0.05000      5000000.00000     0.80561
[ 14 ]   1.54122      2.09764      5001607.00000     0.05000      5000000.00000     0.51649
[ 15 ]   2.09764      2.69787      5000595.00000     0.05000      5000000.00000     0.07081
[ 16 ]   2.69787      3.36641      4999887.00000     0.05000      5000000.00000     0.00255
[ 17 ]   3.36641      4.14578      4996942.00000     0.05000      5000000.00000     1.87027
[ 18 ]   4.14578      5.12629      4999684.00000     0.05000      5000000.00000     0.01997
[ 19 ]   5.12629      6.57984      5000622.00000     0.05000      5000000.00000     0.07738
[ 20 ]   6.57984                   5002208.00000     0.05000      5000000.00000     0.97505
degree of freedom=18
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =111.963500
p-value=0.000000
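With n = 100,000,000 the Pearson statistic is large even though every class count is within about 0.3% of its expected value. The sketch below recomputes the statistic from the observed counts above and adds Cohen's w = sqrt(chi-square / n) as a measure of how large the departure is. The effect-size measure is our addition and is not used by the source.

```python
import math

# Observed class counts, copied from the goodness-of-fit output above.
observed = [
    4998027, 5001867, 4999790, 5000871, 5000036, 4998466, 5003401,
    4985695, 5013290, 4990655, 5001686, 5006678, 4997993, 5001607,
    5000595, 4999887, 4996942, 4999684, 5000622, 5002208,
]

n = sum(observed)
expected = n / len(observed)          # equal-probability classes
stat = sum((o - expected) ** 2 / expected for o in observed)

effect_size_w = math.sqrt(stat / n)   # Cohen's w (our addition)
max_rel_dev = max(abs(o - expected) / expected for o in observed)

print(n, stat, effect_size_w, max_rel_dev)
```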
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=50001274
number of the positive of residual=49998726
H0: residual is random , H1: Increasing line or decreasing line
Z=0.198806, p-value=0.578800
H0: residual is random , H1: Oscillation
Z=0.198806, p-value=0.421200
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=0.198806, p-value=0.842400
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t), t=2,3,...,100000000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0 D.W. test=1.999933
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0 D.W. test=2.000067
2. Confidence intervals for the population sigma of the error
90% confidence interval for population variance [15.998270 , 16.005714]
90% confidence interval for population standard deviation [3.999784 , 4.000714]
95% confidence interval for population variance [15.997557 , 16.006428]
95% confidence interval for population standard deviation [3.999695 , 4.000803]
99% confidence interval for population variance [15.996163 , 16.007823]
99% confidence interval for population standard deviation [3.999520 , 4.000978]
sample mean(X5 estimated value)= 266.1995,
sample variance(X5 estimated value)= 124.3788,
sample mean(residual)= -0.0000, sample variance(residual)= 16.0020,
sample cov(X5 estimated value,residual)= 0.0000,
X5 estimated value and residual sample correlation coefficient=0.0000.
sample mean(X5 estimated value)= 266.1995,
sample variance(X5 estimated value)= 124.3788,
sample mean(X5)= 266.1995, sample variance(X5)= 140.3808,
sample cov(X5 estimated value,X5)= 124.3788,
X5 estimated value and X5 sample correlation coefficient=0.9413.
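The sample moments above satisfy the usual least-squares identities: the fitted values and residuals are uncorrelated, the variance of X5 splits into fitted and residual parts, and the squared correlation between X5 and its fitted value equals the share of variance explained. A short arithmetic check with the reported numbers (variable names are ours):

```python
import math

# Sample moments copied from the output above.
var_fitted = 124.3788
var_resid = 16.0020
var_y = 140.3808
cov_fitted_y = 124.3788

# Variance decomposition: Var(X5) = Var(fitted) + Var(residual).
decomposition_gap = var_fitted + var_resid - var_y

# Correlation between X5 and its fitted value, and the explained share.
corr_fitted_y = cov_fitted_y / math.sqrt(var_fitted * var_y)
explained_share = var_fitted / var_y

print(decomposition_gap, corr_fitted_y, corr_fitted_y ** 2, explained_share)
```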
(33.2.3) residual analysis,
X0=residual, residual marginal probability distribution
Mathematical Mean: -0.00000
Geometrical Mean : none
Harmonic Mean : none
Variance : 16.00199
S.D. : 4.00025
Skewed Coef. : 0.00007
Kurtosis Coef. : 3.00004
MAD : 3.19168
Range : 44.76841
Mid_range : 0.46503
Median : -0.00013
Q1 : -2.69810
Q2 : -0.00013
Q3 : 2.69779
IQR : 5.39589
C.V. : none
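The residual summary can be compared with what a Normal(0, sigma^2) distribution with the reported S.D. would give. The sketch below assumes that "MAD" in the output is the mean absolute deviation, which is what the reported value suggests; this is our reading, not stated in the source.

```python
import math
from statistics import NormalDist

sd = 4.00025                      # S.D. of the residual reported above
model = NormalDist(mu=0.0, sigma=sd)

q1 = model.inv_cdf(0.25)
q3 = model.inv_cdf(0.75)
iqr = q3 - q1
# Mean absolute deviation of a normal distribution: sigma * sqrt(2 / pi).
mean_abs_dev = sd * math.sqrt(2.0 / math.pi)

print(q1, q3, iqr, mean_abs_dev)
```

The theoretical quartiles, IQR and mean absolute deviation agree with the reported Q1, Q3, IQR and MAD to about three decimal places.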
Y2=Cos(X2*pi) marginal probability distribution
Mathematical Mean: 0.00002
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.50006
S.D. : 0.70715
Skewed Coef. : 0.00002
Kurtosis Coef. : 1.49985
MAD : 0.63668
Range : 2.00000
Mid_range : 0.00000
Median : -0.00005
Q1 : -0.70709
Q2 : -0.00005
Q3 : 0.70726
IQR : 1.41436
C.V. : none
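The summary of Y2 matches the distribution of the cosine of a uniformly distributed phase: variance 1/2, kurtosis 1.5, quartiles at ±cos(pi/4) and mean absolute deviation 2/pi. The sketch below checks these values numerically. Treating X2*pi modulo 2*pi as uniform is our assumption; it is plausible here because X2 varies over many units.

```python
import math

def phase_moments(n_grid=100000):
    """Moments of cos(theta) for theta uniform on [0, 2*pi), midpoint rule."""
    vals = [math.cos(2.0 * math.pi * (i + 0.5) / n_grid) for i in range(n_grid)]
    mean = sum(vals) / n_grid
    var = sum((v - mean) ** 2 for v in vals) / n_grid
    m4 = sum((v - mean) ** 4 for v in vals) / n_grid
    mad = sum(abs(v - mean) for v in vals) / n_grid
    return mean, var, m4 / var ** 2, mad

mean, var, kurtosis, mad = phase_moments()

# P(cos(theta) <= y) = 1 - arccos(y) / pi, so the quartiles are:
q1 = math.cos(0.75 * math.pi)
q3 = math.cos(0.25 * math.pi)

print(mean, var, kurtosis, mad, q1, q3)
```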
Y3=|X3|^0.5 marginal probability distribution
Mathematical Mean: 10.48478
Geometrical Mean : 10.48148
Harmonic Mean : 10.47817
Variance : 0.06904
S.D. : 0.26276
Skewed Coef. : -0.07545
Kurtosis Coef. : 3.01479
MAD : 0.20957
Range : 3.06716
Mid_range : 10.42838
Median : 10.48810
Q1 : 10.30951
Q2 : 10.48810
Q3 : 10.66365
IQR : 0.35414
C.V. : 0.02506
Y4=log(X4) marginal probability distribution
Mathematical Mean: 4.65244
Geometrical Mean : 4.65211
Harmonic Mean : 4.65179
Variance : 0.00303
S.D. : 0.05508
Skewed Coef. : -0.16658
Kurtosis Coef. : 3.06065
MAD : 0.04389
Range : 0.66743
Mid_range : 4.60436
Median : 4.65395
Q1 : 4.61624
Q2 : 4.65395
Q3 : 4.69030
IQR : 0.07406
C.V. : 0.01184
Y5=X5 marginal probability distribution
Mathematical Mean: 266.19954
Geometrical Mean : 265.93517
Harmonic Mean : 265.66997
Variance : 140.38075
S.D. : 11.84824
Skewed Coef. : -0.00492
Kurtosis Coef. : 2.99901
MAD : 9.45415
Range : 133.04969
Mid_range : 265.89242
Median : 266.20948
Q1 : 258.21112
Q2 : 266.20948
Q3 : 274.19887
IQR : 15.98774
C.V. : 0.04451
(33.2.5)The joint probability distribution,
The joint probability distribution of X5 with each of X1, Cos(X2*pi), |X3|^0.5 and log(X4).
f(y1,y5),Y1=X1,Y5=X5, f(y5,y1)
f(y3,y5),Y3=|X3|^0.5,Y5=X5, f(y5,y3)
sample mean(Y3)= 10.4848, sample variance(Y3)= 0.0690,
sample mean(Y5)= 266.1995, sample variance(Y5)= 140.3808,
sample cov(Y3,Y5)=2.1070, Y3 and Y5 sample correlation coefficient=0.6768.
f(y4,y5),Y4=log(X4),Y5=X5, f(y5,y4)
ANOVA
----------------------------------------------------------------------------------
Source df SS MS
----------------------------------------------------------------------------------
Regression 4 2223226529.0748138000 555806632.2687034600
error 99999995 276557964.4251456300 2.7655797825
total 99999999 2499784493.4999595000
----------------------------------------------------------------------------------
F test statistic=200972915.6177705500
The F test p value=0.000100
MSE=2.7655797825 , R2=0.889367 , R2(adj)=0.889367
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept -7.0882470635 0.0024553389 -2886.87113 0.00000
Cos(X2*pi) -1.0365780832 0.0001474203 -7031.44887 0.00000
X3^2 0.0001873891 0.0000001154 1623.35288 0.00000
X4 0.1218840691 0.0000249898 4877.35611 0.00000
X5 0.3456682051 0.0000138827 24899.20961 0.00000
----------------------------------------------------------------------------------
The correlation matrix is below
r(X3,X1)=0.681029,r(X3,X2)=0.669036,r(X3,X4/(1-X4))=0.576200,r(X3,X5)=0.676858,
r(X1,X2)=0.529988,r(X1,X4/(1-X4))=0.735363,r(X1,X5)=0.921517,r(X2,X4/(1-X4))=0.565801,
r(X2,X5)=0.519779,r(X4/(1-X4),X5)=0.694989,
The steps in which the independent variable functions enter the linear model
step 1, X1 into the linear model, SSR=1405891347.1167312000
step 2, X2 into the linear model, SSR=400134517.2289078200
step 3, X5 into the linear model, SSR= 26042734.3775551320
step 4, X4/(1-X4) into the linear model, SSR= 40385.9280145168
The estimated line ------
X3= -53.6223818064+0.266098*X1+0.489453*X2-57.804549*X4/(1-X4)+0.111590*X5
ANOVA
----------------------------------------------------------------------------------
Source df SS MS
----------------------------------------------------------------------------------
Regression 4 1832108984.6512086000 458027246.1628021600
error 99999995 1199137009.6899021000 11.9913706965
total 99999999 3031245994.3411107000
----------------------------------------------------------------------------------
F test statistic=38196404.5442885760
The F test p value=0.000100
MSE=11.9913706965 , R2=0.604408 , R2(adj)=0.604408
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept -53.6223818064 0.2931435360 -182.92193 0.00000
X1 0.2660983112 0.0000548225 4853.81603 0.00000
X2 0.4894527205 0.0000263447 18578.82396 0.00000
X4/(1-X4) -57.8045492172 0.2876381258 -200.96275 0.00000
X5 0.1115896524 0.0000218496 5107.16261 0.00000
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept -78.9549067020 0.3454075797 -228.58475 0.00000
X1 0.6353919780 0.0000526114 12077.07480 0.00000
X2 0.2995868991 0.0000287494 10420.62338 0.00000
X3/(1-X3) -72.8697409630 0.3383507794 -215.36744 0.00000
|X5|^0.5 1.0371012120 0.0007190906 1442.23997 0.00000
----------------------------------------------------------------------------------
6.5. Non-linear model where the independent variable is a sample
statistic; the other assumptions are unchanged.
Example 34,
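X_1, X_2, \ldots, X_{10} \overset{iid}{\sim} \mathrm{Normal}(\mu_{X_i} = 100,\ \sigma^2_{X_i} = 25),
X_{11} = \text{sample mid-range}(X_1, X_2, \ldots, X_{10}) + \varepsilon,
\varepsilon \sim \mathrm{Normal}(\mu_\varepsilon = 0,\ \sigma^2_\varepsilon = 16)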
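The model of Example 34 can be simulated in a few lines, which also makes the definition of the sample mid-range explicit: the average of the smallest and the largest of the ten values. The function name, sample size and seed below are illustrative choices of ours.

```python
import random

def simulate_example_34(n, seed=0):
    """Draw n paired samples ((X1, ..., X10), X11) from the Example 34 model."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        xs = [rng.gauss(100, 5) for _ in range(10)]   # iid Normal(100, 25)
        mid_range = (min(xs) + max(xs)) / 2.0         # sample mid-range
        x11 = mid_range + rng.gauss(0, 4)             # error variance 16
        data.append((xs, x11))
    return data

data = simulate_example_34(5000, seed=1)
x11 = [row[1] for row in data]
mean_x11 = sum(x11) / len(x11)
var_x11 = sum((v - mean_x11) ** 2 for v in x11) / (len(x11) - 1)
print(mean_x11, var_x11)
```

Because X11 depends on X1,…,X10 only through their minimum and maximum, each single Xi carries little information about X11, which is consistent with the small pairwise correlations reported in this example.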
(34.1) paired samples, n=1000,
(34.1.1)The linear model analysis,
Dependent variable is X11,
Independent variables are X1,X2,X3,X4,X5,X6,X7,X8,X9,X10
The correlation matrix is below
r(X11,X1)=0.156999,r(X11,X2)=0.118742,r(X11,X3)=0.120827,r(X11,X4)=0.119763,
r(X11,X5)=0.073588,r(X11,X6)=0.111077,r(X11,X7)=0.139506,r(X11,X8)=0.135484,
r(X11,X9)=0.091303,r(X11,X10)=0.099970,r(X1,X2)=-0.022653,r(X1,X3)=-0.006942,
r(X1,X4)=0.002438,r(X1,X5)=-0.014813,r(X1,X6)=-0.011543,r(X1,X7)=0.019416,
r(X1,X8)=0.009116,r(X1,X9)=0.032938,r(X1,X10)=-0.043615,r(X2,X3)=-0.045026,
r(X2,X4)=-0.015778,r(X2,X5)=0.039732,r(X2,X6)=0.007813,r(X2,X7)=0.065894,
r(X2,X8)=-0.011657,r(X2,X9)=-0.025933,r(X2,X10)=-0.027953,r(X3,X4)=-0.026932,
r(X3,X5)=0.023902,r(X3,X6)=-0.045622,r(X3,X7)=0.018674,r(X3,X8)=0.036982,
r(X3,X9)=0.006055,r(X3,X10)=-0.024494,r(X4,X5)=-0.005415,r(X4,X6)=-0.054387,
r(X4,X7)=0.016722,r(X4,X8)=0.071585,r(X4,X9)=0.039967,r(X4,X10)=0.056471,
r(X5,X6)=0.018856,r(X5,X7)=0.000047,r(X5,X8)=-0.037696,r(X5,X9)=0.000259,
r(X5,X10)=-0.006063,r(X6,X7)=0.024971,r(X6,X8)=-0.025989,r(X6,X9)=0.024292,
r(X6,X10)=0.011157,r(X7,X8)=-0.000994,r(X7,X9)=0.041997,r(X7,X10)=-0.019164,
r(X8,X9)=0.012759,r(X8,X10)=-0.010528,r(X9,X10)=-0.035934,
X3 0.1214958597 0.0277529953 4.37776 0.00000
X4 0.1036262427 0.0277277976 3.73727 0.00000
X5 0.0702971007 0.0289345571 2.42952 0.01500
X6 0.1105042678 0.0273097594 4.04633 0.00000
X7 0.1090938389 0.0269458775 4.04863 0.00000
X8 0.1181866415 0.0271613650 4.35128 0.00000
X9 0.0756706102 0.0285400874 2.65138 0.00800
X10 0.1064396116 0.0278894303 3.81649 0.00000
----------------------------------------------------------------------------------
MSE=18.7368714852 , R2=0.140411 , R2(adj)=0.131720
dependent variable:X11 , sample mean=100.1147385908 , sample variance=21.579284
independent variable:X1 , sample mean= 99.8994901460 , sample variance=26.094982
independent variable:X2 , sample mean=100.1588541820 , sample variance=26.182831
independent variable:X3 , sample mean=100.2038710839 , sample variance=24.555726
independent variable:X4 , sample mean=100.0923253433 , sample variance=24.754496
independent variable:X5 , sample mean=99.9437070746 , sample variance=22.498640
independent variable:X6 , sample mean=99.9688953333 , sample variance=25.342675
independent variable:X7 , sample mean=99.9323338299 , sample variance=26.047528
independent variable:X8 , sample mean= 99.9250692399 , sample variance=25.650844
independent variable:X9 , sample mean=99.7190029442 , sample variance=23.194495
independent variable:X10 , sample mean=99.8491622199 , sample variance=24.320404
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b6,b0)= -0.0779477845, Cov(b6,b1)= 0.0000088101, Cov(b6,b2)= -0.0000021588,
Cov(b6,b3)= 0.0000357009, Cov(b6,b4)= 0.0000425538, Cov(b6,b5)= -0.0000148209,
Var(b6)= 0.0007458230, Cov(b6,b7)= -0.0000190907, Cov(b6,b8)= 0.0000144784,
Cov(b6,b9)= -0.0000210208, Cov(b6,b10)= -0.0000107506,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b7,b0)= -0.0596499553, Cov(b7,b1)= -0.0000140049, Cov(b7,b2)= -0.0000493941,
Cov(b7,b3)= -0.0000172970, Cov(b7,b4)= -0.0000143101, Cov(b7,b5)= 0.0000027538,
Cov(b7,b6)= -0.0000190907, Var(b7)= 0.0007260803, Cov(b7,b8)= 0.0000020155,
Cov(b7,b9)= -0.0000315939, Cov(b7,b10)= 0.0000118710,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b8,b0)= -0.0702830267, Cov(b8,b1)= -0.0000054397, Cov(b8,b2)= 0.0000050626,
Cov(b8,b3)= -0.0000289005, Cov(b8,b4)= -0.0000538034, Cov(b8,b5)= 0.0000295819,
Cov(b8,b6)= 0.0000144784, Cov(b8,b7)= 0.0000020155, Var(b8)= 0.0007377398,
Cov(b8,b9)= -0.0000072636, Cov(b8,b10)= 0.0000100245,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b9,b0)= -0.0742002508, Cov(b9,b1)= -0.0000230901, Cov(b9,b2)= 0.0000216663,
Cov(b9,b3)= -0.0000041872, Cov(b9,b4)= -0.0000329566, Cov(b9,b5)= -0.0000012781,
Cov(b9,b6)= -0.0000210208, Cov(b9,b7)= -0.0000315939, Cov(b9,b8)= -0.0000072636,
Var(b9)= 0.0008145366, Cov(b9,b10)= 0.0000294704,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b10,b0)= -0.0847615428, Cov(b10,b1)= 0.0000321724, Cov(b10,b2)= 0.0000217277,
Cov(b10,b3)= 0.0000175929, Cov(b10,b4)= -0.0000454363, Cov(b10,b5)= 0.0000043977,
Cov(b10,b6)= -0.0000107506, Cov(b10,b7)= 0.0000118710, Cov(b10,b8)= 0.0000100245,
Cov(b10,b9)= 0.0000294704, Var(b10)= 0.0007778203,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ individual test ~~~~~~~~~~~~~~
variable coefficient standard error t value F value
intercept -7.4503312427 8.6670028613 -0.8596 0.7389
X1 slope 0.1470880113 0.0268668371 5.4747 29.9724
X2 slope 0.1134822176 0.0269079848 4.2174 17.7866
X3 slope 0.1214958597 0.0277529953 4.3778 19.1648
X4 slope 0.1036262427 0.0277277976 3.7373 13.9672
X5 slope 0.0702971007 0.0289345571 2.4295 5.9026
X6 slope 0.1105042678 0.0273097594 4.0463 16.3728
X7 slope 0.1090938389 0.0269458775 4.0486 16.3914
X8 slope 0.1181866415 0.0271613650 4.3513 18.9336
X9 slope 0.0756706102 0.0285400874 2.6514 7.0298
X10 slope 0.1064396116 0.0278894303 3.8165 14.5656
====================
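Each row of the individual test table follows from the coefficient and its standard error: t = coefficient / standard error and F = t^2. The sketch below recomputes three rows and adds a two-sided p-value from the normal approximation, which we assume is adequate here because the error degrees of freedom (989) are large. Variable and function names are ours.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# (coefficient, standard error) copied from the table above.
rows = {
    "X1": (0.1470880113, 0.0268668371),
    "X5": (0.0702971007, 0.0289345571),
    "X9": (0.0756706102, 0.0285400874),
}

results = {}
for name, (coef, se) in rows.items():
    t = coef / se
    results[name] = {
        "t": t,
        "F": t * t,
        # Two-sided p-value, normal approximation to the t distribution.
        "p": 2.0 * (1.0 - normal_cdf(abs(t))),
    }
    print(name, results[name])
```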
[checking the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class    lower limit  upper limit  observed no  probability  expected no  chi square
[ 1 ]                 -5.54753     112.00000    0.10000      100.00000    1.44000
[ 2 ]    -5.54753     -3.64302     85.00000     0.10000      100.00000    2.25000
[ 3 ]    -3.64302     -2.26983     101.00000    0.10000      100.00000    0.01000
[ 4 ]    -2.26983     -1.09651     95.00000     0.10000      100.00000    0.25000
[ 5 ]    -1.09651     0.00011      104.00000    0.10000      100.00000    0.16000
[ 6 ]    0.00011      1.09655      103.00000    0.10000      100.00000    0.09000
[ 7 ]    1.09655      2.26983      86.00000     0.10000      100.00000    1.96000
[ 8 ]    2.26983      3.64120      112.00000    0.10000      100.00000    1.44000
[ 9 ]    3.64120      5.54709      110.00000    0.10000      100.00000    1.00000
[ 10 ]   5.54709                   92.00000     0.10000      100.00000    0.64000
degree of freedom=8
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =9.240000
p-value=0.322400
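The class limits of this goodness-of-fit test are close to the equal-probability quantiles of a Normal(0, sigma^2) distribution with sigma set to the square root of the MSE. A minimal sketch of that construction follows; the choice of sigma estimate is our inference from the printed limits, and the function names are ours.

```python
import math
from statistics import NormalDist

def equal_probability_limits(sigma, n_classes):
    """Interior class limits that give each class probability 1/n_classes."""
    model = NormalDist(0.0, sigma)
    return [model.inv_cdf(k / n_classes) for k in range(1, n_classes)]

def count_by_class(values, limits):
    """Count how many values fall into each class defined by sorted limits."""
    counts = [0] * (len(limits) + 1)
    for v in values:
        idx = sum(1 for lim in limits if v > lim)
        counts[idx] += 1
    return counts

sigma_hat = math.sqrt(18.7368714852)   # square root of the MSE reported above
limits = equal_probability_limits(sigma_hat, 10)
print([round(x, 5) for x in limits])

# Tiny usage example with three made-up residuals.
toy_counts = count_by_class([-10.0, 0.5, 10.0], limits)
```

With ten classes and one estimated parameter (sigma), the chi-square statistic has 10 - 1 - 1 = 8 degrees of freedom, as in the output above.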
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=497
number of the positive of residual=503
H0: residual is random , H1: Increasing line or decreasing line
Z=-0.188700, p-value=0.425200
H0: residual is random , H1: Oscillation
Z=-0.188700, p-value=0.574800
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-0.188700, p-value=0.850400
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,1000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=1.926742
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=2.073258
The D.W. critical values depend on the values of the independent variables,
so obtaining the p value takes a lot of computing time.
The Durbin Watson table is wrong at present.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
(34.1.2)Non-linear model analysis,
Dependent variable is X11,
Independent variables are
X1/(1-X1),X2/(1-X2),X3^3,X4^3,X5^3,X6/(1-X6),X7/(1-X7),X8^3,X9/(1-X9),X10/(1-X10),
The correlation matrix is below
r(X11,X1/(1-X1))=0.159839,r(X11,X2/(1-X2))=0.120176,r(X11,X3^3)=0.122788,
r(X11,X4^3)=0.120656,r(X11,X5^3)=0.076117,r(X11,X6/(1-X6))=0.113402,
r(X11,X7/(1-X7))=0.142056,r(X11,X8^3)=0.137334,r(X11,X9/(1-X9))=0.093740,
r(X11,X10/(1-X10))=0.102837,r(X1/(1-X1),X2/(1-X2))=-0.020878,r(X1/(1-X1),X3^3)=-0.005132,
r(X1/(1-X1),X4^3)=0.002852,r(X1/(1-X1),X5^3)=-0.016269,r(X1/(1-X1),X6/(1-X6))=-0.010289,
r(X1/(1-X1),X7/(1-X7))=0.021956,r(X1/(1-X1),X8^3)=0.011516,r(X1/(1-X1),X9/(1-X9))=0.036457,
r(X1/(1-X1),X10/(1-X10))=-0.041011,r(X2/(1-X2),X3^3)=-0.043579,r(X2/(1-X2),X4^3)=-0.013706,
r(X2/(1-X2),X5^3)=0.042441,r(X2/(1-X2),X6/(1-X6))=0.009398,
r(X2/(1-X2),X7/(1-X7))=0.065409,r(X2/(1-X2),X8^3)=-0.013791,
r(X2/(1-X2),X9/(1-X9))=-0.026665,r(X2/(1-X2),X10/(1-X10))=-0.027921,r(X3^3,X4^3)=-0.024639,
r(X3^3,X5^3)=0.023045,r(X3^3,X6/(1-X6))=-0.040099,r(X3^3,X7/(1-X7))=0.016333,
r(X3^3,X8^3)=0.038874,r(X3^3,X9/(1-X9))=0.009999,r(X3^3,X10/(1-X10))=-0.019516,
r(X4^3,X5^3)=-0.002575,r(X4^3,X6/(1-X6))=-0.058136,r(X4^3,X7/(1-X7))=0.017290,
r(X4^3,X8^3)=0.073642,r(X4^3,X9/(1-X9))=0.039448,r(X4^3,X10/(1-X10))=0.056368,
r(X5^3,X6/(1-X6))=0.017163,r(X5^3,X7/(1-X7))=0.002357,r(X5^3,X8^3)=-0.038543,
r(X5^3,X9/(1-X9))=0.001146,r(X5^3,X10/(1-X10))=-0.003904,
r(X6/(1-X6),X7/(1-X7))=0.019963,r(X6/(1-X6),X8^3)=-0.024927,
r(X6/(1-X6),X9/(1-X9))=0.019955,r(X6/(1-X6),X10/(1-X10))=0.010906,
r(X7/(1-X7),X8^3)=-0.000673,r(X7/(1-X7),X9/(1-X9))=0.040521,
r(X7/(1-X7),X10/(1-X10))=-0.023087,r(X8^3,X9/(1-X9))=0.009972,
r(X8^3,X10/(1-X10))=-0.007249,r(X9/(1-X9),X10/(1-X10))=-0.038285,
The steps in which the independent variable functions enter the linear model
The functional form of one or more independent variables has been changed,
so the input order is not meaningful.
step 1, X1/(1-X1) into the linear model, SSR= 550.7637684560
step 2, X7/(1-X7) into the linear model, SSR= 414.0005835120
step 3, X8^3 into the linear model, SSR= 396.5661458599
step 4, X2/(1-X2) into the linear model, SSR= 292.5981016362
step 5, X3^3 into the linear model, SSR= 317.9405210125
step 6, X6/(1-X6) into the linear model, SSR= 308.6943160358
step 7, X4^3 into the linear model, SSR= 312.5261301144
step 8, X10/(1-X10) into the linear model, SSR= 267.1628304095
step 9, X9/(1-X9) into the linear model, SSR= 141.3022651843
step 10, X5^3 into the linear model, SSR= 116.6332214324
The estimated line ------
X11=6655.2612996575+1427.635572*X1/(1-X1)+1109.290130*X2/(1-X2)+0.000004*X3^3
+0.000003*X4^3+0.000002*X5^3+1086.414201*X6/(1-X6)+1080.517700*X7/(1-X7)
+0.000004*X8^3+752.276595*X9/(1-X9)+1047.002574*X10/(1-X10)
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 10 3118.1878836530 311.8187883653 16.7243417739
error 989 18439.5168349765 18.6446075177
total 999 21557.7047186295
----------------------------------------------------------------------------------
The F test p value=0.000100
MSE=18.6446075177 , R2=0.144644 , R2(adj)=0.135995
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 6655.2612996575 150.4263295500 44.24266 0.00000
X1/(1-X1) 1427.6355718509 59.8433526094 23.85621 0.00000
X2/(1-X2) 1109.2901296678 60.4827853500 18.34059 0.00000
X3^3 0.0000040097 0.0000002118 18.92945 0.00000
X4^3 0.0000034308 0.0000002121 16.17560 0.00000
X5^3 0.0000023940 0.0000002217 10.79969 0.00000
X6/(1-X6) 1086.4142005958 60.7742339354 17.87623 0.00000
X7/(1-X7) 1080.5176999974 60.0325350419 17.99887 0.00000
X8^3 0.0000039386 0.0000002073 18.99554 0.00000
X9/(1-X9) 752.2765951882 63.4245585025 11.86097 0.00000
X10/(1-X10) 1047.0025740547 62.1957122237 16.83400 0.00000
----------------------------------------------------------------------------------
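Over the narrow range of the data (each Xi is centred near 100 with standard deviation near 5), the transformations X/(1-X) and X^3 are almost linear, so each non-linear coefficient implies a local slope with respect to Xi. The sketch below evaluates that local slope at Xi = 100 by the chain rule; the evaluation point and the function names are our choices.

```python
def slope_ratio(x):
    """Derivative of x / (1 - x) with respect to x."""
    return 1.0 / (1.0 - x) ** 2

def slope_cube(x):
    """Derivative of x^3 with respect to x."""
    return 3.0 * x ** 2

x0 = 100.0   # assumed evaluation point, close to every sample mean

# Non-linear coefficients copied from the table above.
implied_slope_x1 = 1427.6355718509 * slope_ratio(x0)   # term X1/(1-X1)
implied_slope_x3 = 0.0000040097 * slope_cube(x0)       # term X3^3

print(implied_slope_x1, implied_slope_x3)
```

The implied local slopes are about 0.146 for X1 and 0.120 for X3, close to the slopes 0.1470880113 and 0.1214958597 estimated in the linear model of (34.1.1).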
[checking the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class    lower limit  upper limit  observed no  probability  expected no  chi square
[ 1 ]                 -5.53386     111.00000    0.10000      100.00000    1.21000
[ 2 ]    -5.53386     -3.63404     82.00000     0.10000      100.00000    3.24000
[ 3 ]    -3.63404     -2.26423     102.00000    0.10000      100.00000    0.04000
[ 4 ]    -2.26423     -1.09380     102.00000    0.10000      100.00000    0.04000
[ 5 ]    -1.09380     0.00011      95.00000     0.10000      100.00000    0.25000
[ 6 ]    0.00011      1.09384      110.00000    0.10000      100.00000    1.00000
[ 7 ]    1.09384      2.26424      85.00000     0.10000      100.00000    2.25000
[ 8 ]    2.26424      3.63222      109.00000    0.10000      100.00000    0.81000
[ 9 ]    3.63222      5.53341      113.00000    0.10000      100.00000    1.69000
[ 10 ]   5.53341                   91.00000     0.10000      100.00000    0.81000
degree of freedom=8
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =11.340000
p-value=0.183100
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=492
number of the positive of residual=508
H0: residual is random , H1: Increasing line or decreasing line
Z=0.197982, p-value=0.578500
H0: residual is random , H1: Oscillation
Z=0.197982, p-value=0.421500
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=0.197982, p-value=0.843000
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,1000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=1.920537
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=2.079463
The D.W. critical values depend on the values of the independent variables,
so obtaining the p value takes a lot of computing time.
The Durbin Watson table is wrong at present.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
2. Confidence intervals for the population sigma of the error
90% confidence interval for population variance [17.360449 , 20.133921]
90% confidence interval for population standard deviation [4.166587 , 4.487084]
95% confidence interval for population variance [17.134379 , 20.446794]
95% confidence interval for population standard deviation [4.139369 , 4.521813]
99% confidence interval for population variance [16.708918 , 21.087552]
99% confidence interval for population standard deviation [4.087654 , 4.592118]
[Figures: residual plot; scatter diagram of (X11 estimated line, X11)]
SSR of stepwise in the linear model and in the non-linear model
step     linear model term  SSR              non-linear model term  SSR
step 1   X1                 531.3664409936   X1/(1-X1)              550.7637684560
step 2   X7                 401.5706605275   X7/(1-X7)              414.0005835120
step 3   X8                 388.3547404849   X8^3                   396.5661458599
step 4   X2                 285.3976832166   X2/(1-X2)              292.5981016362
step 5   X3                 309.9942976198   X3^3                   317.9405210125
step 6   X6                 299.5827128196   X6/(1-X6)              308.6943160358
step 7   X4                 310.4347613610   X4^3                   312.5261301144
step 8   X10                257.5509009630   X10/(1-X10)            267.1628304095
step 9   X9                 132.0909257631   X9/(1-X9)              141.3022651843
step 10  X5                 110.5956960362   X5^3                   116.6332214324
The SSR values of the linear model and the non-linear model are unequal but very close.
The estimated slopes of the linear model are all nearly equal, which suggests that
X1,…,X10 act through a function of central tendency. The sample statistics of central
tendency are the sample median, the sample mean and the sample midrange, so we reconstruct
the linear model with a sample statistic of central tendency as the independent variable.
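As a small illustration of this step, the sketch below computes the three central-tendency statistics for one observation of (X1,…,X10) and checks that a linear model with ten equal slopes of 0.1 is the same thing as the sample mean. The row of values and the common slope 0.1 are hypothetical, chosen only for the example.

```python
from statistics import mean, median

def midrange(values):
    """Average of the smallest and the largest value."""
    return (min(values) + max(values)) / 2.0

def central_tendency(row):
    """The three candidate independent variables built from one row of X1..X10."""
    return {"median": median(row), "mean": mean(row), "midrange": midrange(row)}

# Made-up values for one observation of X1..X10.
row = [96.2, 103.5, 99.1, 101.7, 94.8, 105.3, 100.4, 98.6, 102.9, 97.5]
stats = central_tendency(row)

# Ten equal slopes of 0.1 (hypothetical) reproduce the sample mean.
equal_weight_sum = sum(0.1 * x for x in row)
print(stats, equal_weight_sum)
```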
(34.1.3)
The independent variable is a sample statistic of central tendency and the dependent
variable is X11.
(34.1.3.1) Let X1=sample median of (X1,…,X10), X2=X11,
The linear model analysis
The estimated line is X2=49.303787+0.508023*X1
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1 1 898.5240494875 898.5240494875 43.4057388698
error 998 20659.1806691420 20.7005818328
total 999 21557.7047186295
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
H0: slope(X1)=0
The F test p value=0.000100
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 49.3037874860 7.7136389653 6.39177 0.00000
slope 0.5080232590 0.0771098786 6.58830 0.00000
----------------------------------------------------------------------------------
MSE=20.7005818328 , R2=0.041680 , R2(adj)=0.040720
X2(mean)= 100.1147385908, X2(variance)= 21.5792840026, X2(s.d.)= 4.6453507944
X1(mean)=100.0169779713, X1(variance)=3.4849538007, X1(s.d.)= 1.8668030964
SSX1=3481.4688468525 , SS(X2*X1)= 1768.6671497030, C.V.= 0.0454457483
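The figures in this output can be cross-checked from the printed summary quantities alone. The sketch below uses the standard simple-regression identities (slope = SS(X2*X1)/SSX1, intercept from the two means, F = MSR/MSE, and so on); the variable names are ours, and the input numbers are copied from the output above.

```python
import math

n = 1000
mean_x, mean_y = 100.0169779713, 100.1147385908  # X1(mean), X2(mean)
ss_x = 3481.4688468525                           # SSX1
ss_xy = 1768.6671497030                          # SS(X2*X1)
ss_total = 21557.7047186295                      # total SS in the ANOVA table

slope = ss_xy / ss_x
intercept = mean_y - slope * mean_x
ss_reg = slope * ss_xy                           # regression sum of squares
ss_err = ss_total - ss_reg
mse = ss_err / (n - 2)
f_stat = ss_reg / mse                            # one regressor, so MSR = ss_reg
t_slope = slope / math.sqrt(mse / ss_x)
r2 = ss_reg / ss_total
r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - 2)
cv = math.sqrt(mse) / mean_y                     # coefficient of variation of the fit

print(slope, intercept, f_stat, t_slope, r2, r2_adj, cv)
```

The small R2 together with a large F value shows a slope that is statistically significant but explains little of the variation in X11.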
[checking the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
----------------------------------------------------------------------------------
class    lower limit   upper limit   observed no   probability   expected no   chi square
----------------------------------------------------------------------------------
[ 1 ]    -inf          -5.83100      114.00000     0.10000       100.00000     1.96000
[ 2 ]    -5.83100      -3.82916      94.00000      0.10000       100.00000     0.36000
[ 3 ]    -3.82916      -2.38581      88.00000      0.10000       100.00000     1.44000
[ 4 ]    -2.38581      -1.15253      89.00000      0.10000       100.00000     1.21000
[ 5 ]    -1.15253      0.00011       104.00000     0.10000       100.00000     0.16000
[ 6 ]    0.00011       1.15258       109.00000     0.10000       100.00000     0.81000
[ 7 ]    1.15258       2.38581       92.00000      0.10000       100.00000     0.64000
[ 8 ]    2.38581       3.82725       106.00000     0.10000       100.00000     0.36000
[ 9 ]    3.82725       5.83053       106.00000     0.10000       100.00000     0.36000
[ 10 ]   5.83053       +inf          98.00000      0.10000       100.00000     0.04000
----------------------------------------------------------------------------------
degree of freedom=8
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =7.340000
p-value=0.500400
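The Pearson statistic and its p-value can be recomputed from the observed counts. In the sketch below the degrees of freedom are taken as the number of classes minus 2, which matches the printed df=8 for 10 classes; the closed-form tail probability used here is valid only for even degrees of freedom, and the function names are hypothetical.

```python
import math

def pearson_chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable; closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i          # builds (x/2)^i / i!
        total += term
    return math.exp(-half) * total

# Observed counts copied from the output above; 1000 residuals in 10 equal classes.
observed = [114, 94, 88, 89, 104, 109, 92, 106, 106, 98]
expected = [100.0] * 10
statistic = pearson_chi_square(observed, expected)
p_value = chi2_sf_even_df(statistic, df=len(observed) - 2)
print(statistic, p_value)
```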
~~~~~ The run test of residual~~~~~~~~~~~~~
number of negative residuals=489
number of positive residuals=511
H0: residual is random , H1: Increasing line or decreasing line
Z=-0.047987, p-value=0.480900
H0: residual is random , H1: Oscillation
Z=-0.047987, p-value=0.519100
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-0.047987, p-value=0.961800
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,1000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0 , D.W. test=1.929624
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0 , D.W. test=2.070376
The D.W. table depends on the values of the independent variables,
so getting the p-value takes a lot of time.
The Durbin Watson table in present use is wrong.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
2. The population sigma of error confidence interval
90% confidence interval for population variance [19.280818 , 22.346057]
90% confidence interval for population standard deviation [4.390993 , 4.727162]
95% confidence interval for population variance [19.030784 , 22.691586]
95% confidence interval for population standard deviation [4.362429 , 4.763569]
99% confidence interval for population variance [18.560148 , 23.399059]
99% confidence interval for population standard deviation [4.308149 , 4.837257]
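These intervals have the form [SSE / upper chi-square quantile, SSE / lower chi-square quantile] with df = 998. The printed bounds are reproduced to about three decimal places if the chi-square quantiles are approximated by df ± z·sqrt(2·df); that the authors' software uses exactly this approximation is an assumption made for this sketch, not something the output states.

```python
from statistics import NormalDist

def variance_ci_normal_approx(ss_error, df, level):
    """CI for the error variance, with chi-square quantiles approximated by a normal."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * (2.0 * df) ** 0.5
    chi2_upper = df + half_width
    chi2_lower = df - half_width
    return ss_error / chi2_upper, ss_error / chi2_lower

ss_error = 20659.1806691420   # error SS from the ANOVA table
df = 998                      # error degrees of freedom
for level in (0.90, 0.95, 0.99):
    low, high = variance_ci_normal_approx(ss_error, df, level)
    # The interval for the standard deviation is the square root of each bound.
    print(level, (low, high), (low ** 0.5, high ** 0.5))
```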
Figures: residual plot; (X11 estimated line, X11) scatter diagram
H0: residual is random , H1: Increasing line or decreasing line
Z=0.075963, p-value=0.530300
H0: residual is random , H1: Oscillation
Z=0.075963, p-value=0.469700
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=0.075963, p-value=0.939400
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,1000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0 D.W. test=1.932781
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0 D.W. test=2.067219
The D.W. table depends on the values of the independent variables,
so getting the p-value takes a lot of time.
The Durbin Watson table in present use is wrong.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
2. The population sigma of error confidence interval
90% confidence interval for population variance [17.390541 , 20.155266]
90% confidence interval for population standard deviation [4.170197 , 4.489462]
95% confidence interval for population variance [17.165019 , 20.466919]
95% confidence interval for population standard deviation [4.143069 , 4.524038]
99% confidence interval for population variance [16.740524 , 21.105032]
99% confidence interval for population standard deviation [4.091519 , 4.594021]
Figures: scatter plot; (X11 estimated line, X11) scatter diagram
X2(mean)= 100.1147385908, X2(variance)= 21.5792840026, X2(s.d.)= 4.6453507944
X1(mean)=99.9540589242, X1(variance)= 4.6127633789, X1(s.d.)= 2.1477344759
SSX1=4608.1506155634 , SS(X2*X1)= 4947.3726088020, C.V.= 0.0403006259
[checking the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
----------------------------------------------------------------------------------
class    lower limit   upper limit   observed no   probability   expected no   chi square
----------------------------------------------------------------------------------
[ 1 ]    -inf          -5.17084      109.00000     0.10000       100.00000     0.81000
[ 2 ]    -5.17084      -3.39565      95.00000      0.10000       100.00000     0.25000
[ 3 ]    -3.39565      -2.11570      94.00000      0.10000       100.00000     0.36000
[ 4 ]    -2.11570      -1.02205      99.00000      0.10000       100.00000     0.01000
[ 5 ]    -1.02205      0.00010       90.00000      0.10000       100.00000     1.00000
[ 6 ]    0.00010       1.02209       100.00000     0.10000       100.00000     0.00000
[ 7 ]    1.02209       2.11570       114.00000     0.10000       100.00000     1.96000
[ 8 ]    2.11570       3.39395       106.00000     0.10000       100.00000     0.36000
[ 9 ]    3.39395       5.17043       87.00000      0.10000       100.00000     1.69000
[ 10 ]   5.17043       +inf          106.00000     0.10000       100.00000     0.36000
----------------------------------------------------------------------------------
degree of freedom=8
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =6.800000
p-value=0.558300
~~~~~ The run test of residual~~~~~~~~~~~~~
number of negative residuals=487
number of positive residuals=513
H0: residual is random , H1: Increasing line or decreasing line
Z=-1.181679, p-value=0.118700
H0: residual is random , H1: Oscillation
Z=-1.181679, p-value=0.881300
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-1.181679, p-value=0.237400
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,1000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=1.904664
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=2.095336
(34.1.4) The best linear model of the three models.
X11=sample midrange of (X1,…,X10)
X2=-7.197286+1.073613*sample midrange of (X1,…,X10)+residual,
residual~Normal(0,16.2786961983).
The intercept test H0: b0=0 has p-value=0.22600, so the intercept is not significant and the model becomes
X2=1.073613*sample midrange of (X1,…,X10)+error,
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 0.0110135921 0.0380967324 0.28910 0.77240
X1 0.0999679442 0.0001204760 829.77463 0.00000
X2 0.0999300896 0.0001204493 829.64416 0.00000
X3 0.0998878245 0.0001204471 829.30876 0.00000
X4 0.0998924611 0.0001204500 829.32703 0.00000
X5 0.1000779894 0.0001204645 830.76725 0.00000
X6 0.0999622438 0.0001204553 829.87029 0.00000
X7 0.1000537324 0.0001204693 830.53287 0.00000
X8 0.0997723293 0.0001204530 828.30917 0.00000
X9 0.1002271516 0.0001204440 832.14750 0.00000
X10 0.1001151352 0.0001204452 831.20924 0.00000
----------------------------------------------------------------------------------
MSE=18.1372864360 , R2=0.121116 , R2(adj)=0.121115
dependent variable:X11 , sample mean= 99.9999671850 , sample variance=20.636710
independent variable:X1 , sample mean= 99.9994266097 , sample variance=24.992012
independent variable:X2 , sample mean= 100.0000272232 , sample variance=25.003085
independent variable:X3 , sample mean= 100.0004395878 , sample variance=25.004021
independent variable:X4 , sample mean= 100.0006059031 , sample variance=25.002796
independent variable:X5 , sample mean= 100.0011910828 , sample variance=24.996775
independent variable:X6 , sample mean= 100.0006931820 , sample variance=25.000624
independent variable:X7 , sample mean= 100.0010210333 , sample variance=24.994786
independent variable:X8 , sample mean= 99.9994009786 , sample variance=25.001565
independent variable:X9 , sample mean= 99.9990704106 , sample variance=25.005315
independent variable:X10 , sample mean= 100.0007584652 , sample variance=25.004818
~~~~~~~~~~~~~~~~~~~~~~~~~~ individual test ~~~~~~~~~~~~~~~~~~~~~~
variable coefficient standard error t value F value
intercept 0.0110135921 0.0380967324 0.2891 0.0836
X1 slope 0.0999679442 0.0001204760 829.7746 688525.9418
X2 slope 0.0999300896 0.0001204493 829.6442 688309.4337
X3 slope 0.0998878245 0.0001204471 829.3088 687753.0188
X4 slope 0.0998924611 0.0001204500 829.3270 687783.3216
X5 slope 0.1000779894 0.0001204645 830.7672 690174.2230
X6 slope 0.0999622438 0.0001204553 829.8703 688684.6935
X7 slope 0.1000537324 0.0001204693 830.5329 689784.8444
X8 slope 0.0997723293 0.0001204530 828.3092 686096.0828
X9 slope 0.1002271516 0.0001204440 832.1475 692469.4537
X10 slope 0.1001151352 0.0001204452 831.2092 690908.8073
====================
[checking the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
----------------------------------------------------------------------------------
class    lower limit   upper limit   observed no     probability   expected no     chi square
----------------------------------------------------------------------------------
[ 1 ]    -inf          -7.00530      2498239.00000   0.05000       2500000.00000   1.24045
[ 2 ]    -7.00530      -5.45805      2494891.00000   0.05000       2500000.00000   10.44075
[ 3 ]    -5.45805      -4.41392      2500279.00000   0.05000       2500000.00000   0.03114
[ 4 ]    -4.41392      -3.58425      2498080.00000   0.05000       2500000.00000   1.47456
[ 5 ]    -3.58425      -2.87241      2500951.00000   0.05000       2500000.00000   0.36176
[ 6 ]    -2.87241      -2.23321      2499743.00000   0.05000       2500000.00000   0.02642
[ 7 ]    -2.23321      -1.64087      2502938.00000   0.05000       2500000.00000   3.45274
[ 8 ]    -1.64087      -1.07992      2497196.00000   0.05000       2500000.00000   3.14497
[ 9 ]    -1.07992      -0.53496      2508220.00000   0.05000       2500000.00000   27.02736
[ 10 ]   -0.53496      -0.00096      2493915.00000   0.05000       2500000.00000   14.81089
[ 11 ]   -0.00096      0.53420       2502167.00000   0.05000       2500000.00000   1.87836
[ 12 ]   0.53420       1.07886       2507317.00000   0.05000       2500000.00000   21.41540
[ 13 ]   1.07886       1.64083       2502174.00000   0.05000       2500000.00000   1.89051
[ 14 ]   1.64083       2.23322       2501887.00000   0.05000       2500000.00000   1.42431
[ 15 ]   2.23322       2.87224       2501793.00000   0.05000       2500000.00000   1.28594
[ 16 ]   2.87224       3.58399       2498769.00000   0.05000       2500000.00000   0.60614
[ 17 ]   3.58399       4.41373       2500123.00000   0.05000       2500000.00000   0.00605
[ 18 ]   4.41373       5.45761       2497027.00000   0.05000       2500000.00000   3.53549
[ 19 ]   5.45761       7.00510       2494048.00000   0.05000       2500000.00000   14.17052
[ 20 ]   7.00510       +inf          2500243.00000   0.05000       2500000.00000   0.02362
----------------------------------------------------------------------------------
degree of freedom=18
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =108.247369
p-value=0.000000
~~~~~ The run test of residual~~~~~~~~~~~~~
number of negative residuals=24999016
number of positive residuals=25000984
H0: residual is random , H1: Increasing line or decreasing line
Z=0.429932, p-value=0.666400
H0: residual is random , H1: Oscillation
Z=0.429932, p-value=0.333600
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=0.429932, p-value=0.667200
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,50000000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=1.999919
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=2.000081
The D.W. table depends on the values of the independent variables,
so getting the p-value takes a lot of time.
The Durbin Watson table in present use is wrong.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
2. The population sigma of error confidence interval
90% confidence interval for population variance [18.131322 , 18.143255]
90% confidence interval for population standard deviation [4.258089 , 4.259490]
95% confidence interval for population variance [18.130179 , 18.144399]
95% confidence interval for population standard deviation [4.257955 , 4.259624]
99% confidence interval for population variance [18.127946 , 18.146636]
99% confidence interval for population standard deviation [4.257693 , 4.259887]
Figures: the joint probability distribution of X11 estimated line and residual; the joint probability distribution of X11 estimated line and X11
(34.2.1.1) The marginal probability distribution of the estimated line of the dependent variable X11,
X11 estimated line probability distribution
Mathematical Mean: 99.99997
Geometrical Mean : 99.98747
Harmonic Mean : 99.97496
Variance : 2.49943
S.D. : 1.58096
Skewed Coef. : 0.00002
Kurtosis Coef. : 2.99870
MAD : 1.26151
Range : 17.33966
Mid_range : 100.25892
Median : 99.99987
Q1 : 98.93349
Q2 : 99.99987
Q3 : 101.06666
IQR : 2.13317
C.V. : 0.01581
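The entries of this summary list can be computed as in the sketch below. Several conventions are assumptions on our part: the n-1 divisor for the variance, MAD read as the mean absolute deviation about the mean, the kurtosis coefficient as the non-excess ratio (about 3 for a normal distribution, as in the list above), and the quartile rule of Python's `statistics.quantiles`. The data are made up.

```python
import math
from statistics import geometric_mean, harmonic_mean, mean, median, quantiles

def describe(values):
    """Summary statistics in the order of the printed list (positive data assumed)."""
    n = len(values)
    m = mean(values)
    dev = [v - m for v in values]
    variance = sum(d * d for d in dev) / (n - 1)   # n - 1 divisor is an assumption
    sd = math.sqrt(variance)
    m2 = sum(d ** 2 for d in dev) / n              # central moments with divisor n
    m3 = sum(d ** 3 for d in dev) / n
    m4 = sum(d ** 4 for d in dev) / n
    q1, q2, q3 = quantiles(values, n=4)            # quartile convention is an assumption
    return {
        "mean": m,
        "geometric_mean": geometric_mean(values),
        "harmonic_mean": harmonic_mean(values),
        "variance": variance,
        "sd": sd,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
        "mad": sum(abs(d) for d in dev) / n,
        "range": max(values) - min(values),
        "midrange": (max(values) + min(values)) / 2.0,
        "median": median(values),
        "q1": q1, "q2": q2, "q3": q3,
        "iqr": q3 - q1,
        "cv": sd / m,
    }

# Made-up data, for illustration only.
summary = describe([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
for name, value in summary.items():
    print(name, value)
```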
SLLN analysis, X0=residual and Normal(0, 18.13728),
Note:X1~Normal(0, 18.13728), X1 is representable code of Normal(0, 18.13728),
E(| X0 distribution F() - X1 distribution F()|^2)= 0.0000000179
Pr(| X0 distribution F() - X1 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0500000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0100000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0050000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0010000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0005000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0001000000)= 0.573363
step 1, X10/(1-X10) into the linear model, SSR= 12602485.7531839610
step 2, X9/(1-X9) into the linear model, SSR= 12594932.7051732540
step 3, X5/(1-X5) into the linear model, SSR= 12570768.9250471590
step 4, X7/(1-X7) into the linear model, SSR= 12560518.2108957770
step 5, X6/(1-X6) into the linear model, SSR= 12527527.8000582460
step 6, X1/(1-X1) into the linear model, SSR= 12527385.6705855130
step 7, X2/(1-X2) into the linear model, SSR= 12525029.9134794470
step 8, X4/(1-X4) into the linear model, SSR= 12526546.3414355520
step 9, X3/(1-X3) into the linear model, SSR= 12509716.5299316640
step 10, X8/(1-X8) into the linear model, SSR= 12483551.5261553530
chi square 24.42344 3.48100 5.18400 2.24297 3.97152 2.70192 8.12161
0.12455 38.91151 17.38706 1.07978 10.75369 0.88923 0.12769 0.35570
13.13316 2.10497 14.01382 28.77773 15.93654
degree of freedom=18
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =193.721894
p-value=0.000000
~~~~~ The run test of residual~~~~~~~~~~~~~
number of negative residuals=25012624
number of positive residuals=24987376
H0: residual is random , H1: Increasing line or decreasing line
Z=0.362145, p-value=0.641400
H0: residual is random , H1: Oscillation
Z=0.362145, p-value=0.358600
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=0.362145, p-value=0.717200
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,50000000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=1.999922
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=2.000078
The D.W. table depends on the values of the independent variables,
so getting the p-value takes a lot of time.
The Durbin Watson table in present use is wrong.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
2. The population sigma of error confidence interval
90% confidence interval for population variance
[18.122183 , 18.134110]
90% confidence interval for population standard deviation
[4.257016 , 4.258416]
95% confidence interval for population variance
[18.121041 , 18.135253]
95% confidence interval for population standard deviation
[4.256882 , 4.258551]
99% confidence interval for population variance
[18.118809 , 18.137489]
99% confidence interval for population standard deviation
[4.256619 , 4.258813]
Figures: the joint probability distribution of X11 estimated line and residual; the joint probability distribution of X11 estimated line and X11
(34.2.2.1) The marginal probability distribution of the estimated line of the dependent variable,
X11 estimated line probability distribution
Mathematical Mean: 99.99997
Geometrical Mean : 99.98741
Harmonic Mean : 99.97483
Variance : 2.50855
S.D. : 1.58384
Skewed Coef. : -0.09762
Kurtosis Coef. : 3.01799
MAD : 1.26330
Range : 17.46074
Mid_range : 99.57886
Median : 100.02558
Q1 : 98.94654
Q2 : 100.02558
Q3 : 101.08149
IQR : 2.13495
C.V. : 0.01584
SLLN analysis, X0=residual and Normal(0, 18.12814),
Note:X1~ Normal(0, 18.12814),
X1 is representable code of Normal(0, 18.12814),
E(| X0 distribution F() - X1 distribution F()|^2)= 0.0000000382
Pr(| X0 distribution F() - X1 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0500000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0100000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0050000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0010000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0005000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0001000000)= 0.717220
(34.2.4) The joint probability distribution of one of X1,…,X10 and X11,
f(x1,x2) and f(x1,x11) only,
f(x1,x2) f(x2,x1)
f(x1,x11) f(x11,x1)
(34.2.5) The marginal probability distributions of the sample median(X1,…,X10),
sample mean(X1,…,X10) and sample midrange(X1,…,X10),
and the joint probability distribution of each sample statistic and X11.
Y1= sample median(X1,…,X10),
Mathematical Mean: 100.00033
Geometrical Mean : 99.98303
Harmonic Mean : 99.96572
Variance : 3.45868
S.D. : 1.85975
Skewed Coef. : 0.00037
Kurtosis Coef. : 3.01741
MAD : 1.48278
Range : 20.68329
Mid_range : 100.06641
Median : 99.99999
Q1 : 98.74835
Q2 : 99.99999
Q3 : 101.25230
IQR : 2.50395
C.V. : 0.01860
Y2= sample mean(X1,…,X10),
Mathematical Mean: 100.00026
Geometrical Mean : 99.98776
Harmonic Mean : 99.97525
Variance : 2.49999
S.D. : 1.58113
Skewed Coef. : 0.00002
Kurtosis Coef. : 2.99870
MAD : 1.26165
Range : 17.34104
Mid_range : 100.25825
Median : 100.00016
Q1 : 98.93366
Q2 : 100.00016
Q3 : 101.06709
IQR : 2.13342
C.V. : 0.01581
f(y1,y4), f(y4,y1),
Y1= sample median(X1,…,X10),
Y4=X11,
f(y2,y4), f(y4,y2),
Y2= sample mean(X1,…,X10),
Y4=X11,
E(Y4|Y2) Var(Y4|Y2)
f(y3,y4), f(y4,y3),
Y3= sample midrange(X1,…,X10),
Y4=X11,
E(Y4|Y3) Var(Y4|Y3)
Y1= sample median(X1,…,X10), Y2= sample mean (X1,…,X10),
f(y1,y2), f(y2,y1),
Y2= sample mean(X1,…,X10),Y3= sample midrange(X1,…,X10),
f(y1,y3), f(y3,y1),
-1.01425 -0.50243 -0.00091 0.50172 1.01325 1.54106 2.09742 2.69758
3.36605 4.14533 5.12574 6.57912
observed no 2499843.00000 2497922.00000 2503165.00000 2499465.00000 2501094.00000 2498744.00000
2500649.00000 2496794.00000 2502781.00000 2497297.00000 2498742.00000 2503158.00000
2499099.00000 2500824.00000 2499902.00000 2498744.00000 2502175.00000 2500066.00000
2499780.00000 2499756.00000
probability 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000
0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000
0.05000 0.05000 0.05000 0.05000 0.05000
expected no 2500000.00000 2500000.00000 2500000.00000 2500000.00000 2500000.00000 2500000.00000
2500000.00000 2500000.00000 2500000.00000 2500000.00000 2500000.00000 2500000.00000
2500000.00000 2500000.00000 2500000.00000 2500000.00000 2500000.00000 2500000.00000
2500000.00000 2500000.00000
chi square 0.00986 1.72723 4.00689 0.11449 0.47873 0.63101 0.16848
4.11137 3.09358 2.92248 0.63303 3.98919 0.32472 0.27159 0.00384
0.63101 1.89225 0.00174 0.01936 0.02381
degree of freedom=18
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =25.054690
p-value=0.123400
~~~~~ The run test of residual~~~~~~~~~~~~~
number of negative residuals=25002154
number of positive residuals=24997846
H0: residual is random , H1: Increasing line or decreasing line
Z=-0.326914, p-value=0.371900
H0: residual is random , H1: Oscillation
Z=-0.326914, p-value=0.628100
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-0.326914, p-value=0.743800
2. The population sigma of error confidence interval
90% confidence interval for population variance
[15.993259 , 16.003785]
90% confidence interval for population standard deviation
[3.999157 , 4.000473]
95% confidence interval for population variance
[15.992251 , 16.004794]
95% confidence interval for population standard deviation
[3.999031 , 4.000599]
99% confidence interval for population variance
[15.990282 , 16.006768]
99% confidence interval for population standard deviation
[3.998785 , 4.000846]
Figures: the joint probability distribution of X2 and residual; the joint probability distribution of X2 estimated line and X2
6.6. A dummy variable is one of the independent variables; the other
assumptions are unchanged.
Example 35,
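Dummy=0,
$X_1 \sim \mathrm{Normal}(E(X_1) = 100,\ \mathrm{Var}(X_1) = 25)$,
$X_2 \mid x_1 \sim \mathrm{Normal}(E(X_2 \mid x_1) = 50 + 2x_1,\ \mathrm{Var}(X_2 \mid x_1) = 1)$,
$\varepsilon \sim \mathrm{Normal}(E(\varepsilon) = 0,\ \mathrm{Var}(\varepsilon) = 16)$,
$X_3 \mid (\mathrm{Dummy} = 0, x_1, x_2) = 50 + 2x_1 + 3x_2 + \varepsilon$
Dummy=1,
$X_1 \sim \mathrm{Normal}(E(X_1) = 100,\ \mathrm{Var}(X_1) = 25)$,
$X_2 \mid x_1 \sim \mathrm{Normal}(E(X_2 \mid x_1) = 50 + 2x_1,\ \mathrm{Var}(X_2 \mid x_1) = 1)$,
$\varepsilon \sim \mathrm{Normal}(E(\varepsilon) = 0,\ \mathrm{Var}(\varepsilon) = 16)$,
$X_3 \mid (\mathrm{Dummy} = 1, x_1, x_2) = 10 + x_1 + 5x_2 + \varepsilon$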
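A minimal simulation of Example 35 as it is defined above, using only the Python standard library. The function name, seed and sample size are our own choices, not the authors' simulator. Under these definitions, X3 given Dummy=0 has mean 50 + 2(100) + 3(250) = 1000 and variance 64(25) + 9 + 16 = 1625, which the simulated sample should approach.

```python
import random
from statistics import mean, variance

def simulate_example_35(n, dummy, seed=0):
    """Draw n triples (x1, x2, x3) for the given value of Dummy (0 or 1)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = rng.gauss(100.0, 5.0)              # Var(X1) = 25
        x2 = rng.gauss(50.0 + 2.0 * x1, 1.0)    # Var(X2 | x1) = 1
        eps = rng.gauss(0.0, 4.0)               # Var(eps) = 16
        if dummy == 0:
            x3 = 50.0 + 2.0 * x1 + 3.0 * x2 + eps
        else:
            x3 = 10.0 + x1 + 5.0 * x2 + eps
        rows.append((x1, x2, x3))
    return rows

rows0 = simulate_example_35(20000, dummy=0, seed=1)
rows1 = simulate_example_35(20000, dummy=1, seed=2)
x3_dummy0 = [r[2] for r in rows0]
x3_dummy1 = [r[2] for r in rows1]
print(mean(x3_dummy0), variance(x3_dummy0), mean(x3_dummy1))
```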
----------------------------------------------------------------------------------
class    lower limit   upper limit   observed no   probability   expected no   chi square
----------------------------------------------------------------------------------
[ 1 ]    -inf          -5.19358      99.00000      0.10000       100.00000     0.01000
[ 2 ]    -5.19358      -3.41058      114.00000     0.10000       100.00000     1.96000
[ 3 ]    -3.41058      -2.12500      111.00000     0.10000       100.00000     1.21000
[ 4 ]    -2.12500      -1.02655      92.00000      0.10000       100.00000     0.64000
[ 5 ]    -1.02655      0.00010       89.00000      0.10000       100.00000     1.21000
[ 6 ]    0.00010       1.02658       81.00000      0.10000       100.00000     3.61000
[ 7 ]    1.02658       2.12501       104.00000     0.10000       100.00000     0.16000
[ 8 ]    2.12501       3.40888       108.00000     0.10000       100.00000     0.64000
[ 9 ]    3.40888       5.19316       100.00000     0.10000       100.00000     0.00000
[ 10 ]   5.19316       +inf          102.00000     0.10000       100.00000     0.04000
----------------------------------------------------------------------------------
degree of freedom=8
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =9.480000
p-value=0.303400
~~~~~ The run test of residual~~~~~~~~~~~~~
number of negative residuals=505
number of positive residuals=495
H0: residual is random , H1: Increasing line or decreasing line
Z=0.446149, p-value=0.672300
H0: residual is random , H1: Oscillation
Z=0.446149, p-value=0.327700
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=0.446149, p-value=0.655400
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,1000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=2.041210
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=1.958790
The D.W. table depends on the values of the independent variables,
so getting the p-value takes a lot of time.
The Durbin Watson table in present use is wrong.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
2. The population sigma of error confidence interval
90% confidence interval for population variance [15.295336 , 17.728284]
90% confidence interval for population standard deviation [3.910925 , 4.210497]
95% confidence interval for population variance [15.096894 , 18.002561]
95% confidence interval for population standard deviation [3.885472 , 4.242942]
99% confidence interval for population variance [14.723376 , 18.564158]
99% confidence interval for population standard deviation [3.837105 , 4.308614]
Figures: residual plot; (X3 estimated line, X3) scatter diagram
(35.1.2) Dummy=1,
The linear model analysis
Dependent variable is X3,
Independent variables are X1,X2
The correlation matrix is below
r(X3,X1)=0.989692,r(X3,X2)=0.996094,r(X1,X2)=0.994836,
The estimated line is X3=3.135026-1.115684*X1+5.074093*X2
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 2 4016310.4152957569 2008155.2076478784 129633.4189882031
error 1997 30935.5872966504 15.4910301936
total 1999 4047246.0025924072
----------------------------------------------------------------------------------
The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 3.1350260377 4.7189833703 0.66434 0.50640
X1 -1.1156843386 0.1760572333 -6.33705 0.00000
X2 5.0740930310 0.0875175417 57.97801 0.00000
----------------------------------------------------------------------------------
MSE=15.4910301936 , R2=0.992356 , R2(adj)=0.992349
dependent variable:X3 , sample mean=1162.4274350265 , sample variance=2024.635319
independent variable:X1 , sample mean=100.2568506284 , sample variance=24.271728
independent variable:X2 , sample mean=250.5171661846 , sample variance=98.224138
~~~~~~~~~~~~~~~~~~~~~~~~~~ individual test ~~~~~~~~~~~~~~~~~~~
variable coefficient standard error t value F value
intercept 3.1350260377 4.7189833703 0.6643 0.4414
X1 slope -1.1156843386 0.1760572333 -6.3371 40.1583
X2 slope 5.0740930310 0.0875175417 57.9780 3361.4498
[checking the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
----------------------------------------------------------------------------------
class    lower limit   upper limit   observed no   probability   expected no   chi square
----------------------------------------------------------------------------------
[ 1 ]    -inf          -5.25552      180.00000     0.09091       181.81818     0.01818
[ 2 ]    -5.25552      -3.57581      191.00000     0.09091       181.81818     0.46368
[ 3 ]    -3.57581      -2.37980      175.00000     0.09091       181.81818     0.25568
[ 4 ]    -2.37980      -1.37290      163.00000     0.09091       181.81818     1.94768
[ 5 ]    -1.37290      -0.44969      187.00000     0.09091       181.81818     0.14768
[ 6 ]    -0.44969      0.44899       197.00000     0.09091       181.81818     1.26768
[ 7 ]    0.44899       1.37185       166.00000     0.09091       181.81818     1.37618
[ 8 ]    1.37185       2.37860       188.00000     0.09091       181.81818     0.21018
[ 9 ]    2.37860       3.57418       203.00000     0.09091       181.81818     2.46768
[ 10 ]   3.57418       5.25278       178.00000     0.09091       181.81818     0.08018
[ 11 ]   5.25278       +inf          172.00000     0.09091       181.81818     0.53018
----------------------------------------------------------------------------------
degree of freedom=9
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =8.765000
p-value=0.459200
~~~~~ The run test of residual~~~~~~~~~~~~~
number of negative residuals=1000
number of positive residuals=1000
H0: residual is random , H1: Increasing line or decreasing line
Z=-1.163046, p-value=0.122500
H0: residual is random , H1: Oscillation
Z=-1.163046, p-value=0.877500
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-1.163046, p-value=0.245000
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,2000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=1.967591
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=2.032409
The D.W. table depends on the values of the independent variables,
so getting the p-value takes a lot of time.
The Durbin Watson table in present use is wrong.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
2. The population sigma of error confidence interval
90% confidence interval for population variance [14.724537 , 16.341706]
90% confidence interval for population standard deviation [3.837256 , 4.042488]
95% confidence interval for population variance [14.586281 , 16.515440]
95% confidence interval for population standard deviation [3.819199 , 4.063919]
99% confidence interval for population variance [14.323307 , 16.866053]
99% confidence interval for population standard deviation [3.784615 , 4.106830]
Figures: residual plot; (X3 estimated line, X3) scatter diagram
(35.1.3) Merging the two lines, one for Dummy=0 and the other for Dummy=1,
so that Dummy explains both lines,
Dummy = 0: $X_3 = \beta_0^* + \beta_1^* X_1 + \beta_2^* X_2 + \varepsilon$,
X3=37.536437+1.458679*X1+3.267224*X2,
Dummy = 1: $X_3 = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$,
X3=3.135026-1.115684*X1+5.074093*X2,
$X_3 = \beta_0^* + (\beta_0 - \beta_0^*) \times \mathrm{Dummy} + \beta_1^* \times X_1 + (\beta_1 - \beta_1^*) \times \mathrm{Dummy} \times X_1 + \beta_2^* \times X_2 + (\beta_2 - \beta_2^*) \times \mathrm{Dummy} \times X_2 + \varepsilon$,
$\hat{\beta}_1^* = 1.458679$, $\hat{\beta}_1 - \hat{\beta}_1^* = -1.115684 - 1.458679 = -2.574363$,
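The merged model can be checked numerically: its coefficients are the Dummy=0 estimates plus Dummy times the differences between the two fitted lines, so it must reduce to each fitted line when Dummy is set to 0 or 1. The sketch below uses the two estimated lines from (35.1.3); the variable and function names are hypothetical.

```python
# (b0*, b1*, b2*) from the Dummy=0 fit and (b0, b1, b2) from the Dummy=1 fit.
line_dummy0 = (37.536437, 1.458679, 3.267224)
line_dummy1 = (3.135026, -1.115684, 5.074093)

# Coefficients of Dummy, Dummy*X1 and Dummy*X2 in the merged model.
shifts = tuple(b - b_star for b, b_star in zip(line_dummy1, line_dummy0))

def merged_prediction(dummy, x1, x2):
    """Estimated X3 from the merged model with interaction terms."""
    b0s, b1s, b2s = line_dummy0
    d0, d1, d2 = shifts
    return (b0s + d0 * dummy) + (b1s + d1 * dummy) * x1 + (b2s + d2 * dummy) * x2

print(shifts)
print(merged_prediction(0, 100.0, 250.0), merged_prediction(1, 100.0, 250.0))
```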
(35.2) 100,000,000 pair samples when Dummy=0,
100,000,000 pair samples when Dummy=1,
This is big data.
(35.2.1)
Dummy=0
The linear model analysis
Dependent variable is X3,
Independent variables are X1,X2
The correlation matrix is below
r(X3,X1)=0.992276,r(X3,X2)=0.994758,r(X1,X2)=0.995035,
The estimated line is X3=49.981969+1.999034*X1+3.000456*X2
ANOVA
----------------------------------------------------------------------------------
Source df SS MS
----------------------------------------------------------------------------------
Regression 2 160891286545.8791800000 80445643272.9395900000
error 99999997 1600009673.5958138000 16.0000972160
total 99999999 162491296219.4750100000
----------------------------------------------------------------------------------
F test value=5027822155.5235453000
The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 49.9819685798 0.0215437686 2320.01975 0.00000
X1 1.9990339848 0.0008038458 2486.83762 0.00000
X2 3.0004562569 0.0003999390 7502.28390 0.00000
----------------------------------------------------------------------------------
MSE=16.0000972160 , R2=0.990153 , R2(adj)=0.990153
dependent variable:X3 , sample mean=1000.0025100438 , sample variance=1624.912978
independent variable:X1 , sample mean=100.0003227308 , sample variance=24.999969
independent variable:X2 , sample mean=250.0008110845 , sample variance=100.994415
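The quantities in the ANOVA output are tied together by simple ratios. As a hedged check, the sketch below recomputes MSE, the F test value and R2 from the printed sums of squares; the variable names are illustrative, and the adjusted R2 formula is the usual textbook one, which the source does not state explicitly.

```python
# Sketch: recompute MSE, F and R2 from the ANOVA sums of squares above.
ss_regression = 160891286545.8791800000
ss_error = 1600009673.5958138000
df_regression = 2
df_error = 99999997

ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error                  # MSE
f_value = ms_regression / ms_error              # F test value
r_squared = ss_regression / (ss_regression + ss_error)

# Usual adjusted R2 (assumed formula): 1 - (1 - R2)(n - 1)/(n - k - 1)
n = df_regression + df_error + 1
r_squared_adj = 1.0 - (1.0 - r_squared) * (n - 1) / df_error

print(f"MSE={ms_error:.10f}, F={f_value:.4f}")
print(f"R2={r_squared:.6f}, R2(adj)={r_squared_adj:.6f}")
```

With 100,000,000 samples the adjustment term (n - 1)/(n - k - 1) is practically 1, which is why R2 and R2(adj) agree to six decimals in the output.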
[checking the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
----------------------------------------------------------------------------------
class     lower limit   upper limit   observed no      probability   expected no      chi square
----------------------------------------------------------------------------------
[ 1 ]     -inf          -6.57964      5000255.00000    0.05000       5000000.00000    0.01300
[ 2 ]     -6.57964      -5.12640      4995740.00000    0.05000       5000000.00000    3.62952
[ 3 ]     -5.12640      -4.14572      4996814.00000    0.05000       5000000.00000    2.03012
[ 4 ]     -4.14572      -3.36646      5000481.00000    0.05000       5000000.00000    0.04627
[ 5 ]     -3.36646      -2.69787      5002041.00000    0.05000       5000000.00000    0.83314
[ 6 ]     -2.69787      -2.09752      5000762.00000    0.05000       5000000.00000    0.11613
[ 7 ]     -2.09752      -1.54116      5006458.00000    0.05000       5000000.00000    8.34115
[ 8 ]     -1.54116      -1.01430      4989759.00000    0.05000       5000000.00000    20.97562
[ 9 ]     -1.01430      -0.50245      5009520.00000    0.05000       5000000.00000    18.12608
[ 10 ]    -0.50245      -0.00091      4991637.00000    0.05000       5000000.00000    13.98795
[ 11 ]    -0.00091      0.50174       5000057.00000    0.05000       5000000.00000    0.00065
[ 12 ]    0.50174       1.01330       5009650.00000    0.05000       5000000.00000    18.62450
[ 13 ]    1.01330       1.54113       4999340.00000    0.05000       5000000.00000    0.08712
[ 14 ]    1.54113       2.09752       4999557.00000    0.05000       5000000.00000    0.03925
[ 15 ]    2.09752       2.69771       5000250.00000    0.05000       5000000.00000    0.01250
[ 16 ]    2.69771       3.36622       4998997.00000    0.05000       5000000.00000    0.20120
[ 17 ]    3.36622       4.14554       4999801.00000    0.05000       5000000.00000    0.00792
[ 18 ]    4.14554       5.12599       5000112.00000    0.05000       5000000.00000    0.00251
[ 19 ]    5.12599       6.57945       4997946.00000    0.05000       5000000.00000    0.84378
[ 20 ]    6.57945       +inf          5000823.00000    0.05000       5000000.00000    0.13547
----------------------------------------------------------------------------------
degree of freedom=18
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) is unknown
pearson chi-square test statistic =88.053884
p-value=0.000000
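The Pearson chi-square statistic above is the sum over the 20 classes of (observed - expected)^2 / expected. A minimal sketch follows, with the observed counts copied from the goodness of fit output for Dummy=0; the function name `pearson_chi_square` is an illustrative assumption, and the degrees of freedom are taken as classes minus 1 minus one estimated parameter, which matches the printed value of 18.

```python
# Sketch: recompute the Pearson chi-square statistic from the class counts.
# `pearson_chi_square` is a hypothetical helper name.

def pearson_chi_square(observed, probabilities):
    """Sum of (observed - expected)^2 / expected over all classes."""
    total = sum(observed)
    statistic = 0.0
    for count, prob in zip(observed, probabilities):
        expected = total * prob
        statistic += (count - expected) ** 2 / expected
    return statistic

observed = [
    5000255, 4995740, 4996814, 5000481, 5002041, 5000762, 5006458,
    4989759, 5009520, 4991637, 5000057, 5009650, 4999340, 4999557,
    5000250, 4998997, 4999801, 5000112, 4997946, 5000823,
]
probabilities = [0.05] * len(observed)   # 20 equal-probability classes

statistic = pearson_chi_square(observed, probabilities)
# classes - 1 - (one estimated parameter, sigma of the error)
degrees_of_freedom = len(observed) - 1 - 1

print(f"chi-square = {statistic:.6f}, df = {degrees_of_freedom}")
```

Every class count is within about 0.2% of the expected 5,000,000, yet the statistic is far beyond any usual critical value for 18 degrees of freedom, because the sample size is 100,000,000.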
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=50002559
number of the positive of residual=49997441
H0: residual is random , H1: Increasing line or decreasing line
Z=0.230026, p-value=0.591000
H0: residual is random , H1: Oscillation
Z=0.230026, p-value=0.409000
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=0.230026, p-value=0.818000
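The three p values of the run test follow from the same Z. The sketch below assumes the standard Wald-Wolfowitz form of the run test statistic (the source does not state which formula the software uses) and reads the p values as lower tail, upper tail and two-sided probabilities of the standard normal distribution; the function names are illustrative assumptions.

```python
# Sketch: Wald-Wolfowitz run test on the signs of the residuals.
# Assumption: the software uses this standard form of the statistic.
from math import sqrt
from statistics import NormalDist

def runs_test_z(signs):
    """Z statistic for a sequence of signs (value > 0 counts as positive)."""
    n_pos = sum(1 for s in signs if s > 0)
    n_neg = len(signs) - n_pos
    n = n_pos + n_neg
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if (a > 0) != (b > 0))
    mean = 2.0 * n_pos * n_neg / n + 1.0
    variance = (2.0 * n_pos * n_neg * (2.0 * n_pos * n_neg - n)
                / (n * n * (n - 1.0)))
    return (runs - mean) / sqrt(variance)

def runs_p_values(z):
    """p values for the three alternatives listed in the output."""
    lower = NormalDist().cdf(z)          # few runs: increasing or decreasing line
    upper = 1.0 - lower                  # many runs: oscillation
    return {"trend": lower, "oscillation": upper,
            "either": 2.0 * min(lower, upper)}

p_values = runs_p_values(0.230026)       # Z printed for Dummy=0
print({name: round(p, 3) for name, p in p_values.items()})

z_alternating = runs_test_z([1, -1] * 5)  # a strongly oscillating toy sequence
print(round(z_alternating, 6))
```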
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,100000000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=2.000129
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=1.999871
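The two D.W. values printed above add up to 4: the second one is 4 minus the first. A minimal sketch of the usual Durbin Watson statistic is given below; `durbin_watson` is an illustrative name, and the toy residuals are not taken from the source.

```python
# Sketch: the usual Durbin Watson statistic
# d = sum (e(t) - e(t-1))^2 / sum e(t)^2, and its mirror value 4 - d.

def durbin_watson(residuals):
    """Return d for a list of residuals ordered in time."""
    numerator = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    denominator = sum(e * e for e in residuals)
    return numerator / denominator

toy_residuals = [1.0, -1.0, 1.0, -1.0]      # hypothetical oscillating residuals
d = durbin_watson(toy_residuals)
print(d, 4.0 - d)

# The two values printed for Dummy=0 are mirror images around 2.
print(round(4.0 - 2.000129, 6))
```

A value of d near 2 corresponds to an auto correlation coefficient near 0, since d is approximately 2(1 - r) for the first order sample autocorrelation r of the residuals.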
2. The population sigma of error confidence interval
90% confidence interval for population variance [15.996376 , 16.003820]
90% confidence interval for population standard deviation [3.999547 , 4.000477]
95% confidence interval for population variance [15.995663 , 16.004533]
95% confidence interval for population standard deviation [3.999458 , 4.000567]
99% confidence interval for population variance [15.994270 , 16.005929]
99% confidence interval for population standard deviation [3.999284 , 4.000741]
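The confidence intervals above have the form [df*MSE/chi2(upper), df*MSE/chi2(lower)] with df=99999997. The sketch below reproduces the 95% interval using only the Python standard library; it swaps in the Wilson-Hilferty approximation for the chi-square quantiles, which is an assumption made here for self-containment and is not necessarily what the authors' software does.

```python
# Sketch: confidence interval for the error variance from MSE and df.
# Assumption: chi-square quantiles via the Wilson-Hilferty approximation,
# which is very accurate for large degrees of freedom.
from math import sqrt
from statistics import NormalDist

def chi2_quantile_wh(p, df):
    """Approximate chi-square quantile (Wilson-Hilferty)."""
    z = NormalDist().inv_cdf(p)
    a = 2.0 / (9.0 * df)
    return df * (1.0 - a + z * sqrt(a)) ** 3

def variance_ci(mse, df, level):
    """Two-sided confidence interval for the population variance."""
    alpha = 1.0 - level
    lower = df * mse / chi2_quantile_wh(1.0 - alpha / 2.0, df)
    upper = df * mse / chi2_quantile_wh(alpha / 2.0, df)
    return lower, upper

var_lo, var_hi = variance_ci(16.0000972160, 99999997, 0.95)
sd_lo, sd_hi = sqrt(var_lo), sqrt(var_hi)
print(f"95% variance CI [{var_lo:.6f} , {var_hi:.6f}]")
print(f"95% standard deviation CI [{sd_lo:.6f} , {sd_hi:.6f}]")
```

The interval for the standard deviation is simply the square root of the interval for the variance.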
Figures: the joint probability distribution of X3 estimated line and residual; the joint probability distribution of X3 estimated line and X3
(35.2.2) Dummy=1
The linear model analysis
Dependent variable is X3,
Independent variables are X1,X2
The correlation matrix is below
r(X3,X1)=0.990027,r(X3,X2)=0.996060,r(X1,X2)=0.995038,
The estimated line is X3=9.981917-1.001015*X1+5.000479*X2
ANOVA
----------------------------------------------------------------------------------
Source df SS MS
----------------------------------------------------------------------------------
Regression 2 205006682851.6498400000 102503341425.8249200000
error 99999997 1600005568.4168379000 16.0000561642
total 99999999 206606688420.0666800000
----------------------------------------------------------------------------------
F test value=6406436350.8527622000,
The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 9.9819167145 0.0215435153 463.33742 0.00000
X1 -1.0010152763 0.0008040300 -1244.99738 0.00000
X2 5.0004786870 0.0004000125 12500.80602 0.00000
----------------------------------------------------------------------------------
MSE=16.0000561642 , R2=0.992256 , R2(adj)=0.992256
dependent variable:X3 , sample mean=1159.9954450810 , sample variance=2066.066905
independent variable:X1 , sample mean=99.9995133739 , sample variance=25.000053
independent variable:X2 , sample mean=249.9989795262 , sample variance=101.003944
[checking the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
----------------------------------------------------------------------------------
class     lower limit   upper limit   observed no      probability   expected no      chi square
----------------------------------------------------------------------------------
[ 1 ]     -inf          -6.57963      5001503.00000    0.05000       5000000.00000    0.45180
[ 2 ]     -6.57963      -5.12640      4996434.00000    0.05000       5000000.00000    2.54327
[ 3 ]     -5.12640      -4.14571      4999121.00000    0.05000       5000000.00000    0.15453
[ 4 ]     -4.14571      -3.36646      5000801.00000    0.05000       5000000.00000    0.12832
[ 5 ]     -3.36646      -2.69787      5004121.00000    0.05000       5000000.00000    3.39653
[ 6 ]     -2.69787      -2.09751      4997297.00000    0.05000       5000000.00000    1.46124
[ 7 ]     -2.09751      -1.54116      5001568.00000    0.05000       5000000.00000    0.49172
[ 8 ]     -1.54116      -1.01430      4991596.00000    0.05000       5000000.00000    14.12544
[ 9 ]     -1.01430      -0.50245      5009405.00000    0.05000       5000000.00000    17.69081
[ 10 ]    -0.50245      -0.00091      4986991.00000    0.05000       5000000.00000    33.84682
[ 11 ]    -0.00091      0.50174       5001564.00000    0.05000       5000000.00000    0.48922
[ 12 ]    0.50174       1.01330       5007662.00000    0.05000       5000000.00000    11.74125
[ 13 ]    1.01330       1.54113       4998122.00000    0.05000       5000000.00000    0.70538
[ 14 ]    1.54113       2.09752       5001150.00000    0.05000       5000000.00000    0.26450
[ 15 ]    2.09752       2.69771       4998299.00000    0.05000       5000000.00000    0.57868
[ 16 ]    2.69771       3.36621       5002529.00000    0.05000       5000000.00000    1.27917
[ 17 ]    3.36621       4.14553       5000721.00000    0.05000       5000000.00000    0.10397
[ 18 ]    4.14553       5.12598       4997985.00000    0.05000       5000000.00000    0.81205
[ 19 ]    5.12598       6.57944       5003478.00000    0.05000       5000000.00000    2.41930
[ 20 ]    6.57944       +inf          4999653.00000    0.05000       5000000.00000    0.02408
----------------------------------------------------------------------------------
degree of freedom=18
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) is unknown
pearson chi-square test statistic =92.708066
p-value=0.000000
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=49997738
number of the positive of residual=50002262
H0: residual is random , H1: Increasing line or decreasing line
Z=0.141220, p-value=0.556200
H0: residual is random , H1: Oscillation
Z=0.141220, p-value=0.443800
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=0.141220, p-value=0.887600
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,100000000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=2.000287
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=1.999713
2. The population sigma of error confidence interval
90% confidence interval for population variance [15.996335 , 16.003779]
90% confidence interval for population standard deviation [3.999542 , 4.000472]
95% confidence interval for population variance [15.995622 , 16.004492]
95% confidence interval for population standard deviation [3.999453 , 4.000562]
99% confidence interval for population variance [15.994229 , 16.005887]
99% confidence interval for population standard deviation [3.999279 , 4.000736]
Figures: the joint probability distribution of X3 estimated line and residual; the joint probability distribution of X3 estimated line and X3
(35.2.3) Merging two lines, one is the Dummy=0 line and the other is the Dummy=1 line.
Dummy explains the two lines:
$Dummy \sim Bernoulli(p = 0.5)$, so the sample sizes of the two lines are equal,
$X_1 \sim Normal(E(X_1) = 100, Var(X_1) = 25)$,
$X_2 \mid x_1 \sim Normal(E(X_2 \mid x_1) = 50 + 2x_1, Var(X_2 \mid x_1) = 1)$,
$\varepsilon \sim Normal(E(\varepsilon) = 0, Var(\varepsilon) = 16)$,
$X_3 \mid Dummy = 0, x_1, x_2 = 50 + 2x_1 + 3x_2 + \varepsilon$
and
$X_1 \sim Normal(E(X_1) = 100, Var(X_1) = 25)$,
$X_2 \mid x_1 \sim Normal(E(X_2 \mid x_1) = 50 + 2x_1, Var(X_2 \mid x_1) = 1)$,
$\varepsilon \sim Normal(E(\varepsilon) = 0, Var(\varepsilon) = 16)$,
$X_3 \mid Dummy = 1, x_1, x_2 = 10 - x_1 + 5x_2 + \varepsilon$
Dummy and X1, X2, ε are independent random variables, ε and X1, X2 are
independent random variables, and X1, X2 are dependent random variables.
The joint probability distribution f(Dummy, x1, x2, x3) is obtained
from f(Dummy, x1, x2, error):
$$f(Dummy, x_1, x_2) = 0.5^{Dummy} \times 0.5^{1 - Dummy} \times \frac{1}{5\sqrt{2\pi}} \exp\left(-\frac{(x_1 - 100)^2}{50}\right) \times \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(x_2 - (50 + 2x_1))^2}{2}\right),$$
$$f(error) = \frac{1}{4\sqrt{2\pi}} \exp\left(-\frac{error^2}{32}\right), \quad -\infty < error < \infty, \; -\infty < x_1, x_2 < \infty, \; Dummy = 0, 1,$$
$$f(x_3 = 50 + 2x_1 + 3x_2 + error \mid Dummy = 0, x_1, x_2) = \frac{1}{4\sqrt{2\pi}} \exp\left(-\frac{(x_3 - 50 - 2x_1 - 3x_2)^2}{32}\right),$$
$$f(x_3 = 10 - x_1 + 5x_2 + error \mid Dummy = 1, x_1, x_2) = \frac{1}{4\sqrt{2\pi}} \exp\left(-\frac{(x_3 - 10 + x_1 - 5x_2)^2}{32}\right).$$
The conditional probability distribution of X3 given Dummy=0.
Mathematical Mean: 1000.00251
Geometrical Mean : 999.18840
Harmonic Mean : 998.37229
Variance : 1624.91298
S.D. : 40.31021
Skewed Coef. : 0.00024
Kurtosis Coef. : 2.99926
MAD : 32.16495
Range : 454.78176
Mid_range : 997.76593
Median : 999.99921
Q1 : 972.81214
Q2 : 999.99921
Q3 : 1027.19611
IQR : 54.38398
C.V. : 0.04031
X3|Dummy=0~Normal(1000.00251, 1624.91298),
X3|Dummy=1~Normal(1159.99545,2066.06690),
Note: the marginal probability distribution of X3 is not obtained from
(X3|Dummy=0 + X3|Dummy=1)/2
~Normal((1000.00251+1159.99545)/2, (1624.91298+2066.06690)/4).
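The difference between the two constructions shows up clearly in the variance. The sketch below compares the moments of the 50/50 mixture of the two conditional distributions (which is how the marginal of X3 arises when Dummy ~ Bernoulli(0.5)) with the moments of the average of two independent normal variables; the function names are illustrative assumptions, and the independence of the two averaged variables is assumed only for this comparison.

```python
# Sketch: mixture of two conditional distributions versus their average.

def mixture_moments(p, mean0, var0, mean1, var1):
    """Mean and variance of a mixture with weight p on component 1."""
    mean = (1.0 - p) * mean0 + p * mean1
    var = ((1.0 - p) * var0 + p * var1
           + p * (1.0 - p) * (mean1 - mean0) ** 2)   # between-line spread
    return mean, var

def average_moments(mean0, var0, mean1, var1):
    """Mean and variance of (Y0 + Y1)/2 for independent Y0 and Y1."""
    return (mean0 + mean1) / 2.0, (var0 + var1) / 4.0

m0, v0 = 1000.00251, 1624.91298     # X3 | Dummy=0
m1, v1 = 1159.99545, 2066.06690     # X3 | Dummy=1

mix_mean, mix_var = mixture_moments(0.5, m0, v0, m1, v1)
avg_mean, avg_var = average_moments(m0, v0, m1, v1)

print(f"mixture : mean={mix_mean:.5f}, variance={mix_var:.5f}")
print(f"average : mean={avg_mean:.5f}, variance={avg_var:.5f}")
```

The two means coincide, but the mixture variance contains the extra term for the distance between the two line means, so it is much larger than the variance of the average. The mixture of two normal distributions with different means is also not a normal distribution in general.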
6.7. The endogenous variable in the linear model, the other assumptions are unchanged.
Example 36,
$X_2(t+1) = \beta_0 + \beta_1 X_1(t) + \beta_2 X_3(t) + \beta_3 X_4(t) + \varepsilon_1(t),$
$X_1(t+1) = \alpha_0 + \alpha_1 X_2(t+1) + \alpha_2 X_3(t+1) + \alpha_3 X_4(t+1) + \varepsilon_2(t+1),$
X3(t)~ Normal(mu=10,sigma*sigma=4),
X4(t)~ Normal(mu=30+2*X3,sigma*sigma=25),
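The recursive structure of Example 36 can be sketched as a small simulation. The coefficients of the X2 equation are the ones stated in (36.2.5); the coefficients of the X1 equation (0.2, 0.9, 0.3, -0.01), the error variances of 1 and the starting value are assumptions made here, chosen to be close to the estimates printed in the output, and `simulate` is a hypothetical helper name.

```python
# Sketch: simulate the two-equation system of Example 36.
# X2 equation: coefficients as stated in (36.2.5).
# X1 equation, error variances and the starting value: ASSUMED values.
import random

B0, B1, B2, B3 = 0.1, 0.8, 0.2, -0.02     # X2(t+1) equation
A0, A1, A2, A3 = 0.2, 0.9, 0.3, -0.01     # X1(t+1) equation (assumption)

def simulate(n, seed=0):
    """Return n rows (X1, X2, X3, X4), each row dated t+1."""
    rng = random.Random(seed)
    x1 = 13.0                              # arbitrary starting value
    x3 = rng.gauss(10.0, 2.0)              # Var(X3) = 4
    x4 = rng.gauss(30.0 + 2.0 * x3, 5.0)   # Var(X4 | X3) = 25
    rows = []
    for _ in range(n):
        # X2(t+1) depends on the time-t values of X1, X3, X4.
        x2_next = B0 + B1 * x1 + B2 * x3 + B3 * x4 + rng.gauss(0.0, 1.0)
        x3_next = rng.gauss(10.0, 2.0)
        x4_next = rng.gauss(30.0 + 2.0 * x3_next, 5.0)
        # X1(t+1) depends on X2(t+1), which makes X2 endogenous.
        x1_next = (A0 + A1 * x2_next + A2 * x3_next + A3 * x4_next
                   + rng.gauss(0.0, 1.0))
        rows.append((x1_next, x2_next, x3_next, x4_next))
        x1, x3, x4 = x1_next, x3_next, x4_next
    return rows

rows = simulate(20000)
mean_x1 = sum(r[0] for r in rows) / len(rows)
mean_x2 = sum(r[1] for r in rows) / len(rows)
print(f"sample mean of X1 = {mean_x1:.4f}, sample mean of X2 = {mean_x2:.4f}")
```

Under these assumed coefficients the stationary means are about 13.18 for X1 and 11.64 for X2, which is in line with the sample means printed in the big data output of this example.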
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 2.5047251362 0.2532236294 9.89136 0.00000
X2 0.8697515895 0.0135652896 64.11596 0.00000
X3 0.0395712780 0.0163199930 2.42471 0.01520
X4 0.0048332598 0.0050010169 0.96646 0.33360
----------------------------------------------------------------------------------
MSE=1.2282179783 , R2=0.677300 , R2(adj)=0.676815
dependent variable:X1 , sample mean=13.4839491171 , sample variance=3.800353
independent variable:X2 , sample mean=11.8880073355 , sample variance=3.361917
independent variable:X3 , sample mean=10.0566527547 , sample variance=3.696124
independent variable:X4 , sample mean=49.9985749896 , sample variance=39.147886
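Each t test value in the individual test is the coefficient divided by its standard error, and R2(adj) follows from R2, the sample size and the number of independent variables. The sketch below checks both against the output above; the sample size of 2000 is taken from the run test and Durbin Watson output of the same analysis, the adjusted R2 formula is the usual textbook one, and the function names are illustrative assumptions.

```python
# Sketch: t test value and adjusted R2 for the individual test above.

def t_value(coefficient, standard_error):
    """t test value for H0: coefficient = 0."""
    return coefficient / standard_error

def adjusted_r2(r2, n, k):
    """Usual adjusted R2 for n samples and k independent variables."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

t_intercept = t_value(2.5047251362, 0.2532236294)
t_x4 = t_value(0.0048332598, 0.0050010169)
r2_adj = adjusted_r2(0.677300, 2000, 3)

print(f"t(intercept)={t_intercept:.5f}, t(X4)={t_x4:.5f}")
print(f"R2(adj)={r2_adj:.6f}")
```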
[checking the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
----------------------------------------------------------------------------------
class     lower limit   upper limit   observed no   probability   expected no   chi square
----------------------------------------------------------------------------------
[ 1 ]     -inf          -1.47983      178.00000     0.09091       181.81818     0.08018
[ 2 ]     -1.47983      -1.00687      186.00000     0.09091       181.81818     0.09618
[ 3 ]     -1.00687      -0.67010      168.00000     0.09091       181.81818     1.05018
[ 4 ]     -0.67010      -0.38658      172.00000     0.09091       181.81818     0.53018
[ 5 ]     -0.38658      -0.12662      189.00000     0.09091       181.81818     0.28368
[ 6 ]     -0.12662      0.12643       193.00000     0.09091       181.81818     0.68768
[ 7 ]     0.12643       0.38628       161.00000     0.09091       181.81818     2.38368
[ 8 ]     0.38628       0.66976       210.00000     0.09091       181.81818     4.36818
[ 9 ]     0.66976       1.00641       197.00000     0.09091       181.81818     1.26768
[ 10 ]    1.00641       1.47906       162.00000     0.09091       181.81818     2.16018
[ 11 ]    1.47906       +inf          184.00000     0.09091       181.81818     0.02618
----------------------------------------------------------------------------------
degree of freedom=9
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) is unknown
pearson chi-square test statistic =12.934000
p-value=0.165600
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=989
number of the positive of residual=1011
H0: residual is random , H1: Increasing line or decreasing line
Z=-4.199955, p-value=0.000100
H0: residual is random , H1: Oscillation
Z=-4.199955, p-value=0.999900
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-4.199955, p-value=0.000200
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,2000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=1.717703
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=2.282297
The D.W. table depends on the values of the independent variables,
so getting the p value takes a lot of time.
The Durbin Watson table is wrong at present.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
(36.1.2) Merging two lines, there are 2000 paired samples.
X2 is dependent variable and X1,X3,X4 are independent variables,
X 2 = β 0 + β1 X 1 + β 2 X 3 + β 3 X 4 + ε 1 ,
The linear model analysis
Dependent variable is X2,
Independent variables are X1,X3,X4
The correlation matrix is below
r(X2,X1)=0.821468,r(X2,X3)=0.077767,r(X2,X4)=0.025066,r(X1,X3)=0.112102,
r(X1,X4)=0.059818,r(X3,X4)=0.609887,
The estimated line is X2=1.805642+0.773962*X1+0.000384*X3-0.007151*X4
ANOVA
----------------------------------------------------------------------------------
Source df SS MS
----------------------------------------------------------------------------------
Regression 3 4538.9481394521 1512.9827131507
error 1996 2181.5246729835 1.0929482330
total 1999 6720.4728124356
----------------------------------------------------------------------------------
F test value=1384.3132433239
The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 1.8056421122 0.2412956998 7.48311 0.00000
X1 0.7739615277 0.0120712769 64.11596 0.00000
X3 0.0003842533 0.0154177372 0.02492 0.98000
X4 -0.0071513426 0.0047159801 -1.51641 0.12960
----------------------------------------------------------------------------------
MSE=1.0929482330 , R2=0.675391 , R2(adj)=0.674903
dependent variable:X2 , sample mean=11.8880073355 , sample variance=3.361917
independent variable:X1 , sample mean=13.4839491171 , sample variance=3.800353
independent variable:X3 , sample mean=10.0566527547 , sample variance=3.696124
independent variable:X4 , sample mean=49.9985749896 , sample variance=39.147886
[checking the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
----------------------------------------------------------------------------------
class     lower limit   upper limit   observed no   probability   expected no   chi square
----------------------------------------------------------------------------------
[ 1 ]     -inf          -1.39597      197.00000     0.09091       181.81818     1.26768
[ 2 ]     -1.39597      -0.94980      177.00000     0.09091       181.81818     0.12768
[ 3 ]     -0.94980      -0.63212      165.00000     0.09091       181.81818     1.55568
[ 4 ]     -0.63212      -0.36467      196.00000     0.09091       181.81818     1.10618
[ 5 ]     -0.36467      -0.11945      177.00000     0.09091       181.81818     0.12768
[ 6 ]     -0.11945      0.11926       194.00000     0.09091       181.81818     0.81618
[ 7 ]     0.11926       0.36439       183.00000     0.09091       181.81818     0.00768
[ 8 ]     0.36439       0.63180       170.00000     0.09091       181.81818     0.76818
[ 9 ]     0.63180       0.94937       181.00000     0.09091       181.81818     0.00368
[ 10 ]    0.94937       1.39524       163.00000     0.09091       181.81818     1.94768
[ 11 ]    1.39524       +inf          197.00000     0.09091       181.81818     1.26768
----------------------------------------------------------------------------------
degree of freedom=9
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) is unknown
pearson chi-square test statistic =8.996000
p-value=0.437600
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=1009
number of the positive of residual=991
H0: residual is random , H1: Increasing line or decreasing line
Z=-3.262117, p-value=0.000600
H0: residual is random , H1: Oscillation
Z=-3.262117, p-value=0.999400
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-3.262117, p-value=0.001200
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,2000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=1.737485
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=2.262515
The D.W. table depends on the values of the independent variables,
so getting the p value takes a lot of time.
The Durbin Watson table is wrong at present.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
2. The population sigma of error confidence interval
90% confidence interval for population variance
[1.038856 , 1.152982]
90% confidence interval for population standard deviation
[1.019243 , 1.073770]
95% confidence interval for population variance
[1.029100 , 1.165243]
95% confidence interval for population standard deviation
[1.014446 , 1.079464]
99% confidence interval for population variance
[1.010542 , 1.189988]
99% confidence interval for population standard deviation
[1.005257 , 1.090866]
Figures: residual plot; (X2 estimated line, X2) scatter diagram
X1=0.778870+0.878632*X2+0.291467*X3-0.013764*X4
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 3 2808.8009175528 936.2669725176 932.8222440846
error 996 999.6780314159 1.0036928026
total 999 3808.4789489687
----------------------------------------------------------------------------------
The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 0.7788696180 0.3249882453 2.39661 0.01640
X2 0.8786315019 0.0172949985 50.80264 0.00000
X3 0.2914669693 0.0219899836 13.25453 0.00000
X4 -0.0137636091 0.0065889029 -2.08891 0.03680
----------------------------------------------------------------------------------
MSE=1.0036928026 , R2=0.737513 , R2(adj)=0.736722
dependent variable:X1 , sample mean=13.4885297649 , sample variance=3.812291
independent variable:X2 , sample mean=11.8880073355 , sample variance=3.363600
independent variable:X3 , sample mean=10.1413624600 , sample variance=3.579252
independent variable:X4 , sample mean=50.2331742135 , sample variance=39.863320
[checking the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
----------------------------------------------------------------------------------
class     lower limit   upper limit   observed no   probability   expected no   chi square
----------------------------------------------------------------------------------
[ 1 ]     -inf          -1.28396      96.00000      0.10000       100.00000     0.16000
[ 2 ]     -1.28396      -0.84317      100.00000     0.10000       100.00000     0.00000
[ 3 ]     -0.84317      -0.52534      98.00000      0.10000       100.00000     0.04000
[ 4 ]     -0.52534      -0.25378      102.00000     0.10000       100.00000     0.04000
[ 5 ]     -0.25378      0.00003       96.00000      0.10000       100.00000     0.16000
[ 6 ]     0.00003       0.25379       105.00000     0.10000       100.00000     0.25000
[ 7 ]     0.25379       0.52535       100.00000     0.10000       100.00000     0.00000
[ 8 ]     0.52535       0.84275       109.00000     0.10000       100.00000     0.81000
[ 9 ]     0.84275       1.28386       94.00000      0.10000       100.00000     0.36000
[ 10 ]    1.28386       +inf          100.00000     0.10000       100.00000     0.00000
----------------------------------------------------------------------------------
degree of freedom=8
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) is unknown
pearson chi-square test statistic =1.820000
p-value=0.986000
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=492
number of the positive of residual=508
H0: residual is random , H1: Increasing line or decreasing line
Z=-1.131181, p-value=0.129000
H0: residual is random , H1: Oscillation
Z=-1.131181, p-value=0.871000
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-1.131181, p-value=0.258000
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,1000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=1.994126
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=2.005874
The D.W. table depends on the values of the independent variables,
so getting the p value takes a lot of time.
The Durbin Watson table is wrong at present.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
2. The population sigma of error confidence interval
90% confidence interval for population variance
[0.934790 , 1.083562]
90% confidence interval for population standard deviation
[0.966845 , 1.040943]
95% confidence interval for population variance
[0.922656 , 1.100335]
95% confidence interval for population standard deviation
[0.960550 , 1.048969]
99% confidence interval for population variance
[0.899818 , 1.134680]
99% confidence interval for population standard deviation
[0.948587 , 1.065214]
Figures: residual plot; (X1 estimated line, X1) scatter diagram
independent variable:X3 , sample mean=9.9719430494 , sample variance=3.802330
independent variable:X4 , sample mean=49.7639757658 , sample variance=38.361456
[checking the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
----------------------------------------------------------------------------------
class     lower limit   upper limit   observed no   probability   expected no   chi square
----------------------------------------------------------------------------------
[ 1 ]     -inf          -1.26275      100.00000     0.10000       100.00000     0.00000
[ 2 ]     -1.26275      -0.82923      92.00000      0.10000       100.00000     0.64000
[ 3 ]     -0.82923      -0.51666      107.00000     0.10000       100.00000     0.49000
[ 4 ]     -0.51666      -0.24959      112.00000     0.10000       100.00000     1.44000
[ 5 ]     -0.24959      0.00002       100.00000     0.10000       100.00000     0.00000
[ 6 ]     0.00002       0.24960       86.00000      0.10000       100.00000     1.96000
[ 7 ]     0.24960       0.51667       101.00000     0.10000       100.00000     0.01000
[ 8 ]     0.51667       0.82882       104.00000     0.10000       100.00000     0.16000
[ 9 ]     0.82882       1.26264       99.00000      0.10000       100.00000     0.01000
[ 10 ]    1.26264       +inf          99.00000      0.10000       100.00000     0.01000
----------------------------------------------------------------------------------
degree of freedom=8
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) is unknown
pearson chi-square test statistic =4.720000
p-value=0.787000
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=511
number of the positive of residual=489
H0: residual is random , H1: Increasing line or decreasing line
Z=1.408094, p-value=0.920500
H0: residual is random , H1: Oscillation
Z=1.408094, p-value=0.079500
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=1.408094, p-value=0.159000
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,1000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=2.091406
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=1.908594
The D.W. table depends on the values of the independent variables,
so getting the p value takes a lot of time.
The Durbin Watson table is wrong at present.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
2. The population sigma of error confidence interval
90% confidence interval for population variance [0.904153 , 1.048050]
90% confidence interval for population standard deviation [0.950870 , 1.023743]
95% confidence interval for population variance [0.892417 , 1.064273]
95% confidence interval for population standard deviation [0.944678 , 1.031636]
99% confidence interval for population variance [0.870328 , 1.097493]
99% confidence interval for population standard deviation [0.932914 , 1.047613]
Figures: residual plot; (X2 estimated line, X2) scatter diagram
(36.1.5) Conclusion,
From the above output, the two lines cannot be merged into one line.
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 1.9159002951 0.0010857543 1764.57991 0.00000
X2 0.8986221367 0.0000561273 16010.43525 0.00000
X3 0.0601329080 0.0000723854 830.73206 0.00000
X4 0.0039825055 0.0000225497 176.61009 0.00000
----------------------------------------------------------------------------------
MSE=1.2701932289 , R2=0.724174 , R2(adj)=0.724174
dependent variable:X1 , sample mean=13.1800094690 , sample variance=4.605060
independent variable:X2 , sample mean= 11.6440934994 , sample variance=4.060056
independent variable:X3 , sample mean= 10.0002083351 , sample variance=4.000203
independent variable:X4 , sample mean=50.0005300270 , sample variance=40.997526
[checking the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
----------------------------------------------------------------------------------
class     lower limit   upper limit   observed no      probability   expected no      chi square
----------------------------------------------------------------------------------
[ 1 ]     -inf          -1.85385      4998822.00000    0.05000       5000000.00000    0.27754
[ 2 ]     -1.85385      -1.44440      4984458.00000    0.05000       5000000.00000    48.31075
[ 3 ]     -1.44440      -1.16808      4988518.00000    0.05000       5000000.00000    26.36726
[ 4 ]     -1.16808      -0.94852      4995274.00000    0.05000       5000000.00000    4.46702
[ 5 ]     -0.94852      -0.76014      5001333.00000    0.05000       5000000.00000    0.35538
[ 6 ]     -0.76014      -0.59099      5000594.00000    0.05000       5000000.00000    0.07057
[ 7 ]     -0.59099      -0.43423      5011305.00000    0.05000       5000000.00000    25.56060
[ 8 ]     -0.43423      -0.28579      4993699.00000    0.05000       5000000.00000    7.94052
[ 9 ]     -0.28579      -0.14157      5015713.00000    0.05000       5000000.00000    49.37967
[ 10 ]    -0.14157      -0.00026      4995329.00000    0.05000       5000000.00000    4.36365
[ 11 ]    -0.00026      0.14137       5008776.00000    0.05000       5000000.00000    15.40364
[ 12 ]    0.14137       0.28550       5018245.00000    0.05000       5000000.00000    66.57600
[ 13 ]    0.28550       0.43422       5006062.00000    0.05000       5000000.00000    7.34957
[ 14 ]    0.43422       0.59099       5009219.00000    0.05000       5000000.00000    16.99799
[ 15 ]    0.59099       0.76010       5000800.00000    0.05000       5000000.00000    0.12800
[ 16 ]    0.76010       0.94845       5000197.00000    0.05000       5000000.00000    0.00776
[ 17 ]    0.94845       1.16803       4997609.00000    0.05000       5000000.00000    1.14338
[ 18 ]    1.16803       1.44428       4992711.00000    0.05000       5000000.00000    10.62590
[ 19 ]    1.44428       1.85380       4984845.00000    0.05000       5000000.00000    45.93480
[ 20 ]    1.85380       +inf          4996491.00000    0.05000       5000000.00000    2.46262
----------------------------------------------------------------------------------
degree of freedom=18
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) is unknown
pearson chi-square test statistic =333.722626
p-value=0.000000
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=49994022
number of the positive of residual=50005978
H0: residual is random , H1: Increasing line or decreasing line
Z=-890.032474, p-value=0.000000
H0: residual is random , H1: Oscillation
Z=-890.032474, p-value=1.000000
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-890.032474, p-value=0.000000
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,100000000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=1.724288
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=2.275712
The D.W. table depends on the values of the independent variables,
so getting the p value takes a lot of time.
The Durbin Watson table is wrong at present.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
2. The population sigma of error confidence interval
90% confidence interval for population variance
[1.269898 , 1.270489]
90% confidence interval for population standard deviation
[1.126897 , 1.127160]
95% confidence interval for population variance
[1.269841 , 1.270545]
95% confidence interval for population standard deviation
[1.126872 , 1.127185]
99% confidence interval for population variance
[1.269731 , 1.270656]
99% confidence interval for population standard deviation
[1.126823 , 1.127234]
Figures: the joint probability distribution of X1 estimated line and residual; the joint probability distribution of X1 estimated line and X1
The marginal probability distribution of X1 estimated line
Mathematical Mean: 13.18001
Geometrical Mean : 13.05024
Harmonic Mean : 12.91620
Variance : 3.33487
S.D. : 1.82616
Skewed Coef. : 0.00067
Kurtosis Coef. : 3.00057
MAD : 1.45705
Range : 21.73526
Mid_range : 13.38322
Median : 13.17991
Q1 : 11.94844
Q2 : 13.17991
Q3 : 14.41184
IQR : 2.46340
C.V. : 0.13856
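The descriptive statistics in this kind of summary can be sketched with the Python standard library. The definitions below are assumptions inferred from the printed values: the variance is taken as the sample variance, MAD as the mean absolute deviation from the mean, the kurtosis coefficient as the non-excess one (3 for a normal distribution), and the quartiles use Python's default method, which may differ from the authors' software. The function name `describe` and the toy data are illustrative.

```python
# Sketch: descriptive statistics with the labels used in the output.
# The exact definitions used by the authors' software are assumed.
import statistics as st

def describe(values):
    xs = sorted(values)
    n = len(xs)
    mean = st.fmean(xs)
    sd = st.stdev(xs)                       # sample standard deviation
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    q1, q2, q3 = st.quantiles(xs, n=4)      # Python's default quartile method
    return {
        "Mathematical Mean": mean,
        "Geometrical Mean": st.geometric_mean(xs),   # needs positive data
        "Harmonic Mean": st.harmonic_mean(xs),
        "Variance": st.variance(xs),
        "S.D.": sd,
        "Skewed Coef.": m3 / m2 ** 1.5,
        "Kurtosis Coef.": m4 / m2 ** 2,
        "MAD": sum(abs(x - mean) for x in xs) / n,
        "Range": xs[-1] - xs[0],
        "Mid_range": (xs[-1] + xs[0]) / 2.0,
        "Median": st.median(xs),
        "Q1": q1, "Q2": q2, "Q3": q3,
        "IQR": q3 - q1,
        "C.V.": sd / mean,
    }

summary = describe([1.0, 2.0, 3.0, 4.0, 5.0])   # toy data, not from the source
for label, value in summary.items():
    print(f"{label:<18s}: {value:.5f}")
```

For the output above, C.V. is S.D. divided by the mean (1.82616/13.18001) and IQR is Q3 minus Q1 (14.41184 - 11.94844), which agrees with the printed 0.13856 and 2.46340.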
----------------------------------------------------------------------------------
intercept 1.5939066857 0.0010283287 1549.99726 0.00000
X1 0.8005194001 0.0000499999 16010.43525 0.00000
X3 -0.0201006237 0.0000685260 -293.32850 0.00000
X4 -0.0059930572 0.0000212781 -281.65318 0.00000
----------------------------------------------------------------------------------
MSE=1.1315260108 , R2=0.721303 , R2(adj)=0.721303
dependent variable:X2 , sample mean=11.6440934994 , sample variance=4.060056
independent variable:X1 , sample mean= 13.1800094690 , sample variance=4.605060
independent variable:X3 , sample mean=10.0002083351 , sample variance=4.000203
independent variable:X4 , sample mean=50.0005300270 , sample variance=40.997526
[1.131263 , 1.131789]
90% confidence interval for population standard deviation
[1.063608 , 1.063856]
95% confidence interval for population variance
[1.131212 , 1.131840]
95% confidence interval for population standard deviation
[1.063585 , 1.063880]
99% confidence interval for population variance
[1.131114 , 1.131938]
99% confidence interval for population standard deviation
[1.063538 , 1.063926]
Figures: the joint probability distribution of X2 estimated line and residual; the joint probability distribution of X2 estimated line and X2
(36.2.3)The marginal probability distribution of X1,X2,X3,X4,
X1 marginal probability distribution
Mathematical Mean: 13.18001
Geometrical Mean : 12.99892
Harmonic Mean : 12.80899
Variance : 4.60506
S.D. : 2.14594
Skewed Coef. : 0.00081
Kurtosis Coef. : 3.00012
MAD : 1.71220
Range : 25.60974
Mid_range : 12.93215
Median : 13.17982
Q1 : 11.73255
Q2 : 13.17982
Q3 : 14.62702
IQR : 2.89447
C.V. : 0.16282
X2 marginal probability distribution
Mathematical Mean: 11.64409
Geometrical Mean : 11.46243
Harmonic Mean : 11.27023
Variance : 4.06006
S.D. : 2.01496
Skewed Coef. : 0.00066
Kurtosis Coef. : 3.00019
MAD : 1.60771
Range : 23.52923
Mid_range : 11.77931
Median : 11.64399
Q1 : 10.28502
Q2 : 11.64399
Q3 : 13.00343
IQR : 2.71841
C.V. : 0.17305
(36.2.4) The joint probability distribution of two random variables from X1,X2,X3,X4.
Figures: f(x1,x2) and f(x2,x1)
Figures: f(x2,x3) and f(x3,x2)
The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 0.2001544589 0.0013811857 144.91495 0.00000
X2 0.9000407087 0.0000701843 12823.95337 0.00000
X3 0.2999729077 0.0000905484 3312.84656 0.00000
X4 -0.0100017702 0.0000282866 -353.58635 0.00000
----------------------------------------------------------------------------------
MSE=0.9999595451 , R2=0.782856 , R2(adj)=0.782856
dependent variable:X1 , sample mean= 13.1800095007 , sample variance=4.605060
independent variable:X2 , sample mean=11.6440934995 , sample variance=4.060056
independent variable:X3 , sample mean=10.0002062052 , sample variance=4.001183
independent variable:X4 , sample mean=50.0005542746 , sample variance=41.000300
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
2. The population sigma of error confidence interval
90% confidence interval for population variance [0.999631 , 1.000289]
90% confidence interval for population standard deviation [0.999815 , 1.000144]
95% confidence interval for population variance [0.999568 , 1.000352]
95% confidence interval for population standard deviation [0.999784 , 1.000176]
99% confidence interval for population variance [0.999445 , 1.000475]
99% confidence interval for population standard deviation [0.999722 , 1.000237]
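The intervals above can be approximated with a short calculation. The sketch below is an illustration only, not the software that produced this output. It assumes the error degrees of freedom are 49,999,996 (the value shown in the ANOVA table of (36.2.5)), and it replaces the chi-square quantiles by their large-sample normal approximation, df ± z*sqrt(2*df); both are assumptions of this sketch. The function name `variance_ci` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def variance_ci(mse, df, level):
    """Approximate confidence interval for the error variance.

    Uses chi2 quantile ~ df +/- z * sqrt(2 * df), which is only
    reasonable when df is very large (as it is for big data).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    chi2_hi = df + z * sqrt(2.0 * df)
    chi2_lo = df - z * sqrt(2.0 * df)
    return df * mse / chi2_hi, df * mse / chi2_lo


mse, df = 0.9999595451, 49_999_996  # df is an assumption of this sketch
for level in (0.90, 0.95, 0.99):
    lo, hi = variance_ci(mse, df, level)
    print(f"{level:.0%} variance [{lo:.6f} , {hi:.6f}] "
          f"s.d. [{sqrt(lo):.6f} , {sqrt(hi):.6f}]")
```

With 50,000,000 samples the interval is only a few ten-thousandths wide, which is why the three confidence levels above differ so little.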
[Figure panels: the joint probability distribution of the X1 estimated line and the residual; the joint probability distribution of the X1 estimated line and X1]
(36.2.5) $X_2(t+1) = 0.1 + 0.8\,X_1(t) + 0.2\,X_3(t) - 0.02\,X_4(t) + \varepsilon_1(t)$,
there are 50,000,000 paired samples.
The linear model analysis
Dependent variable is X2,
Independent variables are X1,X3,X4
The correlation matrix is below
r(X2,X1)=0.852045,r(X2,X3)=0.158640,r(X2,X4)=0.060304,r(X1,X3)=-0.000222,
r(X1,X4)=-0.000188,r(X3,X4)=0.624718,
The estimated line is X2=0.098942+0.800069*X1+0.200046*X3+-0.020005*X4
ANOVA
----------------------------------------------------------------------------------
Source df SS MS
----------------------------------------------------------------------------------
Regression 3 152997438.5548028300 50999146.1849342810
error 49999996 50005342.8307363090 1.0001069366
total 49999999 203002781.3855391400
----------------------------------------------------------------------------------
F test value=50993693.0915864330,
The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 0.0989422760 0.0014125169 70.04679 0.00000
X1 0.8000688390 0.0000659053 12139.66651 0.00000
X3 0.2000458617 0.0000905696 2208.75324 0.00000
X4 -0.0200050845 0.0000282883 -707.18705 0.00000
----------------------------------------------------------------------------------
MSE=1.0001069366 , R2=0.753672 , R2(adj)=0.753672
dependent variable:X2 , sample mean=11.6440934995 , sample variance=4.060056
independent variable:X1 , sample mean=13.1800094374 , sample variance=4.605060
independent variable:X3 , sample mean= 10.0002104649 , sample variance=3.999224
independent variable:X4 , sample mean=50.0005057795 , sample variance=40.994753
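The linear model analysis above can be imitated on a small scale. The following is a minimal sketch, assuming independent normal regressors whose means and standard deviations roughly match the sample values printed above; the real population distributions of X1, X3 and X4 are not reproduced here, and the helper names `solve` and `ols` are hypothetical. It fits the model by the normal equations in plain Python and reports the coefficients, the MSE and R2.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination (partial pivoting)."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols(rows, y):
    """Least-squares fit of y on the columns of rows, with an intercept."""
    xs = [[1.0] + list(r) for r in rows]
    p = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * yi for x, yi in zip(xs, y)) for i in range(p)]
    beta = solve(xtx, xty)
    sse = sum((yi - sum(b * v for b, v in zip(beta, x))) ** 2
              for x, yi in zip(xs, y))
    ybar = sum(y) / len(y)
    sst = sum((yi - ybar) ** 2 for yi in y)
    return beta, sse / (len(y) - p), 1.0 - sse / sst


rng = random.Random(0)
rows, y = [], []
for _ in range(20000):  # far fewer than the 50,000,000 samples in the text
    x1 = rng.gauss(13.18, 2.146)  # assumed marginal, illustration only
    x3 = rng.gauss(10.0, 2.0)     # assumed marginal, illustration only
    x4 = rng.gauss(50.0, 6.4)     # assumed marginal, illustration only
    rows.append((x1, x3, x4))
    y.append(0.1 + 0.8 * x1 + 0.2 * x3 - 0.02 * x4 + rng.gauss(0.0, 1.0))

beta, mse, r2 = ols(rows, y)
print("coefficients:", [round(b, 4) for b in beta])
print("MSE:", round(mse, 4), "R2:", round(r2, 4))
```

The slopes come back close to 0.8, 0.2 and -0.02, and the MSE close to 1; with only 20000 samples the intercept is noticeably less precise than in the output above.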
number of the negative of residual=24997776
number of the positive of residual=25002224
H0: residual is random , H1: Increasing line or decreasing line
Z=0.491071, p-value=0.688400
H0: residual is random , H1: Oscillation
Z=0.491071, p-value=0.311600
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=0.491071, p-value=0.623200
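The three p-values above are consistent with the lower tail, the upper tail and twice the smaller tail of a standard normal Z. The sketch below assumes the usual Wald-Wolfowitz runs statistic on the signs of the residuals; the document does not state which formula the software uses, and the names `runs_test` and `tail_p_values` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def tail_p_values(z):
    """p-values for the trend, oscillation and two-sided alternatives."""
    lower = NormalDist().cdf(z)  # small when there are too few runs (trend)
    upper = 1.0 - lower          # small when there are too many runs (oscillation)
    return lower, upper, 2.0 * min(lower, upper)


def runs_test(residuals):
    """Runs test on residual signs; returns (z, p_trend, p_oscillation, p_two_sided).

    Needs at least one positive and one negative residual.
    """
    signs = [e > 0 for e in residuals if e != 0]
    n_pos = sum(signs)
    n_neg = len(signs) - n_pos
    n = n_pos + n_neg
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    mean = 2.0 * n_pos * n_neg / n + 1.0
    var = 2.0 * n_pos * n_neg * (2.0 * n_pos * n_neg - n) / (n * n * (n - 1.0))
    z = (runs - mean) / sqrt(var)
    return (z,) + tail_p_values(z)


# Reproduce the three p-values printed above from Z alone.
print([round(p, 4) for p in tail_p_values(0.491071)])
# A strictly alternating series has far too many runs (oscillation).
print(runs_test([1.0, -1.0] * 50))
```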
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,50000000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=1.999954
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=2.000046
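The two test values above are related by 4 - 1.999954 = 2.000046: the statistic for the negative alternative is 4 minus the statistic for the positive one. A minimal sketch of the standard Durbin Watson statistic follows; the residuals in the usage lines are made up, and the function name `durbin_watson` is hypothetical.

```python
def durbin_watson(residuals):
    """d = sum_{t=2..n} (e_t - e_{t-1})^2 / sum_{t=1..n} e_t^2."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(e * e for e in residuals)
    return num / den


e = [0.5, -0.2, 0.1, -0.4, 0.3, 0.0, -0.1]  # made-up residuals
d = durbin_watson(e)
print("d for H1: rho > 0 :", d)
print("4 - d for H1: rho < 0 :", 4.0 - d)
print("approximate rho = 1 - d/2 :", 1.0 - d / 2.0)
```

A value of d near 2 corresponds to an auto correlation coefficient near 0, which is what the output above shows.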
X0= residual, residual marginal probability distribution
Mathematical Mean: 0.00000
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.00011
S.D. : 1.00005
Skewed Coef. : -0.00004
Kurtosis Coef. : 2.99985
MAD : 0.79790
Range : 11.07731
Mid_range : -0.24130
Median : 0.00011
Q1 : -0.67440
Q2 : 0.00011
Q3 : 0.67462
IQR : 1.34903
C.V. : none
Chapter 7. Multi-variate analysis using linear model
Multi-variate analysis is very complex; for big data, linear model analysis will
do the job of multi-variate analysis.
The method is to select one variable from $X_1, \dots, X_k$ as the dependent variable and
to use the other variables as independent variables. The number of linear models is
$$\binom{k}{1}\times\left[\binom{k-1}{1}+\binom{k-1}{2}+\dots+\binom{k-1}{k-1}\right]=k\times\left(2^{k-1}-1\right),$$
The relationship between any two random variables can be read from the correlation
matrix.
Non-linear models can also be run; the non-linear formulas are in Appendix 3. There
are 33 kinds of model, so the number of models is
$$\binom{k}{1}\times\left[\binom{k-1}{1}\times 33+\binom{k-1}{2}\times 33^{2}+\dots+\binom{k-1}{k-1}\times 33^{k-1}\right]=k\times\left(34^{k-1}-1\right),$$
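The two counts can be checked by brute-force enumeration. The sketch below assumes that every model uses at least one independent variable; under that assumption the binomial sums give k*(2^(k-1) - 1) linear models and k*(34^(k-1) - 1) models when each independent variable may take any of 33 forms. Counting the intercept-only model as well would add k to each total. The function names are hypothetical.

```python
from itertools import combinations


def count_by_enumeration(k, forms=1):
    """Count models: one dependent variable, a non-empty subset of the other
    k - 1 variables, and one of `forms` formulas for each chosen regressor."""
    total = 0
    for dep in range(k):
        others = [v for v in range(k) if v != dep]
        for size in range(1, k):
            for subset in combinations(others, size):
                total += forms ** len(subset)
    return total


def count_by_formula(k, forms=1):
    """Closed form from the binomial theorem: k * ((forms + 1)^(k - 1) - 1)."""
    return k * ((forms + 1) ** (k - 1) - 1)


for k in (2, 3, 4, 6):
    print(k, count_by_enumeration(k), count_by_formula(k),
          count_by_formula(k, forms=33))
```

For the six variables of Example 37 this gives 186 linear models, which is why the software has to search rather than print every model.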
Example 37,
(1) The population distribution of sample data,
X1~Shifted exponential(1,0.1),
X2|x1~Normal(4+5*log(x1),4),
X3|x1~Raised cosine(5+x1+log(x1),2),
X4|x1,x2~Semi circle(3+0.5*x1+0.5*x2,4),
X5|x2,x3~Arcsin(4.5+0.3*x2+0.7*x3,3),
X6|x4,x5~DE(0.5,10+2*x4*x5),
$$f_{X_5|x_2,x_3}(x_5|x_2,x_3)=\frac{1}{3\pi}\,\frac{1}{\sqrt{1-\frac{\left(x_5-(4.5+0.3x_2+0.7x_3)\right)^2}{9}}},\quad \left|x_5-(4.5+0.3x_2+0.7x_3)\right|<3,$$
$$f_{X_6|x_4,x_5}(x_6|x_4,x_5)=\frac{1}{4}\exp\left(-0.5\left|x_6-(10+2x_4x_5)\right|\right),\quad -\infty<x_6<\infty$$
(1.2) 100000000 data points are simulated for each random variable,
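The simulation step can be sketched for part of the chain. The code below is an illustration under stated assumptions, not the simulator used for this example: it draws X1 by inverse transform, draws X2 given x1 reading Normal(m,4) as variance 4 (standard deviation 2), and draws an Arcsin(0,3) variable as 3*sin(theta) with theta uniform on (-pi/2, pi/2). The function names and the sample size are hypothetical choices.

```python
import math
import random

rng = random.Random(1)


def shifted_exponential(lam, c):
    """Inverse-transform draw from lam * exp(-lam * (x - c)), x > c."""
    return c - math.log(1.0 - rng.random()) / lam


def arcsin(mu, c):
    """Draw on (mu - c, mu + c) as mu + c * sin(theta), theta ~ U(-pi/2, pi/2)."""
    return mu + c * math.sin(math.pi * (rng.random() - 0.5))


n = 100_000  # far fewer than the 100000000 used in the example
x1 = [shifted_exponential(1.0, 0.1) for _ in range(n)]
# Normal(m, 4) is read here as variance 4, i.e. standard deviation 2.
x2 = [rng.gauss(4.0 + 5.0 * math.log(v), 2.0) for v in x1]
noise5 = [arcsin(0.0, 3.0) for _ in range(n)]

mean_x1 = sum(x1) / n
mean_x2 = sum(x2) / n
var_noise5 = sum(v * v for v in noise5) / n
print("mean X1:", round(mean_x1, 4))
print("mean X2:", round(mean_x2, 4))
print("variance of Arcsin(0,3):", round(var_noise5, 4))
```

The sample means should land near the values 1.09989 and 2.55969 reported for X1 and X2 in (2.1), and the Arcsin variance near 9/2 = 4.5.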
(2) The marginal probability distribution and joint probability distribution from the
sample data,
(2.1) The marginal probability distribution,
f(x1),F(x1) Coefficient
Mathematical Mean: 1.09989
Geometrical Mean : 0.74967
Harmonic Mean : 0.49628
Variance : 1.00023
S.D. : 1.00011
Skewed Coef. : 2.00104
Kurtosis Coef. : 9.01081
MAD : 0.73579
Range : 17.49852
Mid_range : 8.84926
Median : 0.79294
Q1 : 0.38758
Q2 : 0.79294
Q3 : 1.48601
IQR : 1.09843
C.V. : 0.90928
f(x2),F(x2) Coefficient
Mathematical Mean: 2.55969
Geometrical Mean : none
Harmonic Mean : none
Variance : 24.76556
S.D. : 4.97650
Skewed Coef. : -0.11179
Kurtosis Coef. : 2.53399
MAD : 4.06678
Range : 40.40991
Mid_range : 3.54517
Median : 2.77224
Q1 : -0.97310
Q2 : 2.77224
Q3 : 6.18263
IQR : 7.15573
C.V. : 1.94418
f(x3),F(x3) Coefficient
Mathematical Mean: 5.81227
Geometrical Mean : 5.47763
Harmonic Mean : 5.13824
Variance : 3.95023
S.D. : 1.98752
Skewed Coef. : 0.71585
Kurtosis Coef. : 3.86991
MAD : 1.56641
Range : 26.34688
Mid_range : 14.00667
Median : 5.59331
Q1 : 4.37840
Q2 : 5.59331
Q3 : 6.99343
IQR : 2.61503
C.V. : 0.34195
f(x4),F(x4) Coefficient
Mathematical Mean: 4.83042
Geometrical Mean : none
Harmonic Mean : none
Variance : 12.43939
S.D. : 3.52695
Skewed Coef. : 0.07494
Kurtosis Coef. : 2.75477
MAD : 2.84745
Range : 31.39992
Mid_range : 7.55374
Median : 4.79539
Q1 : 2.35577
Q2 : 4.79539
Q3 : 7.26191
IQR : 4.90614
C.V. : 0.73015
f(x5),F(x5) Coefficient
Mathematical Mean: 9.33692
Geometrical Mean : none
Harmonic Mean : none
Variance : 12.08518
S.D. : 3.47638
Skewed Coef. : 0.13754
Kurtosis Coef. : 2.73443
MAD : 2.81281
Range : 32.18093
Mid_range : 14.25918
Median : 9.26423
Q1 : 6.87166
Q2 : 9.26423
Q3 : 11.73394
IQR : 4.86228
C.V. : 0.37233
f(x6),F(x6) Coefficient
Mathematical Mean: 115.76089
Geometrical Mean : none
Harmonic Mean : none
Variance : 9568.44490
S.D. : 97.81843
Skewed Coef. : 1.24458
Kurtosis Coef. : 5.20013
MAD : 76.04563
Range : 1535.85616
Mid_range : 680.08533
Median : 94.68323
Q1 : 42.22852
Q2 : 94.68323
Q3 : 167.12280
IQR : 124.89428
C.V. : 0.84500
(2.2) The joint probability distribution; it can explain the relationship between two random
variables and estimate the mathematical equation relating them.
[Figure panels: f(x1,x2), f(x2,x1); Var(X2|X1), Var(X1|X2)]
E(X1)= 1.1000, Var(X1)= 1.0003, E(X3)= 5.8121, Var(X3)= 3.9513,
Cov(X1,X3)= 1.7989, X1 and X3 correlation coefficient=0.9049.
[Figure panels for (X1, X3) and for each remaining pair (Xi, Xj) of X1, ..., X6: the joint pdfs f(xi,xj) and f(xj,xi), the conditional means E(Xj|Xi) and E(Xi|Xj), and the conditional variances Var(Xj|Xi) and Var(Xi|Xj)]
0.03292626918224489400*(X- -4.61901000819476600000)^1+
0.00318823596547814990*(X- -4.61901000819476600000)^2+
-0.00014109893893810010*(X- -4.61901000819476600000)^3+
value range 0.0750000100<=F(x)<= 0.1000000000 ,
value range -5.0090375273<=X<= -4.2479248639 ,
Error=0.000000005425321957 MAX=0.000007878337998812 coefficient of
determination=0.999999872082410370,
[Figure panels: the comparison of the estimated line and the sample data; the simulated data of the estimated line]
X5 cumulative probability distribution function estimated line,
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 0.7821129684 0.0001074185 7280.98828 0.00000
X2^3 0.0016446109 0.0000002029 8104.40169 0.00000
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept -0.2336437812 0.0001733818 -1347.56772 0.00000
X3^2 0.0353467151 0.0000037539 9416.03253 0.00000
----------------------------------------------------------------------------------
Dependent variable is X1,
Independent variables are X4^2,
The correlation matrix is below
r(X1,X4^2)=0.748223,
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 0.4079068260 0.0001362092 2994.70836 0.00000
X4^2 0.0193501435 0.0000025856 7483.77468 0.00000
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
Regression 1 57630966.6931709500 57630966.6931709500 135889024.4768352500
error 99999998 42410316.6259581600 0.4241031747
total 99999999 100041283.3191291100
----------------------------------------------------------------------------------
The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 0.3504015040 0.0001405365 2493.31298 0.00000
X5^3 0.0006471917 0.0000000853 7591.50622 0.00000
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 0.0980431977 0.0001573498 623.09062 0.00000
|X6| 0.0085615933 0.0000010381 8247.65891 0.00000
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept -0.0829703252 0.0002037633 -407.18983 0.00000
X2^3 0.0004429525 0.0000003147 1407.59773 0.00000
X3^2 0.0290840146 0.0000058213 4996.16585 0.00000
----------------------------------------------------------------------------------
,…………………………………………………….,
(4.1) The result of non-linear model analysis,
Conclusion,
X1=-0.2336437812+0.035347*X3^2
MSE=0.1137961501 , R2=0.886251
X1=-0.0829703252+0.000443*X2^3+0.029084*X3^2
MSE=0.0939828370 , R2=0.906056
X1=-0.1275426498+0.000342*X2^3+0.026851*X3^2+0.001267*|X6|
MSE=0.0896129611 , R2=0.910424
X1=-0.1373223672+0.000349*X2^3+0.026895*X3^2+0.775916*exp(-1*X5)
+0.001283*|X6|
MSE=0.0888607882 , R2=0.911176
X1=-0.1602518180+0.000358*X2^3+0.027103*X3^2+0.014863*|X4|
+0.782158*exp(-1*X5)+0.000753*|X6|
MSE=0.0885592540 , R2=0.911477
X2=4.0003967634+5.000141*log(X1)
MSE=3.9996024754 , R2=0.838560
X2=0.1120553209+3.577348*log(X1)+0.353141*|X6|^0.5
MSE=3.1874576509 , R2=0.871342
X2=-1.4826502464+3.361569*log(X1)+0.372298*X4+1.072416*|X5|^0.5
MSE=3.0293903187 , R2=0.877722
X2=0.5136191676+3.781916*log(X1)+-0.012885*X3^2+-0.026204*exp(-1*X4)
+0.370138*|X6|^0.5
MSE=3.1004901213 , R2=0.874852
X2=0.5117732688+3.728065*log(X1)+-0.011873*X3^2+-0.023205*exp(-1*X4)
+-4.096719*exp(-1*X5)+0.367254*|X6|^0.5
MSE=3.0812170616 , R2=0.875630
X2=4.0003967634+5.000141*log(X1)+residual,
X3=1.6751858508+4.320531*|X1|^0.5
MSE=0.5308059044 , R2=0.865680
X3=1.4276330468+3.886705*|X1|^0.5+0.071003*X5
MSE=0.5043594406 , R2=0.872372
X3=1.4821101114+3.840865*|X1|^0.5+0.068103*X5+0.000000*X6^3
MSE=0.5028855871 , R2=0.872745
X3=1.2990829172+4.023415*|X1|^0.5+-0.020896*X2+0.074984*X5
+0.000000*X6^3
MSE=0.5007872003 , R2=0.873276
X3= 1.3199505815+4.039275*|X1|^0.5+-0.017140*X2+-0.001077*X4^2
+0.073466*X5+0.000000*X6^3
MSE=0.5003167932 , R2=0.873395
X3=1.6751858508+4.320531*|X1|^0.5+residual,
X4= -2.4956606911+0.743695*|X6|^0.5
MSE= 1.3763692986 , R2=0.889346
X4=2.9997406458+0.500452*X1+0.499848*X2
MSE= 4.0000428815 , R2=0.678415
X4=-2.9643515908+-0.020698*X5^2+0.999850*|X6|^0.5
MSE=0.6820507077 , R2=0.945166
X4= -2.5149584784+-0.091872*X1+-0.069399*|X2|+0.788415*|X6|^0.5
MSE=1.3313077787 , R2=0.892969
X4=-2.6089171999+0.000672*X2^3+-0.022134*X5^2+0.965048*|X6|^0.5
MSE=0.6309448683 , R2=0.949275
X4=-2.4564205981+0.000676*X2^3+-4.366075*exp(-1*X3)+-0.022272*X5^2
+0.956085*|X6|^0.5
MSE= 0.6240742427 , R2=0.949827
X4=-1.9250723755+-0.119939*exp(-X1)/X1+0.000684*X2^3+-0.179871*log(X3)
+-0.022005*X5^2+0.941606*|X6|^0.5
MSE=0.6087417797 , R2=0.951060
X4=2.9997406458+0.500452*X1+0.499848*X2+residual,
X4=-2.9643515908+-0.020698*X5^2+0.999850*|X6|^0.5+residual,
X5=3.1081856134+0.632310*|X6|^0.5,
MSE=4.0908530177 , R2=0.661564
X5=-1.2355589466+2.599183*|X3|^0.5+0.446347*|X6|^0.5
MSE=3.6556632584 , R2=0.697567
X5=4.5011708433+0.300020*X2+0.699824*X3
MSE=4.5002575962 , R2=0.627694
X5= -0.5019385434+0.047599*X2+2.354357*|X3|^0.5+0.418552*|X6|^0.5
MSE=3.6444578101 , R2=0.698494 , R2(adj)=0.698494
X5=0.6365520060+0.006178*X2^2+0.268705*X3+-1.335977*|X4|
+1.393435*|X6|^0.5
MSE=1.6958721421 , R2=0.859701
X5=0.9176472304+-0.047826*exp(-X1)/X1+0.007338*X2^2+0.240705*X3
+-1.333907*|X4|+1.383227*|X6|^0.5
MSE=1.6931528079 , R2=0.859926
X5=4.5011708433+0.300020*X2+0.699824*X3+residual,
X5=0.6365520060+0.006178*X2^2+0.268705*X3+-1.335977*|X4|
+1.393435*|X6|^0.5+residual
X6=32.0137473116+2.3420605999*X4^2
MSE=1365.6656152749 , R2=0.857305
X6=-4.3751636672+1.679077*X4^2+0.605495*X5^2
MSE=305.8895986398 , R2=0.968038
X6=0.4116674758+1.712298*X2+1.580176*X4^2+0.548742*X5^2
MSE=281.3557885288 , R2=0.970602
X6=1.1557635081+1.783814*X2+6.690695*1/X3+1.579208*X4^2+0.549917*X5^2
MSE=281.2575403898 , R2=0.970612
X6=1.0377658589+-1.511670*exp(-X1)/X1+1.424571*X2+34.050383*exp(-1*X3)
+1.589610*X4^2+0.552997*X5^2
MSE= 279.4893135064 , R2=0.970797
X6=32.0137473116+2.3420605999*X4^2
Individual test
----------------------------------------------------------------------------------
H0: slope(X1)=0
The F test p value=0.000100
[Figure panels: X4*X5 and residual joint pdf; X6 estimated line and X6 joint pdf]
Mathematical Mean: 2.55947
Geometrical Mean : none
Harmonic Mean : none
Variance : 24.77917
S.D. : 4.97787
Skewed Coef. : -0.11148
Kurtosis Coef. : 2.53310
MAD : 4.06800
Range : 40.28849
Mid_range : 3.33694
Median : 2.77110
Q1 : -0.97484
Q2 : 2.77110
Q3 : 6.18374
IQR : 7.15858
C.V. : 1.94488
Comparison of the cumulative probability distribution function of X2 and X3,
Note:X3 is the estimated line of X2.
E(| X2 distribution F() - X3 distribution F()|^2)= 0.0000000047
Pr(| X2 distribution F() - X3 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X2 distribution F() - X3 distribution F()|>= 0.0500000000)= 0.000000
Pr(| X2 distribution F() - X3 distribution F()|>= 0.0100000000)= 0.000000
Pr(| X2 distribution F() - X3 distribution F()|>= 0.0050000000)= 0.000000
Pr(| X2 distribution F() - X3 distribution F()|>= 0.0010000000)= 0.000000
Pr(| X2 distribution F() - X3 distribution F()|>= 0.0005000000)= 0.000000
Pr(| X2 distribution F() - X3 distribution F()|>= 0.0001000000)= 0.131138
X3 simulating data,X3=1.6751858508+4.320531*|X1|^0.5+residual,
X1~Shifted exponential(1,0.1),residual~f:\\test07_data_caseXX\\X3_residual.txt,
f(x3),F(x3) Coefficient
Mathematical Mean: 5.81173
Geometrical Mean : 5.47772
Harmonic Mean : 5.14477
Variance : 3.95284
S.D. : 1.98817
Skewed Coef. : 0.65567
Kurtosis Coef. : 3.36698
MAD : 1.58994
Range : 19.50163
Mid_range : 10.68130
Median : 5.56123
Q1 : 4.33543
Q2 : 5.56123
Q3 : 7.03998
IQR : 2.70455
C.V. : 0.34210
Comparison of the cumulative probability distribution function of X3 and X4,
Note:X4 is the estimated line of X3.
E(| X3 distribution F() - X4 distribution F()|^2)= 0.0000388044
Pr(| X3 distribution F() - X4 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X3 distribution F() - X4 distribution F()|>= 0.0500000000)= 0.000000
Pr(| X3 distribution F() - X4 distribution F()|>= 0.0100000000)= 0.082715
Pr(| X3 distribution F() - X4 distribution F()|>= 0.0050000000)= 0.546509
Pr(| X3 distribution F() - X4 distribution F()|>= 0.0010000000)= 0.911276
Pr(| X3 distribution F() - X4 distribution F()|>= 0.0005000000)= 0.956713
Pr(| X3 distribution F() - X4 distribution F()|>= 0.0001000000)= 0.991513
X4 simulating data,X1~shifted exponential(1,0.1),
X2=4.0003967634+ 5.0001407941*log(X1)+residual,
X1~Shifted exponential(1,0.1),residual~f:\\test07_data_caseXX\\X2_residual.txt,
X4=2.9997406458+0.500452*X1+0.499848*X2+residual,
residual~f:\\test07_data_caseXX\\X4_residual.txt,
f(x4),F(x4) Coefficient
Mathematical Mean: 4.82916
Geometrical Mean : none
Harmonic Mean : none
Variance : 12.44001
S.D. : 3.52704
Skewed Coef. : 0.07499
Kurtosis Coef. : 2.75482
MAD : 2.84760
Range : 33.02010
Mid_range : 7.56494
Median : 4.79414
Q1 : 2.35387
Q2 : 4.79414
Q3 : 7.26105
IQR : 4.90718
C.V. : 0.73036
Comparison of the cumulative probability distribution function of X4 and X5,
Note:X5 is the estimated line of X4.
E(| X4 distribution F() - X5 distribution F()|^2)= 0.0000000067
Pr(| X4 distribution F() - X5 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X4 distribution F() - X5 distribution F()|>= 0.0500000000)= 0.000000
Pr(| X4 distribution F() - X5 distribution F()|>= 0.0100000000)= 0.000000
Pr(| X4 distribution F() - X5 distribution F()|>= 0.0050000000)= 0.000000
Pr(| X4 distribution F() - X5 distribution F()|>= 0.0010000000)= 0.000000
Pr(| X4 distribution F() - X5 distribution F()|>= 0.0005000000)= 0.000000
Pr(| X4 distribution F() - X5 distribution F()|>= 0.0001000000)= 0.259279
X2=4.0003967634+ 5.0001407941*log(X1)+residual,
X1~Shifted exponential(1,0.1),residual~f:\\test07_data_caseXX\\X2_residual.txt,
X3=1.6751858508+4.320531*|X1|^0.5+residual,
X1~Shifted exponential(1,0.1),residual~f:\\test07_data_caseXX\\X3_residual.txt,
X5=4.5011708433+0.300020*X2+0.699824*X3+residual,
residual~f:\\test07_data_caseXX\\X5_residual.txt,
f(x5),F(x5) Coefficient
Mathematical Mean: 9.33610
Geometrical Mean : none
Harmonic Mean : none
Variance : 12.08803
S.D. : 3.47678
Skewed Coef. : 0.13458
Kurtosis Coef. : 2.67176
MAD : 2.81811
Range : 28.75054
Mid_range : 12.09812
Median : 9.25869
Q1 : 6.85920
Q2 : 9.25869
Q3 : 11.73911
IQR : 4.87991
C.V. : 0.37240
Pr(| X5 distribution F() - X6 distribution F()|>= 0.0001000000)= 0.952857
X6 simulating data,
X1~shifted exponential(1,0.1),
X2=4.0003967634+ 5.0001407941*log(X1)+residual,
X1~Shifted exponential(1,0.1),residual~f:\\test07_data_caseXX\\X2_residual.txt,
X3=1.6751858508+4.320531*|X1|^0.5+residual,
X1~Shifted exponential(1,0.1),residual~f:\\test07_data_caseXX\\X3_residual.txt,
X4=2.9997406458+0.500452*X1+0.499848*X2+residual,
residual~f:\\test07_data_caseXX\\X4_residual.txt,
X5=4.5011708433+0.300020*X2+0.699824*X3+residual,
residual~f:\\test07_data_caseXX\\X5_residual.txt,
X6=-4.3751636672+1.679077*X4^2+0.605496*X5^2+residual,
residual~f:\\test07_data_caseXX\\X6_residual.txt,
f(x6),F(x6) Coefficient
Mathematical Mean: 115.77931
Geometrical Mean : none
Harmonic Mean : none
Variance : 9247.97277
S.D. : 96.16638
Skewed Coef. : 1.43667
Kurtosis Coef. : 5.75711
MAD : 74.24751
Range : 1287.36189
Mid_range : 639.30587
Median : 90.67089
Q1 : 43.50364
Q2 : 90.67089
Q3 : 163.19796
IQR : 119.69432
C.V. : 0.83060
X6 simulating data,
X1~shifted exponential(1,0.1),
X2=4.0003967634+ 5.0001407941*log(X1)+residual,
X1~Shifted exponential(1,0.1),residual~f:\\test07_data_caseXX\\X2_residual.txt,
X3=1.6751858508+4.320531*|X1|^0.5+residual,
X1~Shifted exponential(1,0.1),residual~f:\\test07_data_caseXX\\X3_residual.txt,
X4=2.9997406458+0.500452*X1+0.499848*X2+residual,
residual~f:\\test07_data_caseXX\\X4_residual.txt,
X5=4.5011708433+0.300020*X2+0.699824*X3+residual,
residual~f:\\test07_data_caseXX\\X5_residual.txt,
X6=10.000307+1.999999*X4*X5+residual,
residual~f:\\test07_data_caseXX\\X6_residual_spc.txt,
f(x6),F(x6) Coefficient
Mathematical Mean: 115.77414
Geometrical Mean : none
Harmonic Mean : none
Variance : 9548.69469
S.D. : 97.71742
Skewed Coef. : 1.21234
Kurtosis Coef. : 4.96883
MAD : 76.18625
Range : 1296.86566
Mid_range : 556.56971
Median : 94.54750
Q1 : 42.12351
Q2 : 94.54750
Q3 : 167.45398
IQR : 125.33048
C.V. : 0.84403
Appendix 1. The common probability distributions
1) Uniform distribution, X ~ U(α, β)
$f_X(x)=\frac{1}{\beta-\alpha},\quad \alpha\le x\le\beta,\ -\infty<\alpha<\beta<\infty,$
2) Normal distribution, X ~ N(µ, σ²)
$f_X(x)=\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(x-\mu)^2}{2\times\sigma^2}\right),\quad -\infty<x<\infty,\ -\infty<\mu<\infty,\ \sigma>0,$
3) Shifted exponential distribution, X ~ Shifted_exponential(λ, c)
$f_X(x)=\lambda\exp\left(-\lambda(x-c)\right),\quad c<x<\infty,\ -\infty<c<\infty,\ \lambda>0,$
4) Pareto1 distribution, X ~ Pareto1(λ, c)
$f_X(x)=\lambda\times\frac{x^{\lambda-1}}{c^{\lambda}},\quad 0<x<c,\ \lambda>0,\ c>0,$
5) Pareto2 distribution, X ~ Pareto2(λ, c)
$f_X(x)=\lambda\frac{c^{\lambda}}{x^{\lambda+1}},\quad c<x<\infty,\ \lambda>0,\ c>0,$
6) Rayleigh distribution, X ~ Rayleigh(λ, c)
$f_X(x)=2\lambda\times(x-c)\times\exp\left(-\lambda(x-c)^2\right),\quad c<x<\infty,\ \lambda>0,\ c>0,$
7) Double exponential distribution, X ~ DE(λ, µ)
$f_X(x)=\frac{\lambda}{2}\exp\left(-\lambda|x-\mu|\right),\quad -\infty<x<\infty,\ -\infty<\mu<\infty,\ \lambda>0,$
8) Lognormal distribution, X ~ Log_normal(µ, σ²)
$f_X(x)=\frac{1}{\sqrt{2\pi}\,\sigma x}\exp\left(-\frac{(\ln(x)-\mu)^2}{2\sigma^2}\right),\quad 0<x<\infty,\ -\infty<\mu<\infty,\ \sigma>0,$
9) Gamma distribution, X ~ Gamma(α, β)
$f_X(x)=\frac{x^{\alpha-1}}{\Gamma(\alpha)\beta^{\alpha}}\exp\left(-\frac{x}{\beta}\right),\quad 0<x<\infty,\ \alpha,\beta>0,$ Γ( ): gamma function,
10) Beta distribution, X ~ Beta(α, β)
$f_X(x)=\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha-1}(1-x)^{\beta-1},\quad 0<x<1,\ \alpha,\beta>0,$ Γ( ): gamma function,
11) Cauchy distribution, X ~ Cauchy(µ, σ)
$f_X(x)=\frac{\sigma}{\pi}\times\frac{1}{(x-\mu)^2+\sigma^2},\quad -\infty<x<\infty,\ \sigma>0,\ -\infty<\mu<\infty,$
12) Arcsin distribution, X ~ Arcsin(µ, c)
$f_X(x)=\frac{1}{c\pi}\,\frac{1}{\sqrt{1-\frac{(x-\mu)^2}{c^2}}},\quad |x-\mu|<c,\ -\infty<\mu<\infty,\ c>0,$
13) Gumbel distribution, X ~ Gumbel(µ, σ)
$f_X(x)=\frac{1}{\sigma}\,e^{-\frac{x-\mu}{\sigma}}\,e^{-e^{-\frac{x-\mu}{\sigma}}},\quad -\infty<x<\infty,\ -\infty<\mu<\infty,\ \sigma>0,$
14) Triangular 1 distribution, X ~ Triangular1(µ, c)
$f(x)=\begin{cases}\frac{|x-\mu|}{c}\times\frac{1}{c}, & -c+\mu<x<\mu+c\\ 0, & \text{otherwise}\end{cases},\quad -\infty<\mu<\infty,\ c>0,$
15) Trapezoid distribution, X ~ Trapezoid(µ, c)
$f_X(x)=\begin{cases}\frac{1.5c+x-\mu}{2c^2}, & \mu-1.5c<x<\mu-0.5c\\ \frac{1}{2c}, & \mu-0.5c<x<\mu+0.5c\\ \frac{1.5c-x+\mu}{2c^2}, & \mu+0.5c<x<\mu+1.5c\\ 0, & \text{otherwise}\end{cases},\quad -\infty<\mu<\infty,\ c>0,$
16) U-quadratic distribution, X ~ U_quadratic(a, b)
$f_X(x)=\alpha(x-\beta)^2,\quad a\le x\le b,\ -\infty<a<b<\infty,\quad \beta=\frac{a+b}{2},\ \alpha=\frac{12}{(b-a)^3},$
17) Wigner semicircle distribution, X ~ Semi_circle(µ, R)
$f_X(x)=\frac{2}{\pi R^2}\sqrt{R^2-(x-\mu)^2},\quad |x-\mu|\le R,\ -\infty<\mu<\infty,\ R>0,$
18) Logistic distribution, X ~ Logistic(µ, σ)
$f_X(x)=\frac{e^{-\frac{x-\mu}{\sigma}}}{\left(1+e^{-\frac{x-\mu}{\sigma}}\right)^2}\times\frac{1}{\sigma},\quad -\infty<x<\infty,\ -\infty<\mu<\infty,\ \sigma>0,$
19) Weibull distribution, X ~ Weibull(α, β, γ)
$f_X(x)=\gamma\times\frac{1}{\beta}\times\left(\frac{x-\alpha}{\beta}\right)^{\gamma-1}\times\exp\left(-\left(\frac{x-\alpha}{\beta}\right)^{\gamma}\right),\quad x>\alpha,\ \alpha>0,\ \beta>0,\ \gamma>0,$
20) Pareto3 distribution, X ~ Pareto3(λ, c)
$f_X(x)=\lambda\left(1-\frac{x}{c}\right)^{\lambda-1}\times\frac{1}{c},\quad 0<x<c,\ \lambda>0,\ c>0$
Appendix 2. The Curve-linear of linear model analysis
Curve-linear analysis model,
(1) $X_2=\hat\beta_0+\hat\beta_1\times(X_1-\bar X_1)+\hat\beta_2\times(X_1-\bar X_1)^2+\dots+\hat\beta_k\times(X_1-\bar X_1)^k+\hat\varepsilon,$
(2) $X_2=\hat\beta_0+\hat\beta_1\times X_1+\hat\beta_2\times X_1^2+\dots+\hat\beta_k\times X_1^k+\hat\varepsilon,$
(3) $X_2=\hat\beta_0+\hat\beta_1\times|X_1-\bar X_1|+\hat\beta_2\times|X_1-\bar X_1|^2+\dots+\hat\beta_k\times|X_1-\bar X_1|^k+\hat\varepsilon,$
(4) $X_2=\hat\beta_0+\hat\beta_1\times\frac{1}{X_1}+\hat\beta_2\times\frac{1}{X_1^2}+\dots+\hat\beta_k\times\frac{1}{X_1^k}+\hat\varepsilon,$
(5) $X_2=\hat\beta_0+\hat\beta_1\times\cos(X_1)+\hat\beta_2\times\cos^2(X_1)+\dots+\hat\beta_k\times\cos^k(X_1)+\hat\varepsilon,$
There are two kinds of selection criterion: one is the coefficient of determination and the
other is the MSE.
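Model (1), the polynomial in powers of X1 minus its mean, can be sketched on toy data. The code below is an illustration only: the data, the noise level and the function names `solve` and `fit_centered_poly` are assumptions of this sketch, and the fit uses the normal equations in plain Python, which is only safe for low degrees. It prints the two selection criteria, MSE and the coefficient of determination, for degrees 1 to 4.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination (partial pivoting)."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_centered_poly(x, y, degree):
    """Fit y = b0 + b1*(x - xbar) + ... + bk*(x - xbar)^k by least squares.

    Returns (xbar, coefficients, MSE, R2).
    """
    xbar = sum(x) / len(x)
    rows = [[(v - xbar) ** j for j in range(degree + 1)] for v in x]
    p = degree + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    sse = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
              for r, yi in zip(rows, y))
    ybar = sum(y) / len(y)
    sst = sum((yi - ybar) ** 2 for yi in y)
    return xbar, beta, sse / (len(x) - p), 1.0 - sse / sst


rng = random.Random(4)
x = [rng.uniform(0.0, 10.0) for _ in range(2000)]
xbar = sum(x) / len(x)
# Toy truth (an assumption of this sketch): a centered quadratic plus noise.
y = [2.0 + 1.5 * (v - xbar) - 0.3 * (v - xbar) ** 2 + rng.gauss(0.0, 0.5)
     for v in x]

results = {k: fit_centered_poly(x, y, k) for k in (1, 2, 3, 4)}
for k, (center, beta, mse, r2) in results.items():
    print("degree", k, "MSE", round(mse, 4), "R2", round(r2, 6))
```

Both criteria improve sharply from degree 1 to degree 2 and then level off, which is the pattern to look for when choosing k.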
Appendix 3. The mathematical formula of
Non-linear model analysis,
There are 33 kinds of model for analysis and the criterion is the coefficient of
determination. X2 is the dependent variable and X1 is the independent variable.
1. X2=b0+b1*X1
2. X2=b0+b1*X1^2
3. X2=b0+b1*X1^3
4. X2=b0+b1*Cos(X1*pi)
5. X2=b0+b1*Cos(2*X1*pi)
6. X2=b0+b1*Sin(X1*pi)
7. X2=b0+b1*Sin(2*X1*pi)
8. X2=b0+b1*Cos(X1*pi)*Sin(X1*pi)
9. X2=b0+b1*Cos(X1*pi)*Cos(X1*pi)
10. X2=b0+b1*Sin(X1*pi)*Sin(X1*pi)
11. X2=b0+b1*exp(X1)
12. X2=b0+b1*exp(-1*X1)
13. X2=b0+b1*log(X1)
14. X2=b0+b1/X1
15. X2=b0+b1*X1/(1-X1)
16. X2=b0+b1*X1*exp(X1)
17. X2=b0+b1*X1*exp(-1*X1)
18. X2=b0+b1*X1*Cos(X1*pi)
19. X2=b0+b1*X1*Sin(X1*pi)
20. X2=b0+b1*X1*Cos(X1*pi)*Cos(X1*pi)
21. X2=b0+b1*X1*Sin(X1*pi)*Sin(X1*pi)
22. X2=b0+b1*X1*X1*Cos(X1*pi)
23. X2=b0+b1*X1*X1*Sin(X1*pi)
24. X2=b0+b1*X1*X1*Cos(X1*pi)*Cos(X1*pi)
25. X2=b0+b1*X1*X1*Sin(X1*pi)*Sin(X1*pi)
26. X2=b0+b1*X1*Cos(X1*pi)*Sin(X1*pi)
27. X2=b0+b1*X1*X1*Cos(X1*pi)*Sin(X1*pi)
28. X2=b0+b1*|X1|
29. X2=b0+b1*|X1|^0.5
30. X2=b0+b1*exp(X1)/X1
31. X2=b0+b1*exp(-X1)/X1
32. X2=b0+b1*exp(X1)*log(X1)
33. X2=b0+b1*exp(-X1)*log(X1)
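The selection among these forms can be sketched for a handful of them. The code below is an illustration under assumptions: it uses only six of the 33 forms (keyed by their number in the list above), toy data generated as X2 = 4 + 5*log(X1) + noise with X1 shifted exponential, and the hypothetical helper names `simple_fit` and `best_form`. Each form is fitted by simple least squares and the one with the largest coefficient of determination is kept.

```python
import math
import random

# A few of the 33 forms listed above, keyed by their number in the list.
FORMS = {
    1: lambda x: x,
    2: lambda x: x ** 2,
    12: lambda x: math.exp(-x),
    13: lambda x: math.log(x),
    14: lambda x: 1.0 / x,
    29: lambda x: abs(x) ** 0.5,
}


def simple_fit(u, y):
    """Least squares for y = b0 + b1 * u; returns (b0, b1, R2)."""
    n = len(u)
    ubar, ybar = sum(u) / n, sum(y) / n
    suu = sum((a - ubar) ** 2 for a in u)
    suy = sum((a - ubar) * (b - ybar) for a, b in zip(u, y))
    syy = sum((b - ybar) ** 2 for b in y)
    b1 = suy / suu
    return ybar - b1 * ubar, b1, suy * suy / (suu * syy)


def best_form(x, y):
    """Fit every form in FORMS and return the one with the largest R2."""
    fits = {k: simple_fit([f(v) for v in x], y) for k, f in FORMS.items()}
    best = max(fits, key=lambda k: fits[k][2])
    return best, fits[best]


rng = random.Random(2)
# Toy data (an assumption of this sketch): X1 > 0.1, so log and 1/X1 are defined.
x = [0.1 - math.log(1.0 - rng.random()) for _ in range(50_000)]
y = [4.0 + 5.0 * math.log(v) + rng.gauss(0.0, 2.0) for v in x]

form, (b0, b1, r2) = best_form(x, y)
print("best form:", form, "b0:", round(b0, 4), "b1:", round(b1, 4),
      "R2:", round(r2, 4))
```

Forms such as log(X1) or 1/X1 are only defined for part of the real line, so in practice a form has to be skipped when the data fall outside its domain.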
Appendix 4. The limiting theory of cumulative
probability distribution function
Whether $F_{X_n}(x_n)$ is close to $F_X(x_n)$,
$F_{X_n}(x) \sim \mathrm{Uniform}(0,1)$,
i) If the cdfs of the two random variables are different, $F_X(x) = 0, 1$,
$E\left[\left(F_{X_n}(x)-F_X(x)\right)^2\right]=\frac{1}{3},$
$P\{|F_{X_n}(x)-F_X(x)|\ge 0.1\}=0.1,\quad P\{|F_{X_n}(x)-F_X(x)|\ge 0.05\}=0.05,$
$P\{|F_{X_n}(x)-F_X(x)|\ge 0.01\}=0.01,$
Computation,
$F_{X_n}(x_n)$ is computed first and $F_X(x_n)$ is obtained from the probability
distribution of X; the data set of $F_{X_n}(x)-F_X(x)$ is then built. Then calculate
$E\left[\left(F_{X_n}(x)-F_X(x)\right)^2\right]$ and $P\{|F_{X_n}(x)-F_X(x)|\ge\varepsilon\}$.
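The computation described above can be sketched as follows. This is an assumption about the procedure, not the authors' program: the empirical cdf of the sample stands in for the first cdf, a standard normal cdf stands in for the second, and the function name `compare_cdf` and the threshold list are hypothetical.

```python
import random
from statistics import NormalDist


def compare_cdf(sample, cdf, thresholds=(0.1, 0.05, 0.01, 0.005, 0.001)):
    """Compare the empirical cdf of `sample` with a reference `cdf`.

    Returns the mean squared difference and, for each threshold eps,
    the share of sample points where |F_n(x) - F(x)| >= eps.
    """
    xs = sorted(sample)
    n = len(xs)
    diffs = [abs((i + 1) / n - cdf(v)) for i, v in enumerate(xs)]
    mean_sq = sum(d * d for d in diffs) / n
    exceed = {eps: sum(d >= eps for d in diffs) / n for eps in thresholds}
    return mean_sq, exceed


rng = random.Random(3)
sample = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
mean_sq, exceed = compare_cdf(sample, NormalDist(0.0, 1.0).cdf)
print("E(|Fn - F|^2) =", mean_sq)
for eps, share in exceed.items():
    print("Pr(|Fn - F| >=", eps, ") =", share)
```

When the sample really comes from the reference distribution, the mean squared difference is tiny and the exceedance shares are zero except at the smallest thresholds, the same pattern as in the comparison tables of Example 37.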
Appendix 5. An application of Dow Jones
The Dow Jones industry index is an additive measure and its range is not closed;
there are two cases,
Case 1, data are 1999/7/27, 1999/7/28,……,2014/6/5,
Case 2, data are 1999/7/27, 1999/7/28,……,2015/5/12,
Data analysis,
b1 10.3479777556 0.2175345703 47.5693483532 0.0000000000
b2 0.0018183585 0.0009599880 1.8941471774 0.0582000000
b3 -0.0001052331 0.0000037129 -28.3421743413 0.0000000000
b4 -0.0000001721 0.0000000083 -20.7450146490 0.0000000000
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=1791, number of the positive of residual=1947
H0: residual is random , H1: Increasing line or decreasing line, Z=-51.413601, p-value=0.000000
H0: residual is random , H1: Increasing line or decreasing line or Oscillation, Z=-51.413601, p-value=0.000000
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model, Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,3738
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0 D.W. test=0.067287
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0 D.W. test=3.932713
[Figure panels: estimated line; residual plot]
0.00000000000000000000*(X--10.75798745619877500000)^14+
0.00000000000000000000*(X- -10.75798745619877500000)^15+
-0.00000000000000000000*(X- -10.75798745619877500000)^16+
-0.00000000000000000000*(X--10.75798745619877500000)^17+
-0.00000000000000000000*(X--10.75798745619877500000)^18+
-0.00000000000000000000*(X--10.75798745619877500000)^19+
-0.00000000000000000000*(X--10.75798745619877500000)^20+
0.00000000000000000000*(X--10.75798745619877500000)^21+
0.00000000000000000000*(X--10.75798745619877500000)^22+
0.00000000000000000000*(X--10.75798745619877500000)^23+
SSE=0.038058137174926572 MAX error=0.010738934771908681 coefficient of determination=0.999877822985879020
Durbin Watson model analysis is applied when the curve-linear analysis is
finished; the first order auto-regressive error is $\varepsilon_{t+1}=\rho\varepsilon_t+\mu_{t+1}$. From the output of
the regression analysis, 0.067287=2-2ρ, so ρ = 0.96636. The data set is the population, so the auto
regressive correlation coefficient is the population correlation coefficient of AR(1), and the
real MSE = MSE (regression analysis) × (1 − ρ²)
= 228456.4468600020*(1-0.96636*0.96636)=15112.017098,
the estimated population standard deviation is 122.93094428 (the square root of 15112.017098); this value has the effect of
the first order auto-regressive error model removed.
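The arithmetic in this paragraph can be retraced in a few lines. The sketch below takes the D.W. value and the regression MSE quoted above; the function name `ar1_adjusted_mse` is hypothetical. The text rounds ρ to 0.96636 before squaring, so the unrounded result differs slightly in the last digits.

```python
from math import sqrt


def ar1_adjusted_mse(dw, mse):
    """Return (rho, adjusted MSE, adjusted s.d.) using rho = 1 - dw / 2."""
    rho = 1.0 - dw / 2.0
    adjusted = mse * (1.0 - rho * rho)
    return rho, adjusted, sqrt(adjusted)


rho, adj_mse, adj_sd = ar1_adjusted_mse(0.067287, 228456.4468600020)
print("rho:", round(rho, 5))
print("MSE * (1 - rho^2):", round(adj_mse, 3))
print("square root:", round(adj_sd, 3))
```

The square root, about 122.9, is on the same scale as the S.D. of 122.48031 reported for µ(t).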
The SSE is the part of X5 that X1 cannot explain, and its value is very large. But using the
AR(1) analysis, the residual is µ(t).
The probability distribution of µ(t),
Mathematical Mean: -0.11863
Geometrical Mean : none
Harmonic Mean : none
Variance : 15001.42664
S.D. : 122.48031
Skewed Coef. : -0.26761
Kurtosis Coef. : 7.53518
MAD : 86.93288
Range : 1639.51650
Mid_range : 87.62719
Median : 4.04876
Q1 : -61.15728
Q2 : 4.04876
Q3 : 66.57909
IQR : 127.73637
C.V. : none
µ(t) is close to a double exponential distribution and |µ(t)| follows a shifted exponential
distribution. The exponential distribution has the memoryless property.
Case 2,
Dates are 1999/7/27, 1999/7/28,……,2015/5/12,
Each record has X2=open, X3=day high, X4=day low, X5=close,
X1=t, 1999/7/27=25001, 1999/7/28=25002,……
t=25001, 25002, 25003,….., 28973 is an arithmetic series and the time value,
There are 3973 records in total.
X5= Dow Jones industry index close index ,
(1999/7/28 close index),(1999/7/29 close index),…..,etc.
X1 is used to estimate X5 by curve-linear analysis; the result is below,
The estimated line ------
X5= 13423.50612813327500000000+
6.28301955573260780000*(X1- 26987.00000000000000000000)^1+
-0.04408700016989541800*(X1- 26987.00000000000000000000)^2+
-0.00013050937590719514*(X1- 26987.00000000000000000000)^3+
0.00000017258416938094*(X1- 26987.00000000000000000000)^4+
0.00000000071248246365*(X1- 26987.00000000000000000000)^5+
-0.00000000000028123310*(X1- 26987.00000000000000000000)^6+
-0.00000000000000185997*(X1- 26987.00000000000000000000)^7+
0.00000000000000000019*(X1- 26987.00000000000000000000)^8+
0.00000000000000000000*(X1- 26987.00000000000000000000)^9+
0.00000000000000000000*(X1- 26987.00000000000000000000)^10+
-0.00000000000000000000*(X1- 26987.00000000000000000000)^11+
-0.00000000000000000000*(X1- 26987.00000000000000000000)^12+
0.00000000000000000000*(X1- 26987.00000000000000000000)^13+
0.00000000000000000000*(X1- 26987.00000000000000000000)^14+
-0.00000000000000000000*(X1- 26987.00000000000000000000)^15+
-0.00000000000000000000*(X1- 26987.00000000000000000000)^16+
0.00000000000000000000*(X1- 26987.00000000000000000000)^17+
0.00000000000000000000*(X1- 26987.00000000000000000000)^18+
-0.00000000000000000000*(X1- 26987.00000000000000000000)^19+
-0.00000000000000000000*(X1- 26987.00000000000000000000)^20+
0.00000000000000000000*(X1- 26987.00000000000000000000)^21+
0.00000000000000000000*(X1- 26987.00000000000000000000)^22+
-0.00000000000000000000*(X1- 26987.00000000000000000000)^23+
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
regression 23 21965463556.1898500000 955020154.6169500400 4296.8688614340
error 3949 877702976.7959175100 222259.5535061832
total 3972 22843166532.9857670000
----------------------------------------------------------------------------------
The F test p value=0.000100
MSE= 222259.5535061832 , R2=0.961577 , R2(adj)=0.961353
X5(Mean)= 11601.3823533854, X5(Var)= 5751048.9760789946, X5(sd)= 2398.1344783141
X1(Mean)= 26987.0000000000, X1(Var)= 1315725.1666666667, X1(sd)= 1147.0506382312
------------------- individual test -------------------------
parameter coefficient standard error t test p value
----------------------------------------------------------------------------------
b0 13423.5061281333 28.4893013675 471.1770904797 0.0000000000
b1 6.2830195557 0.2154666082 29.1600615402 0.0000000000
b2 -0.0440870002 0.0008499905 -51.8676398765 0.0000000000
b3 -0.0001305094 0.0000035030 -37.2565163629 0.0000000000
b4 0.0000001726 0.0000000071 24.4549886281 0.0000000000
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class    lower limit   upper limit   observed no   probability   expected no   chi square
[ 1 ]    -inf          -652.13595    345.00000     0.08333       331.08333     0.58497
[ 2 ]    -652.13595    -456.20123    271.00000     0.08333       331.08333     10.90362
[ 3 ]    -456.20123    -317.97271    272.00000     0.08333       331.08333     10.54369
[ 4 ]    -317.97271    -203.09431    294.00000     0.08333       331.08333     4.15356
[ 5 ]    -203.09431    -99.25938     318.00000     0.08333       331.08333     0.51701
[ 6 ]    -99.25938     -0.10671      371.00000     0.08333       331.08333     4.81251
[ 7 ]    -0.10671      99.15823      398.00000     0.08333       331.08333     13.52481
[ 8 ]    99.15823      202.94871     432.00000     0.08333       331.08333     30.76015
[ 9 ]    202.94871     317.95409     365.00000     0.08333       331.08333     3.47447
[ 10 ]   317.95409     455.99928     350.00000     0.08333       331.08333     1.08082
[ 11 ]   455.99928     651.79075     242.00000     0.08333       331.08333     23.96931
[ 12 ]   651.79075     +inf          315.00000     0.08333       331.08333     0.78129
degree of freedom=10
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =105.106217 p-value=0.000000
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=1872
number of the positive of residual=2101
H0: residual is random , H1: Increasing line or decreasing line Z=-53.551452, p-value=0.000000
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t), t=2,3,...,3973
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0 D.W. test=0.069523
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0 D.W. test=3.930477
estimated line residual plot
Durbin-Watson model analysis is applied once the curve-linear analysis is finished.
The first-order auto-regressive error model is $\varepsilon_{t+1} = \rho\,\varepsilon_t + \mu_{t+1}$. From the output of the
regression analysis, $0.069523 = 2 - 2\rho$, so $\rho = 0.9652385$. The data set is the population, so this
auto-regressive correlation coefficient is the population correlation coefficient of AR(1), and
the real MSE = MSE (regression analysis) $\times\,(1 - \rho^2)$
= 222259.5535061832*(1-0.9652385*0.9652385)=15183.5809664.
The estimated population standard deviation is 123.221674093 (the square root of the real MSE); this value has
the effect of the first-order auto-regressive error model removed.
The SSE is the part of X5 that X1 cannot explain, and its value is very large. After the
AR(1) analysis, however, the residual is µ(t).
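The AR(1) adjustment described above can be reproduced in a few lines. This is a minimal sketch, assuming the approximation D.W. = 2 - 2ρ used in the text; the function name `ar1_adjusted_mse` is introduced here for illustration only.

```python
import math


def ar1_adjusted_mse(durbin_watson, mse):
    """Return (rho, adjusted MSE, adjusted s.d.) using D.W. = 2 - 2*rho."""
    rho = 1.0 - durbin_watson / 2.0
    adjusted = mse * (1.0 - rho ** 2)  # remove the first-order autocorrelation effect
    return rho, adjusted, math.sqrt(adjusted)


# D.W. statistic and MSE copied from the regression output above.
rho, mse_adj, sd_adj = ar1_adjusted_mse(0.069523, 222259.5535061832)
print(rho, mse_adj, sd_adj)
```

The third value returned is the square root of the adjusted MSE, which is the 123.22 figure quoted in the text.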
µ (t) probability distribution,
Mathematical Mean: 0.14576
Geometrical Mean : none
Harmonic Mean : none
Variance : 15096.54665
S.D. : 122.86800
Skewed Coef. : -0.26505
Kurtosis Coef. : 7.23183
MAD : 87.50202
Range : 1638.51426
Mid_range : 87.21567
Median : 4.79757
Q1 : -61.63347
Q2 : 4.79757
Q3 : 66.47271
IQR : 128.10618
C.V. : 842.94913
The left diagram compares the estimated line with the real sample data.
(2) Analysis of the data set after two new cases are input,
The estimated line ------
X5= 17779.29496671001100000000+
6.75789595209062100000*(X1- 28856.00000000000000000000)^1+
-2.21204850418814660000*(X1- 28856.00000000000000000000)^2+
0.09267482485302025500*(X1- 28856.00000000000000000000)^3+
0.00270808698662494680*(X1- 28856.00000000000000000000)^4+
-0.00015703847732595477*(X1- 28856.00000000000000000000)^5+
-0.00000160841359467799*(X1- 28856.00000000000000000000)^6+
0.00000010570796549203*(X1- 28856.00000000000000000000)^7+
0.00000000058491599096*(X1- 28856.00000000000000000000)^8+
-0.00000000003815047667*(X1- 28856.00000000000000000000)^9+
-0.00000000000013969470*(X1- 28856.00000000000000000000)^10+
0.00000000000000831085*(X1- 28856.00000000000000000000)^11+
0.00000000000000002202*(X1- 28856.00000000000000000000)^12+
-0.00000000000000000115*(X1- 28856.00000000000000000000)^13+
-0.00000000000000000000*(X1- 28856.00000000000000000000)^14+
0.00000000000000000000*(X1- 28856.00000000000000000000)^15+
0.00000000000000000000*(X1- 28856.00000000000000000000)^16+
-0.00000000000000000000*(X1- 28856.00000000000000000000)^17+
-0.00000000000000000000*(X1- 28856.00000000000000000000)^18+
0.00000000000000000000*(X1- 28856.00000000000000000000)^19+
0.00000000000000000000*(X1- 28856.00000000000000000000)^20+
-0.00000000000000000000*(X1- 28856.00000000000000000000)^21+
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
regression 21 59071918.0792927290 2812948.4799663206 94.7949695660
error 213 6320567.7366195917 29674.0269324863
total 234 65392485.8159123210
----------------------------------------------------------------------------------
The F test p value=0.000100
MSE= 29674.0269324863 , R2=0.903344 , R2(adj)=0.893815
X5(Mean)= 17402.8388936170, X5(Var)= 279455.0675893689, X5(sd)= 528.6350987112
X1(Mean)= 28856.0000000000, X1(Var)= 4621.6666666667, X1(sd)= 67.9828409723
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,235
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0 D.W. test=0.639414
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0 D.W. test=3.360586
The auto-regressive correlation coefficient is the population correlation coefficient of
AR(1): $0.639414 = 2 - 2\rho$, so $\rho = 0.680293$, and the square root of the AR(1)-adjusted MSE is 126.257395131,
µ (t) probability distribution
pearson goodness of fit
class     lower limit   upper limit   observed no   probability   expected no   chi square
[ 1 ]     -inf          -123.53180    32.00000      0.12500       29.25000      0.25855
[ 2 ]     -123.53180    -59.66444     30.00000      0.12500       29.25000      0.01923
[ 3 ]     -59.66444     -22.30443     31.00000      0.12500       29.25000      0.10470
[ 4 ]     -22.30443     4.20292       24.00000      0.12500       29.25000      0.94231
[ 5 ]     4.20292       30.71027      23.00000      0.12500       29.25000      1.33547
[ 6 ]     30.71027      68.07028      29.00000      0.12500       29.25000      0.00214
[ 7 ]     68.07028      131.93764     40.00000      0.12500       29.25000      3.95085
[ 8 ]     131.93764     +inf          25.00000      0.12500       29.25000      0.61752
degree of freedom=5
H0: X1~Double exponential(lamda,mu), lamda,mu are unknown
lamda point estimated value=0.010853 (MLE)
mu point estimated value=4.202921 (MLE)
pearson chi-square test statistic =7.230769, p-value=0.204000
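The class limits of this goodness-of-fit table can be checked from the two MLE values. The sketch below assumes that "lamda" is the rate of the double exponential (Laplace) distribution, so the scale is 1/lamda, and that the classes are equiprobable; both assumptions are inferred from the printed limits rather than stated in the output, and the function names are introduced here for illustration.

```python
import math


def laplace_quantile(p, mu, lam):
    """Inverse CDF of a double exponential with location mu and rate lam."""
    if p < 0.5:
        return mu + math.log(2.0 * p) / lam
    return mu - math.log(2.0 * (1.0 - p)) / lam


def equiprobable_limits(k, mu, lam):
    """Interior class limits that split the distribution into k equal-probability classes."""
    return [laplace_quantile(i / k, mu, lam) for i in range(1, k)]


# MLE values copied from the output above.
limits = equiprobable_limits(8, mu=4.202921, lam=0.010853)
print([round(v, 5) for v in limits])
```

Under these assumptions the seven limits agree with the printed ones to about two decimal places; the small differences come from the rounding of the printed lamda. The degree of freedom 5 is then consistent with 8 classes minus 1 minus 2 estimated parameters.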
(9.3.3) The estimated line is updated each day,
The estimated line is re-estimated whenever a new daily close index becomes available.
date         Close index (A)   Estimated close index (B)   residual (A-B)
2014-06-06 16924.28 16941.10026 -16.82025508
2014-06-09 16943.10 16907.17784 35.92215678
2014-06-10 16945.92 16949.50737 -3.58736874
2014-06-11 16843.88 16814.06412 29.81587580
2014-06-12 16734.19 16737.10220 -2.91219893
2014-06-13 16775.74 16795.82379 -20.08378979
2014-06-16 16781.01 16798.94964 -17.93964378
2014-06-17 16808.49 16804.96515 3.52485263
2014-06-18 16906.62 16931.36683 -24.74683243
2014-06-19 16921.46 16915.13261 6.32738863
2014-06-20 16947.08 16948.01616 -0.93616120
2014-06-23 16937.26 16936.82191 0.43808744
Appendix 6. The estimation of Cos model analysis
(Appendix 6.1) $X_1 \sim \mathrm{Normal}(\mu_{X_1} = 0,\ \sigma_{X_1}^2 = 2^2)$,
the population conditional expectation line is
$E(X_2 \mid x_1) = \beta_0 + \beta_1 \cos(x_1 \pi) = 1 + 2\cos(x_1 \pi)$, $\varepsilon \sim \mathrm{Normal}(0,\ \sigma^2 = 1)$,
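The population model above can be simulated to see why a straight line in X1 explains almost nothing while a line in Cos(X1*pi) recovers the coefficients. This is a minimal sketch using only the standard library; the seed, the helper name `fit_line`, and the use of ordinary least squares are assumptions made for illustration and are not taken from the original software.

```python
import math
import random


def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1*x; returns (b0, b1, R2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1, (sxy * sxy) / (sxx * syy)


rng = random.Random(2015)  # arbitrary seed, chosen only for repeatability
n = 1000
x1 = [rng.gauss(0.0, 2.0) for _ in range(n)]  # X1 ~ Normal(0, 2^2)
x2 = [1.0 + 2.0 * math.cos(math.pi * v) + rng.gauss(0.0, 1.0) for v in x1]

linear = fit_line(x1, x2)                                   # X2 on X1
cosine = fit_line([math.cos(math.pi * v) for v in x1], x2)  # X2 on Cos(X1*pi)
print("linear fit (b0, b1, R2):", linear)
print("cosine fit (b0, b1, R2):", cosine)
```

The linear fit should give an R2 close to zero, while the fit on Cos(X1*pi) should give an intercept near 1, a slope near 2 and an R2 near 2/3.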
(1) Paired samples, n=1000,
(1.1)Basic analysis
scatter diagram scatter diagram using the linear model
(1.4)
$X_{2i} = \beta_0 + \beta_1 X_{1i} + \varepsilon_i$, $i = 1, 2, \ldots, n$, where $\beta_0$ is the intercept, $\beta_1$ is the slope, and $\varepsilon_i$ is the error,
(1.4.1)
The linear model analysis
The estimated line is X2=0.923463+-0.020511*X1
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1 1 1.7163094683 1.7163094683 0.5523661436
error 998 3100.9808786213 3.1071952692
total 999 3102.6971880896
----------------------------------------------------------------------------------
H0: slope(X1)=0
The F test p value=0.458800
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 0.9234629152 0.0557439800 16.56615 0.00000
slope -0.0205110272 0.0275977632 -0.74321 0.45740
----------------------------------------------------------------------------------
MSE=3.1071952692 , R2=0.000553 , R2(adj)=-0.000448
X2(mean)= 0.9237919833, X2(variance)= 3.1058029911, X2(s.d.)= 1.7623288544
X1(mean)= -0.0160434741, X1(variance)= 4.0837137248, X1(s.d.)= 2.0208200625
SSX1=4079.6300111218 , SS(X2*X1)= -83.6774020592, C.V.= 1.9081393352
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class     lower limit   upper limit   observed no   probability   expected no   chi square
[ 1 ]     -inf          -2.25910      103.00000     0.10000       100.00000     0.09000
[ 2 ]     -2.25910      -1.48353      127.00000     0.10000       100.00000     7.29000
[ 3 ]     -1.48353      -0.92433      103.00000     0.10000       100.00000     0.09000
[ 4 ]     -0.92433      -0.44653      93.00000      0.10000       100.00000     0.49000
[ 5 ]     -0.44653      0.00004       85.00000      0.10000       100.00000     2.25000
[ 6 ]     0.00004       0.44654       80.00000      0.10000       100.00000     4.00000
[ 7 ]     0.44654       0.92433       78.00000      0.10000       100.00000     4.84000
[ 8 ]     0.92433       1.48279       97.00000      0.10000       100.00000     0.09000
[ 9 ]     1.48279       2.25892       130.00000     0.10000       100.00000     9.00000
[ 10 ]    2.25892       +inf          104.00000     0.10000       100.00000     0.16000
degree of freedom=8
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =28.300000
p-value=0.000400
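The Pearson statistic and its p-value in this block can be recomputed from the observed counts alone. The sketch below assumes equiprobable classes and a degree of freedom equal to classes minus 1 minus the number of estimated parameters (one here, sigma). The helper names are introduced for illustration, and the chi-square tail is evaluated with the standard closed-form series rather than the routine of the original software, which is not shown.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2.0) / i
            total += term
        return math.exp(-x / 2.0) * total
    # Odd df: normal tail plus a finite series in sqrt(x).
    root = math.sqrt(x)
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    density = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)
    return math.erfc(root / math.sqrt(2.0)) + 2.0 * density * total


def pearson_gof(observed, n_estimated):
    """Pearson chi-square test for equiprobable classes."""
    n = sum(observed)
    expected = n / len(observed)
    stat = sum((o - expected) ** 2 / expected for o in observed)
    df = len(observed) - 1 - n_estimated
    return stat, df, chi2_sf(stat, df)


# Observed counts copied from the residual goodness-of-fit table above.
observed = [103, 127, 103, 93, 85, 80, 78, 97, 130, 104]
stat, df, p_value = pearson_gof(observed, n_estimated=1)
print(stat, df, p_value)
```

For these counts the statistic is 28.3 with 8 degrees of freedom, and the p-value is close to the printed 0.000400.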
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=511
number of the positive of residual=489
H0: residual is random , H1: Increasing line or decreasing line
Z=-0.364527, p-value=0.357800
H0: residual is random , H1: Oscillation
Z=-0.364527, p-value=0.642200
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-0.364527, p-value=0.715600
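The three p-values printed for the run test all come from the same Z value. The sketch below assumes the standard Wald-Wolfowitz runs test on the signs of the residuals, without a continuity correction; whether the original software applies a correction is not shown. The assignment of tails (lower tail for a trend, upper tail for oscillation) is inferred from the printed p-values, and the function names are introduced for illustration.

```python
import math
from statistics import NormalDist


def runs_test(residuals):
    """Return (number of runs, Z) for the signs of the residuals."""
    signs = [r > 0 for r in residuals if r != 0]
    n_pos = sum(signs)
    n_neg = len(signs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both signs must occur")
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    n = n_pos + n_neg
    mean = 2.0 * n_pos * n_neg / n + 1.0
    var = 2.0 * n_pos * n_neg * (2.0 * n_pos * n_neg - n) / (n * n * (n - 1.0))
    return runs, (runs - mean) / math.sqrt(var)


def runs_p_values(z):
    """Few runs (negative Z) suggest a trend; many runs (positive Z) suggest oscillation."""
    lower = NormalDist().cdf(z)
    return {
        "trend": lower,
        "oscillation": 1.0 - lower,
        "either": 2.0 * min(lower, 1.0 - lower),
    }


p = runs_p_values(-0.364527)  # Z copied from the output above
print(p)
```

Applied to the printed Z=-0.364527, the three values should be close to 0.3578, 0.6422 and 0.7156.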
(1.4.2)residual analysis
X0=residual,residual frequency distribution table,
class class limit class midpoint frequency relative frequency cumulative frequency
[ 1 ] -4.87791~ -3.73037 -4.30414 6.00000 0.0060000 0.0060000
[ 2 ] -3.73037~ -2.58282 -3.15660 64.00000 0.0640000 0.0700000
[ 3 ] -2.58282~ -1.43528 -2.00905 169.00000 0.1690000 0.2390000
[ 4 ] -1.43528~ -0.28774 -0.86151 213.00000 0.2130000 0.4520000
[ 5 ] -0.28774~ 0.85981 0.28603 206.00000 0.2060000 0.6580000
[ 6 ] 0.85981~ 2.00735 1.43358 198.00000 0.1980000 0.8560000
[ 7 ] 2.00735~ 3.15489 2.58112 113.00000 0.1130000 0.9690000
[ 8 ] 3.15489~ 4.30244 3.72866 29.00000 0.0290000 0.9980000
[ 9 ] 4.30244~ 5.44998 4.87621 2.00000 0.0020000 1.0000000
frequency distribution: sample mean=0.001443 , sample variance=3.216414 , sample sd=1.793436
X0=residual, goodness of fit (pearson chi square test statistic)
mu point estimated value=-0.000000 (MLE)
sigma point estimated value=1.762724 (MLE)
mu value from -0.352545 to 0.352545
sigma value from 1.468937 to 2.203405
degree of freedom=7
H0: X0~Normal(mu=0.035254,sigma*sigma=3.211632), sigma=1.792103
pearson chi-square test statistic =24.080000
p-value=0.001100
----------------------------------------------------------------------------------
intercept 0.9710470890 0.0318112268 30.52530 0.00000
slope 2.0161453275 0.0442994922 45.51170 0.00000
----------------------------------------------------------------------------------
MSE= 1.0108760710 , R2=0.674846 , R2(adj)=0.674520
X2(mean)= 0.9237919833, X2(variance)= 3.1058029911, X2(s.d.)= 1.7623288544
Cos(X1*pi)(mean)= -0.0234383429, Cos(X1*pi)(variance)= 0.5156261467, Cos(X1*pi)(s.d.)=
0.7180711293
SS(Cos(X1*pi))= 515.1105205874 , SS(X2*Cos(X1*pi))= 1038.5376692321, C.V.= 1.0883655058
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class     lower limit   upper limit   observed no   probability   expected no   chi square
[ 1 ]     -inf          -1.28855      97.00000      0.10000       100.00000     0.09000
[ 2 ]     -1.28855      -0.84618      109.00000     0.10000       100.00000     0.81000
[ 3 ]     -0.84618      -0.52722      101.00000     0.10000       100.00000     0.01000
[ 4 ]     -0.52722      -0.25469      104.00000     0.10000       100.00000     0.16000
[ 5 ]     -0.25469      0.00003       98.00000      0.10000       100.00000     0.04000
[ 6 ]     0.00003       0.25470       87.00000      0.10000       100.00000     1.69000
[ 7 ]     0.25470       0.52722       120.00000     0.10000       100.00000     4.00000
[ 8 ]     0.52722       0.84576       93.00000      0.10000       100.00000     0.49000
[ 9 ]     0.84576       1.28844       90.00000      0.10000       100.00000     1.00000
[ 10 ]    1.28844       +inf          101.00000     0.10000       100.00000     0.01000
degree of freedom=8
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =8.300000
p-value=0.404700
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=509
number of the positive of residual=491
H0: residual is random , H1: Increasing line or decreasing line
Z=-0.053044, p-value=0.478900
H0: residual is random , H1: Oscillation
Z=-0.053044, p-value=0.521100
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-0.053044, p-value=0.957800
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,1000
e(t)~Normal(0,sigma*sigma),
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=2.005910
Z=-0.093348, p-value=0.537200
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
Z=-0.093348, p-value=0.462800
H0: auto correlation coefficient=0 , H1:against H0
Z=-0.093348, p-value=0.925600
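The D.W. statistic reported in these blocks can be computed directly from the residual sequence. This is a minimal sketch of the textbook definition; the function name and the toy residual sequences are introduced here for illustration. The second returned value, 4 - d, is the statistic the outputs in this appendix print for the alternative of negative autocorrelation.

```python
def durbin_watson(residuals):
    """Return (d, 4 - d), where d = sum (e_t - e_{t-1})^2 / sum e_t^2."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(e * e for e in residuals)
    d = num / den
    return d, 4.0 - d


# Toy residuals (hypothetical): strong positive and strong negative autocorrelation.
print(durbin_watson([1.0, 1.0, 1.0, 1.0]))    # d near 0
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # d well above 2
```

A value of d near 2, such as the 2.005910 printed above, corresponds to an estimated first-order autocorrelation near zero.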
(1.5.2)
X0=residual,residual frequency distribution table,
class class limit class midpoint frequency relative frequency cumulative frequency
[ 1 ] -3.29633~ -2.46000 -2.87817 5.00000 0.0050000 0.0050000
[ 2 ] -2.46000~ -1.62366 -2.04183 35.00000 0.0350000 0.0400000
[ 3 ] -1.62366~ -0.78733 -1.20549 186.00000 0.1860000 0.2260000
[ 4 ] -0.78733~ 0.04901 -0.36916 300.00000 0.3000000 0.5260000
[ 5 ] 0.04901~ 0.88534 0.46718 294.00000 0.2940000 0.8200000
[ 6 ] 0.88534~ 1.72168 1.30351 134.00000 0.1340000 0.9540000
[ 7 ] 1.72168~ 2.55802 2.13985 35.00000 0.0350000 0.9890000
[ 8 ] 2.55802~ 3.39435 2.97618 10.00000 0.0100000 0.9990000
[ 9 ] 3.39435~ 4.23069 3.81252 1.00000 0.0010000 1.0000000
frequency distribution: sample mean=-0.000335 , sample variance=1.053746 , sample sd=1.026521
E(X2|x1) and Cos(x1) E(X1|x2) and x2 are not linear relation
(2.2)
Non-linear model analysis
The relation is X2= 0.9998775155+ 1.9999954117*Cos(X1*pi)
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Cos(X1*pi) 1 19999024.1243208420 19999024.1243208420 19980084.2396042200
error 9999998 10009477.3799172790 1.0009479382
total 9999999 30008501.5042381210
----------------------------------------------------------------------------------
H0: slope(X1)=0
The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 0.9998775155 0.0003163776 3160.39269 0.00000
slope 1.9999954117 0.0004474354 4469.90875 0.00000
----------------------------------------------------------------------------------
MSE= 1.0009479382 , R2=0.666445 , R2(adj)=0.666445
X2(mean)= 1.0001611654, X2(variance)= 3.0008504505, X2(s.d.)= 1.7322962941
Cos(X1*pi)(mean)= 0.0001418253, Cos(X1*pi)(variance)= 0.4999779471, Cos(X1*pi)(s.d.)=
0.7070911873
SS(Cos(X1*pi))= 4999778.9713012017 , SS(X2*Cos(X1*pi))= 9999535.0023550987, C.V.= 1.0003126411
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class     lower limit   upper limit   observed no    probability   expected no    chi square
[ 1 ]     -inf          -1.64568      500181.00000   0.05000       500000.00000   0.06552
[ 2 ]     -1.64568      -1.28220      499440.00000   0.05000       500000.00000   0.62720
[ 3 ]     -1.28220      -1.03692      500221.00000   0.05000       500000.00000   0.09768
[ 4 ]     -1.03692      -0.84201      499775.00000   0.05000       500000.00000   0.10125
[ 5 ]     -0.84201      -0.67478      500666.00000   0.05000       500000.00000   0.88711
[ 6 ]     -0.67478      -0.52463      499780.00000   0.05000       500000.00000   0.09680
[ 7 ]     -0.52463      -0.38547      498682.00000   0.05000       500000.00000   3.47425
[ 8 ]     -0.38547      -0.25369      499119.00000   0.05000       500000.00000   1.55232
[ 9 ]     -0.25369      -0.12567      501077.00000   0.05000       500000.00000   2.31986
[ 10 ]    -0.12567      -0.00023      499775.00000   0.05000       500000.00000   0.10125
[ 11 ]    -0.00023      0.12549       499889.00000   0.05000       500000.00000   0.02464
[ 12 ]    0.12549       0.25345       501302.00000   0.05000       500000.00000   3.39041
[ 13 ]    0.25345       0.38546       499538.00000   0.05000       500000.00000   0.42689
[ 14 ]    0.38546       0.52463       499583.00000   0.05000       500000.00000   0.34778
[ 15 ]    0.52463       0.67475       500575.00000   0.05000       500000.00000   0.66125
[ 16 ]    0.67475       0.84195       501046.00000   0.05000       500000.00000   2.18823
[ 17 ]    0.84195       1.03687       499705.00000   0.05000       500000.00000   0.17405
[ 18 ]    1.03687       1.28210       499674.00000   0.05000       500000.00000   0.21255
[ 19 ]    1.28210       1.64564       500605.00000   0.05000       500000.00000   0.73205
[ 20 ]    1.64564       +inf          499367.00000   0.05000       500000.00000   0.80138
degree of freedom=18
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =18.282472
p-value=0.437100
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=4999624
number of the positive of residual=5000376
H0: residual is random , H1: Increasing line or decreasing line
Z=0.400995, p-value=0.655900
H0: residual is random , H1: Oscillation
Z=0.400995, p-value=0.344100
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=0.400995, p-value=0.688200
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,10000000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=2.000053
(2.3)residual analysis,
X0=residual, residual marginal probability distribution
Mathematical Mean: 0.00000
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.00095
S.D. : 1.00047
Skewed Coef. : -0.00042
Kurtosis Coef. : 3.00296
MAD : 0.79818
Range : 10.68772
Mid_range : -0.19128
Median : 0.00010
Q1 : -0.67486
Q2 : 0.00010
Q3 : 0.67487
IQR : 1.34973
C.V. : none
SLLN analysis of X0=residual against Normal(0,1). Note: X1~Normal(0,1); here X1 is
the code that represents Normal(0,1),
E(| X0 distribution - X1 distribution |^2)= 0.0000031422
************ The | X0 distribution F() - X1 distribution F()| ****************
The almost surely limiting theory
E(| X0 distribution F() - X1 distribution F()|^2)= 0.0000000138
Pr(| X0 distribution F() - X1 distribution F()|< 0.1000000000)= 1.000000
Pr(| X0 distribution F() - X1 distribution F()|< 0.0500000000)= 1.000000
Pr(| X0 distribution F() - X1 distribution F()|< 0.0100000000)= 1.000000
Pr(| X0 distribution F() - X1 distribution F()|< 0.0050000000)= 1.000000
Pr(| X0 distribution F() - X1 distribution F()|< 0.0010000000)= 1.000000
Pr(| X0 distribution F() - X1 distribution F()|< 0.0005000000)= 1.000000
Pr(| X0 distribution F() - X1 distribution F()|< 0.0001000000)= 0.484983
The probability limiting theory
E(| X0 distribution F() - X1 distribution F()|^2)= 0.0000000138
Pr(| X0 distribution F() - X1 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0500000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0100000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0050000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0010000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0005000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0001000000)= 0.515017
(2.4)Conclusion,
X1~Normal(0,4),X2=1.0000038041+2.0000020130*Cos(X1*pi)+error,
error~Normal(0,1).
(Appendix 6.2) $X_1 \sim \mathrm{Normal}(\mu_{X_1} = 0,\ \sigma_{X_1}^2 = 2^2)$,
the population conditional expectation line is
$E(X_2 \mid x_1) = \beta_0 + \beta_1 \cos^2(x_1 \pi) = 1 + 2\cos^2(x_1 \pi)$, $\varepsilon \sim \mathrm{Normal}(0,\ \sigma^2 = 1)$,
(1) Paired samples, n=1000,
(1.1)Basic analysis
scatter diagram scatter diagram using the linear model
(1.3)the frequency probability table of dependent variable,
X2 frequency probability table
class class limit class midpoint frequency relative frequency cumulative frequency
[ 1 ] -1.49027~ -0.71803 -1.10415 10.00000 0.0100000 0.0100000
[ 2 ] -0.71803~ 0.05421 -0.33191 37.00000 0.0370000 0.0470000
[ 3 ] 0.05421~ 0.82646 0.44034 132.00000 0.1320000 0.1790000
[ 4 ] 0.82646~ 1.59870 1.21258 197.00000 0.1970000 0.3760000
[ 5 ] 1.59870~ 2.37095 1.98482 266.00000 0.2660000 0.6420000
[ 6 ] 2.37095~ 3.14319 2.75707 201.00000 0.2010000 0.8430000
[ 7 ] 3.14319~ 3.91543 3.52931 112.00000 0.1120000 0.9550000
[ 8 ] 3.91543~ 4.68768 4.30156 38.00000 0.0380000 0.9930000
[ 9 ] 4.68768~ 5.45992 5.07380 7.00000 0.0070000 1.0000000
frequency distribution: sample mean=1.950073 , sample variance=1.382945 , sample sd=1.175987
(1.4)
$X_{2i} = \beta_0 + \beta_1 X_{1i} + \varepsilon_i$, $i = 1, 2, \ldots, n$, where $\beta_0$ is the intercept, $\beta_1$ is the slope, and $\varepsilon_i$ is the error,
(1.4.1)
The linear model analysis
The estimated line is X2=1.944812+-0.004149*X1
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1 1 0.0651048857 0.0651048857 0.0483258413
error 998 1344.5120478279 1.3472064607
total 999 1344.5771527135
----------------------------------------------------------------------------------
H0: slope(X1)=0
The F test p value=0.826500
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 1.9448119558 0.0367074052 52.98146 0.00000
slope -0.0041492944 0.0188748948 -0.21983 0.82600
----------------------------------------------------------------------------------
MSE=1.3472064607 , R2=0.000048 , R2(adj)=-0.000954
X2(mean)= 1.9449167240, X2(variance)= 1.3459230758, X2(s.d.)= 1.1601392484
X1(mean)= -0.0252496238, X1(variance)= 3.7852937477, X1(s.d.)= 1.9455831382
SSX1=3781.5084539155 , SS(X2*X1)= -15.6905919453, C.V.= 0.5967824839
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class     lower limit   upper limit   observed no   probability   expected no   chi square
[ 1 ]     -inf          -1.48754      97.00000      0.10000       100.00000     0.09000
[ 2 ]     -1.48754      -0.97685      118.00000     0.10000       100.00000     3.24000
[ 3 ]     -0.97685      -0.60864      98.00000      0.10000       100.00000     0.04000
[ 4 ]     -0.60864      -0.29402      78.00000      0.10000       100.00000     4.84000
[ 5 ]     -0.29402      0.00003       113.00000     0.10000       100.00000     1.69000
[ 6 ]     0.00003       0.29403       96.00000      0.10000       100.00000     0.16000
[ 7 ]     0.29403       0.60864       99.00000      0.10000       100.00000     0.01000
[ 8 ]     0.60864       0.97637       92.00000      0.10000       100.00000     0.64000
[ 9 ]     0.97637       1.48742       99.00000      0.10000       100.00000     0.01000
[ 10 ]    1.48742       +inf          110.00000     0.10000       100.00000     1.00000
degree of freedom=8
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =11.720000
p-value=0.164100
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=504
number of the positive of residual=496
H0: residual is random , H1: Increasing line or decreasing line
Z=0.951244, p-value=0.829300
H0: residual is random , H1: Oscillation
Z=0.951244, p-value=0.170700
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=0.951244, p-value=0.341400
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,1000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=2.063777
0.50831 0.81542 1.24223
upper limit -1.24233 -0.81583 -0.50831 -0.24555 0.00002 0.24556 0.50831
0.81542 1.24223
observed no 104.00000 92.00000 101.00000 108.00000 89.00000 98.00000 109.00000
103.00000 94.00000 102.00000
probability 0.10000 0.10000 0.10000 0.10000 0.10000 0.10000 0.10000
0.10000 0.10000 0.10000
expected no 100.00000 100.00000 100.00000 100.00000 100.00000 100.00000 100.00000
100.00000 100.00000 100.00000
chi square 0.16000 0.64000 0.01000 0.64000 1.21000 0.04000 0.81000
0.09000 0.36000 0.04000
degree of freedom=8
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =4.000000
p-value=0.857100
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=494, number of the positive of residual=506
H0: residual is random , H1: Increasing line or decreasing line
Z=-0.944739, p-value=0.172400
H0: residual is random , H1: Oscillation
Z=-0.944739, p-value=0.827600
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-0.944739, p-value=0.344800
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,1000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=1.995970
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=2.004030
2. The population sigma of error confidence interval
90% confidence interval for population variance [0.875214 , 1.014354]
90% confidence interval for population standard deviation [0.935529 , 1.007152]
95% confidence interval for population variance [0.863864 , 1.030039]
95% confidence interval for population standard deviation [0.929443 , 1.014908]
99% confidence interval for population variance [0.842501 , 1.062153]
99% confidence interval for population standard deviation [0.917878 , 1.030608]
estimated line Cos(X1*pi)^2, residual plot
sample mean(X1)= 0.0002, sample variance(X1)= 4.0000,
sample mean(X2)= 2.0000, sample variance(X2)= 1.4999,
sample cov(X1,X2)= -0.0001, X1 and X2 sample correlation coefficient=-0.0000.
E(X2|x1) and Cos(x1) E(X1|x2) and x2 are not linear relation
Mathematical Mean: 1.99997
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.49994
S.D. : 1.22472
Skewed Coef. : -0.00005
Kurtosis Coef. : 2.83415
MAD : 0.98580
Range : 12.84450
Mid_range : 1.94613
Median : 2.00005
Q1 : 1.15340
Q2 : 2.00005
Q3 : 2.84642
IQR : 1.69302
C.V. : 0.61237
(2.2)
Non-linear model analysis
The relation is X2= 0.9998860304+ 2.0001194854*Cos(X1*pi)*Cos(X1*pi)
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Cos(X1*pi)*Cos(X1*pi) 1 50004162.7065743580 50004162.7065743580 50009305.0178369730
error 99999998 99989715.2912962140 0.9998971729
total 99999999 149993877.9978705600
----------------------------------------------------------------------------------
H0: slope(X1)=0
The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 0.9998860304 0.0001732013 5772.97106 0.00000
slope 2.0001194854 0.0002828333 7071.72575 0.00000
----------------------------------------------------------------------------------
MSE=0.9998971729 , R2=0.333375 , R2(adj)=0.333375
X2(mean)= 1.9999719462, X2(variance)= 1.4999387950, X2(s.d.)= 1.2247198843
Cos(X1*pi)*Cos(X1*pi)(mean)= 0.5000130858, Cos(X1*pi)*Cos(X1*pi)(variance)= 0.1249954724,
Cos(X1*pi)*Cos(X1*pi)(s.d.)= 0.3535469876
SS(Cos(X1*pi)*Cos(X1*pi))= 12499547.1191571710 , SS(X2*Cos(X1*pi)*Cos(X1*pi))= 25000587.7511875290,
C.V.= 0.4999813058
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class     lower limit   upper limit   observed no     probability   expected no     chi square
[ 1 ]     -inf          -1.64482      5000200.00000   0.05000       5000000.00000   0.00800
[ 2 ]     -1.64482      -1.28153      4999190.00000   0.05000       5000000.00000   0.13122
[ 3 ]     -1.28153      -1.03637      4999348.00000   0.05000       5000000.00000   0.08502
[ 4 ]     -1.03637      -0.84157      5002193.00000   0.05000       5000000.00000   0.96185
[ 5 ]     -0.84157      -0.67443      5000539.00000   0.05000       5000000.00000   0.05810
[ 6 ]     -0.67443      -0.52435      4998575.00000   0.05000       5000000.00000   0.40613
[ 7 ]     -0.52435      -0.38527      4999144.00000   0.05000       5000000.00000   0.14655
[ 8 ]     -0.38527      -0.25356      4989605.00000   0.05000       5000000.00000   21.61121
[ 9 ]     -0.25356      -0.12561      5010040.00000   0.05000       5000000.00000   20.16032
[ 10 ]    -0.12561      -0.00023      4991075.00000   0.05000       5000000.00000   15.93113
[ 11 ]    -0.00023      0.12543       4995383.00000   0.05000       5000000.00000   4.26334
[ 12 ]    0.12543       0.25331       5010131.00000   0.05000       5000000.00000   20.52743
[ 13 ]    0.25331       0.38526       4999004.00000   0.05000       5000000.00000   0.19840
[ 14 ]    0.38526       0.52435       5005535.00000   0.05000       5000000.00000   6.12725
[ 15 ]    0.52435       0.67439       5000563.00000   0.05000       5000000.00000   0.06339
[ 16 ]    0.67439       0.84151       5000975.00000   0.05000       5000000.00000   0.19012
[ 17 ]    0.84151       1.03633       4997245.00000   0.05000       5000000.00000   1.51801
[ 18 ]    1.03633       1.28143       5002898.00000   0.05000       5000000.00000   1.67968
[ 19 ]    1.28143       1.64477       5000265.00000   0.05000       5000000.00000   0.01405
[ 20 ]    1.64477       +inf          4998092.00000   0.05000       5000000.00000   0.72809
degree of freedom=18
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =94.809278
p-value=0.000000
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=49998910
number of the positive of residual=50001090
H0: residual is random , H1: Increasing line or decreasing line
Z=-0.031195, p-value=0.487600
H0: residual is random , H1: Oscillation
Z=-0.031195, p-value=0.512400
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-0.031195, p-value=0.975200
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,100000000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=1.999886
E(| X0 distribution - X1 distribution |^2)= 0.0000000520
************ The | X0 distribution F() - X1 distribution F()| ****************
The almost surely limiting theory
E(| X0 distribution F() - X1 distribution F()|^2)= 0.0000000012
Pr(| X0 distribution F() - X1 distribution F()|< 0.1000000000)= 1.000000
Pr(| X0 distribution F() - X1 distribution F()|< 0.0500000000)= 1.000000
Pr(| X0 distribution F() - X1 distribution F()|< 0.0100000000)= 1.000000
Pr(| X0 distribution F() - X1 distribution F()|< 0.0050000000)= 1.000000
Pr(| X0 distribution F() - X1 distribution F()|< 0.0010000000)= 1.000000
Pr(| X0 distribution F() - X1 distribution F()|< 0.0005000000)= 1.000000
Pr(| X0 distribution F() - X1 distribution F()|< 0.0001000000)= 1.000000
The probability limiting theory
E(| X0 distribution F() - X1 distribution F()|^2)= 0.0000000012
Pr(| X0 distribution F() - X1 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0500000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0100000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0050000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0010000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0005000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0001000000)= 0.000000
(2.4) Conclusion,
X1~Normal(0,4),X2=1.0000038041+2.0000020130*Cos(X1*pi)^2+error,
error~Normal(0,1).
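The R2 values printed for the two non-linear fits (about 0.666 in Appendix 6.1 and about 0.333 in Appendix 6.2) can be checked against the population model. The derivation below is a sketch that assumes X1 ~ Normal(0, 4) exactly and treats terms of order $e^{-2\pi^2}$ as zero; the symbol $g$ for the regressor is introduced here and does not appear in the original.

```latex
\begin{align*}
E[\cos(k\pi X_1)] &= e^{-k^2\pi^2\sigma_{X_1}^2/2} = e^{-2k^2\pi^2} \approx 0
  \quad (k = 1, 2, 4),\\
\operatorname{Var}[\cos(\pi X_1)]
  &= E\!\left[\frac{1 + \cos(2\pi X_1)}{2}\right] - \bigl(E[\cos(\pi X_1)]\bigr)^2
  \approx \frac{1}{2},\\
\operatorname{Var}[\cos^2(\pi X_1)]
  &= E\!\left[\frac{3 + 4\cos(2\pi X_1) + \cos(4\pi X_1)}{8}\right] - \left(\frac{1}{2}\right)^2
  \approx \frac{1}{8},\\
R^2 &= \frac{\beta_1^2 \operatorname{Var}[g(X_1)]}{\beta_1^2 \operatorname{Var}[g(X_1)] + \sigma^2}
  = \begin{cases}
      \dfrac{4 \cdot \frac12}{4 \cdot \frac12 + 1} = \dfrac{2}{3}, & g(X_1) = \cos(\pi X_1),\\[2ex]
      \dfrac{4 \cdot \frac18}{4 \cdot \frac18 + 1} = \dfrac{1}{3}, & g(X_1) = \cos^2(\pi X_1).
    \end{cases}
\end{align*}
```

These limits agree with the printed regressor variances (about 0.49998 and 0.12500) and with the printed X2 variances (about 3.0009 and 1.4999).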
Appendix 7. The population of Logistic distribution
The population follows the Logistic probability distribution; the population mean is 100 and
the population variance is 4, and 100,000,000 samples are simulated
(the parameters of the Logistic distribution are µ = 0, σ = 1.10760).
f(x1),F(x1) Coefficient
Mathematical Mean: 0.00009
Geometrical Mean : none
Harmonic Mean : none
Variance : 4.03673
S.D. : 2.00916
Skewed Coef. : -0.00054
Kurtosis Coef. : 4.19153
MAD : 1.53566
Range : 30.59933
Mid_range : 0.00000
Median : 0.00006
Q1 : -1.21675
Q2 : 0.00006
Q3 : 1.21721
IQR : 2.43396
C.V. : none
0.000000<F(x)<=0.050000
Error=0.000683287633189380 MAX=0.011062371964946749
coefficient of determination=0.999999091007459650
The random variable value estimated line ------
X= -0.36144773662090302000+
1.13121198117733000000*tan((F(x)-0.5)*pi)^1+
0.21319090574979782000*tan((F(x)-0.5)*pi)^2+
0.02551528438925743100*tan((F(x)-0.5)*pi)^3+
0.00161103846039623020*tan((F(x)-0.5)*pi)^4+
0.00003934353298973292*tan((F(x)-0.5)*pi)^5+
0.050000<F(x)<=0.100000
Error=0.000004880519120589 MAX=0.000144586395838253
coefficient of determination=0.999999946660391050
The random variable value estimated line ------
X= -4.59137023985385890000+
-35.37335491180419900000*log(1-F(x))^1+
-205.04631805419922000000*log(1-F(x))^2+
-714.86285400390625000000*log(1-F(x))^3+
-1048.61547851562500000000*log(1-F(x))^4+
0.100000<F(x)<=0.150000
Error=0.000001466090004518 MAX=0.000102311196046756
coefficient of determination=0.999999958717005750
The random variable value estimated line ------
X= 0.29488432407379150000+
2.27502429485321040000*tan((F(x)-0.5)*pi)^1+
1.03891706466674800000*tan((F(x)-0.5)*pi)^2+
0.31234908103942871000*tan((F(x)-0.5)*pi)^3+
0.04100593179464340200*tan((F(x)-0.5)*pi)^4+
0.150000<F(x)<=0.200000
Error=0.000000692821008230 MAX=0.000058836914659688
coefficient of determination=0.999999965850611460
The random variable value estimated line ------
X= 1.90501141548156740000+
3.42121016979217530000*log(F(x))^1+
1.12947809696197510000*log(F(x))^2+
0.20629882812500000000*log(F(x))^3+
0.200000<F(x)<=0.250000
Error=0.000000522986619037 MAX=0.000056209626235093
coefficient of determination=0.999999961868786920
The random variable value estimated line ------
X= 0.14986997842788696000+
2.14525794982910160000*tan((F(x)-0.5)*pi)^1+
1.34892559051513670000*tan((F(x)-0.5)*pi)^2+
0.78867864608764648000*tan((F(x)-0.5)*pi)^3+
0.21835052967071533000*tan((F(x)-0.5)*pi)^4+
0.250000<F(x)<=0.300000
Error=0.000000663805238469 MAX=0.000066825379190227
coefficient of determination=0.999999937395434580
The random variable value estimated line ------
X= 3.28086045384407040000+
-2.52311021089553830000*(1/F(x))^1+
0.51717242598533630000*(1/F(x))^2+
-0.04198981169611215600*(1/F(x))^3+
0.300000<F(x)<=0.350000
Error=0.000000532872788396 MAX=0.000044405749932031
coefficient of determination=0.999999938881833470
The random variable value estimated line ------
X= -2.69870167225599290000+
-6.55925154685974120000*log(1-F(x))^1+
-5.23477113246917720000*log(1-F(x))^2+
-1.98897159099578860000*log(1-F(x))^3+
0.350000<F(x)<=0.400000
Error=0.000000522831295055 MAX=0.000049890658496810
coefficient of determination=0.999999931637600590
The random variable value estimated line ------
X= 0.00001115538179874420+
1.39649295806884770000*tan((F(x)-0.5)*pi)^1+
-0.21163129806518555000*tan((F(x)-0.5)*pi)^2+
-1.46191787719726560000*tan((F(x)-0.5)*pi)^3+
-2.79061889648437500000*tan((F(x)-0.5)*pi)^4+
-2.23810195922851560000*tan((F(x)-0.5)*pi)^5+
0.400000<F(x)<=0.450000
Error=0.000000312752819030 MAX=0.000042977291241642
coefficient of determination=0.999999955482027800
The random variable value estimated line ------
X= 0.00004515495038504014+
1.101215330883860600000000000000*log(F(x)/(1-F(x)))^1+
-0.086022198200225830000000000000*log(F(x)/(1-F(x)))^2+
4.551019668579101600000000000000*log(F(x)/(1-F(x)))^3+
157.682952880859370000000000000000*log(F(x)/(1-F(x)))^4+
2109.235229492187500000000000000000*log(F(x)/(1-F(x)))^5+
14309.840820312500000000000000000000*log(F(x)/(1-F(x)))^6+
48705.539062500000000000000000000000*log(F(x)/(1-F(x)))^7+
65953.402343750000000000000000000000*log(F(x)/(1-F(x)))^8+
0.450000<F(x)<=0.500000
Error=0.000000149343082446 MAX=0.000028367131421380
coefficient of determination=0.999999977992232840
The random variable value estimated line ------
X= 0.00005732061163143953+
1.111632163869217000000000000000*log(F(x)/(1-F(x)))^1+
-0.043361157178878784000000000000*log(F(x)/(1-F(x)))^2+
1.669347763061523400000000000000*log(F(x)/(1-F(x)))^3+
-77.822616577148438000000000000000*log(F(x)/(1-F(x)))^4+
1275.596923828125000000000000000000*log(F(x)/(1-F(x)))^5+
-9452.799804687500000000000000000000*log(F(x)/(1-F(x)))^6+
32981.671875000000000000000000000000*log(F(x)/(1-F(x)))^7+
-44203.011718750000000000000000000000*log(F(x)/(1-F(x)))^8+
0.500000<F(x)<=0.550000
Error=0.000000155651482517 MAX=0.000029820467389419
coefficient of determination=0.999999976594577730
The random variable value estimated line ------
X= -0.00402648001909255980+
1.216633915901184100000000000000*log(F(x)/(1-F(x)))^1+
-0.990192890167236330000000000000*log(F(x)/(1-F(x)))^2+
4.117090225219726600000000000000*log(F(x)/(1-F(x)))^3+
-8.011781692504882800000000000000*log(F(x)/(1-F(x)))^4+
5.939398765563964800000000000000*log(F(x)/(1-F(x)))^5+
0.550000<F(x)<=0.600000
Error=0.000000273554275783 MAX=0.000038477489750333
coefficient of determination=0.999999961147661430
The random variable value estimated line ------
X= -0.00899159908294677730+
1.201054513454437300000000000000*log(F(x)/(1-F(x)))^1+
-0.328715801239013670000000000000*log(F(x)/(1-F(x)))^2+
0.489564538002014160000000000000*log(F(x)/(1-F(x)))^3+
-0.262946486473083500000000000000*log(F(x)/(1-F(x)))^4+
0.600000<F(x)<=0.650000
Error=0.000000369839116296 MAX=0.000040407285126498
coefficient of determination=0.999999951626579180
The random variable value estimated line ------
X= 0.06860533356666564900+
1.00061774253845210000*tan((F(x)-0.5)*pi)^1+
0.91882944107055664000*tan((F(x)-0.5)*pi)^2+
-1.22047185897827150000*tan((F(x)-0.5)*pi)^3+
0.45349740982055664000*tan((F(x)-0.5)*pi)^4+
0.650000<F(x)<=0.700000
Error=0.000000450977800240 MAX=0.000054210018204492
coefficient of determination=0.999999948501949400
The random variable value estimated line ------
X= 4.74466514587402340000+
29.23645019531250000000*log(F(x))^1+
102.57336425781250000000*log(F(x))^2+
192.31542968750000000000*log(F(x))^3+
142.07812500000000000000*log(F(x))^4+
0.700000<F(x)<=0.750000
Error=0.000000997935804361 MAX=0.000065634571963180
coefficient of determination=0.999999905526037460
The random variable value estimated line ------
X= -0.23868405818939209000+
2.30883240699768070000*tan((F(x)-0.5)*pi)^1+
-1.29462718963623050000*tan((F(x)-0.5)*pi)^2+
0.54605150222778320000*tan((F(x)-0.5)*pi)^3+
-0.10434103012084961000*tan((F(x)-0.5)*pi)^4+
0.750000<F(x)<=0.800000
Error=0.000001124348221286 MAX=0.000067089558881905
coefficient of determination=0.999999919160220130
The random variable value estimated line ------
X= -0.36335521936416626000+
2.43422782421112060000*tan((F(x)-0.5)*pi)^1+
-1.17309939861297610000*tan((F(x)-0.5)*pi)^2+
0.36107987165451050000*tan((F(x)-0.5)*pi)^3+
-0.04744070023298263500*tan((F(x)-0.5)*pi)^4+
0.800000<F(x)<=0.850000
Error=0.000000821088910784 MAX=0.000064113400427557
coefficient of determination=0.999999959334919250
The random variable value estimated line ------
X= 0.60040664672851563000+
0.30081748962402344000*tan((F(x)-0.5)*pi)^1+
0.65662479400634766000*tan((F(x)-0.5)*pi)^2+
-0.38221311569213867000*tan((F(x)-0.5)*pi)^3+
0.08851981163024902300*tan((F(x)-0.5)*pi)^4+
-0.00764834880828857420*tan((F(x)-0.5)*pi)^5+
0.850000<F(x)<=0.900000
Error=0.000002097995186246 MAX=0.000088115155979729
coefficient of determination=0.999999940928421590
The random variable value estimated line ------
X= 0.11100018024444580000+
1.41221305727958680000*tan((F(x)-0.5)*pi)^1+
-0.33743028342723846000*tan((F(x)-0.5)*pi)^2+
0.05258737690746784200*tan((F(x)-0.5)*pi)^3+
-0.00451843289192765950*tan((F(x)-0.5)*pi)^4+
0.00016247624444076791*tan((F(x)-0.5)*pi)^5+
0.900000<F(x)<=0.950000
Error=0.000003175717615387 MAX=0.000147648432990088
coefficient of determination=0.999999965612508260
The random variable value estimated line ------
X= -2.09225997701287270000+
4.082087025046348600000000000000*log(F(x)/(1-F(x)))^1+
-1.798192268237471600000000000000*log(F(x)/(1-F(x)))^2+
0.605736260768026110000000000000*log(F(x)/(1-F(x)))^3+
-0.125122338649816810000000000000*log(F(x)/(1-F(x)))^4+
0.016424997724243440000000000000*log(F(x)/(1-F(x)))^5+
-0.001370952220895560500000000000*log(F(x)/(1-F(x)))^6+
0.000070298643606747646000000000*log(F(x)/(1-F(x)))^7+
-0.000002016273541016744300000000*log(F(x)/(1-F(x)))^8+
0.000000024715343244219312000000*log(F(x)/(1-F(x)))^9+
0.950000<F(x)<=1.000000
Error=0.000413627728662946 MAX=0.007649680932839686
coefficient of determination=0.999999001672271070
The diagram at left compares the estimated line with the sample data.
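As a rough check on how such an estimated line behaves, the sketch below evaluates the fourth-degree polynomial in log(F(x)/(1-F(x))) listed next to the 0.55 to 0.65 ranges and compares it with the closed-form logistic quantile µ + σ·log(F/(1-F)). It assumes that σ = 1.10760 is the logistic scale parameter and µ = 0, as quoted at the start of this appendix; the function names are illustrative only and are not part of the simulator.

```python
import math

SCALE = 1.10760  # logistic scale quoted in the appendix (assumed, with mu = 0)

# Coefficients copied from the listing: constant term, then powers 1..4
# of log(F/(1-F)).
COEFFS = [
    -0.00899159908294677730,
    1.2010545134544373,
    -0.32871580123901367,
    0.48956453800201416,
    -0.2629464864730835,
]


def fitted_quantile(f):
    """Evaluate the listed polynomial in the logit of f."""
    logit = math.log(f / (1.0 - f))
    return sum(c * logit ** k for k, c in enumerate(COEFFS))


def exact_quantile(f, mu=0.0, scale=SCALE):
    """Closed-form inverse CDF of the logistic distribution."""
    return mu + scale * math.log(f / (1.0 - f))


for f in (0.56, 0.58, 0.60, 0.62, 0.64):
    gap = fitted_quantile(f) - exact_quantile(f)
    print(f"F={f:.2f}  fitted={fitted_quantile(f):.6f}  "
          f"exact={exact_quantile(f):.6f}  gap={gap:+.6f}")
```

The fitted line is estimated from simulated samples, so a small gap from the closed form is expected rather than an exact match.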
(4) SLLN analysis: X1~Logistic, the population mean is 100 and
the population variance is 4. Note: X2~Logistic(µ = 0, σ = 1.10760),
E(| X1 distribution - X2 distribution |^2)= 0.0000003063
************ The | X1 distribution F() - X2 distribution F()| ****************
The almost surely limiting theory
E(| X1 distribution F() - X2 distribution F()|^2)= 0.0000000015
Pr(| X1 distribution F() - X2 distribution F()|< 0.1000000000)= 1.000000
Pr(| X1 distribution F() - X2 distribution F()|< 0.0500000000)= 1.000000
Pr(| X1 distribution F() - X2 distribution F()|< 0.0100000000)= 1.000000
Pr(| X1 distribution F() - X2 distribution F()|< 0.0050000000)= 1.000000
Pr(| X1 distribution F() - X2 distribution F()|< 0.0010000000)= 1.000000
Pr(| X1 distribution F() - X2 distribution F()|< 0.0005000000)= 1.000000
Pr(| X1 distribution F() - X2 distribution F()|< 0.0001000000)= 0.998464
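One possible reading of the lines above is that they summarise the gap between the simulated distribution function and the theoretical one: the mean squared gap, and the share of points where the gap stays below each threshold. The sketch below computes those two summaries for a much smaller logistic sample. This interpretation, the sample size, and all function names are assumptions made for illustration; the simulator's own procedure is not shown in the text.

```python
import math
import random


def logistic_cdf(x, mu=0.0, scale=1.10760):
    """Theoretical logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-(x - mu) / scale))


def logistic_sample(rng, mu=0.0, scale=1.10760):
    """Draw one logistic value by inverting the distribution function."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return mu + scale * math.log(u / (1.0 - u))


def cdf_gap_summary(n, thresholds, seed=1):
    """Mean squared gap and share of points with gap below each threshold."""
    rng = random.Random(seed)
    xs = sorted(logistic_sample(rng) for _ in range(n))
    # Empirical CDF at the i-th order statistic is (i + 1) / n.
    gaps = [abs((i + 1) / n - logistic_cdf(x)) for i, x in enumerate(xs)]
    mean_sq = sum(g * g for g in gaps) / n
    share_below = {eps: sum(g < eps for g in gaps) / n for eps in thresholds}
    return mean_sq, share_below


mean_sq, share_below = cdf_gap_summary(100_000, (0.1, 0.05, 0.01, 0.005, 0.001))
print(f"mean squared gap = {mean_sq:.10f}")
for eps, share in share_below.items():
    print(f"share with gap < {eps}: {share:.6f}")
```

With only 100,000 draws the shares at the smallest thresholds will be lower than those reported for 100,000,000 samples.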
(5) X1~Logistic, the population mean is 100 and the population variance is 4,
simulating 100,000,000 samples; let Z1 = MIN(X1^2, |X1|^0.5).
f(z1),F(z1) Coefficient
Mathematical Mean: 0.98542
Geometrical Mean : 0.50966
Harmonic Mean : 0.00000
Variance : 0.43841
S.D. : 0.66212
Skewed Coef. : 0.04205
Kurtosis Coef. : 2.08294
MAD : 0.56631
Range : 3.91148
Mid_range : 1.95574
Median : 1.10317
Q1 : 0.32025
Q2 : 1.10317
Q3 : 1.46813
IQR : 1.14788
C.V. : 0.67192
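The transformation Z1 = MIN(X1^2, |X1|^0.5) can be reproduced on a smaller scale. The sketch below is a minimal illustration, assuming X1 is logistic with µ = 0 and scale 1.10760 (the parameters quoted for Appendix 7) and using far fewer draws than the 100,000,000 in the text; the function name and seed are hypothetical.

```python
import math
import random
import statistics


def simulate_z1(n, scale=1.10760, seed=7):
    """Simulate Z1 = min(X1^2, |X1|^0.5) for logistic X1 with mu = 0."""
    rng = random.Random(seed)
    values = []
    for _ in range(n):
        u = rng.random()
        while u == 0.0:  # avoid log(0)
            u = rng.random()
        x = scale * math.log(u / (1.0 - u))
        values.append(min(x * x, math.sqrt(abs(x))))
    return values


z1 = simulate_z1(200_000)
print(f"mean   = {statistics.fmean(z1):.5f}")
print(f"median = {statistics.median(z1):.5f}")
```

Because X1^2 is the smaller term only when |X1| < 1, the median of Z1 equals the square root of the median of |X1|, which is the square root of Q3 of X1.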
Appendix 8. The critical values of Logistic distribution
The population distribution is Logistic and the sample size is n.
(1) Population mean test. The test statistic is given below.
$H_0: \mu = \mu_0$, $W_2 = \dfrac{\bar{X} - \mu_0}{S/\sqrt{n}}$; $W_2$ has a symmetric distribution. Let $P(W_2 \le W_{2,1-\alpha,n}) = \alpha$,
α
n 0.9 0.95 0.975 0.99 0.995
3 1.832074 2.773549 4.038885 6.494457 9.230786
4 1.617368 2.275064 3.032092 4.273789 5.469409
5 1.524799 2.082087 2.674281 3.561003 4.342657
6 1.473804 1.980366 2.494179 3.223868 3.833998
7 1.440605 1.917936 2.387339 3.029054 3.547273
8 1.417804 1.874686 2.315461 2.901814 3.362455
9 1.400650 1.844090 2.264861 2.813804 3.237146
10 1.387597 1.820916 2.226902 2.749689 3.146810
11 1.377002 1.802217 2.197069 2.699893 3.077922
12 1.368606 1.787869 2.173740 2.660647 3.023007
13 1.361515 1.774840 2.154866 2.629129 2.979524
14 1.355690 1.765044 2.138563 2.602998 2.942553
15 1.350262 1.756018 2.124872 2.580223 2.911964
20 1.332484 1.726209 2.079441 2.507003 2.812751
25 1.322380 1.709321 2.053773 2.467227 2.758745
30 1.315117 1.698104 2.037418 2.442101 2.725241
40 1.306762 1.684679 2.017151 2.410781 2.684739
50 1.301810 1.676165 2.005176 2.393369 2.661536
60 1.298437 1.671040 1.997169 2.381784 2.646289
70 1.295938 1.667310 1.991929 2.373999 2.636381
80 1.294317 1.664634 1.988154 2.367312 2.627865
90 1.292706 1.662162 1.984677 2.363223 2.622213
100 1.291414 1.660411 1.981549 2.357562 2.614991
500 1.283723 1.648030 1.964215 2.331347 2.582061
1000 1.282632 1.646505 1.962219 2.330148 2.579613
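Critical values of this kind can be approximated by Monte Carlo. The sketch below is one way to do it, assuming W2 is the studentized mean (X̄ − µ0)/(S/√n) with S the sample standard deviation (divisor n − 1) and the data drawn from a logistic population under H0. The replication count, seed, and function names are illustrative choices, not the authors' simulator settings.

```python
import math
import random


def logistic_draw(rng, mu, scale):
    """Draw one logistic value by inverting the distribution function."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return mu + scale * math.log(u / (1.0 - u))


def simulate_w2_quantile(n, prob, reps=20_000, mu0=0.0, scale=1.0, seed=11):
    """Approximate the prob-quantile of W2 for sample size n under H0."""
    rng = random.Random(seed)
    stats = []
    for _ in range(reps):
        xs = [logistic_draw(rng, mu0, scale) for _ in range(n)]
        mean = sum(xs) / n
        s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
        stats.append((mean - mu0) / (s / math.sqrt(n)))
    stats.sort()
    return stats[min(reps - 1, int(prob * reps))]


q95 = simulate_w2_quantile(n=10, prob=0.95)
print(f"approximate 0.95 critical value for n=10: {q95:.4f}")
```

W2 is unchanged by shifting or rescaling the data, so the choice of µ0 and scale in the sketch does not affect the result.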
α
n 0.005 0.01 0.025 0.05 0.1
3 0.008403 0.016867 0.042528 0.086376 0.159444
4 0.059119 0.094817 0.178843 0.293231 0.491692
5 0.169494 0.243808 0.400363 0.592060 0.897213
6 0.336514 0.455166 0.687924 0.957122 1.365034
7 0.552361 0.716395 1.027004 1.372351 1.879315
8 0.809824 1.020691 1.408302 1.827812 2.429735
9 1.103232 1.360638 1.823780 2.316140 3.009192
10 1.429280 1.732159 2.269594 2.831804 3.613066
11 1.781371 2.130483 2.741035 3.371553 4.237220
12 2.158214 2.551584 3.233786 3.930505 4.878840
13 2.557395 2.995094 3.746949 4.509101 5.537230
14 2.975774 3.457066 4.276104 5.101819 6.207915
15 3.412854 3.938117 4.825366 5.710974 6.892107
20 5.813817 6.543346 7.745914 8.922009 10.454083
25 8.494694 9.412828 10.911204 12.348840 14.196985
30 11.387816 12.482961 14.245811 15.925322 18.065095
40 17.607472 19.030081 21.286075 23.404382 26.065239
50 24.256767 25.975437 28.678530 31.189666 34.311383
60 31.208779 33.198644 36.314654 39.182994 42.733670
70 38.405502 40.644663 44.133973 47.337958 51.281553
80 45.798381 48.268437 52.107084 55.621938 59.933550
90 53.325879 56.028213 60.204638 64.014510 68.673697
100 60.995366 63.911996 68.402117 72.495428 77.480065
500 404.333799 412.623254 425.07711 436.034652 448.985695
1000 861.758831 874.129319 892.611616 908.755821 927.719993
α
n 0.9 0.95 0.975 0.99 0.995
3 4.711185 6.483522 8.422638 11.254946 13.603583
4 6.522300 8.618926 10.873294 14.110851 16.759289
5 8.208240 10.563933 13.060454 16.597565 19.469364
6 9.810309 12.384144 15.078838 18.858426 21.899558
7 11.353633 14.116056 16.979500 20.964656 24.151310
8 12.848181 15.779548 18.795694 22.958149 26.269819
9 14.310642 17.395572 20.542562 24.866555 28.287443
10 15.739825 18.961779 22.236348 26.706210 30.232595
11 17.144261 20.498782 23.885786 28.487558 32.087822
12 18.530760 22.006093 25.502358 30.219110 33.899608
13 19.897226 23.486719 27.080340 31.925397 35.700713
14 21.244815 24.942558 28.634362 33.576790 37.414163
15 22.579713 26.383770 30.161302 35.214266 39.120753
20 29.091643 32.774641 37.520223 43.022865 47.231696
25 35.402157 40.048426 44.551252 50.408345 54.842358
30 41.575570 46.565597 51.361142 57.554326 62.189708
40 53.630356 59.209573 64.508761 71.276492 76.286030
50 65.419420 71.511919 77.240342 84.492281 89.797246
60 77.025623 83.573324 89.690387 97.378675 102.976929
70 88.496966 95.463891 101.935625 110.020823 115.88847
80 99.855558 107.212127 114.024173 122.457852 128.564761
90 111.138753 118.851592 125.955253 134.762581 141.102749
100 122.353588 130.397197 137.777691 146.911876 153.446014
500 550.948973 567.069979 581.440733 598.524253 610.401251
1000 1072.25162 1094.29574 1108.88654 1136.90360 1152.89168
Appendix 9. The transformation of probability distribution by the simulator
The probability distribution transformation using the simulator,
appendix 9.1, $X_1, X_2 \overset{iid}{\sim} \mathrm{Uniform}(-1, 1)$, $f_{X_i}(x_i) = 0.5$, $-1 < x_i < 1$, $i = 1, 2$,
Mathematical Mean: 0.00001
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.66668
S.D. : 0.81650
Skewed Coef. : -0.00006
Kurtosis Coef. : 2.39995
MAD : 0.66668
Range : 3.99931
Mid_range : -0.00003
Median : 0.00002
Q1 : -0.58580
Q2 : 0.00002
Q3 : 0.58583
IQR : 1.17163
C.V. : none
W2 pdf and cdf Coefficient
Mathematical Mean: 0.00340
Geometrical Mean : none
Harmonic Mean : none
Variance : 100865.74363
S.D. : 317.59368
Skewed Coef. : 13.94699
Kurtosis Coef. : 298983.28571
MAD : 4.08617
Range : 587070.08862
Mid_range : 1491.26030
Median : -0.00000
Q1 : -0.27023
Q2 : -0.00000
Q3 : 0.27022
IQR : 0.54045
C.V. : none
2.3)X1,X2 joint probability distribution,
the joint pdf the joint cdf
2.6) $Y_1 = X_1 + X_2$, $Y_2 = X_1 - X_2$, joint probability distribution,
Y1,Y2 joint pdf Y1,Y2 joint cdf
3.2)X2 marginal probability distribution,
X2 pdf and cdf Coefficient
Mathematical Mean: -0.00000
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.12501
S.D. : 0.35357
Skewed Coef. : -0.00004
Kurtosis Coef. : 3.49968
MAD : 0.25002
Range : 1.99996
Mid_range : 0.00000
Median : -0.00000
Q1 : -0.16322
Q2 : -0.00000
Q3 : 0.16321
IQR : 0.32643
C.V. : none
3.5) $Y_2 = X_1 - X_2$, marginal probability distribution,
Y2 pdf and cdf Coefficient
Mathematical Mean: 0.00001
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.62500
S.D. : 0.79057
Skewed Coef. : -0.00001
Kurtosis Coef. : 2.69997
MAD : 0.63662
Range : 3.99993
Mid_range : -0.00000
Median : 0.00003
Q1 : -0.50672
Q2 : 0.00003
Q3 : 0.50668
IQR : 1.01340
C.V. : none
If the distribution has a restricted range, the fourth example gives the figures
and coefficients of that distribution.
appendix 9.4, $X_1, X_2 \overset{iid}{\sim} \mathrm{Uniform}(-1, 1)$, $f_{X_i}(x_i) = 0.5$, $-1 < x_i < 1$, $i = 1, 2$, the range of
4.1) $X_1$ given $0.1 \le X_1^2 + X_2^2 \le 0.9$, the conditional marginal probability distribution,
X1 conditional pdf and cdf Coefficient
Mathematical Mean: -0.00003
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.25000
S.D. : 0.50000
Skewed Coef. : 0.00009
Kurtosis Coef. : 1.82017
MAD : 0.43618
Range : 1.89735
Mid_range : 0.00000
Median : -0.00013
Q1 : -0.42902
Q2 : -0.00013
Q3 : 0.42902
IQR : 0.85803
C.V. : none
4.4) $Y_1 = X_1 + X_2$ given $0.1 \le X_1^2 + X_2^2 \le 0.9$, the conditional marginal probability distribution,
Y1 conditional pdf and cdf Coefficient
Mathematical Mean: 0.00002
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.50001
S.D. : 0.70711
Skewed Coef. : 0.00002
Kurtosis Coef. : 1.82003
MAD : 0.61686
Range : 2.68326
Mid_range : -0.00000
Median : 0.00000
Q1 : -0.60676
Q2 : 0.00000
Q3 : 0.60677
IQR : 1.21353
C.V. : none
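The range-limited case can be imitated with simple rejection sampling: draw X1 and X2 from Uniform(−1, 1) and keep only the pairs that fall in the ring 0.1 ≤ X1² + X2² ≤ 0.9. The sketch below is a small-scale illustration with hypothetical function names and far fewer draws than the simulator uses; it reports the conditional variances of X1 and of Y1 = X1 + X2.

```python
import random


def simulate_ring(n_draws, lower=0.1, upper=0.9, seed=3):
    """Keep (X1, X1 + X2) for uniform pairs whose squared radius is in range."""
    rng = random.Random(seed)
    x1_kept, y1_kept = [], []
    for _ in range(n_draws):
        x1 = rng.uniform(-1.0, 1.0)
        x2 = rng.uniform(-1.0, 1.0)
        if lower <= x1 * x1 + x2 * x2 <= upper:
            x1_kept.append(x1)
            y1_kept.append(x1 + x2)
    return x1_kept, y1_kept


def variance(values):
    """Population variance computed directly from the sample."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


x1_kept, y1_kept = simulate_ring(300_000)
var_x1 = variance(x1_kept)
var_y1 = variance(y1_kept)
print(f"kept {len(x1_kept)} pairs, Var(X1) = {var_x1:.5f}, Var(Y1) = {var_y1:.5f}")
```

On the ring the squared radius is uniform between 0.1 and 0.9, so E(X1²) = 0.5 × (0.1 + 0.9)/2 = 0.25, and by symmetry Var(Y1) = 2 × 0.25 = 0.5.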
Random variables can, of course, be combined mathematically to form new
distributions.
appendix 9.5, $X_1, X_2, X_3, X_4 \overset{iid}{\sim} \mathrm{Uniform}(\alpha = -1, \beta = 1)$,
f P3 ( p3 ) Coefficient
Mathematical Mean: -0.00001
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.44242
S.D. : 0.66515
Skewed Coef. : -0.00003
Kurtosis Coef. : 1.94811
MAD : 0.57749
Range : 3.14087
Mid_range : -0.00007
Median : -0.00002
Q1 : -0.57870
Q2 : -0.00002
Q3 : 0.57875
IQR : 1.15746
C.V. : none
f P4 ( p 4 ) Coefficient
Mathematical Mean: -0.00017
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.78978
S.D. : 0.88869
Skewed Coef. : 0.00031
Kurtosis Coef. : 1.73468
MAD : 0.78547
Range : 3.14159
Mid_range : -0.00000
Median : -0.00057
Q1 : -0.78551
Q2 : -0.00002
Q3 : 0.78537
IQR : 1.57088
C.V. : none
[Figures: joint pdf and joint cdf for each pair: $f_{P_1,P_2}(p_1,p_2)$, $F_{P_1,P_2}(p_1,p_2)$; $f_{P_1,P_3}(p_1,p_3)$, $F_{P_1,P_3}(p_1,p_3)$; $f_{P_1,P_4}(p_1,p_4)$, $F_{P_1,P_4}(p_1,p_4)$; $f_{P_3,P_4}(p_3,p_4)$, $F_{P_3,P_4}(p_3,p_4)$.]
appendix 9.6, $X_i \sim \mathrm{Normal}(\mu_i = i, \sigma_i^2 = 2^2)$, $i = 1, 2, \ldots, 10$; $X_1, \ldots, X_{10}$ are independent
random variables, and let $W_1 = MAD = \dfrac{\sum_{i=1}^{10} |X_i - \bar{X}|}{10}$, $W_2 = S = \sqrt{\dfrac{\sum_{i=1}^{10} (X_i - \bar{X})^2}{9}}$.
f W1 (w1 ) Coefficient
Mathematical Mean: 2.85131
Geometrical Mean : 2.80016
Harmonic Mean : 2.74686
Variance : 0.28346
S.D. : 0.53241
Skewed Coef. : 0.13271
Kurtosis Coef. : 2.98834
MAD : 0.42532
Range : 5.78206
Mid_range : 3.32641
Median : 2.83962
Q1 : 2.48456
Q2 : 2.83962
Q3 : 3.20518
IQR : 0.72062
C.V. : 0.18672
f W2 (w2 ) Coefficient
Mathematical Mean: 3.57606
Geometrical Mean : 3.52136
Harmonic Mean : 3.46422
Variance : 0.37877
S.D. : 0.61544
Skewed Coef. : 0.05865
Kurtosis Coef. : 2.97632
MAD : 0.49160
Range : 6.62652
Mid_range : 3.92894
Median : 3.57031
Q1 : 3.15653
Q2 : 3.57031
Q3 : 3.98912
IQR : 0.83258
C.V. : 0.17210
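The two statistics of appendix 9.6 are easy to simulate directly. The sketch below assumes W1 is the mean absolute deviation about the sample mean (divisor 10) and W2 is the sample standard deviation (divisor 9), with X_i drawn independently from Normal(i, 2²) for i = 1, ..., 10. Function names, seed, and replication count are illustrative.

```python
import math
import random


def simulate_w1_w2(reps=20_000, seed=5):
    """Return the simulated means of W1 (MAD) and W2 (S)."""
    rng = random.Random(seed)
    w1_total = 0.0
    w2_total = 0.0
    for _ in range(reps):
        # One observation from each of the ten normal populations.
        xs = [rng.gauss(i, 2.0) for i in range(1, 11)]
        mean = sum(xs) / 10
        w1_total += sum(abs(x - mean) for x in xs) / 10
        w2_total += math.sqrt(sum((x - mean) ** 2 for x in xs) / 9)
    return w1_total / reps, w2_total / reps


w1_mean, w2_mean = simulate_w1_w2()
print(f"E(W1) ~ {w1_mean:.4f}, E(W2) ~ {w2_mean:.4f}")
```

Because the ten means differ, both statistics mix the spread of the means 1, ..., 10 with the common standard deviation 2, which is why their averages sit well above 2.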
Appendix 10. One way analysis when the error distribution is arcsin
One-way analysis: the sampling distributions of the test statistics when the error distribution is
the arcsin distribution.
$X_{ij} = \mu + \alpha_i + \varepsilon_{ij}$, $i = 1, 2, \ldots, k$, $j = 1, 2, \ldots, n$,
(2) $W_2 = \dfrac{SSTr}{\sigma^2}$, degree of freedom=4,
f W2 (w2 ) FW2 (w2 ) Coefficient
Mathematical Mean: 4.00039
Geometrical Mean : 2.85405
Harmonic Mean : 1.76468
Variance : 11.84350
S.D. : 3.44144
Skewed Coef. : 2.40912
Kurtosis Coef. : 13.99505
MAD : 2.44357
Range : 101.20935
Mid_range : 50.60517
Median : 3.06624
Q1 : 1.68786
Q2 : 3.06624
Q3 : 5.23325
IQR : 3.54539
C.V. : 0.86028
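The statistic W2 = SSTr/σ² can be simulated for k = 5 groups of n = 5 observations. The sketch below is only illustrative: the text does not state the parameters of the arcsin error distribution, so the code assumes the arcsine law on (−1, 1), generated as sin(π(U − 0.5)) with variance 1/2, and sets all treatment effects α_i to zero. Under a different parameterisation the higher moments may differ from the table above; the mean k − 1 = 4 holds for any independent errors with common variance σ².

```python
import math
import random


def arcsine_error(rng):
    """Arcsine law on (-1, 1): mean 0, variance 1/2 (assumed parameterisation)."""
    return math.sin(math.pi * (rng.random() - 0.5))


def simulate_sstr_over_sigma2(k=5, n=5, reps=20_000, seed=13):
    """Simulate SSTr / sigma^2 with all treatment effects equal to zero."""
    rng = random.Random(seed)
    sigma2 = 0.5  # variance of the assumed error law
    values = []
    for _ in range(reps):
        group_means = [
            sum(arcsine_error(rng) for _ in range(n)) / n for _ in range(k)
        ]
        grand_mean = sum(group_means) / k
        sstr = n * sum((m - grand_mean) ** 2 for m in group_means)
        values.append(sstr / sigma2)
    return values


w2_values = simulate_sstr_over_sigma2()
w2_mean = sum(w2_values) / len(w2_values)
print(f"simulated E(W2) ~ {w2_mean:.4f} (k - 1 = 4)")
```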
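(3) $W_3 = \dfrac{SSE}{\sigma^2} = \sum_{i=1}^{n} (\hat{\varepsilon}_i)^2$, degree of freedom=20,
$E\left[\left(F_{W_3}\left(\dfrac{w_3 - E(W_3)}{\sqrt{Var(W_3)}}\right) - \Phi\left(\dfrac{w_3 - E(W_3)}{\sqrt{Var(W_3)}}\right)\right)^2\right] = 0.0045366405$,
$P\left\{\left|F_{W_3}\left(\dfrac{w_3 - E(W_3)}{\sqrt{Var(W_3)}}\right) - \Phi\left(\dfrac{w_3 - E(W_3)}{\sqrt{Var(W_3)}}\right)\right| \ge \varepsilon\right\}$,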
ε probability ε probability
0.1000 0.123440 0.0010 0.991354
0.0500 0.595534 0.0005 0.995678
0.0100 0.912089 0.0001 0.999136
0.0050 0.956375
$\dfrac{W_3 - E(W_3)}{\sqrt{Var(W_3)}}$ does not approach the standard normal distribution,
the right side probability
0.995 0.99 0.975 0.95 0.9
W3 3.997720 4.643964 5.767607 6.925535 8.524510
$E\left[(W_4 - F(4,20)(w_4))^2\right] = 0.0053817782$, $E\left[(W_4 - F(4,20)\,df(w_4))^2\right] = 0.0003085445$,
$P\{|W_4 - F(4,20)\,df(w_4)| \ge \varepsilon\}$,
ε probability ε probability
0.1000 0.000000 0.0010 0.968536
0.0500 0.000000 0.0005 0.982002
0.0100 0.707318 0.0001 0.996260
0.0050 0.859357
(5) $W_4 = MSTr/MSE$ does not approach the $F_{30,1000}$ distribution,
the right side probability
0.995 0.99 0.975 0.95 0.9
W4 0.060973 0.086961 0.140243 0.203086 0.298185
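(6) $W_5 = \dfrac{\bar{X}_{1\bullet} - \bar{X}_{2\bullet} - (\alpha_1 - \alpha_2)}{S(\bar{X}_{1\bullet} - \bar{X}_{2\bullet})} = \dfrac{\bar{X}_{1\bullet} - \bar{X}_{2\bullet}}{S(\bar{X}_{1\bullet} - \bar{X}_{2\bullet})}$, $\alpha_1 = 0$, $\alpha_2 = 0$,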
f W5 (w5 ) FW5 (w5 ) Coefficient
Mathematical Mean: -0.00010
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.10044
S.D. : 1.04902
Skewed Coef. : 0.00011
Kurtosis Coef. : 3.40612
MAD : 0.82626
Range : 20.75598
Mid_range : -0.25192
Median : 0.00025
Q1 : -0.68667
Q2 : 0.00025
Q3 : 0.68647
IQR : 1.37314
C.V. : none
$E\left[(w_5 - t_{20}(w_5))^2\right] = 0.0000752963$, $E\left[(F_{W_5}(w_5) - t_{20}\,df(w_5))^2\right] = 0.0000002301$,
$P\{|F_{W_5}(w_5) - t_{20}\,df(w_5)| \ge \varepsilon\}$,
ε probability ε probability
0.1000 0.000000 0.0010 0.103650
0.0500 0.000000 0.0005 0.222103
0.0100 0.000000 0.0001 0.875660
0.0050 0.000000
$W_5$ approaches the $t_{20}$ distribution,
the right side probability
0.995 0.99 0.975 0.95 0.9
W5 -2.819497 -2.500928 -2.065375 -1.712420 -1.321295
t 20 -2.845336 -2.527554 -2.085834 -1.724817 -1.325341
(7) W6 = Bartlett’s test statistic,
f W6 (w6 ) FW6 (w6 ) Coefficient
Mathematical Mean: 8.44731
Geometrical Mean : 6.65876
Harmonic Mean : 4.55310
Variance : 30.24505
S.D. : 5.49955
Skewed Coef. : 1.19074
Kurtosis Coef. : 4.99742
MAD : 4.27492
Range : 67.01933
Mid_range : 33.51114
Median : 7.34715
Q1 : 4.35582
Q2 : 7.34715
Q3 : 11.35473
IQR : 6.99891
C.V. : 0.65104
Because $Var(W_6) = 30.24505 \ne 2 \times E(W_6) = 2 \times 8.44731$, $W_6$ does not follow a chi-square
distribution.
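The argument rests on a moment identity: a chi-square variable with ν degrees of freedom has mean ν and variance 2ν, so its variance must equal twice its mean. The short sketch below applies that check to the reported moments of W6; the helper name is hypothetical.

```python
def chi_square_moment_check(mean, variance):
    """Return the variance a chi-square law with this mean would have,
    and the ratio of the observed variance to that implied variance."""
    implied_variance = 2.0 * mean
    return implied_variance, variance / implied_variance


# Reported moments of W6 (Bartlett's test statistic), k = 5, n = 5.
implied, ratio = chi_square_moment_check(mean=8.44731, variance=30.24505)
print(f"implied variance = {implied:.5f}, observed/implied = {ratio:.4f}")
```

A ratio close to 1 would be consistent with a chi-square law. Here the observed variance is well above the implied value, and the mean itself is far from the nominal 4 degrees of freedom.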
$E\left[(W_6 - \chi^2_4(w_6))^2\right] = 26.9458736711$, $E\left[(F_{W_6}(w_6) - \chi^2_4\,df(w_6))^2\right] = 0.0896172970$,
$P\{|F_{W_6}(w_6) - \chi^2_4\,df(w_6)| \ge \varepsilon\}$,
ε probability ε probability
0.1000 0.867559 0.0010 0.998790
0.0500 0.936260 0.0005 0.999394
0.0100 0.987705 0.0001 0.999879
0.0050 0.993895
$W_6$ does not approach $\chi^2_4$, the chi-square distribution with df=4,
the right side probability
0.995 0.99 0.975 0.95 0.9
W6 0.494955 0.707972 1.147801 1.670573 2.469927
(8) $W_7 = Max(S_1^2, S_2^2, \ldots, S_k^2)/SSE$
f W7 (w7 ) FW7 (w7 ) Coefficient
Mathematical Mean: 0.12074
Geometrical Mean : 0.11629
Harmonic Mean : 0.11210
Variance : 0.00113
S.D. : 0.03368
Skewed Coef. : 0.69154
Kurtosis Coef. : 2.96671
MAD : 0.02727
Range : 0.19670
Mid_range : 0.14891
Median : 0.11481
Q1 : 0.09497
Q2 : 0.11481
Q3 : 0.14171
IQR : 0.04673
C.V. : 0.27891
the right side probability
0.995 0.99 0.975 0.95 0.9
W7 0.063828 0.0664709 0.070985 0.075578 0.081785
$E\left[(W_8 - F(4,20)(w_8))^2\right] = 3.9392454231$, $E\left[(W_8 - F(4,20)\,df(w_8))^2\right] = 0.0999608830$,
$P\{|W_8 - F(4,20)\,df(w_8)| \ge \varepsilon\}$,
ε probability ε probability
0.1000 0.876347 0.0010 0.998894
0.0500 0.941644 0.0005 0.999449
0.0100 0.988829 0.0001 0.999890
0.0050 0.994446
$W_8$ does not approach the $F_{4,20}$ distribution,
the right side probability
0.995 0.99 0.975 0.95 0.9
W8 0.160785 0.228220 0.364881 0.522319 0.754528
(10) W9 = Brown-Forsythe test statistic
f W9 (w9 ) FW9 (w9 ) Coefficient
Mathematical Mean: 0.85165
Geometrical Mean : 0.67049
Harmonic Mean : 0.47700
Variance : 0.36970
S.D. : 0.60803
Skewed Coef. : 2.07121
Kurtosis Coef. : 11.03935
MAD : 0.43846
Range : 11.64236
Mid_range : 5.82128
Median : 0.71054
Q1 : 0.44224
Q2 : 0.71054
Q3 : 1.09340
IQR : 0.65116
C.V. : 0.71394
(12) W11 = Cochran test statistic
f W11 (w11 ) FW11 (w11 ) Coefficient
Mathematical Mean: 0.48296
Geometrical Mean : 0.46517
Harmonic Mean : 0.44840
Variance : 0.01814
S.D. : 0.13470
Skewed Coef. : 0.69154
Kurtosis Coef. : 2.96671
MAD : 0.10908
Range : 0.78679
Mid_range : 0.59564
Median : 0.45922
Q1 : 0.37988
Q2 : 0.45922
Q3 : 0.56682
IQR : 0.18694
C.V. : 0.27891
appendix 10.2) k=5, n=100,
(1) $W_1 = \dfrac{SST}{\sigma^2} = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$, degree of freedom=499,
(2) $W_2 = \dfrac{SSTr}{\sigma^2}$, degree of freedom=4,
f W2 (w2 ) FW2 (w2 ) Coefficient
Mathematical Mean: 4.00021
Geometrical Mean : 3.04203
Harmonic Mean : 1.98723
Variance : 8.19720
S.D. : 2.86308
Skewed Coef. : 1.48120
Kurtosis Coef. : 6.43547
MAD : 2.18052
Range : 51.46479
Mid_range : 25.73259
Median : 3.34000
Q1 : 1.90970
Q2 : 3.34000
Q3 : 5.37507
IQR : 3.46536
C.V. : 0.71573
$P\{|F_{W_2}(w_2) - \chi^2_4\,df(w_2)| \ge \varepsilon\}$,
ε probability ε probability
0.1000 0.000000 0.0010 0.718710
0.0500 0.000000 0.0005 0.863821
0.0100 0.000000 0.0001 0.978192
0.0050 0.000000
$W_2$ approaches $\chi^2_4$, the chi-square distribution with df=4,
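(3) $W_3 = \dfrac{SSE}{\sigma^2} = \sum_{i=1}^{n} (\hat{\varepsilon}_i)^2$, degree of freedom=495,
$E\left[\left(F_{W_3}\left(\dfrac{w_3 - E(W_3)}{\sqrt{Var(W_3)}}\right) - \Phi\left(\dfrac{w_3 - E(W_3)}{\sqrt{Var(W_3)}}\right)\right)^2\right] = 0.0002932969$,
$P\left\{\left|F_{W_3}\left(\dfrac{w_3 - E(W_3)}{\sqrt{Var(W_3)}}\right) - \Phi\left(\dfrac{w_3 - E(W_3)}{\sqrt{Var(W_3)}}\right)\right| \ge \varepsilon\right\}$,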
ε probability ε probability
0.1000 0.000000 0.0010 0.969578
0.0500 0.000000 0.0005 0.984778
0.0100 0.643253 0.0001 0.996895
0.0050 0.842916
$\dfrac{W_3 - E(W_3)}{\sqrt{Var(W_3)}}$ does not approach the standard normal distribution,
C.V. : 0.70762
$E\left[(W_4 - F(4,495)(w_4))^2\right] = 0.0000191233$, $E\left[(W_4 - F(4,495)\,df(w_4))^2\right] = 0.0000015378$,
$P\{|W_4 - F(4,495)\,df(w_4)| \ge \varepsilon\}$,
ε probability ε probability
0.1000 0.000000 0.0010 0.506144
0.0500 0.000000 0.0005 0.807838
0.0100 0.000000 0.0001 0.958244
0.0050 0.000000
$W_4 = MSTr/MSE$ approaches the $F_{4,495}$ distribution,
the right side probability
0.995 0.99 0.975 0.95 0.9
W4 0.052135 0.074801 0.121940 0.178936 0.267685
$E\left[(w_5 - t_{495}(w_5))^2\right] = 0.0000005357$, $E\left[(F_{W_5}(w_5) - t_{495}\,df(w_5))^2\right] = 0.0000000231$,
$P\{|F_{W_5}(w_5) - t_{495}\,df(w_5)| \ge \varepsilon\}$,
ε probability ε probability
0.1000 0.000000 0.0010 0.000000
0.0500 0.000000 0.0005 0.000000
0.0100 0.000000 0.0001 0.611416
0.0050 0.000000
$W_5$ follows the $t_{495}$ distribution,
the right side probability
0.995 0.99 0.975 0.95 0.9
W5 -2.582447 -2.332560 -1.964141 -1.647945 -1.283977
t 495 -2.585516 -2.334550 -1.965193 -1.647786 -1.283195
(6) W6 = Bartlett’s test statistic,
f W6 (w6 ) FW6 (w6 ) Coefficient
Mathematical Mean: 14.93448
Geometrical Mean : 11.35319
Harmonic Mean : 7.41677
Variance : 115.57412
S.D. : 10.75054
Skewed Coef. : 1.55191
Kurtosis Coef. : 7.03217
MAD : 8.14539
Range : 205.06390
Mid_range : 102.53231
Median : 12.45426
Q1 : 7.13221
Q2 : 12.45426
Q3 : 20.02521
IQR : 12.89300
C.V. : 0.71985
$Var(W_6) = 115.57412 \ne 2 \times E(W_6) = 2 \times 14.93448$, so $W_6$ does not follow a chi-square distribution,
$E\left[(W_6 - \chi^2_4(w_6))^2\right] = 182.3594577579$, $E\left[(F_{W_6}(w_6) - \chi^2_4\,df(w_6))^2\right] = 0.1843295012$,
$P\{|F_{W_6}(w_6) - \chi^2_4\,df(w_6)| \ge \varepsilon\}$,
ε probability ε probability
0.1000 0.889506 0.0010 0.998921
0.0500 0.945236 0.0005 0.999461
0.0100 0.989155 0.0001 0.999891
0.0050 0.994588
$W_6$ does not approach $\chi^2_4$, the chi-square distribution with df=4,
the right side probability
0.995 0.99 0.975 0.95 0.9
W6 0.767567 1.102412 1.796724 2.637296 3.946318
(7) $W_7 = Max(S_1^2, S_2^2, \ldots, S_k^2)/SSE$
f W7 (w7 ) FW7 (w7 ) Coefficient
Mathematical Mean: 0.00272
Geometrical Mean : 0.00270
Harmonic Mean : 0.00269
Variance : 0.00000
S.D. : 0.00035
Skewed Coef. : 1.11002
Kurtosis Coef. : 4.89462
MAD : 0.00027
Range : 0.00425
Mid_range : 0.00415
Median : 0.00266
Q1 : 0.00247
Q2 : 0.00266
Q3 : 0.00291
IQR : 0.00043
C.V. : none
the right side probability
0.995 0.99 0.975 0.95 0.9
W7 0.0021572 0.0021850 0.0022325 0.0022802 0.0023438
the right side probability
0.1 0.05 0.025 0.01 0.005
W7 0.0031876 0.0033821 0.0035672 0.0038019 0.0039716
(8) W8 = Levene's test statistic,
f W8 (w8 ) FW8 (w8 ) Coefficient
Mathematical Mean: 1.72923
Geometrical Mean : 1.32043
Harmonic Mean : 0.86690
Variance : 1.50253
S.D. : 1.22578
Skewed Coef. : 1.45927
Kurtosis Coef. : 6.34870
MAD : 0.93505
Range : 23.22863
Mid_range : 11.61448
Median : 1.45064
Q1 : 0.83284
Q2 : 1.45064
Q3 : 2.32357
IQR : 1.49074
C.V. : 0.70886
$E\left[(W_8 - F(4,495)(w_8))^2\right] = 0.7874818145$, $E\left[(W_8 - F(4,495)\,df(w_8))^2\right] = 0.0450906919$,
$P\{|W_8 - F(4,495)\,df(w_8)| \ge \varepsilon\}$,
ε probability ε probability
0.1000 0.820088 0.0010 0.998496
0.0500 0.916875 0.0005 0.999251
0.0100 0.984502 0.0001 0.999850
0.0050 0.992365
$W_8$ does not approach the $F_{4,495}$ distribution,
the right side probability
0.995 0.99 0.975 0.95 0.9
W8 0.089959 0.129173 0.210553 0.308938 0.461700
the right side probability
0.1 0.05 0.025 0.01 0.005
W8 3.356857 4.098499 4.823204 5.770799 6.483476
(9) W9 = Brown-Forsythe test statistic
f W9 (w9 ) FW9 (w9 ) Coefficient
Mathematical Mean: 1.00058
Geometrical Mean : 0.76423
Harmonic Mean : 0.50179
Variance : 0.50162
S.D. : 0.70825
Skewed Coef. : 1.44756
Kurtosis Coef. : 6.27128
MAD : 0.54079
Range : 15.30359
Mid_range : 7.65191
Median : 0.83982
Q1 : 0.48203
Q2 : 0.83982
Q3 : 1.34521
IQR : 0.86318
C.V. : 0.70784
the right side probability
0.995 0.99 0.975 0.95 0.9
W9 0.052165 0.074807 0.121894 0.178716 0.267168
the right side probability
0.1 0.05 0.025 0.01 0.005
W9 1.942176 2.370283 2.788086 3.331875 3.741441
appendix 10.3) k=5, n=1000,
(1) $W_1 = \dfrac{SST}{\sigma^2} = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$, degree of freedom=4999,
(2) $W_2 = \dfrac{SSTr}{\sigma^2}$, degree of freedom=4,
f W2 (w2 ) FW2 (w2 ) Coefficient
Mathematical Mean: 3.99977
Geometrical Mean : 3.05141
Harmonic Mean : 1.99845
Variance : 8.01231
S.D. : 2.83060
Skewed Coef. : 1.42463
Kurtosis Coef. : 6.10617
MAD : 2.16609
Range : 42.26903
Mid_range : 21.13479
Median : 3.35626
Q1 : 1.92139
Q2 : 3.35626
Q3 : 5.38631
IQR : 3.46492
C.V. : 0.70769
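(3) $W_3 = \dfrac{SSE}{\sigma^2} = \sum_{i=1}^{n} (\hat{\varepsilon}_i)^2$, degree of freedom=4995,
$E\left[\left(F_{W_3}\left(\dfrac{w_3 - E(W_3)}{\sqrt{Var(W_3)}}\right) - \Phi\left(\dfrac{w_3 - E(W_3)}{\sqrt{Var(W_3)}}\right)\right)^2\right] = 0.0000291555$,
$P\left\{\left|F_{W_3}\left(\dfrac{w_3 - E(W_3)}{\sqrt{Var(W_3)}}\right) - \Phi\left(\dfrac{w_3 - E(W_3)}{\sqrt{Var(W_3)}}\right)\right| \ge \varepsilon\right\}$,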
ε probability ε probability
0.1000 0.000000 0.0010 0.903133
0.0500 0.000000 0.0005 0.954476
0.0100 0.000000 0.0001 0.991415
0.0050 0.430280
$\dfrac{W_3 - E(W_3)}{\sqrt{Var(W_3)}}$ does not approach the standard normal distribution.
C.V. : 0.70686
$E\left[(W_4 - F(4,4995)(w_4))^2\right] = 0.0000034609$,
$E\left[(W_4 - F(4,4995)\,df(w_4))^2\right] = 0.0000000413$,
the right side probability
0.995 0.99 0.975 0.95 0.9
W4 0.051818 0.074328 0.121122 0.177694 0.265910
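(5) $W_5 = \dfrac{\bar{X}_{1\bullet} - \bar{X}_{2\bullet} - (\alpha_1 - \alpha_2)}{S(\bar{X}_{1\bullet} - \bar{X}_{2\bullet})} = \dfrac{\bar{X}_{1\bullet} - \bar{X}_{2\bullet}}{S(\bar{X}_{1\bullet} - \bar{X}_{2\bullet})}$, $\alpha_1 = 0$, $\alpha_2 = 0$,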
f W5 (w5 ) FW5 (w5 ) Coefficient
Mathematical Mean: -0.00001
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.00029
S.D. : 1.00015
Skewed Coef. : 0.00023
Kurtosis Coef. : 2.99970
MAD : 0.79804
Range : 10.98179
Mid_range : -0.24466
Median : -0.00023
Q1 : -0.67475
Q2 : -0.00023
Q3 : 0.67450
IQR : 1.34926
C.V. : none
$E\left[(w_5 - t_{4995}(w_5))^2\right] = 0.0000006366$, $E\left[(F_{W_5}(w_5) - t_{4995}\,df(w_5))^2\right] = 0.0000000063$,
$P\{|F_{W_5}(w_5) - t_{4995}\,df(w_5)| \ge \varepsilon\}$,
ε probability ε probability
0.1000 0.000000 0.0010 0.000000
0.0500 0.000000 0.0005 0.000000
0.0100 0.000000 0.0001 0.197183
0.0050 0.000000
$W_5$ follows the $t_{4995}$ distribution and approaches the standard normal distribution.
the right side probability
0.995 0.99 0.975 0.95 0.9
W5 -2.575921 -2.326218 -1.959983 -1.644768 -1.281282
Z -2.575 -2.326 -1.96 -1.645 -1.28
(6) W6 = Bartlett’s test statistic,
f W6 (w6 ) FW6 (w6 ) Coefficient
Mathematical Mean: 15.88240
Geometrical Mean : 12.10275
Harmonic Mean : 7.92057
Variance : 127.42862
S.D. : 11.28843
Skewed Coef. : 1.44590
Kurtosis Coef. : 6.21748
MAD : 8.62151
Range : 157.25538
Mid_range : 78.62853
Median : 13.30024
Q1 : 7.61448
Q2 : 13.30024
Q3 : 21.36097
IQR : 13.74650
C.V. : 0.71075
$P\{|F_{W_6}(w_6) - \chi^2_4\,df(w_6)| \ge \varepsilon\}$,
ε probability ε probability
0.1000 0.890833 0.0010 0.998930
0.0500 0.945839 0.0005 0.999466
0.0100 0.989264 0.0001 0.999893
0.0050 0.994642
$W_6$ does not approach $\chi^2_4$, the chi-square distribution with df=4,
the right side probability
0.995 0.99 0.975 0.95 0.9
W6 0.817171 1.174583 1.916733 2.812830 3.953046
(7) $W_7 = Max(S_1^2, S_2^2, \ldots, S_k^2)/SSE$
f W7 (w7 ) FW7 (w7 ) Coefficient
Mathematical Mean: 0.00022
Geometrical Mean : 0.00022
Harmonic Mean : 0.00022
Variance : 0.00000
S.D. : 0.00001
Skewed Coef. : 0.88362
Kurtosis Coef. : 4.11040
MAD : 0.00001
Range : 0.00010
Mid_range : 0.00025
Median : 0.00022
Q1 : 0.00021
Q2 : 0.00022
Q3 : 0.00023
IQR : 0.00001
C.V. : none
the right side probability
0.995 0.99 0.975 0.95 0.9
W7 0.0002046 0.0002055 0.0002070 0.0002085 0.0002106
C.V. : 0.70730
$E\left[(W_8 - F(4,4995)(w_8))^2\right] = 0.7316761313$,
$E\left[(W_8 - F(4,4995)\,df(w_8))^2\right] = 0.0427728338$, $P\{|W_8 - F(4,4995)\,df(w_8)| \ge \varepsilon\}$,
ε probability ε probability
0.1000 0.814532 0.0010 0.998469
0.0500 0.914560 0.0005 0.999241
0.0100 0.984172 0.0001 0.999852
0.0050 0.992230
$W_8$ does not approach the $F_{4,4995}$ distribution,
the right side probability
0.995 0.99 0.975 0.95 0.9
W8 0.088340 0.126567 0.206201 0.302279 0.451914
(9) W9 = Brown-Forsythe test statistic
f W9 (w9 ) FW9 (w9 ) Coefficient
Mathematical Mean: 1.00005
Geometrical Mean : 0.76317
Harmonic Mean : 0.50056
Variance : 0.50085
S.D. : 0.70771
Skewed Coef. : 1.42257
Kurtosis Coef. : 6.04993
MAD : 0.54139
Range : 10.09545
Mid_range : 5.04782
Median : 0.83911
Q1 : 0.48065
Q2 : 0.83911
Q3 : 1.34567
IQR : 0.86502
C.V. : 0.70767
(11) W11 = Cochran test statistic
f W11 (w11 ) FW11 (w11 ) Coefficient
Mathematical Mean: 0.22143
Geometrical Mean : 0.22123
Harmonic Mean : 0.22102
Variance : 0.00009
S.D. : 0.00968
Skewed Coef. : 0.88362
Kurtosis Coef. : 4.11040
MAD : 0.00760
Range : 0.10422
Mid_range : 0.25226
Median : 0.21998
Q1 : 0.21438
Q2 : 0.21998
Q3 : 0.22695
IQR : 0.01257
C.V. : 0.04370
Appendix 11. The errors and residuals when the
distribution of the errors is shifted-exponential
(1) W1 = ε 1
f W1 (w1 ) FW1 (w1 ) Coefficient
Mathematical Mean: -0.00009
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.99967
S.D. : 0.99983
Skewed Coef. : 1.99759
Kurtosis Coef. : 8.97695
MAD : 0.73578
Range : 16.86253
Mid_range : 7.43126
Median : -0.30712
Q1 : -0.71236
Q2 : -0.30712
Q3 : 0.38622
IQR : 1.09858
C.V. : none
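The simulated coefficients above can be compared with theoretical values. The sketch below assumes, as the table suggests but the text does not state, that the shifted-exponential error is X - 1 with X exponential of rate 1, and that MAD means the mean absolute deviation about the mean. The function name is hypothetical.

```python
import math

def shifted_exponential_summary():
    """Theoretical coefficients of X - 1 when X is exponential with rate 1."""
    def quantile(p):
        # Invert F(w) = 1 - exp(-(w + 1)) for w >= -1.
        return -math.log(1.0 - p) - 1.0

    q1, q2, q3 = quantile(0.25), quantile(0.5), quantile(0.75)
    return {
        "Mean": 0.0,
        "Variance": 1.0,
        "Skewed Coef.": 2.0,
        "Kurtosis Coef.": 9.0,   # non-excess kurtosis
        "MAD": 2.0 / math.e,     # mean absolute deviation about the mean
        "Median": q2,
        "Q1": q1,
        "Q3": q3,
        "IQR": q3 - q1,
    }

for name, value in shifted_exponential_summary().items():
    print(f"{name:<15}: {value:.5f}")
```

Under these assumptions the theoretical median, quartiles, and IQR agree with the simulated table to about three decimals.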
(2) $W_{11} = \hat{\varepsilon}_1$,
f W11 (w11 ) FW11 (w11 ) Coefficient
Mathematical Mean: -0.00013
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.96244
S.D. : 0.98104
Skewed Coef. : 1.88515
Kurtosis Coef. : 8.54124
MAD : 0.72202
Range : 17.21643
Mid_range : 6.42728
Median : -0.27732
Q1 : -0.67318
Q2 : -0.27732
Q3 : 0.39038
IQR : 1.06356
C.V. : none
$Z(w_{11}) = \frac{w_{11} - E(W_{11})}{\sqrt{Var(W_{11})}}$,
f W11 (Z (w11 )), FW11 (Z (w11 )) Coefficient
Mathematical Mean: -0.00000
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.00000
S.D. : 1.00000
Skewed Coef. : 1.88515
Kurtosis Coef. : 8.54124
MAD : 0.73598
Range : 17.54916
Mid_range : 6.55163
Median : -0.28255
Q1 : -0.68606
Q2 : -0.28255
Q3 : 0.39805
IQR : 1.08412
C.V. : none
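The table above shows that standardizing sets the variance to 1 but leaves the skewness and kurtosis coefficients unchanged. A minimal sketch of that invariance follows, using a simulated shifted-exponential sample as a stand-in; the sample, the seed, and the function names are illustrative assumptions, not the authors' simulator.

```python
import random
import statistics

def skewness(xs):
    """Population skewness: mean of cubed standardized deviations."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

def standardize(xs):
    """Return (x - mean) / standard deviation for each x."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return [(x - m) / s for x in xs]

random.seed(0)
# Stand-in sample: exponential with rate 1, shifted to have mean 0.
sample = [random.expovariate(1.0) - 1.0 for _ in range(10_000)]
z = standardize(sample)

print("variance after standardizing:", round(statistics.pvariance(z), 5))
print("skewness before / after     :", skewness(sample), skewness(z))
```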
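$E\left[\left(F_{W_{11}}\left(\frac{w_{11} - E(W_{11})}{\sqrt{Var(W_{11})}}\right) - \Phi\left(\frac{w_{11} - E(W_{11})}{\sqrt{Var(W_{11})}}\right)\right)^2\right] = 0.0065250344$,
$P\left\{F_{W_{11}}\left(\frac{w_{11} - E(W_{11})}{\sqrt{Var(W_{11})}}\right) - \Phi\left(\frac{w_{11} - E(W_{11})}{\sqrt{Var(W_{11})}}\right) \ge \varepsilon\right\}$,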
ε probability ε probability
0.1000 0.321582 0.0010 0.992575
0.0500 0.644815 0.0005 0.996322
0.0100 0.925030 0.0001 0.999261
0.0050 0.962713
$\frac{W_{11} - E(W_{11})}{\sqrt{Var(W_{11})}}$ does not approach the standard normal distribution.
the right side probability
0.995 0.99 0.975 0.95 0.9
Z (W11 ) -1.300855 -1.230216 -1.12423 -1.030475 -0.915864
Z -2.576 -2.326 -1.96 -1.645 -1.28
W1 = ε 1 , W11 = εˆ1 ,
f W1 ,W11 (w1 , w11 ) FW1 ,W11 (w1 , w11 )
$P\left\{F_{W_{12}}\left(\frac{w_{12} - E(W_{12})}{\sqrt{Var(W_{12})}}\right) - \Phi\left(\frac{w_{12} - E(W_{12})}{\sqrt{Var(W_{12})}}\right) \ge \varepsilon\right\}$,
ε probability ε probability
0.1000 0.342103 0.0010 0.992670
0.0500 0.650120 0.0005 0.996354
0.0100 0.926324 0.0001 0.999270
0.0050 0.963381
$\frac{W_{12} - E(W_{12})}{\sqrt{Var(W_{12})}}$ does not follow the standard normal distribution.
the right side probability
0.995 0.99 0.975 0.95 0.9
Z (W12 ) -1.268817 -1.202948 -1.104417 -1.016864 -0.908898
Z -2.576 -2.326 -1.96 -1.645 -1.28
W1 = ε 1 , W12 = εˆ2 ,
f W1 ,W12 (w1 , w12 ) FW1 ,W12 (w1 , w12 )
W2 = ε 2 , W12 = εˆ2 ,
f W2 ,W12 (w2 , w12 ) FW2 ,W12 (w2 , w12 )
$Z(w_{13}) = \frac{w_{13} - E(W_{13})}{\sqrt{Var(W_{13})}}$,
f W13 (Z (w13 )), FW13 (Z (w13 )) Coefficient
Mathematical Mean: -0.00000
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.00000
S.D. : 1.00000
Skewed Coef. : 1.71577
Kurtosis Coef. : 8.00312
MAD : 0.73398
Range : 18.74473
Mid_range : 4.90773
Median : -0.25249
Q1 : -0.65488
Q2 : -0.25249
Q3 : 0.41027
IQR : 1.06514
C.V. : none
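$E\left[\left(F_{W_{13}}\left(\frac{w_{13} - E(W_{13})}{\sqrt{Var(W_{13})}}\right) - \Phi\left(\frac{w_{13} - E(W_{13})}{\sqrt{Var(W_{13})}}\right)\right)^2\right] = 0.0053123134$,
$P\left\{F_{W_{13}}\left(\frac{w_{13} - E(W_{13})}{\sqrt{Var(W_{13})}}\right) - \Phi\left(\frac{w_{13} - E(W_{13})}{\sqrt{Var(W_{13})}}\right) \ge \varepsilon\right\}$,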
ε probability ε probability
0.1000 0.224635 0.0010 0.992141
0.0500 0.615891 0.0005 0.996039
0.0100 0.920018 0.0001 0.999228
0.0050 0.960373
$\frac{W_{13} - E(W_{13})}{\sqrt{Var(W_{13})}}$ does not approach the standard normal distribution.
the right side probability
0.995 0.99 0.975 0.95 0.9
Z (W13 ) -1.657456 -1.496394 -1.282640 -1.116063 -0.938869
Z -2.576 -2.326 -1.96 -1.645 -1.28
the right side probability
0.1 0.05 0.025 0.01 0.005
Z (W13 ) 1.335149 1.947893 2.609660 3.486793 4.150587
Z 1.28 1.645 1.96 2.326 2.576
W1 = ε 1 , W13 = εˆ3 ,
f W1 ,W13 (w1 , w13 ) FW1 ,W13 (w1 , w13 )
W3 = ε 3 , W13 = εˆ3 ,
f W3 ,W13 (w3 , w13 ) FW3 ,W13 (w3 , w13 )
f W11 ,W12 (w11 , w12 ) , W11 = εˆ1 , W12 = εˆ2 , FW11 ,W12 (w11 , w12 )
f W11 ,W13 (w11 , w13 ) , W11 = εˆ1 , W13 = εˆ3 , FW11 ,W13 (w11 , w13 )
E(W11)= -0.0001, Var(W11)= 0.9624, E(W13)= -0.0001, Var(W13)= 0.9131,
Cov(W11,W13)= 0.0027, W11 and W13 correlation coefficient=0.0029.
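The reported correlation coefficient follows from the covariance and the two variances. A minimal sketch of that arithmetic, using the rounded values printed above (the function name is hypothetical):

```python
import math

def correlation(cov, var_a, var_b):
    """Correlation coefficient: Cov(A, B) / sqrt(Var(A) * Var(B))."""
    return cov / math.sqrt(var_a * var_b)

# Rounded inputs taken from the text: Cov(W11, W13), Var(W11), Var(W13).
r = correlation(0.0027, 0.9624, 0.9131)
print(round(r, 4))  # prints 0.0029
```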
f W12 ,W13 (w12 , w13 ) , W12 = εˆ2 , W13 = εˆ3 , FW12 ,W13 (w12 , w13 )
$Z(w_1) = \frac{\hat{\beta}_0 - E(\hat{\beta}_0)}{\sqrt{Var(\hat{\beta}_0)}}$,
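$E\left[\left(F_{W_1}\left(\frac{w_1 - E(W_1)}{\sqrt{Var(W_1)}}\right) - \Phi\left(\frac{w_1 - E(W_1)}{\sqrt{Var(W_1)}}\right)\right)^2\right] = 0.0012270612$,
$P\left\{F_{W_1}\left(\frac{w_1 - E(W_1)}{\sqrt{Var(W_1)}}\right) - \Phi\left(\frac{w_1 - E(W_1)}{\sqrt{Var(W_1)}}\right) \ge \varepsilon\right\}$,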
ε probability ε probability
0.1000 0.000000 0.0010 0.984610
0.0500 0.216585 0.0005 0.992397
0.0100 0.842099 0.0001 0.998467
0.0050 0.923134
$\frac{W_1 - E(W_1)}{\sqrt{Var(W_1)}} = \frac{\hat{\beta}_0 - E(\hat{\beta}_0)}{\sqrt{Var(\hat{\beta}_0)}}$ does not approach the standard normal distribution.
(6) W2 = β̂1
f W2 (w2 ) FW2 (w2 ) Coefficient
Mathematical Mean: 1.00006
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.80847
S.D. : 1.34479
Skewed Coef. : 1.03401
Kurtosis Coef. : 5.38857
MAD : 1.02249
Range : 24.45152
Mid_range : 7.82075
Median : 0.81303
Q1 : 0.08878
Q2 : 0.81303
Q3 : 1.69951
IQR : 1.61073
C.V. : 1.34471
$Z(w_2) = \frac{\hat{\beta}_1 - E(\hat{\beta}_1)}{\sqrt{Var(\hat{\beta}_1)}}$,
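$E\left[\left(F_{W_2}\left(\frac{w_2 - E(W_2)}{\sqrt{Var(W_2)}}\right) - \Phi\left(\frac{w_2 - E(W_2)}{\sqrt{Var(W_2)}}\right)\right)^2\right] = 0.0015675226$,
$P\left\{F_{W_2}\left(\frac{w_2 - E(W_2)}{\sqrt{Var(W_2)}}\right) - \Phi\left(\frac{w_2 - E(W_2)}{\sqrt{Var(W_2)}}\right) \ge \varepsilon\right\}$,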
ε probability ε probability
0.1000 0.000000 0.0010 0.986468
0.0500 0.291121 0.0005 0.993190
0.0100 0.860482 0.0001 0.998638
0.0050 0.931600
$\frac{W_2 - E(W_2)}{\sqrt{Var(W_2)}} = \frac{\hat{\beta}_1 - E(\hat{\beta}_1)}{\sqrt{Var(\hat{\beta}_1)}}$ does not approach the standard normal distribution.
f W1 ,W2 (w1 , w2 ) , W1 = β̂ 0 , W2 = β̂1 , FW1 ,W2 (w1 , w2 )
(6) $W_3 = \frac{SST}{\sigma^2} = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$, SST is calculated when $\beta_1 = 1$,
(7) $W_4 = \frac{SSR}{\sigma^2} = \hat{\beta}_1^2 \sum_{i=1}^{n} (X_i - \bar{X})^2$, SSR is calculated when $\beta_1 = 1$,
Var(W4) = 9.68168 ≠ 2 × E(W4) = 2 × 1.55271, so $SSR/\sigma^2$ does not follow a chi-square distribution.
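The comparison above relies on the fact that a chi-square variable with k degrees of freedom has mean k and variance 2k, so its variance must equal twice its mean. A minimal sketch of that moment check with the simulated values of W4 (the function name is hypothetical):

```python
def chi_square_moment_gap(mean, variance):
    """Return variance - 2 * mean, which is 0 for any chi-square distribution."""
    return variance - 2.0 * mean

# Simulated moments of W4 = SSR / sigma^2 quoted in the text.
gap = chi_square_moment_gap(mean=1.55271, variance=9.68168)
print(round(gap, 5))  # prints 6.57626
```

A gap this far from zero is what rules out the chi-square distribution here.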
(8) $W_5 = \frac{SSE}{\sigma^2} = \sum_{i=1}^{n} (\hat{\varepsilon}_i)^2$, SSE is calculated when $\beta_1 = 1$,
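The three sums of squares used in W3, W4, and W5 can be computed directly from a sample. The sketch below uses ordinary least squares for a simple linear regression, before any division by σ²; the function name and the toy data are illustrative assumptions.

```python
from statistics import fmean

def regression_sums_of_squares(x, y):
    """Return (SST, SSR, SSE) for the least-squares line y = b0 + b1 * x."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                 # slope estimate
    b0 = y_bar - b1 * x_bar        # intercept estimate
    sst = sum((yi - y_bar) ** 2 for yi in y)
    ssr = b1 ** 2 * sxx
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return sst, ssr, sse

sst, ssr, sse = regression_sums_of_squares([0, 1, 2, 3], [1, 3, 2, 4])
print(sst, ssr, sse)
```

For this toy data SST equals SSR + SSE up to rounding, which is the usual decomposition.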
$E[(F_{W_{10}}(w_{10}) - t_{38\,df}(w_{10}))^2] = 0.0011480745$,
(12) $W_{11} = \frac{\hat{\beta}_1 - \beta_1}{S(\hat{\beta}_1)} = \frac{\hat{\beta}_1 - 1}{S(\hat{\beta}_1)}$,
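The slope statistic can be sketched as follows. This assumes that $S(\hat{\beta}_1)$ is the usual least-squares standard error, the square root of MSE divided by the sum of squared deviations of X, with MSE = SSE / (n - 2); the text does not spell this out at this point. The function name and the toy data are hypothetical. With n - 2 = 38 degrees of freedom, as in the text, n would be 40.

```python
import math
from statistics import fmean

def slope_t_statistic(x, y, beta1_null=1.0):
    """Return (b1 - beta1_null) / S(b1) for simple linear regression."""
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)             # residual variance estimate
    se_b1 = math.sqrt(mse / sxx)    # assumed form of S(b1)
    return (b1 - beta1_null) / se_b1

t = slope_t_statistic([0, 1, 2, 3], [1, 3, 2, 4], beta1_null=1.0)
print(round(t, 6))
```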
$E[(F_{W_{11}}(w_{11}) - t_{38\,df}(w_{11}))^2] = 0.0014880275$,
t 38 -2.712425 -2.429447 -2.024893 -1.686300 -1.304611
Appendix 12. The critical values from two population
means test of arcsin and semi-circle
Critical value tables for the test statistic of two independent populations.
One population follows the Arcsin distribution with population mean $\mu_1$ and
population variance $\sigma_1^2$; the other follows the Semi-circle distribution with
population mean $\mu_2$ and population variance $\sigma_2^2$. Both samples have size n.
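$\bar{X}_1 = \frac{\sum_{i=1}^{n} X_{1i}}{n}$, $\bar{X}_2 = \frac{\sum_{j=1}^{n} X_{2j}}{n}$, $S_1^2 = \frac{\sum_{i=1}^{n} (X_{1i} - \bar{X}_1)^2}{n-1}$, $S_2^2 = \frac{\sum_{j=1}^{n} (X_{2j} - \bar{X}_2)^2}{n-1}$,
$\sigma_1^2 = \sigma_2^2 = \sigma^2$, $S_{pool}^2 = \frac{\sum_{i=1}^{n} (X_{1i} - \bar{X}_1)^2 + \sum_{j=1}^{n} (X_{2j} - \bar{X}_2)^2}{n+n-2}$,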
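The pooled variance defined above can be computed as in the following minimal sketch. The function names and the toy samples are illustrative; the two samples here have equal size n, as in the text.

```python
from statistics import fmean

def sum_sq_dev(xs):
    """Sum of squared deviations from the sample mean."""
    m = fmean(xs)
    return sum((x - m) ** 2 for x in xs)

def pooled_variance(x1, x2):
    """Pooled sample variance under the assumption of equal population variances."""
    return (sum_sq_dev(x1) + sum_sq_dev(x2)) / (len(x1) + len(x2) - 2)

print(pooled_variance([1, 2, 3], [2, 4, 6]))  # prints 2.5
```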
20 24.338859 25.577754 27.432793 29.048569 30.941318
25 32.680777 34.079478 36.167621 37.987511 40.117767
30 41.208021 42.744839 45.037098 47.042741 49.384924
40 58.557826 60.347179 63.015616 63.015615 68.061082
50 76.197494 78.232165 81.220576 83.834295 86.890946
60 94.124526 96.344728 99.638021 102.502531 105.843218
70 112.159900 114.562806 118.159471 121.264451 124.880088
80 130.223667 132.860676 136.724074 140.081709 143.979349
90 148.620290 151.385898 155.472251 159.013461 163.133853
100 167.043767 169.915952 174.239270 177.994473 182.336885
500 928.08501 934.90646 944.68181 953.20538 963.00543
1000 1898.95425 1908.62552 1922.40088 1934.51759 1948.29727
α
n 0.9 0.95 0.975 0.99 0.995
10 23.169775 24.714741 26.074063 27.665324 28.754879
15 34.273278 36.134509 37.760022 39.670647 21.873146
20 45.195865 47.317295 49.185332 51.369164 52.867983
25 56.015660 58.375919 60.434481 62.855678 64.513822
30 66.770287 69.344411 71.594274 74.223254 76.026592
40 88.079901 91.021454 93.60060 96.623616 98.698884
50 109.234755 112.517459 115.387940 118.721108 121.007592
60 130.301510 133.873324 136.993278 140.644338 143.128406
70 151.262203 155.106560 162.416367 162.416368 165.142292
80 172.173458 176.277876 179.847272 184.061336 186.911357
90 193.014321 197.368240 201.161451 205.591631 208.633962
100 213.815328 218.384264 222.394516 227.040805 230.198807
500 1033.23203 1043.26178 1051.97347 1061.99940 1068.91952
1000 2047.79392 2061.82921 2074.05669 2088.33990 2098.14225
1000 0.905672 0.914159 0.927203 0.938573 0.951877
α
n 0.9 0.95 0.975 0.99 0.995
10 1.898740 2.329106 2.817968 3.577266 4.254482
15 1.622038 1.882642 2.157653 2.553222 2.879902
20 1.496042 1.689531 1.886587 2.157596 2.373311
25 1.422024 1.579803 1.736509 1.946275 2.110210
30 1.372245 1.507056 1.638579 1.812095 1.945284
40 1.308066 1.415256 1.517819 1.649969 1.749932
50 1.267709 1.358526 1.444584 1.553117 1.633779
60 1.239547 1.319421 1.394039 1.488162 1.557474
70 1.217785 1.289998 1.356854 1.440634 1.501634
80 1.201597 1.267759 1.328695 1.404305 1.459434
90 1.188455 1.249250 1.305641 1.375237 1.425492
100 1.177207 1.234306 1.28686 1.351521 1.397114
500 1.073519 1.095503 1.115276 1.138159 1.154078
1000 1.051494 1.066640 1.079930 1.095475 1.106027
Appendix 13. The critical values of Zr statistic
The critical value table of the Zr test statistic.
The 1st population is the Double exponential distribution with population mean $\mu_{X_1}$ and
population variance $\sigma_{X_1}^2$: $X_1 \sim \text{Double exponential}\left(\lambda_{X_1} = \frac{\sqrt{2}}{\sigma_{X_1}}, \mu_{X_1}\right)$.
The 2nd population is $X_2$, with $X_2 \mid x_1 \sim \text{Double exponential}\left(\lambda_{X_2} = \frac{\sqrt{2}}{\sqrt{\sigma_{X_2}^2 - \rho^2 \sigma_{X_1}^2}}, \mu_{X_2} = x_1\right)$,
population mean $\mu_{X_2}$ and population variance $\sigma_{X_2}^2$.
The two populations are dependent with $\rho = 0.5$, and n paired samples are simulated.
$H_0: \rho(X_1, X_2) = \rho_0 = 0.5$,
$Z_r = \frac{1}{2} \ln \frac{1+r}{1-r}$, $Z_{\rho_0} = \frac{1}{2} \ln \frac{1+\rho_0}{1-\rho_0}$,
Z test statistic (for $n > 10$) $= \frac{Z_r - Z_{\rho_0}}{\sqrt{1/(n-3)}} = \frac{Z_r - Z_{0.70710678118}}{\sqrt{1/17}} = W_9$,
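$r = \frac{\sum_{i=1}^{n} (X_{1i} - \bar{X}_1)(X_{2i} - \bar{X}_2)}{\sqrt{\sum_{i=1}^{n} (X_{1i} - \bar{X}_1)^2 \sum_{i=1}^{n} (X_{2i} - \bar{X}_2)^2}}$, $\bar{X}_1 = \frac{\sum_{i=1}^{n} X_{1i}}{n}$, $\bar{X}_2 = \frac{\sum_{i=1}^{n} X_{2i}}{n}$,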
$Z_r = \frac{1}{2} \ln \frac{1+r}{1-r}$ approaches the standard normal distribution when $n > 10$.
W9 does not have a symmetric distribution; the critical value $W_{9,1-\alpha}$ satisfies $P(W_9 \le W_{9,1-\alpha}) = \alpha$.
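The Zr statistic defined above can be sketched in a few lines. The function names and the toy data are hypothetical, and the example uses the hypothesized value ρ0 = 0.5 from H0 and the standard error sqrt(1/(n - 3)). It shows only how the statistic is computed, not how the critical values below were simulated.

```python
import math
from statistics import fmean

def pearson_r(x1, x2):
    """Sample correlation coefficient r of paired observations."""
    m1, m2 = fmean(x1), fmean(x2)
    sxy = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    sxx = sum((a - m1) ** 2 for a in x1)
    syy = sum((b - m2) ** 2 for b in x2)
    return sxy / math.sqrt(sxx * syy)

def fisher_z(r):
    """Fisher transformation: 0.5 * ln((1 + r) / (1 - r)), for -1 < r < 1."""
    return 0.5 * math.log((1.0 + r) / (1.0 - r))

def zr_statistic(r, rho0, n):
    """(Z_r - Z_rho0) / sqrt(1 / (n - 3)), requires n > 3."""
    return (fisher_z(r) - fisher_z(rho0)) * math.sqrt(n - 3)

r = pearson_r([1, 2, 3], [1, 3, 2])
print(r, fisher_z(r), zr_statistic(r, rho0=0.5, n=20))
```

The resulting value would be compared with the simulated critical values for the matching n rather than with standard normal quantiles.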
(1)n=5,
α 0.005 0.01 0.025 0.05 0.1
Critical value -2.661393 -2.316441 -1.846836 -1.474369 -1.073827
α 0.9 0.95 0.975 0.99 0.995
Critical value 1.561924 1.984510 2.372679 2.855005 3.204614
(2)n=10,
α 0.005 0.01 0.025 0.05 0.1
Critical value -2.888845 -2.572160 -2.119369 -1.742329 -1.317886
α 0.9 0.95 0.975 0.99 0.995
Critical value 1.698560 2.160881 2.572568 3.064138 3.408551
(3)n=15,
α 0.005 0.01 0.025 0.05 0.1
Critical value -2.978682 -2.665197 -2.214938 -1.834618 -1.401475
α 0.9 0.95 0.975 0.99 0.995
Critical value 1.723679 2.195487 2.613863 3.111734 3.456902
(4)n=20,
α 0.005 0.01 0.025 0.05 0.1
Critical value -3.034397 -2.722552 -2.271044 -1.886826 -1.447965
α 0.9 0.95 0.975 0.99 0.995
Critical value 1.732903 2.210861 2.632875 3.133993 3.479965
(5)n=25,
α 0.005 0.01 0.025 0.05 0.1
Critical value -3.074671 -2.763184 -2.309700 -1.923198 -1.479528
α 0.9 0.95 0.975 0.99 0.995
Critical value 1.774830 2.217604 2.640761 3.141121 3.487382
(6)n=30,
α 0.005 0.01 0.025 0.05 0.1
Critical value -3.101670 -2.791216 -2.337317 -1.949068 -1.501659
α 0.9 0.95 0.975 0.99 0.995
Critical value 1.740043 2.222071 2.647542 3.150122 3.497317
(7)n=35,
α 0.005 0.01 0.025 0.05 0.1
Critical value -3.125688 -2.814323 -2.358315 -1.967575 -1.517606
α 0.9 0.95 0.975 0.99 0.995
Critical value 1.779227 2.224297 2.649059 3.150684 3.496449
(8)n=40,
α 0.005 0.01 0.025 0.05 0.1
Critical value -3.146775 -2.833434 -2.376144 -1.984586 -1.532513
α 0.9 0.95 0.975 0.99 0.995
Critical value 1.739866 2.224165 2.648785 3.150423 3.495764
(9)n=50,
α 0.005 0.01 0.025 0.05 0.1
Critical value -3.176120 -2.863397 -2.404082 -2.009037 -1.553884
α 0.9 0.95 0.975 0.99 0.995
Critical value 1.741854 2.227245 2.652193 3.151441 3.497727
(10)n=60,
α 0.005 0.01 0.025 0.05 0.1
Critical value -3.195373 -2.881483 -2.420810 -2.024091 -1.565718
α 0.9 0.95 0.975 0.99 0.995
Critical value 1.740656 2.226746 2.653092 3.153596 3.497734
(11)n=100,
α 0.005 0.01 0.025 0.05 0.1
Critical value -3.248393 -2.931687 -2.466615 -2.065029 -1.601086
α 0.9 0.95 0.975 0.99 0.995
Critical value 1.737634 2.223860 2.651814 3.151870 3.495316