Big Data Analysis

Big Data
Analysis Method
- Mathematical Method -
Curve-fitting, Curve-linear,
Non-linear Model, Linear Model,
Probability Theory, Simulator
It is not statistical analysis
Author
Kuan-Sian Wang
Mei-Yu Lee
2015/6/15
Announcement
This is a free book, but all rights are reserved.
Big data analysis is an important method that is applied in most fields of our
world. We want to share the results of our research so far with everyone who is
interested. It is our honor to contribute to academic research on big data, and we
hope that sharing our results freely with the whole world will help introduce more
correct analytical methods in the future.
Contents
Preface
Chapter 1. Basic analysis method
1.1. The frequency distribution table cannot analyze big data
1.2. Assuming the population is a normal distribution is not a good idea
1.3. Hypothesis testing is not an analysis method for big data
Chapter 2. The population distribution test and the population mean and variance test
2.1. The population distribution test
2.2. One population mean and population variance test
2.3. Two independent population means and population variances test
2.4. Two dependent population means and population variances test
Chapter 3. The population proportion test
3.1. One population proportion test
3.2. Two independent population proportions test
Chapter 4. One way analysis
4.1. One way model
4.2. $\alpha_i = 0$, $i = 1, 2, \ldots, k$
4.3. $\alpha_i \neq 0$, $i = 1, 2, \ldots, k$
4.4. $\alpha_i \neq 0$, $i = 1, 2, \ldots, k$, and the error distribution is the Arcsin distribution
4.5. $\alpha_i \neq 0$, $i = 1, 2, \ldots, k$, and the error distribution of each category has a specific probability distribution
4.6. $\alpha_i = 0$, $i = 1, 2, \ldots, k$, and the error distribution of each category has a specific probability distribution
4.7. $\alpha_i = 0$, $i = 1, 2, \ldots, k$
Chapter 5. Simple linear model
5.1. Simple linear analysis
5.2. The parabola model analysis; the three basic assumptions are unchanged
5.3. Comparison of a Normal-distributed independent variable with an Arcsin-distributed independent variable; the three basic assumptions are unchanged
5.4. The error probability distribution is not a normal distribution; the other basic assumptions are unchanged
5.5. The error variances are not equal; the other basic assumptions are unchanged
5.6. The independent variable has a shifted exponential distribution and the model is non-linear; the three basic assumptions are unchanged
5.7. The range of the random variable is restricted to a specific region; the three basic assumptions are unchanged
5.8. The 3rd basic assumption is modified: the error follows the Durbin-Watson first-order autoregressive error model
Chapter 6. The general linear model and non-linear model
6.1. Multiple regression analysis
6.2. High collinearity; the other assumptions are unchanged
6.3. The probability distributions of the independent variables and the error are not normal distributions; the other assumptions are unchanged
6.4. Non-linear model; the other assumptions are unchanged
6.5. Non-linear model where the independent variable is a sample statistic; the other assumptions are unchanged
6.6. A dummy variable is one of the independent variables; the other assumptions are unchanged
6.7. An endogenous variable in the linear model; the other assumptions are unchanged
Chapter 7. Multi-variate analysis using linear model
Appendix 1. The common probability distributions
Appendix 2. The Curve-linear of linear model analysis
Appendix 3. The mathematical formula of Non-linear model analysis
Appendix 4. The limiting theory of the cumulative probability distribution function
Appendix 5. An application of Dow Jones
Appendix 6. The estimation of Cos model analysis
Appendix 7. The population of Logistic distribution
Appendix 8. The critical values of Logistic distribution
Appendix 9. The transformation of probability distribution by the simulator
Appendix 10. One way analysis when the error distribution is arcsin
Appendix 11. The errors and residuals when the distribution of the errors is shifted-exponential
Appendix 12. The critical values from the two population means test of arcsin and semi-circle
Appendix 13. The critical values of the Zr statistic
Preface
Big data is population data, and the analytical method for it belongs to the
mathematical methods. The amount of data is huge, so it is very hard to obtain the
characteristics of big data. Before big data analysis can be carried out, the
computer software must have the following functions:
(1) The curve-fitting method: it can formulate the pattern of big data.
(2) The probability distribution transformation simulator: it can generate any kind of
probability distribution and carry out transformations of probability distributions.
(3) SLLN software: it can analyze the central limit theorem and the law of large
numbers.
(4) The curve-linear method: it can find the relationship between two random variables,
one of which is a mathematical combination of many variables.
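As a rough illustration of function (1), the sketch below fits a cubic polynomial to the empirical cumulative distribution function of a simulated Normal(0, 2^2) sample on one probability band, and reports a coefficient of determination. It is not the authors' software: the names `solve_linear` and `fit_cdf_segment`, the band 0.1 to 0.2, the cubic degree, and the sample size are all assumptions chosen for the example.

```python
import random


def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_cdf_segment(sample, p_low, p_high, degree=3):
    """Least-squares polynomial in (x - center) fitted to the empirical CDF
    on the band p_low <= F <= p_high. Returns (center, coeffs, r_squared)."""
    xs = sorted(sample)
    n = len(xs)
    # Empirical CDF value (i + 1) / n at each sorted observation inside the band.
    pts = [(x, (i + 1) / n) for i, x in enumerate(xs)
           if p_low <= (i + 1) / n <= p_high]
    center = sum(x for x, _ in pts) / len(pts)
    k = degree + 1
    ata = [[0.0] * k for _ in range(k)]  # normal equations A^T A
    atb = [0.0] * k                      # and A^T b
    for x, f in pts:
        powers = [(x - center) ** j for j in range(k)]
        for r in range(k):
            atb[r] += powers[r] * f
            for c in range(k):
                ata[r][c] += powers[r] * powers[c]
    coeffs = solve_linear(ata, atb)  # coeffs[j] multiplies (x - center) ** j

    def predict(x):
        return sum(cj * (x - center) ** j for j, cj in enumerate(coeffs))

    mean_f = sum(f for _, f in pts) / len(pts)
    ss_res = sum((f - predict(x)) ** 2 for x, f in pts)
    ss_tot = sum((f - mean_f) ** 2 for _, f in pts)
    return center, coeffs, 1.0 - ss_res / ss_tot


rng = random.Random(0)
sample = [rng.gauss(0.0, 2.0) for _ in range(100_000)]
center, coeffs, r2 = fit_cdf_segment(sample, 0.1, 0.2)
print("center:", center)
print("coefficients (constant term first):", coeffs)
print("coefficient of determination:", r2)
```

Fitting one band at a time keeps the polynomial degree low; whether the authors' curve-fitting method chooses its bands and centers this way is not stated in the text.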
At present, statistical analysis is always the tool used for big data; however, this
is an incorrect approach. Statistics uses part of the data of a population to
infer the characteristics of that population. Big data, however, is not a part of the
population data but the population itself, so statistical analysis is not the true
analysis tool for big data.
To make it easy to understand, this book follows the order of chapters and methods
of a Statistics textbook. There are 36 examples with which readers can study the
difference between statistical analysis and big data analysis. Readers can use the
output figures to understand big data analysis skills.
Statistical analysis methods and theory cannot analyze big data; in
particular, the sampling distribution of a test statistic cannot be obtained if the
population is not a normal distribution. Of course, the critical values of the test
statistic are always a problem to calculate. The result of a hypothesis test does not
answer to reality. Indeed, small sample data can be analyzed by statistical analysis,
which gives us information about the assumed population distribution. Statistical
analysis is not suitable when the population is big data.
Big data analysis belongs to the analysis methods of probability
distributions. The following courses are necessary to understand the process of
big data analysis:
1) probability theory, 2) advanced calculus, 3) matrices, 4) mathematical statistics,
5) linear models. The big data analysis method is not as easy as statistical analysis,
and the process is also not easy to follow. In general, an accurate analysis method
always relies on mathematical methods.
The computer software was designed and coded by the author, and includes a statistical
analysis package, a probability distribution transformation simulator, the sampling
distributions of test statistics and residuals, and the sampling distributions of the
Durbin-Watson test and the LM test. This software can run and analyze both small
sample data and big data.
The contents include 36 examples as follows.
Chapter 1 Basic analysis method
Section 1 The frequency distribution table cannot analyze big data
Example 1, $X_1 \sim \mathrm{Normal}(\mu_{X_1} = 0, \sigma_{X_1}^2 = 2^2)$.
Section 2 Assuming the population is a normal distribution is not a good idea.
Example 2, The population is a shifted exponential distribution,
$X \sim \mathrm{Shifted\_exponential}(\lambda_X, c_X)$; the sample mean and the sample variance.
Section 3 Hypothesis testing is not an analysis method for big data
Example 3, $X_1 \sim \mathrm{Normal}(\mu_{X_1} = 100, \sigma_{X_1}^2 = 10^2)$; a sample of size
n = 500,000,000 is simulated; hypothesis testing.
Chapter 2 The population distribution test and the population mean and variance test
Section 1 The population distribution test
Example 4, Population is Normal(0,1), n = 100, goodness of fit test.
Example 5, Population is U_quadratic(0,1) + U_quadratic(0,1); sample data of size
100,000,000 are simulated; the curve-fitting method.
Section 2 One population mean and population variance test
Example 6, Population is the Logistic distribution, population mean = 100,
population variance = 4; 100 samples are simulated.
Section 3 Two independent population means and population variances test
Example 7, 1st population is the Arcsin distribution, population mean = 100, population
variance = 25; 50 samples are simulated.
2nd population is the Semi-circle distribution, population variance = 25; 50 samples
are simulated.
The two populations are independent.
Example 8, 1st population is the Arcsin distribution, population mean = 100, population
variance = 25; 60,000,000 samples are simulated.
2nd population is the Semi-circle distribution, population mean = 100, population
variance = 25; 60,000,000 samples are simulated.
Let $X_1$ be the data set of the 1st population and $X_2$ the data set of the 2nd
population; both sample sizes are big data.
Example 9, 1st population is the Normal distribution, 2nd population is the Normal
distribution, population mean = 100, population variance = 9; 15 samples are simulated.
Section 4 Two dependent population means and population variances test
Example 10, 1st population is the Double exponential distribution, population
mean = 100, population variance = 8,
$X_1 \sim \mathrm{Double\ exponential}(\lambda_{X_1} = 0.5, \mu_{X_1} = 100)$,
2nd population is $X_2$, $X_2 \mid x_1 \sim \mathrm{Double\ exponential}(\lambda_{X_2} = 0.5, \mu_{X_2} = x_1)$.
The two populations are dependent; 20 paired samples are simulated.
Example 11, 1st population is the Double exponential distribution, population
mean = 100, population variance = 8,
$X_1 \sim \mathrm{Double\ exponential}(\lambda_{X_1} = 0.5, \mu_{X_1} = 100)$,
2nd population is $X_2$, $X_2 \mid x_1 \sim \mathrm{Double\ exponential}(\lambda_{X_2} = 0.5, \mu_{X_2} = x_1)$.
The two populations are dependent; 60,000,000 paired samples are simulated.
Chapter 3 The population proportion test

Section 1 One population proportion test
Example 12, The population is $B(1, p = 0.5)$ and n samples are simulated; the
sum of the sample is $B(n, p = 0.5)$,
sample proportion $\hat{p} = \dfrac{X}{n}$, $X \sim B(n, p = 0.5)$, $x = 0, 1, \ldots, n$.
Example 13, The population is $B(1, p_0)$ and n samples are simulated; the sum
of the sample is $B(n, p_0)$,
sample proportion $\hat{p} = \dfrac{X}{n}$, $X \sim B(n, p_0)$, $x = 0, 1, \ldots, n$,
$H_0: p = p_0$, test statistic $= \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}}$,
confidence interval formula $= \dfrac{\hat{p} - p}{\sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}}$.
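The test statistic and the confidence interval pivot of Example 13 can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and the default multiplier z = 1.96 assumes a standard normal reference distribution, which is a conventional choice and not something the example list specifies.

```python
import math
import random


def one_proportion_z(x, n, p0):
    """Test statistic (p_hat - p0) / sqrt(p0 (1 - p0) / n) for H0: p = p0."""
    p_hat = x / n
    return (p_hat - p0) / math.sqrt(p0 * (1.0 - p0) / n)


def wald_interval(x, n, z=1.96):
    """Interval obtained by inverting (p_hat - p) / sqrt(p_hat (1 - p_hat) / n).
    z = 1.96 is an assumed normal critical value."""
    p_hat = x / n
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Simulate n draws from B(1, p0); their sum X is one draw from B(n, p0).
rng = random.Random(1)
n, p0 = 10_000, 0.5
x = sum(1 for _ in range(n) if rng.random() < p0)
print("X =", x, "p_hat =", x / n)
print("test statistic:", one_proportion_z(x, n, p0))
print("interval:", wald_interval(x, n))
```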
Example 14, The population is $B(1, p)$; a sample of size n=100,0000 is simulated; it is
big data (population data), so the sample proportion is the population proportion.
Example 15, $X_1 \sim \mathrm{Beta}(\alpha = 5, \beta = 5)$, $X_2 \mid x_1 \sim B(1, x_1)$. Let
$X_2 = x_1 + \varepsilon$.
Section 2 Two independent population proportions test
Example 16,
$X_1 \sim \mathrm{Binomial}(n_1, p_1)$, $\hat{p}_1 = \dfrac{X_1}{n_1}$, $X_2 \sim \mathrm{Binomial}(n_2, p_2)$, $\hat{p}_2 = \dfrac{X_2}{n_2}$,
$X_1, X_2$ are independent r.v.'s,
$W_3 = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{p(1 - p)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$, $p = \dfrac{X_1 + X_2}{n_1 + n_2}$,
$W_5 = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}}$.
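The two statistics of Example 16 can be computed as in the sketch below. The function names are hypothetical. The code reads $W_3$ as the pooled form and $W_5$ as the usual unpooled form, with $\hat{p}_2(1 - \hat{p}_2)/n_2$ as the second variance term; that reading of $W_5$ is an assumption about the intended formula.

```python
import math


def w3_pooled(x1, n1, x2, n2):
    """W3: difference of sample proportions over the pooled standard error."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)  # pooled proportion
    return (p1_hat - p2_hat) / math.sqrt(p * (1.0 - p) * (1.0 / n1 + 1.0 / n2))


def w5_unpooled(x1, n1, x2, n2):
    """W5: difference of sample proportions over the unpooled standard error
    (assumed form, with p2_hat in the second term)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    se = math.sqrt(p1_hat * (1.0 - p1_hat) / n1 + p2_hat * (1.0 - p2_hat) / n2)
    return (p1_hat - p2_hat) / se


print("W3:", w3_pooled(60, 100, 40, 100))
print("W5:", w5_unpooled(60, 100, 40, 100))
```

The two statistics differ only in how the standard error is estimated, so they agree closely when the two sample proportions are near each other.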
Example 17, $X_1 \sim \mathrm{Beta}(\alpha = 5, \beta = 5)$, $X_2 \mid x_1 \sim B(1, x_1)$,
let $X_2 = x_1 + \varepsilon_1$,
$X_3 \sim \mathrm{Beta}(\alpha = 0.5, \beta = 0.5)$, $X_4 \mid x_3 \sim B(1, x_3)$, let $X_4 = x_3 + \varepsilon_2$,
$X_1, X_3$ are independent random variables,
$X_2, X_4$ are independent random variables.
What is the marginal probability distribution of $Y_1 = X_2 - X_4$?
Chapter 4 One way analysis

Section 1 One way model
Section 2 $\alpha_i = 0$, $i = 1, 2, \ldots, k$
Example 18, A Normal population is divided into 5 categories,
Category 1 population, $X_1 \sim N(\mu_1 = 25, \sigma_1^2 = 5^2)$,
Category 2 population, $X_2 \sim N(\mu_2 = 25, \sigma_2^2 = 5^2)$,
Category 3 population, $X_3 \sim N(\mu_3 = 25, \sigma_3^2 = 5^2)$,
Category 4 population, $X_4 \sim N(\mu_4 = 25, \sigma_4^2 = 5^2)$,
Category 5 population, $X_5 \sim N(\mu_5 = 25, \sigma_5^2 = 5^2)$.
Each category has n sample data, and the one way model is
$X_{ij} = \mu + \alpha_i + \varepsilon_{ij}$, $i = 1, 2, \ldots, 5$, $j = 1, \ldots, n$, $\mu = 25$,
$\alpha_1 = 0, \alpha_2 = 0, \alpha_3 = 0, \alpha_4 = 0, \alpha_5 = 0$, $\varepsilon_{ij} \overset{iid}{\sim} \mathrm{Normal}(0, \sigma_\varepsilon^2 = 5^2)$.
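Section 3 $\alpha_i \neq 0$, $i = 1, 2, \ldots, k$
Example 19, A Normal population is divided into 5 categories,
Category 1 population, $X_1 \sim N(\mu_1 = 15, \sigma_1^2 = 5^2)$,
Category 2 population, $X_2 \sim N(\mu_2 = 35, \sigma_2^2 = 5^2)$,
Category 3 population, $X_3 \sim N(\mu_3 = 25, \sigma_3^2 = 5^2)$,
Category 4 population, $X_4 \sim N(\mu_4 = 5, \sigma_4^2 = 5^2)$,
Category 5 population, $X_5 \sim N(\mu_5 = 45, \sigma_5^2 = 5^2)$,
$X_{ij} = \mu + \alpha_i + \varepsilon_{ij}$, $i = 1, 2, \ldots, 5$, $j = 1, \ldots, n$, $\mu = 25$,
$\alpha_1 = -10, \alpha_2 = 10, \alpha_3 = 0, \alpha_4 = -20, \alpha_5 = 20$, $\varepsilon_{ij} \overset{iid}{\sim} \mathrm{Normal}(0, \sigma_\varepsilon^2 = 5^2)$.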
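A minimal simulation of the one way model $X_{ij} = \mu + \alpha_i + \varepsilon_{ij}$ with the parameter values of Example 19 might look like this. The function name `simulate_one_way`, the group size n = 2000, and the seed are assumptions made for the illustration.

```python
import random
import statistics


def simulate_one_way(mu, alphas, sigma, n, seed=0):
    """Return one list of n observations per category, X_ij = mu + alpha_i + eps_ij,
    with eps_ij drawn independently from Normal(0, sigma^2)."""
    rng = random.Random(seed)
    return [[mu + alpha + rng.gauss(0.0, sigma) for _ in range(n)]
            for alpha in alphas]


alphas = [-10.0, 10.0, 0.0, -20.0, 20.0]
groups = simulate_one_way(mu=25.0, alphas=alphas, sigma=5.0, n=2000)
for i, group in enumerate(groups, start=1):
    print("category", i,
          "sample mean:", statistics.fmean(group),
          "sample variance:", statistics.variance(group))
```

Setting every entry of `alphas` to zero gives the configuration of Example 18, where all five category means equal $\mu = 25$.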
Section 4 $\alpha_i \neq 0$, $i = 1, 2, \ldots, k$, and the error distribution is the Arcsin
distribution.
Example 20, $\alpha_i \neq 0$, $i = 1, 2, \ldots, k$,
An Arcsin population is divided into 5 categories,
Category 1 population, $X_1 \sim \mathrm{Arcsin}(\mu_1 = 5, c_1 = 10)$,
Category 2 population, $X_2 \sim \mathrm{Arcsin}(\mu_2 = 15, c_2 = 10)$,
$X_{ij} = \mu + \alpha_i + \varepsilon_{ij}$, $i = 1, 2, \ldots, 5$, $j = 1, \ldots, n$, $\mu = 25$,
$\alpha_1 = -20, \alpha_2 = -10, \alpha_3 = 0, \alpha_4 = 10, \alpha_5 = 20$, $\varepsilon_{ij} \overset{iid}{\sim} \mathrm{Arcsin}(0, c_\varepsilon = 10)$,
$\sigma_\varepsilon^2 = 50$.
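Section 5 $\alpha_i \neq 0$, $i = 1, 2, \ldots, k$, and the error distribution of each category
has a specific probability distribution.
Example 21, $\alpha_i \neq 0$, $i = 1, 2, \ldots, k$,
Category 2 population, $X_2 \sim \mathrm{Normal}(\mu_2 = 15, \sigma_2^2 = 50)$,
Category 3 population, $X_3 \sim \mathrm{Semi\_circle}(\mu_3 = 25, R_3 = \sqrt{200})$,
Category 4 population, $X_4 \sim \mathrm{DE}(\lambda_4 = 0.2, \mu_4 = 35)$,
Category 5 population, $X_5 \sim \mathrm{Triangular1}(\mu_5 = 45, c_5 = 10)$,
$X_{ij} = \mu + \alpha_i + \varepsilon_{ij}$, $i = 1, 2, \ldots, 5$, $j = 1, \ldots, n$, $\mu = 25$,
$\alpha_1 = -20, \alpha_2 = -10, \alpha_3 = 0, \alpha_4 = 10, \alpha_5 = 20$,
$\varepsilon_{1j} \overset{iid}{\sim} \mathrm{Arcsin}(0, c_{\varepsilon_1} = 10)$, $\sigma_{\varepsilon_1}^2 = 50$,
$\varepsilon_{2j} \overset{iid}{\sim} \mathrm{Normal}(0, \sigma_{\varepsilon_2}^2)$, $\sigma_{\varepsilon_2}^2 = 50$,
$\varepsilon_{3j} \overset{iid}{\sim} \mathrm{Semi\_circle}(0, R_{\varepsilon_3} = \sqrt{200})$, $\sigma_{\varepsilon_3}^2 = 50$,
$\varepsilon_{4j} \overset{iid}{\sim} \mathrm{DE}(\lambda_{\varepsilon_4} = 0.2, 0)$, $\sigma_{\varepsilon_4}^2 = 50$,
$\varepsilon_{5j} \overset{iid}{\sim} \mathrm{Triangular1}(0, c_{\varepsilon_5} = 10)$, $\sigma_{\varepsilon_5}^2 = 50$.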
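Section 6 $\alpha_i = 0$, $i = 1, 2, \ldots, k$, and the error distribution of each category
has a specific probability distribution.
Example 22, $\alpha_i = 0$, $i = 1, 2, \ldots, k$,
Category 2 population, $X_2 \sim \mathrm{Normal}(\mu_2 = 15, \sigma_2^2 = 50)$,
Category 3 population, $X_3 \sim \mathrm{Semi\_circle}(\mu_3 = 25, R_3 = \sqrt{200})$,
$X_{ij} = \mu + \alpha_i + \varepsilon_{ij}$, $i = 1, 2, \ldots, 5$, $j = 1, \ldots, n$, $\mu = 25$,
$\alpha_1 = -20, \alpha_2 = -10, \alpha_3 = 0, \alpha_4 = 10, \alpha_5 = 20$,
$\varepsilon_{1j} \overset{iid}{\sim} \mathrm{Arcsin}(0, c_{\varepsilon_1} = 10)$, $\sigma_{\varepsilon_1}^2 = 50$,
$\varepsilon_{2j} \overset{iid}{\sim} \mathrm{Normal}(0, \sigma_{\varepsilon_2}^2)$, $\sigma_{\varepsilon_2}^2 = 50$,
$\varepsilon_{3j} \overset{iid}{\sim} \mathrm{Semi\_circle}(0, R_{\varepsilon_3} = \sqrt{200})$, $\sigma_{\varepsilon_3}^2 = 50$,
$\varepsilon_{4j} \overset{iid}{\sim} \mathrm{DE}(\lambda_{\varepsilon_4} = 0.2, 0)$, $\sigma_{\varepsilon_4}^2 = 50$,
$\varepsilon_{5j} \overset{iid}{\sim} \mathrm{Triangular1}(0, c_{\varepsilon_5} = 10)$, $\sigma_{\varepsilon_5}^2 = 50$.
Section 7 $\alpha_i = 0$, $i = 1, 2, \ldots, k$

This section checks the multiple comparison method and the critical values.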
Chapter 5 Simple linear model

Section 1 Simple linear analysis
Section 2 The parabola model analysis; the three basic assumptions are
unchanged.
Example 23, $X_1 \sim \mathrm{Normal}(\mu_{X_1} = 0, \sigma_{X_1}^2 = 2^2)$,
the population conditional expectation line is
$E(X_2 \mid x_1) = \beta_0 + \beta_1 x_1^2 = 1 + 2x_1^2$, $\varepsilon \sim \mathrm{Normal}(0, \sigma^2 = 1)$.
Section 3 Comparison of a Normal-distributed independent variable with an
Arcsin-distributed independent variable; the three basic assumptions are
unchanged.
Example 24, the independent variable has a Normal distribution,
$X_1 \sim \mathrm{Normal}(\mu_{X_1} = 0, \sigma_{X_1}^2 = 8)$,
the population conditional expectation line is
$E(X_2 \mid x_1) = \beta_0 + \beta_1 x_1 = 1 + 2x_1$, $\varepsilon \sim \mathrm{Normal}(0, \sigma^2 = 1)$.
Example 25, the independent variable has an Arcsin distribution,
$X_1 \sim \mathrm{Arcsin}(\mu_{X_1} = 0, c_{X_1} = 4)$,
$E(X_2 \mid x_1) = \beta_0 + \beta_1 x_1 = 1 + 2x_1$, $\varepsilon \sim \mathrm{Normal}(0, \sigma^2 = 1)$,
$X_{2i} = \beta_0 + \beta_1 X_{1i} + \varepsilon_i$, $i = 1, 2, \ldots, n$,
the three basic assumptions are unchanged.
Section 4 The error probability distribution is not a normal distribution; the other
basic assumptions are unchanged.
Example 26, The error probability distribution is the shifted exponential
distribution. $X_1 \sim \mathrm{Normal}(\mu_{X_1} = 1000, \sigma_{X_1}^2 = 10^2)$,
$E(X_2 \mid x_1) = \beta_0 + \beta_1 x_1 = 1 + 2x_1$, $\varepsilon \sim \mathrm{Shifted\_exponential}(\lambda = 1, c = -1)$,
$X_{2i} = \beta_0 + \beta_1 X_{1i} + \varepsilon_i$, $i = 1, 2, \ldots, n$, $\beta_0$ is the intercept, $\beta_1$ is the slope,
$\varepsilon_i$ is the error. The three basic assumptions are
i) $\varepsilon_i \sim$ shifted exponential distribution, ii) $E(\varepsilon_i) = 0$, $Var(\varepsilon_i) = \sigma^2$,
iii) $\varepsilon_1, \ldots, \varepsilon_n$ are independent.
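The setup of Example 26 can be simulated with a short sketch. It assumes that Shifted_exponential($\lambda$, c) has density $\lambda e^{-\lambda(x - c)}$ for $x \ge c$, so that $\lambda = 1$, $c = -1$ gives mean 0 and variance 1; this parameterization is an assumption, as the example list does not define it. The helper names and the sample size are also hypothetical.

```python
import random


def shifted_exponential(rng, lam, c):
    """One draw with density lam * exp(-lam * (x - c)) for x >= c (assumed form)."""
    return c + rng.expovariate(lam)


def ols_line(xs, ys):
    """Ordinary least squares intercept and slope for y = b0 + b1 * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


rng = random.Random(0)
n = 100_000
errors = [shifted_exponential(rng, lam=1.0, c=-1.0) for _ in range(n)]
x1 = [rng.gauss(1000.0, 10.0) for _ in range(n)]          # X1 ~ Normal(1000, 10^2)
x2 = [1.0 + 2.0 * x + e for x, e in zip(x1, errors)]      # X2 = 1 + 2 X1 + error

mean_e = sum(errors) / n
var_e = sum((e - mean_e) ** 2 for e in errors) / n
b0, b1 = ols_line(x1, x2)
print("error mean:", mean_e, "error variance:", var_e)
print("fitted intercept:", b0, "fitted slope:", b1)
```

Because the mean of $X_1$ is far from zero, the fitted intercept is much more variable than the fitted slope in this configuration.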
Section 5 The error variances are not equal; the other basic assumptions are
unchanged.
Example 27, The error variances are not equal,
$X_1 \sim \mathrm{Normal}(\mu_X = 10, \sigma_X^2 = 12)$,
$E(X_2 \mid x_1) = \beta_0 + \beta_1 x_1 = 1 + 2x_1$, $\varepsilon \sim \mathrm{Normal}(0, \sigma^2 = X_1^4)$,
$X_{2i} = \beta_0 + \beta_1 X_{1i} + \varepsilon_i$, $i = 1, 2, \ldots, n$, $\beta_0$ is the intercept, $\beta_1$ is the slope,
$\varepsilon_i$ is the error. The three basic assumptions are
i) $\varepsilon_i \sim$ Normal distribution,
ii) $E(\varepsilon_i) = 0$, and $Var(\varepsilon_i) = \sigma^2$ is affected by $X_1$.
Section 6 The independent variable has a shifted exponential distribution and the
model is non-linear; the three basic assumptions are unchanged.
Example 28, $X_1 \sim \mathrm{Shifted\_exponential}(\lambda_{X_1} = 1, c_{X_1} = 0.1)$,
$E(X_2 \mid x_1) = \beta_0 + \beta_1(x_1 + \log(x_1)) = 1 + 2(x_1 + \log(x_1))$,
$\varepsilon \sim \mathrm{Normal}(0, \sigma^2 = 1)$,
$X_{2i} = \beta_0 + \beta_1 H(X_{1i}) + \varepsilon_i$, $i = 1, 2, \ldots, n$, $\beta_0$ is the intercept,
$\beta_1$ is the slope, $\varepsilon_i$ is the error.
The three basic assumptions are
i) $\varepsilon_i \sim$ Normal distribution, ii) $E(\varepsilon_i) = 0$, $Var(\varepsilon_i) = \sigma^2$,
iii) $\varepsilon_1, \ldots, \varepsilon_n$ are independent.
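Section 7 The range of the random variable is restricted to a specific region; the three
basic assumptions are unchanged.
Example 29, $X_1 \sim \mathrm{Normal}(\mu_{X_1} = 2, \sigma_{X_1}^2 = 5^2)$,
$E(X_2 \mid x_1) = \beta_0 + \beta_1 x_1 = 1 + 2x_1$, $\varepsilon \sim \mathrm{Normal}(0, \sigma^2 = 1)$,
$-20 \le X_1 X_2 \le 20$, $X_{2i} = \beta_0 + \beta_1 X_{1i} + \varepsilon_i$, $i = 1, 2, \ldots, n$.
The three basic assumptions are
i) $\varepsilon_i \sim$ Normal distribution, ii) $E(\varepsilon_i) = 0$, $Var(\varepsilon_i) = \sigma^2$.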
Section 8 The 3rd basic assumption is modified: the error follows the Durbin-Watson
first-order autoregressive error model.
Example 30, Durbin-Watson model,
$X_1 \sim \mathrm{Normal}(\mu_{X_1} = 2, \sigma_{X_1}^2 = 5^2)$,
$E(X_2 \mid x_1) = \beta_0 + \beta_1 x_1 = 1 + 2x_1$,
$\mu_t \sim \mathrm{Normal}(0, \sigma^2 = 1)$; there are n paired samples, T = n.
$X_{2t} = \beta_0 + \beta_1 X_{1t} + \varepsilon_t$, $t = 1, 2, \ldots, T$,
$\beta_0$ is the intercept, $\beta_1$ is the slope, $\varepsilon_t$ is the error,
$\varepsilon_t = \rho\varepsilon_{t-1} + \mu_t$, $t = 1, 2, 3, \ldots, T$, $\varepsilon_0 = 0$, $|\rho| < 1$, let $\rho = 0.5$.
The three basic assumptions are
i) $\mu_t \sim$ Normal distribution, ii) $E(\mu_t) = 0$, $Var(\mu_t) = \sigma^2$,
iii) $\mu_1, \ldots, \mu_T$ are independent.
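The error recursion $\varepsilon_t = \rho\varepsilon_{t-1} + \mu_t$ with $\varepsilon_0 = 0$ and $\rho = 0.5$ can be simulated directly, as in this sketch. The function names and the series length T = 100,000 are assumptions. The Durbin-Watson statistic $d = \sum_{t \ge 2}(e_t - e_{t-1})^2 / \sum_t e_t^2$ is normally computed from regression residuals; here it is applied to the simulated errors themselves, purely to show its familiar approximate relation $d \approx 2(1 - \rho)$.

```python
import random


def ar1_errors(T, rho, sigma=1.0, seed=0):
    """Simulate eps_t = rho * eps_{t-1} + mu_t for t = 1..T with eps_0 = 0,
    where mu_t are independent Normal(0, sigma^2) draws."""
    rng = random.Random(seed)
    eps, prev = [], 0.0
    for _ in range(T):
        prev = rho * prev + rng.gauss(0.0, sigma)
        eps.append(prev)
    return eps


def durbin_watson(e):
    """d = sum_{t>=2} (e_t - e_{t-1})^2 / sum_t e_t^2."""
    numerator = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    denominator = sum(v * v for v in e)
    return numerator / denominator


eps = ar1_errors(T=100_000, rho=0.5)
d = durbin_watson(eps)
print("Durbin-Watson statistic:", d)
print("implied rho estimate 1 - d/2:", 1.0 - d / 2.0)
```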
Chapter 6 The general linear model and non-linear model

Section 1 Multiple regression analysis
Section 2 High collinearity; the other assumptions are unchanged.
Example 31,
Multi-variate normal distribution with 5 random variables;
the vector of population expectations and the covariance matrix are
$$\boldsymbol{\mu} = \begin{pmatrix} E(X_1) \\ E(X_2) \\ E(X_3) \\ E(X_4) \\ E(X_5) \end{pmatrix} = \begin{pmatrix} 100 \\ 0 \\ -100 \\ -120 \\ 180 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} 1 & 0.99 & 0.99 & 0.99 & 0.99 \\ 0.99 & 1 & 0.99 & 0.99 & 0.99 \\ 0.99 & 0.99 & 1 & 0.99 & 0.99 \\ 0.99 & 0.99 & 0.99 & 1 & 0.99 \\ 0.99 & 0.99 & 0.99 & 0.99 & 1 \end{pmatrix},$$
$X_i \sim \mathrm{Normal}(E(X_i), Var(X_i))$, $Var(X_i) = 1$, $i = 1, 2, \ldots, 5$,
$Cov(X_i, X_j) = \rho(X_i, X_j) = 0.99$, $i, j = 1, 2, \ldots, 5$, $i \neq j$.
Section 3 The probability distributions of the independent variables and the error
are not normal distributions; the other assumptions are unchanged.
Example 32,
$X_1 \sim \mathrm{Arcsin}(\mu = 100, c = 10)$,
$X_2 \sim \mathrm{Double\_exponential}(\lambda = 0.1, \mu = 50)$,
$X_3 \sim \mathrm{Semi\_circle}(\mu = 100, R = 10)$,
$X_4 \sim \mathrm{Logistic}(\mu = 100, \sigma = 10)$,
$X_5 \sim \mathrm{Gamma}(\alpha = 50, \beta = 2)$,
$X_6 \sim \mathrm{U\_quadratic}(a = 90, b = 110)$,
$X_1, X_2, \ldots, X_6$ are independent random variables.
$X_7 = 1 + 2X_1 + 3X_3 + 4X_4 + 5X_5 + 6X_6 + \varepsilon$,
$\varepsilon \sim \mathrm{Raised\_secant}(0, s = 5)$.
Section 4 Non-linear model; the other assumptions are unchanged.
Example 33,
$X_1 \sim \mathrm{Normal}(E(X_1) = 100, Var(X_1) = 25)$,
$X_2 \mid x_1 \sim \mathrm{Normal}(E(X_2 \mid x_1) = 50 + 0.5x_1, Var(X_2 \mid x_1) = 16)$,
$X_3 \mid x_1, x_2 \sim \mathrm{Normal}(E(X_3 \mid x_1, x_2) = 10 + 0.5x_1 + 0.5x_2, Var(X_3 \mid x_1, x_2) = 12.25)$,
$X_4 \mid x_1, x_2 \sim \mathrm{Normal}(E(X_4 \mid x_1, x_2) = 5 + 0.7x_1 + 0.3x_2, Var(X_4 \mid x_1, x_2) = 16)$,
$\varepsilon \sim \mathrm{Normal}(E(\mathrm{error}) = 0, Var(\mathrm{error}) = 16)$,
$X_5 = 1 + 2X_1 + 3\cos(X_2\pi) + 4X_3 + 5\log(X_4) + \varepsilon$.
Section 5 Non-linear model where the independent variable is a sample statistic; the
other assumptions are unchanged.
Example 34,
$X_1, X_2, \ldots, X_{10} \overset{iid}{\sim} \mathrm{Normal}(\mu_{X_i} = 100, \sigma_{X_i}^2 = 25)$,
$X_{11} = \text{sample Mid\_range}(X_1, X_2, \ldots, X_{10}) + \varepsilon$,
$\varepsilon \sim \mathrm{Normal}(\mu_\varepsilon = 0, \sigma_\varepsilon^2 = 16)$.
Section 6 A dummy variable is one of the independent variables; the other assumptions
are unchanged.
Example 35,
Dummy = 0,
$X_1 \sim \mathrm{Normal}(E(X_1) = 100, Var(X_1) = 25)$,
$X_2 \mid x_1 \sim \mathrm{Normal}(E(X_2 \mid x_1) = 50 + 2x_1, Var(X_2 \mid x_1) = 1)$,
$\varepsilon \sim \mathrm{Normal}(E(\varepsilon) = 0, Var(\varepsilon) = 16)$,
$X_3 \mid \mathrm{Dummy} = 0, x_1, x_2 = 50 + 2x_1 + 3x_2 + \varepsilon$,
Dummy = 1,
$X_1 \sim \mathrm{Normal}(E(X_1) = 100, Var(X_1) = 25)$,
$\varepsilon \sim \mathrm{Normal}(E(\varepsilon) = 0, Var(\varepsilon) = 16)$,
$X_3 \mid \mathrm{Dummy} = 1, x_1, x_2 = 10 + x_1 + 5x_2 + \varepsilon$.
Section 7 An endogenous variable in the linear model; the other assumptions are
unchanged.
Example 36,
$X_2(t+1) = \beta_0 + \beta_1 X_1(t) + \beta_2 X_3(t) + \beta_3 X_4(t) + \varepsilon_1(t)$,
$X_1(t+1) = \alpha_0 + \alpha_1 X_2(t+1) + \alpha_2 X_3(t+1) + \alpha_3 X_4(t+1) + \varepsilon_2(t+1)$,
$X_3(t) \sim \mathrm{Normal}(\mu = 10, \sigma^2 = 4)$,
$X_4(t) \sim \mathrm{Normal}(\mu = 30 + 2X_3, \sigma^2 = 25)$,
$X_2(t+1) = 0.1 + 0.8X_1(t) + 0.2X_3(t) - 0.02X_4(t) + \varepsilon_1(t)$,
$X_1(t+1) = 0.2 + 0.9X_2(t+1) + 0.3X_3(t+1) - 0.01X_4(t+1) + \varepsilon_2(t+1)$,
$\varepsilon_1 = \varepsilon_2 = \varepsilon(t) \sim \mathrm{Normal}(0, 1)$, $t = 0, 1, 2, \ldots, n - 1$, $X_1(t = 0) = 10$.
Chapter 7 Multi-variate analysis using linear model

Example 37,
X1~Shifted exponential(1,0.1),
X2|x1~Normal(4+5*log(x1),4),
X3|x1~Raised cosine(5+x1+log(x1),2),
X4|x1,x2~Semi circle(3+0.5*x1+0.5*x2,4),
X5|x2,x3~Arcsin(4.5+0.3*x2+0.7*x3,3),
X6|x4,x5~DE(0.5,10+2*x4*x5),
(1) The population distribution of the sample data,
(2) The marginal probability distribution and joint probability
distribution from the sample data,
(3) Estimating the cumulative probability distribution function using
Curve-fitting,
(4) The multi-variate analysis is substituted by non-linear analysis,
(4.1) Conclusion,
(5) The mathematical model,
(6) Confirming the mathematical model using the probability
distribution simulator.
Appendix 1, The probability distributions,

Appendix 2, The Curve-linear of linear model analysis,
Appendix 3, Non-linear model analysis,
Appendix 4, The limiting theory of the cumulative probability distribution
function,
Appendix 5, The Dow Jones industrial index is an additive measure and is not
a closed range,
Appendix 6, The Cos model analysis,
Appendix 6.1) $X_1 \sim \mathrm{Normal}(\mu_{X_1} = 0, \sigma_{X_1}^2 = 2^2)$,
$E(X_2 \mid x_1) = \beta_0 + \beta_1\cos(x_1\pi) = 1 + 2\cos(x_1\pi)$,
$\varepsilon \sim \mathrm{Normal}(0, \sigma^2 = 1)$,
Appendix 6.2) $X_1 \sim \mathrm{Normal}(\mu_{X_1} = 0, \sigma_{X_1}^2 = 2^2)$,
$E(X_2 \mid x_1) = \beta_0 + \beta_1\cos^2(x_1\pi) = 1 + 2\cos^2(x_1\pi)$,
$\varepsilon \sim \mathrm{Normal}(0, \sigma^2 = 1)$,
Appendix 7, The population is the Logistic probability distribution; the population
mean is 100 and the population variance is 4;
100,000,000 samples are simulated
(the parameters of the Logistic are $\mu = 0$, $\sigma = 1.10760$).
Appendix 8, The population distribution is Logistic; the critical values of the test statistic.
Appendix 9, The probability distribution transformation using the simulator,
Appendix 9.1,
$X_1, X_2 \overset{iid}{\sim} \mathrm{Uniform}(-1, 1)$, $f_{X_i}(x_i) = 0.5$, $-1 < x_i < 1$, $i = 1, 2$,
Appendix 9.2,
$X_1 \sim \mathrm{Shifted\_exponential}(\lambda_1 = 1, c_1 = 0)$,
$X_2 \sim \mathrm{DEl}(\lambda_2 = 1, \mu_2 = 0)$,
$X_1$ and $X_2$ are independent random variables,
Appendix 9.3, $X_1 \sim \mathrm{Arcsin}(0, 1)$, $X_2 \mid x_1 \sim \mathrm{Uniform}(-x_1^2, x_1^2)$,
$f_{X_1}(x_1) = \dfrac{1}{\pi\sqrt{1 - x_1^2}}$, $-1 < x_1 < 1$, $f_{X_2 \mid x_1}(x_2 \mid x_1) = \dfrac{1}{2x_1^2}$, $|x_2| \le x_1^2$,
$X_1$ and $X_2$ are not independent random variables,
Appendix 9.4,
$X_1, X_2 \overset{iid}{\sim} \mathrm{Uniform}(-1, 1)$, $f_{X_i}(x_i) = 0.5$, $-1 < x_i < 1$, $i = 1, 2$,
the range of the random variables is changed to $0.1 \le X_1^2 + X_2^2 \le 0.8$,
$P(0.1 \le X_1^2 + X_2^2 \le 0.8) = 0.6282$,
Appendix 9.5, $X_1, X_2, X_3, X_4 \overset{iid}{\sim} \mathrm{Uniform}(\alpha = -1, \beta = 1)$,
$X_1 = r\sin\theta$, $X_2 = r\cos\theta\sin\phi$,
$X_3 = r\cos\theta\cos\phi\sin\gamma$, $X_4 = r\cos\theta\cos\phi\cos\gamma$,
$P_1 = R = \sqrt{X_1^2 + X_2^2 + X_3^2 + X_4^2}$,
$P_2 = \theta = \tan^{-1}\left(\dfrac{X_1}{X_2} \times \sin\phi\right)$,
$P_3 = \phi = \tan^{-1}\left(\dfrac{X_2}{X_3} \times \sin\gamma\right)$, $P_4 = \gamma = \tan^{-1}\left(\dfrac{X_3}{X_4}\right)$,
Appendix 9.6, $X_i \sim \mathrm{Normal}(\mu_i = i, \sigma_i^2 = 2^2)$, $i = 1, 2, \ldots, 10$,
$X_1, \ldots, X_{10}$ are independent random variables, and let
$W_1 = MAD = \dfrac{\sum_{i=1}^{10} |X_i - \bar{X}|}{10}$, $W_2 = S = \sqrt{\dfrac{\sum_{i=1}^{10} (X_i - \bar{X})^2}{9}}$.
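The two statistics of Appendix 9.6 can be simulated with a few lines of code. The sketch reads $W_1$ as the mean absolute deviation about the sample mean (divisor 10) and $W_2$ as the sample standard deviation (divisor 9 under the square root); both readings are assumptions about the intended formulas, and the function name and the number of replications are hypothetical.

```python
import math
import random


def mad_and_sd(xs):
    """Return (W1, W2): mean absolute deviation about the mean (divisor n)
    and sample standard deviation (divisor n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    mad = sum(abs(x - mean) for x in xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return mad, sd


rng = random.Random(0)
draws = []
for _ in range(10_000):
    # X_i ~ Normal(mu_i = i, sigma_i^2 = 2^2), i = 1..10, independent.
    xs = [rng.gauss(i, 2.0) for i in range(1, 11)]
    draws.append(mad_and_sd(xs))

print("simulated mean of W1:", sum(w1 for w1, _ in draws) / len(draws))
print("simulated mean of W2:", sum(w2 for _, w2 in draws) / len(draws))
```

Collecting the simulated pairs in `draws` gives an empirical picture of the joint sampling distribution of $W_1$ and $W_2$, which is the kind of output a transformation simulator would tabulate.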
Appendix 10, One way analysis, the sampling distribution of the test
statistic when the error distribution is the arcsin distribution.
Appendix 10.1) k=5, n=5,
Appendix 10.2) k=5, n=100,
Appendix 10.3) k=5, n=1000,
Chapter 1. Basic analysis method
1.1. The frequency distribution table cannot analyze big data
The frequency distribution table is a method of arranging data; the procedure involves
the number of classes, the frequency of each class, and the class limits. The formula for
the number of classes is $k = \log_2(n) + 1$, where k is the number of classes and n is the
sample size; when n = 100,000,000, k = 26.
Twenty-six classes cannot convey the character of a data set that has 100,000,000
records.
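The class-count formula can be evaluated directly, as in the short sketch below. Evaluated as written, $\log_2(n) + 1$ gives about 27.6 for n = 100,000,000, while the text and the table in this section use 26 classes; the integer part of $\log_2(n)$ equals 26 for that n. Whether the authors' software rounds in that way is not stated, so the second function is an assumption.

```python
import math


def class_count_formula(n):
    """The formula as written in the text: k = log2(n) + 1 (not rounded)."""
    return math.log2(n) + 1


def class_count_integer_part(n):
    """Assumed rounding rule: the integer part of log2(n)."""
    return math.floor(math.log2(n))


for n in (1_000, 1_000_000, 100_000_000):
    print(n, class_count_formula(n), class_count_integer_part(n))
```

Under either rule the class count grows only logarithmically, which is the point of the paragraph above: multiplying the sample size by 100,000 adds fewer than 20 classes.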
For accuracy, the probability method is a good method when the data are big data.
Note: Big data is not a closed set; Curve-linear analysis can be useful here, please
refer to Appendix 5.
Example 1, $X_1 \sim \mathrm{Normal}(\mu_{X_1} = 0, \sigma_{X_1}^2 = 2^2)$; a sample of size n is simulated.
(1.1) n=1,000, frequency distribution table,
X1 frequency distribution table
class   class limit             class midpoint   frequency   relative frequency   cumulative relative frequency
[ 1 ]   -6.31382 ~ -4.88201     -5.59792          10.00000   0.0100000            0.0100000
[ 2 ]   -4.88201 ~ -3.45020     -4.16610          34.00000   0.0340000            0.0440000
[ 3 ]   -3.45020 ~ -2.01839     -2.73429         128.00000   0.1280000            0.1720000
[ 4 ]   -2.01839 ~ -0.58657     -1.30248         231.00000   0.2310000            0.4030000
[ 5 ]   -0.58657 ~  0.84524      0.12933         279.00000   0.2790000            0.6820000
[ 6 ]    0.84524 ~  2.27705      1.56115         197.00000   0.1970000            0.8790000
[ 7 ]    2.27705 ~  3.70886      2.99296          84.00000   0.0840000            0.9630000
[ 8 ]    3.70886 ~  5.14068      4.42477          27.00000   0.0270000            0.9900000
[ 9 ]    5.14068 ~  6.57249      5.85658          10.00000   0.0100000            1.0000000
frequency distribution: sample mean=-0.075416 , sample variance=4.355512 , sample sd=2.086986
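A frequency distribution table of this shape can be produced with the sketch below, which splits the observed range from the sample minimum to the sample maximum into k equal-width classes. The equal-width rule matches the class limits printed above, but the function name, the seed, and the choice of 1,000 observations with 9 classes are assumptions, and the simulated counts will not reproduce the book's figures.

```python
import random


def frequency_table(data, k):
    """Rows of (class number, lower limit, upper limit, midpoint, frequency,
    relative frequency, cumulative relative frequency) for k equal-width classes."""
    low, high = min(data), max(data)
    width = (high - low) / k
    counts = [0] * k
    for x in data:
        # The maximum falls on the last boundary, so clamp it into the last class.
        index = min(int((x - low) / width), k - 1)
        counts[index] += 1
    n = len(data)
    rows, cumulative = [], 0.0
    for i, count in enumerate(counts):
        cumulative += count / n
        left = low + i * width
        rows.append((i + 1, left, left + width, left + width / 2,
                     count, count / n, cumulative))
    return rows


rng = random.Random(0)
data = [rng.gauss(0.0, 2.0) for _ in range(1000)]   # X1 ~ Normal(0, 2^2)
for row in frequency_table(data, 9):
    print("[%2d] %9.5f ~ %9.5f %9.5f %6d %.7f %.7f" % row)
```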
(1.2) n=100,000,000, frequency distribution table,

X1 frequency distribution table; it cannot reflect the characteristics of X1.
class class limit class midpoint frequency relative frequency cumulative relative frequency
[ 1 ] -11.18981~ -10.29611 -10.74296 14.00000 0.0000001 0.0000001
[ 2 ] -10.29611~ -9.40241 -9.84926 108.00000 0.0000011 0.0000012
[ 3 ] -9.40241~ -8.50871 -8.95556 908.00000 0.0000091 0.0000103
[ 4 ] -8.50871~ -7.61501 -8.06186 5923.00000 0.0000592 0.0000695
[ 5 ] -7.61501~ -6.72131 -7.16816 31998.00000 0.0003200 0.0003895
[ 6 ] -6.72131~ -5.82760 -6.27445 139820.00000 0.0013982 0.0017877
[ 7 ] -5.82760~ -4.93390 -5.38075 503125.00000 0.0050313 0.0068190
[ 8 ] -4.93390~ -4.04020 -4.48705 1487944.00000 0.0148794 0.0216984
[ 9 ] -4.04020~ -3.14650 -3.59335 3614075.00000 0.0361407 0.0578391
[ 10 ] -3.14650~ -2.25280 -2.69965 7217807.00000 0.0721781 0.1300172
[ 11 ] -2.25280~ -1.35910 -1.80595 11844001.00000 0.1184400 0.2484572
[ 12 ] -1.35910~ -0.46540 -0.91225 15957507.00000 0.1595751 0.4080323
[ 13 ] -0.46540~ 0.42831 -0.01855 17677107.00000 0.1767711 0.5848034
[ 14 ] 0.42831~ 1.32201 0.87516 16089539.00000 0.1608954 0.7456988
[ 15 ] 1.32201~ 2.21571 1.76886 12033715.00000 0.1203372 0.8660359
[ 16 ] 2.21571~ 3.10941 2.66256 7395516.00000 0.0739552 0.9399911
[ 17 ] 3.10941~ 4.00311 3.55626 3735828.00000 0.0373583 0.9773494
[ 18 ] 4.00311~ 4.89681 4.44996 1547930.00000 0.0154793 0.9928286
[ 19 ] 4.89681~ 5.79051 5.34366 528374.00000 0.0052837 0.9981124
[ 20 ] 5.79051~ 6.68421 6.23736 147289.00000 0.0014729 0.9995853
[ 21 ] 6.68421~ 7.57792 7.13107 33929.00000 0.0003393 0.9999246
[ 22 ] 7.57792~ 8.47162 8.02477 6421.00000 0.0000642 0.9999888
[ 23 ] 8.47162~ 9.36532 8.91847 965.00000 0.0000097 0.9999984
[ 24 ] 9.36532~ 10.25902 9.81217 141.00000 0.0000014 0.9999998
[ 25 ] 10.25902~ 11.15272 10.70587 15.00000 0.0000001 1.0000000
[ 26 ] 11.15272~ 12.04642 11.59957 1.00000 0.0000000 1.0000000
(1.3) n=100,000,000, the probability distribution,

f(x1),F(x1) Coefficient
Mathematical Mean: -0.00013
Geometrical Mean : none
Harmonic Mean : none
Variance : 4.00003
S.D. : 2.00001
Skewed Coef. : -0.00020
Kurtosis Coef. : 2.99965
MAD : 1.59580
Range : 23.23623
Mid_range : 0.42831
Median : -0.00000
Q1 : -1.34943
Q2 : -0.00000
Q3 : 1.34898
IQR : 2.69841
C.V. : none
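Most of the coefficients in this listing can be recomputed from a simulated sample, as in the sketch below. It uses moment formulas with divisor n, reads MAD as the mean absolute deviation about the mean, and takes quartiles from Python's `statistics.quantiles`; these conventions are assumptions, since the text does not say how the authors' software defines them. The sample size of 200,000 is chosen only to keep the run short.

```python
import math
import random
import statistics


def describe(data):
    """Descriptive coefficients computed with divisor n (assumed convention)."""
    n = len(data)
    mean = statistics.fmean(data)
    deviations = [x - mean for x in data]
    variance = sum(d * d for d in deviations) / n
    sd = math.sqrt(variance)
    q1, q2, q3 = statistics.quantiles(data, n=4)
    return {
        "mean": mean,
        "variance": variance,
        "sd": sd,
        "skewness": sum(d ** 3 for d in deviations) / n / sd ** 3,
        "kurtosis": sum(d ** 4 for d in deviations) / n / sd ** 4,
        "mad": sum(abs(d) for d in deviations) / n,
        "range": max(data) - min(data),
        "mid_range": (max(data) + min(data)) / 2,
        "q1": q1, "median": q2, "q3": q3, "iqr": q3 - q1,
    }


rng = random.Random(0)
data = [rng.gauss(0.0, 2.0) for _ in range(200_000)]   # X1 ~ Normal(0, 2^2)
summary = describe(data)
for name, value in summary.items():
    print("%-10s: %.5f" % (name, value))
```

For a Normal(0, 2^2) population the theoretical values are variance 4, kurtosis 3, mean absolute deviation $2\sqrt{2/\pi} \approx 1.5958$, and quartiles near $\pm 1.349$, which agree with the listing above to the precision shown there.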
(1.4) n=100,000,000, the cumulative distribution function estimated by curve-fitting,

The distribution function estimated line ------
F(X)= 0.00386999803514678780+
0.01001194588514464600*(X- -2.67349227634976220000)^1+
0.01550554396389403100*(X- -2.67349227634976220000)^2+
0.01390802599959850600*(X- -2.67349227634976220000)^3+
0.00388208129651745890*(X- -2.67349227634976220000)^4+
value range 0.0000000000<=F(x)<= 0.1000000000 ,
value range -5.7286634386<=X<= -1.2814350350 ,
Error=0.000027845572026633 MAX=0.002733636683693696 coefficient of
determination=0.999967509051313820,

F(X)= 0.03631200763629749400+
-0.03209349309327080800*(X- -2.18631354313496830000)^1+
0.11086014460306615000*(X- -2.18631354313496830000)^2+
0.00260500264994334430*(X- -2.18631354313496830000)^3+
value range 0.1000003052<=F(x)<= 0.2000000000 ,
value range -1.2814332352<=X<= -0.8413359414 ,
determination=0.999999854054121060,

F(X)= 0.06278864992782473600+
-0.06410391163080930700*(X- -1.97268507212137670000)^1+
0.18720657564699650000*(X- -1.97268507212137670000)^2+
-0.02063883515074849100*(X- -1.97268507212137670000)^3+
value range 0.2000003052<=F(x)<= 0.3000000000 ,
value range -0.8413350562<=X<= -0.5240766469 ,
determination=0.999999898674572510,
F(X)= 0.08435101807117462200+
-0.08860223740339279200*(X- -1.82400258924639000000)^1+
0.25061420723795891000*(X- -1.82400258924639000000)^2+
-0.04219520930200815200*(X- -1.82400258924639000000)^3+
value range 0.3000003052<=F(x)<= 0.4000000000 ,
value range -0.5240759524<=X<= -0.2532618458 ,
determination=0.999999820467032170,

F(X)= 0.29147876799106598000+
-0.45904943346977234000*(X- -1.70717565313745820000)^1+
0.52002820372581482000*(X- -1.70717565313745820000)^2+
-0.10520285367965698000*(X- -1.70717565313745820000)^3+
value range 0.4000003052<=F(x)<= 0.5000000000 ,
value range -0.2532610163<=X<= 0.0000498975 ,
determination=0.999999943209205710,

F(X)= 0.04907521605491638200+
0.03276270627975463900*(X- -1.61005980743653470000)^1+
0.23294138908386230000*(X- -1.61005980743653470000)^2+
-0.04928661137819290200*(X- -1.61005980743653470000)^3+
value range 0.5000003052<=F(x)<= 0.6000000000 ,
value range 0.0000506574<=X<= 0.2532352478 ,
determination=0.999999857465343260,

F(X)= 0.07592004537582397500+
0.02947926521301269500*(X- -1.52632536545651050000)^1+
0.24662965536117554000*(X- -1.52632536545651050000)^2+
-0.05490001663565635700*(X- -1.52632536545651050000)^3+
value range 0.6000003052<=F(x)<= 0.7000000000 ,
value range 0.2532359929<=X<= 0.5241786916 ,
determination=0.999999912574473070,

F(X)= -0.26849794387817383000+
0.57103854417800903000*(X- -1.45226474672493060000)^1+
-0.01066714525222778300*(X- -1.45226474672493060000)^2+
-0.01534482091665267900*(X- -1.45226474672493060000)^3+
value range 0.7000003052<=F(x)<= 0.8000000000 ,
value range 0.5241798271<=X<= 0.8414278890 ,
determination=0.999999920888603460,

F(X)= -0.43635883927345276000+
0.83778893947601318000*(X- -1.38516108490693870000)^1+
0.13011927902698517000*(X- -1.38516108490693870000)^2+
0.00145238172262907030*(X- -1.38516108490693870000)^3+
value range 0.8000003052<=F(x)<= 0.9000000000 ,
value range 0.8414289685<=X<= 1.2814374704 ,
determination=0.999999856151661090,
F(X)= -1.24017958471085880000+
1.87075669982004910000*(X- -1.32396818420487010000)^1+
-0.58725876218522899000*(X- -1.32396818420487010000)^2+
0.08198952173552243000*(X- -1.32396818420487010000)^3+
-0.00428836389892239820*(X- -1.32396818420487010000)^4+
value range 0.9000003052<=F(x)<= 0.9999996948 ,
value range 1.2814384883<=X<= 5.0553297197 ,
determination=0.999991132738400010
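Each fitted piece above is an ordinary polynomial in (X - c) that is valid only on its stated X range. The following is a minimal sketch, not the authors' software: the function name `segment_cdf` is an assumption made for illustration. It evaluates the piece for 0.1 <= F(x) <= 0.2 with Horner's rule, so that the two end points of its X range can be compared with the stated F values 0.1 and 0.2.

```python
def segment_cdf(x, center, coeffs):
    """Evaluate a0 + a1*(x-c) + a2*(x-c)^2 + ... with Horner's rule."""
    d = x - center
    result = 0.0
    for a in reversed(coeffs):
        result = result * d + a
    return result


# Coefficients copied from the piece covering 0.1 <= F(x) <= 0.2.
CENTER = -2.18631354313496830000
COEFFS = [
    0.03631200763629749400,
    -0.03209349309327080800,
    0.11086014460306615000,
    0.00260500264994334430,
]

# End points of the X range stated for this piece.
low = segment_cdf(-1.2814332352, CENTER, COEFFS)
high = segment_cdf(-0.8413359414, CENTER, COEFFS)
print(round(low, 4), round(high, 4))
```

A full estimate of F would select the piece whose X range contains the argument before evaluating it.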
[Figure: the estimated line, and the comparison of the estimated values with the sample data.]
1.2. Assuming that the population is normally distributed is not a good idea.
The probability distribution of big data is the population distribution, and the
characteristics of big data are the characteristics of the population. In statistics, the
population distribution is usually assumed to be the normal distribution. In fact, the
population distribution does not need to be set to any specific probability distribution.
The methods for finding the population distribution are
i) Curve-fitting, ii) SLLN (strong law of large numbers), iii) Curve-linear.
For big data, the curve-fitting method is more important than statistical analysis, and
finding the probability distribution of the big data is the first step in analysing it.
                  Sample data                          Big data

Population        The population distribution is       The population distribution is the
distribution      assumed to be the normal             distribution of the big data.
                  distribution.
Point estimator   Sample mean and sample variance.     The characteristics of the big data.
Test statistic    Z, t, chi-square and F.              The big data analysis is the
                                                       probability distribution.
Critical value    The critical value is calculated     It is not necessary.
                  from the sampling distribution of
                  the test statistic.
Example 2. The population is a shifted exponential distribution,
X ~ Shifted_exponential(\lambda_X, c_X),
f_X(x) = \lambda_X \exp(-\lambda_X (x - c_X)), x > c_X,
E(X) = \mu_X = \frac{1}{\lambda_X} + c_X, Var(X) = \sigma_X^2 = \frac{1}{(\lambda_X)^2},
so \mu_X is a function of \sigma_X.
Let X ~ Shifted_exponential(\lambda_X = 1, c_X = -1), then
E(X) = \mu_X = \frac{1}{\lambda_X} + c_X = 0, Var(X) = \sigma_X^2 = 1, and the sample size is n.
Y_1 = \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i, the sample mean,
Y_2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2, the sample variance,
(2.1)n=30,
f Y1 ( y1 ), FY1 ( y1 ) Coefficient
Mathematical Mean: 0.00001
Variance : 0.03333
S.D. : 0.18256
Skewed Coef. : 0.36519
MAD : 0.14527
Range : 2.07844
Mid_range : 0.28129
Median : -0.01107
Q1 : -0.12844
Q2 : -0.01107
Q3 : 0.11640
IQR : 0.24483
C.V. : none
X is not normal distribution,
Geometrical Mean : 0.88771
Harmonic Mean : 0.78723
Variance : 0.26920
S.D. : 0.51884
MAD : 0.38430
Range : 13.85028
Mid_range : 6.96980
Median : 0.88990
Q1 : 0.64057
Q2 : 0.88990
Q3 : 1.23306
IQR : 0.59249
C.V. : 0.51883
Cov(Y1,Y2)= 0.0667, Y1 and Y2 correlation coefficient=0.7039.
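The n=30 figures above can be checked approximately by simulation. The sketch below is an illustration under stated assumptions, not the authors' simulator: the helper names, the seed and the 20,000 replications are all choices made here. It draws repeated samples of size 30 from Shifted_exponential(lambda=1, c=-1), computes Y1 (sample mean) and Y2 (sample variance) for each, and estimates the mean of Y1, the variance of Y1, the mean of Y2 and Cov(Y1,Y2). The results should land near the reported 0.00001, 0.03333 and 0.0667, up to simulation noise.

```python
import random


def shifted_exponential(rng, lam, c):
    """Draw one value with density lam*exp(-lam*(x-c)) for x > c."""
    return c + rng.expovariate(lam)


def simulate(n=30, reps=20000, lam=1.0, c=-1.0, seed=1):
    """Return lists of sample means (Y1) and sample variances (Y2)."""
    rng = random.Random(seed)
    means, variances = [], []
    for _ in range(reps):
        sample = [shifted_exponential(rng, lam, c) for _ in range(n)]
        mean = sum(sample) / n
        means.append(mean)
        # Sample variance with divisor n-1, as in the definition of Y2.
        variances.append(sum((x - mean) ** 2 for x in sample) / (n - 1))
    return means, variances


means, variances = simulate()
reps = len(means)
m1 = sum(means) / reps
m2 = sum(variances) / reps
var_y1 = sum((a - m1) ** 2 for a in means) / reps
cov = sum((a - m1) * (b - m2) for a, b in zip(means, variances)) / reps
print(m1, var_y1, m2, cov)
```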
(2.2)n=200,
Variance : 0.00500
S.D. : 0.07071
MAD : 0.05640
Range : 0.80854
Mid_range : 0.06479
Median : -0.00167
Q1 : -0.04855
Q2 : -0.00167
Q3 : 0.04675
IQR : 0.09531
C.V. : none
X is not normal distribution,
Variance : 0.04008
S.D. : 0.20021
MAD : 0.15714
Range : 2.86882
Mid_range : 1.77018
Median : 0.97946
Q1 : 0.85848
Q2 : 0.97946
Q3 : 1.11862
IQR : 0.26015
C.V. : 0.20021
The following is the goodness-of-fit test (Pearson chi-square test statistic). There are
20 basic probability distributions that can be selected as the probability distribution
of the null hypothesis.
(2.3)n=30,
pearson goodness of fit
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ]
lower limit -0.96036 -0.74116 -0.45856 -0.06026 0.62065
upper limit -0.74116 -0.45856 -0.06026 0.62065
observed no 8.00000 4.00000 5.00000 6.00000 7.00000
probability 0.20000 0.20000 0.20000 0.20000 0.20000
expected no 6.00000 6.00000 6.00000 6.00000 6.00000
chi square 0.66667 0.66667 0.16667 0.00000 0.16667
degree of freedom=2
H0: X1~Shifted exponential(lamda,c), lamda,c are unknown
lamda point estimated value=1.017983 (MLE)
c point estimated value=-0.960361 (MLE)
pearson chi-square test statistic =1.666667
p-value=0.434500
“The best parameter value method about goodness of fit”

lamda value from 0.848319 to 1.272478
c value from -0.826382 to -1.094340
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ]
lower limit -0.96036 -0.74116 -0.45856 -0.06026 0.62065
upper limit -0.74116 -0.45856 -0.06026 0.62065
observed no 8.00000 4.00000 5.00000 6.00000 7.00000
probability 0.20000 0.20000 0.20000 0.20000 0.20000
expected no 6.00000 6.00000 6.00000 6.00000 6.00000
chi square 0.66667 0.66667 0.16667 0.00000 0.16667
degree of freedom=2
H0: X1~Shifted exponential(lamda=1.017983,c=-0.960361),
p-value=0.434500
Population is Shifted exponential(lamda=1.017983,c=-0.960361).
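The test statistic and p-value printed for n=30 can be recomputed from the class counts. This is a minimal sketch using only the Python standard library; the function names `chi2_sf` and `pearson_chi2` are assumptions made for illustration. The upper-tail probability uses the closed-form expressions that exist for integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi-square with df > x), integer df only."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over j < df/2 of (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite correction series.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


def pearson_chi2(observed, expected):
    """Sum of (O - E)^2 / E over the classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [8, 4, 5, 6, 7]      # class counts from (2.3), n=30
expected = [6.0] * 5            # 30 * 0.2 per class
stat = pearson_chi2(observed, expected)
df = len(observed) - 1 - 2      # two estimated parameters: lamda and c
p_value = chi2_sf(stat, df)
print(round(stat, 6), df, round(p_value, 4))
```

The same two functions apply to the other frequency tables in this section once the observed and expected counts are typed in.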
(2.4) n=200,
class        [ 1 ]     [ 2 ]     [ 3 ]     [ 4 ]     [ 5 ]     [ 6 ]     [ 7 ]     [ 8 ]
lower limit  -0.99517  -0.86123  -0.70661  -0.52374  -0.29991  -0.01136  0.39534   1.09060
upper limit  -0.86123  -0.70661  -0.52374  -0.29991  -0.01136  0.39534   1.09060
observed no  23.00000  20.00000  28.00000  24.00000  23.00000  34.00000  26.00000  22.00000
probability  0.12500   0.12500   0.12500   0.12500   0.12500   0.12500   0.12500   0.12500
expected no  25.00000  25.00000  25.00000  25.00000  25.00000  25.00000  25.00000  25.00000
chi square   0.16000   1.00000   0.36000   0.04000   0.16000   3.24000   0.04000   0.36000
degree of freedom=5
p-value=0.373500
“The best parameter value method about goodness of fit”

lamda value from 0.830806 to 1.246209
c value from -0.975086 to -1.015251
H0: X1~Shifted exponential(lamda=0.996968,c=-0.995168),
class        [ 1 ]      [ 2 ]     [ 3 ]     [ 4 ]     [ 5 ]    [ 6 ]    [ 7 ]    [ 8 ]
lower limit  -0.99517   -0.19477  0.60563   1.40604   2.20644  3.00684  3.80724  4.60764
upper limit  -0.19477   0.60563   1.40604   2.20644   3.00684  3.80724  4.60764  5.40804
observed no  104.00000  58.00000  23.00000  8.00000   3.00000  3.00000  0.00000  1.00000
probability  0.54976    0.24752   0.11145   0.05018   0.02259  0.01017  0.00458  0.00375
expected no  109.95195  49.50479  22.28905  10.03543  4.51835  2.03434  0.91594  0.75014
chi square   0.32219    1.45781   0.02268   0.41283   0.51023  0.45837  0.91594  0.08323
pearson chi square test statistic=4.183288
degree of freedom=5
p-value=0.523300
Correction: the frequency table is adjusted so that the expected number is >= 5 in
each cell.
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ]
lower limit -0.99517 -0.19477 0.60563 1.40604 2.20644
upper limit -0.19477 0.60563 1.40604 2.20644 5.40804
observed no 104.00000 58.00000 23.00000 8.00000 7.00000
probability 0.54976 0.24752 0.11145 0.05018 0.04109
expected no 109.95195 49.50479 22.28905 10.03543 8.21878
chi square 0.32219 1.45781 0.02268 0.41283 0.18073
degree of freedom=2
p-value=0.301700
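The adjustment above merges the sparse upper classes until the expected number in the last class reaches 5. A minimal sketch of that pooling step follows; the function name `pool_tail` is an assumption made for illustration, and it handles only small classes at the upper end, which is the case in this table.

```python
def pool_tail(observed, expected, min_expected=5.0):
    """Merge trailing classes into their left neighbour until the last
    class has an expected count of at least min_expected."""
    obs, exp = list(observed), list(expected)
    while len(exp) > 1 and exp[-1] < min_expected:
        last_o, last_e = obs.pop(), exp.pop()
        obs[-1] += last_o
        exp[-1] += last_e
    return obs, exp


# Counts from the 8-class table for n=200.
observed = [104, 58, 23, 8, 3, 3, 0, 1]
expected = [109.95195, 49.50479, 22.28905, 10.03543,
            4.51835, 2.03434, 0.91594, 0.75014]

obs, exp = pool_tail(observed, expected)
stat = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
print(obs, [round(e, 5) for e in exp], round(stat, 5))
```

With five classes and two estimated parameters the degrees of freedom drop to 5 - 1 - 2 = 2, which matches the adjusted table.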
(2.5) n=100,000,000, which is big data: goodness of fit (Pearson chi-square test
statistic) and the probability distribution.
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ] [ 10 ] [ 11 ] [ 12 ] [ 13 ] [ 14 ] [ 15 ]
[ 16 ] [ 17 ] [ 18 ] [ 19 ] [ 20 ]
lower limit -1.00000 -0.94871 -0.89465 -0.83749 -0.77688 -0.71234 -0.64335
-0.56925 -0.48922 -0.40221 -0.30691 -0.20156 -0.08379 0.04973 0.20387
0.38618 0.60930 0.89696 1.30239 1.99548
upper limit -0.94871 -0.89465 -0.83749 -0.77688 -0.71234 -0.64335 -0.56925
-0.48922 -0.40221 -0.30691 -0.20156 -0.08379 0.04973 0.20387 0.38618
0.60930 0.89696 1.30239 1.99548
observed no 4999364.00000 4996823.00000 5004706.00000 4999628.00000 4999942.00000 5001463.00000
5001842.00000 5002197.00000 4999556.00000 4999314.00000 4999025.00000 4995225.00000
4999502.00000 5000939.00000 5000360.00000 5000155.00000 5000682.00000 4997930.00000
4999445.00000 5001902.00000
probability 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000
0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000
0.05000 0.05000 0.05000 0.05000 0.05000
expected no 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000
5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000
5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000
5000000.00000 5000000.00000
chi square 0.08090 2.01867 4.42929 0.02768 0.00067 0.42807 0.67859
0.96536 0.03943 0.09412 0.19012 4.56013 0.04960 0.17634 0.02592
0.00481 0.09302 0.85698 0.06161 0.72352
degree of freedom=17
lamda point estimated value=1.000084 (MLE), c point estimated value=-1.000000 (MLE)
pearson chi-square test statistic =15.504827 , p-value=0.559100
The probability distribution,

Variance : 0.99991
S.D. : 0.99996
MAD : 0.73575
Range : 18.22941
Mid_range : 8.11470
Median : -0.30701
Q1 : -0.71235
Q2 : -0.30701
Q3 : 0.38618
IQR : 1.09853
C.V. : none
Curve-fitting estimated the cumulative distribution function,
F(X)=1- exp( -1*((X- -0.9999999991)/ 1.0001792808 )^ 0.9999051744 )
Error=0.000028150335119980,MAX=0.000124377304293266,
coefficient of determination=0.999999983600273760
[Figure: comparison of the estimated values and the sample data.]
Big data is the whole population, so no population distribution needs to be assumed; the
population distribution is obtained directly from the data by the curve-fitting method.
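As an illustration of how close the fitted curve is to the population that was simulated, the sketch below compares the two on a grid. It rests on an assumption: the fitted line is read as the Weibull-type form 1 - exp(-((X - c)/scale)^power), since the parentheses in the printed formula are ambiguous. The function names and the grid are also choices made here.

```python
import math


def fitted_cdf(x, c=-0.9999999991, scale=1.0001792808, power=0.9999051744):
    """Fitted curve as read here: 1 - exp(-((x - c)/scale)**power)."""
    if x <= c:
        return 0.0
    return 1.0 - math.exp(-(((x - c) / scale) ** power))


def shifted_exponential_cdf(x, lam=1.0, c=-1.0):
    """Exact CDF of Shifted_exponential(lam, c): 1 - exp(-lam*(x - c))."""
    return 0.0 if x <= c else 1.0 - math.exp(-lam * (x - c))


# Grid over x from -0.99 to 14.0 in steps of 0.01.
grid = [-1.0 + 0.01 * i for i in range(1, 1501)]
max_gap = max(abs(fitted_cdf(x) - shifted_exponential_cdf(x)) for x in grid)
print(max_gap)
```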
1.3. Hypothesis testing is not an analysis method for big data
Hypothesis testing is a method of statistics; it obtains information about the population
from a test. The test result is not always true, only sometimes, and in some cases the
sampling distribution of the test statistic cannot be linked to the critical value.
Big data is population data, so it is not necessary to test the parameters of the
population. The characteristics of the population can be obtained directly from the big
data, and the result is real and correct.
                      Hypothesis and test                 Probability distribution

The parameter of      The hypothesis and test can get     The big data can be formed into a
one population        the parameter value, but it is      specific probability distribution.
distribution          not always right.
The comparison of     The big data can be formed into     The big data can be formed into a
parameters of two     a specific probability              specific probability distribution
population            distribution.                       and the probability distribution
distributions                                             can be transferred.
Many population       The big data can be formed into     The big data can be formed into a
distributions         a specific probability              specific probability distribution
analysis              distribution.                       and the probability distribution
                                                          can be transferred.
Experiment design     The big data can be formed into     The big data can be formed into a
                      a specific probability              specific probability distribution
                      distribution.                       and the probability distribution
                                                          can be transferred.
The line model        The big data can be formed into     The big data can be formed into a
                      a specific probability              specific probability distribution
                      distribution.                       and curve-linear analysis.
System integration    It is impossible to do.             The probability distribution can
and analysis                                              be transferred when the
                                                          mathematical model is set.
Simulator             Output the simulated sample         Simulate data according to the
                      data.                               model and compare the simulated
                                                          data with the real data.
The comparison of     It is impossible to do.             SLLN and the probability
system designs                                            distribution transferred.
Example 3. X_1 ~ Normal(\mu_{X_1} = 100, \sigma_{X_1}^2 = 10^2). A sample of size n is
simulated, with n=500,000,000; it is big data.
(3.1)Hypothesis and test
* Suppose the population distribution is the normal distribution.
1. one population mean test and mu confidence interval when population sigma is
unknown
H0: mu=0 , mu is population mean
t(df=499999999)=223600.346338
which formula is t=(X1 sample mean-0)/standard error
the standard error =sample stand deviation/(n-1)^0.5, n is sample size=500000000
left tail test p-value= 1.0000, right tail test p-value= 0.0000
two tailes test p-value= 0.0000
90% confidence interval for mu, [99.999350 , 100.000822]
2. one population sigma confidence interval when population mean is unknown

90% confidence interval for population variance, [99.995540 , 100.016347]
90% confidence interval for population standard deviation
[9.999777 , 10.000817]
[9.999677 , 10.000917]
[9.999483 , 10.001112]
3.One population mean test , the population standard deviation is unknown

H0: mu=100.000000 , mu is population mean ,
the sample standard deviation=10.000297,The sample mean=100.000086
the test statistic t(df=499999999)=0.192177 ,
which formula is t=(X1 sample mean-100)/standard error
the standard error =sample stand deviation/(n-1)^0.5, n is sample size=500000000
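The two t statistics and the 90% confidence interval for mu can be recomputed from the printed summary values. This is a sketch under stated assumptions: it uses the rounded sample mean and standard deviation shown in the output, so the last digits may differ, and it replaces the t quantile by the standard normal quantile because the degrees of freedom are so large. The variable and function names are chosen here for illustration.

```python
import math

n = 500_000_000
sample_mean = 100.000086     # rounded value from the output
sample_sd = 10.000297        # rounded value from the output

# Standard error as printed in the output: s / sqrt(n - 1).
standard_error = sample_sd / math.sqrt(n - 1)


def t_statistic(mu0):
    """t = (sample mean - mu0) / standard error."""
    return (sample_mean - mu0) / standard_error


t_zero = t_statistic(0.0)        # H0: mu = 0
t_hundred = t_statistic(100.0)   # H0: mu = 100

# Upper 5% point of Normal(0,1), used in place of the t quantile.
z_95 = 1.6448536269514722
ci = (sample_mean - z_95 * standard_error,
      sample_mean + z_95 * standard_error)
print(t_zero, t_hundred, ci)
```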
4. one population sigma test when population mean is unknown

H0: sigma=10.000000 , sigma is population standard deviation ,
sample mean=100.000086,The sample variance=100.005942
The test static chi-square(df=499999999)=500029711.3602 ,
which formula is chi-square=(n-1)*(Sample Variance)/100.000000
n is sample size=500000000
\bar{X}_n \xrightarrow{a.s.} \mu = 100 and S_n^2 \xrightarrow{a.s.} \sigma^2 = 100 as n \to \infty,
(3.2)n=500,000,000, the probability distribution,

Variance : 100.00594
S.D. : 10.00030
MAD : 7.97905
Range : 119.02763
Mid_range : 99.34521
Median : 100.00013
Q1 : 93.25510
Q2 : 100.00013
Q3 : 106.74498
IQR : 13.48988
C.V. : 0.10000
(3.3) Comparison of the cumulative probability distribution functions of X1 and X2,

X1 is the big data and X2~ Normal(100,100). This is the SLLN method,
E(| X1 distribution - X2 distribution |^2)= 0.0000006913
************ The | X1 distribution F() - X2 distribution F()| ****************
The almost surely limiting theory
E(| X1 distribution F() - X2 distribution F()|^2)= 0.0000000003
Pr(| X1 distribution F() - X2 distribution F()|< 0.1000000000)= 1.000000
The probability limiting theory
Pr(| X1 distribution F() - X2 distribution F()|>= 0.1000000000)= 0.000000
[Figure: red line is X1, blue line is X2.]
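The idea of comparing the distribution of the data with a named distribution can be sketched as follows. The exact definitions behind the printed quantities are not given here, so this is one plausible reading and not the authors' procedure: the empirical distribution function of a simulated sample is compared with the Normal(100,100) distribution function at every sample point, and the mean squared gap and the largest gap are reported. The sample size of 50,000, the seed and the function names are assumptions chosen to keep the run short.

```python
import math
import random


def normal_cdf(x, mu, sigma):
    """Distribution function of Normal(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def compare_with_normal(n=50_000, mu=100.0, sigma=10.0, seed=1):
    """Mean squared and largest gap between empirical and model CDF,
    evaluated at the sample points."""
    rng = random.Random(seed)
    data = sorted(rng.gauss(mu, sigma) for _ in range(n))
    squared, largest = 0.0, 0.0
    for i, x in enumerate(data, start=1):
        gap = abs(i / n - normal_cdf(x, mu, sigma))
        squared += gap * gap
        largest = max(largest, gap)
    return squared / n, largest


mean_squared_gap, max_gap = compare_with_normal()
print(mean_squared_gap, max_gap)
```

Both gaps shrink as n grows, which is the behaviour the comparison above relies on.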
(3.4) Curve-fitting estimated the distribution function,

F(X)= 0.03968240540137848300+
0.00856160766427638970*(X-82.45015371561977700000)^1+
0.00073748677735076284*(X-82.45015371561977700000)^2+
0.00002891985767975122*(X-82.45015371561977700000)^3+
0.00000041879600578094*(X-82.45015371561977700000)^4+
value range 0.0000000000<=F(x)<= 0.1000000000 ,
value range 39.8313898088<=X<= 87.1841416787 ,
determination=0.999986237417931580,

F(X)= 0.14810273312869901000+
0.02312860086114329500*(X-89.55329031578212100000)^1+
0.00119685027951833830*(X-89.55329031578212100000)^2+
value range 0.1000000020<=F(x)<= 0.2000000000 ,
value range 87.1841416945<=X<= 91.5834299641 ,
determination=0.999999939832695420,

F(X)= 0.24910111552059244000+
0.03167274804946222000*(X- 93.22671303764495600000)^1+
0.00107787449420609920*(X- 93.22671303764495600000)^2+
value range 0.2000000020<=F(x)<= 0.3000000000 ,
value range 91.5834300029<=X<= 94.7556650144 ,
determination=0.999999724102138330,

F(X)=1 / (( 1+(x/100.0237845663)^ 15.6585935074)
value range 0.3000000020<=F(x)<= 0.4000000000 ,
value range 94.7556650642<=X<= 97.4667068903 ,
determination=1.000000000000000000,

F(X)= 0.44986693054116655000+
0.03951809184188455300*(X- 98.74016410595750400000)^1+
0.00024935500785239206*(X- 98.74016410595750400000)^2+
value range 0.4000000020<=F(x)<= 0.5000000000 ,
value range 97.4667070173<=X<= 100.0001263519 ,
determination=0.999999497426736770,

F(X)= 0.55013498602882427000+
0.03952380199343007900*(X- 101.25995288637667000000)^1+
-0.00025301515392373020*(X-101.25995288637667000000)^2+
value range 0.5000000020<=F(x)<= 0.6000000000 ,
value range 100.0001263596<=X<= 102.5334821574 ,
determination=0.999999333698843640,

F(X)= 0.65043395243913082000+
0.03696571736418136100*(X-103.86500204918346000000)^1+
-0.00071093140888445205*(X-103.86500204918346000000)^2+
value range 0.6000000020<=F(x)<= 0.7000000000 ,
value range 102.5334821646<=X<= 105.2440290406 ,
determination=0.999999487878982520,

F(X)= 0.75089598881159303000+
0.03167282089782098900*(X-106.77312308791723000000)^1+
-0.00107443126944750670*(X-106.77312308791723000000)^2+
value range 0.7000000020<=F(x)<= 0.8000000000 ,
value range 105.2440290438<=X<= 108.4164524644 ,
determination=0.999999643026273980,

F(X)= 0.85189748483166849000+
0.02312745232149512900*(X- 110.44673623957767000000)^1+
-0.00119686135738439340*(X- 110.44673623957767000000)^2+
value range 0.8000000020<=F(x)<= 0.9000000000 ,
value range 108.4164524799<=X<= 112.8159940328 ,
determination=0.999999955285428730,

F(X)= 0.96031622665227356000+
0.00855918566910951470*(X- 117.55085838942058000000)^1+
-0.00073734019250377980*(X-117.55085838942058000000)^2+
0.00002895927346734280*(X-117.55085838942058000000)^3+
-0.00000042095490929384*(X-117.55085838942058000000)^4+
value range 0.9000000020<=F(x)<= 0.9999999980 ,
value range 112.8159941460<=X<= 158.8590234399 ,
determination=0.999988321082443730
[Figure: comparison of the estimated values and the sample data.]
Chapter 2. The population distribution test and the population mean and variance test
2.1. The population distribution test

The test statistic is the Pearson chi-square goodness-of-fit test statistic.
(1) Formula,
A sample of size n is drawn from a specific population, and the observations are
independent.
H_0: a specific population distribution
H_1: against H_0
The frequency distribution table is used: the specific population distribution is
converted into a table of k classes.
H_0: P_1 = P_1^0, P_2 = P_2^0, \ldots, P_k = P_k^0,
H_1: against H_0
P_1^0, P_2^0, \ldots, P_k^0 are pre-assumed values and \sum_{i=1}^{k} P_i^0 = 1.
The frequency of the i-th class is X_i, i = 1, 2, \ldots, k, with X_i ~ Binomial(n, P_i).
Under the null hypothesis X_i ~ Binomial(n, P_i^0), E(X_i) = n P_i^0, Var(X_i) = n P_i^0 (1 - P_i^0).
The actually observed frequency is O_i, i = 1, 2, \ldots, k, and \sum_{i=1}^{k} O_i = n.
Pearson chi-square test statistic:
\chi^2_{df} = \sum_{i=1}^{k} \frac{(O_i - E(X_i))^2}{E(X_i)} = \sum_{i=1}^{k} \frac{(O_i - n P_i^0)^2}{n P_i^0} > \chi^2_{\alpha, df}, reject the null hypothesis.
df = k - 1 - the number of point estimators.
(2) The distribution of big data,

The goodness-of-fit test is useless for big data. Curve-fitting and curve-linear analysis
can obtain the distribution of big data, and the SLLN analysis can also name the
distribution of big data.
Note 1: please refer to appendix 2.
Note 2: There are 20 probability distributions that can be the null hypothesis,

Uniform       Normal               Shifted exponential   Pareto1        Pareto2
Rayleigh      Double exponential   Lognormal             Gamma          Beta
Cauchy        Arcsin               Gumbel                Triangular 1   Trapezoid
U-quadratic   Semicircle           Logistic              Weibull        Pareto3
Example 4. The population is Normal(0,1); the simulated sample data has size 100,
(4.1) Normal(0,1) probability distribution,
Normal(0,1) Coefficient
Variance : 0.99994
S.D. : 0.99997
MAD : 0.79783
Range : 10.84608
Mid_range : -0.03259
Median : -0.00009
Q1 : -0.67455
Q2 : -0.00009
Q3 : 0.67426
IQR : 1.34881
C.V. : none
(4.2) The population distribution is assumed to be each of the 20 kinds of probability
distribution in turn, and the goodness-of-fit test is done.
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
lower limit -1.17046 -0.60521 -0.17064 0.23492 0.66939 1.23442
upper limit -1.17046 -0.60521 -0.17064 0.23492 0.66939 1.23442
observed no 12.00000 19.00000 11.00000 16.00000 9.00000 21.00000 12.00000
probability 0.14286 0.14286 0.14286 0.14286 0.14286 0.14286 0.14286
expected no 14.28571 14.28571 14.28571 14.28571 14.28571 14.28571 14.28571
chi square 0.36571 1.55571 0.75571 0.20571 1.95571 3.15571 0.36571
degree of freedom=4
H0: X1~Normal(mu,sigma*sigma), mu,sigma are unknown
population mean(mu) point estimated value=0.032257 (MLE,UMVUE)
population variance(sigma*sigma) which point estimated value=1.268638
(UMVUE) , pearson chi-square test statistic =8.360000, p-value=0.079200
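The class limits in the Normal table above appear to be chosen so that each of the 7 classes has probability 1/7 under the fitted distribution. The sketch below illustrates that construction with the standard library's `statistics.NormalDist`; the helper name `equal_probability_limits` is an assumption made here, and the construction itself is inferred from the printed limits rather than stated by the authors. The computed limits should land close to the printed ones.

```python
from statistics import NormalDist


def equal_probability_limits(dist, k):
    """Interior class limits that give each of k classes probability 1/k."""
    return [dist.inv_cdf(i / k) for i in range(1, k)]


mu_hat = 0.032257
sigma_hat = 1.268638 ** 0.5      # the output reports the variance
limits = equal_probability_limits(NormalDist(mu_hat, sigma_hat), k=7)
expected = 100 / 7               # n * (1/k) expected count per class
print([round(x, 5) for x in limits], round(expected, 5))
```

Equal-probability classes make every expected count the same, which is why the expected row shows 14.28571 throughout.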

class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
lower limit -2.58884 -1.43306 -0.81174 -0.24907 0.31359 0.87625 1.49757
upper limit -1.43306 -0.81174 -0.24907 0.31359 0.87625 1.49757 3.31913
observed no 9.00000 16.00000 17.00000 18.00000 16.00000 16.00000 8.00000
probability 0.14286 0.14286 0.14286 0.14286 0.14286 0.14286 0.14286
expected no 14.28571 14.28571 14.28571 14.28571 14.28571 14.28571 14.28571
chi square 1.95571 0.20571 0.51571 0.96571 0.20571 0.20571 2.76571
degree of freedom=4
H0: X1~trapezoid(mu,c), mu,c are unknown
mu point estimated value=0.032257
c point estimated value=1.969321 (MLE)
pearson chi-square test statistic =6.820000, p-value=0.145700

class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
lower limit -1.08037 -0.53673 -0.14639 0.21090 0.60125 1.14489
upper limit -1.08037 -0.53673 -0.14639 0.21090 0.60125 1.14489
observed no 15.00000 18.00000 11.00000 10.00000 13.00000 18.00000 15.00000
probability 0.14286 0.14286 0.14286 0.14286 0.14286 0.14286 0.14286
expected no 14.28571 14.28571 14.28571 14.28571 14.28571 14.28571 14.28571
chi square 0.03571 0.96571 0.75571 1.28571 0.11571 0.96571 0.03571
degree of freedom=4
H0: X1~Logistic(mu,sigma), mu,sigma are unknown
mu point estimated value=0.032257 (MME)
sigma point estimated value=0.620970 (MME)
Three kinds of probability distribution are not rejected.
(4.3) The 3 kinds of probability distribution are

X1~ Normal(mu=0.032257,sigma*sigma=1.268638),
X2~ Trapezoid(mu=0.032257,c=1.969321),
X3~ Logistic(mu=0.032257,sigma=0.620970),
f(x1),F(x1) Coefficient
Variance : 1.26846
S.D. : 1.12626
MAD : 0.89862
Range : 12.95929
Median : 0.03233
Q1 : -0.72737
Q2 : 0.03233
Q3 : 0.79171
IQR : 1.51908
C.V. : 34.95726
Variance : 1.61572
S.D. : 1.27111
MAD : 1.06662
Range : 5.90724
Mid_range : 0.03241
Median : 0.03227
Q1 : -0.95217
Q2 : 0.03227
Q3 : 1.01697
IQR : 1.96914
C.V. : 39.20616
Variance : 1.26824
S.D. : 1.12616
MAD : 0.86076
Range : 17.14903
Mid_range : 0.03484
Median : 0.03235
Q1 : -0.64980
Q2 : 0.03235
Q3 : 0.71441
IQR : 1.36422
C.V. : 34.90190
(4.4) Comparison of the cumulative probability distribution functions of X1 and X2,

X1 is one of the three kinds of probability distribution and X2 is the big data.
[Figure panels: Normal(0.032257, 1.268638), red line, against Normal(0,1), blue line;
Trapezoid(0.032257, 1.969321), red line, against Normal(0,1), blue line;
Logistic(0.032257, 0.620970), red line, against Normal(0,1), blue line.]
(4.5) The comparison of two distribution functions,
[Figure panels: Normal(0.032257, 1.268638) against Normal(0,1);
Trapezoid(0.032257, 1.969321) against Normal(0,1);
Logistic(0.032257, 0.620970) against Normal(0,1).]

The goodness of fit test is not a good analysis tool.
Example 5. The population is U_quadratic(0,1)+U_quadratic(0,1); the simulated sample
data has size 100,000,000,
(5.1)The frequency distribution table,
(5.2)The probability distribution
pdf,cdf Coefficient
Variance : 0.30000
S.D. : 0.54773
MAD : 0.42858
Range : 1.99996
Mid_range : 1.00001
Median : 1.00002
Q1 : 0.63663
Q2 : 1.00002
Q3 : 1.36463
IQR : 0.72799
C.V. : 0.54770
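The variance and median in the table above can be checked roughly with a small simulation. The sketch assumes that U_quadratic(0,1) means the U-quadratic distribution on the interval [0, 1], with density 12*(x - 0.5)^2, and that the two terms of the sum are independent; the function names, sample size and seed are choices made here. Draws are generated by inverting the distribution function F(x) = 4*((x - 0.5)^3 + 0.125).

```python
import math
import random


def u_quadratic_01(rng):
    """Inverse-CDF draw from the U-quadratic distribution on [0, 1]."""
    v = rng.random() / 4.0 - 0.125          # equals (x - 0.5)^3
    return 0.5 + math.copysign(abs(v) ** (1.0 / 3.0), v)


def simulate_sum(n=200_000, seed=1):
    """Simulate n values of the sum of two independent U-quadratic draws."""
    rng = random.Random(seed)
    return [u_quadratic_01(rng) + u_quadratic_01(rng) for _ in range(n)]


data = simulate_sum()
mean = sum(data) / len(data)
variance = sum((x - mean) ** 2 for x in data) / len(data)
print(mean, variance)
```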
(5.3) Comparison of the cumulative probability distribution functions of X1 and X2,

X1 is the big data and X2~ U_quadratic(0,1)+U_quadratic(0,1).
This is the SLLN method,
[Figure: X1, the probability distribution generated from the sample data, red line;
X2~ U_quadratic(0,1)+U_quadratic(0,1), blue line.]
(5.4) Curve-fitting estimated the distribution function,

F(X)= -0.00334591492694480410+0.14497325225680413000*(X/(1+X))^1+
2.96138394689495500000*(X/(1+X))^2+
value range 0.0000000000<=F(x)<= 0.1000000000 ,
value range 0.0000293275<=X<= 0.1975766721 ,
determination=0.999691012339424030,

F(X)= 0.07229677913710475000+-0.66993094980716705000*(log(X))^1+
-0.85403209179639816000*(log(X))^2+-0.36817971337586641000*(log(X))^3+
-0.05536942742764949800*(log(X))^4+
value range 0.1000000100<=F(x)<= 0.2000000000 ,
value range 0.1975766814<=X<= 0.3655230421 ,

determination=0.999999976942073650,

F(X)= 2.71219043666496870000+-34.68482831120491000000*(X/(1+X))^1+
171.64077770709991000000*(X/(1+X))^2+-363.63223952054977000000*(X/(1+X))^3+
282.19495049118996000000*(X/(1+X))^4+
value range 0.2000000100<=F(x)<= 0.3000000000 ,
value range 0.3655230811<=X<= 0.8227063715 ,
determination=0.999925497327482820,

F(X)= 0.49864384187537780000+1.75296088970935670000*(log(X))^1+
4.78648839105153460000*(log(X))^2+5.22951845166971910000*(log(X))^3+
value range 0.3000000100<=F(x)<= 0.4000000000 ,
value range 0.8227063780<=X<= 0.9342973850 ,
determination=0.999999898459132060,

F(X)= 0.49995978687259601000+1.79922763805052450000*(log(X))^1+
5.33583558480313510000*(log(X))^2+7.40354398963972930000*(log(X))^3+
value range 0.4000000100<=F(x)<= 0.5000000000 ,
value range 0.9342973909<=X<= 1.0000204842 ,
determination=0.999999941836549610,

F(X)= 0.49997810440663848000+1.79953844068472790000*(log(X))^1+
-3.58430101289468440000*(log(X))^2+
value range 0.5000000100<=F(x)<= 0.6000000000 ,
value range 1.0000204907<=X<= 1.0657183497 ,
determination=0.999999954318608330,

F(X)= -16.29824006929993600000+60.17138575017452200000*(X/(1+X))^1+
53.14293237030506100000*(X/(1+X))^2+
value range 0.6000000100<=F(x)<= 0.7000000000 ,
value range 1.0657183567<=X<= 1.1774847537 ,
determination=0.999999017568134920,

F(X)= 0.44682687934005116000+2.56094765743910100000*(log(X))^1+
-7.44055478803056760000*(log(X))^2+7.51058122712129260000*(log(X))^3+
value range 0.7000000100<=F(x)<= 0.8000000000 ,
value range 1.1774847585<=X<= 1.6345008075 ,
determination=0.999931094069408610,

F(X)= 22.04035533964633900000+-71.52778774499893200000*(X/(1+X))^1+
60.10756823420524600000*(X/(1+X))^2+
value range 0.8000000100<=F(x)<= 0.9000000000 ,
value range 1.6345008155<=X<= 1.8023945831 ,
determination=0.999984251538919790,

F(X)= 0.95307219267465504000+0.64305513179196971000*(X-1.87914270610787360000)^1+
-1.17603480446899770000*(X-1.87914270610787360000)^2+
-7.41345334551942870000*(X-1.87914270610787360000)^3+
value range 0.9000000100<=F(x)<= 0.9999999900 ,
value range 1.8023945889<=X<= 1.9999863146 ,
determination=0.999996289406651420
[Figure: comparison of the estimated values and the sample data.]
(5.5) Curve-fitting estimated the random variable value,

The random variable value estimated line ------
X=0.00006180438504088670+0.46823816420510411000*F(x)^(0.5*1)+
0.42375092953443527000*F(x)^(0.5*2)+ -2.48992633819580080000*F(x)^(0.5*3)+
32.89773559570312500000*F(x)^(0.5*4)+ -196.52471923828125000000*F(x)^(0.5*5)+
603.33581542968750000000*F(x)^(0.5*6)+ -730.49261474609375000000*F(x)^(0.5*7)+
0.000000<F(x)<=0.050000
Error=0.199174442701451710 MAX=0.019677903377411807

X=0.02279628254473209400+3.140324473381042500000000000000*log(1+F(x))^1+
-37.506774902343750000000000000000*log(1+F(x))^2+
460.195007324218750000000000000000*log(1+F(x))^3+
-2987.257324218750000000000000000000*log(1+F(x))^4+
8170.607421875000000000000000000000*log(1+F(x))^5+
0.050000<F(x)<=0.100000
Error=0.000000035279996120 MAX=0.000013487041533727

X=0.05922343023121357000+-1.34120517969131470000*log(1-F(x))^1+
0.31472492218017578000*log(1-F(x))^2+9.66924285888671870000*log(1-F(x))^3+
39.40975189208984400000*log(1-F(x))^4+
0.100000<F(x)<=0.150000
Error=0.000000057610853700 MAX=0.000015114248538117

X=0.25951731204986572000+3.18989372253417970000*log(1-F(x))^1+
38.71198272705078100000*log(1-F(x))^2+154.18635559082031000000*log(1-F(x))^3+
243.35913085937500000000*log(1-F(x))^4+
0.150000<F(x)<=0.200000
Error=0.000000100806699716 MAX=0.000023768200877461

X=94.83422088623046900000+-2635.742675781250000000000000000000*log(1+F(x))^1+
25653.484375000000000000000000000000*log(1+F(x))^2+
-47267.531250000000000000000000000000*log(1+F(x))^3+
-957365.000000000000000000000000000000*log(1+F(x))^4+
7889714.000000000000000000000000000000*log(1+F(x))^5+
-24546912.000000000000000000000000000000*log(1+F(x))^6+
28345208.000000000000000000000000000000*log(1+F(x))^7+
0.200000<F(x)<=0.250000
Error=0.000747488002795274 MAX=0.004701226371604084

X=43.40369099378585800000+
142.496345520019530000000000000000*log(F(x)/(1-F(x)))^1+
112.567829132080080000000000000000*log(F(x)/(1-F(x)))^2+
-63.647373199462891000000000000000*log(F(x)/(1-F(x)))^3+
-57.506805419921875000000000000000*log(F(x)/(1-F(x)))^4+
43.805576324462891000000000000000*log(F(x)/(1-F(x)))^5+
-47.816497802734375000000000000000*log(F(x)/(1-F(x)))^6+
-120.333984375000000000000000000000*log(F(x)/(1-F(x)))^7+
-47.577613830566406000000000000000*log(F(x)/(1-F(x)))^8+
0.250000<F(x)<=0.300000, Error=0.000068903077138949 MAX=0.000954205252581386

X=0.92433845996856689000+
-0.352319240570068360000000000000*log(F(x)/(1-F(x)))^1+
-1.253063201904296900000000000000*log(F(x)/(1-F(x)))^2+
-1.319009780883789100000000000000*log(F(x)/(1-F(x)))^3+
-0.587675571441650390000000000000*log(F(x)/(1-F(x)))^4+
0.300000<F(x)<=0.350000, Error=0.000000033064832210 MAX=0.000013235762259201

X=0.99537280201911926000+
0.12942987680435181000*tan((F(x)-0.5)*pi)^1+
-0.26254975795745850000*tan((F(x)-0.5)*pi)^2+
-0.33274817466735840000*tan((F(x)-0.5)*pi)^3+
-0.24375689029693604000*tan((F(x)-0.5)*pi)^4+
0.350000<F(x)<=0.400000, Error=0.000000007801631226 MAX=0.000006385159202149
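Each estimated line in (5.5) gives X as a function of F(x), so it can be used as a piece of a quantile function. The sketch below evaluates the piece for 0.35 < F(x) <= 0.40 and compares its value at F(x) = 0.40 with the X value 0.9342973850 listed for the same probability in (5.4). It then feeds a few uniform probabilities through the piece, which is the inverse-transform idea behind a simulator. The function name, the seed and the choice of this one piece are assumptions made for illustration.

```python
import math
import random

# Coefficients copied from the piece covering 0.35 < F(x) <= 0.40.
COEFFS = [
    0.99537280201911926000,
    0.12942987680435181000,
    -0.26254975795745850000,
    -0.33274817466735840000,
    -0.24375689029693604000,
]


def quantile_piece(p):
    """X as a polynomial in tan((p - 0.5)*pi), valid for 0.35 < p <= 0.40."""
    if not 0.35 < p <= 0.40:
        raise ValueError("p is outside the range of this piece")
    t = math.tan((p - 0.5) * math.pi)
    x = 0.0
    for a in reversed(COEFFS):
        x = x * t + a
    return x


x_at_040 = quantile_piece(0.40)

# Inverse-transform idea: uniform probabilities go through the quantile curve.
rng = random.Random(1)
draws = [quantile_piece(rng.uniform(0.3501, 0.40)) for _ in range(5)]
print(x_at_040, draws)
```

A complete simulator would draw the probability on (0, 1) and select the piece whose F(x) range contains it.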

X= 0.99909142218530178000+
0.125923931598663330000000000000*log(F(x)/(1-F(x)))^1+
-0.114378780126571660000000000000*log(F(x)/(1-F(x)))^2+
-0.133997142314910890000000000000*log(F(x)/(1-F(x)))^3+
-0.143139779567718510000000000000*log(F(x)/(1-F(x)))^4+
0.400000<F(x)<=0.450000
Error=0.000000004920412543 MAX=0.000004631920769271

X=1.00002327679612790000+
0.17869696719571948000*tan((F(x)-0.5)*pi)^1+
0.06632557511329650900*tan((F(x)-0.5)*pi)^2+
4.32947254180908200000*tan((F(x)-0.5)*pi)^3+
67.88774108886718800000*tan((F(x)-0.5)*pi)^4+
589.52453613281250000000*tan((F(x)-0.5)*pi)^5+
2660.26855468750000000000*tan((F(x)-0.5)*pi)^6+
4830.36816406250000000000*tan((F(x)-0.5)*pi)^7+
0.450000<F(x)<=0.500000
Error=0.000000003958578990 MAX=0.000004794943887165

X=1.00001699286349320000+
0.17700911406427622000*tan((F(x)-0.5)*pi)^1+
0.05480703711509704600*tan((F(x)-0.5)*pi)^2+
0.94527626037597656000*tan((F(x)-0.5)*pi)^3+
-20.32119750976562500000*tan((F(x)-0.5)*pi)^4+
226.63073730468750000000*tan((F(x)-0.5)*pi)^5+
-1219.08886718750000000000*tan((F(x)-0.5)*pi)^6+
2496.00781250000000000000*tan((F(x)-0.5)*pi)^7+
0.500000<F(x)<=0.550000
Error=0.000000003683153647 MAX=0.000004970831049445

X=0.99897602945566177000+
0.165663361549377440000000000000*log(F(x)/(1-F(x)))^1+
-0.195174217224121090000000000000*log(F(x)/(1-F(x)))^2+
1.037982940673828100000000000000*log(F(x)/(1-F(x)))^3+
-2.019107818603515600000000000000*log(F(x)/(1-F(x)))^4+
1.555332183837890600000000000000*log(F(x)/(1-F(x)))^5+
0.550000<F(x)<=0.600000
Error=0.000000013359409620 MAX=0.000008187808703930

X=1.01905836164951320000+
-0.014826655387878418000000000000*log(F(x)/(1-F(x)))^1+
0.513012409210205080000000000000*log(F(x)/(1-F(x)))^2+
-0.614833831787109380000000000000*log(F(x)/(1-F(x)))^3+
0.344666481018066410000000000000*log(F(x)/(1-F(x)))^4+
0.600000<F(x)<=0.650000
Error=0.000000013311541231 MAX=0.000007735283408916

X=1.03756684064865110000+
-0.09692454338073730500*tan((F(x)-0.5)*pi)^1+
0.84510135650634766000*tan((F(x)-0.5)*pi)^2+
0.99522924423217773000*tan((F(x)-0.5)*pi)^3+
0.52366161346435547000*tan((F(x)-0.5)*pi)^4+
0.650000<F(x)<=0.700000
Error=0.000000044022322711 MAX=0.000014293000793364

X= 19.34960496425628700000+
-89.77158164978027300000*tan((F(x)-0.5)*pi)^1+
165.36785316467285000000*tan((F(x)-0.5)*pi)^2+
-135.05597400665283000000*tan((F(x)-0.5)*pi)^3+
41.47272789478302000000*tan((F(x)-0.5)*pi)^4+
0.700000<F(x)<=0.750000
Error=0.000283952128120614 MAX=0.001985943214733776
X=-245.96403503417969000000+
913.20166015625000000000*tan((F(x)-0.5)*pi)^1+
-1283.43377685546870000000*tan((F(x)-0.5)*pi)^2+
793.42468261718750000000*tan((F(x)-0.5)*pi)^3+
-131.82031250000000000000*tan((F(x)-0.5)*pi)^4+
-67.65928649902343700000*tan((F(x)-0.5)*pi)^5+
23.61195373535156300000*tan((F(x)-0.5)*pi)^6+
0.750000<F(x)<=0.800000
Error=0.000618953406496594 MAX=0.003635112268771001

X=1.67997968196868900000+-4.43243789672851560000*log(F(x))^1+
-48.23260498046875000000*log(F(x))^2+-186.43218994140625000000*log(F(x))^3+
-284.08203125000000000000*log(F(x))^4+
0.800000<F(x)<=0.850000
Error=0.000000037147194785 MAX=0.000014164343235423

X= 1.93015085160732270000+1.05131769180297850000*log(F(x))^1+
-3.22946548461914060000*log(F(x))^2+-22.43151855468750000000*log(F(x))^3+
-59.89041137695312500000*log(F(x))^4+
0.850000<F(x)<=0.900000
Error=0.000000034485397332 MAX=0.000014246611140356

X= 1.96804702514782550000+2.38401614129543300000*(F(x)-1)^1+
14.36566019058227500000*(F(x)-1)^2+93.85990142822265600000*(F(x)-1)^3+
229.57086181640625000000*(F(x)-1)^4+
0.900000<F(x)<=0.950000
Error=0.000000030426263239 MAX=0.000012447611644983

X=1.17790971230715510000+
0.561246736440807580000000000000*log(F(x)/(1-F(x)))^1+
-0.196631069760769610000000000000*log(F(x)/(1-F(x)))^2+
0.044083505636081100000000000000*log(F(x)/(1-F(x)))^3+
-0.006669080175925046200000000000*log(F(x)/(1-F(x)))^4+
0.000685060578689444810000000000*log(F(x)/(1-F(x)))^5+
-0.000046906166062399279000000000*log(F(x)/(1-F(x)))^6+
0.000002042033500515572100000000*log(F(x)/(1-F(x)))^7+
-0.000000050972907228441500000000*log(F(x)/(1-F(x)))^8+
0.000000000553990395224523980000*log(F(x)/(1-F(x)))^9+
0.950000<F(x)<=1.000000
Error=0.000000149711449056 MAX=0.000112595573454666
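
Each piece above is a polynomial in a transformed probability, valid on one interval of F(x). As a minimal sketch of how such a piece can be evaluated, the following uses the coefficients listed for 0.45 < F(x) <= 0.50; the function name `quantile_piece` and the use of Horner's rule are illustrative assumptions, not the authors' software.

```python
import math

# Coefficients copied from the piece listed for 0.45 < F(x) <= 0.50.
INTERCEPT = 1.00002327679612790000
COEFFS = [
    0.17869696719571948000,     # multiplies tan(...)^1
    0.06632557511329650900,     # multiplies tan(...)^2
    4.32947254180908200000,     # multiplies tan(...)^3
    67.88774108886718800000,    # multiplies tan(...)^4
    589.52453613281250000000,   # multiplies tan(...)^5
    2660.26855468750000000000,  # multiplies tan(...)^6
    4830.36816406250000000000,  # multiplies tan(...)^7
]


def quantile_piece(p):
    """Evaluate X = c0 + sum_k c_k * tan((p - 0.5) * pi)**k with Horner's rule."""
    if not 0.45 < p <= 0.50:
        raise ValueError("this piece is only listed for 0.45 < F(x) <= 0.50")
    z = math.tan((p - 0.5) * math.pi)
    acc = 0.0
    for c in reversed(COEFFS):
        acc = (acc + c) * z
    return INTERCEPT + acc


x_median = quantile_piece(0.50)
x_lower = quantile_piece(0.475)
print(x_median, x_lower)
```

At F(x) = 0.5 the tangent term vanishes, so the piece returns its intercept, which agrees with the median reported in the summary table that follows.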
the sample data.
The simulated estimated line is below,

Variance : 0.30008
S.D. : 0.54780
MAD : 0.42859
Range : 1.99475
Mid_range : 1.00002
Median : 1.00001
Q1 : 0.63873
Q2 : 1.00001
Q3 : 1.35939
IQR : 0.72067
C.V. : 0.54783
2.2. One population mean and population variance test

The exact sampling distribution of a test statistic is usually not available, so the
normal population assumption is traditionally required. The new software can simulate
the sampling distribution of a test statistic for any kind of population distribution.
Big data is population data, and the analysis method is the probability distribution.
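Example 6. The population is the Logistic distribution, population mean = 100,
population variance = 4, simulated 100 samples (n = 100).
(6.1) The central limit theorem is applied,
$$H_0: \mu = 100, \quad t_{99} = \frac{\bar{X} - 100}{S/\sqrt{n}} = \frac{\bar{X} - 100}{S/\sqrt{100}},$$
$$H_0: \sigma = 2, \quad \chi^2_{99} = \frac{(n-1)S^2}{4} = \frac{99 \times S^2}{4},$$
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} \text{ (sample mean)}, \quad S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1} \text{ (sample variance)},$$
(6.2) $t_{99} = \frac{\bar{X} - 100}{S/\sqrt{n}} = \frac{\bar{X} - 100}{S/\sqrt{100}}$, $W_2 = \frac{\bar{X} - 100}{S/\sqrt{100}}$ is the test statistic.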
Variance : 1.02019
S.D. : 1.01004
MAD : 0.80462
Range : 11.20633
Mid_range : 0.02111
Median : 0.00001
Q1 : -0.67859
Q2 : 0.00001
Q3 : 0.67859
IQR : 1.35719
C.V. : none
W2 has a symmetric distribution, $P(t_{99} \le t_{1-\alpha,99}) = \alpha$,

α               0.9       0.95      0.975     0.99      0.995
Critical value  1.291414  1.660411  1.981549  2.357562  2.614991
Student t (df=99),
α               0.9       0.95      0.975     0.99      0.995
Critical value  1.2900    1.6610    1.9854    2.3651    2.6270
This shows that W2 is not exactly the Student t (df=99) distribution.
Z (standard normal),
α               0.9       0.95      0.975     0.99      0.995
Critical value  1.28      1.645     1.96      2.326     2.576
Student t (df=99) is not the Z distribution, but Student t (df) → Z as df → ∞.
Comparison of the cumulative probability distribution functions of W2 and W0;
the analysis method is the SLLN (strong law of large numbers).
W2: red line; W0 ~ t distribution (df=99): blue line.
E(| W2 distribution - W0 distribution |^2)= 0.0000060300
************ The | W2 distribution F() - W0 distribution F()| ****************
E(| W2 distribution F() - W0 distribution F()|^2)= 0.0000001051
Pr(| W2 distribution F() - W0 distribution F()|< 0.1000000000)= 1.000000
Pr(| W2 distribution F() - W0 distribution F()|>= 0.1000000000)= 0.000000
W2 approaches t (df=99).
(6.3) $\chi^2_{99} = \frac{(n-1)S^2}{4} = \frac{99 \times S^2}{4}$, $W_3 = \frac{99 \times S^2}{4}$ is the test statistic.
S.D. : 17.74643
MAD : 14.03831
Range : 202.31477
Mid_range : 135.86554
Median : 97.56694
Q1 : 86.48509
Q2 : 97.56694
Q3 : 109.94606
IQR : 23.46098
C.V. : 0.17925
W3 does not have a symmetric distribution, $P(\chi^2_{99} \le \chi^2_{1-\alpha,99}) = \alpha$,
α               0.005       0.01        0.025       0.05        0.1
Critical value  60.995366   63.911996   68.402117   72.495428   77.480065
α               0.9         0.95        0.975       0.99        0.995
Critical value  122.353588  130.397197  137.777691  146.911876  153.446014

W3: red line; W0 ~ chi-square distribution (df=99): blue line.

W3 does not follow the chi-square distribution (df=99).
Note: The population is the Logistic distribution, population mean = 100,
population variance = 4, simulated 100,000,000 samples; please refer to
Appendix 7. The critical values for the Logistic population are in Appendix 8.
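
The simulation in Example 6 can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' software: it draws Logistic samples by inverting the CDF (scale chosen so that the variance is 4), uses 2,000 replications instead of 100,000,000, and the helper names `logistic_sample`, `simulate_w2_w3`, and `empirical_quantile` are hypothetical.

```python
import math
import random
import statistics


def logistic_sample(n, mean, variance, rng):
    """Draw n Logistic variates by inverting the CDF.

    The scale s satisfies variance = (s * pi)**2 / 3.
    """
    s = math.sqrt(3.0 * variance) / math.pi
    sample = []
    while len(sample) < n:
        u = rng.random()
        if u > 0.0:  # avoid log(0)
            sample.append(mean + s * math.log(u / (1.0 - u)))
    return sample


def simulate_w2_w3(reps=2000, n=100, mean=100.0, variance=4.0, seed=1):
    """Return simulated values of W2 (t-type) and W3 (chi-square-type)."""
    rng = random.Random(seed)
    w2, w3 = [], []
    for _ in range(reps):
        x = logistic_sample(n, mean, variance, rng)
        xbar = sum(x) / n
        s2 = sum((v - xbar) ** 2 for v in x) / (n - 1)
        w2.append((xbar - mean) / math.sqrt(s2 / n))
        w3.append((n - 1) * s2 / variance)
    return w2, w3


def empirical_quantile(values, p):
    """Smallest order statistic whose empirical CDF is at least p."""
    ordered = sorted(values)
    index = max(0, math.ceil(p * len(ordered)) - 1)
    return ordered[index]


w2, w3 = simulate_w2_w3()
for p in (0.9, 0.95, 0.975):
    print(p, round(empirical_quantile(w2, p), 3), round(empirical_quantile(w3, p), 3))
print("Var(W3):", round(statistics.variance(w3), 2), "against 2 * 99 = 198 for chi-square (df=99)")
```

With so few replications the quantiles are only rough, but the variance of W3 already sits well above the chi-square value, in line with the conclusion that W3 is not chi-square (df=99).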
2.3. Two independent population means and population variances test

The two independent population distributions are always assumed to be normal
probability distributions. In reality, the population distribution can be any kind of
probability distribution. Big data is population data, and the probability distribution is
the analysis method.
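Example 7. The 1st population is the Arcsin distribution, population mean = 100;
the 2nd population is the Semi circle distribution, population mean = 100.
(7.1) The central limit theorem is applied,

$$H_0: \mu_1 = \mu_2, \quad t_{98} = \frac{\bar{X}_1 - \bar{X}_2}{S_{pool}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{\bar{X}_1 - \bar{X}_2}{S_{pool}\sqrt{\frac{1}{50} + \frac{1}{50}}},$$
$$H_0: \sigma_1 = \sigma_2, \quad F_{49,49} = \frac{S_1^2}{S_2^2},$$
$$\bar{X}_1 = \frac{\sum_{i=1}^{n_1} X_{1i}}{n_1}, \quad \bar{X}_2 = \frac{\sum_{j=1}^{n_2} X_{2j}}{n_2}, \text{ the sample means},$$
$$S_1^2 = \frac{\sum_{i=1}^{n_1} (X_{1i} - \bar{X}_1)^2}{n_1 - 1}, \quad S_2^2 = \frac{\sum_{j=1}^{n_2} (X_{2j} - \bar{X}_2)^2}{n_2 - 1}, \text{ the sample variances},$$
Pooled sample variance,
$$S_{pool}^2 = \frac{\sum_{i=1}^{n_1} (X_{1i} - \bar{X}_1)^2 + \sum_{j=1}^{n_2} (X_{2j} - \bar{X}_2)^2}{n_1 + n_2 - 2},$$
$$\sigma_1 = \sigma_2 = \sigma, \quad H_0: \sigma = 5, \quad \chi^2_{98} = \frac{(n_1 + n_2 - 2) S_{pool}^2}{25},$$
(7.2) $t_{98} = \frac{\bar{X}_1 - \bar{X}_2}{S_{pool}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{\bar{X}_1 - \bar{X}_2}{S_{pool}\sqrt{\frac{1}{50} + \frac{1}{50}}}$, $W_2 = \frac{\bar{X}_1 - \bar{X}_2}{S_{pool}\sqrt{\frac{1}{50} + \frac{1}{50}}}$,
It is the sampling distribution of the test statistic,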

Variance : 1.02041
S.D. : 1.01015
MAD : 0.80300
Range : 11.53113
Mid_range : 0.15225
Median : -0.00030
Q1 : -0.67528
Q2 : -0.00030
Q3 : 0.67445
IQR : 1.34972
C.V. : none
W2 has a symmetric distribution, $P(t_{98} \le t_{1-\alpha,98}) = \alpha$,

α               0.9       0.95      0.975     0.99      0.995
Critical value  1.286621  1.658805  1.986901  2.370267  2.637129
Student t (df=98),
α               0.9       0.95      0.975     0.99      0.995
Critical value  1.2897    1.66004   1.9837    2.3640    2.6258
Z (standard normal),
α               0.9       0.95      0.975     0.99      0.995
Critical value  1.28      1.645     1.96      2.326     2.576
Student t (df=98) is not the Z distribution, but Student t (df) → Z as df → ∞.

W2: red line; W0 ~ t distribution (df=99): blue line.
(7.3) $F_{49,49} = \frac{S_1^2}{S_2^2} = W_3$ is the test statistic,
Variance : 0.03526
S.D. : 0.18778
MAD : 0.14708
Range : 2.65286
Mid_range : 1.70655
Median : 1.00245
Q1 : 0.88971
Q2 : 1.00245
Q3 : 1.13237
IQR : 0.24266
C.V. : 0.18378
W3 does not have a symmetric distribution, $P(F_{49,49} \le F_{1-\alpha,49,49}) = \alpha$,

α 0.005 0.01 0.025 0.05 0.1
Critical value 0.664482 0.691267 0.732738 0.770350 0.816132
α 0.9 0.95 0.975 0.99 0.995
Critical value 1.267709 1.358526 1.444584 1.553117 1.633779
W3: red line; W0 ~ F distribution (df1=49, df2=49): blue line.
W3 does not follow F (df1=49, df2=49).
(7.4) $\chi^2_{98} = \frac{(n_1 + n_2 - 2) S_{pool}^2}{25} = W_3$ is the test statistic,
Variance : 75.98867
S.D. : 8.71715
MAD : 6.95855
Range : 84.62134
Median : 97.88522
Q1 : 92.04618
Q2 : 97.88522
Q3 : 103.81945
IQR : 11.77327
C.V. : 0.08896
W3 does not have a symmetric distribution, $P(\chi^2_{98} \le \chi^2_{1-\alpha,98}) = \alpha$,
α 0.005 0.01 0.025 0.05 0.1
Critical value 76.197494 78.232165 81.220576 83.834295 86.890946
α 0.9 0.95 0.975 0.99 0.995
Critical value 109.234755 112.517459 115.387940 118.721108 121.007592
W3: red line; W0 ~ chi-square (df=99): blue line.
W3 does not follow the chi-square distribution (df=98).
Note: The critical values of the test statistic are in Appendix 12.
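
The three statistics of Example 7 can be simulated with the standard library alone. This is a sketch under assumptions: both population variances are taken to be 25 (suggested by H0: σ = 5 and by Example 8, but not stated in Example 7 itself), the Arcsin variate is generated as a cosine of a uniform angle, the Semi circle variate as a rescaled Beta(1.5, 1.5), and only 3,000 replications are used. All function names are hypothetical.

```python
import math
import random
import statistics


def arcsine_sample(n, mean, variance, rng):
    """Arcsine law on [mean - h, mean + h], where variance = h**2 / 2."""
    h = math.sqrt(2.0 * variance)
    return [mean + h * math.cos(math.pi * rng.random()) for _ in range(n)]


def semicircle_sample(n, mean, variance, rng):
    """Semicircle law of radius r, where variance = r**2 / 4.

    2 * Beta(1.5, 1.5) - 1 follows the semicircle law on [-1, 1].
    """
    r = 2.0 * math.sqrt(variance)
    return [mean + r * (2.0 * rng.betavariate(1.5, 1.5) - 1.0) for _ in range(n)]


def mean_and_variance(x):
    m = sum(x) / len(x)
    return m, sum((v - m) ** 2 for v in x) / (len(x) - 1)


def pooled_statistics(x1, x2, sigma_sq):
    """Return the t, F, and chi-square statistics defined in (7.1)."""
    n1, n2 = len(x1), len(x2)
    m1, s1_sq = mean_and_variance(x1)
    m2, s2_sq = mean_and_variance(x2)
    pooled = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    t = (m1 - m2) / math.sqrt(pooled * (1.0 / n1 + 1.0 / n2))
    return t, s1_sq / s2_sq, (n1 + n2 - 2) * pooled / sigma_sq


def simulate(reps=3000, n=50, mean=100.0, variance=25.0, seed=7):
    rng = random.Random(seed)
    rows = [
        pooled_statistics(
            arcsine_sample(n, mean, variance, rng),
            semicircle_sample(n, mean, variance, rng),
            variance,
        )
        for _ in range(reps)
    ]
    t_vals, f_vals, chi_vals = (list(col) for col in zip(*rows))
    return t_vals, f_vals, chi_vals


t_vals, f_vals, chi_vals = simulate()
print("Var(t-type W2):  ", round(statistics.variance(t_vals), 4))
print("Var(F-type W3):  ", round(statistics.variance(f_vals), 4))
print("Var(chi-type W3):", round(statistics.variance(chi_vals), 2), "against 2 * 98 = 196")
```

Because both populations have lighter tails than the normal, the chi-square-type statistic is far less dispersed than a chi-square (df=98) variable, which is the effect the tables above report.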
Example 8. The 1st population is the Arcsin distribution, population mean = 100,
population variance = 25, simulated 60,000,000 samples.
The 2nd population is the Semi circle distribution, population mean = 100,
population variance = 25, simulated 60,000,000 samples.
Let $X_1$ be the data set of the 1st population and $X_2$ the data set of the 2nd
population; both sample sizes are big data.
(8.1) The marginal probability distributions,

$X_1$ marginal probability distribution
Variance : 25.00367
S.D. : 5.00037
MAD : 4.50195
Range : 14.14214
Mid_range : 100.00000
Median : 100.00159
Q1 : 95.00027
Q2 : 100.00159
Q3 : 105.00150
IQR : 10.00123
C.V. : 0.05000
X 2 marginal probability distribution

Variance : 24.99981
S.D. : 4.99998
MAD : 4.24421
Range : 19.99988
Mid_range : 100.00003
Median : 100.00009
Q1 : 95.96022
Q2 : 100.00009
Q3 : 104.03956
IQR : 8.07934
C.V. : 0.05000
(8.2) Comparison of the cumulative probability distribution functions of $X_1$ and $X_2$,

$X_1$: red line; $X_2$: blue line.
************ The |X1 distribution F() - X2 distribution F()| ****************
$X_1$ and $X_2$ have different probability distributions.
(8.3) The probability distribution transformations,

$Y_1 = X_1 + X_2$,
Variance : 49.99410
S.D. : 7.07065
MAD : 5.78316
Range : 34.13825
Mid_range : 199.99912
Median : 199.99937
Q1 : 194.91373
Q2 : 199.99937
Q3 : 205.08752
IQR : 10.17379
C.V. : 0.03535
Y2 = X 1 − X 2 ,
Variance : 49.99462
S.D. : 7.07069
MAD : 5.78337
Range : 34.13656
Median : 0.00079
Q1 : -5.08802
Q2 : 0.00079
Q3 : 5.08761
IQR : 10.17563
C.V. : none
Y3 = X 1 × X 2 ,
Variance : 500650.64213
S.D. : 707.56671
MAD : 578.83920
Range : 3413.79617
Mid_range : 10070.67790
Median : 9977.05165
Q1 : 9485.48654
Q2 : 9977.05165
Q3 : 10503.52127
IQR : 1018.03473
C.V. : 0.07076
Y4 = Min( X 1 , X 2 ),
Variance : 16.63579
S.D. : 4.07870
MAD : 3.42879
Range : 17.07097
Median : 96.21186
Q1 : 93.63726
Q2 : 96.21186
Q3 : 100.00155
IQR : 6.36429
C.V. : 0.04200
Y5 = Max( X 1 , X 2 ),
Variance : 16.63924
S.D. : 4.07912
MAD : 3.42913
Range : 17.07099
Mid_range : 101.46443
Median : 103.78740
Q1 : 99.99853
Q2 : 103.78740
Q3 : 106.36321
IQR : 6.36468
C.V. : 0.03965
$W_1 = \frac{X_1 \times X_2}{X_1 + X_2} = \frac{1}{1/X_1 + 1/X_2}$,
Variance : 3.13287
S.D. : 1.76999
MAD : 1.44915
Range : 8.53611
Median : 49.88532
Q1 : 48.66361
Q2 : 49.88532
Q3 : 51.21596
IQR : 2.55235
C.V. : 0.03544
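
The transformations in (8.3) can be checked by simulating the pair directly. The sketch below is an illustration, not the authors' program: it assumes $X_1$ and $X_2$ are drawn independently, uses 50,000 pairs instead of 60,000,000, and the names `draw_pair`, `TRANSFORMS`, and `simulate_transforms` are hypothetical.

```python
import math
import random
import statistics


def draw_pair(rng, mean=100.0, variance=25.0):
    """One independent (X1, X2): X1 ~ Arcsin, X2 ~ Semi circle, same mean and variance."""
    h = math.sqrt(2.0 * variance)   # Arcsin half-width
    r = 2.0 * math.sqrt(variance)   # Semi circle radius
    x1 = mean + h * math.cos(math.pi * rng.random())
    x2 = mean + r * (2.0 * rng.betavariate(1.5, 1.5) - 1.0)
    return x1, x2


TRANSFORMS = {
    "Y1 = X1 + X2": lambda a, b: a + b,
    "Y2 = X1 - X2": lambda a, b: a - b,
    "Y3 = X1 * X2": lambda a, b: a * b,
    "Y4 = Min(X1, X2)": min,
    "Y5 = Max(X1, X2)": max,
    "W1 = X1*X2/(X1+X2)": lambda a, b: a * b / (a + b),
}


def sample_variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def simulate_transforms(pairs=50_000, seed=11):
    rng = random.Random(seed)
    data = [draw_pair(rng) for _ in range(pairs)]
    return {name: [f(a, b) for a, b in data] for name, f in TRANSFORMS.items()}


results = simulate_transforms()
summary = {name: (sample_variance(v), statistics.median(v)) for name, v in results.items()}
for name, (var, med) in summary.items():
    print(f"{name:20s} variance={var:12.5f} median={med:12.5f}")
```

The variances of the sum and the difference should both be close to 25 + 25 = 50, and the remaining rows can be compared with the summary tables above.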
Example 9. The 1st population is the Normal distribution, population mean = 100;
the 2nd population is the Normal distribution, population mean = 100.
$$H_0: \mu_1 = \mu_2, \quad t_{df} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{S_1^2}{20} + \frac{S_2^2}{15}}}, \quad df = \frac{\left(\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}\right)^2}{\frac{(S_1^2/n_1)^2}{n_1 - 1} + \frac{(S_2^2/n_2)^2}{n_2 - 1}},$$
$$S_1^2 = \frac{\sum_{i=1}^{n_1} (X_{1i} - \bar{X}_1)^2}{n_1 - 1}, \quad S_2^2 = \frac{\sum_{j=1}^{n_2} (X_{2j} - \bar{X}_2)^2}{n_2 - 1}, \text{ the sample variances of the two populations.}$$
(9.1) $df = W_5 = \frac{\left(\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}\right)^2}{\frac{(S_1^2/n_1)^2}{n_1 - 1} + \frac{(S_2^2/n_2)^2}{n_2 - 1}}$ is the estimated value,
The probability distribution of the estimated value,

Variance : 5.39206
S.D. : 2.32208
MAD : 1.89084
Range : 15.79901
Median : 31.36151
Q1 : 29.26977
Q2 : 31.36151
Q3 : 32.57810
IQR : 3.30833
C.V. : 0.07580
$$\frac{\left(\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)^2}{\frac{(\sigma_1^2/n_1)^2}{n_1 - 1} + \frac{(\sigma_2^2/n_2)^2}{n_2 - 1}} = \frac{3.4225}{0.1272468421} = 27.1303392919,$$
$$E\left[\frac{\left(\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}\right)^2}{\frac{(S_1^2/n_1)^2}{n_1 - 1} + \frac{(S_2^2/n_2)^2}{n_2 - 1}}\right] \ne \frac{\left(\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)^2}{\frac{(\sigma_1^2/n_1)^2}{n_1 - 1} + \frac{(\sigma_2^2/n_2)^2}{n_2 - 1}},$$
(9.2) $t_{df} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{S_1^2}{20} + \frac{S_2^2}{15}}}$, $W_2 = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{S_1^2}{20} + \frac{S_2^2}{15}}}$ is the test statistic.
Variance : 1.06693
S.D. : 1.03292
MAD : 0.81728
Range : 14.32443
Mid_range : 0.38564
Median : -0.00002
Q1 : -0.68216
Q2 : -0.00002
Q3 : 0.68226
IQR : 1.36442
C.V. : none
W2 has a symmetric distribution, $P(t_{df} \le t_{1-\alpha,df}) = \alpha$,

α               0.9       0.95       0.975     0.99      0.995
Critical value  1.3087    1.6943     2.0374    2.4499    2.7387
Student t (df=27),
α               0.9       0.95       0.975     0.99      0.995
Critical value  1.3137    1.7033944  2.052     2.4726    2.7704
W2 is not Student t (df=27),

W2: red line; W0 ~ t (df=27): blue line.
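
Because the degrees of freedom in Example 9 are computed from the sample variances, they are themselves a random variable. A minimal sketch of that point follows; the population standard deviations 5 and 3 are assumptions made only for illustration (this section does not state them), and the function names are hypothetical.

```python
import random
import statistics


def welch_df(s1_sq, n1, s2_sq, n2):
    """Degrees of freedom (S1^2/n1 + S2^2/n2)^2 / [(S1^2/n1)^2/(n1-1) + (S2^2/n2)^2/(n2-1)]."""
    a, b = s1_sq / n1, s2_sq / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))


def sample_variance(x):
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / (len(x) - 1)


def simulate_df(reps=5000, n1=20, n2=15, sigma1=5.0, sigma2=3.0, seed=3):
    """Estimated df for repeated normal samples; sigma1 and sigma2 are assumed values."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        x1 = [rng.gauss(100.0, sigma1) for _ in range(n1)]
        x2 = [rng.gauss(100.0, sigma2) for _ in range(n2)]
        estimates.append(welch_df(sample_variance(x1), n1, sample_variance(x2), n2))
    return estimates


df_hat = simulate_df()
print("df at the assumed population variances:", round(welch_df(25.0, 20, 9.0, 15), 4))
print("mean of the estimated df:              ", round(statistics.fmean(df_hat), 4))
print("min and max of the estimated df:       ", round(min(df_hat), 3), round(max(df_hat), 3))
```

The estimated value always lies between min(n1 - 1, n2 - 1) and n1 + n2 - 2, and its mean generally differs from the value obtained by plugging in the population variances.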
2.4. Two dependent population means and population variances test
The two dependent population distributions are always assumed to be normal probability
distributions, that is, the joint probability distribution of the two populations is assumed
to be the bivariate normal distribution. In reality, the
population distribution can be any kind of probability distribution. Big data is
population data, and the probability distribution is the analysis method; there are the marginal
probability distributions and the joint probability distribution.
Example 10. The 1st population is the Double exponential distribution, population mean = 100,
population variance = 8, $X_1 \sim \text{Double exponential}(\lambda_{X_1} = 0.5, \mu_{X_1} = 100)$.
The 2nd population is $X_2$, $X_2 \mid x_1 \sim \text{Double exponential}(\lambda_{X_2} = 0.5, \mu_{X_2} = x_1)$.

The two populations are dependent; 20 paired samples are simulated.
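Two dependent population means test

$$d_i = X_{1i} - X_{2i}, \quad i = 1, 2, \ldots, 20,$$
$$H_0: \mu_1 - \mu_2 = 0,$$
$$t_{n-1} = \frac{\bar{d}}{S_d/\sqrt{n}} = t_{19} = \frac{\bar{d}}{S_d/\sqrt{20}}, \quad \bar{d} = \frac{\sum_{i=1}^{n} d_i}{n}, \quad S_d^2 = \frac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n - 1},$$
The correlation coefficient test
$$H_0: \rho(X_1, X_2) = \rho_0 = \sqrt{0.5},$$
$$Z_r = \frac{1}{2} \ln\left(\frac{1 + r}{1 - r}\right), \quad Z_{\rho_0} = \frac{1}{2} \ln\left(\frac{1 + \rho_0}{1 - \rho_0}\right),$$
$$Z \text{ test statistic} \xrightarrow{n > 10} \frac{Z_r - Z_{\rho_0}}{\sqrt{\frac{1}{n - 3}}} = \frac{Z_r - Z_{0.70710678118}}{\sqrt{\frac{1}{17}}} = W_9,$$
$$r = \frac{\sum_{i=1}^{n} (X_{1i} - \bar{X}_1)(X_{2i} - \bar{X}_2)}{\sqrt{\sum_{i=1}^{n} (X_{1i} - \bar{X}_1)^2 \sum_{i=1}^{n} (X_{2i} - \bar{X}_2)^2}}, \quad \bar{X}_1 = \frac{\sum_{i=1}^{n} X_{1i}}{n}, \quad \bar{X}_2 = \frac{\sum_{i=1}^{n} X_{2i}}{n},$$
$Z_r = \frac{1}{2} \ln\left(\frac{1 + r}{1 - r}\right)$ approaches the normal distribution when $n > 10$, so the standardized statistic above approaches the standard normal distribution.
(10.1) $t_{19} = \frac{\bar{d}}{S_d/\sqrt{20}} = W_2$, this is the test statistic,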
Variance : 1.10477
S.D. : 1.05108
MAD : 0.83664
Range : 15.32141
Median : 0.00102
Q1 : -0.70487
Q2 : 0.00102
Q3 : 0.70679
IQR : 1.41166
C.V. : none
W2 has a symmetric distribution, $P(t_{19} \le t_{1-\alpha,19}) = \alpha$,

α               0.9       0.95      0.975     0.99      0.995
Critical value  1.339859  1.721406  2.058278  2.460474  2.742261
Student t (df=19),
α               0.9       0.95      0.975     0.99      0.995
Critical value  1.3280    1.7293    2.0932    2.5388    2.8600
W2 is not exactly Student t (df=19),
W2: red line; W0 ~ t (df=19): blue line.
W2 approaches t (df=19).
(10.2) $Z = \sqrt{17} \times (Z_r - Z_{0.70710678118}) = W_9$ is the test statistic,
Variance : 1.55681
S.D. : 1.24772
MAD : 0.99178
Range : 14.31015
Mid_range : 0.23809
Median : 0.11060
Q1 : -0.71401
Q2 : 0.11060
Q3 : 0.95325
IQR : 1.66726
C.V. : 9.64807
W9 does not have a symmetric distribution, $P(W_9 \le W_{9,1-\alpha}) = \alpha$,

α 0.005 0.01 0.025 0.05 0.1
Critical value -3.034597 -2.722399 -2.271204 -1.887507 -1.448223
α 0.9 0.95 0.975 0.99 0.995
Critical value 1.732804 2.210495 2.632219 3.131701 3.476317
W9 does not follow the Z distribution. The critical value table is in Appendix 13.

W9: red line; W0 ~ Z distribution (standard normal distribution): blue line.
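
Example 10 can be imitated with a short simulation. The sketch below rests on assumptions: a Double exponential variate with λ = 0.5 is generated as the difference of two exponential draws with rate 0.5 (which gives variance 8), ρ0 is taken as √0.5 as in the statistic W9, and 4,000 replications are used. The function names are hypothetical.

```python
import math
import random
import statistics


def double_exponential(rng, mu, rate):
    """Double exponential variate: mu plus the difference of two Exponential(rate) draws."""
    return mu + rng.expovariate(rate) - rng.expovariate(rate)


def paired_statistics(x1, x2, rho0):
    """Return W2 (paired t statistic) and W9 (Fisher z statistic against rho0)."""
    n = len(x1)
    d = [a - b for a, b in zip(x1, x2)]
    d_bar = sum(d) / n
    s_d = math.sqrt(sum((v - d_bar) ** 2 for v in d) / (n - 1))
    w2 = d_bar / (s_d / math.sqrt(n))
    m1, m2 = sum(x1) / n, sum(x2) / n
    sxy = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    sxx = sum((a - m1) ** 2 for a in x1)
    syy = sum((b - m2) ** 2 for b in x2)
    r = sxy / math.sqrt(sxx * syy)
    # atanh(r) equals 0.5 * ln((1 + r) / (1 - r))
    w9 = math.sqrt(n - 3) * (math.atanh(r) - math.atanh(rho0))
    return w2, w9


def simulate(reps=4000, n=20, rate=0.5, seed=13):
    rng = random.Random(seed)
    rho0 = math.sqrt(0.5)
    w2_vals, w9_vals = [], []
    for _ in range(reps):
        x1 = [double_exponential(rng, 100.0, rate) for _ in range(n)]
        x2 = [double_exponential(rng, v, rate) for v in x1]  # X2 | x1 is centred at x1
        w2, w9 = paired_statistics(x1, x2, rho0)
        w2_vals.append(w2)
        w9_vals.append(w9)
    return w2_vals, w9_vals


w2_vals, w9_vals = simulate()
print("Var(W2):", round(statistics.variance(w2_vals), 4))
print("Var(W9):", round(statistics.variance(w9_vals), 4), "against 1 for the Z distribution")
```

A variance of W9 clearly above 1 is consistent with the statement that W9 does not follow the Z distribution for this population.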
Example 11. The 1st population is the Double exponential distribution, population mean = 100,
population variance = 8, $X_1 \sim \text{Double exponential}(\lambda_{X_1} = 0.5, \mu_{X_1} = 100)$.
The 2nd population is $X_2$, $X_2 \mid x_1 \sim \text{Double exponential}(\lambda_{X_2} = 0.5, \mu_{X_2} = x_1)$.

The two populations are dependent; 60,000,000 paired samples are simulated.
(11.1) The marginal probability distributions,
$X_1$ marginal probability distribution
Variance : 8.00101
S.D. : 2.82861
MAD : 2.00007
Range : 69.35558
Median : 100.00016
Q1 : 98.61422
Q2 : 100.00016
Q3 : 101.38700
IQR : 2.77278
C.V. : 0.02829
X 2 marginal probability
Variance : 15.99534
S.D. : 3.99942
MAD : 2.99973
Range : 78.72053
Median : 99.99975
Q1 : 97.70688
Q2 : 99.99975
Q3 : 102.29224
IQR : 4.58536
C.V. : 0.03999
(11.2) Comparison of the cumulative probability distribution functions of $X_1$ and
$X_2$; the analysis method is the SLLN.
$X_1$: red line; $X_2$: blue line.

$X_1$ and $X_2$ do not have the same probability distribution.
(11.3) The joint probability distribution,

$f(x_1, x_2)$, $f(x_2, x_1)$
E(X1)= 99.9998, Var(X1)= 8.0009, E(X2)=99.9999, Var(X2)=16.0037,
Cov(X1,X2)= 8.0028, X1 and X2 correlation coefficient=0.7072.
(11.4) The probability distribution transformations,
Y1 = X 1 + X 2 ,
Variance : 40.00594
S.D. : 6.32502
MAD : 4.66678
Range : 140.09298
Mid_range : 198.22280
Median : 200.00043
Q1 : 196.52041
Q2 : 200.00043
Q3 : 203.48213
IQR : 6.96173
C.V. : 0.03162
Y2 = X 1 − X 2 ,
Variance : 7.99976
S.D. : 2.82838
MAD : 2.00007
Range : 71.51912
Mid_range : 1.87186
Median : -0.00008
Q1 : -1.38633
Q2 : -0.00008
Q3 : 1.38652
IQR : 2.77285
C.V. : none
Y3 = Max( X 1 , X 2 ),
Variance : 11.00200
S.D. : 3.31693
MAD : 2.42632
Range : 70.62467
Mid_range : 102.26586
Median : 100.71252
Q1 : 99.18867
Q2 : 100.71252
Q3 : 102.69491
IQR : 3.50624
C.V. : 0.03284
Y4 = Min( X 1 , X 2 ),
Variance : 11.00050
S.D. : 3.31670
MAD : 2.42618
Range : 71.89303
Median : 99.28708
Q1 : 97.30518
Q2 : 99.28708
Q3 : 100.81114
IQR : 3.50596
C.V. : 0.03350
W2 = Max( X 1 , X 2 ) − Min( X 1 , X 2 ),
Variance : 3.99947
S.D. : 1.99987
MAD : 1.47152
Range : 37.63142
Median : 1.38642
Q1 : 0.57545
Q2 : 1.38642
Q3 : 2.77257
IQR : 2.19712
C.V. : 0.99990
Note: please refer to Appendix 9.
Chapter 3. The population proportion test
3.1. One population proportion test

The population proportion is the parameter of the Bernoulli population. The sample proportion
is a sample mean, so the central limit theorem is usually used for the test. Big
data is population data, so the probability distribution is used for the analysis.
Example 12. The population is $B(1, p = 0.5)$ and n samples are simulated; the sum
of the samples is $B(n, p = 0.5)$,
sample proportion $\hat{p} = \frac{X}{n}$, $X \sim B(n, p = 0.5)$, $x = 0, 1, \ldots, n$,
(12.1) n=30,
$X_1 \sim \text{Binomial}(n = 30, p = 0.5)$, $X_2 \sim \text{Normal}\left(\mu = np = 15, \sigma^2 = np(1-p) = \frac{30}{4}\right)$,
(Figure panels: X1, X2)
(12.2) n=31,
$X_1 \sim \text{Binomial}(n = 31, p = 0.5)$, $X_2 \sim \text{Normal}\left(\mu = np = 15.5, \sigma^2 = np(1-p) = \frac{31}{4}\right)$,
$X_1$ and $X_2$ are independent r.v.'s.

(Figure panels: X1, X2)
When n=30, the binomial distribution does not approach the normal
distribution, so the central limit theorem cannot be applied.
(12.3) n=1000,
$X_1 \sim \text{Binomial}(n = 1000, p = 0.5)$,
$X_2 \sim \text{Normal}\left(\mu = np = \frac{1000}{2}, \sigma^2 = np(1-p) = \frac{1000}{4}\right)$,

(Figure panels: X1, X2)
When n=1000, the binomial distribution does not approach the normal
distribution, so the central limit theorem cannot be applied.
(12.4) n=10000,
$X_1 \sim \text{Binomial}(n = 10000, p = 0.5)$,
$X_2 \sim \text{Normal}\left(\mu = np = \frac{10000}{2}, \sigma^2 = np(1-p) = \frac{10000}{4}\right)$,

Pr(| X1 distribution F() - X2 distribution F()||>= 0.0001000000)= 0.939519
(Figure panels: X1, X2)
When n=10000, the binomial distribution approaches the normal
distribution, so the central limit theorem can be applied.
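
The comparison in Example 12 can also be made without simulation, since the Binomial(n, 0.5) distribution function is available exactly. The following sketch is one possible illustration, not the authors' procedure: it measures the largest gap between the binomial CDF and the CDF of Normal(n/2, n/4) at the integer points, without any continuity correction. The function name `max_cdf_gap` is hypothetical.

```python
import math
from statistics import NormalDist


def max_cdf_gap(n):
    """Largest |F_binomial(k) - F_normal(k)| over k = 0..n for Binomial(n, 0.5)."""
    normal = NormalDist(mu=n / 2, sigma=math.sqrt(n / 4))
    total = 2 ** n
    coeff = 1        # C(n, 0), updated exactly with integer arithmetic
    cumulative = 0   # running sum of binomial coefficients
    gap = 0.0
    for k in range(n + 1):
        cumulative += coeff
        gap = max(gap, abs(cumulative / total - normal.cdf(k)))
        coeff = coeff * (n - k) // (k + 1)   # C(n, k + 1) from C(n, k)
    return gap


gaps = {n: max_cdf_gap(n) for n in (30, 31, 1000, 10000)}
for n, gap in gaps.items():
    print(n, gap)
```

The gap shrinks as n grows but never reaches zero, because the binomial distribution stays discrete for every finite n.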
Note: The probability distribution of the sample proportion,
$$X \sim B(1, p = 0.5), \quad E(X) = \mu = p = 0.5, \quad Var(X) = \sigma^2 = p(1-p) = 0.25,$$
$$Y = \frac{X - p}{\sqrt{p(1-p)}} = \frac{X - 0.5}{0.5} = 2X - 1,$$
$$P(Y = -1) = 0.5, \quad P(Y = 1) = 0.5,$$
$$\varphi_Y(t) = E(\exp(itY)) = \frac{\exp(-it) + \exp(it)}{2} = \frac{\cos(t) - i\sin(t) + \cos(t) + i\sin(t)}{2} = \cos(t),$$
$$X_1, \ldots, X_n \overset{iid}{\sim} B(1, p = 0.5),$$
$$E(X_i) = \mu = p = 0.5, \quad Var(X_i) = \sigma^2 = p(1-p) = 0.25, \quad i = 1, 2, \ldots, n,$$
$$Y_i = \frac{X_i - p}{\sqrt{p(1-p)}} = \frac{X_i - 0.5}{0.5} = 2X_i - 1, \quad i = 1, 2, \ldots, n,$$
$$\varphi_{\frac{\bar{X} - p}{\sqrt{p(1-p)/n}}}(t) = \varphi_{\sum_{i=1}^{n} \frac{X_i - p}{\sqrt{p(1-p)}}}\left(\frac{t}{\sqrt{n}}\right) = \prod_{i=1}^{n} \varphi_{\frac{X_i - p}{\sqrt{p(1-p)}}}\left(\frac{t}{\sqrt{n}}\right) = \left[\varphi_{\frac{X_1 - p}{\sqrt{p(1-p)}}}\left(\frac{t}{\sqrt{n}}\right)\right]^n$$
$$= \left[1 - \frac{t^2}{2! \times n} + \frac{t^4}{4! \times n^2} - \frac{t^6}{6! \times n^3} + \cdots\right]^n = \left[1 + \sum_{k=1}^{\infty} (-1)^k \frac{t^{2k}}{(2k)! \times n^k}\right]^n$$
$$= \left[\cos\left(\frac{t}{\sqrt{n}}\right)\right]^n \xrightarrow{n \to \infty} \exp\left(-\frac{t^2}{2}\right),$$
$$f_W(w) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \exp(-itw) \exp\left(-\frac{t^2}{2}\right) dt = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{w^2}{2}\right), \quad -\infty < w < \infty, \quad W \sim \text{Normal}(0, 1).$$
The inversion formula is applied when W is a continuous random variable.
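
The limit of the characteristic function in this note is easy to check numerically. The sketch below compares $[\cos(t/\sqrt{n})]^n$ with $\exp(-t^2/2)$ on a grid of t values between 0 and 3; the grid and the function names are assumptions chosen for illustration.

```python
import math


def standardized_mean_cf(t, n):
    """Characteristic function [cos(t / sqrt(n))]**n of the standardized sample proportion, p = 0.5."""
    return math.cos(t / math.sqrt(n)) ** n


def normal_cf(t):
    """Characteristic function exp(-t^2 / 2) of Normal(0, 1)."""
    return math.exp(-t * t / 2.0)


grid = [k / 10.0 for k in range(0, 31)]   # t = 0.0, 0.1, ..., 3.0
cf_gaps = {
    n: max(abs(standardized_mean_cf(t, n) - normal_cf(t)) for t in grid)
    for n in (10, 100, 10_000)
}
for n, gap in cf_gaps.items():
    print(n, gap)
```

The characteristic functions converge, yet for every finite n the standardized sample proportion remains a discrete random variable, which is the point the remainder of the note makes.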

$\bar{X} = \hat{p} = \frac{\sum_{i=1}^{n} X_i}{n}$, $\sum_{i=1}^{n} X_i \sim \text{Binomial}(n, p)$; the sample proportion is a discrete random
variable.
$\bar{X}$ is a discrete random variable with range $0 \le \bar{X} \le 1$,
$\frac{\bar{X} - p}{\sqrt{p(1-p)/n}}$ is a discrete random variable, but it sometimes behaves like a continuous
random variable.
$P(Y^2 = 1) = 1$, so $Y^2$ has a point distribution; it is not a continuous random variable.
$P(Y^{2k} = 1) = 1$, so $Y^{2k}$ also has a point distribution, $k = 1, 2, \ldots$.
Example 13. The population is $B(1, p_0)$ and n samples are simulated; the sum of the
samples is $B(n, p_0)$,
sample proportion $\hat{p} = \frac{X}{n}$, $X \sim B(n, p_0)$, $x = 0, 1, \ldots, n$,
$H_0: p = p_0$, test statistic $= \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$, confidence interval formula $= \frac{\hat{p} - p}{\sqrt{\hat{p}(1-\hat{p})/n}}$,
(13.1)
$X_1 \sim \text{Binomial}(n = 30, p = 0.1)$, $\hat{p} = \frac{X_1}{n}$,
$$W_4 = \frac{\hat{p} - p}{\sqrt{p(1-p)/30}} = \frac{\hat{p} - 0.1}{\sqrt{0.1(1-0.1)/30}}, \quad W_5 = \frac{\hat{p} - p}{\sqrt{\hat{p}(1-\hat{p})/30}} = \frac{\hat{p} - 0.1}{\sqrt{\hat{p}(1-\hat{p})/30}},$$
f W4 (w4 ), FW4 (w4 ) Coefficient
Variance : 0.89015
S.D. : 0.94348
MAD : 0.75072
Range : 8.52013
Mid_range : 3.04290
Median : 0.00000
Q1 : -0.60858
Q2 : 0.00000
Q3 : 0.60858
IQR : 1.21716
C.V. : 11.66352
Variance : 1.05820
S.D. : 1.02869
MAD : 0.83097
Range : 6.41597
Mid_range : 1.17379
Median : 0.00000
Q1 : -0.73193
Q2 : 0.00000
Q3 : 0.53709
IQR : 1.26901
C.V. : none
When n=30 and p=0.1, the binomial distribution does not approach the
normal distribution, so the central limit theorem cannot be applied.
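
For (13.1) the distribution of W4 can be enumerated exactly, because W4 takes only n + 1 values. The sketch below does this for n = 30 and p = 0.1. It also computes the variance conditional on X >= 1; the reported range and variance of W4 appear consistent with the outcome X = 0 (where W5 is undefined) being excluded, but that is an inference from the numbers, not something the source states. The function names are hypothetical.

```python
import math


def binomial_pmf(n, p):
    return [math.comb(n, k) * p ** k * (1.0 - p) ** (n - k) for k in range(n + 1)]


def w4_distribution(n, p):
    """Exact support points and probabilities of W4 = (p_hat - p) / sqrt(p(1-p)/n)."""
    se = math.sqrt(p * (1.0 - p) / n)
    return [((k / n - p) / se, prob) for k, prob in enumerate(binomial_pmf(n, p))]


def mean_variance(dist):
    total = sum(q for _, q in dist)
    mean = sum(w * q for w, q in dist) / total
    return mean, sum((w - mean) ** 2 * q for w, q in dist) / total


dist = w4_distribution(30, 0.1)
mean_all, var_all = mean_variance(dist)
mean_pos, var_pos = mean_variance(dist[1:])      # drop X = 0 (assumption, see lead-in)
upper_tail = sum(q for w, q in dist if w > 1.645)

print("spacing of the support:", dist[1][0] - dist[0][0])
print("variance of W4 (all outcomes):   ", var_all)
print("variance of W4 (X >= 1 only):    ", var_pos)
print("P(W4 > 1.645):", upper_tail, "against 0.05 for Normal(0, 1)")
```

The support is a lattice with spacing of about 0.61, and the upper tail probability differs noticeably from the normal value, which illustrates why the normal approximation is poor here.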
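(13.2)
$X_1 \sim \text{Binomial}(n = 30, p = 0.5)$, $\hat{p} = \frac{X_1}{n}$,
$$W_4 = \frac{\hat{p} - p}{\sqrt{p(1-p)/30}} = \frac{\hat{p} - 0.5}{\sqrt{0.5(1-0.5)/30}}, \quad W_5 = \frac{\hat{p} - p}{\sqrt{\hat{p}(1-\hat{p})/30}} = \frac{\hat{p} - 0.5}{\sqrt{\hat{p}(1-\hat{p})/30}},$$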
Variance : 1.00000
S.D. : 1.00000
MAD : 0.79134
Range : 10.22415
Mid_range : 0.00000
Median : 0.00000
Q1 : -0.73030
Q2 : 0.00000
Q3 : 0.73030
IQR : 1.46059
C.V. : none

Variance : 1.11814
S.D. : 1.05742
MAD : 0.82078
Range : 28.47867
Mid_range : 0.00000
Median : 0.00000
Q1 : -0.73688
Q2 : 0.00000
Q3 : 0.73688
IQR : 1.47375
C.V. : none
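(13.3)
$X_1 \sim \text{Binomial}(n = 1000, p = 0.1)$, $\hat{p} = \frac{X_1}{n}$,
$$W_4 = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} = \frac{\hat{p} - 0.1}{\sqrt{0.1(1-0.1)/1000}}, \quad W_5 = \frac{\hat{p} - p}{\sqrt{\hat{p}(1-\hat{p})/n}} = \frac{\hat{p} - 0.1}{\sqrt{\hat{p}(1-\hat{p})/1000}},$$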
Variance : 1.00007
S.D. : 1.00003
MAD : 0.79733
Range : 10.54093
Mid_range : 0.42164
Median : 0.00000
Q1 : -0.63246
Q2 : 0.00000
Q3 : 0.63246
IQR : 1.26491
C.V. : none
W0~Normal(0,1),

Variance : 1.01594
S.D. : 1.00794
MAD : 0.80281
Range : 11.16694
Median : 0.00000
Q1 : -0.65016
Q2 : 0.00000
Q3 : 0.61635
IQR : 1.26652
C.V. : none
W0~Normal(0,1),
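(13.4)
$X_1 \sim \text{Binomial}(n = 1000, p = 0.5)$, $\hat{p} = \frac{X_1}{n}$,
$$W_4 = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} = \frac{\hat{p} - 0.5}{\sqrt{0.5(1-0.5)/1000}}, \quad W_5 = \frac{\hat{p} - p}{\sqrt{\hat{p}(1-\hat{p})/n}} = \frac{\hat{p} - 0.5}{\sqrt{\hat{p}(1-\hat{p})/1000}},$$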
Variance : 0.99998
S.D. : 0.99999
MAD : 0.79763
Range : 10.81499
Median : 0.00000
Q1 : -0.69570
Q2 : 0.00000
Q3 : 0.69570
IQR : 1.39140
C.V. : none
W0~Normal(0,1),

Variance : 1.00300
S.D. : 1.00150
MAD : 0.79843
Range : 10.97668
Median : 0.00000
Q1 : -0.69587
Q2 : 0.00000
Q3 : 0.69587
IQR : 1.39174
C.V. : none
W0~Normal(0,1),
Example 14. The population is $B(1, p)$ and the simulated sample size is n=1,000,000; it is big
data (population data), so the sample proportion is the population proportion.
value   Sample count   Probability
0       n-X            1-X/n = 1-p
1       X              p = X/n
Example 15. $X_1 \sim \text{Beta}(\alpha = 5, \beta = 5)$, $X_2 \mid x_1 \sim B(1, x_1)$, let $X_2 = x_1 + \varepsilon$,

$X_1$ marginal probability distribution,
Variance : 0.02273
S.D. : 0.15076
MAD : 0.12305
Range : 0.97494
Mid_range : 0.50109
Median : 0.49999
Q1 : 0.39195
Q2 : 0.49999
Q3 : 0.60803
IQR : 0.21609
C.V. : 0.30153
$X_2$ marginal probability distribution; it is a discrete random variable.

Variance : 0.25000
S.D. : 0.50000
MAD : 0.50000
Range : 1.00000
Mid_range : 0.50000
Median : 1.00000
Q1 : 0.00000
Q2 : 1.00000
Q3 : 1.00000
IQR : 1.00000
C.V. : 0.99998
ε = W1 = X 2 − X 1 ,
Variance : 0.22727
S.D. : 0.47673
MAD : 0.45454
Range : 1.96586
Mid_range : 0.00293
Median : -0.03940
Q1 : -0.45171
Q2 : -0.03940
Q3 : 0.45172
IQR : 0.90343
C.V. : none
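
Example 15 is a two-stage draw: first the success probability, then the Bernoulli outcome. A minimal simulation sketch follows, using 100,000 pairs; the function name `simulate_epsilon` and the seed are illustrative assumptions.

```python
import random
import statistics


def simulate_epsilon(pairs=100_000, alpha=5.0, beta=5.0, seed=5):
    """Draw X1 ~ Beta(alpha, beta), then X2 | x1 ~ B(1, x1), and return eps = X2 - X1."""
    rng = random.Random(seed)
    eps = []
    for _ in range(pairs):
        x1 = rng.betavariate(alpha, beta)
        x2 = 1.0 if rng.random() < x1 else 0.0
        eps.append(x2 - x1)
    return eps


eps = simulate_epsilon()
eps_mean = sum(eps) / len(eps)
eps_var = statistics.pvariance(eps, eps_mean)
# E[x1 (1 - x1)] for Beta(alpha, beta) is alpha*beta / ((alpha+beta) * (alpha+beta+1)).
theoretical_var = 5.0 * 5.0 / (10.0 * 11.0)
print("mean of eps:    ", eps_mean)
print("variance of eps:", eps_var, "theory:", theoretical_var)
```

The variance of ε equals E[x1(1 - x1)], which is 25/110 for Beta(5, 5) and agrees with the value in the table above.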
3.2. Two independent population proportion test
With two independent Bernoulli populations, there are two sample proportions and they
are discrete random variables. The central limit theorem may not apply when the
sample size is not very large. When the sample size is very large, it is big data and the
analysis method is the probability distribution.
Example 16. $X_1 \sim \text{Binomial}(n_1, p_1)$, $\hat{p}_1 = \frac{X_1}{n_1}$, $X_2 \sim \text{Binomial}(n_2, p_2)$, $\hat{p}_2 = \frac{X_2}{n_2}$,

$X_1$, $X_2$ are independent r.v.'s,
$$W_3 = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\bar{p}(1 - \bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \quad \bar{p} = \frac{X_1 + X_2}{n_1 + n_2}, \quad W_5 = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}},$$
(16.1) $X_1 \sim \text{Binomial}(n_1 = 30, p_1 = 0.1)$, $X_2 \sim \text{Binomial}(n_2 = 30, p_2 = 0.1)$,
Variance : 0.81016
S.D. : 0.90009
MAD : 0.71823
Range : 7.60029
Mid_range : 0.07571
Median : 0.00000
Q1 : -0.59235
Q2 : 0.00000
Q3 : 0.59235
IQR : 1.18470
C.V. : none

Variance : 0.84023
S.D. : 0.91664
MAD : 0.72772
Range : 8.72422
Mid_range : 0.11444
Median : 0.00000
Q1 : -0.59409
Q2 : 0.00000
Q3 : 0.59409
IQR : 1.18818
C.V. : none
The central limit theorem does not apply.
Variance : 1.01669
S.D. : 1.00831
MAD : 0.80121
Range : 11.19213
Mid_range : 0.09698
Median : 0.00000
Q1 : -0.77503
Q2 : 0.00000
Q3 : 0.77503
IQR : 1.55005
C.V. : none

Variance : 1.07257
S.D. : 1.03565
MAD : 0.81552
Range : 16.20374
Mid_range : 0.29369
Median : 0.00000
Q1 : -0.77894
Q2 : 0.00000
Q3 : 0.77894
IQR : 1.55787
C.V. : none
The central limit theorem does not apply.

Variance : 1.00039
S.D. : 1.00019
MAD : 0.79809
Range : 10.98974
Median : 0.00000
Q1 : -0.67535
Q2 : 0.00000
Q3 : 0.67535
IQR : 1.35069
C.V. : none
W0~Normal(0,1),

Variance : 1.00189
S.D. : 1.00094
MAD : 0.79849
Range : 11.07388
Median : 0.00000
Q1 : -0.67542
Q2 : 0.00000
Q3 : 0.67542
IQR : 1.35085
C.V. : none
W0~Normal(0,1),
The central limit theorem can be applied when n=1000.

Variance : 1.00034
S.D. : 1.00017
MAD : 0.79785
Range : 10.86838
Median : 0.00000
Q1 : -0.67092
Q2 : 0.00000
Q3 : 0.67094
IQR : 1.34186
C.V. : none
W0~Normal(0,1),
Variance : 1.00185
S.D. : 1.00092
MAD : 0.79825
Range : 10.94953
Median : 0.00000
Q1 : -0.67099
Q2 : 0.00000
Q3 : 0.67102
IQR : 1.34201
C.V. : none
W0~Normal(0,1),
The central limit theorem can be applied when n=1000.
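
The two statistics of Example 16 are simple to compute, and for small samples their exact behaviour can be enumerated from the two binomial distributions. The sketch below is an illustration under assumptions: the pooled statistic is set to 0 when both samples contain no successes (or only successes), the unpooled statistic returns None when its variance estimate is 0, and 1.96 is used as the two-sided critical value. How the authors' software treats these cases is not stated.

```python
import math


def pooled_statistic(x1, n1, x2, n2):
    """W3; returns 0.0 when the pooled variance is zero (assumed convention)."""
    p1, p2 = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)
    var = p_bar * (1.0 - p_bar) * (1.0 / n1 + 1.0 / n2)
    return 0.0 if var == 0.0 else (p1 - p2) / math.sqrt(var)


def unpooled_statistic(x1, n1, x2, n2):
    """W5; returns None when the estimated variance is zero (assumed convention)."""
    p1, p2 = x1 / n1, x2 / n2
    var = p1 * (1.0 - p1) / n1 + p2 * (1.0 - p2) / n2
    return None if var == 0.0 else (p1 - p2) / math.sqrt(var)


def binomial_pmf(n, p):
    return [math.comb(n, k) * p ** k * (1.0 - p) ** (n - k) for k in range(n + 1)]


def exact_rejection_probability(n1, p1, n2, p2, critical=1.96):
    """P(|W3| > critical) by summing over every outcome (x1, x2)."""
    pmf1, pmf2 = binomial_pmf(n1, p1), binomial_pmf(n2, p2)
    total = 0.0
    for x1, q1 in enumerate(pmf1):
        for x2, q2 in enumerate(pmf2):
            if abs(pooled_statistic(x1, n1, x2, n2)) > critical:
                total += q1 * q2
    return total


size_small = exact_rejection_probability(30, 0.1, 30, 0.1)
print("P(|W3| > 1.96) for n1 = n2 = 30, p = 0.1:", size_small, "(nominal 0.05)")
print("W3 and W5 for x1 = 5, x2 = 2:", pooled_statistic(5, 30, 2, 30), unpooled_statistic(5, 30, 2, 30))
```

Any difference between the exact probability and the nominal 0.05 comes from the discreteness of the two sample proportions.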
Example 17. $X_1 \sim \text{Beta}(\alpha = 5, \beta = 5)$, $X_2 \mid x_1 \sim B(1, x_1)$, let $X_2 = x_1 + \varepsilon_1$,

$X_3 \sim \text{Beta}(\alpha = 0.5, \beta = 0.5)$, $X_4 \mid x_3 \sim B(1, x_3)$, let $X_4 = x_3 + \varepsilon_2$,
$X_1$, $X_3$ are independent random variables,
$X_2$, $X_4$ are independent random variables.
What is the marginal probability distribution of $Y_1 = X_2 - X_4$?
For the marginal probability distributions of $X_1$, $X_2$
and $\varepsilon_1$, please refer to Example 15.
$Y_1 = X_2 - X_4$ marginal probability distribution,

Variance : 0.49987
S.D. : 0.70701
MAD : 0.49993
Range : 2.00000
Mid_range : 0.00000
Median : 0.00000
Q1 : 0.00000
Q2 : 0.00000
Q3 : 0.00000
IQR : 0.00000
C.V. : none
This is a trinomial distribution with P(Y1=-1)=0.25, P(Y1=0)=0.5, P(Y1=1)=0.25,

and Y1+1~Binomial(n=2,p=0.5).
X 3 marginal probability distribution,
Variance : 0.12500
S.D. : 0.35356
MAD : 0.31831
Range : 1.00000
Mid_range : 0.50000
Median : 0.50029
Q1 : 0.14655
Q2 : 0.50029
Q3 : 0.85369
IQR : 0.70714
C.V. : 0.70694
X 4 marginal probability distribution, it is discrete random variable.

Variance : 0.25000
S.D. : 0.50000
MAD : 0.50000
Range : 1.00000
Mid_range : 0.50000
Median : 1.00000
Q1 : 0.00000
Q2 : 1.00000
Q3 : 1.00000
IQR : 1.00000
C.V. : 0.99984
ε 2 = W2 = X 4 − X 3 ,
Variance : 0.12495
S.D. : 0.35348
MAD : 0.24993
Range : 1.99998
Mid_range : 0.00000
Median : 0.00000
Q1 : -0.16316
Q2 : 0.00000
Q3 : 0.16314
IQR : 0.32630
C.V. : none
U1 = ε1 + ε 2 ,
Variance : 0.35231
S.D. : 0.59355
MAD : 0.50497
Range : 3.80177
Mid_range : 0.00499
Median : -0.00000
Q1 : -0.46153
Q2 : -0.00000
Q3 : 0.46154
IQR : 0.92307
C.V. : none
Chapter 4. One way analysis
4.1. one way model
One way model requirement,

$$X_{ij} = \mu + \alpha_i + \varepsilon_{ij}, \quad i = 1, 2, \ldots, k, \quad j = 1, \ldots, n, \quad \varepsilon_{ij} \overset{iid}{\sim} \text{Normal}(0, \sigma_\varepsilon^2).$$
This model cannot analyze big data; big data is population data and the
analysis method is the probability distribution.
4.2. the $\alpha_i = 0$, $i = 1, 2, \ldots, k$,
Example 18. The Normal population is divided into 5 categories,

Category 1 population, $X_1 \sim N(\mu_1 = 25, \sigma_1^2 = 5^2)$,
Category 2 population, $X_2 \sim N(\mu_2 = 25, \sigma_2^2 = 5^2)$,
Category 3 population, $X_3 \sim N(\mu_3 = 25, \sigma_3^2 = 5^2)$,
Category 4 population, $X_4 \sim N(\mu_4 = 25, \sigma_4^2 = 5^2)$,
Category 5 population, $X_5 \sim N(\mu_5 = 25, \sigma_5^2 = 5^2)$,
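Each category has n sample data, and the one way model is designed as
$X_{ij} = \mu + \alpha_i + \varepsilon_{ij}$, $i = 1, 2, \ldots, 5$, $j = 1, \ldots, n$, $\mu = 25$,
$\alpha_1 = \cdots = \alpha_5 = 0$, $\varepsilon_{ij} \overset{iid}{\sim} \text{Normal}(0, \sigma_\varepsilon^2 = 5^2)$.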
18.1)n=100,
One way model analysis, popuation distribution is normal distribution.
One way model
X(ij)=mu+alpha(i)+e(ij), i=1,2...,5, j=1,2...,n(i)
1=A1, 2=A2, 3=A3, 4=A4, 5=A5
A1 A2 A3 A4 A5 Total
sample size 100 100 100 100 100 500
sample mean 24.40637 25.13159 24.90588 25.63750 24.43427 24.90312
sample variance 24.11047 25.44705 22.79769 20.40478 24.85717
alpha estimate value -0.49675 0.22847 0.00276 0.73438 -0.46885
summation of alpha(i)=0.000000
H0:alpha(1)=...=alpha(5)=0
ANOVA
Source df SS MS F
Treatment 4 105.8109657939 26.4527414485 1.1245272744
Error 495 11644.0990944496 23.5234325140
Total 499 11749.9100602435
The F test p value=0.348400
class        [ 1 ]     [ 2 ]     [ 3 ]     [ 4 ]     [ 5 ]     [ 6 ]     [ 7 ]     [ 8 ]     [ 9 ]
lower limit            -5.92056  -3.70917  -2.08938  -0.67794  0.67706   2.08788   3.70737   5.91757
upper limit  -5.92056  -3.70917  -2.08938  -0.67794  0.67706   2.08788   3.70737   5.91757
observed no  52.00000  55.00000  58.00000  66.00000  59.00000  48.00000  53.00000  51.00000  58.00000
probability  0.11111   0.11111   0.11111   0.11111   0.11111   0.11111   0.11111   0.11111   0.11111
expected no  55.55556  55.55556  55.55556  55.55556  55.55556  55.55556  55.55556  55.55556  55.55556
chi square   0.22756   0.00556   0.10756   1.96356   0.21356   1.02756   0.11756   0.37356   0.10756
degree of freedom=7
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
p-value=0.763000
H0: Variances are equal

The Bartlett chi-square test statistic =1.495242
p-value=0.827400
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=260
number of the positive ofresidual=240
Run=257
H0: residualis random , H1: Increasing line or decreasing line
Z=0.573928, p-value=0.717100
H0: residual is random , H1: Oscillation
Z=0.573928, p-value=0.282900
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=0.573928, p-value=0.565800
multiple comparison of population means
1. LSD( least significant difference)
The confidence coefficietn=0.95
95% C.I. for mu(1)-mu(2)
[ -2.0696073781, 0.61915806020], mu(1)=mu(2)
95% C.I. for mu(1)-mu(3)
[ -1.8438932354, 0.84487220300], mu(1)=mu(3)
95% C.I. for mu(1)-mu(4)
[ -2.5755180890, 0.11324734930], mu(1)=mu(4)
95% C.I. for mu(1)-mu(5)
[ -1.3722842001, 1.31648123820], mu(1)=mu(5)
95% C.I. for mu(2)-mu(3)
[ -1.1186685764, 1.57009686190], mu(2)=mu(3)
95% C.I. for mu(2)-mu(4)
[ -1.8502934301, 0.83847200820], mu(2)=mu(4)
95% C.I. for mu(2)-mu(5)
[ -0.6470595412, 2.04170589710], mu(2)=mu(5)
95% C.I. for mu(3)-mu(4)
[ -2.0760075728, 0.61275786550], mu(3)=mu(4)
95% C.I. for mu(3)-mu(5)
[ -0.8727736839, 1.81599175440], mu(3)=mu(5)
95% C.I. for mu(4)-mu(5)
[ -0.1411488302, 2.54761660810], mu(4)=mu(5)
conclusion,
mu(1)=mu(2)= mu(3) =mu(4) = mu(5),
90% confidence interval for population variance [21.296712 , 26.270162]
90% confidence interval for population standard deviation [4.614836 , 5.125443]
(Figure panels: sample scatter diagram, residual plot)
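
Two parts of the output for (18.1) can be recomputed from the numbers printed there: the ANOVA table from the group sizes, means, and variances, and the run test statistic from the counts of negative and positive residuals and the number of runs. The sketch below is an illustration with hypothetical function names; the two-sided p-value uses the standard normal distribution, and the run test uses the usual mean and variance of the number of runs.

```python
import math
from statistics import NormalDist


def anova_from_summary(sizes, means, variances):
    """One way ANOVA table computed from per-group sample sizes, means, and variances."""
    n_total = sum(sizes)
    grand_mean = sum(n * m for n, m in zip(sizes, means)) / n_total
    ss_treatment = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    ss_error = sum((n - 1) * v for n, v in zip(sizes, variances))
    df_treatment, df_error = len(sizes) - 1, n_total - len(sizes)
    ms_treatment, ms_error = ss_treatment / df_treatment, ss_error / df_error
    return {
        "grand_mean": grand_mean,
        "SS_treatment": ss_treatment,
        "SS_error": ss_error,
        "MS_treatment": ms_treatment,
        "MS_error": ms_error,
        "F": ms_treatment / ms_error,
    }


def runs_test_z(n_negative, n_positive, runs):
    """Standardized number of runs (Wald-Wolfowitz run test)."""
    n = n_negative + n_positive
    expected = 2.0 * n_negative * n_positive / n + 1.0
    variance = (2.0 * n_negative * n_positive * (2.0 * n_negative * n_positive - n)
                / (n * n * (n - 1.0)))
    return (runs - expected) / math.sqrt(variance)


# Summary statistics copied from the output of (18.1).
table = anova_from_summary(
    sizes=[100] * 5,
    means=[24.40637, 25.13159, 24.90588, 25.63750, 24.43427],
    variances=[24.11047, 25.44705, 22.79769, 20.40478, 24.85717],
)
z = runs_test_z(n_negative=260, n_positive=240, runs=257)
two_sided_p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))

for key, value in table.items():
    print(f"{key:13s} {value:.6f}")
print("run test Z:", round(z, 6), "two-sided p-value:", round(two_sided_p, 4))
```

Small differences from the printed table in the later decimals come from the rounding of the group means and variances in the output.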
(18.2) n=100,000,000; this is big data and the method is the probability distribution.
(18.2.1) X1,…,X5 marginal probability distributions,
X1 marginal probability distribution,
Variance : 25.00206
S.D. : 5.00021
MAD : 3.98959
Range : 59.70709
Median : 24.99979
Q1 : 21.62796
Q2 : 24.99979
Q3 : 28.37247
IQR : 6.74452
C.V. : 0.20001
Variance : 24.99649
S.D. : 4.99965
MAD : 3.98918
Range : 57.16562
Median : 25.00050
Q1 : 21.62799
Q2 : 25.00050
Q3 : 28.37249
IQR : 6.74450
C.V. : 0.19998
$X_1, \ldots, X_5 \overset{iid}{\sim} \text{Normal}(\mu_1 = 25, \sigma_1^2 = 5^2)$.
(18.2.2) The probability distribution after merging X1, X2, X3, X4, X5: the probability
distributions of X1, ..., X5 are conditional probability distributions, and the prior probability
of each category is its proportion (the category's share of the total sample size), which is 0.2.
The marginal probability distribution,
$$f_X(x) = P(1st) f(x \mid 1st) + P(2nd) f(x \mid 2nd) + P(3rd) f(x \mid 3rd) + P(4th) f(x \mid 4th) + P(5th) f(x \mid 5th) = \frac{1}{\sqrt{50\pi}} \exp\left(-\frac{(x - 25)^2}{50}\right), \quad -\infty < x < \infty,$$
Y1=X marginal probability distribution,
Variance : 25.00216
S.D. : 5.00022
MAD : 3.98988
Range : 56.66966
Median : 25.00039
Q1 : 21.62609
Q2 : 25.00039
Q3 : 28.37265
IQR : 6.74656
C.V. : 0.20001
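
The merging step in (18.2.2) is the law of total probability applied to densities. As a small sketch, the code below builds the mixture of the five category densities with prior probability 0.2 each and confirms that it coincides with the Normal(25, 5²) density; the function names are hypothetical.

```python
from statistics import NormalDist


def mixture_pdf(x, weights, components):
    """f_X(x) = sum over categories of P(category) * f(x | category)."""
    return sum(w * c.pdf(x) for w, c in zip(weights, components))


def mixture_mean_variance(weights, components):
    """Mean and variance of the merged population."""
    mean = sum(w * c.mean for w, c in zip(weights, components))
    second_moment = sum(w * (c.variance + c.mean ** 2) for w, c in zip(weights, components))
    return mean, second_moment - mean ** 2


weights = [0.2] * 5
components = [NormalDist(mu=25.0, sigma=5.0) for _ in range(5)]   # the five categories of Example 18

merged_mean, merged_variance = mixture_mean_variance(weights, components)
print("merged mean and variance:", merged_mean, merged_variance)
for x in (15.0, 25.0, 30.0):
    print(x, mixture_pdf(x, weights, components), NormalDist(25.0, 5.0).pdf(x))
```

The same two functions accept categories with different means, in which case the merged variance is the within-category variance plus the spread of the category means.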
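4.3. the $\alpha_i \ne 0$, $i = 1, 2, \ldots, k$,
Example 19. The Normal population is divided into 5 categories,

Category 1 population, $X_1 \sim N(\mu_1 = 15, \sigma_1^2 = 5^2)$,
Category 2 population, $X_2 \sim N(\mu_2 = 35, \sigma_2^2 = 5^2)$,
Category 3 population, $X_3 \sim N(\mu_3 = 25, \sigma_3^2 = 5^2)$,
Category 4 population, $X_4 \sim N(\mu_4 = 5, \sigma_4^2 = 5^2)$,
Category 5 population, $X_5 \sim N(\mu_5 = 45, \sigma_5^2 = 5^2)$,
$X_{ij} = \mu + \alpha_i + \varepsilon_{ij}$, $i = 1, 2, \ldots, 5$, $j = 1, \ldots, n$, $\mu = 25$,
$\alpha_1 = -10$, $\alpha_2 = 10$, $\alpha_3 = 0$, $\alpha_4 = -20$, $\alpha_5 = 20$, $\varepsilon_{ij} \overset{iid}{\sim} \text{Normal}(0, \sigma_\varepsilon^2 = 5^2)$.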
19.1)n=100,
One way model analysis, popuation distribution is normal distribution.
One way model
1=A1, 2=A2, 3=A3, 4=A4, 5=A5
sample size 100 100 100 100 100 500
sample mean 14.67626 35.11895 25.00049 4.90064 44.68392 24.87606
sample variance 35.35926 23.77747 27.54776 24.88746 19.30776
alpha estimate value -10.19979 10.24290 0.12444 -19.97541 19.80787
summation of alpha(i)=-0.000000
ANOVA
Source df SS MS F
Treatment 4 100033.6931928730 25008.4232982183 955.3972743523
Error 495 12957.0911127101 26.1759416418
Total 499 112990.7843055832
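The mean squares and the F ratio in the ANOVA table above follow directly from the sums of squares and the degrees of freedom. The following is a minimal Python sketch of that arithmetic; the function name `anova_f` is hypothetical and not part of the software that produced the listing.

```python
def anova_f(ss_treatment, df_treatment, ss_error, df_error):
    """Return (MSTR, MSE, F) from the sums of squares and degrees of freedom."""
    mstr = ss_treatment / df_treatment   # mean square for treatment
    mse = ss_error / df_error            # mean square for error
    return mstr, mse, mstr / mse

# Values copied from the ANOVA table of (19.1)
mstr, mse, f_value = anova_f(100033.6931928730, 4, 12957.0911127101, 495)
print(mstr, mse, f_value)
```

The same function can be applied to the ANOVA tables of the later examples by substituting their SS and df values.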
[checking the three basic assumptions]
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ]
lower limit -6.24545 -3.91271 -2.20404 -0.71515 0.71421 2.20246
3.91081 6.24230
upper limit -6.24545 -3.91271 -2.20404 -0.71515 0.71421 2.20246 3.91081
6.24230
observed no 57.00000 58.00000 47.00000 58.00000 57.00000 52.00000 58.00000
58.00000 55.00000
probability 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111
0.11111 0.11111
expected no 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556
55.55556 55.55556
chi square 0.03756 0.10756 1.31756 0.10756 0.03756 0.22756 0.10756
0.10756 0.00556
degree of freedom=7

p-value=0.044400
number of positive residuals=261, Run=250
Z=-0.046289, p-value=0.481600
H0: residual is random , H1: Oscillation, Z=-0.046289, p-value=0.518400
Z=-0.046289, p-value=0.963200
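The Z value and the three p-values above are consistent with the standard Wald-Wolfowitz runs test under the normal approximation. The source does not name the test, so the sketch below is an assumption about how the numbers arise; the function name `runs_test` is hypothetical.

```python
import math

def runs_test(n_pos, n_neg, runs):
    """Runs test (normal approximation): returns Z and three p-values."""
    n = n_pos + n_neg
    mean = 2.0 * n_pos * n_neg / n + 1.0                      # E(number of runs)
    var = (2.0 * n_pos * n_neg * (2.0 * n_pos * n_neg - n)
           / (n * n * (n - 1.0)))                             # Var(number of runs)
    z = (runs - mean) / math.sqrt(var)
    p_lower = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))      # P(Z <= z), too few runs
    p_upper = 1.0 - p_lower                                   # P(Z >= z), oscillation
    p_two_sided = 2.0 * min(p_lower, p_upper)
    return z, p_lower, p_upper, p_two_sided

# (19.1): 261 positive residuals out of 500, 250 runs
z, p_lower, p_upper, p_two_sided = runs_test(261, 500 - 261, 250)
print(z, p_lower, p_upper, p_two_sided)
```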
95% C.I. for mu(1)-mu(2)
[ -21.8608427374, -19.02453253610], mu(1)<mu(2)
95% C.I. for mu(1)-mu(3)
[ -11.7423868730, -8.90607667160], mu(1)<mu(3)
95% C.I. for mu(1)-mu(4)
[ 8.3574644140, 11.19377461530], mu(1)>mu(4)
95% C.I. for mu(1)-mu(5)
[ -31.4258170928, -28.58950689140], mu(1)<mu(5)
95% C.I. for mu(2)-mu(3)
[ 8.7003007638, 11.53661096520], mu(2)>mu(3)
95% C.I. for mu(2)-mu(4)
[ 28.8001520507, 31.63646225210], mu(2)>mu(4)
95% C.I. for mu(2)-mu(5)
[ -10.9831294560, -8.14681925470], mu(2)<mu(5)
95% C.I. for mu(3)-mu(4)
[ 18.6816961862, 21.51800638760], mu(3)>mu(4)
95% C.I. for mu(3)-mu(5)
[ -21.1015853205, -18.26527511910], mu(3)<mu(5)
95% C.I. for mu(4)-mu(5)
[ -41.2014366074, -38.36512640600], mu(4)<mu(5)
conclusion, mu(4)<mu(1)<mu(3)<mu(2)<mu(5)
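Each pairwise interval above has the form (difference of sample means) ± multiplier × sqrt(MSE × (1/n_i + 1/n_j)). The sketch below assumes a standard normal multiplier, which appears to reproduce the reported limits; a Student t quantile with 495 degrees of freedom would give a slightly wider interval. The function name `pairwise_ci` is hypothetical.

```python
import math
from statistics import NormalDist

def pairwise_ci(mean_i, mean_j, mse, n_i, n_j, confidence=0.95):
    """Confidence interval for mu(i) - mu(j) using the pooled MSE."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)   # assumed normal multiplier
    half_width = z * math.sqrt(mse * (1.0 / n_i + 1.0 / n_j))
    diff = mean_i - mean_j
    return diff - half_width, diff + half_width

# mu(1) - mu(2) in (19.1): sample means 14.67626 and 35.11895, MSE 26.1759416418
lower, upper = pairwise_ci(14.67626, 35.11895, 26.1759416418, 100, 100)
print(lower, upper)
```

An interval that excludes 0 leads to the ordering shown in the listing (here mu(1)<mu(2)); an interval that contains 0 is reported as equality.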
The best parameters and goodness of fit (Pearson chi-square test)
mu point estimated value=0.000000 (MLE), sigma point estimated value=5.116243 (MLE)
mu value from -1.023249 to 1.023249, sigma value from 4.263536 to 6.395304
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ]
lower limit -6.30129 -3.97826 -2.27671 -0.79403 0.62937 2.11142
3.81265 6.13443
upper limit -6.30129 -3.97826 -2.27671 -0.79403 0.62937 2.11142 3.81265
6.13443
observed no 55.00000 58.00000 48.00000 56.00000 53.00000 56.00000 59.00000
57.00000 58.00000
probability 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111
0.11111 0.11111
expected no 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556
55.55556 55.55556
chi square 0.00556 0.10756 1.02756 0.00356 0.11756 0.00356 0.21356
0.03756 0.10756
degree of freedom=6
H0: A0~Normal(mu=-0.081860,sigma*sigma=25.958263), sigma=5.094925
(19.2) n=100,000,000; this is big data and the method is probability distribution.
(19.2.1) X1,…,X5 marginal probability distributions,
Variance : 25.00033
S.D. : 5.00003
MAD : 3.98943
Range : 56.25291
Median : 14.99986
Q1 : 11.62697
Q2 : 14.99986
Q3 : 18.37208
IQR : 6.74511
C.V. : 0.33334
Variance : 24.99987
S.D. : 4.99999
MAD : 3.98947
Range : 55.94190
Median : 34.99930
Q1 : 31.62725
Q2 : 34.99930
Q3 : 38.37220
IQR : 6.74495
C.V. : 0.14286
Variance : 25.00206
S.D. : 5.00021
MAD : 3.98959
Range : 59.70709
Median : 24.99979
Q1 : 21.62795
Q2 : 24.99979
Q3 : 28.37247
IQR : 6.74451
C.V. : 0.20001

Variance : 24.99648
S.D. : 4.99965
MAD : 3.98918
Range : 57.16562
Mid_range : 4.59357
Median : 5.00051
Q1 : 1.62799
Q2 : 5.00051
Q3 : 8.37249
IQR : 6.74450
C.V. : 0.99989
Variance : 25.00086
S.D. : 5.00009
MAD : 3.98926
Range : 57.15199
Median : 44.99966
Q1 : 41.62779
Q2 : 44.99966
Q3 : 48.37221
IQR : 6.74442
C.V. : 0.11111
X1,X2,X3,X4,X5 are normally distributed; the population means are not equal and
the population variances are equal.

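$$f_X(x) = P(1st)\,f(x \mid 1st) + P(2nd)\,f(x \mid 2nd) + P(3rd)\,f(x \mid 3rd) + P(4th)\,f(x \mid 4th) + P(5th)\,f(x \mid 5th)$$
$$= 0.2 \times \frac{1}{\sqrt{50\pi}} \exp\left(-\frac{(x-15)^2}{50}\right) + 0.2 \times \frac{1}{\sqrt{50\pi}} \exp\left(-\frac{(x-35)^2}{50}\right) + 0.2 \times \frac{1}{\sqrt{50\pi}} \exp\left(-\frac{(x-25)^2}{50}\right)$$
$$+\ 0.2 \times \frac{1}{\sqrt{50\pi}} \exp\left(-\frac{(x-5)^2}{50}\right) + 0.2 \times \frac{1}{\sqrt{50\pi}} \exp\left(-\frac{(x-45)^2}{50}\right), \quad -\infty < x < \infty$$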
Y1=X marginal probability distribution; Y1 is not normally distributed,

S.D. : 15.00025
MAD : 12.83177
Range : 92.51818
Median : 24.99654
Q1 : 12.51836
Q2 : 24.99654
Q3 : 37.47605
IQR : 24.95769
C.V. : 0.60007
$$E(Y_1) = E(X) = 0.2\,E(X \mid 1st) + 0.2\,E(X \mid 2nd) + 0.2\,E(X \mid 3rd) + 0.2\,E(X \mid 4th) + 0.2\,E(X \mid 5th) = 25,$$
$$E(Y_1^2) = E(X^2) = 0.2\,E(X^2 \mid 1st) + 0.2\,E(X^2 \mid 2nd) + 0.2\,E(X^2 \mid 3rd) + 0.2\,E(X^2 \mid 4th) + 0.2\,E(X^2 \mid 5th)$$
$$= \frac{(15^2 + 25) + (35^2 + 25) + (25^2 + 25) + (5^2 + 25) + (45^2 + 25)}{5} = \frac{4250}{5} = 850,$$
$$Var(Y_1) = 850 - 25^2 = 225,$$
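The moments of the merged variable can be checked numerically from the category weights, means and variances. This is a minimal sketch; the function name `mixture_moments` is hypothetical, and it uses only the identity E(X^2 | category) = variance + mean^2.

```python
def mixture_moments(weights, means, variances):
    """Mean, second moment and variance of a finite mixture distribution."""
    mean = sum(w * m for w, m in zip(weights, means))
    second_moment = sum(w * (v + m * m)
                        for w, m, v in zip(weights, means, variances))
    return mean, second_moment, second_moment - mean * mean

# Example 19: five normal categories with equal weights 0.2 and variance 5^2
mean, second_moment, variance = mixture_moments(
    [0.2] * 5, [15, 35, 25, 5, 45], [25] * 5)
print(mean, second_moment, variance, variance ** 0.5)
```

The resulting standard deviation of 15 agrees with the simulated S.D. of about 15.00025 reported for Y1.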
(19.2.3) The mean of X1,X2,X3,X4,X5.

$$Y_1 = \frac{X_1 + X_2 + X_3 + X_4 + X_5}{5}, \quad Y_1 \sim \mathrm{Normal}(E(Y_1) = 25,\ Var(Y_1) = 5).$$
Variance : 4.99994
S.D. : 2.23605
MAD : 1.78407
Range : 25.90353
Median : 24.99983
Q1 : 23.49187
Q2 : 24.99983
Q3 : 26.50803
IQR : 3.01616
C.V. : 0.08944
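The mean and variance of the average of the five category variables follow from the linearity of expectation and, assuming the categories are independent, from the additivity of variances. A small sketch (the function name `mean_of_independent` is hypothetical):

```python
def mean_of_independent(means, variances):
    """Mean and variance of (X1 + ... + Xk) / k for independent X1..Xk."""
    k = len(means)
    return sum(means) / k, sum(variances) / (k * k)

# Example 19: category means 15, 35, 25, 5, 45, each with variance 5^2
y_mean, y_variance = mean_of_independent([15, 35, 25, 5, 45], [25] * 5)
print(y_mean, y_variance)
```

The values 25 and 5 agree with the simulated median and variance listed above.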
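4.4. The $\alpha_i \neq 0$, i = 1,2,...,k, and the error distribution is the Arcsin distribution.
Example 20,
the $\alpha_i \neq 0$, i = 1,2,...,k,
$X_{ij} = \mu + \alpha_i + \varepsilon_{ij}$, i = 1,2,...,5, j = 1,...,n, $\mu = 25$,
$\alpha_1 = -20,\ \alpha_2 = -10,\ \alpha_3 = 0,\ \alpha_4 = 10,\ \alpha_5 = 20$, $\varepsilon_{ij} \overset{iid}{\sim} \mathrm{Arcsin}(0,\ c_\varepsilon = 10)$, $\sigma_\varepsilon^2 = 50$,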
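For reference, the Arcsin(mu, c) distribution on (mu - c, mu + c) has closed-form summaries, which give the error variance of 50 stated above. The sketch below assumes the usual arcsine density 1 / (pi * sqrt(c^2 - (x - mu)^2)); the function name `arcsine_summary` is hypothetical, and "mad" denotes the mean absolute deviation about mu.

```python
import math

def arcsine_summary(mu, c):
    """Closed-form summaries of the Arcsin(mu, c) distribution."""
    def quantile(p):
        # Inverse of F(x) = 1/2 + arcsin((x - mu) / c) / pi
        return mu + c * math.sin(math.pi * (p - 0.5))
    return {
        "variance": c * c / 2.0,
        "mad": 2.0 * c / math.pi,
        "q1": quantile(0.25),
        "median": quantile(0.5),
        "q3": quantile(0.75),
        "range": 2.0 * c,
    }

error = arcsine_summary(0.0, 10.0)
print(error)
```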
(20.1) n=100,
One way model analysis; the population distribution is the arcsin distribution.
1=A1, 2=A2, 3=A3, 4=A4, 5=A5
sample size 100 100 100 100 100 500
sample mean 5.95631 14.08830 24.75121 33.69864 44.68603 24.63610
sample variance 47.32315 43.33253 46.56744 53.27101 40.77840
alpha estimate value -18.67979 -10.54780 0.11511 9.06254 20.04993
ANOVA
Source df SS MS F
Treatment 4 94433.3479159967 23608.3369789992 510.4007919148
Error 495 22895.9809422788 46.2545069541
Total 499 117329.3288582755
[residual probability distribution analysis]

***************** test the error probability distribution ***********************
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ]
lower limit -10.95300 -8.48030 -6.00759 -3.53489 -1.06219 1.41051 3.88321
6.35591 8.82861
upper limit -8.48030 -6.00759 -3.53489 -1.06219 1.41051 3.88321 6.35591
8.82861 11.30131
observed no 69.00000 57.00000 58.00000 51.00000 49.00000 45.00000 52.00000
51.00000 68.00000
probability 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111
0.11111 0.11111
expected no 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556
55.55556 55.55556
chi square 3.25356 0.03756 0.10756 0.37356 0.77356 2.00556 0.22756
0.37356 2.78756
degree of freedom=7
H0: error~Uniform(alpha,beta), alpha,beta are unknown
alpha point estimated value=-10.952996 (MLE), beta point estimated value=11.301311 (MLE)

class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ]
lower limit -8.24747 -5.16695 -2.91055 -0.94439 0.94316 2.90847
5.16444 8.24331
upper limit -8.24747 -5.16695 -2.91055 -0.94439 0.94316 2.90847 5.16444
8.24331
observed no 85.00000 59.00000 58.00000 35.00000 39.00000 39.00000 41.00000
56.00000 88.00000
probability 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111
0.11111 0.11111
expected no 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556
55.55556 55.55556
chi square 15.60556 0.21356 0.10756 7.60556 4.93356 4.93356 3.81356
0.00356 18.94756
degree of freedom=7
H0: error~Normal(mu=0,sigma*sigma), sigma are unknown
population variance(sigma*sigma) which point estimated value=45.647453 (UMVUE)

class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ]
lower limit -0.39431 -0.27796 -0.20990 -0.16161 -0.12207 -0.07378
-0.00572 0.11063
upper limit -0.39431 -0.27796 -0.20990 -0.16161 -0.12207 -0.07378 -0.00572
0.11063
observed no 246.00000 4.00000 1.00000 0.00000 0.00000 0.00000 3.00000
2.00000 244.00000
probability 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111
0.11111 0.11111
expected no 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556
55.55556 55.55556
chi square 652.84356 47.84356 53.57356 55.55556 55.55556 55.55556 49.71756
51.62756 639.20356
degree of freedom=7
H0: error~Double exponential(lamda,mu), lamda,mu are unknown
lamda point estimated value=0.167856 (MLE), mu point estimated value=-0.141842 (MLE)

class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ]
lower limit -10.95300 -10.45610 -8.52389 -5.56358 -1.93221 1.93221 5.56358
8.52389 10.45610
upper limit -10.45610 -8.52389 -5.56358 -1.93221 1.93221 5.56358 8.52389
10.45610 11.30131
observed no 10.00000 57.00000 71.00000 87.00000 67.00000 71.00000 53.00000
66.00000 18.00000
probability 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111
0.11111 0.11111
expected no 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556
55.55556 55.55556
chi square 37.35556 0.03756 4.29356 17.79756 2.35756 4.29356 0.11756
1.96356 25.38756
degree of freedom=7
H0: error~Arcsin(mu=0,c), mu,c are unknown,c point estimated value=11.127154 (MLE)
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ]
lower limit -10.95300 -9.81323 -8.29369 -6.42427 -3.70905 3.70905 6.42427
8.29369 9.81323
upper limit -9.81323 -8.29369 -6.42427 -3.70905 3.70905 6.42427 8.29369
9.81323 11.30131
observed no 13.00000 66.00000 41.00000 61.00000 144.00000 56.00000 31.00000
51.00000 37.00000
probability 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111
0.11111 0.11111
expected no 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556
55.55556 55.55556
chi square 32.59756 1.96356 3.81356 0.53356 140.80356 0.00356 10.85356
0.37356 6.19756
degree of freedom=7
H0: error~Triangular 1(mu=0,c), mu,c are unknown

class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ]
lower limit -10.95300 -6.18175 -4.13330 -2.47270 -0.82423 0.82423 2.47270
4.13330 6.18175
upper limit -6.18175 -4.13330 -2.47270 -0.82423 0.82423 2.47270 4.13330
6.18175 11.30131
observed no 124.00000 43.00000 47.00000 27.00000 34.00000 30.00000 27.00000
48.00000 120.00000
probability 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111
0.11111 0.11111
expected no 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556
55.55556 55.55556
chi square 84.32356 2.83756 1.31756 14.67756 8.36356 11.75556 14.67756
1.02756 74.75556
degree of freedom=7
H0: error~Trapezoid(mu=0,c), mu,c are unknown, c point estimated value=7.418102 (MLE)

class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ]
lower limit 0.00000 -10.39320 -9.29048 -7.83589 -5.43334 5.43244 7.83509
9.28958 10.39235
upper limit -10.39320 -9.29048 -7.83589 -5.43334 5.43244 7.83509 9.28958
10.39235 11.30131
observed no 10.00000 22.00000 60.00000 48.00000 220.00000 40.00000 47.00000
31.00000 22.00000
probability 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111
0.11111 0.11111
expected no 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556
55.55556 55.55556
chi square 37.35556 20.26756 0.35556 1.02756 486.75556 4.35556 1.31756
10.85356 20.26756
degree of freedom=7
H0: error~U_quadratic(a,b), a,b are unknown
a point estimated value=-11.301311 (MLE), b point estimated value=11.301311 (MLE)

class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ]
lower limit 0.00000 -7.38183 -5.03257 -2.94816 -0.97269 0.97163 2.94710
5.03126 7.37970
upper limit -7.38183 -5.03257 -2.94816 -0.97269 0.97163 2.94710 5.03126
7.37970 11.12707
observed no 102.00000 46.00000 53.00000 36.00000 40.00000 38.00000 40.00000
39.00000 106.00000
probability 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111
0.11111 0.11111
expected no 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556
55.55556 55.55556
chi square 38.82756 1.64356 0.11756 6.88356 4.35556 5.54756 4.35556
4.93356 45.80356
degree of freedom=7
H0: error~Semi-circle(mu=0,R), mu,R are unknown , R point estimated value=11.127154 (MLE)
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ]
lower limit -173.02438 -104.23882 -57.67479 -18.56714 18.56714 57.67479
104.23882 173.02438
upper limit -173.02438 -104.23882 -57.67479 -18.56714 18.56714 57.67479 104.23882
173.02438
observed no 0.00000 0.00000 0.00000 0.00000 500.00000 0.00000 0.00000
0.00000 0.00000
probability 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111
0.11111 0.11111
expected no 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556
55.55556 55.55556
chi square 55.55556 55.55556 55.55556 55.55556 3555.55556 55.55556 55.55556
55.55556 55.55556
degree of freedom=7
H0: error~Logistic(mu=0,sigma), mu,sigma are unknown
sigma point estimated value=83.207141 (MME)

class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ]
lower limit -10.95300 -5.97382 -3.76710 -2.07383 -0.64633 0.64633 2.07383
3.76710 5.97382
upper limit -5.97382 -3.76710 -2.07383 -0.64633 0.64633 2.07383 3.76710
5.97382 11.30131
observed no 128.00000 50.00000 47.00000 19.00000 29.00000 23.00000 29.00000
50.00000 125.00000
probability 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111
0.11111 0.11111
expected no 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556
55.55556 55.55556
chi square 94.46756 0.55556 1.31756 24.05356 12.69356 19.07756 12.69356
0.55556 86.80556
degree of freedom=7
H0: error~Triangular 2(a,b,0), mu,sigma are unknown
a point estimated value=-11.301311 (MLE) b point estimated value=11.301311 (MLE)
*********************************************************************************
The error probability distribution is taken to be the Uniform distribution after the goodness of fit test
H0: alpha(1)=….= alpha(5)

class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ]
lower limit -10.95300 -8.48030 -6.00759 -3.53489 -1.06219 1.41051 3.88321
6.35591 8.82861
upper limit -8.48030 -6.00759 -3.53489 -1.06219 1.41051 3.88321 6.35591
8.82861 11.30131
observed no 69.00000 57.00000 58.00000 51.00000 49.00000 45.00000 52.00000
51.00000 68.00000
probability 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111
0.11111 0.11111
expected no 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556
55.55556 55.55556
chi square 3.25356 0.03756 0.10756 0.37356 0.77356 2.00556 0.22756
0.37356 2.78756
degree of freedom=7
H0: error~Uniform(alpha,beta), alpha,beta are unknown
alpha point estimated value=-10.952996 (MLE), beta point estimated value=11.301311 (MLE)

Max(sample variance(i))/SSE=test value=0.002327, p value=0.157790

number of positive residuals=246, Run=250
Z=-0.083824, p-value=0.466600
Z=-0.083824, p-value=0.933200

1. LSD (least significant difference), the confidence coefficient=0.95
95% C.I. for mu(1)-mu(2)
[ -10.0217884824, -6.24219714610], mu(1)<mu(2)
95% C.I. for mu(1)-mu(3)
[ -20.6847016860, -16.90511034980], mu(1)<mu(3)
95% C.I. for mu(1)-mu(4)
[ -29.6321288083, -25.85253747200], mu(1)<mu(4)
95% C.I. for mu(1)-mu(5)
[ -40.6195223109, -36.83993097470], mu(1)<mu(5)
95% C.I. for mu(2)-mu(3)
[ -12.5527088718, -8.77311753560], mu(2)<mu(3)
95% C.I. for mu(2)-mu(4)
[ -21.5001359940, -17.72054465780], mu(2)<mu(4)
95% C.I. for mu(2)-mu(5)
[ -32.4875294966, -28.70793816040], mu(2)<mu(5)
95% C.I. for mu(3)-mu(4)
[ -10.8372227904, -7.05763145410], mu(3)<mu(4)
95% C.I. for mu(3)-mu(5)
[ -21.8246162930, -18.04502495670], mu(3)<mu(5)
95% C.I. for mu(4)-mu(5)
[ -12.8771891707, -9.09759783450], mu(4)<mu(5)
conclusion,mu(1)<mu(2)< mu(3) <mu(4) < mu(5)
The common population standard deviation and variance confidence interval
residual goodness of fit test,H0: the arcsin distribution,

class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ]
lower limit -10.95300 -10.45610 -8.52389 -5.56358 -1.93221 1.93221 5.56358
8.52389 10.45610
upper limit -10.45610 -8.52389 -5.56358 -1.93221 1.93221 5.56358 8.52389
10.45610 11.30131
observed no 10.00000 57.00000 71.00000 87.00000 67.00000 71.00000 53.00000
66.00000 18.00000
probability 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111
0.11111 0.11111
expected no 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556
55.55556 55.55556
chi square 37.35556 0.03756 4.29356 17.79756 2.35756 4.29356 0.11756
1.96356 25.38756
degree of freedom=7
H0: A0~Arcsin(mu=0.000000,c), c is unknown, c point estimated value=11.127154 (MLE),
residual goodness of fit test, H0: the uniform distribution,

class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ]
lower limit -10.95300 -8.48030 -6.00759 -3.53489 -1.06219 1.41051 3.88321
6.35591 8.82861
upper limit -8.48030 -6.00759 -3.53489 -1.06219 1.41051 3.88321 6.35591
8.82861 11.30131
observed no 69.00000 57.00000 58.00000 51.00000 49.00000 45.00000 52.00000
51.00000 68.00000
probability 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111
0.11111 0.11111
expected no 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556
55.55556 55.55556
chi square 3.25356 0.03756 0.10756 0.37356 0.77356 2.00556 0.22756
0.37356 2.78756
degree of freedom=6
H0: A0~Uniform(alpha,beta), alpha,beta are unknown
alpha point estimated value=-10.952996 (MLE)
beta point estimated value=11.301311 (MLE)
p-value=0.127200
residual goodness of fit test( the best parameter values)

H0: the arcsin distribution,
mu point estimated value=-0.000000
c point estimated value=11.127154
mu value from -2.225431 to 2.225431
c value from 9.272628 to 13.908942
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ]
lower limit -10.95300 -8.70965 -7.06728 -4.55101 -1.46434 1.82041 4.90707
7.42334 9.06572
upper limit -8.70965 -7.06728 -4.55101 -1.46434 1.82041 4.90707 7.42334
9.06572 11.30131
observed no 54.00000 50.00000 56.00000 68.00000 63.00000 60.00000 43.00000
49.00000 57.00000
probability 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111
0.11111 0.11111
expected no 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556
55.55556 55.55556
chi square 0.04356 0.55556 0.00356 2.78756 0.99756 0.35556 2.83756
0.77356 0.03756
degree of freedom=6
H0: A0~Arcsin(mu=0.178034,c=9.458081),
p-value=0.210700
residual goodness of fit test( the best parameter values),

H0: the uniform distribution,
alpha point estimated value=-10.952996 (MLE)
beta point estimated value=11.301311 (MLE)
alpha value from -11.042192 to -10.863801
beta value from 11.212115 to 11.390507
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ]
lower limit -11.04219 -8.55423 -6.06627 -3.57830 -1.09034 1.39762 3.88559
6.37355 8.86151
upper limit -8.55423 -6.06627 -3.57830 -1.09034 1.39762 3.88559 6.37355
8.86151 11.34948
observed no 65.00000 61.00000 57.00000 50.00000 51.00000 46.00000 51.00000
52.00000 67.00000
probability 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111
0.11111 0.11111
expected no 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556
55.55556 55.55556
chi square 1.60556 0.53356 0.03756 0.55556 0.37356 1.64356 0.37356
0.22756 2.35756
degree of freedom=6
H0: A0~Uniform(alpha=-11.042192,beta=11.349477),
p-value=0.260200
(20.2) n=100 and the data are the same as in (20.1); one way analysis where the error follows the Arcsin
distribution,
(20.2.1) Each category probability distribution,
Category 1 data goodness of fit test,
mu point estimated value=5.956306, c point estimated value=9.998345
mu value from 3.956637 to 7.955975, c value from 8.331954 to 12.497931
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
lower limit -4.99669 -2.50363 0.02794 3.68618 7.74651 11.40475 13.93633
upper limit -2.50363 0.02794 3.68618 7.74651 11.40475 13.93633 15.00000
observed no 16.00000 11.00000 13.00000 12.00000 18.00000 15.00000 15.00000
probability 0.14286 0.14286 0.14286 0.14286 0.14286 0.14286 0.14286
expected no 14.28571 14.28571 14.28571 14.28571 14.28571 14.28571 14.28571
chi square 0.20571 0.75571 0.11571 0.36571 0.96571 0.03571 0.03571
degree of freedom=4
H0: X1~Arcsin(mu=5.716346,c=9.123490),
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
lower limit 5.00034 6.22754 8.67307 12.20696 16.12928 19.66317 22.10870
upper limit 6.22754 8.67307 12.20696 16.12928 19.66317 22.10870 24.95515
observed no 15.00000 13.00000 16.00000 19.00000 12.00000 8.00000 17.00000
probability 0.14286 0.14286 0.14286 0.14286 0.14286 0.14286 0.14286
expected no 14.28571 14.28571 14.28571 14.28571 14.28571 14.28571 14.28571
chi square 0.03571 0.11571 0.20571 1.55571 0.36571 2.76571 0.51571
degree of freedom=4
H0: X2~Arcsin(mu=14.168118,c=8.813378),
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
lower limit 15.01221 15.47688 18.28394 22.34026 26.84244 30.89876 33.70582
upper limit 15.47688 18.28394 22.34026 26.84244 30.89876 33.70582 34.99502
observed no 11.00000 14.00000 18.00000 16.00000 15.00000 13.00000 13.00000
probability 0.14286 0.14286 0.14286 0.14286 0.14286 0.14286 0.14286
expected no 14.28571 14.28571 14.28571 14.28571 14.28571 14.28571 14.28571
chi square 0.75571 0.00571 0.96571 0.20571 0.03571 0.11571 0.11571
degree of freedom=4
H0: X3~Arcsin(mu=24.591350,c=10.116300),
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
lower limit 25.00016 25.41150 28.19782 32.22417 36.69309 40.71944 43.50576
upper limit 25.41150 28.19782 32.22417 36.69309 40.71944 43.50576 44.99995
observed no 17.00000 17.00000 16.00000 16.00000 9.00000 11.00000 14.00000
probability 0.14286 0.14286 0.14286 0.14286 0.14286 0.14286 0.14286
expected no 14.28571 14.28571 14.28571 14.28571 14.28571 14.28571 14.28571
chi square 0.51571 0.51571 0.20571 0.20571 1.95571 0.75571 0.00571
degree of freedom=4
H0: X4~Arcsin(mu=34.458631,c=10.041560),
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
lower limit 35.00913 35.53594 38.19390 42.03475 46.29779 50.13865 52.79660
upper limit 35.53594 38.19390 42.03475 46.29779 50.13865 52.79660 54.99998
observed no 10.00000 12.00000 18.00000 18.00000 16.00000 12.00000 14.00000
probability 0.14286 0.14286 0.14286 0.14286 0.14286 0.14286 0.14286
expected no 14.28571 14.28571 14.28571 14.28571 14.28571 14.28571 14.28571
chi square 1.28571 0.36571 0.96571 0.96571 0.20571 0.36571 0.00571
degree of freedom=4
H0: X5~Arcsin(mu=44.166271,c=9.578946),
(20.2.2)
One way model analysis,
1=A1, 2=A2, 3=A3, 4=A4, 5=A5
sample size 100 100 100 100 100 500
sample mean 5.95631 14.08830 24.75121 33.69864 44.68603 24.63610
sample variance 47.32315 43.33253 46.56744 53.27101 40.77840
alpha estimate value -18.67979 -10.54780 0.11511 9.06254 20.04993
ANOVA
Source df SS MS F
Treatment 4 94433.3479159967 23608.3369789992 510.4007919148
Error 495 22895.9809422788 46.2545069541
Total 499 117329.3288582755
The error probability is Arcsin distribution.

class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ]
lower limit -10.95300 -10.45610 -8.52389 -5.56358 -1.93221 1.93221 5.56358
8.52389 10.45610
upper limit -10.45610 -8.52389 -5.56358 -1.93221 1.93221 5.56358 8.52389
10.45610 11.30131
observed no 10.00000 57.00000 71.00000 87.00000 67.00000 71.00000 53.00000
66.00000 18.00000
probability 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111
0.11111 0.11111
expected no 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556
55.55556 55.55556
chi square 37.35556 0.03756 4.29356 17.79756 2.35756 4.29356 0.11756
1.96356 25.38756
degree of freedom=7
H0: error~Arcsin(mu=0,c), mu,c are unknown

Max(sample variance(i))/SSE=test value=0.002327, p value=0.047028

Run=250
Z=-0.083824, p-value=0.466600
Z=-0.083824, p-value=0.533400
Z=-0.083824, p-value=0.933200
95% C.I. for mu(1)-mu(2)
[ -10.0222940599, -6.24169156860], mu(1)<mu(2)
95% C.I. for mu(1)-mu(3)
[ -20.6852072636, -16.90460477230], mu(1)<mu(3)
95% C.I. for mu(1)-mu(4)
[ -29.6326343858, -25.85203189450], mu(1)<mu(4)
95% C.I. for mu(1)-mu(5)
[ -40.6200278884, -36.83942539710], mu(1)<mu(5)
95% C.I. for mu(2)-mu(3)
[ -12.5532144493, -8.77261195800], mu(2)<mu(3)
95% C.I. for mu(2)-mu(4)
[ -21.5006415715, -17.72003908030], mu(2)<mu(4)
95% C.I. for mu(2)-mu(5)
[ -32.4880350742, -28.70743258290], mu(2)<mu(5)
95% C.I. for mu(3)-mu(4)
[ -10.8377283679, -7.05712587660], mu(3)<mu(4)
95% C.I. for mu(3)-mu(5)
[ -21.8251218705, -18.04451937920], mu(3)<mu(5)
95% C.I. for mu(4)-mu(5)
[ -12.8776947483, -9.09709225700], mu(4)<mu(5)
The common population standard deviation and variance confidence interval
(20.2.3) The probability distribution of the residuals, and the reason the uniform distribution
is not rejected by the goodness of fit test.
Category 1, the first residual, W5,
Variance : 49.50686
S.D. : 7.03611
MAD : 6.31904
Range : 26.66973
Mid_range : 0.03464
Median : 0.00053
Q1 : -6.96295
Q2 : 0.00053
Q3 : 6.96582
IQR : 13.92877
C.V. : none
Category 2, the first residual, W6,

Variance : 49.50846
S.D. : 7.03622
MAD : 6.31924
Range : 26.41640
Mid_range : 0.06983
Median : 0.00052
Q1 : -6.96262
Q2 : 0.00052
Q3 : 6.96479
IQR : 13.92740
C.V. : none
[Figures: f(w5,w6) and f(w6,w5)]
E(W5)= 0.0013, Var(W5)= 49.5069, E(W6)= 0.0005, Var(W6)= 49.5085,

Cov(W5,W6)= 0.0102, W5 and W6 correlation coefficient=0.0002.
The residual probability distribution is not the error probability distribution.
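$$X_{11},\ldots,X_{1n} \overset{iid}{\sim} \mathrm{Arcsin}(5, 10), \quad \bar{X} = \frac{1}{n}\sum_{j=1}^{n} X_{1j} \xrightarrow[n \to \infty]{CLT} N\left(5, \frac{50}{n}\right),$$
The Category 1 j-th residual $X_{1j} - \bar{X}$ follows neither the Arcsin distribution nor the Normal
distribution, j = 1,2,...,n.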
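One way to see that the residual cannot have the same distribution as the error is to compare variances. The short derivation below assumes only that $X_{11},\ldots,X_{1n}$ are iid with variance $\sigma^2 = 50$ and that $n = 100$:

```latex
\begin{aligned}
X_{1j} - \bar{X} &= \left(1 - \tfrac{1}{n}\right) X_{1j} - \tfrac{1}{n} \sum_{k \neq j} X_{1k}, \\
\operatorname{Var}\left(X_{1j} - \bar{X}\right)
  &= \left(1 - \tfrac{1}{n}\right)^{2} \sigma^{2} + \tfrac{n-1}{n^{2}}\, \sigma^{2}
   = \tfrac{n-1}{n}\, \sigma^{2}
   = 0.99 \times 50 = 49.5 .
\end{aligned}
```

This agrees with the values Var(W5) = 49.5069 and Var(W6) = 49.5085 reported above, which are slightly smaller than the error variance of 50.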
(20.2.4) ANOVA when the error follows the Arcsin distribution and n=100. The sampling
distribution and the critical values are given below.
H0: alpha(1)=...=alpha(5)=0, MSTR/MSE test statistic,
Variance : 0.51228
S.D. : 0.71574
MAD : 0.54658
Range : 9.64185
Mid_range : 4.82103
Median : 0.83962
Q1 : 0.48001
Q2 : 0.83962
Q3 : 1.35105
IQR : 0.87104
C.V. : 0.71278
Critical value, P(MSTR/MSE > critical value) = α,

α                0.995    0.99     0.975    0.95     0.9
Critical value   0.0515   0.0740   0.1207   0.1771   0.2651

α                0.1      0.05     0.025    0.01     0.005
Critical value   1.9577   2.3926   2.8159   3.3642   3.7739
SLLN method, the comparison of MSTR/MSE and F(4,495)

E(| new distribution F(x) – F distribution F(x)| ^2)= 0.0000001127
Pr(| new distribution F(x) - F distribution F(x)|>= 0.1000000000)= 0.000000
The distribution of MSTR/MSE approaches F(4,495), but it is not exactly F(4,495).
Note: please refer to Appendix 10.
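The sampling distribution of MSTR/MSE under Arcsin errors can be approximated by simulation. The sketch below is not the authors' simulator; it is a small Monte Carlo illustration with far fewer replications, and the function names, the seed and the number of replications are assumptions. It estimates how often the ratio exceeds the tabulated α = 0.05 critical value 2.3926 when H0 is true.

```python
import math
import random

def arcsine_error(rng, c=10.0):
    """Draw one Arcsin(0, c) error: c * sin(U), U uniform on (-pi/2, pi/2)."""
    return c * math.sin(rng.uniform(-math.pi / 2.0, math.pi / 2.0))

def mstr_over_mse(groups):
    """One-way ANOVA ratio MSTR/MSE for a list of groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    sstr = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (sstr / (k - 1)) / (sse / (n_total - k))

def simulate_ratios(replications=2000, k=5, n=100, mu=25.0, seed=2015):
    """Simulate MSTR/MSE under H0 (all alpha(i) = 0) with Arcsin errors."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(replications):
        groups = [[mu + arcsine_error(rng) for _ in range(n)]
                  for _ in range(k)]
        ratios.append(mstr_over_mse(groups))
    return ratios

ratios = simulate_ratios()
exceed_rate = sum(r > 2.3926 for r in ratios) / len(ratios)
print(f"share of simulated ratios above 2.3926: {exceed_rate:.4f}")
```

With so few replications the estimated share is only a rough check; it should land near 0.05 but will not reproduce the tabulated critical values to four decimals.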
(20.3) n=100,000,000; this is big data and the method is probability distribution.

Variance : 50.00204
S.D. : 7.07121
MAD : 6.36633
Range : 20.00000
Mid_range : 5.00000
Median : 4.99983
Q1 : -2.07007
Q2 : 4.99983
Q3 : 12.07197
IQR : 14.14204
C.V. : 1.41415
Variance : 50.00204
S.D. : 7.07121
MAD : 6.36630
Range : 20.00000
Median : 14.99808
Q1 : 7.92735
Q2 : 14.99808
Q3 : 22.07020
IQR : 14.14284
C.V. : 0.47145

Variance : 49.99877
S.D. : 7.07098
MAD : 6.36617
Range : 20.00000
Median : 24.99924
Q1 : 17.93019
Q2 : 24.99924
Q3 : 32.07234
IQR : 14.14214
C.V. : 0.28283
Variance : 50.00963
S.D. : 7.07175
MAD : 6.36697
Range : 20.00000
Median : 35.00040
Q1 : 27.92758
Q2 : 35.00040
Q3 : 42.07292
IQR : 14.14534
C.V. : 0.20205
Variance : 50.00112
S.D. : 7.07115
MAD : 6.36629
Range : 20.00000
Median : 45.00107
Q1 : 37.92928
Q2 : 45.00107
Q3 : 52.07130
IQR : 14.14202
C.V. : 0.15713

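$$f_X(x) = P(1st)\,f(x \mid 1st) + P(2nd)\,f(x \mid 2nd) + P(3rd)\,f(x \mid 3rd) + P(4th)\,f(x \mid 4th) + P(5th)\,f(x \mid 5th)$$
$$= 0.2 \times \frac{1}{10\pi} \frac{1}{\sqrt{1 - \frac{(x-5)^2}{100}}} + 0.2 \times \frac{1}{10\pi} \frac{1}{\sqrt{1 - \frac{(x-15)^2}{100}}} + 0.2 \times \frac{1}{10\pi} \frac{1}{\sqrt{1 - \frac{(x-25)^2}{100}}}$$
$$+\ 0.2 \times \frac{1}{10\pi} \frac{1}{\sqrt{1 - \frac{(x-35)^2}{100}}} + 0.2 \times \frac{1}{10\pi} \frac{1}{\sqrt{1 - \frac{(x-45)^2}{100}}},$$
where the i-th term is taken as 0 outside $|x - \mu_i| < 10$.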
S.D. : 15.81391
MAD : 13.27544
Range : 60.00000
Median : 25.00000
Q1 : 13.21437
Q2 : 25.00000
Q3 : 36.78374
IQR : 23.56937
C.V. : 0.63256
(20.3.3) The mean of X1,X2,X3,X4,X5, $Y_1 = \frac{X_1 + X_2 + X_3 + X_4 + X_5}{5}$,
Variance : 10.00071
S.D. : 3.16239
MAD : 2.55694
Range : 19.96294
Median : 24.99959
Q1 : 22.80531
Q2 : 24.99959
Q3 : 27.19544
IQR : 4.39013
C.V. : 0.12649
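4.5. The $\alpha_i \neq 0$, i = 1,2,...,k, and the error distribution of each
category has a specific probability distribution.
Example 21,
the $\alpha_i \neq 0$, i = 1,2,...,k,
The population is divided into 5 categories,
Category 1 population, $X_1 \sim \mathrm{Arcsin}(\mu_1 = 5,\ c_1 = 10)$,
Category 2 population, $X_2 \sim \mathrm{Normal}(\mu_2 = 15,\ \sigma_2^2 = 50)$,
Category 3 population, $X_3 \sim \mathrm{Semi\_circle}(\mu_3 = 25,\ R_3 = \sqrt{200})$,
Category 4 population, $X_4 \sim \mathrm{DE}(\lambda_4 = 0.2,\ \mu_4 = 35)$,
Category 5 population, $X_5 \sim \mathrm{Triangular1}(\mu_5 = 45,\ c_5 = 10)$,
$X_{ij} = \mu + \alpha_i + \varepsilon_{ij}$, i = 1,2,...,5, j = 1,...,n, $\mu = 25$,
$\alpha_1 = -20,\ \alpha_2 = -10,\ \alpha_3 = 0,\ \alpha_4 = 10,\ \alpha_5 = 20$,
$\varepsilon_{1j} \overset{iid}{\sim} \mathrm{Arcsin}(0,\ c_{\varepsilon_1} = 10)$, $\sigma_{\varepsilon_1}^2 = 50$, $\varepsilon_{2j} \overset{iid}{\sim} \mathrm{Normal}(0,\ \sigma_{\varepsilon_2}^2 = 50)$,
$\varepsilon_{3j} \overset{iid}{\sim} \mathrm{Semi\_circle}(0,\ R_{\varepsilon_3} = \sqrt{200})$, $\sigma_{\varepsilon_3}^2 = 50$, $\varepsilon_{4j} \overset{iid}{\sim} \mathrm{DE}(\lambda_{\varepsilon_4} = 0.2,\ 0)$, $\sigma_{\varepsilon_4}^2 = 50$,
$\varepsilon_{5j} \overset{iid}{\sim} \mathrm{Triangular1}(0,\ c_{\varepsilon_5} = 10)$, $\sigma_{\varepsilon_5}^2 = 50$,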
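The five error distributions are chosen so that every category has the same variance, 50. The sketch below checks this with closed-form variances and also lists each mean absolute deviation. Two readings are assumptions inferred from the marginal density and the summaries in (21.2): that the semi-circle radius is sqrt(200), and that "Triangular1" is the V-shaped density |x| / c^2 on (-c, c).

```python
import math

c = 10.0                    # half-width of the Arcsin and Triangular1 errors
sigma2 = 50.0               # variance of the Normal error
radius = math.sqrt(200.0)   # assumed semi-circle radius
lam = 0.2                   # rate of the double exponential (DE) error

# name -> (variance, mean absolute deviation about 0)
error_summaries = {
    "Arcsin(0, c=10)":      (c * c / 2.0,       2.0 * c / math.pi),
    "Normal(0, 50)":        (sigma2,            math.sqrt(2.0 * sigma2 / math.pi)),
    "Semi_circle(0, R)":    (radius ** 2 / 4.0, 4.0 * radius / (3.0 * math.pi)),
    "DE(lambda=0.2, 0)":    (2.0 / lam ** 2,    1.0 / lam),
    "Triangular1(0, c=10)": (c * c / 2.0,       2.0 * c / 3.0),
}

for name, (variance, mad) in error_summaries.items():
    print(f"{name:22s} variance={variance:.5f}  MAD={mad:.5f}")
```

Although the variances coincide, the mean absolute deviations differ (about 6.37, 5.64, 6.00, 5.00 and 6.67), which is why equal variances alone do not make the categories identically distributed.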
(21.1) n=100; each category has its own probability distribution and the variances
are equal; the error is assumed to be normally distributed when the data are analyzed.
One way model, X(ij)=mu+alpha(i)+e(ij), i=1,2,...,5, j=1,2,...,n(i)
1=A1, 2=A2, 3=A3, 4=A4, 5=A5
sample size 100 100 100 100 100 500
sample mean 4.11151 14.78073 23.99617 35.50465 44.53823 24.58626
sample variance 52.12294 48.92488 52.72852 63.07862 51.03545
alpha estimate value -20.47475 -9.80552 -0.59009 10.91840 19.95197
ANOVA
Source df SS MS F
Treatment 4 103300.4246150119 25825.1061537530 482.0087696471
Error 495 26521.1513796043 53.5780835952
Total 499 129821.5759946162
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ]
lower limit -8.93524 -5.59783 -3.15327 -1.02314 1.02181 3.15101
5.59511 8.93073
upper limit -8.93524 -5.59783 -3.15327 -1.02314 1.02181 3.15101 5.59511
8.93073
observed no 51.00000 83.00000 48.00000 54.00000 42.00000 41.00000 52.00000
70.00000 59.00000
probability 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111
0.11111 0.11111
expected no 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556
55.55556 55.55556
chi square 0.37356 13.55756 1.02756 0.04356 3.30756 3.81356 0.22756
3.75556 0.21356
degree of freedom=7
p-value=0.000400
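The chi-square row of the table above sums to the Pearson goodness of fit statistic, and the p-value is its upper-tail probability. The sketch below recomputes both from the observed counts. The function names are hypothetical, and the closed-form tail formula used here is valid only for odd degrees of freedom; the source does not state how its p-value was obtained.

```python
import math

def chi_square_gof(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi_square_sf_odd_df(x, df):
    """Upper-tail probability P(chi-square(df) > x); odd df only."""
    if df % 2 == 0:
        raise ValueError("this closed form is only valid for odd df")
    chi = math.sqrt(x)
    normal_upper = 0.5 * math.erfc(chi / math.sqrt(2.0))    # P(Z > chi)
    normal_pdf = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)
    series, term = 0.0, chi
    for r in range(1, (df - 1) // 2 + 1):
        series += term            # adds chi^(2r-1) / (1*3*...*(2r-1))
        term *= x / (2 * r + 1)
    return 2.0 * normal_upper + 2.0 * normal_pdf * series

# Residual counts of (21.1) in 9 equal-probability classes, 500 residuals
observed = [51, 83, 48, 54, 42, 41, 52, 70, 59]
expected = [sum(observed) / len(observed)] * len(observed)
statistic = chi_square_gof(observed, expected)
p_value = chi_square_sf_odd_df(statistic, 7)
print(statistic, p_value)
```

A p-value this small rejects normality of the residuals, which is consistent with the categories of Example 21 not all being normal.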

p-value=0.745400

number of negative residuals=256, number of positive residuals=244
Run=239
Z=-1.062110, p-value=0.144100
Z=-1.062110, p-value=0.855900
Z=-1.062110, p-value=0.288200
multiple comparison of population means, assuming each population is normally distributed,

95% C.I. for mu(1)-mu(2)
[ -12.6981522577, -8.64030068100] mu(1)<mu(2)
95% C.I. for mu(1)-mu(3)
[ -21.9135862670, -17.85573469030] mu(1)<mu(3)
95% C.I. for mu(1)-mu(4)
[ -33.4220712270, -29.36421965030] mu(1)<mu(4)
95% C.I. for mu(1)-mu(5)
[ -42.4556431670, -38.39779159030] mu(1)<mu(5)
95% C.I. for mu(2)-mu(3)
[ -11.2443597977, -7.18650822100] mu(2)<mu(3)
95% C.I. for mu(2)-mu(4)
[ -22.7528447576, -18.69499318090] mu(2)<mu(4)
95% C.I. for mu(2)-mu(5)
[ -31.7864166977, -27.72856512100] mu(2)<mu(5)
95% C.I. for mu(3)-mu(4)
[ -13.5374107483, -9.47955917160] mu(3)<mu(4)
95% C.I. for mu(3)-mu(5)
[ -22.5709826884, -18.51313111170] mu(3)<mu(5)
95% C.I. for mu(4)-mu(5)
[ -11.0624977284, -7.00464615170] mu(4)<mu(5)
Conclusion,mu(1)<mu(2) <mu(3) <mu(4) <mu(5)
(21.2) n=100,000,000; this is big data and the method is probability distribution.

Variance : 50.00735
S.D. : 7.07159
MAD : 6.36673
Range : 20.00000
Mid_range : 5.00000
Median : 5.00225
Q1 : -2.07068
Q2 : 5.00225
Q3 : 12.07320
IQR : 14.14388
C.V. : 1.41393
Variance : 50.01358
S.D. : 7.07203
MAD : 5.64265
Range : 79.98374
Median : 14.99973
Q1 : 10.22996
Q2 : 14.99973
Q3 : 19.76935
IQR : 9.53939
C.V. : 0.47145

Variance : 50.00195
S.D. : 7.07121
MAD : 6.00218
Range : 28.28411
Median : 24.99957
Q1 : 19.28774
Q2 : 24.99957
Q3 : 30.71420
IQR : 11.42645
C.V. : 0.28284
Variance : 50.01376
S.D. : 7.07204
MAD : 5.00031
Range : 171.46965
Median : 34.99991
Q1 : 31.53313
Q2 : 34.99991
Q3 : 38.46399
IQR : 6.93085
C.V. : 0.20207
Variance : 49.99835
S.D. : 7.07095
MAD : 6.66655
Range : 20.00000
Median : 44.92970
Q1 : 37.92903
Q2 : 44.92970
Q3 : 52.07031
IQR : 14.14128
C.V. : 0.15713

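The marginal probability distribution,
$$f_X(x) = P(1st)\,f(x \mid 1st) + P(2nd)\,f(x \mid 2nd) + P(3rd)\,f(x \mid 3rd) + P(4th)\,f(x \mid 4th) + P(5th)\,f(x \mid 5th)$$
$$= 0.2 \times \frac{1}{10\pi} \frac{1}{\sqrt{1 - \frac{(x-5)^2}{100}}} + 0.2 \times \frac{1}{\sqrt{100\pi}} \exp\left(-\frac{(x-15)^2}{100}\right) + 0.2 \times \frac{1}{100\pi} \sqrt{200 - (x-25)^2}$$
$$+\ 0.2 \times 0.1 \exp\left(-0.2\,|x - 35|\right) + 0.2 \times \frac{|x - 45|}{10} \times \frac{1}{10},$$
where the Arcsin and Triangular1 terms are taken as 0 outside $|x - \mu_i| < 10$ and the Semi_circle term is 0 outside $|x - 25| < \sqrt{200}$.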
S.D. : 15.81018
MAD : 13.43457
Range : 163.69237
Median : 25.14082
Q1 : 13.19401
Q2 : 25.14082
Q3 : 36.66261
IQR : 23.46860
C.V. : 0.63235
(21.2.3) The mean of X1,X2,X3,X4,X5, $Y_1 = \frac{X_1 + X_2 + X_3 + X_4 + X_5}{5}$,
Variance : 9.99995
S.D. : 3.16227
MAD : 2.53224
Range : 44.39859
Median : 24.99926
Q1 : 22.84348
Q2 : 24.99926
Q3 : 27.15464
IQR : 4.31117
C.V. : 0.12649
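4.6. The $\alpha_i = 0$, i = 1,2,...,k, and the error distribution of each
category has a specific probability distribution.
Example 22,
the $\alpha_i = 0$, i = 1,2,...,k,
The population is divided into 5 categories,
Category 1 population, $X_1 \sim \mathrm{Arcsin}(\mu_1 = 25,\ c_1 = 10)$,
Category 2 population, $X_2 \sim \mathrm{Normal}(\mu_2 = 25,\ \sigma_2^2 = 50)$,
Category 3 population, $X_3 \sim \mathrm{Semi\_circle}(\mu_3 = 25,\ R_3 = \sqrt{200})$,
Category 4 population, $X_4 \sim \mathrm{DE}(\lambda_4 = 0.2,\ \mu_4 = 25)$,
Category 5 population, $X_5 \sim \mathrm{Triangular1}(\mu_5 = 25,\ c_5 = 10)$,
$X_{ij} = \mu + \alpha_i + \varepsilon_{ij}$, i = 1,2,...,5, j = 1,...,n, $\mu = 25$,
$\alpha_1 = \alpha_2 = \alpha_3 = \alpha_4 = \alpha_5 = 0$,
$\varepsilon_{1j} \overset{iid}{\sim} \mathrm{Arcsin}(0,\ c_{\varepsilon_1} = 10)$, $\sigma_{\varepsilon_1}^2 = 50$, $\varepsilon_{2j} \overset{iid}{\sim} \mathrm{Normal}(0,\ \sigma_{\varepsilon_2}^2 = 50)$,
$\varepsilon_{3j} \overset{iid}{\sim} \mathrm{Semi\_circle}(0,\ R_{\varepsilon_3} = \sqrt{200})$, $\sigma_{\varepsilon_3}^2 = 50$, $\varepsilon_{4j} \overset{iid}{\sim} \mathrm{DE}(\lambda_{\varepsilon_4} = 0.2,\ 0)$, $\sigma_{\varepsilon_4}^2 = 50$,
$\varepsilon_{5j} \overset{iid}{\sim} \mathrm{Triangular1}(0,\ c_{\varepsilon_5} = 10)$, $\sigma_{\varepsilon_5}^2 = 50$,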
(22.1) n=100; each category has its own specific probability distribution and the
variances are equal. The error is assumed to be normally distributed when the data
are analysed.
One way model
1=A1, 2=A2, 3=A3, 4=A4, 5=A5
                      A1        A2        A3        A4        A5        Total
sample size           100       100       100       100       100       500
sample mean           24.47177  25.20717  24.55538  25.82802  25.92013  25.19649
sample variance       47.91952  39.94974  43.76623  43.07667  52.68748
alpha estimate value  -0.72472  0.01068   -0.64111  0.63152   0.72364
ANOVA
Source df SS MS F
Treatment 4 185.8843267195 46.4710816799 1.0217931828
Error 495 22512.5649871330 45.4799292669
Total 499 22698.4493138525
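The ANOVA table above can be reproduced, up to rounding, from the group summary statistics alone. The following is a minimal sketch under that assumption; the function name `one_way_anova_from_summary` is hypothetical, and the inputs are the rounded sample means and variances printed above, so the last digits differ slightly from the software output.

```python
def one_way_anova_from_summary(sizes, means, variances):
    """One-way ANOVA computed from group sizes, sample means and sample variances."""
    n_total = sum(sizes)
    k = len(sizes)
    grand_mean = sum(n * m for n, m in zip(sizes, means)) / n_total
    # Between-group (treatment) sum of squares.
    ss_treatment = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    # Within-group (error) sum of squares, recovered from the sample variances.
    ss_error = sum((n - 1) * v for n, v in zip(sizes, variances))
    ms_treatment = ss_treatment / (k - 1)
    ms_error = ss_error / (n_total - k)
    return {
        "df_treatment": k - 1,
        "df_error": n_total - k,
        "ss_treatment": ss_treatment,
        "ss_error": ss_error,
        "ms_treatment": ms_treatment,
        "ms_error": ms_error,
        "F": ms_treatment / ms_error,
    }


anova = one_way_anova_from_summary(
    sizes=[100] * 5,
    means=[24.47177, 25.20717, 24.55538, 25.82802, 25.92013],
    variances=[47.91952, 39.94974, 43.76623, 43.07667, 52.68748],
)
print(anova)
```

Working from summaries is convenient here because the raw observations are not listed in the text.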
class  lower limit  upper limit  observed no  probability  expected no  chi square
[ 1 ]  -inf         -8.23232     66.00000     0.11111      55.55556     1.96356
[ 2 ]  -8.23232     -5.15746     61.00000     0.11111      55.55556     0.53356
[ 3 ]  -5.15746     -2.90521     51.00000     0.11111      55.55556     0.37356
[ 4 ]  -2.90521     -0.94266     57.00000     0.11111      55.55556     0.03756
[ 5 ]  -0.94266     0.94142      42.00000     0.11111      55.55556     3.30756
[ 6 ]  0.94142      2.90313      36.00000     0.11111      55.55556     6.88356
[ 7 ]  2.90313      5.15496      48.00000     0.11111      55.55556     1.02756
[ 8 ]  5.15496      8.22817      75.00000     0.11111      55.55556     6.80556
[ 9 ]  8.22817      +inf         64.00000     0.11111      55.55556     1.28356
degree of freedom=7
p-value=0.002300
p-value=0.686800
Run=237
Z=-1.241278, p-value=0.107300
Z=-1.241278, p-value=0.892700
Z=-1.241278, p-value=0.214600
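The chi-square contributions in the goodness-of-fit table above follow from the observed counts and the equal class probabilities. A small sketch, assuming nine equal-probability classes and one estimated parameter (the error variance), which is how the reported degree of freedom of 7 would arise:

```python
# Observed residual counts in the nine equal-probability classes (from the table above).
observed = [66, 61, 51, 57, 42, 36, 48, 75, 64]

n = sum(observed)
expected = n / len(observed)  # equal class probability 1/9

# Pearson chi-square contribution of each class: (O - E)^2 / E.
contributions = [(o - expected) ** 2 / expected for o in observed]
chi_square = sum(contributions)

# Assumption: one parameter (sigma) is estimated, so df = classes - 1 - 1.
degrees_of_freedom = len(observed) - 1 - 1

print([round(c, 5) for c in contributions])
print(round(chi_square, 5), degrees_of_freedom)
```

The total is about 22.2 on 7 degrees of freedom, which corresponds to the small goodness-of-fit p-value reported above.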
1. LSD (least significant difference), assuming each population is normally distributed,
95% C.I. for mu(1)-mu(2)[ -2.6047178349, 1.13391193990] mu(1)=mu(2)
95% C.I. for mu(1)-mu(3)[ -1.9529250259, 1.78570474890] mu(1)=mu(3)
95% C.I. for mu(1)-mu(4)[ -3.2255615302, 0.51306824450] mu(1)=mu(4)
95% C.I. for mu(1)-mu(5)[ -3.3176800410, 0.42094973370] mu(1)=mu(5)
95% C.I. for mu(2)-mu(3)[ -1.2175220784, 2.52110769640] mu(2)=mu(3)
95% C.I. for mu(2)-mu(4)[ -2.4901585827, 1.24847119200] mu(2)=mu(4)
95% C.I. for mu(2)-mu(5)[ -2.5822770935, 1.15635268130] mu(2)=mu(5)
95% C.I. for mu(3)-mu(4)[ -3.1419513917, 0.59667838300] mu(3)=mu(4)
95% C.I. for mu(3)-mu(5)[ -3.2340699025, 0.50455987220] mu(3)=mu(5)
95% C.I. for mu(4)-mu(5)[ -1.9614333982, 1.77719637660] mu(4)=mu(5)
conclusion,mu(1)=mu(2)= mu(3)=mu(4)=mu(5),
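The pairwise intervals above can be approximated from the sample means and the MSE. This sketch assumes a standard normal critical value (about 1.96), which appears consistent with the printed limits; a t quantile with 495 degrees of freedom would give marginally wider intervals. The function name `pairwise_interval` is hypothetical.

```python
import math
from statistics import NormalDist


def pairwise_interval(mean_i, mean_j, mse, n_i, n_j, confidence=0.95):
    """Approximate LSD-style confidence interval for mu(i) - mu(j)."""
    # Assumption: normal critical value rather than a t critical value.
    critical = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = critical * math.sqrt(mse * (1 / n_i + 1 / n_j))
    diff = mean_i - mean_j
    return diff - half_width, diff + half_width


# mu(1) - mu(2) for Example 22 with n = 100 per category.
lower, upper = pairwise_interval(24.47177, 25.20717, 45.4799292669, 100, 100)
print(lower, upper)
```

An interval that contains 0 is read as mu(i)=mu(j), as in the listing above.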
(22.2) n=100,000,000; this is big data, and the method is the probability distribution.
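[Figures: the comparison of X1 and X2; the comparison of X1 and X3]
The marginal probability distribution is
\[
\begin{aligned}
f_X(x) ={}& P(\text{1st})\,f(x \mid \text{1st}) + P(\text{2nd})\,f(x \mid \text{2nd}) + P(\text{3rd})\,f(x \mid \text{3rd}) + P(\text{4th})\,f(x \mid \text{4th}) + P(\text{5th})\,f(x \mid \text{5th}) \\
={}& 0.2 \times \frac{1}{10\pi\sqrt{1 - \frac{(x-25)^2}{100}}}
 + 0.2 \times \frac{1}{\sqrt{2 \times 50\pi}} \exp\left(-\frac{(x-25)^2}{2 \times 50}\right)
 + 0.2 \times \frac{1}{100\pi}\sqrt{200 - (x-25)^2} \\
& + 0.2 \times 0.1\exp\left(-0.2\,|x-25|\right)
 + 0.2 \times \left|\frac{x-25}{10}\right| \times \frac{1}{10},
\end{aligned}
\]
where each term is taken as zero outside the support of its own distribution.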
Variance : 50.00985
S.D. : 7.07176
MAD : 5.93557
Range : 163.69237
Median : 25.00175
Q1 : 19.22921
Q2 : 25.00175
Q3 : 30.77336
IQR : 11.54415
C.V. : 0.28286
(22.2.3) The mean of X1, X2, X3, X4, X5: \( Y_1 = \dfrac{X_1 + X_2 + X_3 + X_4 + X_5}{5} \).
Variance : 10.00058
S.D. : 3.16237
MAD : 2.53237
Range : 40.88973
Median : 24.99994
Q1 : 22.84364
Q2 : 24.99994
Q3 : 27.15600
IQR : 4.31237
C.V. : 0.12649
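4.7. The α_i = 0, i = 1,2,...,k,

This section checks the multiple comparison method and its critical value.
The Normal population is divided into 5 categories,

Category 1 population, \( X_1 \sim N(\mu_1 = 25,\ \sigma_1^2 = 5^2) \),
Category 2 population, \( X_2 \sim N(\mu_2 = 25,\ \sigma_2^2 = 5^2) \),
Category 3 population, \( X_3 \sim N(\mu_3 = 25,\ \sigma_3^2 = 5^2) \),
Category 4 population, \( X_4 \sim N(\mu_4 = 25,\ \sigma_4^2 = 5^2) \),
Category 5 population, \( X_5 \sim N(\mu_5 = 25,\ \sigma_5^2 = 5^2) \),

\( X_{ij} = \mu + \alpha_i + \varepsilon_{ij},\ i = 1,2,\ldots,5,\ j = 1,\ldots,n,\ \mu = 25, \)
\( \varepsilon_{ij} \overset{iid}{\sim} N(0,\ 5^2), \)
n=100,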
One way model
1=X1, 2=X2, 3=X3, 4=X4, 5=X5
                      X1        X2        X3        X4        X5        Total
sample size           100       100       100       100       100       500
sample mean           25.83636  24.37861  25.14427  25.48965  24.80035  25.12985
sample variance       24.12428  28.19286  19.79491  27.18655  26.64595
alpha estimate value  0.70651   -0.75124  0.01442   0.35980   -0.32949
ANOVA
Source df SS MS F
Treatment 4 130.1739575636 32.5434893909 1.2919769848
Error 495 12468.5094530738 25.1889079860
Total 499 12598.6834106374
The F test p value=0.277200,
class  lower limit  upper limit  observed no  probability  expected no  chi square
[ 1 ]  -inf         -6.12657     60.00000     0.11111      55.55556     0.35556
[ 2 ]  -6.12657     -3.83823     53.00000     0.11111      55.55556     0.11756
[ 3 ]  -3.83823     -2.16208     42.00000     0.11111      55.55556     3.30756
[ 4 ]  -2.16208     -0.70153     55.00000     0.11111      55.55556     0.00556
[ 5 ]  -0.70153     0.70062      65.00000     0.11111      55.55556     1.60556
[ 6 ]  0.70062      2.16053      56.00000     0.11111      55.55556     0.00356
[ 7 ]  2.16053      3.83636      62.00000     0.11111      55.55556     0.74756
[ 8 ]  3.83636      6.12348      56.00000     0.11111      55.55556     0.00356
[ 9 ]  6.12348      +inf         51.00000     0.11111      55.55556     0.37356
degree of freedom=7
p-value=0.480500

p-value=0.428000

95% C.I. for mu(1)-mu(2)
[ 0.0665828388, 2.84890388000] mu(1)>mu(2)
95% C.I. for mu(1)-mu(3)
[ -0.6990785956, 2.08324244560] mu(1)=mu(3)
95% C.I. for mu(1)-mu(4)
[ -1.0444551970, 1.73786584420] mu(1)=mu(4)
95% C.I. for mu(1)-mu(5)
[ -0.3551599125, 2.42716112870] mu(1)=mu(5)
95% C.I. for mu(2)-mu(3)
[ -2.1568219550, 0.62549908620] mu(2)=mu(3)
95% C.I. for mu(2)-mu(4)
[ -2.5021985564, 0.28012248480] mu(2)=mu(4)
95% C.I. for mu(2)-mu(5)
[ -1.8129032719, 0.96941776930] mu(2)=mu(5)
95% C.I. for mu(3)-mu(4)
[ -1.7365371220, 1.04578391920] mu(3)=mu(4)
95% C.I. for mu(3)-mu(5)
[ -1.0472418376, 1.73507920370] mu(3)=mu(5)
95% C.I. for mu(4)-mu(5)
[ -0.7018652361, 2.08045580510] mu(4)=mu(5)
conclusion,mu(2)=mu(3)= mu(4) =mu(5) = mu(1), but mu(2)<mu(1),
[Figures: sample scatter diagram; residual plot]
H0: alpha(1)=...=alpha(5)=0, F test p-value=0.277200, that is, mu(2)=mu(3)=mu(4)=mu(5)=mu(1),
but the multiple comparison gives mu(2)<mu(1), so the two test results conflict.
The LSD test does not hold the confidence coefficient 0.95. With 6,000,000 simulations
and the critical value of LSD, the probability that matches the requirement is only
71.283174%. If the test result is to agree with the ANOVA at confidence coefficient 95%,
6,000,000 simulations give the critical value 2.7373265745, with probability 94.996285%.
When all population means are equal, this is close to 95%.
For the multiple comparison and ANOVA to give the same test result, the critical value of
the multiple comparison must be re-calculated.
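With \( \alpha = 0.05 \), the test statistic
\[
\frac{\bar{X}_i - \bar{X}_j}{\sqrt{MSE\left(\dfrac{1}{n} + \dfrac{1}{n}\right)}}
\]
has a symmetric distribution, so only the right-sided critical value is shown.
P(|test statistic| ≤ right-sided critical value)=0.95,
Critical value by sample size n (rows) and treatment number k (columns)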
n 2 3 4 5 6
2 4.3023 4.1774 4.0682 4.0120 3.9780
3 2.7745 3.0668 3.1999 3.2939 3.3600
4 2.4442 2.7922 2.9696 3.0870 3.1788
5 2.3028 2.6695 2.8624 2.9919 3.0905
8 2.1437 2.5208 2.7304 2.8769 2.9865
10 2.0997 2.4792 2.6944 2.8416 2.9540
15 2.0491 2.4300 2.6489 2.7993 2.9161
20 2.0247 2.4066 2.6280 2.7820 2.8984
25 2.0085 2.3917 2.6146 2.7703 2.8880
30 2.0007 2.3852 2.6074 2.7628 2.8821

n 7 8 9 10 11
2 3.9626 3.9574 3.9577 3.9583 3.9643
3 3.4148 3.4626 3.5026 3.5415 3.5759
4 3.2505 3.3118 3.3624 3.4108 3.4532
5 3.1707 3.2387 3.2975 3.3477 3.3940
8 3.0735 3.1486 3.2130 3.2681 3.3169
10 3.0456 3.1219 3.1872 3.2441 3.2958
15 3.0105 3.0897 3.1554 3.2160 3.2676
20 2.9951 3.0740 3.1435 3.2017 3.2549
25 2.9855 3.0663 3.1323 3.1954 3.2474
30 2.9792 3.0600 3.1274 3.1900 3.2428

n 12 13 14 15 16
2 4.0600 4.0723 4.0869 4.1016 4.1174
3 3.5852 3.6237 3.6621 3.6951 3.7274
4 3.4637 3.5095 3.5492 3.5901 3.6237
5 3.4094 3.4558 3.4991 3.5376 3.5736
8 3.3447 3.3911 3.4344 3.4741 3.5107
10 3.3258 3.3728 3.4155 3.4558 3.4912
15 3.3032 3.3502 3.3923 3.4326 3.4674
20 3.2935 3.3398 3.3813 3.4216 3.4570
25 3.2880 3.3344 3.3752 3.4143 3.4497
30 3.2849 3.3295 3.3722 3.4100 3.4460
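Critical values of this kind can be approximated by simulation. The sketch below is not the authors' program; it assumes normal errors, equal population means, equal group sizes, and that the relevant statistic is the largest absolute pairwise statistic among the k treatments. Group means and the MSE are drawn directly from their sampling distributions, and the function name `simulate_max_pairwise_stat`, the seed and the number of replications are arbitrary choices.

```python
import math
import random


def simulate_max_pairwise_stat(k, n, reps, seed=12345):
    """Sorted simulated values of max |Xbar_i - Xbar_j| / sqrt(MSE * 2 / n).

    Assumes normal errors with sigma = 1 and equal population means.
    """
    rng = random.Random(seed)
    df_error = k * (n - 1)
    se_mean = 1.0 / math.sqrt(n)
    values = []
    for _ in range(reps):
        means = [rng.gauss(0.0, se_mean) for _ in range(k)]
        # MSE ~ chi-square(df) / df, with chi-square(df) = Gamma(shape=df/2, scale=2).
        mse = rng.gammavariate(df_error / 2.0, 2.0) / df_error
        # The largest absolute pairwise difference equals the range of the means.
        values.append((max(means) - min(means)) / math.sqrt(mse * 2.0 / n))
    values.sort()
    return values


stats = simulate_max_pairwise_stat(k=5, n=100, reps=20000)
# Share of simulations in which every pairwise statistic stays within 1.96.
coverage_at_196 = sum(s <= 1.96 for s in stats) / len(stats)
# Empirical 95% point of the largest pairwise statistic.
critical_95 = stats[int(0.95 * len(stats)) - 1]
print(coverage_at_196, critical_95)
```

For k=5 and n=100 the two printed numbers should land near the 71.283174% and 2.7373265745 quoted in the text, within simulation error; 20,000 replications are far fewer than the 6,000,000 used there.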
Chapter 5. Simple linear model
5.1. Simple linear analysis

(1.1) samples
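The paired sample is \( (X_i, Y_i),\ i = 1,2,\ldots,n, \)
\( Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i,\ i = 1,2,\ldots,n \), where \( \beta_0 \) is the intercept, \( \beta_1 \) is the slope and \( \varepsilon_i \) is the error,
\( X_i \) is the independent variable and \( Y_i \) is the dependent variable; this is a conditional property.
There are three basic assumptions,
i) \( \varepsilon_i \sim \) Normal distribution, ii) \( E(\varepsilon_i) = 0,\ Var(\varepsilon_i) = \sigma^2 \), iii) \( \varepsilon_1, \ldots, \varepsilon_n \) are independent,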
(1.2)Big data
The simple linear model analysis can be applied to big data. In this method,
\( f_X(x) \) and \( f_\varepsilon(\varepsilon) \) can be formed using curve-fitting or the SLLN.
\( Y = H(x) + \varepsilon \), where \( H(x) \) comes from the linear model analysis.
\( X \) and \( \varepsilon \) are independent random variables.
\[
f_{X,\varepsilon}(x, \varepsilon) = f_X(x)\, f_\varepsilon(\varepsilon), \qquad
f_{X,Y}(x, y) = f_{X,\varepsilon}\left(x,\ \varepsilon = y - H(x)\right),
\]
\[
f_Y(y) = \int f_{X,Y}(x, y)\,dx,
\]
\[
f_{Y \mid x}(y \mid x) = \frac{f_{X,Y}(x, y)}{f_X(x)}, \qquad
f_{X \mid y}(x \mid y) = \frac{f_{X,Y}(x, y)}{f_Y(y)},
\]
These give the marginal probability distribution, the conditional probability distribution and the joint
probability distribution.
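The relations above can be evaluated numerically. The following sketch assumes, purely for illustration, the setting of Example 24 later in this chapter: X ~ Normal(0, 8), H(x) = 1 + 2x and error ~ Normal(0, 1). The integration limits, step count and function names are hypothetical choices; under these assumptions the marginal of Y is Normal(1, 33), which gives a value to compare against.

```python
import math


def normal_pdf(x, mean, var):
    """Density of Normal(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def H(x):
    # Assumed fitted relation, taken from Example 24: H(x) = 1 + 2x.
    return 1.0 + 2.0 * x


def joint_pdf(x, y):
    """f_{X,Y}(x, y) = f_X(x) * f_eps(y - H(x)), with X and eps independent."""
    return normal_pdf(x, 0.0, 8.0) * normal_pdf(y - H(x), 0.0, 1.0)


def marginal_y(y, lo=-30.0, hi=30.0, steps=20000):
    """f_Y(y) by trapezoidal integration of the joint density over x."""
    h = (hi - lo) / steps
    total = 0.5 * (joint_pdf(lo, y) + joint_pdf(hi, y))
    for i in range(1, steps):
        total += joint_pdf(lo + i * h, y)
    return total * h


def conditional_y_given_x(y, x):
    """f_{Y|x}(y | x) = f_{X,Y}(x, y) / f_X(x)."""
    return joint_pdf(x, y) / normal_pdf(x, 0.0, 8.0)


print(marginal_y(1.0), normal_pdf(1.0, 1.0, 33.0))
print(conditional_y_given_x(3.0, 1.0), normal_pdf(3.0, 3.0, 1.0))
```

With fitted densities from curve-fitting in place of the two normal densities, the same three functions would give the joint, marginal and conditional distributions for other data.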
5.2. The parabola model analysis, three basic assumptions are unchanged.

Example 23, \( X_1 \sim \text{Normal}(\mu_{X_1} = 0,\ \sigma_{X_1}^2 = 2^2) \),
\( E(X_2 \mid x_1) = \beta_0 + \beta_1 x_1^2 = 1 + 2x_1^2 \), \( \varepsilon \sim \text{Normal}(0,\ \sigma^2 = 1) \),
(23.1) paired samples, n=1000,
(23.1.1) Basic analysis
[Figures: scatter diagram; scatter diagram using the linear model]
(23.1.2) the frequency probability table of independent variable,

X1 frequency probability table
[ 1 ] -6.31382~ -4.88201 -5.59792 10.00000 0.0100000 0.0100000
[ 2 ] -4.88201~ -3.45020 -4.16610 34.00000 0.0340000 0.0440000
[ 3 ] -3.45020~ -2.01839 -2.73429 128.00000 0.1280000 0.1720000
[ 4 ] -2.01839~ -0.58657 -1.30248 231.00000 0.2310000 0.4030000
[ 5 ] -0.58657~ 0.84524 0.12933 279.00000 0.2790000 0.6820000
[ 6 ] 0.84524~ 2.27705 1.56115 197.00000 0.1970000 0.8790000
[ 7 ] 2.27705~ 3.70886 2.99296 84.00000 0.0840000 0.9630000
[ 8 ] 3.70886~ 5.14068 4.42477 27.00000 0.0270000 0.9900000
[ 9 ] 5.14068~ 6.57249 5.85658 10.00000 0.0100000 1.0000000
(23.1.3) the frequency probability table of dependent variable,

[ 1 ] -1.95255~ 7.94041 2.99393 632.00000 0.6320000 0.6320000
[ 2 ] 7.94041~ 17.83337 12.88689 217.00000 0.2170000 0.8490000
[ 3 ] 17.83337~ 27.72633 22.77985 75.00000 0.0750000 0.9240000
[ 4 ] 27.72633~ 37.61929 32.67281 32.00000 0.0320000 0.9560000
[ 5 ] 37.61929~ 47.51224 42.56576 19.00000 0.0190000 0.9750000
[ 6 ] 47.51224~ 57.40520 52.45872 11.00000 0.0110000 0.9860000
[ 7 ] 57.40520~ 67.29816 62.35168 6.00000 0.0060000 0.9920000
[ 8 ] 67.29816~ 77.19112 72.24464 6.00000 0.0060000 0.9980000
[ 9 ] 77.19112~ 87.08407 82.13759 2.00000 0.0020000 1.0000000
frequency distribution: sample mean=9.800288 , sample variance=151.567910 , sample sd=12.311292
(23.1.4)
X 2i = β 0 + β1 X 1i + ε i , i = 1,2,...., n , β 0 is intercept, β1 is slope, ε i is error,
(23.1.4.1)
The linear model analysis
The estimated line is X2=9.496367-0.000008*X1
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1 1 0.0000002558 0.0000002558 0.0000000017
error 998 150956.1107438368 151.2586279998
total 999 150956.1107440926
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
H0: slope(X1)=0
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 9.4963665700 0.3891511813 24.40277 0.00000
slope -0.0000077701 0.1889605459 -0.00004 1.00000
----------------------------------------------------------------------------------
MSE=151.2586279998 , R2=0.000000 , R2(adj)=-0.001002
X2(mean)= 9.4963671217, X2(variance)= 151.1072179621, X2(s.d.)= 12.2925675903
X1(mean)= -0.0710038541, X1(variance)= 4.2404544136, X1(s.d.)= 2.0592363666
SSX1= 4236.2139591564 , SS(X2*X1)= -0.0329158468, C.V.= 1.2950978508
[testing the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class   lower limit  upper limit  observed no  probability  expected no  chi square
[ 1 ]   -inf         -15.76201    0.00000      0.10000      100.00000    100.00000
[ 2 ]   -15.76201    -10.35077    8.00000      0.10000      100.00000    84.64000
[ 3 ]   -10.35077    -6.44917     351.00000    0.10000      100.00000    630.01000
[ 4 ]   -6.44917     -3.11546     213.00000    0.10000      100.00000    127.69000
[ 5 ]   -3.11546     0.00031      112.00000    0.10000      100.00000    1.44000
[ 6 ]   0.00031      3.11558      76.00000     0.10000      100.00000    5.76000
[ 7 ]   3.11558      6.44919      62.00000     0.10000      100.00000    14.44000
[ 8 ]   6.44919      10.34561     52.00000     0.10000      100.00000    23.04000
[ 9 ]   10.34561     15.76074     39.00000     0.10000      100.00000    37.21000
[ 10 ]  15.76074     +inf         87.00000     0.10000      100.00000    1.69000
degree of freedom=8
p-value=0.000000

Z=-0.167482, p-value=0.433500
Z=-0.167482, p-value=0.566500
Z=-0.167482, p-value=0.867000
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,1000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=1.940563
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=2.059437
The D.W. table depends on the values of the independent variable,
so obtaining the p value takes a lot of time.
The Durbin Watson table currently in use is wrong.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
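For reference, the usual Durbin Watson statistic is the sum of squared successive residual differences divided by the sum of squared residuals. The sketch below uses a made-up residual list, not data from the book. The second value in the output above appears to be 4 minus the first, which is assumed here to be the form used for the alternative of negative autocorrelation.

```python
def durbin_watson(residuals):
    """Durbin Watson statistic: sum (e_t - e_{t-1})^2 / sum e_t^2."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# Illustrative residuals only (hypothetical values, strongly alternating in sign).
example_residuals = [1.0, -1.0, 1.0, -1.0]
d = durbin_watson(example_residuals)
d_mirrored = 4.0 - d  # assumed form of the second reported test value

print(d, d_mirrored)
```

Values of the statistic near 2, as in the output above, are what uncorrelated residuals would typically produce.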
2. The population sigma of error confidence interval
[Figures: estimated line; residual plot]
(23.1.4.2) residual analysis
X0= residual,residual frequency distribution table,
[ 1 ] -11.44891~ -1.55595 -6.50243 632.00000 0.6320000 0.6320000
[ 2 ] -1.55595~ 8.33702 3.39053 217.00000 0.2170000 0.8490000
[ 3 ] 8.33702~ 18.22998 13.28350 75.00000 0.0750000 0.9240000
[ 4 ] 18.22998~ 28.12294 23.17646 32.00000 0.0320000 0.9560000
[ 5 ] 28.12294~ 38.01591 33.06942 19.00000 0.0190000 0.9750000
[ 6 ] 38.01591~ 47.90887 42.96239 11.00000 0.0110000 0.9860000
[ 7 ] 47.90887~ 57.80183 52.85535 6.00000 0.0060000 0.9920000
[ 8 ] 57.80183~ 67.69480 62.74831 6.00000 0.0060000 0.9980000
[ 9 ] 67.69480~ 77.58776 72.64128 2.00000 0.0020000 1.0000000
X0= residual, goodness of fit (Pearson chi-square test statistic)

class   lower limit  upper limit  observed no  probability  expected no  chi square
[ 1 ]   -inf         -15.76201    0.00000      0.10000      100.00000    100.00000
[ 2 ]   -15.76201    -10.35077    8.00000      0.10000      100.00000    84.64000
[ 3 ]   -10.35077    -6.44917     351.00000    0.10000      100.00000    630.01000
[ 4 ]   -6.44917     -3.11546     213.00000    0.10000      100.00000    127.69000
[ 5 ]   -3.11546     0.00031      112.00000    0.10000      100.00000    1.44000
[ 6 ]   0.00031      3.11558      76.00000     0.10000      100.00000    5.76000
[ 7 ]   3.11558      6.44919      62.00000     0.10000      100.00000    14.44000
[ 8 ]   6.44919      10.34561     52.00000     0.10000      100.00000    23.04000
[ 9 ]   10.34561     15.76074     39.00000     0.10000      100.00000    37.21000
[ 10 ]  15.76074     +inf         87.00000     0.10000      100.00000    1.69000
degree of freedom=8
H0: X0~Normal(mu=0.000000,sigma*sigma), sigma is unknown
population variance(sigma*sigma) which point estimated value=151.258628 pearson chi-square
test statistic =1025.920000
p-value=0.000000
(23.1.5) X 2i = β 0 + β1 H ( X 1i ) + ε i , i = 1,2,...., n , β 0 is intercept, β1 is slope, ε i is error,

(23.1.5.1)
Non-linear model analysis
The relation is X2= 0.9706108969+ 2.0101963232*X1^2
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1^2 1 149968.7899284744 149968.7899284744 151590.9013372837
error 998 987.3208156181 0.9892994144
total 999 150956.1107440926
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
H0: slope(X1)=0
----------------------------------------------------------------------------------
intercept 0.9706108969 0.0383249777 25.32580 0.00000
slope 2.0101963232 0.0051629974 389.34676 0.00000
----------------------------------------------------------------------------------
MSE=0.9892994144 , R2=0.993460 , R2(adj)=0.993453
X1^2(mean)= 4.2412555065, X1^2(variance)= 37.1499685491, X1^2(s.d.)= 6.0950774030
SS(X1^2)=37112.8185805040 , SS(X2*X1^2)= 74604.0514540141, C.V.= 0.1047385073
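The fitted coefficients and the ANOVA entries above can be recovered from the printed sums of squares. This is a minimal check that assumes the standard least-squares formulas for a single regressor (here X1^2) and uses only numbers printed in the output above.

```python
# Values printed in the output above (regressor is X1^2, response is X2).
ss_x = 37112.8185805040       # SS(X1^2)
ss_xy = 74604.0514540141      # SS(X2*X1^2)
ss_total = 150956.1107440926  # total sum of squares of X2
mean_x = 4.2412555065         # X1^2(mean)
mean_y = 9.4963671217         # X2(mean)
n = 1000

slope = ss_xy / ss_x
intercept = mean_y - slope * mean_x
ss_regression = slope * ss_xy
ss_error = ss_total - ss_regression
mse = ss_error / (n - 2)
r_squared = ss_regression / ss_total
f_value = ss_regression / mse
t_slope = slope / (mse / ss_x) ** 0.5  # slope divided by its standard error

print(intercept, slope)
print(mse, r_squared, f_value, t_slope)
```

With one regressor the F value equals the square of the slope's t value, which can be seen by comparing the two printed numbers.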
class   lower limit  upper limit  observed no  probability  expected no  chi square
[ 1 ]   -inf         -1.27472     94.00000     0.10000      100.00000    0.36000
[ 2 ]   -1.27472     -0.83710     112.00000    0.10000      100.00000    1.44000
[ 3 ]   -0.83710     -0.52156     96.00000     0.10000      100.00000    0.16000
[ 4 ]   -0.52156     -0.25196     89.00000     0.10000      100.00000    1.21000
[ 5 ]   -0.25196     0.00002      109.00000    0.10000      100.00000    0.81000
[ 6 ]   0.00002      0.25197      100.00000    0.10000      100.00000    0.00000
[ 7 ]   0.25197      0.52157      104.00000    0.10000      100.00000    0.16000
[ 8 ]   0.52157      0.83668      88.00000     0.10000      100.00000    1.44000
[ 9 ]   0.83668      1.27462      103.00000    0.10000      100.00000    0.09000
[ 10 ]  1.27462      +inf         105.00000    0.10000      100.00000    0.25000
degree of freedom=8
p-value=0.656100
H0: residualis random , H1: Increasing line or decreasing line, Z=-1.518654, p-value=0.064500
Z=-1.518654, p-value=0.129000
t=2,3,...,1000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0, D.W. test=1.928344
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0, D.W. test=2.071656
[Figures: estimated line; X1^2 residual plot]
(23.1.5.2)
X0=residual,residual frequency distribution table,
[ 1 ] -2.96420~ -2.27648 -2.62034 9.00000 0.0090000 0.0090000
[ 2 ] -2.27648~ -1.58875 -1.93261 54.00000 0.0540000 0.0630000
[ 3 ] -1.58875~ -0.90102 -1.24489 121.00000 0.1210000 0.1840000
[ 4 ] -0.90102~ -0.21330 -0.55716 218.00000 0.2180000 0.4020000
[ 5 ] -0.21330~ 0.47443 0.13056 291.00000 0.2910000 0.6930000
[ 6 ] 0.47443~ 1.16215 0.81829 181.00000 0.1810000 0.8740000
[ 7 ] 1.16215~ 1.84988 1.50602 99.00000 0.0990000 0.9730000
[ 8 ] 1.84988~ 2.53761 2.19374 23.00000 0.0230000 0.9960000
[ 9 ] 2.53761~ 3.22533 2.88147 4.00000 0.0040000 1.0000000
frequency distribution: sample mean=-0.002854 , sample variance=1.013268 , sample sd
X0= residual, goodness of fit (Pearson chi-square test statistic)

class   lower limit  upper limit  observed no  probability  expected no  chi square
[ 1 ]   -inf         -1.27472     94.00000     0.10000      100.00000    0.36000
[ 2 ]   -1.27472     -0.83710     112.00000    0.10000      100.00000    1.44000
[ 3 ]   -0.83710     -0.52156     96.00000     0.10000      100.00000    0.16000
[ 4 ]   -0.52156     -0.25196     89.00000     0.10000      100.00000    1.21000
[ 5 ]   -0.25196     0.00002      109.00000    0.10000      100.00000    0.81000
[ 6 ]   0.00002      0.25197      100.00000    0.10000      100.00000    0.00000
[ 7 ]   0.25197      0.52157      104.00000    0.10000      100.00000    0.16000
[ 8 ]   0.52157      0.83668      88.00000     0.10000      100.00000    1.44000
[ 9 ]   0.83668      1.27462      103.00000    0.10000      100.00000    0.09000
[ 10 ]  1.27462      +inf         105.00000    0.10000      100.00000    0.25000
degree of freedom=8
H0: X0~Normal(mu=0.000000,sigma*sigma), sigma is unknown
p-value=0.656100
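Conclusion,
the population conditional expectation line is \( E(Y \mid x) = \beta_0 + \beta_1 H(x) \),
\( H(x) \) is a function of \( x \), \( \varepsilon \sim \text{Normal}(0,\ \sigma^2 = 1) \), there are n paired samples,
with \( Y_i = \beta_0 + \beta_1 H(X_i) + \varepsilon_i,\ i = 1,2,\ldots,n \), where \( \beta_0 \) is the intercept, \( \beta_1 \) is the slope and \( \varepsilon_i \) is the error,
The three basic assumptions,
i) \( \varepsilon_i \sim \) Normal distribution, ii) \( E(\varepsilon_i) = 0,\ Var(\varepsilon_i) = \sigma^2 \), iii) \( \varepsilon_1, \ldots, \varepsilon_n \) are independent,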
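The contrast between the straight-line fit and the fit on H(x) = x^2 in Example 23 can be reproduced with a small simulation. This is only a sketch: it draws its own sample (seed and sample size are arbitrary), so the numbers will not match the output above digit for digit, and the helper name `simple_fit` is hypothetical.

```python
import random


def simple_fit(xs, ys):
    """Least-squares fit of ys on xs; returns (intercept, slope, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy * sxy / (sxx * syy)
    return intercept, slope, r_squared


rng = random.Random(2015)
n = 2000
# X1 ~ Normal(0, 2^2), X2 = 1 + 2*X1^2 + error, error ~ Normal(0, 1).
x1 = [rng.gauss(0.0, 2.0) for _ in range(n)]
x2 = [1.0 + 2.0 * x * x + rng.gauss(0.0, 1.0) for x in x1]

line_fit = simple_fit(x1, x2)                       # regress X2 on X1
parabola_fit = simple_fit([x * x for x in x1], x2)  # regress X2 on X1^2

print(line_fit)
print(parabola_fit)
```

The fit on X1 should show an R2 near zero, while the fit on X1^2 should return an intercept near 1, a slope near 2 and an R2 above 0.98, mirroring the two analyses in (23.1.4) and (23.1.5).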
(23.2) n = 100,000,000, it is big data.

(23.2.1) Basic analysis
(23.2.1.1) X1 and X2 joint probability distribution,
[Figures: f(x1,x2); f(x2,x1)]
sample mean(X1)= -0.0001, sample variance(X1)= 4.0000,

sample mean(X2)= 9.0001, sample variance(X2)= 128.9794,
sample cov(X1,X2)= -0.0052,
X1 and X2 sample correlation coefficient=-0.0002.
X1 and X2 do not have a linear relationship.
[Figures: E(X2|x1) and x1^2 have a linear relation; E(X1|x2) and x2 do not have a linear relation]
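The sample moments above agree with what the model of Example 23 implies. As a short worked check, assuming \( X_1 \sim \text{Normal}(0, \sigma_{X_1}^2 = 4) \), \( X_2 = 1 + 2X_1^2 + \varepsilon \), and \( \varepsilon \sim \text{Normal}(0, 1) \) independent of \( X_1 \):

```latex
\[
E(X_2) = 1 + 2E(X_1^2) = 1 + 2 \times 4 = 9,
\]
\[
\operatorname{Var}(X_1^2) = E(X_1^4) - \left[E(X_1^2)\right]^2 = 3\sigma_{X_1}^4 - \sigma_{X_1}^4 = 2 \times 16 = 32,
\qquad
\operatorname{Var}(X_2) = 4\operatorname{Var}(X_1^2) + \operatorname{Var}(\varepsilon) = 128 + 1 = 129,
\]
\[
\operatorname{Cov}(X_1, X_2) = 2\operatorname{Cov}(X_1, X_1^2) = 2E(X_1^3) = 0 .
\]
```

A zero covariance therefore does not mean the variables are unrelated here; X2 depends on X1 through X1^2.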
(23.2.1.2)X1 marginal probability distribution,
Variance : 4.00003
S.D. : 2.00001
MAD : 1.59580
Range : 23.23623
Mid_range : 0.42831
Median : -0.00000
Q1 : -1.34943
Q2 : -0.00000
Q3 : 1.34898
IQR : 2.69841
C.V. : none

S.D. : 11.35691
MAD : 7.77302
Range : 296.83866
Mid_range : 143.99587
Median : 4.74463
Q1 : 2.02913
Q2 : 4.74463
Q3 : 11.64163
IQR : 9.61249
C.V. : 1.26187
(23.2.2)
The relation is X2=1.0000038041+2.0000020130*X1^2(This analysis of population data)
ANOVA
----------------------------------------------------------------------------------
Source df SS MS
----------------------------------------------------------------------------------
X1^2 1 12797974037.9741990000 12797974037.9741990000
error 99999998 99969713.5608463290 0.9996971556
total 99999999 12897943751.5350460000
----------------------------------------------------------------------------------
F test value=12801851006.8304460000,
H0: slope(X1)=0
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept 1.0000038041 0.0001224595 8165.99698 0.00000
slope 2.0000020130 0.0000176764 113145.26507 0.00000
----------------------------------------------------------------------------------
MSE=0.9996971556 , R2=0.992249 , R2(adj)=0.992249
SS(X1^2)=3199487068.7844071000 , SS(X2*X1^2)=6398980578.2747154000,
C.V.= 0.1110934733

class   lower limit  upper limit  observed no    probability  expected no    chi square
[ 1 ]   -inf         -1.64466     5000651.00000  0.05000      5000000.00000  0.08476
[ 2 ]   -1.64466     -1.28140     4997648.00000  0.05000      5000000.00000  1.10638
[ 3 ]   -1.28140     -1.03627     5001498.00000  0.05000      5000000.00000  0.44880
[ 4 ]   -1.03627     -0.84149     4999114.00000  0.05000      5000000.00000  0.15700
[ 5 ]   -0.84149     -0.67436     4999548.00000  0.05000      5000000.00000  0.04086
[ 6 ]   -0.67436     -0.52430     5001173.00000  0.05000      5000000.00000  0.27519
[ 7 ]   -0.52430     -0.38523     5000997.00000  0.05000      5000000.00000  0.19880
[ 8 ]   -0.38523     -0.25354     4991388.00000  0.05000      5000000.00000  14.83331
[ 9 ]   -0.25354     -0.12559     5011449.00000  0.05000      5000000.00000  26.21592
[ 10 ]  -0.12559     -0.00023     4985657.00000  0.05000      5000000.00000  41.14433
[ 11 ]  -0.00023     0.12542      5000010.00000  0.05000      5000000.00000  0.00002
[ 12 ]  0.12542      0.25329      5010982.00000  0.05000      5000000.00000  24.12086
[ 13 ]  0.25329      0.38522      5002897.00000  0.05000      5000000.00000  1.67852
[ 14 ]  0.38522      0.52430      4997757.00000  0.05000      5000000.00000  1.00621
[ 15 ]  0.52430      0.67432      4995290.00000  0.05000      5000000.00000  4.43682
[ 16 ]  0.67432      0.84142      5001747.00000  0.05000      5000000.00000  0.61040
[ 17 ]  0.84142      1.03622      5002231.00000  0.05000      5000000.00000  0.99547
[ 18 ]  1.03622      1.28130      4999525.00000  0.05000      5000000.00000  0.04512
[ 19 ]  1.28130      1.64461      5002148.00000  0.05000      5000000.00000  0.92278
[ 20 ]  1.64461      +inf         4998290.00000  0.05000      5000000.00000  0.58482
p-value=0.000000
Z=0.137812, p-value=0.554800
Z=0.137812, p-value=0.445200
Z=0.137812, p-value=0.890400
t=2,3,...,100000000
D.W. test=2.000076
D.W. test=1.999924

[Figures: the joint probability of x1^2 and residual; the joint probability of X2 estimated value and X2]
(23.2.3) residual analysis,

X0=residual, residual marginal probability distribution
Variance : 0.99970
S.D. : 0.99985
MAD : 0.79776
Range : 11.89839
Median : 0.00004
Q1 : -0.67432
Q2 : 0.00004
Q3 : 0.67444
IQR : 1.34876
C.V. : none
SLLN analysis, X0=residual compared with Normal(0,1). Note: X1~Normal(0,1), where X1 is
the representative code of Normal(0,1),
(23.2.4) Conclusion,
X1~Normal(0,4),X2=1.0000038041+2.0000020130*X1^2+error,
error~Normal(0,1).
Note: Please refer to Appendix 2 and Appendix 6.
5.3. Comparison of a Normal independent variable and an Arcsin independent variable, the three basic assumptions are unchanged.

Example 24, independent variable is Normal distribution,
Use these examples to understand how the probability distribution of the independent
variable affects the linear model analysis.

\( X_1 \sim \text{Normal}(\mu_{X_1} = 0,\ \sigma_{X_1}^2 = 8) \),
The population conditional expectation line is
\( E(X_2 \mid x_1) = \beta_0 + \beta_1 x_1 = 1 + 2x_1 \), \( \varepsilon \sim \text{Normal}(0,\ \sigma^2 = 1) \),

[ 1 ] -9.20729~ -7.13854 -8.17291 7.00000 0.0070000 0.0070000
[ 2 ] -7.13854~ -5.06979 -6.10417 34.00000 0.0340000 0.0410000
[ 3 ] -5.06979~ -3.00104 -4.03542 124.00000 0.1240000 0.1650000
[ 4 ] -3.00104~ -0.93230 -1.96667 236.00000 0.2360000 0.4010000
[ 5 ] -0.93230~ 1.13645 0.10208 272.00000 0.2720000 0.6730000
[ 6 ] 1.13645~ 3.20520 2.17082 212.00000 0.2120000 0.8850000
[ 7 ] 3.20520~ 5.27395 4.23957 93.00000 0.0930000 0.9780000
[ 8 ] 5.27395~ 7.34269 6.30832 16.00000 0.0160000 0.9940000
[ 9 ] 7.34269~ 9.41144 8.37707 6.00000 0.0060000 1.0000000

[ 1 ] -17.72866~ -13.58690 -15.65778 6.00000 0.0060000 0.0060000
[ 2 ] -13.58690~ -9.44514 -11.51602 36.00000 0.0360000 0.0420000
[ 3 ] -9.44514~ -5.30339 -7.37427 122.00000 0.1220000 0.1640000
[ 4 ] -5.30339~ -1.16163 -3.23251 221.00000 0.2210000 0.3850000
[ 5 ] -1.16163~ 2.98013 0.90925 268.00000 0.2680000 0.6530000
[ 6 ] 2.98013~ 7.12189 5.05101 221.00000 0.2210000 0.8740000
[ 7 ] 7.12189~ 11.26365 9.19277 94.00000 0.0940000 0.9680000
[ 8 ] 11.26365~ 15.40540 13.33452 25.00000 0.0250000 0.9930000
[ 9 ] 15.40540~ 19.54716 17.47628 7.00000 0.0070000 1.0000000
(24.1.2) linear model,
The estimated line is X2=0.914975+2.016337*X1
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1 1 33222.8669385391 33222.8669385391 34431.1819581484
error 998 962.9765613322 0.9649063741
total 999 34185.8434998714
----------------------------------------------------------------------------------
H0: slope(X1)=0
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept 0.9149751100 0.0311157337 29.40555 0.00000
slope 2.0163366364 0.0108664347 185.55641 0.00000
----------------------------------------------------------------------------------
MSE=0.9649063741 , R2=0.971831 , R2(adj)=0.971803
SSX1=8171.6738443145 , SS(X2*X1)= 16476.8453532465, C.V.= 1.6971565864

class   lower limit  upper limit  observed no  probability  expected no  chi square
[ 1 ]   -inf         -1.25891     95.00000     0.10000      100.00000    0.25000
[ 2 ]   -1.25891     -0.82671     121.00000    0.10000      100.00000    4.41000
[ 3 ]   -0.82671     -0.51509     95.00000     0.10000      100.00000    0.25000
[ 4 ]   -0.51509     -0.24883     97.00000     0.10000      100.00000    0.09000
[ 5 ]   -0.24883     0.00002      111.00000    0.10000      100.00000    1.21000
[ 6 ]   0.00002      0.24884      69.00000     0.10000      100.00000    9.61000
[ 7 ]   0.24884      0.51510      105.00000    0.10000      100.00000    0.25000
[ 8 ]   0.51510      0.82630      111.00000    0.10000      100.00000    1.21000
[ 9 ]   0.82630      1.25881      101.00000    0.10000      100.00000    0.01000
[ 10 ]  1.25881      +inf         95.00000     0.10000      100.00000    0.25000
degree of freedom=8
p-value=0.024900
Z=0.299228, p-value=0.617700
Z=0.299228, p-value=0.382300
Z=0.299228, p-value=0.764600
t=2,3,...,1000
D.W. test=2.138562
D.W. test=1.861438

(24.1.3) residual analysis

[ 1 ] -3.20712~ -2.41622 -2.81167 5.00000 0.0050000 0.0050000
[ 2 ] -2.41622~ -1.62533 -2.02078 34.00000 0.0340000 0.0390000
[ 3 ] -1.62533~ -0.83444 -1.22989 175.00000 0.1750000 0.2140000
[ 4 ] -0.83444~ -0.04355 -0.43900 281.00000 0.2810000 0.4950000
[ 5 ] -0.04355~ 0.74734 0.35190 282.00000 0.2820000 0.7770000
[ 6 ] 0.74734~ 1.53823 1.14279 163.00000 0.1630000 0.9400000
[ 7 ] 1.53823~ 2.32913 1.93368 52.00000 0.0520000 0.9920000
[ 8 ] 2.32913~ 3.12002 2.72457 5.00000 0.0050000 0.9970000
[ 9 ] 3.12002~ 3.91091 3.51546 3.00000 0.0030000 1.0000000
X0=residual, goodness of fit (Pearson chi-square test statistic)

mu point estimated value=0.000000 (MLE)
sigma point estimated value=0.982296 (MLE)
mu value from -0.196459 to 0.196459
sigma value from 0.818580 to 1.227871
class   lower limit  upper limit  observed no  probability  expected no  chi square
[ 1 ]   -inf         -1.31260     88.00000     0.10000      100.00000    1.44000
[ 2 ]   -1.31260     -0.88221     107.00000    0.10000      100.00000    0.49000
[ 3 ]   -0.88221     -0.57189     99.00000     0.10000      100.00000    0.01000
[ 4 ]   -0.57189     -0.30673     92.00000     0.10000      100.00000    0.64000
[ 5 ]   -0.30673     -0.05891     104.00000    0.10000      100.00000    0.16000
[ 6 ]   -0.05891     0.18887      84.00000     0.10000      100.00000    2.56000
[ 7 ]   0.18887      0.45401      94.00000     0.10000      100.00000    0.36000
[ 8 ]   0.45401      0.76392      113.00000    0.10000      100.00000    1.69000
[ 9 ]   0.76392      1.19462      109.00000    0.10000      100.00000    0.81000
[ 10 ]  1.19462      +inf         110.00000    0.10000      100.00000    1.00000
degree of freedom=7
H0: X0~Normal(mu=-0.058938,sigma*sigma=0.956882), sigma=0.978204
p-value=0.241300
(24.2) sample size= 100,000,000, it is big data.

(24.2.1) Basic analysis
[Figures: f(x1,x2); f(x2,x1)]
sample mean(X1)= 0.0002, sample variance(X1)=7.9996,
sample cov(X1,X2)=15.9990,
X1 and X2 sample correlation coefficient=0.9847.
[Figures: E(X2|x1) and x1 have a linear relation; E(X1|x2) and x2 have a linear relation]

Variance : 7.99961
S.D. : 2.82836
MAD : 2.25676
Range : 30.89940
Mid_range : 0.30865
Median : 0.00026
Q1 : -1.90739
Q2 : 0.00026
Q3 : 1.90835
IQR : 3.81574
C.V. : none

Variance : 32.99767
S.D. : 5.74436
MAD : 4.58337
Range : 62.34209
Mid_range : 1.75812
Median : 1.00061
Q1 : -2.87410
Q2 : 1.00061
Q3 : 4.87528
IQR : 7.74938
C.V. : 5.74288
(24.2.2)
linear model analysis
ANOVA
----------------------------------------------------------------------------------
Source df SS MS
----------------------------------------------------------------------------------
X1 1 3199757235.7981005000 3199757235.7981005000
error 99999998 100009863.4082655900 1.0000986541
total 99999999 3299767099.2063661000
----------------------------------------------------------------------------------
F test value=3199441597.8159437000,
H0: slope(X1)=0, The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept 0.9998002603 0.0001000049 9997.50943 0.00000
slope 1.9999729773 0.0000353579 56563.60665 0.00000
----------------------------------------------------------------------------------
MSE=1.0000986541 , R2=0.969692 , R2(adj)=0.969692
X1(mean)= 0.0002285750, X1(variance)=7.9996093391, X1(s.d.)= 2.8283580642
SSX1=799960925.9119683500 , SS(X2*X1)=1599900234.7154553000, C.V.= 0.9997919753
class   lower limit  upper limit  observed no    probability  expected no    chi square
[ 1 ]   -inf         -1.64499     4997611.00000  0.05000      5000000.00000  1.14146
[ 2 ]   -1.64499     -1.28166     4998213.00000  0.05000      5000000.00000  0.63867
[ 3 ]   -1.28166     -1.03648     5000648.00000  0.05000      5000000.00000  0.08398
[ 4 ]   -1.03648     -0.84165     5003532.00000  0.05000      5000000.00000  2.49500
[ 5 ]   -0.84165     -0.67450     4995760.00000  0.05000      5000000.00000  3.59552
[ 6 ]   -0.67450     -0.52440     5003631.00000  0.05000      5000000.00000  2.63683
[ 7 ]   -0.52440     -0.38531     5003659.00000  0.05000      5000000.00000  2.67766
[ 8 ]   -0.38531     -0.25359     4991788.00000  0.05000      5000000.00000  13.48739
[ 9 ]   -0.25359     -0.12562     5008607.00000  0.05000      5000000.00000  14.81609
[ 10 ]  -0.12562     -0.00023     4988199.00000  0.05000      5000000.00000  27.85272
[ 11 ]  -0.00023     0.12544      5002254.00000  0.05000      5000000.00000  1.01610
[ 12 ]  0.12544      0.25334      5010054.00000  0.05000      5000000.00000  20.21658
[ 13 ]  0.25334      0.38530      4996379.00000  0.05000      5000000.00000  2.62233
[ 14 ]  0.38530      0.52440      5000935.00000  0.05000      5000000.00000  0.17485
[ 15 ]  0.52440      0.67446      4999903.00000  0.05000      5000000.00000  0.00188
[ 16 ]  0.67446      0.84159      5001543.00000  0.05000      5000000.00000  0.47617
[ 17 ]  0.84159      1.03643      4999865.00000  0.05000      5000000.00000  0.00364
[ 18 ]  1.03643      1.28156      4994052.00000  0.05000      5000000.00000  7.07574
[ 19 ]  1.28156      1.64494      5001195.00000  0.05000      5000000.00000  0.28561
[ 20 ]  1.64494      +inf         5002172.00000  0.05000      5000000.00000  0.94352
Z=1.046802, p-value=0.852500
H0: residual is random , H1: Oscillation, Z=1.046802, p-value=0.147500
Z=1.046802, p-value=0.295000
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t), t=2,3,...,100000000
[Figures: the joint probability of X1 and residual; the joint probability of X2 estimated value and X2]

Variance : 1.00010
S.D. : 1.00005
MAD : 0.79789
Range : 11.43194
Median : -0.00002
Q1 : -0.67437
Q2 : -0.00002
Q3 : 0.67442
IQR : 1.34879
C.V. : none

(24.2.4)Conclusion,
X1~Normal(0,8), X2=0.999800+1.999973*X1+error, error~Normal(0,1),
X2~Normal(1,33).
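The moments of X2 in Example 24 can be derived directly from the fitted relation. As a short worked check, assuming \( X_1 \sim \text{Normal}(0, 8) \), \( X_2 = 1 + 2X_1 + \varepsilon \), and \( \varepsilon \sim \text{Normal}(0, 1) \) independent of \( X_1 \) (coefficients rounded to the population values):

```latex
\[
E(X_2) = 1 + 2E(X_1) = 1,
\qquad
\operatorname{Var}(X_2) = 2^2 \operatorname{Var}(X_1) + \operatorname{Var}(\varepsilon) = 4 \times 8 + 1 = 33,
\]
\[
\operatorname{Cov}(X_1, X_2) = 2\operatorname{Var}(X_1) = 16,
\qquad
\rho = \frac{16}{\sqrt{8 \times 33}} \approx 0.9847,
\qquad
R^2 = \rho^2 = \frac{32}{33} \approx 0.9697 .
\]
```

These values are in line with the reported X2 variance of 32.99767, the sample correlation coefficient 0.9847 and R2=0.969692.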
Example 25, independent variable is Arcsin distribution,

\( X_1 \sim \text{Arcsin}(\mu_{X_1} = 0,\ c_{X_1} = 4) \), the population conditional expectation line is
\( E(X_2 \mid x_1) = \beta_0 + \beta_1 x_1 = 1 + 2x_1 \), \( \varepsilon \sim \text{Normal}(0,\ \sigma^2 = 1) \),
\( X_{2i} = \beta_0 + \beta_1 X_{1i} + \varepsilon_i,\ i = 1,2,\ldots,n \), the three basic assumptions are unchanged.
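Paired samples for this example can be simulated with inverse-CDF sampling. The sketch below assumes that Arcsin(0, c=4) denotes the arcsine distribution on [-4, 4] with density 1/(pi*sqrt(c^2 - x^2)), whose variance is c^2/2 = 8; that reading is consistent with the X1 variance reported for this example but is not spelled out in the text. The helper name `sample_arcsin`, the seed and the sample size are arbitrary.

```python
import math
import random


def sample_arcsin(rng, centre, half_width):
    """Draw from the arcsine distribution on [centre - c, centre + c].

    Inverse CDF: F(x) = 1/2 + asin((x - centre) / c) / pi.
    """
    u = rng.random()
    return centre + half_width * math.sin(math.pi * (u - 0.5))


rng = random.Random(25)
n = 20000
x1 = [sample_arcsin(rng, 0.0, 4.0) for _ in range(n)]
# X2 = 1 + 2*X1 + error, error ~ Normal(0, 1).
x2 = [1.0 + 2.0 * x + rng.gauss(0.0, 1.0) for x in x1]

mean_x1 = sum(x1) / n
var_x1 = sum((x - mean_x1) ** 2 for x in x1) / (n - 1)
print(mean_x1, var_x1)
```

Most of the simulated X1 values fall near the two ends of the interval, which is the U shape visible in the frequency table of X1 for this example.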
(25.1.1)Basic analysis
[ 1 ] -3.99999~ -3.11111 -3.55555 219.00000 0.2190000 0.2190000
[ 2 ] -3.11111~ -2.22222 -2.66666 93.00000 0.0930000 0.3120000
[ 3 ] -2.22222~ -1.33334 -1.77778 73.00000 0.0730000 0.3850000
[ 4 ] -1.33334~ -0.44446 -0.88890 69.00000 0.0690000 0.4540000
[ 5 ] -0.44446~ 0.44442 -0.00002 65.00000 0.0650000 0.5190000
[ 6 ] 0.44442~ 1.33331 0.88887 83.00000 0.0830000 0.6020000
[ 7 ] 1.33331~ 2.22219 1.77775 80.00000 0.0800000 0.6820000
[ 8 ] 2.22219~ 3.11107 2.66663 113.00000 0.1130000 0.7950000
[ 9 ] 3.11107~ 3.99996 3.55551 205.00000 0.2050000 1.0000000

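Frequency distribution table of X2:
class, class interval, class midpoint, frequency, relative frequency, cumulative relative frequency
[ 1 ] -9.38896~ -7.06665 -8.22780 68.00000 0.0680000 0.0680000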
[ 2 ] -7.06665~ -4.74434 -5.90550 171.00000 0.1710000 0.2390000
[ 3 ] -4.74434~ -2.42203 -3.58319 118.00000 0.1180000 0.3570000
[ 4 ] -2.42203~ -0.09972 -1.26088 88.00000 0.0880000 0.4450000
[ 5 ] -0.09972~ 2.22259 1.06143 88.00000 0.0880000 0.5330000
[ 6 ] 2.22259~ 4.54490 3.38374 107.00000 0.1070000 0.6400000
[ 7 ] 4.54490~ 6.86721 5.70605 137.00000 0.1370000 0.7770000
[ 8 ] 6.86721~ 9.18951 8.02836 169.00000 0.1690000 0.9460000
[ 9 ] 9.18951~ 11.51182 10.35067 54.00000 0.0540000 1.0000000
(25.1.2)Linear model,
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1 1 31635.2079013432 31635.2079013432 30064.5594131703
error 998 1050.1380396651 1.0522425247
total 999 32685.3459410082
----------------------------------------------------------------------------------
H0: slope(X1)=0
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept 1.0032882298 0.0324391428 30.92832 0.00000
slope 1.9948353290 0.0115048147 173.39135 0.00000
----------------------------------------------------------------------------------
MSE=1.0522425247 , R2=0.967871 , R2(adj)=0.967839
X1(mean)=0.0204693128, X1(variance)= 7.9577648652, X1(s.d.)= 2.8209510569
SSX1=7949.8071003290 , SS(X2*X1)= 15858.5560627216, C.V.= 0.9824422622
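The quantities in the ANOVA and individual test output above are linked by simple formulas. The sketch below recomputes them from the printed sums of squares; the variable names are assumptions made for this illustration, and the formulas are the usual ones for simple linear regression rather than a description of the software's internals.

```python
import math

# Values copied from the ANOVA output above (n = 1000 paired samples).
ss_regression = 31635.2079013432     # SS for X1
ss_error = 1050.1380396651           # SS for error
df_regression, df_error = 1, 998
ss_x1 = 7949.8071003290              # SSX1
s_x2_x1 = 15858.5560627216           # SS(X2*X1)

n = df_regression + df_error + 1
mse = ss_error / df_error                          # mean squared error
f_value = (ss_regression / df_regression) / mse    # F statistic
r2 = ss_regression / (ss_regression + ss_error)    # coefficient of determination
r2_adj = 1.0 - (1.0 - r2) * (n - 1) / df_error     # adjusted R2

slope = s_x2_x1 / ss_x1                            # least-squares slope
se_slope = math.sqrt(mse / ss_x1)                  # standard error of the slope
t_slope = slope / se_slope                         # t statistic for H0: slope = 0

print(f"MSE={mse:.10f}, F={f_value:.4f}, R2={r2:.6f}, R2(adj)={r2_adj:.6f}")
print(f"slope={slope:.10f}, s.e.={se_slope:.10f}, t={t_slope:.5f}")
```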
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ] [ 10 ]
lower limit -1.31465 -0.86332 -0.53790 -0.25985 0.00003 0.25986
0.53790 0.86289 1.31454
upper limit -1.31465 -0.86332 -0.53790 -0.25985 0.00003 0.25986 0.53790
0.86289 1.31454
observed no 95.00000 106.00000 112.00000 93.00000 86.00000 118.00000 97.00000
81.00000 114.00000 98.00000
probability 0.10000 0.10000 0.10000 0.10000 0.10000 0.10000 0.10000
0.10000 0.10000 0.10000
expected no 100.00000 100.00000 100.00000 100.00000 100.00000 100.00000 100.00000
100.00000 100.00000 100.00000
chi square 0.25000 0.36000 1.44000 0.49000 1.96000 3.24000 0.09000
3.61000 1.96000 0.04000
degree of freedom=8
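The "chi square" row above holds the per-class contributions (observed - expected)^2 / expected. A minimal sketch that recomputes them from the printed observed counts follows; the variable names are assumptions, and the total 13.44 is simply the sum of the printed contributions.

```python
# Observed counts of the ten equal-probability classes, copied from the table above.
observed = [95, 106, 112, 93, 86, 118, 97, 81, 114, 98]
n = sum(observed)                            # 1000 paired samples
probabilities = [0.10] * len(observed)       # each class has probability 0.1
expected = [n * p for p in probabilities]    # 100 per class

# Per-class contributions and the Pearson chi-square statistic.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(contributions)

print([round(v, 5) for v in contributions])
print(f"pearson chi square test statistic={chi_square:.5f}")
```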
Z=-0.498246, p-value=0.309200
Z=-0.498246, p-value=0.690800
Z=-0.498246, p-value=0.618400
t=2,3,...,1000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0 , D.W. test=2.016499
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0 , D.W. test=1.983501
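A minimal sketch of the Durbin-Watson statistic, assuming the usual definition: the sum of squared successive differences of the residuals divided by the sum of squared residuals. The function name is hypothetical. Note that the two printed values, 2.016499 and 1.983501, add up to 4, so the second one appears to be 4 minus the first.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of (e_t - e_{t-1})^2 over sum of e_t^2."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# Strongly oscillating residuals give a value well above 2,
# while smoothly drifting residuals give a value well below 2.
oscillating = [1.0, -1.0, 1.0, -1.0]
drifting = [1.0, 1.1, 1.2, 1.3]
print(durbin_watson(oscillating), durbin_watson(drifting))
```

A value near 2, as in the output above, is consistent with no first-order autocorrelation in the residuals.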
(25.1.3) Residual analysis
[ 1 ] -3.41247~ -2.68336 -3.04792 3.00000 0.0030000 0.0030000
[ 2 ] -2.68336~ -1.95426 -2.31881 26.00000 0.0260000 0.0290000
[ 3 ] -1.95426~ -1.22515 -1.58970 84.00000 0.0840000 0.1130000
[ 4 ] -1.22515~ -0.49604 -0.86059 214.00000 0.2140000 0.3270000
[ 5 ] -0.49604~ 0.23307 -0.13149 271.00000 0.2710000 0.5980000
[ 6 ] 0.23307~ 0.96217 0.59762 227.00000 0.2270000 0.8250000
[ 7 ] 0.96217~ 1.69128 1.32673 124.00000 0.1240000 0.9490000
[ 8 ] 1.69128~ 2.42039 2.05584 40.00000 0.0400000 0.9890000
[ 9 ] 2.42039~ 3.14950 2.78494 11.00000 0.0110000 1.0000000
frequency distribution: sample mean=-0.009726 , sample variance=1.096746 , sam

mu value from -0.205158 to 0.205158
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ] [ 10 ]
lower limit -1.24352 -0.79407 -0.47001 -0.19312 0.06568 0.32443
0.60131 0.92494 1.37472
upper limit -1.24352 -0.79407 -0.47001 -0.19312 0.06568 0.32443 0.60131
0.92494 1.37472
observed no 111.00000 111.00000 111.00000 91.00000 101.00000 110.00000 87.00000
89.00000 100.00000 89.00000
probability 0.10000 0.10000 0.10000 0.10000 0.10000 0.10000 0.10000
0.10000 0.10000 0.10000
expected no 100.00000 100.00000 100.00000 100.00000 100.00000 100.00000 100.00000
100.00000 100.00000 100.00000
chi square 1.21000 1.21000 1.21000 0.81000 0.01000 1.00000 1.69000
1.21000 0.00000 1.21000
degree of freedom=7
H0: X0~Normal(mu=0.065650,sigma*sigma=1.043492), sigma=1.021515
p-value=0.214900
(25.1.4)Conclusion,
Example 24:
$X_1 \sim \mathrm{Normal}(\mu_{X_1} = 0,\ \sigma_{X_1}^2 = 8)$, the population conditional expectation line is
$E(X_2 \mid x_1) = \beta_0 + \beta_1 x_1 = 1 + 2x_1$,
$\varepsilon \sim \mathrm{Normal}(0,\ \sigma^2 = 1)$, paired samples, n=1000.
Example 25:
$X_1 \sim \mathrm{Arcsin}(\mu_{X_1} = 0,\ c_{X_1} = 4)$, the population conditional expectation line is
$E(X_2 \mid x_1) = \beta_0 + \beta_1 x_1 = 1 + 2x_1$,
$\varepsilon \sim \mathrm{Normal}(0,\ \sigma^2 = 1)$, paired samples, n=1000.
The two examples differ only in the probability distribution of X1, and this difference changes the scatter diagram.

[Figures: joint probability f(x1,x2) and f(x2,x1)]
sample mean(X1)= -0.0001, sample variance(X1)=8.0002,

Variance : 8.00016
S.D. : 2.82846
MAD : 2.54651
Range : 8.00000
Mid_range : 0.00000
Median : -0.00068
Q1 : -2.82871
Q2 : -0.00068
Q3 : 2.82824
IQR : 5.65696
C.V. : none
The curve-fitting estimated the distribution function of X1.

F(X)=0.5+Arcsin( (X- -0.0006830781)/ 4.0000397221 )/pi
SSE=0.208205343463537080 MAX error=0.006051299306878921 coefficient of
determination=0.999999950305306970,
Left diagram is the comparison of
estimated line and the sample data.
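As a hedged check of the fitted parameters, the sketch below evaluates the standard arcsine distribution function on (mu - c, mu + c), F(x) = 1/2 + arcsin((x - mu)/c)/pi, and its inverse. The function names are hypothetical, and this standard form is an assumption about what the curve-fitting output represents.

```python
import math


def arcsine_cdf(x, mu, c):
    """Distribution function of the arcsine law on (mu - c, mu + c)."""
    z = max(-1.0, min(1.0, (x - mu) / c))   # clamp to the support
    return 0.5 + math.asin(z) / math.pi


def arcsine_quantile(p, mu, c):
    """Inverse of arcsine_cdf for 0 <= p <= 1."""
    return mu + c * math.sin(math.pi * (p - 0.5))


# Fitted location and half-width copied from the curve-fitting output above.
mu_hat, c_hat = -0.0006830781, 4.0000397221

q1 = arcsine_quantile(0.25, mu_hat, c_hat)
q3 = arcsine_quantile(0.75, mu_hat, c_hat)
variance = c_hat ** 2 / 2.0                 # variance of the arcsine law

print(f"Q1={q1:.5f}, Q3={q3:.5f}, IQR={q3 - q1:.5f}, variance={variance:.5f}")
```

The quartiles and variance obtained this way are close to the sample values Q1, Q3 and Variance listed for X1.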

Variance : 33.00289
S.D. : 5.74481
MAD : 5.13318
Range : 26.27579
Mid_range : 0.99723
Median : 0.99922
Q1 : -4.55145
Q2 : 0.99922
Q3 : 6.55067
IQR : 11.10212
C.V. : 5.74644
The curve-fitting estimated the distribution function of X2.
F(X)= 0.02042808470162666600+
0.03329133726483016200*(X- -7.88255373455696070000)^1+
0.01938461863899696600*(X- -7.88255373455696070000)^2+
0.00355989247484296210*(X- -7.88255373455696070000)^3+
-0.00050362767001588780*(X- -7.88255373455696070000)^4+
-0.00016884419128152623*(X- -7.88255373455696070000)^5+
value range 0.0000000000<=F(x)<= 0.0500000000 ,
value range -12.1406624088<=X<= -7.2493987045 ,
determination=0.999989517414489940,

F(X)= 0.07440592320036801300+
0.07324035953531515800*(X- -6.88493827927071410000)^1+
0.01484552519248982800*(X- -6.88493827927071410000)^2+
-0.00694590958313145990*(X- -6.88493827927071410000)^3+
value range 0.0500000100<=F(x)<= 0.1000000000 ,
value range -7.2493986740<=X<= -6.5540961898 ,
determination=0.999999819231158550,

F(X)= 0.12496589509799957000+
0.08296006458518551100*(X- -6.25095708673948370000)^1+
0.00112681333914999020*(X- -6.25095708673948370000)^2+
value range 0.1000000100<=F(x)<= 0.1500000000 ,
value range -6.5540959641<=X<= -5.9495013648 ,
determination=0.999995763812284500,

F(X)= 0.17526212853174986000+
0.07833991753139546400*(X- -5.63555580154393440000)^1+
-0.00765876728473269260*(X- -5.63555580154393440000)^2+
-0.00260976435108339900*(X- -5.63555580154393440000)^3+
value range 0.1500000100<=F(x)<= 0.2000000000 ,
value range -5.9495012428<=X<= -5.3082372133 ,
determination=0.999999927235948330,

F(X)= 0.22540218416090632000+
0.06607858178108205700*(X- -4.94203736804530800000)^1+
-0.00845164534648662480*(X- -4.94203736804530800000)^2+
0.00133390469663918760*(X- -4.94203736804530800000)^3+
value range 0.2000000100<=F(x)<= 0.2500000000 ,
value range -5.3082371925<=X<= -4.5514496404 ,
determination=0.999999945445226300,

F(X)= 0.27535721619819931000+
0.05477230043772102200*(X- -4.10889155958933830000)^1+
-0.00518189327544449350*(X- -4.10889155958933830000)^2+
0.00129960503772785780*(X- -4.10889155958933830000)^3+
value range 0.2500000100<=F(x)<= 0.3000000000 ,
value range -4.5514495418<=X<= -3.6407212066 ,

determination=0.999999869719420230,

F(X)= 0.32522443801691148000+
0.04769826136977872700*(X- -3.12696456286737190000)^1+
-0.00246402792098705800*(X- -3.12696456286737190000)^2+
0.00054935093062757900*(X- -3.12696456286737190000)^3+
value range 0.3000000100<=F(x)<= 0.3500000000 ,
value range -3.6407210842<=X<= -2.5943772611 ,
determination=0.999999952277162760,

F(X)= 0.37514711926244937000+
0.04367283367855079300*(X- -2.02941120152394380000)^1+
-0.00135032417459118870*(X- -2.02941120152394380000)^2+
0.00019999000069126360*(X- -2.02941120152394380000)^3+
value range 0.3500000100<=F(x)<= 0.4000000000 ,
value range -2.5943771357<=X<= -1.4508463489 ,
determination=0.999999941021206930,

F(X)= 0.42508111760884582000+
0.04127843639684147800*(X- -0.85038615533496476000)^1+
-0.00062981564411579427*(X- -0.85038615533496476000)^2+
0.00081645138737940215*(X- -0.85038615533496476000)^3+
-0.00034014356750944330*(X- -0.85038615533496476000)^4+
-0.00310578527557936470*(X- -0.85038615533496476000)^5+
0.00066931925942981252*(X- -0.85038615533496476000)^6+
0.00414631760327210940*(X- -0.85038615533496476000)^7+
value range 0.4000000100<=F(x)<= 0.4500000000 ,
value range -1.4508461716<=X<= -0.2418923048 ,
determination=0.999999961724775450,

F(X)= 0.47503116061513295000+
0.04023531179775877200*(X-0.37712126169178711000)^1+
-0.00024272046829451133*(X-0.37712126169178711000)^2+
0.00013507435875070861*(X-0.37712126169178711000)^3+
value range 0.4500000100<=F(x)<= 0.5000000000 ,
value range -0.2418922393<=X<= 0.9992172844 ,
determination=0.999999964458153090,
F(X)= 0.52497223936310178000+
0.04025970774468845500*(X-1.62148901014151910000)^1+
-0.00004653164523915621*(X-1.62148901014151910000)^2+
-0.00051377721251810726*(X-1.62148901014151910000)^3+
0.00655748494318686430*(X-1.62148901014151910000)^4+
0.00462636770316748880*(X-1.62148901014151910000)^5+
-0.04932781658135354500*(X-1.62148901014151910000)^6+
-0.01460118127579335100*(X- 1.62148901014151910000)^7+
0.14570867549628019000*(X-1.62148901014151910000)^8+
0.01641044287680415400*(X-1.62148901014151910000)^9+
-0.14856775570660830000*(X-1.62148901014151910000)^10+
value range 0.5000000100<=F(x)<= 0.5500000000 ,
value range 0.9992174286<=X<= 2.2408591609 ,
determination=0.999999961757824130
F(X)= 0.57491797113892840000+
0.04126014305144609700*(X-2.84948746905875840000)^1+
0.00079572525731474997*(X-2.84948746905875840000)^2+
0.00221499110969602950*(X-2.84948746905875840000)^3+
-0.00227351335534464740*(X-2.84948746905875840000)^4+
-0.02064939566048451500*(X-2.84948746905875840000)^5+
0.01077810503960563400*(X-2.84948746905875840000)^6+
0.07482507346776401400*(X-2.84948746905875840000)^7+
-0.01480126317869690000*(X-2.84948746905875840000)^8+
-0.09024417059117695300*(X- 2.84948746905875840000)^9+
value range 0.5500000100<=F(x)<= 0.6000000000 ,
value range 2.2408596227<=X<= 3.4498809780 ,
determination=0.999999959351747570,

F(X)= 0.62484864694583409000+
0.04361082316356901200*(X-4.02892149357052800000)^1+
0.00138710928807010260*(X-4.02892149357052800000)^2+
0.00028640241694855018*(X-4.02892149357052800000)^3+
value range 0.6000000100<=F(x)<= 0.6500000000 ,
value range 3.4498811389<=X<= 4.5942777104 ,
determination=0.999999935743412620,

F(X)= 0.67475585192091680000+
0.04774910286004931100*(X-5.12704796399553420000)^1+
0.00268303932817860660*(X-5.12704796399553420000)^2+
0.00046114221552552570*(X-5.12704796399553420000)^3+
value range 0.6500000100<=F(x)<= 0.7000000000 ,
value range 4.5942777852<=X<= 5.6397774185 ,
determination=0.999999941145921610,

F(X)= 0.72465143951632283000+
0.05490867519782721700*(X-6.10790738372750220000)^1+
0.00503263947951460010*(X-6.10790738372750220000)^2+
value range 0.7000000100<=F(x)<= 0.7500000000 ,
value range 5.6397774438<=X<= 6.5506721003 ,
determination=0.999998526017673580,

F(X)= 0.77459555942547176000+
0.06618352553201889400*(X-6.94144094178337580000)^1+
0.00847689138892704360*(X-6.94144094178337580000)^2+
value range 0.7500000100<=F(x)<= 0.8000000000 ,
value range 6.5506722147<=X<= 7.3078436066 ,
determination=0.999999501459629790,

F(X)= 0.82473808255732139000+
0.07830989204460317400*(X-7.63525973623477850000)^1+
0.00765300389272296350*(X-7.63525973623477850000)^2+
-0.00231305617806754070*(X-7.63525973623477850000)^3+
value range 0.8000000100<=F(x)<= 0.8500000000 ,
value range 7.3078436444<=X<= 7.9491564369 ,
determination=0.999999952613622840,
F(X)= 0.87502675117248407000+
0.08291309172761111800*(X-8.25074935649551480000)^1+
-0.00088256106199935402*(X-8.25074935649551480000)^2+
value range 0.8500000100<=F(x)<= 0.9000000000 ,
value range 7.9491564965<=X<= 8.5538951422 ,
determination=0.999996216105796250,

F(X)= 0.92558893269573650000+
0.07280526257656316800*(X-8.88460293237139350000)^1+
-0.01488390327841315800*(X-8.88460293237139350000)^2+
value range 0.9000000100<=F(x)<= 0.9500000000 ,
value range 8.5538951601<=X<= 9.2486836083 ,
determination=0.999990492866865140,

F(X)= 0.97951684649957649000+
0.03332352551065273500*(X-9.88202136602830630000)^1+
-0.01885965771837083700*(X-9.88202136602830630000)^2+
0.00353359987626666870*(X-9.88202136602830630000)^3+
value range 0.9500000100<=F(x)<= 0.9999999900 ,
value range 9.2486836324<=X<= 14.1351232902 ,
determination=0.999888328718027800
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1 1 3200259101.8005004000 3200259101.8005004000 3199301615.1923523000
error 99999998 100029925.9875474000 1.0002992799
total 99999999 3300289027.7880478000
----------------------------------------------------------------------------------
H0: slope(X1)=0
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
intercept 0.9998891566 0.0001000150 9997.39566 0.00000
slope 2.0000612022 0.0000353603 56562.36925 0.00000
----------------------------------------------------------------------------------
MSE= 1.0002992799 , R2=0.969691 , R2(adj)=0.969691
SSX1=800015811.9593379500 , SS(X2*X1)=1600080586.6603060000, C.V.=1.0004322702
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ] [ 10 ] [ 11 ] [ 12 ] [ 13 ] [ 14 ] [ 15 ]
[ 16 ] [ 17 ] [ 18 ] [ 19 ] [ 20 ]
lower limit -1.64515 -1.28179 -1.03658 -0.84174 -0.67457 -0.52446
-0.38535 -0.25361 -0.12563 -0.00023 0.12545 0.25336 0.38534 0.52446
0.67453 0.84168 1.03654 1.28169 1.64510
upper limit -1.64515 -1.28179 -1.03658 -0.84174 -0.67457 -0.52446 -0.38535
-0.25361 -0.12563 -0.00023 0.12545 0.25336 0.38534 0.52446 0.67453
0.84168 1.03654 1.28169 1.64510
observed no 5002390.00000 4998681.00000 4998148.00000 4998710.00000 4998083.00000 5002449.00000
5000725.00000 4991509.00000 5010771.00000 4990005.00000 5000132.00000 5011954.00000
4999865.00000 4996435.00000 4998836.00000 5000731.00000 5001620.00000 5002092.00000
4996024.00000 5000840.00000
probability 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000
0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000
0.05000 0.05000 0.05000 0.05000 0.05000
expected no 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000
5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000
5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000
5000000.00000 5000000.00000
chi square 1.14242 0.34795 0.68598 0.33282 0.73498 1.19952 0.10513
14.41942 23.20289 19.98000 0.00348 28.57962 0.00364 2.54184 0.27098
0.10687 0.52488 0.87529 3.16172 0.14112
p-value=0.000000
H0: residual is random , H1: Increasing line or decreasing line, Z=0.238401, p-value=0.594300
H0: residual is random , H1: Oscillation, Z=0.238401, p-value=0.405700
Z=0.238401, p-value=0.811400

t=2,3,...,100000000
D.W. test=2.000370
D.W. test=1.999630

[Figure: the joint probability of X2 estimated value and X2]
(25.2.2.1) residual analysis,

Variance : 1.00030
S.D. : 1.00015
MAD : 0.79797
Range : 11.47237
Median : -0.00001
Q1 : -0.67444
Q2 : -0.00001
Q3 : 0.67457
IQR : 1.34901
C.V. : none

(25.2.1.3)Conclusion,
X1~Arcsin(-0.0006830781, 4.0000397221), X2=0.999889+2.000061*X1+error,
error~Normal(0,1).
(25.2.3) X1 is the dependent variable and X2 is the independent variable.

With X2 as the independent variable, the conditional expectation line is $E(X_1 \mid x_2) = \alpha_0 + \alpha_1 x_2$; the intercept and slope are obtained using linear model analysis.
$\alpha_1 = \rho \times \sqrt{\dfrac{Var(X_1)}{Var(X_2)}} = 0.9847 \times \sqrt{\dfrac{8.0002}{33.0029}} = 0.48481752291$,
$\alpha_0 = E(X_1) - \alpha_1 E(X_2) = -0.0001 - (0.9997) \times 0.48481752291 = -0.48477207765$,
The relation is X1= -0.4847793029+ 0.4848304416*X2
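The slope and intercept of the line of X1 on X2 can be recomputed from the summary statistics alone. The sketch below uses the rounded values quoted in this section, so it reproduces the hand calculation rather than the full-precision software output; the variable names are assumptions.

```python
import math

# Rounded summary statistics quoted in this section.
rho = 0.9847                         # correlation coefficient of X1 and X2
var_x1, var_x2 = 8.0002, 33.0029     # sample variances
mean_x1, mean_x2 = -0.0001, 0.9997   # sample means

# Slope and intercept of the conditional expectation line E(X1|x2).
alpha1 = rho * math.sqrt(var_x1 / var_x2)
alpha0 = mean_x1 - alpha1 * mean_x2

print(f"alpha1={alpha1:.8f}, alpha0={alpha0:.8f}")
```

The small gap between these values and the fitted relation comes from rounding rho, the means and the variances before the calculation.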
ANOVA
----------------------------------------------------------------------------------
Source df SS MS
----------------------------------------------------------------------------------
X2 1 775767777.3825616800 775767777.3825616800
error 99999998 24248034.5767762660 0.2424803506
total 99999999 800015811.9593379500
----------------------------------------------------------------------------------
F test value=3199301615.1923647000
H0: slope(X2)=0 The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept -0.4847793029 0.0000499823 -9699.01147 0.00000
slope 0.4848304416 0.0000085716 56562.36925 0.00000
----------------------------------------------------------------------------------
MSE=0.2424803506 , R2=0.969691 , R2(adj)=0.969691
SS(X2)=3300289027.7880478000 , SS(X1*X2)=1600080586.6603060000, C.V.=-------
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ] [ 10 ] [ 11 ] [ 12 ] [ 13 ] [ 14 ] [ 15 ]
[ 16 ] [ 17 ] [ 18 ] [ 19 ] [ 20 ]
lower limit -0.80999 -0.63109 -0.51036 -0.41443 -0.33212 -0.25822
-0.18973 -0.12487 -0.06185 -0.00011 0.06177 0.12474 0.18972 0.25822
0.33210 0.41440 0.51034 0.63104 0.80997
upper limit -0.80999 -0.63109 -0.51036 -0.41443 -0.33212 -0.25822 -0.18973
-0.12487 -0.06185 -0.00011 0.06177 0.12474 0.18972 0.25822 0.33210
0.41440 0.51034 0.63104 0.80997
observed no 4999957.00000 4998561.00000 5003158.00000 4998733.00000 5002659.00000 5000469.00000
4998021.00000 4989832.00000 5008592.00000 4987777.00000 5000426.00000 5010664.00000
5000583.00000 4999488.00000 4998986.00000 4999453.00000 4999935.00000 4998856.00000
5001839.00000 5002011.00000
probability 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000
0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000
0.05000 0.05000 0.05000 0.05000 0.05000
expected no 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000
5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000
5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000
5000000.00000 5000000.00000
chi square 0.00037 0.41414 1.99459 0.32106 1.41406 0.04399 0.78329
20.67764 14.76449 29.88035 0.03630 22.74418 0.06798 0.05243 0.20564
0.05984 0.00085 0.26175 0.67638 0.80882
pearson chi-square test statistic =95.208147 p-value=0.000000
Z=-0.011758, p-value=0.990600
t=2,3,...,100000000
D.W. test=2.000392
D.W. test=1.999608
[Figure: the joint probability of X1 estimated value and X1]
(25.2.3.1)
The residual of X1 estimated line,
X0= residual, residual marginal probability distribution
Variance : 0.24248
S.D. : 0.49242
MAD : 0.39291
Range : 5.78566
Mid_range : 0.04909
Median : 0.00004
Q1 : -0.33217
Q2 : 0.00004
Q3 : 0.33214
IQR : 0.66430
C.V. : none
SLLN analysis, X0=residual and Normal(0,1),Note:X1~ Normal(0,0.24248), X1 is
representable code of Normal(0,0.24248),
(25.2.3.2)Conclusion,
X2~ The curve-fitting estimated line,
X1=-0.4847793029+0.4848304416*X2+error, error~Normal(0, 0.24248),
(25.2.4)X1 and X2 are random variables,

(i)
X1 is the random variable with a prior probability distribution and X2 is the
dependent variable; the probability model is
X1~Arcsin(-0.0006830781, 4.0000397221),
X2=0.999889+2.000061*X1+error,
error~Normal(0,1).
(ii)
X2 is the random variable with a prior probability distribution and X1 is the
dependent variable; the probability model is
X2~a special distribution,
X1=-0.4847793029+0.4848304416*X2+error, error~Normal(0,0.24248),
$\rho(X_1, X_2) = \sqrt{2.000061 \times 0.4848304416} = 0.9847$. However,

X2=0.999889+2.000061*X1+error can be converted algebraically to
X1=-0.999889/2.000061+X2/2.000061-error/2.000061, but this inversion is not a
good idea because it does not satisfy the requirements of linear model analysis.
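A short numerical check of why the algebraic inversion differs from the fitted line of X1 on X2. It uses only the two slopes reported above; the variable names are assumptions.

```python
import math

slope_x2_on_x1 = 2.000061        # fitted slope of X2 on X1
slope_x1_on_x2 = 0.4848304416    # fitted slope of X1 on X2

# The product of the two regression slopes is the coefficient of determination.
r_squared = slope_x2_on_x1 * slope_x1_on_x2
rho = math.sqrt(r_squared)

# Slope obtained by algebraically inverting the X2-on-X1 line.
inverted_slope = 1.0 / slope_x2_on_x1

print(f"R2={r_squared:.6f}, rho={rho:.4f}")
print(f"inverted slope={inverted_slope:.6f}, fitted slope={slope_x1_on_x2:.6f}")
```

The inverted slope is about 0.5000 while the fitted slope is about 0.4848. The two would coincide only if R2 were exactly 1, that is, if there were no error term.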
5.4. The error probability distribution is not normal distribution and
other basic assumptions are unchanged.
Example 26 The error probability distribution is shifted exponential distribution.

$X_1 \sim \mathrm{Normal}(\mu_{X_1} = 1000,\ \sigma_{X_1}^2 = 10^2)$, the population conditional expectation line is
$E(X_2 \mid x_1) = \beta_0 + \beta_1 x_1 = 1 + 2x_1$, $\varepsilon \sim \mathrm{Shifted\_exponential}(\lambda = 1,\ c = -1)$,
Three basic assumptions are
i) $\varepsilon_i \sim$ shifted exponential distribution, ii) $E(\varepsilon_i) = 0$, $Var(\varepsilon_i) = \sigma^2$,
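A minimal sketch of how data like those of Example 26 could be simulated. It assumes that Shifted exponential(lambda = 1, c = -1) means an exponential variable with rate 1 shifted by -1, so that the error has mean 0 and variance 1. The seed, the sample size and the variable names are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(26)      # arbitrary seed, for reproducibility only
n = 200_000                          # illustrative sample size

lam, c = 1.0, -1.0                   # rate and shift of the error distribution
x1 = rng.normal(1000.0, 10.0, size=n)               # X1 ~ Normal(1000, 10^2)
error = c + rng.exponential(scale=1.0 / lam, size=n)
x2 = 1.0 + 2.0 * x1 + error

# Least-squares fit of X2 on X1 and the resulting residuals.
slope, intercept = np.polyfit(x1, x2, 1)
residual = x2 - (intercept + slope * x1)

print(f"error mean={error.mean():.4f}, variance={error.var(ddof=1):.4f}")
print(f"intercept={intercept:.4f}, slope={slope:.6f}, min residual={residual.min():.4f}")
```

Because X1 is centred at 1000, the intercept is estimated much less precisely than the slope, which matches the large standard error of the intercept in the n=1000 output.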
26.1) paired samples, n=1000,
(26.1.1.1) the frequency probability table of independent variable,

[ 1 ] 970.70485~ 977.78782 974.24634 8.00000 0.0080000 0.0080000
[ 2 ] 977.78782~ 984.87078 981.32930 59.00000 0.0590000 0.0670000
[ 3 ] 984.87078~ 991.95375 988.41227 138.00000 0.1380000 0.2050000
[ 4 ] 991.95375~ 999.03671 995.49523 256.00000 0.2560000 0.4610000
[ 5 ] 999.03671~ 1006.11968 1002.57820 275.00000 0.2750000 0.7360000
[ 6 ] 1006.11968~ 1013.20264 1009.66116 187.00000 0.1870000 0.9230000
[ 7 ] 1013.20264~ 1020.28561 1016.74413 57.00000 0.0570000 0.9800000
[ 8 ] 1020.28561~ 1027.36857 1023.82709 17.00000 0.0170000 0.9970000
[ 9 ] 1027.36857~ 1034.45154 1030.91006 3.00000 0.0030000 1.0000000
X1 probability distribution, goodness of fit test (Pearson chi-square test statistic).

class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ] [ 10 ]
lower limit 970.70485 977.07952 983.45419 989.82886 996.20353 1002.57820 1008.95286
1015.32753 1021.70220 1028.07687
upper limit 977.07952 983.45419 989.82886 996.20353 1002.57820 1008.95286 1015.32753
1021.70220 1028.07687 1034.45154
observed no 5.00000 46.00000 99.00000 206.00000 243.00000 216.00000 129.00000
41.00000 12.00000 3.00000
probability 0.00970 0.03590 0.10390 0.19970 0.25480 0.21590 0.12140
0.04540 0.01130 0.00200
expected no 9.70000 35.90000 103.90000 199.70000 254.80000 215.90000 121.40000
45.40000 11.30000 2.00000
chi square 2.27732 2.84150 0.23109 0.19875 0.54647 0.00005 0.47578
0.42643 0.04336 0.50000
pearson chi square test statistic=7.540751, degree of freedom=7, p-value=0.374800
correction:
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ]
lower limit 970.70485 977.07952 983.45419 989.82886 996.20353 1002.57820 1008.95286
1015.32753 1021.70220
upper limit 977.07952 983.45419 989.82886 996.20353 1002.57820 1008.95286 1015.32753
1021.70220 1034.45154
observed no 5.00000 46.00000 99.00000 206.00000 243.00000 216.00000 129.00000
41.00000 15.00000
probability 0.00970 0.03590 0.10390 0.19970 0.25480 0.21590 0.12140
0.04540 0.01330
expected no 9.70000 35.90000 103.90000 199.70000 254.80000 215.90000 121.40000
45.40000 13.30000
chi square 2.27732 2.84150 0.23109 0.19875 0.54647 0.00005 0.47578
0.42643 0.21729
degree of freedom=6, pearson chi-square test statistic =7.214681
p-value=0.301400
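The "correction" above merges the last two classes, apparently because the expected number of the last class (2.0) is small. A minimal sketch of such a merge follows, assuming the common rule that every class should have an expected count of at least 5; the threshold and the function name are assumptions, not something stated in the output.

```python
def merge_small_tail(observed, expected, min_expected=5.0):
    """Merge the last class into its neighbour while its expected count is too small."""
    obs, exp = list(observed), list(expected)
    while len(exp) > 1 and exp[-1] < min_expected:
        last_obs, last_exp = obs.pop(), exp.pop()
        obs[-1] += last_obs
        exp[-1] += last_exp
    return obs, exp


# Observed and expected counts copied from the uncorrected table for X1.
observed = [5, 46, 99, 206, 243, 216, 129, 41, 12, 3]
expected = [9.7, 35.9, 103.9, 199.7, 254.8, 215.9, 121.4, 45.4, 11.3, 2.0]

obs_merged, exp_merged = merge_small_tail(observed, expected)
chi_square = sum((o - e) ** 2 / e for o, e in zip(obs_merged, exp_merged))

print(len(obs_merged), obs_merged[-1], round(exp_merged[-1], 1))
print(f"pearson chi-square test statistic={chi_square:.6f}")
```

With nine classes left, the degree of freedom drops from 7 to 6, as in the corrected output.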
(26.1.1.2) the frequency probability table of X2,

[ 1 ] 1943.18963~ 1957.18790 1950.18877 13.00000 0.0130000 0.0130000
[ 2 ] 1957.18790~ 1971.18618 1964.18704 59.00000 0.0590000 0.0720000
[ 3 ] 1971.18618~ 1985.18445 1978.18532 133.00000 0.1330000 0.2050000
[ 4 ] 1985.18445~ 1999.18273 1992.18359 254.00000 0.2540000 0.4590000
[ 5 ] 1999.18273~ 2013.18100 2006.18187 274.00000 0.2740000 0.7330000
[ 6 ] 2013.18100~ 2027.17928 2020.18014 185.00000 0.1850000 0.9180000
[ 7 ] 2027.17928~ 2041.17755 2034.17842 61.00000 0.0610000 0.9790000
[ 8 ] 2041.17755~ 2055.17583 2048.17669 18.00000 0.0180000 0.9970000
[ 9 ] 2055.17583~ 2069.17410 2062.17497 3.00000 0.0030000 1.0000000
X2 probability distribution, goodness of fit test (Pearson chi-square test statistic).

class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ] [ 10 ]
lower limit 1943.18963 1955.78808 1968.38652 1980.98497 1993.58342 2006.18187 2018.78031
2031.37876 2043.97721 2056.57566
upper limit 1955.78808 1968.38652 1980.98497 1993.58342 2006.18187 2018.78031 2031.37876
2043.97721 2056.57566 2069.17410
observed no 7.00000 48.00000 98.00000 205.00000 245.00000 210.00000 133.00000
38.00000 13.00000 3.00000
probability 0.01050 0.03740 0.10520 0.19890 0.25180 0.21390 0.12170
0.04650 0.01190 0.00220
expected no 10.50000 37.40000 105.20000 198.90000 251.80000 213.90000 121.70000
46.50000 11.90000 2.20000
chi square 1.16667 3.00428 0.49278 0.18708 0.18364 0.07111 1.04922
1.55376 0.10168 0.29091
pearson chi square test statistic=8.101118, degree of freedom=7
p-value=0.323700
correction:
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ]
lower limit 1943.18963 1955.78808 1968.38652 1980.98497 1993.58342 2006.18187 2018.78031
2031.37876 2043.97721
upper limit 1955.78808 1968.38652 1980.98497 1993.58342 2006.18187 2018.78031 2031.37876
2043.97721 2069.17410
observed no 7.00000 48.00000 98.00000 205.00000 245.00000 210.00000 133.00000
38.00000 16.00000
probability 0.01050 0.03740 0.10520 0.19890 0.25180 0.21390 0.12170
0.04650 0.01410
expected no 10.50000 37.40000 105.20000 198.90000 251.80000 213.90000 121.70000
46.50000 14.10000
chi square 1.16667 3.00428 0.49278 0.18708 0.18364 0.07111 1.04922
1.55376 0.25603
degree of freedom=6, pearson chi-square test statistic =7.964556 ,p-value=0.240700
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1 1 382218.8128254331 382218.8128254331 384923.1253450022
error 998 990.9884599890 0.9929744088
total 999 383209.8012854221
----------------------------------------------------------------------------------
H0: slope(X1)=0
The F test p value=0.000100 [assuming the error is normally distributed]
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept 3.1665098965 3.2203203082 0.98329 0.32540
slope 1.9978640165 0.0032201709 620.42173 0.00000
[Note: The p values of the t test and F test assume that the error is normally distributed]
MSE=0.9929744088 , R2=0.997414 , R2(adj)=0.997411

SSX1=95759.1339139825 , SS(X2*X1)= 191313.7278968607, C.V.= 0.0004979847
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ] [ 10 ]
lower limit -1.27709 -0.83865 -0.52253 -0.25242 0.00002 0.25243
0.52253 0.83823 1.27698
upper limit -1.27709 -0.83865 -0.52253 -0.25242 0.00002 0.25243 0.52253
0.83823 1.27698
observed no 0.00000 148.00000 224.00000 152.00000 93.00000 94.00000 75.00000
54.00000 63.00000 97.00000
probability 0.10000 0.10000 0.10000 0.10000 0.10000 0.10000 0.10000
0.10000 0.10000 0.10000
expected no 100.00000 100.00000 100.00000 100.00000 100.00000 100.00000 100.00000
100.00000 100.00000 100.00000
chi square 100.00000 23.04000 153.76000 27.04000 0.49000 0.36000 6.25000
21.16000 13.69000 0.09000
degree of freedom=8
Z=0.293092, p-value=0.615300
H0: residual is random , H1: Oscillation Z=0.293092, p-value=0.384700
Z=0.293092, p-value=0.769400
t=2,3,...,1000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0 D.W. test=2.050645
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0 D.W. test=1.949355
(26.1.3) Residual analysis
[ 1 ] -1.06633~ -0.28840 -0.67736 512.00000 0.5120000 0.5120000
[ 2 ] -0.28840~ 0.48953 0.10056 270.00000 0.2700000 0.7820000
[ 3 ] 0.48953~ 1.26745 0.87849 121.00000 0.1210000 0.9030000
[ 4 ] 1.26745~ 2.04538 1.65641 49.00000 0.0490000 0.9520000
[ 5 ] 2.04538~ 2.82330 2.43434 27.00000 0.0270000 0.9790000
[ 6 ] 2.82330~ 3.60123 3.21227 11.00000 0.0110000 0.9900000
[ 7 ] 3.60123~ 4.37915 3.99019 5.00000 0.0050000 0.9950000
[ 8 ] 4.37915~ 5.15708 4.76812 1.00000 0.0010000 0.9960000
[ 9 ] 5.15708~ 5.93501 5.54604 4.00000 0.0040000 1.0000000

class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ] [ 10 ]
lower limit -1.06633 -0.36619 0.33394 1.03407 1.73421 2.43434 3.13447
3.83461 4.53474 5.23487
upper limit -0.36619 0.33394 1.03407 1.73421 2.43434 3.13447 3.83461
4.53474 5.23487 5.93501
observed no 472.00000 270.00000 128.00000 73.00000 25.00000 16.00000 8.00000
3.00000 1.00000 4.00000
probability 0.48138 0.24965 0.12948 0.06715 0.03482 0.01806 0.00937
0.00486 0.00252 0.00271
expected no 481.38047 249.65331 129.47508 67.14831 34.82442 18.06063 9.36659
4.85770 2.51930 2.71419
chi square 0.18279 1.65825 0.01681 0.50995 2.77160 0.23511 0.19939
0.71043 0.91623 0.60914
pearson chi square test statistic=7.809690, degree of freedom=7, p-value=0.349600
correction:
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ]
lower limit -1.06633 -0.36619 0.33394 1.03407 1.73421 2.43434 3.13447
3.83461
upper limit -0.36619 0.33394 1.03407 1.73421 2.43434 3.13447 3.83461
5.93501
observed no 472.00000 270.00000 128.00000 73.00000 25.00000 16.00000 8.00000
8.00000
probability 0.48138 0.24965 0.12948 0.06715 0.03482 0.01806 0.00937
0.01009
expected no 481.38047 249.65331 129.47508 67.14831 34.82442 18.06063 9.36659
10.09118
chi square 0.18279 1.65825 0.01681 0.50995 2.77160 0.23511 0.19939
0.43335
degree of freedom=5, pearson chi-square test statistic =6.007245 , p-value=0.305500
(26.1.4)Conclusion,
X1~Normal distribution, X2~Normal distribution,
residual~Shifted exponential distribution.
When X1 is a constant in the conditional probability distribution,
X2=3.166510+1.997864*X1+residual~Shifted exponential distribution.
When X1 is a random variable in the joint probability distribution,
X2=3.166510+1.997864*X1+residual~Normal distribution.
The probability distribution of the residual is not the same as the probability distribution of X2.
Note: please refer to Appendix 11.

[Figures: joint probability f(x1,x2) and f(x2,x1)]

sample cov(X1,X2)= 200.0345,
S.D. : 10.00090
MAD : 7.97943
Range : 114.27209
Mid_range : 999.86497
Median : 999.99816
Q1 : 993.25136
Q2 : 999.99816
Q3 : 1006.74323
IQR : 13.49187
C.V. : 0.01000
SLLN analysis, X1=residual and Normal(1000,100),Note:X2~ Normal(1000,100),

X2 is representable code of Normal(1000,100),

S.D. : 20.02664
MAD : 15.97869
Range : 228.62959
Mid_range : 2000.24908
Median : 2000.99600
Q1 : 1987.48714
Q2 : 2000.99600
Q3 : 2014.50318
IQR : 27.01605
C.V. : 0.01001
SLLN analysis, X2=residual and Normal(2001,401),Note:X3~ Normal(2001,401),
X3 is representable code of Normal(2001,401)
(26.2.2)X2 is dependent variable and X1 is independent variable.

The linear model analysis
ANOVA
----------------------------------------------------------------------------------
Source df SS MS
----------------------------------------------------------------------------------
X1 1 40006611468.1925660000 40006611468.1925660000
error 99999998 100002947.0060882600 1.0000294901
total 99999999 40106614415.1986540000
----------------------------------------------------------------------------------
F test value=40005431705.5523380000
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept 1.0145090303 0.0099997307 101.45364 0.00000
slope 1.9999854581 0.0000099992 200013.57880 0.00000
----------------------------------------------------------------------------------
MSE=1.0000294901 , R2=0.997507 , R2(adj)=0.997507
X2(mean)= 2000.9964093958, X2(variance)= 401.0661481626, X2(s.d.)=20.0266359672
SSX1=10001798311.2688870000 , SS(X2*X1)=20003451177.7882690000, C.V.= 0.0004997584
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ] [ 10 ] [ 11 ] [ 12 ] [ 13 ] [ 14 ] [ 15 ]
[ 16 ] [ 17 ] [ 18 ] [ 19 ] [ 20 ]
lower limit -1.64493 -1.28162 -1.03644 -0.84163 -0.67448 -0.52439
-0.38530 -0.25358 -0.12561 -0.00023 0.12544 0.25333 0.38529 0.52439
0.67444 0.84156 1.03640 1.28151 1.64488
upper limit -1.64493 -1.28162 -1.03644 -0.84163 -0.67448 -0.52439 -0.38530
-0.25358 -0.12561 -0.00023 0.12544 0.25333 0.38529 0.52439 0.67444
0.84156 1.03640 1.28151 1.64488
observed no 0.00000 0.00000 0.00000 14640748.00000 13138016.00000 10066524.00000
8069912.00000 6679338.00000 5692346.00000 4918283.00000 4346424.00000 3896316.00000
3529322.00000 3249696.00000 3033308.00000 2883804.00000 2808472.00000 2837661.00000
3110262.00000 7099568.00000
probability 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000
0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000
0.05000 0.05000 0.05000 0.05000 0.05000
expected no 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000
5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000
5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000
5000000.00000 5000000.00000
chi square 5000000.00000 5000000.00000 5000000.00000 18588804.39990 13245460.88325
5133933.08852 1884871.93755 564035.22365 95868.59674 1335.53362 85432.31756 243623.67437
432578.75594 612712.81848 773575.48457 895657.10208 960558.99496 935141.99018 714221.94173
881637.15732
Z=-2.339794, p-value=0.009700
H0: residual is random , H1: Oscillation Z=-2.339794, p-value=0.990300
Z=-2.339794, p-value=0.019400
t=2,3,...,100000000
D.W. test=1.999609
D.W. test=2.000391
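The three p-values reported with the same Z value above appear to be the left-tail, right-tail and two-tail probabilities of the standard normal distribution. The sketch below assumes that reading; the function name `normal_cdf` is illustrative, and small differences in the last digit may come from rounding in the original program.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function, computed with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


z_value = -2.339794                      # Z reported above
p_left = normal_cdf(z_value)             # P(Z <= z)
p_right = 1.0 - p_left                   # P(Z >= z)
p_two_sided = 2.0 * min(p_left, p_right)

print(f"left-tail  p-value={p_left:.6f}")
print(f"right-tail p-value={p_right:.6f}")
print(f"two-tail   p-value={p_two_sided:.6f}")
```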
Variance : 1.00003
S.D. : 1.00001
MAD : 0.73569
Range : 17.46274
Mid_range : 7.73083
Median : -0.30685
Q1 : -0.71219
Q2 : -0.30685
Q3 : 0.38620
IQR : 1.09839
C.V. : none
The distribution function of the residual estimated by curve-fitting is

F(X)=1-exp(-0.9999842258*(X-(-1.0000156743))),
determination=0.999999994462915760
[Figure: left diagram, the comparison of the estimated line and the sample data.]
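One way to check the fitted distribution function is to invert it and compare its quartiles with the sample quartiles of the residual listed above (Q1=-0.71219, Median=-0.30685, Q3=0.38620). This is a minimal sketch, assuming the fitted form F(X)=1-exp(-rate*(X-shift)); the names `rate`, `shift` and `fitted_quantile` are illustrative.

```python
import math

rate = 0.9999842258    # fitted rate of the shifted exponential
shift = -1.0000156743  # fitted shift (lower bound of the support)


def fitted_cdf(x):
    """Fitted distribution function of the residual."""
    if x < shift:
        return 0.0
    return 1.0 - math.exp(-rate * (x - shift))


def fitted_quantile(p):
    """Inverse of fitted_cdf for 0 <= p < 1."""
    return shift - math.log(1.0 - p) / rate


for label, p in (("Q1", 0.25), ("Median", 0.50), ("Q3", 0.75)):
    print(f"{label}: {fitted_quantile(p):.5f}")
```

The quartiles of the fitted curve should lie close to the sample quartiles, which is consistent with the reported determination value.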
(26.2.4) Conclusion:
X1~Normal(1000,100),
X2=1.014509+1.999985*X1+error, approximately Normal(2001,401),
error~Shifted exponential(1,-1).
Note 1:
The sum of an independent normal variable and a shifted exponential variable
does not have a normal distribution.
Here X1~Normal(1000,100), error~Shifted exponential(1,-1), and
X2=1.014509+1.999985*X1+error.
Because the X1 term is much larger than the error term, the probability distribution of X2 is close to
the normal distribution Normal(2001,401).
Note 2: special case 1. X1~Normal(0,0.01), error~Shifted exponential(1,-1),
X2=1+2*X1+error. Here X2 is not Normal(1,1.04).
X2 marginal probability distribution
Variance : 1.04050
S.D. : 1.02005
MAD : 0.75078
Range : 19.30974
Mid_range : 8.70015
Median : 0.71305
Q1 : 0.29908
Q2 : 0.71305
Q3 : 1.40650
IQR : 1.10742
C.V. : 1.02006
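The two notes can be illustrated with a small simulation that compares the sample skewness of X2 in both settings. This is only a sketch under stated assumptions: the true coefficients 1 and 2 are used instead of the fitted ones, Normal(a,b) is read as mean a and variance b, the sample size is far smaller than in the text, and the helper names are illustrative.

```python
import random


def sample_skewness(values):
    """Third standardized central moment of a sample."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def simulate_x2(x1_mean, x1_sd, n, seed):
    """Draw X2 = 1 + 2*X1 + error with error ~ Shifted exponential(1,-1)."""
    rng = random.Random(seed)
    values = []
    for _ in range(n):
        x1 = rng.gauss(x1_mean, x1_sd)
        error = rng.expovariate(1.0) - 1.0  # mean 0, variance 1
        values.append(1.0 + 2.0 * x1 + error)
    return values


# Note 1: X1 ~ Normal(1000,100), so the X1 term dominates the error.
skew_wide = sample_skewness(simulate_x2(1000.0, 10.0, 100_000, seed=1))
# Note 2: X1 ~ Normal(0,0.01), so the error dominates.
skew_narrow = sample_skewness(simulate_x2(0.0, 0.1, 100_000, seed=1))

print(f"skewness, Note 1 setting: {skew_wide:.4f}")
print(f"skewness, Note 2 setting: {skew_narrow:.4f}")
```

A skewness near zero in the first setting and a clearly positive skewness in the second setting is what the notes describe: X2 looks normal only when the normal component dominates.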
[Figure: joint probability density plots f(x1,x2) and f(x2,x1).]
5.5. The variances of error are not equal and the other basic
Example 27 The variances of error are not equal.

$X_1 \sim \mathrm{Normal}(\mu_X = 10,\ \sigma_X^2 = 1^2)$, the population conditional expectation line is
$E(X_2 \mid x_1) = \beta_0 + \beta_1 x_1 = 1 + 2x_1$, $\varepsilon \sim \mathrm{Normal}(0,\ \sigma^2 = X_1^4)$.
Three basic assumptions are
i) $\varepsilon_i \sim$ Normal distribution, ii) $E(\varepsilon_i) = 0$, $\mathrm{Var}(\varepsilon_i) = \sigma^2$ is affected by X1,


class interval midpoint frequency relative frequency cumulative relative frequency
[ 1 ] 6.98917~ 7.60536 7.29726 11.00000 0.0110000 0.0110000
[ 2 ] 7.60536~ 8.22154 7.91345 21.00000 0.0210000 0.0320000
[ 3 ] 8.22154~ 8.83773 8.52964 91.00000 0.0910000 0.1230000
[ 4 ] 8.83773~ 9.45392 9.14583 162.00000 0.1620000 0.2850000
[ 5 ] 9.45392~ 10.07011 9.76201 249.00000 0.2490000 0.5340000
[ 6 ] 10.07011~ 10.68629 10.37820 231.00000 0.2310000 0.7650000
[ 7 ] 10.68629~ 11.30248 10.99439 139.00000 0.1390000 0.9040000
[ 8 ] 11.30248~ 11.91867 11.61058 70.00000 0.0700000 0.9740000
[ 9 ] 11.91867~ 12.53486 12.22676 26.00000 0.0260000 1.0000000

class interval midpoint frequency relative frequency cumulative relative frequency
[ 1 ] -277.85832~ -205.75350 -241.80591 9.00000 0.0090000 0.0090000
[ 2 ] -205.75350~ -133.64868 -169.70109 43.00000 0.0430000 0.0520000
[ 3 ] -133.64868~ -61.54386 -97.59627 139.00000 0.1390000 0.1910000
[ 4 ] -61.54386~ 10.56096 -25.49145 265.00000 0.2650000 0.4560000
[ 5 ] 10.56096~ 82.66578 46.61337 288.00000 0.2880000 0.7440000
[ 6 ] 82.66578~ 154.77060 118.71819 180.00000 0.1800000 0.9240000
[ 7 ] 154.77060~ 226.87542 190.82301 52.00000 0.0520000 0.9760000
[ 8 ] 226.87542~ 298.98024 262.92783 20.00000 0.0200000 0.9960000
[ 9 ] 298.98024~ 371.08506 335.03265 4.00000 0.0040000 1.0000000
(27.1.2)Linear model
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1 1 482.1211954987 482.1211954987 0.0511827938
error 998 9400755.9438872896 9419.5951341556
total 999 9401238.0650827885
----------------------------------------------------------------------------------
H0: slope(X1)=0
Individual test
----------------------------------------------------------------------------------
parameter coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 14.3080334157 31.4308126066 0.45522 0.64880
slope 0.7081806427 3.1302718635 0.22624 0.82080
----------------------------------------------------------------------------------
MSE=9419.5951341556 , R2=0.000051 , R2(adj)=-0.000951
SSX1=961.3203181839 , SS(X2*X1)= 680.7884407509, C.V.= 4.5384772752
probability=0.10000 and expected no=100.00000 for every class.
class   lower limit  upper limit  observed no  chi square
[ 1 ]   -inf         -124.38487   85.00000     2.25000
[ 2 ]   -124.38487   -81.68244    109.00000    0.81000
[ 3 ]   -81.68244    -50.89324    111.00000    1.21000
[ 4 ]   -50.89324    -24.58548    106.00000    0.36000
[ 5 ]   -24.58548    0.00243      89.00000     1.21000
[ 6 ]   0.00243      24.58640     112.00000    1.44000
[ 7 ]   24.58640     50.89335     103.00000    0.09000
[ 8 ]   50.89335     81.64170     93.00000     0.49000
[ 9 ]   81.64170     124.37486    92.00000     0.64000
[ 10 ]  124.37486    +inf         100.00000    0.00000
degree of freedom=8
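The chi square row of the goodness-of-fit output above can be reproduced from the observed counts. A minimal sketch, assuming the usual Pearson contribution (observed - expected)^2 / expected with ten equally probable classes; the variable names are illustrative.

```python
# Observed counts per class, copied from the output above (n = 1,000).
observed = [85, 109, 111, 106, 89, 112, 103, 93, 92, 100]

n = sum(observed)
expected = n / len(observed)  # equal class probability 0.10 gives 100 per class

# Pearson contribution of each class and the total test statistic.
contributions = [(count - expected) ** 2 / expected for count in observed]
chi_square = sum(contributions)

for index, value in enumerate(contributions, start=1):
    print(f"[ {index} ] {value:.5f}")
print(f"chi square total={chi_square:.5f}")
```

The same arithmetic applies to the 20-class tables for n = 100,000,000, where the expected count per class is 5,000,000.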
Z=-1.455376, p-value=0.072800
Z=-1.455376, p-value=0.927200
Z=-1.455376, p-value=0.145600
t=2,3,...,1000
D.W. test=2.015410
D.W. test=1.984590
(27.1.3) Residual analysis: the residual is the dependent variable and X1 is the independent variable.
X0 is the residual of the first estimated line. X0 is the dependent variable, X1 is the
independent variable, and the model is a non-linear model:
$X_{0i} = \mathrm{residual}_i = \alpha_0 + \alpha_1 G(X_{1i}) + \varepsilon_i,\ i = 1, 2, \ldots, n$.
Let $X_{0i} = X_{2i}$, where $X_{2i}$ is a temporary symbol.
|error|= -242.5040219496+ 101.0672375803*|X1|^0.5
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
|X1|^0.5 1 248315.7094192368 248315.7094192368 75.4314302944
error 998 3285355.6804243643 3291.9395595435
total 999 3533671.3898436013
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
H0: slope(X1)=0

----------------------------------------------------------------------------------
intercept -242.5040219496 36.7858489028 -6.59232 0.00000
slope 101.0672375803 11.6368175223 8.68513 0.00000
----------------------------------------------------------------------------------
MSE=3291.9395595435 , R2=0.070271 , R2(adj)=0.069340
|error|(mean)= 76.5968965040, |error|(variance)= 3537.2085984420, |error|(s.d.)= 59.4744365122
|X1|^0.5(mean)=3.1573131521, |X1|^0.5(variance)= 0.0243342472, |X1|^0.5(s.d.)= 0.1559943821
SS(|X1|^0.5)= 24.3099129979 , SS(|error|*|X1|^0.5)= 2456.9357525185, C.V.= 0.7490568034
[Figures: the estimated line; the residual plot of the second estimated line.]
|residual|=-242.5040219496+101.0672375803*|X1|^0.5,
E(residual*residual)=Var(residual)=101.0672375803*101.0672375803*E(|X1|),
X2=14.308033+0.708181*X1+residual,
|residual|=-242.5040219496+101.0672375803*|X1|^0.5+residual*.
This analysis has a problem: the fitted models
X2=14.308033+0.708181*X1+residual and
|residual|=-242.5040219496+101.0672375803*|X1|^0.5+residual*
cannot be explained.
[Figure: joint probability density plots f(x1,x2) and f(x2,x1).]

E(X2|x1) and x1, and E(X1|x2) and x2, are not linearly related.
(27.2.1.2) X1 marginal probability distribution,

Variance : 0.99996
S.D. : 0.99998
MAD : 0.79789
Range : 11.41489
Mid_range : 9.66185
Median : 9.99991
Q1 : 9.32546
Q2 : 9.99991
Q3 : 10.67460
IQR : 1.34913
C.V. : 0.10000

Variance : 10607.91637
S.D. : 102.99474
MAD : 80.60438
Range : 1648.84473
Mid_range : -22.82641
Median : 20.57029
Q1 : -45.30808
Q2 : 20.57029
Q3 : 86.88324
IQR : 132.19132
C.V. : 4.90899
The curve-fitting estimated the distribution function of X2,

F(X)= 0.03852146210650201500+
0.00073273371405036133*(X--161.03017739629126000000)^1+
0.00000611198306385231*(X--161.03017739629126000000)^2+
0.00000002767942017320*(X--161.03017739629126000000)^3+
0.00000000006940702365*(X--161.03017739629126000000)^4+
0.00000000000008914891*(X--161.03017739629126000000)^5+
0.00000000000000004519*(X--161.03017739629126000000)^6+
value range 0.0000000000<=F(x)<= 0.1000000000 ,
value range -847.2487795868<=X<= -107.5163093668 ,
determination=0.999998788718186820,
F(X)= 0.14788746493879376000+
0.00223992172727974830*(X--82.83744668486399100000)^1+
0.00001253233379517151*(X--82.83744668486399100000)^2+
0.00000001481878528830*(X--82.83744668486399100000)^3+
value range 0.1000000100<=F(x)<= 0.2000000000 ,
value range -107.5163092722<=X<= -62.0432012067 ,
determination=0.999999914447524340,
F(X)= 0.24899682944380910000+
0.00318975348182066500*(X--45.62364773537726100000)^1+
0.00001213660629145654*(X--45.62364773537726100000)^2+
-0.00000002751632815062*(X--45.62364773537726100000)^3+
value range 0.2000000100<=F(x)<= 0.3000000000 ,
value range -62.0431986609<=X<= -30.4738637617 ,
determination=0.999999963604179750,
F(X)= 0.34951062691196494000+
0.00379050655727160810*(X--16.97804083056261600000)^1+
0.00000839573872546268*(X--16.97804083056261600000)^2+
-0.00000004136959653219*(X--16.97804083056261600000)^3+
value range 0.3000000100<=F(x)<= 0.4000000000 ,
value range -30.4738612351<=X<= -4.0029828124 ,
determination=0.999999961701279250,
F(X)= 0.44985527289905308000+
0.00408116139290391410*(X-8.35507859593608690000)^1+
0.00000287623085366576*(X-8.35507859593608690000)^2+
-0.00000007602363901208*(X-8.35507859593608690000)^3+
value range 0.4000000100<=F(x)<= 0.5000000000 ,
value range -4.0029806215<=X<= 20.5702906021 ,
determination=0.999999965154154900,
F(X)= 0.55016728455410380000+
0.00407298611669351320*(X-32.79901676105938400000)^1+
-0.00000331194750528143*(X-32.79901676105938400000)^2+
-0.00000007132410243069*(X-32.79901676105938400000)^3+
value range 0.5000000100<=F(x)<= 0.6000000000 ,
value range 20.5702921348<=X<= 45.1923860199 ,
determination=0.999999988154823270,
F(X)= 0.65050750867698381000+
0.00376151001768973770*(X-58.26221002426717600000)^1+
-0.00000856418163655686*(X-58.26221002426717600000)^2+
-0.00000005153877821425*(X-58.26221002426717600000)^3+
value range 0.6000000100<=F(x)<= 0.7000000000 ,
value range 45.1923926241<=X<= 71.8755376329 ,
determination=0.999999930999149970,
F(X)= 0.75101213763260977000+
0.00315197743657036340*(X-87.20846187146278800000)^1+
-0.00001194425136915499*(X-87.20846187146278800000)^2+
-0.00000003236150879327*(X-87.20846187146278800000)^3+
value range 0.7000000100<=F(x)<= 0.8000000000 ,
value range 71.8755434489<=X<= 103.8430413815 ,
determination=0.999999950153608100,
F(X)= 0.85212189000189875000+
0.00220249096772242440*(X-124.97668440835029000000)^1+
-0.00001217755270888239*(X-124.97668440835029000000)^2+
0.00000001566674319784*(X-124.97668440835029000000)^3+
value range 0.8000000100<=F(x)<= 0.9000000000 ,
value range 103.8430422181<=X<= 150.0839242573 ,
determination=0.999999966246177600,
F(X)= 0.96133595246982062000+
0.00073725354816319474*(X-204.67622832737760000000)^1+
-0.00000571180585784637*(X-204.67622832737760000000)^2+
0.00000001913929754021*(X-204.67622832737760000000)^3+
-0.00000000002195783893*(X-204.67622832737760000000)^4+
value range 0.9000000100<=F(x)<= 0.9999999900 ,
value range 150.0839267729<=X<= 801.5959538084 ,
determination=0.999617179281705900
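Each piece of the fitted distribution function above is a polynomial in (X - center) that is valid only on its stated value range. The sketch below evaluates one piece, the one stated for 0.4 <= F(x) <= 0.5, at the two ends of its X range; the function and variable names are illustrative, and the coefficients are copied from the output.

```python
# Piece of the fitted F(X) stated for 0.4000000100 <= F(x) <= 0.5000000000.
center = 8.35507859593608690000
coefficients = [
    0.44985527289905308000,   # constant term
    0.00408116139290391410,   # (X - center)^1
    0.00000287623085366576,   # (X - center)^2
    -0.00000007602363901208,  # (X - center)^3
]
x_low, x_high = -4.0029806215, 20.5702906021


def fitted_cdf_piece(x):
    """Evaluate the polynomial piece; it is only meaningful on [x_low, x_high]."""
    if not x_low <= x <= x_high:
        raise ValueError("x is outside the value range of this piece")
    deviation = x - center
    return sum(c * deviation ** power for power, c in enumerate(coefficients))


print(f"F({x_low})={fitted_cdf_piece(x_low):.6f}")
print(f"F({x_high})={fitted_cdf_piece(x_high):.6f}")
```

The two end values should be close to 0.4 and 0.5, and the upper end of this piece coincides with the sample median of X2 (20.57029).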
Left diagram, the comparison of
(27.2.2)
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1 1 396935813.1594531500 396935813.1594531500 37432.8359736426
Error 99999998 1060394690640.6692000000 10603.9471184856
total 99999999 1060791626453.8286000000
----------------------------------------------------------------------------------
H0: slope(X1)=0, The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
parameter coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 1.0571068904 0.1034915241 10.21443 0.00000
slope 1.9923693011 0.0102977768 193.47567 0.00000
----------------------------------------------------------------------------------
MSE=10603.9471184856 , R2=0.000374 , R2(adj)=0.000374
SSX1= 99995533.4745774870 , SS(X2*X1)=199228031.1403110000, C.V.= 4.9080733901
probability=0.05000 and expected no=5000000.00000 for every class.
class   lower limit  upper limit  observed no       chi square
[ 1 ]   -inf         -169.38485   4938599.00000     754.01656
[ 2 ]   -169.38485   -131.97304   4548475.00000     40774.96513
[ 3 ]   -131.97304   -106.72642   4678516.00000     20670.39245
[ 4 ]   -106.72642   -86.66552    4822658.00000     6290.03699
[ 5 ]   -86.66552    -69.45338    4952994.00000     441.91281
[ 6 ]   -69.45338    -53.99801    5070986.00000     1007.80244
[ 7 ]   -53.99801    -39.67537    5163751.00000     5362.87800
[ 8 ]   -39.67537    -26.11194    5224911.00000     10116.99158
[ 9 ]   -26.11194    -12.93506    5292317.00000     17089.84570
[ 10 ]  -12.93506    -0.02331     5296576.00000     17591.46476
[ 11 ]  -0.02331     12.91674     5311669.00000     19427.51311
[ 12 ]  12.91674     26.08630     5292980.00000     17167.45608
[ 13 ]  26.08630     39.67459     5233672.00000     10920.52072
[ 14 ]  39.67459     53.99812     5161407.00000     5210.44393
[ 15 ]  53.99812     69.44932     5068570.00000     940.36898
[ 16 ]  69.44932     86.65914     4952938.00000     442.96637
[ 17 ]  86.65914     106.72188    4820357.00000     6454.32149
[ 18 ]  106.72188    131.96242    4679514.00000     20542.25524
[ 19 ]  131.96242    169.37991    4548569.00000     40757.98955
[ 20 ]  169.37991    +inf         4940541.00000     707.07454
Z=-0.869998, p-value=0.192200
Z=-0.869998, p-value=0.384400
[Figures: the joint probability of x1 and the residual; the joint probability of the X2 estimated value and X2.]
(27.2.3) Residual analysis I: the residual of the first line model.
$\mathrm{residual}_i = X_{2i} - 1.057107 - 1.992369 X_{1i}$. The residual is the dependent variable, X1 is the
independent variable, and the model is a non-linear model:
$X_{2i} - 1.057107 - 1.992369 X_{1i} = \mathrm{residual}_i = \alpha_0 + \alpha_1 G(X_{1i}) + \varepsilon_i^*,\ i = 1, 2, \ldots, n$.
|error|= 0.0147617544+ 0.7977439361*X1^2
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1^2 1 25582884352.8384210000 25582884352.8384210000
error 99999998 385383264709.4420800000 3853.8327241711
total 99999999 410966149062.2805200000
----------------------------------------------------------------------------------
F test value=6638296.5177454781,
Individual test
----------------------------------------------------------------------------------
parameter coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 0.0147617544 0.0318823771 0.46301 0.64320
slope 0.7977439361 0.0003096244 2576.48918 0.00000
----------------------------------------------------------------------------------
MSE=3853.8327241711 , R2=0.062251 , R2(adj)=0.062251
SS(X1^2)=40199669655.5234070000 , SS(|error|*X1^2)=32069042701.9511030000,
C.V.= 0.7703369759

probability=0.05000 and expected no=5000000.00000 for every class.
class   lower limit  upper limit  observed no       chi square
[ 1 ]   -inf         -102.11445   449494.00000      4141420.97121
[ 2 ]   -102.11445   -79.56057    3899843.00000     242069.08493
[ 3 ]   -79.56057    -64.34053    7590504.00000     1342142.19480
[ 4 ]   -64.34053    -52.24672    8577220.00000     2559300.58568
[ 5 ]   -52.24672    -41.87030    7962185.00000     1754907.99485
[ 6 ]   -41.87030    -32.55296    7040159.00000     832449.74906
[ 7 ]   -32.55296    -23.91848    6281749.00000     328576.09980
[ 8 ]   -23.91848    -15.74171    5667769.00000     89183.08747
[ 9 ]   -15.74171    -7.79796     5216269.00000     9354.45607
[ 10 ]  -7.79796     -0.01405     4814978.00000     6846.62810
[ 11 ]  -0.01405     7.78691      4512140.00000     47601.47592
[ 12 ]  7.78691      15.72625     4260655.00000     109326.20581
[ 13 ]  15.72625     23.91801     4040228.00000     184232.45840
[ 14 ]  23.91801     32.55302     3873429.00000     253832.44361
[ 15 ]  32.55302     41.86785     3742699.00000     316161.16092
[ 16 ]  41.86785     52.24287     3663769.00000     357102.65707
[ 17 ]  52.24287     64.33779     3641944.00000     368863.21983
[ 18 ]  64.33779     79.55416     3715624.00000     329924.34188
[ 19 ]  79.55416     102.11148    4027143.00000     189290.14849
[ 20 ]  102.11148    +inf         7022199.00000     817857.75912
p-value=0.000000
Z=-0.271073, p-value=0.786400
t=2,3,...,100000000
[Figures: the joint probability distribution of X1^2 and the residual of the second line model; the joint probability distribution of the absolute value of the residual (first estimated line) and the estimated value.]
Residual analysis (the residual comes from the second estimated line).
X0 = the estimated value of $\varepsilon_i^*$. Frequency distribution table of X0:
Variance : 3853.83269
S.D. : 62.07925
MAD : 48.75117
Range : 911.65015
Mid_range : 275.73722
Median : -11.94476
Q1 : -46.39342
Q2 : -11.94476
Q3 : 34.48710
IQR : 80.88052
C.V. : none
(27.2.4) Residual analysis II: the residual of the first line model.
The residual comes from the X2 estimated line,
$\mathrm{residual}_i = X_{2i} - 1.057107 - 1.992369 X_{1i}$. The square of the residual is the dependent variable, X1
is the independent variable, and the model is a non-linear model:
$(X_{2i} - 1.057107 - 1.992369 X_{1i})^2 = (\mathrm{residual}_i)^2 = \alpha_0 + \alpha_1 G(X_{1i}) + \varepsilon_i^*,\ i = 1, 2, \ldots, n$.
The available non-linear models do not include the model error^2=b0+b1*X1^4.
Please refer to Appendix 2.
error^2= -3531.6577789815+ 13.7238340284*X1^3
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1^3 1 1763214739041286.0000000000 1763214739041286.0000000000 6769613.5938211801
error 99999998 26046016945285144.0000000000 260460174.6620549300
total 99999999 27809231684326428.0000000000
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
H0: slope(X1)=0

----------------------------------------------------------------------------------
intercept -3531.6577789815 5.6675483878 -623.13677 0.00000
slope 13.7238340284 0.0052746484 2601.84811 0.00000
----------------------------------------------------------------------------------
MSE=260460174.6620549300 , R2=0.063404 , R2(adj)=0.063404
error^2(mean)= 10603.9469064106, error^2(variance)=278092319.6241874700,
error^2(s.d.)= 16676.1002522828
X1^3(mean)= 1030.0040539831, X1^3(variance)= 93616.9089547982,
X1^3(s.d.)= 305.9688038915
SS(X1^3)=9361690801862.9160000000 , SS(error^2*X1^3)=128478290789502.3700000000,
C.V.= 1.5219595818

probability=0.05000 and expected no=5000000.00000 for every class.
class   lower limit    upper limit    observed no        chi square
[ 1 ]   -inf           -26546.75569   24832.00000        4950459.32564
[ 2 ]   -26546.75569   -20683.40847   368756.00000       4289684.19751
[ 3 ]   -20683.40847   -16726.64515   1653540.00000      2239758.90632
[ 4 ]   -16726.64515   -13582.61012   4335030.00000      88437.02018
[ 5 ]   -13582.61012   -10885.04647   8127396.00000      1956121.14816
[ 6 ]   -10885.04647   -8462.81135    11739573.00000     9084368.84447
[ 7 ]   -8462.81135    -6218.10197    13421411.00000     14184032.64618
[ 8 ]   -6218.10197    -4092.38084    12303544.00000     10668350.99199
[ 9 ]   -4092.38084    -2027.24061    9447384.00000      3955844.88869
[ 10 ]  -2027.24061    -3.65302       6644643.00000      540970.11949
[ 11 ]  -3.65302       2024.36886     4926869.00000      1069.62863
[ 12 ]  2024.36886     4088.36258     3934248.00000      227165.46510
[ 13 ]  4088.36258     6217.97987     3272900.00000      596574.88200
[ 14 ]  6217.97987     8462.82894     2810510.00000      958773.29202
[ 15 ]  8462.82894     10884.40911    2482084.00000      1267980.19661
[ 16 ]  10884.40911    13581.60944    2245937.00000      1516972.60159
[ 17 ]  13581.60944    16725.93368    2102090.00000      1679576.47362
[ 18 ]  16725.93368    20681.74340    2054233.00000      1735508.64366
[ 19 ]  20681.74340    26545.98195    2205526.00000      1561816.98734
[ 20 ]  26545.98195    +inf           5899494.00000      161817.89121
p-value=0.000000
Z=0.323444, p-value=0.626900
Z=0.323444, p-value=0.373100
Z=0.323444, p-value=0.746200
t=2,3,...,100000000
D.W. test=2.000306
D.W. test=1.999694
(27.2.5)residual analysis conclusion,
X2=1.057107+1.992369*X1+residual,
| residual |=0.0147617544+0.7977439361*X1^2,
residual ^2=-3531.6577789815+13.7238340284*X1^3,
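The fitted slope of the |residual| model can be compared with what Example 27 implies. Assuming, as stated in the example, that the error given X1 is Normal(0, X1^4), its standard deviation is X1^2 and the mean of its absolute value is sqrt(2/pi)*X1^2. The sketch below only checks that arithmetic; it is not the authors' derivation, and the variable names are illustrative.

```python
import math

# For a Normal(0, s^2) variable, E|error| = s * sqrt(2/pi).
# With s = X1^2 this gives E(|error| given X1) = sqrt(2/pi) * X1^2.
theoretical_slope = math.sqrt(2.0 / math.pi)

fitted_slope = 0.7977439361      # slope of |residual| on X1^2 reported above
fitted_intercept = 0.0147617544  # intercept reported above (not significant)

print(f"theoretical slope sqrt(2/pi)={theoretical_slope:.10f}")
print(f"fitted slope={fitted_slope:.10f}")
print(f"difference={abs(theoretical_slope - fitted_slope):.2e}")
```

Under that assumption the fitted |residual| model is close to the theoretical relation, whereas the residual^2 model in X1^3 is an approximation to the stated variance X1^4.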
(27.2.6) The probability distribution transformation and removing the effect of the variance.
(27.2.6.1) X2=1.057107+1.992369*X1+residual,
consider |residual|/|X1|: let
W1=X1, W2=(X2-1.057107-1.992369*X1)/|X1|,
[Figure: joint probability density plots f(w1,w2) and f(w2,w1).]
sample mean(W1)= 10.0002, sample variance(W1)= 1.0002,

sample mean(W2)= 0.0034, sample variance(W2)= 100.9855
sample cov(W1,W2)= 0.0011,W1 and W2 sample correlation coefficient =0.0001.
W2 Coefficient
S.D. : 10.05100
MAD : 7.97972
Range : 129.22523
Mid_range : 3.13574
Median : 0.00260
Q1 : -6.68951
Q2 : 0.00260
Q3 : 6.69701
IQR : 13.38652
C.V. : none
(27.2.6.2)
X2=1.057107+1.992369*X1+residual,
| residual |=0.0147617544+0.7977439361*X1^2,
consider |residual|/(X1^2): let W1=X1, W2=(X2-1.057107-1.992369*X1)/(X1^2),
W1=Z1,W2=Z2/Z3,
[Figure: joint probability density plots f(w1,w2) and f(w2,w1).]

sample cov(W1,W2)= -0.0001,W1 and W2 sample correlation coefficient=-0.0001.
W2 Coefficient
Variance : 1.00010
S.D. : 1.00005
MAD : 0.79794
Range : 11.36516
Median : 0.00035
Q1 : -0.67426
Q2 : 0.00035
Q3 : 0.67493
IQR : 1.34918
C.V. : none
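The effect of dividing the residual by X1^2 can be illustrated with a small simulation of Example 27. This is a sketch under stated assumptions: X1~Normal(10,1), error~Normal(0, X1^4), the true line 1+2*X1 is used in place of the fitted line, and the sample size is far smaller than the 100,000,000 used in the text. All names are illustrative.

```python
import random


def sample_variance(values):
    """Unbiased sample variance."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


rng = random.Random(27)
n = 100_000
raw_residuals = []
scaled_residuals = []  # plays the role of W2 in (27.2.6.2)
for _ in range(n):
    x1 = rng.gauss(10.0, 1.0)
    error = rng.gauss(0.0, x1 ** 2)  # standard deviation X1^2, variance X1^4
    x2 = 1.0 + 2.0 * x1 + error
    residual = x2 - 1.0 - 2.0 * x1   # residual around the true line
    raw_residuals.append(residual)
    scaled_residuals.append(residual / x1 ** 2)

variance_raw = sample_variance(raw_residuals)
variance_scaled = sample_variance(scaled_residuals)
print(f"variance of residual={variance_raw:.2f}")
print(f"variance of residual/X1^2={variance_scaled:.5f}")
```

The raw residual variance should be near E(X1^4)=10603 for Normal(10,1), which is in line with the MSE of the first line model, and the scaled residual should have variance near 1, as in the W2 table above.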
5.6. The independent variable has a shifted exponential distribution
and the non-linear model, the three basic assumptions are
unchanged.
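Example 28 $X_1 \sim \mathrm{Shifted\_exponential}(\lambda_{X_1} = 1,\ c_{X_1} = 0.1)$,
$E(X_2 \mid x_1) = \beta_0 + \beta_1 (x_1 + \log(x_1)) = 1 + 2(x_1 + \log(x_1))$,
$\varepsilon \sim \mathrm{Normal}(0,\ \sigma^2 = 1)$,
$X_{2i} = \beta_0 + \beta_1 H(X_{1i}) + \varepsilon_i,\ i = 1, 2, \ldots, n$, where $\beta_0$ is the intercept, $\beta_1$ is the slope, and
$\varepsilon_i$ is the error.
Three basic assumptions are
i) $\varepsilon_i \sim$ Normal distribution, ii) $E(\varepsilon_i) = 0$, $\mathrm{Var}(\varepsilon_i) = \sigma^2$,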
(28.1.1) Basic analysis,

class interval midpoint frequency relative frequency cumulative relative frequency
[ 1 ] 0.10011~ 0.73365 0.41688 462.00000 0.4620000 0.4620000
[ 2 ] 0.73365~ 1.36719 1.05042 238.00000 0.2380000 0.7000000
[ 3 ] 1.36719~ 2.00073 1.68396 148.00000 0.1480000 0.8480000
[ 4 ] 2.00073~ 2.63427 2.31750 72.00000 0.0720000 0.9200000
[ 5 ] 2.63427~ 3.26781 2.95104 35.00000 0.0350000 0.9550000
[ 6 ] 3.26781~ 3.90136 3.58459 25.00000 0.0250000 0.9800000
[ 7 ] 3.90136~ 4.53490 4.21813 11.00000 0.0110000 0.9910000
[ 8 ] 4.53490~ 5.16844 4.85167 5.00000 0.0050000 0.9960000
[ 9 ] 5.16844~ 5.80198 5.48521 4.00000 0.0040000 1.0000000

class interval midpoint frequency relative frequency cumulative relative frequency
[ 1 ] -5.67659~ -3.20823 -4.44241 21.00000 0.0210000 0.0210000
[ 2 ] -3.20823~ -0.73987 -1.97405 168.00000 0.1680000 0.1890000
[ 3 ] -0.73987~ 1.72849 0.49431 268.00000 0.2680000 0.4570000
[ 4 ] 1.72849~ 4.19685 2.96267 231.00000 0.2310000 0.6880000
[ 5 ] 4.19685~ 6.66520 5.43103 161.00000 0.1610000 0.8490000
[ 6 ] 6.66520~ 9.13356 7.89938 81.00000 0.0810000 0.9300000
[ 7 ] 9.13356~ 11.60192 10.36774 45.00000 0.0450000 0.9750000
[ 8 ] 11.60192~ 14.07028 12.83610 19.00000 0.0190000 0.9940000
[ 9 ] 14.07028~ 16.53864 15.30446 6.00000 0.0060000 1.0000000
frequency distribution: sample mean=2.708426 , sample variance=15.002841 , sample s
(28.1.2)The linear model

(28.1.2.1)The linear model analysis
The estimated line is X2=-1.420213+3.700399*X1
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1 1 12973.1132764297 12973.1132764297 8117.2024064228
error 998 1595.0282377623 1.5982246871
total 999 14568.1415141920
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
H0: slope(X1)=0
----------------------------------------------------------------------------------
intercept -1.4202132961 0.0607028404 -23.39616 0.00000
slope 3.7003989443 0.0410719536 90.09552 0.00000
----------------------------------------------------------------------------------
MSE=1.5982246871 , R2=0.890513 , R2(adj)=0.890403
X2(mean)= 2.6952984452, X2(variance)=14.5827242384, X2(s.d.)= 3.8187333291
SSX1=947.4299587109 , SS(X2*X1)= 3505.8688189718, C.V.= 0.4690423495
probability=0.10000 and expected no=100.00000 for every class.
class   lower limit  upper limit  observed no  chi square
[ 1 ]   -inf         -1.62021     100.00000    0.00000
[ 2 ]   -1.62021     -1.06398     80.00000     4.00000
[ 3 ]   -1.06398     -0.66292     102.00000    0.04000
[ 4 ]   -0.66292     -0.32024     79.00000     4.41000
[ 5 ]   -0.32024     0.00003      104.00000    0.16000
[ 6 ]   0.00003      0.32026      123.00000    5.29000
[ 7 ]   0.32026      0.66292      124.00000    5.76000
[ 8 ]   0.66292      1.06344      93.00000     0.49000
[ 9 ]   1.06344      1.62008      104.00000    0.16000
[ 10 ]  1.62008      +inf         91.00000     0.81000
degree of freedom=8
p-value=0.006800
Z=-0.861633, p-value=0.194500
Z=-0.861633, p-value=0.805500
Z=-0.861633, p-value=0.389000
t=2,3,...,1000
D.W. test=1.946452
D.W. test=2.053548

class interval midpoint frequency relative frequency cumulative relative frequency
[ 1 ] -5.17614~ -4.21556 -4.69585 3.00000 0.0030000 0.0030000
[ 2 ] -4.21556~ -3.25497 -3.73527 5.00000 0.0050000 0.0080000
[ 3 ] -3.25497~ -2.29439 -2.77468 38.00000 0.0380000 0.0460000
[ 4 ] -2.29439~ -1.33381 -1.81410 96.00000 0.0960000 0.1420000
[ 5 ] -1.33381~ -0.37323 -0.85352 205.00000 0.2050000 0.3470000
[ 6 ] -0.37323~ 0.58735 0.10706 334.00000 0.3340000 0.6810000
[ 7 ] 0.58735~ 1.54794 1.06764 214.00000 0.2140000 0.8950000
[ 8 ] 1.54794~ 2.50852 2.02823 90.00000 0.0900000 0.9850000
[ 9 ] 2.50852~ 3.46910 2.98881 15.00000 0.0150000 1.0000000
frequency distribution: sample mean=0.004280 , sample variance=1.645714 , sample sd=
X0 = residual, goodness of fit (Pearson chi-square test statistic).

mu point estimated value=-0.000000 (MLE)
mu value from -0.252842 to 0.252842
probability=0.10000 and expected no=100.00000 for every class.
class   lower limit  upper limit  observed no  chi square
[ 1 ]   -inf         -1.56114     108.00000    0.64000
[ 2 ]   -1.56114     -1.02345     83.00000     2.89000
[ 3 ]   -1.02345     -0.63577     97.00000     0.09000
[ 4 ]   -0.63577     -0.30451     79.00000     4.41000
[ 5 ]   -0.30451     0.00509      98.00000     0.04000
[ 6 ]   0.00509      0.31464      122.00000    4.84000
[ 7 ]   0.31464      0.64588      115.00000    2.25000
[ 8 ]   0.64588      1.03305      91.00000     0.81000
[ 9 ]   1.03305      1.57113      105.00000    0.25000
[ 10 ]  1.57113      +inf         102.00000    0.04000
degree of freedom=7
(28.1.3)Non-linear model
(28.1.3.1) Non-linear model analysis
The relation is X2= -5.7019052126+ 8.6972906461*|X1|^0.5
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
|X1|^0.5 1 13615.4936331317 13615.4936331317 14263.6780241845
error 998 952.6478810603 0.9545569951
total 999 14568.1415141920
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
H0: slope(X1)=0
----------------------------------------------------------------------------------
intercept -5.7019052126 0.0767990536 -74.24447 0.00000
slope 8.6972906461 0.0728229420 119.43064 0.00000
----------------------------------------------------------------------------------
MSE=0.9545569951 , R2=0.934607 , R2(adj)=0.934542
|X1|^0.5(mean)= 0.9654964977, |X1|^0.5(variance)= 0.1801772425, |X1|^0.5(s.d.)= 0.4244728996
SS(|X1|^0.5)= 179.9970652660 , SS(X2*|X1|^0.5)= 1565.4867920592, C.V.= 0.3624883651

probability=0.10000 and expected no=100.00000 for every class.
class   lower limit  upper limit  observed no  chi square
[ 1 ]   -inf         -1.25214     100.00000    0.00000
[ 2 ]   -1.25214     -0.82227     92.00000     0.64000
[ 3 ]   -0.82227     -0.51232     93.00000     0.49000
[ 4 ]   -0.51232     -0.24749     106.00000    0.36000
[ 5 ]   -0.24749     0.00002      114.00000    1.96000
[ 6 ]   0.00002      0.24750      106.00000    0.36000
[ 7 ]   0.24750      0.51233      100.00000    0.00000
[ 8 ]   0.51233      0.82186      95.00000     0.25000
[ 9 ]   0.82186      1.25204      89.00000     1.21000
[ 10 ]  1.25204      +inf         105.00000    0.25000
degree of freedom=8
p-value=0.700800
Z=-1.515641, p-value=0.064900
Z=-1.515641, p-value=0.935100
Z=-1.515641, p-value=0.129800
t=2,3,...,1000
D.W. test=1.863513
D.W. test=2.136487

class interval midpoint frequency relative frequency cumulative relative frequency
[ 1 ] -3.38599~ -2.67618 -3.03108 6.00000 0.0060000 0.0060000
[ 2 ] -2.67618~ -1.96636 -2.32127 13.00000 0.0130000 0.0190000
[ 3 ] -1.96636~ -1.25655 -1.61146 80.00000 0.0800000 0.0990000
[ 4 ] -1.25655~ -0.54674 -0.90165 173.00000 0.1730000 0.2720000
[ 5 ] -0.54674~ 0.16307 -0.19183 302.00000 0.3020000 0.5740000
[ 6 ] 0.16307~ 0.87288 0.51798 247.00000 0.2470000 0.8210000
[ 7 ] 0.87288~ 1.58270 1.22779 129.00000 0.1290000 0.9500000
[ 8 ] 1.58270~ 2.29251 1.93760 41.00000 0.0410000 0.9910000
[ 9 ] 2.29251~ 3.00232 2.64742 9.00000 0.0090000 1.0000000
When n=1,000 and $X_{2i} = \beta_0 + \beta_1 (X_{1i} + \log(X_{1i})) + \varepsilon_i,\ i = 1, 2, \ldots, n$,
the estimated line is X2=-5.7019052126+8.6972906461*|X1|^0.5,
MSE=0.9545569951, R2=0.934607,
so $X_1 + \log(X_1)$ can be replaced by $\sqrt{X_1}$.
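The comparison between the linear model in X1 and the non-linear model in |X1|^0.5 can be illustrated with a small simulation of Example 28. This is a sketch under stated assumptions: X1 is drawn as 0.1 plus an exponential variable with rate 1, the error is Normal(0,1), the true coefficients 1 and 2 are used, and R2 is taken as the squared sample correlation. The helper `r_squared` and the sample size are illustrative choices.

```python
import math
import random


def r_squared(x, y):
    """R2 of the simple least-squares line of y on x (squared correlation)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


rng = random.Random(28)
n = 20_000
x1 = [0.1 + rng.expovariate(1.0) for _ in range(n)]  # Shifted exponential(1, 0.1)
x2 = [1.0 + 2.0 * (v + math.log(v)) + rng.gauss(0.0, 1.0) for v in x1]

r2_linear = r_squared(x1, x2)                          # regressor X1
r2_sqrt = r_squared([math.sqrt(v) for v in x1], x2)    # regressor |X1|^0.5
r2_true = r_squared([v + math.log(v) for v in x1], x2) # regressor X1+log(X1)

print(f"R2 with X1          : {r2_linear:.6f}")
print(f"R2 with |X1|^0.5    : {r2_sqrt:.6f}")
print(f"R2 with X1+log(X1)  : {r2_true:.6f}")
```

In such a run the square-root regressor should fit clearly better than X1 alone and nearly as well as the true regressor, which is the sense in which one can stand in for the other.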
(28.1.4)Curve-linear model
(28.1.4.1)Curve-linear model analysis,
The estimated line ------
X2=3.46357664007337010000+
3.67239577180589550000*(X1-1.11218055224209960000)^1+
-1.27835332491667940000*(X1-1.11218055224209960000)^2+
0.90127949018938125000*(X1-1.11218055224209960000)^3+
0.49003005831036717000*(X1-1.11218055224209960000)^4+
-0.29802305408520624000*(X1-1.11218055224209960000)^5+
-0.59487223676114809000*(X1-1.11218055224209960000)^6+
0.58458658553718124000*(X1- 1.11218055224209960000)^7+
-0.20875884690030944000*(X1-1.11218055224209960000)^8+
0.03382465923368727100*(X1- 1.11218055224209960000)^9+
-0.00208700331694444690*(X1- 1.11218055224209960000)^10+
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
regression 10 13640.4301041395 1364.0430104139 1454.1575350712
error 989 927.7114100525 0.9380297372
total 999 14568.1415141920
----------------------------------------------------------------------------------
MSE= 0.9380297372 , R2=0.936319 , R2(adj)=0.935675
X2(Mean)= 2.6952984452, X2(Var)= 14.5827242384, X2(sd)= 3.8187333291
X1(Mean)= 1.1121805522, X1(Var)= 0.9483783370, X1(sd)= 0.9738471836
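With ten polynomial terms the adjusted R2 matters more than in the one-regressor models. The sketch below recomputes MSE, F, R2 and R2(adj) from the sums of squares above, assuming the usual formula R2(adj) = 1 - (1 - R2)(n - 1)/(n - p - 1) with p = 10 regressors; the variable names are illustrative.

```python
# Values copied from the curve-linear ANOVA output above.
n = 1000
p = 10                            # number of polynomial terms b1..b10
ss_regression = 13640.4301041395
ss_error = 927.7114100525

ss_total = ss_regression + ss_error
mse = ss_error / (n - p - 1)      # error df = 989
f_value = (ss_regression / p) / mse
r_squared = ss_regression / ss_total
r_squared_adj = 1.0 - (1.0 - r_squared) * (n - 1) / (n - p - 1)

print(f"MSE={mse:.10f}")
print(f"F={f_value:.4f}")
print(f"R2={r_squared:.6f}, R2(adj)={r_squared_adj:.6f}")
```

The adjusted value, 0.935675, is only slightly above the 0.934542 reported for the one-term |X1|^0.5 model, so the nine extra terms add little.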
------------------- individual test -------------------------
parameter coefficient standard error t test p value
----------------------------------------------------------------------------------
b0 3.4635766401 0.0707195225 48.9762447154 0.0000000000
b1 3.6723957718 0.1983373678 18.5159045545 0.0000000000
b2 -1.2783533249 0.4801468941 -2.6624213144 0.0078000000
b3 0.9012794902 0.5152977320 1.7490461032 0.0802000000
b4 0.4900300583 0.8797847787 0.5569885615 0.5774000000
b5 -0.2980230541 0.4557298397 -0.6539467643 0.5132000000
b6 -0.5948722368 0.4481985075 -1.3272517128 0.1846000000
b7 0.5845865855 0.3624036389 1.6130814453 0.1066000000
b8 -0.2087588469 0.1211280822 -1.7234553967 0.0850000000
b9 0.0338246592 0.0188148083 1.7977679461 0.0722000000
b10 -0.0020870033 0.0011241658 -1.8564906575 0.0634000000
----------------------------------------------------------------------------------
class   lower limit  upper limit  observed no  probability  expected no  chi square
----------------------------------------------------------------------------------
[ 1 ]      -∞         -1.24125     97.00000     0.10000     100.00000     0.09000
[ 2 ]   -1.24125      -0.81512     96.00000     0.10000     100.00000     0.16000
[ 3 ]   -0.81512      -0.50787     94.00000     0.10000     100.00000     0.36000
[ 4 ]   -0.50787      -0.24534     96.00000     0.10000     100.00000     0.16000
[ 5 ]   -0.24534       0.00002    131.00000     0.10000     100.00000     9.61000
[ 6 ]    0.00002       0.24535     87.00000     0.10000     100.00000     1.69000
[ 7 ]    0.24535       0.50787    106.00000     0.10000     100.00000     0.36000
[ 8 ]    0.50787       0.81471    104.00000     0.10000     100.00000     0.16000
[ 9 ]    0.81471       1.24115     88.00000     0.10000     100.00000     1.44000
[ 10 ]   1.24115         +∞       101.00000     0.10000     100.00000     0.01000
----------------------------------------------------------------------------------
degree of freedom=8
Z=-1.178388, p-value=0.119400
Z=-1.178388, p-value=0.880600
Z=-1.178388, p-value=0.238800
t=2,3,...,1000
D.W. test=1.860814
D.W. test=2.139186
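The Durbin-Watson statistic over t=2,3,...,n can be sketched as below. This is a minimal illustration, not the authors' program: the function name `durbin_watson` and the residual list are hypothetical. Note that the two reported values sum to 4 (1.860814 + 2.139186), which is consistent with the second value being 4 minus the first.

```python
def durbin_watson(residuals):
    """Sum over t=2..n of (e_t - e_{t-1})^2, divided by the sum of e_t^2."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals, for illustration only (not the book's data).
example_residuals = [0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.2]
dw = durbin_watson(example_residuals)
print(dw, 4.0 - dw)
```

Values near 2 suggest no first-order autocorrelation in the residuals.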


class  interval  midpoint  frequency  relative frequency  cumulative relative frequency
[ 1 ] -3.35366~ -2.63329 -2.99348 5.00000 0.0050000 0.0050000
[ 2 ] -2.63329~ -1.91292 -2.27310 18.00000 0.0180000 0.0230000
[ 3 ] -1.91292~ -1.19255 -1.55273 88.00000 0.0880000 0.1110000
[ 4 ] -1.19255~ -0.47218 -0.83236 187.00000 0.1870000 0.2980000
[ 5 ] -0.47218~ 0.24820 -0.11199 307.00000 0.3070000 0.6050000
[ 6 ] 0.24820~ 0.96857 0.60838 245.00000 0.2450000 0.8500000
[ 7 ] 0.96857~ 1.68894 1.32875 108.00000 0.1080000 0.9580000
[ 8 ] 1.68894~ 2.40931 2.04912 36.00000 0.0360000 0.9940000
[ 9 ] 2.40931~ 3.12968 2.76950 6.00000 0.0060000 1.0000000
(28.2) n = 100,000,000, it is big data.
(28.2.1) Basic analysis,
(28.2.1.1) X1 and X2 joint probability distribution
f(x1,x2) f(x2,x1)

sample cov(X1,X2)= 3.5981,
E(X2|x1) and x1 E(X1|x2) and x2 are linear relation

Variance : 1.00032
S.D. : 1.00016
MAD : 0.73586
Range : 18.49177
Mid_range : 9.34588
Median : 0.79305
Q1 : 0.38758
Q2 : 0.79305
Q3 : 1.48635
IQR : 1.09877
C.V. : 0.90925
Curve-fitting estimated the distribution function of X1,
F(X)=1-exp( -1*((X-0.1000000037)/0.9998860023)^0.9999155137 )
SSE=0.000941193706477202 MAX error=0.000090575757443090
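As a rough check, the fitted distribution function can be evaluated next to a shifted exponential distribution function. The sketch below assumes the fitted form is 1 - exp(-((X - location)/scale)^shape) and that X1 follows a shifted exponential with rate 1 and shift 0.1; the function names are illustrative only.

```python
import math

# Fitted parameters copied from the curve-fitting output above.
LOCATION = 0.1000000037
SCALE = 0.9998860023
SHAPE = 0.9999155137


def fitted_cdf(x):
    """Assumed fitted form: 1 - exp(-((x - LOCATION) / SCALE) ** SHAPE)."""
    if x <= LOCATION:
        return 0.0
    return 1.0 - math.exp(-(((x - LOCATION) / SCALE) ** SHAPE))


def shifted_exponential_cdf(x, rate=1.0, shift=0.1):
    """Shifted exponential CDF; rate and shift are assumed values for X1."""
    if x <= shift:
        return 0.0
    return 1.0 - math.exp(-rate * (x - shift))


# Q1, median and Q3 of X1 as reported in the coefficient list above.
for x in (0.38758, 0.79305, 1.48635):
    print(x, fitted_cdf(x), shifted_exponential_cdf(x))
```

At the reported quartiles both functions should return values close to 0.25, 0.50 and 0.75.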
Left diagram, the comparison the

Variance : 14.71496
S.D. : 3.83601
MAD : 3.02132
Range : 52.72124
Median : 2.15319
Q1 : -0.17933
Q2 : 2.15319
Q3 : 4.87221
IQR : 5.05153
C.V. : 1.46197
Curve-fitting estimated the distribution function of X2,

F(X)= 0.04235313473516118300+
0.04387530716953855200*(X--2.88092966054477760000)^1+
0.01439407138442670700*(X--2.88092966054477760000)^2+
0.00141422281221382540*(X--2.88092966054477760000)^3+
value range 0.0000000000<=F(x)<= 0.1000000000 ,
value range -8.3331827207<=X<= -1.8970802060 ,
determination=0.999844720399884150,

F(X)= 0.14888656638727046000+0.08402433214752497200*(X--1.26828800336955980000)^1+
0.00928686947972101610*(X--1.26828800336955980000)^2+
-0.00111670593855350830*(X--1.26828800336955980000)^3+
value range 0.1000000100<=F(x)<= 0.2000000000 ,
value range -1.8970800511<=X<= -0.6939370987 ,
determination=0.999999970225285530,

F(X)= 0.24951169666166911000+
0.10041175148154680000*(X- -0.18411548574877493000)^1+
0.00586599118738168060*(X- -0.18411548574877493000)^2+
-0.00120254376630501980*(X- -0.18411548574877493000)^3+
value range 0.2000000100<=F(x)<= 0.3000000000 ,
value range -0.6939370269<=X<= 0.3061470283 ,
determination=0.999999970317733580,

F(X)= 0.34984408914049631000+
0.10825483340496544000*(X-0.77224849623803193000)^1+
0.00218159313606588330*(X-0.77224849623803193000)^2+
-0.00124322430239853790*(X-0.77224849623803193000)^3+
value range 0.3000000100<=F(x)<= 0.4000000000 ,
value range 0.3061470638<=X<= 1.2324693163 ,
determination=0.999999967007586870,

F(X)= 0.45009810161313046000+
0.10886601163202503000*(X-1.69093508349217390000)^1+
-0.00138821569535929610*(X-1.69093508349217390000)^2+
-0.00126526168316143380*(X- 1.69093508349217390000)^3+
value range 0.4000000100<=F(x)<= 0.5000000000 ,
value range 1.2324694160<=X<= 2.1531903379 ,
determination=0.999999940232483840,

F(X)= 0.55037298751045505000+
0.10298727740422642000*(X-2.63272574179456950000)^1+
-0.00472636456162940640*(X- 2.63272574179456950000)^2+
-0.00080965847207803421*(X-2.63272574179456950000)^3+
value range 0.5000000100<=F(x)<= 0.6000000000 ,
value range 2.1531903416<=X<= 3.1266646471 ,
determination=0.999999942289562900,

F(X)= 0.65068936815090839000+
0.09093172686165962300*(X-3.66345057334614620000)^1+
-0.00679836979442069440*(X-3.66345057334614620000)^2+
-0.00057022318392885296*(X-3.66345057334614620000)^3+
value range 0.6000000100<=F(x)<= 0.7000000000 ,
value range 3.1266647103<=X<= 4.2309263507 ,
determination=0.999999967558683030,

F(X)= 0.75120033958852883000+
0.07288645444222285900*(X-4.88872910259883930000)^1+
-0.00758938028927610970*(X-4.88872910259883930000)^2+
-0.00009753382468469241*(X-4.88872910259883930000)^3+
value range 0.7000000100<=F(x)<= 0.8000000000 ,
value range 4.2309264866<=X<= 5.6133073334 ,
determination=0.999999951070761560,

F(X)= 0.85234766657886174000+
0.04854982933888263300*(X-6.56326558693344890000)^1+
-0.00656970311243487700*(X-6.56326558693344890000)^2+
0.00035351670518002365*(X-6.56326558693344890000)^3+
value range 0.8000000100<=F(x)<= 0.9000000000 ,
value range 5.6133076365<=X<= 7.7123372986 ,
determination=0.999999970885772080,

F(X)= 0.96218108699700866000+
0.01452825513989730600*(X-10.36065017322863500000)^1+
-0.00257673886880663630*(X-10.36065017322863500000)^2+
0.00025929158122736662*(X-10.36065017322863500000)^3+
-0.00001475423636387863*(X-10.36065017322863500000)^4+
0.00000043082389540843*(X-10.36065017322863500000)^5+
-0.00000000493119720805*(X-10.36065017322863500000)^6+
value range 0.9000000100<=F(x)<= 0.9999999900 ,
value range 7.7123374775<=X<= 44.3880567168 ,
determination=0.999994784722464280
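Each piece of the fitted distribution function is a polynomial in (X - center) that is valid only on its own value range. A minimal sketch of evaluating one piece, using the coefficients of the piece that covers 0.4 <= F(x) <= 0.5; the function name `segment_cdf` is illustrative, not part of the original software.

```python
# Piece covering 0.4 <= F(x) <= 0.5, coefficients copied from the output above.
CENTER = 1.69093508349217390
COEFFS = [
    0.45009810161313046,      # constant term
    0.10886601163202503,      # coefficient of (X - CENTER)^1
    -0.00138821569535929610,  # coefficient of (X - CENTER)^2
    -0.00126526168316143380,  # coefficient of (X - CENTER)^3
]
X_MIN, X_MAX = 1.2324694160, 2.1531903379


def segment_cdf(x):
    """Evaluate this piece with Horner's rule; reject x outside its value range."""
    if not (X_MIN <= x <= X_MAX):
        raise ValueError("x is outside the value range of this piece")
    dx = x - CENTER
    result = 0.0
    for coefficient in reversed(COEFFS):
        result = result * dx + coefficient
    return result


print(segment_cdf(X_MIN), segment_cdf(X_MAX))
```

The two endpoints should give values close to 0.4 and 0.5, the limits of this piece.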
Left diagram, the comparison the
(28.2.2)
The relation is X2= -5.6489811404+ 8.6404656140*|X1|^0.5
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
|X1|^0.5 1 1368250291.0205469000 1368250291.0205469000 1325230928.3527293000
error 99999998 103246176.5253460400 1.0324617859
total 99999999 1471496467.5458930000
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
H0: slope(X1)=0
----------------------------------------------------------------------------------
intercept -5.6489811404 0.0002489346 -22692.62692 0.00000
slope 8.6404656140 0.0002373512 36403.72135 0.00000
----------------------------------------------------------------------------------
MSE=1.0324617859 , R2=0.929836 , R2(adj)=0.929836
|X1|^0.5(mean)= 0.9574539774, |X1|^0.5(variance)= 0.1832699499, |X1|^0.5(s.d.)= 0.4281003970
SS(|X1|^0.5)= 18326994.8069597260 , SS(X2*|X1|^0.5)=158353768.4368600500,
C.V.= 0.3872533389
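The slope, R2, MSE and F in this output can be recomputed from the reported sums of squares. The sketch below uses the usual simple-regression identities (slope = SS(xy)/SS(x), R2 = SSR/SST, F = MSR/MSE); the variable names are illustrative.

```python
# Sums of squares copied from the output above.
SS_X = 18326994.8069597260      # SS(|X1|^0.5)
SS_XY = 158353768.4368600500    # SS(X2*|X1|^0.5)
SS_REG = 1368250291.0205469000  # regression sum of squares
SS_ERR = 103246176.5253460400   # error sum of squares
SS_TOT = 1471496467.5458930000  # total sum of squares
DF_REG = 1
DF_ERR = 99999998

slope = SS_XY / SS_X
r_squared = SS_REG / SS_TOT
mse = SS_ERR / DF_ERR
f_statistic = (SS_REG / DF_REG) / mse

print(slope, r_squared, mse, f_statistic)
```

With n = 100,000,000 the F value is very large even though R2 is fixed near 0.93, so the size of F mainly reflects the sample size.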

----------------------------------------------------------------------------------
class   lower limit  upper limit   observed no    probability   expected no    chi square
----------------------------------------------------------------------------------
[ 1 ]      -∞         -1.67139    4952589.00000    0.05000    5000000.00000   449.56058
[ 2 ]   -1.67139      -1.30223    4994905.00000    0.05000    5000000.00000     5.19180
[ 3 ]   -1.30223      -1.05311    5008902.00000    0.05000    5000000.00000    15.84912
[ 4 ]   -1.05311      -0.85516    5015253.00000    0.05000    5000000.00000    46.53080
[ 5 ]   -0.85516      -0.68533    5014639.00000    0.05000    5000000.00000    42.86006
[ 6 ]   -0.68533      -0.53282    5013823.00000    0.05000    5000000.00000    38.21507
[ 7 ]   -0.53282      -0.39149    5020860.00000    0.05000    5000000.00000    87.02792
[ 8 ]   -0.39149      -0.25766    5012331.00000    0.05000    5000000.00000    30.41071
[ 9 ]   -0.25766      -0.12764    5025278.00000    0.05000    5000000.00000   127.79546
[ 10 ]  -0.12764      -0.00023    5003833.00000    0.05000    5000000.00000     2.93838
[ 11 ]  -0.00023       0.12745    5015023.00000    0.05000    5000000.00000    45.13811
[ 12 ]   0.12745       0.25740    5017030.00000    0.05000    5000000.00000    58.00418
[ 13 ]   0.25740       0.39149    5005753.00000    0.05000    5000000.00000     6.61940
[ 14 ]   0.39149       0.53282    5001931.00000    0.05000    5000000.00000     0.74575
[ 15 ]   0.53282       0.68528    4995232.00000    0.05000    5000000.00000     4.54676
[ 16 ]   0.68528       0.85510    4993472.00000    0.05000    5000000.00000     8.52296
[ 17 ]   0.85510       1.05307    4981832.00000    0.05000    5000000.00000    66.01524
[ 18 ]   1.05307       1.30213    4968200.00000    0.05000    5000000.00000   202.24800
[ 19 ]   1.30213       1.67134    4960777.00000    0.05000    5000000.00000   307.68875
[ 20 ]   1.67134         +∞       4998337.00000    0.05000    5000000.00000     0.55311
----------------------------------------------------------------------------------
Z=1.556457, p-value=0.940300
Z=1.556457, p-value=0.059700
Z=1.556457, p-value=0.119400
t=2,3,...,100000000
D.W. test=2.000006
D.W. test=1.999994
Left diagram, the joint probability distribution of X1 and the residual; right diagram, the joint probability distribution of the X2 estimated line and X2.
Variance : 1.03246
S.D. : 1.01610
MAD : 0.80963
Range : 18.46113
Mid_range : 3.55036
Median : -0.00182
Q1 : -0.68489
Q2 : -0.00182
Q3 : 0.68216
IQR : 1.36705
C.V. : none

(28.2.4)Checking the linear relationship of residual and X1.

Non-linear model analysis, |residual| is the dependent variable and X1 is the
independent variable.
|error|= 0.7960300619+ 0.0020507187*X1^3
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1^3 1 317599.9286092636 317599.9286092636 849702.0398615686
error 99999998 37377799.1999416130 0.3737779995
total 99999999 37695399.1285508800
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept 0.7960300619 0.0000628935 12656.79203 0.00000
slope 0.0020507187 0.0000022247 921.79284 0.00000
----------------------------------------------------------------------------------
MSE=0.3737779995 , R2=0.008425 , R2(adj)=0.008425
SS(X1^3)=75521081149.6118320000 , SS(|error|*X1^3)=154872495.8848765800,
C.V.= 0.7551234275

----------------------------------------------------------------------------------
class   lower limit  upper limit   observed no     probability   expected no     chi square
----------------------------------------------------------------------------------
[ 1 ]      -∞         -1.00565       62484.00000    0.05000    5000000.00000   4875812.85005
[ 2 ]   -1.00565      -0.78353     1835676.00000    0.05000    5000000.00000   2002589.27540
[ 3 ]   -0.78353      -0.63364    11733184.00000    0.05000    5000000.00000   9067153.35557
[ 4 ]   -0.63364      -0.51454     9132501.00000    0.05000    5000000.00000   3415512.90300
[ 5 ]   -0.51454      -0.41235     7596837.00000    0.05000    5000000.00000   1348712.48091
[ 6 ]   -0.41235      -0.32059     6574549.00000    0.05000    5000000.00000    495840.91068
[ 7 ]   -0.32059      -0.23556     5840865.00000    0.05000    5000000.00000    141410.78965
[ 8 ]   -0.23556      -0.15503     5282767.00000    0.05000    5000000.00000     15991.43526
[ 9 ]   -0.15503      -0.07680     4881744.00000    0.05000    5000000.00000      2796.89631
[ 10 ]  -0.07680      -0.00014     4527529.00000    0.05000    5000000.00000     44645.76917
[ 11 ]  -0.00014       0.07669     4272074.00000    0.05000    5000000.00000    105975.25230
[ 12 ]   0.07669       0.15488     4066003.00000    0.05000    5000000.00000    174470.07920
[ 13 ]   0.15488       0.23555     3895172.00000    0.05000    5000000.00000    244128.98192
[ 14 ]   0.23555       0.32059     3773531.00000    0.05000    5000000.00000    300845.24159
[ 15 ]   0.32059       0.41233     3691529.00000    0.05000    5000000.00000    342419.27157
[ 16 ]   0.41233       0.51450     3660998.00000    0.05000    5000000.00000    358585.27120
[ 17 ]   0.51450       0.63362     3703861.00000    0.05000    5000000.00000    335995.26146
[ 18 ]   0.63362       0.78347     3852534.00000    0.05000    5000000.00000    263335.64423
[ 19 ]   0.78347       1.00562     4278635.00000    0.05000    5000000.00000    104073.49265
[ 20 ]   1.00562         +∞        7337527.00000    0.05000    5000000.00000   1092806.49515
----------------------------------------------------------------------------------
H0: residual is random, H1: increasing line or decreasing line
Z=-2.199491, p-value=0.014000
Z=-2.199491, p-value=0.028000
Left diagram, the joint probability distribution of X1 and |residual|; right diagram, the joint probability distribution of the |residual| estimated line and |residual|.
$X_2 = -5.6489811404 + 8.6404656140\sqrt{X_1}$ is close to $X_2 = 1 + 2(X_1 + \log(X_1))$
when $X_1 \sim \text{Shifted\_exponential}(\lambda_{X_1} = 1, c_{X_1} = 0.1)$.
Note:
(i) $X_1 \sim \text{Shifted\_exponential}(\lambda_{X_1} = 1, c_{X_1} = 0.1)$,
let $W_1 = 1 + 2(X_1 + \log(X_1))$, $W_2 = -5.6489811404 + 8.6404656140\sqrt{X_1}$ and
the probability distribution,
f(w1) Coefficient
Variance : 13.71195
S.D. : 3.70297
MAD : 2.91641
Range : 46.88099
Median : 2.12241
Q1 : -0.11984
Q2 : 2.12241
Q3 : 4.76471
IQR : 4.88455
C.V. : 1.41127
f(w2) Coefficient
Variance : 13.68024
S.D. : 3.69868
MAD : 2.97408
Range : 34.74667
Median : 2.04549
Q1 : -0.26919
Q2 : 2.04549
Q3 : 4.88529
IQR : 5.15448
C.V. : 1.40947
f(w1,w2) f(w,w1)
E(W1)= 2.6236, Var(W1)= 13.7064, E(W2)= 2.6236, Var(W2)= 13.6746,

The comparison of distribution functions of W1 and W2, the SLLN method.
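The comparison of W1 and W2 in case (i) can be imitated with a small simulation. This is only a sketch under stated assumptions: X1 is drawn as 0.1 plus an exponential variate with rate 1, the sample size is far smaller than the book's, and the function name `simulate_w_means` is illustrative.

```python
import math
import random


def simulate_w_means(n=200_000, rate=1.0, shift=0.1, seed=0):
    """Return the sample means of W1 and W2 for shifted-exponential X1."""
    rng = random.Random(seed)
    sum_w1 = 0.0
    sum_w2 = 0.0
    for _ in range(n):
        x1 = shift + rng.expovariate(rate)
        sum_w1 += 1.0 + 2.0 * (x1 + math.log(x1))
        sum_w2 += -5.6489811404 + 8.6404656140 * math.sqrt(x1)
    return sum_w1 / n, sum_w2 / n


mean_w1, mean_w2 = simulate_w_means()
print(mean_w1, mean_w2)
```

Both sample means should land near the reported E(W1) and E(W2), up to simulation error.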
(ii) $X_1 \sim \text{Beta}(\alpha_{X_1} = 5, \beta_{X_1} = 5)$,
let $W_1 = 1 + 2(X_1 + \log(X_1))$, $W_2 = -5.6489811404 + 8.6404656140\sqrt{X_1}$ and the
probability distribution,
f(w1) Coefficient
Variance : 0.95557
S.D. : 0.97753
MAD : 0.77685
Range : 12.37201
Median : 0.61394
Q1 : -0.08908
Q2 : 0.61394
Q3 : 1.22121
IQR : 1.31029
C.V. : 1.92097
f(w2) Coefficient
Variance : 0.91912
S.D. : 0.95871
MAD : 0.77238
Range : 7.73477
Median : 0.46061
Q1 : -0.23948
Q2 : 0.46061
Q3 : 1.08852
IQR : 1.32800
C.V. : 2.48974
f(w1,w2) f(w,w1)
E(W1)= 0.5087, Var(W1)= 0.9556, E(W2)= 0.3850, Var(W2)= 0.9190,

(iii) $X_1 \sim \text{U\_quadratic}(a_{X_1} = 0.1, b_{X_1} = 10.1)$,
let $W_1 = 1 + 2(X_1 + \log(X_1))$, $W_2 = -5.6489811404 + 8.6404656140\sqrt{X_1}$ and the
probability distribution,
f(w1) Coefficient
S.D. : 10.13152
MAD : 9.68774
Range : 29.23024
Median : 13.96007
Q1 : 3.50935
Q2 : 13.96007
Q3 : 23.54578
IQR : 20.03644
C.V. : 0.75841
f(w2) Coefficient
Variance : 74.17143
S.D. : 8.61228
MAD : 8.15813
Range : 24.72747
Mid_range : 9.44711
Median : 13.43310
Q1 : 3.54255
Q2 : 13.43310
Q3 : 20.37056
IQR : 16.82800
C.V. : 0.72617
f(w1,w2) f(w,w1)
E(W1)= 13.3593, Var(W1)= 102.6501, E(W2)= 11.8598, Var(W2)= 74.1726,

$X_2 = -5.6489811404 + 8.6404656140\sqrt{X_1}$ is not always close to $X_2 = 1 + 2(X_1 + \log(X_1))$;
the probability distribution of X1 is an important factor.
5.7. The random variable range has a specific region and the three
basic assumptions are unchanged.
Example 29, $X_1 \sim \text{Normal}(\mu_{X_1} = 2, \sigma^2_{X_1} = 5^2)$,
$E(X_2 \mid x_1) = \beta_0 + \beta_1 x_1 = 1 + 2x_1$, $\varepsilon \sim \text{Normal}(0, \sigma^2 = 1)$,
$-20 \le X_1 X_2 \le 20$, $X_{2i} = \beta_0 + \beta_1 X_{1i} + \varepsilon_i$, $i = 1, 2, \ldots, n$,
three basic assumptions
i) $\varepsilon_i \sim$ Normal distribution, ii) $E(\varepsilon_i) = 0$, $Var(\varepsilon_i) = \sigma^2$,
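The setup of Example 29 can be imitated with a small simulation. This sketch makes several assumptions: the region is read as the product X1*X2 lying in [-20, 20], X1 has standard deviation 5, pairs outside the region are simply discarded, and `n_draws` counts draws before discarding. The function name is illustrative.

```python
import random


def simulate_restricted_fit(n_draws=200_000, seed=0):
    """Draw (X1, X2), keep pairs with -20 <= X1*X2 <= 20, fit a least-squares line."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n_draws):
        x1 = rng.gauss(2.0, 5.0)                    # X1 ~ Normal(2, 5^2)
        x2 = 1.0 + 2.0 * x1 + rng.gauss(0.0, 1.0)   # X2 = 1 + 2*X1 + error
        if -20.0 <= x1 * x2 <= 20.0:                # assumed reading of the region
            xs.append(x1)
            ys.append(x2)
    kept = len(xs)
    mean_x = sum(xs) / kept
    mean_y = sum(ys) / kept
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = ss_xy / ss_x
    intercept = mean_y - slope * mean_x
    return kept, intercept, slope


kept, intercept, slope = simulate_restricted_fit()
print(kept, intercept, slope)
```

Because the region depends on X2, and therefore on the error term, the fitted slope is expected to fall somewhat below the population value 2.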

class  interval  midpoint  frequency  relative frequency  cumulative relative frequency
[ 1 ] -4.02385~ -3.21469 -3.61927 32.00000 0.0320000 0.0320000
[ 2 ] -3.21469~ -2.40553 -2.81011 100.00000 0.1000000 0.1320000
[ 3 ] -2.40553~ -1.59637 -2.00095 100.00000 0.1000000 0.2320000
[ 4 ] -1.59637~ -0.78721 -1.19179 112.00000 0.1120000 0.3440000
[ 5 ] -0.78721~ 0.02195 -0.38263 134.00000 0.1340000 0.4780000
[ 6 ] 0.02195~ 0.83111 0.42653 145.00000 0.1450000 0.6230000
[ 7 ] 0.83111~ 1.64027 1.23569 151.00000 0.1510000 0.7740000
[ 8 ] 1.64027~ 2.44943 2.04485 127.00000 0.1270000 0.9010000
[ 9 ] 2.44943~ 3.25859 2.85401 99.00000 0.0990000 1.0000000

class  interval  midpoint  frequency  relative frequency  cumulative relative frequency
[ 1 ] -6.66141~ -4.96588 -5.81365 59.00000 0.0590000 0.0590000
[ 2 ] -4.96588~ -3.27034 -4.11811 102.00000 0.1020000 0.1610000
[ 3 ] -3.27034~ -1.57481 -2.42258 127.00000 0.1270000 0.2880000
[ 4 ] -1.57481~ 0.12072 -0.72704 116.00000 0.1160000 0.4040000
[ 5 ] 0.12072~ 1.81626 0.96849 130.00000 0.1300000 0.5340000
[ 6 ] 1.81626~ 3.51179 2.66403 170.00000 0.1700000 0.7040000
[ 7 ] 3.51179~ 5.20733 4.35956 146.00000 0.1460000 0.8500000
[ 8 ] 5.20733~ 6.90286 6.05509 133.00000 0.1330000 0.9830000
[ 9 ] 6.90286~ 8.59840 7.75063 17.00000 0.0170000 1.0000000
(29.1.2) The linear model analysis
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1 1 12444.1963216948 12444.1963216948 13576.3568375690
error 998 914.7747129542 0.9166079288
total 999 13358.9710346490
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
H0: slope(X1)=0
----------------------------------------------------------------------------------
intercept 0.9414462120 0.0302768708 31.09457 0.00000
slope 1.9396800517 0.0166470957 116.51762 0.00000
----------------------------------------------------------------------------------
MSE=0.9166079288 , R2=0.931524 , R2(adj)=0.931455
SSX1=3307.5518056979 , SS(X2*X1)= 6415.5922574834, C.V.= 0.9823454269

----------------------------------------------------------------------------------
class   lower limit  upper limit  observed no  probability  expected no  chi square
----------------------------------------------------------------------------------
[ 1 ]      -∞         -1.22700    103.00000     0.10000     100.00000     0.09000
[ 2 ]   -1.22700      -0.80576     95.00000     0.10000     100.00000     0.25000
[ 3 ]   -0.80576      -0.50204    115.00000     0.10000     100.00000     2.25000
[ 4 ]   -0.50204      -0.24252     96.00000     0.10000     100.00000     0.16000
[ 5 ]   -0.24252       0.00002     88.00000     0.10000     100.00000     1.44000
[ 6 ]    0.00002       0.24253     86.00000     0.10000     100.00000     1.96000
[ 7 ]    0.24253       0.50204    108.00000     0.10000     100.00000     0.64000
[ 8 ]    0.50204       0.80536    101.00000     0.10000     100.00000     0.01000
[ 9 ]    0.80536       1.22690    109.00000     0.10000     100.00000     0.81000
[ 10 ]   1.22690         +∞        99.00000     0.10000     100.00000     0.01000
----------------------------------------------------------------------------------
degree of freedom=8
p-value=0.471400
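The chi-square statistic and the p-value above can be recomputed from the observed and expected counts. The sketch below sums (observed - expected)^2 / expected over the ten classes and uses the closed-form upper-tail probability that holds for an even number of degrees of freedom; the helper name `chi2_upper_tail_even_df` is illustrative.

```python
import math

# Observed counts for the ten equal-probability classes above.
observed = [103, 95, 115, 96, 88, 86, 108, 101, 109, 99]
expected = [100.0] * len(observed)

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_upper_tail_even_df(x, df):
    """P(chi-square with df degrees of freedom > x); closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k      # term is now half**k / k!
        total += term
    return math.exp(-half) * total


p_value = chi2_upper_tail_even_df(chi_square, 8)
print(chi_square, p_value)
```

For an odd number of degrees of freedom a general routine for the incomplete gamma function would be needed instead.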
Z=0.127698, p-value=0.550800
Z=0.127698, p-value=0.449200
Z=0.127698, p-value=0.898400
t=2,3,...,1000
D.W. test=2.060949
D.W. test=1.939051

class  interval  midpoint  frequency  relative frequency  cumulative relative frequency
[ 1 ] -2.80648~ -2.13679 -2.47164 8.00000 0.0080000 0.0080000
[ 2 ] -2.13679~ -1.46710 -1.80195 60.00000 0.0600000 0.0680000
[ 3 ] -1.46710~ -0.79742 -1.13226 133.00000 0.1330000 0.2010000
[ 4 ] -0.79742~ -0.12773 -0.46257 244.00000 0.2440000 0.4450000
[ 5 ] -0.12773~ 0.54196 0.20712 263.00000 0.2630000 0.7080000
[ 6 ] 0.54196~ 1.21165 0.87681 192.00000 0.1920000 0.9000000
[ 7 ] 1.21165~ 1.88134 1.54650 77.00000 0.0770000 0.9770000
[ 8 ] 1.88134~ 2.55103 2.21619 20.00000 0.0200000 0.9970000
[ 9 ] 2.55103~ 3.22072 2.88588 3.00000 0.0030000 1.0000000
$-20 \le X_1 X_2 \le 20$ cannot be displayed.
(29.2) n = 100,000,000, it is big data.
f(x1,x2) f(x2,x1)

$-20 \le X_1 X_2 \le 20$ is shown by the red region.

Variance : 3.19427
S.D. : 1.78725
MAD : 1.52915
Range : 9.04820
Median : 0.15951
Q1 : -1.40831
Q2 : 0.15951
Q3 : 1.56225
IQR : 2.97057
C.V. : 45.19706
Curve-fitting estimated the distribution function of X1.

F(X)=0.02150256097689234500+
0.07571045405081912300*(X--3.25954375288234570000)^1+
0.07272183016895894500*(X--3.25954375288234570000)^2+
-0.00966084117124410560*(X--3.25954375288234570000)^3+
-0.02777743313523883800*(X--3.25954375288234570000)^4+
value range 0.0000000000<=F(x)<= 0.0500000000 ,
value range -4.8603737625<=X<= -2.9598606193 ,
determination=0.999882158334783560,

F(X)= 0.07478304467283711200+
0.11674697242520826000*(X--2.74081161824866190000)^1+
0.01407554235612465400*(X--2.74081161824866190000)^2+
-0.00943186019386033080*(X--2.74081161824866190000)^3+
value range 0.0500000100<=F(x)<= 0.1000000000 ,
value range -2.9598602915<=X<= -2.5294714498 ,
determination=0.999999847310703130,

F(X)= 0.12486292933792702000+
0.12598965111305727000*(X--2.32894794725935570000)^1+
0.01045955493380026900*(X--2.32894794725935570000)^2+
0.00294612519102699370*(X--2.32894794725935570000)^3+
value range 0.1000000100<=F(x)<= 0.1500000000 ,
value range -2.5294713561<=X<= -2.1327857494 ,
determination=0.999999968028108750,

F(X)= 0.17487453218567106000+
0.13442303666364219000*(X--1.94477995281030600000)^1+
0.01086701623858131500*(X--1.94477995281030600000)^2+
-0.00221760690530814490*(X--1.94477995281030600000)^3+
value range 0.1500000100<=F(x)<= 0.2000000000 ,
value range -2.1327857286<=X<= -1.7605588954 ,
determination=0.999999936485638900,

F(X)= 0.22488979533061662000+
0.14201488988872968000*(X--1.58289612487925170000)^1+
0.01066513856696529900*(X--1.58289612487925170000)^2+
-0.00032128490556715406*(X--1.58289612487925170000)^3+
value range 0.2000000100<=F(x)<= 0.2500000000 ,
value range -1.7605588542<=X<= -1.4083132148 ,
determination=0.999999920971001540,

F(X)= 0.27490929753485005000+
0.14882172845348174000*(X--1.23905875416191490000)^1+
0.00963761273382965360*(X--1.23905875416191490000)^2+
-0.00108519001485518630*(X--1.23905875416191490000)^3+
value range 0.2500000100<=F(x)<= 0.3000000000 ,
value range -1.4083131996<=X<= -1.0721952973 ,
determination=0.999999961832651610,

F(X)= 0.32492203069611247000+
0.15484848236423887000*(X--0.90966334524730819000)^1+
0.00897070770342134340*(X--0.90966334524730819000)^2+
-0.00088349228864359475*(X--0.90966334524730819000)^3+
value range 0.3000000100<=F(x)<= 0.3500000000 ,
value range -1.0721952808<=X<= -0.7492054344 ,
determination=0.999999965671563910,

F(X)= 0.37493027527442263000+
0.16049303991618935000*(X--0.59254351726793575000)^1+
0.00862211629439160740*(X--0.59254351726793575000)^2+
0.00063053917637034829*(X--0.59254351726793575000)^3+
value range 0.3500000100<=F(x)<= 0.4000000000 ,
value range -0.7492052251<=X<= -0.4376245025 ,
determination=0.999999948707625430,

F(X)= 0.42494265459421832000+
0.16562761429271977000*(X--0.28580765700339150000)^1+
0.00752905143603747880*(X--0.28580765700339150000)^2+
-0.01063516795630192700*(X--0.28580765700339150000)^3+
value range 0.4000000100<=F(x)<= 0.4500000000 ,
value range -0.4376243779<=X<= -0.1353684426 ,
determination=0.999999957049249380,

F(X)= 0.47494912421196073000+
0.16957945004808606000*(X-0.01262669944079152100)^1+
0.00702224845658470930*(X-0.01262669944079152100)^2+
-0.00039289997154412504*(X-0.01262669944079152100)^3+
value range 0.4500000100<=F(x)<= 0.5000000000 ,
value range -0.1353684171<=X<= 0.1595057580 ,
determination=0.999999963330626910,

F(X)= 0.52495630321746722000+
0.17312328337482563000*(X-0.30437094089057731000)^1+
0.00629115554352557840*(X-0.30437094089057731000)^2+
0.00282157844485197980*(X-0.30437094089057731000)^3+
value range 0.5000000100<=F(x)<= 0.5500000000 ,
value range 0.1595058168<=X<= 0.4482560786 ,
determination=0.999999951008708640,

F(X)= 0.57495777986914842000+
0.17642914820098832000*(X-0.59044332521383924000)^1+
0.00630790751640145090*(X-0.59044332521383924000)^2+
-0.00053354438026076423*(X-0.59044332521383924000)^3+
value range 0.5500000100<=F(x)<= 0.6000000000 ,
value range 0.4482561353<=X<= 0.7317210956 ,
determination=0.999999948414172390,

F(X)= 0.62497350098674365000+
0.17885136299567361000*(X-0.87179656701912989000)^1+
0.00406936298207664220*(X-0.87179656701912989000)^2+
-0.00005390853473841162*(X-0.87179656701912989000)^3+
value range 0.6000000100<=F(x)<= 0.6500000000 ,
value range 0.7317211982<=X<= 1.0112597272 ,
determination=0.999999959820099700,

F(X)= 0.67497825224553087000+
0.18073467749479166000*(X-1.14978443920360100000)^1+
0.00341307503136645960*(X-1.14978443920360100000)^2+
0.00338829896797676610*(X-1.14978443920360100000)^3+
value range 0.6500000100<=F(x)<= 0.7000000000 ,
value range 1.0112598445<=X<= 1.2878318082 ,
determination=0.999999963749377390,

F(X)= 0.72498610487027615000+
0.18214660905347091000*(X-1.42517950828268390000)^1+
0.00221491636147068750*(X-1.42517950828268390000)^2+
0.00285912975276403360*(X-1.42517950828268390000)^3+
value range 0.7000000100<=F(x)<= 0.7500000000 ,
value range 1.2878318485<=X<= 1.5622542580 ,
determination=0.999999970381359240,

F(X)= 0.77498885869378364000+
0.18319914858895772000*(X-1.69882913153400270000)^1+
0.00179563518923941960*(X-1.69882913153400270000)^2+
value range 0.7500000100<=F(x)<= 0.8000000000 ,
value range 1.5622543417<=X<= 1.8352286879 ,
determination=0.999999919779390070,

F(X)= 0.82499818453289353000+
0.18339009971025438000*(X-1.97150092976767820000)^1+
0.00029388396529839156*(X-1.97150092976767820000)^2+
value range 0.8000000100<=F(x)<= 0.8500000000 ,
value range 1.8352288145<=X<= 2.1077473810 ,
determination=0.999999841915244270,

F(X)= 0.87501503253528190000+
0.18308337802132899000*(X-2.24415941760612240000)^1+
-0.00241782564327763790*(X-2.24415941760612240000)^2+
value range 0.8500000100<=F(x)<= 0.9000000000 ,
value range 2.1077473999<=X<= 2.3809094046 ,
determination=0.999999919496346920,

F(X)= 0.92524210660741835000+
0.17573599284312780000*(X-2.52109209632405620000)^1+
-0.03584864970272860800*(X-2.52109209632405620000)^2+
value range 0.9000000100<=F(x)<= 0.9500000000 ,
value range 2.3809094429<=X<= 2.6670546078 ,
determination=0.999991820409337540,

F(X)= 0.97902744954306431000+
0.09973267048488576600*(X-2.88514166212198650000)^1+
-0.15018903223799640000*(X-2.88514166212198650000)^2+
0.05117263305368169300*(X-2.88514166212198650000)^3+
0.02539734567571372300*(X- 2.88514166212198650000)^4+
value range 0.9500000100<=F(x)<= 0.9999999900 ,
value range 2.6670546432<=X<= 4.1878227121 ,
determination=0.999867302864060340
(29.2.1.3) X2 marginal probability distribution,

Variance : 12.91683
S.D. : 3.59400
MAD : 3.07469
Range : 18.34687
Mid_range : 0.63804
Median : 1.30079
Q1 : -1.85051
Q2 : 1.30079
Q3 : 4.12000
IQR : 5.97051
C.V. : 3.37498
Curve-fitting estimated the distribution function of X2.

F(X)= 0.02175347565755944600+
0.04072892905985584000*(X--5.53346245383860500000)^1+
0.01924508367471149800*(X--5.53346245383860500000)^2+
-0.00351525917552150680*(X--5.53346245383860500000)^3+
-0.00289002825674628680*(X- -5.53346245383860500000)^4+
value range 0.0000000000<=F(x)<= 0.0500000000 ,
value range -8.5353935834<=X<= -4.9647750123 ,

determination=0.999863441522931720,

F(X)= 0.07481514302297538600+
0.05823153443428114000*(X--4.52769205809543520000)^1+
0.00299546875977874800*(X--4.52769205809543520000)^2+
-0.00059836146060154860*(X--4.52769205809543520000)^3+
value range 0.0500000100<=F(x)<= 0.1000000000 ,
value range -4.9647748989<=X<= -4.1039338754 ,
determination=0.999999840114692780,

F(X)= 0.12485287712296113000+
0.06277503055171190800*(X--3.70062258673838060000)^1+
0.00277840692442114100*(X- -3.70062258673838060000)^2+
-0.00024259933244108467*(X--3.70062258673838060000)^3+
value range 0.1000000100<=F(x)<= 0.1500000000 ,
value range -4.1039338088<=X<= -3.3066371265 ,
determination=0.999999919778435720,

F(X)= 0.17487185898294874000+
0.06686272929640757500*(X--2.92873612788151010000)^1+
0.00274608469224995460*(X--2.92873612788151010000)^2+
-0.00025480570551428272*(X--2.92873612788151010000)^3+
value range 0.1500000100<=F(x)<= 0.2000000000 ,
value range -3.3066371171<=X<= -2.5583164354 ,
determination=0.999999907404609870,

F(X)= 0.22489142992094394000+
0.07071534216687647100*(X--2.20143924624777300000)^1+
0.00260148698545606060*(X--2.20143924624777300000)^2+
-0.00044103961259356339*(X--2.20143924624777300000)^3+
value range 0.2000000100<=F(x)<= 0.2500000000 ,
value range -2.5583161798<=X<= -1.8505089733 ,
determination=0.999999950236933330,

F(X)= 0.27490877935026797000+
0.07402177829927930600*(X--1.51026242934677350000)^1+
0.00239923948460180060*(X- -1.51026242934677350000)^2+
0.00004911720134792574*(X- -1.51026242934677350000)^3+
value range 0.2500000100<=F(x)<= 0.3000000000 ,
value range -1.8505086932<=X<= -1.1750090251 ,
determination=0.999999970270712630,

F(X)= 0.32492274262864584000+
0.07700751917784720600*(X--0.84862875444311758000)^1+
0.00220318038452807160*(X--0.84862875444311758000)^2+
0.00069482591864300502*(X- -0.84862875444311758000)^3+
value range 0.3000000100<=F(x)<= 0.3500000000 ,
value range -1.1750090060<=X<= -0.5262250127 ,
determination=0.999999859900271290,

F(X)= 0.37493230081559237000+
0.07981515175817788200*(X--0.21126293385662004000)^1+
0.00207035909772879460*(X--0.21126293385662004000)^2+
0.00005283539801936854*(X--0.21126293385662004000)^3+
value range 0.3500000100<=F(x)<= 0.4000000000 ,
value range -0.5262247982<=X<= 0.1002555494 ,
determination=0.999999875576865980,

F(X)= 0.42494320952390541000+
0.08222475211874386000*(X-0.40574711079491915000)^1+
0.00184338410032188620*(X- 0.40574711079491915000)^2+
0.00008456936152434480*(X-0.40574711079491915000)^3+
value range 0.4000000100<=F(x)<= 0.4500000000 ,
value range 0.1002564532<=X<= 0.7084289782 ,
determination=0.999999928415619800,

F(X)= 0.47495203376572115000+
0.08449088316318120700*(X-1.00579349057888030000)^1+
0.00164065659721846290*(X-1.00579349057888030000)^2+
-0.00089220513309307137*(X-1.00579349057888030000)^3+
value range 0.4500000100<=F(x)<= 0.5000000000 ,
value range 0.7084290260<=X<= 1.3007913426 ,
determination=0.999999941975005520,

F(X)= 0.52496093640774300000+
0.08623220260602509900*(X-1.59168485349992260000)^1+
0.00139428026231337710*(X-1.59168485349992260000)^2+
-0.00005335579051912731*(X-1.59168485349992260000)^3+
value range 0.5000000100<=F(x)<= 0.5500000000 ,
value range 1.3007916411<=X<= 1.8807958179 ,
determination=0.999999932905818900,

F(X)= 0.57496909345948743000+
0.08775999513172438900*(X-2.16643204516275520000)^1+
0.00114281435264685160*(X-2.16643204516275520000)^2+
0.00003552487874536325*(X-2.16643204516275520000)^3+
value range 0.5500000100<=F(x)<= 0.6000000000 ,
value range 1.8807959353<=X<= 2.4507551910 ,
determination=0.999999873520563300,

F(X)= 0.62497420521230485000+
0.08905962592707011800*(X-2.73221011867731530000)^1+
0.00097981691344984842*(X-2.73221011867731530000)^2+
-0.00138627019400949790*(X-2.73221011867731530000)^3+
value range 0.6000000100<=F(x)<= 0.6500000000 ,
value range 2.4507552324<=X<= 3.0126379487 ,
determination=0.999999922996329120,

F(X)= 0.67498088946827639000+
0.08994090313006462800*(X-3.29092941336564900000)^1+
0.00074223101817949555*(X-3.29092941336564900000)^2+
value range 0.6500000100<=F(x)<= 0.7000000000 ,
value range 3.0126379689<=X<= 3.5684927820 ,
determination=0.999999912857674870,
F(X)= 0.72498905470022679000+
0.09069476539588389200*(X-3.84448233551434180000)^1+
0.00043206523340668344*(X-3.84448233551434180000)^2+
-0.00038638921889067035*(X-3.84448233551434180000)^3+
value range 0.7000000100<=F(x)<= 0.7500000000 ,
value range 3.5684929476<=X<= 4.1200041420 ,
determination=0.999999924307723890,

F(X)= 0.77499373675573036000+
0.09110847226714459400*(X-4.39455295215612730000)^1+
0.00024974936509636336*(X-4.39455295215612730000)^2+
value range 0.7500000100<=F(x)<= 0.8000000000 ,
value range 4.1200041479<=X<= 4.6688006486 ,
determination=0.999999911355433090,

F(X)= 0.82499397034583710000+
0.09127737790505584300*(X-4.94278804125783290000)^1+
0.00024133463643494224*(X- 4.94278804125783290000)^2+
value range 0.8000000100<=F(x)<= 0.8500000000 ,
value range 4.6688006817<=X<= 5.2166063602 ,
determination=0.999999946990998370,

F(X)= 0.87502394091367797000+
0.09083562039924526800*(X-5.49141579353442480000)^1+
-0.00094797963428305820*(X-5.49141579353442480000)^2+
value range 0.8500000100<=F(x)<= 0.9000000000 ,
value range 5.2166063823<=X<= 5.7675057509 ,
determination=0.999999617319049290,

F(X)= 0.92535080075317566000+
0.08487222988563192200*(X-6.05566092463653320000)^1+
-0.01210036504667755300*(X-6.05566092463653320000)^2+
value range 0.9000000100<=F(x)<= 0.9500000000 ,
value range 5.7675059675<=X<= 6.3608165675 ,
determination=0.999991410509987410,

F(X)= 0.97919172777950370000+
0.04482610156181288100*(X-6.83829726533906520000)^1+
-0.03227003655580329400*(X-6.83829726533906520000)^2+
0.00750804180199526880*(X-6.83829726533906520000)^3+
value range 0.9500000100<=F(x)<= 0.9999999900 ,
value range 6.3608165730<=X<= 9.8114715002 ,
determination=0.999879040437503530
(29.2.2)The linear model analysis,

ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1 1 1192798701.2153571000 1192798701.2153571000 1206253876.4068592000
error 99999998 98884546.6687695980 0.9888454865
total 99999999 1291683247.8841267000
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
H0: slope(X1)=0
----------------------------------------------------------------------------------
intercept 0.9884801893 0.0000994650 9937.96534 0.00000
slope 1.9324034432 0.0000556389 34731.16578 0.00000
----------------------------------------------------------------------------------
MSE=0.9888454865 , R2=0.923445 , R2(adj)=0.923445
SSX1=319426948.2025855200 , SS(X2*X1)=617261734.5577393800,
C.V.=0.9338083006
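The standard error and t value of the slope can be recomputed from MSE and SSX1. The sketch below uses the usual simple-regression formula, standard error = sqrt(MSE / SSX1); the variable names are illustrative.

```python
import math

# Values copied from the output above.
MSE = 0.9888454865
SS_X1 = 319426948.2025855200
SLOPE = 1.9324034432

se_slope = math.sqrt(MSE / SS_X1)   # standard error of the slope
t_slope = SLOPE / se_slope          # t statistic for H0: slope(X1)=0

print(se_slope, t_slope)
```

With n = 100,000,000 the standard error is tiny, so the t value is enormous even though the estimated slope differs from the population value 2.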

----------------------------------------------------------------------------------
class   lower limit  upper limit   observed no    probability   expected no    chi square
----------------------------------------------------------------------------------
[ 1 ]      -∞         -1.63571    4998283.00000    0.05000    5000000.00000    0.58962
[ 2 ]   -1.63571      -1.27443    5000923.00000    0.05000    5000000.00000    0.17039
[ 3 ]   -1.27443      -1.03063    5002985.00000    0.05000    5000000.00000    1.78205
[ 4 ]   -1.03063      -0.83691    4998424.00000    0.05000    5000000.00000    0.49676
[ 5 ]   -0.83691      -0.67069    5000232.00000    0.05000    5000000.00000    0.01076
[ 6 ]   -0.67069      -0.52144    4999233.00000    0.05000    5000000.00000    0.11766
[ 7 ]   -0.52144      -0.38313    4999750.00000    0.05000    5000000.00000    0.01250
[ 8 ]   -0.38313      -0.25216    4993752.00000    0.05000    5000000.00000    7.80750
[ 9 ]   -0.25216      -0.12491    5011951.00000    0.05000    5000000.00000   28.56528
[ 10 ]  -0.12491      -0.00023    4991026.00000    0.05000    5000000.00000   16.10654
[ 11 ]  -0.00023       0.12473    4998906.00000    0.05000    5000000.00000    0.23937
[ 12 ]   0.12473       0.25191    5007057.00000    0.05000    5000000.00000    9.96025
[ 13 ]   0.25191       0.38313    4997762.00000    0.05000    5000000.00000    1.00173
[ 14 ]   0.38313       0.52145    4999668.00000    0.05000    5000000.00000    0.02204
[ 15 ]   0.52145       0.67065    4998405.00000    0.05000    5000000.00000    0.50880
[ 16 ]   0.67065       0.83684    5001006.00000    0.05000    5000000.00000    0.20241
[ 17 ]   0.83684       1.03059    5001972.00000    0.05000    5000000.00000    0.77776
[ 18 ]   1.03059       1.27433    4998895.00000    0.05000    5000000.00000    0.24421
[ 19 ]   1.27433       1.63566    4999979.00000    0.05000    5000000.00000    0.00009
[ 20 ]   1.63566         +∞       4999791.00000    0.05000    5000000.00000    0.00874
----------------------------------------------------------------------------------
p-value=0.000000
Z=0.234327, p-value=0.592700
Z=0.234327, p-value=0.407300
Z=0.234327, p-value=0.814600
[Figures: the joint probability distribution of X1 and the residual; the joint probability distribution of the X2 estimated line and X2]

Variance : 0.98885
S.D. : 0.99441
MAD : 0.79341
Range : 11.27247
Median : -0.00014
Q1 : -0.67072
Q2 : -0.00014
Q3 : 0.67071
IQR : 1.34142
C.V. : none
SLLN analysis, X0=residual and Normal(0, 0.98885),
Note:X1~ Normal(0, 0.98885), X1 is representable code of Normal(0, 0.98885),
Note:
Case 1, $X_1 \sim \mathrm{Normal}(\mu_{X_1} = 2,\ \sigma_{X_1}^2 = 5^2)$,
the population conditional expectation line is $E(X_2 \mid x_1) = \beta_0 + \beta_1 x_1 = 1 + 2x_1$,
$\varepsilon \sim \mathrm{Normal}(0,\ \sigma^2 = 1)$,
$f_{X_1}(x_1)$ Coefficient
Variance : 25.00207
S.D. : 5.00021
MAD : 3.98942
Range : 55.92611
Mid_range : 2.64690
Median : 2.00030
Q1 : -1.37190
Q2 : 2.00030
Q3 : 5.37332
IQR : 6.74521
C.V. : 2.49941
$f_{X_2}(x_2)$ Coefficient
S.D. : 10.05039
MAD : 8.01866
Range : 112.96368
Mid_range : 5.47472
Median : 5.00042
Q1 : -1.77931
Q2 : 5.00042
Q3 : 11.77619
IQR : 13.55550
C.V. : 2.01031
[Figures: $f_{X_1,X_2}(x_1, x_2)$; $f_{X_2,X_1}(x_2, x_1)$]
E(X1)= 2.0000, Var(X1)= 24.9980, E(X2)= 4.9999, Var(X2)= 100.9887,

$f_{W_1}(w_1)$, $W_1 = \varepsilon$, Coefficient
Variance : 0.99988
S.D. : 0.99994
MAD : 0.79782
Range : 11.12671
Median : -0.00009
Q1 : -0.67434
Q2 : -0.00009
Q3 : 0.67457
IQR : 1.34891
C.V. : none
Case 2, $X_1 \sim \mathrm{Normal}(\mu_{X_1} = 2,\ \sigma_{X_1}^2 = 5^2)$, the population conditional expectation line is
$E(X_2 \mid x_1) = \beta_0 + \beta_1 x_1 = 1 + 2x_1$, $\varepsilon \sim \mathrm{Normal}(0,\ \sigma^2 = 1)$, $-20 \le X_1 X_2 \le 20$,
$P(-20 \le X_1 X_2 \le 20) = 0.4349$,
$f_{X_1}(x_1 \mid -20 \le X_1 X_2 \le 20)$ Coefficient
Variance : 3.19544
S.D. : 1.78758
MAD : 1.52948
Range : 9.09938
Median : 0.15974
Q1 : -1.40847
Q2 : 0.15974
Q3 : 1.56271
IQR : 2.97118
C.V. : 45.13446
$f_{X_2}(x_2 \mid -20 \le X_1 X_2 \le 20)$ Coefficient
Variance : 12.92068
S.D. : 3.59453
MAD : 3.07533
Range : 18.15173
Mid_range : 0.91494
Median : 1.30158
Q1 : -1.85081
Q2 : 1.30158
Q3 : 4.12148
IQR : 5.97229
C.V. : 3.37416
[Figures: $f_{X_1,X_2}(x_1, x_2 \mid -20 \le X_1 X_2 \le 20)$; $f_{X_2,X_1}(x_2, x_1 \mid -20 \le X_1 X_2 \le 20)$]
E(X1)= 0.0397, Var(X1)= 3.1942, E(X2)= 1.0651, Var(X2)= 12.9168,

$f_{W_1}(w_1 \mid -20 \le X_1 X_2 \le 20)$, $W_1 = \varepsilon$, Coefficient

Variance : 1.00335
S.D. : 1.00167
MAD : 0.79922
Range : 11.36519
Median : -0.01420
Q1 : -0.68985
Q2 : -0.01420
Q3 : 0.66145
IQR : 1.35130
C.V. : none
Case 3, $X_1 \sim \mathrm{Normal}(\mu_{X_1} = 2,\ \sigma_{X_1}^2 = 5^2)$, the population conditional expectation line is
$E(X_2 \mid x_1) = \beta_0 + \beta_1 x_1 = 1 + 2x_1$, $\varepsilon \sim \mathrm{Normal}(0,\ \sigma^2 = 1)$, $50 \le X_1^2 + X_2^2 \le 200$,
$P(50 \le X_1^2 + X_2^2 \le 200) = 0.3164$,
$f_{X_1}(x_1 \mid 50 \le X_1^2 + X_2^2 \le 200)$ Coefficient
Variance : 18.17945
S.D. : 4.26374
MAD : 3.78165
Range : 16.35907
Median : 3.58329
Q1 : -3.81694
Q2 : 3.58329
Q3 : 4.67325
IQR : 8.49018
C.V. : 2.72916
$f_{X_2}(x_2 \mid 50 \le X_1^2 + X_2^2 \le 200)$ Coefficient
Variance : 73.26312
S.D. : 8.55939
MAD : 7.62476
Range : 26.79517
Mid_range : 0.11570
Median : 8.18657
Q1 : -6.79991
Q2 : 8.18657
Q3 : 10.37197
IQR : 17.17188
C.V. : 2.07827
[Figures: $f_{X_1,X_2}(x_1, x_2 \mid 50 \le X_1^2 + X_2^2 \le 200)$; $f_{X_2,X_1}(x_2, x_1 \mid 50 \le X_1^2 + X_2^2 \le 200)$]
E(X1)= 1.5623, Var(X1)= 18.1791, E(X2)= 4.1210, Var(X2)= 73.2476,

[Figures: $E(X_2 \mid x_1)$ and $E(X_1 \mid x_2)$, each under $50 \le X_1^2 + X_2^2 \le 200$]
$f_{W_1}(w_1 \mid 50 \le X_1^2 + X_2^2 \le 200)$, $W_1 = \varepsilon$, Coefficient
Variance : 0.99984
S.D. : 0.99992
MAD : 0.79780
Range : 11.19210
Mid_range : 0.04738
Median : 0.00024
Q1 : -0.67411
Q2 : 0.00024
Q3 : 0.67464
IQR : 1.34875
C.V. : none
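
The region probabilities quoted for Case 2 and Case 3 can be checked approximately by simulation. The sketch below draws from the stated model (X1 ~ Normal(2, 5^2), X2 = 1 + 2*X1 + error, error ~ Normal(0, 1)) and counts how often each condition holds. The sample size, the seed and the function name are assumptions made here for illustration; the result is a Monte Carlo estimate, not the exact value.

```python
import random


def simulate_region_probabilities(n=200_000, seed=1):
    """Estimate P(-20 <= X1*X2 <= 20) and P(50 <= X1^2 + X2^2 <= 200)."""
    rng = random.Random(seed)
    hits_case2 = 0
    hits_case3 = 0
    for _ in range(n):
        x1 = rng.gauss(2.0, 5.0)                    # X1 ~ Normal(2, 5^2)
        x2 = 1.0 + 2.0 * x1 + rng.gauss(0.0, 1.0)   # X2 = 1 + 2*X1 + error
        if -20.0 <= x1 * x2 <= 20.0:
            hits_case2 += 1
        if 50.0 <= x1 * x1 + x2 * x2 <= 200.0:
            hits_case3 += 1
    return hits_case2 / n, hits_case3 / n


p_case2, p_case3 = simulate_region_probabilities()
print("Case 2 estimate:", p_case2)   # compare with 0.4349
print("Case 3 estimate:", p_case3)   # compare with 0.3164
```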
5.8. The third basic assumption is modified: the error follows the Durbin Watson first-order autoregressive error model.
Example 30, Durbin Watson model

$X_1 \sim \mathrm{Normal}(\mu_{X_1} = 2,\ \sigma_{X_1}^2 = 5^2)$,
$E(X_2 \mid x_1) = \beta_0 + \beta_1 x_1 = 1 + 2x_1$,
$\mu_t \sim \mathrm{Normal}(0,\ \sigma^2 = 1)$, there are n paired samples, T = n.
$X_{2t} = \beta_0 + \beta_1 X_{1t} + \varepsilon_t,\ t = 1, 2, \ldots, T$,
$\beta_0$ is the intercept, $\beta_1$ is the slope, $\varepsilon_t$ is the error,
$\varepsilon_t = \rho\,\varepsilon_{t-1} + \mu_t,\ t = 1, 2, 3, \ldots, T,\ \varepsilon_0 = 0,\ |\rho| < 1$, let $\rho = 0.5$.
The three basic assumptions are
i) $\mu_t \sim$ Normal distribution, ii) $E(\mu_t) = 0$, $\mathrm{Var}(\mu_t) = \sigma^2$,
iii) $\mu_1, \ldots, \mu_T$ are independent.
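
To make the model concrete, the sketch below simulates the first-order autoregressive error, fits the line by ordinary least squares, and computes the Durbin Watson statistic from the residuals. It is a minimal illustration under the parameters stated above (intercept 1, slope 2, rho = 0.5); the sample size, the seed and all function names are assumptions made here and do not come from the original software.

```python
import random


def simulate_ar1_regression(n=20_000, rho=0.5, seed=7):
    """Draw (X1, X2) pairs with X2 = 1 + 2*X1 + e_t, e_t = rho*e_{t-1} + mu_t."""
    rng = random.Random(seed)
    x1, x2 = [], []
    err = 0.0                                   # e_0 = 0
    for _ in range(n):
        err = rho * err + rng.gauss(0.0, 1.0)   # mu_t ~ Normal(0, 1)
        x = rng.gauss(2.0, 5.0)                 # X1 ~ Normal(2, 5^2)
        x1.append(x)
        x2.append(1.0 + 2.0 * x + err)
    return x1, x2


def ols_residuals(x, y):
    """Return (intercept, slope, residuals) of the least squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope, [b - intercept - slope * a for a, b in zip(x, y)]


def durbin_watson(res):
    """Sum of squared successive differences divided by the sum of squares."""
    num = sum((res[t] - res[t - 1]) ** 2 for t in range(1, len(res)))
    return num / sum(e * e for e in res)


x1, x2 = simulate_ar1_regression()
b0, b1, residuals = ols_residuals(x1, x2)
dw = durbin_watson(residuals)
print("intercept, slope:", b0, b1)
print("D.W.:", dw, "approximate rho:", 1.0 - dw / 2.0)
```

With rho = 0.5 the statistic should fall near 1 rather than near 2, which is the pattern the D.W. test in this example is looking for.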
[Figures: (X1, X2) scatter diagram; (residual(t-1), residual(t)) scatter diagram]

[ 1 ] -8.81688~ -5.29292 -7.05490 4.00000 0.0404040 0.0404040
[ 2 ] -5.29292~ -1.76895 -3.53093 7.00000 0.0707071 0.1111111
[ 3 ] -1.76895~ 1.75502 -0.00697 24.00000 0.2424242 0.3535354
[ 4 ] 1.75502~ 5.27898 3.51700 39.00000 0.3939394 0.7474747
[ 5 ] 5.27898~ 8.80295 7.04097 22.00000 0.2222222 0.9696970
[ 6 ] 8.80295~ 12.32692 10.56493 3.00000 0.0303030 1.0000000

[ 1 ] -16.77622~ -9.34386 -13.06004 3.00000 0.0300000 0.0300000
[ 2 ] -9.34386~ -1.91149 -5.62768 10.00000 0.1000000 0.1300000
[ 3 ] -1.91149~ 5.52087 1.80469 25.00000 0.2500000 0.3800000
[ 4 ] 5.52087~ 12.95324 9.23706 43.00000 0.4300000 0.8100000
[ 5 ] 12.95324~ 20.38560 16.66942 17.00000 0.1700000 0.9800000
[ 6 ] 20.38560~ 27.81797 24.10179 2.00000 0.0200000 1.0000000
(30.1.2)The linear model analysis

ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1 1 6077.9615303431 6077.9615303431 4114.3059777280
error 98 144.7729539801 1.4772750406
total 99 6222.7344843232
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
H0: slope(X1)=0

----------------------------------------------------------------------------------
intercept 0.9796787996 0.1482308562 6.60914 0.00000
slope 2.0299052130 0.0316466297 64.14286 0.00000
----------------------------------------------------------------------------------
MSE=1.4772750406 , R2=0.976735 , R2(adj)=0.976497
SSX1=1475.0489378756 , SS(X2*X1)= 2994.2095283700, C.V.= 0.1892535068
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
lower limit -1.29785 -0.68789 -0.21895 0.21870 0.68753 1.29725
upper limit -1.29785 -0.68789 -0.21895 0.21870 0.68753 1.29725
observed no 14.00000 15.00000 14.00000 21.00000 8.00000 12.00000 16.00000
probability 0.14286 0.14286 0.14286 0.14286 0.14286 0.14286 0.14286
expected no 14.28571 14.28571 14.28571 14.28571 14.28571 14.28571 14.28571
chi square 0.00571 0.03571 0.00571 3.15571 2.76571 0.36571 0.20571
degree of freedom=5
p-value=0.257100
Z=-2.989958, p-value=0.001400
Z=-2.989958, p-value=0.998600
Z=-2.989958, p-value=0.002800
t=2,3,...,100
e(t)~Normal(0,sigma*sigma),
D.W. test=0.859603
Z=6.137602, p-value=0.000000
Z=6.137602, p-value=1.000000
H0: auto correlation coefficient=0 , H1:against H0
Z=6.137602, p-value=0.000000
(C.L.T. can be applied when Durbin Watson test statistic),
H0:Variances are equal
The test statistic=Max(each residual*residual)/SSE
p value=0.197109
90% confidence interval for population variance
[1.185621 , 1.900812]
[1.088862 , 1.378699]
[1.137383 , 1.996875]
[1.066482 , 1.413108]
[1.050533 , 2.203873]
[1.024955 , 1.484545]

[ 1 ] -2.95462~ -1.99689 -2.47576 4.00000 0.0400000 0.0400000
[ 2 ] -1.99689~ -1.03916 -1.51803 19.00000 0.1900000 0.2300000
[ 3 ] -1.03916~ -0.08143 -0.56030 28.00000 0.2800000 0.5100000
[ 4 ] -0.08143~ 0.87630 0.39744 24.00000 0.2400000 0.7500000
[ 5 ] 0.87630~ 1.83403 1.35517 15.00000 0.1500000 0.9000000
[ 6 ] 1.83403~ 2.79176 2.31290 10.00000 0.1000000 1.0000000
(30.1.4) Durbin Watson analysis
D.W. test=0.859603
Z=6.137602, p-value=0.000000,
H0: auto correlation coefficient = 0 is rejected (see 30.1.2).
The Durbin Watson model analysis
[ The Durbin Watson information ]
The Durbin Watson Model
Y(t)=b0+b1*X(1,t) +error(t),t=1,..,100,
error(t+1)=rho*error(t)+mu(t+1),t=1,...,99,
mu(1),...,mu(100) are iid, error(1)=mu(1),
E(mu(t))=0,Var(mu(t))=1.000000,t=1,..,100,
The probability distribution of mu(t) are Normal distribution(the probability distribution),
t=1,...,100,
--- The sample size=100,lag=1,sigma=1.000000(variance is known),
--- independent variable number=1,
[ Durbin Watson test statistic ]
H0: auto correlation coefficient=0.500000

Durbin Watson test value=0.859603
P(Durbin Watson test statistic<=test value=0.859603)=15.4195547%
H1:auto correlation coefficient>0.500000 , p value=15.4195547%
H1:auto correlation coefficient is not 0.500000 , p value=30.8391094%
auto correlation coefficient ρ =0.5,
[ The Durbin Watson information ]

The Durbin Watson Model
Y(t)=b0+b1*X(1,t) +error(t),t=1,..,100,
error(t+1)=rho*error(t)+mu(t+1),t=1,...,99,
mu(1),..,mu(100) are iid, error(1)=mu(1),
E(mu(t))=0,Var(mu(t))=1.000000,t=1,..,100,
The probability distribution of mu(t) are Normal distribution(the probability distribution),
t=1,...,100,
--- The sample size=100,lag=1,sigma=1.000000(variance is known),
estimated rho=0.595000, it is the point estimator.
--- independent variable number=1,
Simulating the sampling distribution of estimated regressor coefficient(s),
each has 10000
-------------------------------------------------------------------------
The DW test= 0.8596025941
The H0:auto correlation coefficient=0.595000,
H1:auto correlation coefficient is not equal 0.595000
The p value=0.498300
==== the following result is from the sample test value =====
-----------The Durbin Watson model-------------
95% C.I. for auto correlation coefficient
0.425000<=auto correlation coefficient<=0.770000
99% C.I. for auto correlation coefficient
0.370000<=auto correlation coefficient<=0.830000
---------------end--------------------
The variance estimated value= 1.0188720780
------------- The regression coefficient test by Durbin Watson model
The population parameters b0,b1 are 0
H0:b0=0, b0 estimated value=0.9796787996, S(b0)= 0.1231027536,
test value=7.9582200331, p value=0.0000000000
H0:b1=0, b1 estimated value=2.0299052130, S(b1)= 0.0262818914,
test value=77.2358878292, p value= 0.0000000000
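$\hat{X}_{2t} = 0.9796787996 + 2.0299052130 \times X_{1t} + \hat{\varepsilon}_t,\ t = 1, 2, \ldots, 100$,
$\hat{\varepsilon}_t = 0.595 \times \hat{\varepsilon}_{t-1} + \hat{\mu}_t,\ t = 1, 2, \ldots, 100,\ \hat{\varepsilon}_0 = 0$,
$\mu$ (sample mean) = 0, $\mu$ (sample variance) = 1.0188720780,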
(30.2) n = 100,000,000. This is big data, and the Durbin Watson first-order autoregressive error model will be applied.
(30.2.1) Basic analysis,
(30.2.1.1) X1 and X2 joint probability distribution when the auto correlation coefficient is 0.
[Figures: f(x1,x2); f(x2,x1)]


Variance : 25.00086
S.D. : 5.00009
MAD : 3.98943
Range : 57.25336
Mid_range : 3.34852
Median : 1.99989
Q1 : -1.37235
Q2 : 1.99989
Q3 : 5.37215
IQR : 6.74450
C.V. : 2.50025
S.D. : 10.06677
MAD : 8.03195
Range : 113.33610
Mid_range : 7.23542
Median : 4.99982
Q1 : -1.78978
Q2 : 4.99982
Q3 : 11.79044
IQR : 13.58022
C.V. : 2.01346

ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1 1 10000645868.9021430000 10000645868.9021430000 7500537833.5163288000
error 99999998 133332380.8354263300 1.3333238350
total 99999999 10133978249.7375700000
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
H0: slope(X1)=0
----------------------------------------------------------------------------------
intercept 1.0000236578 0.0001243629 8041.17560 0.00000
slope 2.0000302021 0.0000230935 86605.64551 0.00000
----------------------------------------------------------------------------------
MSE=1.3333238350 , R2=0.986843 , R2(adj)=0.986843
SSX1=2500085958.7255616000 , SS(X2*X1)=5000247425.3809719000,
C.V.= 0.2309510036
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ] [ 10 ] [ 11 ] [ 12 ] [ 13 ] [ 14 ] [ 15 ]
[ 16 ] [ 17 ] [ 18 ] [ 19 ] [ 20 ]
lower limit -1.89937 -1.47986 -1.19676 -0.97181 -0.77880 -0.60550
-0.44489 -0.29280 -0.14504 -0.00026 0.14484 0.29251 0.44488 0.60550
0.77876 0.97174 1.19671 1.47974 1.89931
upper limit -1.89937 -1.47986 -1.19676 -0.97181 -0.77880 -0.60550 -0.44489
-0.29280 -0.14504 -0.00026 0.14484 0.29251 0.44488 0.60550 0.77876
0.97174 1.19671 1.47974 1.89931
observed no 5002000.00000 4997852.00000 4998990.00000 5000513.00000 5000180.00000 4995910.00000
5001968.00000 4993101.00000 5014384.00000 4988088.00000 4998635.00000 5011244.00000
4994630.00000 4999552.00000 4999624.00000 5001067.00000 5004743.00000 4999114.00000
4998443.00000 4999962.00000
probability 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000
0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000
0.05000 0.05000 0.05000 0.05000 0.05000
expected no 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000
5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000
5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000
5000000.00000 5000000.00000
chi square 0.80000 0.92278 0.20402 0.05263 0.00648 3.34562 0.77460
9.51924 41.37989 28.37915 0.37265 25.28551 5.76738 0.04014 0.02828
0.22770 4.49921 0.15700 0.48485 0.00029
p-value=0.000000
Z=-3332.395606, p-value=0.000000
Z=-3332.395606, p-value=1.000000
Z=-3332.395606, p-value=0.000000
t=2,3,...,100000000
D.W. test=1.000222
D.W. test=2.999778
[Figures: the joint probability distribution of X1 and the residual; the joint probability distribution of the X2 estimated line and X2]
Variance : 1.33332
S.D. : 1.15470
MAD : 0.92130
Range : 13.15760
Mid_range : 0.07225
Median : -0.00006
Q1 : -0.77878
Q2 : -0.00006
Q3 : 0.77888
IQR : 1.55766
C.V. : none
SLLN analysis, X0=residual and Normal(0, 1.3333238350),

Note:X1~Normal(0, 1.3333238350),
X1 is representable code of Normal(0, 1.3333238350),

Z=-3332.395606, p-value=0.000000,
D.W. test=1.000222, L-M test=24988914.693132,
This data is big data, so $1.000222 \approx 2 - 2\rho$ gives $\rho = 0.49989$ as the population auto correlation coefficient. The auto correlation coefficient cannot be derived from the L-M test statistic.
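
The link between the D.W. statistic and the auto correlation coefficient follows from expanding the numerator of the statistic. The short derivation below is the standard textbook argument, written with $e_t$ for the residuals and $\hat{\rho}$ for their lag-1 sample autocorrelation (notation introduced here for illustration); the variance line additionally assumes the error process is stationary.

```latex
d = \frac{\sum_{t=2}^{T} (e_t - e_{t-1})^2}{\sum_{t=1}^{T} e_t^2}
  = \frac{\sum_{t=2}^{T} e_t^2 + \sum_{t=2}^{T} e_{t-1}^2
          - 2 \sum_{t=2}^{T} e_t e_{t-1}}{\sum_{t=1}^{T} e_t^2}
  \approx 2\,(1 - \hat{\rho}),
\qquad
\hat{\rho} = \frac{\sum_{t=2}^{T} e_t e_{t-1}}{\sum_{t=1}^{T} e_t^2}.

% With d = 1.000222:
\hat{\rho} \approx 1 - \frac{d}{2} = 1 - \frac{1.000222}{2} = 0.499889.

% Stationary AR(1) error variance with \rho = 0.5 and \sigma^2 = 1:
\mathrm{Var}(\varepsilon_t) = \frac{\sigma^2}{1 - \rho^2} = \frac{1}{1 - 0.25} \approx 1.3333.
```

The last line is close to the MSE of 1.3333238350 reported for the fitted line, as would be expected when the first-order autoregressive error is left in the residual.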
(30.2.4) The auto correlation coefficient analysis; the residual is from the (30.2.2) estimated line.
(30.2.4.1) The joint probability distribution of t and error(t).
X1 = t = 1, 2, 3, ..., T, X2 = error(t), T = 100,000,000.
[Figures: f(x1,x2); f(x2,x1)]

sample cov(X1,X2)= 17.1989, X1 and X2 sample correlation coefficient=0.0000.
t cannot explain the movement of error(t).
(30.2.4.2) The joint probability distribution of error(t-1) and error(t).

Durbin Watson model, lag=1, let X1 = error(t-1), X2 = error(t),
t = 2, 3, ..., T, T = 100,000,000.
[Figures: f(x1,x2); f(x2,x1)]

(30.2.4.3) X1 = residual(t-1) is the independent variable and X2 = residual(t) is the dependent variable.
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1 1 33318312.4660798980 33318312.4660798980 33313626.3967685850
error 99999997 100014063.5237549200 1.0001406652
total 99999998 133332375.9898348300
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
H0: slope(X1)=0
----------------------------------------------------------------------------------
intercept 0.0000000305 0.0001000070 0.00031 0.99960
slope 0.4998891284 0.0000866089 5771.79577 0.00000
----------------------------------------------------------------------------------
MSE=1.0001406652 , R2=0.249889 , R2(adj)=0.249889
SSX1=133332374.3872116200 , SS(X2*X1)= 66651404.4238939140,
C.V.= 32633081.2133811970
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ] [ 10 ] [ 11 ] [ 12 ] [ 13 ] [ 14 ] [ 15 ]
[ 16 ] [ 17 ] [ 18 ] [ 19 ] [ 20 ]
lower limit -1.64502 -1.28169 -1.03650 -0.84167 -0.67451 -0.52441
-0.38532 -0.25359 -0.12562 -0.00023 0.12544 0.25334 0.38531 0.52442
0.67447 0.84161 1.03645 1.28158 1.64497
upper limit -1.64502 -1.28169 -1.03650 -0.84167 -0.67451 -0.52441 -0.38532
-0.25359 -0.12562 -0.00023 0.12544 0.25334 0.38531 0.52442 0.67447
0.84161 1.03645 1.28158 1.64497
observed no 4998500.00000 4999336.00000 5001054.00000 4999549.00000 4999498.00000 4999623.00000
5002322.00000 4992981.00000 5010497.00000 4986435.00000 5002078.00000 5007784.00000
4998710.00000 4998617.00000 4999710.00000 5002749.00000 4998913.00000 5002522.00000
4999984.00000 4999137.00000
probability 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000
0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000
0.05000 0.05000 0.05000 0.05000 0.05000
expected no 4999999.95000 4999999.95000 4999999.95000 4999999.95000 4999999.95000 4999999.95000
4999999.95000 4999999.95000 4999999.95000 4999999.95000 4999999.95000 4999999.95000
4999999.95000 4999999.95000 4999999.95000 4999999.95000 4999999.95000 4999999.95000
4999999.95000 4999999.95000
chi square 0.44997 0.08817 0.22220 0.04067 0.05039 0.02842 1.07838
9.85313 22.03761 36.80157 0.86366 12.11829 0.33279 0.38251 0.01681
1.51146 0.23629 1.27215 0.00005 0.14894
p-value=0.000000
Z=-1.389294, p-value=0.082400
Z=-1.389294, p-value=0.917600
Z=-1.389294, p-value=0.164800
t=2,3,...,99999999
D.W. test=1.999839
D.W. test=2.000161
[0.999908 , 1.000373]
[0.999954 , 1.000187]
[0.999864 , 1.000418]
[0.999932 , 1.000209]
[0.999776 , 1.000505]
[0.999888 , 1.000253]
[Figures: the joint probability distribution of X1=residual(t) and mu(t); the joint probability distribution of the X2=residual(t) estimated line and X2=residual(t)]
(30.2.4.3) mu(t) analysis
mu(t) = residual of the Durbin Watson model, lag=1, and its marginal probability distribution,
Variance : 1.00014
S.D. : 1.00007
MAD : 0.79793
Range : 11.16799
Median : 0.00003
Q1 : -0.67445
Q2 : 0.00003
Q3 : 0.67458
IQR : 1.34903
C.V. : none
SLLN analysis, X0=m(t) and Normal(0,1),Note:X1~Normal(0,1), X1 is

(30.2.5) Conclusion: the Durbin Watson model must be included,

X1(t)~Normal(2,25), mu(t)~Normal(0,1),
X2(t)=1.000024+2.000030*X1(t)+error(t),
error(t)= 0.000000+0.499889*error(t-1)+mu(t), t=2,...,T
Chapter 6. The general linear model and non-linear model
6.1. multiple regression analysis
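(1.1) Sample
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i,\ i = 1, 2, \ldots, n$
$\beta_0$ is the intercept, $\beta_1, \beta_2, \ldots, \beta_k$ are the slopes,
$X_{1i}, X_{2i}, \ldots, X_{ki}$ are the independent variables,
$Y_i$ is the dependent variable,
$\varepsilon_i$ is the error. There are three basic assumptions:
(a) $\varepsilon_i \sim N(0, \sigma_i^2)$, (b) $\sigma_1^2 = \cdots = \sigma_n^2$, (c) $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0,\ i \ne j$.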
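
As an illustration of how the coefficients of this model can be estimated, the sketch below solves the normal equations X'X b = X'y with plain Gaussian elimination. It is a minimal standard-library example with made-up data, not the procedure used by the original software, and every function name here is hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_multiple_regression(rows, y):
    """Least squares estimates (b0, b1, ..., bk) from the normal equations."""
    design = [[1.0] + list(r) for r in rows]   # leading 1 for the intercept
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Noise-free toy data generated from Y = 1 + 2*X1 - 0.5*X2.
rows = [(1.0, 2.0), (2.0, 1.0), (3.0, 5.0), (4.0, 3.0), (5.0, 8.0)]
y = [1.0 + 2.0 * a - 0.5 * b for a, b in rows]
coefficients = fit_multiple_regression(rows, y)
print(coefficients)
```

Because the toy data contain no error term, the fit should recover the generating coefficients up to rounding.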
(1.2) Big data

The linear model analysis can be applied to big data. The method is as follows:
$f_{X_j}(x_j)$ and $f_{\varepsilon}(\varepsilon)$ can be formed using curve-fitting or the SLLN.
$Y = H(x_1, \ldots, x_k) + \varepsilon$, where $H(x_1, \ldots, x_k)$ comes from the linear model analysis,
$(X_1, \ldots, X_k)'$ and $\varepsilon$ are independent random variables.
$f_{X_1,\ldots,X_k,\varepsilon}(x_1, \ldots, x_k, \varepsilon) = f_{X_1,\ldots,X_k}(x_1, \ldots, x_k)\, f_{\varepsilon}(\varepsilon)$,
$f_{X_1,\ldots,X_k,Y}(x_1, \ldots, x_k, y) = f_{X_1,\ldots,X_k,\varepsilon}(x_1, \ldots, x_k,\ \varepsilon = y - H(x_1, \ldots, x_k))$,
$f_Y(y) = \int \cdots \int f_{X_1,\ldots,X_k,Y}(x_1, \ldots, x_k, y)\, dx_1 \cdots dx_k$,
$f_{Y \mid x_1,\ldots,x_k}(y \mid x_1, \ldots, x_k) = \dfrac{f_{X_1,\ldots,X_k,Y}(x_1, \ldots, x_k, y)}{f_{X_1,\ldots,X_k}(x_1, \ldots, x_k)}$,
$f_{x_1,\ldots,x_k \mid y}(x_1, \ldots, x_k \mid y) = \dfrac{f_{X_1,\ldots,X_k,Y}(x_1, \ldots, x_k, y)}{f_Y(y)}$,
$f_{X_1 \mid y}(x_1 \mid y) = \int \cdots \int f_{x_1,\ldots,x_k \mid y}(x_1, \ldots, x_k \mid y)\, dx_2 \cdots dx_k$, ...,
$f_{X_k \mid y}(x_k \mid y) = \int \cdots \int f_{x_1,\ldots,x_k \mid y}(x_1, \ldots, x_k \mid y)\, dx_1 \cdots dx_{k-1}$.
These are the marginal probability distributions, the conditional probability distributions and the joint probability distribution.
Let $W = H(x_1, \ldots, x_k)$, $Y = W + \varepsilon$.

$f_{W,\varepsilon}(w, \varepsilon)$ is transformed from $f_{X_1,\ldots,X_k,\varepsilon}(x_1, \ldots, x_k, \varepsilon)$,
$f_{W,Y}(w, y) = f_{W,\varepsilon}(w,\ \varepsilon = y - w)$,
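
When W and the error are independent, the joint density of (W, Y) factors into the density of W times the density of the error at y - w, and integrating over w gives the marginal density of Y. The sketch below carries out that integral numerically for one illustrative case: W ~ Normal(5, 100), which is the distribution of 1 + 2*X1 when X1 ~ Normal(2, 25), and error ~ Normal(0, 1). The integration limits, step count and function names are assumptions chosen for this example.

```python
import math


def normal_pdf(x, mean, var):
    """Density of Normal(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def density_of_y(y, f_w, f_eps, w_low, w_high, steps=20_000):
    """f_Y(y) = integral of f_W(w) * f_eps(y - w) dw, by the trapezoidal rule."""
    h = (w_high - w_low) / steps
    total = 0.5 * (f_w(w_low) * f_eps(y - w_low) + f_w(w_high) * f_eps(y - w_high))
    for i in range(1, steps):
        w = w_low + i * h
        total += f_w(w) * f_eps(y - w)
    return total * h


def f_w(w):
    return normal_pdf(w, 5.0, 100.0)     # W = 1 + 2*X1, X1 ~ Normal(2, 25)


def f_eps(e):
    return normal_pdf(e, 0.0, 1.0)       # error ~ Normal(0, 1)


# Integrate over ten standard deviations of W on each side of its mean.
numeric = density_of_y(5.0, f_w, f_eps, -95.0, 105.0)
print(numeric, normal_pdf(5.0, 5.0, 101.0))
```

For this normal case the numerical result can be compared with the Normal(5, 101) density, since the sum of independent normal variables is normal with the variances added.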
6.2. High collinearity; the other assumptions are unchanged.
Example 31,
Multivariate normal distribution with 5 random variables;
the population mean vector and covariance matrix are
$$
\mu = \begin{pmatrix} E(X_1) \\ E(X_2) \\ E(X_3) \\ E(X_4) \\ E(X_5) \end{pmatrix}
    = \begin{pmatrix} 100 \\ 0 \\ -100 \\ -120 \\ 180 \end{pmatrix},
\qquad
\Sigma = \begin{pmatrix}
1 & 0.99 & 0.99 & 0.99 & 0.99 \\
0.99 & 1 & 0.99 & 0.99 & 0.99 \\
0.99 & 0.99 & 1 & 0.99 & 0.99 \\
0.99 & 0.99 & 0.99 & 1 & 0.99 \\
0.99 & 0.99 & 0.99 & 0.99 & 1
\end{pmatrix},
$$
$X_i \sim \mathrm{Normal}(E(X_i), \mathrm{Var}(X_i))$, $\mathrm{Var}(X_i) = 1$, $i = 1, 2, \ldots, 5$,
$\mathrm{Cov}(X_i, X_j) = \rho(X_i, X_j) = 0.99$, $i, j = 1, 2, \ldots, 5$, $i \ne j$,
(31.1.1) $X_1, X_2, X_3, X_4$ are the independent variables, $X_5$ is the dependent variable.
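
One common way to quantify collinearity of this kind is the variance inflation factor (VIF), which equals the diagonal of the inverse of the correlation matrix of the independent variables. The sketch below computes it for the population correlation matrix of X1, ..., X4 (all off-diagonal entries 0.99). The VIF is a diagnostic brought in here for illustration; it is not part of the original output, and the function names are hypothetical.

```python
def invert_matrix(a):
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * pv for v, pv in zip(m[r], m[col])]
    return [row[n:] for row in m]


def variance_inflation_factors(corr):
    """VIF_j is the j-th diagonal element of the inverse correlation matrix."""
    inverse = invert_matrix(corr)
    return [inverse[i][i] for i in range(len(corr))]


rho = 0.99
corr_x = [[1.0 if i == j else rho for j in range(4)] for i in range(4)]
vifs = variance_inflation_factors(corr_x)
print(vifs)
```

Each value comes out at roughly 75, far above the value of 1 that uncorrelated independent variables would give, so the variance of each estimated slope is inflated by about that factor.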
Dependent variable is X5,
Independent variables are X1,X2,X3,X4
The correlation matrix is below
r(X5,X1)=0.990839,r(X5,X2)=0.990473,r(X5,X3)=0.990308,r(X5,X4)=0.991157,
r(X1,X2)=0.990072,r(X1,X3)=0.990595,r(X1,X4)=0.990603,r(X2,X3)=0.990136,
r(X2,X4)=0.990641,r(X3,X4)=0.990697,
The estimated line is X5=207.931419+0.268172*X1+0.240660*X2+0.207652*X3+0.283226*X4
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 4 1002.8234887438 250.7058721860 21539.0189884664
error 995 11.5814161712 0.0116396142
total 999 1014.4049049150
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept 207.9314185679 6.1480819864 33.82053 0.00000
X1 0.2681718005 0.0300243283 8.93182 0.00000
X2 0.2406601650 0.0297202607 8.09751 0.00000
X3 0.2076518602 0.0302769893 6.85841 0.00000
X4 0.2832257049 0.0305070251 9.28395 0.00000
----------------------------------------------------------------------------------
MSE=0.0116396142 , R2=0.988583 , R2(adj)=0.988537
dependent variable:X5 , sample mean=180.0015808554 , sample variance=1.015420
independent variable:X1 , sample mean=100.0017783040 , sample variance=1.010565
independent variable:X3 , sample mean=-99.9910537157 , sample variance=1.004678
-------- Regression Coefficient Variance and Covariance Matrix ---------------

Var(b0)= 37.7989121121, Cov(b0,b1)= -0.1585817368, Cov(b0,b2)= -0.0390525466,
Cov(b0,b3)= 0.0863582216, Cov(b0,b4)= 0.1108784652,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b1,b0)= -0.1585817368, Var(b1)= 0.0009014603, Cov(b1,b2)= -0.0002745713,
Cov(b1,b3)= -0.0003205022, Cov(b1,b4)= -0.0003032501,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b2,b0)= -0.0390525466, Cov(b2,b1)= -0.0002745713, Var(b2)= 0.0008832939,
Cov(b2,b3)= -0.0002781319, Cov(b2,b4)= -0.0003224420,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b3,b0)= 0.0863582216, Cov(b3,b1)= -0.0003205022, Cov(b3,b2)= -0.0002781319, Var(b3)=
0.0009166961, Cov(b3,b4)= -0.0003113108,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b4,b0)= 0.1108784652, Cov(b4,b1)= -0.0003032501, Cov(b4,b2)= -0.0003224420, Cov(b4,b3)=
-0.0003113108, Var(b4)= 0.0009306786,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ individual test ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
variable coefficient standard error t value F value
intercept 207.9314185679 6.1480819864 33.8205 1143.8285
X1 slope 0.2681718005 0.0300243283 8.9318 79.7774
X2 slope 0.2406601650 0.0297202607 8.0975 65.5697
X3 slope 0.2076518602 0.0302769893 6.8584 47.0377
X4 slope 0.2832257049 0.0305070251 9.2840 86.1917
====================

class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ] [ 10 ]
lower limit -0.13827 -0.09080 -0.05657 -0.02733 0.00000 0.02733
0.05657 0.09075 0.13826
upper limit -0.13827 -0.09080 -0.05657 -0.02733 0.00000 0.02733 0.05657
0.09075 0.13826
observed no 100.00000 105.00000 89.00000 90.00000 119.00000 101.00000 98.00000
108.00000 97.00000 93.00000
probability 0.10000 0.10000 0.10000 0.10000 0.10000 0.10000 0.10000
0.10000 0.10000 0.10000
expected no 100.00000 100.00000 100.00000 100.00000 100.00000 100.00000 100.00000
100.00000 100.00000 100.00000
chi square 0.00000 0.25000 1.21000 1.00000 3.61000 0.01000 0.04000
0.64000 0.09000 0.49000
degree of freedom=8
p-value=0.500400
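
The chi-square line and the p-value of this goodness-of-fit table can be reproduced from the observed and expected counts. The sketch below does so with the standard library only. It takes the degrees of freedom (8) from the output above and uses a closed form of the chi-square tail probability that is valid only for even degrees of freedom; the function names are illustrative.

```python
import math


def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over the classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_sf_even_df(x, df):
    """P(chi-square with df degrees of freedom > x); closed form for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


observed = [100, 105, 89, 90, 119, 101, 98, 108, 97, 93]
expected = [100.0] * 10
statistic = chi_square_statistic(observed, expected)
p_value = chi_square_sf_even_df(statistic, df=8)
print(statistic, p_value)
```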
Z=0.823773, p-value=0.795000
Z=0.823773, p-value=0.205000
Z=0.823773, p-value=0.410000
t=2,3,...,1000
D.W. test=1.984575
Z=0.243273, p-value=0.403800
Z=0.243273, p-value=0.596200
Z=0.243273, p-value=0.807600

p value=0.530722
[Figures: residual plot; (X5 estimated line and X5) scatter diagram]
Durbin Watson first-order autoregressive error model,
[Figure: (residual(t-1), residual(t)) scatter diagram]

Linear model stepwise analysis
r(X5,X1)=0.990839,r(X5,X2)=0.990473,r(X5,X3)=0.990308,r(X5,X4)=0.991157,
r(X1,X2)=0.990072,r(X1,X3)=0.990595,r(X1,X4)=0.990603,r(X2,X3)=0.990136,
r(X2,X4)=0.990641,r(X3,X4)=0.990697,
Sorting the independent variables by coefficient of determination, from largest to smallest:
r(X5,X4) square=0.982392,
r(X5,X1) square=0.981763,
r(X5,X2) square=0.981037,
r(X5,X3) square=0.980710
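
The ordering above can be reproduced by squaring each correlation with X5 and sorting in descending order. The sketch below uses the six-decimal correlations printed in the listing, so the squares may differ from the printed ones in the last digit; the function name is hypothetical.

```python
# Correlations of each independent variable with X5, copied from the listing.
correlations = {"X1": 0.990839, "X2": 0.990473, "X3": 0.990308, "X4": 0.991157}


def entry_order(corr_with_response):
    """Return variables sorted by squared correlation, largest first."""
    r2 = {name: r * r for name, r in corr_with_response.items()}
    order = sorted(r2, key=r2.get, reverse=True)
    return order, r2


order, r_squared = entry_order(correlations)
print(order)
for name in order:
    print(name, round(r_squared[name], 6))
```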
analysis process 1 :[ The simple linear model analysis ]
analysis process 2 :[ The multiple linear model analysis ],

there are 2 independent variables.
The independent variables are: X4,X1. The independent variables are: X4,X2. The independent variables are: X4,X3.
The independent variables are: X4,X1,X2. The independent variables are: X4,X1,X3.
The independent variables are: X4,X1,X2,X3.
[ The stepwise analysis ]

The dependent variables X5
The insertion order of the independent variables is X4,X1,X2,X3
X4
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 1 996.5434228552 996.5434228552 55681.2885224165
error 998 17.8614820598 0.0178972766
total 999 1014.4049049150
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test
----------------------------------------------------------------------------------
intercept 298.3536114944 0.5015757414 594.83262
X4 0.9862928194 0.0041797589 235.96883
----------------------------------------------------------------------------------
MSE=0.0178972766 , R2=0.982392 , R2(adj)=0.982375
Var(b0)= 0.2515782244, Cov(b0,b1)= 0.0020963911,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b1,b0)= 0.0020963911, Var(b1)= 0.0000174704,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-------- partial coefficient of determination and test ---------------
r(X5,X4) square= 0.9823921572, test value= 55681.2885224169
X4,X1
The estimated line is X5=193.253972+0.512206*X4+0.482098*X1
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 2 1000.9325215354 500.4662607677 37036.1240416042
X4 1 996.5434228552
X1 1 4.3890986801
error 997 13.4723833797 0.0135129221
total 999 1014.4049049150
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept 193.2539716661 5.8478697371 33.04690
X4 0.5122059123 0.0265549382 19.28854
X1 0.4820984749 0.0267499348 18.02242
----------------------------------------------------------------------------------
MSE=0.0135129221 , R2=0.986719 , R2(adj)=0.986692
dependent variable:X5 , sample mean= 180.0015808554 , sample variance=1.015420
independent variable:X4 , sample mean= -119.9968491273 , sample variance=1.025461
Var(b0)= 34.1975804621, Cov(b0,b1)= 0.1549855754, Cov(b0,b2)=
-0.1559950883,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b1,b0)= 0.1549855754, Var(b1)= 0.0007051647, Cov(b1,b2)=
-0.0007036678,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b2,b0)= -0.1559950883, Cov(b2,b1)= -0.0007036678, Var(b2)=
0.0007155590,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
r(X5,X1|X4) square= 0.2457298149, test value= 324.8075162925
X4,X1,X2
The estimated line is X5=188.369379+0.353744*X4+0.340773*X1+0.303663*X2
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 3 1002.2759878036 334.0919959345 27434.8999909182
X4 1 996.5434228552
X1 1 4.3890986801
X2 1 1.3434662682
error 996 12.1289171115 0.0121776276
total 999 1014.4049049150
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept 188.3693785632 5.5708685591 33.81329
X4 0.3537444690 0.0293783731 12.04098
X1 0.3407726171 0.0287383399 11.85777
X2 0.3036631765 0.0289107990 10.50345
----------------------------------------------------------------------------------
MSE=0.0121776276 , R2=0.988043 , R2(adj)=0.988007
independent variable:X1 , sample mean= 100.0017783040 , sample variance=1.010565

Var(b0)= 31.0345765033, Cov(b0,b1)= 0.1466864756, Cov(b0,b2)=
-0.1343229739, Cov(b0,b3)= -0.0134448652,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b1,b0)= 0.1466864756, Var(b1)= 0.0008630888, Cov(b1,b2)=
-0.0004311410, Cov(b1,b3)= -0.0004361659,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b2,b0)= -0.1343229739, Cov(b2,b1)= -0.0004311410, Var(b2)=
0.0008258922, Cov(b2,b3)= -0.0003890001,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b3,b0)= -0.0134448652, Cov(b3,b1)= -0.0004361659, Cov(b3,b2)=
-0.0003890001, Var(b3)= 0.0008358343,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
r(X5,X2|X4,X1) square= 0.0997200147, test value= 110.3224954748
X4,X1,X2,X3
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 4 1002.8234887438 250.7058721860 21539.0189884775
X4 1 996.5434228552
X1 1 4.3890986801
X2 1 1.3434662682
X3 1 0.5475009402
error 995 11.5814161712 0.0116396142
total 999 1014.4049049150
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept 207.9314185232 6.1480819864 33.82053
X4 0.2832257042 0.0305070251 9.28395
X1 0.2681717999 0.0300243283 8.93182
X2 0.2406601653 0.0297202607 8.09751
X3 0.2076518605 0.0302769893 6.85841
----------------------------------------------------------------------------------
MSE=0.0116396142 , R2=0.988583 , R2(adj)=0.988537
Var(b0)= 37.7989121120, Cov(b0,b1)= 0.1108784652, Cov(b0,b2)=
-0.1585817368, Cov(b0,b3)= -0.0390525466, Cov(b0,b4)= 0.0863582216,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b1,b0)= 0.1108784652, Var(b1)= 0.0009306786, Cov(b1,b2)=
-0.0003032501, Cov(b1,b3)= -0.0003224420, Cov(b1,b4)= -0.0003113108,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b2,b0)= -0.1585817368, Cov(b2,b1)= -0.0003032501, Var(b2)=
0.0009014603, Cov(b2,b3)= -0.0002745713, Cov(b2,b4)= -0.0003205022,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b3,b0)= -0.0390525466, Cov(b3,b1)= -0.0003224420, Cov(b3,b2)=
-0.0002745713, Var(b3)= 0.0008832939, Cov(b3,b4)= -0.0002781319,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b4,b0)= 0.0863582216, Cov(b4,b1)= -0.0003113108, Cov(b4,b2)=
-0.0003205022, Cov(b4,b3)= -0.0002781319, Var(b4)= 0.0009166961,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
r(X5,X3|X4,X1,X2) square= 0.0451401337, test value= 47.0377221138
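The partial coefficient of determination and its test value printed above appear to follow from the ANOVA table by the usual extra-sum-of-squares arithmetic. The following is a minimal sketch in Python under that assumption; the function name partial_r2_and_test is hypothetical, and the inputs are the printed sum of squares for X3 (entered last), the error sum of squares and the error degrees of freedom.

```python
def partial_r2_and_test(ss_added, sse_full, df_error_full):
    """Partial r-square and partial F for the variable entered last.

    ss_added      : sequential sum of squares of the last variable
    sse_full      : error sum of squares of the model that includes it
    df_error_full : error degrees of freedom of that model
    """
    # Error sum of squares of the model without the last variable.
    sse_reduced = sse_full + ss_added
    partial_r2 = ss_added / sse_reduced
    # Partial F: added sum of squares (1 df) over the MSE of the full model.
    test_value = ss_added / (sse_full / df_error_full)
    return partial_r2, test_value


# Printed values for adding X3 after X4, X1, X2 (n = 1000).
partial_r2, test_value = partial_r2_and_test(0.5475009402, 11.5814161712, 995)
print(f"partial r-square = {partial_r2:.10f}, test value = {test_value:.4f}")
```

Up to rounding, the result agrees with the printed r(X5,X3|X4,X1,X2) square and test value, and the test value is also the square of the t value printed for X3.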
[ Multiple regression analysis ]

ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 4 1002.8234887438 250.7058721860 21539.0189884775
X4 1 996.5434228552
X1 1 4.3890986801
X2 1 1.3434662682
X3 1 0.5475009402
error 995 11.5814161712 0.0116396142
total 999 1014.4049049150
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept 207.9314185232 6.1480819864 33.82053
X4 0.2832257042 0.0305070251 9.28395
X1 0.2681717999 0.0300243283 8.93182
X2 0.2406601653 0.0297202607 8.09751
X3 0.2076518605 0.0302769893 6.85841
----------------------------------------------------------------------------------
MSE= 0.0116396142 , R2=0.988583 , R2(adj)=0.988537
Var(b0)= 37.7989121120, Cov(b0,b1)= 0.1108784652, Cov(b0,b2)=
-0.1585817368, Cov(b0,b3)= -0.0390525466, Cov(b0,b4)= 0.0863582216,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b1,b0)= 0.1108784652, Var(b1)= 0.0009306786, Cov(b1,b2)=
-0.0003032501, Cov(b1,b3)= -0.0003224420, Cov(b1,b4)= -0.0003113108,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b2,b0)= -0.1585817368, Cov(b2,b1)= -0.0003032501, Var(b2)=
0.0009014603, Cov(b2,b3)= -0.0002745713, Cov(b2,b4)= -0.0003205022,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b3,b0)= -0.0390525466, Cov(b3,b1)= -0.0003224420, Cov(b3,b2)=
-0.0002745713, Var(b3)= 0.0008832939, Cov(b3,b4)= -0.0002781319,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b4,b0)= 0.0863582216, Cov(b4,b1)= -0.0003113108, Cov(b4,b2)=
-0.0003205022, Cov(b4,b3)= -0.0002781319, Var(b4)= 0.0009166961,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

(31.2) n = 100,000,000, which is big data.

r(X5,X1)=0.990000,r(X5,X2)=0.990002,r(X5,X3)=0.990000,r(X5,X4)=0.990003,
r(X1,X2)=0.989998,r(X1,X3)=0.989998,r(X1,X4)=0.990002,r(X2,X3)=0.989999,
r(X2,X4)=0.990001,r(X3,X4)=0.990002,

ANOVA
----------------------------------------------------------------------------------
Source df SS MS
----------------------------------------------------------------------------------
Regression 4 98755044.7592373790 24688761.1898093450
error 99999995 1249190.6841565552 0.0124919075
total 99999999 100004235.4433939300
----------------------------------------------------------------------------------
F test statistic=1976380409.2119820000
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept 209.9307396412 0.0196175495 10701.17037 0.00000
X1 0.2492945867 0.0000968252 2574.68747 0.00000
X2 0.2494729046 0.0000968258 2576.51194 0.00000
X3 0.2492945247 0.0000968282 2574.60603 0.00000
X4 0.2494228352 0.0000968482 2575.39951 0.00000
----------------------------------------------------------------------------------
MSE=0.0124919075 , R2=0.987509 , R2(adj)=0.987509
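The summary line above can be checked against the ANOVA table. The following is a minimal sketch in Python, assuming the standard definitions (MSE = SSE/df, F = MSR/MSE, R2 = SSR/SST, and the usual degrees-of-freedom adjustment for R2(adj)); the function name anova_summary is hypothetical.

```python
def anova_summary(ss_regression, ss_error, df_regression, df_error):
    """Recompute MSE, F, R2 and adjusted R2 from an ANOVA table."""
    ms_regression = ss_regression / df_regression
    mse = ss_error / df_error
    ss_total = ss_regression + ss_error
    df_total = df_regression + df_error
    r2 = ss_regression / ss_total
    # Adjusted R2 compares mean squares instead of sums of squares.
    r2_adj = 1.0 - mse / (ss_total / df_total)
    return {"MSE": mse, "F": ms_regression / mse, "R2": r2, "R2_adj": r2_adj}


# Printed sums of squares for n = 100,000,000 with four independent variables.
summary = anova_summary(98755044.7592373790, 1249190.6841565552, 4, 99999995)
for name, value in summary.items():
    print(name, value)
```

With n this large, R2 and R2(adj) agree to six decimals, which is what the printed line shows.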

Var(b0)= 0.0003848482, Cov(b0,b1)= -0.0000016228, Cov(b0,b2)= -0.0000003739,
Cov(b0,b3)= 0.0000008750, Cov(b0,b4)= 0.0000011256,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b1,b0)= -0.0000016228, Var(b1)= 0.0000000094, Cov(b1,b2)= -0.0000000031,
Cov(b1,b3)= -0.0000000031, Cov(b1,b4)= -0.0000000031,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b2,b0)= -0.0000003739, Cov(b2,b1)= -0.0000000031, Var(b2)= 0.0000000094,
Cov(b2,b3)= -0.0000000031, Cov(b2,b4)= -0.0000000031,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b3,b0)= 0.0000008750, Cov(b3,b1)= -0.0000000031, Cov(b3,b2)= -0.0000000031, Var(b3)=
0.0000000094, Cov(b3,b4)= -0.0000000031,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b4,b0)= 0.0000011256, Cov(b4,b1)= -0.0000000031, Cov(b4,b2)= -0.0000000031, Cov(b4,b3)=
-0.0000000031, Var(b4)= 0.0000000094,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ individual test ~~~~~~~~~~~~~~~~~~
intercept 209.9307396412 0.0196175495 10701.1704 114515047.2704
X1 slope 0.2492945867 0.0000968252 2574.6875 6629015.5833
X2 slope 0.2494729046 0.0000968258 2576.5119 6638413.7554
X3 slope 0.2492945247 0.0000968282 2574.6060 6628596.2208
X4 slope 0.2494228352 0.0000968482 2575.3995 6632682.6111
====================
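In the individual test block above, the third numeric column appears to be the t value (estimate divided by its standard error) and the last column its square. The following is a minimal Python sketch under that reading; the dictionary name rows is hypothetical and the numbers are the printed estimates and standard errors.

```python
# (estimate, standard error) as printed for n = 100,000,000.
rows = {
    "intercept": (209.9307396412, 0.0196175495),
    "X1 slope": (0.2492945867, 0.0000968252),
    "X2 slope": (0.2494729046, 0.0000968258),
    "X3 slope": (0.2492945247, 0.0000968282),
    "X4 slope": (0.2494228352, 0.0000968482),
}

# t value = estimate / standard error; its square is an F value with 1 df.
tests = {}
for name, (estimate, std_error) in rows.items():
    t_value = estimate / std_error
    tests[name] = (t_value, t_value ** 2)

for name, (t_value, t_square) in tests.items():
    print(f"{name:10s} t = {t_value:12.4f}  t^2 = {t_square:16.4f}")
```

Small differences from the printed columns are expected because the printed standard errors are rounded.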

class   lower limit  upper limit  observed no    probability  expected no    chi square
[ 1 ]   -inf         -0.18385     4999811.00000  0.05000      5000000.00000  0.00714
[ 2 ]   -0.18385     -0.14324     4998184.00000  0.05000      5000000.00000  0.65957
[ 3 ]   -0.14324     -0.11584     5003803.00000  0.05000      5000000.00000  2.89256
[ 4 ]   -0.11584     -0.09406     5000909.00000  0.05000      5000000.00000  0.16526
[ 5 ]   -0.09406     -0.07538     4996858.00000  0.05000      5000000.00000  1.97443
[ 6 ]   -0.07538     -0.05861     4997815.00000  0.05000      5000000.00000  0.95485
[ 7 ]   -0.05861     -0.04306     5005298.00000  0.05000      5000000.00000  5.61376
[ 8 ]   -0.04306     -0.02834     4989640.00000  0.05000      5000000.00000  21.46592
[ 9 ]   -0.02834     -0.01404     5010527.00000  0.05000      5000000.00000  22.16355
[ 10 ]  -0.01404     -0.00003     4985638.00000  0.05000      5000000.00000  41.25341
[ 11 ]  -0.00003     0.01402      5005109.00000  0.05000      5000000.00000  5.22038
[ 12 ]  0.01402      0.02831      5006413.00000  0.05000      5000000.00000  8.22531
[ 13 ]  0.02831      0.04306      4998023.00000  0.05000      5000000.00000  0.78171
[ 14 ]  0.04306      0.05861      5001070.00000  0.05000      5000000.00000  0.22898
[ 15 ]  0.05861      0.07538      4997432.00000  0.05000      5000000.00000  1.31892
[ 16 ]  0.07538      0.09406      5003777.00000  0.05000      5000000.00000  2.85315
[ 17 ]  0.09406      0.11583      4999022.00000  0.05000      5000000.00000  0.19130
[ 18 ]  0.11583      0.14323      5001172.00000  0.05000      5000000.00000  0.27472
[ 19 ]  0.14323      0.18384      5000166.00000  0.05000      5000000.00000  0.00551
[ 20 ]  0.18384      +inf         4999333.00000  0.05000      5000000.00000  0.08898
p-value=0.000000
Z=-0.712975, p-value=0.238000
Z=-0.712975, p-value=0.762000
Z=-0.712975, p-value=0.476000

t=2,3,...,100000000
D.W. test=1.999732
D.W. test=2.000268

[Figure: the joint probability distribution of the X5 estimated line and the residual]
[Figure: the joint probability distribution of the X5 estimated line and X5]
sample mean(X5 estimated value)= 180.0000,
sample variance(X5 estimated value)= 0.9876
sample mean(residual)= -0.0000, sample variance(residual)= 0.0125,
sample cov(X5 estimated value,residual)= -0.0000,
X5 estimated value and residual sample correlation coefficient=-0.0000.
sample variance(X5 estimated value)= 0.9876,
sample cov(X5 estimated value,X5)= 0.9876,
X5 estimated value and X5 sample correlation coefficient=0.9937.

Variance : 0.01249
S.D. : 0.11177
MAD : 0.08918
Range : 1.25194
Median : 0.00001
Q1 : -0.07538
Q2 : 0.00001
Q3 : 0.07539
IQR : 0.15077
C.V. : none
SLLN analysis, X0=residual and Normal(0, 0.01249),Note:X1~Normal(0, 0.01249),

(31.2.3) One of X1, X2, X3, X4, X5 is the dependent variable and the others are the
independent variables (refer to Chapter 7). This is multivariate analysis using the linear model.
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 4 98756340.1276181040 24689085.0319045260 1975734119.9261568000
error 99999995 1249615.7022571960 0.0124961576
total 99999999 100005955.8298753100
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept 109.9858552217 0.2375014503 463.09551 0.00000
X2 0.2492833210 0.0008663705 287.73292 0.00000
X3 0.2492686305 0.0008663542 287.72138 0.00000
X4 0.2495613558 0.0008664952 288.01240 0.00000
X5 0.2493798155 0.0008664593 287.81480 0.00000
----------------------------------------------------------------------------------
MSE=0.0124961576 , R2=0.987505 , R2(adj)=0.987505,C.V.= 0.0011178621

The estimated line is X2=-14.985366+0.249258*X1+0.249316*X3+0.249373*X4+0.249533*X5
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 4 98754235.4142759290 24688558.8535689820 1975891755.3997853000
error 99999995 1249489.3787410820 0.0124948944
total 99999999 100003724.7930170100
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept -14.9853659915 0.2567249482 -58.37129 0.00000
X1 0.2492580265 0.0008663266 287.71833 0.00000
X3 0.2493157421 0.0008663404 287.78033 0.00000
X4 0.2493731747 0.0008665356 287.78179 0.00000
X5 0.2495328720 0.0008664211 288.00414 0.00000
----------------------------------------------------------------------------------
MSE=0.0124948944 , R2=0.987506 , R2(adj)=0.987506,C.V.= 21818.2718360461

ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 4 98751088.0409933180 24687772.0102483290 1975745125.0907817000
error 99999995 1249542.2846975452 0.0124954235
total 99999999 100000630.3256908700
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept -139.8693654835 0.2245683476 -622.83651 0.00000
X1 0.2492539653 0.0008663287 287.71292 0.00000
X2 0.2493263735 0.0008663589 287.78647 0.00000
X4 0.2495145493 0.0008665043 287.95536 0.00000
X5 0.2493650829 0.0008664610 287.79723 0.00000
----------------------------------------------------------------------------------
MSE=0.0124954235 , R2=0.987505 , R2(adj)=0.987505,C.V.=-------

ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 4 98755604.8918029670 24688901.2229507420 1976723476.2189426000
error 99999995 1248980.9670156986 0.0124898103
total 99999999 100004585.8588186700
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept -164.8917723298 0.2105188956 -783.26353 0.00000
X1 0.2494341135 0.0008662743 287.93897 0.00000
X2 0.2492713225 0.0008663586 287.72302 0.00000
X3 0.2494020045 0.0008663088 287.89041 0.00000
X5 0.2493808935 0.0008664444 287.82099 0.00000
----------------------------------------------------------------------------------
MSE=0.0124898103 , R2=0.987511 , R2(adj)=0.987511,C.V.=-------
There are 5 random variables, X1,…,X5, and any one of them can be the dependent variable,
because the multivariate normal distribution is a joint probability distribution.
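To illustrate this symmetry, the standardized slopes and R2 for each choice of dependent variable can be computed from the sample correlation matrix alone. The following is a minimal sketch in Python, assuming the correlations printed in (31.2) and variances close to 1, so that standardized and raw slopes nearly agree. The names CORR, corr, solve and regress_on_others are hypothetical, and because the printed correlations are rounded to six decimals the results only approximate the tables.

```python
# Sample correlations printed in (31.2), keyed by (i, j) with i < j.
CORR = {
    (1, 2): 0.989998, (1, 3): 0.989998, (1, 4): 0.990002, (1, 5): 0.990000,
    (2, 3): 0.989999, (2, 4): 0.990001, (2, 5): 0.990002,
    (3, 4): 0.990002, (3, 5): 0.990000,
    (4, 5): 0.990003,
}


def corr(i, j):
    """Correlation between Xi and Xj (1 on the diagonal)."""
    return 1.0 if i == j else CORR[(min(i, j), max(i, j))]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def regress_on_others(dep):
    """Standardized slopes and R2 when X<dep> is regressed on the other four."""
    predictors = [v for v in range(1, 6) if v != dep]
    rxx = [[corr(i, j) for j in predictors] for i in predictors]
    rxy = [corr(i, dep) for i in predictors]
    slopes = solve(rxx, rxy)
    r2 = sum(s * r for s, r in zip(slopes, rxy))
    return predictors, slopes, r2


results = {dep: regress_on_others(dep) for dep in range(1, 6)}
for dep, (predictors, slopes, r2) in results.items():
    print(f"X{dep}:", [round(s, 4) for s in slopes], round(r2, 4))
```

Every choice of dependent variable gives slopes near 0.249 and R2 near 0.9875, in line with the five ANOVA tables; only the intercepts differ, because they depend on the means.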
6.3. The probability distributions of the independent variables and the error
are not normal; the other assumptions are
unchanged.
Example 32.
X1 ~ Arcsin(µ = 100, c = 10), X2 ~ Double_exponential(λ = 0.1, µ = 50),
X3 ~ Semi_circle(µ = 100, R = 10), X4 ~ Logistic(µ = 100, σ = 10),
X5 ~ Gamma(α = 50, β = 2), X6 ~ U_quadratic(a = 90, b = 110),
X1, X2, ..., X6 are independent random variables.
X7 = 1 + 2X1 + 3X2 + 4X3 + 5X4 + 6X5 + 7X6 + ε, ε ~ Raised_secant(0, s = 5).

(32.1.1) Linear model analysis. The ANOVA F test and the individual test p-values are nonsense,
because the probability distribution of the error is not normal.
Independent variables are X1,X2,X3,X4,X5,X6
r(X7,X1)=0.172080,r(X7,X2)=0.391786,r(X7,X3)=0.185324,r(X7,X4)=0.389410,
r(X7,X5)=0.691354,r(X7,X6)=0.432117,r(X1,X2)=-0.031192,r(X1,X3)=0.014053,
r(X1,X4)=-0.018977,r(X1,X5)=0.079279,r(X1,X6)=0.048505,r(X2,X3)=0.017823,
r(X2,X4)=0.027734,r(X2,X5)=-0.009630,r(X2,X6)=0.071402,r(X3,X4)=0.016840,
r(X3,X5)=0.009900,r(X3,X6)=0.008429,r(X4,X5)=0.015745,r(X4,X6)=-0.030661,
r(X5,X6)=-0.025705,
The estimated line is

X7=1.725619+2.003624*X1+3.001740*X2+3.990032*X3+5.005397*X4+5.999391*X5
+6.992869*X6
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 6 15771369.4369300980 2628561.5728216828 835131.7203478662
error 993 3125.4490497914 3.1474814197
total 999 15774494.8859798890
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept 1.7256189342 1.6720512567 1.03204 0.30200
X1 2.0036237968 0.0079792685 251.10369 0.00000
X2 3.0017403101 0.0037608884 798.14661 0.00000
X3 3.9900315733 0.0111098385 359.14398 0.00000
X4 5.0053974070 0.0058727120 852.31447 0.00000
X5 5.9993912365 0.0039025193 1537.31237 0.00000
X6 6.9928693758 0.0073136456 956.14004 0.00000
----------------------------------------------------------------------------------
MSE=3.1474814197 , R2=0.999802 , R2(adj)=0.999801

Var(b0)= 2.7957554051, Cov(b0,b1)= -0.0058399172, Cov(b0,b2)= -0.0004678744,
Cov(b0,b3)= -0.0119699807, Cov(b0,b4)= -0.0034932829,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b0,b5)= -0.0012814815, Cov(b0,b6)= -0.0051012466,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b1,b0)= -0.0058399172, Var(b1)= 0.0000636687, Cov(b1,b2)= 0.0000010189,
Cov(b1,b3)= -0.0000012222, Cov(b1,b4)= 0.0000008435,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b1,b5)= -0.0000025087, Cov(b1,b6)= -0.0000030552,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b2,b0)= -0.0004678744, Cov(b2,b1)= 0.0000010189, Var(b2)= 0.0000141443,
Cov(b2,b3)= -0.0000007230, Cov(b2,b4)= -0.0000006453,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b2,b5)= 0.0000000832, Cov(b2,b6)= -0.0000020238,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b3,b0)= -0.0119699807, Cov(b3,b1)= -0.0000012222, Cov(b3,b2)= -0.0000007230, Var(b3)=
0.0001234285, Cov(b3,b4)= -0.0000010889,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b3,b5)= -0.0000003843, Cov(b3,b6)= -0.0000005872,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b4,b0)= -0.0034932829, Cov(b4,b1)= 0.0000008435, Cov(b4,b2)= -0.0000006453, Cov(b4,b3)=
-0.0000010889, Var(b4)= 0.0000344887,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b4,b5)= -0.0000003768, Cov(b4,b6)= 0.0000013522,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b5,b0)= -0.0012814815, Cov(b5,b1)= -0.0000025087, Cov(b5,b2)= 0.0000000832, Cov(b5,b3)=
-0.0000003843, Cov(b5,b4)= -0.0000003768,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Var(b5)= 0.0000152297, Cov(b5,b6)= 0.0000008208,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b6,b0)= -0.0051012466, Cov(b6,b1)= -0.0000030552, Cov(b6,b2)= -0.0000020238, Cov(b6,b3)=
-0.0000005872, Cov(b6,b4)= 0.0000013522,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b6,b5)= 0.0000008208, Var(b6)= 0.0000534894,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ individual test ~~~~~~~~~~~~~~~~~~~~~~~~~
intercept 1.7256189342 1.6720512567 1.0320 1.0651
X1 slope 2.0036237968 0.0079792685 251.1037 63053.0649
X2 slope 3.0017403101 0.0037608884 798.1466 637038.0070
X3 slope 3.9900315733 0.0111098385 359.1440 128984.3952
X4 slope 5.0053974070 0.0058727120 852.3145 726439.9525
X5 slope 5.9993912365 0.0039025193 1537.3124 2363329.3080
X6 slope 6.9928693758 0.0073136456 956.1400 914203.7695
====================
class   lower limit  upper limit  observed no  probability  expected no  chi square
[ 1 ]   -inf         -2.27370     113.00000    0.10000      100.00000    1.69000
[ 2 ]   -2.27370     -1.49312     96.00000     0.10000      100.00000    0.16000
[ 3 ]   -1.49312     -0.93031     112.00000    0.10000      100.00000    1.44000
[ 4 ]   -0.93031     -0.44941     96.00000     0.10000      100.00000    0.16000
[ 5 ]   -0.44941     0.00004      91.00000     0.10000      100.00000    0.81000
[ 6 ]   0.00004      0.44943      71.00000     0.10000      100.00000    8.41000
[ 7 ]   0.44943      0.93031      101.00000    0.10000      100.00000    0.01000
[ 8 ]   0.93031      1.49237      107.00000    0.10000      100.00000    0.49000
[ 9 ]   1.49237      2.27352      96.00000     0.10000      100.00000    0.16000
[ 10 ]  2.27352      +inf         117.00000    0.10000      100.00000    2.89000
degree of freedom=8
p-value=0.039300
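The chi-square statistic and p-value above can be recomputed from the observed and expected counts. The following is a minimal sketch in Python using only the standard library; the function names are hypothetical, and the upper-tail formula used here is the closed form that holds only for an even number of degrees of freedom (it would not apply to the later tests with 7 degrees of freedom).

```python
import math


def chi_square_gof(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over the classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_sf_even_df(x, df):
    """Upper-tail probability P(chi-square > x); closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    # Sum of half^k / k! for k = 0 .. df/2 - 1.
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


# Observed class counts of the residuals (n = 1000, ten equal-probability classes).
observed = [113, 96, 112, 96, 91, 71, 101, 107, 96, 117]
expected = [100.0] * 10

statistic = chi_square_gof(observed, expected)
p_value = chi_square_sf_even_df(statistic, 8)
print(f"chi-square = {statistic:.2f}, p-value = {p_value:.4f}")
```

The statistic is the sum of the printed chi square row, and the p-value agrees with the printed one to about four decimals.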
Z=-0.624833, p-value=0.266100
Z=-0.624833, p-value=0.733900
Z=-0.624833, p-value=0.532200
t=2,3,...,1000
D.W. test=1.968793
D.W. test=2.031207

[Figures: scatter diagram; (X5 estimated line, X5) scatter diagram]
(32.1.2) Residual analysis
class  interval             midpoint  frequency  relative frequency  cumulative relative frequency
[ 1 ]  -4.54698~ -3.50989   -4.02843  15.00000   0.0150000           0.0150000
[ 2 ]  -3.50989~ -2.47280   -2.99134  75.00000   0.0750000           0.0900000
[ 3 ]  -2.47280~ -1.43571   -1.95425  129.00000  0.1290000           0.2190000
[ 4 ]  -1.43571~ -0.39862   -0.91716  210.00000  0.2100000           0.4290000
[ 5 ]  -0.39862~ 0.63848    0.11993   191.00000  0.1910000           0.6200000
[ 6 ]  0.63848~ 1.67557     1.15702   189.00000  0.1890000           0.8090000
[ 7 ]  1.67557~ 2.71266     2.19411   121.00000  0.1210000           0.9300000
[ 8 ]  2.71266~ 3.74975     3.23120   61.00000   0.0610000           0.9910000
[ 9 ]  3.74975~ 4.78684     4.26829   9.00000    0.0090000           1.0000000
X0= residual,goodness of fit( the best parameters)
mu point estimated value=-0.000000 (MLE), sigma point estimated value=1.769665 (MLE)
mu value from -0.353933 to 0.353933, sigma value from 1.474720 to 2.212081
class   lower limit  upper limit  observed no  probability  expected no  chi square
[ 1 ]   -inf         -2.44514     91.00000     0.10000      100.00000    0.81000
[ 2 ]   -2.44514     -1.61785     99.00000     0.10000      100.00000    0.01000
[ 3 ]   -1.61785     -1.02136     106.00000    0.10000      100.00000    0.36000
[ 4 ]   -1.02136     -0.51170     107.00000    0.10000      100.00000    0.49000
[ 5 ]   -0.51170     -0.03535     96.00000     0.10000      100.00000    0.16000
[ 6 ]   -0.03535     0.44093      80.00000     0.10000      100.00000    4.00000
[ 7 ]   0.44093      0.95058      108.00000    0.10000      100.00000    0.64000
[ 8 ]   0.95058      1.54628      106.00000    0.10000      100.00000    0.36000
[ 9 ]   1.54628      2.37416      104.00000    0.10000      100.00000    0.16000
[ 10 ]  2.37416      +inf         103.00000    0.10000      100.00000    0.09000
degree of freedom=7
H0: X0~Normal(mu=-0.035393,sigma*sigma=3.535410), sigma=1.880269
p-value=0.420500
(32.1.3) Checking the probability distribution of each random variable

X1 goodness of fit( the best parameters)
c point estimated value=9.999926
mu value from 98.266226 to 102.266197
c value from 8.333271 to 12.499907
class   lower limit  upper limit  observed no  probability  expected no  chi square
[ 1 ]   90.00014     90.51572     98.00000     0.10000      100.00000    0.04000
[ 2 ]   90.51572     91.93610     87.00000     0.10000      100.00000    1.69000
[ 3 ]   91.93610     94.14840     86.00000     0.10000      100.00000    1.96000
[ 4 ]   94.14840     96.93607     123.00000    0.10000      100.00000    5.29000
[ 5 ]   96.93607     100.02621    102.00000    0.10000      100.00000    0.04000
[ 6 ]   100.02621    103.11636    87.00000     0.10000      100.00000    1.69000
[ 7 ]   103.11636    105.90402    104.00000    0.10000      100.00000    0.16000
[ 8 ]   105.90402    108.11632    104.00000    0.10000      100.00000    0.16000
[ 9 ]   108.11632    109.53671    107.00000    0.10000      100.00000    0.49000
[ 10 ]  109.53671    110.00000    102.00000    0.10000      100.00000    0.04000
degree of freedom=7
H0: X1~Arcsin(mu=100.026213,c=9.999926),
p-value=0.115900
lamda point estimated value=0.094262 (MLE), mu point estimated value=49.359194 (MLE)
lamda value from 5.304350 to 21.217400, mu value from 49.092580 to 49.625808
class   lower limit  upper limit  observed no  probability  expected no  chi square
[ 1 ]   -inf         32.08204     101.00000    0.10000      100.00000    0.01000
[ 2 ]   32.08204     39.47238     94.00000     0.10000      100.00000    0.36000
[ 3 ]   39.47238     43.79546     105.00000    0.10000      100.00000    0.25000
[ 4 ]   43.79546     46.86272     99.00000     0.10000      100.00000    0.01000
[ 5 ]   46.86272     49.24188     96.00000     0.10000      100.00000    0.16000
[ 6 ]   49.24188     51.62104     97.00000     0.10000      100.00000    0.09000
[ 7 ]   51.62104     54.68831     96.00000     0.10000      100.00000    0.16000
[ 8 ]   54.68831     59.01138     108.00000    0.10000      100.00000    0.64000
[ 9 ]   59.01138     66.40173     107.00000    0.10000      100.00000    0.49000
[ 10 ]  66.40173     +inf         97.00000     0.10000      100.00000    0.09000
degree of freedom=7
H0: X2~Double exponential(lamda=0.093791,mu=49.241884),
p-value=0.944000

R point estimated value=9.932799
mu value from 97.943828 to 101.916948
R value from 8.277333 to 12.415999
class   lower limit  upper limit  observed no  probability  expected no  chi square
[ 1 ]   90.05079     93.23084     102.00000    0.10000      100.00000    0.04000
[ 2 ]   93.23084     95.20198     114.00000    0.10000      100.00000    1.96000
[ 3 ]   95.20198     96.94078     100.00000    0.10000      100.00000    0.00000
[ 4 ]   96.94078     98.57610     119.00000    0.10000      100.00000    3.61000
[ 5 ]   98.57610     100.16901    91.00000     0.10000      100.00000    0.81000
[ 6 ]   100.16901    101.76190    95.00000     0.10000      100.00000    0.25000
[ 7 ]   101.76190    103.39753    98.00000     0.10000      100.00000    0.04000
[ 8 ]   103.39753    105.13430    85.00000     0.10000      100.00000    2.25000
[ 9 ]   105.13430    107.10694    98.00000     0.10000      100.00000    0.04000
[ 10 ]  107.10694    109.91638    98.00000     0.10000      100.00000    0.04000
degree of freedom=7
H0: X3~Semi-circle(mu=100.168775,R=10.098346),
p-value=0.249700
sigma point estimated value=5.276446
mu value from 99.028774 to 101.139352
class   lower limit  upper limit  observed no  probability  expected no  chi square
[ 1 ]   -inf         88.53504     99.00000     0.10000      100.00000    0.01000
[ 2 ]   88.53504     92.67281     102.00000    0.10000      100.00000    0.04000
[ 3 ]   92.67281     95.42304     94.00000     0.10000      100.00000    0.36000
[ 4 ]   95.42304     97.67749     98.00000     0.10000      100.00000    0.04000
[ 5 ]   97.67749     99.74637     100.00000    0.10000      100.00000    0.00000
[ 6 ]   99.74637     101.81525    112.00000    0.10000      100.00000    1.44000
[ 7 ]   101.81525    104.06971    85.00000     0.10000      100.00000    2.25000
[ 8 ]   104.06971    106.81993    99.00000     0.10000      100.00000    0.01000
[ 9 ]   106.81993    110.95770    106.00000    0.10000      100.00000    0.36000
[ 10 ]  110.95770    +inf         105.00000    0.10000      100.00000    0.25000
degree of freedom=7
H0: X4~Logistic(mu=99.746370,sigma=5.102497),
p-value=0.689200

alpha point estimated value=48.000000 (MME), beta point estimated value=2.084452 (MME)
alpha values are 47.500000, 48.000000 and 48.500000
beta value from 1.667561 to 2.501342
class   lower limit  upper limit  observed no  probability  expected no  chi square
[ 1 ]   61.68168     82.37916     106.00000    0.10000      100.00000    0.36000
[ 2 ]   82.37916     88.09066     96.00000     0.10000      100.00000    0.16000
[ 3 ]   88.09066     92.37442     104.00000    0.10000      100.00000    0.16000
[ 4 ]   92.37442     96.14428     115.00000    0.10000      100.00000    2.25000
[ 5 ]   96.14428     99.75897     95.00000     0.10000      100.00000    0.25000
[ 6 ]   99.75897     103.46107    100.00000    0.10000      100.00000    0.00000
[ 7 ]   103.46107    107.52195    100.00000    0.10000      100.00000    0.00000
[ 8 ]   107.52195    112.40086    96.00000     0.10000      100.00000    0.16000
[ 9 ]   112.40086    119.42506    94.00000     0.10000      100.00000    0.36000
[ 10 ]  119.42506    +inf         94.00000     0.10000      100.00000    0.36000
degree of freedom=7
H0: X5~Gamma(alpha=48.000000,beta=2.092789),
p-value=0.772800

a point estimated value=90.000357, b point estimated value=109.977695
a value from 89.996361 to 90.004352, b value from 109.973700 to 109.981691
class   lower limit  upper limit  observed no  probability  expected no  chi square
[ 1 ]   90.00036     90.71251     88.00000     0.10000      100.00000    1.44000
[ 2 ]   90.71251     91.56038     111.00000    0.10000      100.00000    1.21000
[ 3 ]   91.56038     92.62579     102.00000    0.10000      100.00000    0.04000
[ 4 ]   92.62579     94.14464     95.00000     0.10000      100.00000    0.25000
[ 5 ]   94.14464     100.24121    106.00000    0.10000      100.00000    0.36000
[ 6 ]   100.24121    105.82759    94.00000     0.10000      100.00000    0.36000
[ 7 ]   105.82759    107.34546    113.00000    0.10000      100.00000    1.69000
[ 8 ]   107.34546    108.40949    92.00000     0.10000      100.00000    0.64000
[ 9 ]   108.40949    109.25838    102.00000    0.10000      100.00000    0.04000
[ 10 ]  109.25838    109.97770    97.00000     0.10000      100.00000    0.09000
degree of freedom=7
H0: X6~U_quadratic(a=89.996361,b=109.974419),
p-value=0.525800

mu value from 2524.179319 to 2574.443080
class   lower limit  upper limit  observed no  probability  expected no  chi square
[ 1 ]   -inf         2385.07979   102.00000    0.10000      100.00000    0.04000
[ 2 ]   2385.07979   2441.28933   97.00000     0.10000      100.00000    0.09000
[ 3 ]   2441.28933   2481.81739   91.00000     0.10000      100.00000    0.81000
[ 4 ]   2481.81739   2516.44651   113.00000    0.10000      100.00000    1.69000
[ 5 ]   2516.44651   2548.81176   115.00000    0.10000      100.00000    2.25000
[ 6 ]   2548.81176   2581.17183   97.00000     0.10000      100.00000    0.09000
[ 7 ]   2581.17183   2615.79987   91.00000     0.10000      100.00000    0.81000
[ 8 ]   2615.79987   2656.27416   87.00000     0.10000      100.00000    1.69000
[ 9 ]   2656.27416   2712.52416   109.00000    0.10000      100.00000    0.81000
[ 10 ]  2712.52416   +inf         98.00000     0.10000      100.00000    0.04000
degree of freedom=7
p-value=0.305200
(32.1.4) The linear model stepwise analysis
Sort the independent variables by coefficient of determination, from largest to smallest:
r(X7,X5) square=0.477971,
r(X7,X6) square=0.186725,
r(X7,X2) square=0.153496,
r(X7,X4) square=0.151640,
r(X7,X3) square=0.034345,
r(X7,X1) square=0.029612
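The ordering above can be reproduced by squaring the sample correlations of X7 with each independent variable and sorting. The following is a minimal Python sketch that uses the correlations printed in (32.1.1); the variable names are hypothetical.

```python
# Sample correlations r(X7, Xk) as printed in (32.1.1).
r_with_x7 = {
    "X1": 0.172080,
    "X2": 0.391786,
    "X3": 0.185324,
    "X4": 0.389410,
    "X5": 0.691354,
    "X6": 0.432117,
}

# Coefficient of determination of each simple linear model X7 ~ Xk.
r_square = {name: r * r for name, r in r_with_x7.items()}

# Candidate insertion order: largest r-square first.
order = sorted(r_square, key=r_square.get, reverse=True)

for name in order:
    print(f"r(X7,{name}) square = {r_square[name]:.6f}")
```

The sorted names give the insertion order that the stepwise analysis then follows.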
Analysis process 1: [ The simple linear model analysis ]

The independent variables are: X5,X6
The independent variables are: X5,X2
The independent variables are: X5,X4
The independent variables are: X5,X3
The independent variables are: X5,X1
The independent variables are: X5,X6,X2
The independent variables are: X5,X6,X4
The independent variables are: X5,X6,X3
The independent variables are: X5,X6,X1
The independent variables are: X5,X6,X2,X4
The independent variables are: X5,X6,X2,X3
The independent variables are: X5,X6,X2,X1
The independent variables are: X5,X6,X2,X4,X3
The independent variables are: X5,X6,X2,X4,X1
The independent variables are: X5,X6,X2,X4,X3,X1
[ The stepwise analysis ]

The dependent variable is X7.
The insertion order of the independent variables is X5,X6,X2,X4,X3,X1.
X5
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 1 7539744.9039596841 7539744.9039596841 913.7697477860
error 998 8234749.9820202049 8251.2524869942
total 999 15774494.8859798890
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept 1947.5829634149 20.1120969927 96.83639
X5 6.0172627135 0.1990584350 30.22862
----------------------------------------------------------------------------------
MSE= 8251.2524869942 , R2=0.477971 , R2(adj)=0.477448
-------- Regression Coefficient Variance and Covariance Matrix ---------------
Var(b0)= 404.4964454450, Cov(b0,b1)= -3.9624389920,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b1,b0)= -3.9624389920, Var(b1)= 0.0396242605,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

X5,X6
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 2 10734608.1139184050 5367304.0569592025 1061.7703108833
X5 1 7539744.9039596915
X6 1 3194863.2099587135
error 997 5039886.7720614839 5055.0519278450
total 999 15774494.8859798890
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept 1204.0048687995 33.5059202032 35.93409
X5 6.1179821270 0.1558572424 39.25376
X6 7.3356449337 0.2917930735 25.13989
----------------------------------------------------------------------------------
MSE=5055.0519278450 , R2=0.680504 , R2(adj)=0.679863

Var(b0)= 1122.6466886637, Cov(b0,b1)= -2.5460494102, Cov(b0,b2)=
-8.6305454150,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b1,b0)= -2.5460494102, Var(b1)= 0.0242914800, Cov(b1,b2)=
0.0011690278,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b2,b0)= -8.6305454150, Cov(b2,b1)= 0.0011690278, Var(b2)=
0.0851431977,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

X5,X6,X2
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 3 12863423.2126139330 4287807.7375379773 1467.0392851062
X5 1 7539744.9039596915
X6 1 3194863.2099587135
X2 1 2128815.0986955278
error 996 2911071.6733659552 2922.7627242630
total 999 15774494.8859798890
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept 1092.0864015411 25.8127178293 42.30807
X5 6.1429847459 0.1185152471 51.83286
X6 6.9083264658 0.2224395406 31.05710
X2 3.0893593685 0.1144712010 26.98809
----------------------------------------------------------------------------------
MSE=2922.7627242630 , R2=0.815457 , R2(adj)=0.814901
Var(b0)= 666.2964017370, Cov(b0,b1)= -1.4759332404, Cov(b0,b2)=
-4.9244035158, Cov(b0,b3)= -0.4747071819,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b1,b0)= -1.4759332404, Var(b1)= 0.0140458638, Cov(b1,b2)=
0.0006612473, Cov(b1,b3)= 0.0001060497,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b2,b0)= -4.9244035158, Cov(b2,b1)= 0.0006612473, Var(b2)=
0.0494793492, Cov(b2,b3)= -0.0018124904,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b3,b0)= -0.4747071819, Cov(b3,b1)= 0.0001060497, Cov(b3,b2)=
-0.0018124904, Var(b3)= 0.0131036559,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
X5,X6,X2,X4
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 4 15158991.6238933490 3789747.9059733371 6126.3674763649
X5 1 7539744.9039596915
X6 1 3194863.2099587135
X2 1 2128815.0986955278
X4 1 2295568.4112794157
error 995 615503.2620865402 618.5962433031
total 999 15774494.8859798890
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept 579.9185677943 14.5501716217 39.85648
X5 6.0924561580 0.0545294774 111.72776
X6 7.1100981813 0.1023873294 69.44315
X2 2.9926379623 0.0526866266 56.80071
X4 5.0138705763 0.0823060270 60.91742
----------------------------------------------------------------------------------
MSE=618.5962433031 , R2=0.960981 , R2(adj)=0.960824
Var(b0)= 211.7074942222, Cov(b0,b1)= -0.3054042424, Cov(b0,b2)=
-1.0700867928, Cov(b0,b3)= -0.0871216228, Cov(b0,b4)= -0.6919942043,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b1,b0)= -0.3054042424, Var(b1)= 0.0029734639, Cov(b1,b2)=
0.0001372042, Cov(b1,b3)= 0.0000237622, Cov(b1,b4)= -0.0000682696,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b2,b0)= -1.0700867928, Cov(b2,b1)= 0.0001372042, Var(b2)=
0.0104831652, Cov(b2,b3)= -0.0003888685, Cov(b2,b4)= 0.0002726154,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-0.0003888685, Var(b3)= 0.0027758806, Cov(b3,b4)= -0.0001306811,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
0.0002726154, Cov(b4,b3)= -0.0001306811, Var(b4)= 0.0067742821,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
X5,X6,X2,X4,X3
X7=185.504969+6.078340*X5+7.089015*X6+2.969676*X2+4.978852*X4+4.028495*X3
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 5 15572911.0866016300 3114582.2173203258 15357.8548155408
X5 1 7539744.9039596915
X6 1 3194863.2099587135
X2 1 2128815.0986955278
X4 1 2295568.4112794157
X3 1 413919.4627082814
error 994 201583.7993782585 202.8006029962
total 999 15774494.8859798890
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept 185.5049693609 12.0674818975 15.37230
X5 6.0783397806 0.0312236784 194.67084
X6 7.0890153415 0.0586260935 120.91911
X2 2.9696759947 0.0301712293 98.42741
X4 4.9788516136 0.0471325959 105.63500
X3 4.0284947365 0.0891701501 45.17762
----------------------------------------------------------------------------------
MSE=202.8006029962 , R2=0.987221 , R2(adj)=0.987157
Var(b0)= 145.6241193453, Cov(b0,b1)= -0.0973958315, Cov(b0,b2)=
-0.3467431472, Cov(b0,b3)= -0.0241246995, Cov(b0,b4)= -0.2200961989,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b0,b5)= -0.7784811029,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b1,b0)= -0.0973958315, Var(b1)= 0.0009749181, Cov(b1,b2)=
0.0000451268, Cov(b1,b3)= 0.0000079490, Cov(b1,b4)= -0.0000221393,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b1,b5)= -0.0000278625,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b2,b0)= -0.3467431472, Cov(b2,b1)= 0.0000451268, Var(b2)=
0.0034370188, Cov(b2,b3)= -0.0001272495, Cov(b2,b4)= 0.0000897360,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b2,b5)= -0.0000416126,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-0.0001272495, Var(b3)= 0.0009103031, Cov(b3,b4)= -0.0000424485,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b3,b5)= -0.0000453216,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
0.0000897360, Cov(b4,b3)= -0.0000424485, Var(b4)= 0.0022214816,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b4,b5)= -0.0000691193,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-0.0000416126, Cov(b5,b3)= -0.0000453216, Cov(b5,b4)= -0.0000691193,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Var(b5)= 0.0079513157,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
r(X7,X3|X5,X6,X2,X4) square= 0.6724894703, test value= 2041.0169229919
X5,X6,X2,X4,X3,X1
X7=1.725619+5.999391*X5+6.992869*X6+3.001740*X2+5.005397*X4+3.990032*X3
+2.003624*X1
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 6 15771369.4369300980 2628561.5728216828 835131.7203478590
X5 1 7539744.9039596915
X6 1 3194863.2099587135
X2 1 2128815.0986955278
X4 1 2295568.4112794157
X3 1 413919.4627082814
X1 1 198458.3503284678
error 993 3125.4490497915 3.1474814197
total 999 15774494.8859798890
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept 1.7256189338 1.6720512567 1.03204
X5 5.9993912365 0.0039025193 1537.31237
X6 6.9928693758 0.0073136456 956.14004
X2 3.0017403101 0.0037608884 798.14661
X4 5.0053974070 0.0058727120 852.31447
X3 3.9900315733 0.0111098385 359.14398
X1 2.0036237968 0.0079792685 251.10369
----------------------------------------------------------------------------------
MSE= 3.1474814197 , R2=0.999802 , R2(adj)=0.999801
Var(b0)= 2.7957554051, Cov(b0,b1)= -0.0012814815, Cov(b0,b2)=
-0.0051012466, Cov(b0,b3)= -0.0004678744, Cov(b0,b4)= -0.0034932829,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b0,b5)= -0.0119699807, Cov(b0,b6)= -0.0058399172,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b1,b0)= -0.0012814815, Var(b1)= 0.0000152297, Cov(b1,b2)=
0.0000008208, Cov(b1,b3)= 0.0000000832, Cov(b1,b4)= -0.0000003768,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b1,b5)= -0.0000003843, Cov(b1,b6)= -0.0000025087,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b2,b0)= -0.0051012466, Cov(b2,b1)= 0.0000008208, Var(b2)=
0.0000534894, Cov(b2,b3)= -0.0000020238, Cov(b2,b4)= 0.0000013522,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b2,b5)= -0.0000005872, Cov(b2,b6)= -0.0000030552,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-0.0000020238, Var(b3)= 0.0000141443, Cov(b3,b4)= -0.0000006453,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b3,b5)= -0.0000007230, Cov(b3,b6)= 0.0000010189,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
0.0000013522, Cov(b4,b3)= -0.0000006453, Var(b4)= 0.0000344887,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b4,b5)= -0.0000010889, Cov(b4,b6)= 0.0000008435,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-0.0000005872, Cov(b5,b3)= -0.0000007230, Cov(b5,b4)= -0.0000010889,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Var(b5)= 0.0001234285, Cov(b5,b6)= -0.0000012222,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-0.0000030552, Cov(b6,b3)= 0.0000010189, Cov(b6,b4)= 0.0000008435,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b6,b5)= -0.0000012222, Var(b6)= 0.0000636687,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
r(X7,X1|X5,X6,X2,X4,X3) square= 0.9844955346, test value= 63053.0649313661
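The partial determination and the test value printed at each stepwise step appear to follow from the sums of squares in the ANOVA tables. The first is the extra sum of squares of the entering variable divided by the error SS of the previous model, and the second is the same extra sum of squares divided by the MSE of the enlarged model. The Python sketch below checks this reading against the printed numbers; the function name `partial_r2_and_test` is an assumption introduced here and is not part of the original program.

```python
def partial_r2_and_test(extra_ssr, sse_reduced, mse_full):
    """Return (partial r^2, 1-df partial F) for one variable entering the model."""
    partial_r2 = extra_ssr / sse_reduced   # share of the remaining error explained
    test_value = extra_ssr / mse_full      # extra SS relative to the new MSE
    return partial_r2, test_value

# X3 entering after X5,X6,X2,X4 (values copied from the output above)
r2_x3, f_x3 = partial_r2_and_test(413919.4627082814, 615503.2620865402, 202.8006029962)

# X1 entering after X5,X6,X2,X4,X3
r2_x1, f_x1 = partial_r2_and_test(198458.3503284678, 201583.7993782585, 3.1474814197)

print(round(r2_x3, 6), round(f_x3, 2))
print(round(r2_x1, 6), round(f_x1, 2))
```

Both pairs agree with the printed "square" and "test value" lines to the displayed precision.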
[ Multiple regression analysis ]

X7=1.725619+5.999391*X5+6.992869*X6+3.001740*X2+5.005397*X4+3.990032*X3
+2.003624*X1
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 6 15771369.4369300980 2628561.5728216828 835131.7203478590
X5 1 7539744.9039596915
X6 1 3194863.2099587135
X2 1 2128815.0986955278
X4 1 2295568.4112794157
X3 1 413919.4627082814
X1 1 198458.3503284678
error 993 3125.4490497915 3.1474814197
total 999 15774494.8859798890
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept 1.7256189338 1.6720512567 1.03204
X5 5.9993912365 0.0039025193 1537.31237
X6 6.9928693758 0.0073136456 956.14004
X2 3.0017403101 0.0037608884 798.14661
X4 5.0053974070 0.0058727120 852.31447
X3 3.9900315733 0.0111098385 359.14398
X1 2.0036237968 0.0079792685 251.10369
----------------------------------------------------------------------------------
MSE=3.1474814197 , R2=0.999802 , R2(adj)=0.999801
r(X7,X1|X5,X6,X2,X4,X3) square= 0.9844955346, test value= 63053.0649313661
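The summary figures under the ANOVA table (F, MSE, R2, adjusted R2) can be recomputed from the regression and error sums of squares and their degrees of freedom. The sketch below uses the standard textbook definitions; the function name `anova_summary` is hypothetical and the inputs are copied from the table above.

```python
def anova_summary(ssr, sse, df_reg, df_err):
    """Recompute MSR, MSE, F, R2 and adjusted R2 from an ANOVA table."""
    sst = ssr + sse                       # total sum of squares
    df_total = df_reg + df_err
    msr = ssr / df_reg
    mse = sse / df_err
    return {
        "MSR": msr,
        "MSE": mse,
        "F": msr / mse,
        "R2": ssr / sst,
        "R2_adj": 1.0 - mse / (sst / df_total),
    }

summary = anova_summary(ssr=15771369.4369300980, sse=3125.4490497915,
                        df_reg=6, df_err=993)
for name, value in summary.items():
    print(f"{name} = {value:.6f}")
```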
(32.2) Goodness of fit (the best parameters)

r(X7,X1)=0.117370,r(X7,X2)=0.350746,r(X7,X3)=0.165660,r(X7,X4)=0.375302,
r(X7,X5)=0.702273,r(X7,X6)=0.448480,r(X1,X2)=-0.000060,r(X1,X3)=0.000167,
r(X1,X4)=0.000135,r(X1,X5)=0.000304,r(X1,X6)=0.000086,r(X2,X3)=0.000551,
r(X2,X4)=-0.000252,r(X2,X5)=-0.000202,r(X2,X6)=-0.000408,r(X3,X4)=0.000080,
r(X3,X5)=-0.000191,r(X3,X6)=-0.000094,r(X4,X5)=0.000061,r(X4,X6)=-0.000309,
r(X5,X6)=-0.000132,
X7=0.986753+1.999921*X1+2.999941*X2+4.000169*X3+5.000047*X4+5.999984*X5
+7.000040*X6
ANOVA
----------------------------------------------------------------------------------
Source df SS MS
----------------------------------------------------------------------------------
Regression 6 1458978976184.2949000000 243163162697.3824800000
error 99999993 326648737.7526771400 3.2664876062
total 99999999 1459305624922.0476000000
----------------------------------------------------------------------------------
F test value=74441783350.7970430000,
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept 0.9867531718 0.0055751175 176.99236 0.00000
X1 1.9999213639 0.0000255605 78242.50800 0.00000
X2 2.9999412544 0.0000127842 234660.26463 0.00000
X3 4.0001689600 0.0000361412 110681.79038 0.00000
X4 5.0000470778 0.0000199242 250953.70286 0.00000
X5 5.9999843218 0.0000127805 469463.88122 0.00000
X6 7.0000395856 0.0000233334 300000.46358 0.00000
----------------------------------------------------------------------------------
MSE=3.2664876062 , R2=0.999776 , R2(adj)=0.999776
Var(b0)= 0.0000310819, Cov(b0,b1)= -0.0000000653, Cov(b0,b2)= -0.0000000082,
Cov(b0,b3)= -0.0000001306, Cov(b0,b4)= -0.0000000397,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b0,b5)= -0.0000000163, Cov(b0,b6)= -0.0000000545,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b1,b0)= -0.0000000653, Var(b1)= 0.0000000007, Cov(b1,b2)= 0.0000000000,
Cov(b1,b3)= -0.0000000000, Cov(b1,b4)= -0.0000000000,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b1,b5)= -0.0000000000, Cov(b1,b6)= -0.0000000000,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b2,b0)= -0.0000000082, Cov(b2,b1)= 0.0000000000, Var(b2)= 0.0000000002,
Cov(b2,b3)= -0.0000000000, Cov(b2,b4)= 0.0000000000,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b2,b5)= 0.0000000000, Cov(b2,b6)= 0.0000000000,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b3,b0)= -0.0000001306, Cov(b3,b1)= -0.0000000000, Cov(b3,b2)= -0.0000000000, Var(b3)=
0.0000000013, Cov(b3,b4)= -0.0000000000,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b3,b5)= 0.0000000000, Cov(b3,b6)= 0.0000000000,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-0.0000000000, Var(b4)= 0.0000000004,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b4,b5)= -0.0000000000, Cov(b4,b6)= 0.0000000000,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
0.0000000000, Cov(b5,b4)= -0.0000000000,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Var(b5)= 0.0000000002, Cov(b5,b6)= 0.0000000000,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
0.0000000000, Cov(b6,b4)= 0.0000000000,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b6,b5)= 0.0000000000, Var(b6)= 0.0000000005,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ individual test ~~~~~~~~~~~~~~~~~~~~
intercept 0.9867531718 0.0055751175 176.9924 31326.2939
X1 slope 1.9999213639 0.0000255605 78242.5080 6121890058.7773
X2 slope 2.9999412544 0.0000127842 234660.2646 55065439797.1507
X3 slope 4.0001689600 0.0000361412 110681.7904 12250458721.4729
X4 slope 5.0000470778 0.0000199242 250953.7029 62977760979.4029
X5 slope 5.9999843218 0.0000127805 469463.8812 220396335769.2494
X6 slope 7.0000395856 0.0000233334 300000.4636 90000278149.9821
====================
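The columns of the individual test are not labelled in the output. Numerically they behave like the coefficient, its standard error, the t value (coefficient divided by standard error) and the square of the t value, with the standard error equal to the square root of the matching Var(b) entry. The sketch below checks that reading for the intercept row; the variable names are assumptions made for this illustration.

```python
import math

# Intercept row and Var(b0), copied from the output above
coefficient = 0.9867531718
std_error = 0.0055751175
var_b0 = 0.0000310819

t_value = coefficient / std_error          # third column
t_squared = t_value ** 2                   # fourth column
se_from_variance = math.sqrt(var_b0)       # should reproduce the second column

print(f"t = {t_value:.4f}, t^2 = {t_squared:.4f}, sqrt(Var(b0)) = {se_from_variance:.10f}")
```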
-------------------------------------------------------------------------------------------------
class  lower limit  upper limit      observed no  probability      expected no      chi square
-------------------------------------------------------------------------------------------------
[ 1 ]         -inf     -2.97291    5051613.00000      0.05000    5000000.00000       532.78035
[ 2 ]     -2.97291     -2.31628    5971972.00000      0.05000    5000000.00000    188945.91376
[ 3 ]     -2.31628     -1.87318    5542443.00000      0.05000    5000000.00000     58848.88165
[ 4 ]     -1.87318     -1.52108    5224203.00000      0.05000    5000000.00000     10053.39704
[ 5 ]     -1.52108     -1.21899    4980733.00000      0.05000    5000000.00000        74.24346
[ 6 ]     -1.21899     -0.94773    4823755.00000      0.05000    5000000.00000      6212.46000
[ 7 ]     -0.94773     -0.69635    4697688.00000      0.05000    5000000.00000     18278.50907
[ 8 ]     -0.69635     -0.45830    4601644.00000      0.05000    5000000.00000     31737.50055
[ 9 ]     -0.45830     -0.22703    4563161.00000      0.05000    5000000.00000     38165.66238
[ 10 ]    -0.22703     -0.00041    4528589.00000      0.05000    5000000.00000     44445.66618
[ 11 ]    -0.00041      0.22670    4541139.00000      0.05000    5000000.00000     42110.68346
[ 12 ]     0.22670      0.45785    4572172.00000      0.05000    5000000.00000     36607.35952
[ 13 ]     0.45785      0.69634    4614681.00000      0.05000    5000000.00000     29694.14635
[ 14 ]     0.69634      0.94773    4690735.00000      0.05000    5000000.00000     19128.96805
[ 15 ]     0.94773      1.21892    4808916.00000      0.05000    5000000.00000      7302.61901
[ 16 ]     1.21892      1.52097    4991488.00000      0.05000    5000000.00000        14.49083
[ 17 ]     1.52097      1.87310    5225001.00000      0.05000    5000000.00000     10125.09000
[ 18 ]     1.87310      2.31610    5546665.00000      0.05000    5000000.00000     59768.52445
[ 19 ]     2.31610      2.97282    5974645.00000      0.05000    5000000.00000    189986.57521
[ 20 ]     2.97282         +inf    5048757.00000      0.05000    5000000.00000       475.44901
-------------------------------------------------------------------------------------------------
p-value=0.000000
Z=1.752560, p-value=0.960200
Z=1.752560, p-value=0.039800
Z=1.752560, p-value=0.079600
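The output prints the same Z value with three p-values and does not label them. The three numbers are consistent with the left-tail, right-tail and two-tail probabilities of a standard normal variable. The sketch below recomputes them with the standard library; the function name `normal_tail_probabilities` is an assumption introduced for this illustration.

```python
import math

def normal_tail_probabilities(z):
    """Return (left-tail, right-tail, two-tail) probabilities for a standard normal Z."""
    left = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))   # P(Z <= z)
    right = 1.0 - left                                   # P(Z >= z)
    two_tail = 2.0 * min(left, right)                    # P(|Z| >= |z|)
    return left, right, two_tail

left, right, two_tail = normal_tail_probabilities(1.752560)
print(f"left = {left:.4f}, right = {right:.4f}, two-tail = {two_tail:.4f}")
```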
t=2,3,...,100000000
D.W. test=1.999746
D.W. test=2.000254
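The two D.W. values printed above sum to 4, which is consistent with reporting the Durbin-Watson statistic d together with 4 - d, and the index t=2,3,... matches the range of the sum in the numerator. A minimal sketch of the usual Durbin-Watson formula follows; the residuals in the usage line are made-up illustrative numbers, not data from the book.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences over sum of squares."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e * e for e in residuals)
    return numerator / denominator

# Hypothetical residuals that alternate in sign (strong negative autocorrelation)
d = durbin_watson([1.0, -1.0, 1.0, -1.0])
print(d, 4.0 - d)
```

Values of d near 2, as in the output above, indicate little first-order autocorrelation in the residuals.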

[Figure: the probability distribution of X7 (two panels)]

sample cov(X7 estimated value,residual)= -0.0000,
X7 estimated value and residual sample correlation coefficient=-0.0000.
sample variance(X7 estimated value)= 14589.7898
(32.2.2)The marginal probability distribution of X7 estimated line

Variance : 14589.78991
S.D. : 120.78820
MAD : 96.40327
Range : 1405.47866
Mid_range : 2613.89889
Median : 2549.16206
Q1 : 2468.29156
Q2 : 2549.16206
Q3 : 2631.52458
IQR : 163.23303
C.V. : 0.04735
SLLN analysis, X0=residual and Normal(2551.02346,14589.78991),\

Note:X1~ Normal(2551.02346,14589.78991),
X1 is representable code of Normal(2551.02346,14589.78991),
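Several of the descriptive statistics listed above are linked by simple identities: S.D. is the square root of the variance, IQR is Q3 minus Q1, and C.V. is S.D. divided by the mean. The mean does not appear in the list itself, so the sketch below assumes it equals the first parameter of the Normal(2551.02346, 14589.78991) comparison distribution.

```python
import math

# Values copied from the list above; the mean is an assumption taken from
# the first parameter of Normal(2551.02346, 14589.78991).
variance = 14589.78991
q1, q3 = 2468.29156, 2631.52458
assumed_mean = 2551.02346

sd = math.sqrt(variance)   # S.D.
iqr = q3 - q1              # IQR
cv = sd / assumed_mean     # C.V.

print(f"S.D. = {sd:.5f}, IQR = {iqr:.5f}, C.V. = {cv:.5f}")
```

The same identity explains why C.V. is reported as "none" for the residual: its mean is essentially zero, so the ratio is not meaningful.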
(32.2.3) X0 = residual: the marginal probability distribution of the residual
Variance : 3.26649
S.D. : 1.80734
MAD : 1.48660
Range : 9.94034
Mid_range : 0.00275
Median : 0.00031
Q1 : -1.32379
Q2 : 0.00031
Q3 : 1.32410
IQR : 2.64788
C.V. : none
curve-fitting estimated the distribution function of X0(residual)

F(X)= 0.04279117781628622600+
0.06386796554038924600*(X- -3.08770295077597770000)^1+
0.02924626079175995900*(X- -3.08770295077597770000)^2+
0.00244352912279721670*(X- -3.08770295077597770000)^3+
-0.00080594470102637872*(X- -3.08770295077597770000)^4+
value range 0.0000000000<=F(x)<= 0.1000000000 ,
value range -4.9674217029<=X<= -2.4106031277 ,
determination=0.999999716367427130,

F(X)= 0.14852674053127812000+
0.13088301317834972000*(X- -2.00060364558644690000)^1+
0.02985757426370139200*(X- -2.00060364558644690000)^2+
-0.00162920872622507320*(X- -2.00060364558644690000)^3+
value range 0.1000000100<=F(x)<= 0.2000000000 ,
value range -2.4106031277<=X<= -1.6367992035 ,
determination=0.999999835366473190,

F(X)= 0.24930314274936044000+
0.16689713483229096000*(X- -1.32778208575397530000)^1+
0.02319621571956193000*(X- -1.32778208575397530000)^2+
-0.00168972013052481880*(X- -1.32778208575397530000)^3+
value range 0.2000000100<=F(x)<= 0.3000000000 ,
value range -1.6367992035<=X<= -1.0356911027 ,
determination=0.999999673153987080,

F(X)= 0.34965310517686132000+
0.18871044080936428000*(X- -0.76624954607540219000)^1+
0.01472384040379440300*(X- -0.76624954607540219000)^2+
-0.00829597787393865360*(X- -0.76624954607540219000)^3+
value range 0.3000000100<=F(x)<= 0.4000000000 ,
value range -1.0356910100<=X<= -0.5040456445 ,
determination=0.999999746017233290,
F(X)= 0.44987576559502285000+
0.19858376182936632000*(X- -0.25066119544383186000)^1+
0.00585919768037360120*(X- -0.25066119544383186000)^2+
-0.00523476762313990210*(X- -0.25066119544383186000)^3+
value range 0.4000000100<=F(x)<= 0.5000000000 ,
value range -0.5040456445<=X<= 0.0003084673 ,
determination=0.999999815607305110,

F(X)= 0.55011281142678792000+
0.19908476254435437000*(X-0.25100872492706289000)^1+
-0.00533519966687007190*(X-0.25100872492706289000)^2+
-0.00870999205344702430*(X-0.25100872492706289000)^3+
value range 0.5000000100<=F(x)<= 0.6000000000 ,
value range 0.0003084872<=X<= 0.5039975012 ,
determination=0.999999807275206650,

F(X)= 0.65035089249212519000+
0.18839848608373333000*(X-0.76615473814143320000)^1+
-0.01488542500843687000*(X-0.76615473814143320000)^2+
-0.00455041054919291810*(X-0.76615473814143320000)^3+
value range 0.6000000100<=F(x)<= 0.7000000000 ,
value range 0.5039975012<=X<= 1.0359382973 ,
determination=0.999999755719242160,

F(X)= 0.75068010461750023000+
0.16728580746042221000*(X-1.32824611297930170000)^1+
-0.02263394268283758200*(X-1.32824611297930170000)^2+
-0.00626591866013370690*(X-1.32824611297930170000)^3+
value range 0.7000000100<=F(x)<= 0.8000000000 ,
value range 1.0359382973<=X<= 1.6371473720 ,
determination=0.999999764087466050,

F(X)= 0.85147207185431228000+
0.13087397632475284000*(X-2.00057859839327580000)^1+
-0.02986037620957660000*(X-2.00057859839327580000)^2+
-0.00118279895714579200*(X-2.00057859839327580000)^3+
value range 0.8000000100<=F(x)<= 0.9000000000 ,
value range 1.6371475110<=X<= 2.4103371025 ,
determination=0.999999613804144700,

F(X)= 0.95718373271293644000+
0.06347187942155918500*(X-3.08701186261087730000)^1+
-0.02902312673994789100*(X-3.08701186261087730000)^2+
0.00362183346458078150*(X- 3.08701186261087730000)^3+
value range 0.9000000100<=F(x)<= 0.9999999900 ,
value range 2.4103371025<=X<= 4.9729221728 ,
determination=0.999990830055728750
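Each fitted piece above is a polynomial in (X - center) that is valid on its own X range, so the estimated distribution function can be evaluated by picking the matching piece and applying Horner's rule. The sketch below includes only the first two pieces to stay short; the names `SEGMENTS` and `fitted_cdf` are assumptions introduced here.

```python
# (x_low, x_high, center, coefficients a0, a1, ...) copied from the first two pieces above
SEGMENTS = [
    (-4.9674217029, -2.4106031277, -3.08770295077597770000,
     [0.04279117781628622600, 0.06386796554038924600, 0.02924626079175995900,
      0.00244352912279721670, -0.00080594470102637872]),
    (-2.4106031277, -1.6367992035, -2.00060364558644690000,
     [0.14852674053127812000, 0.13088301317834972000, 0.02985757426370139200,
      -0.00162920872622507320]),
]

def fitted_cdf(x, segments=SEGMENTS):
    """Evaluate the piecewise polynomial F(X) on the first piece whose range contains x."""
    for x_low, x_high, center, coeffs in segments:
        if x_low <= x <= x_high:
            dx = x - center
            value = 0.0
            for c in reversed(coeffs):   # Horner's rule, highest power first
                value = value * dx + c
            return value
    raise ValueError("x lies outside the pieces included in this sketch")

print(fitted_cdf(-3.0), fitted_cdf(-2.4106031277), fitted_cdf(-2.0))
```

At the shared boundary X = -2.4106031277 the first piece evaluates to roughly 0.1, matching the stated value range of F(x).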
(32.2.4) The marginal probability distributions of the random variables.

X1 marginal probability distribution
Variance : 49.99663
S.D. : 7.07083
MAD : 6.36589
Range : 20.00000
Mid_range : 100.00000
Median : 99.99818
Q1 : 92.92796
Q2 : 99.99818
Q3 : 107.06763
IQR : 14.13967
C.V. : 0.07071
SLLN analysis, X1 and Arcsin(100,10),Note:X2~ Arcsin(100,10), X2 is

representable code of Arcsin(100,10),
S.D. : 14.13733
MAD : 9.99860
Range : 314.45206
Median : 50.00008
Q1 : 43.07283
Q2 : 50.00008
Q3 : 56.93560
IQR : 13.86277
C.V. : 0.28274
SLLN analysis, X2 and DE(0.1,50),Note:X3~ DE(0.1,50),X3 is representable

code of DE(0.1,50),

Variance : 25.00788
S.D. : 5.00079
MAD : 4.24472
Range : 19.99959
Mid_range : 100.00003
Median : 100.00132
Q1 : 95.95938
Q2 : 100.00132
Q3 : 104.03906
IQR : 8.07968
C.V. : 0.05001
SLLN analysis, X3 and Semi circle(100,10),Note:X4~ Semi circle(100,10),X4 is
representable code of Semi circle(100,10),

Variance : 82.28490
S.D. : 9.07110
MAD : 6.93383
Range : 137.34176
Median : 100.00042
Q1 : 94.50637
Q2 : 100.00042
Q3 : 105.49555
IQR : 10.98918
C.V. : 0.09071
SLLN analysis, X4 and Logistic(100,5),Note:X5~ Logistic(100,5),X5 is
representable code of Logistic(100,5),
S.D. : 14.14141
MAD : 11.26371
Range : 153.08074
Mid_range : 116.88629
Median : 99.34229
Q1 : 90.13605
Q2 : 99.34229
Q3 : 109.14344
IQR : 19.00739
C.V. : 0.14141
SLLN analysis, X5 and Gamma(50,2),Note:X6~ Gamma(50,2),X6 is representable

code of Gamma(50,2),
X6 marginal probability distribution
Variance : 59.99624
S.D. : 7.74572
MAD : 7.49981
Range : 20.00000
Mid_range : 100.00000
Median : 99.49599
Q1 : 92.06262
Q2 : 99.49599
Q3 : 107.93659
IQR : 15.87397
C.V. : 0.07746
SLLN analysis, X6 and U-quadratic(90,110),Note:X7~ U-quadratic(90,110),X7 is

representable code of U-quadratic(90,110),
(32.2.5) The joint probability distribution of X7 and each independent variable.

f(x1,x7) f(x7,x1)

f(x2,x7) f(x7,x2)

f(x3,x7) f(x7,x3)

f(x4,x7) f(x7,x4)

f(x5,x7) f(x7,x5)

sample cov(X5,X7)= 1199.6967, X5 and X7 sample correlation coefficient=0.7023.
f(x6,x7) f(x7,x6)

(32.2.6) Multivariate analysis using the linear model (refer to Chapter 7).

Independent variables are X2,X3,X4,X5,X6,X7,
X1=1.120834+-1.475921*X2+-1.968012*X3+-2.459938*X4+-2.951889*X5+-3.443901*X6
+0.491983*X7
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 6 4919307391.8737688000 819884565.3122948400 1020315163.5287672000
error 99999993 80356005.4017818270 0.8035601103
total 99999999 4999663397.2755508000
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept 1.1208336970 0.0030826525 363.59391 0.00000
X2 -1.4759212214 0.0000221998 -66483.53662 0.00000
X3 -1.9680121731 0.0000344584 -57112.62611 0.00000
X4 -2.4599375336 0.0000367661 -66907.78196 0.00000
X5 -2.9518891217 0.0000426791 -69164.76974 0.00000
X6 -3.4439007133 0.0000507719 -67830.82138 0.00000
X7 0.4919832135 0.0000070145 70137.76990 0.00000
----------------------------------------------------------------------------------
MSE=0.8035601103 , R2=0.983928 , R2(adj)=0.983928, C.V.= 0.0089643188,

X2=-0.237706+-0.665445*X1+-1.330996*X3+-1.663694*X4+-1.996409*X5+-2.329164*X6
+0.332736*X7
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 6 19950183969.6544340000 3325030661.6090722000 9177578605.3239517000
error 99999993 36229931.3560557440 0.3622993389
total 99999999 19986413901.0104900000
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept -0.2377059175 0.0030849345 -77.05380 0.00000
X1 -0.6654452663 0.0000149064 -44641.49140 0.00000
X3 -1.3309958247 0.0000221106 -60197.06391 0.00000
X4 -1.6636944112 0.0000161308 -103137.67725 0.00000
X5 -1.9964087901 0.0000158033 -126328.44017 0.00000
X6 -2.3291637691 0.0000209387 -111237.03189 0.00000
X7 0.3327356069 0.0000023557 141245.07951 0.00000
----------------------------------------------------------------------------------
MSE=0.3622993389 , R2=0.998187 , R2(adj)=0.998187, C.V.= 0.0120378148,

X3=0.564897+-0.495910*X1+-0.743880*X2+-1.239838*X4+-1.487789*X5+-1.735767*X6
+0.247965*X7
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 6 2480539377.8867912000 413423229.6477985400 2041743914.3752115000
error 99999993 20248533.5108581890 0.2024853493
total 99999999 2500787911.3976493000
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept 0.5648965275 0.0030826321 183.25136 0.00000
X1 -0.4959101677 0.0000172975 -28669.47540 0.00000
X2 -0.7438797949 0.0000165297 -45002.67651 0.00000
X4 -1.2398378822 0.0000272264 -45538.09259 0.00000
X5 -1.4877885377 0.0000306965 -48467.65654 0.00000
X6 -1.7357674964 0.0000371645 -46704.93271 0.00000
X7 0.2479653129 0.0000049787 49805.00428 0.00000
----------------------------------------------------------------------------------
MSE=0.2024853493 , R2=0.991903 , R2(adj)=0.991903, C.V.= 0.0044998456,

X4=-0.038485+-0.399346*X1+-0.599032*X2+-0.798758*X3+-1.198083*X5+-1.397776*X6
+0.199681*X7
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 6 8215444625.9808950000 1369240770.9968159000 10496295470.1297400000
error 99999993 13044989.8161359350 0.1304499073
total 99999999 8228489615.7970314000
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept -0.0384853409 0.0030851689 -12.47431 0.00000
X1 -0.3993461255 0.0000148136 -26958.10317 0.00000
X2 -0.5990316805 0.0000096793 -61887.85966 0.00000
X3 -0.7987577144 0.0000218532 -36551.05508 0.00000
X5 -1.1980831163 0.0000149912 -79918.99507 0.00000
X6 -1.3977758379 0.0000201090 -69509.97265 0.00000
X7 0.1996810515 0.0000022030 90639.08156 0.00000
----------------------------------------------------------------------------------
MSE= 0.1304499073 , R2=0.998415 , R2(adj)=0.998415, C.V.= 0.0036117202

X5=-0.119023+-0.333170*X1+-0.499765*X2+-0.666394*X3+-0.832965*X4+-1.166147*X6
+0.166592*X7
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 6 19988869357.5217510000 3331478226.2536254000 36732729636.6581340000
error 99999993 9069508.3812269624 0.0906950902
total 99999999 19997938865.9029770000
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept -0.1190229107 0.0030849341 -38.58199 0.00000
X1 -0.3331696539 0.0000143383 -23236.33526 0.00000
X2 -0.4997648502 0.0000079069 -63206.13741 0.00000
X3 -0.6663944529 0.0000205440 -32437.47975 0.00000
X4 -0.8329653751 0.0000124999 -66637.70242 0.00000
X6 -1.1661473093 0.0000153193 -76122.60029 0.00000
X7 0.1665915151 0.0000011783 141381.98453 0.00000
----------------------------------------------------------------------------------
MSE=0.0906950902 , R2=0.999546 , R2(adj)=0.999546 ,C.V.= 0.0030114631

X6=-0.029766+-0.285384*X1+-0.428085*X2+-0.570815*X3+-0.713496*X4+-0.856185*X5
+0.142698*X7
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 6 5992964742.2179699000 998827457.0363283200 15000050782.4948750000
error 99999993 6658826.7040005112 0.0665882717
total 99999999 5999623568.9219704000
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept -0.0297658424 0.0030851657 -9.64805 0.00000
X1 -0.2853842400 0.0000146155 -19526.15614 0.00000
X2 -0.4280852131 0.0000089767 -47688.59117 0.00000
X3 -0.5708154020 0.0000213123 -26783.34505 0.00000
X4 -0.7134959238 0.0000143670 -49661.98808 0.00000
X5 -0.8561845389 0.0000131264 -65225.97779 0.00000
X7 0.1426977827 0.0000018433 77414.22985 0.00000
----------------------------------------------------------------------------------
MSE=0.0665882717 , R2=0.998890 , R2(adj)=0.998890, C.V.= 0.0025804938
The estimated line for X7 is
X7=0.986753+1.999921*X1+2.999941*X2+4.000169*X3+5.000047*X4
+5.999984*X5+7.000040*X6+error.
Solving this linear model for X1 gives

X1=-0.986753/1.999921+-2.999941/1.999921*X2+-4.000169/1.999921*X3
+-5.000047/1.999921*X4+-5.999984/1.999921*X5+-7.000040/1.999921*X6
+X7/1.999921-error/1.999921,
whereas the line estimated directly for X1 is
X1=1.120834+-1.475921*X2+-1.968012*X3+-2.459938*X4+-2.951889*X5
+-3.443901*X6+0.491983*X7.
X1,…,X6 are independent random variables.
The X1 line estimated directly differs from the X1 line obtained by converting
the X7 estimated line.
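The size of that difference can be seen by carrying out the conversion numerically and setting the result beside the directly fitted coefficients. The sketch below does this with the printed coefficients; the dictionary names `x7_line`, `x1_fitted` and `x1_converted` are assumptions made for this illustration.

```python
# Coefficients of the X7 estimated line (copied from above)
x7_line = {"intercept": 0.986753, "X1": 1.999921, "X2": 2.999941, "X3": 4.000169,
           "X4": 5.000047, "X5": 5.999984, "X6": 7.000040}

# Coefficients of the X1 line estimated directly (copied from above)
x1_fitted = {"intercept": 1.120834, "X2": -1.475921, "X3": -1.968012,
             "X4": -2.459938, "X5": -2.951889, "X6": -3.443901, "X7": 0.491983}

# Solve the X7 line for X1: move every other term across and divide by the X1 slope
b1 = x7_line["X1"]
x1_converted = {name: -value / b1 for name, value in x7_line.items() if name != "X1"}
x1_converted["X7"] = 1.0 / b1

for name in x1_fitted:
    print(f"{name:9s} fitted = {x1_fitted[name]: .6f}   converted = {x1_converted[name]: .6f}")
```

For these numbers every fitted slope is roughly 0.984 times the converted slope, which is close to the R2 of 0.983928 reported for the X1 regression, while the intercepts differ even in sign.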
6.4. Non-linear model, with the other assumptions unchanged
Example 33,
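\[
\begin{aligned}
X_1 &\sim \mathrm{Normal}\big(E(X_1) = 100,\ \mathrm{Var}(X_1) = 25\big),\\
X_2 \mid x_1 &\sim \mathrm{Normal}\big(E(X_2 \mid x_1) = 50 + 0.5x_1,\ \mathrm{Var}(X_2 \mid x_1) = 16\big),\\
X_3 \mid x_1, x_2 &\sim \mathrm{Normal}\big(E(X_3 \mid x_1, x_2) = 10 + 0.5x_1 + 0.5x_2,\ \mathrm{Var}(X_3 \mid x_1, x_2) = 12.25\big),\\
X_4 \mid x_1, x_2 &\sim \mathrm{Normal}\big(E(X_4 \mid x_1, x_2) = 5 + 0.7x_1 + 0.3x_2,\ \mathrm{Var}(X_4 \mid x_1, x_2) = 16\big),\\
\varepsilon &\sim \mathrm{Normal}\big(E(\varepsilon) = 0,\ \mathrm{Var}(\varepsilon) = 16\big),\\
X_5 &= 1 + 2X_1 + 3\cos(X_2\pi) + 4X_3 + 5\log(X_4) + \varepsilon,
\end{aligned}
\]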
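The definitions of Example 33 describe a chain of conditional normal draws followed by a non-linear response, which can be simulated directly. The sketch below is only an illustration under stated assumptions: `log` is taken to be the natural logarithm, the function name `simulate_example_33` and the seed are hypothetical, and no attempt is made to reproduce the printed output that follows.

```python
import math
import random

def simulate_example_33(n, seed=0):
    """Draw n rows (x1, x2, x3, x4, x5) following the definitions of Example 33."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = rng.gauss(100.0, math.sqrt(25.0))
        x2 = rng.gauss(50.0 + 0.5 * x1, math.sqrt(16.0))
        x3 = rng.gauss(10.0 + 0.5 * x1 + 0.5 * x2, math.sqrt(12.25))
        x4 = rng.gauss(5.0 + 0.7 * x1 + 0.3 * x2, math.sqrt(16.0))
        eps = rng.gauss(0.0, math.sqrt(16.0))
        # Assumption: log is the natural logarithm
        x5 = 1.0 + 2.0 * x1 + 3.0 * math.cos(x2 * math.pi) + 4.0 * x3 \
            + 5.0 * math.log(x4) + eps
        rows.append((x1, x2, x3, x4, x5))
    return rows

rows = simulate_example_33(1000)
mean_x1 = sum(r[0] for r in rows) / len(rows)
print(len(rows), round(mean_x1, 2))
```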
(33.1.1)Non-linear model analysis,
Independent variables are X1,X2*X2*Cos(X2*pi),X3^2,X4*Sin(X4*pi),
r(X5,X1)=0.913424,r(X5,X2*X2*Cos(X2*pi))=0.190844,r(X5,X3^2)=0.669410,
r(X5,X4*Sin(X4*pi))=-0.004997,r(X1,X2*X2*Cos(X2*pi))=-0.005078,
r(X1,X3^2)=0.661686,r(X1,X4*Sin(X4*pi))=0.031870,r(X2*X2*Cos(X2*pi),X3^2)=0.048152,
r(X2*X2*Cos(X2*pi),X4*Sin(X4*pi))=-0.007655,r(X3^2,X4*Sin(X4*pi))=0.005973,
step 1, X1 into the linear model, SSR= 109970.4139046841

step 2, X2*X2*Cos(X2*pi) into the linear model, SSR= 5036.8150548244
step 3, X3^2 into the linear model, SSR= 711.0615840575
step 4, X4*Sin(X4*pi) into the linear model, SSR= 128.4593413558
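The SSR reported at each step is the extra sum of squares contributed by the term that enters, so the four values should add up to the Regression SS of the final model, and dividing by the total SS gives R2. The short check below uses only numbers printed in this example; the variable names are assumptions.

```python
import math

# Extra sum of squares at each step (copied from the steps above)
step_ssr = [109970.4139046841, 5036.8150548244, 711.0615840575, 128.4593413558]

regression_ss = 115846.7498849219   # Regression SS of the final ANOVA table
total_ss = 131804.8389529710        # total SS of the final ANOVA table

cumulative = sum(step_ssr)
r_squared = cumulative / total_ss

print(math.isclose(cumulative, regression_ss, abs_tol=1e-6), round(r_squared, 6))
```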

X5=53.1089278285+2.019806*X1+0.000306*X2*X2*Cos(X2*pi)+0.000923*X3^2+
-0.004751*X4*Sin(X4*pi)
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 4 115846.7498849219 28961.6874712305 1805.7850730744
error 995 15958.0890680491 16.0382804704
total 999 131804.8389529710
----------------------------------------------------------------------------------
MSE= 16.0382804704 , R2=0.878926 , R2(adj)=0.878440
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept 53.1089278285 0.6739540370 78.80200 0.00000
X1 2.0198062914 0.0087362837 231.19742 0.00000
X2*X2*Cos(X2*pi) 0.0003055191 0.0000044342 68.90005 0.00000
X3^2 0.0009232908 0.0000349238 26.43727 0.00000
X4*Sin(X4*pi) -0.0047511270 0.0004191928 -11.33399 0.00000
----------------------------------------------------------------------------------
class   lower limit  upper limit  observed no  probability  expected no  chi square
[ 1 ]   -inf         -5.13252      94.00000    0.10000      100.00000    0.36000
[ 2 ]   -5.13252     -3.37048      95.00000    0.10000      100.00000    0.25000
[ 3 ]   -3.37048     -2.10002     119.00000    0.10000      100.00000    3.61000
[ 4 ]   -2.10002     -1.01448     100.00000    0.10000      100.00000    0.00000
[ 5 ]   -1.01448      0.00010      96.00000    0.10000      100.00000    0.16000
[ 6 ]    0.00010      1.01451      89.00000    0.10000      100.00000    1.21000
[ 7 ]    1.01451      2.10002     101.00000    0.10000      100.00000    0.01000
[ 8 ]    2.10002      3.36880      94.00000    0.10000      100.00000    0.36000
[ 9 ]    3.36880      5.13210     113.00000    0.10000      100.00000    1.69000
[ 10 ]   5.13210     +inf          99.00000    0.10000      100.00000    0.01000
degree of freedom=8
p-value=0.467300
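The chi-square statistic and its p-value can be reproduced from the observed counts. The sketch below is an illustration only: the helper `chi2_sf_even_df` is a hypothetical name, it uses the closed-form upper tail that holds for even degrees of freedom, and the degrees of freedom (8) are taken from the output rather than derived.

```python
import math

observed = [94, 95, 119, 100, 96, 89, 101, 94, 113, 99]
expected = [100.0] * 10  # 1000 residuals, probability 0.1 per class

# Per-class contributions (O - E)^2 / E and their sum.
components = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(components)

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable; valid for even df only."""
    if df % 2 != 0:
        raise ValueError("this closed form requires an even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i      # (x/2)^i / i!
        total += term
    return math.exp(-half) * total

p_value = chi2_sf_even_df(chi_square, 8)
print(round(chi_square, 2), round(p_value, 4))
```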
Z=0.634838, p-value=0.737300
Z=0.634838, p-value=0.262700
Z=0.634838, p-value=0.525400
t=2,3,...,1000
D.W. test=2.084979
D.W. test=1.915021
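The two reported values sum to 4, which is consistent with one being the Durbin-Watson statistic d and the other 4 - d; that reading is an assumption. A minimal sketch of the statistic, with the hypothetical function name `durbin_watson`, summing over t = 2, ..., n as the index line above indicates:

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

alternating = [1.0, -1.0, 1.0, -1.0]  # strongly negatively autocorrelated toy residuals
d = durbin_watson(alternating)
print(d, 4 - d)
```

Values of d near 2 indicate little first-order autocorrelation in the residuals, which is what the reported 2.084979 suggests.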
residual plot (X5 estimated line,X5) scatter diagram
(33.2) The non-linear model stepwise analysis

r(X5,X1)=0.913424,r(X5,X2)=0.508751,r(X5,X3)=0.667716,r(X5,X4)=0.699627,
r(X1,X2)=0.517558,r(X1,X3)=0.659864,r(X1,X4)=0.734316,r(X2,X3)=0.664733,
r(X2,X4)=0.524104,r(X3,X4)=0.558454,

Independent variables are X1,
r(X5,X1)=0.913424,
Steps for entering functions of the independent variables into the linear model
X5= 49.3856531699+2.168077*X1
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 1 109970.4139046841 109970.4139046841 5026.4878893839
error 998 21834.4250482869 21.8781814111
total 999 131804.8389529710
----------------------------------------------------------------------------------
MSE=21.8781814111 , R2=0.834343 , R2(adj)=0.834177
Individual test
----------------------------------------------------------------------------------
Variable Estimate Std. error t p-value
----------------------------------------------------------------------------------
intercept 49.3856531699 0.6555359596 75.33630 0.00000
X1 2.1680765441 0.0065378760 331.61787 0.00000
----------------------------------------------------------------------------------

Independent variables are X2^3,
r(X5,X2^3)=0.511084,
X5= 224.8348867557+0.000041*X2^3
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 1 34428.3523168149 34428.3523168149 352.8520775304
error 998 97376.4866361561 97.5716298959
total 999 131804.8389529710
----------------------------------------------------------------------------------
MSE=97.5716298959 , R2=0.261207 , R2(adj)=0.260467
Individual test
----------------------------------------------------------------------------------
Variable Estimate Std. error t p-value
----------------------------------------------------------------------------------
intercept 224.8348867557 0.2268732444 991.01543 0.00000
X2^3 0.0000411869 0.0000002220 185.54879 0.00000
----------------------------------------------------------------------------------

r(X5,X3^3)=0.670340,
X5= 214.9710238916+0.000038*X3^3
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 1 59227.2618314910 59227.2618314910 814.4224380609
error 998 72577.5771214800 72.7230231678
total 999 131804.8389529710
----------------------------------------------------------------------------------
MSE=72.7230231678 , R2=0.449356 , R2(adj)=0.448804
Individual test
----------------------------------------------------------------------------------
Variable Estimate Std. error t p-value
----------------------------------------------------------------------------------
intercept 214.9710238916 0.2141637910 1003.76923 0.00000
X3^3 0.0000383022 0.0000001574 243.36652 0.00000
----------------------------------------------------------------------------------

r(X5,X4^2)=0.699740,
X5= 193.5756416621+0.006587*X4^2
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 1 64536.3972970594 64536.3972970594 957.4671705927
error 998 67268.4416559116 67.4032481522
total 999 131804.8389529710
----------------------------------------------------------------------------------
MSE=67.4032481522 , R2=0.489636 , R2(adj)=0.489125
Individual test
----------------------------------------------------------------------------------
Variable Estimate Std. error t p-value
----------------------------------------------------------------------------------
intercept 193.5756416621 0.2888732462 670.10581 0.00000
X4^2 0.0065866932 0.0000259278 254.04015 0.00000
----------------------------------------------------------------------------------

Independent variables are X1,X2*X2*Cos(X2*pi),
r(X5,X1)=0.913424,r(X5,X2*X2*Cos(X2*pi))=0.190844,r(X1,X2*X2*Cos(X2*pi))=-0.005078,

X5= 49.2429693450+2.170433*X1+0.000314*X2*X2*Cos(X2*pi)
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 2 115007.2289595085 57503.6144797543 3413.0512411366
error 997 16797.6099934625 16.8481544568
total 999 131804.8389529710
----------------------------------------------------------------------------------
MSE=16.8481544568 , R2=0.872557 , R2(adj)=0.872301
Individual test
----------------------------------------------------------------------------------
Variable Estimate Std. error t p-value
----------------------------------------------------------------------------------
intercept 49.2429693450 0.6555390426 75.11829 0.00000
X1 2.1704326025 0.0065379603 331.97396 0.00000
X2*X2*Cos(X2*pi) 0.0003139506 0.0000044237 70.97052 0.00000
----------------------------------------------------------------------------------

Independent variables are X1,X3^3,
r(X5,X1)=0.913424,r(X5,X3^3)=0.670340,r(X1,X3^3)=0.662741,
X5= 58.7297728175+1.985807*X1+0.000007*X3^3
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 2 110962.7418283631 55481.3709141816 2654.0000495502
error 997 20842.0971246079 20.9048115593
total 999 131804.8389529710
----------------------------------------------------------------------------------
MSE=20.9048115593 , R2=0.841872 , R2(adj)=0.841554
Individual test
----------------------------------------------------------------------------------
Variable Estimate Std. error t p-value
----------------------------------------------------------------------------------
intercept 58.7297728175 0.7195241581 81.62307 0.00000
X1 1.9858068687 0.0087305734 227.45435 0.00000
X3^3 0.0000066206 0.0000002102 31.50124 0.00000
----------------------------------------------------------------------------------

Independent variables are X1,X4^3,
r(X5,X1)=0.913424,r(X5,X4^3)=0.698904,r(X1,X4^3)=0.732508,
X5= 56.1186402713+2.056225*X1+0.000004*X4^3
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 2 110223.2122374654 55111.6061187327 2545.9745006571
error 997 21581.6267155056 21.6465664147
total 999 131804.8389529710
----------------------------------------------------------------------------------
MSE=21.6465664147 , R2=0.836261 , R2(adj)=0.835932
Individual test
----------------------------------------------------------------------------------
Variable Estimate Std. error t p-value
----------------------------------------------------------------------------------
intercept 56.1186402713 0.7804182270 71.90842 0.00000
X1 2.0562247214 0.0096038114 214.10507 0.00000
X4^3 0.0000038173 0.0000002401 15.89963 0.00000
----------------------------------------------------------------------------------

Independent variables are X2*X2*Cos(X2*pi),X3^3,
r(X5,X2*X2*Cos(X2*pi))=0.190844,r(X5,X3^3)=0.670340,r(X2*X2*Cos(X2*pi),X3^3)=0.048436,
X5= 215.6380574254+0.000255*X2*X2*Cos(X2*pi)+0.000038*X3^3
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 2 62541.0732944958 31270.5366472479 450.1159407219
error 997 69263.7656584753 69.4721822051
total 999 131804.8389529710
----------------------------------------------------------------------------------
MSE=69.4721822051 , R2=0.474498 , R2(adj)=0.473443
Individual test
----------------------------------------------------------------------------------
Variable Estimate Std. error t p-value
----------------------------------------------------------------------------------
intercept 215.6380574254 0.2144770287 1005.41330 0.00000
X2*X2*Cos(X2*pi) 0.0002549480 0.0000044288 57.56571 0.00000
X3^3 0.0000378629 0.0000001576 240.29265 0.00000
----------------------------------------------------------------------------------

Independent variables are X2*X2*Cos(X2*pi),X4^2,
r(X5,X2*X2*Cos(X2*pi))=0.190844,r(X5,X4^2)=0.699740,r(X2*X2*Cos(X2*pi),X4^2)=0.007460,
X5= 193.8085737983+0.000298*X2*X2*Cos(X2*pi)+0.006574*X4^2
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 2 69078.1550056679 34539.0775028340 548.9762586407
error 997 62726.6839473031 62.9154302380
total 999 131804.8389529710
----------------------------------------------------------------------------------
MSE=62.9154302380 , R2=0.524094 , R2(adj)=0.523140
Individual test
----------------------------------------------------------------------------------
Variable Estimate Std. error t p-value
----------------------------------------------------------------------------------
intercept 193.8085737983 0.2888939230 670.86414 0.00000
X2*X2*Cos(X2*pi) 0.0002981273 0.0000044237 67.39256 0.00000
X4^2 0.0065736582 0.0000259285 253.53035 0.00000
----------------------------------------------------------------------------------

Independent variables are X3^3,X4^2,
r(X5,X3^3)=0.670340,r(X5,X4^2)=0.699740,r(X3^3,X4^2)=0.559496,
X5= 186.0360031268+0.000023*X3^3+0.004449*X4^2
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 2 79454.1168815012 39727.0584407506 756.5870287587
error 997 52350.7220714698 52.5082468119
total 999 131804.8389529710
----------------------------------------------------------------------------------
MSE=52.5082468119 , R2=0.602816 , R2(adj)=0.602020
Individual test
----------------------------------------------------------------------------------
Variable Estimate Std. error t p-value
----------------------------------------------------------------------------------
intercept 186.0360031268 0.2953953257 629.78655 0.00000
X3^3 0.0000231925 0.0000001899 122.13812 0.00000
X4^2 0.0044489971 0.0000312822 142.22115 0.00000
----------------------------------------------------------------------------------

Independent variables are X1,X2*X2*Cos(X2*pi),X3^2,
r(X1,X2*X2*Cos(X2*pi))=-0.005078,r(X1,X3^2)=0.661686,r(X2*X2*Cos(X2*pi),X3^2)=0.048152,
X5= 53.3699085295+2.016154*X1+0.000306*X2*X2*Cos(X2*pi)+0.000931*X3^2
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 3 115718.2905435661 38572.7635145220 2388.2359026131
error 996 16086.5484094050 16.1511530215
total 999 131804.8389529710
----------------------------------------------------------------------------------
MSE=16.1511530215 , R2=0.877952 , R2(adj)=0.877584
Individual test
----------------------------------------------------------------------------------
Variable Estimate Std. error t p-value
----------------------------------------------------------------------------------
intercept 53.3699085295 0.6735605615 79.23550 0.00000
X1 2.0161536896 0.0087303376 230.93651 0.00000
X2*X2*Cos(X2*pi) 0.0003058271 0.0000044342 68.97079 0.00000
X3^2 0.0009310891 0.0000349171 26.66574 0.00000
----------------------------------------------------------------------------------

Independent variables are X1,X2*X2*Cos(X2*pi),X4^3,
r(X1,X2*X2*Cos(X2*pi))=-0.005078,r(X1,X4^3)=0.732508,r(X2*X2*Cos(X2*pi),X4^3)=0.006869,
X5= 55.5104633485+2.066314*X1+0.000313*X2*X2*Cos(X2*pi)+0.000004*X4^3
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 3 115226.1952529242 38408.7317509747 2307.4925498195
error 996 16578.6437000468 16.6452245984
total 999 131804.8389529710
----------------------------------------------------------------------------------
MSE=16.6452245984 , R2=0.874218 , R2(adj)=0.873839
Individual test
----------------------------------------------------------------------------------
Variable Estimate Std. error t p-value
----------------------------------------------------------------------------------
intercept 55.5104633485 0.7804655923 71.12481 0.00000
X1 2.0663138354 0.0096048706 215.13188 0.00000
X2*X2*Cos(X2*pi) 0.0003129323 0.0000044242 70.73177 0.00000
X4^3 0.0000035532 0.0000002401 14.79751 0.00000
----------------------------------------------------------------------------------

Independent variables are X1,X3^3,X4*Sin(X4*pi),
r(X5,X1)=0.913424,r(X5,X3^3)=0.670340,r(X5,X4*Sin(X4*pi))=-0.004997,r(X1,X3^3)=0.662741,
r(X1,X4*Sin(X4*pi))=0.031870,r(X3^3,X4*Sin(X4*pi))=0.008973,
X5= 58.4351073897+1.989399*X1+0.000007*X3^3+-0.004979*X4*Sin(X4*pi)
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 3 111103.8548538642 37034.6182846214 1781.8708344921
error 996 20700.9840991069 20.7841205814
total 999 131804.8389529710
----------------------------------------------------------------------------------
MSE=20.7841205814 , R2=0.842942 , R2(adj)=0.842469
Individual test
----------------------------------------------------------------------------------
Variable Estimate Std. error t p-value
----------------------------------------------------------------------------------
intercept 58.4351073897 0.7199516097 81.16533 0.00000
X1 1.9893993937 0.0087358097 227.72925 0.00000
X3^3 0.0000065801 0.0000002102 31.30427 0.00000
X4*Sin(X4*pi) -0.0049791827 0.0004191549 -11.87910 0.00000
----------------------------------------------------------------------------------

Independent variables are X2*X2*Cos(X2*pi),X3^3,X4^2,
r(X5,X2*X2*Cos(X2*pi))=0.190844,r(X5,X3^3)=0.670340,r(X5,X4^2)=0.699740,
r(X2*X2*Cos(X2*pi),X3^3)=0.048436,r(X2*X2*Cos(X2*pi),X4^2)=0.007460,
r(X3^3,X4^2)=0.559496,
X5= 186.4482337824+0.000270*X2*X2*Cos(X2*pi)+0.000023*X3^3+0.004494*X4^2
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 3 83169.8561265889 27723.2853755296 567.7475477394
error 996 48634.9828263821 48.8303040426
total 999 131804.8389529710
----------------------------------------------------------------------------------
MSE=48.8303040426 , R2=0.631008 , R2(adj)=0.629896
Individual test
----------------------------------------------------------------------------------
Variable Estimate Std. error t p-value
----------------------------------------------------------------------------------
intercept 186.4482337824 0.2954727264 631.01673 0.00000
X2*X2*Cos(X2*pi) 0.0002700428 0.0000044301 60.95686 0.00000
X3^3 0.0000225735 0.0000001902 118.70847 0.00000
X4^2 0.0044942475 0.0000312911 143.62724 0.00000
----------------------------------------------------------------------------------

Independent variables are X1,X2*X2*Cos(X2*pi),X3^2,X4*Sin(X4*pi),
r(X5,X4*Sin(X4*pi))=-0.004997,r(X1,X2*X2*Cos(X2*pi))=-0.005078,r(X1,X3^2)=0.661686,
r(X1,X4*Sin(X4*pi))=0.031870,r(X2*X2*Cos(X2*pi),X3^2)=0.048152,
r(X2*X2*Cos(X2*pi),X4*Sin(X4*pi))=-0.007655,r(X3^2,X4*Sin(X4*pi))=0.005973,
X5=53.1089278285+2.019806*X1+0.000306*X2*X2*Cos(X2*pi)+0.000923*X3^2
+-0.004751*X4*Sin(X4*pi)
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 4 115846.7498849219 28961.6874712305 1805.7850730744
error 995 15958.0890680491 16.0382804704
total 999 131804.8389529710
----------------------------------------------------------------------------------
MSE=16.0382804704 , R2=0.878926 , R2(adj)=0.878440
Individual test
----------------------------------------------------------------------------------
Variable Estimate Std. error t p-value
----------------------------------------------------------------------------------
intercept 53.1089278285 0.6739540370 78.80200 0.00000
X1 2.0198062914 0.0087362837 231.19742 0.00000
X2*X2*Cos(X2*pi) 0.0003055191 0.0000044342 68.90005 0.00000
X3^2 0.0009232908 0.0000349238 26.43727 0.00000
X4*Sin(X4*pi) -0.0047511270 0.0004191928 -11.33399 0.00000
----------------------------------------------------------------------------------
(33.2) n = 100,000,000, which is big data.

(33.2.1) Non-linear model analysis
Independent variables are X1,Cos(X2*pi),|X3|^0.5,log(X4),
r(X5,X1)=0.921517,r(X5,Cos(X2*pi))=0.179069,r(X5,|X3|^0.5)=0.676784,
r(X5,log(X4))=0.696588,r(X1,Cos(X2*pi))=0.000076,r(X1,|X3|^0.5)=0.680920,
r(X1,log(X4))=0.737117,r(Cos(X2*pi),|X3|^0.5)=-0.000024,r(Cos(X2*pi),log(X4))=0.000016,
r(|X3|^0.5,log(X4))=0.577720,
step 1, X1 into the linear model, SSR=11921050361.1319960000

step 2, Cos(X2*pi) into the linear model, SSR=449787823.1983413700
step 3, |X3|^0.5 into the linear model, SSR= 63660012.3779678340
step 4, log(X4) into the linear model, SSR= 3377899.0688457489

X5= 1.0251562982+2.000082*X1+2.999255*Cos(X2*pi)+3.998136*|X3|^0.5
+4.996964*log(X4)
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 4 12437876095.7771510000 3109469023.9442878000 194317632.2003609200
error 99999995 1600199031.4829810000 16.0019911149
total 99999999 14038075127.2601320000
----------------------------------------------------------------------------------
MSE=16.0019911149 , R2=0.886010 , R2(adj)=0.886010
Individual test
----------------------------------------------------------------------------------
Variable Estimate Std. error t p-value
----------------------------------------------------------------------------------
intercept 1.0251562982 0.0107055373 95.75944 0.00000
X1 2.0000815477 0.0000333810 59916.68067 0.00000
Cos(X2*pi) 2.9992546113 0.0001414136 21209.10325 0.00000
|X3|^0.5 3.9981358640 0.0005258580 7603.07144 0.00000
log(X4) 4.9969641068 0.0027188356 1837.90595 0.00000
----------------------------------------------------------------------------------
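In the individual test table the third numeric column appears to be the estimate divided by its standard error. The check below illustrates this with the values printed above; the variable names are hypothetical, and small discrepancies are expected because the printed standard errors are rounded.

```python
# (name, estimate, standard error, reported t) copied from the table above.
table = [
    ("intercept", 1.0251562982, 0.0107055373, 95.75944),
    ("X1", 2.0000815477, 0.0000333810, 59916.68067),
    ("Cos(X2*pi)", 2.9992546113, 0.0001414136, 21209.10325),
    ("|X3|^0.5", 3.9981358640, 0.0005258580, 7603.07144),
    ("log(X4)", 4.9969641068, 0.0027188356, 1837.90595),
]

# Relative gap between the recomputed ratio and the reported statistic.
relative_gaps = []
for name, estimate, std_error, t_reported in table:
    t = estimate / std_error
    relative_gaps.append(abs(t - t_reported) / t_reported)
    print(f"{name:12s} t = {t:.3f} (reported {t_reported:.3f})")
```

With n = 100,000,000 the standard errors are very small, so every ratio is very large even for the weakest term.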
class   lower limit  upper limit  observed no       probability  expected no       chi square
[ 1 ]   -inf         -6.58003     4998027.00000     0.05000      5000000.00000      0.77855
[ 2 ]   -6.58003     -5.12671     5001867.00000     0.05000      5000000.00000      0.69714
[ 3 ]   -5.12671     -4.14596     4999790.00000     0.05000      5000000.00000      0.00882
[ 4 ]   -4.14596     -3.36666     5000871.00000     0.05000      5000000.00000      0.15173
[ 5 ]   -3.36666     -2.69803     5000036.00000     0.05000      5000000.00000      0.00026
[ 6 ]   -2.69803     -2.09764     4998466.00000     0.05000      5000000.00000      0.47063
[ 7 ]   -2.09764     -1.54125     5003401.00000     0.05000      5000000.00000      2.31336
[ 8 ]   -1.54125     -1.01436     4985695.00000     0.05000      5000000.00000     40.92661
[ 9 ]   -1.01436     -0.50248     5013290.00000     0.05000      5000000.00000     35.32482
[ 10 ]  -0.50248     -0.00091     4990655.00000     0.05000      5000000.00000     17.46581
[ 11 ]  -0.00091      0.50177     5001686.00000     0.05000      5000000.00000      0.56852
[ 12 ]   0.50177      1.01336     5006678.00000     0.05000      5000000.00000      8.91914
[ 13 ]   1.01336      1.54122     4997993.00000     0.05000      5000000.00000      0.80561
[ 14 ]   1.54122      2.09764     5001607.00000     0.05000      5000000.00000      0.51649
[ 15 ]   2.09764      2.69787     5000595.00000     0.05000      5000000.00000      0.07081
[ 16 ]   2.69787      3.36641     4999887.00000     0.05000      5000000.00000      0.00255
[ 17 ]   3.36641      4.14578     4996942.00000     0.05000      5000000.00000      1.87027
[ 18 ]   4.14578      5.12629     4999684.00000     0.05000      5000000.00000      0.01997
[ 19 ]   5.12629      6.57984     5000622.00000     0.05000      5000000.00000      0.07738
[ 20 ]   6.57984     +inf         5002208.00000     0.05000      5000000.00000      0.97505
p-value=0.000000
Z=0.198806, p-value=0.578800
Z=0.198806, p-value=0.421200
Z=0.198806, p-value=0.842400

sample cov(X5 estimated value,residual)= 0.0000,
X5 estimated value and residual sample correlation coefficient=0.0000.
(33.2.2) The marginal probability distribution of the dependent variable estimated line

X5 estimated probability distribution
S.D. : 11.15252
MAD : 8.89933
Range : 124.33897
Mid_range : 263.72541
Median : 266.21092
Q1 : 258.68125
Q2 : 266.21092
Q3 : 273.72886
IQR : 15.04760
C.V. : 0.04190
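The summary measures listed above can be computed with a few lines of code. The sketch below is not the authors' program: the function name `describe` is hypothetical, MAD is assumed to be the mean absolute deviation about the mean (the reported MAD/S.D. ratio is close to the value expected for a normal distribution under that definition), and the sample standard deviation and the default quartile rule of Python's `statistics.quantiles` are assumptions.

```python
import statistics

def describe(values):
    """Summary measures in the style of the tables in this section."""
    xs = sorted(values)
    mean = statistics.fmean(xs)
    sd = statistics.stdev(xs)                      # sample S.D. (assumed)
    q1, q2, q3 = statistics.quantiles(xs, n=4)     # default quartile rule (assumed)
    return {
        "S.D.": sd,
        "MAD": sum(abs(x - mean) for x in xs) / len(xs),
        "Range": xs[-1] - xs[0],
        "Mid_range": (xs[0] + xs[-1]) / 2,
        "Median": statistics.median(xs),
        "Q1": q1, "Q2": q2, "Q3": q3,
        "IQR": q3 - q1,
        # C.V. is undefined when the mean is zero (shown as "none" in the output).
        "C.V.": sd / mean if mean != 0 else None,
    }

toy = describe([1, 2, 3, 4, 5])
print(toy)
```

For the estimated X5 values reported above, IQR = Q3 - Q1 = 273.72886 - 258.68125 agrees with the printed 15.04760 up to rounding.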
SLLN analysis, X5 estimated and Normal(266.19954, 124.37876),

Note: X6 ~ Normal(266.19954, 124.37876),
X6 is the code name representing Normal(266.19954, 124.37876),
Variance : 16.00199
S.D. : 4.00025
MAD : 3.19168
Range : 44.76841
Mid_range : 0.46503
Median : -0.00013
Q1 : -2.69810
Q2 : -0.00013
Q3 : 2.69779
IQR : 5.39589
C.V. : none

(33.2.4) The marginal probability distribution

X1,Cos(X2*pi),|X3|^0.5,log(X4),
Y1=X1 marginal probability distribution
Variance : 24.99785
S.D. : 4.99978
MAD : 3.98941
Range : 56.11118
Median : 99.99927
Q1 : 96.62678
Q2 : 99.99927
Q3 : 103.37175
IQR : 6.74497
C.V. : 0.05000
Y2=Cos(X2*pi) marginal probability distribution
Variance : 0.50006
S.D. : 0.70715
MAD : 0.63668
Range : 2.00000
Mid_range : 0.00000
Median : -0.00005
Q1 : -0.70709
Q2 : -0.00005
Q3 : 0.70726
IQR : 1.41436
C.V. : none
Y3=|X3|^0.5 marginal probability distribution
Variance : 0.06904
S.D. : 0.26276
MAD : 0.20957
Range : 3.06716
Median : 10.48810
Q1 : 10.30951
Q2 : 10.48810
Q3 : 10.66365
IQR : 0.35414
C.V. : 0.02506
Y4=log(X4) marginal probability distribution
Variance : 0.00303
S.D. : 0.05508
MAD : 0.04389
Range : 0.66743
Mid_range : 4.60436
Median : 4.65395
Q1 : 4.61624
Q2 : 4.65395
Q3 : 4.69030
IQR : 0.07406
C.V. : 0.01184
Y5=X5 marginal probability distribution
S.D. : 11.84824
MAD : 9.45415
Range : 133.04969
Mid_range : 265.89242
Median : 266.20948
Q1 : 258.21112
Q2 : 266.20948
Q3 : 274.19887
IQR : 15.98774
C.V. : 0.04451
(33.2.5) The joint probability distribution,
The joint probability distribution of X5 with each of X1, Cos(X2*pi), |X3|^0.5, log(X4).
f(y1,y5),Y1=X1,Y5=X5, f(y5,y1)
sample mean(Y1)= 99.9993, sample variance(Y1)= 24.9978,

sample cov(Y1,Y5)= 54.5894, Y1 and Y5 sample correlation coefficient=0.9215.
f(y2,y5),Y2=Cos(X2*pi),Y5=X5, f(y5,y2)

sample cov(Y2,Y5)=1.5003, Y2 and Y5 sample correlation coefficient=0.1791.
f(y3,y5),Y3=|X3|^0.5,Y5=X5, f(y5,y3)
sample cov(Y3,Y5)=2.1070, Y3 and Y5 sample correlation coefficient=0.6768.
f(y4,y5),Y4=log(X4),Y5=X5, f(y5,y4)
sample mean(Y4)= 4.6524, sample variance(Y4)=0.0030,
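Each sample correlation coefficient in this subsection follows from the sample covariance and the two variances. The sketch below illustrates the relation with values printed in (33.2.4) and (33.2.5); the function name `correlation` is hypothetical and the inputs are the rounded figures from the output, so agreement is only up to rounding.

```python
import math

def correlation(cov, var_a, var_b):
    """Correlation coefficient from a covariance and two variances."""
    return cov / math.sqrt(var_a * var_b)

sd_y5 = 11.84824                      # S.D. of Y5 = X5 from (33.2.4)
r_y1_y5 = correlation(54.5894, 24.9978, sd_y5 ** 2)   # Y1 = X1
r_y3_y5 = correlation(2.1070, 0.06904, sd_y5 ** 2)    # Y3 = |X3|^0.5
print(round(r_y1_y5, 4), round(r_y3_y5, 4))
```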

(33.2.6) The multi-variate analysis using linear model

Independent variables are Cos(X2*pi),X3^2,X4,X5,
r(X1,Cos(X2*pi))=0.000076,r(X1,X3^2)=0.680603,r(X1,X4)=0.737681,r(X1,X5)=0.921517,
r(Cos(X2*pi),X3^2)=-0.000019,r(Cos(X2*pi),X4)=0.000013,r(Cos(X2*pi),X5)=0.179069,
r(X3^2,X4)=0.577670,r(X3^2,X5)=0.676368,r(X4,X5)=0.697063,

step 2, Cos(X2*pi) into the linear model, SSR= 70259236.4917988780

X1=-7.0882470635+-1.036578*Cos(X2*pi)+0.000187*X3^2+0.121884*X4+0.345668*X5
ANOVA
----------------------------------------------------------------------------------
Source df SS MS
----------------------------------------------------------------------------------
Regression 4 2223226529.0748138000 555806632.2687034600
error 99999995 276557964.4251456300 2.7655797825
total 99999999 2499784493.4999595000
----------------------------------------------------------------------------------
F test statistic=200972915.6177705500
MSE=2.7655797825 , R2=0.889367 , R2(adj)=0.889367
Individual test
----------------------------------------------------------------------------------
Variable Estimate Std. error t p-value
----------------------------------------------------------------------------------
intercept -7.0882470635 0.0024553389 -2886.87113 0.00000
Cos(X2*pi) -1.0365780832 0.0001474203 -7031.44887 0.00000
X3^2 0.0001873891 0.0000001154 1623.35288 0.00000
X4 0.1218840691 0.0000249898 4877.35611 0.00000
X5 0.3456682051 0.0000138827 24899.20961 0.00000
----------------------------------------------------------------------------------

Independent variables are X1,X3,X4,Sin(2*X5*pi),
r(X2,X1)=0.529988,r(X2,X3)=0.669036,r(X2,X4)=0.567591,r(X2,Sin(2*X5*pi))=-0.000126,
r(X1,X3)=0.681029,r(X1,X4)=0.737681,r(X1,Sin(2*X5*pi))=-0.000031,r(X3,X4)=0.578029,
r(X3,Sin(2*X5*pi))=0.000009,r(X4,Sin(2*X5*pi))=-0.000171,

step 4, Sin(2*X5*pi) into the linear model, SSR= 14.8453516960
X2= 29.1318536270+-0.050245*X1+0.456142*X3+0.244926*X4+-0.000545*Sin(2*X5*pi)
ANOVA
----------------------------------------------------------------------------------
Source df SS MS
----------------------------------------------------------------------------------
Regression 4 1107559056.0695310000 276889764.0173827400
error 99999995 1117552941.2460485000 11.1755299712
total 99999999 2225111997.3155794000
----------------------------------------------------------------------------------
F test statistic=24776432.5029799640
MSE=11.1755299712 , R2=0.497754 , R2(adj)=0.497754

Individual test
----------------------------------------------------------------------------------
Variable Estimate Std. error t p-value
----------------------------------------------------------------------------------
intercept 29.1318536270 0.0022165524 13142.86692 0.00000
X1 -0.0502445867 0.0000334063 -1504.04398 0.00000
X3 0.4561416440 0.0000250998 18173.10940 0.00000
X4 0.2449258837 0.0000260156 9414.56904 0.00000
Sin(2*X5*pi) -0.0005448983 0.0001414217 -3.85300 0.00020
----------------------------------------------------------------------------------
Independent variables are X1,X2,X4/(1-X4),X5,
r(X3,X1)=0.681029,r(X3,X2)=0.669036,r(X3,X4/(1-X4))=0.576200,r(X3,X5)=0.676858,
r(X1,X2)=0.529988,r(X1,X4/(1-X4))=0.735363,r(X1,X5)=0.921517,r(X2,X4/(1-X4))=0.565801,
r(X2,X5)=0.519779,r(X4/(1-X4),X5)=0.694989,
step 4, X4/(1-X4) into the linear model, SSR= 40385.9280145168
X3= -53.6223818064+0.266098*X1+0.489453*X2+-57.804549*X4/(1-X4)+0.111590*X5
ANOVA
----------------------------------------------------------------------------------
Source df SS MS
----------------------------------------------------------------------------------
Regression 4 1832108984.6512086000 458027246.1628021600
error 99999995 1199137009.6899021000 11.9913706965
total 99999999 3031245994.3411107000
----------------------------------------------------------------------------------
F test statistic=38196404.5442885760
MSE=11.9913706965 , R2=0.604408 , R2(adj)=0.604408
Individual test
----------------------------------------------------------------------------------
Variable Estimate Std. error t p-value
----------------------------------------------------------------------------------
intercept -53.6223818064 0.2931435360 -182.92193 0.00000
X1 0.2660983112 0.0000548225 4853.81603 0.00000
X2 0.4894527205 0.0000263447 18578.82396 0.00000
X4/(1-X4) -57.8045492172 0.2876381258 -200.96275 0.00000
X5 0.1115896524 0.0000218496 5107.16261 0.00000
----------------------------------------------------------------------------------

Independent variables are X1,X2,X3/(1-X3),|X5|^0.5,
r(X4,X1)=0.737681,r(X4,X2)=0.567591,r(X4,X3/(1-X3))=0.576518,r(X4,|X5|^0.5)=0.696979,
r(X1,X2)=0.529988,r(X1,X3/(1-X3))=0.679254,r(X1,|X5|^0.5)=0.921390,
r(X2,X3/(1-X3))=0.667286,r(X2,|X5|^0.5)=0.519719,r(X3/(1-X3),|X5|^0.5)=0.675676,
X4=-78.9549067020+0.635392*X1+0.299587*X2+-72.869741*X3/(1-X3)+1.037101*|X5|^0.5
ANOVA
----------------------------------------------------------------------------------
Source df SS MS
----------------------------------------------------------------------------------
Regression 4 1952243152.4098201000 488060788.1024550200
error 99999995 1366861730.6949239000 13.6686179904
total 99999999 3319104883.1047440000
----------------------------------------------------------------------------------
F test statistic=35706666.7929375320
MSE=13.6686179904 , R2=0.588184 , R2(adj)=0.588184

Individual test
----------------------------------------------------------------------------------
Variable Estimate Std. error t p-value
----------------------------------------------------------------------------------
intercept -78.9549067020 0.3454075797 -228.58475 0.00000
X1 0.6353919780 0.0000526114 12077.07480 0.00000
X2 0.2995868991 0.0000287494 10420.62338 0.00000
X3/(1-X3) -72.8697409630 0.3383507794 -215.36744 0.00000
|X5|^0.5 1.0371012120 0.0007190906 1442.23997 0.00000
----------------------------------------------------------------------------------

Independent variables are X1,Cos(X2*pi),|X3|^0.5,log(X4),
r(X5,X1)=0.921517,r(X5,Cos(X2*pi))=0.179069,r(X5,|X3|^0.5)=0.676784,
r(X5,log(X4))=0.696588,r(X1,Cos(X2*pi))=0.000076,r(X1,|X3|^0.5)=0.680920,
r(X1,log(X4))=0.737117,r(Cos(X2*pi),|X3|^0.5)=-0.000024,r(Cos(X2*pi),log(X4))=0.000016,
r(|X3|^0.5,log(X4))=0.577720,

step 2, Cos(X2*pi) into the linear model, SSR=449787823.1983413700
step 4, log(X4) into the linear model, SSR= 3377899.0688457489
X5= 1.0251562982+2.000082*X1+2.999255*Cos(X2*pi)+3.998136*|X3|^0.5+4.996964*log(X4)
ANOVA
----------------------------------------------------------------------------------
Source df SS MS
----------------------------------------------------------------------------------
Regression 4 12437876095.7771510000 3109469023.9442878000
error 99999995 1600199031.4829810000 16.0019911149
total 99999999 14038075127.2601320000
----------------------------------------------------------------------------------
F test statistic=194317632.2003609200
MSE= 16.0019911149 , R2=0.886010 , R2(adj)=0.886010

Individual test
----------------------------------------------------------------------------------
Variable Estimate Std. error t p-value
----------------------------------------------------------------------------------
intercept 1.0251562982 0.0107055373 95.75944 0.00000
X1 2.0000815477 0.0000333810 59916.68067 0.00000
Cos(X2*pi) 2.9992546113 0.0001414136 21209.10325 0.00000
|X3|^0.5 3.9981358640 0.0005258580 7603.07144 0.00000
log(X4) 4.9969641068 0.0027188356 1837.90595 0.00000
----------------------------------------------------------------------------------
6.5. Non-linear model in which the independent variable is a sample
statistic; the other assumptions are unchanged.
Example 34,
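$X_1, X_2, \ldots, X_{10} \overset{iid}{\sim} \mathrm{Normal}(\mu_{X_i} = 100,\ \sigma^2_{X_i} = 25)$,
$X_{11} = \text{sample Mid\_range}(X_1, X_2, \ldots, X_{10}) + \varepsilon$,
$\varepsilon \sim \mathrm{Normal}(\mu_\varepsilon = 0,\ \sigma^2_\varepsilon = 16)$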
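The model of Example 34 can be simulated in a few lines. This is a minimal sketch, not the authors' simulator: the function name `simulate_example34`, the seed and the sample size are assumptions, and the sample mid-range is taken to be (minimum + maximum) / 2.

```python
import random

def simulate_example34(n, seed=0):
    """Draw n rows ([x1..x10], x11) with x11 = sample mid-range + error."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        xs = [rng.gauss(100.0, 5.0) for _ in range(10)]   # iid, Var = 25
        mid_range = (min(xs) + max(xs)) / 2.0
        x11 = mid_range + rng.gauss(0.0, 4.0)             # error Var = 16
        rows.append((xs, x11))
    return rows

rows = simulate_example34(5000)
mean_x11 = sum(x11 for _, x11 in rows) / len(rows)
print(round(mean_x11, 2))
```

Because the mid-range depends only on the two extreme observations, no single Xi carries much linear information about X11, which is in line with the small individual correlations reported in the linear model analysis.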
(34.1.1) The linear model analysis,
Independent variables are X1,X2,X3,X4,X5,X6,X7,X8,X9,X10
r(X11,X1)=0.156999,r(X11,X2)=0.118742,r(X11,X3)=0.120827,r(X11,X4)=0.119763,
r(X11,X5)=0.073588,r(X11,X6)=0.111077,r(X11,X7)=0.139506,r(X11,X8)=0.135484,
r(X11,X9)=0.091303,r(X11,X10)=0.099970,r(X1,X2)=-0.022653,r(X1,X3)=-0.006942,
r(X1,X4)=0.002438,r(X1,X5)=-0.014813,r(X1,X6)=-0.011543,r(X1,X7)=0.019416,
r(X1,X8)=0.009116,r(X1,X9)=0.032938,r(X1,X10)=-0.043615,r(X2,X3)=-0.045026,
r(X2,X4)=-0.015778,r(X2,X5)=0.039732,r(X2,X6)=0.007813,r(X2,X7)=0.065894,
r(X2,X8)=-0.011657,r(X2,X9)=-0.025933,r(X2,X10)=-0.027953,r(X3,X4)=-0.026932,
r(X3,X5)=0.023902,r(X3,X6)=-0.045622,r(X3,X7)=0.018674,r(X3,X8)=0.036982,
r(X3,X9)=0.006055,r(X3,X10)=-0.024494,r(X4,X5)=-0.005415,r(X4,X6)=-0.054387,
r(X4,X7)=0.016722,r(X4,X8)=0.071585,r(X4,X9)=0.039967,r(X4,X10)=0.056471,
r(X5,X6)=0.018856,r(X5,X7)=0.000047,r(X5,X8)=-0.037696,r(X5,X9)=0.000259,
r(X5,X10)=-0.006063,r(X6,X7)=0.024971,r(X6,X8)=-0.025989,r(X6,X9)=0.024292,
r(X6,X10)=0.011157,r(X7,X8)=-0.000994,r(X7,X9)=0.041997,r(X7,X10)=-0.019164,
r(X8,X9)=0.012759,r(X8,X10)=-0.010528,r(X9,X10)=-0.035934,


X11=-7.450331+0.147088*X1+0.113482*X2+0.121496*X3+0.103626*X4+0.070297*X5
+0.110504*X6+0.109094*X7+0.118187*X8+0.075671*X9+0.106440*X10
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 10 3026.9388197853 302.6938819785 16.1549852235
error 989 18530.7658988442 18.7368714852
total 999 21557.7047186295
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept -7.4503312427 8.6670028613 -0.85962 0.39000
X1 0.1470880113 0.0268668371 5.47471 0.00000
X2 0.1134822176 0.0269079848 4.21742 0.00000
X3 0.1214958597 0.0277529953 4.37776 0.00000
X4 0.1036262427 0.0277277976 3.73727 0.00000
X5 0.0702971007 0.0289345571 2.42952 0.01500
X6 0.1105042678 0.0273097594 4.04633 0.00000
X7 0.1090938389 0.0269458775 4.04863 0.00000
X8 0.1181866415 0.0271613650 4.35128 0.00000
X9 0.0756706102 0.0285400874 2.65138 0.00800
X10 0.1064396116 0.0278894303 3.81649 0.00000
----------------------------------------------------------------------------------
MSE=18.7368714852 , R2=0.140411 , R2(adj)=0.131720
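The summary line can be reproduced from the ANOVA table by plain arithmetic. The sketch below only re-derives the reported values from the sums of squares and degrees of freedom printed above; the variable names are illustrative.

```python
# Values copied from the ANOVA table of (34.1.1)
ssr, sse, sst = 3026.9388197853, 18530.7658988442, 21557.7047186295
df_reg, df_err = 10, 989

msr = ssr / df_reg                     # mean square of the regression
mse = sse / df_err                     # mean square error
f_stat = msr / mse                     # overall F statistic
r2 = ssr / sst                         # coefficient of determination
r2_adj = 1.0 - (1.0 - r2) * (df_reg + df_err) / df_err

print(f"MSE={mse:.10f}, F={f_stat:.10f}, R2={r2:.6f}, R2(adj)={r2_adj:.6f}")
```

The adjusted R2 is lower than R2 because it charges the model for its 10 slopes.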

Var(b0)= 75.1169385973, Cov(b0,b1)= -0.0753803106, Cov(b0,b2)= -0.0754915511,
Cov(b0,b3)= -0.0820542032, Cov(b0,b4)= -0.0700239069,
... -0.0702830267, Cov(b0,b9)= -0.0742002508,
Cov(b0,b10)= -0.0847615428,
Cov(b1,b0)= -0.0753803106, Var(b1)= 0.0007218269, Cov(b1,b2)= 0.0000173454,
Cov(b1,b3)= 0.0000075090, Cov(b1,b4)= -0.0000010566,
Cov(b1,b5)= 0.0000103927, Cov(b1,b6)= 0.0000088101, Cov(b1,b7)= -0.0000140049,
Cov(b1,b8)= -0.0000054397, Cov(b1,b9)= -0.0000230901,
Cov(b1,b10)= 0.0000321724,
Cov(b2,b0)= -0.0754915511, Cov(b2,b1)= 0.0000173454, Var(b2)= 0.0007240396,
Cov(b2,b3)= 0.0000358549, Cov(b2,b4)= 0.0000108114,
... 0.0000050626, Cov(b2,b9)= 0.0000216663,
Cov(b2,b10)= 0.0000217277,
Cov(b3,b0)= -0.0820542032, Cov(b3,b1)= 0.0000075090, Cov(b3,b2)= 0.0000358549,
Var(b3)= 0.0007702287, Cov(b3,b4)= 0.0000246529,
... -0.0000289005, Cov(b3,b9)= -0.0000041872,
Cov(b3,b10)= 0.0000175929,
... 0.0000246529, Var(b4)= 0.0007688308,
... -0.0000538034, Cov(b4,b9)= -0.0000329566,
Cov(b4,b10)= -0.0000454363,
... -0.0000222751, Cov(b5,b4)= -0.0000000231,
Var(b5)= 0.0008372086, Cov(b5,b6)= -0.0000148209, Cov(b5,b7)= 0.0000027538,
Cov(b5,b8)= 0.0000295819, Cov(b5,b9)= -0.0000012781,
Cov(b5,b10)= 0.0000043977,
... 0.0000357009, Cov(b6,b4)= 0.0000425538,
Cov(b6,b5)= -0.0000148209, Var(b6)= 0.0007458230, Cov(b6,b7)= -0.0000190907,
Cov(b6,b8)= 0.0000144784, Cov(b6,b9)= -0.0000210208,
Cov(b6,b10)= -0.0000107506,
... -0.0000172970, Cov(b7,b4)= -0.0000143101,
Cov(b7,b5)= 0.0000027538, Cov(b7,b6)= -0.0000190907, Var(b7)= 0.0007260803,
Cov(b7,b8)= 0.0000020155, Cov(b7,b9)= -0.0000315939,
Cov(b7,b10)= 0.0000118710,
... -0.0000289005, Cov(b8,b4)= -0.0000538034,
Cov(b8,b5)= 0.0000295819, Cov(b8,b6)= 0.0000144784, Cov(b8,b7)= 0.0000020155,
Var(b8)= 0.0007377398, Cov(b8,b9)= -0.0000072636,
Cov(b8,b10)= 0.0000100245,
... -0.0000041872, Cov(b9,b4)= -0.0000329566,
... -0.0000072636, Var(b9)= 0.0008145366,
Cov(b9,b10)= 0.0000294704,
Cov(b10,b0)= -0.0847615428, Cov(b10,b1)= 0.0000321724, Cov(b10,b2)= 0.0000217277,
Cov(b10,b3)= 0.0000175929, Cov(b10,b4)= -0.0000454363,
Cov(b10,b5)= 0.0000043977, Cov(b10,b6)= -0.0000107506, Cov(b10,b7)= 0.0000118710,
Cov(b10,b8)= 0.0000100245, Cov(b10,b9)= 0.0000294704,
Var(b10)= 0.0007778203,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ individual test ~~~~~~~~~~~~~~
intercept -7.4503312427 8.6670028613 -0.8596 0.7389
X1 slope 0.1470880113 0.0268668371 5.4747 29.9724
X2 slope 0.1134822176 0.0269079848 4.2174 17.7866
X3 slope 0.1214958597 0.0277529953 4.3778 19.1648
X4 slope 0.1036262427 0.0277277976 3.7373 13.9672
X5 slope 0.0702971007 0.0289345571 2.4295 5.9026
X6 slope 0.1105042678 0.0273097594 4.0463 16.3728
X7 slope 0.1090938389 0.0269458775 4.0486 16.3914
X8 slope 0.1181866415 0.0271613650 4.3513 18.9336
X9 slope 0.0756706102 0.0285400874 2.6514 7.0298
X10 slope 0.1064396116 0.0278894303 3.8165 14.5656
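Each row of the individual-test listing can be checked by hand: the third number is the estimate divided by its standard error (the t statistic). The unlabeled last column matches the square of that t value, which is the F statistic with 1 numerator degree of freedom; that reading is an inference from the numbers, since the column header did not survive. A short check for the X1 row:

```python
# X1 row of the individual test: estimate and standard error as printed
estimate, std_err = 0.1470880113, 0.0268668371

t_stat = estimate / std_err            # t statistic for H0: slope = 0
f_stat = t_stat ** 2                   # squared t, compared with the last column

print(f"t = {t_stat:.4f}, t^2 = {f_stat:.4f}")
```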
====================
class   lower limit   upper limit   observed no   probability   expected no   chi square
[ 1 ]   -inf          -5.54753      112.00000     0.10000       100.00000     1.44000
[ 2 ]   -5.54753      -3.64302       85.00000     0.10000       100.00000     2.25000
[ 3 ]   -3.64302      -2.26983      101.00000     0.10000       100.00000     0.01000
[ 4 ]   -2.26983      -1.09651       95.00000     0.10000       100.00000     0.25000
[ 5 ]   -1.09651       0.00011      104.00000     0.10000       100.00000     0.16000
[ 6 ]    0.00011       1.09655      103.00000     0.10000       100.00000     0.09000
[ 7 ]    1.09655       2.26983       86.00000     0.10000       100.00000     1.96000
[ 8 ]    2.26983       3.64120      112.00000     0.10000       100.00000     1.44000
[ 9 ]    3.64120       5.54709      110.00000     0.10000       100.00000     1.00000
[ 10 ]   5.54709      +inf           92.00000     0.10000       100.00000     0.64000
degree of freedom=8
p-value=0.322400
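The goodness-of-fit p-value can be recomputed from the observed class counts. The sketch below sums the per-class chi-square contributions and evaluates the upper tail with the closed-form expression that holds for an even number of degrees of freedom; the degrees of freedom (8) are taken from the listing, and the helper name `chi2_sf_even_df` is hypothetical.

```python
import math

# Observed counts of the 10 equal-probability residual classes in (34.1.1)
observed = [112, 85, 101, 95, 104, 103, 86, 112, 110, 92]
expected = [100.0] * 10

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi2_sf_even_df(x, df):
    """P(chi-square with df degrees of freedom > x); valid only for even df."""
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

p_value = chi2_sf_even_df(chi2, 8)     # df = 8 as reported in the listing
print(f"chi-square = {chi2:.2f}, p-value = {p_value:.4f}")
```

A p-value of about 0.32 gives no reason to reject normality of the residuals at the usual significance levels.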
Z=-0.188700, p-value=0.425200
Z=-0.188700, p-value=0.574800
Z=-0.188700, p-value=0.850400
t=2,3,...,1000
D.W. test=1.926742
D.W. test=2.073258
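The Durbin-Watson statistic sums squared successive differences of the residuals over t = 2, ..., n and divides by the residual sum of squares. The two reported values add up to 4, which is consistent with reporting d for the positive-autocorrelation alternative and 4 - d for the negative one. A minimal sketch, using a hypothetical toy residual series rather than the book's data:

```python
def durbin_watson(residuals):
    """d = sum_{t=2..n} (e_t - e_{t-1})^2 / sum_{t=1..n} e_t^2."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den

toy_residuals = [1.0, -1.0, 1.0, -1.0]   # hypothetical, strongly alternating series
d_toy = durbin_watson(toy_residuals)

d_reported = 1.926742                    # first D.W. value from the listing
d_mirror = 4.0 - d_reported              # compare with the second reported value

print(f"toy d = {d_toy}, 4 - d = {d_mirror:.6f}")
```

Values of d near 2, as reported here, indicate no first-order autocorrelation in the residuals.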

[17.446358 , 20.233554]
[4.176884 , 4.498172]
[17.219169 , 20.547976]
[4.149599 , 4.532988]
[16.791603 , 21.191905]
[4.097756 , 4.603467]
[Figures: residual plot; scatter diagram of (X11 estimated line, X11)]
(34.1.2)Non-linear model analysis,
Independent variables are
X1/(1-X1),X2/(1-X2),X3^3,X4^3,X5^3,X6/(1-X6),X7/(1-X7),X8^3,X9/(1-X9),X10/(1-X10),
r(X11,X1/(1-X1))=0.159839,r(X11,X2/(1-X2))=0.120176,r(X11,X3^3)=0.122788,
r(X11,X4^3)=0.120656,r(X11,X5^3)=0.076117,r(X11,X6/(1-X6))=0.113402,
r(X11,X7/(1-X7))=0.142056,r(X11,X8^3)=0.137334,r(X11,X9/(1-X9))=0.093740,
r(X11,X10/(1-X10))=0.102837,r(X1/(1-X1),X2/(1-X2))=-0.020878,r(X1/(1-X1),X3^3)=-0.005132,
r(X1/(1-X1),X4^3)=0.002852,r(X1/(1-X1),X5^3)=-0.016269,r(X1/(1-X1),X6/(1-X6))=-0.010289,
r(X1/(1-X1),X7/(1-X7))=0.021956,r(X1/(1-X1),X8^3)=0.011516,r(X1/(1-X1),X9/(1-X9))=0.036457,
r(X1/(1-X1),X10/(1-X10))=-0.041011,r(X2/(1-X2),X3^3)=-0.043579,r(X2/(1-X2),X4^3)=-0.013706,
r(X2/(1-X2),X5^3)=0.042441,r(X2/(1-X2),X6/(1-X6))=0.009398,
r(X2/(1-X2),X7/(1-X7))=0.065409,r(X2/(1-X2),X8^3)=-0.013791,
r(X2/(1-X2),X9/(1-X9))=-0.026665,r(X2/(1-X2),X10/(1-X10))=-0.027921,r(X3^3,X4^3)=-0.024639,
r(X3^3,X5^3)=0.023045,r(X3^3,X6/(1-X6))=-0.040099,r(X3^3,X7/(1-X7))=0.016333,
r(X3^3,X8^3)=0.038874,r(X3^3,X9/(1-X9))=0.009999,r(X3^3,X10/(1-X10))=-0.019516,
r(X4^3,X5^3)=-0.002575,r(X4^3,X6/(1-X6))=-0.058136,r(X4^3,X7/(1-X7))=0.017290,
r(X4^3,X8^3)=0.073642,r(X4^3,X9/(1-X9))=0.039448,r(X4^3,X10/(1-X10))=0.056368,
r(X5^3,X6/(1-X6))=0.017163,r(X5^3,X7/(1-X7))=0.002357,r(X5^3,X8^3)=-0.038543,
r(X5^3,X9/(1-X9))=0.001146,r(X5^3,X10/(1-X10))=-0.003904,
r(X6/(1-X6),X7/(1-X7))=0.019963,r(X6/(1-X6),X8^3)=-0.024927,
r(X6/(1-X6),X9/(1-X9))=0.019955,r(X6/(1-X6),X10/(1-X10))=0.010906,
r(X7/(1-X7),X8^3)=-0.000673,r(X7/(1-X7),X9/(1-X9))=0.040521,
r(X7/(1-X7),X10/(1-X10))=-0.023087,r(X8^3,X9/(1-X9))=0.009972,
r(X8^3,X10/(1-X10))=-0.007249,r(X9/(1-X9),X10/(1-X10))=-0.038285,
The mathematical form of one or more independent variables has been changed,
so the input order is meaningless.
X11=6655.2612996575+1427.635572*X1/(1-X1)+1109.290130*X2/(1-X2)+0.000004*X3^3
+0.000003*X4^3+0.000002*X5^3+1086.414201*X6/(1-X6)+1080.517700*X7/(1-X7)
+0.000004*X8^3+752.276595*X9/(1-X9)+1047.002574*X10/(1-X10)
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 10 3118.1878836530 311.8187883653 16.7243417739
error 989 18439.5168349765 18.6446075177
total 999 21557.7047186295
----------------------------------------------------------------------------------
MSE=18.6446075177 , R2=0.144644 , R2(adj)=0.135995
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept 6655.2612996575 150.4263295500 44.24266 0.00000
X1/(1-X1) 1427.6355718509 59.8433526094 23.85621 0.00000
X2/(1-X2) 1109.2901296678 60.4827853500 18.34059 0.00000
X3^3 0.0000040097 0.0000002118 18.92945 0.00000
X4^3 0.0000034308 0.0000002121 16.17560 0.00000
X5^3 0.0000023940 0.0000002217 10.79969 0.00000
X6/(1-X6) 1086.4142005958 60.7742339354 17.87623 0.00000
X7/(1-X7) 1080.5176999974 60.0325350419 17.99887 0.00000
X8^3 0.0000039386 0.0000002073 18.99554 0.00000
X9/(1-X9) 752.2765951882 63.4245585025 11.86097 0.00000
X10/(1-X10) 1047.0025740547 62.1957122237 16.83400 0.00000
----------------------------------------------------------------------------------
class   lower limit   upper limit   observed no   probability   expected no   chi square
[ 1 ]   -inf          -5.53386      111.00000     0.10000       100.00000     1.21000
[ 2 ]   -5.53386      -3.63404       82.00000     0.10000       100.00000     3.24000
[ 3 ]   -3.63404      -2.26423      102.00000     0.10000       100.00000     0.04000
[ 4 ]   -2.26423      -1.09380      102.00000     0.10000       100.00000     0.04000
[ 5 ]   -1.09380       0.00011       95.00000     0.10000       100.00000     0.25000
[ 6 ]    0.00011       1.09384      110.00000     0.10000       100.00000     1.00000
[ 7 ]    1.09384       2.26424       85.00000     0.10000       100.00000     2.25000
[ 8 ]    2.26424       3.63222      109.00000     0.10000       100.00000     0.81000
[ 9 ]    3.63222       5.53341      113.00000     0.10000       100.00000     1.69000
[ 10 ]   5.53341      +inf           91.00000     0.10000       100.00000     0.81000
degree of freedom=8
p-value=0.183100
Z=0.197982, p-value=0.578500
Z=0.197982, p-value=0.421500
Z=0.197982, p-value=0.843000
t=2,3,...,1000
D.W. test=1.920537
D.W. test=2.079463
[4.087654 , 4.592118]
[Figures: residual plot; scatter diagram of (X11 estimated line, X11)]
SSR of stepwise in the linear model and in the non-linear model
----------------------------------------------------------------------------------
step   linear model                        non-linear model
       variable entered   SSR              variable entered   SSR
----------------------------------------------------------------------------------
1      X1                 531.3664409936   X1/(1-X1)          550.7637684560
2      ...                401.5706605275   ...                414.0005835120
3      X8                 388.3547404849   X8^3               396.5661458599
4      ...                285.3976832166   ...                292.5981016362
5      ...                309.9942976198   ...                317.9405210125
6      ...                299.5827128196   ...                308.6943160358
7      ...                310.4347613610   ...                312.5261301144
8      ...                257.5509009630   ...                267.1628304095
9      ...                132.0909257631   ...                141.3022651843
10     ...                110.5956960362   ...                116.6332214324
----------------------------------------------------------------------------------
The SSR values of the linear model and the non-linear model are unequal but very close.
All estimated slopes of the linear model are nearly equal, which suggests that X1,..,X10
enter X11 through a function of central tendency. The sample measures of central
tendency are the sample median, the sample mean and the sample midrange, so the linear
model is rebuilt with a sample statistic of central tendency as the independent variable.
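The idea of replacing the ten raw variables by one statistic of central tendency can be tried on simulated data. The sketch below is an illustration under assumptions, not the book's software: it regenerates data from the definition of Example 34 with an arbitrary seed and sample size, then compares the R2 of a simple linear regression of X11 on each candidate statistic (for one regressor, R2 equals the squared sample correlation).

```python
import random
import statistics

def r_squared(x, y):
    """Squared sample correlation, i.e. R2 of a simple linear regression of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

rng = random.Random(34)                      # arbitrary seed
stats = {"median": [], "mean": [], "midrange": []}
x11 = []
for _ in range(20000):                       # illustrative sample size
    xs = [rng.gauss(100.0, 5.0) for _ in range(10)]
    midrange = (min(xs) + max(xs)) / 2.0
    stats["median"].append(statistics.median(xs))
    stats["mean"].append(statistics.fmean(xs))
    stats["midrange"].append(midrange)
    x11.append(midrange + rng.gauss(0.0, 4.0))

r2 = {name: r_squared(values, x11) for name, values in stats.items()}
for name, value in r2.items():
    print(f"{name:9s} R2 = {value:.4f}")
```

Because X11 is built from the midrange, the midrange should give the largest R2, with the mean second and the median last.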
(34.1.3)
The independent variable is a sample statistic of central tendency and the dependent
variable is X11,
(34.1.3.1)Let X1=sample median of (X1,…,X10),X2= X11,
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1 1 898.5240494875 898.5240494875 43.4057388698
error 998 20659.1806691420 20.7005818328
total 999 21557.7047186295
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
H0: slope(X1)=0
----------------------------------------------------------------------------------
intercept 49.3037874860 7.7136389653 6.39177 0.00000
slope 0.5080232590 0.0771098786 6.58830 0.00000
----------------------------------------------------------------------------------
MSE=20.7005818328 , R2=0.041680 , R2(adj)=0.040720
X1(mean)=100.0169779713, X1(variance)=3.4849538007, X1(s.d.)= 1.8668030964
SSX1=3481.4688468525 , SS(X2*X1)= 1768.6671497030, C.V.= 0.0454457483
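The slope, the regression sum of squares and R2 of this simple regression follow directly from the two sums printed above. The check below uses only the reported SSX1, SS(X2*X1) and the total sum of squares from the ANOVA table; variable names are illustrative.

```python
# Sums reported for (34.1.3.1), X1 = sample median, X2 = X11
ss_x = 3481.4688468525      # SSX1
ss_xy = 1768.6671497030     # SS(X2*X1)
sst = 21557.7047186295      # total sum of squares from the ANOVA table

slope = ss_xy / ss_x        # least-squares slope
ssr = slope * ss_xy         # regression sum of squares
r2 = ssr / sst              # coefficient of determination

print(f"slope = {slope:.10f}, SSR = {ssr:.4f}, R2 = {r2:.6f}")
```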
class   lower limit   upper limit   observed no   probability   expected no   chi square
[ 1 ]   -inf          -5.83100      114.00000     0.10000       100.00000     1.96000
[ 2 ]   -5.83100      -3.82916       94.00000     0.10000       100.00000     0.36000
[ 3 ]   -3.82916      -2.38581       88.00000     0.10000       100.00000     1.44000
[ 4 ]   -2.38581      -1.15253       89.00000     0.10000       100.00000     1.21000
[ 5 ]   -1.15253       0.00011      104.00000     0.10000       100.00000     0.16000
[ 6 ]    0.00011       1.15258      109.00000     0.10000       100.00000     0.81000
[ 7 ]    1.15258       2.38581       92.00000     0.10000       100.00000     0.64000
[ 8 ]    2.38581       3.82725      106.00000     0.10000       100.00000     0.36000
[ 9 ]    3.82725       5.83053      106.00000     0.10000       100.00000     0.36000
[ 10 ]   5.83053      +inf           98.00000     0.10000       100.00000     0.04000
degree of freedom=8
p-value=0.500400
Z=-0.047987, p-value=0.480900
Z=-0.047987, p-value=0.519100
Z=-0.047987, p-value=0.961800
t=2,3,...,1000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0 , D.W. test=1.929624
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0 , D.W. test=2.070376
[Figures: residual plot; scatter diagram of (X11 estimated line, X11)]
(34.1.3.2)Let X1=sample mean of (X1,…,X10), X2= X11,

ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1 1 2923.9354965878 2923.9354965878 156.6021125851
error 998 18633.7692220418 18.6711114449
total 999 21557.7047186295
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
H0: slope(X1)=0
----------------------------------------------------------------------------------
intercept -7.6779103511 8.6147955224 -0.89125 0.37280
slope 1.0782578258 0.0861635950 12.51408 0.00000
----------------------------------------------------------------------------------
MSE=18.6711114449 , R2=0.135633 , R2(adj)=0.134767
SSX1=2514.9105918149 , SS(X2*X1)= 2711.7220267114, C.V.= 0.0431605597

class   lower limit   upper limit   observed no   probability   expected no   chi square
[ 1 ]   -inf          -5.53779      116.00000     0.10000       100.00000     2.56000
[ 2 ]   -5.53779      -3.63662       89.00000     0.10000       100.00000     1.21000
[ 3 ]   -3.63662      -2.26584       90.00000     0.10000       100.00000     1.00000
[ 4 ]   -2.26584      -1.09458       93.00000     0.10000       100.00000     0.49000
[ 5 ]   -1.09458       0.00011      102.00000     0.10000       100.00000     0.04000
[ 6 ]    0.00011       1.09462      115.00000     0.10000       100.00000     2.25000
[ 7 ]    1.09462       2.26584       82.00000     0.10000       100.00000     3.24000
[ 8 ]    2.26584       3.63480      108.00000     0.10000       100.00000     0.64000
[ 9 ]    3.63480       5.53734      114.00000     0.10000       100.00000     1.96000
[ 10 ]   5.53734      +inf           91.00000     0.10000       100.00000     0.81000
degree of freedom=8
p-value=0.076600
Z=0.075963, p-value=0.530300
Z=0.075963, p-value=0.469700
Z=0.075963, p-value=0.939400
t=2,3,...,1000
[Figures: scatter plot; scatter diagram of (X11 estimated line, X11)]
(34.1.3.3)X1=sample midrange of (X1,…,X10), X2= X11,

ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1 1 5311.5659127240 5311.5659127240 326.2893936971
error 998 16246.1388059055 16.2786961983
total 999 21557.7047186295
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
H0: slope(X1)=0
----------------------------------------------------------------------------------
intercept -7.1972862353 5.9421969848 -1.21122 0.22600
slope 1.0736134778 0.0594355761 18.06348 0.00000
----------------------------------------------------------------------------------
MSE=16.2786961983 , R2=0.246388 , R2(adj)=0.245633
SSX1=4608.1506155634 , SS(X2*X1)= 4947.3726088020, C.V.= 0.0403006259
class   lower limit   upper limit   observed no   probability   expected no   chi square
[ 1 ]   -inf          -5.17084      109.00000     0.10000       100.00000     0.81000
[ 2 ]   -5.17084      -3.39565       95.00000     0.10000       100.00000     0.25000
[ 3 ]   -3.39565      -2.11570       94.00000     0.10000       100.00000     0.36000
[ 4 ]   -2.11570      -1.02205       99.00000     0.10000       100.00000     0.01000
[ 5 ]   -1.02205       0.00010       90.00000     0.10000       100.00000     1.00000
[ 6 ]    0.00010       1.02209      100.00000     0.10000       100.00000     0.00000
[ 7 ]    1.02209       2.11570      114.00000     0.10000       100.00000     1.96000
[ 8 ]    2.11570       3.39395      106.00000     0.10000       100.00000     0.36000
[ 9 ]    3.39395       5.17043       87.00000     0.10000       100.00000     1.69000
[ 10 ]   5.17043      +inf          106.00000     0.10000       100.00000     0.36000
degree of freedom=8
p-value=0.558300
Z=-1.181679, p-value=0.118700
Z=-1.181679, p-value=0.881300
Z=-1.181679, p-value=0.237400
t=2,3,...,1000
D.W. test=1.904664
D.W. test=2.095336

[Figures: residual plot; scatter diagram of (X11 estimated line, X11)]
(34.1.4) The best linear model of the three models.
Independent variable: sample midrange of (X1,…,X10); dependent variable: X2= X11.
X2=-7.197286+1.073613* sample midrange of (X1,…,X10)+residual,
residual~Normal(0,16.2786961983).
The intercept test of H0: b0=0 has p-value=0.22600, so the intercept is not significant, giving
X2=1.073613*sample midrange of (X1,…,X10) +error,
(34.2) n = 100,000,000, it is big data.

(34.2.1)The linear model,
Independent variables are X1,X2,X3,X4,X5,X6,X7,X8,X9,X10
r(X11,X1)=0.109974,r(X11,X2)=0.109977,r(X11,X3)=0.110003,r(X11,X4)=0.109937,
r(X11,X5)=0.110145,r(X11,X6)=0.109958,r(X11,X7)=0.110137,r(X11,X8)=0.109748,
r(X11,X9)=0.110318,r(X11,X10)=0.110312,r(X1,X2)=-0.000120,r(X1,X3)=0.000241,
r(X1,X4)=-0.000214,r(X1,X5)=0.000247,r(X1,X6)=-0.000222,r(X1,X7)=-0.000010,
r(X1,X8)=-0.000170,r(X1,X9)=-0.000169,r(X1,X10)=0.000065,r(X2,X3)=0.000030,
r(X2,X4)=-0.000088,r(X2,X5)=0.000003,r(X2,X6)=-0.000218,r(X2,X7)=0.000251,
r(X2,X8)=-0.000196,r(X2,X9)=-0.000120,r(X2,X10)=0.000296,r(X3,X4)=0.000000,
r(X3,X5)=0.000017,r(X3,X6)=0.000210,r(X3,X7)=0.000012,r(X3,X8)=-0.000366,
r(X3,X9)=0.000282,r(X3,X10)=0.000050,r(X4,X5)=-0.000060,r(X4,X6)=-0.000024,
r(X4,X7)=-0.000089,r(X4,X8)=0.000187,r(X4,X9)=0.000149,r(X4,X10)=-0.000003,
r(X5,X6)=-0.000042,r(X5,X7)=0.000018,r(X5,X8)=0.000094,r(X5,X9)=-0.000292,
r(X5,X10)=0.000027,r(X6,X7)=-0.000096,r(X6,X8)=-0.000131,r(X6,X9)=0.000085,
r(X6,X10)=-0.000165,r(X7,X8)=-0.000023,r(X7,X9)=-0.000037,r(X7,X10)=0.000191,
r(X8,X9)=-0.000270,r(X8,X10)=0.000237,r(X9,X10)=0.000294,
X11=0.011014+0.099968*X1+0.099930*X2+0.099888*X3+0.099892*X4+0.100078*X5
+0.099962*X6+0.100054*X7+0.099772*X8+0.100227*X9+0.100115*X10
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 10 124971361.0827400700 12497136.1082740070 689030.0901626347
error 49999989 906864122.2877736100 18.1372864360
total 49999999 1031835483.3705137000
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept 0.0110135921 0.0380967324 0.28910 0.77240
X1 0.0999679442 0.0001204760 829.77463 0.00000
X2 0.0999300896 0.0001204493 829.64416 0.00000
X3 0.0998878245 0.0001204471 829.30876 0.00000
X4 0.0998924611 0.0001204500 829.32703 0.00000
X5 0.1000779894 0.0001204645 830.76725 0.00000
X6 0.0999622438 0.0001204553 829.87029 0.00000
X7 0.1000537324 0.0001204693 830.53287 0.00000
X8 0.0997723293 0.0001204530 828.30917 0.00000
X9 0.1002271516 0.0001204440 832.14750 0.00000
X10 0.1001151352 0.0001204452 831.20924 0.00000
----------------------------------------------------------------------------------
MSE=18.1372864360 , R2=0.121116 , R2(adj)=0.121115
~~~~~~~~~~~~~~~~~~~~~~~~~~ individual test ~~~~~~~~~~~~~~~~~~~~~~
intercept 0.0110135921 0.0380967324 0.2891 0.0836
X1 slope 0.0999679442 0.0001204760 829.7746 688525.9418
X2 slope 0.0999300896 0.0001204493 829.6442 688309.4337
X3 slope 0.0998878245 0.0001204471 829.3088 687753.0188
X4 slope 0.0998924611 0.0001204500 829.3270 687783.3216
X5 slope 0.1000779894 0.0001204645 830.7672 690174.2230
X6 slope 0.0999622438 0.0001204553 829.8703 688684.6935
X7 slope 0.1000537324 0.0001204693 830.5329 689784.8444
X8 slope 0.0997723293 0.0001204530 828.3092 686096.0828
X9 slope 0.1002271516 0.0001204440 832.1475 692469.4537
X10 slope 0.1001151352 0.0001204452 831.2092 690908.8073
====================
class   lower limit   upper limit   observed no     probability   expected no     chi square
[ 1 ]   -inf          -7.00530      2498239.00000   0.05000       2500000.00000    1.24045
[ 2 ]   -7.00530      -5.45805      2494891.00000   0.05000       2500000.00000   10.44075
[ 3 ]   -5.45805      -4.41392      2500279.00000   0.05000       2500000.00000    0.03114
[ 4 ]   -4.41392      -3.58425      2498080.00000   0.05000       2500000.00000    1.47456
[ 5 ]   -3.58425      -2.87241      2500951.00000   0.05000       2500000.00000    0.36176
[ 6 ]   -2.87241      -2.23321      2499743.00000   0.05000       2500000.00000    0.02642
[ 7 ]   -2.23321      -1.64087      2502938.00000   0.05000       2500000.00000    3.45274
[ 8 ]   -1.64087      -1.07992      2497196.00000   0.05000       2500000.00000    3.14497
[ 9 ]   -1.07992      -0.53496      2508220.00000   0.05000       2500000.00000   27.02736
[ 10 ]  -0.53496      -0.00096      2493915.00000   0.05000       2500000.00000   14.81089
[ 11 ]  -0.00096       0.53420      2502167.00000   0.05000       2500000.00000    1.87836
[ 12 ]   0.53420       1.07886      2507317.00000   0.05000       2500000.00000   21.41540
[ 13 ]   1.07886       1.64083      2502174.00000   0.05000       2500000.00000    1.89051
[ 14 ]   1.64083       2.23322      2501887.00000   0.05000       2500000.00000    1.42431
[ 15 ]   2.23322       2.87224      2501793.00000   0.05000       2500000.00000    1.28594
[ 16 ]   2.87224       3.58399      2498769.00000   0.05000       2500000.00000    0.60614
[ 17 ]   3.58399       4.41373      2500123.00000   0.05000       2500000.00000    0.00605
[ 18 ]   4.41373       5.45761      2497027.00000   0.05000       2500000.00000    3.53549
[ 19 ]   5.45761       7.00510      2494048.00000   0.05000       2500000.00000   14.17052
[ 20 ]   7.00510      +inf          2500243.00000   0.05000       2500000.00000    0.02362
p-value=0.000000
Z=0.429932, p-value=0.666400
Z=0.429932, p-value=0.333600
Z=0.429932, p-value=0.667200
t=2,3,...,50000000
D.W. test=1.999919
D.W. test=2.000081

sample mean(residual)= 0.0000, sample variance(residual)= 18.1373,
sample cov(X11 estimated value,residual)= 0.0000,
X11 estimated value and residual sample correlation coefficient=0.0000.
(34.2.1.1)The marginal probability distribution of the estimated dependent variable X11,
X11 estimated line probability distribution
Variance : 2.49943
S.D. : 1.58096
MAD : 1.26151
Range : 17.33966
Mid_range : 100.25892
Median : 99.99987
Q1 : 98.93349
Q2 : 99.99987
Q3 : 101.06666
IQR : 2.13317
C.V. : 0.01581
SLLN analysis, X11 and Normal(100, 2.49943),

Note:X12~Normal(100, 2.49943),

Variance : 18.13728
S.D. : 4.25879
MAD : 3.39696
Range : 48.16873
Median : 0.00020
Q1 : -2.87038
Q2 : 0.00020
Q3 : 2.86960
IQR : 5.73998
C.V. : none
Note:X1~Normal(0, 18.13728), where X1 denotes a random variable representing Normal(0, 18.13728),
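Because the fitted values and the residuals of (34.2.1) are reported as uncorrelated, their variances add up to the variance of X11, and R2 is the share contributed by the fitted values. The sketch below checks this with the numbers printed in (34.2.1) and (34.2.1.1); it is plain arithmetic, and the variable names are illustrative.

```python
# Reported quantities from (34.2.1) and (34.2.1.1)
var_fitted = 2.49943               # variance of the X11 estimated line
var_resid = 18.13728               # variance of the residuals
sst = 1031835483.3705137           # total sum of squares from the ANOVA table
df_total = 49999999                # total degrees of freedom from the ANOVA table

var_x11 = sst / df_total           # sample variance of X11
var_sum = var_fitted + var_resid   # variance decomposition
r2 = var_fitted / var_sum          # share of variance explained

print(f"Var(X11) = {var_x11:.5f}, fitted + residual = {var_sum:.5f}, R2 = {r2:.6f}")
```

The fitted variance is close to 25/10 = 2.5, the variance of the mean of ten Normal(100, 25) draws, which agrees with all ten slopes being close to 0.1.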
(34.2.2)Non-linear model analysis

Independent variables are
X1/(1-X1),X2/(1-X2),X3/(1-X3),X4/(1-X4),X5/(1-X5),X6/(1-X6),X7/(1-X7),X8/(1-X8),X9/(1-X9),
X10/(1-X10),

r(X11,X1/(1-X1))=0.110172,r(X11,X2/(1-X2))=0.110181,r(X11,X3/(1-X3))=0.110202,
r(X11,X4/(1-X4))=0.110137,r(X11,X5/(1-X5))=0.110348,r(X11,X6/(1-X6))=0.110160,
r(X11,X7/(1-X7))=0.110350,r(X11,X8/(1-X8))=0.109932,r(X11,X9/(1-X9))=0.110512,
r(X11,X10/(1-X10))=0.110515,r(X1/(1-X1),X2/(1-X2))=-0.000108,
r(X1/(1-X1),X3/(1-X3))=0.000242,r(X1/(1-X1),X4/(1-X4))=-0.000232,
r(X1/(1-X1),X5/(1-X5))=0.000241,r(X1/(1-X1),X6/(1-X6))=-0.000248,
r(X1/(1-X1),X7/(1-X7))=0.000000,r(X1/(1-X1),X8/(1-X8))=-0.000179,
r(X1/(1-X1),X9/(1-X9))=-0.000172,r(X1/(1-X1),X10/(1-X10))=0.000057,
r(X2/(1-X2),X3/(1-X3))=0.000029,r(X2/(1-X2),X4/(1-X4))=-0.000088,
r(X2/(1-X2),X5/(1-X5))=0.000003,r(X2/(1-X2),X6/(1-X6))=-0.000223,
r(X2/(1-X2),X7/(1-X7))=0.000253,r(X2/(1-X2),X8/(1-X8))=-0.000198,
r(X2/(1-X2),X9/(1-X9))=-0.000153,r(X2/(1-X2),X10/(1-X10))=0.000284,
r(X3/(1-X3),X4/(1-X4))=-0.000002,r(X3/(1-X3),X5/(1-X5))=0.000017,
r(X3/(1-X3),X6/(1-X6))=0.000198,r(X3/(1-X3),X7/(1-X7))=0.000031,
r(X3/(1-X3),X8/(1-X8))=-0.000340,r(X3/(1-X3),X9/(1-X9))=0.000277,
r(X3/(1-X3),X10/(1-X10))=0.000059,r(X4/(1-X4),X5/(1-X5))=-0.000081,
r(X4/(1-X4),X6/(1-X6))=-0.000018,r(X4/(1-X4),X7/(1-X7))=-0.000117,
r(X4/(1-X4),X8/(1-X8))=0.000217,r(X4/(1-X4),X9/(1-X9))=0.000127,
r(X4/(1-X4),X10/(1-X10))=-0.000007,r(X5/(1-X5),X6/(1-X6))=-0.000034,
r(X5/(1-X5),X7/(1-X7))=0.000041,r(X5/(1-X5),X8/(1-X8))=0.000106,
r(X5/(1-X5),X9/(1-X9))=-0.000276,r(X5/(1-X5),X10/(1-X10))=0.000016,
r(X6/(1-X6),X7/(1-X7))=-0.000096,r(X6/(1-X6),X8/(1-X8))=-0.000144,
r(X6/(1-X6),X9/(1-X9))=0.000083,r(X6/(1-X6),X10/(1-X10))=-0.000189,
r(X7/(1-X7),X8/(1-X8))=-0.000008,r(X7/(1-X7),X9/(1-X9))=-0.000049,
r(X7/(1-X7),X10/(1-X10))=0.000179,r(X8/(1-X8),X9/(1-X9))=-0.000246,
r(X8/(1-X8),X10/(1-X10))=0.000253,r(X9/(1-X9),X10/(1-X10))=0.000268,

The mathematical form of one or more independent variables has been changed,
so the input order is meaningless.

X11= 9915.3591722846+971.504502*X1/(1-X1)+971.161806*X2/(1-X2)+970.646087*X3/(1-X3)
+970.809395*X4/(1-X4)+972.569407*X5/(1-X5)+971.498969*X6/(1-X6)
+972.415861*X7/(1-X7)+969.349676*X8/(1-X8)+973.969357*X9/(1-X9)
+973.031030*X10/(1-X10)
ANOVA
----------------------------------------------------------------------------------
Source df SS MS
----------------------------------------------------------------------------------
Regression 10 125428463.3759459300 12542846.3375945930
error 49999989 906407019.9945677500 18.1281443881
total 49999999 1031835483.3705137000
----------------------------------------------------------------------------------
F test value=691899.0752213928
MSE=18.1281443881 , R2=0.121559 , R2(adj)=0.121558
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept 9915.3591722846 0.8764141772 11313.55406 0.00000
X1/(1-X1) 971.5045024011 0.2744079765 3540.36539 0.00000
X2/(1-X2) 971.1618056388 0.2743424836 3539.96141 0.00000
X3/(1-X3) 970.6460874266 0.2743399774 3538.11390 0.00000
X4/(1-X4) 970.8093946087 0.2743530261 3538.54087 0.00000
X5/(1-X5) 972.5694065264 0.2743888410 3544.49329 0.00000
X6/(1-X6) 971.4989688649 0.2743603896 3540.95928 0.00000
X7/(1-X7) 972.4158608132 0.2743946795 3543.85829 0.00000
X8/(1-X8) 969.3496756554 0.2743566548 3533.17355 0.00000
X9/(1-X9) 973.9693574405 0.2743280596 3550.38183 0.00000
X10/(1-X10) 973.0310297902 0.2743459174 3546.73049 0.00000
class   lower limit   upper limit   observed no     probability   expected no     chi square
[ 1 ]   -inf          -7.00354      2492186.00000   0.05000       2500000.00000   24.42344
[ 2 ]   -7.00354      -5.45668      2497050.00000   0.05000       2500000.00000    3.48100
[ 3 ]   -5.45668      -4.41281      2503600.00000   0.05000       2500000.00000    5.18400
[ 4 ]   -4.41281      -3.58335      2502368.00000   0.05000       2500000.00000    2.24297
[ 5 ]   -3.58335      -2.87168      2503151.00000   0.05000       2500000.00000    3.97152
[ 6 ]   -2.87168      -2.23265      2502599.00000   0.05000       2500000.00000    2.70192
[ 7 ]   -2.23265      -1.64045      2504506.00000   0.05000       2500000.00000    8.12161
[ 8 ]   -1.64045      -1.07965      2499442.00000   0.05000       2500000.00000    0.12455
[ 9 ]   -1.07965      -0.53482      2509863.00000   0.05000       2500000.00000   38.91151
[ 10 ]  -0.53482      -0.00096      2493407.00000   0.05000       2500000.00000   17.38706
[ 11 ]  -0.00096       0.53407      2501643.00000   0.05000       2500000.00000    1.07978
[ 12 ]   0.53407       1.07859      2505185.00000   0.05000       2500000.00000   10.75369
[ 13 ]   1.07859       1.64042      2501491.00000   0.05000       2500000.00000    0.88923
[ 14 ]   1.64042       2.23265      2500565.00000   0.05000       2500000.00000    0.12769
[ 15 ]   2.23265       2.87151      2499057.00000   0.05000       2500000.00000    0.35570
[ 16 ]   2.87151       3.58309      2494270.00000   0.05000       2500000.00000   13.13316
[ 17 ]   3.58309       4.41262      2497706.00000   0.05000       2500000.00000    2.10497
[ 18 ]   4.41262       5.45624      2494081.00000   0.05000       2500000.00000   14.01382
[ 19 ]   5.45624       7.00333      2491518.00000   0.05000       2500000.00000   28.77773
[ 20 ]   7.00333      +inf          2506312.00000   0.05000       2500000.00000   15.93654
p-value=0.000000
Z=0.362145, p-value=0.641400
Z=0.362145, p-value=0.358600
Z=0.362145, p-value=0.717200
t=2,3,...,50000000
D.W. test=1.999922
D.W. test=2.000078
[18.122183 , 18.134110]
[4.257016 , 4.258416]
[18.121041 , 18.135253]
[4.256882 , 4.258551]
[18.118809 , 18.137489]
[4.256619 , 4.258813]
(34.2.2.1)The marginal probability distribution of the estimated line of the dependent variable,
X11 estimated line probability distribution
Variance : 2.50855
S.D. : 1.58384
MAD : 1.26330
Range : 17.46074
Median : 100.02558
Q1 : 98.94654
Q2 : 100.02558
Q3 : 101.08149
IQR : 2.13495
C.V. : 0.01584
SLLN analysis, X11 estimated line and Normal (100, 2.50855),

Note:X12~ Normal (100, 2.50855),
where X12 denotes a random variable representing Normal (100, 2.50855),
X11 estimated line is not Normal(100,2.50855),

Variance : 18.12814
S.D. : 4.25772
MAD : 3.39614
Range : 48.09841
Median : -0.00270
Q1 : -2.87123
Q2 : -0.00270
Q3 : 2.86724
IQR : 5.73847
C.V. : none
Note:X1~ Normal(0, 18.12814),
(34.2.3)The marginal probability distributions of X1,…,X11. There is no linear
relationship between any two of the random variables X1,..,X10.
The marginal probability distribution of X1 and X11.
Variance : 24.99201
S.D. : 4.99920
MAD : 3.98885
Range : 58.27985
Median : 100.00022
Q1 : 96.62679
Q2 : 100.00022
Q3 : 103.37103
IQR : 6.74424
C.V. : 0.04999
X1,X2,…,X10 ~ iid Normal(100,25),
Variance : 20.63671
S.D. : 4.54276
MAD : 3.62362
Range : 50.74723
Median : 99.99902
Q1 : 96.93858
Q2 : 99.99902
Q3 : 103.06278
IQR : 6.12420
C.V. : 0.04543
(34.2.4) The joint probability distribution of one of X1,…,X10 and X11,
f(x1,x2) and f(x1,x11) only,
f(x1,x2) f(x2,x1)

sample cov(X1,X2)= -0.0030, X1 and X2 sample correlation coefficient=-0.0001.
f(x1,x11) f(x11,x1)

sample mean(X1) = sample mean(X2) = … = sample mean(X10) = sample mean(X11) = 100,
E(sample median(X1,…,X10)) = E(sample mean(X1,…,X10)) = E(sample midrange(X1,…,X10)) = 100.
These sample statistics of central tendency are discussed next.
(34.2.5) The marginal probability distributions of sample median(X1,…,X10),
sample mean(X1,…,X10) and sample midrange(X1,…,X10), and
the joint probability distribution of each sample statistic with X11.
Y1= sample median(X1,…,X10),
Variance : 3.45868
S.D. : 1.85975
MAD : 1.48278
Range : 20.68329
Mid_range : 100.06641
Median : 99.99999
Q1 : 98.74835
Q2 : 99.99999
Q3 : 101.25230
IQR : 2.50395
C.V. : 0.01860
Y2= sample mean(X1,…,X10),
Variance : 2.49999
S.D. : 1.58113
MAD : 1.26165
Range : 17.34104
Mid_range : 100.25825
Median : 100.00016
Q1 : 98.93366
Q2 : 100.00016
Q3 : 101.06709
IQR : 2.13342
C.V. : 0.01581
Y3= sample midrange(X1,…,X10)

Variance : 4.63861
S.D. : 2.15374
MAD : 1.70995
Range : 26.29790
Mid_range : 100.31064
Median : 99.99992
Q1 : 98.56607
Q2 : 99.99992
Q3 : 101.43409
IQR : 2.86803
C.V. : 0.02154
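The three variances above can be compared with a small simulation. The sketch below is not the authors' simulator; it assumes only the stated model X1,…,X10 ~ iid Normal(100, 25), and the function name, the replication count and the seed are illustrative choices. With far fewer replications than the book uses, the estimates should only land near the reported values (about 3.46, 2.50 and 4.64).

```python
import random
import statistics


def simulate_center_statistics(n_reps=20000, n_vars=10, mu=100.0, sigma=5.0, seed=0):
    """Estimate the variance of the sample median, mean and midrange.

    Each replication draws n_vars iid Normal(mu, sigma^2) values
    (sigma = 5 corresponds to the variance 25 stated in the text).
    """
    rng = random.Random(seed)
    medians, means, midranges = [], [], []
    for _ in range(n_reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n_vars)]
        medians.append(statistics.median(xs))
        means.append(statistics.fmean(xs))
        midranges.append((min(xs) + max(xs)) / 2.0)
    return {
        "median": statistics.variance(medians),
        "mean": statistics.variance(means),
        "midrange": statistics.variance(midranges),
    }


variances = simulate_center_statistics()
for name, value in variances.items():
    print(f"Var(sample {name}) ~ {value:.3f}")
```

The sample mean has the theoretical variance 25/10 = 2.5, which is the smallest of the three for normal data.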
f(y1,y4), f(y4,y1),
Y1= sample median(X1,…,X10),
Y4=X11,

sample cov(Y1,Y4)= 1.6138,Y1 and Y4 sample correlation coefficient=0.1910.
E(Y4|Y1) Var(Y4|Y1)
f(y2,y4), f(y4,y2),
Y2= sample mean(X1,…,X10),
Y4=X11,

sample cov(Y2,Y4)= 2.4997,Y2 and Y4 sample correlation coefficient=0.3480.
E(Y4|Y2) Var(Y4|Y2)
f(y3,y4), f(y4,y3),
Y3= sample midrange(X1,…,X10),
Y4=X11,

E(Y4|Y3) Var(Y4|Y3)
Y1= sample median(X1,…,X10), Y2= sample mean (X1,…,X10),
f(y1,y2), f(y2,y1),

Y1= sample median(X1,…,X10), Y3= sample midrange(X1,…,X10),

f(y1,y3), f(y3,y1),

Y2= sample mean(X1,…,X10), Y3= sample midrange(X1,…,X10),
f(y2,y3), f(y3,y2),

Comparing the coefficients of determination, the sample midrange(X1,…,X10) is the best independent variable for X11.
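The comparison can be made explicit by converting each reported sample covariance into a correlation coefficient and squaring it. The sketch below uses only figures printed in this example; it assumes that the second summary block of (34.2.3), with S.D. 4.54276, describes X11, and it takes the midrange value R2=0.224754 from the regression output of (34.2.6). The function name is illustrative.

```python
def correlation_from_cov(cov_xy, sd_x, sd_y):
    """Correlation coefficient r = cov(X, Y) / (sd(X) * sd(Y))."""
    return cov_xy / (sd_x * sd_y)


SD_X11 = 4.54276  # assumption: S.D. of X11 from the second block of (34.2.3)

# Reported sample covariances with X11 and the S.D. of each statistic.
r_median = correlation_from_cov(1.6138, 1.85975, SD_X11)
r_mean = correlation_from_cov(2.4997, 1.58113, SD_X11)

r2_by_statistic = {
    "sample median": r_median ** 2,
    "sample mean": r_mean ** 2,
    "sample midrange": 0.224754,  # R2 reported in (34.2.6)
}
best = max(r2_by_statistic, key=r2_by_statistic.get)
for name, r2 in r2_by_statistic.items():
    print(f"{name}: R2 ~ {r2:.4f}")
print("largest R2:", best)
```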
(34.2.6) Let X1 = sample midrange(X1,…,X10), X2 = X11.

ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1 1 231909492.1941784600 231909492.1941784600 14495683.6929853410
error 49999998 799925991.1763352200 15.9985204635
total 49999999 1031835483.3705137000
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
H0: slope(X1)=0
----------------------------------------------------------------------------------
intercept 0.0045784986 0.0262700757 0.17429 0.86160
slope 0.9999552219 0.0002626402 3807.31975 0.00000
----------------------------------------------------------------------------------
MSE=15.9985204635 , R2=0.224754 , R2(adj)=0.224754
SSX1=231930262.5135440800 , SS(X2*X1)=231919877.1213423000,
C.V.= 0.0399981637
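The quantities in this output are tied together by the usual simple-regression identities, and the printed figures can be checked against each other. The sketch below is an illustration, not the program that produced the output; the variable names are hypothetical and the numbers are copied from the ANOVA table and the summary lines above.

```python
# Figures copied from the (34.2.6) output.
ss_regression = 231909492.1941784600
ss_error = 799925991.1763352200
ss_total = 1031835483.3705137000
df_regression, df_error = 1, 49999998
ss_x1 = 231930262.5135440800       # SSX1
ss_x2_x1 = 231919877.1213423000    # SS(X2*X1)

slope = ss_x2_x1 / ss_x1                            # slope = SS(X2*X1) / SSX1
mse = ss_error / df_error                           # mean squared error
f_value = (ss_regression / df_regression) / mse     # F = MSR / MSE
r_squared = ss_regression / ss_total                # R2 = SSR / SST

print(f"slope={slope:.10f}  MSE={mse:.10f}")
print(f"F={f_value:.4f}  R2={r_squared:.6f}")
```

In a one-regressor model F equals the square of the slope's t value, which agrees with the reported t of 3807.31975.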

class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ] [ 10 ] [ 11 ] [ 12 ] [ 13 ] [ 14 ] [ 15 ]
[ 16 ] [ 17 ] [ 18 ] [ 19 ] [ 20 ]
lower limit -6.57932 -5.12615 -4.14551 -3.36630 -2.69774 -2.09741
-1.54109 -1.01425 -0.50243 -0.00091 0.50172 1.01325 1.54106 2.09742
2.69758 3.36605 4.14533 5.12574 6.57912
upper limit -6.57932 -5.12615 -4.14551 -3.36630 -2.69774 -2.09741 -1.54109
-1.01425 -0.50243 -0.00091 0.50172 1.01325 1.54106 2.09742 2.69758
3.36605 4.14533 5.12574 6.57912
observed no 2499843.00000 2497922.00000 2503165.00000 2499465.00000 2501094.00000 2498744.00000
2500649.00000 2496794.00000 2502781.00000 2497297.00000 2498742.00000 2503158.00000
2499099.00000 2500824.00000 2499902.00000 2498744.00000 2502175.00000 2500066.00000
2499780.00000 2499756.00000
probability 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000
0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000
0.05000 0.05000 0.05000 0.05000 0.05000
expected no 2500000.00000 2500000.00000 2500000.00000 2500000.00000 2500000.00000 2500000.00000
2500000.00000 2500000.00000 2500000.00000 2500000.00000 2500000.00000 2500000.00000
2500000.00000 2500000.00000 2500000.00000 2500000.00000 2500000.00000 2500000.00000
2500000.00000 2500000.00000
chi square 0.00986 1.72723 4.00689 0.11449 0.47873 0.63101 0.16848
4.11137 3.09358 2.92248 0.63303 3.98919 0.32472 0.27159 0.00384
0.63101 1.89225 0.00174 0.01936 0.02381
p-value=0.123400
Z=-0.326914, p-value=0.371900
Z=-0.326914, p-value=0.628100
Z=-0.326914, p-value=0.743800
[15.993259 , 16.003785]
[3.999157 , 4.000473]
[15.992251 , 16.004794]
[3.999031 , 4.000599]
[15.990282 , 16.006768]
[3.998785 , 4.000846]
[Figures: estimated line and residual; estimated line and X2]
X11 = sample midrange(X1,…,X10) + error, error ~ Normal(0, 15.9985204635),
6.6. Dummy variable is one of the independent variables, the other
Example 35,
Dummy=0,
X1 ~ Normal(E(X1) = 100, Var(X1) = 25),
ε ~ Normal(E(ε) = 0, Var(ε) = 16),
X3 | Dummy=0, x1, x2 = 50 + 2 x1 + 3 x2 + ε
Dummy=1,
X1 ~ Normal(E(X1) = 100, Var(X1) = 25),
ε ~ Normal(E(ε) = 0, Var(ε) = 16),
X3 | Dummy=1, x1, x2 = 10 - x1 + 5 x2 + ε
(35.1) 1000 paired samples when Dummy=0 and 2000 paired samples when Dummy=1,
(35.1.1) Dummy=0,
Independent variables are X1,X2
r(X3,X1)=0.992210,r(X3,X2)=0.994992,r(X1,X2)=0.995619,
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 2 1670340.2253973677 835170.1126986839 50856.1998071640
error 997 16372.9221907629 16.4221887570
total 999 1686713.1475881306
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept 37.5364372935 7.0209839612 5.34632 0.00000
X1 1.4586785103 0.2699270014 5.40397 0.00000
X2 3.2672239363 0.1337101427 24.43512 0.00000
----------------------------------------------------------------------------------
MSE=16.4221887570 , R2=0.990293 , R2(adj)=0.990274
~~~~~~~~~~~~~~~~~~~~~~~~~~~ individual test ~~~~~~~~~~~~~
intercept 37.5364372935 7.0209839612 5.3463 28.5832
X1 slope 1.4586785103 0.2699270014 5.4040 29.2029
X2 slope 3.2672239363 0.1337101427 24.4351 597.0753
class     lower limit  upper limit  observed no  probability  expected no  chi square
[ 1 ]     -inf         -5.19358      99.00000    0.10000      100.00000    0.01000
[ 2 ]     -5.19358     -3.41058     114.00000    0.10000      100.00000    1.96000
[ 3 ]     -3.41058     -2.12500     111.00000    0.10000      100.00000    1.21000
[ 4 ]     -2.12500     -1.02655      92.00000    0.10000      100.00000    0.64000
[ 5 ]     -1.02655      0.00010      89.00000    0.10000      100.00000    1.21000
[ 6 ]      0.00010      1.02658      81.00000    0.10000      100.00000    3.61000
[ 7 ]      1.02658      2.12501     104.00000    0.10000      100.00000    0.16000
[ 8 ]      2.12501      3.40888     108.00000    0.10000      100.00000    0.64000
[ 9 ]      3.40888      5.19316     100.00000    0.10000      100.00000    0.00000
[ 10 ]     5.19316     +inf         102.00000    0.10000      100.00000    0.04000
degree of freedom=8
p-value=0.303400
Z=0.446149, p-value=0.672300
Z=0.446149, p-value=0.327700
Z=0.446149, p-value=0.655400
t=2,3,...,1000
D.W. test=2.041210
D.W. test=1.958790
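The chi-square goodness-of-fit figures reported for the (35.1.1) residuals can be reproduced from the observed and expected class counts. The sketch below is a standard-library illustration and not the authors' program; the function names are hypothetical, and the closed-form tail probability used here holds only for an even number of degrees of freedom, which is the case for the reported df of 8.

```python
import math


def chi_square_terms(observed, expected):
    """Per-class contributions (O - E)^2 / E and their sum."""
    terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
    return terms, sum(terms)


def chi_square_upper_tail_even_df(x, df):
    """P(chi-square(df) > x), using the finite series valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k   # builds (x/2)^k / k!
        total += term
    return math.exp(-half) * total


# Observed class counts of (35.1.1); 1000 residuals in 10 equal-probability classes.
observed = [99, 114, 111, 92, 89, 81, 104, 108, 100, 102]
expected = [100.0] * 10

terms, statistic = chi_square_terms(observed, expected)
p_value = chi_square_upper_tail_even_df(statistic, df=8)
print(f"chi-square={statistic:.5f}, p-value={p_value:.4f}")
```

The per-class terms match the "chi square" row of the output, and the tail probability agrees with the reported p-value of 0.303400 to four decimals.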
(35.1.2) Dummy=1,
r(X3,X1)=0.989692,r(X3,X2)=0.996094,r(X1,X2)=0.994836,
The estimated line is X3=3.135026+-1.115684*X1+5.074093*X2
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 2 4016310.4152957569 2008155.2076478784 129633.4189882031
error 1997 30935.5872966504 15.4910301936
total 1999 4047246.0025924072
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept 3.1350260377 4.7189833703 0.66434 0.50640
X1 -1.1156843386 0.1760572333 -6.33705 0.00000
X2 5.0740930310 0.0875175417 57.97801 0.00000
----------------------------------------------------------------------------------
MSE=15.4910301936 , R2=0.992356 , R2(adj)=0.992349
~~~~~~~~~~~~~~~~~~~~~~~~~~ individual test ~~~~~~~~~~~~~~~~~~~
intercept 3.1350260377 4.7189833703 0.6643 0.4414
X1 slope -1.1156843386 0.1760572333 -6.3371 40.1583
X2 slope 5.0740930310 0.0875175417 57.9780 3361.4498
class     lower limit  upper limit  observed no  probability  expected no  chi square
[ 1 ]     -inf         -5.25552     180.00000    0.09091      181.81818    0.01818
[ 2 ]     -5.25552     -3.57581     191.00000    0.09091      181.81818    0.46368
[ 3 ]     -3.57581     -2.37980     175.00000    0.09091      181.81818    0.25568
[ 4 ]     -2.37980     -1.37290     163.00000    0.09091      181.81818    1.94768
[ 5 ]     -1.37290     -0.44969     187.00000    0.09091      181.81818    0.14768
[ 6 ]     -0.44969      0.44899     197.00000    0.09091      181.81818    1.26768
[ 7 ]      0.44899      1.37185     166.00000    0.09091      181.81818    1.37618
[ 8 ]      1.37185      2.37860     188.00000    0.09091      181.81818    0.21018
[ 9 ]      2.37860      3.57418     203.00000    0.09091      181.81818    2.46768
[ 10 ]     3.57418      5.25278     178.00000    0.09091      181.81818    0.08018
[ 11 ]     5.25278     +inf         172.00000    0.09091      181.81818    0.53018
degree of freedom=9
p-value=0.459200
Z=-1.163046, p-value=0.122500
Z=-1.163046, p-value=0.877500
Z=-1.163046, p-value=0.245000
t=2,3,...,2000
D.W. test=1.967591
D.W. test=2.032409
[Figures: residual plot; scatter diagram of (X3 estimated line, X3)]
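Each output block reports two "D.W. test" values that sum to 4 (for example 1.967591 + 2.032409), which suggests that the second is 4 minus the first; that reading is an inference from the numbers, not something the text states. The sketch below shows the usual Durbin-Watson statistic computed over t=2,…,n on a toy residual series; the function name and the toy data are illustrative.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum_{t=2..n} (e_t - e_{t-1})^2 / sum_{t=1..n} e_t^2."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# Toy residuals that alternate in sign (strong negative autocorrelation).
toy_residuals = [1.0, -1.0, 1.0, -1.0]
dw = durbin_watson(toy_residuals)
print("D.W. =", dw, " 4 - D.W. =", 4.0 - dw)

# The two values reported for (35.1.2) sum to 4.
reported_pair_sum = 1.967591 + 2.032409
print("sum of reported pair =", round(reported_pair_sum, 6))
```

Values near 2, as in the outputs of this example, indicate no first-order autocorrelation in the residuals.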
(35.1.3) Merging the two lines, the Dummy=0 line and the Dummy=1 line; Dummy explains the two lines,
Dummy = 0: X3 = β0* + β1* X1 + β2* X2 + ε,
X3=37.536437+1.458679*X1+3.267224*X2,
Dummy = 1: X3 = β0 + β1 X1 + β2 X2 + ε,
X3=3.135026+-1.115684*X1+5.074093*X2,
\[
X_3 = \beta_0^* + (\beta_0 - \beta_0^*) \times \mathrm{Dummy} + \beta_1^* \times X_1 + (\beta_1 - \beta_1^*) \times \mathrm{Dummy} \times X_1 + \beta_2^* \times X_2 + (\beta_2 - \beta_2^*) \times \mathrm{Dummy} \times X_2 + \varepsilon,
\]
\[
\hat{\beta}_0^* = 37.536437, \quad \hat{\beta}_0 - \hat{\beta}_0^* = 3.135026 - 37.536437 = -34.401411,
\]
\[
\hat{\beta}_1^* = 1.458679, \quad \hat{\beta}_1 - \hat{\beta}_1^* = -1.115684 - 1.458679 = -2.574363,
\]
\[
\hat{\beta}_2^* = 3.267224, \quad \hat{\beta}_2 - \hat{\beta}_2^* = 5.074093 - 3.267224 = 1.806869,
\]
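The merged line can be assembled directly from the two fitted lines: the Dummy=0 coefficients act as the baseline, and each Dummy term carries the difference between the Dummy=1 and Dummy=0 coefficients. The sketch below is illustrative (the names `merged_coefficients` and `predict_x3` are hypothetical) and uses the estimates of (35.1.1) and (35.1.2).

```python
# Fitted (intercept, X1 slope, X2 slope) for each group, from (35.1.1) and (35.1.2).
line_dummy0 = (37.536437, 1.458679, 3.267224)
line_dummy1 = (3.135026, -1.115684, 5.074093)


def merged_coefficients(base, other):
    """Return the baseline coefficients and the Dummy-interaction differences."""
    differences = tuple(o - b for b, o in zip(base, other))
    return base, differences


def predict_x3(dummy, x1, x2, base, differences):
    """Evaluate the merged line for Dummy in {0, 1}."""
    b0, b1, b2 = base
    d0, d1, d2 = differences
    return (b0 + d0 * dummy) + (b1 + d1 * dummy) * x1 + (b2 + d2 * dummy) * x2


base, differences = merged_coefficients(line_dummy0, line_dummy1)
print("differences:", [round(d, 6) for d in differences])

# With Dummy=1 the merged line reproduces the Dummy=1 line exactly.
x1, x2 = 100.0, 250.0
direct = line_dummy1[0] + line_dummy1[1] * x1 + line_dummy1[2] * x2
print(predict_x3(1, x1, x2, base, differences), direct)
```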
(35.2) 100,000,000 pair samples when Dummy=0,
100,000,000 pair samples when Dummy=1,
This is big data.
(35.2.1)
Dummy=0
r(X3,X1)=0.992276,r(X3,X2)=0.994758,r(X1,X2)=0.995035,
ANOVA
----------------------------------------------------------------------------------
Source df SS MS
----------------------------------------------------------------------------------
Regression 2 160891286545.8791800000 80445643272.9395900000
error 99999997 1600009673.5958138000 16.0000972160
total 99999999 162491296219.4750100000
----------------------------------------------------------------------------------
F test value=5027822155.5235453000
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept 49.9819685798 0.0215437686 2320.01975 0.00000
X1 1.9990339848 0.0008038458 2486.83762 0.00000
X2 3.0004562569 0.0003999390 7502.28390 0.00000
----------------------------------------------------------------------------------
MSE=16.0000972160 , R2=0.990153 , R2(adj)=0.990153
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ] [ 10 ] [ 11 ] [ 12 ] [ 13 ] [ 14 ] [ 15 ]
[ 16 ] [ 17 ] [ 18 ] [ 19 ] [ 20 ]
lower limit -6.57964 -5.12640 -4.14572 -3.36646 -2.69787 -2.09752
-1.54116 -1.01430 -0.50245 -0.00091 0.50174 1.01330 1.54113 2.09752
2.69771 3.36622 4.14554 5.12599 6.57945
upper limit -6.57964 -5.12640 -4.14572 -3.36646 -2.69787 -2.09752 -1.54116
-1.01430 -0.50245 -0.00091 0.50174 1.01330 1.54113 2.09752 2.69771
3.36622 4.14554 5.12599 6.57945
observed no 5000255.00000 4995740.00000 4996814.00000 5000481.00000 5002041.00000 5000762.00000
5006458.00000 4989759.00000 5009520.00000 4991637.00000 5000057.00000 5009650.00000
4999340.00000 4999557.00000 5000250.00000 4998997.00000 4999801.00000 5000112.00000
4997946.00000 5000823.00000
probability 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000
0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000
0.05000 0.05000 0.05000 0.05000 0.05000
expected no 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000
5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000
5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000
5000000.00000 5000000.00000
chi square 0.01300 3.62952 2.03012 0.04627 0.83314 0.11613 8.34115
20.97562 18.12608 13.98795 0.00065 18.62450 0.08712 0.03925 0.01250
0.20120 0.00792 0.00251 0.84378 0.13547
p-value=0.000000
Z=0.230026, p-value=0.591000
Z=0.230026, p-value=0.409000
Z=0.230026, p-value=0.818000
t=2,3,...,100000000
D.W. test=2.000129
D.W. test=1.999871
[Figures: X3 estimated line and residual; X3 estimated line and X3]

Variance : 16.00010
S.D. : 4.00001
MAD : 3.19133
Range : 44.51238
Median : -0.00026
Q1 : -2.69727
Q2 : -0.00026
Q3 : 2.69742
IQR : 5.39469
C.V. : none
(35.2.2) Dummy=1
r(X3,X1)=0.990027,r(X3,X2)=0.996060,r(X1,X2)=0.995038,
The estimated line is X3=9.981917+-1.001015*X1+5.000479*X2
ANOVA
----------------------------------------------------------------------------------
Source df SS MS
F
----------------------------------------------------------------------------------
Regression 2 205006682851.6498400000 102503341425.8249200000
error 99999997 1600005568.4168379000 16.0000561642
total 99999999 206606688420.0666800000
----------------------------------------------------------------------------------
F test value=6406436350.8527622000,
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept 9.9819167145 0.0215435153 463.33742 0.00000
X1 -1.0010152763 0.0008040300 -1244.99738 0.00000
X2 5.0004786870 0.0004000125 12500.80602 0.00000
----------------------------------------------------------------------------------
MSE=16.0000561642 , R2=0.992256 , R2(adj)=0.992256
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ] [ 10 ] [ 11 ] [ 12 ] [ 13 ] [ 14 ] [ 15 ]
[ 16 ] [ 17 ] [ 18 ] [ 19 ] [ 20 ]
lower limit -6.57963 -5.12640 -4.14571 -3.36646 -2.69787 -2.09751
-1.54116 -1.01430 -0.50245 -0.00091 0.50174 1.01330 1.54113 2.09752
2.69771 3.36621 4.14553 5.12598 6.57944
upper limit -6.57963 -5.12640 -4.14571 -3.36646 -2.69787 -2.09751 -1.54116
-1.01430 -0.50245 -0.00091 0.50174 1.01330 1.54113 2.09752 2.69771
3.36621 4.14553 5.12598 6.57944
observed no 5001503.00000 4996434.00000 4999121.00000 5000801.00000 5004121.00000 4997297.00000
5001568.00000 4991596.00000 5009405.00000 4986991.00000 5001564.00000 5007662.00000
4998122.00000 5001150.00000 4998299.00000 5002529.00000 5000721.00000 4997985.00000
5003478.00000 4999653.00000
probability 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000
0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000
0.05000 0.05000 0.05000 0.05000 0.05000
expected no 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000
5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000
5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000
5000000.00000 5000000.00000
chi square 0.45180 2.54327 0.15453 0.12832 3.39653 1.46124 0.49172
14.12544 17.69081 33.84682 0.48922 11.74125 0.70538 0.26450 0.57868
1.27917 0.10397 0.81205 2.41930 0.02408
p-value=0.000000
Z=0.141220, p-value=0.556200
Z=0.141220, p-value=0.443800
Z=0.141220, p-value=0.887600
t=2,3,...,100000000
D.W. test=2.000287
D.W. test=1.999713
[Figures: joint probability distribution of the X3 estimated line and the residual; joint probability distribution of the X3 estimated line and X3]

Variance : 16.00006
S.D. : 4.00001
MAD : 3.19159
Range : 45.37638
Median : 0.00023
Q1 : -2.69811
Q2 : 0.00023
Q3 : 2.69826
IQR : 5.39638
C.V. : none
(35.2.3) Merging the two lines, the Dummy=0 line and the Dummy=1 line; Dummy explains the two lines.
Dummy ~ Bernoulli(p = 0.5), so the sample sizes of the two lines are equal,
X1 ~ Normal(E(X1) = 100, Var(X1) = 25),
ε ~ Normal(E(ε) = 0, Var(ε) = 16),
X3 | Dummy=0, x1, x2 = 50 + 2 x1 + 3 x2 + ε
and
X1 ~ Normal(E(X1) = 100, Var(X1) = 25),
X2 | x1 ~ Normal(E(X2|x1) = 50 + 2 x1, Var(X2|x1) = 1),
ε ~ Normal(E(ε) = 0, Var(ε) = 16),
X3 | Dummy=1, x1, x2 = 10 - x1 + 5 x2 + ε
Dummy is independent of X1, X2 and ε; ε is independent of X1 and X2; X1 and X2 are dependent random variables.
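The joint probability distribution f(Dummy, x1, x2, x3) is obtained from f(Dummy, x1, x2, error):
\[
f(\mathrm{Dummy}, x_1, x_2) = 0.5^{\mathrm{Dummy}} \times 0.5^{1-\mathrm{Dummy}} \times \frac{1}{5\sqrt{2\pi}} \exp\left(-\frac{(x_1-100)^2}{50}\right) \times \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(x_2-(50+2x_1))^2}{2}\right),
\]
\[
f(\mathrm{error}) = \frac{1}{4\sqrt{2\pi}} \exp\left(-\frac{\mathrm{error}^2}{32}\right), \quad -\infty<\mathrm{error}<\infty,\ -\infty<x_1,x_2<\infty,\ \mathrm{Dummy}=0,1,
\]
\[
f(x_3 = 50 + 2x_1 + 3x_2 + \mathrm{error} \mid \mathrm{Dummy}=0, x_1, x_2) = \frac{1}{4\sqrt{2\pi}} \exp\left(-\frac{(x_3 - 50 - 2x_1 - 3x_2)^2}{32}\right),
\]
\[
f(x_3 = 10 - x_1 + 5x_2 + \mathrm{error} \mid \mathrm{Dummy}=1, x_1, x_2) = \frac{1}{4\sqrt{2\pi}} \exp\left(-\frac{(x_3 - 10 + x_1 - 5x_2)^2}{32}\right), \quad -\infty<x_3<\infty,
\]
or
\[
f(x_3 = Q + \mathrm{error} \mid \mathrm{Dummy}, x_1, x_2) = \frac{1}{4\sqrt{2\pi}} \exp\left(-\frac{(x_3 - Q)^2}{32}\right), \quad -\infty<x_3<\infty,
\]
\[
Q = 50 - 40 \times \mathrm{Dummy} + 2x_1 - 3 \times \mathrm{Dummy} \times x_1 + 3x_2 + 2 \times \mathrm{Dummy} \times x_2,
\]
\[
f(\mathrm{Dummy}, x_1, x_2, x_3) = f(\mathrm{Dummy}, x_1, x_2)\, f(x_3 \mid \mathrm{Dummy}, x_1, x_2),
\]
\[
f(x_3) = \sum_{\mathrm{Dummy}=0}^{1} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(\mathrm{Dummy}, x_1, x_2, x_3)\, dx_1\, dx_2 .
\]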
The conditional probability distribution of X3 given Dummy=0.
Variance : 1624.91298
S.D. : 40.31021
MAD : 32.16495
Range : 454.78176
Mid_range : 997.76593
Median : 999.99921
Q1 : 972.81214
Q2 : 999.99921
Q3 : 1027.19611
IQR : 54.38398
C.V. : 0.04031
X3|Dummy=0~Normal(1000.00251, 1624.91298),
The conditional probability distribution of X3 given Dummy=1.

Variance : 2066.06690
S.D. : 45.45401
MAD : 36.26540
Range : 513.58205
Mid_range : 1153.56746
Median : 1159.99448
Q1 : 1129.34318
Q2 : 1159.99448
Q3 : 1190.64885
IQR : 61.30568
C.V. : 0.03918
X3|Dummy=1~Normal(1159.99545,2066.06690),
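As a cross-check on the two fitted conditional distributions, the theoretical moments follow from the stated model. The derivation below is a sketch that assumes X2 = 50 + 2 X1 + u with u ~ Normal(0, 1) independent of X1 and ε (the symbol u is introduced here for illustration only), and that the Dummy=1 line is X3 = 10 - x1 + 5 x2 + ε, as in the conditional density of this section.

```latex
\begin{aligned}
\text{Dummy}=0:\quad X_3 &= 50 + 2X_1 + 3(50 + 2X_1 + u) + \varepsilon = 200 + 8X_1 + 3u + \varepsilon,\\
E(X_3) &= 200 + 8 \times 100 = 1000,\\
\mathrm{Var}(X_3) &= 8^2 \times 25 + 3^2 \times 1 + 16 = 1625,\\[4pt]
\text{Dummy}=1:\quad X_3 &= 10 - X_1 + 5(50 + 2X_1 + u) + \varepsilon = 260 + 9X_1 + 5u + \varepsilon,\\
E(X_3) &= 260 + 9 \times 100 = 1160,\\
\mathrm{Var}(X_3) &= 9^2 \times 25 + 5^2 \times 1 + 16 = 2066 .
\end{aligned}
```

These values sit close to the simulated estimates Normal(1000.00251, 1624.91298) and Normal(1159.99545, 2066.06690).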
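\[
f(x_3) = P(\mathrm{Dummy}=0)\, f(x_3 \mid \mathrm{Dummy}=0) + P(\mathrm{Dummy}=1)\, f(x_3 \mid \mathrm{Dummy}=1)
\]
\[
= 0.5 \times \frac{1}{\sqrt{2\pi \times 1624.91298}} \exp\left(-\frac{(x_3 - 1000.00251)^2}{2 \times 1624.91298}\right) + 0.5 \times \frac{1}{\sqrt{2\pi \times 2066.06690}} \exp\left(-\frac{(x_3 - 1159.99545)^2}{2 \times 2066.06690}\right), \quad -\infty<x_3<\infty,
\]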

Variance : 8244.70330
S.D. : 90.80035
MAD : 81.06944
Range : 625.20751
Mid_range : 1090.13817
Median : 1075.22177
Q1 : 999.98193
Q2 : 1075.22177
Q3 : 1160.00571
IQR : 160.02378
C.V. : 0.08407
Note: the marginal probability distribution of X3 is not the distribution of
(X3|Dummy=0 + X3|Dummy=1)/2
~ Normal((1000.00251+1159.99545)/2, (1624.91298+2066.06690)/4),
the marginal probability distribution of (X3|Dummy=0 + X3|Dummy=1)/2 is summarized below.

S.D. : 30.37770
MAD : 24.23845
Range : 349.66821
Mid_range : 1078.74152
Median : 1079.99648
Q1 : 1059.50930
Q2 : 1079.99648
Q3 : 1100.49100
IQR : 40.98169
C.V. : 0.02813
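The gap between the mixture and the average of the two conditional variables shows up directly in their moments. The sketch below is an illustration with hypothetical function names; it uses the fitted conditional means and variances reported above and treats the two conditional variables as independent when averaging, which is an assumption consistent with the variance formula in the note.

```python
import math


def mixture_moments(weights, means, variances):
    """Mean and variance of a finite mixture of distributions."""
    mean = sum(w * m for w, m in zip(weights, means))
    variance = sum(w * (v + (m - mean) ** 2) for w, m, v in zip(weights, means, variances))
    return mean, variance


def average_moments(means, variances):
    """Mean and variance of the plain average of independent variables."""
    n = len(means)
    return sum(means) / n, sum(variances) / n ** 2


means = [1000.00251, 1159.99545]
variances = [1624.91298, 2066.06690]

mix_mean, mix_var = mixture_moments([0.5, 0.5], means, variances)
avg_mean, avg_var = average_moments(means, variances)

print(f"mixture: mean={mix_mean:.5f}, variance={mix_var:.3f}, S.D.={math.sqrt(mix_var):.3f}")
print(f"average: mean={avg_mean:.5f}, variance={avg_var:.3f}, S.D.={math.sqrt(avg_var):.3f}")
```

Both have the same mean, but the mixture variance also contains the squared distance between the two group means, so it is far larger (about 8245 against about 923), in line with the reported S.D. values 90.80035 and 30.37770.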
6.7. The endogenous variable in the linear model, the other
Example 36,
X2(t+1) = β0 + β1 X1(t) + β2 X3(t) + β3 X4(t) + ε1(t),
X1(t+1) = α0 + α1 X2(t+1) + α2 X3(t+1) + α3 X4(t+1) + ε2(t+1),
X3(t) ~ Normal(mu=10, sigma*sigma=4),
X4(t) ~ Normal(mu=30+2*X3, sigma*sigma=25),
X2(t+1) = 0.1 + 0.8 X1(t) + 0.2 X3(t) - 0.02 X4(t) + ε1(t),
X1(t+1) = 0.2 + 0.9 X2(t+1) + 0.3 X3(t+1) - 0.01 X4(t+1) + ε2(t+1),
ε1 = ε2 = ε(t) ~ Normal(0,1), t = 0,1,2,…,n-1, X1(t=0) = 10,
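The recursion above can be simulated in a few lines. The sketch below is not the authors' simulator: it assumes that ε1 and ε2 are drawn independently from Normal(0, 1) at each step (the text can also be read as one shared shock), that X3 and X4 are redrawn at every time point, and the function name, sample size and seed are illustrative.

```python
import random
import statistics


def simulate_system(n, seed=0):
    """Simulate X1 and X2 from the two-equation system of Example 36."""
    rng = random.Random(seed)
    x1 = 10.0                               # X1(t=0) = 10
    x3 = rng.gauss(10.0, 2.0)               # X3 ~ Normal(10, 4)
    x4 = rng.gauss(30.0 + 2.0 * x3, 5.0)    # X4 | X3 ~ Normal(30 + 2*X3, 25)
    x1_path, x2_path = [], []
    for _ in range(n):
        # X2(t+1) depends on X1(t), X3(t), X4(t).
        x2 = 0.1 + 0.8 * x1 + 0.2 * x3 - 0.02 * x4 + rng.gauss(0.0, 1.0)
        # Draw X3(t+1) and X4(t+1), then X1(t+1) depends on X2(t+1).
        x3 = rng.gauss(10.0, 2.0)
        x4 = rng.gauss(30.0 + 2.0 * x3, 5.0)
        x1 = 0.2 + 0.9 * x2 + 0.3 * x3 - 0.01 * x4 + rng.gauss(0.0, 1.0)
        x1_path.append(x1)
        x2_path.append(x2)
    return x1_path, x2_path


x1_path, x2_path = simulate_system(20000)
mean_x1 = statistics.fmean(x1_path[1000:])  # drop an initial burn-in stretch
mean_x2 = statistics.fmean(x2_path[1000:])
print(f"mean X1 ~ {mean_x1:.3f}, mean X2 ~ {mean_x2:.3f}")
```

Setting the means of both equations equal to their long-run values gives E(X1) = 3.69/0.28, about 13.18, and E(X2) = 1.1 + 0.8 E(X1), about 11.64, which the simulated averages should approach.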

(36.1.1) Merging the two lines gives 2000 paired samples.
X1 is the dependent variable; X2, X3, X4 are the independent variables.
X1 = α0 + α1 X2 + α2 X3 + α3 X4 + ε2,
The linear model analysis
Independent variables are X2,X3,X4
r(X1,X2)=0.821468,r(X1,X3)=0.112102,r(X1,X4)=0.059818,r(X2,X3)=0.077767,
r(X2,X4)=0.025066,r(X3,X4)=0.609887,
ANOVA
----------------------------------------------------------------------------------
Source df SS MS
----------------------------------------------------------------------------------
Regression 3 5145.3824161158 1715.1274720386
error 1996 2451.5230847452 1.2282179783
total 1999 7596.9055008611
----------------------------------------------------------------------------------
F test value=1396.4357323377
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept 2.5047251362 0.2532236294 9.89136 0.00000
X2 0.8697515895 0.0135652896 64.11596 0.00000
X3 0.0395712780 0.0163199930 2.42471 0.01520
X4 0.0048332598 0.0050010169 0.96646 0.33360
----------------------------------------------------------------------------------
MSE=1.2282179783 , R2=0.677300 , R2(adj)=0.676815
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ] [ 10 ] [ 11 ]
lower limit -1.47983 -1.00687 -0.67010 -0.38658 -0.12662 0.12643
0.38628 0.66976 1.00641 1.47906
upper limit -1.47983 -1.00687 -0.67010 -0.38658 -0.12662 0.12643 0.38628
0.66976 1.00641 1.47906
observed no 178.00000 186.00000 168.00000 172.00000 189.00000 193.00000 161.00000
210.00000 197.00000 162.00000 184.00000
probability 0.09091 0.09091 0.09091 0.09091 0.09091 0.09091 0.09091
0.09091 0.09091 0.09091 0.09091
expected no 181.81818 181.81818 181.81818 181.81818 181.81818 181.81818 181.81818
181.81818 181.81818 181.81818 181.81818
chi square 0.08018 0.09618 1.05018 0.53018 0.28368 0.68768 2.38368
4.36818 1.26768 2.16018 0.02618
degree of freedom=9
p-value=0.165600
Z=-4.199955, p-value=0.000100
Z=-4.199955, p-value=0.999900
Z=-4.199955, p-value=0.000200
t=2,3,...,2000
D.W. test=1.717703
D.W. test=2.282297

[1.167432 , 1.295682]
[1.080477 , 1.138280]
[1.156467 , 1.309461]
[1.075392 , 1.144317]
[1.135613 , 1.337267]
[1.065651 , 1.156403]
(36.1.2) Merging the two lines gives 2000 paired samples.
X2 is the dependent variable and X1, X3, X4 are the independent variables,
X2 = β0 + β1 X1 + β2 X3 + β3 X4 + ε1,
r(X2,X1)=0.821468,r(X2,X3)=0.077767,r(X2,X4)=0.025066,r(X1,X3)=0.112102,
r(X1,X4)=0.059818,r(X3,X4)=0.609887,
The estimated line is X2=1.805642+0.773962*X1+0.000384*X3+-0.007151*X4
ANOVA
----------------------------------------------------------------------------------
Source df SS MS
----------------------------------------------------------------------------------
Regression 3 4538.9481394521 1512.9827131507
error 1996 2181.5246729835 1.0929482330
total 1999 6720.4728124356
----------------------------------------------------------------------------------
F test value=1384.3132433239
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept 1.8056421122 0.2412956998 7.48311 0.00000
X1 0.7739615277 0.0120712769 64.11596 0.00000
X3 0.0003842533 0.0154177372 0.02492 0.98000
X4 -0.0071513426 0.0047159801 -1.51641 0.12960
----------------------------------------------------------------------------------
MSE=1.0929482330 , R2=0.675391 , R2(adj)=0.674903
class     lower limit  upper limit  observed no  probability  expected no  chi square
[ 1 ]     -inf         -1.39597     197.00000    0.09091      181.81818    1.26768
[ 2 ]     -1.39597     -0.94980     177.00000    0.09091      181.81818    0.12768
[ 3 ]     -0.94980     -0.63212     165.00000    0.09091      181.81818    1.55568
[ 4 ]     -0.63212     -0.36467     196.00000    0.09091      181.81818    1.10618
[ 5 ]     -0.36467     -0.11945     177.00000    0.09091      181.81818    0.12768
[ 6 ]     -0.11945      0.11926     194.00000    0.09091      181.81818    0.81618
[ 7 ]      0.11926      0.36439     183.00000    0.09091      181.81818    0.00768
[ 8 ]      0.36439      0.63180     170.00000    0.09091      181.81818    0.76818
[ 9 ]      0.63180      0.94937     181.00000    0.09091      181.81818    0.00368
[ 10 ]     0.94937      1.39524     163.00000    0.09091      181.81818    1.94768
[ 11 ]     1.39524     +inf         197.00000    0.09091      181.81818    1.26768
degree of freedom=9
p-value=0.437600
Z=-3.262117, p-value=0.000600
Z=-3.262117, p-value=0.999400
Z=-3.262117, p-value=0.001200
t=2,3,...,2000
D.W. test=1.737485
D.W. test=2.262515
[1.038856 , 1.152982]
[1.019243 , 1.073770]
[1.029100 , 1.165243]
[1.014446 , 1.079464]
[1.010542 , 1.189988]
[1.005257 , 1.090866]
(36.1.3) X1(t+1) = 0.2 + 0.9 X2(t+1) + 0.3 X3(t+1) - 0.01 X4(t+1) + ε2(t+1), there are 1000 paired samples.
r(X1,X2)=0.819799,r(X1,X3)=0.239361,r(X1,X4)=0.149957,r(X2,X3)=-0.017271,
r(X2,X4)=0.014170,r(X3,X4)=0.647157, The estimated line is
X1=0.778870+0.878632*X2+0.291467*X3+-0.013764*X4
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 3 2808.8009175528 936.2669725176 932.8222440846
error 996 999.6780314159 1.0036928026
total 999 3808.4789489687
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept 0.7788696180 0.3249882453 2.39661 0.01640
X2 0.8786315019 0.0172949985 50.80264 0.00000
X3 0.2914669693 0.0219899836 13.25453 0.00000
X4 -0.0137636091 0.0065889029 -2.08891 0.03680
----------------------------------------------------------------------------------
MSE=1.0036928026 , R2=0.737513 , R2(adj)=0.736722
class     lower limit  upper limit  observed no  probability  expected no  chi square
[ 1 ]     -inf         -1.28396      96.00000    0.10000      100.00000    0.16000
[ 2 ]     -1.28396     -0.84317     100.00000    0.10000      100.00000    0.00000
[ 3 ]     -0.84317     -0.52534      98.00000    0.10000      100.00000    0.04000
[ 4 ]     -0.52534     -0.25378     102.00000    0.10000      100.00000    0.04000
[ 5 ]     -0.25378      0.00003      96.00000    0.10000      100.00000    0.16000
[ 6 ]      0.00003      0.25379     105.00000    0.10000      100.00000    0.25000
[ 7 ]      0.25379      0.52535     100.00000    0.10000      100.00000    0.00000
[ 8 ]      0.52535      0.84275     109.00000    0.10000      100.00000    0.81000
[ 9 ]      0.84275      1.28386      94.00000    0.10000      100.00000    0.36000
[ 10 ]     1.28386     +inf         100.00000    0.10000      100.00000    0.00000
degree of freedom=8
p-value=0.986000
Z=-1.131181, p-value=0.129000
Z=-1.131181, p-value=0.871000
Z=-1.131181, p-value=0.258000
t=2,3,...,1000
D.W. test=1.994126
D.W. test=2.005874
[0.934790 , 1.083562]
[0.966845 , 1.040943]
[0.922656 , 1.100335]
[0.960550 , 1.048969]
[0.899818 , 1.134680]
[0.948587 , 1.065214]
(36.1.4) X2(t+1) = 0.1 + 0.8 X1(t) + 0.2 X3(t) - 0.02 X4(t) + ε1(t), there are 1000 paired samples.
r(X2,X1)=0.823147,r(X2,X3)=0.170141,r(X2,X4)=0.036211,r(X1,X3)=-0.011653,
r(X1,X4)=-0.032400,r(X3,X4)=0.572137,
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 3 2393.3213355196 797.7737785065 821.7709160524
error 996 966.9150706982 0.9707982638
total 999 3360.2364062178
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept 0.3015963945 0.3379359668 0.89247 0.37200
X1 0.7757719497 0.0160169882 48.43432 0.00000
X3 0.2010013433 0.0194928200 10.31156 0.00000
X4 -0.0175805681 0.0061397492 -2.86340 0.00420
----------------------------------------------------------------------------------
MSE=0.9707982638 , R2=0.712248 , R2(adj)=0.711381
class     lower limit  upper limit  observed no  probability  expected no  chi square
[ 1 ]     -inf         -1.26275     100.00000    0.10000      100.00000    0.00000
[ 2 ]     -1.26275     -0.82923      92.00000    0.10000      100.00000    0.64000
[ 3 ]     -0.82923     -0.51666     107.00000    0.10000      100.00000    0.49000
[ 4 ]     -0.51666     -0.24959     112.00000    0.10000      100.00000    1.44000
[ 5 ]     -0.24959      0.00002     100.00000    0.10000      100.00000    0.00000
[ 6 ]      0.00002      0.24960      86.00000    0.10000      100.00000    1.96000
[ 7 ]      0.24960      0.51667     101.00000    0.10000      100.00000    0.01000
[ 8 ]      0.51667      0.82882     104.00000    0.10000      100.00000    0.16000
[ 9 ]      0.82882      1.26264      99.00000    0.10000      100.00000    0.01000
[ 10 ]     1.26264     +inf          99.00000    0.10000      100.00000    0.01000
degree of freedom=8
p-value=0.787000
Z=1.408094, p-value=0.920500
Z=1.408094, p-value=0.079500
Z=1.408094, p-value=0.159000
t=2,3,...,1000
D.W. test=2.091406
D.W. test=1.908594
(36.1.5) Conclusion,
The output above shows that the two lines cannot be merged into one line.
(36.2) Paired samples, n=50,000,000; this is big data.

(36.2.1) Merging the two lines gives 100,000,000 paired samples.
X1 = α0 + α1 X2 + α2 X3 + α3 X4 + ε2,
r(X1,X2)=0.848577,r(X1,X3)=0.130399,r(X1,X4)=0.072374,r(X2,X3)=0.079323,
r(X2,X4)=0.030193,r(X3,X4)=0.624759,
ANOVA
----------------------------------------------------------------------------------
Source df SS MS
----------------------------------------------------------------------------------
Regression 3 333486636.3724753900 111162212.1241584600
error 99999996 127019317.8117549400 1.2701932289
total 99999999 460505954.1842303300
----------------------------------------------------------------------------------
F test value=87515985.4364943800
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept 1.9159002951 0.0010857543 1764.57991 0.00000
X2 0.8986221367 0.0000561273 16010.43525 0.00000
X3 0.0601329080 0.0000723854 830.73206 0.00000
X4 0.0039825055 0.0000225497 176.61009 0.00000
----------------------------------------------------------------------------------
MSE=1.2701932289 , R2=0.724174 , R2(adj)=0.724174
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ] [ 10 ] [ 11 ] [ 12 ] [ 13 ] [ 14 ] [ 15 ]
[ 16 ] [ 17 ] [ 18 ] [ 19 ] [ 20 ]
lower limit -1.85385 -1.44440 -1.16808 -0.94852 -0.76014 -0.59099
-0.43423 -0.28579 -0.14157 -0.00026 0.14137 0.28550 0.43422 0.59099
0.76010 0.94845 1.16803 1.44428 1.85380
upper limit -1.85385 -1.44440 -1.16808 -0.94852 -0.76014 -0.59099 -0.43423
-0.28579 -0.14157 -0.00026 0.14137 0.28550 0.43422 0.59099 0.76010
0.94845 1.16803 1.44428 1.85380
observed no 4998822.00000 4984458.00000 4988518.00000 4995274.00000 5001333.00000 5000594.00000
5011305.00000 4993699.00000 5015713.00000 4995329.00000 5008776.00000 5018245.00000
5006062.00000 5009219.00000 5000800.00000 5000197.00000 4997609.00000 4992711.00000
4984845.00000 4996491.00000
probability 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000
0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000
0.05000 0.05000 0.05000 0.05000 0.05000
expected no 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000
5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000
5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000
5000000.00000 5000000.00000
chi square 0.27754 48.31075 26.36726 4.46702 0.35538 0.07057 25.56060
7.94052 49.37967 4.36365 15.40364 66.57600 7.34957 16.99799 0.12800
0.00776 1.14338 10.62590 45.93480 2.46262
p-value=0.000000
Z=-890.032474, p-value=0.000000
Z=-890.032474, p-value=1.000000
Z=-890.032474, p-value=0.000000
t=2,3,...,100000000
D.W. test=1.724288
D.W. test=2.275712
[1.269898 , 1.270489]
[1.126897 , 1.127160]
[1.269841 , 1.270545]
[1.126872 , 1.127185]
[1.269731 , 1.270656]
[1.126823 , 1.127234]
The marginal probability distribution of X1 estimated line
Variance : 3.33487
S.D. : 1.82616
MAD : 1.45705
Range : 21.73526
Median : 13.17991
Q1 : 11.94844
Q2 : 13.17991
Q3 : 14.41184
IQR : 2.46340
C.V. : 0.13856

Variance : 1.27019
S.D. : 1.12703
MAD : 0.89873
Range : 12.95684
Mid_range : 0.18301
Median : 0.00017
Q1 : -0.75902
Q2 : 0.00017
Q3 : 0.75910
IQR : 1.51812
C.V. : none
(36.2.2) Merging the two lines gives 100,000,000 paired samples.

X2 = β0 + β1 X1 + β2 X3 + β3 X4 + ε1,
r(X2,X1)=0.848577,r(X2,X3)=0.079323,r(X2,X4)=0.030193,r(X1,X3)=0.130399,
r(X1,X4)=0.072374,r(X3,X4)=0.624759,
The estimated line is X2=1.593907+0.800519*X1+-0.020101*X3+-0.005993*X4
ANOVA
----------------------------------------------------------------------------------
Source df SS MS
F
----------------------------------------------------------------------------------
Regression 3 292852966.2136475400 97617655.4045491810
error 99999996 113152596.5576512400 1.1315260108
total 99999999 406005562.7712987700
----------------------------------------------------------------------------------
F test value=86270801.0859536680
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept 1.5939066857 0.0010283287 1549.99726 0.00000
X1 0.8005194001 0.0000499999 16010.43525 0.00000
X3 -0.0201006237 0.0000685260 -293.32850 0.00000
X4 -0.0059930572 0.0000212781 -281.65318 0.00000
----------------------------------------------------------------------------------
MSE=1.1315260108 , R2=0.721303 , R2(adj)=0.721303
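The summary line can be re-derived from the ANOVA table. The sketch below copies the sums of squares and degrees of freedom from the table of (36.2.2) and applies the usual definitions MS = SS/df, F = MSR/MSE and R2 = SSR/SST. The variable names are illustrative only.

```python
# Values copied from the ANOVA table of (36.2.2).
ssr = 292852966.2136475400   # regression sum of squares
sse = 113152596.5576512400   # error sum of squares
df_reg, df_err = 3, 99999996

msr = ssr / df_reg                      # mean square, regression
mse = sse / df_err                      # mean square, error
f_value = msr / mse                     # overall F statistic
r2 = ssr / (ssr + sse)                  # coefficient of determination
r2_adj = 1 - (1 - r2) * (df_reg + df_err) / df_err

print(f"MSE={mse:.10f} , F={f_value:.4f} , R2={r2:.6f} , R2(adj)={r2_adj:.6f}")
```

With 100,000,000 samples and only three regressors, the adjustment factor is almost exactly 1, which is why R2 and R2(adj) agree to six decimals.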

class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ] [ 10 ] [ 11 ] [ 12 ] [ 13 ] [ 14 ] [ 15 ]
[ 16 ] [ 17 ] [ 18 ] [ 19 ] [ 20 ]
lower limit -1.74974 -1.36328 -1.10248 -0.89525 -0.71745 -0.55780
-0.40984 -0.26974 -0.13362 -0.00024 0.13343 0.26947 0.40984 0.55780
0.71741 0.89519 1.10243 1.36317 1.74969
upper limit -1.74974 -1.36328 -1.10248 -0.89525 -0.71745 -0.55780 -0.40984
-0.26974 -0.13362 -0.00024 0.13343 0.26947 0.40984 0.55780 0.71741
0.89519 1.10243 1.36317 1.74969
observed no 4996856.00000 4999659.00000 5002524.00000 5001516.00000 4999150.00000 4999288.00000
5001227.00000 4989628.00000 5014409.00000 4991340.00000 5001160.00000 5012125.00000
4993358.00000 4998766.00000 4999704.00000 5000811.00000 4997154.00000 5000813.00000
4999024.00000 5001488.00000
probability 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000
0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000
0.05000 0.05000 0.05000 0.05000 0.05000
expected no 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000
5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000
5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000
5000000.00000 5000000.00000
chi square 1.97695 0.02326 1.27412 0.45965 0.14450 0.10139 0.30111
21.51568 41.52386 14.99912 0.26912 29.40312 8.82323 0.30455 0.01752
0.13154 1.61994 0.13219 0.19052 0.44283
p-value=0.000000
Z=-899.309523, p-value=0.000000
Z=-899.309523, p-value=1.000000
Z=-899.309523, p-value=0.000000
t=2,3,...,100000000
D.W. test=1.721284
D.W. test=2.278716
[1.131263 , 1.131789]
[1.063608 , 1.063856]
[1.131212 , 1.131840]
[1.063585 , 1.063880]
[1.131114 , 1.131938]
[1.063538 , 1.063926]

Variance : 2.92853
S.D. : 1.71129
MAD : 1.36538
Range : 20.44647
Median : 11.64390
Q1 : 10.48985
Q2 : 11.64390
Q3 : 12.79804
IQR : 2.30819
C.V. : 0.14697

Variance : 1.13153
S.D. : 1.06373
MAD : 0.84869
Range : 12.39443
Median : -0.00012
Q1 : -0.71744
Q2 : -0.00012
Q3 : 0.71738
IQR : 1.43482
C.V. : none
(36.2.3)The marginal probability distribution of X1,X2,X3,X4,
Variance : 4.60506
S.D. : 2.14594
MAD : 1.71220
Range : 25.60974
Median : 13.17982
Q1 : 11.73255
Q2 : 13.17982
Q3 : 14.62702
IQR : 2.89447
C.V. : 0.16282
Variance : 4.06006
S.D. : 2.01496
MAD : 1.60771
Range : 23.52923
Median : 11.64399
Q1 : 10.28502
Q2 : 11.64399
Q3 : 13.00343
IQR : 2.71841
C.V. : 0.17305

Variance : 4.00020
S.D. : 2.00005
MAD : 1.59583
Range : 22.40931
Mid_range : 9.86822
Median : 10.00003
Q1 : 8.65107
Q2 : 10.00003
Q3 : 11.34928
IQR : 2.69821
C.V. : 0.20000
Variance : 40.99753
S.D. : 6.40293
MAD : 5.10864
Range : 72.56000
Median : 49.99980
Q1 : 45.68236
Q2 : 49.99980
Q3 : 54.31871
IQR : 8.63635
C.V. : 0.12806
(36.2.4)The joint probability distribution of two random variables from X1,X2,X3,X4.
f(x1,x2) f(x2,x1)

f(x1,x3) f(x3,x1)

f(x1,x4) f(x4,x1)

f(x2,x3) f(x3,x2)

f(x2,x4) f(x4,x2)

(36.2.4) X1(t+1) = 0.2 + 0.9 X2(t+1) + 0.3 X3(t+1) − 0.01 X4(t+1) + ε2(t+1),
there are 50,000,000 pair samples.
r(X1,X2)=0.845109,r(X1,X3)=0.260988,r(X1,X4)=0.144931,r(X2,X3)=0.000025,
r(X2,X4)=0.000085,r(X3,X4)=0.624800,
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 3 180254998.7811259000 60084999.5937086340
error 49999996 49997973.2549964930 0.9999595451
total 49999999 230252972.0361223800
----------------------------------------------------------------------------------
F test value=60087430.4248964120
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept 0.2001544589 0.0013811857 144.91495 0.00000
X2 0.9000407087 0.0000701843 12823.95337 0.00000
X3 0.2999729077 0.0000905484 3312.84656 0.00000
X4 -0.0100017702 0.0000282866 -353.58635 0.00000
----------------------------------------------------------------------------------
MSE=0.9999595451 , R2=0.782856 , R2(adj)=0.782856

class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ] [ 10 ] [ 11 ] [ 12 ] [ 13 ] [ 14 ] [ 15 ]
[ 16 ] [ 17 ] [ 18 ] [ 19 ] [ 20 ]
lower limit -1.64487 -1.28157 -1.03640 -0.84160 -0.67445 -0.52437
-0.38528 -0.25357 -0.12561 -0.00023 0.12543 0.25332 0.38527 0.52437
0.67441 0.84153 1.03636 1.28147 1.64482
upper limit -1.64487 -1.28157 -1.03640 -0.84160 -0.67445 -0.52437 -0.38528
-0.25357 -0.12561 -0.00023 0.12543 0.25332 0.38527 0.52437 0.67441
0.84153 1.03636 1.28147 1.64482
observed no 2500659.00000 2501365.00000 2497107.00000 2500555.00000 2499179.00000 2499651.00000
2500732.00000 2494826.00000 2502847.00000 2495174.00000 2499691.00000 2506501.00000
2501303.00000 2500770.00000 2498739.00000 2500871.00000 2498925.00000 2500768.00000
2499930.00000 2500407.00000
probability 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000
0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000
0.05000 0.05000 0.05000 0.05000 0.05000
expected no 2500000.00000 2500000.00000 2500000.00000 2500000.00000 2500000.00000 2500000.00000
2500000.00000 2500000.00000 2500000.00000 2500000.00000 2500000.00000 2500000.00000
2500000.00000 2500000.00000 2500000.00000 2500000.00000 2500000.00000 2500000.00000
2500000.00000 2500000.00000
chi square 0.17371 0.74529 3.34778 0.12321 0.26962 0.04872 0.21433
10.70811 3.24216 9.31611 0.03819 16.90520 0.67912 0.23716 0.63605
0.30346 0.46225 0.23593 0.00196 0.06626
p-value=0.000100
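The "chi square" row can be reproduced from the "observed no" and "expected no" rows: each class contributes (observed − expected)^2 / expected. The sketch below uses the observed counts of the table above. The p-value is not recomputed, because the degrees of freedom used by the software are not stated in the output.

```python
# Observed class counts copied from the goodness-of-fit table above.
observed = [
    2500659, 2501365, 2497107, 2500555, 2499179, 2499651, 2500732,
    2494826, 2502847, 2495174, 2499691, 2506501, 2501303, 2500770,
    2498739, 2500871, 2498925, 2500768, 2499930, 2500407,
]
n = sum(observed)            # total number of samples
expected = n * 0.05          # 20 equiprobable classes

# Per-class chi-square contributions and their total.
contributions = [(o - expected) ** 2 / expected for o in observed]
chi_square = sum(contributions)

for k, c in enumerate(contributions, start=1):
    print(f"class [{k:2d}] chi square {c:.5f}")
print(f"total chi square = {chi_square:.5f}")
```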
Z=-1.607831, p-value=0.054000
Z=-1.607831, p-value=0.946000
Z=-1.607831, p-value=0.108000
t=2,3,...,50000000
D.W. test=1.999694
D.W. test=2.000306
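The two "D.W. test" values always sum to 4, which suggests that the output prints d and 4 − d. The following is a minimal sketch of the standard Durbin-Watson statistic over t = 2, ..., n. That the software uses exactly this formula is an assumption, and the residual series is a hypothetical toy example.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum_{t>=2} (e_t - e_{t-1})^2 / sum_t e_t^2."""
    num = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    den = sum(e * e for e in residuals)
    return num / den


# Hypothetical, perfectly alternating residuals (strong negative autocorrelation).
residuals = [1.0, -1.0, 1.0, -1.0]
dw = durbin_watson(residuals)
print(f"D.W. test={dw:.6f}")
print(f"D.W. test={4 - dw:.6f}")
```

A value near 2, as in the two 50,000,000-sample fits, indicates little first-order autocorrelation in the residuals.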
estimated line and residual    estimated line and X1
The marginal probability distribution of X1 estimated line,

Variance : 3.60510
S.D. : 1.89871
MAD : 1.51495
Range : 22.74271
Median : 13.17959
Q1 : 11.89926
Q2 : 13.17959
Q3 : 14.46080
IQR : 2.56153
C.V. : 0.14406

Variance : 0.99996
S.D. : 0.99998
MAD : 0.79787
Range : 11.02130
Median : 0.00017
Q1 : -0.67438
Q2 : 0.00017
Q3 : 0.67447
IQR : 1.34885
C.V. : none
(36.2.5) X2(t+1) = 0.1 + 0.8 X1(t) + 0.2 X3(t) − 0.02 X4(t) + ε1(t),
there are 50,000,000 pair samples.
r(X2,X1)=0.852045,r(X2,X3)=0.158640,r(X2,X4)=0.060304,r(X1,X3)=-0.000222,
r(X1,X4)=-0.000188,r(X3,X4)=0.624718,
ANOVA
----------------------------------------------------------------------------------
Source df SS MS
----------------------------------------------------------------------------------
Regression 3 152997438.5548028300 50999146.1849342810
error 49999996 50005342.8307363090 1.0001069366
total 49999999 203002781.3855391400
----------------------------------------------------------------------------------
F test value=50993693.0915864330,
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept 0.0989422760 0.0014125169 70.04679 0.00000
X1 0.8000688390 0.0000659053 12139.66651 0.00000
X3 0.2000458617 0.0000905696 2208.75324 0.00000
X4 -0.0200050845 0.0000282883 -707.18705 0.00000
----------------------------------------------------------------------------------
MSE=1.0001069366 , R2=0.753672 , R2(adj)=0.753672

class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ] [ 10 ] [ 11 ] [ 12 ] [ 13 ] [ 14 ] [ 15 ]
[ 16 ] [ 17 ] [ 18 ] [ 19 ] [ 20 ]
lower limit -1.64499 -1.28167 -1.03648 -0.84166 -0.67450 -0.52441
-0.38531 -0.25359 -0.12562 -0.00023 0.12544 0.25334 0.38530 0.52441
0.67446 0.84160 1.03644 1.28156 1.64494
upper limit -1.64499 -1.28167 -1.03648 -0.84166 -0.67450 -0.52441 -0.38531
-0.25359 -0.12562 -0.00023 0.12544 0.25334 0.38530 0.52441 0.67446
0.84160 1.03644 1.28156 1.64494
observed no 2500074.00000 2501467.00000 2499031.00000 2500092.00000 2497857.00000 2500480.00000
2499526.00000 2493614.00000 2506772.00000 2494433.00000 2501981.00000 2505350.00000
2500752.00000 2500367.00000 2495674.00000 2501336.00000 2502093.00000 2498945.00000
2500025.00000 2500131.00000
probability 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000
0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000
0.05000 0.05000 0.05000 0.05000 0.05000
expected no 2500000.00000 2500000.00000 2500000.00000 2500000.00000 2500000.00000 2500000.00000
2500000.00000 2500000.00000 2500000.00000 2500000.00000 2500000.00000 2500000.00000
2500000.00000 2500000.00000 2500000.00000 2500000.00000 2500000.00000 2500000.00000
2500000.00000 2500000.00000
chi square 0.00219 0.86084 0.37558 0.00339 1.83698 0.09216 0.08987
16.31240 18.34399 12.39660 1.56974 11.44900 0.22620 0.05388 7.48571
0.71396 1.75226 0.44521 0.00025 0.00686
p-value=0.000000
Z=0.491071, p-value=0.688400
Z=0.491071, p-value=0.311600
Z=0.491071, p-value=0.623200
t=2,3,...,50000000
D.W. test=1.999954
D.W. test=2.000046


Variance : 3.05995
S.D. : 1.74927
MAD : 1.39573
Range : 21.10509
Median : 11.64399
Q1 : 10.46409
Q2 : 11.64399
Q3 : 12.82382
IQR : 2.35973
C.V. : 0.15023
Variance : 1.00011
S.D. : 1.00005
MAD : 0.79790
Range : 11.07731
Median : 0.00011
Q1 : -0.67440
Q2 : 0.00011
Q3 : 0.67462
IQR : 1.34903
C.V. : none
An endogenous variable cannot be used as an independent variable in the linear model.
Chapter 7. Multi-variate analysis using linear model
Multi-variate analysis is very complex. For big data, linear model analysis can do the
job of multi-variate analysis.
The method is to select one variable from X1, ..., Xk as the dependent variable and to
treat the other variables as independent variables. The number of linear models is
$$\binom{k}{1} \times \left[ \binom{k-1}{1} + \binom{k-1}{2} + \cdots + \binom{k-1}{k-1} \right] = k \times \left( 2^{k-1} - 1 \right).$$
The correlation matrix gives the relationship between any two random variables.
Non-linear models can also be run; the non-linear formulas are in appendix 3. There
are 33 kinds of model, so the number of models is
$$\binom{k}{1} \times \left[ \binom{k-1}{1} \times 33 + \binom{k-1}{2} \times 33^{2} + \cdots + \binom{k-1}{k-1} \times 33^{k-1} \right] = k \times \left( 34^{k-1} - 1 \right).$$
Example 37,
(1) The population distribution of sample data,
X1~Shifted exponential(1,0.1),
X2|x1~Normal(4+5*log(x1),4),
X3|x1~Raised cosine(5+x1+log(x1),2),
X4|x1,x2~Semi circle(3+0.5*x1+0.5*x2,4),
X5|x2,x3~Arcsin(4.5+0.3*x2+0.7*x3,3),
X6|x4,x5~DE(0.5,10+2*x4*x5),
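$$f_{X_1}(x_1) = \exp\bigl(-(x_1 - 0.1)\bigr), \quad c < x_1 < \infty, \quad c = 0.1,$$
$$f_{X_2 \mid X_1 = x_1}(x_2 \mid x_1) = \frac{1}{2\sqrt{2\pi}} \exp\left( -\frac{\bigl(x_2 - (4 + 5\log(x_1))\bigr)^2}{8} \right), \quad -\infty < x_2 < \infty,$$
$$f_{X_3 \mid x_1}(x_3 \mid x_1) = \frac{1}{4}\left[ 1 + \cos\left( \frac{x_3 - (5 + x_1 + \log(x_1))}{2} \times \pi \right) \right],$$
$$5 + x_1 + \log(x_1) - 2 \le x_3 \le 5 + x_1 + \log(x_1) + 2,$$
$$f_{X_4 \mid x_1, x_2}(x_4 \mid x_1, x_2) = \frac{1}{2\pi} \sqrt{R^2 - \bigl(x_4 - (3 + 0.5x_1 + 0.5x_2)\bigr)^2}, \quad \lvert x_4 - (3 + 0.5x_1 + 0.5x_2) \rvert \le 2,$$
$$f_{X_5 \mid x_2, x_3}(x_5 \mid x_2, x_3) = \frac{1}{\pi} \, \frac{1}{\sqrt{1 - \dfrac{\bigl(x_5 - (4.5 + 0.3x_2 + 0.7x_3)\bigr)^2}{9}}}, \quad \lvert x_5 - (4.5 + 0.3x_2 + 0.7x_3) \rvert < 3,$$
$$f_{X_6 \mid x_4, x_5}(x_6 \mid x_4, x_5) = \frac{1}{4} \exp\bigl( -0.5 \, \lvert x_6 - (10 + 2x_4x_5) \rvert \bigr), \quad -\infty < x_6 < \infty.$$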
(1.2) 100,000,000 values are simulated for each random variable,
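The first two layers of this population can be simulated on a small scale to see why a logarithmic model is later recovered for X2. The sketch below is not the authors' simulator. It assumes that log is the natural logarithm and that Normal(4+5*log(x1),4) has variance 4 (standard deviation 2), and it uses 200,000 samples instead of 100,000,000. The seed and variable names are illustrative.

```python
import math
import random

random.seed(0)      # hypothetical seed, for repeatability only
n = 200_000         # far fewer samples than the 100,000,000 used in the text

# X1 ~ Shifted exponential(rate 1, shift 0.1)
x1 = [0.1 + random.expovariate(1.0) for _ in range(n)]
# X2 | x1 ~ Normal(mean 4 + 5*log(x1), variance 4)
x2 = [random.gauss(4 + 5 * math.log(v), 2.0) for v in x1]

# Ordinary least squares of X2 on u = log(X1).
u = [math.log(v) for v in x1]
mean_u = sum(u) / n
mean_y = sum(x2) / n
sxx = sum((a - mean_u) ** 2 for a in u)
sxy = sum((a - mean_u) * (y - mean_y) for a, y in zip(u, x2))
b1 = sxy / sxx
b0 = mean_y - b1 * mean_u
mse = sum((y - b0 - b1 * a) ** 2 for a, y in zip(u, x2)) / (n - 2)

print(f"X2={b0:.4f}+{b1:.4f}*log(X1)  MSE={mse:.4f}")
```

The fitted intercept, slope and MSE should land near 4, 5 and 4, in line with the model X2=4.0003967634+5.000141*log(X1) with MSE=3.9996024754 reported later in this example.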
(2) The marginal probability distribution and joint probability distribution from the
sample data,
(2.1)The marginal probability distribution,
Variance : 1.00023
S.D. : 1.00011
MAD : 0.73579
Range : 17.49852
Mid_range : 8.84926
Median : 0.79294
Q1 : 0.38758
Q2 : 0.79294
Q3 : 1.48601
IQR : 1.09843
C.V. : 0.90928
Variance : 24.76556
S.D. : 4.97650
MAD : 4.06678
Range : 40.40991
Mid_range : 3.54517
Median : 2.77224
Q1 : -0.97310
Q2 : 2.77224
Q3 : 6.18263
IQR : 7.15573
C.V. : 1.94418
Variance : 3.95023
S.D. : 1.98752
MAD : 1.56641
Range : 26.34688
Median : 5.59331
Q1 : 4.37840
Q2 : 5.59331
Q3 : 6.99343
IQR : 2.61503
C.V. : 0.34195
Variance : 12.43939
S.D. : 3.52695
MAD : 2.84745
Range : 31.39992
Mid_range : 7.55374
Median : 4.79539
Q1 : 2.35577
Q2 : 4.79539
Q3 : 7.26191
IQR : 4.90614
C.V. : 0.73015
Variance : 12.08518
S.D. : 3.47638
MAD : 2.81281
Range : 32.18093
Median : 9.26423
Q1 : 6.87166
Q2 : 9.26423
Q3 : 11.73394
IQR : 4.86228
C.V. : 0.37233
Variance : 9568.44490
S.D. : 97.81843
MAD : 76.04563
Range : 1535.85616
Mid_range : 680.08533
Median : 94.68323
Q1 : 42.22852
Q2 : 94.68323
Q3 : 167.12280
IQR : 124.89428
C.V. : 0.84500
(2.2)The joint probability distribution. It explains the relationship between two random
variables and is used to estimate the mathematical equation that links them.
f(x1,x2) f(x2,x1)
E(X1)= 1.1001, Var(X1)= 1.0005, E(X2)= 2.5610, Var(X2)= 24.7737,

E(X2|X1) E(X1|X2)
Var(X2|X1) Var(X1|X2)
Var(X2|X1) is close to a constant, and (X1,E(X2|X1)) follows a logarithmic curve.

f(x1,x3) f(x3,x1)
E(X1)= 1.1000, Var(X1)= 1.0003, E(X3)= 5.8121, Var(X3)= 3.9513,
E(X3|X1) E(X1|X3)
Var(X3|X1) is close to a constant, and (X1,E(X3|X1)) approaches a logarithmic curve.
f(x1,x4) f(x4,x1)
E(X1)= 1.0999, Var(X1)= 1.0003, E(X4)= 4.8291, Var(X4)= 12.4382,

E(X4|X1) E(X1|X4)

f(x1,x5) f(x5,x1)
E(X1)= 1.1001, Var(X1)= 1.0006, E(X5)= 9.3370, Var(X5)= 12.0880,

E(X5|X1) E(X1|X5)
f(x1,x6) f(x6,x1)
E(X1)= 1.1000, Var(X1)= 1.0003, E(X6)= 115.7863, Var(X6)= 9568.8515,

E(X6|X1) E(X1|X6)
(X1,E(X6|X1)) follows a logarithmic curve.

f(x2,x3) f(x3,x2)
E(X2)= 2.5605, Var(X2)= 24.7695, E(X3)= 5.8121, Var(X3)= 3.9510,

E(X3|X2) E(X2|X3)
f(x2,x4) f(x4,x2)
E(X2)= 2.5593, Var(X2)= 24.7750, E(X4)= 4.8293, Var(X4)= 12.4381,

E(X4|X2) E(X2|X4)
f(x2,x5) f(x5,x2)
E(X2)= 2.5606, Var(X2)= 24.7695, E(X5)= 9.3370, Var(X5)= 12.0868,

E(X5|X2) E(X2|X5)
f(x2,x6) f(x6,x2)
E(X2)= 2.5594, Var(X2)= 24.7748, E(X6)= 115.7705, Var(X6)= 9569.2388,

E(X6|X2) E(X2|X6)
f(x3,x4) f(x4,x3)
E(X3)= 5.8119, Var(X3)= 3.9521, E(X4)= 4.8301, Var(X4)= 12.4391,

E(X4|X3) E(X3|X4)
f(x3,x5) f(x5,x3)
E(X3)= 5.8123, Var(X3)= 3.9516, E(X5)= 9.3372, Var(X5)= 12.0871,

E(X5|X3) E(X3|X5)
f(x3,x6) f(x6,x3)
E(X3)= 5.8118, Var(X3)= 3.9517, E(X6)= 115.7723, Var(X6)= 9569.7574,

E(X6|X3) E(X3|X6)
f(x4,x5) f(x5,x4)
E(X4)= 4.8299, Var(X4)= 12.4386, E(X5)= 9.3365, Var(X5)= 12.0875,

E(X5|X4) E(X4|X5)
f(x4,x6) f(x6,x4)
E(X4)= 4.8303, Var(X4)= 12.4370, E(X6)= 115.7919, Var(X6)= 9570.2893,

E(X6|X4) E(X4|X6)
f(x5,x6) f(x6,x5)
E(X5)= 9.3364, Var(X5)= 12.0874, E(X6)= 115.7742, Var(X6)= 9568.7784,

E(X6|X5) E(X5|X6)
(3) Estimating the cumulative probability distribution function using curve-fitting,
X1 cumulative probability distribution function estimated line,
F(X)=1-exp(-1*((X-0.1000000242)/0.9999609544)^0.9999238120)
determination=0.999999996499728150
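The estimated line for X1 can be compared numerically with the exact population CDF. The sketch below assumes the estimated line is the three-parameter Weibull form 1 − exp(−((X − c)/b)^a) with the printed constants, and compares it with the Shifted exponential(1,0.1) CDF on a grid. The function names and the grid are illustrative.

```python
import math


def fitted_cdf(x, c=0.1000000242, scale=0.9999609544, shape=0.9999238120):
    """Estimated line for X1, read as a three-parameter Weibull CDF (assumption)."""
    return 1.0 - math.exp(-(((x - c) / scale) ** shape))


def exact_cdf(x, c=0.1, rate=1.0):
    """CDF of the Shifted exponential(1, 0.1) population."""
    return 1.0 - math.exp(-rate * (x - c))


# Grid from 0.11 to about 10.1; both CDFs are evaluated above the shift only.
grid = [0.11 + 0.01 * i for i in range(1000)]
max_gap = max(abs(fitted_cdf(x) - exact_cdf(x)) for x in grid)
print(f"largest absolute difference on the grid: {max_gap:.2e}")
```

With scale and shape both within 1e-4 of 1, the fitted curve is practically indistinguishable from the exact CDF, which agrees with the determination value reported for this fit.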
X2 cumulative probability distribution function estimated line,

F(X)= 0.01010400782125230400+
0.00806280347259475880*(X--8.45895009276509670000)^1+
0.00244324925232824050*(X--8.45895009276509670000)^2+
0.00032334834982042501*(X--8.45895009276509670000)^3+
0.00001536247104970721*(X--8.45895009276509670000)^4+
value range 0.0000000000<=F(x)<= 0.0250000000 ,
value range -16.8778905711<=X<= -7.1846032573 ,
determination=0.999990530832157940,

F(X)= 0.03706164670722803700+
0.02015580060937039600*(X- -6.51013821544638740000)^1+
0.00336996384742853370*(X- -6.51013821544638740000)^2+
0.00000892456330459090*(X- -6.51013821544638740000)^3+
value range 0.0250000100<=F(x)<= 0.0500000000 ,
value range -7.1846024225<=X<= -5.9254864980 ,
determination=0.999999874416728880,

F(X)= 0.06226852281459775000+
0.02742157814503282100*(X- -5.45005022329149200000)^1+
0.00332108828320898040*(X- -5.45005022329149200000)^2+
-0.00013990511486838830*(X- -5.45005022329149200000)^3+
value range 0.0500000100<=F(x)<= 0.0750000000 ,
value range -5.9254851517<=X<= -5.0090376453 ,
determination=0.999999870623672570
F(X)= 0.08734636945746569700+
0.03292626918224489400*(X- -4.61901000819476600000)^1+
0.00318823596547814990*(X- -4.61901000819476600000)^2+
-0.00014109893893810010*(X- -4.61901000819476600000)^3+
value range 0.0750000100<=F(x)<= 0.1000000000 ,
value range -5.0090375273<=X<= -4.2479248639 ,
determination=0.999999872082410370,

F(X)= 0.11238171558346960000+
0.03750435396938489600*(X- -3.90787418812871710000)^1+
0.00318986160727267530*(X- -3.90787418812871710000)^2+
-0.00006656295669693613*(X- -3.90787418812871710000)^3+
value range 0.1000000100<=F(x)<= 0.1250000000 ,
value range -4.2479246588<=X<= -3.5803390690 ,
determination=0.999999869599558110,

F(X)= 0.13740194190725841000+
0.04153136247824326700*(X- -3.27441884729703240000)^1+
0.00324328029923587340*(X- -3.27441884729703240000)^2+
-0.00013369615632674581*(X- -3.27441884729703240000)^3+
value range 0.1250000100<=F(x)<= 0.1500000000 ,
value range -3.5803390481<=X<= -2.9780172501 ,
determination=0.999999786703228420,
F(X)= 0.16241751418408817000+
0.04524883648442240600*(X- -2.69797608095340680000)^1+
0.00324135048411307300*(X- -2.69797608095340680000)^2+
0.00002134989779323249*(X- -2.69797608095340680000)^3+
value range 0.1500000100<=F(x)<= 0.1750000000 ,
value range -2.9780169405<=X<= -2.4253200537 ,
determination=0.999999893166325320,

F(X)= 0.18742843522057598000+
0.04860693789036499300*(X- -2.16529700714405050000)^1+
0.00324910766164851830*(X- -2.16529700714405050000)^2+
0.00042967834407381389*(X- -2.16529700714405050000)^3+
value range 0.1750000100<=F(x)<= 0.2000000000 ,
value range -2.4253199790<=X<= -1.9110972165 ,
determination=0.999999916147037760,
,………………………………………………..,

F(X)= 0.99012184520243895000+
0.00715170196279801830*(X- 13.08359146637336000000)^1+
-0.00213063099613108880*(X- 13.08359146637336000000)^2+
0.00031171887372671847*(X- 13.08359146637336000000)^3+
-0.00001919459111743294*(X- 13.08359146637336000000)^4+
-0.00000009537393960701*(X- 13.08359146637336000000)^5+
0.00000004204922171396*(X- 13.08359146637336000000)^6+
value range 0.9750000100<=F(x)<= 0.9999999900 ,
value range 11.6830436166<=X<= 23.4083510192 ,
determination=0.999999659106653780
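Each segment of the X2 estimated line is a polynomial in (X − center) that is valid only inside its stated value range. As a rough check, the sketch below evaluates the second segment at both ends of its X range, where F(x) should be close to 0.025 and 0.05. The helper name `poly_segment` is hypothetical, and the coefficients are copied from the listing.

```python
def poly_segment(x, center, coeffs):
    """Evaluate sum_i coeffs[i] * (x - center)**i with Horner's rule."""
    d = x - center
    result = 0.0
    for a in reversed(coeffs):
        result = result * d + a
    return result


# Second segment of the X2 estimated line (0.025 <= F(x) <= 0.05).
center = -6.51013821544638740000
coeffs = [
    0.03706164670722803700,   # constant term
    0.02015580060937039600,   # (X - center)^1
    0.00336996384742853370,   # (X - center)^2
    0.00000892456330459090,   # (X - center)^3
]
x_lo, x_hi = -7.1846024225, -5.9254864980

f_lo = poly_segment(x_lo, center, coeffs)
f_hi = poly_segment(x_hi, center, coeffs)
print(f"F({x_lo}) = {f_lo:.7f}")
print(f"F({x_hi}) = {f_hi:.7f}")
```

To evaluate the full estimated CDF, one would first locate the segment whose X range contains the point and then apply that segment's polynomial.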
The comparison of the estimated line and the sample data.    The simulated data of the estimated line.
The estimated cumulative probability distribution functions of X3,X4,...,X6 are
omitted; only the simulated images are shown.
The comparison of the estimated line and the sample data.    The simulated data of the estimated line.
(4) The multi-variate analysis is replaced by non-linear model analysis,

r(X1,X2)=0.802187,r(X1,X3)=0.904891,r(X1,X4)=0.707819,r(X1,X5)=0.706652,r(X1,X6)=0.823413,

r(X1,X2^3)=0.810273,


X1= 0.7821129684+0.001645*X2^3
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 1 65681326.7317799110 65681326.7317799110 191156601.8751497000
error 99999998 34359956.5873491990 0.3435995727
total 99999999 100041283.3191291100
----------------------------------------------------------------------------------
MSE= 0.3435995727 , R2=0.656542 , R2(adj)=0.656542
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept 0.7821129684 0.0001074185 7280.98828 0.00000
X2^3 0.0016446109 0.0000002029 8104.40169 0.00000
----------------------------------------------------------------------------------

r(X1,X3^2)=0.941409,


X1= -0.2336437812+0.035347*X3^2
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 1 88661668.5341732800 88661668.5341732800 779127135.9919241700
error 99999998 11379614.7849558290 0.1137961501
total 99999999 100041283.3191291100
----------------------------------------------------------------------------------
MSE= 0.1137961501 , R2=0.886251 , R2(adj)=0.886251
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept -0.2336437812 0.0001733818 -1347.56772 0.00000
X3^2 0.0353467151 0.0000037539 9416.03253 0.00000
----------------------------------------------------------------------------------
r(X1,X4^2)=0.748223,


X1= 0.4079068260+0.019350*X4^2
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 1 56006883.4358911220 56006883.4358911220 127188930.6184751200
error 99999998 44034399.8832379880 0.4403440076
total 99999999 100041283.3191291100
----------------------------------------------------------------------------------
MSE= 0.4403440076 , R2=0.559838 , R2(adj)=0.559838
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept 0.4079068260 0.0001362092 2994.70836 0.00000
X4^2 0.0193501435 0.0000025856 7483.77468 0.00000
----------------------------------------------------------------------------------

r(X1,X5^3)=0.758994,


X1= 0.3504015040+0.000647*X5^3
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 1 57630966.6931709500 57630966.6931709500 135889024.4768352500
error 99999998 42410316.6259581600 0.4241031747
total 99999999 100041283.3191291100
----------------------------------------------------------------------------------
MSE= 0.4241031747 , R2=0.576072 , R2(adj)=0.576072
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept 0.3504015040 0.0001405365 2493.31298 0.00000
X5^3 0.0006471917 0.0000000853 7591.50622 0.00000
----------------------------------------------------------------------------------

Independent variables are |X6|,
r(X1,|X6|)=0.824596,

step 1, |X6| into the linear model, SSR= 68023877.5471389140

X1= 0.0980431977+0.008562*|X6|
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 1 68023877.5471389140 68023877.5471389140 212459050.1525602600
error 99999998 32017405.7719901910 0.3201740641
total 99999999 100041283.3191291100
----------------------------------------------------------------------------------
MSE= 0.3201740641 , R2=0.679958 , R2(adj)=0.679958
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept 0.0980431977 0.0001573498 623.09062 0.00000
|X6| 0.0085615933 0.0000010381 8247.65891 0.00000
----------------------------------------------------------------------------------
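For a model with a single independent variable, R2 equals the square of the correlation coefficient between the dependent variable and that (transformed) variable. The sketch below checks this for the five one-variable models above, using the printed r and R2 values. Small gaps are expected because both values are rounded to six decimals.

```python
# (r, R2) pairs copied from the one-variable models for X1 above.
pairs = {
    "X2^3": (0.810273, 0.656542),
    "X3^2": (0.941409, 0.886251),
    "X4^2": (0.748223, 0.559838),
    "X5^3": (0.758994, 0.576072),
    "|X6|": (0.824596, 0.679958),
}

# Absolute gap between r^2 and the reported R2 for each model.
gaps = {name: abs(r * r - r2) for name, (r, r2) in pairs.items()}

for name, (r, r2) in pairs.items():
    print(f"{name:5s} r^2={r * r:.6f}  R2={r2:.6f}  gap={gaps[name]:.1e}")
```

This is why ranking the candidate transformations by |r| gives the same order as ranking the one-variable models by R2.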

Independent variables are X2^3,X3^2,
r(X1,X2^3)=0.810273,
r(X1,X3^2)=0.941409,
r(X2^3,X3^2)=0.764303,


X1= -0.0829703252+0.000443*X2^3+0.029084*X3^2
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 2 90642999.8999737350 45321499.9499868680 482231664.7524011700
error 99999997 9398283.4191553779 0.0939828370
total 99999999 100041283.3191291100
----------------------------------------------------------------------------------
MSE= 0.0939828370 , R2=0.906056 , R2(adj)=0.906056
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept -0.0829703252 0.0002037633 -407.18983 0.00000
X2^3 0.0004429525 0.0000003147 1407.59773 0.00000
X3^2 0.0290840146 0.0000058213 4996.16585 0.00000
----------------------------------------------------------------------------------
,…………………………………………………….,

Independent variables are exp(-X1)/X1,X2,exp(-1*X3),X4^2,X5^2,
r(X6,exp(-X1)/X1)=-0.568621,
r(X6,X2)=0.825215,
r(X6,exp(-1*X3))=-0.542390,
r(X6,X4^2)=0.925907,
r(X6,X5^2)=0.834330,
r(exp(-X1)/X1,X2)=-0.789526,
r(exp(-X1)/X1,exp(-1*X3))=0.808122,
r(exp(-X1)/X1,X4^2)=-0.462532,
r(exp(-X1)/X1,X5^2)=-0.534947,
r(X2,exp(-1*X3))=-0.701484,
r(X2,X4^2)=0.730800,
r(X2,X5^2)=0.733443,
r(exp(-1*X3),X4^2)=-0.442822,
r(exp(-1*X3),X5^2)=-0.538636,
r(X4^2,X5^2)=0.618760,
step 1, X4^2 into the linear model, SSR=820482425841.2602500000

step 2, X5^2 into the linear model, SSR=105977599849.2966300000
step 4, exp(-X1)/X1 into the linear model, SSR= 84292656.9885253910
step 5, exp(-1*X3) into the linear model, SSR=102355396.7584228500

X6= 1.0377658589+-1.511670*exp(-X1)/X1+1.424571*X2+34.050383*exp(-1*X3)+1.589610*X4^2+0.552997*X5^2
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 5 929100054963.1672400000 185820010992.6334500000 664855513.2980691200
error 99999994 27948929673.7057570000 279.4893135064
total 99999999 957048984636.8730500000
----------------------------------------------------------------------------------
MSE= 279.4893135064 , R2=0.970797 , R2(adj)=0.970797

Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept 1.0377658589 0.0002721179 3813.66306 0.00000
exp(-X1)/X1 -1.5116703085 0.0001133531 -13335.94504 0.00000
X2 1.4245705758 0.0000486466 29284.05991 0.00000
exp(-1*X3) 34.0503828469 0.0033656320 10117.08440 0.00000
X4^2 1.5896102770 0.0000039867 398725.79854 0.00000
X5^2 0.5529972649 0.0000022091 250322.63143 0.00000
----------------------------------------------------------------------------------
(4.1).The result of non-linear model analysis,
Conclusion,
X1=-0.2336437812+0.035347*X3^2
MSE=0.1137961501 , R2=0.886251
X1=-0.0829703252+0.000443*X2^3+0.029084*X3^2
MSE=0.0939828370 , R2=0.906056
X1=-0.1275426498+0.000342*X2^3+0.026851*X3^2+0.001267*|X6|
MSE=0.0896129611 , R2=0.910424
X1=-0.1373223672+0.000349*X2^3+0.026895*X3^2+0.775916*exp(-1*X5)
+0.001283*|X6|
MSE=0.0888607882 , R2=0.911176
X1=-0.1602518180+0.000358*X2^3+0.027103*X3^2+0.014863*|X4|
+0.782158*exp(-1*X5)+0.000753*|X6|
MSE=0.0885592540 , R2=0.911477
X2=4.0003967634+5.000141*log(X1)
MSE=3.9996024754 , R2=0.838560
X2=0.1120553209+3.577348*log(X1)+0.353141*|X6|^0.5
MSE=3.1874576509 , R2=0.871342
X2=-1.4826502464+3.361569*log(X1)+0.372298*X4+1.072416*|X5|^0.5
MSE=3.0293903187 , R2=0.877722
X2=0.5136191676+3.781916*log(X1)+-0.012885*X3^2+-0.026204*exp(-1*X4)
+0.370138*|X6|^0.5
MSE=3.1004901213 , R2=0.874852
X2=0.5117732688+3.728065*log(X1)+-0.011873*X3^2+-0.023205*exp(-1*X4)
+-4.096719*exp(-1*X5)+0.367254*|X6|^0.5
MSE=3.0812170616 , R2=0.875630
X2=4.0003967634+5.000141*log(X1)+residual,
X3=1.6751858508+4.320531*|X1|^0.5
MSE=0.5308059044 , R2=0.865680
X3=1.4276330468+3.886705*|X1|^0.5+0.071003*X5
MSE=0.5043594406 , R2=0.872372
X3=1.4821101114+3.840865*|X1|^0.5+0.068103*X5+0.000000*X6^3
MSE=0.5028855871 , R2=0.872745
X3=1.2990829172+4.023415*|X1|^0.5+-0.020896*X2+0.074984*X5
+0.000000*X6^3
MSE=0.5007872003 , R2=0.873276
X3= 1.3199505815+4.039275*|X1|^0.5+-0.017140*X2+-0.001077*X4^2
+0.073466*X5+0.000000*X6^3
MSE=0.5003167932 , R2=0.873395
X3=1.6751858508+4.320531*|X1|^0.5+residual,
X4= -2.4956606911+0.743695*|X6|^0.5
MSE= 1.3763692986 , R2=0.889346
X4=2.9997406458+0.500452*X1+0.499848*X2
MSE= 4.0000428815 , R2=0.678415
X4=-2.9643515908+-0.020698*X5^2+0.999850*|X6|^0.5
MSE=0.6820507077 , R2=0.945166
X4= -2.5149584784+-0.091872*X1+-0.069399*|X2|+0.788415*|X6|^0.5
MSE=1.3313077787 , R2=0.892969
X4=-2.6089171999+0.000672*X2^3+-0.022134*X5^2+0.965048*|X6|^0.5
MSE=0.6309448683 , R2=0.949275
X4=-2.4564205981+0.000676*X2^3+-4.366075*exp(-1*X3)+-0.022272*X5^2
+0.956085*|X6|^0.5
MSE= 0.6240742427 , R2=0.949827
X4=-1.9250723755+-0.119939*exp(-X1)/X1+0.000684*X2^3+-0.179871*log(X3)
+-0.022005*X5^2+0.941606*|X6|^0.5
MSE=0.6087417797 , R2=0.951060
X4=2.9997406458+0.500452*X1+0.499848*X2+residual,
X4=-2.9643515908+-0.020698*X5^2+0.999850*|X6|^0.5+residual,
X5=3.1081856134+0.632310*|X6|^0.5,
MSE=4.0908530177 , R2=0.661564
X5=-1.2355589466+2.599183*|X3|^0.5+0.446347*|X6|^0.5
MSE=3.6556632584 , R2=0.697567
X5=4.5011708433+0.300020*X2+0.699824*X3
MSE=4.5002575962 , R2=0.627694
X5= -0.5019385434+0.047599*X2+2.354357*|X3|^0.5+0.418552*|X6|^0.5
MSE=3.6444578101 , R2=0.698494 , R2(adj)=0.698494
X5=0.6365520060+0.006178*X2^2+0.268705*X3+-1.335977*|X4|
+1.393435*|X6|^0.5
MSE=1.6958721421 , R2=0.859701
X5=0.9176472304+-0.047826*exp(-X1)/X1+0.007338*X2^2+0.240705*X3
+-1.333907*|X4|+1.383227*|X6|^0.5
MSE=1.6931528079 , R2=0.859926
X5=4.5011708433+0.300020*X2+0.699824*X3+residual,
X5=0.6365520060+0.006178*X2^2+0.268705*X3+-1.335977*|X4|
+1.393435*|X6|^0.5+residual
X6=32.0137473116+2.3420605999*X4^2
MSE=1365.6656152749 , R2=0.857305
X6=-4.3751636672+1.679077*X4^2+0.605495*X5^2
MSE=305.8895986398 , R2=0.968038
X6=0.4116674758+1.712298*X2+1.580176*X4^2+0.548742*X5^2
MSE=281.3557885288 , R2=0.970602
X6=1.1557635081+1.783814*X2+6.690695*1/X3+1.579208*X4^2+0.549917*X5^2
MSE=281.2575403898 , R2=0.970612
X6=1.0377658589+-1.511670*exp(-X1)/X1+1.424571*X2+34.050383*exp(-1*X3)
+1.589610*X4^2+0.552997*X5^2
MSE= 279.4893135064 , R2=0.970797
X6=32.0137473116+2.3420605999*X4^2
The analysis summary,

X1~Shifted exponential(lambda=1,c=0.1),
X2=4.0003967634+5.000141*log(X1)+residual,
X3=1.6751858508+4.320531*|X1|^0.5+residual,
X4=2.9997406458+0.500452*X1+0.499848*X2+residual,
X5=4.5011708433+0.300020*X2+0.699824*X3+residual,
X6=32.0137473116+1.580176*X4^2+0.548742*X5^2+residual
(5) The mathematical model,

X2=4.0003967634+5.000141*log(X1)+residual,
X3=1.6751858508+4.320531*|X1|^0.5+residual,
X4=2.9997406458+0.500452*X1+0.499848*X2+residual,
X5=4.5011708433+0.300020*X2+0.699824*X3+residual,
X6=32.0137473116+1.580176*X4^2+0.548742*X5^2+residual
For the following reason,

X6=32.0137473116+2.3420605999*X4^2
MSE=1365.6656152749 , R2=0.857305
X6=-4.3751636672+1.679077*X4^2+0.605495*X5^2
MSE=305.8895986398 , R2=0.968038
X6=0.4116674758+1.712298*X2+1.580176*X4^2+0.548742*X5^2
MSE=281.3557885288 , R2=0.970602
X6=b0+b1*X4*X5+error will be tested,
X6=10.000307+1.999999* X4*X5+residual,
MSE=7.9997516135 , R2=0.999164
Let X1=X4*X5, X2=X6; linear model analysis,
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1 1 956249009491.5258800000 956249009491.5258800000 119534837541.9469900000
error 99999998 799975145.3470459000 7.9997516135
total 99999999 957048984636.8729200000
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
H0: slope(X1)=0

----------------------------------------------------------------------------------
intercept 10.0003073367 0.0004166688 24000.61442 0.00000
slope 1.9999989019 0.0000057847 345738.10542 0.00000
----------------------------------------------------------------------------------
MSE= 7.9997516135 , R2=0.999164 , R2(adj)=0.999164
SSX1=239062514889.9035900000 , SS(X2*X1)=478124767262.7130100000, C.V.= 0.0244279918
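The slope, its standard error and the test value of the individual test can be re-derived from the sums of squares printed above, using the simple-regression identities slope = SS(X2*X1)/SSX1 and s.e.(slope) = sqrt(MSE/SSX1). The variable names in this sketch are illustrative.

```python
import math

# Values copied from the output above (X1 = X4*X5, X2 = X6).
ss_x = 239062514889.9035900000     # SSX1
ss_xy = 478124767262.7130100000    # SS(X2*X1)
mse = 7.9997516135

slope = ss_xy / ss_x                 # least-squares slope
se_slope = math.sqrt(mse / ss_x)     # standard error of the slope
t_value = slope / se_slope           # test statistic for H0: slope(X1)=0

print(f"slope={slope:.10f}  s.e.={se_slope:.10f}  t={t_value:.5f}")
```

The slope of about 2 and the intercept of about 10 match the location 10+2*x4*x5 of the DE population used to generate X6.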
X4*X5 and residual joint pdf    X6 estimated line and X6 joint pdf
X0=residual, the residual probability distribution.

Variance : 7.99975
S.D. : 2.82838
MAD : 2.00010
Range : 71.18795
Mid_range : 3.38238
Median : -0.00018
Q1 : -1.38652
Q2 : -0.00018
Q3 : 1.38652
IQR : 2.77304
C.V. : none
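The summary blocks in this section all list the same statistics. A minimal sketch of how such a block could be computed is given below; it assumes that MAD means the mean absolute deviation about the mean, that Mid_range is (min+max)/2, and that the variance uses the population form. These are our readings of the output, not definitions stated by the authors, and `summarize` is a hypothetical helper.

```python
import numpy as np

def summarize(x):
    """Return the summary statistics listed in the output blocks."""
    x = np.asarray(x, dtype=float)
    q1, q2, q3 = np.percentile(x, [25, 50, 75])
    mean = x.mean()
    sd = x.std()  # population form (divide by n); an assumption
    return {
        "Variance": sd ** 2,
        "S.D.": sd,
        "MAD": float(np.mean(np.abs(x - mean))),  # mean absolute deviation (assumed)
        "Range": x.max() - x.min(),
        "Mid_range": (x.max() + x.min()) / 2,
        "Median": q2,
        "Q1": q1, "Q2": q2, "Q3": q3,
        "IQR": q3 - q1,
        # C.V. = S.D. / mean; undefined when the mean is zero, as for a residual.
        "C.V.": sd / mean if abs(mean) > 1e-12 else None,
    }

stats = summarize([1.0, 2.0, 3.0, 4.0, 5.0])
```

For a residual with mean zero the C.V. is undefined, which is consistent with the "C.V. : none" entry above.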
(6) Confirm the mathematical model using the probability distribution simulator,
X2 simulated data, X2=4.0003967634+5.0001407941*log(X1)+residual,
X1~Shifted exponential(1,0.1), residual~f:\\test07_data_caseXX\\X2_residual.txt,
Variance : 24.77917
S.D. : 4.97787
MAD : 4.06800
Range : 40.28849
Mid_range : 3.33694
Median : 2.77110
Q1 : -0.97484
Q2 : 2.77110
Q3 : 6.18374
IQR : 7.15858
C.V. : 1.94488
Comparison of the cumulative probability distribution functions of X2 and X3,
Note: X3 is the estimated line of X2.
X3 simulated data, X3=1.6751858508+4.320531*|X1|^0.5+residual,
Variance : 3.95284
S.D. : 1.98817
MAD : 1.58994
Range : 19.50163
Median : 5.56123
Q1 : 4.33543
Q2 : 5.56123
Q3 : 7.03998
IQR : 2.70455
C.V. : 0.34210
X4 simulated data, X1~shifted exponential(1,0.1),
X2=4.0003967634+ 5.0001407941*log(X1)+residual,
X4=2.9997406458+0.500452*X1+0.499848*X2+residual,
residual~f:\\test07_data_caseXX\\X4_residual.txt,
Variance : 12.44001
S.D. : 3.52704
MAD : 2.84760
Range : 33.02010
Mid_range : 7.56494
Median : 4.79414
Q1 : 2.35387
Q2 : 4.79414
Q3 : 7.26105
IQR : 4.90718
C.V. : 0.73036
X5 simulated data, X1~shifted exponential(1,0.1),
X2=4.0003967634+ 5.0001407941*log(X1)+residual,
X3=1.6751858508+4.320531*|X1|^0.5+residual,
X5=4.5011708433+0.300020*X2+0.699824*X3+residual,
Variance : 12.08803
S.D. : 3.47678
MAD : 2.81811
Range : 28.75054
Median : 9.25869
Q1 : 6.85920
Q2 : 9.25869
Q3 : 11.73911
IQR : 4.87991
C.V. : 0.37240

Note:X6 is the estimated line of X4 and X5.
X6 simulated data,
X1~shifted exponential(1,0.1),
X2=4.0003967634+ 5.0001407941*log(X1)+residual,
X3=1.6751858508+4.320531*|X1|^0.5+residual,
X4=2.9997406458+0.500452*X1+0.499848*X2+residual,
X5=4.5011708433+0.300020*X2+0.699824*X3+residual,
X6=-4.3751636672+1.679077*X4^2+0.605496*X5^2+residual,
Variance : 9247.97277
S.D. : 96.16638
MAD : 74.24751
Range : 1287.36189
Mid_range : 639.30587
Median : 90.67089
Q1 : 43.50364
Q2 : 90.67089
Q3 : 163.19796
IQR : 119.69432
C.V. : 0.83060

X6 simulated data,
X1~shifted exponential(1,0.1),
X2=4.0003967634+ 5.0001407941*log(X1)+residual,
X3=1.6751858508+4.320531*|X1|^0.5+residual,
X4=2.9997406458+0.500452*X1+0.499848*X2+residual,
X5=4.5011708433+0.300020*X2+0.699824*X3+residual,
X6=10.000307+1.999999*X4*X5+residual,
residual~f:\\test07_data_caseXX\\X6_residual_spc.txt,
Variance : 9548.69469
S.D. : 97.71742
MAD : 76.18625
Range : 1296.86566
Mid_range : 556.56971
Median : 94.54750
Q1 : 42.12351
Q2 : 94.54750
Q3 : 167.45398
IQR : 125.33048
C.V. : 0.84403

The mathematical model is close to the following,

X2=4.0003967634+5.000141*log(X1)+residual,
X3=1.6751858508+4.320531*|X1|^0.5+residual,
X4=2.9997406458+0.500452*X1+0.499848*X2+residual,
X5=4.5011708433+0.300020*X2+0.699824*X3+residual,
X6=10.000307+1.999999*X4*X5+residual,
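The confirmation step can be imitated with a small simulation. The sketch below, in Python with NumPy, generates X1 from the shifted exponential distribution, builds X2 to X6 from the equations above, and refits X6 on X4*X5. The residuals here are standard normal placeholders (an assumption for illustration); the book's simulator draws them from the saved residual files instead, so the summary statistics will not match the printed ones.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# X1 ~ Shifted exponential(lambda=1, c=0.1): an exponential shifted right by c.
x1 = 0.1 + rng.exponential(scale=1.0, size=n)

def noise():
    # Placeholder residual (assumption); not the residual files used in the book.
    return rng.normal(0.0, 1.0, size=n)

x2 = 4.0003967634 + 5.000141 * np.log(x1) + noise()
x3 = 1.6751858508 + 4.320531 * np.abs(x1) ** 0.5 + noise()
x4 = 2.9997406458 + 0.500452 * x1 + 0.499848 * x2 + noise()
x5 = 4.5011708433 + 0.300020 * x2 + 0.699824 * x3 + noise()
x6 = 10.000307 + 1.999999 * x4 * x5 + noise()

# Refit X6 on the product X4*X5 to confirm the structure is recoverable.
slope, intercept = np.polyfit(x4 * x5, x6, 1)
print(f"X6 = {intercept:.4f} + {slope:.6f} * X4*X5")
```

With this many records the refitted slope and intercept should land close to 1.999999 and 10.000307.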
Appendix 1. The common probability distributions
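1) Uniform distribution, $X \sim U(\alpha, \beta)$:
$f_X(x) = \frac{1}{\beta - \alpha}$, $\alpha \le x \le \beta$, $-\infty < \alpha < \beta < \infty$,
2) Normal distribution, $X \sim N(\mu, \sigma^2)$:
$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$, $-\infty < x < \infty$, $-\infty < \mu < \infty$, $\sigma > 0$,
3) Shifted exponential distribution, $X \sim \mathrm{Shifted\_exponential}(\lambda, c)$:
$f_X(x) = \lambda \exp(-\lambda(x - c))$, $c < x < \infty$, $-\infty < c < \infty$, $\lambda > 0$,
4) Pareto1 distribution, $X \sim \mathrm{Pareto1}(\lambda, c)$:
$f_X(x) = \lambda \times \frac{x^{\lambda-1}}{c^{\lambda}}$, $0 < x < c$, $\lambda > 0$, $c > 0$,
5) Pareto2 distribution, $X \sim \mathrm{Pareto2}(\lambda, c)$:
$f_X(x) = \lambda \frac{c^{\lambda}}{x^{\lambda+1}}$, $c < x < \infty$, $\lambda > 0$, $c > 0$,
6) Rayleigh distribution, $X \sim \mathrm{Rayleigh}(\lambda, c)$:
$f_X(x) = 2\lambda \times (x - c) \times \exp\left(-\lambda (x - c)^2\right)$, $c < x < \infty$, $\lambda > 0$, $c > 0$,
7) Double exponential distribution, $X \sim DE(\lambda, \mu)$:
$f_X(x) = \frac{\lambda}{2} \exp(-\lambda |x - \mu|)$, $-\infty < x < \infty$, $-\infty < \mu < \infty$, $\lambda > 0$,
8) Lognormal distribution, $X \sim \mathrm{Log\_normal}(\mu, \sigma^2)$:
$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma x} \exp\left(-\frac{(\ln(x)-\mu)^2}{2\sigma^2}\right)$, $0 < x < \infty$, $-\infty < \mu < \infty$, $\sigma > 0$,
9) Gamma distribution, $X \sim \mathrm{Gamma}(\alpha, \beta)$:
$f_X(x) = \frac{x^{\alpha-1}}{\Gamma(\alpha)\beta^{\alpha}} \exp\left(-\frac{x}{\beta}\right)$, $0 < x < \infty$, $\alpha, \beta > 0$, $\Gamma(\cdot)$: gamma function,
10) Beta distribution, $X \sim \mathrm{Beta}(\alpha, \beta)$:
$f_X(x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} x^{\alpha-1}(1-x)^{\beta-1}$, $0 < x < 1$, $\alpha, \beta > 0$, $\Gamma(\cdot)$: gamma function,
11) Cauchy distribution, $X \sim \mathrm{Cauchy}(\mu, \sigma)$:
$f_X(x) = \frac{\sigma}{\pi} \times \frac{1}{(x-\mu)^2 + \sigma^2}$, $-\infty < x < \infty$, $\sigma > 0$, $-\infty < \mu < \infty$,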
12) Arcsin distribution, $X \sim \mathrm{Arcsin}(\mu, c)$:
$f(x) = \frac{1}{\pi c} \times \frac{1}{\sqrt{1 - \frac{(x-\mu)^2}{c^2}}}$, $|x - \mu| < c$, $-\infty < \mu < \infty$, $c > 0$,
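13) Gumbel distribution, $X \sim \mathrm{Gumbel}(\mu, \sigma)$:
$f_X(x) = \frac{1}{\sigma} e^{-\frac{x-\mu}{\sigma}} \exp\left(-e^{-\frac{x-\mu}{\sigma}}\right)$, $-\infty < x < \infty$, $-\infty < \mu < \infty$, $\sigma > 0$,
14) Triangular 1 distribution, $X \sim \mathrm{Triangular1}(\mu, c)$:
$f(x) = \left|\frac{x-\mu}{c}\right| \times \frac{1}{c}$ for $-c + \mu < x < \mu + c$, and $0$ otherwise, $-\infty < \mu < \infty$, $c > 0$,
15) Trapezoid distribution, $X \sim \mathrm{Trapezoid}(\mu, c)$:
$f_X(x) = \frac{1.5c + x - \mu}{2c^2}$ for $\mu - 1.5c < x < \mu - 0.5c$,
$f_X(x) = \frac{1}{2c}$ for $\mu - 0.5c < x < \mu + 0.5c$,
$f_X(x) = \frac{1.5c - x + \mu}{2c^2}$ for $\mu + 0.5c < x < \mu + 1.5c$,
$f_X(x) = 0$ otherwise, $-\infty < \mu < \infty$, $c > 0$,
16) U-quadratic distribution, $X \sim \mathrm{U\_quadratic}(a, b)$:
$f_X(x) = \alpha (x - \beta)^2$, $a \le x \le b$, $-\infty < a < b < \infty$, $\beta = \frac{a+b}{2}$, $\alpha = \frac{12}{(b-a)^3}$,
17) Wigner semicircle distribution, $X \sim \mathrm{Semi\_circle}(\mu, R)$:
$f_X(x) = \frac{2}{\pi R^2} \sqrt{R^2 - (x-\mu)^2}$, $|x - \mu| \le R$, $-\infty < \mu < \infty$, $R > 0$,
18) Logistic distribution, $X \sim \mathrm{Logistic}(\mu, \sigma)$:
$f_X(x) = \frac{e^{-\frac{x-\mu}{\sigma}}}{\sigma} \times \frac{1}{\left(1 + e^{-\frac{x-\mu}{\sigma}}\right)^2}$, $-\infty < x < \infty$, $-\infty < \mu < \infty$, $\sigma > 0$,
19) Weibull distribution, $X \sim \mathrm{Weibull}(\alpha, \beta, \gamma)$:
$f_X(x) = \gamma \times \left(\frac{x-\alpha}{\beta}\right)^{\gamma-1} \times \frac{1}{\beta} \times \exp\left(-\left(\frac{x-\alpha}{\beta}\right)^{\gamma}\right)$, $x > \alpha$, $\alpha > 0$, $\beta > 0$, $\gamma > 0$,
20) Pareto3 distribution, $X \sim \mathrm{Pareto3}(\lambda, c)$:
$f_X(x) = \lambda \left(1 - \frac{x}{c}\right)^{\lambda-1} \times \frac{1}{c}$, $0 < x < c$, $\lambda > 0$, $c > 0$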
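A quick way to check a density written in this parameterization is to integrate it numerically and confirm that the area is 1. The sketch below does this for three of the distributions in this appendix (shifted exponential, double exponential, trapezoid). The function names and the `integrate` helper are ours, chosen for illustration; the parameter values are arbitrary.

```python
import numpy as np

def shifted_exponential_pdf(x, lam, c):
    # lambda * exp(-lambda * (x - c)) for x > c, zero otherwise
    x = np.asarray(x, dtype=float)
    return np.where(x > c, lam * np.exp(-lam * (x - c)), 0.0)

def double_exponential_pdf(x, lam, mu):
    # (lambda / 2) * exp(-lambda * |x - mu|)
    x = np.asarray(x, dtype=float)
    return 0.5 * lam * np.exp(-lam * np.abs(x - mu))

def trapezoid_pdf(x, mu, c):
    # Flat top of height 1/(2c) on |x - mu| < 0.5c, linear sides out to 1.5c.
    d = np.abs(np.asarray(x, dtype=float) - mu)
    flat = 1.0 / (2.0 * c)
    side = (1.5 * c - d) / (2.0 * c ** 2)
    return np.where(d < 0.5 * c, flat, np.where(d < 1.5 * c, side, 0.0))

def integrate(f, lo, hi, n=200_001):
    # Composite trapezoidal rule on an even grid.
    x = np.linspace(lo, hi, n)
    y = f(x)
    dx = x[1] - x[0]
    return float(np.sum((y[:-1] + y[1:]) * 0.5) * dx)

area_se = integrate(lambda x: shifted_exponential_pdf(x, 1.0, 0.1), 0.1, 40.0)
area_de = integrate(lambda x: double_exponential_pdf(x, 2.0, 1.0), -20.0, 22.0)
area_trap = integrate(lambda x: trapezoid_pdf(x, 0.0, 2.0), -4.0, 4.0)
print(area_se, area_de, area_trap)
```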
Appendix 2. The Curve-linear of linear model analysis
Curve-linear analysis model,
1) The simple linear model: $X_2$ is the dependent variable and $X_1$ is the independent variable.
(1) $X_2 = \hat\beta_0 + \hat\beta_1 (X_1 - \bar{X}_1) + \hat\beta_2 (X_1 - \bar{X}_1)^2 + \dots + \hat\beta_k (X_1 - \bar{X}_1)^k + \hat\varepsilon$,
(2) $X_2 = \hat\beta_0 + \hat\beta_1 X_1 + \hat\beta_2 X_1^2 + \dots + \hat\beta_k X_1^k + \hat\varepsilon$,
(3) $X_2 = \hat\beta_0 + \hat\beta_1 |X_1 - \bar{X}_1| + \hat\beta_2 |X_1 - \bar{X}_1|^2 + \dots + \hat\beta_k |X_1 - \bar{X}_1|^k + \hat\varepsilon$,
(4) $X_2 = \hat\beta_0 + \hat\beta_1 \frac{1}{X_1} + \hat\beta_2 \frac{1}{X_1^2} + \dots + \hat\beta_k \frac{1}{X_1^k} + \hat\varepsilon$,
(5) $X_2 = \hat\beta_0 + \hat\beta_1 \cos(X_1) + \hat\beta_2 \cos^2(X_1) + \dots + \hat\beta_k \cos^k(X_1) + \hat\varepsilon$,
There are two selection criteria: one is the coefficient of determination and the other is the MSE.
2) The general linear model: $Y$ is the dependent variable and $X_1, X_2, \dots, X_p$ ($p \ge 2$) are the independent variables.
The estimated line is $\hat{Y} = \hat\beta_0 + \hat\beta_1 X_1 + \hat\beta_2 X_2 + \dots + \hat\beta_p X_p$; the curve-linear analysis is based on the estimated line.
(1) $Y = \hat\beta_0 + \hat\beta_1 (\hat{Y} - \bar{Y}) + \hat\beta_2 (\hat{Y} - \bar{Y})^2 + \dots + \hat\beta_k (\hat{Y} - \bar{Y})^k + \hat\varepsilon$,
(2) $Y = \hat\beta_0 + \hat\beta_1 \hat{Y} + \hat\beta_2 \hat{Y}^2 + \dots + \hat\beta_k \hat{Y}^k + \hat\varepsilon$,
(3) $Y = \hat\beta_0 + \hat\beta_1 |\hat{Y} - \bar{Y}| + \hat\beta_2 |\hat{Y} - \bar{Y}|^2 + \dots + \hat\beta_k |\hat{Y} - \bar{Y}|^k + \hat\varepsilon$,
(4) $Y = \hat\beta_0 + \hat\beta_1 \frac{1}{\hat{Y}} + \hat\beta_2 \frac{1}{\hat{Y}^2} + \dots + \hat\beta_k \frac{1}{\hat{Y}^k} + \hat\varepsilon$,
(5) $Y = \hat\beta_0 + \hat\beta_1 \cos(\hat{Y}) + \hat\beta_2 \cos^2(\hat{Y}) + \dots + \hat\beta_k \cos^k(\hat{Y}) + \hat\varepsilon$,
There are two selection criteria: one is the coefficient of determination and the other is the MSE.
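Form (1) of the simple model, a polynomial in the centered variable, can be sketched as an ordinary least-squares fit. The code below is only an illustration under our own assumptions: the helper name `fit_centered_polynomial`, the synthetic data, and the choice to compare degrees 1 to 3 are not taken from the book's software.

```python
import numpy as np

def fit_centered_polynomial(x1, x2, k):
    """Least-squares fit of X2 on powers 0..k of (X1 - mean(X1))."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    centered = x1 - x1.mean()
    design = np.vander(centered, k + 1, increasing=True)  # columns: 1, d, d^2, ...
    beta, *_ = np.linalg.lstsq(design, x2, rcond=None)
    residual = x2 - design @ beta
    sse = float(residual @ residual)
    sst = float(np.sum((x2 - x2.mean()) ** 2))
    mse = sse / (len(x2) - (k + 1))
    return beta, 1.0 - sse / sst, mse   # coefficients, R2, MSE

# Synthetic data with a purely quadratic relation (illustration only).
rng = np.random.default_rng(1)
x1 = rng.uniform(0.0, 4.0, size=500)
x2 = 3.0 + 2.0 * (x1 - x1.mean()) ** 2 + rng.normal(0.0, 0.1, size=500)

results = {k: fit_centered_polynomial(x1, x2, k) for k in (1, 2, 3)}
best_k = min(results, key=lambda k: results[k][2])  # smallest MSE
for k, (beta, r2, mse) in results.items():
    print(k, round(r2, 4), round(mse, 4))
```

Both criteria named above are returned, so the degree can be chosen by the larger coefficient of determination or by the smaller MSE.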
Appendix 3. The mathematical formula of Non-linear model analysis,
There are 33 candidate models for the analysis, and the selection criterion is the coefficient of determination. X2 is the dependent variable and X1 is the independent variable.
1. X2=b0+b1*X1
2. X2=b0+b1*X1^2
3. X2=b0+b1*X1^3
4. X2=b0+b1*Cos(X1*pi)
5. X2=b0+b1*Cos(2*X1*pi)
6. X2=b0+b1*Sin(X1*pi)
7. X2=b0+b1*Sin(2*X1*pi)
8. X2=b0+b1*Cos(X1*pi)*Sin(X1*pi)
9. X2=b0+b1*Cos(X1*pi)*Cos(X1*pi)
10. X2=b0+b1*Sin(X1*pi)*Sin(X1*pi)
11. X2=b0+b1*exp(X1)
12. X2=b0+b1*exp(-1*X1)
13. X2=b0+b1*log(X1)
14. X2=b0+b1/X1
15. X2=b0+b1*X1/(1-X1)
16. X2=b0+b1*X1*exp(X1)
17. X2=b0+b1*X1*exp(-1*X1)
18. X2=b0+b1*X1*Cos(X1*pi)
19. X2=b0+b1*X1*Sin(X1*pi)
20. X2=b0+b1*X1*Cos(X1*pi)*Cos(X1*pi)
21. X2=b0+b1*X1*Sin(X1*pi)*Sin(X1*pi)
22. X2=b0+b1*X1*X1*Cos(X1*pi)
23. X2=b0+b1*X1*X1*Sin(X1*pi)
24. X2=b0+b1*X1*X1*Cos(X1*pi)*Cos(X1*pi)
25. X2=b0+b1*X1*X1*Sin(X1*pi)*Sin(X1*pi)
26. X2=b0+b1*X1*Cos(X1*pi)*Sin(X1*pi)
27. X2=b0+b1*X1*X1*Cos(X1*pi)*Sin(X1*pi)
28. X2=b0+b1*|X1|
29. X2=b0+b1*|X1|^0.5
30. X2=b0+b1*exp(X1)/X1
31. X2=b0+b1*exp(-X1)/X1
32. X2=b0+b1*exp(X1)*log(X1)
33. X2=b0+b1*exp(-X1)*log(X1)
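Each model in the list has the form X2=b0+b1*g(X1), so the selection can be sketched as one simple regression per transform followed by a comparison of the coefficients of determination. The code below uses only a handful of the 33 transforms and synthetic data; `CANDIDATES`, `fit_one_term` and `best_model` are hypothetical names for illustration, not the book's implementation.

```python
import numpy as np

# A subset of the 33 candidate transforms g(X1); keys follow the list numbering.
CANDIDATES = {
    1: ("X1", lambda x: x),
    2: ("X1^2", lambda x: x ** 2),
    11: ("exp(X1)", np.exp),
    13: ("log(X1)", np.log),
    14: ("1/X1", lambda x: 1.0 / x),
    29: ("|X1|^0.5", lambda x: np.abs(x) ** 0.5),
}

def fit_one_term(g, x2):
    """OLS fit of X2 = b0 + b1*g; returns (b0, b1, R2)."""
    b1, b0 = np.polyfit(g, x2, 1)
    residual = x2 - (b0 + b1 * g)
    r2 = 1.0 - np.sum(residual ** 2) / np.sum((x2 - x2.mean()) ** 2)
    return float(b0), float(b1), float(r2)

def best_model(x1, x2):
    fits = {no: fit_one_term(f(x1), x2) for no, (_, f) in CANDIDATES.items()}
    best = max(fits, key=lambda no: fits[no][2])  # largest R2 wins
    return best, fits[best]

# Synthetic data generated from model 13 (illustration only).
rng = np.random.default_rng(2)
x1 = 0.1 + rng.exponential(1.0, size=5000)
x2 = 4.0 + 5.0 * np.log(x1) + rng.normal(0.0, 0.5, size=5000)
best_no, (b0, b1, r2) = best_model(x1, x2)
print(best_no, CANDIDATES[best_no][0], round(b0, 3), round(b1, 3), round(r2, 4))
```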
Appendix 4. The limiting theory of cumulative probability distribution function
The relationship between $X_n$ and $X$ is studied through their cumulative probability distribution functions and the limiting theory rules (convergence in probability and almost surely).
The question is whether $F_{X_n}(x_n)$ is close to $F_X(x_n)$, where $F_{X_n}(x) \sim \mathrm{Uniform}(0,1)$.
i) If the cdfs of the two random variables are different, $F_X(x) = 0, 1$,
$E\left[\left(F_{X_n}(x) - F_X(x)\right)^2\right] = \frac{1}{3}$,
$P\{|F_{X_n}(x) - F_X(x)| \ge \varepsilon\} = \varepsilon$ for $\varepsilon$ = 0.1, 0.05, 0.01, 0.005, 0.001, 0.0005, 0.0001,
ii) If the cdfs of the two random variables are the same,
$E\left[\left(F_{X_n}(x) - F_X(x)\right)^2\right] \to 0$,
$P\{|F_{X_n}(x) - F_X(x)| \ge \varepsilon\} \to 0$ for $\varepsilon$ = 0.1, 0.05, 0.01, 0.005, 0.001, 0.0005, 0.0001,
Because of computation error, $P\{|F_{X_n}(x) - F_X(x)| \ge 0.0001\}$ is not exactly 0, but
$P\{|F_{X_n}(x) - F_X(x)| \ge \varepsilon\} \to 0$ for $\varepsilon$ = 0.1, 0.05, 0.01, 0.001.
Computation,
$F_{X_n}(x_n)$ is computed first and $F_X(x_n)$ is obtained from the probability distribution of $X$; the data set of $F_{X_n}(x) - F_X(x)$ is then built. Then $E\left[\left(F_{X_n}(x) - F_X(x)\right)^2\right]$ and $P\{|F_{X_n}(x) - F_X(x)| \ge \varepsilon\}$ are calculated.
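The computation step can be sketched as follows: evaluate the empirical cdf and a reference cdf at the sample points, then take the mean squared difference and the share of points whose difference is at least a given threshold. This is a minimal sketch under our own assumptions (normal reference cdf, absolute differences, hypothetical helper names), not the authors' program.

```python
import math
import numpy as np

def normal_cdf(x, mu=0.0, sigma=1.0):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def cdf_distance(sample, cdf, eps_list=(0.1, 0.05, 0.01)):
    """Compare the empirical cdf of `sample` with a reference cdf at the sample points."""
    xs = np.sort(np.asarray(sample, dtype=float))
    n = len(xs)
    f_n = np.arange(1, n + 1) / n            # empirical cdf F_Xn(x_n)
    f_x = np.array([cdf(v) for v in xs])     # reference cdf F_X(x_n)
    diff = np.abs(f_n - f_x)
    mean_sq = float(np.mean(diff ** 2))      # estimate of E[(F_Xn - F_X)^2]
    exceed = {eps: float(np.mean(diff >= eps)) for eps in eps_list}
    return mean_sq, exceed

rng = np.random.default_rng(3)
same = cdf_distance(rng.normal(0.0, 1.0, 20_000), normal_cdf)       # same cdf
different = cdf_distance(rng.normal(3.0, 1.0, 20_000), normal_cdf)  # shifted cdf
print(same)
print(different)
```

When the sample really comes from the reference distribution, the mean squared difference is near zero and no point exceeds the larger thresholds; for a clearly different distribution both quantities are far from zero.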
Appendix 5. An application of Dow Jones
The Dow Jones Industrial Average index is an additive measure and has no closed range.
There are two cases,
Case 1, the data cover 1999/7/27, 1999/7/28,……,2014/6/5,
Case 2, the data cover 1999/7/27, 1999/7/28,……,2015/5/12,
Data analysis,
Case 1, dates are 1999/7/27, 1999/7/28,……,2014/6/5,

Each record has X2=open, X3=day high, X4=day low, X5=close,
X1=t, 1999/7/27=25001, 1999/7/28=25002,……
t=25001, 25002, 25003,….., 28738 is an arithmetic series of time values,
3738 records in total.
X5=Dow Jones Industrial Average closing index,
(1999/7/28 close index),(1999/7/29 close index),…..,etc.
X5 is estimated from X1 using curve-linear analysis; the result is below,
X5=12355.119320938364000000000000000000+
10.347977755591273000000000000000*(X1-26869.500000000000000000000000000000)^1+
0.001818358466948666300000000000*(X1-26869.500000000000000000000000000000)^2+
-0.000105233053247388850000000000*(X1-26869.500000000000000000000000000000)^3+
-0.000000172135270318923840000000*(X1-26869.500000000000000000000000000000)^4+
0.000000000335997899967264980000*(X1-26869.500000000000000000000000000000)^5+
0.000000000000809839187473130810*(X1-26869.500000000000000000000000000000)^6+
-0.000000000000000508227701428245*(X1-26869.500000000000000000000000000000)^7+
-0.000000000000000001670497075534*(X1-26869.500000000000000000000000000000)^8+
0.000000000000000000000434515811*(X1-26869.500000000000000000000000000000)^9+
0.000000000000000000000001850787*(X1-26869.500000000000000000000000000000)^10+
-0.000000000000000000000000000227*(X1-26869.500000000000000000000000000000)^11+
-0.000000000000000000000000000001*(X1-26869.500000000000000000000000000000)^12+
0.000000000000000000000000000000*(X1-26869.500000000000000000000000000000)^13+
0.000000000000000000000000000000*(X1-26869.500000000000000000000000000000)^14+
-0.000000000000000000000000000000*(X1-26869.500000000000000000000000000000)^15+
0.000000000000000000000000000000*(X1-26869.500000000000000000000000000000)^16+
0.000000000000000000000000000000*(X1-26869.500000000000000000000000000000)^17+
-0.000000000000000000000000000000*(X1-26869.500000000000000000000000000000)^18+
0.000000000000000000000000000000*(X1-26869.500000000000000000000000000000)^19+
0.000000000000000000000000000000*(X1-26869.500000000000000000000000000000)^20+
-0.000000000000000000000000000000*(X1-26869.500000000000000000000000000000)^21+
-0.000000000000000000000000000000*(X1-26869.500000000000000000000000000000)^22+
0.000000000000000000000000000000*(X1-26869.500000000000000000000000000000)^23+
0.000000000000000000000000000000*(X1-26869.500000000000000000000000000000)^24+
-0.000000000000000000000000000000*(X1-26869.500000000000000000000000000000)^25+
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
regression 25 13523127587.1665190000 540925103.4866607200 2367.7384066913
error 3712 848030330.7443275500 228456.4468600020
total 3737 14371157917.9108470000
----------------------------------------------------------------------------------
The F test p value=0.000100,MSE= 228456.4468600020 , R2=0.940991 , R2(adj)=0.940593
X5(Mean)= 11236.6572899947, X5(Var)= 3845640.3312579198, X5(sd)= 1961.0304258879
X1(Mean)= 26869.5000000000, X1(Var)= 1164698.5000000000, X1(sd)= 1079.2119810306
------------------- individual test -------------------------
----------------------------------------------------------------------------------
b0 12355.1193209384 29.6674997696 416.4530013270 0.0000000000
b1 10.3479777556 0.2175345703 47.5693483532 0.0000000000
b2 0.0018183585 0.0009599880 1.8941471774 0.0582000000
b3 -0.0001052331 0.0000037129 -28.3421743413 0.0000000000
b4 -0.0000001721 0.0000000083 -20.7450146490 0.0000000000
H0: residual is random , H1: Increasing line or decreasing line or Oscillation, Z=-51.413601, p-value=0.000000
The first order auto regressive error model, Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,3738
X0=residual, residual probability distribution

lamda point estimated value=0.002763 (MLE) , mu point estimated value=21.035599 (MLE)
lamda value from 0.001381 to 0.005526 , mu value from 21.027784 to 21.043413
H0: X0~Double exponential(lamda=0.002696,mu=21.027784),
class    lower limit    upper limit    observed no    probability    expected no    chi square
----------------------------------------------------------------------------------
[ 1 ]    -2029.14765    -1751.30188       8.00000        0.00421        15.73275       3.80070
[ 2 ]    -1751.30188    -1473.45611      12.00000        0.00469        17.53849       1.74901
[ 3 ]    -1473.45611    -1195.61035      43.00000        0.00992        37.08999       0.94172
[ 4 ]    -1195.61035     -917.76458      89.00000        0.02098        78.43701       1.42250
[ 5 ]     -917.76458     -639.91881     197.00000        0.04438       165.87670       5.83964
[ 6 ]     -639.91881     -362.07305     389.00000        0.09384       350.79204       4.16158
[ 7 ]     -362.07305      -84.22728     743.00000        0.19846       741.84657       0.00179
[ 8 ]      -84.22728      193.61849    1075.00000        0.30952      1156.96950       5.80741
[ 9 ]      193.61849      471.46425     661.00000        0.16552       618.70924       2.89071
[ 10 ]     471.46425      749.31002     328.00000        0.07827       292.56492       4.29185
[ 11 ]     749.31002     1027.15579     156.00000        0.03701       138.34322       2.25354
[ 12 ]    1027.15579     1305.00155      37.00000        0.03320       124.09958      61.13104
----------------------------------------------------------------------------------
degree of freedom=9, pearson chi-square test statistic =94.291488 , p-value=0.000000
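The Pearson statistic is the sum over classes of (observed - expected)^2 / expected. A short sketch that recomputes it from the observed and expected counts printed above is given below. The comment on the degrees of freedom is our reading (12 classes, minus 1, minus the two estimated parameters), which is consistent with the printed value of 9.

```python
# Observed and expected class counts copied from the table above.
observed = [8, 12, 43, 89, 197, 389, 743, 1075, 661, 328, 156, 37]
expected = [15.73275, 17.53849, 37.08999, 78.43701, 165.87670, 350.79204,
            741.84657, 1156.96950, 618.70924, 292.56492, 138.34322, 124.09958]

# Per-class contributions (O - E)^2 / E and their sum.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(contributions)

# 12 classes, minus 1, minus 2 estimated parameters (lambda and mu); our reading.
degrees_of_freedom = len(observed) - 1 - 2

print(f"chi-square = {chi_square:.4f}, df = {degrees_of_freedom}")
```

The last class alone contributes about 61 of the total, so the rejection is driven mainly by the upper tail.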
residual probability distribution estimated line using curve-fitting.

F(X)= 0.47037689949508632000+0.00101906117315600510*(X- -10.75798745619877500000)^1+
0.00000052208436038996*(X--10.75798745619877500000)^2+
-0.00000000191363645448*(X--10.75798745619877500000)^3+
-0.00000000000402402585*(X--10.75798745619877500000)^4+
0.00000000000000442451*(X--10.75798745619877500000)^5+
0.00000000000000001630*(X--10.75798745619877500000)^6+
-0.00000000000000000001*(X--10.75798745619877500000)^7+
-0.00000000000000000000*(X--10.75798745619877500000)^8+
0.00000000000000000000*(X--10.75798745619877500000)^9+
0.00000000000000000000*(X--10.75798745619877500000)^10+
0.00000000000000000000*(X--10.75798745619877500000)^11+
-0.00000000000000000000*(X--10.75798745619877500000)^12+
-0.00000000000000000000*(X--10.75798745619877500000)^13+
0.00000000000000000000*(X--10.75798745619877500000)^14+
0.00000000000000000000*(X- -10.75798745619877500000)^15+
-0.00000000000000000000*(X- -10.75798745619877500000)^16+
-0.00000000000000000000*(X--10.75798745619877500000)^17+
-0.00000000000000000000*(X--10.75798745619877500000)^18+
-0.00000000000000000000*(X--10.75798745619877500000)^19+
-0.00000000000000000000*(X--10.75798745619877500000)^20+
0.00000000000000000000*(X--10.75798745619877500000)^21+
0.00000000000000000000*(X--10.75798745619877500000)^22+
0.00000000000000000000*(X--10.75798745619877500000)^23+
SSE=0.038058137174926572 MAX error=0.010738934771908681 coefficient of determination=0.999877822985879020
Durbin-Watson model analysis is applied once the curve-linear analysis is finished. The first-order auto-regressive error is $\varepsilon_{t+1} = \rho\varepsilon_t + \mu_{t+1}$. From the output of the regression analysis, 0.067287=2-2ρ, so ρ = 0.96636. The data set is a population, so the auto-regressive correlation coefficient is the population correlation coefficient of AR(1), and
the real MSE = MSE (regression analysis) × (1 − ρ^2)
= 228456.4468600020*(1-0.96636*0.96636)=15112.017098.
The estimated population standard deviation is 122.93094428, the square root of 15112.017098; this value has the effect of the first-order auto-regressive error model removed.
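The arithmetic of this step can be laid out as a short calculation. The sketch below uses only the two printed inputs (the Durbin-Watson value and the regression MSE); the variable names are ours. Because it keeps the unrounded ρ, its results differ slightly from the figures in the text, which use ρ rounded to 0.96636.

```python
import math

durbin_watson = 0.067287          # from the regression output
mse_regression = 228456.4468600020

rho = 1.0 - durbin_watson / 2.0   # solves d = 2 - 2*rho
mse_ar1 = mse_regression * (1.0 - rho ** 2)   # MSE with the AR(1) effect removed
sd_ar1 = math.sqrt(mse_ar1)

print(f"rho={rho:.7f}  MSE={mse_ar1:.3f}  S.D.={sd_ar1:.4f}")
```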
The SSE is the part of X5 that X1 cannot explain, and its value is very large. After the AR(1) analysis, the residual is µ(t).
µ(t) probability distribution,
Variance : 15001.42664
S.D. : 122.48031
MAD : 86.93288
Range : 1639.51650
Median : 4.04876
Q1 : -61.15728
Q2 : 4.04876
Q3 : 66.57909
IQR : 127.73637
C.V. : none
Curve-fitting estimated the distribution function of µ (t),

F(X)=exp(0.0109496804*(X-4.0487589014))/2,X<4.0487589014
F(X)=1-exp(-0.0121369408*(X-4.0487589014))/2,X>= 4.0487589014
SSE=0.250858805617182990 MAX error=0.019829422847566502 coefficient of determination=0.999846680856886550
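The fitted distribution function has two exponential branches that meet at the median. The sketch below codes the two branches exactly as printed and adds an inverse, which is our own addition for illustration (for example, to draw simulated values of µ(t)); the names `mu_cdf` and `mu_quantile` are hypothetical.

```python
import math

MEDIAN = 4.0487589014
RATE_LEFT = 0.0109496804    # rate of the branch below the median
RATE_RIGHT = 0.0121369408   # rate of the branch above the median

def mu_cdf(x):
    """Fitted cdf of mu(t): two exponential branches joined at the median."""
    if x < MEDIAN:
        return math.exp(RATE_LEFT * (x - MEDIAN)) / 2.0
    return 1.0 - math.exp(-RATE_RIGHT * (x - MEDIAN)) / 2.0

def mu_quantile(p):
    """Inverse of mu_cdf for 0 < p < 1 (added here for illustration)."""
    if p < 0.5:
        return MEDIAN + math.log(2.0 * p) / RATE_LEFT
    return MEDIAN - math.log(2.0 * (1.0 - p)) / RATE_RIGHT

q1, q3 = mu_quantile(0.25), mu_quantile(0.75)
print(mu_cdf(MEDIAN), q1, q3)
```

Both branches give 0.5 at the median, so the fitted function is continuous there even though the two rates differ.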
[Figure: comparison of the estimated line and the real sample data.]
µ(t) is close to a double exponential distribution, and |µ(t)| is a shifted exponential distribution. The exponential distribution has the memoryless property.
Let X2=mu(t) and X1=t=25002,….., 28738, with X1 as the independent variable in the non-linear model analysis,
The relation is X2= -6.9340372476+ 182834.3132005120/X1
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
1/X1 1 280.8792582680 280.8792582680 0.0187185853
error 3735 56045049.0339617650 15005.3678805788
total 3736 56045329.9132200330
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept -6.9340372476 49.8547006021 -0.13908 0.88940
slope 182834.3132005120 1336353.0019461529 0.13682 0.89100
----------------------------------------------------------------------------------
MSE=15005.3678805788 , R2=0.000005 , R2(adj)=-0.000263
1/X1(mean)= 0.0000372764, 1/X1(variance)= 0.0000000000, 1/X1(s.d.)= 0.0000014997
SS(1/X1)= 0.0000000084 , SS(X2*1/X1)= 0.0015362502, C.V.=-------
mu(t) is not affected by X1=t,

Conclusion,
$X_{5t} = \beta_0 + \beta_1 X_{1t} + \beta_2 X_{2t} + \dots + \beta_{25} X_{25,t} + \varepsilon_t$, $\varepsilon_{t+1} = \rho\varepsilon_t + \mu_{t+1}$, $\rho = 0.96636$,
$X_{it} = (X_{1t} - 26869.5)^i$, $i = 1, 2, \dots, 25$,
(i) $\mu_t \sim DE(\lambda, E(\mu))$, (ii) $Var(\mu_t)$ is the same for all t and $E(\mu) \approx 0$, (iii) the $\mu_t$ are independent.
Please refer to Appendix 9.
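The error structure in this conclusion can be imitated with a small simulation: independent double exponential innovations feed a first-order auto-regressive error, and the Durbin-Watson relation recovers ρ. This is a sketch under stated assumptions; in particular the rate `lam` is a hypothetical value chosen for illustration, not an estimate from the book.

```python
import numpy as np

rng = np.random.default_rng(4)
n, rho = 3738, 0.96636
lam = 0.0115  # hypothetical double exponential rate, for illustration only

# Independent double exponential (Laplace) innovations mu_t with mean 0.
mu = rng.laplace(loc=0.0, scale=1.0 / lam, size=n)

# AR(1) errors: eps[t] = rho * eps[t-1] + mu[t].
eps = np.zeros(n)
eps[0] = mu[0]
for t in range(1, n):
    eps[t] = rho * eps[t - 1] + mu[t]

# Durbin-Watson statistic d and the implied rho = 1 - d/2.
d = np.sum(np.diff(eps) ** 2) / np.sum(eps ** 2)
rho_hat = 1.0 - d / 2.0
mu_hat = eps[1:] - rho_hat * eps[:-1]   # recovered innovations

print(f"d={d:.6f}  rho_hat={rho_hat:.5f}  sd(mu_hat)={mu_hat.std():.2f}")
```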

(1) Simple linear model,
$X_{5t} = \beta_0 + \beta_1 X_{1t} + \varepsilon_t$, $\varepsilon_{t+1} = \rho\varepsilon_t + \mu_{t+1}$, with the Durbin-Watson model.
Durbin-Watson test,
0.0063585278 = 2 − 2ρ, ρ = 0.9968207365,
The estimated variance = 16161.8551027577
The first order auto-regressive error model, auto regressive correlation coefficient=0.995, MSE=16161.8551027577,
The simple linear model analysis is worse than the curve-linear analysis, because its MSE (16161.8551027577) is larger than 15112.017098.
Case 2,
Dates are 1999/7/27, 1999/7/28,……,2015/5/12,
Each record has X2=open, X3=day high, X4=day low, X5=close,
X1=t, 1999/7/27=25001, 1999/7/28=25002,……
t=25001, 25002, 25003,….., 28973 is an arithmetic series of time values,
3973 records in total.
X5=Dow Jones Industrial Average closing index,
(1999/7/28 close index),(1999/7/29 close index),…..,etc.
X5 is estimated from X1 using curve-linear analysis; the result is below,
X5= 13423.50612813327500000000+
6.28301955573260780000*(X1- 26987.00000000000000000000)^1+
-0.04408700016989541800*(X1- 26987.00000000000000000000)^2+
-0.00013050937590719514*(X1- 26987.00000000000000000000)^3+
0.00000017258416938094*(X1- 26987.00000000000000000000)^4+
0.00000000071248246365*(X1- 26987.00000000000000000000)^5+
-0.00000000000028123310*(X1- 26987.00000000000000000000)^6+
-0.00000000000000185997*(X1- 26987.00000000000000000000)^7+
0.00000000000000000019*(X1- 26987.00000000000000000000)^8+
0.00000000000000000000*(X1- 26987.00000000000000000000)^9+
0.00000000000000000000*(X1- 26987.00000000000000000000)^10+
-0.00000000000000000000*(X1- 26987.00000000000000000000)^11+
-0.00000000000000000000*(X1- 26987.00000000000000000000)^12+
0.00000000000000000000*(X1- 26987.00000000000000000000)^13+
0.00000000000000000000*(X1- 26987.00000000000000000000)^14+
-0.00000000000000000000*(X1- 26987.00000000000000000000)^15+
-0.00000000000000000000*(X1- 26987.00000000000000000000)^16+
0.00000000000000000000*(X1- 26987.00000000000000000000)^17+
0.00000000000000000000*(X1- 26987.00000000000000000000)^18+
-0.00000000000000000000*(X1- 26987.00000000000000000000)^19+
-0.00000000000000000000*(X1- 26987.00000000000000000000)^20+
0.00000000000000000000*(X1- 26987.00000000000000000000)^21+
0.00000000000000000000*(X1- 26987.00000000000000000000)^22+
-0.00000000000000000000*(X1- 26987.00000000000000000000)^23+
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
regression 23 21965463556.1898500000 955020154.6169500400 4296.8688614340
error 3949 877702976.7959175100 222259.5535061832
total 3972 22843166532.9857670000
----------------------------------------------------------------------------------
MSE= 222259.5535061832 , R2=0.961577 , R2(adj)=0.961353
X5(Mean)= 11601.3823533854, X5(Var)= 5751048.9760789946, X5(sd)= 2398.1344783141
X1(Mean)= 26987.0000000000, X1(Var)= 1315725.1666666667, X1(sd)= 1147.0506382312
------------------- individual test -------------------------
----------------------------------------------------------------------------------
b0 13423.5061281333 28.4893013675 471.1770904797 0.0000000000
b1 6.2830195557 0.2154666082 29.1600615402 0.0000000000
b2 -0.0440870002 0.0008499905 -51.8676398765 0.0000000000
b3 -0.0001305094 0.0000035030 -37.2565163629 0.0000000000
b4 0.0000001726 0.0000000071 24.4549886281 0.0000000000
class    lower limit    upper limit    observed no    probability    expected no    chi square
----------------------------------------------------------------------------------
[ 1 ]         -∞         -652.13595     345.00000       0.08333       331.08333       0.58497
[ 2 ]     -652.13595     -456.20123     271.00000       0.08333       331.08333      10.90362
[ 3 ]     -456.20123     -317.97271     272.00000       0.08333       331.08333      10.54369
[ 4 ]     -317.97271     -203.09431     294.00000       0.08333       331.08333       4.15356
[ 5 ]     -203.09431      -99.25938     318.00000       0.08333       331.08333       0.51701
[ 6 ]      -99.25938       -0.10671     371.00000       0.08333       331.08333       4.81251
[ 7 ]       -0.10671       99.15823     398.00000       0.08333       331.08333      13.52481
[ 8 ]       99.15823      202.94871     432.00000       0.08333       331.08333      30.76015
[ 9 ]      202.94871      317.95409     365.00000       0.08333       331.08333       3.47447
[ 10 ]     317.95409      455.99928     350.00000       0.08333       331.08333       1.08082
[ 11 ]     455.99928      651.79075     242.00000       0.08333       331.08333      23.96931
[ 12 ]     651.79075          ∞         315.00000       0.08333       331.08333       0.78129
----------------------------------------------------------------------------------
H0: residual is random , H1: Increasing line or decreasing line Z=-53.551452, p-value=0.000000
Durbin-Watson model analysis is applied once the curve-linear analysis is finished. The first-order auto-regressive error is $\varepsilon_{t+1} = \rho\varepsilon_t + \mu_{t+1}$. From the output of the regression analysis, 0.069523=2-2ρ, so ρ = 0.9652385. The data set is a population, so the auto-regressive correlation coefficient is the population correlation coefficient of AR(1), and
the real MSE = MSE (regression analysis) × (1 − ρ^2)
= 222259.5535061832*(1-0.9652385*0.9652385)=15183.5809664.
The estimated population standard deviation is 123.221674093, the square root of 15183.5809664; this value has the effect of the first-order auto-regressive error model removed.
The SSE is the part of X5 that X1 cannot explain, and its value is very large. After the AR(1) analysis, the residual is µ(t).
µ(t) probability distribution,
Variance : 15096.54665
S.D. : 122.86800
MAD : 87.50202
Range : 1638.51426
Median : 4.79757
Q1 : -61.63347
Q2 : 4.79757
Q3 : 66.47271
IQR : 128.10618
C.V. : 842.94913
Curve-fitting estimated the distribution function of µ (t),

F(X)=exp(0.0108837231*(X-4.7975713831))/2, X<4.7975713831
F(X)=1-exp(-0.0121487797*(X-4.7975713831))/2, X>=4.7975713831
SSE=0.239074958149817520 MAX error=0.018948482734298611 coefficient of determination=0.999872192234390940
[Figure: comparison of the estimated line and the real sample data.]
The coefficients of the two cases,

                                           Case 1       Case 2
auto regressive correlation coefficient    0.96636      0.9652385
standard deviation                         122.48031    122.86800
(2) Analysis of the new data that separate the two cases (t=28739,…,28973, 235 records),
X5= 17779.29496671001100000000+
6.75789595209062100000*(X1- 28856.00000000000000000000)^1+
-2.21204850418814660000*(X1- 28856.00000000000000000000)^2+
0.09267482485302025500*(X1- 28856.00000000000000000000)^3+
0.00270808698662494680*(X1- 28856.00000000000000000000)^4+
-0.00015703847732595477*(X1- 28856.00000000000000000000)^5+
-0.00000160841359467799*(X1- 28856.00000000000000000000)^6+
0.00000010570796549203*(X1- 28856.00000000000000000000)^7+
0.00000000058491599096*(X1- 28856.00000000000000000000)^8+
-0.00000000003815047667*(X1- 28856.00000000000000000000)^9+
-0.00000000000013969470*(X1- 28856.00000000000000000000)^10+
0.00000000000000831085*(X1- 28856.00000000000000000000)^11+
0.00000000000000002202*(X1- 28856.00000000000000000000)^12+
-0.00000000000000000115*(X1- 28856.00000000000000000000)^13+
-0.00000000000000000000*(X1- 28856.00000000000000000000)^14+
0.00000000000000000000*(X1- 28856.00000000000000000000)^15+
0.00000000000000000000*(X1- 28856.00000000000000000000)^16+
-0.00000000000000000000*(X1- 28856.00000000000000000000)^17+
-0.00000000000000000000*(X1- 28856.00000000000000000000)^18+
0.00000000000000000000*(X1- 28856.00000000000000000000)^19+
0.00000000000000000000*(X1- 28856.00000000000000000000)^20+
-0.00000000000000000000*(X1- 28856.00000000000000000000)^21+
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
regression 21 59071918.0792927290 2812948.4799663206 94.7949695660
error 213 6320567.7366195917 29674.0269324863
total 234 65392485.8159123210
----------------------------------------------------------------------------------
MSE= 29674.0269324863 , R2=0.903344 , R2(adj)=0.893815
X5(Mean)= 17402.8388936170, X5(Var)= 279455.0675893689, X5(sd)= 528.6350987112
X1(Mean)= 28856.0000000000, X1(Var)= 4621.6666666667, X1(sd)= 67.9828409723
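t=2,3,...,235
The auto regressive correlation coefficient is the population correlation coefficient of AR(1): 0.639414=2-2ρ, ρ = 0.680293. The estimated standard deviation after removing the AR(1) effect, the square root of MSE × (1 − ρ^2), is 126.257395131.
µ(t) probability distribution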
class    lower limit    upper limit    observed no    probability    expected no    chi square
----------------------------------------------------------------------------------
[ 1 ]         -∞         -123.53180      32.00000       0.12500        29.25000       0.25855
[ 2 ]     -123.53180      -59.66444      30.00000       0.12500        29.25000       0.01923
[ 3 ]      -59.66444      -22.30443      31.00000       0.12500        29.25000       0.10470
[ 4 ]      -22.30443        4.20292      24.00000       0.12500        29.25000       0.94231
[ 5 ]        4.20292       30.71027      23.00000       0.12500        29.25000       1.33547
[ 6 ]       30.71027       68.07028      29.00000       0.12500        29.25000       0.00214
[ 7 ]       68.07028      131.93764      40.00000       0.12500        29.25000       3.95085
[ 8 ]      131.93764          ∞          25.00000       0.12500        29.25000       0.61752
----------------------------------------------------------------------------------
degree of freedom=5
H0: X1~Double exponential(lamda,mu), lamda,mu are unknown
The estimated values of X5 for t=28739,…, 28750,

(2.1) Curve-linear model (Durbin Watson model) estimated line,
date    Close index (A)    Estimated close index (B)    residual (A-B)
2014-06-06 16924.28 16941.10026 -16.82025508
2014-06-09 16943.10 16963.12514 -20.02513837
2014-06-10 16945.92 16969.96551 -24.04550639
2014-06-11 16843.88 16875.76479 -31.88479444
2014-06-12 16734.19 16774.47758 -40.28758081
2014-06-13 16775.74 16819.66408 -43.92408175
2014-06-16 16781.01 16830.12832 -49.11831586
2014-06-17 16808.49 16862.40911 -53.9191117
2014-06-18 16906.62 16963.33398 -56.71398351
2014-06-19 16921.46 16984.15913 -62.69912994
2014-06-20 16947.08 17015.80833 -68.72833244
2014-06-23 16937.26 17013.63539 -76.37538563
(2.2) Curve-linear model (Durbin Watson model) estimated line with simulated error values,
date    Close index (A)    Simulated close index (B)    difference (A-B)
2014-06-06 16924.28 16962.63737 -38.3573668
2014-06-09 16943.10 16983.23773 -40.13772725
2014-06-10 16945.92 17028.92259 -83.00259165
2014-06-11 16843.88 17037.60969 -193.7296914
2014-06-12 16734.19 17220.70375 -486.5137479
2014-06-13 16775.74 16865.55693 -89.8169267
2014-06-16 16781.01 16920.17659 -139.1665905
2014-06-17 16808.49 16920.35400 -111.8640003
2014-06-18 16906.62 16973.74084 -67.12084393
2014-06-19 16921.46 17000.41977 -78.95976731
2014-06-20 16947.08 17019.8807 -72.80070224
2014-06-23 16937.26 17032.03546 -94.77545902
(9.3.3) The estimated line is updated each day,
The estimated line is re-estimated whenever a new daily close index becomes available.
date    Close index (A)    Estimated close index (B)    residual (A-B)
2014-06-06 16924.28 16941.10026 -16.82025508
2014-06-09 16943.10 16907.17784 35.92215678
2014-06-10 16945.92 16949.50737 -3.58736874
2014-06-11 16843.88 16814.06412 29.81587580
2014-06-12 16734.19 16737.10220 -2.91219893
2014-06-13 16775.74 16795.82379 -20.08378979
2014-06-16 16781.01 16798.94964 -17.93964378
2014-06-17 16808.49 16804.96515 3.52485263
2014-06-18 16906.62 16931.36683 -24.74683243
2014-06-19 16921.46 16915.13261 6.32738863
2014-06-20 16947.08 16948.01616 -0.93616120
2014-06-23 16937.26 16936.82191 0.43808744
Appendix 6. The estimation of Cos model analysis
(appendix 6.1) $X_1 \sim \mathrm{Normal}(\mu_{X_1} = 0, \sigma^2_{X_1} = 2^2)$,
$E(X_2 \mid x_1) = \beta_0 + \beta_1 \cos(x_1\pi) = 1 + 2\cos(x_1\pi)$, $\varepsilon \sim \mathrm{Normal}(0, \sigma^2 = 1)$,
(1) paired samples, n=1000,
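The setting of this appendix is easy to reproduce. The sketch below draws 1000 paired values from the stated model and compares a straight-line fit in X1 with a fit in cos(X1*pi). The seed and the helper name `fit_and_r2` are our own choices, so the numbers will not match the book's sample.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1000
x1 = rng.normal(0.0, 2.0, size=n)                        # X1 ~ Normal(0, 2^2)
x2 = 1.0 + 2.0 * np.cos(np.pi * x1) + rng.normal(0.0, 1.0, size=n)

def fit_and_r2(g, y):
    """OLS fit of y = b0 + b1*g; returns (b0, b1, R2)."""
    b1, b0 = np.polyfit(g, y, 1)
    resid = y - (b0 + b1 * g)
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
    return float(b0), float(b1), float(r2)

linear = fit_and_r2(x1, x2)                    # X2 = b0 + b1*X1
cosine = fit_and_r2(np.cos(np.pi * x1), x2)    # X2 = b0 + b1*cos(X1*pi)
print("linear:", linear)
print("cosine:", cosine)
```

The straight-line fit explains almost nothing, while the cosine fit recovers coefficients near 1 and 2.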
(1.1)Basic analysis
(1.2) The frequency probability table of the independent variable,
class    interval    midpoint    frequency    relative frequency    cumulative relative frequency
[ 1 ] -7.80947~ -6.15569 -6.98258 3.00000 0.0030000 0.0030000
[ 2 ] -6.15569~ -4.50191 -5.32880 16.00000 0.0160000 0.0190000
[ 3 ] -4.50191~ -2.84813 -3.67502 62.00000 0.0620000 0.0810000
[ 4 ] -2.84813~ -1.19435 -2.02124 174.00000 0.1740000 0.2550000
[ 5 ] -1.19435~ 0.45943 -0.36746 328.00000 0.3280000 0.5830000
[ 6 ] 0.45943~ 2.11321 1.28632 278.00000 0.2780000 0.8610000
[ 7 ] 2.11321~ 3.76699 2.94010 110.00000 0.1100000 0.9710000
[ 8 ] 3.76699~ 5.42077 4.59388 25.00000 0.0250000 0.9960000
[ 9 ] 5.42077~ 7.07455 6.24766 4.00000 0.0040000 1.0000000
(1.3) The frequency probability table of the dependent variable,
class    interval    midpoint    frequency    relative frequency    cumulative relative frequency
[ 1 ] -3.92061~ -2.77680 -3.34871 6.00000 0.0060000 0.0060000
[ 2 ] -2.77680~ -1.63299 -2.20490 65.00000 0.0650000 0.0710000
[ 3 ] -1.63299~ -0.48919 -1.06109 173.00000 0.1730000 0.2440000
[ 4 ] -0.48919~ 0.65462 0.08272 207.00000 0.2070000 0.4510000
[ 5 ] 0.65462~ 1.79843 1.22652 207.00000 0.2070000 0.6580000
[ 6 ] 1.79843~ 2.94224 2.37033 201.00000 0.2010000 0.8590000
[ 7 ] 2.94224~ 4.08604 3.51414 111.00000 0.1110000 0.9700000
[ 8 ] 4.08604~ 5.22985 4.65795 28.00000 0.0280000 0.9980000
[ 9 ] 5.22985~ 6.37366 5.80176 2.00000 0.0020000 1.0000000
(1.4)
(1.4.1)
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1 1 1.7163094683 1.7163094683 0.5523661436
error 998 3100.9808786213 3.1071952692
total 999 3102.6971880896
----------------------------------------------------------------------------------
H0: slope(X1)=0
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept 0.9234629152 0.0557439800 16.56615 0.00000
slope -0.0205110272 0.0275977632 -0.74321 0.45740
----------------------------------------------------------------------------------
MSE=3.1071952692 , R2=0.000553 , R2(adj)=-0.000448
SSX1=4079.6300111218 , SS(X2*X1)= -83.6774020592, C.V.= 1.9081393352
class    lower limit  upper limit  observed no  probability  expected no  chi square
[ 1 ]       -inf       -2.25910    103.00000     0.10000     100.00000     0.09000
[ 2 ]     -2.25910     -1.48353    127.00000     0.10000     100.00000     7.29000
[ 3 ]     -1.48353     -0.92433    103.00000     0.10000     100.00000     0.09000
[ 4 ]     -0.92433     -0.44653     93.00000     0.10000     100.00000     0.49000
[ 5 ]     -0.44653      0.00004     85.00000     0.10000     100.00000     2.25000
[ 6 ]      0.00004      0.44654     80.00000     0.10000     100.00000     4.00000
[ 7 ]      0.44654      0.92433     78.00000     0.10000     100.00000     4.84000
[ 8 ]      0.92433      1.48279     97.00000     0.10000     100.00000     0.09000
[ 9 ]      1.48279      2.25892    130.00000     0.10000     100.00000     9.00000
[ 10 ]     2.25892       +inf      104.00000     0.10000     100.00000     0.16000
degree of freedom=8
p-value=0.000400
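The chi-square figures in the class table above can be re-derived by hand. The following is a minimal sketch, not the simulator's own code: the function names are hypothetical, and the closed-form tail probability used here is valid only for an even number of degrees of freedom (such as the df=8 reported above).

```python
import math

def chi_square_gof(observed, expected):
    """Return the per-class contributions (O - E)^2 / E and their sum."""
    parts = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
    return parts, sum(parts)

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(chi2_df > x); closed form for even df only."""
    if df % 2 != 0:
        raise ValueError("this closed form needs an even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j          # builds (x/2)^j / j!
        total += term
    return math.exp(-half) * total

# Observed counts copied from the class table above; 10 equiprobable classes.
observed = [103, 127, 103, 93, 85, 80, 78, 97, 130, 104]
expected = [100.0] * 10

parts, stat = chi_square_gof(observed, expected)
p_value = chi2_sf_even_df(stat, df=8)   # df=8 as reported in the output
print(round(stat, 2), round(p_value, 4))
```

The sum of the ten contributions and the resulting tail probability can be compared with the reported p-value.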
Z=-0.364527, p-value=0.357800
Z=-0.364527, p-value=0.642200
Z=-0.364527, p-value=0.715600

t=2,3,...,1000
D.W. test=1.965936
Z=0.538054, p-value=0.295200
Z=0.538054, p-value=0.704800
Z=0.538054, p-value=0.590400

p value=0.128927

(1.4.2)residual analysis
[ 1 ] -4.87791~ -3.73037 -4.30414 6.00000 0.0060000 0.0060000
[ 2 ] -3.73037~ -2.58282 -3.15660 64.00000 0.0640000 0.0700000
[ 3 ] -2.58282~ -1.43528 -2.00905 169.00000 0.1690000 0.2390000
[ 4 ] -1.43528~ -0.28774 -0.86151 213.00000 0.2130000 0.4520000
[ 5 ] -0.28774~ 0.85981 0.28603 206.00000 0.2060000 0.6580000
[ 6 ] 0.85981~ 2.00735 1.43358 198.00000 0.1980000 0.8560000
[ 7 ] 2.00735~ 3.15489 2.58112 113.00000 0.1130000 0.9690000
[ 8 ] 3.15489~ 4.30244 3.72866 29.00000 0.0290000 0.9980000
[ 9 ] 4.30244~ 5.44998 4.87621 2.00000 0.0020000 1.0000000
mu point estimated value=-0.000000 (MLE)
mu value from -0.352545 to 0.352545

class    lower limit  upper limit  observed no  probability  expected no  chi square
[ 1 ]       -inf       -2.26150    103.00000     0.10000     100.00000     0.09000
[ 2 ]     -2.26150     -1.47300    128.00000     0.10000     100.00000     7.84000
[ 3 ]     -1.47300     -0.90448    104.00000     0.10000     100.00000     0.16000
[ 4 ]     -0.90448     -0.41871     96.00000     0.10000     100.00000     0.16000
[ 5 ]     -0.41871      0.03530     84.00000     0.10000     100.00000     2.56000
[ 6 ]      0.03530      0.48924     81.00000     0.10000     100.00000     3.61000
[ 7 ]      0.48924      0.97499     82.00000     0.10000     100.00000     3.24000
[ 8 ]      0.97499      1.54276    101.00000     0.10000     100.00000     0.01000
[ 9 ]      1.54276      2.33182    125.00000     0.10000     100.00000     6.25000
[ 10 ]     2.33182       +inf       96.00000     0.10000     100.00000     0.16000
degree of freedom=7
p-value=0.001100
(1.5) $X_{2i} = \beta_0 + \beta_1 H(X_{1i}) + \varepsilon_i$, $i = 1, 2, \ldots, n$, where $\beta_0$ is the intercept, $\beta_1$ is the slope, and $\varepsilon_i$ is the error,

(1.5.1)
The relation is X2=0.9710470890+2.0161453275*Cos(X1*pi)
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Cos(X1*pi) 1 2093.8428692628 2093.8428692628 2071.3150992447
error 998 1008.8543188268 1.0108760710
total 999 3102.6971880896
----------------------------------------------------------------------------------
H0: slope(X1)=0
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept 0.9710470890 0.0318112268 30.52530 0.00000
slope 2.0161453275 0.0442994922 45.51170 0.00000
----------------------------------------------------------------------------------
MSE= 1.0108760710 , R2=0.674846 , R2(adj)=0.674520
Cos(X1*pi)(mean)= -0.0234383429, Cos(X1*pi)(variance)= 0.5156261467, Cos(X1*pi)(s.d.)=
0.7180711293
SS(Cos(X1*pi))= 515.1105205874 , SS(X2*Cos(X1*pi))= 1038.5376692321, C.V.= 1.0883655058
class    lower limit  upper limit  observed no  probability  expected no  chi square
[ 1 ]       -inf       -1.28855     97.00000     0.10000     100.00000     0.09000
[ 2 ]     -1.28855     -0.84618    109.00000     0.10000     100.00000     0.81000
[ 3 ]     -0.84618     -0.52722    101.00000     0.10000     100.00000     0.01000
[ 4 ]     -0.52722     -0.25469    104.00000     0.10000     100.00000     0.16000
[ 5 ]     -0.25469      0.00003     98.00000     0.10000     100.00000     0.04000
[ 6 ]      0.00003      0.25470     87.00000     0.10000     100.00000     1.69000
[ 7 ]      0.25470      0.52722    120.00000     0.10000     100.00000     4.00000
[ 8 ]      0.52722      0.84576     93.00000     0.10000     100.00000     0.49000
[ 9 ]      0.84576      1.28844     90.00000     0.10000     100.00000     1.00000
[ 10 ]     1.28844       +inf      101.00000     0.10000     100.00000     0.01000
degree of freedom=8
p-value=0.404700
Z=-0.053044, p-value=0.478900
Z=-0.053044, p-value=0.521100
Z=-0.053044, p-value=0.957800
t=2,3,...,1000
D.W. test=2.005910
Z=-0.093348, p-value=0.537200
Z=-0.093348, p-value=0.462800
Z=-0.093348, p-value=0.925600

p value=0.976585

[0.941544 , 1.091230]
[0.970332 , 1.044620]
[0.929334 , 1.108103]
[0.964020 , 1.052665]
[0.906352 , 1.142651]
[0.952025 , 1.068949]
estimated line Cos(X1*pi) residual plot
(1.5.2)
[ 1 ] -3.29633~ -2.46000 -2.87817 5.00000 0.0050000 0.0050000
[ 2 ] -2.46000~ -1.62366 -2.04183 35.00000 0.0350000 0.0400000
[ 3 ] -1.62366~ -0.78733 -1.20549 186.00000 0.1860000 0.2260000
[ 4 ] -0.78733~ 0.04901 -0.36916 300.00000 0.3000000 0.5260000
[ 5 ] 0.04901~ 0.88534 0.46718 294.00000 0.2940000 0.8200000
[ 6 ] 0.88534~ 1.72168 1.30351 134.00000 0.1340000 0.9540000
[ 7 ] 1.72168~ 2.55802 2.13985 35.00000 0.0350000 0.9890000
[ 8 ] 2.55802~ 3.39435 2.97618 10.00000 0.0100000 0.9990000
[ 9 ] 3.39435~ 4.23069 3.81252 1.00000 0.0010000 1.0000000
(2) Sample size = 100,000,000; this is big data.

(2.1) Basic analysis
(2.1.1)X1 and X2 joint probability distribution,
f(x1,x2) f(x2,x1)

X1 and X2 do not have a linear relationship.
Panels: E(X2|x1) and Cos(x1); E(X1|x2) and x2 (not a linear relation).
(2.1.2)X1 marginal probability distribution,

Variance : 3.99987
S.D. : 1.99997
MAD : 1.59576
Range : 20.80164
Median : 0.00077
Q1 : -1.34944
Q2 : 0.00077
Q3 : 1.34927
IQR : 2.69871
C.V. : none
Variance : 3.00085
S.D. : 1.73230
MAD : 1.44677
Range : 14.36703
Mid_range : 0.95895
Median : 0.99994
Q1 : -0.33358
Q2 : 0.99994
Q3 : 2.33413
IQR : 2.66771
C.V. : 1.73202
(2.2)
The relation is X2= 0.9998775155+ 1.9999954117*Cos(X1*pi)
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Cos(X1*pi) 1 19999024.1243208420 19999024.1243208420 19980084.2396042200
error 9999998 10009477.3799172790 1.0009479382
total 9999999 30008501.5042381210
----------------------------------------------------------------------------------
H0: slope(X1)=0
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept 0.9998775155 0.0003163776 3160.39269 0.00000
slope 1.9999954117 0.0004474354 4469.90875 0.00000
----------------------------------------------------------------------------------
MSE= 1.0009479382 , R2=0.666445 , R2(adj)=0.666445
Cos(X1*pi)(mean)= 0.0001418253, Cos(X1*pi)(variance)= 0.4999779471, Cos(X1*pi)(s.d.)=
0.7070911873
SS(Cos(X1*pi))= 4999778.9713012017 , SS(X2*Cos(X1*pi))= 9999535.0023550987, C.V.= 1.0003126411
class    lower limit  upper limit   observed no   probability   expected no   chi square
[ 1 ]       -inf       -1.64568   500181.00000     0.05000   500000.00000     0.06552
[ 2 ]     -1.64568     -1.28220   499440.00000     0.05000   500000.00000     0.62720
[ 3 ]     -1.28220     -1.03692   500221.00000     0.05000   500000.00000     0.09768
[ 4 ]     -1.03692     -0.84201   499775.00000     0.05000   500000.00000     0.10125
[ 5 ]     -0.84201     -0.67478   500666.00000     0.05000   500000.00000     0.88711
[ 6 ]     -0.67478     -0.52463   499780.00000     0.05000   500000.00000     0.09680
[ 7 ]     -0.52463     -0.38547   498682.00000     0.05000   500000.00000     3.47425
[ 8 ]     -0.38547     -0.25369   499119.00000     0.05000   500000.00000     1.55232
[ 9 ]     -0.25369     -0.12567   501077.00000     0.05000   500000.00000     2.31986
[ 10 ]    -0.12567     -0.00023   499775.00000     0.05000   500000.00000     0.10125
[ 11 ]    -0.00023      0.12549   499889.00000     0.05000   500000.00000     0.02464
[ 12 ]     0.12549      0.25345   501302.00000     0.05000   500000.00000     3.39041
[ 13 ]     0.25345      0.38546   499538.00000     0.05000   500000.00000     0.42689
[ 14 ]     0.38546      0.52463   499583.00000     0.05000   500000.00000     0.34778
[ 15 ]     0.52463      0.67475   500575.00000     0.05000   500000.00000     0.66125
[ 16 ]     0.67475      0.84195   501046.00000     0.05000   500000.00000     2.18823
[ 17 ]     0.84195      1.03687   499705.00000     0.05000   500000.00000     0.17405
[ 18 ]     1.03687      1.28210   499674.00000     0.05000   500000.00000     0.21255
[ 19 ]     1.28210      1.64564   500605.00000     0.05000   500000.00000     0.73205
[ 20 ]     1.64564       +inf     499367.00000     0.05000   500000.00000     0.80138
p-value=0.437100
Z=0.400995, p-value=0.655900
Z=0.400995, p-value=0.344100
Z=0.400995, p-value=0.688200
t=2,3,...,10000000
D.W. test=2.000053

D.W. test=1.999947


Panels: the joint probability of Cos(X1*pi) and residual value; the joint probability of X2 estimated and X2.
(2.3)residual analysis,
Variance : 1.00095
S.D. : 1.00047
MAD : 0.79818
Range : 10.68772
Median : 0.00010
Q1 : -0.67486
Q2 : 0.00010
Q3 : 0.67487
IQR : 1.34973
C.V. : none
(2.4)Conclusion,
X1~Normal(0,4),X2=1.0000038041+2.0000020130*Cos(X1*pi)+error,
error~Normal(0,1).
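The contrast between the straight-line fit on X1 and the fit on Cos(X1*pi) can be reproduced on a small scale. The following is a minimal sketch under stated assumptions, not the authors' simulator: the helper `simple_ols`, the seed, and the sample size of 100,000 are all chosen here for illustration only.

```python
import math
import random

def simple_ols(x, y):
    """Least-squares fit of y = b0 + b1*x; returns (intercept, slope, R2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope, sxy * sxy / (sxx * syy)

rng = random.Random(2015)          # arbitrary seed, for repeatability only
n = 100_000                        # far smaller than the book's big-data run

# Model of Appendix 6.1: X1 ~ Normal(0, 2^2), X2 = 1 + 2*cos(pi*X1) + error.
x1 = [rng.gauss(0.0, 2.0) for _ in range(n)]
x2 = [1.0 + 2.0 * math.cos(math.pi * a) + rng.gauss(0.0, 1.0) for a in x1]

linear_fit = simple_ols(x1, x2)                                  # H(X1) = X1
cosine_fit = simple_ols([math.cos(math.pi * a) for a in x1], x2)  # H(X1) = cos(pi*X1)

print("linear  (b0, b1, R2):", linear_fit)
print("cosine  (b0, b1, R2):", cosine_fit)
```

Under these assumptions the linear fit should show a slope and R2 near zero, while the cosine fit should recover an intercept near 1, a slope near 2, and an R2 near 2/3.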
Appendix 6.2) $X_1 \sim \mathrm{Normal}(\mu_{X_1} = 0, \sigma_{X_1}^2 = 2^2)$,
$E(X_2 \mid x_1) = \beta_0 + \beta_1 \cos^2(x_1 \pi) = 1 + 2\cos^2(x_1 \pi)$, $\varepsilon \sim \mathrm{Normal}(0, \sigma^2 = 1)$,
(1) Paired samples, n=1000,
(1.1)Basic analysis
(1.2)the frequency probability table of independent variable,

[ 1 ] -6.31996~ -4.88709 -5.60352 7.00000 0.0070000 0.0070000
[ 2 ] -4.88709~ -3.45422 -4.17065 38.00000 0.0380000 0.0450000
[ 3 ] -3.45422~ -2.02135 -2.73778 103.00000 0.1030000 0.1480000
[ 4 ] -2.02135~ -0.58848 -1.30491 236.00000 0.2360000 0.3840000
[ 5 ] -0.58848~ 0.84439 0.12796 290.00000 0.2900000 0.6740000
[ 6 ] 0.84439~ 2.27726 1.56083 210.00000 0.2100000 0.8840000
[ 7 ] 2.27726~ 3.71013 2.99370 88.00000 0.0880000 0.9720000
[ 8 ] 3.71013~ 5.14300 4.42657 24.00000 0.0240000 0.9960000
[ 9 ] 5.14300~ 6.57587 5.85944 4.00000 0.0040000 1.0000000
(1.3)the frequency probability table of dependent variable,
[ 1 ] -1.49027~ -0.71803 -1.10415 10.00000 0.0100000 0.0100000
[ 2 ] -0.71803~ 0.05421 -0.33191 37.00000 0.0370000 0.0470000
[ 3 ] 0.05421~ 0.82646 0.44034 132.00000 0.1320000 0.1790000
[ 4 ] 0.82646~ 1.59870 1.21258 197.00000 0.1970000 0.3760000
[ 5 ] 1.59870~ 2.37095 1.98482 266.00000 0.2660000 0.6420000
[ 6 ] 2.37095~ 3.14319 2.75707 201.00000 0.2010000 0.8430000
[ 7 ] 3.14319~ 3.91543 3.52931 112.00000 0.1120000 0.9550000
[ 8 ] 3.91543~ 4.68768 4.30156 38.00000 0.0380000 0.9930000
[ 9 ] 4.68768~ 5.45992 5.07380 7.00000 0.0070000 1.0000000
(1.4)
(1.4.1)
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1 1 0.0651048857 0.0651048857 0.0483258413
error 998 1344.5120478279 1.3472064607
total 999 1344.5771527135
----------------------------------------------------------------------------------
H0: slope(X1)=0
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept 1.9448119558 0.0367074052 52.98146 0.00000
slope -0.0041492944 0.0188748948 -0.21983 0.82600
----------------------------------------------------------------------------------
MSE=1.3472064607 , R2=0.000048 , R2(adj)=-0.000954
SSX1=3781.5084539155 , SS(X2*X1)= -15.6905919453, C.V.= 0.5967824839
class    lower limit  upper limit  observed no  probability  expected no  chi square
[ 1 ]       -inf       -1.48754     97.00000     0.10000     100.00000     0.09000
[ 2 ]     -1.48754     -0.97685    118.00000     0.10000     100.00000     3.24000
[ 3 ]     -0.97685     -0.60864     98.00000     0.10000     100.00000     0.04000
[ 4 ]     -0.60864     -0.29402     78.00000     0.10000     100.00000     4.84000
[ 5 ]     -0.29402      0.00003    113.00000     0.10000     100.00000     1.69000
[ 6 ]      0.00003      0.29403     96.00000     0.10000     100.00000     0.16000
[ 7 ]      0.29403      0.60864     99.00000     0.10000     100.00000     0.01000
[ 8 ]      0.60864      0.97637     92.00000     0.10000     100.00000     0.64000
[ 9 ]      0.97637      1.48742     99.00000     0.10000     100.00000     0.01000
[ 10 ]     1.48742       +inf      110.00000     0.10000     100.00000     1.00000
degree of freedom=8
p-value=0.164100
Z=0.951244, p-value=0.829300
Z=0.951244, p-value=0.170700
Z=0.951244, p-value=0.341400
t=2,3,...,1000
D.W. test=2.063777

D.W. test=1.936223

(1.5) $X_{2i} = \beta_0 + \beta_1 H(X_{1i}) + \varepsilon_i$, $i = 1, 2, \ldots, n$, where $\beta_0$ is the intercept, $\beta_1$ is the slope, and $\varepsilon_i$ is the error,

(1.5.1)
The relation is X2= 1.0224268459+ 1.8313426849*Cos(X1*pi)*Cos(X1*pi)
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Cos(X1*pi)*Cos(X1*pi) 1 406.7950654736 406.7950654736 432.9166454197
error 998 937.7820872400 0.9396614101
total 999 1344.5771527135
----------------------------------------------------------------------------------
H0: slope(X1)=0
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept 1.0224268459 0.0539014758 18.96844 0.00000
slope 1.8313426849 0.0880171852 20.80665 0.00000
----------------------------------------------------------------------------------
MSE= 0.9396614101 , R2=0.302545 , R2(adj)=0.301846
Cos(X1*pi)*Cos(X1*pi)(mean)= 0.5037232440, Cos(X1*pi)*Cos(X1*pi)(variance)= 0.1214146108,
Cos(X1*pi)*Cos(X1*pi)(s.d.)= 0.3484459940
SS(Cos(X1*pi)*Cos(X1*pi))= 121.2931961421 , SS(X2*Cos(X1*pi)*Cos(X1*pi))= 222.1294074771, C.V.= 0.4984076333
class    lower limit  upper limit  observed no  probability  expected no  chi square
[ 1 ]       -inf       -1.24233    104.00000     0.10000     100.00000     0.16000
[ 2 ]     -1.24233     -0.81583     92.00000     0.10000     100.00000     0.64000
[ 3 ]     -0.81583     -0.50831    101.00000     0.10000     100.00000     0.01000
[ 4 ]     -0.50831     -0.24555    108.00000     0.10000     100.00000     0.64000
[ 5 ]     -0.24555      0.00002     89.00000     0.10000     100.00000     1.21000
[ 6 ]      0.00002      0.24556     98.00000     0.10000     100.00000     0.04000
[ 7 ]      0.24556      0.50831    109.00000     0.10000     100.00000     0.81000
[ 8 ]      0.50831      0.81542    103.00000     0.10000     100.00000     0.09000
[ 9 ]      0.81542      1.24223     94.00000     0.10000     100.00000     0.36000
[ 10 ]     1.24223       +inf      102.00000     0.10000     100.00000     0.04000
degree of freedom=8
p-value=0.857100
Z=-0.944739, p-value=0.172400
Z=-0.944739, p-value=0.827600
Z=-0.944739, p-value=0.344800
t=2,3,...,1000
D.W. test=1.995970
D.W. test=2.004030
estimated line Cos(X1*pi)^2, residual plot
(2) Sample size = 100,000,000; this is big data.

(2.1) Basic analysis
(2.1.1)X1 and X2 joint probability distribution,
f(x1,x2) f(x2,x1)
sample cov(X1,X2)= -0.0001, X1 and X2 sample correlation coefficient=-0.0000.
Panels: E(X2|x1) and Cos(x1); E(X1|x2) and x2 (not a linear relation).

Variance : 3.99997
S.D. : 1.99999
MAD : 1.59574
Range : 23.12501
Median : 0.00018
Q1 : -1.34872
Q2 : 0.00018
Q3 : 1.34924
IQR : 2.69796
C.V. : none

Variance : 1.49994
S.D. : 1.22472
MAD : 0.98580
Range : 12.84450
Mid_range : 1.94613
Median : 2.00005
Q1 : 1.15340
Q2 : 2.00005
Q3 : 2.84642
IQR : 1.69302
C.V. : 0.61237
(2.2)
The relation is X2= 0.9998860304+ 2.0001194854*Cos(X1*pi)*Cos(X1*pi)
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Cos(X1*pi)*Cos(X1*pi) 1 50004162.7065743580 50004162.7065743580 50009305.0178369730
error 99999998 99989715.2912962140 0.9998971729
total 99999999 149993877.9978705600
----------------------------------------------------------------------------------
H0: slope(X1)=0
Individual test
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
intercept 0.9998860304 0.0001732013 5772.97106 0.00000
slope 2.0001194854 0.0002828333 7071.72575 0.00000
----------------------------------------------------------------------------------
MSE=0.9998971729 , R2=0.333375 , R2(adj)=0.333375
Cos(X1*pi)*Cos(X1*pi)(mean)= 0.5000130858, Cos(X1*pi)*Cos(X1*pi)(variance)= 0.1249954724,
Cos(X1*pi)*Cos(X1*pi)(s.d.)= 0.3535469876
SS(Cos(X1*pi)*Cos(X1*pi))= 12499547.1191571710 , SS(X2*Cos(X1*pi)*Cos(X1*pi))= 25000587.7511875290,
C.V.= 0.4999813058
class    lower limit  upper limit    observed no   probability    expected no   chi square
[ 1 ]       -inf       -1.64482   5000200.00000     0.05000   5000000.00000     0.00800
[ 2 ]     -1.64482     -1.28153   4999190.00000     0.05000   5000000.00000     0.13122
[ 3 ]     -1.28153     -1.03637   4999348.00000     0.05000   5000000.00000     0.08502
[ 4 ]     -1.03637     -0.84157   5002193.00000     0.05000   5000000.00000     0.96185
[ 5 ]     -0.84157     -0.67443   5000539.00000     0.05000   5000000.00000     0.05810
[ 6 ]     -0.67443     -0.52435   4998575.00000     0.05000   5000000.00000     0.40613
[ 7 ]     -0.52435     -0.38527   4999144.00000     0.05000   5000000.00000     0.14655
[ 8 ]     -0.38527     -0.25356   4989605.00000     0.05000   5000000.00000    21.61121
[ 9 ]     -0.25356     -0.12561   5010040.00000     0.05000   5000000.00000    20.16032
[ 10 ]    -0.12561     -0.00023   4991075.00000     0.05000   5000000.00000    15.93113
[ 11 ]    -0.00023      0.12543   4995383.00000     0.05000   5000000.00000     4.26334
[ 12 ]     0.12543      0.25331   5010131.00000     0.05000   5000000.00000    20.52743
[ 13 ]     0.25331      0.38526   4999004.00000     0.05000   5000000.00000     0.19840
[ 14 ]     0.38526      0.52435   5005535.00000     0.05000   5000000.00000     6.12725
[ 15 ]     0.52435      0.67439   5000563.00000     0.05000   5000000.00000     0.06339
[ 16 ]     0.67439      0.84151   5000975.00000     0.05000   5000000.00000     0.19012
[ 17 ]     0.84151      1.03633   4997245.00000     0.05000   5000000.00000     1.51801
[ 18 ]     1.03633      1.28143   5002898.00000     0.05000   5000000.00000     1.67968
[ 19 ]     1.28143      1.64477   5000265.00000     0.05000   5000000.00000     0.01405
[ 20 ]     1.64477       +inf     4998092.00000     0.05000   5000000.00000     0.72809
p-value=0.000000
Z=-0.031195, p-value=0.487600
Z=-0.031195, p-value=0.512400
Z=-0.031195, p-value=0.975200
t=2,3,...,100000000
D.W. test=1.999886

D.W. test=2.000114



Panels: the joint probability of Cos(X1*pi)^2 and residual value; the joint probability of X2 estimated and X2.
(2.3) residual analysis,

X0=residual, the probability distribution of residual
Variance : 0.99990
S.D. : 0.99995
MAD : 0.79784
Range : 11.58241
Median : 0.00003
Q1 : -0.67448
Q2 : 0.00003
Q3 : 0.67437
IQR : 1.34885
C.V. : none
(2.4) Conclusion,
X1~Normal(0,4),X2=1.0000038041+2.0000020130*Cos(X1*pi)^2+error,
error~Normal(0,1).
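The R2 values reported for the big-sample fits (about 2/3 in Appendix 6.1 and about 1/3 here) can be checked against the stated model. The following is a worked sketch added for illustration, assuming only the simulation settings given in the appendix headers ($X_1 \sim \mathrm{Normal}(0, 4)$, $\beta_1 = 2$, error variance 1) and the standard identity $E[\cos(kX)] = e^{-k^2\sigma^2/2}$ for a zero-mean normal $X$; $H$ denotes the transformed regressor.

```latex
\begin{aligned}
E[\cos(\pi X_1)] &= e^{-2\pi^2} \approx 0, \qquad
E[\cos(2\pi X_1)] = e^{-8\pi^2} \approx 0, \qquad
E[\cos(4\pi X_1)] = e^{-32\pi^2} \approx 0, \\
E[\cos^2(\pi X_1)] &= \tfrac{1}{2}\bigl(1 + E[\cos(2\pi X_1)]\bigr) \approx \tfrac{1}{2}, \\
E[\cos^4(\pi X_1)] &= \tfrac{1}{8}\bigl(3 + 4E[\cos(2\pi X_1)] + E[\cos(4\pi X_1)]\bigr) \approx \tfrac{3}{8}, \\
\operatorname{Var}[\cos(\pi X_1)] &\approx \tfrac{1}{2}, \qquad
\operatorname{Var}[\cos^2(\pi X_1)] \approx \tfrac{3}{8} - \tfrac{1}{4} = \tfrac{1}{8}, \\
R^2 &= \frac{\beta_1^2 \operatorname{Var}[H(X_1)]}{\beta_1^2 \operatorname{Var}[H(X_1)] + \sigma^2}
\;\Rightarrow\;
R^2_{\cos} \approx \frac{4 \cdot \tfrac{1}{2}}{2 + 1} = \tfrac{2}{3}, \qquad
R^2_{\cos^2} \approx \frac{4 \cdot \tfrac{1}{8}}{\tfrac{1}{2} + 1} = \tfrac{1}{3}.
\end{aligned}
```

These limits agree with the sample variances of the regressors (about 0.5000 and 0.1250) and the R2 values (0.666445 and 0.333375) printed in the outputs.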
Appendix 7. The population of Logistic distribution
The population follows a Logistic probability distribution; the population mean is 100 and
the population variance is 4, simulating 100,000,000 samples
(the parameters of the Logistic distribution are µ = 0, σ = 1.10760).
(1)The marginal probability distribution,
Variance : 4.03673
S.D. : 2.00916
MAD : 1.53566
Range : 30.59933
Mid_range : 0.00000
Median : 0.00006
Q1 : -1.21675
Q2 : 0.00006
Q3 : 1.21721
IQR : 2.43396
C.V. : none
(2) Curve-fitting estimated the distribution function,

F(X)= 1/( 1 + exp (- (X-0.0000926492)/ 1.1077102881 ))
SSE=0.000537376801715650 MAX error=0.000062666346455797
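The fitted F(X) above has the standard Logistic form, whose inverse is available in closed form. The following is a minimal sketch for comparison with the printed coefficients; the function names are hypothetical, and the location and scale are taken as µ = 0 and σ = 1.10760 from the appendix text rather than from the fitted values.

```python
import math

MU, S = 0.0, 1.10760   # location and scale quoted in the appendix text

def logistic_cdf(x, mu=MU, s=S):
    """F(x) = 1 / (1 + exp(-(x - mu) / s))."""
    return 1.0 / (1.0 + math.exp(-(x - mu) / s))

def logistic_quantile(p, mu=MU, s=S):
    """Closed-form inverse of the cdf: x = mu + s * log(p / (1 - p))."""
    return mu + s * math.log(p / (1.0 - p))

# Theoretical variance of a Logistic(mu, s) variable is s^2 * pi^2 / 3.
variance = (S * math.pi) ** 2 / 3.0
q1 = logistic_quantile(0.25)
q3 = logistic_quantile(0.75)

print("variance:", variance)
print("Q1, Q3, IQR:", q1, q3, q3 - q1)
print("round trip at 0.9:", logistic_cdf(logistic_quantile(0.9)))
```

The printed variance and quartiles can be set beside the simulated Variance, Q1, Q3 and IQR listed under (1).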
(3)Curve-fitting estimated the random variable value,

X= 2.81097635626792910000+
4.78798216581344600000*log(F(x))^1+
2.27256998419761660000*log(F(x))^2+
0.85047294199466705000*log(F(x))^3+
0.20991237275302410000*log(F(x))^4+
0.03521677898243069600*log(F(x))^5+
0.00401852309005334970*log(F(x))^6+
0.00030484539820463397*log(F(x))^7+
0.00001460413875520317*log(F(x))^8+
0.00000039718769073716*log(F(x))^9+
0.00000000465021707252*log(F(x))^10+
0.000000<F(x)<=0.050000
Error=0.000683287633189380 MAX=0.011062371964946749
X= -0.36144773662090302000+
1.13121198117733000000*tan((F(x)-0.5)*pi)^1+
0.21319090574979782000*tan((F(x)-0.5)*pi)^2+
0.02551528438925743100*tan((F(x)-0.5)*pi)^3+
0.00161103846039623020*tan((F(x)-0.5)*pi)^4+
0.00003934353298973292*tan((F(x)-0.5)*pi)^5+
0.050000<F(x)<=0.100000
Error=0.000004880519120589 MAX=0.000144586395838253
X= -4.59137023985385890000+
-35.37335491180419900000*log(1-F(x))^1+
-205.04631805419922000000*log(1-F(x))^2+
-714.86285400390625000000*log(1-F(x))^3+
-1048.61547851562500000000*log(1-F(x))^4+
0.100000<F(x)<=0.150000
Error=0.000001466090004518 MAX=0.000102311196046756
X= 0.29488432407379150000+
2.27502429485321040000*tan((F(x)-0.5)*pi)^1+
1.03891706466674800000*tan((F(x)-0.5)*pi)^2+
0.31234908103942871000*tan((F(x)-0.5)*pi)^3+
0.04100593179464340200*tan((F(x)-0.5)*pi)^4+
0.150000<F(x)<=0.200000
Error=0.000000692821008230 MAX=0.000058836914659688
X= 1.90501141548156740000+
3.42121016979217530000*log(F(x))^1+
1.12947809696197510000*log(F(x))^2+
0.20629882812500000000*log(F(x))^3+
0.200000<F(x)<=0.250000
Error=0.000000522986619037 MAX=0.000056209626235093
X= 0.14986997842788696000+
2.14525794982910160000*tan((F(x)-0.5)*pi)^1+
1.34892559051513670000*tan((F(x)-0.5)*pi)^2+
0.78867864608764648000*tan((F(x)-0.5)*pi)^3+
0.21835052967071533000*tan((F(x)-0.5)*pi)^4+
0.250000<F(x)<=0.300000
Error=0.000000663805238469 MAX=0.000066825379190227
X= 3.28086045384407040000+
-2.52311021089553830000*(1/F(x))^1+
0.51717242598533630000*(1/F(x))^2+
-0.04198981169611215600*(1/F(x))^3+
0.300000<F(x)<=0.350000
Error=0.000000532872788396 MAX=0.000044405749932031
X= -2.69870167225599290000+
-6.55925154685974120000*log(1-F(x))^1+
-5.23477113246917720000*log(1-F(x))^2+
-1.98897159099578860000*log(1-F(x))^3+
0.350000<F(x)<=0.400000
Error=0.000000522831295055 MAX=0.000049890658496810
X= 0.00001115538179874420+
1.39649295806884770000*tan((F(x)-0.5)*pi)^1+
-0.21163129806518555000*tan((F(x)-0.5)*pi)^2+
-1.46191787719726560000*tan((F(x)-0.5)*pi)^3+
-2.79061889648437500000*tan((F(x)-0.5)*pi)^4+
-2.23810195922851560000*tan((F(x)-0.5)*pi)^5+
0.400000<F(x)<=0.450000
Error=0.000000312752819030 MAX=0.000042977291241642
X= 0.00004515495038504014+
1.101215330883860600000000000000*log(F(x)/(1-F(x)))^1+
-0.086022198200225830000000000000*log(F(x)/(1-F(x)))^2+
4.551019668579101600000000000000*log(F(x)/(1-F(x)))^3+
157.682952880859370000000000000000*log(F(x)/(1-F(x)))^4+
2109.235229492187500000000000000000*log(F(x)/(1-F(x)))^5+
14309.840820312500000000000000000000*log(F(x)/(1-F(x)))^6+
48705.539062500000000000000000000000*log(F(x)/(1-F(x)))^7+
65953.402343750000000000000000000000*log(F(x)/(1-F(x)))^8+
0.450000<F(x)<=0.500000
Error=0.000000149343082446 MAX=0.000028367131421380
X= 0.00005732061163143953+
1.111632163869217000000000000000*log(F(x)/(1-F(x)))^1+
-0.043361157178878784000000000000*log(F(x)/(1-F(x)))^2+
1.669347763061523400000000000000*log(F(x)/(1-F(x)))^3+
-77.822616577148438000000000000000*log(F(x)/(1-F(x)))^4+
1275.596923828125000000000000000000*log(F(x)/(1-F(x)))^5+
-9452.799804687500000000000000000000*log(F(x)/(1-F(x)))^6+
32981.671875000000000000000000000000*log(F(x)/(1-F(x)))^7+
-44203.011718750000000000000000000000*log(F(x)/(1-F(x)))^8+
0.500000<F(x)<=0.550000
Error=0.000000155651482517 MAX=0.000029820467389419
X= -0.00402648001909255980+
1.216633915901184100000000000000*log(F(x)/(1-F(x)))^1+
-0.990192890167236330000000000000*log(F(x)/(1-F(x)))^2+
4.117090225219726600000000000000*log(F(x)/(1-F(x)))^3+
-8.011781692504882800000000000000*log(F(x)/(1-F(x)))^4+
5.939398765563964800000000000000*log(F(x)/(1-F(x)))^5+
0.550000<F(x)<=0.600000
Error=0.000000273554275783 MAX=0.000038477489750333
X= -0.00899159908294677730+
1.201054513454437300000000000000*log(F(x)/(1-F(x)))^1+
-0.328715801239013670000000000000*log(F(x)/(1-F(x)))^2+
0.489564538002014160000000000000*log(F(x)/(1-F(x)))^3+
-0.262946486473083500000000000000*log(F(x)/(1-F(x)))^4+
0.600000<F(x)<=0.650000
Error=0.000000369839116296 MAX=0.000040407285126498
X= 0.06860533356666564900+
1.00061774253845210000*tan((F(x)-0.5)*pi)^1+
0.91882944107055664000*tan((F(x)-0.5)*pi)^2+
-1.22047185897827150000*tan((F(x)-0.5)*pi)^3+
0.45349740982055664000*tan((F(x)-0.5)*pi)^4+
0.650000<F(x)<=0.700000
Error=0.000000450977800240 MAX=0.000054210018204492
X= 4.74466514587402340000+
29.23645019531250000000*log(F(x))^1+
102.57336425781250000000*log(F(x))^2+
192.31542968750000000000*log(F(x))^3+
142.07812500000000000000*log(F(x))^4+
0.700000<F(x)<=0.750000
Error=0.000000997935804361 MAX=0.000065634571963180
X= -0.23868405818939209000+
2.30883240699768070000*tan((F(x)-0.5)*pi)^1+
-1.29462718963623050000*tan((F(x)-0.5)*pi)^2+
0.54605150222778320000*tan((F(x)-0.5)*pi)^3+
-0.10434103012084961000*tan((F(x)-0.5)*pi)^4+
0.750000<F(x)<=0.800000
Error=0.000001124348221286 MAX=0.000067089558881905
X= -0.36335521936416626000+
2.43422782421112060000*tan((F(x)-0.5)*pi)^1+
-1.17309939861297610000*tan((F(x)-0.5)*pi)^2+
0.36107987165451050000*tan((F(x)-0.5)*pi)^3+
-0.04744070023298263500*tan((F(x)-0.5)*pi)^4+
0.800000<F(x)<=0.850000
Error=0.000000821088910784 MAX=0.000064113400427557
X= 0.60040664672851563000+
0.30081748962402344000*tan((F(x)-0.5)*pi)^1+
0.65662479400634766000*tan((F(x)-0.5)*pi)^2+
-0.38221311569213867000*tan((F(x)-0.5)*pi)^3+
0.08851981163024902300*tan((F(x)-0.5)*pi)^4+
-0.00764834880828857420*tan((F(x)-0.5)*pi)^5+
0.850000<F(x)<=0.900000
Error=0.000002097995186246 MAX=0.000088115155979729
X= 0.11100018024444580000+
1.41221305727958680000*tan((F(x)-0.5)*pi)^1+
-0.33743028342723846000*tan((F(x)-0.5)*pi)^2+
0.05258737690746784200*tan((F(x)-0.5)*pi)^3+
-0.00451843289192765950*tan((F(x)-0.5)*pi)^4+
0.00016247624444076791*tan((F(x)-0.5)*pi)^5+
0.900000<F(x)<=0.950000
Error=0.000003175717615387 MAX=0.000147648432990088
X= -2.09225997701287270000+
4.082087025046348600000000000000*log(F(x)/(1-F(x)))^1+
-1.798192268237471600000000000000*log(F(x)/(1-F(x)))^2+
0.605736260768026110000000000000*log(F(x)/(1-F(x)))^3+
-0.125122338649816810000000000000*log(F(x)/(1-F(x)))^4+
0.016424997724243440000000000000*log(F(x)/(1-F(x)))^5+
-0.001370952220895560500000000000*log(F(x)/(1-F(x)))^6+
0.000070298643606747646000000000*log(F(x)/(1-F(x)))^7+
-0.000002016273541016744300000000*log(F(x)/(1-F(x)))^8+
0.000000024715343244219312000000*log(F(x)/(1-F(x)))^9+
0.950000<F(x)<=1.000000
Error=0.000413627728662946 MAX=0.007649680932839686
pdf and df of estimated line

Variance : 4.02790
S.D. : 2.00696
MAD : 1.53496
Range : 22.93481
Median : 0.00033
Q1 : -1.21667
Q2 : 0.00033
Q3 : 1.21715
IQR : 2.43382
C.V. : none
(4) SLLN analysis, X1 ~ Logistic, the population mean is 100 and
the population variance is 4. Note: X2 ~ Logistic(µ = 0, σ = 1.10760),

Red line is X1,Blue line is X2
(5) X1 ~ Logistic, the population mean is 100 and the population variance is 4,
simulating 100,000,000 samples; let Z1 = MIN(X1^2, |X1|^0.5).
f(z1),F(z1) Coefficient
Variance : 0.43841
S.D. : 0.66212
MAD : 0.56631
Range : 3.91148
Mid_range : 1.95574
Median : 1.10317
Q1 : 0.32025
Q2 : 1.10317
Q3 : 1.46813
IQR : 1.14788
C.V. : 0.67192
Appendix 8. The critical values of Logistic distribution
The population distribution is Logistic and the sample size is n.
(1) Population mean test; the test statistic is given below.
$H_0: \mu = \mu_0$, $W_2 = \dfrac{\bar{X} - \mu_0}{S/\sqrt{n}}$; $W_2$ has a symmetric distribution. Let $P(W_2 \le W_{2,1-\alpha,n}) = \alpha$,
α
n 0.9 0.95 0.975 0.99 0.995
3 1.832074 2.773549 4.038885 6.494457 9.230786
4 1.617368 2.275064 3.032092 4.273789 5.469409
5 1.524799 2.082087 2.674281 3.561003 4.342657
6 1.473804 1.980366 2.494179 3.223868 3.833998
7 1.440605 1.917936 2.387339 3.029054 3.547273
8 1.417804 1.874686 2.315461 2.901814 3.362455
9 1.400650 1.844090 2.264861 2.813804 3.237146
10 1.387597 1.820916 2.226902 2.749689 3.146810
11 1.377002 1.802217 2.197069 2.699893 3.077922
12 1.368606 1.787869 2.173740 2.660647 3.023007
13 1.361515 1.774840 2.154866 2.629129 2.979524
14 1.355690 1.765044 2.138563 2.602998 2.942553
15 1.350262 1.756018 2.124872 2.580223 2.911964
20 1.332484 1.726209 2.079441 2.507003 2.812751
25 1.322380 1.709321 2.053773 2.467227 2.758745
30 1.315117 1.698104 2.037418 2.442101 2.725241
40 1.306762 1.684679 2.017151 2.410781 2.684739
50 1.301810 1.676165 2.005176 2.393369 2.661536
60 1.298437 1.671040 1.997169 2.381784 2.646289
70 1.295938 1.667310 1.991929 2.373999 2.636381
80 1.294317 1.664634 1.988154 2.367312 2.627865
90 1.292706 1.662162 1.984677 2.363223 2.622213
100 1.291414 1.660411 1.981549 2.357562 2.614991
500 1.283723 1.648030 1.964215 2.331347 2.582061
1000 1.282632 1.646505 1.962219 2.330148 2.579613
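Critical values of this kind can be approximated by Monte Carlo simulation. The following is a minimal sketch and not the authors' simulator: the function names, the seed and the number of replications are assumptions made here, and the standard Logistic(0, 1) is used because W2 does not change under shifts or rescaling of the data.

```python
import math
import random

def logistic_draw(rng, mu=0.0, s=1.0):
    """One Logistic(mu, s) draw by inverting the cdf at a uniform variate."""
    u = rng.random()
    while u == 0.0:                      # avoid log(0) in the rare edge case
        u = rng.random()
    return mu + s * math.log(u / (1.0 - u))

def w2_statistic(sample, mu0):
    """W2 = (sample mean - mu0) / (S / sqrt(n)), S being the sample s.d."""
    n = len(sample)
    mean = sum(sample) / n
    s2 = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return (mean - mu0) / math.sqrt(s2 / n)

def simulate_critical_value(n, prob, reps, seed=1):
    """Empirical prob-quantile of W2 under H0 for Logistic samples of size n."""
    rng = random.Random(seed)
    values = sorted(
        w2_statistic([logistic_draw(rng) for _ in range(n)], 0.0)
        for _ in range(reps)
    )
    return values[min(reps - 1, int(prob * reps))]

crit = simulate_critical_value(n=10, prob=0.95, reps=20_000)
print("simulated 0.95 critical value for n=10:", crit)
```

With only 20,000 replications the result carries visible simulation noise, so it should be read as a rough check against the n=10 row of the table rather than a replacement for it.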
(2)Population variance test,

$H_0: \sigma = \sigma_0$, $W_3 = \dfrac{(n-1)S^2}{\sigma_0^2}$; $W_3$ does not have a symmetric distribution. Let $P(W_3 \le W_{3,1-\alpha,n}) = \alpha$,
α
n 0.005 0.01 0.025 0.05 0.1
3 0.008403 0.016867 0.042528 0.086376 0.159444
4 0.059119 0.094817 0.178843 0.293231 0.491692
5 0.169494 0.243808 0.400363 0.592060 0.897213
6 0.336514 0.455166 0.687924 0.957122 1.365034
7 0.552361 0.716395 1.027004 1.372351 1.879315
8 0.809824 1.020691 1.408302 1.827812 2.429735
9 1.103232 1.360638 1.823780 2.316140 3.009192
10 1.429280 1.732159 2.269594 2.831804 3.613066
11 1.781371 2.130483 2.741035 3.371553 4.237220
12 2.158214 2.551584 3.233786 3.930505 4.878840
13 2.557395 2.995094 3.746949 4.509101 5.537230
14 2.975774 3.457066 4.276104 5.101819 6.207915
15 3.412854 3.938117 4.825366 5.710974 6.892107
20 5.813817 6.543346 7.745914 8.922009 10.454083
25 8.494694 9.412828 10.911204 12.348840 14.196985
30 11.387816 12.482961 14.245811 15.925322 18.065095
40 17.607472 19.030081 21.286075 23.404382 26.065239
50 24.256767 25.975437 28.678530 31.189666 34.311383
60 31.208779 33.198644 36.314654 39.182994 42.733670
70 38.405502 40.644663 44.133973 47.337958 51.281553
80 45.798381 48.268437 52.107084 55.621938 59.933550
90 53.325879 56.028213 60.204638 64.014510 68.673697
100 60.995366 63.911996 68.402117 72.495428 77.480065
500 404.333799 412.623254 425.07711 436.034652 448.985695
1000 861.758831 874.129319 892.611616 908.755821 927.719993
α
n 0.9 0.95 0.975 0.99 0.995
3 4.711185 6.483522 8.422638 11.254946 13.603583
4 6.522300 8.618926 10.873294 14.110851 16.759289
5 8.208240 10.563933 13.060454 16.597565 19.469364
6 9.810309 12.384144 15.078838 18.858426 21.899558
7 11.353633 14.116056 16.979500 20.964656 24.151310
8 12.848181 15.779548 18.795694 22.958149 26.269819
9 14.310642 17.395572 20.542562 24.866555 28.287443
10 15.739825 18.961779 22.236348 26.706210 30.232595
11 17.144261 20.498782 23.885786 28.487558 32.087822
12 18.530760 22.006093 25.502358 30.219110 33.899608
13 19.897226 23.486719 27.080340 31.925397 35.700713
14 21.244815 24.942558 28.634362 33.576790 37.414163
15 22.579713 26.383770 30.161302 35.214266 39.120753
20 29.091643 32.774641 37.520223 43.022865 47.231696
25 35.402157 40.048426 44.551252 50.408345 54.842358
30 41.575570 46.565597 51.361142 57.554326 62.189708
40 53.630356 59.209573 64.508761 71.276492 76.286030
50 65.419420 71.511919 77.240342 84.492281 89.797246
60 77.025623 83.573324 89.690387 97.378675 102.976929
70 88.496966 95.463891 101.935625 110.020823 115.88847
80 99.855558 107.212127 114.024173 122.457852 128.564761
90 111.138753 118.851592 125.955253 134.762581 141.102749
100 122.353588 130.397197 137.777691 146.911876 153.446014
500 550.948973 567.069979 581.440733 598.524253 610.401251
1000 1072.25162 1094.29574 1108.88654 1136.90360 1152.89168
Appendix 9. The transformation of probability distribution by the simulator
The probability distribution transformation using the simulator,
Appendix 9.1, $X_1, X_2 \overset{iid}{\sim} \mathrm{Uniform}(-1, 1)$, $f_{X_i}(x_i) = 0.5$, $-1 < x_i < 1$, $i = 1, 2$,
1.1)X1 marginal probability distribution,

X1 pdf and cdf Coefficient
Variance : 0.33332
S.D. : 0.57734
MAD : 0.49999
Range : 2.00000
Median : -0.00001
Q1 : -0.50002
Q2 : -0.00001
Q3 : 0.50000
IQR : 1.00002
C.V. : none
1.2)X1,X2 joint probability distribution,
The joint pdf The joint cdf
E(X1)= 0.0000, Var(X1)= 0.3333, E(X2)= -0.0000, Var(X2)= 0.3333,

1.3) Y1 = X 1 + X 2 , marginal probability distribution,
Y1 pdf and cdf Coefficient
Variance : 0.66668
S.D. : 0.81650
MAD : 0.66668
Range : 3.99931
Median : 0.00002
Q1 : -0.58580
Q2 : 0.00002
Q3 : 0.58583
IQR : 1.17163
C.V. : none
1.4) Y2 = X 1 × X 2 , marginal probability distribution,

Variance : 0.11112
S.D. : 0.33334
MAD : 0.25001
Range : 1.99957
Median : 0.00000
Q1 : -0.18666
Q2 : 0.00000
Q3 : 0.18672
IQR : 0.37338
C.V. : none
1.5) Y1 = X 1 + X 2 , Y2 = X 1 × X 2 , joint distribution,

Y1,Y2 joint pdf Y1,Y2 joint cdf
E(Y1)=0.0000, Var(Y1)= 0.6667, E(Y2)=0.0000, Var(Y2)=0.1111,

Cov(Y1,Y2)=0.0000, Y1 and Y2 correlation coefficient=0.0000.
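The moments reported for Y1 and Y2 can be checked with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' simulator: the seed, the sample size of 200,000 and the helper names are chosen here for illustration. For independent Uniform(-1, 1) variables, Var(X1 + X2) = 2/3 and Var(X1 × X2) = E(X1^2) E(X2^2) = 1/9.

```python
import random

def mean(values):
    return sum(values) / len(values)

def variance(values):
    """Sample variance with the n - 1 divisor."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def covariance(a, b):
    """Sample covariance with the n - 1 divisor."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

rng = random.Random(9)             # arbitrary seed, for repeatability only
n = 200_000
x1 = [rng.uniform(-1.0, 1.0) for _ in range(n)]
x2 = [rng.uniform(-1.0, 1.0) for _ in range(n)]

y1 = [a + b for a, b in zip(x1, x2)]   # Y1 = X1 + X2
y2 = [a * b for a, b in zip(x1, x2)]   # Y2 = X1 * X2

var_y1, var_y2 = variance(y1), variance(y2)
cov_y1_y2 = covariance(y1, y2)
print("Var(Y1):", var_y1, " Var(Y2):", var_y2, " Cov(Y1,Y2):", cov_y1_y2)
```

The simulated values can be compared with Var(Y1) = 0.6667, Var(Y2) = 0.1111 and Cov(Y1,Y2) = 0.0000 above.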
1.6) $W_2 = \dfrac{X_1 \times X_2}{X_1 + X_2} = \dfrac{1}{1/X_1 + 1/X_2}$, marginal probability distribution. The plots display only the range [-5, 5]; the mathematical mean and variance do not exist.
384
W2 pdf and cdf Coefficient
Variance : 100865.74363
S.D. : 317.59368
MAD : 4.08617
Range : 587070.08862
Mid_range : 1491.26030
Median : -0.00000
Q1 : -0.27023
Q2 : -0.00000
Q3 : 0.27022
IQR : 0.54045
C.V. : none
The second example is the shifted-exponential distribution.

Appendix 9.2. $X_1 \sim \mathrm{Shifted\_exponential}(\lambda_1 = 1, c_1 = 0)$, $X_2 \sim \mathrm{DE}(\lambda_2 = 1, \mu_2 = 0)$ (double exponential),
$X_1$ and $X_2$ are independent random variables,
$f_{X_1}(x_1) = \exp(-x_1)$, $0 < x_1 < \infty$, $f_{X_2}(x_2) = \frac{\exp(-|x_2|)}{2}$, $-\infty < x_2 < \infty$,
X1 pdf and cdf Coefficient
Variance : 0.99993
S.D. : 0.99996
MAD : 0.73574
Range : 18.35513
Mid_range : 9.17757
Median : 0.69311
Q1 : 0.28768
Q2 : 0.69311
Q3 : 1.38628
IQR : 1.09859
C.V. : 0.99999
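For an exponential distribution with rate 1 the quartiles and the mean absolute deviation have closed forms, so the simulated coefficients of X1 can be checked by hand. A short worked check, using only the density $f_{X_1}(x_1) = \exp(-x_1)$ given for this example:

```latex
\[
F_{X_1}(x) = 1 - e^{-x} \;\Rightarrow\; Q_p = -\ln(1 - p),
\]
\[
Q_1 = \ln\tfrac{4}{3} \approx 0.28768,\quad
Q_2 = \ln 2 \approx 0.69315,\quad
Q_3 = \ln 4 \approx 1.38629,\quad
IQR = \ln 3 \approx 1.09861,
\]
\[
E\,|X_1 - 1| = \frac{2}{e} \approx 0.73576 .
\]
```

These values agree with the simulated Q1, Q2, Q3, IQR and MAD to within 0.0001.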

X2 pdf and cdf Coefficient
Variance : 1.99993
S.D. : 1.41419
MAD : 1.00000
Range : 35.63209
Median : 0.00002
Q1 : -0.69317
Q2 : 0.00002
Q3 : 0.69318
IQR : 1.38635
C.V. : none
the joint pdf the joint cdf
E(X1)= 1.0000, Var(X1)=1.0000, E(X2)= -0.0000, Var(X2)= 2.0002,
2.4) $Y_1 = X_1 + X_2$, marginal probability distribution,


Variance : 2.99989
S.D. : 1.73202
MAD : 1.28754
Range : 37.95242
Mid_range : 1.85160
Median : 0.85768
Q1 : 0.00002
Q2 : 0.85768
Q3 : 1.92382
IQR : 1.92380
C.V. : 1.73203
2.5) $Y_2 = X_1 - X_2$, marginal probability distribution,

Y2 pdf and cdf Coefficient
Variance : 3.00005
S.D. : 1.73207
MAD : 1.28755
Range : 38.04744
Mid_range : 1.64281
Median : 0.85769
Q1 : 0.00004
Q2 : 0.85769
Q3 : 1.92385
IQR : 1.92382
C.V. : 1.73209
2.6) $Y_1 = X_1 + X_2$, $Y_2 = X_1 - X_2$, joint probability distribution,
Y1,Y2 joint pdf Y1,Y2 joint cdf
E(Y1)= 1.0000, Var(Y1)= 3.0001, E(Y2)= 1.0001, Var(Y2)= 3.0000,

Cov(Y1,Y2)=-0.9998, Y1 and Y2 correlation coefficient=-0.3333.
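The reported covariance and correlation follow from the independence of X1 and X2. A brief derivation, using the variances reported for this example (Var(X1) = 1, Var(X2) = 2, Var(Y1) = Var(Y2) = 3):

```latex
\[
Cov(Y_1, Y_2) = Cov(X_1 + X_2,\; X_1 - X_2) = Var(X_1) - Var(X_2) = 1 - 2 = -1,
\]
\[
\rho(Y_1, Y_2) = \frac{Cov(Y_1, Y_2)}{\sqrt{Var(Y_1)\,Var(Y_2)}} = \frac{-1}{\sqrt{3 \times 3}} = -\frac{1}{3} \approx -0.3333 .
\]
```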
The third example is the conditional distribution.

Appendix 9.3. $X_1 \sim \mathrm{Arcsin}(0,1)$, $X_2 \mid x_1 \sim \mathrm{Uniform}(-x_1^2, x_1^2)$,
$f_{X_1}(x_1) = \frac{1}{\pi}\frac{1}{\sqrt{1 - x_1^2}}$, $-1 < x_1 < 1$, $f_{X_2 \mid x_1}(x_2 \mid x_1) = \frac{1}{2 x_1^2}$, $|x_2| \le x_1^2$,
$X_1$ and $X_2$ are not independent random variables,
Variance : 0.49999
S.D. : 0.70710
MAD : 0.63661
Range : 2.00000
Median : 0.00006
Q1 : -0.70709
Q2 : 0.00006
Q3 : 0.70710
IQR : 1.41418
C.V. : none
Variance : 0.12501
S.D. : 0.35357
MAD : 0.25002
Range : 1.99996
Mid_range : 0.00000
Median : -0.00000
Q1 : -0.16322
Q2 : -0.00000
Q3 : 0.16321
IQR : 0.32643
C.V. : none

The joint pdf The joint cdf
E(X1)= 0.0000, Var(X1)= 0.5000, E(X2)= 0.0000, Var(X2)= 0.1250,
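The conditional construction of this example can be imitated by drawing X1 first and then X2 given X1. The sketch below is an illustration under stated assumptions, not the authors' simulator: the arcsine variable is generated as the sine of a uniform angle, and the function name, sample size and seed are hypothetical. In theory Var(X1) = 1/2, Var(X2) = E(X1^4)/3 = 1/8 and Var(X1 + X2) = 5/8.

```python
import math
import random


def simulate_arcsine_conditional(n_draws=200_000, seed=2):
    """Draw X1 ~ Arcsin on (-1, 1), then X2 | x1 ~ Uniform(-x1^2, x1^2)."""
    rng = random.Random(seed)
    x1s, x2s = [], []
    for _ in range(n_draws):
        # sin(U) with U ~ Uniform(-pi/2, pi/2) has density 1 / (pi * sqrt(1 - x^2)).
        x1 = math.sin(rng.uniform(-math.pi / 2.0, math.pi / 2.0))
        half_width = x1 * x1
        x2 = rng.uniform(-half_width, half_width)
        x1s.append(x1)
        x2s.append(x2)
    return x1s, x2s


def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


x1_draws, x2_draws = simulate_arcsine_conditional()
var_x1 = variance(x1_draws)
var_x2 = variance(x2_draws)
var_sum = variance([a + b for a, b in zip(x1_draws, x2_draws)])
print(var_x1, var_x2, var_sum)  # theory: 0.5, 0.125, 0.625
```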


Y1 marginal probability distribution, Coefficient
Variance : 0.62507
S.D. : 0.79061
MAD : 0.63667
Range : 3.99995
Median : 0.00013
Q1 : -0.50677
Q2 : 0.00013
Q3 : 0.50675
IQR : 1.01352
C.V. : none
3.5) $Y_2 = X_1 - X_2$, marginal probability distribution,
Variance : 0.62500
S.D. : 0.79057
MAD : 0.63662
Range : 3.99993
Median : 0.00003
Q1 : -0.50672
Q2 : 0.00003
Q3 : 0.50668
IQR : 1.01340
C.V. : none
3.6) $Y_1 = X_1 + X_2$, $Y_2 = X_1 - X_2$, the joint probability distribution,

Y1,Y2 joint pdf Y1,Y2 joint cdf
E(Y1)= 0.0000, Var(Y1)= 0.6251, E(Y2)= 0.0000, Var(Y2)= 0.6250,

The fourth example gives the figures and coefficients of a distribution whose range is restricted.
Appendix 9.4. $X_1, X_2 \overset{iid}{\sim} \mathrm{Uniform}(-1,1)$, $f_{X_i}(x_i) = 0.5$, $-1 < x_i < 1$, $i = 1,2$; the range of the random variables is restricted to $0.1 \le X_1^2 + X_2^2 \le 0.9$,

$P(0.1 \le X_1^2 + X_2^2 \le 0.9) = 0.6282$,
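The probability of the restricted range has a closed form, because the ring lies entirely inside the square [-1,1] x [-1,1] whenever the upper bound is at most 1. The sketch below compares that closed form with a Monte Carlo estimate; the function names, sample size and seed are illustrative assumptions. The reported 0.6282 agrees with an upper bound of 0.9 (the bound printed in item 4.1), since pi x (0.9 - 0.1) / 4 is about 0.6283, whereas an upper bound of 0.8 would give about 0.5498.

```python
import math
import random


def ring_probability_exact(lower, upper):
    """P(lower <= X1^2 + X2^2 <= upper) for X1, X2 iid Uniform(-1, 1), valid for upper <= 1."""
    # Area of the ring divided by the area of the square (which is 4).
    return math.pi * (upper - lower) / 4.0


def ring_probability_mc(lower, upper, n_draws=200_000, seed=1):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        s = rng.uniform(-1.0, 1.0) ** 2 + rng.uniform(-1.0, 1.0) ** 2
        if lower <= s <= upper:
            hits += 1
    return hits / n_draws


exact = ring_probability_exact(0.1, 0.9)
estimate = ring_probability_mc(0.1, 0.9)
print(exact, estimate)
```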
4.1) $X_1$ given $0.1 \le X_1^2 + X_2^2 \le 0.9$, the conditional marginal probability distribution,
X1 conditional pdf and cdf Coefficient
Variance : 0.25000
S.D. : 0.50000
MAD : 0.43618
Range : 1.89735
Mid_range : 0.00000
Median : -0.00013
Q1 : -0.42902
Q2 : -0.00013
Q3 : 0.42902
IQR : 0.85803
C.V. : none
4.2) $X_2$ given $0.1 \le X_1^2 + X_2^2 \le 0.9$, the conditional marginal probability distribution,

X2 conditional pdf and cdf Coefficient
Variance : 0.24998
S.D. : 0.49998
MAD : 0.43618
Range : 1.89735
Median : 0.00017
Q1 : -0.42894
Q2 : 0.00017
Q3 : 0.42918
IQR : 0.85813
C.V. : none
4.3) $X_1, X_2$ given $0.1 \le X_1^2 + X_2^2 \le 0.9$, the conditional joint probability distribution,

The conditional joint pdf The conditional joint cdf
E(X1)= -0.0000, Var(X1)= 0.2500, E(X2)= -0.0000, Var(X2)= 0.2500,

Cov(X1,X2)= -0.0000, X1 and X2 correlation coefficient=-0.0000.
4.4) $Y_1 = X_1 + X_2$ given $0.1 \le X_1^2 + X_2^2 \le 0.9$, the conditional marginal probability distribution,
Y1 conditional pdf and cdf Coefficient
Variance : 0.50001
S.D. : 0.70711
MAD : 0.61686
Range : 2.68326
Median : 0.00000
Q1 : -0.60676
Q2 : 0.00000
Q3 : 0.60677
IQR : 1.21353
C.V. : none
4.5) $Y_2 = X_1 - X_2$ given $0.1 \le X_1^2 + X_2^2 \le 0.9$, the conditional marginal probability distribution,
Y2 conditional pdf and cdf Coefficient
Variance : 0.50000
S.D. : 0.70711
MAD : 0.61687
Range : 2.68326
Mid_range : 0.00000
Median : -0.00012
Q1 : -0.60685
Q2 : -0.00012
Q3 : 0.60671
IQR : 1.21356
C.V. : none
4.6) $Y_1 = X_1 + X_2$, $Y_2 = X_1 - X_2$ given $0.1 \le X_1^2 + X_2^2 \le 0.9$, the conditional joint probability distribution,

Y1,Y2 conditional joint pdf Y1,Y2 conditional joint cdf
E(Y1)= -0.0000, Var(Y1)= 0.5000, E(Y2)= -0.0000, Var(Y2)= 0.5000,

Cov(Y1,Y2)= -0.0000, Y1 and Y2 correlation coefficient=-0.0000.
Of course, random variables can be combined mathematically to form new distributions.
Appendix 9.5. $X_1, X_2, X_3, X_4 \overset{iid}{\sim} \mathrm{Uniform}(\alpha = -1, \beta = 1)$,
$X_1 = r\sin\theta$, $X_2 = r\cos\theta\sin\phi$, $X_3 = r\cos\theta\cos\phi\sin\gamma$, $X_4 = r\cos\theta\cos\phi\cos\gamma$,

$P_1 = R = \sqrt{X_1^2 + X_2^2 + X_3^2 + X_4^2}$, $P_2 = \theta = \tan^{-1}\left(\frac{X_1}{X_2} \times \sin\phi\right)$,
$P_3 = \phi = \tan^{-1}\left(\frac{X_2}{X_3} \times \sin\gamma\right)$, $P_4 = \gamma = \tan^{-1}\left(\frac{X_3}{X_4}\right)$,
5.1)
$f_{P_1}(p_1)$ Coefficient
Variance : 0.07466
S.D. : 0.27325
MAD : 0.21831
Range : 1.97337
Mid_range : 1.00441
Median : 1.13923
Q1 : 0.94891
Q2 : 1.13923
Q3 : 1.31634
IQR : 0.36744
C.V. : 0.24356
5.2)
$f_{P_2}(p_2)$ Coefficient
Variance : 0.30483
S.D. : 0.55212
MAD : 0.47791
Range : 3.13068
Median : -0.00001
Q1 : -0.47989
Q2 : -0.00001
Q3 : 0.48001
IQR : 0.95990
C.V. : none
$f_{P_3}(p_3)$ Coefficient
Variance : 0.44242
S.D. : 0.66515
MAD : 0.57749
Range : 3.14087
Median : -0.00002
Q1 : -0.57870
Q2 : -0.00002
Q3 : 0.57875
IQR : 1.15746
C.V. : none
$f_{P_4}(p_4)$ Coefficient
Variance : 0.78978
S.D. : 0.88869
MAD : 0.78547
Range : 3.14159
Median : -0.00057
Q1 : -0.78551
Q2 : -0.00002
Q3 : 0.78537
IQR : 1.57088
C.V. : none
$f_{P_1,P_2}(p_1, p_2)$  $F_{P_1,P_2}(p_1, p_2)$
E(P1)= 1.1219, Var(P1)= 0.0747, E(P2)= 0.0000, Var(P2)= 0.3049,

Cov(P1,P2)= 0.0000, P1 and P2 correlation coefficient =0.0000.
$f_{P_1,P_3}(p_1, p_3)$  $F_{P_1,P_3}(p_1, p_3)$
E(P1)= 1.1219, Var(P1)= 0.0747, E(P3)= 0.0000, Var(P3)= 0.4425,

Cov(P1,P3)= -0.0000, P1 and P3 correlation coefficient=-0.0003.
$f_{P_1,P_4}(p_1, p_4)$  $F_{P_1,P_4}(p_1, p_4)$
E(P1)= 1.1219, Var(P1)= 0.0747, E(P4)= -0.0001, Var(P4)= 0.7897,

Cov(P1,P4)= -0.0000, P1 and P4 correlation coefficient =-0.0001.
$f_{P_2,P_3}(p_2, p_3)$  $F_{P_2,P_3}(p_2, p_3)$
E(P2)= -0.0001, Var(P2)= 0.3049, E(P3)= 0.0001, Var(P3)= 0.4425,

Cov(P2,P3)= -0.0000, P2 and P3 correlation coefficient =-0.0001.
$f_{P_2,P_4}(p_2, p_4)$  $F_{P_2,P_4}(p_2, p_4)$
E(P2)= 0.0000, Var(P2)= 0.3048, E(P4)= 0.0001, Var(P4)= 0.7896,

$f_{P_3,P_4}(p_3, p_4)$  $F_{P_3,P_4}(p_3, p_4)$
E(P3)= -0.0000, Var(P3)= 0.4424, E(P4)= -0.0000, Var(P4)= 0.7895,
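The transformation of appendix 9.5 from (X1, X2, X3, X4) to (R, theta, phi, gamma) can be written directly from its formulas: gamma is computed first, then phi, then theta. The following sketch is an illustration only; the function names are hypothetical, and the inverse tangent is taken as the principal value in (-pi/2, pi/2), which is an assumption consistent with the reported ranges of P2 to P4 not exceeding pi.

```python
import math


def to_hyperspherical(x1, x2, x3, x4):
    """Apply P1..P4 of appendix 9.5; x2, x3 and x4 must be non-zero."""
    r = math.sqrt(x1 * x1 + x2 * x2 + x3 * x3 + x4 * x4)
    gamma = math.atan(x3 / x4)
    phi = math.atan(x2 / x3 * math.sin(gamma))
    theta = math.atan(x1 / x2 * math.sin(phi))
    return r, theta, phi, gamma


def from_hyperspherical(r, theta, phi, gamma):
    """Inverse mapping, as given by X1..X4 in appendix 9.5."""
    x1 = r * math.sin(theta)
    x2 = r * math.cos(theta) * math.sin(phi)
    x3 = r * math.cos(theta) * math.cos(phi) * math.sin(gamma)
    x4 = r * math.cos(theta) * math.cos(phi) * math.cos(gamma)
    return x1, x2, x3, x4


# Round trip for angles inside (-pi/2, pi/2).
original = (1.2, 0.3, -0.4, 0.5)
point = from_hyperspherical(*original)
recovered = to_hyperspherical(*point)
print(recovered)
```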

Appendix 9.6. $X_i \sim \mathrm{Normal}(\mu_i = i, \sigma_i^2 = 2^2)$, $i = 1,2,\ldots,10$; $X_1,\ldots,X_{10}$ are independent random variables, and let $W_1 = MAD = \frac{\sum_{i=1}^{10} |X_i - \bar{X}|}{10}$, $W_2 = S = \sqrt{\frac{\sum_{i=1}^{10} (X_i - \bar{X})^2}{9}}$.
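The two statistics of appendix 9.6 are simple to compute for one sample, and repeating the computation over many simulated samples gives their sampling distributions. The sketch below is a minimal illustration, not the authors' simulator; the function names, the number of replications and the seed are assumptions.

```python
import math
import random
import statistics


def mad_and_s(sample):
    """Mean absolute deviation (divisor n) and sample standard deviation (divisor n - 1)."""
    n = len(sample)
    mean = sum(sample) / n
    mad = sum(abs(x - mean) for x in sample) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    return mad, s


def simulate_w1_w2(n_rep=20_000, seed=3):
    """Simulate (W1, W2) for X_i ~ Normal(mean i, standard deviation 2), i = 1..10."""
    rng = random.Random(seed)
    return [mad_and_s([rng.gauss(i, 2.0) for i in range(1, 11)]) for _ in range(n_rep)]


draws = simulate_w1_w2()
median_w1 = statistics.median(w1 for w1, _ in draws)
median_w2 = statistics.median(w2 for _, w2 in draws)
print(median_w1, median_w2)
```

Because the ten means differ, both statistics measure the spread of the means as well as the common standard deviation 2.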
$f_{W_1}(w_1)$ Coefficient
Variance : 0.28346
S.D. : 0.53241
MAD : 0.42532
Range : 5.78206
Mid_range : 3.32641
Median : 2.83962
Q1 : 2.48456
Q2 : 2.83962
Q3 : 3.20518
IQR : 0.72062
C.V. : 0.18672
$f_{W_2}(w_2)$ Coefficient
Variance : 0.37877
S.D. : 0.61544
MAD : 0.49160
Range : 6.62652
Mid_range : 3.92894
Median : 3.57031
Q1 : 3.15653
Q2 : 3.57031
Q3 : 3.98912
IQR : 0.83258
C.V. : 0.17210
Appendix 10. One way analysis when the error
distribution is arcsin
One-way analysis: the sampling distributions of the test statistics when the error distribution is the arcsine distribution.
$X_{ij} = \mu + \alpha_i + \varepsilon_{ij}$, $i = 1,2,\ldots,k$, $j = 1,2,\ldots,n$,
$\varepsilon_{ij} \overset{iid}{\sim} \mathrm{Arcsin}(\mu = 0, c = 1)$,
$E(\varepsilon_{ij}) = 0$, $Var(\varepsilon_{ij}) = 1$, $\mu = 10$, $\alpha_1 = \alpha_2 = \cdots = \alpha_k = 0$,
Appendix 10.1) k=5, n=5,

(1) $W_1 = \frac{SST}{\sigma^2} = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$, degree of freedom = 24,
$f_{W_1}(w_1)$  $F_{W_1}(w_1)$  Coefficient

S.D. : 13.65246
MAD : 10.00737
Range : 389.32250
Mid_range : 195.61430
Median : 20.93029
Q1 : 14.62201
Q2 : 20.93029
Q3 : 29.84313
IQR : 15.22112
C.V. : 0.56883
Because $Var(W_1) = 186.38973 \ne 2 \times E(W_1) = 2 \times 24.00091$, $\frac{SST}{\sigma^2}$ is not chi-square distributed,
$E\left[\left(\frac{w_1 - E(W_1)}{\sqrt{Var(W_1)}} - Z\left(\frac{w_1 - E(W_1)}{\sqrt{Var(W_1)}}\right)\right)^2\right] = 0.1375511730$, $Z \sim N(0,1)$,
$E\left[\left(F_{W_1}\left(\frac{w_1 - E(W_1)}{\sqrt{Var(W_1)}}\right) - \Phi\left(\frac{w_1 - E(W_1)}{\sqrt{Var(W_1)}}\right)\right)^2\right] = 0.0044384256$,
$P\left\{\left|F_{W_1}\left(\frac{w_1 - E(W_1)}{\sqrt{Var(W_1)}}\right) - \Phi\left(\frac{w_1 - E(W_1)}{\sqrt{Var(W_1)}}\right)\right| \ge \varepsilon\right\}$,
ε probability ε probability
0.1000 0.106313 0.0010 0.991236
0.0500 0.591668 0.0005 0.995614
0.0100 0.911350 0.0001 0.999113
0.0050 0.956030
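The table above summarises how far the cdf of the standardized statistic lies from the standard normal cdf. One possible reading of these quantities is sketched below: the empirical cdf of the standardized simulated values is compared with Phi at every simulated point, giving a mean squared difference and, for each epsilon, the share of points where the absolute difference is at least epsilon. This reading, the function names and the test data are assumptions; the authors' simulator may define the measures differently.

```python
import math
import random


def standard_normal_cdf(z):
    """Phi(z), computed from the error function in the standard library."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def cdf_discrepancy(values, eps_list):
    """Compare the empirical cdf of standardized values with Phi.

    Returns the mean squared difference and, for each eps, the share of
    sample points at which |F_n - Phi| >= eps.
    """
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    z_sorted = sorted((v - mean) / sd for v in values)
    diffs = [abs((i + 1) / n - standard_normal_cdf(z)) for i, z in enumerate(z_sorted)]
    mean_sq = sum(d * d for d in diffs) / n
    exceed = {eps: sum(1 for d in diffs if d >= eps) / n for eps in eps_list}
    return mean_sq, exceed


rng = random.Random(5)
eps_list = [0.1, 0.05, 0.01, 0.005, 0.001]
skewed = [rng.expovariate(1.0) for _ in range(20_000)]  # clearly non-normal sample
normal = [rng.gauss(0.0, 1.0) for _ in range(20_000)]   # reference sample
msq_skewed, exceed_skewed = cdf_discrepancy(skewed, eps_list)
msq_normal, exceed_normal = cdf_discrepancy(normal, eps_list)
print(msq_skewed, msq_normal)
```

Under this reading, a statistic that is close to normal gives small shares even for small epsilon, while a skewed statistic gives shares near 1 for all but the largest epsilon.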
$\frac{W_1 - E(W_1)}{\sqrt{Var(W_1)}}$ does not approach the standard normal distribution,
(2) $W_2 = \frac{SSTr}{\sigma^2}$, degree of freedom = 4,
Variance : 11.84350
S.D. : 3.44144
MAD : 2.44357
Range : 101.20935
Median : 3.06624
Q1 : 1.68786
Q2 : 3.06624
Q3 : 5.23325
IQR : 3.54539
C.V. : 0.86028
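Because $Var(W_2) = 11.84350 \ne 2 \times E(W_2) = 2 \times 4.00039$, $\frac{SSTr}{\sigma^2}$ is not chi-square distributed,
$E\left[\left(W_2 - \chi^2_4(w_2)\right)^2\right] = 0.6119647639$, $E\left[\left(F_{W_2}(w_2) - \chi^2_4\,df(w_2)\right)^2\right] = 0.0011065152$,
$P\left\{\left|F_{W_2}(w_2) - \chi^2_4\,df(w_2)\right| \ge \varepsilon\right\}$,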

0.1000 0.000000 0.0010 0.985562
0.0500 0.000000 0.0005 0.992997
0.0100 0.845733 0.0001 0.998608
0.0050 0.925951
$W_2$ does not approach $\chi^2_4$, the chi-square distribution with df = 4,
(3) $W_3 = \frac{SSE}{\sigma^2} = \sum_{i=1}^{n} (\hat{\varepsilon}_i)^2$, degree of freedom = 20,

S.D. : 11.66597
MAD : 8.55045
Range : 337.64671
Mid_range : 169.33018
Median : 17.34728
Q1 : 11.97952
Q2 : 17.34728
Q3 : 24.97483
IQR : 12.99531
C.V. : 0.58328
Because $Var(W_3) = 136.09474 \ne 2 \times E(W_3) = 2 \times 20.00053$, $\frac{SSE}{\sigma^2}$ is not chi-square distributed,
$E\left[\left(\frac{w_3 - E(W_3)}{\sqrt{Var(W_3)}} - Z\left(\frac{w_3 - E(W_3)}{\sqrt{Var(W_3)}}\right)\right)^2\right] = 0.1391742868$, $Z \sim N(0,1)$,
$E\left[\left(F_{W_3}\left(\frac{w_3 - E(W_3)}{\sqrt{Var(W_3)}}\right) - \Phi\left(\frac{w_3 - E(W_3)}{\sqrt{Var(W_3)}}\right)\right)^2\right] = 0.0045366405$,
$P\left\{\left|F_{W_3}\left(\frac{w_3 - E(W_3)}{\sqrt{Var(W_3)}}\right) - \Phi\left(\frac{w_3 - E(W_3)}{\sqrt{Var(W_3)}}\right)\right| \ge \varepsilon\right\}$,
0.1000 0.123440 0.0010 0.991354
0.0500 0.595534 0.0005 0.995678
0.0100 0.912089 0.0001 0.999136
0.0050 0.956375
$\frac{W_3 - E(W_3)}{\sqrt{Var(W_3)}}$ does not approach the standard normal distribution,
Right-tail probability
0.995 0.99 0.975 0.95 0.9
W3 3.997720 4.643964 5.767607 6.925535 8.524510

0.1 0.05 0.025 0.01 0.005
W3 34.569056 41.978727 49.692250 60.495766 69.255721
(4) $W_4 = MSTr / MSE = F$,

Variance : 0.78637
S.D. : 0.88678
MAD : 0.61777
Range : 41.31589
Median : 0.88171
Q1 : 0.52013
Q2 : 0.88171
Q3 : 1.41769
IQR : 0.89756
C.V. : 0.80573
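$E\left[\left(W_4 - F(4,20)(w_4)\right)^2\right] = 0.0053817782$, $E\left[\left(F_{W_4}(w_4) - F(4,20)\,df(w_4)\right)^2\right] = 0.0003085445$,
$P\left\{\left|F_{W_4}(w_4) - F(4,20)\,df(w_4)\right| \ge \varepsilon\right\}$,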
0.1000 0.000000 0.0010 0.968536
0.0500 0.000000 0.0005 0.982002
0.0100 0.707318 0.0001 0.996260
0.0050 0.859357
(5) $W_4 = MSTr / MSE$ does not approach the $F_{4,20}$ distribution,
0.995 0.99 0.975 0.95 0.9
W4 0.060973 0.086961 0.140243 0.203086 0.298185

0.1 0.05 0.025 0.01 0.005
W4 2.132181 2.717875 3.360400 4.318186 5.145067
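(6) $W_5 = \frac{\bar{X}_{1\bullet} - \bar{X}_{2\bullet} - (\alpha_1 - \alpha_2)}{S(\bar{X}_{1\bullet} - \bar{X}_{2\bullet})} = \frac{\bar{X}_{1\bullet} - \bar{X}_{2\bullet}}{S(\bar{X}_{1\bullet} - \bar{X}_{2\bullet})}$, $\alpha_1 = 0$, $\alpha_2 = 0$,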
Variance : 1.10044
S.D. : 1.04902
MAD : 0.82626
Range : 20.75598
Median : 0.00025
Q1 : -0.68667
Q2 : 0.00025
Q3 : 0.68647
IQR : 1.37314
C.V. : none
$E\left[\left(w_5 - t_{20}(w_5)\right)^2\right] = 0.0000752963$, $E\left[\left(F_{W_5}(w_5) - t_{20}\,df(w_5)\right)^2\right] = 0.0000002301$,
$P\left\{\left|F_{W_5}(w_5) - t_{20}\,df(w_5)\right| \ge \varepsilon\right\}$,
0.1000 0.000000 0.0010 0.103650
0.0500 0.000000 0.0005 0.222103
0.0100 0.000000 0.0001 0.875660
0.0050 0.000000
$W_5$ approaches the $t_{20}$ distribution,
0.995 0.99 0.975 0.95 0.9
W5 -2.819497 -2.500928 -2.065375 -1.712420 -1.321295
t 20 -2.845336 -2.527554 -2.085834 -1.724817 -1.325341

0.1 0.05 0.025 0.01 0.005
W5 1.321091 1.711830 2.064919 2.500544 2.819458
t 20 1.325341 1.724817 2.085834 2.527554 2.845336
(7) W6 = Bartlett’s test statistic,
Variance : 30.24505
S.D. : 5.49955
MAD : 4.27492
Range : 67.01933
Median : 7.34715
Q1 : 4.35582
Q2 : 7.34715
Q3 : 11.35473
IQR : 6.99891
C.V. : 0.65104
Because $Var(W_6) = 30.24505 \ne 2 \times E(W_6) = 2 \times 8.44731$, $W_6$ is not chi-square distributed.
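The check used repeatedly in this appendix rests on a property of the chi-square distribution: with ν degrees of freedom its mean is ν and its variance is 2ν, so the ratio Var(W) / (2 E(W)) should be close to 1. A minimal sketch applying it to the means and variances reported in appendix 10.1; the function name `chi_square_variance_ratio` is an illustrative assumption.

```python
def chi_square_variance_ratio(mean, variance):
    """Return Var / (2 * E); the value is 1 for an exact chi-square variable."""
    return variance / (2.0 * mean)


# (E(W), Var(W)) pairs as reported for k = 5, n = 5.
reported = {
    "W1 = SST/sigma^2": (24.00091, 186.38973),
    "W2 = SSTr/sigma^2": (4.00039, 11.84350),
    "W3 = SSE/sigma^2": (20.00053, 136.09474),
    "W6 = Bartlett": (8.44731, 30.24505),
}
ratios = {name: chi_square_variance_ratio(m, v) for name, (m, v) in reported.items()}
for name, ratio in ratios.items():
    print(f"{name}: Var / (2 E) = {ratio:.3f}")
```

A ratio far from 1 rules out the chi-square distribution with those moments; a ratio near 1 is necessary but not sufficient, which is why the cdf comparisons are reported as well.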
$E\left[\left(W_6 - \chi^2_4(w_6)\right)^2\right] = 26.9458736711$, $E\left[\left(F_{W_6}(w_6) - \chi^2_4\,df(w_6)\right)^2\right] = 0.0896172970$,
$P\left\{\left|F_{W_6}(w_6) - \chi^2_4\,df(w_6)\right| \ge \varepsilon\right\}$,
0.1000 0.867559 0.0010 0.998790
0.0500 0.936260 0.0005 0.999394
0.0100 0.987705 0.0001 0.999879
0.0050 0.993895
$W_6$ does not approach $\chi^2_4$, the chi-square distribution with df = 4,
0.995 0.99 0.975 0.95 0.9
W6 0.494955 0.707972 1.147801 1.670573 2.469927

0.1 0.05 0.025 0.01 0.005
W6 15.869184 19.000901 21.967923 25.728415 28.469250
(8) $W_7 = Max(S_1^2, S_2^2, \ldots, S_k^2) / SSE$
Variance : 0.00113
S.D. : 0.03368
MAD : 0.02727
Range : 0.19670
Mid_range : 0.14891
Median : 0.11481
Q1 : 0.09497
Q2 : 0.11481
Q3 : 0.14171
IQR : 0.04673
C.V. : 0.27891
0.995 0.99 0.975 0.95 0.9
W7 0.063828 0.0664709 0.070985 0.075578 0.081785

0.1 0.05 0.025 0.01 0.005
W7 0.169536 0.185570 0.198158 0.210713 0.217910
(9) $W_8$ = Levene's test statistic,

Variance : 4.98559
S.D. : 2.23284
MAD : 1.42844
Range : 64.06654
Median : 2.04625
Q1 : 1.26930
Q2 : 2.04625
Q3 : 3.14623
IQR : 1.87693
C.V. : 0.86494
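$E\left[\left(W_8 - F(4,20)(w_8)\right)^2\right] = 3.9392454231$, $E\left[\left(F_{W_8}(w_8) - F(4,20)\,df(w_8)\right)^2\right] = 0.0999608830$,
$P\left\{\left|F_{W_8}(w_8) - F(4,20)\,df(w_8)\right| \ge \varepsilon\right\}$,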
0.1000 0.876347 0.0010 0.998894
0.0500 0.941644 0.0005 0.999449
0.0100 0.988829 0.0001 0.999890
0.0050 0.994446
$W_8$ does not approach the $F_{4,20}$ distribution,
0.995 0.99 0.975 0.95 0.9
W8 0.160785 0.228220 0.364881 0.522319 0.754528

0.1 0.05 0.025 0.01 0.005
W8 4.798365 6.473119 8.443361 11.495912 14.126156
(10) $W_9$ = Brown-Forsythe test statistic
Variance : 0.36970
S.D. : 0.60803
MAD : 0.43846
Range : 11.64236
Mid_range : 5.82128
Median : 0.71054
Q1 : 0.44224
Q2 : 0.71054
Q3 : 1.09340
IQR : 0.65116
C.V. : 0.71394

0.995 0.99 0.975 0.95 0.9
W9 0.057296 0.081223 0.129220 0.184337 0.265030
0.1 0.05 0.025 0.01 0.005
W9 1.592510 1.992754 2.422282 3.035811 3.530783
(11) $W_{10}$ = Hartley test statistic

Variance : 16957.09140
S.D. : 130.21940
MAD : 29.84775
Range : 282993.23752
Mid_range : 141497.65336
Median : 13.86364
Q1 : 7.17340
Q2 : 13.86364
Q3 : 29.32322
IQR : 22.14982
C.V. : 4.12515

0.995 0.99 0.975 0.95 0.9
W10 1.894661 2.151353 2.664436 3.282465 4.293196

0.1 0.05 0.025 0.01 0.005
W10 62.374554 101.983327 160.620958 281.776053 423.61724
(12) W11 = Cochran test statistic
Variance : 0.01814
S.D. : 0.13470
MAD : 0.10908
Range : 0.78679
Mid_range : 0.59564
Median : 0.45922
Q1 : 0.37988
Q2 : 0.45922
Q3 : 0.56682
IQR : 0.18694
C.V. : 0.27891

0.995 0.99 0.975 0.95 0.9
W11 0.255312 0.265884 0.283939 0.302312 0.327139

0.1 0.05 0.025 0.01 0.005
W11 0.678142 0.742279 0.792634 0.842851 0.871640
Appendix 10.2) k=5, n=100,
(1) $W_1 = \frac{SST}{\sigma^2} = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$, degree of freedom = 499,
Variance : 3988.28806
S.D. : 63.15289
MAD : 50.02391
Range : 747.34469
Mid_range : 641.32814
Median : 494.72660
Q1 : 454.77224
Q2 : 494.72660
Q3 : 538.51405
IQR : 83.74180
C.V. : 0.12656
Because $Var(W_1) = 3988.28806 \ne 2 \times E(W_1) = 2 \times 499.01505$, $\frac{SST}{\sigma^2}$ is not chi-square distributed,
$E\left[\left(\frac{w_1 - E(W_1)}{\sqrt{Var(W_1)}} - Z\left(\frac{w_1 - E(W_1)}{\sqrt{Var(W_1)}}\right)\right)^2\right] = 0.0097830505$, $Z \sim N(0,1)$,
$E\left[\left(F_{W_1}\left(\frac{w_1 - E(W_1)}{\sqrt{Var(W_1)}}\right) - \Phi\left(\frac{w_1 - E(W_1)}{\sqrt{Var(W_1)}}\right)\right)^2\right] = 0.0002900136$,
$P\left\{\left|F_{W_1}\left(\frac{w_1 - E(W_1)}{\sqrt{Var(W_1)}}\right) - \Phi\left(\frac{w_1 - E(W_1)}{\sqrt{Var(W_1)}}\right)\right| \ge \varepsilon\right\}$,
0.1000 0.000000 0.0010 0.969510
0.0500 0.000000 0.0005 0.984718
0.0100 0.643468 0.0001 0.996870
0.0050 0.842283
$\frac{W_1 - E(W_1)}{\sqrt{Var(W_1)}}$ does not approach the standard normal distribution,
(2) $W_2 = \frac{SSTr}{\sigma^2}$, degree of freedom = 4,
Variance : 8.19720
S.D. : 2.86308
MAD : 2.18052
Range : 51.46479
Median : 3.34000
Q1 : 1.90970
Q2 : 3.34000
Q3 : 5.37507
IQR : 3.46536
C.V. : 0.71573
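Because $Var(W_2) = 8.19720 \ne 2 \times E(W_2) = 2 \times 4.00021$, $\frac{SSTr}{\sigma^2}$ is not chi-square distributed.
$E\left[\left(W_2 - \chi^2_4(w_2)\right)^2\right] = 0.0024677135$, $E\left[\left(F_{W_2}(w_2) - \chi^2_4\,df(w_2)\right)^2\right] = 0.0000036315$,
$P\left\{\left|F_{W_2}(w_2) - \chi^2_4\,df(w_2)\right| \ge \varepsilon\right\}$,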
0.1000 0.000000 0.0010 0.718710
0.0500 0.000000 0.0005 0.863821
0.0100 0.000000 0.0001 0.978192
0.0050 0.000000
$W_2$ approaches $\chi^2_4$, the chi-square distribution with df = 4,
(3) $W_3 = \frac{SSE}{\sigma^2} = \sum_{i=1}^{n} (\hat{\varepsilon}_i)^2$, degree of freedom = 495,

Variance : 3932.49401
S.D. : 62.70960
MAD : 49.67261
Range : 744.47756
Mid_range : 638.10249
Median : 490.75736
Q1 : 451.07856
Q2 : 490.75736
Q3 : 534.23868
IQR : 83.16013
C.V. : 0.12668
Because $Var(W_3) = 3932.49401 \ne 2 \times E(W_3) = 2 \times 495.01484$, $\frac{SSE}{\sigma^2}$ is not chi-square distributed.
$E\left[\left(\frac{w_3 - E(W_3)}{\sqrt{Var(W_3)}} - Z\left(\frac{w_3 - E(W_3)}{\sqrt{Var(W_3)}}\right)\right)^2\right] = 0.0098191285$, $Z \sim N(0,1)$,
$E\left[\left(F_{W_3}\left(\frac{w_3 - E(W_3)}{\sqrt{Var(W_3)}}\right) - \Phi\left(\frac{w_3 - E(W_3)}{\sqrt{Var(W_3)}}\right)\right)^2\right] = 0.0002932969$,
$P\left\{\left|F_{W_3}\left(\frac{w_3 - E(W_3)}{\sqrt{Var(W_3)}}\right) - \Phi\left(\frac{w_3 - E(W_3)}{\sqrt{Var(W_3)}}\right)\right| \ge \varepsilon\right\}$,
0.1000 0.000000 0.0010 0.969578
0.0500 0.000000 0.0005 0.984778
0.0100 0.643253 0.0001 0.996895
0.0050 0.842916
$\frac{W_3 - E(W_3)}{\sqrt{Var(W_3)}}$ does not approach the standard normal distribution,

0.995 0.99 0.975 0.95 0.9
W3 356.224373 367.426716 384.481196 399.803865 418.205491

0.1 0.05 0.025 0.01 0.005
W3 577.127094 604.779487 629.975965 660.970847 683.208794
(4) $W_4 = MSTr / MSE = F$,

Variance : 0.50476
S.D. : 0.71046
MAD : 0.54302
Range : 11.61955
Mid_range : 5.80983
Median : 0.84266
Q1 : 0.48318
Q2 : 0.84266
Q3 : 1.35048
IQR : 0.86730
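C.V. : 0.70762
$E\left[\left(W_4 - F(4,495)(w_4)\right)^2\right] = 0.0000191233$, $E\left[\left(F_{W_4}(w_4) - F(4,495)\,df(w_4)\right)^2\right] = 0.0000015378$,
$P\left\{\left|F_{W_4}(w_4) - F(4,495)\,df(w_4)\right| \ge \varepsilon\right\}$,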
0.1000 0.000000 0.0010 0.506144
0.0500 0.000000 0.0005 0.807838
0.0100 0.000000 0.0001 0.958244
0.0050 0.000000
$W_4 = MSTr / MSE$ approaches the $F_{4,495}$ distribution,
0.995 0.99 0.975 0.95 0.9
W4 0.052135 0.074801 0.121940 0.178936 0.267685

0.1 0.05 0.025 0.01 0.005
W4 1.950445 2.379906 2.797251 3.338226 3.743513
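(5) $W_5 = \frac{\bar{X}_{1\bullet} - \bar{X}_{2\bullet} - (\alpha_1 - \alpha_2)}{S(\bar{X}_{1\bullet} - \bar{X}_{2\bullet})} = \frac{\bar{X}_{1\bullet} - \bar{X}_{2\bullet}}{S(\bar{X}_{1\bullet} - \bar{X}_{2\bullet})}$, $\alpha_1 = 0$, $\alpha_2 = 0$,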
Variance : 1.00405
S.D. : 1.00202
MAD : 0.79928
Range : 10.85770
Mid_range : 0.20268
Median : 0.00022
Q1 : -0.67527
Q2 : 0.00022
Q3 : 0.67554
IQR : 1.35081
C.V. : none
$E\left[\left(w_5 - t_{495}(w_5)\right)^2\right] = 0.0000005357$, $E\left[\left(F_{W_5}(w_5) - t_{495}\,df(w_5)\right)^2\right] = 0.0000000231$,
$P\left\{\left|F_{W_5}(w_5) - t_{495}\,df(w_5)\right| \ge \varepsilon\right\}$,
0.1000 0.000000 0.0010 0.000000
0.0500 0.000000 0.0005 0.000000
0.0100 0.000000 0.0001 0.611416
0.0050 0.000000
$W_5$ follows the $t_{495}$ distribution,
0.995 0.99 0.975 0.95 0.9
W5 -2.582447 -2.332560 -1.964141 -1.647945 -1.283977
t 495 -2.585516 -2.334550 -1.965193 -1.647786 -1.283195

0.1 0.05 0.025 0.01 0.005
W5 1.283846 1.647997 1.964434 2.332570 2.583137
t 495 1.283195 1.647786 1.965193 2.334550 2.585516
S.D. : 10.75054
MAD : 8.14539
Range : 205.06390
Mid_range : 102.53231
Median : 12.45426
Q1 : 7.13221
Q2 : 12.45426
Q3 : 20.02521
IQR : 12.89300
C.V. : 0.71985
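Because $Var(W_6) = 115.57412 \ne 2 \times E(W_6) = 2 \times 14.93448$, $W_6$ is not chi-square distributed,
$E\left[\left(W_6 - \chi^2_4(w_6)\right)^2\right] = 182.3594577579$, $E\left[\left(F_{W_6}(w_6) - \chi^2_4\,df(w_6)\right)^2\right] = 0.1843295012$,
$P\left\{\left|F_{W_6}(w_6) - \chi^2_4\,df(w_6)\right| \ge \varepsilon\right\}$,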
0.1000 0.889506 0.0010 0.998921
0.0500 0.945236 0.0005 0.999461
0.0100 0.989155 0.0001 0.999891
0.0050 0.994588
$W_6$ does not approach $\chi^2_4$, the chi-square distribution with df = 4,
0.995 0.99 0.975 0.95 0.9
W6 0.767567 1.102412 1.796724 2.637296 3.946318

0.1 0.05 0.025 0.01 0.005
W6 29.071280 35.643976 42.153792 50.767209 57.383848
(7) $W_7 = Max(S_1^2, S_2^2, \ldots, S_k^2) / SSE$
Variance : 0.00000
S.D. : 0.00035
MAD : 0.00027
Range : 0.00425
Mid_range : 0.00415
Median : 0.00266
Q1 : 0.00247
Q2 : 0.00266
Q3 : 0.00291
IQR : 0.00043
C.V. : none
0.995 0.99 0.975 0.95 0.9
W7 0.0021572 0.0021850 0.0022325 0.0022802 0.0023438
0.1 0.05 0.025 0.01 0.005
W7 0.0031876 0.0033821 0.0035672 0.0038019 0.0039716
Variance : 1.50253
S.D. : 1.22578
MAD : 0.93505
Range : 23.22863
Median : 1.45064
Q1 : 0.83284
Q2 : 1.45064
Q3 : 2.32357
IQR : 1.49074
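C.V. : 0.70886
$E\left[\left(W_8 - F(4,495)(w_8)\right)^2\right] = 0.7874818145$, $E\left[\left(F_{W_8}(w_8) - F(4,495)\,df(w_8)\right)^2\right] = 0.0450906919$,
$P\left\{\left|F_{W_8}(w_8) - F(4,495)\,df(w_8)\right| \ge \varepsilon\right\}$,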
0.1000 0.820088 0.0010 0.998496
0.0500 0.916875 0.0005 0.999251
0.0100 0.984502 0.0001 0.999850
0.0050 0.992365
0.995 0.99 0.975 0.95 0.9
W8 0.089959 0.129173 0.210553 0.308938 0.461700
0.1 0.05 0.025 0.01 0.005
W8 3.356857 4.098499 4.823204 5.770799 6.483476
Variance : 0.50162
S.D. : 0.70825
MAD : 0.54079
Range : 15.30359
Mid_range : 7.65191
Median : 0.83982
Q1 : 0.48203
Q2 : 0.83982
Q3 : 1.34521
IQR : 0.86318
C.V. : 0.70784
0.995 0.99 0.975 0.95 0.9
W9 0.052165 0.074807 0.121894 0.178716 0.267168
0.1 0.05 0.025 0.01 0.005
W9 1.942176 2.370283 2.788086 3.331875 3.741441

Variance : 0.25879
S.D. : 0.50871
MAD : 0.38638
Range : 13.03125
Mid_range : 7.51875
Median : 1.86118
Q1 : 1.59605
Q2 : 1.86118
Q3 : 2.21014
IQR : 0.61409
C.V. : 0.25989
0.995 0.99 0.975 0.95 0.9
W10 1.164465 1.200202 1.262648 1.262647 1.398892
0.1 0.05 0.025 0.01 0.005
W10 2.616746 2.912915 3.208445 3.605771 3.913562
Variance : 0.00118
S.D. : 0.03429
MAD : 0.02655
Range : 0.42061
Mid_range : 0.41064
Median : 0.26363
Q1 : 0.24497
Q2 : 0.26363
Q3 : 0.28796
IQR : 0.04300
C.V. : 0.12711

0.995 0.99 0.975 0.95 0.9
W11 0.213566 0.216314 0.221022 0.225741 0.230942
0.1 0.05 0.025 0.01 0.005
W11 0.315574 0.334829 0.353158 0.376390 0.393185
Appendix 10.3) k=5, n=1000,
(1) $W_1 = \frac{SST}{\sigma^2} = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$, degree of freedom = 4999,

Variance : 40023.10429
S.D. : 200.05775
MAD : 159.47399
Range : 2021.80635
Mid_range : 5059.38045
Median : 4994.77446
Q1 : 4862.12449
Q2 : 4994.77446
Q3 : 5131.44039
IQR : 269.31590
C.V. : 0.04002
Because $Var(W_1) = 40023.10429 \ne 2 \times E(W_1) = 2 \times 4999.1498$, $\frac{SST}{\sigma^2}$ is not chi-square distributed,
$E\left[\left(\frac{w_1 - E(W_1)}{\sqrt{Var(W_1)}} - Z\left(\frac{w_1 - E(W_1)}{\sqrt{Var(W_1)}}\right)\right)^2\right] = 0.0009373094$, $Z \sim N(0,1)$,
$E\left[\left(F_{W_1}\left(\frac{w_1 - E(W_1)}{\sqrt{Var(W_1)}}\right) - \Phi\left(\frac{w_1 - E(W_1)}{\sqrt{Var(W_1)}}\right)\right)^2\right] = 0.0000288884$,
$P\left\{\left|F_{W_1}\left(\frac{w_1 - E(W_1)}{\sqrt{Var(W_1)}}\right) - \Phi\left(\frac{w_1 - E(W_1)}{\sqrt{Var(W_1)}}\right)\right| \ge \varepsilon\right\}$,
0.1000 0.000000 0.0010 0.900086
0.0500 0.000000 0.0005 0.951076
0.0100 0.000000 0.0001 0.990864
0.0050 0.431664
$\frac{W_1 - E(W_1)}{\sqrt{Var(W_1)}}$ does not approach the standard normal distribution,
(2) $W_2 = \frac{SSTr}{\sigma^2}$, degree of freedom = 4,
Variance : 8.01231
S.D. : 2.83060
MAD : 2.16609
Range : 42.26903
Median : 3.35626
Q1 : 1.92139
Q2 : 3.35626
Q3 : 5.38631
IQR : 3.46492
C.V. : 0.70769
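$Var(W_2) = 8.01231 \ne 2 \times E(W_2) = 2 \times 3.99977$ for $\frac{SSTr}{\sigma^2}$,
$E\left[\left(W_2 - \chi^2_4(w_2)\right)^2\right] = 0.0001002801$, $E\left[\left(F_{W_2}(w_2) - \chi^2_4\,df(w_2)\right)^2\right] = 0.0000000201$,
$P\left\{\left|F_{W_2}(w_2) - \chi^2_4\,df(w_2)\right| \ge \varepsilon\right\}$,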

0.1000 0.000000 0.0010 0.000000
0.0500 0.000000 0.0005 0.000000
0.0100 0.000000 0.0001 0.543312
0.0050 0.000000
$W_2$ approaches $\chi^2_4$, the chi-square distribution with df = 4,
(3) $W_3 = \frac{SSE}{\sigma^2} = \sum_{i=1}^{n} (\hat{\varepsilon}_i)^2$, degree of freedom = 4995,

Variance : 39967.51242
S.D. : 199.91876
MAD : 159.36314
Range : 2020.87869
Mid_range : 5057.82203
Median : 4990.79393
Q1 : 4858.21142
Q2 : 4990.79393
Q3 : 5127.34963
IQR : 269.13821
C.V. : 0.04002
Because $Var(W_3) = 39967.51242 \ne 2 \times E(W_3) = 2 \times 4995.14421$, $\frac{SSE}{\sigma^2}$ is not chi-square distributed,
$E\left[\left(\frac{w_3 - E(W_3)}{\sqrt{Var(W_3)}} - Z\left(\frac{w_3 - E(W_3)}{\sqrt{Var(W_3)}}\right)\right)^2\right] = 0.0009460894$, $Z \sim N(0,1)$,
$E\left[\left(F_{W_3}\left(\frac{w_3 - E(W_3)}{\sqrt{Var(W_3)}}\right) - \Phi\left(\frac{w_3 - E(W_3)}{\sqrt{Var(W_3)}}\right)\right)^2\right] = 0.0000291555$,
$P\left\{\left|F_{W_3}\left(\frac{w_3 - E(W_3)}{\sqrt{Var(W_3)}}\right) - \Phi\left(\frac{w_3 - E(W_3)}{\sqrt{Var(W_3)}}\right)\right| \ge \varepsilon\right\}$,
0.1000 0.000000 0.0010 0.903133
0.0500 0.000000 0.0005 0.954476
0.0100 0.000000 0.0001 0.991415
0.0050 0.430280
$\frac{W_3 - E(W_3)}{\sqrt{Var(W_3)}}$ does not approach the standard normal distribution.

0.995 0.99 0.975 0.95 0.9
W3 4502.70324 4548.81977 4615.88752 4674.06159 4730.97382

0.1 0.05 0.025 0.01 0.005
W3 5253.65688 5331.32488 5399.63221 5479.87254 5535.54920
(4) $W_4 = MSTr / MSE = F$,

Variance : 0.49998
S.D. : 0.70709
MAD : 0.54133
Range : 10.46608
Mid_range : 5.23311
Median : 0.83977
Q1 : 0.48088
Q2 : 0.83977
Q3 : 1.34726
IQR : 0.86638
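C.V. : 0.70686
$E\left[\left(W_4 - F(4,4995)(w_4)\right)^2\right] = 0.0000034609$,
$E\left[\left(F_{W_4}(w_4) - F(4,4995)\,df(w_4)\right)^2\right] = 0.0000000413$,
$P\left\{\left|F_{W_4}(w_4) - F(4,4995)\,df(w_4)\right| \ge \varepsilon\right\}$,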

0.1000 0.000000 0.0010 0.000000
0.0500 0.000000 0.0005 0.000000
0.0100 0.000000 0.0001 0.726528
0.0050 0.000000
$W_4 = MSTr / MSE$ is close to the $F_{4,4995}$ distribution,
0.995 0.99 0.975 0.95 0.9
W4 0.051818 0.074328 0.121122 0.177694 0.265910

0.1 0.05 0.025 0.01 0.005
W4 1.944333 2.370468 2.783068 3.316176 3.714074
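(5) $W_5 = \frac{\bar{X}_{1\bullet} - \bar{X}_{2\bullet} - (\alpha_1 - \alpha_2)}{S(\bar{X}_{1\bullet} - \bar{X}_{2\bullet})} = \frac{\bar{X}_{1\bullet} - \bar{X}_{2\bullet}}{S(\bar{X}_{1\bullet} - \bar{X}_{2\bullet})}$, $\alpha_1 = 0$, $\alpha_2 = 0$,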
Variance : 1.00029
S.D. : 1.00015
MAD : 0.79804
Range : 10.98179
Median : -0.00023
Q1 : -0.67475
Q2 : -0.00023
Q3 : 0.67450
IQR : 1.34926
C.V. : none
$E\left[\left(w_5 - t_{4995}(w_5)\right)^2\right] = 0.0000006366$, $E\left[\left(F_{W_5}(w_5) - t_{4995}\,df(w_5)\right)^2\right] = 0.0000000063$,
$P\left\{\left|F_{W_5}(w_5) - t_{4995}\,df(w_5)\right| \ge \varepsilon\right\}$,
0.1000 0.000000 0.0010 0.000000
0.0500 0.000000 0.0005 0.000000
0.0100 0.000000 0.0001 0.197183
0.0050 0.000000
$W_5$ follows the $t_{4995}$ distribution and approaches the standard normal distribution.
0.995 0.99 0.975 0.95 0.9
W5 -2.575921 -2.326218 -1.959983 -1.644768 -1.281282
Z -2.575 -2.326 -1.96 -1.645 -1.28

0.1 0.05 0.025 0.01 0.005
W5 1.282168 1.645533 1.960642 2.326751 2.575256
Z 1.28 1.645 1.96 2.326 2.575
S.D. : 11.28843
MAD : 8.62151
Range : 157.25538
Median : 13.30024
Q1 : 7.61448
Q2 : 13.30024
Q3 : 21.36097
IQR : 13.74650
C.V. : 0.71075
Because $Var(W_6) = 127.42862 \ne 2 \times E(W_6) = 2 \times 15.88240$, $W_6$ is not chi-square distributed.
$E\left[\left(W_6 - \chi^2_4(w_6)\right)^2\right] = 212.7476575956$, $E\left[\left(F_{W_6}(w_6) - \chi^2_4\,df(w_6)\right)^2\right] = 0.1953819219$,
$P\left\{\left|F_{W_6}(w_6) - \chi^2_4\,df(w_6)\right| \ge \varepsilon\right\}$,
0.1000 0.890833 0.0010 0.998930
0.0500 0.945839 0.0005 0.999466
0.0100 0.989264 0.0001 0.999893
0.0050 0.994642
$W_6$ does not approach $\chi^2_4$, the chi-square distribution with df = 4,
0.995 0.99 0.975 0.95 0.9
W6 0.817171 1.174583 1.916733 2.812830 3.953046

0.1 0.05 0.025 0.01 0.005
W6 30.924768 37.767869 44.408115 53.006542 59.456564
(7) $W_7 = Max(S_1^2, S_2^2, \ldots, S_k^2) / SSE$
Variance : 0.00000
S.D. : 0.00001
MAD : 0.00001
Range : 0.00010
Mid_range : 0.00025
Median : 0.00022
Q1 : 0.00021
Q2 : 0.00022
Q3 : 0.00023
IQR : 0.00001
C.V. : none
0.995 0.99 0.975 0.95 0.9
W7 0.0002046 0.0002055 0.0002070 0.0002085 0.0002106

0.1 0.05 0.025 0.01 0.005
W7 0.0002347 0.0002397 0.0002444 0.0002502 0.0002543

Variance : 1.44407
S.D. : 1.20169
MAD : 0.91957
Range : 17.98258
Mid_range : 8.99137
Median : 1.42577
Q1 : 0.81656
Q2 : 1.42577
Q3 : 2.28655
IQR : 1.46999
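C.V. : 0.70730
$E\left[\left(W_8 - F(4,4995)(w_8)\right)^2\right] = 0.7316761313$,
$E\left[\left(F_{W_8}(w_8) - F(4,4995)\,df(w_8)\right)^2\right] = 0.0427728338$, $P\left\{\left|F_{W_8}(w_8) - F(4,4995)\,df(w_8)\right| \ge \varepsilon\right\}$,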
0.1000 0.814532 0.0010 0.998469
0.0500 0.914560 0.0005 0.999241
0.0100 0.984172 0.0001 0.999852
0.0050 0.992230
0.995 0.99 0.975 0.95 0.9
W8 0.088340 0.126567 0.206201 0.302279 0.451914

0.1 0.05 0.025 0.01 0.005
W8 3.303924 4.029346 4.734599 5.644724 6.321920
Variance : 0.50085
S.D. : 0.70771
MAD : 0.54139
Range : 10.09545
Mid_range : 5.04782
Median : 0.83911
Q1 : 0.48065
Q2 : 0.83911
Q3 : 1.34567
IQR : 0.86502
C.V. : 0.70767

0.995 0.99 0.975 0.95 0.9
W9 0.051891 0.074411 0.121295 0.177856 0.265983
0.1 0.05 0.025 0.01 0.005
W9 1.944312 2.372349 2.787915 3.325081 3.724083

Variance : 0.00946
S.D. : 0.09728
MAD : 0.07689
Range : 1.02352
Mid_range : 1.51322
Median : 1.22274
Q1 : 1.16356
Q2 : 1.22274
Q3 : 1.29226
IQR : 0.12870
C.V. : 0.07882

0.995 0.99 0.975 0.95 0.9
W10 1.050612 1.060962 1.078566 1.096067 1.118906

0.1 0.05 0.025 0.01 0.005
W10 1.364065 1.411271 1.454904 1.508924 1.547835
Variance : 0.00009
S.D. : 0.00968
MAD : 0.00760
Range : 0.10422
Mid_range : 0.25226
Median : 0.21998
Q1 : 0.21438
Q2 : 0.21998
Q3 : 0.22695
IQR : 0.01257
C.V. : 0.04370

0.995 0.99 0.975 0.95 0.9
W11 0.204408 0.205299 0.206811 0.208331 0.210340

0.1 0.05 0.025 0.01 0.005
W11 0.234450 0.23949 0.244165 0.249944 0.254050
Appendix 11. The errors and residuals when the
distribution of the errors is shifted-exponential
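$\varepsilon_1, \ldots, \varepsilon_n \overset{iid}{\sim} \mathrm{Shifted\_exponential}(\lambda = 1, c = -1)$, $\sigma^2 = \frac{1}{\lambda^2} = 1$, $\varepsilon_j$ is the error,
$Y_j = \beta_0 + \beta_1 X_{1,j} + \varepsilon_j$, $j = 1,2,\ldots,n$, $\beta_0 = \beta_1 = 1$,
$k = 1$, $n = 40$, $X^T \hat{\varepsilon} = 0$. The simple linear model imposes two conditions on the residuals; $\hat{\varepsilon}_j$ is the residual, $\hat{\varepsilon}_j = Y_j - \hat{Y}_j$,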
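The condition $X^T \hat{\varepsilon} = 0$ amounts to two linear constraints in the simple linear model: the residuals sum to zero, and they are orthogonal to the regressor. The sketch below fits one simulated data set by least squares and checks both constraints. It is an illustration only: the design values in `x` are hypothetical (the appendix does not list them), and the function name and seed are assumptions.

```python
import random


def fit_simple_linear(x, y):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1, residuals)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    return b0, b1, residuals


rng = random.Random(4)
n = 40
x = [rng.uniform(0.0, 1.0) for _ in range(n)]            # hypothetical design values
errors = [rng.expovariate(1.0) - 1.0 for _ in range(n)]  # shifted exponential: lambda = 1, c = -1
y = [1.0 + 1.0 * xi + e for xi, e in zip(x, errors)]     # beta0 = beta1 = 1

b0, b1, residuals = fit_simple_linear(x, y)
sum_resid = sum(residuals)                                  # first constraint
sum_x_resid = sum(xi * r for xi, r in zip(x, residuals))    # second constraint
print(b0, b1, sum_resid, sum_x_resid)
```

Both sums are zero up to floating-point rounding for any data set, which is why the 40 residuals carry only 38 free values and cannot be independent.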
(1) $W_1 = \varepsilon_1$
Variance : 0.99967
S.D. : 0.99983
MAD : 0.73578
Range : 16.86253
Mid_range : 7.43126
Median : -0.30712
Q1 : -0.71236
Q2 : -0.30712
Q3 : 0.38622
IQR : 1.09858
C.V. : none
(2) $W_{11} = \hat{\varepsilon}_1$,
Variance : 0.96244
S.D. : 0.98104
MAD : 0.72202
Range : 17.21643
Mid_range : 6.42728
Median : -0.27732
Q1 : -0.67318
Q2 : -0.27732
Q3 : 0.39038
IQR : 1.06356
C.V. : none
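The variance of the residual is below the error variance of 1, and the least-squares algebra explains why. A brief derivation under the usual least-squares setup; the hat matrix $H$ and its diagonal entries $h_{jj}$ are standard notation introduced here for illustration, not symbols used elsewhere in this appendix:

```latex
\[
\hat{\varepsilon} = (I - H)\,\varepsilon, \qquad H = X (X^{T} X)^{-1} X^{T},
\]
\[
Var(\hat{\varepsilon}_j) = \sigma^{2}\,(1 - h_{jj}), \qquad
\sum_{j=1}^{n} h_{jj} = 2 \quad \text{(intercept plus one slope)} .
\]
```

With $\sigma^2 = 1$ and $n = 40$, the average of $1 - h_{jj}$ is $1 - 2/40 = 0.95$, which is in line with the simulated variance 0.96244 of $\hat{\varepsilon}_1$.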
$Z(w_{11}) = \frac{w_{11} - E(W_{11})}{\sqrt{Var(W_{11})}}$,
$f_{W_{11}}(Z(w_{11}))$, $F_{W_{11}}(Z(w_{11}))$ Coefficient
Variance : 1.00000
S.D. : 1.00000
MAD : 0.73598
Range : 17.54916
Mid_range : 6.55163
Median : -0.28255
Q1 : -0.68606
Q2 : -0.28255
Q3 : 0.39805
IQR : 1.08412
C.V. : none
$E\left[\left(F_{W_{11}}\left(\frac{w_{11} - E(W_{11})}{\sqrt{Var(W_{11})}}\right) - \Phi\left(\frac{w_{11} - E(W_{11})}{\sqrt{Var(W_{11})}}\right)\right)^2\right] = 0.0065250344$,
$P\left\{\left|F_{W_{11}}\left(\frac{w_{11} - E(W_{11})}{\sqrt{Var(W_{11})}}\right) - \Phi\left(\frac{w_{11} - E(W_{11})}{\sqrt{Var(W_{11})}}\right)\right| \ge \varepsilon\right\}$,
0.1000 0.321582 0.0010 0.992575
0.0500 0.644815 0.0005 0.996322
0.0100 0.925030 0.0001 0.999261
0.0050 0.962713
$\frac{W_{11} - E(W_{11})}{\sqrt{Var(W_{11})}}$ does not approach the standard normal distribution,
0.995 0.99 0.975 0.95 0.9
Z (W11 ) -1.300855 -1.230216 -1.12423 -1.030475 -0.915864
Z -2.576 -2.326 -1.96 -1.645 -1.28

0.1 0.05 0.025 0.01 0.005
Z (W11 ) 1.297493 1.977610 2.657788 3.555246 4.232171
Z 1.28 1.645 1.96 2.326 2.576
$W_1 = \varepsilon_1$, $W_{11} = \hat{\varepsilon}_1$,
$f_{W_1,W_{11}}(w_1, w_{11})$  $F_{W_1,W_{11}}(w_1, w_{11})$
E(W1)= -0.0001, Var(W1)= 0.9997, E(W11)=-0.0001, Var(W11)= 0.9624,

(3) $W_{12} = \hat{\varepsilon}_2$

Variance : 0.97055
S.D. : 0.98517
MAD : 0.72475
Range : 16.46884
Mid_range : 6.19906
Median : -0.28266
Q1 : -0.67849
Q2 : -0.28266
Q3 : 0.38979
IQR : 1.06828
C.V. : none
$Z(w_{12}) = \frac{w_{12} - E(W_{12})}{\sqrt{Var(W_{12})}}$,
Variance : 1.00000
S.D. : 1.00000
MAD : 0.73566
Range : 16.71681
Mid_range : 6.29217
Median : -0.28714
Q1 : -0.68893
Q2 : -0.28714
Q3 : 0.39544
IQR : 1.08437
C.V. : none
$E\left[\left(F_{W_{12}}\left(\frac{w_{12} - E(W_{12})}{\sqrt{Var(W_{12})}}\right) - \Phi\left(\frac{w_{12} - E(W_{12})}{\sqrt{Var(W_{12})}}\right)\right)^2\right] = 0.0067629625$,
$P\left\{\left|F_{W_{12}}\left(\frac{w_{12} - E(W_{12})}{\sqrt{Var(W_{12})}}\right) - \Phi\left(\frac{w_{12} - E(W_{12})}{\sqrt{Var(W_{12})}}\right)\right| \ge \varepsilon\right\}$,
0.1000 0.342103 0.0010 0.992670
0.0500 0.650120 0.0005 0.996354
0.0100 0.926324 0.0001 0.999270
0.0050 0.963381
$\frac{W_{12} - E(W_{12})}{\sqrt{Var(W_{12})}}$ does not follow the standard normal distribution.
0.995 0.99 0.975 0.95 0.9
Z (W12 ) -1.268817 -1.202948 -1.104417 -1.016864 -0.908898
Z -2.576 -2.326 -1.96 -1.645 -1.28

0.1 0.05 0.025 0.01 0.005
Z (W12 ) 1.297989 1.980785 2.662983 3.566898 4.248812
Z 1.28 1.645 1.96 2.326 2.576
$W_1 = \varepsilon_1$, $W_{12} = \hat{\varepsilon}_2$,
$f_{W_1,W_{12}}(w_1, w_{12})$  $F_{W_1,W_{12}}(w_1, w_{12})$
E(W1)= -0.0001, Var(W1)= 0.9997, E(W12)= 0.0002, Var(W12)= 0.9706,

Cov(W1,W12)= -0.0336, W1 and W12 correlation coefficient=-0.0341.
$W_2 = \varepsilon_2$, $W_{12} = \hat{\varepsilon}_2$,
$f_{W_2,W_{12}}(w_2, w_{12})$  $F_{W_2,W_{12}}(w_2, w_{12})$
E(W2)= 0.0003, Var(W2)= 1.0008, E(W12)= 0.0002, Var(W12)= 0.9706,

(4) $W_{13} = \hat{\varepsilon}_3$

Variance : 0.91311
S.D. : 0.95557
MAD : 0.70137
Range : 17.91189
Mid_range : 4.68956
Median : -0.24139
Q1 : -0.62590
Q2 : -0.24139
Q3 : 0.39192
IQR : 1.01782
C.V. : none
$Z(w_{13}) = \frac{w_{13} - E(W_{13})}{\sqrt{Var(W_{13})}}$,
Variance : 1.00000
S.D. : 1.00000
MAD : 0.73398
Range : 18.74473
Mid_range : 4.90773
Median : -0.25249
Q1 : -0.65488
Q2 : -0.25249
Q3 : 0.41027
IQR : 1.06514
C.V. : none
$E\left[\left(F_{W_{13}}\left(\frac{w_{13} - E(W_{13})}{\sqrt{Var(W_{13})}}\right) - \Phi\left(\frac{w_{13} - E(W_{13})}{\sqrt{Var(W_{13})}}\right)\right)^2\right] = 0.0053123134$,
$P\left\{\left|F_{W_{13}}\left(\frac{w_{13} - E(W_{13})}{\sqrt{Var(W_{13})}}\right) - \Phi\left(\frac{w_{13} - E(W_{13})}{\sqrt{Var(W_{13})}}\right)\right| \ge \varepsilon\right\}$,
0.1000 0.224635 0.0010 0.992141
0.0500 0.615891 0.0005 0.996039
0.0100 0.920018 0.0001 0.999228
0.0050 0.960373
$\frac{W_{13} - E(W_{13})}{\sqrt{Var(W_{13})}}$ does not approach the standard normal distribution,
0.995 0.99 0.975 0.95 0.9
Z (W13 ) -1.657456 -1.496394 -1.282640 -1.116063 -0.938869
Z -2.576 -2.326 -1.96 -1.645 -1.28
0.1 0.05 0.025 0.01 0.005
Z (W13 ) 1.335149 1.947893 2.609660 3.486793 4.150587
Z 1.28 1.645 1.96 2.326 2.576
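[Figure: joint density $f_{W_1,W_{13}}(w_1,w_{13})$ and joint distribution function $F_{W_1,W_{13}}(w_1,w_{13})$, with $W_1=\varepsilon_1$, $W_{13}=\hat\varepsilon_3$]
E(W1)= -0.0001, Var(W1)= 0.9997, E(W13)= -0.0001, Var(W13)= 0.9131,

[Figure: joint density $f_{W_3,W_{13}}(w_3,w_{13})$ and joint distribution function $F_{W_3,W_{13}}(w_3,w_{13})$, with $W_3=\varepsilon_3$, $W_{13}=\hat\varepsilon_3$]
E(W3)= -0.0001, Var(W3)= 0.9997, E(W13)= -0.0001, Var(W13)= 0.9131,

[Figure: joint density $f_{W_{11},W_{12}}(w_{11},w_{12})$ and joint distribution function $F_{W_{11},W_{12}}(w_{11},w_{12})$, with $W_{11}=\hat\varepsilon_1$, $W_{12}=\hat\varepsilon_2$]
E(W11)= -0.0001, Var(W11)= 0.9624, E(W12)= 0.0002, Var(W12)= 0.9706,

E(W11)= -0.0001, Var(W11)= 0.9624, E(W13)= -0.0001, Var(W13)= 0.9131,
E(W12)= 0.0002, Var(W12)= 0.9706, E(W13)= -0.0001, Var(W13)= 0.9131,
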
(5) W1 = β̂ 0
Variance : 2.15194
S.D. : 1.46695
MAD : 1.11630
Range : 26.69639
Median : 1.17415
Q1 : 0.22089
Q2 : 1.17415
Q3 : 1.98116
IQR : 1.76028
C.V. : 1.46710
\[ Z(w_1) = \frac{\hat\beta_0-E(\hat\beta_0)}{\sqrt{\operatorname{Var}(\hat\beta_0)}}, \]

Variance : 1.00000
S.D. : 1.00000
MAD : 0.76097
Range : 18.19856
Median : 0.11878
Q1 : -0.53104
Q2 : 0.11878
Q3 : 0.66891
IQR : 1.19996
C.V. : none
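\[ E\left[\left(F_{W_1}\left(\frac{w_1-E(W_1)}{\sqrt{\operatorname{Var}(W_1)}}\right)-\Phi\left(\frac{w_1-E(W_1)}{\sqrt{\operatorname{Var}(W_1)}}\right)\right)^{2}\right]=0.0012270612, \]
\[ P\left\{\left|F_{W_1}\left(\frac{w_1-E(W_1)}{\sqrt{\operatorname{Var}(W_1)}}\right)-\Phi\left(\frac{w_1-E(W_1)}{\sqrt{\operatorname{Var}(W_1)}}\right)\right|\ge\varepsilon\right\}, \]
ε       probability   ε       probability
0.1000  0.000000      0.0010  0.984610
0.0500  0.216585      0.0005  0.992397
0.0100  0.842099      0.0001  0.998467
0.0050  0.923134
Quantiles of $\frac{W_1-E(W_1)}{\sqrt{\operatorname{Var}(W_1)}}=\frac{\hat\beta_0-E(\hat\beta_0)}{\sqrt{\operatorname{Var}(\hat\beta_0)}}$, written $Z(W_1)$, against those of the standard normal $Z$: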

0.995 0.99 0.975 0.95 0.9
Z (W1 ) -3.575704 -3.045048 -2.340352 -1.807888 -1.269886
Z -2.576 -2.326 -1.96 -1.645 -1.28

0.1 0.05 0.025 0.01 0.005
Z (W1 ) 1.128719 1.437290 1.635290 1.913192 2.105262
Z 1.28 1.645 1.96 2.326 2.576
(6) W2 = β̂1
Variance : 1.80847
S.D. : 1.34479
MAD : 1.02249
Range : 24.45152
Mid_range : 7.82075
Median : 0.81303
Q1 : 0.08878
Q2 : 0.81303
Q3 : 1.69951
IQR : 1.61073
C.V. : 1.34471
\[ Z(w_2) = \frac{\hat\beta_1-E(\hat\beta_1)}{\sqrt{\operatorname{Var}(\hat\beta_1)}}, \]

Variance : 1.00000
S.D. : 1.00000
MAD : 0.76033
Range : 18.18235
Mid_range : 5.07192
Median : -0.13908
Q1 : -0.67764
Q2 : -0.13908
Q3 : 0.52011
IQR : 1.19775
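C.V. : none
\[ E\left[\left(F_{W_2}\left(\frac{w_2-E(W_2)}{\sqrt{\operatorname{Var}(W_2)}}\right)-\Phi\left(\frac{w_2-E(W_2)}{\sqrt{\operatorname{Var}(W_2)}}\right)\right)^{2}\right]=0.0015675226, \]
\[ P\left\{\left|F_{W_2}\left(\frac{w_2-E(W_2)}{\sqrt{\operatorname{Var}(W_2)}}\right)-\Phi\left(\frac{w_2-E(W_2)}{\sqrt{\operatorname{Var}(W_2)}}\right)\right|\ge\varepsilon\right\}, \]
ε       probability   ε       probability
0.1000  0.000000      0.0010  0.986468
0.0500  0.291121      0.0005  0.993190
0.0100  0.860482      0.0001  0.998638
0.0050  0.931600
Quantiles of $\frac{W_2-E(W_2)}{\sqrt{\operatorname{Var}(W_2)}}=\frac{\hat\beta_1-E(\hat\beta_1)}{\sqrt{\operatorname{Var}(\hat\beta_1)}}$, written $Z(W_2)$, against those of the standard normal $Z$: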

0.995 0.99 0.975 0.95 0.9
Z (W2 ) -1.987445 -1.817417 -1.570112 -1.357002 -1.109483
Z -2.576 -2.326 -1.96 -1.645 -1.28

0.1 0.05 0.025 0.01 0.005
Z (W2 ) 1.279628 1.833525 2.3814234 3.106171 3.652040
Z 1.28 1.645 1.96 2.326 2.576
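[Figure: joint density $f_{W_1,W_2}(w_1,w_2)$ and joint distribution function $F_{W_1,W_2}(w_1,w_2)$, with $W_1=\hat\beta_0$, $W_2=\hat\beta_1$]
E(W1)= 0.9999, Var(W1)= 2.1519, E(W2)= 1.0001, Var(W2)= 1.8085,

(6) $W_3=\frac{SST}{\sigma^2}=\sum_{i=1}^{n}\left(Y_i-\bar Y\right)^2$, SST is calculated when $\beta_1=1$,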

S.D. : 17.56098
MAD : 13.20065
Range : 400.55085
Mid_range : 204.03934
Median : 36.17664
Q1 : 27.26585
Q2 : 36.17664
Q3 : 47.99147
IQR : 20.72561
C.V. : 0.44397
Var(W3) = 308.38801 ≠ 2 × E(W3) = 2 × 39.55462,

so $\frac{SST}{\sigma^2}$ does not satisfy the chi-square relation Var = 2 × mean.
(7) $W_4=\frac{SSR}{\sigma^2}=\hat\beta_1^{2}\sum_{i=1}^{n}\left(X_i-\bar X\right)^2$, SSR is calculated when $\beta_1=1$,

Variance : 9.68168
S.D. : 3.11154
MAD : 1.71586
Range : 169.64891
Median : 0.47843
Q1 : 0.09929
Q2 : 0.47843
Q3 : 1.63546
IQR : 1.53617
C.V. : 2.00395
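Var(W4) = 9.68168 ≠ 2 × E(W4) = 2 × 1.55271,
so $\frac{SSR}{\sigma^2}$ does not satisfy the chi-square relation Var = 2 × mean.
(8) $W_5=\frac{SSE}{\sigma^2}=\sum_{i=1}^{n}\left(\hat\varepsilon_i\right)^2$, SSE is calculated when $\beta_1=1$,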

S.D. : 17.17265
MAD : 12.89356
Range : 400.94790
Mid_range : 203.82592
Median : 34.67160
Q1 : 25.99420
Q2 : 34.67160
Q3 : 46.20665
IQR : 20.21244
C.V. : 0.45189
Var(W5) = 294.90005 ≠ 2 × E(W5) = 2 × 38.00192,

so $\frac{SSE}{\sigma^2}$ does not satisfy the chi-square relation Var = 2 × mean.
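The comparisons of Var(W) with 2 × E(W) rest on the fact that a chi-square variable with k degrees of freedom has mean k and variance 2k. As a reference point, the sketch below simulates SSE/σ² for a simple linear model in the textbook case of normal errors. It is only an illustration under stated assumptions: n = 40 (inferred from the 38 degrees of freedom used elsewhere in this section), β0 = β1 = 1, σ = 1, X drawn uniformly on (0, 10), and normal errors are all choices made here, not the simulation design behind the figures above.

```python
import random
import statistics


def sse_over_sigma2(n, beta0, beta1, sigma, rng):
    """Simulate one data set, fit least squares, return SSE / sigma^2."""
    x = [rng.uniform(0.0, 10.0) for _ in range(n)]           # assumed design
    y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # least-squares slope
    b0 = y_bar - b1 * x_bar             # least-squares intercept
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return sse / sigma ** 2


rng = random.Random(2015)
draws = [sse_over_sigma2(40, 1.0, 1.0, 1.0, rng) for _ in range(4000)]
mean_w5 = statistics.fmean(draws)
var_w5 = statistics.variance(draws)
print("mean:", mean_w5, "variance:", var_w5, "2 x mean:", 2 * mean_w5)
```

With normal errors the simulated variance stays close to twice the simulated mean, which is the relation the figures quoted above fail to satisfy.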
(9) $W_6=\frac{MSR}{MSE}=F$, MSR and MSE are calculated when $\beta_1=1$,
Variance : 13.65630
S.D. : 3.69544
MAD : 2.04573
Range : 170.97656
Median : 0.51479
Q1 : 0.10518
Q2 : 0.51479
Q3 : 1.84476
IQR : 1.73959
C.V. : 2.03951
E[F(1,38)] = 1.05567, Var(F(1,38)) = 2.42586,

$\frac{MSR}{MSE}$ does not follow the F distribution,
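For reference, the mean and variance of an F distribution have closed forms: for F(d1, d2), the mean is d2/(d2 − 2) when d2 > 2 and the variance is 2·d2²·(d1 + d2 − 2) / (d1·(d2 − 2)²·(d2 − 4)) when d2 > 4. The short sketch below evaluates them for F(1, 38); the function name is hypothetical.

```python
def f_distribution_moments(d1, d2):
    """Exact mean and variance of an F(d1, d2) variable (requires d2 > 4)."""
    if d2 <= 4:
        raise ValueError("the variance of F(d1, d2) exists only for d2 > 4")
    mean = d2 / (d2 - 2)
    variance = 2 * d2 ** 2 * (d1 + d2 - 2) / (d1 * (d2 - 2) ** 2 * (d2 - 4))
    return mean, variance


mean_f, var_f = f_distribution_moments(1, 38)
print(mean_f, var_f)
```

The exact values, about 1.0556 and 2.4250, agree with the F(1,38) figures quoted above to roughly three significant digits, and both are far from the sample variance 13.65630 reported for W6.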
(11) $W_{10}=\frac{\hat\beta_0-\beta_0}{S(\hat\beta_0)}=\frac{\hat\beta_0-1}{S(\hat\beta_0)}$,

Variance : 1.10478
S.D. : 1.05109
MAD : 0.79870
Range : 15.94943
Median : 0.12464
Q1 : -0.56979
Q2 : 0.12464
Q3 : 0.68793
IQR : 1.25773
C.V. : none
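\[ E\left[\left(F_{W_{10}}(w_{10})-t_{38\,\mathrm{df}}(w_{10})\right)^{2}\right]=0.0011480745, \]
\[ P\left\{\left|F_{W_{10}}(w_{10})-t_{38\,\mathrm{df}}(w_{10})\right|\ge\varepsilon\right\}, \]

ε       probability   ε       probability
0.1000  0.000000      0.0010  0.985395
0.0500  0.152115      0.0005  0.992559
0.0100  0.852356      0.0001  0.998498
0.0050  0.927356
W10 does not approach the t distribution with 38 degrees of freedom,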
0.995 0.99 0.975 0.95 0.9
W10 -3.946037 -3.370287 -2.584641 -1.980547 -1.377981
t 38 -2.712425 -2.429447 -2.024893 -1.686300 -1.304611

0.1 0.05 0.025 0.01 0.005
W10 1.125859 1.366965 1.567267 1.794155 1.944844
t 38 1.304611 1.686300 2.024893 2.429447 2.712425
(12) $W_{11}=\frac{\hat\beta_1-\beta_1}{S(\hat\beta_1)}=\frac{\hat\beta_1-1}{S(\hat\beta_1)}$,

Variance : 1.10280
S.D. : 1.05014
MAD : 0.79746
Range : 16.06372
Mid_range : 4.44813
Median : -0.14543
Q1 : -0.70618
Q2 : -0.14543
Q3 : 0.54858
IQR : 1.25476
C.V. : 73.16936
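\[ E\left[\left(F_{W_{11}}(w_{11})-t_{38\,\mathrm{df}}(w_{11})\right)^{2}\right]=0.0014880275, \]
\[ P\left\{\left|F_{W_{11}}(w_{11})-t_{38\,\mathrm{df}}(w_{11})\right|\ge\varepsilon\right\}, \]

ε       probability   ε       probability
0.1000  0.000000      0.0010  0.986524
0.0500  0.274400      0.0005  0.993086
0.0100  0.862662      0.0001  0.998647
0.0050  0.932422
W11 does not approach the t distribution with 38 degrees of freedom,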
0.995 0.99 0.975 0.95 0.9
W11 -1.944481 -1.796878 -1.575114 -1.377803 -1.139698
t 38 -2.712425 -2.429447 -2.024893 -1.686300 -1.304611

0.1 0.05 0.025 0.01 0.005
W11 1.359167 1.964282 2.570120 3.361271 3.939669
t 38 1.304611 1.686300 2.024893 2.429447 2.712425
Appendix 12. The critical values of the two population means test for Arcsin and Semi-circle
The critical value tables of the test statistics for two independent populations. One population has an Arcsin distribution with population mean $\mu_1$ and population variance $\sigma_1^2$; the other has a Semi-circle distribution with population mean $\mu_2$ and population variance $\sigma_2^2$. The sample size of each population is n.
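\[ \bar X_1=\frac{\sum_{i=1}^{n}X_{1i}}{n},\quad \bar X_2=\frac{\sum_{j=1}^{n}X_{2j}}{n},\quad S_1^2=\frac{\sum_{i=1}^{n}\left(X_{1i}-\bar X_1\right)^2}{n-1},\quad S_2^2=\frac{\sum_{j=1}^{n}\left(X_{2j}-\bar X_2\right)^2}{n-1}, \]
\[ \sigma_1^2=\sigma_2^2=\sigma^2,\quad S_{pool}^2=\frac{\sum_{i=1}^{n}\left(X_{1i}-\bar X_1\right)^2+\sum_{j=1}^{n}\left(X_{2j}-\bar X_2\right)^2}{n+n-2}, \]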
(1) Two population means test,

$H_0:\mu_1=\mu_2$, $W_2=\frac{\bar X_1-\bar X_2}{S_{pool}\sqrt{\frac{1}{n}+\frac{1}{n}}}$, $W_2$ has a symmetric distribution, $P(W_2\le W_{2,1-\alpha,n})=\alpha$,
α
n 0.9 0.95 0.975 0.99 0.995
10 1.321233 1.733986 2.116439 2.598727 2.960445
15 1.305954 1.700311 2.057514 2.494270 2.809739
20 1.299376 1.684809 2.029375 2.448044 2.743348
25 1.297278 1.677524 2.015121 2.422008 2.706834
30 1.294348 1.672459 2.006918 2.405637 2.684140
40 1.289849 1.664630 1.992896 2.383092 2.657254
50 1.286621 1.658805 1.986901 2.370267 2.637129
60 1.287472 1.657117 1.982235 2.363116 2.620850
70 1.286343 1.654364 1.978370 2.356116 2.616767
80 1.286014 1.653953 1.974450 2.353063 2.612298
90 1.285937 1.653342 1.974901 2.350630 2.607947
100 1.284650 1.652818 1.972720 2.351033 2.607600
500 1.280414 1.645491 1.960649 2.324958 2.574445
1000 1.283337 1.652970 1.975193 2.343454 2.591762
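Critical values of this kind can be approximated by Monte Carlo simulation under the null hypothesis. The sketch below is an assumption-laden illustration rather than the authors' procedure: it draws the Arcsin population as c·cos(Θ) with Θ uniform on (0, 2π), which has variance c²/2, and the Semi-circle population as a rescaled Beta(3/2, 3/2) variable with radius R, which has variance R²/4; c = 1 and R = √2 are chosen here so that both means are 0 and both variances are 1/2. All function names, the seed, and the number of replications are hypothetical.

```python
import math
import random


def arcsine_draw(rng, c=1.0):
    """Arcsine variable on (-c, c): mean 0, variance c^2 / 2."""
    return c * math.cos(rng.uniform(0.0, 2.0 * math.pi))


def semicircle_draw(rng, radius):
    """Semicircle variable on (-radius, radius): mean 0, variance radius^2 / 4."""
    return radius * (2.0 * rng.betavariate(1.5, 1.5) - 1.0)


def pooled_statistic(x1, x2):
    """W2 = (mean1 - mean2) / (S_pool * sqrt(1/n1 + 1/n2))."""
    n1, n2 = len(x1), len(x2)
    m1 = sum(x1) / n1
    m2 = sum(x2) / n2
    ss1 = sum((x - m1) ** 2 for x in x1)
    ss2 = sum((x - m2) ** 2 for x in x2)
    s_pool = math.sqrt((ss1 + ss2) / (n1 + n2 - 2))
    return (m1 - m2) / (s_pool * math.sqrt(1.0 / n1 + 1.0 / n2))


def simulate_quantiles(n, probs, reps=20000, seed=1):
    """Empirical quantiles of W2 when both populations share mean and variance."""
    rng = random.Random(seed)
    stats = sorted(
        pooled_statistic(
            [arcsine_draw(rng, 1.0) for _ in range(n)],
            [semicircle_draw(rng, math.sqrt(2.0)) for _ in range(n)],
        )
        for _ in range(reps)
    )
    return {p: stats[min(reps - 1, int(p * reps))] for p in probs}


quantiles = simulate_quantiles(10, [0.9, 0.95, 0.975])
print(quantiles)
```

The estimates for n = 10 can be set beside the first row of the table above; they carry Monte Carlo error, so only the first one or two decimals are meaningful at this number of replications.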
(2) Population variance test,

$H_0:\sigma=\sigma_0$, $W_3=\frac{(n+n-2)S_{pool}^2}{(\sigma_0)^2}$, $W_3$ does not have a symmetric distribution,

$P(W_3\le W_{3,1-\alpha,n})=\alpha$,
α
n 0.005 0.01 0.025 0.05 0.1
10 8.472986 9.306884 10.559171 11.666710 12.977979
15 16.205103 17.270988 19.199011 20.242799 21.873149
20 24.338859 25.577754 27.432793 29.048569 30.941318
25 32.680777 34.079478 36.167621 37.987511 40.117767
30 41.208021 42.744839 45.037098 47.042741 49.384924
40 58.557826 60.347179 63.015616 63.015615 68.061082
50 76.197494 78.232165 81.220576 83.834295 86.890946
60 94.124526 96.344728 99.638021 102.502531 105.843218
70 112.159900 114.562806 118.159471 121.264451 124.880088
80 130.223667 132.860676 136.724074 140.081709 143.979349
90 148.620290 151.385898 155.472251 159.013461 163.133853
100 167.043767 169.915952 174.239270 177.994473 182.336885
500 928.08501 934.90646 944.68181 953.20538 963.00543
1000 1898.95425 1908.62552 1922.40088 1934.51759 1948.29727
α
n 0.9 0.95 0.975 0.99 0.995
10 23.169775 24.714741 26.074063 27.665324 28.754879
15 34.273278 36.134509 37.760022 39.670647 21.873146
20 45.195865 47.317295 49.185332 51.369164 52.867983
25 56.015660 58.375919 60.434481 62.855678 64.513822
30 66.770287 69.344411 71.594274 74.223254 76.026592
40 88.079901 91.021454 93.60060 96.623616 98.698884
50 109.234755 112.517459 115.387940 118.721108 121.007592
60 130.301510 133.873324 136.993278 140.644338 143.128406
70 151.262203 155.106560 162.416367 162.416368 165.142292
80 172.173458 176.277876 179.847272 184.061336 186.911357
90 193.014321 197.368240 201.161451 205.591631 208.633962
100 213.815328 218.384264 222.394516 227.040805 230.198807
500 1033.23203 1043.26178 1051.97347 1061.99940 1068.91952
1000 2047.79392 2061.82921 2074.05669 2088.33990 2098.14225
(3) Two independent population variances test,

$H_0:\sigma_1=\sigma_2$, $W_4=\frac{S_1^2}{S_2^2}$, $W_4$ does not have a symmetric distribution, $P(W_4\le W_{4,1-\alpha,n})=\alpha$,
α
n 0.005 0.01 0.025 0.05 0.1
10 0.257860 0.311962 0.311962 0.472078 0.566889
15 0.388244 0.433577 0.502918 0.566832 0.646494
20 0.462043 0.502012 0.564298 0.621306 0.692211
25 0.511776 0.548402 0.605702 0.658300 0.723245
30 0.548120 0.582645 0.636110 0.685174 0.745664
40 0.600369 0.631719 0.679979 0.724074 0.777892
50 0.636705 0.665592 0.710088 0.750494 0.799855
60 0.664482 0.691267 0.732738 0.770350 0.816132
70 0.685676 0.711352 0.750838 0.786070 0.828912
80 0.703137 0.727312 0.764836 0.798681 0.839268
90 0.718257 0.741329 0.776941 0.809019 0.847807
100 0.730028 0.752700 0.787268 0.818178 0.855074
500 0.869224 0.881224 0.899033 0.914356 0.932450
1000 0.905672 0.914159 0.927203 0.938573 0.951877
α
n 0.9 0.95 0.975 0.99 0.995
10 1.898740 2.329106 2.817968 3.577266 4.254482
15 1.622038 1.882642 2.157653 2.553222 2.879902
20 1.496042 1.689531 1.886587 2.157596 2.373311
25 1.422024 1.579803 1.736509 1.946275 2.110210
30 1.372245 1.507056 1.638579 1.812095 1.945284
40 1.308066 1.415256 1.517819 1.649969 1.749932
50 1.267709 1.358526 1.444584 1.553117 1.633779
60 1.239547 1.319421 1.394039 1.488162 1.557474
70 1.217785 1.289998 1.356854 1.440634 1.501634
80 1.201597 1.267759 1.328695 1.404305 1.459434
90 1.188455 1.249250 1.305641 1.375237 1.425492
100 1.177207 1.234306 1.28686 1.351521 1.397114
500 1.073519 1.095503 1.115276 1.138159 1.154078
1000 1.051494 1.066640 1.079930 1.095475 1.106027
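The two variance statistics of this appendix are simple functions of the within-sample sums of squares. The sketch below computes W3 and W4 for a pair of samples; the function names and the tiny data set are hypothetical and serve only to show the arithmetic.

```python
def sum_of_squares(xs):
    """Sum of squared deviations from the sample mean."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


def pooled_variance_statistic(x1, x2, sigma0):
    """W3 = (n1 + n2 - 2) * S_pool^2 / sigma0^2, i.e. pooled sum of squares over sigma0^2."""
    return (sum_of_squares(x1) + sum_of_squares(x2)) / sigma0 ** 2


def variance_ratio_statistic(x1, x2):
    """W4 = S1^2 / S2^2 with the usual n - 1 denominators."""
    s1_sq = sum_of_squares(x1) / (len(x1) - 1)
    s2_sq = sum_of_squares(x2) / (len(x2) - 1)
    return s1_sq / s2_sq


x1 = [1.0, 2.0, 3.0, 4.0]      # illustrative data
x2 = [2.0, 4.0, 6.0, 8.0]      # illustrative data
w3 = pooled_variance_statistic(x1, x2, sigma0=1.0)
w4 = variance_ratio_statistic(x1, x2)
print(w3, w4)
```

An observed W3 or W4 would then be compared with the row of the corresponding table that matches the sample size n.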
Appendix 13. The critical values of Zr statistic
The critical value table of the Zr test statistic.
The 1st population has a Double exponential distribution with population mean $\mu_{X_1}$ and population variance $(\sigma_{X_1})^2$,
\[ X_1\sim \text{Double exponential}\left(\lambda_{X_1}=\frac{\sqrt{2}}{\sigma_{X_1}},\ \mu_{X_1}\right). \]
The 2nd population is $X_2$, with
\[ X_2\mid x_1\sim \text{Double exponential}\left(\lambda_{X_2}=\sqrt{\frac{2}{(\sigma_{X_2})^2-\rho^2(\sigma_{X_1})^2}},\ \mu_{X_2}=x_1\right), \]
population mean $\mu_{X_2}$ and population variance $(\sigma_{X_2})^2$.
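The two populations are dependent with $\rho=0.5$, and n paired samples are simulated.
$H_0:\rho(X_1,X_2)=\rho_0=0.5$,
\[ Z_r=\frac{1}{2}\ln\left(\frac{1+r}{1-r}\right),\quad Z_{\rho_0}=\frac{1}{2}\ln\left(\frac{1+\rho_0}{1-\rho_0}\right), \]
\[ Z\ \text{test statistic}\ \xrightarrow{\,n>10\,}\ \frac{Z_r-Z_{\rho_0}}{\sqrt{\frac{1}{n-3}}}=\frac{Z_r-Z_{0.70710678118}}{\sqrt{\frac{1}{17}}}=W_9, \]
\[ r=\frac{\sum_{i=1}^{n}\left(X_{1i}-\bar X_1\right)\left(X_{2i}-\bar X_2\right)}{\sqrt{\sum_{i=1}^{n}\left(X_{1i}-\bar X_1\right)^2\sum_{i=1}^{n}\left(X_{2i}-\bar X_2\right)^2}},\quad \bar X_1=\frac{\sum_{i=1}^{n}X_{1i}}{n},\quad \bar X_2=\frac{\sum_{i=1}^{n}X_{2i}}{n}, \]
$Z_r=\frac{1}{2}\ln\left(\frac{1+r}{1-r}\right)$ approaches the standard normal distribution when $n>10$.
$W_9$ does not have a symmetric distribution, $P(W_9\le W_{9,1-\alpha})=\alpha$,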
(1)n=5,
α 0.005 0.01 0.025 0.05 0.1
Critical value -2.661393 -2.316441 -1.846836 -1.474369 -1.073827
α 0.9 0.95 0.975 0.99 0.995
Critical value 1.561924 1.984510 2.372679 2.855005 3.204614
(2)n=10,
α 0.005 0.01 0.025 0.05 0.1
Critical value -2.888845 -2.572160 -2.119369 -1.742329 -1.317886
α 0.9 0.95 0.975 0.99 0.995
Critical value 1.698560 2.160881 2.572568 3.064138 3.408551
(3)n=15,
α 0.005 0.01 0.025 0.05 0.1
Critical value -2.978682 -2.665197 -2.214938 -1.834618 -1.401475
α 0.9 0.95 0.975 0.99 0.995
Critical value 1.723679 2.195487 2.613863 3.111734 3.456902
(4)n=20,
α 0.005 0.01 0.025 0.05 0.1
Critical value -3.034397 -2.722552 -2.271044 -1.886826 -1.447965
α 0.9 0.95 0.975 0.99 0.995
Critical value 1.732903 2.210861 2.632875 3.133993 3.479965
(5)n=25,
α 0.005 0.01 0.025 0.05 0.1
Critical value -3.074671 -2.763184 -2.309700 -1.923198 -1.479528
α 0.9 0.95 0.975 0.99 0.995
Critical value 1.774830 2.217604 2.640761 3.141121 3.487382
(6)n=30,
α 0.005 0.01 0.025 0.05 0.1
Critical value -3.101670 -2.791216 -2.337317 -1.949068 -1.501659
α 0.9 0.95 0.975 0.99 0.995
Critical value 1.740043 2.222071 2.647542 3.150122 3.497317
(7)n=35,
α 0.005 0.01 0.025 0.05 0.1
Critical value -3.125688 -2.814323 -2.358315 -1.967575 -1.517606
α 0.9 0.95 0.975 0.99 0.995
Critical value 1.779227 2.224297 2.649059 3.150684 3.496449
(8)n=40,
α 0.005 0.01 0.025 0.05 0.1
Critical value -3.146775 -2.833434 -2.376144 -1.984586 -1.532513
α 0.9 0.95 0.975 0.99 0.995
Critical value 1.739866 2.224165 2.648785 3.150423 3.495764
(9)n=50,
α 0.005 0.01 0.025 0.05 0.1
Critical value -3.176120 -2.863397 -2.404082 -2.009037 -1.553884
α 0.9 0.95 0.975 0.99 0.995
Critical value 1.741854 2.227245 2.652193 3.151441 3.497727
(10)n=60,
α 0.005 0.01 0.025 0.05 0.1
Critical value -3.195373 -2.881483 -2.420810 -2.024091 -1.565718
α 0.9 0.95 0.975 0.99 0.995
Critical value 1.740656 2.226746 2.653092 3.153596 3.497734
(11)n=100,
α 0.005 0.01 0.025 0.05 0.1
Critical value -3.248393 -2.931687 -2.466615 -2.065029 -1.601086
α 0.9 0.95 0.975 0.99 0.995
Critical value 1.737634 2.223860 2.651814 3.151870 3.495316
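To use the tables of this appendix, the statistic has to be computed from the n paired observations. The sketch below follows the formulas for r, Z_r, and the standardized statistic given in this appendix; the function names and the eight data pairs are hypothetical, and the sketch makes no claim about how the tabulated values themselves were produced.

```python
import math


def pearson_r(x, y):
    """Sample correlation coefficient r of paired observations."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z(r):
    """Fisher transformation (1/2) * ln((1 + r) / (1 - r)), for -1 < r < 1."""
    return 0.5 * math.log((1.0 + r) / (1.0 - r))


def zr_statistic(x, y, rho0):
    """(Z_r - Z_rho0) / sqrt(1 / (n - 3)) for paired samples of size n > 3."""
    n = len(x)
    return (fisher_z(pearson_r(x, y)) - fisher_z(rho0)) / math.sqrt(1.0 / (n - 3))


x = [1.2, 0.4, 2.5, 3.1, 1.9, 2.2, 0.7, 2.8]   # illustrative data
y = [1.0, 0.9, 2.1, 2.4, 2.6, 1.5, 0.2, 3.3]   # illustrative data
w9 = zr_statistic(x, y, rho0=0.5)
print(pearson_r(x, y), w9)
```

The resulting value would be compared with the lower and upper critical values listed for the matching n, since the tabulated distribution is not symmetric.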