Pradytha Galuh Putranti - 2304220013 - SSD - B ING-STAT
Here is how to determine a regression equation from observational data.
1. Freehand Method
This method uses a scatter diagram to visualize the observational data, with the independent variable X plotted on the horizontal axis and the dependent variable Y on the vertical axis. Its benefit is that it reveals the relationship between the two variables and helps determine the type of regression equation. If the points lie around a straight line, a linear regression can be assumed; if they lie around a curve, the regression is nonlinear. The relationship between the variables can be positive, negative, or show no particular pattern. Scatter diagrams support visual analysis and a better understanding of the data.
2. Least Squares Method for Linear Regression
This method is based on the principle that the sum of the squared distances between the observed points and the regression line being sought must be as small as possible. For a population with one independent variable X, the linear regression model is
$$\mu_{Y \cdot X} = \theta_1 + \theta_2 X$$
The parameters θ1 and θ2 are estimated by a and b, so that the regression equation based on sample data is
$$\hat{Y} = a + bX$$
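The regression coefficients a and b for linear regression can be calculated using the formulas
$$a = \frac{(\sum Y_i)(\sum X_i^2) - (\sum X_i)(\sum X_i Y_i)}{n \sum X_i^2 - (\sum X_i)^2}$$
$$b = \frac{n \sum X_i Y_i - (\sum X_i)(\sum Y_i)}{n \sum X_i^2 - (\sum X_i)^2}$$
or, once b is known,
$$a = \bar{Y} - b\bar{X}$$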
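The summation formulas can be checked with a short Python sketch. The data below are hypothetical (not from this document), and the sketch also confirms that the shortcut a = Ȳ − bX̄ gives the same intercept as the full formula for a.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for Y-hat = a + bX using the summation formulas."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sum_x2 - sum_x ** 2  # zero only if all X values are equal
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denom
    b = (n * sum_xy - sum_x * sum_y) / denom
    return a, b


# Hypothetical observations, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = least_squares_line(xs, ys)

# Shortcut: a = mean(Y) - b * mean(X) should agree with the full formula
a_alt = sum(ys) / len(ys) - b * sum(xs) / len(xs)
print(f"a = {a:.4f}, b = {b:.4f}, a (shortcut) = {a_alt:.4f}")
```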
a. Test the significance of the linear relationship between the independent variable and the dependent variable using the formula
$$F = \frac{S^2_{reg}}{S^2_{res}}$$
where S²_reg is the regression variance and S²_res is the residual variance. This test determines whether there is a significant linear relationship between the independent variable and the dependent variable.
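As a rough illustration, F can be computed for a simple linear regression, assuming S²_reg is the regression sum of squares divided by 1 degree of freedom and S²_res is the residual sum of squares divided by n − 2. The data are the snack-company advertising and sales figures from the Excel example in this section; the resulting F would then be compared with the F table value at (1, n − 2) degrees of freedom.

```python
def regression_f(xs, ys):
    """F = S2_reg / S2_res for simple linear regression (1 and n - 2 df)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    ss_reg = sxy ** 2 / sxx       # explained sum of squares, 1 df
    ss_res = syy - ss_reg         # residual sum of squares, n - 2 df
    s2_reg = ss_reg / 1
    s2_res = ss_res / (n - 2)
    return s2_reg / s2_res


xs = [1, 2, 4, 5, 7, 9, 10, 12]   # advertising cost increase (%)
ys = [2, 4, 5, 7, 8, 10, 12, 14]  # sales increase (%)
f = regression_f(xs, ys)
print(f"F = {f:.2f}")
```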
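b. Test the linearity of the regression model using the formula
$$F = \frac{S^2_{TC}}{S^2_{e}}$$
where S²_TC is the lack-of-fit variance and S²_e is the pure error variance. This test determines whether a linear model is suitable for describing the relationship between the independent and dependent variables.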
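The linearity test needs repeated X values, because the pure error variance S²_e is measured from the scatter of Y within groups of identical X. The sketch below assumes the usual split of the residual sum of squares into a lack-of-fit part (TC, with k − 2 degrees of freedom for k distinct X values) and a pure error part (e, with n − k degrees of freedom); the data are hypothetical.

```python
from collections import defaultdict


def linearity_f(xs, ys):
    """Lack-of-fit F = S2_TC / S2_e for a straight-line fit (needs repeated X)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Residual sum of squares of the fitted line
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

    # Group Y values by identical X to measure pure error
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    k = len(groups)  # number of distinct X values
    ss_e = 0.0
    for g in groups.values():
        g_mean = sum(g) / len(g)
        ss_e += sum((y - g_mean) ** 2 for y in g)

    ss_tc = ss_res - ss_e      # lack-of-fit part of the residuals
    s2_tc = ss_tc / (k - 2)    # k - 2 degrees of freedom
    s2_e = ss_e / (n - k)      # n - k degrees of freedom
    return s2_tc / s2_e


# Hypothetical data: two observations at each X value
xs = [1, 1, 2, 2, 3, 3, 4, 4]
ys = [1, 3, 4, 6, 5, 7, 6, 8]
f = linearity_f(xs, ys)
print(f"F = {f:.3f}")
```

The resulting F is compared with the F table at (k − 2, n − k) degrees of freedom; a value below the table value supports the linear model.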
F. Non-Linear Regression
a. Quadratic Parabola Model
The general equation of this model is estimated by
$$\hat{Y} = a + bX + cX^2$$
The coefficients a, b, and c must be determined from the observational data. Using the least squares method, they can be calculated from the system of equations
$$\sum Y_i = na + b\sum X_i + c\sum X_i^2$$
$$\sum X_i Y_i = a\sum X_i + b\sum X_i^2 + c\sum X_i^3$$
$$\sum X_i^2 Y_i = a\sum X_i^2 + b\sum X_i^3 + c\sum X_i^4$$
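b. Cubic Parabola Model
The general equation of this model is estimated by
$$\hat{Y} = a + bX + cX^2 + dX^3$$
and its coefficients follow from the least squares system
$$\sum Y_i = na + b\sum X_i + c\sum X_i^2 + d\sum X_i^3$$
$$\sum X_i Y_i = a\sum X_i + b\sum X_i^2 + c\sum X_i^3 + d\sum X_i^4$$
$$\sum X_i^2 Y_i = a\sum X_i^2 + b\sum X_i^3 + c\sum X_i^4 + d\sum X_i^5$$
$$\sum X_i^3 Y_i = a\sum X_i^3 + b\sum X_i^4 + c\sum X_i^5 + d\sum X_i^6$$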
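Solving these normal equations by hand gets tedious, so here is a minimal Python sketch that builds the system for a polynomial of any degree (degree 2 gives the quadratic model, degree 3 the cubic) and solves it by Gaussian elimination. The helper names are illustrative, and the test data are generated from known coefficients so the fit should recover them.

```python
def solve_linear(m, v):
    """Solve m @ x = v by Gaussian elimination with partial pivoting."""
    size = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(m)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * size
    for r in range(size - 1, -1, -1):  # back substitution
        s = sum(aug[r][c] * x[c] for c in range(r + 1, size))
        x[r] = (aug[r][size] - s) / aug[r][r]
    return x


def polynomial_fit(xs, ys, degree):
    """Coefficients [a, b, c, ...] from the polynomial normal equations."""
    # Equation j: sum(X^j * Y) = sum over k of coef_k * sum(X^(j + k))
    power_sums = [sum(x ** p for x in xs) for p in range(2 * degree + 1)]
    m = [[power_sums[j + k] for k in range(degree + 1)] for j in range(degree + 1)]
    v = [sum((x ** j) * y for x, y in zip(xs, ys)) for j in range(degree + 1)]
    return solve_linear(m, v)


xs = [0, 1, 2, 3, 4]
ys = [1 + 2 * x + 3 * x ** 2 for x in xs]  # hypothetical: a=1, b=2, c=3
coef2 = polynomial_fit(xs, ys, 2)
print(coef2)  # close to [1.0, 2.0, 3.0]
```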
c. Exponential Model
The general equation of this model is estimated by
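$$\hat{Y} = ab^X$$
Taking logarithms gives a form that is linear in X:
$$\log \hat{Y} = \log a + (\log b)X$$
where
$$\log a = \frac{\sum \log Y_i}{n} - (\log b)\left(\frac{\sum X_i}{n}\right)$$
$$\log b = \frac{n\left(\sum X_i \log Y_i\right) - \left(\sum X_i\right)\left(\sum \log Y_i\right)}{n\sum X_i^2 - \left(\sum X_i\right)^2}$$
The model is also often written with base e as
$$\hat{Y} = ae^{bX}$$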
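The exponential model is fitted by applying the linear formulas to log Y. A minimal sketch, assuming base-10 logarithms as in the formulas and strictly positive Y values; the data are hypothetical, generated from a = 2 and b = 3.

```python
import math


def fit_exponential(xs, ys):
    """Fit Y-hat = a * b**X with the log-linear least squares formulas."""
    n = len(xs)
    log_y = [math.log10(y) for y in ys]  # requires every Y > 0
    sum_x = sum(xs)
    sum_logy = sum(log_y)
    sum_x2 = sum(x * x for x in xs)
    sum_x_logy = sum(x * ly for x, ly in zip(xs, log_y))
    log_b = (n * sum_x_logy - sum_x * sum_logy) / (n * sum_x2 - sum_x ** 2)
    log_a = sum_logy / n - log_b * (sum_x / n)
    return 10 ** log_a, 10 ** log_b


xs = [0, 1, 2, 3, 4]
ys = [2 * 3 ** x for x in xs]  # hypothetical exact data: a = 2, b = 3
a, b = fit_exponential(xs, ys)
print(f"a = {a:.4f}, b = {b:.4f}")
```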
d. Geometric Model
The general equation of this model is estimated by
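$$\hat{Y} = aX^b$$
$$\log \hat{Y} = \log a + b\log X$$
$$\log a = \frac{\sum \log Y_i}{n} - b\left(\frac{\sum \log X_i}{n}\right)$$
$$b = \frac{n\left(\sum \log X_i \log Y_i\right) - \left(\sum \log X_i\right)\left(\sum \log Y_i\right)}{n\sum \log^2 X_i - \left(\sum \log X_i\right)^2}$$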
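The geometric model works the same way, but on log X and log Y. The sketch assumes all X and Y values are positive and uses base-10 logarithms; the data are hypothetical, generated from a = 5 and b = 1.5.

```python
import math


def fit_geometric(xs, ys):
    """Fit Y-hat = a * X**b by regressing log Y on log X."""
    n = len(xs)
    lx = [math.log10(x) for x in xs]  # requires every X > 0
    ly = [math.log10(y) for y in ys]  # requires every Y > 0
    sum_lx, sum_ly = sum(lx), sum(ly)
    sum_lx2 = sum(v * v for v in lx)
    sum_lxly = sum(u * v for u, v in zip(lx, ly))
    b = (n * sum_lxly - sum_lx * sum_ly) / (n * sum_lx2 - sum_lx ** 2)
    log_a = sum_ly / n - b * (sum_lx / n)
    return 10 ** log_a, b


xs = [1, 2, 4, 8, 16]
ys = [5 * x ** 1.5 for x in xs]  # hypothetical exact data: a = 5, b = 1.5
a, b = fit_geometric(xs, ys)
print(f"a = {a:.4f}, b = {b:.4f}")
```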
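e. Logistic Model
The simplest logistic model can be estimated by
$$\hat{Y} = \frac{1}{ab^X}$$
$$\log\left(\frac{1}{\hat{Y}}\right) = \log a + (\log b)X$$
$$\log a = \frac{\sum \log(1/Y_i)}{n} - (\log b)\left(\frac{\sum X_i}{n}\right)$$
$$\log b = \frac{n\left(\sum X_i \log(1/Y_i)\right) - \left(\sum X_i\right)\left(\sum \log(1/Y_i)\right)}{n\sum X_i^2 - \left(\sum X_i\right)^2}$$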
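For this simple logistic form, the linear formulas are applied to log(1/Y). A minimal sketch under the same assumptions as before (base-10 logarithms, positive Y); the data are hypothetical, generated from a = 4 and b = 0.5.

```python
import math


def fit_simple_logistic(xs, ys):
    """Fit Y-hat = 1 / (a * b**X) by regressing log(1/Y) on X."""
    n = len(xs)
    z = [math.log10(1 / y) for y in ys]  # requires every Y > 0
    sum_x, sum_z = sum(xs), sum(z)
    sum_x2 = sum(x * x for x in xs)
    sum_xz = sum(x * v for x, v in zip(xs, z))
    log_b = (n * sum_xz - sum_x * sum_z) / (n * sum_x2 - sum_x ** 2)
    log_a = sum_z / n - log_b * (sum_x / n)
    return 10 ** log_a, 10 ** log_b


xs = [0, 1, 2, 3, 4, 5]
ys = [1 / (4 * 0.5 ** x) for x in xs]  # hypothetical exact data: a = 4, b = 0.5
a, b = fit_simple_logistic(xs, ys)
print(f"a = {a:.4f}, b = {b:.4f}")
```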
f. Hyperbola Model
The simple general equation for the hyperbola model can be written in the form
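$$\hat{Y} = \frac{1}{a + bX}$$
which is linear in 1/Y:
$$\frac{1}{\hat{Y}} = a + bX$$
$$a = \frac{\left(\sum \frac{1}{Y_i}\right)\left(\sum X_i^2\right) - \left(\sum X_i\right)\left(\sum \frac{X_i}{Y_i}\right)}{n\sum X_i^2 - \left(\sum X_i\right)^2}$$
$$b = \frac{n\sum \frac{X_i}{Y_i} - \left(\sum X_i\right)\left(\sum \frac{1}{Y_i}\right)}{n\sum X_i^2 - \left(\sum X_i\right)^2}$$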
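For the hyperbola model the transformation is 1/Y, so the linear summation formulas apply with 1/Y in place of Y. A minimal sketch, assuming no Y value is zero; the data are hypothetical, generated from a = 0.5 and b = 0.25.

```python
def fit_hyperbola(xs, ys):
    """Fit Y-hat = 1 / (a + b*X) with the 1/Y summation formulas."""
    n = len(xs)
    inv_y = [1 / y for y in ys]  # requires every Y != 0
    sum_x = sum(xs)
    sum_inv = sum(inv_y)
    sum_x2 = sum(x * x for x in xs)
    sum_x_inv = sum(x * w for x, w in zip(xs, inv_y))
    denom = n * sum_x2 - sum_x ** 2
    a = (sum_inv * sum_x2 - sum_x * sum_x_inv) / denom
    b = (n * sum_x_inv - sum_x * sum_inv) / denom
    return a, b


xs = [1, 2, 3, 4, 5]
ys = [1 / (0.5 + 0.25 * x) for x in xs]  # hypothetical exact data
a, b = fit_hyperbola(xs, ys)
print(f"a = {a:.4f}, b = {b:.4f}")
```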
G. Multiple Linear Regression
Previously we discussed the linear relationship between two variables X and Y using the linear regression equation Ŷ = a + bX.
In reality, much observational data involves more than two variables. For example, rice yield (Y) is influenced by fertilizer use (X₁), rice field area (X₂), and rainfall (X₃). In general, the observed variable Y can be influenced by the independent variables X₁, X₂, ..., Xₖ, and the multiple linear regression equation is
$$\hat{Y} = a + b_1X_1 + b_2X_2 + \dots + b_kX_k$$
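The coefficients a, b₁, ..., bₖ are obtained from the normal equations
$$an + b_1\sum X_1 + b_2\sum X_2 + \dots + b_k\sum X_k = \sum Y$$
$$a\sum X_1 + b_1\sum X_1^2 + b_2\sum X_1X_2 + \dots + b_k\sum X_1X_k = \sum X_1Y$$
$$a\sum X_2 + b_1\sum X_2X_1 + b_2\sum X_2^2 + \dots + b_k\sum X_2X_k = \sum X_2Y$$
$$\vdots$$
$$a\sum X_k + b_1\sum X_kX_1 + b_2\sum X_kX_2 + \dots + b_k\sum X_k^2 = \sum X_kY$$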
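The normal equations can be assembled mechanically: each entry is a sum of products of two variables, with a column of ones standing in for the intercept a. A minimal Python sketch, with illustrative helper names and hypothetical data generated from a = 1, b₁ = 2, b₂ = 3:

```python
def solve_linear(m, v):
    """Solve m @ x = v by Gaussian elimination with partial pivoting."""
    size = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(m)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * size
    for r in range(size - 1, -1, -1):  # back substitution
        s = sum(aug[r][c] * x[c] for c in range(r + 1, size))
        x[r] = (aug[r][size] - s) / aug[r][r]
    return x


def multiple_regression(rows, ys):
    """Return [a, b1, ..., bk]; each row holds one observation [X1, ..., Xk]."""
    # A leading 1 lets the intercept a be treated like the other coefficients
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    # Entry (j, k) is the sum of Xj * Xk; right-hand side j is the sum of Xj * Y
    m = [[sum(d[j] * d[k] for d in design) for k in range(p)] for j in range(p)]
    v = [sum(d[j] * y for d, y in zip(design, ys)) for j in range(p)]
    return solve_linear(m, v)


rows = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5]]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]  # hypothetical exact data
coef = multiple_regression(rows, ys)
print(coef)  # close to [1.0, 2.0, 3.0]
```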
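For example, with n = 26, ΣX = 4209, ΣY = 1349.8, and b = 0.42:
$$a = \bar{Y} - b\bar{X} = \frac{1349.8}{26} - (0.42)\left(\frac{4209}{26}\right) = -16.08$$
For the regression of X on Y, the slope d is
$$d = \frac{n\sum X_iY_i - (\sum X_i)(\sum Y_i)}{n\sum Y_i^2 - (\sum Y_i)^2} = \frac{26(218682.4) - (4209)(1349.8)}{26(70816.51) - (1349.8)^2} = 0.23$$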
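The arithmetic in this worked example can be re-checked directly. The sums and b = 0.42 are taken from the fragment above (the rest of that example is not shown in these notes), so this only verifies the two calculations:

```python
n = 26
sum_x, sum_y = 4209, 1349.8
sum_xy, sum_y2 = 218682.4, 70816.51
b = 0.42  # slope of Y on X, taken as given in the example

a = sum_y / n - b * (sum_x / n)                                 # a = mean(Y) - b * mean(X)
d = (n * sum_xy - sum_x * sum_y) / (n * sum_y2 - sum_y ** 2)    # slope of X on Y
print(round(a, 2), round(d, 2))
```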
5. Enter SRESID in the Y box and ZPRED in the X box, then check Normal probability plot.
Next, click Continue, then OK.
Interpretation of Output Results
Output Variables Entered/Removed
The output shows that the independent variable entered into the model is Price, the dependent variable is Income, and no variables were removed. The regression method used is Enter.
- The regression coefficient of the Price variable (b) is positive, 0.685. This means that each IDR 1 increase in production costs is associated with an increase of IDR 0.685 in the sales level.
t test
The t test is used here to determine whether production costs have a significant effect on sales levels. The test uses a significance level of 0.05 and is two-sided. The steps are as follows:
1. Formulate the hypotheses
Ho: Production costs have no effect on sales levels.
Ha: Production costs affect sales levels.
2. Determine t count and its significance
From the output, t count = 2.252 with a significance of 0.048.
3. Determine t table
t table is read from the statistical table at a significance of 0.05/2 = 0.025 with degrees of freedom df = n − 2 = 12 − 2 = 10, giving t table = 2.228 (see the t table attachment).
4. Testing criteria
If −t table ≤ t count ≤ t table, Ho is accepted.
If t count < −t table or t count > t table, Ho is rejected.
5. Based on significance
If significance > 0.05, Ho is accepted.
If significance < 0.05, Ho is rejected.
6. Draw conclusions
Since t count > t table (2.252 > 2.228) and significance < 0.05 (0.048 < 0.05), Ho is rejected, so it can be concluded that production costs have an effect on sales levels.
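The two decision rules (t count against t table, and significance against α) can be written as a small helper. This is only a sketch of the decision logic; t count, t table, and the significance value still come from the SPSS output and the t table.

```python
def t_test_decision(t_count, t_table, significance, alpha=0.05):
    """Apply both two-sided decision rules; True means Ho is rejected."""
    reject_by_t = t_count < -t_table or t_count > t_table
    reject_by_sig = significance < alpha
    return reject_by_t, reject_by_sig


# Values from the production cost example
print(t_test_decision(2.252, 2.228, 0.048))  # (True, True): reject Ho
```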
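b. Autocorrelation Test
Autocorrelation is correlation between members of a series of observations ordered in time or space. A good regression model should not have autocorrelation. The test uses the Durbin-Watson test (DW test), with hypotheses
Ho: ρ = 0 (no autocorrelation)
Ha: ρ ≠ 0 (there is autocorrelation)
and the following decision criteria:
- d < dL: Ho is rejected (positive autocorrelation)
- dL ≤ d ≤ dU: inconclusive
- dU < d < 4 − dU: Ho is accepted (no autocorrelation)
- 4 − dU ≤ d ≤ 4 − dL: inconclusive
- d > 4 − dL: Ho is rejected (negative autocorrelation)
Durbin-Watson test results:
- d value = 0.037 (in the Ho rejection region)
- N = 12
- K' = number of independent variables excluding the intercept (here K' = 1)
- dL = 0.971
- dU = 1.331
- 4 − dU = 4 − 1.331 = 2.669
- 4 − dL = 4 − 0.971 = 3.029
- Significance level = 5%
Conclusion: since d = 0.037 < dL = 0.971, the DW value falls in the positive autocorrelation region.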
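A minimal sketch of the Durbin-Watson statistic and the standard decision regions. The residuals passed to durbin_watson would come from the fitted model (they are not listed in these notes), and dL and dU still have to be read from the DW table for the given N and K'.

```python
def durbin_watson(residuals):
    """d = sum of squared successive differences / sum of squared residuals."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


def dw_decision(d, dl, du):
    """Classify d using the Durbin-Watson decision regions."""
    if d < dl:
        return "positive autocorrelation"
    if d <= du:
        return "inconclusive"
    if d < 4 - du:
        return "no autocorrelation"
    if d <= 4 - dl:
        return "inconclusive"
    return "negative autocorrelation"


# Values from the example: d = 0.037, dL = 0.971, dU = 1.331
print(dw_decision(0.037, dl=0.971, du=1.331))  # positive autocorrelation
```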
c. Heteroscedasticity Test
Heteroscedasticity occurs when the residual variance is not the same for all observations in the regression model. A good regression model should not have heteroscedasticity. Below, the heteroscedasticity test is carried out with the graphical method, that is, by examining the pattern of points on the regression scatterplot. The basic decision criteria are:
- If the points form a certain regular pattern (wavy, or widening then narrowing), heteroscedasticity occurs.
- If there is no clear pattern and the points are spread above and below 0 on the Y axis, heteroscedasticity does not occur.
The results of the heteroscedasticity test can be seen in the regression output and are displayed as follows:
The output shows that the points do not form a clear pattern and are spread above and below 0 on the Y axis, so it can be concluded that heteroscedasticity does not occur in the regression model.
CORRELATION ANALYSIS
Correlation analysis studies the degree (strength) of the relationship between two or more variables. The measure of this degree of relationship is called the correlation coefficient. Simply put, correlation analysis is a way to find out whether variables are related. The correlation coefficient is a number that shows both the direction and the strength of the relationship between two or more variables. The direction is expressed as a positive or a negative relationship:
Positive relationship
• If the value of one variable increases, the value of the other variable also increases.
• If the value of one variable decreases, the value of the other variable also decreases.
Negative relationship
• If the value of one variable increases, the value of the other variable decreases.
• If the value of one variable decreases, the value of the other variable increases.
Strength of relationship
• The strength of the relationship is expressed as a number between 0 and 1 (the absolute value of the coefficient). The number 0 indicates no relationship, and the number 1 indicates a perfect relationship.
• For more detail, see the following table of correlation levels and strength of relationship:
Example
• If r = −1, there is a perfect negative correlation. This indicates an inverse relationship between variable X and variable Y: as X increases, Y decreases.
• If r = +1, there is a perfect positive correlation. This indicates a unidirectional relationship between variable X and variable Y: as X increases, Y also increases.
Correlation coefficient
Correlation Techniques
The following are guidelines for choosing a correlation technique based on the type of data
used:
Example of a correlation problem:
The following table shows the authoritarianism scores and social struggle scores of 12 students:
Student   Authoritarianism score   Social struggle score
A         82                       42
B         98                       46
C         87                       39
D         40                       37
E         116                      65
F         113                      88
G         111                      86
H         83                       56
I         85                       62
J         126                      92
K         106                      54
L         117                      81
The computation of X², Y², and XY for the 12 students (X = authoritarianism score, Y = social struggle score) is:
Student  X     Y    X²      Y²     XY
A        82    42   6724    1764   3444
B        98    46   9604    2116   4508
C        87    39   7569    1521   3393
D        40    37   1600    1369   1480
E        116   65   13456   4225   7540
F        113   88   12769   7744   9944
G        111   86   12321   7396   9546
H        83    56   6889    3136   4648
I        85    62   7225    3844   5270
J        126   92   15876   8464   11592
K        106   54   11236   2916   5724
L        117   81   13689   6561   9477
Total    1164  748  118958  51056  76566
$$r_{xy} = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{n\sum X^2 - (\sum X)^2}\,\sqrt{n\sum Y^2 - (\sum Y)^2}} = \frac{12(76566) - (1164)(748)}{\sqrt{12(118958) - (1164)^2}\,\sqrt{12(51056) - (748)^2}}$$
$$= \frac{918792 - 870672}{\sqrt{1427496 - 1354896}\,\sqrt{612672 - 559504}} = \frac{48120}{\sqrt{72600}\,\sqrt{53168}} = \frac{48120}{62128.87} = 0.7745$$
r_xy ≈ 0.77
t count:
$$t_0 = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{0.7745\sqrt{12-2}}{\sqrt{1-(0.7745)^2}} = 3.87$$
α = 0.01, df = 12 − 2 = 10, t table: t(0.005; 10) = 3.17
Conclusion: reject Ho because |t0| > t(α/2), that is, 3.87 > 3.17.
Meaning: there is a significant relationship between the authoritarianism score and the social struggle score.
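The correlation coefficient and t count can be reproduced in Python, computing X², Y², and XY directly from the raw scores rather than from a hand-built table. A minimal sketch using only the standard library; the t table value 3.17 is still read from the t table.

```python
import math

x = [82, 98, 87, 40, 116, 113, 111, 83, 85, 126, 106, 117]  # authoritarianism
y = [42, 46, 39, 37, 65, 88, 86, 56, 62, 92, 54, 81]        # social struggle
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(v * v for v in x)
sum_y2 = sum(v * v for v in y)
sum_xy = sum(a * b for a, b in zip(x, y))

# Pearson product-moment correlation from the summation formula
r = (n * sum_xy - sum_x * sum_y) / (
    math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
)
# t count for testing Ho: no correlation, with n - 2 degrees of freedom
t0 = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
print(sum_xy, round(r, 4), round(t0, 2))
```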
Regression and Correlation Hypothesis Testing with Microsoft Excel
Case:
The following is sales data from a snack company:
X : percentage increase in advertising costs
Y: percentage increase in sales results
X 1 2 4 5 7 9 10 12
Y 2 4 5 7 8 10 12 14
Determine the regression equation from the sales data.
Solution:
1. Type the data to be analyzed, then label the columns Advertising Costs and Sales Results.
In general, the regression analysis output is arranged in three tables. The output shows that the constant (Intercept) is 1.267 and the b coefficient (X Variable) is 1.037. The coefficient of determination appears in the first table, Regression Statistics, which shows an R Square value of 0.984.
This means that 98.4% of the variation in the Sales Results variable can be explained by changes in the Advertising Costs variable, while the remaining 1.6% is due to other variables that are not observed. From the regression results, the following regression equation is obtained:
Ŷ = a + bX = 1.267 + 1.037X, where Ŷ is the predicted Sales Results and X is Advertising Costs.
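The Excel output can be cross-checked in a few lines of Python. This sketch uses the deviation form of the least squares formulas (equivalent to the summation formulas) and computes R Square as the squared correlation, which holds for simple linear regression.

```python
x = [1, 2, 4, 5, 7, 9, 10, 12]   # advertising cost increase (%)
y = [2, 4, 5, 7, 8, 10, 12, 14]  # sales increase (%)
n = len(x)

mean_x, mean_y = sum(x) / n, sum(y) / n
sxx = sum((v - mean_x) ** 2 for v in x)
sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(x, y))
syy = sum((v - mean_y) ** 2 for v in y)

b = sxy / sxx                 # slope (X Variable coefficient)
a = mean_y - b * mean_x       # Intercept
r_square = sxy ** 2 / (sxx * syy)
print(f"Y-hat = {a:.3f} + {b:.3f}X, R Square = {r_square:.3f}")
```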
Once it is known that there is a relationship (correlation) between the independent variable and the dependent variable, we can examine how strong the relationship between the two variables is. The steps for correlation analysis in Microsoft Excel are as follows.
1. Type the data to be analyzed, then label the columns Advertising Costs and Sales Results.
3. The Correlation dialog box opens. In the Input section, enter the data range of the two variables whose relationship will be tested, Advertising Costs and Sales Results, by selecting the corresponding cells. Under Grouped By, select the Columns option, and check the Labels in First Row box to display the variable labels.
4. In the Output Options section, select the Output Range option and click an empty cell on the worksheet to display the results on the same worksheet. Then click OK.
5. The correlation analysis produces the following results.
The output shows that the correlation between the Advertising Costs variable and the Sales Results variable is 0.99209. Given this large value, it can be concluded that Advertising Costs have a very strong positive relationship with Sales Results: as advertising costs increase, sales results also tend to increase linearly.
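The Correlation tool's result can be reproduced with the Pearson summation formula used for the authoritarianism example. A minimal sketch; squaring r should give back the R Square of 0.984 reported in the regression output.

```python
import math

x = [1, 2, 4, 5, 7, 9, 10, 12]   # advertising cost increase (%)
y = [2, 4, 5, 7, 8, 10, 12, 14]  # sales increase (%)
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(v * v for v in x)
sum_y2 = sum(v * v for v in y)
sum_xy = sum(u * v for u, v in zip(x, y))

r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
print(round(r, 5), round(r ** 2, 3))
```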