
Addis Ababa University

College of Business and Economics


Department of Accounting and Finance
MSc in Accounting and Finance

Applied Econometrics for Accounting and Finance
Assignment #2

Prepared by: Yaregal Birhanu (GSE/9996/15), Section 2
Submitted to: Temesgen (PhD)
AAU, CoBE
Dec 29, 2023
Part I:

1. Cross-Sectional Data: a type of observational data collected at a single point in time or
over a very short period. Each observation represents a distinct individual or unit, and the
data is collected at a specific moment or within a specific time frame. Key characteristics
of cross-sectional data: a snapshot in time, no time dimension, and heterogeneity across
units.

2. Time Series Data: a type of data that is collected or recorded over time, typically at regular
intervals. It involves the observation of a single variable or several variables over successive
periods.

3. Panel Data: it combines elements of both cross-sectional and time series data. It involves
observations on multiple subjects or entities over multiple time periods.

4. Pooled Cross-Sectional Data: a type of data that combines cross-sectional observations from
multiple time periods.
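To illustrate how these structures are handled in practice, the sketch below shows how each type is declared in Stata (a minimal sketch; the variable names firm_id and year are hypothetical):

* Cross-sectional data need no special declaration
* Time series: one unit observed over successive periods
tsset year
* Panel data: many units, each observed over several periods
xtset firm_id year
* Pooled cross-sections are kept as cross-sections with a year indicator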

5. Correlation: measures the strength and direction of a linear relationship between two variables.
Regression: is about prediction and understanding the quantitative relationship between variables.
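In Stata the two are obtained with different commands, as in this minimal sketch (assuming the wage and educ variables from dataset2 used in Part II):

* Correlation: a single symmetric measure of linear association
corr wage educ
* Regression: an equation for predicting wage from educ
reg wage educ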

6. Ordinary Least Squares: the minimization of the sum of the squared residuals.
 a method used in regression analysis to estimate the parameters of a linear regression model
 provides estimates of the coefficients that minimize the sum of squared differences between the observed and predicted values
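For the simple two-variable model, the OLS problem and its standard solution can be written as:

\min_{\hat\beta_0,\hat\beta_1}\sum_{i=1}^{n}\left(y_i-\hat\beta_0-\hat\beta_1 x_i\right)^2,\qquad \hat\beta_1=\frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2},\qquad \hat\beta_0=\bar{y}-\hat\beta_1\bar{x}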

7. The role of the error term: it represents the unobservable factors that affect the dependent
variable but are not explicitly included in the model.

8. The difference between the error term and residual: the error term represents unobservable
factors in the population model, while residuals are the observed differences between actual
and predicted values.
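In Stata, residuals can be recovered after estimation, as in this minimal sketch (uhat is a hypothetical variable name):

reg wage educ exper tenure
predict uhat, residuals   // uhat = actual wage minus fitted wage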

9. Confidence Interval: a statistical tool used to quantify the uncertainty or variability associated
with a point estimate of a population parameter. It provides a range of values within which we
can be reasonably confident that the true parameter lies.
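For a regression coefficient, the interval takes the standard form

\hat\beta_j \pm t_{\alpha/2,\;n-k-1}\cdot \mathrm{se}(\hat\beta_j)

where n is the sample size, k the number of regressors, and 1 − α the confidence level.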

10. The assumptions for Ordinary Least Square:

 Linearity: the relationship between the dependent variable and the independent variables
is linear. This means that changes in the independent variables have a constant effect on
the dependent variable.

 Homoskedasticity: the variance of the residuals is constant across all levels of the
independent variables. This means that the spread of residuals should be roughly the
same for all values of the independent variables.
 No Perfect Multicollinearity: no perfect linear relationship among the independent
variables. Multicollinearity occurs when two or more independent variables are highly
correlated, making it difficult to separate their individual effects on the dependent
variable.
 Normality: the distribution of the residuals is normal.

11. The nature, cause, and consequence of multicollinearity

 Nature of Multicollinearity:
 High Correlation: high correlation coefficients between independent variables,
indicating that changes in one variable are associated with systematic changes in
another.
 Interpretation Challenge: In the presence of multicollinearity, it becomes
challenging to recognize the individual impact of each independent variable on the
dependent variable.
 Causes of Multicollinearity:
 Data Collection Methods: two variables may be highly correlated because they are
derived from similar sources or measured in a similar way.
 Overlapping Concepts: Variables that measure similar or overlapping concepts are
likely to be correlated.
 Functional Relationships: one independent variable may be an exact or near-exact
function of other independent variables.
 Sample Size: In small samples, the estimation of correlation coefficients may be
imprecise, leading to difficulties in identifying multicollinearity.
 Consequences of Multicollinearity:
 Increased Standard Errors: Multicollinearity inflates the standard errors of the
regression coefficients, making them less precise.
 Unstable Coefficients: Small changes in the data can lead to large changes in the
estimated coefficients, making the results unstable and sensitive to variations in the
sample.
 Ambiguous Variable Importance: Multicollinearity makes it challenging to
determine which variables are truly important in explaining the variation in the
dependent variable.
 Inefficient Estimation: The efficiency of the parameter estimates is compromised, as
the model struggles to disentangle the effects of highly correlated variables.
 Misleading Interpretations: can lead to misleading interpretations of the
relationships between independent variables and the dependent variable. It may
suggest that certain variables are not important when, in fact, they are.
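A common diagnostic is the variance inflation factor, available in Stata after estimation; the sketch below uses the Part II salary regression (a VIF above roughly 10 is the usual informal warning sign):

reg salary sales profits mktval
estat vif                     // variance inflation factor for each regressor
corr sales profits mktval     // pairwise correlations as a quick check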

12. The importance of the normality assumption in OLS:

 Statistical Inference: the normality assumption is particularly relevant when conducting
hypothesis tests and constructing confidence intervals for the regression coefficients.
Under normality, the sampling distribution of the OLS estimators is known, and standard
statistical tests can be applied.

 t-Tests and p-Values: inference about individual coefficients relies on t-tests, which
assume that the sampling distribution of the estimated coefficients is approximately
normal. The p-values associated with these tests are accurate when the normality
assumption holds.

 Confidence Intervals: the construction of confidence intervals for the coefficients
assumes normality. Normality allows for the use of critical values from the standard
normal distribution, making it easier to determine the bounds of the confidence interval.

 Large Sample Size Mitigation: for large sample sizes, the central limit theorem suggests
that the distribution of the sample mean approaches normality, even if the underlying
distribution of the error term is not exactly normal. This means that the normality
assumption becomes less critical with larger samples.
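These points can be checked on the estimated residuals, as in this minimal sketch (r is a hypothetical variable name):

reg wage educ exper tenure
predict r, residuals
histogram r, normal      // residual histogram with a normal density overlaid
sktest r                 // skewness/kurtosis test of normality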

13. The nature, cause, and consequence of heteroscedasticity

 Heteroscedasticity arises when the variability of the errors (residuals) in a regression
model is not constant across all levels of the independent variable(s).
 Nature of Heteroscedasticity:
o Unequal Spread: Heteroscedasticity manifests as an unequal spread of residuals.
This means that the variability of the errors systematically changes across
different values of the independent variable(s).
 Causes of Heteroscedasticity:

o Omitted Variables: Failure to include relevant variables in the model that affect
the variability of the dependent variable.
o Transformation Issues: The use of transformations (e.g., logarithmic
transformations) on variables may introduce heteroscedasticity if the
transformation affects variability differently across levels of the independent
variable.
o Measurement Error: measurement error in the dependent variable that varies
systematically with the independent variable introduces non-constant error variance.
 Consequences of Heteroscedasticity:

o Inefficient Estimates: Heteroscedasticity leads to inefficient estimates of the
standard errors of the regression coefficients. This means that the precision of the
estimates is compromised.

o Incorrect Inference: The violation of the homoscedasticity assumption can lead
to incorrect inferences about the statistical significance of coefficients. Standard
hypothesis tests and confidence intervals may be unreliable.

o Misleading Conclusions: If heteroscedasticity is present but not addressed, it can
lead to incorrect conclusions about the importance and significance of variables in
the model.

o Violation of Assumptions: Heteroscedasticity is a violation of one of the
classical assumptions of OLS regression. While OLS remains unbiased, the
assumptions about the precision of the estimates are not met.
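In Stata, heteroscedasticity is commonly detected with the Breusch-Pagan test and handled with robust standard errors, as in this minimal sketch using the Part II wage regression:

reg wage educ exper tenure
estat hettest                             // Breusch-Pagan test; a small p-value signals heteroscedasticity
reg wage educ exper tenure, vce(robust)   // heteroscedasticity-robust standard errors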

Part II: Using dataset1 and dataset2


1. a)
reg salary sales profits mktval

      Source |       SS           df       MS      Number of obs   =       177
-------------+----------------------------------   F(3, 173)       =     12.46
       Model |  10800004.4         3  3600001.45   Prob > F        =    0.0000
    Residual |  49965960.4       173   288820.58   R-squared       =    0.1777
-------------+----------------------------------   Adj R-squared   =    0.1635
       Total |  60765964.7       176  345261.163   Root MSE        =    537.42

------------------------------------------------------------------------------
      salary |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       sales |   .0159837    .011093     1.44   0.151    -.0059112    .0378787
     profits |   .0317025   .2764857     0.11   0.909     -.514017    .5774221
      mktval |    .023831   .0159338     1.50   0.137    -.0076187    .0552807
       _cons |   717.0624   47.51152    15.09   0.000     623.2855    810.8393
b) Sales:
 Coefficient: 0.0159837
 Interpretation: Holding other variables constant, a one-unit increase in sales is associated
with an increase of approximately 0.016 units in salary. However, the p-value (P>|t|) is
0.151, suggesting that the relationship is not statistically significant at the conventional
0.05 significance level.

Profits:
 Coefficient: 0.0317025
 Interpretation: Holding other variables constant, a one-unit increase in profits is
associated with an increase of approximately 0.032 units in salary. However, the p-value
is 0.909, indicating that the relationship is not statistically significant.
Market Value:
 Coefficient: 0.023831
 Interpretation: Holding other variables constant, a one-unit increase in market value is
associated with an increase of approximately 0.024 units in salary. The p-value is 0.137,
suggesting that the relationship is not statistically significant at the conventional 0.05
significance level.
Intercept (_cons):
 Coefficient: 717.0624
 Interpretation: When all independent variables are zero (sales, profits, and market value
are all zero), the estimated average salary is 717.06. This is the intercept or the baseline
salary.
c) Only the intercept (baseline salary) is statistically significant at the 0.05 level (p = 0.000);
sales, profits, and mktval are individually insignificant, although the overall F-test
(Prob > F = 0.0000) shows the regressors are jointly significant.
d) The coefficient of determination (R2) provides the proportion of the variation in the
dependent variable (CEO salaries) explained by the independent variables (sales, profits, and
market value) in the regression model. R2 value is 0.1777, which means that approximately
17.77% of the total variation in CEO salaries is explained by the variation in sales, profits, and
market value.
e) CEO Tenure (ceoten): Coefficient is 12.73086.
For each additional year of CEO tenure, there is an estimated increase of approximately 12.731
units in CEO salary.
f) Linear Term
 The coefficient β2 represents the linear effect of age on salary.
 If β2 is positive, it indicates a positive linear relationship. As age increases, salary
increases.
 If β2 is negative, it indicates a negative linear relationship. As age increases,
salary decreases.
Quadratic Term
 The coefficient β3 represents the quadratic effect of age on salary.
 The quadratic term introduces curvature to the relationship. If β3 is positive, the
relationship is convex (U-shaped): the effect of age on salary becomes more positive
as age increases, so salary may initially fall with age and then rise at higher ages.
 If β3 is negative, the relationship is concave (inverted U-shaped): salary initially
increases with age, but the rate of increase slows down, and eventually salary may
start decreasing at very high ages.

The turning point is the age at which the quadratic effect transitions the relationship from
increasing to decreasing (or vice versa); it is the age at which the effect of age on salary
changes direction, found by setting the marginal effect of age to zero, as shown below.
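\frac{\partial\,\text{salary}}{\partial\,\text{age}}=\beta_2+2\beta_3\,\text{age}=0\quad\Longrightarrow\quad \text{age}^{*}=-\frac{\beta_2}{2\beta_3}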
2. a) Lowest education = 0; highest = 18
b) Average education = 12.56274
c) Average wage = 5.896103
d) Female = 252
e) Male = 274
f) People with wages above the mean = 217
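These figures can be reproduced with standard summary commands, as in this minimal sketch (assuming dataset2 contains educ, wage, and female, with female coded 1 for women):

summ educ                  // minimum, maximum, and mean of education
summ wage                  // mean wage; stored in r(mean)
count if wage > r(mean)    // number of people earning above the mean wage
tab female                 // counts of females (1) and males (0)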
g) wage = β0 + β1educ + β2exper + β3tenure + u
i) Interpretation of Parameters in the Model:
reg wage educ exper tenure

      Source |       SS           df       MS      Number of obs   =       526
-------------+----------------------------------   F(3, 522)       =     76.87
       Model |   2194.1116         3  731.370532   Prob > F        =    0.0000
    Residual |  4966.30269       522  9.51398984   R-squared       =    0.3064
-------------+----------------------------------   Adj R-squared   =    0.3024
       Total |  7160.41429       525  13.6388844   Root MSE        =    3.0845

------------------------------------------------------------------------------
        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .5989651   .0512835    11.68   0.000     .4982176    .6997126
       exper |   .0223395   .0120568     1.85   0.064    -.0013464    .0460254
      tenure |   .1692687   .0216446     7.82   0.000     .1267474    .2117899
       _cons |  -2.872735   .7289643    -3.94   0.000    -4.304799   -1.440671
 β0: The intercept. It represents the estimated wage when all independent variables (educ,
exper, and tenure) are zero.
 β1: The coefficient for educ. It represents the estimated change in wage for a one-unit
change in education level, holding exper and tenure constant.
 β2: The coefficient for exper. It represents the estimated change in wage for a one-unit
change in years of experience, holding educ and tenure constant.
 β3: The coefficient for tenure. It represents the estimated change in wage for a one-unit
change in tenure, holding educ and exper constant.
 u: The error term, capturing unexplained factors that affect wage but are not included in
the model.
ii) log(wage) = β0 + β1educ + β2exper + β3tenure + u
The interpretation is similar, but the coefficients now represent approximate proportional
changes: a one-unit change in an independent variable is associated with approximately a
100·βj percent change in wage, holding the other variables constant.
iii) Interpretation of R-Squared:
 R2 (coefficient of determination) measures the proportion of variability in the dependent
variable (wage or log(wage)) explained by the independent variables.

 For example, if R2=0.80, it means that 80% of the variability in wage (or log(wage)) is
explained by the variables in the model
iv) Interpretation of Adjusted R-Squared:
 Adjusted R2 takes into account the number of variables in the model. It penalizes the
addition of variables that do not improve the model significantly.
 It is useful when comparing models with different numbers of variables.
 A higher adjusted R2 suggests a better balance between model fit and complexity.
v) exper is not statistically significant at the 5% level (p = 0.064), though it is significant at
the 10% level.
3. a) The coefficient of dtown is 45.8.

 Interpretation: Holding the size (sqrmeter) and the number of bedrooms (bdrms) constant,
being in the town (dtown = 1) is associated with an increase in house price of 45.8 units
compared to being outside the town (dtown = 0).

b) The intercept is 14.6.

 Interpretation: When the size is zero (which might not be practically meaningful), the
number of bedrooms is zero, and the house is not in the town (dtown = 0), the estimated
house price is 14.6.

c) R2=0.58 indicates that approximately 58% of the variability in house prices is explained by
the independent variables (sqrmeter, bdrms, dtown) in the model.

d) The coefficient of bdrms is 75.25.

 Interpretation: Holding the size (sqrmeter) and the town indicator (dtown) constant,
having one additional bedroom is associated with an estimated increase in house price of
75.25 units.

e) price = 14.6 + 90.52×140 + 75.25×4 + 45.8×1 = 13,034.2

f) i) price = 14.6 + 90.52×2400 + 75.25×3 + 45.8×1 = 217,534.15

ii) Residual = 800,000 - 217,534.15 = 582,465.85
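The fitted value and residual can be verified directly in Stata (a check of the arithmetic only):

display 14.6 + 90.52*2400 + 75.25*3 + 45.8    // fitted price: 217,534.15
display 800000 - 217534.15                    // residual: 582,465.85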
