Problem Sets 202324

UNIVERSITE PARIS 1 PANTHEON-SORBONNE
L3 ECONOMIE
INTRODUCTION A L’ECONOMETRIE
C.DOZ
&
INTRODUCTION TO ECONOMETRICS
T.BROER
PROBLEM SETS
ACADEMIC YEAR 2023-2024
NB : The notation for the problem sets may differ from that used
in Wooldridge’s textbook and in class. In particular, the model parameters
may not be called β0, ..., βk, but a and b (for the intercept and
slope parameters), α and β, etc.
For the data work, the commands in English versions of Excel will
differ from the French ones in the text.
Problem set 1
Data analysis with Excel

Descriptive statistics and graphs (recap)
EXERCISE :
1. Copy the "Region1_echantillon" file from the EPI to your computer and open it in
Excel. Read the description of the data in "Variable description". The data is sorted by
degree level : note the line numbers corresponding to each degree level as you will need
them throughout the work. You can also, but it is not essential, create 6 additional tabs
in your file and copy into each tab the data corresponding to a level of diploma : this
may facilitate certain manipulations (for example the creation of graphs).
2. In the "DONNEES" tab open "utilitaire d’analyse" then the "statistiques descriptives"
tool.
i) Calculate these statistics for the AGE and SALRED variables, on the entire sample,
by checking "rapport détaillé". What do you see about the maximum and minimum
salary values ?
ii) Do the same thing again by adding the value of K for the option Kème maximum
and Kème minimum so as to calculate the first and last percentile of the
distributions of variables AGE and SALRED.
iii) Do the same thing again by adding the value of K for the option Kème maximum
and Kème minimum so as to calculate the first and last decile of the distributions
of variables AGE and SALRED.
iv) Comment on the results obtained.
v) Create a CLASSE variable taking the following successive values (lines 2 to 12) :
1000, 1500, 2000, 2500, 3000, 3500, 4000, 5000, 6000, 7000, 8000.
vi) In "utilitaire d’analyse", choose the tool "Histogramme" with SALRED in the
"input range" and CLASSE in the "plage des classes" and check "Représentation
graphique". Comment.
3. Do the same as in questions (i) to (iv) of the previous question for each value of the
DDIPL variable and comment on the results obtained.
NB : it will be necessary to redefine the CLASSE variable in an appropriate manner for
each level of DDIPL, so that the histogram provides useful information. If this cannot be
done entirely during the session due to lack of time, we will only study the cases DDIPL
= 1, 3, 5, 7.
i) Go to the INSERTION tab and make a graph showing salary versus age for the
entire sample. What do you notice ?
ii) Sort the data in descending order of the SALRED value. Redo a graph by removing
the 3 individuals with the highest values of the SALRED variable. What do you
notice ?
iii) Same thing by only taking the data corresponding to the values between the first
and last percentile of the SALRED variable.
iv) Sort the data by degree level and make graphs showing salary versus age for each
degree level. Comment.
Problem Set 2
Linear model and introduction to OLS
EXERCISE 1 :
Consider the following relationships between y, x, K and L, which are variables on which we
have observations y1, ..., yN, x1, ..., xN, etc. a, b, α, β are parameters whose true values we do
not know.
Among these relationships, which can ultimately lead to a linear model, perhaps after a change
of variables or a change of parameters ?
Please write down the linear model in question, and indicate which are the parameters of
that model.
1. $y = a + b x^2$
2. $y = a + \frac{b}{x}$
3. $y = a + \frac{x}{b}$
4. $y = a + \frac{1}{b + x}$
5. $y = A K^{\alpha} L^{\beta}$
6. $y = A K^{\alpha} L^{1-\alpha}$
7. $y = \frac{a}{1 + x^{b}}$
EXERCISE 2 :
Consider a random sample y1, ..., yN of i.i.d. random variables with expected value m.
We associate to that sample the linear model yn = m + εn, where the constant is the only
explanatory variable, m is the unknown parameter we try to estimate, and εn is the error term.
Solve the OLS normal equations associated to this model, i.e. derive m̂ by solving
$\min_{m \in \mathbb{R}} \sum_{n=1}^{N} (y_n - m)^2$
Show that the extremum you find is indeed a minimum.
EXERCISE 3 :
N.B. This exercise is purely pedagogical : we usually do not estimate a model on
the basis of 10 observations !
You can do this exercise by hand, or using Microsoft Excel, or both. If you use Excel, do not
use Excel’s regression tools for this, but calculate variances and covariances by hand. Check
online which variance formula Excel uses (division by N or by N − 1), and correct it if necessary.
We have the following observations on two variables y and x, and want to estimate the
model yn = a + bxn + εn by OLS.
x    3     1.5    1     3      1     2.5    0.5    -1    0.5    2
y    -4    -1     -2    -5.5   -1    -5     0.5    2     3      -4
1. Draw a scatter plot of the data.

2. Calculate â and b̂.
3. Draw the OLS regression line on the same graph as the scatter plot.
4. Calculate the fitted values ŷn, the residuals ε̂n, and the R² of the regression. Comment.
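A hand calculation for questions 2 and 4 can be cross-checked with a short script. The sketch below uses Python rather than Excel (an assumption of this example, since the course itself works in Excel) and the data of the table in this exercise; the helper name `ols_simple` and its `ddof` argument are hypothetical. It also illustrates why the choice of divisor (N or N − 1) does not matter for b̂, provided the same divisor is used for the covariance and the variance.

```python
# Simple-regression OLS "by hand":
#   b_hat = Cov(x, y) / Var(x),  a_hat = y_bar - b_hat * x_bar
x = [3, 1.5, 1, 3, 1, 2.5, 0.5, -1, 0.5, 2]
y = [-4, -1, -2, -5.5, -1, -5, 0.5, 2, 3, -4]


def ols_simple(x, y, ddof=0):
    """Return (a_hat, b_hat); ddof=0 divides by N, ddof=1 by N - 1."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - ddof)
    var_x = sum((xi - x_bar) ** 2 for xi in x) / (n - ddof)
    b_hat = cov_xy / var_x  # the divisor cancels in this ratio
    a_hat = y_bar - b_hat * x_bar
    return a_hat, b_hat


a_hat, b_hat = ols_simple(x, y)

# Fitted values and residuals
fitted = [a_hat + b_hat * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# R^2 = 1 - SSR / SST
y_bar = sum(y) / len(y)
ssr = sum(e ** 2 for e in residuals)
sst = sum((yi - y_bar) ** 2 for yi in y)
r2 = 1 - ssr / sst

print(f"a_hat = {a_hat:.4f}, b_hat = {b_hat:.4f}, R2 = {r2:.4f}")
```

Calling `ols_simple(x, y, ddof=1)` should return the same pair of estimates, which is a quick way to see that the correction to Excel's variance formula matters for the variance itself but not for the slope.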
EXERCISE 4 :
Consider the model yn = a + bxn + εn for which we have N observations.
i) Denote as â and b̂ the OLS estimators of a and b. Write down the formulae for â and b̂.
ii) Denote as x̄ and ȳ the sample means of x and y, and denote the deviations from the means
as ỹn = yn − ȳ and x̃n = xn − x̄.
Consider the model without a constant : ỹn = β x̃n + un and denote as β̂ the OLS estimator of
β in this model.
Recall the OLS formula for β̂ and show that β̂ = b̂.
iii) Same question as ii) but with the model yn = γ x̃n + vn.
iv) Which practical conclusion do you draw from these results ?
Problem Set 3
The simple linear regression model : properties of OLS
EXERCISE 1 :
Consider the quarterly sales of micro-computers over 5 years. Denote as qt, t = 1, ..., 20, the
sales (in volume) during the 20 quarters, and as pt, t = 1, ..., 20, the real price of computers
during this period (the real price is the ratio of a computer price index to a consumer price
index).
1. We study the model (M1) : ln qt = a + b ln pt + εt
i) Interpret the coefficients a and b in this model.
ii) The observations give rise to the following results :
$\sum_{t=1}^{20} \ln p_t = -5.15$, $\sum_{t=1}^{20} \ln q_t = 30.89$,
$\sum_{t=1}^{20} (\ln p_t)^2 = 1.91$, $\sum_{t=1}^{20} (\ln q_t)^2 = 72.53$, $\sum_{t=1}^{20} (\ln p_t)(\ln q_t) = -10.41$
Calculate the estimates of a and b using OLS.
iii) Calculate the R².
2. Now consider the model (M2) ln(pt qt ) = c + d ln pt + ut
i) What can we say about c, d, and ut ?
ii) Calculate the estimators of c and d that you obtain using OLS. Verify that : ĉ = â
and d̂ = b̂ + 1.
iii) What can we say about the residuals in this model ?
iv) What can we say about the R2 ?
3. We are now interested in the model (M3) : qt = α + βpt + vt.
The observations give rise to the following results :
$\sum_{t=1}^{20} p_t = 15.68$, $\sum_{t=1}^{20} q_t = 135.58$,
$\sum_{t=1}^{20} p_t^2 = 12.64$, $\sum_{t=1}^{20} q_t^2 = 956.18$, $\sum_{t=1}^{20} p_t q_t = 102.86$
i) Calculate the estimates of α and β using OLS.
ii) Compare them to those you found for a and b. Comment.
iii) Can we say anything about the estimated residuals and about the R2 of the model
when we compare it to that in question 1 ?
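When only sums are available, as in this exercise, the OLS formulas can be evaluated directly from them. The sketch below is one way to check hand calculations; the function name `ols_from_sums` is hypothetical, and it relies on the fact that in a simple regression with an intercept the R² equals the squared sample correlation between the regressor and the dependent variable.

```python
def ols_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Return (a_hat, b_hat, r2) for y = a + b x + e, given raw sums over n observations."""
    s_xx = sum_x2 - sum_x ** 2 / n     # sum of squared deviations of x
    s_yy = sum_y2 - sum_y ** 2 / n     # sum of squared deviations of y
    s_xy = sum_xy - sum_x * sum_y / n  # sum of cross-products of deviations
    b_hat = s_xy / s_xx
    a_hat = sum_y / n - b_hat * sum_x / n
    r2 = s_xy ** 2 / (s_xx * s_yy)     # squared sample correlation
    return a_hat, b_hat, r2


# Model (M1): x = ln p_t, y = ln q_t, with the sums given in question 1 ii)
a_hat, b_hat, r2 = ols_from_sums(
    n=20, sum_x=-5.15, sum_y=30.89, sum_x2=1.91, sum_y2=72.53, sum_xy=-10.41
)
print(f"a_hat = {a_hat:.4f}, b_hat = {b_hat:.4f}, R2 = {r2:.4f}")
```

The same function can be applied to model (M3) by passing the sums in levels from question 3.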
EXERCISE 2 :
Consider the model : yn = a + bxn + εn for which we have N observations.
Denote as R² the (sample) coefficient of determination obtained from estimating the model
and as rxy the empirical (sample) correlation coefficient between x and y.
i) Show that $r_{xy}^2 = R^2$.
ii) How do you interpret a high estimated value for R² ?
EXERCISE 3 :
We estimate the model yn = a + bxn + εn using OLS.
1. We assume that the error term εn satisfies the assumptions we made in the lecture :
which ones are these ?
2. What are the variances Var(â) and Var(b̂) of the estimators ?
3. What can we say about â and b̂ if the sample variance of xn is very large ?
4. What can we say about â and b̂ if the sample variance of xn is very small ?
5. Why do we assume that there is sample variance in xn (SLR3) ?
6. What would happen if all xn had the same value ? Does this seem intuitive ?
Problem Set 4
Data work with Excel

Scatter plots and simple regression
We use the "Region1_echantillon" file, on which we will work during all the applied tutorial sessions.
The results of the regressions will be studied in more detail during the following
tutorial session : keep them.
1. In this file, there is no variable describing the number of years of experience of the
individual. Explain why the individual’s age is an imperfect approximation of their number
of years of experience. However, we will make this approximation throughout the rest of
the study.
2. Explain why it is more realistic, from the point of view of economic interpretation, to
estimate a linear model describing the logarithm of wages as a function of age rather
than a linear model describing the wage as a function of age.
3. i) Create the variable LogSALRED obtained by taking the logarithm of SALRED.
ii) Make graphs giving LogSALRED versus AGE for values of DDIPL equal to 1, 3, 5,
7. Comment.
4. In the remainder of the exercise, we will use OLS to estimate the model
LogSALRED = a + b AGE + ε
i) For DDIPL=1, calculate : the sample covariance between LogSALRED and AGE
(reminder : there is a correction to be made in Excel : does it have a big impact
here ?), the sample variance of AGE, the coefficients b̂ and â in the linear regression
of LogSALRED on the constant and AGE.
ii) In the FORMULES tab under "plus de fonctions" look for the function DROITEREG
(LINEST in English versions of Excel) and perform the linear regression of LogSALRED
on the constant and AGE. Compare with (i).
iii) In the DONNEES tab look for "utilitaire d’analyse", then choose the tool
"Régression linéaire" and estimate the model : you will check "intitulé présent"
so as to have the names of the variables in the outputs, as well as "niveau de
confiance 95%", "courbes des résidus", "courbes de régression" and "diagramme
de répartition des probabilités". Comment on the values obtained for
b̂ and for R².
In the entire series of applied tutorial sessions, we will use this tool when we want
to do a linear regression.
iv) Repeat the same estimation for the different values of DDIPL (in case of lack of time,
you can limit yourself to DDIPL=1, DDIPL=3, DDIPL = 5, DDIPL=7). Comment.
v) Repeat the same estimation for DDIPL=1, but only taking observations between the
first and last percentile (you will need to sort the data beforehand). Compare with
the results obtained in the previous question.
Problem Set 5
Simple linear regression model : properties of OLS, and confidence
intervals in the Gaussian model
Part I (1h)
EXERCISE 1 :
Consider the model
yn = a + bxn + εn
where we assume that the error term is distributed i.i.d. according to N(0, σ²).
We estimate this model with N = 24 observations using OLS.
The sample means, standard deviations and covariance are :
ȳ = −0.9676, x̄ = 0.1126
$s_y = \sqrt{\frac{1}{N}\sum_{n=1}^{N}(y_n - \bar{y})^2} = 0.10413$, $s_x = \sqrt{\frac{1}{N}\sum_{n=1}^{N}(x_n - \bar{x})^2} = 0.2296$, $\mathrm{Cov}_{emp}(x, y) = -0.01846$
1. Calculate b̂ and â.

2. Calculate the R². Comment.
3. Calculate the estimate of the error variance σ̂².
4. i) Construct a 95% confidence interval for b.
ii) Construct a 99% confidence interval for b.
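The quantities asked for in this exercise can all be derived from the sample moments given in the statement. The following sketch is one possible way to check hand calculations; the function names `ols_from_moments` and `confidence_interval` are hypothetical, and the code assumes the 1/N convention for the moments (as in the statement) and the usual unbiased variance estimator SSR/(N − 2). The critical value `t_crit` is left as an input to be read from a Student table.

```python
import math


def ols_from_moments(n, x_bar, y_bar, s_x, s_y, cov_xy):
    """OLS results for y = a + b x + e from sample moments.

    s_x, s_y and cov_xy are assumed to use the 1/N convention.
    """
    b_hat = cov_xy / s_x ** 2
    a_hat = y_bar - b_hat * x_bar
    r2 = cov_xy ** 2 / (s_x ** 2 * s_y ** 2)  # squared sample correlation
    ssr = n * s_y ** 2 * (1 - r2)             # SSR = (1 - R^2) * SST, SST = N * s_y^2
    sigma2_hat = ssr / (n - 2)                # N - 2 degrees of freedom
    se_b = math.sqrt(sigma2_hat / (n * s_x ** 2))
    return {"a_hat": a_hat, "b_hat": b_hat, "r2": r2,
            "sigma2_hat": sigma2_hat, "se_b": se_b}


def confidence_interval(estimate, std_error, t_crit):
    """Two-sided interval; t_crit comes from a Student table with N - 2 d.o.f."""
    return estimate - t_crit * std_error, estimate + t_crit * std_error


res = ols_from_moments(n=24, x_bar=0.1126, y_bar=-0.9676,
                       s_x=0.2296, s_y=0.10413, cov_xy=-0.01846)
print(res)
```

For question 4, `confidence_interval(res["b_hat"], res["se_b"], t_crit)` gives the interval once the appropriate quantile for 22 degrees of freedom has been looked up.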
EXERCISE 2 :
Consider the sample in Exercise 1 of Problem Set 3. Add the assumption that the error terms are
i.i.d. according to N(0, σ²).
1. Calculate the 95% confidence intervals for a, b, c, d. Comment.
i) Test the significance of the parameters at a 5% level.
ii) What are the "p-values" of the test statistics you calculated in i) ?
iii) The test of significance of d in (M2) is equivalent to testing a hypothesis about b in
(M1). State this hypothesis. What is the economic interpretation ?
iv) How would you carry out a test of the hypothesis about b you formulated in iii)
directly ?
Part II
Simple linear regression model using Excel

We take the regression results obtained in questions 4 (iv) and 4 (v) of Problem Set 4.
1. Check the construction of Student statistics on an example.
2. Check the construction of confidence intervals on an example.
3. Check the construction of "p-values" on an example.
4. Carry out the significance tests of the coefficients for all the regressions considered :
i) using the Student statistic
ii) using "p-values".
Comment on the results obtained.
Problem set 6
Simple linear regression model : confidence intervals, tests and
forecasts in the Gaussian model
EXERCISE 1 : Revision (mock-exam) exercise
A company doctor collected information from a sample of 30 young male employees. For each
of them, denote as yn his weight (in kg) and as xn his height (in cm).
Suppose that the underlying population model has the form yn = a + bxn + εn and that the
error term is distributed according to N(0, σ²). Suppose that the sample is a random sample
from this distribution.
This model is estimated using OLS on the given sample. In particular, the sample means,
standard deviations and covariance are :
ȳ = 86.2, x̄ = 183, $\mathrm{Cov}_{emp}(x, y) = \frac{1}{N}\sum_{n=1}^{N}(y_n - \bar{y})(x_n - \bar{x}) = 168.95$
$s_y = \sqrt{\frac{1}{N}\sum_{n=1}^{N}(y_n - \bar{y})^2} = 12.83$, $s_x = \sqrt{\frac{1}{N}\sum_{n=1}^{N}(x_n - \bar{x})^2} = 15.1$
1. Calculate b̂ and â.

2. Calculate σ̂².
3. Conduct a significance test at the 5% level for the model parameters.
4. The reference value of b often used for a population of this type is b = 0.7. We want
to test the Null hypothesis H0 : b = 0.7 against the alternative H1 : b ≠ 0.7. Can you
reject H0 at the 5% level ?
5. We want to test the Null hypothesis H0 : b ≤ 0.7 against the alternative H1 : b > 0.7.
Can you reject H0 at the 5% level ?
6. Construct a 95% confidence interval for the weight of a young man with the same
characteristics who is 1.85 m tall.
EXERCISE 2 :
Consider again the data in Exercise 1 in Problem set 5.

Construct a forecast interval for y when x = 0.12 :
— at the 95% confidence level
— at the 99% confidence level
Compare the two intervals and comment.
Problem set 7 (1h)
Simple linear regression : matrix notation and random vectors
EXERCISE 1 :
Consider again the data in Exercise 3 in Problem Set 2.
x    3     1.5    1     3      1     2.5    0.5    -1    0.5    2
y    -4    -1     -2    -5.5   -1    -5     0.5    2     3      -4
1. Write the model in matrix form, denoted as y = Xβ + ε
(define exactly y, X, β and ε here).
2. Calculate X′y, X′X, (X′X)⁻¹, and use them to calculate β̂.
Verify that the results are identical to those you obtained before.
3. Calculate ŷ and ε̂. Again, verify that the results are identical to those you obtained
before.
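EXERCISE 2 :
Consider a random vector $Y = (Y_1, Y_2, Y_3)'$ with expected value and variance-covariance matrix
$$EY = \begin{pmatrix} 1 \\ 0 \\ -2 \end{pmatrix} \quad \text{and} \quad VY = \begin{pmatrix} 1 & 1/2 & -1/2 \\ 1/2 & 2 & 1 \\ -1/2 & 1 & 2 \end{pmatrix}.$$
Define $Z = \begin{pmatrix} 2Y_1 + Y_2 \\ Y_1 - Y_3 \end{pmatrix}$.
Calculate EZ and VZ :
i) term-by-term for every entry of EZ and VZ.
ii) using matrix notation as seen in class.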
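The matrix route in ii) rests on writing Z = AY for a fixed matrix A, so that EZ = A·EY and VZ = A·VY·A′. The sketch below, in plain Python with two small hypothetical helpers (`mat_mul`, `transpose`), is one way to check a hand calculation; the matrix `A` is read off from the definition of Z in this exercise.

```python
def mat_mul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    """Transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*a)]


EY = [[1], [0], [-2]]          # expected value of Y (column vector)
VY = [[1, 0.5, -0.5],          # variance-covariance matrix of Y
      [0.5, 2, 1],
      [-0.5, 1, 2]]

# Z = A Y with rows (2, 1, 0) for 2*Y1 + Y2 and (1, 0, -1) for Y1 - Y3
A = [[2, 1, 0],
     [1, 0, -1]]

EZ = mat_mul(A, EY)                         # E(AY) = A E(Y)
VZ = mat_mul(mat_mul(A, VY), transpose(A))  # V(AY) = A V(Y) A'

print("EZ =", EZ)
print("VZ =", VZ)
```

A useful sanity check on any result is that VZ must be symmetric with non-negative diagonal entries.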
Problem Set 8
Introduction to multivariate regression analysis.

Part I
EXERCISE 1 :
1. Consider the following wage equation :
ln wn = b0 + b1 indFn + b2 etudn + εn (M1)
where :
— indFn is a variable that takes the value 1 if individual n is a woman, 0 otherwise
— etudn is the number of years of study that individual n has completed.
How do you interpret the coefficients of the model ?
2. Define a new variable indHn that takes value 1 if individual n is a man and 0 otherwise.
Can we estimate the model (M2) defined as : ln wn = a0 + a1 indFn + a2 indHn + a3 etudn + εn ?
Why, or why not ?
3. Consider now the model (M3)
ln wn = c1 indFn + c2 indHn + c3 etudn + εn
Can we estimate model (M3) ? Why, or why not ? Interpret the coefficients of the new
model.
4. What is the mathematical relationship between the parameters of model (M1) and (M3) ?
5. Show that the estimators of the parameters in (M1) and (M3) are linked by the same
relationships as those that link the parameters of these models (use the definition of the
OLS estimators).
6. Consider now the model (M4) :
ln wn = d0 + d1 indFn + d2 etudn + d3 indFn ∗ etudn + εn
(where ∗ indicates the product of two variables).

How do you interpret the coefficient d3 ? In the spirit of question 3, can you propose an
equivalent formulation of this model ?
EXERCISE 2 :
Consider the following 2 models :
yn = b0 + b1 x1n + b2 x2n + un (M1)
yn = b0 + b1 x1n + b2 x2n + b3 x3n + vn (M2)
The two models are estimated using OLS on N observations. Denote
• for (M1) :
— û the vector of estimated residuals
— $SSR_1 = \sum_{n=1}^{N} \hat{u}_n^2$ the residual sum of squares
— $R_1^2$ the coefficient of determination
• for (M2) :
— v̂ the vector of estimated residuals
— $SSR_2 = \sum_{n=1}^{N} \hat{v}_n^2$ the residual sum of squares
— $R_2^2$ the coefficient of determination
1. Write, for i = 1, 2, the relationship between $R_i^2$ and $SSR_i$.
2. Write the OLS problems associated to models (M1) and (M2).
3. Show that $SSR_1 \geq SSR_2$, and hence that $R_1^2 \leq R_2^2$. Comment.
4. Establish a general formula that allows you to calculate an adjusted R² (i.e. the $\bar{R}^2$ you saw
in class).
5. Denote as $\bar{R}_1^2$ and $\bar{R}_2^2$ the adjusted R² of the two regressions associated to (M1) and
(M2).
What can you say about $\bar{R}_1^2$ and $\bar{R}_2^2$ ? Comment.
Part II : Data analysis
Preliminary remarks :
• to estimate a model with several explanatory variables, using the "Régression linéaire"
tool of the "utilitaire d’analyse", these variables must be placed in contiguous columns.
Then define the associated rectangular range under "Plage pour les variables X"
• when you use the "Régression linéaire" tool of the "utilitaire d’analyse", you will
check the same boxes as for the simple regression (see question 4 (iii) of
Problem Set 4).
Modeling of salary by level of diploma based on new variables

1. We first wish to check whether the relationship between age and the logarithm of salary
can be non-linear.
i) Create the variable AGE2 equal to the square of the variable AGE.
ii) We want to estimate the model
LogSALRED = b0 + b1 AGE + b2 AGE2 + ε
Interpret this model and explain why it is relevant. What do you think is the sign of
the parameter b2 in this model ?
iii) Estimate the model for each degree level (or possibly only for DDIPL=1, DDIPL=3,
DDIPL=5, DDIPL=7) and comment in detail on the results obtained.
2. In PS 10, we will also look at possible differences between men and women within the
framework of this model.
Create an indicator variable HOM which is equal to 1 if the individual is a man and
0 otherwise, and an indicator variable FEM which is equal to 1 if the individual is a
woman and 0 otherwise : you will use the FORMULES tab, then the section Logique
and the function SI, as well as the variable SEXE, which is equal to 1 if the individual is a
man and 2 if the individual is a woman.
3. Keep the variables AGE2, HOM, FEM in your data file.
Problem Set 9
Properties of the multivariate linear regression model,
OLS normal equations
EXERCISE 1 :
In an econometric study in the US on a sample of 4000 employees, the authors modeled the
mean hourly salary of every individual during the year 1998 as a function of her or his highest
educational degree, gender and age.
The explanatory variables are :
— Educ is an indicator variable that takes value 1 if the individual has a college degree,
and 0 otherwise
— Female is an indicator variable that takes value 1 if the individual is female, and 0
otherwise
— Age is the individual’s age
The estimation results are the following
Regressor    Model (1)    Model (2)
Educ         5.46         5.48
Female       -2.64        -2.62
Age                       0.29
Intercept    12.69        4.40
SER          6.27         6.22
R²           0.176        0.190
R̄²
Note : SER denotes the “standard error of the regression” and corresponds to the σ̂ you have seen in class.
1. Write down explicitly the models (M1) and (M2) that are estimated.
2. Explain carefully how we should interpret the coefficients that appear in each of the two
models.
3. Comment on the sign of the estimated coefficients, and compare the values obtained for
the two models.
4. Discuss the values of R² in both models with reference to EXERCISE 2 in Problem Set 8.
5. Which test would you carry out in order to test the null hypothesis of equal pay for men
and women in these models ? Does the statistical output above allow you to carry out
this test ?
6. Calculate the adjusted R² (R̄²) in both regressions. Comment.
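For question 6, a small helper can be used to check the arithmetic. The sketch below assumes the usual definition of the adjusted R² for a regression with an intercept and k regressors, namely 1 − (1 − R²)(N − 1)/(N − k − 1); this should be compared with the formula given in class. The function name `adjusted_r2` is hypothetical, and the inputs are the sample size and the R² values reported in the table.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a regression with an intercept and k regressors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


n = 4000                                # sample size given in the exercise
r2_bar_1 = adjusted_r2(0.176, n, k=2)   # Model (1): Educ, Female
r2_bar_2 = adjusted_r2(0.190, n, k=3)   # Model (2): Educ, Female, Age

print(f"Model (1): {r2_bar_1:.4f}")
print(f"Model (2): {r2_bar_2:.4f}")
```

With a sample this large relative to the number of regressors, the correction factor (N − 1)/(N − k − 1) is very close to 1, which is worth keeping in mind when commenting on the results.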
EXERCISE 2 :
Consider a model of the form yn = b0 + b1 x1n + b2 x2n + εn and assume that the error εn is
distributed according to N(0, σ²). Furthermore, assume that the observations are obtained
through random sampling.
Denote $\beta = (b_0, b_1, b_2)'$ and $\hat{\beta} = (\hat{b}_0, \hat{b}_1, \hat{b}_2)'$ the OLS estimator of β.
1. Write down the model and the expression for the estimator β̂ using matrix notation.
2. Using what you saw in class, show that $\hat{\beta} \sim N\left(\beta, \sigma^2 (X'X)^{-1}\right)$.
3. Suppose that
$$(X'X)^{-1} = \begin{pmatrix} 1 & 1/2 & -1/2 \\ 1/2 & 2 & 1/4 \\ -1/2 & 1/4 & 1 \end{pmatrix}$$
i) What is the distribution of b̂0, and of b̂2 ?
ii) What is the distribution of the vector $Z = (\hat{b}_1, \hat{b}_2)'$ ?
iii) What is the distribution of the real random variable 2b̂1 − b̂2 ?
Problem set 10
Multiple regression :
Data work with Excel continued
1. Perform the LogSALRED regression on the variables HOM, AGE, AGE2 and on the
constant for each level of diploma (or possibly only for DDIPL=1, DDIPL=4, DDIPL=7).
i) How is the coefficient of the HOM variable interpreted in this framework ?
ii) Is this coefficient significant at the 5% level ? How is this interpreted ?
iii) We denote b1 for this coefficient. Perform the test of H0 : b1 ≥ 0 against H1 : b1 < 0
at a 5% level of significance. How is this interpreted ?
2. i) Perform the LogSALRED regression on the variables FEM, AGE, AGE2 and on
the constant for each level of diploma (or possibly only for DDIPL=1, DDIPL=4,
DDIPL=7). How is the coefficient of the FEM variable interpreted in this framework ?
Is this coefficient significant at the 5% level ?
ii) For a given level of diploma (for example for DDIPL=1) what are the relationships
between the coefficients of the equations estimated in (1) and (2) ? Study these
relationships from a theoretical perspective, then check that they are satisfied by the
estimated coefficients.
3. This question only concerns the case DDIPL=1.

i) Perform the LogSALRED regression on the variables HOM, FEM, AGE, AGE2 :
what is happening and why ?
ii) Perform the LogSALRED regression on the variables HOM, FEM, AGE, AGE2 without
a constant term (check the box "intersection à l’origine").
The following questions concern the results of this regression.
iii) Are the different coefficients of the model significant at the 5% level ?
iv) What is happening with the R2 and what can we say about it ?
v) How are the coefficients associated with the HOM and FEM variables interpreted ?
vi) Can we test the equality of these two coefficients with the available tools and results ?
vii) What are the relationships between the coefficients of the equation estimated in
this question and the equation estimated in (1) for DDIPL=1 ? Verify that these
relationships hold for the estimated coefficients.
viii) Among the models considered in this question, which are those whose formulation
is the simplest for testing the existence of a difference between the salaries of men
and women ?
4. Do you think that the models used in the previous questions are relevant for testing the
possibility of salary discrimination between men and women ? Why ?
5. Optional question (to be answered only if there is time left during this tutorial
session) :
i) Create indicator variables for the level of diploma that you will note NIV1, NIV3,. . . ,
NIV7 : these variables will therefore be defined as follows : NIVi =1 if the diploma
is equal to i and 0 otherwise (you will proceed in a similar way to what was done to
create the HOM and FEM variables).
ii) Perform the LogSALRED regression on the variables HOM, NIV1, NIV3,. . . ,NIV6,
AGE, AGE2, and on the constant. Why is the NIV7 variable not included in this list
of explanatory variables ?
Carefully explain how the coefficients of the different variables in this model are
interpreted. Are these coefficients significant at the 5% level ? What is the point of
the formulation used in this model ? What information are we losing compared to
the models estimated by level of diploma in question 2 (vii) ?
iii) What could we do to avoid losing this information ?
Problem Set 11
Multiple regression :
tests and forecast intervals
EXERCISE 1 :
We try to understand how the number of students in university cities affects rents. For this,
we study a sample of 64 cities.
For each city, denote as loy the (natural) logarithm of the mean rent per square meter (m²) for
rental apartments, pop the logarithm of the population, revmoy the logarithm of mean
household income and pctstu the percentage of students in the population. We estimate the
following model :
loy = b0 + b1 pop + b2 revmoy + b3 pctstu + ε
and assume that the error term is distributed normally according to N(0, σ²). Furthermore,
we assume that we have a random sample from this model.
1. i) Interpret precisely the model coefficients.
ii) Which sign would you expect b1, b2 and b3 to have ?
2. The estimation yields the following results (standard errors are given in parentheses
below the coefficient)
$\widehat{loy} = \underset{(0.844)}{0.043} + \underset{(0.035)}{0.072}\, pop + \underset{(0.081)}{0.507}\, revmoy + \underset{(0.0017)}{0.0056}\, pctstu$
R² = 0.458
i) Test the significance of the coefficients at the 5 % level.

ii) What is the expected impact on rents, ceteris paribus, if the population of a city rises by 2 % ?
(Indicate what exactly “ceteris paribus” means in this context.)
iii) Formulate the Null hypothesis that the percentage share of students in the city
population does not have an effect on rents per square meter. How would you formulate
the alternative hypothesis, bearing in mind the sign you expect the corresponding
coefficient to have ? Carry out this test at the 5 % level.
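The Student statistics needed in question 2 are simple ratios of each estimate to its standard error. The sketch below computes them from the estimation output of this exercise; the names `coef`, `se` and `t_stat` are hypothetical, and the degrees of freedom are taken as N minus the number of estimated coefficients. Critical values are not computed here and should be read from a Student table.

```python
# Estimates and standard errors as reported in the estimation output
coef = {"intercept": 0.043, "pop": 0.072, "revmoy": 0.507, "pctstu": 0.0056}
se = {"intercept": 0.844, "pop": 0.035, "revmoy": 0.081, "pctstu": 0.0017}


def t_stat(estimate, std_error, null_value=0.0):
    """Student statistic for H0: coefficient = null_value."""
    return (estimate - null_value) / std_error


t_stats = {name: t_stat(coef[name], se[name]) for name in coef}
df = 64 - len(coef)  # N minus the number of estimated coefficients

for name, value in t_stats.items():
    print(f"{name:>9}: t = {value:.3f}")
print("degrees of freedom:", df)
```

For a two-sided test the absolute value of each statistic is compared with the 97.5% quantile of the Student distribution with `df` degrees of freedom; for the one-sided test in iii) the 95% quantile is the relevant one.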
EXERCISE 2 :
Consider the following linear model : yn = b0 + b1 xn1 + b2 xn2 + εn where we assume that the
error term is distributed according to N(0, σ²).
We estimate the model on a random sample of N = 25 observations. This yields the following
results :
$\hat{\beta} = (0.8, 1.2, 0.3)'$, SSR = 3.368
The estimated variance-covariance matrix of β̂ is
$$\hat{V}(\hat{\beta}) = \hat{\sigma}^2 (X'X)^{-1} = \begin{pmatrix} 0.13 & -0.11 & -0.013 \\ -0.11 & 0.31 & 0.026 \\ -0.013 & 0.026 & 0.0126 \end{pmatrix}$$
1. Test the significance of the different coefficients at the 5 % level.
2. Test, at the 5 % level, the Null hypothesis H0 : b2 ≥ 0.5 against the alternative
hypothesis H1 : b2 < 0.5.
3. i) What is the distribution of b̂1 − b̂2 ?
ii) Calculate the estimated variance of b̂1 − b̂2.
iii) Construct the test of the Null hypothesis H0 : b1 − b2 = 1 against the alternative
H1 : b1 − b2 ≠ 1. Carry out this test at the 5 % level.
4. Suppose we know : x26,1 = 1 and x26,2 = −1. Construct a 95% forecast interval for y26.
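Question 3 relies on the variance of a linear combination c′β̂, which is c′V̂c. The sketch below evaluates it for b̂1 − b̂2 from the estimates and the variance-covariance matrix of this exercise and forms the corresponding Student statistic; the helper name `var_linear_combination` is hypothetical, and the critical value is left to be read from a Student table with N − 3 degrees of freedom.

```python
import math

beta_hat = [0.8, 1.2, 0.3]            # (b0_hat, b1_hat, b2_hat)
V_hat = [[0.13, -0.11, -0.013],       # estimated variance-covariance matrix
         [-0.11, 0.31, 0.026],
         [-0.013, 0.026, 0.0126]]


def var_linear_combination(c, V):
    """Return the quadratic form c' V c."""
    size = len(c)
    return sum(c[i] * V[i][j] * c[j] for i in range(size) for j in range(size))


c = [0, 1, -1]                         # selects b1 - b2
estimate = sum(ci * bi for ci, bi in zip(c, beta_hat))
variance = var_linear_combination(c, V_hat)

# Student statistic for H0: b1 - b2 = 1
t = (estimate - 1) / math.sqrt(variance)

print(f"estimate = {estimate:.4f}, variance = {variance:.4f}, t = {t:.4f}")
```

The same helper handles any other linear restriction by changing `c`, for instance a sum of two coefficients with `c = [0, 1, 1]`.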
EXERCISE 3 :
We estimate a Cobb-Douglas production function in growth rates. We estimate it on data
from a given industrial sector, using annual data for T = 25 years.
Denote :
— q̇t the yearly growth of total production in volume terms
— l̇t the yearly growth of labour inputs
— k̇t the yearly growth of capital inputs
The estimated model is :
q̇t = c + a l̇t + b k̇t + εt , t = 1, ..., T
We assume the error term is distributed according to N(0, σ²).
We regard the data as a random sample from the model. The results are :
$\hat{\beta} = (0.0029, 0.56, 0.39)'$
and the estimated variance-covariance matrix of β̂ is
$$\hat{V}(\hat{\beta}) = \hat{\sigma}^2 (X'X)^{-1} = \begin{pmatrix} 1.63 \times 10^{-6} & -1.1 \times 10^{-4} & -3.15 \times 10^{-5} \\ -1.1 \times 10^{-4} & 0.0242 & 4.99 \times 10^{-3} \\ -3.15 \times 10^{-5} & 4.99 \times 10^{-3} & 5.96 \times 10^{-3} \end{pmatrix}$$
1. Test the significance of the different coefficients at the 5 % level.

2. Test, at the 5 % level, the Null hypothesis H0 : b ≥ 0.5 against the alternative
hypothesis H1 : b < 0.5.
3. i) What is the distribution of â + b̂ ?
ii) Calculate the estimated variance of â + b̂.
iii) Construct the test for the Null hypothesis H0 : a + b = 1 against the alternative
H1 : a + b ≠ 1. How do you interpret the Null hypothesis ? Carry out this test at the
5% level.
4. Construct a 95 % forecast interval for q̇ if l̇ = 1 and k̇ = 0.5.