Problem Sets 202324
L3 ECONOMIE
INTRODUCTION A L’ECONOMETRIE
C.DOZ
&
INTRODUCTION TO ECONOMETRICS
T.BROER
PROBLEM SETS
ACADEMIC YEAR 2023-2024
NB : The notation in the problem sets may differ from that used
in Wooldridge’s textbook and in class. In particular, the model parameters
may not be called β0, ..., βk, but a and b (for the intercept and
slope parameters), α and β, etc.
For the data work, the commands in English versions of Excel will
differ from the French ones in the text.
Problem set 1
1. Copy the "Region1_echantillon" file from the EPI to your computer and open it in
Excel. Read the description of the data in "Variable description". The data is sorted by
degree level : note the line numbers corresponding to each degree level as you will need
them throughout the work. You can also, but it is not essential, create 6 additional tabs
in your file and copy into each tab the data corresponding to a level of diploma : this
may facilitate certain manipulations (for example the creation of graphs).
2. In the "DONNEES" tab open "utilitaire d’analyse" then the "statistiques
descriptives" tool.
i) Calculate these statistics for the AGE and SALRED variables, on the entire sample,
by checking "rapport détaillé". What do you see about the maximum and minimum
salary values ?
ii) Repeat the calculation, setting the value of K in the options Kème maximum
and Kème minimum so as to obtain the first and last percentiles of the
distributions of the variables AGE and SALRED.
iii) Repeat once more, setting the value of K in the options Kème maximum
and Kème minimum so as to obtain the first and last deciles of the distributions
of the variables AGE and SALRED.
iv) Comment on the results obtained.
v) Create a CLASSE variable taking the following successive values (lines 2 to 12) :
1000, 1500, 2000, 2500, 3000, 3500, 4000, 5000, 6000, 7000, 8000.
vi) In "utilitaire d’analyse", choose the tool "Histogramme" with SALRED in the
"input range" and CLASSE in the "plage des classes" and check "Représentation
graphique". Comment.
3. Repeat steps (i) to (iv) of the previous question for each value of the
DDIPL variable and comment on the results obtained.
NB : it will be necessary to redefine the CLASSE variable in an appropriate manner for
each level of DDIPL, so that the histogram provides useful information. If this cannot be
done entirely during the session due to lack of time, we will only study the cases DDIPL
= 1, 3, 5, 7.
i) Go to the INSERTION tab and make a graph showing salary versus age for the
entire sample. What do you notice ?
ii) Sort the data in descending order of the SALRED value. Redo a graph by removing
the 3 individuals with the highest values of the SALRED variable. What do you
notice ?
iii) Do the same again, keeping only the observations whose SALRED value lies
between the first and last percentile of the SALRED variable.
iv) Sort the data by degree level and make graphs showing salary versus age for each
degree level. Comment.
Problem Set 2
Linear model and introduction to OLS
EXERCISE 1 :
EXERCISE 2 :
$$\min_{m \in \mathbb{R}} \sum_{n=1}^{N} (y_n - m)^2$$
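A quick numerical check of this minimisation problem can help build intuition. The following is a minimal sketch in Python; the list `y` is made up for illustration and is not taken from the course data. It evaluates the sum of squares on a grid of candidate values of m and compares the minimiser with the sample mean.

```python
from statistics import mean

# Illustrative observations (hypothetical, not from the course data).
y = [2.0, 3.5, 1.0, 4.5, 4.0]


def sum_of_squares(m, values):
    """Objective of the problem: sum over n of (y_n - m)^2."""
    return sum((v - m) ** 2 for v in values)


# Evaluate the objective on a fine grid of candidate values for m.
grid = [i / 1000 for i in range(0, 6001)]
m_star = min(grid, key=lambda m: sum_of_squares(m, y))

y_bar = mean(y)
print("grid minimiser:", m_star, "sample mean:", y_bar)
```

On this example the grid minimiser coincides with the sample mean, which is what the first-order condition of the problem suggests.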
EXERCISE 3 :
3. Draw the OLS regression line on the same graph as the scatter plot.
4. Calculate the fitted values ŷn, the residuals ε̂n, and the R² of the regression. Comment.
EXERCISE 4 :
Consider the model yn = a + bxn + εn for which we have N observations.
i) Denote as â and b̂ the OLS estimators of a and b. Write down the formulae for â and b̂.
ii) Denote as x̄ and ȳ the sample means of x and y, and denote the deviations from the
means as ỹn = yn − ȳ and x̃n = xn − x̄.
Consider the model without a constant : ỹn = β x̃n + un, and denote as β̂ the OLS
estimator of β in this model.
Recall the OLS formula for β̂ and show that β̂ = b̂.
iii) Same question as ii), but with the model yn = γ x̃n + vn.
iv) Which practical conclusion do you draw from these results ?
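The equalities asked for in this exercise can be checked numerically before proving them. The sketch below uses a small hypothetical sample (the numbers in `x` and `y` are invented) and compares the slope of the regression with a constant to the slopes of the two regressions without a constant on the demeaned regressor.

```python
from statistics import mean

# Hypothetical sample, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8]


def slope_with_constant(x, y):
    """OLS slope in y_n = a + b x_n + eps_n, from the raw-moment formula."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    num = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
    den = sum(xi ** 2 for xi in x) - n * x_bar ** 2
    return num / den


def slope_no_constant(z, w):
    """OLS slope in w_n = beta z_n + u_n (no constant)."""
    return sum(zi * wi for zi, wi in zip(z, w)) / sum(zi ** 2 for zi in z)


x_dev = [xi - mean(x) for xi in x]  # deviations of x from its mean
y_dev = [yi - mean(y) for yi in y]  # deviations of y from its mean

b_hat = slope_with_constant(x, y)          # model with a constant
beta_hat = slope_no_constant(x_dev, y_dev)  # question ii)
gamma_hat = slope_no_constant(x_dev, y)     # question iii)
print(b_hat, beta_hat, gamma_hat)
```

The three numbers agree up to rounding error on this sample; the exercise asks for the algebraic argument behind this.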
Problem Set 3
The simple linear regression model : properties of OLS
EXERCISE 1 :
$$\sum_{t=1}^{20} (\ln p_t)^2 = 1.91, \qquad \sum_{t=1}^{20} (\ln q_t)^2 = 72.53, \qquad \sum_{t=1}^{20} (\ln p_t)(\ln q_t) = -10.41$$
Calculate the estimates of a and b using OLS.
iii) Calculate the R².
2. Now consider the model (M2) : ln(pt qt) = c + d ln pt + ut
i) What can we say about c, d, and ut ?
ii) Calculate the estimators of c and d that you obtain using OLS. Verify that : ĉ = â
and d̂ = b̂ + 1.
iii) What can we say about the residuals in this model ?
iv) What can we say about the R² ?
3. We are now interested in the model (M3) : qt = α + βpt + vt.
The observations give rise to the following results :
$$\sum_{t=1}^{20} p_t = 15.68, \qquad \sum_{t=1}^{20} q_t = 135.58$$
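For question 2, the link between (M1) and (M2) can be illustrated numerically. The statement of (M1) is not reproduced in this text; the sketch below assumes it is ln qt = a + b ln pt + εt, which is consistent with the relations ĉ = â and d̂ = b̂ + 1 to be verified. The data are invented and serve only to illustrate the mechanics.

```python
from statistics import mean


def ols_simple(x, y):
    """Return (intercept, slope) of the OLS regression of y on a constant and x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical log prices and log quantities (not the exercise data).
ln_p = [-0.2, 0.1, 0.3, -0.4, 0.0, 0.5]
ln_q = [2.1, 1.8, 1.7, 2.3, 1.9, 1.4]
ln_pq = [lp + lq for lp, lq in zip(ln_p, ln_q)]  # ln(p q) = ln p + ln q

a_hat, b_hat = ols_simple(ln_p, ln_q)   # assumed form of (M1)
c_hat, d_hat = ols_simple(ln_p, ln_pq)  # (M2)

# Residuals of both regressions, to compare them observation by observation.
res_m1 = [lq - a_hat - b_hat * lp for lp, lq in zip(ln_p, ln_q)]
res_m2 = [lpq - c_hat - d_hat * lp for lp, lpq in zip(ln_p, ln_pq)]
print(a_hat, b_hat, c_hat, d_hat)
```

On this sample the intercepts coincide, the slopes differ by one, and the residuals are identical, while the R² generally differs because the dependent variable has changed.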
EXERCISE 2 :
EXERCISE 3 :
1. We assume that the error term εn satisfies the assumptions we made in the lecture :
which ones are these ?
2. What are the variances of the estimators, Var(â) and Var(b̂) ?
3. What can we say about â and b̂ if the sample variance of xn is very large ?
4. What can we say about â and b̂ if the sample variance of xn is very small ?
5. Why do we assume that there is sample variance in xn (SLR3) ?
6. What would happen if all xn had the same value ? Does this seem intuitive ?
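Questions 3 to 6 can be explored with a short computation. The sketch below relies on the textbook result Var(b̂) = σ² / Σ(xn − x̄)², conditional on the regressors and under the classical homoskedastic assumptions; the two designs and the value of σ² are hypothetical.

```python
from statistics import mean


def var_b_hat(x, sigma2):
    """Var(b_hat | x) = sigma^2 / sum_n (x_n - x_bar)^2 under the classical assumptions."""
    x_bar = mean(x)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        # All x_n are equal: the slope cannot be separated from the intercept.
        raise ValueError("no sample variation in x: the slope is not identified")
    return sigma2 / sxx


sigma2 = 1.0  # hypothetical error variance

# Two hypothetical designs with the same mean but different spread.
x_narrow = [9.0, 10.0, 11.0, 10.0, 9.0, 11.0]
x_wide = [0.0, 10.0, 20.0, 10.0, 0.0, 20.0]

var_narrow = var_b_hat(x_narrow, sigma2)
var_wide = var_b_hat(x_wide, sigma2)
print(var_narrow, var_wide)
```

The more dispersed design gives a much smaller variance for the slope estimator, and a design with identical values of xn makes the formula undefined.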
Problem Set 4
The results of the regressions will be studied in more detail during the following
tutorial session : keep them.
1. In this file, there is no variable describing the number of years of experience of the
individual. Explain why the individual’s age is an imperfect approximation of their number
of years of experience. However, we will make this approximation throughout the rest of
the study.
2. Explain why it is more realistic, from the point of view of economic interpretation, to
estimate a linear model describing the logarithm of wages as a function of age rather
than a linear model describing the wage as a function of age.
3. i) Create the variable LogSALRED obtained by taking the logarithm of SALRED.
ii) Make graphs giving LogSALRED versus AGE for values of DDIPL equal to 1, 3, 5,
7. Comment.
4. In the remainder of the exercise, we will use OLS to estimate the model
LogSALRED = a + b AGE + ε
i) For DDIPL=1, calculate : the sample covariance between LogSALRED and AGE
(reminder : there is a correction to be made in Excel : does it have a big impact
here ?), the sample variance of AGE, the coefficients b̂ and â in the linear regression
of LogSALRED on the constant and AGE.
ii) In the FORMULES tab, under "plus de fonctions", look for the function
DROITEREG and use it to perform the linear regression of LogSALRED on the
constant and AGE. Compare with (i).
iii) In the DONNEES tab look for "utilitaire d’analyse", then choose the tool
"Régression linéaire" and estimate the model : check "intitulé présent"
so that the variable names appear in the output, as well as "niveau de
confiance 95%", "courbes des résidus", "courbes de régression" and
"diagramme de répartition des probabilités". Comment on the values obtained for
b̂ and for R².
Throughout the applied tutorial sessions (TDs), we will use this tool whenever
we want to run a linear regression.
iv) Repeat the same estimation for the different values of DDIPL (in case of lack of time,
you can limit yourself to DDIPL=1, DDIPL=3, DDIPL = 5, DDIPL=7). Comment.
v) Repeat the same estimation for DDIPL=1, but only taking observations between the
first and last percentile (you will need to sort the data beforehand). Compare with
the results obtained in the previous question.
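The hand calculation in question 4 (i) can be mirrored outside Excel. The sketch below is written in Python on invented data. It assumes that the "correction" mentioned in (i) refers to the fact that Excel's COVAR function divides by N whereas VAR divides by N − 1; this reading is an assumption, not something stated in the text.

```python
from statistics import mean

# Hypothetical observations for one degree level (not the course data).
age = [22, 30, 41, 35, 55, 48]
log_sal = [7.0, 7.2, 7.4, 7.3, 7.6, 7.5]

n = len(age)
age_bar, sal_bar = mean(age), mean(log_sal)
sxy = sum((a - age_bar) * (s - sal_bar) for a, s in zip(age, log_sal))
sxx = sum((a - age_bar) ** 2 for a in age)

cov_n = sxy / n          # covariance with divisor N
cov_n1 = sxy / (n - 1)   # covariance with divisor N - 1
var_n1 = sxx / (n - 1)   # variance with divisor N - 1

# Consistent divisors: the ratio is the OLS slope.
b_hat = cov_n1 / var_n1
a_hat = sal_bar - b_hat * age_bar

# Mixing divisors shrinks the slope by the factor (N - 1) / N.
b_mixed = cov_n / var_n1
print(b_hat, a_hat, b_mixed)
```

With several thousand observations the factor (N − 1)/N is very close to one, so the size of the sample determines how much the correction matters.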
Problem Set 5
Simple linear regression model : properties of OLS, and confidence
intervals in the Gaussian model
Part I (1h)
EXERCISE 1 :
$$\bar{y} = -0.9676, \qquad \bar{x} = 0.1126$$
$$s_y = \sqrt{\frac{1}{N}\sum_{n=1}^{N}(y_n - \bar{y})^2} = 0.10413, \qquad s_x = \sqrt{\frac{1}{N}\sum_{n=1}^{N}(x_n - \bar{x})^2} = 0.2296, \qquad \mathrm{Cov}_{emp}(x, y) = -0.01846$$
EXERCISE 2 :
Consider the sample in exercise 1 of Problem Set 3. Add the assumption that εn is distributed
i.i.d. according to N (0, σ 2 ).
1. Calculate the 95% confidence intervals for a, b, c, d. Comment.
i) Test the significance of the parameters at a 5% level.
ii) What are the "p-values" of the test statistics you calculated in i) ?
iii) The test of significance of d in (M2) is equivalent to testing a hypothesis about b in
(M1). State this hypothesis. What is the economic interpretation ?
iv) How would you carry out a test of the hypothesis about b you formulated in iii)
directly ?
Part II
4. Carry out the significance tests of the coefficients for all the regressions considered :
i) using the Student statistic
ii) using "p-values".
Comment on the results obtained.
Problem set 6
A company doctor collected information from a sample of 30 young male employees. For each
of them, denote as yn his weight (in kg) and as xn his height (in cm).
Suppose that the underlying population model has the form yn = a + bxn + εn and that the
error term is distributed according to N (0, σ 2 ). Suppose that the sample is a random sample
from this distribution.
This model is estimated using OLS on the given sample. In particular, the sample means,
standard deviations and covariances are :
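$$\bar{y} = 86.2, \qquad \bar{x} = 183, \qquad \mathrm{Cov}_{emp}(x, y) = \frac{1}{N}\sum_{n=1}^{N}(y_n - \bar{y})(x_n - \bar{x}) = 168.95$$
$$s_y = \sqrt{\frac{1}{N}\sum_{n=1}^{N}(y_n - \bar{y})^2} = 12.83, \qquad s_x = \sqrt{\frac{1}{N}\sum_{n=1}^{N}(x_n - \bar{x})^2} = 15.1$$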
6. Construct a 95% confidence interval for the weight of a young man with the same
characteristics who is 1.85 m tall.
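The computations behind question 6 can be sketched from the summary statistics given above (N = 30, ȳ = 86.2, x̄ = 183, empirical covariance 168.95, sy = 12.83, sx = 15.1). This is a minimal sketch, not the official solution: it reads the question as a forecast interval for one individual, and the critical value 2.048 is the usual table value for the 97.5% quantile of a Student distribution with 28 degrees of freedom.

```python
from math import sqrt

# Summary statistics from the exercise (moments computed with divisor N).
N = 30
y_bar, x_bar = 86.2, 183.0
cov_xy = 168.95
s_y, s_x = 12.83, 15.1

# OLS estimates from the moments.
b_hat = cov_xy / s_x ** 2
a_hat = y_bar - b_hat * x_bar

# Point forecast for a height of 1.85 m, expressed in cm.
x0 = 185.0
y0_hat = a_hat + b_hat * x0

# Sum of squared residuals and unbiased estimate of sigma^2.
ssr = N * (s_y ** 2 - b_hat * cov_xy)
sigma2_hat = ssr / (N - 2)

# Standard error of the forecast error for one individual at x0.
# Dropping the leading "1 +" would give the interval for the expected weight.
se_forecast = sqrt(sigma2_hat * (1 + 1 / N + (x0 - x_bar) ** 2 / (N * s_x ** 2)))

t_crit = 2.048  # table value, Student t with N - 2 = 28 df, 97.5% quantile
lower = y0_hat - t_crit * se_forecast
upper = y0_hat + t_crit * se_forecast
print(round(b_hat, 4), round(y0_hat, 2), round(lower, 2), round(upper, 2))
```

The interval is wide relative to the point forecast because the individual error term dominates the estimation uncertainty in a sample of this size.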
EXERCISE 2 :
Problem set 7 (1h)
Simple linear regression : matrix notation and random vectors
EXERCISE 1 :
Consider again the data in Exercise 3 in Problem set 1.
EXERCISE 2 :
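Consider a random vector $Y = (Y_1, Y_2, Y_3)'$ with expected value and variance matrix
$$EY = \begin{pmatrix} 1 \\ 0 \\ -2 \end{pmatrix} \quad \text{and} \quad VY = \begin{pmatrix} 1 & 1/2 & -1/2 \\ 1/2 & 2 & 1 \\ -1/2 & 1 & 2 \end{pmatrix}.$$
Define $Z = \begin{pmatrix} 2Y_1 + Y_2 \\ Y_1 - Y_3 \end{pmatrix}$.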
Calculate EZ and VZ :
i) term-by-term for every entry of EZ and VZ.
ii) using matrix notation as seen in class.
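The matrix route in ii) can be checked with exact arithmetic. The sketch below writes Z = AY with A = [[2, 1, 0], [1, 0, −1]] and applies EZ = A·EY and VZ = A·VY·A'. It takes EY = (1, 0, −2)' and the symmetric matrix VY with rows (1, 1/2, −1/2), (1/2, 2, 1), (−1/2, 1, 2) as the values of the exercise; the helper names are illustrative.

```python
from fractions import Fraction as F

# Moments of Y as read from the exercise.
EY = [F(1), F(0), F(-2)]
VY = [
    [F(1), F(1, 2), F(-1, 2)],
    [F(1, 2), F(2), F(1)],
    [F(-1, 2), F(1), F(2)],
]

# Z = A Y with Z1 = 2 Y1 + Y2 and Z2 = Y1 - Y3.
A = [
    [2, 1, 0],
    [1, 0, -1],
]


def mat_vec(M, v):
    """Matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def mat_mul(M, P):
    """Matrix-matrix product."""
    return [
        [sum(M[i][k] * P[k][j] for k in range(len(P))) for j in range(len(P[0]))]
        for i in range(len(M))
    ]


def transpose(M):
    return [list(col) for col in zip(*M)]


EZ = mat_vec(A, EY)                         # E(AY) = A E(Y)
VZ = mat_mul(mat_mul(A, VY), transpose(A))  # V(AY) = A V(Y) A'
print(EZ, VZ)
```

The entries of VZ can be compared one by one with the term-by-term calculation of i), for example Var(2Y1 + Y2) = 4 Var(Y1) + Var(Y2) + 4 Cov(Y1, Y2).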
Problem Set 8
EXERCISE 1 :
1. Consider the following wage equation :
where :
— indFn is a variable that takes the value 1 if individual n is a woman, 0 otherwise
— etudn is the number of years of study that individual n has completed.
How do you interpret the coefficients of the model ?
2. Define a new variable indHn that takes value 1 if individual n is a man and 0 otherwise.
Can we estimate the model (M2) defined as : ln wn = a0 +a1 indFn +a2 indHn +a3 etudn +
εn ? Why, or why not ?
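A small numerical experiment can make the issue in question 2 concrete. The sketch below builds the matrix X'X for a hypothetical sample of six individuals, once with the constant, indF, indH and etud, and once without indH, and computes the determinant exactly. The data and helper names are invented for illustration.

```python
# Hypothetical sample of 6 individuals (not from any course data set).
ind_f = [1, 0, 1, 1, 0, 0]
ind_h = [1 - f for f in ind_f]  # indH = 1 - indF for every individual
etud = [12, 16, 14, 10, 12, 18]
const = [1] * len(ind_f)


def gram(columns):
    """Return X'X for a design matrix given as a list of columns."""
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in columns] for ci in columns]


def det(M):
    """Determinant by cofactor expansion along the first row (fine for small matrices)."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, pivot in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * pivot * det(minor)
    return total


# Design of (M2): constant, indF, indH, etud.
det_m2 = det(gram([const, ind_f, ind_h, etud]))

# Same design without indH.
det_without_h = det(gram([const, ind_f, etud]))
print(det_m2, det_without_h)
```

Because the constant equals indF + indH for every individual, the first determinant is exactly zero, so (X'X)⁻¹ does not exist and the OLS formula cannot be applied to that design.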
3. Consider now the model (M3)
Can we estimate model (M3) ? Why, or why not ? Interpret the coefficients of the new
model.
4. What is the mathematical relationship between the parameters of models (M1) and (M3) ?
5. Show that the estimators of the parameters in (M1) and (M3) are linked by the same
relationships as those that link the parameters of these models (use the definition of the
OLS estimators).
6. Consider now the model (M4) :
EXERCISE 2 :
Consider the following 2 models :
yn = b0 + b1 x1n + b2 x2n + un (M 1)
yn = b0 + b1 x1n + b2 x2n + b3 x3n + vn (M 2)
Part II : Data analysis
Preliminary remarks :
• to estimate a model with several explanatory variables using the "Régression linéaire"
tool of the "utilitaire d’analyse", these variables must be placed in contiguous columns.
Then define the associated rectangular range under "Plage pour les variables X"
• when you use the "Régression linéaire" tool of the "utilitaire d’analyse", check
the same boxes as for the simple regression (see question 4 (iii) of
Problem Set 4).
Interpret this model and explain why it is relevant. What do you think is the sign of
the parameter b2 in this model ?
iii) Estimate the model for each degree level (or possibly only for DDIPL=1, DDIPL=3,
DDIPL=5, DDIPL=7) and comment in detail on the results obtained.
2. In PS 10, we will also look at possible differences between men and women within the
framework of this model.
Create an indicator variable HOM which is equal to 1 if the individual is a man and
0 otherwise, and an indicator variable FEM which is equal to 1 if the individual is a
woman and 0 otherwise : use the FORMULES tab, then the section Logique
and the function SI, together with the variable SEXE, which equals 1 if the individual
is a man and 2 if the individual is a woman.
3. Keep the variables AGE2, HOM, FEM in your data file.
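For readers who want to cross-check their spreadsheet, the construction of these variables can be sketched outside Excel. The sketch below assumes that AGE2 denotes the square of AGE (its definition is not reproduced in this text); the sample values are invented.

```python
# Hypothetical extract of the data: SEXE is 1 for a man and 2 for a woman.
sexe = [1, 2, 2, 1, 2]
age = [25, 40, 33, 51, 29]

# Indicator variables, the analogue of an Excel SI(...) test on SEXE.
hom = [1 if s == 1 else 0 for s in sexe]
fem = [1 if s == 2 else 0 for s in sexe]

# Assumption: AGE2 is the square of AGE.
age2 = [a ** 2 for a in age]

for row in zip(sexe, hom, fem, age, age2):
    print(row)
```

By construction HOM + FEM = 1 for every individual, which matters when deciding which of the two indicators to include alongside a constant.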
Problem Set 9
EXERCISE 1 :
In an econometric study in the US on a sample of 4000 employees, the authors modeled the
mean hourly salary of every individual during the year 1998 as a function of her or his highest
educational degree, gender and age.
The explanatory variables are :
— Educ is an indicator variable that takes value 1 if the individual has a college degree,
and 0 otherwise
— Female is an indicator variable that takes value 1 if the individual is female, and 0
otherwise
— Age is the individual’s age
Note : SER denotes “standard error” and corresponds to the σ̂ you have seen in class.
EXERCISE 2 :
Consider a model of the form yn = b0 + b1 x1n + b2 x2n + εn and assume that the error εn is
distributed according to N (0, σ 2 ). Furthermore, assume that the observations are obtained
through random sampling.
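Denote as $\beta = (b_0, b_1, b_2)'$ the vector of parameters and as $\hat\beta = (\hat b_0, \hat b_1, \hat b_2)'$ the OLS estimator of β.
1. Write down the model and the expression for the estimator β̂ using matrix notation.
2. Using what you saw in class, show that $\hat\beta \sim N\left(\beta, \sigma^2 (X'X)^{-1}\right)$.
3. Suppose that
$$(X'X)^{-1} = \begin{pmatrix} 1 & 1/2 & -1/2 \\ 1/2 & 2 & 1/4 \\ -1/2 & 1/4 & 1 \end{pmatrix}$$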
i) What is the distribution of b̂0, and of b̂2 ?
ii) What is the distribution of the vector Z = (b̂1, b̂2)' ?
iii) What is the distribution of the real random variable 2b̂1 − b̂2 ?
Problem set 10
Multiple regression :
Data work with Excel continued
1. Regress LogSALRED on the variables HOM, AGE, AGE2 and the
constant for each level of diploma (or possibly only for DDIPL=1, DDIPL=4, DDIPL=7).
i) How is the coefficient of the HOM variable interpreted in this framework ?
ii) Is this coefficient significant at the 5% level ? How is this interpreted ?
iii) We denote b1 for this coefficient. Perform the test of H0 : b1 ≥ 0 against H1 : b1 < 0
at a 5% level of significance. How is this interpreted ?
2. i) Regress LogSALRED on the variables FEM, AGE, AGE2 and
the constant for each level of diploma (or possibly only for DDIPL=1, DDIPL=4,
DDIPL=7). How is the coefficient of the FEM variable interpreted in this framework ?
Is this coefficient significant at the 5% level ?
ii) For a given level of diploma (for example for DDIPL=1), what are the relationships
between the coefficients of the equations estimated in (1) and (2) ? Study these
relationships from a theoretical perspective, then check that they are satisfied by the
estimated coefficients.
5. Optional question (to be answered only if there is time left during this tutorial
session) :
i) Create indicator variables for the level of diploma, denoted NIV1, NIV3, ...,
NIV7 : these variables are defined as follows : NIVi = 1 if the diploma
is equal to i and 0 otherwise (proceed in the same way as for the creation of
the HOM and FEM variables).
ii) Regress LogSALRED on the variables HOM, NIV1, NIV3, ..., NIV6,
AGE, AGE2, and the constant. Why is the NIV7 variable not included in this list
of explanatory variables ?
Carefully explain how the coefficients of the different variables in this model are
interpreted. Are these coefficients significant at the 5% level ? What is the point of
the formulation used in this model ? What information are we losing compared to
the models estimated by level of diploma in question 2 (vii) ?
iii) What could we do to avoid losing this information ?
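The one-sided test of question 1 (iii), H0 : b1 ≥ 0 against H1 : b1 < 0, can be sketched as follows. The coefficient and standard error values are hypothetical placeholders to be replaced by the Excel output, and the critical value uses the normal approximation to the Student distribution, which is an assumption that is reasonable only when the number of degrees of freedom is large.

```python
from statistics import NormalDist


def one_sided_test_lower(coef, se, alpha=0.05):
    """Test H0: b >= 0 against H1: b < 0.

    Rejects H0 when the t statistic falls below the alpha-quantile of the
    reference distribution (here the standard normal, as an approximation).
    """
    t_stat = coef / se
    crit = NormalDist().inv_cdf(alpha)  # about -1.645 for alpha = 0.05
    return t_stat, crit, t_stat < crit


# Hypothetical regression outputs (not results from the course data).
t_pos, crit, reject_pos = one_sided_test_lower(coef=0.12, se=0.03)
t_neg, _, reject_neg = one_sided_test_lower(coef=-0.10, se=0.04)
print(t_pos, crit, reject_pos)
print(t_neg, crit, reject_neg)
```

A large positive t statistic never leads to rejection here: only sufficiently negative values count as evidence against H0, in contrast with the two-sided significance test of question 1 (ii).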
Problem Set 11
Multiple regression :
tests and forecast intervals
EXERCISE 1 :
We try to understand how the number of students in university cities affects rents. For this,
we study a sample of 64 cities.
For each city, denote as loy the (natural) logarithm of the mean rent per square meter (m²) for
rental apartments, pop the logarithm of the population, revmoy the logarithm of mean
household income and pctstu the percentage of students in the population. We estimate the
following model :
and assume that the error term is distributed normally according to N (0, σ 2 ). Furthermore,
we assume that we have a random sample from this model.
1. i) Interpret precisely the model coefficients.
ii) Which signs would you expect b1, b2 and b3 to have ?
2. The estimation yields the following results (standard errors are given in parentheses
below the coefficient)
$$\widehat{loy} = \underset{(0.844)}{0.043} + \underset{(0.035)}{0.072}\, pop + \underset{(0.081)}{0.507}\, revmoy + \underset{(0.0017)}{0.0056}\, pctstu$$
$$R^2 = 0.458$$
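The estimated equation lends itself to a quick computation of Student statistics. The sketch below assumes that the model contains a constant and the three regressors shown, so that there are 64 − 4 = 60 degrees of freedom; the critical value 2.000 is the usual table value for the 97.5% quantile of a Student distribution with 60 degrees of freedom.

```python
# Coefficients and standard errors as reported in the estimated equation.
estimates = {
    "constant": (0.043, 0.844),
    "pop": (0.072, 0.035),
    "revmoy": (0.507, 0.081),
    "pctstu": (0.0056, 0.0017),
}

N = 64                  # number of cities
K = len(estimates)      # estimated coefficients, constant included (assumption)
df = N - K              # degrees of freedom

t_crit = 2.000  # table value, Student t with 60 df, two-sided 5% test

# t statistic for H0: coefficient = 0, and two-sided decision at the 5% level.
t_stats = {name: coef / se for name, (coef, se) in estimates.items()}
significant = {name: abs(t) > t_crit for name, t in t_stats.items()}

for name in estimates:
    print(name, round(t_stats[name], 3), significant[name])
```

The statistic for pop is only slightly above the critical value, so the conclusion for that coefficient is sensitive to the chosen significance level.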
EXERCISE 2 :
Consider the following linear model : yn = b0 + b1 xn1 + b2 xn2 + εn, where we assume that the
error term is distributed according to N(0, σ²).
We estimate the model on a random sample of N = 25 observations. This yields the following
results :
$$\hat\beta = \begin{pmatrix} 0.8 \\ 1.2 \\ 0.3 \end{pmatrix}, \qquad SCR = 3.368$$
where SCR denotes the sum of squared residuals.
EXERCISE 3 :
We estimate a Cobb-Douglas production function in growth rates. We estimate it on data
from a given industrial sector, using annual data for T = 25 years.
Denote :
— q̇t the yearly growth of total production in volume terms
— l̇t the yearly growth of labour inputs
— k̇t the yearly growth of capital inputs
The estimated model is :
3. i) What is the distribution of â + b̂ ?
ii) Calculate the estimated variance of â + b̂.
iii) Construct the test for the null hypothesis H0 : a + b = 1 against the alternative
H1 : a + b ≠ 1. How do you interpret the null hypothesis ? Carry out this test at the
5% level.
4. Construct a 95% forecast interval for q if l̇ = 1 and k̇ = 0.5.
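For question 3, the general structure of the test can be written down as follows. This is a sketch under the Gaussian assumptions used throughout; K denotes the number of estimated coefficients in the model, which is left symbolic here because the estimated equation is not reproduced in this text.

```latex
\widehat{V}(\hat a + \hat b)
  = \widehat{V}(\hat a) + \widehat{V}(\hat b) + 2\,\widehat{\mathrm{Cov}}(\hat a, \hat b),
\qquad
t = \frac{\hat a + \hat b - 1}{\sqrt{\widehat{V}(\hat a + \hat b)}}
  \;\sim\; \mathcal{T}(T - K) \quad \text{under } H_0 : a + b = 1 .
```

The null hypothesis is rejected at the 5% level when |t| exceeds the 97.5% quantile of the Student distribution with T − K degrees of freedom, with T = 25 here.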