HW1 2023
28 October 2023
Overview / Instructions
This homework is due on 11 November 2023 at 11:55 PM via Moodle.
You are required to submit 1) the original R Markdown file and 2) the knitted HTML or PDF file.
Please comment your R code wherever appropriate. Well-formatted assignments will earn
extra points.
In general, be as concise as possible while still giving a complete answer. All necessary
data are available on Moodle.
Remember that the Class Policy strictly applies to homework. You are encouraged to
discuss the problems with fellow students. However, each student must be able to answer the
questions on her/his own. Note that the final exam is taken individually.
Question 0
Review the lectures.
Please derive the relationships between $\{\alpha_0, \alpha_1, \alpha_2, \alpha_3\}$ and $\{\beta_0, \beta_1, \beta_2, \beta_3\}$.
Question 2: Production Time Run
ProdTime.dat contains data on 20 production runs supervised by each of three
managers (60 runs in total). Each observation records the time in minutes to complete the run
(Time for Run), the number of units produced (Run Size), and the manager involved (Manager).
Which manager performs best?
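One common way to compare managers while accounting for run size is a regression of run time on run size plus manager dummy variables (an analysis of covariance). The sketch below is only an illustration of that idea, written in Python/NumPy rather than R: the data are synthetic placeholders, not ProdTime.dat, and the labels `a`, `b`, `c` and the variable names are hypothetical. It assumes a common slope for all managers and compares their adjusted intercepts; in R an analogous model could be fit with `lm()`.

```python
import numpy as np

# Synthetic stand-in for ProdTime.dat (NOT the real data): 20 runs per manager.
rng = np.random.default_rng(0)
managers = np.repeat(["a", "b", "c"], 20)              # hypothetical manager labels
run_size = rng.integers(50, 350, size=60).astype(float)

# Assumed data-generating process: common slope, manager-specific intercepts.
true_intercept = {"a": 200.0, "b": 170.0, "c": 180.0}
run_time = (np.array([true_intercept[m] for m in managers])
            + 0.25 * run_size + rng.normal(0, 5, size=60))

# Design matrix: intercept (manager "a" is the baseline), run size,
# and dummy variables for managers "b" and "c".
X = np.column_stack([
    np.ones(60),
    run_size,
    (managers == "b").astype(float),
    (managers == "c").astype(float),
])
beta, *_ = np.linalg.lstsq(X, run_time, rcond=None)

# Adjusted intercept per manager = baseline intercept + that manager's dummy effect.
adjusted = {"a": beta[0], "b": beta[0] + beta[2], "c": beta[0] + beta[3]}
best = min(adjusted, key=adjusted.get)   # lowest expected time at equal run size
print("common slope:", round(beta[1], 3))
print("adjusted intercepts:", {k: round(v, 1) for k, v in adjusted.items()})
print("best manager (lowest time at equal run size):", best)
```

If the managers' slopes may differ, adding manager-by-run-size interaction terms is a natural extension; "best" then depends on the run size at which the managers are compared.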
Our goal is to find the factors/variables that relate to violent crime. The response variable is
stored in the crime data as crime$violentcrimes.perpop.
Q4.1
Divide your data into 80% training and 20% testing sets. Run an ordinary least squares (OLS)
regression on the training data using all of the variables. Report the RMSE and $R^2$ for both
the training and testing data, and comment on whether they differ.
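The train/test workflow described above can be sketched as follows. This is a minimal Python/NumPy illustration, not the expected R solution: the data are synthetic placeholders (not the crime data), and `rmse` and `r_squared` are hypothetical helper names. Here $R^2$ on each set is computed around that set's own mean; some courses instead use the training mean for the test set, so check which convention your lecture notes use.

```python
import numpy as np

def rmse(y, yhat):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

def r_squared(y, yhat):
    """R^2 = 1 - SSE/SST, with SST computed around the mean of y itself."""
    sse = np.sum((y - yhat) ** 2)
    sst = np.sum((y - np.mean(y)) ** 2)
    return float(1 - sse / sst)

# Synthetic placeholder data (NOT the crime data): n rows, p predictors.
rng = np.random.default_rng(1)
n, p = 200, 10
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(size=n)

# Random 80/20 split of the row indices.
idx = rng.permutation(n)
n_train = int(0.8 * n)
train, test = idx[:n_train], idx[n_train:]

# OLS fitted on the training rows only (intercept column prepended).
Xd = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(Xd[train], y[train], rcond=None)

# Evaluate the same fitted coefficients on both sets.
for name, rows in [("train", train), ("test", test)]:
    yhat = Xd[rows] @ beta
    print(name, "RMSE:", round(rmse(y[rows], yhat), 3),
          "R2:", round(r_squared(y[rows], yhat), 3))
```

Because the coefficients are chosen to minimize training error, the training RMSE is usually somewhat lower than the testing RMSE; a large gap suggests overfitting.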
Q4.2
Use LASSO to choose a reasonable, small model based on the training data you created.
Re-fit an OLS model with the selected variables. The final model should only include
variables with p-values < 0.05. Note: you may use either lambda.1se or lambda.min
to answer the following questions where applicable.
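To illustrate how the two tuning-parameter choices differ, the following is a minimal Python/NumPy sketch of LASSO by coordinate descent with 5-fold cross-validation, followed by an OLS refit on the selected variables. It is written from scratch under stated assumptions and is not glmnet: it assumes the glmnet-style objective $\frac{1}{2n}\lVert y - X\beta\rVert^2 + \lambda\lVert\beta\rVert_1$ on standardized predictors, uses synthetic placeholder data, and all function names are hypothetical. In R, `cv.glmnet(x, y, alpha = 1, nfolds = 5)` reports the corresponding `lambda.min` and `lambda.1se`.

```python
import numpy as np

def soft_threshold(z, gamma):
    return np.sign(z) * max(abs(z) - gamma, 0.0)

def lasso_cd(X, y, lam, n_iter=500, tol=1e-8):
    """Coordinate descent for (1/(2n))||y - Xb||^2 + lam*||b||_1.
    Assumes X has standardized columns and y is centered (no intercept)."""
    n, p = X.shape
    b = np.zeros(p)
    r = y.copy()                               # current residual y - Xb
    col_ss = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            old = b[j]
            r += X[:, j] * old                 # partial residual without feature j
            b[j] = soft_threshold(X[:, j] @ r / n, lam) / col_ss[j]
            r -= X[:, j] * b[j]
            max_step = max(max_step, abs(b[j] - old))
        if max_step < tol:
            break
    return b

def standardize(X_train, X_other):
    """Scale both matrices with the training means and (1/n) standard deviations."""
    mu, sd = X_train.mean(axis=0), X_train.std(axis=0)
    return (X_train - mu) / sd, (X_other - mu) / sd

def cv_lasso(X, y, lambdas, k=5, seed=0):
    """K-fold CV: mean and standard error of the fold MSE for each lambda."""
    folds = np.random.default_rng(seed).permutation(len(y)) % k
    mse = np.zeros((k, len(lambdas)))
    for f in range(k):
        tr, va = folds != f, folds == f
        Xtr, Xva = standardize(X[tr], X[va])
        ybar = y[tr].mean()
        for i, lam in enumerate(lambdas):
            b = lasso_cd(Xtr, y[tr] - ybar, lam)
            mse[f, i] = np.mean((y[va] - ybar - Xva @ b) ** 2)
    return mse.mean(axis=0), mse.std(axis=0, ddof=1) / np.sqrt(k)

# Synthetic placeholder training data: only columns 0 and 2 carry signal.
rng = np.random.default_rng(2)
n, p = 160, 8
X = rng.normal(size=(n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(size=n)

Xs, _ = standardize(X, X)
lam_max = np.max(np.abs(Xs.T @ (y - y.mean()))) / n   # smallest lambda giving b = 0
lambdas = lam_max * np.logspace(0, -3, 30)            # decreasing grid
cvm, cvse = cv_lasso(X, y, lambdas, k=5)

i_min = int(np.argmin(cvm))
lambda_min = lambdas[i_min]
# 1-SE rule: the largest lambda whose CV error is within one SE of the minimum.
i_1se = int(np.where(cvm <= cvm[i_min] + cvse[i_min])[0][0])
lambda_1se = lambdas[i_1se]

b = lasso_cd(Xs, y - y.mean(), lambda_1se)
selected = np.flatnonzero(b != 0)
print("lambda.min:", lambda_min, "lambda.1se:", lambda_1se)
print("selected columns:", selected)

# Post-LASSO refit: ordinary least squares on the selected columns only.
Xd = np.column_stack([np.ones(n), X[:, selected]])
beta_ols, *_ = np.linalg.lstsq(Xd, y, rcond=None)
print("refit OLS coefficients:", np.round(beta_ols, 3))
```

The 1-SE rule deliberately trades a small increase in CV error for a larger penalty and hence a sparser model, which fits the goal of a "reasonable, small model"; `lambda.min` usually keeps more variables.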
i. What is the model reported by LASSO? Use 5-fold cross-validation to select the
tuning parameter.
ii. What is the model after refitting OLS with the selected variables? What are the RMSE
and $R^2$ for the training and testing data? Compare them with the results in Q4.1.
iii. What is your final model after excluding variables with high p-values? You will need to
use a model selection method to obtain this final model. Make clear which
criterion/criteria you used and justify why they are appropriate.
iv. Try Ridge regression, using 5-fold CV to select the tuning parameter. Compare its
training and testing RMSE and $R^2$ with those of the previous models.
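For the Ridge comparison, a minimal Python/NumPy sketch of closed-form Ridge regression with the penalty chosen by 5-fold CV on the training rows is shown below. It makes several assumptions: the data are synthetic placeholders with correlated predictors (not the crime data), the helper names are hypothetical, and the penalty is written as $\lVert y - X\beta\rVert^2 + \lambda\lVert\beta\rVert^2$, so its $\lambda$ values are not on the same scale as glmnet's. In R, the analogous call is `cv.glmnet(x, y, alpha = 0, nfolds = 5)`.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge on standardized X and centered y: minimizes ||y - Xb||^2 + lam*||b||^2."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def fit_predict(X_tr, y_tr, X_new, lam):
    """Standardize with training statistics, fit ridge, and predict for X_new."""
    mu, sd = X_tr.mean(axis=0), X_tr.std(axis=0)
    b = ridge_fit((X_tr - mu) / sd, y_tr - y_tr.mean(), lam)
    return y_tr.mean() + ((X_new - mu) / sd) @ b

def rmse(y, yhat):
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

def r_squared(y, yhat):
    return float(1 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2))

# Synthetic placeholder data (NOT the crime data): columns share a common factor.
rng = np.random.default_rng(3)
n, p = 200, 10
z = rng.normal(size=(n, 1))
X = z + 0.5 * rng.normal(size=(n, p))
y = X[:, :3].sum(axis=1) + rng.normal(size=n)

# 80/20 split; all tuning uses the training rows only.
idx = rng.permutation(n)
train, test = idx[:160], idx[160:]
Xtr, ytr = X[train], y[train]

# 5-fold CV over a log-spaced lambda grid.
lambdas = np.logspace(-3, 3, 25)
folds = rng.permutation(len(ytr)) % 5
cv_mse = np.zeros(len(lambdas))
for i, lam in enumerate(lambdas):
    errs = []
    for f in range(5):
        tr, va = folds != f, folds == f
        yhat = fit_predict(Xtr[tr], ytr[tr], Xtr[va], lam)
        errs.append(np.mean((ytr[va] - yhat) ** 2))
    cv_mse[i] = np.mean(errs)
best_lam = lambdas[int(np.argmin(cv_mse))]

# Refit on all training rows with the chosen lambda; evaluate on both sets.
for name, rows in [("train", train), ("test", test)]:
    yhat = fit_predict(Xtr, ytr, X[rows], best_lam)
    print(name, "RMSE:", round(rmse(y[rows], yhat), 3),
          "R2:", round(r_squared(y[rows], yhat), 3))
```

Unlike LASSO, Ridge shrinks coefficients without setting any of them exactly to zero, so it keeps every variable; it can still predict well when predictors are strongly correlated, but it does not by itself yield a small model.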