How A Perfect Machine Model Should Be Done
Mark distribution
The marks for this coursework are well distributed:
average: 68
standard deviation: 15.6
median: 70
min: 10
max: 93
Q1
Confusion matrix:
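$$\text{Precision}\ (p) = \frac{TP}{TP + FP} = \frac{7}{7 + 4} \approx 63.6\%$$
$$\text{Recall}\ (r) = \frac{TP}{TP + FN} = \frac{7}{7 + 2} \approx 78\%$$
$$\text{Accuracy} = \frac{TP + TN}{\text{All}} = \frac{7 + 7}{20} = 70\%$$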
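The Q1 arithmetic can be checked with a few lines of Python. This is only a sketch: the four counts are read off the formulas above (TP = 7, FP = 4, FN = 2, TN = 7), and the variable names are illustrative.

```python
# Counts taken from the Q1 formulas: TP = 7, FP = 4, FN = 2, TN = 7.
tp, fp, fn, tn = 7, 4, 2, 7

precision = tp / (tp + fp)                  # of the predicted positives, how many are correct
recall = tp / (tp + fn)                     # of the actual positives, how many are found
accuracy = (tp + tn) / (tp + fp + fn + tn)  # all correct predictions over all samples

print(f"precision={precision:.1%} recall={recall:.1%} accuracy={accuracy:.1%}")
# prints: precision=63.6% recall=77.8% accuracy=70.0%
```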
Q2
We expect the submission to contain correct code for all elements and a good summary of your
work in the report. In addition to the individual feedback you have received, below is general
feedback for question 2 to help you identify what you did well and what you need to improve
in future assessments. You are advised to read your individual feedback in conjunction with
this overall feedback.
• Size of dataset
• Missing value check (no missing value in this data)
• Features:
o Categorical: categories of each feature & summary
o Numerical: min, max, mean, std (e.g., data.describe())
• Histogram plot of each feature + summary in report
• Scatter plot of pairs of features + summary in report
• Boxplot + summary
• Correlation/relationship + observation summary in report
o between each pair of features
o between each feature and the target (Revenue)
• Further useful and insightful exploration and analysis
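The numerical part of the exploration checklist above could be sketched with pandas as follows. The toy data frame is hypothetical: only the target name Revenue and the call data.describe() come from this feedback, while the other column names and all values are made up for illustration. Plots (histograms, scatter plots, boxplots) are left out of the sketch.

```python
import pandas as pd

# Hypothetical toy data standing in for the coursework dataset.
# Only the target name "Revenue" comes from the feedback; the rest is invented.
data = pd.DataFrame({
    "PageValues": [0.0, 12.5, 0.0, 30.1, 5.2, 0.0],
    "Duration": [10.0, 250.0, 35.0, 400.0, 120.0, 8.0],
    "VisitorType": ["New", "Returning", "Returning", "New", "Returning", "New"],
    "Revenue": [False, True, False, True, False, False],
})

n_rows, n_cols = data.shape                           # size of the dataset
missing_per_column = data.isna().sum()                # missing-value check
numeric_summary = data.describe()                     # count, mean, std, min, max, quartiles
category_counts = data["VisitorType"].value_counts()  # categories of a categorical feature

# Pairwise Pearson correlation, with the boolean target cast to 0/1.
numeric = data.select_dtypes(include="number").assign(
    Revenue=data["Revenue"].astype(int)
)
corr_matrix = numeric.corr()                                # between each pair of features
corr_with_target = corr_matrix["Revenue"].drop("Revenue")   # each feature vs. the target
```

Each of these outputs still needs a short written observation in the report; the numbers alone are not a summary.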
• Preprocessing should normally be fitted on the training set only; the fitted preprocessing
is then used to transform both the training data and the test data.
o Bad: fitting the preprocessing methods on the whole dataset.
o Completely wrong: fitting the preprocessing methods on the training data and the test data
separately.
• Preprocessing was done but not actually used when implementing the model.
• Insufficient preprocessing
• Irrelevant preprocessing: some students confirmed that the data has no missing values but
still performed imputation.
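The fit-on-training-only rule above can be illustrated with a minimal sketch. It assumes standardisation as the preprocessing step and uses made-up numbers; the helper name standardise is hypothetical, and only the Python standard library is used so that the fit and transform steps stay visible.

```python
from statistics import mean, pstdev

# Hypothetical single numerical feature, already split into train and test.
train = [2.0, 4.0, 6.0, 8.0]
test = [5.0, 10.0]

# "Fit": learn the scaling parameters from the training data only.
mu = mean(train)
sigma = pstdev(train)

def standardise(values, mu, sigma):
    """Apply already-fitted parameters; nothing is re-estimated here."""
    return [(v - mu) / sigma for v in values]

# "Transform": reuse the same training parameters for both splits.
train_scaled = standardise(train, mu, sigma)
test_scaled = standardise(test, mu, sigma)
```

Computing mu and sigma from train + test would correspond to the "bad" case, and computing a second pair from test alone to the "completely wrong" case. With scikit-learn transformers, the same pattern is to call fit on the training data and transform on both splits.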
Common issues/mistakes:
• Confusion matrix
• Accuracy
• Precision
• Recall
• F measure (F1 or F-β)
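For reference, the F measure combines precision and recall into one number. The sketch below uses the standard F-β definition, (1 + β²)·p·r / (β²·p + r), and reuses the Q1 counts (TP = 7, FP = 4, FN = 2) purely as an illustration; the function name f_beta is hypothetical.

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta score; beta > 1 weights recall more heavily, beta < 1 weights precision."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Precision and recall from the Q1 counts (TP = 7, FP = 4, FN = 2).
p = 7 / (7 + 4)
r = 7 / (7 + 2)

f1 = f_beta(p, r)            # harmonic mean of precision and recall
f2 = f_beta(p, r, beta=2.0)  # recall counts more than precision
```

With these counts F1 works out to 0.70, since F1 = 2TP / (2TP + FP + FN) = 14 / 20.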
Common issues: