


Topic: Dimension Reduction With PCA

Instructions:
Please fill in your answers inline in the Word document. Submit code separately
wherever applicable.
Please ensure you update all the details:
Name: ____Mary_________ Batch ID: __15_07_2021_________
Topic: Principal Component Analysis

Grading Guidelines:
1. An assignment submission is considered complete only when correct and executable code(s) are submitted along
with the documentation explaining the method and results. Failing to submit either of those will be considered an
invalid submission and will not be considered for evaluation.
2. Assignments submitted after the deadline will affect your grades.

Grading:
Ans                   Date     Grade  Score  Ans          Date
Correct               On time  A      100
80% & above           On time  B      85     Correct      Late
50% & above           On time  C      75     80% & above  Late
50% & below           On time  D      65     50% & above  Late
                               E      55     50% & below
Copied/No Submission           F      45

● Grade A: (>= 90): When all assignments are submitted on or before the given deadline
● Grade B: (>= 80 and < 90):
o When assignments are submitted on time but less than 80% of problems are completed.
(OR)
o All assignments are submitted after the deadline.

● Grade C: (>= 70 and < 80):
o When assignments are submitted on time but less than 50% of the problems are completed.
(OR)
o Less than 80% of problems in the assignments are submitted after the deadline.

● Grade D: (>= 60 and < 70):
o Assignments submitted after the deadline and with 50% or fewer of the problems completed.

● Grade E: (>= 50 and < 60):
o Less than 30% of problems in the assignments are submitted after the deadline.
(OR)
o Less than 30% of problems in the assignments are submitted before the deadline.

● Grade F: (< 50): No submission (or) malpractice.

© 2013 - 2021 360DigiTMG. All Rights Reserved.


Hints:
1. Business Problem
1.1. What is the business objective?
1.2. Are there any constraints?

2. Work on each feature of the dataset to create a data dictionary as displayed in the image
below:

2.1 Make a table as shown above and provide information about each feature, such as its data type
and its relevance to model building. If a feature is not relevant, give the reason along with a
description of the feature.

3. Data Pre-processing
3.1 Data Cleaning, Feature Engineering, etc.
4. Exploratory Data Analysis (EDA):
4.1. Summary.
4.2. Univariate analysis.
4.3. Bivariate analysis.

5. Model Building
5.1 Build the model on the scaled data (try multiple options).
5.2 Perform PCA and identify the components that capture the maximum variance.
5.3 Perform clustering before and after applying PCA to cross-check the number of clusters
formed.
5.4 Briefly explain the model output in the documentation.
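
Steps 5.1 and 5.2 could be sketched roughly as below. This is a minimal illustration, assuming scikit-learn is available; the random array `X` is a hypothetical stand-in for the assignment dataset, which is not reproduced here.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data: 200 rows, 6 numeric features (NOT the assignment dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
X[:, 3] = 2.0 * X[:, 0] + 0.1 * rng.normal(size=200)  # make two features correlated

# 5.1: scale so that every feature has zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X)

# 5.2: fit PCA on the scaled data and inspect the variance captured per component.
pca = PCA()
scores = pca.fit_transform(X_scaled)
explained = pca.explained_variance_ratio_
cumulative = np.cumsum(explained)

for i, (ratio, total) in enumerate(zip(explained, cumulative), start=1):
    print(f"PC{i}: {ratio:.3f} of variance, {total:.3f} cumulative")
```

The cumulative column is one way to judge how many components are worth keeping before moving on to clustering.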

6. Write about the benefits/impact of the solution - in what way does the business (client)
benefit from the solution provided?

Problem Statement: -
Perform hierarchical and K-means clustering on the dataset. After that, perform PCA on the
dataset, extract the first 3 principal components, and make a new dataset with these 3
principal components as the columns. Now, on this new dataset, perform hierarchical and
K-means clustering. Compare the results of clustering on the original dataset with clustering on
the principal components dataset (use the scree plot technique to obtain the optimum
number of clusters in K-means clustering and check whether you get similar results with and
without PCA).
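
The workflow in this problem statement could be sketched as follows. It is an illustration only: the data comes from scikit-learn's `make_blobs` rather than the assignment dataset, `k = 3` is an assumed cluster count, and `cluster_both` is a hypothetical helper. The adjusted Rand index is one possible way to compare the two sets of labels; the assignment does not prescribe it.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data (NOT the assignment dataset): 300 rows, 8 features.
X, _ = make_blobs(n_samples=300, n_features=8, centers=3, random_state=42)
X_scaled = StandardScaler().fit_transform(X)

# New dataset whose columns are the first 3 principal components.
X_pca = PCA(n_components=3).fit_transform(X_scaled)


def cluster_both(data, k):
    """Return (K-means labels, hierarchical labels) for the given data."""
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(data)
    hc = AgglomerativeClustering(n_clusters=k, linkage="ward").fit_predict(data)
    return km, hc


k = 3  # assumed number of clusters for this sketch
km_full, hc_full = cluster_both(X_scaled, k)
km_pca, hc_pca = cluster_both(X_pca, k)

# Adjusted Rand index: 1.0 means the two labelings group the rows identically.
ari_kmeans = adjusted_rand_score(km_full, km_pca)
ari_hier = adjusted_rand_score(hc_full, hc_pca)
print(f"K-means agreement (ARI):      {ari_kmeans:.3f}")
print(f"Hierarchical agreement (ARI): {ari_hier:.3f}")
```

Cluster numbering is arbitrary, so comparing raw label values row by row would be misleading; a label-invariant measure such as the adjusted Rand index avoids that problem.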

Hierarchical Clustering on whole data:


Dendrogram:
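
A dendrogram like the one referred to here could be built along these lines. This is a sketch assuming SciPy is available; the 30 points are synthetic placeholders, and Ward linkage and the cut at 3 clusters are assumptions, not choices stated in the document.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

# Hypothetical data: three well-separated groups of 10 points in 4 dimensions.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(10, 4)) for c in (0.0, 5.0, 10.0)])

# Agglomerative clustering with Ward linkage; Z has one row per merge.
Z = linkage(X, method="ward")

# no_plot=True returns the dendrogram layout without drawing it.
# Drop no_plot (with matplotlib installed) to render the actual figure.
tree = dendrogram(Z, no_plot=True)

# Cut the tree so that at most 3 flat clusters remain.
labels = fcluster(Z, t=3, criterion="maxclust")
print("leaf order:", tree["leaves"])
print("cluster sizes:", np.bincount(labels)[1:])
```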



K-Means Clustering on whole data:
Hierarchical Clustering on PCA data with 3 components:

Scree plot of PCA data with 3 components
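
The values behind such a plot could be computed as sketched below. For K-means this curve is usually called an elbow plot: the within-cluster sum of squares (`inertia_` in scikit-learn) plotted against the number of clusters. The data is synthetic and the range of k from 1 to 8 is an assumption for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data, reduced to 3 principal components.
X, _ = make_blobs(n_samples=300, n_features=8, centers=4, random_state=7)
X_pca = PCA(n_components=3).fit_transform(StandardScaler().fit_transform(X))

# Within-cluster sum of squares for each candidate number of clusters.
ks = list(range(1, 9))
inertias = []
for k in ks:
    model = KMeans(n_clusters=k, n_init=10, random_state=7).fit(X_pca)
    inertias.append(model.inertia_)

for k, wss in zip(ks, inertias):
    print(f"k={k}: within-cluster sum of squares = {wss:.1f}")

# To draw the curve (requires matplotlib):
#   import matplotlib.pyplot as plt
#   plt.plot(ks, inertias, marker="o"); plt.xlabel("k"); plt.ylabel("WSS"); plt.show()
```

The number of clusters is typically read off at the point where the curve stops dropping sharply.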


Problem Statement: -

A pharmaceuticals manufacturing company is conducting a study on a new medicine to treat
heart diseases. The company has gathered data from its secondary sources and would like you
to provide high-level analytical insights on the data. Its aim is to segregate patients depending
on their age group and other factors given in the data. Perform PCA and clustering algorithms on
the dataset, check whether the clusters formed before and after PCA are the same, and provide a
brief report on your model. You can also explore more ways to improve your model.
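
One way to check whether the clusters before and after PCA coincide is a cross-tabulation of the two label sets, sketched below. Everything about the data is hypothetical: the column names (`age`, `feature_a`, and so on), the synthetic values, the choice of 3 clusters, and the 90% variance threshold are assumptions for illustration, not properties of the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical patient table; column names and values are placeholders only.
rng = np.random.default_rng(3)
n = 240
group = rng.integers(0, 3, size=n)
df = pd.DataFrame({
    "age": 40 + 12 * group + rng.normal(0, 3, n),
    "feature_a": 120 + 15 * group + rng.normal(0, 5, n),
    "feature_b": 200 + 30 * group + rng.normal(0, 10, n),
    "feature_c": rng.normal(0, 1, n),
})

X_scaled = StandardScaler().fit_transform(df)

# A float n_components keeps just enough components to explain >= 90% of the variance.
pca = PCA(n_components=0.90, svd_solver="full")
X_pca = pca.fit_transform(X_scaled)

k = 3  # assumed number of patient segments
before = KMeans(n_clusters=k, n_init=10, random_state=3).fit_predict(X_scaled)
after = KMeans(n_clusters=k, n_init=10, random_state=3).fit_predict(X_pca)

# Rows: clusters on the scaled data; columns: clusters on the PCA scores.
table = pd.crosstab(pd.Series(before, name="before_pca"),
                    pd.Series(after, name="after_pca"))
print(f"components kept: {pca.n_components_}")
print(table)
```

If each row of the table has nearly all of its count in a single column, the two clusterings agree up to a renaming of the cluster labels.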

Note: This is just a snapshot of the data. The datasets can be downloaded from AiSpry LMS in
the Hands-On Material section.
