PCA
What is PCA?
• PCA (Principal Component Analysis) is a dimensionality reduction technique that transforms the features of a dataset into a smaller set of new features, called principal components, while retaining as much of the information (variance) in the original dataset as possible. A minimal sketch is shown below.
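As a quick illustration, here is a minimal sketch using scikit-learn's PCA; the synthetic data, variable names, and the choice of n_components=2 are assumptions for illustration only.

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                    # illustrative data: 100 samples, 5 features
X[:, 3] = X[:, 0] + 0.1 * rng.normal(size=100)   # make two features correlated

pca = PCA(n_components=2)                        # keep 2 principal components
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)                           # (100, 2)
print(pca.explained_variance_ratio_)             # share of variance kept by each PC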
• Step 1: Standardization
Standardize each feature to zero mean and unit variance so that features with larger numeric ranges do not dominate. Once standardization is done and all features are scaled to a similar range, we proceed with the remaining PCA steps (a sketch of this step follows).
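A minimal sketch of standardization with NumPy; the data and its scales are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)) * [1, 10, 100, 1000, 10000]  # features on very different scales
X_std = (X - X.mean(axis=0)) / X.std(axis=0)  # z-score: zero mean, unit variance per feature
print(X_std.mean(axis=0).round(6))            # ~[0 0 0 0 0]
print(X_std.std(axis=0))                      # [1 1 1 1 1]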
• Step 2: Calculate the Covariance Matrix
The aim of this step is to understand how the input data varies from the mean or, in simpler terms, to find whether there is any relationship between the features. The covariance matrix summarizes these pairwise relationships (a sketch follows the source link below).
(Source: https://www.enjoyalgorithms.com/blog/principal-component-analysis-in-ml)
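A rough sketch of this step with NumPy; here X_std stands in for the standardized data from Step 1, and the random values are illustrative only:

import numpy as np

rng = np.random.default_rng(0)
X_std = rng.normal(size=(100, 5))   # stands in for the standardized data from Step 1
cov = np.cov(X_std, rowvar=False)   # rowvar=False: columns are features, rows are samples
print(cov.shape)                    # (5, 5) matrix of pairwise feature covariances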
• Step 3: Compute the Eigenvalues and Eigenvectors
We calculate the eigenvectors (unit vectors) and their associated eigenvalues (the scalars by which each eigenvector is scaled) of the covariance matrix.
Eigenvalues and eigenvectors are what the principal components are built from.
Eigenvectors give the directions of the axes with maximum variance; these directions are the principal components (PCs).
Eigenvalues are the coefficients attached to the eigenvectors, representing the amount of variance captured along each principal component. A sketch of this decomposition follows.
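A hedged sketch of this computation with NumPy, reusing the illustrative covariance setup from above:

import numpy as np

rng = np.random.default_rng(0)
X_std = rng.normal(size=(100, 5))           # stands in for the standardized data
cov = np.cov(X_std, rowvar=False)

# eigh suits symmetric matrices (a covariance matrix always is);
# it returns eigenvalues in ascending order, so we re-sort descending
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
print(eigvals)        # variance captured along each principal component
print(eigvecs[:, 0])  # unit vector giving the direction of the first PC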
• Step 4: Select the Principal Components (Feature Vector)
So, when we have higher-dimensional data, we usually keep the smallest number k of components whose cumulative explained variance is 0.95 (95%) or more; the chosen eigenvectors form the feature vector. A sketch of this selection follows.
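Continuing the illustrative sketch, one way to pick the smallest k reaching the 0.95 threshold mentioned above (the data and variable names are assumptions):

import numpy as np

rng = np.random.default_rng(0)
X_std = rng.normal(size=(100, 5))                 # stands in for the standardized data
eigvals, eigvecs = np.linalg.eigh(np.cov(X_std, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = eigvals / eigvals.sum()               # variance ratio per component
cumulative = np.cumsum(explained)
k = int(np.searchsorted(cumulative, 0.95)) + 1    # smallest k with >= 95% cumulative variance
feature_vector = eigvecs[:, :k]                   # the selected eigenvectors (the PCs)
print(k, cumulative.round(3))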
• Step 5: Recast the Data Along the Principal Component Axes
• This is the final step, where we use the feature vector formed from the selected eigenvectors (the PCs) to reorient the data from the original axes to the axes represented by the PCs.
• This is done by multiplying the transpose of the feature vector by the transpose of the standardized dataset (see the sketch below).
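A sketch of this multiplication, assuming k = 2 components were kept (the data and names continue the illustrative setup above):

import numpy as np

rng = np.random.default_rng(0)
X_std = rng.normal(size=(100, 5))                 # stands in for the standardized data
eigvals, eigvecs = np.linalg.eigh(np.cov(X_std, rowvar=False))
order = np.argsort(eigvals)[::-1]
feature_vector = eigvecs[:, order][:, :2]         # assume k = 2 components were kept

# FinalData = FeatureVector^T x StandardizedData^T, transposed back to rows-as-samples
final_data = (feature_vector.T @ X_std.T).T       # equivalent to X_std @ feature_vector
print(final_data.shape)                           # (100, 2): samples in PC coordinates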
Advantages
• Removes correlated features. The principal components PCA produces are mutually uncorrelated, so it handles multicollinearity (highly correlated features) automatically. Finding correlated features by hand is time-consuming, especially if the number of features is large.
• Improves machine learning algorithm performance. With the number of features reduced by PCA, the time taken to train your model drops significantly.
• Reduces overfitting. By removing unnecessary features from your dataset, PCA helps to reduce overfitting.
Disadvantages
• Independent variables become less interpretable. PCA reduces your features to a smaller number of components, each of which is a linear combination of the original features, making them harder to read and interpret.
• Information loss. Some information is inevitably discarded; choosing too few components can lose meaningful variance, so care is needed when picking the number of components.
• Feature scaling. Because PCA is a variance-maximizing exercise, features must be scaled to comparable ranges before it is applied; otherwise features with large numeric ranges dominate the components (see the pipeline sketch below).
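For instance, a hedged sketch of how scaling is typically handled in practice with a scikit-learn pipeline; the data and its scales are illustrative assumptions:

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)) * [1, 10, 100, 1000, 10000]  # wildly different scales

# Scaling first prevents the large-range features from dominating the PCs;
# a float n_components keeps the fewest components explaining 95% of variance
pipeline = make_pipeline(StandardScaler(), PCA(n_components=0.95))
X_reduced = pipeline.fit_transform(X)
print(X_reduced.shape)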
Example
• https://towardsdatascience.com/using-principal-component-analysis-pca-for-machine-learning-b6e803f5bf1e