PCA

What is PCA?
• PCA is a dimensionality reduction technique that transforms the features of a
dataset into a smaller number of features, called principal components, while
retaining as much of the information in the original dataset as possible.

• It retains the data in the direction of maximum variance. The reduced
features are uncorrelated with each other.

• In theory, PCA produces the same number of principal components as
there are features in the training dataset. In practice, though, we do not
keep all of the principal components. Each successive principal component
explains the variance that is left after its preceding components, so picking
just a few of the first components sufficiently approximates the original
dataset without the need for additional features.
https://www.keboola.com/blog/pca-machine-learning
https://www.enjoyalgorithms.com/blog/principal-component-analysis-in-ml
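To make the idea above concrete, here is a minimal sketch using scikit-learn's PCA; the synthetic data and the choice of two components are assumptions made purely for illustration.

# A minimal sketch of dimensionality reduction with PCA using scikit-learn.
# The synthetic data and the choice of 2 components are illustrative assumptions.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))            # 100 samples, 5 original features

pca = PCA(n_components=2)                # keep only the first 2 principal components
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                   # (100, 2): fewer, uncorrelated features
print(pca.explained_variance_ratio_)     # share of variance retained by each PC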
What is the Principal Component?
• Principal Components are the new transformed features; in other words, the
output of the PCA algorithm is the Principal Components.
Usually, the number of PCs is less than or equal to the number of features
present in the original dataset.

Below are some of the properties of Principal Components:

• PCs should be a linear combination of the variables of the original dataset.
• The PCs are orthogonal, i.e., there is zero correlation between any pair of
them.
• The importance of each PC decreases from PC1 to the nth PC, meaning
PC1 has the most importance and the nth PC has the least.
Steps Involved in PCA:
• Step 1: Standardization
We standardize the range of values of all the variables to a
similar range so that all of them contribute equally to
the analysis.

• The main reason we perform standardization before PCA is that
PCA is very sensitive to the variances of the original variables in
the dataset. If there are features with big differences in their initial
ranges of values, the features with the larger ranges will dominate
the overall analysis and PCA will be biased towards those features.
• Standardization is done using the following formula:

z = (x − μ) / σ

where x is the value, μ is the mean of the feature/variable,
and σ is the standard deviation.

Once the Standardization is done, and all features are scaled down to a similar range,
we will now proceed with the PCA method.
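As a minimal sketch of this step (the toy height/weight matrix below is an assumption; scikit-learn's StandardScaler performs the same transformation):

# Standardize each feature: z = (x - mean) / std, so every feature ends up with
# mean 0 and standard deviation 1 before PCA is applied.
import numpy as np

X = np.array([[170.0, 65.0],
              [160.0, 58.0],
              [180.0, 80.0],
              [175.0, 72.0]])            # toy data: height (cm), weight (kg)

X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_std.mean(axis=0))                # approximately [0, 0]
print(X_std.std(axis=0))                 # [1, 1]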
• Step 2: Calculate the Covariance Matrix

The covariance matrix is a square matrix of d x d dimensions,
where d stands for “dimension” (or feature or column, if our data is
tabular). It shows the pairwise covariance between each pair of features.

The main aim of this step is to understand how the input data varies around
the mean or, in simple terms, to find whether there is any relationship between
the features.
https://www.enjoyalgorithms.com/blog/principal-component-analysis-in-ml
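A small sketch of this step in NumPy (the random toy data is an assumption; only the covariance call itself is the point here):

# Compute the d x d covariance matrix of the standardized data.
# rowvar=False tells NumPy that columns are variables (features).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))                        # toy data: 100 samples, 4 features
X_std = (X - X.mean(axis=0)) / X.std(axis=0)         # Step 1: standardization

cov_matrix = np.cov(X_std, rowvar=False)             # shape: (4, 4)
print(cov_matrix)
# Entry [i, j] is the covariance between feature i and feature j;
# the diagonal holds the variance of each standardized feature.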
• Step 3: Computing the Eigenvalues and Eigenvectors
We calculate the eigenvectors (unit vectors) and their associated
eigenvalues (scalars by which we multiply the eigenvectors) of the
covariance matrix.

Eigenvalues and eigenvectors are the main reason the Principal Components exist.

Eigenvectors give the directions of the axes with maximum variance, and these
directions are the PCs.

Eigenvalues are the coefficients attached to the eigenvectors, representing the
amount of variance carried along each Principal Component.

We rank the eigenvectors by their respective eigenvalues in descending order.
The result is that we get the Principal Components in order of their
level of variance.
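A minimal sketch of this step, continuing the same kind of toy data as above (the random matrix is an assumption):

# Eigendecomposition of the covariance matrix; eigh is suited to symmetric
# matrices and returns eigenvalues in ascending order with eigenvectors as columns.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
cov_matrix = np.cov(X_std, rowvar=False)

eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)

# Rank the eigenvectors by eigenvalue, from most variance to least.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

print(eigenvalues)          # variance carried along each principal component
print(eigenvectors[:, 0])   # direction of the first principal component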
• Step 4: Sort the eigenvectors from the highest eigenvalue to the lowest.
The eigenvector with the highest eigenvalue is the first principal
component. Higher eigenvalues correspond to greater amounts of
variance explained.
• Select the number of principal components.
• Select the top N eigenvectors (based on their eigenvalues) to become the
N principal components. The optimal number of principal components is
both subjective and problem-dependent. Usually, we look at the
cumulative amount of variance explained by the combination of
principal components and pick the smallest number of components that
still explains most of the variance.
• We compute the explained variance by dividing each eigenvalue by the sum of
all eigenvalues. Then, we take the cumulative sum of the explained variance ratios.

For example, suppose the eigenvalues are [5.50, 1.72].

The sum of the eigenvalues is 7.22.
The explained variance is [0.76, 0.24].
The cumulative explained variance is [0.76, 1.00].

So, when we have higher-dimensional data, we usually take k components in such a
way that we get a cumulative explained variance of 0.95 or more.
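A small sketch of this calculation, using the two eigenvalues from the example above:

# Explained variance: divide each eigenvalue by the total,
# then take the cumulative sum of the resulting ratios.
import numpy as np

eigenvalues = np.array([5.50, 1.72])

explained_variance = eigenvalues / eigenvalues.sum()   # [0.76, 0.24]
cumulative = np.cumsum(explained_variance)             # [0.76, 1.00]

# Smallest k whose cumulative explained variance reaches the 0.95 threshold.
k = int(np.searchsorted(cumulative, 0.95) + 1)

print(explained_variance.round(2))
print(cumulative.round(2))
print(k)                                                # 2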
• Step 5: Recast the Data Along the Principal Component Axes
• This is the final step: we use the feature vector formed from the
selected eigenvectors (PCs) and reorient the data from the
original axes to the ones represented by the PCs.
• This is done by multiplying the transpose of the feature vector by the
transpose of the original (standardized) dataset.
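A minimal sketch of this final projection, reusing the NumPy steps from the earlier sketches (the data and the choice of k = 2 are assumptions):

# Project the standardized data onto the top-k eigenvectors (the feature vector W).
# X_std @ W is the same result as (W.T @ X_std.T).T from the description above.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

cov_matrix = np.cov(X_std, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)
order = np.argsort(eigenvalues)[::-1]
eigenvectors = eigenvectors[:, order]

k = 2                                # number of principal components to keep (assumption)
W = eigenvectors[:, :k]              # d x k matrix of the selected eigenvectors

X_pca = X_std @ W                    # data re-expressed in the principal component axes
print(X_pca.shape)                   # (100, 2)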
Advantages
• Removes correlated features. PCA helps you remove correlated features, a
situation known as multicollinearity. Finding the features that are
correlated by hand is time consuming, especially if the number of
features is large.
• Improves machine learning algorithm performance. With the number of
features reduced by PCA, the time taken to train your model is
significantly reduced.
• Reduces overfitting. By removing unnecessary features from your dataset,
PCA helps to overcome overfitting.
Disadvantages
• Independent variables become less interpretable. PCA reduces your
features to a smaller number of components. Each component is a
linear combination of your original features, which makes it less readable
and interpretable.
• Information loss. Data loss may occur if you do not exercise care in
choosing the right number of components.
• Feature scaling. Because PCA is a variance-maximizing exercise, it
requires features to be scaled prior to processing.
Example
• https://towardsdatascience.com/using-principal-component-
analysis-pca-for-machine-learning-b6e803f5bf1e
