REMOTE SENSING-II ASSIGNMENT
Assignment-II: Principal Components Analysis
NARESH J
2022107302
PCA works by considering the variance of each attribute, because an attribute with high variance gives a good separation between the classes; by keeping only the directions of highest variance, PCA reduces the dimensionality of the data. Some real-world applications of PCA are image processing, movie recommendation systems, and optimizing the power allocation in various communication channels. It is a feature extraction technique, so it retains the important variables and drops the least important ones.
The first step is standardization. The aim of this step is to standardize the range of the continuous initial variables so that each one of them contributes equally to the analysis. Mathematically, this is done by subtracting the mean and dividing by the standard deviation for each value of each variable. Once the standardization is done, all the variables are transformed to the same scale.
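As an illustration, a minimal sketch of this standardization in Python with NumPy; the matrix X and its values are made-up placeholders, not data from the assignment:

import numpy as np

# X: data matrix with one row per observation and one column per variable.
X = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9],
              [1.9, 2.2],
              [3.1, 3.0]])

# Subtract the mean and divide by the standard deviation of each column,
# so every variable ends up on the same scale (mean 0, standard deviation 1).
X_std = (X - X.mean(axis=0)) / X.std(axis=0)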
The aim of the next step is to understand how the variables of the input data set vary from the mean with respect to each other, or in other words, to see if there is any relationship between them. Sometimes variables are so highly correlated that they contain redundant information. So, in order to identify these correlations, we compute the covariance matrix.
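A minimal sketch of this step, continuing from the X_std matrix in the previous sketch (rows are observations, columns are variables):

import numpy as np

# Covariance matrix of the standardized variables: an n_variables x n_variables
# table whose entry (i, j) is the covariance between variable i and variable j.
cov_matrix = np.cov(X_std, rowvar=False)
print(cov_matrix.shape)   # (n_variables, n_variables)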
Now that we know that the covariance matrix is no more than a table that summarizes the covariances between all possible pairs of variables, let's move to the next step.
Eigenvectors and eigenvalues are the linear algebra concepts that we need to
compute from the covariance matrix in order to determine the principal
components of the data.
What you first need to know about eigenvectors and eigenvalues is that they
always come in pairs, so that every eigenvector has an eigenvalue. Also, their
number is equal to the number of dimensions of the data. For example, for a 3-
dimensional data set, there are 3 variables, therefore there are 3 eigenvectors
with 3 corresponding eigenvalues.
It is the eigenvectors and eigenvalues that are behind all the magic of principal components, because the eigenvectors of the covariance matrix are actually the directions of the axes along which there is the most variance (the most information), and these are what we call the principal components. The eigenvalues are simply the coefficients attached to the eigenvectors, and they give the amount of variance carried by each principal component.
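A sketch of this step, again assuming the cov_matrix computed above; np.linalg.eigh is used because a covariance matrix is symmetric:

import numpy as np

# Eigen-decomposition of the (symmetric) covariance matrix.
# eigenvalues[i] is the variance carried along the direction eigenvectors[:, i].
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)

# There are as many eigenvalue/eigenvector pairs as there are variables
# (dimensions) in the data.
assert eigenvalues.shape[0] == cov_matrix.shape[0]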
Let's suppose that our data set is 2-dimensional with two variables x and y, and that the eigenvectors v1, v2 and eigenvalues λ1, λ2 of the covariance matrix have been computed. If we rank the eigenvalues in descending order, we get λ1 > λ2, which means that the eigenvector that corresponds to the first principal component (PC1) is v1 and the one that corresponds to the second principal component (PC2) is v2.
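Continuing the sketch for this 2-dimensional case, the eigenvalues can be sorted in descending order so that the first column corresponds to PC1 and the second to PC2; the names v1 and v2 below are illustrative only:

# Sort the eigenpairs by decreasing eigenvalue: the eigenvector with the
# largest eigenvalue (lambda_1) is PC1, the next one (lambda_2) is PC2.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

v1, v2 = eigenvectors[:, 0], eigenvectors[:, 1]   # PC1 and PC2 directions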
As we saw in the previous step, computing the eigenvectors and ordering them by their eigenvalues in descending order allows us to find the principal components in order of significance. In this step, we choose whether to keep all of these components or discard those of lesser significance (those with low eigenvalues), and form with the remaining ones a matrix of vectors that we call the feature vector.
So, the feature vector is simply a matrix that has as its columns the eigenvectors of the components that we decide to keep. This makes it the first step towards dimensionality reduction, because if we choose to keep only p eigenvectors (components) out of n, the final data set will have only p dimensions.
Continuing with the example from the previous step, we can either form a feature vector with both of the eigenvectors v1 and v2 as its columns, or discard the eigenvector v2, which is the one of lesser significance, and form a feature vector with v1 only (see the sketch below).
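A minimal sketch of forming the feature vector, assuming the sorted eigenvectors from the previous sketch; p is the number of components we decide to keep:

# Keep the first p eigenvectors (columns) as the feature vector.
# With p = 2 both v1 and v2 are kept; with p = 1 only v1 is kept.
p = 1
feature_vector = eigenvectors[:, :p]   # shape: (n_variables, p)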
So, as we saw in the example, it is up to you to choose whether to keep all the components or discard the ones of lesser significance, depending on what you are looking for. If you just want to describe your data in terms of new variables (principal components) that are uncorrelated, without seeking to reduce dimensionality, leaving out the less significant components is not needed.
In the previous steps, apart from standardization, you do not make any changes to the data; you just select the principal components and form the feature vector, but the input data set always remains in terms of the original axes (i.e., in terms of the initial variables).
In this step, which is the last one, the aim is to use the feature vector formed from the eigenvectors of the covariance matrix to reorient the data from the original axes to the ones represented by the principal components (hence the name Principal Components Analysis). This can be done by multiplying the transpose of the feature vector by the transpose of the standardized original data set.
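A sketch of this final projection, assuming X_std and feature_vector from the earlier sketches; the matrix product below is equivalent to X_std @ feature_vector:

# Recast the data onto the principal component axes:
# Final = FeatureVector^T x StandardizedData^T, transposed back so that
# rows are again observations and columns are principal components.
final_data = (feature_vector.T @ X_std.T).T
assert final_data.shape == (X_std.shape[0], feature_vector.shape[1])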
Because the discarded components with low eigenvalues are assumed to represent noise, Principal Component Analysis can improve the signal-to-noise ratio and make it easier to identify the underlying structure in the data.
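As a hedged illustration of this noise-reduction effect, not part of the original assignment, the standardized data can be approximately reconstructed from the retained components only, so that the variance carried by the discarded directions is left out:

# Reconstruct the (standardized) data from the kept components only.
# Whatever variance lived in the discarded, low-eigenvalue directions,
# assumed here to be noise, is removed from the reconstruction.
X_denoised = final_data @ feature_vector.T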