Assignment 6


JSC “Kazakh-British Technical University”

Faculty of Information Technology

Practice work
PCA ALGORITHM. LATEST TRENDS IN COMPUTER VISION.

Prepared by: Dussekenov Elnar


Checked by: Aiym Svambayeva

Almaty, 2024
CONTENT

Introduction
1. Principal Component Analysis
2. Image Compression via PCA
3. MATLAB Implementation
Conclusion
Introduction

Principal Component Analysis (PCA) is a commonly employed method for reducing dimensions in the fields of data science, machine learning, and statistics. It transforms a dataset into a lower-dimensional form while retaining as much of the original variance as possible. This makes PCA a crucial instrument in preprocessing, visualization, and noise reduction.
This report will examine the fundamental principles of PCA, its different uses, and
real-world applications using MATLAB. We will also discuss topics such as managing
missing data, weighted PCA, principal component coefficients, and the T-squared
statistic. This project adheres to the recommendations provided in the official
MATLAB documentation, implementing these ideas on a specific dataset.
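
As a preview of these features, the following minimal sketch uses MATLAB's built-in pca function from the Statistics and Machine Learning Toolbox; the dataset here is synthetic and purely illustrative, not the report's data.

% Illustrative sketch of MATLAB's pca function (Statistics and Machine
% Learning Toolbox); the data below is synthetic, not the report's dataset.
rng(0);                                   % Reproducible random numbers
X = randn(50, 4);                         % 50 observations of 4 variables
Xmiss = X; Xmiss(3, 2) = NaN;             % Copy of the data with one missing value

% Missing data: 'Rows','complete' ignores rows that contain NaNs
[coeff, score, latent, tsquared] = pca(Xmiss, 'Rows', 'complete');

% Weighted PCA: assign each observation its own (hypothetical) weight
w = rand(50, 1);
coeffW = pca(X, 'Weights', w);

% coeff    - principal component coefficients (loadings)
% score    - data projected onto the principal components
% latent   - variance explained by each component
% tsquared - Hotelling's T-squared statistic for each observation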

1. Principal Component Analysis

Principal Component Analysis (PCA) is the general name for a technique which uses sophisticated underlying mathematical principles to transform several possibly correlated variables into a smaller number of variables called principal components. The origins of PCA lie in multivariate data analysis; however, it has a wide range of other applications, as we will show in due course. PCA has been called 'one of the most important results from applied linear algebra' [2], and perhaps its most common use is as the first step in trying to analyze large data sets. Some of the other common applications include de-noising signals, blind source separation, and data compression. In general terms, PCA uses a vector space transform to reduce the dimensionality of large data sets. Using mathematical projection, the original data set, which may have involved many variables, can often be interpreted in just a few variables (the principal components). It is therefore often the case that an examination of the reduced-dimension data set will allow the user to spot trends, patterns and outliers in the data far more easily than would have been possible without performing the principal component analysis.
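
To make the procedure concrete, here is a minimal MATLAB sketch on a small synthetic two-variable dataset (illustrative only, not the report's data): the data is mean-centred, the principal components are obtained via the SVD, and the data is projected onto the first component.

rng(1);                                   % Reproducible synthetic data
x1 = randn(100, 1);                       % First variable
x2 = 0.8 * x1 + 0.2 * randn(100, 1);      % Second variable, correlated with x1
X = [x1, x2];                             % 100 x 2 data matrix
Xc = X - repmat(mean(X, 1), 100, 1);      % Mean-centre each column
[U, S, V] = svd(Xc, 'econ');              % SVD of the centred data
variances = diag(S).^2 / (100 - 1);       % Variance captured by each component
scores = Xc * V(:, 1);                    % Projection onto the first principal component
explained = variances(1) / sum(variances) % Fraction of total variance retained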

2. Image Compression via PCA

To compress an image, we want to remove superfluous pixels and replace them with other colors that are already being used by the image. However, we want to do this in a way that eliminates pixels while maintaining image quality. In other words, we are finding a subspace of our original image that is lower dimensional but still accurately represents the data. To compress any image, we would need to find a "basis" that contains every possible combination of pixels, so that we could accurately reconstruct arbitrary images. Of course, we cannot find a basis that can represent every picture, so instead we gather a dataset large enough to stand in for this unobtainable one. Our approach to PCA will consider an initial dataset of 144 pictures. Each picture will be sliced into 32 × 32 blocks of pixels, where each block denotes a vector in R^1024.
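
As an illustrative sketch of this block decomposition (assuming a single 512 × 512 greyscale image is available; the file name is a placeholder), each 32 × 32 block is unrolled into a column vector in R^1024 and the blocks are stacked into a data matrix ready for PCA.

img = double(imread('mam.jpg'));          % Assumed 512 x 512 greyscale image
blockSize = 32;
[m, n] = size(img);
numBlocks = (m / blockSize) * (n / blockSize); % 16 x 16 = 256 blocks per image
blocks = zeros(blockSize^2, numBlocks);   % 1024 x 256 data matrix, one block per column
k = 1;
for i = 1:blockSize:m
    for j = 1:blockSize:n
        block = img(i:i+blockSize-1, j:j+blockSize-1); % One 32 x 32 block
        blocks(:, k) = block(:);          % Unroll into a vector in R^1024
        k = k + 1;
    end
end
% PCA can now be applied to the columns of 'blocks', as in Section 3.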

3. MATLAB Implementation

During the subsequent analysis, we shall work with a standard test image that is often used in image processing and image compression. Specifically, we will use a grayscale image of a woman, which has become a classic example in the image processing community. This image is displayed in Figure 3.1 for reference. Our analysis will be conducted using MATLAB, a powerful computational tool well suited to performing numerical analysis and visualizations.

Figure 3.1 – The “Woman” greyscale test image

MATLAB considers greyscale images as 'objects' consisting of two components: a matrix of pixels and a colourmap. The "Woman" image above is stored in a 512 × 512 matrix (and therefore contains 512 × 512 = 262,144 pixels). The colourmap is a 512 × 3 matrix. For RGB colour images, each image can be stored as a single 512 × 512 × 3 matrix, where the third dimension stores three numbers in the range [0, 1] corresponding to each pixel in the 512 × 512 matrix, representing the intensity of the red, green and blue components.

For a greyscale image such as the one we are dealing with, the colourmap matrix has three identical columns, with a scale representing intensity on the one-dimensional grey scale. Each element of the pixel matrix contains a number representing a certain grey-scale intensity for an individual pixel. MATLAB displays all of the 512 × 512 pixels simultaneously with the correct intensity, and the greyscale image that we see is produced. The 512 × 512 matrix containing the pixel information is our data matrix, X. We will perform a principal component analysis of this matrix using an SVD-based method. The steps involved are summarised in the following MATLAB code.
fly = imread('mam.jpg');                % Read the 512 x 512 greyscale test image
fly = double(fly);                      % Convert to double precision
image(uint8(fly)), colormap(gray(256)); % Display the original image
axis off, axis equal
[m, n] = size(fly);                     % m = n = 512 pixels
mn = mean(fly, 2);                      % Mean of each row of the pixel matrix
X = fly - repmat(mn, 1, n);             % Mean-centred data matrix, X
Z = (1/sqrt(n-1)) * X';                 % Create matrix, Z
covZ = Z' * Z;                          % Covariance matrix of Z
[U, S, V] = svd(covZ);                  % Singular value decomposition
variances = diag(S) .* diag(S);         % Compute variances
bar(variances(1:30));                   % Scree plot of the first 30 components
xlabel('Principal Component');
ylabel('Variance');
title('Scree Plot of Principal Components');
PCs = 40;                               % Keep the first 40 principal components
VV = V(:, 1:PCs);
Y = VV' * X;                            % Project data onto the first 40 PCs
ratio = 512 / (2 * PCs + 1)             % Compression ratio, approximately 6.3:1
XX = VV * Y;                            % Reconstruct the pixel matrix from 40 PCs
XX = XX + repmat(mn, 1, n);             % Add the row means back on
image(uint8(XX)), colormap(gray(256));  % Display the reconstructed image
axis off, axis equal

In this case, we have chosen to use the first 40 (out of 512) principal components. What compression ratio does this equate to? To answer this question, we need to compare the amount of data we would have needed to store previously with what we can now store. Without compression, we would still have our full 512 × 512 matrix to store. After selecting the first 40 principal components, we instead have the two matrices V˜ and Y˜ (VV and Y in the above MATLAB code), from which we can obtain a 512 × 512 pixel matrix by computing their product. Storing VV (512 × 40), Y (40 × 512) and the 512 row means requires 512 × (2 × 40 + 1) numbers in total, so the compression ratio is 512² / (512 × 81) = 512/81 ≈ 6.3:1. The image reconstructed from 40 principal components (6.3:1 compression) is displayed in Figure 3.2.

Figure 3.2 – Reconstruction from the first 40 principal components

Conclusion

In summary, PCA proves to be a powerful and essential tool for data scientists and
analysts, especially in the domains of data preprocessing, noise reduction, and feature
extraction. While PCA has limitations, such as assuming linear relationships and
requiring standardized data, its benefits in simplifying complex datasets make it an
indispensable technique in modern data analysis. This project effectively demonstrated
PCA’s applications, advantages, and potential challenges, reinforcing its value across
different fields. The MATLAB implementations successfully illustrated how PCA can
be used to transform and compress image data, making it more manageable for analysis
and visualization. Additionally, the reconstructed images from the selected principal
components demonstrated how PCA retains the key features while eliminating less
relevant details, achieving data compression with minimal information loss.
