Assignment 6
Practice work
PCA ALGORITHM. LATEST TREND OF COMPUTER VISION.
Almaty, 2024
CONTENT
Introduction
1. Principal Component Analysis
2. Image Compression via PCA
3. MATLAB Implementation
Conclusion
Introduction
1. Principal Component Analysis
Principal Component Analysis (PCA) is a technique that uses an orthogonal linear
transformation to convert several possibly correlated variables into a smaller number
of uncorrelated variables called principal components. The origins of PCA lie in
multivariate data analysis; however, it has a wide range of other applications, as we
will show in due course. PCA has been called 'one of the most important results from
applied linear algebra' [2], and perhaps its most common use is as the first step in
analyzing large data sets. Other common applications include de-noising signals,
blind source separation, and data compression.
In general terms, PCA uses a vector space transform to reduce the dimensionality of
large data sets. Using mathematical projection, the original data set, which may have
involved many variables, can often be interpreted in just a few variables (the principal
components). An examination of the reduced-dimension data set therefore often allows
the user to spot trends, patterns and outliers in the data far more easily than would
have been possible without performing the principal component analysis.
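As a minimal sketch of these ideas, the MATLAB snippet below reduces two correlated variables to a single principal component. The data values and the variable names (data, scores, explained, recon) are illustrative assumptions and are not taken from this report.

```matlab
% Toy PCA sketch: two perfectly correlated variables collapse to one component.
% The data below are made up for illustration.
data = [1 2; 2 4; 3 6; 4 8; 5 10];           % 5 observations, 2 variables (second = 2 * first)
N = size(data, 1);
mu = mean(data, 1);                           % 1-by-2 vector of column means
Xc = data - repmat(mu, N, 1);                 % centre each variable
C = (Xc' * Xc) / (N - 1);                     % 2-by-2 sample covariance matrix
[V, D] = eig(C);                              % eigenvectors (columns of V) and eigenvalues
[variances, order] = sort(diag(D), 'descend');
V = V(:, order);                              % principal directions, largest variance first
scores = Xc * V(:, 1);                        % data expressed in the first principal component
explained = variances(1) / sum(variances);    % fraction of total variance retained
recon = scores * V(:, 1)' + repmat(mu, N, 1); % map the scores back to the original variables
```

Because the second variable here is an exact multiple of the first, one component should carry essentially all of the variance, and recon should match data up to rounding. With real, noisy data the retained fraction would be below one.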
3. MATLAB Implementation
In the subsequent analysis, we work with a standard test image that is often used in
image processing and image compression: a grayscale image of a woman, which has
become a classic example in the image processing community. The image is shown in
Figure 3.1 for reference. The analysis is carried out in MATLAB, a numerical
computing environment well suited to matrix computations and visualization.
Figure 3.1 – The “Woman” grayscale test image
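% Inputs assumed to be defined earlier in the script:
%   fly_reshaped : (m*n)-by-p data matrix built from the image
%   mn           : 1-by-p row vector of its column means
%   map          : colormap used to display the image
X = fly_reshaped - repmat(mn, m * n, 1);  % Centre the data
Z = (1/sqrt(m * n - 1)) * X;              % Scale by (number of rows - 1), so Z'*Z is the sample covariance
covZ = Z' * Z;                            % p-by-p covariance matrix
[~, S, V] = svd(covZ);                    % Columns of V are the principal directions
variances = diag(S);                      % Singular values of covZ are already the variances
numPCs = length(variances);
figure;
bar(variances(1:min(30, numPCs)));        % Scree plot of the leading components
xlabel('Principal Component');
ylabel('Variance');
title('Scree Plot of Principal Components');
PCs = min(40, numPCs);                    % Number of components to keep
VV = V(:, 1:PCs);
Y = X * VV;                               % Project data onto PCs
% Values stored after compression (Y, VV and mn) versus the full data matrix;
% for a 512-by-512 data matrix and PCs = 40 this gives about 6.3
ratio = (m * n * p) / ((m * n + p) * PCs + p);
XX = Y * VV';                             % Reconstruct from the kept components
XX = XX + repmat(mn, m * n, 1);           % Add the mean back
XX = reshape(XX, [m, n, p]);
figure;                                   % New window, so the scree plot is not overwritten
image(uint8(XX)), colormap(map);
axis off, axis equal;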
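The listing above relies on variables defined elsewhere in the script. As a hedged, self-contained sketch of the same steps, the snippet below applies them to a synthetic 64-by-64 grayscale pattern and also measures how much information is lost. The synthetic image, the choice k = 8, and the names retained and rmsErr are assumptions made for illustration; this is not the test image of Figure 3.1.

```matlab
% Self-contained sketch: PCA compression of a synthetic grayscale matrix.
[c, r] = meshgrid(1:64, 1:64);
img = 128 + 60 * sin(r / 6) .* cos(c / 9) + 30 * sin(r .* c / 200);  % synthetic "image"
[m, n] = size(img);
mn = mean(img, 1);                         % 1-by-n mean of each column
X = img - repmat(mn, m, 1);                % centre the columns
covX = (X' * X) / (m - 1);                 % n-by-n sample covariance matrix
[~, S, V] = svd(covX);                     % columns of V: principal directions
variances = diag(S);                       % variance along each direction
k = 8;                                     % number of components kept
VV = V(:, 1:k);
Y = X * VV;                                % m-by-k scores
XX = Y * VV' + repmat(mn, m, 1);           % rank-k reconstruction with the mean added back
retained = sum(variances(1:k)) / sum(variances);  % fraction of variance kept
rmsErr = sqrt(mean((img(:) - XX(:)).^2));         % root-mean-square pixel error
fprintf('Retained variance: %.4f, RMS error: %.4f\n', retained, rmsErr);
```

The total squared reconstruction error equals (m - 1) times the sum of the discarded variances, so the scree plot directly indicates how much error each additional component removes.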
For the test image, we keep the first 40 (out of 512) principal components. What
compression ratio does this equate to? To answer this question, we compare the amount
of data we needed to store previously with what we need to store now. Without
compression, we store the full 512 × 512 matrix, that is, 262,144 values. After
selecting the first 40 principal components, we store the two 512 × 40 matrices Ṽ and Ỹ
(VV and Y in the MATLAB code above) together with the 512-element mean vector (mn),
from which a 512 × 512 pixel matrix is recovered by computing the matrix product and
adding the mean back. This amounts to 512 × (2 × 40 + 1) = 41,472 values, so the
compression ratio is 262,144 / 41,472 ≈ 6.3:1. The image reconstructed from 40
principal components (6.3:1 compression) is displayed in Figure 3.2.
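To see how the compression ratio depends on the number of components kept, the following sketch repeats the storage count for several values of k. N = 512 and k = 40 come from the text; the other k values and the variable names are illustrative assumptions, and the storage model (scores, directions, and one mean vector) is the simple one implied by the 2 × 40 + 1 count.

```matlab
% Compression ratio of an N-by-N image stored as k principal components.
N = 512;                             % image side length, from the text
k = [10 20 40 80 160];               % numbers of components to compare
original = N * N;                    % values in the uncompressed matrix
compressed = N * k + N * k + N;      % scores + directions + mean vector, for each k
ratio = original ./ compressed;      % simplifies to N ./ (2*k + 1)
for i = 1:numel(k)
    fprintf('k = %3d: %6d values stored, ratio %.1f:1\n', k(i), compressed(i), ratio(i));
end
breakEven = (N - 1) / 2;             % the ratio falls to 1 when 2*k + 1 = N
```

Under this storage model the ratio drops below 1 once more than 255 components are kept, so the scheme only saves space when k is well below half the image width.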
Conclusion
In summary, PCA is a powerful tool for data scientists and analysts, especially for
data preprocessing, noise reduction, and feature extraction. PCA has limitations: it
captures only linear relationships, and its results depend on how the variables are
scaled, so data often need to be standardized first. Even so, its ability to simplify
complex data sets makes it a widely used technique in modern data analysis. This
project demonstrated PCA's applications, advantages, and potential challenges. The
MATLAB implementation illustrated how PCA can be used to transform and compress
image data, making it more manageable for analysis and visualization. The image
reconstructed from the selected principal components showed how PCA retains the key
features while discarding less relevant detail, achieving data compression with
little loss of information.