The document outlines a program that implements Principal Component Analysis (PCA) to reduce the dimensionality of the Iris dataset from 4 features to 2. It includes steps for standardization, covariance matrix calculation, eigenvalue and eigenvector computation, and data transformation. The program also visualizes the PCA-transformed data using a scatter plot.

Program - 3

Develop a program to implement Principal Component Analysis (PCA) for reducing the
dimensionality of the Iris dataset from 4 features to 2.
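
Before the full program, the whole pipeline can be sketched in a few lines. This is a minimal sketch of the same steps, using np.linalg.eigh, which for a symmetric matrix such as a covariance matrix returns real eigenvalues in ascending order:

```python
import numpy as np
from sklearn import datasets

X = datasets.load_iris().data                  # shape (150, 4)
Z = (X - X.mean(axis=0)) / X.std(axis=0)       # standardize each feature
C = np.cov(Z.T)                                # 4x4 covariance matrix
vals, vecs = np.linalg.eigh(C)                 # eigenvalues ascending (symmetric matrix)
top2 = vecs[:, np.argsort(vals)[::-1][:2]]     # eigenvectors of the 2 largest eigenvalues
X2 = Z @ top2                                  # project onto 2 principal components
print(X2.shape)                                # (150, 2)
```

The longer program below performs exactly these steps, with intermediate values printed at each stage.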
import numpy as np
import pandas as pd
from sklearn import datasets
import matplotlib.pyplot as plt

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Create a DataFrame for readable column names
df = pd.DataFrame(X, columns=iris.feature_names)
# Convert the DataFrame back to a NumPy array
X = df.to_numpy()
np.set_printoptions(linewidth=np.inf)
print('original data top 3 rows')
print(X[:3])
mean = np.mean(X,axis=0)
print('mean value =',np.round(mean,2))
std_dev=np.std(X,axis=0)
print('Standard deviation = ',np.round(std_dev,2))
# Step 1: Standardize the feature matrix (zero mean, unit variance per feature)
X_standardized = (X - mean) / std_dev
print('Standardization matrix top 3 rows \n', np.round(X_standardized[:3], 2))
# Step 2: Compute the covariance matrix (transpose so that rows are features)
cov_matrix = np.cov(X_standardized.T)
print('covariance = \n', np.round(cov_matrix, 2))
# Step 3: Compute the eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(cov_matrix)
print('eigen values =',np.round(eigenvalues,2))
print('eigen vector =\n',np.round(eigenvectors,2))
# Step 4: Sort eigenvalues in descending order and reorder eigenvectors accordingly
sorted_index = np.argsort(eigenvalues)[::-1]
sorted_eigenvalues = eigenvalues[sorted_index]
sorted_eigenvectors = eigenvectors[:, sorted_index]
print('sorted eigen values =', np.round(sorted_eigenvalues, 2))
print('sorted eigen vectors =\n', np.round(sorted_eigenvectors, 2))
# Select the top 2 eigenvectors (principal components)
eigenvectors_subset = sorted_eigenvectors[:, :2]
print('top 2 eigen vectors =\n', np.round(eigenvectors_subset, 2))
# Step 5: Project the standardized data onto the top 2 eigenvectors
X_reduced = np.dot(X_standardized, eigenvectors_subset)
print("X_reduced \n", X_reduced[:3])

df_pca = pd.DataFrame(X_reduced, columns=['PCA1', 'PCA2'])
df_pca['target'] = y
print(df_pca.sample(5))
# Plot the PCA-transformed data, one color per Iris class
plt.figure(figsize=(10, 7))
colors = ['r', 'g', 'b']
for target, color in zip(df_pca['target'].unique(), colors):
    subset = df_pca[df_pca['target'] == target]
    plt.scatter(subset['PCA1'], subset['PCA2'], color=color, label=iris.target_names[target])
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.title('PCA of Iris Dataset')
plt.legend()
plt.show()
Output:
original data top 3 rows
[[5.1 3.5 1.4 0.2]
[4.9 3. 1.4 0.2]
[4.7 3.2 1.3 0.2]]
mean value = [5.84 3.06 3.76 1.2 ]
Standard deviation = [0.83 0.43 1.76 0.76]
Standardization matrix top 3 rows
[[-0.9   1.02 -1.34 -1.32]
[-1.14 -0.13 -1.34 -1.32]
[-1.39  0.33 -1.4  -1.32]]

covariance =
[[ 1.01 -0.12 0.88 0.82]
[-0.12 1.01 -0.43 -0.37]
[ 0.88 -0.43 1.01 0.97]
[ 0.82 -0.37 0.97 1.01]]
eigen values = [2.94 0.92 0.15 0.02]
eigen vector =
[[ 0.52 -0.38 -0.72 0.26]
[-0.27 -0.92 0.24 -0.12]
[ 0.58 -0.02 0.14 -0.8 ]
[ 0.56 -0.07 0.63 0.52]]
sorted eigen values = [2.94 0.92 0.15 0.02]
sorted eigen vectors =
[[ 0.52 -0.38 -0.72 0.26]
[-0.27 -0.92 0.24 -0.12]
[ 0.58 -0.02 0.14 -0.8 ]
[ 0.56 -0.07 0.63 0.52]]
top 2 eigen vectors =
[[ 0.52 -0.38]
[-0.27 -0.92]
[ 0.58 -0.02]
[ 0.56 -0.07]]
X_reduced
[[-2.26470281 -0.4800266 ]
[-2.08096115 0.67413356]
[-2.36422905 0.34190802]]
PCA1 PCA2 target
70 0.737683 -0.396572 1
118 3.310696 -0.017781 2
9 -2.184328 0.469014 0
149 0.960656 0.024332 2
25 -1.951846 0.625619 0
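
As a sanity check, the manual projection can be compared against scikit-learn's built-in PCA applied to the same standardized data. The sign of each eigenvector is arbitrary, so the comparison is made on absolute values; this is a verification sketch, not part of the assignment:

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA

X = datasets.load_iris().data
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Manual PCA: eigendecomposition of the covariance matrix
vals, vecs = np.linalg.eig(np.cov(Z.T))
order = np.argsort(vals)[::-1]
manual = Z @ vecs[:, order[:2]]

# Library PCA on the same standardized data
library = PCA(n_components=2).fit_transform(Z)

# Components agree up to sign flips
print(np.allclose(np.abs(manual), np.abs(library), atol=1e-6))  # True
```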

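The eigenvalues also quantify how much variance each component retains: dividing each eigenvalue by their sum gives the explained-variance ratio. A short sketch, consistent with the eigenvalues printed above:

```python
import numpy as np
from sklearn import datasets

X = datasets.load_iris().data
Z = (X - X.mean(axis=0)) / X.std(axis=0)
vals = np.linalg.eigvalsh(np.cov(Z.T))[::-1]   # eigenvalues, descending

# Fraction of total variance captured by each principal component
ratio = vals / vals.sum()
print(np.round(ratio, 3))          # first two components carry ~96% of the variance
print(np.round(ratio[:2].sum(), 3))
```

This is why reducing from 4 features to 2 loses little information here: the discarded components account for only a few percent of the total variance.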