Data Science Practicals

The document outlines practical exercises in Data Science, focusing on Excel functionalities, data manipulation with pandas, hypothesis testing, ANOVA, regression analysis, and logistic regression. Each practical includes step-by-step instructions and code examples for tasks such as conditional formatting, creating pivot tables, handling missing values, feature scaling, and building predictive models. The exercises aim to provide hands-on experience with data analysis techniques and statistical methods.


PRACTICAL NO : 1

Name: Class:TYCS Date:

Subject: Data Science Roll no:

Aim :- Introduction to Excel

A. Perform conditional formatting on a dataset using various criteria

Step 1: Create an Excel sheet with data as shown below.

Step 2: Apply conditional formatting. In Conditional Formatting, select Icon Sets.

Step 3: Let's use Data Bars, because they compare all values to each other. In Conditional Formatting, select Data Bars.
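The same data-bar formatting can also be applied from Python with openpyxl; a minimal hedged sketch, assuming a workbook Sales.xlsx (hypothetical file) with numbers in cells B2:B10 (hypothetical range):

from openpyxl import load_workbook
from openpyxl.formatting.rule import DataBarRule

wb = load_workbook("Sales.xlsx")   # hypothetical workbook
ws = wb.active

# Add data bars that scale from the minimum to the maximum value in the range
rule = DataBarRule(start_type="min", end_type="max", color="638EC6", showValue=True)
ws.conditional_formatting.add("B2:B10", rule)

wb.save("Sales_formatted.xlsx")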


B. Create a pivot table to analyse and summarize data

Pivot table :- https://exceljet.net/articles/excel-pivot-tables

Step 1: Select all data -> click on Insert -> PivotTable -> select OK.
Step 2: A new sheet gets created -> rename the sheet as "Pivot Table" and drag and drop the fields you want to pivot.
Step 3: Rename the pivot table -> explore the Field Settings option.
Step 4: Group options -> Group Selection -> Insert Slicer -> apply the slicer on Item.
Step 5: Filter field -> data above/below a certain range can be viewed.
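A similar summary can be produced in pandas; a hedged sketch, assuming the sheet is exported as data.csv with hypothetical columns Item and Amount:

import pandas as pd

df = pd.read_csv("data.csv")   # hypothetical export of the Excel sheet

# Summarize total Amount per Item, like the Excel pivot table
pivot = pd.pivot_table(df, index="Item", values="Amount", aggfunc="sum")
print(pivot)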
C. Use VLOOKUP function to retrieve information from a different worksheet or table

Syntax: =VLOOKUP(lookup_value, table_array, col_index_num, [range_lookup]), where the last argument is TRUE for an approximate match and FALSE for an exact match.
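For example (hypothetical cell references), to look up the ID in A2 against a table on Sheet2 and return its third column with an exact match:

=VLOOKUP(A2, Sheet2!$A$2:$C$100, 3, FALSE)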


D. Perform what-if analysis using Goal Seek to determine input values for a desired output.
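A typical workflow (menu paths may vary slightly by Excel version): select the formula cell, go to Data -> What-If Analysis -> Goal Seek, set "Set cell" to the formula cell, "To value" to the desired output, and "By changing cell" to the input cell, then click OK.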
PRACTICAL NO : 2

Name: Class:TYCS Date:

Subject: Data Science Roll no:

Aim :- Data frames and basic data pre-processing

A. Read data from csv and json files into a dataframe

Step 1: Create a CSV file (Student.csv) and write the code below.

(1) # Read data from a csv file

import pandas as pd
df = pd.read_csv(r"E:\Student.csv")
print("Our dataset:")
print(df)
Output:-

(2) # Reading data from a JSON file

Create a JSON file in Notepad and save it with a .json extension.
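For example, data.json might contain a small record set like this (hypothetical contents):

[
  {"name": "Asha", "roll_no": 1, "marks": 78},
  {"name": "Ravi", "roll_no": 2, "marks": 85}
]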

import pandas as pd
data = pd.read_json(r"E:\data.json")
print(data)

output:-
B. Perform basic data pre-processing tasks such as handling missing values and outliers.

import pandas as pd

# Use a properly formatted file path


file_path = r"E:\titanic.csv"

# Read the CSV file


df = pd.read_csv(file_path)

# Display first 10 rows


print("First 10 rows of the dataset:")
print(df.head(10))

# Fill missing values with 0


df.fillna(value=0, inplace=True)

# Display first 10 rows after filling NaN values


print("\nDataset after filling NA values with 0:")
print(df.head(10))
Output:-
(2) # Dropping NA values using dropna()

Code:

import pandas as pd

# Use a properly formatted file path


file_path = r"E:\titanic.csv" # Use raw string format to
prevent Unicode errors

# Read the CSV file


df = pd.read_csv(file_path)

# Display first 10 rows


print("First 10 rows of the dataset:")
print(df.head(10))

# Drop rows with missing values


df.dropna(inplace=True)

# Display first 10 rows after dropping NA values


print("\nDataset after dropping NA values:")
print(df.head(10))
output:-
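The aim also mentions outliers, which the code above does not address. A minimal hedged sketch of one common approach (the IQR rule), assuming the Titanic file has a numeric 'Fare' column:

import pandas as pd

df = pd.read_csv(r"E:\titanic.csv")

# Keep only rows whose 'Fare' lies within 1.5 * IQR of the quartiles
Q1 = df['Fare'].quantile(0.25)
Q3 = df['Fare'].quantile(0.75)
IQR = Q3 - Q1
lower, upper = Q1 - 1.5 * IQR, Q3 + 1.5 * IQR
df_no_outliers = df[(df['Fare'] >= lower) & (df['Fare'] <= upper)]

print("Rows before:", len(df), "Rows after removing Fare outliers:", len(df_no_outliers))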
C. Manipulate and transform data using functions like filtering, sorting, and grouping

- Loaded the iris dataset from "E:\1\iris\iris.csv".
- Filtered rows where the species is 'setosa'.
- Sorted the dataset by sepal_length in descending order.
- Grouped data by species and calculated the average of each column.
- Fixed column name issues to match the dataset.

Code:

import pandas as pd
# Load iris dataset
iris = pd.read_csv(r"E:\1\iris\iris.csv")
# Filtering data based on a condition
setosa = iris[iris['species'] == 'setosa']
print("Setosa samples:")
print(setosa.head())
# Sorting data
sorted_iris = iris.sort_values(by='sepal_length',
ascending=False)
print("\nSorted iris dataset:")
print(sorted_iris.head())
# Grouping data
grouped_species = iris.groupby('species').mean()
print("\nMean measurements for each species:")
print(grouped_species)
output:-
PRACTICAL NO : 3

Name: Class:TYCS Date:

Subject: Data Science Roll no:

Aim :- Feature Scaling and Dummification

A. Apply feature scaling techniques like standardization and normalization to numerical features

- Original data: the dataset had raw values for Alcohol and Malic Acid in different ranges.
- After Min-Max scaling: values are now between 0 and 1, which makes them smaller and easier to compare. Example: Alcohol changed from 14.23 → 0.842 (scaled down).
- After Standard scaling: values are adjusted to have a mean of 0 and a similar spread, which removes big differences between values. Example: Alcohol changed from 14.23 → 1.518 (standardized). (A quick numeric check of both formulas follows this list.)
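A quick numeric check of both formulas, a hedged sketch assuming approximate statistics of the Alcohol column in the UCI wine data (min 11.03, max 14.83, mean 13.00, standard deviation 0.81):

x = 14.23                        # raw Alcohol value
x_min, x_max = 11.03, 14.83      # approximate min/max of Alcohol (assumed)
mean, std = 13.00, 0.81          # approximate mean/std of Alcohol (assumed)

print((x - x_min) / (x_max - x_min))   # Min-Max scaling: roughly 0.842
print((x - mean) / std)                # Standardization: roughly 1.52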

Code:

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Use raw string (r"") to prevent path issues


file_path = r"E:\wine\wine.csv"

# Read CSV, skipping the first row (if it's a header)


df = pd.read_csv(file_path, header=None, usecols=[0, 1, 2],
skiprows=1)
df.columns = ['classlabel', 'Alcohol', 'Malic Acid']

# Display original DataFrame


print("Original DataFrame:")
print(df.head())

# **MinMax Scaling (Normalization)**


scaler = MinMaxScaler()
df[['Alcohol', 'Malic Acid']] = scaler.fit_transform(df[['Alcohol', 'Malic Acid']])
print("\nDataFrame after MinMax Scaling:")
print(df.head())

# **Standard Scaling (Standardization)**


scaler = StandardScaler()
df[['Alcohol', 'Malic Acid']] = scaler.fit_transform(df[['Alcohol', 'Malic Acid']])
print("\nDataFrame after Standard Scaling:")
print(df.head())
output:
B. Perform feature Dummification to convert categorical variables into numerical
representations

We loaded the Iris dataset and converted the species names into numbers using Label Encoding, so that machine-learning models can work with the column. We then added a new column called code with these numbers and printed the updated dataset.

Code:

import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Read dataset correctly


iris = pd.read_csv(r"E:\1\iris\iris.csv")

# Apply Label Encoding to the 'species' column


le = LabelEncoder()
iris['code'] = le.fit_transform(iris['species'])

# Print the updated dataset


print(iris)

output:-
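As a complement to Label Encoding, dummification can also be done with one-hot encoding, which creates one binary column per species. A hedged sketch using pandas, assuming the same iris file as above:

import pandas as pd

iris = pd.read_csv(r"E:\1\iris\iris.csv")

# One-hot (dummy) encode the 'species' column: one 0/1 column per species
iris_dummies = pd.get_dummies(iris, columns=['species'], prefix='species')
print(iris_dummies.head())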
PRACTICAL NO : 4

Name: Class:TYCS Date:

Subject: Data Science Roll no:

Aim :- Hypothesis Testing

.Conduct a hypothesis test using appropriate statistical tests (e.g., t-test, chi-square test)

Code:

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Generate two samples for demonstration purposes


np.random.seed(42)
sample1 = np.random.normal(loc=10, scale=2, size=30)
sample2 = np.random.normal(loc=12, scale=2, size=30)

# Perform a two-sample t-test


t_statistic, p_value = stats.ttest_ind(sample1, sample2)

# Set the significance level


alpha = 0.05

print("Results of Two-Sample t-test:")


print(f'T-statistic: {t_statistic}')
print(f'P-value: {p_value}')
print(f"Degrees of Freedom: {len(sample1) + len(sample2) -
2}")

# Plot the distributions


plt.figure(figsize=(10, 6))
plt.hist(sample1, alpha=0.5, label='Sample 1', color='blue')
plt.hist(sample2, alpha=0.5, label='Sample 2', color='orange')
# Mark mean values on the plot
plt.axvline(np.mean(sample1), color='blue',
linestyle='dashed', linewidth=2)
plt.axvline(np.mean(sample2), color='orange',
linestyle='dashed', linewidth=2)

plt.title('Distributions of Sample 1 and Sample 2')


plt.xlabel('Values')
plt.ylabel('Frequency')
plt.legend()

# Highlight the critical region if the null hypothesis is rejected
if p_value < alpha:
    critical_region = np.linspace(min(sample1.min(), sample2.min()),
                                  max(sample1.max(), sample2.max()), 1000)
    plt.fill_between(critical_region, 0, 5, color='red',
                     alpha=0.3, label='Critical Region')
    plt.text(11, 5, f'T-statistic: {t_statistic:.2f}',
             ha='center', va='center', color='black',
             backgroundcolor='white')

# Show the plot


plt.show()

# Draw Conclusions
if p_value < alpha:
    print("Conclusion: There is significant evidence to reject the null hypothesis.")
    if np.mean(sample1) > np.mean(sample2):
        print("Interpretation: Sample 1 has a significantly higher mean than Sample 2.")
    else:
        print("Interpretation: Sample 2 has a significantly higher mean than Sample 1.")
else:
    print("Conclusion: Fail to reject the null hypothesis.")
    print("Interpretation: There is not enough evidence to claim a significant difference between the means.")

Output:
Chi-square test

Code:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt # Fixed matplotlib import
import seaborn as sb
import warnings
from scipy import stats

warnings.filterwarnings('ignore')

# Load dataset
df = sb.load_dataset('mpg')

# Handling missing values in horsepower


df = df.dropna(subset=['horsepower', 'model_year'])  # Remove rows with NaN in relevant columns

# Convert 'horsepower' to numeric (it may be stored as an object)
df['horsepower'] = pd.to_numeric(df['horsepower'], errors='coerce')
df = df.dropna(subset=['horsepower'])  # Drop NaN after conversion

print(df)
print(df['horsepower'].describe())
print(df['model_year'].describe())

# Define bins for horsepower


bins = [0, 75, 150, 240] # Low, Medium, High
df['horsepower_new'] = pd.cut(df['horsepower'], bins=bins,
labels=['l', 'm', 'h'])
# Define bins for model year (adjusted to have the correct number of bins)
ybins = [69, 72, 74, 84] # Adjusted to match labels count
labels = ['t1', 't2', 't3']
df['modelyear_new'] = pd.cut(df['model_year'], bins=ybins,
labels=labels)
# Crosstabulation of categorical data
df_chi = pd.crosstab(df['horsepower_new'],
df['modelyear_new'])
print(df_chi)

# Perform Chi-Square Test


chi2_stat, p_value, dof, expected = stats.chi2_contingency(df_chi)
print(f"Chi-Square Statistic: {chi2_stat}")
print(f"P-value: {p_value}")
print(f"Degrees of Freedom: {dof}")
print(f"Expected Frequencies Table: \n{expected}"

output:
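To draw a conclusion from the test, the p-value can be compared with a significance level; a minimal hedged sketch reusing p_value from the code above:

alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: horsepower category and model-year group appear to be associated.")
else:
    print("Fail to reject the null hypothesis: no evidence of an association.")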
PRACTICAL NO : 5

Name: Class:TYCS Date:

Subject: Data Science Roll no:

Aim :- ANOVA (Analysis of Variance)

. Perform one-way ANOVA to compare means across multiple groups.

. Conduct post-hoc tests to identify significant differences between group means.

Code:

import pandas as pd
import scipy.stats as stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Define four groups of data


group1 = [23, 25, 29, 34, 30]
group2 = [19, 20, 22, 24, 25]
group3 = [15, 18, 20, 21, 17]
group4 = [28, 24, 26, 30, 29]

# Combine all data into a single list


all_data = group1 + group2 + group3 + group4

# Create corresponding group labels


group_labels = (['Group1'] * len(group1) +
['Group2'] * len(group2) +
['Group3'] * len(group3) +
['Group4'] * len(group4))

# Perform One-Way ANOVA


f_statistics, p_value = stats.f_oneway(group1, group2, group3,
group4)

# Print ANOVA results


print("One-Way ANOVA Results:")
print(f"F-statistic: {f_statistics:.4f}")
print(f"P-value: {p_value:.4f}")

# Perform Tukey-Kramer post-hoc test if ANOVA is significant


if p_value < 0.05:
    tukey_results = pairwise_tukeyhsd(all_data, group_labels)
    print("\nTukey-Kramer Post-Hoc Test Results:")
    print(tukey_results)
else:
    print("\nNo significant difference found between groups (p-value ≥ 0.05).")

output:
PRACTICAL NO : 6

Name: Class:TYCS Date:

Subject: Data Science Roll no:

Aim :- Regression and its Types.

.Implement simple linear regression using a dataset

.Explore and interpret the regression model coefficients and goodness of fit measures

.Extend the analysis to multiple linear regression and assess the impact of additional predictors

Code:

import numpy as np
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Load California housing dataset


housing = fetch_california_housing()
housing_df = pd.DataFrame(housing.data,
columns=housing.feature_names)

# Add target variable (house price)


housing_df['PRICE'] = housing.target

### SIMPLE LINEAR REGRESSION (Using only 'AveRooms' as predictor)
print("Performing Simple Linear Regression...\n")

# Define feature and target variable


X = housing_df[['AveRooms']] # Single predictor
y = housing_df['PRICE'] # Target variable

# Split data into training (80%) and testing (20%) sets


X_train, X_test, y_train, y_test = train_test_split(X, y,
test_size=0.2, random_state=42)

# Train the Simple Linear Regression model


model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Compute performance metrics


mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

# Display results
print("Simple Linear Regression Results:")
print(f"Mean Squared Error (MSE): {mse:.4f}")
print(f"R-squared (R²): {r2:.4f}")
print(f"Intercept (β₀): {model.intercept_:.4f}")
print(f"Coefficient (β₁): {model.coef_[0]:.4f}")
print("\n---------------------------------------------------\
n")

### MULTIPLE LINEAR REGRESSION (Using all features)


print("Performing Multiple Linear Regression...\n")

# Define all features for multiple regression


X = housing_df.drop('PRICE', axis=1) # Use all features
y = housing_df['PRICE'] # Target variable

# Split data into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X, y,
test_size=0.2, random_state=42)

# Train the Multiple Linear Regression model


model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Compute performance metrics


mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

# Display results
print("Multiple Linear Regression Results:")
print(f"Mean Squared Error (MSE): {mse:.4f}")
print(f"R-squared (R²): {r2:.4f}")
print(f"Intercept (β₀): {model.intercept_:.4f}")
print(f"Coefficients (β₁, β₂, ...): {model.coef_}")

Output:
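To make the multiple-regression coefficients easier to interpret, each coefficient can be paired with its feature name. A hedged sketch, assuming model and X from the multiple-regression code above are still in scope:

coef_table = pd.DataFrame({'Feature': X.columns, 'Coefficient': model.coef_})
print(coef_table)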
PRACTICAL NO : 7

Name: Class:TYCS Date:

Subject: Data Science Roll no:

Aim :- Logistic Regression and Decision Tree

. Build a logistic regression model to predict a binary outcome.

. Evaluate the model's performance using classification metrics

. Construct a decision tree model and interpret the decision rules for classification

Code:

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, classification_report
import matplotlib.pyplot as plt
from sklearn.tree import plot_tree
# Load the Iris dataset and create a binary classification problem
iris = load_iris()
iris_df = pd.DataFrame(data=np.c_[iris['data'],
iris['target']], columns=iris['feature_names'] + ['target'])

# Consider only two classes (binary classification)


binary_df = iris_df[iris_df['target'] != 2]
X = binary_df.drop('target', axis=1)
y = binary_df['target']

# Split the data into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X, y,
test_size=0.2, random_state=42)

# Train a logistic regression model and evaluate its performance
logistic_model = LogisticRegression()
logistic_model.fit(X_train, y_train)
y_pred_logistic = logistic_model.predict(X_test)

print("Logistic Regression Metrics")


print("Accuracy: ", accuracy_score(y_test, y_pred_logistic))
print("Precision:", precision_score(y_test, y_pred_logistic))
print("Recall: ", recall_score(y_test, y_pred_logistic))
print("\nClassification Report")
print(classification_report(y_test, y_pred_logistic))

# Train a decision tree model and evaluate its performance


decision_tree_model = DecisionTreeClassifier()
decision_tree_model.fit(X_train, y_train)
y_pred_tree = decision_tree_model.predict(X_test)

print("\nDecision Tree Metrics")


print("Accuracy: ", accuracy_score(y_test, y_pred_tree))
print("Precision:", precision_score(y_test, y_pred_tree))
print("Recall: ", recall_score(y_test, y_pred_tree))
print("\nClassification Report")
print(classification_report(y_test, y_pred_tree))

# Visualize the decision tree


plt.figure(figsize=(12, 6))
plot_tree(decision_tree_model,
feature_names=iris.feature_names, class_names=['0', '1'],
filled=True)
plt.title("Decision Tree Visualization")
plt.show()

output:
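The aim also asks for interpreting the decision rules. A hedged sketch that prints the learned rules as text, assuming decision_tree_model and iris from the code above are still in scope:

from sklearn.tree import export_text

# Print the decision rules of the fitted tree in a readable text form
rules = export_text(decision_tree_model, feature_names=iris.feature_names)
print(rules)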
PRACTICAL NO : 8

Name: Class:TYCS Date:

Subject: Data Science Roll no:

Aim :- K-Means clustering

.Apply the K-means algorithm to group similar data points into clusters

.Determine the optimal number of clusters using elbow method or silhouette analysis

.Visualize the clustering results and analyse the cluster characteristics

Code:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.decomposition import PCA
# Load data
data = pd.read_csv("D:\DOWNLOADS\Wholesale customers
data.csv")

# Define categorical and continuous features


categorical_features = ['Channel', 'Region']
continuous_features = ['Fresh', 'Milk', 'Grocery', 'Frozen',
'Detergents_Paper', 'Delicassen']
# Convert categorical variables to dummy variables
for col in categorical_features:
    dummies = pd.get_dummies(data[col], prefix=col)
    data = pd.concat([data, dummies], axis=1)
    data.drop(col, axis=1, inplace=True)

# Normalize the data


scaler = MinMaxScaler()
data_scaled = scaler.fit_transform(data)

# Elbow Method to find optimal k


sum_of_squared_distances = []
K = range(2, 15)
for k in K:
    km = KMeans(n_clusters=k, random_state=42, n_init=10)
    km.fit(data_scaled)
    sum_of_squared_distances.append(km.inertia_)

plt.figure(figsize=(8, 5))
plt.plot(K, sum_of_squared_distances, 'bo-', markersize=6)
plt.xlabel('Number of Clusters (k)')
plt.ylabel('Sum of Squared Distances')
plt.title('Elbow Method for Optimal k')
plt.show()

# Silhouette Analysis
silhouette_scores = []
for k in K:
    km = KMeans(n_clusters=k, random_state=42, n_init=10)
    cluster_labels = km.fit_predict(data_scaled)
    silhouette_scores.append(silhouette_score(data_scaled, cluster_labels))

plt.figure(figsize=(8, 5))
plt.plot(K, silhouette_scores, 'ro-', markersize=6)
plt.xlabel('Number of Clusters (k)')
plt.ylabel('Silhouette Score')
plt.title('Silhouette Analysis for Optimal k')
plt.show()
# Choose the best k (based on elbow + silhouette score)
optimal_k = 4  # Change this based on the elbow and silhouette graphs
kmeans = KMeans(n_clusters=optimal_k, random_state=42,
n_init=10)
data['Cluster'] = kmeans.fit_predict(data_scaled)

# Visualizing Clusters using PCA (for 2D visualization)


pca = PCA(n_components=2)
data_pca = pca.fit_transform(data_scaled)
df_pca = pd.DataFrame(data_pca, columns=['PC1', 'PC2'])
df_pca['Cluster'] = data['Cluster']

plt.figure(figsize=(8, 5))
sns.scatterplot(x='PC1', y='PC2', hue='Cluster',
palette='viridis', data=df_pca, s=50)
plt.title('Cluster Visualization using PCA')
plt.show()

# Analyze cluster characteristics


cluster_means = data.groupby('Cluster').mean()
print("Cluster Characteristics:")
print(cluster_means)

output:
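Instead of hard-coding optimal_k, the silhouette scores computed above can also suggest it programmatically; a minimal hedged sketch assuming K and silhouette_scores from the loop above:

best_k = list(K)[int(np.argmax(silhouette_scores))]
print("k with the highest silhouette score:", best_k)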
PRACTICAL NO : 9

Name: Class:TYCS Date:

Subject: Data Science Roll no:

Aim :- Principal Component Analysis (PCA)

.Perform PCA on a dataset to reduce dimensionality

.Evaluate the explained variance and select the appropriate number of principal
components

.Visualize the data in the reduced-dimensional space

Code:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
# Load the Iris dataset
iris = load_iris()
iris_df = pd.DataFrame(data=np.c_[iris['data'],
iris['target']], columns=iris['feature_names'] + ['target'])

# Separate features and target variable


X = iris_df.drop('target', axis=1)
y = iris_df['target']

# Standardize the data


scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Apply PCA
pca = PCA()
X_pca = pca.fit_transform(X_scaled)
explained_variance_ratio = pca.explained_variance_ratio_

# Plot cumulative explained variance ratio


plt.figure(figsize=(8, 6))
plt.plot(np.cumsum(explained_variance_ratio), marker='o',
linestyle='--')
plt.title('Explained Variance Ratio')
plt.xlabel('Number of Principal Components')
plt.ylabel('Cumulative Explained Variance Ratio')
plt.grid(True)
plt.show()

# Determine the number of components to explain 95% variance


cumulative_variance_ratio = np.cumsum(explained_variance_ratio)
n_components = np.argmax(cumulative_variance_ratio >= 0.95) + 1
print(f"Number of principal components to explain 95% variance: {n_components}")

# Apply PCA with selected components


pca = PCA(n_components=n_components)
X_reduced = pca.fit_transform(X_scaled)

# Visualize the reduced-dimensional data


plt.figure(figsize=(8, 6))
plt.scatter(X_reduced[:, 0], X_reduced[:, 1], c=y,
cmap='viridis', s=50, alpha=0.5)
plt.title('Data in Reduced-dimensional Space')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.colorbar(label='Target')
plt.show()

output:
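To see how each original feature contributes to the retained components, the PCA loadings can be inspected. A hedged sketch assuming pca, n_components and iris from the code above are still in scope:

loadings = pd.DataFrame(pca.components_.T,
                        columns=[f'PC{i+1}' for i in range(n_components)],
                        index=iris['feature_names'])
print(loadings)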
PRACTICAL NO : 10

Name: Class:TYCS Date:

Subject: Data Science Roll no:

Aim :- Data Visualization and Storytelling

.Create meaningful visualizations using data visualization tools

.Combine multiple visualizations to tell a compelling data story

.Present the findings and insights in a clear and concise manner

Code:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
# Generate random data
np.random.seed(42) # Set a seed for reproducibility

# Create a DataFrame with random data


data = pd.DataFrame({
'variable1': np.random.normal(0, 1, 1000),
'variable2': np.random.normal(2, 2, 1000) + 0.5 *
np.random.normal(0, 1, 1000),
'variable3': np.random.normal(-1, 1.5, 1000),
'category': pd.Series(np.random.choice(['A', 'B', 'C',
'D'], size=1000, p=[0.4, 0.3, 0.2, 0.1]), dtype='category')
})

# Create a scatter plot to visualize the relationship between two variables
plt.figure(figsize=(10, 6))
plt.scatter(data['variable1'], data['variable2'], alpha=0.5)
plt.title('Relationship between Variable 1 and Variable 2',
fontsize=16)
plt.xlabel('Variable 1', fontsize=14)
plt.ylabel('Variable 2', fontsize=14)
plt.show()

# Create a bar chart to visualize the distribution of a categorical variable
plt.figure(figsize=(10, 6))
sns.countplot(x='category', data=data)
plt.title('Distribution of Categories', fontsize=16)
plt.xlabel('Category', fontsize=14)
plt.ylabel('Count', fontsize=14)
plt.xticks(rotation=45)
plt.show()

# Create a heatmap to visualize the correlation between numerical variables
plt.figure(figsize=(10, 8))
numerical_cols = ['variable1', 'variable2', 'variable3']
sns.heatmap(data[numerical_cols].corr(), annot=True,
cmap='coolwarm')
plt.title('Correlation Heatmap', fontsize=16)
plt.show()

Output:
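To combine the three views into a single story-style figure, the plots can be placed side by side. A hedged sketch assuming data from the code above is still in scope:

fig, axes = plt.subplots(1, 3, figsize=(18, 5))

# Scatter: relationship between variable1 and variable2
axes[0].scatter(data['variable1'], data['variable2'], alpha=0.5)
axes[0].set_title('Variable 1 vs Variable 2')

# Bar chart: distribution of the categorical variable
sns.countplot(x='category', data=data, ax=axes[1])
axes[1].set_title('Category Counts')

# Heatmap: correlation between the numerical variables
sns.heatmap(data[['variable1', 'variable2', 'variable3']].corr(),
            annot=True, cmap='coolwarm', ax=axes[2])
axes[2].set_title('Correlation Heatmap')

plt.tight_layout()
plt.show()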
