05 AIHC Exp01
Aim: Collect, Clean, Integrate and Transform Healthcare Data based on specific disease
Objective: The objective of this experiment is to perform basic preprocessing on a healthcare
dataset using Python libraries.
Theory:
Data Collection: Data collection is the process of gathering and measuring information from
many different sources. To develop practical artificial intelligence (AI) and machine learning
solutions from the collected data, it must be gathered and stored in a way that suits the
business problem at hand.
Data Cleaning: Data cleaning is the process of removing incorrect, corrupted, wrongly
formatted, duplicate, or incomplete records from a dataset. The likelihood of duplicated or
mislabelled data increases when two or more data sources are combined.
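As a small illustration of these cleaning steps, the sketch below works on a hypothetical
patient table; the file name 'patients.csv' and the 'patient_id' and 'sex' columns are
assumptions made for illustration, not part of the main experiment.
import pandas as pd
# A minimal cleaning sketch; file and column names are hypothetical.
patients = pd.read_csv('patients.csv')
patients['sex'] = patients['sex'].str.strip().str.lower()  # fix inconsistent formatting
patients = patients.drop_duplicates()                      # remove repeated records
patients = patients.dropna(subset=['patient_id'])          # drop rows missing the key identifier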
Data Integration: Data integration is the practice of consolidating data from disparate sources
into a single dataset, with the goal of providing users consistent access to data across
subjects and structure types and of meeting the information needs of all applications and
business processes.
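The main program below works on a single file, so a minimal integration sketch is given here
instead, assuming two hypothetical files 'patients.csv' (demographics) and 'labs.csv' (lab
results) that share a 'patient_id' key column.
import pandas as pd
# A minimal integration sketch; file and column names are hypothetical.
patients = pd.read_csv('patients.csv')
labs = pd.read_csv('labs.csv')
merged = pd.merge(patients, labs, on='patient_id', how='inner')  # keep patients present in both sources
print(merged.head())
Data Transformation: Data transformation converts the cleaned, integrated data into the forms
required for analysis, for example through normalization, standardization, encoding of
categorical variables, and dimensionality reduction, all of which the code below applies.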
Code: -
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.decomposition import PCA
from scipy import stats
# Load the dataset
df = pd.read_csv('heart.csv')
# Display basic information
print("Initial Data:")
print(df.head())
print("\nMissing Values:")
print(df.isnull().sum())
print("\nData Description:")
print(df.describe())
# Data Cleaning
# Handle missing values
df_filled = df.fillna(df.median(numeric_only=True))  # Fill missing values in numeric columns with the median
# Remove duplicates
df_no_duplicates = df_filled.drop_duplicates()
# Outlier Detection and Treatment
z_scores = np.abs(stats.zscore(df_no_duplicates.select_dtypes(include=[np.number])))
df_no_outliers = df_no_duplicates[(z_scores < 3).all(axis=1)].copy()  # keep rows with all |z| < 3; copy to avoid SettingWithCopyWarning
# Normalization and Standardization
numeric_cols = df_no_outliers.select_dtypes(include=[np.number]).columns
scaler = StandardScaler()
df_scaled = pd.DataFrame(scaler.fit_transform(df_no_outliers[numeric_cols]),
                         columns=numeric_cols)  # standardize to zero mean, unit variance
min_max_scaler = MinMaxScaler()
df_normalized = pd.DataFrame(min_max_scaler.fit_transform(df_no_outliers[numeric_cols]),
                             columns=numeric_cols)  # rescale each column to [0, 1]
# Feature Engineering: bin age into groups
df_no_outliers['age_group'] = pd.cut(df_no_outliers['age'], bins=[20, 40, 60, 80],
                                     labels=['20-39', '40-59', '60-79'])
# Encoding Categorical Variables
df_no_outliers['sex'] = df_no_outliers['sex'].map({0: 'female', 1: 'male'})
df_encoded = pd.get_dummies(df_no_outliers, columns=['sex'])
# PCA for Dimensionality Reduction
df_std = StandardScaler().fit_transform(df_no_outliers.select_dtypes(include=[np.number]))
pca = PCA(n_components=2)
df_pca = pca.fit_transform(df_std)
df_pca_df = pd.DataFrame(data=df_pca, columns=['PC1', 'PC2'])
# Print processed data and PCA result
print("\nProcessed Data:")
print(df_encoded.head())
print("\nPCA Result:")
print(df_pca_df.head())
Output:
Conclusion: -
We collected, cleaned, integrated, and transformed healthcare data focused on a specific
disease. The process involved handling missing values, removing duplicates, and treating
outliers, followed by normalization and standardization. We enriched the dataset through
feature engineering and categorical encoding, and applied PCA for dimensionality reduction.
These steps prepared the data for accurate analysis and meaningful insights.