Data Science Programs

Uploaded by

senthur kannan thirugnanasambanthan

Copyright

© All Rights Reserved

Question: Create a Pandas program to read a CSV file, fill missing values with the column mean, and group the data by a specified category to calculate the average of a numerical column.

Answer:

import pandas as pd

# Read the CSV file into a DataFrame
file_path = 'data.csv'  # Replace with your CSV file path
data = pd.read_csv(file_path)

# Fill missing values in each numeric column with that column's mean
# (non-numeric columns are left unchanged)
data = data.fillna(data.mean(numeric_only=True))

# Specify the category column and numerical column
category_column = 'Category'  # Replace with the name of your category column
numerical_column = 'Value'  # Replace with the name of your numerical column

# Group the data by the category column and calculate the average of the numerical column
grouped_data = data.groupby(category_column)[numerical_column].mean()

# Display the results
print("Average of numerical column grouped by category:")
print(grouped_data)
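The fill-then-group steps can be checked on a tiny in-memory table. This is a minimal sketch: the sample rows are hypothetical stand-ins for 'data.csv', and `io.StringIO` is used only so that no file is needed.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for 'data.csv'; one Value is missing.
csv_text = """Category,Value
A,10
A,
B,30
B,50
A,20
"""
data = pd.read_csv(io.StringIO(csv_text))

# Mean of each numeric column, computed over the non-missing values only.
column_means = data.mean(numeric_only=True)

# Replace the missing Value with the overall column mean.
data = data.fillna(column_means)

# Average Value per category after filling.
grouped = data.groupby("Category")["Value"].mean()
print(grouped)
```

In this sample the missing value in category A is filled with the overall column mean (27.5), not with the mean of category A alone (15), so the average for A rises to about 19.17. Whether that is acceptable depends on the data.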

Question: Implement a k-nearest neighbors (KNN) classifier using scikit-learn to predict labels from the Iris dataset, and evaluate the model's accuracy.

Answer:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split the dataset into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features; KNN is distance-based, so feature scale matters.
# The scaler is fit on the training set only and then applied to the test set.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Create the KNN classifier with k=3
knn = KNeighborsClassifier(n_neighbors=3)

# Train the classifier
knn.fit(X_train, y_train)

# Predict labels for the test set
y_pred = knn.predict(X_test)

# Evaluate the model's accuracy
accuracy = accuracy_score(y_test, y_pred)

# Display the accuracy
print("Accuracy of the KNN classifier:", accuracy)
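A single 80/20 split gives one accuracy figure that depends on how the rows happened to be divided. As a possible complement, the sketch below wraps the scaler and classifier in a scikit-learn pipeline and scores it with 5-fold cross-validation. The candidate values of k and the fold count are illustrative assumptions, not part of the original answer.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Hypothetical candidate neighbor counts to compare.
candidate_k = (1, 3, 5, 7)

mean_accuracy = {}
for k in candidate_k:
    # The pipeline refits the scaler inside each fold, so the held-out
    # fold never influences the scaling statistics.
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    scores = cross_val_score(model, X, y, cv=5)  # accuracy for each of 5 folds
    mean_accuracy[k] = scores.mean()

for k, acc in mean_accuracy.items():
    print(f"k={k}: mean cross-validated accuracy = {acc:.3f}")
```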

Question: Write a Python program to load a CSV file into a Pandas DataFrame and display summary statistics (mean, median, and mode) for numerical columns.

Answer:

import pandas as pd

# Load the CSV file into a DataFrame
file_path = 'data.csv'  # Replace with the path to your CSV file
data = pd.read_csv(file_path)

# Display the DataFrame
print("DataFrame:")
print(data)

# Calculate and display summary statistics for numerical columns
numerical_data = data.select_dtypes(include=['number'])

# Mean
mean_values = numerical_data.mean()
print("\nMean of numerical columns:")
print(mean_values)

# Median
median_values = numerical_data.median()
print("\nMedian of numerical columns:")
print(median_values)

# Mode (mode() returns one row per tied mode, so the result is a DataFrame)
mode_values = numerical_data.mode()
print("\nMode of numerical columns:")
print(mode_values.iloc[0])  # Display only the first mode of each column for simplicity
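To see why only the first row of the mode result is printed, the sketch below runs `DataFrame.mode()` on a small made-up table in which one column has two tied modes. The column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical data: column 'a' has two tied modes (1 and 2),
# column 'b' has a single mode (5).
df = pd.DataFrame({"a": [1, 1, 2, 2, 3], "b": [5, 5, 5, 6, 7]})

# One row per mode; columns with fewer modes are padded with NaN.
modes = df.mode()
print(modes)

# The first row holds the smallest mode of each column.
first_mode = modes.iloc[0]
print(first_mode)
```

Taking `iloc[0]` therefore hides the second mode of column `a`. Printing the full `modes` table is the safer choice when ties matter.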

Question: Write a Dask program to load a large CSV file, filter the data based on specific criteria, and save the results to a new CSV file.

Answer:

import dask.dataframe as dd

# Load the large CSV file into a Dask DataFrame (evaluated lazily, partition by partition)
file_path = 'large_data.csv'  # Replace with the path to your large CSV file
data = dd.read_csv(file_path)

# Define the filtering criteria (e.g., filter rows where 'column_name' > 50)
filtered_data = data[data['column_name'] > 50]  # Replace 'column_name' and condition as needed

# Save the filtered data to a new CSV file.
# single_file=True writes one output file instead of one file per partition.
output_file_path = 'filtered_data.csv'
filtered_data.to_csv(output_file_path, single_file=True, index=False)

print(f"Filtered data has been saved to {output_file_path}")
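Where Dask is not installed, a similar out-of-core filter can be approximated with plain pandas by reading the file in chunks. This is a different technique from the Dask answer, shown only for comparison. The helper name `filter_csv_in_chunks`, the tiny chunk size, and the sample rows are all assumptions made for the sketch.

```python
import os
import tempfile

import pandas as pd


def filter_csv_in_chunks(src, dst, column, threshold, chunksize=2):
    """Stream src chunk by chunk, keep rows where column > threshold, append to dst."""
    first = True
    for chunk in pd.read_csv(src, chunksize=chunksize):
        kept = chunk[chunk[column] > threshold]
        # Write the header once, then append the remaining chunks without it.
        kept.to_csv(dst, mode="w" if first else "a", header=first, index=False)
        first = False


# Build a small hypothetical input file in a temporary directory.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "large_data.csv")
dst = os.path.join(workdir, "filtered_data.csv")
pd.DataFrame(
    {"id": range(6), "column_name": [10, 60, 55, 20, 90, 50]}
).to_csv(src, index=False)

filter_csv_in_chunks(src, dst, "column_name", 50)
filtered = pd.read_csv(dst)
print(filtered)
```

Only one chunk is held in memory at a time, but the work is sequential. Dask can process partitions in parallel, which is the main reason to prefer it for genuinely large files.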

Question: Write a Python function to calculate the mean, median, and mode of a given list of numerical values.

Answer:

from statistics import mean, median, multimode

def calculate_statistics(numbers):
    """
    Calculate the mean, median, and mode of a list of numerical values.

    Args:
        numbers (list): A list of numerical values.

    Returns:
        dict: A dictionary containing the mean, median, and mode.
    """
    if not numbers:
        return {"mean": None, "median": None, "mode": None}

    # statistics.mode raises StatisticsError on ties only before Python 3.8;
    # multimode (Python 3.8+) returns every most-common value, so ties
    # (including the case where all values occur equally) are detected explicitly.
    modes = multimode(numbers)

    return {
        "mean": mean(numbers),
        "median": median(numbers),
        "mode": modes[0] if len(modes) == 1 else "No unique mode",
    }

# Example usage
numbers = [10, 20, 20, 30, 40]
result = calculate_statistics(numbers)

print("Mean:", result["mean"])
print("Median:", result["median"])
print("Mode:", result["mode"])
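How the standard library treats ties affects what a "no unique mode" check can detect. The short sketch below assumes Python 3.8 or later, where `statistics.mode` returns the first most-common value it encounters and `statistics.multimode` returns all of them. The sample lists are made up for illustration.

```python
from statistics import mode, multimode

# Hypothetical list in which 10 and 20 both occur twice.
tied = [10, 10, 20, 20, 30]

# On Python 3.8+, mode() does not raise on ties; it returns the first
# most-common value in order of appearance.
first_mode = mode(tied)

# multimode() returns every most-common value, in order of first appearance.
all_modes = multimode(tied)

# When no value repeats, every value counts as a mode.
no_repeats = multimode([1, 2, 3])

print(first_mode, all_modes, no_repeats)
```

Checking `len(multimode(values)) == 1` is one way to tell a unique mode from a tie without relying on an exception.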
