Data Science Practical Problems

The document contains a series of exercises involving NumPy and Pandas programming tasks, such as creating null vectors, converting arrays to float types, and performing data analysis on the Pima Indians Diabetes dataset. Each exercise includes a program, expected output, and explanations for operations like reshaping arrays, selecting specific rows and columns, and performing statistical analyses. The final exercises focus on univariate and bivariate analyses using linear and logistic regression modeling.


Ex no: 1 a

Write a NumPy program to create a null vector of size 10 and update sixth value to 11

Program

import numpy as np

# Create a null vector of size 10

null_vector = np.zeros(10)

# Update the sixth value to 11 (indexing starts from 0)

null_vector[5] = 11

print("Original null vector:", null_vector)

Output:

Original null vector: [ 0. 0. 0. 0. 0. 11. 0. 0. 0. 0.]

Ex no : 1 b
Write a NumPy program to convert an array to a float type

Program :

import numpy as np

# Create an example array (you can replace this with your own array)

integer_array = np.array([1, 2, 3, 4, 5])

# Convert the array to float type

float_array = integer_array.astype(float)

print("Original array (integer):", integer_array)


print("Converted array (float):", float_array)

Output:

Original array (integer): [1 2 3 4 5]

Converted array (float): [1. 2. 3. 4. 5.]

Ex no : 1 c
Write a NumPy program to create a 3x3 matrix with values ranging from 2 to 10

Program :

import numpy as np

# Create a 1D array with values ranging from 2 to 10

values_array = np.arange(2, 11)

# Reshape the 1D array into a 3x3 matrix

matrix_3x3 = values_array.reshape(3, 3)

print("3x3 Matrix with values ranging from 2 to 10:")

print(matrix_3x3)

Output :

3x3 Matrix with values ranging from 2 to 10:

[[ 2 3 4]

[ 5 6 7]

[ 8 9 10]]

Ex no : 1 d
Write a NumPy program to convert a list of numeric value into a one-dimensional NumPy
array
Program :

import numpy as np

# Create a list of numeric values

numeric_list = [1, 2, 3, 4, 5]

# Convert the list to a one-dimensional NumPy array

numpy_array = np.array(numeric_list)

print("List of numeric values:", numeric_list)

print("One-dimensional NumPy array:", numpy_array)

Output :

List of numeric values: [1, 2, 3, 4, 5]

One-dimensional NumPy array: [1 2 3 4 5]

Ex no : 2 a
Write a NumPy program to convert an array to a float type

Program :

import numpy as np

# Create an example array (you can replace this with your own array)

original_array = np.array([1, 2, 3, 4, 5])

# Convert the array to float type

float_array = original_array.astype(float)

print("Original array:", original_array)


print("Converted array (float):", float_array)

Output :

Original array: [1 2 3 4 5]

Converted array (float): [1. 2. 3. 4. 5.]

Ex no : 2 b
Write a NumPy program to create an empty and a full array

Program :

import numpy as np

# Create an empty array

empty_array = np.empty((3, 3)) # Specify the shape of the empty array (3x3 in this case)

# Create a full array with a specified value

full_array = np.full((2, 4), 7) # Specify the shape and the value (2x4 array with value 7)

print("Empty Array:")

print(empty_array)

print("\nFull Array with Value 7:")

print(full_array)

Output :

Empty Array:

[[0. 0. 0.]

[0. 0. 0.]

[0. 0. 0.]]
Full Array with Value 7:

[[7 7 7 7]

[7 7 7 7]]

Ex no : 2 c

Write a NumPy program to convert a list and tuple into arrays

Program :

import numpy as np

# Convert a list to a NumPy array

list_values = [1, 2, 3, 4, 5]

array_from_list = np.array(list_values)

# Convert a tuple to a NumPy array

tuple_values = (6, 7, 8, 9, 10)

array_from_tuple = np.array(tuple_values)

print("List to Array:")

print(array_from_list)

print("\nTuple to Array:")

print(array_from_tuple)

Output :

List to Array:

[1 2 3 4 5]

Tuple to Array:

[ 6 7 8 9 10]
Ex no : 2 d
Write a NumPy program to find the real and imaginary parts of an array of complex numbers

Program :

import numpy as np

# Create an array of complex numbers

complex_array = np.array([1 + 2j, 3 - 4j, 5 + 6j])

# Find the real and imaginary parts

real_parts = np.real(complex_array)

imaginary_parts = np.imag(complex_array)

print("Array of Complex Numbers:")

print(complex_array)

print("\nReal Parts:")

print(real_parts)

print("\nImaginary Parts:")

print(imaginary_parts)

Output :

Array of Complex Numbers:

[1.+2.j 3.-4.j 5.+6.j]

Real Parts:

[1. 3. 5.]

Imaginary Parts:

[ 2. -4. 6.]
Ex no : 3
Write a Pandas program to get the powers of an array values element-wise.
Note: First array elements raised to powers from second array
Sample data: {'X':[78,85,96,80,86], 'Y':[84,94,89,83,86],'Z':[86,97,96,72,83]}
Expected Output:
X Y Z
0 78 84 86
1 85 94 97
2 96 89 96
3 80 83 72
4 86 86 83

Program :

import pandas as pd

# Sample data

data = {'X': [78, 85, 96, 80, 86], 'Y': [84, 94, 89, 83, 86], 'Z': [86, 97, 96, 72, 83]}

# Create a DataFrame from the sample data

df = pd.DataFrame(data)

# Display the DataFrame (the expected output above lists the original values)

print(df)

Output :

X Y Z

0 78 84 86

1 85 94 97

2 96 89 96

3 80 83 72

4 86 86 83
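
If the note above is read literally (each element raised to the power of the corresponding element of another column), the operation can be computed as sketched below. The pairing of X with Y is only an illustrative assumption, since the expected output lists the original values unchanged.

import pandas as pd

data = {'X': [78, 85, 96, 80, 86], 'Y': [84, 94, 89, 83, 86], 'Z': [86, 97, 96, 72, 83]}
df = pd.DataFrame(data)

# Element-wise power: X raised to Y (assumed pairing); cast to float to avoid integer overflow
powers = df['X'].astype(float).pow(df['Y'])
print(powers)
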
Ex no : 4
Write a Pandas program to select the specified columns and rows from a given data frame.
Sample Python dictionary data and list labels:
Select 'name' and 'score' columns in rows 1, 3, 5, 6 from the following data frame.
exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew',
'Laura', 'Kevin', 'Jonas'],
'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
Expected Output:
Select specific columns and rows:
score qualify
b 9.0 no
d NaN no
f 20.0 yes
g 14.5 yes

Program :

import numpy as np

import pandas as pd

# Sample data

exam_data = {

'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin',
'Jonas'],

'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],

'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],

'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']

}

# Create a DataFrame from the sample data

df = pd.DataFrame(exam_data, index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'])

# Select 'name' and 'score' columns in rows 1, 3, 5, 6

selected_data = df.loc[['b', 'd', 'f', 'g'], ['score', 'qualify']]


# Display the result

print("Select specific columns and rows:")

print(selected_data)

Output :

Select specific columns and rows:

score qualify

b 9.0 no

d NaN no

f 20.0 yes

g 14.5 yes

Ex no : 5
Write a Pandas program to count the number of rows and columns of a DataFrame. Sample
Python dictionary data and list labels:
exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew',
'Laura', 'Kevin', 'Jonas'],
'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
Expected Output:
Number of Rows: 10
Number of Columns: 4

Program :

import numpy as np

import pandas as pd

# Sample data

exam_data = {

'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin',
'Jonas'],

'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],

'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],

'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']
}

# Create a DataFrame from the sample data

df = pd.DataFrame(exam_data, index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'])

# Count the number of rows and columns

num_rows, num_columns = df.shape

# Display the result

print("Number of Rows:", num_rows)

print("Number of Columns:", num_columns)

Output :

Number of Rows: 10

Number of Columns: 4

Ex no : 6
Reading data from text files, Excel and the web and exploring various commands for doing
descriptive analytics on the Iris data set
(In Record )
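
Since the full program for this exercise is kept in the record, only a minimal sketch of the three reading steps and the basic descriptive commands is given here. The file names iris.csv and iris.xlsx are placeholders, reading Excel assumes openpyxl is installed, and the URL is seaborn's hosted copy of the Iris data used purely as an example source.

import pandas as pd

# Read from a text/CSV file (placeholder path)
iris_txt = pd.read_csv("iris.csv")

# Read from an Excel file (placeholder path, requires openpyxl)
iris_xlsx = pd.read_excel("iris.xlsx")

# Read from the web (example URL)
url = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv"
iris_web = pd.read_csv(url)

# Descriptive analytics
print(iris_web.head())
iris_web.info()
print(iris_web.describe())
print(iris_web.groupby("species").mean())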

Ex no : 7

Use the diabetes data set from Pima Indians Diabetes data set for performing the following:

Apply Univariate analysis:

 Frequency
 Mean,
 Median,
 Mode,
 Variance
 Standard Deviation
 Skewness and Kurtosis
Program :

import pandas as pd

import numpy as np

from scipy.stats import skew, kurtosis

# Load the Pima Indians Diabetes dataset

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-
indians-diabetes.data"

column_names = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI",


"DiabetesPedigreeFunction", "Age", "Outcome"]

diabetes_data = pd.read_csv(url, names=column_names)

# Display the first few rows of the dataset

print("Dataset Head:")

print(diabetes_data.head())

# Univariate Analysis

for column in diabetes_data.columns:

    print("\nColumn:", column)

    print("Frequency:\n", diabetes_data[column].value_counts())

    print("Mean:", diabetes_data[column].mean())

    print("Median:", diabetes_data[column].median())

    print("Mode:", diabetes_data[column].mode().values)

    print("Variance:", diabetes_data[column].var())

    print("Standard Deviation:", diabetes_data[column].std())

    print("Skewness:", skew(diabetes_data[column]))

    print("Kurtosis:", kurtosis(diabetes_data[column]))

Output:

Dataset Head:

Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin  BMI  DiabetesPedigreeFunction  Age  Outcome
0 6 148 72 35 0 33.6 0.627 50 1

1 1 85 66 29 0 26.6 0.351 31 0

2 8 183 64 0 0 23.3 0.672 32 1

3 1 89 66 23 94 28.1 0.167 21 0

4 0 137 40 35 168 43.1 2.288 33 1

Column: Pregnancies

Frequency:

1 135

0 111

2 103

3 75

4 68

5 57

6 50

7 45

8 38

9 28

10 24

11 11

13 10

12 9

14 2

15 1

17 1

Name: Pregnancies, dtype: int64

Mean: 3.8450520833333335

Median: 3.0

Mode: [1]

Variance: 11.35405632062147

Standard Deviation: 3.3695780626988623


Skewness: 0.9016739791518586

Kurtosis: 0.1592197711542494

...

Column: Outcome

Frequency:

0 500

1 268

Name: Outcome, dtype: int64

Mean: 0.3489583333333333

Median: 0.0

Mode: [0]

Variance: 0.22850161570824634

Standard Deviation: 0.4780286376712976

Skewness: 0.6350166433325007

Kurtosis: -1.601715582922407
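
As a cross-check on the per-column loop above, most of these statistics can also be obtained in one call, assuming diabetes_data has been loaded as in the program. Note that pandas' skew() and kurtosis() apply a bias correction, so their values differ slightly from the scipy results printed above.

# Summary statistics for every column at once
print(diabetes_data.describe())

# Per-column skewness and kurtosis (bias-corrected, unlike scipy's defaults)
print(diabetes_data.skew())
print(diabetes_data.kurtosis())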

Ex no : 8

Use the diabetes data set from Pima Indians Diabetes data set for performing the
following:

Apply Bivariate analysis:

 Linear and logistic regression modeling

Program :

import pandas as pd

from sklearn.model_selection import train_test_split

from sklearn.linear_model import LinearRegression, LogisticRegression

from sklearn.metrics import accuracy_score, classification_report, confusion_matrix


# Load the Pima Indians Diabetes dataset

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-
indians-diabetes.data"

column_names = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI",


"DiabetesPedigreeFunction", "Age", "Outcome"]

diabetes_data = pd.read_csv(url, names=column_names)

# Separate features (X) and target variable (y)

X = diabetes_data.drop("Outcome", axis=1)

y = diabetes_data["Outcome"]

# Split the data into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Linear Regression

linear_model = LinearRegression()

linear_model.fit(X_train, y_train)

# Print Linear Regression results

print("\nLinear Regression Coefficients:")

for feature, coef in zip(X.columns, linear_model.coef_):

    print(f"{feature}: {coef}")

print("Intercept:", linear_model.intercept_)

linear_predictions = linear_model.predict(X_test)

print("\nLinear Regression Predictions (first 10):", linear_predictions[:10])

# Logistic Regression

logistic_model = LogisticRegression()

logistic_model.fit(X_train, y_train)

# Print Logistic Regression results


logistic_predictions = logistic_model.predict(X_test)

accuracy = accuracy_score(y_test, logistic_predictions)

conf_matrix = confusion_matrix(y_test, logistic_predictions)

classification_rep = classification_report(y_test, logistic_predictions)

print("\nLogistic Regression Accuracy:", accuracy)

print("\nConfusion Matrix:")

print(conf_matrix)

print("\nClassification Report:")

print(classification_rep)

Output:

Linear Regression Coefficients:

Pregnancies: 0.0208

Glucose: 0.0056

BloodPressure: -0.0032

SkinThickness: 0.0001

Insulin: -0.0002

BMI: 0.0124

DiabetesPedigreeFunction: 0.1472

Age: 0.0051

Intercept: -0.8254

Linear Regression Predictions (first 10):

[ 0.3216 0.2154 0.7811 0.1891 0.4727 0.2375 0.6484 0.4686 0.6511 0.5670]

Logistic Regression Accuracy: 0.7597

Confusion Matrix:

[[89 14]

[24 27]]
Classification Report:

precision recall f1-score support

0 0.79 0.86 0.82 103

1 0.66 0.53 0.59 51

accuracy 0.76 154

macro avg 0.73 0.70 0.71 154

weighted avg 0.75 0.76 0.75 154
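
With the default solver settings, LogisticRegression can raise a convergence warning on this dataset because the features are on very different scales. A hedged variant that standardizes the features first is sketched below; it assumes X_train, X_test, y_train and y_test from the program above, and its accuracy may differ slightly from the output shown.

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Scale the features, then fit logistic regression with a larger iteration budget
scaled_logistic = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_logistic.fit(X_train, y_train)
print("Scaled Logistic Regression Accuracy:", accuracy_score(y_test, scaled_logistic.predict(X_test)))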

Ex no : 9

Use the diabetes data set from Pima Indians Diabetes data set for performing the
following:

Apply Bivariate analysis:

 Multiple Regression analysis

Program :

import pandas as pd

from sklearn.model_selection import train_test_split

from sklearn.linear_model import LinearRegression

from sklearn.metrics import mean_squared_error, r2_score

# Load the dataset (replace 'diabetes.csv' with the actual file name)

data = pd.read_csv('C:/Users/Student/Downloads/diabetes.csv')

# Select relevant features (e.g., Glucose, BMI, BloodPressure, Insulin, Age)

X = data[['Glucose', 'BMI', 'BloodPressure', 'Insulin', 'Age']]

y = data['Outcome'] # Outcome: 1 for diabetes, 0 for non-diabetes


# Split data into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train the linear regression model

model = LinearRegression()

model.fit(X_train, y_train)

# Make predictions on the testing data

y_pred = model.predict(X_test)

# Evaluate the model

mse = mean_squared_error(y_test, y_pred)

r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error: {mse:.2f}")

print(f"R-squared: {r2:.2f}")

Output :

Mean Squared Error: 0.18

R-squared: 0.20
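
To see the fitted multiple-regression equation itself, the coefficients and intercept can be printed as well; this assumes the model and X defined in the program above.

# Coefficients of the fitted multiple regression model
for feature, coef in zip(X.columns, model.coef_):
    print(f"{feature}: {coef:.4f}")
print(f"Intercept: {model.intercept_:.4f}")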

Ex no : 10

Apply and explore various plotting functions on UCI data set for performing the following:

a) Normal values
b) Density and contour plots
c) Three-dimensional plotting

Program :

import seaborn as sns

import matplotlib.pyplot as plt

import numpy as np

# Load a sample dataset (e.g., Iris dataset)


iris = sns.load_dataset("iris")

# a) Normal values plot

# Set the style

sns.set(style="whitegrid")

# Create subplots for each variable

fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(12, 8))

# Plot kernel density estimate for each variable

sns.kdeplot(data=iris, x="sepal_length", fill=True, ax=axes[0, 0], color="skyblue")

axes[0, 0].set_title("Kernel Density Plot - Sepal Length")

sns.kdeplot(data=iris, x="sepal_width", fill=True, ax=axes[0, 1], color="salmon")

axes[0, 1].set_title("Kernel Density Plot - Sepal Width")

sns.kdeplot(data=iris, x="petal_length", fill=True, ax=axes[1, 0], color="green")

axes[1, 0].set_title("Kernel Density Plot - Petal Length")

sns.kdeplot(data=iris, x="petal_width", fill=True, ax=axes[1, 1], color="orange")

axes[1, 1].set_title("Kernel Density Plot - Petal Width")

plt.suptitle("Normal Values Plot for Iris Dataset")

plt.tight_layout()

plt.show()

# b) Density and Contour Plots

plt.figure(figsize=(12, 6))

plt.subplot(1, 2, 1)

sns.kdeplot(data=iris, x="sepal_length", y="sepal_width", fill=True, cmap="viridis", thresh=0.15)


plt.subplot(1, 2, 2)

sns.kdeplot(data=iris, x="petal_length", y="petal_width", fill=True, cmap="viridis", thresh=0.15)

plt.suptitle("Density and Contour Plots")

plt.show()

# Three-dimensional plotting

from mpl_toolkits.mplot3d import Axes3D

fig = plt.figure(figsize=(10, 8))

ax = fig.add_subplot(111, projection='3d')

colors = {'setosa': 'red', 'versicolor': 'green', 'virginica': 'blue'}

ax.scatter(iris['sepal_length'], iris['petal_length'], iris['petal_width'], c=iris['species'].map(colors))

ax.set_xlabel('Sepal Length')

ax.set_ylabel('Petal Length')

ax.set_zlabel('Petal Width')

ax.set_title('Three-dimensional Plot')

plt.show()

Output :
Ex no : 11

Apply and explore various plotting functions on UCI data set for performing the following:

a) Correlation and scatter plots


b) Histograms
c) Three-dimensional plotting

Program :

import seaborn as sns

import matplotlib.pyplot as plt

from mpl_toolkits.mplot3d import Axes3D

# Load a sample dataset (e.g., Iris dataset)

iris = sns.load_dataset("iris")

# a) Correlation and Scatter Plots

sns.set(style="ticks")

sns.pairplot(iris, hue="species", markers=["o", "s", "D"], palette="Set2")


plt.suptitle("Correlation and Scatter Plots")

plt.show()

# b) Histograms

plt.figure(figsize=(12, 6))

plt.subplot(1, 3, 1)

sns.histplot(iris['sepal_length'], kde=True, color="skyblue")

plt.title("Sepal Length Histogram")

plt.subplot(1, 3, 2)

sns.histplot(iris['sepal_width'], kde=True, color="salmon")

plt.title("Sepal Width Histogram")

plt.subplot(1, 3, 3)

sns.histplot(iris['petal_length'], kde=True, color="green")

plt.title("Petal Length Histogram")

plt.suptitle("Histograms")

plt.show()

# c) Three-dimensional plotting

fig = plt.figure(figsize=(10, 8))

ax = fig.add_subplot(111, projection='3d')

colors = {'setosa': 'red', 'versicolor': 'green', 'virginica': 'blue'}

ax.scatter(iris['sepal_length'], iris['petal_length'], iris['petal_width'], c=iris['species'].map(colors))

ax.set_xlabel('Sepal Length')

ax.set_ylabel('Petal Length')

ax.set_zlabel('Petal Width')
ax.set_title('Three-dimensional Plot')

plt.show()

Output :
Ex no : 12

Apply and explore various plotting functions on Pima Indians Diabetes data set for
performing the following:

a) Normal values
b) Density and contour plots
c) Three-dimensional plotting
Program :

import seaborn as sns

import matplotlib.pyplot as plt

from mpl_toolkits.mplot3d import Axes3D

import pandas as pd

# Load the Pima Indians Diabetes dataset (replace 'path/to/diabetes.csv' with the actual path)

diabetes_path = "C:/Users/Student/Downloads/diabetes.csv" # Replace with the actual path

diabetes_df = pd.read_csv(diabetes_path)

# a) Normal Values Plot (univariate kernel density estimates)

plt.figure(figsize=(12, 6))

sns.set(style="whitegrid")

plt.subplot(1, 2, 1)

sns.kdeplot(data=diabetes_df, x="Glucose", fill=True, color="skyblue")

plt.title("Kernel Density Plot - Glucose")

plt.subplot(1, 2, 2)

sns.kdeplot(data=diabetes_df, x="BMI", fill=True, color="salmon")

plt.title("Kernel Density Plot - BMI")

plt.suptitle("Normal Values Plot")

plt.show()

# b) Density and Contour Plots

plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)

sns.kdeplot(data=diabetes_df, x="Glucose", y="BMI", fill=True, cmap="viridis", thresh=0.15)

plt.title("Density Plot for Glucose and BMI")

plt.subplot(1, 2, 2)

sns.kdeplot(data=diabetes_df, x="Insulin", y="BloodPressure", fill=True, cmap="viridis", thresh=0.15)

plt.title("Density Plot for Insulin and BloodPressure")

plt.suptitle("Density and Contour Plots")

plt.show()

# c) Three-dimensional Plotting

fig = plt.figure(figsize=(10, 8))

ax = fig.add_subplot(111, projection='3d')

colors = {0: 'red', 1: 'green'} # Assuming Outcome 0 as red and Outcome 1 as green

ax.scatter(diabetes_df['Glucose'], diabetes_df['BMI'], diabetes_df['Age'], c=diabetes_df['Outcome'].map(colors))

ax.set_xlabel('Glucose')

ax.set_ylabel('BMI')

ax.set_zlabel('Age')

ax.set_title('Three-dimensional Plot')

plt.show()

Output :
Ex no : 13

Apply and explore various plotting functions on Pima Indians Diabetes data set for
performing the following:

a) Correlation and scatter plots


b) Histograms
c) Three-dimensional plotting

Program :

import seaborn as sns

import matplotlib.pyplot as plt

from mpl_toolkits.mplot3d import Axes3D

import pandas as pd

# Load the Pima Indians Diabetes dataset (replace 'path/to/diabetes.csv' with the actual path)
diabetes_path = "C:/Users/Student/Downloads/diabetes.csv" # Replace with the actual path

diabetes_df = pd.read_csv(diabetes_path)

# a) Correlation and Scatter Plots

plt.figure(figsize=(12, 8))

correlation_matrix = diabetes_df.corr()

# Plotting the correlation matrix heatmap

sns.heatmap(correlation_matrix, annot=True, cmap="coolwarm", fmt=".2f", linewidths=0.5)

plt.title("Correlation Matrix Heatmap")

plt.show()

# Scatter plots for selected variables

sns.pairplot(diabetes_df, vars=['Glucose', 'BMI', 'Age', 'Insulin'], hue='Outcome', markers=["o", "s"], palette="Set1")

plt.suptitle("Scatter Plots")

plt.show()

# b) Histograms

plt.figure(figsize=(12, 6))

plt.subplot(2, 2, 1)

sns.histplot(diabetes_df['Glucose'], kde=True, color="skyblue")

plt.title("Glucose Histogram")

plt.subplot(2, 2, 2)

sns.histplot(diabetes_df['BMI'], kde=True, color="salmon")

plt.title("BMI Histogram")

plt.subplot(2, 2, 3)

sns.histplot(diabetes_df['Age'], kde=True, color="green")


plt.title("Age Histogram")

plt.subplot(2, 2, 4)

sns.histplot(diabetes_df['Insulin'], kde=True, color="orange")

plt.title("Insulin Histogram")

plt.suptitle("Histograms")

plt.tight_layout()

plt.show()

# c) Three-dimensional Plotting

fig = plt.figure(figsize=(10, 8))

ax = fig.add_subplot(111, projection='3d')

colors = {0: 'red', 1: 'green'} # Assuming Outcome 0 as red and Outcome 1 as green

ax.scatter(diabetes_df['Glucose'], diabetes_df['BMI'], diabetes_df['Age'], c=diabetes_df['Outcome'].map(colors))

ax.set_xlabel('Glucose')

ax.set_ylabel('BMI')

ax.set_zlabel('Age')

ax.set_title('Three-dimensional Plot')

plt.show()

Output :
Ex no : 14
Write a Pandas program to count number of columns of a DataFrame.
Sample Output:
Original DataFrame
col1 col2 col3
0 1 4 7
1 2 5 8
2 3 6 12
3 4 9 1
4 7 5 11
Number of columns:
3
Program :

import pandas as pd

# Create the original DataFrame

data = {

'col1': [1, 2, 3, 4, 7],

'col2': [4, 5, 6, 9, 5],

'col3': [7, 8, 12, 1, 11]

}

df = pd.DataFrame(data)

# Display the original DataFrame

print("Original DataFrame:")

print(df)

# Count the number of columns

num_columns = df.shape[1]

print("Number of columns:")

print(num_columns)

Output :

Original DataFrame:

col1 col2 col3

0 1 4 7

1 2 5 8

2 3 6 12

3 4 9 1
4 7 5 11

Number of columns:

3

Ex no : 15

Write a Pandas program to group by the first column and get second column as lists in rows

Sample data:
Original DataFrame
col1 col2
0 C1 1
1 C1 2
2 C2 3
3 C2 3
4 C2 4
5 C3 6
6 C2 5
Group on the col1:
col1
C1 [1, 2]
C2 [3, 3, 4, 5]
C3 [6]
Name: col2, dtype: object

Program :

import pandas as pd

# Create the original DataFrame

data = {

'col1': ['C1', 'C1', 'C2', 'C2', 'C2', 'C3', 'C2'],

'col2': [1, 2, 3, 3, 4, 6, 5]

}

df = pd.DataFrame(data)

# Group by the first column and aggregate the values of the second column as lists

result = df.groupby('col1')['col2'].apply(list)
print("Group on the col1:")

print(result)

Output :

Group on the col1:

col1

C1 [1, 2]

C2 [3, 3, 4, 5]

C3 [6]

Name: col2, dtype: object

Ex no : 16

Write a Pandas program to check whether a given column is present in a DataFrame or not.
Sample data:
Original DataFrame
col1 col2 col3
0 1 4 7
1 2 5 8
2 3 6 12
3 4 9 1
4 7 5 11
Col4 is not present in DataFrame.
Col1 is present in DataFrame.

Program :

import pandas as pd

# Create the original DataFrame

data = {

'col1': [1, 2, 3, 4, 7],

'col2': [4, 5, 6, 9, 5],

'col3': [7, 8, 12, 1, 11]

}

df = pd.DataFrame(data)
# List of columns to check

columns_to_check = ['Col4', 'col1']

# Iterate over the list of columns and check if each column is present in the DataFrame

for col in columns_to_check:

    try:

        # Try to access the column

        df[col]

        print(f"{col} is present in DataFrame.")

    except KeyError:

        print(f"{col} is not present in DataFrame.")

Output :

Col4 is not present in DataFrame.

col1 is present in DataFrame.
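
The same check can be written without try/except by testing membership in df.columns; a brief equivalent using the df and columns_to_check defined above.

for col in columns_to_check:
    if col in df.columns:
        print(f"{col} is present in DataFrame.")
    else:
        print(f"{col} is not present in DataFrame.")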

Ex no : 17
Create two arrays of six elements. Write a NumPy program to count the number of instances
of a value occurring in one array on the condition of another array.
Sample Output:
Original arrays:
[ 10 -10 10 -10 -10 10]
[0.85 0.45 0.9 0.8 0.12 0.6 ]
Number of instances of a value occurring in one array on the condition of another array:
3
Program :

import numpy as np

# Create two arrays

array1 = np.array([10, -10, 10, -10, -10, 10])

array2 = np.array([0.85, 0.45, 0.9, 0.8, 0.12, 0.6])

print("Original arrays:")
print(array1)

print(array2)

# Define the condition

condition = array2 > 0.5 # Condition: values in array2 greater than 0.5

# Count the number of instances of the value 10 in array1 where the condition on array2 holds

num_instances = np.count_nonzero(array1[condition] == 10)

print("Number of instances of a value occurring in one array on the condition of another array:")

print(num_instances)

Output :

Original arrays:

[ 10 -10 10 -10 -10 10]

[0.85 0.45 0.9 0.8 0.12 0.6 ]

Number of instances of a value occurring in one array on the condition of another array:

3

Ex no : 18
Create a 2-dimensional array of size 2 x 3, composed of 4-byte integer elements. Write a
NumPy program to find the number of occurrences of a sequence in the said array.
Sample Output:
Original NumPy array:
[[1 2 3]
[2 1 2]]
Type: <class 'numpy.ndarray'>
Sequence: 2,3
Number of occurrences of the said sequence: 2
Program :

import numpy as np

# Create the 2D array

array = np.array([[1, 2, 3],

[2, 1, 2]], dtype=np.int32)


# Define the sequence to find

sequence = np.array([2, 3], dtype=np.int32)

# Count occurrences of the sequence

count = 0

for row in array:

    for i in range(len(row) - len(sequence) + 1):

        if np.array_equal(row[i:i+len(sequence)], sequence):

            count += 1

# Print the original array and its type

print("Original NumPy array:")

print(array)

print("Type:", type(array))

# Print the sequence and its number of occurrences

print("Sequence:", ", ".join(map(str, sequence)))

print("Number of occurrences of the said sequence:", count)

Output :

Original NumPy array:

[[1 2 3]

[2 1 2]]

Type: <class 'numpy.ndarray'>

Sequence: 2, 3

Number of occurrences of the said sequence: 1
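
For larger arrays the explicit Python loop can be replaced by NumPy's sliding-window view (available in NumPy 1.20 and later). A sketch under that assumption, using the array and sequence defined above, and counting row by row exactly as the loop does:

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Windows of length len(sequence) along each row, compared against the sequence
windows = sliding_window_view(array, window_shape=len(sequence), axis=1)
count_vectorized = int(np.sum(np.all(windows == sequence, axis=-1)))
print("Number of occurrences of the said sequence:", count_vectorized)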

Ex no : 19
Write a NumPy program to merge three given NumPy arrays of same shape
Program :
import numpy as np

# Three NumPy arrays of the same shape

array1 = np.array([[1, 2, 3], [4, 5, 6]])

array2 = np.array([[7, 8, 9], [10, 11, 12]])

array3 = np.array([[13, 14, 15], [16, 17, 18]])

# Merge the arrays

merged_array = np.stack((array1, array2, array3))

print("Merged array:")

print(merged_array)

Output :

Merged array:

[[[ 1 2 3]

[ 4 5 6]]

[[ 7 8 9]

[10 11 12]]

[[13 14 15]

[16 17 18]]]
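
np.stack joins the three arrays along a new leading axis, which is why the merged array above has shape (3, 2, 3). If a flat two-dimensional merge is wanted instead, np.concatenate joins along an existing axis; a short comparison using the arrays defined above:

print(np.stack((array1, array2, array3)).shape)        # (3, 2, 3) - new leading axis
print(np.concatenate((array1, array2, array3)).shape)  # (6, 3) - rows appended along axis 0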

Ex no : 20

Write a NumPy program to combine last element with first element of two given ndarray with
different shapes.

Sample Output:
Original arrays:
['PHP', 'JS', 'C++']
['Python', 'C#', 'NumPy']
After Combining:
['PHP' 'JS' 'C++Python' 'C#' 'NumPy']
Program :

import numpy as np

# Original arrays

array1 = np.array(['PHP', 'JS', 'C++'])

array2 = np.array(['Python', 'C#', 'NumPy'])

# Combine the last element of array1 with the first element of array2

combined_array = np.concatenate((array1[:-1], [array1[-1] + array2[0]], array2[1:]))

print("Original arrays:")

print(array1)

print(array2)

print("After Combining:")

print(combined_array)

Output :

Original arrays:

['PHP' 'JS' 'C++']

['Python' 'C#' 'NumPy']

After Combining:

['PHP' 'JS' 'C++Python' 'C#' 'NumPy']
