Data Science Practical Problems
Ex no : 1 a
Write a NumPy program to create a null vector of size 10 and update the sixth value to 11
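Program :
import numpy as np
# Create a vector of ten zeros
null_vector = np.zeros(10)
# The sixth value sits at index 5 because indexing starts at 0
null_vector[5] = 11
print(null_vector)
Output:
[ 0.  0.  0.  0.  0. 11.  0.  0.  0.  0.]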
Ex no : 1 b
Write a NumPy program to convert an array to a float type
Program :
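import numpy as np
# Create an example array (you can replace this with your own array)
integer_array = np.array([1, 2, 3, 4, 5])
float_array = integer_array.astype(float)
print("Original array:", integer_array)
print("Float array:", float_array)
Output:
Original array: [1 2 3 4 5]
Float array: [1. 2. 3. 4. 5.]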
Ex no : 1 c
Write a NumPy program to create a 3x3 matrix with values ranging from 2 to 10
Program :
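import numpy as np
# np.arange excludes the stop value, so a stop of 11 yields the values 2 through 10
values_array = np.arange(2, 11)
matrix_3x3 = values_array.reshape(3, 3)
print(matrix_3x3)
Output :
[[ 2  3  4]
 [ 5  6  7]
 [ 8  9 10]]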
Ex no : 1 d
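Write a NumPy program to convert a list of numeric values into a one-dimensional NumPy array
Program :
import numpy as np
numeric_list = [1, 2, 3, 4, 5]
numpy_array = np.array(numeric_list)
print("Original list:", numeric_list)
print("One-dimensional NumPy array:", numpy_array)
Output :
Original list: [1, 2, 3, 4, 5]
One-dimensional NumPy array: [1 2 3 4 5]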
Ex no : 2 a
Write a NumPy program to convert an array to a float type
Program :
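import numpy as np
# Create an example array (you can replace this with your own array)
original_array = np.array([1, 2, 3, 4, 5])
float_array = original_array.astype(float)
print("Original array:", original_array)
print("Array converted to float type:", float_array)
Output :
Original array: [1 2 3 4 5]
Array converted to float type: [1. 2. 3. 4. 5.]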
Ex no : 2 b
Write a NumPy program to create an empty and a full array
Program :
import numpy as np
# np.empty allocates memory without initializing it, so the printed values are arbitrary
empty_array = np.empty((3, 3)) # Specify the shape of the empty array (3x3 in this case)
full_array = np.full((2, 4), 7) # Specify the shape and the value (2x4 array with value 7)
print("Empty Array:")
print(empty_array)
print("Full Array with Value 7:")
print(full_array)
Output :
Empty Array:
[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
Full Array with Value 7:
[[7 7 7 7]
 [7 7 7 7]]
Ex no : 2 c
Write a NumPy program to convert a list and a tuple into arrays
Program :
import numpy as np
list_values = [1, 2, 3, 4, 5]
tuple_values = (6, 7, 8, 9, 10)
array_from_list = np.array(list_values)
array_from_tuple = np.array(tuple_values)
print("List to Array:")
print(array_from_list)
print("\nTuple to Array:")
print(array_from_tuple)
Output :
List to Array:
[1 2 3 4 5]
Tuple to Array:
[ 6  7  8  9 10]
Ex no : 2 d
Write a NumPy program to find the real and imaginary parts of an array of complex numbers
Program :
import numpy as np
complex_array = np.array([1 + 2j, 3 - 4j, 5 + 6j])
real_parts = np.real(complex_array)
imaginary_parts = np.imag(complex_array)
print("Original Array:")
print(complex_array)
print("\nReal Parts:")
print(real_parts)
print("\nImaginary Parts:")
print(imaginary_parts)
Output :
Original Array:
[1.+2.j 3.-4.j 5.+6.j]
Real Parts:
[1. 3. 5.]
Imaginary Parts:
[ 2. -4.  6.]
Ex no : 3
Write a Pandas program to get the powers of an array values element-wise.
Note: First array elements raised to powers from second array
Sample data: {'X':[78,85,96,80,86], 'Y':[84,94,89,83,86],'Z':[86,97,96,72,83]}
Expected Output:
    X   Y   Z
0  78  84  86
1  85  94  97
2  96  89  96
3  80  83  72
4  86  86  83
Program :
import pandas as pd
# Sample data
data = {'X': [78, 85, 96, 80, 86], 'Y': [84, 94, 89, 83, 86], 'Z': [86, 97, 96, 72, 83]}
df = pd.DataFrame(data)
# The expected output is the DataFrame itself, so it is printed directly
print(df)
Output :
    X   Y   Z
0  78  84  86
1  85  94  97
2  96  89  96
3  80  83  72
4  86  86  83
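The note in this exercise describes raising the elements of one array to the powers held in another, which the printed table does not show. A minimal sketch of that operation, assuming `np.power` and small illustrative values (the column names `bases` and `exponents` are not from the exercise; the sample data itself would overflow 64-bit integers):

```python
import numpy as np
import pandas as pd

# Illustrative values only, chosen so the results stay small
df = pd.DataFrame({'bases': [2, 3, 4], 'exponents': [3, 2, 1]})

# Raise each element of the first column to the power in the second column
df['result'] = np.power(df['bases'], df['exponents'])
print(df)
```

The same call works on plain NumPy arrays or on two columns of the sample data once they are converted to floats.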
Ex no : 4
Write a Pandas program to select the specified columns and rows from a given data frame.
Sample Python dictionary data and list labels:
Select the 'score' and 'qualify' columns in rows 1, 3, 5 and 6 from the following data frame.
exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew',
'Laura', 'Kevin', 'Jonas'],
'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
Expected Output:
Select specific columns and rows:
   score qualify
b    9.0      no
d    NaN      no
f   20.0     yes
g   14.5     yes
Program :
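import numpy as np
import pandas as pd
# Sample data
exam_data = {
    'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin',
             'Jonas'],
    'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
    'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
    'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']
}
df = pd.DataFrame(exam_data, index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'])
# Rows at positions 1, 3, 5, 6; columns at positions 1 ('score') and 3 ('qualify')
selected_data = df.iloc[[1, 3, 5, 6], [1, 3]]
print(selected_data)
Output :
   score qualify
b    9.0      no
d    NaN      no
f   20.0     yes
g   14.5     yes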
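Rows and columns can be picked either by integer position (`iloc`) or by label (`loc`). A minimal sketch on the exercise's data, showing that both routes give the selection in the expected output; the names `by_position` and `by_label` are illustrative:

```python
import numpy as np
import pandas as pd

exam_data = {
    'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily',
             'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
    'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
    'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
    'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes'],
}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(exam_data, index=labels)

# Select by integer position: rows 1, 3, 5, 6 and columns 1, 3
by_position = df.iloc[[1, 3, 5, 6], [1, 3]]
# Select the same cells by row and column labels
by_label = df.loc[['b', 'd', 'f', 'g'], ['score', 'qualify']]

print(by_label)
print("Same result:", by_label.equals(by_position))
```

Label-based selection tends to be easier to read and keeps working if the column order changes.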
Ex no : 5
Write a Pandas program to count the number of rows and columns of a DataFrame. Sample
Python dictionary data and list labels:
exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew',
'Laura', 'Kevin', 'Jonas'],
'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
Expected Output:
Number of Rows: 10
Number of Columns: 4
Program :
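import numpy as np
import pandas as pd
# Sample data
exam_data = {
    'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin',
             'Jonas'],
    'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
    'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
    'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']
}
df = pd.DataFrame(exam_data, index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'])
# df.shape is a (rows, columns) tuple
num_rows, num_columns = df.shape
print("Number of Rows:", num_rows)
print("Number of Columns:", num_columns)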
Output :
Number of Rows: 10
Number of Columns: 4
Ex no : 6
Reading data from text files, Excel and the web and exploring various commands for doing
descriptive analytics on the Iris data set
(In Record )
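As a rough sketch of the kind of commands this exercise calls for, assuming pandas as the reader and four illustrative rows standing in for the Iris file (an in-memory text buffer is used so the example runs without any file or network access):

```python
import io
import pandas as pd

# Four illustrative rows standing in for an Iris text file (not the full data set)
text_file = io.StringIO(
    "sepal_length,sepal_width,petal_length,petal_width,species\n"
    "5.1,3.5,1.4,0.2,setosa\n"
    "4.9,3.0,1.4,0.2,setosa\n"
    "7.0,3.2,4.7,1.4,versicolor\n"
    "6.3,3.3,6.0,2.5,virginica\n"
)
# pd.read_csv also accepts a file path or a URL; pd.read_excel reads Excel files
iris = pd.read_csv(text_file)

print(iris.head())                     # first rows
print(iris.shape)                      # (rows, columns)
print(iris.describe())                 # count, mean, std, min, quartiles, max
print(iris["species"].value_counts())  # frequency of each class
```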
Ex no : 7
Use the Pima Indians Diabetes data set to perform the following univariate analysis:
Frequency
Mean
Median
Mode
Variance
Standard Deviation
Skewness and Kurtosis
Program :
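import pandas as pd
from scipy.stats import skew, kurtosis
# The file is assumed to have no header row, so the column names are supplied explicitly
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data"
column_names = ['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin',
                'BMI', 'DiabetesPedigreeFunction', 'Age', 'Outcome']
diabetes_data = pd.read_csv(url, names=column_names)
print("Dataset Head:")
print(diabetes_data.head())
# Univariate Analysis
for column in diabetes_data.columns:
    print("\nColumn:", column)
    print("Frequency:\n", diabetes_data[column].value_counts())
    print("Mean:", diabetes_data[column].mean())
    print("Median:", diabetes_data[column].median())
    print("Mode:", diabetes_data[column].mode().values)
    print("Variance:", diabetes_data[column].var())
    print("Standard Deviation:", diabetes_data[column].std())
    print("Skewness:", skew(diabetes_data[column]))
    print("Kurtosis:", kurtosis(diabetes_data[column]))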
Output:
Dataset Head:
1 1 85 66 29 0 26.6 0.351 31 0
3 1 89 66 23 94 28.1 0.167 21 0
Column: Pregnancies
Frequency:
1 135
0 111
2 103
3 75
4 68
5 57
6 50
7 45
8 38
9 28
10 24
11 11
13 10
12 9
14 2
15 1
17 1
Mean: 3.8450520833333335
Median: 3.0
Mode: [1]
Variance: 11.35405632062147
Kurtosis: 0.1592197711542494
...
Column: Outcome
Frequency:
0 500
1 268
Mean: 0.3489583333333333
Median: 0.0
Mode: [0]
Variance: 0.22850161570824634
Skewness: 0.6350166433325007
Kurtosis: -1.601715582922407
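The calls `skew(...)` and `kurtosis(...)` in the program match the functions of those names in `scipy.stats`, and pandas offers `Series.skew()` and `Series.kurt()` as well. The two libraries can report slightly different numbers, because SciPy's defaults are the biased estimators while pandas applies a sample-size correction. A small sketch on made-up values (the `sample` series is illustrative and not taken from the data set):

```python
import numpy as np
import pandas as pd
from scipy.stats import skew, kurtosis

# Made-up values for illustration only
sample = pd.Series([0, 1, 1, 2, 3, 5, 8, 13])

biased_skew = skew(sample)                 # SciPy default: bias=True
corrected_skew = skew(sample, bias=False)  # bias-corrected estimator
print("SciPy skew (default):", biased_skew)
print("SciPy skew (bias=False):", corrected_skew)
print("pandas skew:", sample.skew())

corrected_kurt = kurtosis(sample, bias=False)  # excess kurtosis, bias-corrected
print("SciPy kurtosis (bias=False):", corrected_kurt)
print("pandas kurt:", sample.kurt())
```

When comparing results across tools, it helps to state which estimator was used.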
Ex no : 8
Use the Pima Indians Diabetes data set to perform the following bivariate analysis:
a) Linear regression modeling
b) Logistic regression modeling
Program :
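import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import confusion_matrix, classification_report
# The file is assumed to have no header row, so the column names are supplied explicitly
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data"
column_names = ['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin',
                'BMI', 'DiabetesPedigreeFunction', 'Age', 'Outcome']
diabetes_data = pd.read_csv(url, names=column_names)
X = diabetes_data.drop("Outcome", axis=1)
y = diabetes_data["Outcome"]
# Assumed split settings (20% test set, fixed seed)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Linear Regression
linear_model = LinearRegression()
linear_model.fit(X_train, y_train)
for feature, coef in zip(X.columns, linear_model.coef_):
    print(f"{feature}: {coef}")
print("Intercept:", linear_model.intercept_)
linear_predictions = linear_model.predict(X_test)
print(linear_predictions[:10])
# Logistic Regression (max_iter is raised so the solver has room to converge)
logistic_model = LogisticRegression(max_iter=1000)
logistic_model.fit(X_train, y_train)
logistic_predictions = logistic_model.predict(X_test)
conf_matrix = confusion_matrix(y_test, logistic_predictions)
classification_rep = classification_report(y_test, logistic_predictions)
print("\nConfusion Matrix:")
print(conf_matrix)
print("\nClassification Report:")
print(classification_rep)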
Output:
Pregnancies: 0.0208
Glucose: 0.0056
BloodPressure: -0.0032
SkinThickness: 0.0001
Insulin: -0.0002
BMI: 0.0124
DiabetesPedigreeFunction: 0.1472
Age: 0.0051
Intercept: -0.8254
[ 0.3216 0.2154 0.7811 0.1891 0.4727 0.2375 0.6484 0.4686 0.6511 0.5670]
Confusion Matrix:
[[89 14]
[24 27]]
Classification Report:
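The headline figures of a classification report follow directly from the confusion matrix. A short sketch of that arithmetic, assuming scikit-learn's layout (rows are actual classes, columns are predicted classes) and using the matrix printed in the output above:

```python
import numpy as np

# Confusion matrix from the output above: rows = actual, columns = predicted
conf_matrix = np.array([[89, 14],
                        [24, 27]])
tn, fp, fn, tp = conf_matrix.ravel()

accuracy = (tp + tn) / conf_matrix.sum()  # share of correct predictions
precision = tp / (tp + fp)                # of predicted positives, how many are right
recall = tp / (tp + fn)                   # of actual positives, how many are found
f1 = 2 * precision * recall / (precision + recall)

print(f"Accuracy: {accuracy:.2f}")
print(f"Precision (class 1): {precision:.2f}")
print(f"Recall (class 1): {recall:.2f}")
print(f"F1 (class 1): {f1:.2f}")
```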
Ex no : 9
Use the Pima Indians Diabetes data set to perform the following:
Multiple regression analysis
Program :
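import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
# Load the dataset (replace 'diabetes.csv' with the actual file name)
data = pd.read_csv('C:/Users/Student/Downloads/diabetes.csv')
# Assumed setup: every other column predicts 'Outcome', with a 20% test split
X = data.drop('Outcome', axis=1)
y = data['Outcome']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
r2 = r2_score(y_test, y_pred)
print(f"R-squared: {r2:.2f}")
Output :
R-squared: 0.20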
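`r2_score` reports the share of variance in the target that the model explains: R-squared = 1 - SS_res / SS_tot. A minimal sketch that checks this by hand, assuming synthetic data generated for illustration (the arrays below are unrelated to the diabetes file):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic data for illustration: two predictors plus noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=100)

model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

ss_res = np.sum((y - y_pred) ** 2)    # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares
manual_r2 = 1 - ss_res / ss_tot

print(f"Manual R-squared: {manual_r2:.4f}")
print(f"r2_score:         {r2_score(y, y_pred):.4f}")
```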
Ex no : 10
Apply and explore various plotting functions on UCI data set for performing the following:
a) Normal values
b) Density and contour plots
c) Three-dimensional plotting
Program :
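import matplotlib.pyplot as plt
import seaborn as sns
# Assumption: the Iris data set bundled with seaborn stands in for the UCI data set
iris = sns.load_dataset("iris")
sns.set(style="whitegrid")
# a) Normal values: histogram with a density curve (the feature is an assumed choice)
sns.histplot(iris["sepal_length"], kde=True)
plt.title("Distribution of Sepal Length")
plt.tight_layout()
plt.show()
# b) Density and contour plots
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
sns.kdeplot(data=iris, x="sepal_length", fill=True)
plt.title("Density Plot")
plt.subplot(1, 2, 2)
sns.kdeplot(data=iris, x="sepal_length", y="petal_length")
plt.title("Contour Plot")
plt.show()
# Three-dimensional plotting
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(iris["sepal_length"], iris["petal_length"], iris["petal_width"])
ax.set_xlabel('Sepal Length')
ax.set_ylabel('Petal Length')
ax.set_zlabel('Petal Width')
ax.set_title('Three-dimensional Plot')
plt.show()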
Output :
Ex no : 11
Apply and explore various plotting functions on UCI data set for performing the following:
a) Correlation and scatter plots
b) Histograms
c) Three-dimensional plotting
Program :
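import matplotlib.pyplot as plt
import seaborn as sns
iris = sns.load_dataset("iris")
sns.set(style="ticks")
# a) Correlation and scatter plots (a pair plot colored by species is an assumed choice)
print(iris.drop(columns="species").corr())
sns.pairplot(iris, hue="species")
plt.show()
# b) Histograms (the three features are an assumed choice)
plt.figure(figsize=(12, 6))
plt.subplot(1, 3, 1)
sns.histplot(iris["sepal_length"])
plt.subplot(1, 3, 2)
sns.histplot(iris["petal_length"])
plt.subplot(1, 3, 3)
sns.histplot(iris["petal_width"])
plt.suptitle("Histograms")
plt.show()
# c) Three-dimensional plotting
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(iris["sepal_length"], iris["petal_length"], iris["petal_width"])
ax.set_xlabel('Sepal Length')
ax.set_ylabel('Petal Length')
ax.set_zlabel('Petal Width')
ax.set_title('Three-dimensional Plot')
plt.show()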
Output :
Ex no : 12
Apply and explore various plotting functions on Pima Indians Diabetes data set for
performing the following:
a) Normal values
b) Density and contour plots
c) Three-dimensional plotting
Program :
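import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Load the Pima Indians Diabetes dataset (replace the path below with the actual path)
diabetes_path = "C:/Users/Student/Downloads/diabetes.csv"
diabetes_df = pd.read_csv(diabetes_path)
# a) Normal values: histograms with density curves (the two features are an assumed choice)
plt.figure(figsize=(12, 6))
sns.set(style="whitegrid")
plt.subplot(1, 2, 1)
sns.histplot(diabetes_df["Glucose"], kde=True)
plt.subplot(1, 2, 2)
sns.histplot(diabetes_df["BMI"], kde=True)
plt.show()
# b) Density and contour plots
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
sns.kdeplot(data=diabetes_df, x="Glucose", fill=True)
plt.subplot(1, 2, 2)
sns.kdeplot(data=diabetes_df, x="Glucose", y="BMI")
plt.show()
# c) Three-dimensional Plotting
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection='3d')
colors = {0: 'red', 1: 'green'} # Assuming Outcome 0 as red and Outcome 1 as green
ax.scatter(diabetes_df['Glucose'], diabetes_df['BMI'], diabetes_df['Age'],
           c=diabetes_df['Outcome'].map(colors))
ax.set_xlabel('Glucose')
ax.set_ylabel('BMI')
ax.set_zlabel('Age')
ax.set_title('Three-dimensional Plot')
plt.show()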
Output :
Ex no : 13
Apply and explore various plotting functions on Pima Indians Diabetes data set for
performing the following:
a) Correlation and scatter plots
b) Histograms
c) Three-dimensional plotting
Program :
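import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Load the Pima Indians Diabetes dataset (replace 'path/to/diabetes.csv' with the actual path)
diabetes_path = "C:/Users/Student/Downloads/diabetes.csv" # Replace with the actual path
diabetes_df = pd.read_csv(diabetes_path)
# a) Correlation and scatter plots (heatmap and pair plot are assumed choices)
plt.figure(figsize=(12, 8))
correlation_matrix = diabetes_df.corr()
sns.heatmap(correlation_matrix, annot=True, cmap="coolwarm")
plt.title("Correlation Matrix")
plt.show()
sns.pairplot(diabetes_df[["Glucose", "BMI", "Age", "Insulin", "Outcome"]], hue="Outcome")
plt.suptitle("Scatter Plots")
plt.show()
# b) Histograms
plt.figure(figsize=(12, 6))
plt.subplot(2, 2, 1)
sns.histplot(diabetes_df["Glucose"])
plt.title("Glucose Histogram")
plt.subplot(2, 2, 2)
sns.histplot(diabetes_df["BMI"])
plt.title("BMI Histogram")
plt.subplot(2, 2, 3)
sns.histplot(diabetes_df["Age"]) # Third feature assumed to be Age
plt.title("Age Histogram")
plt.subplot(2, 2, 4)
sns.histplot(diabetes_df["Insulin"])
plt.title("Insulin Histogram")
plt.suptitle("Histograms")
plt.tight_layout()
plt.show()
# c) Three-dimensional Plotting
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection='3d')
colors = {0: 'red', 1: 'green'} # Assuming Outcome 0 as red and Outcome 1 as green
ax.scatter(diabetes_df['Glucose'], diabetes_df['BMI'], diabetes_df['Age'],
           c=diabetes_df['Outcome'].map(colors))
ax.set_xlabel('Glucose')
ax.set_ylabel('BMI')
ax.set_zlabel('Age')
ax.set_title('Three-dimensional Plot')
plt.show()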
Output :
Ex no : 14
Write a Pandas program to count number of columns of a DataFrame.
Sample Output:
Original DataFrame
   col1  col2  col3
0     1     4     7
1     2     5     8
2     3     6    12
3     4     9     1
4     7     5    11
Number of columns:
3
Program :
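import pandas as pd
data = {
    'col1': [1, 2, 3, 4, 7],
    'col2': [4, 5, 6, 9, 5],
    'col3': [7, 8, 12, 1, 11]
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
# df.shape is (rows, columns), so index 1 holds the column count
num_columns = df.shape[1]
print("Number of columns:")
print(num_columns)
Output :
Original DataFrame:
   col1  col2  col3
0     1     4     7
1     2     5     8
2     3     6    12
3     4     9     1
4     7     5    11
Number of columns:
3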
Ex no : 15
Write a Pandas program to group by the first column and get second column as lists in rows
Sample data:
Original DataFrame
  col1  col2
0   C1     1
1   C1     2
2   C2     3
3   C2     3
4   C2     4
5   C3     6
6   C2     5
Group on the col1:
col1
C1          [1, 2]
C2    [3, 3, 4, 5]
C3             [6]
Name: col2, dtype: object
Program :
import pandas as pd
data = {
    'col1': ['C1', 'C1', 'C2', 'C2', 'C2', 'C3', 'C2'],
    'col2': [1, 2, 3, 3, 4, 6, 5]
}
df = pd.DataFrame(data)
# Group by the first column and aggregate the values of the second column as lists
result = df.groupby('col1')['col2'].apply(list)
print("Group on the col1:")
print(result)
Output :
Group on the col1:
col1
C1          [1, 2]
C2    [3, 3, 4, 5]
C3             [6]
Name: col2, dtype: object
Ex no : 16
Write a Pandas program to check whether a given column is present in a DataFrame or not.
Sample data:
Original DataFrame
   col1  col2  col3
0     1     4     7
1     2     5     8
2     3     6    12
3     4     9     1
4     7     5    11
Col4 is not present in DataFrame.
Col1 is present in DataFrame.
Program :
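import pandas as pd
data = {'col1': [1, 2, 3, 4, 7], 'col2': [4, 5, 6, 9, 5], 'col3': [7, 8, 12, 1, 11]}
df = pd.DataFrame(data)
# List of columns to check
columns_to_check = ['col4', 'col1']
# Iterate over the list of columns and check if each column is present in the DataFrame
for col in columns_to_check:
    try:
        df[col]
        print(f"{col.capitalize()} is present in DataFrame.")
    except KeyError:
        print(f"{col.capitalize()} is not present in DataFrame.")
Output :
Col4 is not present in DataFrame.
Col1 is present in DataFrame.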
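The `try`/`except KeyError` pattern works, but a membership test on `df.columns` expresses the same check more directly. A minimal sketch reusing the sample data; the names `columns_to_check` and `messages` are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4, 7],
                   'col2': [4, 5, 6, 9, 5],
                   'col3': [7, 8, 12, 1, 11]})

columns_to_check = ['col4', 'col1']  # illustrative list of names to look up
messages = []
for col in columns_to_check:
    # `in df.columns` tests the column labels without touching the data
    status = "is present" if col in df.columns else "is not present"
    messages.append(f"{col.capitalize()} {status} in DataFrame.")

print("\n".join(messages))
```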
Ex no : 17
Create two arrays of six elements. Write a NumPy program to count the number of instances
of a value occurring in one array on the condition of another array.
Sample Output:
Original arrays:
[ 10 -10 10 -10 -10 10]
[0.85 0.45 0.9 0.8 0.12 0.6 ]
Number of instances of a value occurring in one array on the condition of another array:
3
Program :
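import numpy as np
array1 = np.array([10, -10, 10, -10, -10, 10])
array2 = np.array([0.85, 0.45, 0.9, 0.8, 0.12, 0.6])
print("Original arrays:")
print(array1)
print(array2)
condition = array2 > 0.5 # Condition: values in array2 greater than 0.5
# Count (rather than sum) the positions where array1 equals 10 and the condition holds
num_instances = np.sum((array1 == 10) & condition)
print("Number of instances of a value occurring in one array on the condition of another array:")
print(num_instances)
Output :
Original arrays:
[ 10 -10  10 -10 -10  10]
[0.85 0.45 0.9  0.8  0.12 0.6 ]
Number of instances of a value occurring in one array on the condition of another array:
3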
Ex no : 18
Create a 2-dimensional array of size 2 x 3, composed of 4-byte integer elements. Write a
NumPy program to find the number of occurrences of a sequence in the said array.
Sample Output:
Original NumPy array:
[[1 2 3]
[2 1 2]]
Type: <class 'numpy.ndarray'>
Sequence: 2,3
Number of occurrences of the said sequence: 1
Program :
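import numpy as np
array = np.array([[1, 2, 3], [2, 1, 2]], dtype=np.int32) # 2 x 3 array of 4-byte integers
sequence = np.array([2, 3])
count = 0
for row in array:
    for i in range(len(row) - len(sequence) + 1):
        if np.array_equal(row[i:i+len(sequence)], sequence):
            count += 1
print(array)
print("Type:", type(array))
print("Sequence: 2, 3")
print("Number of occurrences of the said sequence:", count)
Output :
[[1 2 3]
 [2 1 2]]
Type: <class 'numpy.ndarray'>
Sequence: 2, 3
Number of occurrences of the said sequence: 1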
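For larger arrays the row-by-row search can be vectorized. A sketch assuming NumPy 1.20 or later, where `numpy.lib.stride_tricks.sliding_window_view` is available; it counts contiguous matches within each row only, and the names `windows` and `matches` are illustrative:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

array = np.array([[1, 2, 3], [2, 1, 2]], dtype=np.int32)
sequence = np.array([2, 3], dtype=np.int32)

# Every window of len(sequence) along each row: shape (rows, windows per row, len(sequence))
windows = sliding_window_view(array, len(sequence), axis=1)

# A window matches when all of its elements equal the sequence
matches = (windows == sequence).all(axis=-1)
count = int(matches.sum())
print("Number of row-wise occurrences:", count)
```

The window view shares memory with the original array, so no copies of the data are made while searching.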
Ex no : 19
Write a NumPy program to merge three given NumPy arrays of same shape
Program :
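import numpy as np
# Three given 2x3 arrays holding the values 1-6, 7-12 and 13-18
array1, array2, array3 = np.arange(1, 19).reshape(3, 2, 3)
merged_array = np.stack((array1, array2, array3)) # Stack along a new first axis
print("Merged array:")
print(merged_array)
Output :
Merged array:
[[[ 1  2  3]
  [ 4  5  6]]

 [[ 7  8  9]
  [10 11 12]]

 [[13 14 15]
  [16 17 18]]]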
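"Merging" can mean different things in NumPy. A brief sketch contrasting `np.stack`, which adds a new leading axis and gives the shape shown in the output above, with `np.concatenate`, which joins along an existing axis; the arrays `a`, `b` and `c` are illustrative:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
b = a + 6   # values 7 through 12
c = a + 12  # values 13 through 18

stacked = np.stack((a, b, c))       # new axis 0: shape (3, 2, 3)
joined = np.concatenate((a, b, c))  # along existing axis 0: shape (6, 3)

print(stacked.shape)
print(joined.shape)
```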
Ex no : 20
Write a NumPy program to combine last element with first element of two given ndarray with
different shapes.
Sample Output:
Original arrays:
['PHP', 'JS', 'C++']
['Python', 'C#', 'NumPy']
After Combining:
['PHP' 'JS' 'C++Python' 'C#' 'NumPy']
Program :
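import numpy as np
# Original arrays
array1 = ['PHP', 'JS', 'C++']
array2 = ['Python', 'C#', 'NumPy']
# Combine arrays: join the last element of array1 with the first element of array2
combined_array = np.r_[array1[:-1], [array1[-1] + array2[0]], array2[1:]]
print("Original arrays:")
print(array1)
print(array2)
print("After Combining:")
print(combined_array)
Output :
Original arrays:
['PHP', 'JS', 'C++']
['Python', 'C#', 'NumPy']
After Combining:
['PHP' 'JS' 'C++Python' 'C#' 'NumPy']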