Pranav Data Science Lab
LAB MANUAL
Subject Code: ACTDEDSE001P
Subject Name: DATA SCIENCE LAB
Semester: VI Sem, III Year
Session: Dec - May 2023
Write the code to sort an array in NumPy by the nth column, and then write the code to plot a histogram.
import numpy as np
# Example array
a = np.array([[9, 2, 3], [4, 5, 6], [7, 0, 5]])
# Sort rows by the nth column (here n = 1, the second column)
sorted_array = a[a[:, 1].argsort()]
print(sorted_array)
The output will be:
[[7 0 5]
[9 2 3]
[4 5 6]]
Plotting a Histogram using Matplotlib: To create a basic histogram using Matplotlib, you can use the following code snippet. Here, we generate some random data and visualize its distribution:
Python
import matplotlib.pyplot as plt
import numpy as np
# Generate random data (you can replace this with your own data)
data = np.random.randn(1000)
# Plot the distribution as a histogram with 30 bins
plt.hist(data, bins=30)
plt.title("Histogram")
plt.show()
In order to make these copies, we use the copy module. copy() returns a shallow copy of the list, and deepcopy() returns a deep copy of the list. As the example below shows, both copies have the same value as the original but different IDs.
Syntax: copy.copy(x)
Syntax: copy.deepcopy(x)
Example: This code showcases the usage of the copy module to create both
shallow and deep copies of a nested list li1. A shallow copy, li2, is created using
copy.copy(), preserving the top-level structure but sharing references to the inner
lists. A deep copy, li3, is created using copy.deepcopy(), resulting in a
completely independent copy of li1, including all nested elements. The code
prints the IDs and values of li2 and li3, highlighting the distinction between
shallow and deep copies in terms of reference and independence.
import copy
li1 = [1, 2, [3, 5], 4]  # original nested list
li2 = copy.copy(li1)      # shallow copy
li3 = copy.deepcopy(li1)  # deep copy
print("li2 ID:", id(li2), "Value:", li2)
print("li3 ID:", id(li3), "Value:", li3)
Output:
A deep copy constructs a new compound object and then recursively populates it with copies of the child objects found in the original. Because the copy is fully independent of the original, any changes made to the copy do not reflect in the original object.
Example: As shown below, a change made to the deep copy does not affect the original list, indicating the list is deeply copied.
This code illustrates deep copying of a list with nested elements using the copy
module. It initially prints the original elements of li1, then deep copies them to
create li2. A modification to an element in li2 does not affect li1, as
demonstrated by the separate printouts. This highlights how deep copying creates
an independent copy, preserving the original list’s contents even after changes to
the copy.
import copy
li1 = [1, 2, [3, 5], 4]
li2 = copy.deepcopy(li1)
# Print the original elements
for i in range(0, len(li1)):
    print(li1[i], end=" ")
print()
# Change a nested element in the deep copy
li2[2][0] = 7
# li1 is unaffected by the change
for i in range(0, len(li1)):
    print(li1[i], end=" ")
print()
Output:
1 2 [3, 5] 4
1 2 [3, 5] 4
Certainly! To sort a DataFrame in Python in descending order, you can use the
sort_values() method from the pandas library. Let’s assume you have a
DataFrame called df with columns named 'one', 'two', and 'letter'. You want
to sort it based on the 'one' column in descending order. Here’s how you can do it:
Python
import pandas as pd
# Sample DataFrame
df = pd.DataFrame({'one': [3, 1, 2], 'two': [6, 4, 5], 'letter': ['a', 'b', 'c']})
# Sort by the 'one' column in descending order
df_sorted = df.sort_values(by='one', ascending=False)
print(df_sorted)
OUTPUT:
   one  two letter
0    3    6      a
2    2    5      c
1    1    4      b
Remember, a scatter matrix provides a visual overview of how your variables interact, making it a powerful tool for exploratory data analysis; a minimal sketch follows.
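This sketch assumes a small, hypothetical numeric DataFrame and uses pandas.plotting.scatter_matrix to draw the pairwise plots:
Python
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix
# Hypothetical numeric DataFrame
df = pd.DataFrame({'one': [3, 1, 2, 5], 'two': [6, 4, 5, 2], 'three': [1, 8, 3, 7]})
# Pairwise scatter plots, with histograms on the diagonal
scatter_matrix(df, diagonal='hist')
plt.show()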
Experiment 5
Certainly! You can create a histogram directly from a Pandas DataFrame without
explicitly calling Matplotlib. Here are a couple of approaches:
1. Using DataFrame.hist(): Pandas can plot a histogram directly from the DataFrame. The full signature is:
Python
df.hist(column=None, by=None, grid=True, xlabelsize=None, xrot=None,
ylabelsize=None, yrot=None, ax=None, sharex=False, sharey=False, figsize=None,
layout=None, bins=10, backend=None, legend=False, **kwargs)
You can specify the column you want to create a histogram for, and Pandas will handle the plotting internally, as the sketch below shows.
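This is a minimal sketch on hypothetical data (the column name total_bill is an assumption for illustration):
Python
import pandas as pd
import matplotlib.pyplot as plt
# Hypothetical data
df = pd.DataFrame({'total_bill': [16.99, 10.34, 21.01, 23.68, 24.59, 25.29]})
# Histogram of one column, plotted by Pandas itself
df.hist(column='total_bill', bins=5)
plt.show()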
2. Using numpy.histogram(): If you need more control or want to retrieve the histogram values without actually plotting, you can use numpy.histogram(). This function computes the histogram and returns the bin counts and bin edges. Here's an example:
Python
import numpy as np

def function_hist(a, ini, final, num_bins=9, weights_a=None):
    # Evenly spaced bin edges between ini and final
    bins = np.linspace(ini, final, num_bins + 1)
    # Compute histogram counts and bin edges
    hist, bin_edges = np.histogram(np.array(a), bins, weights=weights_a)
    return hist, bin_edges

# Usage
your_data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
hist_values, bin_edges = function_hist(your_data, ini=1, final=10)
print(hist_values)
print(bin_edges)
Experiment 6
Study of various types of Charts for Data Visualization using Python Programming
Matplotlib
Matplotlib is an easy-to-use, low-level data visualization library that is
built on NumPy arrays. It consists of various plots like scatter plot, line
plot, histogram, etc. Matplotlib provides a lot of flexibility.
To install it, type the below command in the terminal:
pip install matplotlib
Scatter Plot
Scatter plots are used to observe relationships between variables, using dots to represent individual data points. The scatter() method in the matplotlib library is used to draw a scatter plot.
import pandas as pd
import matplotlib.pyplot as plt
# tips.csv is assumed to be in the working directory
data = pd.read_csv("tips.csv")
# Scatter plot of day vs. tip
plt.scatter(data['day'], data['tip'])
plt.title("Scatter Plot")
plt.show()
Output:
Line Chart
A line chart is used to represent a relationship between two sets of data, X and Y, on different axes. It is plotted using the plot() function. Let's see the below example.
import pandas as pd
import matplotlib.pyplot as plt
# tips.csv is assumed to be in the working directory
data = pd.read_csv("tips.csv")
# Line chart of the tip values in row order
plt.plot(data['tip'])
plt.title("Line Chart")
plt.show()
Output:
Bar Chart
A bar plot or bar chart is a graph that represents a category of data with rectangular bars whose lengths and heights are proportional to the values they represent. It can be created using the bar() method.
import pandas as pd
import matplotlib.pyplot as plt
# tips.csv is assumed to be in the working directory
data = pd.read_csv("tips.csv")
# Bar chart of day vs. tip
plt.bar(data['day'], data['tip'])
plt.title("Bar Chart")
plt.show()
# Histogram of total_bills
plt.hist(data['total_bill'])
plt.title("Histogram")
plt.show()
Output:
Experiment 7
Using the statistics module: Python's built-in statistics module provides a mean() function:
Python
import statistics
data = [1, 3, 4, 5, 7, 9, 2]
mean_value = statistics.mean(data)
print(f"Mean is: {mean_value:.6f}")
Output:
Mean is: 4.428571
Using a Custom Function: If you prefer not to use external libraries, you can create
your own function to calculate the mean. Here’s a simple implementation:
Python
def calculate_mean(sample):
    return sum(sample) / len(sample)

data_sample = [4, 8, 6, 5, 3, 2, 8, 9, 2, 5]
mean_result = calculate_mean(data_sample)
print(f"Mean is: {mean_result:.1f}")
Output:
Mean is: 5.2
Using NumPy (for larger datasets): If you’re working with large datasets, consider
using the NumPy library. It provides efficient array operations, including mean
calculation:
Python
import numpy as np
array_data = np.array([0, 1, 2, 3, 4, 5, 6, 7])
mean_value_np = np.mean(array_data)
print(f"Mean using NumPy: {mean_value_np:.2f}")
Output:
Mean using NumPy: 3.5
Experiment 8
Let’s create a Python program to calculate the mode of a given dataset. The mode
represents the most frequently occurring value in the dataset.
Python
from statistics import mode, StatisticsError

def calculate_mode(data):
    """
    Calculates the mode of a list of numeric data.

    Args:
        data (list): A list of numeric values.

    Returns:
        float or str: The mode value in the dataset.
    """
    try:
        # Calculate the mode using the statistics module's mode() function
        result = mode(data)
        return result
    except StatisticsError:
        return "No unique mode found in the dataset."
# Example usage
sample_data = [4, 1, 2, 2, 3, 5]
mode_result = calculate_mode(sample_data)
print(f"The mode of the dataset is: {mode_result}")
OUTPUT:
The mode of the dataset is: 2
In this program:
- calculate_mode() wraps the statistics module's mode() function, which returns the most frequent value in the list.
- A StatisticsError, raised on older Python versions when there is no unique mode, is caught and reported.
print("Correlation Matrix:")
print(correlation_matrix_df)
OUTPUT:
Correlation Matrix:
Variable1 Variable2
Visualization: To visualize correlations, you can create scatter plots, regression lines, or heatmaps using Matplotlib, as in the sketch below.
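A minimal heatmap sketch with Matplotlib's matshow(), assuming the correlation_matrix_df computed above:
Python
import matplotlib.pyplot as plt
# Heatmap of the correlation matrix
plt.matshow(correlation_matrix_df, cmap='coolwarm')
plt.colorbar()
plt.xticks(range(len(correlation_matrix_df.columns)), correlation_matrix_df.columns)
plt.yticks(range(len(correlation_matrix_df.columns)), correlation_matrix_df.columns)
plt.show()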
Experiment 11
Certainly! Below is a simple Python program using the `scikit-learn` library to perform linear
regression:
```python
import numpy as np
from sklearn.linear_model import LinearRegression
# Sample data
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 3, 5, 7, 9])
# Create and fit the linear regression model
model = LinearRegression()
model.fit(X, y)
# Coefficients
slope = model.coef_[0]
intercept = model.intercept_
print("Slope:", slope)
print("Intercept:", intercept)
# Predictions
y_pred = model.predict(X)
print("Predictions:", y_pred)
```
OUTPUT:
Slope: 1.8
Intercept: -0.20000000000000018
Predictions: [1.6 3.4 5.2 7. 8.8]
In this program:
- We first import the necessary libraries: `numpy` for numerical operations and
`LinearRegression` from `sklearn.linear_model` for linear regression.
- We define our sample data `X` and `y`, where `X` is a 2D array representing the features
and `y` is a 1D array representing the target variable.
- We create an instance of the `LinearRegression` model and fit it to our data using the `fit()`
method.
- We print the coefficients (slope and intercept) of the regression line using the `coef_` and
`intercept_` attributes of the model.
- Finally, we use the model to make predictions on the input data `X` and print the predicted
values.
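To visualize the fit, here is a minimal Matplotlib sketch plotting the data points and the fitted line (it assumes `model`, `X`, and `y` from the program above):
```python
import matplotlib.pyplot as plt

# Data points and the fitted regression line
plt.scatter(X, y, label="data")
plt.plot(X, model.predict(X), color="red", label="fitted line")
plt.legend()
plt.show()
```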
Experiment 12
Certainly! Here's a Python program to calculate linear regression using the least squares
method without using any external libraries:
```python
# Sample data
X = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 9]
# Means of X and y
mean_x = sum(X) / len(X)
mean_y = sum(y) / len(y)
# Least squares estimates of the slope and intercept
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(X, y)) / sum((xi - mean_x) ** 2 for xi in X)
intercept = mean_y - slope * mean_x
print("Slope:", slope)
print("Intercept:", intercept)
# Predictions
y_pred = [slope * x + intercept for x in X]
print("Predictions:", y_pred)
```
OUTPUT:
Slope: 1.8
Intercept: -0.20000000000000018
Predictions: [1.5999999999999999, 3.4, 5.2, 7.0, 8.8]
In this program:
- We define our sample data `X` and `y`.
- We calculate the mean of `X` and `y`.
- We use the least squares method to calculate the slope and y-intercept of the regression line.
- We print the slope and y-intercept.
- We make predictions for the target variable `y` based on the calculated slope and
y-intercept.
You can modify the `X` and `y` lists with your own data to perform linear regression on your
dataset. Keep in mind that this implementation is a basic example and may not be as efficient
or robust as using external libraries like `numpy` or `scikit-learn` for more complex tasks.
Experiment 13
```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
# Load the Iris dataset
X, y = load_iris(return_X_y=True)
# Split into training and testing sets (random_state fixed for reproducibility)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create and fit the logistic regression model
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)
# Make predictions on the test set
y_pred = model.predict(X_test)
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
print("Classification Report:")
print(classification_report(y_test, y_pred))
```
OUTPUT:
Accuracy: 1.0
Classification Report:
precision recall f1-score support
accuracy 1.00 30
macro avg 1.00 1.00 1.00 30
weighted avg 1.00 1.00 1.00 30
In this program:
- We first import necessary libraries from `scikit-learn`.
- We load the Iris dataset using `load_iris()` function from `sklearn.datasets`.
- We split the data into training and testing sets using `train_test_split()` function.
- We create an instance of the Logistic Regression model and fit it to the training data using
the `fit()` method.
- We make predictions on the test set using the `predict()` method.
- We calculate the accuracy of the model using `accuracy_score()` function from
`sklearn.metrics`.
- We print the classification report which includes precision, recall, f1-score, and support for
each class using `classification_report()` function from `sklearn.metrics`.
You can replace the `X` and `y` with your own dataset to perform logistic regression on your
data. This example uses the Iris dataset as a demonstration. Make sure your data is properly
preprocessed before applying logistic regression.
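A common preprocessing step is feature scaling. Here is a minimal sketch with scikit-learn's `StandardScaler` (not required for this Iris demo; shown only as an illustration, reusing `X_train` and `X_test` from above):
```python
from sklearn.preprocessing import StandardScaler

# Fit the scaler on training data only, then apply the same transform to both splits
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```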
Experiment 14
Implementing a vector space model of information retrieval involves several steps, including
preprocessing the text data, constructing a document-term matrix, calculating the TF-IDF
(Term Frequency-Inverse Document Frequency) weights, and performing similarity
calculations between documents and queries.
```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
# Sample documents
documents = [
"This is the first document.",
"This document is the second document.",
"And this is the third one.",
"Is this the first document?",
]
# Sample query
query = "This is the second document."
# Create the TF-IDF vectorizer
vectorizer = TfidfVectorizer()
# Fit the vectorizer on the documents and transform them into a document-term matrix
tfidf_matrix = vectorizer.fit_transform(documents)
# Transform the query into the same TF-IDF vector space
query_vector = vectorizer.transform([query])
# Calculate cosine similarity between the query vector and document vectors
cosine_similarities = cosine_similarity(query_vector, tfidf_matrix)
print("Cosine similarities:", cosine_similarities)
```
OUTPUT:
You can replace the `documents` list and `query` string with your own dataset and query to
perform information retrieval using the vector space model. Make sure your documents are
preprocessed and tokenized appropriately before applying the vectorizer.
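As a minimal preprocessing sketch (the `preprocess()` helper below is a hypothetical illustration, not part of the experiment), lowercasing and stripping punctuation before vectorizing:
```python
import string

def preprocess(text):
    # Lowercase and remove punctuation
    return text.lower().translate(str.maketrans("", "", string.punctuation))

docs_clean = [preprocess(d) for d in ["This is the first document.", "Is this the FIRST document?"]]
print(docs_clean)
```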
Experiment 15
Sure! Here's an example of implementing a decision tree classifier using the popular library
`scikit-learn`:
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report
# Load the Iris dataset
X, y = load_iris(return_X_y=True)
# Split into training and testing sets (random_state fixed for reproducibility)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create and fit the decision tree classifier
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)
# Make predictions on the test data
y_pred = clf.predict(X_test)
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
print("Classification Report:")
print(classification_report(y_test, y_pred))
```
OUTPUT:
Accuracy: 1.0
Classification Report:
precision recall f1-score support
accuracy 1.00 30
macro avg 1.00 1.00 1.00 30
weighted avg 1.00 1.00 1.00 30
In this example:
- We first import necessary modules from `scikit-learn`.
- We load the Iris dataset using `load_iris()` function from `sklearn.datasets`.
- We split the data into training and testing sets using `train_test_split()` function from
`sklearn.model_selection`.
- We create an instance of the Decision Tree classifier using `DecisionTreeClassifier()` class.
- We fit the classifier to the training data using the `fit()` method.
- We make predictions on the test data using the `predict()` method.
- We calculate the accuracy of the model using `accuracy_score()` function from
`sklearn.metrics`.
- We print the classification report which includes precision, recall, f1-score, and support for
each class using `classification_report()` function from `sklearn.metrics`.
You can replace the Iris dataset with your own dataset and adjust the parameters of the
decision tree classifier as needed. Additionally, you can visualize the decision tree using tools
like `graphviz` to gain insights into the decision-making process of the model.
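For a quick look without graphviz, scikit-learn's own `plot_tree()` can render the fitted tree; a minimal sketch, assuming the `clf` fitted above:
```python
import matplotlib.pyplot as plt
from sklearn.tree import plot_tree

# Render the fitted decision tree with colored (filled) nodes
plt.figure(figsize=(10, 6))
plot_tree(clf, filled=True)
plt.show()
```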
Experiment 16
```python
# Importing required libraries
import numpy as np
from scipy.stats import binom, poisson, norm
# Probability of exactly k successes in n Bernoulli trials
def binomial_probability(n, k, p):
    return binom.pmf(k, n, p)

# Probability of exactly k occurrences at average rate lambd
def poisson_probability(k, lambd):
    return poisson.pmf(k, lambd)

# Probability density of x under a normal distribution
def normal_probability(x, mean, std):
    return norm.pdf(x, mean, std)

# Example usage
if __name__ == "__main__":
# Binomial probability
n = 10 # Number of trials
k = 5 # Number of successes
p = 0.5 # Probability of success
print("Binomial Probability:", binomial_probability(n, k, p))
# Poisson probability
k = 2 # Number of occurrences
lambd = 3 # Average rate
print("Poisson Probability:", poisson_probability(k, lambd))
# Normal probability
x = 1 # Value of the random variable
mean = 0 # Mean of the distribution
std = 1 # Standard deviation of the distribution
print("Normal Probability:", normal_probability(x, mean, std))
OUTPUT:
In this program:
- We use functions from the `scipy.stats` module to calculate binomial, Poisson, and normal probabilities; the cumulative variants are sketched after this list.
- Each function takes input parameters specific to the probability distribution being calculated
and returns the corresponding probability value.
- We demonstrate the usage of each function with example inputs.
You can adjust the input parameters and probability distributions as needed for your specific
calculations.
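The cumulative variants mentioned above can be sketched the same way with the distributions' `cdf()` methods:
```python
from scipy.stats import binom, norm

# Cumulative binomial probability: P(X <= 5) for Binomial(n=10, p=0.5)
print("Cumulative Binomial Probability:", binom.cdf(5, 10, 0.5))
# Cumulative normal probability: P(X <= 1) for the standard normal
print("Cumulative Normal Probability:", norm.cdf(1, 0, 1))
```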