CS3361-DATA SCIENCE LAB MANUAL

The document provides a comprehensive overview of various Python packages including NumPy, SciPy, Jupyter, Statsmodels, and Pandas, detailing their features and sample programs. It also includes practical exercises on working with NumPy arrays and Pandas DataFrames, demonstrating data manipulation techniques. Additionally, it covers reading data from text files, Excel, and the web, along with performing descriptive analytics on datasets.


EX.NO:1 INSTALLATION OF PACKAGES

DATE:
AIM:
To download, install and explore the features of the NumPy, SciPy, Jupyter, Statsmodels and
Pandas packages.

NUMPY:
NumPy is a general-purpose array-processing package. It provides a high-performance
multidimensional array object, and tools for working with these arrays. It is the fundamental
package for scientific computing with Python. Besides its obvious scientific uses, NumPy can
also be used as an efficient multi-dimensional container of generic data.

Features:
• High-performance N-dimensional array object.
• It contains tools for integrating code from C/C++ and FORTRAN.
• It contains a multidimensional container for generic data.
• Additional linear algebra, Fourier transform, and random number capabilities.
• It provides broadcasting functions.
• It has data type definition capability to work with varied databases.
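As a minimal sketch of two of the features listed above, broadcasting and explicit data type definition, the following example uses made-up values. The array names are illustrative and not part of the manual.

```python
import numpy as np

# A 2-D array (3 rows, 2 columns) and a 1-D array with one value per column.
matrix = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
offsets = np.array([10.0, 20.0])

# Broadcasting: the 1-D array is applied to every row of the matrix
# without writing an explicit loop.
shifted = matrix + offsets
print(shifted)

# Data type definition: request a specific element type when creating an array.
small_ints = np.array([1, 2, 3], dtype=np.int8)
print(small_ints.dtype)
```

Broadcasting works here because the trailing dimension of both arrays is 2; arrays with incompatible shapes raise a ValueError instead.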
Sample Program:
import numpy as np
a=np.array([1,2,3])
print(a)

OUTPUT:
[1 2 3]

SCIPY:
SciPy is a Python library that is useful for solving many mathematical equations and
algorithms. It is built on top of the NumPy library and extends it with routines for
scientific computations such as matrix rank, matrix inverse, polynomial equations and LU
decomposition. Using its high-level functions significantly reduces the complexity of
the code and helps in analyzing the data. With SciPy, an interactive Python session becomes
a data-processing environment that competes with rivals such as MATLAB, Octave,
R-Lab, etc. It has many user-friendly, efficient and easy-to-use functions that help to solve
problems like numerical integration, interpolation, optimization, linear algebra and statistics.
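The operations named above can be sketched as follows. The matrix and the integrand are made-up examples, and the matrix rank is taken from numpy.linalg because SciPy builds on NumPy for that routine.

```python
import numpy as np
from scipy import integrate, linalg

# Numerical integration of x^2 over [0, 1]; quad() returns the estimate
# together with an estimate of the absolute error.
area, abs_error = integrate.quad(lambda x: x ** 2, 0.0, 1.0)

# A small, invertible example matrix.
A = np.array([[4.0, 3.0], [6.0, 3.0]])

A_inv = linalg.inv(A)               # matrix inverse
rank = np.linalg.matrix_rank(A)     # matrix rank (NumPy routine)
P, L, U = linalg.lu(A)              # LU decomposition with A = P @ L @ U

print(area, abs_error)
print(A_inv)
print(rank)
```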
Sample Program:
from scipy import constants
print(constants.pi)

OUTPUT:
3.141592653589793

JUPYTER:
The IPython Notebook concept was expanded upon to allow for additional programming
languages and was therefore renamed "Jupyter". "Jupyter" is a loose acronym meaning Julia,
Python and R, but today, the notebook technology supports many programming languages.
An IDE normally consists of at least a source code editor, build automation tools and a
debugger. Jupyter Notebook is a web-based interactive environment, often used like an IDE for
Python, that allows its users to create documents containing both rich text and code. It also
supports the programming languages Julia and R.

Jupyter Notebook allows users to compile all aspects of a data project in one place
making it easier to show the entire process of a project to your intended audience. Through
the web-based application, users can create data visualizations and other components of a
project to share with others via the platform.
To open jupyter-lab:
Open command prompt and type jupyter-lab.
Then after initializing all the necessary packages, it will open as follows:

Click on new notebook, then the new file will be opened with .ipynb file extension. Then type
python code and execute the code using Shift+Enter.
Sample Program and Output:

STATSMODELS:
Statsmodels is a Python module that provides classes and functions for the
estimation of many different statistical models, as well as for conducting statistical tests and
statistical data exploration. An extensive list of result statistics is available for each
estimator. The results are tested against existing statistical packages to ensure that they are
correct. Statsmodels supports specifying models using R-style formulas and pandas
DataFrames.
Statsmodels complements SciPy for statistical computations, including descriptive
statistics and estimation and inference for statistical models.

Sample Program:
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
df = pd.read_csv(r"C:\Users\UGCS\Desktop\headbrain11.csv")
print(df.head())
# rename the columns so that they can be used in an R-style formula
df.columns = ['Head_size', 'Brain_weight']
# fitting the model
model = smf.ols(formula='Head_size ~ Brain_weight', data=df).fit()
# model summary
print(model.summary())

OUTPUT:

PANDAS:
Pandas is a Python library used for working with data sets. It has functions for
analyzing, cleaning, exploring, and manipulating data. Pandas allows us to analyze big data
and draw conclusions based on statistical theories. Pandas can clean messy data sets and
make them readable and relevant. Relevant data is very important in data science.
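To illustrate what cleaning a messy data set can look like, here is a minimal sketch. The records and column names are hypothetical and chosen only to show stray whitespace, a duplicate row and an unparseable number.

```python
import pandas as pd

# Hypothetical messy records: stray whitespace, a duplicate row and a
# number stored as text that cannot be parsed ("n/a").
raw = pd.DataFrame({
    "name": [" Tom", "nick ", "nick ", "krish"],
    "age": ["20", "21", "21", "n/a"],
})

clean = raw.copy()
clean["name"] = clean["name"].str.strip()                     # trim whitespace
clean["age"] = pd.to_numeric(clean["age"], errors="coerce")   # "n/a" becomes NaN
clean = clean.drop_duplicates().reset_index(drop=True)        # remove repeated rows

print(clean)
print(clean["age"].mean())   # NaN values are skipped by default
```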
Sample Program:
import pandas as pd
a = [1, 7, 2]
myvar = pd.Series(a)
print(myvar)

OUTPUT:

RESULT:

Thus the Python packages NumPy, SciPy, Jupyter, Statsmodels and Pandas have been
downloaded and installed, and their features have been explored successfully.
EX.NO:2 WORKING WITH NUMPY ARRAYS
DATE:

AIM:
To write a python code to work with numpy arrays.
ALGORITHM:

1. Import the numpy package.

2. Create the array using numpy.array()
3. Slicing can be done like this: [start:end].
4. The NumPy array object has a property called dtype that returns the data type of
the array.
5. To iterate over multi-dimensional arrays in NumPy, we can use
the basic for loop of Python.
6. To join two arrays, the concatenate() function along with the axis can be used.
7. Use array_split() for splitting arrays; we pass it the array we want to split and the
number of splits.
8. To search an array, use the where() method.
9. NumPy has a function called sort() that will sort a specified array.
10. In NumPy, you filter an array using a boolean index list.
a. If the value at an index is True, that element is contained in the filtered array.
b. If the value at that index is False, that element is excluded from the filtered array.
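Steps 6 to 10 can be sketched a little further than the basic program does. The example below is an illustration with made-up arrays: it joins 2-D arrays along both axes, splits an array whose length is not divisible by the number of parts, and builds the boolean index list from a condition instead of by hand.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# Joining 2-D arrays: axis=0 stacks rows, axis=1 places columns side by side.
rows_joined = np.concatenate((a, b), axis=0)   # shape (4, 2)
cols_joined = np.concatenate((a, b), axis=1)   # shape (2, 4)

# array_split() also accepts a length that does not divide evenly.
parts = np.array_split(np.arange(7), 3)        # sizes 3, 2, 2

# A boolean index list can be built from a condition.
values = np.array([41, 42, 43, 44])
mask = values > 42
filtered = values[mask]

# where() returns the positions at which the condition holds.
positions = np.where(values % 2 == 0)[0]

print(rows_joined.shape, cols_joined.shape)
print([p.tolist() for p in parts])
print(filtered, positions)
```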

PROGRAM:
#Create a 0-D array:
import numpy as np
arr = np.array(42)
print(arr)

OUTPUT:
42

#Create a 1-D array:

import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)

OUTPUT:
[1 2 3 4 5]

#Create a 2-D array:

import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr)
OUTPUT:
[[1 2 3]
 [4 5 6]]

#Create a 3-D array:

import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(arr)

OUTPUT:
[[[1 2 3]
  [4 5 6]]

 [[1 2 3]
  [4 5 6]]]

#Check how many dimensions the arrays have:

import numpy as np
a = np.array(42)
b = np.array([1, 2, 3, 4, 5])
c = np.array([[1, 2, 3], [4, 5, 6]])
d = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(a.ndim)
print(b.ndim)
print(c.ndim)
print(d.ndim)

OUTPUT:
0
1
2
3

#Accessing Array Elements:

import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr[1])

OUTPUT:
2
#Accessing 2-D Arrays:
import numpy as np
arr = np.array([[1,2,3,4,5], [6,7,8,9,10]])
print('2nd element on 1st row: ', arr[0, 1])
OUTPUT:
2nd element on 1st row: 2

#Array Slicing:
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[4:])

OUTPUT:
[5 6 7]

#Slicing 2-D Arrays:

import numpy as np
arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
print(arr[1, 1:4])

OUTPUT:
[7 8 9]

#Getting the data type of an array:

import numpy as np
arr = np.array(['apple', 'banana', 'cherry'])
print(arr.dtype)

OUTPUT:
<U6

#Iterate on the elements of 1-D array:

import numpy as np
arr = np.array([1, 2, 3])
for x in arr:
    print(x)

OUTPUT:
1
2
3

#Iterating 2-D Arrays:

import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
for x in arr:
    print(x)
OUTPUT:
[1 2 3]
[4 5 6]

#Join two arrays:

import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.concatenate((arr1, arr2))
print(arr)

OUTPUT:
[1 2 3 4 5 6]

#Splitting the array:

import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6])
newarr = np.array_split(arr, 3)
print(newarr)

OUTPUT:
[array([1, 2]), array([3, 4]), array([5, 6])]

#Searching Arrays:
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 4, 4])
x = np.where(arr == 4)
print(x)

OUTPUT:
(array([3, 5, 6], dtype=int32),)

#Sorting Arrays:
import numpy as np
arr = np.array([3, 2, 0, 1])
print(np.sort(arr))

OUTPUT:
[0 1 2 3]
#Filtering Arrays:
import numpy as np
arr = np.array([41, 42, 43, 44])
x = [True, False, True, False]
newarr = arr[x]
print(newarr)

OUTPUT:
[41 43]

RESULT:
Thus the python code to work with numpy arrays has been implemented and executed
successfully.
EX.NO:3 WORKING WITH PANDAS DATA FRAMES
DATE:

AIM:
To write a python program to work with pandas data frames.
ALGORITHM:
1. Pandas is a Python library used for working with data sets.
2. It has functions for analyzing, cleaning, exploring, and manipulating data.
3. DataFrames can be created from a list or a dictionary.
4. DataFrames can also be loaded from .csv or .xlsx files.
5. Null values in a DataFrame can be replaced with other values.
6. DataFrames also support statistical analysis of the data.
PROGRAM:
#Creating a dataframe using List:
import pandas as pd
lst =['Anna', 'University', 'Chennai', 'Sri Ramakrishna','College', 'of','Engineering']
df = pd.DataFrame(lst)
print(df)
OUTPUT:
                 0
0             Anna
1       University
2          Chennai
3  Sri Ramakrishna
4          College
5               of
6      Engineering

#Creating DataFrame from dict of ndarray/lists:

import pandas as pd
data = {'Name':['Tom', 'nick', 'krish', 'jack'], 'Age':[20, 21, 19, 18]}
df = pd.DataFrame(data)
print(df)
OUTPUT:
#Column Selection:
import pandas as pd
data = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'],
'Age':[27, 24, 22, 32],
'Address':['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
'Qualification':['Msc', 'MA', 'MCA', 'Phd']}
df = pd.DataFrame(data)
print(df[['Name', 'Qualification']])
OUTPUT:

#Load Files Into a DataFrame:

import pandas as pd
df = pd.read_csv(r'data.csv')
print(df.to_string())

OUTPUT:

# Viewing the Data

import pandas as pd
df = pd.read_csv(r"C:\Users\UGCS\Desktop\data.csv")
print(df.head(10))
print(df.tail(5))
OUTPUT:
#Replacing Null values:
import pandas as pd
df = pd.read_csv(r"C:\Users\UGCS\Desktop\data.csv")
df.fillna(130, inplace = True)
print(df)

OUTPUT:

#Checking for missing values using isnull() and notnull() :

import pandas as pd
df = pd.read_csv(r"data.csv")
bool_series = pd.isnull(df["Pulse"])
print(df[bool_series])
bool_series = pd.notnull(df["Pulse"])
print(df[bool_series])

OUTPUT:
RESULT:

Thus the Python program to work with pandas data frames has been implemented and
executed successfully.
EX.NO:4 READING DATA FROM TEXT FILES, EXCEL AND THE WEB
DATE:
AIM:
To read data from text files, Excel and the web, and to explore various commands
for doing descriptive analytics on the Iris data set.

PRE-REQUISITES:
pip install xlrd
pip install openpyxl
pip install requests
pip install beautifulsoup4

ALGORITHM:
1. Open the file to be written using the open() function.
2. The file can be opened in read/write/append/… mode.
3. Write the file using write() or writelines() function.
4. seek(n) takes the file handle to the nth byte from the beginning.
5. Close the file using close().
6. To read the data from the excel, install pandas.
7. Create a dataframe using read_excel()
8. To read the data from the web, install requests and beautifulsoup4.
9. The content from the web can be accessed using the function requests.get(url).
10. To perform descriptive analytics on a dataset, install seaborn, matplotlib and
pandas to explore various functions.
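Steps 1 to 5 can also be written with a with-block, which closes the file automatically. The sketch below is an illustration rather than the manual's program; it writes to a temporary directory so that no existing file is touched.

```python
import os
import tempfile

lines = ["This is Python\n", "This is datascience\n"]

# A temporary directory keeps the example from touching real files.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "myfile.txt")

    # "w" creates (or truncates) the file; the with-block closes it for us.
    with open(path, "w") as handle:
        handle.write("Hello\n")
        handle.writelines(lines)

    # "r+" opens the existing file for reading and writing.
    with open(path, "r+") as handle:
        whole_text = handle.read()      # reads to the end of the file
        handle.seek(0)                  # back to the beginning
        first_line = handle.readline()  # reads up to and including "\n"
        handle.seek(0)
        all_lines = handle.readlines()  # list with one string per line

print(repr(first_line))
print(all_lines)
```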
PROGRAM:
#Reading data from text file:
# Program to show various ways to read and write data in a file.
file1 = open("myfile.txt","w")
L = ["This is Python \n","This is datascience \n","This is jupyter \n"]
file1.write("Hello \n")
file1.writelines(L)
file1.close() # to change file access modes
file1 = open("myfile.txt","r+")
print("Output of Read function is ")
print(file1.read())
print()
# seek(n) takes the file handle to the nth byte from the beginning.
file1.seek(0)
print( "Output of Readline function is ")
print(file1.readline())
print()
file1.seek(0)
# To show difference between read and readline
print("Output of Read(9) function is ")
print(file1.read(9))
print()
file1.seek(0)
print("Output of Readline(9) function is ")
print(file1.readline(9))
file1.seek(0)

# readlines function
print("Output of Readlines function is ")
print(file1.readlines())
file1.close()
OUTPUT:

#Reading data from excel:

# Read an existing excel file into a DataFrame
import pandas as pd
# read by default 1st sheet of an excel file
dataframe1 = pd.read_excel('excel.xlsx')
print(dataframe1)
OUTPUT:
#Reading data from the web:
import requests
from bs4 import BeautifulSoup
import time
url="https://raw.githubusercontent.com/RupeshMohan/Linear_Regression/master/headbrain.csv"
page = requests.get(url)
soup = BeautifulSoup(page.content,'html.parser')
print(soup)

OUTPUT:
Gender,Age Range,Head Size(cm^3),Brain Weight(grams)
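Because the URL above serves a raw CSV file rather than an HTML page, the response text can also be parsed straight into a DataFrame. The following sketch assumes that approach; to stay runnable without a network connection it uses a hard-coded string in place of page.text. The header line matches the output above, while the two data rows are made-up placeholder values.

```python
import io
import pandas as pd

# Stand-in for page.text from requests.get(url); the header matches the
# output above, the two data rows are made-up placeholder values.
csv_text = (
    "Gender,Age Range,Head Size(cm^3),Brain Weight(grams)\n"
    "1,1,4000,1400\n"
    "2,2,3500,1200\n"
)

# io.StringIO wraps the text in a file-like object that read_csv accepts.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)
print(df.columns.tolist())
```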

# descriptive analytics on the Iris data set

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Reading the CSV file
df = pd.read_csv("iris.csv")
print(df.shape)
df.info()
print(df.describe())
data = df.drop_duplicates(subset="Iris-setosa")
print(data)
print(df.value_counts("Iris-setosa"))
sns.countplot(x='Iris-setosa', data=df)
plt.show()
OUTPUT:
RESULT:
Thus the Python code to read data from text files, Excel and the web, and to explore
various commands for doing descriptive analytics on the Iris data set, has been implemented
and executed successfully.
EX.NO:5A UNIVARIATE ANALYSIS USING DIABETES DATASET
DATE:
AIM:
To perform Univariate analysis such as Frequency, Mean, Median, Mode, Variance,
Standard Deviation, Skewness and Kurtosis on the diabetes dataset.
ALGORITHM:
1. Install pandas.
2. To find the frequency of a single variable on a dataset, use the value_counts() function.
3. To find the mean of a single variable on a dataset, use the mean() function.
4. To find the median of a single variable on a dataset, use the median() function.
5. To find the mode of a single variable on a dataset, use the mode() function.
6. To find the variance of a single variable on a dataset, import the statistics module (part of
the Python standard library, so no installation is needed) and use the statistics.variance() function.
7. To find the standard deviation of a single variable on a dataset, use the std() function.
8. To find the skewness of a single variable on a dataset, install and import scipy and use
the scipy.stats.skew() function.
9. To find the kurtosis of a single variable on a dataset, install and import scipy and use
the scipy.stats.kurtosis() function.
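The steps above can be sketched on a small sample before running them on the full dataset. The values below are hypothetical and stand in for a single column; the function calls are the ones named in the algorithm.

```python
import statistics
import pandas as pd
from scipy import stats

# Hypothetical sample standing in for one column of the dataset.
values = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])

frequency = values.value_counts()                   # how often each value occurs
mean = values.mean()
median = values.median()
mode = values.mode()                                # a Series, since ties are possible
variance = statistics.variance(values.tolist())     # sample variance (divides by n - 1)
std_dev = values.std()                              # sample standard deviation
skewness = stats.skew(values, bias=True)
kurt = stats.kurtosis(values, bias=True)            # excess kurtosis (normal = 0)

print(frequency)
print(mean, median, mode.tolist())
print(variance, std_dev)
print(skewness, kurt)
```

The sample variance and the square of std() agree because both divide by n - 1. A positive skewness, as in this sample, indicates a longer tail on the right.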

PROGRAM:
#Reading dataset
import pandas as pd
#create DataFrame
df = pd.read_csv("diabetes.csv")
df.info()
df.describe()

OUTPUT:
#Finding Frequency
import pandas as pd
#create DataFrame
df = pd.read_csv("diabetes.csv")
#create frequency table for 'Glucose' variable
f1=df['Glucose'].value_counts()
print('frequency table for Glucose variable\n',f1)

OUTPUT:

#Finding Mean
import pandas as pd
#create DataFrame
df = pd.read_csv("diabetes.csv")
m1=df['Pregnancies'].mean()
print('Mean of Pregnancies',m1)
m2=df['Glucose'].mean()
print('Mean of Glucose',m2)
m3=df['BloodPressure'].mean()
print('Mean of BloodPressure',m3)
m4=df['SkinThickness'].mean()
print('Mean of SkinThickness',m4)
m5=df['Insulin'].mean()
print('Mean of Insulin',m5)
m6=df['BMI'].mean()
print('Mean of BMI',m6)
m7=df['DiabetesPedigreeFunction'].mean()
print('Mean of DiabetesPedigreeFunction',m7)
m8=df['Age'].mean()
print('Mean of Age',m8)
OUTPUT:

#Finding Median
import pandas as pd
df = pd.read_csv("diabetes.csv")
m1=df['Pregnancies'].median()
print('median of Pregnancies',m1)
m2=df['Glucose'].median()
print('median of Glucose',m2)
m3=df['BloodPressure'].median()
print('median of BloodPressure',m3)
m4=df['SkinThickness'].median()
print('median of SkinThickness',m4)
m5=df['Insulin'].median()
print('median of Insulin',m5)
m6=df['BMI'].median()
print('median of BMI',m6)
m7=df['DiabetesPedigreeFunction'].median()
print('median of DiabetesPedigreeFunction',m7)
m8=df['Age'].median()
print('median of Age',m8)

OUTPUT:
#Finding Mode
import pandas as pd
#create DataFrame
df = pd.read_csv("diabetes.csv")
m1=df['Pregnancies'].mode()
print('mode of Pregnancies',m1)
m2=df['Glucose'].mode()
print('mode of Glucose',m2)
m3=df['BloodPressure'].mode()
print('mode of BloodPressure',m3)
m4=df['SkinThickness'].mode()
print('mode of SkinThickness',m4)
m5=df['Insulin'].mode()
print('mode of Insulin',m5)
m6=df['BMI'].mode()
print('mode of BMI',m6)
m7=df['DiabetesPedigreeFunction'].mode()
print('mode of DiabetesPedigreeFunction',m7)
m8=df['Age'].mode()
print('mode of Age',m8)

OUTPUT:

#Finding Variance
import pandas as pd
import statistics
#create DataFrame
df = pd.read_csv("diabetes.csv")
print("Variance of Glucose set is % s"%(statistics.variance(df.Glucose)))
print("Variance of Pregnancies set is % s"%(statistics.variance(df.Pregnancies)))
print("Variance of Age set is % s"%(statistics.variance(df.Age)))
OUTPUT:

#Finding Standard Deviation

import pandas as pd
#create DataFrame
df = pd.read_csv("diabetes.csv")
s1=df['Pregnancies'].std()
print('std of Pregnancies',s1)
s2=df['Glucose'].std()
print('std of Glucose',s2)
s3=df['BloodPressure'].std()
print('std of BloodPressure',s3)
s4=df['SkinThickness'].std()
print('std of SkinThickness',s4)
s5=df['Insulin'].std()
print('std of Insulin',s5)
s6=df['BMI'].std()
print('std of BMI',s6)
s7=df['DiabetesPedigreeFunction'].std()
print('std of DiabetesPedigreeFunction',s7)
s8=df['Age'].std()
print('std of Age',s8)

OUTPUT:
#Finding Skewness
import scipy.stats
import pandas as pd
#create DataFrame
df = pd.read_csv("diabetes.csv")
s1=scipy.stats.skew(df.Age, axis=0, bias=True)
print('the skewness of Age is',s1)
s2=scipy.stats.skew(df.Glucose, axis=0, bias=True)
print('the skewness of Glucose is',s2)

OUTPUT:

#Finding Kurtosis
import scipy.stats
import pandas as pd
#create DataFrame
df = pd.read_csv("diabetes.csv")
k1=scipy.stats.kurtosis(df.Age, axis=0, bias=True)
print('the kurtosis of Age is',k1)
k2=scipy.stats.kurtosis(df.Glucose, axis=0, bias=True)
print('the kurtosis of Glucose is',k2)

OUTPUT:

RESULT:
Thus the Univariate analysis such as Frequency, Mean, Median, Mode, Variance,
Standard Deviation, Skewness and Kurtosis on the diabetes dataset has been performed
successfully.
EX.NO:5B BIVARIATE ANALYSIS USING DIABETES DATASET
DATE:
AIM:
To perform Bivariate analysis such as Linear and logistic regression modeling on the
Diabetes dataset.
ALGORITHM:
1. Linear regression uses the relationship between the data points to draw a straight
line through them.
2. This line can be used to predict future values.
3. Import scipy and draw the line of Linear Regression
4. Define response and explanatory variable.
5. Add constant to predictor variables.
6. Create the model using, sm.OLS(y, x).fit().
7. View the model using summary().
8. To construct the correlation matrix, use corr().
9. To model the logistic regression, Install scikit-learn of version 0.24.2.
10. Read and explore the data.
11. Split the Dataset as Train and Test dataset
12. Train the model using, LogisticRegression()
13. Visualize the performance of logistic regression model.
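Steps 1 to 3 and step 8 can be sketched with scipy.stats.linregress, as step 3 suggests; note that the manual's own linear regression program uses statsmodels instead. The data points are hypothetical and lie exactly on a known line, so the fitted slope and intercept are easy to check.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical data lying exactly on the line y = 2x + 1.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0

# linregress() fits the straight line and reports slope, intercept and r.
result = stats.linregress(x, y)

# Use the fitted line to predict the response for a new x value.
predicted = result.slope * 6.0 + result.intercept

# corr() builds the pairwise correlation matrix of the DataFrame columns.
corr_matrix = pd.DataFrame({"x": x, "y": y}).corr()

print(result.slope, result.intercept, result.rvalue)
print(predicted)
print(corr_matrix)
```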
PROGRAM:
#creating scatterplots
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("diabetes.csv")
plt.scatter(df.BMI, df.Age)
plt.title('BMI vs. Age')
plt.xlabel('BMI')
plt.ylabel('Age')
plt.show()

OUTPUT:
#simple linear regression
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
df = pd.read_csv("diabetes.csv")
#define response variable
y = df['Insulin']
#define explanatory variable
x = df[['BloodPressure']]
#add constant to predictor variables
x = sm.add_constant(x)
#fit linear regression model
model = sm.OLS(y, x).fit()
#view model summary
print(model.summary())

OUTPUT:
#creating histogram
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
import seaborn as sns
df = pd.read_csv("diabetes.csv")
sns.histplot(df.Age,kde=True)
plt.show()

OUTPUT:

#constructing correlation matrix:

import pandas as pd
df = pd.read_csv("diabetes.csv")
print(df.corr())
OUTPUT:
#LOGISTIC REGRESSION MODELING:
PRE-REQUISITES:
Install scikit-learn of version 0.24.2 in the command prompt as follows:
pip install scikit-learn==0.24.2

#Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
#Read and Explore the data
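dataset = pd.read_csv("diabetes.csv")
# input: the two features plotted later (Age on the x-axis, Glucose on the y-axis)
x = dataset[['Age', 'Glucose']].values
# output: the class label, assumed here to be the last column of the file
y = dataset.iloc[:, -1].values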

#Splitting The Dataset: Train and Test dataset

from sklearn.model_selection import train_test_split
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size = 0.25, random_state = 0)
from sklearn.preprocessing import StandardScaler
sc_x = StandardScaler()
xtrain = sc_x.fit_transform(xtrain)
xtest = sc_x.transform(xtest)
print (xtrain[0:10, :])

#Train The Model

from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state = 0)
classifier.fit(xtrain, ytrain)
y_pred = classifier.predict(xtest)
#Evaluation Metrics
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(ytest, y_pred)
print ("Confusion Matrix : \n", cm)
from sklearn.metrics import accuracy_score
print ("Accuracy : ", accuracy_score(ytest, y_pred))
OUTPUT:

#Visualizing the performance of logistic regression model

from matplotlib.colors import ListedColormap
X_set, y_set = xtest, ytest
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1,
                               stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1,
                               stop = X_set[:, 1].max() + 1, step = 0.01))
plt.contourf(X1, X2, classifier.predict(
             np.array([X1.ravel(), X2.ravel()]).T).reshape(
             X1.shape), alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                color = ListedColormap(('red', 'green'))(i), label = j)
plt.title('Classifier (Test set)')
plt.xlabel('Age')
plt.ylabel('Glucose')
plt.legend()
plt.show()
OUTPUT:

RESULT:
Thus the Bivariate analysis such as Linear and logistic regression modeling on the
diabetes dataset has been performed and analyzed successfully.
EX.NO:5C MULTIPLE REGRESSION ANALYSIS USING DIABETES DATASET
DATE:

AIM:
To perform multiple regression analysis using diabetes dataset.
ALGORITHM:
1. Multiple regression is like linear regression, but with more than one independent
variable, meaning that we try to predict a value based on two or more variables.
2. Import pandas, numpy and matplotlib packages.
3. Install and import sklearn(scikit-learn) package.
4. Import linear_model from scikit-learn.
5. Plot the graph using scatter()
6. Generate training and testing data from the dataset.
7. Model the dataset using, regr.fit()
8. Analyze the coefficients and intercepts.
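As a minimal sketch of steps 6 to 8 with two independent variables, the example below fits scikit-learn's LinearRegression to hypothetical, noise-free data generated from a known formula. The variable names and the generated data are assumptions made for illustration and are not taken from the diabetes dataset.

```python
import numpy as np
from sklearn import linear_model

# Hypothetical, noise-free data: target = 3*x1 + 2*x2 + 5.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 2))      # two independent variables
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + 5.0

# Same 80/20 split idea as in the manual: first rows train, last rows test.
split = int(len(X) * 0.8)
train_X, test_X = X[:split], X[split:]
train_y, test_y = y[:split], y[split:]

regr = linear_model.LinearRegression()
regr.fit(train_X, train_y)                # one coefficient per predictor

print("coefficients :", regr.coef_)
print("intercept    :", regr.intercept_)
print("R^2 on test  :", regr.score(test_X, test_y))
```

Because the data contain no noise, the fitted coefficients recover the generating formula; on real data they would only approximate the underlying relationship.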

PROGRAM:
from mpl_toolkits.mplot3d import Axes3D
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn import linear_model
np.random.seed(19680801)
data=pd.read_csv("diabetes.csv")
data.head(210)
data = data[["Glucose","Age","Pregnancies"]]
fig=plt.figure()
ax=fig.add_subplot(111,projection='3d')
n=100
ax.scatter(data["Glucose"],data["Age"],data["Pregnancies"],color="red")
ax.set_xlabel("Glucose")
ax.set_ylabel("Age")
ax.set_zlabel("Pregnancies")
plt.show()

OUTPUT:
#Generating training and testing data from our data:
train = data[:(int((len(data)*0.8)))]
test = data[(int((len(data)*0.8))):]
# Modeling:Using sklearn package to model data :
regr = linear_model.LinearRegression()
train_x = np.array(train[["Glucose"]])
train_y = np.array(train[["Age"]])
regr.fit(train_x,train_y)
ax.scatter(data["Glucose"],data["Age"],data["Pregnancies"],color="red")
plt.plot(train_x, regr.coef_*train_x + regr.intercept_, '-r')
ax.set_xlabel("Glucose")
ax.set_ylabel("Age")
ax.set_zlabel("Pregnancies")
print ("coefficients : ",regr.coef_)#Slope
print ("Intercept : ",regr.intercept_)

OUTPUT:

RESULT:
Thus the multiple regression analysis using diabetes dataset has been implemented and
executed successfully.
EX.NO:6 EXPLORING VARIOUS PLOTTING FUNCTIONS USING DATASET
DATE:

AIM:
To apply and explore various plotting functions such as Normal curves, Density and
Contour plots, Correlation and scatter plots, Histograms and three dimensional plotting on
UCI data sets.
ALGORITHM:
1. Import numpy, matplotlib, scipy and pandas.
2. Create the dataframe.
3. Find mean and standard deviation from the dataset.
4. Create the normal distribution object snd using stats.norm().
5. Generate 1000 evenly spaced values and plot the normal curve.
6. Install and import seaborn package.
7. Draw the density plot using kdeplot() on one variable (older seaborn versions used distplot()).
8. Draw the contour plot using kdeplot().
9. Construct the correlation matrix using, con.corr().
10. Display the coefficient of correlation using stats.pearsonr()
11. Plot the histogram using hist().
12. To model 3D plotting, import Axes3D.
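The numerical part of steps 3 to 5 and step 10 can be sketched without drawing anything. The mean, standard deviation and the two short arrays below are hypothetical stand-ins for values computed from the dataset.

```python
import numpy as np
from scipy import stats

# Hypothetical mean and standard deviation standing in for a dataset column.
mu, std = 4.0, 2.0
snd = stats.norm(mu, std)                  # frozen normal distribution

# 1000 evenly spaced x values covering four standard deviations each side.
x = np.linspace(mu - 4 * std, mu + 4 * std, 1000)
density = snd.pdf(x)                       # heights of the normal curve

# The curve peaks at the mean with height 1 / (std * sqrt(2 * pi)).
peak = snd.pdf(mu)

# pearsonr() returns the correlation coefficient and its p-value.
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
b = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
r, p_value = stats.pearsonr(a, b)

print(peak, density.max())
print(r, p_value)
```

Passing x and density to plt.plot() gives the normal curve drawn in the program below.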

PROGRAM:
#NORMAL CURVES:
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
import pandas as pd
#create DataFrame
df = pd.read_csv("diabetes.csv")
mu=df['Pregnancies'].mean()
std=df['Pregnancies'].std()
snd = stats.norm(mu, std)
# Generate 1000 evenly spaced values between -100 and 100
x = np.linspace(-100, 100, 1000)
plt.figure(figsize=(7.5,7.5))
plt.plot(x, snd.pdf(x))
plt.xlim(-60, 60)
plt.title('Normal Distribution', fontsize='15')
plt.xlabel('Values of Random Variable X', fontsize='15')
plt.ylabel('Probability', fontsize='15')
plt.show()
OUTPUT:

#DENSITY AND CONTOUR PLOTS:

#density plot:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv("diabetes.csv")
# kdeplot() replaces the deprecated distplot(hist=False)
sns.kdeplot(x=df.Glucose)
plt.show()

OUTPUT:
#contour plot:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv("diabetes.csv")
sns.set_style("white")
sns.kdeplot(x=df.Age, y=df.BloodPressure)
plt.show()
# fill=True replaces the deprecated shade=True
sns.kdeplot(x=df.Age, y=df.BloodPressure, cmap="Reds", fill=True, bw_adjust=.5)
plt.show()
sns.kdeplot(x=df.Age, y=df.BloodPressure, cmap="Blues", fill=True, thresh=0)
plt.show()

OUTPUT:
#CORRELATION AND SCATTER PLOTS:
import pandas as pd
import matplotlib.pyplot as plt
con = pd.read_csv('diabetes.csv')
print(con)
import seaborn as sns
sns.scatterplot(x="Age", y="Glucose", data=con);
plt.show()
sns.lmplot(x="Age", y="Glucose", hue="BMI", data=con);
plt.show()
#coefficient of correlation
from scipy import stats
cr=stats.pearsonr(con['Glucose'], con['Age'])
print(cr)
#correlation matrix
cormat = con.corr()
print(round(cormat,2))

OUTPUT:
#HISTOGRAMS:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
a = pd.read_csv('diabetes.csv')
# Creating histogram
fig, ax = plt.subplots(figsize =(10, 7))
ax.hist(a, bins = [0, 25, 50, 75, 100])
plt.show()
OUTPUT:

#THREE DIMENSIONAL PLOTTING:

from mpl_toolkits.mplot3d import Axes3D
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn import linear_model
np.random.seed(19680801)
data=pd.read_csv("diabetes.csv")
data.head(210)
data = data[["BMI","BloodPressure","Insulin"]]
fig=plt.figure()
ax=fig.add_subplot(111,projection='3d')
n=100
ax.scatter(data["BMI"],data["BloodPressure"],data["Insulin"],color="red")
ax.set_xlabel("BMI")
ax.set_ylabel("BloodPressure")
ax.set_zlabel("Insulin")
plt.show()

OUTPUT:
RESULT:
Thus the various plotting functions such as Normal curves, Density and contour plots,
Correlation and scatter plots, Histograms and three dimensional plotting have been explored
successfully on UCI data sets.
EX.NO:7 VISUALIZING GEOGRAPHIC DATA WITH BASEMAP
DATE:
AIM:
To implement visualization of geographic data with basemap.

PRE-REQUISITES:
Install folium.

ALGORITHM:
1. Import folium and pandas libraries.
2. Initialize the map and store it in an object m.
3. Use the function, folium.Map()
4. Save the map using save() function.
5. Open and view the file using any browser.
PROGRAM:

Installation of Folium:
pip install folium

# import the folium, pandas libraries
import folium
import pandas as pd
# initialize the map and store it in a m object
m = folium.Map(location = [40, -95],zoom_start = 4)
# show the map
m.save('my_map.html')

OUTPUT:

RESULT:
Thus the implementation of visualizing geographic data with base map has been
executed successfully.

Data Science Lab Manual Full
No ratings yet
Data Science Lab Manual Full
47 pages
SADEWSE
No ratings yet
SADEWSE
50 pages
FDS Lab Manual R21
No ratings yet
FDS Lab Manual R21
47 pages
DS LAB MANUAL (1)
No ratings yet
DS LAB MANUAL (1)
113 pages
Fds Lab 1-3 Exp
No ratings yet
Fds Lab 1-3 Exp
18 pages
Fds Record
No ratings yet
Fds Record
69 pages
Iml Practical Assignment
No ratings yet
Iml Practical Assignment
22 pages
Log
No ratings yet
Log
16 pages
Mayan Civilization
No ratings yet
Mayan Civilization
18 pages
CN-2013 Program Only Modified
No ratings yet
CN-2013 Program Only Modified
44 pages
Methods For Evaluating Information Sources An Anno
No ratings yet
Methods For Evaluating Information Sources An Anno
13 pages
Ch-2 Python Libraries For ML
No ratings yet
Ch-2 Python Libraries For ML
70 pages
Data Science Lab Manual
No ratings yet
Data Science Lab Manual
45 pages
CS3361-Data Science Lab Manual - B.rethina Kumar
No ratings yet
CS3361-Data Science Lab Manual - B.rethina Kumar
36 pages
CS3362 Data Science Laboratory Alok Kumar
No ratings yet
CS3362 Data Science Laboratory Alok Kumar
50 pages
FDS record last copy
No ratings yet
FDS record last copy
61 pages
Fds Lab Record
No ratings yet
Fds Lab Record
84 pages
Nemeth. Differential Contributions of Majority and Minority Influence
No ratings yet
Nemeth. Differential Contributions of Majority and Minority Influence
10 pages
ARIIA Rankings 2020 Report
No ratings yet
ARIIA Rankings 2020 Report
16 pages
Environmental Gradients
No ratings yet
Environmental Gradients
9 pages
FDS Record
No ratings yet
FDS Record
59 pages
CS3361 Data Science Lab Manual
No ratings yet
CS3361 Data Science Lab Manual
43 pages
FDSA LAB MANUAL
No ratings yet
FDSA LAB MANUAL
53 pages
FDS Lab Manual (1-3) PDF
No ratings yet
FDS Lab Manual (1-3) PDF
17 pages
Dsa Lab Manual Inserting Pages
No ratings yet
Dsa Lab Manual Inserting Pages
6 pages
lab manual fds
No ratings yet
lab manual fds
44 pages
DSL Rough Draft
No ratings yet
DSL Rough Draft
34 pages
Final Fds Manual
No ratings yet
Final Fds Manual
77 pages
FDS_LAB_MANUAL (1)
No ratings yet
FDS_LAB_MANUAL (1)
62 pages
CRITERION 5
No ratings yet
CRITERION 5
86 pages
CRITERION 5.2.1 QLM
No ratings yet
CRITERION 5.2.1 QLM
6 pages
Ex. No: 1 Exploring The Features of Numpy, Scipy, Jupyter, Statsmodels and Pandas Date: 07/08/2024
No ratings yet
Ex. No: 1 Exploring The Features of Numpy, Scipy, Jupyter, Statsmodels and Pandas Date: 07/08/2024
9 pages
Final Fds Manual Print
No ratings yet
Final Fds Manual Print
55 pages
fds lab manual[1]
No ratings yet
fds lab manual[1]
24 pages
dv_lab_manual_modified
No ratings yet
dv_lab_manual_modified
31 pages
NPTEL PRESENTATION
No ratings yet
NPTEL PRESENTATION
24 pages
Data Science Lab Manual
No ratings yet
Data Science Lab Manual
42 pages
Grace Python Numpy MB
No ratings yet
Grace Python Numpy MB
56 pages
DSF LAB EXP FULL (1) (1)
No ratings yet
DSF LAB EXP FULL (1) (1)
88 pages
Advance Python Program Unit II
No ratings yet
Advance Python Program Unit II
92 pages
CO-PO-PSO-Calculation-FDS
No ratings yet
CO-PO-PSO-Calculation-FDS
10 pages
Numpy
No ratings yet
Numpy
14 pages