
Data Analytics Lab

The document outlines an index for a lab file on data analytics submitted by a student named Amit Singh to their professors at NOIDA INSTITUTE OF ENGINEERING & TECHNOLOGY. It lists topics such as numerical operations, data import/export, matrix operations, statistical analysis, and simple linear and logistic regression using Python/R. The aims demonstrate how to handle data pre-processing tasks, fit regression models, and evaluate their performance on test data.

Uploaded by Amit Singh
Copyright © All Rights Reserved

NOIDA INSTITUTE OF ENGINEERING & TECHNOLOGY,

GREATER NOIDA

Department of Information Technology

LAB FILE
ON
DATA ANALYTICS LAB
KIT-651
(6th Semester)
(2020 – 2021)

Submitted To:
Ms. Tanya
Dr. Vivek Kumar

Submitted by:
Name: Amit Singh
Roll: 1813313019

Affiliated to Dr. A.P.J Abdul Kalam Technical University, Uttar Pradesh, Lucknow.
DATA ANALYTICS LAB
KIT-651
INDEX

S.NO | TOPIC | DATE | GRADE | SIGNATURE
1 | To get input from the user and perform numerical operations (MAX, MIN, AVG, SUM, SQRT, ROUND) using R/Python. | | |
2 | To perform data import/export (.CSV, .XLS, .TXT) operations using data frames in R/Python. | | |
3 | To get input matrices from the user and perform matrix addition, subtraction, multiplication, inverse, transpose and division operations using the vector concept in R/Python. | | |
4 | To perform statistical operations (Mean, Median, Mode and Standard Deviation) using R/Python. | | |
5 | To perform data pre-processing operations: i) handling missing data, ii) Min-Max normalization. | | |
6 | To perform Simple Linear Regression with R/Python. | | |
7 | To perform Simple Logistic Regression with R/Python. | | |
Aim - 1. To get input from the user and perform numerical operations (MAX,
MIN, AVG, SUM, SQRT, ROUND) using R/Python.

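import math

# Read n integers from the user into a list
list1 = []
n = int(input("Enter number of elements : "))
for i in range(n):
    ele = int(input())
    list1.append(ele)

# Built-in aggregate functions
print("Sum =", sum(list1))
print("Maximum element =", max(list1))
print("Minimum element =", min(list1))
# math.sqrt applied to the second element entered (index 1)
print("Square root =", math.sqrt(list1[1]))
# round() to the nearest integer, demonstrated on a fixed value
print("Round =", round(5.56))
print("Average =", sum(list1) / len(list1))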

OUTPUT: -
Enter number of elements : 5
1
6
2
8
7
Sum = 24
Maximum element = 8
Minimum element = 1
Square root = 2.449489742783178
Round = 6
Average = 4.8
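The program above applies SQRT to a single element and ROUND to a fixed constant. As a sketch only (not the lab's required solution), the same operations can be applied to the whole list inside a function, which also makes them testable without `input()`. The function name `summarize` and its dictionary keys are hypothetical.

```python
import math

def summarize(values):
    """Return MAX, MIN, SUM, AVG, per-element SQRT and the rounded average of a list."""
    total = sum(values)
    avg = total / len(values)
    return {
        "max": max(values),
        "min": min(values),
        "sum": total,
        "avg": avg,
        "sqrt": [math.sqrt(v) for v in values],  # square root of every element
        "round_avg": round(avg),                 # nearest integer to the average
    }

# Same numbers as the sample run above
stats = summarize([1, 6, 2, 8, 7])
print(stats["sum"], stats["max"], stats["min"], stats["avg"], stats["round_avg"])  # prints: 24 8 1 4.8 5
```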
Aim - 2. To perform data import/export (.CSV, .XLS, .TXT) operations using
data frames in R/Python.

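# Mount Google Drive so the notebook can read files stored there (Google Colab only)
from google.colab import drive
drive.mount("/content/drive")

# Import: read the CSV file into a pandas DataFrame
import pandas as pd
df = pd.read_csv('/content/drive/MyDrive/Da-Lab/ITUR_rain1.csv')

# Print the Frequency column
print(df.Frequency)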

OUTPUT: -

0 1.0
1 1.5
2 2.0
3 2.5
4 3.0
...
99 96.0
100 97.0
101 98.0
102 99.0
103 100.0
Name: Frequency, Length: 104, dtype: float64
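The lab code only imports a CSV file. A minimal sketch of the export direction, plus a round trip through a tab-separated .TXT file, might look like this. The sample DataFrame and its `Attenuation` column are hypothetical, and files go to a temporary directory rather than the lab's Drive folder. Excel files can be handled with `DataFrame.to_excel` and `pd.read_excel`, but those need an optional engine package (for example openpyxl for .xlsx), so they are left out of the runnable sketch.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data (not the ITUR_rain1.csv file used in the lab)
df = pd.DataFrame({"Frequency": [1.0, 1.5, 2.0], "Attenuation": [0.5, 1.25, 2.75]})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "sample.csv")
    txt_path = os.path.join(tmp, "sample.txt")

    # Export: comma-separated .csv and tab-separated .txt, without the index column
    df.to_csv(csv_path, index=False)
    df.to_csv(txt_path, sep="\t", index=False)

    # Import: read both files back into DataFrames
    from_csv = pd.read_csv(csv_path)
    from_txt = pd.read_csv(txt_path, sep="\t")

# True if the round trip preserved values, column names and dtypes
print(from_csv.equals(df), from_txt.equals(df))
```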
Aim - 3. To get input matrices from the user and perform matrix addition,
subtraction, multiplication, inverse, transpose and division operations using
the vector concept in R/Python.

import numpy

# Read matrix 1 row by row, one element per line
r = int(input("Enter no of rows of matrix1 "))
c = int(input("Enter no of columns of matrix1 "))
m = []
print("Enter elements")
for i in range(r):
    a = []
    for j in range(c):
        a.append(int(input()))
    m.append(a)

# Read matrix 2 the same way
r1 = int(input("Enter the number of rows of matrix 2 "))
c1 = int(input("Enter the number of columns of matrix 2 "))
m1 = []
print("Enter elements")
for i in range(r1):
    a1 = []
    for j in range(c1):
        a1.append(int(input()))
    m1.append(a1)

# Addition: only defined when both matrices have the same shape
if r == r1 and c == c1:
    m2 = []
    for i in range(r):
        a3 = []
        for j in range(c):
            a3.append(m[i][j] + m1[i][j])
        m2.append(a3)
    print("Sum of matrix is:")
    for i in range(r):
        for j in range(c):
            print(m2[i][j], end=" ")
        print()
else:
    print("Addition needs matrices of the same size")

# Multiplication: needs columns of matrix 1 == rows of matrix 2;
# the product has r rows and c1 columns
if c == r1:
    pm = []
    for i in range(r):
        sm = []
        for j in range(c1):
            s = 0
            for k in range(c):
                s = s + m[i][k] * m1[k][j]
            sm.append(s)
        pm.append(sm)
    print("Product of matrix:")
    for i in range(r):
        for j in range(c1):
            print(pm[i][j], end=" ")
        print()
    print("Transpose of multiplication matrix is :")
    print(numpy.transpose(pm))
else:
    print("Multiplication needs columns of matrix 1 == rows of matrix 2")

OUTPUT: -

Enter no of rows of matrix1 2
Enter no of columns of matrix1 2
Enter elements
1
2
3
4
Enter the number of rows of matrix 2 2
Enter the number of columns of matrix 2 2
Enter elements
4
5
6
7
Sum of matrix is:
5 7
9 11
Product of matrix:
16 19
36 43
Transpose of multiplication matrix is :
[[16 36]
 [19 43]]
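The loop-based program covers addition, multiplication and transpose. The remaining operations in the aim (subtraction, inverse, division) can be sketched with NumPy's array operators. "Division" is ambiguous for matrices, so this sketch assumes two readings: element-wise `A / B`, and `A @ inv(B)` (the matrix `X` with `X @ B == A`). The matrices are the ones entered in the sample run.

```python
import numpy as np

# The two matrices entered in the sample run
A = np.array([[1, 2], [3, 4]], dtype=float)
B = np.array([[4, 5], [6, 7]], dtype=float)

add = A + B                       # element-wise addition
sub = A - B                       # element-wise subtraction
prod = A @ B                      # matrix multiplication
trans = prod.T                    # transpose of the product
inv_a = np.linalg.inv(A)          # inverse; raises LinAlgError if A is singular
elem_div = A / B                  # element-wise division
right_div = A @ np.linalg.inv(B)  # "matrix division": X such that X @ B == A

for name, value in [("A + B", add), ("A - B", sub), ("A @ B", prod),
                    ("(A @ B)^T", trans), ("inv(A)", inv_a),
                    ("A / B", elem_div), ("A @ inv(B)", right_div)]:
    print(name)
    print(value)
```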
Aim - 4. To perform statistical operations (Mean, Median, Mode and Standard
deviation) using R/Python.

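import statistics as st

# Read n integers from the user
lst = []
n = int(input("Enter number of elements : "))
for i in range(n):
    ele = int(input())
    lst.append(ele)

print("Mean value is:", st.mean(lst))
print("Median is:", st.median(lst))
# In Python 3.8+, mode() returns the first value encountered when every value
# occurs equally often; older versions raise StatisticsError instead
print("Mode value is:", st.mode(lst))
# stdev() is the sample standard deviation (divides by n - 1);
# st.pstdev() gives the population version (divides by n)
print("Standard deviation is:", round(st.stdev(lst), 3))

OUTPUT: -

Enter number of elements : 5
1
2
3
4
5
Mean value is: 3
Median is: 3
Mode value is: 1
Standard deviation is: 1.581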
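To show what the `statistics` functions compute, here is a from-scratch sketch (the function name `describe` is hypothetical). It also shows why two "standard deviation" values exist: the sample version divides the sum of squared deviations by n - 1 (as `statistics.stdev` does), the population version by n (as `statistics.pstdev` does). For the input 1, 2, 3, 4, 5 the squared deviations sum to 10, so the sample value is sqrt(2.5) ≈ 1.581 and the population value is sqrt(2) ≈ 1.414.

```python
import math
from collections import Counter

def describe(values):
    """Mean, median, all modes, sample and population standard deviation of a list."""
    n = len(values)
    mean = sum(values) / n
    ordered = sorted(values)
    mid = n // 2
    # Median: middle value for odd n, average of the two middle values for even n
    median = ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2
    # Mode(s): every value that occurs the maximum number of times
    counts = Counter(values)
    top = max(counts.values())
    modes = [v for v, c in counts.items() if c == top]
    squared_dev = sum((v - mean) ** 2 for v in values)
    sample_sd = math.sqrt(squared_dev / (n - 1))  # same formula as statistics.stdev
    population_sd = math.sqrt(squared_dev / n)    # same formula as statistics.pstdev
    return mean, median, modes, sample_sd, population_sd

print(describe([1, 2, 3, 4, 5]))
print(describe([2, 2, 3, 9]))
```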
Aim - 5. To perform data pre-processing operations: i) handling missing data,
ii) Min-Max normalization.

import pandas as pd
import numpy as np

# Load the Titanic dataset from Google Drive
df = pd.read_csv("/content/drive/MyDrive/Da-Lab/titanic.csv")
df.head()

# Keep only the columns used for modelling
df.drop(['PassengerId', 'Name', 'SibSp', 'Parch', 'Ticket', 'Cabin', 'Embarked'], axis='columns', inplace=True)
df.head()
target = df.Survived
inputs = df.drop('Survived', axis='columns')

# One-hot encoding of the Sex column
dummies = pd.get_dummies(inputs.Sex)
dummies.head(3)

inputs = pd.concat([inputs, dummies], axis='columns')
inputs.head(3)

# Drop the original Sex column and the redundant 'male' dummy
inputs.drop(['Sex', 'male'], axis='columns', inplace=True)
inputs.head(3)

# List the columns that contain missing values
inputs.columns[inputs.isna().any()]

OUTPUT: -

Index(['Age'], dtype='object')

# i) Handling missing data: replace each missing Age with the mean age
inputs.Age = inputs.Age.fillna(inputs.Age.mean())
inputs.head()

inputs.Age[:10]

OUTPUT: -

0 22.000000
1 38.000000
2 26.000000
3 35.000000
4 35.000000
5 29.699118
6 54.000000
7 2.000000
8 27.000000
9 14.000000
Name: Age, dtype: float64
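The steps above fill missing ages with the mean, but the Min-Max normalization named in the aim is not shown. A minimal sketch, assuming a small hypothetical DataFrame rather than the Titanic file, applies x' = (x - min) / (max - min) column by column, which maps each column onto [0, 1]. scikit-learn's `sklearn.preprocessing.MinMaxScaler` is a library alternative. A constant column would divide by zero, so real data should be checked for that first.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not the Titanic file); NaN marks a missing age
df = pd.DataFrame({"Age": [22.0, 38.0, np.nan, 35.0],
                   "Fare": [7.25, 71.28, 7.92, 53.10]})

# i) Handling missing data: fill each column's NaNs with that column's mean
df = df.fillna(df.mean())

# ii) Min-Max normalization: x' = (x - min) / (max - min), per column
normalized = (df - df.min()) / (df.max() - df.min())
print(normalized)
```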
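# Split the pre-processed data (70% train, 30% test). Without random_state the
# split, and therefore the score below, changes from run to run
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(inputs, target, test_size=0.3)

# Train a Gaussian Naive Bayes classifier on the training split
from sklearn.naive_bayes import GaussianNB
model = GaussianNB()

model.fit(X_train, y_train)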

OUTPUT: -
GaussianNB(priors=None, var_smoothing=1e-09)

model.score(X_test,y_test)

OUTPUT: -

0.7574626865671642

model.predict(X_test[0:10])

OUTPUT: -

array([0, 1, 1, 1, 0, 1, 1, 0, 0, 1])
Aim - 6. To perform Simple Linear Regression with R/Python.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Upload area.csv from the local machine (Google Colab only)
from google.colab import files
uploaded = files.upload()

# area.csv must contain numeric Area and Price columns
data = pd.read_csv("area.csv")
X = data.Area.values.astype(float)
y = data.Price.values.astype(float)

# Visualize the relationship before fitting
plt.scatter(X, y)
plt.xlabel("Area")
plt.ylabel("Price")
plt.show()

# Fit Price = coef_ * Area + intercept_ (scikit-learn expects a 2-D feature matrix)
reg = LinearRegression()
reg.fit(data[['Area']], data.Price)

OUTPUT: -

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

reg.predict([[100]])

OUTPUT: -

array([9229.8328887])

reg.coef_

OUTPUT: -

array([40.46056658])

reg.intercept_

OUTPUT: -

5183.7762302371

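# Manual check of the fitted line y = coef_ * x + intercept_ at Area = 100;
# it reproduces reg.predict([[100]])
reg.coef_[0] * 100 + reg.intercept_

OUTPUT: -

approximately 9229.83 (the same value as reg.predict([[100]]) above)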
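For a single feature, `LinearRegression` fits the ordinary least-squares line, whose slope and intercept have a closed form: slope = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)², intercept = ȳ - slope · x̄. The sketch below computes them directly and cross-checks against `np.polyfit`. The helper name `fit_simple_linear` and the area/price numbers are hypothetical, not the lab's area.csv.

```python
import numpy as np

def fit_simple_linear(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Hypothetical toy data (not the lab's area.csv)
area = [50, 80, 120, 150, 200]
price = [7100, 8300, 10200, 11100, 13300]

slope, intercept = fit_simple_linear(area, price)
print("slope:", slope, "intercept:", intercept)
print("prediction for area 100:", slope * 100 + intercept)

# Cross-check with NumPy's degree-1 polynomial fit, which returns [slope, intercept]
print(np.polyfit(area, price, 1))
```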
Aim - 7. To perform Simple Logistic Regression with R/Python.
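This lab file ends at the Aim 7 heading without code or output. The following is a hedged sketch only, not the author's solution: a single-feature logistic regression with scikit-learn's `LogisticRegression`, the same library used in Aims 5 and 6. The hours-studied/pass data are hypothetical. The model estimates P(class 1) = 1 / (1 + exp(-(coef · x + intercept))) and predicts class 1 when that probability exceeds 0.5.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: hours studied (one feature) and fail (0) / pass (1)
hours = np.array([[1], [2], [3], [4], [5], [6], [7], [8]], dtype=float)
passed = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(hours, passed)

coef = model.coef_[0][0]
intercept = model.intercept_[0]
print("coef:", coef, "intercept:", intercept)
print("predicted classes for 2 and 7 hours:", model.predict([[2.0], [7.0]]))

# Probability of passing after 4.5 hours, from the model and from the sigmoid formula
p_model = model.predict_proba([[4.5]])[0, 1]
p_formula = 1 / (1 + np.exp(-(coef * 4.5 + intercept)))
print("P(pass | 4.5 hours):", p_model, p_formula)
```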
