DATA SCIENCE USING PYTHON - COMPLETE UNITS
UNIT 1: INTRODUCTION TO DATA SCIENCE AND PYTHON PROGRAMMING
1. Implement basic Python programs for reading input from console.
name = input("Enter your name: ")
print("Hello", name)
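Console input often arrives as one line holding several values. The sketch below assumes whitespace-separated integers. `read_ints` is a hypothetical helper that reads from any text stream, so it can be tried with `io.StringIO` in place of a live console.

```python
import io
import sys

def read_ints(stream=sys.stdin):
    """Read one line from a text stream and convert its whitespace-separated fields to ints."""
    line = stream.readline()
    return [int(tok) for tok in line.split()]

# Simulate a user typing "3 4 5" at the console
nums = read_ints(io.StringIO("3 4 5\n"))
print("Sum:", sum(nums))
```

At a real console, call `read_ints()` with no argument. `list(map(int, input().split()))` is the common one-line equivalent.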
2. Perform creation, indexing, slicing, concatenation and repetition operations on
Python built-in data types: Strings, Lists, Tuples, Dictionaries and Sets.
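s = "Python"
print(s[1:4], s + "3", s * 2)       # slicing, concatenation, repetition -> yth Python3 PythonPython
l = [1, 2, 3]
print(l[0], l[0:2] + l * 2)         # indexing; a slice concatenated with a repeated list
t = (1, 2, 3)
print(t[1:], t + (4,), t * 2)       # tuples also support slicing, + and *
d = {'a': 1}
d['b'] = 2                          # add a new key
print(d['a'], d)                    # dictionaries are indexed by key, not by position
st = {1, 2}
st.add(3)
print(st | {4}, 2 in st)            # sets are unordered: no indexing/slicing; use union and membership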
3. Solve problems using decision and looping statements.
x = 5
if x > 0:
    print("Positive")
for i in range(3):
    print(i)
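A slightly larger problem combines both kinds of statement. This is a minimal sketch: `classify` and `is_prime` are hypothetical helper names, and primality is tested by plain trial division, with a `while` loop that stops as soon as a divisor is found.

```python
def classify(n):
    """Return a label using an if/elif/else chain."""
    if n > 0:
        return "Positive"
    elif n < 0:
        return "Negative"
    else:
        return "Zero"

def is_prime(n):
    """Trial division: try divisors d while d*d <= n, returning early on the first hit."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

for value in [-3, 0, 7]:
    print(value, classify(value))

primes = []
for n in range(20):
    if not is_prime(n):
        continue            # skip 0, 1 and composite numbers
    primes.append(n)
print(primes)               # [2, 3, 5, 7, 11, 13, 17, 19]
```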
4. Apply Python built-in data types (Strings, Lists, Tuples, Dictionaries and Sets) and their
methods to solve a given problem.
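print("hi".upper())                 # HI
l = [1, 2]
l.append(3)
print(l)                            # [1, 2, 3]
t = (1, 2, 3)
print(t.count(2))                   # 1
d = {'a': 1}
print(d.get('a'), d.get('z', 0))    # 1 0 (get returns a default for a missing key)
st = {1}
st.add(2)
print(st)                           # {1, 2}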
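As an example problem, a word-frequency counter uses all five types together. This is a sketch on a made-up sentence: string methods split the text, a dict counts, a set finds distinct words, and tuples hold (word, count) pairs for sorting.

```python
text = "the quick brown fox jumps over the lazy dog the end"

words = text.lower().split()          # str methods -> list of words
counts = {}                           # dict: word -> frequency
for w in words:
    counts[w] = counts.get(w, 0) + 1

unique_words = set(words)             # set removes duplicates
top = max(counts.items(), key=lambda kv: kv[1])                 # (word, count) tuple
ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))  # by count, then alphabetically

print(len(words), "words,", len(unique_words), "unique")
print("Most frequent:", top)          # ('the', 3)
print(ranked[:3])
```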
5. Handle numerical operations using math and random number functions
import math, random
print(math.sqrt(25))
print(random.randint(1, 10))
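A classic exercise that combines both modules is a Monte Carlo estimate of pi: random points in the unit square land inside the quarter circle with probability pi/4. The sketch below uses a fixed seed so results are repeatable; `estimate_pi` is a hypothetical name, and the last two lines simply show a few other `math` and `random` functions.

```python
import math
import random

random.seed(42)                        # fixed seed so the run is reproducible

def estimate_pi(samples):
    """Fraction of random points in the unit square inside the quarter circle, times 4."""
    inside = 0
    for _ in range(samples):
        x, y = random.random(), random.random()
        if math.hypot(x, y) <= 1.0:
            inside += 1
    return 4 * inside / samples

est = estimate_pi(100_000)
print(f"estimate={est:.4f}  math.pi={math.pi:.4f}  error={abs(est - math.pi):.4f}")
print(math.factorial(5), math.gcd(12, 18), math.ceil(2.1), math.floor(2.9))   # 120 6 3 2
print(random.choice(["red", "green", "blue"]), random.uniform(1.0, 2.0))
```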
6. Create user-defined functions with different types of function arguments.
def greet(name, msg="Hi"):
    print(msg, name)
greet("Nitesh")
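One function signature can demonstrate every argument kind Python supports. In this sketch `describe` and its parameters are hypothetical: `name` is positional, `age` has a default, `*hobbies` collects extra positional values, `city` is keyword-only because it follows `*hobbies`, and `**extra` collects any other keyword arguments.

```python
def describe(name, age=18, *hobbies, city="Unknown", **extra):
    """Build a summary string from every kind of function argument."""
    parts = [f"{name} ({age}) from {city}"]
    if hobbies:
        parts.append("hobbies: " + ", ".join(hobbies))
    for key, value in extra.items():
        parts.append(f"{key}={value}")
    return "; ".join(parts)

print(describe("Nitesh"))                                   # defaults used
print(describe("Asha", 21, "chess", "music", city="Pune", email="asha@example.com"))
print(describe(age=30, name="Ravi"))                        # keyword arguments in any order
```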
UNIT 2: FILE, EXCEPTION HANDLING AND OOP
1. Create packages and import modules from packages.
# Assumes a folder mypkg/ containing __init__.py and module1.py, where module1 defines say_hello()
from mypkg import module1
module1.say_hello()
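To try the import without preparing files by hand, the script below builds the `mypkg/module1.py` layout in a temporary folder and then imports from it. It is a self-contained sketch: the body of `say_hello` (returning a string) is an assumption, and in normal use you would create the package folder yourself rather than writing files from code.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk: <tmp>/mypkg/__init__.py and <tmp>/mypkg/module1.py
root = Path(tempfile.mkdtemp())
pkg = root / "mypkg"
pkg.mkdir()
(pkg / "__init__.py").write_text("")   # marks mypkg as a regular package
(pkg / "module1.py").write_text(
    "def say_hello():\n"
    "    return 'Hello from mypkg.module1'\n"
)

sys.path.insert(0, str(root))          # make the package's parent folder importable
importlib.invalidate_caches()          # needed when modules are created at run time

from mypkg import module1
print(module1.say_hello())
```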
2. Perform file manipulations: open, close, read, write, append, and copy from one file
to another.
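with open("f1.txt", "w") as f:      # "w" creates/overwrites; "with" closes the file
    f.write("Hi\n")
with open("f1.txt", "a") as f:      # "a" appends to the end
    f.write("Again\n")
with open("f1.txt") as f, open("f2.txt", "w") as g:   # read f1 and copy it into f2
    g.write(f.read())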
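The sketch below adds an explicit `open`/`close` pair and uses `shutil.copyfile` for the copy step. It works in a temporary folder so it does not touch the real `f1.txt`/`f2.txt`; the file contents are arbitrary.

```python
import shutil
import tempfile
from pathlib import Path

folder = Path(tempfile.mkdtemp())
src, dst = folder / "f1.txt", folder / "f2.txt"

f = open(src, "w")          # explicit open ...
f.write("line 1\n")
f.close()                   # ... and close (a "with" block does this automatically)

with open(src, "a") as f:   # append mode keeps the existing content
    f.write("line 2\n")

shutil.copyfile(src, dst)   # copy the whole file in one call

with open(dst) as f:
    lines = f.readlines()
print(lines)                # ['line 1\n', 'line 2\n']
```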
3. Handle Exceptions using Python Built-in Exceptions
try:
    x = 1/0
except ZeroDivisionError:
    print("Cannot divide by zero")
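Several built-in exceptions can be handled in one `try` statement, together with the `else` and `finally` clauses. In this sketch `safe_divide` is a hypothetical helper that parses two strings and divides them; the missing file name is also made up so that `FileNotFoundError` is raised.

```python
def safe_divide(a_text, b_text):
    """Parse two strings and divide them, mapping built-in exceptions to messages."""
    try:
        result = int(a_text) / int(b_text)
    except ValueError:
        return "Not a number"
    except ZeroDivisionError:
        return "Cannot divide by zero"
    else:
        return result                      # runs only when no exception was raised
    finally:
        print("checked", a_text, b_text)   # always runs, even after a return

print(safe_divide("10", "4"))    # 2.5
print(safe_divide("10", "0"))    # Cannot divide by zero
print(safe_divide("ten", "2"))   # Not a number

try:
    open("missing-file.txt")
except FileNotFoundError as e:
    print("File error:", e)
```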
4. Solve problems using Class declaration and Object creation.
class A:
    def __init__(self, x):
        self.x = x
a = A(5)
print(a.x)
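A class becomes useful once it carries behaviour as well as data. In this sketch `Rectangle` is a hypothetical class used to total the floor area of two rooms; each object keeps its own `width` and `height`.

```python
class Rectangle:
    """Stores dimensions and computes derived values."""
    def __init__(self, width, height):
        self.width = width        # instance attributes, set per object
        self.height = height

    def area(self):
        return self.width * self.height

    def perimeter(self):
        return 2 * (self.width + self.height)

rooms = [Rectangle(3, 4), Rectangle(5, 2)]   # two independent objects
for r in rooms:
    print(r.width, r.height, r.area(), r.perimeter())
print("Total floor area:", sum(r.area() for r in rooms))   # 22
```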
5. Implement OOP concepts like Data hiding and Data Abstraction
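class Test:
    def __init__(self):
        self.__val = 10      # double underscore: name-mangled, hidden from direct outside access
    def get(self):
        return self.__val    # controlled access through a method
t = Test()
print(t.get())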
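Data abstraction is usually expressed with the standard `abc` module: an abstract base class declares *what* subclasses must provide, and cannot itself be instantiated. The sketch combines this with data hiding; `Shape` and `Circle` are hypothetical names.

```python
import math
from abc import ABC, abstractmethod

class Shape(ABC):
    """Abstract base class: declares the interface, not the implementation."""
    @abstractmethod
    def area(self):
        ...

class Circle(Shape):
    def __init__(self, r):
        self.__r = r                      # hidden: stored as _Circle__r
    def area(self):
        return math.pi * self.__r ** 2

c = Circle(2)
print(round(c.area(), 2))                 # 12.57

try:
    Shape()                               # abstract classes cannot be instantiated
except TypeError as e:
    print("TypeError:", e)

try:
    print(c.__r)                          # no mangling outside the class, so this lookup fails
except AttributeError:
    print("__r is hidden")

print(c._Circle__r)                       # mangling hides the name; it does not truly protect it
```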
6. Solve any real-time problem using inheritance concept.
class Animal:
    def speak(self):
        print("Sound")
class Dog(Animal):
    def speak(self):
        print("Bark")
d = Dog()
d.speak()
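A payroll calculation is a typical real-world use of inheritance. In this sketch `Employee` and `Manager` are hypothetical classes: the subclass reuses the parent's initialiser through `super()` and overrides `monthly_pay` to add a bonus. Names and salary figures are made up.

```python
class Employee:
    def __init__(self, name, salary):
        self.name = name
        self.salary = salary              # annual salary

    def monthly_pay(self):
        return self.salary / 12

class Manager(Employee):                  # a Manager "is an" Employee
    def __init__(self, name, salary, bonus):
        super().__init__(name, salary)    # reuse the parent's initialiser
        self.bonus = bonus

    def monthly_pay(self):                # override: add the annual bonus spread over 12 months
        return super().monthly_pay() + self.bonus / 12

staff = [Employee("Asha", 600000), Manager("Ravi", 900000, 120000)]
for person in staff:
    print(person.name, round(person.monthly_pay(), 2))
```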
UNIT 3: INTRODUCTION TO NUMPY
1. Create NumPy arrays from Python Data Structures, Intrinsic NumPy objects and
Random Functions.
import numpy as np
np.array([1,2,3])
np.arange(5)
np.random.rand(2)
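The three calls above show one creation route each. The sketch below extends them with more Python structures, the common intrinsic constructors, and NumPy's `Generator` API (`np.random.default_rng`), which current NumPy documentation recommends over the legacy `np.random.rand` style. The seed value is arbitrary.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])      # from a nested list
b = np.array((7, 8, 9), dtype=float)      # from a tuple, with an explicit dtype
print(a.shape, a.dtype, b)

# Intrinsic creation functions
print(np.zeros((2, 3)))
print(np.ones(3))
print(np.full((2, 2), 7))
print(np.eye(3))                           # 3x3 identity matrix
print(np.linspace(0, 1, 5))                # 5 evenly spaced values from 0 to 1

# Random functions (Generator API, seeded for reproducibility)
rng = np.random.default_rng(0)
print(rng.random(3))                       # uniform floats in [0, 1)
print(rng.integers(1, 7, size=5))          # dice rolls: integers 1..6 (high is exclusive)
print(rng.normal(loc=0, scale=1, size=2))  # standard normal samples
```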
2. Manipulate NumPy arrays: indexing, slicing, reshaping, joining and splitting.
a = np.array([[1,2],[3,4]])
print(a[1,1])
print(a.reshape(4,1))
print(np.hstack([a,a]))
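The code above covers indexing, reshaping and one join; the sketch below adds slicing, boolean indexing, joining along rows, and the splitting functions `np.split` and `np.hsplit`, using a small 3x4 array.

```python
import numpy as np

a = np.arange(1, 13).reshape(3, 4)     # rows: 1-4, 5-8, 9-12
print(a[0, :2])          # first row, first two columns: 1 and 2
print(a[:, -1])          # last column: 4, 8, 12
print(a[a > 6])          # boolean indexing: every element greater than 6

top, rest = a[:1], a[1:]
joined = np.concatenate([top, rest], axis=0)   # joining along rows rebuilds a
stacked = np.vstack([a, a])                    # shape (6, 4)

left, right = np.hsplit(a, 2)                  # two (3, 2) pieces
parts = np.split(a, 3, axis=0)                 # three (1, 4) row blocks
print(left.shape, right.shape, len(parts), stacked.shape)
```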
3. Computation on NumPy arrays using Universal Functions and Mathematical
methods.
a = np.array([1,2,3])
print(np.mean(a), np.sum(a), np.sqrt(a))
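Universal functions operate element by element and broadcast scalars or compatible shapes, while aggregation methods accept an `axis` argument. A short sketch on a 2x3 array:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

print(np.add(a, 10))              # same as a + 10 (scalar broadcast)
print(np.power(a, 2))             # element-wise square
print(np.exp(a[0]).round(2))      # exp applied to each element of the first row
print(a * np.array([1, 0, -1]))   # broadcast one row across both rows

print(a.sum(), a.sum(axis=0), a.sum(axis=1))   # total, column sums, row sums
print(a.max(axis=1), a.argmin(), a.std())
print(np.maximum(a, 3))           # element-wise maximum against 3
```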
4. Import a CSV file and perform various Statistical and Comparison operations on
rows/columns.
data = np.genfromtxt("data.csv", delimiter=",", skip_header=1)
print(np.mean(data, axis=0))
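The sketch below runs the same import on a small inline stand-in for `data.csv` (the column names and marks are made up), since `np.genfromtxt` also accepts file-like objects. It then shows per-column and per-row statistics and a few comparison operations.

```python
import io
import numpy as np

# Stand-in for data.csv: a header row plus three numeric columns
csv_text = """maths,physics,chemistry
78,65,80
92,88,71
55,70,90
"""
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)

print(np.mean(data, axis=0))      # column means
print(np.max(data, axis=1))       # best mark in each row
print(np.median(data, axis=0), np.std(data, axis=0))
print(data[:, 0] > data[:, 1])    # row-wise comparison of two columns
print(np.where(data >= 75, "pass", "fail"))
print(np.argmax(data[:, 0]))      # index of the row with the top maths mark
```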
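5. Load an image file and perform crop and flip operations using NumPy indexing.
from imageio import imread           # imageio 2.x; newer code can use: import imageio.v3 as iio; iio.imread(...)
img = imread("img.jpg")
crop = img[100:200, 100:200]         # rows 100-199, columns 100-199
flip_v = img[::-1]                   # vertical flip (reverse the rows)
flip_h = img[:, ::-1]                # horizontal flip (reverse the columns)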
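Because a loaded image is just a `(height, width, channels)` array, the slicing can be tested without an image file. This sketch builds a synthetic 300x400 RGB array whose red channel encodes the row index, so it is easy to verify that a flip really moved the rows.

```python
import numpy as np

# Synthetic 300x400 RGB "image": the red channel stores (row index % 256)
img = np.zeros((300, 400, 3), dtype=np.uint8)
img[:, :, 0] = (np.arange(300) % 256)[:, None]

crop = img[100:200, 100:200]     # rows 100-199, columns 100-199, all channels
flip_v = img[::-1]               # upside down: reverse axis 0 (rows)
flip_h = img[:, ::-1]            # mirror: reverse axis 1 (columns)
rot180 = img[::-1, ::-1]         # both flips together = 180-degree rotation

print(crop.shape)                          # (100, 100, 3)
print(img[0, 0, 0], flip_v[0, 0, 0])       # 0 and 299 % 256 = 43
```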
UNIT 4: DATA MANIPULATION WITH PANDAS
1. Create Pandas Series and DataFrame from various inputs.
import pandas as pd
s = pd.Series([1,2,3])
df = pd.DataFrame({'A':[1,2]})
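"Various inputs" covers lists, dicts, NumPy arrays and other Series. The sketch below builds one object from each; all names and values are illustrative. Note that a list of dicts with a missing key produces `NaN`, and a dict of Series aligns on the index.

```python
import numpy as np
import pandas as pd

s1 = pd.Series([10, 20, 30], index=["a", "b", "c"])     # list + custom index
s2 = pd.Series({"x": 1, "y": 2})                        # dict: keys become the index
s3 = pd.Series(np.arange(3) * 1.5)                      # NumPy array

df1 = pd.DataFrame({"Name": ["Asha", "Ravi"], "Age": [21, 25]})       # dict of lists
df2 = pd.DataFrame([{"Name": "Asha", "Age": 21}, {"Name": "Ravi"}])   # list of dicts; missing -> NaN
df3 = pd.DataFrame(np.eye(2), columns=["c1", "c2"])                  # 2-D array
df4 = pd.DataFrame({"s1": s1, "doubled": s1 * 2})                     # dict of Series, aligned on index

print(s2["y"], s3.tolist())
print(df2)
print(df4)
```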
2. Import any CSV file into a Pandas DataFrame and perform the following:
df = pd.read_csv("file.csv")
(a) Display the first and last 10 records.
print(df.head(10))
print(df.tail(10))
(b) Get the shape, index and column details
print(df.shape, df.index, df.columns)
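(c) Select/delete records (rows) or columns based on conditions.
print(df[df['Age'] > 20])                        # select rows where Age > 20
df_adults = df.drop(df[df['Age'] < 18].index)    # delete rows that meet a condition
df_no_name = df.drop(columns=['Name'])           # delete a column (drop returns a new DataFrame)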
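(d) Perform ranking and sorting operations.
df['Rank'] = df['Score'].rank()     # rank 1 = lowest score; pass ascending=False to rank the highest first
print(df.sort_values(by='Age'))     # sort_values returns a sorted copy unless inplace=True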
(e) Do required statistical operations on the given columns.
print(df.describe())
print(df['Score'].mean())
(f) Find the count and uniqueness of the given categorical values.
print(df['Gender'].value_counts())
print(df['Gender'].unique())
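(g) Rename single/multiple columns.
df.rename(columns={'Name': 'Full Name'}, inplace=True)                        # single column
df.rename(columns={'Age': 'Age_Years', 'Score': 'Total_Score'}, inplace=True)  # multiple columns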
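The steps above can be run end to end on a small inline stand-in for `file.csv`. This is a sketch under assumptions: the columns `Name`, `Age`, `Gender` and `Score` match those used in the steps, but the rows are made up. The ranking here uses `ascending=False, method="min"` so the top score gets rank 1 and ties share the lowest rank.

```python
import io
import pandas as pd

# Stand-in for file.csv (hypothetical rows)
csv_text = """Name,Age,Gender,Score
Asha,21,F,88
Ravi,17,M,72
Meena,25,F,95
John,19,M,72
Kiran,30,M,60
"""
df = pd.read_csv(io.StringIO(csv_text))

print(df.head(3), df.tail(2))                      # first and last records
print(df.shape, list(df.columns))                  # (b) shape and column details
adults = df[df["Age"] >= 18]                       # (c) select rows by condition
df_no_name = df.drop(columns=["Name"])             # (c) delete a column (new frame)
df["Rank"] = df["Score"].rank(ascending=False, method="min")   # (d) 1 = top score
by_age = df.sort_values(by="Age", ascending=False)             # (d) oldest first
print(df["Score"].mean(), df["Score"].median())    # (e) statistics
print(df["Gender"].value_counts())                 # (f) counts per category
print(df["Gender"].unique(), df["Gender"].nunique())
df = df.rename(columns={"Name": "Full Name", "Score": "Marks"})  # (g) several at once
print(df)
```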
UNIT 5: DATA CLEANING, PREPARATION AND VISUALIZATION
1. Import any CSV file into a Pandas DataFrame.
import pandas as pd
df = pd.read_csv("data.csv")
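(a) Handle missing data by detecting and dropping/filling missing values.
print(df.isnull().sum())                        # count missing values per column
df_dropped = df.dropna()                        # drop rows with any missing value
df_zero = df.fillna(0)                          # fill every missing value with 0
df['Age'] = df['Age'].fillna(df['Age'].mean())  # fill missing ages with the column mean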
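A small made-up DataFrame makes the detect/drop/fill choices concrete. The sketch also shows `dropna(thresh=...)`, which keeps rows with at least that many non-missing values, and a per-column `fillna` dictionary: mean for a numeric column, most frequent value (mode) for a categorical one.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Age":   [21, np.nan, 25, 30, np.nan],
    "City":  ["Pune", "Delhi", None, "Goa", "Pune"],
    "Score": [88, 72, np.nan, 60, 91],
})

print(df.isnull().sum())                 # missing per column: Age 2, City 1, Score 1
print(df[df.isnull().any(axis=1)])       # rows with at least one gap

complete_rows = df.dropna()              # keep only fully populated rows
mostly_full = df.dropna(thresh=2)        # keep rows with at least 2 non-missing values

filled = df.fillna({"Age": df["Age"].mean(),          # numeric: mean
                    "City": df["City"].mode()[0],     # categorical: most frequent
                    "Score": 0})
print(filled)
```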
(b) Transform data using the apply() and map() methods.
df['col'] = df['col'].apply(lambda x: x*2)
df['gender'] = df['gender'].map({'M':'Male','F':'Female'})
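The two methods differ in scope: `Series.map` with a dict is a lookup (unmatched values become `NaN`), `Series.apply` runs a function on each value, and `DataFrame.apply(..., axis=1)` passes one whole row at a time. A sketch on made-up columns; the grade boundaries are arbitrary.

```python
import pandas as pd

df = pd.DataFrame({"gender": ["M", "F", "F", "X"],
                   "marks": [45, 82, 67, 90],
                   "bonus": [5, 0, 3, 2]})

# map: element-wise lookup; "X" is not in the dict, so it becomes NaN
df["gender_full"] = df["gender"].map({"M": "Male", "F": "Female"})

# apply on a Series: run a function on each value
df["grade"] = df["marks"].apply(lambda m: "A" if m >= 80 else "B" if m >= 60 else "C")

# apply on a DataFrame with axis=1: the function receives one row at a time
df["total"] = df.apply(lambda row: row["marks"] + row["bonus"], axis=1)
print(df)
```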
(c) Detect and filter outliers.
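Q1 = df['col'].quantile(0.25)
Q3 = df['col'].quantile(0.75)
IQR = Q3 - Q1
lower, upper = Q1 - 1.5*IQR, Q3 + 1.5*IQR
outliers = df[(df['col'] < lower) | (df['col'] > upper)]   # detect
df_clean = df[df['col'].between(lower, upper)]             # filter: keep only non-outliers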
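The IQR rule can be wrapped in a reusable helper and checked on numbers small enough to verify by hand. In this sketch `iqr_bounds` is a hypothetical name and the data are made up; `clip` is shown as an alternative that caps outliers at the fences instead of dropping them.

```python
import pandas as pd

def iqr_bounds(series, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

df = pd.DataFrame({"col": [10, 12, 11, 13, 12, 95, 11, -40]})
lower, upper = iqr_bounds(df["col"])      # Q1 = 10.75, Q3 = 12.25 -> fences 8.5 and 14.5

outliers = df[(df["col"] < lower) | (df["col"] > upper)]   # detect
cleaned = df[df["col"].between(lower, upper)]              # filter out
capped = df["col"].clip(lower, upper)                      # alternative: cap instead of drop
print(lower, upper)
print(outliers["col"].tolist())           # [95, -40]
```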
(d) Perform Vectorized String operations on Pandas Series.
df['Name'].str.upper()
df['Email'].str.contains('@gmail')
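The `.str` accessor applies string methods to a whole Series at once and chains naturally. A sketch on made-up names and e-mail addresses; passing `na=False` to `contains` makes missing values count as "no match" in the boolean mask.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["  asha rao", "Ravi Kumar ", None],
    "Email": ["asha@gmail.com", "ravi@yahoo.in", "meena@gmail.com"],
})

clean = df["Name"].str.strip().str.title()              # trim spaces, Title Case; None stays missing
is_gmail = df["Email"].str.contains("@gmail", na=False)  # boolean mask
domain = df["Email"].str.split("@").str[1]              # text after '@'
first = clean.str.split().str[0]                        # first word of each name
length = df["Email"].str.len()

print(clean.tolist())
print(df[is_gmail])
print(domain.tolist(), length.tolist())
```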
(e) Visualize data using Line Plots, Bar Plots, Histograms, Density Plots and Scatter
Plots.
import matplotlib.pyplot as plt
df['Marks'].plot(kind='line')
plt.show()
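The code above draws only a line plot. The sketch below draws all five plot types on random, seeded stand-in data (`Marks`, `Hours` and `Grade` are illustrative columns). It uses the non-interactive `Agg` backend and saves to `plots.png`, so it also runs without a display. For the density plot, `df["Marks"].plot(kind="kde")` is the usual pandas call but requires SciPy; to stay dependency-light, this sketch substitutes a small Gaussian kernel density estimate written in NumPy with a rule-of-thumb bandwidth.

```python
import matplotlib
matplotlib.use("Agg")                 # render to a file; remove to use plt.show() interactively
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame({"Marks": rng.normal(70, 10, 50).round(),
                   "Hours": rng.uniform(1, 10, 50)})
df["Grade"] = pd.cut(df["Marks"], bins=[0, 60, 75, 100], labels=["C", "B", "A"])

fig, axes = plt.subplots(2, 3, figsize=(12, 7))
df["Marks"].plot(kind="line", ax=axes[0, 0], title="Line")
df["Grade"].value_counts().sort_index().plot(kind="bar", ax=axes[0, 1], title="Bar")
df["Marks"].plot(kind="hist", bins=10, ax=axes[0, 2], title="Histogram")
df.plot(kind="scatter", x="Hours", y="Marks", ax=axes[1, 0], title="Scatter")

# Density: Gaussian KDE in NumPy (sum of one normal bump per data point)
x = df["Marks"].to_numpy()
h = 1.06 * x.std() * len(x) ** (-1 / 5)          # Silverman's rule-of-thumb bandwidth
grid = np.linspace(x.min() - 3 * h, x.max() + 3 * h, 200)
density = np.exp(-0.5 * ((grid[:, None] - x) / h) ** 2).sum(axis=1) / (len(x) * h * np.sqrt(2 * np.pi))
axes[1, 1].plot(grid, density)
axes[1, 1].set_title("Density")

axes[1, 2].axis("off")                           # unused panel
fig.tight_layout()
fig.savefig("plots.png")
```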