23CS302 - dslab - experiment 1
0 0 3 1.5
Objectives:
To understand the Python libraries for data science
To understand the basic statistical and probability measures for data science
To learn descriptive analytics on the benchmark data sets
To apply correlation and regression analytics on standard data sets
To present and interpret data using visualization packages in Python
List of experiments:
1. Download, install and explore the features of NumPy, SciPy, Jupyter, Statsmodels and
Pandas packages.
2. Implementation of Python programs using NumPy arrays.
3. Implementation of Python programs using Pandas data frames.
4. Implementation of Python programs to perform descriptive analytics on the Iris data set by
reading it from text files, Excel files and the web.
5. Implementation of Python programs to perform the following analysis on the diabetes data set
from UCI and the Pima Indians Diabetes data set:
a. Univariate analysis: Frequency, Mean, Median, Mode, Variance, Standard
Deviation, Skewness and Kurtosis.
b. Bivariate analysis: Linear and logistic regression modelling
c. Multiple Regression analysis
d. Compare the results of the above analysis for the two data sets.
6. Implementation of Python programs to apply and explore various plotting functions on
UCI data sets.
a. Normal curves
b. Density and contour plots
c. Correlation and scatter plots
d. Histograms
e. Three dimensional plotting
7. Implementation of Python programs for visualizing geographic data with Basemap.
NumPy:
NumPy (Numerical Python) is a fundamental open-source library for numerical
computing in Python, providing support for large, multi-dimensional arrays and matrices,
along with a collection of mathematical functions to operate on these arrays.
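A minimal sketch of the array features described above. The array values are arbitrary and chosen only for illustration.

```python
import numpy as np

# Illustrative data: a 2 x 3 integer array (values are arbitrary).
a = np.array([[1, 2, 3],
              [4, 5, 6]])

print("Shape:", a.shape)            # number of rows and columns
print("Doubled:\n", a * 2)          # arithmetic is applied element-wise

col_sums = a.sum(axis=0)            # sum down each column
row_means = a.mean(axis=1)          # mean across each row
product = a @ a.T                   # matrix product of a with its transpose

print("Column sums:", col_sums)
print("Row means:", row_means)
print("a @ a.T:\n", product)
```

Operations such as `a * 2` act on every element at once, which is why NumPy code rarely needs explicit Python loops.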
SciPy:
SciPy is a Python library that extends NumPy's capabilities by providing additional
functions for scientific and technical computing. It includes modules for optimization,
integration, interpolation, eigenvalue problems, and more, making it a versatile tool for
complex mathematical and scientific tasks.
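As a small illustration of the integration module mentioned above, the following sketch integrates x**2 from 0 to 3 numerically. The integrand and limits are assumptions chosen so that the exact answer (9) is easy to check by hand.

```python
from scipy.integrate import quad

def integrand(x):
    # Illustrative function; its exact integral over [0, 3] is 3**3 / 3 = 9.
    return x**2

# quad returns the integral estimate and an estimate of the absolute error.
value, abs_error = quad(integrand, 0, 3)

print("Integral estimate:", round(value, 6))
print("Estimated absolute error:", abs_error)
```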
Jupyter:
Jupyter is an open-source web application that allows you to create and share
documents containing live code, equations, visualizations, and narrative text, facilitating
interactive computing and data analysis.
Statsmodels:
Statsmodels is a Python library for estimating and interpreting statistical models,
providing tools for regression analysis, hypothesis testing, and various statistical methods.
Pandas:
Pandas is a Python library for data manipulation and analysis, offering data structures
like DataFrames and Series to handle and analyze tabular data with ease.
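The sketch below shows a DataFrame and a Series in use. The column names and values are hypothetical and serve only as an example of tabular data.

```python
import pandas as pd

# Hypothetical tabular data: one row per student.
df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "dept": ["CS", "CS", "EE", "EE"],
    "marks": [80, 90, 70, 60],
})

marks = df["marks"]                                  # a single column is a Series
mean_by_dept = df.groupby("dept")["marks"].mean()    # average marks per department
top = df[df["marks"] > 75]                           # rows that satisfy a condition

print(df)
print(mean_by_dept)
print(top)
```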
To verify the SciPy installation:
import scipy
print(scipy.__version__)
Sample Program:
from scipy.optimize import minimize

def quadratic_function(x):
    # minimize passes x as a one-element array, so use its first entry
    return x[0]**2 + 3*x[0] + 2

result = minimize(quadratic_function, x0=0)  # x0 is the initial guess
print("Optimization Result:")
# Round the numerical results so the printed values are easy to read
print("Optimal value of x:", round(result.x[0], 4))
print("Minimum value of the function:", round(result.fun, 4))
Output:
Optimization Result:
Optimal value of x: -1.5
Minimum value of the function: -0.25
To verify the Statsmodels installation:
import statsmodels.api as sm
print(sm.__version__)
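The individual version checks can also be combined into one script. The sketch below uses the standard-library `importlib.metadata` module and assumes the packages were installed under the distribution names listed in `PACKAGES`. The helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as assumed here; adjust if your environment differs.
PACKAGES = ["numpy", "scipy", "jupyter", "statsmodels", "pandas"]

def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

for name in PACKAGES:
    found = installed_version(name)
    print(f"{name}: {found if found is not None else 'not installed'}")
```

A package reported as "not installed" can then be installed before proceeding to the later experiments.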
4. Running Jupyter
After installation, you can start Jupyter Notebook or JupyterLab:
For Jupyter Notebook:
jupyter notebook
For JupyterLab:
jupyter lab
Result:
Thus the NumPy, SciPy, Jupyter, Statsmodels and Pandas packages were downloaded,
installed and explored successfully.