NumPy, SciPy, matplotlib
A short introduction to the Python libraries widely used for machine learning: NumPy, SciPy,
matplotlib, scikit-learn, and pandas
Until now I have written every tutorial without libraries. Now I'm taking our journey to the next
level, where we will use Python libraries for classification, visualization, and clustering. This
article gives a short introduction to NumPy, SciPy, matplotlib, scikit-learn, and pandas.
NumPy
NumPy provides an n-dimensional array object, along with mathematical functions that operate on
these arrays and can be used in many calculations.
import numpy as np
# Build a 2x3 array from a nested list
arr = np.array([[1, 2, 3], [4, 5, 6]])
print("Numpy array\n{}".format(arr))
Output
Numpy array
[[1 2 3]
 [4 5 6]]
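The mathematical functions mentioned above work on whole arrays at once. Here is a minimal sketch that reuses the same array. The functions picked here (sum, mean, np.sqrt) are only illustrative choices, not a complete tour.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# Aggregate over every element of the array
total = arr.sum()
# Aggregate along an axis: axis=0 collapses the rows, giving one mean per column
column_means = arr.mean(axis=0)
# Element-wise function: applied to each entry, result has the same shape
roots = np.sqrt(arr)

print("shape: {}".format(arr.shape))
print("sum: {}".format(total))
print("column means: {}".format(column_means))
print("square roots:\n{}".format(roots))
```

Note that none of these calls needs an explicit Python loop; the work is done inside NumPy.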
SciPy
SciPy is a collection of scientific computing functions. It provides advanced linear algebra routines,
mathematical function optimization, signal processing, special mathematical functions, and
statistical distributions.
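import numpy as np
from scipy import sparse

# Create a 2D NumPy array with a diagonal of ones, and zeros everywhere else
eye = np.eye(3)
print("NumPy array:\n{}".format(eye))
# Convert the NumPy array to a SciPy sparse matrix in CSR format;
# only the non-zero entries are stored
sparse_matrix = sparse.csr_matrix(eye)
print("\nSciPy sparse CSR matrix:\n{}".format(sparse_matrix))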
Output
NumPy array:
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]

SciPy sparse CSR matrix:
  (0, 0)    1.0
  (1, 1)    1.0
  (2, 2)    1.0
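Beyond sparse matrices, the other areas listed above can be sketched in a few lines. The following is a small illustration, not a survey. The linear system, the function being minimized, and the choice of the normal distribution are all made up for this example.

```python
import numpy as np
from scipy import linalg, optimize, stats

# Linear algebra: solve the system A x = b
# (3x + y = 9 and x + 2y = 8)
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = linalg.solve(A, b)

# Optimization: find the t that minimizes (t - 2)^2
result = optimize.minimize_scalar(lambda t: (t - 2.0) ** 2)

# Statistical distributions: cumulative probability of a standard normal at 0
p = stats.norm.cdf(0.0)

print("solution of A x = b: {}".format(x))
print("minimizer: {}".format(result.x))
print("P(Z <= 0): {}".format(p))
```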
matplotlib
matplotlib is a scientific plotting library used to visualize data. Visualization is an important
step in analyzing data. You can plot histograms, scatter plots, line charts, and more.
import matplotlib.pyplot as plt

x = [1, 2, 3]
y = [4, 5, 6]
# Draw one marker per (x, y) pair, then open the plot window
plt.scatter(x, y)
plt.show()
Output
[Figure: scatter plot of the points (1, 4), (2, 5), and (3, 6)]
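Histograms and line plots follow the same pattern as the scatter plot. The sketch below draws both side by side. The values list is made-up sample data, plots.png is a hypothetical file name, and the Agg backend is selected only so the script also runs on a machine without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to a file instead of a window
import matplotlib.pyplot as plt

x = [1, 2, 3]
y = [4, 5, 6]
values = [1, 2, 2, 3, 3, 3, 4, 4, 5]  # made-up sample data for the histogram

# One figure with two axes side by side
fig, (ax_line, ax_hist) = plt.subplots(1, 2, figsize=(8, 3))

# Line plot of the same points used in the scatter example
ax_line.plot(x, y, marker="o")
ax_line.set_title("Line plot")

# Histogram: hist() returns the count per bin and the bin edges
counts, bin_edges, _ = ax_hist.hist(values, bins=5)
ax_hist.set_title("Histogram")

fig.savefig("plots.png")  # hypothetical output file name
```

In an interactive session you would drop the matplotlib.use("Agg") line and call plt.show() instead of saving the figure.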
scikit-learn
scikit-learn is built on NumPy, SciPy, and matplotlib, and provides tools for data analysis and data
mining. It ships with built-in classification and clustering algorithms, plus some datasets for
practice, such as the iris dataset, the diabetes dataset, and (in older releases; it was removed in
scikit-learn 1.2) the Boston house prices dataset.
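from sklearn import datasets

# Load the bundled iris dataset; it behaves like a dictionary
iris_data = datasets.load_iris()
# Take the first three rows of the feature matrix
sample = iris_data['data'][:3]
print("{}".format(iris_data['feature_names']))
print("{}".format(sample))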
Output
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]]
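To make the "classification and clustering algorithms" mentioned above concrete, here is a minimal sketch on the same iris data. The specific choices (k-nearest neighbors, k-means, a 25% test split, random_state=0) are assumptions made for illustration. scikit-learn offers many other estimators with the same fit/predict interface.

```python
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris_data = datasets.load_iris()
X, y = iris_data['data'], iris_data['target']

# Classification: hold out a quarter of the samples, train on the rest
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)  # fraction of test samples labeled correctly

# Clustering: group all samples into 3 clusters without using the labels
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X)

print("test accuracy: {:.2f}".format(accuracy))
print("first ten cluster labels: {}".format(cluster_labels[:10]))
```

The cluster numbers are arbitrary identifiers, so they do not have to match the species labels in iris_data['target'].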
pandas
pandas is used for data analysis. It can take multi-dimensional arrays as input and produce
charts/graphs (drawn through matplotlib). A pandas table can have columns of different datatypes, and
it can ingest data from various file formats and databases, such as CSV, Excel, and SQL.
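import pandas as pd

# Ages reconstructed from the output shown below
age = {'age': [4, 6, 8, 34, 5, 30, 41]}
dataframe = pd.DataFrame(age)
print("all age:\n{}".format(dataframe))
# Assumption: the original filter line is missing; a cutoff of 20 is one value
# that reproduces the filtered output shown below
filtered = dataframe[dataframe.age > 20]
print("{}".format(filtered))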
Output
all age:
   age
0    4
1    6
2    8
3   34
4    5
5   30
6   41
   age
3   34
5   30
6   41
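The points about mixed column types and reading data files can be sketched as follows. The CSV text, the column names, and the people in it are hypothetical. io.StringIO stands in for a real file so the example is self-contained; with a file on disk you would pass its path to pd.read_csv instead.

```python
import io
import pandas as pd

# Hypothetical CSV content; in practice this would be a file on disk
csv_text = """name,age,member
Ann,34,True
Bob,5,False
Cy,41,True
"""

# read_csv infers a datatype per column: text, integer, and boolean here
people = pd.read_csv(io.StringIO(csv_text))
print(people.dtypes)

# Boolean filtering works the same way as in the age example
adults = people[people["age"] > 20]
mean_age = people["age"].mean()

print("age above 20:\n{}".format(adults))
print("mean age: {:.1f}".format(mean_age))
```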
                                        Requests                       Beautiful Soup                 lxml             Selenium
Learning curve                          Very easy (beginner-friendly)  Very easy (beginner-friendly)  Easy             Easy
Size of web scraping project supported  Large and small                Large and small                Large and small  Small