Machine Learning in Python
Dr. Hafeez
Python and SciPy installation
• Hi, you cannot get started with machine learning in Python until you have access
to the platform
• This lesson is easy: you must download and install the Python 3.6 platform on your
computer
• Visit the Python homepage and download Python for your operating system
(Linux, OS X or Windows). Install Python on your computer. You may need to use a
platform-specific package manager such as MacPorts on OS X or yum on Red Hat
Linux
• You also need to install the SciPy platform and the scikit-learn library. I
recommend using the same approach that you used to install Python. You can
install everything at once (much easier) with Anaconda. Anaconda is
recommended for beginners
• Start Python for the first time by typing "python" at the
command line. Check the versions of everything you are going to need using the
code below:
Python for the first time
• # Python version
• import sys
• print('Python: {}'.format(sys.version))
• # scipy
• import scipy
• print('scipy: {}'.format(scipy.__version__))
• # numpy
• import numpy
• print('numpy: {}'.format(numpy.__version__))
• # matplotlib
• import matplotlib
• print('matplotlib: {}'.format(matplotlib.__version__))
• # pandas
• import pandas
• print('pandas: {}'.format(pandas.__version__))
• # scikit-learn
• import sklearn
• print('sklearn: {}'.format(sklearn.__version__))
ML in Python
• Need more help? See this blog post:
>>How to Setup a Python Environment for
Machine Learning and Deep Learning with
Anaconda
• In the next lesson, we will look at basic Python
and SciPy syntax
• Take the next step and make fast progress
in Machine Learning Mastery With Python.
Basic Python and SciPy Syntax
• You need to be able to read and write basic Python scripts
• As a developer, you can pick up new programming languages pretty
quickly. Python is case sensitive, uses hash (#) for comments and
uses white space to indicate code blocks (white space matters)
• Today's task is to practice the basic syntax of the Python
programming language and important SciPy data structures in the
Python interactive environment.
• Practice assignment, working with lists and flow control in Python.
• Practice working with NumPy arrays.
• Practice creating simple plots in Matplotlib.
• Practice working with Pandas Series and DataFrames.
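• As a starting point for the practice items above, here is a minimal sketch covering assignment, lists, flow control, a NumPy array and a Pandas Series. The variable names and values are made up for illustration and are not part of the lesson.

```python
import numpy
import pandas

# Assignment: names are case sensitive, so count and Count are different variables.
count = 3
Count = 10

# Lists and flow control: indentation (white space) marks the for and if blocks.
values = [1, 2, 3, 4, 5]
evens = []
for v in values:
    if v % 2 == 0:
        evens.append(v)
print(evens)

# NumPy array: a 2-D array with element-wise arithmetic.
myarray = numpy.array([[1, 2, 3], [4, 5, 6]])
print(myarray.shape)
print(myarray * 2)

# Pandas Series: a one-dimensional array with labels.
myseries = pandas.Series([10, 20, 30], index=['a', 'b', 'c'])
print(myseries['b'])
```

• Typing these lines one at a time in the Python interactive environment is a quick way to see what each statement does.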
Basic Python and SciPy Syntax
• For example, the snippet below creates a simple Pandas
DataFrame.
• # dataframe
• import numpy
• import pandas
• myarray = numpy.array([[1, 2, 3], [4, 5, 6]])
• rownames = ['a', 'b']
• colnames = ['one', 'two', 'three']
• mydataframe = pandas.DataFrame(myarray, index=rownames,
columns=colnames)
• print(mydataframe)
• In the next lesson, we will look at loading data into Python.
Load Datasets from CSV
• Hi, machine learning algorithms need data
• You can load your own data from CSV files, but when
you are getting started with machine learning in
Python you should practice on standard machine
learning datasets
• Your task for today's lesson is to get comfortable
loading data into Python and to find and load standard
machine learning datasets
• There are many excellent standard machine learning
datasets in CSV format that you can download and
practice with on the UCI Machine Learning Repository
Load Datasets from CSV
• Practice loading CSV files into Python using
the csv.reader() function in the standard
library.
• Practice loading CSV files using NumPy and
the numpy.loadtxt() function.
• Practice loading CSV files using Pandas and
the pandas.read_csv() function.
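• The three approaches above can be compared side by side. The sketch below is only an illustration: it reads a hypothetical three-row sample held in memory (via io.StringIO) instead of a downloaded file, and the column names are chosen for the example.

```python
import csv
import io

import numpy
import pandas

# Hypothetical sample standing in for the contents of a CSV file (no header row).
sample = "6,148,72,1\n1,85,66,0\n8,183,64,1\n"

# 1. Standard library: csv.reader() yields each row as a list of strings.
rows = list(csv.reader(io.StringIO(sample)))
print(len(rows), rows[0])

# 2. NumPy: loadtxt() parses the numeric values into a 2-D array.
array = numpy.loadtxt(io.StringIO(sample), delimiter=",")
print(array.shape)

# 3. Pandas: read_csv() returns a DataFrame; header=None because the
#    sample has no header row, names supplies the column labels.
names = ['preg', 'plas', 'pres', 'class']
frame = pandas.read_csv(io.StringIO(sample), header=None, names=names)
print(frame.shape)
```

• With a real file, a path or an open file object would take the place of io.StringIO(sample) in each call.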
Load Datasets from CSV
• To get you started below is a snippet that will load the Pima Indians
onset of diabetes dataset using Pandas directly from the UCI
Machine Learning Repository:
• # Load CSV using Pandas from URL
• from pandas import read_csv
• url = "https://goo.gl/bDdBiA"
• names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age',
'class']
• data = read_csv(url, names=names)
• print(data.shape)
• In the next lesson, you will calculate descriptive statistics for your
data in Python.
Understand Data with Descriptive Stats
• Hi, once you have loaded your data into Python
you need to be able to understand it.
• The better you can understand your data, the
better and more accurate the models that you
can build. The first step to understanding your
data is to use descriptive statistics.
• Today your lesson is to learn how to use
descriptive statistics to understand your data. I
recommend using the helper functions provided
on the Pandas DataFrame.
Understand Data with Descriptive Stats
• Understand your data using the head() function
to look at the first few rows.
• Review the dimensions of your data with
the shape property.
• Look at the data types for each attribute with
the dtypes property.
• Review the distribution of your data with
the describe() function.
• Calculate pair-wise correlation between your
variables using the corr() function.
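• A minimal sketch of the helpers listed above, run on a small made-up DataFrame. The data and column names are hypothetical and only serve to show what each call returns.

```python
import pandas

# Hypothetical toy data; the column names are illustrative only.
data = pandas.DataFrame({
    'x': [1.0, 2.0, 3.0, 4.0],
    'y': [2.0, 4.0, 6.0, 8.0],
    'label': [0, 1, 0, 1],
})

print(data.head(2))           # first two rows
print(data.shape)             # (rows, columns)
print(data.dtypes)            # data type of each column
summary = data.describe()     # count, mean, std, min, quartiles, max
print(summary)
correlations = data.corr()    # pair-wise Pearson correlation matrix
print(correlations)
```

• Because y is exactly twice x in this toy data, the x and y entry of the correlation matrix comes out at 1, which makes it an easy sanity check.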
Understand Data with Descriptive Stats
• The below example loads the Pima Indians onset of diabetes
dataset and summarizes the distribution of each attribute.
• # Statistical Summary
• import pandas
• url = "https://goo.gl/bDdBiA"
• names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age',
'class']
• data = pandas.read_csv(url, names=names)
• description = data.describe()
• print(description)
• In the next lesson, you will learn about data visualization in Python.
Understand Data with Data visualization
• Hi, continuing on from the last lesson, you must
spend time to better understand your data.
• A second way to improve your understanding of
your data is by using data visualization
techniques (e.g. plotting).
• Today, your lesson is to learn how to use plotting
in Python to understand attributes alone and
their interactions. Again, I recommend using the
helper functions provided on the Pandas
DataFrame.
Data visualization
• Use the hist() function to create a histogram
of each attribute.
• Use the plot(kind='box') function to create box
and whisker plots of each attribute.
• Use the pandas.plotting.scatter_matrix() function to
create pair-wise scatter plots of all attributes.
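• The first two plot types above can be tried on a small made-up DataFrame. This is a sketch under assumptions: the data is hypothetical, and the "Agg" backend is selected only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt
import pandas

# Hypothetical toy data; the column names are illustrative only.
data = pandas.DataFrame({
    'x': [1.0, 2.0, 2.5, 3.0, 4.0],
    'y': [2.0, 1.0, 4.0, 3.0, 5.0],
})

# Histogram of each attribute: one subplot per column.
hist_axes = data.hist()

# Box and whisker plot of each attribute, one subplot per column.
box_axes = data.plot(kind='box', subplots=True, layout=(1, 2),
                     sharex=False, sharey=False)

# Count the subplots that were created for each plot type.
print(hist_axes.size, len(box_axes))
```

• In an interactive session you would drop the matplotlib.use("Agg") line and call plt.show() to display the figures.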
Understand Data with Data visualization
• For example, the snippet below will load the diabetes dataset and create a
scatter plot matrix of the dataset.
• # Scatter Plot Matrix
• import matplotlib.pyplot as plt
• import pandas
• from pandas.plotting import scatter_matrix
• url = "https://goo.gl/bDdBiA"
• names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
• data = pandas.read_csv(url, names=names)
• scatter_matrix(data)
• plt.show()
• plt.show()
• In the next lesson, you will learn how to pre-process your data in Python.