
Python For Machine Learning


Lecturer : Lyheang UNG


Outline
1. Python

2. Numpy

3. Pandas
1. Python

1. Introduction
2. Basic Syntax
3. Variable
4. Operators
5. Data Types
6. Control Flow
7. Function
8. Class
9. Module
1. Introduction

Python is a high-level programming language which is:


● Interpreted − Python is processed at runtime by the interpreter. You do not need to compile your
program before executing it. This is similar to PHP.
● Interactive − You can actually sit at a Python prompt and interact with the interpreter directly to
write your programs.
● Object-Oriented − Python supports Object-Oriented style or technique of programming that
encapsulates code within objects.
● Beginner's Language − Python is a good language for beginner-level programmers and
supports the development of a wide range of applications, from simple text processing to web
browsers to games.
1. Introduction

● Why Python for Machine Learning?


2. Basic Syntax

● Line indentation is used in Python to delimit code blocks. All statements within the same block
must be indented by the same amount.
● The header line of a compound statement, such as if, while, def, or class, must be terminated
with a colon (:). A semicolon (;) at the end of a statement is optional.
● In Python, comments begin with a hash sign (#).
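The three syntax rules above can be seen together in a minimal sketch. The function name describe and its messages are hypothetical, chosen only for illustration.

```python
# Comments start with a hash sign and run to the end of the line.
def describe(n):                 # a compound-statement header ends with a colon
    if n % 2 == 0:               # nested block: indented one level deeper
        kind = "even"
    else:
        kind = "odd"
    return f"{n} is {kind}"      # back at the function-body indentation level

# A semicolon can separate two statements on one line, but it is rarely used.
message = describe(7); print(message)
```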
3. Variable

● Python is a dynamically typed language. You do not need to declare variables; the declaration
happens automatically when you assign a value to a variable.
● Variables can change type simply by being assigned a new value of a different type.
● Python allows you to assign a single value to several variables simultaneously. You can also
assign multiple objects to multiple variables.
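A short sketch of the three points above. All variable names and values are made up for illustration.

```python
x = 10                 # x is created by its first assignment and refers to an int
x = "ten"              # the same name can later refer to a value of another type

a = b = c = 0          # one value assigned to several variables
name, age, height = "Dara", 21, 1.70   # several values assigned to several variables

print(type(x).__name__, a, b, c, name, age, height)
```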
3. Variable Naming

● A variable name starts with a letter (A to Z or a to z) or an underscore (_) followed by zero or more
letters, underscores and digits (0 to 9).
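● The following are reserved words that cannot be used as constant, variable or any other identifier
names. (This list dates from Python 2. In Python 3, exec and print are built-in functions rather than
reserved words, while as, async, await, nonlocal, False, None and True are reserved.)

and       exec      not       assert    finally
or        break     for       pass      class
from      print     continue  global    raise
def       if        return    del       import
try       elif      in        while     else
is        with      except    lambda    yield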
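Because the reserved words differ between Python versions, it may be safer to ask the running interpreter. The sketch below uses the standard keyword module and the str.isidentifier() method; the sample names are arbitrary.

```python
import keyword

# keyword.kwlist holds the reserved words of the interpreter that runs this code.
reserved = keyword.kwlist
print(len(reserved), "reserved words")

print(keyword.iskeyword("lambda"))   # True: a reserved word
print(keyword.iskeyword("print"))    # False on Python 3: print is a built-in function
print("my_var".isidentifier())       # True: letters, digits and underscores only
print("2fast".isidentifier())        # False: a name cannot start with a digit
```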
4. Operators

The operators are used to perform operations on variables and values. In Python, they are classified into
six groups:
○ Arithmetic Operators
○ Assignment Operators
○ Comparison Operators
○ Logical Operators
○ Identity Operators
○ Membership Operators
4. Operators

● Arithmetic Operators are used with numeric values to perform common mathematical operations.

Operator   Description       Example
+          Addition          x + y
-          Subtraction       x - y
*          Multiplication    x * y
/          Division          x / y   (e.g. 5 / 2 = 2.5)
%          Modulus           x % y
**         Exponentiation    x ** y
//         Floor division    x // y  (e.g. 5 // 2 = 2)
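The difference between /, // and % is easiest to see with concrete numbers. A small sketch with arbitrary values:

```python
x, y = 7, 2

print(x / y)      # 3.5  true division always returns a float
print(x // y)     # 3    floor division
print(x % y)      # 1    remainder
print(x ** y)     # 49   exponentiation

# With negative operands, floor division rounds toward negative infinity
# and the remainder takes the sign of the divisor.
print(-7 // 2)    # -4
print(-7 % 2)     # 1
```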


4. Operators

● Assignment Operators are used to assign values to variables.

Operator   Example    Same As
=          x = 5      x = 5
+=         x += 3     x = x + 3
-=         x -= 3     x = x - 3
*=         x *= 3     x = x * 3
/=         x /= 3     x = x / 3
%=         x %= 3     x = x % 3
//=        x //= 3    x = x // 3
4. Operators

● Comparison Operators are used to compare two variables or values.

Operator   Description                 Example
==         Equal                       x == y
!=         Not equal                   x != y
>          Greater than                x > y
<          Less than                   x < y
>=         Greater than or equal to    x >= y
<=         Less than or equal to       x <= y


4. Operators

● Logical Operators are used to combine conditional or logic statements.

Operator   Description                                              Example
and        True if both statements are true                         x < 5 and x < 10
or         True if at least one of the statements is true           x < 5 or x < 4
not        Reverse the result: return False if the result is true   not (x < 5 and x < 10)
4. Operators

● Identity Operators are used to test whether two variables refer to the same object, not merely
whether their values are equal.
● Variables that point to the same object share the same reference location in memory.

Operator   Description                                       Example
is         True if both variables are the same object        x is y
is not     True if both variables are not the same object    x is not y
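A minimal sketch contrasting identity (is) with equality (==). The list contents are arbitrary.

```python
a = [1, 2, 3]
b = a              # b refers to the same list object as a
c = [1, 2, 3]      # c is a separate object with equal contents

print(a == c)      # True   equal values
print(a is c)      # False  two different objects
print(a is b)      # True   one object, two names

b.append(4)        # changing the object through b is visible through a
print(a)           # [1, 2, 3, 4]
```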
4. Operators

● Membership Operators are used to test whether a value is present in a container (e.g. list, set, string).

Operator   Description                                                  Example
in         True if the specified value is present in the object         x in y
not in     True if the specified value is not present in the object     x not in y
5. Data Types
● Data types are the classification or categorization of data items. Data types represent a kind of
value which determines what operations can be performed on that data.
● In Python, types of the data are classified as follows:
○ Boolean
○ Number
○ String
○ Sequence
○ Dictionary
○ Binary
5. Boolean

In Python, True and False are Boolean objects of class 'bool' and they are immutable. Python treats
zero, None and empty containers (such as '', [] and {}) as False; most other values are treated as True.
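The truth value of any object can be checked with bool(). The sample values below are arbitrary and only illustrate the rule.

```python
values = [0, 1, -3.5, "", "text", [], [0], None, {}]

# bool() shows how each value behaves in an if or while condition.
truthiness = [bool(v) for v in values]
for v, t in zip(values, truthiness):
    print(repr(v), "->", t)
```

Note that [0] is True: the list is non-empty, even though its only element is zero.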
5. Number

● Numbers are immutable objects consisting of three built-in types:


○ Integer
○ Floating-point numbers
○ Complex numbers: <real part> + <imaginary part>j
5. Number

● Common Functions

Function     Description                                Example
int(x)       Convert x to an integer (truncates)        int(3.7)
float(x)     Convert x to a floating-point number       float(2)
abs(x)       Absolute value of x                        abs(-10)
pow(x, y)    Value of x^y                               pow(2, 3)
exp(x)       Value of e^x (from the math module)        math.exp(3)
sqrt(x)      Square root of x (from the math module)    math.sqrt(4)
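The functions in the table can be tried as follows. Note that exp and sqrt are not built-ins; this sketch takes them from the standard math module.

```python
import math

print(int(3.7))         # 3     the fractional part is discarded
print(float(2))         # 2.0
print(abs(-10))         # 10
print(pow(2, 3))        # 8
print(math.sqrt(4))     # 2.0
print(math.exp(3))      # e cubed, roughly 20.09

z = 2 + 3j              # complex number: <real part> + <imaginary part>j
print(z.real, z.imag)   # 2.0 3.0
```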


5. String

● Python strings are immutable objects: their contents cannot be changed in place.
● You can update an existing string by (re)assigning a variable to another string.
● Python accepts single (') or double (") quotes to denote string literals.
5. String

● Common Operators

Operator   Description                                                   Example
+          Concatenation - Add values on either side of the operator.    'ab' + 'cd' → 'abcd'
*          Repetition - Create a new string by concatenating multiple    'a' * 3 → 'aaa'
           copies of the same string.
[]         Slice - Give the character at the given index.                m = 'abcd'; m[0] → 'a', m[-1] → 'd'
[:]        Range Slice - Give the characters in the given range.         m = 'abcde'; m[1:3] → 'bc'
in         Membership - Return True if a character exists in the         m = 'ab'; 'a' in m → True
           given string.
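The operators above, plus the immutability rule, in one runnable sketch. The string values are arbitrary.

```python
m = "abcde"

print("ab" + "cd")    # abcd
print("a" * 3)        # aaa
print(m[0], m[-1])    # a e
print(m[1:3])         # bc   the end index is excluded
print("a" in m)       # True

# Strings are immutable: assigning to an index raises TypeError.
try:
    m[0] = "z"
except TypeError as err:
    print("cannot modify in place:", err)

m = "z" + m[1:]       # building a new string and reassigning the name works
print(m)              # zbcde
```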
5. String
● Common Methods

Method               Description
isalpha()            True if all characters in the string are alphabetic and False otherwise.
isdigit()            True if the string contains only digits and False otherwise.
lower()              Convert all uppercase letters in the string to lowercase.
upper()              Convert all lowercase letters in the string to uppercase.
replace(old, new)    Replace all occurrences of 'old' in the string with 'new'.
split(separator)     Split the string at the separator (any whitespace if not provided)
                     and return a list of substrings.
strip()              Remove all leading and trailing whitespace from the string.
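A short sketch of these methods on a made-up sample string. Each method returns a new string or list and leaves the original untouched.

```python
s = "  Machine Learning 101  "

clean = s.strip()                  # 'Machine Learning 101'
words = clean.split()              # ['Machine', 'Learning', '101']

print(clean.lower())               # machine learning 101
print(clean.upper())               # MACHINE LEARNING 101
print(clean.replace("101", "102")) # Machine Learning 102
print(words[0].isalpha())          # True   letters only
print(words[2].isdigit())          # True   digits only
print("abc1".isalpha())            # False  contains a digit
print("a,b,c".split(","))          # ['a', 'b', 'c']
```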
5. List

● A list is an ordered group of items whose elements do not have to be of the same type.
A list contains items separated by commas (,) and enclosed within square brackets ([]).
● List indexes start at 0 at the beginning of the list; negative indexes count backward from -1 at
the end. As with strings, list operations include indexing ([]), slicing ([:]), concatenation (+),
repetition (*) and membership (in).
5. List

● Common Functions

Function Description

len(list) Give the total length of the list.

max(list) Return max value from the list.

min(list) Return min value from the list.

list(tuple) Convert a tuple into list.


5. List

● Common Methods

Method                Description
append(obj)           Append an object to the list
insert(index, obj)    Insert an object into the list at offset index
count(obj)            Return the number of times an object occurs in the list
index(obj)            Return the lowest index in the list at which an object appears
remove(obj)           Remove the first occurrence of an object from the list
reverse()             Reverse the elements of the list in place
sort()                Sort the elements of the list in place
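The list functions and methods above can be traced step by step. The values are arbitrary; the comments show the list after each call.

```python
nums = [3, 1, 2]

nums.append(1)          # [3, 1, 2, 1]
nums.insert(0, 9)       # [9, 3, 1, 2, 1]
print(nums.count(1))    # 2
print(nums.index(1))    # 2   position of the first 1
nums.remove(1)          # [9, 3, 2, 1]   only the first 1 is removed
nums.reverse()          # [1, 2, 3, 9]
nums.sort()             # [1, 2, 3, 9]

print(len(nums), max(nums), min(nums))   # 4 9 1
print(nums[0], nums[-1], nums[1:3])      # 1 9 [2, 3]
print(list((7, 8)))                      # [7, 8]   tuple converted to a list
```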


5. Set

● A set is a collection which is unordered and unindexed, written with curly brackets ({}).
An empty set is written set(), because {} creates an empty dictionary.
● Set items cannot be accessed by index: because sets are unordered, the items have no
index. However, you can loop through the set items using a for loop, or ask if a specified value is
present in a set by using the in keyword.
5. Set

● Common Methods

Method             Description
add(element)       Add an object to the set.
clear()            Remove all the elements from the set.
copy()             Return a copy of the set.
update(set)        Update the set with the union of this set and another.
remove(element)    Remove the specified object from the set.
isdisjoint(set)    Return True if the two sets have no elements in common.
issubset(set)      Return whether another set contains this set or not.
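A sketch of the set methods with arbitrary values. The comments show the contents after each call; the printed order of a set is not guaranteed.

```python
a = {1, 2, 3}
b = {3, 4}

a.add(5)                       # {1, 2, 3, 5}
a.update(b)                    # {1, 2, 3, 4, 5}
a.remove(1)                    # {2, 3, 4, 5}; raises KeyError if the item is missing

print(3 in a)                  # True
print(b.issubset(a))           # True   every item of b is in a
print(a.isdisjoint({7, 8}))    # True   no common items

c = a.copy()
c.clear()                      # empties the copy only
print(len(a), len(c))          # 4 0
```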


5. Tuple

● Tuples are immutable objects that cannot be changed once they have been created. A tuple
contains items separated by commas and enclosed in parentheses instead of square brackets.
● You can update an existing tuple by (re)assigning a variable to another tuple.
● Tuples can be slightly faster than lists and protect your data against accidental changes. The
rules for tuple indices are the same as for lists, and tuples support the same non-mutating
operations and functions as well.
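A minimal sketch of tuple behavior. The names point and single are hypothetical.

```python
point = (3, 4)
x, y = point                            # unpacking into two variables
print(point[0], point[-1], len(point))  # 3 4 2

# Item assignment is rejected because tuples are immutable.
try:
    point[0] = 10
except TypeError as err:
    print("tuples are immutable:", err)

point = (10, 4)       # reassigning the variable to a new tuple is allowed
single = (5,)         # a one-element tuple needs a trailing comma
print(point, single)
```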
5. Dictionary

● Dictionaries are a kind of hash table consisting of key-value pairs. (Since Python 3.7, dictionaries
preserve insertion order.)
○ Keys: must be immutable (hashable) data types, usually numbers or strings.
○ Values: can be any arbitrary Python object.
● Dictionaries are mutable objects that can change their values.
● A dictionary is enclosed by curly braces ({ }), the items are separated by commas (,), and each key is
separated from its value by a colon (:).
● A dictionary's values can be assigned and accessed using square brackets ([]) with a key.
5. Dictionary

● Common Methods

Method                    Description
keys()                    Return a view of the dictionary's keys.
values()                  Return a view of the dictionary's values.
items()                   Return a view of the dictionary's (key, value) tuple pairs.
get(key, default=None)    Return the value for the key, or default if the key is not in the dictionary.
update(dict2)             Add the key-value pairs of another dictionary to this dictionary.
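A short sketch of dictionary access and the methods above. The subject names and scores are made up. In Python 3, keys(), values() and items() return views, so list() is used here to print them.

```python
scores = {"math": 90, "physics": 75}

scores["chemistry"] = 82               # assign with square brackets
print(scores["math"])                  # 90
print(scores.get("biology", 0))        # 0   default, because the key is missing

scores.update({"physics": 80, "biology": 68})
print(list(scores.keys()))             # ['math', 'physics', 'chemistry', 'biology']
print(list(scores.values()))           # [90, 80, 82, 68]

for subject, score in scores.items():  # iterate over (key, value) pairs
    print(subject, score)
```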
6. Control Flow

● In a typical program, statements are executed one after another in exact top-down order. What
if you want to change that flow?
● As you might have guessed, this is achieved using control flow statements. The three covered
here are if, for and while.
● Python does not provide a switch statement as in other languages (Java, PHP…); since
Python 3.10, the match statement covers similar use cases.
6. If..elif..else

● If..else Statement

● If..elif..else Statement
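The two forms can be sketched as follows. The function names sign and grade and the thresholds are hypothetical, chosen only to illustrate the syntax.

```python
def sign(n):
    # if..else: exactly one of the two branches runs
    if n >= 0:
        return "non-negative"
    else:
        return "negative"

def grade(score):
    # if..elif..else: conditions are tested top-down and the first true one wins
    if score >= 85:
        return "A"
    elif score >= 70:
        return "B"
    elif score >= 50:
        return "C"
    else:
        return "F"

print(sign(-3))     # negative
print(grade(72))    # B
```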
6. Loop

● for Loop

● while Loop
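A minimal sketch of both loop forms, using arbitrary numbers. A for loop iterates over the items of a sequence (or over range()); a while loop repeats as long as its condition is true.

```python
# for loop over a list
total = 0
for n in [1, 2, 3, 4]:
    total += n
print(total)            # 10

# for loop over range(3), which yields 0, 1, 2
squares = []
for i in range(3):
    squares.append(i * i)
print(squares)          # [0, 1, 4]

# while loop: runs until the condition becomes false
countdown = []
k = 3
while k > 0:
    countdown.append(k)
    k -= 1
print(countdown)        # [3, 2, 1]
```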
6. Loop

● break: terminates the loop statement and transfers execution to the statement immediately
following the loop.

● continue: causes the loop to skip the remainder of its body and immediately retest its condition
prior to reiterating.
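Both statements in a small sketch with made-up data:

```python
# break: stop at the first negative number; later items are never visited
first_negative = None
for n in [4, 7, -2, 9, -5]:
    if n < 0:
        first_negative = n
        break
print(first_negative)    # -2

# continue: skip odd numbers and go straight to the next iteration
evens = []
for n in range(6):
    if n % 2 == 1:
        continue
    evens.append(n)
print(evens)             # [0, 2, 4]
```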
7. Function

● A function is a block of organized, reusable code that is used to perform a single, related action.
Functions provide better modularity for your application and a high degree of code reuse.

● Function Syntax

With Return Statement Without Return Statement
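The two cases can be sketched like this. The function names add and greet are hypothetical. A function without a return statement still returns a value, namely None.

```python
def add(a, b):
    """With a return statement: the result is handed back to the caller."""
    return a + b

def greet(name):
    """Without a return statement: the function only performs an action."""
    print("Hello,", name)

result = add(2, 3)
nothing = greet("Dara")

print(result)     # 5
print(nothing)    # None
```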


7. Function

In Python, a function can be called using any of the following types of arguments:
● Required arguments: the arguments must be given and passed to the function in correct positional
order.
● Keyword arguments: the function call identifies the arguments by the parameter names.
● Default arguments: the argument has a default value in the function declaration used when the
value is not provided in the function call.

Required Arguments Keyword Arguments Default Arguments
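One hypothetical function, power, is enough to show all three argument styles:

```python
def power(base, exponent=2):     # exponent has a default value
    return base ** exponent

# Required (positional) arguments: matched by position
print(power(3, 4))                 # 81

# Keyword arguments: matched by parameter name, so the order is free
print(power(exponent=3, base=2))   # 8

# Default argument: exponent is omitted, so 2 is used
print(power(5))                    # 25
```

Calling power() with no arguments at all raises TypeError, because base is required.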


8. Class

● A Class is like an object constructor, or a "blueprint" for creating objects.
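A minimal sketch of the blueprint idea. The class Student, its attributes and the threshold are all hypothetical.

```python
class Student:
    """Blueprint for student objects."""

    def __init__(self, name, score):
        # __init__ runs when an object is created and sets its attributes
        self.name = name
        self.score = score

    def passed(self, threshold=50):
        # a method is a function that works on the object's own data
        return self.score >= threshold

# Two independent objects created from the same class
s1 = Student("Dara", 72)
s2 = Student("Sok", 41)

print(s1.name, s1.passed())   # Dara True
print(s2.name, s2.passed())   # Sok False
```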


8. Class

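● Data Hiding: attributes named with a double underscore prefix ("__") are not directly visible to
outsiders. Python implements this through name mangling, so it discourages rather than strictly
prevents outside access.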
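A sketch of how a double-underscore attribute behaves. The class Account and its method names are hypothetical.

```python
class Account:
    def __init__(self, balance):
        self.__balance = balance      # stored internally as _Account__balance

    def get_balance(self):
        return self.__balance         # accessible from inside the class

acct = Account(100)
print(acct.get_balance())             # 100

# Direct access from outside fails because the name was mangled.
try:
    acct.__balance
except AttributeError as err:
    print("hidden:", err)

# The mangled name still works, which is why this is a convention, not a lock.
print(acct._Account__balance)         # 100
```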
9. Module

● A module is a file consisting of Python code that can define functions, classes and variables.
● A module allows you to organize your code by grouping related code, which makes the code easier
to understand and use.
● You can use any Python source file as a module by executing an import statement.

● Python's from statement lets you import specific components from a module into the current
namespace.
9. Module

● from module_name import * is used to import all public names from a module into the current namespace.
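The import forms can be sketched with the standard math module, which stands in here for any module, including your own .py files.

```python
import math                   # import the whole module; names are used as math.<name>
from math import sqrt, pi     # import specific names into the current namespace
import math as m              # import the module under a shorter alias

print(math.floor(2.7))        # 2
print(sqrt(16))               # 4.0
print(round(pi, 2))           # 3.14
print(m.ceil(2.1))            # 3

# from math import *          # would import every public name; it is usually
#                             # avoided because it hides where names come from
```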
2. Numpy

1. Introduction
2. Array
1. Introduction

● NumPy (Numerical Python) is a library consisting of multidimensional array objects and a
collection of routines for processing those arrays. Using NumPy, mathematical and logical
operations on arrays can be performed.
● NumPy is a Python C extension library for array-oriented computing. It is the foundation of the
Python scientific stack.
2. Array

● The most important object defined in NumPy is an N-dimensional array type called ndarray. It
describes a collection of items of the same type. Items in the collection can be accessed
using a zero-based index.
● Every item in an ndarray takes the same size of block in memory. The type of the elements is
described by a data-type object (called dtype).
2. Array

● Utilities to Create Arrays
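The original slide's code is not available here, so the following is only a sketch of some commonly used NumPy creation routines; the variable names and values are arbitrary.

```python
import numpy as np

a = np.array([1, 2, 3])              # from a Python list
zeros = np.zeros((2, 3))             # 2x3 array filled with 0.0
ones = np.ones(4)                    # 1-D array of four 1.0 values
steps = np.arange(0, 10, 2)          # start, stop (excluded), step
grid = np.linspace(0.0, 1.0, 5)      # 5 evenly spaced values, both ends included
identity = np.eye(2)                 # 2x2 identity matrix

print(steps)    # [0 2 4 6 8]
print(grid)     # [0.   0.25 0.5  0.75 1.  ]
```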


2. Array

● Basic Attributes
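As a sketch (the slide's own example is not available), these are the ndarray attributes most often inspected; the array contents are arbitrary.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float64)

print(a.ndim)       # 2          number of dimensions
print(a.shape)      # (2, 3)     rows, columns
print(a.size)       # 6          total number of elements
print(a.dtype)      # float64    element type
print(a.itemsize)   # 8          bytes per element
```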
2. Array

● Selecting Data
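A sketch of indexing, slicing and boolean selection on an arbitrary 3x4 array (not the slide's original example):

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

print(a[0, 1])        # 1              single element: row 0, column 1
print(a[1])           # [4 5 6 7]      whole row
print(a[:, 2])        # [ 2  6 10]     whole column
print(a[0:2, 1:3])    # rows 0-1, columns 1-2
print(a[a > 8])       # [ 9 10 11]     boolean mask
```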
2. Array

● Arithmetic Functions
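Arithmetic on arrays is applied element by element. A sketch with arbitrary values, standing in for the slide's missing example:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([10.0, 20.0, 30.0])

print(x + y)                          # [11. 22. 33.]
print(y - x)                          # [ 9. 18. 27.]
print(x * y)                          # [10. 40. 90.]
print(y / x)                          # [10. 10. 10.]
print(x ** 2)                         # [1. 4. 9.]
print(x * 2)                          # [2. 4. 6.]   the scalar is broadcast
print(np.sqrt(np.array([1.0, 4.0, 9.0])))   # [1. 2. 3.]
```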
2. Array

● Statistical Functions
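A sketch of common statistical reductions on an arbitrary 2x3 array. Passing axis=0 reduces down the rows (one result per column), axis=1 reduces across the columns (one result per row).

```python
import numpy as np

data = np.array([[1, 2, 3],
                 [4, 5, 6]])

print(data.sum())           # 21
print(data.min(), data.max())   # 1 6
print(data.mean())          # 3.5
print(np.median(data))      # 3.5
print(data.std())           # population standard deviation, about 1.708
print(data.mean(axis=0))    # [2.5 3.5 4.5]   per column
print(data.sum(axis=1))     # [ 6 15]         per row
```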
2. Array

● Linear Algebra Functions
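A sketch of a few routines from numpy.linalg, using an arbitrary 2x2 system in place of the slide's missing example:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

print(A.T)                    # transpose
print(A @ A)                  # matrix product
print(np.linalg.det(A))       # determinant, 2*3 - 1*1 = 5 (up to rounding)
A_inv = np.linalg.inv(A)      # inverse
x = np.linalg.solve(A, b)     # solves A x = b without forming the inverse

print(x)                      # approximately [0.8 1.4]
```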


3. Pandas

1. Introduction
2. Data Structure
3. Import/Export Data
4. Data Visualization
1. Introduction

● Pandas is an open-source Python library providing high-performance data manipulation and
analysis tools built on its powerful data structures. The name Pandas is derived from
"panel data", an econometrics term for multidimensional data.
● With Pandas, we can accomplish five typical steps in the processing and analysis of data,
regardless of the origin of data: load, prepare, manipulate, model, and analyze.
● Python with Pandas is used in a wide range of academic and commercial domains, including
finance, economics, statistics and analytics.
2. Data Structure

Pandas has two main data structures:

● Series: a one-dimensional, labeled, homogeneous array of immutable size.
● DataFrame: a two-dimensional, labeled, size-mutable tabular structure with potentially
heterogeneously typed columns.
2. Data Structure - Series

● You can read more in the documentation.
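The slide's own example is not available, so the following is a sketch of creating a Series and, for comparison, a DataFrame. The labels, names and scores are made up.

```python
import pandas as pd

# Series: one column of values of one type, each with an index label
s = pd.Series([0.25, 0.5, 0.75], index=["a", "b", "c"])
print(s["b"])        # 0.5
print(s.dtype)       # float64

# DataFrame: a table whose columns may have different types
df = pd.DataFrame({
    "name": ["Dara", "Sok", "Vanna"],
    "score": [0.9, 0.7, 0.8],
})
print(df.shape)      # (3, 2)
print(df.dtypes)
```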


2. Data Structure - DataFrame

● Basic Attributes

Attribute Description

shape Return number of rows and columns

size Return the number of elements in this object.

dtypes Return the dtypes in the DataFrame


empty Indicator whether DataFrame is empty

columns Return columns

index Return the index

values Return numpy representation of the DataFrame


2. Data Structure - DataFrame

● Basic Methods

Method           Description
head(n)          Return the first n rows
tail(n)          Return the last n rows
info()           Display information such as index, data type and memory…
to_numpy()       Convert DataFrame to numpy array
astype(float)    Convert the data type of the values to float


2. Data Structure - DataFrame

● Selecting Data

Method                        Description
df[col_name]                  Select a column
df[[col1_name, col2_name]]    Select multiple columns
loc[index]                    Select rows by index label
iloc[position]                Select rows by integer position
iloc[a, b]                    Select the element at row position a and column position b
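The difference between label-based loc and position-based iloc is easiest to see on a frame with non-numeric index labels. The data and labels below are made up.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["a", "b", "c"], "score": [0.5, 0.7, 0.9]},
    index=["r1", "r2", "r3"],
)

col = df["score"]               # one column, returned as a Series
sub = df[["name", "score"]]     # several columns, returned as a DataFrame
by_label = df.loc["r2"]         # row with index label 'r2'
by_pos = df.iloc[0]             # first row, whatever its label
cell = df.iloc[2, 1]            # third row, second column

print(by_label["score"], by_pos["name"], cell)   # 0.7 a 0.9
```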


2. Data Structure - DataFrame

● Selecting Data

Method                                   Description
df[df[col] > 0.6]                        Select rows where the column col > 0.6
df[(df[col] > 0.6) & (df[col] < 0.8)]    Select rows where the column 0.8 > col > 0.6
sort_values(col, ascending=False)        Sort values by col in descending order
groupby(col)                             Return a groupby object for values from one column
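A sketch of filtering, sorting and grouping on a small made-up frame. Note the parentheses around each condition when combining them with &.

```python
import pandas as pd

df = pd.DataFrame({
    "group": ["a", "b", "a", "b"],
    "col": [0.5, 0.65, 0.7, 0.9],
})

high = df[df["col"] > 0.6]                           # 3 rows
band = df[(df["col"] > 0.6) & (df["col"] < 0.8)]     # 2 rows
ranked = df.sort_values("col", ascending=False)      # largest first
means = df.groupby("group")["col"].mean()            # one mean per group

print(band["col"].tolist())      # [0.65, 0.7]
print(ranked["col"].tolist())    # [0.9, 0.7, 0.65, 0.5]
print(means)
```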
2. Data Structure - DataFrame

● Statistics

Method Description

describe() Display statistics summary for numerical columns

corr(method) Return the correlation coefficient between columns

count(axis) Return the number of non-null values in each axis

max(axis) Return the highest value in each axis

min(axis) Return the lowest value in each axis


2. Data Structure - DataFrame

● Statistics

Method Description

mean(axis) Return the mean in each axis

median(axis) Return the median of each axis

std(axis) Return the standard deviation of each axis
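A sketch of the statistics methods on made-up data. By default (axis=0) each method produces one result per column; axis=1 produces one result per row.

```python
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0], "y": [2.0, 4.0, 6.0, 8.0]})

print(df.describe())              # count, mean, std, min, quartiles, max per column
print(df.mean().tolist())         # [2.5, 5.0]
print(df.median().tolist())       # [2.5, 5.0]
print(df.count().tolist())        # [4, 4]
print(df.max(axis=1).tolist())    # [2.0, 4.0, 6.0, 8.0]   row-wise maximum
print(df.std())                   # sample standard deviation per column
print(df.corr())                  # y = 2x, so the correlation is 1
```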


2. Data Structure - DataFrame

● Other Methods

Method               Description
isnull()             Check for null values, return a boolean array
notnull()            Opposite of isnull()
dropna(axis)         Drop the rows (axis=0) or columns (axis=1) that contain NaN
fillna(n)            Replace all null values with n
rename(columns)      Rename the columns
replace(old, new)    Replace all old with new
set_index(col)       Change the index by col
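A sketch of the missing-value methods on a small made-up frame. Each call returns a new object and leaves df unchanged.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan]})

print(df.isnull().sum().tolist())     # [1, 1]   missing values per column
rows_kept = df.dropna()               # only the first row has no NaN
cols_kept = df.dropna(axis=1)         # both columns contain NaN, so none remain
filled = df.fillna(0)                 # NaN replaced by 0
renamed = df.rename(columns={"a": "alpha"})

print(len(rows_kept), cols_kept.shape, filled["a"].tolist(), list(renamed.columns))
```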


2. Data Structure - DataFrame
● Other Methods

Method                            Description
value_counts(dropna=False)        View unique values and counts
append(df2)                       Add the rows in df2 to the end of df1 (columns should be
                                  identical). Removed in pandas 2.0; use pd.concat([df1, df2])
                                  instead.
pd.concat([df1, df2], axis=1)     Add the columns in df2 to the right of df1 (rows should be
                                  identical)
join(df2, on=col, how='inner')    SQL-style join of the columns with the columns of df2
                                  where the rows for col have identical values. The 'how'
                                  can be 'left', 'right', 'outer' or 'inner'
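A sketch of combining frames with made-up data. It uses pd.concat for stacking rows and columns, and DataFrame.merge (a close relative of join that matches on a shared column) for the SQL-style join.

```python
import pandas as pd

df1 = pd.DataFrame({"id": [1, 2], "x": [10, 20]})
df2 = pd.DataFrame({"id": [3, 4], "x": [30, 40]})

# Stack rows; ignore_index renumbers the result 0..3
rows = pd.concat([df1, df2], ignore_index=True)

# Place columns side by side; rows are aligned on the index
extra = pd.DataFrame({"y": ["a", "b"]})
cols = pd.concat([df1, extra], axis=1)

# SQL-style inner join on the shared 'id' column
labels = pd.DataFrame({"id": [2, 3], "label": ["two", "three"]})
merged = df1.merge(labels, on="id", how="inner")

print(rows.shape, list(cols.columns), merged.to_dict("records"))
```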

● You can read more on selecting, grouping, merging, reshaping… in the documentation.
3. Import & Export Data

● Read CSV file
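A sketch of reading CSV data with pd.read_csv. To keep the example self-contained, the CSV content (made up) is supplied through io.StringIO; in practice you would pass a file path instead.

```python
import io
import pandas as pd

csv_text = "name,score\nDara,0.9\nSok,0.7\n"

# read_csv accepts a path or any file-like object; the first line becomes the header
df = pd.read_csv(io.StringIO(csv_text))
print(df)

# to_csv without a path returns the CSV as a string; index=False omits the row labels
out = df.to_csv(index=False)
print(out)
```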


3. Import & Export Data

● Other Read Methods

Method                                Description
read_table(filename)                  Read from a delimited text file (TSV: tab-separated values)
read_excel(filename, sheet_name)      Read from an Excel file
read_sql(query, connection_object)    Read from a SQL table/database
read_json(json_string)                Read from a JSON formatted string, URL or file
read_html(url)                        Read tables from an HTML web document
read_clipboard()                      Take the contents of your clipboard and pass it to read_table()
3. Import & Export Data

● Write file

Method Description

to_csv(filename) Write to a CSV file

to_excel(filename) Write to an Excel file

to_sql(table_name, connection_object) Write to a SQL table

to_json(filename) Write to a file in JSON format

● You can read more on I/O functions in the documentation.
4. Data Visualization

● Data visualization is an important task which helps us to understand the nuances of the data.
● Pandas provides a friendly plot method, which wraps the matplotlib library, to visualize the
data in the form of a histogram, line chart, pie chart, scatter chart, etc.
● The following data will be used in the following parts.
4. Data Visualization

● Bar Chart is a graph that is used to present the categorical data with rectangular bars
with heights or lengths proportional to the number of values that they represent.
4. Data Visualization

● Stack Bar Chart is used to break down and compare parts of a whole. Each bar represents a whole,
and segments in the bar represent different parts or categories of that whole. Different colors are
used to illustrate the different categories in the bar.
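A sketch of a bar chart and a stacked bar chart with the pandas plot method. The data set used in the original slides is not available here, so the sales frame below is made up; the Agg backend is selected only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")          # render off-screen
import pandas as pd

sales = pd.DataFrame({
    "product": ["A", "B", "C"],
    "q1": [10, 15, 7],
    "q2": [12, 9, 14],
})

# Bar chart: one bar per category, height proportional to the value
ax_bar = sales.plot(kind="bar", x="product", y="q1", legend=False)
ax_bar.set_ylabel("units")

# Stacked bar chart: each bar is a whole, its segments are the parts
ax_stacked = sales.plot(kind="bar", x="product", y=["q1", "q2"], stacked=True)
ax_stacked.set_ylabel("units")
```

Calling ax_bar.figure.savefig(...) with a file name would write the chart to disk.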
4. Data Visualization

● Pie Chart (Circle Chart) is a circular statistical graphic, which is divided into slices to illustrate
numerical proportion. The arc length of each slice is proportional to the quantity it represents.
4. Data Visualization

● Histogram differs from a bar chart in that it relates to only one continuous variable. It is
constructed by first defining the bins (ranges of values) and then counting how many values fall
into each bin.
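The binning step described above can be sketched with np.histogram, which returns the counts and the bin edges without drawing anything. The values are made up.

```python
import numpy as np

values = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5, 9])

# Four equal-width bins over the range 1..9: [1,3), [3,5), [5,7), [7,9]
counts, edges = np.histogram(values, bins=4, range=(1, 9))

print(edges)     # [1. 3. 5. 7. 9.]
print(counts)    # [3 5 1 1]
```

With pandas, the same binning is drawn by calling plot(kind="hist", bins=4) on a Series.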
4. Data Visualization

● Line Chart is a type of graph which displays information as a series of data points called
markers connected by straight line segments. It is a basic type of chart common in many fields.
4. Data Visualization

● Scatter Plot is a type of mathematical diagram which uses Cartesian coordinates to display the
continuous values of typically two variables.
4. Data Visualization

● Boxplot is a standardized way of displaying the distribution of data based on a five number
summary (“minimum”, first quartile (Q1), median (Q2), third quartile (Q3), and “maximum”).
● It can identify outliers and can tell you if the data is symmetrical, how tightly your data is
grouped, and if and how your data is skewed.
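The numbers behind a boxplot can be computed directly. This sketch uses made-up data and assumes the common 1.5 × IQR rule for flagging outliers, which the slide does not state explicitly.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])

q1, q2, q3 = s.quantile([0.25, 0.5, 0.75])   # quartiles (linear interpolation)
iqr = q3 - q1                                # interquartile range: height of the box

# Assumed rule: points beyond 1.5 * IQR from the box are drawn as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = s[(s < lower_fence) | (s > upper_fence)]

print(q1, q2, q3)           # 3.25 5.5 7.75
print(outliers.tolist())    # [30]
```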

[Figure: boxplot annotated with Min, Q1, Q2 = Median, Q3 and Max]

● You can read more on visualization in the documentation.
4. Data Visualization - Seaborn

● CatPlot visualizes the relationship between a numerical and one or more categorical variables.
4. Data Visualization - Seaborn

● Heat Map is a data visualization technique that shows magnitude of a phenomenon as color in
two dimensions. The variation in color may be by hue or intensity, giving obvious visual cues to
the reader about how the phenomenon is clustered or varies over space.
Q&A
