McCulloch-Pitts Neuron - Perceptron
Perceptron as a binary classifier
Let us reconsider our problem of deciding whether to watch a movie or not.

Suppose we are given a list of m movies and a label (class) associated with each movie indicating whether the user liked this movie or not: a binary decision.

Further, suppose we represent each movie with n features (some boolean, some real-valued):
x1 = isActorRajinikanth
x2 = isGenreFamily
x3 = isDirectorKSRavi
x4 = imdbRating (scaled to 0 to 1)
...
xn = criticsRating (scaled to 0 to 1)

We will assume that the data is linearly separable, and we want a perceptron to learn how to make this decision. In other words, we want the perceptron to find the equation of this separating plane (i.e., find the values of w0, w1, w2, ..., wn).
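The feature representation above can be sketched as a vector; the feature values here are made up for illustration, and x0 = 1 is prepended so the bias w0 folds into the same dot product as the other weights:

```python
# A minimal sketch of one movie's feature vector, using the feature
# names from the text; the values are assumptions for illustration.
movie = {
    "isActorRajinikanth": 1,   # boolean feature
    "isGenreFamily": 0,        # boolean feature
    "isDirectorKSRavi": 0,     # boolean feature
    "imdbRating": 0.82,        # real-valued, scaled to [0, 1]
    "criticsRating": 0.74,     # real-valued, scaled to [0, 1]
}

# Prepend x0 = 1 so the bias w0 is absorbed into the dot product
# w0*x0 + w1*x1 + ... + wn*xn.
x = [1] + list(movie.values())
print(x)  # [1, 1, 0, 0, 0.82, 0.74]
```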
Perceptron Algorithm

Algorithm: Perceptron Learning Algorithm
P ← inputs with label 1;
N ← inputs with label 0;
Initialize w randomly;
while !convergence do
    Pick random x ∈ P ∪ N;
    if x ∈ P and Σ_{i=0}^{n} w_i * x_i < 0 then
        w = w + x;
    end
    if x ∈ N and Σ_{i=0}^{n} w_i * x_i ≥ 0 then
        w = w - x;
    end
end

We are interested in finding the line wᵀx = 0 which divides the input space into two halves. Every point (x) on this line satisfies the equation wᵀx = 0.

What can you tell about the angle (α) between w and any point (x) which lies on this line? Since wᵀx = 0 and cos α = wᵀx / (||w|| ||x||), we have cos α = 0, so the angle is exactly 90°.
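The pseudocode above can be sketched in Python. As an assumption for this sketch, it sweeps over all points each epoch (instead of sampling randomly) so that convergence can be detected with one clean pass, and it caps the number of epochs as a guard; x0 = 1 is prepended to absorb the bias w0:

```python
import random

def perceptron_train(P, N, n_features, max_epochs=1000, seed=0):
    """Perceptron learning algorithm following the pseudocode above.
    P: inputs with label 1, N: inputs with label 0 (plain feature
    tuples; x0 = 1 is prepended here to absorb the bias w0)."""
    rng = random.Random(seed)
    P = [(1,) + tuple(x) for x in P]
    N = [(1,) + tuple(x) for x in N]
    w = [rng.uniform(-1, 1) for _ in range(n_features + 1)]  # random init

    def dot(w, x):
        return sum(wi * xi for wi, xi in zip(w, x))

    for _ in range(max_epochs):      # guard against non-separable data
        misclassified = False
        for x in P + N:              # one full pass per epoch
            s = dot(w, x)
            if x in P and s < 0:         # want w.x >= 0 for positives
                w = [wi + xi for wi, xi in zip(w, x)]
                misclassified = True
            elif x in N and s >= 0:      # want w.x < 0 for negatives
                w = [wi - xi for wi, xi in zip(w, x)]
                misclassified = True
        if not misclassified:
            return w                 # converged: every point is correct
    raise RuntimeError("did not converge (data may not be separable)")

# AND is linearly separable, so the algorithm converges on it.
P = [(1, 1)]                     # label 1
N = [(0, 0), (0, 1), (1, 0)]     # label 0
w = perceptron_train(P, N, n_features=2)
```

By the perceptron convergence theorem, the update rule terminates after finitely many corrections whenever the data is linearly separable, so the epoch cap only matters for non-separable inputs.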
Consider some points (vectors) which lie in the positive half-space of this line (i.e., wᵀx ≥ 0). What will be the angle between any such vector and w? Obviously, less than 90°.

What about points (vectors) which lie in the negative half-space of this line (i.e., wᵀx < 0)? What will be the angle between any such vector and w? Obviously, greater than 90°.
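A quick numeric check of the half-space claims; the weight vector and points here are arbitrary values assumed for illustration:

```python
import math

w = [2.0, -1.0]  # an arbitrary weight vector (assumed for illustration)

def angle_deg(u, v):
    """Angle between two vectors, from cos(a) = u.v / (|u||v|)."""
    dot = sum(ui * vi for ui, vi in zip(u, v))
    return math.degrees(math.acos(dot / (math.hypot(*u) * math.hypot(*v))))

x_pos = [1.0, 1.0]   # w.x = 1  >= 0 -> positive half-space
x_neg = [0.0, 1.0]   # w.x = -1 <  0 -> negative half-space

assert angle_deg(w, x_pos) < 90   # positive side: angle below 90°
assert angle_deg(w, x_neg) > 90   # negative side: angle above 90°
```

The sign of wᵀx alone decides which side of 90° the angle falls on, since the norms in the denominator of cos α are always positive.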
For x ∈ P, if w·x < 0 then it means that the angle (α) between this x and the current w is greater than 90° (but we want α to be less than 90°).

What happens to the new angle (αnew) when wnew = w + x? Since cos(αnew) ∝ wnewᵀx = (w + x)ᵀx = wᵀx + xᵀx > wᵀx ∝ cos α, we get cos(αnew) > cos α, so αnew < α: the update moves the angle in the right direction.
For x ∈ N, if w·x ≥ 0 then it means that the angle (α) between this x and the current w is less than 90° (but we want α to be greater than 90°).

What happens to the new angle (αnew) when wnew = w − x? Since cos(αnew) ∝ wnewᵀx = (w − x)ᵀx = wᵀx − xᵀx < wᵀx ∝ cos α, we get cos(αnew) < cos α, so αnew > α: again the update moves the angle in the right direction.
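Both update rules can be checked numerically. The weight and input values in this sketch are assumptions chosen for illustration; the identities that always hold are (w + x)ᵀx = wᵀx + xᵀx and (w − x)ᵀx = wᵀx − xᵀx, and for these particular values the angle itself also moves the right way:

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def angle_deg(u, v):
    """Angle between u and v via cos(a) = u.v / (|u||v|)."""
    norm = lambda z: math.sqrt(sum(c * c for c in z))
    return math.degrees(math.acos(dot(u, v) / (norm(u) * norm(v))))

w = [-1.0, 2.0]   # assumed current weights (illustrative values only)

# Case 1: x in P but w.x < 0 (angle > 90°); update w_new = w + x.
x = [2.0, -1.0]                                 # w.x = -4 < 0
w_new = [wi + xi for wi, xi in zip(w, x)]
assert dot(w_new, x) == dot(w, x) + dot(x, x)   # (w + x).x = w.x + x.x
assert angle_deg(w_new, x) < angle_deg(w, x)    # angle shrinks here

# Case 2: x in N but w.x >= 0 (angle < 90°); update w_new = w - x.
x = [1.0, 1.0]                                  # w.x = 1 >= 0
w_new = [wi - xi for wi, xi in zip(w, x)]
assert dot(w_new, x) == dot(w, x) - dot(x, x)   # (w - x).x = w.x - x.x
assert angle_deg(w_new, x) > angle_deg(w, x)    # angle grows here
```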
(Figure: an example run of the algorithm, initializing w1 and w2 as 1, and w0 as −1.)
So far:
- What about non-boolean (say, real) inputs? Real-valued inputs are allowed.
- Do we always need to hand-code the threshold? No, we can learn the threshold.
- Are all inputs equal? What if we want to assign more weight (importance) to some inputs? A perceptron allows weights to be assigned to inputs.
- What do we do about functions which are not linearly separable? Not possible with a single perceptron, but we will see how to handle this.
For XOR, writing the condition wᵀx ≥ 0 or wᵀx < 0 for each row of the truth table gives four inequalities on (w0, w1, w2):
(1) w0 < 0 (input (0, 0), output 0)
(2) w0 + w2 ≥ 0 (input (0, 1), output 1)
(3) w0 + w1 ≥ 0 (input (1, 0), output 1)
(4) w0 + w1 + w2 < 0 (input (1, 1), output 0)
The fourth condition contradicts conditions 2 and 3: adding (2) and (3) gives w0 + w1 + w2 ≥ −w0, which is positive by (1). Hence we cannot have a solution to this set of inequalities: XOR is not linearly separable and cannot be computed by a single-layer perceptron.
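The contradiction can also be checked mechanically: sample random weights, and whenever the first three XOR conditions hold, the fourth necessarily fails. Integer weights are used here so the arithmetic is exact; the inequalities are the standard XOR truth-table conditions on (w0, w1, w2):

```python
import random

# XOR conditions on (w0, w1, w2), one per truth-table row:
#   (1) w0 < 0              for (0,0) -> 0
#   (2) w0 + w2 >= 0        for (0,1) -> 1
#   (3) w0 + w1 >= 0        for (1,0) -> 1
#   (4) w0 + w1 + w2 < 0    for (1,1) -> 0
# Adding (2) and (3): w0 + w1 + w2 >= -w0 > 0 by (1), so (4) must fail.

rng = random.Random(0)
checked = 0
for _ in range(20_000):
    w0, w1, w2 = (rng.randint(-50, 50) for _ in range(3))
    if w0 < 0 and w0 + w2 >= 0 and w0 + w1 >= 0:   # (1), (2), (3) hold
        assert w0 + w1 + w2 >= 0                   # ...so (4) is violated
        checked += 1
assert checked > 0   # the branch was actually exercised
```

Sampling does not prove impossibility on its own, but here it just exercises the algebraic argument above: no weight triple can satisfy all four conditions at once.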
What about the XNOR gate (sometimes called ENOR, EXNOR or NXOR, and pronounced "Exclusive NOR")?
Prakash VIT Chennai