CS 416 Artificial Intelligence: Neural Networks
Artificial Intelligence
Lecture 24: Neural Networks
Chapter 20

Neural Networks
• Read Section 20.5
• Small program and homework assignment
Model of Neurons
• Multiple inputs/dendrites (~10,000!!!)
• Cell body/soma performs computation
• Single output/axon
• Computation is typically modeled as linear
  – a k change in input corresponds to a k change in output (not k² or sin…)
Early History of Neural Nets
• Eons ago: Neurons are invented
• 1868: J. C. Maxwell studies feedback mechanisms
• 1943: McCulloch-Pitts Neurons
• 1949: Hebb indicates biological mechanism
• 1962: Rosenblatt’s Perceptron
• 1969: Minsky and Papert decompose perceptrons
McCulloch-Pitts Neurons
• One or two inputs to neuron
• Inputs are multiplied by weights
• If the sum of products exceeds a threshold, the neuron fires
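
A McCulloch-Pitts unit is short enough to state directly in code. The sketch below is an illustration in Python, not from the text; the AND/OR/NOT weights and thresholds are my own choices, with NOT realized through an inhibitory (negative) weight like the -0.5 and -1 values in the figure that follows.

```python
# A minimal McCulloch-Pitts unit: weighted sum compared against a threshold.

def mp_neuron(inputs, weights, threshold):
    """Fire (return 1) if the weighted sum of the inputs exceeds the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total > threshold else 0

# Logical AND: both inputs are needed to clear the threshold.
assert mp_neuron([1, 1], [1, 1], 1.5) == 1
assert mp_neuron([1, 0], [1, 1], 1.5) == 0

# Logical OR: either input clears the threshold.
assert mp_neuron([0, 1], [1, 1], 0.5) == 1
assert mp_neuron([0, 0], [1, 1], 0.5) == 0

# Logical NOT: an inhibitory weight and a negative threshold.
assert mp_neuron([1], [-1], -0.5) == 0
assert mp_neuron([0], [-1], -0.5) == 1
```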
What can we model with these?
[Figure: example McCulloch-Pitts networks with weights -0.5 and -1; the slide notes an error in the book’s version.]
Perceptrons
• Each input is binary and has a weight associated with it
• The inner product of the input and weight vectors is calculated
• If this sum exceeds a threshold, the perceptron fires
Neuron thresholds (activation functions)
• It is desirable to have a differentiable activation function for automatic weight adjustment
http://www.csulb.edu/~cwallis/artificialn/History.htm
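
The distinction is easy to see in code. A minimal Python sketch (my own illustration): the hard threshold used so far has a useless derivative (zero everywhere it exists), while the sigmoid used in the back-propagation section below is differentiable everywhere, with the convenient derivative f(x)·(1 − f(x)).

```python
import math

def step(x, threshold=0.0):
    """Hard threshold: derivative is 0 everywhere it exists, so it gives
    gradient-based weight adjustment nothing to work with."""
    return 1.0 if x > threshold else 0.0

def sigmoid(x):
    """Smooth squashing function, differentiable everywhere."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x):
    """The derivative takes the convenient form f(x) * (1 - f(x))."""
    fx = sigmoid(x)
    return fx * (1.0 - fx)
```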
Hebbian Modification
“When an axon of cell A is near enough to excite
cell B and repeatedly or persistently takes part
in firing it, some growth process or metabolic
change takes place in one or both cells such
that A’s efficiency, as one of the cells firing B, is
increased”
from Hebb’s 1949 The Organization of Behavior,
p. 62
Error Correction
w_i ← w_i + ε·x_i·(c − Θ(x·w))

Only updates weights for non-zero inputs.

For positive inputs:
• If the perceptron should have fired but did not, the weight is increased
• If the perceptron fired but should not have, the weight is decreased
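
As a sketch, here is the rule in Python (assuming binary inputs and the epsilon = 0.05 used in the worked example that follows):

```python
def update_weights(weights, inputs, output, target, epsilon=0.05):
    """One error-correction step: w_i <- w_i + epsilon * x_i * (c - output).

    Weights attached to zero inputs are untouched.  For an active input,
    the weight rises if the perceptron should have fired but did not
    (target 1, output 0) and falls if it fired but should not have.
    """
    error = target - output            # +1, -1, or 0
    return [w + epsilon * x * error for w, x in zip(weights, inputs)]
```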
Perceptron Example
• Example modified from “The Essence of Artificial Intelligence” by Alison Cawsey

  Name     Had 4.0  Male  Studious  Drinker | Gets 4.0
  Richard     1       1      0         1    |    0
  Alan        1       1      1         0    |    1
  Alison      0       0      1         0    |    0
  Jeff        0       1      0         1    |    0
  Gail        1       0      1         1    |    1
  Simon       0       1      1         1    |    0
  Weights    0.2     0.2    0.2       0.2   |

• Initialize all weights to 0.2
• Let epsilon = 0.05 and threshold = 0.5
Perceptron Example
• First output (Richard) is 1 since 0.2+0.2+0.2 > 0.5
• Should be 0, so weights with active connections are decremented by 0.05

  New w: 0.15  0.15  0.2  0.15
Perceptron Example
• Next output (Alan) is 0 since 0.15+0.15+0.2 <= 0.5
• Should be 1, so weights with active connections are incremented by 0.05

  Old w: 0.15  0.15  0.2   0.15
  New w: 0.2   0.2   0.25  0.15

• New weights work for Alison, Jeff, and Gail
Perceptron Example
• Output for Simon is 1 (0.2+0.25+0.15 > 0.5)
• Should be 0, so weights with active connections are decremented by 0.05
• Are we finished?
Perceptron Example
• After processing all the examples again we get weights that work for all examples

  Weights: 0.25  0.1  0.2  0.1

• What do these weights mean?
• In general, how often should we reprocess? (See the sketch below for one answer.)
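
One answer: keep reprocessing until a complete pass over the examples makes no mistakes. A Python sketch of the whole run (my own illustration; exact fractions are used so boundary comparisons such as 0.5 <= 0.5 behave exactly as in the slides), reproducing the final weights above:

```python
from fractions import Fraction

# Feature order: (Had 4.0, Male, Studious, Drinker) -> Gets 4.0
examples = [
    ((1, 1, 0, 1), 0),   # Richard
    ((1, 1, 1, 0), 1),   # Alan
    ((0, 0, 1, 0), 0),   # Alison
    ((0, 1, 0, 1), 0),   # Jeff
    ((1, 0, 1, 1), 1),   # Gail
    ((0, 1, 1, 1), 0),   # Simon
]

epsilon = Fraction(5, 100)             # 0.05
threshold = Fraction(5, 10)            # 0.5
weights = [Fraction(2, 10)] * 4        # all weights start at 0.2

converged = False
while not converged:                   # one iteration = reprocess all examples
    converged = True
    for inputs, target in examples:
        fired = int(sum(w * x for w, x in zip(weights, inputs)) > threshold)
        if fired != target:            # a mistake: adjust the active weights
            converged = False
            weights = [w + epsilon * x * (target - fired)
                       for w, x in zip(weights, inputs)]

print([float(w) for w in weights])     # [0.25, 0.1, 0.2, 0.1], as on the slide
```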
Perceptrons are linear classifiers
Consider a two-input neuron
• Two weights are “tuned” to fit the data
• The neuron uses the sum w1·x1 + w2·x2, compared against the threshold, to fire or not
  – This is like the equation of a line, y = mx + b
http://www.compapp.dcu.ie/~humphrys/Notes/Neural/single.neural.html
Linearly separable
These single-layer perceptron networks can classify linearly separable systems
For homework
Consider a system like XOR:

  x1  x2  x1 XOR x2
   0   0      0
   0   1      1
   1   0      1
   1   1      0
Class Exercise
• Find w1, w2, and theta such that Theta(x1*w1 + x2*w2) = x1 xor x2
• Or, prove that it can’t be done
2nd Class Exercise
• x3 = ~x1, x4 = ~x2
• Find w1, w2, w3, w4, and theta such that Theta(x1*w1 + x2*w2 + x3*w3 + x4*w4) = x1 xor x2
• Or, prove that it can’t be done
3rd Class Exercise
• Find w1, w2, and f() such that f(x1*w1 + x2*w2) = x1 xor x2
• Or, prove that it can’t be done
Multi-layered Perceptrons
• Input layer, output layer, and “hidden” layers
• Eliminates some concerns of Minsky and Papert
• Modification rules are more complicated!
4th Class Exercise
• Find w1, w2, w3, w4, w5, theta1, and theta2 such that the output is x1 xor x2
• Or, prove that it can’t be done
Recent History of Neural Nets
• 1969: Minsky & Papert “kill” neural nets
• 1974: Werbos describes back-propagation
• 1982: Hopfield reinvigorates neural nets
• 1986: Parallel Distributed Processing
• (Here’s some source code: http://www.geocities.com/CapeCanaveral/1624/)
“The report of my death is greatly exaggerated.” – Mark Twain
Limitations of Perceptrons
• Minsky & Papert published “Perceptrons” stressing the limitations of perceptrons
• Single-layer perceptrons cannot solve problems that are linearly inseparable (e.g., xor)
• Most interesting problems are linearly inseparable
• Kills funding for neural nets for 12-15 years
Back-Propagation
• The concept of local error is required
• We’ll examine our simple 3-layer perceptron with xor
Back-Propagation (xor)
• Initial weights are random:
  w1 = 0.90, w2 = -0.54, w3 = 0.21, w4 = -0.03, w5 = 0.78
• Threshold is now sigmoidal (the function should have derivatives):

  f(x·w) = 1 / (1 + e^(−x·w))
Cypher: It means, buckle your seatbelt, Dorothy, because Kansas is going bye-bye.
Back-Propagation (xor)
• Input layer – two units
• Hidden layer – one unit
• Output layer – one unit
• Output is related to input by nested applications of f:

  F(w, x) = f( x1·w3 + f(x1·w1 + x2·w2)·w4 + x2·w5 )

• Performance is defined as

  P = −(1/|T|) · Σ_{(x,c)∈T} (F(w, x) − c)²
“I hate math... so little room to make small errors.” – Caleb Schaefer, UGA student
Back-Propagation (xor)
• Error at the last layer (hidden → output) is defined as:

  δ_o = c − F(w, x)

• Error at the previous layer (input → hidden) is defined as:

  δ_j = Σ_k w_{j→k} · o_k(1 − o_k) · δ_k

• Change in weight:

  Δw_{i→j} = ε · Σ_{(x,c)∈T} ∂P(x,c)/∂w_{i→j}

• Where:

  ∂P(x,c)/∂w_{i→j} = o_i · o_j(1 − o_j) · δ_j
Back-Propagation (xor)
• (0,0) → 0 – 1st example
• Input to hidden unit is 0, sigmoid(0) = 0.5
• Input to output unit is (0.5)(-0.03) = -0.015
• sigmoid(-0.015) = 0.4963, so the error is δ_o = -0.4963
• So, ∂P/∂w4 = (0.5)(0.4963)(1 − 0.4963)(−0.4963) = −0.0620
• Example’s contribution to Δw4 is −0.0062
Why are we ignoring the other weight changes?
Back-Propagation (xor)
• (0,1) → 1 – 2nd example
• i_h = −0.54, o_h = sigmoid(−0.54) = 0.3682
• i_o = (0.3682)(−0.03) + 0.78 = 0.769, o_o = sigmoid(0.769) = 0.6833
• δ_o = 1 − 0.6833 = 0.3167
• ∂P/∂w4 = (0.3682)(0.6833)(1 − 0.6833)(0.3167) = 0.0252
• ∂P/∂w5 = (1)(0.6833)(1 − 0.6833)(0.3167) = 0.0685, &c…
• δ_h = (−0.03)(0.6833)(1 − 0.6833)(0.3167) = −0.0021
• ∂P/∂w2 = (1)(0.3682)(1 − 0.3682)(−0.0021) = −0.0005
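
These per-example numbers can be reproduced with a short Python sketch. The wiring below (w1: x1→hidden, w2: x2→hidden, w3: x1→output, w4: hidden→output, w5: x2→output) is inferred from the worked numbers rather than stated explicitly on the slides:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Initial weights from the slides.
w1, w2, w3, w4, w5 = 0.90, -0.54, 0.21, -0.03, 0.78

def gradients(x1, x2, c):
    """dP/dw for one training pair ((x1, x2), c), per the slide formulas."""
    o_h = sigmoid(x1 * w1 + x2 * w2)               # hidden unit output
    o_o = sigmoid(x1 * w3 + o_h * w4 + x2 * w5)    # network output F(w, x)
    delta_o = c - o_o                              # output-layer error
    g_o = o_o * (1 - o_o) * delta_o
    delta_h = w4 * g_o                             # error propagated to hidden
    g_h = o_h * (1 - o_h) * delta_h
    return {"w1": x1 * g_h, "w2": x2 * g_h,
            "w3": x1 * g_o, "w4": o_h * g_o, "w5": x2 * g_o}

print(round(gradients(0, 0, 0)["w4"], 4))   # -0.062, from the (0,0) slide
print(round(gradients(0, 1, 1)["w4"], 4))   #  0.0252
print(round(gradients(0, 1, 1)["w5"], 4))   #  0.0685
print(round(gradients(0, 1, 1)["w2"], 4))   # -0.0005
# Scaling the -0.0620 gradient by the apparent 0.1 learning rate gives the
# -0.0062 contribution to the change in w4 quoted on the (0,0) slide.
```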
Back-Propagation (xor)
• Initial performance = −0.2696
• After 100 iterations we have:
  w = (0.913, −0.521, 0.036, −0.232, 0.288)
• Performance = −0.2515
• After 100K iterations we have:
  w = (15.75, −7.671, 7.146, −7.149, 0.0022)
• Performance = −0.1880
• After 1M iterations we have:
  w = (21.38, −10.49, 9.798, −9.798, 0.0002)
• Performance = −0.1875
Hopfield Nets
• Created neural nets that have content-addressable memory
• Can reconstruct a learned signal from a fraction of it as an input
• Provided a biological interpretation
What is the Purpose of NN?
• To create an Artificial Intelligence, or…
  – Although not an invalid purpose, many people in the AI community think neural networks do not provide anything that cannot be obtained through other techniques
• To study how the human brain works?
  – Ironically, those studying neural networks with this in mind are more likely to contribute to the previous purpose
Quick List of Terms
• Presynaptic Modification: Synapse weights are only modified when the incoming (afferent) neuron fires
• Postsynaptic Modification: Synapse weights are only modified when the outgoing (efferent) neuron fires
• Error Correction: Synapse weights are modified relative to an error – can be pre- or postsynaptic; requires some form of feedback
• Self-supervised: Synapse weights are modified relative to internal excitation of the neuron – can be pre- or postsynaptic
Self-supervised Neurons
• One example is a neuron that has the following synaptic modification rule:

  Δw_ij = ε · y_j · (x_i − w_ij)

  y_j = Σ_i x_i·w_ij = x^T·w_j        (internal excitation)

• Setting the expected update to zero gives convergence of the weights:

  0 = E[x_i·y_j] − E[y_j]·w_ij

  E[x·x^T]·w_j = E[y_j]·w_j

  Eigenvalue equation!
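
A small simulation sketch (my own illustration, assuming a linear excitation y = x·w and a small learning rate; inputs are drawn with a nonzero mean so that E[y_j] does not vanish) showing the eigenvalue relation numerically:

```python
import numpy as np

rng = np.random.default_rng(0)

# Inputs with a nonzero mean (uniform on [0, 1]).
X = rng.uniform(0.0, 1.0, size=(5000, 3))

w = rng.uniform(0.1, 0.5, size=3)   # one neuron's weight vector
eps = 0.01
for x in X:
    y = x @ w                       # internal excitation y = x . w
    w += eps * y * (x - w)          # the self-supervised update rule

# At convergence E[x x^T] w = E[y] w, so C w / E[y] should come back
# approximately equal to w itself.
C = (X.T @ X) / len(X)              # sample estimate of E[x x^T]
y_mean = (X @ w).mean()             # sample estimate of E[y]
print(C @ w / y_mean)
print(w)
```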
More Self-Supervision
• The previous rule could not learn to distinguish between different classes of data
• However, if the rule is modified so that y_j is an all-or-none (thresholded) output rather than the linear excitation:

  Δw_ij = ε · y_j · (x_i − w_ij)

• The neuron will learn to only respond to a certain class of inputs
• Different neurons respond to different classes
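
Read this way, the modified rule behaves like competitive learning. A sketch (my own illustration; the cluster centers, learning rate, and winner-take-all gating are assumptions, not from the text) in which two neurons come to respond to two different classes of inputs:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two classes of inputs: tight clusters around two illustrative centers.
centers = np.array([[1.0, 0.0], [0.0, 1.0]])
X = np.vstack([c + 0.1 * rng.standard_normal((500, 2)) for c in centers])
rng.shuffle(X)

# Two neurons, initialized on data points so neither starts out "dead".
W = X[:2].copy()
eps = 0.05
for x in X:
    j = int(np.argmax(W @ x))      # most-excited neuron wins: y_j = 1, others 0
    W[j] += eps * (x - W[j])       # only the winner's weights move toward x

print(W)   # each row typically settles near one cluster center
```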
Some Brain Facts
• Contains ~100,000,000,000 neurons
• Hippocampus CA3 region contains ~3,000,000 neurons
• Each neuron is connected to ~10,000 other neurons
• ~1,000,000,000,000,000 (10^15) connections!
• Contrary to BrainPlace.com, this is considerably less than the number of stars in the universe – 10^20 to 10^22
• Consumes ~20-30% of the body’s energy
• Contains about 2% of the body’s mass