ECE/CS 559 - Neural Networks Lecture Notes #2 Mathematical Models For The Neuron, Neural Network Architectures
• Neuron: A highly-specialized cell that can transmit and receive information through electrical/chemical
signaling. The fundamental building block of the brain.
• Synapse: The structure that connects two given neurons, i.e. the functional unit that mediates the
interactions between neurons.
• A chemical synapse: Electrical signal through axon of one neuron → chemical signal / neurotransmitter
→ postsynaptic electrical signal at the dendrite of another neuron.
• Information flows from dendrites to axons: Dendrite (of neuron A) → Cell body (Soma) → Axon
→ Synapses → Dendrite (we are now at neuron B) → Cell body (Soma) → Axon → Synapses → · · · .
• A neuron can receive thousands of synaptic contacts and it can project onto thousands of target
neurons.
• A synapse can impose excitation or inhibition (but not both) on the target neuron. The “strength” of
excitation/inhibition may vary from one synapse to another.
• There are thousands of different types of neurons. There are, however, 3 main categories.
1.2 A mathematical model for the neuron
(Image taken from our coursebook: S. Haykin, “Neural Networks and Learning Machines,” 3rd ed.)
• Input signals are from dendrites of other neurons.
• The synaptic weights correspond to the synaptic strengths: positivity/negativity → excitation/inhibition.
• The summing unit models the operation of the cell body (soma).
(Image taken from our coursebook: S. Haykin, “Neural Networks and Learning Machines,” 3rd ed.)
• y_k = φ( ∑_{j=0}^{n} w_{kj} x_j ). Note that the summation now starts from index 0: the extra input is fixed at x_0 = +1, so the corresponding weight w_{k0} plays the role of the bias.
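As a quick illustration (not from the coursebook; the names and numbers below are made up), the neuron model above can be computed in a few lines of NumPy, with x_0 fixed to 1 so that w_{k0} acts as the bias:

```python
import numpy as np

def neuron_output(w, x, phi):
    """y = phi(sum_{j=0}^{n} w_j * x_j); x[0] is fixed to 1 so that w[0]
    acts as the bias term."""
    v = np.dot(w, x)          # induced local field (weighted sum)
    return phi(v)

# Hypothetical numbers: n = 3 inputs plus the fixed input x_0 = 1.
x = np.array([1.0, 0.5, -1.2, 2.0])   # x[0] = 1 multiplies the bias w[0]
w = np.array([0.1, 0.8, -0.3, 0.4])   # w[0] is the bias, w[1:] are synaptic weights
y = neuron_output(w, x, phi=np.tanh)  # any activation phi can be plugged in
```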
1.3.1 Step function
• Threshold function (or the Heaviside/step function):
φ(v) = 1 for v ≥ 0, and φ(v) = 0 for v < 0.
(Image taken from our coursebook: S. Haykin, “Neural Networks and Learning Machines,” 3rd ed.)
(Image taken from our coursebook: S. Haykin, “Neural Networks and Learning Machines,” 3rd ed.)
• Hyperbolic tangent function:
φ(v) = tanh(av) = (e^{av} − e^{−av}) / (e^{av} + e^{−av}),
for some parameter a > 0. It approaches the signum function as a → ∞.
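A minimal NumPy sketch of the two activation functions above (the test values are arbitrary):

```python
import numpy as np

def step(v):
    """Heaviside/threshold function: 1 if v >= 0, else 0."""
    return np.where(v >= 0, 1.0, 0.0)

def tanh_activation(v, a=1.0):
    """phi(v) = tanh(a*v) for some a > 0; approaches the signum function as a grows."""
    return np.tanh(a * v)

v = np.linspace(-2.0, 2.0, 5)     # [-2, -1, 0, 1, 2]
print(step(v))                    # [0. 0. 1. 1. 1.]
print(tanh_activation(v, a=50.0)) # close to sign(v) for large a
```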
2 Neural network architectures
Having introduced our basic mathematical model of a neuron, we now present the different neural network
architectures that we will keep revisiting throughout the course.
(Image taken from our coursebook: S. Haykin, “Neural Networks and Learning Machines,” 3rd ed.)
Figure 1: Single-layer feedforward network.
• When counting layers, we only count the layers consisting of neurons (not the layer of source nodes, as no computation is performed there). Thus, the network in Fig. 1 is called a single-layer network.
• Also, note that information flows in only one direction: the input layer of source nodes projects directly onto the output layer of neurons (according to the non-linear transformations specified by the neurons). There is no feedback from the network’s output to the network’s input. We thus say that the network in Fig. 1 is of the feedforward type.
• Let us try to formulate the input-output relationship of the network. Putting in the symbols, we have
the following diagram:
We have
y_1 = φ( b_1 + ∑_{i=1}^{n} w_{1i} x_i ),
y_2 = φ( b_2 + ∑_{i=1}^{n} w_{2i} x_i ),
⋮
y_k = φ( b_k + ∑_{i=1}^{n} w_{ki} x_i ).
Here w_{ji} is the weight from input i to neuron j. We can further rewrite everything in a simple matrix form.
Define
y = (y_1, y_2, …, y_k)^T, x′ = (1, x_1, x_2, …, x_n)^T, and W′ to be the k × (n + 1) matrix whose jth row is (b_j, w_{j1}, …, w_{jn}).
Then, the above input-output relationship can simply be written as y = φ(W′x′), in the sense that φ is applied
component-wise. Sometimes biases are treated separately. Defining
b = (b_1, b_2, …, b_k)^T, x = (x_1, x_2, …, x_n)^T, and W to be the k × n matrix whose (j, i) entry is w_{ji},
we can write y = φ(W′x′) = φ(Wx + b).
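As a sanity check, here is a small NumPy sketch (the dimensions and random seed are arbitrary choices, not from the notes) showing that the augmented form y = φ(W′x′) and the bias form y = φ(Wx + b) produce the same output:

```python
import numpy as np

rng = np.random.default_rng(0)
k, n = 3, 4                        # k neurons, n inputs (arbitrary sizes)
W = rng.standard_normal((k, n))    # W[j, i]: weight from input i to neuron j
b = rng.standard_normal(k)         # biases b_1, ..., b_k
x = rng.standard_normal(n)         # input vector
phi = np.tanh                      # any component-wise activation

y_bias = phi(W @ x + b)                 # y = phi(Wx + b)
W_aug = np.hstack([b[:, None], W])      # W' = [b | W], shape (k, n+1)
x_aug = np.concatenate(([1.0], x))      # x' = (1, x_1, ..., x_n)
y_aug = phi(W_aug @ x_aug)              # y = phi(W'x')

assert np.allclose(y_bias, y_aug)       # the two formulations agree
```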
(Image taken from our coursebook: S. Haykin, “Neural Networks and Learning Machines,” 3rd ed.)
Figure 2: Fully-connected 2-layer feedforward network with one hidden layer and one output layer.
• We call a network with m source nodes in its input layer, h_1 hidden nodes in its first hidden layer, h_2 nodes in its second hidden layer, . . ., h_K nodes in its Kth hidden layer, and finally q nodes in its output layer an m-h_1-h_2-· · ·-h_K-q network. For example, the network in Fig. 2 is called a 10-4-2 network, as it has 10 source nodes in its input layer, 4 nodes in the hidden layer, and 2 nodes in its output layer.
• Fully-connected network: Every node in each layer of the network is connected to every node in the adjacent layer. Example: The network in Fig. 2 is fully connected. Otherwise, the network is partially connected.
• Deep network: Many (usually assumed to be > 1) hidden layers. Shallow network: The opposite.
• As will be made more precise later on, theoretically, only one hidden layer is sufficient for almost any application, provided that one can afford a large number of neurons. On the other hand, a deep network can perform the same tasks as a shallow network, with the possible advantage of requiring fewer neurons. Hence, deep networks, provided that they can be properly designed, may be better suited for complex practical applications.
• The input-output relationships may be formulated in a similar manner as for the single-layer network discussed previously. For example, consider a two-layer network with n inputs, L_1 neurons in the first layer, and L_2 neurons in the second (output) layer. Let x ∈ R^{n×1} be the vector of inputs, W_1 ∈ R^{L_1×n} be the matrix of weights connecting the input layer to the first layer of neurons (where the entry in the ith row and jth column of W_1 is the weight between input node j and neuron i of the first layer), b_1 ∈ R^{L_1×1} be the vector of biases for the first layer of neurons, W_2 ∈ R^{L_2×L_1} be the matrix of weights connecting the first layer of neurons to the second layer of neurons (where the entry in the ith row and jth column of W_2 is the weight between neuron j of the first layer and neuron i of the second layer), b_2 ∈ R^{L_2×1} be the vector of biases for the second layer of neurons, and y ∈ R^{L_2×1} be the vector of outputs. Then, we have y = φ(W_2 φ(W_1 x + b_1) + b_2).
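A minimal NumPy sketch of this two-layer forward pass, using the 10-4-2 sizes of Fig. 2 as an example (the function name and random values are illustrative only):

```python
import numpy as np

def two_layer_forward(x, W1, b1, W2, b2, phi=np.tanh):
    """Compute y = phi(W2 @ phi(W1 @ x + b1) + b2), phi applied component-wise."""
    h = phi(W1 @ x + b1)     # first-layer (hidden) outputs, shape (L1,)
    return phi(W2 @ h + b2)  # second-layer (output) values, shape (L2,)

# Sizes for a 10-4-2 network: n = 10 inputs, L1 = 4 hidden neurons, L2 = 2 outputs.
rng = np.random.default_rng(1)
n, L1, L2 = 10, 4, 2
x  = rng.standard_normal(n)
W1 = rng.standard_normal((L1, n)); b1 = rng.standard_normal(L1)
W2 = rng.standard_normal((L2, L1)); b2 = rng.standard_normal(L2)
y = two_layer_forward(x, W1, b1, W2, b2)   # output vector, shape (2,)
```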
(Image taken from our coursebook: S. Haykin, “Neural Networks and Learning Machines,” 3rd ed.)
Figure 3: Recurrent network with no self-feedback loops.
• What distinguishes recurrent networks from feedforward networks is that they incorporate feedback.
• In Fig. 3, the boxes labeled z^{-1} represent unit discrete-time delays (a rough code sketch of the feedback idea follows this list).
• No self-feedback: The output of a given neuron is not fed back to its own input.
• One may have variations of the structure in Fig. 3. We shall discuss these variations and the details
of recurrent networks later on.
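Details of recurrent networks come later; as a rough, hypothetical sketch of the feedback idea only (not the exact structure of Fig. 3), each neuron's output is delayed by one time step and fed back as an additional input, with the diagonal of the feedback matrix zeroed out to respect the no-self-feedback condition:

```python
import numpy as np

def recurrent_step(x_t, y_prev, W_in, W_fb, b, phi=np.tanh):
    """One time step: the previous outputs y_prev (passed through a unit delay
    z^-1) are fed back as extra inputs alongside the current input x_t."""
    return phi(W_in @ x_t + W_fb @ y_prev + b)

rng = np.random.default_rng(2)
W_in = rng.standard_normal((2, 3))   # 3 external inputs, 2 neurons (arbitrary sizes)
W_fb = rng.standard_normal((2, 2))   # feedback weights between neurons
np.fill_diagonal(W_fb, 0.0)          # no self-feedback: a neuron does not feed itself
b = rng.standard_normal(2)

y = np.zeros(2)                      # initial (delayed) outputs
for t in range(5):                   # run the network for a few time steps
    x_t = rng.standard_normal(3)
    y = recurrent_step(x_t, y, W_in, W_fb, b)
```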