Lecture 4
Iris setosa
www.mathworks.com/help/toolbox/stats/bqzdnrv-1.html
www.fs.fed.us/wildflowers/beauty/iris/blueflag/
Your first workshop task (using J48 in Weka) was to see if there was enough information in the four attributes of petal length and width and sepal length and width to distinguish and classify these similar-looking flowers. Notice that colour may not help here. Your next workshop will involve using ANNs to distinguish between these flowers.
Iris virginica
www.fs.fed.us/wildflowers/beauty/iris/blueflag/
Artificial Neural Networks
Images of the brain. Top left: photo; top right: what is currently known; left: close up showing the brain consisting of layers of interconnected nerve cells (neurons) and tissue.
All images taken from www.idsia.ch/NNcourse/brain.html4
Neurons
Artificial
http://research.yale.edu/ysm/images/78.2/articles-neural-neuron.jpg
http://faculty.washington.edu/chudler/color/pic1an.gif
http://www.frontiersin.org/neuromorphic_engineering/10.3389/fnins.2011.00026/full
Biological computing through action potential spikes. A: abstract physiology. B: biochemistry. Physiology and biochemistry lead to spikes; can spikes be used for computing?
The post-synaptic neuron spikes more (or less) depending on pre-synaptic behaviour
Action potential spiking can take place many times per second, depending on which part of the brain we look at
http://www.socialbehavior.uzh.ch/teaching/ComputationalNeuroeconomicsFS11/Chapter10.pptx
http://ilab.usc.edu/classes/2002cs561/notes/session28.ppt
Feedforward
http://www.cs.umbc.edu/~ypeng/F04NN/lecture-notes/NN-Ch1.ppt
Introduction
What is an (artificial) neural network?
- A set of nodes (units, neurons, processing elements)
- Each node has input and output
- Each node performs a simple computation by its node function
- Weighted connections between nodes
- Connectivity gives the structure/architecture of the net
- What can be computed by an NN is primarily determined by the connections and their weights
- A very much simplified version of networks of neurons in animal nerve systems
- The neuron is the basic computational unit (a primitive processor), not a program
http://www.cs.umbc.edu/~ypeng/F04NN/lecture-notes/NN-Ch1.ppt
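To make this concrete, here is a minimal Python sketch (not from the lecture; the function and variable names are illustrative) of a single node that computes a simple function of its weighted inputs, showing how the weights, rather than a program, determine what is computed:

# Minimal sketch of one node: output = node_function(weighted sum of inputs)
def node_output(inputs, weights, node_function):
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return node_function(weighted_sum)

# Example node function: a hard threshold (the node either "fires" or not)
def step(s):
    return 1 if s > 0 else 0

# Same node, same node function: different weights give different behaviour
print(node_output([1, 0, 1], [0.5, 0.2, 0.8], step))    # -> 1
print(node_output([1, 0, 1], [-0.5, 0.2, -0.8], step))  # -> 0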
Introduction
Von Neumann machine:
- One or a few high-speed (ns) processors with considerable computing power
- One or a few shared high-speed buses for communication
- Sequential memory access by address
- Problem-solving knowledge is separated from the computing component
- Hard to be adaptive

Human Brain:
- Large number (~10^11) of low-speed (ms) processors with limited computing power
- Large number (~10^15) of low-speed connections
- Problem-solving knowledge resides in the connectivity of neurons
- Adaptation by changing the connectivity
- Easily adapts for learning
- Fault tolerant
http://www.cs.umbc.edu/~ypeng/F04NN/lecture-notes/NN-Ch1.ppt
Introduction
ANN vs Bio NN correspondence:
- cell body -> node (unit)
- signal from other neurons -> input
- firing frequency -> node output
- firing mechanism -> node function
- synapses -> connections
- synaptic strength -> connection weight
Highly parallel: simple local computation (at the neuron level) achieves global results as an emerging property of the interaction (at the network level). Pattern directed: individual nodes have meaning only in the context of a pattern. Fault-tolerant / graceful degradation. Learning/adaptation plays an important role.
http://www.cs.umbc.edu/~ypeng/F04NN/lecture-notes/NN-Ch1.ppt
History of NN
McCulloch & Pitts (1943)
First mathematical model of biological neurons. All Boolean operations can be implemented by these neuron-like nodes (with different thresholds and excitatory/inhibitory connections). A competitor to the von Neumann model as a general-purpose computing device. Origin of automata theory.
Hebb (1949)
Hebbian rule of learning: increase the connection strength between neurons i and j whenever both i and j are activated. Or increase the connection strength between nodes i and j whenever both nodes are simultaneously ON or OFF.
http://www.cs.umbc.edu/~ypeng/F04NN/lecture-notes/NN-Ch1.ppt
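A hedged Python sketch of the Hebbian rule stated above (the names eta and w_ij are illustrative, not from the lecture): the connection strength between nodes i and j grows when both are active together.

def hebbian_update(w_ij, x_i, x_j, eta=0.1):
    # delta w_ij = eta * x_i * x_j: increases only when i and j are both active
    return w_ij + eta * x_i * x_j

w = 0.0
w = hebbian_update(w, x_i=1, x_j=1)   # both ON  -> strengthened to 0.1
w = hebbian_update(w, x_i=1, x_j=0)   # only one -> unchanged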
History of NN
Early boom (1950s to early 1960s): Rosenblatt (1958)
Perceptron: a network of threshold nodes for pattern classification. Perceptron convergence theorem: everything that can be represented by a perceptron can be learned.
A neuron only fires if its input signal exceeds a certain amount (the threshold) in a short time period. Synapses vary in strength: good connections allow a large signal, while slight connections allow only a weak signal. Synapses can be either excitatory or inhibitory.
Perceptron
Linear threshold unit (LTU)
Inputs x1, x2, ..., xn with weights w1, w2, ..., wn. Usually, an extra constant input x0 = 1 with weight w0 is included to ensure that, even if all the inputs are 0, a non-zero value is fed into the threshold function.

o(x1, ..., xn) = 1 if the sum over i = 0..n of wi*xi > 0, and -1 otherwise

Threshold function: if the sum of weighted inputs is greater than 0, output 1, else output -1.
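As a minimal sketch (illustrative names, not the lecture's own code), the LTU defined above can be written as:

def ltu_output(x, w):
    # x = [x1, ..., xn]; w = [w0, w1, ..., wn], with w0 the weight on the
    # constant bias input x0 = 1
    s = w[0] * 1 + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1 if s > 0 else -1

print(ltu_output([1, 0, 1], [-0.3, 0.5, 0.2, 0.8]))  # -> 1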
If the output is correct (t = o), the weights wi are not changed. If the output is incorrect (t ≠ o), the weights wi are changed so that the output of the perceptron for the adjusted weights is closer to t. The algorithm converges to the correct classification after repeated presentations of the samples, provided the training data is linearly separable and the learning rate is sufficiently small.
1. Samples are fed in one at a time to the input units; repeat many times
3. The actual output is stored in a temporary file to allow calculation of the error (the difference between desired output and actual output); errors can be summed
Training an ANN (perceptron) for the AND function
Perceptron with bias input -1 (weight W1 = ?), input A (weight W2 = ?), input B (weight W3 = ?); threshold t = 0.0. Output is +1 if t is exceeded, 0 otherwise.
For AND: A B | Output
0 0 | 0
0 1 | 0
1 0 | 0
1 1 | 1
Initialize with random weight values. Introduce an additional input constant (called the bias) to ensure that the threshold function receives some input even if all other inputs are zero.
Training Perceptrons
Perceptron with bias input -1 (weight W1 = 0.3), input A (weight W2 = 0.5), input B (weight W3 = -0.4); threshold t = 0.0; output y.
For AND: A B | Output
0 0 | 0
0 1 | 0
1 0 | 0
1 1 | 1

I1  I2  I3 | Summation                             | Output
-1   0   0 | (-1*0.3) + (0*0.5) + (0*-0.4) = -0.3  | 0
-1   0   1 | (-1*0.3) + (0*0.5) + (1*-0.4) = -0.7  | 0
-1   1   0 | (-1*0.3) + (1*0.5) + (0*-0.4) = 0.2   | 1
-1   1   1 | (-1*0.3) + (1*0.5) + (1*-0.4) = -0.2  | 0
Given the current weights, this perceptron does not produce the correct results for two combinations of AND: 1 0 and 1 1.
Exercise: fill in the values in the summation table to determine whether this perceptron correctly performs the AND function.
Perceptron with bias input -1 (weight W1 = 0.4), input A (weight W2 = 0.7), input B (weight W3 = -0.2); threshold t = 0.0; output y.
For AND: A B | Output
0 0 | 0
0 1 | 0
1 0 | 0
1 1 | 1

I1  I2  I3 | Summation | Output
-1   0   0 |           |
-1   0   1 |           |
-1   1   0 |           |
-1   1   1 |           |
Solution to Exercise
Perceptron with bias input -1 (weight W1 = 0.4), input A (weight W2 = 0.7), input B (weight W3 = -0.2); threshold t = 0.0; output y.

I1  I2  I3 | Summation                             | Output
-1   0   0 | (-1*0.4) + (0*0.7) + (0*-0.2) = -0.4  | 0
-1   0   1 | (-1*0.4) + (0*0.7) + (1*-0.2) = -0.6  | 0
-1   1   0 | (-1*0.4) + (1*0.7) + (0*-0.2) = 0.3   | 1
-1   1   1 | (-1*0.4) + (1*0.7) + (1*-0.2) = 0.1   | 1
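The summations above can be checked with a short sketch (illustrative, not part of the lecture), using the exercise weights W1 = 0.4, W2 = 0.7, W3 = -0.2 and threshold t = 0.0:

weights = [0.4, 0.7, -0.2]
t = 0.0
for inputs in ([-1, 0, 0], [-1, 0, 1], [-1, 1, 0], [-1, 1, 1]):
    s = sum(w * x for w, x in zip(weights, inputs))
    output = 1 if s > t else 0
    print(inputs, round(s, 2), output)
# prints summations -0.4, -0.6, 0.3, 0.1 with outputs 0, 0, 1, 1 (as in the table)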
Weight adjustment
With W1 = 0.3, W2 = 0.5, W3 = -0.4:
I1  I2  I3 | Summation                             | Output
-1   0   0 | (-1*0.3) + (0*0.5) + (0*-0.4) = -0.3  | 0
-1   0   1 | (-1*0.3) + (0*0.5) + (1*-0.4) = -0.7  | 0
-1   1   0 | (-1*0.3) + (1*0.5) + (0*-0.4) = 0.2   | 1
-1   1   1 | (-1*0.3) + (1*0.5) + (1*-0.4) = -0.2  | 0

With W1 = 0.4, W2 = 0.7, W3 = -0.2:
I1  I2  I3 | Summation                             | Output
-1   0   0 | (-1*0.4) + (0*0.7) + (0*-0.2) = -0.4  | 0
-1   0   1 | (-1*0.4) + (0*0.7) + (1*-0.2) = -0.6  | 0
-1   1   0 | (-1*0.4) + (1*0.7) + (0*-0.2) = 0.3   | 1
-1   1   1 | (-1*0.4) + (1*0.7) + (1*-0.2) = 0.1   | 1
Note that the summation results are in the right direction for producing the correct output for 1 1 but not for 1 0 (given threshold of t=0.0)
Learning algorithm
Epoch: presentation of the entire training set to the neural network. In the case of the AND function an epoch consists of four sets of inputs being presented to the network (i.e. [0,0], [0,1], [1,0], [1,1]).
Error: the amount by which the value output by the network differs from the target value. For example, if we required the network to output 0 and it output a 1, then Error = -1.
Learning question 1: Is there a set of weights that will produce the correct results for all input values of the AND function?
Learning question 2: If so, how do we find these weights automatically rather than manually adjusting the weights?
Weight adjustment for input 1 0 (target 0, actual output 1), with a learning rate of 0.1:
Error = (0 - 1) = -1
Weight change for w1 = 0.1 x -1 x -1 = 0.1 (w1 = 0.4 + 0.1 = 0.5)
Weight change for w2 = 0.1 x -1 x 1 = -0.1 (w2 = 0.7 - 0.1 = 0.6)
Weight change for w3 = 0.1 x -1 x 0 = 0 (w3 = -0.2, no change)
With the adjusted weights W1 = 0.5, W2 = 0.6, W3 = -0.2, the next time 1 0 is presented:
I1  I2  I3 | Summation                            | Output
-1   1   0 | (-1*0.5) + (1*0.6) + (0*-0.2) = 0.1  | 1
That is, while the output is still wrong, the summation for input 1 0 has fallen from 0.3 to 0.1. At least one more presentation of 1 0 is required to produce the desired output 0, given the threshold t = 0.0.
Next presentation
Error = (0 - 1) = -1
Weight change for w1 = 0.1 x -1 x -1 = 0.1 (w1 = 0.5 + 0.1 = 0.6)
Weight change for w2 = 0.1 x -1 x 1 = -0.1 (w2 = 0.6 - 0.1 = 0.5)
Weight change for w3 = 0.1 x -1 x 0 = 0 (w3 = -0.2, no change)
When 1 0 is presented again:
I1  I2  I3 | Summation                             | Output
-1   1   0 | (-1*0.6) + (1*0.5) + (0*-0.2) = -0.1  | 0
Since the summation for 1 0 is now below 0, the correct output for AND(1,0) = 0 is produced. We can now move on to the next wrongly classified sample, 1 1. One can change the weights after processing a single pattern, or accumulate weight error values over a batch of patterns before changing the weights; the latter allows all patterns to be presented to the perceptron's existing weights before the weights are changed.
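The whole procedure (per-pattern updates, learning rate 0.1, threshold t = 0.0, weight change = learning rate x error x input) can be sketched as follows. This is an illustrative reconstruction, not the lecture's own code, starting from the exercise weights:

def train_and_perceptron(weights=(0.4, 0.7, -0.2), eta=0.1, epochs=20):
    # Each pattern is (bias, A, B) with target AND(A, B)
    data = [([-1, 0, 0], 0), ([-1, 0, 1], 0), ([-1, 1, 0], 0), ([-1, 1, 1], 1)]
    weights = list(weights)
    for _ in range(epochs):
        for inputs, target in data:
            s = sum(w * x for w, x in zip(weights, inputs))
            output = 1 if s > 0 else 0
            error = target - output                  # e.g. 0 - 1 = -1
            # weight change = learning rate x error x input
            weights = [w + eta * error * x for w, x in zip(weights, inputs)]
    return weights

print(train_and_perceptron())   # weights that classify AND correctly

Since AND is linearly separable, repeated presentations converge, exactly as the convergence theorem stated earlier.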
http://www.cs.umbc.edu/~ypeng/F04NN/lecture-notes/NN-Ch1.ppt
History of NN
The setback (mid-1960s to late 1970s)
Serious problems with the perceptron model (Minsky and Papert's book, 1969):
- Single-layer perceptrons cannot represent (learn) simple functions such as XOR
- Multi-layer networks of non-linear units may have greater power, but there was no learning rule for such nets
- Scaling problem: connection weights may grow infinitely
The first two problems were overcome by later efforts in the 1980s, but the scaling problem persists.
Death of Rosenblatt (1971).
Meanwhile, the von Neumann machine and symbolic AI thrived.
http://www.cs.umbc.edu/~ypeng/F04NN/lecture-notes/NN-Ch1.ppt
History of NN
Renewed enthusiasm and flourishing (1980s to present)
New techniques
- Backpropagation learning for multi-layer feed-forward nets (with non-linear, differentiable node functions)
- Thermodynamic models (Hopfield net, Boltzmann machine, etc.)
- Unsupervised learning
Impressive applications (character recognition, speech recognition, text-to-speech transformation, process control, associative memory, etc.). ANNs are now a preferred computational method in many applications (e.g. pattern recognition).
http://ilab.usc.edu/classes/2002cs561/notes/session28.ppt
Information is distributed
[Diagram: feed-forward network with an Input Layer, a Hidden Layer and an Output Layer]
The output layer can contain more than one unit. E.g. output classes 1/0 can be represented by one unit, or by two units (1 0 for class 1 and 0 1 for class 2).
One can change weights after processing a single pattern or accumulate weight error values over a batch of patterns before changing the weights.
Why back-propagation?
Each weight shares the blame for the prediction error with the other weights. The back-propagation algorithm decides how to distribute the blame among all the weights and adjusts the weights accordingly: a small portion of the blame leads to a small adjustment, a large portion of the blame leads to a large adjustment.
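As a hedged illustration of how that blame can be distributed, here is standard textbook back-propagation for a tiny 2-2-1 network with logistic units (the weights and names are made up for the sketch, not taken from the lecture):

import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

# W_hid[j][i] connects input i to hidden unit j; W_out[j] connects hidden j to the output
W_hid = [[0.1, 0.4], [-0.2, 0.3]]
W_out = [0.2, -0.5]
eta = 0.5

def backprop_step(x, target):
    global W_hid, W_out
    # Forward pass
    h = [sigmoid(sum(w * xi for w, xi in zip(W_hid[j], x))) for j in range(2)]
    o = sigmoid(sum(w * hj for w, hj in zip(W_out, h)))
    # Blame (delta) at the output, then shared back to the hidden units
    delta_o = (target - o) * o * (1 - o)
    delta_h = [h[j] * (1 - h[j]) * W_out[j] * delta_o for j in range(2)]
    # Each weight is adjusted in proportion to its share of the blame
    W_out = [W_out[j] + eta * delta_o * h[j] for j in range(2)]
    W_hid = [[W_hid[j][i] + eta * delta_h[j] * x[i] for i in range(2)] for j in range(2)]

backprop_step([1, 0], 1)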
Assume three inputs and one output; 1 0 1 is the pattern at the input nodes, with 0 the target (the actual output is 1, so the error is 0 - 1 = -1). Starting weights: w1 = 0.5, w2 = 0.2, w3 = 0.8.

Assume the learning rate is 1:
w1new = 0.5 + 1*(0-1)*1 = -0.5
w2new = 0.2 + 1*(0-1)*0 = 0.2
w3new = 0.8 + 1*(0-1)*1 = -0.2

A large learning rate can lead to weight oscillation. Assume instead a learning rate of 0.2:
w1new = 0.5 + 0.2*(0-1)*1 = 0.3
w2new = 0.2 + 0.2*(0-1)*0 = 0.2
w3new = 0.8 + 0.2*(0-1)*1 = 0.6

Note how the weights that are more to blame (those connected to active inputs) are changed, while w2 is left unchanged.
Transfer functions: the transfer function is usually the same for every unit in the same layer. There are various choices of transfer / activation function that determine what is output from a neuron.
Choice of transfer function for one output unit depends on the class information:
1. If the classes are 0 and 1, use logistic or threshold
2. If non-binary, use tanh (e.g. -1, 0, 1 for tripartite classification)
3. If there are N classes, a class is represented as (0, ..., 0, 1, 0, ..., 0) at the output layer (i.e. as many output nodes as distinct class values)
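A small sketch of those transfer-function choices (standard definitions; the function names are illustrative):

import math

def logistic(s):            # outputs in (0, 1): suits 0/1 class labels
    return 1.0 / (1.0 + math.exp(-s))

def threshold(s, t=0.0):    # hard 0/1 output
    return 1 if s > t else 0

def tanh(s):                # outputs in (-1, 1): suits e.g. -1 / 0 / 1 labels
    return math.tanh(s)

# For N classes, the target is one unit per class, e.g. class 2 of 4:
target = [0, 1, 0, 0]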
Local Minima
Advantages of back-propagation:
- Relatively simple implementation
- Standard method and generally works well
Disadvantages of back-propagation:
- Slow and inefficient
- Can get stuck in local minima, resulting in sub-optimal solutions
[Diagram: error surface showing a local minimum and the global minimum]
Weights can get stuck through gradient descent (i.e. the error has reached a local minimum). Gradient descent must be amended to allow learning to leave flat spots.
weight change = (small constant x error x input activation) + (momentum constant x old weight change)

Δw(t) = η x d + a x Δw(t-1)
where Δw is the change in weight, η is the learning rate, d is the error x input activation, and a is the momentum parameter.
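A sketch of that update in code (illustrative names; d is the error x input activation as above):

def momentum_update(w, error, activation, prev_delta, eta=0.1, a=0.9):
    d = error * activation               # error x input activation
    delta = eta * d + a * prev_delta     # current change keeps a fraction of the old change
    return w + delta, delta              # new weight, plus the change for the next step

# First step has no previous change:
w, dw = momentum_update(w=0.5, error=-1, activation=1, prev_delta=0.0)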
Batch Update
With the default BP update, the weights are updated after every pattern. With batch update, the changes for each weight are accumulated but not applied until the end of each epoch. Batch update gives the correct direction of the gradient for the entire data set, whereas per-pattern update may make some weight changes in directions quite different from the average gradient over the entire data set.
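The contrast can be sketched as follows (illustrative helper code, not the lecture's): per-pattern update changes the weights inside the loop, while batch update accumulates the changes and applies them once per epoch.

def predict(weights, inputs):
    return 1 if sum(w * x for w, x in zip(weights, inputs)) > 0 else 0

def online_epoch(weights, data, eta=0.1):
    for inputs, target in data:
        error = target - predict(weights, inputs)
        weights = [w + eta * error * x for w, x in zip(weights, inputs)]  # update now
    return weights

def batch_epoch(weights, data, eta=0.1):
    acc = [0.0] * len(weights)
    for inputs, target in data:
        error = target - predict(weights, inputs)                # same weights all epoch
        acc = [a + eta * error * x for a, x in zip(acc, inputs)]
    return [w + a for w, a in zip(weights, acc)]                 # update once per epoch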
Example of non-training. Note how the error graph goes into a flat line
Practical Differences
A traditional program need have no knowledge of the hardware on which it runs; a neural network is totally dependent on its architecture and is therefore hardware dependent. You can write a traditional program without paying attention to the hardware, but to solve a problem on a neural network you have to experiment with different architectures and parameters (the "hardware").
Philosophical differences
A computer program contains explicit rules for manipulating symbols, e.g.
If x>y then a=1 else a=2
A neural network represents symbols (x, y, etc.) as distributed patterns (typically feature vectors), e.g. 001100 for a, 110011 for b. A neural network doesn't use explicit rules but weights on connections (and threshold functions in units) to perform tasks.
Operation
A traditional program takes input, transforms the input through rules and produces output. A neural network uses spreading activation, which represents the probability that a neuron generates an action potential spike (fires) that spreads to the other units connected to it.
Explaining learning
Computationalism explains learning as the application of symbolic formulae to data to extract important features or derive new conclusions. Connectionism explains learning as modifying the weights on connections.
Summary
You have been introduced to the two main views in AI concerning intelligence and how to build intelligence into a machine: write increasingly sophisticated software, or construct increasingly sophisticated neural networks. The debate is not just about machine intelligence but about ourselves: are we von Neumann computers (like your desktop or Apple iPad)? Are we neural networks (no real neural-network computers have yet been built)?
With weights W1 = 0.5, W2 = 0.3, W3 = -0.2 and threshold t = 0.0, when 1 1 is presented:
I1  I2  I3 | Summation                             | Output
-1   1   1 | (-1*0.5) + (1*0.3) + (1*-0.2) = -0.4  | 0
Using the perceptron learning rule and a learning rate of 0.1, show how the weights are changed for subsequent presentations of 1 1. How many presentations does it take before the perceptron produces the correct output for 1 1? Then test your weights on 0 1, 0 0 and 1 0.
Solution to Exercise: 1 1
I1  I2  I3 | Summation                             | Output
-1   1   1 | (-1*0.5) + (1*0.3) + (1*-0.2) = -0.4  | 0
First presentation: Error = (target output - actual output) = 1. Weight change = learning rate x error x input.
Weight change for w1 = 0.1 x 1 x -1 = -0.1 (w1 = 0.5 - 0.1 = 0.4)
Weight change for w2 = 0.1 x 1 x 1 = 0.1 (w2 = 0.3 + 0.1 = 0.4)
Weight change for w3 = 0.1 x 1 x 1 = 0.1 (w3 = -0.2 + 0.1 = -0.1)
I1  I2  I3 | Summation                             | Output
-1   1   1 | (-1*0.4) + (1*0.4) + (1*-0.1) = -0.1  | 0
Second presentation: Error = (target output - actual output) = 1. Weight change = learning rate x error x input.
Weight change for w1 = 0.1 x 1 x -1 = -0.1 (w1 = 0.4 - 0.1 = 0.3)
Weight change for w2 = 0.1 x 1 x 1 = 0.1 (w2 = 0.4 + 0.1 = 0.5)
Weight change for w3 = 0.1 x 1 x 1 = 0.1 (w3 = -0.1 + 0.1 = 0)
I1  I2  I3 | Summation                          | Output
-1   1   1 | (-1*0.3) + (1*0.5) + (1*0) = 0.2   | 1
For 1 0: with these weights (w1 = 0.3, w2 = 0.5, w3 = 0), the summation is (-1*0.3) + (1*0.5) + (0*0) = 0.2, so the output is 1, which is still incorrect for AND(1,0) = 0 and requires further training.