Artificial Neural Networks: Torsten Reil
Torsten Reil
torsten.reil@zoo.ox.ac.uk
Outline
• What are Neural Networks?
• Biological Neural Networks
• ANN – The basics
• Feed forward net
• Training
• Example – Voice recognition
• Applications – Feed forward nets
• Recurrency
• Elman nets
• Hopfield nets
• Central Pattern Generators
• Conclusion
What are Neural Networks?
• Applications
– As powerful problem solvers
– As biological models
Biological Neural Nets
– Experiment:
• Pigeon in Skinner box
• Present paintings by two different artists (e.g. Chagall / Van Gogh)
• Reward for pecking when presented with a particular artist (e.g. Van Gogh)
• Pigeons were able to discriminate between Van Gogh and Chagall with 95% accuracy (when presented with pictures they had been trained on)
1. Neurones (nodes)
2. Synapses (weights)
• Neurone vs. Node
• Structure of a node:
• Information is distributed
Squashing: 1 / (1 + e^0.5) = 0.3775
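To make the squashing arithmetic concrete, here is a minimal Python sketch (not from the original slides) of a single node: it sums its weighted input activations and squashes the result with a sigmoid. A net input of -0.5 reproduces the 0.3775 above; the helper names are illustrative assumptions.

```python
import math

def sigmoid(x):
    """Logistic squashing function: maps any net input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def node_output(inputs, weights):
    """A single node: weighted sum of input activations, then squashing."""
    net = sum(a * w for a, w in zip(inputs, weights))
    return sigmoid(net)

# A net input of -0.5 gives the value quoted on the slide:
print(round(sigmoid(-0.5), 4))  # 0.3775
```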
• Data is presented to the network in the form of
activations in the input layer
• Examples
– Pixel intensity (for pictures)
– Molecule concentrations (for artificial nose)
– Share prices (for stock market prediction)
• Backpropagation
– Requires training set (input / output pairs)
– Starts with small random weights
– Error is used to adjust weights (supervised learning)
Gradient descent on error landscape
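As a hedged sketch of what "gradient descent on the error landscape" looks like in practice, the following Python/NumPy function performs one backpropagation step for a small two-layer sigmoid net. The squared-error loss and the learning rate of 0.5 are assumptions, not taken from the slides.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def backprop_step(x, target, w_hidden, w_out, lr=0.5):
    """One gradient-descent step on the squared error of a two-layer
    sigmoid network (a sketch; loss and learning rate are assumptions)."""
    # Forward pass
    hidden = sigmoid(w_hidden @ x)       # hidden-layer activations
    output = sigmoid(w_out @ hidden)     # output-layer activations

    # Output error and deltas (error times the sigmoid derivative)
    err = output - target
    delta_out = err * output * (1.0 - output)
    # Propagate the error back through the output weights
    delta_hidden = (w_out.T @ delta_out) * hidden * (1.0 - hidden)

    # Gradient descent: step downhill on the error landscape
    w_out = w_out - lr * np.outer(delta_out, hidden)
    w_hidden = w_hidden - lr * np.outer(delta_hidden, x)
    return w_hidden, w_out, 0.5 * float(np.sum(err ** 2))
```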
• Advantages
– It works!
– Relatively fast
• Downsides
– Requires a training set
– Can be slow
– Probably not biologically realistic
• Alternatives to Backpropagation
– Hebbian learning (see the sketch after this list)
• Not successful in feed-forward nets
– Reinforcement learning
• Only limited success
– Artificial evolution
• More general, but can be even slower than backprop
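For contrast with backprop, here is a minimal unsupervised Hebbian weight update (a sketch; the learning rate is an assumption). As the list above notes, this rule alone has not been successful for training feed-forward nets on input/output tasks.

```python
import numpy as np

def hebbian_update(weights, pre, post, lr=0.01):
    """Hebb rule sketch: strengthen the weight between two nodes when
    pre- and post-synaptic activity occur together (dw = lr * post * pre)."""
    return weights + lr * np.outer(post, pre)
```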
Example: Voice Recognition
• Data
– Sources
• Steve Simpson
• David Raubenheimer
– Format
• Frequency distribution (60 bins)
• Analogy: cochlea
• Network architecture
– Feed forward network
• 60 input (one for each frequency bin)
• 6 hidden
• 2 output (0-1 for “Steve”, 1-0 for “David”)
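A minimal sketch (assuming NumPy and sigmoid activations) of the 60-6-2 feed-forward architecture described above; the weight initialisation range and function names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# 60-6-2 architecture from the slides: one input per frequency bin,
# 6 hidden nodes, 2 outputs ("Steve" = 0-1, "David" = 1-0).
N_INPUT, N_HIDDEN, N_OUTPUT = 60, 6, 2

# Small random starting weights, as backpropagation requires
w_hidden = rng.uniform(-0.1, 0.1, size=(N_HIDDEN, N_INPUT))
w_out = rng.uniform(-0.1, 0.1, size=(N_OUTPUT, N_HIDDEN))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(freq_bins):
    """Propagate a 60-bin frequency distribution through the net."""
    hidden = sigmoid(w_hidden @ freq_bins)
    return sigmoid(w_out @ hidden)   # two activations in (0, 1)
```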
• Presenting the data
Steve
David
• Presenting the data (untrained network)
Steve
0.43
0.26
David
0.73
0.55
• Calculate error (|output – target|)
Steve (target 0-1)
|0.43 – 0| = 0.43
|0.26 – 1| = 0.74
David (target 1-0)
|0.73 – 1| = 0.27
|0.55 – 0| = 0.55
• Backprop error and adjust weights
Steve
|0.43 – 0| = 0.43
|0.26 – 1| = 0.74
Total error: 1.17
David
|0.73 – 1| = 0.27
|0.55 – 0| = 0.55
Total error: 0.82
• Repeat process (sweep) for all training pairs
– Present data
– Calculate error
– Backpropagate error
– Adjust weights
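A sketch of the sweep just described, reusing backprop_step from the gradient-descent sketch earlier; training_pairs is a hypothetical list of (frequency_bins, target) tuples with target (0, 1) for "Steve" and (1, 0) for "David".

```python
import numpy as np

def train(training_pairs, w_hidden, w_out, sweeps=1000):
    """Repeat the present / calculate error / backpropagate / adjust
    cycle over all training pairs for a number of sweeps."""
    for _ in range(sweeps):
        for x, target in training_pairs:
            w_hidden, w_out, err = backprop_step(
                np.asarray(x), np.asarray(target), w_hidden, w_out)
    return w_hidden, w_out
```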
• Presenting the data (trained network)
Steve
0.01
0.99
David
0.99
0.01
• Results – Voice Recognition
• Demo
• Results – Voice Recognition (cont'd)
Applications – Feed forward nets
– Stock-market prediction
– Pronunciation (NETtalk)
(Sejnowski & Rosenberg, 1987)
Cluster analysis of hidden layer
FFNs as Biological Modelling Tools
• Recurrency
– Nodes connect back to other nodes or themselves
– Information flow is multidirectional
– Sense of time and memory of previous state(s)
Elman Nets
• Task
– Elman net to predict successive words in sentences.
• Data
– Suite of sentences, e.g.
• “The boy catches the ball.”
• “The girl eats an apple.”
– Words are input one at a time
• Representation
– Binary representation for each word, e.g.
• 0-1-0-0-0 for “girl”
• Training method
– Backpropagation
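A hedged Python sketch of one Elman-net step: each word enters as a binary vector and the hidden layer also receives a copy of its previous state via a context layer, giving the net its memory of earlier words. The vocabulary, layer sizes and weight ranges are assumptions; training would use backpropagation as stated above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical 5-word vocabulary; each word is a binary vector,
# e.g. "girl" = 0-1-0-0-0 as on the slide.
VOCAB = ["boy", "girl", "catches", "eats", "ball"]

def one_hot(word):
    vec = np.zeros(len(VOCAB))
    vec[VOCAB.index(word)] = 1.0
    return vec

class ElmanNet:
    """Minimal Elman architecture: the hidden layer feeds back to itself
    through a context layer holding the previous hidden state."""
    def __init__(self, n_in, n_hid, n_out):
        rng = np.random.default_rng(0)
        self.w_in = rng.uniform(-0.1, 0.1, (n_hid, n_in))
        self.w_ctx = rng.uniform(-0.1, 0.1, (n_hid, n_hid))
        self.w_out = rng.uniform(-0.1, 0.1, (n_out, n_hid))
        self.context = np.zeros(n_hid)

    def step(self, word_vec):
        """Present one word; the context provides memory of earlier words."""
        hidden = sigmoid(self.w_in @ word_vec + self.w_ctx @ self.context)
        self.context = hidden.copy()
        return sigmoid(self.w_out @ hidden)   # prediction of the next word

# Usage: net = ElmanNet(len(VOCAB), 10, len(VOCAB)); net.step(one_hot("girl"))
```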
• Internal representation of words
Hopfield Networks
• Sub-type of recurrent neural nets
– Fully recurrent
– Weights are symmetric
– Nodes can only be on or off
– Random updating
auto-associative or content-addressable memory
Task: store images with resolution of 20x20 pixels
Hopfield net with 400 nodes
Memorise:
1. Present image
2. Apply Hebb rule (cells that fire together, wire together)
• Increase weight between two nodes if both have same activity, otherwise decrease
3. Go to 1
Recall:
1. Present incomplete pattern
2. Pick random node, update
3. Go to 2 until settled
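A sketch of the memorise/recall procedure above, using the common +1/-1 coding for on/off nodes (class and method names are assumptions): Hebbian storage of patterns, then random asynchronous updates until the state settles into an attractor.

```python
import numpy as np

class HopfieldNet:
    """Sketch of a binary Hopfield net: fully recurrent, symmetric
    weights, +1/-1 node states, random asynchronous updating."""

    def __init__(self, n_nodes):
        self.n = n_nodes
        self.weights = np.zeros((n_nodes, n_nodes))

    def memorise(self, pattern):
        """Hebb rule: increase the weight between two nodes with the
        same activity, decrease it otherwise (no self-connections)."""
        p = np.asarray(pattern, dtype=float)   # entries are +1 or -1
        self.weights += np.outer(p, p)
        np.fill_diagonal(self.weights, 0.0)

    def recall(self, pattern, steps=10000):
        """Present an incomplete pattern, then repeatedly pick a random
        node and update it until the state settles."""
        rng = np.random.default_rng(0)
        state = np.asarray(pattern, dtype=float).copy()
        for _ in range(steps):
            i = rng.integers(self.n)
            state[i] = 1.0 if self.weights[i] @ state >= 0 else -1.0
        return state

# Usage (e.g. a 20x20 image flattened to 400 values of +1/-1):
# net = HopfieldNet(400); net.memorise(image); net.recall(noisy_image)
```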
DEMO
• Memories are attractors in state space
Catastrophic forgetting
Central Pattern Generators
Computer modelling
– E.g. lamprey swimming (Ijspeert et al., 1998)
• Evolution of Bipedal Walking (Reil & Husbands, 2001)
[Figure: activation of the "left hip lateral" CPG neuron plotted against time, showing rhythmic output]
• CPG cycles are cyclic attractors in state space
Recap – Neural Networks
• Components – biological plausibility
– Neurone / node
– Synapse / weight
• Recurrent networks
– Multidirectional flow of information
– Memory / sense of time
– Complex temporal dynamics (e.g. CPGs)
– Various training methods (Hebbian, evolution)
– Often better biological models than FFNs
Online material:
http://users.ox.ac.uk/~quee0818