07 Pattern Recognition


Pattern Recognition
Perceptron ("single-layer perceptron")

• simple NN for linearly separable patterns only
• error-correcting learning:

$y(n) = \mathrm{sgn}\left(\mathbf{w}^{T}(n)\,\mathbf{x}(n)\right)$

$\mathbf{w}(n+1) = \mathbf{w}(n) + \left[d(n) - y(n)\right]\mathbf{x}(n)$

$d(n) = \begin{cases} +1 & \text{if } \mathbf{x}(n) \in C_{1} \\ -1 & \text{if } \mathbf{x}(n) \in C_{2} \end{cases}$
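A minimal NumPy sketch of this error-correcting rule (the toy data, the number of epochs, and the learning-rate factor eta are illustrative assumptions; the slide's update corresponds to eta = 1):

```python
import numpy as np

def train_perceptron(X, d, eta=0.1, epochs=50):
    """Error-correcting rule: w(n+1) = w(n) + eta*[d(n) - y(n)]*x(n)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])   # append a constant 1 so the threshold is learned as a bias weight
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for x_n, d_n in zip(Xb, d):
            y_n = 1.0 if w @ x_n >= 0 else -1.0     # y(n) = sgn(w^T(n) x(n)), with sgn(0) taken as +1
            w += eta * (d_n - y_n) * x_n            # no change when the response is already correct
    return w

# Toy linearly separable problem: class C1 labelled +1, class C2 labelled -1
X = np.array([[2.0, 2.0], [1.5, 2.5], [-1.0, -1.5], [-2.0, -0.5]])
d = np.array([1.0, 1.0, -1.0, -1.0])
w = train_perceptron(X, d)
Xb = np.hstack([X, np.ones((len(X), 1))])
print(np.sign(Xb @ w))                               # expected: [ 1.  1. -1. -1.]
```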
Linear and nonlinear separability

• 2 regions separated by a hyperplane:

$\sum_{i=1}^{p} w_{i} x_{i} - \theta = 0$

(Figure: linear separability – 2D inputs, 2 classes; the separating boundary is a hyperplane between class C1 and class C2.)



Linear and nonlinear separability

(Figure, two panels with classes C1 and C2: left – linearly separable patterns, where the decision surfaces are hyperplanes; right – nonlinearly separable patterns.)



Linear and nonlinear separability

• Minsky and Papert, 1969

• The perceptron is not able to separate even the XOR (exclusive or) logical function (see the sketch below)!

  A  B  A XOR B
  0  0     0
  0  1     1
  1  0     1
  1  1     0

• NN were dead…

• 1986: PDP group, MLP, backpropagation algorithm
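A minimal NumPy sketch of this point (the hard-limiting units and the hand-picked hidden-layer weights are illustrative assumptions, not taken from the slides): no single threshold unit reproduces the XOR column of the table above, but two layers of threshold units do — which is exactly the kind of mapping an MLP trained with backpropagation can learn.

```python
import numpy as np

step = lambda v: (v > 0).astype(int)            # hard-limiting (threshold) activation

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# A single threshold unit: no weights (w1, w2, theta) reproduce XOR = [0, 1, 1, 0].
# Two layers do: hidden unit h1 computes OR, h2 computes AND, output = h1 AND NOT h2.
h1 = step(X @ np.array([1, 1]) - 0.5)           # OR  -> [0, 1, 1, 1]
h2 = step(X @ np.array([1, 1]) - 1.5)           # AND -> [0, 0, 0, 1]
y = step(h1 - h2 - 0.5)                          # XOR -> [0, 1, 1, 0]
print(y)
```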



• Good illustration of an ANN for data
classification

• http://playground.tensorflow.org



Typical application areas

• Machine vision
• Character recognition (OCR)
• Computer-aided diagnosis
• Speech recognition
• Face recognition
• Biometrics
• Image database retrieval
• Data mining
• Bioinformatics

The task:

• Assign unknown objects – patterns – into the correct class. This is known as classification.



o Features: These are measurable quantities obtained from
the patterns, and the classification task is based on their
respective values.

o Feature vectors: A number of features

$x_{1}, \ldots, x_{p}$

constitute the feature vector

$\mathbf{x} = \left[x_{1}, \ldots, x_{p}\right]^{T} \in \mathbb{R}^{p}$

Feature vectors are treated as random vectors.



An example: medical image classification

• a plot of the mean value of the object intensities versus their standard deviation
(Figure: training patterns of class A and class B in the feature plane; mean and standard deviation are the features.)

• To which class does a new image (*) belong? With greater probability to A
• classifier – with its decision boundary it divides the feature space into regions of classes A and B
• here it is a linear decision boundary (a straight line) obtained from the training patterns
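A minimal sketch of this example (the synthetic "images" and scikit-learn's LogisticRegression are illustrative assumptions; the slide only fixes mean and standard deviation as the features and a linear decision boundary):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression   # fits a linear decision boundary

rng = np.random.default_rng(0)

def features(img):
    """Mean and standard deviation of pixel intensities: a 2-D feature vector."""
    return np.array([img.mean(), img.std()])

# Synthetic training images: class A is darker and smoother than class B.
imgs_A = [rng.normal(loc=60, scale=5,  size=(32, 32)) for _ in range(50)]
imgs_B = [rng.normal(loc=90, scale=15, size=(32, 32)) for _ in range(50)]

X = np.array([features(im) for im in imgs_A + imgs_B])
y = np.array([0] * 50 + [1] * 50)             # 0 = class A, 1 = class B

clf = LogisticRegression().fit(X, y)           # learns a straight-line boundary in the (mean, std) plane

new_img = rng.normal(loc=65, scale=6, size=(32, 32))
print(clf.predict_proba(features(new_img)[None, :]))   # probabilities of A vs. B for the new image (*)
```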



• The classifier consists of a set of functions, whose values,
computed at x , determine the class to which the corresponding
pattern belongs

• Classification system overview (see the pipeline sketch below):

  patterns → sensor → feature generation → feature selection → classifier design → system evaluation
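A hedged sketch of such a pipeline with scikit-learn (the Iris data standing in for the sensor output, the ANOVA-based feature selection, and the k-NN classifier are illustrative assumptions; the slide only fixes the order of the stages):

```python
from sklearn.datasets import load_iris                        # stands in for the "sensor" output
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler              # feature generation / preprocessing
from sklearn.feature_selection import SelectKBest, f_classif  # feature selection
from sklearn.neighbors import KNeighborsClassifier            # classifier design
from sklearn.model_selection import cross_val_score           # system evaluation

X, y = load_iris(return_X_y=True)

pipe = Pipeline([
    ("scale",  StandardScaler()),
    ("select", SelectKBest(f_classif, k=2)),
    ("clf",    KNeighborsClassifier(n_neighbors=5)),
])

# "System evaluation" stage: estimate the generalization accuracy by cross-validation.
print(cross_val_score(pipe, X, y, cv=5).mean())
```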
Pattern, pattern recognition

Watanabe defines a pattern as opposite of a chaos; it is an entity, vaguely defined, that could be given a name.

• For example, a pattern could be a fingerprint image, a handwritten cursive word, a human face, or a speech signal.

Given a pattern, its recognition/classification may consist of one of the following two tasks:

• 1) supervised classification (e.g., discriminant analysis), in which the input pattern is identified as a member of a predefined class,
• 2) unsupervised classification (e.g., clustering), in which the pattern is assigned to a hitherto unknown class.



Design of a pattern recognition system

The design of a pattern recognition system essentially involves the following three aspects:

1) data acquisition and preprocessing,
2) data representation,
3) decision making.

Learning from a set of examples (training set) is an important and desired attribute of most pattern recognition systems.



Approaches for pattern recognition

• The four best known approaches for pattern recognition are:
1) template matching,
2) statistical classification,
3) syntactic or structural matching,
4) neural networks.

These models are not necessarily independent, and sometimes the same pattern recognition method exists with different interpretations. Attempts have been made to design hybrid systems.



1) Template Matching

Matching is a generic operation in pattern recognition which is used to determine


the similarity between two entities (points, curves, or shapes) of the same type.

In template matching, a template (typically, a 2D shape) or a prototype of the


pattern to be recognized is available.
• The pattern to be recognized is matched against the stored template while taking into account all
allowable pose (translation and rotation) and scale changes.
• The similarity measure, often a correlation, may be optimized based on the available training set.
Often, the template itself is learned from the training set.

Disadvantages

• performance degrades if the patterns are distorted due to the imaging process, viewpoint change, or large intraclass variations among the patterns
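A minimal NumPy sketch of template matching by normalized cross-correlation (the random image, the template size, and plain NumPy instead of, e.g., OpenCV are illustrative assumptions). Note that this sketch searches over translation only; the rotation and scale changes mentioned above would require searching over those transformations as well.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation between two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def match_template(image, template):
    """Slide the template over the image and return the best-matching top-left corner."""
    H, W = image.shape
    h, w = template.shape
    best_score, best_pos = -1.0, (0, 0)
    for r in range(H - h + 1):
        for c in range(W - w + 1):
            score = ncc(image[r:r + h, c:c + w], template)
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, best_score

rng = np.random.default_rng(1)
image = rng.random((64, 64))
template = image[20:30, 40:50].copy()         # template cut from a known location
print(match_template(image, template))         # expected: ((20, 40), ~1.0)
```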



2) Statistical Classification

Each pattern is represented in terms of p features or measurements and is viewed as


a point in a p-dimensional space.

The goal is to choose those features that allow pattern vectors belonging to different
categories to occupy compact and disjoint regions in a p-dimensional feature space.

Given a set of training patterns from each class, the objective is to establish decision
boundaries in the feature space which separate patterns belonging to different
classes.
• In the statistical decision theoretic approach, the decision boundaries are determined by the probability
distributions of the patterns belonging to each class, which must either be specified or learned
• One can also take a discriminant analysis-based approach to classification: First a parametric form of the
decision boundary (e.g., linear or quadratic) is specified; then the best decision boundary of the specified
form is found based on the classification of training patterns.
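A hedged sketch of the discriminant-analysis route just described (the synthetic Gaussian classes and scikit-learn's LinearDiscriminantAnalysis are illustrative assumptions; the slide only specifies choosing a parametric form of the boundary and fitting it to the training patterns):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(2)

# Two classes as points in a p-dimensional feature space (here p = 2).
X1 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2))
X2 = rng.normal(loc=[3.0, 3.0], scale=1.0, size=(100, 2))
X = np.vstack([X1, X2])
y = np.array([0] * 100 + [1] * 100)

# Parametric form of the boundary: linear. Fitting finds the best such boundary.
lda = LinearDiscriminantAnalysis().fit(X, y)
print(lda.coef_, lda.intercept_)               # parameters of the separating hyperplane
print(lda.score(X, y))                          # classification accuracy on the training patterns
```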



3) Syntactic (Structural) Matching

• hierarchical perspective where a pattern is viewed as being composed of simple subpatterns which
are themselves built from yet simpler subpatterns
• The simplest/elementary subpatterns to be recognized are called primitives and the given complex
pattern is represented in terms of the interrelationships between these primitives.
• In syntactic pattern recognition, a formal analogy is drawn between the structure of patterns and
the syntax of a language.
– The patterns are viewed as sentences belonging to a language, primitives are viewed as the
alphabet of the language, and the sentences are generated according to a grammar. Thus, a
large collection of complex patterns can be described by a small number of primitives and
grammatical rules. The grammar for each pattern class must be inferred from the available
training samples.
– Suitable, if the patterns have a definite structure which can be captured in terms of a set of
rules,
• such as EKG waveforms, textured images, and shape analysis of contours
– Difficulties arise primarily with the segmentation of noisy patterns (to detect the primitives) and with the inference of the grammar from the training data.
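A toy sketch of the language analogy above (the primitive alphabet u/r/d/l and the regular "grammar" for a rectangle contour are invented for illustration; real syntactic recognizers use richer grammars inferred from training samples):

```python
import re

# Primitives: unit moves along a contour – u (up), r (right), d (down), l (left).
# Toy regular "grammar" for an axis-aligned rectangle traced clockwise from its top-left
# corner: a run of r's, then d's, then l's, then u's. (It checks only the ordering of the
# primitive runs, not that the contour actually closes.)
rectangle = re.compile(r"^r+d+l+u+$")

print(bool(rectangle.match("rrrdddllluuu")))   # True  – generated by the grammar (a 3x3 square contour)
print(bool(rectangle.match("rrdrdlluu")))       # False – not a sentence of this grammar
```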



4) Neural Networks

• massively parallel computing systems


• huge number of simple processors (neurons)
• huge number of connections
• learning, generalization, adaptability, fault tolerance, distributed representation
• artificial neuron
• the ability to learn the complex relationships between inputs and outputs
• nonlinearity

• simple learning algorithms


• nonlinear algorithms for
– Feature extraction
– classification
• well suited to hardware implementation
• boom of deep networks
Pattern Recognition Models



STATISTICAL PATTERN RECOGNITION



Statistical pattern recognition
• input pattern: p-dimensional input vector
• pattern represented by m features (m-dimensional feature
vector)
• determination of decision boundaries uses known methods
of statistical decision theory

• Two modes of operation of the pattern recognition system:


– training (learning)
– classification (testing)



Model of statistical pattern recognition

• preprocessing
• dimensionality reduction
– "the curse of dimensionality" – the number of training points needed is an exponential function of the feature dimension
– "peaking phenomenon" – for a fixed training set, adding features can degrade the performance of the classifier (the number of training points is small relative to the number of features)
• feature extraction
• feature selection
• classification
Pattern recognition systems

• One-stage system, direct classification:

  input (x1, x2, …, xp) → classifier → decision

• Two-stage system:

  input (x1, x2, …, xp) → feature extraction → (y1, …, ym) → classifier → decision



Pattern recognition systems

• One-stage system, direct classification:

  pattern (dim = p) → preprocessing (dim = p) → classification → class

  (many methods from the 1st part of the semester ☺)

• Two-stage system (feature extraction + classification):

  pattern (dim = p) → preprocessing (dim = p) → feature extraction (dim = p) → feature selection (dim = m, m < p) → classification → class

  (many methods from the 1st part of the semester ☺)
FEATURE EXTRACTION



Feature extraction

• feature – the most important property of the data
• a transformation maps the data space (dim: p) into a feature space:
  – dim: m = p → decorrelation
  – dim: m < p → dimensionality reduction



❖ Principal Components Analysis (the Karhunen–Loève transform):

➢ The goal: Given an original set of m measurements $x \in \mathbb{R}^{m}$, compute $y \in \mathbb{R}^{m}$,

$y = A^{T} x$

for an orthogonal A, so that the elements of y are optimally mutually uncorrelated. That is,

$E\left[y(i)\,y(j)\right] = 0, \quad i \neq j$

➢ Sketch of the proof:

$R_{y} = E\left[y\,y^{T}\right] = E\left[A^{T} x\, x^{T} A\right] = A^{T} R_{x} A$

• If A is chosen so that its columns $a_{i}$ are the orthogonal eigenvectors of $R_{x}$, then

$R_{y} = A^{T} R_{x} A = \Lambda$

where $\Lambda$ is diagonal with elements the respective eigenvalues $\lambda_{i}$.

• Observe that this is a sufficient condition but not necessary. It imposes a specific orthogonal structure on A.

➢ Properties of the solution

• Mean Square Error approximation. Due to the orthogonality of A:

$x = \sum_{i=0}^{m-1} y(i)\, a_{i}, \qquad y(i) = a_{i}^{T} x$



− Define

$\hat{x} = \sum_{i=0}^{\ell-1} y(i)\, a_{i}$

− The Karhunen–Loève transform minimizes the square error:

$E\left[\left\| x - \hat{x} \right\|^{2}\right] = E\left[\left\| \sum_{i=\ell}^{m-1} y(i)\, a_{i} \right\|^{2}\right]$

− The error is:

$E\left[\left\| x - \hat{x} \right\|^{2}\right] = \sum_{i=\ell}^{m-1} \lambda_{i}$

It can also be shown that this is the minimum mean square error compared to any other representation of x by an ℓ-dimensional vector.



− In other words, x̂ is the projection of x into the
subspace spanned by the principal ℓ eigenvectors.
However, for Pattern Recognition this is not always the best solution.



Example-computation of PCA (KLT)
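A minimal NumPy sketch of the PCA/KLT computation described above (the zero-mean synthetic data, m = 3, and ℓ = 2 are illustrative assumptions): estimate R_x, take its eigenvectors as the columns of A, project, and check that the mean-square reconstruction error equals the sum of the discarded eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(3)

# N zero-mean samples of an m-dimensional random vector x (m = 3 here), made correlated on purpose.
N, m, l = 10000, 3, 2
X = rng.normal(size=(N, m)) @ np.array([[2.0, 0.5, 0.0],
                                        [0.0, 1.0, 0.3],
                                        [0.0, 0.0, 0.2]])
X -= X.mean(axis=0)                    # center, so the sample R_x below is the correlation of zero-mean data

Rx = X.T @ X / N                       # sample estimate of R_x
lam, A = np.linalg.eigh(Rx)            # columns of A: orthogonal eigenvectors; lam: eigenvalues (ascending)
lam, A = lam[::-1], A[:, ::-1]         # reorder so the principal (largest) eigenvalues come first

Y = X @ A                              # y = A^T x for every sample
print(np.round(Y.T @ Y / N, 3))        # R_y = A^T R_x A = Lambda: (approximately) diagonal, i.e. uncorrelated components

X_hat = Y[:, :l] @ A[:, :l].T          # x_hat: reconstruction from the l principal components
mse = np.mean(np.sum((X - X_hat) ** 2, axis=1))
print(mse, lam[l:].sum())              # the mean-square error equals the sum of the discarded eigenvalues
```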



PCA (KLT)

• Lindsay I Smith: A tutorial on Principal Components Analysis, 2002, http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf


CLASSIFICATION



Statistical pattern recognition – decision making
process
• classifier design – classification according to what information we have about the class-conditional densities

– if all densities are known (but this is rare)


• Bayes' optimal rule

– if the densities are known, but some of their parameters are


not known
• parametric decision problem - unknown parameters are estimated

– if the densities are not known


• nonparametric problem
• either the probability densities are estimated (e.g., Parzen window)
• or direct construction of the decision boundary from the training data
• e.g. k-nearest neighbor rule
• MLP is also a nonparametric method for constructing a decision boundary



Various approaches to statistical pattern recognition



Frequently used classifiers



Illustration of decision boundaries

• Data:



Illustration of decision boundaries





K-NN RULE

k-Nearest Neighbour Rule



❖ The Nearest Neighbor Rule

➢ Choose k out of the N training vectors, identify the k nearest ones to x

➢ Out of these k, identify the number $k_{i}$ that belong to class $\omega_{i}$

➢ Assign $x \rightarrow \omega_{i}: \ k_{i} > k_{j} \ \forall\, i \neq j$

➢ The simplest version: k = 1 !!!

➢ For large N this is not bad. It can be shown that, if $P_{B}$ is the optimal Bayesian error probability, then:

$P_{B} \leq P_{NN} \leq P_{B}\left(2 - \frac{M}{M-1} P_{B}\right) \leq 2 P_{B}$
$P_{B}$ – probability of error of the optimal Bayes classifier; M – the number of classes
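A minimal NumPy sketch of the k-NN rule just described (the Euclidean distance, the toy training set, and k = 3 are illustrative assumptions):

```python
import numpy as np

def knn_classify(x, X_train, y_train, k=3):
    """Assign x to the class that owns the majority of its k nearest training vectors."""
    dists = np.linalg.norm(X_train - x, axis=1)        # distances to all N training vectors
    nearest = np.argsort(dists)[:k]                     # indices of the k nearest ones
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]                    # class omega_i with the largest k_i

# Toy 2-class training set
X_train = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
                    [1.0, 1.0], [0.9, 1.2], [1.1, 0.8]])
y_train = np.array([1, 1, 1, 2, 2, 2])

print(knn_classify(np.array([0.15, 0.2]), X_train, y_train, k=3))   # -> 1
print(knn_classify(np.array([0.95, 1.0]), X_train, y_train, k=1))   # k = 1, the simplest version -> 2
```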

➢ $P_{B} \leq P_{kNN} \leq P_{B} + \sqrt{\dfrac{2 P_{NN}}{k}}$

➢ $k \rightarrow \infty \;\Rightarrow\; P_{kNN} \rightarrow P_{B}$

➢ For small $P_{B}$:

$P_{NN} \approx 2 P_{B}$

$P_{3NN} \approx P_{B} + 3 \left(P_{B}\right)^{2}$



❖ Voronoi tessellation

$R_{i} = \left\{ x : d(x, x_{i}) < d(x, x_{j}), \ \forall\, i \neq j \right\}$
Voronoi tessellation

◼ 2D case ◼ 3D case



kNN
◼ nice presentation:
http://courses.cs.tamu.edu/rgutier/cs790_w02/l8.pdf



Other nice solutions:

◼ ensemble learning
◼ SVM (support vector machine)
◼ neural networks



Other nice solutions: ensemble learning





Other nice solutions: SVM
◼ SVM
❑ https://www.youtube.com/watch?v=9NrALgHFwTo
❑ https://www.youtube.com/watch?v=ndNE8he7Nnk



Other nice solutions: NN
◼ Once again: good illustration of an ANN for data
classification ☺
❑ http://playground.tensorflow.org

