UNIT 2
1. Mathematical Foundations and Learning Mechanisms.
Mathematical Preliminaries
Notation
Intervals, Sequences, and Set Closure
Vector Spaces
Inner Product, Norm and Orthogonality
Functions and Continuity
Continuous Functions
Homeomorphisms
Monotonic Functions
Matrices, Linear Transformations, and Eigenvectors
Notation
Plain uppercase letters denote vectors, whereas boldface uppercase letters denote matrices:
$X$ is a vector, whereas $\mathbf{A}$ is a matrix.
$X = (x_1, \ldots, x_n)^T$ is a vector.
$\mathbf{A} = [a_{ij}]$ is a matrix.
As far as possible, the symbols $i$, $j$, $l$ will be employed to denote components, whereas $k$ will be used to denote an iteration index or a discrete time index.
The vector $X^k$ at iteration $k$ has components $X^k = (x_1^k, \ldots, x_n^k)^T$.
Intervals, Sequences, and Set Closure
We consistently assume the following notation: $\mathbb{R}$ denotes the set of real numbers; $\mathbb{R}^+$ denotes the positive reals; $\mathbb{Z}$ denotes the set of integers; and $\mathbb{Z}^+$ denotes the set of positive integers.
Intervals. Recall that an interval $I$ is a set of real numbers with the property that if $x \in I$, $y \in I$, and $x \le z \le y$, then $z \in I$.
In describing intervals we usually employ the following notation:
$(a, b) = \{x : a < x < b\}$
$[a, b] = \{x : a \le x \le b\}$
$(a, \infty) = \{x : x > a\}$
$[a, \infty) = \{x : x \ge a\}$
$(-\infty, b) = \{x : x < b\}$
$(-\infty, b] = \{x : x \le b\}$
Note that these intervals may be bounded above, below, or both. An interval $I$ is bounded above if there exists a real number $h$ that is greater than or equal to every element of $I$, and bounded below if there exists a number $l$ that is less than or equal to every element of $I$.
The continuum property of the real numbers states that every non-empty set of real numbers that is bounded above has a least upper bound (lub), and every non-empty set that is bounded below has a greatest lower bound (glb).
If $\mathrm{glb}(I) \in I$ and $\mathrm{lub}(I) \in I$, then $I$ is closed. Otherwise $I$ is either half-open or open.
Sequences
A sequence $\{x_n\} \subset I$ is a mapping from a set of integers, commonly $\mathbb{Z}^+$, into $I$. A point $x \in \mathbb{R}$ is a limit point of $I$ if there is a sequence of points $\{x_n\} \subset I$ that converges to $x$.
For example, $\{1/n\}_{n=1}^{\infty}$ is a sequence of points in the interval $[0, 1]$ that converges to the point 0; therefore 0 is a limit point of this sequence. Similarly, the sequence $\{1 - 1/n\}_{n=1}^{\infty}$ converges to the point 1. Clearly, 0 and 1 are limit points of both the open interval $(0, 1)$ and the closed interval $[0, 1]$.
Set Closure
One can now define the closure of a set in terms of its limit points. A set together with its
limit points is called its closure. A set J is closed if it contains all of its limit points. In the
above example, (0, 1) is therefore open and [0,1] is closed.
The complement of a closed set is open. By definition, an empty set is closed.
Further, any finite union of closed sets is closed. But surprisingly infinite unions of closed
sets need not be closed.
A related, informal notion is closure of a set under an operation. For example, $3.1 + 5.5 = 8.6$ is a real number: adding two real numbers gives a real number, so the reals are closed under addition. By contrast, $4 - 8 = -4$, and $-4$ is not a whole number (whole numbers cannot be negative), so the whole numbers are not closed under subtraction.
For a set of shirts:
Shirts are closed under the operation of "wash" (a washed shirt is still a shirt).
Shirts are not closed under the operation of "rip".
Vector Space
A vector is a quantity having direction as well as magnitude, determining the position of one point in space relative to another.
A vector space consists of vectors, together with an associative and commutative operation of addition of vectors, and an associative and distributive operation of multiplication of vectors by scalars.
Suppose that V is a set upon which we have defined two operations:
(1) Vector addition, which combines two elements of V and is denoted by “+”,
and
(2) Scalar multiplication, which combines a complex number with an element
of V and is denoted by juxtaposition. Then V, along with the two operations, is
a vector space over C if the following ten properties hold.
AC Additive Closure
If u, v ∈ V, then u + v ∈ V.
SC Scalar Closure
If α ∈ C and u ∈ V, then α u ∈ V.
C Commutativity
If u, v ∈ V then u + v = v + u.
AA Additive Associativity
If u, v, w ∈ V, then u+(v + w) = ( u + v) + w.
Z Zero Vector
There is a vector, 0, called the zero vector, such that u + 0 = u for all u ∈ V.
AI Additive Inverses
If u ∈ V, then there exists a vector −u ∈ V so that u + (−u) = 0.
SMA Scalar Multiplication Associativity
If α, β ∈ C and u ∈ V, then α(βu)=(αβ)u.
DVA Distributivity across Vector Addition
If α ∈ C and u, v ∈ V, then α(u + v)=αu+αv.
DSA Distributivity across Scalar Addition
If α, β ∈ C and u ∈ V, then (α + β)u=αu+βu.
O One
If u ∈ V, then 1u=u.
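These ten properties can be spot-checked numerically for the familiar space $\mathbb{R}^3$. Below is a minimal sketch (assuming NumPy is available); it samples a few fixed vectors and scalars, so it illustrates the axioms rather than proving them.

```python
# Spot-checking the vector space axioms on R^3 with NumPy arrays.
# This samples specific vectors and scalars; it is an illustration, not a proof.
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([-4.0, 0.5, 2.0])
w = np.array([0.0, 1.0, -1.0])
alpha, beta = 2.0, -3.0

assert np.allclose(u + v, v + u)                              # C:   commutativity
assert np.allclose(u + (v + w), (u + v) + w)                  # AA:  additive associativity
assert np.allclose(u + np.zeros(3), u)                        # Z:   zero vector
assert np.allclose(u + (-u), np.zeros(3))                     # AI:  additive inverses
assert np.allclose(alpha * (beta * u), (alpha * beta) * u)    # SMA: scalar mult. associativity
assert np.allclose(alpha * (u + v), alpha * u + alpha * v)    # DVA: distributivity (vector add)
assert np.allclose((alpha + beta) * u, alpha * u + beta * u)  # DSA: distributivity (scalar add)
assert np.allclose(1.0 * u, u)                                # O:   one
print("All sampled axioms hold.")
```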
Inner Product
The inner product of two vectors $X, Y \in \mathbb{R}^n$ is the scalar $(X, Y) = X^T Y = \sum_{i=1}^{n} x_i y_i$. The norm (magnitude) of $X$ is $\|X\| = \sqrt{X^T X}$.
Orthogonality
In $\mathbb{R}^n$, two vectors $u$ and $v$ are perpendicular if and only if the distance between $u$ and $v$ is the same as the distance between $u$ and $-v$.
Two vectors $X$ and $Y$ are orthogonal if they satisfy the condition $X^T Y = 0$, that is, if $(X, Y) = 0$.
The product of the squares of the magnitudes of two vectors can never be less than the square of their inner product. This is stated in the form of the Cauchy-Schwarz inequality:
$(X^T Y)^2 \le \|X\|^2 \|Y\|^2$
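The inequality is easy to check empirically. The sketch below (assuming NumPy) draws random vector pairs and verifies that the bound holds for each; the dimension and sample count are arbitrary choices.

```python
# Empirically verifying the Cauchy-Schwarz inequality
# (X^T Y)^2 <= ||X||^2 ||Y||^2 on randomly drawn vector pairs.
import numpy as np

rng = np.random.default_rng(0)
for _ in range(1000):
    x = rng.normal(size=5)
    y = rng.normal(size=5)
    lhs = np.dot(x, y) ** 2
    rhs = np.dot(x, x) * np.dot(y, y)   # ||x||^2 * ||y||^2
    assert lhs <= rhs + 1e-12           # small tolerance for rounding error
print("Cauchy-Schwarz held on all sampled pairs.")
```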
Functions and Continuity
Continuous Function
A function is continuous when its graph is a single unbroken curve.
That is not a formal definition, but it helps you understand the idea.
For example, $f(x) = x^2$ is continuous at every point of $\mathbb{R}$: its graph is a single unbroken curve.
Homeomorphisms
In mathematics, a homeomorphism is a correspondence between two figures, surfaces, or other geometrical objects, defined by a one-to-one mapping that is continuous in both directions.
In general, a function $f$ is a one-to-one (or injective) function if distinct elements of the domain map to distinct elements of the range.
Alternatively, for a one-to-one map, we say that $f(x) = f(y) \implies x = y$.
Monotonicity
The monotonicity of a function tells us if the function is increasing or decreasing.
A function is increasing when its graph rises from left to right.
In technical terms, a function is increasing on an interval $I$ if for any $x_1$ and $x_2$ in $I$, $x_1 < x_2$ implies $f(x_1) < f(x_2)$.
Matrices
Matrices play a crucial role in neural network theory.
We start by recalling that a matrix is a rectangular array of real or complex numbers, and
in the present context we are interested only in matrices with real valued entries.
Matrices that have an equal number of rows and columns are called square matrices.
Matrix Multiplication
A matrix can (and should) be looked upon as a horizontal stack of column vectors, or a
vertical stack of row vectors.
Multiplication can equivalently be carried out by rows: the $i$th row of the product $\mathbf{AB}$ is the $i$th row of $\mathbf{A}$ multiplied by $\mathbf{B}$.
Linear Transformation
A linear transformation $A$ (or just a linear map) defined on a space such as $\mathbb{R}^n$ satisfies two important properties:
$A(X_1 + X_2) = A(X_1) + A(X_2) \quad \forall X_1, X_2 \in \mathbb{R}^n$
$A(\alpha X) = \alpha A(X) \quad \forall X \in \mathbb{R}^n, \alpha \in \mathbb{R}$
Linear transformations generally scale and rotate points in $\mathbb{R}^n$.
Eigenvalues and eigenvectors are important in linear differential equations, where one wants to find a rate of change or to maintain a relationship between two variables. They are also used to reduce noise in data, and they can help us improve efficiency in computationally intensive tasks.
An eigenvector $X$ of a matrix $A$, with associated eigenvalue $\lambda$, satisfies:
$AX = \lambda X$
$AX - \lambda X = 0$
$(A - \lambda I)X = 0$
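As a quick illustration, the sketch below (assuming NumPy) computes the eigenpairs of a small symmetric matrix and confirms that each satisfies $AX = \lambda X$; the matrix itself is an arbitrary example.

```python
# Computing eigenvalues and eigenvectors numerically, then checking
# the defining relation A X = lambda X for each eigenpair.
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)
for lam, x in zip(eigenvalues, eigenvectors.T):   # columns are eigenvectors
    assert np.allclose(A @ x, lam * x)            # equivalently (A - lambda I) X = 0
print("eigenvalues:", eigenvalues)                # 3 and 1 for this matrix
```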
Expectations and Covariance Matrices
The average value of a random variable $X$ is called the expected value, or the expectation, written $E[X]$.
The expectation of the vector $X = (x_1, \ldots, x_n)^T$ is a vector $\bar{X}$ whose elements $(\bar{x}_1, \ldots, \bar{x}_n)$ are the expectations of the components.
For a discrete random variable, the expectation is calculated as the probability-weighted sum of the values that can be drawn:
$E[X] = x_1 p_1 + x_2 p_2 + \cdots + x_n p_n$
where the probabilities $p_i$ are given by the distribution; for a continuous variable the sum becomes an integral against the probability density function $p(X)$.
The covariance matrix $K$ associated with a real random vector $X$ is the expected value of the outer product $(X - \bar{X})(X - \bar{X})^T$:
$K = E[(X - \bar{X})(X - \bar{X})^T]$
The covariance matrix is also commonly denoted by the uppercase Greek letter $\Sigma$; the covariance for each pair of random variables is calculated as above.
The covariance matrix $K$ is related to the correlation matrix $R = E[XX^T]$ by:
$K = R - \bar{X}\bar{X}^T$
$R = K + \bar{X}\bar{X}^T$
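The sketch below (assuming NumPy) estimates $\bar{X}$, $K$, and $R$ from samples, with sample averages standing in for expectations, and checks the relation $K = R - \bar{X}\bar{X}^T$; the data are arbitrary draws.

```python
# Estimating the expectation, covariance matrix K, and correlation
# matrix R from samples, and checking K = R - Xbar Xbar^T.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(10000, 3))     # 10000 samples of a 3-dimensional vector
X_bar = X.mean(axis=0)              # sample estimate of E[X]

D = X - X_bar                       # deviations from the mean
K = (D.T @ D) / len(X)              # E[(X - Xbar)(X - Xbar)^T]
R = (X.T @ X) / len(X)              # E[X X^T]

assert np.allclose(K, R - np.outer(X_bar, X_bar))
print(K)
```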
Learning Process
The property that is of primary significance for a neural network is the ability of the network to learn from its environment and to improve its performance through learning. The improvement in performance takes place over time in accordance with some prescribed measure.
A neural network learns about its environment through an interactive process of adjustments applied to its synaptic weights and bias levels. Ideally, the network becomes more knowledgeable about its environment after each iteration of the learning process.
We define learning in the context of neural networks as:
Learning is a process by which the free parameters of a neural network are adapted through
a process of stimulation by the environment in which the network is embedded. The type
of learning is determined by the manner in which the parameter changes take place.
This definition of the learning process implies the following sequence of events:
The neural network is stimulated by an environment.
The neural network undergoes changes in its free parameters as a result of this
stimulation.
The neural network responds in a new way to the environment because of the
changes that have occurred in its internal structure.
A prescribed set of well-defined rules for the solution of a learning problem is called a learning algorithm. We have a "kit of tools" represented by a diverse variety of learning algorithms, each of which offers advantages of its own. Basically, learning algorithms differ from each other in the way in which the adjustment to a synaptic weight of a neuron is formulated.
Supervised learning
Supervised learning, as the name indicates, involves the presence of a supervisor acting as a teacher. Basically, supervised learning is learning in which we teach or train the machine using data that is well labeled, meaning each example is already tagged with the correct answer. After that, the machine is provided with a new set of examples (data), and the supervised learning algorithm, having analysed the training data (the set of training examples), produces a correct outcome from the labeled data.
Unsupervised learning
Unsupervised learning is the training of a machine using information that is neither classified nor labeled, allowing the algorithm to act on that information without guidance. Here the task of the machine is to group unsorted information according to similarities, patterns, and differences, without any prior training on the data.
Unlike supervised learning, no teacher is provided, which means no training is given to the machine. The machine is therefore restricted to finding the hidden structure in unlabeled data by itself.
Vectors can be thought of as an array of numbers in which the order of the numbers also matters (i.e., they have magnitude and direction).
The individual numbers are denoted by writing the vector name with a subscript indicating the position of the individual member. For example, $x_1$ is the first number, $x_2$ is the second number, and so on. If we want to write the vector with its members explicit, we enclose the individual elements in square brackets, as
$X = [x_1, x_2, \ldots, x_n]$
Addition of two vectors is performed by adding the corresponding elements of each vector. For example,
$[1, 2, 3] + [4, 5, 6] = [5, 7, 9]$
When a vector is multiplied by a scalar, each element gets multiplied by the scalar. For example,
$2 \cdot [1, 2, 3] = [2, 4, 6]$
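A minimal sketch of these two operations with NumPy (an assumed dependency; any array library would do):

```python
# Elementwise vector addition and scalar multiplication.
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

print(x + y)   # [5 7 9]: corresponding elements are added
print(2 * x)   # [2 4 6]: each element is scaled by the scalar
```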
Matrix
A matrix is a two-dimensional array of numbers. Generally, matrices are represented by an uppercase bold letter such as $\mathbf{A}$.
Since a matrix is two-dimensional, each element is represented by a lowercase letter with two indices, such as $a_{ij}$, where $i$ represents the row and $j$ represents the column.
A representation of an $m \times n$ matrix is shown below:
$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}$
Matrix multiplication
The product $\mathbf{C}$ of two matrices $\mathbf{A}$ and $\mathbf{B}$ has elements $c_{ij}$ given by the dot product of the $i$th row of $\mathbf{A}$ with the $j$th column of $\mathbf{B}$:
$c_{ij} = \sum_k a_{ik} b_{kj}$
Diagonal Matrix
A square matrix whose non-diagonal elements are zero is called a diagonal matrix. For example:
$\begin{bmatrix} 2 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 7 \end{bmatrix}$
Identity Matrix
A square matrix whose non-diagonal elements are zero and whose diagonal elements are all 1 is called an identity matrix. For example:
$\mathbf{I} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$
Transpose of a Matrix
The transpose of a matrix $X$, denoted by $X^T$, is the result of flipping the rows and columns of $X$.
If $X = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$, then $X^T = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix}$.
Inverse of a Matrix
The inverse of a square matrix $X$, denoted by $X^{-1}$, is a special matrix with the property that the product of the two is the identity matrix:
$X X^{-1} = X^{-1} X = I$
Here $I$ is an identity matrix, and $X$ and $X^{-1}$ are inverses of each other. An instance of an inverse pair is illustrated below:
$X = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix}, \quad X^{-1} = \begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix}$
These two matrices are inverses of each other: when we multiply them, we get an identity matrix.
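The sketch below (assuming NumPy) reproduces this example numerically, computing the inverse and confirming that both products give the identity; the transpose is printed for good measure.

```python
# Inverse, identity, and transpose for the 2x2 example above.
import numpy as np

X = np.array([[2.0, 1.0],
              [1.0, 1.0]])

X_inv = np.linalg.inv(X)            # [[ 1. -1.], [-1.  2.]]
I = np.eye(2)                       # 2x2 identity matrix

assert np.allclose(X @ X_inv, I)    # X X^-1 = I
assert np.allclose(X_inv @ X, I)    # X^-1 X = I
print(X.T)                          # transpose flips rows and columns
```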
Solution
(The problem: a man must ferry a lion, a goat, and a bundle of grass across a river in a boat that carries only him and one item at a time; left unattended, the lion would eat the goat, and the goat would eat the grass.)
The man first takes the goat in the boat to the other side of the river and comes back alone in the boat.
The man then takes the lion in the boat, leaves the lion at the other side of the river, and comes back with the goat.
The man next takes the grass in the boat, leaves the grass at the other side of the river with the lion, and comes back alone.
The man finally crosses with the goat in the boat.
4. Optimization Concept
Optimization is already a part of our lives, even if we are not aware of it. For example, you wake up early in the morning and try to catch a bus or train on time. Your objective is to catch the bus or train on time while making proper use of your limited time in the morning.
You go shopping and decide what to purchase within your budget constraint. Here, you try to maximize your satisfaction within your budget and within a time constraint.
There are many such examples in our lives in which we optimize, maximizing or minimizing our objectives subject to other constraints such as time, cost, etc.
Optimization methods are applied in many fields: economics, management, planning,
logistics, robotics, optimal design, engineering, signal processing, etc.
Phases
The objective of the first phase is the acquisition of information, that is, identifying the parameters that affect the system. These could be profit, time, potential energy, or any other quantity or combination of quantities.
The second, very important, phase is the development of a model to integrate
all the parameters identified.
The third phase focuses on the evaluation and analysis of the performance of the system.
Finally, an optimization phase is regularly used to improve system performance.
Any optimization problem has three basic ingredients (Arora 2003):
The optimization variables, also called the design variables.
The cost function, also called the objective function.
The constraints, expressed as equalities or inequalities.
Error-Correction Learning
Error-correction learning is used to identify and reduce the error in the learning process.
The notation used is as follows:
The output signal of neuron k is denoted by $y_k(n)$.
The desired output signal of neuron k is denoted by $d_k(n)$.
The error signal is denoted by $e_k(n) = d_k(n) - y_k(n)$.
If $e_k(n)$ is equal to 0 then there is no error; if there is a difference, we need to minimize the cost function.
This objective is achieved by minimizing a cost function, or index of performance, $E(n)$, defined in terms of the error signal $e_k(n)$ as:
$E(n) = \tfrac{1}{2}\, e_k^2(n)$
The step-by-step adjustments to the synaptic weights of neuron k are continued until the system reaches a steady state. At that point the learning process is terminated.
Minimization of the cost function leads to a learning rule commonly referred to as the delta rule or Widrow-Hoff rule.
According to the delta rule, the adjustment $\Delta w_{kj}(n)$ applied to the synaptic weight $w_{kj}$ at time step $n$ is defined by
$\Delta w_{kj}(n) = \eta\, e_k(n)\, x_j(n)$
where $\eta$ is a positive constant that determines the rate of learning as we proceed from one step in the learning process to another. It is therefore natural that we refer to $\eta$ as the learning-rate parameter.
Having computed the synaptic adjustment $\Delta w_{kj}(n)$, the updated value of the synaptic weight $w_{kj}$ is determined by
$w_{kj}(n+1) = w_{kj}(n) + \Delta w_{kj}(n)$
In effect, $w_{kj}(n)$ and $w_{kj}(n+1)$ may be viewed as the old and new values of the synaptic weight $w_{kj}$, respectively. In computational terms we may also write
$w_{kj}(n) = z^{-1}[w_{kj}(n+1)]$
where $z^{-1}$ is the unit-delay operator; that is, $z^{-1}$ represents a storage element.
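The following sketch performs one delta-rule update for a single linear neuron. It assumes NumPy, and the learning rate and toy signal values are illustrative choices, not values from the text.

```python
# One delta-rule (Widrow-Hoff) update for a single linear neuron k.
import numpy as np

eta = 0.1                           # learning-rate parameter (illustrative)
w_k = np.array([0.5, -0.3, 0.2])    # synaptic weights w_kj at step n
x_n = np.array([1.0, 0.5, -1.0])    # input signals x_j(n)
d_k = 1.0                           # desired response d_k(n)

y_k = np.dot(w_k, x_n)              # output signal y_k(n)
e_k = d_k - y_k                     # error signal e_k(n) = d_k(n) - y_k(n)
E_n = 0.5 * e_k ** 2                # cost function E(n) = 1/2 e_k^2(n)

delta_w = eta * e_k * x_n           # delta rule: eta * e_k(n) * x_j(n)
w_k = w_k + delta_w                 # w_kj(n+1) = w_kj(n) + delta_w_kj(n)
print(w_k, E_n)
```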
6. Memory-Based Learning
In memory-based learning, all (or most) of the past experiences are explicitly stored in a large memory of correctly classified input-output examples $\{(x_i, d_i)\}_{i=1}^{N}$, where $x_i$ denotes an input vector and $d_i$ denotes the corresponding desired response. Without loss of generality, we have restricted the desired response to be a scalar.
For example, in a binary pattern classification problem there are two classes of hypotheses, denoted by $E_1$ and $E_2$, to be considered. In this example, the desired response $d_i$ takes the value 0 (or $-1$) for class $E_1$ and the value 1 for class $E_2$.
When classification of a test vector $x_{test}$ (not seen before) is required, the algorithm responds by retrieving and analyzing the training data in a "local neighbourhood" of $x_{test}$.
All memory-based learning algorithms involve two essential ingredients:
a. Criterion used for defining the local neighbourhood of the test vector xtest.
b. Learning rule applied to the training examples in the local neighbourhood of xtest.
The algorithms differ from each other in the way in which these two ingredients are defined.
In a simple yet effective type of memory-based learning known as the nearest neighbour rule, the local neighbourhood is defined as the training example that lies in the immediate neighbourhood of the test vector $x_{test}$. In particular, the vector $x'_N \in \{x_1, \ldots, x_N\}$ is said to be the nearest neighbour of $x_{test}$ if
$\min_i d(x_i, x_{test}) = d(x'_N, x_{test})$
where $d(x_i, x_{test})$ is the Euclidean distance between the vectors $x_i$ and $x_{test}$.
The class associated with the minimum distance, that is, with the vector $x'_N$, is reported as the classification of $x_{test}$. This rule is independent of the underlying distribution responsible for generating the training examples.
Cover and Hart (1967) have formally studied the nearest neighbour rule as a tool for pattern
classification.
The analysis is based on two assumptions:
a. The classified examples (xi, di) are independently and identically distributed (iid),
according to the joint probability distribution of the example (x, d).
b. The sample size N is infinitely large.
Under these two assumptions, it is shown that the probability of classification error incurred by the nearest neighbour rule is bounded above by twice the Bayes probability of error, that is, the minimum probability of error over all decision rules.
In this sense, it may be said that half the classification information in a training set of
infinite size is contained in the nearest neighbour, which is a surprising result.
A variant of the nearest neighbour classifier is the k-nearest neighbour classifier, which proceeds as follows:
a. Identify the k classified patterns that lie nearest to the test vector $x_{test}$, for some integer k.
b. Assign $x_{test}$ to the class (hypothesis) that is most frequently represented among the k nearest neighbours of $x_{test}$ (i.e., use a majority vote to make the classification).
Thus, the k-nearest neighbour classifier acts like an averaging device.
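A compact sketch of both rules follows (assuming NumPy): with k = 1 it reduces to the nearest neighbour rule, and for larger k it takes a majority vote. The toy data and class labels are illustrative.

```python
# Nearest-neighbour and k-nearest-neighbour classification with
# Euclidean distance and a majority vote over the k nearest examples.
import numpy as np
from collections import Counter

def knn_classify(X_train, d_train, x_test, k=1):
    dists = np.linalg.norm(X_train - x_test, axis=1)   # d(x_i, x_test)
    nearest = np.argsort(dists)[:k]                    # k nearest neighbours
    votes = Counter(d_train[i] for i in nearest)
    return votes.most_common(1)[0][0]                  # majority class

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
d_train = np.array([0, 0, 1, 1])                       # classes E1 = 0, E2 = 1

print(knn_classify(X_train, d_train, np.array([0.2, 0.1]), k=1))  # -> 0
print(knn_classify(X_train, d_train, np.array([0.8, 0.9]), k=3))  # -> 1
```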
Hebbian Learning
Hebb's postulate of learning is the oldest and the most famous of all learning rules; it is named in honor of the neuropsychologist Hebb (1949).
When an axon of cell A is near enough to excite a cell B and repeatedly or persistently
takes part in firing it, some growth process or metabolic changes take place in one or both
cells such that A’s efficiency as one of the cells firing B, is increased.
Hebb proposed this change as a basis of associative learning (at the cellular level), which
would result in an enduring modification in the activity pattern of a spatially distributed
“assembly of nerve cells”.
This statement is made in a neurobiological context. We may expand and rephrase it as a
two-part rule:
a. If two neurons on either side of a synapse are activated simultaneously (i.e.,
synchronously), then the strength of that synapse is selectively increased.
b. If two neurons on either side of a synapse are activated asynchronously, then that
synapse is selectively weakened or eliminated.
Such a synapse is called Hebbian synapse. More precisely, we define a Hebbian synapse
as a synapse which uses a time-dependent, highly local, and strongly interactive
mechanism to increase synaptic efficiency as a function of the correlation between the
presynaptic and postsynaptic activities.
Four key properties characterise a Hebbian synapse:
i. Time-Dependent Mechanism: the modifications in a Hebbian synapse depend on the exact times of occurrence of the presynaptic and postsynaptic signals.
ii. Local Mechanism: the modification uses only information that is locally available at the synapse.
iii. Interactive Mechanism: the change depends on the interaction of the signals on both sides of the synapse.
iv. Conjunctional (Correlational) Mechanism: the modification is driven by the co-occurrence, or correlation, of the presynaptic and postsynaptic activities.
Mathematical Modelling
Consider a synaptic weight $w_{kj}$ of neuron k with presynaptic and postsynaptic signals denoted by $x_j$ and $y_k$ respectively.
The adjustment applied to $w_{kj}$ at time step n is expressed as
$\Delta w_{kj}(n) = F(y_k(n), x_j(n))$
where $F(\cdot, \cdot)$ is a function of both the postsynaptic and presynaptic signals.
Hebb’s Hypothesis
The change in weight is expressed as
$\Delta w_{kj}(n) = \eta\, y_k(n)\, x_j(n)$
where $\eta$ is the learning rate. This is also referred to as the activity product rule.
Covariance Hypothesis
The adjustment applied to the synaptic weight is defined as
$\Delta w_{kj}(n) = \eta\, (x_j - \bar{x})(y_k - \bar{y})$
where $\bar{x}$ and $\bar{y}$ are the time-averaged values of the presynaptic and postsynaptic signals.
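The sketch below computes both adjustments for a single synapse. The learning rate, signal values, and time averages are illustrative numbers, not values from the text.

```python
# One-step weight adjustments under Hebb's activity product rule and
# under the covariance hypothesis, for a single synapse w_kj.
eta = 0.05                   # learning rate (illustrative)
x_j, y_k = 0.8, 0.6          # presynaptic and postsynaptic signals
x_bar, y_bar = 0.5, 0.5      # time-averaged signals (illustrative)

dw_hebb = eta * y_k * x_j                        # activity product rule
dw_cov = eta * (x_j - x_bar) * (y_k - y_bar)     # covariance hypothesis
print(dw_hebb, dw_cov)       # note dw_cov can be negative: weakening
```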
Competitive Learning
For a neuron k to be the winning neuron, its induced local field $v_k$ for a specified input pattern $x$ must be the largest among all the neurons in the network. The output signal $y_k$ of the winning neuron k is set equal to one; the output signals of all the neurons that lose the competition are set equal to zero.
We thus write:
$y_k = \begin{cases} 1 & \text{if } v_k > v_j \text{ for all } j,\ j \ne k \\ 0 & \text{otherwise} \end{cases}$
where the induced local field $v_k$ represents the combined action of all the forward and feedback inputs to neuron k.
Let $\omega_{kj}$ denote the synaptic weight connecting input node j to neuron k. Suppose that each neuron is allotted a fixed amount of synaptic weight (i.e., all synaptic weights are positive), which is distributed among its input nodes; that is,
$\sum_j \omega_{kj} = 1 \quad \text{for all } k$
A neuron then learns by shifting synaptic weights from its inactive to active input nodes.
If a neuron does not respond to a particular input pattern, no learning takes place in that
neuron.
If a particular neuron wins the competition, each input node of that neuron relinquishes (voluntarily gives up) some proportion of its synaptic weight, and the weight relinquished is then distributed equally among the active input nodes. According to the standard competitive learning rule, the change $\Delta\omega_{kj}$ applied to synaptic weight $\omega_{kj}$ is defined by
$\Delta\omega_{kj} = \begin{cases} \eta\,(x_j - \omega_{kj}) & \text{if neuron } k \text{ wins the competition} \\ 0 & \text{if neuron } k \text{ loses the competition} \end{cases}$
where $\eta$ is the learning-rate parameter and $x_j$ is the signal at the jth input node.
This rule has the overall effect of moving the synaptic weight vector $\omega_k$ of the winning neuron k towards the input pattern $x$.
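To close, here is a minimal sketch of one competitive-learning step (assuming NumPy): the neuron with the largest induced local field wins and moves its weight vector toward the input pattern, while the losers are left unchanged. The weights and input are illustrative.

```python
# One step of standard competitive learning: winner-take-all, then the
# winner's weight vector moves toward the input pattern.
import numpy as np

eta = 0.1                        # learning-rate parameter (illustrative)
W = np.array([[0.6, 0.4],        # row k holds the weights w_kj of neuron k
              [0.3, 0.7],
              [0.5, 0.5]])
x = np.array([0.9, 0.1])         # input pattern

v = W @ x                        # induced local fields v_k
winner = int(np.argmax(v))       # y_k = 1 for the winner, 0 for the rest

W[winner] += eta * (x - W[winner])   # delta_w_kj = eta (x_j - w_kj), winner only
print(winner, W[winner])
```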