Optimal Linear Filters
Randomness
The word random effectively means
unpredictable
In engineering practice we may treat some
signals as random to simplify the analysis
even though they may not actually be
random
Random Variables
Examples of assignments of numbers to the outcomes of
experiments.
Distribution Functions
The distribution function of a random variable X is the
probability that it is less than or equal to some value,
as a function of that value.
$F_X(x) = P(X \le x)$
Properties:
$0 \le F_X(x) \le 1,\quad -\infty < x < \infty$
$F_X(-\infty) = 0 \quad\text{and}\quad F_X(+\infty) = 1$
$F_X(x_1) \le F_X(x_2) \text{ if } x_1 < x_2$
$P(x_1 < X \le x_2) = F_X(x_2) - F_X(x_1)$
Distribution Functions
The distribution function for tossing a single die
$F_X(x) = \frac{1}{6}\Big(u(x-1) + u(x-2) + u(x-3) + u(x-4) + u(x-5) + u(x-6)\Big)$
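A minimal numerical sketch of this distribution function, using NumPy's `heaviside` as the unit step $u(\cdot)$:

```python
import numpy as np

# Unit-step-based CDF for a fair six-sided die: F_X(x) = (1/6) * sum of u(x - k)
def die_cdf(x):
    x = np.asarray(x, dtype=float)
    # heaviside(..., 1.0) sets u(0) = 1, so the CDF is right-continuous at the jumps
    return sum(np.heaviside(x - k, 1.0) for k in range(1, 7)) / 6.0

print(die_cdf([0.5, 1.0, 3.5, 6.0]))  # -> [0.0, 1/6, 0.5, 1.0]
```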
Distribution Functions
A possible distribution function for a continuous random
variable
Probability Density
The derivative of the distribution function is the probability
density function (pdf).
$f_X(x) = \frac{d}{dx}\Big(F_X(x)\Big)$
$f_X(x)\,dx = P(x < X \le x + dx)$
Properties:
$f_X(x) \ge 0,\quad -\infty < x < +\infty$
$\int_{-\infty}^{\infty} f_X(x)\,dx = 1$
$F_X(x) = \int_{-\infty}^{x} f_X(\lambda)\,d\lambda$
$P(x_1 < X \le x_2) = \int_{x_1}^{x_2} f_X(x)\,dx$
Expected Value
and the expected value is then approximated by
$E(X) \cong \sum_{i=1}^{M} x_i\,P\!\left(x_i - \frac{\Delta x}{2} < X \le x_i + \frac{\Delta x}{2}\right) \cong \sum_{i=1}^{M} x_i\,f_X(x_i)\,\Delta x$
In the limit as $\Delta x \to 0$,
$E(X) = \int_{-\infty}^{\infty} x\,f_X(x)\,dx$
Similarly,
$E\big(g(X)\big) = \int_{-\infty}^{\infty} g(x)\,f_X(x)\,dx$
and the $n$th moment is
$E(X^n) = \int_{-\infty}^{\infty} x^n\,f_X(x)\,dx$
The first two moments are
$E(X) = \int_{-\infty}^{\infty} x\,f_X(x)\,dx \quad\text{and}\quad E(X^2) = \int_{-\infty}^{\infty} x^2\,f_X(x)\,dx$
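A minimal sketch of these moment integrals as Riemann sums, assuming an exponential pdf as the example (exact moments $1/\lambda$ and $2/\lambda^2$):

```python
import numpy as np

# Riemann-sum approximation of E(X) and E(X^2) for f_X(x) = lam * exp(-lam * x), x >= 0.
lam = 2.0
dx = 1e-4
x = np.arange(0.0, 20.0, dx)          # grid wide enough that the tail is negligible
fx = lam * np.exp(-lam * x)

EX  = np.sum(x * fx) * dx             # E(X)   ~ sum of x_i f_X(x_i) dx
EX2 = np.sum(x**2 * fx) * dx          # E(X^2) ~ sum of x_i^2 f_X(x_i) dx
print(EX, 1/lam)                      # ~0.5 vs exact 0.5
print(EX2, 2/lam**2)                  # ~0.5 vs exact 0.5
```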
Central Moments and Variance
The $n$th central moment of X is
$E\Big(\big(X - E(X)\big)^n\Big) = \int_{-\infty}^{\infty} \big(x - E(X)\big)^n f_X(x)\,dx$
The second central moment is the variance,
$\sigma_X^2 = E\Big(\big(X - E(X)\big)^2\Big) = \int_{-\infty}^{\infty} \big(x - E(X)\big)^2 f_X(x)\,dx$
Some properties of expectation:
$E(a) = a,\quad E(aX) = a\,E(X),\quad E\Big(\sum_n X_n\Big) = \sum_n E(X_n)$
A useful identity relating the variance to the moments:
$\sigma_X^2 = E\big(X^2\big) - E^2(X)$
Joint Probability Distribution
The joint distribution function of two random variables X and Y is
$F_{XY}(x,y) = P(X \le x \text{ and } Y \le y)$
with
$F_{XY}(-\infty,y) = F_{XY}(x,-\infty) = 0 \quad\text{and}\quad F_{XY}(+\infty,+\infty) = 1$
Joint Probability Density
The joint pdf is
$f_{XY}(x,y) = \frac{\partial^2}{\partial x\,\partial y}\,F_{XY}(x,y)$
Properties:
$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{XY}(x,y)\,dx\,dy = 1$
$F_{XY}(x,y) = \int_{-\infty}^{y}\int_{-\infty}^{x} f_{XY}(\alpha,\beta)\,d\alpha\,d\beta$
$f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x,y)\,dy \quad\text{and}\quad f_Y(y) = \int_{-\infty}^{\infty} f_{XY}(x,y)\,dx$
$P\big((X,Y) \in R\big) = \int\!\!\int_R f_{XY}(x,y)\,dx\,dy$
$P(x_1 < X \le x_2,\ y_1 < Y \le y_2) = \int_{y_1}^{y_2}\int_{x_1}^{x_2} f_{XY}(x,y)\,dx\,dy$
$E\big(g(X,Y)\big) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x,y)\,f_{XY}(x,y)\,dx\,dy$
Independence
If X and Y are independent,
$f_{XY}(x,y) = f_X(x)\,f_Y(y)$
and
$E(XY) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\,f_{XY}(x,y)\,dx\,dy = \int_{-\infty}^{\infty} y\,f_Y(y)\,dy \int_{-\infty}^{\infty} x\,f_X(x)\,dx = E(X)\,E(Y)$
Covariance
The covariance of X and Y is
$\sigma_{XY} = E\Big(\big(X - E(X)\big)\big(Y - E(Y)\big)^*\Big) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \big(x - E(X)\big)\big(y - E(Y)\big)^* f_{XY}(x,y)\,dx\,dy$
which can also be written as
$\sigma_{XY} = E(XY^*) - E(X)\,E(Y^*)$
If X and Y are independent,
$\sigma_{XY} = E(X)\,E(Y^*) - E(X)\,E(Y^*) = 0$
The variance of a sum is
$\sigma_{X+Y}^2 = \sigma_X^2 + \sigma_Y^2 + 2\sigma_{XY} = \sigma_X^2 + \sigma_Y^2 + 2\rho_{XY}\,\sigma_X\,\sigma_Y$
If Z is a linear combination of random variables $X_i$,
$Z = a_0 + \sum_{i=1}^{N} a_i X_i$
then
$E(Z) = a_0 + \sum_{i=1}^{N} a_i\,E(X_i)$
and
$\sigma_Z^2 = \sum_{i=1}^{N}\sum_{j=1}^{N} a_i a_j\,\sigma_{X_i X_j} = \sum_{i=1}^{N} a_i^2\,\sigma_{X_i}^2 + \sum_{i=1}^{N}\sum_{\substack{j=1 \\ j \ne i}}^{N} a_i a_j\,\sigma_{X_i X_j}$
If the $X_i$ are all mutually independent, the covariances vanish and
$\sigma_Z^2 = \sum_{i=1}^{N} a_i^2\,\sigma_{X_i}^2$
If Z is simply the sum of the Xs, and the Xs are all independent of each other, then the variance of the sum is the sum of the variances:
$\sigma_Z^2 = \sum_{i=1}^{N} \sigma_{X_i}^2$
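A minimal Monte Carlo sketch of this property, with three arbitrarily chosen independent distributions:

```python
import numpy as np

# Check that the variance of a sum of independent random variables
# is the sum of their variances.
rng = np.random.default_rng(0)
n_trials = 200_000
X1 = rng.uniform(-1, 1, n_trials)      # variance 1/3
X2 = rng.normal(0, 2.0, n_trials)      # variance 4
X3 = rng.exponential(1.5, n_trials)    # variance 2.25
Z = X1 + X2 + X3
print(Z.var(), 1/3 + 4 + 2.25)         # both ~6.583
```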
If Z is the sum of N independent random variables $X_n$,
$Z = \sum_{n=1}^{N} X_n$
then its pdf is the convolution of the individual pdfs,
$f_Z(z) = f_{X_1}(z) * f_{X_2}(z) * \cdots * f_{X_N}(z)$
and it can be shown that, under very general conditions, the pdf
of a sum of a large number of independent random variables
with continuous pdfs approaches a limiting shape called the
Gaussian pdf regardless of the shapes of the individual pdfs.
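A minimal sketch of this limiting behavior, assuming uniform pdfs on [0, 1] convolved repeatedly and compared against a Gaussian of matching mean and variance:

```python
import numpy as np

# Convolve a uniform pdf on [0, 1] with itself N-1 times (pdf of a sum of
# N independent uniforms) and compare with the matching Gaussian.
dx = 1e-3
f = np.ones(int(1/dx))                 # uniform pdf on [0, 1], height 1
fz = f.copy()
N = 8
for _ in range(N - 1):
    fz = np.convolve(fz, f) * dx       # each added term adds one convolution

z = np.arange(len(fz)) * dx            # support of the sum: [0, N]
mu, var = N * 0.5, N / 12.0            # mean and variance of the sum
gauss = np.exp(-(z - mu)**2 / (2*var)) / np.sqrt(2*np.pi*var)
print(np.max(np.abs(fz - gauss)))      # small: the result is nearly Gaussian
```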
The Gaussian pdf
$f_X(x) = \frac{1}{\sigma_X\sqrt{2\pi}}\,e^{-\left(x - \mu_X\right)^2 / 2\sigma_X^2}$
where
$\mu_X = E(X) \quad\text{and}\quad \sigma_X^2 = E\Big(\big(X - E(X)\big)^2\Big)$
Stochastic Processes
A stochastic process is an ensemble of sample functions: each outcome of the underlying experiment selects one sample function $X(t)$ from the ensemble $\{X(t)\}$.
[Figure: an ensemble of sample functions, with one sample function highlighted.]
An example of a stochastic process:
$X(t) = A\cos\left(2\pi f_0 t + \Theta\right)$
where the phase $\Theta$ is a random variable.
Stationarity
If all the multivariate statistical descriptors of a random process are not functions of time, the random process is said to be strict-sense stationary (SSS).
A random process is wide-sense stationary (WSS) if
$E\big(X(t_1)\big)$ is independent of $t_1$
and
$E\big(X(t_1)\,X^*(t_2)\big)$ depends only on the difference $t_2 - t_1$.
Ergodicity
If all of the sample functions of a random process have the
same statistical properties the random process is said to be
ergodic. The most important consequence of ergodicity is that
ensemble moments can be replaced by time moments.
$E\big(X^n\big) = \lim_{T\to\infty}\frac{1}{T}\int_{-T/2}^{T/2} X^n(t)\,dt$
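A minimal sketch of replacing an ensemble moment by a time moment, assuming the random-phase cosine example above with $\Theta$ uniform on $[0, 2\pi)$, for which $E(X^2) = A^2/2$:

```python
import numpy as np

# Time average of X^2(t) over one sample function of X(t) = A cos(2 pi f0 t + theta),
# compared with the ensemble moment E(X^2) = A^2 / 2.
rng = np.random.default_rng(1)
A, f0 = 2.0, 5.0
theta = rng.uniform(0, 2*np.pi)        # one realization of the random phase
t = np.arange(0, 1000, 1e-3)           # long observation window
x = A * np.cos(2*np.pi*f0*t + theta)
print(np.mean(x**2), A**2 / 2)         # time moment ~ ensemble moment
```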
Correlation
Let $X_1 = X(t_1)$ and $Y_2 = Y(t_2)$. Then
$R_{XY}(t_1,t_2) = E\big(X_1 Y_2^*\big) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x_1\,y_2^*\,f_{XY}(x_1,y_2;t_1,t_2)\,dx_1\,dy_2$
For WSS processes this depends only on the difference $\tau = t_2 - t_1$:
$R_{XY}(\tau) = E\big(X(t)\,Y^*(t+\tau)\big)$
Autocorrelation
If X and Y represent the same stochastic CT process then the
correlation function becomes the special case called
autocorrelation.
$R_X(\tau) = E\big(X(t)\,X^*(t+\tau)\big)$
For an ergodic stochastic process, the time average equals the ensemble average:
$R_X(\tau) = \lim_{T\to\infty}\frac{1}{T}\int_{-T/2}^{T/2} X(t)\,X^*(t+\tau)\,dt = \big\langle X(t)\,X^*(t+\tau)\big\rangle$
(The angle brackets $\langle\,\cdot\,\rangle$ indicate a time average in the same way that $E(\cdot)$ indicates an ensemble average.)
Autocorrelation
$R_X(t,t) = E\big(X^2(t)\big)$ (the mean-squared value of X)
For WSS stochastic CT processes,
$R_X(0) = E\big(X^2(t)\big)$ (the mean-squared value of X)
and
$R_X(0) = \lim_{T\to\infty}\frac{1}{T}\int_{-T/2}^{T/2} X^2(t)\,dt = \big\langle X^2(t)\big\rangle$ (the average signal power of X)
Correlation
If $X_1 = X[n_1]$ and $Y_2 = Y[n_2]$, then
$R_{XY}[n_1,n_2] = E\big(X_1 Y_2^*\big) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x_1\,y_2^*\,f_{XY}(x_1,y_2;n_1,n_2)\,dx_1\,dy_2$
For WSS processes,
$R_{XY}[m] = E\big(X[n]\,Y^*[n+m]\big)$
Autocorrelation
If X and Y represent the same stochastic DT process then the
correlation function becomes the special case called
autocorrelation.
$R_X[m] = E\big(X[n]\,X^*[n+m]\big)$
For an ergodic stochastic process,
$\lim_{N\to\infty}\frac{1}{2N}\sum_{n=-N}^{N-1} X[n]\,X^*[n+m] = \big\langle X[n]\,X^*[n+m]\big\rangle = R_X[m]$
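A minimal sketch of estimating $R_X[m]$ by time averaging over one long record (finite data makes this an estimate):

```python
import numpy as np

# Time-average estimate of R_X[m] from a single record of an ergodic process.
def autocorr(x, max_lag):
    N = len(x)
    # R_X[m] ~ (1/N) * sum over n of x[n] * conj(x[n+m])
    return np.array([np.sum(x[:N-m] * np.conj(x[m:])) / N for m in range(max_lag+1)])

rng = np.random.default_rng(2)
w = rng.normal(0, 1, 100_000)            # unit-variance white noise: R_X[m] ~ delta[m]
print(np.round(autocorr(w, 4).real, 3))  # ~ [1, 0, 0, 0, 0]
```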
Autocorrelation
$R_X[n,n] = E\big(X^2[n]\big)$ (the mean-squared value of X)
For WSS stochastic DT processes,
$R_X[0] = E\big(X^2[n]\big)$ (the mean-squared value of X)
and
$R_X[0] = \lim_{N\to\infty}\frac{1}{2N}\sum_{n=-N}^{N-1} X^2[n] = \big\langle X^2[n]\big\rangle$ (the average signal power of X)
Properties of Autocorrelation
Autocorrelation is an even function
$R_X(\tau) = R_X(-\tau) \quad\text{or}\quad R_X[m] = R_X[-m]$
and its maximum magnitude occurs at zero shift,
$\big|R_X(\tau)\big| \le R_X(0) \quad\text{or}\quad \big|R_X[m]\big| \le R_X[0]$
Properties of Autocorrelation
If $X(t)$ is zero-mean with no periodic components,
$\lim_{|\tau|\to\infty} R_X(\tau) = 0 \quad\text{or}\quad \lim_{|m|\to\infty} R_X[m] = 0$
The Fourier transform of an autocorrelation can never be negative,
$\mathcal{F}\big\{R_X(\tau)\big\} \ge 0 \text{ for all } f \quad\text{or}\quad \mathcal{F}\big\{R_X[m]\big\} \ge 0 \text{ for all } F$
so only certain shapes of autocorrelation functions are possible.
A time shift of a function does not affect its autocorrelation.
Autocovariance
Autocovariance is similar to autocorrelation. Autocovariance
is the autocorrelation of the time-varying part of a signal.
$C_X(\tau) = R_X(\tau) - E^2(X) \quad\text{or}\quad C_X[m] = R_X[m] - E^2(X)$
Crosscorrelation
Properties:
$R_{XY}(-\tau) = R_{YX}(\tau) \quad\text{or}\quad R_{XY}[-m] = R_{YX}[m]$
$\big|R_{XY}(\tau)\big| \le \sqrt{R_X(0)\,R_Y(0)} \quad\text{or}\quad \big|R_{XY}[m]\big| \le \sqrt{R_X[0]\,R_Y[0]}$
If X and Y are independent,
$R_{XY}(\tau) = E(X)\,E(Y^*) = R_{YX}(\tau) \quad\text{or}\quad R_{XY}[m] = E(X)\,E(Y^*) = R_{YX}[m]$
If $Z(t) = X(t) + Y(t)$ and X and Y are independent and at least one of them has zero mean, then
$R_Z(\tau) = R_X(\tau) + R_Y(\tau)$
For a differentiable WSS process,
$R_{X\dot X}(\tau) = \frac{d}{d\tau}\,R_X(\tau) \quad\text{and}\quad R_{\dot X}(\tau) = -\frac{d^2}{d\tau^2}\,R_X(\tau)$
Power Spectral Density
Let $X_T(t)$ be a sample function truncated to the interval $-T/2 < t < T/2$. Its Fourier transform is
$\mathcal{F}\big(X_T(t)\big) = \int_{-T/2}^{T/2} X_T(t)\,e^{-j2\pi ft}\,dt,\quad T < \infty$
By Parseval's theorem,
$\int_{-T/2}^{T/2} \big|X_T(t)\big|^2\,dt = \int_{-\infty}^{\infty} \Big|\mathcal{F}\big(X_T(t)\big)\Big|^2\,df$
Dividing through by T,
$\frac{1}{T}\int_{-T/2}^{T/2} \big|X_T(t)\big|^2\,dt = \int_{-\infty}^{\infty} \frac{1}{T}\Big|\mathcal{F}\big(X_T(t)\big)\Big|^2\,df$
Taking the expectation of both sides,
$E\left(\frac{1}{T}\int_{-T/2}^{T/2} \big|X_T(t)\big|^2\,dt\right) = \int_{-\infty}^{\infty} \frac{1}{T}\,E\left(\Big|\mathcal{F}\big(X_T(t)\big)\Big|^2\right)df$
and letting $T \to \infty$,
$\lim_{T\to\infty} E\left(\frac{1}{T}\int_{-T/2}^{T/2} \big|X(t)\big|^2\,dt\right) = \lim_{T\to\infty}\int_{-\infty}^{\infty} \frac{E\left(\Big|\mathcal{F}\big(X_T(t)\big)\Big|^2\right)}{T}\,df$
$E\big(|X|^2\big) = \int_{-\infty}^{\infty} \lim_{T\to\infty} \frac{E\left(\Big|\mathcal{F}\big(X_T(t)\big)\Big|^2\right)}{T}\,df$
The integrand is defined as the power spectral density (PSD),
$G_X(f) = \lim_{T\to\infty} \frac{E\left(\Big|\mathcal{F}\big(X_T(t)\big)\Big|^2\right)}{T}$
$\int_{-\infty}^{\infty} G_X(f)\,df = \text{average power of } \{X(t)\}$
PSD Concept
[Figure illustrating the PSD concept.]
For DT signals, the corresponding relation is
$\frac{1}{2\pi}\int_{2\pi} G_X(\Omega)\,d\Omega = \text{mean-squared value of } X[n]$
where the Fourier transform is the discrete-time Fourier transform (DTFT) defined by
$x[n] = \mathcal{F}^{-1}\big(X(F)\big) = \int_{1} X(F)\,e^{j2\pi Fn}\,dF \;\longleftrightarrow\; X(F) = \mathcal{F}\big(x[n]\big) = \sum_{n=-\infty}^{\infty} x[n]\,e^{-j2\pi Fn}$
or
$x[n] = \mathcal{F}^{-1}\big(X(\Omega)\big) = \frac{1}{2\pi}\int_{2\pi} X(\Omega)\,e^{j\Omega n}\,d\Omega \;\longleftrightarrow\; X(\Omega) = \mathcal{F}\big(x[n]\big) = \sum_{n=-\infty}^{\infty} x[n]\,e^{-j\Omega n}$
The PSD and the autocorrelation form a Fourier transform pair (the Wiener-Khinchin theorem):
$G_X(f) = \mathcal{F}\big(R_X(\tau)\big) \quad\text{or}\quad G_X(F) = \mathcal{F}\big(R_X[m]\big)$
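A minimal numerical sketch of this transform pair, assuming the AR(1) process $x[n] = 0.5\,x[n-1] + w[n]$ with unit-variance white $w$, for which $R_x[m] = 0.5^{|m|}/0.75$ and $G_x(F) = 1/\big|1 - 0.5\,e^{-j2\pi F}\big|^2$ in closed form:

```python
import numpy as np

# Verify G_X(F) = F{R_X[m]} for a known autocorrelation/PSD pair.
F = np.linspace(0, 0.5, 501)
G_exact = 1.0 / np.abs(1 - 0.5*np.exp(-2j*np.pi*F))**2

m = np.arange(1, 200)                       # truncate the DTFT sum; 0.5**200 ~ 0
Rm = 0.5**m / 0.75
G_dtft = 1/0.75 + 2*np.sum(Rm[:, None] * np.cos(2*np.pi*np.outer(m, F)), axis=0)
print(np.max(np.abs(G_dtft - G_exact)))     # ~0: the pair matches
```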
White Noise
White noise is a stochastic process whose PSD is constant.
$G_X(f) = A \quad\text{or}\quad G_X(F) = A$
For CT signals, signal power is the integral of PSD over all frequency space. Therefore the power of white noise is infinite:
$E\big(X^2\big) = \int_{-\infty}^{\infty} A\,df \to \infty$
White Noise
In many kinds of CT noise analysis, a type of random process known as bandlimited white noise is used. Bandlimited white noise is simply the response of an ideal lowpass filter which is excited by white noise.
The PSD of bandlimited white noise is constant over a finite frequency range and zero outside that range. Bandlimited white noise has finite signal power.
Cross Power Spectral Density
The crosscorrelation and the cross power spectral density also form Fourier transform pairs,
$R_{XY}(\tau) \overset{\mathcal{F}}{\longleftrightarrow} G_{XY}(f) \quad\text{or}\quad R_{XY}[m] \overset{\mathcal{F}}{\longleftrightarrow} G_{XY}(F)$
Properties:
$G_{XY}(f) = G_{YX}^*(f) \quad\text{or}\quad G_{XY}(F) = G_{YX}^*(F)$
$\operatorname{Re}\big(G_{XY}(f)\big)$ and $\operatorname{Re}\big(G_{YX}(f)\big)$ are both even
$\operatorname{Im}\big(G_{XY}(f)\big)$ and $\operatorname{Im}\big(G_{YX}(f)\big)$ are both odd
Time-Domain Linear System Analysis
The response of an LTI system is the convolution of the excitation with the impulse response,
$y(t) = x(t)*h(t) \quad\text{or}\quad y[n] = x[n]*h[n]$
For a stochastic excitation,
$Y(t) = \int_{-\infty}^{\infty} X(t-\tau)\,h(\tau)\,d\tau$
$Y(t) = \int_{-\infty}^{\infty} X(t-\tau)\,h(\tau)\,d\tau$
$E\big(Y(t)\big) = E\left(\int_{-\infty}^{\infty} X(t-\tau)\,h(\tau)\,d\tau\right) = \int_{-\infty}^{\infty} E\big(X(t-\tau)\big)\,h(\tau)\,d\tau$
If X is WSS, $E\big(X(t-\tau)\big) = E(X)$ and
$E(Y) = E(X)\int_{-\infty}^{\infty} h(\tau)\,d\tau$
Using $\int_{-\infty}^{\infty} h(t)\,dt = H(0)$,
$E(Y) = E(X)\,H(0)$
Similarly, for DT systems, $E(Y) = E(X)\sum_{n=-\infty}^{\infty} h[n] = E(X)\,H(0)$
The autocorrelation of the response is
$R_Y(\tau) = R_X(\tau)*h(\tau)*h(-\tau) \quad\text{or}\quad R_Y[m] = R_X[m]*h[m]*h[-m]$
The crosscorrelations between excitation and response are
$R_{XY}(\tau) = R_X(\tau)*h(\tau) \quad\text{or}\quad R_{XY}[m] = R_X[m]*h[m]$
and
$R_{YX}(\tau) = R_X(\tau)*h(-\tau) \quad\text{or}\quad R_{YX}[m] = R_X[m]*h[-m]$
Frequency-Domain Linear System Analysis
The frequency-domain relationship between excitation and response of an LTI system is the Fourier transform of the time-domain relationship.
$R_Y(\tau) = R_X(\tau)*h(\tau)*h(-\tau) \;\overset{\mathcal{F}}{\longleftrightarrow}\; G_Y(f) = G_X(f)\,H(f)\,H^*(f) = G_X(f)\,\big|H(f)\big|^2$
$R_Y[m] = R_X[m]*h[m]*h[-m] \;\overset{\mathcal{F}}{\longleftrightarrow}\; G_Y(F) = G_X(F)\,H(F)\,H^*(F) = G_X(F)\,\big|H(F)\big|^2$
$E\big(Y^2\big) = \int_{-\infty}^{\infty} G_Y(f)\,df = \int_{-\infty}^{\infty} G_X(f)\,\big|H(f)\big|^2\,df \quad\text{or}\quad E\big(Y^2\big) = \int_{1} G_Y(F)\,dF = \int_{1} G_X(F)\,\big|H(F)\big|^2\,dF$
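A minimal numerical sketch of $G_Y = G_X\,|H|^2$, assuming white noise through a short example FIR filter (so $G_X(F) = 1$):

```python
import numpy as np

# Pass white noise through an FIR filter and compare the averaged output
# periodogram with G_X(F) * |H(F)|^2 (here G_X(F) = 1).
rng = np.random.default_rng(4)
h = np.array([0.5, 1.0, 0.5])            # example FIR impulse response
Nfft, Nrec = 512, 2000
Gy = np.zeros(Nfft)
for _ in range(Nrec):
    x = rng.normal(0, 1, Nfft)
    y = np.convolve(x, h, mode="same")   # filtered record (edge effects negligible)
    Gy += np.abs(np.fft.fft(y))**2 / Nfft
Gy /= Nrec                               # averaged periodogram ~ G_Y(F)
H = np.fft.fft(h, Nfft)
print(np.max(np.abs(Gy - np.abs(H)**2)) / np.max(np.abs(H)**2))  # small
```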
Autocorrelation of ARMA Processes
Let x[n] be produced by exciting a stable LTI filter with white noise w[n]. The filter's frequency response is $H(F) = \dfrac{B(F)}{A(F)}$,
where $B(F) = \sum_{k=0}^{q} b_k\,e^{-j2\pi Fk}$ and $A(F) = 1 + \sum_{k=1}^{p} a_k\,e^{-j2\pi Fk}$,
and where the excitation and response are related by the difference equation
$x[n] + \sum_{k=1}^{p} a_k\,x[n-k] = \sum_{k=0}^{q} b_k\,w[n-k]$
Multiplying the difference equation, written at time n + m, by x[n] and taking expectations gives
$R_{xx}[m] = -\sum_{k=1}^{p} a_k\,R_{xx}[m-k] + \sum_{k=0}^{q} b_k\,R_{wx}[k-m]$
Using $x[n] = h[n]*w[n] = \sum_{\ell=-\infty}^{\infty} h[\ell]\,w[n-\ell]$, the crosscorrelation between the white noise and the response is
$R_{wx}[m] = \sigma_w^2\,h[m]$
Combining $R_{xx}[m] = -\sum_{k=1}^{p} a_k\,R_{xx}[m-k] + \sum_{k=0}^{q} b_k\,R_{wx}[k-m]$
with $R_{wx}[m] = \sigma_w^2\,h[m]$ we get
$R_{xx}[m] = -\sum_{k=1}^{p} a_k\,R_{xx}[m-k] + \sigma_w^2 \sum_{k=0}^{q} b_k\,h[k-m]$
For AR systems,
$R_{xx}[m] = -\sum_{k=1}^{p} a_k\,R_{xx}[m-k] + \sigma_w^2\,h[-m]$
For MA systems,
$R_{xx}[m] = \sigma_w^2 \sum_{k=0}^{q} b_k\,h[k+m]$
(These correspond to Eqs. 12.2.18, 12.2.19 and 12.2.21 in Proakis.)
Since h[n] is causal, $h[-m] = 0$ for $m > 0$, so
$R_{xx}[m] = -\sum_{k=1}^{p} a_k\,R_{xx}[m-k],\quad m > 0$
For the example AR system $x[n] = 0.2\,x[n-1] + w[n]$,
$R_{xx}[m] = 0.2\,R_{xx}[m-1] \quad\Rightarrow\quad \frac{R_{xx}[m]}{R_{xx}[m-1]} = 0.2,\ m > 0.$ Check.
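A minimal simulation sketch of this check, assuming the AR(1) system $x[n] = 0.2\,x[n-1] + w[n]$ with unit-variance white noise:

```python
import numpy as np

# Simulate x[n] = 0.2 x[n-1] + w[n] and check R_xx[m] / R_xx[m-1] = 0.2 for m > 0.
rng = np.random.default_rng(5)
N = 500_000
w = rng.normal(0, 1, N)
x = np.zeros(N)
for n in range(1, N):
    x[n] = 0.2 * x[n-1] + w[n]

R = [np.mean(x[:N-m] * x[m:]) for m in range(4)]   # time-average R_xx[m]
print([R[m] / R[m-1] for m in range(1, 4)])        # each ratio ~ 0.2
```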
For $m = 0$ (using $h[0] = 1$),
$R_{xx}[0] = -\sum_{k=1}^{p} a_k\,R_{xx}[-k] + \sigma_w^2,\quad m = 0$
which for the example system is
$R_{xx}[0] = 0.2\,R_{xx}[-1] + \sigma_w^2$
For $m < 0$, $h[-m] = (0.2)^{-m}\,u[-m]$ and
$R_{xx}[m] = 0.2\,R_{xx}[m-1] + \sigma_w^2\,(0.2)^{-m},\quad m < 0$
and for MA systems,
$R_{xx}[m] = \sigma_w^2 \sum_{k} b_k\,b_{k+m},\quad 0 \le k+m \le q$
where the sum runs over the values of k for which both $b_k$ and $b_{k+m}$ are defined.
Linear Prediction
The forward prediction error of a pth-order linear predictor is
$f_p[n] = x[n] - \hat{x}[n] = x[n] + \sum_{k=1}^{p} a_p[k]\,x[n-k]$
The lattice recursions are initialized with
$f_0[n] = g_0[n] = x[n]$
and the predictor polynomials satisfy
$A_m(z) = A_{m-1}(z) + K_m\,z^{-1}\,B_{m-1}(z)$
$B_m(z) = z^{-m}\,A_m(z^{-1})$
$A_{m-1}(z) = \frac{A_m(z) - K_m\,B_m(z)}{1 - K_m^2}$
$K_m = a_m[m]$
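A minimal Levinson-Durbin sketch consistent with these recursions (coefficient-domain form, assuming real data; `R` holds $R_{xx}[0..p]$):

```python
import numpy as np

# Levinson-Durbin recursion: predictor coefficients a_p[1..p] and reflection
# coefficients K_m from autocorrelation values R[0..p].
def levinson(R, p):
    a = np.zeros(p + 1); a[0] = 1.0     # a_m[0] = 1 by convention
    E = R[0]                            # zeroth-order prediction error power
    K = np.zeros(p + 1)
    for m in range(1, p + 1):
        # K_m = -(R[m] + sum of a_{m-1}[k] R[m-k]) / E_{m-1}
        K[m] = -(R[m] + np.dot(a[1:m], R[m-1:0:-1])) / E
        a_new = a.copy()
        a_new[m] = K[m]
        a_new[1:m] = a[1:m] + K[m] * a[m-1:0:-1]   # order update of coefficients
        a = a_new
        E *= (1 - K[m]**2)              # E_m = (1 - K_m^2) E_{m-1}
    return a, K, E

# Example: AR(1) with a_1 = -0.2 has R[m] = 0.2**m / 0.96 (unit-variance input).
R = 0.2**np.arange(4) / 0.96
a, K, E = levinson(R, 3)
print(np.round(a, 6), np.round(E, 6))   # a ~ [1, -0.2, 0, 0], E ~ 1.0
```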
The mean-squared prediction error (with $a_p[0] = 1$) is
$E\Big(\big|f_p[n]\big|^2\Big) = E\left(\sum_{k=0}^{p} a_p[k]\,x[n-k]\ \sum_{q=0}^{p} a_p^*[q]\,x^*[n-q]\right)$
$E\Big(\big|f_p[n]\big|^2\Big) = \sum_{k=0}^{p}\sum_{q=0}^{p} a_p[k]\,a_p^*[q]\,R_{xx}[k-q]$
$E\Big(\big|f_p[n]\big|^2\Big) = R_{xx}[0] + 2\operatorname{Re}\left(\sum_{k=1}^{p} a_p[k]\,R_{xx}[k]\right) + \sum_{k=1}^{p} \big|a_p[k]\big|^2\,R_{xx}[0] + 2\operatorname{Re}\left(\sum_{q=1}^{p-1}\sum_{k=q+1}^{p} a_p[k]\,a_p^*[q]\,R_{xx}[k-q]\right)$
Minimizing the mean-squared prediction error with respect to the coefficients yields
$R_{xx}[l] = -\sum_{k=1}^{p} a_p[k]\,R_{xx}[l-k],\quad l = 1,2,\ldots,p$
and the minimum value is
$\min\Big(E\Big(\big|f_p[n]\big|^2\Big)\Big) = E_p^f = R_{xx}[0] + \sum_{k=1}^{p} a_p[k]\,R_{xx}[-k]$
Backward Prediction
A backward predictor estimates $x[n-p]$ from the p samples that follow it,
$\hat{x}[n-p] = -\sum_{k=0}^{p-1} b_p[k]\,x[n-k]$
with backward prediction error
$g_p[n] = x[n-p] + \sum_{k=0}^{p-1} b_p[k]\,x[n-k]$
Its minimum mean-squared value is the same as for forward prediction,
$\min\Big(E\Big(\big|g_p[n]\big|^2\Big)\Big) = E_p^g = E_p^f$
The minimum forward prediction error power can be recursively computed by
$E_m^f = \big(1 - K_m^2\big)\,E_{m-1}^f$
Wiener Filters
Below is a general system model for an optimal linear filter called a Wiener filter. It makes the best estimate of d[n] based on the observed signal x[n]; the estimation error is e[n].
[Figure: Wiener filter system model.]
Wiener Filters
The filter is optimized by minimizing the mean-squared
estimation error
$E\Big(\big|e[n]\big|^2\Big) = E\left(\bigg|d[n] - \sum_{m=0}^{M-1} h[m]\,x[n-m]\bigg|^2\right)$
Setting the derivatives with respect to the filter coefficients to zero yields a set of equations called the Wiener-Hopf equations,
$\sum_{m=0}^{M-1} h[m]\,R_{xx}[l-m] = R_{dx}[l],\quad l = 0,1,\ldots,M-1$
Wiener Filters
The Wiener-Hopf equations can be compactly written in matrix form as $R_M\,h_M = r_d$, where $R_M$ is an $M \times M$ matrix of autocorrelation values $R_{xx}[l-m]$, $h_M$ is the column vector of filter coefficients and $r_d$ is the column vector of crosscorrelations $R_{dx}[l]$. The minimum mean-squared error is
$\min\Big(E\Big(\big|e[n]\big|^2\Big)\Big) = \sigma_d^2 - r_d^T\,R_M^{-1}\,r_d$
For the filtering problem, in which $x[n] = s[n] + w[n]$ with s and w independent and $d[n] = s[n]$, the Wiener-Hopf equations become
$\sum_{m=0}^{M-1} h[m]\Big(R_{ss}[l-m] + R_{ww}[l-m]\Big) = R_{ss}[l],\quad l = 0,1,\ldots,M-1$
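A minimal numerical sketch of solving the Wiener-Hopf equations for the filtering problem, assuming the AR(1)-plus-white-noise model of the IIR example below ($R_{ss}[m] = 0.6^{|m|}$, $R_{ww}[m] = \delta[m]$):

```python
import numpy as np

# Solve R_M h = r_d for an M-tap FIR Wiener filter.
M = 8
m = np.arange(M)
Rss = 0.6**np.abs(m[:, None] - m[None, :])   # R_ss[l - m], a Toeplitz matrix
Rww = np.eye(M)                              # white noise, unit variance
RM = Rss + Rww                               # R_xx = R_ss + R_ww
rd = 0.6**m                                  # R_dx[l] = R_ss[l]
h = np.linalg.solve(RM, rd)
mmse = 1.0 - rd @ h                          # sigma_d^2 - r_d^T R_M^-1 r_d
print(np.round(h, 4))                        # first tap approaches 4/9 for large M
print(mmse)                                  # decreases toward the IIR value as M grows
```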
Wiener Filters
The Wiener-Hopf equations for IIR filters are similar to those for FIR filters except that the impulse response has infinitely many terms and the equations must hold at every non-negative lag:
$\sum_{m=0}^{\infty} h[m]\,R_{xx}[l-m] = R_{dx}[l],\quad l \ge 0$
Wiener Filters
A stationary random process x[n] with autocorrelation $R_{xx}[m]$ and power spectral density $G_{xx}(F)$ can be represented by an equivalent innovations process by factoring the PSD as
$G_{xx}(z) = \sigma_i^2\,G_{\min}(z)\,G_{\min}(z^{-1})$
where $G_{\min}(z)$ is minimum-phase.
Wiener Filters
It can be shown that the optimal IIR causal Wiener filter has the frequency response
$H_{\text{opt}}(F) = \frac{1}{\sigma_i^2\,G_{\min}(F)}\left[\frac{G_{dx}(F)}{G_{\min}^*(F)}\right]_+$
where $G_{dx}(F)$ is the cross power spectral density between d[n] and x[n] and the subscript "+" on the square brackets means "the causal part".
Wiener Filters
IIR Wiener Filter Example (An extension of Example 12.7.2 in Proakis)
The signal s[n] is generated by exciting, with white noise v[n] of variance $\sigma_v^2 = 0.64$, a filter whose transfer function is
$H(z) = \frac{1}{1 - 0.6z^{-1}} = \frac{z}{z - 0.6}$
The observation is $x[n] = s[n] + w[n]$, where w[n] is white noise with $\sigma_w^2 = 1$.
Wiener Filters
IIR Wiener Filter Example (An extension of Example 12.7.2 in Proakis)
$G_{ss}(z) = \sigma_v^2\,H(z)\,H(z^{-1}) = 0.64\,\frac{1}{1 - 0.6z^{-1}}\cdot\frac{1}{1 - 0.6z} = \frac{0.64}{1.36 - 0.6\left(z^{-1} + z\right)}$
$G_{xx}(z) = G_{ss}(z) + G_{ww}(z) = \frac{0.64}{1.36 - 0.6\left(z^{-1} + z\right)} + 1 = \frac{2 - 0.6\left(z^{-1} + z\right)}{1.36 - 0.6\left(z^{-1} + z\right)}$
Factoring the numerator into the form $\left(a - bz^{-1}\right)\left(a - bz\right)$,
$G_{xx}(z) = \frac{\left(a - bz^{-1}\right)\left(a - bz\right)}{\left(1 - 0.6z^{-1}\right)\left(1 - 0.6z\right)}$
Wiener Filters
IIR Wiener Filter Example (An extension of Example 12.7.2 in Proakis)
The factorization gives $a^2 = 1.8$ and $b/a = 1/3$, so
$G_{xx}(z) = \frac{1.8\left(1 - (1/3)z^{-1}\right)\left(1 - (1/3)z\right)}{\left(1 - 0.6z^{-1}\right)\left(1 - 0.6z\right)}$
Therefore, if $G_{xx}(z) = \sigma_i^2\,G_{\min}(z)\,G_{\min}(z^{-1})$, then $\sigma_i^2 = 1.8$ and
$G_{\min}(z) = \frac{1 - (1/3)z^{-1}}{1 - 0.6z^{-1}} = \frac{z - 1/3}{z - 0.6}$
The cross correlation between d[n] and x[n] is the same as the cross correlation between s[n] and x[n] because we are doing filtering and d[n] = s[n].
Wiener Filters
IIR Wiener Filter Example (An extension of Example 12.7.2 in Proakis)
$R_{dx}[m] = R_{sx}[m] = E\big(s[n]\,x[n+m]\big) = E\Big(s[n]\big(s[n+m] + w[n+m]\big)\Big)$
$R_{dx}[m] = E\big(s[n]\,s[n+m]\big) + \underbrace{E\big(s[n]\,w[n+m]\big)}_{=0} = R_{ss}[m] + R_{sw}[m] = R_{ss}[m]$
Therefore
$G_{dx}(z) = G_{ss}(z) = \frac{0.64}{1.36 - 0.6\left(z^{-1} + z\right)} = \frac{0.64}{\left(1 - 0.6z^{-1}\right)\left(1 - 0.6z\right)}$
and
$\frac{G_{dx}(z)}{G_{\min}(z^{-1})} = \frac{0.64}{\left(1 - 0.6z^{-1}\right)\left(1 - 0.6z\right)}\cdot\frac{1 - 0.6z}{1 - (1/3)z} = \frac{0.64}{\left(1 - (1/3)z\right)\left(1 - 0.6z^{-1}\right)}$
Wiener Filters
IIR Wiener Filter Example (An extension of Example 12.7.2 in Proakis)
We want to split this into the causal and anti-causal parts and retain only the causal part. The causal part has the poles inside the unit circle. So we want a partial-fraction expansion of the form
$\frac{0.64}{\left(1 - (1/3)z\right)\left(1 - 0.6z^{-1}\right)} = \frac{K_1\,z}{1 - (1/3)z} + \frac{K_2}{1 - 0.6z^{-1}}$
Only the second term is causal, and $K_2 = \left.\dfrac{0.64}{1 - (1/3)z}\right|_{z=0.6} = 0.8$, so
$\left[\frac{G_{dx}(z)}{G_{\min}(z^{-1})}\right]_+ = \frac{0.8}{1 - 0.6z^{-1}}$
Then
$H_{\text{opt}}(z) = \frac{1}{\sigma_i^2\,G_{\min}(z)}\left[\frac{G_{dx}(z)}{G_{\min}(z^{-1})}\right]_+ = \frac{1}{1.8}\cdot\frac{1 - 0.6z^{-1}}{1 - (1/3)z^{-1}}\cdot\frac{0.8}{1 - 0.6z^{-1}} = \frac{4/9}{1 - (1/3)z^{-1}}$
$h_{\text{opt}}[n] = (4/9)\,(1/3)^n\,u[n]$
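A minimal simulation sketch of this result, generating the assumed model ($s[n] = 0.6\,s[n-1] + v[n]$, $\sigma_v^2 = 0.64$, $x[n] = s[n] + w[n]$) and applying $h_{\text{opt}}$ as the recursion $\hat d[n] = (1/3)\hat d[n-1] + (4/9)x[n]$:

```python
import numpy as np

# Generate s, w, x per the example model and apply the causal IIR Wiener filter.
rng = np.random.default_rng(6)
N = 200_000
v = rng.normal(0, 0.8, N)              # sigma_v^2 = 0.64
w = rng.normal(0, 1.0, N)              # sigma_w^2 = 1
s = np.zeros(N)
for n in range(1, N):
    s[n] = 0.6 * s[n-1] + v[n]
x = s + w

d_hat = np.zeros(N)                    # H_opt(z) = (4/9) / (1 - (1/3) z^-1)
d_hat[0] = (4/9) * x[0]
for n in range(1, N):
    d_hat[n] = (1/3) * d_hat[n-1] + (4/9) * x[n]

print(np.mean((s - x)**2))             # ~1.0 (using x directly: error = w)
print(np.mean((s - d_hat)**2))         # smaller: the Wiener filter helps
```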
Wiener Filters
IIR Wiener Filter Example (An extension of Example 12.7.2 in Proakis)
Next consider the case in which we are not filtering but instead smoothing: the desired response is a delayed version of the signal, $d[n] = s[n - n_0]$. Now the cross correlation between d[n] and x[n] is not the same as the cross correlation between s[n] and x[n]. Using the crosscorrelation orientation that appears in the Wiener-Hopf equations,
$R_{dx}[m] = E\big(s[n - n_0]\,x[n-m]\big) = E\Big(s[n - n_0]\big(s[n-m] + w[n-m]\big)\Big)$
$R_{dx}[m] = R_{ss}[m - n_0] + \underbrace{R_{sw}[m - n_0]}_{=0} = R_{ss}[m - n_0]$
Wiener Filters
IIR Wiener Filter Example (An extension of Example 12.7.2 in Proakis)
Therefore
$G_{dx}(z) = G_{ss}(z)\,z^{-n_0} = \frac{0.64\,z^{-n_0}}{1.36 - 0.6\left(z^{-1} + z\right)} = \frac{0.64\,z^{-n_0}}{\left(1 - 0.6z^{-1}\right)\left(1 - 0.6z\right)}$
and
$\frac{G_{dx}(z)}{G_{\min}(z^{-1})} = \frac{0.64\,z^{-n_0}}{\left(1 - (1/3)z\right)\left(1 - 0.6z^{-1}\right)}$
Expanding in partial fractions as before,
$\frac{0.64\,z^{-n_0}}{\left(1 - (1/3)z\right)\left(1 - 0.6z^{-1}\right)} = z^{-n_0}\left(-\frac{0.8z}{z - 3} + \frac{0.8z}{z - 0.6}\right)$
Wiener Filters
IIR Wiener Filter Example (An extension of Example 12.7.2 in Proakis)
$-\frac{0.8z}{z-3} + \frac{0.8z}{z-0.6} \;\overset{\mathcal{Z}}{\longleftrightarrow}\; 0.8\,(3)^n\,u[-(n+1)] + 0.8\,(0.6)^n\,u[n]$
$z^{-n_0}\left(-\frac{0.8z}{z-3} + \frac{0.8z}{z-0.6}\right) \;\overset{\mathcal{Z}}{\longleftrightarrow}\; 0.8\,(3)^{n-n_0}\,u[-(n-n_0+1)] + 0.8\,(0.6)^{n-n_0}\,u[n-n_0]$
The causal part of the inverse transform is
$0.8\,(3)^{n-n_0}\big(u[n] - u[n-n_0]\big) + 0.8\,(0.6)^{n-n_0}\,u[n-n_0]$
Its z transform is
$0.8\left(\frac{(3)^{-n_0}\left(1 - (3)^{n_0}\,z^{-n_0}\right)}{1 - 3z^{-1}} + \frac{z^{-n_0}}{1 - 0.6z^{-1}}\right)$
Wiener Filters
IIR Wiener Filter Example (An extension of Example 12.7.2 in Proakis)
$\left[\frac{G_{dx}(z)}{G_{\min}(z^{-1})}\right]_+ = 0.8\left(\frac{(3)^{-n_0}\left(1 - (3)^{n_0}\,z^{-n_0}\right)}{1 - 3z^{-1}} + \frac{z^{-n_0}}{1 - 0.6z^{-1}}\right)$
$H_{\text{opt}}(z) = \frac{1}{\sigma_i^2}\cdot\frac{1 - 0.6z^{-1}}{1 - (1/3)z^{-1}}\cdot 0.8\left(\frac{(3)^{-n_0}\left(1 - (3)^{n_0}\,z^{-n_0}\right)}{1 - 3z^{-1}} + \frac{z^{-n_0}}{1 - 0.6z^{-1}}\right)$
which can be written as
$H_{\text{opt}}(z) = \frac{4}{9}\left(\frac{(3)^{-n_0}\left(1 - (3)^{n_0}\,z^{-n_0}\right)\left(1 - 0.6z^{-1}\right)}{\left(1 - (1/3)z^{-1}\right)\left(1 - 3z^{-1}\right)} + \frac{z^{-n_0}}{1 - (1/3)z^{-1}}\right)$
For $n_0 = 0$ the first term vanishes and this reduces to the filtering result $H_{\text{opt}}(z) = \dfrac{4/9}{1 - (1/3)z^{-1}}.$
Wiener Filters
IIR Wiener Filter Example (An extension of Example 12.7.2 in Proakis)
The impulse response of the filter is the inverse transform of this transfer function. Finding a general expression for it is very tedious. But we can see its form by finding the frequency response
$H_{\text{opt}}\big(e^{j\Omega}\big) = \frac{4}{9}\left(\frac{(3)^{-n_0}\left(1 - (3)^{n_0}\,e^{-jn_0\Omega}\right)\left(1 - 0.6e^{-j\Omega}\right)}{\left(1 - (1/3)e^{-j\Omega}\right)\left(1 - 3e^{-j\Omega}\right)} + \frac{e^{-jn_0\Omega}}{1 - (1/3)e^{-j\Omega}}\right)$
and then computing the impulse response numerically, using the fast Fourier transform.
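A minimal sketch of that numerical step, sampling the frequency response above on an FFT grid and inverse transforming (the closed form follows the reconstruction above):

```python
import numpy as np

# Sample H_opt(e^{j Omega}) on an FFT grid and invert to get h_opt[n].
def H_opt(Omega, n0):
    z1 = np.exp(-1j * Omega)                      # z^{-1} on the unit circle
    fir = 3.0**(-n0) * (1 - 3.0**n0 * z1**n0) * (1 - 0.6*z1) / ((1 - z1/3) * (1 - 3*z1))
    iir = z1**n0 / (1 - z1/3)
    return (4/9) * (fir + iir)

Nfft = 4096
Omega = 2*np.pi*np.arange(Nfft)/Nfft
h0 = np.fft.ifft(H_opt(Omega, 0)).real
print(np.round(h0[:5], 4))                        # ~ (4/9)*(1/3)**n, the filtering result
h5 = np.fft.ifft(H_opt(Omega, 5)).real
print(np.round(h5[:10], 4))                       # smoothing response for n0 = 5
```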
Wiener Filters
IIR Wiener Filter Example (An extension of Example 12.7.2 in Proakis)
[Figure: numerically computed impulse responses of the smoothing filter.]
Wiener Filters
IIR Wiener Filter Example (An extension of Example 12.7.2 in Proakis)
Now consider the case in which we use a non-causal filter. Then
$H_{\text{opt}}(z) = \frac{G_{dx}(z)}{G_{xx}(z)} = \frac{0.64}{\left(1 - 0.6z^{-1}\right)\left(1 - 0.6z\right)}\cdot\frac{\left(1 - 0.6z^{-1}\right)\left(1 - 0.6z\right)}{1.8\left(1 - (1/3)z^{-1}\right)\left(1 - (1/3)z\right)}$
$H_{\text{opt}}(z) = \frac{0.3555}{\left(1 - (1/3)z^{-1}\right)\left(1 - (1/3)z\right)} = 0.3555\cdot\frac{9}{8}\left(\frac{1}{1 - (1/3)z^{-1}} + \frac{(1/3)z}{1 - (1/3)z}\right)$
and the impulse response is two-sided,
$h_{\text{opt}}[n] = 0.4\,(1/3)^{|n|}$
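A minimal numerical check of this non-causal result, inverse transforming $G_{dx}/G_{xx}$ on the unit circle and comparing with $0.4\,(1/3)^{|n|}$:

```python
import numpy as np

# Evaluate H_opt(e^{j Omega}) = G_dx / G_xx on the unit circle and invert.
Nfft = 4096
z1 = np.exp(-2j*np.pi*np.arange(Nfft)/Nfft)       # z^{-1} samples
Gdx = 0.64 / ((1 - 0.6*z1) * (1 - 0.6/z1))        # G_ss on the unit circle
Gxx = Gdx + 1.0                                    # G_xx = G_ss + G_ww
h = np.fft.ifft(Gdx / Gxx).real                    # two-sided impulse response

n = np.minimum(np.arange(Nfft), Nfft - np.arange(Nfft))  # |n| with FFT wraparound
print(np.max(np.abs(h - 0.4*(1/3.0)**n)))          # ~0
```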