Optimal Filtering
We observe a random process $r(t)$ that carries information about a desired signal process $s(t)$; the two processes are jointly wide-sense stationary. Passing $r(t)$ through a linear, time-invariant filter with impulse response $h(t)$ produces the output
$$ z(t) = \int_{-\infty}^{\infty} r(t-\lambda)\,h(\lambda)\,d\lambda. $$
In the optimal linear filtering problem we are required to design a linear, time-invariant system whose output $z(t)$ when $r(t)$ is input provides the best linear estimate of the process $s(t)$. Of course, many different notions of best approximation are possible. Among possibilities for a measure of how good the performance is are the following functionals:

1. $P\{z(t) = s(t) \mid r(\tau),\ \tau \le t\}$.
2. $P\{|z(t) - s(t)| > \varepsilon\}$.
3. $E\big(|z(t) - s(t)|\big)$.
4. $E\big[|z(t) - s(t)|^2\big]$.

Of course, many other functionals can readily be added to this list. While all these measures have their advantages and disadvantages, the most used in practice is the mean-square error measure listed last; while this is intrinsically no better (or worse!) than any of the other measures, it has the huge advantage of leading to analytically tractable situations which frequently lead to very elegant solutions. This is the main thrust of the Wiener theory of linear filtering and smoothing.
Wiener Filters
In the sequel, we formally adopt the mean-square error measure
$$ \mathcal{E} = E\big[\,|z(t) - s(t)|^2\,\big] $$
as a measure of how well the process $z(t) = (r * h)(t)$ approximates the process $s(t)$. To keep the development unencumbered, let us suppose that $s(t)$ and $z(t)$ have real-valued sample functions. Exploiting linearity of expectation, we now obtain
$$ \mathcal{E} = R_s(0) - 2\int_{-\infty}^{\infty} E\big[s(t)r(t-\lambda)\big]h(\lambda)\,d\lambda + \iint_{-\infty}^{\infty} E\big[r(t-\lambda)r(t-\mu)\big]h(\lambda)h(\mu)\,d\lambda\,d\mu $$
$$ = R_s(0) - 2\int_{-\infty}^{\infty} R_{s,r}(\lambda)h(\lambda)\,d\lambda + \iint_{-\infty}^{\infty} R_r(\lambda-\mu)h(\lambda)h(\mu)\,d\lambda\,d\mu, \tag{1} $$
where
$$ R_s(\tau) = E\big[s(t+\tau)s(t)\big] \qquad \text{and} \qquad R_r(\tau) = E\big[r(t+\tau)r(t)\big] $$
are the autocorrelation functions and
$$ R_{s,r}(\tau) = E\big[s(t+\tau)r(t)\big] $$
is the cross-correlation function of the processes.
It simplifies the notation to associate with a correlation function $R(\tau)$ the linear operator that maps an impulse response $h(t)$ into
$$ (Rh)(t) = \int_{-\infty}^{\infty} R(t-\lambda)h(\lambda)\,d\lambda. $$
In other words, $(Rh)(t)$ is just the convolution of $R(t)$ with $h(t)$. Recall that the correlation function $R(\tau)$ is symmetric, so that
$$ (Rh)(0) = \int_{-\infty}^{\infty} R(-\lambda)h(\lambda)\,d\lambda = \int_{-\infty}^{\infty} R(\lambda)h(\lambda)\,d\lambda. $$
Likewise, we may define the quadratic form associated with $R$ via the usual inner product in the space of square-integrable functions:
$$ \langle h, Rh\rangle = \iint_{-\infty}^{\infty} R(\lambda-\mu)h(\lambda)h(\mu)\,d\lambda\,d\mu. $$
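On a discrete grid these definitions take a concrete matrix form, which makes for a quick numerical sanity check. The following Python sketch is purely illustrative; the correlation function and impulse response are invented for the example and are not part of the development above.

```python
import numpy as np
from scipy.linalg import toeplitz

# Sample the functions on a uniform grid (a made-up example).
dt = 0.01
t = np.arange(-100, 101) * dt
R = np.exp(-np.abs(t))        # assumed correlation function R(tau) = exp(-|tau|)
h = np.exp(-t**2)             # assumed impulse response

# (Rh)(t): the convolution of R with h, via a Riemann-sum approximation.
Rh = np.convolve(R, h, mode="same") * dt

# (Rh)(0) = integral of R(lambda) h(lambda) d(lambda), using symmetry of R.
Rh0 = np.sum(R * h) * dt

# <h, Rh>: quadratic form with the symmetric Toeplitz matrix R(lambda - mu).
T = toeplitz(np.exp(-np.arange(len(t)) * dt))
quad = h @ T @ h * dt**2      # nonnegative, since R is a valid correlation

print(Rh0, quad)
```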
In this notation, $R_r$ and $R_{s,r}$ are the linear operators corresponding to the correlation functions $R_r(\tau)$ and $R_{s,r}(\tau)$. Now suppose to begin with that the filter with impulse response $h(t)$ yields the minimum mean-square error $\mathcal{E} = \mathcal{E}_0$. In our new notation, (1) now becomes
$$ \mathcal{E}_0 = R_s(0) - 2(R_{s,r}h)(0) + \langle h, R_r h\rangle. $$
Consider now any competing filter with impulse response $h(t) + \epsilon g(t)$, where $\epsilon$ is a real parameter and $g(t)$ is an arbitrary perturbant. Its mean-square error is
$$ \mathcal{E} = R_s(0) - 2\big(R_{s,r}(h+\epsilon g)\big)(0) + \langle h+\epsilon g,\, R_r(h+\epsilon g)\rangle, $$
where
$$ \big(R_{s,r}(h+\epsilon g)\big)(0) = \int_{-\infty}^{\infty} R_{s,r}(\lambda)h(\lambda)\,d\lambda + \epsilon\int_{-\infty}^{\infty} R_{s,r}(\lambda)g(\lambda)\,d\lambda $$
and
$$ \langle h+\epsilon g,\, R_r(h+\epsilon g)\rangle = \langle h, R_r h\rangle + 2\epsilon\iint_{-\infty}^{\infty} R_r(\lambda-\mu)h(\mu)g(\lambda)\,d\lambda\,d\mu + \epsilon^2\iint_{-\infty}^{\infty} R_r(\lambda-\mu)g(\lambda)g(\mu)\,d\lambda\,d\mu. $$
Grouping and combining terms in powers of the parameter $\epsilon$ simplifies the expression: we can express the mean-square error as the quadratic form
$$ \mathcal{E} = \mathcal{E}_0 - 2\epsilon B + \epsilon^2 C, $$
where the parameters $B = B(g)$ and $C = C(g)$ are given by
$$ B = \int_{-\infty}^{\infty} g(\lambda)\Big[ R_{s,r}(\lambda) - \int_{-\infty}^{\infty} R_r(\lambda-\mu)h(\mu)\,d\mu \Big]\,d\lambda \tag{2} $$
and
$$ C = \iint_{-\infty}^{\infty} R_r(\lambda-\mu)g(\lambda)g(\mu)\,d\lambda\,d\mu. $$
As the filter with impulse response $h$ is assumed to be mean-square optimal, so that, in particular, the filter with impulse response $h_\epsilon = h + \epsilon g$ is sub-optimal, we must have $\mathcal{E}_0 \le \mathcal{E}$. It follows that
$$ 0 \le \mathcal{E} - \mathcal{E}_0 = -2\epsilon B + \epsilon^2 C, $$
whence we must have
$$ -2\epsilon B + \epsilon^2 C \ge 0 \tag{3} $$
for every choice of real parameter $\epsilon$ and perturbant $g(t)$. Now consider the output $\xi(t) = (r * g)(t)$ of a filter with impulse response $g(t)$ when $r(t)$ is the input. The output process $\xi(t)$ has autocorrelation function
$$ R_\xi(\tau) = \iint_{-\infty}^{\infty} R_r(\tau + \mu - \lambda)g(\lambda)g(\mu)\,d\lambda\,d\mu, $$
whence it is clear that the quantity $C$ is exactly the power of the output process $\xi(t)$,
$$ E\big[\xi(t)^2\big] = R_\xi(0) = \iint_{-\infty}^{\infty} R_r(\lambda-\mu)g(\lambda)g(\mu)\,d\lambda\,d\mu = C. $$
It follows that $C = C(g)$ is nonnegative for every choice of the perturbant $g(t)$. An examination of (3) now readily leads to the following constraint: The inequality (3) can be maintained for every real $\epsilon$ and every perturbant $g(t)$ if, and only if, $B = B(g) = 0$ for every $g$. Indeed, sufficiency of the constraint follows quickly: if $B = B(g) = 0$ for every choice of perturbant $g(t)$, then $-2\epsilon B + \epsilon^2 C = \epsilon^2 C \ge 0$ for every $\epsilon$ as $C = C(g) \ge 0$ for every $g$. To show necessity, suppose $B \ne 0$. Then, for small enough $\epsilon$, the left-hand side of (3) will be dominated by the term $-2\epsilon B$, which can be forced to become negative by choosing $\epsilon$ to have the same sign as $B$. In particular, the choices
$$ \epsilon = \begin{cases} B/2 & \text{if } C = 0,\\ B/C & \text{if } C \ne 0, \end{cases} $$
yield
$$ -2\epsilon B + \epsilon^2 C = \begin{cases} -B^2 & \text{if } C = 0,\\ -B^2/C & \text{if } C \ne 0, \end{cases} $$
which are strictly negative if $B \ne 0$. It follows that $B = B(g) = 0$ is a necessary condition for the inequality (3) to hold uniformly for all choices of perturbant $g(t)$ and real $\epsilon$. Simplify the expression (2) for $B$ by setting
$$ f(\lambda) = R_{s,r}(\lambda) - \int_{-\infty}^{\infty} R_r(\lambda-\mu)h(\mu)\,d\mu. $$
The condition $B = B(g) = 0$ then takes the form
$$ \int_{-\infty}^{\infty} g(\lambda)f(\lambda)\,d\lambda = 0, \tag{4} $$
which must be satisfied for every choice of perturbant function $g$. Observe that the integrand has separated into a product of two terms: the perturbant function $g(\lambda)$ and a function $f(\lambda)$ determined independently of the choice of perturbant function $g(\lambda)$. As (4) holds for every choice of perturbant function $g$, in particular, the felicitous choice $g(\lambda) = f(\lambda)$ yields
$$ \int_{-\infty}^{\infty} f^2(\lambda)\,d\lambda = 0. $$
We conclude that the function $f$ must be identically zero everywhere.$^1$ As the function $f$ is completely determined by the filter impulse response $h$, the condition $f \equiv 0$, equivalent to the following integral equation,
$$ (R_r h)(t) = \int_{-\infty}^{\infty} R_r(t-\lambda)h(\lambda)\,d\lambda = R_{s,r}(t), $$
prescribes a necessary and sufficient condition to be satisfied by the impulse response of the optimal, minimum mean-square error filter. We've thus obtained a complete characterisation of the optimal filter. We can summarise our result succinctly and elegantly in convolution notation: For $h(t)$ to be the impulse response of the minimum mean-square error filter it is necessary and sufficient that it satisfy the integral equation
$$ R_r * h = R_{s,r}. \tag{5} $$

$^1$More precisely, $f(\lambda)$ can take nonzero values only on a set of measure (or length) zero. In particular, $f(\lambda)$ cannot be nonzero on any interval.
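For computation, a discrete-time, finite-impulse-response analogue of (5) is useful: restricting $h$ to $n$ taps turns the convolution into the Toeplitz system $\sum_\mu R_r[\lambda-\mu]\,h[\mu] = R_{s,r}[\lambda]$, the standard Wiener normal equations. The Python sketch below is a minimal illustration with invented correlation sequences (chosen, for convenience, to match the additive-noise situation treated in Example 1 below); it is not prescribed by the notes.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

# Assumed sampled correlations: a signal with R_s[k] = 0.9^|k| observed in
# additive white noise of variance 0.5, so that R_r[k] = R_s[k] + 0.5*delta[k]
# and R_sr[k] = R_s[k].
n = 64
R_s = 0.9 ** np.arange(n)
R_r = R_s.copy()
R_r[0] += 0.5
R_sr = R_s

# Discrete analogue of (5) for an n-tap filter: a symmetric Toeplitz system.
# solve_toeplitz takes the first column of the Toeplitz matrix.
h = solve_toeplitz(R_r, R_sr)
print(h[:5])
```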
The gentle reader may well cavil at our description of the solution above as complete. This is a complete solution? How does one invert the integral equation to determine the impulse response $h(t)$? As is frequently the case, matters become much more transparent in the Fourier domain. The frequency response of the filter, i.e., its transfer function, is the Fourier transform of the impulse response,
$$ H(f) = \int_{-\infty}^{\infty} h(t)\,e^{-j2\pi f t}\,dt. $$
Likewise, for the jointly wide-sense stationary random processes $s(t)$ and $r(t)$, the power spectral density of $r(t)$ and the cross spectral density of $s(t)$ and $r(t)$ are given by the Fourier relations
$$ S_r(f) = \int_{-\infty}^{\infty} R_r(\tau)\,e^{-j2\pi f\tau}\,d\tau \qquad \text{and} \qquad S_{s,r}(f) = \int_{-\infty}^{\infty} R_{s,r}(\tau)\,e^{-j2\pi f\tau}\,d\tau, $$
respectively. Taking Fourier transforms of both sides of (5) yields $S_r(f)H(f) = S_{s,r}(f)$. Thus, the frequency-domain equivalent characterisation of (5) has a particularly spare and elegant character: It is necessary and sufficient that the transfer function of the minimum mean-square error filter satisfies
$$ H(f) = \frac{S_{s,r}(f)}{S_r(f)} \tag{5'} $$
wherever $S_r(f) \ne 0$. A quick example may serve to fix the intuitive nature of this elegant solution to the minimum mean-square filtering problem.

Example 1 (Additive, uncorrelated noise). Suppose the random process $r(t)$ is obtained by additively corrupting the random signal process $s(t)$ by a wide-sense stationary zero-mean random noise process $n(t)$ uncorrelated with $s(t)$. Let $R_n(\tau) = E\big[n(t+\tau)n(t)\big]$ be the autocorrelation function of the noise process and $S_n(f)$ be the noise power spectral density. Then
$$ R_{s,r}(\tau) = E\big[s(t+\tau)r(t)\big] = E\big[s(t+\tau)\big(s(t)+n(t)\big)\big] = E\big[s(t+\tau)s(t)\big] + E\big[s(t+\tau)n(t)\big] = R_s(\tau) + E\big[s(t+\tau)\big]E\big[n(t)\big] = R_s(\tau), $$
so that $S_{s,r}(f) = S_s(f)$. A similar computation yields
$$ R_r(\tau) = E\big[r(t+\tau)r(t)\big] = E\big[s(t+\tau)s(t)\big] + E\big[s(t+\tau)\big]E\big[n(t)\big] + E\big[s(t)\big]E\big[n(t+\tau)\big] + E\big[n(t+\tau)n(t)\big] = R_s(\tau) + R_n(\tau), $$
so that $S_r(f) = S_s(f) + S_n(f)$. Thus, the transfer function of the minimum mean-square error filter is given by
$$ H(f) = \frac{S_s(f)}{S_s(f) + S_n(f)}. $$
It is well worth mulling over the implication of the solution for a moment. The optimal filter emphasises those frequencies at which the ratio of the signal power spectral density to the noise power spectral density is large and deemphasises those frequencies for which this ratio is small. Informally, the filter passes frequencies which contain largely signal energy and inhibits frequencies which contain largely noise energy. Very persuasive! This simple idea finds application in pre-emphasis and de-emphasis filters incorporated in Dolby systems for high audio fidelity in FM radio and TV transmission and audio recordings.
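A numerical rendering of this example may help fix ideas. The Python sketch below is a hedged illustration: the spectra, sampling grid, and signal synthesis are assumptions made up for the demonstration, not quantities from the notes. It builds $H(f) = S_s(f)/(S_s(f)+S_n(f))$ on an FFT grid and applies it to a synthetic noisy observation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, dt = 4096, 1e-3
f = np.fft.rfftfreq(n, dt)

# Assumed spectral shapes (only their ratio matters for H): low-pass signal,
# flat noise.
S_s = 1.0 / (1.0 + (f / 20.0) ** 4)
S_n = 0.1 * np.ones_like(f)

H = S_s / (S_s + S_n)                    # the transfer function of Example 1

# Synthesize s(t) and n(t) by shaping white sequences with sqrt of the PSDs,
# so their sample spectra are proportional to S_s and S_n with a common factor.
s = np.fft.irfft(np.sqrt(S_s) * np.fft.rfft(rng.standard_normal(n)), n)
noise = np.fft.irfft(np.sqrt(S_n) * np.fft.rfft(rng.standard_normal(n)), n)
r = s + noise

z = np.fft.irfft(H * np.fft.rfft(r), n)  # Wiener-filtered estimate of s(t)

print("mse before:", np.mean((r - s) ** 2), "mse after:", np.mean((z - s) ** 2))
```

On this synthetic setup the filtered estimate $z(t)$ shows a visibly smaller mean-square error than the raw observation $r(t)$, in line with the discussion above.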
The Wiener theory can be extended in several directions, including:

1. What characterises the optimal filter under a causality constraint?
2. How can one, in general, go about solving the integral equation for prediction and filtering using the infinite past?
3. What is the optimal filter among a parameterised class of filters?
4. What is the solution to the optimal filtering problem when the observation time is finite?