Optimal Filtering
We observe a random process $r(t)$ that carries information about a desired signal process $s(t)$; the two processes are jointly wide-sense stationary. Passing $r(t)$ through a linear, time-invariant filter with impulse response $h(t)$ produces the output
$$ z(t) = \int_{-\infty}^{\infty} r(t-\lambda)\,h(\lambda)\,d\lambda. $$
In the optimal linear filtering problem we are required to design a linear, time-invariant system whose output $z(t)$ when $r(t)$ is input provides the best linear estimate of the process $s(t)$. Of course, many different notions of best approximation are possible. Among possibilities for a measure of how good the performance is are the following functionals:

1. $P\{z(t) = s(t) \mid r(\tau),\ \tau \le t\}$.
2. $P\{|z(t) - s(t)| > \varepsilon\}$.
3. $E\big(|z(t) - s(t)|\big)$.
4. $E\big[|z(t) - s(t)|^2\big]$.

Of course, many other functionals can readily be added to this list. While all these measures have their advantages and disadvantages, the most used in practice is the mean-square error measure listed last; while this is intrinsically no better (or worse!) than any of the other measures, it has the huge advantage of leading to analytically tractable situations which frequently lead to very elegant solutions. This is the main thrust of the Wiener theory of linear filtering and smoothing.
Wiener Filters
In the sequel, we formally adopt the mean-square error measure
$$ \mathcal{E} = E\big[\,|z(t) - s(t)|^2\,\big] $$
as a measure of how well the process $z(t) = (r * h)(t)$ approximates the process $s(t)$. To keep the development unencumbered, let us suppose that $s(t)$ and $z(t)$ have real-valued sample functions. Exploiting linearity of expectation, we now obtain
$$ \mathcal{E} = R_s(0) - 2\int_{-\infty}^{\infty} E\big[s(t)r(t-\lambda)\big]h(\lambda)\,d\lambda + \iint_{-\infty}^{\infty} E\big[r(t-\lambda)r(t-\mu)\big]h(\lambda)h(\mu)\,d\lambda\,d\mu $$
$$ = R_s(0) - 2\int_{-\infty}^{\infty} R_{s,r}(\lambda)h(\lambda)\,d\lambda + \iint_{-\infty}^{\infty} R_r(\lambda-\mu)h(\lambda)h(\mu)\,d\lambda\,d\mu, \tag{1} $$
where
$$ R_s(\tau) = E\big[s(t+\tau)s(t)\big] \qquad \text{and} \qquad R_r(\tau) = E\big[r(t+\tau)r(t)\big] $$
are the autocorrelation functions and
$$ R_{s,r}(\tau) = E\big[s(t+\tau)r(t)\big] $$
is the cross-correlation function of the processes.
It simplifies the notation to associate with a correlation function $R(\tau)$ the linear operator that maps an impulse response $h(t)$ into
$$ (Rh)(t) = \int_{-\infty}^{\infty} R(t-\lambda)h(\lambda)\,d\lambda. $$
In other words, $(Rh)(t)$ is just the convolution of $R(t)$ with $h(t)$. Recall that the correlation function $R(\tau)$ is symmetric, so that
$$ (Rh)(0) = \int_{-\infty}^{\infty} R(-\lambda)h(\lambda)\,d\lambda = \int_{-\infty}^{\infty} R(\lambda)h(\lambda)\,d\lambda. $$
Likewise, we may define the quadratic form associated with $R$ via the usual inner product in the space of square-integrable functions:
$$ \langle h, Rh\rangle = \iint_{-\infty}^{\infty} R(\lambda-\mu)h(\lambda)h(\mu)\,d\lambda\,d\mu. $$
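On a discrete grid these definitions take a concrete matrix form, which makes for a quick numerical sanity check. The following Python sketch is purely illustrative; the correlation function and impulse response are invented for the example and are not part of the development above.

```python
import numpy as np
from scipy.linalg import toeplitz

# Sample the functions on a uniform grid (a made-up example).
dt = 0.01
t = np.arange(-100, 101) * dt
R = np.exp(-np.abs(t))        # assumed correlation function R(tau) = exp(-|tau|)
h = np.exp(-t**2)             # assumed impulse response

# (Rh)(t): the convolution of R with h, via a Riemann-sum approximation.
Rh = np.convolve(R, h, mode="same") * dt

# (Rh)(0) = integral of R(lambda) h(lambda) d(lambda), using symmetry of R.
Rh0 = np.sum(R * h) * dt

# <h, Rh>: quadratic form with the symmetric Toeplitz matrix R(lambda - mu).
T = toeplitz(np.exp(-np.arange(len(t)) * dt))
quad = h @ T @ h * dt**2      # nonnegative, since R is a valid correlation

print(Rh0, quad)
```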
In this notation, $R_r$ and $R_{s,r}$ are the linear operators corresponding to the correlation functions $R_r(\tau)$ and $R_{s,r}(\tau)$. Now suppose to begin with that the filter with impulse response $h(t)$ yields the minimum mean-square error $\mathcal{E} = \mathcal{E}_0$. In our new notation, (1) now becomes
$$ \mathcal{E}_0 = R_s(0) - 2(R_{s,r}h)(0) + \langle h, R_r h\rangle. $$
Consider now any competing filter with impulse response $h(t) + \epsilon g(t)$, where $\epsilon$ is a real parameter and $g(t)$ is an arbitrary perturbant. Its mean-square error is
$$ \mathcal{E} = R_s(0) - 2\big(R_{s,r}(h+\epsilon g)\big)(0) + \langle h+\epsilon g,\, R_r(h+\epsilon g)\rangle, $$
where
$$ \big(R_{s,r}(h+\epsilon g)\big)(0) = \int_{-\infty}^{\infty} R_{s,r}(\lambda)h(\lambda)\,d\lambda + \epsilon\int_{-\infty}^{\infty} R_{s,r}(\lambda)g(\lambda)\,d\lambda $$
and
$$ \langle h+\epsilon g,\, R_r(h+\epsilon g)\rangle = \langle h, R_r h\rangle + 2\epsilon\iint_{-\infty}^{\infty} R_r(\lambda-\mu)h(\mu)g(\lambda)\,d\lambda\,d\mu + \epsilon^2\iint_{-\infty}^{\infty} R_r(\lambda-\mu)g(\lambda)g(\mu)\,d\lambda\,d\mu. $$
Grouping and combining terms in powers of the parameter $\epsilon$ simplifies the expression: we can express the mean-square error as the quadratic form
$$ \mathcal{E} = \mathcal{E}_0 - 2\epsilon B + \epsilon^2 C, $$
where the parameters $B = B(g)$ and $C = C(g)$ are given by
$$ B = \int_{-\infty}^{\infty} g(\lambda)\Big[ R_{s,r}(\lambda) - \int_{-\infty}^{\infty} R_r(\lambda-\mu)h(\mu)\,d\mu \Big]\,d\lambda \tag{2} $$
and
$$ C = \iint_{-\infty}^{\infty} R_r(\lambda-\mu)g(\lambda)g(\mu)\,d\lambda\,d\mu. $$
As the filter with impulse response $h$ is assumed to be mean-square optimal, so that, in particular, the filter with impulse response $h_\epsilon = h + \epsilon g$ is sub-optimal, we must have $\mathcal{E}_0 \le \mathcal{E}$. It follows that
$$ 0 \le \mathcal{E} - \mathcal{E}_0 = -2\epsilon B + \epsilon^2 C, $$
whence we must have
$$ -2\epsilon B + \epsilon^2 C \ge 0 \tag{3} $$
for every choice of real parameter $\epsilon$ and perturbant $g(t)$. Now consider the output $\xi(t) = (r * g)(t)$ of a filter with impulse response $g(t)$ when $r(t)$ is the input. The output process $\xi(t)$ has autocorrelation function
$$ R_\xi(\tau) = \iint_{-\infty}^{\infty} R_r(\tau + \mu - \lambda)g(\lambda)g(\mu)\,d\lambda\,d\mu, $$
whence it is clear that the quantity $C$ is exactly the power of the output process $\xi(t)$,
$$ E\big[\xi(t)^2\big] = R_\xi(0) = \iint_{-\infty}^{\infty} R_r(\lambda-\mu)g(\lambda)g(\mu)\,d\lambda\,d\mu = C. $$
It follows that $C = C(g)$ is nonnegative for every choice of the perturbant $g(t)$. An examination of (3) now readily leads to the following constraint: The inequality (3) can be maintained for every real $\epsilon$ and every perturbant $g(t)$ if, and only if, $B = B(g) = 0$ for every $g$. Indeed, sufficiency of the constraint follows quickly: if $B = B(g) = 0$ for every choice of perturbant $g(t)$, then $-2\epsilon B + \epsilon^2 C = \epsilon^2 C \ge 0$ for every $\epsilon$ as $C = C(g) \ge 0$ for every $g$. To show necessity, suppose $B \ne 0$. Then, for small enough $\epsilon$, the left-hand side of (3) will be dominated by the term $-2\epsilon B$, which can be forced to become negative by choosing $\epsilon$ to have the same sign as $B$. In particular, the choices
$$ \epsilon = \begin{cases} B/2 & \text{if } C = 0,\\ B/C & \text{if } C \ne 0, \end{cases} $$
yield
$$ -2\epsilon B + \epsilon^2 C = \begin{cases} -B^2 & \text{if } C = 0,\\ -B^2/C & \text{if } C \ne 0, \end{cases} $$
which are strictly negative if $B \ne 0$. It follows that $B = B(g) = 0$ is a necessary condition for the inequality (3) to hold uniformly for all choices of perturbant $g(t)$ and real $\epsilon$. Simplify the expression (2) for $B$ by setting
$$ f(\lambda) = R_{s,r}(\lambda) - \int_{-\infty}^{\infty} R_r(\lambda-\mu)h(\mu)\,d\mu. $$
The condition $B = B(g) = 0$ then takes the form
$$ \int_{-\infty}^{\infty} g(\lambda)f(\lambda)\,d\lambda = 0, \tag{4} $$
which must be satisfied for every choice of perturbant function $g$. Observe that the integrand has separated into a product of two terms: the perturbant function $g(\lambda)$ and a function $f(\lambda)$ determined independently of the choice of perturbant function $g(\lambda)$. As (4) holds for every choice of perturbant function $g$, in particular, the felicitous choice $g(\lambda) = f(\lambda)$ yields
$$ \int_{-\infty}^{\infty} f^2(\lambda)\,d\lambda = 0. $$
We conclude that the function $f$ must be identically zero everywhere.$^1$ As the function $f$ is completely determined by the filter impulse response $h$, the condition $f \equiv 0$, equivalent to the following integral equation,
$$ (R_r h)(t) = \int_{-\infty}^{\infty} R_r(t-\lambda)h(\lambda)\,d\lambda = R_{s,r}(t), $$
prescribes a necessary and sufficient condition to be satisfied by the impulse response of the optimal, minimum mean-square error filter. We've thus obtained a complete characterisation of the optimal filter. We can summarise our result succinctly and elegantly in convolution notation: For $h(t)$ to be the impulse response of the minimum mean-square error filter it is necessary and sufficient that it satisfy the integral equation
$$ R_r * h = R_{s,r}. \tag{5} $$

$^1$More precisely, $f(\lambda)$ can take nonzero values only on a set of measure (or length) zero. In particular, $f(\lambda)$ cannot be nonzero on any interval.
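For computation, a discrete-time, finite-impulse-response analogue of (5) is useful: restricting $h$ to $n$ taps turns the convolution into the Toeplitz system $\sum_\mu R_r[\lambda-\mu]\,h[\mu] = R_{s,r}[\lambda]$, the standard Wiener normal equations. The Python sketch below is a minimal illustration with invented correlation sequences (chosen, for convenience, to match the additive-noise situation treated in Example 1 below); it is not prescribed by the notes.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

# Assumed sampled correlations: a signal with R_s[k] = 0.9^|k| observed in
# additive white noise of variance 0.5, so that R_r[k] = R_s[k] + 0.5*delta[k]
# and R_sr[k] = R_s[k].
n = 64
R_s = 0.9 ** np.arange(n)
R_r = R_s.copy()
R_r[0] += 0.5
R_sr = R_s

# Discrete analogue of (5) for an n-tap filter: a symmetric Toeplitz system.
# solve_toeplitz takes the first column of the Toeplitz matrix.
h = solve_toeplitz(R_r, R_sr)
print(h[:5])
```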
The gentle reader may well cavil at our description of the solution above as complete. This is a complete solution? How does one invert the integral equation to determine the impulse response $h(t)$? As is frequently the case, matters become much more transparent in the Fourier domain. The frequency response of the filter, i.e., its transfer function, is the Fourier transform of the impulse response,
$$ H(f) = \int_{-\infty}^{\infty} h(t)\,e^{-j2\pi f t}\,dt. $$
Likewise, for the jointly wide-sense stationary random processes $s(t)$ and $r(t)$, the power spectral density of $r(t)$ and the cross spectral density of $s(t)$ and $r(t)$ are given by the Fourier relations
$$ S_r(f) = \int_{-\infty}^{\infty} R_r(\tau)\,e^{-j2\pi f\tau}\,d\tau \qquad \text{and} \qquad S_{s,r}(f) = \int_{-\infty}^{\infty} R_{s,r}(\tau)\,e^{-j2\pi f\tau}\,d\tau, $$
respectively. Taking Fourier transforms of both sides of (5) yields $S_r(f)H(f) = S_{s,r}(f)$. Thus, the frequency-domain equivalent characterisation of (5) has a particularly spare and elegant character: It is necessary and sufficient that the transfer function of the minimum mean-square error filter satisfies
$$ H(f) = \frac{S_{s,r}(f)}{S_r(f)} \tag{5'} $$
wherever $S_r(f) \ne 0$. A quick example may serve to fix the intuitive nature of this elegant solution to the minimum mean-square filtering problem.

Example 1 (Additive, uncorrelated noise). Suppose the random process $r(t)$ is obtained by additively corrupting the random signal process $s(t)$ by a wide-sense stationary zero-mean random noise process $n(t)$ uncorrelated with $s(t)$. Let $R_n(\tau) = E\big[n(t+\tau)n(t)\big]$ be the autocorrelation function of the noise process and $S_n(f)$ be the noise power spectral density. Then
$$ R_{s,r}(\tau) = E\big[s(t+\tau)r(t)\big] = E\big[s(t+\tau)\big(s(t)+n(t)\big)\big] = E\big[s(t+\tau)s(t)\big] + E\big[s(t+\tau)n(t)\big] = R_s(\tau) + E\big[s(t+\tau)\big]E\big[n(t)\big] = R_s(\tau), $$
so that $S_{s,r}(f) = S_s(f)$. A similar computation yields
$$ R_r(\tau) = E\big[r(t+\tau)r(t)\big] = E\big[s(t+\tau)s(t)\big] + E\big[s(t+\tau)\big]E\big[n(t)\big] + E\big[s(t)\big]E\big[n(t+\tau)\big] + E\big[n(t+\tau)n(t)\big] = R_s(\tau) + R_n(\tau), $$
so that $S_r(f) = S_s(f) + S_n(f)$. Thus, the transfer function of the minimum mean-square error filter is given by
$$ H(f) = \frac{S_s(f)}{S_s(f) + S_n(f)}. $$
It is well worth mulling over the implication of the solution for a moment. The optimal filter emphasises those frequencies at which the ratio of the signal power spectral density to the noise power spectral density is large and deemphasises those frequencies for which this ratio is small. Informally, the filter passes frequencies which contain largely signal energy and inhibits frequencies which contain largely noise energy. Very persuasive! This simple idea finds application in pre-emphasis and de-emphasis filters incorporated in Dolby systems for high audio fidelity in FM radio and TV transmission and audio recordings.
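A numerical rendering of this example may help fix ideas. The Python sketch below is a hedged illustration: the spectra, sampling grid, and signal synthesis are assumptions made up for the demonstration, not quantities from the notes. It builds $H(f) = S_s(f)/(S_s(f)+S_n(f))$ on an FFT grid and applies it to a synthetic noisy observation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, dt = 4096, 1e-3
f = np.fft.rfftfreq(n, dt)

# Assumed spectral shapes (only their ratio matters for H): low-pass signal,
# flat noise.
S_s = 1.0 / (1.0 + (f / 20.0) ** 4)
S_n = 0.1 * np.ones_like(f)

H = S_s / (S_s + S_n)                    # the transfer function of Example 1

# Synthesize s(t) and n(t) by shaping white sequences with sqrt of the PSDs,
# so their sample spectra are proportional to S_s and S_n with a common factor.
s = np.fft.irfft(np.sqrt(S_s) * np.fft.rfft(rng.standard_normal(n)), n)
noise = np.fft.irfft(np.sqrt(S_n) * np.fft.rfft(rng.standard_normal(n)), n)
r = s + noise

z = np.fft.irfft(H * np.fft.rfft(r), n)  # Wiener-filtered estimate of s(t)

print("mse before:", np.mean((r - s) ** 2), "mse after:", np.mean((z - s) ** 2))
```

On this synthetic setup the filtered estimate $z(t)$ shows a visibly smaller mean-square error than the raw observation $r(t)$, in line with the discussion above.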
The Wiener theory can be extended in several directions, including:

1. What characterises the optimal filter under a causality constraint?
2. How can one, in general, go about solving the integral equation for prediction and filtering using the infinite past?
3. What is the optimal filter among a parameterised class of filters?
4. What is the solution to the optimal filtering problem when the observation time is finite?