A Review of Speech Signal Enhancement Techniques

International Journal of Computer Applications (0975 – 8887)

Volume 139 – No.14, April 2016

Devyani S. Kulkarni, Ratnadeep R. Deshmukh, Pukhraj P. Shrishrimal
Department of Computer Science & IT, Dr. B.A.M.U., Aurangabad

ABSTRACT
Speech is the most natural and most effective means of communication
between humans. During speech communication the signal always picks up
some noise, so speech signal enhancement is an important step when
processing digital speech signals. Speech processing is an applied area of
signal processing. The aim of speech enhancement is to improve the
understandability and comprehensibility of the speech signal. A number of
techniques have been proposed for performing speech signal enhancement.
The objective of this paper is to provide an overview of the speech
enhancement algorithms used for this purpose.

General Terms
Speech Signal Enhancement, Background Noise, Speech Signal Degradation.

Keywords
Speech Signal Enhancement, Speech Degradation, Speech Communication,
Filtering Techniques, Speech Signal

1. INTRODUCTION
Human beings interact mainly through vocal communication, i.e. voice. This
motivates research in the domain of digital speech signal processing, which
is a sub-domain of digital signal processing. Every signal associated with
speech communication contains some noise. The purpose of speech
enhancement is to improve the understandability and comprehensibility of
the speech signal [1]. For a speech-enabled system to perform well, it
needs speech signals that are free of noise and of high quality and
clarity, yet it is very difficult to obtain speech signals without any
background noise [2]. In a natural environment there is always some amount
of echo; acoustically echoless (anechoic) rooms are generally used for
capturing echo-free speech [3]. During a study it was observed that the
signals are affected by background noise, which reduces the accuracy of the
system. To increase the accuracy of the system, the background noise must
be filtered from the acquired speech signal. The aim of speech signal
enhancement techniques is to reduce background noise.

Speech enhancement has a great impact on digital speech signal processing.
Many techniques, based on mathematical approaches and simulation, are
available for performing speech signal enhancement [4].

This paper presents an overview of the speech enhancement algorithms used
for enhancing digital speech signals. The paper is organized as follows:
Section 2 explains what is meant by speech enhancement; Section 3 describes
the types of noise that can degrade the speech signal and their removal
techniques; Section 4 describes the categorization of speech signal
enhancement techniques, followed by the conclusion.

2. SPEECH ENHANCEMENT
Speech enhancement is a step in digital speech signal processing whose
objective is to increase the quality of the speech signal, i.e. to enhance
its clarity, intelligibility, understandability and comprehensibility with
the help of some algorithm or filter. The speech signal is degraded by
background noise captured during recording, such as reverberation and
babble. Speech-enabled applications such as speaker recognition, mobile
applications, hearing aids and VoIP require clean, noise-free speech
signals. Speech enhancement can be achieved by various methods; the
approach varies according to the type of degradation and noise in the
acquired speech signal. Fig 1 shows the basic steps of a speech enhancement
system [5].

Fig 1: Basic steps of speech enhancement system [5]

3. TYPES OF NOISE AND ITS REMOVAL TECHNIQUES
This section reviews the different types of noise and their removal
techniques. The speech signal can be degraded by periodic noise, wide band
noise, and interfering speech.

A. Periodic Noise and its Removal Techniques
Stationary filters, adaptive filters, or transform domain filters are used
for removing periodic noise. The first approach is stationary filtering, in
which a bank of notch filters such as twin-T filters can be used as a comb
filter to remove the periodic noise. The second approach is adaptive
filtering, in which a forward

prediction error filter can be used as an inverse filter that filters out
the periodic noise. The third approach works in the transform domain, where
the periodic noise spectrum can be observed and manipulated; the periodic
components can be identified by inspection of the spectrum.

B. Wide Band Noise and its Removal Techniques
The spectral subtraction (SS) method and adaptive cancellation are used for
removing wide band noise. In spectral subtraction, an estimate of the noise
spectrum is subtracted from the spectrum of the noisy speech. Adaptive
cancellation removes the noise that is correlated with the signal. The
correlated noise may be estimated from the channel while no speech signal
is present. An adaptive filter is then tuned so that its impulse response
makes the filtered channel noise match the noise in the signal, which can
then be removed. The filter coefficients are updated until the output
reaches a minimum.

C. Interfering Speech and its Removal Techniques
When two speech signals interfere, conventional speech enhancement
techniques are not useful. If the different pitches can be identified, the
voices of the different speakers can be isolated. For pitch separation to
work, the voiced segments must be tracked. A comb filter can be used to
recover the desired speaker's harmonics, provided the pitch values are
already known. A transform domain technique can also be used to isolate the
voices of different speakers. Assuming that the pitch values of the
speakers are known, we may compute the Discrete Fourier Transform (DFT) of
the mixed signal and track the harmonics of the fundamental frequencies of
the two speakers. If the DFT outputs can be isolated, taking the inverse
DFT (IDFT) of each isolated DFT yields the individual speakers' voices [6].

4. SPEECH ENHANCEMENT METHODS
Many different methods are used for speech enhancement. They can be divided
into two basic categories: Single Channel Enhancing Techniques and
Multi-Channel Enhancing Techniques.

a) Single Channel Enhancement Techniques
This technique is common in real-time applications such as mobile
communication and hearing aids, where generally no second channel is
available. Its performance is limited, as it improves the quality of the
noisy signal at the cost of some intelligibility. Compared to a
multichannel system it is simpler and more cost effective. Generally it
relies on the differing statistics of speech and unwanted noise [7].

1. Spectral Subtraction Method
This is one of the basic methods used for speech enhancement. Spectral
subtraction assumes that the observed signal is formed by two additive
components, so that noisy speech can be expressed as

    $$y(n) = s(n) + v(n)$$    (1)

where $n$ is the time index, $s(n)$ is the uncorrupted speech signal,
$v(n)$ is the additive noise signal and $y(n)$ is the corrupted speech
signal available for processing. The observed signal is split into
overlapping frames by applying a window function, and the processing is
carried out in the short-time Fourier transform (STFT) magnitude domain. In
the frequency domain, eq. (1) can be represented as

    $$Y(\omega) = S(\omega) + V(\omega)$$    (2)

Assuming that speech and noise are uncorrelated, the power spectrum of the
noisy speech can be estimated as

    $$|Y(\omega)|^2 \approx |S(\omega)|^2 + E[|V(\omega)|^2]$$    (3)

where $E[|V(\omega)|^2]$ is the statistical average of $|V(\omega)|^2$
during non-speech periods, so eqs. (4)-(5) give the enhanced speech signal
amplitude:

    $$|\hat{S}(\omega)|^2 = |Y(\omega)|^2 - E[|V(\omega)|^2]$$    (4)

    $$|\hat{S}(\omega)| = \sqrt{|Y(\omega)|^2 - E[|V(\omega)|^2]}$$    (5)

This amplitude is combined with the phase of the noisy signal to synthesize
the signal again:

    $$\hat{S}(\omega) = |\hat{S}(\omega)| \, e^{j \angle Y(\omega)}$$    (6)

The inverse short-time Fourier transform is then performed to transform the
signal back into the time domain. Traditional spectral subtraction
estimates the noise energy during non-speech periods; however, it cannot
update the noise estimate during speech periods. In addition, the method
requires a voice activity detector (VAD), which may not work well at low
SNR.

2. Spectral Subtraction with Over-subtraction Model (SSOM)
The SSOM procedure was introduced to reduce the musical noise effect and
its perception. The method subtracts an overestimate of the noise power
spectrum and prevents the resulting spectral components from going below a
preset minimum spectral floor value.

3. Non-Linear Spectral Subtraction (NSS)
This method combines two ideas. The first is the use of an extended noise
and over-subtraction model. The second is a non-linear implementation of
the subtraction process, in which the amount of subtraction depends on the
SNR of the frame, so that less subtraction is applied at high SNRs and vice
versa [8].

b) Multi Channel Enhancement Techniques
Systems of this kind are more complex than single channel systems. They
take advantage of the multiple signal inputs available to the system and
use a noise reference in an adaptive noise cancellation device. By
considering the spatial properties of the noise source and the signal,
these systems can handle non-stationary noise better than single channel
systems and overcome limitations inherent to them [9].

1. Adaptive Noise Cancellation
This is one of the powerful speech enhancement techniques. It relies on the
availability of an auxiliary channel, known as the reference path, in which
a correlated sample or reference of the contaminating noise is present.
Following an adaptive algorithm, this reference input is filtered, and the
output of the filter is subtracted from the main path, where the noisy
speech is present. Adaptive noise cancellation (ANC) cancels the primary
unwanted noise r(n) by introducing a cancelling anti-noise of equal
amplitude but opposite phase, derived from a reference signal. The
reference signal is obtained from one or more sensors located near the
noise and interference sources, at points where the signal of interest is
weak or undetectable [10].
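The adaptive noise cancellation scheme described above can be illustrated with a minimal sketch. The paper does not name a specific adaptive algorithm, so the least-mean-squares (LMS) update is used here as one common choice. The function name `lms_noise_canceller`, the tap count, the step size, the synthetic sine "speech" and the short FIR noise path are all assumptions made for illustration only.

```python
import math
import random


def lms_noise_canceller(primary, reference, num_taps=8, step_size=0.01):
    """Subtract an adaptively filtered reference from the primary channel.

    primary   : noisy speech samples (speech plus unwanted noise r(n))
    reference : samples correlated with r(n) but not with the speech
    Returns the error signal, which serves as the enhanced speech.
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # latest reference samples, newest first
    enhanced = []
    for d, x in zip(primary, reference):
        history = [x] + history[:-1]
        # Filter output: current estimate of the noise in the main path.
        noise_estimate = sum(w * h for w, h in zip(weights, history))
        # Subtracting the estimate is equivalent to adding the anti-noise.
        error = d - noise_estimate
        # LMS update: move the weights along the negative error gradient.
        weights = [w + 2.0 * step_size * error * h
                   for w, h in zip(weights, history)]
        enhanced.append(error)
    return enhanced


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Synthetic demonstration (all signals are assumptions, not measured data).
random.seed(0)
n_samples = 4000
speech = [0.5 * math.sin(2.0 * math.pi * 0.01 * n) for n in range(n_samples)]
reference = [random.gauss(0.0, 1.0) for _ in range(n_samples)]
path = [0.8, -0.4, 0.2]  # assumed acoustic path from noise source to main mic
noise = [sum(path[k] * reference[n - k] for k in range(len(path)) if n - k >= 0)
         for n in range(n_samples)]
primary = [s + r for s, r in zip(speech, noise)]

enhanced = lms_noise_canceller(primary, reference)

# Compare the error against the clean signal after the filter has converged.
half = n_samples // 2
mse_before = mse(primary[half:], speech[half:])
mse_after = mse(enhanced[half:], speech[half:])
```

In this sketch the reference must be uncorrelated with the speech; otherwise the filter would also cancel part of the speech. The step size trades convergence speed against residual error and would need tuning for real recordings.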

2. Multisensor Beamforming
Beamforming is a multiple-input, single-output (MISO) application. It
consists of multichannel, multidimensional (space-time domain) filtering
techniques that enhance the desired signal and suppress the noise signal.
In beamforming, two or more microphones are arranged in an array of some
geometric shape. A beamformer then filters the sensor outputs and amplifies
or attenuates the signals depending on their direction of arrival (DOA).
The underlying idea of this method rests on the assumption that the
contribution of reflections is small and that the direction of arrival of
the desired signal is known. Then, by correctly aligning the phase of the
signal at each sensor, the desired signal can be enhanced while all noise
components that are not aligned in phase are rejected.

Speech enhancement can also be carried out in the time domain or in a
transform domain, as follows [11].

a) Time Domain Method

1. Wiener Filtering
Lim and Oppenheim suggested the Wiener filter for speech enhancement in
December 1979 as an improvement over spectral subtraction. It is widely
used in signal enhancement methods. The basic idea of the Wiener filter is
to obtain an estimate of the clean signal from the signal corrupted by
additive noise. The estimate is obtained by minimizing the Mean Square
Error (MSE) between the desired signal $s(n)$ and the estimated signal
$\hat{s}(n)$. The solution to this optimization problem in the frequency
domain gives the following filter transfer function:

    $$H(\omega) = \frac{P_s(\omega)}{P_s(\omega) + P_v(\omega)}$$    (7)

where $P_s(\omega)$ and $P_v(\omega)$ are the power spectral densities of
the clean and the noise signals, respectively. This formula can be derived
by considering the signal $s$ and the noise $v$ as uncorrelated and
stationary signals. The SNR is defined by [13]:

    $$SNR = \frac{P_s(\omega)}{P_v(\omega)}$$    (8)

This definition can be incorporated into the Wiener filter equation as
follows:

    $$H(\omega) = \left[ 1 + \frac{1}{SNR} \right]^{-1}$$    (9)

The drawbacks of the Wiener filter are its fixed frequency response at all
frequencies and the requirement to estimate the power spectral densities of
the clean signal and the noise prior to filtering [12].

2. Kalman Filtering
The Kalman filter is a generalization of the Wiener filter. It models
speech with a slowly varying autoregressive (AR) model. The AR model and
the excitation model fit naturally into the Kalman filtering framework,
fully exploiting the capability of the Kalman filter to process
non-stationary signals in a linear minimum mean square error (LMMSE)
optimal manner. The coefficients of the AR model are estimated with a
decision-directed power spectral subtraction method followed by an LPC
analysis. A Multi-Pulse Linear Predictive Coding (MPLPC) based method is
used for robust estimation of the rapidly time-varying excitation model in
the presence of noise. The Kalman filter combines all the available
measured data with knowledge of the system and the measurement devices to
produce an estimate of the desired variables such that the error is
statistically minimized. One of the most basic differences between the
Wiener filter and the Kalman filter is the ability of the latter to
accommodate non-stationary signals [13].

3. Linear Predictive Coding
Linear predictive coding (LPC) is a tool used mainly in audio and speech
processing for representing the spectral envelope of a digital speech
signal in compressed form, using the information of a linear prediction
model. LPC starts with the assumption that the speech signal is produced by
a buzzer at the end of a tube, with occasional added hissing and popping
sounds. This model is a good approximation of reality. The glottis produces
the buzz, which is characterized by its intensity (loudness) and frequency
(pitch). The vocal tract forms the tube, which is characterized by its
resonances, called formants. The lips, tongue and throat generate the
hisses and pops [14].

LPC analyzes the speech signal by estimating the formants, removing their
effect from the speech signal, and estimating the intensity and frequency
of the remaining buzz. The process of removing the formants is called
inverse filtering, and the signal remaining after the subtraction is called
the residue. The numbers that describe the intensity and frequency of the
buzz, the formants and the residue signal can be stored or transmitted.
Determining the formants from the original signal is the fundamental
problem of the LPC system. The solution is to express each sample as a
linear combination of previous samples. The coefficients of this equation
represent the formants, so the LPC system is used to estimate these
coefficients.

b) Transform Domain Method

1. DFT Based (STSA Methods)
These are the best known methods, as they have low computational complexity
and are easy to implement. They use the short-time DFT (STDFT), have been
intensively investigated, and are also known as spectral processing
methods. Human speech perception is not sensitive to spectral phase, but
the clean spectral amplitude must be properly extracted from the noisy
speech to obtain speech of acceptable quality at the output. Hence these
methods are known as short-time spectral amplitude (STSA) based methods [].

2. Signal Subspace Method
This method uses a signal-dependent transform to decompose a noisy signal
into two separate subspaces: the signal-plus-noise subspace and the
noise-only subspace. The transform used to perform this operation is the
Karhunen-Loève transform (KLT). The method assumes that speech spans only
the signal-plus-noise subspace and not the noise-only subspace. The KLT
components that represent the noise-only subspace are nulled, while the
components that represent the noisy signal are modified by a gain function.
The enhanced signal is obtained from the inverse KLT of the altered
components. The aim is to improve quality while minimising any loss in
intelligibility. The enhanced speech produced by the signal subspace using
adaptive noise estimation (SSANE) algorithm is of good, natural-sounding
quality and contains no audible noise. However, this algorithm can only
update the noise estimate when speech is absent, and its performance
degrades for many different noise types [16]. Fig 2 shows the block diagram
of the subspace speech enhancement system.
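The signal subspace steps (KLT, nulling of the noise-only components, gain on the remaining components, inverse KLT) can be sketched as follows. This is a simplified illustration and not the SSANE algorithm of [16]. It assumes white noise of known variance, a single covariance matrix estimated over the whole signal, and a Wiener-like gain rule. The name `klt_subspace_enhance`, the frame length and the synthetic test signal are likewise assumptions.

```python
import numpy as np


def klt_subspace_enhance(noisy, noise_var, frame_len=32):
    """Enhance a 1-D signal by KLT-domain subspace filtering.

    noisy     : 1-D array of noisy samples
    noise_var : variance of the additive noise (assumed white and known)
    Returns the enhanced samples for all complete frames.
    """
    num_frames = len(noisy) // frame_len
    frames = np.reshape(noisy[:num_frames * frame_len],
                        (num_frames, frame_len))
    # Sample covariance of the noisy frames; its eigenvectors are the KLT basis.
    cov = frames.T @ frames / num_frames
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Noise-only subspace (eigenvalue <= noise variance): gain 0, i.e. nulled.
    # Signal-plus-noise subspace: Wiener-like gain (one possible gain function).
    gains = np.where(eigvals > noise_var,
                     1.0 - noise_var / np.maximum(eigvals, 1e-12),
                     0.0)
    coeffs = frames @ eigvecs                # forward KLT of each frame
    enhanced = (coeffs * gains) @ eigvecs.T  # inverse KLT of modified components
    return enhanced.reshape(-1)


# Synthetic demonstration: a sinusoid stands in for speech (an assumption).
rng = np.random.default_rng(0)
n = np.arange(8000)
clean = np.sin(2.0 * np.pi * 0.03 * n)
noise_std = 0.5
noisy = clean + noise_std * rng.standard_normal(n.size)

enhanced = klt_subspace_enhance(noisy, noise_var=noise_std ** 2)

mse_before = np.mean((noisy - clean) ** 2)
mse_after = np.mean((enhanced - clean) ** 2)
```

For real speech the covariance would have to be re-estimated over short segments, and the noise variance would have to be tracked during speech pauses. That dependence on speech pauses is the limitation noted above for the adaptive noise estimate.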

Fig 2: Block diagram of Subspace speech enhancement system

5. CONCLUSION
Speech enhancement is a technique whose objective is to increase the
quality of the speech signal. In this paper different speech enhancement
techniques have been discussed. We have reviewed the different types of
noise and their removal techniques, as well as speech enhancement methods
such as single channel and multi-channel enhancing techniques and their
subtypes. Among the time domain methods we have covered Wiener filtering,
Kalman filtering and linear predictive coding, and among the transform
domain methods the DFT based (STSA) methods and the signal subspace method.

6. ACKNOWLEDGMENTS
This work is supported by the University Grants Commission under the Major
Research Project scheme, in the project entitled "Development of Database
and Automatic Recognition System for Continuous Marathi Spoken Language for
agriculture purpose in Marathwada Region". The authors would also like to
thank the Department of Computer Science and IT, Dr. Babasaheb Ambedkar
Marathwada University, Aurangabad for providing the infrastructure to carry
out the research.

7. REFERENCES
[1] Sunita Dixit, Dr. MD Yusuf Mulge, "Review on Speech Enhancement
    Techniques", International Journal of Computer Science and Mobile
    Computing, IJCSMC, Vol. 3, Issue 8, August 2014, pg. 285 – 290.
[2] Chanchal Pandey, Sandeep Nemad, "Distinctive Methods for Speech
    Enhancement using Kalman Filtering", International Journal of Computer
    Applications (0975 – 8887), Volume 105 – No. 5, November 2014.
[3] P. Bravin Jose, Mrs. M. Jayasanthi, "Review on Speech Enhancement
    Techniques", Karpagam Journal of Engineering Research (KJER), Volume
    No.: 01, Issue No.: 01, 2014.
[4] Vyankatesh Chapke, Prof. Harjeet Kaur, "Review of Speech Enhancement
    Techniques using Statistical Approach", International Journal of
    Electronics Communication and Computer Engineering, Volume 5, Issue (4)
    July, Technovision-2014, ISSN 2249–071X.
[5] Ganga Prasad, Surender, "A Review of Different Approaches of Spectral
    Subtraction Algorithms for Speech Enhancement", Department of
    Electronics, Madhav Institute of Technology & Science, Gwalior, M.P. –
    474005.
[6] Chaudhari, Amol, and S. B. Dhonde, "A review on speech enhancement
    techniques", Pervasive Computing (ICPC), 2015 International Conference
    on, IEEE, 2015.
[7] Pankaj Bactor, Anil Garg, "Different Techniques for the Enhancement of
    the Intelligibility of a Speech Signal", International Journal of
    Engineering Research and Development, Volume 2, Issue 2 (July 2012),
    PP. 57-64.
[8] Yariv Ephraim, Hanoch Lev-Ari and William J.J. Roberts, "A Brief Survey
    of Speech Enhancement", IEEE Sig. Proc. Let., vol. 10, pp. 104-106,
    April 2003.
[9] Lu-ying SUI, Xiong-wei ZHANG, Jian-jun HUANG, Bin ZHOU, "An Improved
    Spectral Subtraction Speech Enhancement Algorithm under Non-stationary
    Noise", Institute of Command Automation, PLAUST, Nanjing, China, IEEE,
    2011.
[10] Reddy, D.R, "Speech recognition by machine: A review", Proceedings of
    IEEE (Volume: 64, Issue: 4), ISSN: 0018-9219.
[11] Young, S.J, "Robust continuous speech recognition using parallel
    model", IEEE Transactions on Speech and Audio Processing (Volume: 4,
    Issue: 5).
[12] Savita Hooda and Smriti Aggarwal, Maharishi Markandeshwar University,
    Mullana (Ambala), INDIA.
[13] [...] Modeling Techniques in Speech Recognition", [...]
[14] Yoon. B-Y.; Tashev, I. & Acero, A. (2007) Robust Adaptive Beamforming
    Algorithm Using Instantaneous Direction Of Arrival With Enhanced Noise
    Suppression Capability. IEEE International Conference on Acoustics,
    Speech and Signal Processing 1:I-133–I-136.
[15] Nandini Garg, Jyoti Gupta, "Review on Speech Enhancement using Signal
    Subspace method", International Journal of Application or Innovation in
    Engineering & Management (IJAIEM), Volume 2, Issue 5, May 2013.
[16] Barry Commins, "Signal Subspace Speech Enhancement with Adaptive Noise
    Estimation", National University of Ireland, Galway, September 2005.

