Abstract
A method of real-time non-uniform speech stretching is presented. The proposed solution is based on the well-known SOLA (Synchronous Overlap and Add) algorithm. Non-uniform time-scale modification is achieved by adjusting the time-scaling factor according to the signal content: its value is selected depending on the speech unit (vowel or consonant), the instantaneous rate of speech (ROS), and the presence of a speech signal. This keeps the difference between the durations of the input and output signals as small as possible while preserving high naturalness and quality of the modified speech. In the experimental part of the paper, the accuracy of the proposed ROS estimator is examined, and the quality of speech stretched with the proposed method is assessed in subjective tests.
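As a rough illustration of the idea summarized in the abstract, the following Python sketch couples a per-frame scaling-factor rule with a basic SOLA loop. It is not the authors' implementation: the function names `select_scale` and `sola_stretch`, the frame labels, the ROS threshold, the factor values (1.0, 1.5, 2.0), and the frame, hop and search sizes are all assumptions chosen for illustration. The choice to stretch vowels more than consonants and to leave silence and slow speech unchanged is likewise a placeholder.

```python
import numpy as np

def select_scale(label, ros, fast_ros=5.0):
    """Hypothetical rule that picks a time-scale factor for one frame.

    label: "silence", "vowel" or "consonant" (assumed frame classes).
    ros:   instantaneous rate of speech, in any consistent unit.
    The threshold and factors are placeholders, not values from the paper.
    """
    if label == "silence" or ros < fast_ros:
        return 1.0                       # leave pauses and slow speech as they are
    return 2.0 if label == "vowel" else 1.5

def sola_stretch(x, scales, frame_len=512, ana_hop=128, search=64):
    """Basic SOLA with one scale factor per analysis hop (scalar or sequence)."""
    x = np.asarray(x, dtype=float)
    out = x[:frame_len].copy()           # first frame is copied unchanged
    n_hops = (len(x) - frame_len) // ana_hop
    if n_hops <= 0:
        return out
    scales = np.broadcast_to(np.asarray(scales, dtype=float), (n_hops,))
    nominal = 0                          # nominal output position of the current frame
    for m in range(1, n_hops + 1):
        syn_hop = int(round(scales[m - 1] * ana_hop))
        # This range guarantees a non-empty overlap no longer than one frame.
        if not 2 * search <= syn_hop < frame_len - 2 * search:
            raise ValueError("scale factor outside the range this sketch supports")
        nominal += syn_hop
        frame = x[m * ana_hop : m * ana_hop + frame_len]
        # Search the placement offset k that best aligns the new frame
        # with the tail of the output (normalized cross-correlation).
        best_k, best_score = 0, -np.inf
        for k in range(-search, search + 1):
            a = out[nominal + k :]
            b = frame[: len(a)]
            score = np.dot(a, b) / (np.sqrt(np.dot(a, a) * np.dot(b, b)) + 1e-12)
            if score > best_score:
                best_k, best_score = k, score
        start = nominal + best_k
        ov = len(out) - start            # length of the overlapping region
        fade = np.linspace(0.0, 1.0, ov)
        out[start:] = out[start:] * (1.0 - fade) + frame[:ov] * fade  # cross-fade
        out = np.concatenate([out, frame[ov:]])
    return out

# Usage: a 1 s, 200 Hz tone at 16 kHz stands in for speech; the labels are made up.
fs = 16000
x = np.sin(2 * np.pi * 200 * np.arange(fs) / fs)
n_hops = (len(x) - 512) // 128
labels = ["vowel" if (m // 20) % 2 == 0 else "silence" for m in range(n_hops)]
scales = [select_scale(lab, ros=6.0) for lab in labels]
y = sola_stretch(x, scales)
```

Because the synthesis hop is recomputed for every frame, segments labelled as vowels are lengthened while segments labelled as silence pass through at their original rate. This is one simple way to limit the overall growth in duration. A real-time version would process the signal block by block rather than holding the whole output in memory.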
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kupryjanow, A., Czyzewski, A. (2012). A Method of Real-Time Non-uniform Speech Stretching. In: Obaidat, M.S., Sevillano, J.L., Filipe, J. (eds) E-Business and Telecommunications. ICETE 2011. Communications in Computer and Information Science, vol 314. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35755-8_25
DOI: https://doi.org/10.1007/978-3-642-35755-8_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35754-1
Online ISBN: 978-3-642-35755-8
eBook Packages: Computer Science, Computer Science (R0)