An Experimental Performance Assessment of Temporal Convolutional Networks for Microphone Virtualization in a Car Cabin
Abstract
1. Introduction
2. System Model
3. TCN Model
3.1. TCN Architecture
3.2. TCN Implementation
- the number of TCN layers L, i.e., the number of stacked residual blocks;
- the feature size of each residual block, i.e., the number of filters that are calculated in parallel at each layer;
- the filter sizes in the convolution operations at the end of the network;
- the total number of training epochs.
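As a rough illustration of how the number of layers L and the filter size interact, the sketch below computes the receptive field of a stack of dilated causal convolutions. It assumes that the dilation doubles at each residual block (1, 2, 4, ...), as in WaveNet-style TCNs. This section does not state the dilation pattern, so the function name `receptive_field`, the parameter `convs_per_block`, and the doubling rule are assumptions made for illustration only.

```python
def receptive_field(num_layers: int, filter_size: int, convs_per_block: int = 1) -> int:
    """Receptive field (in samples) of stacked dilated causal convolutions.

    Assumes block l (0-based) uses dilation 2**l. This doubling pattern is an
    assumption, not something stated in this section.
    """
    if num_layers < 1 or filter_size < 1 or convs_per_block < 1:
        raise ValueError("all arguments must be positive integers")
    rf = 1  # a single output sample always sees at least the current input sample
    for layer in range(num_layers):
        dilation = 2 ** layer
        # Each causal convolution extends the look-back by (filter_size - 1) * dilation samples.
        rf += convs_per_block * (filter_size - 1) * dilation
    return rf


if __name__ == "__main__":
    for num_layers in (1, 5, 10):
        print(num_layers, receptive_field(num_layers, filter_size=2))
```

Under these assumptions, 10 layers with filter size 2 would cover 1024 past samples with one convolution per block; other dilation schedules or block designs would give a different figure.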
4. Experimental Setup and Scenarios
5. Numerical Results
6. Concluding Remarks
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Name | Training | Validation | Type |
---|---|---|---|
A vs. A | alone | alone | direct |
A vs. P | alone | passenger | cross |
P vs. A | passenger | alone | cross |
P vs. P | passenger | passenger | direct |
M | alone and passenger | alone and passenger | mixed |
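The Type column in the table above appears to follow directly from the training and validation conditions. The sketch below encodes that reading: "direct" when both sets contain the same single condition, "cross" when they contain different single conditions, and "mixed" when more than one condition is involved. The class name `Scenario` and the property `kind` are hypothetical names introduced here, and the rule is inferred from the five rows rather than stated by the authors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    name: str
    training: frozenset    # cabin conditions used for training
    validation: frozenset  # cabin conditions used for validation

    @property
    def kind(self) -> str:
        # "mixed" when more than one cabin condition is involved,
        # "direct" when training and validation share the single condition,
        # "cross" otherwise. This rule is inferred from the table.
        if len(self.training) > 1 or len(self.validation) > 1:
            return "mixed"
        return "direct" if self.training == self.validation else "cross"


ALONE, PASSENGER = "alone", "passenger"

SCENARIOS = [
    Scenario("A vs. A", frozenset({ALONE}), frozenset({ALONE})),
    Scenario("A vs. P", frozenset({ALONE}), frozenset({PASSENGER})),
    Scenario("P vs. A", frozenset({PASSENGER}), frozenset({ALONE})),
    Scenario("P vs. P", frozenset({PASSENGER}), frozenset({PASSENGER})),
    Scenario("M", frozenset({ALONE, PASSENGER}), frozenset({ALONE, PASSENGER})),
]

if __name__ == "__main__":
    for scenario in SCENARIOS:
        print(f"{scenario.name}: {scenario.kind}")
```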

Parameter | Value |
---|---|
Maximum number of TCN layers L | 10 |
Feature size at each layer | 100 |
Training epochs | 5 |
Starting weight recursion | 0.005 |
Drop factor | 0.25 |
Filter size D for residual block convolutions | 2 |
Post-sum convolution feature size | 256 |
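For readers who want to carry these settings into code, the sketch below collects the values of the table above in a small configuration object. All field names are hypothetical. It also shows one possible reading of the two training-rate entries: "starting weight recursion" is treated as an initial learning rate and "drop factor" as a multiplicative decay applied every `drop_period` epochs. Both the interpretation and the drop period (not given in this section, assumed to be 1 epoch) are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TCNConfig:
    # Values copied from the table above; field names are illustrative only.
    max_layers: int = 10
    feature_size: int = 100
    epochs: int = 5
    initial_rate: float = 0.005  # "Starting weight recursion", read as a learning rate (assumption)
    drop_factor: float = 0.25
    filter_size: int = 2
    post_sum_feature_size: int = 256


def rate_schedule(cfg: TCNConfig, drop_period: int = 1) -> list:
    """Per-epoch rate under a piecewise-constant decay.

    drop_period (epochs between drops) is NOT given in the table; 1 is assumed.
    """
    if drop_period < 1:
        raise ValueError("drop_period must be a positive integer")
    return [
        cfg.initial_rate * cfg.drop_factor ** (epoch // drop_period)
        for epoch in range(cfg.epochs)
    ]


if __name__ == "__main__":
    for epoch, rate in enumerate(rate_schedule(TCNConfig()), start=1):
        print(f"epoch {epoch}: rate = {rate:.3e}")
```

A longer drop period would keep the rate at its starting value for more epochs, so the schedule printed here should be taken as one plausible case rather than the setting used in the experiments.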
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Opinto, A.; Martalò, M.; Straccia, R.; Raheli, R. An Experimental Performance Assessment of Temporal Convolutional Networks for Microphone Virtualization in a Car Cabin. Sensors 2024, 24, 5163. https://doi.org/10.3390/s24165163