DOI: 10.1145/3594315.3594357
research-article

A Biological Population Threshold Coding with Robust Feature Extraction and Neuronal Jitter for SNN-based Speech Recognition

Published: 02 August 2023

Abstract

The neuronal dynamics of brain-inspired spiking neural networks (SNNs) make them well suited to processing dynamic signals. In SNNs, neurons communicate through discrete spikes, so the neural coding scheme that converts input signals into spikes is crucial to the advancement of neuromorphic computing. Among temporal coding schemes, population threshold coding (PTC), which uses multiple neurons to encode the trajectory of a time-varying signal, has attracted considerable research attention because it offers noise robustness and spike sparsity. In this paper, we (1) evaluate the effect of the number of threshold levels and the number of filter banks in PTC; (2) compare PTC driven by a Mel filter bank, a Gammatone filter bank, and a mix of the two; and (3) apply different levels of neuronal jitter to the encoding process, using speech (TIDIGITS) and sound (RWCP) datasets. Classification is performed with two types of classifiers: the biologically plausible supervised Tempotron learning rule and a backpropagation (BP)-based SNN learning rule. Our findings indicate that (1) the appropriate threshold resolution and number of filter banks depend on the dataset, and (2) PTC is robust across cochlear filter bank-based feature extraction methods and to neuronal jitter.
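To make the encoding step concrete, here is a minimal Python/NumPy sketch of threshold-crossing population coding with spike-time jitter. It is not the authors' implementation. The function names `population_threshold_encode` and `add_jitter` are hypothetical, and the sketch assumes (a) thresholds evenly spaced between the signal's minimum and maximum, (b) two neurons per threshold, one firing on an upward crossing and one on a downward crossing, and (c) neuronal jitter modeled as zero-mean Gaussian noise on spike times, clipped to the signal duration. In a full pipeline, each channel of a Mel or Gammatone filter bank output would be encoded separately; a toy triangular signal stands in for one channel here.

```python
import numpy as np


def population_threshold_encode(signal, n_levels, t=None):
    """Encode one (hypothetical) filter-bank channel into (time, neuron) spikes.

    Assumptions (not taken from the paper):
      - n_levels thresholds, evenly spaced strictly between min and max;
      - neuron 2*k fires when the signal crosses threshold k upward,
        neuron 2*k + 1 fires when it crosses threshold k downward.
    """
    signal = np.asarray(signal, dtype=float)
    t = np.arange(signal.size, dtype=float) if t is None else np.asarray(t, dtype=float)
    # Drop the two endpoints so every threshold lies inside the signal range.
    thresholds = np.linspace(signal.min(), signal.max(), n_levels + 2)[1:-1]
    spikes = []
    for k, theta in enumerate(thresholds):
        above = signal >= theta
        # Sample indices where the above/below state differs from the previous sample.
        crossings = np.flatnonzero(above[1:] != above[:-1]) + 1
        for i in crossings:
            neuron = 2 * k if above[i] else 2 * k + 1
            spikes.append((float(t[i]), neuron))
    spikes.sort()  # order by spike time, then neuron index
    return spikes


def add_jitter(spikes, sigma, t_max, rng=None):
    """Perturb spike times with Gaussian noise (std sigma), clipped to [0, t_max]."""
    rng = np.random.default_rng() if rng is None else rng
    jittered = [
        (float(np.clip(time + rng.normal(0.0, sigma), 0.0, t_max)), neuron)
        for time, neuron in spikes
    ]
    jittered.sort()
    return jittered


if __name__ == "__main__":
    channel = [0, 1, 2, 3, 4, 3, 2, 1, 0]  # toy stand-in for one filter output
    spikes = population_threshold_encode(channel, n_levels=3)
    print(spikes)  # [(1.0, 0), (2.0, 2), (3.0, 4), (6.0, 5), (7.0, 3), (8.0, 1)]
    print(add_jitter(spikes, sigma=0.5, t_max=8.0, rng=np.random.default_rng(0)))
```

With `n_levels=3`, the rising edge of the triangular input fires the "up" neurons 0, 2 and 4, and the falling edge fires the "down" neurons 5, 3 and 1. Increasing `n_levels` gives a finer threshold resolution but emits more spikes per channel. Jitter changes only spike times, never which neuron fires.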

Cited By

  • (2024) A Low-Resource-Cost FPGA Implementation of Population Threshold Coding for Spiking Neural Networks. 2024 4th International Conference on Neural Networks, Information and Communication (NNICE), 73–79. DOI: 10.1109/NNICE61279.2024.10498425. Online publication date: 19 January 2024.

        Information

        Published In

        ACM Other conferences
        ICCAI '23: Proceedings of the 2023 9th International Conference on Computing and Artificial Intelligence
        March 2023
        824 pages
        ISBN: 9781450399029
        DOI: 10.1145/3594315

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. neuronal jitter
        2. population threshold coding (PTC)
        3. robust feature extraction
        4. spiking neural networks (SNNs)

        Qualifiers

        • Research-article
        • Research
        • Refereed limited

        Funding Sources

        • Zhejiang Provincial Natural Science Foundation Exploration Youth Program
        • The Fundamental Research Funds for the Central Universities

        Conference

        ICCAI 2023

        Article Metrics

        • Downloads (last 12 months): 22
        • Downloads (last 6 weeks): 1
        Reflects downloads up to 19 Feb 2025
