INTERSPEECH 2011: Florence, Italy
- 12th Annual Conference of the International Speech Communication Association, INTERSPEECH 2011, Florence, Italy, August 27-31, 2011. ISCA 2011
Keynote Sessions
Keynote 1
- Julia Hirschberg:
Speaking More Like You: Entrainment in Conversational Speech. 4001
Keynote 2
- Tom M. Mitchell:
Neural Representations of Word Meanings. 4002
Keynote 3
- Alex Pentland:
Signals and Speech. 1-4
Keynote 4: Roundtable - Future and Applications of Speech and Language Technologies for the Good Health of Society
- Gabriele Miceli:
Language Disorders: Viewpoints on a Complex Object.
- Björn Granström:
Speech Technology in (Re)Habilitation of Persons with Communication Disabilities.
- Hiroshi Ishiguro:
From Teleoperated Androids to Cellphones as Surrogates.
Regular Oral Sessions
Speaker Recognition - Modeling
- Avi Matza:
Skew Gaussian Mixture Models for Speaker Recognition. 5-8
- Orith Toledo-Ronen, Hagai Aronowitz, Ron Hoory, Jason W. Pelecanos, David Nahamoo:
Towards Goat Detection in Text-Dependent Speaker Verification. 9-12
- Jean-François Bonastre, Xavier Anguera Miró, Gabriel Hernández Sierra, Pierre-Michel Bousquet:
Speaker Modeling Using Local Binary Decisions. 13-16
- Hagai Aronowitz, Ron Hoory, Jason W. Pelecanos, David Nahamoo:
New Developments in Voice Biometrics for User Authentication. 17-20
- Miranti Indar Mandasari, Mitchell McLaren, David A. van Leeuwen:
Evaluation of i-vector Speaker Recognition Systems for Forensic Application. 21-24
- Mohammed Senoussaoui, Patrick Kenny, Niko Brümmer, Edward de Villiers, Pierre Dumouchel:
Mixture of PLDA Models in i-vector Space for Gender-Independent Speaker Recognition. 25-28
Speech Perception - Speech Intelligibility
- Nandini Iyer, Douglas Brungart, Brian D. Simpson:
Segregation of Whispered Speech Interleaved with Noise or Speech Maskers. 29-32
- Roi Kliper, Hendrik Kayser, Daphna Weinshall, Israel Nelken, Jörn Anemüller:
Monaural Azimuth Localization Using Spectral Dynamics of Speech. 33-36
- Jan Rennies, Thomas Brand, Birger Kollmeier:
Prediction of Binaural Intelligibility Level Differences in Reverberation. 37-40
- Aurore Gautreau, Michel Hoen, Fanny Meunier:
Let's All Speak Together! Exploring the Impact of Various Languages on the Comprehension of Speech in Multi-Linguistic Babble. 41-44
- Valeriy Shafiro, Stanley Sheft, Robert Risley:
Cross-Rate Variation in the Intelligibility of Dual-Rate Gated Speech in Older Listeners. 45-48
- Chia-ying Lee, James R. Glass, Oded Ghitza:
An Efferent-Inspired Auditory Model Front-End for Speech Recognition. 49-52
Speech Representation and Modelling
- Faten Ben Ali, Laurent Girin, Sonia Djaziri Larbi:
A Long-Term Harmonic Plus Noise Model for Speech Signals. 53-56
- Alan Ó Cinnéide, David Dorran, Mikel Gainza, Eugene Coyle:
A Frequency Domain Approach to ARX-LF Voiced Speech Parameterization and Synthesis. 57-60
- Vikram Ramanarayanan, Athanasios Katsamanis, Shrikanth S. Narayanan:
Automatic Data-Driven Learning of Articulatory Primitives from Real-Time MRI Data Using Convolutive NMF with Sparseness Constraints. 61-64
- Dong Wang, Ravichander Vipperla, Nicholas W. D. Evans:
Online Pattern Learning for Non-Negative Convolutive Sparse Coding. 65-68
- Nicolas Malyska, Thomas F. Quatieri, Robert B. Dunn:
Sinewave Representations of Nonmodality. 69-72
- Ch. Srikanth Raj, Thippur V. Sreenivas:
Time-Varying Signal Adaptive Transform and IHT Recovery of Compressive Sensed Speech. 73-76
Emotion, Speaking Style, and Social Behavior
- Martin Wöllmer, Felix Weninger, Florian Eyben, Björn W. Schuller:
Acoustic-Linguistic Recognition of Interest in Speech with Bottleneck-BLSTM Nets. 77-80
- Mustafa Erden, Levent M. Arslan:
Automatic Detection of Anger in Human-Human Call Center Dialogs. 81-84
- Keng-hao Chang, Howard Lei, John F. Canny:
Improved Classification of Speaking Styles for Mental Health Monitoring Using Phoneme Dynamics. 85-88
- Matthew Black, Panayiotis G. Georgiou, Athanasios Katsamanis, Brian R. Baucom, Shrikanth S. Narayanan:
"You made me do it": Classification of Blame in Married Couples' Interactions by Fusing Automatically Derived Speech and Language Information. 89-92
- Martijn Goudbeek, Marie Nilsenová:
Context and Priming Effects in the Recognition of Emotion of Old and Young Listeners. 93-96
- Agustín Gravano, Rivka Levitan, Laura Willson, Stefan Benus, Julia Hirschberg, Ani Nenkova:
Acoustic and Prosodic Correlates of Social Behavior. 97-100
HMM-based Speech Synthesis I
- Kyung Hwan Oh, June Sig Sung, Doo Hwa Hong, Nam Soo Kim:
Decision Tree-Based Clustering with Outlier Detection for HMM-Based Speech Synthesis. 101-104
- Hanna Silén, Elina Helander, Moncef Gabbouj:
Prediction of Voice Aperiodicity Based on Spectral Representations in HMM Speech Synthesis. 105-108
- Takashi Nose, Takao Kobayashi:
A Perceptual Expressivity Modeling Technique for Speech Synthesis Based on Multiple-Regression HSMM. 109-112
- Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda:
Multi-Speaker Modeling with Shared Prior Distributions and Model Structures for Bayesian Speech Synthesis. 113-116
- Zhen-Hua Ling, Korin Richmond, Junichi Yamagishi:
Feature-Space Transform Tying in Unified Acoustic-Articulatory Modelling for Articulatory Control of HMM-Based Speech Synthesis. 117-120
- Matt Shannon, Heiga Zen, William J. Byrne:
The Effect of Using Normalized Models in Statistical Speech Synthesis. 121-124
Speaker Recognition - Modeling, Automatic Procedures, Analysis I
- Ce Zhang, Rong Zheng, Bo Xu:
Restoring the Residual Speaker Information in Total Variability Modeling for Speaker Verification. 125-128
- Hagai Aronowitz, Oren Barkan:
New Developments in Joint Factor Analysis for Speaker Verification. 129-132
- Joaquin Gonzalez-Rodriguez:
Speaker Recognition Using Temporal Contours in Linguistic Units: The Case of Formant and Formant-Bandwidth Trajectories. 133-136
- Ondrej Glembek, Lukás Burget, Niko Brümmer, Oldrich Plchot, Pavel Matejka:
Discriminatively Trained i-vector Extractor for Speaker Verification. 137-140
- Michelle Hewlett Sanchez, Luciana Ferrer, Elizabeth Shriberg, Andreas Stolcke:
Constrained Cepstral Speaker Recognition Using Matched UBM and JFA Training. 141-144
- Alan McCree, Douglas E. Sturim, Douglas A. Reynolds:
A New Perspective on GMM Subspace Compensation Based on PPCA and Wiener Filtering. 145-148
Speech Perception - Perceptual Learning and Cross-Language Perception
- Odette Scharenborg, Holger Mitterer, James M. McQueen:
Perceptual Learning of Liquids. 149-152
- Annelie Tuinman, Holger Mitterer, Anne Cutler:
The Efficiency of Cross-Dialectal Word Recognition. 153-156
- Minoru Tsuzaki, Keiichi Tokuda, Hisashi Kawai, Jinfu Ni:
Estimation of Perceptual Spaces for Speaker Identities Based on the Cross-Lingual Discrimination Task. 157-160
- Sharon Peperkamp, Camillia Bouchon:
The Relation Between Perception and Production in L2 Phonological Processing. 161-164
- Maria Paola Bissiri, María Luisa García Lecumberri, Martin Cooke, Jan Volín:
The Role of Word-Initial Glottal Stops in Recognizing English Words. 165-168
- Caicai Zhang, Gang Peng, William S.-Y. Wang:
Effect of Language Experience on the Categorical Perception of Cantonese Vowel Duration. 169-172
Speech Analysis
- Christian Fischer Pedersen, Ove Andersen, Paul Dalsgaard:
Adaptive Estimation of Zeros of Time-Varying Z-Transforms. 173-176
- John Kane, Christer Gobl:
Identifying Regions of Non-Modal Phonation Using Features of the Wavelet Transform. 177-180
- Xing Fan, Keith W. Godin, John H. L. Hansen:
Acoustic Analysis of Whispered Speech for Phoneme and Speaker Dependency. 181-184
- Afsaneh Asaei, Mohammad Javad Taghizadeh, Hervé Bourlard, Volkan Cevher:
Multi-Party Speech Recovery Exploiting Structured Sparsity Models. 185-188
- Sri Harish Reddy Mallidi, Sriram Ganapathy, Hynek Hermansky:
Modulation Spectrum Analysis for Recognition of Reverberant Speech. 189-192
- Petko Nikolov Petkov, W. Bastiaan Kleijn, Bert de Vries:
Discrete Choice Models for Non-Intrusive Quality Assessment. 193-196
Speech Enhancement and Dereverberation
- Keisuke Kinoshita, Mehrez Souden, Marc Delcroix, Tomohiro Nakatani:
Single Channel Dereverberation Using Example-Based Speech Enhancement with Uncertainty Decoding Technique. 197-200
- Jan S. Erkelens, Richard Heusdens:
A Statistical Room Impulse Response Model with Frequency Dependent Reverberation Time for Single-Microphone Late Reverberation Suppression. 201-204
- Chenxi Zheng, Tiago H. Falk, Wai-Yip Chan:
An Assessment of the Improvement Potential of Time-Frequency Masking for Speech Dereverberation. 205-208
- Thiago de M. Prego, Amaro A. de Lima, Sergio L. Netto:
Perceptual Improvement of a Two-Stage Algorithm for Speech Dereverberation. 209-212
- Najib Hadir, Friedrich Faubel, Dietrich Klakow:
A Model-Based Spectral Envelope Wiener Filter for Perceptually Motivated Speech Enhancement. 213-216
- Jorge I. Marin-Hurtado, Devangi N. Parikh, David V. Anderson:
Binaural Noise-Reduction Method Based on Blind Source Separation and Perceptual Post Processing. 217-220
ASR - Feature Extraction II
- Tim Ng, Bing Zhang, Spyridon Matsoukas, Long Nguyen:
Region Dependent Transform on MLP Features for Speech Recognition. 221-224
- Martin Heckmann, Claudius Gläser:
Discriminant Sub-Space Projection of Spectro-Temporal Speech Features Based on Maximizing Mutual Information. 225-228
- Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura:
Combining Feature Space Discriminative Training with Long-Term Spectro-Temporal Features for Noise-Robust Speech Recognition. 229-232
- Sumit Chopra, Patrick Haffner, Dimitrios Dimitriadis:
Combining Frame and Segment Level Processing via Temporal Pooling for Phonetic Classification. 233-236
- Dong Yu, Michael L. Seltzer:
Improved Bottleneck Features Using Pretrained Deep Neural Networks. 237-240
- Yuan-Fu Liao, Chia-Hsing Lin, We-Der Fang:
Minimum Classification Error Based Spectro-Temporal Feature Extraction for Robust Audio Classification. 241-244
Speaker Recognition - Modeling, Automatic Procedures, Analysis II
- Ce Zhang, Rong Zheng, Bo Xu:
Data-Driven Gaussian Component Selection for Fast GMM-Based Speaker Verification. 245-248
- Daniel Garcia-Romero, Carol Y. Espy-Wilson:
Analysis of i-vector Length Normalization in Speaker Recognition Systems. 249-252
- Weiwu Jiang, Zhifeng Li, Helen M. Meng:
An Analysis Framework Based on Random Subspace Sampling for Speaker Verification. 253-256
- Nicolas Scheffer, Yun Lei, Luciana Ferrer:
Factor Analysis Back Ends for MLLR Transforms in Speaker Recognition. 257-260
- Craig S. Greenberg, Alvin F. Martin, Bradford Barr, George R. Doddington:
Report on Performance Results in the NIST 2010 Speaker Recognition Evaluation. 261-264
- Marcel Kockmann, Luciana Ferrer, Lukás Burget, Jan Cernocký:
iVector Fusion of Prosodic and Cepstral Features for Speaker Verification. 265-268
Speech Production - Articulatory Measurements
- Yoon-Chul Kim, Michael I. Proctor, Shrikanth S. Narayanan, Krishna S. Nayak:
Visualization of Vocal Tract Shape Using Interleaved Real-Time MRI of Multiple Scan Planes. 269-272
- Ralf Winkler, Susanne Fuchs, Pascal Perrier, Mark Tiede:
Biomechanical Tongue Models: An Approach to Studying Inter-Speaker Variability. 273-276
- Jun Wang, Jordan R. Green, Ashok Samal, David Marx:
Quantifying Articulatory Distinctiveness of Vowels. 277-280
- Michael I. Proctor, Adam C. Lammert, Athanasios Katsamanis, Louis M. Goldstein, Christina Hagedorn, Shrikanth S. Narayanan:
Direct Estimation of Articulatory Kinematics from Real-Time Magnetic Resonance Image Sequences. 281-284
- Peter Birkholz, Christiane Neuschaefer-Rube:
Combined Optical Distance Sensing and Electropalatography to Measure Articulation. 285-288
- Santitham Prom-on, Yi Xu, Fang Liu:
Simulating Post-L F0 Bouncing by Modeling Articulatory Dynamics. 289-292
Acoustic Event Detection
- Jürgen T. Geiger, Mohamed Anouar Lakhal, Björn W. Schuller, Gerhard Rigoll:
Learning New Acoustic Events in an HMM-Based System Using MAP Adaptation. 293-296
- Yi Ren Leng, Tran Huy Dat, Norihide Kitaoka, Haizhou Li:
Alternative Frequency Scale Cepstral Coefficient for Robust Sound Event Recognition. 297-300
- Akinori Ito, Akihito Aiba, Masashi Ito, Shozo Makino:
Evaluation of Abnormal Sound Detection Using Multi-Stage GMM in Various Environments. 301-304
- Joerg Schmalenstroeer, Markus Bartek, Reinhold Haeb-Umbach:
Unsupervised Learning of Acoustic Events Using Dynamic Time Warping and Hierarchical K-Means++ Clustering. 305-308
- Pradeep Natarajan, Stavros Tsakalidis, Vasant Manohar, Rohit Prasad, Premkumar Natarajan:
Unsupervised Audio Analysis for Categorizing Heterogeneous Consumer Domain Videos. 313-316
Speech Synthesis - Unit Selection and Hybrid Approaches
- Vivek Kumar Rangarajan Sridhar, Ann K. Syrdal, Alistair Conkie, Srinivas Bangalore:
Enriching Text-to-Speech Synthesis Using Automatic Dialog Act Tags. 317-320
- Lukas Latacz, Wesley Mattheyses, Werner Verhelst:
Joint Target and Join Cost Weight Training for Unit Selection Synthesis. 321-324
- Andreas Windmann, Igor Jauk, Fabio Tamburini, Petra Wagner:
Prominence-Based Prosody Prediction for Unit Selection Speech Synthesis. 325-328
- Sathish Pammi, Marc Schröder:
Evaluating the Meaning of Synthesized Listener Vocalizations. 329-332
- Iñaki Sainz, Daniel Erro, Eva Navas, Inma Hernáez:
A Hybrid TTS Approach for Prosody and Acoustic Modules. 333-336
- Alexander Sorin, Slava Shechtman, Vincent Pollet:
Uniform Speech Parameterization for Multi-Form Segment Synthesis. 337-340
Speech Enhancement Analysis and Evaluation
- Ryoichi Miyazaki, Hiroshi Saruwatari, Kiyohiro Shikano:
Theoretical Analysis of Musical Noise and Speech Distortion in Structure-Generalized Parametric Blind Spatial Subtraction Array. 341-344
- Yan Tang, Martin Cooke:
Subjective and Objective Evaluation of Speech Intelligibility Enhancement Under Constant Energy and Duration Constraints. 345-348
- Nagarjuna Reddy Muraka, Chandra Sekhar Seelamantula:
A Risk-Estimation-Based Comparison of Mean Square Error and Itakura-Saito Distortion Measures for Speech Enhancement. 349-352
- Mahdi Triki:
On Noise Tracking for Noise Floor Estimation. 353-356
- Ben Milner:
Maximum a posteriori Estimation of Noise from Non-Acoustic Reference Signals in Very Low Signal-to-Noise Ratio Environments. 357-360
- Ryo Wakisaka, Hiroshi Saruwatari, Kiyohiro Shikano, Tomoya Takatani:
Blind Speech Prior Estimation for Generalized Minimum Mean-Square Error Short-Time Spectral Amplitude Estimator. 361-364
Speaker Recognition - Analysis and Statistics I
- Kornel Laskowski, Qin Jin:
Harmonic Structure Transform for Speaker Recognition. 365-368
- Hemant A. Patil, Maulik C. Madhavi, Keshab K. Parhi:
Combining Evidence from Spectral and Source-Like Features for Person Recognition from Humming. 369-372
- Yanhua Long, Zhi-Jie Yan, Frank K. Soong, Li-Rong Dai, Wu Guo:
Improvements in Speaker Characterization Using Spectral Subband Energy Based on Harmonic plus Noise Model. 373-376
- Yosef A. Solewicz, Hagai Aronowitz:
Implicit Segmentation in Two-Wire Speaker Recognition. 377-380
- Sibel Yaman, Jason W. Pelecanos, Mohamed Kamal Omar:
Boosting Speaker Recognition Performance with Compact Representations. 381-384
- Carlos Vaquero, Alfonso Ortega, Eduardo Lleida:
Partitioning of Two-Speaker Conversation Datasets. 385-388
Speech Production - Coarticulation and Speech Timing
- Stefan Benus, Marianne Pouplier:
Jaw Movement in Vowels and Liquids Forming the Syllable Nucleus. 389-392
- Barbara Gili Fivela, Antonio Stella, Sonia D'Apolito, Francesco Sigona:
Coarticulation Across Prosodic Domains in Italian: An Ultrasound Investigation. 393-396
- Juraj Simko, Fred Cummins, Stefan Benus:
Investigating the Stability of Intergestural Timing Relations. 397-400
- Claudio Zmarich, Barbara Gili Fivela, Pascal Perrier, Christophe Savariaux, Graziano Tisato:
Speech Timing Organization for the Phonological Length Contrast in Italian Consonants. 401-404
- Chiara Celata, Silvia Calamai:
Timing in Italian VNC Sequences at Different Speech Rates. 405-408
- Christina Hagedorn, Michael I. Proctor, Louis Goldstein:
Automatic Analysis of Singleton and Geminate Consonant Articulation Using Real-Time Magnetic Resonance Imaging. 409-412
Speech Segmentation
- Yih-Ru Wang:
A Two-Stage Sample-Based Phone Boundary Detector Using Segmental Similarity Features. 413-416
- Qiang Huang, Stephen J. Cox:
Iterative Improvement of Speaker Segmentation in a Noisy Environment Using High-Level Knowledge. 417-420
- Diego Castán, Carlos Vaquero, Alfonso Ortega, David Martínez González, Jesús Antonio Villalba López, Eduardo Lleida:
Hierarchical Audio Segmentation with HMM and Factor Analysis in Broadcast News Domain. 421-424
- Ozlem Kalinli:
Syllable Segmentation of Continuous Speech Using Auditory Attention Cues. 425-428
- Vijayaditya Peddinti, Kishore Prahallad:
Exploiting Phone-Class Specific Landmarks for Refinement of Segment Boundaries in TTS Databases. 429-432
- Agnès Pedone, Juan José Burred, Simon Maller, Pierre Leveau:
Phoneme-Level Text to Audio Synchronization on Speech Signals with Background Music. 433-436
ASR - Acoustic Models II
- Frank Seide, Gang Li, Dong Yu:
Conversational Speech Transcription Using Context-Dependent Deep Neural Networks. 437-440
- Guangsen Wang, Khe Chai Sim:
Sequential Classification Criteria for NNs in Automatic Speech Recognition. 441-444
- Mathew Magimai-Doss, Ramya Rasipuram, Guillermo Aradilla, Hervé Bourlard:
Grapheme-Based Automatic Speech Recognition Using KL-HMM. 445-448
- Joseph Keshet, Chih-Chieh Cheng, Mark Stoehr, David A. McAllester:
Direct Error Rate Minimization of Hidden Markov Models. 449-452
- Xie Sun, Xin Chen, Yunxin Zhao:
On the Effectiveness of Statistical Modeling Based Template Matching Approach for Continuous Speech Recognition. 453-456
- Guangsen Wang, Khe Chai Sim:
Comparison of Smoothing Techniques for Robust Context Dependent Acoustic Modelling in Hybrid NN/HMM Systems. 457-460
Robust Speech Recognition II
- Ramón Fernandez Astudillo, João Paulo da Silva Neto:
Propagation of Uncertainty Through Multilayer Perceptrons for Robust Automatic Speech Recognition. 461-464
- Katariina Mahkonen, Antti Hurmalainen, Tuomas Virtanen, Jort F. Gemmeke:
Mapping Sparse Representation to State Likelihoods in Noise-Robust Automatic Speech Recognition. 465-468
- Heikki Kallasjoki, Ulpu Remes, Jort F. Gemmeke, Tuomas Virtanen, Kalle J. Palomäki:
Uncertainty Measures for Improving Exemplar-Based Source Separation. 469-472
- Hsien-Cheng Liao, Yuan-Fu Liao, Chin-Hui Lee:
Maximum Confidence Measure Based Interaural Phase Difference Estimation for Noise Masking in Dual-Microphone Robust Speech Recognition. 473-476
- Shirin Badiezadegan, Richard C. Rose:
A Performance Monitoring Approach to Fusing Enhanced Spectrogram Channels in Robust Speech Recognition. 477-480
- Ning Cheng, Xunying Liu, Lan Wang:
Generalized Variable Parameter HMMs for Noise Robust Speech Recognition. 481-484
Speaker Recognition - Analysis and Statistics II
- Pierre-Michel Bousquet, Driss Matrouf, Jean-François Bonastre:
Intersession Compensation and Scoring Methods in the i-vectors Space for Speaker Recognition. 485-488
- Szymon Drgas, Adam Dabrowski:
Kernel Alignment Maximization for Speaker Recognition Based on High-Level Features. 489-492
- Balaji Vasan Srinivasan, Daniel Garcia-Romero, Dmitry N. Zotkin, Ramani Duraiswami:
Kernel Partial Least Squares for Speaker Recognition. 493-496
- Mohamed Kamal Omar, Jason W. Pelecanos:
Conversational-Side-Specific Inter-Session Variability Compensation. 497-500
- David A. van Leeuwen, Niko Brümmer:
A Speaker Line-Up for the Likelihood Ratio. 501-504
- Jesús Antonio Villalba López, Niko Brümmer:
Towards Fully Bayesian Speaker Recognition: Integrating Out the Between-Speaker Covariance. 505-508
Speaker Recognition - Analysis and Statistics II
- Hemant A. Patil, Pallavi N. Baljekar:
Novel VTEO Based Mel Cepstral Features for Classification of Normal and Pathological Voices. 509-512
- Eiji Shimura, Kazuhiko Kakehi:
Temporal Performance of Dysarthric Patients in Speech and Tapping Tasks. 513-516
- Xinhui Zhou, Maureen L. Stone, Carol Y. Espy-Wilson:
A Comparative Acoustic Study on Speech of Glossectomy Patients and Normal Subjects. 517-520
- Ali Alpan, Francis Grenez, Jean Schoentgen:
Dysperiodicity Analysis of Perceptually Assessed Synthetic Speech Stimuli. 521-524
- Alain Ghio, Frédérique Weisz, Giovanna Baracca, Giovanna Cantarella, Danièle Robert, Virginie Woisard, Franco Fussi, Antoine Giovanni:
Is the Perception of Voice Quality Language-Dependant? A Comparison of French and Italian Listeners and Dysphonic Speakers. 525-528
- Juan Rafael Orozco-Arroyave, S. Murillo Rendón, Andrés Marino Álvarez-Meza, Julián D. Arias-Londoño, Edilson Delgado-Trejos, Jesús Francisco Vargas-Bonilla, César Germán Castellanos-Domínguez:
Automatic Selection of Acoustic and Non-Linear Dynamic Features in Voice Signals for Hypernasality Detection. 529-532
ASR - Lexical, Prosodic and Multi-Lingual Models
- Sravana Reddy, Evandro B. Gouvêa:
Learning from Mistakes: Expanding Pronunciation Lexicons Using Word Recognition Errors. 533-536
- David Imseng, Hervé Bourlard, John Dines, Philip N. Garner, Mathew Magimai-Doss:
Improving Non-Native ASR Through Stochastic Multilingual Phoneme Space Transformations. 537-540
- Scott Novotney, Richard M. Schwartz, Sanjeev Khudanpur:
Unsupervised Arabic Dialect Adaptation with Self-Training. 541-544
- Dino Seppi, Kris Demuynck, Dirk Van Compernolle:
Template-Based Automatic Speech Recognition Meets Prosody. 545-548
- Ibrahim Badr, Ian McGraw, James R. Glass:
Pronunciation Learning from Continuous Speech. 549-552
- Yanmin Qian, Daniel Povey, Jia Liu:
State-Level Data Borrowing for Low-Resource Speech Recognition Based on Subspace GMMs. 553-556
Source Separation
- Yasmina Benabderrahmane, Sid-Ahmed Selouani, Douglas D. O'Shaughnessy:
Blind Speech Separation in Multiple Environments Using a Frequency Oriented PCA Method for Convolutive Mixtures. 557-560
- Zbynek Koldovský, Jirí Málek, Petr Tichavský:
Blind Speech Separation in Time-Domain Using Block-Toeplitz Structure of Reconstructed Signal Matrices. 561-564
- Auxiliadora Sarmiento, Iván Durán-Díaz, Sergio Cruces, Pablo Aguilera:
Generalized Method for Solving the Permutation Problem in Frequency-Domain Blind Source Separation of Convolved Speech Signals. 565-568
- Emad M. Grais, Hakan Erdogan:
Adaptation of Speaker-Specific Bases in Non-Negative Matrix Factorization for Single Channel Speech-Music Separation. 569-572
- Shuhua Zhang, Laurent Girin:
An Informed Source Separation System for Speech Signals. 573-576
- Ngoc Thuy Tran, William G. Cowley, André Pollok:
Adaptive Blocking Beamformer for Speech Separation. 577-580
Multimodal Signal Processing
- Per Ola Kristensson, Keith Vertanen:
Asynchronous Multimodal Text Entry Using Speech and Gesture Keyboards. 581-584
- Niall McLaughlin, Ji Ming, Danny Crookes:
Robust Bimodal Person Identification Using Face and Speech with Limited Training Data and Corruption of Both Modalities. 585-588
- Atef Ben Youssef, Thomas Hueber, Pierre Badin, Gérard Bailly:
Toward a Multi-Speaker Visual Articulatory Feedback System. 589-592
- Thomas Hueber, Elie-Laurent Benaroya, Bruce Denby, Gérard Chollet:
Statistical Mapping Between Articulatory and Acoustic Data for an Ultrasound-Based Silent Speech Interface. 593-596
- Joerg Schmalenstroeer, Florian Jacob, Reinhold Haeb-Umbach, Marius H. Hennecke, Gernot A. Fink:
Unsupervised Geometry Calibration of Acoustic Sensor Networks Using Source Correspondences. 597-600
- Michael Wand, Matthias Janke, Tanja Schultz:
Investigations on Speaking Mode Discrepancies in EMG-Based Speech Recognition. 601-604
ASR - Language Models II
- Tomás Mikolov, Anoop Deoras, Stefan Kombrink, Lukás Burget, Jan Cernocký:
Empirical Evaluation and Combination of Advanced Language Modeling Techniques. 605-608
- Geoffrey Zweig, Shuangyu Chang:
Personalizing Model M for Voice-Search. 609-612
- Takahiro Shinozaki, Yu Kubota, Sadaoki Furui, Eiji Utsunomiya, Yasutaka Shindoh:
Sentence Selection by Direct Likelihood Maximization for Language Model Adaptation. 613-616
- Ebru Arisoy, Bhuvana Ramabhadran, Hong-Kwang Jeff Kuo:
Feature Combination Approaches for Discriminative Language Models. 617-620
- Sankaranarayanan Ananthakrishnan, Stavros Tsakalidis, Rohit Prasad, Premkumar Natarajan:
On-Line Language Model Biasing for Multi-Pass Automatic Speech Recognition. 621-624
- Moonyoung Kang, Tim Ng, Long Nguyen:
Mandarin Word-Character Hybrid-Input Neural Network Language Model. 625-628
Phonology and Phonetics
- Vahid Sadeghi:
Laryngealization and Breathiness in Persian. 629-632
- Viola Müller, Jonathan Harrington, Felicitas Kleber, Ulrich Reubold:
Age-Dependent Differences in the Neutralization of the Intervocalic Voicing Contrast: Evidence from an Apparent-Time Study on East Franconian. 633-636
- Barbara Samlowski, Bernd Möbius, Petra Wagner:
Comparing Syllable Frequencies in Corpora of Written and Spoken Language. 637-640
- Luca Iacoponi, Renata Savy:
Sylli: Automatic Phonological Syllabification for Italian. 641-644
- André N. Xavier, Plínio A. Barbosa:
A Preliminary Study on the Production of Signs in Brazilian Sign Language when One of the Manual Articulators is Unavailable. 645-648
- Ho-hsien Pan, Mao-Hsu Chen, Shao-Ren Lyu:
Electroglottograph and Acoustic Cues for Phonation Contrasts in Taiwan Min Falling Tones. 649-652
Voice Conversion
- Daisuke Saito, Keisuke Yamamoto, Nobuaki Minematsu, Keikichi Hirose:
One-to-Many Voice Conversion Based on Tensor Representation of Speaker Space. 653-656
- Yu Qiao, Tong Tong, Nobuaki Minematsu:
A Study on Bag of Gaussian Model with Application to Voice Conversion. 657-660
- Lei Li, Yoshihiko Nankaku, Keiichi Tokuda:
A Bayesian Approach to Voice Conversion Based on GMMs Using Multiple Model Structures. 661-664
- Mahdi Eslami, Hamid Sheikhzadeh, Abolghasem Sayadiyan:
Quality Improvement of Voice Conversion Systems Based on Trellis Structured Vector Quantization. 665-668
- Hadas Benisty, David Malah:
Voice Conversion Using GMM with Enhanced Global Variance. 669-672
- Elizabeth Godoy, Olivier Rosec, Thierry Chonavel:
Spectral Envelope Transformation Using DFW and Amplitude Scaling for Voice Conversion with Parallel or Nonparallel Corpora. 673-676
Robust Speech Recognition III
- Pejman Mowlaee, Rahim Saeidi, Zheng-Hua Tan, Mads Græsbøll Christensen, Tomi Kinnunen, Pasi Fränti, Søren Holdt Jensen:
Sinusoidal Approach for the Single-Channel Speech Separation and Recognition Challenge. 677-680
- Cemil Demir, A. Taylan Cemgil, Murat Saraclar:
Semi-Supervised Single-Channel Speech-Music Separation for Automatic Speech Recognition. 681-684
- Hari Krishna Maganti, Marco Matassoni:
A Level-Dependent Auditory Filter-Bank for Speech Recognition in Reverberant Environments. 685-688
- Mehrez Souden, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani:
A Multichannel Feature-Based Processing for Robust Speech Recognition. 689-692
- Xiong Xiao, Jinyu Li, Chng Eng Siong, Haizhou Li:
Feature Normalization Using Structured Full Transforms for Robust Speech Recognition. 693-696
- Masakiyo Fujimoto, Shinji Watanabe, Tomohiro Nakatani:
A Robust Estimation Method of Noise Mixture Model for Noise Suppression. 697-700
Spoken Language Understanding
- Xiao Li, Ye-Yi Wang, Gökhan Tür:
Multi-Task Learning for Spoken Language Understanding with Shared Slots. 701-704
- Dustin Hillard, Asli Celikyilmaz, Dilek Hakkani-Tür, Gökhan Tür:
Learning Weighted Entity Lists from Web Click Logs for Spoken Language Understanding. 705-708
- Dilek Hakkani-Tür, Gökhan Tür, Larry P. Heck, Elizabeth Shriberg:
Bootstrapping Domain Detection Using Query Click Logs for New Domains. 709-712
- Asli Celikyilmaz, Dilek Hakkani-Tür, Gökhan Tür:
Approximate Inference for Domain Detection in Spoken Language Understanding. 713-716
- Chien-Lin Huang, Bin Ma, Haizhou Li, Chung-Hsien Wu:
Speech Indexing Using Semantic Context Inference. 717-720
- Yun-Cheng Ju, Jasha Droppo:
Automatically Optimizing Utterance Classification Performance without Human in the Loop. 721-724
Dialect and Accent Identification
- Philippe Boula de Mareüil, Jean-Luc Rouas, Manuela Yapomo:
In Search of Cues Discriminating West-African Accents in French. 725-728
- Abualsoud Hanani, Martin J. Russell, Michael J. Carey:
Computer and Human Recognition of Regional Accents of British English. 729-732
- Rong Tong, Bin Ma, Haizhou Li, Chng Eng Siong:
Target-Aware Lattice Rescoring for Dialect Recognition. 733-736
- Murat Akbacak, Dimitra Vergyri, Andreas Stolcke, Nicolas Scheffer, Arindam Mandal:
Effective Arabic Dialect Classification Using Diverse Phonotactic Models. 737-740
- Nancy F. Chen, Wade Shen, Joseph P. Campbell:
Characterizing Deletion Transformations Across Dialects Using a Sophisticated Tying Mechanism. 741-744
- Fadi Biadsy, Julia Hirschberg, Daniel P. W. Ellis:
Dialect and Accent Recognition Using Phonetic-Segmentation Supervectors. 745-748
First Language Acquisition
- Kouki Miyazawa, Hideaki Miura, Hideaki Kikuchi, Reiko Mazuka:
The Multi Timescale Phoneme Acquisition Model of the Self-Organizing Based on the Dynamic Features. 749-752
- Helen Brown, M. Gareth Gaskell:
The Time-Course of Talker-Specificity Effects for Newly-Learned Pseudowords: Evidence for a Hybrid Model of Lexical Representation. 753-756
- Britta Lintfert, Antje Schweitzer, Bernd Möbius:
A Parametric Approach to Intonation Acquisition Research: Validation on Child-Directed Speech Data. 757-760
- Maarten Versteegh, Louis ten Bosch, Lou Boves:
Modelling Novelty Preference in Word Learning. 761-764
- Gopal Ananthakrishnan, Giampiero Salvi:
Using Imitation to Learn Infant-Adult Acoustic Mappings. 765-768
- Christina Bergmann, Louis ten Bosch, Lou Boves:
Thresholding Word Activations for Response Scoring - Modelling Psycholinguistic Data. 769-772
ASR - Acoustic Models III
- Roger Hsiao, Tanja Schultz:
Generalized Baum-Welch Algorithm and its Implication to a New Extended Baum-Welch Algorithm. 773-776
- Frank Diehl, Mark John Francis Gales, Xunying Liu, Marcus Tomalin, Philip C. Woodland:
Word Boundary Modelling and Full Covariance Gaussians for Arabic Speech-to-Text Systems. 777-780
- Tom Ko, Brian Mak:
A Fully Automated Derivation of State-Based Eigentriphones for Triphone Modeling with No Tied States Using Regularization. 781-784
- Tara N. Sainath, Bhuvana Ramabhadran, David Nahamoo, Dimitri Kanevsky:
Reducing Computational Complexities of Exemplar-Based Sparse Representations with Applications to Large Vocabulary Speech Recognition. 785-788
- Yu Zhang, Jian Xu, Zhi-Jie Yan, Qiang Huo:
An i-vector Based Approach to Training Data Clustering for Improved Speech Recognition. 789-792
- Senaka Buthpitiya, Ian R. Lane, Jike Chong:
Rapid Training of Acoustic Models Using Graphics Processing Unit. 793-796
Spoken Dialogue Systems I
- Teruhisa Misu, Kiyonori Ohtake, Chiori Hori, Hisashi Kawai, Satoshi Nakamura:
User Study of Spoken Decision Support System. 797-800
- Antoine Raux, Yi Ma:
Efficient Probabilistic Tracking of User Goal and Dialog History for Spoken Dialog Systems. 801-804
- Alexander Schmitt, Alexander Zgorzelski, Wolfgang Minker:
Tackling a Shilly-Shally Classifier for Predicting Task Success in Spoken Dialogue Interaction. 805-808
- Toyomi Meguro, Yasuhiro Minami, Ryuichiro Higashinaka, Kohji Dohsaka:
Evaluation of Listening-Oriented Dialogue Control Rules Based on the Analysis of HMMs. 809-812
- David Suendermann, Jackson Liscombe, Jonathan Bloom, Grace Li, Roberto Pieraccini:
Large-Scale Experiments on Data-Driven Design of Commercial Spoken Dialog Systems. 813-816
- Fredrik Kronlid, Jessica Villing, Alexander Berman, Staffan Larsson:
Comparing System-Driven and Free Dialogue in In-Vehicle Interaction. 817-820
Spoken Language Resources, Evaluation and Standardization II
- Michael A. Carlin, Samuel Thomas, Aren Jansen, Hynek Hermansky:
Rapid Evaluation of Speech Representations for Spoken Term Discovery. 821-824 - Ben Hixon, Eric Schneider, Susan L. Epstein:
Phonemic Similarity Metrics to Compare Pronunciation Methods. 825-828 - Janto Skowronek, Alexander Raake:
Investigating the Effect of Number of Interlocutors on the Quality of Experience for Multi-Party Audio Conferencing. 829-832 - Jáchym Kolár, Lori Lamel:
On Development of Consistently Punctuated Speech Corpora. 833-836 - Shrikanth S. Narayanan, Erik Bresch, Prasanta Kumar Ghosh, Louis Goldstein, Athanasios Katsamanis, Yoon Kim, Adam C. Lammert, Michael I. Proctor, Vikram Ramanarayanan, Yinghua Zhu:
A Multimodal Real-Time MRI Articulatory Corpus for Speech Research. 837-840 - Denis Burnham, Dominique Estival, Steven Fazio, Jette Viethen, Felicity Cox, Robert Dale, Steve Cassidy, Julien Epps, Roberto Togneri, Michael Wagner, Yuko Kinoshita, Roland Göcke, Joanne Arciuli, Mark Onslow, Trent W. Lewis, Andrew Butcher, John Hajek:
Building an Audio-Visual Corpus of Australian English: Large Corpus Collection with an Economical Portable and Replicable Black Box. 841-844
Language Identification
- Rong Zheng, Ce Zhang, Bo Xu:
Data-Driven UBM Generation via Tied Gaussians for GMM-Supervector Based Accent Identification. 845-848 - David Martínez González, Jesús Antonio Villalba López, Antonio Miguel, Alfonso Ortega, Eduardo Lleida:
I3A Language Recognition System for Albayzin 2010 LRE. 849-852 - Mikel Peñagarikano, Amparo Varona, Luis Javier Rodríguez, Germán Bordel:
Dimensionality Reduction for Using High-Order n-Grams in SVM-Based Phonotactic Language Recognition. 853-856 - Najim Dehak, Pedro A. Torres-Carrasquillo, Douglas A. Reynolds, Réda Dehak:
Language Recognition via i-vectors and Dimensionality Reduction. 857-860 - David Martínez González, Oldrich Plchot, Lukás Burget, Ondrej Glembek, Pavel Matejka:
Language Recognition in iVectors Space. 861-864
Second Language Acquisition, Development and Learning II
- Xiaojun Qian, Helen M. Meng, Frank K. Soong:
On Mispronunciation Lexicon Generation Using Joint-Sequence Multigrams in Computer-Aided Pronunciation Training (CAPT). 865-868 - Bianca Sisinni, Mirko Grimaldi:
Validating a Second Language Perception Model for Classroom Context - A Longitudinal Study within the Perceptual Assimilation Model. 869-872 - Makiko Sadakata, James M. McQueen:
The Role of Variability in Non-Native Perceptual Learning of a Japanese Geminate-Singleton Fricative Contrast. 873-876 - Jared Bernstein, Jian Cheng, Masanori Suzuki:
Fluency Changes with General Progress in L2 Proficiency. 877-880 - Slim Ouni:
Tongue Gestures Awareness and Pronunciation Training. 881-884 - Wim A. van Dommelen, Valérie Hazan:
Impact of Speaker Variability on Speech Perception in Non-Native Listeners. 885-888
ASR - Search, Keyword Spotting and Confidence Measures II
- Evelyn Kurniawati, Samsudin Ng, Karthik Muralidhar, Sapna George:
A Template Based Voice Trigger System Using Bhattacharyya Edit Distance. 889-892 - David Nolden, Ralf Schlüter, Hermann Ney:
Acoustic Look-Ahead for More Efficient Decoding in LVCSR. 893-896 - Frank Duckhorn, Matthias Wolff, Rüdiger Hoffmann:
A New Epsilon Filter for Efficient Composition of Weighted Finite-State Transducers. 897-900 - Sabato Marco Siniscalchi, Torbjørn Svendsen, Chin-Hui Lee:
A Bottom-Up Stepwise Knowledge-Integration Approach to Large Vocabulary Continuous Speech Recognition Using Weighted Finite State Machines. 901-904 - Matthew Stephen Seigel, Philip C. Woodland:
Combining Information Sources for Confidence Estimation with CRF Models. 905-908 - Kouichi Katsurada, Shinta Sawada, Shigeki Teshima, Yurie Iribe, Tsuneo Nitta:
Evaluation of Fast Spoken Term Detection Using a Suffix Array. 909-912
SLP for Information Extraction and Retrieval I
- Timothy J. Hazen:
Latent Topic Modeling for Audio Corpus Summarization. 913-916 - Richard Dufour, Yannick Estève, Paul Deléglise:
Investigation of Spontaneous Speech Characterization Applied to Speaker Role Recognition. 917-920 - Armando Muscariello, Guillaume Gravier, Frédéric Bimbot:
Zero-Resource Audio-Only Spoken Term Detection Based on a Combination of Template Matching Techniques. 921-924 - Yeon-Jun Kim, David C. Gibbon:
Automatic Learning in Content Indexing Service Using Phonetic Alignment. 925-928 - Pei-Ning Chen, Kuan-Yu Chen, Berlin Chen:
Leveraging Relevance Cues for Improved Spoken Document Retrieval. 929-932 - Yun-Nung Chen, Yu Huang, Ching-feng Yeh, Lin-Shan Lee:
Spoken Lecture Summarization by Random Walk over a Graph Constructed with Automatically Extracted Key Terms. 933-936
Speaker Diarization I
- Hagai Aronowitz:
Speaker Diarization Using a priori Acoustic Information. 937-940 - Kofi Boakye, Oriol Vinyals, Gerald Friedland:
Improved Overlapped Speech Handling for Speaker Diarization. 941-944 - Stephen Shum, Najim Dehak, Ekapol Chuangsuwanich, Douglas A. Reynolds, James R. Glass:
Exploiting Intra-Conversation Variability for Speaker Diarization. 945-948 - Masafumi Nishida, Seiichi Yamamoto:
Speaker Clustering Based on Non-Negative Matrix Factorization. 949-952 - Sree Harsha Yella, Fabio Valente:
Information Bottleneck Features for HMM/GMM Speaker Diarization of Meetings Recordings. 953-956 - David Wang, Robbie Vogt, Sridha Sridharan, David Dean:
Cross Likelihood Ratio Based Speaker Clustering Using Eigenvoice Models. 957-960
Prosody I
- Giuseppina Turco, Michele Gubian, Jessamyn Schertz:
A Quantitative Investigation of the Prosody of Verum Focus in Italian. 961-964 - Amelie Dorn, Ailbhe Ní Chasaide:
Effects of Focus on f0 and Duration in Irish (Gaelic) Declaratives. 965-968 - Jennifer Cole, Stefanie Shattuck-Hufnagel:
The Phonology and Phonetics of Perceived Prosody: What do Listeners Imitate? 969-972 - Amandine Michelas, Noël Nguyen:
Uncovering the Effect of Imitation on Tonal Patterns of French Accentual Phrases. 973-976 - Pilar Prieto, Cecilia Pugliesi, Joan Borràs-Comes, Ernesto Arroyo, Josep Blat:
Crossmodal Prosodic and Gestural Contribution to the Perception of Contrastive Focus. 977-980 - Erin Cvejic, Jeesun Kim, Chris Davis:
Temporal Relationship Between Auditory and Visual Prosodic Cues. 981-984
ASR - New Paradigms
- Xie Sun, Yunxin Zhao:
New Methods for Template Selection and Compression in Continuous Speech Recognition. 985-988 - Shi-Xiong Zhang, Mark J. F. Gales:
Structured Support Vector Machines for Noise Robust Continuous Speech Recognition. 989-992 - Masayuki Suzuki, Gakuto Kurata, Masafumi Nishimura, Nobuaki Minematsu:
Continuous Digits Recognition Leveraging Invariant Structure. 993-996 - Dimitri Kanevsky, David Nahamoo, Tara N. Sainath, Bhuvana Ramabhadran:
Convergence of Line Search A-Function Methods. 997-1000 - Yasuhisa Fujii, Kazumasa Yamamoto, Seiichi Nakagawa:
Hidden Boosted MMI and Hierarchical State Posterior Feature for Automatic Speech Recognition Based on Hidden Conditional Neural Fields. 1001-1004 - Jun Cai, Bruce Denby, Pierre Roussel-Ragot, Gérard Dreyfus, Lise Crevier-Buchman:
Recognition and Real Time Performances of a Lightweight Ultrasound Based Silent Speech Interface Employing a Language Model. 1005-1008
Spoken Dialogue Systems II
- Heriberto Cuayáhuitl, Nina Dethlefs:
Optimizing Situated Dialogue Management in Unknown Environments. 1009-1012 - Om Deshmukh, Shajith Ikbal, Ashish Verma, Etienne Marcheret:
Acoustic-Similarity Based Technique to Improve Concept Recognition. 1013-1016 - Doug Peters, Peter Stubley:
Dialog Methods for Improved Alphanumeric String Capture. 1017-1020 - David DeVault, Kenji Sagae, David R. Traum:
Detecting the Status of a Predictive Incremental Speech Understanding Model for Real-Time Decision-Making in a Spoken Dialogue System. 1021-1024 - Senthilkumar Chandramohan, Matthieu Geist, Fabrice Lefèvre, Olivier Pietquin:
User Simulation in Dialogue Systems Using Inverse Reinforcement Learning. 1025-1028 - Paul A. Crook, Oliver Lemon:
Lossless Value Directed Compression of Complex User Goal States for Statistical Spoken Dialogue Systems. 1029-1032
Speaker Diarization II
- Janez Zibert, France Mihelic:
Prosodic and Phonetic Features for Speaker Clustering in Speaker Diarization Systems. 1033-1036 - Marijn Huijbregts, David A. van Leeuwen:
Diarization-Based Speaker Retrieval for Broadcast Television Archives. 1037-1040 - Martin Zelenák, Javier Hernando:
The Detection of Overlapping Speech with Prosodic Features for Speaker Diarization. 1041-1044 - Sree Hari Krishnan Parthasarathi, Hervé Bourlard, Daniel Gatica-Perez:
LP Residual Features for Robust, Privacy-Sensitive Speaker Diarization. 1045-1048 - Houman Ghaemmaghami, David Dean, Robbie Vogt, Sridha Sridharan:
Extending the Task of Diarization to Speaker Attribution. 1049-1052 - Viet-Anh Tran, Viet Bac Le, Claude Barras, Lori Lamel:
Comparing Multi-Stage Approaches for Cross-Show Speaker Diarization. 1053-1056
Prosody II
- György Szaszák, Katalin Nagy, András Beke:
Analysing the Correspondence Between Automatic Prosodic Segmentation and Syntactic Structure. 1057-1060 - Joseph Tepperman, Emily Nava:
Long-Distance Rhythmic Dependencies and their Application to Automatic Language Identification. 1061-1064 - Andrew Rosenberg:
Symbolic and Direct Sequential Modeling of Prosody for Classification of Speaking-Style and Nativeness. 1065-1068 - Wentao Gu, Ting Zhang, Hiroya Fujisaki:
Prosodic Analysis and Perception of Mandarin Utterances Conveying Attitudes. 1069-1072 - Chierh Cheng, Michele Gubian:
Predicting Taiwan Mandarin Tone Shapes from their Duration. 1073-1076 - Charlotte Wollermann, Ulrich Schade, Bernhard Schröder:
Variation of Accent Type and of Context - Influences on Pragmatic Focus Interpretation. 1077-1080
Adaptation for ASR
- Shinji Watanabe, Atsushi Nakamura, Biing-Hwang Juang:
Model Adaptation for Automatic Speech Recognition Based on Multiple Time Scale Evolution. 1081-1084 - Catherine Breslin, K. K. Chin, Mark J. F. Gales, Kate M. Knill:
Integrated Online Speaker Clustering and Adaptation. 1085-1088 - Zoltán Tüske, Christian Plahl, Ralf Schlüter:
A Study on Speaker Normalized MLP Features in LVCSR. 1089-1092 - Yongwon Jeong, Young Kuk Kim:
Matrix-Variate Distribution of Training Models for Robust Speaker Adaptation. 1093-1096 - Michael L. Seltzer, Alex Acero:
Separating Speaker and Environmental Variability Using Factored Transforms. 1097-1100 - Mazin Gilbert, Iker Arizmendi, Enrico Bocchieri, Diamantino Caseiro, Vincent Goffin, Andrej Ljolje, Mike Phillips, Chao Wang, Jay G. Wilpon:
Your Mobile Virtual Assistant Just Got Smarter! 1101-1104
SLP for Information Extraction and Retrieval II
- Vincent Claveau, Sébastien Lefèvre:
Topic Segmentation of TV-Streams by Mathematical Morphology and Vectorization. 1105-1108 - Mimi Lu, Cheung-Chi Leung, Lei Xie, Bin Ma, Haizhou Li:
Probabilistic Latent Semantic Analysis for Broadcast News Story Segmentation. 1109-1112 - Evandro B. Gouvêa:
Hybrid Speech Recognition for Voice Search: A Comparative Study. 1113-1116 - Bo Peng, Yao Qian, Frank K. Soong, Bo Zhang:
A New Phonetic Candidate Generator for Improving Search Query Efficiency. 1117-1120 - Yukiko Suzuki, Kiyoaki Aikawa:
Towards Voice-Input Symbolic Pattern Retrieval Using Parameter-Based Search. 1121-1124 - Vikram Gupta, Jitendra Ajmera, Arun Kumar, Ashish Verma:
A Language Independent Approach to Audio Search. 1125-1128
Regular Poster Sessions
Second Language Acquisition, Development and Learning I
- Mikhail Ordin, Leona Polyanskaya, Christiane Ulbrich:
Acquisition of Timing Patterns in Second Language. 1129-1132 - Hongyan Li, Shen Huang, Shijin Wang, Bo Xu:
Context-Dependent Duration Modeling with Backoff Strategy and Look-Up Tables for Pronunciation Assessment and Mispronunciation Detection. 1133-1136 - Mee Sonu, Keiichi Tajima, Hiroaki Kato, Yoshinori Sagisaka:
Perceptual Training of Vowel Length Contrast of Japanese by L2 Listeners: Effects of an Isolated Word versus a Word Embedded in Sentences. 1137-1140 - E.-Chin Wu:
Similar Vowels in L1/L2 Production: Confused or Discerned in Early L2 English Learners with Different Amount of Exposure. 1141-1144 - Lya Meister, Einar Meister:
Production and Perception of Estonian Vowels by Native and Non-Native Speakers. 1145-1148 - Hiroshi Kibishi, Seiichi Nakagawa:
New Feature Parameters for Pronunciation Evaluation in English Presentations at International Conferences. 1149-1152 - Gérard Bailly, Will Barbour:
Synchronous Reading: Learning French Orthography by Audiovisual Training. 1153-1156 - Christos Koniaris, Olov Engwall:
Phoneme Level Non-Native Pronunciation Analysis by an Auditory Model-Based Native Assessment Scheme. 1157-1160 - Pavel Sturm, Radek Skarnitzl:
The Open Front Vowel /æ/ in the Production and Perception of Czech Students of English. 1161-1164 - Catia Cucchiarini, Henk van den Heuvel, Eric Sanders, Helmer Strik:
Error Selection for ASR-Based English Pronunciation Training in 'My Pronunciation Coach'. 1165-1168 - Tomoko Nariai, Kazuyo Tanaka:
An Experimental Analysis of Pitch Patterns in Japanese Speakers of English with Verification by Speech Re-Synthesis. 1169-1172 - Tomoko Nariai, Kazuyo Tanaka, Yoshiaki Itoh:
An Analysis of Word Duration in Native Speakers and Japanese Speakers of English. 1173-1176
Speech Enhancement
- Laura Laaksonen, Ville Myllylä, Riitta Niemistö:
Evaluating Artificial Bandwidth Extension by Conversational Tests in Car Using Mobile Devices with Integrated Hands-Free Functionality. 1177-1180 - Hannu Pulakka, Ulpu Remes, Santeri Yrttiaho, Kalle J. Palomäki, Mikko Kurimo, Paavo Alku:
Low-Frequency Bandwidth Extension of Telephone Speech Using Sinusoidal Synthesis and Gaussian Mixture Model. 1181-1184 - Amr H. Nour-Eldin, Peter Kabal:
Memory-Based Approximation of the Gaussian Mixture Model Framework for Bandwidth Extension of Narrowband Speech. 1185-1188 - Philip Harding, Ben Milner:
Speech Enhancement by Reconstruction from Cleaned Acoustic Features. 1189-1192 - Jae-Hun Choi, Sang-Kyun Kim, Joon-Hyuk Chang:
A Soft Decision-Based Speech Enhancement Using Acoustic Noise Classification. 1193-1196 - Chao Li, Wenju Liu:
A Noise Estimation Method Based on Speech Presence Probability and Spectral Sparseness. 1197-1200 - Chao Li, Wenju Liu:
Improved a posteriori Speech Presence Probability Estimation Based on Cepstro-Temporal Smoothing and Time-Frequency Correlation. 1201-1204 - Md Foezur Rahman Chowdhury, Sid-Ahmed Selouani, Douglas D. O'Shaughnessy:
A Rapid Adaptation Algorithm for Tracking Highly Non-Stationary Noises based on Bayesian Inference for On-Line Spectral Change Point Detection. 1205-1208 - Kuldip K. Paliwal, Belinda Schwerin, Kamil K. Wójcicki:
Single Channel Speech Enhancement Using MMSE Estimation of Short-Time Modulation Magnitude Spectrum. 1209-1212 - Atanu Saha, Tetsuya Shimamura:
Speech Enhancement Using Masking Properties in Adverse Environments. 1213-1216 - Bhiksha Raj, Rita Singh, Tuomas Virtanen:
Phoneme-Dependent NMF for Speech Enhancement in Monaural Mixtures. 1217-1220 - Christina Leitner, Franz Pernkopf, Gernot Kubin:
Kernel PCA for Speech Enhancement. 1221-1224 - Angel M. Gomez, Belinda Schwerin, Kuldip K. Paliwal:
Objective Intelligibility Prediction of Speech by Combining Correlation and Distortion Based Techniques. 1225-1228
ASR - Feature Extraction I
- Frantisek Grézl, Martin Karafiát:
Integrating Recent MLP Feature Extraction Techniques into TRAP Architecture. 1229-1232 - Martin Wöllmer, Björn W. Schuller, Gerhard Rigoll:
Feature Frame Stacking in RNN-Based Tandem ASR Systems - Learned vs. Predefined Context. 1233-1236 - Christian Plahl, Ralf Schlüter, Hermann Ney:
Improved Acoustic Feature Combination for LVCSR by Neural Networks. 1237-1240 - Joel Pinto, Mathew Magimai-Doss, Hervé Bourlard:
Hierarchical Tandem Features for ASR in Mandarin. 1241-1244 - Fabio Valente, Mathew Magimai-Doss, Wen Wang:
Analysis and Comparison of Recent MLP Features for LVCSR Systems. 1245-1248 - Jaehyung Lee, Soo-Young Lee:
Deep Learning of Speech Features for Improved Phonetic Recognition. 1249-1252 - Heyun Huang, Yang Liu, Jort F. Gemmeke, Louis ten Bosch, Bert Cranen, Lou Boves:
Globality-Locality Consistent Discriminant Analysis for Phone Classification. 1253-1256 - Hynek Boril, Frantisek Grézl, John H. L. Hansen:
Front-End Compensation Methods for LVCSR Under Lombard Effect. 1257-1260 - Jung-Won Lee, Jeung-Yoon Choi, Hong-Goo Kang:
Classification of Fricatives Using Feature Extrapolation of Acoustic-Phonetic Features in Telephone Speech. 1261-1264 - Sami Keronen, Jouni Pohjalainen, Paavo Alku, Mikko Kurimo:
Noise Robust Feature Extraction Based on Extended Weighted Linear Prediction in LVCSR. 1265-1268 - Bernd T. Meyer, Suman V. Ravuri, Marc René Schädler, Nelson Morgan:
Comparing Different Flavors of Spectro-Temporal Features for ASR. 1269-1272 - Ehsan Variani, Thomas Schaaf:
VTLN in the MFCC Domain: Band-Limited versus Local Interpolation. 1273-1276 - Sridhar Krishna Nemala, Kailash Patil, Mounya Elhilali:
Multistream Bandpass Modulation Features for Robust Speech Recognition. 1277-1280 - Davide Marino, Thomas Hain:
An Analysis of Automatic Speech Recognition with Multiple Microphones. 1281-1284
Spoken Dialogue & Spoken Language Understanding Systems
- Géraldine Damnati, Delphine Charlet:
Multi-View Approach for Speaker Turn Role Labeling in TV Broadcast News Shows. 1285-1288 - Sudeep Gandhe, Michael Rushforth, Priti Aggarwal, David R. Traum:
Evaluation of an Integrated Authoring Tool for Building Advanced Question-Answering Characters. 1289-1292 - Gökhan Tür, Dilek Hakkani-Tür, Dustin Hillard, Asli Celikyilmaz:
Towards Unsupervised Spoken Language Understanding: Exploiting Query Click Logs for Slot Filling. 1293-1296 - Donghyeon Lee, Cheongjae Lee, Minwoo Jeong, Kyungduk Kim, Seokhwan Kim, Junhwi Choi, Gary Geunbae Lee:
Web-Enhanced Content Retrieval for Information Access Dialogue System. 1297-1300 - Lucie Daubigney, Milica Gasic, Senthilkumar Chandramohan, Matthieu Geist, Olivier Pietquin, Steve J. Young:
Uncertainty Management for On-Line Optimisation of a POMDP-Based Large-Scale Spoken Dialogue System. 1301-1304 - Sunao Hara, Norihide Kitaoka, Kazuya Takeda:
Detection of Task-Incomplete Dialogs Based on Utterance-and-Behavior Tag N-Gram for Spoken Dialog Systems. 1305-1308 - Ruhi Sarikaya, Stanley F. Chen, Bhuvana Ramabhadran:
Shrinkage-Based Features for Natural Language Call Routing. 1309-1312 - Leonid Rachevsky, Dimitri Kanevsky, Ruhi Sarikaya, Bhuvana Ramabhadran:
Clustering with Modified Cosine Distance Learned from Constraints. 1313-1316 - Andrew Fandrianto, Brian Langner, Alan W. Black:
Using Speaker ID to Discover Repeat Callers of a Spoken Dialog System. 1317-1320 - Florian Pinault, Fabrice Lefèvre:
Semantic Graph Clustering for POMDP-Based Spoken Dialog Systems. 1321-1324 - Ryo Taguchi, Yuji Yamada, Koosuke Hattori, Taizo Umezaki, Masahiro Hoguro, Naoto Iwahashi, Kotaro Funakoshi, Mikio Nakano:
Learning Place-Names from Spoken Utterances and Localization Results by Mobile Robot. 1325-1328 - Björn Gambäck, Fredrik Olsson, Oscar Täckström:
Active Learning for Dialogue Act Classification. 1329-1332 - Thierry Bazillon, Benjamin Maza, Mickael Rouvier, Frédéric Béchet, Alexis Nasr:
Speaker Role Recognition Using Question Detection and Characterization. 1333-1336 - Qiang Huang, Stephen J. Cox:
Learning Score Structure from Spoken Language for a Tennis Game. 1337-1340 - Silke M. Witt:
Semi-Automated Classifier Adaptation for Natural Language Call Routing. 1341-1344 - Wei-Bin Liang, Chung-Hsien Wu, Chih-Hung Wang, Jhing-Fa Wang:
Interactional Style Detection for Versatile Dialogue Response Using Prosodic and Semantic Features. 1345-1348 - Christine Kühnel, Benjamin Weiss, Matthias Schulz, Sebastian Möller:
Quality Aspects of Multimodal Dialog Systems: Identity, Stimulation and Success. 1349-1352
Prosodic Structure
- Joseph Tepperman, Emily Nava:
Where Should Pitch Accents and Phrase Breaks Go? A Syntax Tree Transducer Solution. 1353-1356 - Giuliano Bocci, Cinzia Avesani:
Phrasal Prominences do not need Pitch Movements: Postfocal Phrasal Heads in Italian. 1357-1360 - David Le Gac, Hiyon Yoo:
Intonation of Left Dislocated Topics in Modern Greek. 1361-1364 - Laura Thompson, Catherine Inez Watson, Ray Harlow, Jeanette King, Margaret Maclagan, Helen Charters, Peter Keegan:
Phrases, Pitch and Perceived Prominence in Maori. 1365-1368 - Tomás Dubeda:
Perceptual Sensitivity to Prenuclear and Nuclear Intonational Patterns. 1369-1372 - Raya Kalaldeh:
Tonal Alignment Defined: The Case of Southern Irish English. 1373-1376 - Andrew Rosenberg:
Using Mutual Information to Identify Regions of Analysis for Prosodic Analysis. 1377-1380 - Chiu-yu Tseng, Zhao-yu Su, Chi-Feng Huang:
Prosodic Highlights in Mandarin Continuous Speech - Cross-Genre Attributes and Implications. 1381-1384 - Simone Sulpizio, James M. McQueen:
When Two Newly-Acquired Words are One: New Words Differing in Stress Alone are not Automatically Represented Differently. 1385-1388 - Shehui Bu, Zhenjie Zhuo, Lingling Yang, Shuichi Itahashi:
Automatic Determination of the Standard Chinese Prosodic Phrase Boundaries by F0 Generation Model. 1389-1392 - Céline De Looze, Stéphane Rauzy:
Measuring Speakers' Similarity in Speech by Means of Prosodic Cues: Methods and Potential. 1393-1396 - Li-chiung Yang:
Tonal Variations in Mandarin: New Evidence from Spontaneous and Read Speech. 1397-1400
Language Processing
- Camille Guinaudeau, Julia Hirschberg:
Accounting for Prosodic Information to Improve ASR-Based Topic Tracking for TV Broadcast News. 1401-1404 - Kenji Imamura, Tomoko Izumi, Kugatsu Sadamitsu, Kuniko Saito, Satoshi Kobashikawa, Hirokazu Masataki:
Morpheme Conversion for Connecting Speech Recognizer and Language Analyzers in Unsegmented Languages. 1405-1408 - Ren-Ying Fang, Bo-Wei Chen, Jhing-Fa Wang, Chung-Hsien Wu:
Emotion Detection Based on Concept Inference and Spoken Sentence Analysis for Customer Service. 1409-1412 - Christophe Cerisara, Pavel Král, Claire Gardent:
Commas Recovery with Syntactic Features in French and in Czech. 1413-1416 - Daniele Falavigna:
Redundancy Reduction in ASR of Spontaneous Speech Through Statistical Machine Translation. 1417-1420 - Chin-Chih Chiang:
From Interview to News Text: A Study of Taiwan TV Political Interviews in Newspaper Reports. 1421-1424
ASR - Language Models I
- Jeffrey Sorensen, Cyril Allauzen:
Unary Data Structures for Language Models. 1425-1428 - Cyril Allauzen, Michael Riley:
Bayesian Language Model Interpolation for Mobile Speech Input. 1429-1432 - Martin Sundermeyer, Ralf Schlüter, Hermann Ney:
On the Estimation of Discount Parameters for Language Model Smoothing. 1433-1436 - Patrick Lehnen, Stefan Hahn, Hermann Ney:
N-Grams for Conditional Random Fields or a Failure-Transition(f) Posterior for Acyclic FSTs. 1437-1440 - M. Ali Basha Shaik, Amr El-Desoky Mousa, Ralf Schlüter, Hermann Ney:
Hybrid Language Models Using Mixed Types of Sub-Lexical Units for Open Vocabulary German LVCSR. 1441-1444 - Amr El-Desoky Mousa, M. Ali Basha Shaik, Ralf Schlüter, Hermann Ney:
Morpheme Based Factored Language Models for German LVCSR. 1445-1448 - Markus Nußbaum-Thom, Amr El-Desoky Mousa, Ralf Schlüter, Hermann Ney:
Compound Word Recombination for German LVCSR. 1449-1452 - Akio Kobayashi, Takahiro Oku, Shinichi Homma, Toru Imai, Seiichi Nakagawa:
Lattice-Based Risk Minimization Training for Unsupervised Language Model Adaptation. 1453-1456 - Christian Gillot, Christophe Cerisara:
Similarity Language Model. 1457-1460 - Erinç Dikici, Murat Semerci, Murat Saraclar, Ethem Alpaydin:
Data Sampling and Dimensionality Reduction Approaches for Reranking ASR Outputs Using Discriminative Language Models. 1461-1464 - Ryo Masumura, Seongjun Hahm, Akinori Ito:
Training a Language Model Using Webdata for Large Vocabulary Japanese Spontaneous Speech Recognition. 1465-1468 - Hai Son Le, Ilya Oparin, Abdelkhalek Messaoudi, Alexandre Allauzen, Jean-Luc Gauvain, François Yvon:
Large Vocabulary SOUL Neural Network Language Models. 1469-1472 - Jonathan Mamou, Abhinav Sethy, Bhuvana Ramabhadran, Ron Hoory, Paul Vozila:
Improved Spoken Query Transcription Using Co-Occurrence Information. 1473-1476 - Yik-Cheung Tam, Paul Vozila:
Unsupervised Latent Speaker Language Modeling. 1477-1480
Spoken Language Resources, Evaluation and Standardization I
- Nobuaki Minematsu, Koji Okabe, Keisuke Ogaki, Keikichi Hirose:
Measurement of Objective Intelligibility of Japanese Accented English Using ERJ (English Read by Japanese) Database. 1481-1484 - Sebastian Möller, Chihuy Bang, Teele Tamme, Markus Vaalgamaa, Benjamin Weiss:
From Single-Call to Multi-Call Quality: A Study on Long-Term Quality Integration in Audio-Visual Speech Communication. 1485-1488 - Hui Lin, Jeff A. Bilmes:
Optimal Selection of Limited Vocabulary Speech Corpora. 1489-1492 - Stephen A. Zahorian, Jiang Wu, Montri Karnjanadecha, Chandra Sekhar Vootkuri, Brian Wong, Andrew Hwang, Eldar Tokhtamyshev:
Open Source Multi-Language Audio Database for Spoken Language Processing Applications. 1493-1496 - Matthew Black, Daniel Bone, Marian E. Williams, Phillip Gorrindo, Pat Levitt, Shrikanth S. Narayanan:
The USC CARE Corpus: Child-Psychologist Interactions of Children with Autism Spectrum Disorders. 1497-1500 - Nelly Barbot, Vincent Barreaud, Olivier Boëffard, Laure Charonnat, Arnaud Delhay, Sébastien Le Maguer, Damien Lolive:
Towards a Versatile Multi-Layered Description of Speech Corpora Using Algebraic Relations. 1501-1504 - Korin Richmond, Phil Hoole, Simon King:
Announcing the Electromagnetic Articulography (Day 1) Subset of the mngu0 Articulatory Corpus. 1505-1508 - Gregor Pirker, Michael Wohlmayr, Stefan Petrik, Franz Pernkopf:
A Pitch Tracking Corpus with Evaluation on Multipitch Tracking Scenario. 1509-1512 - Taras Butko:
On Building and Evaluating a Broadcast-News Audio Segmentation System. 1513-1516 - Simon Dobrisek, France Mihelic:
Time- and Acoustic-Mediated Alignment Algorithms for Speech Recognition Evaluation. 1517-1520 - Julia Niemann, Kati Schulz, Ina Wechsung:
Effects of Shortening Speech Prompts of In-Car Voice User Interfaces on Users Mental Models. 1521-1524 - Laurens van der Werff, Wessel Kraaij, Franciska de Jong:
Speech Transcript Evaluation for Information Retrieval. 1525-1528 - Luis Javier Rodríguez, Mikel Peñagarikano, Amparo Varona, Mireia Díez, Germán Bordel:
The Albayzin 2010 Language Recognition Evaluation. 1529-1532 - Roger K. Moore:
Progress and Prospects for Speech Technology: Results from Three Sexennial Surveys. 1533-1536 - Josef R. Novak, Nobuaki Minematsu, Keikichi Hirose:
Painless WFST Cascade Construction for LVCSR - Transducersaurus. 1537-1540
Paralinguistic Information - Classification and Detection
- Catharine Oertel, Stefan Scherer, Nick Campbell:
On the Use of Multimodal Cues for the Prediction of Degrees of Involvement in Spontaneous Conversation. 1541-1544 - Narichika Nomoto, Masafumi Tamoto, Hirokazu Masataki, Osamu Yoshioka, Satoshi Takahashi:
Anger Recognition in Spoken Dialog Using Linguistic and Para-Linguistic Information. 1545-1548 - Alexei V. Ivanov, Giuseppe Riccardi, Adam J. Sporka, Jakub Franc:
Recognition of Personality Traits from Human Spoken Conversations. 1549-1552 - Björn W. Schuller, Zixing Zhang, Felix Weninger, Gerhard Rigoll:
Using Multiple Databases for Training in Emotion Recognition: To Unite or to Vote? 1553-1556 - Felix Burkhardt, Björn W. Schuller, Benjamin Weiss, Felix Weninger:
"Would You Buy a Car from Me?" - On the Likability of Telephone Voices. 1557-1560 - James Gibson, Athanasios Katsamanis, Matthew P. Black, Shrikanth S. Narayanan:
Automatic Identification of Salient Acoustic Instances in Couples' Behavioral Interactions Using Diverse Density Support Vector Machines. 1561-1564 - Daniel Neiberg, Joakim Gustafson:
Predicting Speaker Changes and Listener Responses with and without Eye-Contact. 1565-1568 - Senaka Amarakeerthi, Tin Lay Nwe, Liyanage C. De Silva, Michael Cohen:
Emotion Classification Using Inter- and Intra-Subband Energy Variation. 1569-1572 - Kazuki Kitahara, Shinzi Michiwiki, Miku Sato, Shoichi Matsunaga, Masaru Yamashita, Kazuyuki Shinohara:
Emotion Classification of Infants' Cries Using Duration Ratios of Acoustic Segments. 1573-1576 - Bogdan Vlasenko, Dmytro Prylipko, David Philippou-Hübner, Andreas Wendemuth:
Vowels Formants Analysis Allows Straightforward Detection of High Arousal Acted and Spontaneous Emotions. 1577-1580 - Daniel Neiberg, Petri Laukka, Hillary Anger Elfenbein:
Intra-, Inter-, and Cross-Cultural Classification of Vocal Affect. 1581-1584
Applications for Learning, Education, Aged and Handicapped Persons
- Sajad Shirali-Shahreza, Yashar Ganjali, Ravin Balakrishnan:
Verifying Human Users in Speech-Based Interactions. 1585-1588 - Jian Cheng:
Automatic Assessment of Prosody in High-Stakes English Tests. 1589-1592 - Dean Luo, Xuesong Yang, Lan Wang:
Improvement of Segmental Mispronunciation Detection with Prior Knowledge Extracted from Large L2 Speech Corpus. 1593-1596 - Jian Cheng, Jianqiang Shen:
Off-Topic Detection in Automated Speech Assessment Applications. 1597-1600 - Sebastian Stüker, Johanna Fay, Kay Berkling:
Towards Context-Dependent Phonetic Spelling Error Correction in Children's Freely Composed Text for Diagnostic and Pedagogical Purposes. 1601-1604 - Verónica López-Ludeña, Rubén San Segundo, Ricardo de Córdoba, Javier Ferreiros, Juan Manuel Montero, José Manuel Pardo:
Factored Translation Models for Improving a Speech into Sign Language Translation System. 1605-1608 - Kálmán Abari, Zsuzsanna Zsófia Rácz, Gábor Olaszy:
Formant Maps in Hungarian Vowels - Online Data Inventory for Research, and Education. 1609-1612 - Germán Bordel, Silvia Nieto, Mikel Peñagarikano, Luis Javier Rodríguez, Amparo Varona:
Automatic Subtitling of the Basque Parliament Plenary Sessions Videos. 1613-1616 - Yurie Iribe, Silasak Manosavanh, Kouichi Katsurada, Ryoko Hayashi, Chunyue Zhu, Tsuneo Nitta:
Generating Animated Pronunciation from Speech Through Articulatory Feature Extraction. 1617-1620 - Wei Chen, Jack Mostow:
A Tale of Two Tasks: Detecting Children's Off-Task Speech in a Reading Tutor. 1621-1624 - Toshiko Isei-Jaakkola, Takatoshi Naka, Keikichi Hirose:
Problems Encountered by Japanese EL2 with English Short Vowels as Illustrated on a 3D Vowel Chart. 1625-1628 - Thomas Pellegrini, Rui Correia, Isabel Trancoso, Jorge Baptista, Nuno J. Mamede:
Automatic Generation of Listening Comprehension Learning Material in European Portuguese. 1629-1632 - Chao-Hong Liu, Chung-Hsien Wu, David Sarwono, Jhing-Fa Wang:
Candidate Generation for ASR Output Error Correction Using a Context-Dependent Syllable Cluster-Based Confusion Matrix. 1633-1636 - Huynh Thai Hoa, An Vu Tran, Tran Huy Dat:
Semi-Supervised Tree Support Vector Machine for Online Cough Recognition. 1637-1640
Robust Speech Recognition I
- Volker Leutnant, Alexander Krueger, Reinhold Haeb-Umbach:
A Versatile Gaussian Splitting Approach to Non-Linear State Estimation and its Application to Noise-Robust ASR. 1641-1644 - Hilman Ferdinandus Pardede, Koichi Shinoda:
Generalized-Log Spectral Mean Normalization for Speech Recognition. 1645-1648 - Young-Ik Kim, Hoon-Young Cho, Sang-Hun Kim:
Zero-Crossing-Based Channel Attentive Weighting of Cepstral Features for Robust Speech Recognition: The ETRI 2011 CHiME Challenge System. 1649-1652 - Wooil Kim, John H. L. Hansen:
Feature Compensation for Speech Recognition in Severely Adverse Environments Due to Background Noise and Channel Distortion. 1653-1656 - Ning Ma, Jon Barker, Heidi Christensen, Phil D. Green:
Binaural Cues for Fragment-Based Speech Recognition in Reverberant Multisource Environments. 1657-1660 - Vikas Joshi, Raghavendra Bilgi, Srinivasan Umesh, Luz García, M. Carmen Benítez:
Sub-Band Level Histogram Equalization for Robust Speech Recognition. 1661-1664 - Ulpu Remes, Yoshihiko Nankaku, Keiichi Tokuda:
GMM-Based Missing-Feature Reconstruction on Multi-Frame Windows. 1665-1668 - Yang Sun, Jort F. Gemmeke, Bert Cranen, Louis ten Bosch, Lou Boves:
Improvements of a Dual-Input DBN for Noise Robust ASR. 1669-1672 - Randy Gomez, Tatsuya Kawahara:
Denoising Using Optimized Wavelet Filtering for Automatic Speech Recognition. 1673-1676 - Florian Müller, Alfred Mertins:
Noise Robust Speaker-Independent Speech Recognition with Invariant-Integration Features Using Power-Bias Subtraction. 1677-1680
ASR - Acoustic Models I
- Michele Alessandrini, Giorgio Biagetti, Alessandro Curzi, Claudio Turchetti:
Semi-Automatic Acoustic Model Generation from Large Unsynchronized Audio and Text Chunks. 1681-1684 - Brian Strope, Doug Beeferman, Alexander Gruenstein, Xin Lei:
Unsupervised Testing Strategies for ASR. 1685-1688 - Gakuto Kurata, Nobuyasu Itoh, Masafumi Nishimura:
Acoustic Model Training with Detecting Transcription Errors in the Training Data. 1689-1692 - Aren Jansen, Kenneth Church:
Towards Unsupervised Training of Speaker Independent Acoustic Models. 1693-1696 - Xiaodong Cui, Xin Chen, Jian Xue, Peder A. Olsen, John R. Hershey, Bowen Zhou:
Acoustic Modeling with Bootstrap and Restructuring Based on Full Covariance. 1697-1700 - Jian Xu, Yu Zhang, Zhi-Jie Yan, Qiang Huo:
An i-vector Based Approach to Acoustic Sniffing for Irrelevant Variability Normalization Based Acoustic Model Training and Speech Recognition. 1701-1704 - Muhammad Ali Tahir, Ralf Schlüter, Hermann Ney:
Log-Linear Optimization of Second-Order Polynomial Features with Subsequent Dimension Reduction for Speech Recognition. 1705-1708 - Qingqing Zhang, Lori Lamel, Jean-Luc Gauvain:
Genre Categorization and Modeling for Broadcast Speech Transcription. 1709-1712 - Sunghwan Shin, Ho-Young Jung, Biing-Hwang Juang:
Individual Error Minimization Learning Framework and its Applications to Speech Recognition and Utterance Verification. 1713-1716 - Sakhia Darjaa, Milos Cernak, Marián Trnka, Milan Rusko, Róbert Sabo:
Effective Triphone Mapping for Acoustic Modeling in Speech Recognition. 1717-1720 - Udhyakumar Nallasamy, Michael Garbus, Florian Metze, Qin Jin, Thomas Schaaf, Tanja Schultz:
Analysis of Dialectal Influence in Pan-Arabic ASR. 1721-1724 - Azarakhsh Jalalvand, Fabian Triefenbach, David Verstraeten, Jean-Pierre Martens:
Connected Digit Recognition by Means of Reservoir Computing. 1725-1728 - Madhavi Vedula Ratnagiri, Biing-Hwang Juang, Lawrence R. Rabiner:
Large Margin - Minimum Classification Error Using Sum of Shifted Sigmoids as the Loss Function. 1729-1732 - Javier Mikel Olaso, M. Inés Torres, Raquel Justo:
Representing Phonological Features Through a Two-Level Finite State Model. 1733-1736 - Jan Vanek, Jan Trmal, Josef V. Psutka, Josef Psutka:
Optimization of the Gaussian Mixture Model Evaluation on GPU. 1737-1740
Source Separation and Speech Enhancement
- Xueliang Zhang, Wenju Liu:
Monaural Voiced Speech Segregation Based on Pitch and Comb Filter. 1741-1744 - Yasuharu Hirasawa, Naoki Yasuraoka, Toru Takahashi, Tetsuya Ogata, Hiroshi G. Okuno:
Fast and Simple Iterative Algorithm of Lp-Norm Minimization for Under-Determined Speech Separation. 1745-1748 - Azam Rabiee, Saeed Setayeshi, Soo-Young Lee:
Monaural Speech Separation Based on a 2D Processing and Harmonic Analysis. 1749-1752 - Ingrid Jafari, Serajul Haque, Roberto Togneri, Sven Nordholm:
Underdetermined Blind Source Separation with Fuzzy Clustering for Arbitrarily Arranged Sensors. 1753-1756 - Dang Hai Tran Vu, Reinhold Haeb-Umbach:
On Initial Seed Selection for Frequency Domain Blind Speech Separation. 1757-1760 - Nobuaki Tanaka, Tetsuji Ogawa, Tetsunori Kobayashi:
Spatial Filter Calibration Based on Minimization of Modified LSD. 1761-1764 - Toru Nakashika, Tetsuya Takiguchi, Yasuo Ariki:
Probabilistic Spectrum Envelope: Categorized Audio-Features Representation for NMF-Based Sound Decomposition. 1765-1768 - Jinho Choi, Chang D. Yoo:
A High Resolution Multiple Source Localization Based on Generalized Cumulant Structure (GCS) Matrix. 1769-1772 - Emad M. Grais, Hakan Erdogan:
Single Channel Speech Music Separation Using Nonnegative Matrix Factorization with Sliding Windows and Spectral Masks. 1773-1776 - Jorge I. Marin-Hurtado, David V. Anderson:
Perceptually-Inspired Processing for Multichannel Wiener Filter. 1777-1780 - Shoichi Nakano, Kazumasa Yamamoto, Seiichi Nakagawa:
Speech Recognition in Mixed Sound of Speech and Music Based on Vector Quantization and Non-Negative Matrix Factorization. 1781-1784 - Tomohiro Nakatani, Shoko Araki, Marc Delcroix, Takuya Yoshioka, Masakiyo Fujimoto:
Reduction of Highly Nonstationary Ambient Noise by Integrating Spectral and Locational Characteristics of Speech and Noise for Robust ASR. 1785-1788 - Carlo Drioli, Andrea Calanca:
Voice Processing by Dynamic Glottal Models with Applications to Speech Enhancement. 1789-1792 - Jinqiu Sang, Guoping Li, Hongmei Hu, Mark E. Lutman, Stefan Bleeck:
Supervised Sparse Coding Strategy in Cochlear Implants. 1793-1796
HMM-Based Speech Synthesis II
- Benjamin Picart, Thomas Drugman, Thierry Dutoit:
Continuous Control of the Degree of Articulation in HMM-Based Speech Synthesis. 1797-1800 - Ling-Hui Chen, Yoshihiko Nankaku, Heiga Zen, Keiichi Tokuda, Zhen-Hua Ling, Li-Rong Dai:
Estimation of Window Coefficients for Dynamic Feature Extraction for HMM-Based Speech Synthesis. 1801-1804 - Zhengqi Wen, Jianhua Tao:
Inverse Filtering Based Harmonic Plus Noise Excitation Model for HMM-Based Speech Synthesis. 1805-1808 - Daniel Erro, Iñaki Sainz, Eva Navas, Inma Hernáez:
Improved HNM-Based Vocoder for Statistical Synthesizers. 1809-1812 - Gopala Krishna Anumanchipalli, Luís C. Oliveira, Alan W. Black:
A Statistical Phrase/Accent Model for Intonation Modeling. 1813-1816 - Gustav Eje Henter, W. Bastiaan Kleijn:
Intermediate-State HMMs to Capture Continuously-Changing Signal Features. 1817-1820 - Norbert Braunschweiler, Sabine Buchholz:
Automatic Sentence Selection from Speech Corpora Including Diverse Speech for Improved HMM-TTS Synthesis Quality. 1821-1824 - Hui Liang, John Dines:
Phonological Knowledge Guided HMM State Mapping for Cross-Lingual Speaker Adaptation. 1825-1828 - Nicolas Obin, Pierre Lanchantin, Anne Lacheret, Xavier Rodet:
Reformulating Prosodic Break Model into Segmental HMMs and Information Fusion. 1829-1832 - Ranniery Maia, Heiga Zen, Kate M. Knill, Mark J. F. Gales, Sabine Buchholz:
Multipulse Sequences for Residual Signal Modeling. 1833-1836 - Cassia Valentini-Botinhao, Junichi Yamagishi, Simon King:
Can Objective Measures Predict the Intelligibility of Modified HMM-Based Synthetic Speech in Noise? 1837-1840 - Tsuneo Nitta, Takayuki Onoda, Masashi Kimura, Yurie Iribe, Kouichi Katsurada:
Speech Synthesis Based on Articulatory-Movement HMMs with Voice-Source Codebooks. 1841-1844 - Tsuneo Kato, Makoto Yamada, Nobuyuki Nishizawa, Keiichiro Oura, Keiichi Tokuda:
Large-Scale Subjective Evaluations of Speech Rate Control Methods for HMM-Based Speech Synthesizers. 1845-1848 - Yu Maeno, Takashi Nose, Takao Kobayashi, Yusuke Ijima, Hideharu Nakajima, Hideyuki Mizuno, Osamu Yoshioka:
HMM-Based Emphatic Speech Synthesis Using Unsupervised Context Labeling. 1849-1852
Phonetics and Phonology, Stress, Accent, Rhythm
- Chiara Bertini, Pier Marco Bertinetto, Na Zhi:
Chinese and Italian Speech Rhythm: Normalization and the CCI Algorithm. 1853-1856 - Paolo Mairano, Antonio Romano:
Rhythm Metrics on Syllables and Feet do not Work as Expected. 1857-1860 - Lei Chen, Klaus Zechner:
Applying Rhythm Features to Automatically Assess Non-Native Speech. 1861-1864 - Brian Vaughan:
Prosodic Synchrony in Co-Operative Task-Based Dialogues: A Measure of Agreement and Disagreement. 1865-1868 - Oliver Niebuhr, Astrid Wolf:
Low and High, Short and Long by Crook or by Hook? 1869-1872 - Christian Heinrich, Florian Schiel:
Estimating Speaking Rate by Means of Rhythmicity Parameters. 1873-1876 - Denis Arnold, Bernd Möbius, Petra Wagner:
Comparing Word and Syllable Prominence Rated by Naïve Listeners. 1877-1880 - Shinichi Tokuma, Yi Xu:
L1/L2 Perception of Lexical Stress with F0 Peak-Delay: Effect of an Extra Syllable Added. 1881-1884 - Kheang Seng, Yurie Iribe, Tsuneo Nitta:
Letter-to-Phoneme Conversion Based on Two-Stage Neural Network Focusing on Letter and Phoneme Contexts. 1885-1888 - Rosemary Orr, Hugo Quené, Roeland van Beek, Thari Diefenbach, David A. van Leeuwen, Marijn Huijbregts:
An International English Speech Corpus for Longitudinal Study of Accent Development. 1889-1892 - Sunhee Kim, Kyuwhan Lee, Minhwa Chung:
A Corpus-Based Study of English Pronunciation Variations. 1893-1896 - Hywel Stoakes, Andrew Butcher, Janet Fletcher, Marija Tabain:
Long Term Average Speech Spectra in Yolngu Matha and Pitjantjatjara Speaking Females and Males. 1897-1900 - Tekla Etelka Gráczi, Steven M. Lulich, Tamás Gábor Csapó, András Beke:
Context and Speaker Dependency in the Relation of Vowel Formants and Subglottal Resonances - Evidence from Hungarian. 1901-1904
ASR - Search, Keyword Spotting and Confidence Measures I
- Keith Kintzley, Aren Jansen, Hynek Hermansky:
Event Selection from Phone Posteriorgrams Using Matched Filters. 1905-1908 - Yaodong Zhang, James R. Glass:
A Piecewise Aggregate Approximation Lower-Bound Estimate for Posteriorgram-Based Dynamic Time Warping. 1909-1912 - Long Qin, Ming Sun, Alexander I. Rudnicky:
OOV Detection and Recovery Using Hybrid Models with Different Fragments. 1913-1916 - Haiyang Li, Jiqing Han, Tieran Zheng:
AUC Optimization Based Confidence Measure for Keyword Spotting. 1917-1920 - Zejun Ma, Xiaorui Wang, Bo Xu:
An Empirical Study of Multilingual Spoken Term Detection. 1921-1924 - Zejun Ma, Xiaorui Wang, Bo Xu:
Fusing Multiple Confidence Measures for Chinese Spoken Term Detection. 1925-1928 - Zhanlei Yang, Hao Chao, Wenju Liu:
Response Probability Based Decoding Algorithm for Large Vocabulary Continuous Speech Recognition. 1929-1932 - Yuxiang Shan, Yan Deng, Jia Liu:
Combining Lattice-Based Language Dependent and Independent Approaches for Out-of-Language Detection in LVCSR. 1933-1936 - Naoaki Ito, Yoshihiko Nankaku, Akinobu Lee:
Evaluation of Tree-Trellis Based Decoding in Over-Million LVCSR. 1937-1940 - Hao Huang, Bing Hu Li:
Lattice Based Discriminative Model Combination Using Automatically Induced Phonetic Contexts. 1941-1944 - Taniya Mishra, Andrej Ljolje, Mazin Gilbert:
Predicting Human Perceived Accuracy of ASR Systems. 1945-1948 - Ioana Vasilescu, Dahbia Yahia, Natalie D. Snoeren, Martine Adda-Decker, Lori Lamel:
Cross-Lingual Study of ASR Errors: On the Role of the Context in Human Perception of Near-Homophones. 1949-1952 - Tatsuhiko Saito, Takashi Nose, Takao Kobayashi, Yohei Okato, Akio Horii:
Performance Prediction of Speech Recognition Using Average-Voice-Based Speech Synthesis. 1953-1956 - Ali Haznedaroglu, Levent M. Arslan:
Confidence Measures for Turkish Call Center Conversations. 1957-1960 - Taichi Asami, Narichika Nomoto, Satoshi Kobashikawa, Yoshikazu Yamaguchi, Hirokazu Masataki, Satoshi Takahashi:
Spoken Document Confidence Estimation Using Contextual Coherence. 1961-1964
Pitch Processing - Singing Voice Analysis
- Alipah Pawi, Saeed Vaseghi, Ben Milner, Seyed Ghorshi:
Fundamental Frequency Estimation Using Modified Higher Order Moments and Multiple Windows. 1965-1968 - Michael Wohlmayr, Franz Pernkopf:
EM-Based Gain Adaptation for Probabilistic Multipitch Tracking. 1969-1972 - Thomas Drugman, Abeer Alwan:
Joint Robust Voicing Detection and Pitch Estimation Based on Residual Harmonics. 1973-1976 - D. Govind, S. R. Mahadeva Prasanna, Debadatta Pati:
Epoch Extraction in High Pass Filtered Speech Using Hilbert Envelope. 1977-1980 - Alexander Pavlovets, Alexander A. Petrovsky:
Robust HNR-Based Closed-Loop Pitch and Harmonic Parameters Estimation. 1981-1984 - Chetana Prakash, N. Dhananjaya, Suryakanth V. Gangashetty:
Exploring Bessel Features for Detection of Glottal Closure Instants. 1985-1988 - João P. Cabral, John Kane, Christer Gobl, Julie Carson-Berndsen:
Evaluation of Glottal Epoch Detection Algorithms on Different Voice Types. 1989-1992 - Antonio Origlia, Giovanni Abete, Francesco Cutugno, Iolanda Alfano, Renata Savy, Bogdan Ludusan:
A Divide et impera Algorithm for Optimal Pitch Stylization. 1993-1996 - Ricardo Teixeira Sousa, Aníbal J. S. Ferreira:
Singing Voice Analysis Using Relative Harmonic Delays. 1997-2000 - Siu Wa Lee, Minghui Dong:
Singing Voice Synthesis: Singer-Dependent Vibrato Modeling and Coherent Processing of Spectral Envelope. 2001-2004 - Sylvain Le Beux, Lionel Feugère, Christophe d'Alessandro:
Chorus Digitalis: Experiments in Chironomic Choir Singing. 2005-2008
Prosodic Modeling
- Kun Li, Shuang Zhang, Mingxing Li, Wai Kit Lo, Helen M. Meng:
Prominence Model for Prosodic Features in Automatic Lexical Stress and Pitch Accent Detection. 2009-2012 - Ya Li, Jianhua Tao, Xiaoying Xu:
Hierarchical Stress Modeling in Mandarin Text-to-Speech. 2013-2016 - Chong-Jia Ni, Wenju Liu, Bo Xu:
Automatic Prosodic Events Detection by Using Syllable-Based Acoustic, Lexical and Syntactic Features. 2017-2020 - Albert Rilliard, Alexandre Allauzen, Philippe Boula de Mareüil:
Using Dynamic Time Warping to Compute Prosodic Similarity Measures. 2021-2024 - Plínio Almeida Barbosa, Hansjörg Mixdorff, Sandra Madureira:
Applying the Quantitative Target Approximation Model (qTA) to German and Brazilian Portuguese. 2025-2028 - Nicolas Obin, Anne Lacheret, Xavier Rodet:
Stylization and Trajectory Modelling of Short and Long Term Speech Prosody Variations. 2029-2032 - Mathieu Avanzi, Nicolas Obin, Anne Lacheret-Dujour, Bernard Victorri:
Toward a Continuous Modeling of French Prosodic Structure: Using Acoustic Features to Predict Prominence Location and Prominence Degree. 2033-2036 - Tim Mahrt, Jui-Ting Huang, Yoonsook Mo, Margaret M. Fleck, Mark Hasegawa-Johnson, Jennifer Cole:
Optimal Models of Prosodic Prominence Using the Bayesian Information Criterion. 2037-2040 - Hussein Hussein, Hansjörg Mixdorff, Hue San Do, Rüdiger Hoffmann:
Quantitative Analysis of Tone Coarticulation in Mandarin. 2041-2044 - Daniel Neiberg, Gopal Ananthakrishnan, Joakim Gustafson:
Tracking Pitch Contours Using Minimum Jerk Trajectories. 2045-2048
Discourse and Dialogue
- Benjamin Maza, Marc El-Bèze, Georges Linarès, Renato de Mori:
On the Use of Linguistic Features in an Automatic System for Speech Analytics of Telephone Conversations. 2049-2052 - Abe Kazemzadeh, Sungbok Lee, Panayiotis G. Georgiou, Shrikanth S. Narayanan:
Determining what Questions to Ask, with the Help of Spectral Graph Theory. 2053-2056 - Hendrik Buschmeier, Zofia Malisz, Marcin Wlodarczak, Stefan Kopp, Petra Wagner:
'Are You Sure You're Paying Attention?' - 'Uh-Huh' Communicating Understanding as a Marker of Attentiveness. 2057-2060 - Yuichi Ishimoto, Mika Enomoto, Hitoshi Iida:
Projectability of Transition-Relevance Places Using Prosodic Features in Japanese Spontaneous Conversation. 2061-2064 - Anna Hjalmarsson, Kornel Laskowski:
Measuring Final Lengthening for Speaker-Change Prediction. 2065-2068 - Kornel Laskowski, Jens Edlund, Mattias Heldner:
Incremental Learning and Forgetting in Stochastic Turn-Taking Models. 2069-2072 - Kallirroi Georgila, David R. Traum:
Reinforcement Learning of Argumentation Dialogue Policies in Negotiation. 2073-2076 - Tobias Heinroth, Savina Koleva, Wolfgang Minker:
Topic Switching Strategies for Spoken Dialogue Systems. 2077-2080 - Ryuichiro Higashinaka, Noriaki Kawamae, Kugatsu Sadamitsu, Yasuhiro Minami, Toyomi Meguro, Kohji Dohsaka, Hirohito Inagaki:
Unsupervised Clustering of Utterances Using Non-Parametric Bayesian Methods. 2081-2084
SLP for Speech Translation, Information Extraction and Retrieval
- Carolina Parada, Mark Dredze, Frederick Jelinek:
OOV Sensitive Named-Entity Recognition in Speech. 2085-2088 - Markus Saers, Dekai Wu, Chi-kiu Lo, Karteek Addanki:
Speech Translation with Grammar Driven Probabilistic Phrasal Bilexica Extraction. 2089-2092 - Christoph Tillmann, Sanjika Hewavitharana:
An Efficient Unified Extraction Algorithm for Bilingual Data. 2093-2096 - Songfang Huang, Bowen Zhou:
Using Features from Topic Models to Alleviate Over-Generation in Hierarchical Phrase-Based Translation. 2097-2100 - Songfang Huang, Bowen Zhou:
An Empirical Study on Improving Hierarchical Phrase-Based Translation Using Alignment Features. 2101-2104 - Xiaodong He, Li Deng:
Robust Speech Translation by Domain Adaptation. 2105-2108 - Emil Ettelaie, Panayiotis G. Georgiou, Shrikanth S. Narayanan:
Enhancements to the Training Process of Classifier-Based Speech Translator via Topic Modeling. 2109-2112 - Vivek Kumar Rangarajan Sridhar, Luciano Barbosa, Srinivas Bangalore:
A Scalable Approach to Building a Parallel Corpus from the Web. 2113-2116 - Yoshiaki Itoh, Kohei Iwata, Masaaki Ishigame, Kazuyo Tanaka, Shi-wook Lee:
Spoken Term Detection Results Using Plural Subword Models by Estimating Detection Performance for Each Query. 2117-2120 - Luciano Barbosa, Diamantino Caseiro, Giuseppe Di Fabbrizio:
SpeechForms: From Web to Speech and Back. 2121-2124 - Kazuyuki Noritake, Hiroaki Nanjo, Takehiko Yoshimi:
Image Processing Filters for Line Detection-Based Spoken Term Detection. 2125-2128 - Joe Polifroni, François Mairesse:
Using Latent Topic Features for Named Entity Extraction in Search Queries. 2129-2132 - Ryo Masumura, Seongjun Hahm, Akinori Ito:
Language Model Expansion Using Webdata for Spoken Document Retrieval. 2133-2136 - Tomoyosi Akiba, Koichiro Honda:
Effects of Query Expansion for Spoken Document Passage Retrieval. 2137-2140 - Chun-an Chan, Lin-Shan Lee:
Unsupervised Hidden Markov Modeling of Spoken Queries for Spoken Term Detection without Speech Recognition. 2141-2144 - Roberto Gemello, Franco Mana, Pier Domenico Batzu:
Topic Identification from Audio Recordings Using Rich Recognition Results and Neural Network Based Classifiers. 2145-2148
Speech Synthesis - Selected Topics
- Alok Parlikar, Alan W. Black:
A Grammar Based Approach to Style Specific Phrase Prediction. 2149-2152 - Oliver Watts, Bowen Zhou:
Unsupervised Features from Text for Speech Synthesis in a Speech-to-Speech Translation System. 2153-2156 - Oliver Watts, Junichi Yamagishi, Simon King:
Unsupervised Continuous-Valued Word Features for Phrase-Break Prediction without a Part-of-Speech Tagger. 2157-2160 - Francisco Campillo, Francisco Méndez Pazó, Montserrat Arza, Laura Docío Fernández, Antonio Bonafonte, Eva Navas, Iñaki Sainz:
Albayzín 2010: A Spanish Text to Speech Evaluation. 2161-2164 - Binbin Shen, Zhiyong Wu, Yongxin Wang, Lianhong Cai:
Combining Active and Semi-Supervised Learning for Homograph Disambiguation in Mandarin Text-to-Speech Synthesis. 2165-2168 - Thomas Ewender, Beat Pfister:
Automatically Creating a Diphone Set from a Speech Database. 2169-2172 - Wesley Mattheyses, Lukas Latacz, Werner Verhelst:
Automatic Viseme Clustering for Audiovisual Speech Synthesis. 2173-2176 - Florian Hinterleitner, Sebastian Möller, Christoph Norrenbrock, Ulrich Heute:
Perceptual Quality Dimensions of Text-to-Speech Systems. 2177-2180 - Shinsuke Mori, Graham Neubig:
A Pointwise Approach to Pronunciation Estimation for a TTS Front-End. 2181-2184 - Mohamed Abou-Zleikha, Julie Carson-Berndsen:
Correlating Text with Prosody. 2185-2188 - Andrew Rosenberg, Raul Fernandez, Bhuvana Ramabhadran:
"What is... Dengue Fever?" - Modeling and Predicting Pronunciation Errors in a Text-to-Speech System. 2189-2192 - Christoph Norrenbrock, Ulrich Heute, Florian Hinterleitner, Sebastian Möller:
Aperiodicity Analysis for Quality Estimation of Text-to-Speech Signals. 2193-2196
Human Speech and Sound Perception I
- Eeva Klintfors, Ellen Marklund, Francisco Lacerda:
Parallels in Infants' Attention to Speech Articulation and to Physical Changes in Speech-Unrelated Objects. 2197-2200 - Daniel Duran, Jagoda Bruni, Grzegorz Dogil, Hinrich Schütze:
Speech Events are Recoverable from Unlabeled Articulatory Data: Using an Unsupervised Clustering Approach on Data Obtained from Electromagnetic Midsaggital Articulography (EMA). 2201-2204 - Sofia Strömbergsson:
Children's Recognition of their own Voice: Influence of Phonological Impairment. 2205-2208 - Takayuki Kagomiya, Seiji Nakagawa:
Evaluation of Bone-Conducted Ultrasonic Hearing-Aid Regarding Transmission of Speaker Discrimination Information. 2209-2212 - Christian Herff, Matthias Janke, Michael Wand, Tanja Schultz:
Impact of Different Feedback Mechanisms in EMG-Based Speech Recognition. 2213-2216 - Michael C. W. Yip:
Phonotactic Constraints and the Segmentation of Cantonese Speech. 2217-2220 - Katrin Schneider, Grzegorz Dogil, Bernd Möbius:
Reaction Time and Decision Difficulty in the Perception of Intonation. 2221-2224 - Ferenc Honbolygo, Valéria Csépe:
Processing of Stress Related Acoustic Cues as Indexed by ERPs. 2225-2228 - Marijt J. Witteman, Andrea Weber, James M. McQueen:
On the Relationship Between Perceived Accentedness, Acoustic Similarity, and Processing Difficulty in Foreign-Accented Speech. 2229-2232 - Shigeaki Amano, Yukari Hirata:
The Perception Boundary Between Single and Geminate Stops in 3- and 4-Mora Japanese Words. 2233-2236 - Yusuke Ijima, Mitsuaki Isogai, Hideyuki Mizuno:
Correlation Analysis of Acoustic Features with Perceptual Voice Quality Similarity for Similar Speaker Selection. 2237-2240
Multilingual and Multimodal Approaches to Spoken Language
- Vicent Alabau, Verónica Romero, Antonio L. Lagarda, Carlos D. Martínez-Hinarejos:
A Multimodal Approach to Dictation of Handwritten Historical Documents. 2245-2248 - Asterios Toutios, Utpala Musti, Slim Ouni, Vincent Colotte:
Weight Optimization for Bimodal Unit-Selection Talking Head Synthesis. 2249-2252 - Stefan Schaffer, Benjamin Jöckel, Ina Wechsung, Robert Schleicher, Sebastian Möller:
Modality Selection and Perceived Mental Effort in a Mobile Application. 2253-2256 - Jitendra Ajmera, Ashish Verma:
A Cross-Lingual Spoken Content Search System. 2257-2260 - Christian Girardi, Roberto Gretter, Daniele Falavigna, Fabio Brugnara, Diego Giuliani, Marcello Federico:
NeMo: A Platform for Multilingual News Monitoring. 2261-2264 - Sourish Chaudhuri, Mark Harvilla, Bhiksha Raj:
Unsupervised Learning of Acoustic Unit Descriptors for Audio Content Representation and Classification. 2265-2268 - Michael Glodek, Stefan Scherer, Friedhelm Schwenker:
Conditioned Hidden Markov Model Fusion for Multimodal Classification. 2269-2272 - Benjamin Lecouteux, Michel Vacher, François Portet:
Distant Speech Recognition in a Smart Home: Comparison of Several Multisource ASRs in Realistic Conditions. 2273-2276 - Jiansong Chen, Lei Zhu, Bailan Feng, Peng Ding, Bo Xu:
A Robust Approach to Mining Repeated Sequence in Audio Stream. 2277-2280
ASR - New Paradigms and Other Topics
- Dong Yu, Li Deng:
Accelerated Parallelizable Neural Network Learning Algorithm for Speech Recognition. 2281-2284 - Dong Yu, Li Deng:
Deep Convex Net: A Scalable Architecture for Speech Pattern Classification. 2285-2288 - Siwei Wang, Gina-Anne Levow:
Modeling Broad Context for Tone Recognition with Conditional Random Fields. 2289-2292 - Shang-wen Li, Yow-Bang Wang, Liang-Che Sun, Lin-Shan Lee:
Improved Tonal Language Speech Recognition by Integrating Spectro-Temporal Evidence and Pitch Information with Properly Chosen Tonal Acoustic Units. 2293-2296 - Evandro Gouvêa, Marelie H. Davel:
Kullback-Leibler Divergence-Based ASR Training Data Selection. 2297-2300 - Arild Brandrud Næss, Karen Livescu, Rohit Prabhavalkar:
Articulatory Feature Classification Using Nearest Neighbors. 2301-2304 - Sébastien Demange, Slim Ouni:
Continuous Episodic Memory Based Speech Recognition Using Articulatory Dynamics. 2305-2308 - T. Li, Philip C. Woodland, Frank Diehl, Mark J. F. Gales:
Graphone Model Interpolation and Arabic Pronunciation Generation. 2309-2312 - Irina Illina, Dominique Fohr, Denis Jouvet:
Grapheme-to-Phoneme Conversion Using Conditional Random Fields. 2313-2316 - Ching-feng Yeh, Chao-Yu Huang, Lin-Shan Lee:
Bilingual Acoustic Model Adaptation by Unit Merging on Different Levels and Cross-Level Integration. 2317-2320 - Marijn Schraagen, Gerrit Bloothooft:
A Qualitative Evaluation of Phoneme-to-Phoneme Technology. 2321-2324 - Daniele Falavigna, Roberto Gretter:
Cheap Bootstrap of Multi-Lingual Hidden Markov Models. 2325-2328 - Nima Mesgarani, Samuel Thomas, Hynek Hermansky:
Adaptive Stream Fusion in Multistream Recognition of Speech. 2329-2332 - Man-Hung Siu, Herbert Gish, Steve Lowe, Arthur Chan:
Unsupervised Audio Patterns Discovery Using HMM-Based Self-Organized Units. 2333-2336 - John Labiak, Karen Livescu:
Nearest Neighbors with Learned Distances for Phonetic Frame Classification. 2337-2340
Speaker Recognition - Modeling, Automatic Procedures, Analysis III
- Ahilan Kanagasundaram, Robbie Vogt, David Dean, Sridha Sridharan, Michael Mason:
i-vector Based Speaker Recognition on Short Utterances. 2341-2344 - Hanwu Sun, Bin Ma:
Study of Overlapped Speech Detection for NIST SRE Summed Channel Speaker Recognition. 2345-2348 - Zhanyu Ma, Arne Leijon:
Super-Dirichlet Mixture Models Using Differential Line Spectral Frequencies for Text-Independent Speaker Identification. 2349-2352 - Hon-Bill Yu, Man-Wai Mak:
Comparison of Voice Activity Detectors for Interview Speech in NIST Speaker Recognition Evaluation. 2353-2356 - Achintya Kumar Sarkar, Srinivasan Umesh:
Eigen-Voice Based Anchor Modeling System for Speaker Identification Using MLLR Super-Vector. 2357-2360 - Wen Wang, Andreas Kathol, Harry Bratt:
Automatic Detection of Speaker Attributes Based on Utterance Text. 2361-2364 - Sandro Cumani, Pier Domenico Batzu, Daniele Colibro, Claudio Vair, Pietro Laface, Vasileios Vasilakakis:
Comparison of Speaker Recognition Approaches for Real Applications. 2365-2368 - Tim Polzehl, Sebastian Möller, Florian Metze:
Modeling Speaker Personality Using Voice. 2369-2372 - Marc Ferras, Koichi Shinoda, Sadaoki Furui:
Structural Joint Factor Analysis for Speaker Recognition. 2373-2376 - Sangeeta Biswas, Marc Ferras, Koichi Shinoda, Sadaoki Furui:
Acoustic Forest for SMAP-Based Speaker Verification. 2377-2380 - Garimella S. V. S. Sivaram, Samuel Thomas, Hynek Hermansky:
Mixture of Auto-Associative Neural Networks for Speaker Verification. 2381-2384
Speech Audio Analysis and Classification
- Seppo Fagerlund, Unto K. Laine:
Stop Consonant Recognition by Temporal Fine Structure of Burst. 2385-2388 - Katrin Kirchhoff, Andrei Alexandrescu:
Phonetic Classification Using Controlled Random Walks. 2389-2392 - Luís Marujo, Márcio Viveiros, João Paulo Neto:
Keyphrase Cloud Generation of Broadcast News. 2393-2396 - Alfonso M. Canterla, Magne Hallstein Johnsen:
Optimized Feature Extraction and HMMs in Subword Detectors. 2397-2400 - Ziqiang Shi, Jiqing Han, Tieran Zheng:
Real-World Speech/Non-Speech Audio Classification Based on Sparse Representation Features and GPCs. 2401-2404 - Manas A. Pathak, Bhiksha Raj:
Privacy Preserving Speaker Verification Using Adapted GMMs. 2405-2408 - Éva Székely, João P. Cabral, Peter Cahill, Julie Carson-Berndsen:
Clustering Expressive Speech Styles in Audiobooks Using Glottal Source Parameters. 2409-2412 - Bogdan Ludusan, Antonio Origlia, Francesco Cutugno:
On the Use of the Rhythmogram for Automatic Syllabic Prominence Detection. 2413-2416 - Sethserey Sam, Xiong Xiao, Laurent Besacier, Eric Castelli, Haizhou Li, Chng Eng Siong:
Speech Modulation Features for Robust Nonnative Speech Accent Detection. 2417-2420 - Chi Zhang, John H. L. Hansen:
Frame-Level Vocal Effort Likelihood Space Modeling for Improved Whisper-Island Detection. 2421-2424 - Xing Fan, John H. L. Hansen:
Speaker Identification for Whispered Speech Using a Training Feature Transformation from Neutral to Whisper. 2425-2428 - Andrea DeMarco, Stephen J. Cox:
An Accurate and Robust Gender Identification Algorithm. 2429-2432 - Xiaohong Yang, Qingcai Chen, Shusen Zhou, Xiaolong Wang:
Deep Belief Networks for Automatic Music Genre Classification. 2433-2436 - Jonathan William Dennis, Tran Huy Dat, Haizhou Li:
Image Representation of the Subband Power Distribution for Robust Sound Classification. 2437-2440 - Bo Xiao, Viktor Rozgic, Athanasios Katsamanis, Brian R. Baucom, Panayiotis G. Georgiou, Shrikanth S. Narayanan:
Acoustic and Visual Cues of Turn-Taking Dynamics in Dyadic Interactions. 2441-2444
Human Speech and Sound Perception II
- Alexandra Jesse, Holger Mitterer:
Pointing Gestures do not Influence the Perception of Lexical Stress. 2445-2448 - Ian R. Cushing, Francis F. Li, Ken Worrall, Tim D. Jackson:
Relationships Between Phonetic Features and Speech Perception - A Statistical Investigation from a Large Anechoic British English Corpus. 2449-2452 - Guy J. Brown, Tim Jürgens, Ray Meddis, Matthew Robertson, Nicholas R. Clark:
The Representation of Speech in a Nonlinear Auditory Model: Time-Domain Analysis of Simulated Auditory-Nerve Firing Patterns. 2453-2456 - Luís Pinto Coelho, Daniela Braga, Miguel Sales Dias, Carmen García-Mateo:
An Automatic Voice Pleasantness Classification System Based on Prosodic and Acoustic Patterns of Voice Preference. 2457-2460 - René Carré, Pierre L. Divenyi, Willy Serniclaes, Emmanuel Ferragne, Egidio Marsico, Viet Son Nguyen:
Contributions of F1 and F2 (F2') to the Perception of Plosive Consonants. 2461-2464 - Jeesun Kim, Chris Davis:
Auditory Speech Processing is Affected by Visual Speech in the Periphery. 2465-2468 - Tim Paris, Jeesun Kim, Chris Davis:
Visual Speech Speeds Up Auditory Identification Responses. 2469-2472 - Ryoichi Takashima, Tohru Nagano, Ryuki Tachibana, Masafumi Nishimura:
Agglomerative Hierarchical Clustering of Emotions in Speech Based on Subjective Relative Similarity. 2473-2476 - Guangting Mai, Gang Peng:
Optimal Syllabic Rates and Processing Units in Perceiving Mandarin Spoken Sentences. 2477-2480 - Mirjam Wester, Hui Liang:
Cross-Lingual Speaker Discrimination Using Natural and Synthetic Speech. 2481-2484
Speech Audio Analysis
- Yongzhe Shi, Weiqiang Zhang, Jia Liu:
Robust Audio Fingerprinting Based on Local Spectral Luminance Maxima Scheme. 2485-2488 - Unto K. Laine:
Entropy-Rate Driven Inference of Stochastic Grammars. 2489-2492 - Sheng-Chieh Lee, K. Bharanitharan, Bo-Wei Chen, Jhing-Fa Wang, Chung-Hsien Wu, Min-Jian Liao:
An Efficient Pre-Processing Scheme to Improve the Sound Source Localization System in Noisy Environment. 2493-2496 - Guylaine Le Jan, Yannick Benezeth, Guillaume Gravier, Frédéric Bimbot:
A Study on Auditory Feature Spaces for Speech-Driven Lip Animation. 2497-2500 - Erfan Loweimi, Seyed Mohammad Ahadi, Hamid Sheikhzadeh:
Phase-Only Speech Reconstruction Using Very Short Frames. 2501-2504 - Trond Skogstad, Torbjørn Svendsen:
Frequency-Warped and Stabilized Time-Varying Cepstral Coefficients. 2505-2508 - Freddy William, Abhijeet Sangwan, John H. L. Hansen:
Using Human Perception for Automatic Accent Assessment. 2509-2512 - Carlos Molina, Sungbok Lee, Shrikanth S. Narayanan, Néstor Becerra Yoma:
A Study of the Effectiveness of Articulatory Strokes for Phonemic Recognition. 2513-2516 - Erika Okamoto, Toshio Irino, Ryuichi Nisimura, Hideki Kawahara:
Auditory Filterbank Improves Voice Morphing. 2517-2520 - Anna Katharina Fuchs, Christian Feldbauer, Michael Stark:
Monaural Sound Localization. 2521-2524
Speech Coding
- Masahiro Fukui, Shigeaki Sasaki, Yusuke Hiwasaki, Sachiko Kurihara, Yoichi Haneda:
Dual-Mode AVQ Coding Based on Spectral Masking and Sparseness Detection for ITU-T G.711.1/G.722 Super-Wideband Extensions. 2525-2528 - Azar Taufique, Kumaran Vijayasankar, Wooil Kim, John H. L. Hansen, Marco Tacca, Andrea Fumagalli:
Phone Impact Based Speech Transmission Technique for Reliable Speech Recognition in Poor Wireless Network Conditions. 2529-2532 - Jingting Zhou, Daniel Garcia-Romero, Carol Y. Espy-Wilson:
Automatic Speech Codec Identification with Applications to Tampering Detection of Speech Recordings. 2533-2536 - Chang-Heon Lee, Olivier Rosec, Yannis Stylianou:
A Hybrid Quasi-Harmonic/CELP Wideband Speech Coding Scheme for Unit Selection TTS Synthesis. 2537-2540 - Anssi Rämö, Henri Toukomaa:
Voice Quality Characterization of IETF Opus Codec. 2541-2544 - Christian Fischer Pedersen:
Leja Ordering LSFs for Accurate Estimation of Predictor Coefficients. 2545-2548 - Qipeng Gong, Peter Kabal:
Improved Quality for Conversational VoIP Using Path Diversity. 2549-2552 - Abdul Hannan Khan, Peter Kabal:
Tree Encoding for the ITU-T G.711.1 Speech Coder. 2553-2556 - Dong Wang, Ravichander Vipperla, Nicholas W. D. Evans:
Parallel and Hierarchical Decision Making for Sparse Coding in Speech Recognition. 2557-2560 - Chen-Yu Chiang, Jyh-Her Yang, Ming-Chieh Liu, Yih-Ru Wang, Yuan-Fu Liao, Sin-Horng Chen:
A New Model-Based Mandarin-Speech Coding System. 2561-2564
Robustness and Adaptation for ASR
- Petr Cerva, Karel Palecek, Jan Silovský, Jan Nouza:
Using Unsupervised Feature-Based Speaker Adaptation for Improved Transcription of Spoken Archives. 2565-2568 - Volker Fischer, Siegfried Kunzmann:
Online Speaker Adaptation with Pre-Computed FMLLR Transformations. 2569-2572 - Diego Giuliani, Fabio Brugnara:
Instantaneous Speaker Adaptation Through Selection and Combination of fMLLR Transformation Matrices. 2573-2576 - Hwa Jeon Song, Yunkeun Lee, Hyung Soon Kim:
Joint Bilinear Transformation Space Based Maximum a posteriori Linear Regression Adaptation Using Prior with Variance Function. 2577-2580 - Doddipatla Rama Sanand, Mikko Kurimo:
A Study on Combining VTLN and SAT to Improve the Performance of Automatic Speech Recognition. 2581-2584 - Yu Tsao, Paul R. Dixon, Chiori Hori, Hisashi Kawai:
Incorporating Regional Information to Enhance MAP-Based Stochastic Feature Compensation for Robust Speech Recognition. 2585-2588 - Shweta Ghai, Rohit Sinha:
A Study on the Effect of Pitch on LPCC and PLPC Features for Children's ASR in Comparison to MFCC. 2589-2592 - Denis Jouvet, Dominique Fohr, Irina Illina:
About Handling Boundary Uncertainty in a Speaking Rate Dependent Modeling Approach. 2593-2596 - Ji Wu, Zhiyang He, Ping Lv:
An Active Learning Approach to Task Adaptation. 2597-2600 - Vikas Joshi, Raghavendra Bilgi, Srinivasan Umesh, M. Carmen Benítez, Luz García:
Efficient Speaker and Noise Normalization for Robust Speech Recognition. 2601-2604 - Thomas Winkler:
How Realistic is Artificially Added Noise? 2605-2608
Voice Activity Detection
- Masashi Unoki, Xugang Lu, Rico Petrick, Shota Morita, Masato Akagi, Rüdiger Hoffmann:
Voice Activity Detection in MTF-Based Power Envelope Restoration. 2609-2612 - Miquel Espi, Shigeki Miyabe, Takuya Nishimoto, Nobutaka Ono, Shigeki Sagayama:
Using Spectral Fluctuation of Speech in Multi-Feature HMM-Based Voice Activity Detection. 2613-2616 - Kannu Mehta, Chau Khoa Pham, Chng Eng Siong:
Linear Dynamic Models for Voice Activity Detection. 2617-2620 - Jouni Pohjalainen, Tuomo Raitio, Paavo Alku:
Detection of Shouted Speech in the Presence of Ambient Noise. 2621-2624 - Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura:
Breath-Detection-Based Telephony Speech Phrasing. 2625-2628 - Gibak Kim:
Multi-Channel Voice Activity Detection Based on Conic Constraints. 2629-2632 - Theodore Petsatodis, Fotios Talantzis, Christos Boukis, Zheng-Hua Tan, Ramjee Prasad:
Multi-Sensor Voice Activity Detection Based on Multiple Observation Hypothesis Testing. 2633-2636 - Chao Gao, Guruprasad Saikumar, Saurabh Khanwalkar, Avi Herscovici, Anoop Kumar, Amit Srivastava, Premkumar Natarajan:
Online Speech Activity Detection in Broadcast News. 2637-2640 - Daniel Reich, Felix Putze, Dominic Heger, Joris IJsselmuiden, Rainer Stiefelhagen, Tanja Schultz:
A Real-Time Speech Command Detector for a Smart Control Room. 2641-2644 - Ekapol Chuangsuwanich, James R. Glass:
Robust Voice Activity Detector for Real World Applications Using Harmonicity and Modulation Frequency. 2645-2648 - Tomas Dekens, Werner Verhelst:
On Noise Robust Voice Activity Detection. 2649-2652 - Xugang Lu, Masashi Unoki, Ryosuke Isotani, Hisashi Kawai, Satoshi Nakamura:
Adaptive Regularization Framework for Robust Voice Activity Detection. 2653-2656
Human Speech Production I
- Tomoki Koriyama, Takashi Nose, Takao Kobayashi:
On the Use of Extended Context for HMM-Based Spontaneous Conversational Speech Synthesis. 2657-2660 - Asterios Toutios, Slim Ouni:
Predicting Tongue Positions from Acoustics and Facial Features. 2661-2664 - Louis ten Bosch, Annika Hämäläinen, Mirjam Ernestus:
Assessing Acoustic Reduction: Exploiting Local Structure in Speech. 2665-2668 - Bistra Andreeva, Magdalena Wolska:
The "Fortis-Lenis" Distinction in Bulgarian and German. 2669-2672 - Gang Chen, Jody Kreiman, Yen-Liang Shue, Abeer Alwan:
Acoustic Correlates of Glottal Gaps. 2673-2676 - Brian O. Bush, John-Paul Hosom, Alexander Kain, Akiko Amano-Kusumoto:
Using a Genetic Algorithm to Estimate Parameters of a Coarticulation Model. 2677-2680 - Peter Birkholz, Bernd J. Kröger, Christiane Neuschaefer-Rube:
Synthesis of Breathy, Normal, and Pressed Phonation Using a Two-Mass Model with a Triangular Glottis. 2681-2684 - Prasanta Kumar Ghosh, Shrikanth S. Narayanan:
Analysis of Inter-Articulator Correlation in Acoustic-to-Articulatory Inversion Using Generalized Smoothness Criterion. 2685-2688 - Tokihiko Kaburagi:
Frequency-Domain Representation of Source-Filter Coupling and its Effect in the Production of Voice. 2689-2692 - Heikki Rasilo, Unto K. Laine, Okko Johannes Räsänen, Toomas Altosaar:
Method for Speech Inversion with Large Scale Statistical Evaluation. 2693-2696 - Bettina Braun, Sabine Geiselmann:
Italian in the No-Man's Land Between Stress-Timing and Syllable-Timing? Speakers are More Stress-Timed than Listeners. 2697-2700 - Laura Folk, Florian Schiel:
The Lombard Effect in Spontaneous Dialog Speech. 2701-2704
Speaker Recognition - Analysis and Statistics III
- Timur Pekhovsky, Alexandra Lokhanova:
Variational Bayesian Model Selection for GMM-Speaker Verification Using Universal Background Model. 2705-2708 - Mitchell McLaren, David A. van Leeuwen:
To Weight or Not to Weight: Source-Normalised LDA for Speaker Recognition Using i-vectors. 2709-2712 - Chien-Lin Huang, Bin Ma:
Maximum Entropy Based Data Selection for Speaker Recognition. 2713-2716 - Wei Rao, Man-Wai Mak:
Addressing the Data-Imbalance Problem in Kernel-Based Speaker Verification via Utterance Partitioning and Speaker Comparison. 2717-2720 - Ryoichi Takashima, Tetsuya Takiguchi, Yasuo Ariki:
Single-Channel Head Orientation Estimation Based on Discrimination of Acoustic Transfer Function. 2721-2724 - Zhenchun Lei, Yingchun Yang:
Maximum Likelihood i-vector Space Using PCA for Speaker Verification. 2725-2728 - Ming Li, Xiang Zhang, Yonghong Yan, Shrikanth S. Narayanan:
Speaker Verification Using Sparse Representations on Total Variability i-vectors. 2729-2732 - Taufiq Hasan, John H. L. Hansen:
Robust Speaker Recognition in Non-Stationary Room Environments Based on Empirical Mode Decomposition. 2733-2736 - Jani Even, Panikos Heracleous, Carlos Toshinori Ishi, Norihiro Hagita:
Range Based Multi Microphone Array Fusion for Speaker Activity Detection in Small Meetings. 2737-2740 - Tetsuji Ogawa, Hideitsu Hino, Noboru Murata, Tetsunori Kobayashi:
Speaker Verification Robust to Talking Style Variation Using Multiple Kernel Learning Based on Conditional Entropy Minimization. 2741-2744 - Ville Hautamäki, Kong-Aik Lee, Tomi Kinnunen, Bin Ma, Haizhou Li:
Regularized Logistic Regression Fusion for Speaker Verification. 2745-2748 - Ayeh Jafari, Ramji Srinivasan, Danny Crookes, Ji Ming:
A Longest Matching Segment Approach with Bayesian Adaptation - Application to Noise-Robust Speaker Recognition. 2749-2752 - Howard Lei, Nikki Mirghafori:
Data Selection with Kurtosis and Nasality Features for Speaker Recognition. 2753-2756 - Inma Hernáez, Ibon Saratxaga, Jon Sánchez, Eva Navas, Iker Luengo:
Use of the Harmonic Phase in Speaker Recognition. 2757-2760
Voice Conversion and Speech Synthesis
- Nicholas Pilkington, Heiga Zen, Mark J. F. Gales:
Gaussian Process Experts for Voice Conversion. 2761-2764 - Christophe Veaux, Xavier Rodet:
Intonation Conversion from Neutral to Expressive Speech. 2765-2768 - Nobuhiko Hattori, Tomoki Toda, Hisashi Kawai, Hiroshi Saruwatari, Kiyohiro Shikano:
Speaker-Adaptive Speech Synthesis Based on Eigenvoice Conversion and Language-Dependent Prosodic Conversion in Speech-to-Speech Translation. 2769-2772 - Javier Pérez, Antonio Bonafonte:
Adding Glottal Source Information to Intra-Lingual Voice Conversion. 2773-2776 - Ming Lei, Junichi Yamagishi, Korin Richmond, Zhen-Hua Ling, Simon King, Li-Rong Dai:
Formant-Controlled HMM-Based Speech Synthesis. 2777-2780 - Tuomo Raitio, Antti Suni, Martti Vainio, Paavo Alku:
Analysis of HMM-Based Lombard Speech Synthesis. 2781-2784 - Nicolas Obin, Pierre Lanchantin, Anne Lacheret, Xavier Rodet:
Discrete/Continuous Modelling of Speaking Style in HMM-Based Speech Synthesis: Design and Evaluation. 2785-2788 - June Sig Sung, Doo Hwa Hong, Shin Jae Kang, Nam Soo Kim:
Factored MLLR Adaptation for Singing Voice Generation. 2789-2792 - Keikichi Hirose, Keiko Ochi, Ryusuke Mihara, Hiroya Hashimoto, Daisuke Saito, Nobuaki Minematsu:
Adaptation of Prosody in Speech Synthesis by Changing Command Values of the Generation Process Model of Fundamental Frequency. 2793-2796 - Miaomiao Wen, Miaomiao Wang, Keikichi Hirose, Nobuaki Minematsu:
Prosody Conversion for Emotional Mandarin Speech Synthesis Using the Tone Nucleus Model. 2797-2800 - Reima Karhila, Mirjam Wester:
Rapid Adaptation of Foreign-Accented HMM-Based Speech Synthesis. 2801-2804 - Bálint Tóth, Tibor Fegyó, Géza Németh:
The Effects of Phoneme Errors in Speaker Adaptation for HMM Speech Synthesis. 2805-2808
Human Speech Production II
- Jeffrey Berry, Sunjing Ji, Ian R. Fasel, Diana Archangeli:
Articulatory Reduction in Mandarin Chinese Words. 2809-2812 - Adam C. Lammert, Michael I. Proctor, Athanasios Katsamanis, Shrikanth S. Narayanan:
Morphological Variation in the Adult Vocal Tract: A Modeling Study of its Potential Acoustic Impact. 2813-2816 - Steven M. Lulich, Harish Arsikere, John R. Morton, Gary K. F. Leung, Abeer Alwan, Mitchell Sommers:
Analysis and Automatic Estimation of Children's Subglottal Resonances. 2817-2820 - Wolfgang Wokurek, Andreas Madsack:
Acceleration Sensor Based Estimates of Subglottal Resonances: Short vs. Long Vowels. 2821-2824 - Nicolas Audibert, Angélique Amelot:
Comparison of Nasalance Measurements from Accelerometers and Microphones and Preliminary Development of Novel Features. 2825-2828 - Michael Fitzpatrick, Jeesun Kim, Chris Davis:
The Effect of Seeing the Interlocutor on Speech Production in Different Noise Types. 2829-2832 - Vincent Aubanel, Martin Cooke, Julián Villegas, María Luisa García Lecumberri:
Conversing in the Presence of a Competing Conversation: Effects on Speech Production. 2833-2836 - Mattias Heldner, Jens Edlund, Anna Hjalmarsson, Kornel Laskowski:
Very Short Utterances and Timing in Turn-Taking. 2837-2840 - Athanasios Katsamanis, Erik Bresch, Vikram Ramanarayanan, Shrikanth S. Narayanan:
Validating rt-MRI Based Articulatory Representations via Articulatory Recognition. 2841-2844 - Yinghao Li, Jiangping Kong:
An Electropalatographic and Acoustic Study on Anticipatory Coarticulation in V1#C2V2 Sequences in Standard Chinese. 2845-2848 - Iris Hanique, Mirjam Ernestus:
Final /t/ Reduction in Dutch Past-Participles: The Role of Word Predictability and Morphological Decomposability. 2849-2852 - Zeynab Raeesy, Ladan Baghai-Ravary, John S. Coleman:
Parametrising Degree of Articulator Movement from Dynamic MRI Data. 2853-2856
Systems for LVCSR and Rich Transcription
- Xunying Liu, Mark J. F. Gales, Philip C. Woodland:
Improving LVCSR System Combination Using Neural Network Language Model Cross Adaptation. 2857-2860 - Jian Xue, Xiaodong Cui, Gregg Daggett, Etienne Marcheret, Bowen Zhou:
Towards High Performance LVCSR in Speech-to-Speech Translation System on Smart Phones. 2861-2864 - Yun-Hsuan Sung, Martin Jansche, Pedro J. Moreno:
Deploying Google Search by Voice in Cantonese. 2865-2868 - Fabio Brugnara:
A Multithreaded Implementation of Viterbi Decoding on Recursive Transition Networks. 2873-2876 - Stefan Kombrink, Tomás Mikolov, Martin Karafiát, Lukás Burget:
Recurrent Neural Network Based Language Modeling in Meeting Recognition. 2877-2880 - Michele Cossalter, Priya Sundararajan, Ian R. Lane:
Ad-Hoc Meeting Transcription on Clusters of Mobile Devices. 2881-2884 - Kacem Abida, Fakhri Karray:
ROVER Enhancement with Automatic Error Detection. 2885-2888 - Yuya Akita, Tatsuya Kawahara:
Automatic Comma Insertion of Lecture Transcripts Based on Multiple Annotations. 2889-2892
Language, Dialect Identification and Speaker Diarization
- Chang Huai You, Haizhou Li, Kong-Aik Lee:
Study on the Relevance Factor of Maximum a Posteriori with GMM for Language Recognition. 2893-2896 - Tania Habib, Harald Romsdorfer:
Improving Multiband Position-Pitch Algorithm for Localization and Tracking of Multiple Concurrent Speakers by Using a Frequency Selective Criterion. 2897-2900 - Amparo Varona, Mikel Peñagarikano, Luis Javier Rodríguez, Germán Bordel:
On the Use of Lattices of Time-Synchronous Cross-Decoder Phone Co-Occurrences in a SVM-Phonotactic Language Recognition System. 2901-2904 - Naohiro Tawara, Shinji Watanabe, Tetsuji Ogawa, Tetsunori Kobayashi:
Speaker Clustering Based on Utterance-Oriented Dirichlet Process Mixture Model. 2905-2908 - Jan Silovský, Jan Prazak, Petr Cerva, Jindrich Zdánský, Jan Nouza:
PLDA-Based Clustering for Speaker Diarization of Broadcast Streams. 2909-2912 - Mehdi Soufifar, Marcel Kockmann, Lukás Burget, Oldrich Plchot, Ondrej Glembek, Torbjørn Svendsen:
iVector Approach to Phonotactic Language Recognition. 2913-2916 - Christopher Alberti, Michiel Bacchiani:
Discriminative Features for Language Identification. 2917-2920 - Robert Allen Fox, Ewa Jacewicz:
Perceptual Sensitivity to Dialectal and Generational Variations in Vowels. 2921-2924 - Qian Yang, Qin Jin, Tanja Schultz:
Investigation of Cross-Show Speaker Diarization. 2925-2928 - Vesa Siivola, Bryan L. Pellom, Meagan Sills:
Language Identification for Text Chats. 2929-2932 - Kong-Aik Lee, Chang Huai You, Ville Hautamäki, Anthony Larcher, Haizhou Li:
Spoken Language Recognition in the Latent Topic Simplex. 2933-2936
Paralinguistic Information - Analysis and Tools
- Frederike Gottsmann, Corinna Harwardt:
Investigating Robustness of Spectral Moments on Normal- and High-Effort Speech. 2937-2940 - Corinna Harwardt:
Comparing the Impact of Raised Vocal Effort on Various Spectral Parameters. 2941-2944 - Keith W. Godin, John H. L. Hansen:
Vowel Context and Speaker Interactions Influencing Glottal Open Quotient and Formant Frequency Shifts in Physical Task Stress. 2945-2948 - Serguei V. S. Pakhomov, Michael E. Kotlyar:
Prosodic Correlates of Individual Physiological Response to Stress. 2949-2952 - Marcela Charfuelan, Marc Schröder:
The Vocal Effort of Dominance in Scenario Meetings. 2953-2956 - Sona Patel, Rahul Shrivastav:
A Preliminary Model of Emotional Prosody Using Multidimensional Scaling. 2957-2960 - Jangwon Kim, Sungbok Lee, Shrikanth S. Narayanan:
An Exploratory Study of the Relations Between Perceived Emotion Strength and Articulatory Kinematics. 2961-2964 - Carlos Toshinori Ishi, Hiroshi Ishiguro, Norihiro Hagita:
Improved Acoustic Characterization of Breathy and Whispery Voices. 2965-2968 - D. Govind, S. R. Mahadeva Prasanna, Bayya Yegnanarayana:
Neutral to Target Emotion Conversion Using Source and Suprasegmental Information. 2969-2972 - Khiet P. Truong, Ronald Poppe, Iwan de Kok, Dirk Heylen:
A Multimodal Analysis of Vocal and Visual Backchannels in Spontaneous Dialogs. 2973-2976 - Nikos Malandrakis, Alexandros Potamianos, Elias Iosif, Shrikanth S. Narayanan:
Kernel Models for Affective Lexicon Creation. 2977-2980
Special Sessions
Speech and Language Processing-Based Assistive Technologies and Health Applications I
- Douglas E. Sturim, Pedro A. Torres-Carrasquillo, Thomas F. Quatieri, Nicolas Malyska, Alan McCree:
Automatic Detection of Depression in Speech Using Gaussian Mixture Modeling with Factor Analysis. 2981-2984 - H. Timothy Bunnell, Jason Lilley, Sigfrid D. Soli, Ivan Pal:
Utterance Verification for Automating the Hearing in Noise Test (HINT). 2985-2988 - Emily Mower, Chi-Chun Lee, James Gibson, Theodora Chaspari, Marian E. Williams, Shrikanth S. Narayanan:
Analyzing the Nature of ECA Interactions in Children with Autism. 2989-2992
Speech and Language Processing-Based Assistive Technologies and Health Applications II
- Theologos Athanaselis, Stelios Bakamidis, Ioannis Dologlou, Evmorfia N. Argyriou, Antonios Symvonis:
Incorporating Speech Recognition Engine into an Intelligent Assistive Reading System for Dyslexic Students. 2993-2996 - Nicholas Cummins, Julien Epps, Michael Breakspear, Roland Goecke:
An Investigation of Depressed Speech Detection: Features and Normalization. 2997-3000 - Michelle Hewlett Sanchez, Dimitra Vergyri, Luciana Ferrer, Colleen Richey, Pablo Garcia, Bruce Knoth, William Jarrold:
Using Prosodic and Spectral Features in Detecting Depression in Elderly Males. 3001-3004 - Catherine Middag, Tobias Bocklet, Jean-Pierre Martens, Elmar Nöth:
Combining Phonological and Acoustic ASR-Free Features for Pathological Speech Intelligibility Assessment. 3005-3008 - Robin Hofe, Stephen R. Ell, Michael J. Fagan, James M. Gilbert, Phil D. Green, Roger K. Moore, Sergey I. Rybchenko:
Speech Synthesis Parameter Generation for the Assistive Silent Speech Interface MVOCA. 3009-3012 - Peter A. Heeman, Andy McMillin, J. Scott Yaruss:
Computer-Assisted Disfluency Counts for Stuttered Speech. 3013-3016 - Richard Hummel, Wai-Yip Chan, Tiago H. Falk:
Spectral Features for Automatic Blind Intelligibility Estimation of Spastic Dysarthric Speech. 3017-3020 - Emily Tucker Prud'hommeaux, Brian Roark:
Extraction of Narrative Recall Patterns for Neuropsychological Assessment. 3021-3024 - Aki Kunikoshi, Yu Qiao, Daisuke Saito, Nobuaki Minematsu, Keikichi Hirose:
Gesture Design of Hand-to-Speech Converter Derived from Speech-to-Hand Converter Based on Probabilistic Integration Model. 3025-3028 - Akira Sasou:
Powered Wheelchair Control Using Acoustic-Based Recognition of Head Gesture Accompanying Speech. 3029-3032 - José Luis Blanco Murillo, Rubén Fernández Pozo, Doroteo Torre Toledano, Javier Caminero, Eduardo López:
Analyzing Training Dependencies and Posterior Fusion in Discriminant Classification of Apnea Patients Based on Sustained and Connected Speech. 3033-3036
Crowdsourcing for Speech Processing I
- Gabriel Parent, Maxine Eskénazi:
Speaking to the Crowd: Looking at Past Achievements in Using Crowdsourcing for Speech and Predicting Future Challenges. 3037-3040
Crowdsourcing for Speech Processing II
- Chia-ying Lee, James R. Glass:
A Transcription Task for Crowdsourcing with Automatic Quality Control. 3041-3044 - Kartik Audhkhasi, Panayiotis G. Georgiou, Shrikanth S. Narayanan:
Reliability-Weighted Acoustic Model Adaptation Using Crowd Sourced Transcriptions. 3045-3048 - Martin Cooke, Jon Barker, María Luisa García Lecumberri, Krzysztof Wasilewski:
Crowdsourcing for Word Recognition in Noise. 3049-3052 - Sabine Buchholz, Javier Latorre:
Crowdsourcing Preference Tests, and How to Detect Cheating. 3053-3056 - Ian McGraw, James R. Glass, Stephanie Seneff:
Growing a Spoken Language Interface on Amazon Mechanical Turk. 3057-3060 - Filip Jurcícek, Simon Keizer, Milica Gasic, François Mairesse, Blaise Thomson, Kai Yu, Steve J. Young:
Real User Evaluation of Spoken Dialogue Systems Using Amazon Mechanical Turk. 3061-3064 - Hadrien Gelas, Solomon Teferra Abate, Laurent Besacier, François Pellegrino:
Quality Assessment of Crowdsourcing Transcriptions for African Languages. 3065-3068 - Keelan Evanini, Klaus Zechner:
Using Crowdsourcing to Provide Prosodic Annotations for Non-Native Speech. 3069-3072 - Masataka Goto, Jun Ogata:
PodCastle: Recent Advances of a Spoken Document Retrieval Service Improved by Anonymous User Contributions. 3073-3076
Spoken Language Processing of Human-Human Conversations I
- Fabio Valente, Alessandro Vinciarelli:
Language-Independent Socio-Emotional Role Recognition in the AMI Meetings Corpus. 3077-3080 - Rivka Levitan, Julia Hirschberg:
Measuring Acoustic-Prosodic Entrainment with Respect to Multiple Levels and Dimensions. 3081-3084 - Youngja Park:
Automatic Call Quality Monitoring Using Cost-Sensitive Classification. 3085-3088
Spoken Language Processing of Human-Human Conversations II
- Tomoharu Iwata, Shinji Watanabe:
Learning Influences from Word Use in Polylogue. 3089-3092 - Wen Wang, Kristin Precoda, Colleen Richey, Geoffrey Raymond:
Identifying Agreement/Disagreement in Conversational Speech: A Cross-Lingual Study. 3093-3096 - Daniel Neiberg, Joakim Gustafson:
A Dual Channel Coupled Decoder for Fillers and Feedback. 3097-3100 - Chi-Chun Lee, Athanasios Katsamanis, Matthew P. Black, Brian R. Baucom, Panayiotis G. Georgiou, Shrikanth S. Narayanan:
An Analysis of PCA-Based Vocal Entrainment Measures in Married Couples' Affective Spoken Interactions. 3101-3104
Speech and Audio Processing for Human-Robot Interaction I
- Lars Schillingmann, Petra Wagner, Christian Munier, Britta Wrede, Katharina J. Rohlfing:
Using Prominence Detection to Generate Acoustic Feedback in Tutoring Scenarios. 3105-3108 - Takuma Otsuka, Kazuhiro Nakadai, Tetsuya Ogata, Hiroshi G. Okuno:
Bayesian Extension of MUSIC for Sound Source Localization and Tracking. 3109-3112 - Martin Wöllmer, Felix Weninger, Stefan Steidl, Anton Batliner, Björn W. Schuller:
Speech-Based Non-Prototypical Affect Recognition for Child-Robot Interaction in Reverberated Environments. 3113-3116
Speech and Audio Processing for Human-Robot Interaction II
- Mounira Maazaoui, Yves Grenier, Karim Abed-Meraim:
Blind Source Separation for Robot Audition Using Fixed Beamforming with HRTFs. 3117-3120 - Marie Tahon, Agnès Delaborde, Laurence Devillers:
Real-Life Emotion Detection from Speech in Human-Robot Interaction: Experiments Across Diverse Corpora with Child and Adult Voices. 3121-3124 - Yazid Attabi, Pierre Dumouchel:
Weighted Ordered Classes - Nearest Neighbors: A New Framework for Automatic Emotion Recognition from Speech. 3125-3128 - David Doukhan, Albert Rilliard, Sophie Rosset, Martine Adda-Decker, Christophe d'Alessandro:
Prosodic Analysis of a Corpus of Tales. 3129-3132 - Carlos Toshinori Ishi, Hiroshi Ishiguro, Norihiro Hagita:
Analysis of Acoustic-Prosodic Features Related to Paralinguistic Information Carried by Interjections in Dialogue Speech. 3133-3136 - Martin Heckmann, Kazuhiro Nakadai, Hirofumi Nakajima:
Robust Intonation Pattern Classification in Human Robot Interaction. 3137-3140 - Takashi Sumiyoshi, Masahito Togami, Yasunari Obuchi:
ASR for Human-Symbiotic Robot "EMIEW2" with Mechanical Noise and Floor-Level Noise Reduction. 3141-3144
Speech Technology for Under-Resourced Languages I
- Ngoc Thang Vu, Franziska Kraus, Tanja Schultz:
Rapid Building of an ASR System for Under-Resourced Languages Based on Multilingual Unsupervised Training. 3145-3148 - Shyamal Kr. Das Mandal, Somnath Chandra Vijay Kumar, Swaran Lata, Asoke Kumar Datta:
Places and Manner of Articulation of Bangla Consonants: A EPG Based Study. 3149-3152 - Marelie H. Davel, Charl Johannes van Heerden, Neil Kleynhans, Etienne Barnard:
Efficient Harvesting of Internet Audio for Resource-Scarce ASR. 3153-3156
Speech Technology for Under-Resourced Languages II
- Milan Secujski, Darko Pekar, Niksa Jakovljevic:
Automatic Prosody Generation for Serbo-Croatian Speech Synthesis Based on Regression Trees. 3157-3160 - Alexey Karpov, Irina S. Kipyatkova, Andrey Ronzhin:
Very Large Vocabulary ASR for Spoken Russian with Syntactic and Morphemic Analysis. 3161-3164 - Timothy Kempton, Roger K. Moore, Thomas Hain:
Cross-Language Phone Recognition when the Target Language Phoneme Inventory is not Known. 3165-3168 - Sourish Chaudhuri, Bhiksha Raj, Tony Ezzat:
A Paradigm for Limited Vocabulary Speech Recognition Based on Redundant Spectro-Temporal Feature Sets. 3169-3172 - Nora Barroso, Karmele López de Ipiña, Aitzol Ezeiza, Carmen Hernández, Nerea Ezeiza, Odei Barroso, Unai Susperregi, Simeon Barroso:
GorUp: An Ontology-Driven Audio Information Retrieval System that Suits the Requirements of Under-Resourced Languages. 3173-3176 - Nic J. de Vries, Jaco Badenhorst, Marelie H. Davel, Etienne Barnard, Alta de Waal:
Woefzela - An Open-Source Platform for ASR Data Collection in the Developing World. 3177-3180 - Hansjörg Mixdorff, Lehlohonolo Mohasi, Malillo Machobane, Thomas Niesler:
A Study on the Perception of Tone and Intonation in Sesotho. 3181-3184 - Febe de Wet, Alta de Waal, Gerhard B. Van Huyssteen:
Developing a Broadband Automatic Speech Recognition System for Afrikaans. 3185-3188 - Herman Kamper, Thomas Niesler:
Multi-Accent Speech Recognition of Afrikaans, Black and White Varieties of South African English. 3189-3192 - Charturong Tantibundhit, Chutamanee Onsuwan, Tanawan Saimai, Nantaporn Saimai, Sumonmas Thatphithakkul, Patcharika Chootrakool, Krit Kosawat, Nattanun Thatphithakkul:
Perceptual Representation of Consonant Sounds in Thai. 3193-3196 - Mumtaz B. Mustafa, Raja Noor Ainon, Roziati Zainuddin, Zuraidah M. Don, Gerry Knowles:
A Cross-Lingual Approach to the Development of an HMM-Based Speech Synthesis System for Malay. 3197-3200
Special Events
Speaker State Challenge - Intoxication and Sleepiness I
- Björn W. Schuller, Stefan Steidl, Anton Batliner, Florian Schiel, Jarek Krajewski:
The INTERSPEECH 2011 Speaker State Challenge. 3201-3204 - Claude Montacié, Marie-José Caraty:
Combining Multiple Phoneme-Based Classifiers with Audio Feature-Based Classifier for the Detection of Alcohol Intoxication. 3205-3208 - Fadi Biadsy, William Yang Wang, Andrew Rosenberg, Julia Hirschberg:
Intoxication Detection Using Phonetic, Phonotactic and Prosodic Cues. 3209-3212 - Tobias Bocklet, Korbinian Riedhammer, Elmar Nöth:
Drink and Speak: On the Automatic Classification of Alcohol Intoxication by Acoustic, Prosodic and Text-Based Features. 3213-3216 - Daniel Bone, Matthew Black, Ming Li, Angeliki Metallinou, Sungbok Lee, Shrikanth S. Narayanan:
Intoxicated Speech Detection by Fusion of Speaker Normalized Hierarchical Features and GMM Supervectors. 3217-3220 - Stefan Ultes, Alexander Schmitt, Wolfgang Minker:
Attention, Sobriety Checkpoint! Can Humans Determine by Means of Voice, if Someone is Drunk... and Can Automatic Classifiers Compete? 3221-3224 - Florian Hönig, Anton Batliner, Elmar Nöth:
Does it Groove or does it Stumble - Automatic Classification of Alcoholic Intoxication using Prosodic Features. 3225-3228
Speech Processing Tools
- Christoph Draxler, Toomas Altosaar, Sadaoki Furui, Mark Y. Liberman, Peter Wittenburg:
Speech Processing Tools - An Introduction to Interoperability. 3229-3232 - Jean-Philippe Goldman:
EasyAlign: An Automatic Phonetic Alignment Tool Under Praat. 3233-3236 - Julián Villegas, Martin Cooke, Vincent Aubanel, Marco Aldo Piccolino Boniforti:
Mtrans: A Multi-Channel, Multi-Tier Speech Annotation Tool. 3237-3240 - Christophe Cerisara, Claire Gardent:
The JSafran Platform for Semi-Automatic Speech Processing. 3241-3244 - Johannes Wagner, Florian Lingenfelser, Elisabeth André:
The Social Signal Interpretation Framework (SSI) for Real Time Signal Processing and Recognition. 3245-3248 - Han Sloetjes, Peter Wittenburg, Aarthy Somasundaram:
ELAN - Aspects of Interoperability and Functionality. 3249-3252 - Marc Schröder, Marcela Charfuelan, Sathish Pammi, Ingmar Steiner:
Open Source Voice Creation Toolkit for the MARY TTS Platform. 3253-3256 - Stefan Steidl, Korbinian Riedhammer, Tobias Bocklet, Florian Hönig, Elmar Nöth:
Java Visual Speech Components for Rapid Application Development of GUI Based Speech Processing Applications. 3257-3260 - Michael Johnston, Giuseppe Di Fabbrizio, Simon Urbanek:
mTalk - A Multimodal Browser for Mobile Services. 3261-3264 - Stuart N. Wrigley, Thomas Hain:
Web-Based Automatic Speech Recognition Service - webASR. 3265-3268 - Markus Klehr, Andreas Ratzka, Thomas Roß:
A Web Based Speech Transcription Workplace. 3269-3272 - Philippe Martin:
WinPitch: A Multimodal Tool for Speech Analysis of Endangered Languages. 3273-3276 - Mark A. Huckvale:
Recording Caregiver Interactions for Machine Acquisition of Spoken Language Using the KLAIR Virtual Infant. 3277-3280
Speaker State Challenge - Intoxication and Sleepiness II
- Florian Schiel:
Perception of Alcoholic Intoxication in Speech. 3281-3284 - Tauhidur Rahman, Soroosh Mariooryad, Shalini Keshavamurthy, Gang Liu, John H. L. Hansen, Carlos Busso:
Detecting Sleepiness by Fusing Classifiers Trained with Novel Acoustic Features. 3285-3288 - Albino Nogueiras Rodríguez:
An HMM-Based Approach to the INTERSPEECH 2011 Speaker State Challenge. 3289-3292 - Elif Bozkurt, Engin Erzin, Çigdem Eroglu Erdem, A. Tanju Erdem:
RANSAC-Based Training Data Selection for Speaker State Recognition. 3293-3296 - Rok Gajsek, Simon Dobrisek, France Mihelic:
University of Ljubljana System for Interspeech 2011 Speaker State Challenge. 3297-3300 - Dong-Yan Huang, Shuzhi Sam Ge, Zhengchen Zhang:
Speaker State Classification Based on Fusion of Asymmetric SIMPLS and Support Vector Machines. 3301-3304
Show & Tell Sessions
Show & Tell Demonstration - Speech Systems and Applications
- Felix Burkhardt:
An Affective Spoken Storyteller. 3305-3306 - Lijuan Wang, Wei Han, Frank K. Soong, Qiang Huo:
Text Driven 3D Photo-Realistic Talking Head. 3307-3308 - Takayuki Arai:
Physical Models Producing Vowels with Pitch Variation. 3309-3310 - Margot Mieskes:
An Engine-Independent Text-to-Speech Workplace. 3311-3312 - Simone Carcone, Carlo Giovannella:
An Application to Test the Emotion Conveyed by Vocal and Musical Signals. 3313-3314 - Mariusz Ziólko, Jakub Galka, Bartosz Ziólko, Tomasz Jadczyk, Dawid Skurzok, Mariusz Masior:
Automatic Speech Recognition System Dedicated for Polish. 3315-3316 - Kong-Aik Lee, Anthony Larcher, Helen Thai, Bin Ma, Haizhou Li:
Joint Application of Speech and Speaker Recognition for Automation and Security in Smart Home. 3317-3318 - Staffan Larsson, Alexander Berman, Jessica Villing:
Adding a Speech Cursor to a Multimodal Dialogue System. 3319-3320 - S. Thomas Christie, Serguei V. S. Pakhomov:
Prosody Toolkit: Integrating HTK, Praat and WEKA. 3321-3322 - Fabiano Francesconi, Arindam Ghosh, Giuseppe Riccardi, Marco Ronchetti, Alex Vagin:
Collecting Life Logs for Experience-Based Corpora. 3323-3324
Show & Tell Demonstration - Mobility and Web-Services
- Stuart N. Wrigley, Thomas Hain:
Making an Automatic Speech Recognition Service Freely Available on the Web. 3325-3326 - Yeon-Jun Kim, Thomas Okken, Alistair Conkie, Giuseppe Di Fabbrizio:
AT&T VoiceBuilder: A Cloud-Based Text-to-Speech Voice Builder Tool. 3327-3328 - Roger C. F. Tucker, Dan Fry, Vincent Wan, Stuart N. Wrigley, Thomas Hain:
Extending Audio Notetaker to Browse WebASR Transcriptions. 3329-3330 - Samantha Ainsley, Linne Ha, Martin Jansche, Ara Kim, Masayuki Nanzawa:
A Web-Based Tool for Developing Multilingual Pronunciation Lexicons. 3331-3332 - Michael Johnston, Patrick Ehlen:
Speak4it and the Multimodal Semantic Interpretation System. 3333-3334 - Tanel Alumäe, Ahti Kitsik:
TSAB - Web Interface for Transcribed Speech Collections. 3335-3336 - Andrej Ljolje, Vincent Goffin, Diamantino Caseiro, Taniya Mishra, Mazin Gilbert:
Visual Voice Mail to Text on the iPhone/iPad. 3337-3338 - Christoph Draxler:
Percy - An HTML5 Framework for Media Rich Web Experiments on Mobile Devices. 3339-3340 - Mark A. Huckvale:
The KLAIR Toolkit for Recording Interactive Dialogues with a Virtual Infant. 3341-3342 - Francesco Nesta, Marco Matassoni, Hari Krishna Maganti:
Real-Time Prototype for Integration of Blind Source Extraction and Robust Automatic Speech Recognition. 3343-3344
