CASSI Speech Recognition


CASSI Speech Recognition:

Adding Speech Recognition to Embedded Devices

by

Praveen lvv
INTRODUCTION

What is CASSI?
• CASSI stands for Conversay Advanced Symbolic Speech Interface.
• It can be used in a variety of embedded systems.
• It runs on either single- or dual-processor hardware designs.
• Conversay developers and customers write application code that uses the CASSI API to integrate speech recognition and text-to-speech (TTS) capability into embedded products.
• CASSI provides continuous, speaker-independent speech recognition.
What is TTS?
Text-To-Speech (TTS):
• CASSI contains two modules for performing TTS: Rosetta and a TTS synthesis module.
• Rosetta, the text-to-phonetics unit, accepts arbitrary written text as input and outputs a string of phonemes for CASSI to synthesize.
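
To make the text-to-phonetics step concrete, here is a minimal Python sketch of turning written text into a string of phonemes. It is not Rosetta's actual method, which the slides do not describe. The lexicon entries, the letter-to-sound fallback table, and the function name text_to_phonemes are all assumptions made for illustration.

```python
import re

# Hypothetical pronunciation lexicon with ARPAbet-style phoneme symbols.
LEXICON = {
    "yes": ["Y", "EH", "S"],
    "no": ["N", "OW"],
    "play": ["P", "L", "EY"],
    "music": ["M", "Y", "UW", "Z", "IH", "K"],
}

# Crude one-letter-at-a-time fallback for words missing from the lexicon
# (an assumption; real systems use far richer letter-to-sound rules).
_LETTERS = ("a:AE b:B c:K d:D e:EH f:F g:G h:HH i:IH j:JH k:K l:L m:M "
            "n:N o:AA p:P q:K r:R s:S t:T u:AH v:V w:W x:K-S y:Y z:Z")
LETTER_TO_SOUND = {
    pair.split(":")[0]: pair.split(":")[1].split("-")
    for pair in _LETTERS.split()
}


def text_to_phonemes(text):
    """Convert written text into a flat list of phoneme symbols."""
    phonemes = []
    # Keep only alphabetic words; punctuation and digits are ignored here.
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in LEXICON:
            phonemes.extend(LEXICON[word])
        else:
            for letter in word:
                phonemes.extend(LETTER_TO_SOUND[letter])
    return phonemes


print(text_to_phonemes("Yes, play music!"))
```

Words found in the lexicon get their stored pronunciation; anything else falls back to the letter table, which is how a sketch like this can accept arbitrary text at the cost of poor pronunciations for unknown words.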

Process of Incorporating Speech Technology
1. Definition of capabilities
2. Analysis of hardware resources
3. User interface design
4. Development
HARDWARE ENVIRONMENT:
• CASSI is modular in nature.
• It is suitable for a variety of systems.
• It can be used with single-processor designs, where one processor handles all component execution.
• Feature extraction and TTS synthesis may be separated onto their own DSP (or other front-end signal processor).
Front-End Block:
The front-end block performs the front-end functions of recognition and TTS (feature extraction and TTS synthesis).
Processor Block (Back-End):
The processor block performs all other code functions, including topic management and search.
AUTOMATIC SPEECH RECOGNITION

What do speaker-dependent, speaker-adaptive, and speaker-independent mean?


What do continuous speech and isolated-word mean?

A continuous speech system operates on speech in which words are connected together, i.e. not separated by pauses.

Continuous speech is more difficult to handle because of a variety of effects, such as unclear word boundaries and coarticulation between neighboring words.

An isolated-word system operates on a single word at a time, requiring a pause between words.

This is the simplest form of recognition.
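
The pause requirement is what makes isolated-word recognition simple: word boundaries can be found from silence alone. The Python sketch below illustrates this under stated assumptions. The per-frame energy values, the threshold, the minimum pause length, and the function name split_on_pauses are all hypothetical and are not taken from CASSI.

```python
def split_on_pauses(energies, threshold=0.1, min_pause=3):
    """Return (start, end) frame index pairs, end exclusive, one per word.

    A word ends once at least `min_pause` consecutive frames fall below
    `threshold`; shorter quiet gaps are treated as part of the word.
    """
    segments = []
    start = None      # first frame of the word currently being tracked
    silent_run = 0    # consecutive quiet frames since the last loud frame
    for i, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_pause:
                # The word ended where the quiet run began.
                segments.append((start, i - silent_run + 1))
                start = None
                silent_run = 0
    if start is not None:
        # The signal ended mid-word or during a pause that was too short.
        segments.append((start, len(energies) - silent_run))
    return segments


# Two words separated by a four-frame pause; the single quiet frame
# inside the second word is too short to count as a pause.
energies = [0, 0, 0.5, 0.6, 0.4, 0, 0, 0, 0, 0.7, 0.8, 0.05, 0.9, 0, 0]
print(split_on_pauses(energies))  # [(2, 5), (9, 13)]
```

In continuous speech no such pauses exist, so this kind of segmentation fails and the recognizer has to find word boundaries as part of the search itself.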


The Process of Speech Recognition
Acoustic-Phonetic

Pattern Recognition

Artificial Intelligence

INTERFACE
The Experiment

‘Yes’ spoken by the first person

‘Yes’ spoken by the second person
The Basic Steps

• Divide the sound wave into evenly spaced blocks.
• Process each block for important characteristics.
• Attempt to associate each block with a phone, the most basic unit of speech, producing a string of phones.
• Find the word whose model is the most likely match.
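
The first three steps can be sketched in a few lines of Python. This is an illustration only: the two characteristics used here (RMS energy and zero-crossing rate), the phone templates, and all function names are assumptions, and real recognizers use richer features and statistical models. Word matching, the last step, is left out.

```python
import math


def split_into_blocks(samples, block_size):
    """Step 1: divide the waveform into evenly spaced, non-overlapping blocks.

    A trailing partial block is dropped.
    """
    return [samples[i:i + block_size]
            for i in range(0, len(samples) - block_size + 1, block_size)]


def block_features(block):
    """Step 2: compute two simple characteristics of one block."""
    rms = math.sqrt(sum(s * s for s in block) / len(block))
    crossings = sum(1 for a, b in zip(block, block[1:]) if (a < 0) != (b < 0))
    return (rms, crossings / (len(block) - 1))


# Hypothetical (rms, zero-crossing rate) templates for three sound classes.
PHONE_TEMPLATES = {"sil": (0.0, 0.0), "EH": (0.7, 0.1), "S": (0.2, 0.8)}


def nearest_phone(features):
    """Step 3: associate a block with the phone whose template is closest."""
    return min(PHONE_TEMPLATES,
               key=lambda phone: math.dist(features, PHONE_TEMPLATES[phone]))


def waveform_to_phones(samples, block_size):
    """Run steps 1 to 3 and return one phone label per block."""
    return [nearest_phone(block_features(block))
            for block in split_into_blocks(samples, block_size)]


# Synthetic test signal: silence, a slow loud wave, then a fast quiet one.
silence = [0.0] * 20
vowel = [math.sin(2 * math.pi * k / 20) for k in range(20)]
hiss = [0.2 if k % 2 == 0 else -0.2 for k in range(20)]
print(waveform_to_phones(silence + vowel + hiss, 20))  # ['sil', 'EH', 'S']
```

The loud, slowly varying block lands on the vowel template and the quiet, rapidly alternating block on the fricative template, which is the kind of distinction the per-block characteristics are meant to capture.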


Speech recognition systems use a basic three-stage architecture:

• Feature detection, in which the raw acoustic waveform is represented in a more useful space.

• Probabilistic classification of the feature vectors, in which the frames are scored as more or less likely versions of each speech sound.

• Search for the best word-sequence hypothesis, in which a word sequence is found that is consistent with the constraints of the lexicon and grammar.
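
The classification and search stages can be illustrated with a small Python sketch. It assumes feature detection has already reduced each frame to a single number, models each phone as a one-dimensional Gaussian, and searches a two-word lexicon with a left-to-right dynamic-programming alignment. The phone models, the lexicon, and every function name are hypothetical; the sketch recognizes one word rather than a word sequence and uses no grammar.

```python
import math

# Hypothetical 1-D Gaussian phone models: phone -> (mean, standard deviation).
PHONE_MODELS = {
    "Y": (1.0, 0.5), "EH": (3.0, 0.5), "S": (6.0, 0.5),
    "N": (2.0, 0.5), "OW": (4.5, 0.5),
}

# Hypothetical lexicon: word -> phone sequence.
LEXICON = {"yes": ["Y", "EH", "S"], "no": ["N", "OW"]}


def log_gauss(x, mean, std):
    """Log density of a 1-D Gaussian at x."""
    return (-0.5 * math.log(2 * math.pi * std * std)
            - (x - mean) ** 2 / (2 * std * std))


def score_frames(frames):
    """Classification stage: score every frame against every phone."""
    return [{phone: log_gauss(x, mean, std)
             for phone, (mean, std) in PHONE_MODELS.items()}
            for x in frames]


def align_score(frame_scores, phones):
    """Best total log score of a left-to-right alignment of frames to phones.

    Each phone must cover at least one frame, and phones occur in order.
    """
    neg_inf = float("-inf")
    best = [neg_inf] * len(phones)
    best[0] = frame_scores[0][phones[0]]
    for scores in frame_scores[1:]:
        new = [neg_inf] * len(phones)
        for j, phone in enumerate(phones):
            stay = best[j]                              # remain in this phone
            advance = best[j - 1] if j > 0 else neg_inf  # move to next phone
            previous = max(stay, advance)
            if previous > neg_inf:
                new[j] = previous + scores[phone]
        best = new
    return best[-1]


def recognize(frames):
    """Search stage: pick the lexicon word with the best alignment score."""
    frame_scores = score_frames(frames)
    return max(LEXICON, key=lambda word: align_score(frame_scores, LEXICON[word]))


print(recognize([1.1, 0.9, 3.2, 2.8, 6.1, 5.9]))  # yes
print(recognize([2.1, 1.9, 4.4, 4.6]))            # no
```

The function recognize expects at least one frame. A full system would search over sequences of words and apply grammar constraints, but the division of labor between scoring frames and searching for the best hypothesis is the same.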
ADVANTAGES OF SPEECH RECOGNITION

Recorded audio and video data become easy to search and index.

Speech recognition is also useful as a form of input:

• It allows people working in active environments, such as hospitals, to use computers.

• It allows people with disabilities to use computers.


CONCLUSION

• Visual cues can help computers decipher speech sounds that are obscured by environmental noise.

• Speech-to-speech translation project for spontaneous speech.

• Multi-engine Spanish-to-English machine translation system.

• Building synthetic voices.


Thank You
