Speech Intelligibility and PA System
Procedure:
How the experiment was conducted.
Step-by-step explanation of the ‘Machine Measure’ method of
speech intelligibility that was used.
Conclusion
An overall interpretation of the results obtained.
Assessment of the speakers & the environments.
General inferences
References
There is an important difference between music and speech. The brain is capable of
“filling in” a fair amount of missing information in music, because there is a high degree of
predictability (if you miss the bass line or some other part of a song in the first four
measures, you will pick it up when it repeats in the next four). Speech, by contrast, is rich
in constantly changing information.
This experiment was conducted to gain practical knowledge of speech intelligibility and
experience in setting up a public address sound system.
Various speaker systems were assessed and their intelligibility (STI-PA scores) noted and
compared with each other; the environments or rooms in which the tests were performed
were also assessed. The measured data therefore also allow a judgement of how well each
space supports intelligible speech.
Intelligibility could be defined as the degree to which speech can be understood. With
specific reference to speech communication system specification and testing, intelligibility
denotes the extent to which trained listeners can identify words or phrases that are spoken
by trained talkers and transmitted to the listeners via the communication system.
Public address systems in building complexes have to inform people about escape routes
in case of emergency. Such public buildings include airports, railway stations, shopping
centres and concert halls. If such announcements are misunderstood due to poor
system quality, tragic consequences may result. It is therefore essential to design, install
and verify sound reinforcement systems properly for intelligibility. In addition, a variety of
other applications, such as legal and medical applications, may require intelligibility
verification. Speech communication systems (public address systems) are therefore subject
to more stringent requirements than music systems.
“The sound power in speech is carried by the vowels, which average from 30 to 300
milliseconds in duration. Intelligibility is imparted chiefly by the consonants, which average
from 10 to 100 milliseconds in duration and may be as much as 27 dB lower in amplitude than
the vowels. The strength of the speech signal varies as a whole, and the strength of individual
frequency ranges varies with respect to the others as the formants change.”1
(Fig. 1 shows a vocal spectrum graph for male and female speakers with an “idealized” human
vocal spectrum superimposed.)
The listener’s challenge is to analyze speech sounds into meaningful units of language - a
complicated task. Gaps in the sound don’t necessarily correspond to word or syllable breaks.
1 Section 1: ‘Speech Intelligibility Papers’, written by Ralph Jones, edited by Rachel Murray, P.E.
http://www.meyersound.com/support/papers/speech/
Speech sounds also are not discrete events: rather, they merge and overlap in time, and the
articulation of a given phoneme differs in different contexts and with different speakers.
In fact, the precise ways in which the ear-brain mechanism decodes speech remain something
of a mystery. Such factors as loudness, duration and spectral content certainly affect
speech perception, but how they may interact is not fully understood.
Fig.1: Vocal spectrum graph for male and female speakers with an “idealized” human vocal spectrum
superimposed 2
The goal of a speech reinforcement system is to deliver the speaking voice to listeners with
sufficient clarity to be understood. Given the complexity of the speech signal, the task of
providing high-quality speech reinforcement in real-world, less-than-ideal conditions is
doubly complicated.
Masking
The most common obstacle that speech system designers face is the intrusion of unwanted
sounds that inevitably interfere with the speech signal. The effect is called “masking,” a
general term that covers a very wide variety of situations.
2 French, N. R. and Steinberg, J. C. “Factors Governing the Intelligibility of Speech Sounds,” JASA vol.
19, no. 1 (1947)
Masking noise can come from acoustical sources such as ventilation equipment, traffic,
crowds and, commonly, reverberation and echoes. It can also arise electronically from
thermal noise, tape hiss or distortion products. If the sound system has unusually large peaks
in its frequency response, the speech signal can even end up masking itself. The relationship
between the strength of the speech signal and that of the masking sound is called the
signal-to-noise (S/N) ratio, expressed in decibels. Ideally, the S/N ratio is greater than
0 dB, indicating that the speech is louder than the noise. Just how much louder the speech
needs to be in order to be understood varies with, among other things, the type and spectral
content of the masking noise.
The S/N ratio can therefore be defined as the ratio between the strength of the desired
speech signal and that of the intruding noise, expressed in decibels. At 0 dB the two are of
equal strength; negative values are associated with loss of intelligibility due to masking,
and positive values are usually associated with better intelligibility.
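The S/N definition above can be illustrated with a minimal Python sketch. The function names are hypothetical and chosen only for this example. The sample figures are the Tivoli average level and the background noise level reported for the anechoic chamber later in this report, and the sketch assumes the measured speaker level is dominated by the test signal rather than by the noise.

```python
import math


def snr_db_from_levels(speech_level_db: float, noise_level_db: float) -> float:
    """S/N ratio in dB when both quantities are already levels in dB."""
    return speech_level_db - noise_level_db


def snr_db_from_pressures(speech_rms: float, noise_rms: float) -> float:
    """S/N ratio in dB from RMS sound pressures (same unit for both)."""
    return 20.0 * math.log10(speech_rms / noise_rms)


# Tivoli average level and background noise in the anechoic chamber (dBA).
anechoic_snr = snr_db_from_levels(57.65, 21.6)

# Equal pressures give 0 dB: speech and noise are of equal strength.
equal_strength = snr_db_from_pressures(1.0, 1.0)
```

A positive result means the speech is louder than the noise; a negative result means the noise dominates and masking is likely.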
“The most uniformly effective mask is broadband noise. Although, narrow-band noise is less
effective at masking speech than broadband noise, the degree of masking varies with
frequency. High-frequency noise masks only the consonants, and its effectiveness as a mask
decreases as the noise gets louder. But low-frequency noise is a much more effective mask
when the noise is louder than the speech signal, and at high sound pressure levels it masks
both vowels and consonants”3.
The direction from which a masking sound arrives, relative to the direction of the speech
signal, can affect the degree of masking. If the noise comes from the same place as the
speech, the masking is greatest; it decreases as the separation between the noise source and
the speech source increases, because this makes it easier for the brain to discriminate
between them. The masking effect is lowest when the presentation is through headphones,
with the speech in one ear and the mask in the other. (Unfortunately, we cannot take
advantage of that feature in sound reinforcement.)
Statistical tests using trained talkers and listeners are by far the most accurate and reliable
methods for intelligibility testing. Unfortunately, they are complicated to set up, time-
consuming to conduct and require extensive statistical analysis to interpret. Hence,
consultants and acousticians have long sought an automated, machine-based test that could
quickly and easily yield meaningful intelligibility scores for speech systems.
Nowadays, highly developed algorithms such as the SII (Speech Intelligibility Index) and
various forms of the STI (Speech Transmission Index) allow speech intelligibility to be
measured. These techniques take into account many parameters that are important for
intelligibility, such as:
• Speech level
• Background noise level
• Reflections
• Reverberation
• Psychoacoustic effects (masking effects)
3 Section 2: ‘Speech Intelligibility Papers’, written by Ralph Jones, edited by Rachel Murray, P.E.
http://www.meyersound.com/support/papers/speech/
In STI testing, speech is modelled by a special test signal with speech-like characteristics.
Following the concept that speech can be described as a fundamental waveform that is
modulated by low-frequency signals, STI employs a complex amplitude modulation scheme to
generate its test signal. The basic idea of STI measurement is to emit a synthesized
test signal instead of a human speaker’s voice.
The speech intelligibility measurement acquires and evaluates this signal as perceived by the
listener’s ear. At the receiving end of the communication system, the depth of modulation of
the received signal is compared with that of the test signal in each of a number of frequency
bands. Reductions in the modulation depth are associated with loss of intelligibility. The
Speech Transmission Index (STI) is a machine measure of intelligibility whose value varies
from 0 (completely unintelligible) to 1 (perfect intelligibility).
STI is derived from the Modulation Transfer Function (MTF) of a room. The MTF is calculated
from a noise signal in the seven octave bands from 125 Hz to 8 kHz, each modulated at 14
modulation frequencies between 0.63 Hz and 12.5 Hz (14 frequencies × 7 octaves = 98 values).
The MTF concept was proposed by Houtgast and Steeneken to account for the relationship
between the transfer function in an enclosure in terms of input and output signal envelopes
and the characteristics of the enclosure such as reverberation. This concept was introduced
as a measure in room acoustics for assessing the effect of the enclosure on speech
intelligibility.
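To make the link between the MTF and room characteristics concrete, here is a hedged Python sketch. It uses the commonly quoted Houtgast and Steeneken expression for a diffuse field with exponential decay and stationary noise, m(F) = [1 + (2πF·T/13.8)²]^(-1/2) · [1 + 10^(-(S/N)/10)]^(-1). That expression is not given in this report and is assumed here for illustration. The list of 14 modulation frequencies uses the nominal one-third-octave values between 0.63 Hz and 12.5 Hz, and all function names are hypothetical.

```python
import math

# Seven octave bands and 14 nominal modulation frequencies (assumed values).
OCTAVE_BANDS_HZ = [125, 250, 500, 1000, 2000, 4000, 8000]
MODULATION_FREQS_HZ = [0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5,
                       3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5]


def mtf(mod_freq_hz: float, rt60_s: float, snr_db: float) -> float:
    """Modulation reduction factor m(F) for one band and one modulation frequency."""
    # Loss of modulation depth caused by reverberation (T = reverberation time).
    reverb_term = 1.0 / math.sqrt(1.0 + (2.0 * math.pi * mod_freq_hz * rt60_s / 13.8) ** 2)
    # Loss of modulation depth caused by stationary background noise.
    noise_term = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
    return reverb_term * noise_term


def mtf_matrix(rt60_by_band, snr_by_band):
    """Return a 7 x 14 matrix of m(F): one row per octave band."""
    return [[mtf(f, rt60, snr) for f in MODULATION_FREQS_HZ]
            for rt60, snr in zip(rt60_by_band, snr_by_band)]


# Illustrative input only: 2 s reverberation time and 30 dB S/N in every band.
example = mtf_matrix([2.0] * 7, [30.0] * 7)
value_count = sum(len(row) for row in example)
```

In this model a longer reverberation time lowers m(F) most strongly at the higher modulation frequencies, while noise lowers it equally at all modulation frequencies.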
To calculate STI:

$$(S/N) = 10 \lg \frac{m(F)}{1 - m(F)}$$

where a weighting factor for each of the 7 octave bands is applied, based on a
standard speech spectrum and calculated from subjective testing (0.13, 0.14, 0.11, 0.12,
0.19, 0.17, 0.14 for 125 Hz to 8 kHz):

$$\overline{(S/N)} = \sum_{oct} W_{oct} \, (S/N)_{oct}$$

Finally, the weighted mean signal-to-noise ratio is converted to STI, giving a value
between 0 and 1, where 1 indicates perfect intelligibility:

$$STI = \frac{\overline{(S/N)} + 15}{30}$$
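The calculation above can be sketched in Python as follows. This is a minimal illustration and not the instrument's implementation. Two steps are assumptions taken from the usual description of the STI method rather than from this report: each apparent S/N value is limited to the range -15 dB to +15 dB, and the values for the modulation frequencies within one octave band are averaged before weighting. The function names are hypothetical.

```python
import math

# Octave-band weights for 125 Hz to 8 kHz, as listed in this report.
OCTAVE_WEIGHTS = [0.13, 0.14, 0.11, 0.12, 0.19, 0.17, 0.14]


def apparent_snr_db(m: float) -> float:
    """Convert one modulation transfer value m(F) to an apparent S/N in dB."""
    m = min(max(m, 1e-6), 1.0 - 1e-6)          # keep the logarithm finite
    snr = 10.0 * math.log10(m / (1.0 - m))
    return max(-15.0, min(15.0, snr))          # assumed limit of +/-15 dB


def sti_from_mtf(mtf_by_band) -> float:
    """STI from a matrix of m(F) values: one row per octave band."""
    if len(mtf_by_band) != len(OCTAVE_WEIGHTS):
        raise ValueError("expected one row of m(F) values per octave band")
    weighted_snr = 0.0
    for weight, band_values in zip(OCTAVE_WEIGHTS, mtf_by_band):
        # Assumed step: average over the modulation frequencies of the band.
        band_snr = sum(apparent_snr_db(m) for m in band_values) / len(band_values)
        weighted_snr += weight * band_snr
    return (weighted_snr + 15.0) / 30.0


# m(F) = 0.5 everywhere corresponds to an apparent S/N of 0 dB in every band.
sti_half = sti_from_mtf([[0.5] * 14] * 7)
```

With the limit in place, the weighted mean S/N stays between -15 dB and +15 dB, which is what keeps the final STI between 0 and 1.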
STI      Rating
>0.80    Excellent
>0.65    V. Good
>0.50    Good
>0.40    Fair
>0.30    Poor
≤0.30    Bad
Table 0.0 – STI quality rating table
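The rating table can be expressed as a small lookup, sketched below in Python. The function name is hypothetical, and values at or below 0.30 are assumed to fall in the lowest category. The example values are the STI averages reported later in this document.

```python
def sti_rating(sti: float) -> str:
    """Map an STI value (0 to 1) to the quality rating of Table 0.0."""
    if not 0.0 <= sti <= 1.0:
        raise ValueError("STI must lie between 0 and 1")
    thresholds = [
        (0.80, "Excellent"),
        (0.65, "V. Good"),
        (0.50, "Good"),
        (0.40, "Fair"),
        (0.30, "Poor"),
    ]
    for lower_limit, label in thresholds:
        if sti > lower_limit:
            return label
    return "Bad"


# STI averages reported in this document.
reported = {
    "Lab room, Nor 275": 0.58,
    "Lab room, Tivoli": 0.70,
    "Reverberation chamber, Nor 275": 0.45,
    "Reverberation chamber, Tivoli": 0.533,
    "Anechoic chamber, Tivoli": 0.925,
    "Anechoic chamber, Yamaha HS 50M": 0.906,
}
ratings = {name: sti_rating(value) for name, value in reported.items()}
```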
“A rising awareness for security issues, new technological means and the shortcomings of
RASTI triggered the speaker manufacturer Bose and the research institute TNO to develop a
new method for speech intelligibility measurements of PA installations. The result of these
efforts is STI-PA, which allows quick and accurate tests with portable instruments. Like
RASTI, STI-PA applies a simplified procedure to calculate the MTF. But STI-PA determines one
MTF by analyzing all seven frequency bands, whereby each band is modulated with two
frequencies.
Supposing that no severe impulsive background noise is present and that no massive non-
linear distortions occur, STI-PA provides results as accurate as STI. If however impulsive
background noise is present during the normal system operation hours, it is usually possible to
mitigate the effects by also acquiring a measurement at a more favourable time e.g. under
slightly different conditions in the area, or during the night time and to calculate an unbiased
overall measurement by using the results of both test cycles.”4
In Lab room:
1. Nor 275 speaker (hemi-dodecahedron)
2. Tivoli speakers
In Reverberation Chamber:
1. Balloon (for the RT measurement of the chamber by the impulse noise method)
2. Nor 275 speaker (hemi-dodecahedron)
3. Tivoli speakers
In Anechoic chamber:
1. Yamaha powered monitor speaker, model HS 50M
2. Tivoli speakers
Procedure:
The apparatus was then set up in the reverberation chamber. This time the balloon-burst
method was carried out before the STI measurements, to obtain the RT of the chamber.
The Nor 275 hemi-dodec was placed in one corner of the reverberation room and the Nor 140
sound level meter (SLM) at the opposite far end of the room (at 10 m). The test signal was
generated and the readings were taken down. The close and medium measurements were then
carried out at 1 m and 3 m from the source respectively. This was repeated with the Tivoli
speakers.
The next stage involved opening the doors to the 10 m² absorptive wall surface
of the reverberation chamber. The experiment was repeated at close, medium and far
distances from the signal source, as before, with the Nor 275 speaker and the Tivoli speakers.
The final stage of the experiment involved measuring the STI with the Tivoli speakers facing
the absorptive wall surface. The measurements were taken at 1 m and 3 m in the direction
the speakers were facing. This was done to assess how directing the sound towards the
absorptive surface affects the STI.
In Anechoic chamber:
Here only the Tivoli speakers and a new addition, the Yamaha HS 50M powered monitor
speaker, were used. Both were measured at a distance of 3 m from the signal source. The
measurements were repeated several times and the STI results averaged to improve the
repeatability of the tests.
The measurement time was set to 12 seconds.
The background noise in the chamber was also measured and noted.
[Graph: Nor 275 hemi-dodec and Tivoli speakers at close range (1 m), medium range (3 m) and long range (9 m), with average; y-axis scale 0 to 0.6; x-axis: distance in metres]
[Graph: Nor 275 hemi-dodec and Tivoli speakers at close range (1 m), medium range (3 m) and long range (10 m), with average; y-axis scale 0 to 0.5; x-axis: distance in metres]
Graph 3 (refer Table C in annexure)
Speaker in reverberation room with background noise of 35 dBA and absorptive surface of 10 m²
[Graph: y-axis: STI range, 0 to 1; series: Tivoli faced towards absorptive surface; x-axis: distance in metres, at close range (1 m), medium range (3 m) and long range (10 m), with average]
Speaker in anechoic chamber with background noise of 21.6 dBA, measured at 3 m from the signal source

Speaker          Measurement   STI-PA   Average STI-PA   SPL (dBA)   Avg. SPL (dBA)
Tivoli           1             0.92     0.925            58.3        57.65
                 2             0.93                      57
Yamaha HS 50M    1             0.86     0.906            61.2        68.8
                 2             0.92                      61.6
                 3             0.92                      66
                 4             0.94                      75.2
                 5             0.89                      80
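The averages in the anechoic chamber table can be reproduced from the individual readings with a short Python sketch. The data structure and function name are hypothetical. The SPL average is taken as a plain arithmetic mean of the dB values, which matches the figures reported in the table; an energetic (power) average would give a different number.

```python
from statistics import mean

# (STI-PA, SPL in dBA) pairs from the anechoic chamber table.
anechoic_readings = {
    "Tivoli": [(0.92, 58.3), (0.93, 57.0)],
    "Yamaha HS 50M": [(0.86, 61.2), (0.92, 61.6), (0.92, 66.0),
                      (0.94, 75.2), (0.89, 80.0)],
}


def summarize(readings):
    """Return (average STI-PA, average SPL) for a list of repeated readings."""
    avg_sti = mean(sti for sti, _ in readings)
    avg_spl = mean(spl for _, spl in readings)   # arithmetic mean of dB values
    return round(avg_sti, 3), round(avg_spl, 2)


averages = {speaker: summarize(readings)
            for speaker, readings in anechoic_readings.items()}
```

The computed values agree with the table: 0.925 and 57.65 dBA for the Tivoli, 0.906 and 68.8 dBA for the Yamaha HS 50M.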
In the Lab room, with a background noise of 36 dBA, the Speech Transmission Index (STI)
declines gradually as the distance between the signal source and the sound level meter is
increased. This indicates that speech clarity declines with increasing distance.
Moreover, there is a difference between the STI of the Nor 275 (hemi-dodecahedron) speaker
and that of the Tivoli speakers. The Nor 275 shows a lower STI average of 0.58 (rated Good
in Table 0.0) compared with the Tivoli speakers’ STI average of 0.7 (rated V. Good). The
sound level output of both speakers was almost the same at all distances/positions over the
12 s measurement time.
Hence the Tivoli speakers performed better than the Nor 275. This is likely because the
Tivoli speakers were unidirectional in their output, whereas the Nor 275 emitted sound in
all directions (multidirectional) and did not specifically direct sound towards the
listener/sound level meter. Refer Table A.
In the reverberation chamber, with a background noise of 35 dBA, the STI declines gradually
as the distance between the signal source and the SLM is increased from the medium position
(3 m) to the farthest position (10 m), whereas there is a steep decline from the closest
position (1 m) to the medium position (3 m). This is most prominent with the
hemi-dodecahedron (Nor 275). It indicates that the multidirectional Nor 275 speaker acted
more like a unidirectional source when it was closest to the Nor 140 SLM. When the SLM was
moved from the medium position to the farthest position, the sound level increased by 1.4 dB
(from 71.9 dB to 73.3 dB), yet the STI continued to fall, indicating poorer speech clarity
with increasing distance.
Refer Table B.
Moreover, there is a difference between the STI of the Nor 275 (hemi-dodecahedron) speaker
and that of the Tivoli speakers. The Nor 275 shows a lower STI average of 0.45 (rated Fair
in Table 0.0) compared with the Tivoli speakers’ STI average of 0.533 (rated Good). The
sound level output of the two speakers differed at all distances/positions over the 12 s
measurement time. The Nor 275 emitted a higher sound level, but this only created more
masking within the space.
Hence the Tivoli speakers again performed better than the Nor 275, likely because the
Nor 275 emitted sound in all directions, creating more reverberation and disturbance.
The RT of the reverberation chamber was measured by the impulse noise method and is noted
in Table D. The reverberation was highest mainly in the lower frequency range. This suggests
that, within this space, the STI would be affected more by masking noise at lower
frequencies (63 Hz to 1 kHz) than at higher frequencies. (For example, machinery, equipment
or similar noise sources within a highly reverberant space can be very bad for speech
communication.) Refer Table D.
Moreover, compared with the Lab room, the STI values of both signal sources dropped
considerably when measured in the reverberation chamber. The Lab room is thus a much better
environment for speech communication. However, the addition of 10 m² of absorptive surface
on one entire wall of the reverberation chamber did improve the listening environment of
the chamber considerably.
Refer Table C.
In addition, when the Tivoli speakers were directed straight towards the absorptive
surface, their STI average improved further, from 0.63 to 0.7. This can be seen in Table C.
In the anechoic chamber, a new speaker (Yamaha) was also brought in, which had a better
specification than the Tivoli speakers (and was more expensive too).
However, the STI results showed a different picture: the Tivoli gave an STI of 0.925
(Excellent quality of intelligible speech), whereas the Yamaha speaker gave a slightly lower
value of 0.906. Repeated tests were carried out to confirm whether this result was correct
or whether it fluctuated each time. They confirmed that the Tivoli speakers had the higher
average STI.
The above observations give a fair picture of how directional sound sources behave within
a space and of the effect of the environment on them. With this process we could assess the
quality of a room/environment in terms of the clarity of speech communication. Here the
anechoic chamber proved to be the best and clearest environment for speech; next came the
Lab room, which proved superior to the reverberation chamber because it contained less
reverberant/reflected sound.
STI tests and checks should be carried out at fixed intervals, especially in
public gathering spaces such as stations, undergrounds and auditoriums, to maintain the
quality of speech from their public address systems over time. This would
help ensure safety and reduce confusion when announcements are made through these
PA systems.
________________________________________________________________________________
References:
Ref.: ‘Introduction to Speech Intelligibility’,
source: http://www.ntiaudio.com/Portals/0/Products/Minstruments/AL1/AppNotes/NTI_App_Note_Introducing_STI-PA.pdf
Ref.: Houtgast, T. and Steeneken, H. J. M., “The modulation transfer function in room
acoustics as a predictor of speech intelligibility,” Acustica 28 (1973), pp. 66-73.
Ref.: Bradley, J. S., “Predictors of Speech Intelligibility in Rooms,” JASA vol. 80, no. 3 (1986)
Ref.: ‘Speech Intelligibility Papers’, written by Ralph Jones, edited by Rachel Murray, P.E.,
source: http://www.meyersound.com/support/papers/speech/