Hands free mouse: comparative study on
mouse clicks controlled by humming.
Conference Paper · January 2010
Hands Free Mouse: Comparative Study
on Mouse Clicks Controlled by Humming
Ondřej Poláček
Faculty of Electrical Engineering,
Czech Technical University in Prague
Karlovo nám. 13
12135 Praha 2
Czech Republic
polacond@fel.cvut.cz

Zdeněk Míkovec
Faculty of Electrical Engineering,
Czech Technical University in Prague
Karlovo nám. 13
12135 Praha 2
Czech Republic
xmikovec@fel.cvut.cz

Abstract
In this paper we present a novel method of simulating mouse clicks
while the cursor is navigated by head movements tracked by a webcam.
Our method is based on simple hummed voice commands. It is fast,
language independent, and provides full control of common mouse
buttons. We compared our method with three other methods in an
experiment that demonstrated its efficiency in terms of task duration.

Keywords
Non-Verbal Vocal Interface, Voice Interface, Head Tracking,
Accessibility, Comparative Study

ACM Classification Keywords
H.5.2 Information interfaces and presentation: User Interfaces –
Input devices and strategies; Voice I/O.

General Terms
Design, Experimentation, Measurement, Performance
Introduction
Complex graphical user interfaces (GUIs) are present not only in
desktop computers but also in other areas, such as Rich Internet
Applications (RIAs) on the Internet. The efficiency of interaction
with such complex GUIs depends strongly on how efficiently the mouse
can be used. For users with limited motor abilities (especially users
with upper-limb impairments), using a mouse can be a serious problem.
These users need an alternative means of mouse control. Several head
tracking solutions, for instance, solve the problem of cursor control
satisfactorily, but not the problem of simulating mouse clicks (see
State of the Art). The set of simulated mouse clicks is either
incomplete or too complicated to use, so the user cannot simulate the
mouse to its full extent.
State of the Art
Several methods have been used to simulate mouse clicks. The Camera
Mouse system [1] used a dwell time method: a mouse click was generated
when the user kept the mouse cursor within a 30-pixel radius for
0.5 s. The dwell time method can simulate a left click only and does
not cope with other mouse events such as right click, double click,
dragging and scrolling. Moreover, the method raises the Midas touch
problem [4], as the user cannot stop the cursor without issuing a left
click. This problem can be solved either by adding places where the
user can stop the cursor [1] or by displaying a pop-up menu after the
dwell time expires [11].
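The dwell time rule described above can be illustrated with a minimal sketch. It assumes cursor samples arrive as (time, x, y) tuples and that at most one click is issued per dwell; the class name `DwellDetector` and these details are illustrative assumptions, not the implementation used in [1].

```python
import math
from dataclasses import dataclass, field
from typing import Optional, Tuple

DWELL_RADIUS_PX = 30.0  # radius reported for the Camera Mouse system [1]
DWELL_TIME_S = 0.5      # dwell time reported for the Camera Mouse system [1]


@dataclass
class DwellDetector:
    """Hypothetical detector: left click when the cursor rests long enough."""
    radius: float = DWELL_RADIUS_PX
    dwell_time: float = DWELL_TIME_S
    _anchor: Optional[Tuple[float, float]] = field(default=None, init=False)
    _anchor_time: float = field(default=0.0, init=False)
    _fired: bool = field(default=False, init=False)

    def update(self, t: float, x: float, y: float) -> bool:
        """Feed one cursor sample; return True if a click fires at time t."""
        if self._anchor is None or math.dist(self._anchor, (x, y)) > self.radius:
            # The cursor left the dwell area: restart timing at the new spot.
            self._anchor = (x, y)
            self._anchor_time = t
            self._fired = False
            return False
        if not self._fired and t - self._anchor_time >= self.dwell_time:
            self._fired = True  # assumption: only one click per dwell
            return True
        return False


if __name__ == "__main__":
    detector = DwellDetector()
    samples = [(0.0, 100, 100), (0.25, 105, 102), (0.5, 110, 100), (1.0, 200, 200)]
    for t, x, y in samples:
        if detector.update(t, x, y):
            print(f"left click at t={t}")
```

The sketch also makes the Midas touch problem visible: any pause of the cursor longer than the dwell time produces a click, whether or not the user intended one.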
Tracking various facial features can also be used to simulate mouse
clicks. For instance, a system published by Tu et al. [10] responded
to the state of the user’s mouth. It simulated a left click when the
user opened the mouth and a right click when the user stretched the
mouth. Dragging was performed by moving the cursor while keeping the
mouth open. Another system, hMouse [2], triggered left and right
clicks when the turn of the user’s head exceeded a specific threshold
angle. No solution for dragging and scrolling was reported.
There are also several multimodal systems that use a different
interaction channel to simulate mouse clicks. Nouse [3] employed a
computer keyboard, which is a rapid and comprehensive solution;
however, users still need their hands, so this solution cannot be
used by people with severe upper-limb impairment. Another modality
that can be used for simulating mouse clicks is speech. The
multimodal system published by Loewenich and Maire [5] defined five
simple speech commands (click, double, right, hold and drop) that
covered all clicking and dragging operations. Ronzhin and Karpov [8]
published a similar system that defined 30 speech commands covering
clicking, dragging and line-by-line scrolling. The remaining commands
were used as shortcuts to common operations such as opening a file,
exiting an application, copy, paste, etc.
Our Solution
In our system, non-verbal vocal interaction (NVVI) [6, 9] is used for
simulating mouse clicks. This interaction method can be characterized
as using sounds other than speech, such as humming, to control user
interfaces. In our case, hummed voice commands are determined by
their pitch and length. Expected pitch profiles of the commands are
depicted in Figure 1. Left click (1a) is defined as a short tone
produced below a user-specific threshold pitch. Double click (1b) is
defined as two consecutive left clicks. Right click (1c) is a short
tone above the threshold pitch. Drag (1d) is a long tone. The
boundary between short and long tones is 0.5 s; however, this value
can be modified according to the preferences of the user. The drop
operation does not have its own command; it is triggered by either a
short or a long tone. Scrolling (1e, 1f) is performed when a
significant increase or decrease in pitch is detected. The number of
lines scrolled is determined by the length of the voice command in
real time. Continuous real-time control is a significant advantage of
NVVI [9]. With speech, the user has to explicitly specify the number
of lines to scroll, which is rather awkward. Note that in order to
keep our method simple, a minimal number of voice commands was used.
Moreover, the commands are very simple and short for the most
frequent operations (clicking) and a bit more complicated for
advanced operations (dragging and scrolling).
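The command definitions above can be summarized in a short sketch. It assumes that a hummed tone has already been segmented into a list of pitch samples (in Hz) with a known duration. The value of `PITCH_CHANGE_RATIO`, the mapping of rising pitch to scrolling up, and the `max_gap_s` parameter for double clicks are assumptions made for illustration; only the 0.5 s boundary and the role of the threshold pitch are taken from the text.

```python
from statistics import mean
from typing import List, Sequence, Tuple

LONG_TONE_S = 0.5         # boundary between short and long tones (from the text)
PITCH_CHANGE_RATIO = 1.2  # assumed meaning of a "significant" pitch change


def classify_tone(pitches_hz: Sequence[float], duration_s: float,
                  threshold_hz: float, dragging: bool = False) -> str:
    """Map one hummed tone to a command name (hypothetical labels)."""
    if not pitches_hz:
        raise ValueError("a tone needs at least one pitch sample")
    first, last = pitches_hz[0], pitches_hz[-1]
    if last >= first * PITCH_CHANGE_RATIO:
        return "scroll_up"    # assumed direction for rising pitch
    if last <= first / PITCH_CHANGE_RATIO:
        return "scroll_down"  # assumed direction for falling pitch
    if dragging:
        return "drop"         # any short or long tone ends a drag
    if duration_s >= LONG_TONE_S:
        return "drag"
    return "left_click" if mean(pitches_hz) < threshold_hz else "right_click"


def merge_double_clicks(events: List[Tuple[float, str]],
                        max_gap_s: float = 0.4) -> List[Tuple[float, str]]:
    """Turn two consecutive left clicks into one double click.

    events holds (start_time_s, command) pairs; max_gap_s is an assumed limit.
    """
    merged: List[Tuple[float, str]] = []
    for start, command in events:
        if (command == "left_click" and merged
                and merged[-1][1] == "left_click"
                and start - merged[-1][0] <= max_gap_s):
            merged[-1] = (merged[-1][0], "double_click")
        else:
            merged.append((start, command))
    return merged


if __name__ == "__main__":
    print(classify_tone([120, 122, 121], 0.2, threshold_hz=150))
    print(classify_tone([200, 205], 0.2, threshold_hz=150))
    print(classify_tone([120, 121], 0.8, threshold_hz=150))
```

In a real recognizer the scroll commands would be emitted continuously while the tone lasts, since the number of lines scrolled depends on the length of the command; the sketch classifies only completed tones.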
Figure 1. Non-verbal vocal commands used to simulate mouse clicks,
drawn as pitch over time relative to the threshold pitch. a. left
click, b. double click, c. right click, d. drag, e. scroll down, f.
scroll up

This method is very well suited for real-time control, as the hummed
commands are recognized much faster than verbal commands [9]. They
are also culturally and language independent. On the other hand, this
is a very unusual way of interaction and users have to get used to
it [6].

Experiment
The aim of the experiment was to determine the efficiency of our
solution. We compared our NVVI method with three other methods for
simulating mouse clicks in terms of speed and error rate. The
following four methods were prepared for the comparison test:
Non-verbal vocal interaction (NVVI) as described in the previous
section. Scrolling voice commands were not included in the
experiment.
Speech commands. Because all participants were native Czech
speakers, we used Czech commands recognized by the MyVoice
application [7]. The commands and their English equivalents are
listed in Table 1.

Table 1. Speech commands.
Command in Czech   English equivalent   Description (mouse operation)
Klik               Click                Left click
Dvojklik           Double Click         Double left click
Pravý klik         Right click          Right click
Vzít               Drag                 Left button down
Položit            Drop                 Left button up
Computer keyboard. Mouse buttons were mapped to keystrokes. Alt +
left arrow corresponded to the left mouse button and Alt + right
arrow to the right button. The up and down arrows corresponded to the
mouse wheel. This method was chosen as a reference.
Head gestures. This solution combined the dwell time approach with a
pie menu (see Figure 2). When the mouse cursor did not move
significantly for 0.5 s, the pie menu appeared and a particular
operation was chosen by moving the cursor over the menu, as depicted
in Figure 2 by the dashed arrow. This method does not suffer from the
Midas touch problem [4], and all mouse operations can be simulated.
The menu can be cancelled by moving the cursor down.
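The speech and keyboard mappings among the four methods above are essentially lookup tables, which the following sketch writes out. The low-level event names (such as `left_down`) are hypothetical labels, and the text does not say whether the wheel keys also required Alt or which arrow scrolled in which direction, so both are assumptions here.

```python
from typing import Dict, List, Tuple

# Speech commands from Table 1, expanded to button events (hypothetical names).
SPEECH_COMMANDS: Dict[str, List[str]] = {
    "Klik": ["left_down", "left_up"],
    "Dvojklik": ["left_down", "left_up", "left_down", "left_up"],
    "Pravý klik": ["right_down", "right_up"],
    "Vzít": ["left_down"],     # drag: left button down
    "Položit": ["left_up"],    # drop: left button up
}

# Keyboard mapping from the text; plain arrows for the wheel are assumed.
KEYBOARD_MAPPING: Dict[Tuple[str, ...], str] = {
    ("alt", "left"): "left_button",
    ("alt", "right"): "right_button",
    ("up",): "wheel_up",       # assumed direction
    ("down",): "wheel_down",   # assumed direction
}


def expand_speech(commands: List[str]) -> List[str]:
    """Translate recognized speech commands into a flat list of button events."""
    events: List[str] = []
    for command in commands:
        events.extend(SPEECH_COMMANDS[command])
    return events


if __name__ == "__main__":
    # A drag followed by a drop presses and releases the left button once.
    print(expand_speech(["Vzít", "Položit"]))
```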
Figure 2. Pie menu.

In order to navigate the cursor by head throughout the experiment,
the head tracking system described in [11] was chosen. The system can
use a cheap webcam to track the head of the user and convert its
position and turn to the position of the mouse cursor on the screen.
An absolute mode is used as described in [5], i.e. the position of
the head is directly mapped to the position of the mouse cursor. A
17” LCD monitor with a native resolution of 1280 x 1024 pixels was
used. A webcam mounted on top of the monitor provided data for head
tracking. While using the vocal modalities, the participants used
headphones with a microphone. The experiment setup is depicted in
Figure 3.

Figure 3. Setup of the experiment (labeled components: webcam,
microphone).

For the experiment, four mouse click simulation tasks were defined.
In every task the participants had to move the cursor to a particular
circle, as shown in Figure 4, and perform the specified clicking
operation. The participants had to start with circle 1 and continue
until circle 16 was reached. Part of the expected cursor trajectory
is shown by arrows. Every task defined a different mouse operation to
be performed in the circles as follows:
Task Pointing. No clicking was involved. This task was included for
reference purposes.
Task Left Click. Only left click had to be simulated.
Task Multi Click. Left, right and double clicks were simulated
according to the captions of the circles.
Task Drag & Drop. Drag and drop operations were involved.

Figure 4. Task template (circles numbered 1 to 16).
The participants had to complete 16 tasks overall (four tasks using
each of the four modalities for simulating mouse clicks). In order to
minimize the learning effect, the sequence of methods and tasks was
shuffled. Moreover, every task had to be performed twice, and data
were measured only in the second attempt. The objective data
collected were processed into three indicators as follows:
Task duration, which is the duration between the first and last
operation in a task, including error operations. This indicator is
used to measure the efficiency of each method.
Click duration, which is the duration between passing the border of
a small circle and a correct click.
Error rate, which expresses the number of wrong clicks relative to
the total number of clicks.
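The three indicators defined above could be computed from a time-stamped event log roughly as follows. The `Event` record and its fields are hypothetical, and the sketch assumes that an "operation" is a logged click and that click duration is measured from the most recent border crossing.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Event:
    """Hypothetical log record for one task."""
    t: float              # time in seconds
    kind: str             # "enter" (circle border crossed) or "click"
    correct: bool = True  # only meaningful for clicks


def task_duration(events: List[Event]) -> float:
    """Time between the first and last click, wrong clicks included."""
    times = [e.t for e in events if e.kind == "click"]
    return max(times) - min(times)


def click_durations(events: List[Event]) -> List[float]:
    """Time from entering a circle to the correct click made in it."""
    durations: List[float] = []
    entered_at: Optional[float] = None
    for e in sorted(events, key=lambda ev: ev.t):
        if e.kind == "enter":
            entered_at = e.t
        elif e.kind == "click" and e.correct and entered_at is not None:
            durations.append(e.t - entered_at)
            entered_at = None
    return durations


def error_rate(events: List[Event]) -> float:
    """Wrong clicks as a percentage of all clicks."""
    clicks = [e for e in events if e.kind == "click"]
    wrong = sum(1 for e in clicks if not e.correct)
    return 100.0 * wrong / len(clicks)


if __name__ == "__main__":
    log = [
        Event(0.0, "enter"), Event(1.0, "click"),
        Event(2.0, "enter"), Event(2.5, "click", correct=False),
        Event(3.5, "click"),
    ]
    print(task_duration(log), click_durations(log), error_rate(log))
```

Because wrong clicks stay inside the measured interval, the time cost of errors is reflected in both task duration and click duration.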
Due to the long duration of a single session (about 50 minutes), we
did not include the scrolling capabilities of the evaluated methods.
After each session the participants were given a post-test
questionnaire to subjectively assess the speed, comfort and accuracy
of each method. 54 participants without disabilities took part in
the experiment. They were recruited from university students (mean
age = 23.5, SD = 0.98) and were technically oriented and experienced
computer users. The participants were trained to perform NVVI and
head gestures in a training session that was conducted before the
experiment and lasted approximately 15 minutes. The speech and
keyboard methods were not trained because speech is a natural form
of interaction and the participants were sufficiently experienced in
using a keyboard.

Results
The results of the experiment are summarized in Table 2. Mean times
(task and click durations) of each method in the three tasks are
shown in rows. Comparing these times within each row, the methods
can be ordered by time, without exception, as follows (fastest
first):
Keyboard < NVVI < Speech < Head gestures

Table 2. Mean times and standard deviations (SD) for each task and modality. Grey cells in one row correspond to means that are not
statistically different. Overall error rates for each method are shown in the last row.

                             Speech            NVVI              Keyboard          Head gestures
Task         Indicator       Mean [s]  SD [s]  Mean [s]  SD [s]  Mean [s]  SD [s]  Mean [s]  SD [s]
Left Click   Task Duration   49.6      7.4     39.0      7.8     34.0      6.5     60.9      9.7
Left Click   Click Duration  1.342     0.277   0.875     0.203   0.524     0.175   1.927     0.260
Multi Click  Task Duration   55.1      7.7     49.5      12.9    39.3      7.0     71.1      14.3
Multi Click  Click Duration  1.639     0.374   1.358     0.526   0.706     0.213   1.977     0.415
Drag & Drop  Task Duration   51.0      6.7     44.5      8.2     36.3      6.8     70.5      9.9
Drag & Drop  Click Duration  1.430     0.274   1.256     0.249   0.558     0.186   1.667     0.314
Error rate [%]               3.53              6.11              4.09              1.75
An ANOVA test and Scheffé’s method were used to find statistically
significant (p < .01) differences in the mean times of each task.
Most of the differences are significant, except those shown in grey
in Table 2. Our method is faster than the other hands-free methods
(speech and head gestures); however, it is slower than the keyboard,
which on the other hand is unusable for users with severe motor
impairments.
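As a rough reading of the mean task durations reported in Table 2, the relative time saved by one method over another can be computed directly from the means. The sketch below only restates the reported means; the function names are illustrative, and the calculation says nothing about statistical significance.

```python
from typing import Dict, List

# Mean task durations in seconds, as reported in Table 2.
MEAN_TASK_DURATION_S: Dict[str, Dict[str, float]] = {
    "Left Click": {"Speech": 49.6, "NVVI": 39.0,
                   "Keyboard": 34.0, "Head gestures": 60.9},
    "Multi Click": {"Speech": 55.1, "NVVI": 49.5,
                    "Keyboard": 39.3, "Head gestures": 71.1},
    "Drag & Drop": {"Speech": 51.0, "NVVI": 44.5,
                    "Keyboard": 36.3, "Head gestures": 70.5},
}


def relative_saving(task: str, method: str, baseline: str) -> float:
    """Percentage of the baseline's mean task time saved by the method."""
    row = MEAN_TASK_DURATION_S[task]
    return 100.0 * (row[baseline] - row[method]) / row[baseline]


def ranking(task: str) -> List[str]:
    """Methods ordered from shortest to longest mean task duration."""
    row = MEAN_TASK_DURATION_S[task]
    return sorted(row, key=row.get)


if __name__ == "__main__":
    for task in MEAN_TASK_DURATION_S:
        saving = relative_saving(task, "NVVI", "Speech")
        print(f"{task}: NVVI saves {saving:.1f}% relative to speech")
        print("  ranking:", " < ".join(ranking(task)))
```

On these means, NVVI shortens the mean task duration relative to speech by roughly 21%, 10% and 13% in the Left Click, Multi Click and Drag & Drop tasks, respectively.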
Error rate results are summarized in the last row of Table 2. The
head gestures method had the lowest error rate. This is probably
caused by the relatively high time penalty incurred when the user
selects a wrong option: the cursor has to be navigated back to the
initial position and the user has to wait for the pie menu to pop up
(dwell time). This leads to much more careful interaction. However,
we believe that this behavior can be improved by personalizing the
dwell time and the size of the pie menu. The error rate of NVVI was
the highest (6.11%), which we attribute to insufficient training
consisting of only one session. According to longitudinal studies
[6, 9], four training sessions are enough to minimize the error rate
of NVVI. Nevertheless, the time penalty caused by these errors is
already included in the task duration indicator.
Table 3. Questionnaire results. Scale 1 (=worst) … 5 (=best).
Mean values are displayed.

           Speech  NVVI  Kbd.  Gest.
Speed      3.33    3.50  4.72  2.11
Comfort    3.74    2.81  4.30  2.89
Accuracy   3.94    2.94  4.81  3.04
Subjective results are shown in Table 3. Head gestures were generally
rated by the users as the worst method and the keyboard as the best.
Even though NVVI was faster than speech, it was perceived as worse in
comfort and accuracy.
Conclusion
In this paper we have described a method for simulating mouse clicks
based on humming (NVVI). The method is capable of simulating all
common mouse buttons, including the mouse wheel for real-time
scrolling. We compared our method with three other methods (speech,
head gestures and keyboard); it was the second fastest, although it
had the highest error rate. Its accuracy and comfort were also
subjectively rated as the worst. In the future, we will conduct
longitudinal tests with disabled users in real applications and
combine more modalities in the system to provide more efficient
control of a computer.
Acknowledgement
We would like to thank Lukáš Zich from the Center for
Machine Perception, CTU in Prague for provision of the
head tracking software. This research has been partially
supported by the MSMT research program MSM
6840770014 and the VitalMind project (IST-215387).
References
[1] Betke, M., Gips, J. and Fleming, P. The Camera Mouse: Visual
Tracking of Body Features to Provide Computer Access for People With
Severe Disabilities. In IEEE Transactions on Neural Systems and
Rehabilitation Engineering, IEEE Computer Society (2002), 1-10.
[2] Fu, Y. and Huang, T. S. hMouse: Head Tracking Driven Virtual
Computer Mouse. In Proc WACV’07, IEEE Computer Society (2007), 30-36.
[3] Gorodnichy, D. O., Malik, S. and Roth, G. Nouse 'Use Your Nose as
a Mouse' - a New Technology for hands-free Gamers and Interfaces. In
Proc VI'2002, Calgary (2002), 354-361.
[4] Jacob, R. J. K. What you look at is what you get. In Computer,
vol. 26, no. 7, IEEE Computer Society (1993), 65-66.
[5] Loewenich, F. and Maire, F. Hands-free mouse-pointer manipulation
using motion-tracking and speech recognition. In Proc OZCHI, ACM
Press (2007), 295-302.
[6] Mahmud, M., Sporka, A. J., Kurniawan, S. H. and Slavik, P. A
Comparative Longitudinal Study of Non-verbal Mouse Pointer. In Proc
INTERACT 2007, Springer-Verlag (2007), 489-502.
[7] Nouza, J., Nouza, T. and Červa, P. A Multi-Functional
Voice-Control Aid for Disabled Persons. In Proc SPECOM 2005, Moscow,
715-718.
[8] Ronzhin, A. and Karpov, A. Assistive multimodal system based on
speech recognition and head tracking. In Proc of EUSIPCO’2005,
Turkey, 2005.
[9] Sporka, A. J., Kurniawan, S. H., Mahmud, M. and Slavik, P.
Longitudinal study of continuous non-speech operated mouse pointer.
In Proc CHI’07, ACM Press (2007), 2669-2674.
[10] Tu, J., Huang, T. and Tao, H. Face as Mouse Through Visual Face
Tracking. In Proc CRV (2005), IEEE Computer Society (2005), 339-346.
[11] Zich, L. Video based Human-Computer interface. Master Thesis,
2009, CTU Prague, FEE.