IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 14, NO. 2, APRIL 2012
Assessment of Stereoscopic Crosstalk Perception
Liyuan Xing, Student Member, IEEE, Junyong You, Member, IEEE, Touradj Ebrahimi, Member, IEEE, and
Andrew Perkis, Senior Member, IEEE
Abstract—Stereoscopic three-dimensional (3-D) services do not
always prevail when compared with their two-dimensional (2-D)
counterparts, though the former can provide more immersive
experience with the help of binocular depth. Various specific 3-D
artefacts might cause discomfort and severely degrade the Quality
of Experience (QoE). In this paper, we analyze one of the most annoying artefacts in the visualization stage of stereoscopic imaging,
namely, crosstalk, by conducting extensive subjective quality tests.
A statistical analysis of the subjective scores reveals that both
scene content and camera baseline have significant impacts on
crosstalk perception, in addition to the crosstalk level itself. Based
on the observed visual variations during changes in significant
factors, three perceptual attributes of crosstalk are summarized
as the sensorial results of the human visual system (HVS). These
are shadow degree, separation distance, and spatial position of
crosstalk. They are classified into two categories: 2-D and 3-D
perceptual attributes, which can be described by a Structural
SIMilarity (SSIM) map and a filtered depth map, respectively. An
objective quality metric for predicting crosstalk perception is then
proposed by combining the two maps. The experimental results
demonstrate that the proposed metric has a high correlation (over
88%) when compared with subjective quality scores in a wide
variety of situations.
Index Terms—Crosstalk perception, objective metric, perceptual attribute, subjective evaluation.
I. INTRODUCTION
STEREOSCOPIC three-dimensional (3-D) imaging is
based on simultaneously capturing a pair of two-dimensional (2-D) images and then separately delivering them to
respective eyes. Consequently, 3-D perception is generated in
the human visual system (HVS). Although stereoscopic 3-D
services introduce a new modality (binocular depth) that can
offer increasingly richer experience (immersion and realism) to
the end-users, they do not always prevail when compared with
their 2-D counterparts. One of the major drawbacks of stereoscopic 3-D services is visual discomfort, which can potentially
cause users to feel uncomfortable and severely degrade the
viewing experience.
Manuscript received March 24, 2011; revised July 20, 2011; accepted October 01, 2011. Date of publication October 18, 2011; date of current version
March 21, 2012. This work was supported by the Research Council of Norwegian University of Science and Technology and UNINETT. The associate editor
coordinating the review of this manuscript and approving it for publication was
Prof. Weisi Lin.
L. Xing, J. You, and A. Perkis are with the Centre for Quantifiable Quality of
Service (Q2S) in Communication Systems, Norwegian University of Science
and Technology (NTNU), N-7491 Trondheim, Norway (e-mail: liyuan@q2s.
ntnu.no; junyong.you@ieee.org; andrew@iet.ntnu.no).
T. Ebrahimi is with the Multimedia Signal Processing Group (MMSPG),
Ecole Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne,
Switzerland (e-mail: touradj.ebrahimi@epfl.ch).
Color versions of one or more of the figures in this paper are available online
at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TMM.2011.2172402
The importance of various causes and aspects of visual discomfort is clarified in [1]. In particular, 3-D artefacts are considered to be one of the most prominent factors contributing to
visual discomfort. Such artefacts can be introduced in each stage
from the acquisition to the restitution in a typical 3-D processing
chain [2]. In particular, crosstalk is one of the most annoying
distortions in the visualization stage of a stereoscopic imaging
system [3]. Crosstalk is produced by imperfect view separation
that causes a small proportion of one eye image to be seen by
the other eye as well. Crosstalk artefacts are usually perceived
as ghosts, shadows, or double contours by human subjects.
Nowadays, crosstalk exists in almost all stereoscopic displays. However, the mechanisms behind occurrence of crosstalk
can be significantly different across different stereoscopic display technologies. These mechanisms have been analyzed in
order to characterize and measure the components contributing
to crosstalk. Therefore, crosstalk reduction can be achieved
by reducing the effect of one or more of these components.
Since it is not possible to completely eliminate crosstalk of
displays with current technologies, researchers attempt to
conceal crosstalk of a 3-D presentation using image processing
methods before display. Such methods are usually referred to as crosstalk cancellation. However, crosstalk cancellation does not always perform efficiently in all situations. These issues have
been widely investigated in the literature, e.g., see a review
in [4]. However, neither of the aforementioned methods can
completely eliminate crosstalk artefacts.
Therefore, it is beneficial to study how users perceive crosstalk
of 3-D presentations. Comparatively few research efforts have
been devoted to this topic. In [5], a visibility threshold of
crosstalk for different amounts of disparity and image contrast
ratios in grayscale patches is provided. It shows that the visibility of crosstalk increases with increasing image contrast and
disparity. However, a stereoscopic presentation is the result of
a combination of different contrasts and disparities per pixel
over an entire image, and it is more practical to know how much
crosstalk can be perceived when it is visible. Therefore, the
authors of [3] investigated more realistic scenarios where natural
scenes varying in crosstalk levels (0%, 5%, 10%, and 15%) and
camera baselines (0, 4, and 12 cm) affect the perceptual attributes
of crosstalk (perceived image distortion, perceived depth, and
visual discomfort). However, only two, rather similar, natural
scenes were used in their experiments. More scene contents
with different depth structures and image contrasts should be
taken into account when designing a subjective experiment,
because depth structure of scene content together with camera
baseline can in principle determine the disparity, one of the major factors impacting crosstalk visibility [5]. Moreover, the
authors of [6] found that monocular cues of images also play an
important role in the crosstalk perception, in addition to contrast
ratio and disparity. In [7], it is shown that edges and high contrast
of computer-generated wire-frames make crosstalk more visible
when compared with natural images. This means that crosstalk
can be more efficiently concealed on images with more texture
or details. These observations partially support a hypothesis
that scene content is an important factor impacting users’
perception of crosstalk. Although other artefacts, e.g., blur and
vertical disparity as investigated in [8], may also have impact
on the crosstalk perception, they can be often corrected by
postprocessing techniques.
Although subjective testing is the most reliable way to evaluate the perceived quality, it is time-consuming, expensive, and
unsuitable for real-time applications. To deal with these drawbacks, objective metrics that can predict the human subjects’
judgment with a high correlation are desired. To develop good
objective metrics, the perception mechanisms need to be well
understood and taken into account. However, this is usually
fairly difficult. Therefore, development of objective 3-D quality
models is still in its early stages. Researchers first started with
exploring whether or not traditional 2-D metrics can be applied
to stereoscopic quality assessment [9], [10]. Subsequently, a few
objective metrics [11]–[13] that take into account the characteristics of stereoscopic images have been proposed. However,
most of the existing objective metrics are designed to assess
quality degradations caused by lossy compression schemes. To
the best of our knowledge, only one objective metric that considers noncompression quality degradations, induced during acquisition and display stages of stereoscopic media, has been proposed in [14]. This metric is modeled by a linear combination of
three measurements, which evaluate the perceived depth, visual
fatigue, and temporal consistency, respectively.
In this paper, subjective tests [15] have been conducted to
collect the evaluation scores on crosstalk perception of a wide
range of 3-D stimuli, including different scene contents, camera
baselines, and crosstalk levels. Thereby, a comprehensive database of crosstalk perception for a wide variety of situations has
been created. Furthermore, based on a statistical analysis of
subjective scores, scene content, camera baseline, and crosstalk
level are found to have significant impacts on the perception of
crosstalk. By changing the amplitude of the significant factors,
three perceptual attributes of crosstalk in the HVS have been
observed. These perceptual attributes are further used to design
an objective quality metric [16] for crosstalk perception.
The main contributions of the paper are twofold. First, our
subjective tests provide a comprehensive database for crosstalk
perception in stereoscopic images. Second, users’ subjective
perception of crosstalk is predicted using an objective metric
based on a rigorous analysis of perceptual attributes of crosstalk.
The remainder of this paper is organized as follows. In
Section II, we present the subjective tests on crosstalk perception as well as a statistical analysis of the subjective scores. In
Section III, perceptual attributes of crosstalk are explained by
an observation on the visual variations of stimuli when several
significant factors change. Furthermore, a perceptual objective
metric for crosstalk perception is proposed by describing the
perceptual attributes of crosstalk and the experimental results
are reported in Section IV. Finally, concluding remarks are
given in Section V.
Fig. 1. Polarized display system used in subjective tests.
II. SUBJECTIVE TESTS ON CROSSTALK PERCEPTION
Several recommendations for subjective evaluation of visual
stimuli have been issued by the International Telecommunication Union (ITU), e.g., the widely used ITU-R BT.500 [17] for
television pictures. For subjective evaluation of stereoscopic
television pictures, ITU-R BT.1438 [18] has made a few first
steps, but it still lacks many details. The authors of [19] have summarized these shortcomings in the form of additional requirements. In our subjective tests, we followed these methodologies and further customized them for crosstalk perception. In the following, we provide details about the laboratory environment where the subjective tests were conducted, how the test stimuli
were prepared, which test method was adopted, as well as what
results were obtained from the subjective tests.
A. Laboratory Environment
1) Display System: A polarization technique was used to
present 3-D images, as illustrated in Fig. 1. Specifically, two
Canon XEED SX50 projectors with a resolution of 1280 × 960 were placed on a Chief ASE-2000 Adjusta-Set Slide Stacker, which can be adjusted within its swivel, tilt, and leveling ranges. Two Cavision linear polarizing glass filters with a size of 4 in × 4 in were installed orthogonally in front of the projectors. In this way, two views were projected and superimposed onto the backside of a 2.40 m × 1.20 m silver screen. The projection distance between the projectors and the silver screen was about 2 m, forming a projected region occupying the central area of the silver screen with a width of 1.12 m and a height of 0.84 m. Images up-sampled by a bicubic interpolation method were displayed in full-screen mode. The subjects, equipped with polarized glasses, were asked to view the 3-D images from the opposite side of the silver screen. The viewing distance was set to about five times the image height (0.84 m × 5), as suggested in [17]. The field of view (FOV) was thus about 15°.
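The reported FOV can be checked with a short calculation; the screen width (1.12 m) and the five-times-height viewing distance are taken from the setup described above:

```python
import math

def field_of_view_deg(width_m: float, distance_m: float) -> float:
    """Horizontal field of view subtended by a flat screen of the
    given width viewed from the given distance (both in metres)."""
    return math.degrees(2 * math.atan(width_m / (2 * distance_m)))

image_height = 0.84                    # height of projected region (m)
viewing_distance = 5 * image_height    # 5H, as suggested in [17]
fov = field_of_view_deg(1.12, viewing_distance)
print(round(fov, 1))  # about 15 degrees, matching the reported FOV
```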
2) Alignment of Display System: Prior to the tests, the display system was calibrated to align the two projectors. In particular, the positions of the two projectors were adjusted to guarantee that the center points of the projectors, the projected region, and the silver screen were positioned on the same horizontal line (the center horizontal line shown in Fig. 1), and that this line was perpendicular to the silver screen. Moreover, the angles of the stackers and the keystones of the projectors were adjusted with the help of projected Hermann grid images. The adjustment of the display system
was finished once the two Hermann grid images from the left
and right projectors were exactly overlapped.
3) Measurement of System-Introduced Crosstalk: After the alignment, the system-introduced crosstalk was measured immediately. As discussed in [4], the terminology and mathematical definitions of crosstalk are diverse and sometimes contradictory. We adopt the definition of system-introduced crosstalk as the degree of unexpected light leakage from the unintended channel into the intended channel. In particular, we measured the leakage in the situation where the left and right test images have the maximum difference in brightness. The system-introduced crosstalk is measured mathematically as follows:

C = [L_g(I_w, I_b) − L_g(I_b, I_b)] / [L(I_w, I_b) − L(I_b, I_b)]    (1)

where (I_w, I_b) denotes a pair of test images (the left image is completely white while the right is completely black), (I_b, I_b) is another image pair with both images black, L denotes the luminance measured on the silver screen, and L_g denotes the luminance measured behind the right lens of the polarized glasses held against the silver screen. Therefore, C denotes the system-introduced crosstalk from the left channel to the right, which is approximately 3% in our experiments. The consistency of the system-introduced crosstalk of the polarized display was also verified over the display surface, between the projectors, and among different combinations of brightness of the left and right test images.
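Given the four luminance readings, the ratio in (1) is a one-line computation. The sketch below illustrates it; the luminance values are hypothetical placeholders, not the actual measurements:

```python
def crosstalk_ratio(leak_wb: float, leak_bb: float,
                    screen_wb: float, screen_bb: float) -> float:
    """System-introduced crosstalk in the spirit of (1): luminance
    leaking into the right channel (measured behind the right lens),
    black level subtracted, normalised by the luminance of the white
    left image measured on the screen (black level also subtracted)."""
    return (leak_wb - leak_bb) / (screen_wb - screen_bb)

# hypothetical luminance readings in cd/m^2
c = crosstalk_ratio(leak_wb=6.2, leak_bb=0.2, screen_wb=200.2, screen_bb=0.2)
print(f"{c:.1%}")  # 3.0%
```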
4) Room Conditions: The test room had a length of 11.0 m,
width of 5.6 m, and height of 2.7 m. During the subjective tests,
all of the doors and windows of the test room were closed and
covered by black curtains. In addition, the lights in the room
were turned off except for one reading lamp on a desk in front
of the subject, which was used to illuminate the keyboard when entering subjective scores. In this way, subjects could concentrate on the 3-D perception rather than on operating the keyboard.
B. Test Stimuli
Scene content and camera baseline are requisite factors in stereoscopic imaging and also affect users' perception of crosstalk. Therefore, scene content, camera baseline, and
crosstalk level were selected as three observed factors in the
subjective tests of crosstalk perception. In particular, three
camera baselines and four crosstalk levels were applied to six
scene contents, which resulted in 72 test stimuli in total.
1) Scene Content: Seven multiview sequences (one for
training) from the MPEG [20] were chosen as representative
scene contents, as shown in Fig. 2. These scene contents cover
a wide range of depth structures, contrasts, colors, edges, and
textures, which were considered as potential factors impacting
users’ perception of crosstalk. In particular, a wide range of
depth structures were obtained by including both indoor and
outdoor scenes.
2) Camera Baseline: Three camera baselines were formed
from four consecutive cameras. The leftmost camera always
served as the left eye view and the other three cameras took
turns as the right eye views for 3-D images. In this way, three
3-D images with different camera baselines were generated for
Fig. 2. Visual samples of the selected scenes. (a) Book arrival. (b) Champagne.
(c) Dog. (d) Love bird. (e) Outdoor. (f) Pantomime. (g) Newspaper.
TABLE I
NUMBER OF THE SELECTED CAMERAS FROM LEFT TO RIGHT AND THE RESULTING CAMERA BASELINES
each scene. Table I gives more information about the selected
cameras and the resulting camera baselines of the 3-D images.
3) Crosstalk Level: In order to simulate different levels of system-introduced crosstalk for different displays, crosstalk artefacts were added to the three 3-D image pairs of each scene, with four different crosstalk levels introduced to each pair using the algorithm developed in [21]. This algorithm can be summarized by the following equations:

Î_L(x, y) = I_L(x, y) + p · I_R(x, y)
Î_R(x, y) = I_R(x, y) + p · I_L(x, y)    (2)

where I_L and I_R denote the original left and right views, Î_L and Î_R are the distorted views simulating the system-introduced crosstalk distortions, and the parameter p adjusts the level of crosstalk distortion. According to (2), the simulation algorithm keeps a consistent characteristic of the system-introduced crosstalk of a polarized display by applying the same leakage percentage p to all pixels of the entire image, in both the left and right views, regardless of the brightness of the pixels.
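The simulation in (2) amounts to leaking the same fraction p of each view into the other. A minimal NumPy sketch is given below; the clipping to the 8-bit range is our assumption about how out-of-range values would be handled:

```python
import numpy as np

def add_crosstalk(left: np.ndarray, right: np.ndarray, p: float):
    """Simulate system-introduced crosstalk as in (2): leak the same
    percentage p of the unintended view into each intended view."""
    l = left.astype(np.float64)
    r = right.astype(np.float64)
    left_d = np.clip(l + p * r, 0, 255).astype(np.uint8)
    right_d = np.clip(r + p * l, 0, 255).astype(np.uint8)
    return left_d, right_d

left = np.full((2, 2), 200, dtype=np.uint8)   # bright left view
right = np.zeros((2, 2), dtype=np.uint8)      # dark right view
ld, rd = add_crosstalk(left, right, p=0.15)
print(rd[0, 0])  # 30: 15% of the left view leaks into the right
```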
In our experiments, the system-introduced crosstalk C_s is 3%, which adds to the simulated crosstalk p in the image pairs of (2). Therefore, the overall crosstalk perceived by the users is

C = C_s + p + C_s · p    (3)

where C is the overall crosstalk combining both the system-introduced crosstalk C_s and the simulated crosstalk p. As crosstalk caused by stereoscopic techniques usually ranges from 0 to 15% [5], and the image quality might be very low if the crosstalk level is large, e.g., over 15% [3], the parameter p was set to 0, 5%, 10%, and 15%, respectively, in our subjective tests. Thus, the overall crosstalk levels in our experiments were actually C_s + p, i.e., 3%, 8%, 13%, and 18%, respectively. As the maximum pixel value change contributed by the C_s · p term is only 255 × 3% × 15% ≈ 1, its effect can be ignored. Therefore, the overall system-introduced crosstalk is simulated as follows, based on an additive rule:
TABLE II
EXPLANATIONS OF THE FIVE CATEGORICAL ADJECTIVAL LEVELS AND THEIR TRAINING EXAMPLES FROM BOOK ARRIVAL
C = C_s + p.    (4)

Equation (4) indicates that the different simulated crosstalk levels can be applied to a stereoscopic display with a consistent system-introduced crosstalk level. Therefore, the crosstalk level will refer to the overall crosstalk level C in this work.
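Under this additive rule, the four settings of the simulation parameter map directly to the overall levels quoted above:

```python
SYSTEM_CROSSTALK = 0.03                 # measured system-introduced crosstalk
simulated_levels = [0.00, 0.05, 0.10, 0.15]   # parameter p in the simulation

# additive rule: overall crosstalk = system crosstalk + simulated crosstalk
overall = [SYSTEM_CROSSTALK + p for p in simulated_levels]
print([f"{c:.0%}" for c in overall])  # ['3%', '8%', '13%', '18%']
```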
C. Test Methodology
1) Single Stimulus: Among different methodologies for
subjective quality assessment of Standard Definition TeleVision (SDTV) pictures in ITU-R BT.500 [17], three widely used
methodologies are double stimulus continuous quality scale
(DSCQS), double stimulus impairment scale (DSIS), and single
stimulus (SS). In this study, as several camera baselines for
each scene have been taken into account, it is difficult to choose
an original 3-D image as the reference. Therefore, we adopted
the SS method. The SS method was also used in assessing
the quality levels of stereoscopic images with varying camera
baselines in the literature [3], [22]. In addition, in order for
subjects to have sufficient time to generate their 3-D perception
and explore the still 3-D images extensively, a minor modification was made to the SS method such that the subjects could freely decide the viewing time for each image, as in [3].
2) Test Interface: In order to support the adaptive SS methodology, a special interface was developed to conveniently display
the stereoscopic images in a random order. In addition, a subject could conveniently and freely decide when to move to the next image pair by pressing the “Ctrl” key on the keyboard. The score of the current image pair was recorded by pressing a numerical key instead of writing on an answer sheet. Other special considerations, such as displaying in full screen, disabling unnecessary keys, updating the scores, and so on, were also included in the developed interface.
3) Subjects: Before the training sessions, visual perception
related characteristics of the subjects were collected, including
pupillary distance (measured by a ruler), normal or corrected
binocular vision (tested by the Snellen chart), color vision (tested by the Ishihara plates), and stereo vision (tested by the TV-04 and TV-07 charts in ITU-R BT.1438 [18]).
A total of 28 subjects participated in the tests, consisting of
15 males and 13 females, aged from 23 to 46 years old. The
binocular vision of all of the subjects was above 0.80 with a
mean of 1.05 and a standard deviation of 0.28. Although seven
subjects had monocular vision differences of either 0.4 or 0.2,
all of the subjects could perceive the binocular depth.
4) Training Sessions: Subjects participated in both training
and test sessions individually. During the training sessions,
an example of five categorical adjectival levels (see Table II)
was shown to the subject in order to benchmark and harmonize
their measuring scales. The Book Arrival scene was selected
by expert viewers in such a way that each quality level was
represented by an example image and that these example
images could cover a full range of quality levels within the set
of test stimuli. When each example image was displayed, the
operator verbally explained the corresponding quality level to
the subject. In addition, a detailed explanation of every scale
was provided to subjects in form of written instructions (see
Table II). Subjects were encouraged to view the representative examples for as long as they wished and to ask questions if they needed any further clarification. The training sessions continued until the subjects could understand and distinguish the five different quality levels.
5) Test Sessions: During the test sessions, subjects were first
presented with three dummy 3-D images from the Book Arrival
content, which were not used in the training sessions. These
dummy images were used to stabilize subjects’ judgment, and
the corresponding scores were not included in the subsequent
data analysis. Following the dummy images, 72 test images
were randomly shown to the subjects. A new 3-D image was
shown after a subject had entered his/her score for the previous
one. During the test period, the subjects were not allowed to ask
questions in order to avoid any interruption during the entire
session.
D. Subjective Results Analysis
The subjective scores of the 72 test stimuli given by 28
subjects are analyzed here. Particularly, we aim to analyze the
relationship between crosstalk perception and three potential
significant factors, including scene content, camera baseline,
and crosstalk level.
1) Normality Test and Outlier Removal: In order to apply
arithmetic mean value as mean opinion scores (MOS) and use
parametric statistical analysis methods, such as analysis of variance (ANOVA), the normality of subjective scores across subjects needs to be validated. The β₂ test recommended in [17], based on calculating the kurtosis coefficient, was adopted for a
Fig. 3. MOS and CI (significance of 95%) of subjective scores on crosstalk
perception for scene contents.
normality test. We classified the test results into three groups according to the kurtosis coefficient: normal, close to normal, and abnormal. If the total proportion of normal and close-to-normal cases was more than 80%, we assumed that the subjective scores in our tests follow the normal distribution. The results showed that the majority of stimuli (55 of 72) were normally distributed and 11 of 72 were close to normal, while the others (6 of 72) were not. Therefore, we can assume that the subjective scores follow the normal distribution. A screening test of subjects was also performed according to a guideline in [17]: subjects who had produced votes significantly distant from the average scores should be removed. Consequently, one outlier was detected, and the corresponding results were excluded from the following analysis.
2) Observations: After removing the outlier, MOS and 95%
confidence interval (CI) were computed and plotted as a function of camera baseline and crosstalk level for all of the six scene
contents separately, as shown in Fig. 3. A number of observations can be made based on the results in those plots.
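For one stimulus, the MOS and 95% confidence interval can be computed as below, using the normal-approximation interval (consistent with the normality assumption established above); the votes are hypothetical:

```python
import math

def mos_and_ci(votes, z=1.96):
    """Mean opinion score and half-width of the 95% confidence
    interval, assuming approximately normal score distributions."""
    n = len(votes)
    mos = sum(votes) / n
    sd = math.sqrt(sum((v - mos) ** 2 for v in votes) / (n - 1))
    return mos, z * sd / math.sqrt(n)

votes = [4, 3, 4, 5, 3, 4, 4, 2, 3, 4]   # hypothetical votes for one stimulus
mos, ci = mos_and_ci(votes)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")   # MOS = 3.60 +/- 0.52
```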
TABLE III
IMPACT OF CROSSTALK LEVEL (CL) AND CAMERA BASELINE (CB) ON CROSSTALK PERCEPTION FOR EACH SCENE
Generally speaking, the MOS values decrease as the level of crosstalk distortion increases. However, the decrease in MOS for Dog and Pantomime is not as pronounced as in the other four scenes. When considering the impact of camera baseline, we can observe a general tendency of the MOS values of crosstalk perception to decrease with increasing camera baseline. However, while this tendency is significant for the near indoor scenes (Champagne and Newspaper), it is less significant for the others, especially for Dog and Pantomime. Therefore, we can summarize the observations as follows:
• observation i: crosstalk level and camera baseline have an
impact on crosstalk perception.
• observation ii: the impact of crosstalk level and camera
baseline on crosstalk perception varies with scene content.
In addition, the individual curves in Fig. 3 show that even
the highest MOS values of Champagne and Newspaper are still
below 4, which indicates that the system-introduced crosstalk
is more perceptible in close-up scenes. Furthermore, we also
noticed that there exist exceptions where MOS values increase with increasing camera baseline and crosstalk level.
Hence, crosstalk perception might be influenced by other
perceptual attributes, such as perceived depth.
3) Statistical Analysis: In order to verify the observations
and evaluate the impact of the independent variables (scene content, camera baseline, crosstalk level) on the dependent variable (crosstalk perception), we utilized ANOVA to analyze the
subjective scores obtained in our tests. ANOVA is a general
technique that can be used to test the equality hypothesis of
means among two or more groups. These groups are classified by factors (independent variables whose settings are controlled and varied by the operator) and their levels (the intensity settings of a factor). An n-way ANOVA treats n factors, and the null hypothesis includes: 1) there is no difference in the means of each factor, and 2) there is no interaction between the n factors. The null hypothesis is verified using the F-test and can easily be judged by the p-value. When the p-value is smaller than 0.05, the null hypothesis is rejected, which means there is a significant difference in means. In particular, either there is a significant difference between the levels of a factor, so that the factor has a significant effect, or the difference between the levels of one factor is not the same across the levels of other factors, so that there is an interaction between the factors.
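As a toy illustration of the F-test logic (a one-way case with made-up score groups; SciPy's `f_oneway` is used here for illustration, whereas the study itself used SPSS):

```python
from scipy.stats import f_oneway

# hypothetical MOS groups for three crosstalk levels
low  = [4.2, 4.0, 4.4, 4.1]
mid  = [3.1, 3.3, 2.9, 3.2]
high = [2.0, 1.8, 2.2, 2.1]

f_stat, p_value = f_oneway(low, mid, high)
# a p-value below 0.05 rejects the null hypothesis of equal group
# means, i.e., crosstalk level is significant in this toy data
print(p_value < 0.05)  # True
```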
We used Statistical Package for the Social Sciences (SPSS)
statistics toolbox for our analysis. Tables III–V show the ANOVA results for different factors. In these tables, a marked entry indicates that the corresponding factor has a significant effect on the crosstalk perception, or that multiple factors interact in terms of their impact on the crosstalk perception, and “—” means the factor has no significant effect or there is no interaction between multiple factors.
TABLE IV
IMPACT OF SCENE CONTENT (SC) ON CROSSTALK PERCEPTION BETWEEN EVERY TWO SCENES
TABLE V
IMPACT OF CROSSTALK LEVEL (CL), CAMERA BASELINE (CB), AND SCENE
CONTENT (SC) ON CROSSTALK PERCEPTION FOR ALL THE SCENES
When considering observation i, we first tested the impact of crosstalk level and camera baseline on crosstalk perception for each scene content. As shown in Table III, both crosstalk level and camera baseline generally have a significant impact on crosstalk perception in each scene content. One exception is that camera baseline has no significant impact on crosstalk perception for Pantomime. In addition, for most scenes, except Champagne and Pantomime, crosstalk level and camera baseline interact in terms of their impact on crosstalk perception.
Regarding observation ii, the impact of scene content on crosstalk perception between every two scenes is reported in Table IV. There is a significant difference between scene contents in terms of crosstalk perception for most scene content pairs. However, there are two exceptional pairs: Champagne and Newspaper, as well as Outdoor and Dog. In other words, there is no significant difference between Champagne and Newspaper when their crosstalk perceptions are considered. The same argument also applies to Outdoor and Dog, although it may seem from Fig. 3 that Pantomime and Dog are similar.
All of these observations can be further verified by considering the three factors together over the whole set of test stimuli. Table V shows that crosstalk level, camera baseline, and scene content each have a significant impact on crosstalk perception, and that they exhibit two-factor interactions in terms of their impact on crosstalk perception. However, the three-factor interaction does not have a significant impact on crosstalk perception.
III. UNDERSTANDING OF CROSSTALK PERCEPTION
After identifying the significant factors, their relationship with the perceptual attributes of crosstalk can be modeled. Because the perceptual attributes of crosstalk are the sensorial results of the HVS and closer to the perceptive viewpoint, they can bridge the gap between low-level significant factors and high-level users' perception of crosstalk.
Ten test stimuli with different amplitudes of the significant factors were selected to represent the perceptual attributes of crosstalk, as shown in Fig. 4. The red rectangular regions highlight the selected regions in the images for the sake of discussion of the crosstalk; they have been enlarged and placed in the top right or left corner of each image. These stimuli consist of two scene contents (Champagne and Dog), to which five combinations of camera baseline and crosstalk level were applied. Specifically, the selected scene contents have comparatively large differences in depth structure and image contrast. When these test stimuli were perceived on a stereoscopic display in a certain order of changing significant factors, we summarized the visual variations of crosstalk into its perceptual attributes, namely shadow degree, separation distance, and spatial position of crosstalk. Shadow degree and separation distance are 2-D perceptual attributes existing in a single eye view, and they are still maintained in 3-D perception. On the other hand, spatial position emphasizes the perceptual attribute of crosstalk in 3-D perception when the left and right views are fused.
A. 2-D Perceptual Attributes
1) Shadow Degree of Crosstalk: We define this as the distinctness of crosstalk against the original view. As the shadow degree increases, crosstalk becomes more annoying. When viewing the Champagne and Dog presentations from top downwards in the first and third columns, it can be noticed that the shadow degree of crosstalk becomes stronger as the crosstalk level increases, which indicates that the crosstalk level relates to the shadow degree of crosstalk. Moreover, the shadow degree is more visible in the Champagne presentations than in the Dog presentations, owing to the different contrast structures of the two scenes. Thus, the contrast of the scene content also relates to the shadow degree of crosstalk. In fact, the contrast of the scene content and the crosstalk level jointly determine the shadow degree of crosstalk, which implies that the two-factor interaction between crosstalk level and contrast of scene content has a relationship with the shadow degree of crosstalk.
2) Separation Distance of Crosstalk: We define this as the distance by which the crosstalk is separated from the original view. Crosstalk becomes more annoying as the separation distance increases. When viewing the Champagne and Dog presentations from top downwards in the second and fourth columns, it can be noticed that the separation distance of crosstalk becomes larger with increasing camera baseline, which shows that camera baseline has a relationship with separation distance. Moreover, the separation distance of crosstalk is more visible in the Champagne presentations than in the Dog presentations, owing to their different relative depth structures; thus, the depth of the scene content also relates to the separation distance of crosstalk. In fact, the camera baseline and the relative depth structure of the scene content together, namely the disparity, determine the separation distance of crosstalk. This confirms that the two-factor interaction between camera baseline and depth of scene content relates to the separation distance of crosstalk.
3) Interaction Between 2-D Perceptual Attributes: If we
follow the changes of crosstalk level and camera baseline
together when viewing the Champagne and Dog presentations
from left to right in the first and third rows, it can be noticed
that the shadow degree and the separation distance of crosstalk
interact with each other. This reflects that the interaction between crosstalk
level and camera baseline has a relationship with the interaction
between the 2-D perceptual attributes. Moreover, less shadow
degree and separation distance of crosstalk can be perceived
in the Dog presentations than in Champagne when the same camera
baseline and crosstalk level changes are applied, because of the
differences in scene content, including both contrast and relative
depth structure. Thus, the scene content relates to the interaction
between the 2-D perceptual attributes. Furthermore, this
also confirms that the impact of crosstalk level and camera baseline
on crosstalk perception varies with the scene content. Thus,
the three-factor interaction between crosstalk level, camera baseline,
and scene content has a relationship with the interaction
between the 2-D perceptual attributes.

Fig. 4. Left eye view for scene contents Champagne and Dog with different combinations of camera baseline and crosstalk level. (a) 100 mm, 3%. (b) 50 mm,
13%. (c) 100 mm, 3%. (d) 50 mm, 13%. (e) 100 mm, 13%. (f) 100 mm, 13%. (g) 100 mm, 13%. (h) 100 mm, 13%. (i) 100 mm, 18%. (j) 150 mm, 13%. (k) 100
mm, 18%. (l) 150 mm, 13%.
B. 3-D Perceptual Attribute
Spatial position of crosstalk is defined as the impact of
crosstalk position in 3-D space on perception when the left
and right views are fused and 3-D perception is generated.
Specifically, we observed that the spatial position of crosstalk
only affects crosstalk that is already visible, i.e., crosstalk
satisfying the requirements on shadow degree and separation
distance. In our experiments, the crosstalk of foreground objects
usually has more impact on perception than that of background
objects, because the foreground objects are closer to the test
subjects and have larger disparity under the parallel camera
arrangement and rectification. Therefore, the relative depth
structure of the scene content in the region of visible crosstalk
relates to the perceptual attribute spatial position of crosstalk.
Additionally, focus of attention might also play an important
role behind this observation. However, in the data evaluated in
this work, the foreground objects were always also a priori the
focus of attention. In future work, we will further investigate
the influence of focus of attention on crosstalk perception.

TABLE VI
RELATIONSHIP BETWEEN PERCEPTUAL ATTRIBUTES OF CROSSTALK AND
SIGNIFICANT FACTORS: CROSSTALK LEVEL (CL), CAMERA BASELINE (CB),
CONTRAST OF SCENE CONTENT (SC_C), DEPTH OF SCENE CONTENT (SC_D),
AND BOTH CONTRAST AND DEPTH OF SCENE CONTENT (SC_CD)
C. Summary
Table VI lists the relationships between the perceptual attributes and the related factors explained earlier. As can be
seen from the table, the 2-D perceptual attributes cover all
of the significant factors in Table V, which indicates that the 2-D
perceptual attributes can characterize the low-level significant
factors at a more perceptual level of the HVS. However, the
table also shows that the 2-D perceptual attributes alone are not
enough to explain the visual perception of crosstalk; the 3-D
perceptual attribute should also be modeled to predict users'
perception of crosstalk. This indicates that an objective metric
built directly from the significant factors in Table V would not be
comprehensive. Nevertheless, selecting stimuli with distinct
visual variations corresponding to the significant factors indeed
reduces the complexity and facilitates observation of the
perceptual attributes.
Fig. 5. Illustrations of SSIM map on Champagne and Dog. (a) 100 mm, 3%. (b) 50 mm, 13%. (c) 100 mm, 3%. (d) 50 mm, 13%. (e) 100 mm, 13%. (f) 100 mm,
13%. (g) 100 mm, 13%. (h) 100 mm, 13%. (i) 100 mm, 18%. (j) 150 mm, 13%. (k) 100 mm, 18%. (l) 150 mm, 13%.
IV. OBJECTIVE QUALITY METRIC
An objective metric for crosstalk perception can be developed based on modeling 2-D and 3-D perceptual attributes of
crosstalk. Here, we will explain what kinds of existing maps can
reflect the perceptual attributes, how these maps are combined
to construct a perceptual objective metric, and the experimental
results of the metric.
A. 2-D Perceptual Attributes Map
The 2-D perceptual attributes were illustrated in Fig. 4 using
the left eye view with the crosstalk distortion added as in (4). It can
be noticed that the shadow degree, the separation distance of crosstalk,
and their interaction are most visible in the edge regions with
high contrast. The Structural SIMilarity (SSIM) quality measure
proposed by Wang et al. [23] can describe the 2-D perceptual
attributes of crosstalk to some extent. SSIM assumes that the
measurement of structural information provides a good estimate
of perceived image quality, because the HVS is highly
adapted to extracting structural information from a visual scene.
A MATLAB implementation of SSIM is available from [24];
it also provides an SSIM map of the test image,
which allows a closer look at specific regions instead of the
entire image. For combining the 2-D and 3-D perceptual attributes
in a single objective metric, the SSIM map, with a quality measure
for every pixel, is preferred over the single SSIM quality
value for the entire image.
SSIM is constructed based on the comparison of three components, luminance, contrast, and structure, between an original
image without any distortion and its degraded version. In our
case, the original image is the left view L shown on the stereoscopic
display without any crosstalk, and the distorted version L_c is the
one with both system-introduced and simulated crosstalk. The SSIM map is then defined as follows:

M_SSIM = f_SSIM(L, L_c)    (5)

where f_SSIM denotes the SSIM algorithm and M_SSIM
is the generated SSIM map of the left eye view.
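A per-pixel SSIM map such as (5) can be prototyped in a few lines. The sketch below is a minimal pure-NumPy approximation using uniform (box) windows; it is illustrative only, not the authors' MATLAB implementation from [24], which uses Gaussian weighting. The constants c1 and c2 are the standard SSIM values for images in the [0, 1] range.

```python
import numpy as np

def ssim_map(x, y, win=7, c1=0.01 ** 2, c2=0.03 ** 2):
    """Per-pixel SSIM map over sliding box windows (valid region only)."""
    def local_mean(a):
        # box filter computed with a 2-D integral image (cumulative sums)
        s = np.pad(np.cumsum(np.cumsum(a, axis=0), axis=1), ((1, 0), (1, 0)))
        return (s[win:, win:] - s[:-win, win:]
                - s[win:, :-win] + s[:-win, :-win]) / win ** 2

    mx, my = local_mean(x), local_mean(y)
    vx = local_mean(x * x) - mx * mx      # local variance of x
    vy = local_mean(y * y) - my * my      # local variance of y
    cxy = local_mean(x * y) - mx * my     # local covariance of x and y
    return ((2 * mx * my + c1) * (2 * cxy + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

Identical inputs yield a map of ones, while displaced or blended content lowers the map value along edges, which is the behavior the text describes for the crosstalk-added views.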
Fig. 5 is a representative illustration of the SSIM maps derived
from the crosstalk-distorted Champagne and Dog presentations
in Fig. 4. In the SSIM map, 0 (black) at a pixel means the largest
difference between the original and crosstalk-added images, and
1 (white) denotes no difference. For Champagne, it can be seen
that when the crosstalk level is larger, in the first column, the
shadow degree of crosstalk represented by the SSIM map is
darker. Also, when the camera baseline is larger, in the second
column, the separation distance of crosstalk represented by the
SSIM map becomes wider. In addition, their mutual interaction is also described by the SSIM map when the shadow degree
and separation distance change synchronously in the first and
third rows. The same holds for the Dog presentations.
Moreover, it can be seen that the different shadow degrees and separation distances of crosstalk for Champagne and Dog with the
same camera baseline and crosstalk level can also be expressed
by the SSIM map, which means that the scene content difference
is also characterized. Thus, the SSIM map can reflect these 2-D
perceptual attributes, namely, shadow degree of crosstalk, separation distance of crosstalk, and their interactions.

B. 3-D Perceptual Attribute Map

Spatial position of crosstalk describes users' perception of
crosstalk in 3-D space, which can be characterized because visible crosstalk of foreground objects has more impact on
perception than that of background objects. Therefore, in order to form
a 3-D perceptual attribute map, the depth structure of the scene content
and the region of visible crosstalk should be combined.

The relative depth structure of scene content can be represented
by the depth map. Depth estimation algorithms usually follow
one of two approaches: 1) from a single image using
monocular cues, and 2) from stereo or multiview images using stereo
(triangulation) cues. The latter is usually more accurate but requires the corresponding intrinsic and extrinsic camera parameters
of the stereo images. Since the performance of the metric relies
on the accuracy of the depth estimation algorithm, we adopt the
latter approach, and the Depth Estimation Reference Software
(DERS) [25], version 4.0, was employed in this paper. The depth
map M_depth of the original right eye view R is calculated as follows:

M_depth = f_DERS(R)    (6)

where f_DERS denotes the DERS algorithm proposed in [25] and
M_depth is the generated depth map of the right view. M_depth is
normalized to represent a relative 3-D depth, in which 0 denotes
the farthest depth value and 255 the nearest. Fig. 6 gives an example of the depth maps of Champagne and Dog. The farthest
and nearest depth values are 7.7 and 2.0 m for Champagne, and
8.2 and 2.5 m for Dog, respectively. However, they are both
normalized by the same factor of 5.7 m. It can be seen that the foreground object champagne is much brighter than the foreground in the Dog
presentation. Therefore, the foreground of Champagne is much
closer to its nearest depth plane than that of Dog.

Fig. 6. Illustrations of depth map of Champagne and Dog when camera baseline is 150 mm.

The region of visible crosstalk is also defined based on the
SSIM map, because we observed that crosstalk is more visible
in the regions where the pixel value of the SSIM map is smaller than
a threshold. A threshold of T = 0.977 was obtained experimentally. Therefore, the following equation is used
to define the filtered depth map as the 3-D perceptual attribute map:

M_fdepth(i, j) = M_depth(i, j) if M_SSIM(i, j) < T, and 0 otherwise    (7)

where i and j are the pixel indices, and M_fdepth denotes the filtered
depth map corresponding to the visible crosstalk region of the left
eye image.

Fig. 7. Illustrations of filtered depth map of Champagne and Dog when camera
baseline is 150 mm and crosstalk level is 3%.

C. Objective Metric for Crosstalk Perception

As mentioned earlier, the 2-D and 3-D perceptual attributes can
be represented by the SSIM and filtered depth maps, respectively. Therefore,
the overall crosstalk perception is supposed to be an integration of the two maps. Since the 3-D perceptual attribute indicates
that visible crosstalk of foreground objects has more impact on
perception than that of background objects, more weight should be assigned to the visible crosstalk of the foreground than of the background. In
other words, the SSIM map should be further weighted by the filtered
depth map. Thus, the integration is performed by the following
equations:

M_C(i, j) = M_SSIM(i, j) · (1 − M_fdepth(i, j)/255)    (8)

Q = mean(M_C)    (9)

where M_C and Q denote the combined map and the
quality value predicted by the objective metric, respectively,
and mean(·) denotes the averaging operation. In (8), the filtered depth
map M_fdepth is first normalized into the interval [0, 1] by the
maximum depth value 255, and then subtracted from 1 to
comply with the meaning of the SSIM map, in which a lower pixel
value means a larger crosstalk distortion. When
two pixels with identical values in the SSIM map are located in the
foreground and background, respectively, the M_C value of
the foreground pixel will be smaller than that of the background pixel
after combining with the filtered depth map.
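Assuming the reconstructed forms of equations (7)-(9), the combination step amounts to a threshold, a weighting, and an average. The following Python sketch illustrates it; the map arguments are placeholders for the SSIM and DERS outputs, and the names are ours, not from the paper:

```python
import numpy as np

# Visibility threshold on the SSIM map, as reported in the text above.
T = 0.977

def crosstalk_quality(ssim_map, depth_map, thr=T):
    """Combine the 2-D (SSIM) and 3-D (filtered depth) maps as in (7)-(9)."""
    # (7): keep depth only where crosstalk is visible (SSIM below threshold)
    fdepth = np.where(ssim_map < thr, depth_map, 0.0)
    # (8): down-weight visible crosstalk more strongly on near (bright) depth
    combined = ssim_map * (1.0 - fdepth / 255.0)
    # (9): average pooling yields the predicted quality value Q
    return float(combined.mean())
```

With this weighting, two pixels of equal SSIM value contribute differently to Q: the one lying on a foreground (large depth value) object pulls the score down further, matching the behavior described above.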
D. Experimental Results
The performance of an objective quality metric can be evaluated by a comparison with respect to the MOS values obtained
in subjective tests. The proposed metric was compared with
and
as well as other three
traditional 2-D metrics
metrics
,
, and
, which combine the 2-D and 3-D
perceptual attributes in different approaches as in the following
equations:
(10)
(11)
(12)
(13)
XING et al.: ASSESSMENT OF STEREOSCOPIC CROSSTALK PERCEPTION
335
TABLE VII
EVALUATION RESULTS OF DIFFERENT METRICS ON SUBJECTIVE DATASET
if
if
(14)
(15)
(16)
and
are the 2-D metrics calculated between
where
the original and crosstalk added left image, instead of original
left and right images.
is a combination of SSIM map
and the depth map
instead of the filtered depth map
as in the metric
. This means that
weights the entire
image while
only weights the region of visible crosstalk
in the image.
denotes the disparity map of the right eye
image, which is the result of a combination of relative depth
structure of scene content and camera settings, such as camera
baseline. Since
also contains the information about relative depth structure of scene content, we attempt to compare the
performance of the different metrics based on the disparity and
depth maps, respectively. In the equations, the filtered disparity
map
was obtained from
using the same approach as
from
, and metrics
and
followed the same
combination as in building
and
, respectively. Particularly, we adopted a stereo correspondence algorithm using the
Sum of Squared Difference plus Min Filter
to estimate the disparity map [26] in (13). The disparity map is a gray
image with black denoting the smallest disparity 0 pixel and
white being the largest 255 pixel.
For evaluating each metric Q_x, the root mean squared error
(RMSE), the Pearson correlation coefficient, and the Spearman
rank-order correlation coefficient have been selected as the
evaluation criteria. They are calculated between the subjective
scores MOS and the objective values Q_p obtained after a nonlinear regression using

Q_p = β1 / (1 + exp(−β2 (Q_raw − β3)))    (17)

as suggested by the VQEG, where β1, β2, and β3 are the regression coefficients, Q_raw is the raw
value calculated from metric Q_x, and exp(·)
is the exponential function. The main purpose of (17) is to map the values of each metric
to the range of the MOS. Table VII reports the evaluation results.
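As an illustration of the mapping in (17) and two of the evaluation criteria, here is a minimal NumPy sketch; the logistic form follows the reconstruction above and common VQEG practice, and in a real evaluation the coefficients b1-b3 would be fitted per metric (e.g., by least squares) rather than supplied by hand:

```python
import numpy as np

def logistic(q_raw, b1, b2, b3):
    """Nonlinear regression of (17): map raw metric values onto the MOS scale."""
    return b1 / (1.0 + np.exp(-b2 * (q_raw - b3)))

def rmse_pearson(mos, pred):
    """RMSE and Pearson correlation between subjective MOS and predictions."""
    rmse = float(np.sqrt(np.mean((mos - pred) ** 2)))
    pearson = float(np.corrcoef(mos, pred)[0, 1])
    return rmse, pearson
```

The Spearman coefficient is obtained the same way after replacing both series by their ranks; since the logistic mapping is monotonic, it does not change the Spearman value.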
According to the evaluation results, the proposed objective metric for
crosstalk perception, Q, achieves a higher correlation
with the subjective MOS values than the traditional 2-D metrics (Q_PSNR and Q_SSIM) and the other metrics (Q_C1,
Q_C2, and Q_C3). The performance of the proposed metric is
better than that of Q_PSNR and Q_SSIM, which indicates that a metric
taking 3-D characteristics into consideration can give a better
prediction of crosstalk perception than 2-D metrics. However,
different combinations of 3-D characteristics might have different prediction capabilities. It can be seen from Table VII that
the metric Q_C1 performs worse than its counterpart
Q. Moreover, the performance of Q and Q_C3 is better
than that of the corresponding metrics Q_C1 and Q_C2, respectively.
This indicates that weighting only the region of visible crosstalk
might be in accordance with users' perception. However,
Q_C3 exhibits slightly poorer performance than
the proposed metric Q, which implies that relative depth,
rather than absolute depth, is more suitable for the weighting operation. Therefore, Q, which models the perceptual attributes
of crosstalk, has the best performance. As the Pearson correlation of the proposed metric Q is 88.4%, it is promising for
evaluating the crosstalk perception of stereoscopic images.

Fig. 8. Scatter plot of MOS of crosstalk perception versus predicted values Q_p.

In order to have a closer look at the proposed metric of crosstalk
perception Q on our subjective dataset, we validated its performance on different scene contents. Fig. 8 shows the scatter
plot of the MOS values versus the predicted quality values Q_p for
different scene contents. Based on the experimental results, the
performance of the proposed metric does not differ significantly
between scene contents, while the impairment levels
can significantly influence the performance. In particular, the
proposed metric performs better in predicting the crosstalk
perception of stereoscopic images with low and high impairments than of images with medium impairments. We think that this
performance difference might originate from the filtered depth
map, where the dominating perception of the maximum crosstalk
at different impairment levels should be considered. However,
this conclusion needs to be further verified in future work.
V. CONCLUSION
We have conducted subjective tests of stereoscopic
crosstalk perception with varying scene content,
camera baseline, and crosstalk level. The statistical results
show that crosstalk level, camera baseline, and scene content
each have a significant impact on crosstalk perception,
and that they exhibit two-factor interactions in their impact
on crosstalk perception. Moreover, the perceptual attributes
(shadow degree, separation distance, and spatial position) of
crosstalk were summarized by observing the visual variations
as the significant factors (crosstalk level, camera baseline,
and scene content) change. These perceptual attributes are
the sensorial results of the HVS and are classified into two
categories: 2-D and 3-D perceptual attributes. Subsequently,
an objective metric for crosstalk perception has been proposed
by combining the SSIM map and the filtered depth map. The experimental results with respect to our subjective evaluation scores
have demonstrated promising performance of this metric,
achieving more than 88% correlation with the MOS results.
The performance of the proposed quality metric is better
than that of traditional 2-D models and of the other compared metrics
with different combination methods.
ACKNOWLEDGMENT
The authors would like to thank Prof. L. A. Rønningen for
kindly allowing us to use his lab Caruso for the subjective tests and
Dr. J. Xu for his helpful discussions and suggestions throughout
this work.
REFERENCES
[1] M. Lambooij, W. A. IJsselsteijn, M. Fortuin, and I. Heynderickx,
“Visual discomfort in stereoscopic displays: A review,” J. Imag. Sci.
Technol., vol. 53, no. 3, pp. 1–14, 2009.
[2] A. Boev, D. Hollosi, A. Gotchev, and K. Egiazarian, “Classification
and simulation of stereoscopic artifacts in mobile 3DTV content,” in
Proc. Stereoscopic Displays and Applications XX, San Jose, CA, 2009,
pp. 72371F–12.
[3] P. J. H. Seuntiens, L. M. J. Meesters, and W. A. IJsselsteijn, “Perceptual
attributes of crosstalk in 3-D images,” Displays, vol. 26, no. 4-5, pp.
177–183, 2005.
[4] A. Woods, “Understanding crosstalk in stereoscopic displays,” in Proc.
Conf. Three-Dimensional Syst. Applic., Tokyo, Japan, 2010.
[5] S. Pastoor, “Human factors of 3-D images: Results of recent research at
Heinrich-Hertz-Institut Berlin,” in Proc. Int. Display Workshop, 1995,
vol. 3, pp. 69–72.
[6] K. C. Huang et al., “A study of how crosstalk affects stereopsis in
stereoscopic displays,” in Proc. Conf. Stereoscop. Displays Virtual Reality Systems X, Santa Clara, CA, 2003, vol. 5006, pp. 247–253.
[7] L. Lipton, “Factors affecting ’ghosting’ in time-multiplexed
plano-stereoscopic CRT display systems,” True 3-D Imaging Tech.
Display Technol., vol. 761, pp. 75–78, 1987.
[8] F. L. Kooi and A. Toet, “Visual comfort of binocular and 3-D displays,”
Displays, vol. 25, pp. 99–108, 2004.
[9] A. Benoit, P. L. Callet, P. Campisi, and R. Cousseau, “Quality assessment of stereoscopic images,” EURASIP J. Image Video Process., vol.
2008, 2008.
[10] J. You, G. Jiang, L. Xing, and A. Perkis, “Quality of visual experience
for 3-D presentation: Stereoscopic image,” in High-Quality Visual
Experience: Creation, Processing and Interactivity of High-resolution and High-dimensional Video Signals. Berlin, Germany:
Springer-Verlag, 2010.
[11] R. Olsson and M. Sjostrom, “A depth dependent quality metric for evaluation of coded integral imaging based 3D-images,” in Proc. 3DTV
Conf.: True Vision–Capture, Transmission and Display of 3-D Video,
Kos Island, Greece, 2007.
[12] P. Gorley and N. Holliman, “Stereoscopic image quality metrics and
compression,” in Proc. Conf. Stereoscop. Displays Applic. XIX, San
Jose, CA, 2008, vol. 6803.
[13] Z. M. P. Sazzad, S. Yamanaka, Y. Kawayokeita, and Y. Horita,
“Stereoscopic image quality prediction,” in Proc. Int. Workshop
Quality of Multimedia Experience, San Diego, CA, 2009.
[14] D. Kim et al., “Depth map quality metric for three-dimensional video,”
in Proc. Stereoscop. Displays Applic. XX, San Jose, CA, 2009, vol.
7237.
[15] L. Xing, T. Ebrahimi, and A. Perkis, “Subjective evaluation of stereoscopic crosstalk perception,” in Proc. Conf. Vis. Commun. Image
Process., Huang Shan, China, 2010, vol. 7744, pp. 77441V–77441V-9.
[16] L. Xing, J. You, T. Ebrahimi, and A. Perkis, “A perceptual quality
metric for stereoscopic crosstalk perception,” in Proc. IEEE Int. Conf.
Image Process., Hong Kong, China, 2010, pp. 4033–4036.
[17] Methodology for the Subjective Assessment of the Quality of Television
Pictures, Recommendation BT.500-11, ITU-R, 2002.
[18] Subjective Assessment of Stereoscopic Television Pictures, Recommendation BT.1438, ITU-R, 2000.
[19] W. Chen, J. Fournier, B. Marcus, and L. C. Patrick, “New requirements
of subjective video quality assessment methodologies for 3DTV,” in
Proc. Int. Workshop Video Process. Quality Metrics Consum. Electron.,
Scottsdale, AZ, 2010.
[20] ISO/IEC JTC1/SC29/WG11, M15377, M15378, M15413, M15419,
Archamps, France, 2008.
[21] A. Boev, D. Hollosi, and A. Gotchev, "Software for simulation of
artefacts and database of impaired videos," Mobile3DTV Project Rep.
216503 [Online]. Available: http://mobile3dtv.eu
[22] L. Goldmann, F. D. Simone, and T. Ebrahimi, "A comprehensive database and subjective evaluation methodology for quality of experience
in stereoscopic video," in Proc. Three-Dimensional Image Process. Applic., San Jose, CA, 2010, vol. 7526.
[23] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image
quality assessment: From error visibility to structural similarity,” IEEE
Trans. Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004.
[24] SSIM Implementation [Online]. Available: http://www.ece.uwaterloo.ca/~z70wang/research/ssim/
[25] M. Tanimoto et al., “Reference softwares for depth estimation and
view synthesis,” Archamps, France, 2008, ISO/IEC JTC1/SC29/WG11
MPEG2008/M15377.
[26] D. Scharstein and R. Szeliski, "A taxonomy and evaluation of dense
two-frame stereo correspondence algorithms," Int. J. Comput. Vis., vol. 47,
no. 1-3, pp. 7–42, 2002.
Liyuan Xing (S'11) was born in China in 1981. She
received the M.S. degree in computer applied technology from the Institute of Computing Technology,
Chinese Academy of Sciences, Beijing, China, in
2006. She is currently working toward the Ph.D.
degree at the Centre for Quantifiable Quality of
Service (Q2S), Norwegian University of Science
and Technology (NTNU), Trondheim, Norway.
During her graduate work, she worked on content-based multimedia analysis. She was a Senior Software Engineer with the R&D Center of Toshiba in
China from 2006 to 2008, where she was involved with medical image processing and healthcare
enterprise integration. Her current research interests are
quality assessment and modeling within new application scenarios, namely in
the realm of 3-D, virtual environments, gaming, and 3DTV-like scenarios, with a special focus on stereoscopic media.
Ms. Xing was the recipient of the Chinese Academy of Sciences Liu Yongling
Scholarship Excellence Award in July 2006.
Junyong You (M’08) received the B.S. and M.S.
degrees in computational mathematics and Ph.D.
degree in electronics and information engineering
from Xi’an Jiaotong University, Xi’an, China, in
1998, 2001, and 2007, respectively.
He is currently a Research Scientist with the
Department of Electronics and Telecommunications,
Norwegian University of Science and Technology,
Trondheim, Norway. He was a Senior Researcher
with Tampere University of Technology, Finland,
from 2007 to 2009. He is the author or coauthor
of more than 45 articles for scientific books, journals and conferences. His
research interests include semantic multimedia content analysis, visual attention mechanism and modeling, vision technology, and multimedia quality
of experience definition, measurement and improvement. He is serving as
an editorial board member for two international journals and has served as
a TPC member for several international conferences and workshops. He
was the local arrangement co-chair of the Second International Workshop on
Quality of Multimedia Experience (QoMEX'10) and is the general co-chair
of the International Workshop on Multimedia Quality of Experience: Modeling,
Evaluation and Directions (MQoE'11).
Touradj Ebrahimi (M’08) received the M.Sc.
and Ph.D. degrees from the Swiss Federal Institute
of Technology (EPFL), Lausanne, Switzerland,
in 1989 and 1992, respectively, both in electrical
engineering.
In 1993, he was a Research Engineer with the
Corporate Research Laboratories, Sony Corporation,
Tokyo, Japan, where he conducted research on
advanced video compression techniques for storage
applications. In 1994, he served as a Research Consultant with AT&T Bell Laboratories working on
very low bitrate video coding. He is currently a Professor with EPFL, heading
its Multimedia Signal Processing Group. He is also an adjunct Professor with the
Center of Quantifiable Quality of Service at the Norwegian University of Science
and Technology, Trondheim, Norway. He has been very active in the field of
multimedia signal processing, with a special emphasis on quality evaluation
and metrics. In February 2001, he was the first to suggest that the notion of Quality
of Experience (QoE) be used in multimedia communication systems as a dual
and complement to Quality of Service (QoS). He has authored or coauthored
more than 40 publications in the field of quality metrics for various multimedia
content. He was an initiator of Advanced Image Coding within the JPEG standardization
committee, which develops subjective and objective quality metrics for future
imaging systems, and was one of those responsible for the subjective evaluations of
MPEG's next-generation video compression standard in March 2010. He was
one of the founding members, and the first general co-chair, of the International
Workshop on Quality of Multimedia Experience (QoMEX). He is the principal
investigator and Chair of the COST Action IC1003 (QUALINET), which
gathers more than 150 researchers in a consortium related to QoE issues in
multimedia systems and services.
Andrew Perkis (SM’02) was born in Norway
in 1961. He received the Siv.Ing and Dr. Techn.
degrees in 1985 and 1994, respectively, and the M.S.
degree in technology management, offered jointly
by the Norwegian University of Science and Technology, Trondheim, Norway, NHH, and the National
University of Singapore, in 2008.
He has been with the Department of Telecommunications, Norwegian University of Science and Technology,
Trondheim, Norway, since 1993, first as an Associate Professor
and, since 2003, as a Full Professor.
In 1999/2000, he was a Visiting Professor with The University of Wollongong,
Australia, and in 2008 a Visiting Professor with the National University of
Singapore. He is responsible for “Network Media Handling” within the National Centre of Excellence–Quantifiable Quality of Service in communication
systems at NTNU. Currently he is focusing on Multimedia Signal Processing,
specifically within methods and functionality of content representation, quality
assessment and its use within the media value chain in a variety of applications.
He is also involved in setting up directions and visions for new research within
media technology and art. Within applied research he is heavily involved
in multi-platform publishing, especially to handheld devices. He has been
involved with the start-up company Adactus and with the commercial aspects of the Digital
Cinema roll-out through running the Norwegian trial project NORDIC. He has
authored or coauthored more than 200 publications in international conferences
and workshops and has more than 50 contributions to International standards
bodies.
Dr. Perkis is member of The Norwegian Academy of Technological Sciences
(NTVA), the Association for Computing Machinery, The Norwegian Society of
Chartered Engineers (TEKNA), and The Norwegian Signal Processing Society
(NORSIG).