
Assessment of Stereoscopic Crosstalk Perception

2012, IEEE Transactions on Multimedia


Liyuan Xing, Student Member, IEEE, Junyong You, Member, IEEE, Touradj Ebrahimi, Member, IEEE, and Andrew Perkis, Senior Member, IEEE

Abstract—Stereoscopic three-dimensional (3-D) services do not always prevail when compared with their two-dimensional (2-D) counterparts, though the former can provide a more immersive experience with the help of binocular depth. Various specific 3-D artefacts might cause discomfort and severely degrade the Quality of Experience (QoE). In this paper, we analyze one of the most annoying artefacts in the visualization stage of stereoscopic imaging, namely, crosstalk, by conducting extensive subjective quality tests. A statistical analysis of the subjective scores reveals that both scene content and camera baseline have significant impacts on crosstalk perception, in addition to the crosstalk level itself. Based on the observed visual variations during changes in significant factors, three perceptual attributes of crosstalk are summarized as the sensorial results of the human visual system (HVS). These are shadow degree, separation distance, and spatial position of crosstalk. They are classified into two categories: 2-D and 3-D perceptual attributes, which can be described by a Structural SIMilarity (SSIM) map and a filtered depth map, respectively. An objective quality metric for predicting crosstalk perception is then proposed by combining the two maps. The experimental results demonstrate that the proposed metric has a high correlation (over 88%) when compared with subjective quality scores in a wide variety of situations.

Index Terms—Crosstalk perception, objective metric, perceptual attribute, subjective evaluation.

Manuscript received March 24, 2011; revised July 20, 2011; accepted October 01, 2011. Date of publication October 18, 2011; date of current version March 21, 2012. This work was supported by the Research Council of the Norwegian University of Science and Technology and UNINETT. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Weisi Lin. L. Xing, J. You, and A. Perkis are with the Centre for Quantifiable Quality of Service (Q2S) in Communication Systems, Norwegian University of Science and Technology (NTNU), N-7491 Trondheim, Norway (e-mail: liyuan@q2s.ntnu.no; junyong.you@ieee.org; andrew@iet.ntnu.no). T. Ebrahimi is with the Multimedia Signal Processing Group (MMSPG), Ecole Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland (e-mail: touradj.ebrahimi@epfl.ch). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TMM.2011.2172402.

I. INTRODUCTION

Stereoscopic three-dimensional (3-D) imaging is based on simultaneously capturing a pair of two-dimensional (2-D) images and then separately delivering them to the respective eyes. Consequently, 3-D perception is generated in the human visual system (HVS). Although stereoscopic 3-D services introduce a new modality (binocular depth) that can offer an increasingly richer experience (immersion and realism) to end-users, they do not always prevail when compared with their 2-D counterparts. One of the major drawbacks of stereoscopic 3-D services is visual discomfort, which can potentially cause users to feel uncomfortable and severely degrade the viewing experience.
The importance of various causes and aspects of visual discomfort is clarified in [1]. In particular, 3-D artefacts are considered to be one of the most prominent factors contributing to visual discomfort. Such artefacts can be introduced at each stage from acquisition to restitution in a typical 3-D processing chain [2]. In particular, crosstalk is one of the most annoying distortions in the visualization stage of a stereoscopic imaging system [3]. Crosstalk is produced by imperfect view separation, which causes a small proportion of one eye's image to be seen by the other eye as well. Crosstalk artefacts are usually perceived as ghosts, shadows, or double contours by human subjects.

Nowadays, crosstalk exists in almost all stereoscopic displays. However, the mechanisms behind the occurrence of crosstalk can be significantly different across stereoscopic display technologies. These mechanisms have been analyzed in order to characterize and measure the components contributing to crosstalk, so that crosstalk reduction can be achieved by reducing the effect of one or more of these components. Since it is not possible to completely eliminate display crosstalk with current technologies, researchers also attempt to conceal the crosstalk of a 3-D presentation using image processing methods before display; such methods are usually referred to as crosstalk cancellation. Crosstalk cancellation does not always perform efficiently in all situations. These issues have been widely investigated in the literature, e.g., see the review in [4]. However, neither of the aforementioned approaches can completely eliminate crosstalk artefacts. Therefore, it is beneficial to study how users perceive crosstalk in 3-D presentations. Comparatively few research efforts have been devoted to this topic.

In [5], a visibility threshold of crosstalk for different amounts of disparity and image contrast ratios in grayscale patches is provided. It shows that the visibility of crosstalk increases with increasing image contrast and disparity. However, a stereoscopic presentation is the result of a combination of different contrasts and disparities per pixel over an entire image, and it is more practical to know how much crosstalk is perceived once it is visible. Therefore, the authors of [3] investigated more realistic scenarios where natural scenes varying in crosstalk levels (0%, 5%, 10%, and 15%) and camera baselines (0, 4, and 12 cm) affect the perceptual attributes of crosstalk (perceived image distortion, perceived depth, and visual discomfort). However, only two, rather similar, natural scenes were used in their experiments. More scene contents with different depth structures and image contrasts should be taken into account when designing a subjective experiment, because the depth structure of the scene content together with the camera baseline can in principle determine the disparity, one of the major factors impacting crosstalk visibility [5]. Moreover, the authors of [6] found that monocular cues of images also play an important role in crosstalk perception, in addition to contrast ratio and disparity. In [7], it is shown that edges and high contrast of computer-generated wire-frames make crosstalk more visible when compared with natural images. This means that crosstalk can be more efficiently concealed in images with more texture or details.
These observations partially support the hypothesis that scene content is an important factor impacting users' perception of crosstalk. Although other artefacts, e.g., blur and vertical disparity as investigated in [8], may also have an impact on crosstalk perception, they can often be corrected by postprocessing techniques.

Although subjective testing is the most reliable way to evaluate perceived quality, it is time-consuming, expensive, and unsuitable for real-time applications. To deal with these drawbacks, objective metrics that can predict human subjects' judgment with a high correlation are desired. To develop good objective metrics, the perception mechanisms need to be well understood and taken into account. However, this is usually fairly difficult. Therefore, the development of objective 3-D quality models is still in its early stages. Researchers first started by exploring whether or not traditional 2-D metrics can be applied to stereoscopic quality assessment [9], [10]. Subsequently, a few objective metrics [11]–[13] that take into account the characteristics of stereoscopic images have been proposed. However, most of the existing objective metrics are designed to assess quality degradations caused by lossy compression schemes. To the best of our knowledge, only one objective metric that considers noncompression quality degradations, induced during the acquisition and display stages of stereoscopic media, has been proposed [14]. This metric is modeled by a linear combination of three measurements, which evaluate perceived depth, visual fatigue, and temporal consistency, respectively.

In this paper, subjective tests [15] have been conducted to collect evaluation scores of crosstalk perception for a wide range of 3-D stimuli, including different scene contents, camera baselines, and crosstalk levels. Thereby, a comprehensive database of crosstalk perception for a wide variety of situations has been created. Furthermore, based on a statistical analysis of the subjective scores, scene content, camera baseline, and crosstalk level are found to have significant impacts on the perception of crosstalk. By changing the amplitude of the significant factors, three perceptual attributes of crosstalk in the HVS have been observed. These perceptual attributes are further used to design an objective quality metric [16] for crosstalk perception. The main contributions of the paper are twofold. First, our subjective tests provide a comprehensive database for crosstalk perception in stereoscopic images. Second, users' subjective perception of crosstalk is predicted using an objective metric based on a rigorous analysis of the perceptual attributes of crosstalk.

The remainder of this paper is organized as follows. In Section II, we present the subjective tests on crosstalk perception as well as a statistical analysis of the subjective scores. In Section III, the perceptual attributes of crosstalk are explained by observing the visual variations of the stimuli when several significant factors change. Furthermore, a perceptual objective metric for crosstalk perception is proposed by describing the perceptual attributes of crosstalk, and the experimental results are reported in Section IV. Finally, concluding remarks are given in Section V.

Fig. 1. Polarized display system used in subjective tests.

II. SUBJECTIVE TESTS ON CROSSTALK PERCEPTION
Several recommendations for the subjective evaluation of visual stimuli have been issued by the International Telecommunication Union (ITU), e.g., the widely used ITU-R BT.500 [17] for television pictures. For the subjective evaluation of stereoscopic television pictures, ITU-R BT.1438 [18] has made a few first steps, but it still lacks many details. The authors of [19] have summarized these gaps in the form of additional requirements. In our subjective test, we followed these methodologies and further customized them for crosstalk perception. In the following, we provide details about the laboratory environment where the subjective tests were conducted, how the test stimuli were prepared, which test method was adopted, and what results were obtained from the subjective tests.

A. Laboratory Environment

1) Display System: A polarization technique was used to present 3-D images, as illustrated in Fig. 1. Specifically, two Canon XEED SX50 projectors with a resolution of 1280 × 960 were placed on a Chief ASE-2000 Adjusta-Set Slide Stacker. The stacker can be adjusted with swivel, tilt, and leveling ranges. Two Cavision linear polarizing glass filters with a size of 4 in × 4 in were installed orthogonally in front of the projectors. In this way, two views were projected and superimposed onto the backside of a 2.40 m × 1.20 m silver screen. The projection distance between the projectors and the silver screen was about 2 m, forming a projected region occupying the central area of the silver screen with a width of 1.12 m and a height of 0.84 m. Images up-sampled by a bicubic interpolation method were displayed in full-screen mode. The subjects, equipped with polarized glasses, were asked to view the 3-D images on the opposite side of the silver screen. The viewing distance was set to about five times the image height (5 × 0.84 m), as suggested in [17]. The field of view (FOV) was thus about 15°.

2) Alignment of Display System: Prior to the tests, the display system was calibrated to align the two projectors. In particular, the positions of the two projectors were adjusted to guarantee that the center points of the projectors, the projected region, and the silver screen were positioned on the same horizontal line (the center horizontal line shown in Fig. 1) and that this line was perpendicular to the silver screen. Moreover, the angles of the stacker and the keystones of the projectors were adjusted with the help of projected Hermann grid images. The adjustment of the display system was finished once the two Hermann grid images from the left and right projectors overlapped exactly.

3) Measurement of System-Introduced Crosstalk: After the alignment, the system-introduced crosstalk was measured immediately. As discussed in [4], the terminology and mathematical definitions of crosstalk are diverse and sometimes contradictory. We adopt the definition of system-introduced crosstalk as the degree of unexpected light leakage from the unintended channel to the intended channel. In particular, we measured the leakage in the situation where the left and right test images have the maximum difference in brightness.
The system-introduced crosstalk is measured mathematically as follows:

c_{L→R} = [Y_g(I_{WB}) − Y_g(I_{BB})] / [Y_s(I_{WB}) − Y_s(I_{BB})]    (1)

where I_{WB} denotes a pair of test images in which the left image is completely white and the right image is completely black, I_{BB} is another image pair in which both images are black, Y_s(·) denotes the luminance measured on the silver screen, and Y_g(·) denotes the luminance measured behind the right lens of the polarized glasses held against the silver screen. Therefore, c_{L→R} denotes the system-introduced crosstalk from the left channel to the right, which is approximately 3% in our experiments. The consistency of the system-introduced crosstalk of the polarized display was also verified across the display surface, between the two projectors, and for different combinations of brightness in the left and right test images.

4) Room Conditions: The test room had a length of 11.0 m, a width of 5.6 m, and a height of 2.7 m. During the subjective tests, all of the doors and windows of the test room were closed and covered by black curtains. In addition, the lights in the room were turned off except for one reading lamp on a desk in front of the subject, which was used to illuminate the keyboard when entering subjective scores. In this way, subjects could concentrate on the 3-D perception rather than on finding the keys when entering their scores.

B. Test Stimuli

Scene content and camera baseline are requisite factors for stereoscopic imaging and also affect users' perception of crosstalk. Therefore, scene content, camera baseline, and crosstalk level were selected as the three observed factors in the subjective tests of crosstalk perception. In particular, three camera baselines and four crosstalk levels were applied to six scene contents, which resulted in 72 test stimuli in total.

1) Scene Content: Seven multiview sequences (one for training) from MPEG [20] were chosen as representative scene contents, as shown in Fig. 2. These scene contents cover a wide range of depth structures, contrasts, colors, edges, and textures, which were considered as potential factors impacting users' perception of crosstalk. In particular, a wide range of depth structures was obtained by including both indoor and outdoor scenes.

2) Camera Baseline: Three camera baselines were formed from four consecutive cameras. The leftmost camera always served as the left eye view, and the other three cameras took turns as the right eye view of the 3-D images. In this way, three 3-D images with different camera baselines were generated for each scene. Table I gives more information about the selected cameras and the resulting camera baselines of the 3-D images.

Fig. 2. Visual samples of the selected scenes. (a) Book arrival. (b) Champagne. (c) Dog. (d) Love bird. (e) Outdoor. (f) Pantomime. (g) Newspaper.

TABLE I. NUMBER OF THE SELECTED CAMERAS FROM LEFT TO RIGHT AND THE RESULTING CAMERA BASELINES

3) Crosstalk Level: In order to simulate different levels of system-introduced crosstalk for different displays, crosstalk artefacts were added to the three 3-D image pairs of each scene, with four different crosstalk levels introduced using the algorithm developed in [21]. This algorithm can be summarized by the following equations:

L_a(x, y) = L(x, y) + ε_a · R(x, y),  R_a(x, y) = R(x, y) + ε_a · L(x, y)    (2)

where L and R denote the original left and right views, L_a and R_a are the distorted views simulating system-introduced crosstalk distortions, and the parameter ε_a adjusts the level of the simulated crosstalk distortion.
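For illustration, a minimal sketch of this simulation step is given below, assuming 8-bit RGB views stored as NumPy arrays; the clipping to the valid range and the function name are our own assumptions for this sketch rather than details taken from [21].

import numpy as np

def add_simulated_crosstalk(left, right, epsilon_a):
    """Leak a fraction epsilon_a of each view into the other, cf. (2).

    `left` and `right` are H x W x 3 uint8 arrays; the result is clipped
    to the valid 8-bit range (the clipping is an assumption of this sketch).
    """
    left_f = left.astype(np.float64)
    right_f = right.astype(np.float64)
    left_xt = np.clip(left_f + epsilon_a * right_f, 0, 255).astype(np.uint8)
    right_xt = np.clip(right_f + epsilon_a * left_f, 0, 255).astype(np.uint8)
    return left_xt, right_xt

# The simulated levels used in the tests were 0, 5%, 10%, and 15%; the display
# itself contributes roughly 3%, giving overall levels of 3% to 18%.
# for eps in (0.00, 0.05, 0.10, 0.15):
#     left_xt, right_xt = add_simulated_crosstalk(left_view, right_view, eps)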
According to (2), the simulation algorithm mimics the behavior of the system-introduced crosstalk of a polarized display by applying the same leakage percentage to all pixels of the entire image, to both the left and right views, and regardless of the brightness of the pixels. In our experiments, the system-introduced crosstalk is 3%, which is added on top of the crosstalk simulated in the image pairs in (2). Therefore, the left view of the overall crosstalk perceived by the users is

L_o(x, y) = L_a(x, y) + ε_d · R_a(x, y) = (1 + ε_d · ε_a) · L(x, y) + (ε_d + ε_a) · R(x, y)    (3)

where ε_d is the system-introduced crosstalk of the display, ε_a is the simulated crosstalk, and the overall system-introduced crosstalk ε_o combines both of them (an analogous expression holds for the right view). As crosstalk caused by stereoscopic techniques usually ranges from 0 to 15% [5] and the image quality might be very low if the crosstalk level is large, e.g., over 15% [3], the parameter ε_a was set to 0, 5%, 10%, and 15%, respectively, in our subjective tests. Thus, the overall crosstalk levels in our experiments were actually ε_d + ε_a, i.e., 3%, 8%, 13%, and 18%, respectively. As the maximum pixel value change caused by the cross term ε_d · ε_a in (3) is only about one grey level (0.03 × 0.15 × 255 ≈ 1), its effect can be ignored. Therefore, the overall system-introduced crosstalk is simulated based on an additive rule:

ε_o = ε_d + ε_a    (4)

Equation (4) indicates that different simulated crosstalk levels can be applied to a stereoscopic display with a consistent system-introduced crosstalk level. Therefore, the term crosstalk level will refer to the overall crosstalk level ε_o in the rest of this work.

C. Test Methodology

1) Single Stimulus: Among the different methodologies for the subjective quality assessment of Standard Definition TeleVision (SDTV) pictures in ITU-R BT.500 [17], three widely used methodologies are the double stimulus continuous quality scale (DSCQS), the double stimulus impairment scale (DSIS), and single stimulus (SS). In this study, as several camera baselines have been taken into account for each scene, it is difficult to choose an original 3-D image as the reference. Therefore, we adopted the SS method. The SS method has also been used in the literature to assess the quality of stereoscopic images with varying camera baselines [3], [22]. In addition, in order for subjects to have sufficient time to generate their 3-D perception and to explore the still 3-D images extensively, a minor modification was made to the SS method such that the subjects could freely decide the viewing time for each image, as in [3].

2) Test Interface: In order to support the adaptive SS methodology, a special interface was developed to conveniently display the stereoscopic images in a random order. A subject could freely decide when to move to the next image pair by pressing the "Ctrl" key on the keyboard. The score of the current image pair was recorded by pressing a numerical key instead of writing on an answer sheet. Other special considerations, such as displaying in full screen, disabling unnecessary keys, updating the scores, and so on, were also included in the developed interface.

3) Subjects: Before the training sessions, visual-perception-related characteristics of the subjects were collected, including pupillary distance (measured by a ruler), normal or corrected binocular vision (tested by the Snellen chart), color vision (tested by the Ishihara test), and stereo vision (tested by the TV-04 and TV-07 charts in ITU-R BT.1438 [18]).
A total of 28 subjects participated in the tests, consisting of 15 males and 13 females, aged from 23 to 46 years. The binocular vision of all of the subjects was above 0.80, with a mean of 1.05 and a standard deviation of 0.28. Although seven subjects had monocular vision differences of either 0.4 or 0.2, all of the subjects could perceive binocular depth.

4) Training Sessions: Subjects participated in both training and test sessions individually. During the training sessions, examples of the five categorical adjectival levels (see Table II) were shown to each subject in order to benchmark and harmonize their measuring scales. The Book Arrival scene was selected by expert viewers in such a way that each quality level was represented by an example image and that these example images covered the full range of quality levels within the set of test stimuli. When each example image was displayed, the operator verbally explained the corresponding quality level to the subject. In addition, a detailed explanation of every scale was provided to the subjects in the form of written instructions (see Table II). Subjects were encouraged to view the representative examples for as long as they wished and to ask questions if they needed any further clarification. The training sessions continued until the subjects could understand and distinguish the five different quality levels.

TABLE II. EXPLANATIONS OF THE FIVE CATEGORICAL ADJECTIVAL LEVELS AND THEIR TRAINING EXAMPLES FROM BOOK ARRIVAL

5) Test Sessions: During the test sessions, subjects were first presented with three dummy 3-D images from the Book Arrival content, which were not used in the training sessions. These dummy images were used to stabilize the subjects' judgment, and the corresponding scores were not included in the subsequent data analysis. Following the dummy images, the 72 test images were shown to the subjects in random order. A new 3-D image was shown after a subject had entered his/her score for the previous one. During the test period, the subjects were not allowed to ask questions, in order to avoid any interruption during the entire session.

D. Subjective Results Analysis

The subjective scores of the 72 test stimuli given by the 28 subjects are analyzed here. In particular, we aim to analyze the relationship between crosstalk perception and three potentially significant factors: scene content, camera baseline, and crosstalk level.

1) Normality Test and Outlier Removal: In order to use the arithmetic mean value as the mean opinion score (MOS) and apply parametric statistical analysis methods, such as analysis of variance (ANOVA), the normality of the subjective scores across subjects needs to be validated. The test recommended in [17], based on calculating the kurtosis coefficient, was adopted for the normality test. We classified the test results into three groups, normal, close to normal, and abnormal, according to the range in which the kurtosis coefficient falls. If the total proportion of normal and close-to-normal cases was more than 80%, we assumed that the subjective scores in our tests follow a normal distribution. The results showed that the scores of the majority of stimuli (55 of 72) were normally distributed and some (11 of 72) were close to normal, while the others (6 of 72) were not. Therefore, we can assume that the subjective scores follow a normal distribution. A screening of subjects was also performed according to the guidelines in [17]: subjects who produced votes significantly distant from the average scores were to be removed.
Consequently, one outlier was detected and the corresponding results were excluded from the following analysis.

2) Observations: After removing the outlier, the MOS and the 95% confidence interval (CI) were computed and plotted as a function of camera baseline and crosstalk level for each of the six scene contents separately, as shown in Fig. 3. A number of observations can be made based on the results in these plots.

Fig. 3. MOS and CI (95% significance) of subjective scores on crosstalk perception for the different scene contents.

Generally speaking, the MOS values decrease as the level of crosstalk distortion increases. However, the decrease of the MOS for Dog and Pantomime is not as pronounced as in the other four scenes. When considering the impact of camera baseline, we can observe a general tendency of the MOS values of crosstalk perception to decrease with increasing camera baseline. However, while this tendency is significant for the near indoor scenes (Champagne and Newspaper), it is less significant for the others, especially for Dog and Pantomime. Therefore, we can summarize the observations as follows:
• observation i: crosstalk level and camera baseline have an impact on crosstalk perception.
• observation ii: the impact of crosstalk level and camera baseline on crosstalk perception varies with scene content.
In addition, the individual curves in Fig. 3 show that even the highest MOS values of Champagne and Newspaper are still below 4, which indicates that the system-introduced crosstalk is more perceptible in close-up scenes. Furthermore, we also noticed that there exist exceptions where the MOS values increase with increasing camera baseline and crosstalk level. Hence, crosstalk perception might be influenced by other perceptual attributes, such as perceived depth.

3) Statistical Analysis: In order to verify the observations and evaluate the impact of the independent variables (scene content, camera baseline, crosstalk level) on the dependent variable (crosstalk perception), we utilized ANOVA to analyze the subjective scores obtained in our tests. ANOVA is a general technique that can be used to test the equality hypothesis of means among two or more groups. These groups are classified by factors (independent variables whose settings are controlled and varied by the operator) or levels (the intensity settings of a factor). An n-way ANOVA treats n factors, and the null hypotheses include: 1) there is no difference in the means of each factor, and 2) there is no interaction between the factors. The null hypothesis is verified using the F-test and can easily be judged by the p-value. When the p-value is smaller than 0.05, the null hypothesis is rejected, which means that there is a significant difference in means. In particular, either there is a significant difference between the levels of a factor, such that the factor has a significant effect, or the difference between the levels of one factor is not the same across the levels of another factor, such that there is an interaction between the factors. We used the Statistical Package for the Social Sciences (SPSS) statistics toolbox for our analysis. Tables III–V show the ANOVA results for the different factors. In these results, a marked entry indicates that the corresponding factor has a significant effect on crosstalk perception or that multiple factors interact in terms of their impact on crosstalk perception, while "—" means that the factor has no significant effect or that there is no interaction between the factors.
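For reference, the same kind of analysis can be reproduced with open-source tools. The sketch below is only an illustration of the n-way ANOVA described above, not the SPSS procedure actually used in this work; the file layout and column names are hypothetical.

import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Hypothetical layout: one row per subject and stimulus with columns
# subject, scene, baseline_mm, crosstalk_pct, score
data = pd.read_csv("subjective_scores.csv")

# Three-way ANOVA with all interaction terms; the factors are categorical.
model = smf.ols(
    "score ~ C(scene) * C(baseline_mm) * C(crosstalk_pct)", data=data
).fit()
table = anova_lm(model, typ=2)

# A factor (or interaction) is treated as significant when its p-value < 0.05.
print(table[["F", "PR(>F)"]])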
TABLE III. IMPACT OF CROSSTALK LEVEL (CL) AND CAMERA BASELINE (CB) ON CROSSTALK PERCEPTION FOR EACH SCENE

TABLE IV. IMPACT OF SCENE CONTENT (SC) ON CROSSTALK PERCEPTION BETWEEN EVERY TWO SCENES

TABLE V. IMPACT OF CROSSTALK LEVEL (CL), CAMERA BASELINE (CB), AND SCENE CONTENT (SC) ON CROSSTALK PERCEPTION FOR ALL THE SCENES

When considering observation i, we first tested the impact of crosstalk level and camera baseline on crosstalk perception for each scene content. As shown in Table III, both crosstalk level and camera baseline generally have a significant impact on crosstalk perception in each scene content. However, an exception is that camera baseline has no significant impact on crosstalk perception for Pantomime. In addition, for most scenes, except for Champagne and Pantomime, crosstalk level and camera baseline have an interaction in terms of their impact on crosstalk perception.

Regarding observation ii, the impact of scene content on crosstalk perception between every two scenes is reported in Table IV. It can be seen that there is a significant difference between scene contents in terms of crosstalk perception for most scene content pairs. However, there are two exceptional pairs: Champagne and Newspaper, as well as Outdoor and Dog. In other words, there is no significant difference between Champagne and Newspaper when their crosstalk perception is considered. The same argument also applies to Outdoor and Dog, although it may seem from Fig. 3 that Pantomime and Dog are similar.

All of these observations can be further verified if we consider the three factors together for crosstalk perception on the whole set of test stimuli. Table V shows that crosstalk level, camera baseline, and scene content each have a significant impact on crosstalk perception, and that they have two-factor interactions in terms of their impact on crosstalk perception. However, the three-factor interaction does not have a significant impact on crosstalk perception.

III. UNDERSTANDING OF CROSSTALK PERCEPTION

After identifying the significant factors, their relationship with the perceptual attributes of crosstalk can be modeled. Because the perceptual attributes of crosstalk are the sensorial results of the HVS and are closer to the perceptual viewpoint, the gap between low-level significant factors and high-level users' perception of crosstalk can be bridged. Ten test stimuli with different amplitudes of the significant factors were selected to represent the perceptual attributes of crosstalk, as shown in Fig. 4. The red rectangular regions highlight the regions selected in the images for the sake of the discussion of crosstalk, and have been enlarged and placed in a top right or left corner of each image. These stimuli consist of two scene contents (Champagne and Dog), to which five combinations of camera baselines and crosstalk levels were applied. Specifically, the selected scene contents have comparatively large differences in depth structure and image contrast. When these test stimuli were perceived on a stereoscopic display in a certain order of changing significant factors, we summarized the visual variations of crosstalk into its perceptual attributes, which are shadow degree, separation distance, and spatial position of crosstalk. Shadow degree and separation distance are 2-D perceptual attributes existing in a single eye view, and they are still maintained in 3-D perception. On the other hand, spatial position emphasizes the perceptual attribute of crosstalk in 3-D perception when the left and right views are fused.

A. 2-D Perceptual Attributes
Fig. 4. Left eye view for scene contents Champagne and Dog with different combinations of camera baseline and crosstalk level. (a) 100 mm, 3%. (b) 50 mm, 13%. (c) 100 mm, 3%. (d) 50 mm, 13%. (e) 100 mm, 13%. (f) 100 mm, 13%. (g) 100 mm, 13%. (h) 100 mm, 13%. (i) 100 mm, 18%. (j) 150 mm, 13%. (k) 100 mm, 18%. (l) 150 mm, 13%.

1) Shadow Degree of Crosstalk: We define this as the distinctness of the crosstalk against the original view. If the shadow degree increases, crosstalk becomes more annoying. When viewing the Champagne and Dog presentations from top downwards in the first and third columns of Fig. 4, it can be noticed that the shadow degree of crosstalk becomes stronger with increasing crosstalk level. This indicates that the crosstalk level relates to the shadow degree of crosstalk. Moreover, the shadow degree is more visible in the Champagne presentations than in the Dog presentations. This is due to the different contrast structures of the Champagne and Dog presentations; thus, the contrast of the scene content also relates to the shadow degree of crosstalk. In fact, the contrast of the scene content and the crosstalk level jointly determine the shadow degree of crosstalk, which implies that the two-factor interaction between crosstalk level and contrast of scene content has a relationship with the shadow degree of crosstalk.

2) Separation Distance of Crosstalk: We define this as the distance by which the crosstalk is separated from the original view. Crosstalk becomes more annoying with increasing separation distance. When viewing the Champagne and Dog presentations from top downwards in the second and fourth columns, it can be noticed that the separation distance of crosstalk becomes larger with increasing camera baseline. This indicates that the camera baseline has a relationship with the separation distance of crosstalk. Moreover, the separation distance of crosstalk is more visible in the Champagne presentations than in the Dog presentations. This is due to the different relative depth structures of the Champagne and Dog presentations; thus, the depth of the scene content also relates to the separation distance of crosstalk. Actually, the camera baseline and the relative depth structure of the scene content together, namely the disparity, determine the separation distance of crosstalk. This confirms that the two-factor interaction between camera baseline and depth of scene content relates to the separation distance of crosstalk.

3) Interaction Between 2-D Perceptual Attributes: If we pay attention to the change of crosstalk level and camera baseline together, when viewing the Champagne and Dog presentations from left to right in the first and third rows, it can be noticed that the shadow degree and separation distance of crosstalk interact with each other. This reflects that the interaction between crosstalk level and camera baseline has a relationship with the interaction between the 2-D perceptual attributes. Moreover, less shadow degree and separation distance of crosstalk can be perceived in the Dog presentations than in Champagne when the same changes of camera baseline and crosstalk level are applied, because of the differences in scene content, including both contrast and relative depth structure. Thus, scene content relates to the interaction between the 2-D perceptual attributes.
Furthermore, this also confirms that the impact of crosstalk level and camera baseline on crosstalk perception varies with the scene content. Thus, the three-factor interaction between crosstalk level, camera baseline, and scene content has a relationship with the interaction between the 2-D perceptual attributes.

B. 3-D Perceptual Attribute

Spatial position of crosstalk is defined as the impact of the position of the crosstalk in 3-D space on perception when the left and right views are fused and 3-D perception is generated. Specifically, we observed that the spatial position of crosstalk only impacts the visible crosstalk satisfying the requirements of shadow degree and separation distance of crosstalk. In our experiments, the crosstalk of foreground objects usually has more impact on perception than that of background objects, due to the fact that the foreground objects are closer to the test subjects and have larger disparity because of the parallel camera arrangement and rectification. Therefore, the relative depth structure of the scene content in the region of visible crosstalk relates to the perceptual attribute spatial position of crosstalk. Additionally, focus of attention might also play an important role behind these observations. However, in the data evaluated in this work, the foreground objects were always also a priori the focus of attention. In future work, we will further investigate the influence of focus of attention on crosstalk perception.

TABLE VI. RELATIONSHIP BETWEEN PERCEPTUAL ATTRIBUTES OF CROSSTALK AND SIGNIFICANT FACTORS: CROSSTALK LEVEL (CL), CAMERA BASELINE (CB), CONTRAST OF SCENE CONTENT (SC_C), DEPTH OF SCENE CONTENT (SC_D), AND BOTH CONTRAST AND DEPTH OF SCENE CONTENT (SC_CD)

C. Summary

Table VI lists the relationships between the perceptual attributes and the related factors, as explained earlier. As can be seen from the table, the 2-D perceptual attributes cover all of the significant factors in Table V, which indicates that the 2-D perceptual attributes can characterize the low-level significant factors while being at a more perceptual level of the HVS. Moreover, the table also shows that the 2-D perceptual attributes alone are not enough to explain the visual perception of crosstalk. Thus, the 3-D perceptual attribute should be modeled to predict users' perception of crosstalk. This indicates that an objective metric built directly from the significant factors in Table V would not be comprehensive. However, selecting those stimuli with distinct visual variations corresponding to the significant factors indeed reduces the complexity and facilitates the observation of the perceptual attributes.

IV. OBJECTIVE QUALITY METRIC

An objective metric for crosstalk perception can be developed based on modeling the 2-D and 3-D perceptual attributes of crosstalk. Here, we explain what kinds of existing maps can reflect the perceptual attributes, how these maps are combined to construct a perceptual objective metric, and the experimental results of the metric.

A. 2-D Perceptual Attributes Map

The 2-D perceptual attributes were illustrated in Fig. 4 using the left eye view with crosstalk distortion added as in (4).
It can be noticed that the shadow degree, the separation distance of crosstalk, and their interaction are most visible in edge regions with high contrast. The Structural SIMilarity (SSIM) quality measure proposed by Wang et al. [23] can describe the 2-D perceptual attributes of crosstalk to some extent. SSIM assumes that a measurement of structural information provides a good estimation of perceived image quality, because the HVS is highly adapted to extract structural information from a visual scene. A MATLAB implementation of SSIM is accessible from [24]. In addition, an SSIM map of a test image is also provided, which allows a closer look at specific regions instead of the entire image. Considering the combination of 2-D and 3-D perceptual attributes in a single objective metric, the SSIM map with a quality measure for every pixel is preferred over the SSIM index with a single quality measure for the entire image. SSIM is constructed based on the comparison of three components, luminance, contrast, and structure, between an original image without any distortions and its degraded version. In our case, the original image is the left view shown on the stereoscopic display without any crosstalk, L, and the distorted version is the left view with both system-introduced and simulated crosstalk, L_o. Finally, the SSIM map is defined as follows:

M_SSIM = f_SSIM(L, L_o)    (5)

where f_SSIM denotes the SSIM algorithm and M_SSIM is the generated SSIM map of the left eye view.

Fig. 5. Illustrations of the SSIM map on Champagne and Dog. (a) 100 mm, 3%. (b) 50 mm, 13%. (c) 100 mm, 3%. (d) 50 mm, 13%. (e) 100 mm, 13%. (f) 100 mm, 13%. (g) 100 mm, 13%. (h) 100 mm, 13%. (i) 100 mm, 18%. (j) 150 mm, 13%. (k) 100 mm, 18%. (l) 150 mm, 13%.

Fig. 5 is a representative illustration of the SSIM map derived from the crosstalk-distorted Champagne and Dog presentations in Fig. 4. In the SSIM map, a value of 0 (black) at a pixel means the largest difference between the original and the crosstalk-added image, and 1 (white) denotes no difference. For Champagne, it can be seen that when the crosstalk level is larger, in the first column, the shadow degree of crosstalk represented by the SSIM map is darker. Also, when the camera distance is larger, in the second column, the separation distance of crosstalk represented by the SSIM map becomes wider. In addition, their mutual interaction is also described by the SSIM map when the shadow degree and separation distance change synchronously in the first and third rows. The same situation exists for the Dog presentations. Moreover, the different shadow degrees and separation distances of crosstalk for Champagne and Dog with the same camera baseline and crosstalk level can also be expressed by the SSIM map, which means that the scene content difference is also characterized. Thus, the SSIM map can reflect the 2-D perceptual attributes, namely, the shadow degree of crosstalk, the separation distance of crosstalk, and their interactions.
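A minimal sketch of computing such an SSIM map is given below; it uses scikit-image rather than the MATLAB implementation in [24], and the function and variable names are ours.

import numpy as np
from skimage.color import rgb2gray
from skimage.metrics import structural_similarity

def ssim_map(left_original, left_crosstalk):
    """Per-pixel SSIM map between the original left view and the left view
    with the overall crosstalk added, cf. (5)."""
    ref = rgb2gray(left_original)    # floats in [0, 1]
    dist = rgb2gray(left_crosstalk)
    _, m_ssim = structural_similarity(ref, dist, data_range=1.0, full=True)
    # Values close to 1 mean no visible difference; lower values mean stronger crosstalk.
    return m_ssim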
B. 3-D Perceptual Attribute Map

Spatial position of crosstalk describes users' perception of crosstalk in 3-D space, which can be characterized because visible crosstalk on foreground objects should have more impact on perception than on background objects. Therefore, in order to form a 3-D perceptual attribute map, the depth structure of the scene content and the region of visible crosstalk should be combined.

The relative depth structure of the scene content can be represented by a depth map. Depth estimation algorithms usually follow two approaches: 1) from one single image using monocular cues, and 2) from stereo or multiview images using stereo (triangulation) cues. The latter is usually more accurate but requires the corresponding intrinsic and extrinsic camera parameters of the stereo images. Since the performance of the metric relies on the accuracy of the depth estimation algorithm, we adopt the latter approach, and the Depth Estimation Reference Software (DERS) [25], version 4.0, was employed in this paper. The depth map of the original right eye view is calculated as follows:

M_D = f_DERS(L, R)    (6)

where f_DERS denotes the DERS algorithm [25] and M_D is the generated depth map of the right view. M_D is normalized to represent a relative 3-D depth in which 0 denotes the farthest depth value and 255 the nearest.

Fig. 6. Illustrations of the depth map of Champagne and Dog when the camera baseline is 150 mm.

Fig. 6 gives an example of the depth maps of Champagne and Dog. The farthest and nearest depth values are 7.7 and 2.0 m for Champagne, and 8.2 and 2.5 m for Dog, respectively. However, they are both normalized by the same factor of 5.7 m. It can be seen that the foreground object champagne is much brighter than the foreground in the Dog presentation. Therefore, the foreground of Champagne is much closer to its nearest depth plane than that of Dog.

The region of visible crosstalk is also defined based on the SSIM map, because we observed that crosstalk is more visible in the regions where the pixel value of the SSIM map is smaller than a threshold. A threshold of 0.977 was obtained experimentally from our experiments. Therefore, the following equation is used to define the filtered depth map as the 3-D perceptual attribute map:

M_FD(i, j) = M_D(i, j) if M_SSIM(i, j) < 0.977, and 0 otherwise    (7)

where i and j are the pixel indices and M_FD denotes the filtered depth map corresponding to the visible crosstalk region of the left eye image.

Fig. 7. Illustrations of the filtered depth map of Champagne and Dog when the camera baseline is 150 mm and the crosstalk level is 3%.

C. Objective Metric for Crosstalk Perception

As mentioned above, the 2-D and 3-D perceptual attributes can be represented by the SSIM map and the filtered depth map, respectively. Therefore, the overall crosstalk perception is supposed to be an integration of the two maps. Since the 3-D perceptual attribute reveals that visible crosstalk on foreground objects has more impact on perception than on background objects, more weight should be assigned to the visible crosstalk of the foreground than of the background. In other words, the SSIM map should be further weighted by the filtered depth map. Thus, the integration is performed as in the following equations:

M_C(i, j) = M_SSIM(i, j) · (1 − M_FD(i, j)/255)    (8)
Q_P = mean(M_C)    (9)

where M_C and Q_P denote the combined map and the quality value predicted by the objective metric, respectively, and mean(·) denotes the averaging operation. In (8), the filtered depth map is first normalized into the interval [0, 1] by the maximum depth value 255, and then subtracted from 1 to comply with the meaning of the SSIM map, in which a lower pixel value indicates a larger crosstalk distortion. When two pixels with identical values in the SSIM map are located in the foreground and background, respectively, the value of the foreground pixel will be smaller than that of the background pixel after combining with the filtered depth map.
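Given an SSIM map and a depth map already scaled to 0–255, the thresholding in (7) and the combination in (8) and (9) reduce to a few array operations, as in the sketch below (our illustration, with assumed variable names).

import numpy as np

SSIM_THRESHOLD = 0.977  # experimentally determined threshold for visible crosstalk

def crosstalk_quality(m_ssim, depth_map, threshold=SSIM_THRESHOLD):
    """Combine the SSIM map (2-D attributes) with the filtered depth map
    (3-D attribute) and return the predicted quality value, cf. (7)-(9)."""
    # (7): keep depth only where the crosstalk is visible (SSIM below the threshold)
    m_fd = np.where(m_ssim < threshold, depth_map.astype(np.float64), 0.0)
    # (8): weight the SSIM map so that visible crosstalk on foreground objects
    # (larger depth values) lowers the local score more than on the background
    m_c = m_ssim * (1.0 - m_fd / 255.0)
    # (9): pool the combined map into a single quality value
    return float(m_c.mean())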
D. Experimental Results

The performance of an objective quality metric can be evaluated by comparison with the MOS values obtained in the subjective tests. The proposed metric Q_P was compared with two traditional 2-D metrics, Q_PSNR and Q_SSIM, as well as three other metrics, Q_1, Q_2, and Q_3, which combine the 2-D and 3-D perceptual attributes in different ways, as in the following equations:

Q_PSNR = f_PSNR(L, L_o)    (10)
Q_SSIM = mean(M_SSIM)    (11)
Q_1 = mean(M_SSIM · (1 − M_D/255))    (12)
M_disp = f_disp(L, R)    (13)
M_Fdisp(i, j) = M_disp(i, j) if M_SSIM(i, j) < 0.977, and 0 otherwise    (14)
Q_2 = mean(M_SSIM · (1 − M_disp/255))    (15)
Q_3 = mean(M_SSIM · (1 − M_Fdisp/255))    (16)

TABLE VII. EVALUATION RESULTS OF DIFFERENT METRICS ON THE SUBJECTIVE DATASET

where Q_PSNR and Q_SSIM are the 2-D metrics calculated between the original and the crosstalk-added left image, instead of between the original left and right images. Q_1 is a combination of the SSIM map and the depth map, instead of the filtered depth map as in the metric Q_P; this means that Q_1 weights the entire image while Q_P weights only the region of visible crosstalk in the image. M_disp denotes the disparity map of the right eye image, which results from a combination of the relative depth structure of the scene content and the camera settings, such as the camera baseline. Since M_disp also contains information about the relative depth structure of the scene content, we compare the performance of the metrics based on the disparity map against those based on the depth map. In the equations, the filtered disparity map M_Fdisp was obtained from M_disp using the same approach as M_FD from M_D, and the metrics Q_2 and Q_3 follow the same combinations as used in building Q_1 and Q_P, respectively. In particular, we adopted a stereo correspondence algorithm using the sum of squared differences plus a min filter to estimate the disparity map [26] in (13). The disparity map is a grey-level image with black denoting the smallest disparity (0 pixels) and white the largest (255 pixels).

For evaluating each metric, the root mean squared error (RMSE), the Pearson correlation coefficient, and the Spearman rank-order correlation coefficient have been selected as evaluation criteria. They are calculated between the subjective MOS values and the objective values after a nonlinear regression, as suggested by the VQEG, using

Q' = a / (1 + exp(−b · (Q − c)))    (17)

where a, b, and c are the regression coefficients, Q is the raw value calculated by a metric, exp(·) is the exponential function, and Q' is the regressed value. The main purpose of (17) is to map the output of each metric to the range of the MOS.

Table VII reports the evaluation results. According to these results, the objective metric Q_P for crosstalk perception achieves a higher correlation with the subjective MOS values than the traditional 2-D metrics (Q_PSNR and Q_SSIM) and the other metrics (Q_1, Q_2, and Q_3). The performance of the proposed metric is better than that of Q_PSNR and Q_SSIM, which indicates that a metric taking 3-D characteristics into consideration can give a better prediction of crosstalk perception than 2-D metrics. However, different combinations of 3-D characteristics have different prediction capabilities. It can be seen from Table VII that the metrics based on the unfiltered maps perform worse than their filtered counterparts: the performance of Q_P and Q_3 is better than that of the corresponding metrics Q_1 and Q_2, respectively. This indicates that weighting only the region of visible crosstalk is in accordance with users' perception. However, Q_3 exhibits slightly poorer performance than the proposed metric Q_P, which implies that relative depth, instead of absolute depth, is more suitable for the weighting operation. Therefore, Q_P, which models the perceptual attributes of crosstalk, has the best performance. As the Pearson correlation of the proposed metric is 88.4%, it is promising for evaluating the crosstalk perception of stereoscopic images.
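The evaluation procedure around (17) can be reproduced as sketched below; the three-parameter logistic is the form we assume for (17), fitted here with SciPy and followed by the RMSE, Pearson, and Spearman criteria.

import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import pearsonr, spearmanr

def logistic(q, a, b, c):
    # Three-parameter logistic mapping raw metric values to the MOS range, cf. (17).
    return a / (1.0 + np.exp(-b * (q - c)))

def evaluate_metric(raw_values, mos):
    raw_values = np.asarray(raw_values, dtype=float)
    mos = np.asarray(mos, dtype=float)
    p0 = [mos.max(), 1.0, float(raw_values.mean())]   # rough initial guess
    params, _ = curve_fit(logistic, raw_values, mos, p0=p0, maxfev=10000)
    fitted = logistic(raw_values, *params)
    rmse = float(np.sqrt(np.mean((fitted - mos) ** 2)))
    pearson = pearsonr(fitted, mos)[0]
    spearman = spearmanr(raw_values, mos)[0]           # rank order is unchanged by the monotonic fit
    return rmse, pearson, spearman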
In order to have a closer look at the proposed metric of crosstalk perception on our subjective dataset, we validated its performance on the different scene contents. Fig. 8 shows the scatter plot of the MOS values versus the predicted quality values for the different scene contents.

Fig. 8. Scatter plot of the MOS of crosstalk perception versus the predicted values.

Based on the experimental results, the performance of the proposed metric does not differ significantly between scene contents, while the impairment level can significantly influence the performance. In particular, the proposed metric performs better in predicting the crosstalk perception of stereoscopic images with low and high impairments than of images with medium impairments. We think that this performance difference might originate from the filtered depth map, in which the dominant perception of the maximum crosstalk at different impairment levels should be considered. However, this conclusion needs to be further verified in future work.

V. CONCLUSION

We have conducted subjective tests of stereoscopic crosstalk perception with varying scene content, camera baseline, and crosstalk level. The statistical results show that crosstalk level, camera baseline, and scene content each have a significant impact on crosstalk perception, and that they have two-factor interactions in terms of their impact on crosstalk perception. Moreover, the perceptual attributes of crosstalk (shadow degree, separation distance, and spatial position) were summarized by observing the visual variations when the significant factors (crosstalk level, camera baseline, and scene content) change. These perceptual attributes are the sensorial results of the HVS and are classified into two categories: 2-D and 3-D perceptual attributes. Subsequently, an objective metric for crosstalk perception has been proposed by combining the SSIM map and the filtered depth map. The experimental results with respect to our subjective evaluation scores have demonstrated promising performance of this metric, achieving more than 88% correlation with the MOS results. The performance of the proposed quality metric is better than that of traditional 2-D models and of the other compared metrics with different combination methods.

ACKNOWLEDGMENT

The authors would like to thank Prof. L. A. Rønningen for kindly allowing us to use his lab Caruso for the subjective tests and Dr. J. Xu for his helpful discussions and suggestions throughout this work.

REFERENCES

[1] M. Lambooij, W. A. IJsselsteijn, M. Fortuin, and I. Heynderickx, "Visual discomfort in stereoscopic displays: A review," J. Imag. Sci. Technol., vol. 53, no. 3, pp. 1–14, 2009.
[2] A. Boev, D. Hollosi, A. Gotchev, and K. Egiazarian, "Classification and simulation of stereoscopic artifacts in mobile 3DTV content," in Proc. Stereoscopic Displays and Applications XX, San Jose, CA, 2009, pp. 72371F–12.
[3] P. J. H. Seuntiens, L. M. J. Meesters, and W. A. IJsselsteijn, "Perceptual attributes of crosstalk in 3-D images," Displays, vol. 26, no. 4–5, pp. 177–183, 2005.
[4] A. Woods, "Understanding crosstalk in stereoscopic displays," in Proc. Conf. Three-Dimensional Syst. Applic., Tokyo, Japan, 2010.
[5] S. Pastoor, "Human factors of 3-D images: Results of recent research at Heinrich-Hertz-Institut Berlin," in Proc. Int. Display Workshop, 1995, vol. 3, pp. 69–72.
[6] K. C. Huang et al., "A study of how crosstalk affects stereopsis in stereoscopic displays," in Proc. Conf. Stereoscop. Displays Virtual Reality Systems X, Santa Clara, CA, 2003, vol. 5006, pp. 247–253.
[7] L. Lipton, "Factors affecting 'ghosting' in time-multiplexed plano-stereoscopic CRT display systems," True 3-D Imaging Tech. Display Technol., vol. 761, pp. 75–78, 1987.
[8] F. L. Kooi and A. Toet, "Visual comfort of binocular and 3-D displays," Displays, vol. 25, pp. 99–108, 2004.
[9] A. Benoit, P. L. Callet, P. Campisi, and R. Cousseau, "Quality assessment of stereoscopic images," EURASIP J. Image Video Process., vol. 2008, 2008.
[10] J. You, G. Jiang, L. Xing, and A. Perkis, "Quality of visual experience for 3-D presentation: Stereoscopic image," in High-Quality Visual Experience: Creation, Processing and Interactivity of High-Resolution and High-Dimensional Video Signals. Berlin, Germany: Springer-Verlag, 2010.
[11] R. Olsson and M. Sjostrom, "A depth dependent quality metric for evaluation of coded integral imaging based 3D-images," in Proc. 3DTV Conf.: True Vision—Capture, Transmission and Display of 3-D Video, Kos Island, Greece, 2007.
[12] P. Gorley and N. Holliman, "Stereoscopic image quality metrics and compression," in Proc. Conf. Stereoscop. Displays Applic. XIX, San Jose, CA, 2008, vol. 6803.
[13] Z. M. P. Sazzad, S. Yamanaka, Y. Kawayokeita, and Y. Horita, "Stereoscopic image quality prediction," in Proc. Int. Workshop Quality of Multimedia Experience, San Diego, CA, 2009.
[14] D. Kim et al., "Depth map quality metric for three-dimensional video," in Proc. Stereoscop. Displays Applic. XX, San Jose, CA, 2009, vol. 7237.
[15] L. Xing, T. Ebrahimi, and A. Perkis, "Subjective evaluation of stereoscopic crosstalk perception," in Proc. Conf. Vis. Commun. Image Process., Huang Shan, China, 2010, vol. 7744, pp. 77441V–77441V-9.
[16] L. Xing, J. You, T. Ebrahimi, and A. Perkis, "A perceptual quality metric for stereoscopic crosstalk perception," in Proc. IEEE Int. Conf. Image Process., Hong Kong, China, 2010, pp. 4033–4036.
[17] Methodology for the Subjective Assessment of the Quality of Television Pictures, Recommendation ITU-R BT.500-11, 2002.
[18] Subjective Assessment of Stereoscopic Television Pictures, Recommendation ITU-R BT.1438, 2000.
[19] W. Chen, J. Fournier, B. Marcus, and L. C. Patrick, "New requirements of subjective video quality assessment methodologies for 3DTV," in Proc. Int. Workshop Video Process. Quality Metrics Consum. Electron., Scottsdale, AZ, 2010.
[20] ISO/IEC JTC1/SC29/WG11, M15377, M15378, M15413, M15419, Archamps, France, 2008.
[21] A. Boev, D. Hollosi, and A. Gotchev, "Software for simulation of artefacts and database of impaired videos," Mobile3DTV Project Rep. 216503 [Online]. Available: http://mobile3dtv.eu
[22] L. Goldmann, F. D. Simone, and T. Ebrahimi, "A comprehensive database and subjective evaluation methodology for quality of experience in stereoscopic video," in Proc. Three-Dimensional Image Process. Applic., San Jose, CA, 2010, vol. 7526.
[23] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004.
[24] SSIM Implementation [Online]. Available: http://www.ece.uwaterloo.ca/~z70wang/research/ssim/
[25] M. Tanimoto et al., "Reference softwares for depth estimation and view synthesis," ISO/IEC JTC1/SC29/WG11 MPEG2008/M15377, Archamps, France, 2008.
[26] D. Scharstein and R. Szeliski, "A taxonomy and evaluation of dense two-frame stereo correspondence algorithms," Int. J. Comput. Vis., vol. 47, no. 1–3, pp. 7–42, 2002.
Liyuan Xing (S'11) was born in China in 1981. She received the M.S. degree in computer applied technology from the Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China, in 2006. She is currently working toward the Ph.D. degree at the Centre for Quantifiable Quality of Service (Q2S), Norwegian University of Science and Technology (NTNU), Trondheim, Norway. During her graduate work, she worked on content-based multimedia analysis. She was a Senior Software Engineer with the R&D Center of Toshiba in China, where she was involved with medical image processing and healthcare enterprise integration from 2006 to 2008. Her current research interests are quality assessment and modeling within new application scenarios, namely in the realm of 3-D, virtual environments, gaming, and 3DTV-like scenarios, with a special focus on stereoscopic media. Ms. Xing was the recipient of the Chinese Academy of Sciences Liu Yongling Scholarship Excellence Award in July 2006.

Junyong You (M'08) received the B.S. and M.S. degrees in computational mathematics and the Ph.D. degree in electronics and information engineering from Xi'an Jiaotong University, Xi'an, China, in 1998, 2001, and 2007, respectively. He is currently a Research Scientist with the Department of Electronics and Telecommunications, Norwegian University of Science and Technology, Trondheim, Norway. He was a Senior Researcher with Tampere University of Technology, Finland, from 2007 to 2009. He is the author or coauthor of more than 45 articles for scientific books, journals, and conferences. His research interests include semantic multimedia content analysis, visual attention mechanisms and modeling, vision technology, and the definition, measurement, and improvement of multimedia quality of experience. He is serving as an editorial board member for two international journals and has served as a TPC member for several international conferences and workshops. He was the local arrangement co-chair of the Second International Workshop on Quality of Multimedia Experience (QoMEX'10) and is the general co-chair of the International Workshop on Multimedia Quality of Experience: Modeling, Evaluation and Directions (MQoE'11).

Touradj Ebrahimi (M'08) received the M.Sc. and Ph.D. degrees from the Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland, in 1989 and 1992, respectively, both in electrical engineering. In 1993, he was a Research Engineer with the Corporate Research Laboratories, Sony Corporation, Tokyo, Japan, where he conducted research on advanced video compression techniques for storage applications. In 1994, he served as a Research Consultant with AT&T Bell Laboratories working on very low bitrate video coding. He is currently a Professor with EPFL, heading its Multimedia Signal Processing Group. He is also an adjunct Professor with the Centre for Quantifiable Quality of Service at the Norwegian University of Science and Technology, Trondheim, Norway. He has been very active in the field of multimedia signal processing with a special emphasis on quality evaluations and metrics. In February 2001, he was the first to suggest that the notion of Quality of Experience (QoE) be used in multimedia communication systems as a dual and complement to Quality of Service (QoS).
He has authored or coauthored more than 40 publications in the field of quality metrics for various multimedia content. He is an initiator of Advanced Image Coding within the JPEG standardization committee, which develops subjective and objective quality metrics for future imaging systems, and was one of those responsible for the subjective evaluations of MPEG's next-generation video compression standard in March 2010. He was one of the founding members and the first general co-chair of the International Workshop on Quality of Multimedia Experience (QoMEX). He is the principal investigator and Chair of the COST Action IC1003 (QUALINET), which gathers more than 150 researchers in a consortium related to QoE issues in multimedia systems and services.

Andrew Perkis (SM'02) was born in Norway in 1961. He received the Siv.Ing. and Dr.Techn. degrees in 1985 and 1994, respectively, and the M.S. degree in technology management in 2008, awarded in cooperation between the Norwegian University of Science and Technology (NTNU), Trondheim, Norway, NHH, and the National University of Singapore. In 1993, he became an Associate Professor with the Department of Telecommunications, Norwegian University of Science and Technology, Trondheim, Norway, and he has been a Full Professor there since 2003. In 1999/2000, he was a Visiting Professor with The University of Wollongong, Australia, and in 2008 a Visiting Professor with the National University of Singapore. He is responsible for "Network Media Handling" within the National Centre of Excellence, Quantifiable Quality of Service in Communication Systems, at NTNU. Currently he is focusing on multimedia signal processing, specifically on methods and functionality for content representation, quality assessment, and their use within the media value chain in a variety of applications. He is also involved in setting up directions and visions for new research within media technology and art. Within applied research he is heavily involved in multi-platform publishing, especially to handheld devices. He has been involved with the start-up company Adactus and with commercial aspects of the Digital Cinema roll-out through running the Norwegian trial project NORDIC. He has authored or coauthored more than 200 publications in international conferences and workshops and has more than 50 contributions to international standards bodies. Dr. Perkis is a member of The Norwegian Academy of Technological Sciences (NTVA), the Association for Computing Machinery, The Norwegian Society of Chartered Engineers (TEKNA), and The Norwegian Signal Processing Society (NORSIG).