Statistical Tuning of Adaptive-Weight Depth Map Algorithm
Alejandro Hoyos¹, John Congote¹,², Iñigo Barandiaran², Diego Acosta³, and Oscar Ruiz¹

¹ CAD CAM CAE Laboratory, EAFIT University, Medellin, Colombia
  {ahoyossi,oruiz}@eafit.edu.co
² Vicomtech Research Center, Donostia-San Sebastián, Spain
  {jcongote,ibarandiaran}@vicomtech.org
³ DDP Research Group, EAFIT University, Medellin, Colombia
  dacostam@eafit.edu.co
Abstract. In depth map generation, the settings of the algorithm parameters that yield an accurate disparity estimation are usually chosen empirically or based on unplanned experiments. A systematic statistical approach, including classical and exploratory data analyses on over 14000 images to measure the relative influence of the parameters, allows their tuning based on the number of bad pixels. Our approach is systematic in the sense that the heuristics used for parameter tuning are supported by formal statistical methods. The implemented methodology improves the performance of dense depth map algorithms. As a result of the statistics-based tuning, the algorithm improves from 16.78% to 14.48% bad pixels, rising 7 positions in the Middlebury Stereo Evaluation Ranking Table. The performance is measured as the distance of the algorithm results from the ground truth provided by Middlebury. Future work aims to achieve the tuning with significantly smaller data sets using fractional factorial and surface-response designs of experiments.

Keywords: Stereo Image Processing, Parameter Estimation, Depth Map.
1 Introduction
Depth map calculation deals with the estimation of multiple object depths on a
scene. It is useful for applications like vehicle navigation, automatic surveillance,
aerial cartography, passive 3D scanning, automatic industrial inspection, or 3D
videoconferencing [1]. These maps are constructed by generating, at each pixel,
an estimation of the distance between the screen and the object surface (depth).
Disparity is commonly used to describe inverse depth in computer vision, and
also to measure the perceived spatial shift of a feature observed from close camera
viewpoints. Stereo correspondence techniques often calculate a disparity function
d (x, y) relating target and reference images, so that the (x, y) coordinates of
the disparity space match the pixel coordinates of the reference image. Stereo
methods commonly use a pair of images taken with known camera geometry to
A. Berciano et al. (Eds.): CAIP 2011, Part II, LNCS 6855, pp. 563–572, 2011.
© Springer-Verlag Berlin Heidelberg 2011
generate a dense disparity map with estimates at each pixel. This dense output
is useful for applications requiring depth values even in difficult regions like
occlusions and textureless areas. The ambiguity of matching pixels in heavy
textured or textureless zones tends to require complex and expensive overall
image processing or statistical correlations using color and proximity measures
in local support windows.
Most implementations of vision algorithms make assumptions about the visual appearance of objects in the scene to ease the matching problem. The steps
generally taken to compute the depth maps may include: (i) matching cost computation, (ii) cost or support aggregation, (iii) disparity computation or optimization, and (iv) disparity refinement.
This article is based on work done in [1] where the principles of the stereo
correspondence techniques and the quantitative evaluator are discussed. The literature review is presented in section 2, followed by section 3 describing the
algorithm, filters, statistical analysis and experimental set up. Results and discussions are covered in section 4, and the article is concluded in section 5.
2 Literature Review
The algorithm and filters use several user-specified parameters to generate the
depth map of an image pair, and their settings are heavily influenced by the
evaluated data sets [2]. Published works usually report the settings used for their
specific case studies without describing the procedure followed to fine-tune them
[3,4,5], and some explicitly state the empirical nature of these values [6]. The
variation of the output as a function of several settings on selected parameters is
briefly discussed while not taking into account the effect of modifying them all
simultaneously [3,2,7]. Multiple stereo methods are compared by choosing values based on experiments, but only some algorithm parameters are changed, without detailing the complete rationale behind the chosen settings [1].
2.1 Conclusions of the Literature Review
Commonly used approaches in determining the settings of depth map algorithm
parameters show all or some of the following shortcomings: (i) undocumented
procedures for parameter setting, (ii) lack of planning when testing for the best
settings, and (iii) failure to consider interactions of changing all the parameters
simultaneously.
As a response to these shortcomings, this article presents a methodology to
fine-tune user-specified parameters on a depth map algorithm using a set of
images from the adaptive weight implementation in [4]. Multiple settings are used
and evaluated on all parameters to measure the contribution of each parameter
to the output variance. A quantitative accuracy evaluation allows using main
effects plots and analyses of variance on multi-variate linear regression models
to select the best combination of settings for each data set. The initial results
are improved by setting new values of the user-specified parameters, allowing
the algorithm to give much more accurate results on any rectified image pair.
3 Methodology

3.1 Image Processing
In the adaptive weight algorithm ([3,4]), a window is moved over each pixel on
every image row, calculating a measurement based on the geometric proximity
and color similarity of each pixel in the moving window to the pixel on its center.
Pixels are matched on each row based on their support measurement with larger
weights coming from similar pixel colors and closer pixels. The horizontal shift,
or disparity, is recorded as the depth value, with higher values reflecting greater
shifts and closer proximity to the camera.
The strength of grouping by color, fs(cp, cq), for pixels p and q is defined from the Euclidean distance between colors, ∆cpq, by Equation (1). Similarly, the strength of grouping by distance, fp(gp, gq), is defined from the Euclidean distance between pixel image coordinates, ∆gpq, by Equation (2), where γc and γp are adjustable settings used to scale the measured color difference and window size, respectively.

fs(cp, cq) = exp(−∆cpq / γc)    (1)

fp(gp, gq) = exp(−∆gpq / γp)    (2)
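As a concrete illustration, the two grouping-strength functions and the combined support weight can be sketched in Python (the function names and the NumPy-based formulation are ours, not the paper's):

```python
import numpy as np

def color_strength(c_p, c_q, gamma_c):
    """Grouping strength by color, Eq. (1): exp(-dc_pq / gamma_c),
    where dc_pq is the Euclidean distance between RGB colors."""
    delta_c = np.linalg.norm(np.asarray(c_p, float) - np.asarray(c_q, float))
    return np.exp(-delta_c / gamma_c)

def proximity_strength(g_p, g_q, gamma_p):
    """Grouping strength by distance, Eq. (2): exp(-dg_pq / gamma_p),
    where dg_pq is the Euclidean distance between pixel coordinates."""
    delta_g = np.linalg.norm(np.asarray(g_p, float) - np.asarray(g_q, float))
    return np.exp(-delta_g / gamma_p)

def support_weight(c_p, c_q, g_p, g_q, gamma_c, gamma_p):
    """Combined support weight w(p, q) = fs(cp, cq) * fp(gp, gq)."""
    return color_strength(c_p, c_q, gamma_c) * proximity_strength(g_p, g_q, gamma_p)
```

An identical pixel at the same location receives the maximum weight of 1; the weight decays exponentially as color or spatial distance grows.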
The matching cost between pixels, shown in Equation (3), is measured by aggregating raw matching costs, using the support weights defined by Equations (1) and (2), in support windows based on both the reference and target images:

E(p, p̄d) = [ Σ_{q∈Np, q̄d∈Np̄d} w(p, q) w(p̄d, q̄d) Σ_{c∈{r,g,b}} |Ic(q) − Ic(q̄d)| ] / [ Σ_{q∈Np, q̄d∈Np̄d} w(p, q) w(p̄d, q̄d) ]    (3)

where w(p, q) = fs(cp, cq) · fp(gp, gq), p̄d and q̄d are the target image pixels at disparity d corresponding to pixels p and q in the reference image, Ic is the intensity on channels red (r), green (g), and blue (b), and Np is the window centered at p and containing all q pixels. The size of this movable window N is another user-specified parameter. Increasing the window size reduces the chance of bad matches at the expense of missing relevant scene features.
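A minimal per-pixel sketch of the aggregation in Equation (3), assuming float RGB images and ignoring border handling (the loop structure and names are ours; a real implementation would vectorize this):

```python
import numpy as np

def aggregated_cost(ref, tgt, p, d, radius, gamma_c, gamma_p):
    """Adaptive-weight matching cost E(p, p_d) of Eq. (3) for one pixel p
    at disparity d; ref and tgt are float RGB images of shape (H, W, 3).
    A sketch: no border handling, O(window^2) work per pixel."""
    py, px = p
    num = den = 0.0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            qy, qx = py + dy, px + dx          # q in the reference window Np
            qx_t = qx - d                      # q_d in the target window Np_d
            # support weights w(p,q) and w(p_d,q_d) from Eqs. (1)-(2)
            dg = np.hypot(dy, dx)
            w_ref = np.exp(-np.linalg.norm(ref[qy, qx] - ref[py, px]) / gamma_c) \
                  * np.exp(-dg / gamma_p)
            w_tgt = np.exp(-np.linalg.norm(tgt[qy, qx_t] - tgt[py, px - d]) / gamma_c) \
                  * np.exp(-dg / gamma_p)
            # raw matching cost: sum of absolute RGB differences
            e = np.abs(ref[qy, qx] - tgt[qy, qx_t]).sum()
            num += w_ref * w_tgt * e
            den += w_ref * w_tgt
    return num / den

```

Scanning d over the disparity search range and keeping the minimizer of this cost gives the winner-takes-all disparity at p.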
Post-Processing Filters. Algorithms based on correlations depend heavily
on finding similar textures at corresponding points in both reference and target
images. Bad matches happen more frequently in textureless regions, occluded
zones, and areas with high variation in disparity. The winner-takes-all approach enforces uniqueness of matches only for the reference image, so that points on the target image may be matched more than once. This creates the need to check the disparity estimates and fill any gaps with information from neighboring pixels using post-processing filters like the ones shown in Table 1.
Table 1. User-specified parameters of the adaptive weight algorithm and filters

Filter              | Function                                                          | User-specified parameters
Adaptive Weight [3] | Disparity estimation and pixel matching                           | γaws: similarity factor; γawg: proximity factor related to the WAW pixel size of the support window
Median              | Smoothing and incorrect match removal                             | WM: pixel size of the median window
Cross-check [8]     | Validation of the disparity measurement per pixel                 | ∆d: allowed disparity difference
Bilateral [9]       | Intensity and proximity weighted smoothing with edge preservation | γbs: similarity factor; γbg: proximity factor related to the WB pixel size of the bilateral window
Median Filter. Median filters are widely used in digital image processing to smooth signals and to remove incorrect matches and holes by assigning neighboring disparities, at the expense of edge preservation. The median filter provides a mechanism for reducing image noise while preserving edges more effectively than a linear smoothing filter. It sorts the intensities of all the q pixels in a window of size M and selects the median value as the new intensity of the central pixel p. The size M of the window is another of the user-specified parameters.
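A straightforward sketch of this filter, assuming a grayscale float image and edge-replicated borders (the paper does not specify a border policy):

```python
import numpy as np

def median_filter(img, m):
    """Replace each pixel with the median of its m-by-m neighborhood
    (m odd), as described above. Borders are handled by edge replication."""
    r = m // 2
    padded = np.pad(img, r, mode="edge")
    out = np.empty_like(img)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            window = padded[y:y + m, x:x + m]
            out[y, x] = np.median(window)   # sort and take the middle value
    return out
```

A single outlier pixel in an otherwise flat region is fully removed, which is exactly the incorrect-match-removal behavior the filter is used for here.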
Cross-check Filter. The correlation is performed twice by reversing the roles of
the two images and considering valid only those matches having similar depth
measures at corresponding points in both steps. The validity test is prone to
fail in occluded areas where disparity estimates will be rejected. The allowed
difference in disparities is one more adjustable parameter.
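The cross-check described above can be sketched as follows, assuming integer disparity maps with positive left-to-right disparities (a simplification; the paper does not give implementation details, and the invalid marker −1 is our convention):

```python
import numpy as np

def cross_check(disp_lr, disp_rl, max_delta):
    """Cross-check filter: keep a left-to-right disparity only if the
    right-to-left map agrees within max_delta at the matched pixel;
    rejected pixels are marked invalid (-1)."""
    h, w = disp_lr.shape
    out = np.full_like(disp_lr, -1)
    for y in range(h):
        for x in range(w):
            d = disp_lr[y, x]
            xr = x - d                      # matched column in the target image
            if 0 <= xr < w and abs(disp_rl[y, xr] - d) <= max_delta:
                out[y, x] = d               # consistent match: keep it
    return out
```

Occluded pixels, which are visible in only one image, typically fail this test and are left for gap filling by the subsequent filters.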
Bilateral Filter. The bilateral filter is a non-iterative method of smoothing images while retaining edge detail. The intensity value at each pixel in an image is replaced by a
weighted average of intensity values from nearby pixels. The weighting for each
pixel q is determined by the spatial distance from the center pixel p, as well as
its relative difference in intensity, defined by Equation (4).
Op = [ Σ_{q∈W} fs(q − p) gi(Iq − Ip) Iq ] / [ Σ_{q∈W} fs(q − p) gi(Iq − Ip) ]    (4)

where O is the output image, I the input image, W the weighting window, fs the spatial weighting function, and gi the intensity weighting function. The size of the window W is yet another parameter specified by the user.
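Equation (4) can be sketched for a grayscale image as follows; using exponential kernels for fs and gi is our assumption, chosen to match the weighting functions used elsewhere in the paper:

```python
import numpy as np

def bilateral_filter(img, w, gamma_s, gamma_i):
    """Bilateral filter of Eq. (4) for a grayscale float image with an
    odd window size w. Exponential kernels for f_s and g_i are assumed."""
    r = w // 2
    padded = np.pad(img, r, mode="edge")
    out = np.empty_like(img)
    h, wid = img.shape
    # precompute the spatial weights f_s(q - p), identical for every window
    yy, xx = np.mgrid[-r:r + 1, -r:r + 1]
    f_s = np.exp(-np.hypot(yy, xx) / gamma_s)
    for y in range(h):
        for x in range(wid):
            window = padded[y:y + w, x:x + w]
            g_i = np.exp(-np.abs(window - img[y, x]) / gamma_i)  # intensity weights
            k = f_s * g_i
            out[y, x] = (k * window).sum() / k.sum()             # Eq. (4)
    return out
```

Because gi shrinks across large intensity jumps, pixels on the far side of an edge contribute little to the average, which is what preserves the edge.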
3.2 Statistical Analysis
The user-specified input parameters and the output accuracy measurements are statistically analyzed, measuring the relations among inputs and outputs with correlation analyses, while box plots give insight into the influence of groups of settings on a given factor. A multi-variate linear regression model, shown in Equation (5), relates the output variable to all the parameters in order to find the equation coefficients and the coefficient of determination, and allows an analysis of variance to measure the influence of each parameter on the output variance. Residual analyses are performed to validate the assumptions of the regression model, such as constant error variance and zero-mean errors, and, if necessary, the model is transformed. The parameters are normalized to fit the range (−1, 1) as shown in Table 2.
ŷ = β0 + Σ_{i=1}^{n} βi xi + ε    (5)

where ŷ is the predicted variable, xi are the factors, and βi are the coefficients.
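A minimal ordinary-least-squares fit of the model in Equation (5), returning the coefficients and the coefficient of determination R² (a sketch; the paper's full analysis also includes ANOVA and residual diagnostics):

```python
import numpy as np

def fit_mvlr(X, y):
    """Fit the multi-variate linear regression of Eq. (5),
    y ~ b0 + sum(bi * xi), by ordinary least squares.
    X has one column per factor; returns (coefficients, R^2)."""
    A = np.column_stack([np.ones(len(y)), X])   # prepend intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ beta
    ss_res = (residuals ** 2).sum()
    ss_tot = ((y - y.mean()) ** 2).sum()
    return beta, 1.0 - ss_res / ss_tot          # R^2 = 1 - SS_res / SS_tot
```

With the factors coded into (−1, 1), the magnitudes of the fitted βi become directly comparable measures of each parameter's influence.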
3.3 Experimental Set Up
The depth maps are calculated with an implementation developed for real-time videoconferencing in [4], using well-known rectified image sets: Cones from [1], Teddy and Venus from [10], and Tsukuba head and lamp from the University of Tsukuba. Other commonly used sets are also freely available [11,12]. The sample used consists of 14688 depth maps, 3672 for each data set, like the ones shown in Figure 1.
Fig. 1. Depth Map Comparison. Top: best initial, bottom: new settings. (a) Cones, (b)
Teddy, (c) Tsukuba, and (d) Venus data set.
Many recent stereo correspondence performance studies use the Middlebury
Stereomatcher for their quantitative comparisons [2,7,13]. The evaluator code,
sample scripts, and image data sets are available from the Middlebury stereo
vision site1 , providing a flexible and standard platform for easy evaluation.
1 http://vision.middlebury.edu/stereo/
Table 2. User-specified parameters of the adaptive weight algorithm

Parameter                     | Name    | Levels | Values                | Coding
Adaptive Weights Window Size  | aw win  | 4      | [1 3 5 7]             | [-1 -0.3 0.3 1]
Adaptive Weights Color Factor | aw col  | 6      | [4 7 10 13 16 19]     | [-1 -0.6 -0.2 0.2 0.6 1]
Median Window Size            | m win   | 3      | [N/A 3 5]             | [N/A -1 0.2 1]
Cross-Check Disparity Delta   | cc disp | 4      | [N/A 0 1 2]           | [N/A -1 0 1]
Cross-Bilateral Window Size   | cb win  | 5      | [N/A 1 3 5 7]         | [N/A -1 -0.3 0.3 1]
Cross-Bilateral Color Factor  | cb col  | 7      | [N/A 4 7 10 13 16 19] | [N/A -1 -0.6 -0.2 0.2 0.6 1]
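The coding column of Table 2 is consistent with a linear mapping of each factor's tested values onto [−1, 1] (the table's entries appear rounded, e.g. −0.3 for −1/3); a sketch:

```python
def code_levels(values):
    """Map a factor's tested values linearly onto [-1, 1], as in the
    Coding column of Table 2: the midpoint of the range goes to 0,
    the extreme values to -1 and +1."""
    lo, hi = min(values), max(values)
    mid, half = (lo + hi) / 2.0, (hi - lo) / 2.0
    return [(v - mid) / half for v in values]
```

For example, the aw col levels [4, 7, 10, 13, 16, 19] map exactly to [−1, −0.6, −0.2, 0.2, 0.6, 1].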
The online Middlebury Stereo Evaluation Table gives a visual indication of how well the methods perform using the proportion of bad pixels (bad pixels) metric, defined as the average over all data sets of the proportion of bad pixels in the whole image (bad pixels all), the proportion of bad pixels in non-occluded regions (bad pixels nonocc), and the proportion of bad pixels in areas near depth discontinuities (bad pixels discont).
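The underlying bad-pixel proportion can be sketched as follows; the 1-disparity-level threshold follows the Middlebury convention, and the region masks (all / nonocc / discont) are left to the caller:

```python
import numpy as np

def bad_pixels(disp, gt, mask=None, threshold=1.0):
    """Proportion of bad pixels: the fraction of (optionally masked)
    pixels whose disparity differs from the ground truth by more than
    the threshold. A sketch of the metric used by the Middlebury
    Stereomatcher evaluator."""
    err = np.abs(disp.astype(float) - gt.astype(float)) > threshold
    if mask is not None:
        err = err[mask]                     # restrict to one region
    return err.mean()
```

Averaging this value over the three region masks and the four data sets yields the ranking-table figure quoted in the paper.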
4 Results and Discussion

4.1 Variable Selection
A Pearson correlation analysis of the factors shows that they are independent and that each one must be included in the evaluation. On the other hand, a strong correlation between bad pixels and the other outputs is detected, as shown in Figure 2. This allows the selection of bad pixels as the sole output, because the other responses are expected to follow a similar trend. The other outputs are explained in Table 3.
Table 3. Result metrics computed by the Middlebury Stereomatcher evaluator

Parameter              | Description
rms error all          | Root Mean Square (RMS) disparity error (all pixels)
rms error nonocc       | RMS disparity error (non-occluded pixels only)
rms error occ          | RMS disparity error (occluded pixels only)
rms error textured     | RMS disparity error (textured pixels only)
rms error textureless  | RMS disparity error (textureless pixels only)
rms error discont      | RMS disparity error (near depth discontinuities)
bad pixels all         | Fraction of bad points (all pixels)
bad pixels nonocc      | Fraction of bad points (non-occluded pixels only)
bad pixels occ         | Fraction of bad points (occluded pixels only)
bad pixels textured    | Fraction of bad points (textured pixels only)
bad pixels textureless | Fraction of bad points (textureless pixels only)
bad pixels discont     | Fraction of bad points (near depth discontinuities)
evaluate only          | Read specified depth map and evaluate only
output params          | Text file logging all used parameters
depth map              | Evaluated image
Fig. 2. bad pixels and other output correlation
4.2 Exploratory Data Analysis
Box plot analysis of bad pixels, presented in Figure 3(a), shows lower output values when using filters, relaxed cross-check disparity delta values, large adaptive weight window sizes, and large adaptive weight color factor values. The median window size, bilateral window size, and bilateral color factor values do not show a significant influence on the output at the studied levels.
The influence of the parameters is also shown in the slopes of the main effects plots of Figure 4, and confirms the behavior found with the ANOVA of the multi-variate linear regression model. The settings that lower bad pixels from this analysis yield a result of 14.48%.
Fig. 3. (a) Box plots of bad pixels. (b) Contribution to the bad pixels variance by parameter.
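The main effects plotted in Figure 4 reduce to the mean response at each tested level of a factor; a sketch (names are ours):

```python
import numpy as np

def main_effects(levels, response):
    """Main effect of one factor: the mean response (bad pixels) at
    each tested level, as plotted in the main effects plots of Fig. 4.
    levels and response are parallel arrays over all experimental runs."""
    levels = np.asarray(levels)
    response = np.asarray(response)
    return {lv: response[levels == lv].mean() for lv in np.unique(levels)}
```

A steep slope between consecutive level means corresponds to a factor with a large influence on the output variance.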
4.3 Multi-variate Linear Regression Model
The analysis of variance on a multi-variate linear regression (MVLR) over all data sets, using the most parsimonious model, quantifies the parameters with the most influence, as shown in Figure 3(b). cc disp is the most significant factor, accounting for a third to a half of the variance in every case.
Interactions and higher-order terms are included in the multi-variate linear regression models to improve the goodness of fit. Reducing the number of input images per data set from 3456 to 1526, by excluding the worst performing cases corresponding to cc disp = 0 and aw col = [4, 7], allows using a cubic model with interactions, with an R² of 99.05%.

The residuals of the selected model fail to follow a normal distribution. Transforming the output variable or removing large residuals does not improve the residual distribution, and there is no reason to exclude any outliers from the image data set. Nonetheless, improved algorithm performance settings are found using the model, obtaining lower bad pixels values comparable to the ones obtained through the exploratory data analysis (14.66% vs. 14.48%).
In summary, the most noticeable influence on the output variable comes from having a relaxed cross-check filter, accounting for nearly half the response variance in all the study data sets. Window size is the next most influential factor, followed by color factor, and finally window size on the bilateral filter. Increasing the window sizes on the main algorithm yields better overall results at the expense of longer running times and some loss of foreground sharpness, while the support weights on each pixel have the chance of becoming more distinct, potentially reducing disparity mismatches. Increasing the color factor on the main algorithm allows better results by reducing the color differences, slightly compensating minor variations in intensity from different viewpoints.

A small median smoothing filter window size is faster than a larger one, while still having similar accuracy. Low settings on both the window size and the color factor of the bilateral filter seem to work best for a good balance between performance and accuracy.
Fig. 4. Main Effects Plots of each factor level for all data sets. Steeper slopes relate to
bigger influence on the variance of the bad pixels output measurement.
The optimal settings in the original data set are presented in Table 4 along with the proposed combinations. Low settings comprises the depth maps with all parameter settings at their minimum tested values, yielding 67.62% bad pixels. High settings refers to depth maps with all parameter settings at their maximum tested values, yielding 19.84% bad pixels. Best initial are the most accurate depth maps from the study data set, yielding 16.78% bad pixels. Exploratory analysis corresponds to the settings determined using the exploratory data analysis based on box plots and main effects plots, yielding 14.48% bad pixels. MVLR optimization is the extrapolation of the classical data analysis based on the multi-variate linear regression model, nested models, and ANOVA, yielding 14.66% bad pixels.

Table 4. Model comparison. Average bad pixels values over all data sets and their parameter settings.

Run Type             | bad pixels | aw win | aw col | m win | cc disp | cb win | cb col
Low Settings         | 67.62%     | 1      | 4      | 3     | 0       | 1      | 4
High Settings        | 19.84%     | 7      | 19     | 5     | 2       | 7      | 19
Best Initial         | 16.78%     | 7      | 19     | 5     | 1       | 3      | 4
Exploratory analysis | 14.48%     | 9      | 22     | 5     | 1       | 3      | 4
MVLR optimization    | 14.66%     | 11     | 22     | 5     | 3       | 3      | 18
The exploratory analysis estimation and the MVLR optimization tend to
converge at similar lower bad pixels values using the same image data set. The
best initial and improved depth map outputs are shown in Figure 1.
5 Conclusions and Future Work
This work presents a systematic methodology to measure the relative influence of the inputs of a depth map algorithm on the output variance, and the identification of new settings that improve the results from 16.78% to 14.48% bad pixels. The methodology is applicable to any group of depth map image sets generated with an algorithm whose user-specified parameters merit an assessment of their relative influence.
Using design of experiments reduces the number of depth maps needed to carry out the study when a large image database is not available. Further analysis of the input factors should start with exploratory fractional factorial designs comprising the full range of each factor, followed by a response surface experimental design and analysis. In selecting the factor levels, analyzing the influence of each filter independently would be an interesting criterion.
Acknowledgments. This work has been partially supported by the Spanish
Administration Agency CDTI under project CENIT-VISION 2007-1007, the
Colombian Administrative Department of Science, Technology, and Innovation;
and the Colombian National Learning Service (COLCIENCIAS-SENA) grant
No. 1216-479-22001.
References
1. Scharstein, D., Szeliski, R.: A taxonomy and evaluation of dense two-frame stereo
correspondence algorithms. Int. J. Comput. Vision 47(1-3), 7–42 (2002)
2. Gong, M., Yang, R., Wang, L., Gong, M.: A performance study on different cost aggregation approaches used in real-time stereo matching. Int. J. Comput. Vision 75,
283–296 (2007)
3. Yoon, K., Kweon, I.: Adaptive support-weight approach for correspondence search.
IEEE Trans. Pattern Anal. Mach. Intell. 28(4), 650 (2006)
4. Congote, J., Barandiaran, I., Barandiaran, J., Montserrat, T., Quelen, J., Ferrán, C., Mindan, P., Mur, O., Tarrés, F., Ruiz, O.: Real-time depth map generation architecture for 3d videoconferencing. In: 3DTV-Conference: The True Vision - Capture, Transmission and Display of 3D Video (3DTV-CON), 2010, pp. 1–4 (2010)
5. Gu, Z., Su, X., Liu, Y., Zhang, Q.: Local stereo matching with adaptive support-weight, rank transform and disparity calibration. Pattern Recogn. Lett. 29, 1230–1235 (2008)
6. Hosni, A., Bleyer, M., Gelautz, M., Rhemann, C.: Local stereo matching using
geodesic support weights. In: Proceedings of the 16th IEEE Int. Conf. on Image
Processing (ICIP), pp. 2093–2096 (2009)
7. Wang, L., Gong, M., Gong, M., Yang, R.: How far can we go with local optimization in real-time stereo matching. In: Proceedings of the Third International Symposium on 3D Data Processing, Visualization, and Transmission (3DPVT 2006),
pp. 129–136 (2006)
8. Fua, P.: A parallel stereo algorithm that produces dense depth maps and preserves
image features. Machine Vision and Applications 6(1), 35–49 (1993)
9. Weiss, B.: Fast median and bilateral filtering. ACM Trans. Graph. 25, 519–526
(2006)
10. Scharstein, D., Szeliski, R.: High-accuracy stereo depth maps using structured
light. In: IEEE Conference on Computer Vision and Pattern Recognition, vol. 1,
pp. 195–202 (2003)
11. Scharstein, D., Pal, C.: Learning conditional random fields for stereo. In: IEEE
Conference on Computer Vision and Pattern Recognition, pp. 1–8 (2007)
12. Hirschmuller, H., Scharstein, D.: Evaluation of cost functions for stereo matching.
In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8 (2007)
13. Tombari, F., Mattoccia, S., Di Stefano, L., Addimanda, E.: Classification and evaluation of cost aggregation methods for stereo correspondence. In: IEEE Conference
on Computer Vision and Pattern Recognition, pp. 1–8 (2008)