Depth Estimation using Modified Cost Function for Occlusion Handling
Krzysztof Wegner, Olgierd Stankiewicz and Marek Domanski
Abstract—The paper presents a novel approach to the occlusion handling problem in depth estimation using three views. A solution based on a modification of the similarity cost function is proposed. During depth estimation via optimization algorithms like Graph Cut, the similarity metric is constantly updated so that only non-occluded fragments in the side views are considered. At each iteration of the algorithm, non-occluded fragments are detected based on side-view virtual depth maps synthesized from the best currently estimated depth map of the center view. Then the similarity metric is updated so that correspondences are searched only in non-occluded regions of the side views. The experimental results, obtained on well-known 3D video test sequences, show that the depth maps estimated with the proposed approach provide about 1.25 dB of virtual view quality improvement in comparison to virtual views synthesized from depth maps generated by the state-of-the-art MPEG Depth Estimation Reference Software.
Keywords—depth estimation, disparity estimation, occlusion handling, MVD, graph cuts, DERS, free viewpoint television.
I. INTRODUCTION

3D video systems have recently gained a lot of attention, and many new 3D video systems have been developed. Super multiview television and free viewpoint television are examples of such novel 3D systems. In free viewpoint television a user is able to freely choose the position of a virtual camera. The requested view of the scene is generated from a dynamic 3D representation of the scene. The most commonly used 3D representation is Multiview Video plus Depth (MVD) [6], composed of multiple videos acquired by a set of cameras and accompanying depth maps for each of the views. Based on the transmitted videos and depth data, any view can be easily generated by employing depth-image-based rendering (DIBR) [7].

Recently, 3D extensions of standards such as AVC [32], [33] and HEVC [31], which allow for efficient transmission of a dynamic 3D scene representation in the MVD format, have been finalized.

Depth information in such systems can be acquired either directly by depth cameras [8], or indirectly by algorithmic depth estimation from the recorded videos [9]. Commonly, depth information is obtained by conversion from disparity information [10]. Although in computer vision disparity d is often treated as synonymous with depth (distance z), these terms are essentially the inverse of each other:

$$ z \sim \frac{1}{d} \qquad (1) $$

Krzysztof Wegner, Olgierd Stankiewicz and Marek Domanski are with the Chair of Multimedia Telecommunications and Microelectronics, Poznan University of Technology, Poznan, Poland (e-mail: [email protected]).
Disparity is a displacement vector between corresponding fragments (pixels, blocks) of two images of the same scene taken from different viewpoints. These two corresponding fragments represent the same fragment of the observed scene, but seen from two different viewpoints.
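The inverse relation of eq. (1) is commonly realized, for a rectified camera pair, by the pinhole relation z = f·B/d. The following is a minimal sketch under that assumption; the focal length f and baseline B are parameters the paper does not discuss, so this is illustrative only:

```python
import numpy as np

def disparity_to_depth(disparity, focal_length, baseline, eps=1e-6):
    # z = f * B / d for a rectified pair: one concrete realization of
    # the inverse proportionality z ~ 1/d of eq. (1). focal_length is
    # in pixels and baseline in metres; both are assumptions here.
    return focal_length * baseline / np.maximum(disparity, eps)
```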
Fig. 1. Three-view disparity estimation.
Stereo correspondence search is an active research topic in computer vision, and one of the basic methods of obtaining disparity information. Many stereo disparity estimation methods are known. A comprehensive study of stereo disparity estimation methods can be found in [34], and on the Middlebury webpage [30], which contains an up-to-date benchmark of stereo disparity estimation methods. In the scope of development of multiview systems, stereo correspondence search has been extended to multiview correspondence search [11], [12], [35].

For the sake of simplicity and accuracy, many algorithms assume that the images are taken by a rectified set of cameras [13], [14]. Consequently, corresponding fragments of a given image can be found on the same horizontal line in the remaining images.

Some algorithms use three views (left, central and right) [15], [17], [16], [36] as inputs and produce a disparity map or depth map for the central view (Fig. 1). Often, when it is not important whether the left or the right view is referred to, the name "side view" is used instead.

During disparity estimation, for a given fragment of the central view, the algorithm searches for the corresponding fragment in the side views that represents the same fragment/portion of the scene. The correspondence search is done on the basis of a Similarity Metric, which expresses how probable it is that a certain fragment of one image is the corresponding fragment of the second image. Although the metric used is often called similarity, it actually expresses dissimilarity between fragments. There are many Similarity Metrics known from the literature: Sum of Absolute Differences (SAD), Sum of Squared Differences (SSD), Rank, Census, Cross Correlation and others [3], [4].

The correspondence search is often defined as an optimization problem in which, for every fragment of the central image, the best (the most similar) fragment of the side views is selected. This optimization problem may be expressed in terms of an energy function using a Markov Random Field (MRF) and optimized via one of the optimization algorithms such as Belief Propagation [19], [20], Dynamic Programming [22], or Graph Cut [21].

Since the input videos are captured by multiple cameras with different positions, some parts of the observed scene can be occluded, and thus not visible, in some of the views. Disparity estimation for those fragments of the scene is challenging and requires special care. If the algorithm does not properly take possible occlusions within the scene into account, the estimated disparity can be wrong: it can point at fragments that do not truly correspond.

In this paper a novel approach to occlusion handling, designed to work in three-view disparity estimation algorithms, is proposed.

II. OCCLUSION PROBLEM IN DISPARITY ESTIMATION
Given three images of the same size, center I_C, left I_L and right I_R, we search, for every pixel P of the center view (at coordinates (x, y)), for a displacement t that minimizes a cost function expressing the similarity between pixel P (or a small fragment around pixel P, like a block) and the corresponding pixel P′ (or a small fragment around P′) displaced by t in the side views (at positions (x + t, y) in the left and (x − t, y) in the right view). Such a displacement is the disparity of the given pixel P of the center view:

$$ d_{Center}(x, y) = \arg\min_{t} \; Cost(x, y, t) \qquad (2) $$

In disparity estimation from three views (see Fig. 2), a given point of the scene visible from the center view can be visible from both of the side views (point A), or only from one of the side views (left or right, point B), or from neither of them (point C). If a given fragment of the scene visible from the center view is not visible from one or both of the side views, we say that this fragment of the scene is occluded in a side view (it is not visible from that particular side view).

Fig. 2. Occlusion in the three-view disparity estimation problem.

The simplest method for detecting occluded fragments is cross-checking [23]. Cross-checking tests the consistency of the disparity values estimated for pixels of the center view with those estimated for pixels in the left and right views. If the disparity value estimated in each view is different for a corresponding triple of pixels from the center, left and right views, the given pixels are assumed to be occluded. Next, the disparity values for occluded pixels are extrapolated from neighboring pixels that are not occluded. In order to perform cross-checking, disparity maps for all three views are required. Estimation of three disparity maps is not always possible, and even when estimating three disparity maps instead of one is possible, it is resource and time consuming.

Occlusion handling can also be performed by adding constraints, such as the ordering constraint or the uniqueness constraint, to the objective function of optimization procedures like Graph Cut (GC), Dynamic Programming (DP) or Belief Propagation (BP) used to estimate the disparity map.

The ordering constraint [24] imposes the same order of corresponding pixels in all views. If a pixel A is to the left of a pixel B in the center view, then in the side view pixel A′, the pixel corresponding to A, must also be to the left of pixel B′, the pixel corresponding to B. In real scenes the ordering constraint can be violated in the case of a big perspective change or in the case of thin objects. In such cases the ordering constraint can introduce errors into the estimated disparity maps.

The uniqueness constraint [25], [26] imposes a one-to-one correspondence between pixels in the center and side views. If a given pixel A of the central view is assigned to a corresponding pixel B in the side view, no other pixel of the central view can be assigned to correspond with pixel B in the side view. This way a unique pixel-to-pixel correspondence is enforced across all of the views.

Many disparity estimation algorithms are known that handle occlusions in an efficient way [5], [28], [26]. The main drawback of all of those algorithms is the additional constraints (terms) imposed on the optimization procedures, which increase the complexity and thus the execution time of the disparity estimation. Another approach to occlusion handling is to change the cost term (eq. 2), composed of similarity metrics, used in the optimization algorithms.
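As an illustration of eq. (2), a minimal winner-take-all sketch in Python is given below. A full estimator such as DERS instead embeds these costs, together with a smoothness term, into a Graph Cut energy, so this sketch only fixes the notation:

```python
import numpy as np

def estimate_disparity_wta(cost_volume):
    # cost_volume[t, y, x] holds Cost(x, y, t) for every candidate
    # displacement t; eq. (2) picks, per pixel, the displacement with
    # the lowest cost. Regularized estimators minimize an MRF energy
    # built from these same costs instead of taking a raw argmin.
    return np.argmin(cost_volume, axis=0)
```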
As we search for the corresponding fragment of the central view in both side views simultaneously, there are many ways of defining the Cost(x, y, t) function. Commonly [17], [16] it is the sum of the similarity metrics between a fragment in the center view and the corresponding fragments in the left and right views:

$$ Cost(x, y, t) = Similarity(I_C(x, y), I_L(x + t, y)) + Similarity(I_C(x, y), I_R(x - t, y)) \qquad (3) $$

Because of occlusions, Tanimoto [15] proposed to pick just the most similar fragment from either the left or the right view. The intuition is that an occluded fragment of the image will lead to a less similar fragment, thus the minimum of the similarity metrics from the left and right views is used:

$$ Cost(x, y, t) = \min\big( Similarity(I_C(x, y), I_L(x + t, y)),\; Similarity(I_C(x, y), I_R(x - t, y)) \big) \qquad (4) $$

In this paper we propose yet another way to define the cost function, which takes into account the occlusions possible within the scene.
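For illustration, a sketch of both cost definitions with SAD as the similarity metric (SAD is one of the metrics named above; the block radius and the lack of border handling are simplifying assumptions):

```python
import numpy as np

def block(img, x, y, r):
    # (2r+1)x(2r+1) fragment around (x, y); border handling omitted.
    return img[y - r:y + r + 1, x - r:x + r + 1].astype(np.int32)

def sad(a, b):
    # Sum of Absolute Differences -- one of the (dis)similarity metrics
    # listed in the text; any other metric could be plugged in here.
    return int(np.abs(a - b).sum())

def cost_sum(ic, il, ir, x, y, t, r=2):
    # Eq. (3): sum of similarities against both side views.
    c = block(ic, x, y, r)
    return sad(c, block(il, x + t, y, r)) + sad(c, block(ir, x - t, y, r))

def cost_min(ic, il, ir, x, y, t, r=2):
    # Eq. (4): Tanimoto's variant -- keep only the better match,
    # expecting the occluded view to yield the larger (worse) value.
    c = block(ic, x, y, r)
    return min(sad(c, block(il, x + t, y, r)), sad(c, block(ir, x - t, y, r)))
```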
III. PROPOSED OCCLUSION HANDLING

As stated before, a given fragment of the scene visible from the center view can be occluded in one or both side views (left and/or right) (Fig. 2). In such a case, searching for the correspondence of a given pixel of the center view in this particular side view (left or right) is pointless, as the given fragment of the scene is not visible from that particular side view. Considering a correspondence with an occluded fragment of an image could cause errors in the estimated disparity. Therefore the correspondence search should be performed only in those side views in which the considered fragment of the center view is not occluded. The cost function should be constructed in such a way that it considers only similarity metrics from non-occluded views. If a given fragment is visible in both views, then the cost function should be an average of both similarity metrics, in order to reduce the influence of noise, which is present in all views. We propose to define the cost function in a way that it considers only similarity metrics of fragments from a non-occluded view (either left or right):

$$ Cost(x, y, t) = \frac{ NotOcc_L(x, y, t) \cdot Sim(I_C(x, y), I_L(x + t, y)) + NotOcc_R(x, y, t) \cdot Sim(I_C(x, y), I_R(x - t, y)) }{ NotOcc_L(x, y, t) + NotOcc_R(x, y, t) } \qquad (5) $$

where
NotOcc_L(x, y, t) and NotOcc_R(x, y, t) express whether a given pixel of the center view is not occluded in the left and right views, respectively. Depending on the existence of occlusions in the views, the sum NotOcc_L(x, y, t) + NotOcc_R(x, y, t) in the denominator of eq. 5 can be 2 if a pixel is not occluded in either view, 1 if it is occluded in one of the side views (either left or right), and 0 if it is occluded in both side views. If a given pixel is occluded in both side views, equation 5 loses its meaning, thus in such a case a constant penalty value is used as the cost value:

Fig. 3. Occlusion problem in correspondence search.
$$ Cost(x, y, t) = const \qquad (6) $$

But why is a given fragment (object A) of the scene not visible in a side view? Because in the side view that fragment is occluded by some other part of the scene (object B). Object B blocks the light rays from object A, so in the side view the closer object B is visible instead of the farther object A.

Consider the example in Fig. 3, where two points A and B are observed by two cameras (left and center). Point B is closer to the cameras and point A is farther. Point B is visible in both views (left and center), at pixel positions B_Left and B_Center respectively. But due to the occlusion, point A is visible only in the center view, at pixel position A_Center. If there were no point B, point A would be visible in the left view at pixel position A_Left. The disparity of point B in the left view is the difference of the pixel positions B_Left and B_Center, and the disparity of point A in the left view would be (if the point were visible) the difference of the pixel positions A_Left and A_Center:

$$ d_{Left}(B_{Left}) = B_{Left} - B_{Center} \qquad (7) $$

$$ d_{Left}(A_{Left}) = A_{Left} - A_{Center} \qquad (8) $$

The distance z to the camera is reciprocal to disparity, so a fragment of an image representing a closer object (point B) has a bigger disparity than a fragment representing a farther object (point A):

$$ z_{Left}(A_{Left}) > z_{Left}(B_{Left}) \iff d_{Left}(A_{Left}) < d_{Left}(B_{Left}) \qquad (9) $$

For a given pixel A_Center of the center view at coordinates (x, y) and a considered displacement t, the corresponding pixel A_Left in the left view should be at coordinates (x + t, y). So, if we want to check whether a fragment A of the scene is occluded in the left view, we have to check the disparity (distance) assigned to the considered corresponding pixel A_Left in the left view. If the disparity d_Left(x + t, y) already assigned to the considered corresponding pixel A_Left is bigger than the considered displacement t, then pixel A_Left is probably not a fragment of the same object A, but rather of some other, closer object B that occludes object A in the left view.

Based on this consideration we can create a function assessing whether, for a pixel at coordinates (x, y) and a displacement t, the corresponding pixel is occluded or not in the left and right views:

$$ NotOcc_L(x, y, t) = \begin{cases} 1 & \text{for } t \geq d_{Left}(x + t, y) \\ 0 & \text{for } t < d_{Left}(x + t, y) \end{cases} \qquad (10) $$

TABLE I
POSITIONS OF VIEWS USED FOR EVALUATION OF QUALITY OF ESTIMATED DISPARITY MAPS.
Sequence Name     View A   View B   View V
Poznan Street        3        5        4
Poznan Hall 2        5        7        6
Poznan CarPark       3        5        4
Book Arrival         7        9        8

Fig. 4. Disparity map quality evaluation methodology.
$$ NotOcc_R(x, y, t) = \begin{cases} 1 & \text{for } t \geq d_{Right}(x - t, y) \\ 0 & \text{for } t < d_{Right}(x - t, y) \end{cases} \qquad (11) $$

NotOcc(x, y, t) equal to 1 means that the corresponding pixel in the side view at the given displacement is probably not occluded.
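A direct transcription of eqs. (5), (6), (10) and (11) might look as follows; the concrete penalty value is an assumption, as the paper only calls it a constant:

```python
OCCLUSION_PENALTY = 10_000  # the const of eq. (6); the value is an assumption

def not_occ_left(d_left, x, y, t):
    # Eq. (10): 1 if the tested displacement t is not smaller than the
    # disparity already assigned at (x + t, y) in the left view.
    return 1 if t >= d_left[y, x + t] else 0

def not_occ_right(d_right, x, y, t):
    # Eq. (11): the mirrored test at (x - t, y) in the right view.
    return 1 if t >= d_right[y, x - t] else 0

def occlusion_aware_cost(sim_left, sim_right, d_left, d_right, x, y, t):
    # Eq. (5): average only the similarity metrics of non-occluded side
    # views; eq. (6): constant penalty when occluded in both.
    nl = not_occ_left(d_left, x, y, t)
    nr = not_occ_right(d_right, x, y, t)
    if nl + nr == 0:
        return OCCLUSION_PENALTY
    return (nl * sim_left + nr * sim_right) / (nl + nr)
```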
IV. APPLICATION OF PROPOSED IDEA

The proposed idea is general: it does not impose any particular source of the disparity maps d_Left and d_Right for the left and right views. In general, however, the disparity maps for the left and right views are unknown before the disparity for the central view is estimated.

Commonly, disparity maps are estimated iteratively with the use of algorithms like Belief Propagation or Graph Cut. In such algorithms, at each iteration of the estimation, the algorithm maintains the up-to-date best disparity map estimated so far for the center view. This disparity map is further refined in the next iteration of the algorithm.

For our occlusion detection we propose to use disparity maps of the side views created from the disparity map of the center view through Depth-Image-Based Rendering (DIBR). After each iteration of the disparity estimation algorithm, we create the disparity maps of the side views (d_Left and d_Right) from the best disparity map of the center view estimated so far. This way, if the estimation algorithm has already assigned some disparity d_Center(B) to some pixel B_Center, then pixel A_Center cannot have such a disparity that its corresponding pixel A_Left (Fig. 3) is at the same position as the corresponding pixel B_Left of pixel B_Center. In other words, a fragment B of the scene, represented by pixel B_Center in the center view, should occlude a fragment A of the scene (represented by pixel A_Center in the center view) as seen from the left view.
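A minimal sketch of such a DIBR synthesis of a side-view disparity map, assuming integer disparities and a rectified setup; the z-buffer-style max keeps the closer object, which is exactly what makes the occluding surface visible in the synthesized map:

```python
import numpy as np

def synthesize_side_disparity(d_center, to_left=True):
    # Forward-warp the center-view disparity map into a side view.
    # A center pixel (x, y) with disparity t lands at (x + t, y) in the
    # left view and (x - t, y) in the right view, as in the text.
    # Disocclusions are left as 0; a real implementation would fill them.
    h, w = d_center.shape
    d_side = np.zeros_like(d_center)
    for y in range(h):
        for x in range(w):
            t = int(d_center[y, x])
            xs = x + t if to_left else x - t
            if 0 <= xs < w:
                # Where several pixels collide, the larger disparity
                # (the closer object) wins.
                d_side[y, xs] = max(d_side[y, xs], d_center[y, x])
    return d_side
```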
V. EXPERIMENTS
We have implemented our idea in the Depth Estimation Reference Software (DERS) [18] version 5.0, developed by the Moving Picture Experts Group (MPEG) of the International Organization for Standardization (ISO) during the work on 3D video compression standardization. DERS is a state-of-the-art disparity estimation technique, designed with 3D video applications in mind. It uses Graph Cut as the optimization algorithm, along with many other techniques that improve and/or speed up disparity estimation from three input videos.

The proposed approach was tested on four 3D video test sequences recommended by the MPEG committee (Fig. 5), namely: Poznan Street, Poznan CarPark, Poznan Hall 2 [1] and Book Arrival [2].

Fig. 5. Exemplary frames from the multiview test sequences used in the experiments: (a) Poznan Street [1], (b) Poznan CarPark [1], (c) Poznan Hall2 [1], (d) Book Arrival [2].
In applications such as Free Viewpoint Television, disparity maps are used mainly for the purpose of view synthesis. Therefore, we have evaluated our proposed method indirectly, by assessing the quality of the synthesized views.

Fig. 6. Standard Middlebury dataset [29] used for evaluation of the proposed algorithm: (a) Tsukuba, (b) Venus, (c) Teddy, (d) Cones.

Disparity maps for two views A and B (Fig. 4) have been estimated with the use of the proposed method and the original, unmodified DERS software. Based on views A and B and the estimated disparity maps for views A and B, a view V positioned in between views A and B was synthesized. The exact view numbers used for each test sequence are provided in Table I.

The quality of the estimated disparity maps for views A and B is measured as the quality of the rendered view V. The quality of the synthesized view V is expressed by the PSNR of luminance in comparison with the view V captured by a real camera positioned at the same spatial position (see Fig. 4); a sketch of this measure is given after Table II below.

Such a methodology is compliant with the experimental methodology developed and approved by the MPEG committee of the International Organization for Standardization, targeted at high quality 3D television, e.g. for autostereoscopic displays, and is used by other research institutes.

In the course of the evaluation, disparity maps were estimated for every frame of the sequences (mostly 250 frames per view). This has allowed us to evaluate our algorithm on a wide range of different images. The disparity estimation was done with pixel, half-pixel and quarter-pixel precision. Also, a wide range of regularization strengths used in the Graph Cut algorithm has been evaluated. In DERS the regularization is controlled by a so-called smoothing coefficient; in the experiments, the range of 1 to 4 was explored.

We have also evaluated our algorithm on the standard Middlebury datasets [29]: Tsukuba, Venus, Teddy and Cones (Fig. 6). For this purpose, we have modified the DERS algorithm to directly output raw disparity maps in the format required by the Middlebury evaluation webpage [30]. Because both the proposed method and the DERS algorithm are designed to work with three input images, we have extended the recommended standard stereo pair with a third image, as specified in Table II.
TABLE II
SPECIFICATION OF THREE VIEWS USED FOR DISPARITY ESTIMATION FOR EACH MIDDLEBURY DATASET.

Dataset name    Additional view   Standard stereo pair
                Left view         Center view   Right view
Tsukuba         2                 3             4
Venus           0                 2             6
Teddy           0                 2             6
Cones           0                 2             6
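As described above, the quality of the synthesized view V is measured by the PSNR of its luminance against the view captured by the real camera at the same position. A minimal sketch of that measure (the peak value 255 assumes 8-bit luma):

```python
import numpy as np

def luminance_psnr(synthesized, reference, peak=255.0):
    # PSNR of the luma component of the synthesized view V against the
    # view captured by the real camera at the same position (Fig. 4).
    err = synthesized.astype(np.float64) - reference.astype(np.float64)
    mse = np.mean(err ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```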
VI. RESULTS
The comparison of the quality of the estimated disparity maps for the proposed method versus the original DERS can be found in Fig. 7a–7d. As can be noticed, the smoothing coefficient can have a significant impact on the quality of the disparity maps estimated by DERS. It can be expected that in a real-world scenario this parameter will be automatically controlled to provide the best results. Therefore, in the summarizing Table III, we have presented only the best-performing cases. Depending on the case, the proposed occlusion handling brings a gain of 0.02–2.50 dB of luminance PSNR of the synthesized view, relative to the original, unmodified DERS. On average, the proposal provides an improvement of 1.26 dB for pixel-precise disparity estimation, 1.23 dB for half-pixel-precise disparity estimation, and 1.18 dB for quarter-pixel-precise disparity estimation.

The application of the proposed occlusion handling to the Middlebury images results in about a 0.2 percentage point improvement in bad pixels (Table IV). Please keep in mind that the Middlebury datasets contain very few occlusions.
VII. CONCLUSION
We have presented a novel approach to occlusion handling in disparity estimation, based on a modification of the similarity cost function. The proposed approach has been tested in a three-view disparity estimation scenario. For occlusion detection, synthesized disparity maps of the left and right views have been used.

For well-known multiview video test sequences, the experimental results show that the proposed approach provides a virtual view quality improvement of about 1.25 dB of luminance PSNR over the state-of-the-art technique implemented in the MPEG Depth Estimation Reference Software (DERS). Moreover, direct quality evaluation of the estimated disparity reveals that the proposed approach reduces the number of bad pixels by 1.26 p.p.
ACKNOWLEDGMENT
The research project was supported by the National Science Centre, Poland, according to the decision DEC-2012/05/N/ST7/1279.
TABLE III
QUALITY COMPARISON BY PSNR OF A SYNTHESIZED VIEW FOR THE BEST DEPTH MAPS WITH RESPECT TO THE SMOOTHING COEFFICIENT.

Sequence Name     Pixel precision               Half-pixel precision          Quarter-pixel precision
                  DERS [dB]  Proposed [dB]  Gain  DERS [dB]  Proposed [dB]  Gain  DERS [dB]  Proposed [dB]  Gain
Poznan Street     36.31      37.41          1.10  36.78      37.70          0.92  36.96      37.88          0.90
Poznan Hall2      34.62      36.06          1.44  34.62      36.11          1.48  34.74      36.39          1.56
Poznan CarPark    31.71      33.89          2.18  31.36      33.87          2.50  31.51      33.99          2.48
Book Arrival      36.06      36.36          0.30  37.37      37.38          0.02  37.37      37.39          0.02
Average           -          -              -     -          -              -     -          -              -

Fig. 7. Performance comparison of depth estimation with use of the proposed method and DERS: (a) Poznan CarPark, (b) Poznan Street, (c) Poznan Hall 2, (d) Book Arrival. Each plot shows luminance PSNR [dB] versus the smoothing coefficient, for the proposed method and DERS at pixel, half-pixel and quarter-pixel precision.

TABLE IV
COMPARISON OF PROPOSED METHOD WITH DERS ON MIDDLEBURY DATASETS (PERCENTAGE OF BAD PIXELS).

Algorithm           Tsukuba               Venus                 Teddy                 Cones
                    nonocc  all    disc   nonocc  all    disc   nonocc  all    disc   nonocc  all    disc
GC+occ              1.19    2.01   6.24   1.64    2.19   6.75   11.2    17.4   19.8   5.36    12.4   13.02
OP+occ [36]         2.91    3.56   7.33   0.24    0.49   2.76   10.9    15.4   20.6   5.42    10.8   12.5
Putv3               1.77    3.86   9.42   0.42    0.95   5.72   7.02    14.2   18.3   2.40    9.11   6.56
CostAggr+occ [37]   1.38    1.96   7.14   0.44    1.13   4.87   6.80    11.9   17.3   3.60    8.57   9.36
DERS                2.70    3.30   12.10  0.67    1.25   8.53   10.2    11.5   23.3   5.17    7.33   9.50
Proposed            2.65    3.01   11.20  0.63    1.02   8.34   9.96    10.97  -      5.02    7.12   -

REFERENCES

[1] M. Domański, O. Stankiewicz, K. Wegner et al., "Poznań multiview video test sequences and camera parameters", ISO/IEC JTC1/SC29/WG11 Doc. M17050, Xian, China, October 2009.
[2] I. Feldmann, A. Smolic, et al., "HHI Test Material for 3D Video", ISO/IEC JTC1/SC29/WG11 Doc. M15413, Archamps, France, 2008.
[3] K. Wegner, O. Stankiewicz, "Similarity measures for depth estimation", 3DTV-Conference 2009, Potsdam, Germany, May 2009.
[4] S. Birchfield and C. Tomasi,
"A pixel dissimilarity measure that is insensitive to image sampling", IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(4):401-406, April 1998.
[5] Woo-Seok Jang, Yo-Sung Ho, "Efficient disparity map estimation using occlusion handling for various 3D multimedia applications", IEEE Transactions on Consumer Electronics, vol. 57, no. 4, pp. 1937-1943, November 2011.
[6] K. Müller, P. Merkle, T. Wiegand, "3-D video representation using depth maps", Proceedings of the IEEE, vol. 99, no. 4, April 2011.
[7] L. Zhang and W. J. Tam,
"Stereoscopic image generation based on depth images for 3DTV", IEEE Transactions on Broadcasting, vol. 51, no. 2, pp. 191-199, June 2005.
[8] S. Y. Kim, J. H. Cho, and A. Koschan, "3D video generation and service based on a TOF depth sensor in MPEG-4 multimedia framework", IEEE Transactions on Consumer Electronics, vol. 56, no. 3, pp. 1730-1738, August 2010.
[9] M. Domański, A. Dziembowski, A. Kuehn, M. Kurc, A. Łuczak, D. Mieloch, J. Siast, O. Stankiewicz and K. Wegner, "Experiments on acquisition and processing of video for free-viewpoint television", 3DTV-Conference 2014, Budapest, Hungary, July 2014.
[10] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, 2nd ed., Cambridge University Press, 2003, pp. 262-278.
[11] M. Okutomi and T. Kanade, "A multiple baseline stereo", IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(4):353-363, April 1993.
[12] R. T. Collins, "A space-sweep approach to true multi-image matching", in Proc. CVPR '96, pp. 358-363, San Francisco, 1996.
[13] J. Stankowski, K. Klimaszewski, O. Stankiewicz, K. Wegner, M. Domański, "Preprocessing methods used for Poznan 3D/FTV test sequences", ISO/IEC JTC1/SC29/WG11 Doc. M17174, Kyoto, Japan, January 2010.
[14] J. Stankowski, K. Klimaszewski, "Application of epipolar rectification algorithm in 3D Television", in Image Processing and Communications Challenges 2, Advances in Intelligent and Soft Computing, vol. 84, Springer-Verlag, Berlin, 2010, pp. 345-352, ISBN: 978-3-642-16294-7.
[15] M. Tanimoto, T. Fujii, K. Suzuki, N. Fukushima, and Y. Mori, "Reference softwares for depth estimation and view synthesis", ISO/IEC JTC1/SC29/WG11 Doc. M15377, Archamps, France, April 2008.
[16] Sang-Beom Lee, Yo-Sung Ho, "Multi-view depth map estimation enhancing temporal consistency", 23rd International Technical Conference on Circuits/Systems, Computers and Communications.
[17] M. Wildeboer, N. Fukushima, T. Yendo, M. T. Panahpour, T. Fujii, M. Tanimoto, "A semi-automatic multi-view depth estimation method", Proceedings of the SPIE, vol. 7744, 2010.
[18] M. Wildeboer, O. Stankiewicz, K. Wegner, "A soft-segmentation matching in Depth Estimation Reference Software (DERS) 5.0", ISO/IEC JTC1/SC29/WG11 Doc. M17049, Xian, China, October 2009.
[19] J. Sun, N. N. Zheng, and H. Y. Shum, "Stereo matching using belief propagation", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 7, pp. 787-800, July 2003.
[20] P. Felzenszwalb and D. Huttenlocher, "Efficient belief propagation for early vision", in Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 261-268, June 2004.
[21] Y. Boykov, O. Veksler, and R. Zabih, "Fast approximate energy minimization via graph cuts", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 11, pp. 1222-1239, November 2001.
[22] O. Veksler, "Stereo correspondence by dynamic programming on a tree", CVPR 2005.
[23] G. Egnal and R. Wildes, "Detecting binocular half-occlusions: empirical comparisons of five approaches", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 8, pp. 1127-1133, August 2002.
[24] A. Bobick and S. Intille, "Large occlusion stereo", International Journal of Computer Vision, vol. 33, no. 3, pp. 181-200, September 1999.
[25] D. Marr and T. A. Poggio, "Cooperative computation of stereo disparity", Science, vol. 194, no. 4262, pp. 283-287, October 1976.
[26] V. Kolmogorov and R. Zabih, "Computing visual correspondence with occlusions using graph cuts", in Proc. IEEE International Conference on Computer Vision, pp. 508-515, July 2001.
[27] T. Liu, P. Zhang, and L. Luo, "Dense stereo correspondence with contrast context histogram, segmentation-based two-pass aggregation and occlusion handling", Lecture Notes in Computer Science, vol. 5414, pp. 449-461, January 2009.
[28] R. Ben-Ari and N. Sochen, "Stereo matching with Mumford-Shah regularization and occlusion handling", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 11, pp. 2071-2084, November 2010.
[29] D. Scharstein and R. Szeliski, "High-accuracy stereo depth maps using structured light", in IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2003), vol. 1, pp. 195-202, Madison, WI, June 2003.
[30] Middlebury Stereo Evaluation - Version 2, webpage visited 2015-01-24, http://vision.middlebury.edu/stereo/eval.
[31] G. Tech, K. Wegner, Y. Chen, S. Yea, Joint Collaborative Team on 3D Video Coding Extension Development of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, Doc. JCT3V-J1001, 10th Meeting: Strasbourg, FR, 18-24 October 2014.
[32] Annex I "Multiview and Depth video coding" of ISO/IEC 14496-10, Int. Standard "Generic coding of audio-visual objects - Part 10: Advanced Video Coding", 8th Ed., 2013; also: ITU-T Rec. H.264, Edition 8.0, 2013.
[33] JCT-3V of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, Doc. JCT3V-G1003, San Jose, USA, 2014.
[34] D. Scharstein and R. Szeliski, "A taxonomy and evaluation of dense two-frame stereo correspondence algorithms", International Journal of Computer Vision, 47(1/2/3):7-42, April-June 2002.
[35] S. M. Seitz, B. Curless, J. Diebel, D. Scharstein, and R. Szeliski,