3D Surface Reconstruction From Multi-Date Satellite Images
Sebastian Bullinger, Christoph Bodensteiner, Michael Arens
Fraunhofer IOSB, Ettlingen, Germany
{sebastian.bullinger, christoph.bodensteiner, michael.arens}@iosb.fraunhofer.de

KEY WORDS:
Satellite Imagery, 3D Reconstruction, Photogrammetry, Surface Reconstruction, Texturing.
ABSTRACT:
The reconstruction of accurate three-dimensional environment models is one of the most fundamental goals in the field of photogrammetry. Since satellite images provide suitable properties for obtaining large-scale environment reconstructions, there exists a variety of Stereo Matching based methods to reconstruct point clouds for satellite image pairs. Recently, the first Structure from Motion (SfM) based approach has been proposed, which allows reconstructing point clouds from multiple satellite images. In this work, we propose an extension of this SfM based pipeline that allows us to reconstruct not only point clouds but watertight meshes including texture information. We provide a detailed description of several steps that are mandatory to exploit state-of-the-art mesh reconstruction algorithms in the context of satellite imagery. This includes a decomposition of finite projective camera calibration matrices, a skew correction of corresponding depth maps and input images as well as the recovery of real-world depth maps from reparameterized depth values. The paper presents an extensive quantitative evaluation on multi-date satellite images demonstrating that the proposed pipeline combined with current meshing algorithms outperforms state-of-the-art point cloud reconstruction algorithms in terms of completeness and median error. We make the source code of our pipeline publicly available.
1. INTRODUCTION

1.1 3D Surface Reconstruction with Satellite Imagery
The creation of large-scale environment reconstructions is relevant for several application areas such as urban planning or environmental monitoring. While there is a variety of modalities and sensors to compute such models, the usage of observation satellites has recently gained significant attention because of the progress made in the field of image-based reconstruction. In contrast to pictures from ground-level and airborne devices, satellite images cover huge areas, which allows for reconstructing large-scale environment models with less effort. In contrast to active sensors like Lidar or Radar, image data inherently represents appearance information, which allows image-based reconstruction pipelines to derive not only geometry, but also the corresponding texture information.
Stereo Matching and Structure from Motion (SfM) are currently the predominant methods to derive geometric structures from satellite images (de Franchis et al., 2014; Zhang et al., 2019). While SfM reconstructs information from multiple images, Stereo Matching is restricted to single image pairs. Thus, SfM based approaches are inherently better suited to process large (unstructured) image sets such as multi-date satellite imagery. Since SfM obtains intrinsic and extrinsic camera parameters during reconstruction, the corresponding results allow utilizing dense and surface reconstruction methods to compute dense point clouds, to triangulate corresponding meshes and to perform mesh texturing. In comparison to point clouds, textured meshes oftentimes create a similar appearance with a lower information redundancy and inherently contain specific properties that simplify the computation of consistent visualizations or intersection tests at different scales.
We propose a pipeline that allows computing textured meshes from multi-date satellite imagery. To apply available surface reconstruction libraries to satellite images, several preprocessing steps are required. This leads to the following main contributions:
(1) We leverage PAN-sharpening (Pohl and Genderen, 1998) to combine information of image data captured by different sensors, i.e. we compute high-resolution satellite images with color information exploiting the high spatial resolution of panchromatic images and the color information of low-resolution multispectral images - see Section 3. Since satellites cover huge areas, the corresponding reconstructions can easily exceed the available memory capacities of modern workstations. Thus, we extract and align only a specific area of interest of the panchromatic and multispectral imagery using the metadata of the respective satellite image.
(2) The projection properties of satellite imaging sensors are typically described by
Rational Polynomial Camera (RPC) models (Tao and Hu, 2001). Since the reconstruction operates only on a specific (continuous) part of each satellite image, the corresponding (complex) RPC model can be substituted with a Finite Projective Camera (FPC) model that approximates the RPC model locally. Since current state-of-the-art dense and surface reconstruction libraries do not support the skew parameter present in FPC models, we perform a specific adaption of the reconstructed FPC models that allows us to maintain the consistency of the reconstruction results by performing a skew correction of the corresponding images and depth maps. A mathematical derivation that justifies the selected transformation is provided in Section 4.
(3) To improve the numerical stability of the depth map computation, we follow the MVS step proposed by Zhang et al. (2019). Thus, the converted (and undistorted) depth maps are defined w.r.t. a reparameterized space. Since Zhang et al. (2019) do not provide information on how to obtain the actual depth values, we provide a concise derivation in Section 5 that demonstrates how the inverse projection matrix allows recovering the actual depth values.
(4) Following the steps above, we are able to utilize current state-of-the-art dense and surface reconstruction algorithms to reconstruct textured meshes from multi-date satellite imagery - see Table 1 for the main components of the proposed pipeline. We demonstrate the validity of our approach in Section 6 by performing a comprehensive quantitative evaluation: the computed meshes exceed (in terms of completeness and median error) dense point clouds reconstructed with previously presented methods.
(5) To foster future research, we make the source code of our pipeline publicly available. (Source code is available at https://github.com/hidden/for/review.)

Pipeline | SfM | MVS | Mesh Reconstruction | Texturing
Colmap (General Purpose) | Schönberger and Frahm (2016) | Schönberger et al. (2016) | Kazhdan and Hoppe (2013), Schönberger et al. (2016) | -
MVE (General Purpose) | Fuhrmann et al. (2014) | Goesele et al. (2007), Langguth et al. (2016) | Fuhrmann and Goesele (2014), Ummenhofer and Brox (2017) | Waechter et al. (2014)
OpenMVG & OpenMVS (General Purpose) | Moulon et al. (2012) | Barnes et al. (2009) | Jancosek and Pajdla (2014) | Waechter et al. (2014)
Meshroom (General Purpose) | Moulon et al. (2012) | Hirschmuller (2005) | Jancosek and Pajdla (2014) | Burt and Adelson (1983)
Ours (Satellite Specific) | Zhang et al. (2019) + Schönberger and Frahm (2016) | Zhang et al. (2019) + Schönberger et al. (2016) + Goesele et al. (2007) | Ummenhofer and Brox (2017) | Waechter et al. (2014)

Table 1. Comparison of our pipeline proposed for satellite reconstruction with state-of-the-art (general purpose) reconstruction pipelines including Colmap (Schönberger and Frahm, 2016), MVE (Fuhrmann et al., 2014), OpenMVG & OpenMVS (Moulon et al., 2013; Cernea, 2021) and Meshroom (AliceVision, 2018).

In the current literature on satellite image reconstruction there are two types of approaches: Stereo Matching and SfM. While the majority of methods such as (Kuschk, 2013), S2P (de Franchis et al., 2014) and ASP (Beyer et al., 2018) rely on Stereo Matching to compute 3D point clouds from satellite images, VisSat (Zhang et al., 2019) recently presented an SfM based approach to tackle this problem. While SfM allows computing a three-dimensional scene representation reflecting the information of multiple images, Stereo Matching is limited to reconstructing point clouds for single image pairs. Applying Stereo Matching in the context of multi-date satellite imagery requires additional post-processing steps to fuse the individual point clouds into a single scene representation - see Facciolo et al. (2017). Because the number of image pairs increases quadratically with the number of input images, Stereo Matching based methods do not scale well for large image sets. SfM tackles the quadratic increase by exploiting specific algorithms and data structures such as scene graphs to accelerate the computation. Thus, SfM requires substantially less computation time than Stereo Matching to reconstruct multi-date satellite images - a corresponding evaluation is provided by Zhang et al. (2019).
In addition, the output of SfM allows to directly leverage dense and surface reconstruction algorithms for dense point cloud computation, meshing and texturing. Current state-of-the-art dense reconstruction pipelines have been presented in Colmap (Schönberger et al., 2016), Meshroom (AliceVision, 2018), MVE (Fuhrmann et al., 2014) and OpenMVS (Cernea, 2021), which build on popular algorithms for dense reconstruction (Schönberger et al., 2016; Goesele et al., 2007; Langguth et al., 2016; Barnes et al., 2009), meshing (Labatut et al., 2009; Kazhdan and Hoppe, 2013; Fuhrmann and Goesele, 2014; Ummenhofer and Brox, 2017; Jancosek and Pajdla, 2011) and texturing (Burt and Adelson, 1983; Waechter et al., 2014). Table 1 shows a comparison of the different image-based (general purpose) pipelines and our proposed approach for multi-date satellite image reconstruction.
2. SATELLITE RECONSTRUCTION PIPELINE
In the following, we describe the proposed pipeline for reconstructing textured surface models from multi-date satellite imagery. The dependencies of the different pipeline components and their relationships are depicted in Fig. 1.
Processing multi-date images of observation satellites with current state-of-the-art image-based reconstruction libraries easily exceeds the memory capacities of modern workstations. To control memory requirements and processing times, our pipeline determines an area of interest (AOI) in a given set of panchromatic and multispectral image pairs using a single bounding box defined in UTM coordinates. The panchromatic band is agnostic to color information, but shows a lower ground sample distance than the color bands captured by the multispectral sensor.
Since many state-of-the-art feature descriptors such as RootSIFT (Arandjelovic, 2012) exploit image gradients instead of plain color information, we utilize panchromatic images to reconstruct the scene geometry, i.e. to perform the SfM, MVS and meshing computations.
The SfM component exploits the approach by Zhang et al. (2019) to locally approximate the RPC model (Tao and Hu, 2001) of each panchromatic image with an FPC model describing only the projection properties of the area of interest. The FPC model has 11 degrees of freedom; the corresponding calibration matrix is shown in (1).

Figure 1. Pipeline overview including the different components, the corresponding results and their computational dependencies. As input, the pipeline receives a set of panchromatic and multispectral images as well as a bounding box defining the area of interest.

In order to represent the obtained SfM reconstruction with camera models containing no skew parameters (i.e. 10 degrees of freedom), we apply an intrinsic parameter decomposition of the registered cameras as well as a skew correction of the corresponding input images and depth maps. Further details are provided in Section 4.1 and Section 4.2. This representation allows us to utilize current state-of-the-art dense and surface reconstruction libraries to compute dense point clouds and non-textured meshes.
In order to texture the reconstructed surfaces, we leverage PAN-sharpening (Pohl and Genderen, 1998) to compute high-resolution satellite images with color information, exploiting the high spatial resolution of the panchromatic images and the color information of the low-resolution multispectral images. Analogously to the panchromatic images, we apply the skew correction step to the PAN-sharpened images to utilize current texturing libraries.
We considered several state-of-the-art SfM pipelines to reconstruct satellite images, as shown in Table 1. The evaluations conducted by Knapitsch et al. (2017) and Stathopoulou et al. (2019) suggest that Colmap (Schönberger and Frahm, 2016) is currently one of the top performing SfM pipelines. Conveniently, there already exists VisSat (Zhang et al., 2019), a satellite-specific adaption of Colmap, that allows us to perform a sparse reconstruction of multi-date satellite images.
We considered several methods (Kazhdan and Hoppe, 2013; Schönberger et al., 2016; Fuhrmann and Goesele, 2014; Ummenhofer and Brox, 2017; Jancosek and Pajdla, 2011) to reconstruct a watertight surface of the environment. Our quantitative evaluation in Section 6 suggests that the algorithm by Ummenhofer and Brox (2017) is a reasonable candidate to perform the surface reconstruction in the context of satellite images.
As shown in Table 1, our pipeline uses the method by Waechter et al. (2014) in the texturing step.
3. SATELLITE IMAGE PREPROCESSING

3.1 Area of Interest Extraction
As previously mentioned, earth observation satellites typically use panchromatic and multispectral sensors for image acquisition to capture different resolutions and wavelength spectra. The panchromatic band is agnostic to color information, but shows a lower ground sample distance than the color bands captured by the multispectral sensor. We use the provided metadata of the corresponding multispectral image to extract the blue, green and red color bands. Further, we filter all panchromatic and multispectral images with high cloud coverage to reduce the processing time, using the corresponding metadata. To compute the area of interest for each panchromatic or multispectral image, we use the corresponding RPC model to project a geo-registered rectangle (provided by the user) to pixel locations.
Since the satellite images are defined as high dynamic range images with long-tailed intensity distributions, a naive scaling results in images with low contrast. To tackle this issue, we perform the tone mapping procedure proposed by Zhang et al. (2019) for panchromatic images, which includes a gamma correction and a filtering of intensity values below a lower and above an upper percentile. The gamma correction is defined by $I_g = I^{1/\gamma}$, where $I_g$ denotes the gamma-corrected image and $I$ the original satellite image. To extend this approach to multispectral images, we apply the percentile filtering for each channel individually. This is necessary to avoid that a single color band dominates the filtering process, which would potentially result in unintended color shifts.
Since the RPC models are geo-registered with a set of image-specific ground control points, we are able to project the area of interest to each panchromatic and multispectral image pair. By extracting the resulting sub-parts, we obtain an aligned image pair that represents the same area.
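The tone mapping step described above can be sketched as follows. The gamma value and the percentile bounds are placeholder parameters of this sketch (the concrete values follow Zhang et al. (2019) and are not reproduced here); the per-channel percentile clipping mirrors the extension to multispectral images.

```python
import numpy as np

def tonemap(img, gamma, lo_pct, hi_pct):
    """Tone-map a high-dynamic-range satellite image to 8 bit.

    `gamma`, `lo_pct` and `hi_pct` are illustrative parameters.
    Percentile clipping is applied per channel so that no single
    band dominates the filtering (avoiding color shifts).
    """
    img = img.astype(np.float64)
    img = np.power(img / img.max(), 1.0 / gamma)  # gamma correction I_g = I^(1/gamma)
    if img.ndim == 2:                             # panchromatic: add a channel axis
        img = img[..., None]
    out = np.empty_like(img)
    for c in range(img.shape[-1]):
        lo = np.percentile(img[..., c], lo_pct)
        hi = np.percentile(img[..., c], hi_pct)
        out[..., c] = np.clip((img[..., c] - lo) / (hi - lo), 0.0, 1.0)
    return np.squeeze((out * 255).astype(np.uint8))
```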
Because of the alignment, we are able to exploit PAN-sharpening (Pohl and Genderen, 1998) to leverage the high resolution of the panchromatic image and the color information of the multispectral image to compute a high-resolution image including color information.
Recently published PAN-sharpening approaches (Fu et al., 2019; Deng et al., 2019) evaluate their algorithms only on images with low resolution. By conducting several experiments, we observed that processing high-resolution imagery with these methods is infeasible because of their high computational requirements; for instance, the implementation by Fu et al. (2019) required a substantial amount of time on our workstation to process a single multispectral and panchromatic image pair. Instead, we utilize a weighted variant (GDAL/OGR contributors, 2020) of the widely used Brovey transform (Pohl and Genderen, 1998). While the algorithm produces overall good results, it occasionally shows artifacts for individual pixels or small pixel regions, caused by strongly reflecting surfaces such as vehicles. Since our pipeline computes the final mesh textures with the algorithm proposed by Waechter et al. (2014), a photo-consistency check ensures that such artifacts usually do not hamper the final appearance.
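The weighted Brovey transform can be sketched as below. This is a simplified stand-in for the GDAL implementation, under the assumptions that the multispectral bands have already been upsampled to the panchromatic resolution and that both images are co-registered; the function name and signature are our own.

```python
import numpy as np

def brovey_pansharpen(pan, ms, weights):
    """Weighted Brovey transform (sketch).

    pan:     (H, W) panchromatic band.
    ms:      (H, W, C) multispectral bands, upsampled to pan resolution.
    weights: per-band weights used to synthesize a pseudo-pan intensity.
    Each band is scaled by the ratio of the real pan band to the
    weighted intensity of the multispectral bands.
    """
    ms = ms.astype(np.float64)
    pseudo_pan = np.tensordot(ms, np.asarray(weights, dtype=np.float64),
                              axes=([-1], [0]))
    ratio = pan / np.maximum(pseudo_pan, 1e-9)  # guard against division by zero
    return ms * ratio[..., None]                # scale every band by the ratio
```

If the panchromatic band equals the weighted intensity of the multispectral bands, the output reproduces the multispectral input; strong deviations (e.g. specular pixels) are exactly the cases where the artifacts mentioned above can appear.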
4. SUBSTITUTION OF RECONSTRUCTION RESULTS
We leverage the approach by Zhang et al. (2019) in order to locally approximate each pushbroom camera model with a finite projective camera (FPC) model (Hartley and Zisserman, 2004). Current state-of-the-art libraries for surface reconstruction and texturing such as MVE (Fuhrmann et al., 2014) or OpenMVS (Cernea, 2021) do not support the skew parameter contained in finite projective camera models. We address this issue by substituting the reconstructed FPC models with simpler calibration matrices containing no skew factors. To maintain the consistency of the reconstruction, we perform a skew correction of the corresponding images and depth maps.
In the following, we derive a decomposition of the FPC calibration matrix that justifies the selected camera model substitution and the conducted image and depth map transformation.
The calibration matrix of the FPC model is shown in (1),

$$K_p = \begin{pmatrix} f_x & s & p_x \\ 0 & f_y & p_y \\ 0 & 0 & 1 \end{pmatrix} \qquad (1)$$

where $f_x$ and $f_y$ represent the focal length of the camera in terms of pixel dimensions in the x and y direction. Similarly, $p_x$ and $p_y$ define the principal point with respect to the two pixel dimensions. The value $s$ reflects the skew of the camera: if $s \neq 0$, the pixels are skewed such that the x- and y-axes are not perpendicular.
We decompose the calibration matrix $K_p$ of the FPC into a simplified calibration matrix $K_{p'}$ and an affine transformation $T_{p' \to p}$ with $K_p = T_{p' \to p} K_{p'} \Leftrightarrow T_{p' \to p} = K_p K_{p'}^{-1}$. In general, the transformation $T_{p' \to p} = K_p K_{p'}^{-1}$ of two FPC models $K_p$ and $K_{p'}$ is given by (2).

$$\underbrace{\begin{pmatrix} f_x & s & p_x \\ 0 & f_y & p_y \\ 0 & 0 & 1 \end{pmatrix}}_{K_p} \underbrace{\begin{pmatrix} \frac{1}{f'_x} & -\frac{s'}{f'_x f'_y} & \frac{s' p'_y - p'_x f'_y}{f'_x f'_y} \\ 0 & \frac{1}{f'_y} & -\frac{p'_y}{f'_y} \\ 0 & 0 & 1 \end{pmatrix}}_{K_{p'}^{-1}} = \underbrace{\begin{pmatrix} \frac{f_x}{f'_x} & -\frac{f_x s'}{f'_x f'_y} + \frac{s}{f'_y} & \frac{f_x (s' p'_y - p'_x f'_y)}{f'_x f'_y} - \frac{s p'_y}{f'_y} + p_x \\ 0 & \frac{f_y}{f'_y} & -\frac{f_y p'_y}{f'_y} + p_y \\ 0 & 0 & 1 \end{pmatrix}}_{T_{p' \to p}} \qquad (2)$$

The projection of a point in camera coordinates $p_c$ onto a homogeneous image point $\tilde{p}_I$ may now be reformulated according to (3).

$$\tilde{p}_I = K_p \, p_c \;\Leftrightarrow\; \tilde{p}_I = T_{p' \to p} K_{p'} \, p_c \;\Leftrightarrow\; T_{p' \to p}^{-1} \, \tilde{p}_I = K_{p'} \, p_c \qquad (3)$$

Equation (3) shows that the original calibration matrix $K_p$ can be substituted with an arbitrary finite projective camera model $K_{p'}$ by transforming the image projections with $T_{p' \to p}^{-1} \, \tilde{p}_I$.
Many state-of-the-art MVS, surface reconstruction and texturing libraries, including the (official) implementations used in our evaluation (see Section 6), support only skew-free camera models $K_s$, i.e. $s'$ must be $0$. The choice of $f'_x$, $f'_y$, $p'_x$ and $p'_y$ is not restricted. By defining $(f'_x, f'_y, p'_x, p'_y, s') := (f_x, f_y, p_x, p_y, 0)$, we obtain the transformation $T_{s \to p}$ shown in (4).

$$\underbrace{\begin{pmatrix} f_x & s & p_x \\ 0 & f_y & p_y \\ 0 & 0 & 1 \end{pmatrix}}_{K_p} \cdot \underbrace{\begin{pmatrix} \frac{1}{f_x} & 0 & -\frac{p_x}{f_x} \\ 0 & \frac{1}{f_y} & -\frac{p_y}{f_y} \\ 0 & 0 & 1 \end{pmatrix}}_{K_s^{-1}} = \underbrace{\begin{pmatrix} 1 & \frac{s}{f_y} & -\frac{s \, p_y}{f_y} \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}}_{T_{s \to p}} \qquad (4)$$

The matrix $T_{s \to p}$ in (4) contains not only a skew correction factor (i.e. $\frac{s}{f_y}$), but also a translation component (i.e. $-\frac{s \, p_y}{f_y}$). Thus, a part of the pixels would be translated outside of the image boundaries and important information would be lost. Instead, we define $K_s$ according to $(f'_x, f'_y, p'_x, p'_y, s') := (f_x, f_y, p_x - \frac{s \, p_y}{f_y}, p_y, 0)$ to obtain a translation-free skew-correcting matrix $T_{s \to p}$ as shown in (5).

$$\underbrace{\begin{pmatrix} f_x & s & p_x \\ 0 & f_y & p_y \\ 0 & 0 & 1 \end{pmatrix}}_{K_p} \cdot \underbrace{\begin{pmatrix} \frac{1}{f_x} & 0 & -\frac{p_x}{f_x} + \frac{s \, p_y}{f_x f_y} \\ 0 & \frac{1}{f_y} & -\frac{p_y}{f_y} \\ 0 & 0 & 1 \end{pmatrix}}_{K_s^{-1}} = \underbrace{\begin{pmatrix} 1 & \frac{s}{f_y} & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}}_{T_{s \to p}} \qquad (5)$$

For each registered image $I_p$ (corresponding to $K_p$) we compute a skew-free image $I_s$ and a skew-free depth map $D_s$ (corresponding to $K_s$) using $I_s = T_{s \to p}^{-1} \cdot I_p$ and $D_s = T_{s \to p}^{-1} \cdot D_p$. While removing the skew factors, we apply a cubic interpolation to compensate for artifacts caused by the discrete nature of pixel-based image representations. See Fig. 1 for a comparison of the original and the corresponding skew-free image.
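The decomposition in (5) can be verified numerically. The snippet below uses illustrative (hypothetical) intrinsics; it checks that the translation-free choice of the principal point reduces $T_{s \to p}$ to a pure shear and that projecting with $K_p$ equals projecting with $K_s$ followed by $T_{s \to p}$.

```python
import numpy as np

# Hypothetical FPC intrinsics (not values from the paper).
fx, fy, s, px, py = 2000.0, 1990.0, 3.5, 512.0, 384.0

K_p = np.array([[fx, s,  px],
                [0., fy, py],
                [0., 0., 1.]])

# Translation-free skew-free substitute: p'_x = p_x - s * p_y / f_y, s' = 0.
K_s = np.array([[fx, 0., px - s * py / fy],
                [0., fy, py],
                [0., 0., 1.]])

T = K_p @ np.linalg.inv(K_s)  # T_{s->p} = K_p K_s^{-1}, cf. (5)

# T should be a pure shear: [[1, s/f_y, 0], [0, 1, 0], [0, 0, 1]].
expected = np.array([[1., s / fy, 0.],
                     [0., 1., 0.],
                     [0., 0., 1.]])
assert np.allclose(T, expected)

# Consistency of (3): K_p p_c = T (K_s p_c), so warping images and depth
# maps with T^{-1} preserves the reconstruction.
p_c = np.array([0.3, -0.2, 1.0])  # point in camera coordinates
assert np.allclose(K_p @ p_c, T @ (K_s @ p_c))
```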
5. REAL DEPTH MAP RECOVERY
Because the distances between the registered cameras and the reconstructed point cloud are extremely large in the context of satellite images, the values in the corresponding depth maps are compressed to an interval that is far away from the origin. To improve the numerical stability of the depth map reconstruction, it is reasonable to perform the computations in a reparameterized space. We follow the reparameterization proposed by Zhang et al. (2019), which uses plane-plus-parallax (Irani et al., 1998).
Since Zhang et al. (2019) do not describe how to obtain the real depth map values, we provide a concise explanation in the following. The reparameterization is defined according to (6),

$$\underbrace{\begin{pmatrix} P_{11} & P_{12} & P_{13} & P_{14} \\ P_{21} & P_{22} & P_{23} & P_{24} \\ P_{31} & P_{32} & P_{33} & P_{34} \\ 0 & 0 & \bar{Z} & -\bar{Z} d \end{pmatrix}}_{P} \cdot \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix} = \begin{pmatrix} uZ \\ vZ \\ Z \\ \bar{Z} z - \bar{Z} d \end{pmatrix} = \begin{pmatrix} uZ \\ vZ \\ Z \\ mZ \end{pmatrix} \simeq \begin{pmatrix} u \\ v \\ 1 \\ m \end{pmatrix} \qquad (6)$$

where $[x, y, z, 1]^T$ denotes a point in homogeneous world coordinates and $[u, v, 1]^T$ a point in homogeneous image coordinates. $P$ represents the (reparameterized) projection matrix from world to image coordinates, $Z$ defines the conventional depth with $Z = P_{31} x + P_{32} y + P_{33} z + P_{34}$, $\bar{Z}$ indicates the average of the conventional depth values of the points computed in the SfM step, $d$ defines a plane below the scene and $m$ denotes the reparameterized depth with $m = (\bar{Z} z - \bar{Z} d) / Z$.
Following this reparameterization, the MVS step yields reparameterized depth maps representing vectors of the form $[u, v, m]^T$. By rewriting (6) according to (7), it becomes apparent that the actual $Z$ value can be obtained using only the reparameterized depth value $m$ and the fourth (last) row of $P^{-1}$, in the following denoted as $(P^{-1})_4$.

$$P \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix} = \begin{pmatrix} uZ \\ vZ \\ Z \\ mZ \end{pmatrix} \;\Leftrightarrow\; P^{-1} \begin{pmatrix} u \\ v \\ 1 \\ m \end{pmatrix} = \begin{pmatrix} x/Z \\ y/Z \\ z/Z \\ 1/Z \end{pmatrix} \;\Rightarrow\; (P^{-1})_4 \begin{pmatrix} u \\ v \\ 1 \\ m \end{pmatrix} = \frac{1}{Z} \;\Leftrightarrow\; Z = \frac{1}{(P^{-1})_4 \, [u, v, 1, m]^T} \qquad (7)$$

To increase numerical stability, Zhang et al. (2019) normalize $P$ and $P^{-1}$.
Let $n_{P^{-1}}$ denote the normalization factor of $P^{-1}$. We obtain the real depth values using $n_{P^{-1}} \cdot Z$.
By computing $K_s$, $I_s$ (see Section 4.1) and $D_s$ for each registered image, we are able to utilize established libraries for MVS and texturing in the context of satellite images, such as the algorithms proposed by Fuhrmann et al. (2014) and Cernea (2021).
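The recovery in (7) can be checked with a small synthetic example. All camera parameters below are hypothetical; the point is that $Z$ is recovered from $(u, v, m)$ using only the last row of $P^{-1}$.

```python
import numpy as np

# Hypothetical pinhole projection standing in for the conventional camera.
K = np.array([[1000., 0., 320.],
              [0., 1000., 240.],
              [0., 0., 1.]])
Rt = np.hstack([np.eye(3), np.array([[0.], [0.], [5.]])])  # simple pose
P3x4 = K @ Rt

# Plane-plus-parallax reparameterization, cf. (6): the 4-th row encodes
# the average depth Z_bar and the plane z = d below the scene.
Z_bar, d = 6.0, -2.0
P = np.vstack([P3x4, [0., 0., Z_bar, -Z_bar * d]])

x_w = np.array([0.4, -0.1, 1.5, 1.0])  # homogeneous world point
proj = P @ x_w                         # [uZ, vZ, Z, mZ]
Z = proj[2]
u, v, m = proj[0] / Z, proj[1] / Z, proj[3] / Z

# Recover Z from (u, v, m) via the last row of P^{-1}, cf. (7).
P_inv = np.linalg.inv(P)
Z_rec = 1.0 / (P_inv[3] @ np.array([u, v, 1.0, m]))
assert np.isclose(Z, Z_rec)
```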
6. EXPERIMENTS AND EVALUATION
For our experiments we use multi-date satellite imagery of the IARPA MVS3DM dataset (Bosch et al., 2016). The panchromatic images show a considerably lower ground sample distance than the multispectral images.
To compare the proposed pipeline with previously reported results, we utilize the benchmark presented by Zhang et al. (2019), which is based upon the IARPA MVS3DM dataset (Bosch et al., 2016). The corresponding ground truth consists of a set of geo-registered airborne lidar scans covering different urban areas near San Fernando, Argentina. Zhang et al. (2019) follow the evaluation protocol by Bosch et al. (2016) to perform a metric comparison of the MVS and the ground truth point cloud. To this end, both point clouds are mapped to a predefined geo-registered grid representing a height map. The value of each grid cell is defined by the maximum height value of all corresponding points. Since the height map comparison is subject to the initial geo-registration, the evaluation protocol refines the alignment before performing the actual evaluation. Bosch et al. (2016) define two metric scores that represent the median error (ME) and the completeness (CP) of the reconstructed height map; concretely, CP denotes the percentage of cells with an error below a fixed threshold.
In order to exploit this benchmark for mesh evaluation, we extract a set of points from the mesh surface. Since the reconstructed vertex positions usually show a non-uniform (and texture dependent) distribution, they are only partly suited to represent the full surface geometry. Thus, we utilize Poisson disk sampling (Cook, 1986) to extract a set of points following a random distribution while ensuring that the distance between any pair of samples is larger than a certain predefined threshold.
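Such a sampling can be sketched as a greedy dart-throwing procedure: candidate surface points are visited in random order, and a point is kept only if it is farther than the radius from every point kept so far. This is an illustrative stand-in (function name and rejection strategy are our own), not the exact implementation used in the pipeline.

```python
import numpy as np

def poisson_disk_subsample(points, radius, seed=None):
    """Greedy dart-throwing sketch of Poisson disk sampling.

    points: (N, D) array of candidate surface points.
    radius: minimum pairwise distance enforced among kept points.
    Returns the subset of points whose pairwise distances exceed `radius`.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(points))
    kept = []
    for i in order:
        p = points[i]
        # O(n^2) rejection test; real implementations use spatial indexing.
        if all(np.linalg.norm(p - points[j]) > radius for j in kept):
            kept.append(i)
    return points[kept]
```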
We follow previous practice (Zhang et al., 2019) and fill small holes with adjacent values.
In order to determine a state-of-the-art surface reconstruction approach for multi-date satellite imagery, we quantitatively evaluate the proposed pipeline with several modern meshing algorithms. Table 2 reports the corresponding ME and CP metrics based on surface points extracted with Poisson disk sampling. Supplementary, Table 3 provides an analysis using only the reconstructed mesh vertices as surface representation. This allows us to compare the reconstructions with previous results

Pipeline | Depth Map Fusion | Meshing Algorithm | Point Sampling | Site 1 CP (%) | Site 1 ME (m) | Site 2 CP (%) | Site 2 ME (m) | Site 3 CP (%) | Site 3 ME (m)
Proposed | VisSat | Poisson | Poisson Disk | 72.1 | 0.307 | 66.5 | | |
Proposed | VisSat | Colmap | Poisson Disk | 73.7 | | | | |

Table 2. Quantitative evaluation of the proposed pipeline with different state-of-the-art surface reconstruction algorithms. The surface points used to compute the metrics are extracted with Poisson disk sampling. The evaluated surface reconstruction methods include Poisson (Kazhdan and Hoppe, 2013), Colmap (Schönberger et al., 2016; Labatut et al., 2009), OpenMVS (Cernea, 2021; Jancosek and Pajdla, 2014), FSSR (Fuhrmann and Goesele, 2014) and GDMR (Ummenhofer and Brox, 2017). FSSR and GDMR require scale information for each point in the fused point cloud to triangulate the corresponding mesh. In this case we substitute the depth map fusion approach of VisSat (Zhang et al., 2019) with the equivalent processing step of MVE (Goesele et al., 2007). The metrics completeness (CP) ↑ and median error (ME) ↓ follow the definition by Bosch et al. (2016). The top three results are highlighted in green (first), blue (second) and red (third).

Pipeline | Depth Map Fusion | Meshing Algorithm | Point Sampling | Site 1 CP (%) | Site 1 ME (m) | Site 2 CP (%) | Site 2 ME (m) | Site 3 CP (%) | Site 3 ME (m)
Zhang et al. (2019) | VisSat | - | Vertex | 71.7 | 0.340 | 66.8 | 0.460 | 63.2 | 0.413
Proposed | MVE | - | Vertex | 63.3 | 0.699 | 55.9 | 0.884 | 55.1 | 0.856
Proposed | VisSat | Poisson | Vertex | | | | | |
Proposed | VisSat | Colmap | Vertex | 73.3 | | | | |

Table 3. Quantitative evaluation of the proposed pipeline with different state-of-the-art surface reconstruction algorithms. The reconstructed mesh vertices are used as surface representation to compute the metrics. See Table 2 for a description of the different algorithms and metrics. As baseline for our comparison we use the vertices of the fused point clouds computed with VisSat (Zhang et al., 2019) and MVE (Goesele et al., 2007). The top three results are highlighted with green (first), blue (second) and red (third).
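The CP and ME scores underlying Tables 2 and 3 can be sketched as follows. The grid bounds, cell size and error threshold are placeholder parameters here, and the alignment refinement of the full protocol by Bosch et al. (2016) is omitted.

```python
import numpy as np

def heightmap_metrics(pred_pts, gt_pts, cell, threshold, bounds):
    """Sketch of the CP/ME evaluation: rasterize both point clouds to a
    height map (maximum height per grid cell) and compare valid cells.

    pred_pts, gt_pts: iterables of (x, y, z) points.
    bounds: (x0, y0, x1, y1) extent of the evaluation grid.
    Returns (completeness in percent, median error).
    """
    x0, y0, x1, y1 = bounds
    nx, ny = int((x1 - x0) / cell), int((y1 - y0) / cell)

    def rasterize(pts):
        hm = np.full((ny, nx), np.nan)
        for x, y, z in pts:
            i, j = int((y - y0) / cell), int((x - x0) / cell)
            if 0 <= i < ny and 0 <= j < nx:
                hm[i, j] = z if np.isnan(hm[i, j]) else max(hm[i, j], z)
        return hm

    pred, gt = rasterize(pred_pts), rasterize(gt_pts)
    valid = ~np.isnan(pred) & ~np.isnan(gt)        # compare only shared cells
    err = np.abs(pred[valid] - gt[valid])
    cp = 100.0 * float(np.mean(err < threshold))   # completeness (CP)
    me = float(np.median(err))                     # median error (ME)
    return cp, me
```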
Meshing Algorithm | Modified Parameter | Value (Default Value)
Poisson | - | -
Colmap | max proj dist | 0 (20)
OpenMVS | min point distance | 0 (2.5)
OpenMVS | smoothing iterations | 0 (2)
FSSR | - | -
GDMR | - | -

Table 4. Parameter configuration of the meshing algorithms used for the quantitative evaluations.
Depth Map Fusion | Meshing Algorithm | Site 1 Vertices | Site 2 Vertices | Site 3 Vertices
VisSat | - | 6,289k | 1,880k | 1,780k
MVE | - | 199,264k | 53,872k | 55,618k
VisSat | Poisson | 11,055k | 4,549k | 4,228k
VisSat | Colmap | 4,861k | 1,424k | 1,381k
VisSat | OpenMVS | 3,964k | 1,210k | 1,078k
MVE | FSSR | 3,850k | 773k | 825k
MVE | GDMR | 3,353k | 859k | 1,675k

Table 5. Number of vertices reconstructed with the different point fusion and mesh triangulation algorithms for each site.

presented by Zhang et al. (2019). Where viable, we utilize the default mesh reconstruction parameters; an overview of the deviating parameters is presented in Table 4.
Since points extracted from mesh surfaces with Poisson disk sampling are more evenly distributed than reconstructed mesh vertices, they provide a better representation of the corresponding surface. Thus, Table 2 contains the main results of our quantitative evaluation. By analyzing the corresponding outcome, we observe that our pipeline achieves the best results with GDMR (Ummenhofer and Brox, 2017). This approach yields not only the highest completeness score, but also state-of-the-art median height errors. This result is remarkable considering that GDMR relies on the depth map fusion presented in MVE (Fuhrmann et al., 2014), which produces noisy point clouds compared to the fused points obtained with VisSat.
In Table 3 we use the mesh vertices to define a point set representing the reconstructed surfaces to conduct a fair comparison of the triangulated meshes with the fused point clouds computed with the approach by Zhang et al. (2019). We observe that the reconstructed mesh vertices outperform the fused points for both metrics, which demonstrates the benefits of the proposed pipeline. Note that our approach achieves (again) the highest completeness scores with GDMR. However, the configurations with Poisson (Kazhdan and Hoppe, 2013) and Colmap (Schönberger et al., 2016) obtain lower median errors.
The different rankings in Table 2 and Table 3 demonstrate the importance of selecting an appropriate point sampling method. Note that the evaluation in Table 3 is biased by the number of mesh vertices; for instance, higher vertex numbers will most likely affect the resulting completeness scores. Table 5 gives an overview of the corresponding numbers of mesh vertices.
A comparison of Table 2 and Table 3 shows that GDMR sacrifices accuracy of vertex positions to obtain faces closer to the actual geometry, which emphasizes the relevance of selecting an appropriate point sampling method.
Because of the non-deterministic behavior of SfM, we achieved slightly different results in the depth map computation step compared to the original values reported by Zhang et al. (2019).
Fig. 2 shows several reconstructed meshes using the proposed approach with GDMR for meshing, which represents the best configuration of our pipeline. The different reconstructions represent the sites used for the quantitative evaluation. Fig. 3 visualizes the plain geometry of different meshes obtained with Poisson, Colmap and GDMR. We observe that the reconstructed surfaces exhibit different characteristics (e.g. varying levels of completeness, smoothness or mesh granularity), which are consistent with the results reported in Section 6.1.

Figure 2. Qualitative evaluation of the proposed pipeline using GDMR (Ummenhofer and Brox, 2017) for three sites contained in the multi-date satellite image dataset provided by Bosch et al. (2016): (a) textured mesh of site 1, (b) textured mesh of site 2, (c) textured mesh of site 3.
7. CONCLUSION
This paper presents an approach to reconstruct textured watertight meshes from multi-date satellite imagery, including a detailed description of the different pipeline steps and their dependencies. In contrast to previous work, our method leverages current dense and surface reconstruction algorithms to obtain a representation of the scene. We present a decomposition of finite projective camera (FPC) calibration matrices that allows us to substitute the camera models in the SfM reconstruction with simpler camera models containing no skew factors. To ensure consistent reconstruction results, we apply corresponding skew corrections to the input images and depth maps. These adjustments enable us to utilize state-of-the-art mesh reconstruction algorithms in the context of satellite images. The paper presents an extensive quantitative evaluation of the pipeline on multi-date satellite images. By sampling points from the reconstructed mesh surfaces, we compare our surface reconstructions with point clouds obtained with previously proposed SfM and MVS methods. Our evaluation shows that current state-of-the-art meshing algorithms outperform previous results in terms of completeness and median error. To foster future research, we make the source code of the proposed pipeline publicly available.

Figure 3. Qualitative evaluation of our pipeline using multi-date satellite images by Bosch et al. (2016). We observe that Poisson (Kazhdan and Hoppe, 2013), Colmap (Schönberger et al., 2016) and GDMR (Ummenhofer and Brox, 2017) reconstruct meshes with different shape characteristics. (a) Mesh of site 3 reconstructed with Poisson. (b) Mesh of site 3 reconstructed with Colmap. (c) Mesh of site 3 reconstructed with GDMR.
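The camera-model simplification summarized in the conclusion can be sketched as follows. This is a minimal illustration under standard assumptions (Hartley and Zisserman, 2004), not the released pipeline code, and the function names are our own: an RQ decomposition splits the left 3x3 block of an FPC matrix into a calibration matrix K and a rotation R, after which the skew entry of K can be zeroed and a homography derived that resamples images and depth maps into the skew-free camera.

```python
import numpy as np
from scipy.linalg import rq

def decompose_camera(P):
    """Decompose a 3x4 finite projective camera matrix P ~ K [R | t].

    The RQ decomposition of the left 3x3 block yields an upper triangular
    calibration matrix K and a rotation R; signs are then fixed so that
    the diagonal of K is positive, which makes the split unique."""
    K, R = rq(P[:, :3])
    D = np.diag(np.sign(np.diag(K)))  # RQ is unique only up to signs
    K, R = K @ D, D @ R               # D is its own inverse, so K R is unchanged
    t = np.linalg.solve(K, P[:, 3])
    return K / K[2, 2], R, t

def remove_skew(K):
    """Return a skew-free calibration matrix together with the homography
    that maps pixels of the skewed camera into the skew-free camera."""
    K_new = K.copy()
    K_new[0, 1] = 0.0                 # drop the skew factor
    H = K_new @ np.linalg.inv(K)
    return K_new, H
```

Applying H to the input image and its depth map (e.g. via a perspective warp) keeps both consistent with the simplified camera model, which is the skew correction step referred to above.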
References
AliceVision, 2018. Meshroom: A 3D reconstruction software. https://github.com/alicevision/meshroom. [Accessed January 2021].

Arandjelovic, R., 2012. Three things everyone should know to improve object retrieval. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2911–2918.

Barnes, C., Shechtman, E., Finkelstein, A., Goldman, D. B., 2009. PatchMatch: A Randomized Correspondence Algorithm for Structural Image Editing. ACM Transactions on Graphics, 28(3).

Beyer, R. A., Alexandrov, O., McMichael, S., 2018. The Ames Stereo Pipeline: NASA's Open Source Software for Deriving and Processing Terrain Data. Earth and Space Science, 5(9), 537–548.

Bosch, M., Kurtz, Z., Hagstrom, S., Brown, M., 2016. A multiple view stereo benchmark for satellite imagery. IEEE Applied Imagery Pattern Recognition Workshop (AIPR), 1–9.

Burt, P. J., Adelson, E. H., 1983. A Multiresolution Spline with Application to Image Mosaics. ACM Transactions on Graphics, 2(4), 217–236.

Cernea, D., 2021. OpenMVS: Multi-view stereo reconstruction library. https://cdcseacave.github.io/openMVS. [Accessed January 2021].

Cook, R. L., 1986. Stochastic Sampling in Computer Graphics. ACM Transactions on Graphics, 5(1), 51–72.

de Franchis, C., Meinhardt-Llopis, E., Michel, J., Morel, J.-M., Facciolo, G., 2014. An automatic and modular stereo pipeline for pushbroom images. ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences, II-3, 49–56.

Deng, L.-J., Feng, M., Tai, X.-C., 2019. The fusion of panchromatic and multispectral remote sensing images via tensor-based sparse modeling and hyper-Laplacian prior. Information Fusion, 52, 76–89.

Facciolo, G., de Franchis, C., Meinhardt-Llopis, E., 2017. Automatic 3D reconstruction from multi-date satellite images. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops.

Fu, X., Lin, Z., Huang, Y., Ding, X., 2019. A variational pan-sharpening with local gradient constraints. IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

Fuhrmann, S., Goesele, M., 2014. Floating Scale Surface Reconstruction. ACM Transactions on Graphics, 33(4).

Fuhrmann, S., Langguth, F., Goesele, M., 2014. MVE: A multi-view reconstruction environment. Eurographics Workshop on Graphics and Cultural Heritage, 11–18.

GDAL/OGR contributors, 2020. GDAL/OGR Geospatial Data Abstraction software Library. Open Source Geospatial Foundation.

Goesele, M., Snavely, N., Curless, B., Hoppe, H., Seitz, S. M., 2007. Multi-view stereo for community photo collections. IEEE International Conference on Computer Vision (ICCV).

Hartley, R. I., Zisserman, A., 2004. Multiple View Geometry in Computer Vision. Second edn, Cambridge University Press, chapter Camera Models, 157.

Hirschmuller, H., 2005. Accurate and efficient stereo processing by semi-global matching and mutual information. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2, 807–814.

Irani, M., Anandan, P., Weinshall, D., 1998. From reference frames to reference planes: Multi-view parallax geometry and applications. European Conference on Computer Vision (ECCV), 829–845.

Jancosek, M., Pajdla, T., 2011. Multi-view reconstruction preserving weakly-supported surfaces. IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

Jancosek, M., Pajdla, T., 2014. Exploiting Visibility Information in Surface Reconstruction to Preserve Weakly Supported Surfaces. International Scholarly Research Notices, 2014, 1–20.

Kazhdan, M., Hoppe, H., 2013. Screened Poisson Surface Reconstruction. ACM Transactions on Graphics.

Knapitsch, A., Park, J., Zhou, Q.-Y., Koltun, V., 2017. Tanks and Temples: Benchmarking Large-Scale Scene Reconstruction. ACM Transactions on Graphics, 36(4).

Kuschk, G., 2013. Large Scale Urban Reconstruction from Remote Sensing Imagery. ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, XL-5/W1, 139–146.

Labatut, P., Pons, J.-P., Keriven, R., 2009. Robust and Efficient Surface Reconstruction From Range Data. Computer Graphics Forum.

Langguth, F., Sunkavalli, K., Hadap, S., Goesele, M., 2016. Shading-aware multi-view stereo. European Conference on Computer Vision (ECCV).

Moulon, P., Monasse, P., Marlet, R., 2012. Adaptive structure from motion with a contrario model estimation. Asian Conference on Computer Vision (ACCV).

Moulon, P., Monasse, P., Marlet, R., Others, 2013. OpenMVG. An open multiple view geometry library. https://github.com/openMVG/openMVG/. [Accessed January 2021].

Pohl, C., Genderen, J. L. V., 1998. Review article Multisensor image fusion in remote sensing: Concepts, methods and applications. International Journal of Remote Sensing.

Schönberger, J. L., Frahm, J.-M., 2016. Structure-from-motion revisited. IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

Schönberger, J. L., Zheng, E., Pollefeys, M., Frahm, J.-M., 2016. Pixelwise view selection for unstructured multi-view stereo. European Conference on Computer Vision (ECCV).

Stathopoulou, E., Welponer, M., Remondino, F., 2019. Open-Source Image-Based 3D Reconstruction Pipelines: Review, Comparison and Evaluation. ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, XLII-2/W17, 331–338.

Tao, C., Hu, Y., 2001. A Comprehensive Study of the Rational Function Model for Photogrammetric Processing. Photogrammetric Engineering and Remote Sensing, 67, 1347–1357.

Ummenhofer, B., Brox, T., 2017. Global, Dense Multiscale Reconstruction for a Billion Points. International Journal of Computer Vision, 1–13.

Waechter, M., Moehrle, N., Goesele, M., 2014. Let there be color! Large-scale texturing of 3D reconstructions. European Conference on Computer Vision (ECCV), 836–850.

Zhang, K., Sun, J., Snavely, N., 2019. Leveraging vision reconstruction pipelines for satellite imagery.