Roof Age Determination for the Automated Site-Selection of Rooftop Solar
Chris Heinrich, Michael Laskin, Simas Glinskis, Evert van Nieuwenburg
Chris Heinrich
Qrithm
Michael Laskin
UC Berkeley, Qrithm
Simas Glinskis
University of Chicago
Evert van Nieuwenburg
Caltech
Abstract
Rooftop solar is one of the most promising tools for drawing down greenhouse gas (GHG) emissions and is cost-competitive with fossil fuels in many areas of the world today. One of the most important criteria for determining the suitability of a building for rooftop solar is the current age of its roof. The reason for this is simple: because rooftop solar installations are long-lived, the roof needs to be new enough to last for the lifetime of the solar array or old enough to justify being replaced. In this paper we present a data-driven method for determining the age of a roof from historical satellite imagery, which removes one of the last obstacles to a fully automated pipeline for rooftop solar site selection. We estimate that a full solution to this problem would reduce customer acquisition costs for rooftop solar by ∼… and lead to ∼750 megatons of CO2 displaced between 2020 and 2050.
Introduction
The last 10 years have seen a dramatic drop in the cost of electricity generated from solar photovoltaic (PV) systems [1]. This drop in costs directly leads to an increased rate of deployment of PV systems, and therefore a reduction in future emissions of GHGs from fossil fuels. Once considered too expensive to be useful in the fight against climate change, ground-mount and rooftop solar are now both ranked among the top 10 most promising methods for reducing GHGs. Rooftop solar alone is estimated to have the potential to offset 24.6 gigatons of CO2 by 2050 if the fraction of global electricity produced by rooftop solar grows from today's level of 0.4% to 7% by 2050 [2].
In order to reach or exceed this 7% level, however, the cost of installing solar needs to drop even further. The hard costs of installing solar, including materials and equipment, have dropped to the point where over two-thirds of the cost of installing solar now comes from 'soft costs' [3][4] such as labor, permitting and customer acquisition costs. Customer acquisition costs (CAC) alone can account for up to 20% of the total cost of installing solar [5][3]. By developing automated methods for identifying and ranking potential solar installation sites, solar developers can focus their sales and marketing budgets on the most promising sites, and thereby reduce their customer acquisition costs. In turn, this reduction in cost leads to an increased overall rate of solar deployment [4]. Given that the roof of a building is easily visible from aerial images, global-scale satellite image data can be brought to bear on the site-selection problem, and this is precisely where machine learning (ML) can have a major impact.
In this work we study how satellite imagery can be used to estimate the age of a building's roof.
The roof age is a critical piece of information for solar developers because it directly determines whether a building is currently well suited for rooftop solar. Rooftop solar installations last for 25+ years, and the roof should not need to be replaced during the lifetime of the solar array, because the additional labor cost of removing and reinstalling the array while replacing the roof would typically cripple the economics of the project. This is especially true for commercial and industrial buildings, where large, complex solar arrays typically cover all available roof area. Solar developers therefore generally seek out rooftops that are either 0-4 years or 25+ years old: in the first case the existing roof will likely last long enough, and in the second the roof is already old enough to justify replacement in conjunction with the installation of the solar array.
Although roof age can usually be determined by consulting the building's owner, making contact with the owner, particularly for commercial and industrial buildings, can take considerable time and effort. This is where automated estimation of roof age can lower customer acquisition costs, and therefore accelerate the deployment of rooftop solar [4]. We estimate that a full solution to this problem would displace an additional 750 megatons of CO2 between 2020 and 2050; the details behind this estimate are given in Appendix C.
Figure 1: Example sequences of rooftop images taken in the years 2012-2018. The top row shows an easy example, with a clear roof change in the fifth column, while the second row shows a harder example that exhibits some of the challenges of this problem, including image blur and varying image exposures that can be mistaken for a changing roof.
Preprint. Under review.
The general site-selection problem for rooftop solar has multiple components, a number of which have been studied previously. One important problem is to estimate the potential size and output of a solar array on a given rooftop. Some pioneering approaches to automate this process include Mapdwell [6], as well as Project Sunroof [7], which used LIDAR data to produce solar array size and shading estimates. More recently, Ref. [8] proposed a data-driven method for estimating solar array size using only RGB satellite images and ML. Another component of the site-selection problem involves estimating a building owner's current electricity bill by estimating the energy consumption of the building, to which ML has been applied in Refs. [9] and [10]. However, to the best of our knowledge, ours is the first work to study data-driven approaches to the roof age estimation problem in particular, and we believe a robust solution to this problem removes one of the last obstacles to a fully automated approach to rooftop solar site selection.
We propose to determine the age of a roof using only satellite imagery, which is widely available and global in scope. Rather than attempting to directly regress the age of a roof from a present-day satellite image, we use historical satellite imagery to determine the year in which the roof was last replaced, and compute the roof age as the number of years since that date. If no reroof is detected within the range of available historical imagery, the roof is determined to be older than that range.
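The age rule above, combined with the 0-4 / 25+ year criterion from the introduction, can be illustrated with a small sketch. It assumes imagery covering 2012-2018 and a reference ("current") year of 2019, both illustrative choices; the helper names `roof_age_bounds` and `solar_suitability` are hypothetical. A detected reroof year yields a point estimate of roof age, while no detected reroof yields only a lower bound, so the sketch works with age intervals.

```python
from typing import Optional, Tuple


def roof_age_bounds(reroof_year: Optional[int], first_image_year: int,
                    current_year: int) -> Tuple[int, Optional[int]]:
    """Return (min_age, max_age) in years; max_age is None when unbounded.

    Hypothetical helper: reroof_year is the year a roof replacement was
    detected in the image sequence, or None if none was detected.
    """
    if reroof_year is not None:
        age = current_year - reroof_year
        return age, age
    # No reroof detected: the current roof was already present in the first
    # image, so only a lower bound on its age is known.
    return current_year - first_image_year, None


def solar_suitability(min_age: int, max_age: Optional[int]) -> str:
    """Apply the 0-4 / 25+ year rule of thumb to an age interval."""
    if max_age is not None and max_age <= 4:
        return "suitable: roof should outlast the array"
    if min_age >= 25:
        return "suitable: roof old enough to replace with the array"
    if max_age is not None and min_age >= 5 and max_age <= 24:
        return "unsuitable"
    return "undetermined"


# Illustrative values: imagery covering 2012-2018, evaluated in 2019.
for detected in (2017, 2013, None):
    lo, hi = roof_age_bounds(detected, first_image_year=2012, current_year=2019)
    print(detected, (lo, hi), solar_suitability(lo, hi))
```

The interval form makes one limitation explicit: with only 2012-2018 imagery, a "no reroof" result bounds the age below by 7 years, so it cannot yet distinguish a 10-year-old roof from a 30-year-old one.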
We introduce a new dataset of commercial and industrial building roof images for studying this problem. The dataset consists of 1,610 images of building rooftops, covering 230 buildings in Southern California. For each property we include one image per year in the range 2012-2018 and indicate (a) whether the roof was replaced during the 2012-2018 time frame and (b) the year of the reroof, if there was one. Approximately 180 of the 230 buildings in the dataset underwent a reroof; the reroof year was obtained from public building permit data from the City of Los Angeles [11] and verified by eye. The dataset was partitioned into training, validation and test subsets of 150, 25 and 55 buildings, respectively. Example image sequences are shown in Fig. 1.
We train a β-VAE [12], an unsupervised generative model (see Appendix A), to embed satellite images into latent codes, $x \rightarrow z$. In addition, we train a binary classifier on the latent vectors of all image pairs for a given building to predict whether the two images show the same roof or a different roof. At inference time, latent vectors are generated for every image in the sequence, and adjacent pairs are classified with the binary classifier, $(z_{t-1}, z_t) \rightarrow p_t$, where $p_t$ is the probability that the roofs at times $t-1$ and $t$ are different. If $p_t$ falls below the decision threshold for all $t$, no reroof is predicted for the image sequence; otherwise the transition year is taken to be $T = \arg\max_t p_t$. See Appendix B for additional implementation details.
The metrics used to evaluate performance are (a) the reroof detection accuracy, i.e. the fraction of buildings for which it was correctly predicted whether or not there was a reroof in the image sequence, and (b) the average error in years, $\frac{1}{N}\sum_{i=1}^{N} |y_i^{\text{true}} - y_i^{\text{pred}}|$. The average error is computed only over the $N$ buildings for which it was correctly determined that a reroof took place. It is a useful metric because, for the problem at hand, it is generally sufficient to know the approximate roof age rather than the exact year.
In Table 1 we compare the β-VAE method to a categorical-distribution baseline fit to the training data. For the categorical baseline, a reroof year or a no-reroof label was sampled at random according to the distribution of labels in the training dataset.
Table 1: Evaluation results on the test dataset.
Method                  Reroof detection accuracy   Avg error (years)
β-VAE                   0.872                       0.…
Categorical baseline    0.648                       1.868
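The pair labeling, adjacent-pair inference rule and evaluation metrics described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' code: latent codes are plain float sequences, `classifier` stands in for the trained binary classifier (any callable returning the probability that two roofs differ), the 0.5 decision threshold is an assumed default, and all function names are hypothetical. Training the classifier itself is not shown.

```python
from itertools import combinations
from typing import Callable, List, Optional, Sequence, Tuple

Latent = Sequence[float]
PairClassifier = Callable[[Latent, Latent], float]  # returns P(roofs differ)


def make_training_pairs(latents: Sequence[Latent], years: Sequence[int],
                        reroof_year: Optional[int]) -> List[Tuple[Latent, Latent, int]]:
    """Label every image pair of one building: 1 if the roofs differ, else 0."""
    pairs = []
    for i, j in combinations(range(len(years)), 2):
        if reroof_year is None:
            label = 0
        else:
            # Images from the reroof year onward show the new roof.
            label = int((years[i] >= reroof_year) != (years[j] >= reroof_year))
        pairs.append((latents[i], latents[j], label))
    return pairs


def predict_reroof_year(latents: Sequence[Latent], years: Sequence[int],
                        classifier: PairClassifier,
                        threshold: float = 0.5) -> Optional[int]:
    """Score adjacent pairs; return the argmax year, or None if every p_t < threshold."""
    probs = [classifier(latents[t - 1], latents[t]) for t in range(1, len(years))]
    best = max(range(len(probs)), key=lambda k: probs[k])
    return years[best + 1] if probs[best] >= threshold else None


def evaluate(y_true: Sequence[Optional[int]],
             y_pred: Sequence[Optional[int]]) -> Tuple[float, float]:
    """Reroof detection accuracy and mean absolute year error."""
    detected_ok = [(t, p) for t, p in zip(y_true, y_pred) if (t is None) == (p is None)]
    accuracy = len(detected_ok) / len(y_true)
    # Year error only over buildings correctly flagged as having a reroof.
    both = [(t, p) for t, p in detected_ok if t is not None]
    avg_err = sum(abs(t - p) for t, p in both) / len(both) if both else float("nan")
    return accuracy, avg_err


# Toy usage with 1-D "latents" and a stand-in classifier.
years = list(range(2012, 2019))
latents = [[0.0]] * 3 + [[1.0]] * 4
toy_classifier = lambda a, b: min(1.0, abs(a[0] - b[0]))
print(predict_reroof_year(latents, years, toy_classifier))
```

One reading of this design: labeling all pairs rather than only adjacent ones gives the classifier more supervision per building (21 pairs for a 7-image sequence instead of 6), while inference only needs adjacent pairs to localize the transition year.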
We see from Table 1 that the β-VAE significantly outperforms the categorical baseline on both metrics. While the historical range of this dataset is still limited, these results could already be used to flag buildings whose roof was replaced in 2012 or 2013 as less ideal targets for solar developers. In addition to the categorical baseline, we also compared the β-VAE to non-learning-based methods that used features such as zero-normalized cross-correlation and normalized color intensity to detect roof transitions, but found that these methods did not outperform the categorical baseline, providing additional justification for using learning-based methods to solve this problem.
Several confounding factors make roof-age estimation from satellite images challenging. The quality and resolution of satellite image data can vary drastically from year to year due to environmental factors, such as weather conditions and time of day, as well as sensor heterogeneity across data providers. We also found that wide-area high-resolution satellite imagery is less available before 2010, and virtually non-existent prior to the 1980s. Fortunately, the availability of high-resolution satellite images will only grow over time, and we expect our method to improve in quality and utility as more data becomes available.
While building permit data can also be used to determine roof age, this data source comes with its own set of problems, including limited or difficult access, incomplete or false records, and non-standardized formatting across municipalities. The attractiveness of the satellite-image-based approach is that it could be applied on a national, or even global, scale using only a single data source.
See https://github.com/cpheinrich/reroofdata for access to the reroof dataset.
Conclusion
In this paper, we argued that automated roof-age estimation will enable faster large-scale deployment of rooftop solar, and showed that it is possible to make such predictions within a reasonable margin of error using historical satellite imagery. We also introduced a new dataset to enable the continued study of data-driven approaches to this problem. Interesting future directions include developing methods that are more robust to confounding factors, such as image blur and variation in satellite image sources, as well as applying the method to residential rooftops. It would also be useful to solve zero-shot roof-age estimation, i.e. estimating roof age directly from a single image instead of detecting the year the roof was replaced.
References
[1] Ran Fu, Robert M. Margolis, and David J. Feldman. US solar photovoltaic system cost benchmark: Q1 2018. Technical report, National Renewable Energy Lab. (NREL), Golden, CO (United States), 2018.
[2] Paul Hawken. Drawdown: The most comprehensive plan ever proposed to reverse global warming. …
… GTM Research, 2017.
[6] J. Alstan Jakubiec and Christoph F. Reinhart. Towards validated urban photovoltaic potential and solar radiation maps based on lidar measurements, GIS data, and hourly Daysim simulations. Proceedings of SimBuild, …
… Journal of Solar Energy Engineering, 117(3):161–166, 1995.
[10] Kadir Amasyali and Nora M. El-Gohary. A review of data-driven building energy consumption prediction studies. Renewable and Sustainable Energy Reviews, 81:1192–1205, 2018.
[11] LA building permit data. https://data.lacity.org/A-Prosperous-City/Building-Permits/nbyu-2ha9. Accessed: 2019-08-15.
[12] Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, and Alexander Lerchner. beta-VAE: Learning basic visual concepts with a constrained variational framework. In International Conference on Learning Representations, 2016.
[13] Diederik P. Kingma and Max Welling. Auto-encoding variational Bayes. In …, 2014.
[14] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization, 2014.
Appendices
A Algorithms
A.1 Amortized Variational Inference
One unsupervised approach to learning image representations for roof-age prediction is amortized variational inference [13]. This method models the distribution of observed data $p(x)$ through a low-dimensional latent distribution $p(z)$. The model encodes data into latent codes with a parametrized function $q_\theta(z \mid x)$ and decodes with a different parametrized function $p_\phi(x \mid z)$. The parameters of both functions are updated by maximizing the evidence lower bound (ELBO):
$$\mathbb{E}_{z \sim q_\theta}\left[\log p(x)\right] \geq \mathbb{E}_{z \sim q_\theta}\left[\log p_\phi(x \mid z)\right] - \beta\, D_{\mathrm{KL}}\left(q_\theta(z \mid x) \,\|\, p(z)\right) \qquad (1)$$
A common choice for the prior over the latents is the unit Gaussian $p(z) = \mathcal{N}(0, I)$, with a Gaussian approximate posterior $q_\theta(z \mid x) = \mathcal{N}(\mu, \operatorname{diag}(\sigma^2))$, where $\mu$ and $\sigma$ are output by the encoder and learned by maximizing the ELBO. When $\beta = 1$, this algorithm is known as the Variational Autoencoder (VAE), and for $\beta > 1$ it is called a β-VAE [12].
B Hyperparameters
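Since β is one of the settings reported in this appendix, it may help to see the per-example loss that a β-VAE minimizes (the negative of the right-hand side of Eq. (1)) written out. The sketch below is a minimal plain-Python illustration, not the training code: it assumes a unit-variance Gaussian decoder (so the reconstruction term reduces to half the squared error, up to a constant) and a diagonal Gaussian encoder parameterized by `mu` and `log_var`; all function names are hypothetical. A real implementation would use a tensor library and average over minibatches.

```python
import math
import random
from typing import List, Sequence


def reparameterize(mu: Sequence[float], log_var: Sequence[float],
                   rng: random.Random) -> List[float]:
    """Sample z = mu + sigma * eps with eps ~ N(0, I) (reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_unit_gaussian(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """Closed-form D_KL(N(mu, diag(sigma^2)) || N(0, I)), with log_var = log sigma^2."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def beta_vae_loss(x: Sequence[float], x_recon: Sequence[float],
                  mu: Sequence[float], log_var: Sequence[float], beta: float) -> float:
    """Negative RHS of Eq. (1) for one example, up to an additive constant.

    Assumes a unit-variance Gaussian decoder: -log p(x|z) = 0.5 * ||x - x_recon||^2 + const.
    """
    recon_nll = 0.5 * sum((a - b) ** 2 for a, b in zip(x, x_recon))
    return recon_nll + beta * kl_to_unit_gaussian(mu, log_var)


# Toy usage: the KL term vanishes when the posterior equals the prior.
rng = random.Random(0)
mu, log_var = [0.0, 0.0], [0.0, 0.0]
z = reparameterize(mu, log_var, rng)  # in a real model, x_recon = decoder(z)
print(kl_to_unit_gaussian(mu, log_var))
print(beta_vae_loss([1.0, 2.0], [1.0, 2.0], [1.0], [0.0], beta=4.0))
```

Parameterizing the encoder output as log σ² rather than σ keeps the variance positive without explicit constraints; with `beta=1.0` the function reduces to the standard VAE loss.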
We preprocessed raw satellite images to … × … pixel images centered on the buildings' roofs, and during training applied random transformations to brightness, contrast, and saturation. For the VAE architecture, we used four convolutional layers that downsample the images and a fully connected dense layer that yields 128-dimensional latent codes, followed by a stack of three additional residual layers, such that the latent space has 128 dimensions in total. During optimization we used Adam [14] with a learning rate of $\mathrm{lr} = 3 \cdot 10^{-\ldots}$. For the disentanglement parameter β in the β-VAE we used β = 1. The binary classifier had four fully connected layers with dropout that downsample the input 256-dimensional concatenated latent code (two 128-dimensional codes) into a one-dimensional logit passed through a sigmoid. During optimization we used Adam with a learning rate of $\mathrm{lr} = 1 \cdot 10^{-\ldots}$.
C CO2 offset calculation
In this section we estimate the potential impact of this work on GHG emissions, measured in megatons of CO2 offset between 2020 and 2050. While it is of course impossible to provide an exact number, we seek to provide a defensible order-of-magnitude estimate by modeling the different aspects of the problem and making reasonable assumptions. This estimation problem can be broken down into two parts:
1. Estimate how prior knowledge of the building roof age can reduce customer acquisition costs for solar developers.
2. Estimate how much this reduction in solar installation cost will increase the rate of deployment of rooftop solar.
C.1 Impact of roof age knowledge on CAC