BigEarthNet: A Large-Scale Benchmark Archive For Remote Sensing Image Understanding
Gencer Sumbul, Marcela Charfuelan, Begüm Demir, Volker Markl
Technische Universität Berlin, DFKI GmbH
ABSTRACT
This paper presents the BigEarthNet, a new large-scale multi-label Sentinel-2 benchmark archive. The BigEarthNet consists of 590,326 Sentinel-2 image patches, each of which is a section of i) 120 × 120 pixels for 10m bands; ii) 60 × 60 pixels for 20m bands; and iii) 20 × 20 pixels for 60m bands. Unlike most of the existing archives, each image patch is annotated by multiple land-cover classes (i.e., multi-labels) that are provided from the CORINE Land Cover database of the year 2018 (CLC 2018). The BigEarthNet is significantly larger than the existing archives in remote sensing (RS) and thus is much more suitable as a training source in the context of deep learning. This paper first addresses the limitations of the existing archives and then describes the properties of the BigEarthNet. Experimental results obtained in the framework of RS image scene classification problems show that a shallow Convolutional Neural Network (CNN) architecture trained on the BigEarthNet provides much higher accuracy compared to a state-of-the-art CNN model pre-trained on ImageNet (which is a very popular large-scale benchmark archive in computer vision). The BigEarthNet opens up promising directions to advance operational RS applications and research in massive Sentinel-2 image archives.

Index Terms: Sentinel-2 image archive, multi-label image classification, deep neural network, remote sensing
1. INTRODUCTION
Recent advances in deep learning have attracted great attention in remote sensing (RS) due to the high capability of deep networks (e.g., Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Generative Adversarial Networks (GAN)) to model the high-level semantic content of RS images. To train such networks, a very large training set with a high number of annotated images is needed in order to learn effective models with a large number of parameters. To the best of our knowledge, publicly available RS image archives contain only a small number of annotated images, and a large-scale benchmark archive does not yet exist. Thus, the lack of a large training set is an important bottleneck that prevents the use of deep learning in RS. In order to address this problem, fine-tuning deep networks pre-trained on large-scale computer vision archives (e.g., ImageNet) is considered in the RS community. However, such an approach has several limitations related to the differences between the characteristics of computer vision and RS images. Additionally, in the existing archives, RS images are annotated by single high-level category labels that are related to the most significant content of the image. However, RS images typically contain multiple classes, and thus each image can be simultaneously associated with different land-cover class labels (i.e., multi-labels). To overcome these problems, we introduce the BigEarthNet, a new large-scale Sentinel-2 archive that contains 590,326 Sentinel-2 image patches. Each patch is annotated with multi-labels provided from the CORINE Land Cover database, which was updated in 2018 (CLC 2018). We propose our archive as a sufficient source for RS image analysis with deep learning. In order to test the BigEarthNet on RS image analysis problems, we focus our attention on image scene classification. To this end, we consider a shallow CNN architecture to be trained on the BigEarthNet.
We compare the results obtained by this network with the Inception-v2 [1] pre-trained on ImageNet. We believe that the BigEarthNet will make a significant advancement in terms of the development of algorithms for the analysis of large-scale RS image archives.
2. LIMITATIONS OF EXISTING REMOTE SENSING IMAGE ARCHIVES
Most of the benchmark archives in RS (UC Merced Land Use Dataset [2], WHU-RS19 [3], RSSCN7 [4], SIRI-WHU [5], AID [6], NWPU-RESISC45 [7], RSI-CB [8], EuroSat [9] and PatternNet [10]) contain a small number of images annotated with single category labels. Table 1 presents the list of the existing archives. These archives have become popular for the implementation, evaluation and validation of algorithms in the context of image classification, search and retrieval tasks. However, the RS community encounters critical limitations when using these archives for deep learning based approaches. One of the most critical limitations is that the number of annotated images included in the existing archives is very small. Thus, they are insufficient to train modern deep neural networks to a high generalization ability, as the models may overfit dramatically when small training sets are used. In detail, training such networks on the existing archive images suffers from the problem of learning a large number of parameters, which prevents the accurate characterization of the semantic content of RS images.

The BigEarthNet is available at http://bigearth.net.

Table 1: List of the existing RS archives.

Archive Name            | Image Type              | Annotation Type | Number of Images
UC Merced [2]           | Aerial RGB              | Single Label    | 2,100
UC Merced [11]          | Aerial RGB              | Multi-Label     | 2,100
WHU-RS19 [3]            | Aerial RGB              | Single Label    | 1,005
RSSCN7 [4]              | Aerial RGB              | Single Label    | 2,800
SIRI-WHU [5]            | Aerial RGB              | Single Label    | 2,400
AID [6]                 | Aerial RGB              | Single Label    | 10,000
NWPU-RESISC45 [7]       | Aerial RGB              | Single Label    | 31,500
RSI-CB [8]              | Aerial RGB              | Single Label    | 36,707
EuroSat [9]             | Satellite Multispectral | Single Label    | 27,000
PatternNet [10]         | Aerial RGB              | Single Label    | 30,400

To this end, fine-tuning models pre-trained on ImageNet is used as a transfer learning approach. However, the profound differences between the image properties of computer vision and RS limit the accurate characterization of RS images when the fine-tuning approach is applied. As an example, Sentinel-2 images have 13 spectral bands associated with varying and lower spatial resolutions with respect to computer vision images. There are also differences in the ways that the category labels of computer vision and RS are defined for the semantic content of an image. Thus, fine-tuning pre-trained models for RS images may not be generally applicable to reduce this semantic gap and therefore may lead to weak discrimination ability for land-cover classes. Another limitation of the existing archives is that they contain images annotated by single high-level category labels, which are related to the most significant content of the image. However, RS images generally contain multiple classes, so that they can be simultaneously associated with different land-cover class labels (i.e., multi-labels). Hence, a benchmark archive consisting of images annotated with multi-labels is required. Although the archive presented in [11] contains images with multi-labels, the sample size of this archive is too small to be efficiently utilized for deep learning.
Another limitation of RS image archives is that, since researchers generally do not have free access to satellite data together with their annotations, most of the benchmark archives contain aerial images with only RGB image bands. The unavailability of a high number of annotated satellite images prevents employing deep learning methods in a convenient way for the complete understanding of the huge amount of freely accessible satellite data (e.g., Sentinel-1, Sentinel-2). Although the benchmark archive proposed in [9] includes annotated satellite images, the number of images is still small. The aforementioned limitations of the existing archives reveal the need for a large-scale RS benchmark archive to be used for training deep neural networks instead of the ImageNet.
3. THE BIGEARTHNET ARCHIVE
To overcome the limitations of the existing archives, we introduce the BigEarthNet, the first large-scale benchmark archive in RS. We have constructed our archive by selecting 125 Sentinel-2 tiles acquired between June 2017 and May 2018 and distributed over 10 countries (Austria, Belgium, Finland, Ireland, Kosovo, Lithuania, Luxembourg, Portugal, Serbia, Switzerland) of Europe. It is worth noting that the considered tiles are associated with a cloud cover percentage of less than 1%. All tiles were atmospherically corrected by using the Sentinel-2 Level 2A product generation and formatting tool (sen2cor) of ESA. Among the 13 Sentinel-2 spectral bands, the 10th band, for which surface information is not embodied, was excluded. After the tile selection and preliminary processing steps were carried out, the selected tiles were divided into 590,326 non-overlapping image patches. Each patch (denoted as image hereafter) is a section of i) 120 × 120 pixels for 10m bands; ii) 60 × 60 pixels for 20m bands; and iii) 20 × 20 pixels for 60m bands. We have associated each image with one or more land-cover class labels (i.e., multi-labels) provided from the CORINE Land Cover (CLC) database of the year 2018 (CLC 2018). The CLC inventory was produced by the Eionet National Reference Centres on Land Cover under the coordination of the European Environment Agency (EEA) for the recognition, identification and assessment of land-cover classes by leveraging the texture, pattern and density information of the objects presented in RS images. This inventory was very recently updated as CLC 2018, for which the annotation process was carried out for the period 2017-2018.

Table 2: The considered Level-3 CLC classes in the BigEarthNet, listed in decreasing order of the number of associated images:
Mixed forest; Coniferous forest; Non-irrigated arable land; Transitional woodland/shrub; Broad-leaved forest; Land principally occupied by agriculture, with significant areas of natural vegetation; Complex cultivation patterns; Pastures; Water bodies; Sea and ocean; Discontinuous urban fabric; Agro-forestry areas; Peatbogs; Permanently irrigated land; Industrial or commercial units; Natural grassland; Olive groves; Sclerophyllous vegetation; Continuous urban fabric; Water courses; Vineyards; Annual crops associated with permanent crops; Inland marshes; Moors and heathland; Sport and leisure facilities; Fruit trees and berry plantations; Mineral extraction sites; Rice fields; Road and rail networks and associated land; Bare rock; Green urban areas; Beaches, dunes, sands; Sparsely vegetated areas; Salt marshes; Coastal lagoons; Construction sites; Estuaries; Intertidal flats; Airports; Dump sites; Port areas; Salines; Burnt areas.

Fig. 1: Example of Sentinel-2 images and their multi-labels in our BigEarthNet archive.
We selected tiles within the considered time interval to be appropriate for the annotation period of CLC 2018. The CLC nomenclature includes 44 land-cover classes grouped in a three-level hierarchy (see https://land.copernicus.eu/user-corner/technical-library/corine-land-cover-nomenclature-guidelines). The considered Level-3 CLC class labels and the number of images associated with each label are shown in Table 2. We would like to note that the number of images per land-cover class varies significantly in the archive. The number of labels associated with each image varies between 1 and 12, whereas 95% of the images have at most 5 multi-labels. Only a small fraction of the images in the BigEarthNet contain more labels. Fig. 1 shows an example of images and their multi-labels, while Fig. 2 shows the number of Sentinel-2 images with respect to the acquisition date. It is worth noting that we aimed to represent each considered geographic location with images acquired in all different seasons. However, due to the difficulties of collecting Sentinel-2 images with a low cloud cover percentage within a narrow time interval, this was not possible for some areas. Since the cloud cover percentage of Sentinel-2 tiles acquired in winter is generally higher than in the other seasons, our archive contains the lowest number of images from the winter season. We also employed visual inspection for the quality check of the image multi-labels. By visual inspection, we have identified the images that are fully covered by seasonal snow, cloud and cloud shadow (the lists of these images are available at http://bigearth.net/). We suggest not to include these images in the training and test stages of machine/deep learning algorithms for scene classification, content-based image retrieval and search if only BigEarthNet Sentinel-2 images are used.
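The multi-label annotations described above are typically consumed by a classifier as fixed-length binary (multi-hot) vectors over the Level-3 CLC classes. A minimal sketch of this encoding, assuming a hypothetical `CLC_CLASSES` list and illustrative per-image label lists (the archive's actual metadata layout is not reproduced here):

```python
# Multi-hot encoding of CLC multi-labels -- an illustrative sketch; the
# class subset and label lists below are hypothetical, not the archive's
# exact metadata schema.
import numpy as np

# Hypothetical subset of the Level-3 CLC nomenclature (cf. Table 2).
CLC_CLASSES = [
    "Mixed forest", "Coniferous forest", "Non-irrigated arable land",
    "Pastures", "Water bodies", "Discontinuous urban fabric",
]
CLASS_INDEX = {name: i for i, name in enumerate(CLC_CLASSES)}

def multi_hot(labels):
    """Encode a list of CLC class names as a binary target vector."""
    vec = np.zeros(len(CLC_CLASSES), dtype=np.float32)
    for name in labels:
        vec[CLASS_INDEX[name]] = 1.0
    return vec

# An image annotated with two land-cover classes (cf. Fig. 1).
y = multi_hot(["Pastures", "Water bodies"])
```

Each image then contributes one such vector with as many ones as it has land-cover labels, which is what a sigmoid-output classification layer is trained against.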
Fig. 2: The number of Sentinel-2 images with respect to the acquisition date.
4. EXPERIMENTAL RESULTS
In the experiments, we have used the BigEarthNet archive in the framework of RS image scene classification problems. To this end, we selected a shallow CNN architecture, which consists of three convolutional layers with 32, 32 and 64 filters having 5 × 5, 5 × 5 and 3 × 3 filter sizes, respectively. We added one fully connected (FC) layer and one classification layer to the output of the last convolutional layer. In all convolution operations, zero padding was used. We also applied max-pooling between layers. We considered utilizing: i) only RGB channels (denoted as S-CNN-RGB); and ii) all spectral channels (denoted as S-CNN-All). For the S-CNN-All, cubic interpolation was applied to the 20 and 60 meter bands of each image to have the same pixel sizes associated with each band. The weights of the S-CNN-RGB and the S-CNN-All were randomly initialized, and we trained both networks from scratch on the BigEarthNet images. In order to show the effectiveness of the BigEarthNet as a training source, we compared the results with fine-tuning one of the recent pre-trained deep learning architectures. We considered the Inception-v2 network [1] pre-trained on ImageNet as a state-of-the-art architecture. We used the feature vector extracted from the layer just before the softmax layer of the Inception-v2. To employ fine-tuning, we fixed the model weights of the Inception network. We added one FC and one classification layer to the network and fine-tuned only these layers by using the RGB channels of the BigEarthNet images. In the experiments, the images that are fully covered by seasonal snow, cloud and cloud shadow were eliminated. Then, among the remaining images, we randomly selected: i) 60% of the images to derive a training set; ii) 20% of the images to derive a validation set; and iii) 20% of the images to derive a test set.
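The random derivation of the three sets can be sketched as a seeded shuffle of patch identifiers. This is an illustrative sketch assuming a 60/20/20 proportion and hypothetical patch names; it is not the authors' exact splitting code:

```python
# Seeded random train/validation/test split of patch identifiers -- a
# sketch assuming a 60/20/20 proportion; patch names are illustrative.
import random

def split_archive(patch_ids, seed=42):
    """Shuffle identifiers reproducibly and cut them into three sets."""
    ids = list(patch_ids)
    random.Random(seed).shuffle(ids)
    n = len(ids)
    n_train = int(0.6 * n)
    n_val = int(0.2 * n)
    return (ids[:n_train],                 # training set
            ids[n_train:n_train + n_val],  # validation set
            ids[n_train + n_val:])         # test set

train, val, test = split_archive([f"patch_{i:06d}" for i in range(1000)])
```

Seeding the shuffle keeps the split reproducible across runs, which matters when comparing several architectures on the same archive.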
Both for fine-tuning and training from scratch, we trained for a fixed number of epochs, and the Stochastic Gradient Descent algorithm was employed in order to decrease the sigmoid cross-entropy loss (which aims at maximizing the log-likelihood of each land-cover class throughout all training images). As the performance metrics of the experiments, we employed precision (P), recall (R), F1 and F2 scores, which are widely used metrics for multi-label image classification. As can be seen from Table 4, the S-CNN-RGB provides better performance than the Inception-v2 in all metrics, while both networks consider only RGB image channels. When the S-CNN-All architecture is trained on the BigEarthNet images containing all spectral bands, the results become much more promising with respect to using only RGB bands. Table 3 shows examples of Sentinel-2 images with the true multi-labels and the multi-labels assigned by the Inception-v2, the S-CNN-RGB and the S-CNN-All. The performance improvements on all metrics are statistically significant. The same behavior is also observed when the BigEarthNet images are associated with Level-1 and Level-2 CLC class labels. We would like to also note that the S-CNN-RGB and the S-CNN-All are very simple CNN architectures that consist of only 3 convolutional layers and max-pooling. Training deeper models (which include recent deep learning techniques such as residual connections, wider layers with varying filter sizes, etc.) from scratch can lead to more promising results. On the basis of all obtained results, we can state that the RS community can benefit from models pre-trained on the BigEarthNet instead of on the computer vision archives.

Table 3: Examples of Sentinel-2 images with the true multi-labels and the multi-labels assigned by the Inception-v2, the S-CNN-RGB and the S-CNN-All.

True Multi-Label | Inception-v2 | S-CNN-RGB | S-CNN-All
pastures, peatbogs | non-irrigated arable land, coniferous forest, mixed forest, transitional woodland/shrub | non-irrigated arable land, land occupied by agriculture, mixed forest | pastures, peatbogs
pastures, land occupied by agriculture, water bodies | coniferous forest, mixed forest, transitional woodland/shrub | non-irrigated arable land, land occupied by agriculture | pastures, land occupied by agriculture, water bodies
discontinuous urban fabric, industrial or commercial units | coniferous forest, mixed forest, transitional woodland/shrub | discontinuous urban fabric, land occupied by agriculture, broad-leaved forest, coniferous forest, mixed forest | discontinuous urban fabric, industrial or commercial units

Table 4: Experimental results obtained by the Inception-v2, the S-CNN-RGB and the S-CNN-All.

Method           | P (%) | R (%) | F1 | F2
Inception-v2 [1] | .23   | 56.79 | 0. | 0.
S-CNN-RGB        | .06   | 75.57 | 0. | 0.
S-CNN-All        | .93   | 77.10 | 0. | 0.
5. CONCLUSION
This paper presents a large-scale benchmark archive that consists of 590,326 Sentinel-2 image patches annotated by multi-labels for RS image understanding. We believe that the BigEarthNet will make a significant advancement for the use of deep learning in RS by overcoming the current limitations of the existing archives. Experimental results show the effectiveness of training even a simple neural network on the BigEarthNet from scratch compared to fine-tuning a state-of-the-art deep learning model pre-trained on ImageNet. We would like to note that we plan to regularly enrich the BigEarthNet by increasing the number of annotated Sentinel-2 images.
6. ACKNOWLEDGEMENTS
This work was supported by the European Research Council under the ERC Starting Grant BigEarth (759764) and the German Ministry for Education and Research as BBDC (01IS14013A).
7. REFERENCES

[1] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, "Rethinking the inception architecture for computer vision," in IEEE Conf. Comput. Vis. Pattern Recog., 2016.
[2] Y. Yang and S. Newsam, "Bag-of-visual-words and spatial extensions for land-use classification," in Intl. Conf. Adv. Geogr. Inf. Syst., 2010.
[3] W. Shao, W. Yang, and G. S. Xia, "Extreme value theory-based calibration for the fusion of multiple features in high-resolution satellite scene classification," Int. J. Remote Sens., vol. 34, no. 23, pp. 8588–8602, 2013.
[4] Q. Zou, L. Ni, T. Zhang, and Q. Wang, "Deep learning based feature selection for remote sensing scene classification," IEEE Geosci. Remote Sens. Lett., vol. 12, no. 11, pp. 2321–2325, November 2015.
[5] B. Zhao, Y. Zhong, G. Xia, and L. Zhang, "Dirichlet-derived multiple topic scene classification model for high spatial resolution remote sensing imagery," IEEE Trans. Geosci. Remote Sens., vol. 54, no. 4, pp. 2108–2123, April 2016.
[6] G. Xia, J. Hu, F. Hu, B. Shi, X. Bai, Y. Zhong, L. Zhang, and X. Lu, "AID: A benchmark data set for performance evaluation of aerial scene classification," IEEE Trans. Geosci. Remote Sens., vol. 55, no. 7, pp. 3965–3981, July 2017.
[7] G. Cheng, J. Han, and X. Lu, "Remote sensing image scene classification: Benchmark and state of the art," Proc. IEEE, vol. 105, no. 10, pp. 1865–1883, October 2017.
[8] H. Li, C. Tao, Z. Wu, J. Chen, J. Gong, and M. Deng, "RSI-CB: A large scale remote sensing image classification benchmark via crowdsource data," arXiv preprint arXiv:1705.10450, 2017.
[9] P. Helber, B. Bischke, A. Dengel, and D. Borth, "EuroSat: A novel dataset and deep learning benchmark for land use and land cover classification," arXiv preprint arXiv:1709.00029, 2017.
[10] W. Zhou, S. Newsam, C. Li, and Z. Shao, "PatternNet: A benchmark dataset for performance evaluation of remote sensing image retrieval," ISPRS J. Photogram. Remote Sens., vol. 145, pp. 197–209, 2018.
[11] B. Chaudhuri, B. Demir, S. Chaudhuri, and L. Bruzzone, "Multilabel remote sensing image retrieval using a semisupervised graph-theoretic method," IEEE Trans. Geosci. Remote Sens., 2018.