Multi-species Seagrass Detection and Classification from Underwater Images

Scarlett Raine, Ross Marchant, Peyman Moghadam, Frederic Maire, Brett Kettle, Brano Kusy

CSIRO Data61, Brisbane, Australia
School of Electrical Engineering and Robotics, Queensland University of Technology (QUT), Brisbane, Australia
Abstract—Underwater surveys conducted using divers or robots equipped with customized camera payloads can generate a large number of images. Manual review of these images to extract ecological data is prohibitive in terms of time and cost, thus providing strong incentive to automate this process using machine learning solutions. In this paper, we introduce a multi-species detector and classifier for seagrasses based on a deep convolutional neural network (achieving an overall accuracy of 92.4%). We also introduce a simple method to semi-automatically label image patches and therefore minimize manual labelling requirements. We describe and publicly release the dataset collected in this study as well as the code and pre-trained models to replicate our experiments at: https://github.com/csiro-robotics/deepseagrass
I. INTRODUCTION
Seagrasses are high value coastal resources that provide a range of ecosystem services, including physical protection of coastlines, mitigation of adverse water quality, increasing fisheries productivity, and acting as 'blue carbon' sinks for mitigating climate change [1]. Active management of these systems requires assessment of the extent and condition of seagrass meadows, a complex task because they are dynamic in space and time, both naturally and in response to anthropogenic impacts [2]. Well-studied meadows are in decline globally [3], highlighting the need to manage large areas of extant meadows, as well as their historical and potential habitats. An updated assessment of known distributions concludes that 166,387 km² of meadows (moderate to high probability assessment) currently exist [4]. Recent modelling, based on depth, minimum photosynthetic requirements and seagrass distributional studies, suggests a potential global seagrass habitat of 1,646,788 km². In part, the ten-fold difference between known and potential distributions relates to under-survey or limitations of existing survey methods [5] in environments where surveys are impeded by remoteness, turbidity, depth and coastal conditions [6].

Recognising the importance of seagrass and the necessity of its monitoring for better management, there have been a number of advances in vision-based seagrass survey methods to increase the spatial extent or intensity of surveys beyond that which can be accomplished on foot at low tide, or by snorkelers or divers. Broad-scale methods include remote sensing (satellites, conventional aerial photography, drones) and underwater vehicles that are towed (TUVs [7]), remotely operated (ROVs [8]) or fully autonomous (AUVs [9]). Computer vision and machine learning techniques are used to automatically and efficiently analyse the large amount of image data produced [10].

E-mails: {scarlett.raine, ross.marchant, peyman.moghadam, brett.kettle, brano.kusy}@csiro.au; r.marchant, [email protected]
Real-time processing of images is essential if monitoring utilises an underwater robot and the objective requires adaptation of sampling effort or survey extents during the field program.

In this paper we present two novel contributions to the automatic seagrass detection and classification problem. Firstly, we have created a new publicly available image dataset (called DeepSeagrass) of three seagrass morphotypes from Moreton Bay, Australia. The dataset is unique in that it contains high-resolution images at an oblique angle and close range, which closely matches the perspective of a diver or underwater robot, and consists primarily of images with only a single morphotype present. This allows image-level labels to be assigned to subregions within the images. Secondly, we designed a deep neural network based seagrass classification system and evaluated it on this image dataset. The training of the neural network takes advantage of the single-morphotype images to learn to identify seagrass in small image patches. The advantages of a patch-based approach are its simplicity, easy application to real-time video from an underwater robot, and its ability to yield spatial data at a sub-image level to facilitate estimates of percentage cover or mapping outputs.

After reviewing existing work on seagrass monitoring in Section II, we describe the dataset that we release publicly in Section III. In Section IV, we explain the semi-automatic labeling of the dataset, the architecture of the neural classifier and details of the training process. Experimental results are reported in Section V.

II. RELATED WORK
Improving the mapping of seagrass beds is an active area of research within the marine biology community. Seagrass data is collected using techniques such as manual surveys [11], helicopter surveys [12], remote sensing such as satellite imagery [13] or AUVs [14], [15], [16]; or a combination of two or more methods [17]. Producing seagrass maps from the vast amounts of data is time-consuming using manual counting methods, and thus automated methods are an active research area.
A. Seagrass Identification
Two traditional machine learning approaches to seagrass detection use Gabor filtering [14], [15] or gray-level co-occurrence matrices [18] to generate feature descriptors, which are then used to train a Support Vector Machine (SVM) for pixel or patch level classification.

Fig. 1: Sample of different species of seagrass from a public image dataset [11]: (a) Strappy, (b) Ferny, (c) Rounded, (d) Background.

More recently, deep learning using convolutional neural networks (CNNs) has emerged as the state of the art in classification problems [19], and thus has potential for the automated classification of seagrass, including accurate real-time calculation of percentage cover [20]. Moniruzzaman et al. [21] used the Faster-RCNN [22] network to detect individual leaves of a single species of seagrass, Halophila ovalis; however, they did not consider multiple species or cases where the seagrass appears as a dense carpet over the seabed. Reus et al. [23] achieved 94.5% accuracy on pixel-wise segmentation of seagrass, later improved on by Weidmann et al. [24], in images taken by an AUV pointing vertically downward. They used rectangular image patches to train a CNN to produce binary pixel-wise classifications, but did not consider different species of seagrass. Likewise, Martin-Abadal et al. [16] compared a CNN with other machine learning methods for a single species, finding that a VGG-16 encoder with an 8-stride fully convolutional network as the decoder [25] achieved state-of-the-art accuracy for pixel-wise segmentation of the seagrass Posidonia oceanica.

In this work, we address the problem of identifying multiple species of seagrass. Our approach is to classify different morphotypes at the image patch level, using a CNN trained in a fully-supervised manner. However, due to the cost of labelling at a patch or even pixel level, we use images that have only one type of seagrass so that the image-level labels can be applied as weak labels to the patches.
B. Existing Seagrass Datasets
The method requires a seagrass dataset representing multiple species, with enough images containing only one particular species. A survey of existing publicly available datasets (Table II) found that there does not exist, to our knowledge, a dataset designed for classification and detection of different species of seagrass. Only two datasets contained multiple species of seagrass (entries 2 and 5 in Table II) and of these, only the photo-transect series of the Eastern Banks of Moreton Bay (entry 2 in Table II) is publicly available, referred to as the 'Public Dataset' in the rest of this paper. Figure 1 shows a sample of the images in this dataset [11]. 'Public Dataset' images were collected on the Eastern Banks of Moreton Bay, near Brisbane, Australia and have been made available on Pangaea [11]. Image resolution is very low, with sizes of 105x140, 150x200 and 180x240 pixels (Figure 1). Images contain a variety of marine species and are accompanied at the image level by Coral Point Count (CPC) annotations [11]. The CPC method is a manual computer-assisted technique that overlays points on an image. Labels are assigned to the element directly under a point, and cover is estimated based on point counts [26].

TABLE I: Class Distribution in the Public Dataset [11]
Strappy | Ferny | Rounded | Background

The images were labelled into three morphological taxonomic super-classes according to their CPC label (plus one substrate class): 'Strappy' encompassed Cymodocea serrulata, Halodule uninervis, Syringodium isoetifolium and Zostera muelleri; 'Ferny' described Halophila spinulosa; 'Rounded' described Halophila ovalis; and 'Background' consisted of images with less than 1% total seagrass cover. Images with less than 70% cover for one species were removed from the dataset. Furthermore, due to the coarse sampling of the CPC method, some images did not have accurate species estimations. The dataset was manually inspected by a domain expert and incorrectly labeled images were removed. The filtered dataset was heavily skewed towards the Strappy class, with significantly fewer examples of Ferny and Rounded seagrass (see Table I).

III. DEEPSEAGRASS DATASET
Due to the poor resolution, class imbalance, small image sizes, inconsistent camera angles, high level of blur and the overall small quantity of usable images, we decided to collect a new dataset from a similar location.

Images were acquired across nine different seagrass beds in Moreton Bay, Australia over four days during February 2020. Search locations were chosen according to distributions reported in the publicly available dataset. A biologist made a search of each area, snorkelling in approximately 1-2 m of water during low to mid tide. In-situ search of seagrass beds resulted in batches of photographs in 78 distinct geographic sub-areas, each containing one particular seagrass morphotype (or bare substrate). Images were taken using a Sony Action Cam FDR-3000X from approximately 0.5 m off the seafloor at an oblique angle of around 45 degrees. Over 12,000 high-resolution (4624 x 2600 pixels) images were obtained.

Images were reviewed to ensure that any containing a second morphotype at more than approximately 0.5% density were placed into a 'mixed' class. The remainder were labelled according to their dominant morphotype, and divided into three categories (dense, medium, sparse) according to the relative density of seagrass present, for a total of 11 classes.

TABLE II: Existing Seagrass Datasets
Name of Paper | Year | Dataset Size | Annotations | Seagrass Species | Access

Fig. 2: Map of 78 distinct sub-areas for our DeepSeagrass dataset. Colour of the marker refers to the date of collection; green: 2020-02-07, purple: 2020-02-08, red: 2020-02-10, orange: 2020-02-11.

The dataset was then prepared for use in a machine learning pipeline by taking the dense seagrass images and dividing them into train and test sets. Images from 70 of the sub-areas were allocated to the training set (1,701 images in total), while images from the remaining eight sub-areas were reserved exclusively for the test set (335 images). Different sub-areas were used so that evaluation of a classification system on the test set would assess how well it generalised to different geographic areas.

IV. METHOD
Our approach classifies seagrass on a per-patch basis using a pre-trained CNN classifier. A patch-based method was used because it captures location information about the seagrass in the frame without the need for dense polygon, pixel or bounding box labels within each training image.
A. Dataset Preparation
We use underwater images that consist of only one seagrass morphotype in each image. For the DeepSeagrass dataset, these are the 'Ferny - dense', 'Strappy - dense' and 'Round - dense' classes, plus images with no seagrass from the 'Substrate' class. Images are then divided into a grid of patches. For the DeepSeagrass dataset we used a grid of 5 rows by 8 columns to generate 40 patches of size 520x578 pixels per image (as discussed in Section V-A2). The top row of patches for each image was discarded, as their oblique pose in low underwater visibility conditions frequently resulted in hard-to-distinguish seagrass (Figure 3).

Based on our data collection procedure, we can assume (weak labels) that the remainder of the patches for an image represent the same seagrass species (or are all substrate). As such, each patch is assigned the image label, reducing the time and cost of labelling considerably. Patches from the training set were further split into 80% training and 20% validation sets. Together with the test set patches, these form the dataset of approximately 66,946 image patches (Table III) that are used for all experiments in this paper.

TABLE III: Class Distribution of Image Patches in DeepSeagrass Dataset
Split | Strappy | Ferny | Rounded | Background | Total
Training
Validation
Test
Fig. 3: Overview of our dataset preparation pipeline.
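The grid-based patch extraction and weak labelling described above can be sketched as follows. This is a minimal example; the function and variable names are ours, not taken from the released code.

```python
import numpy as np

def extract_patches(image, rows=5, cols=8, drop_top_row=True):
    """Split an image into a rows x cols grid of equal patches.

    The top row is discarded because its distant, oblique view often
    makes seagrass too indistinct to label reliably.
    """
    h, w = image.shape[:2]
    ph, pw = h // rows, w // cols  # e.g. 2600 // 5 = 520, 4624 // 8 = 578
    start = 1 if drop_top_row else 0
    return [image[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw]
            for r in range(start, rows) for c in range(cols)]

# Weak labelling: every patch inherits its parent image's class label,
# so no patch-level annotation is required.
frame = np.zeros((2600, 4624, 3), dtype=np.uint8)  # dummy single-class image
patches = extract_patches(frame)
labels = ["Strappy"] * len(patches)  # image-level label propagated to patches
```

With the top row dropped, each 4624x2600 image yields 32 labelled patches of 520x578 pixels.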
B. Classification Model
We use a CNN based on the VGG-16 architecture [30] for the image classifier. It consists of the VGG-16 model pretrained on the ImageNet classification task, with final dense layers removed and replaced with the following consecutive layers: dense (with 512 nodes and ReLU [31] activation), dropout [32] (with dropout probability of 0.05), dense (with 512 nodes and ReLU activation), dropout (with dropout probability of 0.15) and a final dense layer with one node for each class, activated with the Softmax function. Using heavy dropout in the final layers has been successful for other transfer learning approaches with small numbers of classes [32].
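The replacement head can be summarised with the following framework-agnostic sketch of its inference-time forward pass. Dropout is inactive at inference, so it is omitted here, and the weights are random placeholders rather than trained values; the VGG-16 backbone that produces the input feature vector is not shown.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))  # numerically stable
    return e / e.sum(axis=-1, keepdims=True)

class SeagrassHead:
    """Dense head matching the paper's description:
    dense(512, ReLU) -> dropout(0.05) -> dense(512, ReLU)
    -> dropout(0.15) -> dense(num_classes, Softmax).
    """
    def __init__(self, in_dim=512, num_classes=4, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.standard_normal((in_dim, 512)) * 0.01
        self.b1 = np.zeros(512)
        self.W2 = rng.standard_normal((512, 512)) * 0.01
        self.b2 = np.zeros(512)
        self.W3 = rng.standard_normal((512, num_classes)) * 0.01
        self.b3 = np.zeros(num_classes)

    def __call__(self, feats):
        h = relu(feats @ self.W1 + self.b1)
        h = relu(h @ self.W2 + self.b2)
        return softmax(h @ self.W3 + self.b3)

head = SeagrassHead()
probs = head(np.ones((2, 512)))  # one probability vector per input feature row
```

Each output row is a probability distribution over the four classes.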
C. Training
The CNN was trained on batches of 32 images at each step, using the Adam optimiser [33] with the learning rate set to an initial value of 0.001. The learning rate was halved whenever there was no improvement in loss over the last 10 epochs, and after four such decreases the training was stopped.

Colour-based image augmentation was applied to the training images to make the model robust to underwater light variations driven by the incident angle of the sun striking the surface of the water, the turbidity and colour of the water, and the depth at which imagery is captured. To mimic these effects, we chose to vary the brightness, contrast and blur for data augmentation. Additionally, the red channel was altered to change the "blueness" of the images. The data augmentation ablation study is described in Section V-A3.
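A minimal NumPy sketch of this colour augmentation follows. The jitter ranges and blur kernel are illustrative assumptions; the paper does not specify exact values.

```python
import numpy as np

def colour_augment(img, rng):
    """Random brightness, contrast, blur and red-channel jitter, mimicking
    underwater lighting variation (ranges are assumptions, not the paper's).
    """
    x = img.astype(np.float32)
    x *= rng.uniform(0.8, 1.2)                     # brightness
    mean = x.mean()
    x = (x - mean) * rng.uniform(0.8, 1.2) + mean  # contrast
    x[..., 0] *= rng.uniform(0.8, 1.2)             # red channel ("blueness"), RGB order
    if rng.random() < 0.5:                         # simple 3x3 box blur
        pad = np.pad(x, ((1, 1), (1, 1), (0, 0)), mode="edge")
        x = np.stack([
            sum(pad[i:i + x.shape[0], j:j + x.shape[1], c]
                for i in range(3) for j in range(3)) / 9.0
            for c in range(3)], axis=-1)
    return np.clip(x, 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
out = colour_augment(np.full((520, 578, 3), 128, np.uint8), rng)
```

Each training patch would be passed through this function once per epoch so the network sees a different colour rendition each time.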
D. Application to Whole Images
The trained model is then applied to whole frames from an underwater video. Whole images are divided into patches with size equal to that used to train the model. Each patch is classified, assigning a label corresponding to the argmax of the Softmax output layer. Multi-label classification is also possible by replacing the Softmax layer with an array of Sigmoid units, but we do not cover this approach in this paper. Finally, classification labels are visualised by colour-coded overlay onto the original image.
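The per-patch inference and overlay step can be sketched as follows. Here `classify` is a stand-in for the trained network, and the colour map mirrors the legend used in the result figures (yellow = strappy, red = ferny, blue = rounded, pink = background).

```python
import numpy as np

def classify(patch):
    """Stand-in for the trained CNN: returns class probabilities."""
    return np.array([0.1, 0.7, 0.1, 0.1])  # placeholder Softmax output

COLOURS = {0: (255, 255, 0), 1: (255, 0, 0),     # yellow = Strappy, red = Ferny
           2: (0, 0, 255), 3: (255, 192, 203)}   # blue = Rounded, pink = Background

def overlay_predictions(image, rows=5, cols=8, alpha=0.4):
    """Classify each grid cell and blend a class-coloured tint over it."""
    h, w = image.shape[:2]
    ph, pw = h // rows, w // cols
    out = image.astype(np.float32)
    for r in range(rows):
        for c in range(cols):
            cell = out[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw]
            label = int(np.argmax(classify(cell)))  # argmax of Softmax output
            tint = np.array(COLOURS[label], np.float32)
            out[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw] = (1 - alpha) * cell + alpha * tint
    return out.astype(np.uint8)

frame = np.zeros((2600, 4624, 3), np.uint8)  # dummy video frame
vis = overlay_predictions(frame)
```

Swapping the stub `classify` for the trained model yields the colour-coded visualisations shown in Figures 8 and 9.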
V. RESULTS

The classifier was trained and evaluated using 5-fold cross validation on the training set, whereby the training set is split into 80% training and 20% validation sets, and the evaluation metrics were averaged across all five runs. The metrics used for evaluating the performance of the models were per-class precision, recall and the overall accuracy. The precision and recall were calculated for each of the four classes. Additionally, an overall accuracy was calculated by determining the fraction of correctly classified examples over all images. The model achieved 98.2% to 99.6% precision and 98.0% to 99.7% recall for each class, and an overall accuracy of 98.8%. Accuracy dropped to 88.2% when the model was applied to test patches from the unseen geographic sub-areas, suggesting that images from these sites are different (Tables VI and VII).
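The per-class precision, per-class recall and overall accuracy used throughout the evaluation can be computed as below; this is a small self-contained sketch with dummy labels, not the paper's evaluation script.

```python
import numpy as np

def per_class_metrics(y_true, y_pred, num_classes=4):
    """Per-class precision/recall plus overall accuracy (fraction correct)."""
    prec, rec = [], []
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        prec.append(tp / (tp + fp) if tp + fp else 0.0)
        rec.append(tp / (tp + fn) if tp + fn else 0.0)
    acc = float(np.mean(y_true == y_pred))
    return prec, rec, acc

# Dummy ground truth and predictions for six patches.
y_true = np.array([0, 0, 1, 1, 2, 3])
y_pred = np.array([0, 1, 1, 1, 2, 3])
prec, rec, acc = per_class_metrics(y_true, y_pred)
```

In the paper these metrics are averaged over the five cross-validation folds.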
A. Ablation Study
An ablation study was performed to assess the effect of model architecture, patch size and augmentation methods. All experiments not involving the test set were implemented using 5-fold cross validation.
1) Network Architecture:
In the first experiment we tested two different feature extractors, the VGG-16 [30] and ResNet-50 [34] CNNs pretrained on ImageNet with final dense layers removed. One of three final layer configurations on the output of these networks was used: first, k-Nearest Neighbours ("KNN") with k=3; second, a simple 2-layer dense network ("2-Layer") with 16 nodes and 14 nodes consecutively; last, a 2-layer dense network with dropout of 0.05 between each dense layer of 512 nodes ("2-Layer+Drop"). Initial experiments showed no real difference between any configuration, so instead the models were evaluated on the test set. The VGG-16 CNN with "2-Layer+Drop" configuration was the best performing network with 87.2% overall accuracy. Interestingly, the overall accuracy of the ResNet-50 network did not change with the final configuration (Table IV).
2) Patch Size:
The best performing network of the previous section (VGG-16 with 2-Layer+Drop) was used to investigate the effect of patch size and data augmentation. The patch size was considered as a hyper-parameter of the training phase. Smaller patch size did not improve model performance on the validation set, but decreased per-class precision and recall (Table V). This may be because the smaller patch size removed discriminative features. The effect was emphasised for the Strappy and Ferny morphotypes, which generally appear larger in the image, while the smaller patch size was more suitable for the physically smaller Rounded morphotypes (Figure 4).

TABLE IV: Performance of Classifiers on the Unseen Test Dataset
Model | Strappy P/R | Ferny P/R | Rounded P/R | Background P/R | Acc.
VGG-16 KNN | 0.76/0.84 | 0.80/0.91 | 0.73/0.83 | 0.97/0.75 | 0.83
VGG-16 2-Layer | 0.80/0.96 | 0.82/0.94 | | |
VGG-16 2-Layer+Drop. | | | | |
ResNet-50 KNN | | | | |

Fig. 4: Different patch sizes during the training phase: (a) 260x289 pixels; (b) 520x578 pixels.
3) Data Augmentation:
Five data augmentation approaches were tested: no data augmentation; horizontal flipping 50% of the time; geometric augmentation, consisting of horizontal flips, random cropping, zooming and translation; colour augmentation, consisting of contrast, brightness, Gaussian blur and alteration of the red channel; and a combination of both the geometric and colour augmentations.

For the data augmentation experiments, the results from 5-fold cross validation showed very little difference between augmentation types (Table VI). Evaluation on the test dataset showed more variation (Table VII). The best performing strategy was colour augmentation, followed by combined colour and geometric augmentation. This suggests that the lighting, turbidity or depth of the test dataset may have been outside the distribution of the training dataset, and that colour augmentation is important for robust seagrass classification.
B. t-SNE Analysis
The output of the penultimate layer in the classification model (before the Softmax output layer) was considered to be a feature vector representing the input image. For visualisation, dimensionality reduction was performed using t-SNE [35] on the feature vectors extracted from both the training and test datasets (Figure 5). The resulting t-SNE plots were analysed to see if the clusters in the low-dimensional representation correspond to the image classes, as this provides insight into the ability of the classifier to extract useful features for class discrimination (Figure 5).

The training set feature vectors are grouped into four clusters that generally correspond to their classes. There are some exceptions, which may be explained by patches being assigned incorrect image-level labels, for example, an area of bare substrate in an otherwise densely covered seagrass image, as well as blurry and indistinct patches from the far field of view.

By contrast, there are two distinct clusters for class 3 ("Background") apparent in the test t-SNE plot, when only one is present in the training plot (Figure 5). This can be attributed to image patches of the water column, present in the test dataset but not in the training dataset. Water column patches were not separated into the Background class during training dataset preparation as all child patches from each seagrass image were assigned to that class. When water column patches are removed from the test dataset, the extra cluster in the t-SNE plot disappears (Figure 6) and the test accuracy of the classifier improves (Table VIII).

The water column patches exhibit different visual features to the predominantly substrate patches. This led us to separate the Background class into sub-classes for Substrate and Water. After retraining on this modified dataset the classifier achieved an overall accuracy of 92.4% (Table IX), and the resulting t-SNE plots have five distinct clusters (Figure 7).
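The t-SNE analysis can be reproduced in outline as follows. The features here are random stand-ins for the penultimate-layer activations (512-D, matching the classifier head); the real analysis uses the feature vectors extracted from the trained network.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
features = rng.standard_normal((50, 512))  # one 512-D feature vector per patch
classes = rng.integers(0, 4, size=50)      # patch class labels (4-class case)

# Embed the high-dimensional features into 2-D for visual inspection.
embedding = TSNE(n_components=2, perplexity=10,
                 init="random", random_state=0).fit_transform(features)
```

A scatter plot of `embedding` coloured by `classes` reproduces the style of the cluster plots in Figures 5-7.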
C. Inferences on Larger Images
The trained classifier was then applied to entire images to analyse where it had failed. Patches are classified correctly in most of the images (Figure 8). Interestingly, final classifications are correct in some of the images where patches differ from image-level labels, as occurs frequently in less dense 'Rounded' examples. This suggests that the model may classify grid cells correctly in cases where labels are ambiguous or incorrect.

The failure case examples (Figure 9) suggest that blur and noise in the top row of patches caused errors. For example, in the top row of the 'Ferny' and 'Rounded' examples the seafloor recedes into the distance and seagrass is too indistinct for useful classification. These cells were deliberately removed from the training set. Additionally, for the 'Ferny' image, the model has mistaken the exposed seagrass root system as a 'strap-like' feature, triggering incorrect Strappy classifications in the bottom left corner of the frame (Figure 9 (b)).

Fig. 5: t-SNE plots from (a) the training dataset and (b) the test dataset.

Fig. 6: t-SNE plots from the test dataset with water images removed.

Fig. 7: t-SNE plots for the 5-class model: (a) 5-class training dataset; (b) 5-class test dataset.

Fig. 8: Test dataset images with correctly classified patches (where yellow = strappy, red = ferny, blue = rounded and pink = background): (a) Strappy, (b) Ferny, (c) Rounded, (d) Background.

Fig. 9: Test dataset images with incorrectly classified patches (where yellow = strappy, red = ferny, blue = rounded and pink = background): (a) Strappy, (b) Ferny, (c) Rounded, (d) Background.

TABLE V: Effect of Patch Size on Per-Class Precision and Recall
Patch Size | Strappy P/R | Ferny P/R | Rounded P/R | Background P/R | Acc.
VGG-16 520x578 | | | | |
VGG-16 260x289 | 0.9501/0.9195 | 0.9269/0.9234 | 0.9475/0.9193 | |
ResNet-50 260x289 | 0.9547/0.9358 | 0.9383/0.9570 | 0.9593/0.9278 | 0.9712/ |

TABLE VI: Effect of Data Augmentation on 5-Fold Cross Validation Per-Class Precision and Recall
Augmentation Type | Strappy P/R | Ferny P/R | Rounded P/R | Background P/R | Acc.
No Data Aug. | 0.993/ | | | |
Horizontal Flipping Only | | | | |

TABLE VII: Effect of Data Augmentation on Test Dataset Per-Class Precision and Recall
Augmentation Type | Strappy P/R | Ferny P/R | Rounded P/R | Background P/R | Acc.
No Data Aug. | 0.82/ | | | |
Geometric & Colour Aug. | | | | |

TABLE VIII: 4-Class Classifier Per-Class Precision and Recall with Water Patches Removed from Test Dataset
 | Strappy P/R | Ferny P/R | Rounded P/R | Substrate P/R | Accuracy
Validation Results | 0.990/1.000 | 0.982/0.985 | 0.990/0.958 | 0.991/1.000 | 0.988
Test Results | 0.96/0.96 | 0.97/0.97 | 0.88/0.88 | 0.97/0.98 | 0.96

TABLE IX: 5-Class Classifier Per-Class Precision and Recall
 | Strappy P/R | Ferny P/R | Rounded P/R | Substrate P/R | Water P/R | Accuracy
Validation Results | 0.993/0.987 | 0.990/0.985 | 1.000/0.985 | 0.981/0.981 | 1.000/1.000 | 0.989
Test Results | 0.892/0.952 | 0.938/0.960 | 0.898/0.858 | 0.94/0.982 | 0.982/0.258 | 0.924

VI. DISCUSSION
The method presented in this work relies on the collection of images that contain a single class. Labelling is then only required at the image level, and labels can be propagated to sub-image patches with high confidence. Most of the labelling work is thus shifted to collection time, where the photographer aims to take as many single-class images as possible. Patterns of species dominance within seagrass beds make them an ideal application of the technique, but it could also be applied to other ecological applications such as detecting and classifying corals in underwater images, or agricultural applications such as detecting crop diseases or weeds [36].

Using this approach we were able to discriminate seagrasses from substrate and classify them into three morphotype super-classes. When visualised on whole images the patches were generally classified correctly (Figure 8), even for patches differing in class from their image label. This suggests that assigning image-level labels to patches when creating the training set is somewhat robust to errant patches. However, we did find clear benefits to including an extra class to represent the water column or far-off background.

Further work could include extension of the method to distinguish specific species of seagrass in images. A logical next step would be to separate thin strap-like seagrasses from thick strap-like seagrasses, for example Cymodocea serrulata and Zostera muelleri, before progressing onto more visually similar species within the Rounded super-class. The approach would additionally benefit from extensive field trials in a wider range of conditions: depth, turbidity, time of day and geographical location. Seagrass appearance also changes with the seasons, meaning that the model could be improved by training on data collected throughout periods of growth and senescence and testing of the model across years. We will also investigate the use of domain-randomized synthetic datasets to bridge the domain and species gap [37], [38].

VII. CONCLUSION
This work has implemented a deep learning method for automated detection and classification of seagrass species. The approach focused on classification of seagrass into three major morphological super-classes plus background, and presented a novel method of training a multi-species seagrass classifier using a dataset of single-species images. The four-class approach achieved an overall accuracy of 88.2%, which was then improved to 92.4% by dividing the Background class into Substrate and Water Column sub-classes. The approach could be extended by separating the morphological super-classes into individual seagrass species, and would benefit from increased training data and robust testing in the field. Furthermore, the approach presented could be used as a basis for estimations of species-specific percentage cover and density. To foster further research, we release the dataset used in this paper and the code developed for analysis.

ACKNOWLEDGMENT
This work was done in collaboration between CSIRO Data61, CSIRO Oceans and Atmosphere, Babel-sbf and QUT, and was funded by CSIRO's Active Integrated Matter and Machine Learning and Artificial Intelligence (MLAI) Future Science Platform. S.R., R.M. and F.M. acknowledge continued support from the Queensland University of Technology (QUT) through the Centre for Robotics.
REFERENCES
[1] P. S. Lavery, M.-Á. Mateo, O. Serrano, and M. Rozaimi, "Variability in the carbon storage of seagrass habitats and its implications for global estimates of blue carbon ecosystem service," PloS One, vol. 8, no. 9, p. e73748, 2013.
[2] R. K. Unsworth, L. J. McKenzie, L. M. Nordlund, and L. C. Cullen-Unsworth, "A changing climate for seagrass conservation?" Current Biology, vol. 28, no. 21, pp. R1229–R1232, 2018.
[3] R. K. Unsworth, L. J. McKenzie, C. J. Collier, L. C. Cullen-Unsworth, C. M. Duarte, J. S. Eklöf, J. C. Jarvis, B. L. Jones, and L. M. Nordlund, "Global challenges for seagrass conservation," Ambio, vol. 48, no. 8, pp. 801–815, 2019.
[4] M. Vanderklift, D. Bearham, M. Haywood, H. Lozano-Montes, R. McCallum, J. McLaughlin, and P. Lavery, "Natural dynamics: understanding natural dynamics of seagrasses in north-western Australia," WAMSI Dredging Science Node Report, Perth, 2016.
[5] L. McKenzie, L. M. Nordlund, B. L. Jones, L. C. Cullen-Unsworth, C. M. Roelfsema, and R. Unsworth, "The global distribution of seagrass meadows," Environmental Research Letters, 2020.
[6] L. J. McKenzie, M. A. Finkbeiner, and H. Kirkman, "Methods for mapping seagrass distribution," Global Seagrass Research Methods, pp. 101–121, 2001.
[7] F. S. Rende, A. D. Irving, A. Lagudi, F. Bruno, S. Scalise, P. Cappa, M. Montefalcone, T. Bacci, M. Penna, B. Trabucco et al., "Pilot application of 3D underwater imaging techniques for mapping Posidonia oceanica (L.) Delile meadows," International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. 5, p. W5, 2015.
[8] C. W. Finkl and C. Makowski, Seafloor Mapping Along Continental Shelves: Research and Techniques for Visualizing Benthic Environments. Springer, 2016, vol. 13.
[9] J. Monk, N. Barrett, T. Bridge, A. Carroll, A. Friedman, A. Jordan, G. Kendrick, V. Lucieer et al., "Marine sampling field manual for AUVs (autonomous underwater vehicles)," 2018.
[10] O. Beijbom, P. J. Edmunds, D. I. Kline, B. G. Mitchell, and D. Kriegman, "Automated annotation of coral reef survey images," IEEE, 2012, pp. 1170–1177.
[11] C. M. Roelfsema, E. Kovacs, and S. R. Phinn, "Benthic and substrate cover data derived from a time series of photo-transect surveys for the Eastern Banks, Moreton Bay, Australia, 2004–2014," 2015.
[12] S. J. Campbell and L. J. McKenzie, "Flood related loss and recovery of intertidal seagrass meadows in southern Queensland, Australia," Estuarine, Coastal and Shelf Science, vol. 60, no. 3, pp. 477–490, 2004.
[13] S. Phinn, C. Roelfsema, A. Dekker, V. Brando, and J. Anstee, "Mapping seagrass species, cover and biomass in shallow waters: An assessment of satellite multi-spectral and airborne hyper-spectral imaging systems in Moreton Bay (Australia)," Remote Sensing of Environment, vol. 112, no. 8, pp. 3413–3425, 2008.
[14] F. Bonin-Font, A. Burguera, and J.-L. Lisani, "Visual discrimination and large area mapping of Posidonia oceanica using a lightweight AUV," IEEE Access, vol. 5, pp. 24479–24494, 2017.
[15] A. Burguera, F. Bonin-Font, J. L. Lisani, A. B. Petro, and G. Oliver, "Towards automatic visual sea grass detection in underwater areas of ecological interest," IEEE, 2016, pp. 1–4.
[16] M. Martin-Abadal, E. Guerrero-Font, F. Bonin-Font, and Y. Gonzalez-Cid, "Deep semantic segmentation in an AUV for online Posidonia oceanica meadows identification," IEEE Access, vol. 6, pp. 60956–60967, 2018.
[17] C. Roelfsema, M. Lyons, M. Dunbabin, E. M. Kovacs, and S. Phinn, "Integrating field survey data with satellite image data to improve shallow water seagrass maps: the role of AUV and snorkeller surveys?" Remote Sensing Letters, vol. 6, no. 2, pp. 135–144, 2015.
[18] M. Massot-Campos, G. Oliver-Codina, L. Ruano-Amengual, and M. Miró-Juliá, "Texture analysis of seabed images: Quantifying the presence of Posidonia oceanica at Palma Bay," IEEE, 2013, pp. 1–6.
[19] F. Sultana, A. Sufian, and P. Dutta, "Advancements in image classification using convolutional neural network," IEEE, 2018, pp. 122–129.
[20] M. Moniruzzaman, S. Islam, P. Lavery, M. Bennamoun, and C. P. Lam, "Imaging and classification techniques for seagrass mapping and monitoring: A comprehensive survey," arXiv preprint arXiv:1902.11114, 2019.
[21] M. Moniruzzaman, S. M. S. Islam, P. Lavery, and M. Bennamoun, "Faster R-CNN based deep learning for seagrass detection from underwater digital images," IEEE, 2019, pp. 1–7.
[22] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," in Advances in Neural Information Processing Systems, 2015, pp. 91–99.
[23] G. Reus, T. Möller, J. Jäger, S. T. Schultz, C. Kruschel, J. Hasenauer, V. Wolff, and K. Fricke-Neuderth, "Looking for seagrass: Deep learning for visual coverage estimation," IEEE, 2018, pp. 1–6.
[24] F. Weidmann, J. Jäger, G. Reus, S. T. Schultz, C. Kruschel, V. Wolff, and K. Fricke-Neuderth, "A closer look at seagrass meadows: Semantic segmentation for visual coverage estimation," in OCEANS 2019 - Marseille. IEEE, 2019, pp. 1–6.
[25] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431–3440.
[26] K. E. Kohler and S. M. Gill, "Coral Point Count with Excel extensions (CPCe): A Visual Basic program for the determination of coral and substrate coverage using random point count methodology," Computers & Geosciences, vol. 32, no. 9, pp. 1259–1269, 2006.
[27] I.M.O.S. (IMOS), "IMOS - AUV SIRIUS, campaign: Great Barrier Reef, February 2011," 2011. [Online]. Available: https://catalogue-imos.aodn.org.au/geonetwork/srv/eng/catalog.search
[28] et al., "Australian sea-floor survey data, with images and expert annotations," Scientific Data, vol. 2, no. 1, pp. 1–13, 2015.
[29] L. J. McKenzie, "Seagrass meadows of Hervey Bay and the Great Sandy Strait, Queensland, derived from field surveys conducted 6–14 December, 1998," Centre for Tropical Water and Aquatic Ecosystem Research, James Cook University, Townsville, 2017. [Online]. Available: https://doi.org/10.1594/PANGAEA.876714
[30] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[31] A. F. Agarap, "Deep learning using rectified linear units (ReLU)," arXiv preprint arXiv:1803.08375, 2018.
[32] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: a simple way to prevent neural networks from overfitting," The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.
[33] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
[34] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[35] L. van der Maaten and G. Hinton, "Visualizing data using t-SNE," Journal of Machine Learning Research, vol. 9, pp. 2579–2605, 2008.
[36] P. Moghadam, D. Ward, E. Goan, S. Jayawardena, P. Sikka, and E. Hernandez, "Plant disease detection using hyperspectral imaging," pp. 1–8.
[37] D. Ward, P. Moghadam, and N. Hudson, "Deep leaf segmentation using synthetic data," in British Machine Vision Conference (BMVC) Workshop on Computer Vision Problems in Plant Phenotyping (CVPPP 2018), 2018, p. 26.
[38] D. Ward and P. Moghadam, "Scalable learning for bridging the species gap in image-based plant phenotyping," Computer Vision and Image Understanding, p. 103009, 2020.