Deep Neural Networks for Active Wave Breaking Classification
Caio Eadi Stringari, Pedro Veras Guimarães, Jean-François Filipot, Fabien Leckler, Rui Duarte
France Energies Marines, Plouzané, 29280, France
PPGOceano, Federal University of Santa Catarina, Florianópolis, 88040-900, Brazil
*[email protected]
+these authors contributed equally to this work

Abstract
Wave breaking is an important process for energy dissipation in the open ocean and coastal seas. It drives beach morphodynamics, controls air-sea interactions, determines when ship and offshore structure operations can occur safely, and influences the retrieval of ocean properties from satellites. Still, wave breaking lacks a proper physical understanding, mainly due to scarce observational field data. Consequently, new methods and data are required to improve our current understanding of this process. In this paper we present a novel machine learning method to detect active wave breaking, that is, waves that are actively generating visible bubble entrainment in video imagery data. The present method is based on classical machine learning and deep learning techniques and is made freely available to the community alongside this publication. The results indicate that our best performing model had a balanced classification accuracy score of ≈90% when classifying active wave breaking in the test dataset. An example of a direct application of the method includes a statistical description of geometrical and kinematic properties of breaking waves. We expect that the present method and the associated dataset will be crucial for future research related to wave breaking in several areas of research, which include but are not limited to: improving operational forecast models, developing risk assessment and coastal management tools, and refining the retrieval of remotely sensed ocean properties.
Introduction
Wave breaking is one of the most challenging water wave phenomena to investigate. Despite nearly two centuries of research, the current practical description of the process is still mostly empirical. Precise wave breaking modelling requires explicit knowledge of the wave phase speed and the fluid velocity distribution on the wave crest, which can only be obtained by numerically solving the Navier-Stokes equations in a framework that is currently too computationally expensive for practical applications. Consequently, current large-scale state-of-the-art wave models rely on statistical approaches (mostly the spectral approach) to represent the waves. This type of model parameterizes wave breaking as a function of known parameters such as the wind speed, the local wave height to water depth ratio, or semi-empirical breaking wave height probability distributions [2, 9]. Due to a lack of observed data, the constants involved in these models have been derived from limited datasets that may not adequately represent the natural environment. For example, a recent study has shown that popular surf zone parametric wave breaking models incorrectly represented the fraction of broken waves in their formulations with errors >50%, despite these models being able to adequately represent surf zone energy dissipation, possibly due to parameter tuning. This paper aims to start addressing the data unavailability issue by providing a reliable and reproducible method to detect and track waves that are actively breaking in video imagery data.

Wave breaking directly affects several environmental phenomena. For example, wave breaking is considered the main driver for air-sea exchanges, being often described as a function of Phillips' Λ(c)dc parameter. This parameter is defined as the average total length per unit surface area of breaking fronts that have velocities in the range c to c + dc, and its moments are assumed to correspond to different phenomena such as whitecap coverage (second moment), rate of air entrainment (third moment), and momentum flux (fourth moment). As previously identified, different interpretations of how data are processed to obtain Λ(c)dc resulted in differences of 175% in the first moment and 300% in the fifth moment of Λ(c) [13, 14]. This issue does not seem to limit the applicability of Phillips' framework. A recent study, for example, used a wave breaking parameterization fully based on Λ(c)dc to derive whitecap coverage off the coast of California. To the authors' knowledge, however, no study has assessed the impact that errors in wave breaking detection have on measured Λ(c) distributions and how such errors would impact the conclusions that have been drawn from these models (for example, the assumption that Λ(c) is a simple function of wind speed). More importantly, in the case in which new observations diverge from Phillips' original theory (as we will later show in this paper), new and improved wave breaking models will certainly follow.

Further, breaking waves have a direct impact on the remote sensing of the ocean surface from satellites, aircraft, or platform-mounted instruments. Wave breaking has been observed to significantly modulate measurements of radar backscatter, with large increases in backscatter being directly correlated with breaking waves. Given that part of the data that will be used here (see Methods for details) coincides with the location of previous wave backscatter studies, data derived from our method could lead to explicit formulations correlating wave breaking and radar backscatter. Such formulations could, in the future, lead to more reliable satellite-derived scatterometer data, which currently ignore or rely on empirical wave breaking models. The foam generated by breaking waves at the sea surface also influences the measurement of microwave brightness temperature signatures for moderate to high wind speeds [19, 20]. To date, however, no study has systematically quantified such temperature signatures for large field datasets and most of our knowledge is empirical. Such limitation could be addressed by extending wave breaking detection and tracking methods to infrared wavelengths, which are directly correlated to water temperature. Advancements in this regard could lead to improvements in global climate models, given that the inclusion of wave-generated heat transfer has been shown to reduce cold biases in such models for non-breaking waves. Breaking waves are expected to transfer more heat than non-breaking waves and, when considered, could improve climate models even further.

With the recent explosion in the usage of machine learning, particularly deep neural networks, it was only a matter of time before these techniques were adapted to wave research. Recent studies have shown that deep neural networks can accurately be used to classify different types of surf zone breakers, obtain wave heights from video data, track waves in the surf zone and the shoreline position and, when applied to pressure transducer data, distinguish between broken and unbroken waves. In this paper, we describe a robust and extensible method to identify and track breaking waves in video imagery data using a combination of classic machine learning algorithms and deep learning. More importantly, we make the present dataset and code base fully available for future researchers, who can either use it directly, re-train the models adding more training data, or expand the present method to other cases (for example, object segmentation).
The development and standardization of a general framework to detect wave breaking in video imagery data should help to provide the wave breaking statistics database that is currently needed to assess, or further develop, our current understanding of several processes related to breaking waves. Ultimately, the efforts initiated here aim to lead to improvements in several areas of research such as wave modelling and forecasting, wave energy tracking and harvesting, ocean-atmosphere interaction, remote sensing of the ocean, and safety at sea and coasts.

Method

Model Definition
In this study, we have developed a novel method to detect active wave breaking in video imagery data. Similarly to previous studies, we exploit the fact that breaking wave crests generate characteristic white foam patches from bubble entrainment [30, 31]. Differently from the vast majority of previous methods, here we make a clear distinction between active wave breaking (that is, visible foam being actively generated by wave breaking) and passive foam and all other non-wave-breaking instances. To the authors' knowledge, only the methods of Mironov & Dulov and Kleiss & Melville have previously attempted to distinguish between active wave breaking and passive foam. Both their methods start similarly to ours (see next paragraph) by identifying bright pixels in a given series of images via thresholding. Next, interconnected pixels are connected in space and time using a nearest neighbor search (Mironov & Dulov) or by extracting connected contours (Kleiss & Melville). In both previous methods, the separation between active and passive wave breaking is then done by empirically defining foam expansion direction and velocity limits. The present method aims to remove empirical steps in active wave breaking detection by using data-driven methods and deep neural networks instead of experiment-specific parameters.

Our method starts by naïvely identifying wave breaking candidates in a given input image. This was done by identifying bright (that is, white) pixels in the image using the local thresholding algorithm available from the OpenCV library with its default parameters. Interconnected regions of bright pixels in the image were then clustered using DBSCAN and a minimum enclosing ellipse was fitted to each identified cluster. The DBSCAN step requires the user to define a minimum number of pixels to be clustered (default value of 10 for all data used here) and a maximum distance allowed between pixels (default value of 5 for all data used here). This step resulted in large quantities of bright pixel patches being detected.
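The naïve candidate-detection step can be sketched as follows. This is a simplified illustration rather than the paper's exact implementation: OpenCV's local (adaptive) thresholding is replaced by a plain global threshold, and the minimum enclosing ellipse is approximated from each cluster's second moments; the DBSCAN defaults (min_samples=10, eps=5) follow the text.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def find_breaking_candidates(img, thresh=200, eps=5, min_samples=10):
    """Naively identify clusters of bright pixels as wave breaking candidates."""
    # keep only bright (white-ish) pixels; stand-in for OpenCV's local thresholding
    ij = np.column_stack(np.nonzero(img > thresh))  # (row, col) coordinates
    if len(ij) == 0:
        return []
    # cluster interconnected bright pixels in space
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(ij)
    clusters = []
    for k in set(labels) - {-1}:  # -1 marks DBSCAN noise
        pts = ij[labels == k]
        # approximate the enclosing ellipse from centroid + covariance eigenvectors
        centroid = pts.mean(axis=0)
        evals, _ = np.linalg.eigh(np.cov(pts.T.astype(float)))
        clusters.append({"centroid": centroid, "axes": 2.0 * np.sqrt(evals)})
    return clusters

# synthetic frame with two bright patches on a dark background
img = np.zeros((100, 100))
img[10:20, 10:25] = 255
img[60:75, 50:60] = 255
cands = find_breaking_candidates(img)
print(len(cands))  # two candidate clusters
```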
At this stage, it was not possible to determine whether a given identified cluster of pixels represented active wave breaking, passive foam, or any other instance of bright pixels (for example, birds, boats, coastal structures, sun glints or white clouds). Generically, this step can be thought of as a feature extraction step and can be replaced by any other equivalent approach (for example, image feature extractors). To avoid exponential memory consumption growth generated by DBSCAN, the input image may be subdivided into blocks of regular size. This step was easily parallelized, with performance increasing linearly with the number of computing threads.

The second step of the method consisted of training a deep convolutional neural network to distinguish between active wave breaking and passive foam (and all other non-desired occurrences). This step can be understood as a binary classification step. Figure 1 shows a schematic representation of the present method. From the original images, randomly selected subsets of size 256x256 pixels centered on the fitted ellipses were extracted and manually labelled as either active wave breaking (label 1, or positive) or otherwise (label 0, or negative). The present training dataset was generated using raw video imagery data from Guimarães et al. and consisted of 19000 training and 1300 testing images (see Table 1). The training dataset is further split into 80% training and 20% validation data. The validation dataset is reserved for future hyper-parameter fine-tuning. Note that the present dataset has a class imbalance of ≈90% towards the negative label, that is, for each active wave breaking sample (label 1, or positive) there are nine instances that were not active wave breaking (label 0, or negative). Data augmentation (rotation, vertical and horizontal flips, and zoom) was employed during training to increase the variety of samples of the positive label.

Five state-of-the-art neural network architectures, or backbones (VGG16, ResNet50V2, InceptionResNetV2, MobileNetV2 and EfficientNetB5), were implemented and can be chosen by end users (see our Github code repository https://github.com/caiostringari/deepwaves for usage guidance). All of these backbones make use of convolutional layers to gradually extract information from the input images. The main differences between them are how many layers they have, the size of the convolution windows, how data are normalized in each layer, and how each layer connects to other (or to previous) layers in the stack. For example, VGG16 is 16 layers deep, uses 3x3 convolution windows and has no normalization. InceptionResNetV2 is 164 layers deep, uses a mix of 5x5, 3x3 and 1x1 convolution windows and uses batch normalization to help avoid overfitting. Residual Networks (namely, ResNet50V2) not only connect adjacent layers but also take into account errors (residuals) from previous layers. In general, more modern architectures (namely, EfficientNet) are wider (mainly by having parallel convolution windows) as well as much deeper than older architectures (namely, VGG16). The final top layers of the network were fixed for all backbones and consisted of flattening the last convolutional layer, two fully-connected layers with 50% dropout, and a final classification layer with sigmoid activation. The optimization step (training) was done using the Adam implementation of the stochastic gradient descent method and binary cross-entropy was used as the loss function.
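The augmentation step can be sketched with plain numpy; this is an illustrative sketch of the flip and rotation transforms only (zoom is omitted for brevity), and the function name and signature are hypothetical, not the repository's API.

```python
import numpy as np

def augment(img, rng):
    """Randomly flip and rotate a training sample to increase positive-label variety."""
    if rng.random() < 0.5:
        img = np.fliplr(img)  # horizontal flip
    if rng.random() < 0.5:
        img = np.flipud(img)  # vertical flip
    # rotate by 0, 90, 180 or 270 degrees in the image plane
    return np.rot90(img, k=int(rng.integers(4)))

rng = np.random.default_rng(0)
sample = np.zeros((256, 256, 3))  # one 256x256 RGB training sample
augmented = augment(sample, rng)
print(augmented.shape)  # square inputs keep their shape
```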
Note that this step must be computed using a graphics processing unit (GPU) in order to achieve feasible computation times. The models took from three (VGG16) to twelve (EfficientNetB5) hours to train using a NVIDIA GTX 1080 GPU with a batch size of sixty-four images. After the neural networks were sufficiently trained, the best performing network (VGG16, see below) was used to classify all the naïvely identified wave breaking candidates and only the events classified as active wave breaking were kept for further analyses. Although VGG16 was chosen for presenting the results, the performance of the other architectures is nearly identical to VGG16 in real-world applications. Finally, note that user-tunable parameters for the neural networks (for example, learning rate and neuron activation thresholds) were kept unchanged from their default values in the TensorFlow library. This implies that more aggressive parameter optimization could improve the results presented here even further. For the sake of brevity, we refer the reader to the project's code repository for guidance on how to select hyper-parameters.

Table 1. Data characterization summary table. f is the sampling frequency in Hertz, D is the duration of the experiment in minutes, Hs is the significant wave height computed from the wave spectrum, Tp1 and Tp2 are, respectively, the peak wave periods of the first and second spectral partitions computed following Hanson & Jensen, Dp is the peak wave direction, and U is the wind speed measured or converted (denoted by the *) to a height of 10 m above the sea surface using Large & Pond's formula. Tr. S. and Ts. S. are the sample sizes of the train and test datasets, respectively.

Location     | Date and Time    | f [Hz] | D [min] | Hs [m] | Tp1 [s] | Tp2 [s] | U [m/s] | Dp  | Tr. S. | Ts. S.
Black Sea    | 2011/10/01 14:18 | 12     | 07      | 0.30   | 6.20    | 3.10    | 10.7*   | WSW | 1000   | 100
Black Sea    | 2011/10/04 09:38 | 12     | 20      | 0.36   | 6.10    | 2.63    | 10.1*   | WSW | 1000   | 100
Black Sea    | 2011/10/04 11:07 | 12     | 30      | 0.45   | 6.10    | 3.16    | 12.2*   | WSW | 1000   | 100
Black Sea    | 2011/10/04 13:30 | 12     | 30      | 0.55   | 6.60    | 3.71    | 12.9*   | WSW | 1000   | 100
Black Sea    | 2013/09/22 13:00 | 10     | 15      | 0.66   | 4.30    | 30      | 8.7*    | E   | 1000   | 100
Black Sea    | 2013/09/25 12:15 | 12     | 15      | 0.41   | 4.10    | 1.20    | 6.1*    | N   | 1000   | 100
Black Sea    | 2013/09/30 10:20 | 12     | 15      | 0.65   | 5.70    | 1.80    | 15.2*   | N   | 1000   | 100
Adriatic Sea | 2014/03/27 09:10 | 12     | 60      | 1.36   | 5.02    | -       | 9.9     | ENE | 2000   | 100
Adriatic Sea | 2015/03/05 10:35 | 12     | 60      | 1.33   | 6.10    | 5.02    | 9.0     | ENE | 2000   | 100
Yellow Sea   | 2017/05/13 05:00 | 10     | 05      | 1.93   | 5.20    | 4.02    | 13.4    | NW  | 2000   | 100
La Jument    | 2017/12/15 14:20 | 10     | 30      | 5.88   | 10.00   | 6.80    | 13.3    | W   | 2000   | 100
La Jument    | 2018/01/03 09:39 | 10     | 30      | 10.03  | 12.80   | 9.30    | 17.9    | W   | 2000   | 100
La Jument    | 2018/01/04 11:43 | 10     | 30      | 7.52   | 11.10   | -       | 14.7    | W   | 2000   | 100
Total:       |                  |        |         |        |         |         |         |     | 19000  | 1300
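For reference, the binary cross-entropy loss minimized during training has the form L = −[y log(p) + (1 − y) log(1 − p)], averaged over the batch. A minimal numpy sketch of this quantity (the paper uses TensorFlow's built-in implementation):

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    # clip to avoid log(0) for saturated predictions
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0 - eps)
    return float(np.mean(-(y_true * np.log(y_pred)
                           + (1.0 - y_true) * np.log(1.0 - y_pred))))

# confident correct predictions give a small loss; uninformative ones give log(2)
print(binary_crossentropy([1, 0], [0.99, 0.01]))  # ~0.01
print(binary_crossentropy([1, 0], [0.5, 0.5]))    # ~0.693
```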
The last step of the method consisted of grouping the active wave breaking events in space and time (note that at this stage bright pixels are only grouped in space). Time-discrete wave breaking events had their bounding ellipse region filled in pixel space (i, j) and were stacked in time (t), resulting in a point-cloud-like three-dimensional (i, j, t) structure. The DBSCAN algorithm was then used to cluster these data and obtain uniquely labelled clusters. The two parameters for the DBSCAN algorithm, that is, the minimum number of samples in each cluster (n_min) and the minimum distance allowed between two samples (eps), were set to be equal to the minimum number of points inside a fitted ellipse among all ellipses (n_min) and to the sampling frequency in Hertz (eps). These values for n_min and eps were constant among the analyzed datasets.

Figure 1. Schematic representation of a deep convolutional neural network. The input layer consists of an active wave breaking candidate and has shape 256x256x3 (image height, image width and number of RGB channels). In the case of a grey-scale image, the single grey channel was triplicated. The red dashed rectangle in the analyzed image shows the region of interest which, in this particular case, was equal to the stereo video reconstruction area. The intermediary convolutional layers (or backbones) are interchangeable, with different options available (see text for details). The function of the convolutional layers is to extract features from the input image by using convolutions (3x3 in this example) and max pooling (that is, selecting the brightest pixel in a given window). The last convolutional layer is flattened (that is, turned into a one-dimensional vector) and is connected to two fully-connected (that is, multi-layer perceptron-like) layers and one final classification layer. A 50% dropout (that is, random selection of neurons in a layer) is applied after each fully connected layer. The final classification layer has one unit and uses sigmoid activation with a threshold of 0.5.
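The space-time grouping step can be sketched as follows. The point clouds below are made-up stand-ins for two filled-ellipse breaking events recorded at 10 Hz (so eps = 10, following the rule above), and the n_min value is illustrative rather than derived from real ellipses.

```python
import numpy as np
from sklearn.cluster import DBSCAN

fs = 10  # sampling frequency in Hz -> DBSCAN eps, as described in the text
rng = np.random.default_rng(1)

def fake_event(i0, j0, t0, n=300):
    """Hypothetical (i, j, t) point cloud of one filled-ellipse breaking event."""
    return np.column_stack([rng.integers(i0, i0 + 20, n),
                            rng.integers(j0, j0 + 30, n),
                            rng.integers(t0, t0 + 15, n)])

# stack two events that are well separated in space and time
cloud = np.vstack([fake_event(100, 100, 0), fake_event(400, 250, 60)])
labels = DBSCAN(eps=fs, min_samples=50).fit_predict(cloud)
n_events = len(set(labels) - {-1})  # -1 marks DBSCAN noise
print(n_events)  # each event receives a unique cluster label
```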
Note that this final step can be replaced by any other density-based clustering algorithm or other more sophisticated algorithms such as SORT (which is also available in the project's code repository but was not used here).

Note that up to the clustering step, all calculations were done in the pixel domain and "time" was obtained from sequential frame numbers. To convert between pixel and metric coordinates, the grids constructed using the stereo-video dataset available from Guimarães et al. were used in the present study. If stereo video data are not available, such conversions could be done by knowing the camera position in real-world (that is, metric) coordinates, which is usually obtained using surveyed ground control points [53, 54]. Conversion from sequential frame numbers to time (in seconds) can be done by knowing the sample rate (in frames per second, that is, Hertz) of the data. The total amount of computational time required to process 20 minutes of raw video recorded at 10 Hz with an image size of 5 megapixels is approximately two hours on a modern six-core computer (assuming that a pre-trained neural network is available). Much shorter processing times are achievable for smaller image sizes and higher numbers of computation threads.
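Both conversions can be sketched in a few lines. The homography H below is a made-up example for a planar scene; in practice the mapping would come from the stereo grids or be estimated from surveyed ground control points.

```python
import numpy as np

def frames_to_seconds(frames, fs):
    """Convert sequential frame numbers to seconds given the sampling rate fs (Hz)."""
    return np.asarray(frames, dtype=float) / fs

def pixel_to_metric(ij, H):
    """Project pixel coordinates to metric coordinates with a 3x3 plane homography H."""
    pts = np.column_stack([ij, np.ones(len(ij))])  # homogeneous coordinates
    xyw = pts @ H.T
    return xyw[:, :2] / xyw[:, 2:]  # de-homogenize

# hypothetical homography: 10 pixels per metre, no rotation or perspective
H = np.diag([0.1, 0.1, 1.0])
t = frames_to_seconds([0, 10, 25], fs=10)       # 0.0, 1.0 and 2.5 seconds
xy = pixel_to_metric(np.array([[100, 50]]), H)  # (10 m, 5 m)
print(t, xy)
```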
Evaluation Metrics
Due to the imbalanced characteristic of the dataset, the classification accuracy score in isolation is not an appropriate metric to evaluate the present test dataset. For instance, a classifier that guesses that all labels are negative (that is, 0) would automatically obtain a high score (≈90%, given the class imbalance). In this context, True Positives (TP) and True Negatives (TN) are samples that were correctly classified. False Positives (FP), or Type I errors, are false samples that were incorrectly classified as true, and False Negatives (FN), or Type II errors, are true samples that were classified as false.

• Accuracy is the percentage of examples correctly classified considering both classes, that is, Accuracy = (TP + TN) / (T + N), where T and N are the total numbers of positive and negative samples, respectively.
• Precision is the percentage of predicted positives that were correctly classified, that is, Precision = TP / (TP + FP).
• Recall is the percentage of actual positives that were correctly classified, that is, Recall = TP / (TP + FN).
• Area under the curve (AUC) is the area defined by plotting the FP rate against the TP rate (also referred to as the receiver operating characteristic curve). This metric indicates the probability of a classifier ranking a random positive sample higher than a random negative sample.

All metrics described above were monitored during the training process and were individually assessed to rank model performance. The training curves shown in Figure 2-a were used to assess when overfitting, that is, decreases in the loss function value for the training dataset that were not reflected in the validation dataset, started to occur. Training epochs after overfitting started were discarded. Here we favor the AUC curves as shown in Figure 2-b to indicate better performing models because AUC is a more robust metric than the classification score and at the same time presents a smooth evolution with training epochs. Finally, a confusion matrix (or table of confusion) as shown in Figure 2-c was plotted for each model to assess Type I and Type II errors.
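The scalar metrics above follow directly from the confusion-matrix counts. As a sanity check, the sketch below reproduces the VGG16 test-dataset row of Table 2 from its TP/FP/TN/FN counts:

```python
def accuracy(tp, fp, tn, fn):
    """(TP + TN) / (T + N): fraction of all samples correctly classified."""
    return (tp + tn) / (tp + fp + tn + fn)

def precision(tp, fp):
    """TP / (TP + FP): fraction of predicted positives that are correct."""
    return tp / (tp + fp)

def recall(tp, fn):
    """TP / (TP + FN): fraction of actual positives that are recovered."""
    return tp / (tp + fn)

# VGG16 confusion-matrix counts on the test dataset (Table 2)
tp, fp, tn, fn = 106, 80, 945, 69
print(round(accuracy(tp, fp, tn, fn), 3))  # 0.876
print(round(precision(tp, fp), 3))         # 0.57
print(round(recall(tp, fn), 3))            # 0.606
```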
Results
Classifier Performance
From the analysis of all training curves, confusion matrices, and from Table 2, the best performing backbone architecture during training was ResNet50V2 by a considerable margin (AUC = 0.989). These results, however, did not translate to the validation dataset (AUC = 0.873). Considering only the validation data, VGG16 was the best performing backbone, with AUC = 0.946. Considering only the test dataset, the best performing model was also VGG16 (AUC = 0.855). Overall, VGG16 was selected as the best performing model and the results presented in the next sections will use this backbone. Other evaluation metrics such as the accuracy score, precision, and recall closely followed the evolution of AUC with training epochs (compare Figures 2-a and b, for example). In general, as the loss value decreased, the number of false positives decreased, which made the precision and recall increase. This behavior was consistent for all models. Given that different architectures may perform better for other datasets and further optimization could change the model ranking presented here, all pre-trained models and their training metrics evolution are made available in the project's code repository. Finally, it is worth mentioning that larger models (for example, VGG19, ResNet152 and EfficientNetB7) could achieve better results, but this was not attempted here due to hardware limitations (that is, these models required more memory than what was available on the NVIDIA GTX 1080 GPU used in this study).
Figure 2. Examples of training curves and confusion matrix for the best overall performing model (VGG16). a) Loss function value for training and validation data. b) AUC value for training and validation data. In a) and b) the hatched area indicates the epochs after which the model started to overfit, the thick colored lines show smoothed loss or AUC values (averaged at every 10 epochs), and the transparent lines show raw loss or AUC values. c) Confusion matrix calculated using test data only.
Test Case with Real-world Data
Figure 3 shows the results of the application of the best performing model architecture (VGG16) on La Jument (2018/01/03 09:39) and Black Sea (2011/10/04 09:38) data. Visual inspection of these results confirmed the ability of the neural network to correctly classify the naïvely identified wave breaking candidates and only keep instances that represented active wave breaking. Moreover, the same neural network was able to correctly classify active wave breaking events for the rogue waves seen at La Jument and for the small wind-generated breakers seen in the Black Sea data. This result highlights the ability of the neural network to generalize well on the dataset, which is a difficult result to achieve. From the analysis of the training curves, the averaged classification error (accounting for the imbalance in the data) should be of the order of ≈10%.

Table 2. Evaluation metrics for all tested backbone architectures. Refer to the main manuscript text for the definition of evaluation metrics. The best performing models are shown in boldface. Results are sorted by AUC.

Train
Model             | Accuracy | TP   | FP   | TN    | FN   | Precision | Recall | AUC
ResNet50V2        | 0.970    | 1414 | 198  | 13978 | 280  | 0.877     | 0.835  | 0.989
VGG16             | 0.930    | 855  | 273  | 13911 | 831  | 0.758     | 0.507  | 0.943
InceptionResNetV2 | 0.927    | 886  | 359  | 13823 | 802  | 0.712     | 0.525  | 0.932
EfficientNetB5    | 0.772    | 1403 | 3346 | 10920 | 297  | 0.295     | 0.825  | 0.874
MobileNet         | 0.904    | 436  | 268  | 13916 | 1250 | 0.619     | 0.259  | 0.848

Validation
Model             | Accuracy | TP   | FP   | TN    | FN   | Precision | Recall | AUC
VGG16             | 0.932    | 221  | 65   | 3478  | 204  | 0.773     | 0.520  | 0.946
InceptionResNetV2 | 0.921    | 190  | 81   | 3466  | 231  | 0.701     | 0.451  | 0.930
EfficientNetB5    | 0.809    | 353  | 687  | 2856  | 72   | 0.339     | 0.831  | 0.897
MobileNet         | 0.908    | 123  | 64   | 3479  | 302  | 0.658     | 0.289  | 0.878
ResNet50V2        | 0.919    | 197  | 97   | 3450  | 224  | 0.670     | 0.468  | 0.873

Test
Model             | Accuracy | TP   | FP   | TN    | FN   | Precision | Recall | AUC
VGG16             | 0.876    | 106  | 80   | 945   | 69   | 0.570     | 0.606  | 0.855
ResNet50V2        | 0.881    | 95   | 63   | 962   | 80   | 0.601     | 0.543  | 0.843
InceptionResNetV2 | 0.882    | 91   | 57   | 968   | 84   | 0.615     | 0.520  | 0.839
EfficientNetB5    | 0.873    | 88   | 65   | 960   | 87   | 0.575     | 0.503  | 0.827
MobileNet         | 0.875    | 30   | 5    | 1020  | 145  | 0.857     | 0.171  | 0.768

Comparison with Mironov & Dulov (2008)
To highlight the improvements obtained by the present method, this section presents a comparison between Mironov & Dulov's (2008) automatic active wave breaking detection method and the present method. To perform this task, data from the 2013 Black Sea experiments in Table 1 that had previously been classified and investigated in detail by Guimarães were used. All the active breaking events (that is, considering all 900 s of data) detected using Mironov & Dulov's (2008) method were manually classified as true or false and compared to the labels predicted by our method. To the authors' knowledge, these data are the only currently available data that have been classified by both methods as well as manually. Figure 4 shows the result of the comparison between models. On average, our method has relatively ≈50% less error than Mironov & Dulov's (2008) method, with an averaged absolute reduction in error of ≈15% when considering the validation and test datasets. Note that all tested neural network architectures performed very similarly, with only a slight advantage for InceptionResNetV2. It is also worth mentioning that Mironov & Dulov's (2008) method was designed and optimized to work specifically with data from the Black Sea and it is not guaranteed that it will generalize to other datasets. Our method, on the contrary, has been shown to generalize well for very distinct datasets (see Figure 3, for example) and with further optimization could achieve even better performance.

Figure 3. Example of the application of the method. a) Results of the naïve wave breaking detection (thresholding + DBSCAN) for La Jument data (03/01/2018 09:39). Note the great amount of passive foam being detected as active wave breaking. b) Results of active wave breaking detection using VGG16 as backbone for La Jument data (03/01/2018 09:39). Note the significant reduction in the amount of passive foam being detected. In both plots, number of clusters refers to the number of clusters identified by DBSCAN. The red dashed rectangle indicates the region of interest which, in these examples, was the same as the stereo-video reconstruction area. c) Results of the naïve wave breaking detection (thresholding + DBSCAN) for Black Sea data (04/10/2011 09:38). d) Results of active wave breaking detection using VGG16 as backbone for Black Sea data (04/10/2011 09:38). Animations of these results are available at https://github.com/caiostringari/deepwaves. In this particular example, the image was subdivided into blocks of 256x256 pixels for processing. Note that identical results were seen using architectures other than VGG16 to classify these data.
Figure 4. Comparison between averaged classification errors for active wave breaking detection for Mironov & Dulov's (2008) method and all the neural network architectures implemented here. The error bars represent one standard deviation from the mean. The data used for this comparison are from the 2013 Black Sea experiments described in Table 1.

Wave Breaking Statistics

In this section we briefly present examples of wave breaking statistics that can be directly derived from the classified wave breaking data. For brevity, the analysis is limited to data from the Black Sea because our classifier performed best at this location (classification errors < 10% considering both the 2011 and 2013 experiments). Five quantities will be analysed: the wave breaking duration (T_br), the wave breaking area (A_br), the major (a) and minor (b) axes of the fitted ellipses (representative of the wave breaking lengths at their maximum during the active part of the wave breaking), and Phillips' distribution Λ(c)dc. These quantities were obtained directly from the space-time clustered wave breaking events, with the exception of the cumulative wave breaking area (A_br), which was calculated from the projections of the pixels clustered in the first step of the method to metric coordinates. The results of these analyses are shown in Figure 5.

The wave breaking duration (T_br, normalized by the peak wave period T_p; Figure 5-a) roughly followed a shifted Gamma probability density function (PDF) and had a mean value of 0.12 and most frequent value (mode) of 0.13. This result shows that the active part of the wave breaking process happens very quickly. The wave breaking area (A_br, Figure 5-b) closely followed a Pareto distribution, which indicates that large wave breaking events are relatively rare in the data.
The ratio between the major and minor axes of the fitted ellipses (a/b, Figure 5-c) followed a Beta PDF and had a mean of 2.5 and mode of 1.9, which indicates that the ellipses' major axis is approximately double the size of the minor axis. Assuming a negligible wave front angle, the wave breaking area scaling relation from Duncan (A_br/b, Figure 5-d) also followed a Beta PDF and had a mean of 0.1 and mode of 0.08, which is consistent with the previously reported value (≈0.1). Figure 5-f shows the Λ(c)dc distributions obtained using our method and considering that the ellipse major axis is representative of the wave breaking length. The observed distributions greatly deviated from the theoretical c⁻⁶ relation. Note that all PDF fits shown here were statistically significant at the 95% confidence level using the two-tailed Kolmogorov–Smirnov test. See Table 3 for the description of the parameters of the PDFs presented in this section and the Discussion section for the contextualization of these results and the possible implications that they have for future research.

Figure 5. Examples of statistical properties of breaking waves that can be directly obtained from the proposed method. a) Probability distribution of the wave breaking duration (T_br) normalized by the peak wave period (T_p). The blue line shows the Gamma fit to the data, the purple dashed line shows the mean value (0.13) and the orange dashed line shows the mode value (0.12). b) Probability distribution of the wave breaking area (A_br) normalized by the wavelength (g T_p²/(2π)). The blue line shows the Pareto fit to the data. c) Probability distribution of the ratio between major and minor axes (a/b). The blue line shows the Beta fit to the data, the purple dashed line shows the mean value (2.4) and the orange dashed line shows the mode value (1.8). d) Probability distribution of the wave breaking area scaling parameter (A_br/b). The blue line shows the Beta fit to the data, the purple dashed line shows the mean value (0.1), the orange dashed line shows the mode value (0.08), and the red line shows Duncan's constant value (0.11). e) Evolution of the wave breaking area over time. The black markers show the observed values, the error bars show values binned at 0.15 s intervals, the blue line shows the quadratic regression to the data, the blue swath shows the 95% confidence interval for the regression, and r is the correlation coefficient. f) Phillips' Λ(c)dc distributions assuming that the wave breaking speed is the same as the phase speed of the carrier wave (that is, c = c_br), grouped by wind speed. The black dashed line shows the theoretical c⁻⁶ decay, the colored markers show the average crest length binned at 0.1 m/s wave speed intervals, and the colored lines show a running average with a window size of 10 bins.

Table 3. Fitted parameters for the distributions seen in Figure 5. All fits were done using Scipy and, consequently, the values reported here follow Scipy's conventions.
Panel        Distribution   Parameter 1   Parameter 2   Location   Scale
Figure 5-a   Gamma          3.001         -              0.053     0.0260
Figure 5-b   Pareto         4.339         -             -0.007     0.007
Figure 5-c   Beta           6.117         73.948         0.207     30.440
Figure 5-d   Beta           6.389         40.470        -0.760     24.438
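For readers reproducing fits in the style of Table 3, the fitting and significance-testing workflow can be sketched with Scipy. The sample below is synthetic (randomly generated with hypothetical parameters), not the Black Sea data:

```python
# Sketch: fitting a Gamma PDF with Scipy and checking it with a two-tailed
# Kolmogorov-Smirnov test. Parameters follow Scipy's conventions:
# shape parameter(s) first, then "loc" (shift) and "scale".
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Synthetic stand-in for a normalized breaking-duration sample (NOT real data):
# Gamma with shape a=3, shifted by loc=0.05 and scaled by 0.026.
sample = stats.gamma.rvs(a=3.0, loc=0.05, scale=0.026, size=2000, random_state=rng)

# Maximum-likelihood fit; returns (shape, loc, scale).
a_hat, loc_hat, scale_hat = stats.gamma.fit(sample)

# Two-tailed KS test of the sample against the fitted distribution.
ks = stats.kstest(sample, "gamma", args=(a_hat, loc_hat, scale_hat))
print(f"shape={a_hat:.2f}, loc={loc_hat:.3f}, scale={scale_hat:.4f}, p={ks.pvalue:.2f}")
```

A KS p-value above 0.05 means the fitted distribution cannot be rejected at the 95% confidence level, which is the criterion quoted for the fits in Figure 5.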
Discussion
We have presented a new method to detect active wave breaking in video imagery data that is robust and easily transferable to future studies. Overall, VGG16 was the best performing architecture, which is a surprising result given that VGG is a considerably older architecture that has been superseded by more recent architectures such as ResNets and EfficientNets. Also surprisingly, EfficientNet was one of the worst performers on the test dataset, despite the state-of-the-art results that this architecture has achieved in recent years. One explanation could be that, given that VGG16 is the model with the highest number of parameters (despite being the shallowest network), it adapted better to the characteristics of the present dataset. Another explanation could be that EfficientNet is currently overfit on the ImageNet dataset that was used to train this network, hence its high reported classification scores.

Another aspect to take into consideration is the speed at which the models can predict on new data (inference). While VGG16 was the best performing model, its size slows down inference time. In this regard, MobileNetV2 offers real-time inference speed at 10 Hz for small image sizes (for example, 512 × 512 pixels). One drawback, however, is that the footprint of small wave breaking events becomes too small in the deeper layers of the network, which makes it impossible for the network to learn these events (see below for further discussion). A solution for this issue could be to increase the input image size, but this was not attempted here due to hardware constraints (that is, memory limitations, as discussed above).

Neural networks have historically been seen as black boxes in which only the final classification outputs are relevant. There has been, however, increasing interest in understanding how neural networks learn. One technique that is of particular interest for the present paper is Gradient-weighted Class Activation Mapping (Grad-CAM).
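The core of the Grad-CAM computation is small enough to sketch in plain NumPy: each feature map of a convolutional layer is weighted by the spatial average of the class-score gradient with respect to it, and the weighted sum is passed through a ReLU. The arrays below are random stand-ins with a hypothetical layer shape, not values from the networks trained here; in practice they would be extracted with a framework such as TensorFlow via automatic differentiation.

```python
# Minimal NumPy sketch of the Grad-CAM computation (Selvaraju et al.).
# "acts" and "grads" are random stand-ins for a convolutional layer's
# activations and the class-score gradients with respect to them.
import numpy as np

rng = np.random.default_rng(0)
acts = rng.random((7, 7, 64))            # H x W x K feature maps (hypothetical)
grads = rng.standard_normal((7, 7, 64))  # dScore/dActs, same shape

# 1) Global-average-pool the gradients: one importance weight per feature map.
weights = grads.mean(axis=(0, 1))        # shape (64,)

# 2) Weighted sum over maps, then ReLU to keep only positive class evidence.
cam = np.maximum((acts * weights).sum(axis=-1), 0.0)  # shape (7, 7)

# 3) Normalize to [0, 1] so the map can be overlaid on the input image.
cam /= cam.max() if cam.max() > 0 else 1.0
print(cam.shape)  # (7, 7)
```

The resulting low-resolution map is upsampled to the input image size before being displayed as the heat maps shown in Figure 6.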
Briefly, this technique shows which regions of a convolutional layer are most important for the results obtained by the classifier. Figure 6 shows the results of Grad-CAM applied to examples from all unique locations in Table 1, considering VGG16's last convolutional layer. When considering only actively breaking waves (Figure 6-a to d), it is evident that VGG16 closely mimicked how a human would classify these data; that is, it directly searched for the regions of the image that corresponded to active wave breaking. In the case of passive foam (Figure 6-e to h), VGG16 seemed to use larger portions of the image but at the same time focused on the flocculent foam seen in the images, as a human classifier would do. In general, these results show that our model truly learned how to classify the images and is not merely guessing the labels.

The promising results presented here indicate that the current method should be extended to an object detection framework. Such a framework would eliminate the need for the image thresholding and DBSCAN steps. This implementation could be done by using a strongly-supervised architecture such as U-Net, or by using a weakly-supervised method derived from Grad-CAM, for example. As a final recommendation regarding the machine learning aspect of this paper, we strongly encourage future researchers to add samples to the training dataset that match their specific research needs instead of blindly applying the provided pre-trained models.

Figure 6. Results of Grad-CAM for all unique experiment locations described in Table 1, applied to actively breaking waves (top row) and to passive foam (bottom row). a) Actively breaking wave example recorded at the Adriatic Sea (2015/03/05 10:35). b) Actively breaking wave example recorded at the Black Sea (2011/10/04 11:07). c) Actively breaking wave example recorded at La Jument (2018/01/03 09:39). d) Actively breaking wave example recorded at the Yellow Sea (2017/05/13 05:00). e) Passive foam example recorded at the Adriatic Sea (2015/03/05 10:35). f) Passive foam example recorded at the Black Sea (2011/10/04 11:07). g) Passive foam example recorded at La Jument (2018/01/03 09:39). h) Passive foam example recorded at the Yellow Sea (2017/05/13 05:00). In all panels, the color scale indicates the weights of the class activation map, with brighter colors showing the regions of the image which the neural network used to classify each particular sample.

We have also presented examples of wave breaking statistics that can be obtained using the proposed method. In general, the patterns observed here agreed with previously reported scaling factors that support the idea of wave breaking self-similarity. For example, Figure 5-d directly showed that the scaling parameter A_br/b approaches the constant 0.1 value from Duncan's laboratory experiments. Another variable that showed signs of self-similar behavior was the wave breaking area (A_br), which was very well described by a Pareto distribution and presented a steady quadratic increase with wave breaking duration (T_br). Extensive research
[11, 15] has been grounded on the assumption that wave breaking is self-similar, but inconsistencies with this approach have been reported before. Contrary to other studies [11, 31], however, the Λ(c) distributions obtained here did not closely match the theoretical c⁻⁶ decay for a sea state in equilibrium. As reported before, these differences may be due to the fact that here we only considered actively breaking waves in our analysis, whereas other studies seem to track the speeds of both actively breaking waves and passive foam [31, 32]. Another possibility is that other phenomena not accounted for by Phillips' theory (for example, wave modulation) play an important role in wave breaking. A future publication that investigates the mechanisms related to the wave breaking kinematics using the method and data obtained here will follow soon.

Conclusion
We described a novel method to detect and classify actively breaking waves in video imagery data. Our method achieved promising results when assessed using several different metrics. Further, by analyzing the deeper layers of our neural network, we showed that the model mimicked how a human classifier would perform a similar classification task. As an application of the method, we presented wave breaking statistics and breaking wave crest length distributions. Our method can thus be useful for the investigation of several ocean-atmosphere interaction processes and should, in the future, lead to advancements in operational wave forecast models, gas-exchange models, and general safety at sea. Finally, we strongly recommend that future research focus on standardized methods to study wave breaking so that a consistent dataset can be generated and used freely and unambiguously by the community.
References

1. Battjes, J. A. & Janssen, J. Energy loss and set-up due to breaking of random waves. Coast. Eng., 569–587 (1978).
2. Thornton, E. B. & Guza, R. T. Transformation of Wave Height Distribution. J. Geophys. Res., 5925–5938 (1983).
3. Banner, M. L., Babanin, A. V. & Young, I. R. Breaking Probability for Dominant Waves on the Sea Surface. J. Phys. Oceanogr., 3145–3160, DOI: 10.1175/1520-0485(2000)030<3145:BPFDWO>2.0.CO;2 (2000).
4. Banner, M. L., Gemmrich, J. R. & Farmer, D. M. Multiscale measurements of ocean wave breaking probability. J. Phys. Oceanogr., 3364–3375, DOI: 10.1175/1520-0485(2002)032<3364:MMOOWB>2.0.CO;2 (2002).
5. Cavaleri, B. Y. L. Wave Modeling: Where to Go in the Future. Bull. Am. Meteorol. Soc.
6. The WAVEWATCH Development Group (WW3DG). User manual and system documentation of WAVEWATCH III version 6.07. Tech. Rep., NOAA/NWS/NCEP/MMAB, College Park, MD, USA (2019).
7. Booij, N., Ris, R. C. & Holthuijsen, L. H. A third-generation wave model for coastal regions 1. Model description and validation. J. Geophys. Res., 7649–7666 (1999).
8. The WAMDI Group. The WAM Model - A Third Generation Ocean Wave Prediction Model. J. Phys. Oceanogr., 1775–1810 (1988).
9. Filipot, J. F., Ardhuin, F. & Babanin, A. V. A unified deep-to-shallow water wave-breaking probability parameterization. J. Geophys. Res. Ocean., 1–15, DOI: 10.1029/2009JC005448 (2010).
10. Stringari, C. E. & Power, H. E. The Fraction of Broken Waves in Natural Surf Zones. J. Geophys. Res. Ocean., 1–27, DOI: 10.1029/2019JC015213 (2019). arXiv:1904.06821v1.
11. Melville, W. K. & Matusov, P. Distribution of breaking waves at the ocean surface. Nature, 58–63, DOI: 10.1038/417058a (2002).
12. Phillips, O. M. Spectral and statistical properties of the equilibrium range in wind-generated gravity waves. J. Fluid Mech., 505–531, DOI: 10.1017/S0022112085002221 (1985).
13. Banner, M. L., Zappa, C. J. & Gemmrich, J. R. A note on the Phillips spectral framework for ocean whitecaps. J. Phys. Oceanogr., 1727–1734, DOI: 10.1175/JPO-D-13-0126.1 (2014).
14. Gemmrich, J. R., Banner, M. L. & Garrett, C. Spectrally resolved energy dissipation rate and momentum flux of breaking waves. J. Phys. Oceanogr., 1296–1312, DOI: 10.1175/2007JPO3762.1 (2008).
15. Romero, L. Distribution of Surface Wave Breaking Fronts. Geophys. Res. Lett., 10463–10474, DOI: 10.1029/2019GL083408 (2019).
16. Yurovsky, Y. Y., Kudryavtsev, V. N., Chapron, B. & Grodsky, S. A. Modulation of Ka-Band Doppler Radar Signals Backscattered from the Sea Surface. IEEE Transactions on Geosci. Remote Sens., 2931–2948, DOI: 10.1109/TGRS.2017.2787459 (2018).
17. Huang, L., Liu, B., Li, X., Zhang, Z. & Yu, W. Technical evaluation of Sentinel-1 IW mode cross-pol radar backscattering from the ocean surface in moderate wind condition. Remote Sens., 1–21, DOI: 10.3390/rs9080854 (2017).
18. Hwang, P. A., Zhang, B., Toporkov, J. V. & Perrie, W. Comparison of composite Bragg theory and quad-polarization radar backscatter from RADARSAT-2: With applications to wave breaking and high wind retrieval. J. Geophys. Res. Ocean., 1–12, DOI: 10.1029/2009JC005995 (2010).
19. Monahan, E. C. Oceanic whitecaps: Sea surface features detectable via satellite that are indicators of the magnitude of the air-sea gas transfer coefficient. J. Earth Syst. Sci., DOI: 10.1007/BF02701977 (2002).
20. Reul, N. & Chapron, B. A model of sea-foam thickness distribution for passive microwave remote sensing applications. J. Geophys. Res. C: Ocean., 19-1, DOI: 10.1029/2003jc001887 (2003).
21. Carini, R. J., Chickadel, C. C., Jessup, A. T. & Thompson, J. Estimating wave energy dissipation in the surf zone using thermal infrared imagery. J. Geophys. Res. Ocean., 3937–3957, DOI: 10.1002/2014JC010561 (2015).
22. Wang, S. et al. Improving the Upper-Ocean Temperature in an Ocean Climate Model (FESOM 1.4): Shortwave Penetration Versus Mixing Induced by Nonbreaking Surface Waves. J. Adv. Model. Earth Syst., 545–557, DOI: 10.1029/2018MS001494 (2019).
23. Komori, S. et al. Laboratory measurements of heat transfer and drag coefficients at extremely high wind speeds. J. Phys. Oceanogr., 959–974, DOI: 10.1175/JPO-D-17-0243.1 (2018).
24. Buscombe, D. & Carini, R. J. A data-driven approach to classifying wave breaking in infrared imagery. Remote Sens., 1–10, DOI: 10.3390/RS11070859 (2019).
25. Buscombe, D., Carini, R. J., Harrison, S. R., Chickadel, C. C. & Warrick, J. A. Optical wave gauging using deep neural networks. Coast. Eng., 103593, DOI: 10.1016/j.coastaleng.2019.103593 (2020).
26. Kim, J., Kim, J., Kim, T., Huh, D. & Caires, S. Wave-tracking in the surf zone using coastal video imagery with deep neural networks. Atmosphere, 1–13, DOI: 10.3390/atmos11030304 (2020).
27. Bieman, J. P. D., Ridder, M. P. D. & Gent, M. R. A. V. Deep learning video analysis as measurement technique in physical models. Coast. Eng., 103689, DOI: 10.1016/j.coastaleng.2020.103689 (2020).
28. Zheng, C. W., Chen, Y. G., Zhan, C. & Wang, Q. Source Tracing of the Swell Energy: A Case Study of the Pacific Ocean. IEEE Access, 139264–139275, DOI: 10.1109/ACCESS.2019.2943903 (2019).
29. Filipot, J.-F. et al. La Jument lighthouse: a real-scale laboratory for the study of giant waves and their loading on marine structures. Philos. Transactions Royal Soc. A: Math. Phys. Eng. Sci., 20190008, DOI: 10.1098/rsta.2019.0008 (2019).
30. Mironov, A. S. & Dulov, V. A. Detection of wave breaking using sea surface video records. Meas. Sci. Technol., DOI: 10.1088/0957-0233/19/1/015405 (2008).
31. Sutherland, P. & Melville, W. K. Field measurements and scaling of ocean surface wave-breaking statistics. Geophys. Res. Lett., 3074–3079, DOI: 10.1002/grl.50584 (2013).
32. Kleiss, J. M. & Melville, W. K. Observations of wave breaking kinematics in fetch-limited seas. J. Phys. Oceanogr., 2575–2604, DOI: 10.1175/2010JPO4383.1 (2010).
33. Bradski, G. The OpenCV Library. Dr. Dobb's J. Softw. Tools (2000).
34. Ester, M., Kriegel, H.-P., Sander, J. & Xu, X. Density-Based Clustering Methods. Compr. Chemom., 635–654, DOI: 10.1016/B978-044452701-1.00067-3 (1996).
35. Moshtagh, N. Minimum Volume Enclosing Ellipsoids. Convex Optim., 112–118 (2005).
36. Guimaraes, P. V. et al. A Data Set of Sea Surface Stereo Images to Resolve Space-Time Wave Fields. Sci. Data, 1–12, DOI: 10.12770/af599f42-2770-4d6d-8209-13f40e2c292f (2020).
37. Hastie, T., Tibshirani, R. & Friedman, J. The Elements of Statistical Learning (Springer International Publishing, 2009), second edn.
38. Shorten, C. & Khoshgoftaar, T. M. A survey on Image Data Augmentation for Deep Learning. J. Big Data, DOI: 10.1186/s40537-019-0197-0 (2019).
39. Simonyan, K. & Zisserman, A. Very deep convolutional networks for large-scale image recognition.
40. He, K., Zhang, X., Ren, S. & Sun, J. Identity mappings in deep residual networks. Lect. Notes Comput. Sci., 630–645, DOI: 10.1007/978-3-319-46493-0_38 (2016). arXiv:1603.05027.
41. Szegedy, C., Ioffe, S., Vanhoucke, V. & Alemi, A. A. Inception-v4, Inception-ResNet and the impact of residual connections on learning.
42. Howard, A. G. et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications (2017). arXiv:1704.04861.
43. Tan, M. & Le, Q. V. EfficientNet: Rethinking model scaling for convolutional neural networks. 10691–10700 (2019). arXiv:1905.11946.
44. Ioffe, S. & Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. 448–456 (2015). arXiv:1502.03167.
45. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. J. Mach. Learn. Res., 1929–1958 (2014).
46. Kingma, D. P. & Ba, J. Adam: A Method for Stochastic Optimization. 1–15 (San Diego, California, 2014). arXiv:1412.6980.
47. Ruder, S. An overview of gradient descent optimization algorithms (2016). arXiv:1609.04747.
48. Abadi, M. et al. TensorFlow: Large-scale machine learning on heterogeneous systems (2015). Software available from tensorflow.org.
49. Nagi, J. et al. Max-Pooling Convolutional Neural Networks for Vision-based Hand Gesture Recognition. 342–347 (2011).
50. Hanson, J. L. & Jensen, R. Wave system diagnostics for numerical wave models (2004).
51. Large, W. G. & Pond, S. Open Ocean Momentum Flux Measurements in Moderate to Strong Winds. J. Phys. Oceanogr., DOI: 10.1175/1520-0485(1981)0112.0.CO;2 (1981).
52. Bewley, A., Ge, Z., Ott, L., Ramos, F. & Upcroft, B. Simple online and realtime tracking. Proc. Int. Conf. on Image Process. (ICIP), 3464–3468, DOI: 10.1109/ICIP.2016.7533003 (2016). arXiv:1602.00763.
53. Holman, R. A. & Stanley, J. The history and technical capabilities of Argus. Coast. Eng., 477–491, DOI: 10.1016/j.coastaleng.2007.01.003 (2007).
54. Schwendeman, M., Thomson, J. & Gemmrich, J. R. Wave breaking dissipation in a Young Wind Sea. J. Phys. Oceanogr., 104–127, DOI: 10.1175/JPO-D-12-0237.1 (2014).
55. Guimaraes, P. V. Sea surface and energy dissipation. Ph.D. thesis, Université de Bretagne Loire (2018).
56. Duncan, J. H. An Experimental Investigation of Breaking Waves Produced by a Towed Hydrofoil. Proc. Royal Soc. A: Math. Phys. Eng. Sci., 331–348, DOI: 10.1098/rspa.1981.0127 (1981).
57. Virtanen, P. et al. SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nat. Methods, DOI: 10.1038/s41592-019-0686-2 (2020).
58. Russakovsky, O. et al. ImageNet Large Scale Visual Recognition Challenge. Int. J. Comput. Vis., 211–252, DOI: 10.1007/s11263-015-0816-y (2015). arXiv:1409.0575.
59. Selvaraju, R. R. et al. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. Int. J. Comput. Vis., 336–359, DOI: 10.1007/s11263-019-01228-7 (2020). arXiv:1610.02391.
60. Ronneberger, O., Fischer, P. & Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention (MICCAI), vol. 9351 of LNCS, 234–241 (Springer, 2015). arXiv:1505.04597.
61. Donelan, M. A., Haus, B. K., Plant, W. J. & Troianowski, O. Modulation of short wind waves by long waves. J. Geophys. Res. Ocean., 1–12, DOI: 10.1029/2009JC005794 (2010).

Acknowledgements
This work benefited from France Energies Marines and State financing managed by the National Research Agency under the Investments for the Future program, bearing the reference numbers ANR-10-IEED-0006-14 and ANR-10-IEED-0006-26, for the projects DiME and CARAVELE.
Author contributions statement
C.E.S. conceived and implemented the method and drafted the manuscript; P.V.G., F.L., J.F.F. and R.D. conducted the field experiments; C.E.S. and P.V.G. analyzed the data; J.F.F. provided funding. All authors reviewed the manuscript.
Additional information
Data, pre-trained networks, and auxiliary programs necessary to reproduce the results in this paper are available at: https://github.com/caiostringari/deepwaves .

Competing Interests Statement