Automated Rip Current Detection with Region based Convolutional Neural Networks
Akila de Silva (a), Issei Mori (a), Gregory Dusek (b), James Davis (a) and Alex Pang (a)

(a) University of California, Santa Cruz, CA, United States
(b) NOAA National Ocean Service, Silver Spring, MD, United States
ARTICLE INFO

Keywords: Rip Current Detection, Machine Learning, Image Processing, Faster RCNN, Temporal Smoothing

ABSTRACT
This paper presents a machine learning approach for the automatic identification of rip currents with breaking waves. Rip currents are dangerous, fast-moving currents of water that result in many deaths by sweeping people out to sea. Most people do not know how to recognize rip currents in order to avoid them. Furthermore, efforts to forecast rip currents are hindered by a lack of observations with which to train and validate hazard models. The presence of web cams and smart phones has made video and still imagery of the coast ubiquitous, providing a potential source of rip current observations. These same devices could aid public awareness of the presence of rip currents. What is lacking is a method to detect the presence or absence of rip currents from coastal imagery. This paper provides expert-labeled training and test data sets for rip currents. We use Faster-RCNN and a custom temporal aggregation stage to make detections from still images or videos with higher measured accuracy than both humans and other methods of rip current detection previously reported in the literature.
1. Introduction
Rip currents are the most significant safety risk to swimmers along the coastlines of oceans, seas, and large lakes [8, 7, 13]. The majority of beach goers do not know how to identify rip currents, and there is no robust, reliable, location-independent method to identify them. Globally there are thousands of drownings each year due to rip currents [23, 42]. A 20 year study by the US Lifesaving Association reports that 81.9% of the 37,000 beach rescues each year are due to rip currents [7]. There has been no decline in the number of associated drowning fatalities, despite warning signs and educational material.

Rip currents are a well-studied ocean phenomenon [3, 29, 31]. They are defined as strong and narrow channels of fast-moving water that flow towards the sea from beaches. When waves break, they form a "setup", or an increase in mean water level. This setup can vary along a shoreline depending on the amount or height of breaking waves. Rip currents form as water tends to flow alongshore from regions of high setup (larger waves) to regions of lower setup (smaller waves), where currents converge to form a seaward flowing rip. The speed of seaward rips can be quite strong, reaching 2 m/s, faster than an Olympic swimmer. Multiple factors determine the location and strength of rips, such as bathymetry, wave height and direction, tide, and beach shape. Rip currents may be either transient or persistent in space and time. Rips that are frequently found at the same location are usually indicative of a fairly stable bathymetric feature such as a sand bar or reef, or a hard structure such as a rocky outcrop, jetty, or pier. These bathymetric features result in variations in wave breaking and setup, leading to channelized rip current flow. Transient or flash rips are independent of bathymetry; they may move up or down the beach, and may appear or disappear.
Lifeguards are often trained to identify rip currents. However, the majority of drownings occur on beaches without trained personnel [1, 4]. Posted signs can provide a warning, but there is evidence that most people do not find existing signs helpful in actually identifying rip currents [6].

Experts at the National Oceanic and Atmospheric Administration (NOAA) use images and video to gather statistics about rip currents [19]. These data are supporting the validation of a rip current forecast model to alert people to potential hazards [20]. The most commonly used method to visualize rip currents from video is time averaging, summarizing a video as a single image [28]. In [43] a boosted cascade of simple Haar-like features, a machine learning technique, was used to detect rip currents in time averaged images. However, these time averages can be misinterpreted when manually assessed. Furthermore, they are neither readily available nor interpretable by the average beachgoer, and the process of averaging removes available information.

In recent years the coastal engineering community has successfully used deep neural networks to solve many problems. Classification problems such as classifying wave breaking in infrared imagery [9], beach scene and other landscape classification [11], automated plankton image classification [41], and ocean front recognition [36] were formulated as deep learning problems using convolutional neural networks. Furthermore, regression problems such as optical wave gauging [10], tracking remotely sensed waves [52], and typhoon forecasting [33] were also solved using deep neural networks. In addition, generative adversarial networks, a type of deep neural network, were used to improve the quality of downscaling of ocean remote sensing data [18].

Object detection with deep neural networks is well studied in the computer vision community. However, most benchmarks and research focus on detecting physical objects with boundaries between what is and is not an object [16, 22, 37]. Rip currents are ephemeral "objects" which are not observable in every frame, and amorphous without clearly defined boundaries even when observable. It is not obvious whether existing methods are applicable. Figure 1 provides a set of examples, illustrating the difficulty of the problem.

Figure 1: A collection of beach scenes, some of which contain dangerous rip currents. Unfortunately these "objects" do not have clear shape, and most people find them hard to identify. This paper describes a computer vision system with detection accuracy higher than both existing published methods and human observers.

Our work is aimed at introducing this problem to the coastal engineering community, and showing that object detection methods are applicable. We gathered training data sets of rip currents and worked with experts at NOAA to ensure that test data were labeled correctly. We use Faster-RCNN [50] with a custom temporal aggregation stage that allowed us to achieve detection accuracy higher than both humans and other methods of rip current detection previously reported in the literature.

The contributions of this paper are:
• Evidence that the region based convolutional neural network (CNN) approach for object detection is applicable to amorphous and ephemeral objects such as rip currents
• Analysis showing rip current detection accuracy above existing published methods
• Data sets of rip current images and video for training and testing

The remainder of the paper is organized as follows. Related work is summarized in Section 2. We discuss how the data was collected in Section 3. Our method is described in Section 4. Results are discussed in Section 5. Limitations and discussion are in Section 6. Section 7 concludes the paper, and Appendix A provides the link to the supplementary materials.
2. Related Work
Rip currents are most commonly studied using in situ techniques or instrumentation. Fluorescein dye is commonly released into the ocean and the dispersion observed [5, 14, 15, 48]. Wave sensors, acoustic velocimeters, or current profilers can be deployed at specific locations [21, 32, 34, 40]. Floating drifters with embedded GPS units have also been used to measure currents [12, 13, 51]. These methods are costly, time consuming, require technical expertise, and are generally only applicable to highly localized instances in time and space. These limitations severely hinder the applicability of such approaches to both public warnings and model validation.

Time averaged images are a routine method for analyzing video in oceanic research, with 10 minutes being a common integration period [27, 28, 38, 47]. This method is popular because averages often make identification of rip channels easier for the human eye. While these images are usually intended for human interpretation, Maryan et al. apply machine learning to recognize rip channels in time averaged images [43]. Nelko also used time averaged images and noted that prediction schemes developed at one beach location may not be directly applicable to another without some modifications [44]. In contrast to these works, the analysis in this paper suggests that object detection on individual frames outperforms time averaged images.

Dense optical flow [2, 30] has been used to detect rip currents in video [46]. This method is attractive since optical flow fields can be directly compared against ground truth flow fields obtained from in situ measurements [17]. Unfortunately these methods are sensitive to camera perturbation, and have difficulty in areas lacking textural information. The results in this paper suggest that object detection on individual frames outperforms previous optical flow based methods.

Certain kinds of rip currents are characterized by visible sediment plumes. These can be segmented based on changes in coloration. For example, Liu et al. use thresholding in HSV color space to detect rip currents [40]. Unfortunately, not all rip currents contain sediment plumes, and thus this method is not applicable to our data sets.

Object detection in images is well studied in the computer vision literature [25, 39, 45, 49, 53]. These methods have been extended to detect objects in videos [26, 35, 54]. This work has not previously been applied to rip currents, both because it is not clear there is an "object" to detect, and because of a lack of publicly available data sets for training and testing. This paper contributes labeled data sets for rip current detection, and evidence that object detection outperforms existing published methods on this application.

Figure 2: Examples drawn from the 2440 images we collected and labeled to build a training data set (panels: "With Rips" and "Without Rips"). Ground truth bounding boxes are shown in red.
3. Data sets
Since rip currents are a new problem domain for computer vision, we did not find any existing public databases of rip current images. Therefore, we assembled a training data set of rip current images and non-rip current images from scratch. Our primary source for the database was Google Earth™, which allowed us to extract high-resolution aerial images of rip currents and non-rip currents. In total, the database contains 1740 images of rip currents and 700 images of similar beach scenes without rip currents. The images range in size from to 234 × 234. We annotated ground truth in the rip current images with axis-aligned bounding boxes. Some examples of the training data set are shown in Figure 2. Note that this data set contains unambiguous, easy examples. We used this static image data set for training the models described in Section 4.

We also collected a data set of video clips consisting of frames. There are a total of frames with and frames without rip currents. Image size varies from to . Ground truth annotations were verified by a co-author who is also a rip current expert at NOAA. Figure 1 contains both positive and negative example frames, as well as a few frames from the training set that might be mistaken as containing a rip current. Note that these sequences contain more difficult cases. The rip currents are not visible in every frame, but a positive bounding box is applied whenever viewing the entire video segment indicates that a rip current is present.

The frames of this video data set were used for testing. Note that the static images in the training set were taken from high elevation while the videos used to test the model were taken from a lower perspective. Even so, the trained model performed well on the test frames from the video collection.

In order to encourage progress in this domain, both data sets will be made available to the public. A link to the data set is included in the supplementary materials in Appendix A.
4. Method
Region-based convolutional neural networks have achieved great success in object detection problems. These object detection models usually consist of separate classification and localization networks with a shared feature extraction network. We use Faster RCNN, which is also composed of two components: the first is a deep convolutional neural network that proposes regions, and the second is the Fast RCNN detector [24].

Faster RCNN follows the traditional object detection pipeline. It first generates region proposals, then categorizes each proposal as either rip current or background. The classified bounding boxes are then further refined. Essentially, the model learns a mapping from the generated regions to the actual ground truth with a regression network, and uses this mapping "function" during testing to refine the generated regions. These refined bounding boxes can be anywhere in a frame as features are translation invariant [50]. If there is more than one bounding box detected in a frame, we only keep track of the largest one and ignore any additional boxes.

Figure 3: Amorphous phenomena without clear boundaries, like rip currents, result in wide variation in detected bounding boxes. All detected bounding boxes from the frames of one video are superimposed onto a single image. Bounding boxes may differ significantly even in consecutive frames.

We trained the Faster RCNN model with the static image data set. Before training, each image was augmented by rotating ◦ degrees clockwise and counterclockwise, producing a training data set three times the size of the original. All the training data was resized to 300 × 300 before training to save computation time.
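The keep-only-the-largest-detection rule described above is simple to state in code. Below is a minimal sketch; the detector output format (corner-coordinate boxes with confidence scores) and the score threshold are our assumptions, since the paper does not specify them.

```python
import numpy as np

def largest_detection(boxes, scores, score_thresh=0.5):
    """Keep only the largest detected bounding box in a frame.

    boxes: (K, 4) array of [x1, y1, x2, y2] candidate detections.
    scores: (K,) detector confidence values.
    score_thresh is an assumed value, not one reported in the paper.
    Returns a single box, or None if nothing passes the threshold.
    """
    boxes = np.asarray(boxes)[np.asarray(scores) >= score_thresh]
    if len(boxes) == 0:
        return None  # frame is treated as containing no rip current
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return boxes[int(np.argmax(areas))]
```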
Static object detection models only consider the information in the frame currently being processed. However, rip currents are natural ocean phenomena, with shape and texture that change depending on many external factors such as weather, wind speed, wave field characteristics, water flow speed, floating debris, and sediment. The exact boundaries of a rip current are not well defined. This is different from objects with well-defined edges such as pedestrians or vehicles. Applying detection algorithms to objects with amorphous boundaries such as rip currents produces bounding boxes with variable sizes and locations in adjacent video frames. In Figure 3 we illustrate this variability by drawing all correctly detected bounding boxes from one video sequence onto a single frame.

This variability affects overall accuracy, and would not instill confidence in the results if these bounding boxes were presented to a user as a video overlay. Thus we investigate temporal smoothing and aggregation to improve the results.

We find the overlapping regions of the detected bounding boxes by using an accumulation buffer with the same size as the input frame, initialized as a zero matrix. We consider a temporal window of N frames to build the accumulation buffer. In the first N − 1 frames, the accumulation buffer is incremented by 1 for each region within a detection bounding box. Starting with frame N, the area covered by a detection bounding box is incremented by 1 and capped at a maximum of N. Regions not covered by a detection bounding box are decremented by , but retain a minimum of 0 in the accumulation buffer. In effect, the accumulation buffer keeps track of the bounding boxes using a sliding window of N frames. The process of building the accumulation buffer is illustrated in Figure 4, where areas with higher values are displayed as darker regions. For purposes of identifying a single bounding box over the collection of bounding boxes across N frames, we consider only the parts of the accumulation buffer where the value is at least T, and draw the tightest possible axis-aligned bounding box around this region (see Figure 5). This is the aggregated detection. In our implementation we use N = 60 and T = 50.

Figure 4: Frame aggregation for a window of length N. The first row shows the input frame sequence, the second row shows the detections from Faster RCNN, and the third row shows the accumulation buffer.

Figure 5: A visualization of the resulting accumulation buffer values. Regions with a higher value are shown in more opaque red. The resulting bounding box around thresholded values is shown in solid dark red. The full video can be seen in the supplementary material in Appendix A.

Before frame aggregation, large variations in bounding box size occur in almost all consecutive frames (Figure 6, top). After frame aggregation, the average size change is much smaller, with most frames having zero change in size from the prior frame. The variation in position of the bounding boxes is similarly reduced by frame aggregation (Figure 6, bottom). This improved temporal coherence provides a smoother and more consistent portrayal of the rip current extent when shown as an overlay on the video.
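The accumulation buffer update can be sketched as follows. The per-frame decrement value is not legible in this copy of the text, so the sketch assumes a decrement of 1; the box format is also an assumption.

```python
import numpy as np

N, T = 60, 50   # window length and threshold used in the paper
DECREMENT = 1   # assumed step size; the exact value is elided above

def update_buffer(buf, box, frame_idx):
    """One per-frame update of the (H, W) accumulation buffer.

    box is a detection as (x1, y1, x2, y2), or None if the detector
    found nothing; frame_idx is the 1-based frame number.
    """
    inside = np.zeros(buf.shape, dtype=bool)
    if box is not None:
        x1, y1, x2, y2 = box
        inside[y1:y2, x1:x2] = True
    buf[inside] = np.minimum(buf[inside] + 1, N)  # grow, capped at N
    if frame_idx >= N:  # decay begins once the first window is full
        buf[~inside] = np.maximum(buf[~inside] - DECREMENT, 0)
    return buf

def aggregated_box(buf):
    """Tightest axis-aligned box around buffer cells with value >= T."""
    ys, xs = np.nonzero(buf >= T)
    if len(xs) == 0:
        return None  # no temporally stable rip current region yet
    return int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1
```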
Table 1: Accuracy for each video in the test set. Rows are the individual test videos (rip_*.mp4 and no_rip_*.mp4) followed by average accuracy; columns are Human, Philip [46], Maryan [43], Maryan [modified], F-RCNN [ours], and F-RCNN+FA [ours]. Maryan [43] is trained on their training data; Maryan [modified] is trained on our training data. Our method has higher overall accuracy than humans or any of the prior methods tested. Frame aggregation contributes to the improvement in accuracy.
Figure 6:
Plots showing the differences in area and center of bounding boxes in consecutive frames. Without frame aggregation (in red) the differences in bounding box sizes and positions are much higher than after frame aggregation (in blue).
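The quantities plotted in Figure 6 can be computed directly from the per-frame boxes; a sketch, assuming boxes are (x1, y1, x2, y2) tuples:

```python
import math

def _area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def _center(r):
    return ((r[0] + r[2]) / 2, (r[1] + r[3]) / 2)

def box_deltas(boxes):
    """Frame-to-frame change in box area and center position,
    the two quantities plotted in Figure 6 (box format assumed)."""
    d_area, d_center = [], []
    for a, b in zip(boxes, boxes[1:]):
        d_area.append(abs(_area(b) - _area(a)))
        (ax, ay), (bx, by) = _center(a), _center(b)
        d_center.append(math.hypot(bx - ax, by - ay))
    return d_area, d_center
```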
5. Results
We compared the accuracy of our method with human observers as well as two prior methods. We also compared our own method with and without temporal aggregation.
Comparison metric.
All methods were tested using the video data set. Frames were labeled as correctly classified if the detected bounding boxes have an Intersection over Union (IoU) score versus ground truth above 0.3. IoU is calculated as area_of_intersection / area_of_union of the ground truth and the detected bounding boxes. Accuracy for each video was computed as correct_labels / total_frames, and Table 1 provides the results for all methods.
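For concreteness, the IoU test described above can be sketched as follows (corner-coordinate box format assumed):

```python
def iou(a, b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

# A frame counts as correctly classified when iou(detection, truth) > 0.3.
```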
Humans.
The primary reason for an automated method of rip current detection is that most people are not good at identifying rip currents [6]. Human annotators were asked to draw bounding boxes around places they believe to have rip currents. We sampled approximately every tenth frame from our video test set and randomized presentation order across all positive and negative examples. Human annotators were not carefully trained; instead they were provided three positive and three negative examples, roughly the amount of information which might fit on a sign at the beach. Annotators were recruited using Mechanical Turk, with basic screening for reliable workers, and paid $0.10 per image.

Although human performance was relatively poor, with only 76% of frames labeled correctly, it was higher than we expected based on past studies [6]. We hypothesize that the clearly labeled sample images on a web page directly adjacent to the image they were asked to label were a positive contributing factor.

Figure 7: Rip current detections on some of the frames from the test data set. Bounding boxes show the correctly detected rip currents. Frames without bounding boxes do not contain any rip currents.
Time averaged images.
Maryan et al. [43] perform detection on time averaged images using a boosted cascade of simple Haar-like features [53]. We used Maryan's time averaged data for training since our training data set consists of only static frames. Testing was performed by first computing time averaged images of each video in our test data set. This method did not perform well. In order to determine whether the cause was the images available for training or the model itself, we repeated the experiment with new data. We replaced the relatively small number of low resolution time-averaged images from [43] with the static images from our training data set. Testing this time was against single frames in our test videos. This modification is called Maryan [modified] in the results table. When using our training data the model accuracy improved considerably, leading us to conclude that appropriate training data is critical to good results. Furthermore, it suggests further investigation into whether using crisp images, rather than time averaged images, to train a model might produce more accurate results.

In our test images/videos the beach is always located in the bottom half of the image. However, the training data used by [43] was cropped from images where the beach is located in the top half of the image. Therefore, we were concerned that the difference in orientation between the training data and the test data contributed to the low accuracy of the model. However, when we retrained the model with vertically flipped training data we did not see any significant difference in accuracy on our test images/videos. We hypothesize that this is because the cropped training images contain an insignificant number of beach pixels.
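A time averaged ("timex") image is simply the per-pixel mean over the frames of a clip. A minimal sketch, assuming OpenCV is used for video decoding:

```python
import cv2
import numpy as np

def time_averaged_image(video_path):
    """Per-pixel mean of all frames in a video, the standard
    time averaged product used in coastal imaging."""
    cap = cv2.VideoCapture(video_path)
    acc, count = None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frame = frame.astype(np.float64)
        acc = frame if acc is None else acc + frame
        count += 1
    cap.release()
    if count == 0:
        raise ValueError(f"no frames read from {video_path}")
    return (acc / count).astype(np.uint8)
```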
Optical flow.
Philip et al. [46] compute optical flow on video sequences and make the simplifying assumption that rip currents can be identified by regions with the second most predominant flow direction, after that of the primary incoming wave direction, and that they flow in a single seaward direction. This identifies regions of actual rip currents, but also picks up swash regions where water is washed up the beach and back out to sea with the passing of each wave. This method was introduced with the primary intention of providing visualizations to users, rather than automated detection. To allow comparison, we modified the method to return a bounding box around the largest detected region, ignoring smaller regions which are less likely to be correct. This method performed poorly on our test videos. We noticed that in videos without enough textural information on the rip current, the generated optical flow field was weak, leading to either missed detections or detections in other regions of the video with stronger texture.
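The second-most-predominant-direction heuristic can be approximated with dense optical flow and a direction histogram. The following sketch uses OpenCV's Farnebäck flow; the direction binning, magnitude cutoff, and largest-bin choice are our simplification of [46], not the authors' exact procedure.

```python
import cv2
import numpy as np

def second_dominant_direction_mask(prev_gray, next_gray, n_bins=16):
    """Mask of pixels whose flow direction falls in the second most
    common histogram bin, a rough proxy for seaward rip flow."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    ang = np.arctan2(flow[..., 1], flow[..., 0])      # direction, [-pi, pi]
    mag = np.linalg.norm(flow, axis=2)                # flow magnitude
    bins = ((ang + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    moving = mag > np.percentile(mag, 75)             # assumed cutoff for "moving" pixels
    counts = np.bincount(bins[moving], minlength=n_bins)
    second = np.argsort(counts)[-2]   # [-1] ~ incoming waves; [-2] ~ candidate rip flow
    return moving & (bins == second)
```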
Frame aggregation.
We implemented frame aggregation as a post process to Faster RCNN, initially to temporally stabilize detections, driven by a need for user-interpretable visualization of rip current location. In order to understand whether temporal smoothing also increased accuracy, we analyzed our implementation both with and without frame aggregation. We found that temporal aggregation leads to higher accuracy than using Faster RCNN alone. Example detection results are shown in Figure 7. Numerical comparisons of humans, prior methods, and our model are provided in Table 1. Faster RCNN with frame aggregation had the highest accuracy in nearly all cases, and the highest overall (last column of Table 1). For visual comparisons we have added all the results to the supplementary materials in Appendix A.
6. Discussion and Future Work
As with all machine learning models, our implementation can fail when used with images that do not resemble the training data set. Our data sets include primarily rip currents characterized by a gap in breaking waves, the most common visual indicator of bathymetrically controlled rip currents. Thus we would expect to miss rip currents with other visual indicators, like sediment plumes. We also expect failures when presented with new imagery, and occasionally for no apparent reason at all, as seen in Figure 8.

We found it difficult to compare our method with prior work and verify that our model performs well in all conditions previously researched, due to a lack of public data sets on which to verify our results. In order to ensure that future work has a baseline from which to compare, our data sets with thousands of labeled frames will be made public. Nevertheless, our data sets are still limited. The accuracy numbers presented in this paper are correct on this limited data, but almost certainly overstate probable outcomes in real world deployment. We expect that future work will need to collect more examples including less common rip current visual presentations, a greater variety of scales, and a wider array of beach distractors.

Lastly, our work lacks a success metric which is meaningful to real users. Certainly IoU, true positive rate, mAP, and the like are common in computer vision research, but is it appropriate to measure accuracy on single frames? Most conceivable deployed systems, such as pointing a mobile phone at the beach, have access to video. If measuring accuracy on video, should we measure accuracy aggregated over 1 second or 1 minute? Is a tight bounding box measured by IoU needed, or just a general region? A metric useful to researchers and simultaneously meaningful to users is needed.
7. Conclusion
We present a machine learning approach for identifying rip currents automatically. We use Faster RCNN and a custom temporal aggregation stage to make detections from still images or videos with higher measured accuracy than both humans and other methods of rip current detection previously reported in the literature. The training data set and suite of test videos are made available for other researchers.

Figure 8: Example failure cases. The false positives in the beach scene (right) are not easily explainable. The false positive in the left scene happens only on spurious frames, and is then corrected by frame aggregation.
A. Appendix
Supplementary material to this article can be found online at https://sites.google.com/view/ripcurrentdetection/home
References

[1] Australia, S.L.S., 2019. National coastal safety report. https://issuu.com/surflifesavingaustralia/docs/ncsr2019.
[2] Barron, J.L., Fleet, D.J., Beauchemin, S.S., 1994. Performance of optical flow techniques. International Journal of Computer Vision 12, 43–77. https://doi.org/10.1007/BF01420984.
[3] Bowen, A.J., 1969. Rip currents: Theoretical investigations. Journal of Geophysical Research 74, 5467–5478.
[4] Branche, C.M., 2001. Lifeguard effectiveness: A report of the working group. Centers for Disease Control and Prevention, National Center for Injury Prevention and Control.
[5] Brander, R.W., Drozdzewski, D., Dominey-Howes, D., 2014. "Dye in the Water": A visual approach to communicating the rip current hazard. Science Communication 36, 802–810.
[6] Brannstrom, C., Brown, H., Houser, C., Trimble, S., Lavoie, A., 2015. "You can't see them from sitting here": Evaluating beach user understanding of a rip current warning sign. Applied Geography 56, 61–70.
[7] Brewster, B.C., Gould, R.E., Brander, R.W., 2019. Estimations of rip current rescues and drowning in the United States. Natural Hazards and Earth System Sciences 19, 389–397.
[8] Brighton, B., Sherker, S., Brander, R., Thompson, M., Bradstreet, A., 2013. Rip current related drowning deaths and rescues in Australia 2004–2011. Natural Hazards and Earth System Sciences 13, 1069–1075.
[9] Buscombe, D., Carini, R.J., 2019. A data-driven approach to classifying wave breaking in infrared imagery. Remote Sensing 11, 859.
[10] Buscombe, D., Carini, R.J., Harrison, S.R., Chickadel, C.C., Warrick, J.A., 2020. Optical wave gauging using deep neural networks. Coastal Engineering 155, 103593.
[11] Buscombe, D., Ritchie, A.C., 2018. Landscape classification with deep neural networks. Geosciences 8, 244.
[12] Castelle, B., Almar, R., Dorel, M., Lefebvre, J.P., Senechal, N., Anthony, E.J., Laibi, R., Chuchla, R., du Penhoat, Y., 2014. Rip currents and circulation on a high-energy low-tide-terraced beach (Grand Popo, Benin, West Africa). Journal of Coastal Research 70, 633–638. https://doi.org/10.2112/SI70-107.1.
[13] Castelle, B., Scott, T., Brander, R., McCarroll, R., 2016. Rip current types, circulation and hazard. Earth-Science Reviews 163, 1–21.
[14] Clark, D., Feddersen, F., Guza, R., 2010. Cross-shore surfzone tracer dispersion in an alongshore current. Journal of Geophysical Research (Oceans) 115.
[15] Clark, D.B., Lenain, L., Feddersen, F., Boss, E., Guza, R., 2014. Aerial imaging of fluorescent dye in the near shore. Journal of Atmospheric and Oceanic Technology 31, 1410–1421.
[16] Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L., 2009. ImageNet: A large-scale hierarchical image database, in: 2009 IEEE Conference on Computer Vision and Pattern Recognition, IEEE. pp. 248–255.
[17] Dérian, P., Almar, R., 2017. Wavelet-based optical flow estimation of instant surface currents from shore-based and UAV videos. IEEE Transactions on Geoscience and Remote Sensing 55, 5790–5797.
[18] Ducournau, A., Fablet, R., 2016. Deep learning for ocean remote sensing: an application of convolutional neural networks for super-resolution on satellite-derived SST data, in: 2016 9th IAPR Workshop on Pattern Recognition in Remote Sensing (PRRS), pp. 1–6.
[19] Dusek, G., Hernandez, D., Willis, M., Brown, J.A., Long, J.W., Porter, D.E., Vance, T.C., 2019. WebCAT: Piloting the development of a web camera coastal observing network for diverse applications. Frontiers in Marine Science 6, 353.
[20] Dusek, G., Seim, H., 2013. A probabilistic rip current forecast model. Journal of Coastal Research 29, 909–925.
[49] … Conference on Computer Vision and Pattern Recognition (CVPR), p. 0.
[50] Ren, S., He, K., Girshick, R., Sun, J., 2015. Faster R-CNN: Towards real-time object detection with region proposal networks, in: Advances in Neural Information Processing Systems, pp. 91–99.
[51] Schmidt, W., Woodward, B., Millikan, K., Guza, R., Raubenheimer, B., Elgar, S., 2003. A GPS-tracked surf zone drifter. Journal of Atmospheric and Oceanic Technology 20, 1069–1075.
[52] Stringari, C.E., Harris, D.L., Power, H.E., 2019. A novel machine learning algorithm for tracking remotely sensed waves in the surf zone. Coastal Engineering 147, 149–158.
[53] Viola, P., Jones, M.J., 2004. Robust real-time face detection. International Journal of Computer Vision 57, 137–154. https://doi.org/10.1023/B:VISI.0000013087.49260.fb.
[54] Zhu, X., Wang, Y., Dai, J., Yuan, L., Wei, Y., 2017. Flow-guided feature aggregation for video object detection, in: Proceedings of the IEEE International Conference on Computer Vision, pp. 408–417.