Sound source ranging using a feed-forward neural network with fitting-based early stopping
Jing Chi, Xiaolei Li, Haozhong Wang, Dazhi Gao, Peter Gerstoft
Li et al., JASA-EL
Jing Chi, Xiaolei Li,a) Haozhong Wang, Dazhi Gao,a) and Peter Gerstoft

Department of Marine Technology, Ocean University of China, Qingdao 266100, China
Scripps Institution of Oceanography, University of California San Diego, La Jolla, California 92093-0238
[email protected], lxl [email protected], [email protected], [email protected], [email protected]

(Dated: 2 April 2019)

Abstract:
When a feed-forward neural network (FNN) is trained for source ranging in an ocean waveguide, it is difficult to evaluate the range accuracy of the FNN on unlabeled test data. A fitting-based early stopping (FEAST) method is introduced to evaluate the range error of the FNN on test data for which the source range is unknown. Based on FEAST, training is stopped when the evaluated range error of the FNN reaches its minimum on the test data, which improves the ranging accuracy of the FNN on the test data. FEAST is demonstrated on simulated and experimental data.

a) Author to whom correspondence should be addressed.
1. Introduction
Matched field processing (MFP) for source localization has been developed for many years. It can have limited performance due to its sensitivity to the mismatch between model-generated replica fields and measurements. With the development of machine learning, source localization methods based on machine learning have been revived. As early as 1991, Steinberg et al. applied perceptrons for source localization in a homogeneous medium. Recently, Niu et al. performed ship ranging using a feed-forward neural network (FNN) trained on experimental data. Besides, a regression neural network (NN) and a convolutional NN have also been trained on experimental data for underwater source ranging. Although a NN can be trained on experimental data, because of the difficulty of obtaining large amounts of ocean acoustic experimental data containing distance labels, it is cumbersome to train a NN on experimental data to realize sound source ranging in an ocean waveguide. Considering the rarity of experimental data, Huang et al. combined simulation data from close environments to train a deep NN for source localization. However, because of the space-time variation of the ocean waveguide environment, even if the training data include both simulation and experimental data, the test data are often different from the training data due to the difference of the environment. Therefore, the NN with the minimum ranging error on training data may not reach the minimum ranging error on test data. If the source distance is known for part of the test data, then this part of the data can be used as validation data with the source distance as labels, and early stopping can be used to improve the ranging accuracy of the NN on the test data.

Early stopping is a form of regularization based on choosing when to stop running an iterative algorithm; it is usually used to enhance the generalization performance of a NN and to fight overfitting.
Generalization performance means small error on examples not seen during training. Validation error, which is the average error of the NN on the labeled validation data, is chosen as the criterion for when the NN stops training in the early stopping method. During the training process, training stops when the validation error reaches its minimum. Thus the error on the validation data is reduced by early stopping. Generally, however, test data do not contain labels and cannot be used as validation data. In this case, early stopping cannot be used to improve the ranging accuracy of the NN on the test data. If the ranging error of the NN on the test data can be evaluated, it can serve as the criterion for when the NN stops training, so as to optimize the ranging accuracy of the NN on the test data.

In this paper, a FNN is trained on simulation data to realize source ranging in an ocean waveguide. In contrast to the standard early stopping method, the evaluated ranging error of the FNN on test data, where the source distance is unknown, is used as the criterion for when the FNN stops training. To evaluate the ranging error of the FNN on test data, a method called fitting-based early stopping (FEAST) is introduced. Assuming that the track of an underwater source satisfies a known parameterized function, FEAST evaluates the ranging error of the FNN by parameter fitting. FEAST is demonstrated on simulated and experimental data.
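The standard validation-based early stopping described above can be sketched as follows. This is a minimal illustration, not the paper's code: `train_step` and `val_error` are hypothetical placeholders for the reader's own training routine and validation-error computation, and the `patience` rule is one common stopping convention.

```python
def train_with_early_stopping(train_step, val_error, max_epochs=2000, patience=50):
    """Classic early stopping: track the epoch with the lowest validation
    error and stop once it has not improved for `patience` epochs.
    `train_step(epoch)` runs one training epoch; `val_error(epoch)` returns
    the current validation error (both are placeholders)."""
    best_err, best_epoch = float("inf"), 0
    for epoch in range(max_epochs):
        train_step(epoch)
        err = val_error(epoch)
        if err < best_err:
            best_err, best_epoch = err, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: stop
    return best_epoch, best_err
```

FEAST keeps this loop but replaces the labeled validation error with an error estimate computed from the unlabeled test data.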
2. Simulation data preparation, FNN architecture and learning parameters
Fig. 1. (Color online) (a) Sound speed profile (SSP). (b) Seabed parameters. (c) Architecture of the FNN with 1024 neurons in each hidden layer, 462 in the input layer, and 201 in the output layer.
It is useful to introduce the parameters used for simulation. Let E1 represent a range-independent ocean waveguide which is used for modeling the training data. The parameters of E1 are given by the S5 event in the SWellEx-96 experiment. The sound speed profile (SSP) and the seabed parameters of E1 are shown in Fig. 1 (a) and (b). The vertical line array (VLA) had 21 elements from 94.125–212.25 m in depth. Let E2 represent a range-independent ocean waveguide which is used for modeling the test data. Except for the SSP, see Fig. 1 (a), the parameters of E2 are the same as E1.

The simulated training and test data sets are prepared as follows. Selecting a domain D in E1, which is 1100–5000 m in range from the VLA and 1–30 m below the sea surface, a training set containing 12,000 samples is constructed by choosing 12,000 source locations in D uniformly. Let p = [p_1, ..., p_21]^T represent the acoustic signal received by the VLA when a 232 Hz point source is in D, computed by Kraken. The input of the FNN is constructed by vectorizing the normalized sample covariance matrix C = p p^H / ||p||^2. Considering the Hermitian symmetry of C and the fact that C is a complex matrix, C contains 21 × 22 / 2 = 231 independent complex entries, whose real and imaginary parts form x ∈ R^462, the input of the FNN. The label in the training set is obtained by dividing D into 201 parts uniformly in the range direction and encoding the distance information in a 201-dimensional vector y ∈ R^201. If a source is in the m-th part of D, the m-th element of y is 1, and all others are 0. The test set is generated by a moving 232 Hz point source positioned 9 m below the sea surface that moves away from the VLA in E2 at uniform velocity; see the black solid line in Fig. 2 (b). The VLA records data every 10 s and records 80 sets of data. While recording, the moving point source is considered static. The test data, which contain 80 samples, are constructed in the same way as the training data except that they contain no labels. The differences between training and test data are mainly caused by environmental differences.

A four-hidden-layer FNN with 1024 neurons in each hidden layer is used, see Fig. 1 (c). The input layer has 462 neurons, and the output layer has 201 neurons. The sigmoid function is selected as the activation function of the neurons in the hidden layers, and the softmax function is used in the output layer. The FNN is trained on TensorFlow with a learning rate of 0.0005, and the cross-entropy loss function is chosen to optimize the FNN.
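The construction of the FNN input from one VLA snapshot can be sketched in NumPy as follows. This is a minimal illustration: the normalization by ||p||^2 and the ordering of the upper-triangle entries (real parts first, then imaginary parts) are my assumptions, since the paper specifies only that the normalized sample covariance matrix yields a 462-dimensional real input.

```python
import numpy as np

def fnn_input(p):
    """Build the FNN input x from the complex pressure vector p (length 21)
    received by the VLA: form the normalized sample covariance matrix
    C = p p^H / ||p||^2, take the upper triangle (21*22/2 = 231 complex
    entries, by Hermitian symmetry), and stack real and imaginary parts."""
    p = np.asarray(p, dtype=complex)
    C = np.outer(p, p.conj()) / np.linalg.norm(p) ** 2
    iu = np.triu_indices(len(p))      # upper triangle, diagonal included
    upper = C[iu]                     # 231 complex values for a 21-element VLA
    return np.concatenate([upper.real, upper.imag])  # x in R^462
```

The corresponding one-hot label y would place a 1 in the element indexing the 201-way range bin that contains the source.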
The cross-entropy loss function L(α) is

L(α) = −(1/N) Σ_{i=1}^{N} [ y_i^T ln f_α(x_i) + (1 − y_i)^T ln(1 − f_α(x_i)) ],   (1)

where α ∈ N represents the epoch, a measure of the number of iterations in training, N is the number of training samples, {x_i, y_i} is the i-th training sample, x_i ∈ R^462 is the input to the FNN, y_i ∈ R^201 is the label of x_i, f_α: R^462 → R^201 represents the trained FNN at epoch α, "T" denotes the transpose, and 1 ∈ R^201 is a vector with all elements equal to 1. The m-th element of f_α(x_i) represents the probability of a source being in the m-th part of D. The maximum of f_α(x_i) indicates the most likely source position, and the corresponding source–VLA range is denoted g_α(x_i). When f_α(x_i) = y_i, the per-sample loss l(f_α(x_i), y_i) attains its minimum of 0. Fig. 2 (a) shows L(α).
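The cross-entropy loss of Eq. (1) can be sketched in NumPy as follows; this is a minimal illustration, and the small `eps` guarding the logarithms is my addition rather than part of the paper's definition.

```python
import numpy as np

def cross_entropy(Y, F):
    """Eq. (1): mean binary cross-entropy over N samples between one-hot
    labels Y (N x 201) and network outputs F (N x 201). For each sample,
    sum y^T ln f + (1-y)^T ln(1-f) over the 201 classes, then average
    and negate."""
    eps = 1e-12  # numerical guard, not in the paper
    per_sample = np.sum(Y * np.log(F + eps) + (1 - Y) * np.log(1 - F + eps), axis=1)
    return -np.mean(per_sample)
```

When the output equals the one-hot label, the loss is (numerically) zero, matching the remark after Eq. (1).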
3. Basic idea of FEAST
Although the test data do not contain labels, the performance of a FNN on the test data can be evaluated in some situations. If the expected output of the FNN on the test data satisfies a known parameterized function F(x̂(t_i), Θ), where Θ represents unknown fitting parameters, x̂_i = x̂(t_i) is the i-th test sample, and t_i is a known parameter, then

min_Θ √( (1/M) Σ_{i=1}^{M} [ g_α(x̂_i) − F(x̂_i, Θ) ]^2 )

is used to evaluate the performance on the test data, where M is the number of test samples. Take a moving source observed by a VLA as the example in this paper. Generally, the distance between a moving source and the VLA is a simple curve in the time–distance plane, which can be fitted by polynomials of finite order. For example, if the source moves away from the VLA at constant speed, the function is a first-order polynomial in time, and if the source moves away from the VLA at constant acceleration, the function is a second-order polynomial. For simplicity, only a source moving at constant speed is considered, thus F(x̂_i, Θ) = F(x̂(t_i), a, b) = a t_i + b, where t_i is the i-th time instance and Θ = [a, b]. Define the loss function

L_FEAST(α) = L(α) + λ √( (1/M) Σ_{i=1}^{M} [ g_α(x̂_i) − F(x̂_i, a_α, b_α) ]^2 ),   (2)

where L(α) is defined in Eq. (1),

{a_α, b_α} = argmin_{a,b} Σ_{i=1}^{M} [ g_α(x̂_i) − F(x̂_i, a, b) ]^2,   (3)

and λ ∈ R is a regularization parameter.

Fig. 2. (Color online) Simulated data. (a) The loss functions L(α) (solid) and L_FEAST(α) (dashed). L_FEAST(α) reaches the minimum when α = 52. (b) g_52(x̂_i) (diamonds), g_2000(x̂_i) (circles), the range from MFP (crosses), and the real range of the source (solid). (c) Relative mean square error (RMSE) for g_α(x̂_i). (d)-(f) g_α(x̂_i) (diamonds) and F(x̂_i, a_α, b_α) (circles) for α = 10, α = 52, and α = 2000.
Here

λ = √M max_α[L(α)] / max_α[ √( Σ_{i=1}^{M} [ g_α(x̂_i) − F(x̂_i, a_α, b_α) ]^2 ) ]   (4)

makes the maximum values of the two terms on the right side of Eq. (2) equal to each other; generally, the two terms reach their maximum values at small α (at α < 10 in this paper). The first term on the right side of Eq. (2) is the loss function defined on the training data, which ensures that the initialization of the FNN does not trivially satisfy the parameterized function; the second term computes the difference between g_α(x̂_i) and the parametric model of known form F(x̂_i, a_α, b_α) and evaluates the ranging error of the FNN on the test data. When L_FEAST(α) reaches its minimum or converges, training of the FNN stops. Note that the training process of the FNN is driven by optimizing L(α); L_FEAST(α) only indicates when to stop. Because computing L_FEAST(α) requires fitting the parameters {a_α, b_α}, and L_FEAST(α) reaches its minimum before L(α), this method is called fitting-based early stopping (FEAST). FEAST is not limited to FNNs; it can be used with other types of neural network to improve ranging accuracy on test data.

To demonstrate FEAST, the test data prepared in Sec. 2 are used to calculate L_FEAST(α). Fig. 2 (a) shows L_FEAST(α), which has its minimum at α = 52. To facilitate the understanding of FEAST, Fig. 2 (d)-(f) show g_α(x̂_i) and F(x̂_i, a_α, b_α) at different α; one finds that g_α(x̂_i) and F(x̂_i, a_α, b_α) are similar when L_FEAST(α) reaches its minimum. Fig. 2 (b) indicates that g_52(x̂_i) is close to the true range on the test data. For comparison, Fig. 2 (b) also shows g_2000(x̂_i) and the range from MFP, where the ocean waveguide environment used in MFP is E1. Except for the points near 620 s, the range from MFP and g_52(x̂_i) are almost the same and slightly deviate from the true source distance, which is caused by the difference between E1 and E2. However, the ranging results of g_2000(x̂_i) have larger deviations.
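The per-epoch line fit of Eq. (3) and the corresponding misfit term of Eq. (2) can be sketched as an ordinary least-squares problem. This is a minimal illustration under the constant-speed assumption F(x̂_i, a, b) = a t_i + b; the function and variable names are hypothetical.

```python
import numpy as np

def feast_term(t, g_alpha):
    """Fit the line a*t_i + b to the FNN range estimates g_alpha(x_i) by
    least squares (Eq. (3)) and return the RMS misfit, i.e. the second
    term of Eq. (2) before scaling by lambda, together with (a, b)."""
    t = np.asarray(t, dtype=float)
    g = np.asarray(g_alpha, dtype=float)
    A = np.stack([t, np.ones_like(t)], axis=1)      # design matrix [t_i, 1]
    (a, b), *_ = np.linalg.lstsq(A, g, rcond=None)  # {a_alpha, b_alpha}
    misfit = np.sqrt(np.mean((g - (a * t + b)) ** 2))
    return misfit, a, b
```

During training, one would evaluate `feast_term` at each epoch, add the λ-scaled misfit to the training loss L(α) to obtain L_FEAST(α), and stop when L_FEAST(α) reaches its minimum.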
Define the relative mean square error (RMSE) for ranging:

RMSE = √( (1/M) Σ_{i=1}^{M} [ R_p(x̂_i) − R_t(x̂_i) ]^2 / R_t(x̂_i)^2 ),   (5)

where R_p(x̂_i) and R_t(x̂_i) are the predicted range and the ground-truth range corresponding to x̂_i. Fig. 2 (c) gives the RMSE of g_α(x̂_i). One can find that when α = 52, the value of the RMSE (0.0252) is close to the minimum (0.0251), which verifies FEAST.
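The relative ranging error of Eq. (5) can be sketched as follows; this is a minimal illustration, assuming each squared range error is normalized by the corresponding squared true range.

```python
import numpy as np

def rmse(Rp, Rt):
    """Relative mean square error of Eq. (5): RMS over the M test samples
    of the range error (Rp - Rt) normalized by the true range Rt."""
    Rp = np.asarray(Rp, dtype=float)
    Rt = np.asarray(Rt, dtype=float)
    return np.sqrt(np.mean(((Rp - Rt) / Rt) ** 2))
```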
4. Experimental results
FEAST is demonstrated with the experimental data from the SWellEx-96 Event S5. Only the 232 Hz shallow source, which was towed at a depth of about 9 m, is considered. The data recorded by the VLA from 3700 to 4500 s are selected to prepare the experimental test data; every 10 s of data is used to construct one test sample. The experimental test set contains 80 samples. Fig. 3 (a) shows L_FEAST(α), which is computed on the experimental test data and reaches its minimum at α = 53. Fig. 3 (d)-(f) show g_α(x̂_i) and F(x̂_i, a_α, b_α) at different α, and one again finds that g_α(x̂_i) and F(x̂_i, a_α, b_α) are similar when L_FEAST(α) reaches its minimum. Fig. 3 (b) indicates that g_53(x̂_i) is close to the GPS range. For comparison, Fig. 3 (b) also shows the ranging results g_2000(x̂_i) and the range from MFP, where the ocean waveguide environment used in MFP is E1. It can be seen that, except for the points at 20, 60, and 70 s, the range from MFP and g_53(x̂_i) are almost the same and slightly deviate from the GPS range of the moving source, which is caused by the difference between E1 and the experimental environment. However, the ranging results g_2000(x̂_i) have larger deviations from the GPS range. Fig. 3 (c) gives the RMSE of g_α(x̂_i). One can find that when α = 53, the value of the RMSE (0.0577) is close to the minimum (0.0528), which verifies FEAST again.

Fig. 3. (Color online) Experimental data. (a) Loss function L_FEAST(α) is minimum when α = 53. (b) Range results g_53(x̂_i) (diamonds), g_2000(x̂_i) (circles), MFP (crosses), and the range from GPS (solid line). (c) RMSE for g_α(x̂_i). (d)-(f) g_α(x̂_i) (diamonds) and F(x̂_i, a_α, b_α) (circles) for α = 10, α = 52, and α = 2000.
5. Conclusion
A method called FEAST is introduced to evaluate the ranging error of a FNN for source ranging on a test data set. FEAST is demonstrated on simulated and experimental data. FEAST, which requires that the trajectory of the moving sound source satisfies a known parameterized function, is used for data post-processing rather than real-time processing. The results indicate that FEAST improves the ranging accuracy of the FNN on test data. FEAST is used for source ranging in this paper, but it can be applied to other problems where the expected output satisfies a known parameterized function.
Acknowledgments
This work is supported by the National Natural Science Foundation of China under Grant Nos. 11674294 and 11704359, the Fundamental Research Funds for the Central Universities under Grant No. 201861011, and the Qingdao National Laboratory for Marine Science and Technology Foundation under Grant No. QNLM2016ORP0106. The authors also thank Ning Wang and Ruichun Tang for their useful suggestions for this paper.
References and links

H. P. Bucker, "Use of calculated sound fields and matched field detection to locate sound sources in shallow water," J. Acoust. Soc. Am., 368–373 (1976).
A. Tolstoy, Matched Field Processing for Underwater Acoustics (World Scientific, Singapore, 1993).
A. B. Baggeroer, W. A. Kuperman, and P. N. Mikhalevsky, "An overview of matched field methods in ocean acoustics," IEEE J. Ocean. Eng., 401–424 (1993).
D. F. Gingras and P. Gerstoft, "Inversion for geometric and geoacoustic parameters in shallow water: Experimental results," J. Acoust. Soc. Am., 3589–3598 (1995).
C. F. Mecklenbräuker and P. Gerstoft, "Objective functions for ocean acoustic inversion derived by likelihood methods," J. Comput. Acoust., 259–270 (2000).
C. Debever and W. A. Kuperman, "Robust matched-field processing using a coherent broadband white noise constraint processor," J. Acoust. Soc. Am., 1979–1986 (2007).
H. Niu, E. Reeves, and P. Gerstoft, "Source localization in an ocean waveguide using supervised machine learning," J. Acoust. Soc. Am., 1176–1188 (2017).
H. Niu, E. Ozanich, and P. Gerstoft, "Ship localization in Santa Barbara Channel using machine learning classifiers," J. Acoust. Soc. Am., EL455–EL460 (2017).
Y. Wang and H. Peng, "Underwater acoustic source localization using generalized regression neural network," J. Acoust. Soc. Am., 2321–2331 (2018).
E. L. Ferguson, R. Ramakrishnan, S. B. Williams, and C. T. Jin, "Convolutional neural networks for passive monitoring of a shallow water environment using a single sensor," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (2017), pp. 2657–2661.
Z. Q. Huang, J. Xu, Z. Gong, H. Wang, and Y. Yan, "Source localization using deep neural networks in a shallow water environment," J. Acoust. Soc. Am., 2922–2932 (2018).
B. Z. Steinberg, M. J. Beran, S. H. Chin, and J. H. Howard, "A neural network approach to source localization," J. Acoust. Soc. Am., 2081–2090 (1991).
G. Raskutti, M. J. Wainwright, and B. Yu, "Early stopping and non-parametric regression: An optimal data-dependent stopping rule," Journal of Machine Learning Research, 335–366 (2014).
L. Prechelt, "Automatic early stopping using cross validation: quantifying the criteria," Neural Networks, 761–767 (1998).
J. Murray and D. Ensberg, "The SWellEx-96 experiment," available at http://swellex96.ucsd.edu/ (Last viewed April 29, 2003).