Automatic acoustic identification of individual animals: Improving generalisation across species and recording conditions
Dan Stowell, Tereza Petrusková, Martin Šálek, Pavel Linhart

Abstract
Many animals emit vocal sounds which, independently from the sounds' function, embed some individually-distinctive signature. Thus the automatic recognition of individuals by sound is a potentially powerful tool for zoology and ecology research and practical monitoring. Here we present a general automatic identification method that can work across multiple animal species with various levels of complexity in their communication systems. We further introduce new analysis techniques based on dataset manipulations that can evaluate the robustness and generality of a classifier. Using these techniques, we confirmed the presence of experimental confounds in situations resembling those from past studies. We introduce data manipulations that can reduce the impact of these confounds, compatible with any classifier. We suggest that assessment of confounds should become a standard part of future studies to ensure they do not report over-optimistic results. We provide the annotated recordings used for our analyses along with this study, and we call for dataset sharing to become common practice, to enhance the development of methods and the comparison of results.
Keywords: animal communication; individual differences; individuality; acoustic monitoring; song repertoire; vocalisation.

Introduction
Animal vocalisations exhibit consistent individually-distinctive patterns, often referred to as acoustic signatures. Individual differences in acoustic signals have been reported universally across vertebrate species (e.g., fish [1], amphibians [2], birds [3], mammals [4]). Individual differences may arise from various sources; for example, a distinctive fundamental frequency and harmonic structure of the acoustic signal can result from individual vocal tract anatomy [4, 5], while distinct temporal or frequency modulation patterns of vocal elements may result from inaccurate matching of an innate or learned template, or can arise de novo through improvisation [6]. Such individual signatures provide individual recognition cues for other conspecific animals, and individual recognition based on acoustic signals is widespread among animals [7]. Long-lasting individual recognition, spanning one or more years, has also often been demonstrated [8, 9, 10]. External and internal factors, such as sound degradation during transmission [11, 12], variable ambient temperature [13], inner motivational state [14, 15], or acquisition of new sounds during life [16], may potentially increase variation of acoustic signals. Despite these potential complications, robust individual signatures have been found in many taxa.

Besides being studied for their crucial importance in social interactions [17, 18, 19], individual signatures can become a valuable tool for monitoring animals. Acoustic monitoring of individuals of various species based on vocal cues could become a powerful tool in conservation (reviewed in [3, 20, 21]). Classical capture-mark methods of individual monitoring involve physically disturbing the animals of interest and might have a negative impact on the health or behaviour of the studied animals (e.g. [22, 23, 24, 25]). Also, concerns have been raised about possible biases in demographic and behavioural studies resulting from trap boldness or shyness of specific individuals [26]. Individual acoustic monitoring offers the great advantage of being non-invasive, and thus can be deployed across species with fewer concerns about effects on behaviour [3]. It may also reveal complementary or more detailed information about species behaviour than classical methods [27, 28, 29, 30].

Despite many pilot studies [31, 28, 32, 33], automatic acoustic individual identification is still not routinely applied. It is usually restricted to a particular research team or even to a single research project, and may eventually be abandoned altogether for a particular species. Part of the problem probably lies in the fact that methods of acoustic individual identification have been closely tailored to a single species (software platform, acoustic features used, etc.). This is good for obtaining the best possible results for a particular species, but it also hinders general, widespread application, because methods need to be developed from scratch for each new species or even each project. Little attention has been paid to developing general methods of automatic acoustic individual identification (henceforth "AAII") which could be used across different species.

A few studies in the past have proposed to develop a general, call-type-independent acoustic identification, working towards approaches that could be used across different species having simple as well as complex vocalisations [34]. Despite promising results, most of the published papers included vocalisations recorded within very limited periods of time (a few hours in a day) [34, 35, 36, 37].
Hence, these studies might have failed to separate the effects of the target signal from the potentially confounding effects of particular recording conditions and background sound, which have been reported as notable problems in other machine learning tasks [38, 39]. Reducing such confounds directly, by recording an animal in different backgrounds, may not be achievable in field conditions, since animals typically live within limited home ranges and territories. However, the acoustic background can change during the breeding season due to vegetation changes or cycles in the activity of different bird species. Also, songbirds may change territories in subsequent years or even within a single season [27]. Some other studies of individual acoustic identification, on the other hand, provided evidence that machine-learning acoustic identification can be robust with respect to possible long-term changes in the acoustic background, but did not provide evidence of being generally usable for multiple species [30, 32]. Therefore, reliable generalisation of machine learning approaches to acoustic individual identification, across different conditions and different species, has not yet been satisfactorily demonstrated.
We briefly review studies representing methods for automatic classification of individuals. Note that in the present work, as in many of the cited works, we set aside questions of automating the prior steps of recording focal birds and isolating the recording segments in which they are active. It is common, in preparing data sets, for recordists to collate recordings and manually trim them to the regions containing the "foreground" individual of interest (often with some background noise), discarding the regions containing only background sound. In the present work we will make use of both the foreground and background clips, and our method will be applicable whether such segmentation is done manually or automatically.

Matching a signal against a library of templates is a well-known bioacoustic technique, most commonly using spectrogram (sonogram) representations of the sound, via spectrogram cross-correlation [40]. For identifying individuals, template matching will work in principle when the individuals' vocalisations are strongly stereotyped with stable individual differences, and in practice this can give good recognition results for some species [41]. However, template matching is only applicable to a minority of species. It is strongly call-type dependent and requires a library covering all of the vocalisation units that are to be identified. It is unlikely to be useful for species which have a very large vocabulary, high variability, or whose vocabulary changes substantially across seasons.

An approach which can be more independent of call type is that of Gaussian mixture models (GMMs), previously used extensively in human speech technology [42, 30]. These do not rely on a strongly fixed template but rather build a statistical model summarising the observations (e.g. the spectral shapes) that are likely to be produced by each individual. A particularly useful aspect of the GMM paradigm is that it can straightforwardly incorporate the concept of a "universal background model" (UBM), which represents not "background" as ordinarily understood but a universal pool of the sounds that might be produced by individuals known and unknown. It therefore allows for the practical possibility that a given sound might come from unknown individuals that are not part of the target set [42]. This approach has been used in songbirds, although without testing across multiple seasons [42], and for orangutans, including across-season evaluation [30].

The GMM is a very basic statistical model which does not incorporate any notion of temporal structure. It thus misses out on making use of a large amount of information in the signal. One way to improve on this, again well-developed in human speech technology, is to apply hidden Markov models (HMMs). HMMs are statistical models of temporal structure and have more flexibility than template matching. However, in general they are likely to be call-type-dependent, since they encode the temporal structure observed in each vocalisation. Adi et al. used HMMs for recognising individual songbirds, in this case ortolan buntings, with a pragmatic approach to call-type dependence [32]. They first applied HMMs to infer the call type active in a given recording (independent of individual), and then, given the call type, applied GMMs to infer which individual was active.

Other computational approaches have been studied. Cheng et al. compared four classifier methods, aiming to develop call-type-independent recognition across three passerine species [37].
They found HMMs and support vector machines to be favourable among the methods they tested. However, the data used in this study were relatively limited: they were based on single recording sessions per individual, and thus could not test across-year performance; and the authors deliberately curated the data to select clean recordings with minimal noise, acknowledging that this would not be representative of realistic recordings. Fox et al. also focused on the challenge of call-independent identification, across three other passerine species [35, 34]. They used a neural network classifier, and achieved good performance for their species. However, again the data for this study were based on a single session per individual, which makes it unclear how far the findings generalise across days and years, and also does not fully test whether the results may be affected by confounding factors such as recording conditions.

Computational methods for various automatic recognition tasks have recently been dominated and dramatically improved by new trends in machine learning, including deep learning. Within that broad field, the challenge of reliable generalisation is far from solved, and is an active research topic. Within bioacoustics this has recently been studied for detection of bird sounds [43]. In deep learning, it was discovered that even the best-performing deep neural networks might be surprisingly non-robust, and could be forced to change their decisions by the addition of tiny, imperceptible amounts of background noise to an image [38].

Note that deep learning systems also typically require very large amounts of data to train, meaning they may currently be infeasible for tasks such as acoustic individual ID in which the number of recordings per individual is necessarily limited. For deep learning, "data augmentation" has been used to expand dataset sizes. Data augmentation refers to the practice of synthetically creating additional data items by modifying or recombining existing items. In the audio domain, this could be done for example by adding noise, filtering, or mixing audio clips together [44]. However, simple unprincipled data augmentation does not reduce issues such as undersampling (e.g. some vocalisations unrepresented in the data set) or confounding factors.

There thus remains a gap in applying machine learning for automatic individual identification as a general-purpose tool that can be shown to be reliable for multiple species and can generalise correctly across recording conditions.

In the work reported in this paper, we tested generalisation of machine learning across species and across recording conditions in the context of individual acoustic identification. We used extensive data for three different bird species, including repeated recordings of the same individuals within and across two breeding seasons. As well as directly evaluating across seasons, we also introduced ways to modify the evaluation data to probe the generalisation properties of the classifier. We then improved on the baseline approach by developing novel methods which help to improve generalisation performance, again by modifying the data used. Although tested with selected species and classifiers, our approach of modifying the data rather than the classification algorithm was designed to be compatible with a wide variety of automatic identification workflows.
For this study we chose three bird species of varying vocal complexity (Figure 1), in order to explore how a single method might apply to the same task at differing levels of difficulty and variation. Little owl (Athene noctua) represents a species with a simple vocalisation (Figure 1a): the territorial call is a single syllable which is individually unique and is held to be stable over time (Linhart and Šálek unpubl. data), as has been shown in several other owl species (e.g. [31, 45]). Then, we selected two passerine species which exhibit vocal learning: chiffchaff (Phylloscopus collybita) and tree pipit (Anthus trivialis). Tree pipit songs are also individually unique and stable over time [27]; but a male uses on average 11 syllable types (range 6–18), which are repeated in phrases that can be variably combined to create a song ([46], Figure 1b). Chiffchaff song, when visualised, may seem simpler than that of the pipit. However, the syllable repertoire size might actually be higher (9 to 24 types) and, contrary to the other species considered, chiffchaff males may change the syllable composition of their songs over time ([47], Figure 1c). The selected species also differ in their ecology. While little owls are sedentary and extremely faithful to their territories [48], tree pipits and chiffchaffs are migratory species with high fidelity to their localities. Annual return rates for both are 25% to 30% ([27], Linhart unpubl. data).

Figure 1. Example spectrograms representing our three study species: (a) little owl, (b) tree pipit, (c) chiffchaff.

For each of these species, we used targeted recordings of single vocally active individuals. Distance to the recorded individual varied across individuals and species according to their tolerance towards people. We tried to get the best recording and minimise distance to each singing individual without disturbing its activities. Recordings were always done under favourable weather conditions (no rain, no strong wind). In general, the signal-to-noise ratio is very good in all of our recordings (not rigorously assessed), but there are also environmental sounds and sounds from other animals or conspecifics in the recording background. All three species were recorded with the following equipment: Sennheiser ME67 microphone, Marantz PMD660 or 661 solid-state recorder (sampling frequency 44.1 kHz, 16 bit, PCM).
Little owl (Linhart and Šálek 2017) [49]:
Little owls were recorded in two Central European farmland areas, including northern Bohemia, Czech Republic (50° ...).

Chiffchaff (Průchová et al. 2017 [47], Ptáček et al. 2016 [42]):
Chiffchaff males were recorded in a former military training area on the outer boundary of České Budějovice town, the Czech Republic (48° ...).

Tree Pipit (Petrusková et al. 2015 [27]):
Tree Pipit males were recorded at the locality Brdská vrchovina, the Czech Republic (49° ...).

Table 1. Details of the audio recording datasets used. Each pair of values gives training : testing counts.

Evaluation scenario       Num. of inds   Foreground                   Background
Chiffchaff within-year    13             5107 : 1131    451 : 99      5011 : 1100    453 : 92
Chiffchaff only-15        13              195 : 1131     18 : 99       195 : 1100     21 : 92
Chiffchaff across-year    10              324 : 201      32 : 20       304 : 197      31 : 24
Little owl across-year    16              545 : 407      11 : 8        546 : 409      34 : 27
Pipit within-year         10              409 : 303      27 : 21       398 : 293      49 : 47
Pipit across-year         10              409 : 313      27 : 19       398 : 306      49 : 37

"Data augmentation" in machine learning refers to creating artificially large or diverse data sets by synthetically manipulating items in data sets to create new items, for example by adding noise or performing mild distortions. These artificially enriched data sets, used for training, often lead to improved automatic classification results, helping to mitigate the effects of limited data availability [50, 51]. Data augmentation is increasingly used in machine learning applied to audio. Audio-specific manipulations might include filtering or pitch-shifting, or the mixing together of audio files (i.e. summing their signals together) [52, 53]. Some of the highest-performing automatic species recognition systems rely in part on such data augmentations to attain their strongest results [44].

In this work, we describe two augmentation methods used specifically to evaluate and to reduce the confounding effect of background sound. These structured data augmentations are based on audio mixing, but with the combinations of files to mix selected based on foreground and background identity metadata. We make use of the fact that when recording audio from focal individuals in the wild, it is common to obtain recording clips in which the focal individual is vocalising (Figure 2a), as well as 'background' recordings in which the focal individual is silent (Figure 2b). The latter are commonly discarded. We used them as follows:

Adversarial data augmentation:
To evaluate the extent to which confounding from background information is an issue, we created datasets in which each foreground recording has been mixed with one background recording from some other individual (Figure 2c). In the best case, this should make no difference, since the resulting sound clip is acoustically equivalent to a recording of the foreground individual, but with a little extra irrelevant background noise. In fact it could be considered a synthetic test of the case in which an individual is recorded having travelled out of their home range. In the worst case, a classifier that has learnt undesirable correlations between foreground and background will be misled by the modification, either increasing the probability of classifying as the individual whose territory provided the extra background, or simply confusing the classifier and reducing its general ability to classify well. In our implementation, each foreground item was used once, each mixed with a different background item. Thus the evaluation set remains the same size as the unmodified set. We evaluated the robustness of a classifier by looking at any changes in the overall correctness of classification, or in more detail via the extent to which the classifier outputs are modified by the adversarial augmentation.
Stratified data augmentation:
We can use a similar principle during the training process, to create an enlarged and improved training data set. We created training datasets in which each training item had been mixed with an example of background sound from each other individual (Figure 2d). If there are K individuals, this means that each item is converted into K synthetic items, and the data set size increases by a factor of K. Stratifying the mixing in this way, rather than selecting background samples purely at random, is intended to expose a classifier to training data with reduced correlation between foreground and background, and thus reduce the chance that it uses confounding information in making decisions.

To implement the foreground and background audio file mixing, we used the sox processing tool v14.4.1.
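To make the two structured augmentations concrete, the following is a minimal sketch of how such identity-aware mixing could be scripted around sox's mix combiner (`sox -m`). The directory layout (clips/&lt;individual&gt;/fg and clips/&lt;individual&gt;/bg), file naming and the choice of which background clip to pair with each foreground clip are hypothetical illustrations, not the exact pipeline used in this study.

```python
import random
import subprocess
from pathlib import Path

def mix(fg: Path, bg: Path, out: Path) -> None:
    """Mix two audio clips into one file using sox's mix combiner ('sox -m')."""
    out.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(["sox", "-m", str(fg), str(bg), str(out)], check=True)

def backgrounds_of(clips_dir: Path, ind: str):
    """All background-only clips recorded for one individual (assumed layout)."""
    return sorted((clips_dir / ind / "bg").glob("*.wav"))

def stratified_augment(clips_dir: Path, out_dir: Path) -> None:
    """Training-time augmentation: every foreground clip of every individual is
    mixed with one background clip from each *other* individual, multiplying
    the training set size roughly by the number of individuals."""
    inds = sorted(p.name for p in clips_dir.iterdir() if p.is_dir())
    for ind in inds:
        for fg in sorted((clips_dir / ind / "fg").glob("*.wav")):
            for other in inds:
                if other != ind:
                    bg = random.choice(backgrounds_of(clips_dir, other))
                    mix(fg, bg, out_dir / ind / f"{fg.stem}__bg_{other}.wav")

def adversarial_augment(clips_dir: Path, out_dir: Path) -> None:
    """Evaluation-time augmentation: each foreground clip is mixed with a single
    background clip from one other individual, keeping the evaluation set the
    same size as the unmodified set."""
    inds = sorted(p.name for p in clips_dir.iterdir() if p.is_dir())
    for ind in inds:
        for fg in sorted((clips_dir / ind / "fg").glob("*.wav")):
            other = random.choice([i for i in inds if i != ind])
            bg = random.choice(backgrounds_of(clips_dir, other))
            mix(fg, bg, out_dir / ind / f"{fg.stem}__adv_{other}.wav")
```

Because the approach only manipulates the audio files, it can be placed in front of any classification workflow without changes to the classifier itself.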
Figure 2. Explanatory illustration of our data augmentation interventions: (a) 'foreground' recordings, which also contain some signal content coming from the background habitat; the foreground and background might not vary independently, especially in the case of territorial animals; (b) 'background' recordings, recorded when the focal animal is not vocalising; (c) in adversarial data augmentation, we mix each foreground recording with a background recording from another individual, and measure the extent to which this alters the classifier's decision; (d) in stratified data augmentation, each foreground recording is mixed with a background recording from each other class, creating an enlarged training set with reduced confounding correlation between foreground and background.

Alongside our data augmentation, we can also consider simple interventions in which the background sound recordings are used alone, without modification. One way of diagnosing confounding-factor issues in AAII is to apply the classifier to background-only sound recordings. If there are no confounds in the trained classifier, trained on foreground sounds, then it should be unable to identify the corresponding individual for any given background-only sound (identifying 'a' or 'b' in Figure 2b): automatic identification for background-only sounds should yield results at around chance level.

A second use of the background-only recordings is to create an explicit 'wastebasket' class during training. As well as training the classifier to recognise individual labels A, B, C, ..., we created an additional 'wastebasket' class which should be recognised as 'none of the above', or in this case, explicitly as 'background'. The explicit-background class may or may not be used in the eventual deployment of the system. Either way, its inclusion in the training process could help to ensure that the classifier learns not to make mistaken associations with the other classes. This approach is related to the universal background model (UBM) used in open-set recognition methods [42]. Note that the 'background' class is likely to be different in kind from the other classes, having very diverse sounds. In methods with an explicit UBM, the background class can be handled differently from the others [42]. Here, we chose to use methods that can work with any classifier, and so the background class was simply treated analogously to the classes of interest.
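As a minimal illustration of this explicit-background setup with an off-the-shelf classifier, the background clips simply enter the training set as one more class label. The sketch below uses scikit-learn's random forest purely as a stand-in; the function and variable names are illustrative assumptions, not the study's actual code.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_with_explicit_background(fg_feats, fg_labels, bg_feats):
    """Train a classifier on foreground items labelled by individual, plus
    background-only items collected into one extra 'background' class."""
    X = np.vstack([fg_feats, bg_feats])
    y = np.concatenate([np.asarray(fg_labels), ["background"] * len(bg_feats)])
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X, y)
    return clf

# Usage sketch: fg_feats / bg_feats are per-clip feature vectors (e.g. summaries
# of the mel spectrogram features described below), and fg_labels the individual
# identities. clf.predict_proba(test_feats) then yields per-class probabilities,
# including the probability of 'background', which may be ignored at deployment.
```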
In this work, we started with a standard automatic classification processing workflow (Figure 3a), and then experimented with inserting our proposed improvements. We modified the feature processing stage, but our main innovations in fact came during the data set preparation stage, using the foreground and/or background data sets in various combinations to create different varieties of training and testing data (Figure 3b).

As in many other works, the audio files (which in this case may be the originals or their augmented versions) were not analysed in their raw waveform format, but were converted to a mel spectrogram representation, 'mel' referring to a perceptually-motivated compression of the frequency axis of a standard spectrogram. We used audio files (44.1 kHz mono) converted into spectrograms using frames of length 1024 (23 ms), with Hamming windows, 50% frame overlap, and 40 mel bands. We applied median-filtering noise reduction to the spectrogram data.

Following the findings of [54], we also applied unsupervised feature learning to the mel spectrogram data as a preprocessing step. This procedure scans through the training data in unsupervised fashion (i.e. neglecting the data labels), finding a linear projection that provides an informative transformation of the data. We evaluated the audio feature data with and without this feature learning step, to evaluate whether the data representation had an impact on the robustness and generalisability of automatic classification. In other words, as input to the classifier we used either the mel spectrograms, or the learned representation obtained by transforming the mel spectrogram data.

The automatic classifier we used was based on a random forest classifier that had previously been tested successfully for bird species classification, but had not been tested for AAII [54].
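The feature-extraction settings just described can be sketched as follows, assuming the librosa library. The exact noise-reduction recipe and the downstream spherical k-means feature learning of [54] are not reproduced here; a simple per-band median subtraction stands in for the median-filtering step.

```python
import librosa
import numpy as np

def mel_features(path: str, n_mels: int = 40) -> np.ndarray:
    """Load mono audio at 44.1 kHz and return a noise-reduced mel spectrogram
    (shape: n_mels x n_frames)."""
    y, sr = librosa.load(path, sr=44100, mono=True)
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr,
        n_fft=1024,        # 1024 samples ~= 23 ms at 44.1 kHz
        hop_length=512,    # 50% frame overlap
        window="hamming",
        n_mels=n_mels,
    )
    mel = librosa.power_to_db(mel)  # log-magnitude scaling, a common convention
    # Assumed stand-in for the median-filtering noise reduction: subtract each
    # band's median over time, then clip negative values.
    mel = np.maximum(mel - np.median(mel, axis=1, keepdims=True), 0.0)
    return mel
```

Either this representation, or its transformation through the learned linear projection, is then passed to the classifier.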
As is standard in automatic classification evaluation, we divided our datasets into portions used for training the system, and portions used for evaluating system performance.
Figure 3. Classification workflows. (a) A standard workflow for automatic audio classification: data set preparation (training and testing foreground data), feature processing (mel spectrogram) and classification (train/apply classifier); the upper portion shows the training procedure, and the lower shows the application or evaluation procedure. (b) Workflow for our automatic classification experiments, adding background data, stratified augmentation (training), adversarial augmentation (testing) and optional feature learning; dashed boxes represent steps which we enable/disable as part of our experiment. The upper portion shows the training procedure, and the lower shows the evaluation procedure. The two portions are very similar. However, the purpose and method of augmentation is different in each, as is the use of background-only audio: in the training phase the 'concatenation' block creates an enlarged training set as the union of the background items and the foreground items, while in the evaluation phase the 'choose' block selects only one of the two, for the system to make predictions about.
Items used in training were not used in evaluation, and the allocation of items to the training or evaluation sets was done to create a partitioning through time: evaluation data came from different days within the breeding season, or subsequent years, than the training data. This corresponds to a plausible use-case in which a system is trained with existing recordings and then deployed; the partitioning also helps to reduce the probability of over-estimating performance.

To quantify performance we used receiver operating characteristic (ROC) analysis, and as a summary statistic the area under the ROC curve (AUC). The AUC summarises classifier performance and has various desirable properties for evaluating classification [55].

We evaluated the classifiers following the standard paradigm used in machine learning. Note that during evaluation, we optionally modified the evaluation data sets in two possible ways, as already described: adversarial data augmentation, and background-only classification. In all cases we used AUC as the primary evaluation measure. However, we also wished to probe the effect of adversarial data augmentation in finer detail: even when the overall decisions made by a classifier are not changed by modifying the input data, there may be small changes in the full set of probabilities it outputs. A classifier that is robust to adversarial augmentation should be one whose probabilities change little, if at all. Hence, for the adversarial augmentation test, we also took the probabilities output from the classifier and compared them against the equivalent probabilities from the same classifier in the non-adversarial case. We measured the difference between these sets of probabilities simply by their root-mean-square error (RMS error).
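A minimal sketch of the two evaluation measures follows: AUC computed from per-class probabilities (which can equally be applied to foreground, background-only or adversarially mixed test items), and the RMS difference between the probabilities produced with and without adversarial mixing. The function names and the particular multi-class AUC averaging shown here are illustrative assumptions, not necessarily the exact settings of the study.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def evaluate_auc(y_true: np.ndarray, proba: np.ndarray) -> float:
    """Area under the ROC curve for a multi-class problem, from predicted
    per-class probabilities (rows: items, columns: individuals; column order
    assumed to match the classifier's class ordering)."""
    return roc_auc_score(y_true, proba, multi_class="ovr", average="macro")

def adversarial_rms_error(proba_clean: np.ndarray, proba_adv: np.ndarray) -> float:
    """RMS difference between classifier output probabilities on the original
    test items and on their adversarially mixed counterparts; a robust
    classifier should yield a value close to zero."""
    return float(np.sqrt(np.mean((proba_clean - proba_adv) ** 2)))
```

In the background-only probe, the probabilities would come from applying the trained classifier to background-only clips; an AUC well above chance level there flags a confound.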
For our first phase of testing, we wished to compare the effectiveness of the different proposed interventions, and their relative effectiveness on data tested within-year or across-year. We chose to use the chiffchaff datasets for these tests, since the chiffchaff song has an appropriate level of complexity to elucidate the differences between classifier performance, in particular the possible change of syllable composition across years. The chiffchaff dataset is also by far the largest.

We wanted to explore the difference in estimated performance when evaluating a system with recordings from the same year, separated by days from the training data, versus recordings from a subsequent year. In the latter case, the background sounds may have changed intrinsically, or the individual may have moved to a different territory; and of course the individual's own vocalisation patterns may change across years. This latter effect may be an issue for AAII with a species such as the chiffchaff, and also imposes limits on the application of previous approaches such as template-based matching. Hence we wanted to test whether this more flexible machine learning approach could detect an individual signature in the chiffchaff even when applied to data from a different field season. We thus evaluated performance on 'within-year' data (recordings from the same season) and 'across-year' data (recordings from the subsequent year, or a later year).

Since the size of data available is often a practical constraint in AAII, and since dataset size can have a strong influence on classifier performance, we further performed a version of the 'within-year' test in which the training data had been restricted to only 15 items per individual. The evaluation data was not restricted.

To evaluate formally the effect of the different interventions, we applied generalised linear mixed models (GLMM) to our evaluation statistics, using the glmmadmb package within R version 3.4.4 [56, 57], treating AUC as a continuous value constrained to the range [0, 1].

In the second phase of our investigations, we evaluated the selected approach across the three species separately: chiffchaff, pipit and little owl. For each of these we compared the most basic version of the classifier (using mel features, no augmentation, and no explicit-background) against the improved version that was selected from phase one of the investigation. For each species separately, and using within-year and across-year data according to availability, we evaluated the basic and the improved classifier for the overall performance (AUC measured on foreground sounds). We also evaluated their performance on background-only sounds, and on the adversarial data augmentation test, both of which checked the relationship between improved classification performance and improvements or degradations in the handling of confounding factors.

For both of these tests (background-only testing and adversarial augmentation), we applied GLMM tests similar to those already stated. In these cases we entered separate factors for the testing condition and for whether the improved classifier was in use, as well as an interaction term between the two factors. This therefore tested for an effect of whether our improved classifier indeed mitigated the problems that the tests were designed to expose.
AAII performance over the 13 chiffchaff individuals was strong, above 85% AUC in all variants of the within-year scenario (Figure 4). For interpretation, note that this corresponds to over 85% probability that a random true-positive item is ranked higher than a random true-negative item by the system [55]. This reduced to around 70–80% when the training set was limited to 15 items per individual, and reduced even further to around 60% in the across-year evaluation scenario. Recognising chiffchaff individuals across years remains a challenging task even under the studied interventions.

Figure 4. Performance of the classifier (AUC) across the three chiffchaff evaluation scenarios, and with various combinations of configuration: with/without augmentation ('aug'), learnt features, and explicit-background ('exbg') training; results are shown separately for mel spectrogram features and learnt features.

The focus of our study is on discriminating between individuals, but our "explicit-background" configuration additionally made it possible for the same classifier to discriminate between cases where a focal individual was singing, and cases where it was not. Across all three of the conditions mentioned above, foreground-vs-background discrimination (i.e. "detection" of any focal individual) for chiffchaff was strong at over 95% AUC. Mel spectral features performed slightly better for this (range 96.6–98.6%) than learnt features (range 95.3–96.7%). Given this, in the remainder of the results we focus on our main question of discriminating between individuals.

We tested the GLMM residuals for the two evaluation measures (AUC, RMSE) and found no evidence for overdispersion. We also tested all possible reduced models with factors removed, comparing among models using AIC. In both cases, the full model as well as a model with 'exbg' (explicit-background training) removed gave the best fit, with the full model less than 2 units above the exbg-reduced model and leading to no difference in significance estimates. We therefore report results from the full models.

Feature-learning and structured data augmentation were both found to significantly improve classifier performance (Table 2), as well as robustness to adversarial data augmentation (Table 3). Explicit-background training was found to lead to mild improvement, but this was a long way below significance.
Table 2. Results of GLMM test for AUC, across the three chiffchaff evaluation scenarios.

                      Estimate   p-value
(Intercept)            0.8199    0.041 *
Feature-learning       0.3093    0.014 *
Augmentation           0.2509    0.048 *
Explicit-bg class      0.0626    0.621
Table 3. Results of GLMM fit for RMSE in the adversarial data augmentation test, across the three chiffchaff evaluation scenarios.

                      Estimate   p-value
(Intercept)            1.8543    1.9e-05 ***
Feature-learning      -0.5044    1.9e-08 ***
Augmentation          -0.8734    < ...

Based on the results of our first study, we took forward an improved version of the classifier (using stratified data augmentation and learnt features, but not explicit-background training) to test across multiple species.

Applying this classifier to the different species and conditions, we found that it led in most cases to a dramatic improvement in recognition performance on foreground recordings, and little change in the recognition of background recordings (Figure 5, Table 4). This suggests that the improvement is based on the individuals' signal characteristics and not confounding factors.

Our adversarial augmentation, intended as a diagnostic test to adversarially reduce classification performance, did not have strong overall effects on the headline performance indicated by the AUC scores (Figure 6, Table 4). Half of the cases examined (the across-year cases) were not adversely impacted, in fact showing a very small increase in AUC score. The chiffchaff within-year tests were the only ones to show a strong negative impact of adversarial augmentation, and this negative impact was removed by our improved classification method.

We also conducted a more fine-grained analysis of the effect of augmentation, by measuring the amount of deviation induced in the probabilities output from the classifier. On this measure we observed a consistent effect, with our improvements reducing the RMS error by ratios of approximately 2–6, while the overall magnitude of the error differed across species (Figure 7).
Figure 5. Our selected interventions (data augmentation and feature-learning) improve classification performance, in some cases dramatically (left-hand pairs of points), without any concomitant increase in the background-only classification (right-hand pairs of points), which would be an indication of confounding.
Table 4. Results of GLMM test for AUC, across all three species, to quantify the general effect of our improvements on the foreground test and the background test (cf. Figure 5).

                              Estimate   p-value
(Intercept)                    0.792     0.00150 **
Use of improved classifier     0.852     0.00032 ***
Background-only testing       -0.562     0.00624 **
Interaction term              -0.896     0.00391 **
Table 5. Results of GLMM test for AUC, across all three species, to quantify the general effect of our improvements on the adversarial test (cf. Figure 6).

                                Estimate   p-value
(Intercept)                      0.873     0.0121 *
Use of improved classifier       0.820     0.0027 **
Adversarial data augmentation   -0.333     0.1713
Interaction term                 0.225     0.5520
Figure 6. Adversarial augmentation has a varied impact on classifier performance (left-hand pairs of points), in some cases giving a large decline. Our selected interventions vastly reduce the impact of this adversarial test, while also generally improving classification performance (right-hand pairs of points).
Figure 7. Measuring in detail how much effect the adversarial augmentation has on classifier decisions: RMS error of classifier output, in each case applying adversarial augmentation and then measuring the differences compared against the non-adversarial equivalent applied to the exact same data. In all five scenarios, our selected interventions lead to a large decrease in the RMS error.
Discussion
We demonstrate that a single approach to automatic acoustic identification of individuals (AAII) can be successfully used across different species with different complexity of vocalisations. One exception to this is the hardest case, chiffchaff tested across years, in which automatic classification performance remains modest. The chiffchaff case (complex song, variable song content), in particular, highlights the need for proper assessment of identification performance. Without proper assessment we cannot be sure whether promising results reflect the real potential of a proposed identification method. We document that our proposed improvements to the classifier training process are able, in some cases, to improve the generalisation performance dramatically and, on the other hand, to reveal confounds causing over-optimistic results.

We evaluated spherical k-means feature-learning as previously used for species classification [54]. We found that for individual identification it provides an improvement over plain mel spectral features, not just in accuracy (as previously reported) but also in resistance to confounding factors. We believe this is due to the feature-learning having been tailored to reflect fine temporal details of bird sound; if so, this lesson would carry across to related systems such as convolutional neural networks. Our machine-learning approach may be particularly useful for automatic identification of individuals in species with more complex songs, such as pipits (note the huge increase in performance over mel features in Figure 5), or chiffchaffs (on a short time scale, though).

Using silence-regions from focal individuals to create an "explicit-background" training category provided only a mild improvement in the behaviour of the classifier, under various evaluations. Also, we found that the best-performing configuration for detecting the presence/absence of a focal individual was not the same as the best-performing configuration for discriminating between individuals. Hence, it seems generally preferable not to combine the detection and AAII tasks into one classifier.

By contrast, using silence-regions to perform dataset augmentation of the foreground sounds was found to give a strong boost to performance as well as resistance against confounding factors. Background sounds are thus useful in training a system for AAII, through data augmentation rather than explicit-background training.

We found that adversarial augmentation provided a useful tool to diagnose concerns about the robustness of an AAII system. In the present work we found that the classifier was robust against this augmentation (and thus we can infer that it was largely not using background confounds to make its decisions), except for the case of chiffchaff with the simple mel features (Figure 6). This latter case exhorts us to be cautious, and suggests that results from previous call-type independent methods may have been over-optimistic in assessing performance [34, 35, 36, 37, 42]. Our adversarial augmentation method can help to test for this even in the absence of across-year data.

Background-only testing was useful to confirm that when the performance of a classifier was improved, the confounding factors were not aggravated in parallel, i.e. that the improvement was due to signal and not confound (Figure 5).
However, the performance on background sound recordings was not reduced to chance, but remained at some level reflecting the foreground-background correlations in each case, so results need to be interpreted comparatively against the foreground improvement, rather than in isolation. This individual specificity of the background may be related to the time interval between recordings. This is clear from the across-year outcomes; within-year, we note that there was one day of temporal separation for chiffchaffs (close to 70 percent AUC on background-only sound), whereas there was an interval of weeks for pipits (chance-level classification of background). These effects surely depend on characteristics of the habitat.

Our improved classifier performs much more reliably than the standard one; however, the most crucial factor still seems to be the target species. For the little owl we found good performance, least affected by modifications in methods, consistent with the fact that it is the species with the simplest vocalisations. The little owl represents a species well suited to template-matching individual identification methods, which have been used in the past for many species with similarly simple, fixed vocalisations (discriminant analysis, cross-correlation). For these cases, it seems that our automatic identification method does not bring an advantage in terms of improved classification performance. However, a general classifier such as ours, automatically adjusting a set of features for each species, would allow common users to start individual identification right away without the need to choose an appropriate template-matching method (e.g. [49]).

We found that feature learning gave the best improvement in the case of pipits (Figure 5). Pipits have a more complex song, where simple template matching cannot be used to identify individuals. In pipits, each song may have a different duration and may be composed of a different subset of the syllable repertoire, and so no single song can be used as a template for a template-matching approach. This singing variation likely also prevents good identification performance based on mel features in pipits. Nevertheless, a singing pipit male will cycle through the whole syllable repertoire within a relatively low number of songs, and individual males can be identified based on their unique syllable repertoires [27]. We think that our improvements to the automatic identification might allow the system to pick up the correct features associated with the stable repertoire of each male. This extends the use of the same automatic identification method to the large part of songbird species that organise songs into several song types and, at the same time, are so-called closed-ended learners [58].

Our automatic identification, however, cannot be considered fully independent of song content in the sense defined earlier (e.g. [34, 36]). Such a content-independent identification method should be able to classify across-year recordings of chiffchaffs, in which syllable repertoires of males differ almost completely between the two years [47].
Due to the vulnerability of mel feature classification to confounds reported here, and because the performance of content-independent identification has been tested only on short-term recordings, we believe that the concept of fully content-independent individual identification has yet to be reliably demonstrated.

Our approach seems clearly suitable for species with individual vocalisation stable over time, even if that vocalisation is complex (a very wide range of species), in general outdoor conditions. For such species it might be successfully used for individual automatic acoustic monitoring, although this needs to be tested at larger scale: in various species and in large populations. In future work these approaches should also be tested with 'open-set' classifiers allowing for the possibility that new unknown individuals might appear in the data. This is well developed in the "universal background model" (UBM) used in GMM-based speaker recognition [42], and future work in machine learning is needed to develop this for the case of more powerful classifiers.

Important for further work in this topic is open sharing of data in standard formats. Only this way can diverse datasets from individuals be used to develop and evaluate automatic recognition that works across many taxa and recording conditions.

We conclude by listing the recommendations that emerge from this work for users of automatic classifiers, in particular for acoustic recognition of individuals:

1. Record 'background' segments for each individual (class), and publish background audio samples alongside the trimmed individual audio samples. Standard data repositories can be used for these purposes (e.g. Dryad, Zenodo).

2. Improve robustness by:
(a) suitable choice of input features;
(b) structured data augmentation, using background sound recordings.

3. Probe your classifier for robustness by:
(a) background-only recognition: higher-than-chance recognition strongly implies confound;
(b) adversarial distraction with background: a large change in classifier outputs implies confound;
(c) across-year testing (if such data are available): a stronger test than within-year.

4. Be aware of how species characteristics will affect recognition. The vocalisation characteristics of the species will influence the ease with which automatic classifiers can identify individuals. Songbirds whose song changes within and between seasons will always be harder to identify reliably, as is also the case in manual identification.

5. Best practice is to test both manual features and learned features, since their generalisation and performance characteristics are rather different. In the present work we compare basic features against learned features; for a different example see [12]. Manual features are usually of lower accuracy, but with learned features more care must be taken with respect to confounds and generalisation.

Ethics
Our study primarily involved only non-invasive recording of vocalising individuals. In the case of ringed individuals (all chiffchaffs and some tree pipits and little owls), ringing was done by experienced ringers (PL, MŠ, TP) who all held ringing licences at the time of the study. Tree pipit and chiffchaff males were recorded during spontaneous singing. Only for little owls was a short playback recording (1 min) used to provoke calling. Playback provocations as well as handling during ringing were kept as short as possible, and we are not aware of any consequences for the subjects' breeding or welfare.
Data Accessibility
Our audio data and the associated metadata files are available online under the Creative Commons Attribution licence (CC BY 4.0) at http://doi.org/10.5281/zenodo.1413495
Competing Interests
We have no competing interests.
Authors’ Contributions
DS and PL conceived and designed the study. PL, TP and MŠ recorded audio. PL processed the audio recordings into data sets. DS carried out the classification experiments and performed data analysis. DS, PL and TP wrote the manuscript. All authors gave final approval for publication.
Funding
DS was supported by EPSRC Early Career research fellowship EP/L020505/1. PL was supported by the National Science Centre, Poland, under Polonez fellowship reg. no. UMO-2015/19/P/NZ8/02507, funded by the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 665778. TP was supported by the Czech Science Foundation (project P505/11/P572). MŠ was supported by the research aim of the Czech Academy of Sciences (RVO 68081766).

References
1. Amorim MCP, Vasconcelos RO. Variability in the mating calls of the Lusitanian toadfish Halobatrachus didactylus: cues for potential individual recognition. Journal of Fish Biology. 2008;73:1267–1283.
2. Bee MA, Gerhardt HC. Neighbour-stranger discrimination by territorial male bullfrogs (Rana catesbeiana): I. Acoustic basis. Animal Behaviour. 2001;62:1129–1140.
3. Terry AM, Peake TM, McGregor PK. The role of vocal individuality in conservation. Frontiers in Zoology. 2005;2(1):10.
4. Taylor AM, Reby D. The contribution of source-filter theory to mammal vocal communication research. Journal of Zoology. 2010;280(3):221–236. Available from: http://onlinelibrary.wiley.com/doi/10.1111/j.1469-7998.2009.00661.x/abstract.
5. Gamba M, Favaro L, Araldi A, Matteucci V, Giacoma C, Friard O. Modeling individual vocal differences in group-living lemurs using vocal tract morphology. Current Zoology. 2017;63(4):467–475.
6. Janik V, Slater PB. Vocal Learning in Mammals. vol. 26. Academic Press; 1997. p. 59–99.
7. Wiley RH. Specificity and multiplicity in the recognition of individuals: implications for the evolution of social behaviour. Biological Reviews. 2013;88(1):179–195. WOS:000317066700011.
8. Boeckle M, Bugnyar T. Long-Term Memory for Affiliates in Ravens. Current Biology. 2012;22(9):801–806.
9. Insley SJ. Long-term vocal recognition in the northern fur seal. Nature. 2000;406(6794):404–405.
10. Briefer EF, de la Torre MP, McElligott AG. Mother goats do not forget their kids' calls. Proceedings of the Royal Society B: Biological Sciences. 2012 Jun;279(1743):3749–3755.
11. Slabbekoorn H. Singing in the wild: the ecology of birdsong. In: Marler P, Slabbekoorn H, editors. Nature's music: the science of birdsong. Elsevier Academic Press; 2004. p. 178–205.
12. Mouterde SC, Elie JE, Theunissen FE, Mathevon N. Learning to cope with degraded sounds: Female zebra finches can improve their expertise at discriminating between male voices at long distance. The Journal of Experimental Biology. 2014;p. jeb–104463.
13. Gambale PG, Signorelli L, Bastos RP. Individual variation in the advertisement calls of a Neotropical treefrog (Scinax constrictus). Amphibia-Reptilia. 2014;35(3):271–281. Available from: http://booksandjournals.brillonline.com/content/journals/10.1163/15685381-00002949.
14. Collins SA. Vocal fighting and flirting: the functions of birdsong. In: Marler PR, Slabbekoorn H, editors. Nature's music: the science of birdsong. Elsevier Academic Press; 2004. p. 39–79.
15. Linhart P, Jaška P, Petrusková T, Petrusek A, Fuchs R. Being angry, singing fast? Signalling of aggressive motivation by syllable rate in a songbird with slow song. Behavioural Processes. 2013;100:139–145.
16. Kroodsma DE. The diversity and plasticity of bird song. In: Marler PR, Slabbekoorn H, editors. Nature's music: the science of birdsong. Elsevier Academic Press; 2004. p. 108–131.
17. Thom MDF, Dytham C. Female Choosiness Leads to the Evolution of Individually Distinctive Males. Evolution. 2012;66(12):3736–3742. WOS:000312218200008.
18. Bradbury JW, Vehrencamp SL. Principles of animal communication. 1st ed. Sinauer Associates; 1998.
19. Crowley PH, Provencher L, Sloane S, Dugatkin LA, Spohn B, Rogers L, et al. Evolving cooperation: the role of individual recognition. Biosystems. 1996;37(1):49–66.
20. Mennill DJ. Individual distinctiveness in avian vocalizations and the spatial monitoring of behaviour. Ibis. 2011;153(2):235–238.
Available from: http://onlinelibrary.wiley.com/doi/10.1111/j.1474-919X.2011.01119.x/abstract.
21. Blumstein DT, Mennill DJ, Clemins P, Girod L, Yao K, Patricelli G, et al. Acoustic monitoring in terrestrial environments using microphone arrays: applications, technological considerations and prospectus. Journal of Applied Ecology. 2011;48(3):758–767. Available from: http://onlinelibrary.wiley.com/doi/10.1111/j.1365-2664.2011.01993.x/abstract.
22. Johnsen A, Lifjeld J, Rohde PA. Coloured leg bands affect male mate-guarding behaviour in the bluethroat. Animal Behaviour. 1997;54(1):121–130.
23. Gervais JA, Catlin DH, Chelgren ND, Rosenberg DK. Radiotransmitter mount type affects burrowing owl survival. The Journal of Wildlife Management. 2006;70(3):872–876.
24. Linhart P, Fuchs R, Poláková S, Slabbekoorn H. Once bitten twice shy: long-term behavioural changes caused by trapping experience in willow warblers Phylloscopus trochilus. Journal of Avian Biology. 2012;43(2):186–192.
25. Rivera-Gutierrez HF, Pinxten R, Eens M. Songbirds never forget: long-lasting behavioural change triggered by a single playback event. Behaviour. 2015;152(9):1277–1290. Available from: http://booksandjournals.brillonline.com/content/journals/10.1163/1568539x-00003278.
26. Camacho C, Canal D, Potti J. Lifelong effects of trapping experience lead to age-biased sampling: lessons from a wild bird population. Animal Behaviour. 2017;130:133–139.
27. Petrusková T, Pišvejcová I, Kinštová A, Brinke T, Petrusek A. Repertoire-based individual acoustic monitoring of a migratory passerine bird with complex song as an efficient tool for tracking territorial dynamics and annual return rates. Methods in Ecology and Evolution. 2015 Nov;7(3):274–284. Available from: https://doi.org/10.1111%2F2041-210x.12496.
28. Laiolo P, Vögeli M, Serrano D, Tella JL. Testing acoustic versus physical marking: two complementary methods for individual-based monitoring of elusive species. Journal of Avian Biology. 2007;38(6):672–681.
29. Kirschel ANG, Cody ML, Harlow ZT, Promponas VJ, Vallejo EE, Taylor CE. Territorial dynamics of Mexican Ant-thrushes Formicarius moniliger revealed by individual recognition of their songs. Ibis. 2011;153:255–268.
30. Spillmann B, van Schaik CP, Setia TM, Sadjadi SO. Who shall I say is calling? Validation of a caller recognition procedure in Bornean flanged male orangutan (Pongo pygmaeus wurmbii) long calls. Bioacoustics. 2017;26(2):109–120.
31. Delport W, Kemp AC, Ferguson JWH. Vocal identification of individual African Wood Owls Strix woodfordii: a technique to monitor long-term adult turnover and residency. Ibis. 2002;144:30–39.
32. Adi K, Johnson MT, Osiejuk TS. Acoustic censusing using automatic vocalization classification and identity recognition. Journal of the Acoustical Society of America. 2010 Feb;127(2):874–883.
33. Terry AMR, McGregor PK. Census and monitoring based on individually identifiable vocalizations: the role of neural networks. Animal Conservation. 2002;5:103–111.
34. Fox EJS. A new perspective on acoustic individual recognition in animals with limited call sharing or changing repertoires. Animal Behaviour. 2008 Mar;75(3):1187–1194.
35. Fox EJS, Roberts JD, Bennamoun M. Call-independent individual identification in birds. Bioacoustics. 2008;18(1):51–67.
36. Cheng J, Sun Y, Ji L. A call-independent and automatic acoustic system for the individual recognition of animals: A novel model using four passerines. Pattern Recognition. 2010 Nov;43(11):3846–3852.
37. Cheng J, Xie B, Lin C, Ji L.
A comparative study in birds: call-type-independent species and individual recognition using four machine-learning methods and two acoustic features. Bioacoustics. 2012 Jun;21(2):157–171.
38. Szegedy C, Zaremba W, Sutskever I, Bruna J, Erhan D, Goodfellow I, et al. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199. 2013.
39. Mesaros A, Heittola T, Virtanen T. Acoustic Scene Classification: an Overview of DCASE 2017 Challenge Entries. In: 16th International Workshop on Acoustic Signal Enhancement (IWAENC). Tokyo, Japan; 2018.
40. Khanna H, Gaunt S, McCallum D. Digital spectrographic cross-correlation: tests of sensitivity. Bioacoustics. 1997;7(3):209–234.
41. Foote JR, Palazzi E, Mennill DJ. Songs of the Eastern Phoebe, a suboscine songbird, are individually distinctive but do not vary geographically. Bioacoustics. 2013;22(2):137–151.
42. Ptáček L, Machlica L, Linhart P, Jaška P, Muller L. Automatic recognition of bird individuals on an open set using as-is recordings. Bioacoustics. 2016;25(1):55–73.
43. Stowell D, Stylianou Y, Wood M, Pamuła H, Glotin H. Automatic acoustic detection of birds through deep learning: the first Bird Audio Detection challenge. ArXiv e-prints. 2018 Jul.
44. Lasseck M. Audio-based Bird Species Identification with Deep Convolutional Neural Networks. Working Notes of CLEF. 2018.
45. Grava T, Mathevon N, Place E, Balluet P. Individual acoustic monitoring of the European Eagle Owl Bubo bubo. Ibis. 2008;150:279–287.
46. Petrusková T, Osiejuk TS, Linhart P, Petrusek A. Structure and Complexity of Perched and Flight Songs of the Tree Pipit (Anthus trivialis). Annales Zoologici Fennici. 2008 Apr;45(2):135–148. Available from: https://doi.org/10.5735%2F086.045.0205.
47. Průchová A, Jaška P, Linhart P. Cues to individual identity in songs of songbirds: testing general song characteristics in Chiffchaffs Phylloscopus collybita. Journal of Ornithology. 2017 Apr. Available from: https://doi.org/10.1007%2Fs10336-017-1455-6.
48. Nieuwenhuyse DV, Génot JC, Johnson DH. The Little Owl: Conservation, Ecology and Behavior of
Athene noctua. Cambridge University Press; 2008.
49. Linhart P, Šálek M. The assessment of biases in the acoustic discrimination of individuals. PLoS ONE. 2017;12(5):e0177206.
50. Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems (NIPS); 2012. p. 1097–1105. Available from: http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.
51. Cireşan D, Meier U, Schmidhuber J. Multi-column deep neural networks for image classification. arXiv preprint arXiv:1202.2745. 2012.
52. Schlüter J, Grill T. Exploring Data Augmentation for Improved Singing Voice Detection with Neural Networks. In: Proceedings of the International Conference on Music Information Retrieval (ISMIR); 2015. p. 121–126.
53. Salamon J, Bello JP. Deep convolutional neural networks and data augmentation for environmental sound classification. IEEE Signal Processing Letters. 2017;24(3):279–283.
54. Stowell D, Plumbley MD. Automatic large-scale classification of bird sounds is strongly improved by unsupervised feature learning. PeerJ. 2014;2:e488.
55. Fawcett T. An introduction to ROC analysis. Pattern Recognition Letters. 2006;27(8):861–874.
56. Fournier DA, Skaug HJ, Ancheta J, Ianelli J, Magnusson A, Maunder MN, et al. AD Model Builder: using automatic differentiation for statistical inference of highly parameterized complex nonlinear models. Optim Methods Softw. 2012;27:233–249.
57. Skaug H, Fournier D, Bolker B, Magnusson A, Nielsen A. Generalized Linear Mixed Models using 'AD Model Builder'; 2016-01-19. R package version 0.8.3.3.
58. Beecher MD, Brenowitz EA. Functional aspects of song learning in songbirds. Trends in Ecology & Evolution. 2005;20(3):143–149.