An embedded multichannel sound acquisition system for drone audition
Michael Clayton, Lin Wang, Andrew McPherson, Andrea Cavallaro
Manuscript received: December 25, 2020. The authors are with the Centre for Intelligent Sensing, Queen Mary University of London, London, UK (e-mail: {m.p.clayton, lin.wang, a.mcpherson, a.cavallaro}@qmul.ac.uk).
Abstract—Microphone array techniques can improve the acoustic sensing performance on drones, compared to the use of a single microphone. However, multichannel sound acquisition systems are not available in current commercial drone platforms. To encourage research in drone audition, we present an embedded sound acquisition and recording system with eight microphones and a multichannel sound recorder mounted on a quadcopter. In addition to recording and storing the sound from multiple microphones simultaneously and locally, the embedded system can connect wirelessly to a remote terminal to transfer audio files for further processing. This is the first stage towards creating a fully embedded solution for drone audition. We present experimental results obtained by state-of-the-art drone audition algorithms applied to the sound recorded by the embedded system.
Index Terms—Drone audition, microphone array, embedded system
I. INTRODUCTION
The use of drones for remote sensing has substantially increased in the past decade, with operation in broadcasting, surveillance, inspection, and search and rescue [1]. Sensing is primarily based on cameras (optical and thermal) and lasers [2]-[6], whereas microphones are rarely used because of the inherently challenging sound sensing conditions [7]. When visual data is unreliable due to low light, poor weather conditions or visual obstructions [8], drone audition would greatly benefit the above-mentioned applications. One of the main obstacles when capturing audio on a drone is the strong ego-noise created by the rotating motors, the propellers and the airflow during flight. The ego-noise masks the target sound sources and causes poor recording quality.

Microphone array techniques can be used to improve drone audition performance through sound enhancement [9]-[14] and sound source localization [15]-[18], [20]. An important bottleneck for deploying microphone array algorithms on drones is the requirement of a multichannel sound acquisition system that samples the sound from multiple microphones simultaneously and converts it to multichannel digital signals before further processing. The sound acquisition system needs to fly with the drone, which imposes additional constraints on its size and weight. To the best of our knowledge, there is no dedicated multichannel sound acquisition device available in current commercial drone platforms. Researchers have to design and implement their own hardware systems for data collection on drones, and the processing of the data is often done offline after the flight due to the limited computational resources onboard.
To conduct and encourage research in the field of drone audition, we designed an embedded multichannel sound acquisition system that is suitable for drone audition and can be mounted on a drone for acoustic sensing during flight. The system is built on Bela [21], an embedded computing platform dedicated to audio processing, and can accommodate up to eight microphones placed in arbitrary shapes. The system can record and store the sound locally for on-device processing, and can also transfer the recorded sound file via wireless communication to a remote terminal. In the remainder of the paper, we disclose the technical details of the hardware and software design and development.

The paper is organized as follows. Sec. II reviews related work. Sec. III and Sec. IV present the hardware and software design of the embedded system, respectively. Sec. V presents real data collection with the hardware and baseline processing results with state-of-the-art drone audition algorithms.

II. RELATED WORK
Three types of audio hardware have been employed for sound acquisition on drones: portable multichannel sound recorders, intelligent multichannel voice interfaces, and multichannel sound cards (Table I).
1) Portable multichannel sound recorder: This is the easiest way to capture sound from drones as there is no requirement for any configuration of the system, e.g. Zoom H6 [11] and Zoom R24 [7], [27]. The hardware supports arbitrary array topologies. One drawback is that the hardware can only record and does not support sound processing. Another drawback is that the hardware, e.g. the Zoom R24, is usually too heavy a payload for the drone to fly.
2) Intelligent multichannel voice interface: This type of hardware integrates the microphone array and sound processing into a compact IC board, e.g. ReSpeaker [23] and UMA-8 [24]. It usually requires an additional controller, e.g. a Raspberry Pi, for sound acquisition and processing, and is usually easy to use and configure for audio purposes. A main advantage is that the hardware is very compact and lightweight, and is thus suitable to fly with the drone. The drawback is that the topology of the array is fixed, which limits the performance and flexibility of microphone array algorithms.
3) Multichannel sound card: This is the most popular approach for sound recording on drones, using e.g. the RASP series [18], [19], [26], 8SoundsUSB [17], [28], or USBStreamer [25]. This hardware supports an arbitrary array topology along with sound acquisition and sound processing. The main drawback is that the user requires knowledge of the hardware circuit design. This hardware also requires an operating system to control sound recording and processing; e.g., the RASP series is used in combination with the HARK system [29]. A good understanding of the back-end driver is necessary, and the lack of related resources can be intimidating for algorithm designers.

TABLE I. EXISTING MULTICHANNEL SOUND ACQUISITION SYSTEMS ON DRONES. Q - QUADCOPTER; H - HEXACOPTER; O - OCTOCOPTER.
Ref      | Number of microphones | Shape of the array | Placement of the array | Audio interface                 | Drone type            | Remark
[11]     | 6                     | T-shape            | Side                   | Zoom H6                         | Self-assembled (Q)    | Portable recorder
[7]      | 8                     | Circular           | Top                    | Zoom R24                        | 3DR Iris (Q)          | Portable recorder
[23]     | 6                     | Circular (fixed)   | Top                    | ReSpeaker + Raspberry Pi        | Self-assembled (O)    | Intelligent voice interface
[24]     | 7                     | Circular (fixed)   | Side                   | UMA-8 mic array + Raspberry Pi  | Self-assembled (Q)    | Intelligent voice interface
[25]     | 8                     | Circular           | Below                  | MiniDSP USBStreamer I2S-to-USB  | Matrice 100 (Q)       | Sound card
[17]     | 8                     | Cubic              | Below                  | 8SoundsUSB                      | MK-Quadro (Q)         | Sound card
[28]     | 8                     | Circular           | Top, below, side       | 8SoundsUSB                      | Matrice 100 (Q)       | Sound card
[19]     | 12                    | Spherical          | Side                   | RASP-ZX                         | Surveyor MS-06LA (H)  | Sound card
[26]     | 8                     | Circular           | Side                   | RASP-24                         | Parrot AR Drone (Q)   | Sound card
[18]     | 16                    | Octagon            | Side                   | RASP-ZX                         | Surveyor MS-06LA (H)  | Sound card
Proposed | 8                     | Circular (arbitrary supported) | Top        | Bela + CTAG BEAST               | Matrice 100 (Q)       | Embedded system
Fig. 1. Architecture of the multichannel sound acquisition system.

III. HARDWARE DESIGN
Fig. 1 and Fig. 2 illustrate the architecture and the physical assembly of the multichannel sound acquisition system, respectively. The system consists of three main parts: the microphone array, the drone, and the hardware tray containing the Bela sound acquisition system and the cables. Table II lists the components used by the system. Fig. 3 illustrates the Bela hardware system assembly and the peripheral connections.
A. Microphone array and drone
We use a circular microphone array consisting of eight Boya BY-M1 lapel microphones, each powered by an LR44 (1.5 V) battery. The microphones provide a balanced audio signal. The diameter of the array is 16.5 cm. The microphone array frame is 3D printed from acrylonitrile butadiene styrene (ABS). The array is mounted on top of the drone to avoid the airflow from the rotating propellers blowing downward [30]. The vertical distance from the array to the drone body is 18 cm. For the drone, we use a DJI Matrice 100, which has a payload capacity of 1 kg.
Fig. 2. The multichannel sound acquisition system. (a) Front view; (b) Side view; (c) Top view; (d) Microphone array.
B. Bela-based sound acquisition system
The sound acquisition system consists of three units: the core processing unit, the storage and transmission unit, and the hardware tray. Fig. 3 illustrates the hardware connections.
1) Core processing unit:
The core processing unit consists of one BeagleBone device flashed with the latest Bela software. To access multichannel audio, Bela uses a customized expansion board called CTAG BEAST, featuring an audio codec with 4 audio input and 8 audio output channels. One CTAG BEAST consists of two CTAG FACE capes pre-configured for use as a BEAST [31], two CTAG Molex breakout boards, and one external LiPo battery.

Bela is an embedded audio programming and processing platform developed at Queen Mary University of London [21]. Its compact size, light weight, low latency and multichannel sound acquisition make it suitable for sound processing on drones [22]. Bela also comes with a user-friendly browser-based Integrated Development Environment (IDE), which allows easy editing, building and managing of the system. For these reasons, we developed the multichannel sound acquisition system based on the Bela device. This is the first time the Bela system has been applied to a robotic platform to assist audition.
TABLE II. COMPONENTS USED IN THE HARDWARE SYSTEM.

Component                         | Type               | Functionality
Drone                             | Matrice 100        | /
Microphones (8)                   | Lapel microphones  | /
Array frame                       | 3D printed         | Holds the microphones
Hardware tray                     | 3D printed         | Holds the hardware and cables
Bela                              | BeagleBone Black   | 1 GHz ARM Cortex-A8 processor
CTAG BEAST (2)                    | /                  | Multichannel audio acquisition
CTAG Molex breakout board (2)     | /                  | Audio inputs
Molex to 3.5 mm adapter cable (4) | /                  | Connects microphones to Bela
Mono to stereo adapter cable (4)  | /                  | Splits stereo signal to mono signals
USB LiPo battery                  | 5 V, 2 A           | Provides power to Bela and CTAG BEAST
USB storage                       | /                  | Saves recorded audio files locally
WiFi dongle                       | /                  | Wireless connection to the Bela IDE
Fig. 3. Bela multichannel audio hardware system assembly and peripherals. The core processing unit is highlighted in the red box.
Bela is a dedicated audio processing platform based on the BeagleBone Black (BBB) single-board computer, which features a 1 GHz ARM Cortex-A8 processor, two Programmable Realtime Units (PRUs), 512 MB of RAM, and a diverse range of on-board peripherals. Bela controls the sound acquisition and the audio processing. Bela is externally powered by a 5 V, 2 A LiPo USB battery for stability and for powering the USB peripherals.
Fig. 4. Interacting with Bela from a local computer via the wireless network.
The Bela configuration itself only requires 5 V / 300-400 mA for operation. The audio codec operates at a 48 kHz sampling rate with 16-bit analogue-to-digital (ADC) and digital-to-analogue (DAC) conversion. To accommodate the eight microphone inputs, two CTAG FACE capes are stacked on top of each other and connected to Bela via the onboard metal contacts.
2) Storage and wireless unit:
An external USB hub is connected to the USB socket of the BeagleBone device. The hub accommodates a USB storage stick, which stores the recordings locally, and a USB WiFi dongle, which eliminates the need for a hard-wired connection to the system IDE and enables the recorded audio to be transferred to a remote processing terminal.
3) Hardware tray:
A hardware tray is designed to accommodate the Bela system and the cables. The tray contains a Bela enclosure (made from ABS) and a shock case (made from thermoplastic polyurethane, TPU) to help protect the hardware from impacts in the event of a crash. The tray is produced with 3D printing.

IV. SOFTWARE DESIGN
The software design has three objectives: to run the code on a stand-alone device; to record the sound locally to the USB storage; and to transfer the sound via WiFi to a remote terminal. All three objectives are achieved with the assistance of the Bela Integrated Development Environment (IDE).
A. IDE for stand-alone processing
The Bela IDE (Fig. 5) is a browser-based integrated development environment with features that allow projects to be edited, built and managed easily from a ground station (remote terminal) via a self-organized wireless network. The IDE software is pre-installed on the Bela device, along with an operating system. Following the steps in the tutorial (∼linwang/download/bela_documentation.pdf), we set up a self-organized wireless network through a WiFi dongle mounted on the Bela device. Upon system boot, Bela starts a NodeJS server that allows connection to its system from a ground station via the wireless network. The WiFi is set up as a peer-to-peer connection to ensure that the board acts as a dynamic host configuration protocol (DHCP) server.

To connect to the Bela device from the ground station, we first select the WiFi network hosted by the Bela system. After connection, the IDE can be loaded by entering the IP address of the host device in a web browser. The IDE interface (Fig. 5) then appears in the web browser of the ground station.

Fig. 5. Bela Integrated Development Environment.
After compiling and building, a project can be run from the IDE. The project can also be set to run on boot in the IDE Settings tab by selecting the desired project in the drop-down menu. The program then operates on Bela without connecting to the ground station, as long as external power is provided.
B. Sound recording
The code that enables Bela to function as a multichannel recording device is written in C++. This allows for quick access in the event that the system requires any modifications, creating a flexible system. The source code for sound recording is given in the Appendix, with the processing flow shown in Fig. 6. In brief, after importing the required libraries and configuring the global variables and file path, the program sets up the recording task to capture the multichannel audio data, writes the stream to the audio buffer (a memory block), and stores the data at the pre-defined file path. Once the recording is finished, a clean-up function finalizes the writing process and closes the file.

The IDE enables the user to start/stop recording, change settings and download audio files directly from the system, among other features. After building and running the project, the recording starts by writing the digital audio to the specified system path. Pressing the stop icon in the IDE stops the recording process. The audio data is continuously written to the local storage during recording. Once the stop button is pressed, the .wav file is finalised and closed.
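For reference, the flow described above can be condensed into the following sketch, abridged from the full listing in the Appendix (error handling and the unique-filename logic are omitted; the file path and task names here are illustrative):

    #include <Bela.h>
    #include <libraries/Pipe/Pipe.h>
    #include <libraries/sndfile/sndfile.h>

    Pipe gPipe;                    // real-time-safe queue between threads
    SNDFILE* gFile;                // output .wav file handle
    AuxiliaryTask gWriteTask;      // low-priority disk-writing task
    unsigned int gNumItems;        // samples per block (frames x channels)

    void writeToDisk(void*) {      // runs in the non-real-time thread
        float buf[gNumItems];
        int ret;
        while ((ret = gPipe.readNonRt(buf, gNumItems)) > 0)
            sf_write_float(gFile, buf, ret);        // append block to .wav
    }

    bool setup(BelaContext* context, void* arg) {
        gNumItems = context->audioFrames * context->audioInChannels;
        SF_INFO info = {};                          // describe the output file
        info.channels   = context->audioInChannels;
        info.samplerate = context->audioSampleRate;
        info.format     = SF_FORMAT_WAV | SF_FORMAT_PCM_16;
        gFile = sf_open("/mnt/usb/recording.wav", SFM_WRITE, &info);
        gPipe.setup("recorder", 65536, false, false);
        gWriteTask = Bela_createAuxiliaryTask(&writeToDisk, 90, "write-to-disk");
        return gFile != NULL && gWriteTask != 0;
    }

    void render(BelaContext* context, void* arg) {  // real-time audio callback
        gPipe.writeRt(context->audioIn, gNumItems); // push interleaved input
        Bela_scheduleAuxiliaryTask(gWriteTask);     // wake the disk writer
    }

    void cleanup(BelaContext* context, void* arg) {
        sf_write_sync(gFile);                       // flush file cache to disk
        sf_close(gFile);
    }

The key design choice is the split between the real-time audio thread, which only pushes samples into a lock-free pipe, and a lower-priority auxiliary task that performs the (potentially blocking) file writes.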
C. WiFi Network Connection
The WiFi connection enables the user to access the Bela system through the IDE without a hard-wired connection. Instead of recording the audio to the USB storage, we can alter the target file path to the default RAM memory of the Bela device (see Appendix) in order to have the file appear and update during the recording process within the resources section of the project explorer tab in the IDE.
Fig. 6. Processing flow for sound recording with Bela.

The network connection is continuous, which allows the user to change different functions in the IDE. The current WiFi signal achieves an operational range of 20 metres between the ground station and the drone. When the network connection is lost momentarily, the IDE user interface stops updating the status of the running project and, depending on the selected file path, the file size of the current recording. When the drone comes back into WiFi signal range and the wireless network is re-established, the recording has continued uninterrupted on the system and the IDE user interface resumes updating the recording progress of the file. The WiFi connection is thus not required to conduct the recording itself, but only to monitor its progress.

V. EXPERIMENT
A. Setup
To verify the validity of the developed hardware system, we conduct in-flight testing and recording. We record the ego-noise and the speech separately. When recording the ego-noise, the altitude of the drone during flight is maintained at about 2 meters above the ground via the flight controller (Fig. 7). We record two types of ego-noise: drone hovering and drone moving. In the former case, the drone hovers in the air using the GPS-stabilised mode with additional manual input (correcting small drift) to keep the drone reasonably stable throughout the recording. In the latter case, the drone moves in the air at a speed of around 1 meter/second, with random rotation and tilting during flight. When recording the speech-only data, the drone is muted on the ground and a loudspeaker plays sound at a distance of 2 meters. The original sampling rate is 48 kHz. The audio is downsampled to 8 kHz before processing, as sketched below. All the analysis is completed offline and not on the Bela system. The sound recording is available online (∼linwang/bela.html).
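The paper does not specify the resampling method; as one plausible implementation, the following minimal sketch applies a Hamming-windowed-sinc anti-aliasing low-pass filter (cutoff at the new Nyquist frequency of 4 kHz) and then decimates by a factor of 6 (48 kHz to 8 kHz):

    #include <cmath>
    #include <vector>

    // Downsample a 48 kHz mono signal to 8 kHz. Only the kept samples
    // are filtered, so the convolution runs once per output sample.
    std::vector<float> downsampleTo8k(const std::vector<float>& x) {
        const double PI = 3.14159265358979323846;
        const int M = 6;               // decimation factor: 48 kHz / 8 kHz
        const int taps = 121;          // FIR length (odd, linear phase)
        const double fc = 0.5 / M;     // normalized cutoff (cycles/sample)
        std::vector<double> h(taps);
        for (int n = 0; n < taps; ++n) {
            const double k = n - (taps - 1) / 2.0;   // centered index
            const double sinc = (k == 0.0)
                ? 2.0 * fc
                : std::sin(2.0 * PI * fc * k) / (PI * k);
            const double w = 0.54 - 0.46 * std::cos(2.0 * PI * n / (taps - 1));
            h[n] = sinc * w;                          // windowed ideal LPF
        }
        std::vector<float> y;
        y.reserve(x.size() / M + 1);
        for (std::size_t i = 0; i < x.size(); i += M) {   // decimate by M
            double acc = 0.0;
            for (int n = 0; n < taps; ++n) {              // causal convolution
                const long j = static_cast<long>(i) - n;
                if (j >= 0) acc += h[n] * x[static_cast<std::size_t>(j)];
            }
            y.push_back(static_cast<float>(acc));
        }
        return y;
    }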
B. Ego-noise analysis

Fig. 7. The drone with the microphone array hovering in the air during recording.

Fig. 8(a) depicts the time-domain waveform of the ego-noise recorded in the hovering and the moving state. Fig. 8(c) plots the time-frequency spectrogram, which is computed with a moving window of 128 ms and half overlap. Fig. 8(b) plots the power at each time frame. Fig. 8(d) plots the frequency-domain spectrum at the 25th and the 35th second of the two ego-noise recordings, respectively.

From Fig. 8(b), the power of the ego-noise does not differ much between the hovering and the moving state. The mean and standard deviation of the power across time frames (10 s - 40 s) are -29.9 dB and 0.41 dB, respectively, for hovering. The mean and standard deviation of the power across time frames (10 s - 60 s) are -29.6 dB and 0.42 dB, respectively, for moving. For the moving state, we observe a sudden rise of the power at 45 s, which is possibly due to a rotation of the drone.

From Fig. 8(c), it can be observed that the ego-noise consists of multiple harmonics. Since the four motors may operate at slightly different rotating speeds, the harmonic ego-noise presents several pitches, which can be verified from Fig. 8(d). In the hovering state, the pitch of the ego-noise remains stable. In the moving state, the pitch of the ego-noise varies with time, depending on the flight status of the drone.

The recording is made in an outdoor environment with a light breeze present. However, from the spectrogram of the recording we do not observe an evident influence of the wind at low frequencies. This is possibly due to the windshield worn by each microphone and the placement of the microphones on top of the drone.

C. Processing results
We synthesize a noisy signal at the microphones by adding the ego-noise (hovering state) and the speech at different input SNRs, which vary from -35 dB to 0 dB with an interval of 5 dB. The testing signal is 25 seconds long. We employ a block-wise processing strategy, using a non-overlapping sliding block of 4 seconds. For simplicity, we only verify the performance of two benchmark spatial filters that enhance the target sound from the ego-noise. The first spatial filter is a beamformer based on multichannel Wiener filtering [14], which computes the correlation matrices of the target sound and the noise separately, assuming the speech-only and noise-only signals are available. The second spatial filter is based on blind source separation (BSS) [13], assuming the permutation ambiguities can be perfectly solved by referencing the speech-only signals. The speech enhancement performance is evaluated with the SNR measure, which is defined, given speech $s(n)$ and noise $v(n)$, as [32]

$$\mathrm{SNR} = 10 \log_{10} \frac{\sum_n s^2(n)}{\sum_n v^2(n)} \quad (1)$$

We average the output SNR across all the processing blocks (see the sketch at the end of this subsection).

Fig. 8. Visualization of the ego-noise when the drone is hovering and moving. (a) Time-domain waveform; (b) Power plot; (c) Time-frequency spectrogram; (d) Frequency-domain plot.

Fig. 9. Benchmark performance achieved by the two spatial filters at various input SNRs.

Fig. 9 depicts the output SNR achieved by the two spatial filters at different input SNRs. BSS performs slightly better than the beamformer when the input SNR is lower than -15 dB, while the beamformer performs better at higher input SNRs. On average, the two spatial filters improve the SNR by about 20 dB.

Fig. 10 illustrates exemplar processing results at input SNR -20 dB with the beamformer. The output SNR is 3.1 dB. Fig. 10(b) shows that the speech signal is completely buried in the ego-noise in the time-domain waveform and is not distinguishable in the time-frequency spectrogram. Fig. 10(c) shows the enhanced speech after processing, where the speech is clearly visible in the time-frequency spectrogram.

Fig. 10. Processing results (beamformer) for input SNR -20 dB. The output SNR is 3.1 dB. (a) Clean speech; (b) Noisy signal before processing; (c) Noisy signal after processing.

It should be noted that the two spatial filters are estimated under ideal assumptions (i.e., the correlation matrices of the target and the noise are known) and thus set a benchmark for the performance of spatial filtering. In practice, the correlation matrices of the target and the noise have to be estimated from the noisy data, which leads to a performance drop in low-SNR scenarios [14]. A comprehensive evaluation is left for future work.
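As a concrete reading of Eq. (1) and the block-wise averaging, the following minimal sketch computes the per-block SNR from time-aligned speech and noise components (available here because of the oracle evaluation setup) and averages the per-block values in dB; this is an illustration under our reading of the text, not the authors' evaluation code:

    #include <cmath>
    #include <vector>

    // SNR in dB per Eq. (1), given time-aligned speech s(n) and noise v(n)
    // components of one processed block.
    double snrDb(const std::vector<double>& s, const std::vector<double>& v) {
        double ps = 0.0, pv = 0.0;
        for (double x : s) ps += x * x;   // speech energy: sum_n s^2(n)
        for (double x : v) pv += x * x;   // noise energy:  sum_n v^2(n)
        return 10.0 * std::log10(ps / pv);
    }

    // Output SNR averaged across the non-overlapping 4 s processing blocks.
    double meanSnrDb(const std::vector<std::vector<double>>& speechBlocks,
                     const std::vector<std::vector<double>>& noiseBlocks) {
        double sum = 0.0;
        for (std::size_t b = 0; b < speechBlocks.size(); ++b)
            sum += snrDb(speechBlocks[b], noiseBlocks[b]);
        return sum / speechBlocks.size();
    }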
VI. CONCLUSION
We present an embedded multichannel sound acquisition system that can fly with a drone. The system can accommodate up to 8 microphones placed in an arbitrary shape, record the sound locally, and transfer the recorded file to a remote terminal via a self-organized wireless network. Experimental results with recordings made with this hardware verify its validity. This is the first stage towards creating a fully embedded solution for drone audition. Future work will be to conduct a comprehensive evaluation of state-of-the-art algorithms for ego-noise reduction and to optimize the code for real-time processing on Bela, which is able to process audio at very low latency [21].
    //render.cpp
    #include <Bela.h>
    #include <libraries/Pipe/Pipe.h>
    #include <libraries/sndfile/sndfile.h>
    #include <libraries/WriteFile/WriteFile.h>

    const char* path = "/mnt/usb/audioCh.wav";
    SNDFILE* outfile2;
    char originalFilename[] = "/mnt/usb/audioCh.wav";
    char* uniqueFilename = WriteFile::generateUniqueFilename(originalFilename);

    //Audio
    AuxiliaryTask gFillBufferTask;
    SNDFILE* outfile;
    unsigned int gAudioFrames;
    unsigned int gAudioInChannels;
    float gAudioSampleRate;
    Pipe gPipe;

    void openFile() {
        SF_INFO sfinfo;
        sfinfo.channels = gAudioInChannels;
        sfinfo.samplerate = gAudioSampleRate;
        sfinfo.format = SF_FORMAT_WAV | SF_FORMAT_PCM_16;
        outfile2 = sf_open(uniqueFilename, SFM_WRITE, &sfinfo);
    }

    void closeFile() {
        sf_write_sync(outfile2);
        sf_close(outfile2);
        printf(".wav file written and closed\n");
    }

    void writeBuffer(void*) {
        unsigned int numItems = gAudioFrames * gAudioInChannels;
        float buf[numItems];
        int ret;
        while ((ret = gPipe.readNonRt(buf, numItems)) > 0) {
            sf_write_float(outfile2, &buf[0], ret);
        }
    }

    bool setup(BelaContext* context, void* arg) { //setup audio frames and channels
        gAudioSampleRate = context->audioSampleRate;
        gAudioFrames = context->audioFrames;
        gAudioInChannels = context->audioInChannels;
        gPipe.setup("sndfile-write", 65536, false, false);
        openFile();
        if ((gFillBufferTask = Bela_createAuxiliaryTask(&writeBuffer, 90, "writeBuffer")) == 0) {
            return false;
        }
        return true;
    }

    void render(BelaContext* context, void* arg) {
        gPipe.writeRt(context->audioIn, context->audioFrames * context->audioInChannels);
        Bela_scheduleAuxiliaryTask(gFillBufferTask);
    }

    void cleanup(BelaContext* context, void* arg) {
        closeFile();
        free(uniqueFilename);
    }

Fig. 11. Source code in render.cpp.
APPENDIX: SOUND RECORDING
Fig. 11 lists the C++ source code file that is used for multichannel sound recording with the Bela device. There are several crucial configurations for sound recording: the file path, the number of channels and the sampling rate. The file path can be configured by setting the global variable const char* path in the source code; e.g., const char* path = "/mnt/usb/audiofilename" sets the USB storage as the file path. The system automatically recognizes the number of active audio inputs, so the number of channels does not need to be configured. The sampling rate can be configured by setting the global variable gAudioSampleRate. The ADC and DAC gains are adjustable within the IDE Settings tab. Once the recording is finished, the file on the USB storage can be downloaded through the IDE after copying it to the project folder, e.g. using the command cp /mnt/usb/audiofile.wav /root/Bela/projects/projectname/. Alternatively, we can remove the USB storage from Bela and insert it into a computer for data transfer.

A breakdown and interpretation of the source code is given below.

    #include <Bela.h>
    #include <libraries/Pipe/Pipe.h>
    #include <libraries/sndfile/sndfile.h>
    #include <libraries/WriteFile/WriteFile.h>

There are four main library imports used in the code. Bela.h is the central control code for hard real-time audio on the BeagleBone Black using the PRU and Xenomai Linux extensions. The Pipe library enables the use of a bi-directional pipe that allows data to be exchanged between a real-time and a non-real-time thread. The WriteFile library is imported to enable the use of the generateUniqueFilename function, which returns a unique filename by appending a number to the end of the original filename; this is important in order to avoid overwriting existing recordings. The sndfile library provides the libsndfile API (application programming interface), which is designed to allow the reading and writing of many different sampled sound file formats.

    const char* path = "/mnt/usb/audioCh.wav";
    SNDFILE* outfile2;
    char originalFilename[] = "/mnt/usb/audioCh.wav";
    char* uniqueFilename = WriteFile::generateUniqueFilename(originalFilename);
The file path is created first; in this case the USB storage is used as the main storage. outfile2 is the reference for the SNDFILE pointer. The path to the original filename is assigned, and then the generateUniqueFilename function is called on the original filename to return a unique filename. Instead of recording the audio to the USB storage, the file path can be altered to const char* path = "./audiofilename" in order to have the file appear and update during the recording process within the resources section of the project explorer tab in the IDE.
    AuxiliaryTask gFillBufferTask;
    unsigned int gAudioFrames;
    unsigned int gAudioInChannels;
    float gAudioSampleRate;
    Pipe gPipe;
The global variables are established for use later in the code. gFillBufferTask is an AuxiliaryTask variable that is used to write to the audio buffer. gPipe, gAudioSampleRate, gAudioInChannels and gAudioFrames are global variables that use the behaviours defined in their corresponding library classes.

    void openFile() {
        SF_INFO sfinfo;
        sfinfo.channels = gAudioInChannels;
        sfinfo.samplerate = gAudioSampleRate;
        sfinfo.format = SF_FORMAT_WAV | SF_FORMAT_PCM_16;
        outfile2 = sf_open(uniqueFilename, SFM_WRITE, &sfinfo);
    }

The openFile function fills the SF_INFO structure with the specified file format, sample rate and number of channels. The sf_open function opens the sound file at the specified path in write-only mode (SFM_WRITE), using the sfinfo structure for passing data between the calling function and the library when opening the file for writing.

    void closeFile() {
        sf_write_sync(outfile2);
        sf_close(outfile2);
        printf(".wav file written and closed\n");
    }

The closeFile function writes the file to disk and closes it. For a file opened with SFM_WRITE, sf_write_sync calls the operating system's function to force the writing of all file cache buffers to disk. sf_close closes the file, deallocates its internal buffers, and returns 0 on success or an error value otherwise.

    void writeBuffer(void*) {
        unsigned int numItems = gAudioFrames * gAudioInChannels;
        float buf[numItems];
        int ret;
        while ((ret = gPipe.readNonRt(buf, numItems)) > 0) {
            sf_write_float(outfile2, &buf[0], ret);
        }
    }

The writeBuffer function writes the buffered data to the file. It calculates the number of items by multiplying the number of audio frames by the number of audio input channels, declares a buffer array to hold these items, and declares an integer variable to hold the number of items actually read. The while loop runs as long as readNonRt, which reads data from the non-real-time side of the pipe, returns items. The sf_write_float function writes the data in the array to the file pointer. For items-count functions, the items parameter specifies the size of the array and must be an integer product of the number of channels, or an error will occur.

    bool setup(BelaContext* context, void* arg) {
        gAudioSampleRate = context->audioSampleRate;
        gAudioFrames = context->audioFrames;
        gAudioInChannels = context->audioInChannels;
        gPipe.setup("sndfile-write", 65536, false, false);
        openFile();
        if ((gFillBufferTask = Bela_createAuxiliaryTask(&writeBuffer, 90, "writeBuffer")) == 0) {
            return false;
        }
        return true;
    }

The setup function is a user-defined initialisation function which runs once before audio rendering begins, after most of the system initialisation but before audio rendering starts. It is used to prepare any memory or resources that will be needed in render. The audio sample rate, frames and input channels are all read from the BelaContext structure. The pipe is set up with a specified size and with flags indicating whether reads on the real-time and non-real-time sides should be blocking. The openFile function is called to open the file for writing the audio data, and the auxiliary task is then created with the writeBuffer function as its callback.

    void render(BelaContext* context, void* arg) {
        gPipe.writeRt(context->audioIn, context->audioFrames * context->audioInChannels);
        Bela_scheduleAuxiliaryTask(gFillBufferTask);
    }

The render function is a user-defined callback function to process audio and sensor data. It is called regularly by the system every time there is a new block of audio and/or sensor data to process. The writeRt function writes data into the pipe from the real-time side. context->audioIn is the float* that points to all the input samples, stored as interleaved channels; for example, with 4 frames and 2 channels, audioIn holds 4 x 2 = 8 audio input samples. The previously created auxiliary task is then scheduled to run.

    void cleanup(BelaContext* context, void* arg) {
        closeFile();
        free(uniqueFilename);
    }

The cleanup function runs when the program finishes, to free up memory. It is called by the system once after audio rendering has finished, before the program quits, and is used to release any memory allocated in setup and to perform any other required cleanup; if no initialisation is performed in setup, this function will usually be empty. Here, the file is closed and the block of memory allocated by generateUniqueFilename is deallocated.
REFERENCES
[1] D. Floreano and R. J. Wood, "Science, technology and the future of small autonomous drones," Nature, vol. 521, no. 7553, pp. 460-466, 2015.
[2] G. Parascandolo, H. Huttunen, and T. Virtanen, "Recurrent neural networks for polyphonic sound event detection in real life recordings," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., Shanghai, China, 2016, pp. 6440-6444.
[3] S. Li and D. Yeung, "Visual object tracking for unmanned aerial vehicles: A benchmark and new motion models," in Proc. Thirty-First AAAI Conf. Artificial Intelligence, San Francisco, USA, 2017, pp. 4140-4146.
[4] P. Misra, A. A. Kumar, P. Mohapatra, and P. Balamuralidhar, "Aerial drones with location-sensitive ears," IEEE Commun. Mag., vol. 56, no. 7, pp. 154-160, Jul. 2018.
[5] L. Wang, R. Sanchez-Matilla, and A. Cavallaro, "Tracking a moving sound source from a multi-rotor drone," in Proc. IEEE/RSJ Int. Conf. Intell. Robots Sys., Madrid, Spain, 2018, pp. 2511-2516.
[6] R. Sanchez-Matilla, L. Wang, and A. Cavallaro, "Multi-modal localization and enhancement of multiple sound sources from a micro aerial vehicle," in Proc. ACM Multimedia, Silicon Valley, USA, 2017, pp. 1591-1599.
[7] L. Wang and A. Cavallaro, "Acoustic sensing from a multi-rotor drone," IEEE Sensors J., vol. 18, no. 11, pp. 4570-4582, Nov. 2018.
[8] A. Deleforge, D. Di Carlo, M. Strauss, R. Serizel, and L. Marcenaro, "Audio-based search and rescue with a drone: highlights from the IEEE signal processing cup 2019 student competition," IEEE Signal Process. Mag., vol. 36, no. 5, pp. 138-144, Sep. 2019.
[9] A. Schmidt, H. W. Lollmann, and W. Kellermann, "A novel ego-noise suppression algorithm for acoustic signal enhancement in autonomous systems," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., Calgary, Canada, 2018, pp. 6583-6587.
[10] B. Kang, H. Ahn, and H. Choo, "A software platform for noise reduction in sound sensor equipped drones," IEEE Sensors J., vol. 19, no. 21, pp. 10121-10130, Nov. 2019.
[11] Y. Hioka, M. Kingan, G. Schmid, R. McKay, and K. A. Stol, "Design of an unmanned aerial vehicle mounted system for quiet audio recording," Appl. Acoust., vol. 155, pp. 423-427, 2019.
[12] L. Wang and A. Cavallaro, "Deep learning assisted time-frequency processing for speech enhancement on drones," IEEE Trans. Emerging Topics Computational Intelligence, 2020, Early Access, DOI: 10.1109/TETCI.2020.3014934.
[13] L. Wang and A. Cavallaro, "A blind source separation framework for ego-noise reduction on multi-rotor drones," IEEE/ACM Trans. Audio Speech Lang. Process., vol. 28, pp. 2523-2537, 2020.
[14] L. Wang and A. Cavallaro, "Microphone-array ego-noise reduction algorithms for auditory micro aerial vehicles," IEEE Sensors J., vol. 17, no. 8, pp. 2447-2455, Aug. 2017.
[15] B. Yen and Y. Hioka, "Noise power spectral density scaled SNR response estimation with restricted range search for sound source localisation using unmanned aerial vehicles," EURASIP J. Audio Speech Music Process., vol. 2020, no. 1, pp. 1-26, 2020.
[16] L. Wang and A. Cavallaro, "Time-frequency processing for sound source localization from a micro aerial vehicle," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., New Orleans, USA, 2017, pp. 496-500.
[17] M. Strauss, P. Mordel, V. Miguet, and A. Deleforge, "DREGON: dataset and methods for UAV-embedded sound source localization," in Proc. IEEE/RSJ Int. Conf. Intell. Robot. Syst., Madrid, Spain, 2018, pp. 5735-5742.
[18] M. Wakabayashi, H. G. Okuno, and M. Kumon, "Drone audition listening from the sky estimates multiple sound source positions by integrating sound source localization and data association," Advanced Robotics, pp. 1-12, 2020.
[19] K. Hoshiba, K. Washizaki, M. Wakabayashi, T. Ishiki, M. Kumon, Y. Bando, D. Gabriel, K. Nakadai, and H. G. Okuno, "Design of UAV-embedded microphone array system for sound source localization in outdoor environments," Sensors, vol. 17, no. 11, pp. 1-16, Nov. 2017.
[20] K. Furukawa, K. Okutani, K. Nagira, T. Otsuka, K. Itoyama, K. Nakadai, and H. G. Okuno, "Noise correlation matrix estimation for improving sound source localization by multirotor UAV," in Proc. IEEE/RSJ Int. Conf. Intell. Robot. Syst., Tokyo, Japan, 2013, pp. 3943-3948.
[21] A. McPherson and V. Zappi, "An environment for submillisecond-latency audio and sensor processing on BeagleBone Black," in Proc. Audio Engineering Society Convention 138, 2015, pp. 1-7.
[22] A. McPherson, H. J. Robert, and G. Moro, "Action-sound latency: Are our tools fast enough?" in Proc. Int. Conf. New Interfaces Musical Expression, Brisbane, Australia, 2016, pp. 1-6.
[23] M. B. Andra, B. Rohman, and T. Usagawa, "Feasibility evaluation for keyword spotting system using mini microphone array on UAV," in Proc. IEEE International Geoscience Remote Sensing Symp., Yokohama, Japan, 2019, pp. 2264-2267.
[24] Z. W. Tan, A. H. Nguyen, and A. W. Khong, "An efficient dilated convolutional neural network for UAV noise reduction at low input SNR," in Proc. Asia-Pacific Signal Information Process. Association Annual Summit Conf., Lanzhou, China, 2019, pp. 1885-1892.
[25] D. Salvati, C. Drioli, G. Ferrin, and G. L. Foresti, "Acoustic source localization from multirotor UAVs," IEEE Trans. Industrial Electronics, vol. 67, no. 10, pp. 8618-8628, Oct. 2020.
[26] K. Okutani, T. Yoshida, K. Nakamura, and K. Nakadai, "Outdoor auditory scene analysis using a moving microphone array embedded in a quadrocopter," in Proc. IEEE/RSJ Int. Conf. Intell. Robot. Syst., Vilamoura-Algarve, Portugal, 2012, pp. 3288-3293.
[27] L. Wang, R. Sanchez-Matilla, and A. Cavallaro, "Audio-visual sensing from a quadcopter: dataset and baselines for source localization and sound enhancement," in Proc. IEEE/RSJ Int. Conf. Intell. Robots Sys., Macao, China, 2019, pp. 5320-5325.
[28] O. Ruiz-Espitia, J. Martinez-Carranza, and C. Rascon, "AIRA-UAS: An evaluation corpus for audio processing in unmanned aerial system," in Proc. Int. Conf. Unmanned Aircraft Systems, Dallas, USA, 2018, pp. 836-845.
[29] K. Nakadai, H. G. Okuno, and T. Mizumoto, "Development, deployment and applications of robot audition open source software HARK," J. Robotics and Mechatronics, vol. 29, no. 1, pp. 16-25, Jan. 2017.
[30] L. Wang and A. Cavallaro, "Ear in the sky: Ego-noise reduction for auditory micro aerial vehicles," in Proc. Int. Conf. Adv. Video Signal-Based Surveillance, Colorado Springs, USA, 2016, pp. 152-158.
[31] H. Langer and R. Manzke, "Embedded multichannel Linux audio system for musical applications," in Proc. Int. Audio Mostly Conf. Augmented Participatory Sound Music Experiences, London, UK, 2017, pp. 1-5.
[32] L. Wang, T. Gerkmann, and S. Doclo, "Noise power spectral density estimation using MaxNSR blocking matrix."