An embedded multichannel sound acquisition system for drone audition
Michael Clayton, Lin Wang, Andrew McPherson, Andrea Cavallaro
Manuscript received: December 25, 2020. The authors are with the Centre for Intelligent Sensing, Queen Mary University of London, London, UK (e-mail: {m.p.clayton, lin.wang, a.mcpherson, a.cavallaro}@qmul.ac.uk).
Abstract—Microphone array techniques can improve the acoustic sensing performance on drones, compared to the use of a single microphone. However, multichannel sound acquisition systems are not available in current commercial drone platforms. To encourage research in drone audition, we present an embedded sound acquisition and recording system with eight microphones and a multichannel sound recorder mounted on a quadcopter. In addition to recording and storing the sound from multiple microphones simultaneously and locally, the embedded system can connect wirelessly to a remote terminal to transfer audio files for further processing. This is the first stage towards creating a fully embedded solution for drone audition. We present experimental results obtained by state-of-the-art drone audition algorithms applied to the sound recorded by the embedded system.
Index Terms—Drone audition, microphone array, embedded system
I. INTRODUCTION
The use of drones for remote sensing has substantially increased in the past decade, with operation in broadcasting, surveillance, inspection, and search and rescue [1]. Sensing is primarily based on cameras (optical and thermal) and lasers [2]-[6], whereas microphones are rarely used because of the inherently challenging sound sensing conditions [7]. When visual data is unreliable due to low light, poor weather conditions or visual obstructions [8], drone audition would greatly benefit the above-mentioned applications. One of the main obstacles when capturing audio on a drone is the strong ego-noise created by the rotating motors, the propellers and the airflow during flight. The ego-noise masks the target sound sources and causes poor recording quality.

Microphone array techniques can be used to improve drone audition performance through sound enhancement [9]-[14] and sound source localization [15]-[18], [20]. An important bottleneck for deploying microphone array algorithms on drones is the requirement of a multichannel sound acquisition system that samples the sound from multiple microphones simultaneously and converts it to multichannel digital signals before further processing. The sound acquisition system needs to fly with the drone, which imposes additional constraints on its size and weight. To the best of our knowledge, there is no dedicated multichannel sound acquisition device available in current commercial drone platforms. Researchers have to design and implement their own hardware systems for data collection on drones, and the processing of the data is often done offline after the flight due to the limited computational resources onboard.
To conduct and encourage research in the field of drone audition, we designed an embedded multichannel sound acquisition system that is suitable for drone audition and can be mounted on a drone for acoustic sensing during flight. The system is built on Bela [21], an embedded computing platform dedicated to audio processing, and can accommodate up to eight microphones placed in arbitrary shapes. The system can record and store the sound locally for on-device processing, and can also transfer the recorded sound file via wireless communication to a remote terminal. In the remainder of the paper, we disclose the technical details of the hardware and software design and development.

The paper is organized as follows. Sec. II reviews related work. Sec. III and Sec. IV present the hardware and software design of the embedded system, respectively. Sec. V presents real data collection with the hardware and baseline processing results with state-of-the-art drone audition algorithms.

II. RELATED WORK
Three types of audio hardware have been employed for sound acquisition on drones: portable multichannel sound recorders, intelligent multichannel voice interfaces, and multichannel sound cards (Table I).
1) Portable multichannel sound recorder: This is the easiest way to capture sound from drones as there is no requirement for any configuration of the system, e.g. Zoom H6 [11] and Zoom R24 [7], [27]. The hardware supports arbitrary array topologies. One drawback is that the hardware can only record and does not support sound processing. Another drawback is that the hardware, e.g. the Zoom R24, is usually too heavy a payload for the drone to fly.
2) Intelligent multichannel voice interface: This type of hardware integrates the microphone array and sound processing into a compact IC board, e.g. ReSpeaker [23] and UMA-8 [24]. It usually requires an additional controller, e.g. a Raspberry Pi, for sound acquisition and processing, and is usually easy to use and configure for audio purposes. A main advantage is that the hardware is very compact and lightweight, and is thus suitable to fly with the drone. The drawback is that the topology of the array is fixed, which limits the performance and flexibility of microphone array algorithms.
3) Multichannel sound card: This is the most popular approach for sound recording on drones, using e.g. the RASP series [18], [19], [26], 8SoundsUSB [17], [28], or USBStreamer [25]. This hardware supports an arbitrary array topology along with sound acquisition and sound processing. The main drawback is that the user requires knowledge of the hardware circuit design. This hardware also requires an operating system to control sound recording and processing; e.g., the RASP series is used in combination with the HARK system [29]. A good understanding of the back-end driver is necessary, and the lack of related resources can be intimidating for algorithm designers.

TABLE I. EXISTING MULTICHANNEL SOUND ACQUISITION SYSTEMS ON DRONES. Q - QUADCOPTER; H - HEXACOPTER; O - OCTOCOPTER.
Ref      | Number of microphones | Shape of the array | Placement of the array | Audio interface                 | Drone type            | Remark
[11]     | 6                     | T-shape            | Side                   | Zoom H6                         | Self-assembled (Q)    | Portable recorder
[7]      | 8                     | Circular           | Top                    | Zoom R24                        | 3DR Iris (Q)          | Portable recorder
[23]     | 6                     | Circular (fixed)   | Top                    | ReSpeaker + Raspberry Pi        | Self-assembled (O)    | Intelligent voice interface
[24]     | 7                     | Circular (fixed)   | Side                   | UMA-8 mic array + Raspberry Pi  | Self-assembled (Q)    | Intelligent voice interface
[25]     | 8                     | Circular           | Below                  | MiniDSP USBStreamer I2S-to-USB  | Matrice 100 (Q)       | Sound card
[17]     | 8                     | Cubic              | Below                  | 8SoundsUSB                      | MK-Quadro (Q)         | Sound card
[28]     | 8                     | Circular           | Top, below, side       | 8SoundsUSB                      | Matrice 100 (Q)       | Sound card
[19]     | 12                    | Spherical          | Side                   | RASP-ZX                         | Surveyor MS-06LA (H)  | Sound card
[26]     | 8                     | Circular           | Side                   | RASP-24                         | Parrot AR Drone (Q)   | Sound card
[18]     | 16                    | Octagon            | Side                   | RASP-ZX                         | Surveyor MS-06LA (H)  | Sound card
Proposed | 8                     | Circular (arbitrary supported) | Top        | Bela + CTAG BEAST               | Matrice 100 (Q)       | Embedded system
Fig. 1. Architecture of the multichannel sound acquisition system.

III. HARDWARE DESIGN
Fig. 1 and Fig. 2 illustrate the architecture and the physical assembly of the multichannel sound acquisition system, respectively. The system consists of three main parts: the microphone array, the drone, and the hardware tray containing the Bela sound acquisition system and the cables. Table II lists the components used by the system. Fig. 3 illustrates the Bela hardware system assembly and the peripheral connections.
A. Microphone array and drone
We use a circular microphone array consisting of eight Boya BY-M1 lapel microphones, each powered by an LR44 (1.5 V) battery. The microphones provide a balanced audio signal. The diameter of the array is 16.5 cm. The microphone array frame is 3D printed from acrylonitrile butadiene styrene (ABS). The array is mounted on top of the drone to avoid the airflow from the rotating propellers blowing downward [30]. The vertical distance from the array to the drone body is 18 cm. For the drone, we use a DJI Matrice 100, which has a payload capacity of 1 kg.
Fig. 2. The multichannel sound acquisition system. (a) Front view; (b) Side view; (c) Top view; (d) Microphone array.
B. Bela-based sound acquisition system
The sound acquisition system consists of three units: the core processing unit, the storage and transmission unit, and the hardware tray. Fig. 3 illustrates the hardware connections.
1) Core processing unit:
The core processing unit consists of one BeagleBone device flashed with the latest Bela software. To access multichannel audio, Bela uses a customized expansion board called CTAG BEAST, featuring an audio codec with 4 audio input and 8 audio output channels. One CTAG BEAST consists of two CTAG FACE capes pre-configured for use as a BEAST [31], two CTAG Molex breakout boards, and one external LiPo battery.

Bela is an embedded audio programming and processing platform developed at Queen Mary University of London [21]. Its compact size, light weight, low latency and multichannel sound acquisition make it suitable for sound processing on drones [22]. Bela also comes with a user-friendly browser-based Integrated Development Environment (IDE), which allows easy editing, building and managing of the system. For these reasons, we developed the multichannel sound acquisition system based on the Bela device. This is the first time the Bela system has been applied to a robotic platform to assist audition.
TABLE II. COMPONENTS USED IN THE HARDWARE SYSTEM.

Component                         | Type               | Functionality
Drone                             | Matrice 100        | /
Microphones (8)                   | Lapel microphones  | /
Array frame                       | 3D printed         | Holds the microphones
Hardware tray                     | 3D printed         | Holds the hardware and cables
Bela                              | BeagleBone Black   | 1 GHz ARM Cortex-A8 processor
CTAG BEAST (2)                    | /                  | Multichannel audio acquisition
CTAG Molex breakout board (2)     | /                  | Audio inputs
Molex to 3.5 mm adapter cable (4) | /                  | Connects microphones to Bela
Mono to stereo adapter cable (4)  | /                  | Splits stereo signal to mono signals
USB LiPo battery                  | 5 V, 2 A           | Provides power to Bela and CTAG BEAST
USB storage                       | /                  | Saves recorded audio files locally
WiFi dongle                       | /                  | Wireless connection to the Bela IDE
Fig. 3. Bela multichannel audio hardware system assembly and peripherals. The core processing unit is highlighted in the red box.
Bela is a dedicated audio processing platform based on the BeagleBone Black (BBB) single-board computer, which features a 1 GHz ARM Cortex-A8 processor, two Programmable Realtime Units (PRUs), 512 MB of RAM, and a diverse range of on-board peripherals. Bela controls the sound acquisition and the audio processing. Bela is externally powered by a 5 V, 2 A LiPo USB battery for stability and for powering the USB peripherals.
Fig. 4. Interacting with Bela from a local computer via the wireless network.
The Bela configuration itself only requires 5 V / 300-400 mA for operation. The audio codec operates at a 48 kHz sampling rate with 16-bit analogue-to-digital (ADC) and digital-to-analogue (DAC) conversion. To accommodate the eight microphone inputs, two CTAG FACE capes are stacked on top of each other and connected to Bela via the onboard metal contacts.
2) Storage and wireless unit:
An external USB hub is connected to the USB socket of the BeagleBone device. The hub accommodates a USB storage stick, which stores the recordings locally, and a USB WiFi dongle, which eliminates the need for a hard-wired connection to the system IDE and enables the recorded audio to be transferred to a remote processing terminal.
3) Hardware tray:
A hardware tray is designed to accommodate the Bela system and the cables. The tray contains a Bela enclosure (made from ABS) and a shock case (made from thermoplastic polyurethane, TPU) to help protect the hardware from impacts in the event of a crash. The tray is produced with 3D printing.

IV. SOFTWARE DESIGN
The software design has three objectives: to run the code on a stand-alone device; to record the sound locally to the USB storage; and to transfer the sound via WiFi to a remote terminal. All three objectives are achieved with the assistance of the Bela Integrated Development Environment (IDE).
A. IDE for stand-alone processing
The Bela IDE (Fig. 5) is a browser-based integrated development environment with features that allow projects to be edited, built and managed easily from a ground station (remote terminal) via a self-organized wireless network. The IDE software is pre-installed on the Bela device, along with an operating system. Following the steps in the tutorial (∼linwang/download/bela_documentation.pdf), we set up a self-organized wireless network through a WiFi dongle mounted on the Bela device. Upon system boot, Bela starts a NodeJS server that allows connection to its system from a ground station via the wireless network. The WiFi is set up as a peer-to-peer connection to ensure that the board acts as a dynamic host configuration protocol (DHCP) server.

To connect to the Bela device from the ground station, we first select the WiFi network hosted by the Bela system. After connection, the IDE can be loaded by entering the IP address of the host device in a web browser. The IDE interface (Fig. 5) then appears in the web browser of the ground station.

Fig. 5. Bela Integrated Development Environment.
After compiling and building, a project can be run from the IDE. The project can also be set to run on boot in the IDE Settings tab by selecting the desired project in the drop-down menu. The program then operates on Bela without connecting to the ground station, as long as external power is provided.
B. Sound recording
The code that enables Bela to function as a multichannel recording device is written in C++. This allows for quick access in the event that the system requires any modifications, creating a flexible system. The source code for sound recording is given in the Appendix, with the processing flow shown in Fig. 6. In brief, after importing the required libraries and configuring the global variables and file path, the program sets up the recording task to capture the multichannel audio data, writes the stream to the audio buffer (a memory block), and stores the data at the pre-defined file path. Once the recording is finished, a clean-up function finalizes the writing process and closes the file.

The IDE enables the user to start/stop recording, change settings and download audio files directly from the system, among other features. After building and running the project, the recording starts by writing the digital audio to the specified system path. Pressing the stop icon in the IDE stops the recording process. The audio data is continuously written to the local storage during recording. Once the stop button is pressed, the .wav file is finalised and closed.
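For reference, the flow described above can be condensed into the following sketch, abridged from the full listing in the Appendix (error handling and the unique-filename logic are omitted; the file path and task names here are illustrative):

    #include <Bela.h>
    #include <libraries/Pipe/Pipe.h>
    #include <libraries/sndfile/sndfile.h>

    Pipe gPipe;                    // real-time-safe queue between threads
    SNDFILE* gFile;                // output .wav file handle
    AuxiliaryTask gWriteTask;      // low-priority disk-writing task
    unsigned int gNumItems;        // samples per block (frames x channels)

    void writeToDisk(void*) {      // runs in the non-real-time thread
        float buf[gNumItems];
        int ret;
        while ((ret = gPipe.readNonRt(buf, gNumItems)) > 0)
            sf_write_float(gFile, buf, ret);        // append block to .wav
    }

    bool setup(BelaContext* context, void* arg) {
        gNumItems = context->audioFrames * context->audioInChannels;
        SF_INFO info = {};                          // describe the output file
        info.channels   = context->audioInChannels;
        info.samplerate = context->audioSampleRate;
        info.format     = SF_FORMAT_WAV | SF_FORMAT_PCM_16;
        gFile = sf_open("/mnt/usb/recording.wav", SFM_WRITE, &info);
        gPipe.setup("recorder", 65536, false, false);
        gWriteTask = Bela_createAuxiliaryTask(&writeToDisk, 90, "write-to-disk");
        return gFile != NULL && gWriteTask != 0;
    }

    void render(BelaContext* context, void* arg) {  // real-time audio callback
        gPipe.writeRt(context->audioIn, gNumItems); // push interleaved input
        Bela_scheduleAuxiliaryTask(gWriteTask);     // wake the disk writer
    }

    void cleanup(BelaContext* context, void* arg) {
        sf_write_sync(gFile);                       // flush file cache to disk
        sf_close(gFile);
    }

The key design choice is the split between the real-time audio thread, which only pushes samples into a lock-free pipe, and a lower-priority auxiliary task that performs the (potentially blocking) file writes.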
C. WiFi Network Connection
The WiFi connection enables the user to access the Bela system through the IDE without a hard-wired connection. Instead of recording the audio to the USB storage, we can alter the target file path to the default RAM memory of the Bela device (see Appendix) in order to have the file appear and update during the recording process within the resources section of the project explorer tab in the IDE.
Fig. 6. Processing flow for sound recording with Bela.

The network connection is continuous, which allows the user to change different functions in the IDE. The current WiFi signal achieves an operational range of 20 metres between the ground station and the drone. When the network connection is lost momentarily, the IDE user interface stops updating the status of the running project and, depending on the selected file path, the file size of the current recording. When the drone comes back into WiFi signal range and the wireless network is re-established, the recording has continued uninterrupted on the system and the IDE user interface resumes updating the recording progress of the file. The WiFi connection is thus not required to conduct the recording itself, but only to monitor its progress.

V. EXPERIMENT
A. Setup
To verify the validity of the developed hardware system, we conduct in-flight testing and recording. We record the ego-noise and the speech separately. When recording the ego-noise, the altitude of the drone during flight is maintained at about 2 meters above the ground via the flight controller (Fig. 7). We record two types of ego-noise: drone hovering and drone moving. In the former case, the drone hovers in the air using the GPS-stabilised mode with additional manual input (correcting small drift) to keep the drone reasonably stable throughout the recording. In the latter case, the drone moves in the air at a speed of around 1 meter/second, with random rotation and tilting during flight. When recording the speech-only data, the drone is muted on the ground and a loudspeaker plays sound at a distance of 2 meters. The original sampling rate is 48 kHz. The audio is downsampled to 8 kHz before processing, as sketched below. All the analysis is completed offline and not on the Bela system. The sound recording is available online (∼linwang/bela.html).
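The paper does not specify the resampling method; as one plausible implementation, the following minimal sketch applies a Hamming-windowed-sinc anti-aliasing low-pass filter (cutoff at the new Nyquist frequency of 4 kHz) and then decimates by a factor of 6 (48 kHz to 8 kHz):

    #include <cmath>
    #include <vector>

    // Downsample a 48 kHz mono signal to 8 kHz. Only the kept samples
    // are filtered, so the convolution runs once per output sample.
    std::vector<float> downsampleTo8k(const std::vector<float>& x) {
        const double PI = 3.14159265358979323846;
        const int M = 6;               // decimation factor: 48 kHz / 8 kHz
        const int taps = 121;          // FIR length (odd, linear phase)
        const double fc = 0.5 / M;     // normalized cutoff (cycles/sample)
        std::vector<double> h(taps);
        for (int n = 0; n < taps; ++n) {
            const double k = n - (taps - 1) / 2.0;   // centered index
            const double sinc = (k == 0.0)
                ? 2.0 * fc
                : std::sin(2.0 * PI * fc * k) / (PI * k);
            const double w = 0.54 - 0.46 * std::cos(2.0 * PI * n / (taps - 1));
            h[n] = sinc * w;                          // windowed ideal LPF
        }
        std::vector<float> y;
        y.reserve(x.size() / M + 1);
        for (std::size_t i = 0; i < x.size(); i += M) {   // decimate by M
            double acc = 0.0;
            for (int n = 0; n < taps; ++n) {              // causal convolution
                const long j = static_cast<long>(i) - n;
                if (j >= 0) acc += h[n] * x[static_cast<std::size_t>(j)];
            }
            y.push_back(static_cast<float>(acc));
        }
        return y;
    }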
B. Ego-noise analysis

Fig. 7. The drone with the microphone array hovering in the air during recording.

Fig. 8(a) depicts the time-domain waveform of the ego-noise recorded in the hovering and the moving state. Fig. 8(c) plots the time-frequency spectrogram, which is computed with a moving window of 128 ms and half overlap. Fig. 8(b) plots the power at each time frame. Fig. 8(d) plots the frequency-domain spectrum at the 25th and the 35th second of the two ego-noise recordings, respectively.

From Fig. 8(b), the power of the ego-noise does not differ much between the hovering and the moving state. The mean and standard deviation of the power across time frames (10 s - 40 s) are -29.9 dB and 0.41 dB, respectively, for hovering. The mean and standard deviation of the power across time frames (10 s - 60 s) are -29.6 dB and 0.42 dB, respectively, for moving. For the moving state, we observe a sudden rise of the power at 45 s, which is possibly due to a rotation of the drone.

From Fig. 8(c), it can be observed that the ego-noise consists of multiple harmonics. Since the four motors may operate at slightly different rotating speeds, the harmonic ego-noise presents several pitches, which can be verified from Fig. 8(d). In the hovering state, the pitch of the ego-noise remains stable. In the moving state, the pitch of the ego-noise varies with time, depending on the flight status of the drone.

The recording is made in an outdoor environment with a light breeze present. However, from the spectrogram of the recording we do not observe an evident influence of the wind at low frequencies. This is possibly due to the windshield worn by each microphone and the placement of the microphones on top of the drone.

C. Processing results
We synthesize a noisy signal at the microphones by adding the ego-noise (hovering state) and the speech at different input SNRs, which vary from -35 dB to 0 dB with an interval of 5 dB. The testing signal is 25 seconds long. We employ a block-wise processing strategy, using a non-overlapping sliding block of 4 seconds. For simplicity, we only verify the performance of two benchmark spatial filters that enhance the target sound from the ego-noise. The first spatial filter is a beamformer based on multichannel Wiener filtering [14], which computes the correlation matrices of the target sound and the noise separately, assuming the speech-only and noise-only signals are available. The second spatial filter is based on blind source separation (BSS) [13], assuming the permutation ambiguities can be perfectly solved by referencing the speech-only signals. The speech enhancement performance is evaluated with the SNR measure, which is defined, given speech $s(n)$ and noise $v(n)$, as [32]

$$\mathrm{SNR} = 10 \log_{10} \frac{\sum_n s^2(n)}{\sum_n v^2(n)} \quad (1)$$

We average the output SNR across all the processing blocks (see the sketch at the end of this subsection).

Fig. 8. Visualization of the ego-noise when the drone is hovering and moving. (a) Time-domain waveform; (b) Power plot; (c) Time-frequency spectrogram; (d) Frequency-domain plot.

Fig. 9. Benchmark performance achieved by the two spatial filters at various input SNRs.

Fig. 9 depicts the output SNR achieved by the two spatial filters at different input SNRs. BSS performs slightly better than the beamformer when the input SNR is lower than -15 dB, while the beamformer performs better at higher input SNRs. On average, the two spatial filters improve the SNR by about 20 dB.

Fig. 10 illustrates exemplar processing results at input SNR -20 dB with the beamformer. The output SNR is 3.1 dB. Fig. 10(b) shows that the speech signal is completely buried in the ego-noise in the time-domain waveform and is not distinguishable in the time-frequency spectrogram. Fig. 10(c) shows the enhanced speech after processing, where the speech is clearly visible in the time-frequency spectrogram.

Fig. 10. Processing results (beamformer) for input SNR -20 dB. The output SNR is 3.1 dB. (a) Clean speech; (b) Noisy signal before processing; (c) Noisy signal after processing.

It should be noted that the two spatial filters are estimated under ideal assumptions (i.e., the correlation matrices of the target and the noise are known) and thus set a benchmark for the performance of spatial filtering. In practice, the correlation matrices of the target and the noise have to be estimated from the noisy data, which leads to a performance drop in low-SNR scenarios [14]. A comprehensive evaluation is left for future work.
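As a concrete reading of Eq. (1) and the block-wise averaging, the following minimal sketch computes the per-block SNR from time-aligned speech and noise components (available here because of the oracle evaluation setup) and averages the per-block values in dB; this is an illustration under our reading of the text, not the authors' evaluation code:

    #include <cmath>
    #include <vector>

    // SNR in dB per Eq. (1), given time-aligned speech s(n) and noise v(n)
    // components of one processed block.
    double snrDb(const std::vector<double>& s, const std::vector<double>& v) {
        double ps = 0.0, pv = 0.0;
        for (double x : s) ps += x * x;   // speech energy: sum_n s^2(n)
        for (double x : v) pv += x * x;   // noise energy:  sum_n v^2(n)
        return 10.0 * std::log10(ps / pv);
    }

    // Output SNR averaged across the non-overlapping 4 s processing blocks.
    double meanSnrDb(const std::vector<std::vector<double>>& speechBlocks,
                     const std::vector<std::vector<double>>& noiseBlocks) {
        double sum = 0.0;
        for (std::size_t b = 0; b < speechBlocks.size(); ++b)
            sum += snrDb(speechBlocks[b], noiseBlocks[b]);
        return sum / speechBlocks.size();
    }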
VI. CONCLUSION
We present an embedded multichannel sound acquisition system that can fly with a drone. The system can accommodate up to 8 microphones placed in an arbitrary shape, record the sound locally, and transfer the recorded file to a remote terminal via a self-organized wireless network. Experimental results with recordings made with this hardware verify its validity. This is the first stage towards creating a fully embedded solution for drone audition. Future work will be to conduct a comprehensive evaluation of state-of-the-art algorithms for ego-noise reduction and to optimize the code for real-time processing on Bela, which is able to process audio at very low latency [21].
    //render.cpp
    #include <Bela.h>
    #include <libraries/Pipe/Pipe.h>
    #include <libraries/sndfile/sndfile.h>
    #include <libraries/WriteFile/WriteFile.h>

    const char* path = "/mnt/usb/audioCh.wav";
    SNDFILE* outfile2;
    char originalFilename[] = "/mnt/usb/audioCh.wav";
    char* uniqueFilename = WriteFile::generateUniqueFilename(originalFilename);

    //Audio
    AuxiliaryTask gFillBufferTask;
    SNDFILE* outfile;
    unsigned int gAudioFrames;
    unsigned int gAudioInChannels;
    float gAudioSampleRate;
    Pipe gPipe;

    void openFile() {
        SF_INFO sfinfo;
        sfinfo.channels = gAudioInChannels;
        sfinfo.samplerate = gAudioSampleRate;
        sfinfo.format = SF_FORMAT_WAV | SF_FORMAT_PCM_16;
        outfile2 = sf_open(uniqueFilename, SFM_WRITE, &sfinfo);
    }

    void closeFile() {
        sf_write_sync(outfile2);
        sf_close(outfile2);
        printf(".wav file written and closed\n");
    }

    void writeBuffer(void*) {
        unsigned int numItems = gAudioFrames * gAudioInChannels;
        float buf[numItems];
        int ret;
        while ((ret = gPipe.readNonRt(buf, numItems)) > 0) {
            sf_write_float(outfile2, &buf[0], ret);
        }
    }

    bool setup(BelaContext* context, void* arg) { //setup audio frames and channels
        gAudioSampleRate = context->audioSampleRate;
        gAudioFrames = context->audioFrames;
        gAudioInChannels = context->audioInChannels;
        gPipe.setup("sndfile-write", 65536, false, false);
        openFile();
        if ((gFillBufferTask = Bela_createAuxiliaryTask(&writeBuffer, 90, "writeBuffer")) == 0) {
            return false;
        }
        return true;
    }

    void render(BelaContext* context, void* arg) {
        gPipe.writeRt(context->audioIn, context->audioFrames * context->audioInChannels);
        Bela_scheduleAuxiliaryTask(gFillBufferTask);
    }

    void cleanup(BelaContext* context, void* arg) {
        closeFile();
        free(uniqueFilename);
    }

Fig. 11. Source code in render.cpp.
APPENDIX: SOUND RECORDING
Fig. 11 lists the C++ source code file that is used for multichannel sound recording with the Bela device. There are several crucial configurations for sound recording: the file path, the number of channels and the sampling rate. The file path can be configured by setting the global variable const char* path in the source code; e.g., const char* path = "/mnt/usb/audiofilename" sets the USB storage as the file path. The system automatically recognizes the number of active audio inputs, so the number of channels does not need to be configured. The sampling rate can be configured by setting the global variable gAudioSampleRate. The ADC and DAC gains are adjustable within the IDE Settings tab. Once the recording is finished, the file on the USB storage can be downloaded through the IDE after copying it to the project folder, e.g. using the command cp /mnt/usb/audiofile.wav /root/Bela/projects/projectname/. Alternatively, we can remove the USB storage from Bela and insert it into a computer for data transfer.

A breakdown and interpretation of the source code is given below.

    #include <Bela.h>
    #include <libraries/Pipe/Pipe.h>
    #include <libraries/sndfile/sndfile.h>
    #include <libraries/WriteFile/WriteFile.h>

There are four main library imports used in the code. Bela.h is the central control code for hard real-time audio on the BeagleBone Black using the PRU and Xenomai Linux extensions. The Pipe library enables the use of a bi-directional pipe that allows data to be exchanged between a real-time and a non-real-time thread. The WriteFile library is imported to enable the use of the generateUniqueFilename function, which returns a unique filename by appending a number to the end of the original filename; this is important in order to avoid overwriting existing recordings. The sndfile library provides the libsndfile API (application programming interface), which is designed to allow the reading and writing of many different sampled sound file formats.

    const char* path = "/mnt/usb/audioCh.wav";
    SNDFILE* outfile2;
    char originalFilename[] = "/mnt/usb/audioCh.wav";
    char* uniqueFilename = WriteFile::generateUniqueFilename(originalFilename);
The file path is created first; in this case the USB storage is used as the main storage. outfile2 is the reference for the SNDFILE pointer. The path to the original filename is assigned, and then the generateUniqueFilename function is called on the original filename to return a unique filename. Instead of recording the audio to the USB storage, the file path can be altered to const char* path = "./audiofilename" in order to have the file appear and update during the recording process within the resources section of the project explorer tab in the IDE.
    AuxiliaryTask gFillBufferTask;
    unsigned int gAudioFrames;
    unsigned int gAudioInChannels;
    float gAudioSampleRate;
    Pipe gPipe;
The global variables are established for use later in the code. gFillBufferTask is an AuxiliaryTask variable that is used to write to the audio buffer. gPipe, gAudioSampleRate, gAudioInChannels and gAudioFrames are global variables that use the behaviours defined in their corresponding library classes.

    void openFile() {
        SF_INFO sfinfo;
        sfinfo.channels = gAudioInChannels;
        sfinfo.samplerate = gAudioSampleRate;
        sfinfo.format = SF_FORMAT_WAV | SF_FORMAT_PCM_16;
        outfile2 = sf_open(uniqueFilename, SFM_WRITE, &sfinfo);
    }

The openFile function fills the SF_INFO structure with the specified file format, sample rate and number of channels. The sf_open function opens the sound file at the specified path in write-only mode (SFM_WRITE), using the sfinfo structure for passing data between the calling function and the library when opening the file for writing.

    void closeFile() {
        sf_write_sync(outfile2);
        sf_close(outfile2);
        printf(".wav file written and closed\n");
    }

The closeFile function writes the file to disk and closes it. For a file opened with SFM_WRITE, sf_write_sync calls the operating system's function to force the writing of all file cache buffers to disk. sf_close closes the file, deallocates its internal buffers, and returns 0 on success or an error value otherwise.

    void writeBuffer(void*) {
        unsigned int numItems = gAudioFrames * gAudioInChannels;
        float buf[numItems];
        int ret;
        while ((ret = gPipe.readNonRt(buf, numItems)) > 0) {
            sf_write_float(outfile2, &buf[0], ret);
        }
    }

The writeBuffer function writes the buffered data to the file. It calculates the number of items by multiplying the number of audio frames by the number of audio input channels, declares a buffer array to hold these items, and declares an integer variable to hold the number of items actually read. The while loop runs as long as readNonRt, which reads data from the non-real-time side of the pipe, returns items. The sf_write_float function writes the data in the array to the file pointer. For items-count functions, the items parameter specifies the size of the array and must be an integer product of the number of channels, or an error will occur.

    bool setup(BelaContext* context, void* arg) {
        gAudioSampleRate = context->audioSampleRate;
        gAudioFrames = context->audioFrames;
        gAudioInChannels = context->audioInChannels;
        gPipe.setup("sndfile-write", 65536, false, false);
        openFile();
        if ((gFillBufferTask = Bela_createAuxiliaryTask(&writeBuffer, 90, "writeBuffer")) == 0) {
            return false;
        }
        return true;
    }

The setup function is a user-defined initialisation function which runs once before audio rendering begins, after most of the system initialisation but before audio rendering starts. It is used to prepare any memory or resources that will be needed in render. The audio sample rate, frames and input channels are all read from the BelaContext structure. The pipe is set up with a specified size and with flags indicating whether reads on the real-time and non-real-time sides should be blocking. The openFile function is called to open the file for writing the audio data, and the auxiliary task is then created with the writeBuffer function as its callback.

    void render(BelaContext* context, void* arg) {
        gPipe.writeRt(context->audioIn, context->audioFrames * context->audioInChannels);
        Bela_scheduleAuxiliaryTask(gFillBufferTask);
    }

The render function is a user-defined callback function to process audio and sensor data. It is called regularly by the system every time there is a new block of audio and/or sensor data to process. The writeRt function writes data into the pipe from the real-time side. context->audioIn is the float* that points to all the input samples, stored as interleaved channels; for example, with 4 frames and 2 channels, audioIn holds 4 x 2 = 8 audio input samples. The previously created auxiliary task is then scheduled to run.

    void cleanup(BelaContext* context, void* arg) {
        closeFile();
        free(uniqueFilename);
    }

The cleanup function runs when the program finishes, to free up memory. It is called by the system once after audio rendering has finished, before the program quits, and is used to release any memory allocated in setup and to perform any other required cleanup; if no initialisation is performed in setup, this function will usually be empty. Here, the file is closed and the block of memory allocated by generateUniqueFilename is deallocated.
REFERENCES
[1] D. Floreano and R. J. Wood, "Science, technology and the future of small autonomous drones," Nature, vol. 521, no. 7553, pp. 460-466, 2015.
[2] G. Parascandolo, H. Huttunen, and T. Virtanen, "Recurrent neural networks for polyphonic sound event detection in real life recordings," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., Shanghai, China, 2016, pp. 6440-6444.
[3] S. Li and D. Yeung, "Visual object tracking for unmanned aerial vehicles: A benchmark and new motion models," in Proc. Thirty-First AAAI Conf. Artificial Intelligence, San Francisco, USA, 2017, pp. 4140-4146.
[4] P. Misra, A. A. Kumar, P. Mohapatra, and P. Balamuralidhar, "Aerial drones with location-sensitive ears," IEEE Commun. Mag., vol. 56, no. 7, pp. 154-160, Jul. 2018.
[5] L. Wang, R. Sanchez-Matilla, and A. Cavallaro, "Tracking a moving sound source from a multi-rotor drone," in Proc. IEEE/RSJ Int. Conf. Intell. Robots Sys., Madrid, Spain, 2018, pp. 2511-2516.
[6] R. Sanchez-Matilla, L. Wang, and A. Cavallaro, "Multi-modal localization and enhancement of multiple sound sources from a micro aerial vehicle," in Proc. ACM Multimedia, Silicon Valley, USA, 2017, pp. 1591-1599.
[7] L. Wang and A. Cavallaro, "Acoustic sensing from a multi-rotor drone," IEEE Sensors J., vol. 18, no. 11, pp. 4570-4582, Nov. 2018.
[8] A. Deleforge, D. Di Carlo, M. Strauss, R. Serizel, and L. Marcenaro, "Audio-based search and rescue with a drone: highlights from the IEEE signal processing cup 2019 student competition," IEEE Signal Process. Mag., vol. 36, no. 5, pp. 138-144, Sep. 2019.
[9] A. Schmidt, H. W. Lollmann, and W. Kellermann, "A novel ego-noise suppression algorithm for acoustic signal enhancement in autonomous systems," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., Calgary, Canada, 2018, pp. 6583-6587.
[10] B. Kang, H. Ahn, and H. Choo, "A software platform for noise reduction in sound sensor equipped drones," IEEE Sensors J., vol. 19, no. 21, pp. 10121-10130, Nov. 2019.
[11] Y. Hioka, M. Kingan, G. Schmid, R. McKay, and K. A. Stol, "Design of an unmanned aerial vehicle mounted system for quiet audio recording," Appl. Acoust., vol. 155, pp. 423-427, 2019.
[12] L. Wang and A. Cavallaro, "Deep learning assisted time-frequency processing for speech enhancement on drones," IEEE Trans. Emerging Topics Computational Intelligence, 2020, Early Access, DOI: 10.1109/TETCI.2020.3014934.
[13] L. Wang and A. Cavallaro, "A blind source separation framework for ego-noise reduction on multi-rotor drones," IEEE/ACM Trans. Audio Speech Lang. Process., vol. 28, pp. 2523-2537, 2020.
[14] L. Wang and A. Cavallaro, "Microphone-array ego-noise reduction algorithms for auditory micro aerial vehicles," IEEE Sensors J., vol. 17, no. 8, pp. 2447-2455, Aug. 2017.
[15] B. Yen and Y. Hioka, "Noise power spectral density scaled SNR response estimation with restricted range search for sound source localisation using unmanned aerial vehicles," EURASIP J. Audio Speech Music Process., vol. 2020, no. 1, pp. 1-26, 2020.
[16] L. Wang and A. Cavallaro, "Time-frequency processing for sound source localization from a micro aerial vehicle," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., New Orleans, USA, 2017, pp. 496-500.
[17] M. Strauss, P. Mordel, V. Miguet, and A. Deleforge, "DREGON: dataset and methods for UAV-embedded sound source localization," in Proc. IEEE/RSJ Int. Conf. Intell. Robot. Syst., Madrid, Spain, 2018, pp. 5735-5742.
[18] M. Wakabayashi, H. G. Okuno, and M. Kumon, "Drone audition listening from the sky estimates multiple sound source positions by integrating sound source localization and data association," Advanced Robotics, pp. 1-12, 2020.
[19] K. Hoshiba, K. Washizaki, M. Wakabayashi, T. Ishiki, M. Kumon, Y. Bando, D. Gabriel, K. Nakadai, and H. G. Okuno, "Design of UAV-embedded microphone array system for sound source localization in outdoor environments," Sensors, vol. 17, no. 11, pp. 1-16, Nov. 2017.
[20] K. Furukawa, K. Okutani, K. Nagira, T. Otsuka, K. Itoyama, K. Nakadai, and H. G. Okuno, "Noise correlation matrix estimation for improving sound source localization by multirotor UAV," in Proc. IEEE/RSJ Int. Conf. Intell. Robot. Syst., Tokyo, Japan, 2013, pp. 3943-3948.
[21] A. McPherson and V. Zappi, "An environment for submillisecond-latency audio and sensor processing on BeagleBone Black," in Proc. Audio Engineering Society Convention 138, 2015, pp. 1-7.
[22] A. McPherson, H. J. Robert, and G. Moro, "Action-sound latency: Are our tools fast enough?" in Proc. Int. Conf. New Interfaces Musical Expression, Brisbane, Australia, 2016, pp. 1-6.
[23] M. B. Andra, B. Rohman, and T. Usagawa, "Feasibility evaluation for keyword spotting system using mini microphone array on UAV," in Proc. IEEE International Geoscience Remote Sensing Symp., Yokohama, Japan, 2019, pp. 2264-2267.
[24] Z. W. Tan, A. H. Nguyen, and A. W. Khong, "An efficient dilated convolutional neural network for UAV noise reduction at low input SNR," in Proc. Asia-Pacific Signal Information Process. Association Annual Summit Conf., Lanzhou, China, 2019, pp. 1885-1892.
[25] D. Salvati, C. Drioli, G. Ferrin, and G. L. Foresti, "Acoustic source localization from multirotor UAVs," IEEE Trans. Industrial Electronics, vol. 67, no. 10, pp. 8618-8628, Oct. 2020.
[26] K. Okutani, T. Yoshida, K. Nakamura, and K. Nakadai, "Outdoor auditory scene analysis using a moving microphone array embedded in a quadrocopter," in Proc. IEEE/RSJ Int. Conf. Intell. Robot. Syst., Vilamoura-Algarve, Portugal, 2012, pp. 3288-3293.
[27] L. Wang, R. Sanchez-Matilla, and A. Cavallaro, "Audio-visual sensing from a quadcopter: dataset and baselines for source localization and sound enhancement," in Proc. IEEE/RSJ Int. Conf. Intell. Robots Sys., Macao, China, 2019, pp. 5320-5325.
[28] O. Ruiz-Espitia, J. Martinez-Carranza, and C. Rascon, "AIRA-UAS: An evaluation corpus for audio processing in unmanned aerial system," in Proc. Int. Conf. Unmanned Aircraft Systems, Dallas, USA, 2018, pp. 836-845.
[29] K. Nakadai, H. G. Okuno, and T. Mizumoto, "Development, deployment and applications of robot audition open source software HARK," J. Robotics and Mechatronics, vol. 29, no. 1, pp. 16-25, Jan. 2017.
[30] L. Wang and A. Cavallaro, "Ear in the sky: Ego-noise reduction for auditory micro aerial vehicles," in Proc. Int. Conf. Adv. Video Signal-Based Surveillance, Colorado Springs, USA, 2016, pp. 152-158.
[31] H. Langer and R. Manzke, "Embedded multichannel Linux audio system for musical applications," in Proc. Int. Audio Mostly Conf. Augmented Participatory Sound Music Experiences, London, UK, 2017, pp. 1-5.
[32] L. Wang, T. Gerkmann, and S. Doclo, "Noise power spectral density estimation using MaxNSR blocking matrix."