InaudibleKey: Generic Inaudible Acoustic Signal based Key Agreement Protocol for Mobile Devices

Abstract

Secure Device-to-Device (D2D) communication is becoming increasingly important with the ever-growing number of Internet-of-Things (IoT) devices in our daily life. To achieve secure D2D communication, key agreement between IoT devices without any prior knowledge is becoming desirable. Although various approaches have been proposed in the literature, they suffer from a number of limitations, such as low key generation rate and short pairing distance. In this paper, we present InaudibleKey, an inaudible acoustic signal-based key generation protocol for mobile devices. Based on acoustic channel reciprocity, InaudibleKey exploits the acoustic channel frequency response of two legitimate devices as a common secret to generate keys. InaudibleKey employs several novel techniques to significantly improve its performance. We conduct extensive experiments to evaluate the proposed system in different real environments. Compared to state-of-the-art works, InaudibleKey improves key generation rate by 3-145 times, extends pairing distance by 3.2-44 times, and reduces information reconciliation counts by 2.5-16 times. Security analysis demonstrates that InaudibleKey is resilient to a number of malicious attacks. We also implement InaudibleKey on modern smartphones and resource-limited IoT devices. Results show that it is energy-efficient and can run on both powerful and resource-limited IoT devices without incurring excessive resource consumption.


Weitao Xu [email protected] University of Hong Kong

Zhenjiang Li [email protected] University of Hong Kong

Wanli Xue [email protected] of New South Wales ∗ Xiaotong Yu [email protected] of New South Wales

Bo Wei [email protected] University

Jia Wang [email protected] University

Chengwen Luo [email protected] University

Wei Li [email protected] University of Sydney

Albert Y. Zomaya [email protected] University of Sydney


KEYWORDS

Key generation, Mobile devices, Acoustic signal


With recent advances in mobile computing and embedded technology, there is an increasing number of IoT devices in our daily life, such as smartphones, smart watches, and Google Assistant. Correspondingly, it is increasingly common to pair two devices for data sharing, synchronization, and collaboration. For example, two people who meet for the first time in a meeting may want to associate their smartphones temporarily to exchange name cards. Due to the "open air" nature of wireless communication, cryptographic key agreement is a fundamental requirement for securing D2D communication and achieving confidentiality [1, 2].

Secure key distribution between two communicating parties can be addressed by public key infrastructure (PKI). Unfortunately, public key-based solutions are not applicable to mobile devices because PKI only works if the identity of the other party is known out of band or only trusted parties have identities signed by pre-established certificate authorities. Another solution is a pre-distributed key, usually in the form of a master key or key material. However, key pre-distribution schemes lack scalability, which makes them unsuitable for dynamic environments where devices may join and leave frequently. Near field communication (NFC) is becoming more and more popular in modern mobile devices, but its communication range is limited to only tens of centimetres (typically <20 cm). The Diffie-Hellman (D-H) protocol is a popular protocol for establishing cryptographic keys over a public channel. However, the D-H protocol is susceptible to man-in-the-middle (MITM) attacks, and the authenticated D-H protocol requires the presence of a certificate authority (CA). The most common method to pair mobile devices is still to ask the user to scan for nearby devices, choose the target device, and confirm manually, which is neither user-friendly nor suitable for devices without screens.

arXiv [cs.NI]. PSN ’21, May 18-21, 2021, Tennessee, USA. W. Xu et al.

The lack of efficient and user-friendly pairing methods has inspired researchers to explore suitable alternatives to authenticate mobile devices. Because two devices are not assumed to share any knowledge a priori, the widely adopted design principle in the literature is that if multiple devices share a similar observation of a certain random signal, then the signal can be used to extract keys. Different designs explore different forms of such random signals [3–11], while they all strive for a key generation system that is fast (sufficient bit generation rate), practical (no extra hardware required), and ubiquitous (usable in different environments).

A successful set of pioneering efforts leverages wireless channel information [1, 3–6, 12], such as the Received Signal Strength Indicator (RSSI) and Channel State Information (CSI). These methods are based on wireless channel reciprocity, which means that the channel characteristics (RSSI or CSI) measured between two devices by exchanging a pair of probe packets in quick succession will be nearly the same. However, RSSI-based methods suffer from a low key generation rate and from predictable channel attacks [3, 5]. CSI-based systems can improve the key generation rate greatly and are robust to predictable channel attacks [4]. However, their major limitation is that they require special toolkits to extract CSI from the wireless card, and currently CSI can only be extracted from a limited number of chipsets such as Intel’s 5300 NIC [13, 14].

To overcome this issue, many recent designs for mobile platforms utilize on-board sensory data, such as acoustic data, motion sensor data, and bio-sensor data.
However, through our study, we find that they mainly trade the sensor’s availability for two further restrictions: the system works either within a very short pairing distance, e.g., 1.25 cm in Proximate [8] and 5 cm in TDS [4], or in a limited set of pre-defined contexts, e.g., in certain environments or when users are performing certain activities [10, 11]. Designs requiring a short pairing distance are not preferable because people need to keep social distance (usually >1.5 m) in their daily life since the outbreak of COVID-19. Many other approaches may further suffer from a long authentication delay [15], need expensive software support (e.g., public key) [16], or require extra hardware (e.g., bio-sensors) [17, 18]. In this paper, we thus investigate whether we can still achieve a fast and ubiquitous key agreement system that can pair two mobile devices beyond social distance, without using extra sensors not available on mobile devices.

We find that acoustic signals have such potential, inspired by the success of previous radio signal based designs. An acoustic wave, as a form of wave, possesses many of the properties of a radio wave. In particular, we leverage acoustic channel reciprocity to generate keys. One primary advantage of using acoustic signals is that they do not depend on special hardware, as most mobile devices are equipped with a microphone and a speaker. Our preliminary study also validates that the acoustic channel indeed exhibits channel reciprocity, as well as temporal variation and spatial decorrelation, which can serve as the basis of key generation. However, because of the limited acoustic channel bandwidth and the location offset between speaker and microphone, a number of challenges need to be addressed to design an efficient and robust key agreement protocol based on acoustic signals.

(1) First, how to achieve a high bit generation rate through narrow-band acoustic channels. To this end, we need to extract fine-grained acoustic channel information. Unfortunately, unlike a wireless card, a microphone is not designed to provide channel information such as RSSI and CSI. To extract fine-grained channel information, we design an effective transmitting scheme that uses the inaudible acoustic signal to modulate Orthogonal Frequency-Division Multiplexing (OFDM) symbols. Although applying OFDM modulation to acoustic signals was proposed in FingerIO [19], the use of OFDM in a key generation system provides more acoustic channel information that can be used to generate keys. As a result, the key generation rate is significantly improved, as demonstrated in our evaluation.

(2) Second, how to further improve the entropy of the extracted key. Existing quantization methods usually use a threshold to determine whether a sample should be encoded as ‘1’ or ‘0’ [3]. However, these methods may produce repeated bit strings, which reduce the entropy of the generated keys. Additionally, the quantized bits are used directly to generate the final key, which leaves opportunities for powerful attackers to obtain the raw information by reverse engineering. To tackle this problem, we first apply a novel Bloom filter-based technique to protect the keys against reverse engineering attacks. Then, we leverage the Karhunen-Loeve Transform (KLT) to remove redundant information and enhance randomness.

(3) Third, the microphone and speaker are not located at the same position in mobile devices. Hence, the transmitted signal and the received signal experience slightly different channels. Moreover, due to hardware diversity and manufacturing imperfection [20], different microphones and speakers attenuate some frequencies selectively, which causes further errors. To address this challenge, we optimise a novel compressive sensing (CS) based reconciliation mechanism. The discrepancies between two initial keys can be corrected with the help of powerful $\ell_1$ optimization.
In particular, InaudibleKey can achieve a high matching rate even for different types of IoT devices.

Although several recent works have exploited acoustic signals to pair mobile devices [15, 16, 21, 22], our system shows significant performance improvement. We make the following contributions in this paper:

• System Design. We propose InaudibleKey, an inaudible acoustic signal-based key agreement protocol for mobile devices. Based on acoustic channel reciprocity, InaudibleKey utilizes the channel frequency response of OFDM symbols to generate keys. InaudibleKey employs several novel approaches to significantly improve the system performance. In particular, we propose an optimisation algorithm to improve the performance of a state-of-the-art reconciliation method.

• System Implementation. To demonstrate feasibility, we implement a prototype of InaudibleKey on both powerful devices (smartphone) and resource-limited devices (Arduino Uno board). Evaluation results show that InaudibleKey incurs low system cost and can run efficiently on these IoT devices. We also demonstrate that it is more energy efficient than public key cryptography and the authenticated D-H protocol on IoT devices.

• System Evaluation. We conduct extensive experiments in different real environments. Compared to state-of-the-art works, InaudibleKey improves key generation rate by 3–145 times, extends pairing distance by 3.2–44 times, and reduces information reconciliation counts by 2.5–16 times.

• Security Analysis. Extensive analysis shows that InaudibleKey is robust to a number of malicious attacks, such as eavesdropping attacks, imitating attacks, and predictable channel attacks.

The rest of the paper is organized as follows. We present preliminary study results in Sec. 2, followed by a description of the system model in Sec. 3. Then we specify the system design in Sec. 4. We evaluate the performance of InaudibleKey in Sec. 5 and analyze its security against attacks in Sec. 6. Finally, Sec. 7 discusses related works, and Sec. 8 concludes the paper.

Figure 1: Feasibility study. (a) Reciprocity: normalized CFR (dB) versus frequency (kHz) for Alice, Bob, and Eve. (b) Temporal variation: normalized CFR (dB) versus frequency (kHz) at times t1 and t2. (c) Spatial decorrelation: correlation versus distance (cm).

We first conduct a preliminary study to verify whether the acoustic channel exhibits channel reciprocity, temporal variation, and spatial decorrelation, which serve as the basis for key generation.

Channel reciprocity.

Channel reciprocity means that the channel characteristics (gains, phase shifts, and delays) measured between two devices by exchanging a pair of probe packets within the channel coherence time will be very close [3]. For wireless channels, the widely used channel characteristics include RSSI [3, 23] and CSI [1, 4, 6]. Unfortunately, the microphone cannot report such channel information. Recent studies use channel taps [21] (the aggregate of delays caused by the multi-path effect [24]) and sound pressure [22] as acoustic channel characteristics. However, we find that they can only provide coarse-grained channel measurements. In this paper, we use the channel frequency response (CFR), which can provide fine-grained acoustic channel information. The CFR is the response of a channel at different frequencies. Multipath fading affects different frequencies across the channel to different degrees, giving rise to a frequency-selective channel [24]. Moreover, if one of the devices is moving, for example when it is shaken by the user, this also causes channel variations due to the Doppler effect. Therefore, the received signal will have different responses at different frequencies. In other words, the randomness in acoustic channel responses can be used to generate keys just like radio channel characteristics.

To validate channel reciprocity, Alice and Bob exchange a number of acoustic signals while Eve (20 cm away) eavesdrops on the acoustic signal between Alice and Bob. Fig. 1(a) plots the CFR of Alice, Bob, and Eve. We can see that the CFRs of the legitimate devices are close to each other, while Eve has different channel responses (though there are some similarities at certain frequencies). However, we also find that the CFRs of Alice and Bob are not exactly the same.
First, the signals transmitted by Alice and Bob do not travel along exactly the same path because the speaker and microphone are at different locations on the smartphone. This differs from wireless devices, where the transceivers can always use one specific antenna. Second, hardware frequency selectivity caused by imperfect hardware, such as the speaker [20], can also result in CFR differences. Therefore, we can see different strengthened and weakened power amplitudes at each frequency.

Temporal variation.

We set up two mobile phones, namely Alice and Bob, in a laboratory at a distance of 1 m. Alice keeps sending inaudible acoustic signals whose frequency varies from 18 kHz to 22 kHz, while Bob listens to the acoustic channel. Fig. 1(b) shows the channel frequency response at two different times (t1 and t2) when the user shakes one of the devices. We can see that the acoustic channel responses at different time instances are different. When the user shakes the device randomly, the acoustic channel between Alice and Bob changes rapidly. If the environment changes, for example when a person walks by, the acoustic channel will also change due to multi-path and Doppler effects, just like the radio channel.

Spatial decorrelation.

To validate spatial decorrelation, we vary the distance between Eve and Bob from 1 cm to 100 cm. As shown in Fig. 1(c), the correlation between Eve’s and Bob’s channel responses decreases rapidly as the distance increases. According to radio propagation theory [24], the channels will be statistically uncorrelated if two devices are separated by at least half a wavelength. If we use a 22 kHz audio signal, the wavelength $\lambda$ is 1.7 cm. Therefore, if Eve is located more than $\lambda/2 = 0.85$ cm away from a legitimate device, she will obtain different channel measurements. In practice, however, this distance is set to at least multiple wavelengths to compensate for a poor multipath scattering environment or interference and to enhance security [25]. From Fig. 1(c), we can see that the correlation drops below 0.4 when the distance between Eve and the legitimate device is greater than 10 cm. Hence, the secure distance is 10 cm in this paper, and we assume that any attacker entering this range can be easily spotted by Alice or Bob.

Fig. 2 illustrates the system model in this paper. We assume that two mobile devices, namely Alice and Bob, intend to generate the same key to secure their communication. Both devices are equipped with a speaker and a microphone. They have no prior shared secrets except that they have InaudibleKey installed. We assume that an adversary device, Eve, is located beyond a safe distance (10 cm in InaudibleKey) from the legitimate devices. If Eve moves within the safe distance, she can be easily spotted by the users, as noted in [16].

Figure 2: System model.

One application scenario is as follows. Suppose that in a meeting, Alice and Bob, who meet for the first time, want to exchange their name cards safely. To keep social distance, they cannot establish a secure communication channel via existing approaches such as Touch-and-Guard [18], shake-n-shack [11], or NFC. By using InaudibleKey, they can simply shake their device (e.g., a smart watch) or perform a random gesture near the device for a short while. A secure communication channel is then established between them even if they are 1-2 m away from each other. If there is sufficient randomness, such as moving subjects or objects around the users, they do not even need to take any action (see our demo).

We assume that Eve has full knowledge of the key agreement protocol and can eavesdrop on, inject, and replay messages. However, as in many previous key generation studies [1, 4, 10, 16, 21], although Eve can inject messages into the public wireless channel, we assume that her goal is to intercept the secret key rather than to jam the communication (i.e., a DoS attack). In fact, a DoS attack against InaudibleKey can be performed by jamming the environment with inaudible acoustic signals. We can use methods from the literature [26] to detect such an attack. We therefore consider three types of attacks that are commonly used in related work [1, 21].

• Eavesdropping attack: Eve eavesdrops on all the messages transmitted over the public channel to extract the same key.

• Imitating attack: After Alice or Bob finishes key extraction, Eve approaches the same site with the aim of generating the same key as a legitimate user. For example, Eve can first observe how Alice or Bob uses the devices, e.g., the way of moving or shaking the smartphone, then try to imitate that usage pattern and generate the same key.

• Predictable channel attack: Eve can deliberately move around to generate desired or predictable changes in the channel between Alice and Bob.

Fig. 3 shows the workflow of InaudibleKey. Suppose Alice and Bob are two devices that want to generate a secret key. First, they exchange a number of inaudible acoustic frames and calculate the CFR. Then both devices follow the steps in Fig. 3 to generate the same cryptographic key. Finally, they use the extracted key to secure their communication.

Figure 3: System flowchart.

Demo: https://drive.google.com/file/d/1lQIMuJahDV5PFhucXj3VODuFyXunn2Y5/view

In the sampling phase, Alice and Bob exchange a number of acoustic frames, transmitting via the speaker and receiving via the microphone, to obtain channel measurements. InaudibleKey uses the inaudible frequency band from 18 kHz to 22 kHz to avoid disturbing users.

To obtain fine-grained channel information, we apply OFDM to the acoustic signal based on the method in FingerIO [19]. Specifically, we divide the 18–22 kHz frequency band into 64 subcarriers so that the width of each subcarrier is 62.5 Hz. The time-domain samples to transmit are obtained by performing an inverse Fast Fourier Transform (IFFT) on the transmitted data, and the receiver reconstructs the raw data bits with a Fast Fourier Transform (FFT). The speaker transmits the vector of 64 real values produced by the OFDM symbol construction. Another advantage of using OFDM is that both devices can probe the channel within the channel coherence time without explicitly synchronising the two mobile devices. In fact, it is impractical to assume that two mobile devices are synchronised when they first meet. We use the first $S_{suf}$ of these values to form a cyclic suffix that is appended to the end of the OFDM symbol. The cyclic suffix is used to accurately estimate the beginning of the OFDM symbol. Even if Alice and Bob are not synchronised, they can still locate the beginning of the received symbol by calculating the correlation between the received signal and the known transmitted signal (see [19] for more details).

The length of the transmitted signal and the transmitting interval are important for the following two reasons: 1) Alice and Bob must obtain their channel estimates within the coherence time, so that their CFRs are highly correlated; 2) the transmitting interval should be larger than the coherence time; otherwise, consecutive CFRs will be correlated and the randomness of the key will be reduced.
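The cyclic suffix and correlation-based alignment described above can be illustrated with a minimal sketch. It is not the authors' implementation: the 64 values are random stand-ins for a real OFDM symbol, the function names are hypothetical, the recording is an idealised noise-free buffer, and `s_suf=26` is the suffix length the paper reports for $S_{suf}$.

```python
import random

def add_cyclic_suffix(values, s_suf):
    """Append the first s_suf values of the symbol to its end."""
    return list(values) + list(values[:s_suf])

def locate_symbol(received, known_symbol):
    """Return the offset at which the known symbol correlates best with the recording."""
    n = len(known_symbol)
    best_offset, best_score = 0, float("-inf")
    for offset in range(len(received) - n + 1):
        window = received[offset:offset + n]
        score = sum(r * k for r, k in zip(window, known_symbol))
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset

rng = random.Random(0)
# Stand-in for the 64 real values of one OFDM symbol (assumption: random data).
values = [rng.gauss(0.0, 1.0) for _ in range(64)]
symbol = add_cyclic_suffix(values, s_suf=26)  # 64 + 26 = 90 samples

# Hypothetical recording: silence, then the symbol, then silence.
received = [0.0] * 137 + symbol + [0.0] * 50
start = locate_symbol(received, symbol)

# Air time of one symbol at a 48 kHz sampling rate, in milliseconds.
duration_ms = 1000 * len(symbol) / 48_000
```

In this idealised setting the correlation peak falls exactly at the offset where the symbol was placed; with real recordings, noise and multipath would make the peak less sharp.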
In theory, the rate at which the channel varies is represented by the Doppler frequency ($f_d$), and the duration for which the channel is stable is denoted by the channel coherence time ($T_c$). Coherence time is the time-domain dual of Doppler spread and is used to characterize the time-varying nature of the channel. Suppose the moving speed of the subject or object is $v$, the channel frequency is $f$, and the speed of the acoustic signal is $c$ (m/s); then the maximum Doppler frequency is $f_d = \frac{v \cdot f}{c}$ [27]. Practically, the channel coherence time with respect to the maximum Doppler frequency shift is $T_c = \sqrt{\frac{9}{16 \pi f_d^2}}$ according to [27].

The acoustic signal used in InaudibleKey is from 18 kHz to 22 kHz, and the speed of common human motions varies from …

Figure 4: Illustration of quantization process. (a) Impact of window size: normalized CFR (dB) versus frequency (kHz) for Alice and Bob with window sizes 50 and 5000. (b) An example of quantization: CFR (dB) versus samples. (c) Projecting the bits into Bloom filter space.

… $S_{suf}$ to 26. Therefore, the transmitted symbol contains 90 samples. Given a 48 kHz sampling rate, the transmitter takes 1.9 ms to transmit these 90 samples, which is shorter than the minimum coherence time. In terms of the transmitting interval, Alice and Bob exchange acoustic signals every 100 ms, which is longer than the maximum coherence time. Upon receiving the signal from Bob, Alice applies a Butterworth band-pass filter (18–22 kHz) to filter out environmental noise and calculates the CFR using the method in [16]. When calculating the CFR, the window length plays an important role: if it is too small, there is not much entropy; if it is too large, there will be more mismatches (Fig. 4(a)). We empirically use a Hamming window of size 2000 in InaudibleKey.

The CFR is then quantized to binary bits (0s and 1s) by the multiple-bit quantization technique proposed in [3]. Specifically, we first divide the CFR measurements into several non-overlapping windows (window size $W$). Then, for each window, we divide the samples into multiple quantization levels. Each level is assigned an $n$-bit code. For example, if $n = 2$, each sample is converted into 2 bits. We also insert guard bands between different levels to mitigate the effect of mismatches. We use $\alpha \in [0, 1]$ to represent the ratio of guard band to data. The larger $\alpha$ is, the more mismatches are discarded. However, it is possible that the keys generated by Alice and Bob have different lengths. To solve this issue, Alice and Bob exchange the indices of the samples that are used to generate keys and only retain the bits generated at the common indices. Fig. 4(b) shows how to convert a window of 20 samples into bits. After quantization, we denote the keys generated by Alice and Bob by $K'_{Alice}$ and $K'_{Bob}$.
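The quantization step can be sketched as follows. This is an illustrative reading of the description, not the exact scheme of [3]: equal-width levels over the window range, a Gray code per level, and a guard region of total width $\alpha$ times the level width centred on each internal boundary are all assumptions, as are the function names.

```python
def quantize_window(samples, n_bits=2, alpha=0.2):
    """Map each sample of one window to an n-bit code, dropping samples in guard bands.

    Returns a dict {sample index: bit string} for the samples that were kept.
    """
    lo, hi = min(samples), max(samples)
    levels = 2 ** n_bits
    width = (hi - lo) / levels
    if width == 0:
        return {}  # a flat window carries no information
    guard = alpha * width / 2  # half of the guard band on each side of a boundary
    kept = {}
    for index, x in enumerate(samples):
        level = min(int((x - lo) / width), levels - 1)
        lower = lo + level * width
        upper = lower + width
        near_lower = level > 0 and x - lower < guard
        near_upper = level < levels - 1 and upper - x < guard
        if near_lower or near_upper:
            continue  # too close to a boundary: likely to mismatch, so discard
        gray = level ^ (level >> 1)  # adjacent levels differ in a single bit
        kept[index] = format(gray, f"0{n_bits}b")
    return kept

def keep_common_indices(alice_bits, bob_bits):
    """Build both keys from the sample indices that survived on both sides."""
    common = sorted(set(alice_bits) & set(bob_bits))
    key_alice = "".join(alice_bits[i] for i in common)
    key_bob = "".join(bob_bits[i] for i in common)
    return common, key_alice, key_bob

# Toy windows (hypothetical values, not measured CFR data).
alice = quantize_window([0.0, 1.5, 2.5, 4.0, 1.05])
bob = quantize_window([0.0, 1.5, 2.95, 4.0, 1.5])
common, key_alice, key_bob = keep_common_indices(alice, bob)
```

In the toy data, Alice drops her last sample and Bob drops his third because each falls inside a guard band, so only indices 0, 1, and 3 contribute to the keys.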
Most previous work on key generation uses the quantized bits directly to derive the final secret key. However, Eve can perform attacks to derive the key from data collected by herself and the data transferred between Alice and Bob. The Bloom filter has been used as part of the encoding and perturbation methods in many privacy-preserving applications [29, 30]. Unfortunately, traditional Bloom filter projections cannot retain order information (without additional processing). In other words, the mismatches between the two input key strings and those between the two output key strings may differ. In InaudibleKey, we use a specially designed Bloom filter data structure that takes sequence/order information into account when projecting the key into the Bloom filter. The purpose of the adapted Bloom filter is to keep the key distance information while storing the key in a non-plaintext format.

The detailed process is presented in Fig. 4(c). Taking a 64-bit key as an example, the position of each bit is conveyed by additional bit(s) added before Bloom filter hash-mapping. Afterwards, two hash functions are used to hash-map each position-tagged element into the Bloom filter space. The Bloom filter hash-mapping only turns a ‘0’ into a ‘1’ based on the hash value (see [31] for the original rationale). As a result, each ‘1’ in the Bloom filter refers ‘uniquely’ to a bit (‘0’ or ‘1’) at a certain position of the original key. Most importantly, this adapted Bloom filter data structure also preserves the Jaccard distance between the raw data bits and the projected Bloom filter data bits. That is to say, suppose $K_{Alice}$ and $K_{Bob}$ represent the Bloom filter outputs of $K'_{Alice}$ and $K'_{Bob}$; if there are $N_{mis}$ mismatches between $K'_{Alice}$ and $K'_{Bob}$, then there will also be $N_{mis}$ mismatches between $K_{Alice}$ and $K_{Bob}$. The proof can be found in [30].
Thus, we can directly apply the following information reconciliation approach to $K_{Alice}$ and $K_{Bob}$ because they preserve the similarity information between $K'_{Alice}$ and $K'_{Bob}$. Note that although the Bloom filter is an irreversible one-way function, if the input key is too short, Eve can still recover the input from the Bloom filter’s output, for example through a brute-force attack. Therefore, we need to ensure sufficient entropy in the input of the Bloom filter. In InaudibleKey, we concatenate the bits generated from each window into a key string and further divide it into consecutive segments of 128 bits each. Since Eve has no knowledge of the number and location of the incorrect bits, it is computationally infeasible (on the order of $2^{128}$ guesses for a 128-bit segment) to obtain $K'_{Alice}$ and $K'_{Bob}$. The adapted Bloom filter, which only uses hash functions and limited temporary storage (to store the Bloom filter results), adds little overhead to the overall mobile system.
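A position-aware Bloom filter projection in the spirit of this description can be sketched as below. The details are assumptions made for illustration: each element is the string `position:bit`, the two hash functions are SHA-256 with two different salts, and the filter size `m` is a free parameter. The construction in [30] that preserves the mismatch count exactly is not reproduced here; in this simple version, hash collisions can change the distance slightly.

```python
import hashlib

def _hash_position(element, salt, m):
    """Map an element to a slot in [0, m) using a salted SHA-256 digest."""
    digest = hashlib.sha256(f"{salt}|{element}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % m

def project_key(bits, m=1024):
    """Project a bit string into an m-slot Bloom filter, tagging each bit with its position."""
    bloom = [0] * m
    for position, bit in enumerate(bits):
        element = f"{position}:{bit}"  # the position tag keeps order information
        for salt in ("h1", "h2"):      # two hash functions, as in the text
            bloom[_hash_position(element, salt, m)] = 1  # slots only ever go from 0 to 1
    return bloom

def jaccard_similarity(a, b):
    """Jaccard similarity between the sets of slots that are set in two filters."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 1.0

key_alice = "1011001110001111"
key_bob = "1011001110001101"  # differs from key_alice in one position
bloom_alice = project_key(key_alice)
bloom_bob = project_key(key_bob)
similarity = jaccard_similarity(bloom_alice, bloom_bob)
```

Because only the element at the differing position changes, most set slots are shared by the two filters, which is the property that lets reconciliation run on the projected keys.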

Because of noise, we usually get $K_{Alice} \approx K_{Bob}$ instead of exactly identical keys. The purpose of reconciliation is to correct the mismatches between $K_{Alice}$ and $K_{Bob}$. In InaudibleKey, we optimise a recently developed CS-based reconciliation method [32] to improve the key agreement rate. To make this paper more self-contained, we first succinctly describe the flow of this method and then present our optimisation algorithm.

The key idea of the reconciliation method [32] is that the mismatches between Alice and Bob are far fewer (i.e., sparser) than those between Alice and Eve. Suppose the keys after the Bloom filter are $K_{Alice} \in \mathbb{R}^{N}$ and $K_{Bob} \in \mathbb{R}^{N}$, respectively. The sampling matrix is $A \in \mathbb{R}^{M \times N}$, which obeys the restricted isometry property (RIP) [33]. Researchers have found that a random Bernoulli matrix with equally distributed entries works well, so we use a random Bernoulli matrix in InaudibleKey. Alice and Bob then follow the steps below to correct their mismatches.

(1) First, Bob generates a compressed vector $y_{Bob} = A \cdot K_{Bob} \in \mathbb{R}^{M}$ and transmits it to Alice via the public channel.

(2) Second, after receiving $y_{Bob}$, Alice calculates the difference between $y_{Alice}$ and $y_{Bob}$:

$$y_{A,B} = y_{Alice} - y_{Bob} = A (K_{Alice} - K_{Bob}) + e = A \Delta x + e \qquad (1)$$

where $y_{Alice} = A K_{Alice}$, $\Delta x$ represents the mismatches between $K_{Alice}$ and $K_{Bob}$, and $e \in \mathbb{R}^{M}$ is noise.

(3) Finally, Alice applies $\ell_1$ optimization to reconstruct $\Delta x$ from the compressed data $y_{A,B}$ [33]:

$$\arg\min_{\Delta x} \|\Delta x\|_1 \quad \text{subject to} \quad \|y_{A,B} - A \Delta x\|_2 < \epsilon \qquad (2)$$

where $\epsilon$ is used to account for noise. Then, Alice can deduce $K_{Bob}$ by $\bar{K}_{Alice} = K_{Alice} \oplus \Delta x$, and both Alice and Bob agree on the same key $\bar{K}_{Alice} = K_{Bob}$.

When we use the above method, we find that the performance varies a lot. The reason is that the sampling matrix $A$ is generated randomly. To address this problem, we formulate an optimisation problem to improve its performance. According to the theory of compressive sensing [34], the random matrix $A$ must meet the following two conditions:

$$M(A) \le \frac{C}{\log N}, \qquad s \le \frac{C N}{\log N \cdot \|A\|_2^2} \qquad (3)$$

where $C$ is a constant, $N$ is the length of the key (i.e., the number of columns of $A$), $s$ is the sparsity of the key, and $M(A)$ is the mutual coherence of $A$, which is defined as

$$M(A) = \max_{i < j} \frac{|a_i^T a_j|}{\|a_i\|_2 \|a_j\|_2}$$

where $a_i$ and $a_j$ represent the $i$-th and $j$-th columns of $A$, respectively.
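To make the definition of $M(A)$ concrete, the sketch below computes the mutual coherence of a few randomly drawn Bernoulli matrices. It assumes entries of $\pm 1$ with equal probability and arbitrary dimensions (32 rows, 64 columns); the paper's actual entry values and sizes are not specified in this section, and the function names are hypothetical.

```python
import math
import random

def mutual_coherence(columns):
    """Largest normalised absolute inner product between any two distinct columns."""
    worst = 0.0
    for i in range(len(columns)):
        norm_i = math.sqrt(sum(x * x for x in columns[i]))
        for j in range(i + 1, len(columns)):
            norm_j = math.sqrt(sum(x * x for x in columns[j]))
            dot = sum(x * y for x, y in zip(columns[i], columns[j]))
            worst = max(worst, abs(dot) / (norm_i * norm_j))
    return worst

def random_bernoulli_columns(m, n, rng):
    """Draw n columns of length m with entries +1 or -1 (assumed entry values)."""
    return [[rng.choice((-1.0, 1.0)) for _ in range(m)] for _ in range(n)]

rng = random.Random(1)
# Coherence of five independent random draws; the spread illustrates why a
# randomly generated sampling matrix gives inconsistent recovery quality.
coherences = [mutual_coherence(random_bernoulli_columns(32, 64, rng)) for _ in range(5)]
```

Orthogonal columns give a coherence of 0 and parallel columns give 1, so lower values indicate a matrix that is better suited to sparse recovery.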
In fact, $M(A)$ represents the maximum cross-correlation between the columns of $A$. If $A$ is generated randomly every time, it is hard to ensure that it always has good mutual coherence. That is why the performance varies greatly for different $A$. If we rewrite the second condition above as $\frac{1}{\|A\|_2^2} \ge \frac{s \log N}{C N}$, we can see that the smaller $\|A\|_2$ is, the easier it is for $A$ to meet the second condition. Therefore, the goal of our proposed optimisation algorithm is to minimise $M(A)$ iteratively.

As shown in Algorithm 1, we start by finding the two columns that have minimum coherence in a search space $\Omega$. $\Omega$ includes a large finite set of random matrices. Then, in each iteration, we choose the vector from $\Omega$ that minimises the maximum mutual coherence between this vector and those already in $A$. Finally, when we have found $N$ such columns, the iteration terminates and we output the optimised matrix $A$, which has the minimum mutual coherence. This optimisation is conducted offline, so it does not incur any runtime computation overhead. Note that although several other works also optimise the projection matrix by minimising mutual coherence or row coherence [35, 36], the problems they solve are different, i.e., they aim to optimise a projection matrix for a certain dictionary to obtain a better recovered signal.

Algorithm 1 Sampling Matrix Optimisation

Objective: Find a matrix $A$ with minimal $M(A)$
Input: Search space $\Omega$; the number of columns of $A$ is $N$.
Initialisation: traverse $\Omega$ to find two column vectors $\hat{a}_1$ and $\hat{a}_2$ such that their coherence is minimal; $\hat{A} = \{\hat{a}_1, \hat{a}_2\}$, $\Omega = \Omega \setminus \{\hat{a}_1, \hat{a}_2\}$, $i = 3$
while $i \le N$ do
    $\tilde{a}_j = \arg\min_{\hat{a}_j \in \Omega} \max_{\hat{a}_k \in \hat{A}} |\hat{a}_j^T \hat{a}_k|$
    $\hat{A} = \hat{A} \cup \{\tilde{a}_j\}$
    $\Omega = \Omega \setminus \{\tilde{a}_j\}$
    $i$++
end while
Output: optimised matrix $A = [a_1, a_2, \cdots, a_N]$

The optimised reconciliation method has several advantages. First, Bob only transmits the compressed key instead of the original key. Even if Eve eavesdrops on this message, she cannot reconstruct Bob’s key $K_{Bob}$, as will be discussed in Sec. 6, which ensures security. Second, unlike some conventional reconciliation methods, this approach does not discard errors during the reconciliation process. With the powerful $\ell_1$ optimization, this approach can recover more errors than previous reconciliation methods, as will be demonstrated in Sec. 5.7. Thus, it can improve the key generation rate. Finally, some methods require both Alice and Bob to perform reconciliation to correct the errors between their keys. In the CS-based approach, however, only one device needs to perform reconciliation (e.g., Alice in this example). The users can choose to run the $\ell_1$ optimization on the power-rich device to reduce the computational burden on resource-limited IoT devices. Thus, this approach can improve energy efficiency, as will be demonstrated in Sec. 5.10.

As discussed in Sec. 3, Eve has the power to modify, insert, and replay messages. She can therefore perform two common attacks during the reconciliation process: MITM and replay attacks. Eve can launch a MITM attack by impersonating Alice or Bob during the key generation process to modify messages or insert her own. To solve this problem, we apply the message authentication code (MAC) method to ensure the integrity and authenticity of the message [5].
Bob includes an additional MAC message with $y_{Bob}$, so the total message sent to Alice becomes $L_{Bob} = \{y_{Bob}, MAC(K_{Bob}, y_{Bob})\}$. After receiving $L_{Bob}$, Alice computes $\bar{K}_{Alice}$ by Eq. 2 and verifies the sender's identity. If $MAC(\bar{K}_{Alice}, y_{Bob}) \ne MAC(K_{Bob}, y_{Bob})$, Alice knows that the message was modified, indicating the presence of Eve. If $MAC(\bar{K}_{Alice}, y_{Bob}) = MAC(K_{Bob}, y_{Bob})$, Alice can confirm that this message indeed originated from Bob. To detect replay attacks, we can adopt commonly used methods such as nonces, timestamps, or tagging each message with a session ID [37].

Although the above methods can prevent MITM and replay attacks, Eve can utilise the eavesdropped $y_{Bob}$ to infer the shared key $K_{Bob}$. The authors in [32] pointed out two potential vulnerabilities.

Vulnerability 1: Eve can try to recover Bob's key from $y_{Bob}$ by $\ell_1$ optimisation directly.

Vulnerability 2: Eve may launch the three types of attacks mentioned in Sec. 3 to obtain her own channel measurement. Then she can impersonate a legitimate device and perform the same reconciliation process as Alice by using her channel measurement, with the aim of generating the same key.

PSN ’21, May 18-21, 2021, Tennessee, USA

The authors in [32] have proved that the CS-based reconciliation approach is perfectly effective and secure if the number of rows $M$ of $A$ meets the following condition:

$$P < M < Q, \qquad P = S_{\Delta A,B} \cdot \log(N / S_{\Delta A,B}), \qquad Q = \min(S_{Bob}, S_{\Delta B,E})$$

where $S_{\Delta A,B}$ is the sparsity of $\Delta x$, $S_{Bob}$ is the sparsity of $K_{Bob}$, and $S_{\Delta B,E}$ is the sparsity of the difference between $K_{Bob}$ and $K_{Eve}$. The sparsity here means the number of non-zero values in the corresponding vector, i.e., the number of mismatches. As noted in [32], designing an effective and secure CS-based reconciliation amounts to finding a suitable $M$, with the upper bound $Q$ being the security threshold and the lower bound $P$ being the effectiveness threshold. We conduct extensive experiments to find a proper range of $M$ in Sec. 6.

Although multi-level quantization can generate more bits from one sample, it may also generate some duplicated bits, as in the example in Fig. 4(b). Directly using such a key will weaken the security of the system. Note that the Bloom filter-based approach in Sec. 4.2.2 can prevent reverse engineering attacks but cannot improve entropy. To address this problem, we use the Karhunen-Loeve Transform (KLT) to decorrelate the bit sequence after reconciliation.

Assume the key generated by Alice after reconciliation is $\bar{K}_{Alice} = (k_1, k_2, \cdots, k_L)^T$, where $k_i$ is the $i$-th bit and $L$ is the length of the key. The first step in KLT is to calculate the auto-correlation matrix $R = E(K K^T)$. Next, we calculate its eigenvalues $\lambda_i$ and eigenvectors $\phi_i$ so that $R \phi_i = \lambda_i \phi_i$ ($i = 1, 2, \cdots, L$). Note that $R$ is Hermitian, and its eigenvectors $\phi_i$ are orthogonal. If we rank the eigenvalues in descending order, $\lambda_1 > \lambda_2 > \cdots > \lambda_L$, we can construct a unitary matrix $\Phi$ which diagonalizes $R$: $\Phi = [\phi_1, \cdots, \phi_L]$. $\Phi$ is called the KLT matrix and can be used to decorrelate the bit sequence $K$. We choose the eigenvectors of the $S$ largest eigenvalues to construct $\Phi$, so we can obtain a decorrelated key string by $K''_{Alice} = \Phi^T \bar{K}_{Alice}$.
In the same way, Bob can generate a decorrelated key sequence $K''_{Bob} = K''_{Alice}$.

Although reconciliation can improve the reliability of a key generation protocol, it reveals partial information to Eve. Privacy amplification is a common approach to remove the revealed information from the generated secret key sequence. It is usually implemented with an extractor, universal hashing functions, or cryptographic hash functions [38]. In InaudibleKey, we use the commonly used dual universal hash function [39] to generate the final key. Finally, the key can be used by encryption algorithms such as AES-128 to secure the communication.
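The privacy amplification step can be illustrated with a universal hash over GF(2). The sketch below uses a Toeplitz-matrix hash, a standard universal family chosen here purely for illustration; it is not claimed to be the dual universal hash function of [39], and the function name, the 160-bit input length, and the seed handling are assumptions.

```python
import random


def toeplitz_hash(key_bits, seed_bits, out_len):
    """Compress key_bits to out_len bits with a Toeplitz matrix over GF(2).

    The matrix entry T[i][j] = seed_bits[i - j + n - 1] is constant along
    each diagonal, so a public seed of n + out_len - 1 bits defines it.
    """
    n = len(key_bits)
    if len(seed_bits) != n + out_len - 1:
        raise ValueError("seed must have len(key_bits) + out_len - 1 bits")
    digest = []
    for i in range(out_len):
        bit = 0
        for j in range(n):
            # GF(2) inner product of row i with the key
            bit ^= seed_bits[i - j + n - 1] & key_bits[j]
        digest.append(bit)
    return digest


# Illustration only: a real deployment would draw the seed from a CSPRNG.
rng = random.Random(2021)
reconciled = [rng.randint(0, 1) for _ in range(160)]      # hypothetical reconciled key
seed = [rng.randint(0, 1) for _ in range(160 + 128 - 1)]  # public seed
final_key = toeplitz_hash(reconciled, seed, 128)          # 128 bits, e.g. for AES-128
print(len(final_key))
```

Both devices would apply the same public seed to their reconciled keys, so identical inputs yield identical final keys, while the compression discards more bits than were leaked during reconciliation.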

Experimental Setup.

We implement InaudibleKey on a Samsung S10, which is equipped with a microphone and a speaker. The frequency range of the inaudible acoustic signal is 18-22 kHz, and the sampling rate is 48 kHz. The volume of the speaker is set to the maximum, and the corresponding sound pressure level (SPL) of the acoustic signal is 82 dB. Bluetooth Low Energy broadcast is used as the public channel to exchange reconciliation information. These settings are supported by most modern mobile devices. We conduct extensive experiments with four smartphones, namely Alice, Bob, Eve, and David (Eve's partner). As in the state-of-the-art [4, 21], data are collected in four different scenarios: A (indoor static), B (outdoor static), C (indoor mobile), and D (outdoor mobile). In mobile scenarios, the users can shake Alice and Bob, or carry them and walk around. In static scenarios, Alice and Bob are stationary while some people are moving around. The indoor experiments are conducted in a student laboratory, while the outdoor experiments are carried out on a campus road. In each environment, we vary the distance between Alice and Bob from 10 cm to 300 cm to evaluate the impact of distance. Eve and David are located at least 10 cm away from the legitimate devices.

We first evaluate the impact of the important parameters in InaudibleKey, which include the guard band ratio $\alpha$ in quantization (Sec. 4.2.1) and the number of eigenvectors $S$ in privacy amplification (Sec. 4.4).

The guard band ratio $\alpha$ trades off the key generation rate against the key agreement rate. Generally, a larger $\alpha$ means that more samples are discarded, which improves the bit matching rate but decreases the bit generation rate. Fig. 5 shows the impact of $\alpha$ on the key generation rate and the key matching rate. First, we can see that the matching rate increases significantly after information reconciliation. Also, the key agreement rate rises with $\alpha$ because more mismatched bits are discarded. In InaudibleKey, we choose a value of $\alpha$ that achieves a high key agreement rate while still generating sufficient bits. Although a larger $\alpha$ decreases the key generation rate, Fig. 5(b) shows that InaudibleKey still achieves a generation rate of 768 bit/sec at the chosen $\alpha$. The key generation rate of InaudibleKey significantly outperforms the state-of-the-art works: it is 3× faster than FREE [21], 7× faster than TDS [4], 30× faster than Walkie-Talkie [10], and 256× faster than H2B [32]. Therefore, in terms of key generation rate, InaudibleKey is a better option than using radio signals (e.g., TDS [4]), motion sensor signals (e.g., Walkie-Talkie [10], Shake-n-Shack [11]), and biometric signals (e.g., H2H [40], H2B [32]).

The number of selected eigenvectors $S$ trades off the key generation rate against entropy. If $S$ is larger, InaudibleKey can generate more keys but with lower entropy. If $S$ is smaller, InaudibleKey generates fewer keys with higher entropy. Fig. 5(c) shows the impact of $S$ on the key generation rate and entropy. We can see that if $S$ is less than 80, the improvement of entropy levels off. Therefore, we choose the largest 80 eigenvectors to form the decorrelation matrix.

After determining $\alpha$, we evaluate the impact of the distance between Alice and Bob on system performance. Fig. 5(d) shows that the key agreement rate decreases slightly when the distance between Alice and Bob increases from 10 cm to 220 cm. It then starts to drop quickly when the distance further increases from 220 cm to 300 cm. This is because, as the distance increases, the audio signal attenuates exponentially due to path loss [41]. From Fig. 5(e), we find that the key generation rate first increases when the distance increases from 10 cm to 20 cm, then becomes stable from 20 cm to 230 cm, after which it starts to drop rapidly. This is because the Line-of-Sight channel dominates the signal when the two devices are very close, and hence there is not much randomness to use. However, when the communication distance is too large, more environmental noise is involved in the received signal and the signal-to-noise ratio (SNR) becomes low, which leads to more discrepancies. From Fig. 5(d) and Fig. 5(e), we find that [20 cm, 220 cm] is a reasonable pairing range to achieve both a high agreement rate and a high bit generation rate. InaudibleKey extends the pairing distance by 3.2 times compared to FREE [21], and by 44 times compared to TDS [4].

Figure 5: Impact of $\alpha$, $S$ and distance. Panels: (a) key matching rate (%) vs. alpha, before and after reconciliation; (b) key generation rate (bit/sec) vs. alpha; (c) key generation rate (bit/s) and entropy vs. number of eigenvectors; (d) key matching rate (%) vs. distance (cm); (e) key generation rate (bit/sec) vs. distance (cm).

Researchers have revealed that people's social distance varies from 1.2 m to 3.7 m [42]. Therefore, InaudibleKey can meet the pairing requirement of mobile devices in most cases. If the distance between two users is larger than the pairing range (say, more than 3 m), a user can either walk a few steps closer to the target or use a shorter key for temporary association, depending on the application requirement and the user's preference. In particular, people are currently asked to keep a social distance of 1.5-2 m due to the pandemic. Our system can help users establish a secure communication channel without close contact, and thus help stop the spread of COVID-19.

In this experiment, we evaluate the impact of different environments by using the data in the range of [20 cm, 220 cm]. Fig. 6(a) illustrates the key generation rate and key agreement rate in different scenarios. The key agreement rate of the outdoor environments (i.e., B and D) is slightly lower than that of the indoor environments (i.e., A and C). This is because there is less multipath effect and more environmental noise in outdoor environments [6]. In terms of generation rate, we can see that the mobile scenarios (i.e., C and D) generate more bits than the static scenarios (i.e., A and B). This is because the mobile scenarios generate more channel diversity and randomness.

While our analysis above shows that InaudibleKey consistently achieves a high agreement rate, the experiments are conducted on campus only. The real-world environment is more complex and may contain various kinds of noise. We now study the degradation in matching rate with increasing background noise. We manually add 18-22 kHz random Gaussian noise with different intensities (30, 50 and 70 dB). From Fig. 6(b), we can see that the key agreement rate only drops slightly when the noise level is 30 dB. However, when we further increase the noise level, the key agreement rate drops significantly. In practice, there is little environmental noise in the range of 18-22 kHz in normal office and street environments; such noise usually occurs in factories and metro stations [20]. To verify this, we conducted another experiment in four common environments: coffee shop, shopping centre, bus station and metro station. The distance between Alice and Bob is about 1 m, and we collect 30 minutes of data for each environment. All measurements are made between 9 AM and 6 PM. In this experiment, we use the success rate (the probability of generating the same key) instead of the matching rate as the metric because we want to know how many trials the users need to successfully pair two devices in real environments. Fig. 6(c) shows the success rate in each environment. We can see that InaudibleKey achieves a success rate of over 95% in the coffee shop and the shopping centre. The success rate in the bus station drops slightly, but it is still over 90%. We notice that the success rate in the metro station drops remarkably. This is because the noise level of subway stations can exceed 100 dB according to prior studies [43, 44]. More importantly, there is more noise in the inaudible frequency range. However, the success rate improves if we use an 8-bit key, which suggests that in noisy environments users can use a short key to improve the success rate.

So far, we have assumed that both Alice and Bob use the same model of smartphone (i.e., Samsung S10). Now we evaluate the performance of InaudibleKey when Alice and Bob are different types of devices. Fig. 7 plots the CFRs of different types of mobile devices. We can see that when Alice and Bob both use a Samsung S10, their CFRs are very close to each other. When Bob uses an HTC smartphone or a HUAWEI watch, there are more differences, but the CFR pattern over the frequency band is still very similar. Tab. 1 shows the key agreement rate using different devices. As expected, InaudibleKey achieves the highest success rate when Alice and Bob are the same type of device. The success rate drops slightly when Alice and Bob are different types of devices. This is because different types of microphones and speakers affect the transmitted and recorded acoustic signals differently. In particular, the matching rate of the Arduino with other devices is the lowest because we use a low-price microphone and speaker module, as will be discussed in Sec. 5.10.

To demonstrate the advantage of our optimisation algorithm, we compare it with the original CS-based reconciliation method and other methods. In the literature, two commonly used reconciliation techniques are error-correcting codes (ECC) [23, 45] and interactive methods [5, 10, 46, 47]. In this paper, we use the Reed-Solomon code RS(15,7) and the method in [46] as benchmarks. We calculate the key agreement rate of each method and plot the results in Fig. 8. The result of the original CS-based method is obtained by averaging the results of randomly generating the sampling matrix 30 times. As can be seen from Fig. 8, InaudibleKey outperforms the original CS-based method, the interactive method and ECC, and consistently achieves the highest agreement rate in all the environments.

Figure 6: Impact of environment and noise. Panels: (a) impact of different environments (key generation rate and key agreement rate (%) for scenarios A-D); (b) impact of background noise (key agreement rate for scenarios A-D with no noise and with 30 dB, 50 dB and 70 dB noise); (c) performance in real environments (success rate (%) for coffee shop, shopping center, bus station and metro station).

Figure 7: CFR of different devices (normalized CFR vs. frequency, 18-22 kHz, for Alice-Samsung, Bob-Samsung, Bob-HTC and Bob-Huawei).

Figure 8: Different reconciliation methods (success rate in scenarios A-D for InaudibleKey, the original CS method, the interactive method and ECC).

Table 1: Success rate of different pairs (devices: Samsung, HTC, Huawei, Arduino).

In this subsection, we compare InaudibleKey with several representative key agreement approaches for mobile networks. These methods include KEEP [1], ASBG [3], TDS [4], Radio-telepathy [5], CGC [6] and FREE [21]. To achieve a fair comparison, we fine-tune the parameters of the other methods to make sure they achieve their best performance. Specifically, for FREE, the distance between Alice and Bob is set to 80 cm, and the block size is 30. For ASBG, KEEP and CGC, $\alpha$ and the fragment size are set to 0.35 and 50, respectively. For TDS, the block size $\beta$ is 5, and the distance between Alice and Bob is set to 4 cm as suggested by its authors. We compare the key agreement rate, key generation rate, entropy, and reconciliation counts of the different methods.

Fig. 9 shows the performance of the different approaches. From Fig. 9(a), we can see that the key agreement rate of InaudibleKey is higher than that of the other approaches except for TDS. In this experiment, TDS achieves the highest agreement rate because of the short distance (4 cm). However, such a short distance is unrealistic in practice. From Fig. 9(b), we can see that the key generation rate of InaudibleKey is significantly higher than that of previous works. To be specific, the key generation rate of InaudibleKey is on average 3× faster than FREE [21] and 7× faster than TDS [4]. There are a number of reasons for the improvement. First, the sampling rate of the audio signal (i.e., 48 kHz) is significantly higher than that of radio channel probing. Second, the channel frequency response provides more channel information than the channel tap used in FREE [21]. Finally, the optimised CS-based reconciliation method can recover more mismatches, as demonstrated in the last subsection. Fig. 9(c) shows the entropy of the extracted keys. We find that by using KLT to decorrelate the bit sequences, InaudibleKey achieves higher entropy than the other methods. Fig. 9(d) plots the information reconciliation counts of the different methods. We can see that InaudibleKey requires the fewest information reconciliation rounds. To successfully generate the same key, InaudibleKey only requires Alice and Bob to exchange reconciliation messages 1.6 times on average. In comparison, TDS requires 4 pass checks [4], and FREE needs to exchange 25 reconciliation messages on average [21]. In other words, InaudibleKey reduces information reconciliation counts by 2.5-16 times.

The results show that InaudibleKey significantly improves the key generation rate and the entropy, and significantly reduces reconciliation counts, compared to the state-of-the-art.

To evaluate the randomness of the extracted keys, we apply the commonly used NIST suite of statistical tests [48]. The results of the NIST statistical tests are the p-values of the different test processes, which indicate whether the key is random or not. If the p-value is greater than the significance threshold, the key is considered to be random. From the results in Tab. 2, we find that the p-values are all larger than this threshold, which suggests that the extracted keys have good randomness.

Table 2: Results of NIST test.

NIST test                   p-value
Serial                      0.553
FFT Test                    0.179
Longest Run                 0.353
Monobit Frequency           0.742
Linear Complexity           0.705
Block Frequency             0.178
Cumulative Sums             0.741
Approximate Entropy         0.885
Non Overlapping Template    0.5329
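As an illustration of how one entry of such a table is obtained, the sketch below implements the Monobit Frequency test as specified in NIST SP 800-22: the bits are mapped to ±1, summed, normalised by $\sqrt{n}$, and converted to a p-value with the complementary error function. The function names are hypothetical, and the default threshold of 0.01 is the significance level suggested in the NIST document, assumed here rather than taken from this paper.

```python
import math


def monobit_p_value(bits):
    """p-value of the NIST SP 800-22 frequency (monobit) test."""
    n = len(bits)
    if n == 0:
        raise ValueError("need at least one bit")
    s_n = sum(1 if b else -1 for b in bits)   # +1 for each '1', -1 for each '0'
    s_obs = abs(s_n) / math.sqrt(n)           # normalised test statistic
    return math.erfc(s_obs / math.sqrt(2))


def looks_random(bits, alpha=0.01):
    """Pass the test when the p-value is at least the significance level."""
    return monobit_p_value(bits) >= alpha


# Worked example from NIST SP 800-22 (sequence 1011010101, p-value about 0.527089).
example = [1, 0, 1, 1, 0, 1, 0, 1, 0, 1]
print(round(monobit_p_value(example), 6))
print(looks_random(example))
```

The other tests in the suite follow the same pattern: each computes a statistic from the bit sequence and reports a p-value that is compared against the same significance level.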

Figure 9: Comparison with state-of-the-arts. Panels: (a) key agreement rate (%); (b) key generation rate; (c) entropy; (d) reconciliation counts. Methods compared: InaudibleKey, FREE, TDS, KEEP, ASBG, CGC, Radio-Telepathy.

Table 3: System overhead.

                           Samsung S10                        Arduino
                           InaudibleKey   RSA     ECDHE-RSA   InaudibleKey   RSA     ECDHE-RSA
Processing time (ms)       124            361     347         891            4,196   5,481
Energy consumption (mJ)    108            391     354         1,107          1,706   2,196

Figure 10: Experimental setup of energy consumption.

To validate the feasibility of InaudibleKey on various IoT devices, we implement a prototype of InaudibleKey on a Samsung S10 smartphone and an Arduino Uno board.

The CPU of the Samsung S10 is a Snapdragon at 2.84 GHz, and the operating system is Android 9.0. It is equipped with a stereo speaker and two dedicated microphones with active noise cancellation. Only the bottom microphone is used because it is close to the speaker. The system is implemented in Java, and the MAC algorithm described in Section 4.3 is implemented based on SHA256 (HMAC-SHA256). In InaudibleKey, we save the transmitted OFDM signal as a Waveform Audio (WAV) file in 16-bit Pulse Coded Modulation (PCM) format, which is played by the speaker. To reduce the expected response time, we implement InaudibleKey with multiple threads. Two threads are created after InaudibleKey is launched. One of the threads is responsible for transmitting the WAV file. After transmitting, the smartphone switches to listening mode, and another thread is created to record the audio signal from the other phone. In reconciliation, InaudibleKey uses $\ell_1$-Homotopy [49], which is an efficient implementation of the $\ell_1$ optimization algorithm. The complexity of $\ell_1$-Homotopy is $O(k^3 + kmn)$, where $k$ is the sparsity of the solution, and $m$ and $n$ are the dimensions of the sampling matrix $A$, which are 23 and 128, respectively.

Firstly, we compare InaudibleKey with public key cryptography and the Diffie-Hellman key exchange protocol. For public key cryptography, we use the commonly used RSA as the benchmark. Alice can use RSA to encrypt a 128-bit key which can be decrypted by Bob. Then, Alice and Bob can use AES-128 to secure their communication. In this experiment, we use a 2048-bit key for RSA, as recommended by NIST [50]. The traditional Diffie-Hellman protocol is susceptible to MITM attacks and is rarely used in practice. Therefore, we use the commonly used Elliptic Curve Diffie-Hellman Ephemeral with RSA signature (ECDHE_RSA), which is used in Transport Layer Security (TLS). We implement these algorithms on the Samsung S10 and calculate their processing time and energy consumption. The implementation of these cryptographic algorithms is based on the Chilkat library. The computation time is obtained from the console of the development environment (Android Studio) and averaged over 30 tests. The energy consumption of the smartphone is calculated by reading the voltage and current level of the battery, which can be obtained through the Android API (https://developer.android.com/reference/android/os/BatteryManager.html). As the results in Table 3 show, RSA requires 361 ms to finish a round of encryption and decryption with a 2048-bit key. It takes about 347 ms for ECDHE_RSA to generate a 128-bit key. However, InaudibleKey only requires 124 ms to generate a 128-bit key. Therefore, InaudibleKey is superior to public key cryptography and the D-H protocol for key distribution on mobile devices.

Secondly, to verify the feasibility of InaudibleKey on resource-limited IoT devices, we implement our system on an Arduino Uno board. Compared to the powerful CPU in the Samsung S10, the microcontroller of the Arduino is an ATmega328, which only has 32 K of flash memory and 2 K of SRAM. There is no default speaker or microphone on the Arduino board, so we connect an additional speaker module and microphone module to it. To measure the energy consumption on the Arduino, we connect the output of a 9V battery to a digital oscilloscope (the operating voltage of the ATmega328P is 5V, but the input voltage of the Arduino board is 6V to 12V). The details of the experimental setup are shown in Fig. 10. The voltage over the resistor is stored on a USB drive and used to calculate the energy consumption of the board. The processing time and energy consumption are shown in Table 3. We can see that although the system overhead of InaudibleKey is much higher than that on the smartphone, it is still much more efficient than RSA and ECDHE-RSA. Moreover, the computation-intensive part of InaudibleKey (reconciliation) can be performed on the power-rich device if one of the devices is powerful.

We now analyse the impact of energy consumption on IoT devices. The battery capacity of the Samsung S10 is 3,400 mAh (42.8 kJ), so the energy cost of one run of InaudibleKey (108 mJ) amounts to about $0.3 \times 10^{-5}$ of the total energy supply. If we assume a targeted smartphone lifespan of one day, the energy budget is 1.75 kJ per hour. Then, with only 1% of this budget (17.5 J), InaudibleKey is able to run approximately 175 times per hour, i.e., InaudibleKey can continuously run every 20 seconds. In the same way, we can estimate that with a 9V battery (500 mAh) and the same fraction of the battery budget, InaudibleKey is able to run every 22 minutes on the Arduino Uno board, i.e., it can run about 3 times per hour. These results demonstrate that InaudibleKey incurs a low system overhead and is more efficient than the public key schemes.

Figure 11: Security analysis. Panels: (a) agreement rate of different attacks (key agreement rate (%) vs. alpha for the eavesdropping attack, imitating attack and predictable channel attack); (b) Bob vs. predictable channel attacker (normalized CFR (dB) vs. frequency, 18-22 kHz, for Bob and Eve); (c) feasible range of parameter M (complementary CDF vs. M for the legitimate devices, eavesdropping attack, imitating attack and predictable channel attack, with the feasible range marked).

In Vulnerability 1, Eve can try to reconstruct the key from $y_{Bob}$ directly using $\ell_1$ optimization. As discussed above, the key generated by InaudibleKey has high entropy, which means that almost half of the key bits are '1's. Using the initial agreement rate of Alice and Bob reported in Fig. 6(a) and assuming a 128-bit key, the bounds $P$ and $Q$ follow from the formulas in [32]. Theoretically, the range of $M$ can be [20, 63] because $P < M < Q$. In practice, we choose $M$ inside this range to guarantee both security against an adversary and the availability of the same key.

Eve can perform the following three types of attacks to generate a key $K_{Eve}$ that is close to $K_{Bob}$, hoping that she can recover $K_{Bob}$ with the eavesdropped $y_{Bob}$.
Against Eavesdropping Attack.

In this attack, Eve can eavesdrop on all the communication traffic in the public channel. Since Eve is located outside the safe distance (>10 cm), she will obtain a totally different channel response, as discussed in Sec. 2. From Fig. 11(a), we can see that the agreement rate of the eavesdropping attack is about 20-35%. Therefore, if Eve is outside the safe distance, she cannot guess the same key due to the different multipath fading channel.

Against Imitating Attack.

In this attack, Eve can observe how Alice and Bob generate keys. Then, after Alice and Bob leave the site, Eve asks her partner David to imitate the motion of Alice and Bob to generate the same key. Previous studies have shown that simply imitating the user's shaking or walking motions cannot generate the same key for accelerometer-based authentication systems [10, 11, 51]. Similarly, Fig. 11(a) shows that an imitating attack achieves a higher agreement rate when $\alpha$ increases, but its agreement rate remains bounded. More importantly, Eve does not know which bits are correct because of the time-varying nature of the channel.

Against Predictable Channel Attack.

The predictable channel attack is a simple but effective attack to compromise a key agreement protocol, especially for RSSI-based approaches [3, 6]. In this attack, Eve can intentionally block and unblock the Line-of-Sight (LOS) between Alice and Bob to generate predictable channel measurements. We evaluate this attack by setting up Alice and Bob 100 cm apart with LOS and asking a person to walk between Alice and Bob intentionally. After the key generation, we replace Alice and Bob with Eve and David and ask the same person to repeat the process. We then compare Eve's key with Bob's key to see whether Eve can generate the same key. From Fig. 11(a), we can see that the predictable channel attack achieves the highest matching rate among the three types of attack, but its matching rate is still bounded. Fig. 11(b) plots the CFR of Bob and Eve when the same person blocks the LOS signal. We can see that although the channel responses of Bob and Eve are similar at some frequencies, they differ substantially at other frequencies due to time-varying channels and hardware differences. In particular, we notice that Eve is capable of producing similar channel responses in the lower frequency range but not in the higher frequency range. There are two reasons. First, the microphone actually works as a low-pass filter with a 22 kHz cutoff frequency [52], so in the higher frequency range the acoustic signal is attenuated slightly, which results in more mismatches. Additionally, Zhou et al. [20] found that the performance of different speakers is much more diversified in the higher frequency range. If the attacker leverages more sophisticated hardware, it is possible that their attacking ability increases, but this is an open question that requires further investigation.

Although the imitating attack and the predictable channel attack can achieve a non-trivial matching rate, the probability of deducing the same 128-bit key is extremely low. The matching rate of Eve can be further reduced by setting a higher threshold in quantization or by turning down the volume of the speaker. Considering the matching rate of Eve, a 225-bit key of our system is equivalent to a 128-bit AES symmetric key, and it takes about 0.3 s to generate such a key based on the result in Sec. 5.3.

Fig. 11(c) shows the distribution of $P$ and $Q$ in our dataset. We can see that there is a feasible range to use. In other words, if $M$ lies in the feasible range, InaudibleKey is resilient to the three types of attacks above. Previous studies also found that if the same sampling matrix $A$ is used repeatedly, both $y_{Bob}$ and $K_{Bob}$ could be conditionally accessed [53]. We can easily solve this problem by updating $A$ after each successful key generation. Although $A$ needs to be pre-stored, it is public information rather than a secret known only to Alice and Bob. InaudibleKey does not realise literal authentication but relies on the user to authenticate the other device. This is practical because if Eve wants to perform an impersonation attack, she must be close to Alice or Bob, and her suspicious actions can easily be spotted by Alice or Bob.

Proximity-based approaches.

The proximity-based approaches pair two devices based on the observation that two devices in physical proximity can measure similar physical information. Researchers have proposed many different systems by exploring various location-sensitive features such as RSSI [7, 8, 54], CSI [4], audio [55] and illumination [56]. However, these approaches suffer from a common problem: the two legitimate devices must be very close to each other, e.g., 1.25 cm in Proximate [8] and 5 cm in TDS [4].

Channel reciprocity-based approaches.

Physical layer key generation has been a hot research field over the past decade. Researchers have studied key agreement for different wireless technologies such as ZigBee [3] and Wi-Fi [4, 5]. Among these approaches, RSSI-based key generation methods suffer from the predictable channel attack and a low bit generation rate. Although CSI-based key generation methods can improve the bit generation rate, most systems rely on customised hardware to obtain CSI. Recently, researchers have also used the unique body channel to pair two mobile devices [17, 57]. However, these methods require specialised sensors such as electrodes [17] and Electromyogram sensors [57].

Acoustic signal-based approaches.

Recently, the acoustic signal has also been exploited to pair mobile devices [15, 16, 21, 22, 55]. Proximity-based schemes such as [16, 55] are not feasible due to the constraint of social distance. Two recent works [21, 22] are closely related to our system. FREE [21] used the channel tap, and the authors of [22] used sound pressure, as channel characteristics. However, these metrics can only provide a coarse estimation of the acoustic channel. In comparison, we modulate the audio signal using OFDM technology to obtain fine-grained channel estimation and propose an optimisation algorithm to improve the performance of reconciliation. This is why we can achieve a much higher generation rate.
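To illustrate why an OFDM probe yields a finer-grained view of the channel than a single scalar metric, the following is a minimal sketch of pilot-based least-squares channel estimation. It is not the InaudibleKey implementation: the subcarrier count, cyclic-prefix length, BPSK pilots and toy multipath channel are all hypothetical values chosen for illustration, noise is omitted, and the up-conversion to the inaudible band is left out.

```python
import numpy as np

rng = np.random.default_rng(0)

n_sub = 64    # number of OFDM subcarriers (hypothetical)
cp_len = 16   # cyclic-prefix length in samples (hypothetical)

# Known BPSK pilot symbols, one per subcarrier.
pilots = rng.choice([-1.0, 1.0], size=n_sub).astype(complex)

# Transmitter: IFFT to the time domain, then prepend the cyclic prefix.
body = np.fft.ifft(pilots)
tx = np.concatenate([body[-cp_len:], body])

# Toy multipath channel: a direct path plus two weaker echoes (assumed).
h = np.zeros(8, dtype=complex)
h[0], h[3], h[6] = 1.0, 0.5, 0.25

# Receiver: the channel acts as a convolution; drop the prefix and FFT back.
rx = np.convolve(tx, h)[: len(tx)]
rx_freq = np.fft.fft(rx[cp_len : cp_len + n_sub])

# Least-squares estimate: one complex channel gain per subcarrier.
h_est = rx_freq / pilots

# Reference: the true frequency response of the toy channel.
h_true = np.fft.fft(h, n_sub)
print(np.allclose(h_est, h_true))
```

Because the cyclic prefix is longer than the channel's delay spread, the channel acts on the symbol as a circular convolution, so dividing the received spectrum by the known pilots recovers one gain per subcarrier. Here that gives 64 values describing the channel, rather than a single coarse measurement.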

In this paper, we propose a novel key generation system for mobile devices via inaudible acoustic signals. Extensive evaluation results show that InaudibleKey significantly outperforms the state of the art. To demonstrate its feasibility, we implement InaudibleKey on both powerful and resource-limited IoT devices. We also verify the security of InaudibleKey against malicious attacks. The results in this paper show that InaudibleKey is a fast, practical and efficient key generation protocol for mobile devices that can work in various environments. More importantly, it allows users to pair two mobile devices without breaking social distance restrictions.

ACKNOWLEDGMENTS

This work is supported by the APRC grant (Project No. 9610485) and the Start-up grant (Project No. 7200642) from City University of Hong Kong.

REFERENCES

[1] Wei Xi, Xiang-Yang Li, Chen Qian, Jinsong Han, Shaojie Tang, Jizhong Zhao, and Kun Zhao. Keep: Fast secret key extraction protocol for d2d communication. In IWQoS, pages 350–359. IEEE, 2014.

[2] Weitao Xu, Junqing Zhang, Shunqi Huang, Chengwen Luo, and Wei Li. Key generation for internet of things: A contemporary survey. ACM Computing Surveys (CSUR), 54(1):1–37, 2021.

[3] Suman Jana, Sriram Nandha Premnath, Mike Clark, Sneha K Kasera, Neal Patwari, and Srikanth V Krishnamurthy. On the effectiveness of secret key extraction from wireless signal strength in real environments. In Mobicom, pages 321–332. ACM, 2009.

[4] Wei Xi, Chen Qian, Jinsong Han, Kun Zhao, Sheng Zhong, Xiang-Yang Li, and Jizhong Zhao. Instant and robust authentication and key agreement among mobile devices. In CCS, pages 616–627. ACM, 2016.

[5] Suhas Mathur, Wade Trappe, Narayan Mandayam, Chunxuan Ye, and Alex Reznik. Radio-telepathy: extracting a secret key from an unauthenticated wireless channel. In Mobicom, pages 128–139. ACM, 2008.

[6] Hongbo Liu, Yang Wang, Jie Yang, and Yingying Chen. Fast and practical secret key extraction by exploiting channel response. In INFOCOM, pages 3048–3056. IEEE, 2013.

[7] Alex Varshavsky, Adin Scannell, Anthony LaMarca, and Eyal De Lara. Amigo: Proximity-based authentication of mobile devices. In Ubicomp, pages 253–270. Springer, 2007.

[8] Suhas Mathur, Robert Miller, Alexander Varshavsky, Wade Trappe, and Narayan Mandayam. Proximate: proximity-based secure pairing using ambient wireless signals. In Mobisys, pages 211–224. ACM, 2011.

[9] Nikolaos Karapanos, Claudio Marforio, Claudio Soriente, and Srdjan Capkun. Sound-proof: usable two-factor authentication based on ambient sound. In , pages 483–498, 2015.

[10] Weitao Xu, Girish Revadigar, Chengwen Luo, Neil Bergmann, and Wen Hu. Walkie-talkie: Motion-assisted automatic key generation for secure on-body device communication. In IPSN, pages 1–12. IEEE, 2016.

[11] Yiran Shen, Fengyuan Yang, Bowen Du, Weitao Xu, Chengwen Luo, and Hongkai Wen. Shake-n-shack: Enabling secure data exchange between smart wearables via handshakes. In PerCom, pages 1–10. IEEE, 2018.

[12] Qian Wang, Hai Su, Kui Ren, and Kwangjo Kim. Fast and scalable secret key generation exploiting channel phase randomness in wireless networks. In INFOCOM, pages 1422–1430. IEEE, 2011.

[13] Daniel Halperin, Wenjun Hu, Anmol Sheth, and David Wetherall. Tool release: Gathering 802.11n traces with channel state information. ACM SIGCOMM Computer Communication Review, 41(1):53–53, 2011.

[14] Matthias Schulz, Jakob Link, Francesco Gringoli, and Matthias Hollick. Shadow wi-fi: Teaching smartphones to transmit raw signals and to extract channel state information to implement practical covert channels over wi-fi. In Mobisys, pages 256–268, 2018.

[15] Jun Han, Albert Jin Chung, Manal Kumar Sinha, Madhumitha Harishankar, Shijia Pan, Hae Young Noh, Pei Zhang, and Patrick Tague. Do you feel what i hear? enabling autonomous iot device pairing using different sensor types. In , pages 836–852. IEEE, 2018.

[16] Pengjin Xie, Jingchao Feng, Zhichao Cao, and Jiliang Wang. Genewave: Fast authentication and key agreement on commodity mobile devices. IEEE/ACM Transactions on Networking (TON), 26(4):1688–1700, 2018.

[17] Marc Roeschlin, Ivan Martinovic, and Kasper Bonne Rasmussen. Device pairing at the touch of an electrode. In NDSS, volume 18, pages 18–21, 2018.

[18] Wei Wang, Lin Yang, and Qian Zhang. Touch-and-guard: secure pairing through hand resonance. In Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing, pages 670–681, Heidelberg Germany, September 2016. ACM.

[19] Rajalakshmi Nandakumar, Vikram Iyer, Desney Tan, and Shyamnath Gollakota. Fingerio: Using active sonar for fine-grained finger tracking. In CHI, pages 1515–1525. ACM, 2016.

[20] Zhe Zhou, Wenrui Diao, Xiangyu Liu, and Kehuan Zhang. Acoustic fingerprinting revisited: Generate stable device id stealthily with inaudible sound. In CCS, pages 429–440. ACM, 2014.

[21] Youjing Lu, Fan Wu, Shaojie Tang, Linghe Kong, and Guihai Chen. Free: A fast and robust key extraction mechanism via inaudible acoustic signal. In Mobihoc, pages 311–320. ACM, 2019.

[22] Dania Qara Bala and Bhaskaran Raman. Phy-based key agreement scheme using audio networking. In , pages 129–136. IEEE, 2020.

[23] Hongbo Liu, Jie Yang, Yan Wang, and Yingying Chen. Collaborative secret key extraction leveraging received signal strength in mobile wireless networks. In INFOCOM, pages 927–935. IEEE, 2012.

[24] David Tse and Pramod Viswanath. Fundamentals of wireless communication. Cambridge University Press, 2005.

[25] Matthew Edman, Aggelos Kiayias, and Bülent Yener. On passive inference attacks against physical-layer key extraction? In Proceedings of the Fourth European Workshop on System Security, page 8. ACM, 2011.

[26] John Paul Walters, Zhengqiang Liang, Weisong Shi, and Vipin Chaudhary. Wireless sensor network security: A survey. Security in distributed, grid, mobile, and pervasive computing, 1:367, 2007.

[27] Theodore S Rappaport et al. Wireless communications: principles and practice, volume 2. Prentice Hall PTR, New Jersey, 1996.

[28] Kuan Zhang, Patricia Werner, Ming Sun, F Xavier Pi-Sunyer, and Carol N Boozer. Measurement of human daily physical activity. Obesity Research, 11(1):33–40, 2003.

[29] Úlfar Erlingsson, Vasyl Pihur, and Aleksandra Korolova. Rappor: Randomized aggregatable privacy-preserving ordinal response. In Proceedings of the 2014 ACM SIGSAC conference on computer and communications security, pages 1054–1067, 2014.

[30] Wanli Xue, Dinusha Vatsalan, Wen Hu, and Aruna Seneviratne. Sequence data matching and beyond: New privacy-preserving primitives based on bloom filters. IEEE Transactions on Information Forensics and Security, 2020.

[31] Burton H Bloom. Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7):422–426, 1970.

[32] Qi Lin, Weitao Xu, Jun Liu, Abdelwahed Khamis, Wen Hu, Mahbub Hassan, and Aruna Seneviratne. H2b: heartbeat-based secret key generation using piezo vibration sensors. In IPSN, pages 265–276. ACM, 2019.

[33] D.L. Donoho. Compressed sensing. IEEE Transactions on Information Theory, pages 1289–1306, 2006.

[34] E.J. Candès. The restricted isometry property and its implications for compressed sensing. Comptes Rendus Mathematique, pages 589–592, 2008.

[35] Yiran Shen, Wen Hu, Mingrui Yang, Bo Wei, Simon Lucey, and Chun Tung Chou. Face recognition on smartphones via optimised sparse representation classification. In IPSN, pages 237–248. IEEE Press, 2014.

[36] Michael Elad. Optimized projections for compressed sensing. IEEE Transactions on Signal Processing, 55(12):5695–5702, 2007.

[37] Sreekanth Malladi, Jim Alves-Foss, and Robert B Heckendorn. On preventing replay attacks on security protocols. Technical report, IDAHO UNIV MOSCOW DEPT OF COMPUTER SCIENCE, 2002.

[38] Junqing Zhang, Trung Q Duong, Alan Marshall, and Roger Woods. Key generation from wireless channels: A review. IEEE Access, 4:614–626, 2016.

[39] Masahito Hayashi and Toyohiro Tsurumaru. More efficient privacy amplification with less random seeds via dual universal hash function. IEEE Transactions on Information Theory, 62(4):2213–2232, 2016.

[40] Masoud Rostami, Ari Juels, and Farinaz Koushanfar. Heart-to-heart (h2h): authentication for implanted medical devices. In CCS

Current Anthropology, 9(2/3):83–108, 1968.

[43] Donguk Lee, Gibbeum Kim, and Woojae Han. Analysis of subway interior noise at peak commuter time. Journal of Audiology & Otology, 21(2):61, 2017.

[44] Robyn RM Gershon, Richard Neitzel, Marissa A Barrera, and Muhammad Akram. Pilot survey of subway and bus stop noise levels. Journal of Urban Health, 83(5):802, 2006.

[45] Weitao Xu, Chitra Javali, Girish Revadigar, Chengwen Luo, Neil Bergmann, and Wen Hu. Gait-key: A gait-based shared secret key generation protocol for wearable devices. ACM Transactions on Sensor Networks (TOSN), 13(1):6, 2017.

[46] Kai Zeng, Daniel Wu, An Chan, and Prasant Mohapatra. Exploiting multiple-antenna diversity for shared secret key generation in wireless networks. In INFOCOM, pages 1–9. IEEE, 2010.

[47] Lili Meng, Jie Liang, Upul Samarawickrama, Yao Zhao, Huihui Bai, and André Kaup. Multiple description coding with randomly and uniformly offset quantizers. IEEE Transactions on Image Processing, 23(2):582–595, 2014.

[48] Andrew Rukhin, Juan Soto, James Nechvatal, Miles Smid, and Elaine Barker. A statistical test suite for random and pseudorandom number generators for cryptographic applications. Technical report, Booz-Allen and Hamilton Inc Mclean Va, 2001.

[49] D.L. Donoho and Y. Tsaig. Fast Solution of ℓ1-Norm Minimization Problems When the Solution May Be Sparse. IEEE Transactions on Information Theory, 54(11):4789–4812, 2008.

[50] Recommendation for key management. https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-57Pt3r1.pdf.

[51] Rene Mayrhofer and Hans Gellersen. Shake well before use: Intuitive and secure pairing of mobile devices. IEEE Transactions on Mobile Computing, 8(6):792–806, 2009.

[52] Yitao He, Junyu Bian, Xinyu Tong, Zihui Qian, Wei Zhu, Xiaohua Tian, and Xinbing Wang. Canceling inaudible voice commands against voice control systems. In Mobicom, pages 1–15, 2019.

[53] Zuyuan Yang, Wei Yan, and Yong Xiang. On the security of compressed sensing-based signal cryptosystem. IEEE Transactions on Emerging Topics in Computing, 3(3):363–371, 2015.

[54] Jiansong Zhang, Zeyu Wang, Zhice Yang, and Qian Zhang. Proximity based iot device authentication. In INFOCOM, pages 1–9. IEEE, 2017.

[55] Dominik Schürmann and Stephan Sigg. Secure communication based on ambient audio. IEEE Transactions on Mobile Computing, 12(2):358–370, 2011.

[56] Markus Miettinen, N Asokan, Thien Duc Nguyen, Ahmad-Reza Sadeghi, and Majid Sobhani. Context-based zero-interaction pairing and key evolution for advanced personal devices. In CCS, pages 880–891. ACM, 2014.

[57] Lin Yang, Wei Wang, and Qian Zhang. Secret from muscle: Enabling secure pairing with electromyography. In