EmoSense: Computational Intelligence Driven Emotion Sensing via Wireless Channel Data
Yu Gu, Senior Member, IEEE, Yantong Wang, Tao Liu, Yusheng Ji, Senior Member, IEEE, Zhi Liu, Member, IEEE, Peng Li, Xiaoyan Wang, Xin An, and Fuji Ren, Senior Member, IEEE
Abstract—Emotion is well recognized as a distinguishing attribute of human beings, and it plays a crucial role in our daily lives. Existing vision-based or sensor-based solutions are either obtrusive to use or rely on specialized hardware, hindering their applicability. This paper introduces EmoSense, a first-of-its-kind wireless emotion sensing system driven by computational intelligence. The basic methodology is to explore the physical expression of emotions from wireless channel response via data mining. The design and implementation of EmoSense face two major challenges: extracting physical expression from wireless channel data and recovering emotion from the corresponding physical expression. For the former, we present a Fresnel zone based theoretical model depicting the fingerprint of the physical expression on channel response. For the latter, we design an efficient computational intelligence driven mechanism to recognize emotion from the corresponding fingerprints. We prototyped EmoSense on the commodity WiFi infrastructure and compared it with mainstream sensor-based and vision-based approaches in real-world scenarios. The numerical study over 3,360 cases confirms that EmoSense achieves a performance comparable to the vision-based and sensor-based rivals under different scenarios. EmoSense only leverages the low-cost and prevalent WiFi infrastructure and thus constitutes a tempting solution for emotion sensing.
Index Terms—Emotion sensing; WiFi data; commodity WiFi infrastructures.
I. INTRODUCTION
Emotion is a significant feature of human beings. It is also the key to interpreting implicit messages in human interaction [1]. Though humans seem to be born with innate emotional capabilities, emotion is not a natural gift for computers. Therefore, emotion sensing has become an emerging topic in human-machine interaction, with various tempting applications such as elder emotion companionship [2] and autism treatment [3].

Emotion, as a complicated psychological state, usually exhibits both external signatures, such as physical expression, and internal signatures, such as physiological signals. Accordingly, current emotion sensing solutions can be divided into two categories, i.e., vision-based [4], [5] and sensor-based [6], [7]. The former focuses on capturing external signatures for emotion recognition, e.g., facial expressions [4] or body gestures [8]. The latter concentrates on detecting internal signatures for recovering emotions, e.g., electroencephalogram (EEG) signals for evaluating inner emotional status [6].

The last few decades have witnessed solid research progress in emotion sensing achieved by the above two mainstream approaches. However, they still have some fundamental yet unsolved issues. For instance, current systems are usually built on specialized hardware, making their availability a prominent problem. Also, they are normally constrained by physical and environmental conditions such as illumination and line-of-sight (LOS) dependence, leading to reliability issues. Last but not least, they can be considered offensive, since people usually dislike physical contact (sensors) or being monitored (cameras). Hence, people are seeking possible alternatives to innovate on the conventional approaches by asking the following question:

How can we construct an emotion sensing system that (1) effectively recognizes emotions without any specialized devices, (2) works robustly under different circumstances such as site, target, and illumination condition, and (3) continuously monitors the area of interest without privacy concerns?

(Affiliations: Y. Gu (co-corresponding author), Y. Wang, T. Liu and X. An are with the School of Computer and Information, Hefei University of Technology, China. E-mail: [email protected], {wangyantong912 and LTao}@mail.hfut.edu.cn. Y. Ji is with the National Institute of Informatics, Japan. E-mail: [email protected]. Z. Liu is with Shizuoka University, Japan. E-mail: [email protected]. X. Wang is with Ibaraki University, Japan. E-mail: [email protected]. P. Li is with the University of Aizu, Japan. E-mail: [email protected]. F. Ren (co-corresponding author) is with the University of Tokushima, Japan. E-mail: [email protected].)
In this paper, we introduce EmoSense, a first-of-its-kind wireless emotion sensing system that leverages channel response from off-the-shelf WiFi devices. EmoSense has three major advantages compared to its vision-based and sensor-based rivals. Firstly, it does not rely on specialized hardware, since the low-cost WiFi infrastructure is pervasive nowadays. Secondly, it is robust, since characterizing the channel response with Fresnel zones waives the environmental dependence. Lastly, it is contactless and free of privacy concerns, since the WiFi signal is unnoticeable to users.

EmoSense explores body gestures, which contain rich mood expressions, for emotion recognition. The key idea is that human body gestures affect the wireless signal via the shadowing and multi-path effects. Such effects usually form unique patterns, or fingerprints, in the temporal-frequency domain for different gestures. EmoSense leverages the gesture fingerprint to recover the corresponding emotion. Its design and implementation face two challenges, i.e.,

1) How to identify the body gesture through its fingerprint on the wireless signal?
2) How to recognize emotions from the corresponding body gestures (physical expressions)?

The first challenge corresponds to enhancing and extracting the fingerprint of body gestures (sometimes very minor and brief) on the wireless signal in terms of channel response. To this end, we propose a look-up method based on Fresnel zones to ensure a quick and efficient experimental setup that captures fine-grained gestures.

The second challenge amounts to matching the gestures to the corresponding emotions. It essentially converges to a typical data mining problem that can be solved by computational intelligence. Therefore, we design a computational intelligence driven architecture to explore both temporal and frequency features from signal fingerprints to recover emotions.

We prototype EmoSense with low-cost off-the-shelf WiFi devices and evaluate its performance in real environments. We also realize two traditional vision-based and sensor-based systems for the comparative study. All three systems focus on the external signature of emotion. We recruit 14 subjects with no acting training and ask them to evoke four emotions (happy, sad, anger and fear) through audiovisual stimulations, i.e., watching video clips or listening to music. During the experiment, the vision-based system captures the facial expressions, while EmoSense and the sensor-based system keep monitoring the simultaneous body gestures, respectively. The comparative study over 3,360 cases suggests that EmoSense achieves a performance comparable to the vision-based and sensor-based rivals under different scenarios with a classic k-Nearest Neighbor (kNN) classifier, works robustly since the impact of external circumstances like site, target, illumination, and line-of-sight is limited, and is unobtrusive since no subject reports the privacy or comfort complaints raised against the vision-based and sensor-based rivals. Furthermore, we report several interesting findings. For instance, the empirical results confirm that the physical expression of emotions is person-dependent.
In other words, different people have different habits of expressing their moods.

Our contributions can be summarized as follows:
1) We design a Fresnel zone based model to characterize the physical expression of emotion on wireless channel data and provide a look-up method to enhance the fingerprints by adjusting the experimental settings.
2) We devise a computational intelligence driven scheme to effectively extract key features and efficiently recognize emotion from the CSI amplitude data.
3) We realize EmoSense, a first-of-its-kind WiFi-based emotion sensing system, on commodity WiFi devices. EmoSense has been evaluated against a vision-based system and a sensor-based system in real environments. The experimental results not only confirm its effectiveness, but also reveal several inspiring observations.

The rest of this paper is organized as follows: we introduce a literature review in the next section, followed by some preliminaries that inspire the design of EmoSense in Section III. In Section IV, we present the detailed design of EmoSense. Then, we evaluate EmoSense in real scenarios and explain the experimental results in Section V. Finally, we conclude our work and outline some possible extensions in Section VI.

II. RELATED WORKS
This work involves two topics, i.e., affective computing and WiFi-based gesture recognition. We introduce the related research on both topics in this section.
A. Affective Computing
Two decades ago, Marvin Minsky [9] raised the famous argument: “The question is not whether intelligent machines can have any emotions, but whether machines can be intelligent without emotion?” Since then, affective computing, which intends to endow computers with the ability to timely sense users’ moods and to intelligently respond to them, has become a rising star of computer science and attracts substantial attention from both industry and academia [10]. One essential methodology has been laid down, i.e., exploring emotion via its expressive modalities such as audiovisual clues, textual input, physiological signals, and body gestures.
Audiovisual-based:
In daily life, voice and facial expressions embody most of our emotional elements. As a result, 95% of current research on emotion recognition relies on facial expression as the stimulus. For instance, Saste and Jagdale [11] designed a language-independent system for recognizing emotion in speech; the system can be embedded in Automated Teller Machines (ATMs) for safety purposes. Recently, Liu et al. [12] proposed FEER-HRI, an online system that can not only recognize emotion during human-robot communication via facial expressions, but also generate the corresponding emotion on the robot for better interaction.
Textual-based:
80% of our historical knowledge has been preserved in text. Nowadays, as online social media like Facebook, WeChat, and Twitter become indispensable in modern society, exploring rich textual social information for implicit emotions becomes a tempting direction.

Generally speaking, emotional words are commonly seen in text documents, no matter in which language they are written. Shivhare et al. [13] leveraged an emotion word ontology and classified words into different emotion levels with different scores. The emotion of an input text can then be determined by mapping the sum of the emotion scores in the text to certain emotion categories.

It is common sense that words alone are far from enough to infer the inherent complex emotions in text. Therefore, the syntactic and semantic structure of the text is frequently used. For instance, Shaheen et al. [14] proposed ERR, a novel method leveraging the syntactic and semantic structure of an input English sentence and extracting emotional information for emotion recognition.

As the research on both classes moves forward, one prime concern attracts more and more attention: both audiovisual clues and textual inputs are vulnerable to intentional emotion induction and masking, because they only represent artificial emotions that are not direct and can be tuned. To this end, there comes a new upsurge on direct reflections of emotion based on physiological signals and body gestures.
Physiological-signal-based:
The most commonly used physiological signals are heart rate, breathing rate, blood pressure, and skin conductance. Usually, these signals are obtained by contact or invasive sensors. A comprehensive survey on emotion recognition via physiological sensors is presented in [15]. Recently, researchers have turned to ubiquitous wireless signals for physiological measurements in a non-contact way. For
example, Zhao et al. [16] designed EQ-Radio, a one-of-its-kind emotion recognition system that uses wireless signals to extract physiological information such as the heart rate and the breathing rate.
Gesture-based:
It is reported that human gestures also possess ample emotional elements that have not been fully explored yet. Lv et al. [17] were among the first to design a gesture-based emotion recognition system by analyzing typing sequences on a keyboard. However, Pusara et al. [18] found that mouse movement alone is not enough for emotion recognition, since it carries too little emotional information. To this end, body movements, which are rich in emotion, have been explored recently. Glowinski et al. [8] were among the first to compute emotion by analyzing body movements captured by off-the-shelf cameras. Piana et al. [19] pushed the research further by utilizing the Kinect device to extract postures, physical features, and movement trends from the three-dimensional skeleton of the human body.
B. WiFi-based Gesture Recognition
It is well known that human beings interfere with wireless signals due to multi-path and fading effects [20], but only recently have such interferences been explored for gesture recognition [21]. The cost-effective WiFi infrastructure is widely accessible nowadays. The most commonly used indicator for the channel response of WiFi is the Received Signal Strength (RSS), a coarse-grained power feature summed over all propagation paths. Sigg et al. were among the first to explore RSS for recognizing hand gestures [22]. Later, Gu et al. showed that RSS is also applicable to whole-body gestures [21].

RSS is handy, but incapable of dealing with the multi-path effect. Therefore, Channel State Info (CSI), which characterizes the wireless signal with frequency, amplitude (energy feature) and phase information, soon came in as a better alternative [23]. Zeng et al. used CSI to recognize hand gestures and achieved better performance [24]. Soon CSI was explored for fine-grained gestures such as mouth movements [25] and keystrokes [26], [27].

Though gesture-based affective computing is becoming more and more popular, traditional solutions relying on vision and wearable sensors embody several crucial demerits such as availability, reliability and privacy issues. To this end, we presented an early version of EmoSense [28] to demonstrate the feasibility of exploring channel response for emotion recognition. In this paper, we push the research much further by elaborating the system design with computational intelligence. The enhanced system has been extensively evaluated against its vision-based and sensor-based rivals in real environments. The results show that EmoSense achieves quite competitive performance.

III. PRELIMINARIES
In this part, we will first introduce the basic concepts of wireless channel data, where the fingerprints of human motion and emotion are hidden. Then, we will build a prototype to conduct a pilot experiment studying how the physical expression of emotion affects the signals.
Fig. 1: Our prototype system (one transmitting antenna, three receiving antennas, a MiniPC, a laptop, and an accelerometer sensor).
A. Overview of Wireless Channel Data
EmoSense is driven by wireless channel data, for which there exist two options provided by the physical layer (PHY), i.e., Received Signal Strength (RSS) and Channel State Info (CSI). The former is coarse-grained and represents the total received power level at the receiver, while the latter is fine-grained and describes signal attenuation in both the time and frequency domains. RSS is usually obtained as follows [23],
RSS = 10 \log_2\left(\lVert H \rVert^2\right), \quad (1)

where H = \sum_{k=1}^{N} \lVert H_k \rVert e^{j\theta_k}, and \lVert H_k \rVert and \theta_k represent the amplitude and phase on the k-th signal propagation path, respectively.

Equation (1) implies why RSS is considered a coarse-grained indicator: it only characterizes the total received power over all possible paths. In other words, RSS is unable to resolve the multi-path effect.

To this end, there is a recent trend of exploring CSI, a fine-grained indicator, to extract multi-path channel features for motion detection [21], [29]. More specifically, current WiFi protocols are based on the Orthogonal Frequency Division Multiplexing (OFDM) system, where H(f, t) is a complex value of the channel frequency response (CFR) in terms of CSI. It describes channel performance with the amplitude and phase information for the subcarrier frequency f measured at time t. It is usually formulated as follows [30]:

H(f, t) = \sum_{k=1}^{N} h_k(f, t) e^{-j\theta_k(f, t)}, \quad (2)

where h_k(f, t) represents the amplitude and e^{-j\theta_k(f, t)} indicates the phase shift on the k-th path caused by the propagation delay.

As in our previous WiFi-based PAWS [21] and MoSense [29] systems, EmoSense also employs the fine-grained CSI as the channel data.
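To make the contrast between Eq. (1) and Eq. (2) concrete, the following toy numpy sketch (our own illustration, not code from the paper; all values are synthetic) builds a multi-path channel and shows that RSS collapses it into one power figure while CSI retains a complex response per subcarrier, which is why gesture-induced per-path changes remain visible in the CSI amplitude.

```python
import numpy as np

rng = np.random.default_rng(0)
n_paths, n_subcarriers = 5, 30
amp = rng.uniform(0.2, 1.0, n_paths)                         # |H_k|: per-path amplitude
theta = rng.uniform(0, 2 * np.pi, (n_paths, n_subcarriers))  # theta_k(f): per-path, per-subcarrier phase

# Eq. (2): CSI keeps one complex channel response per subcarrier frequency f
H_csi = np.sum(amp[:, None] * np.exp(-1j * theta), axis=0)

# Eq. (1): RSS sums all paths first and reports a single coarse power value
H_total = np.sum(amp * np.exp(1j * theta[:, 0]))
rss = 10 * np.log2(np.abs(H_total) ** 2)

print("RSS (one number):", round(float(rss), 2))
print("CSI amplitude of the first 5 subcarriers:", np.round(np.abs(H_csi[:5]), 2))
```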
B. Preliminary Experiments

In order to examine how the physical expression of emotion affects the signal, a pilot experiment is conducted as follows.
Fig. 2: Key observations inspiring the design of EmoSense in preliminary experiments: (a) the body gesture of emotion indeed interferes with channel response; (b) the layout of antennas is critical for capturing fine-grained gesture fingerprints; (c) the body gesture of emotion is person-dependent.

[Prototype].
Our prototype comprises two commodity MiniPCs mounted with Intel Network Interface Controller (NIC) 5300 cards (see Fig. 1), one of which is the sender with one external antenna, while the other is the receiver with three antennas. These antennas are fixed on tripods. The sampling rate is 100 Hz.

[Participants].
14 participants (5 females), aged from 21 to 26, are involved in the experiments. None of them has received any acting training, so as to guarantee natural expressions of emotions.

[Environment].
The experiments were carried out in an office room containing office furniture such as couches, chairs, office tables and bookshelves. During the experiments, some students were at their spots in the same room.

[Emotions]. Four emotions are distinguished, i.e., happiness, sadness, anger and fear. Different audiovisual stimulations, e.g., watching video clips or listening to music, are used to arouse different emotions in the participants, and they are asked to behave accordingly.

Through the above experiments, the following careful observations are made.
1) The body gesture of emotion indeed interferes with channel response:
The channel response data is indeed affected by the physical expressions of emotions, as exemplified in Fig. 2(a), which visualizes the CSI amplitude of one subject's physical expression of happiness. The subject was asked to watch a one-minute comedy, producing body movements such as clapping, leaning back and forth, and laughter. The channel data captured during the experiment is presented in the top figure, and that of the empty state (i.e., free of human intervention) is shown in the middle figure for comparison. The results show that body movements (physical expressions) significantly affect the channel data.
2) The layout of antennas is critical for capturing fine-grained gesture fingerprints:
The channel response data on physical expressions of emotions is subject to the experimental settings, as a small adjustment can significantly change the fingerprint on the channel data. For example, two different settings were used to record the same physical expression of one participant in Fig. 2(b): Setting 2 is obtained from Setting 1 by moving the transmitting antenna 20 cm closer to the participant. This minor adjustment produces significantly different results. The channel response data captured in Setting 1 exhibits a much stronger fingerprint of the physical expressions, whereas the fingerprint is not clearly visible in Setting 2. To clarify this point, both settings are also compared on the same subcarrier (subcarrier 1).
3) The body gesture of emotion is person-dependent:
The channel response data on physical expressions of emotions depends on the person examined. Fig. 2(c) presents the CSI amplitude of three different participants watching the same one-minute comedy clip. Even though all of them feel the same emotion (happiness), they express it differently. For example, participant 3 is more dynamic, with her data showing clear signal fluctuation in the bottom figure. Participant 3 is a female while the other two are males. The difference might be the result of gender differences in expressing emotions, as females may be more expressive than males [16].

Fig. 3 shows such an example. Firstly, the expression of emotion is multi-modal, spanning gestures, facial expressions, and physiological signals. Secondly, the expression of emotion clearly depends on the person. An interesting question is whether such differences are related to gender, which will be studied in Section V.

In a word, through our pilot experiments, the relation between physical expressions of emotions and channel response data is confirmed; meanwhile, two major challenges must be addressed in designing EmoSense:

1) How to adjust the experimental settings to enhance the fingerprint of the physical expressions of emotions?
2) The fingerprint of the physical expressions of emotions on channel response data is person-dependent. How can we distinguish the different emotions of different persons?
Fig. 3: The body gesture of emotion is person-dependent: an example.

In the following section, a Fresnel zone based look-up method is proposed to respond to the first challenge, along with a data-driven architecture to take on the second one.

IV. SYSTEM DESIGN
In this section, we first present a Fresnel zone based look-up method for adjusting the system setup to capture fine-grained gesture fingerprints on channel response. Then we offer a computational intelligence driven scheme for recovering the corresponding emotion from its body gestures.
A. A Fresnel Zone based Look-up Method
Unlike previous similar research [16], [31]–[34] that relies on empirical experience for system setup, we present a theoretical analysis based on Fresnel zones instead.

Fig. 4 shows an example of the Fresnel zones, which consist of a set of concentric ellipsoids:

\frac{x^2}{a_n^2} + \frac{y^2}{b_n^2} = 1, \quad n = 1, \dots, N, \quad (3)

where Q_n(a_n, b_n) is a boundary point of the n-th Fresnel zone, and T_x and R_x represent the sender and receiver, respectively. For a wireless signal with wavelength \lambda, the corresponding Fresnel zones can be constructed as follows,

|T_xQ_n| + |Q_nR_x| - |T_xR_x| = n\lambda/2. \quad (4)

A WiFi signal, whether running at 2.4 GHz or 5 GHz, can hardly penetrate human beings. Therefore, a person acts
like a mirror (reflector) to the signal, leading to a multi-path effect. In other words, the signal collected at the receiver comes from two types of paths: the direct path (also named the line-of-sight path) and the reflected path (also named the non-line-of-sight path) [35].

Let us particularly look at the phase shift \Delta p of the received signal at R_x. Consider a person present at Q_n in Fig. 4; the LoS path is T_x \to R_x while the NLoS path is T_x \to Q_n \to R_x. Clearly, the NLoS path is longer than the LoS path, and the difference is |T_xQ_n| + |Q_nR_x| - |T_xR_x| = n\lambda/2 according to Eq. (4). This difference in distance induces a phase shift \Delta p_d in the signal:

\Delta p_d = \begin{cases} 0, & n \text{ is even} \\ \pi, & n \text{ is odd} \end{cases} \quad (5)

Moreover, an additional phase shift of \pi is incurred when the signal is reflected. As a result, the combined phase shift \Delta p at R_x is \pi for the even Fresnel zones and 2\pi for the odd Fresnel zones:

\Delta p = \begin{cases} \pi, & n \text{ is even} \\ 2\pi, & n \text{ is odd} \end{cases} \quad (6)

Fig. 4(b) demonstrates the combined signal. It is inspiring to see that, owing to the shifted phase, the amplitude of the combined signal is degraded in the even zones and enhanced in the odd zones.

Fig. 4: An example of using Fresnel zones for enhancing the signal: (a) the Fresnel zones between Tx and the receiving antennas Rx1, Rx2 and Rx3; (b) signal superposition over time, where C0 denotes no phase superposition, C1 destructive phase superposition, and C2 constructive phase superposition of the combined signal amplitude.

The above observation urges us to leverage this phenomenon to enhance the impact of body gestures on channel response. The key idea is to adjust the layout of antennas to ensure that the gesture happens in the odd Fresnel zones, so as to enhance its corresponding signal fingerprint.
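As a quick numeric sanity check of Eqs. (5)-(6) (our own sketch with made-up amplitudes, not part of the paper), the snippet below superposes a unit LoS component with a weaker reflected component whose extra phase follows the zone parity; the combined amplitude grows in odd zones and shrinks in even zones, which is exactly the effect the antenna layout tries to exploit.

```python
import numpy as np

def combined_amplitude(n, a_los=1.0, a_nlos=0.4):
    """LoS + NLoS superposition for a reflector on the n-th Fresnel zone boundary."""
    dp_distance = np.pi * (n % 2)      # Eq. (5): 0 for even n, pi for odd n
    dp_total = dp_distance + np.pi     # reflection adds another pi shift -> Eq. (6)
    return abs(a_los + a_nlos * np.exp(1j * dp_total))

for n in range(1, 5):
    kind = "odd (constructive)" if n % 2 else "even (destructive)"
    print(f"zone {n}: {kind}, combined amplitude = {combined_amplitude(n):.2f}")
```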
TABLE I: The look-up table (l = |T_xR_x|)

n : |Q_nO|
1 : \sqrt{\lambda^2/16 + \lambda l/4}
2 : \sqrt{\lambda^2/4 + \lambda l/2}
... : ...
n : \sqrt{n^2\lambda^2/16 + n\lambda l/4}
Fig. 5: The corresponding experimental setting.

To this end, we design a Fresnel zone based look-up method to guide the layout of antennas for better performance. Let O denote the midpoint of the T_x–R_x link and assume the subject is located at Q_n on the perpendicular bisector through O (so that |T_xQ_n| = |Q_nR_x| and |OT_x| = |OR_x|). The distance between Q_n and O can then be calculated as follows,

|Q_nO| = \sqrt{|Q_nR_x|^2 - |OR_x|^2}
       = \sqrt{(n\lambda/2 + |T_xR_x| - |T_xQ_n|)^2 - |OR_x|^2}
       = \sqrt{(n\lambda/4 + |T_xR_x|/2)^2 - |OT_x|^2}
       = \sqrt{(n\lambda/4 + |OT_x|)^2 - |OT_x|^2}
       = \sqrt{n^2\lambda^2/16 + n\lambda|T_xR_x|/4}. \quad (7)

The wavelength of the WiFi signal at 2.4 GHz and 5 GHz is approximately 12.5 cm and 6 cm, respectively. If we define the distance between T_x and R_x as l, a look-up table like Tab. I can be constructed to set up the system quickly and ensure a better resolution.

Fig. 5 shows one of our system setups as an example. The prototype consists of one transmitting antenna T_x and three receiving antennas Rx1, Rx2 and Rx3. As the distance between T_x and one receiving antenna is 120 cm, the look-up table indicates that the subject should be 40 cm away from this transceiver pair, so that her gesture's effect on channel response is enhanced at an odd Fresnel zone [36]. The locations of the other two receiving antennas have been determined in a similar way.
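The look-up computation itself is a one-liner; the sketch below (our own helper built on the reconstructed Eq. (7), with a 5 GHz wavelength of about 6 cm and a 120 cm Tx-Rx separation as assumed inputs) prints candidate offsets for the odd zones, which is how a table like Tab. I can be generated for any antenna layout.

```python
import math

def fresnel_offset(n, wavelength, l):
    """|Q_nO| = sqrt(n^2*lambda^2/16 + n*lambda*l/4), all distances in metres (Eq. (7))."""
    return math.sqrt(n ** 2 * wavelength ** 2 / 16 + n * wavelength * l / 4)

wavelength, l = 0.06, 1.2       # ~6 cm wavelength at 5 GHz, 120 cm Tx-Rx separation
for n in range(1, 10, 2):       # odd zones enhance the gesture fingerprint
    print(f"zone {n}: place the subject ~{100 * fresnel_offset(n, wavelength, l):.1f} cm from the link midpoint")
```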
B. A Computational Intelligence driven Architecture

Though the expression of emotions varies among persons, there still exist certain common patterns that can be explored, e.g., dancing for joy. This observation inspires us to develop a data-driven architecture that leverages computational intelligence to efficiently extract those patterns for emotion recognition.

Fig. 6 shows the system architecture of EmoSense. Like any typical data-mining system, EmoSense relies on mining the data for emotion recognition, so the training data is essential. It flows from the data receiving module to the preprocessing module for interpolation, denoising and feature extraction, and then reaches the SQLite database. After the training phase, EmoSense goes online for testing. The testing data also travels from the data receiving module to the preprocessing module, and then reaches the classification module for emotion recognition.

Fig. 6: System architecture of EmoSense.
[Preprocessing Module]. The raw data may be incomplete due to information loss on the noisy channel. Therefore, we first correct this issue via a commonly used linear interpolation technique. Then we filter the raw data with a Butterworth filter [26], whose cut-off frequency is set to \omega_c = 2\pi f_c / F_s rad per sample, where F_s denotes the sampling rate (100 samples per second in our system) and f_c the chosen cut-off frequency.

[Classification Module]. This module leverages temporal-frequency features extracted from the gesture fingerprint to deduce the corresponding emotion. Here three classic classifiers, i.e., k-NN, SVM, and Naive Bayes, are used.
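A minimal preprocessing sketch is given below; it is our own illustration of the pipeline just described (the paper publishes no code), with the cut-off frequency f_c and the filter order chosen arbitrarily as placeholders.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def preprocess(csi_amp, fs=100.0, f_c=10.0, order=4):
    """Interpolate lost samples (marked as NaN) and low-pass filter a CSI amplitude stream."""
    t = np.arange(len(csi_amp))
    lost = np.isnan(csi_amp)
    filled = np.interp(t, t[~lost], csi_amp[~lost])   # linear interpolation over lost packets
    b, a = butter(order, f_c, btype="low", fs=fs)      # Butterworth low-pass design
    return filtfilt(b, a, filled)                      # zero-phase denoising

# Example: a short synthetic stream with periodic packet losses
stream = np.array([1.0, 1.2, np.nan, 1.1, 0.9, np.nan, 1.0, 1.3] * 16)
denoised = preprocess(stream)
```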
V. PERFORMANCE EVALUATION
In this section, we conduct an exhaustive evaluation of the performance of EmoSense.
A. Evaluation Setup
A prototype system of EmoSense is built and evaluated in a real environment, an office shown in Fig. 1 that contains office furniture such as couches, chairs, computer tables and bookshelves. Some students are also in the same office doing their work during the experiments, so as to provide a real-world environment.

[Metric]. A confusion matrix containing the overall accuracy (cf. [16], [21], [29], [31]) is used to evaluate the overall performance of EmoSense.

[Data Set]. For each emotion, we define three different motion sequences, and each participant is required to perform each sequence 20 times. In total, we have 14 × 4 × 3 × 20 = 3360 data entries in the data set.
Fig. 7: Two mainstream benchmarks: (a) the sensor-based system; (b) the vision-based system.

[Feature].
As in [37], seven features in both the time and frequency domains have been selected, namely,

• Standard deviation: \rho = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}
• Average absolute error: \Delta = \frac{1}{N}\sum_{i=1}^{N}|\Delta_i|
• Skewness: Skew(X) = E\left[\left(\frac{X - \mu}{\rho}\right)^3\right]
• Kurtosis: K = \frac{\sum_{i=1}^{K}(x_i - \bar{x})^4 f_i}{n s^4}
• Entropy: H(X) = -\sum_{i} P(x_i) \log_b P(x_i)
• Standard deviation of the velocity of the signal change
• Median: M(X) = x_{(n+1)/2} if n is odd, and \frac{1}{2}\left(x_{n/2} + x_{n/2+1}\right) if n is even

[Classifier]. Three classic classifiers have been used, i.e., k-Nearest Neighbor (k-NN), Support Vector Machine (SVM) and Naive Bayes.
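The feature list above maps directly onto standard statistics; the sketch below is our own illustration of how one CSI amplitude stream could be turned into the seven-dimensional feature vector (the exact normalisations, e.g. of the kurtosis, follow our reading of the garbled source rather than an official implementation).

```python
import numpy as np
from scipy.stats import skew, kurtosis, entropy

def extract_features(x, n_bins=16):
    """Compute the seven statistical features listed above for one CSI amplitude stream."""
    x = np.asarray(x, dtype=float)
    hist, _ = np.histogram(x, bins=n_bins)
    p = hist / hist.sum()
    velocity = np.diff(x)                       # first difference, i.e. signal-change velocity
    return np.array([
        np.std(x),                              # standard deviation
        np.mean(np.abs(x - np.mean(x))),        # average absolute error (mean absolute deviation)
        skew(x),                                # skewness
        kurtosis(x),                            # kurtosis (Fisher/excess definition)
        entropy(p[p > 0], base=2),              # entropy of the amplitude histogram
        np.std(velocity),                       # std of the signal-change velocity
        np.median(x),                           # median
    ])

features = extract_features(np.sin(np.linspace(0, 6.28, 200)))
```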
B. Main-stream Benchmarks
We design a sensor-based and a vision-based system that capture the same physical expressions as EmoSense, to serve as performance benchmarks.

[Sensor-based].
The sensor-based system is built upon an Arduino platform mounted with three ADXL345 accelerometer sensors, shown in Fig. 7(a). They are attached to the forehead and the two wrists of the subject, respectively. We use the same features and classifiers for both EmoSense and the sensor-based system.

[Vision-based].
The vision-based system is based on deeplearn.js, an open-source machine learning library released by Google, as shown in Fig. 7(b). The built-in camera of the laptop is used to capture both the facial and the physical expression of the subject. Each frame is first processed by a compact neural network named SqueezeNet, and the penultimate layer of the network is then used for training and testing with a kNN classifier.
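For readers who prefer Python to deeplearn.js, a rough analogue of this benchmark can be sketched as follows (our own approximation, not the authors' implementation: torchvision's pretrained SqueezeNet supplies the image embedding and scikit-learn's kNN replaces the in-browser classifier; `train_frames`, `train_labels` and `test_frame` are hypothetical inputs).

```python
import torch
import torchvision
from torchvision import transforms
from sklearn.neighbors import KNeighborsClassifier
import numpy as np

model = torchvision.models.squeezenet1_1(weights="DEFAULT")  # pretrained SqueezeNet
model.eval()
to_tensor = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])

def embed(frame):
    """Map a PIL image to a 512-d embedding from SqueezeNet's convolutional features."""
    x = to_tensor(frame).unsqueeze(0)
    with torch.no_grad():
        fmap = model.features(x)
        return torch.nn.functional.adaptive_avg_pool2d(fmap, 1).flatten(1).numpy()[0]

# knn = KNeighborsClassifier(n_neighbors=5)
# knn.fit(np.stack([embed(f) for f in train_frames]), train_labels)
# prediction = knn.predict([embed(test_frame)])
```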
C. Overall Evaluation
In this part, we present and analyze the evaluation results, which are obtained through ten-fold cross-validation.

[Inset Classification].
The inset testing uses all the data sets for training, and its result serves as the upper bound of the performance. Table II shows the corresponding results, where EmoSense with kNN has the best performance (93.33% ACC), while the performance degradation of SVM relative to kNN on the same dataset is about 15% to 20%.

TABLE II: Confusion Matrix of k-NN algorithm: Inset Test (Upperbound). Average kNN accuracy: 93.33%.

TABLE III: Confusion Matrix of k-NN algorithm: Person-dependent. Average kNN accuracy: 84.88%.

It is quite interesting that
Naive Bayes is much worse than its two rivals, achieving a considerably lower accuracy. We think the reason is two-fold: 1) the independence assumption of Naive Bayes is not quite suitable for our case; 2) Naive Bayes is particularly sensitive to the initial training dataset.

Specifically, happy has the highest recognition ratio (100.00%) while fear is the hardest to identify (83.81%). In other words, happy has never been misinterpreted as other emotions, and it is the most distinguishable. The video footage has been carefully examined to understand this phenomenon. It is found that fear has the least intensive expression while happy has the most. It is also interesting that fear can sometimes be misjudged as happy (6.67%). For instance, Fig. 8 shows the different emotions in the signal domain for the same subject. We can see that happy and anger produce larger signal fluctuation (intensity) than sad and fear, which means that they contain more significant physical expressions.

[Person-dependent Classification].
In this case, we select part of each subject's data for training and use the rest for testing. The results are summarized in Table III, where EmoSense achieves 84.88% ACC, which is close to the performance upper bound (93.33%). k-NN is still the best among the three classifiers. Fear still has the worst recognition accuracy among all four emotions, but anger has the best performance (89.05%) here, rather than happy as in the inset classification. This phenomenon is consistent with Fig. 8. The results confirm that there exist certain common patterns of emotional expression across different persons.

[Person-Independent Classification].
For person-independent classification, we exclude one subject’s data set
from the training set and use it for testing. The reported result is the average over all participants. As expected, the performance of EmoSense degenerates significantly: as shown in Table IV, EmoSense achieves only 40.86% accuracy on average. We find that happy still has the highest accuracy (46.43%) while sad becomes the lowest (35.48%). The large performance deterioration of all four emotions compared with the previous case implies that the expression of emotion is indeed person-dependent. In particular, sad has the largest performance degeneration, indicating that its expression heavily relies on the subject.

TABLE IV: Confusion Matrix of k-NN algorithm: Person-independent. Average kNN accuracy: 40.86%.

Fig. 8: Comparison of the emotions in the signal domain: (a) happy; (b) sad; (c) anger; (d) fear.

Fig. 9: Gender-based Classification.

Fig. 10: EmoSense and benchmarks.

[Gender-based Classification]. Fig. 9 shows the impact of gender on the overall system performance. For the inset test, both genders achieve high average accuracy, i.e., 94.37% and 91.94%, with males slightly better. For the person-dependent and person-independent cases, females slightly outperform males, i.e., 86.11% vs. 83.96% and 43.26% vs. 38.75%. In other words, we have not observed the significant gender differences suggested in other references [16]. One possible reason is that the studied emotions are common in our daily lives, where both genders share certain similarities in expression. In future work, we will involve more emotions and participants to further clarify this issue.
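The person-dependent and person-independent protocols above map directly onto standard cross-validation schemes; the sketch below (our own illustration with random placeholder features, assuming 14 subjects and 3360 samples as reported) shows how the person-independent case can be reproduced as leave-one-subject-out validation with scikit-learn.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.random((3360, 7))               # placeholder: 7 features per sample
y = rng.integers(0, 4, 3360)            # placeholder: 4 emotion labels
groups = np.repeat(np.arange(14), 240)  # subject id per sample (14 x 240 = 3360)

scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y,
                         groups=groups, cv=LeaveOneGroupOut())
print("person-independent accuracy:", scores.mean())
```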
D. Evaluation via State-of-the-art

[EmoSense versus Sensor-based system].
Sensors are usually contact-based and thus much less noisy than the wireless signal. Therefore, they are considered a more reliable data source for recognition and serve as a gold standard for EmoSense. Since both systems record the body gestures of emotion, we use the same classifiers and features for a fair comparison.

As expected, the sensor-based solution outperforms EmoSense on all four emotions, achieving 95.12%, 94.87%, 100% and 94.12% accuracy, respectively. As Fig. 10 shows, the two curves of the sensor-based system and EmoSense follow the same trend, and the performance gap for each emotion remains only around 10%. This suggests that our idea of exploring the physical expression for emotion sensing is sound and that the gap is mainly caused by the background noise of the wireless signal. The results also verify our previous observation that fear has the worst performance (91.12%), while happy and anger perform better (92.12% and 100%).

[EmoSense versus Vision-based system].
Fig. 10 indicates that the vision-based solution performs better than EmoSense, with a performance gap of about 10%. Unlike in EmoSense, where sad can hardly be recognized, sad has the best performance here (97.94%), and the performance gap between the two systems is 13.65%. This is because the expression of sadness usually concentrates more on the face than on the body; the vision-based system, leveraging the facial expression, can therefore recognize sad accurately.
Summary. Our early study [28] already confirmed the feasibility of leveraging wireless signals for gesture and emotion recognition. Here, we not only confirm this feasibility again, but also validate the proposed EmoSense system via extensive comparative experiments. As a result, we believe that EmoSense, which provides a reliable and transparent emotion sensing service, constitutes a tempting emotion sensing solution in the real world.
Fig. 11: A case study showing the robustness of EmoSense to noise: CSI amplitude over time for the same hand gesture performed between Tx and Rx, 80 cm away from Rx, and 160 cm away from Rx.
E. Further Discussions

[Ambient noise]. The ambient noise caused by nearby devices and persons could affect the performance of EmoSense. In our previous work [29], we already showed that the interference from other wireless devices is quite limited. Here, we study the robustness to ambient noise caused by nearby humans. To this end, a new experiment has been conducted as follows:

• One male participant performs the same hand gesture (waving up and down) at three different locations: between the Tx and Rx, 80 cm away from Rx, and 1.6 m away from Rx.

Fig. 11 records the CSI amplitude data. As shown in the figure, the same hand gesture has totally different impacts on the channel data at different locations: the closer to Tx and Rx, the better the gesture can be captured, and the gesture performed at the third location can hardly affect the channel response. In other words, EmoSense is robust to noise caused by surrounding persons.

[Real-world Application]. Currently, EmoSense is limited in practice since only four emotions can be recognized. However, its system architecture is essentially data-driven and could be extended to more emotions that have physical expressions. This is one of the key directions for future work. Even for the current EmoSense system, there may exist some real-world applications. For instance, there are rehearsals before the first stage performance of a comedy, and the club may set the ticket price based on the reaction of the audience. EmoSense can be used for exactly this purpose without any privacy concerns.

VI. CONCLUSION AND FUTURE WORK
In this paper, we present EmoSense, a first-of-its-kind wireless emotion sensing system driven by computational intelligence. It has been prototyped on off-the-shelf WiFi devices and evaluated in real environments. Two traditional rivals, i.e., vision-based and sensor-based, have been realized for the comparative study. Performance evaluation over 3360 cases suggests that EmoSense achieves a performance comparable to the vision-based and sensor-based rivals under different scenarios with a classic k-Nearest Neighbor (kNN) classifier.

For future work, there exist several open issues. Firstly, EmoSense and its rivals hinge upon human gestures as the expression of emotion, a relation that still remains blurry by far. For example, dishonest people can deceive the system by intentionally behaving in certain ways. A possible solution is to leverage the multi-modal nature of emotion. Secondly, EmoSense is data-driven, but it is common sense that psychological knowledge is also very important. Therefore, it is more reasonable to couple both data and psychological knowledge for more reliable and accurate emotion recognition. Last but not least, the physical expression of emotion is affected by many congenital and acquired factors, some of which are totally out of control. Therefore, it is important to clarify the potential scenarios before we actually deploy the system.

ACKNOWLEDGMENTS
This work is sponsored by the National Natural Science Foundation of China (NSFC) under Grant No. 61772169, the National Key Research and Development Program under Grant No. 2018YFB0803403, the Fundamental Research Funds for the Central Universities under No. JZ2018HGPA0272, and Open Projects of the Jiangsu Province Key Laboratory of Internet of Things under No. JSWLW-2017-002.

REFERENCES
[1] N. Fragopanagos and J. G. Taylor, “Emotion recognition in human-computer interaction,” IEEE Signal Processing Magazine, vol. 18, no. 1, pp. 32–80, 2002.
[2] H. Jing, X. Lun, L. Dan, H. Zhijie, and W. Zhiliang, “Cognitive emotion model for elder care robot in smart home,” China Communications, no. 4, 2015.
[3] R. El Kaliouby, R. Picard, and S. Baron-Cohen, “Affective computing and autism,” Annals of the New York Academy of Sciences, vol. 1093, no. 1, pp. 228–248, 2010.
[4] S. V. Ioannou, A. T. Raouzaiou, V. A. Tzouvaras, T. P. Mailis, K. C. Karpouzis, and S. D. Kollias, “Emotion recognition through facial expression analysis based on a neurofuzzy network,” Neural Networks, vol. 18, no. 4, p. 423, 2005.
[5] K. Wang, N. An, B. N. Li, Y. Zhang, and L. Li, “Speech emotion recognition using Fourier parameters,” IEEE Transactions on Affective Computing, vol. 6, no. 1, pp. 69–75, 2017.
[6] R. Jenke, A. Peer, and M. Buss, “Feature extraction and selection for emotion recognition from EEG,” IEEE Transactions on Affective Computing, vol. 5, no. 3, pp. 327–339, 2017.
[7] S. Katsigiannis and N. Ramzan, “DREAMER: A database for emotion recognition through EEG and ECG signals from wireless low-cost off-the-shelf devices,” IEEE Journal of Biomedical & Health Informatics, vol. PP, no. 99, pp. 1–1, 2017.
[8] D. Glowinski, A. Camurri, G. Volpe, N. Dael, and K. Scherer, “Technique for automatic emotion recognition by body gesture analysis,” in IEEE Computer Vision and Pattern Recognition Workshops, Anchorage, AK, USA, June 2008, pp. 1–6.
[9] M. Minsky, “The society of mind,” Personalist Forum, vol. 3, no. 1, pp. 19–32, 1987.
[10] R. W. Picard, Affective Computing. MIT Press, 1997.
[11] S. T. Saste and S. M. Jagdale, “Emotion recognition from speech using MFCC and DWT for security system,” in International Conference of Electronics, Communication and Aerospace Technology, Heidelberg, Germany, April 2017, pp. 701–704.
[12] Z. Liu, M. Wu, W. Cao, L. Chen, J. Xu, R. Zhang, M. Zhou, and J. Mao, “A facial expression emotion recognition based human-robot interaction system,” IEEE/CAA Journal of Automatica Sinica, vol. 4, no. 4, pp. 668–676, 2017.
[13] S. N. Shivhare, S. Garg, and A. Mishra, “EmotionFinder: Detecting emotion from blogs and textual documents,” in International Conference on Computing, Communication Control & Automation, Pune, India, February 2015, pp. 52–57.
[14] S. Shaheen, W. El-Hajj, H. Hajj, and S. Elbassuoni, “Emotion recognition from text based on automatically generated rules,” in IEEE International Conference on Data Mining Workshop, NJ, USA, November 2015, pp. 383–392.
[15] S. Wioleta, “Using physiological signals for emotion recognition,” in The International Conference on Human System Interaction, Gdansk, Poland, March 2013, pp. 556–561.
[16] M. Zhao, F. Adib, and D. Katabi, “Emotion recognition using wireless signals,” in Proc. of ACM MobiCom, New York, USA, October 2016, pp. 95–108.
[17] H. R. Lv, Z. L. Lin, W. J. Yin, and J. Dong, “Emotion recognition based on pressure sensor keyboards,” in IEEE International Conference on Multimedia and Expo, Hannover, Germany, June 2008, pp. 1089–1092.
[18] M. Pusara and C. E. Brodley, “User re-authentication via mouse movements,” in The Workshop on Visualization & Data Mining for Computer Security, 2004, pp. 1–8.
[19] S. Piana, A. Staglianò, F. Odone, A. Verri, and A. Camurri, “Real-time automatic emotion recognition from body gestures,” Computer Science, vol. 1, no. 1, pp. 1–28, 2014.
[20] C. Perera, A. Zaslavsky, P. Christen, and D. Georgakopoulos, “Context aware computing for the Internet of Things: A survey,” IEEE Communications Surveys & Tutorials, vol. 16, no. 1, pp. 414–454, 2014.
[21] Y. Gu, F. Ren, and J. Li, “PAWS: Passive human activity recognition based on WiFi ambient signals,” IEEE Internet of Things Journal, vol. 3, no. 5, pp. 796–805, Oct. 2016.
[22] S. Sigg, U. Blanke, and G. Tröster, “The telepathic phone: Frictionless activity recognition from WiFi-RSSI,” in Proc. of the IEEE PERCOM 2014, Budapest, Hungary, March 2014, pp. 148–155.
[23] Z. Yang, Z. Zhou, and Y. Liu, “From RSSI to CSI: Indoor localization via channel response,” ACM Computing Surveys, vol. 46, no. 2, pp. 25:1–25:32, 2013.
[24] Y. Zeng, P. H. Pathak, C. Xu, and P. Mohapatra, “Your AP knows how you move: Fine-grained device motion recognition through WiFi,” in Proc. of the 1st ACM Workshop on Hot Topics in Wireless, Maui, Hawaii, Sep. 2014, pp. 49–54.
[25] G. Wang, Y. Zou, Z. Zhou, K. Wu, and L. M. Ni, “We can hear you with Wi-Fi!” in Proc. of ACM MOBICOM 2014, Maui, Hawaii, Sep. 2014, pp. 593–604.
[26] K. Ali, A. X. Liu, W. Wang, and M. Shahzad, “Keystroke recognition using WiFi signals,” in Proc. of ACM MobiCom’15, no. 13, Paris, France, Sept. 2015, pp. 90–102.
[27] B. Chen, V. Yenamandra, and K. Srinivasan, “Tracking keystrokes using wireless signals,” in Proc. of the 13th Annual International Conference on Mobile Systems, Applications, and Services, ser. MobiSys ’15, no. 14, Florence, Italy, March 2015, pp. 31–44.
[28] Y. Gu, T. Liu, J. Li, F. Ren, Z. Liu, X. Wang, and P. Li, “EmoSense: Data-driven emotion sensing via off-the-shelf WiFi devices,” Kansas City, USA, May 2018, pp. 1–6.
[29] Y. Gu, J. Zhan, Y. Ji, F. Ren, J. Li, and S. Gao, “MoSense: An RF-based motion detection system via off-the-shelf WiFi devices,” IEEE Internet of Things Journal, vol. 4, no. 6, pp. 2326–2341, 2017.
[30] D. N. C. Tse and P. Viswanath, “Fundamentals of wireless communication (Tse, D. and Viswanath, P.) [Book review],” IEEE Trans. Information Theory, vol. 55, no. 2, pp. 919–920, 2009.
[31] S. Sigg, M. Scholz, S. Shi, Y. Ji, and M. Beigl, “RF-sensing of activities from non-cooperative subjects in device-free recognition systems using ambient and local signals,” IEEE Transactions on Mobile Computing, vol. 13, no. 4, pp. 907–920, 2014.
[32] L. Feng, Z. Yan, S. Chen, and A. Wang, “SleepSense: A noncontact and cost-effective sleep monitoring system,” IEEE Transactions on Biomedical Circuits & Systems, pp. 1–14, 2016.
[33] X. Zheng, J. Wang, L. Shangguan, Z. Zhou, and Y. Liu, “Smokey: Ubiquitous smoking detection with commercial WiFi infrastructures,” in Proc. of IEEE INFOCOM 2016, Hong Kong, April 2015, pp. 17–18.
[34] H. Wang, D. Zhang, Y. Wang, J. Ma, Y. Wang, and S. Li, “RT-Fall: A real-time and contactless fall detection system with commodity WiFi devices,” IEEE Transactions on Mobile Computing, vol. 16, no. 2, pp. 511–526, Feb. 2017.
[35] D. Zhang, H. Wang, and D. Wu, “Toward centimeter-scale human activity sensing with Wi-Fi signals,” Computer, vol. 50, no. 1, pp. 48–57, Jan. 2017.
[36] H. Wang, D. Zhang, J. Ma, Y. Wang, Y. Wang, D. Wu, T. Gu, and B. Xie, “Human respiration detection with commodity WiFi devices: Do user location and body orientation matter?” in Proc. of ACM UbiComp, Heidelberg, Germany, Sep. 2016, pp. 25–36.
[37] B. Samanta and K. Al-Balushi, “Artificial neural network based fault diagnostics of rolling element bearings using time-domain features,” Mechanical Systems and Signal Processing, vol. 17, no. 2, pp. 317–328, 2003.
Yu Gu (M'10-SM'12) received the B.E. degree from the Special Classes for the Gifted Young, University of Science and Technology of China, Hefei, China, in 2004, and the D.E. degree from the same university in 2010. In 2006, he was an intern with Microsoft Research Asia, Beijing, China, for seven months. From 2007 to 2008, he was a Visiting Scholar with the University of Tsukuba, Tsukuba, Japan. From 2010 to 2012, he was a JSPS Research Fellow with the National Institute of Informatics, Tokyo, Japan. He is currently a Professor and Dean Assistant with the School of Computer and Information, Hefei University of Technology, Hefei, China. His current research interests include pervasive computing and affective computing. He was the recipient of the IEEE Scalcom2009 Excellent Paper Award and the NLP-KE2017 Best Paper Award. He is a member of ACM and a senior member of IEEE.
Yantong Wang received the B.E. degree from Shanghai Normal University in 2016. Since 2017, she has been a postgraduate student at the Hefei University of Technology. Her research interests include affective computing and sensorless sensing.
Tao Liu received the B.E. degree from Anqing Normal University in 2014. Since 2016, he has been a postgraduate student at the Hefei University of Technology. His research interests include motion sensing and affective computing.
Yusheng Ji received the B.E., M.E., and D.E. degrees in electrical engineering from the University of Tokyo. She joined the National Center for Science Information Systems, Japan (NACSIS) in 1990. Currently, she is a Professor at the National Institute of Informatics, Japan (NII), and the Graduate University for Advanced Studies (SOKENDAI). She is also appointed as a Visiting Professor at the University of Science and Technology of China (USTC). Her research interests include network architecture, resource management, and performance analysis for quality of service provisioning in wired and wireless communication networks.
Zhi Liu (S'11-M'14) received the B.E. degree from the University of Science and Technology of China, China, and the Ph.D. degree in informatics from the National Institute of Informatics. He is currently an Assistant Professor at Shizuoka University. He was a Junior Researcher (Assistant Professor) at Waseda University and a JSPS research fellow at the National Institute of Informatics. His research interests include video network transmission, vehicular networks and mobile edge computing. He was the recipient of the IEEE StreamComm2011 Best Student Paper Award, the 2015 IEICE Young Researcher Award and the ICOIN2018 Best Paper Award. He is and has been a Guest Editor of journals including Wireless Communications and Mobile Computing, Sensors, and IEICE Transactions on Information and Systems. He has been serving as a chair for a number of international conferences and workshops. He is a member of IEEE and IEICE.
Peng Li (S'10-M'12) received the B.S. degree from Huazhong University of Science and Technology, China, in 2007, and the M.S. and Ph.D. degrees from the University of Aizu, Japan, in 2009 and 2012, respectively. He is currently an Associate Professor at the University of Aizu, Japan. His research interests mainly focus on cloud computing, Internet of Things, big data systems, as well as related wired and wireless networking problems. He is a member of IEEE.
Xiaoyan Wang received the B.E. degree from Beihang University, China, and the M.E. and Ph.D. degrees from the University of Tsukuba, Japan. He is currently working as an assistant professor with the Graduate School of Science and Engineering at Ibaraki University, Japan. Before that, he worked as an assistant professor (by special appointment) at the National Institute of Informatics (NII), Japan, from 2013 to 2016. His research interests include networking, wireless communications, cloud computing, big data, security and privacy.
Xin An is currently an Associate Professor in the School of Computer and Information, Hefei University of Technology. He received his bachelor's degree and master's degree in computer science from Shandong University in 2007 and 2010, respectively. From 2010 to 2013, he was a Ph.D. candidate at INRIA Grenoble and received his Ph.D. degree in Computer Science from the Université de Grenoble in 2013. His research interests focus on the design and control of adaptive embedded systems.