Publication


Featured research published by Reza Lotfian.


Proceedings of the conference on Wireless Health | 2012

Impact of sensor misplacement on dynamic time warping based human activity recognition using wearable computers

Nimish Kale; Jaeseong Lee; Reza Lotfian; Roozbeh Jafari

Daily living activity monitoring is important for early detection of the onset of many diseases and for improving quality of life, especially in the elderly. A wireless wearable network of inertial sensor nodes can be used to observe daily motions, and the continuous stream of data generated by these sensor networks can be used to recognize the movements of interest. Dynamic Time Warping (DTW) is a widely used signal processing method for time-series pattern matching because of its robustness to variations in time and speed, as opposed to other template matching methods. Despite this flexibility, for activity recognition DTW can only find the similarity between the template of a movement and the incoming samples when the location and orientation of the sensor remain unchanged. Due to this restriction, small sensor misplacements can lead to a decrease in classification accuracy. In this work, we adopt the DTW distance as a feature for real-time detection of daily human activities, such as sit-to-stand, in the presence of sensor misplacement. Measuring the performance of DTW under misplacement requires a large number of sensor configurations in which the sensors are rotated or displaced, and deploying a large number of closely spaced physical sensors is impractical. To address this problem, we use a marker-based optical motion capture system to generate simulated inertial sensor data for different locations and orientations on the body. We study the performance of DTW under these conditions to determine the worst-case sensor location variations that the algorithm can accommodate.
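
The DTW distance used as a feature here can be sketched with the classic dynamic-programming recursion; this is a minimal illustrative version for 1-D sequences, not the paper's optimized implementation:

```python
import numpy as np

def dtw_distance(template, signal):
    """Dynamic Time Warping distance between two 1-D sequences.
    Illustrative sketch of the standard recursion; the paper uses the
    resulting distance as a feature for activity classification."""
    n, m = len(template), len(signal)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(template[i - 1] - signal[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]
```

The warping step is what gives DTW its robustness to timing variation: two sequences that trace the same shape at different speeds still get a distance of zero.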


Wearable and Implantable Body Sensor Networks | 2011

A Low Power Wake-Up Circuitry Based on Dynamic Time Warping for Body Sensor Networks

Roozbeh Jafari; Reza Lotfian

Enhancing wearability and reducing the form factor are often among the major objectives in the design of wearable platforms. Power optimization techniques can significantly reduce the form factor and/or prolong the time intervals between recharges. In this paper, we propose an ultra-low-power programmable architecture based on Dynamic Time Warping, specifically designed for wearable inertial sensors. The low-power architecture performs signal processing only as fast as the data production rate of the inertial sensors, and further uses the minimum bit resolution and number of samples that are just sufficient to detect the movement of interest. Our results show that the power consumption of inertial-based monitoring systems can be reduced by at least three orders of magnitude using the proposed architecture compared to state-of-the-art low-power microcontrollers.


International Conference on Acoustics, Speech, and Signal Processing | 2015

Emotion recognition using synthetic speech as neutral reference

Reza Lotfian; Carlos Busso

A common approach to recognizing emotion from speech is to estimate multiple acoustic features at the sentence or turn level. These features are derived independently of the underlying lexical content, yet studies have demonstrated that lexically dependent models improve emotion recognition accuracy. However, current practical approaches can only model small lexical units such as phonemes, syllables, or a few keywords, which limits these systems. We believe that building longer lexical models (i.e., sentence-level models) is feasible by leveraging advances in speech synthesis. Assuming that the transcript of the target speech is available, we synthesize speech conveying the same lexical information. The synthetic speech is used as a neutral reference model against which different acoustic features are contrasted, unveiling local emotional changes. This paper introduces this novel framework and provides insights on how to compare the target and synthetic speech signals. Our evaluations demonstrate the benefits of synthetic speech as a neutral reference for incorporating lexical dependencies in emotion recognition. The experimental results show that adding features derived from contrasting expressive speech with the proposed synthetic speech reference increases accuracy by 2.1% and 2.8% (absolute) in classifying low versus high levels of arousal and valence, respectively.


Design, Automation, and Test in Europe | 2013

An ultra-low power hardware accelerator architecture for wearable computers using dynamic time warping

Reza Lotfian; Roozbeh Jafari

Movement monitoring using wearable computers has been widely used in healthcare and wellness applications. To reduce the form factor of wearable nodes, which is dominated by battery size, ultra-low-power signal processing is crucial. In this paper, we propose an architecture that can be viewed as a hardware accelerator and employs dynamic time warping (DTW) in a hierarchical fashion. The proposed architecture removes events that are not of interest from the signal processing chain as early as possible, deactivating all remaining modules. We consider tunable parameters, such as the sampling frequency and bit resolution of the incoming sensor readings for DTW, to balance the trade-off between power consumption and classification precision. We formulate a methodology for determining the optimal set of tunable parameters and provide a solution using an active-set algorithm. We synthesized the architecture in 45 nm CMOS and show that a three-tiered module achieves 98% accuracy with a power budget of 1.23 µW, while a single-level DTW consumes 6.3 µW at the same accuracy. We furthermore propose a fast approximation methodology for determining the total power consumption that runs 3200 times faster while introducing less than 3% error relative to the original optimization.
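
The hierarchical early-rejection idea can be sketched as a software cascade; in the paper the stages are DTW modules at different sampling rates and bit resolutions, while the stages below are arbitrary placeholders for illustration:

```python
def cascade_detect(sample, stages):
    """Run increasingly expensive detector stages in order; reject as
    soon as any stage decides the event is not of interest, so all
    later (costlier) stages stay deactivated. A sketch of the tiered
    screening described in the abstract; stage functions here are
    illustrative, not the paper's DTW hardware modules."""
    for stage in stages:
        if not stage(sample):
            return False      # early exit: downstream stages never run
    return True               # event survived every screening tier
```

Because most incoming samples are rejected by the cheap first tier, the expensive tiers run rarely, which is where the power savings come from.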


ACM Multimedia | 2012

Immersive multiplayer tennis with microsoft kinect and body sensor networks

Suraj Raghuraman; Karthik Venkatraman; Zhanyu Wang; Jian Wu; Jacob Clements; Reza Lotfian; Balakrishnan Prabhakaran; Xiaohu Guo; Roozbeh Jafari; Klara Nahrstedt

We present an immersive gaming demonstration using a minimal number of wearable sensors. The game demonstrated is two-player tennis. We combine a virtual environment with real 3D representations of physical objects such as the players and the tennis racquet (if available). The main objective of the game is to provide as realistic an experience of tennis as possible while remaining as unobtrusive as possible. The game is played across a network, which opens the possibility of two remote players playing together on a single virtual tennis court. Microsoft Kinect sensors are used to obtain a 3D point cloud and a skeletal map representation of each player, and this 3D point cloud is mapped onto the virtual tennis court. We also use a wireless wearable Attitude and Heading Reference System (AHRS) mote, strapped onto the wrist of each player, which gives precise information about the movement (swing, rotation, etc.) of the playing arm. This information, along with the skeletal map, is used to implement the physics of the game. Using this game, we demonstrate our solutions for simultaneous data acquisition, 3D point-cloud mapping in a virtual space, the use of the Kinect and AHRS sensors to calibrate real and virtual objects, and the interaction of virtual objects with a 3D point cloud.


International Conference on Acoustics, Speech, and Signal Processing | 2016

Practical considerations on the use of preference learning for ranking emotional speech

Reza Lotfian; Carlos Busso

A speech emotion retrieval system aims to detect a subset of data with specific expressive content. Preference learning represents an appealing framework for ranking speech samples in terms of continuous attributes such as arousal and valence. The training of ranking classifiers usually requires pairwise samples in which one is preferred over the other according to a specific criterion. For emotional databases, these relative labels are not available and are very difficult to collect. As an alternative, they can be derived from existing absolute emotional labels: for continuous attributes, we can create relative rankings by forming pairs with high and low values of a specific attribute separated by a predefined margin. This approach raises questions about efficient ways of building such a training set, which is important for the performance of the emotional retrieval system. This paper analyzes practical considerations in training ranking classifiers, including the optimal number of pairs used during training and the margin used to define the relative labels. We compare the preference learning approach with binary classification and regression models. The experimental results on a spontaneous emotional database indicate that a rank-based classifier with fine-tuned parameters outperforms the other two approaches in both the arousal and valence dimensions.
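
The margin-based construction of relative labels from absolute scores can be sketched as follows; the function name and list-of-tuples output are illustrative, not the paper's implementation:

```python
def build_ranking_pairs(scores, margin):
    """Form preference pairs from absolute attribute labels (e.g.,
    per-sample arousal scores). A pair (i, j) means sample i is
    preferred over sample j; a pair is kept only when the scores
    differ by at least `margin`, following the margin-based
    construction described in the abstract."""
    pairs = []
    for i, si in enumerate(scores):
        for j, sj in enumerate(scores):
            if si - sj >= margin:
                pairs.append((i, j))
    return pairs
```

A larger margin yields fewer but cleaner pairs; the abstract's question of tuning this margin (and the number of pairs) is exactly the knob exposed here.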


Conference of the International Speech Communication Association | 2016

Retrieving categorical emotions using a probabilistic framework to define preference learning samples

Reza Lotfian; Carlos Busso

Preference learning is an appealing approach for affective recognition. Instead of predicting the underlying emotional class of a sample, this framework relies on pairwise comparisons to rank-order the test data along an emotional dimension. This framework is relevant not only for continuous attributes such as arousal or valence, but also for categorical classes (e.g., is this sample happier than that one?). A preference learning system for categorical classes has applications in several domains, including retrieving emotional behaviors that convey a target emotion and defining the emotional intensity associated with a given class. One important challenge in building such a system is defining relative labels that express the preference between training samples. Instead of building these labels from scratch, we propose a probabilistic framework that creates relative labels from existing categorical annotations. The approach considers individual assessments instead of consensus labels, creating a metric that is sensitive to the underlying ambiguity of emotional classes. The proposed metric quantifies the likelihood that a sample belongs to a target emotion. We build happy, angry, and sad rank-classifiers using this metric. We evaluate the approach in cross-corpus experiments, showing improved performance over binary classifiers and over rank-based classifiers trained with consensus labels.
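
The use of individual assessments rather than a consensus label can be illustrated with a simple vote-fraction estimate; this is only a sketch of the idea, and the paper's actual probabilistic model may differ:

```python
from collections import Counter

def emotion_likelihood(annotations, target):
    """Estimate the likelihood that a sample conveys `target` from the
    individual annotator votes, rather than collapsing them into one
    consensus label. A simplified vote-fraction sketch: a sample rated
    'happy' by 2 of 3 annotators keeps a graded score of 2/3 instead of
    a hard 'happy' label, preserving the ambiguity of the class."""
    counts = Counter(annotations)
    return counts[target] / len(annotations)
```

Sorting samples by such a graded score induces exactly the kind of relative ordering (e.g., "happier than") that the rank-classifiers are trained on.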


IEEE Transactions on Affective Computing | 2017

Building Naturalistic Emotionally Balanced Speech Corpus by Retrieving Emotional Speech From Existing Podcast Recordings

Reza Lotfian; Carlos Busso

The lack of a large, natural emotional database is one of the key barriers to translating results on speech emotion recognition in controlled conditions into real-life applications. Collecting emotional databases is expensive and time-consuming, which limits the size of existing corpora. Current approaches used to collect spontaneous databases tend to provide unbalanced emotional content, dictated by the given recording protocol (e.g., positive for colloquial conversations, negative for discussions or debates). Size and speaker diversity are also limited. This paper proposes a novel approach to effectively build a large, naturalistic emotional database with balanced emotional content, reduced cost, and reduced manual labor. It relies on existing spontaneous recordings obtained from audio-sharing websites. The proposed approach combines machine learning algorithms that retrieve recordings conveying balanced emotional content with a cost-effective annotation process based on crowdsourcing, which makes it possible to build a large-scale speech emotion database. This approach provides natural emotional renditions from multiple speakers, with different channel conditions, conveying balanced emotional content that is difficult to obtain with alternative data collection protocols.


International Conference on Acoustics, Speech, and Signal Processing | 2017

Ranking emotional attributes with deep neural networks

Srinivas Parthasarathy; Reza Lotfian; Carlos Busso

Studies have shown that ranking emotional attributes through preference learning methods has significant advantages over conventional emotional classification/regression frameworks. Preference learning is particularly appealing for retrieval tasks, where the goal is to identify speech conveying target emotional behaviors (e.g., positive samples with low arousal). With recent advances in deep neural networks (DNNs), this study explores whether a preference learning framework relying on deep learning can outperform conventional ranking algorithms. We use a deep learning ranker implemented with the RankNet algorithm to evaluate preference between emotional sentences in terms of dimensional attributes (arousal, valence, and dominance). The results show improved performance over ranking algorithms trained with support vector machines (i.e., RankSVM), and are significantly better than the performance reported in previous work, demonstrating the potential of RankNet to retrieve speech with target emotional behaviors.
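
RankNet's core objective is the cross-entropy over the sigmoid of a score difference; the sketch below shows that objective with the neural scorer abstracted away as any callable (the paper's DNN features and architecture are not reproduced here):

```python
import numpy as np

def ranknet_probability(scorer, x_pref, x_other):
    """P(x_pref ranked above x_other) under RankNet: the sigmoid of the
    score difference. `scorer` maps a feature vector to a scalar; in
    the paper it is a DNN, but any callable works for illustration."""
    diff = scorer(x_pref) - scorer(x_other)
    return 1.0 / (1.0 + np.exp(-diff))

def ranknet_loss(scorer, x_pref, x_other):
    """Pairwise cross-entropy loss for a training pair in which
    x_pref is the preferred sample."""
    return -np.log(ranknet_probability(scorer, x_pref, x_other))
```

Training simply backpropagates this loss through the scorer for each preference pair, so the same machinery works whether the scorer is linear (RankSVM-like) or a deep network.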


Conference of the International Speech Communication Association | 2014

Building A Naturalistic Emotional Speech Corpus by Retrieving Expressive Behaviors From Existing Speech Corpora

Soroosh Mariooryad; Reza Lotfian; Carlos Busso

Collaboration


Dive into Reza Lotfian's collaborations.

Top Co-Authors

Carlos Busso
University of Texas at Dallas

Jacob Clements
University of Texas at Dallas

Jaeseong Lee
University of Texas at Dallas

Jian Wu
University of Texas at Dallas

Karthik Venkatraman
University of Texas at Dallas

Nimish Kale
University of Texas at Dallas

Soroosh Mariooryad
University of Texas at Dallas