Parham Aarabi
University of Toronto
Publications
Featured research published by Parham Aarabi.
IEEE Transactions on Systems, Man, and Cybernetics | 2004
Bob Mungamuru; Parham Aarabi
A new approach to sound localization, known as enhanced sound localization, is introduced, offering two major benefits over state-of-the-art algorithms. First, higher localization accuracy can be achieved compared to existing methods. Second, an estimate of the source orientation is obtained jointly, as a consequence of the proposed sound localization technique. The orientation estimates and improved localizations result from explicitly modeling the various factors that affect a microphone's level of access to different spatial positions and orientations in an acoustic environment. Three primary factors are accounted for, namely source directivity, microphone directivity, and source-microphone distance. Using this model of the acoustic environment, several different enhanced sound localization algorithms are derived. Experiments are carried out in a real environment with a reverberation time of 0.1 s and average microphone SNRs ranging from 10 to 20 dB. Using a 24-element microphone array, a weighted version of the SRP-PHAT algorithm is found to give an average localization error of 13.7 cm with 3.7% anomalies, compared to 14.7 cm and 7.8% anomalies for the standard SRP-PHAT technique.
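The phase transform (PHAT) weighting at the heart of SRP-PHAT can be illustrated with a minimal two-microphone TDOA estimator. This is a generic GCC-PHAT sketch, not the paper's 24-element weighted variant; the signal, delay, and sampling rate are chosen purely for illustration:

```python
import numpy as np

def gcc_phat(x, y, fs, max_tau=None):
    """Estimate the TDOA between signals x and y via GCC-PHAT
    (cross-correlation with phase-transform whitening).
    Returns a signed delay in seconds; negative means x arrives first."""
    n = len(x) + len(y)
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = X * np.conj(Y)
    cross /= np.abs(cross) + 1e-12           # PHAT weighting: keep only phase
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2 if max_tau is None else min(int(fs * max_tau), n // 2)
    cc = np.concatenate((cc[-max_shift:], cc[: max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / fs

# Toy check: a pure 25-sample delay between two noise recordings.
rng = np.random.default_rng(0)
fs = 16000
sig = rng.standard_normal(4096)
delay_samples = 25
x = np.concatenate((sig, np.zeros(delay_samples)))
y = np.concatenate((np.zeros(delay_samples), sig))
tau = gcc_phat(x, y, fs)
```

In a full SRP-PHAT search, such whitened cross-correlations are steered over a grid of candidate source positions and summed across microphone pairs.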
EURASIP Journal on Advances in Signal Processing | 2003
Parham Aarabi
This paper presents a general method for the integration of distributed microphone arrays for localization of a sound source. The recently proposed sound localization technique known as SRP-PHAT is shown to be a special case of the more general microphone array integration mechanism presented here. The proposed technique utilizes spatial likelihood functions (SLFs) produced by each microphone array and integrates them using a weighted addition of the individual SLFs. This integration strategy accounts for the different levels of access that a microphone array has to different spatial positions, resulting in an intelligent integration strategy that weights the results of reliable microphone arrays more heavily. Experimental results using ten two-element microphone arrays show a reduction in the sound localization error from 0.9 m to 0.08 m at a signal-to-noise ratio of 0 dB. The proposed technique also has the advantage of being applicable to multimodal sensor networks.
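A minimal sketch of the weighted-SLF integration idea, assuming each array has already produced a likelihood grid over candidate positions; the grids and reliability weights below are invented for illustration:

```python
import numpy as np

def fuse_slfs(slfs, weights):
    """Fuse per-array spatial likelihood functions (SLFs) by weighted
    addition, with weights reflecting each array's reliability
    (e.g., its SNR or proximity to the region of interest)."""
    slfs = np.asarray(slfs, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                       # normalize reliability weights
    return np.tensordot(w, slfs, axes=1)  # weighted sum over arrays

# Three toy 20x20 SLFs: two arrays agree on (5, 5); a less reliable
# array also shows a spurious peak at (12, 3).
grid = np.zeros((3, 20, 20))
grid[0, 5, 5] = 1.0
grid[1, 5, 5] = 0.6
grid[1, 12, 3] = 0.7
grid[2, 5, 5] = 0.9
fused = fuse_slfs(grid, weights=[1.0, 0.3, 1.0])
est = np.unravel_index(np.argmax(fused), fused.shape)
```

Down-weighting the unreliable array lets the fused SLF peak at the position the reliable arrays agree on.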
IEEE Transactions on Systems, Man, and Cybernetics | 2004
Parham Aarabi; Guangji Shi
A dual-microphone speech-signal enhancement algorithm, utilizing phase-error-based filters that depend only on the phase of the signals, is proposed. The algorithm obtains time-varying, or equivalently time-frequency (TF), phase-error filters based on prior knowledge of the time difference of arrival (TDOA) of the speech source of interest and the phases of the signals recorded by the microphones. It is shown that by masking the TF representation of the speech signals, the noise components are distorted beyond recognition while the speech source of interest maintains its perceptual quality. This is supported by digit recognition experiments, which show a substantial recognition accuracy improvement over prior multimicrophone speech enhancement algorithms. For example, for a case with two speakers and a 0.1 s reverberation time, the phase-error-based technique yields a 28.9% recognition rate gain over the single-channel noisy signal, a gain of 22.0% over superdirective beamforming, and a gain of 8.5% over postfiltering.
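The masking idea can be sketched as follows. Given the target's expected TDOA, a TF cell is attenuated when the observed inter-microphone phase difference disagrees with the phase difference that TDOA predicts. The specific masking function below (a power of the normalized phase error) is an illustrative stand-in, not the filter shapes derived in the paper:

```python
import numpy as np

def phase_error_mask(X1, X2, tdoa, fs, gamma=2.0):
    """Compute a TF mask from the phase error between two STFTs.
    X1, X2: complex STFTs of shape (n_bins, n_frames).
    Returns values in [0, 1]: 1 = keep the cell, 0 = suppress it."""
    n_bins = X1.shape[0]
    freqs = np.fft.rfftfreq(2 * (n_bins - 1), d=1.0 / fs)[:, None]
    predicted = 2 * np.pi * freqs * tdoa                 # phase implied by TDOA
    observed = np.angle(X1) - np.angle(X2)
    err = np.angle(np.exp(1j * (observed - predicted)))  # wrap to [-pi, pi]
    return (1.0 - np.abs(err) / np.pi) ** gamma

# Cells whose phase difference matches the target TDOA pass unmasked.
fs = 16000
X1 = np.ones((257, 1), dtype=complex)
mask = phase_error_mask(X1, X1, tdoa=0.0, fs=fs)    # perfect agreement
```

Applying such a mask to the TF representation and resynthesizing suppresses components arriving from directions inconsistent with the target TDOA.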
IEEE Transactions on Systems, Man, and Cybernetics | 2006
Duy Cuong Nguyen; David Halupka; Parham Aarabi; Ali Sheikholeslami
This paper proposes a new technique for face detection and lip feature extraction, together with a real-time field-programmable gate array (FPGA) implementation of both. Face detection is based on a naive Bayes classifier that classifies an edge-extracted representation of an image. Using the edge representation significantly reduces the model's size to only 5184 bytes, 2417 times smaller than a comparable statistical modeling technique, while achieving an 86.6% correct detection rate under various lighting conditions. Lip feature extraction uses the contrast around the lip contour to extract the height and width of the mouth, metrics that are useful for speech filtering. The proposed FPGA system occupies only 15,050 logic cells, about six times fewer than a comparable current FPGA face detection system.
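A toy version of the naive Bayes classification step, treating each pixel of a binary edge map as an independent Bernoulli feature; the training data, window size, and edge statistics are fabricated for illustration:

```python
import numpy as np

def train_bernoulli_nb(windows, labels, alpha=1.0):
    """Estimate per-pixel edge probabilities for each class
    (face = 1, non-face = 0) with Laplace smoothing."""
    windows = np.asarray(windows, dtype=float)
    labels = np.asarray(labels)
    params = {}
    for c in (0, 1):
        w = windows[labels == c]
        params[c] = (w.sum(axis=0) + alpha) / (len(w) + 2 * alpha)
    return params

def predict(window, params):
    """Classify an edge window by the larger naive Bayes log-likelihood."""
    window = np.asarray(window, dtype=float)
    scores = {}
    for c, p in params.items():
        scores[c] = np.sum(window * np.log(p) + (1 - window) * np.log(1 - p))
    return max(scores, key=scores.get)

# Toy data: "faces" have edges mostly on the left half of a 16-pixel
# window, "non-faces" mostly on the right.
rng = np.random.default_rng(1)
faces = (rng.random((50, 16)) < np.r_[np.full(8, 0.9), np.full(8, 0.1)]).astype(int)
others = (rng.random((50, 16)) < np.r_[np.full(8, 0.1), np.full(8, 0.9)]).astype(int)
X = np.vstack([faces, others])
y = np.array([1] * 50 + [0] * 50)
params = train_bernoulli_nb(X, y)
pred = predict(np.r_[np.ones(8), np.zeros(8)], params)
```

The model is just one smoothed probability per pixel per class, which is why an edge-based naive Bayes detector can be so compact.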
IEEE Transactions on Systems, Man, and Cybernetics | 2002
Parham Aarabi
This paper introduces a mechanism for localizing a microphone array when the location of sound sources in the environment is known. Using the proposed spatial observability function based microphone array integration technique, a maximum likelihood estimator for the correct position and orientation of the array is derived. This is used to localize and track a microphone array with a known and fixed geometrical structure, which can be viewed as the inverse sound localization problem. Simulations using a two-element dynamic microphone array illustrate the ability of the proposed technique to correctly localize and estimate the orientation of the array even in a very reverberant environment. Using 1 s male speech segments from three speakers in a 7 m by 6 m by 2.5 m simulated environment, a 30 cm inter-microphone distance, and PHAT histogram SLF generation, the average localization error was approximately 3 cm with an average orientation error of 19°. The same simulation configuration with 4 s speech segments results in an average localization error of less than 1 cm, with an average orientation error of approximately 2°. Experimental examples illustrate localizations for both stationary and dynamic microphone pairs.
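The inverse localization idea can be sketched in 2-D for a two-microphone pair: with the source positions known, search candidate poses (position plus orientation) for the one whose predicted TDOAs best match the measured ones. The geometry, search grids, and least-squares matching criterion below are illustrative assumptions, not the paper's spatial-observability-function-based maximum likelihood estimator:

```python
import numpy as np

C = 343.0  # speed of sound, m/s

def predicted_tdoa(pose, source, half_spacing=0.15):
    """TDOA (s) at a two-mic pair centered at (x, y) with heading theta,
    for a known 2-D source position."""
    x, y, theta = pose
    offset = half_spacing * np.array([np.cos(theta), np.sin(theta)])
    m1 = np.array([x, y]) + offset
    m2 = np.array([x, y]) - offset
    return (np.linalg.norm(source - m1) - np.linalg.norm(source - m2)) / C

def estimate_pose(measured, sources, candidates):
    """Pick the candidate pose minimizing squared TDOA mismatch."""
    errs = [sum((predicted_tdoa(p, s) - m) ** 2
                for s, m in zip(sources, measured)) for p in candidates]
    return candidates[int(np.argmin(errs))]

# Synthesize measurements from a known pose, then recover it by search.
true_pose = (2.0, 1.5, np.pi / 6)
sources = [np.array([0.0, 0.0]), np.array([5.0, 0.0]), np.array([2.0, 4.0])]
measured = [predicted_tdoa(true_pose, s) for s in sources]
cands = [(x, y, th) for x in np.linspace(0, 4, 9)
                    for y in np.linspace(0, 3, 7)
                    for th in np.linspace(0, np.pi, 13)]
est = estimate_pose(measured, sources, cands)
```

With three non-collinear known sources the pose is well constrained; with fewer sources, evidence must be accumulated over time, as in the paper's tracking experiments.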
International Conference on Computer Vision | 2009
Nevena Lazic; Inmar E. Givoni; Brendan J. Frey; Parham Aarabi
Subspace segmentation is the task of segmenting data lying on multiple linear subspaces. Its applications in computer vision include motion segmentation in video, structure-from-motion, and image clustering. In this work, we describe a novel approach for subspace segmentation that uses probabilistic inference via a message-passing algorithm.
IEEE Transactions on Audio, Speech, and Language Processing | 2006
Guangji Shi; Maryam Modir Shanechi; Parham Aarabi
In this paper, we analyze the effects of uncertainty in the phase of speech signals on the word recognition error rate of human listeners. The motivating goal is to obtain a quantitative measure of the importance of phase in automatic speech recognition by studying the effects of phase uncertainty on human perception. Listening tests were conducted with 18 listeners under different phase uncertainty and signal-to-noise ratio (SNR) conditions. The results indicate that a small amount of phase error or uncertainty does not affect the recognition rate, but a large amount of phase uncertainty affects it significantly. The importance of phase also appears to be SNR-dependent: at lower SNRs the effects of phase uncertainty are more pronounced than at higher SNRs. For example, at an SNR of -10 dB, having random phases at all frequencies results in a word error rate (WER) of 63%, compared to 24% if the phase is unaltered. In comparison, at 0 dB, random phase results in a 25% WER, compared to 11% for the unaltered-phase case. Listening tests were also conducted for the case of phase reconstructed using a least-square-error estimation approach. The results indicate that the recognition rate for the reconstructed-phase case is very close to that of the perfect-phase case (a WER difference of 4% on average).
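The fully-random-phase stimulus condition can be sketched as keeping the STFT magnitude of a signal while replacing its phase with uniform random values before overlap-add resynthesis; the frame and hop sizes here are illustrative, not the paper's:

```python
import numpy as np

def randomize_phase(signal, frame=512, hop=256, rng=None):
    """Keep per-frame STFT magnitudes, replace all phases with uniform
    random values, and resynthesize via windowed overlap-add."""
    rng = np.random.default_rng() if rng is None else rng
    window = np.hanning(frame)
    out = np.zeros(len(signal) + frame)
    norm = np.zeros(len(signal) + frame)
    for start in range(0, len(signal) - frame + 1, hop):
        spec = np.fft.rfft(window * signal[start:start + frame])
        phase = rng.uniform(-np.pi, np.pi, size=spec.shape)
        spec = np.abs(spec) * np.exp(1j * phase)   # magnitude kept, phase random
        out[start:start + frame] += window * np.fft.irfft(spec, n=frame)
        norm[start:start + frame] += window ** 2
    norm[norm == 0] = 1.0
    return (out / norm)[: len(signal)]

# A pure tone becomes noise-like once its phase structure is destroyed.
t = np.arange(8000) / 8000.0
tone = np.sin(2 * np.pi * 440 * t)
scrambled = randomize_phase(tone, rng=np.random.default_rng(0))
```

Intermediate uncertainty levels can be simulated the same way by adding bounded random perturbations to the original phase instead of replacing it outright.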
Information Fusion | 2004
Qing Hua Wang; Teodor Ivanov; Parham Aarabi
This paper presents a method for the navigation of a mobile robot using sound localization in the context of a robotic lab tour guide. Sound localization, achieved using an array of 24 microphones distributed on two walls of the lab, is performed whenever the robot speaks as part of the tour. The SRP-PHAT sound localization algorithm is used to estimate the current location of the robot from approximately 2 s of recorded signal. Navigation is achieved using several stops during which the estimated location of the robot is used to make course adjustments. Experiments using the acoustic robot navigation system illustrate the accuracy of the proposed technique, which resulted in an average localization error of about 7 cm near the array and 30 cm far from it.
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2011
Alexandre Karpenko; Parham Aarabi
In this paper, we present a large database of over 50,000 user-labeled videos collected from YouTube. We develop a compact representation called "tiny videos" that achieves high video compression rates while retaining the overall visual appearance of the video as it varies over time. We show that frame sampling using affinity propagation, an exemplar-based clustering algorithm, achieves the best trade-off between compression and video recall. We use this large collection of user-labeled videos in conjunction with simple data mining techniques to perform related video retrieval, as well as classification of images and video frames. The classification results achieved by tiny videos are compared with the tiny images framework for a variety of recognition tasks. The tiny images data set consists of 80 million images collected from the Internet. These are the largest labeled research data sets of videos and images available to date. We show that tiny videos are better suited for classifying scenery and sports activities, while tiny images perform better at recognizing objects. Furthermore, we demonstrate that combining the tiny images and tiny videos data sets improves classification precision in a wider range of categories.
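Affinity propagation, the exemplar-based clustering used here for frame sampling, can be sketched with a minimal implementation of its standard message-passing updates (responsibilities and availabilities); the 1-D toy data, damping, and iteration count are illustrative, and real frame sampling would use frame-to-frame visual similarities instead:

```python
import numpy as np

def affinity_propagation(S, damping=0.9, iters=200):
    """Minimal affinity propagation over a similarity matrix S whose
    diagonal holds the 'preference' for each point to be an exemplar.
    Returns the indices of the chosen exemplars."""
    n = S.shape[0]
    R = np.zeros((n, n))
    A = np.zeros((n, n))
    for _ in range(iters):
        # Responsibilities: r(i,k) = s(i,k) - max_{k' != k}(a(i,k') + s(i,k'))
        AS = A + S
        idx = np.argmax(AS, axis=1)
        first = AS[np.arange(n), idx]
        AS[np.arange(n), idx] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[np.arange(n), idx] = S[np.arange(n), idx] - second
        R = damping * R + (1 - damping) * R_new
        # Availabilities: a(i,k) = min(0, r(k,k) + sum of positive r(i',k))
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, R.diagonal())
        A_new = np.minimum(0, Rp.sum(axis=0)[None, :] - Rp)
        np.fill_diagonal(A_new, Rp.sum(axis=0) - Rp.diagonal())
        A = damping * A + (1 - damping) * A_new
    return np.where(np.diag(A + R) > 0)[0]

# Two well-separated 1-D clusters; similarity = negative squared distance,
# preference = median similarity (a common default).
pts = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
S = -np.abs(pts[:, None] - pts[None, :]) ** 2
np.fill_diagonal(S, np.median(S))
ex = affinity_propagation(S)
```

Each cluster contributes one exemplar; for tiny videos, the exemplar frames are the ones kept as the compressed representation.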
IEEE/RSJ International Conference on Intelligent Robots and Systems | 2009
Anthony P. Badali; Jean-Marc Valin; François Michaud; Parham Aarabi
Although research on localization of sound sources using microphone arrays has been carried out for years, providing such capabilities on robots is rather new. Artificial audition systems on robots currently exist, but no evaluation of the methods used to localize sound sources has yet been conducted. This paper presents an evaluation of various real-time audio localization algorithms using a medium-sized microphone array suitable for applications in robotics. The techniques studied here are implementations and enhancements of steered response power phase transform (SRP-PHAT) beamformers, which represent the most popular approach to time-difference-of-arrival audio localization. In addition, two different grid topologies for implementing the source direction search are compared. Results show that a direction refinement procedure can be used to improve localization accuracy and that more efficient and accurate direction searches can be performed using a uniform triangular element grid rather than the typical rectangular element grid.
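The direction refinement procedure can be sketched as a two-stage grid search: scan a coarse azimuth/elevation grid for the best steered-response score, then re-search a fine grid around the coarse peak. The rectangular grid and the toy score function below are illustrative stand-ins for an actual SRP-PHAT steered-response computation:

```python
import numpy as np

def refine_search(score, coarse_step=10.0, fine_step=1.0, span=10.0):
    """Coarse-to-fine direction search over azimuth/elevation (degrees).
    'score' is any callable returning a steered-response value."""
    az_grid = np.arange(-180.0, 180.0, coarse_step)
    el_grid = np.arange(-90.0, 90.0 + coarse_step, coarse_step)
    best = max((score(a, e), a, e) for a in az_grid for e in el_grid)
    _, a0, e0 = best
    # Refine on a dense grid around the coarse peak.
    az_fine = np.arange(a0 - span, a0 + span + fine_step, fine_step)
    el_fine = np.arange(e0 - span, e0 + span + fine_step, fine_step)
    best = max((score(a, e), a, e) for a in az_fine for e in el_fine)
    return best[1], best[2]

# Toy steered-response surface with a single peak at az 43, el 17.
def toy_score(az, el):
    return -((az - 43.0) ** 2 + (el - 17.0) ** 2)

az, el = refine_search(toy_score)
```

The refinement stage evaluates far fewer directions than a uniformly fine global grid, which is what makes the procedure attractive for real-time use on robots.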