Publication


Featured research published by Nikki Mirghafori.


ACM Multimedia | 2007

Using audio and video features to classify the most dominant person in a group meeting

Hayley Hung; Dinesh Babu Jayagopi; Chuohao Yeo; Gerald Friedland; Silèye O. Ba; Jean-Marc Odobez; Kannan Ramchandran; Nikki Mirghafori; Daniel Gatica-Perez

The automated extraction of semantically meaningful information from multi-modal data is becoming increasingly necessary due to the escalation of captured data for archival. A novel area of multi-modal data labelling, which has received relatively little attention, is the automatic estimation of the most dominant person in a group meeting. In this paper, we provide a framework for detecting dominance in group meetings using different audio and video cues. We show that by using a simple model for dominance estimation we can obtain promising results.


international conference on acoustics, speech, and signal processing | 1996

Towards robustness to fast speech in ASR

Nikki Mirghafori; Eric Fosler; Nelson Morgan

Psychoacoustic studies show that human listeners are sensitive to speaking rate variations. Automatic speech recognition (ASR) systems are even more affected by the changes in rate, as double to quadruple word recognition error rates of average speakers have been observed for fast speakers on many ASR systems. In our earlier work (see Proceedings of EUROSPEECH95, p.491-4, 1995), we studied the causes of higher error and concluded that both the acoustic-phonetic and the phonological differences are sources of higher word error rates. In this work, we have studied various measures for quantifying rate of speech (ROS) and used simple methods for estimating the speaking rate of a novel utterance using ASR technology. We have also implemented mechanisms that make our ASR system more robust to fast speech. Using our ROS estimator to identify fast sentences in the test set, our rate-dependent system has 24.5% fewer errors on the fastest sentences and 6.2% fewer errors on all sentences of the WSJ93 evaluation set relative to the baseline HMM/MLP system.
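The abstract does not specify which rate-of-speech measure was adopted. As a minimal sketch, assume ROS is measured in phones per second and that a sentence counts as "fast" above a fixed threshold. The function names and the threshold of 15 phones per second are hypothetical and chosen only for illustration.

```python
def rate_of_speech(num_phones, duration_seconds):
    """Phones per second for one utterance (assumed ROS measure)."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return num_phones / duration_seconds


def is_fast(num_phones, duration_seconds, threshold=15.0):
    """Flag an utterance as fast; the threshold is a hypothetical value."""
    return rate_of_speech(num_phones, duration_seconds) > threshold


def relative_error_reduction(baseline_errors, new_errors):
    """Fraction of baseline errors removed, e.g. 0.245 for '24.5% fewer errors'."""
    return (baseline_errors - new_errors) / baseline_errors
```

With these definitions, a drop from 200 to 151 errors is a relative reduction of 0.245, which is the form in which the figures in the abstract are reported.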


IEEE automatic speech recognition and understanding workshop | 2007

A fast-match approach for robust, faster than real-time speaker diarization

Yan Huang; Oriol Vinyals; Gerald Friedland; Christian A. Müller; Nikki Mirghafori; Chuck Wooters

During the past few years, speaker diarization has achieved satisfying accuracy in terms of speaker Diarization Error Rate (DER). The most successful approaches, based on agglomerative clustering, however, exhibit an inherent computational complexity which makes real-time processing, especially in combination with further processing steps, almost impossible. In this article we present a framework to speed up agglomerative clustering speaker diarization. The basic idea is to adopt a computationally cheap method to reduce the hypothesis space of the more expensive and accurate model selection via Bayesian Information Criterion (BIC). Two strategies, based on the pitch-correlogram and on the unscented-transform-based approximation of KL-divergence, are used independently as a fast-match approach to select the most likely clusters to merge. We performed the experiments using the existing ICSI speaker diarization system. The new system using the KL-divergence fast-match strategy performs only 14% of the total BIC comparisons needed in the baseline system and speeds up the system by 41% without affecting the DER. The result is a robust and faster than real-time speaker diarization system.
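The fast-match idea can be sketched as a two-stage search: rank all cluster pairs with a cheap divergence, then run the expensive comparison only on a shortlist. The sketch below is not the ICSI system. It swaps in the closed-form symmetric KL divergence between single diagonal Gaussians in place of the unscented-transform approximation, and it takes the expensive score (BIC in the paper) as a caller-supplied function. All names are hypothetical.

```python
from itertools import combinations


def diagonal_gaussian(frames):
    """Fit per-dimension mean and variance to a list of feature vectors."""
    n = len(frames)
    dims = range(len(frames[0]))
    means = [sum(f[d] for f in frames) / n for d in dims]
    # Floor the variance so the divergence below never divides by zero.
    variances = [
        max(sum((f[d] - means[d]) ** 2 for f in frames) / n, 1e-6) for d in dims
    ]
    return means, variances


def symmetric_kl(a, b):
    """KL(a||b) + KL(b||a) for two diagonal Gaussians (log terms cancel)."""
    (m1, v1), (m2, v2) = a, b
    total = 0.0
    for mu1, s1, mu2, s2 in zip(m1, v1, m2, v2):
        total += 0.5 * (s1 / s2 + s2 / s1 - 2.0)
        total += 0.5 * (mu1 - mu2) ** 2 * (1.0 / s1 + 1.0 / s2)
    return total


def select_merge(clusters, expensive_score, shortlist_size):
    """Return the best pair to merge and the number of expensive comparisons."""
    stats = [diagonal_gaussian(c) for c in clusters]
    pairs = sorted(
        combinations(range(len(clusters)), 2),
        key=lambda p: symmetric_kl(stats[p[0]], stats[p[1]]),
    )
    shortlist = pairs[:shortlist_size]  # cheap stage prunes the hypothesis space
    best = max(shortlist, key=lambda p: expensive_score(clusters[p[0]], clusters[p[1]]))
    return best, len(shortlist)
```

With four clusters there are six candidate pairs. A shortlist of two means the expensive score is evaluated twice instead of six times, which is the kind of saving the abstract reports.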


international conference on acoustics, speech, and signal processing | 1998

Transmissions and transitions: a study of two common assumptions in multi-band ASR

Nikki Mirghafori; Nelson Morgan

Is multi-band automatic speech recognition (ASR) inherently inferior to a full-band approach because phonetic information is lost due to the division of the frequency space into sub-bands? Do the phonetic transitions in sub-bands occur at different times? The first statement is a common objection of the critics of multi-band ASR, and the second, a common assumption by multi-band researchers. This paper is dedicated to finding answers to both these questions. To study the first point, we calculate phonetic feature transmission for sub-bands. Not only do we fail to substantiate the above objection, but we observe the contrary. We confirm the second hypothesis by analyzing the phonetic transition lags in each sub-band. These results reinforce our view that multi-band speech analysis provides useful information for ASR, particularly when band merging takes place at the end state for a phonetic or syllabic model, allowing sub-bands to be independently time-aligned within the model.
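The abstract does not say how phonetic feature transmission is computed. One standard choice, assumed here, is the mutual information between intended and recognized feature values, estimated from a confusion matrix. The sketch below follows that assumption. The function names are hypothetical.

```python
import math


def transmitted_information(confusions):
    """Mutual information in bits between rows (intended) and columns (recognized)."""
    total = sum(sum(row) for row in confusions)
    row_sums = [sum(row) for row in confusions]
    col_sums = [sum(col) for col in zip(*confusions)]
    bits = 0.0
    for i, row in enumerate(confusions):
        for j, count in enumerate(row):
            if count > 0:
                joint = count / total
                bits += joint * math.log2(count * total / (row_sums[i] * col_sums[j]))
    return bits


def relative_transmission(confusions):
    """Transmitted information divided by the entropy of the intended values."""
    total = sum(sum(row) for row in confusions)
    input_entropy = -sum(
        (sum(row) / total) * math.log2(sum(row) / total)
        for row in confusions
        if sum(row) > 0
    )
    return transmitted_information(confusions) / input_entropy
```

A perfectly diagonal two-class matrix transmits one bit, and a matrix with identical rows transmits none. Under this assumption, comparing the relative transmission of a sub-band against the full band gives one way to test whether band splitting loses phonetic information.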


ACM Sigarch Computer Architecture News | 1995

Truth in SPEC benchmarks

Nikki Mirghafori; Margret Jacoby; David A. Patterson

The System Performance Evaluation Cooperative (SPEC) benchmarks are a set of integer and floating-point programs that are intended to be “effective and fair in comparing the performance of high performance computing systems”. SPEC ratings are often quoted in company advertising and have been trusted as the de facto measure of comparison for computer systems. Recently, there has been some concern regarding the fairness and the value of these benchmarks for comparing computer systems. In this paper we investigate the following two questions regarding the SPEC92 benchmark suite: 1) How sensitive are the SPEC ratings to various tunings? 2) How reproducible are the published results? For six vendors, we compare the published SPECpeak and SPECbase ratings, and observe an 11% average improvement in the SPECpeak ratings due to changes in the compiler flags alone. In our own attempt to reproduce the published SPEC ratings, we came across various “explicit” and “hidden” tuning parameters that we consider unrealistic. We suggest a new unit called SPECsimple that requires using only the -O compiler optimization flag, shared libraries, and standard system configuration. SPECsimple is designed to better match the performance experienced by a typical user. Our measured SPECsimples are 65-86% of the advertised SPECpeak performance. We conclude this paper by citing cases of compiler optimizations specifically designed for SPEC programs, in which performance decreases drastically or the computed results are incorrect if the compiled program does not exactly match the SPEC benchmark program. These findings show that the fairness and value of the popular SPEC benchmarks are questionable.
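For context, a SPEC92 rating is the geometric mean of per-benchmark SPECratios, where each ratio is a reference run time divided by the measured run time. The sketch below shows that arithmetic and how a SPECsimple rating would be expressed as a fraction of SPECpeak. The function names and any numbers passed in are hypothetical and are not measurements from the paper.

```python
import math


def spec_ratio(reference_seconds, measured_seconds):
    """Speedup of one benchmark relative to the reference machine."""
    return reference_seconds / measured_seconds


def spec_rating(ratios):
    """Geometric mean of the per-benchmark ratios."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


def fraction_of_peak(simple_rating, peak_rating):
    """SPECsimple as a fraction of the advertised SPECpeak."""
    return simple_rating / peak_rating
```

Because the rating is a geometric mean, a large tuning gain on a single benchmark raises the overall figure multiplicatively, which is one reason flag-specific tuning can move the published number.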


international conference on acoustics, speech, and signal processing | 2007

Word-Conditioned Phone N-Grams for Speaker Recognition

Howard Lei; Nikki Mirghafori

We extend the state-of-the-art by applying word-conditioning to constrain phone N-gram features used in speaker recognition. Feature-level combination of 52 word unigrams constraining phone N-grams of order 1, 2, and 3 proved to be the best approach. Our system achieves 18% and 27% improvements compared to a non word-conditioned phone N-grams system on SRE05 and SRE06, respectively. Furthermore, the system achieves 18% and 37% improvements compared to the non word-conditioned phone N-grams system when each system is combined with a GMM-based system on SRE05 and SRE06, suggesting that the word-conditioned features are more complementary. On both corpora, this approach achieves a 4.7% EER standalone, and a 3.3% EER in combination with the non word-conditioned phone N-grams and GMM-based systems. Note that the word-conditioning approach utilizes only 43% of SRE05 data.
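The equal error rate (EER) quoted above is the operating point at which the false-accept and false-reject rates coincide. As a minimal sketch, it can be estimated by sweeping a threshold over the observed scores. The function names are hypothetical, and real evaluations typically interpolate between thresholds rather than taking the closest point.

```python
def error_rates(target_scores, impostor_scores, threshold):
    """False-accept and false-reject rates when accepting scores >= threshold."""
    false_accepts = sum(s >= threshold for s in impostor_scores)
    false_rejects = sum(s < threshold for s in target_scores)
    return false_accepts / len(impostor_scores), false_rejects / len(target_scores)


def equal_error_rate(target_scores, impostor_scores):
    """Approximate EER: average of FA and FR at the threshold where they are closest."""
    best_gap, best_eer = None, None
    for threshold in sorted(set(target_scores) | set(impostor_scores)):
        fa, fr = error_rates(target_scores, impostor_scores, threshold)
        gap = abs(fa - fr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (fa + fr) / 2
    return best_eer
```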


international conference on acoustics, speech, and signal processing | 2012

Spectro-temporal Gabor features for speaker recognition

Howard Lei; Bernd T. Meyer; Nikki Mirghafori

In this work, we have investigated the performance of 2D Gabor features (known as spectro-temporal features) for speaker recognition. Gabor features have been used mainly for automatic speech recognition (ASR), where they have yielded improvements. We explored different Gabor feature implementations, along with different speaker recognition approaches, on ROSSI [1] and NIST SRE08 databases. Using the noisy ROSSI database, the Gabor features performed as well as the MFCC features standalone, and score-level combination of Gabor and MFCC features resulted in an 8% relative EER improvement over MFCC features standalone. These results demonstrated the value of both spectral and temporal information for feature extraction, and the complementarity of Gabor features to MFCC features.


workshop on mobile computing systems and applications | 1994

A Design for File Access in a Mobile Environment

Nikki Mirghafori; Anne Fontaine

Reducing communication cost is important in a mobile environment, because transmission of data over the radio link is slow, expensive, and unreliable. Providing data consistency is crucial because many mobile applications are database applications which rely on consistent data. In this paper we propose a design for a file access mechanism specific to a mobile environment. Our two main design goals are to reduce communication cost and to provide data consistency. We reduce the amount of communication by extensive use of caching, profile information (files the user accesses), a proxy process, delayed writes, and availability of loose-reads. We provide data consistency by using the proxy services, a centralized data manager with callbacks, and strict-reads. Our design is general enough to suit the needs of both mobile database and office applications.
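The trade-off between loose-reads, strict-reads, and delayed writes can be illustrated with a toy client cache. This is only a sketch under stated assumptions, not the paper's design: it omits the proxy process and callbacks, it models consistency with simple version numbers, and it assumes a strict-read flushes pending writes first. Every class and method name is hypothetical.

```python
class FileServer:
    """Hypothetical stand-in for the centralized data manager."""

    def __init__(self):
        self._files = {}      # name -> (version, data)
        self.round_trips = 0  # counts messages over the (slow) radio link

    def version(self, name):
        self.round_trips += 1
        return self._files.get(name, (0, None))[0]

    def fetch(self, name):
        self.round_trips += 1
        return self._files.get(name, (0, None))

    def store(self, name, data):
        self.round_trips += 1
        version = self._files.get(name, (0, None))[0] + 1
        self._files[name] = (version, data)
        return version


class MobileClient:
    """Hypothetical mobile-side cache with loose and strict reads."""

    def __init__(self, server):
        self.server = server
        self.cache = {}    # name -> (version, data)
        self.pending = {}  # name -> data awaiting a delayed write

    def loose_read(self, name):
        # May return stale data, but avoids the radio link on a cache hit.
        if name in self.pending:
            return self.pending[name]
        if name not in self.cache:
            self.cache[name] = self.server.fetch(name)
        return self.cache[name][1]

    def strict_read(self, name):
        # Always validates against the server before answering.
        self.flush()
        cached = self.cache.get(name)
        if cached is None or cached[0] != self.server.version(name):
            self.cache[name] = self.server.fetch(name)
        return self.cache[name][1]

    def write(self, name, data):
        # Delayed write: buffered locally until flush() is called.
        self.pending[name] = data

    def flush(self):
        for name, data in self.pending.items():
            self.cache[name] = (self.server.store(name, data), data)
        self.pending.clear()
```

In this toy model a loose-read of a cached file costs no round trips, while a strict-read costs at least one. That mirrors the cost-versus-consistency choice described in the abstract.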


international conference on acoustics, speech, and signal processing | 2004

Parameterization of the score threshold for a text-dependent adaptive speaker verification system

Nikki Mirghafori; Matthieu Hebert

We present a computationally efficient strategy for setting a priori thresholds in an adaptive speaker verification system. We have two motivations: to eliminate the externally preset overall system thresholds and replace them with automatically-set internal thresholds conditioned by a target FA rate and calculated at runtime; to counter the verification score shifts resulting from online adaptation. Our approach entails calculating the trajectory of the score threshold as a function of 1) length of the password, 2) target FA, 3) the number of training frames in the speaker model. The solution is successful at both achieving the target FA rates and keeping the FA rate constant during online adaptation. Furthermore, it is algorithmically simple and requires negligible computational resources. The threshold function is calibrated on a Japanese database and experimental results are presented on 12 databases in four different languages.
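The paper's threshold is a parametric function of password length, target FA, and training frames, and the abstract does not give its form. As a much simpler point of comparison, the sketch below picks an empirical threshold from a set of impostor scores so that at most a target fraction of them would be accepted. It is a baseline for illustration only. The function names are hypothetical.

```python
def threshold_for_target_fa(impostor_scores, target_fa):
    """Threshold such that at most target_fa of impostor scores exceed it."""
    if not 0.0 <= target_fa <= 1.0:
        raise ValueError("target_fa must be between 0 and 1")
    ranked = sorted(impostor_scores, reverse=True)
    allowed = int(target_fa * len(ranked))  # impostor trials permitted to pass
    if allowed >= len(ranked):
        return float("-inf")  # every impostor may pass
    return ranked[allowed]


def accept(score, threshold):
    """Accept a verification trial only if its score is strictly above the threshold."""
    return score > threshold
```

An empirical threshold like this has to be recomputed whenever the score distribution shifts, for example after online adaptation. That is the problem the runtime-computed threshold function in the abstract is meant to address.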


international conference on acoustics, speech, and signal processing | 2007

Entropy Based Classifier Combination for Sentence Segmentation

Mathew Magimai-Doss; Dilek Hakkani-Tür; Özgür Çetin; Elizabeth Shriberg; James G. Fung; Nikki Mirghafori

We describe recent extensions to our previous work, where we explored the use of individual classifiers, namely, boosting and maximum entropy models, for sentence segmentation. In this paper we extend the set of classification methods with the support vector machine (SVM). We propose a new dynamic entropy-based classifier combination approach to combine these classifiers, and compare it with the traditional classifier combination techniques, namely, voting, linear regression and logistic regression. Furthermore, we also investigate the combination of hidden event language models with the output of the proposed classifier combination, and the output of individual classifiers. Experimental studies conducted on the Mandarin TDT4 broadcast news database show that the SVM classifier as an individual classifier improves over our previous best system. However, the proposed entropy-based classifier combination approach shows the best improvement in F-measure of 1% absolute, and the voting approach shows the best reduction in NIST error rate of 2.7% absolute when compared to the previous best system.
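The abstract does not spell out the combination rule. One common entropy-based scheme, assumed here purely for illustration, weights each classifier's posterior for a given example by the inverse of that posterior's entropy, so that confident classifiers count for more. The function names and the smoothing constant are hypothetical.

```python
import math


def entropy(posterior):
    """Shannon entropy (nats) of one classifier's posterior distribution."""
    return -sum(p * math.log(p) for p in posterior if p > 0)


def inverse_entropy_combination(posteriors, eps=1e-6):
    """Combine posteriors for one example using inverse-entropy weights."""
    inverse = [1.0 / (entropy(p) + eps) for p in posteriors]  # eps avoids division by zero
    norm = sum(inverse)
    weights = [w / norm for w in inverse]
    num_classes = len(posteriors[0])
    combined = [
        sum(w * p[c] for w, p in zip(weights, posteriors)) for c in range(num_classes)
    ]
    return combined, weights
```

The weights are recomputed per example, which is what makes the combination dynamic. Voting or a fixed regression, by contrast, applies the same weighting to every example.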

Collaboration

Nikki Mirghafori's most frequent collaborators.

Top Co-Authors

Nelson Morgan
University of California

Chuck Wooters
International Computer Science Institute

Howard Lei
International Computer Science Institute

Barbara Peskin
University of California

Gerald Friedland
International Computer Science Institute

Mary Tai Knox
University of California

Ivan Bulyko
University of Washington