
Publication


Featured research published by Ananth N. Iyer.


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2006

Emotion Detection From Infant Facial Expressions And Cries

Pritam Pal; Ananth N. Iyer; Robert E. Yantorno

A new system for interpreting infant cries from facial images and cry sounds is presented in this paper. The system analyzes the facial image and the sound of a crying infant to infer the reason for the cry; the image and the sound represent the same cry event. The image processing module determines the state of certain facial features, combinations of which indicate the reason for crying. The sound processing module extracts the fundamental frequency and the first two formants and uses k-means clustering to determine the reason for the cry. The decisions from the two modules are then fused at the decision level. The accuracies of the image and sound processing modules are 64% and 74.2%, respectively, and that of the fused decision is 75.2%.
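The abstract names k-means clustering over the fundamental frequency and the first two formants. As a minimal sketch (not the paper's implementation), the following NumPy code runs plain k-means on synthetic (F0, F1, F2) frames; the feature values, cluster count, and the two hypothetical cry causes are assumptions for illustration only:

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain k-means: assign frames to the nearest centroid, then
    recompute centroids, until assignments stop changing."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # distance of every frame to every centroid
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # keep the old centroid if a cluster ever becomes empty
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

# Synthetic (F0, F1, F2) frames in Hz for two hypothetical cry causes
rng = np.random.default_rng(1)
cause_a = rng.normal([450, 1100, 3000], 30, size=(40, 3))
cause_b = rng.normal([550, 1400, 3400], 30, size=(40, 3))
X = np.vstack([cause_a, cause_b])
labels, _ = kmeans(X, k=2)
```

A real system would first extract F0 and formant tracks from the recorded cries (for example via autocorrelation and LPC analysis) before clustering.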


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2006

A Novel Approach to Automated Source Separation in Multispeaker Environments

Robert M. Nickel; Ananth N. Iyer

We propose a new approach to the solution of the cocktail party problem (CPP). The goal of the CPP is to isolate the speech signals of individuals who are concurrently talking while being recorded with a properly positioned microphone array. The new approach provides a powerful yet simple alternative to commonly used methods for the separation of speakers. It is based on the observation that the estimation of the signal transfer matrix between speakers and microphones is significantly simplified if one can ensure that during certain periods of the conversation only one speaker is active while all other speakers are silent. Methods to determine such exclusive activity periods are described, and a procedure to estimate the signal transfer matrix is presented. A comparison of the proposed method with other popular source separation methods is drawn. The results show improved performance of the proposed method over earlier approaches.
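As a toy illustration of the core observation, assuming an instantaneous 2x2 mixing matrix in place of the paper's more general transfer matrix, the sketch below estimates each column of the mixing matrix from a period in which only one speaker is active, then unmixes the full recording. The matrix and all signals are synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.6], [0.4, 1.0]])   # assumed 2x2 instantaneous mixing
s = rng.normal(size=(2, 1000))           # two synthetic speaker signals

# Exclusive-activity periods: only speaker 0 talks in the first segment,
# only speaker 1 in the second.
seg0 = A @ np.vstack([s[0, :200], np.zeros(200)])
seg1 = A @ np.vstack([np.zeros(200), s[1, :200]])

def column_estimate(x):
    """With one active source, every microphone carries a scaled copy of
    that source, so the dominant eigenvector of the microphone covariance
    recovers the corresponding mixing-matrix column (up to scale)."""
    w, v = np.linalg.eigh(x @ x.T)
    c = v[:, -1]
    return c / c[0]          # fix scale and sign via the first entry

A_hat = np.column_stack([column_estimate(seg0), column_estimate(seg1)])

# Unmix the full mixture; sources are recovered up to a per-speaker scale
s_hat = np.linalg.solve(A_hat, A @ s)
```

The scale ambiguity is inherent: only the directions of the transfer-matrix columns are identifiable from exclusive-activity periods, which is why the columns are normalized before inversion.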


International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS) | 2006

A Speaker Count System for Telephone Conversations

Uchechukwu O. Ofoegbu; Ananth N. Iyer; Robert E. Yantorno; Brett Y. Smolenski

In telephone conversations, only short consecutive utterances can be examined for each speaker; discriminating between speakers in such conversations is therefore a challenging task, and it becomes even more challenging when no information about the speakers is known a priori. In this paper, a technique for determining the number of speakers participating in a telephone conversation is presented. This approach assumes no knowledge or information about any of the participating speakers. The technique is based on comparing short utterances within the conversation and deciding whether or not they belong to the same speaker. The applications of this research include three-way call detection and speaker tracking, and could be extended to speaker change-point detection and indexing. The proposed method involves an elimination process in which speech segments matching a chosen set of reference models are sequentially removed from the conversation. Models are formed using the mean vectors and covariance matrices of linear predictive cepstral coefficients of voiced segments in the conversation. The use of the Mahalanobis distance to determine whether two models belong to the same or to different speakers, based on likelihood ratio testing, is investigated. The relative amount of residual speech is observed after each elimination process to determine if an additional speaker is present. Experimentation was performed on 4000 artificial conversations from the HTIMIT database. The proposed system was able to yield an average speaker count accuracy of 78%.
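A minimal sketch of the model comparison step: each segment is summarized by its mean vector, and the Mahalanobis distance under a pooled covariance separates same-speaker from different-speaker pairs. The 12-dimensional LPCC-like features are synthetic Gaussians, and the dimensions and data sizes are illustrative assumptions, not the paper's:

```python
import numpy as np

def mahalanobis(mu1, mu2, cov):
    """Mahalanobis distance between two model means under a shared
    (here: pooled) covariance."""
    diff = mu1 - mu2
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

rng = np.random.default_rng(0)
# Hypothetical 12-dim LPCC frames for two utterance collections
spk_a = rng.normal(0.0, 1.0, size=(300, 12))
spk_b = rng.normal(0.8, 1.0, size=(300, 12))

mu_a, mu_b = spk_a.mean(0), spk_b.mean(0)
pooled = 0.5 * (np.cov(spk_a.T) + np.cov(spk_b.T))

d_diff = mahalanobis(mu_a, mu_b, pooled)                 # different speakers
d_same = mahalanobis(mu_a, spk_a[:150].mean(0), pooled)  # same speaker
```

A same/different decision then reduces to comparing the distance against a threshold calibrated on held-out same-speaker pairs.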


International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS) | 2006

Blind Speaker Clustering

Ananth N. Iyer; Uchechukwu O. Ofoegbu; Robert E. Yantorno; Brett Y. Smolenski

A novel approach to performing speaker clustering in telephone conversations is presented in this paper. The method is based on the simple observation that the distance between populations of feature vectors extracted from different speakers is greater than a preset threshold. This observation is incorporated into the clustering problem through the formulation of a constrained optimization problem, and a modified c-means algorithm is designed to solve it. Another key aspect of speaker clustering is determining the number of clusters, which is either assumed or expected as an input in traditional methods. The proposed method does not require such information; instead, the number of clusters is automatically determined from the data. The performance of the proposed algorithm with the Hellinger, Bhattacharyya, Mahalanobis and generalized likelihood ratio distance measures is evaluated and compared. The approach, employing the Hellinger distance, resulted in an average cluster purity of 0.85 in experiments performed on the Switchboard telephone conversational speech database. This result indicates a 9% relative improvement in average cluster purity compared to the best performing agglomerative clustering system.
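The Hellinger and Bhattacharyya distances between two Gaussian speaker models have closed forms, which is what makes a thresholded same/different test cheap. A minimal sketch assuming single-Gaussian models; the threshold value and synthetic data are illustrative assumptions:

```python
import numpy as np

def bhattacharyya(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between two multivariate Gaussians."""
    cov = 0.5 * (cov1 + cov2)
    diff = mu1 - mu2
    mean_term = 0.125 * diff @ np.linalg.solve(cov, diff)
    cov_term = 0.5 * np.log(np.linalg.det(cov) /
                            np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2)))
    return mean_term + cov_term

def hellinger(mu1, cov1, mu2, cov2):
    """Hellinger distance: H^2 = 1 - exp(-D_B), bounded in [0, 1]."""
    db = bhattacharyya(mu1, cov1, mu2, cov2)
    return float(np.sqrt(max(0.0, 1.0 - np.exp(-db))))

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=(500, 4))   # frames from "speaker" A
b = rng.normal(1.2, 1.0, size=(500, 4))   # frames from "speaker" B

h_same = hellinger(a[:250].mean(0), np.cov(a[:250].T),
                   a[250:].mean(0), np.cov(a[250:].T))
h_diff = hellinger(a.mean(0), np.cov(a.T), b.mean(0), np.cov(b.T))

threshold = 0.5                           # assumed preset threshold
merge = h_same < threshold and not (h_diff < threshold)
```

Because the Hellinger distance is bounded in [0, 1], a single preset threshold is meaningful across pairs, which fits the constrained-clustering formulation described above.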


International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS) | 2006

Generic Modeling Applied to Speaker Count

Ananth N. Iyer; Uchechukwu O. Ofoegbu; Robert E. Yantorno; Brett Y. Smolenski

The problem of determining the number of speakers participating in a conversation, and of building their models from short conversations within an unknown group of speakers, is addressed in this paper. The lack of information about the number of speakers and the unavailability of sufficient data make efficient estimation of the speaker model parameters a challenging task. The proposed method uses a novel generic speaker identification (GSID) system as a guide in the model building process. The GSID system is designed to perform speaker identification even when the speaker associated with the test data is not enrolled. The models in the GSID system are employed as initial speaker models, representing the persons participating in the conversation, and are subjected to a classification-adaptation procedure. Classification is performed based on the Bhattacharyya distance between the model database and the test data being analyzed. The model database of the system is designed to consist of simple and well separated models, and a technique to generate such generic models is introduced. The proposed method was applied to the speaker count problem and produced an overall accuracy of 75.3% in determining whether there were 1, 2 or 3 speakers in a conversation.
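A toy version of the classification-adaptation loop, substituting a simple Euclidean nearest-model rule for the paper's Bhattacharyya distance and mean-only models for full speaker models: each frame is classified to its nearest generic model, and only that model is adapted toward the frame. The initial models, adaptation rate, and data are all assumptions for the demo:

```python
import numpy as np

rng = np.random.default_rng(0)
# Two generic initial models (mean vectors only) and frames from two speakers
models = np.array([[-1.0, 0.0], [1.5, 0.5]])
frames = np.vstack([rng.normal([-2.0, -0.5], 0.3, size=(50, 2)),
                    rng.normal([2.5, 1.0], 0.3, size=(50, 2))])
rng.shuffle(frames)

alpha = 0.1                                             # adaptation rate
for x in frames:
    j = np.argmin(np.linalg.norm(models - x, axis=1))   # classify frame
    models[j] += alpha * (x - models[j])                # adapt the winner
```

After one pass, each generic model has drifted toward the speaker it kept winning, which is the sense in which the generic database seeds conversation-specific models.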


International Journal of Speech Technology | 2007

Speaker distinguishing distances: a comparative study

Ananth N. Iyer; Uchechukwu O. Ofoegbu; Robert E. Yantorno; Brett Y. Smolenski

Speaker discrimination is a vital aspect of speaker recognition applications such as speaker identification, verification, clustering, indexing and change-point detection. These tasks are usually performed using distance-based approaches to compare speaker models or features from homogeneous speaker segments in order to determine whether or not they belong to the same speaker. Several distance measures and features have been examined for all the different applications; however, no single distance or feature has been reported to perform optimally for all applications in all conditions. In this paper, a thorough analysis is made to determine the behavior of some frequently used distance measures, as well as features, in distinguishing speakers for different data lengths. Measures studied include the Mahalanobis distance, Kullback-Leibler (KL) distance, T2 statistic, Hellinger distance, Bhattacharyya distance, Generalized Likelihood Ratio (GLR), Levenne distance, L2 and L∞ distances. The Mel-Scale Frequency Cepstral Coefficients (MFCC), Linear Predictive Cepstral Coefficients (LPCC), Line Spectral Pairs (LSP) and the Log Area Ratios (LAR) comprise the features investigated. The usefulness of these measures is studied for different data lengths. Generally, a larger data size for each speaker results in better speaker differentiating capability, as more information can be taken into account. However, in some applications such as segmentation of telephone data, speakers change frequently, making it impossible to obtain large speaker-consistent utterances (especially when speaker change-points are unknown). A metric is defined for determining the probability of speaker discrimination error obtainable for each distance measure using each feature set, and the effect of data size on this probability is observed.
Furthermore, simple distance-based speaker identification and clustering systems are developed, and the performance of each distance and feature for various data sizes is evaluated on these systems in order to illustrate the importance of choosing the appropriate distance and feature for each application. Results show that for tasks which do not involve any limitation of data length, such as speaker identification, the Kullback-Leibler distance with the MFCCs yields the highest speaker differentiation performance, which is comparable to results obtained using more complex state-of-the-art speaker identification systems. Results also indicate that the Hellinger and Bhattacharyya distances with the LSPs yield the best performance for small data sizes.
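The Kullback-Leibler distance between two Gaussian speaker models, reported above as the strongest performer with MFCCs when data length is not limited, has a closed form. A sketch with synthetic 13-dimensional MFCC-like features; the symmetrized variant is used here since KL itself is asymmetric, and the dimensions and data sizes are illustrative assumptions:

```python
import numpy as np

def kl_gauss(mu1, cov1, mu2, cov2):
    """KL divergence KL(N1 || N2) between two multivariate Gaussians."""
    k = len(mu1)
    inv2 = np.linalg.inv(cov2)
    diff = mu2 - mu1
    return 0.5 * (np.trace(inv2 @ cov1) + diff @ inv2 @ diff - k
                  + np.log(np.linalg.det(cov2) / np.linalg.det(cov1)))

def kl_sym(mu1, cov1, mu2, cov2):
    """Symmetrized KL: the sum of both directions."""
    return kl_gauss(mu1, cov1, mu2, cov2) + kl_gauss(mu2, cov2, mu1, cov1)

rng = np.random.default_rng(0)
# Hypothetical 13-dim MFCC frames from two speakers
x = rng.normal(0.0, 1.0, size=(400, 13))
y = rng.normal(0.7, 1.2, size=(400, 13))

d_same = kl_sym(x[:200].mean(0), np.cov(x[:200].T),
                x[200:].mean(0), np.cov(x[200:].T))
d_diff = kl_sym(x.mean(0), np.cov(x.T), y.mean(0), np.cov(y.T))
```

The data-length effect discussed in the abstract shows up directly here: with fewer frames, the sample means and covariances get noisier, and d_same inflates toward d_diff.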


European Signal Processing Conference (EUSIPCO) | 2004

Robust speaker verification with principal pitch components

Robert M. Nickel; Sachin Oswal; Ananth N. Iyer

We are presenting a new method that improves the accuracy of text dependent speaker verification systems. The new method exploits a set of novel speech features derived from a principal component analysis of pitch synchronous voiced speech segments. We use the term principal pitch components (PPCs) or optimal pitch bases (OPBs) to denote the new feature set. Utterance distances computed from these new PPC features are only loosely correlated with utterance distances computed from cepstral features. A distance measure that combines both cepstral and PPC features provides a discriminative power that cannot be achieved with cepstral features alone. By augmenting the feature space of a cepstral baseline system with PPC features we achieve a significant reduction of the equal error probability of incorrect customer rejection versus incorrect impostor acceptance. The proposed method delivers robust performance in various noise conditions.
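A sketch of how principal pitch components could be obtained, assuming fixed-length pitch-synchronous segments stacked as rows of a matrix and reduced by SVD. The period length, segment model, and component count are all illustrative assumptions, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)
P = 80                                   # samples per pitch period (assumed)
t = np.arange(P)
base = np.sin(2 * np.pi * t / P) + 0.3 * np.sin(4 * np.pi * t / P)

# Hypothetical pitch-synchronous segments: a scaled base waveform per
# period, plus noise, standing in for real voiced-speech pitch periods
segments = np.outer(rng.uniform(0.8, 1.2, 200), base)
segments += 0.05 * rng.normal(size=segments.shape)

# Principal pitch components via SVD of the mean-removed segment matrix
mean = segments.mean(axis=0)
U, S, Vt = np.linalg.svd(segments - mean, full_matrices=False)
ppc = Vt[:3]                             # top-3 principal pitch components
features = (segments - mean) @ ppc.T     # per-segment PPC features
```

The projections onto the leading components form a low-dimensional feature vector per pitch period, which is the kind of feature that could then be combined with cepstral features in a joint distance measure.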


Conference of the International Speech Communication Association (INTERSPEECH) | 2016

Generation and Pruning of Pronunciation Variants to Improve ASR Accuracy

Zhenhao Ge; Aravind Ganapathiraju; Ananth N. Iyer; Scott Allen Randal; Felix Immanuel Wyss

Speech recognition, especially name recognition, is widely used in phone services such as company directory dialers, stock quote providers and location finders. It is usually challenging due to pronunciation variations. This paper proposes an efficient and robust data-driven technique that automatically learns acceptable word pronunciations and updates the pronunciation dictionary to build a better lexicon without affecting recognition of other words similar to the target word. It generalizes well on datasets of various sizes, and reduces the error rate on a database of 13000+ human names by 42%, compared to a baseline with regular dictionaries already covering canonical pronunciations of 97%+ of the words in the names, plus a well-trained spelling-to-pronunciation (STP) engine.


IEEE Aerospace Conference | 2007

Speaker Recognition in Adverse Conditions

Ananth N. Iyer; Uchechukwu O. Ofoegbu; Robert E. Yantorno; Stanley J. Wenndt

Recognizing speakers from their voices is a challenging area of research with several practical applications. Presently, speaker verification (SV) systems achieve a high level of accuracy under ideal conditions, such as when there is ample data to build speaker models and when verification is performed in the presence of little or no interference. In general, these systems assume that the features extracted from the data follow a particular parametric probability density function (pdf), i.e., a Gaussian or a mixture of Gaussians; a form of the pdf is imposed on the speech data rather than determined from its underlying structure. In practical conditions, as in an aircraft cockpit where most verbal communication is in the form of short commands, it is almost impossible to ascertain that the assumptions made about the structure of the pdf are correct, and wrong assumptions can lead to a significant reduction in the performance of the SV system. In this research, non-parametric strategies for statistically modeling speakers are developed and evaluated. Non-parametric density estimation methods are generally known to be superior when limited data is available for model building and SV. Experimental evaluation has shown that the non-parametric system yielded a 70% accuracy level in speaker verification with only 0.5 seconds of data and under the influence of noise with a signal-to-noise ratio of 5 dB. This result corresponds to a 20% decrease in error compared to the parametric system.
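A one-dimensional illustration of the non-parametric idea: a Parzen-window (kernel) density estimate imposes no parametric form, so a bimodal speaker distribution that a single Gaussian fits poorly is still scored well. The features, bandwidth, and data sizes below are synthetic assumptions, not the paper's setup:

```python
import numpy as np

def kde_logpdf(x, train, h):
    """Parzen-window (Gaussian-kernel) log-density: the model is just the
    training frames themselves; no parametric form is imposed."""
    d = (x[:, None] - train[None, :]) / h
    k = np.exp(-0.5 * d ** 2) / (np.sqrt(2 * np.pi) * h)
    return np.log(k.mean(axis=1) + 1e-300)

def gauss_logpdf(x, train):
    """Single-Gaussian (parametric) log-density fitted to the same data."""
    mu, sd = train.mean(), train.std()
    return -0.5 * ((x - mu) / sd) ** 2 - np.log(sd * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
# Bimodal claimant features that a single Gaussian fits poorly
claimant = np.concatenate([rng.normal(-2, 0.5, 50), rng.normal(2, 0.5, 50)])
impostor = rng.normal(0, 0.5, 100)
test = np.concatenate([rng.normal(-2, 0.5, 10), rng.normal(2, 0.5, 10)])

score_kde_claimant = kde_logpdf(test, claimant, h=0.3).mean()
score_kde_impostor = kde_logpdf(test, impostor, h=0.3).mean()
score_parametric = gauss_logpdf(test, claimant).mean()
```

The KDE score for the true claimant exceeds both the impostor score and the single-Gaussian score of the same claimant data, mirroring the abstract's argument that an imposed parametric form can cost accuracy when the true density has structure.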


Archive | 2003

Usable Speech Detection Using Linear Predictive Analysis - A Model-Based Approach

Nithya Sundaram; Robert E. Yantorno; Brett Y. Smolenski; Ananth N. Iyer; Norris Streets

Collaboration


Dive into Ananth N. Iyer's collaborations.

Top Co-Authors
Stanley J. Wenndt

Air Force Research Laboratory


Edward J. Cupples

Air Force Research Laboratory
