Publication


Featured research published by Bhiksha Raj.


IEEE Signal Processing Magazine | 2005

Missing-feature approaches in speech recognition

Bhiksha Raj; Richard M. Stern

In this article we have reviewed a wide variety of techniques based on the identification of missing spectral features that have proved effective in reducing the error rates of automatic speech recognition systems. These approaches have been conspicuously effective in ameliorating the effects of transient maskers such as impulsive noise or background music. We described two broad classes of missing feature algorithms: feature-vector imputation algorithms (which restore unreliable components of incoming feature vectors) and classifier-modification algorithms (which dynamically reconfigure the classifier itself to cope with the effects of unreliable feature components). We reviewed the mathematics of four major missing feature techniques: the feature-imputation techniques of cluster-based reconstruction and covariance-based reconstruction, and the classifier-modification methods of class-conditional imputation and marginalization. We also discussed the ways in which the common feature extraction procedures of cepstral analysis, temporal-difference features, and mean subtraction can be handled by speech recognition systems that make use of missing feature techniques. We concluded with a discussion of a small number of selected experimental results. These results confirm the effectiveness of all types of missing feature approaches discussed in ameliorating the effects of both stationary and transient noise, as well as the particular effectiveness of both soft masks and fragment decoding.
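Of the four techniques reviewed, marginalization is the simplest to illustrate: with a diagonal-covariance Gaussian classifier, integrating out the unreliable feature dimensions just drops their factors from the likelihood. A minimal sketch in numpy (the toy feature values and class means are invented for illustration):

```python
import numpy as np

def log_gauss_marginal(x, mean, var, reliable):
    # Log-likelihood of a diagonal-covariance Gaussian with the unreliable
    # dimensions marginalized out: integrating a dimension away simply
    # drops its factor from the product.
    d = x[reliable] - mean[reliable]
    v = var[reliable]
    return -0.5 * np.sum(np.log(2 * np.pi * v) + d * d / v)

# Toy 3-dimensional feature whose middle component is noise-dominated
x = np.array([1.0, 99.0, -0.5])
reliable = np.array([True, False, True])
var = np.ones(3)
ll_a = log_gauss_marginal(x, np.array([1.0, 0.0, -0.5]), var, reliable)
ll_b = log_gauss_marginal(x, np.array([-2.0, 0.0, 2.0]), var, reliable)
```

Scoring only the reliable dimensions lets class A win here, whereas full-likelihood scoring would have been dominated by the corrupt component.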


Speech Communication | 2004

Reconstruction of missing features for robust speech recognition

Bhiksha Raj; Michael L. Seltzer; Richard M. Stern

Speech recognition systems perform poorly in the presence of corrupting noise. Missing feature methods attempt to compensate for the noise by removing noise corrupted components of spectrographic representations of noisy speech and performing recognition with the remaining reliable components. Conventional classifier-compensation methods modify the recognition system to work with the incomplete representations so obtained. This constrains them to perform recognition using spectrographic features which are known to be less optimal than cepstra. In this paper we present two missing-feature algorithms that reconstruct complete spectrograms from incomplete noisy ones. Cepstral vectors can now be derived from the reconstructed spectrograms for recognition. The first algorithm uses MAP procedures to estimate corrupt components from their correlations with reliable components. The second algorithm clusters spectral vectors of clean speech. Corrupt components of noisy speech are estimated from the distribution of the cluster that the analysis frame is identified with. Experiments show that, although conventional classifier-compensation methods are superior when recognition is performed with spectrographic features, cepstra derived from the reconstructed spectrograms result in better recognition performance overall. The proposed methods are also less expensive computationally and do not require modification of the recognizer.
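The cluster-based idea can be sketched in a few lines: pick the cluster whose mean best matches the reliable components, then fill the corrupt components from that cluster. This is a deliberate simplification of the paper's MAP reconstruction (it imputes the cluster mean rather than a full conditional estimate), with toy data invented for illustration:

```python
import numpy as np

def cluster_impute(x, reliable, cluster_means):
    # Identify the cluster whose mean best matches the reliable components,
    # then fill each corrupt component with that cluster's mean value.
    d = cluster_means[:, reliable] - x[reliable]
    k = np.argmin(np.sum(d * d, axis=1))
    out = x.copy()
    out[~reliable] = cluster_means[k, ~reliable]
    return out

# Toy 3-bin spectral vector with the middle bin masked as corrupt
clusters = np.array([[1.0, 0.0, -0.5],
                     [-2.0, 5.0, 2.0]])
x = np.array([1.0, 99.0, -0.4])
reliable = np.array([True, False, True])
restored = cluster_impute(x, reliable, clusters)
```

The reconstructed vector is complete, so cepstra can be derived from it for recognition, which is the point of the method.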


International Conference on Independent Component Analysis and Signal Separation | 2007

Supervised and semi-supervised separation of sounds from single-channel mixtures

Paris Smaragdis; Bhiksha Raj; Madhusudana Shashanka

In this paper we describe a methodology for model-based single-channel separation of sounds. We present a sparse latent variable model that can learn sounds based on their distribution of time/frequency energy. This model can then be used to extract known types of sounds from mixtures in two scenarios: one in which all sound types in the mixture are known, and the other in which only the target or the interference models are known. The model we propose has close ties to non-negative decompositions and latent variable models commonly used for semantic analysis.
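The supervised scenario can be sketched with a non-negative decomposition over pre-learned per-source bases: fit activations for the mixture, then reconstruct each source with a ratio mask. This is a generic KL-NMF sketch, not the paper's exact latent variable model, and the one-basis-per-source toy spectra are invented:

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_activations(V, W, n_iter=200):
    # Fixed-basis decomposition: with source bases W learned in advance,
    # estimate nonnegative activations H so that W @ H approximates the
    # mixture spectrogram V (multiplicative KL-divergence updates).
    H = rng.random((W.shape[1], V.shape[1])) + 0.1
    for _ in range(n_iter):
        WH = W @ H + 1e-9
        H *= (W.T @ (V / WH)) / W.sum(axis=0)[:, None]
    return H

# Toy 3-bin spectra: one basis vector per source (assumption)
Wa = np.array([[1.0], [0.5], [0.0]])
Wb = np.array([[0.0], [0.5], [1.0]])
W = np.hstack([Wa, Wb])
Ha = np.array([[2.0, 0.0, 1.0]])
Hb = np.array([[0.0, 3.0, 1.0]])
V = Wa @ Ha + Wb @ Hb          # the observed mixture

H = fit_activations(V, W)
Va, Vb = Wa @ H[:1], Wb @ H[1:]
est_a = V * Va / (Va + Vb + 1e-9)   # ratio-mask reconstruction of source a
```

For fixed bases the KL objective is convex in the activations, so the multiplicative updates recover the toy mixture essentially exactly.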


Speech Communication | 2004

A Bayesian Classifier for Spectrographic Mask Estimation for Missing Feature Speech Recognition

Michael L. Seltzer; Bhiksha Raj; Richard M. Stern

Missing feature methods of noise compensation for speech recognition operate by first identifying components of a spectrographic representation of speech that are considered to be corrupt. Recognition is then performed either using only the remaining reliable components, or the corrupt components are reconstructed prior to recognition. These methods require a spectrographic mask which accurately labels the reliable and corrupt regions of the spectrogram. Depending on the missing feature method applied, these masks must either contain binary values or probabilistic values. Current mask estimation techniques rely on explicit estimation of the characteristics of the corrupting noise. The estimation process usually assumes that the noise is pseudo-stationary or varies slowly with time. This is a significant drawback since the missing feature methods themselves have no such restrictions. We present a new mask estimation technique that uses a Bayesian classifier to determine the reliability of spectrographic elements. Features used for classification were designed that make no assumptions about the corrupting noise signal, but rather exploit characteristics of the speech signal itself. Experiments were performed on speech corrupted by a variety of noises, using missing feature compensation methods which require binary masks and probabilistic masks. In all cases, the proposed Bayesian mask estimation method resulted in significantly better recognition accuracy than conventional mask estimation approaches.
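At its core, the classifier turns a per-element feature into a posterior probability of reliability, which can serve directly as a probabilistic mask (or be thresholded into a binary one). A minimal one-feature sketch with Gaussian class-conditional densities; the parameter values are invented, and the paper's actual features are richer:

```python
import numpy as np

def reliability_posterior(feat, mu_rel, var_rel, mu_cor, var_cor, prior_rel=0.5):
    # Bayesian classifier for one spectrographic element: posterior
    # P(reliable | feature) from two class-conditional Gaussians.
    def gauss(x, mu, var):
        return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    num = prior_rel * gauss(feat, mu_rel, var_rel)
    den = num + (1.0 - prior_rel) * gauss(feat, mu_cor, var_cor)
    return num / den

# Elements whose feature value sits near the "reliable" class get a mask
# value near 1; those near the "corrupt" class get a value near 0.
m_hi = reliability_posterior(5.0, 5.0, 1.0, 0.0, 1.0)
m_lo = reliability_posterior(0.0, 5.0, 1.0, 0.0, 1.0)
```

Thresholding the posterior at 0.5 yields the binary mask some compensation methods require; using it as-is yields the probabilistic mask.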


Computer Vision and Pattern Recognition | 2015

Beyond Gaussian Pyramid: Multi-skip Feature Stacking for action recognition

Zhenzhong Lan; Ming Lin; Xuanchong Li; Alexander G. Hauptmann; Bhiksha Raj

Most state-of-the-art action feature extractors involve differential operators, which act as highpass filters and tend to attenuate low frequency action information. This attenuation introduces bias to the resulting features and generates ill-conditioned feature matrices. The Gaussian Pyramid has been used as a feature enhancing technique that encodes scale-invariant characteristics into the feature space in an attempt to deal with this attenuation. However, at the core of the Gaussian Pyramid is a convolutional smoothing operation, which makes it incapable of generating new features at coarse scales. In order to address this problem, we propose a novel feature enhancing technique called Multi-skIp Feature Stacking (MIFS), which stacks features extracted using a family of differential filters parameterized with multiple time skips and encodes shift-invariance into the frequency space. MIFS compensates for information lost from using differential operators by recapturing information at coarse scales. This recaptured information allows us to match actions at different speeds and ranges of motion. We prove that MIFS enhances the learnability of differential-based features exponentially. The resulting feature matrices from MIFS have much smaller condition numbers and variances than those from conventional methods. Experimental results show significantly improved performance on challenging action recognition and event detection tasks. Specifically, our method exceeds the state of the art on the Hollywood2, UCF101 and UCF50 datasets and is comparable to the state of the art on the HMDB51 and Olympics Sports datasets. MIFS can also be used as a speedup strategy for feature extraction with minimal or no accuracy cost.
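The stacking idea itself is simple to sketch: compute temporal differences at several time skips and concatenate them. This toy version operates on plain frame vectors rather than the dense-trajectory features of the paper, and the skip set is an invented example:

```python
import numpy as np

def mifs(frames, skips=(1, 2, 4)):
    # Multi-skIp Feature Stacking (sketch): compute differential features
    # frames[t] - frames[t - s] for several time skips s and stack them,
    # so slow motions attenuated at skip 1 are recaptured at coarser skips.
    T, D = frames.shape
    stacked = []
    for s in skips:
        d = frames[s:] - frames[:-s]
        d = np.vstack([np.zeros((s, D)), d])  # zero-pad to a common length
        stacked.append(d)
    return np.hstack(stacked)

# A slow linear motion: per-frame differences are small, but the skip-4
# differences are four times larger, so the motion survives in the stack.
frames = np.outer(np.arange(6, dtype=float), [1.0, 2.0])
feats = mifs(frames)
```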


IEEE Transactions on Speech and Audio Processing | 2004

Likelihood-maximizing beamforming for robust hands-free speech recognition

Michael L. Seltzer; Bhiksha Raj; Richard M. Stern

Speech recognition performance degrades significantly in distant-talking environments, where the speech signals can be severely distorted by additive noise and reverberation. In such environments, the use of microphone arrays has been proposed as a means of improving the quality of captured speech signals. Currently, microphone-array-based speech recognition is performed in two independent stages: array processing and then recognition. Array processing algorithms, designed for signal enhancement, are applied in order to reduce the distortion in the speech waveform prior to feature extraction and recognition. This approach assumes that improving the quality of the speech waveform will necessarily result in improved recognition performance and ignores the manner in which speech recognition systems operate. In this paper a new approach to microphone-array processing is proposed in which the goal of the array processing is not to generate an enhanced output waveform but rather to generate a sequence of features which maximizes the likelihood of generating the correct hypothesis. In this approach, called likelihood-maximizing beamforming, information from the speech recognition system itself is used to optimize a filter-and-sum beamformer. Speech recognition experiments performed in a real distant-talking environment confirm the efficacy of the proposed approach.
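The structure being optimized is an ordinary filter-and-sum beamformer; what changes is the objective used to choose the filter taps. A minimal sketch of the filter-and-sum operation itself, with toy channels and identity filters (the likelihood-based tap optimization against the recognizer is not reproduced here):

```python
import numpy as np

def filter_and_sum(channels, filters):
    # Filter-and-sum beamforming: convolve each microphone channel with its
    # own FIR filter and add the results. Likelihood-maximizing beamforming
    # chooses these taps to maximize the likelihood of the correct
    # hypothesis rather than a waveform-quality criterion.
    n_out = channels.shape[1] + filters.shape[1] - 1
    y = np.zeros(n_out)
    for x, h in zip(channels, filters):
        y += np.convolve(x, h)
    return y

# Two toy channels; unit-impulse filters reduce to delay-and-sum with
# zero delays, so the channels simply add.
channels = np.array([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]])
filters = np.array([[1.0, 0.0], [1.0, 0.0]])
y = filter_and_sum(channels, filters)
```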


Computational Intelligence and Neuroscience | 2008

Probabilistic latent variable models as nonnegative factorizations

Madhusudana V. S. Shashanka; Bhiksha Raj; Paris Smaragdis

This paper presents a family of probabilistic latent variable models that can be used for analysis of nonnegative data. We show that there are strong ties between nonnegative matrix factorization and this family, and provide some straightforward extensions which can help in dealing with shift invariances, higher-order decompositions and sparsity constraints. We argue through these extensions that the use of this approach allows for rapid development of complex statistical models for analyzing nonnegative data.
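The basic member of this family models a normalized nonnegative matrix as P(f,t) = Σ_z P(z) P(f|z) P(t|z), fit with EM, and is the probabilistic counterpart of KL-divergence NMF. A self-contained sketch on an invented rank-2 toy matrix:

```python
import numpy as np

rng = np.random.default_rng(1)

def plca(V, n_z=2, n_iter=500):
    # Probabilistic latent component analysis: model the normalized
    # matrix as sum_z P(z) P(f|z) P(t|z), estimated with EM.
    P = V / V.sum()
    F, T = P.shape
    pz = np.full(n_z, 1.0 / n_z)
    pf = rng.random((F, n_z)); pf /= pf.sum(axis=0)
    pt = rng.random((T, n_z)); pt /= pt.sum(axis=0)
    for _ in range(n_iter):
        # E-step: posterior over z for every (f, t) cell
        joint = pz[None, None, :] * pf[:, None, :] * pt[None, :, :]
        post = joint / (joint.sum(axis=2, keepdims=True) + 1e-12)
        # M-step: re-estimate the factors from expected counts
        counts = P[:, :, None] * post
        pz = counts.sum(axis=(0, 1))
        pf = counts.sum(axis=1) / (pz + 1e-12)
        pt = counts.sum(axis=0) / (pz + 1e-12)
    return pz, pf, pt

# Rank-2 nonnegative toy data with disjoint components
V = np.outer([1.0, 0.0, 0.0, 1.0], [1.0, 1.0, 0.0, 0.0]) \
  + np.outer([0.0, 1.0, 1.0, 0.0], [0.0, 0.0, 1.0, 1.0])
pz, pf, pt = plca(V)
recon = (pf * pz) @ pt.T   # reconstructed P(f, t)
```

Up to normalization, pf plays the role of the NMF basis matrix and pz·pt of the activations, which is the correspondence the paper develops.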


IEEE Transactions on Audio, Speech, and Language Processing | 2007

Soft Mask Methods for Single-Channel Speaker Separation

Aarthi M. Reddy; Bhiksha Raj

The problem of single-channel speaker separation attempts to extract a speech signal uttered by the speaker of interest from a signal containing a mixture of acoustic signals. Most algorithms that deal with this problem are based on masking, wherein unreliable frequency components from the mixed signal spectrogram are suppressed, and the reliable components are inverted to obtain the speech signal from the speaker of interest. Most current techniques estimate this mask in a binary fashion, resulting in a hard mask. In this paper, we present two techniques to separate out the speech signal of the speaker of interest from a mixture of speech signals. One technique estimates all the spectral components of the desired speaker. The second technique estimates a soft mask that weights the frequency subbands of the mixed signal. In both cases, the speech signal of the speaker of interest is reconstructed from the complete spectral descriptions obtained. In their native form, these algorithms are computationally expensive. We also present fast factored approximations to the algorithms. Experiments reveal that the proposed algorithms can result in significant enhancement of individual speakers in mixed recordings, consistently achieving better performance than that obtained with hard binary masks.
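The difference between a hard and a soft mask is easy to see on toy numbers: instead of a 0/1 decision per time-frequency cell, a soft mask weights each cell by the estimated fraction of energy belonging to the target. The spectral estimates below are invented; obtaining them from speaker models is the substance of the paper:

```python
import numpy as np

def soft_mask(target_spec, interference_spec):
    # Ratio-style soft mask: the estimated fraction of each cell's
    # energy that belongs to the target speaker.
    return target_spec / (target_spec + interference_spec + 1e-12)

# Toy per-bin spectral estimates for the target (a) and interferer (b)
a = np.array([4.0, 1.0, 0.0, 2.0])
b = np.array([0.0, 1.0, 3.0, 2.0])
mix = a + b
recovered = soft_mask(a, b) * mix   # weighted, not thresholded
```

A hard mask would zero the bins where b dominates and keep the rest untouched; the soft mask instead passes each bin in proportion, which is what yields the consistently better performance reported.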


International Conference on Acoustics, Speech, and Signal Processing | 2001

Speech in Noisy Environments: robust automatic segmentation, feature extraction, and hypothesis combination

Rita Singh; Michael L. Seltzer; Bhiksha Raj; Richard M. Stern

The first evaluation for Speech in Noisy Environments (SPINE1) was conducted by the Naval Research Labs (NRL) in August 2000. The purpose of the evaluation was to test existing core speech recognition technologies on speech in the presence of varying types and levels of noise, in this case taken from military settings. Among the strategies used by Carnegie Mellon University's successful systems designed for this task were session-adaptive segmentation, robust mel-scale filtering for the computation of cepstra, the use of parallel front-end features and noise-compensation algorithms, and parallel hypothesis combination through word-graphs. This paper describes the motivations behind the design decisions taken for these components, supported by observations and experiments.


Computer Vision and Pattern Recognition | 2017

SphereFace: Deep Hypersphere Embedding for Face Recognition

Weiyang Liu; Yandong Wen; Zhiding Yu; Ming Li; Bhiksha Raj; Le Song

This paper addresses the deep face recognition (FR) problem under the open-set protocol, where ideal face features are expected to have smaller maximal intra-class distance than minimal inter-class distance under a suitably chosen metric space. However, few existing algorithms can effectively achieve this criterion. To this end, we propose the angular softmax (A-Softmax) loss, which enables convolutional neural networks (CNNs) to learn angularly discriminative features. Geometrically, the A-Softmax loss can be viewed as imposing discriminative constraints on a hypersphere manifold, which intrinsically matches the prior that faces also lie on a manifold. Moreover, the size of the angular margin can be quantitatively adjusted by a parameter m. We further derive a specific m to approximate the ideal feature criterion. Extensive analysis and experiments on Labeled Faces in the Wild (LFW), YouTube Faces (YTF) and MegaFace Challenge 1 show the superiority of the A-Softmax loss in FR tasks.
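The angular-margin idea can be sketched numerically: with normalized class weights, the logit for class j is ||x||·cos(θ_j), and A-Softmax multiplies the target class's angle by m. This toy sketch omits the monotone surrogate ψ used in the actual loss, and the feature and weight values are invented:

```python
import numpy as np

def a_softmax_logits(x, W, target, m=4):
    # With column-normalized class weights, the ordinary logit for class j
    # is ||x|| * cos(theta_j). A-Softmax (sketch) multiplies the target
    # class's angle by the margin m, shrinking its logit and forcing the
    # network to pull features closer to the target weight direction.
    Wn = W / np.linalg.norm(W, axis=0, keepdims=True)
    cos_theta = (Wn.T @ x) / np.linalg.norm(x)
    logits = np.linalg.norm(x) * cos_theta
    theta_t = np.arccos(np.clip(cos_theta[target], -1.0, 1.0))
    logits[target] = np.linalg.norm(x) * np.cos(m * theta_t)
    return logits

# Toy 2-D feature and two class weight vectors (columns of W)
x = np.array([1.0, 0.0])
W = np.array([[1.0, 0.0],
              [0.1, 1.0]])
logits = a_softmax_logits(x, W, target=0)
```

Even though x is nearly aligned with class 0, the margin shrinks its logit below the plain cosine value, which is exactly the pressure that produces angularly discriminative features during training.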

Collaboration


Dive into Bhiksha Raj's collaborations.

Top Co-Authors

Rita Singh (Carnegie Mellon University)
Richard M. Stern (Carnegie Mellon University)
Anurag Kumar (Carnegie Mellon University)
Isabel Trancoso (Carnegie Mellon University)
Manas A. Pathak (Carnegie Mellon University)
John W. McDonough (Carnegie Mellon University)
Benjamin Elizalde (International Computer Science Institute)