Publication


Featured research published by Dimitri Kanevsky.


International Conference on Acoustics, Speech, and Signal Processing | 2008

Boosted MMI for model and feature-space discriminative training

Daniel Povey; Dimitri Kanevsky; Brian Kingsbury; Bhuvana Ramabhadran; George Saon; Karthik Visweswariah

We present a modified form of the maximum mutual information (MMI) objective function which gives improved results for discriminative training. The modification consists of boosting the likelihoods of paths in the denominator lattice that have a higher phone error relative to the correct transcript, by using the same phone accuracy function that is used in Minimum Phone Error (MPE) training. We combine this with another improvement to our implementation of the Extended Baum-Welch update equations for MMI, namely the canceling of any shared part of the numerator and denominator statistics on each frame (a procedure that is already done in MPE). This change affects the Gaussian-specific learning rate. We also investigate another modification whereby we replace I-smoothing to the ML estimate with I-smoothing to the previous iteration's value. Boosted MMI gives better results than MPE in both model and feature-space discriminative training, although not consistently.
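
For orientation, the boosted MMI objective described above can be written roughly as follows (a hedged reconstruction in the paper's spirit, with $\kappa$ an acoustic scale, $b$ the boosting factor, and $\mathcal{A}(s, s_r)$ the phone accuracy of hypothesis $s$ against the reference $s_r$):

$$F_{\mathrm{bMMI}}(\lambda) \;=\; \sum_{r} \log \frac{p_\lambda(x_r \mid s_r)^{\kappa}\, P(s_r)}{\sum_{s} p_\lambda(x_r \mid s)^{\kappa}\, P(s)\, e^{-b\,\mathcal{A}(s, s_r)}}$$

The factor $e^{-b\,\mathcal{A}(s, s_r)}$ inflates the denominator contribution of hypotheses with low phone accuracy, which is the "boosting" the abstract refers to.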


IEEE Transactions on Information Theory | 1991

An inequality for rational functions with applications to some statistical estimation problems

Ponani S. Gopalakrishnan; Dimitri Kanevsky; Arthur Nádas; David Nahamoo

The well-known Baum-Eagon inequality (1967) provides an effective iterative scheme for finding a local maximum for homogeneous polynomials with positive coefficients over a domain of probability values. However, in many applications the goal is to maximize a general rational function. In view of this, the Baum-Eagon inequality is extended to rational functions. Some of the applications of this inequality to statistical estimation problems are briefly described.
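
As a hedged sketch of the underlying idea (not the paper's exact statement): for a polynomial $P$ with nonnegative coefficients over probability vectors $\{x_{ij}\}$ with $\sum_j x_{ij} = 1$, the Baum-Eagon growth transform

$$\hat{x}_{ij} \;=\; \frac{x_{ij}\, \partial P/\partial x_{ij}(x)}{\sum_{k} x_{ik}\, \partial P/\partial x_{ik}(x)}$$

guarantees $P(\hat{x}) \ge P(x)$. For a rational objective $R = N/D$ with $D > 0$, one can instead grow the auxiliary polynomial $Q_{x_0}(x) = N(x) - R(x_0)\,D(x)$ (with a sufficiently large constant $C$ added to the partial derivatives to keep them nonnegative); because $Q_{x_0}(x_0) = 0$, any increase in $Q_{x_0}$ forces $R(x) > R(x_0)$.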


Journal of the Acoustical Society of America | 2004

System and method for indexing and querying audio archives

Dimitri Kanevsky; Stephane Herman Maes

A system and method for indexing segments of audio/multimedia files and data streams for storage in a database according to audio information such as speaker identity, the background environment and channel (music, street noise, car noise, telephone, studio noise, speech plus music, speech plus noise, speech over speech), and/or the transcription of the spoken utterances. The content or topic of the transcribed text can also be determined using natural language understanding to index based on the context of the transcription. A user can then retrieve desired segments of the audio file from the database by generating a query having one or more desired parameters based on the indexed information.
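
A minimal sketch of the kind of segment-level index and parameter-based query the abstract describes; the record fields and helper names here are hypothetical, not the patented implementation.

```python
from dataclasses import dataclass, field

@dataclass
class AudioSegment:
    """One indexed segment of an audio/multimedia file."""
    file_id: str
    start_sec: float
    end_sec: float
    speaker: str | None = None        # speaker identity
    background: str | None = None     # e.g. "music", "street noise", "telephone"
    transcript: str = ""              # ASR transcription of the segment
    topics: list[str] = field(default_factory=list)  # topics derived from the transcript

def query(index: list[AudioSegment], speaker=None, background=None, keyword=None):
    """Return segments matching all supplied parameters (None means 'any')."""
    hits = []
    for seg in index:
        if speaker and seg.speaker != speaker:
            continue
        if background and seg.background != background:
            continue
        if keyword and keyword.lower() not in seg.transcript.lower():
            continue
        hits.append(seg)
    return hits
```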


IEEE Transactions on Signal Processing | 2010

Methods for Sparse Signal Recovery Using Kalman Filtering With Embedded Pseudo-Measurement Norms and Quasi-Norms

Avishy Carmi; Pini Gurfil; Dimitri Kanevsky

We present two simple methods for recovering sparse signals from a series of noisy observations. The theory of compressed sensing (CS) requires solving a convex constrained minimization problem. We propose solving this optimization problem by two algorithms that rely on a Kalman filter (KF) endowed with a pseudo-measurement (PM) equation. Compared to a recently introduced KF-CS method, which involves the implementation of an auxiliary CS optimization algorithm (e.g., the Dantzig selector), our method can be straightforwardly implemented in a stand-alone manner, as it is exclusively based on the well-known KF formulation. In our first algorithm, the PM equation constrains the ℓ1 norm of the estimated state. In this case, the augmented measurement equation becomes linear, so a regular KF can be used. In our second algorithm, we replace the ℓ1 norm by a quasi-norm ℓp, 0 ≤ p < 1. This modification considerably improves the accuracy of the resulting KF algorithm; however, these improved results require an extended KF (EKF) for properly computing the state statistics. A numerical study demonstrates the viability of the new methods.
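
A rough stand-alone sketch (assuming a linear state-space model and using our own variable names) of the first algorithm's idea: after the ordinary Kalman measurement update, an extra fictitious measurement $0 \approx \|x\|_1 - \epsilon$ is applied, with the $\ell_1$ norm linearized through $\mathrm{sign}(x)$.

```python
import numpy as np

def kf_update(x, P, y, H, R):
    """Standard Kalman measurement update for y = H x + noise, noise covariance R."""
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (y - H @ x)
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P

def l1_pseudo_measurement(x, P, eps=0.0, r_eps=1e-2, n_iter=20):
    """Iterated fictitious measurement 0 ≈ ||x||_1 - eps.
    Linearizing the l1 norm via its (sub)gradient sign(x) is an assumption of this
    sketch, not the paper's exact derivation."""
    n = len(x)
    for _ in range(n_iter):
        H = np.sign(x).reshape(1, -1)                 # row vector: d||x||_1 / dx
        S = (H @ P @ H.T).item() + r_eps              # scalar innovation covariance
        K = (P @ H.T) / S                             # (n, 1) gain
        x = x + (K * (eps - (H @ x).item())).ravel()  # pull the state toward the constraint
        P = (np.eye(n) - K @ H) @ P
    return x, P
```

In a filtering loop one would call kf_update with the real measurement and then l1_pseudo_measurement to impose the sparsity constraint; the quasi-norm variant of the second algorithm would replace the sign-based row with the gradient of an ℓp quasi-norm inside an EKF.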


Journal of the Acoustical Society of America | 2004

Language model adaptation via network of similar users

Dimitri Kanevsky; Catherine G. Wolf; Wlodek Zadrozny

A language recognition system, method and program product for recognizing language based input from computer users on a network of connected computers. Each computer includes at least one user based language model trained for a corresponding user for automatic speech recognition, handwriting recognition, machine translation, gesture recognition or other similar actions that require interpretation of user activities. Network computer users are clustered into classes of similar users according to user similarities such as nationality, profession, sex, age, etc. User characteristics are collected by sensors and from databases and, then, distributed over the network during user activities. Language models with similarities among similar users on the network are identified. The language models include a language model domain, with similar language models being clustered according to their domains. Language models identified as similar are modified in response to user production activities. After modification of one language model, other identified similar language models are compared and adapted. Also, user data, including information about user activities and language model data, is transmitted over the network to other similar users. Language models are adapted only in response to similar user activities, when these activities are recorded and transmitted over the network. Language models are given a global context based on similar users that are connected together over the network.
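
A toy illustration of the adaptation flow the abstract describes, with hypothetical attribute names, unigram word-probability models, and a simple interpolation standing in for the patented adaptation step.

```python
from collections import defaultdict

def cluster_users(profiles):
    """Group user ids into classes of similar users by shared attributes
    (here simply nationality + profession; the real similarity criteria are richer)."""
    clusters = defaultdict(list)
    for user_id, attrs in profiles.items():
        clusters[(attrs["nationality"], attrs["profession"])].append(user_id)
    return clusters

def propagate_adaptation(models, clusters, source_user, interpolation=0.2):
    """After source_user's unigram model is adapted, nudge similar users' models toward it."""
    for members in clusters.values():
        if source_user not in members:
            continue
        src = models[source_user]
        for other in members:
            if other == source_user:
                continue
            tgt = models[other]
            for word, p in src.items():
                tgt[word] = (1 - interpolation) * tgt.get(word, 0.0) + interpolation * p
    return models
```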


International Conference on Acoustics, Speech, and Signal Processing | 2010

Bayesian compressive sensing for phonetic classification

Tara N. Sainath; Avishy Carmi; Dimitri Kanevsky; Bhuvana Ramabhadran

In this paper, we introduce a novel Bayesian compressive sensing (CS) technique for phonetic classification. CS is often used to characterize a signal from a few support training examples, similar to k-nearest neighbor (kNN) and Support Vector Machines (SVMs). However, unlike SVMs and kNNs, CS allows the number of supports to be adapted to the specific signal being characterized. On the TIMIT phonetic classification task, we find that our CS method outperforms the SVM, kNN and Gaussian Mixture Model (GMM) methods. Our CS method achieves an accuracy of 80.01%, one of the best reported results in the literature to date.
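
A sketch of the generic sparse-representation classification recipe this line of work builds on; the Lasso solver and matrix layout below are assumptions of the sketch, standing in for the paper's Bayesian CS formulation.

```python
import numpy as np
from sklearn.linear_model import Lasso

def src_classify(y, exemplars, labels, alpha=0.01):
    """Classify feature vector y (shape (d,)) against an exemplar matrix (shape (d, N))
    whose columns are training examples with class ids in labels (length N)."""
    labels = np.asarray(labels)
    # Sparse coefficients over the exemplars; Lasso is only a convenient stand-in solver.
    beta = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000).fit(exemplars, y).coef_
    best_class, best_res = None, np.inf
    for c in np.unique(labels):
        beta_c = np.where(labels == c, beta, 0.0)     # keep only class-c coefficients
        res = np.linalg.norm(y - exemplars @ beta_c)  # reconstruction residual for class c
        if res < best_res:
            best_class, best_res = c, res
    return best_class                                 # class whose exemplars explain y best
```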


IEEE Transactions on Audio, Speech, and Language Processing | 2011

Exemplar-Based Sparse Representation Features: From TIMIT to LVCSR

Tara N. Sainath; Bhuvana Ramabhadran; Michael Picheny; David Nahamoo; Dimitri Kanevsky

The use of exemplar-based methods, such as support vector machines (SVMs), k-nearest neighbors (kNNs) and sparse representations (SRs), in speech recognition has thus far been limited. Exemplar-based techniques utilize information about individual training examples and are computationally expensive, making it particularly difficult to investigate these methods on large-vocabulary continuous speech recognition (LVCSR) tasks. While research in LVCSR provides a good testbed to tackle real-world speech recognition problems, research in this area suffers from two main drawbacks. First, the overall complexity of an LVCSR system makes error analysis quite difficult. Second, exploring new research ideas on LVCSR tasks involves training and testing state-of-the-art LVCSR systems, which can lead to a long turnaround time. This makes a small vocabulary task such as TIMIT more appealing. TIMIT provides a phonetically rich and hand-labeled corpus that allows easy insight into new algorithms. However, research ideas explored for small vocabulary tasks do not always provide gains on LVCSR systems. In this paper, we combine the advantages of using both small and large vocabulary tasks by taking well-established techniques used in LVCSR systems and applying them on TIMIT to establish a new baseline. We then utilize these existing LVCSR techniques in creating a novel set of exemplar-based sparse representation (SR) features. Using these existing LVCSR techniques, we achieve a phonetic error rate (PER) of 19.4% on the TIMIT task. The additional use of SR features reduces the PER to 18.6%. We then explore applying the SR features to a large vocabulary Broadcast News task, where we achieve a 0.3% absolute reduction in word error rate (WER).


IBM Systems Journal | 2005

Accessibility, transcription, and access everywhere

Keith Bain; Sara H. Basson; Alexander Faisman; Dimitri Kanevsky

Accessibility in the workplace and in academic settings has increased dramatically for users with disabilities, driven by greater awareness, legislative mandate, and technological improvements. Gaps, however, remain. For persons who are deaf and hard of hearing in particular, full participation requires complete access to audio materials, both for live settings and for prerecorded audio and visual information. Even for users with adequate hearing, captioned or transcribed materials offer another modality for information access, one that can be particularly useful in certain situations, such as listening in noisy environments, interpreting speakers with strong accents, or searching audio media for specific information. Providing this level of access through fully automated means is currently beyond the state of the art. This paper details a number of key advances in audio access that have occurred over the last five years. We describe the Liberated Learning Project, a consortium of universities worldwide, which is piloting technologies to create real-time access for students who are deaf and hard of hearing, without intermediary assistance. In support of this project, IBM Research has created the ViaScribe™ tool that converts speech recognition output to a viable captioning interface. Additional inventions and incremental improvements to speech recognition for captioning are described, as well as future directions.


IEEE Signal Processing Magazine | 2012

Exemplar-Based Processing for Speech Recognition: An Overview

Tara N. Sainath; Bhuvana Ramabhadran; David Nahamoo; Dimitri Kanevsky; Dirk Van Compernolle; Kris Demuynck; Jort F. Gemmeke; Jerome R. Bellegarda; Shiva Sundaram

Solving real-world classification and recognition problems requires a principled way of modeling the physical phenomena generating the observed data and the uncertainty in it. The uncertainty originates from the fact that many data generation aspects are influenced by variables that are not directly measurable or are too complex to model and hence are treated as random fluctuations. For example, in speech production, uncertainty could arise from vocal tract variations among different people or corruption by noise. The goal of modeling is to establish a generalization from the set of observed data such that accurate inference (classification, decision, recognition) can be made about the data yet to be observed, which we refer to as unseen data.


International Conference on Acoustics, Speech, and Signal Processing | 1988

Decoder selection based on cross-entropies

Ponani S. Gopalakrishnan; Dimitri Kanevsky; Arthur Nádas; David Nahamoo; Michael Picheny

The authors generalize the maximum likelihood and related optimization criteria for training and decoding with a speech recognizer. The generalizations are constructed by considering weighted linear combinations of the logarithms of the likelihoods of words, of acoustics, and of (word, acoustic) pairs. The utility of various patterns of weights is examined.
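
Read loosely, the generalized criterion has the form

$$F_{\alpha,\beta,\gamma}(\lambda) \;=\; \alpha \log P_\lambda(W) \;+\; \beta \log P_\lambda(A) \;+\; \gamma \log P_\lambda(W, A),$$

where $W$ denotes the word sequence and $A$ the acoustics (the weight names here are ours). For example, the weights $(0, 0, 1)$ give the joint maximum-likelihood criterion, while $(0, -1, 1)$ give the conditional criterion $\log P_\lambda(W \mid A)$ familiar from maximum-mutual-information-style training.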

