Publications


Featured research published by Miroslav Novak.


International Conference on Acoustics, Speech, and Signal Processing | 1995

Performance of the IBM large vocabulary continuous speech recognition system on the ARPA Wall Street Journal task

Lalit R. Bahl; S. Balakrishnan-Aiyer; J.R. Bellegarda; Martin Franz; Ponani S. Gopalakrishnan; David Nahamoo; Miroslav Novak; Mukund Padmanabhan; Michael Picheny; Salim Roukos

In this paper we discuss various experimental results using our continuous speech recognition system on the Wall Street Journal task. Experiments with different feature extraction methods, varying amounts and types of training data, and different vocabulary sizes are reported.


Journal of the Acoustical Society of America | 2005

Method and apparatus for translating natural-language speech using multiple output phrases

Raimo Bakis; Mark E. Epstein; William Stuart Meisel; Miroslav Novak; Michael Picheny; Ridley M. Whitaker

A multi-lingual translation system that provides multiple output sentences for a given word or phrase. Each output sentence for a given word or phrase reflects, for example, a different emotional emphasis, dialect, accent, loudness or rate of speech. A given output sentence can be selected automatically or manually to create a desired effect. For example, the same output sentence for a given word or phrase can be recorded three times, to selectively reflect excitement, sadness or fear. The multi-lingual translation system includes a phrase-spotting mechanism, a translation mechanism, a speech output mechanism and, optionally, a language understanding mechanism, an event measuring mechanism, or both. The phrase-spotting mechanism identifies a spoken phrase from a restricted domain of phrases. The language understanding mechanism, if present, maps the identified phrase onto a small set of formal phrases. The translation mechanism maps the formal phrase onto a well-formed phrase in one or more target languages. The speech output mechanism produces high-quality output speech. The speech output may be time-synchronized to the spoken phrase using the output of the event measuring mechanism.


International Conference on Acoustics, Speech, and Signal Processing | 2001

Use of non-negative matrix factorization for language model adaptation in a lecture transcription task

Miroslav Novak; Richard J. Mammone

This paper introduces non-negative matrix factorization for language model adaptation. This approach is an alternative to latent semantic analysis based language modeling using singular value decomposition, and it offers several benefits. A new method, which does not require an explicit document segmentation of the training corpus, is presented as well. This method resulted in a perplexity reduction of 16% on a database of biology lecture transcriptions.


International Conference on Acoustics, Speech, and Signal Processing | 2003

Two-pass search strategy for large list recognition on embedded speech recognition platforms

Miroslav Novak; Radek Hampl; Pavel Krbec; Vladimir Bergl; Jan Sedivy

This paper presents an efficient algorithm for a speech recognition system which can process large lists of items. The described two-pass search implementation focuses on maximizing the speed and minimizing the memory footprint of the search engine. The algorithm is designed to handle thousands or tens of thousands of words in a search space restricted by a grammar. Typical examples of such tasks are stock name recognition, street name finding, and song selection. The intended application of this algorithm is in embedded ASR systems in portable devices (e.g. iPAQ) or cars.


International Conference on Acoustics, Speech, and Signal Processing | 2001

Speech recognition for DARPA Communicator

Andrew Aaron; Scott Saobing Chen; Paul S. Cohen; Satya Dharanipragada; Ellen Eide; Martin Franz; Jean-Michel LeRoux; Xiaoqiang Luo; Benoît Maison; Lidia Mangu; T. Mathes; Miroslav Novak; Peder A. Olsen; Michael Picheny; Harry Printz; Bhuvana Ramabhadran; Andrej Sakrajda; George Saon; Borivoj Tydlitát; Karthik Visweswariah; D. Yuk

We report the results of investigations in acoustic modeling, language modeling and decoding techniques, for the DARPA Communicator, a speaker-independent, telephone-based dialog system. By a combination of methods, including enlarging the acoustic model, augmenting the recognizer vocabulary, conditioning the language model upon the dialog state, and applying a post-processing decoding method, we lowered the overall word error rate from 21.9% to 15.0%, a gain of 6.9% absolute and 31.5% relative.
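The absolute and relative gains quoted in this abstract follow from simple arithmetic on the two word error rates. A minimal sketch of that calculation is given here; the function name `error_rate_reduction` is hypothetical and is not taken from the paper.

```python
from typing import Tuple


def error_rate_reduction(baseline: float, improved: float) -> Tuple[float, float]:
    """Return (absolute, relative) reduction, both expressed in percent.

    `baseline` and `improved` are word error rates in percent.
    The function name and signature are illustrative assumptions.
    """
    absolute = baseline - improved            # difference in percentage points
    relative = 100.0 * absolute / baseline    # share of the baseline error removed
    return absolute, relative


# Word error rates reported in the abstract: 21.9% before, 15.0% after.
absolute, relative = error_rate_reduction(21.9, 15.0)
print(f"absolute: {absolute:.1f} points, relative: {relative:.1f}%")
```

The relative figure is the absolute gain divided by the baseline error rate, which is why a 6.9-point drop from 21.9% corresponds to roughly a 31.5% relative reduction.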


IEEE Automatic Speech Recognition and Understanding Workshop | 2001

Improvement of non-negative matrix factorization based language model using exponential models

Miroslav Novak; Richard J. Mammone

This paper describes the use of exponential models to improve non-negative matrix factorization (NMF) based topic language models for automatic speech recognition. This modeling technique borrows the basic idea from latent semantic analysis (LSA), which is typically used in information retrieval. An improvement was achieved when exponential models were used to estimate the a posteriori topic probabilities for an observed history. This method improved the perplexity of the NMF model, resulting in a 24% perplexity improvement overall when compared to a trigram language model.


Text, Speech and Dialogue | 2010

Evolution of the ASR decoder design

Miroslav Novak

The ASR decoder is one of the fundamental components of an ASR system and has been evolving over the years to address the increasing demands for larger domains as well as the availability of more powerful hardware. Though the basic search algorithm (i.e. Viterbi search) is relatively simple, implementing a decoder which can handle hundreds of thousands of words in the active vocabulary and hundreds of millions of n-grams in the language model in real time is no simple task. With the emergence of embedded platforms, some of the design concepts used in the past to cope with the limitations of the available hardware can become relevant again, since the limitations of these platforms are similar to those of workstations in the early days of ASR. In this paper we describe basic design concepts encountered in various decoder implementations, with a focus on those which remain relevant today across the fairly large spectrum of available hardware platforms.
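To illustrate why the abstract calls the basic Viterbi search relatively simple, a minimal textbook sketch over a toy hidden Markov model is given here. It is not the decoder design described in the paper: the state names, observation symbols, and probabilities are invented for illustration, and a real ASR decoder adds pruning, lexicon and language model handling on top of this recursion.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return (best_state_path, log_probability) for a discrete HMM.

    All model parameters are log-probabilities stored in nested dicts.
    """
    # best[s]: log-probability of the best path that ends in state s
    best = {s: log_start[s] + log_emit[s][observations[0]] for s in states}
    backpointers = []
    for obs in observations[1:]:
        step, pointers = {}, {}
        for s in states:
            # Pick the predecessor that maximizes the path score into s.
            prev = max(states, key=lambda p: best[p] + log_trans[p][s])
            step[s] = best[prev] + log_trans[prev][s] + log_emit[s][obs]
            pointers[s] = prev
        best = step
        backpointers.append(pointers)
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


def to_log(table):
    """Convert a (possibly nested) dict of probabilities to log-probabilities."""
    return {k: to_log(v) if isinstance(v, dict) else math.log(v)
            for k, v in table.items()}


# Toy model (all values are illustrative assumptions).
states = ["a", "b"]
log_start = to_log({"a": 0.6, "b": 0.4})
log_trans = to_log({"a": {"a": 0.7, "b": 0.3}, "b": {"a": 0.4, "b": 0.6}})
log_emit = to_log({"a": {"x": 0.5, "y": 0.4, "z": 0.1},
                   "b": {"x": 0.1, "y": 0.3, "z": 0.6}})

path, score = viterbi(["x", "y", "z"], states, log_start, log_trans, log_emit)
print(path, math.exp(score))
```

Working in log-probabilities avoids numerical underflow on long observation sequences, which is one reason practical decoders use this form.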


Journal of the Acoustical Society of America | 2000

Method and apparatus for time-synchronized translation and synthesis of natural-language speech

Raimo Bakis; Mark E. Epstein; William Stuart Meisel; Miroslav Novak; Michael Picheny; Ridley M. Whitaker


Archive | 1998

Non-leaf node penalty score assignment system and method for improving acoustic fast match speed in large vocabulary systems

Miroslav Novak; Michael Picheny


Journal of the Acoustical Society of America | 1998

Method for reducing search complexity in a speech recognition system

Martin Franz; Miroslav Novak
