Publication


Featured research published by Martin Franz.


International Conference on Acoustics, Speech, and Signal Processing | 1995

Performance of the IBM large vocabulary continuous speech recognition system on the ARPA Wall Street Journal task

Lalit R. Bahl; S. Balakrishnan-Aiyer; J.R. Bellegarda; Martin Franz; Ponani S. Gopalakrishnan; David Nahamoo; Miroslav Novak; Mukund Padmanabhan; Michael Picheny; Salim Roukos

In this paper we discuss various experimental results using our continuous speech recognition system on the Wall Street Journal task. Experiments with different feature extraction methods, varying amounts and types of training data, and different vocabulary sizes are reported.


Journal of the Acoustical Society of America | 2002

Phrase splicing and variable substitution using a trainable speech synthesizer

Robert E. Donovan; Martin Franz; Salim Roukos; Jeffrey S. Sorensen

In accordance with the present invention, a method for providing generation of speech includes the steps of providing input to be acoustically produced, comparing the input to training data or application specific splice files to identify one of words and word sequences corresponding to the input for constructing a phone sequence, using a search algorithm to identify a segment sequence to construct output speech according to the phone sequence, and concatenating segments and modifying characteristics of the segments to be substantially equal to requested characteristics. Application specific data is advantageously used to make pertinent information available to synthesize both the phone sequence and the output speech. Also described is a system for performing operations in accordance with the disclosure.


IEEE Transactions on Speech and Audio Processing | 2004

Automatic recognition of spontaneous speech for access to multilingual oral history archives

William Byrne; David S. Doermann; Martin Franz; Samuel Gustman; Jan Hajic; Douglas W. Oard; Michael Picheny; Josef Psutka; Bhuvana Ramabhadran; Dagobert Soergel; Todd Ward; Wei-Jing Zhu

Much is known about the design of automated systems to search broadcast news, but it has only recently become possible to apply similar techniques to large collections of spontaneous speech. This paper presents initial results from experiments with speech recognition, topic segmentation, topic categorization, and named entity detection using a large collection of recorded oral histories. The work leverages a massive manual annotation effort on 10 000 h of spontaneous speech to evaluate the degree to which automatic speech recognition (ASR)-based segmentation and categorization techniques can be adapted to approximate decisions made by human annotators. ASR word error rates near 40% were achieved for both English and Czech for heavily accented, emotional and elderly spontaneous speech based on 65-84 h of transcribed speech. Topical segmentation based on shifts in the recognized English vocabulary resulted in 80% agreement with manually annotated boundary positions at a 0.35 false alarm rate. Categorization was considerably more challenging, with a nearest-neighbor technique yielding F=0.3. This is less than half the value obtained by the same technique on a standard newswire categorization benchmark, but replication on human-transcribed interviews showed that ASR errors explain little of that difference. The paper concludes with a description of how these capabilities could be used together to search large collections of recorded oral histories.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2001

Unsupervised and supervised clustering for topic tracking

Martin Franz; Todd Ward; J. Scott McCarley; Wei-Jing Zhu

We investigate important differences between two styles of document clustering in the context of Topic Detection and Tracking. Converting a Topic Detection system into a Topic Tracking system exposes fundamental differences between these two tasks that are important to consider in both the design and the evaluation of TDT systems. We also identify features that can be used in systems for both tasks.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2004

Building an information retrieval test collection for spontaneous conversational speech

Douglas W. Oard; Dagobert Soergel; David S. Doermann; Xiaoli Huang; G. Craig Murray; Jianqiang Wang; Bhuvana Ramabhadran; Martin Franz; Samuel Gustman; James Mayfield; Liliya Kharevych; Stephanie M. Strassel

Test collections model use cases in ways that facilitate evaluation of information retrieval systems. This paper describes the use of search-guided relevance assessment to create a test collection for retrieval of spontaneous conversational speech. Approximately 10,000 thematically coherent segments were manually identified in 625 hours of oral history interviews with 246 individuals. Automatic speech recognition results, manually prepared summaries, controlled vocabulary indexing, and name authority control are available for every segment. Those features were leveraged by a team of four relevance assessors to identify topically relevant segments for 28 topics developed from actual user requests. Search-guided assessment yielded sufficient inter-annotator agreement to support formative evaluation during system development. Baseline results for ranked retrieval are presented to illustrate use of the collection.


North American Chapter of the Association for Computational Linguistics | 2001

Question answering using maximum entropy components

Abraham Ittycheriah; Martin Franz; Wei-Jing Zhu; Adwait Ratnaparkhi; Richard J. Mammone

We describe in detail a statistical question answering system developed for TREC-9. The system applies maximum entropy classification to question/answer type prediction and named entity marking. We describe our information retrieval system, which first retrieved documents from a local encyclopedia, then expanded the query words, and finally retrieved passages from the TREC collection. We also discuss the answer selection algorithm, which determines the best sentence given both the question and the occurrence of a phrase belonging to the answer class desired by the question. A new method of analyzing system performance via a transition matrix is shown.


International Conference on Acoustics, Speech, and Signal Processing | 1999

Phrase splicing and variable substitution using the IBM trainable speech synthesis system

Robert E. Donovan; Martin Franz; Jeffrey S. Sorensen; Salim Roukos

This paper describes a phrase splicing and variable substitution system which offers an intermediate form of automated speech production lying in-between the extremes of recorded utterance playback and full text-to-speech synthesis. The system incorporates a trainable speech synthesiser and an application specific set of pre-recorded phrases. The text to be synthesised is converted to a phone sequence using phone sequences present in the pre-recorded phrases wherever possible, and a pronunciation dictionary elsewhere. The synthesis inventory of the synthesiser is augmented with the synthesis information associated with the pre-recorded phrases used to construct the phone sequence. The synthesiser then performs a dynamic programming search over the augmented inventory to select a segment sequence to produce the output speech. The system enables the seamless splicing of pre-recorded phrases both with other phrases and with synthetic speech. It enables very high quality speech to be produced automatically within a limited domain.
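The dynamic programming search mentioned in this abstract can be illustrated with a minimal sketch. This is not the IBM system's algorithm: the cost functions, the segment representation as `(source, index)` tuples, and the name `select_segments` are all assumptions made for illustration. The sketch only shows the general idea of choosing one candidate segment per phone position so that the summed target and join costs are minimal, with contiguous segments from a pre-recorded phrase joining at no cost.

```python
def select_segments(candidates, target_cost, join_cost):
    """Pick one segment per position, minimizing target + join costs.

    candidates: non-empty list; candidates[i] is the list of candidate
    segments for phone position i (hypothetical representation).
    Returns (total_cost, chosen_segments).
    """
    # best[i][j] = (cost of the cheapest path ending in candidates[i][j],
    #               index of the predecessor in candidates[i - 1])
    best = [[(target_cost(0, seg), None) for seg in candidates[0]]]
    for i in range(1, len(candidates)):
        row = []
        for seg in candidates[i]:
            cost, back = min(
                (best[i - 1][k][0] + join_cost(prev, seg), k)
                for k, prev in enumerate(candidates[i - 1])
            )
            row.append((cost + target_cost(i, seg), back))
        best.append(row)

    # Trace back from the cheapest final segment.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    path.reverse()
    return total, path


# Toy inventory: each position offers a segment from a pre-recorded phrase
# and a segment from the general synthesis inventory (made-up data).
candidates = [
    [("phrase", 0), ("inventory", 7)],
    [("inventory", 3), ("phrase", 1)],
    [("phrase", 2), ("inventory", 9)],
]

def target_cost(position, segment):
    # Assumption: every candidate matches its target equally well.
    return 0.0

def join_cost(prev, nxt):
    # Assumption: contiguous segments from the same source join for free.
    contiguous = prev[0] == nxt[0] and nxt[1] == prev[1] + 1
    return 0.0 if contiguous else 1.0

total, path = select_segments(candidates, target_cost, join_cost)
print(total, path)
```

With these toy costs the search keeps the three contiguous pre-recorded segments together, which mirrors the abstract's point that phrases from the application-specific set are reused wherever possible.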


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2001

Quantifying the utility of parallel corpora

Martin Franz; J. Scott McCarley; Todd Ward; Wei-Jing Zhu

Our English-Chinese cross-language IR system is trained from parallel corpora; we investigate its performance as a function of training corpus size for three different training corpora. We find that the performance of the system as trained on the three parallel corpora can be related by a simple measure, namely the out-of-vocabulary rate of query words.
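As a rough illustration of the measure named in this abstract, the out-of-vocabulary rate of query words can be computed as the fraction of query words that never occur in the training corpus. The abstract does not say whether the paper counts tokens or distinct types, so the token-level definition below, the function name `oov_rate`, and the sample data are assumptions.

```python
def oov_rate(query_words, training_vocabulary):
    """Fraction of query word tokens absent from the training vocabulary."""
    if not query_words:
        return 0.0
    missing = sum(1 for word in query_words if word not in training_vocabulary)
    return missing / len(query_words)


# Made-up vocabulary and query, for illustration only.
training_vocabulary = {"trade", "bank", "market", "policy"}
query_words = ["bank", "merger", "policy", "antitrust"]

rate = oov_rate(query_words, training_vocabulary)
print(f"OOV rate: {rate:.2f}")
```

Here two of the four query words are missing from the vocabulary, so the rate is 0.50.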


International Conference on Acoustics, Speech, and Signal Processing | 2001

Speech recognition for DARPA Communicator

Andrew Aaron; Scott Shaobing Chen; Paul S. Cohen; Satya Dharanipragada; Ellen Eide; Martin Franz; Jean-Michel LeRoux; Xiaoqiang Luo; Benoît Maison; Lidia Mangu; T. Mathes; Miroslav Novak; Peder A. Olsen; Michael Picheny; Harry Printz; Bhuvana Ramabhadran; Andrzej Sakrajda; George Saon; Borivoj Tydlitát; Karthik Visweswariah; D. Yuk

We report the results of investigations in acoustic modeling, language modeling and decoding techniques, for the DARPA Communicator, a speaker-independent, telephone-based dialog system. By a combination of methods, including enlarging the acoustic model, augmenting the recognizer vocabulary, conditioning the language model upon the dialog state, and applying a post-processing decoding method, we lowered the overall word error rate from 21.9% to 15.0%, a gain of 6.9% absolute and 31.5% relative.
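The absolute and relative gains quoted in this abstract follow from simple arithmetic on the two word error rates. The short check below reproduces them; the function name `wer_reduction` is an illustrative choice, not something taken from the paper.

```python
def wer_reduction(baseline_wer, improved_wer):
    """Return (absolute, relative) word error rate reduction, in percent."""
    absolute = baseline_wer - improved_wer
    relative = 100.0 * absolute / baseline_wer
    return absolute, relative


absolute, relative = wer_reduction(21.9, 15.0)
print(f"absolute: {absolute:.1f} points, relative: {relative:.1f}%")
# prints "absolute: 6.9 points, relative: 31.5%"
```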


Archive | 2002

Segmentation and Detection at IBM

Satya Dharanipragada; Martin Franz; Jeffrey Scott McCarley; Todd Ward; Wei-Jing Zhu

IBM’s story segmentation uses a combination of decision tree and maximum entropy models. They take a variety of lexical, prosodic, semantic, and structural features as their inputs. Both types of models are source-specific, and we substantially lower C_seg by combining them. IBM’s topic detection system introduces a minimal hierarchy into the clustering: each cluster is comprised of one or more microclusters. We investigate the importance of merging microclusters together, and propose a merging strategy which improves our performance.
