Jan Robin Rohlicek | Researchain

Publications

Featured research published by Jan Robin Rohlicek.

IEEE Transactions on Speech and Audio Processing | 1993

ML estimation of a stochastic linear system with the EM algorithm and its application to speech recognition

Vassilios Digalakis; Jan Robin Rohlicek; Mari Ostendorf

A nontraditional approach to the problem of estimating the parameters of a stochastic linear system is presented. The method is based on the expectation-maximization algorithm and can be considered as the continuous analog of the Baum-Welch estimation algorithm for hidden Markov models. The algorithm is used for training the parameters of a dynamical system model that is proposed for better representing the spectral dynamics of speech for recognition. It is assumed that the observed feature vectors of a phone segment are the output of a stochastic linear dynamical system, and it is shown how the evolution of the dynamics as a function of the segment length can be modeled using alternative assumptions. A phoneme classification task using the TIMIT database demonstrates that the approach is the first effective use of an explicit model for statistical dependence between frames of speech.
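
As a rough illustration of EM for a stochastic linear system, here is a minimal sketch that assumes a scalar, time-invariant state: x[t+1] = a*x[t] + w and y[t] = c*x[t] + v, with Gaussian noise variances q and r and an initial state distributed as N(mu0, p0). The E-step is a Kalman filter followed by a Rauch-Tung-Striebel smoother, and the M-step uses the standard closed-form updates. The paper works with multivariate feature vectors and segment-length-dependent dynamics, so this is not its implementation. All function names and the synthetic data are hypothetical.

```python
import math
import random


def kalman_smoother(y, a, q, c, r, mu0, p0):
    """E-step: Kalman filter plus RTS smoother for a scalar linear dynamical system."""
    T = len(y)
    m_f, p_f, p_pred = [0.0] * T, [0.0] * T, [0.0] * T
    loglik = 0.0
    m_prior, p_prior = mu0, p0
    for t in range(T):
        if t > 0:                                  # predict x[t] from x[t-1]
            m_prior = a * m_f[t - 1]
            p_prior = a * a * p_f[t - 1] + q
        p_pred[t] = p_prior
        s = c * c * p_prior + r                    # innovation variance
        e = y[t] - c * m_prior                     # innovation
        loglik -= 0.5 * (math.log(2 * math.pi * s) + e * e / s)
        k = p_prior * c / s                        # Kalman gain
        m_f[t] = m_prior + k * e
        p_f[t] = (1 - k * c) * p_prior
    m_s, p_s, p_cross = m_f[:], p_f[:], [0.0] * (T - 1)
    for t in range(T - 2, -1, -1):                 # backward (smoothing) pass
        j = p_f[t] * a / p_pred[t + 1]             # smoother gain
        m_s[t] = m_f[t] + j * (m_s[t + 1] - a * m_f[t])
        p_s[t] = p_f[t] + j * j * (p_s[t + 1] - p_pred[t + 1])
        p_cross[t] = j * p_s[t + 1]                # Cov(x[t+1], x[t] | all y)
    return m_s, p_s, p_cross, loglik


def m_step(y, m_s, p_s, p_cross):
    """M-step: closed-form maximizers of the expected complete-data log-likelihood."""
    T = len(y)
    exx = [p + m * m for p, m in zip(p_s, m_s)]                        # E[x_t^2]
    exx1 = [p_cross[t] + m_s[t + 1] * m_s[t] for t in range(T - 1)]    # E[x_{t+1} x_t]
    a = sum(exx1) / sum(exx[:-1])
    q = sum(exx[t + 1] - a * exx1[t] for t in range(T - 1)) / (T - 1)
    c = sum(yt * m for yt, m in zip(y, m_s)) / sum(exx)
    r = sum(yt * yt - c * yt * m for yt, m in zip(y, m_s)) / T
    return a, q, c, r, m_s[0], p_s[0]


def fit(y, params, iters=20):
    """Alternate E- and M-steps, recording the log-likelihood before each update."""
    history = []
    for _ in range(iters):
        m_s, p_s, p_cross, ll = kalman_smoother(y, *params)
        history.append(ll)
        params = m_step(y, m_s, p_s, p_cross)
    return params, history


# Hypothetical synthetic data from a known system (a=0.9, q=0.25, c=2.0, r=0.09).
random.seed(0)
x, y = 0.0, []
for _ in range(200):
    x = 0.9 * x + random.gauss(0.0, 0.5)
    y.append(2.0 * x + random.gauss(0.0, 0.3))

params, history = fit(y, (0.5, 1.0, 1.0, 1.0, 0.0, 1.0))
```

As with Baum-Welch for HMMs, no iteration can decrease the data log-likelihood. The recorded `history` makes that property easy to check.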

human language technology | 1991

Integration of diverse recognition methodologies through reevaluation of N-best sentence hypotheses

Mari Ostendorf; Ashvin Kannan; Steve Austin; Owen Kimball; Richard M. Schwartz; Jan Robin Rohlicek

This paper describes a general formalism for integrating two or more speech recognition technologies, which could be developed at different research sites using different recognition strategies. In this formalism, one system uses the N-best search strategy to generate a list of candidate sentences; the list is rescored by other systems; and the different scores are combined to optimize performance. Specifically, we report on combining the BU system based on stochastic segment models and the BBN system based on hidden Markov models. In addition to facilitating integration of different systems, the N-best approach results in a large reduction in computation for word recognition using the stochastic segment model.
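
The rescore-and-combine step can be sketched as follows. The sketch assumes that each N-best hypothesis carries one log-domain score per system, that the scores are combined linearly, and that the combination weight is tuned by grid search to minimize word errors on development data. The linear combination, the grid search, the names and the toy scores are all illustrative assumptions, not the paper's reported settings.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    words: tuple   # candidate word sequence from the N-best list
    scores: dict   # log-domain score from each system, e.g. {"hmm": ..., "ssm": ...}


def combined_score(hyp, weights):
    """Weighted sum of the per-system log scores."""
    return sum(w * hyp.scores[name] for name, w in weights.items())


def rescore(nbest, weights):
    """Re-rank an N-best list, best combined score first."""
    return sorted(nbest, key=lambda h: combined_score(h, weights), reverse=True)


def word_errors(hyp, ref):
    """Levenshtein distance between two word sequences."""
    prev = list(range(len(ref) + 1))
    for i, hw in enumerate(hyp, 1):
        cur = [i]
        for j, rw in enumerate(ref, 1):
            cur.append(min(prev[j] + 1,                # extra hypothesis word (insertion)
                           cur[j - 1] + 1,             # missing reference word (deletion)
                           prev[j - 1] + (hw != rw)))  # substitution or match
        prev = cur
    return prev[-1]


def tune_weight(dev, grid, source="ssm"):
    """Pick the weight on `source` (other weights fixed at 1.0) minimizing dev-set word errors."""
    def errors(w):
        total = 0
        for nbest, ref in dev:
            weights = {name: 1.0 for name in nbest[0].scores}
            weights[source] = w
            total += word_errors(rescore(nbest, weights)[0].words, ref)
        return total
    return min(grid, key=errors)


# Hypothetical toy N-best list with made-up HMM and segment-model scores.
nbest = [
    Hypothesis(("show", "me", "flights"), {"hmm": -100.0, "ssm": -130.0}),
    Hypothesis(("show", "the", "flights"), {"hmm": -102.0, "ssm": -120.0}),
]
best_w = tune_weight([(nbest, ("show", "the", "flights"))], [0.0, 0.25, 0.5, 1.0])
```

With the segment-model weight at 0.0, the HMM's first choice wins. Once the segment-model score is given some weight, the second hypothesis is promoted, so the grid search returns 0.25.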

international conference on acoustics, speech, and signal processing | 1994

Approaches to topic identification on the switchboard corpus

John W. McDonough; Kenney Ng; Philippe Jeanrenaud; Herbert Gish; Jan Robin Rohlicek

Topic identification (TID) is the automatic classification of speech messages into one of a known set of possible topics. The TID task can be viewed as having three principal components: 1) event generation, 2) keyword event selection, and 3) topic modeling. Using data from the Switchboard corpus, the authors present experimental results for various approaches to the TID problem and compare the relative effectiveness of each. In addition, they examine the effect of keyword set size on identification accuracy and gauge the loss in performance when mismatched topic modeling and keyword selection schemes are used.
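
For the topic-modeling component, one simple option is a multinomial (naive Bayes) classifier over keyword counts. The sketch assumes that event generation has already produced a list of detected words for each message and that the keyword set is given. The classifier, the add-one smoothing, the function names and the toy data are assumptions for illustration, not the specific models compared in the paper.

```python
import math
from collections import Counter, defaultdict


def train(docs, keywords):
    """docs: list of (topic, detected words). Only keyword events are counted."""
    counts = defaultdict(Counter)
    priors = Counter()
    for topic, words in docs:
        priors[topic] += 1
        counts[topic].update(w for w in words if w in keywords)
    return counts, priors


def classify(words, counts, priors, keywords):
    """Return the topic maximizing log P(topic) + sum of log P(keyword | topic)."""
    vocab = len(keywords)
    total_docs = sum(priors.values())

    def score(topic):
        n = sum(counts[topic].values())
        s = math.log(priors[topic] / total_docs)
        for w in words:
            if w in keywords:  # non-keyword events are ignored
                s += math.log((counts[topic][w] + 1) / (n + vocab))  # add-one smoothing
        return s

    return max(priors, key=score)


# Hypothetical toy keyword set and training messages.
keywords = {"game", "team", "rain", "cold"}
docs = [("sports", ["the", "game", "team", "game"]),
        ("weather", ["rain", "cold", "the", "rain"])]
counts, priors = train(docs, keywords)
```

Changing the keyword set changes which events reach the topic model at all. This is one way to frame the paper's question of how keyword set size and mismatched selection schemes affect accuracy.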

IEEE Transactions on Speech and Audio Processing | 1994

Maximum likelihood clustering of Gaussians for speech recognition

Ashvin Kannan; Mari Ostendorf; Jan Robin Rohlicek

This paper describes a method for clustering multivariate Gaussian distributions using a maximum likelihood criterion. The authors point out possible applications of model clustering and then use the approach to determine classes of shared covariances for context modeling in speech recognition, achieving an order-of-magnitude reduction in the number of covariance parameters with no loss in recognition performance.
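
Sharing covariances among Gaussians that keep their own means can be sketched with a likelihood-loss criterion. Suppose each Gaussian i was estimated by maximum likelihood from n_i frames. Replacing the individual covariances with their pooled covariance then lowers the training log-likelihood by (n/2)log|S_pooled| minus the sum over i of (n_i/2)log|S_i|, where n is the total frame count. This loss is never negative. The sketch assumes diagonal covariances and a greedy bottom-up merge order. The paper's exact clustering procedure may differ, and the names and statistics below are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class GaussStats:
    count: float   # number of training frames behind this Gaussian
    var: tuple     # diagonal covariance (per-dimension variances)


def log_det(var):
    return sum(math.log(v) for v in var)


def pool(a, b):
    """ML shared diagonal covariance of two Gaussians whose means stay separate."""
    n = a.count + b.count
    return GaussStats(n, tuple((a.count * va + b.count * vb) / n
                               for va, vb in zip(a.var, b.var)))


def merge_loss(a, b):
    """Drop in training log-likelihood when a and b share one covariance (>= 0)."""
    m = pool(a, b)
    return 0.5 * (m.count * log_det(m.var)
                  - a.count * log_det(a.var) - b.count * log_det(b.var))


def cluster(stats, k):
    """Greedily merge the pair with the smallest likelihood loss until k classes remain."""
    clusters = [([i], s) for i, s in enumerate(stats)]
    while len(clusters) > k:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: merge_loss(clusters[p[0]][1], clusters[p[1]][1]))
        merged = (clusters[i][0] + clusters[j][0], pool(clusters[i][1], clusters[j][1]))
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return sorted(sorted(members) for members, _ in clusters)


# Hypothetical statistics: the first two covariances are similar, the third is not.
stats = [GaussStats(100, (1.0, 2.0)), GaussStats(80, (1.1, 2.1)), GaussStats(50, (9.0, 0.5))]
```

Each merge removes one covariance from the parameter count. Stopping once the loss exceeds a threshold, rather than at a fixed k, is an equally plausible variant.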

international conference on acoustics, speech, and signal processing | 1993

Phonetic training and language modeling for word spotting

Jan Robin Rohlicek; Philippe Jeanrenaud; Kenney Ng; Herbert Gish; B. Musicus; Man-Hung Siu

The authors present a view of HMM (hidden Markov model)-based word spotting systems as described by three main components: the HMM acoustic model; the overall HMM structure, including nonkeyword modeling; and the keyword scoring method. They investigate and present comparative results for various approaches to each of these components and show that design choices for these components can be addressed separately. They also present a novel approach to word spotting that combines phonetic training, large vocabulary modeling, and statistical language modeling with a posterior probability approach to keyword scoring. They perform word spotting experiments using telephone-quality conversational speech from the Switchboard corpus to examine the effect of different design choices for the three components and demonstrate that the proposed approach provides superior performance to previously used techniques.
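
A posterior-probability keyword score can be illustrated with the forward-backward algorithm. Here the score at frame t is taken to be the posterior probability that the keyword's final state is occupied at t, given the whole utterance. This toy sketch assumes a discrete-observation HMM with a single filler state and a two-state keyword model. The paper's system instead uses phonetically trained models with large-vocabulary and language-model background modeling, so the model, symbols and probabilities below are hypothetical.

```python
def forward_backward(obs, pi, A, B):
    """State posteriors gamma[t][s] = P(state s at frame t | obs) for a discrete HMM."""
    S, T = len(pi), len(obs)
    alpha = [[0.0] * S for _ in range(T)]
    beta = [[1.0] * S for _ in range(T)]
    for s in range(S):
        alpha[0][s] = pi[s] * B[s][obs[0]]
    for t in range(1, T):
        for s in range(S):
            alpha[t][s] = sum(alpha[t - 1][r] * A[r][s] for r in range(S)) * B[s][obs[t]]
    for t in range(T - 2, -1, -1):
        for s in range(S):
            beta[t][s] = sum(A[s][r] * B[r][obs[t + 1]] * beta[t + 1][r] for r in range(S))
    total = sum(alpha[T - 1])  # P(obs)
    return [[alpha[t][s] * beta[t][s] / total for s in range(S)] for t in range(T)]


# Hypothetical model: state 0 = filler, states 1-2 = keyword (state 2 is its end state).
pi = [1.0, 0.0, 0.0]
A = [[0.8, 0.2, 0.0],
     [0.0, 0.0, 1.0],
     [1.0, 0.0, 0.0]]
B = [{"a": 0.1, "b": 0.1, "x": 0.8},
     {"a": 0.9, "b": 0.05, "x": 0.05},
     {"a": 0.05, "b": 0.9, "x": 0.05}]
obs = ["x", "x", "a", "b", "x"]
gamma = forward_backward(obs, pi, A, B)
keyword_score = [g[2] for g in gamma]  # per-frame keyword end-state posterior
```

A keyword hypothesis would then be reported wherever `keyword_score` peaks above a threshold. In this toy utterance that happens at frame 3, where the "a b" pattern ends.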

IEEE Transactions on Signal Processing | 1992

Fast algorithms for phone classification and recognition using segment-based models

Vassilios Digalakis; Mari Ostendorf; Jan Robin Rohlicek

Methods for reducing the computation requirements of joint segmentation and recognition of phones using the stochastic segment model are presented. The approach uses a fast segment classification method that reduces computation by a factor of two to four, depending on the confidence of choosing the most probable model. A split-and-merge segmentation algorithm is proposed as an alternative to the typical dynamic programming solution of the segmentation and recognition problem, with computation savings increasing proportionally with model complexity. Although the current recognizer uses context-independent phone models, the results reported for the TIMIT database for speaker-independent joint segmentation and recognition are comparable to those of systems that use context information.
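
The contrast between dynamic programming and split-and-merge segmentation can be sketched on a toy 1-D signal. The abstract does not give the segment scoring or the exact move set. This sketch therefore assumes a segment cost equal to the squared error around the segment mean plus a fixed per-segment penalty, and a local search that applies the single best split or merge until no move lowers the total cost. For clarity it evaluates every candidate move exhaustively, so it illustrates the search structure rather than the computational savings reported in the paper. All names and data are hypothetical.

```python
import math


def seg_cost(x, i, j, penalty):
    """Cost of modelling frames x[i:j] as one segment: squared error plus a penalty."""
    seg = x[i:j]
    mean = sum(seg) / len(seg)
    return sum((v - mean) ** 2 for v in seg) + penalty


def total_cost(x, segs, penalty):
    return sum(seg_cost(x, i, j, penalty) for i, j in segs)


def dp_segment(x, penalty):
    """Exhaustive dynamic-programming segmentation (the baseline)."""
    n = len(x)
    best = [0.0] + [math.inf] * n
    back = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            c = best[i] + seg_cost(x, i, j, penalty)
            if c < best[j]:
                best[j], back[j] = c, i
    segs, j = [], n
    while j > 0:
        segs.append((back[j], j))
        j = back[j]
    return segs[::-1]


def split_and_merge(x, penalty):
    """Local search: apply the best single split or merge until no move lowers the cost."""
    segs = [(0, len(x))]
    while True:
        candidates = []
        for k, (i, j) in enumerate(segs):             # split segment k at frame s
            for s in range(i + 1, j):
                candidates.append(segs[:k] + [(i, s), (s, j)] + segs[k + 1:])
        for k in range(len(segs) - 1):                # merge segments k and k+1
            candidates.append(segs[:k] + [(segs[k][0], segs[k + 1][1])] + segs[k + 2:])
        best = min(candidates, key=lambda c: total_cost(x, c, penalty), default=None)
        if best is None or total_cost(x, best, penalty) >= total_cost(x, segs, penalty):
            return segs
        segs = best


# Hypothetical piecewise-constant "feature track" with three segments.
x = [0, 0, 0, 0, 5, 5, 5, 5, 1, 1, 1, 1]
```

On this clean example both searches find the same three segments. In general the local search can stop at a local optimum that dynamic programming would avoid.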

international conference on acoustics, speech, and signal processing | 1992

Gisting conversational speech

Jan Robin Rohlicek; D. Ayuso; M. Bates; Robert J. Bobrow; Albert Boulanger; Herbert Gish; Philippe Jeanrenaud; Marie Meteer; Man-Hung Siu

A novel system for extracting information from stereotyped voice traffic is described. Off-the-air recordings of commercial air traffic control communications are interpreted in order to identify the flights present and determine the scenario (e.g., takeoff, landing) that they are following. The system combines algorithms from signal segmentation, speaker segregation, speech recognition, natural language parsing, and topic classification into a single system. Initial evaluation of the algorithm on data recorded at Dallas-Fort Worth airport yields performance of 68% detection of flights with 98% precision at an operating point where 76% of the flight identifications are correctly recognized. In tower recordings containing both takeoff and landing scenarios, flights are correctly classified as takeoff or landing 94% of the time.

international conference on acoustics, speech, and signal processing | 1995

Lattice-based search strategies for large vocabulary speech recognition

F. Richardson; Mari Ostendorf; Jan Robin Rohlicek

The design of search algorithms is an important issue in large vocabulary speech recognition, especially as more complex models are developed for improving recognition accuracy. Multi-pass search strategies have been used as a means of applying simple models early on to prune the search space for subsequent passes using more expensive knowledge sources. The pruned search space is typically represented by an N-best sentence list or a word lattice. Here, we investigate three alternatives for lattice search: N-best rescoring, a lattice dynamic programming search algorithm and a lattice local search algorithm. Both the lattice dynamic programming and lattice local search algorithms are shown to achieve comparable performance to the N-best search algorithm while running as much as 10 times faster on a 20k-word lexicon; the local search algorithm has the additional advantage of accommodating sentence-level knowledge sources.
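
A lattice dynamic programming search can be sketched as a Viterbi pass over the lattice arcs. Each search state is a (lattice node, previous word) pair, so that a bigram language model can be applied. The sketch assumes that nodes are numbered in topological order starting at node 0, that arcs carry acoustic log scores, and that unseen bigrams receive a fixed backoff score. The function name, the lattice and all scores are hypothetical toy values.

```python
from collections import defaultdict


def lattice_viterbi(arcs, final, bigram, lm_weight=1.0, backoff=-5.0):
    """Best path through a word lattice with bigram LM scores.

    arcs: list of (src_node, dst_node, word, acoustic_log_score); nodes are
    assumed to be numbered in topological order, with node 0 as the start.
    """
    outgoing = defaultdict(list)
    for arc in arcs:
        outgoing[arc[0]].append(arc)
    start = (0, "<s>")
    best = {start: (0.0, None)}        # (node, last word) -> (score, previous state)
    for node in sorted(outgoing):      # all predecessors of `node` are already finished
        for state, (score, _) in list(best.items()):
            if state[0] != node:
                continue
            for _, dst, word, acoustic in outgoing[node]:
                lm = bigram.get((state[1], word), backoff)
                new = score + acoustic + lm_weight * lm
                key = (dst, word)
                if key not in best or new > best[key][0]:
                    best[key] = (new, state)
    score, state = max((v[0], k) for k, v in best.items() if k[0] == final)
    words = []
    while state != start:              # follow backpointers to recover the word string
        words.append(state[1])
        state = best[state][1]
    return score, words[::-1]


# Hypothetical toy lattice and bigram log-probabilities.
arcs = [(0, 1, "the", -1.0), (0, 1, "a", -0.9),
        (1, 2, "cat", -2.0), (1, 2, "cap", -1.8),
        (2, 3, "sat", -1.0)]
bigram = {("<s>", "the"): -0.5, ("<s>", "a"): -1.5, ("the", "cat"): -0.7,
          ("the", "cap"): -3.0, ("a", "cat"): -1.0, ("a", "cap"): -2.0,
          ("cat", "sat"): -0.3, ("cap", "sat"): -2.5}
```

The dynamic program is exact here only because a bigram score decomposes over adjacent words. A sentence-level knowledge source does not decompose this way, which is the case the abstract says the local search algorithm can accommodate.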

international conference on acoustics, speech, and signal processing | 1993

Gisting conversational speech in real time

L. Denenberg; Herbert Gish; Marie Meteer; T. Miller; Jan Robin Rohlicek; W. Sadkin; Man-Hung Siu

The authors describe additions and modifications to a prototype system for analyzing air traffic control (ATC) communication. The primary goal of the effort was to achieve real-time performance. This involved both system architectural and algorithmic modifications. The task of the system is to extract the gist of activity as it is monitored. In the ATC domain this involves identifying those flights that are present and classifying each flight as a departure, arrival, or other. The system combines a variety of techniques from speaker identification, speech recognition, natural-language processing, and artificial intelligence. Continuous processing versions of the algorithms have been constructed and it has been demonstrated that real-time performance is possible by distributing the processing over a small number of workstations. To accomplish this task, a flexible software task-construction tool that allows simple specification of complex systems, supporting both dataflow and client/server models, has been developed.
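
The dataflow model mentioned above can be illustrated with a small pipeline in which each processing stage runs concurrently and passes results downstream through FIFO queues. The abstract does not describe the task-construction tool's interface. This is therefore a generic sketch built on Python's standard `queue` and `threading` modules, with hypothetical string-processing stand-ins for the segmentation, recognition and flight-classification stages. Python threads show the structure but not the multi-workstation speedup the paper reports.

```python
import queue
import threading

_END = object()  # end-of-stream marker passed down the pipeline


def start_stage(fn, inbox, outbox):
    """Run fn on every item from inbox in its own thread, forwarding results to outbox."""
    def worker():
        while True:
            item = inbox.get()
            if item is _END:
                outbox.put(_END)
                return
            outbox.put(fn(item))
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


def run_pipeline(stages, items):
    """Connect stages with FIFO queues, feed items in, and collect outputs in order."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [start_stage(fn, queues[k], queues[k + 1]) for k, fn in enumerate(stages)]
    for item in items:
        queues[0].put(item)
    queues[0].put(_END)
    results = []
    while (out := queues[-1].get()) is not _END:
        results.append(out)
    for thread in threads:
        thread.join()
    return results


# Hypothetical stand-ins for segmentation, recognition and flight classification.
stages = [
    str.strip,
    str.lower,
    lambda text: "departure" if "takeoff" in text else "arrival",
]
results = run_pipeline(stages, ["  Cleared for TAKEOFF ", "Cleared to land "])
```

Because every stage is a single worker reading a FIFO queue, outputs keep their input order. A client/server variant would instead let several pipelines send requests to a shared service stage.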

human language technology | 1989

Improvements in the stochastic segment model for phoneme recognition

Vassilios Digalakis; Mari Ostendorf; Jan Robin Rohlicek

The heart of a speech recognition system is the acoustic model of sub-word units (e.g., phonemes). In this work we discuss refinements of the stochastic segment model, an alternative to hidden Markov models for representation of the acoustic variability of phonemes. We concentrate on mechanisms for better modelling time correlation of features across an entire segment. Results are presented for speaker-independent phoneme classification in continuous speech based on the TIMIT database.
