Network

External collaborations at the country level.

Hotspot

Research topics in which Alexander I. Rudnicky is active.

Publication

Featured research published by Alexander I. Rudnicky.


international conference on acoustics, speech, and signal processing | 2006

Pocketsphinx: A Free, Real-Time Continuous Speech Recognition System for Hand-Held Devices

David Huggins-Daines; Mohit Kumar; Arthur Chan; Alan W. Black; Mosur Ravishankar; Alexander I. Rudnicky

The availability of real-time continuous speech recognition on mobile and embedded devices has opened up a wide range of research opportunities in human-computer interactive applications. Unfortunately, most of the work in this area to date has been confined to proprietary software, or has focused on limited domains with constrained grammars. In this paper, we present a preliminary case study on the porting and optimization of CMU Sphinx-II, a popular open-source large vocabulary continuous speech recognition (LVCSR) system, to hand-held devices. The resulting system operates at an average of 0.87 times real time on a 206 MHz device, 8.03 times faster than the baseline system. To our knowledge, this is the first hand-held LVCSR system available under an open-source license.
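For context, speed figures like these are real-time factors (xRT): decoding time divided by audio duration, where values below 1.0 mean the recognizer keeps pace with live speech. A minimal sketch of the arithmetic, using only the numbers quoted in the abstract (the 52.2 s / 60 s timing pair is an invented illustration of the same 0.87 ratio):

```python
def real_time_factor(decode_seconds: float, audio_seconds: float) -> float:
    """Real-time factor (xRT): processing time divided by audio duration.
    Values below 1.0 mean the recognizer keeps up with live audio."""
    return decode_seconds / audio_seconds

# Figures from the abstract: the ported system runs at 0.87 xRT on a
# 206 MHz device, 8.03x faster than the baseline, which therefore ran
# at roughly 0.87 * 8.03 ~= 6.99 xRT on the same hardware.
print(real_time_factor(52.2, 60.0))  # 0.87 -> 60 s of audio decoded in 52.2 s
print(0.87 * 8.03)                   # ~6.99 xRT implied for the baseline
```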


international conference on acoustics, speech, and signal processing | 2010

Using the Amazon Mechanical Turk for transcription of spoken language

Matthew Marge; Satanjeev Banerjee; Alexander I. Rudnicky

We investigate whether Amazon's Mechanical Turk (MTurk) service can be used as a reliable method for transcription of spoken language data. Utterances with varying speaker demographics (native and non-native English, male and female) were posted on the MTurk marketplace together with standard transcription guidelines. Transcriptions were compared against transcriptions carefully prepared in-house through conventional (manual) means. We found that transcriptions from MTurk workers were generally quite accurate. Further, when transcripts for the same utterance produced by multiple workers were combined using the ROVER voting scheme, the accuracy of the combined transcript rivaled that observed for conventional transcription methods. We also found that accuracy is not particularly sensitive to payment amount, implying that high-quality results can be obtained at a fraction of the cost and turnaround time of conventional methods.
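As an illustration of the combination step, here is a much simplified, ROVER-style majority vote in Python. It assumes the worker transcripts have already been word-aligned (the real ROVER system builds that alignment itself, via dynamic programming over a word transition network); the example transcripts are invented.

```python
from collections import Counter

def rover_vote(aligned_transcripts: list[list[str]]) -> list[str]:
    """ROVER-style voting over word-aligned transcripts: at each slot,
    keep the word most workers agree on. An empty string marks a
    deletion (a worker heard nothing at that slot)."""
    combined = []
    for slot in zip(*aligned_transcripts):
        word, _count = Counter(slot).most_common(1)[0]
        if word:                       # drop slots where the vote is a deletion
            combined.append(word)
    return combined

hyps = [["the", "cat", "sat", ""],
        ["the", "cat", "sat", "down"],
        ["a",   "cat", "sat", "down"]]
print(" ".join(rover_vote(hyps)))      # -> "the cat sat down"
```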


Computer Speech & Language | 2009

The RavenClaw dialog management framework: Architecture and systems

Dan Bohus; Alexander I. Rudnicky

In this paper, we describe RavenClaw, a plan-based, task-independent dialog management framework. RavenClaw isolates the domain-specific aspects of the dialog control logic from domain-independent conversational skills, and in the process facilitates rapid development of mixed-initiative systems operating in complex, task-oriented domains. System developers can focus exclusively on describing the dialog task control logic, while a large number of domain-independent conversational skills, such as error handling, timing, and turn-taking, are transparently supported and enforced by the RavenClaw dialog engine. To date, RavenClaw has been used to construct and deploy a large number of systems spanning different domains and interaction styles, such as information access, guidance through procedures, command-and-control, and medical diagnosis. The framework has adapted easily to all of these domains, indicating a high degree of versatility and scalability.
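The core idea, a domain-specific task tree executed by a domain-independent engine, can be sketched in a few lines. This is only an illustration of the plan-based structure: the class names, the toy flight-information task, and the simple depth-first traversal are invented here and are not RavenClaw's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class DialogAgent:
    """One node in a RavenClaw-style dialog task tree: internal nodes
    ('agencies') decompose the task, leaves perform a single dialog act."""
    name: str
    prompt: str | None = None                     # leaf: what to say/ask
    children: list["DialogAgent"] = field(default_factory=list)

    def execute(self) -> None:
        if self.prompt is not None:               # leaf agent
            print(f"[{self.name}] {self.prompt}")
        for child in self.children:               # agency: run subtasks
            child.execute()

# Invented, domain-specific task tree for a toy flight-information system.
# In the framework described above, error handling, timing, and turn-taking
# would be supplied by the dialog engine, not encoded in this tree.
task = DialogAgent("FlightInfo", children=[
    DialogAgent("Welcome", prompt="Welcome to the flight information line."),
    DialogAgent("GetQuery", children=[
        DialogAgent("AskOrigin", prompt="Where are you flying from?"),
        DialogAgent("AskDestination", prompt="Where are you flying to?"),
    ]),
    DialogAgent("GiveResults", prompt="Here are the flights I found..."),
])
task.execute()
```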


human language technology | 1993

Multi-site data collection and evaluation in spoken language understanding

Lynette Hirschman; Madeleine Bates; Deborah Dahl; William M. Fisher; John S. Garofolo; David S. Pallett; Kate Hunicke-Smith; Patti Price; Alexander I. Rudnicky; Evelyne Tzoukermann

The Air Travel Information System (ATIS) domain serves as the common task for DARPA spoken language system research and development. The approaches and results possible in this rapidly growing area are structured by available corpora, annotations of that data, and evaluation methods. Coordination of this crucial infrastructure is the charter of the Multi-Site ATIS Data COllection Working group (MADCOW). We focus here on selection of training and test data, evaluation of language understanding, and the continuing search for evaluation methods that will correlate well with expected performance of the technology in applications.


international conference on pattern recognition | 2002

A large scale clustering scheme for kernel K-Means

Rong Zhang; Alexander I. Rudnicky

Kernel functions can be viewed as a non-linear transformation that increases the separability of the input data by mapping them to a new, high-dimensional space. The incorporation of kernel functions enables the K-Means algorithm to explore the inherent data pattern in the new space. However, previous applications of the kernel K-Means algorithm have been confined to small corpora due to its expensive computation and storage costs. To overcome these obstacles, we propose a new clustering scheme which changes the clustering order from the sequence of samples to the sequence of kernels, and employs a disk-based strategy to manage the data. The new clustering scheme has been demonstrated to be very efficient for a large corpus by our experiments on handwritten digit recognition, in which more than 90% of the running time was saved.
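For reference, the standard kernel K-Means update that this scheme reorganizes can be written entirely in terms of Gram-matrix entries, because the squared distance from a point to an implicit centroid expands into kernel evaluations. The sketch below is the naive in-memory version; it holds the full n x n kernel matrix, which is precisely the storage cost the paper's kernel-ordered, disk-based scheme is designed to avoid.

```python
import numpy as np

def kernel_kmeans(K: np.ndarray, n_clusters: int, n_iter: int = 20,
                  seed: int = 0) -> np.ndarray:
    """Naive kernel K-Means over a precomputed Gram matrix K, using
    ||phi(x_i) - mu_c||^2 = K[i,i] - 2*mean_j K[i,j] + mean_{j,k} K[j,k],
    where j, k range over the members of cluster c."""
    n = K.shape[0]
    rng = np.random.default_rng(seed)
    labels = rng.integers(n_clusters, size=n)
    for _ in range(n_iter):
        dist = np.empty((n, n_clusters))
        for c in range(n_clusters):
            idx = np.flatnonzero(labels == c)
            if idx.size == 0:                 # re-seed an emptied cluster
                idx = rng.integers(n, size=1)
            dist[:, c] = (np.diag(K)
                          - 2.0 * K[:, idx].mean(axis=1)
                          + K[np.ix_(idx, idx)].mean())
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                             # assignments converged
        labels = new_labels
    return labels
```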


ANLP/NAACL-ConvSyst '00 Proceedings of the 2000 ANLP/NAACL Workshop on Conversational systems - Volume 3 | 2000

Stochastic language generation for spoken dialogue systems

Alice H. Oh; Alexander I. Rudnicky

The two current approaches to language generation, template-based and rule-based (linguistic) NLG, have limitations when applied to spoken dialogue systems, in part because they were developed for text generation. In this paper, we propose a new corpus-based approach to natural language generation, specifically designed for spoken dialogue systems.
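A toy rendering of the corpus-based idea: train an n-gram model on utterances for one dialog act, with slot values abstracted into placeholders, then sample a surface form and instantiate the slots. The corpus, slot names, and bigram choice here are invented for illustration; a real system of this kind would typically over-generate several candidates and rescore them.

```python
import random
from collections import defaultdict

# Tiny invented corpus for one dialog act, slot values as placeholders.
corpus = [
    "flights from {origin} to {destination} are available",
    "there are flights from {origin} to {destination}",
    "i found flights from {origin} to {destination}",
]

# Collect bigram continuations observed in the corpus.
bigrams = defaultdict(list)
for utt in corpus:
    words = ["<s>"] + utt.split() + ["</s>"]
    for w1, w2 in zip(words, words[1:]):
        bigrams[w1].append(w2)

def generate(slots: dict[str, str], max_len: int = 20) -> str:
    """Sample a surface form from the bigram model, then fill the slots."""
    word, out = "<s>", []
    while len(out) < max_len:
        word = random.choice(bigrams[word])   # sample next word
        if word == "</s>":
            break
        out.append(word)
    return " ".join(out).format(**slots)

random.seed(1)
print(generate({"origin": "Pittsburgh", "destination": "Boston"}))
```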


Journal of Memory and Language | 1985

Sound and spelling in spoken word recognition

Jola Jakimik; Ronald A. Cole; Alexander I. Rudnicky

In several experiments, lexical decisions about spoken words were shown to be influenced by the spelling of an immediately preceding item. Specifically, lexical decisions to one-syllable words were faster when part of the preceding word shared both the same sound and spelling. Thus, a lexical decision for “mess” was faster following “message” than following “letter”. Facilitation was not observed when words were related by sound alone (e.g., “definite”-“deaf”) or by spelling alone (e.g., “legislate”-“leg”). Analogous effects of spelling were obtained for nonwords; a decision about a nonword was facilitated only when preceded by a word with shared sound and spelling (e.g., “regular”-“reg”). The implications of these results for the role of spelling in the segmentation of speech and in lexical decisions are discussed.


Interactions | 2001

Universal speech interfaces

Ronald Rosenfeld; Dan R. Olsen; Alexander I. Rudnicky

In recent years speech recognition has reached the point of commercial viability, realizable on any off-the-shelf computer. This is a goal that has long been sought by both the research community and prospective users. Anyone who has used these technologies understands that recognition has many flaws and there is much still to be done. The recognition algorithms are not the whole story, however. There is still the question of how speech can and should actually be used. Related to this is the issue of tools for the development of speech-based applications. Achieving reliable, accurate speech recognition is similar to building an inexpensive mouse and keyboard: the underlying input technology is available, but the question of how to build the application interface still remains. We have been considering these problems for some time [Rosenfeld et al., 2000a]. In this paper we present some of our thoughts about the future of speech-based interaction. This paper is not a report of results we have obtained, but rather a vision of a future to be explored.


Journal of Experimental Psychology: Human Perception and Performance | 1984

Size and case of type as stimuli in reading.

Alexander I. Rudnicky; Paul A. Kolers

The roles of size and case of print have provoked a number of experiments in the recent past. One strongly argued position is that the reader abstracts a canonical representation from a string of letters that renders its variations irrelevant, and then carries out recognition procedures on that abstraction. An alternate view argues that the reader proceeds by analyzing the print, taking account of its manifold physical attributes such as the length of words, their orientation, shape, and the like. In the present experiments size and case were varied in several ways, and the task was also varied to include both silent reading and reading aloud. Clear evidence for shape-sensing operations was brought forward, but they were shown to be optional rather than obligatory processes, used when it served the reader's purpose to do so. However, it was also shown that such skills, normally useful, could be tricked into operating even when their presence hindered the reader's performance. The conclusion is drawn that reading goes forward in many ways at once rather than through an orderly sequence of operations, consistent with the reader's skills and the requirements of the task. Overarching theories of performance seem premature in the absence of detailed analysis of task components.


Communications of The ACM | 1994

Survey of current speech technology

Alexander I. Rudnicky; Alexander G. Hauptmann; Kai-Fu Lee

Speech recognition and speech synthesis are technologies of particular interest because they support direct communication between humans and computers through a communications mode humans commonly use among themselves and at which they are highly skilled. Both manipulate speech in terms of its information content; recognition transforms human speech into text, to be used literally (e.g., for dictation) or interpreted as commands to control applications, while synthesis allows the generation of spoken utterances from text.

Collaboration


Dive into Alexander I. Rudnicky's collaborations.

Top Co-Authors

Yun-Nung Chen (Carnegie Mellon University)
Alan W. Black (Carnegie Mellon University)
Rong Zhang (Carnegie Mellon University)
Ming Sun (Carnegie Mellon University)
Long Qin (Carnegie Mellon University)
Matthew Marge (Carnegie Mellon University)
Aasish Pappu (Carnegie Mellon University)
Joseph Polifroni (Massachusetts Institute of Technology)