Robert D. Rodman | Researchain

Publication

Featured researches published by Robert D. Rodman.

IEEE Transactions on Audio, Speech, and Language Processing | 2009

Particle Swarm Optimization for Sorted Adapted Gaussian Mixture Models

Rahim Saeidi; Hamid Reza Sadegh Mohammadi; Todor Ganchev; Robert D. Rodman

Recently, we introduced the sorted Gaussian mixture models (SGMMs) algorithm, which provides the means to trade off performance for operational speed and thus permits the speed-up of GMM-based classification schemes. The performance of the SGMM algorithm depends on the proper choice of the sorting function and the proper adjustment of its parameters. In the present work, we employ particle swarm optimization (PSO) and an appropriate fitness function to find the most advantageous parameters of the sorting function. We evaluate the practical significance of our approach on the text-independent speaker verification task utilizing the NIST 2002 speaker recognition evaluation (SRE) database while following the NIST SRE experimental protocol. The experimental results demonstrate a superior performance of the SGMM algorithm using PSO when compared to the original SGMM. For comprehensiveness, we also compared these results with those from a baseline Gaussian mixture model-universal background model (GMM-UBM) system. The experimental results suggest that the performance loss due to speed-up is partially mitigated using PSO-derived weights in a sorted GMM-based scheme.
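
The abstract above describes using PSO with a fitness function to tune the parameters of a sorting function. The following is a minimal sketch of a generic PSO loop, not the authors' implementation. The function names, the swarm settings, and the toy fitness (squared distance to a made-up target weight vector) are all assumptions chosen for illustration; the paper's actual fitness function and sorting function are not given in the abstract.

```python
import random


def pso_minimize(fitness, dim, bounds, n_particles=20, n_iters=200,
                 inertia=0.729, c_personal=1.49445, c_global=1.49445, seed=0):
    """Minimize `fitness` over a box with a basic global-best PSO.

    All parameter values here are common textbook defaults (assumed),
    not values taken from the paper.
    """
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initial positions, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Personal bests and the swarm-wide (global) best.
    best_pos = [p[:] for p in pos]
    best_val = [fitness(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + pull toward personal and global bests.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                # Move and clamp to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = fitness(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


def toy_fitness(weights):
    """Hypothetical stand-in fitness: squared distance to a made-up target."""
    target = (0.5, -1.0, 2.0)
    return sum((w - t) ** 2 for w, t in zip(weights, target))


best_weights, best_score = pso_minimize(toy_fitness, dim=3, bounds=(-5.0, 5.0))
print(best_weights, best_score)
```

In a setting like the one the abstract describes, `toy_fitness` would presumably be replaced by a measure of verification quality computed on development data, with the particle position interpreted as the sorting-function weights.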

conference on applied natural language processing | 1983

INTERACTIVE NATURAL LANGUAGE PROBLEM SOLVING: A PRAGMATIC APPROACH

Alan W. Biermann; Robert D. Rodman; Bruce W. Ballard; T. Betancourt; Griff L. Bilbro; Harriet Haynsworth Deas; Linda Fineman; Pamela E. Fink; Kermit C. Gilbert; D. Gregory; J. Francis Heidlage

A class of natural language processors is described which allow a user to display objects of interest on a computer terminal and manipulate them via typed or spoken English sentences. This paper concerns itself with the implementation of the voice input facility using an automatic speech recognizer, and the touch input facility using a touch sensitive screen. To overcome the high error rates of the speech recognizer under conditions of actual problem solving in natural language, error correction software has been designed and is described here. Also described are problems involving the resolution of voice input with touch input, and the identification of the intended referents of touch input. To measure system performance we have considered two classes of factors: the various conditions of testing, and the level and quality of training of the system user. In the paper a sequence of five different testing situations is observed, each one resulting in a lowering of system performance by several percentage points below the previous one. A training procedure for potential users is described, and an experiment is discussed which utilizes the training procedure to enable users to solve actual non-trivial problems using natural language voice communication.

Communications of The ACM | 1985

Natural language with discrete speech as a mode for human-to-machine

Alan W. Biermann; Robert D. Rodman; David C. Rubin; J. Francis Heidlage

A voice interactive natural language system, which allows users to solve problems with spoken English commands, has been constructed. The system utilizes a commercially available discrete speech recognizer which requires that each word be followed by approximately a 300 millisecond pause. In a test of the system, subjects were able to learn its use after about two hours of training. The system correctly processed about 77 percent of the over 6000 input sentences spoken in problem-solving sessions. Subjects spoke at the rate of about three sentences per minute and were able to effectively use the system to complete the given tasks. Subjects found the system relatively easy to learn and use, and gave a generally positive report of their experience.

international conference on acoustics, speech, and signal processing | 2007

A New Segmentation Algorithm Combined with Transient Frames Power for Text Independent Speaker Verification

Rahim Saeidi; Hamid Reza Sadegh Mohammadi; Robert D. Rodman; Tomi Kinnunen

In this paper we propose a new segmentation algorithm called delta MFCC based speech segmentation (DMFCC-SS), with application to speaker recognition systems. We show that DMFCC-SS can separate the regions of speech that result from similar likelihood scores using models such as a Gaussian mixture model (GMM), and can therefore be used to identify the regions of speech between two transitional states in a speech signal. By combining this segmentation algorithm with the discriminative power of transient frames in speaker recognition, we can investigate the tradeoff in speed-up rates that result from DMFCC-SS, with speaker verification equal error rates that result from representatives of each segment. We use a universal background model Gaussian mixture model (UBM-GMM) as a baseline system. The proposed speed-up algorithm, working in the pre-processing stage, performs well while having no computational load compared to the main GMM system. Experimental results show the superior performance of this pre-processing method in comparison with other algorithms working in the pre-processing stage of a UBM-GMM system.

asilomar conference on signals, systems and computers | 1994

Automated lip-sync: direct translation of speech-sound to mouth-shape

Barrett Emil Koster; Robert D. Rodman; Donald L. Bitzer

The goal of automatic lip-sync (ALS) is to translate speech sounds into mouth shapes. Although this seems related to speech recognition (SR), the direct map from sound to shape avoids many language problems associated with SR and provides a unique domain for error correction. Among other things, ALS animation may be used for animating cartoons realistically and as an aid to the hearing disabled. Currently, a program named Owie performs speaker dependent ALS for vowels.

technical symposium on computer science education | 2007

ProofChecker: an accessible environment for automata theory correctness proofs

Matthias F. M. Stallmann; Suzanne Balik; Robert D. Rodman; Sina Bahram; Michael C. Grace; Susan D. High

ProofChecker is a graphical program based on the notion of formal correctness proofs that allows students, both sighted and visually impaired, to draw a deterministic finite automaton (DFA) and determine whether or not it correctly recognizes a given language. Sighted students use the mouse and graphical controls to draw and manipulate the DFA. Keyboard shortcuts, together with the use of a screen reader to voice the accessible descriptions provided by the program, allow visually impaired students to do the same. Because the states of a DFA partition the language over its alphabet into equivalence classes, each state has a language associated with it. Conditions that describe the language of each state are entered by the student in the form of conditional expressions with function calls and/or regular expressions. A brute-force approach is then used to check that each state's condition correctly describes all of the strings in its language and that none of the strings in a state's language meet the condition for another state. Feedback is provided that either confirms that the DFA correctly meets the given conditions or alerts the student to a mismatch between the conditions and the DFA. A student's DFA can be saved in an XML file and submitted for grading. An automated checking tool, known as ProofGrader, can be used to compare a student's DFA with the correct DFA for a given language, thus greatly speeding up the grading of student assignments.
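
The brute-force check described in this abstract can be sketched roughly as follows. This is an illustrative sketch only, not ProofChecker's code: the function names, the dictionary-based DFA encoding, the use of Python predicates as state conditions, and the bound on string length are all assumptions. It enumerates every string up to a length bound, runs the DFA, and reports any string for which the set of satisfied conditions is not exactly the state reached.

```python
from itertools import product


def run_dfa(transitions, start, string):
    """Return the state reached after reading `string` from `start`."""
    state = start
    for symbol in string:
        state = transitions[(state, symbol)]
    return state


def check_state_conditions(transitions, start, alphabet, conditions, max_len=8):
    """Brute-force check of per-state conditions on all strings up to max_len.

    A string passes only if the condition of the state it reaches holds
    and no other state's condition holds. Returns a list of
    (string, reached_state, states_whose_condition_held) mismatches.
    """
    mismatches = []
    for length in range(max_len + 1):
        for symbols in product(alphabet, repeat=length):
            string = "".join(symbols)
            reached = run_dfa(transitions, start, string)
            satisfied = {s for s, cond in conditions.items() if cond(string)}
            if satisfied != {reached}:
                mismatches.append((string, reached, sorted(satisfied)))
    return mismatches


# Hypothetical example: a DFA tracking the parity of the number of 'a's.
transitions = {
    ("even", "a"): "odd", ("even", "b"): "even",
    ("odd", "a"): "even", ("odd", "b"): "odd",
}
good_conditions = {
    "even": lambda s: s.count("a") % 2 == 0,
    "odd": lambda s: s.count("a") % 2 == 1,
}
# A deliberately wrong condition for the "odd" state.
bad_conditions = {
    "even": lambda s: s.count("a") % 2 == 0,
    "odd": lambda s: s.endswith("a"),
}

ok = check_state_conditions(transitions, "even", "ab", good_conditions)
bad = check_state_conditions(transitions, "even", "ab", bad_conditions, max_len=3)
print(ok)       # no mismatches for the correct conditions
print(bad[0])   # first counterexample for the wrong conditions
```

A bounded enumeration like this cannot prove correctness for all strings; it only finds counterexamples up to the chosen length, which is why the bound is exposed as a parameter here.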

Computer Networks and Isdn Systems | 1998

Speaker independence in automated lip-sync for audio-video communication

David F. McAllister; Robert D. Rodman; Donald L. Bitzer; Andrew S. Freeman

technical symposium on computer science education | 2013

GSK: universally accessible graph sketching

Suzanne Balik; Sean Mealin; Matthias F. M. Stallmann; Robert D. Rodman

Combinatorial graphs, often conveyed as node-link diagrams, figure prominently in Computer Science and other Science, Technology, Engineering, and Mathematics (STEM) disciplines. Unfortunately, they are most often inaccessible to blind students and professionals. This paper introduces GSK, a self-contained Graph SKetching tool that allows blind and sighted people to easily create, edit, and share graphs in real-time using interaction mechanisms (mouse, keyboard, monitor, screen reader) that are standard for them. GSK was successfully used by a blind Computer Science student and his sighted instructors to create and access graphs specific to his automata theory and operating systems courses. Our hope is that GSK will enable more blind STEM students and professionals to actively participate in their disciplines by providing them and their sighted colleagues with a cross-collaboration tool that allows them to share graphs just as easily as they share text and word processing documents.

international conference on acoustics, speech, and signal processing | 2007

Combined Inter-Frame and Intra-Frame Fast Scoring Methods for Efficient Implementation of GMM-Based Speaker Verification Systems

Hamid Reza Sadegh Mohammadi; Rahim Saeidi; M. R. Rohani; Robert D. Rodman

In this paper a new inter-frame fast scoring scheme is proposed for Gaussian mixture model universal background model (GMM-UBM) speaker verification systems. It is combined with a recently introduced intra-frame efficient scoring method called the sorted Gaussian mixture model (SGMM) classifier, which itself uses a sorted UBM known as the sorted background model (SBM). To enhance the performance of the system, a GMM identifier is applied as a post-processing block. Experimental results show that the performance of this combined method compares favorably with the baseline GMM-UBM system, while the computational load of the proposed system is much lower than that of the baseline system.

international conference on human computer interaction | 1987

THE EFFECTS OF VARIOUS TYPES OF SPEECH OUTPUT ON LISTENER COMPREHENSION RATES

Taryn S. Moody; Michael Joost; Robert D. Rodman

This paper reports the results of a study that investigated the effects of varying speech signals, passages, and questions on listener comprehension rates. The passages and comprehension questions used were taken from sample tests of the High School Equivalency Examination, the Scholastic Aptitude Test, and the Graduate Record Examination. These passages were then recorded using four speech output types: synthesized speech, digitized speech at a rate of 9600 bps, digitized speech at a rate of 2400 bps, and natural (human) speech. A reading group was also used in this study for control purposes. Results indicated statistically significant differences in comprehension rates between the natural speech group and the synthesized and 2400 bps digitized speech groups. Significant passage and question type effects were also found. These results and the voice output guidelines derived from this study are discussed in this paper.
