Arthur R. Toth | Researchain

Publication

Featured research published by Arthur R. Toth.

International Conference on Acoustics, Speech, and Signal Processing | 2008

Is voice transformation a threat to speaker identification?

Qin Jin; Arthur R. Toth; Alan W. Black; Tanja Schultz

With the development of voice transformation and speech synthesis technologies, speaker identification systems are likely to face attacks from imposters who use voice-transformed or synthesized speech to mimic a particular speaker. Therefore, in this paper we investigated how speaker identification systems perform on voice-transformed speech. We conducted experiments with two different approaches: the classical GMM-based speaker identification system and the Phonetic speaker identification system. Our experimental results showed that current standard voice transformation techniques are able to fool the GMM-based system but not the Phonetic speaker identification system. These findings imply that future speaker identification systems should include idiosyncratic knowledge in order to successfully distinguish transformed speech from natural speech and thus be armed against imposter attacks.

International Conference on Acoustics, Speech, and Signal Processing | 2009

Voice convergin: Speaker de-identification by voice transformation

Qin Jin; Arthur R. Toth; Tanja Schultz; Alan W. Black

Speaker identification might be a suitable answer to prevent unauthorized access to personal data. However, we also need to provide solutions to secure the transmission of spoken information. This challenge divides into two major aspects: first, the secure transmission of the content of the spoken input, and second, the secure transmission of the identity of the speaker. In this paper we concentrate on the latter, i.e. how to securely transmit information via voice without revealing the identity of the speaker to unauthorized listeners. As a first step toward solving this problem, we study the potential of voice transformation for speaker de-identification. We use two speaker identification approaches to verify the success of de-identification with voice transformation, a GMM-based and a Phonetic approach, and study different voice transformation strategies to disguise speaker identity information while preserving understandability.

ACM Transactions on Speech and Language Processing | 2005

Towards efficient human machine speech communication: The speech graffiti project

Stefanie Tomko; Thomas K. Harris; Arthur R. Toth; James Sanders; Alexander I. Rudnicky; Roni Rosenfeld

This research investigates the design and performance of the Speech Graffiti interface for spoken interaction with simple machines. Speech Graffiti is a standardized interface designed to address issues inherent in the current state of the art in spoken dialog systems, such as high word-error rates and the difficulty of developing natural language systems. This article describes the general characteristics of Speech Graffiti, provides examples of its use, and describes other aspects of the system such as the development toolkit. We also present results from a user study comparing Speech Graffiti with a natural language dialog system. These results show that users rated Speech Graffiti significantly better in several assessment categories. Participants completed approximately the same number of tasks with both systems, and although Speech Graffiti users often took more turns to complete tasks than natural language interface users, they completed tasks in slightly less time.

Human Factors in Computing Systems | 2001

A unified design for human-machine voice interaction

Stefanie Shriver; Arthur R. Toth; Xiaojin Zhu; Alexander I. Rudnicky; Roni Rosenfeld

We describe a unified design for voice interaction with simple machines, discuss the motivation for and main features of the approach, include a short sample interaction, and report the results of two preliminary experiments.

International Conference on Acoustics, Speech, and Signal Processing | 2010

Synthesizing speech from Doppler signals

Arthur R. Toth; Kaustubh Kalgaonkar; Bhiksha Raj; Tony Ezzat

It has long been considered a desirable goal to be able to construct an intelligible speech signal merely by observing the talker in the act of speaking. Past methods for doing this have been based on camera-based observations of the talker's face, combined with statistical methods that infer the speech signal from the facial motion captured by the camera. Other methods have included synthesis of speech from measurements taken by electromyographs and other devices that are tethered to the talker - an undesirable setup. In this paper we present a new device for synthesizing speech from characterizations of facial motion associated with speech - a Doppler sonar. Facial movement is characterized through Doppler frequency shifts in a tone that is incident on the talker's face. These frequency shifts are used to infer the underlying speech signal. The setup is far-field and untethered, with the sonar acting from the distance of a regular desktop microphone. Preliminary experimental evaluations show that the mechanism is very promising - we are able to synthesize reasonable speech signals, comparable to those obtained from tethered devices such as EMGs.

Journal of the Acoustical Society of America | 2010

Synthesizing speech from surface electromyography and acoustic Doppler sonar

Arthur R. Toth; Michael Wand; Szu-Chen Stan Jou; Tanja Schultz; Bhiksha Raj; Kaustubh Kalgaonkar; Tony Ezzat

Numerous techniques have been devised to process speech audio in noise, but automatic speech recognition is difficult when the noise is too great. An alternative approach is to collect data that represent the speech production process but are less affected by noise in the speech audio range. Two such types of data come from surface electromyography (EMG) and acoustic Doppler sonar (ADS). EMG records muscle activation potentials. ADS records reflected ultrasound tones. Both can be used to measure facial movements related to speech, but they present their own challenges for automatic speech recognition. This work investigates the alternative approach of using these data sources for speech synthesis. The synthesis techniques explored in this work are based on Gaussian mixture model mapping techniques, which are commonly used for voice transformation. Voice transformation is traditionally concerned with changing the identity of speech audio signals, but others have demonstrated that such techniques can be used...

Conference of the International Speech Communication Association | 2007