
Publication


Featured research published by Jonathan Malkin.


Empirical Methods in Natural Language Processing | 2005

The Vocal Joystick: A Voice-Based Human-Computer Interface for Individuals with Motor Impairments

Jeff A. Bilmes; Xiao Li; Jonathan Malkin; Kelley Kilanski; Richard Wright; Katrin Kirchhoff; Amarnag Subramanya; Susumu Harada; James A. Landay; Patricia Dowden; Howard Jay Chizeck

We present a novel voice-based human-computer interface designed to enable individuals with motor impairments to use vocal parameters for continuous control tasks. Since discrete spoken commands are ill-suited to such tasks, our interface exploits a large set of continuous acoustic-phonetic parameters such as pitch, loudness, and vowel quality. Their selection is optimized with respect to automatic recognizability, communication bandwidth, learnability, suitability, and ease of use. Parameters are extracted in real time, transformed via adaptation and acceleration, and converted into continuous control signals. This paper describes the basic engine, prototype applications (in particular, voice-based web browsing and a controlled trajectory-following task), and initial user studies confirming the feasibility of this technology.
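The core idea above, mapping vowel quality to direction and loudness to speed, can be sketched as follows. This is a minimal illustration, not the published system: the vowel-to-angle table, the vowel labels, and the linear speed scaling are all assumptions made for the example.

```python
import math

# Hypothetical vowel-to-direction table (compass-style layout is an
# assumption for illustration, not the published Vocal Joystick configuration).
VOWEL_ANGLES = {
    "iy": 90.0,   # "ee"  -> up
    "aa": 270.0,  # "ah"  -> down
    "uw": 180.0,  # "oo"  -> left
    "ae": 0.0,    # "a" (as in "cat") -> right
}

def control_vector(vowel, loudness, max_speed=100.0):
    """Convert a recognized vowel and a normalized loudness in [0, 1]
    into a 2-D velocity vector (e.g., pixels per frame)."""
    angle = math.radians(VOWEL_ANGLES[vowel])
    speed = max_speed * loudness
    return (speed * math.cos(angle), speed * math.sin(angle))
```

For example, a recognized "ae" at half loudness would yield a rightward velocity of half the maximum speed.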


Human Factors in Computing Systems | 2009

The VoiceBot: a voice controlled robot arm

Brandi House; Jonathan Malkin; Jeff A. Bilmes

We present a system whereby the human voice may specify continuous control signals to manipulate a simulated 2D robotic arm and a real 3D robotic arm. Our goal is to move towards making the manipulation of everyday objects accessible to individuals with motor impairments. Using our system, we performed several studies using control style variants for both the 2D and 3D arms. Results show that it is indeed possible for a user to learn to effectively manipulate real-world objects with a robotic arm using only non-verbal voice as a control mechanism. Our results provide strong evidence that the further development of non-verbal voice controlled robotics and prosthetic limbs will be successful.


IEEE Automatic Speech Recognition and Understanding Workshop | 2005

Energy and loudness for speed control in the vocal joystick

Jonathan Malkin; Xiao Li; Jeff A. Bilmes

We propose and describe several methods for using speech power as an estimate of intentional loudness, and a mapping from this loudness estimate to a continuous control. This is performed in the context of a novel voice-based human-computer interface designed to enable individuals with motor impairments to use vocal tract parameters for both discrete and continuous control tasks. The interface uses vocal gestures to control continuous movement and discrete sounds for other events. We conduct a user preference survey to gauge user reaction to the various methods in a mouse cursor control context. We find that loudness is an effective mechanism to control mouse cursor movement speed when mapping vocalic gestures to spatial position.
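One simple loudness-to-speed pipeline of the kind described above can be sketched as follows. The dB range, the clamped linear mapping, and the constants are illustrative assumptions for the example, not the specific methods evaluated in the paper.

```python
import math

def rms_db(frame):
    """Frame energy in dB relative to full scale (assumes samples in [-1, 1])."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return 20.0 * math.log10(max(rms, 1e-10))  # floor avoids log(0) on silence

def loudness_to_speed(db, floor_db=-50.0, ceil_db=-10.0, max_speed=40.0):
    """Map an estimated loudness in dB onto cursor speed via a clamped
    linear ramp; the dB endpoints and linearity are assumptions here."""
    x = (db - floor_db) / (ceil_db - floor_db)
    return max_speed * min(max(x, 0.0), 1.0)
```

A frame of constant amplitude 0.1 measures -20 dB, which this mapping converts to three quarters of the maximum speed; anything below the floor yields zero speed, so silence does not move the cursor.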


International Conference on Acoustics, Speech, and Signal Processing | 2005

A graphical model for formant tracking

Jonathan Malkin; Xiao Li; Jeff A. Bilmes

We present a novel approach to estimating the first two formants (F1 and F2) of a speech signal using graphical models. Using a graph that takes advantage of two less commonly used features of Bayesian networks, v-structures and soft evidence, the model can learn to perform reasonably well without large amounts of training data, even with minimal processing of the initial signal. It far outperforms a factorial HMM using the same assumptions, suggesting that with further refinement the model may produce high-quality formant tracks.


International Conference on Acoustics, Speech, and Signal Processing | 2008

Ratio semi-definite classifiers

Jonathan Malkin; Jeff A. Bilmes

We present a novel classification model that is formulated as a ratio of semi-definite polynomials. We derive an efficient learning algorithm for this classifier, and apply it to two separate phoneme classification corpora. Results show that our discriminatively trained model can achieve accuracies comparable with state-of-the-art techniques such as multi-layer perceptrons, but does not possess the overconfident bias often found in models based on ratios of exponentials.
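The model form, though not the training algorithm, can be sketched briefly: each class scores the (augmented) input with a positive-semi-definite quadratic form, and the posterior is the ratio of one class's score to the sum over all classes. The random parameters below are placeholders; the paper learns them discriminatively.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_psd(dim):
    """Build a positive-semi-definite matrix as B @ B.T (PSD by construction)."""
    B = rng.standard_normal((dim, dim))
    return B @ B.T

def posterior(x, psd_mats):
    """Class posteriors as a ratio of semi-definite quadratic forms."""
    z = np.append(x, 1.0)                    # affine augmentation of the input
    scores = np.array([z @ A @ z for A in psd_mats])
    return scores / scores.sum()             # ratio of semi-definite polynomials

mats = [make_psd(3) for _ in range(2)]       # two classes, 2-D input + bias term
p = posterior(np.array([0.5, -1.0]), mats)
```

Because every per-class score is nonnegative, the resulting posteriors are automatically a valid distribution without an exponential link, which is the structural property behind the reduced overconfidence claimed above.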


International Conference on Acoustics, Speech, and Signal Processing | 2009

Multi-layer ratio Semi-Definite Classifiers

Jonathan Malkin; Jeff A. Bilmes

We develop a novel extension to the Ratio Semi-definite Classifier, a discriminative model formulated as a ratio of semi-definite polynomials. By adding a hidden layer, we can train the model efficiently while achieving higher accuracy than the original version. Results on artificial 2-D data as well as two separate phone classification corpora show that our multi-layer model still avoids the overconfidence bias found in models based on ratios of exponentials, while remaining competitive with state-of-the-art techniques such as multi-layer perceptrons.


IEEE Transactions on Audio, Speech, and Language Processing | 2006

A high-speed, low-resource ASR back-end based on custom arithmetic

Xiao Li; Jonathan Malkin; Jeff A. Bilmes

With the skyrocketing popularity of mobile devices, new processing methods tailored to a specific application have become necessary for low-resource systems. This work presents a high-speed, low-resource speech recognition system using custom arithmetic units, where all system variables are represented by integer indices and all arithmetic operations are replaced by hardware-based table lookups. To this end, several reordering and rescaling techniques, including two accumulation structures for Gaussian evaluation and a novel method for the normalization of Viterbi search scores, are proposed to ensure low entropy for all variables. Furthermore, a discriminatively inspired distortion measure is investigated for scalar quantization of forward probabilities to maximize the recognition rate. Finally, heuristic algorithms are explored to optimize system-wide resource allocation. Our best bit-width allocation scheme only requires 59 kB of ROMs to hold the lookup tables, and its recognition performance with various vocabulary sizes in both clean and noisy conditions is nearly as good as that of a system using a 32-bit floating-point unit. Simulations on various architectures show that, on most modern processor designs, we can expect a cycle-count speedup of at least three times over systems with floating-point units. Additionally, the memory bandwidth is reduced by over 70% and the offline storage for model parameters is reduced by 80%.
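The custom-arithmetic idea above, representing values as integer codebook indices and replacing arithmetic with table lookups, can be sketched as follows. The 4-bit uniform codebook and the multiplication example are assumptions for illustration; the paper allocates bit widths per variable and targets the full ASR back-end, not a single operation.

```python
import itertools

# Hypothetical 4-bit uniform codebook over [0, 1); the real system uses
# per-variable quantizers chosen to keep each variable's entropy low.
codebook = [i / 16.0 for i in range(16)]

def quantize(x):
    """Return the index of the nearest codebook value."""
    return min(range(len(codebook)), key=lambda i: abs(codebook[i] - x))

# Precompute products for every pair of codes. The result is stored as a
# code as well, so chained operations never leave the integer-index domain.
mul_table = {
    (i, j): quantize(codebook[i] * codebook[j])
    for i, j in itertools.product(range(16), repeat=2)
}

a, b = quantize(0.5), quantize(0.25)
approx = codebook[mul_table[(a, b)]]   # 0.125, obtained via lookup only
```

At runtime no floating-point multiply occurs: the two 4-bit indices address a 256-entry ROM, which is how a table of this kind can replace an arithmetic unit in hardware.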


International Conference on Acoustics, Speech, and Signal Processing | 2004

Custom arithmetic for high-speed, low-resource ASR systems

Jonathan Malkin; Xiao Li; Jeff A. Bilmes

With the skyrocketing popularity of mobile devices, new processing methods, tailored for low-resource systems, have become necessary. We propose the use of custom arithmetic logic tailored to a specific application. In a system with all parameters quantized to low precision, such arithmetic can be implemented through a set of small, fast table lookups. We present here a framework for the design of such a system architecture, and several heuristic algorithms to optimize system performance. In addition, we apply our techniques to an automatic speech recognition (ASR) application. Our simulations on various architectures show that on most modern processor designs, we can expect a cycle-count speedup of at least 3 times while requiring a total of only 59 kB of ROMs to hold the lookup tables.


International Conference on Acoustics, Speech, and Signal Processing | 2004

Codebook design for ASR systems using custom arithmetic units

Xiao Li; Jonathan Malkin; Jeff A. Bilmes

Custom arithmetic is a novel and successful technique to reduce the computation and resource utilization of ASR systems running on mobile devices. It represents all floating-point numbers by integer indices and substitutes a sequence of table lookups for all arithmetic operations. The first and crucial step in custom arithmetic design is to quantize system variables, preferably to low precision. This paper explores several techniques to quantize variables with high entropy, including a reordering of Gaussian computation and a normalization of Viterbi search. Furthermore, a discriminatively inspired distortion measure is investigated for scalar quantization to better maintain recognition accuracy. Experiments on an isolated word recognition task show that each system variable can be scalar quantized to less than 8 bits using a standard quantization method, except for the alpha probability in Viterbi search, which requires 10 bits. However, using our normalization and discriminative distortion measure, the forward probability can be quantized to 9 bits, thereby halving the corresponding lookup table size. This greatly reduces the memory bandwidth and enables the implementation of custom arithmetic on ASR systems.


Computer Speech & Language | 2011

The Vocal Joystick Engine v1.0

Jonathan Malkin; Xiao Li; Susumu Harada; James A. Landay; Jeff A. Bilmes

We present the Vocal Joystick engine, a real-time software library which can be used to map non-linguistic vocalizations into realizable continuous control signals. The system is designed to strike a balance between low latency and accurate recognition while simultaneously taking advantage of the rich complexity of sounds producible by the human vocal tract. By developing a modular, cross-platform library, we aim to provide a robust but simple means of incorporating such controls into any application, thereby producing a new form of accessible technology. This is demonstrated by the various applications that so far have used the Vocal Joystick engine. Unlike previous discussions of parts of the Vocal Joystick, this paper presents a detailed view of the inner workings of the current version of the engine.

Collaboration


Dive into Jonathan Malkin's collaborations.

Top Co-Authors

Jeff A. Bilmes
University of Washington

Susumu Harada
University of Washington

Richard Wright
University of Washington

Brandi House
University of Washington