Paul R. Dixon
Tokyo Institute of Technology
Publications
Featured research published by Paul R. Dixon.
ieee automatic speech recognition and understanding workshop | 2007
Paul R. Dixon; Diamantino Caseiro; Tasuku Oonishi; Sadaoki Furui
In this paper we present evaluations of the large vocabulary speech decoder we are currently developing at Tokyo Institute of Technology. Our goal is to build a fast, scalable, flexible decoder that operates on weighted finite state transducer (WFST) search spaces. Even though the development of the decoder is still in its infancy, we have already implemented an impressive feature set and are achieving good accuracy and speed on a large vocabulary spontaneous speech task. We have also developed a technique that allows parts of the decoder to run on the graphics processor, which can lead to a very significant speed-up.
Computer Speech & Language | 2009
Paul R. Dixon; Tasuku Oonishi; Sadaoki Furui
In large vocabulary continuous speech recognition (LVCSR) the acoustic model computations often account for the largest processing overhead. Our weighted finite state transducer (WFST) based decoding engine can utilize a commodity graphics processing unit (GPU) to perform the acoustic computations, moving this burden off the main processor. In this paper we describe our new GPU scheme, which achieves a very substantial improvement in recognition speed whilst incurring no reduction in recognition accuracy. We evaluate the GPU technique on a large vocabulary spontaneous speech recognition task using a set of acoustic models of varying complexity, and the results consistently show that using the GPU reduces the recognition time, with the largest improvements occurring in systems with large numbers of Gaussians. For the systems that achieve the best accuracy we obtained speed-ups of between 2.5 and 3 times. The faster decoding times translate to reductions in space, power and hardware costs, since only standard hardware that is already widely installed is required.
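The acoustic computation being offloaded here is, at its core, evaluating every frame against every state's Gaussian mixture model. A minimal NumPy sketch of that batched computation follows; the function name, diagonal-covariance assumption, and array layout are illustrative choices, not the paper's implementation (which runs on a GPU rather than with NumPy on the CPU):

```python
import numpy as np

def gmm_log_likelihoods(frames, means, inv_vars, log_weights):
    """Log-likelihood of each frame under each diagonal-covariance GMM state.

    frames:      (T, D)    feature vectors
    means:       (S, M, D) per-state mixture means
    inv_vars:    (S, M, D) per-state inverse variances
    log_weights: (S, M)    log mixture weights (with Gaussian constants folded in)
    Returns a (T, S) matrix of per-state log-likelihoods.
    """
    # Broadcast differences: (T, 1, 1, D) - (S, M, D) -> (T, S, M, D)
    diff = frames[:, None, None, :] - means[None, :, :, :]
    # Per-component exponent: -0.5 * sum_d (x - mu)^2 / var
    exponents = -0.5 * np.einsum("tsmd,smd->tsm", diff * diff, inv_vars)
    # Log-sum-exp over the mixture components of each state
    scores = log_weights[None, :, :] + exponents
    m = scores.max(axis=2, keepdims=True)
    return (m + np.log(np.exp(scores - m).sum(axis=2, keepdims=True)))[:, :, 0]
```

The dense, regular structure of this computation (one big broadcasted multiply-accumulate followed by a reduction) is what makes it such a good fit for a GPU.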
international conference on acoustics, speech, and signal processing | 2009
Paul R. Dixon; Tasuku Oonishi; Sadaoki Furui
In this paper we present a fast method for computing acoustic likelihoods that makes use of a Graphics Processing Unit (GPU). After enabling the GPU acceleration, the main processor runtime dedicated to acoustic scoring is reduced from the largest consumer to just a few percent, even when using mixture models with a large number of Gaussian components. The results show a large reduction in decoding time with no change in accuracy, and we also show that by using a 16-bit half-precision floating point format for the acoustic model parameters, storage requirements can be halved with no reduction in accuracy.
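The half-precision storage idea can be illustrated in a few lines of NumPy. The model dimensions below (2000 states, 32 mixture components, 39-dimensional features) are hypothetical, not the paper's configuration; the point is only that IEEE half precision halves the byte count while keeping the round-trip quantisation error small:

```python
import numpy as np

# Hypothetical acoustic model means in single precision (assumed dimensions).
means32 = np.random.randn(2000, 32, 39).astype(np.float32)

# Store in IEEE 754 half precision: exactly half the memory.
means16 = means32.astype(np.float16)
assert means16.nbytes == means32.nbytes // 2

# Convert back to float32 at load time for computation; with ~11 bits of
# mantissa the quantisation error is on the order of 1e-3 relative.
restored = means16.astype(np.float32)
max_err = np.abs(restored - means32).max()
```

Whether this loses accuracy in practice depends on the dynamic range of the parameters; the paper's result is that for their acoustic models it does not.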
international conference on acoustics, speech, and signal processing | 2012
Paul R. Dixon; Chiori Hori; Hideki Kashioka
In this paper we perform a comparison of lookahead composition and on-the-fly hypothesis rescoring using a common decoder. The results on a large vocabulary speech recognition task illustrate the differences in the behaviour of these algorithms in terms of error rate, real time factor, memory usage and internal statistics of the decoder. The evaluations were performed with the decoder operating at either the state or the arc level. The results show that the dynamic approaches also work well at the state level, even though the dynamic construction cost is greater there.
Computer Speech & Language | 2014
Yu Tsao; Xugang Lu; Paul R. Dixon; Ting-yao Hu; Shigeki Matsuda; Chiori Hori
Highlights: Designing suitable prior distributions is important for MAP-based methods. We propose a framework to characterize local information of acoustic environments. With the local information, suitable prior distributions can be designed. Four algorithms to specify hyper-parameters for prior distributions are derived. Results confirm the advantage of using local information in MAP-based methods.

The maximum a posteriori (MAP) criterion is popularly used for feature compensation (FC) and acoustic model adaptation (MA) to reduce the mismatch between training and testing data sets. MAP-based FC and MA require prior densities of mapping function parameters, and designing suitable prior densities plays an important role in obtaining satisfactory performance. In this paper, we propose to use an environment structuring framework to provide suitable prior densities for facilitating MAP-based FC and MA for robust speech recognition. The framework is constructed in a two-stage hierarchical tree structure using environment clustering and partitioning processes. The constructed framework is highly capable of characterizing local information about complex speaker and speaking acoustic conditions. The local information is utilized to specify hyper-parameters in prior densities, which are then used in MAP-based FC and MA to handle the mismatch issue. We evaluated the proposed framework on Aurora-2, a connected digit recognition task, and Aurora-4, a large vocabulary continuous speech recognition (LVCSR) task. On both tasks, experimental results showed that with the prepared environment structuring framework, we could obtain suitable prior densities for enhancing the performance of MAP-based FC and MA.
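To make the role of the hyper-parameters concrete, here is the textbook MAP update for a Gaussian mean (in the style of Gauvain and Lee), not the paper's four specific algorithms. The prior mean and the prior weight tau are exactly the quantities a structure like the environment tree would supply:

```python
import numpy as np

def map_adapt_mean(mu_prior, tau, frames, gammas):
    """Textbook MAP update of a single Gaussian mean.

    mu_prior: (D,)   prior mean (hyper-parameter, e.g. from an environment cluster)
    tau:      scalar prior weight (hyper-parameter controlling the prior's pull)
    frames:   (T, D) adaptation feature vectors
    gammas:   (T,)   posterior occupancy of this Gaussian at each frame
    Returns the interpolated mean: heavy data pulls toward the ML estimate,
    little data keeps the estimate near the prior.
    """
    occ = gammas.sum()                 # total occupancy of this Gaussian
    weighted_sum = gammas @ frames     # occupancy-weighted data sum, shape (D,)
    return (tau * mu_prior + weighted_sum) / (tau + occ)
```

With tau = 0 this collapses to the maximum-likelihood mean, and as tau grows the estimate stays pinned to the prior, which is why choosing prior densities that match the local acoustic environment matters so much.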
international conference on acoustics, speech, and signal processing | 2009
Tasuku Oonishi; Paul R. Dixon; Koji Iwano; Sadaoki Furui
In the Weighted Finite State Transducer (WFST) framework for speech recognition, we can reduce memory usage and increase flexibility by using on-the-fly composition, which generates the search network dynamically during decoding. Methods have also been proposed for optimizing WFSTs in on-the-fly composition; however, these operations place restrictions on the structure of the component WFSTs. We propose extended on-the-fly optimization operations that can operate on WFSTs of arbitrary structure by utilizing filter composition. The evaluations illustrate that the proposed method is able to generate more efficient WFSTs.
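For readers unfamiliar with the operation being filtered, here is a minimal sketch of epsilon-free WFST composition as a product construction over tropical-semiring weights. The dict-based representation is an assumption for illustration; the paper's contribution is the composition filter that lifts the structural restrictions, which this sketch does not reproduce:

```python
from collections import deque

def compose(A, B):
    """Epsilon-free composition of two transducers (product construction).

    A and B map each state to a list of arcs (in_label, out_label, next_state,
    weight), with start state 0. Returns the arcs of the composed machine,
    keyed by (state_of_A, state_of_B) pairs reachable from the joint start.
    Weights are combined by addition (tropical semiring).
    """
    start = (0, 0)
    arcs, queue, seen = {}, deque([start]), {start}
    while queue:
        qa, qb = queue.popleft()
        out = []
        for (i, o, na, wa) in A.get(qa, []):
            for (i2, o2, nb, wb) in B.get(qb, []):
                if o == i2:                           # match A's output to B's input
                    nxt = (na, nb)
                    out.append((i, o2, nxt, wa + wb))
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)
        arcs[(qa, qb)] = out
    return arcs
```

On-the-fly decoders expand exactly this product lazily, creating pair states only as the search reaches them, which is where the memory savings come from.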
cross language evaluation forum | 2007
Edward W. D. Whittaker; Josef R. Novak; Pierre Chatain; Paul R. Dixon; Matthias H. Heie; Sadaoki Furui
Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009) | 2009
Dong Yang; Paul R. Dixon; Yi-Cheng Pan; Tasuku Oonishi; Masanobu Nakamura; Sadaoki Furui
conference of the international speech communication association | 2008
Tasuku Oonishi; Paul R. Dixon; Koji Iwano; Sadaoki Furui
Proceedings of the 3rd Named Entities Workshop (NEWS 2011) | 2011
Andrew M. Finch; Paul R. Dixon; Eiichiro Sumita
Collaboration
National Institute of Information and Communications Technology