Publication


Featured research published by Marcus Tomalin.


International Conference on Acoustics, Speech, and Signal Processing | 2005

Structural metadata research in the EARS program

Yang Liu; Elizabeth Shriberg; Andreas Stolcke; Barbara Peskin; Jeremy Ang; Dustin Hillard; Mari Ostendorf; Marcus Tomalin; Philip C. Woodland; Mary P. Harper

Both human and automatic processing of speech require recognition of more than just words. In this paper we provide a brief overview of research on structural metadata extraction in the DARPA EARS rich transcription program. Tasks include detection of sentence boundaries, filler words, and disfluencies. Modeling approaches combine lexical, prosodic, and syntactic information, using various modeling techniques for knowledge source integration. The performance of these methods is evaluated by task, by data source (broadcast news versus spontaneous telephone conversations) and by whether transcriptions come from humans or from an (errorful) automatic speech recognizer. A representative sample of results shows that combining multiple knowledge sources (words, prosody, syntactic information) is helpful, that prosody is more helpful for news speech than for conversational speech, that word errors significantly impact performance, and that discriminative models generally provide benefit over maximum likelihood models. Important remaining issues, both technical and programmatic, are also discussed.


International Conference on Acoustics, Speech, and Signal Processing | 2009

Training and adapting MLP features for Arabic speech recognition

Junho Park; Frank Diehl; Mark J. F. Gales; Marcus Tomalin; Philip C. Woodland

Features derived from Multi-Layer Perceptrons (MLPs) are becoming increasingly popular for speech recognition. This paper describes various schemes for applying these features to state-of-the-art Arabic speech recognition: the use of MLP features for short-vowel modelling in graphemic systems; rapid discriminative model training by standard PLP feature lattice re-use; and MLP feature adaptation using Linear Input Networks (LIN). The use of rapid training using MLP features and their use for short-vowel modelling and LIN adaptation gave reductions in word error rate. However, significant improvements over explicit short-vowel modelling with standard multi-pass adaptation were not obtained, although they were useful in combination.


Computer Speech & Language | 2011

The efficient incorporation of MLP features into automatic speech recognition systems

Junho Park; Frank Diehl; Mark J. F. Gales; Marcus Tomalin; Philip C. Woodland

In recent years, the use of Multi-Layer Perceptron (MLP) derived acoustic features has become increasingly popular in automatic speech recognition systems. These features are typically used in combination with standard short-term spectral-based features, and have been found to yield consistent performance improvements. However, there are a number of design decisions and issues associated with the use of MLP features for state-of-the-art speech recognition systems. Two modifications to the standard training/adaptation procedures are described in this work. First, the paper examines how MLP features, and the associated acoustic models, can be trained efficiently on large training corpora using discriminative training techniques. An approach that combines multiple individual MLPs is proposed, and this reduces the time needed to train MLPs on large amounts of data. In addition, to further speed up discriminative training, a lattice re-use method is proposed. The paper also examines how systems with MLP features can be adapted to particular speakers or acoustic environments. In contrast to previous work (where standard HMM adaptation schemes are used), linear input network adaptation is investigated. System performance is investigated within a multi-pass adaptation/combination framework. This allows the performance gains of individual techniques to be evaluated at various stages, as well as the impact in combination with other sub-systems. All the approaches considered in this paper are evaluated on an Arabic large vocabulary speech recognition task which includes both Broadcast News and Broadcast Conversation test data.


IEEE Automatic Speech Recognition and Understanding Workshop | 2007

Development of a phonetic system for large vocabulary Arabic speech recognition

Mark J. F. Gales; Frank Diehl; Chandra Kant Raut; Marcus Tomalin; Philip C. Woodland; Kai Yu

This paper describes the development of an Arabic speech recognition system based on a phonetic dictionary. Though phonetic systems have been previously investigated, this paper makes a number of contributions to the understanding of how to build these systems, as well as describing a complete Arabic speech recognition system. The first issue considered is discriminative training when there are a large number of pronunciation variants for each word. In particular, the loss function associated with minimum phone error (MPE) training is examined. The performance and combination of phonetic and graphemic acoustic models are then compared on both Broadcast News (BN) and Broadcast Conversation (BC) data. The final contribution of the paper is a simple scheme for automatically generating pronunciations for use in training and reducing the phonetic out-of-vocabulary rate. The paper concludes with a description and results from using phonetic and graphemic systems in a multipass/combination framework.


Journal of Logic, Language and Information | 2011

Syntactic Structures and Recursive Devices: A Legacy of Imprecision

Marcus Tomalin

Taking Chomsky’s Syntactic Structures as a starting point, this paper explores the use of recursive techniques in contemporary linguistic theory. Specifically, it is shown that there were profound ambiguities surrounding the notion of recursion in the 1950s, and that this was partly due to the fact that influential texts such as Syntactic Structures neglected to define what exactly constituted a recursive device. As a result, uncertainties concerning the role of recursion in linguistic theory have prevailed until the present day, and some of the most common misunderstandings that have appeared in recent discussions are examined at some length. This article shows that debates about such topics are frequently undermined by fundamental misunderstandings concerning core terminology, and the full extent of the prevailing haziness is revealed. An attempt is made, for instance, to distinguish between such things as iterative constructional devices and self-similar syntactic embedding, despite the fact that these are usually both unhelpfully classified as examples of recursion. Consequently, this article effectively constitutes a plea for much greater accuracy and clarity when such important issues are addressed from a linguistic perspective.


International Conference on Acoustics, Speech, and Signal Processing | 2010

Recent improvements to the Cambridge Arabic Speech-to-Text systems

Marcus Tomalin; Frank Diehl; Mark J. F. Gales; Junho Park; Philip C. Woodland

This paper describes recent improvements to the Cambridge Arabic Large Vocabulary Continuous Speech Recognition (LVCSR) Speech-to-Text (STT) system. It is shown that Multi-Layer Perceptron (MLP) features trained on phonetic targets can improve the performance of both phonemic and graphemic systems. Also, a morphological decomposition scheme is extended from the graphemic domain to the phonetic domain, and particular attention is given to the task of dictionary generation. Finally, the use of Boosted Maximum Mutual Information (BMMI) training is explored both for individual systems and in the context of system combination. The full system results show that the combined use of the above techniques reduces the Word Error Rate (WER) of the best individual system by up to 12% relative, and that the incorporation of morphological decomposition and BMMI within the four individual branches of the combined system reduces the WER by up to 9% relative.
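
The abstract reports gains as relative WER reductions. As a reading aid, the sketch below shows how a relative reduction differs from an absolute one. The function name and the WER values are hypothetical illustrations and are not taken from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the WER reduction as a fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical WERs in percent, chosen only to illustrate the arithmetic.
baseline = 20.0
improved = 17.6

absolute_gain = baseline - improved                          # about 2.4 points
relative_gain = relative_wer_reduction(baseline, improved)   # about 0.12

print(f"absolute: {absolute_gain:.1f} points, relative: {relative_gain:.0%}")
```

With these made-up values, a drop of 2.4 absolute points corresponds to a 12% relative reduction, which is why the two kinds of figure should not be compared directly.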


Conference of the European Chapter of the Association for Computational Linguistics | 2014

Word Ordering with Phrase-Based Grammars

Adrià de Gispert; Marcus Tomalin; Bill Byrne

We describe an approach to word ordering using modelling techniques from statistical machine translation. The system incorporates a phrase-based model of string generation that aims to take unordered bags of words and produce fluent, grammatical sentences. We describe the generation grammars and introduce parsing procedures that address the computational complexity of generation under permutation of phrases. Against the best previous results reported on this task, obtained using syntax-driven models, we report huge quality improvements, with BLEU score gains of 20+ which we confirm with human fluency judgements. Our system incorporates dependency language models, large n-gram language models, and minimum Bayes risk decoding.
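
To make the task concrete, here is a minimal sketch of word ordering as a search problem: given a bag of words, pick the permutation that a language model scores highest. This is a brute-force stand-in, not the phrase-based grammar approach of the paper; the bigram table, the `unseen` penalty, and the function names are all assumptions made for illustration.

```python
from itertools import permutations


def score(sequence, bigram_logprob, unseen=-10.0):
    """Sum bigram log-probabilities over a sentence with boundary markers."""
    tokens = ("<s>",) + tuple(sequence) + ("</s>",)
    return sum(bigram_logprob.get(pair, unseen)
               for pair in zip(tokens, tokens[1:]))


def best_order(bag, bigram_logprob):
    """Exhaustively search all orderings; only feasible for tiny bags."""
    return max(permutations(bag), key=lambda seq: score(seq, bigram_logprob))


# Toy bigram log-probabilities (hypothetical values).
toy_lm = {
    ("<s>", "the"): -0.5,
    ("the", "cat"): -1.0,
    ("cat", "sleeps"): -1.0,
    ("sleeps", "</s>"): -0.5,
}

ordered = best_order(["sleeps", "the", "cat"], toy_lm)
print(" ".join(ordered))
```

Exhaustive search grows factorially with the number of words, which is the complexity problem that the abstract says its parsing procedures address.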


Lingua | 2002

The formal origins of syntactic theory

Marcus Tomalin

This paper explores the influence of mathematics on the development of syntactic theory in the 20th century. In particular, Hilbertian Formalism is discussed, with specific reference to the use of formal proof-theoretical procedures, the annexation of recursive function theory and the assumption that mathematical form and meaning are separable. It is shown that certain of these pre-occupations began to influence the later work of the post-Bloomfieldians and that, ultimately, various techniques derived from the Formalist enterprise were directly incorporated into early versions of Transformational Generative Grammar.


International Conference on Acoustics, Speech, and Signal Processing | 2006

Discriminatively Trained Gaussian Mixture Models for Sentence Boundary Detection

Marcus Tomalin; Philip C. Woodland

This paper compares the performance of two types of prosodic feature models (PFMs) in a sentence boundary detection task. Specifically, systems are compared that use discriminatively trained Gaussian mixture models (MMI-GMMs) and CART-style decision trees (CDT-PFMs), along with task-specific language models, in a lattice-based decoding framework in order automatically to insert slash unit (SU) boundaries into automatic speech recognition (ASR) transcriptions of input audio files. It is shown that a system which uses MMI-GMMs performs as well as a system that uses conventional CDT-PFMs. In addition, it is shown that, when the CDT-PFM and MMI-GMM systems are combined by taking weighted averages of their respective probability streams, error rate improvements of up to 0.8% absolute over the CDT-PFM baseline can be obtained for four different test sets.
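
The combination step described above, a weighted average of two probability streams, can be sketched as follows. The per-word boundary probabilities, the weight, and the simple threshold decision are hypothetical; the paper itself decodes in a lattice-based framework with language models, which this sketch does not attempt to reproduce.

```python
def combine_streams(p_a, p_b, weight_a=0.5):
    """Linearly interpolate two equal-length streams of boundary probabilities."""
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    if len(p_a) != len(p_b):
        raise ValueError("streams must have the same length")
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(p_a, p_b)]


def boundary_positions(probs, threshold=0.5):
    """Return word indices whose combined boundary probability meets the threshold."""
    return [i for i, p in enumerate(probs) if p >= threshold]


# Hypothetical per-word boundary probabilities from the two model types.
cdt_stream = [0.1, 0.7, 0.4, 0.9]
gmm_stream = [0.2, 0.5, 0.8, 0.6]

combined = combine_streams(cdt_stream, gmm_stream, weight_a=0.5)
print(boundary_positions(combined))
```

In practice the interpolation weight would be tuned on held-out data; the value 0.5 here is only a placeholder.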


Lingua | 2003

Goodman, Quine, and Chomsky: from a grammatical point of view

Marcus Tomalin

This paper explores specific issues concerning linguistic theory and the use of simplicity criteria in the early Transformational Generative Grammar literature. In particular, the influence of Nelson Goodman and Willard Van Orman Quine upon the work of Noam Chomsky during the 1950s is assessed. The main topics considered include the development of constructional system theory, the use of mechanical procedures for measuring the formal simplicity of extralogical bases in constructional systems, and the way in which Chomsky adapted these techniques in order to facilitate the analysis of natural language. In this context, the influence of constructive nominalism upon Chomsky's early work is also considered. Finally, the relationship between the notion of simplicity in 1950s-style generative grammar and more recent discussions of economy in the Minimalist Program is assessed.

Collaboration


Marcus Tomalin's collaborations.

Top Co-Authors

Frank Diehl, University of Cambridge

Junho Park, University of Cambridge

Rasmus Dall, University of Edinburgh

Bill Byrne, University of Cambridge

Xunying Liu, University of Cambridge

Simon King, University of Edinburgh