Publication


Featured research published by Matthew P. Aylett.


Language and Speech | 2004

The smooth signal redundancy hypothesis: a functional explanation for relationships between redundancy, prosodic prominence, and duration in spontaneous speech.

Matthew P. Aylett; Alice Turk

This paper explores two related factors which influence variation in duration, prosodic structure and redundancy in spontaneous speech. We argue that the constraint of producing robust communication while efficiently expending articulatory effort leads to an inverse relationship between language redundancy and duration. The inverse relationship improves communication robustness by spreading information more evenly across the speech signal, yielding a smoother signal redundancy profile. We argue that prosodic prominence is a linguistic means of achieving smooth signal redundancy. Prosodic prominence increases syllable duration and coincides to a large extent with unpredictable sections of speech, and thus leads to a smoother signal redundancy profile. The results of linear regressions carried out between measures of redundancy, syllable duration and prosodic structure in a large corpus of spontaneous speech confirm: (1) an inverse relationship between language redundancy and duration, and (2) a strong relationship between prosodic prominence and duration. The fact that a large proportion of the variance predicted by language redundancy and prosodic prominence is nonunique suggests that, in English, prosodic prominence structure is the means by which constraints caused by a robust signal requirement are expressed in spontaneous speech.
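
As an illustration of the kind of analysis the abstract describes, the sketch below regresses syllable duration on a simple language-redundancy measure. It is a minimal, hypothetical example with simulated toy data, not the authors' analysis code or corpus.

```python
# Minimal sketch of a duration ~ language-redundancy regression.
# All numbers are simulated toy data; variable names are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Per-syllable redundancy: log frequency plus log probability given context.
log_freq = rng.normal(-8.0, 2.0, n)
log_context_prob = rng.normal(-6.0, 2.5, n)
redundancy = log_freq + log_context_prob

# Simulate the hypothesised inverse relationship: more redundant -> shorter.
duration_ms = (
    180.0
    - 4.0 * (redundancy - redundancy.mean())
    + rng.normal(0.0, 20.0, n)
)

# Ordinary least squares: duration ~ redundancy.
X = np.column_stack([np.ones(n), redundancy])
coef, *_ = np.linalg.lstsq(X, duration_ms, rcond=None)
print(f"intercept = {coef[0]:.1f} ms, slope = {coef[1]:.2f} ms per log-prob unit")
# A negative slope corresponds to the inverse redundancy/duration relationship.
```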


Journal of the Acoustical Society of America | 2006

Language redundancy predicts syllabic duration and the spectral characteristics of vocalic syllable nuclei

Matthew P. Aylett; Alice Turk

The language redundancy of a syllable, measured by its predictability given its context and inherent frequency, has been shown to have a strong inverse relationship with syllabic duration. This relationship is predicted by the smooth signal redundancy hypothesis, which proposes that robust communication in a noisy environment can be achieved with an inverse relationship between language redundancy and the predictability given acoustic observations (acoustic redundancy). A general version of the hypothesis predicts similar relationships between the spectral characteristics of speech and language redundancy. However, investigating this claim is hampered by difficulties in measuring the spectral characteristics of speech within large conversational corpora, and difficulties in forming models of acoustic redundancy based on these spectral characteristics. This paper addresses these difficulties by testing the smooth signal redundancy hypothesis with a very high-quality corpus collected for speech synthesis, and presents both durational and spectral data from vowel nuclei on a vowel-by-vowel basis. Results confirm the duration/language redundancy results achieved in previous work, and show a significant relationship between language redundancy factors and the first two formants, although these results vary considerably by vowel. In general, however, vowels show increased centralization with increased language redundancy.
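
To make the centralisation finding concrete, here is a purely illustrative sketch: it measures how far each simulated vowel token's (F1, F2) sits from a vowel-space centre and correlates that with a redundancy score. The formant targets, centre, and data are invented, not taken from the corpus used in the paper.

```python
# Illustrative only: correlate vowel centralisation with language redundancy.
# Formant targets, the vowel-space centre, and all data are invented.
import numpy as np

rng = np.random.default_rng(1)
n = 300

redundancy = rng.normal(0.0, 1.0, n)   # standardised redundancy score
target = np.array([700.0, 1100.0])     # hypothetical peripheral vowel target (F1, F2 in Hz)
centre = np.array([500.0, 1500.0])     # hypothetical vowel-space centre (Hz)

# Simulate undershoot: higher redundancy pulls tokens towards the centre.
pull = 1.0 / (1.0 + np.exp(-redundancy))
f1 = target[0] + pull * (centre[0] - target[0]) + rng.normal(0.0, 30.0, n)
f2 = target[1] + pull * (centre[1] - target[1]) + rng.normal(0.0, 80.0, n)

# Centralisation: negative distance from the centre (larger = more central).
centralisation = -np.hypot(f1 - centre[0], f2 - centre[1])
r = np.corrcoef(redundancy, centralisation)[0, 1]
print(f"correlation(redundancy, centralisation) = {r:.2f}")
# A positive value mirrors the reported centralisation at higher redundancy.
```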


Intelligent Virtual Agents | 2007

The CereVoice Characterful Speech Synthesiser SDK

Matthew P. Aylett; Christopher J. Pidcock

CereProc® Ltd. have recently released a beta version of a commercial unit selection synthesiser featuring XML control of speech style. The system is freely available for academic use and allows fine control of the rendered speech as well as full timings to interface with avatars and other animation.
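
To illustrate what XML control of speech style can look like in general, the snippet below builds a small SSML-style document in Python. The tags shown are generic SSML used only as a stand-in; they are not the CereVoice-specific markup described in the paper.

```python
# Generic illustration of XML-controlled speech style using standard
# SSML-like tags; this is not the CereVoice markup described in the paper.
import xml.etree.ElementTree as ET

speak = ET.Element("speak")
slow = ET.SubElement(speak, "prosody", rate="slow", pitch="low")
slow.text = "I see."
ET.SubElement(speak, "break", time="300ms")
fast = ET.SubElement(speak, "prosody", rate="fast", pitch="high")
fast.text = "Oh, really?"

print(ET.tostring(speak, encoding="unicode"))
# A synthesiser that accepts markup like this can render the same words
# in noticeably different speaking styles.
```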


International Conference on Acoustics, Speech, and Signal Processing | 2013

Speaker and language independent voice quality classification applied to unlabelled corpora of expressive speech

John Kane; Stefan Scherer; Matthew P. Aylett; Louis-Philippe Morency; Christer Gobl

Voice quality plays a pivotal role in speech style variation. Therefore, control and analysis of voice quality is critical for many areas of speech technology. Until now, most work has focused on small, purpose-built corpora. In this paper we apply state-of-the-art voice quality analysis to large speech corpora built for expressive speech synthesis. A fuzzy-input fuzzy-output support vector machine classifier is trained and validated using features extracted from these corpora. We then apply this classifier to freely available audiobook data and demonstrate a clustering of the voice qualities that approximates the performance of human perceptual ratings. The ability to detect voice quality variation in these widely available unlabelled audiobook corpora means that the proposed method may be used as a valuable resource in expressive speech synthesis.
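
The pipeline in the abstract can be pictured roughly as follows. This sketch uses scikit-learn's standard SVC with per-sample weights and probability estimates as a crude stand-in for a fuzzy-input fuzzy-output SVM, and invented feature vectors in place of real voice-quality measures.

```python
# Rough stand-in for the described pipeline: train on weighted voice-quality
# features, then produce soft class memberships for unlabelled data.
# A plain SVC replaces the paper's fuzzy-input fuzzy-output SVM; data is toy.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)

# Toy feature vectors for three voice qualities: breathy (0), modal (1), tense (2).
X_train = np.vstack([rng.normal(c, 0.7, size=(50, 4)) for c in (-2.0, 0.0, 2.0)])
y_train = np.repeat([0, 1, 2], 50)
# Rater confidence stands in (very loosely) for fuzzy label memberships.
confidence = rng.uniform(0.5, 1.0, len(y_train))

clf = SVC(kernel="rbf", probability=True)
clf.fit(X_train, y_train, sample_weight=confidence)

# Soft (probabilistic) outputs for unlabelled, audiobook-style samples.
X_unlabelled = rng.normal(0.0, 1.5, size=(5, 4))
print(np.round(clf.predict_proba(X_unlabelled), 2))
```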


Language and Cognitive Processes | 2001

Taking the Hit: Leaving Some Lexical Competition To Be Resolved Post-Lexically.

Ellen Gurman Bard; Catherine Sotillo; M. Louise Kelly; Matthew P. Aylett

Natural variations in word pronunciation are not noise but information. Duration, prosodic prominence, vowel centralisation, and phonological reduction or assimilation can indicate whether a word stands alone or forms part of an utterance, whether it lies at the boundary of a major prosodic unit, is predictable in its context, or refers to a Given or a New entity. Though this variation is related to high-level factors, most discussions of lexical access seem to assume that lower-level processes (acoustic-phonetic processing, phonological representation in the mental lexicon, and lexical effects on phonological representations of input) simply overcome variations in natural pronunciation, assuring that the correct word is accessed and ultimately selected, with no shortfall in the process that demands the participation of higher-level information. Many of the papers in this volume deal with the architectural detail of this view. This paper summarises work on spontaneous unscripted speech, where variations most naturally occur, which shows why any such approach is counterproductive.


Human Factors in Computing Systems | 2016

The Smartphone: A Lacanian Stain, A Tech Killer, and an Embodiment of Radical Individualism

Matthew P. Aylett; Shaun W. Lawson

YAFR (Yet another futile rant) presents the smartphone: an unstoppable piece of technology generated from a perfect storm of commercial, technological, social and psychological factors. We begin by misquoting Steve Jobs and by being unfairly rude about the HCI community. We then consider the smartphone's ability to kill off competing technology and to undermine collectivism. We argue that its role as a Lacanian stain, an exploitative tool, and as a means of concentrating power into the hands of the few, makes it a technology that will rival the personal automobile in its effect on modern society.


Human Factors in Computing Systems | 2016

Don't Say Yes, Say Yes: Interacting with Synthetic Speech Using Tonetable

Matthew P. Aylett; Graham Pullin; David A. Braude; Blaise Potard; Shannon Hennig; Marilia Antunes Ferreira

This demo is not about what you say but how you say it. Using a tangible system, Tonetable, we explore the shades of meaning carried by the same word said in many different ways. The same word or phrase is synthesised using the Intel Edison with different expressive techniques. Tonetable allows participants to play these different tokens and select the manner in which they should be synthesised for different contexts. Adopting the visual language of mid-century modernism, the system provokes participants to think deeply about how they might want to say "yes", "oh really", or "I see". Designed with the very serious objective of supporting expressive personalisation of AAC devices, but with the ability to produce a playful and amusing experience, Tonetable will change the way you think about speech synthesis and what "yes" really means.


IEEE Transactions on Affective Computing | 2017

Speech Synthesis for the Generation of Artificial Personality

Matthew P. Aylett; Alessandro Vinciarelli; Mirjam Wester

A synthetic voice personifies the system using it. In this work we examine the impact text content, voice quality and synthesis system have on the perceived personality of two synthetic voices. Subjects rated synthetic utterances based on the Big-Five personality traits and naturalness. The naturalness rating of synthesis output did not correlate significantly with any Big-Five characteristic except for a marginal correlation with openness. Although text content is dominant in personality judgments, results showed that voice quality change implemented using a unit selection synthesis system significantly affected the perception of the Big-Five, for example, tense voice being associated with disagreeableness and lax voice with lower conscientiousness. In addition, a comparison between a parametric implementation and a unit selection implementation of the same voices showed that parametric voices were rated as significantly less neurotic than both the text alone and the unit selection system, while the unit selection was rated as more open than both the text alone and the parametric system. The results have implications for synthesis voice and system type selection for applications such as personal assistants and embodied conversational agents, where developing an emotional relationship with the user, or a branding experience, is important.
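
As a concrete picture of the naturalness check reported above, the sketch below correlates naturalness ratings with Big-Five trait ratings over a set of utterances. The ratings are random toy values, so the output is not meaningful; it only shows the shape of the analysis.

```python
# Illustrative sketch: correlate naturalness ratings with Big-Five ratings.
# All ratings are random toy data; only the analysis shape matters here.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(3)
n_utts = 120

naturalness = rng.integers(1, 6, n_utts).astype(float)   # 1-5 scale
traits = {
    name: rng.integers(1, 6, n_utts).astype(float)
    for name in ("openness", "conscientiousness", "extraversion",
                 "agreeableness", "neuroticism")
}

for name, scores in traits.items():
    r, p = pearsonr(naturalness, scores)
    print(f"{name:>17}: r = {r:+.2f}, p = {p:.3f}")
# With real ratings, a lack of significant correlations (except perhaps a
# marginal one with openness) would match the pattern reported above.
```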


Human Factors in Computing Systems | 2017

Designing Speech, Acoustic and Multimodal Interactions

Cosmin Munteanu; Pourang Irani; Sharon Oviatt; Matthew P. Aylett; Gerald Penn; Shimei Pan; Nikhil Sharma; Frank Rudzicz; Randy Gomez; Benjamin R. Cowan; Keisuke Nakamura

Traditional interfaces are continuously being replaced by mobile, wearable, or pervasive interfaces. Yet when it comes to the input and output modalities enabling our interactions, we have yet to fully embrace some of the most natural forms of communication and information processing that humans possess: speech, language, gestures, thoughts. Very little HCI attention has been dedicated to designing and developing spoken language, acoustic-based, or multimodal interaction techniques, especially for mobile and wearable devices. In addition to the enormous, recent engineering progress in processing such modalities, there is now sufficient evidence that many real-life applications do not require 100% accuracy of processing multimodal input to be useful, particularly if such modalities complement each other. This multidisciplinary, one-day workshop will bring together interaction designers, usability researchers, and general HCI practitioners to analyze the opportunities and directions to take in designing more natural interactions especially with mobile and wearable devices, and to look at how we can leverage recent advances in speech, acoustic, and multimodal processing.


Human Factors in Computing Systems | 2016

e-Seesaw: A Tangible, Ludic, Parent-child, Awareness System

Yingze Sun; Matthew P. Aylett; Yolanda Vazquez-Alvarez

In modern China, the pace of life is becoming faster and working pressure is increasing, often putting strain on families and family interaction. Twenty-three pairs of working parents and their children were asked what they saw as their main communication challenges and how they currently used communication technology to stay in touch. The mobile phone was the dominant form of communication despite being poorly rated by children as a way of enhancing a sense of connection and love. Parents and children were presented with a series of design probes to investigate how current communication technology might be supported or enhanced with a tangible and playful awareness system. One of the designs, the e-Seesaw, was selected and evaluated in a lab and home setting. Participant reaction was positive, with the design provoking a novel perspective on remote parent-child interaction and allowing even very young children to both initiate and control communication.

Collaboration


Dive into Matthew P. Aylett's collaborations.

Top Co-Authors

Alice Turk (University of Edinburgh)
Matthew Bull (University of Edinburgh)
Rasmus Dall (University of Edinburgh)