Nick Campbell
University of Granada
Publications
Featured research published by Nick Campbell.
Journal of the Acoustical Society of America | 2002
Nick Campbell; Andrew J. Hunt
In a speech synthesizer apparatus, a weighting coefficient training controller calculates acoustic distances, in terms of second acoustic feature parameters, between each target phoneme and the phoneme candidates of the same phoneme other than the target, based on first acoustic feature parameters and prosodic feature parameters, and determines, by a predetermined statistical analysis, weighting coefficient vectors for the respective target phonemes that define the degree to which each phoneme candidate contributes to the second acoustic feature parameters. A speech unit selector then searches for the combination of phoneme candidates that corresponds to the phoneme sequence of an input sentence and minimizes a cost comprising a target cost, representing the approximate cost between a target phoneme and a phoneme candidate, and a concatenation cost, representing the approximate cost between two adjacently concatenated phoneme candidates, and outputs index information for the selected combination of phoneme candidates. Finally, a speech synthesizer produces a speech signal corresponding to the input phoneme sequence by sequentially reading out the speech waveform segments identified by the index information and concatenating them.
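The selection step described here is, in effect, a shortest-path search over candidate sequences. Below is a minimal sketch of such a search; the function names, cost functions, and data layout are illustrative assumptions, not the patent's actual implementation.

```python
# Minimal sketch of the unit-selection search, assuming per-candidate
# target costs and pairwise concatenation costs are supplied as functions.
# best[t][j] holds the lowest cumulative cost of any candidate sequence
# ending with candidate j for target phoneme t (a Viterbi-style search).

def select_units(candidates, target_cost, concat_cost):
    """candidates: one list of candidate units per target phoneme.
    target_cost(t, j): cost of using candidate j for phoneme t.
    concat_cost(a, b): cost of joining units a and b adjacently.
    Returns the minimum-cost sequence of candidate indices."""
    best = [[target_cost(0, j) for j in range(len(candidates[0]))]]
    back = [[-1] * len(candidates[0])]
    for t in range(1, len(candidates)):
        row, ptr = [], []
        for j in range(len(candidates[t])):
            joins = [best[t - 1][i]
                     + concat_cost(candidates[t - 1][i], candidates[t][j])
                     for i in range(len(candidates[t - 1]))]
            i_best = min(range(len(joins)), key=joins.__getitem__)
            row.append(joins[i_best] + target_cost(t, j))
            ptr.append(i_best)
        best.append(row)
        back.append(ptr)
    # Trace the optimal path back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    path = [j]
    for t in range(len(candidates) - 1, 0, -1):
        j = back[t][j]
        path.append(j)
    return path[::-1]
```

The returned indices correspond to the "index information" the abstract mentions: pointers to the stored waveform segments that the synthesizer then reads out and concatenates.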
Journal of the Acoustical Society of America | 1997
Nick Campbell; Mary E. Beckman
In English and Dutch, pitch accents occur only on lexically prominent syllables. Such syllables are not always accented in longer utterances, however, and traditional descriptions characterized "stress" proper as a local increase in loudness, although the intonational event is the most salient cue to prominence, far outweighing any differences in overall rms amplitude. Recent work by Sluijter and colleagues [J. Acoust. Soc. Am. 101, 503–513 (1997)] indicates that stressed syllables in Dutch are associated with differentially increased energy at frequencies well above the fundamental, and that these spectral tilt differences are a robust cue to relative syllable prominence, whether or not the word is in focal prominence. Because accents are not necessarily associated with focused words, however, their experiments do not tell us whether spectral tilt differentiates lexically stressed from unstressed syllables in the absence of an associated intonational prominence. The current study examines five acoust...
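Spectral tilt in this sense is the relative distribution of energy between higher and lower frequency bands. The sketch below shows one simple operationalization, a high-band to low-band energy ratio in dB; the 1 kHz band edge, the Hann window, and the function name are assumptions for illustration, not the measure used in the study.

```python
import numpy as np

def spectral_tilt_db(frame, sample_rate, split_hz=1000.0):
    """Energy above split_hz relative to energy below it, in dB,
    for one windowed frame taken from a voiced syllable nucleus."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    low = spectrum[freqs < split_hz].sum()
    high = spectrum[freqs >= split_hz].sum()
    return 10.0 * np.log10(high / low)  # less negative = flatter tilt
```

On the account summarized above, a stressed syllable would be expected to show a flatter tilt (relatively more high-frequency energy) than its unstressed counterpart.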
Agent-Directed Simulation | 2004
Nick Campbell
The Expressive Speech Processing project has been collecting natural conversational speech from a number of ordinary people as they go about their daily lives for almost four years now. As a result, we have a better idea of the types of information that are signalled by interactive speech, and propose a framework within which the intended interpretation of an utterance can be specified for dialogue speech synthesis incorporating affective information. We have found that a very large proportion of speech utterances simultaneously convey non-lexical interpersonal and discourse-related information, and propose a model by which such extra-semantic protocols may be incorporated.
Journal of the Acoustical Society of America | 1999
Nick Campbell
Described in this paper are the theoretical background and implementation of a speech synthesis engine that uses an index of features describing a natural speech source to provide pointers to waveform segments that can then be re-sequenced to form novel utterances. By efficiently labelling the features of speech that are minimally sufficient to describe the perceptually relevant variation in acoustic and prosodic characteristics, the task of synthesis is reduced to "retrieval" rather than "replication," and the original waveform segments can be reused without the need for (perceptually damaging) signal processing. The drawback of this system is that it requires a large corpus of natural speech from one speaker, but current improvements in data-storage devices and CPU technology have overcome this problem. The style of the corpus speech determines the style of the synthesis, but experiments with corpora of emotional speech confirm that by switching source corpora one can easily control the sp...
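The "retrieval rather than replication" idea can be pictured as an index whose entries pair a compact feature vector with a pointer into the untouched source waveform; synthesis is then a nearest-neighbour lookup followed by a raw copy. A minimal sketch, with an assumed feature layout and Euclidean distance:

```python
import numpy as np

index = {}  # phone label -> list of (feature vector, (offset, length))

def add_unit(label, features, offset, length):
    """Register one labelled segment of the source waveform."""
    index.setdefault(label, []).append((np.asarray(features, float),
                                        (offset, length)))

def retrieve(label, target_features):
    """Return the (offset, length) pointer of the candidate whose
    features lie closest to the target; no signal is modified."""
    target = np.asarray(target_features, float)
    _, pointer = min(index[label],
                     key=lambda entry: np.linalg.norm(entry[0] - target))
    return pointer
```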
Journal of the Acoustical Society of America | 1996
Nick Campbell
This paper describes a method for producing high-quality speech synthesis, without signal processing, using indexing and resequencing of phone-sized segments from a prerecorded speech corpus, for the purpose of reproducing the voice characteristics and speaking style of the original speaker in novel utterances. It describes procedures for indexing and retrieval using pointers into an external speech corpus that enable the synthesizer to be both language- and speaker-independent. The prosody-based unit-selection process does not itself produce speech sounds, but yields an index for a "random-access" retrieval sequence into the original speech to produce the closest approximation to a desired utterance from the segments available in a given speech corpus. To find the optimal sequence of segments for concatenation, the synthesizer first creates an inventory of phones and their acoustic and prosodic characteristics, and then selects from among these by a weighted combination of features to g...
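The "weighted combination of features" used to score candidates might look like the following; the feature names, weights, and values are illustrative only, not the synthesizer's actual inventory:

```python
import numpy as np

FEATURES = ("duration", "f0_mean", "energy")

def weighted_distance(candidate, target, weights):
    """Weighted Euclidean distance between a candidate phone's
    features and the desired target values."""
    diffs = np.array([candidate[f] - target[f] for f in FEATURES])
    return float(np.sqrt(np.sum(weights * diffs ** 2)))

# Example: weight duration mismatches most heavily.
weights = np.array([2.0, 1.0, 0.5])
candidate = {"duration": 0.08, "f0_mean": 120.0, "energy": 0.6}
target = {"duration": 0.10, "f0_mean": 118.0, "energy": 0.5}
print(weighted_distance(candidate, target, weights))
```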
Journal of the Acoustical Society of America | 2006
Ryoko Hayashi; Chunyue Zhu; Toshiyuki Sadanobu; Jumpei Kaneda; Donna Erickson; Nick Campbell; Miyoko Sugito
It is critical to provide people learning Japanese as a second language with information about natural speech, since there is wide variability in articulation and speaking style associated with various social attitudes and/or expressions in Japanese. Two new methods are demonstrated for developing teaching materials for English- and Chinese-speaking learners of Japanese as a second language. One utilizes MRI movies that dynamically demonstrate differences among vowel articulations in Japanese, English, and Chinese. This approach is effective for teaching good pronunciation, especially with regard to consonant cluster production, since the timing of the articulation for consonant clusters is visibly presented. The other is audio-visual data of natural speech in Japanese demonstrating several typical expressions, e.g., a wry face and strained (laryngealized) voice for asking favors politely. This type of material shows not only variations of speech communication in Japanese but also cultural differences among native sp...
Journal of the Acoustical Society of America | 2006
Ke Li; Yoko Greenberg; Nagisa Shibuya; Yoshinori Sagisaka; Nick Campbell
In this paper, prosodic characteristics of nonverbal utterances were analyzed using the F0 generation model proposed by Fujisaki, with the aim of communicative speech generation. The analysis revealed different distributions of F0 generation parameters for four prototypical dynamic patterns (rise, gradual fall, fall, and rise&down). Since previous work has shown that these differences correspond to the impressions they convey (such as confident-doubtful, allowable-unacceptable, and positive-negative), expressed as multi-dimensional vectors, we built a computational model mapping impression vectors to F0 generation parameters. By employing a statistical optimization technique, we achieved the mapping from impression vectors to prosody generation parameters. Perceptual evaluation tests using neutral words confirmed the effectiveness of the mapping in providing communicative speech prosody. [Work supported in part by the Waseda Univ. RISE research project of "Analysis and modeling of human mech...
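The pipeline described, from impression vector to F0 generation parameters to an F0 contour, can be sketched as follows. A linear least-squares map stands in for the unspecified statistical optimization, and the renderer is a simplified one-phrase, one-accent Fujisaki model; all constants and array shapes are illustrative assumptions.

```python
import numpy as np

def fit_impression_map(impressions, params):
    """Least-squares linear map from impression vectors (n x d)
    to F0 generation parameters (n x k), with a bias column."""
    X = np.hstack([impressions, np.ones((len(impressions), 1))])
    W, *_ = np.linalg.lstsq(X, params, rcond=None)
    return W

def predict_params(W, impression):
    return np.append(impression, 1.0) @ W

def fujisaki_f0(t, fb, ap, t0, aa, t1, t2, alpha=3.0, beta=20.0, gamma=0.9):
    """Simplified Fujisaki model: ln F0(t) = ln Fb + phrase + accent,
    with one phrase command (amplitude ap at time t0) and one accent
    command (amplitude aa between times t1 and t2)."""
    def gp(x):  # phrase control: critically damped second-order response
        x = np.maximum(x, 0.0)
        return alpha ** 2 * x * np.exp(-alpha * x)
    def ga(x):  # accent control, with ceiling gamma
        x = np.maximum(x, 0.0)
        return np.minimum(1.0 - (1.0 + beta * x) * np.exp(-beta * x), gamma)
    return fb * np.exp(ap * gp(t - t0) + aa * (ga(t - t1) - ga(t - t2)))
```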
Journal of the Acoustical Society of America | 2006
Nick Campbell
A corpus of spontaneous conversational Japanese speech was collected from volunteer subjects who wore high‐quality head‐mounted microphones and recorded their daily spoken interactions to minidisk over a period of 5 years. All recordings were transcribed and tagged according to interlocutor type, and a portion representing about 10% was further annotated for speech‐act, speaker‐state, emotion, and speaking style. This paper presents timing data from the corpus, showing how the same utterance can vary according to speaking style and other factors. It presents the hundred most common utterances in the corpus and relates their durations to spectral and prosodic characteristics that vary according to affect, attitude, intention, and relationship with the listener.
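A timing comparison of the kind reported here could be set up as below; the record fields and numbers are toy illustrations, not values from the corpus.

```python
from collections import defaultdict
from statistics import mean, stdev

# (transcription, annotation tag, duration in seconds); toy values only.
records = [
    ("sou desu ne", "casual", 0.52),
    ("sou desu ne", "casual", 0.48),
    ("sou desu ne", "polite", 0.81),
    ("sou desu ne", "polite", 0.77),
]

durations = defaultdict(list)
for text, style, seconds in records:
    durations[(text, style)].append(seconds)

# Compare how the same utterance stretches or shrinks across styles.
for (text, style), values in sorted(durations.items()):
    spread = stdev(values) if len(values) > 1 else 0.0
    print(f"{text!r} [{style}]: n={len(values)} "
          f"mean={mean(values):.2f}s sd={spread:.2f}s")
```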
Conference of the International Speech Communication Association | 2004
Nick Campbell
Language Resources and Evaluation | 2004
Nick Campbell