Nicholas Kibre
Panasonic
Publication
Featured research published by Nicholas Kibre.
Journal of the Acoustical Society of America | 1999
Nicholas Kibre
This paper describes the results of a study of pitch accent categories in Welsh, and their distribution in natural speech. In addition to these descriptive findings, it describes the development of a data‐driven approach to intonation. To be useful for the analysis of natural speech, an intonational model must provide means for determining an unambiguous phonological representation for any observed speech token. One approach which will be addressed is the Tilt Model [K. Dusterhoff and A. Black, ESCA Workshop on Intonation (1997); P. Taylor, ICSLP (1998)], amended as necessary to account for the accent patterns of Welsh. This will be compared with an alternative of using statistical methods, particularly factor analysis, to determine the underlying descriptive dimensions of pitch accents in the language. The resulting model will be applied to a corpus of spoken Welsh utterances, and initial findings on the distribution of accent types will be reported.
Journal of the Acoustical Society of America | 1999
Nicholas Kibre
This paper describes how overlap regions are identified in demisyllabic filter trajectory concatenation units used in a residually excited formant synthesis system, currently being developed at PTI/STL [Pearson, Kibre, and Niedzielski, ICSLP 1998]. This approach has been shown to produce clear and human‐like synthetic speech, but, as in other concatenative methods, smooth transitions in cross‐fade regions are essential to sound quality. This can best be obtained if a nucleus region is identified for each segment type which has consistent filter trajectories in all tokens. Database size precludes manual tuning and labeling, and this paper considers and compares two approaches to automating this task. The first of these is a rule‐based approach, in which observation and phonological theory are used to formulate an ideal cross‐fade region for each segment. Each token is searched for its best match to this definition, as determined according to penalty weights for different kinds of deviation from it. The secon...
Journal of the Acoustical Society of America | 1996
Nicholas Kibre
Prediction of segment duration in TTS systems has in the past generally been accomplished under arithmetic approaches such as multiplicative and incompressibility models [D. Klatt, J. Acoust. Soc. Am. 54, 1102–1104 (1973); R. Port, J. Acoust. Soc. Am. 69, 262–724 (1981)], and sums‐of‐products models [J. van Santen, Comput. Speech Lang. 8, 95–128 (1994)]. Other research, however, suggests a more complex speech timing system than is captured by such models [H. Gopal, J. Phon. 18, 497–518 (1990)]. In this study, a limited domain of vowel duration phenomena is modeled under several designs of simple feedforward networks. The networks’ performance is then examined by using their output in our TTS system, and evaluating the naturalness of the resulting utterances in a perceptual experiment. Preliminary results indicate that simple two‐layer perceptrons are able to learn the basic patterns of environmentally conditioned variations in segment duration, while more sophisticated networks are required to capture the com...
Journal of the Acoustical Society of America | 1995
Nicholas Kibre; Kazue Hata
Distinguishing between the voiceless fricatives /f/ and /θ/ is a difficult problem in natural and synthetic speech. In a previous experiment using natural stimuli [K. Hata et al., Proc. ICSLP 327–330 (1994)], it was found that adding vowel transitions increased identification for /f/ by at least 15% in comparison with frication‐only stimuli. However, with vowel transitions, the identification of /θ/ failed to show significant improvement. The purpose of the current study was to investigate, with an improved procedure, significant cues for /θ/ which we can use in our synthesizer. Six monosyllabic nonsense words (e.g., /fiyk/, /θayk/) were recorded. Segments of approximately 30‐ms duration from different locations of /θ/ and its following vowel were spliced into f‐initial words. Eight subjects were asked to identify each stimulus as “th,” “f,” or “indistinguishable.” In the /iy/ context, /f/‐initial stimuli spliced with fricative‐vowel transitions from /θ/ were perceived as /θ/ 55% of the time, while stim...
Journal of the Acoustical Society of America | 1999
Steve Pearson; Nicholas Kibre; Nancy Niedzielski
Archive | 1997
Nicholas Kibre; Yoshizumi Terada; Kazue Hata; Rhonda Shaw
Archive | 1996
Kazue Hata; Nicholas Kibre
Archive | 2001
Nicholas Kibre; Ted H. Applebaum
Journal of the Acoustical Society of America | 2000
Nicholas Kibre; Steve Pearson
Conference of the International Speech Communication Association | 1998
Steve Pearson; Nicholas Kibre; Nancy Niedzielski