Publication


Featured research published by Yoshinori Sagisaka.


Speech Communication | 2003

Multi-class composite N-gram language model

Hirofumi Yamamoto; Shuntaro Isogai; Yoshinori Sagisaka

A new language model is proposed to cope with the scarcity of training data. The proposed multi-class N-gram achieves an accurate word prediction capability and high reliability with a small number of model parameters by clustering words multi-dimensionally into classes, where the left and right context are independently treated. Each multiple class is assigned by a grouping process based on the left and right neighboring characteristics. Furthermore, by introducing frequent word successions to partially include higher order statistics, multi-class N-grams are extended to more efficient multi-class composite N-grams. In comparison to conventional word tri-grams, the multi-class composite N-grams achieved 9.5% lower perplexity and a 16% lower word error rate in a speech recognition experiment with a 40% smaller parameter size.
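The idea of treating left and right context independently can be illustrated with a small bigram sketch. This is not the authors' implementation: it assumes each word has two hand-assigned class labels, one used when the word is being predicted (`pred_class`) and one used when it serves as the preceding context (`cond_class`), and it approximates P(w | prev) as P(class of w | class of prev) times P(w | class of w). The function name, the class maps, and the toy sentence are all hypothetical, and the automatic clustering and the composite (word succession) extension described in the abstract are not covered.

```python
from collections import Counter


def train_multiclass_bigram(tokens, pred_class, cond_class):
    """Return prob(w, prev) ~= P(pred_class[w] | cond_class[prev]) * P(w | pred_class[w]).

    pred_class and cond_class are assumed, hand-made word-to-class maps;
    keeping them separate lets one word belong to different classes
    depending on whether it is predicted or used as context.
    """
    trans = Counter()     # (context class of prev, predicted class of w) counts
    cond_tot = Counter()  # context class counts
    emit = Counter()      # counts of each word in the predicted position
    pred_tot = Counter()  # predicted class counts
    for prev, w in zip(tokens, tokens[1:]):
        trans[(cond_class[prev], pred_class[w])] += 1
        cond_tot[cond_class[prev]] += 1
        emit[w] += 1
        pred_tot[pred_class[w]] += 1

    def prob(w, prev):
        c_prev, c_w = cond_class[prev], pred_class[w]
        if cond_tot[c_prev] == 0 or pred_tot[c_w] == 0:
            return 0.0  # unseen class; no smoothing in this sketch
        class_transition = trans[(c_prev, c_w)] / cond_tot[c_prev]
        word_given_class = emit[w] / pred_tot[c_w]
        return class_transition * word_given_class

    return prob


# Toy data: the class maps happen to coincide here, but they need not.
tokens = "a cat sat a dog sat".split()
pred_class = {"a": "DET", "cat": "ANIMAL", "dog": "ANIMAL", "sat": "VERB"}
cond_class = {"a": "DET", "cat": "ANIMAL", "dog": "ANIMAL", "sat": "VERB"}
prob = train_multiclass_bigram(tokens, pred_class, cond_class)
```

Because parameters are shared at the class level, the number of transition parameters scales with the number of class pairs rather than word pairs, which is the kind of saving the abstract attributes to the class-based formulation.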


2009 Oriental COCOSDA International Conference on Speech Database and Assessments | 2009

Phonetic aspects of content design in AESOP (Asian English Speech cOrpus Project)

Tanya Visceglia; Chiu-yu Tseng; Mariko Kondo; Helen M. Meng; Yoshinori Sagisaka

This research is part of the ongoing multinational collaboration “Asian English Speech cOrpus Project” (AESOP), whose aim is to build up an Asian English speech corpus representing the varieties of English spoken in Asia. AESOP is an international consortium of linguists, speech scientists, psychologists and educators from Japan, Taiwan, Hong Kong, China, Thailand, Indonesia and Mongolia. Its primary aim is to collect and compare Asian English speech corpora from the countries listed above in order to derive a set of core properties common to all varieties of Asian English, as well as to discover features that are particular to individual varieties. Each research team will use a common recording setup and share an experimental task set, and will develop a common, open-ended annotation system. Moreover, AESOP-collected corpora will be an open resource, available to the research community at large. The initial stage of the phonetics aspect of this project will be devoted to designing spoken-language tasks which will elicit production of a large range of English segmental and suprasegmental characteristics. These data will be used to generate a catalogue of acoustic characteristics particular to individual varieties of Asian English, which will then be compared with the data collected by other AESOP members in order to determine areas of overlap between L1 and L2 English as well as differences among varieties of Asian English.


Speech Communication | 2005

Generation and perception of F0 markedness for communicative speech synthesis

Yoshinori Sagisaka; Takumi Yamashita; Yoko Kokenawa

Aiming at natural F0 control for conversational speech synthesis using attributes of constituent output words, F0 characteristics are analyzed from both generation and perception viewpoints. We recorded commonly used two-phrase utterances consisting of Japanese adjective and adverb phrases expressing different degrees of markedness under designed conversational situations, and compared their F0 characteristics. The comparison showed consistent F0 control dependencies not only on the adverbs themselves but also on the attribute of the following adjective phrases. A strong positive or negative correlation is observed between the markedness of adverbs and F0 height when an adjective phrase expressing positiveness or negativeness follows the current adverb phrase. These consistencies have been perceptually confirmed by naturalness evaluation tests using the same two-phrase samples with different F0 heights. Finally, a computational model of conversational F0 control is proposed using lexical information of adjectives showing positiveness or negativeness and adverbs expressing markedness. F0 estimation experiments quantitatively showed the possibility of F0 control for natural conversational speech synthesis using the attribute of constituent output words.


International Conference on Acoustics, Speech, and Signal Processing | 2009

Objective evaluation of English learners' timing control based on a measure reflecting perceptual characteristics

Shizuka Nakamura; Shigeki Matsuda; Hiroaki Kato; Minoru Tsuzaki; Yoshinori Sagisaka

Automatic evaluation of English timing control proficiency is carried out by comparing segmental duration differences between learners and reference native speakers. To obtain an objective measure matched to human subjective evaluation, we introduced a measure reflecting perceptual characteristics. The proposed measure evaluates duration differences weighted by the loudness of the corresponding speech segment and the differences or jumps in loudness from the two adjacent speech segments. Experiments showed that estimated scores using the new perception-based measure provided a correlation coefficient of 0.72 with subjective evaluation scores given by native English speakers on the basis of naturalness in timing control. This correlation turned out to be significantly higher than that of 0.54 obtained when using a simple duration difference measure.
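A rough sketch of such a perceptually weighted duration measure follows. The abstract does not give the exact weighting, so the linear combination of segment loudness and loudness jumps used here, the function name, and the parameters `w_loud` and `w_jump` are all assumptions made for illustration only.

```python
def weighted_duration_difference(learner_dur, ref_dur, loudness,
                                 w_loud=1.0, w_jump=1.0):
    """Weighted mean absolute duration difference between learner and reference.

    Each segment's weight is (assumed) w_loud * its loudness plus
    w_jump * the absolute loudness jumps to its two neighbouring segments.
    Durations and loudness are per-segment lists of equal length.
    """
    n = len(ref_dur)
    if not (len(learner_dur) == n and len(loudness) == n):
        raise ValueError("all inputs must have one value per segment")

    total = 0.0
    weight_sum = 0.0
    for i in range(n):
        # Loudness jumps from the previous and to the next segment (0 at edges).
        left_jump = abs(loudness[i] - loudness[i - 1]) if i > 0 else 0.0
        right_jump = abs(loudness[i + 1] - loudness[i]) if i < n - 1 else 0.0
        weight = w_loud * loudness[i] + w_jump * (left_jump + right_jump)
        total += weight * abs(learner_dur[i] - ref_dur[i])
        weight_sum += weight

    return total / weight_sum if weight_sum else 0.0
```

With uniform loudness the weights are equal and the score reduces to a plain mean absolute duration difference, which corresponds to the simple baseline measure the abstract compares against.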


Natural Language Processing and Knowledge Engineering | 2009

Communicative prosody generation using language common features provided by input lexicons

Yoko Greenberg; Minoru Tsuzaki; Hiroaki Kato; Yoshinori Sagisaka

We have previously examined language-independent control characteristics of communicative prosody generation using multi-dimensional impressions of input lexicons. In this paper, we synthesized English single-phrase utterances using prosodic characteristics of Japanese speech, aiming at language-independent applications. The reading-style speech prosodies of English phrases were modified by prosodic characteristics derived from one-word utterances of Japanese speech “n”. Modifications were carried out based on lexical impressions corresponding to six impressions: confident, doubtful, allowable, unacceptable, positive and negative. The perceptual evaluation experiment showed the naturalness of speech with communicative prosody modified by the impression of input lexicons. These experimental results support the usefulness of communicative prosody control based on the impression of input lexicons and suggest possibilities for language-independent applications.


International Conference Oriental COCOSDA held jointly with Conference on Asian Spoken Language Research and Evaluation | 2013

Global F0 control parameter prediction based on impressions for communicative prosody generation

Lu Shao; Yoko Greenberg; Yoshinori Sagisaka

Aiming at communicative speech synthesis, prosody control using impressions has been proposed by applying the correlation between impressions of input lexicons and prosody. In this paper, as the first step to compute communicative prosody, we attempt to predict the F0 generation model parameters by estimating the impressions of the input sentence from its constituent lexicons. To obtain an impression vector consisting of three dimensional factors (positive-negative, confident-doubtful and allowable-unacceptable) for a given input utterance, we propose a computational scheme to calculate impression vectors using impression scores of constituent words. Using the obtained sentence impression vectors, F0 control parameters are predicted by applying three-layered feed-forward neural networks. To evaluate the effectiveness of the proposed computational framework, we experimentally confirmed that F0 parameters of communicative speech could be generated from the impressions of input lexicons.


Speech Communication | 2005

Effect of speaking rate on the acceptability of change in segment duration

Makiko Muto; Hiroaki Kato; Minoru Tsuzaki; Yoshinori Sagisaka

The acceptability of changes in segment duration at different speaking rates is studied to find useful perceptual characteristics for designing an objective naturalness measure in speech synthesis. Based on a series of previous studies on the intra-phrase positional dependency of perceptual acceptability, we investigate three factors: (1) speaking rate, (2) position within a phrase, and (3) presence/absence of a carrier sentence using three-mora (three-syllable) phrases at three rates (fast, normal and slow) with or without a carrier sentence (Experiment 1). Seven listeners evaluate the acceptability of resynthesized speech stimuli in which one of the vowel segments was either lengthened or shortened by up to 50 ms. Moreover, to understand the observed results within a psychophysical or auditory-based framework instead of language-dependent features, we simplify and replicate the temporal structures of the speech stimuli used and investigate the corresponding three factors (Experiment 2). Ten listeners rate the difference between standard and comparison stimuli in which one of the durations was either lengthened or shortened by up to 40 ms. The speech experiment shows that the acceptability for the same amount of absolute change decreased with an increase in speaking rate, i.e., the listeners responded more sensitively to the same absolute duration change when the speaking rate was fast than when it was slow. Similarly, the non-speech experiment shows that the detectability for the same amount of absolute change increased with an increase in tempo. In addition, the speech experiment shows the differences in acceptability declinations due to intra-phrase positions at three speaking rates. Similarly, the non-speech experiment shows the differences in the detectability due to temporal positions at three tempi.
These agreements between the speech and non-speech experiments suggest that the two experiments share a common perceptual mechanism in processing temporal differences. On the other hand, the speech experiment shows no consistent tendency of the acceptability declinations due to the presence/absence of a carrier sentence, while the non-speech experiment shows, in several cases, that the presence of a carrier context could lower the detectability.


Natural Language Processing and Knowledge Engineering | 2009

F0 analysis for Japanese conversational speech synthesis

Hideharu Nakajima; Yoshinori Sagisaka

This paper proposes a conversational-style text-to-speech synthesis scheme based on an analysis of fundamental frequency, F0. Through the analysis, we confirm that conversational F0 can be represented by the superpositional model using three components spanning the utterance, major phrase, and minor phrase. We compare each component of the model between conversational style and reading style to investigate the following points: where large F0 discrepancies are found, what linguistic factors are related to the discrepancies, and to what extent such discrepancies occur. This paper uses real domain data that includes a lot of linguistic context. Analysis confirms that large differences occur in global components such as single-span whole utterances and phrases, and that the differences occur at or around domain-specific expressions. The analysis also reveals that local components are almost the same in both styles. These analyses show that it is necessary to estimate the utterance and phrase components from word attributes other than grammatical clues to realize conversational synthesis in the superpositional manner.
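A superpositional representation of this kind can be sketched as a sum of component contours at three spans. The sketch below is only an illustration under simplifying assumptions: each component is a piecewise-constant offset given as a hypothetical `(start_frame, end_frame, value)` tuple, presumably on a log-F0 scale, whereas the paper's actual component shapes and estimation procedure are not described in the abstract.

```python
def superpose_f0(n_frames, utterance, major_phrases, minor_phrases):
    """Sum utterance, major-phrase and minor-phrase components frame by frame.

    Each layer is a list of (start_frame, end_frame, value) tuples with an
    exclusive end frame; the piecewise-constant shape is an assumption.
    """
    contour = [0.0] * n_frames
    for layer in (utterance, major_phrases, minor_phrases):
        for start, end, value in layer:
            # Clip each component to the valid frame range before adding it.
            for t in range(max(start, 0), min(end, n_frames)):
                contour[t] += value
    return contour


# Usage: one utterance-level baseline, two major phrases, one minor phrase.
contour = superpose_f0(
    4,
    utterance=[(0, 4, 5.0)],
    major_phrases=[(0, 2, 0.2), (2, 4, 0.1)],
    minor_phrases=[(1, 2, 0.05)],
)
```

Under this decomposition, the finding that the two styles differ mainly in the global components would mean changing only the utterance and major-phrase layers while leaving the minor-phrase layer untouched.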


International Conference on Acoustics, Speech, and Signal Processing | 2005

Speech recognition of a named entity

Tatsuhiko Tomita; Yoshiyuki Okimoto; Hirofumi Yamamoto; Yoshinori Sagisaka

A hierarchical language model is newly applied to identify a named entity consisting of multiple word sequences for continuous speech recognition. By redesigning an out-of-vocabulary model of a single word using phonotactic constraints for a named entity, a hierarchical model is composed harmoniously with conventional word and word-class N-grams. Continuous speech recognition experiments aimed at movie-title identification showed the effectiveness of this modeling in the task of inquiries on these titles. These results indicate that the proposed hierarchical language modeling architecture is applicable to multiple word successions for speech recognition to cope with unregistered expressions, and that it enables the harmonious mixed use of different statistics.


Asia Pacific Signal and Information Processing Association Annual Summit and Conference | 2014

Sentiment analysis of color attributes derived from vowel sound impression for multimodal expression

Kanako Watanabe; Yoko Greenberg; Yoshinori Sagisaka

In this paper, the sentiment correlation between color and speech has been investigated, aiming at multimodal expression of information embedded in speech. To cope with the necessity of information specification for further analyses and control of speech, so-called para-linguistic information, we carried out a sentiment experiment on the correlations between speech characteristics and color attributes associated by perceptual impressions. Using five vowel categories with twelve different prosodic patterns, we found high correlations between speech features (vowel categories and F0) and three attributes of color (hue, saturation and value). These correlations open up wide possibilities to describe and quantitatively analyze speech information that has not yet been systematically studied but partially treated as emotional variants or para-linguistic information.

Collaboration


Top Co-Authors

Minoru Tsuzaki
Kyoto City University of Arts

Chatchawarn Hansakunbuntheung
Thailand National Science and Technology Development Agency

Hirofumi Yamamoto
National Institute of Information and Communications Technology

Genichiro Kikui
Okayama Prefectural University