Network


Latest external collaboration at the country level.

Hotspot


Dive into the research topics where Osamu Yoshioka is active.

Publication


Featured research published by Osamu Yoshioka.


human language technology | 1994

A large-vocabulary continuous speech recognition algorithm and its application to a multi-modal telephone directory assistance system

Yasuhiro Minami; Kiyohiro Shikano; Osamu Yoshioka; Satoshi Takahashi; Tomokazu Yamada; Sadaoki Furui



Speech Communication | 2014

Prosodic variation enhancement using unsupervised context labeling for HMM-based expressive speech synthesis

Yu Maeno; Takashi Nose; Takao Kobayashi; Tomoki Koriyama; Yusuke Ijima; Hideharu Nakajima; Hideyuki Mizuno; Osamu Yoshioka

This paper proposes an unsupervised labeling technique that uses phrase-level prosodic contexts for HMM-based expressive speech synthesis, enabling users to manually enhance the prosodic variation of synthetic speech without degrading its naturalness. In the proposed technique, HMMs are first trained using conventional labels that include only linguistic information, and prosodic features are generated from these HMMs. The average difference between the original and generated prosodic features for each accent phrase is then calculated and classified into three classes, e.g., low, neutral, and high in the case of fundamental frequency. The created prosodic context label has a practical meaning, such as high or low relative pitch at the phrase level, so users can be expected to modify the prosodic characteristics of synthetic speech intuitively by manually changing the proposed labels. In the experiments, we evaluate the proposed technique under both ideal and practical conditions using sales-talk and fairy-tale speech recorded in a realistic domain. Under the practical condition, we evaluate whether users achieve their intended prosodic modification by changing the proposed context label of a certain accent phrase in a given sentence.
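The labeling step in this technique reduces to a small per-phrase computation: compare the average prosodic feature of the original recording against what the linguistically labeled HMMs generate, and bin the difference. A minimal sketch of that step, assuming voiced log F0 values are already extracted per accent phrase and using a hypothetical class-boundary threshold (the paper's actual boundaries are not given here):

```python
import numpy as np

def label_phrase_f0(original_logf0, generated_logf0, threshold=0.05):
    """Assign a three-class prosodic context label to one accent phrase.

    original_logf0 / generated_logf0: arrays of voiced log F0 values for the
    phrase, from the natural recording and from speech generated by the
    linguistically labeled HMMs, respectively.
    threshold: hypothetical decision boundary (in log Hz), for illustration.
    """
    diff = np.mean(original_logf0) - np.mean(generated_logf0)
    if diff > threshold:
        return "high"      # original pitch is higher than the model predicts
    elif diff < -threshold:
        return "low"       # original pitch is lower than the model predicts
    return "neutral"

# Example: relabel a training utterance phrase by phrase (toy log F0 values).
phrases = [
    (np.array([5.10, 5.20, 5.15]), np.array([4.90, 5.00, 4.95])),
    (np.array([4.80, 4.85]), np.array([4.82, 4.87])),
]
labels = [label_phrase_f0(orig, gen) for orig, gen in phrases]
print(labels)  # ['high', 'neutral']
```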


Speech Communication | 1994

Large-vocabulary continuous speech recognition algorithm applied to a multi-modal telephone directory assistance system

Yasuhiro Minami; Kiyohiro Shikano; Satoshi Takahashi; Tomokazu Yamada; Osamu Yoshioka; Sadaoki Furui

This paper describes an accurate and efficient algorithm for very-large-vocabulary continuous speech recognition. It is based on a two-stage LR parser with hidden Markov models (HMMs) as phoneme models. To improve recognition accuracy, it uses the forward and backward trellis likelihoods. To improve search efficiency, it uses adjusting windows, merges candidates that have the same allophonic phoneme sequence and grammatical state, and then merges candidates at the meaning level. This algorithm was applied to a telephone directory assistance system covering more than 70,000 subscribers (about 80,000 words) to evaluate its speaker-independent speech recognition capabilities. For eight speakers, the algorithm achieved a speech understanding rate of 65% for spontaneous speech. The results show that the system performs well despite the large word perplexity. This paper also describes a multi-modal dialog system that uses our large-vocabulary speech recognition algorithm.
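The candidate-merging step mentioned above can be sketched compactly: partial hypotheses that share the same allophonic phoneme sequence and grammatical state will expand identically in future search, so only the best-scoring one needs to survive. A minimal sketch with hypothetical candidate fields, not the paper's actual data structures:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    allophones: tuple   # allophonic phoneme sequence decoded so far
    grammar_state: int  # LR parser state (illustrative field)
    log_prob: float     # trellis likelihood of this partial hypothesis

def merge_candidates(candidates):
    """Keep only the best hypothesis per (allophone sequence, grammar state).

    Hypotheses that agree on both keys are interchangeable for the rest of
    the search, so pruning all but the highest-scoring one loses nothing
    (standard hypothesis recombination).
    """
    best = {}
    for c in candidates:
        key = (c.allophones, c.grammar_state)
        if key not in best or c.log_prob > best[key].log_prob:
            best[key] = c
    return list(best.values())
```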


international conference on acoustics, speech, and signal processing | 2013

Use of latent words language models in ASR: A sampling-based implementation

Ryo Masumura; Hirokazu Masataki; Takanobu Oba; Osamu Yoshioka; Satoshi Takahashi

This paper applies the latent words language model (LWLM) to automatic speech recognition (ASR). LWLMs are trained taking related words into account, i.e., words that are similar in meaning and syntactic role are grouped together. This means, for example, that if a technical word and a general word play a similar syntactic role, they are given similar probabilities, so the LWLM can be expected to perform robustly across multiple domains. Furthermore, interpolating the LWLM with a standard n-gram LM should be effective, since the two LMs have different learning criteria. In addition, this paper describes an approximation of the LWLM for ASR in which word sequences are randomly sampled from the LWLM and a standard word n-gram language model is trained on the samples. This enables one-pass decoding. Our experimental results show that the LWLM performs comparably to the hierarchical Pitman-Yor language model (HPYLM) on an in-domain task and performs more robustly on out-of-domain tasks. Moreover, an interpolation model with the HPYLM provides a lower word error rate on all tasks.
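The sampling-based approximation can be pictured as a two-step pipeline: draw a synthetic corpus from the LWLM, then train an ordinary n-gram model on it. A minimal sketch, assuming a trained LWLM is available as a latent-word sequence model plus per-latent-word emission distributions; the function and dictionary shapes below are illustrative, not the paper's interface:

```python
import random
from collections import Counter

def sample_corpus(latent_lm, emit_probs, n_sentences, max_len=20):
    """Approximate an LWLM by sampling: draw latent words from a latent-word
    sequence model, then emit a surface word from each latent word.

    latent_lm: function(prev_latents) -> {latent_word: prob}   (assumed)
    emit_probs: {latent_word: {surface_word: prob}}            (assumed)
    """
    corpus = []
    for _ in range(n_sentences):
        sentence, latents = [], ["<s>"]
        for _ in range(max_len):
            dist = latent_lm(latents)
            latent = random.choices(list(dist), weights=dist.values())[0]
            if latent == "</s>":
                break
            latents.append(latent)
            emit = emit_probs[latent]
            sentence.append(random.choices(list(emit), weights=emit.values())[0])
        corpus.append(sentence)
    return corpus

def trigram_counts(corpus):
    """Collect trigram counts from the sampled corpus; a standard n-gram LM
    estimated from these counts is what permits one-pass decoding."""
    counts = Counter()
    for sent in corpus:
        padded = ["<s>", "<s>"] + sent + ["</s>"]
        for i in range(len(padded) - 2):
            counts[tuple(padded[i:i + 3])] += 1
    return counts
```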


international conference on acoustics, speech, and signal processing | 2013

HMM-based expressive speech synthesis based on phrase-level F0 context labeling

Yu Maeno; Takashi Nose; Takao Kobayashi; Tomoki Koriyama; Yusuke Ijima; Hideharu Nakajima; Hideyuki Mizuno; Osamu Yoshioka

This paper proposes a technique for adding more prosodic variation to synthetic speech in HMM-based expressive speech synthesis. We create novel phrase-level F0 context labels from the residual between the F0 features of the original and synthetic speech for the training data. Specifically, we classify the difference of the average log F0 values between the original and synthetic speech into three classes with perceptual meanings, i.e., high, neutral, and low relative pitch at the phrase level. We evaluate both ideal and practical cases using appealing and fairy-tale speech recorded under a realistic condition. In the ideal case, we examine the potential of our technique to modify F0 patterns under a condition where the original F0 contours of the test sentences are known. In the practical case, we show how users can intuitively modify the pitch by changing the initial F0 context labels obtained from the input text.
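In the practical case, the user's interaction amounts to an edit of the F0 context labels before synthesis. A minimal sketch of that editing step, using an invented label format (real HMM synthesis systems each define their own full-context label scheme):

```python
# Hypothetical per-phrase labels carrying an F0 context field "^f0=...^".

def set_f0_context(labels, phrase_index, new_class):
    """Return a copy of the per-phrase labels with one F0 context changed.

    labels: list of label strings, one per accent phrase.
    new_class: 'low', 'neutral', or 'high'.
    """
    assert new_class in ("low", "neutral", "high")
    edited = list(labels)
    for cls in ("low", "neutral", "high"):
        edited[phrase_index] = edited[phrase_index].replace(
            f"^f0={cls}^", f"^f0={new_class}^")
    return edited

initial = ["a^f0=neutral^...", "b^f0=neutral^..."]  # from the input text
edited = set_f0_context(initial, 0, "high")         # user raises phrase 0
print(edited[0])  # a^f0=high^...
```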


spoken language technology workshop | 2010

Improving HMM-based extractive summarization for multi-domain contact center dialogues

Ryuichiro Higashinaka; Yasuhiro Minami; Hitoshi Nishikawa; Kohji Dohsaka; Toyomi Meguro; Satoshi Kobashikawa; Hirokazu Masataki; Osamu Yoshioka; Satoshi Takahashi; Genichiro Kikui

This paper reports the improvements we made to our previously proposed hidden Markov model (HMM) based summarization method for multi-domain contact center dialogues. Since the method relied on Viterbi decoding to select the utterances to include in a summary, it was unable to control the compression rate. We enhance the method by using the forward-backward algorithm together with integer linear programming (ILP) to enable control of the compression rate, producing summaries that contain as many domain-related utterances and as many important words as possible within a predefined character length. Using call transcripts as input, we verify the effectiveness of our enhancement.
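The length-constrained selection is, at its core, a 0/1 knapsack: maximize the total importance of selected utterances subject to a character budget. The paper formulates this as an ILP; the sketch below solves the same single-constraint problem with dynamic programming so it stays self-contained, with invented scores and lengths:

```python
def select_utterances(scores, lengths, budget):
    """Pick a subset of utterances maximizing total score within a character budget.

    scores: per-utterance importance (e.g., forward-backward posteriors).
    lengths: per-utterance character counts.
    For a single length constraint this DP finds the same optimum an ILP
    solver would.
    """
    n = len(scores)
    # best[c] = (total score, chosen indices) using at most c characters
    best = [(0.0, [])] * (budget + 1)
    for i in range(n):
        for c in range(budget, lengths[i] - 1, -1):
            cand_score = best[c - lengths[i]][0] + scores[i]
            if cand_score > best[c][0]:
                best[c] = (cand_score, best[c - lengths[i]][1] + [i])
    return best[budget][1]

# Toy example: 4 utterances, 50-character summary budget.
print(select_utterances([0.9, 0.2, 0.7, 0.4], [30, 10, 25, 20], 50))  # [0, 3]
```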


international conference on acoustics, speech, and signal processing | 2014

Role play dialogue topic model for language model adaptation in multi-party conversation speech recognition

Ryo Masumura; Takanobu Oba; Hirokazu Masataki; Osamu Yoshioka; Satoshi Takahashi

This paper introduces an unsupervised language model adaptation technique for multi-party conversation speech recognition. Topic models provide one of the most accurate frameworks for unsupervised language model adaptation because they can inject long-range topic information into language models. However, conventional topic models are not suitable for multi-party conversation because they assume that each speaker's speech has its own distinct topic. In a multi-party conversation, the speakers share the same conversation topic, and each utterance depends on both the topic and the speaker's role. Accordingly, this paper proposes the new concept of the “role play dialogue topic model” to exploit the attributes of multi-party conversation. The proposed topic model shares the topic distribution among the speakers and conditions on both topic and speaker role. Adaptation based on the proposed topic model realizes a new framework that holds multiple recognition hypotheses for each speaker and simultaneously adapts a language model to each speaker role. Speech recognition experiments on a call center dialogue data set show the effectiveness of the proposed method.
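The modeling idea can be summarized as a generative process: one topic distribution per conversation, shared by all speakers, with word distributions indexed by topic and speaker role. The sketch below illustrates that generative story with invented dimensions; it is not the paper's inference procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

V, K = 1000, 20                 # vocabulary size, number of topics (invented)
ROLES = ["operator", "customer"]

# Per-(role, topic) word distributions: roles share the topics but realize
# them with different wording (operator vs. customer register).
phi = {r: rng.dirichlet(np.ones(V), size=K) for r in ROLES}

def generate_conversation(n_utts, utt_len, alpha=0.1):
    """Generate one conversation: a single topic distribution theta is drawn
    once and shared by every speaker, so all utterances stay on-topic."""
    theta = rng.dirichlet(alpha * np.ones(K))
    conversation = []
    for t in range(n_utts):
        role = ROLES[t % 2]                        # alternate speakers
        z = rng.choice(K, size=utt_len, p=theta)   # per-word topic draws
        words = [rng.choice(V, p=phi[role][k]) for k in z]
        conversation.append((role, words))
    return conversation
```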


conference of the international speech communication association | 2011

HMM-based emphatic speech synthesis using unsupervised context labeling

Yu Maeno; Takashi Nose; Takao Kobayashi; Yusuke Ijima; Hideharu Nakajima; Hideyuki Mizuno; Osamu Yoshioka


conference of the international speech communication association | 1994

A multi-modal dialogue system for telephone directory assistance

Osamu Yoshioka; Yasuhiro Minami; Kiyohiro Shikano


conference of the international speech communication association | 2011

Anger recognition in spoken dialog using linguistic and para-linguistic information

Narichika Nomoto; Masafumi Tamoto; Hirokazu Masataki; Osamu Yoshioka; Satoshi Takahashi

Collaboration


Dive into Osamu Yoshioka's collaboration.

Top Co-Authors

Yasuhiro Minami
Nippon Telegraph and Telephone

Kiyohiro Shikano
Nara Institute of Science and Technology

Yusuke Ijima
Tokyo Institute of Technology

Sadaoki Furui
Tokyo Institute of Technology

Taichi Asami
Tokyo Institute of Technology

Takao Kobayashi
Tokyo Institute of Technology