Atsushi Sako
Kobe University
Publications
Featured research published by Atsushi Sako.
Computer Speech & Language | 2011
Shinji Watanabe; Tomoharu Iwata; Takaaki Hori; Atsushi Sako; Yasuo Ariki
In a real environment, acoustic and language features often vary depending on the speakers, speaking styles and topic changes. To accommodate these changes, speech recognition approaches that include the incremental tracking of changing environments have attracted attention. This paper proposes a topic tracking language model that can adaptively track changes in topics based on current text information and previously estimated topic models in an on-line manner. The proposed model is applied to language model adaptation in speech recognition. We use the MIT OpenCourseWare corpus and Corpus of Spontaneous Japanese in speech recognition experiments, and show the effectiveness of the proposed method.
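As a rough illustration of the on-line tracking idea, the sketch below re-estimates a topic mixture for the current chunk of text and smooths it toward the previous chunk's estimate. The fixed topic-word matrix, the forgetting factor `decay`, and the EM-style update are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def track_topics(word_counts, topic_word, prev_theta, decay=0.8, n_iters=50):
    """One on-line update: re-estimate the topic mixture for the current
    chunk of text, smoothed toward the previous chunk's estimate.

    word_counts : (V,) word counts for the current chunk
    topic_word  : (K, V) fixed topic-word probabilities P(w | k)
    prev_theta  : (K,) topic mixture estimated from the previous chunk
    decay       : weight kept on the previous estimate
    """
    theta = prev_theta.copy()
    for _ in range(n_iters):
        # E-step: responsibility of each topic for each word type
        joint = theta[:, None] * topic_word              # (K, V)
        resp = joint / joint.sum(axis=0, keepdims=True)
        # M-step: mixture from current counts, then interpolate with the
        # previous estimate so topics change smoothly over time
        new_theta = (resp * word_counts).sum(axis=1)
        new_theta /= new_theta.sum()
        theta = decay * prev_theta + (1.0 - decay) * new_theta
    return theta

def adapted_unigram(theta, topic_word):
    """Topic-adapted unigram P(w) = sum_k theta_k * P(w | k)."""
    return theta @ topic_word
```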
multimedia and ubiquitous engineering | 2008
Tetsuya Takiguchi; Atsushi Sako; Tomoyuki Yamagata; Nobuyuki Miyake; Yasuo Ariki
For a mobile robot to serve people in actual environments, such as a living room or a party room, it must be easy to control, because some users might not even be capable of operating a computer keyboard. For non-expert users, speech recognition is one of the most effective communication tools when it comes to a hands-free (human-robot) interface. This paper describes a new mobile robot with hands-free speech recognition. For a hands-free speech interface, it is important to detect commands for a robot in spontaneous utterances. Our system can determine whether users' utterances are commands for the robot or not: commands are discriminated from human-human conversations by acoustic features. The robot can then move according to the user's voice (command). In order to capture the user's voice only, a robust voice detection system with AdaBoost is also described.
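A minimal sketch of the AdaBoost-based detection idea, using scikit-learn's AdaBoostClassifier on frame-level acoustic feature vectors. The feature layout, the frame-averaging rule in `is_command`, and the threshold are illustrative assumptions, not the system's actual pipeline.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

def train_detector(X, y):
    """X: (n_frames, n_features) acoustic features (e.g. MFCCs);
    y: 1 = command speech, 0 = non-command / noise.
    The default base learner is a depth-1 decision tree (a stump)."""
    clf = AdaBoostClassifier(n_estimators=200)
    clf.fit(X, y)
    return clf

def is_command(clf, segment_features, threshold=0.5):
    """Label a whole utterance as a robot command if the mean
    frame-level posterior for the command class exceeds a threshold."""
    frame_probs = clf.predict_proba(segment_features)[:, 1]
    return float(np.mean(frame_probs)) > threshold
```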
international conference on acoustics, speech, and signal processing | 2005
Atsushi Sako; Yasuo Ariki
Live baseball commentary is difficult to recognize because the speech is fast, noisy, emotional and disfluent, with rephrasing, repetition, mistakes and grammatical deviations caused by the spontaneous speaking style. To address these problems, we propose a speech recognition method that incorporates the emotion state as well as baseball game knowledge, such as the count of innings, outs, strikes and balls. Owing to this emotion state and task-dependent knowledge, the proposed method can effectively prevent speech recognition errors. The method is formalized in the framework of probability theory and implemented in the conventional speech decoding (Viterbi) algorithm. Experimental results show that the proposed approach improves structuring and segmentation accuracy as well as keyword accuracy.
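To make the decoding-level integration concrete, here is a minimal Viterbi sketch in which task knowledge (e.g., a game-count or emotion score) enters as an additive log term on the transitions. The shapes, the `knowledge_weighted` helper, and the weighting scheme are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def viterbi(log_emit, log_trans, log_prior):
    """Standard Viterbi decoding over T frames and S states.

    log_emit : (T, S) frame log-likelihoods
    log_trans: (S, S) log transition scores (knowledge-weighted below)
    log_prior: (S,) initial log probabilities
    """
    T, S = log_emit.shape
    delta = log_prior + log_emit[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans        # (S, S)
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emit[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

def knowledge_weighted(log_trans, log_knowledge, weight=0.5):
    """Fold a game-state/emotion knowledge score into the transition
    scores as an additive log term, so transitions the task knowledge
    deems implausible (e.g. a fourth strike) are penalized in decoding."""
    return log_trans + weight * log_knowledge
```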
acm multimedia | 2005
Yasuo Ariki; Tetsuya Takiguchi; Atsushi Sako
In this paper, we propose the structure and components of a conversational television set (TV) that users can ask anything about the broadcast content and that returns the information of interest. The conversational TV is composed of two types of processing: back-end processing and front-end processing. In the back-end processing, broadcast content is analyzed using speech and video recognition techniques, and both the metadata and the structure are extracted. In the front-end processing, human speech and hand actions are recognized to understand the user's intention. We show some applications being developed for this conversational TV with multi-modal interactions, such as word explanation, human information retrieval, and event retrieval in soccer and baseball game videos with contextual awareness.
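The split between back-end indexing and front-end querying could be sketched as below. The `BackEndIndex` structure, the keyword-overlap search, and the `front_end_query` helper are hypothetical, purely to illustrate the two-stage architecture.

```python
from dataclasses import dataclass, field

@dataclass
class BackEndIndex:
    """Back-end processing result: keywords per content segment,
    as extracted off-line by speech/video recognition."""
    metadata: dict = field(default_factory=dict)   # segment id -> keyword set

    def add_segment(self, seg_id, keywords):
        self.metadata[seg_id] = set(keywords)

    def search(self, query_terms):
        """Return segment ids whose metadata overlaps the query."""
        q = set(query_terms)
        return [s for s, kw in self.metadata.items() if kw & q]

def front_end_query(index, recognized_utterance):
    """Front-end processing: map a recognized user utterance onto a
    retrieval request against the back-end index."""
    return index.search(recognized_utterance.lower().split())

# Hypothetical usage:
tv = BackEndIndex()
tv.add_segment("news-0012", ["earthquake", "kobe"])
print(front_end_query(tv, "Tell me about the Kobe earthquake"))  # ['news-0012']
```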
Archive | 2008
Tetsuya Takiguchi; Atsushi Sako; Tomoyuki Yamagata; Yasuo Ariki
Robots are now being designed to become a part of the lives of ordinary people in social and home environments, such as a service robot at the office, or a robot serving people at a party (H. G. Okuno, et al., 2002) (J. Miura, et al., 2003). One of the key issues for practical use is the development of technologies that allow for user-friendly interfaces, because many robots designed to serve people in living rooms or party rooms will be operated by non-expert users, who might not even be capable of operating a computer keyboard. Much research has also been done on the issues of human-robot interaction. For example, in (S. Waldherr, et al., 2000), a gesture interface is described for the control of a mobile robot, where a camera is used to track a person, and gestures involving arm motions are recognized and used to operate the mobile robot.

Speech recognition is one of our most effective communication tools when it comes to a hands-free (human-robot) interface. Most current speech recognition systems are capable of achieving good performance in clean acoustic environments. However, these systems require the user to turn the microphone on and off to capture voices only. Also, in hands-free environments, speech recognition performance degrades significantly because the speech signal may be corrupted by a wide variety of sources, including background noise and reverberation. To achieve highly effective speech recognition, a spoken dialog interface for a mobile robot using a microphone array system was introduced in (H. Asoh, et al., 1999).

In actual noisy environments, a robust voice detection algorithm plays an especially important role in speech recognition, because there is a wide variety of sound sources in our daily life and the mobile robot must extract only the target signal from all kinds of sounds, including background noise. Most conventional systems use an energy- and zero-crossing-based voice detection system (R. Stiefelhagen, et al., 2004). However, the noise-power-based method suffers degraded detection performance in actual noisy environments. In (T. Takiguchi, et al., 2007), a robust speech/non-speech detection algorithm using AdaBoost, which can achieve extremely high detection rates, is described. Also, for a hands-free speech interface, it is important to detect commands in spontaneous utterances. Most current speech recognition systems are not capable of discriminating system requests (utterances that users address to a system) from human-human conversations.
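For reference, the conventional energy- and zero-crossing-based detection that the chapter contrasts with the AdaBoost detector can be sketched as follows; the frame length and both thresholds are illustrative assumptions.

```python
import numpy as np

def energy_zcr_vad(signal, frame_len=400, energy_thresh=1e-3, zcr_thresh=0.25):
    """Conventional energy- and zero-crossing-based voice detection: a
    frame is marked as speech when its short-time energy is high and its
    zero-crossing rate is low (typical of voiced speech). This is the
    baseline that degrades in noise, motivating the AdaBoost detector."""
    n_frames = len(signal) // frame_len
    labels = []
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        energy = float(np.mean(frame ** 2))
        # fraction of adjacent sample pairs whose sign flips
        zcr = float(np.mean(np.abs(np.diff(np.sign(frame))) > 0))
        labels.append(energy > energy_thresh and zcr < zcr_thresh)
    return np.array(labels)
```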
IEICE Transactions on Information and Systems | 2008
Atsushi Sako; Tetsuya Takiguchi; Yasuo Ariki
In this paper, we propose a PLSA-based language model for sports-related live speech. This model is implemented using a unigram rescaling technique that combines a topic model and an n-gram. In the conventional method, unigram rescaling is performed with a topic distribution estimated from a recognized transcription history. This method can improve the performance, but it cannot express topic transition. By incorporating the concept of topic transition, it is expected that the recognition performance will be improved. Thus, the proposed method employs a “Topic HMM” instead of a history to estimate the topic distribution. The Topic HMM is an Ergodic HMM that expresses typical topic distributions as well as topic transition probabilities. Word accuracy results from our experiments confirmed the superiority of the proposed method over a trigram and a PLSA-based conventional method that uses a recognized history.
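Unigram rescaling has a standard form: the n-gram probability is multiplied by the ratio of the topic-adapted unigram to the background unigram, then renormalized. A minimal sketch, in which the exponent `beta` and the state-posterior averaging for the Topic HMM are illustrative assumptions:

```python
import numpy as np

def unigram_rescale(ngram_probs, topic_unigram, base_unigram, beta=1.0):
    """Unigram rescaling: boost words the topic model prefers relative
    to the background unigram, then renormalize.

    ngram_probs  : (V,) P(w | history) from the n-gram
    topic_unigram: (V,) P(w) under the current topic distribution
    base_unigram : (V,) background unigram P(w)
    """
    scaled = ngram_probs * (topic_unigram / base_unigram) ** beta
    return scaled / scaled.sum()

def topic_unigram_from_hmm(state_post, state_topic_word):
    """Average the per-state word distributions of an ergodic 'Topic
    HMM' by the current state posteriors.

    state_post      : (S,) posterior over HMM states
    state_topic_word: (S, V) per-state word distributions
    """
    return state_post @ state_topic_word
```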
spoken language technology workshop | 2010
Shinji Watanabe; Tomoharu Iwata; Takaaki Hori; Atsushi Sako; Yasuo Ariki
In a real environment, acoustic and language features often vary depending on the speakers, speaking styles and topic changes. This paper focuses on changes in the language environment, and applies a topic tracking model to language model adaptation for speech recognition and topic word extraction for meeting analysis. The topic tracking model can adaptively track changes in topics based on current text information and previously estimated topic models in an online manner. The effectiveness of the proposed method is shown experimentally by the improvement in speech recognition performance achieved with the Corpus of Spontaneous Japanese and by providing appropriate topic information in an automatic meeting analyzer.
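As a rough sketch of how tracked topic information could yield topic words for meeting analysis, the salience score below (log-ratio of the adapted unigram to the background unigram) is an assumption for illustration, not the paper's exact extraction criterion.

```python
import numpy as np

def topic_words(theta, topic_word, base_unigram, vocab, top_n=10):
    """Rank candidate topic words for a meeting segment: score each word
    by how much its probability under the tracked topic mixture exceeds
    the background unigram probability (a log-ratio salience)."""
    p_topic = theta @ topic_word                 # (V,) adapted unigram
    salience = np.log(p_topic) - np.log(base_unigram)
    top = np.argsort(-salience)[:top_n]
    return [(vocab[i], float(salience[i])) for i in top]
```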
IEEE Transactions on Audio, Speech, and Language Processing | 2006
Shinji Watanabe; Atsushi Sako; Atsushi Nakamura
conference of the international speech communication association | 2007
Tomoyuki Yamagata; Atsushi Sako; Tetsuya Takiguchi; Yasuo Ariki
Archive | 2006
Yasuo Ariki; Kentaro Koga; Atsushi Sako; Tetsuya Takiguchi