Yasuhisa Niimi
Kyoto Institute of Technology
Publication
Featured research published by Yasuhisa Niimi.
Journal of the Acoustical Society of America | 1996
Yasuhisa Niimi
A dialog control strategy was proposed based on the reliability of speech recognition. On receiving an utterance containing several items of information, a dialog system using this strategy accepts the items that have been recognized reliably, rejects the rest, and asks the speaker for an utterance containing the unaccepted items. The operation of the dialog system was analyzed mathematically under the assumptions that the system can compute the reliability R(I) of each item of information, independently of the item's position in the sentence, and accepts an item if R(I) exceeds a threshold t, and that the probability p that an item is accepted and the probability q that an accepted item is correct are given as functions of the threshold t. The analysis showed that the operation of the system can be described by a Markov process, and it derived two important quantities, N and Pac, for evaluating the performance of the dialog system: N is the average number of turns taken betw...
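The accept-and-reask dynamics described in the abstract can be approximated by simulation. The sketch below is a Monte Carlo estimate under the abstract's assumptions, a constant per-item acceptance probability p and per-acceptance correctness probability q; the function name, trial count, and turn cap are illustrative, not from the paper.

```python
import random

def simulate_dialog(num_items, p, q, max_turns=100, trials=10000):
    """Monte Carlo estimate of N (average number of turns) and Pac
    (probability every item ends up accepted and correct) under the
    threshold model: each pending item is accepted in a turn with
    probability p, and an accepted item is correct with probability q."""
    total_turns, correct_dialogs = 0, 0
    for _ in range(trials):
        pending, all_correct, turns = num_items, True, 0
        while pending > 0 and turns < max_turns:
            turns += 1
            accepted = 0
            for _ in range(pending):
                if random.random() < p:        # item passes the threshold
                    accepted += 1
                    if random.random() >= q:   # accepted but misrecognized
                        all_correct = False
            pending -= accepted
        total_turns += turns
        correct_dialogs += (pending == 0 and all_correct)
    return total_turns / trials, correct_dialogs / trials
```

With p = q = 1 every dialog finishes in one turn with all items correct; lowering p raises the average number of turns N, which is the trade-off the threshold t controls.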
Systems and Computers in Japan | 1988
Yasuhisa Niimi; Yutaka Kobayashi; Shigeru Uzuhara
This paper proposes a method for the top-down utilization of linguistic constraints in a speech understanding system. The syntactic constraint is represented by a context-free grammar, and the semantic constraint by semantic markers and case frames. The two kinds of constraint are integrated using a definite clause grammar, an extension of the context-free grammar. From the resulting representation, a word predictor program written in Prolog is generated by a mechanical transformation procedure. A left-to-right parsing scheme is assumed as the control, and the parser combines top-down and bottom-up methods to parse utterances. This eliminates the problem, serious in traditional word predictors, of the procedure falling into an infinite loop when a recursive rule exists. The method was applied to a task and its validity verified; the average branching factor of the task was estimated by simulation.
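The paper's actual predictor is compiled into Prolog from DCG rules; purely as an illustration of combining a syntactic category with a semantic marker during word prediction, here is a minimal Python sketch. The lexicon, markers, and case frames are invented for the example.

```python
# Hypothetical sketch: prediction filtered by syntax and semantics.
# A verb's case frame names the semantic marker its object must carry,
# in the spirit of DCG rules whose nonterminals carry marker arguments.

LEXICON = {
    "train":  ("noun", "vehicle"),
    "ticket": ("noun", "object"),
    "board":  ("verb", {"obj": "vehicle"}),  # case frame: object is a vehicle
    "buy":    ("verb", {"obj": "object"}),   # case frame: object is an object
}

def predict_objects(verb):
    """Predict the nouns that can fill the object slot of `verb`,
    requiring both the syntactic category and the semantic marker
    demanded by the verb's case frame."""
    _, frame = LEXICON[verb]
    return [w for w, (cat, marker) in LEXICON.items()
            if cat == "noun" and marker == frame["obj"]]
```

The syntactic constraint alone would predict every noun after either verb; the semantic marker prunes the prediction to the nouns that satisfy the case frame.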
international conference on computational linguistics | 1986
Yasuhisa Niimi; Shigeru Uzuhara; Yutaka Kobayashi
This paper describes a method for converting a task-dependent grammar into a word predictor for a speech understanding system. Since word prediction is a top-down operation, left-recursive rules induce infinite looping. We have solved this problem by applying an algorithm for bottom-up parsing.
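The paper's remedy is a bottom-up parsing algorithm; as an illustration of the same left-recursion problem, the sketch below instead uses an Earley-style chart, a different but related remedy in which prediction terminates on left-recursive rules because states already in the chart are never re-added. The toy grammar and terminal categories are invented.

```python
from collections import namedtuple

GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["NP", "PP"], ["det", "noun"]],   # left-recursive rule
    "VP": [["verb", "NP"]],
    "PP": [["prep", "NP"]],
}
TERMINALS = {"det", "noun", "verb", "prep"}

State = namedtuple("State", "lhs rhs dot origin")

def predict_next(words):
    """Terminal categories that can follow a prefix of terminal
    categories, computed with an Earley chart; the NP -> NP PP rule
    does not loop because duplicate states are skipped."""
    chart = [set() for _ in range(len(words) + 1)]
    chart[0].add(State("GAMMA", ("S",), 0, 0))
    for i in range(len(words) + 1):
        agenda = list(chart[i])
        while agenda:
            st = agenda.pop()
            if st.dot < len(st.rhs):
                sym = st.rhs[st.dot]
                if sym in TERMINALS:                      # scan
                    if i < len(words) and words[i] == sym:
                        chart[i + 1].add(State(st.lhs, st.rhs, st.dot + 1, st.origin))
                else:                                      # predict
                    for rhs in GRAMMAR[sym]:
                        new = State(sym, tuple(rhs), 0, i)
                        if new not in chart[i]:
                            chart[i].add(new)
                            agenda.append(new)
            else:                                          # complete
                for back in list(chart[st.origin]):
                    if back.dot < len(back.rhs) and back.rhs[back.dot] == st.lhs:
                        new = State(back.lhs, back.rhs, back.dot + 1, back.origin)
                        if new not in chart[i]:
                            chart[i].add(new)
                            agenda.append(new)
    return {st.rhs[st.dot] for st in chart[len(words)]
            if st.dot < len(st.rhs) and st.rhs[st.dot] in TERMINALS}
```

A naive top-down expander would expand NP to NP PP to NP PP PP without bound; the chart's duplicate check makes the same prediction converge.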
pacific rim international conference on artificial intelligence | 2002
Masahiro Araki; Kiyoshi Ueda; Masashi Akita; Takuya Nishimoto; Yasuhisa Niimi
We propose a multimodal dialogue description language that extends VoiceXML, a spoken dialogue description language for voice user interfaces. We added specifications for outputting text, images, 3D images, a life-like communication agent, and multimedia clips.
annual meeting of the special interest group on discourse and dialogue | 2001
Masahiro Araki; Yukihiko Kimura; Takuya Nishimoto; Yasuhisa Niimi
We have developed a discourse-level tagging tool for a spoken dialogue corpus using machine learning methods. As discourse-level information, we focused on dialogue acts, relevance, and discourse segments. For dialogue act tagging, we implemented a transformation-based learning procedure, which achieved 70% accuracy in an open test. For relevance and discourse segment tagging, we implemented a decision-tree-based learning procedure, which achieved about 75% and 72% accuracy, respectively.
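Transformation-based learning, in the Brill style, applies an ordered list of learned rewrite rules to an initial tagging. The sketch below shows only the rule-application side for dialogue acts; the act labels and rules are invented, and a real system would also include the learning loop that selects the rules.

```python
# Minimal Brill-style transformation sketch for dialogue-act tagging.
# Each rule rewrites an act based on the previous utterance's act,
# and rules are applied in the order they were learned.

RULES = [
    # (from_act, to_act, required_previous_act)
    ("statement", "answer", "question"),
    ("statement", "confirm", "answer"),
]

def tag_dialogue(initial_tags):
    """Apply the learned transformation rules, in order, to an
    initial (e.g. most-frequent-tag) labeling of the dialogue."""
    tags = list(initial_tags)
    for frm, to, prev in RULES:
        for i in range(1, len(tags)):
            if tags[i] == frm and tags[i - 1] == prev:
                tags[i] = to
    return tags
```

Because rules are applied in sequence, a rewrite made by an earlier rule can enable a later one, as when a statement retagged as an answer lets the following statement become a confirmation.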
Recent Research Towards Advanced Man-Machine Interface Through Spoken Language | 1996
Yasuhisa Niimi; Yutaka Kobayashi
Publisher Summary This chapter describes a method for the discourse analysis performed in a speech dialogue system under development. The purpose of the analysis is to provide the system with top-down predictions, including words and syntactic rules likely to be used in the next utterance. Contextual information is analyzed in terms of topics and discourse goals. The transition of topics through a conversation is represented as an AND-OR tree whose nodes correspond to topics, and topics are predicted by expanding the currently focused node. The structure of discourse goals is analyzed by a discourse grammar described as a context-free grammar whose terminal symbols correspond to the discourse goals of utterances. The top-down application of this discourse grammar hypothesizes discourse goals likely to appear in the utterance, each of which is translated into syntactic rules. Simulation of the dialogue system with typed input showed that these top-down hypotheses effectively reduce the vocabulary size by about 60%.
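The topic-prediction step, expanding the currently focused node of an AND-OR tree, can be sketched as follows. The tree, topic names, and traversal conventions here are invented for illustration: an OR node offers its children as alternatives, while an AND node's children occur in order, so only the first can come next.

```python
# Sketch of topic prediction over an AND-OR tree (hypothetical topics).
TREE = {
    "trip":      ("AND", ["transport", "lodging"]),
    "transport": ("OR",  ["train", "bus"]),
    "lodging":   ("OR",  ["hotel", "inn"]),
}

def predict_topics(node):
    """Return the leaf topics that may appear next when `node` is
    the currently focused node of the AND-OR tree."""
    if node not in TREE:
        return [node]                      # leaf topic: predict itself
    kind, children = TREE[node]
    if kind == "OR":                       # any alternative may come next
        out = []
        for child in children:
            out += predict_topics(child)
        return out
    return predict_topics(children[0])     # AND: first subtopic comes first
```

Each predicted topic would then select the words and syntactic rules associated with it, which is how the top-down hypotheses shrink the effective vocabulary.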
Journal of the Acoustical Society of America | 1988
Yutaka Kobayashi; Yasuhisa Niimi
A speech interface for a database is being developed. The system interprets sequences of natural language queries from a user by utilizing a wide range of knowledge sources. The acoustic analyzer transforms the speech signal into a phonetic lattice using HMM-based phonetic analysis. The syntactic and semantic analyzer predicts the words that might appear next to partial sentence hypotheses. Instead of matching word by word, the unit of matching against the lattice is a stretch of concatenated word templates bounded by robust phones, to deal with coarticulation effects across word boundaries. The dialog component includes a user model and a topic management mechanism, so as to keep track of the user's intention and to alternate the conversation initiative between the system and the user. Knowledge sources are being refined and statistics gathered for a set of queries on sightseeing, where the vocabulary size is 500–1000 and the speech corpus contains 120 sentences from each of six ...
Journal of the Acoustical Society of America | 1988
Yasuhisa Niimi; Yutaka Kobayashi
A method for speaker adaptation of a codebook in vector quantization and its application to HMM-based word recognition are reported. Under the assumption that speech vectors can be represented by the two-factor model, that is, as a sum of the two main effects of "phoneme" and "speaker" and the interaction between the two, the vector space is divided into narrow subspaces in which the speaker effect is considered constant. In each subspace, the displacement vector due to that effect is estimated from training utterances, and speaker adaptation of a reference codebook is completed by moving all the code vectors contained in the subspace parallel to the displacement. Recognition of 65 Japanese city names was performed: a speaker-independent reference codebook and the HMMs of the words were designed from utterances spoken by five male speakers. For utterances produced by 20 other male speakers, the average word recognition rate was 97.4% in the speaker-adaptive mode and 94.1% in the speak...
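The per-subspace adaptation step, shifting every code vector by one estimated displacement, can be sketched as below. This is a simplified stand-in, not the paper's procedure: it estimates the displacement as the mean of the new speaker's adaptation vectors minus the mean of the subspace's code vectors, and the function names are invented.

```python
# Sketch: adapt one subspace of a VQ codebook to a new speaker by a
# single displacement vector, assumed constant within the subspace.

def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]

def adapt_subspace(code_vectors, adaptation_vectors):
    """Shift all code vectors of one subspace parallel to the
    estimated speaker displacement (adaptation_vectors are the new
    speaker's training vectors quantized into this subspace)."""
    disp = [a - c for a, c in zip(mean(adaptation_vectors),
                                  mean(code_vectors))]
    return [[c + d for c, d in zip(vec, disp)] for vec in code_vectors]
```

Because only one displacement is estimated per subspace, a small amount of adaptation speech suffices, which is the appeal of the two-factor assumption.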
Iete Journal of Research | 1988
Yasuhisa Niimi
This paper reviews developments in speech recognition in Japan, with attention to research activities in the 1980s that might develop into fundamental technology in this field. After briefly describing the phonemic and syntactic structures of Japanese, we discuss selected subjects from a variety of studies on acoustic-phonetic analysis and three acoustic-phonetic recognition systems based on different principles. Although isolated word recognition devices have been commercialized, several problems remain to be solved in this area, such as connected word recognition, speaker-independent word recognition, and large-vocabulary word recognition; we describe some promising solutions to these problems. We summarize studies on speech understanding systems performed in the 1970s and then discuss developments in linguistic processing and the rise of studies on voice-operated word processors. Lastly, an ambitious new project on interpreting telephony is introduced.
Iete Journal of Research | 1988
Yutaka Kobayashi; Yasuhisa Niimi
This paper gives an overview of our speech understanding system and its major components, and explains the underlying ideas. The system is based on the idea that, since some phones can be located very accurately in the incoming speech, an interval bounded by such robust phones should be taken as the unit of matching. This eliminates the problem of ambiguous word boundaries in searching for the most likely sentence, since such intervals have little correspondence to word boundaries. We introduced a new processing level, the partial lattice hypothesis level, to realize this idea in our hierarchical SUS. Typical word pronunciations and their variants are precompiled into a lattice form, while modifications at word boundaries are performed by applying phonological rules as needed. Another feature of our SUS is an efficient word predictor based on bottom-up parsing.
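The segmentation idea, cutting the input at reliably detected phones rather than at word boundaries, can be sketched as follows. The set of robust phones and the phone strings here are invented; in the real system the bounds come from the acoustic analyzer's confident detections, and matching is done against a lattice rather than a plain sequence.

```python
# Sketch: split a phone sequence into matching intervals bounded by
# "robust" phones (hypothetical set of reliably detectable phones).
ROBUST = {"s", "sh", "p", "t", "k"}

def segment_at_robust(phones):
    """Return intervals of the phone sequence, each bounded by robust
    phones (bounding phones are shared between adjacent intervals)."""
    segments, start = [], 0
    for i, ph in enumerate(phones):
        if ph in ROBUST:
            segments.append(phones[start:i + 1])  # include the robust bound
            start = i                             # next interval starts here
    segments.append(phones[start:])               # trailing interval
    return segments
```

Each interval, not each word, then becomes the unit matched against the precompiled word-template lattice, so a single interval may span a word boundary.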