
Publication


Featured research published by Shinsuke Mori.


Journal of the Acoustical Society of America | 2005

Symbol insertion apparatus and method

Masafumi Nishimura; Nobuyasu Itoh; Shinsuke Mori

An apparatus and method are provided for the insertion of punctuation marks into appropriate positions in a sentence. An acoustic processor processes input utterances to extract voice data, and transforms the data into a feature vector. When the automatic insertion of punctuation marks is not performed, a language decoder processes the feature vector using only a general-purpose language model, and inserts a comma at a location marked in the voice data by the entry “ten,” for example, which is clearly a location at which a comma should be inserted. When automatic punctuation insertion is performed, the language decoder employs the general-purpose language model and the punctuation mark language model to identify an unvoiced, pause location for the insertion of a punctuation mark, such as a comma.
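The decoder's decision of whether to insert a punctuation mark at a pause can be sketched as a comparison of language-model scores for the two hypotheses. This is a toy illustration only: the bigram table and tokens below are invented and stand in for the general-purpose and punctuation-mark language models described in the abstract.

```python
# Toy sketch (not the patented system): decide whether to insert a comma
# at a detected pause by comparing bigram LM scores of both hypotheses.
import math

# Hypothetical bigram probabilities, invented for illustration.
BIGRAM = {
    ("<s>", "hello"): 1.0,
    ("hello", ","): 0.4, (",", "world"): 0.5,
    ("hello", "world"): 0.1,
}

def score(tokens):
    """Log-probability of a token sequence under the toy bigram model."""
    logp = 0.0
    for a, b in zip(["<s>"] + tokens, tokens):
        logp += math.log(BIGRAM.get((a, b), 1e-6))  # floor unseen pairs
    return logp

def insert_comma_at_pause(tokens, pause_index):
    """Insert a comma after tokens[pause_index] if it raises the LM score."""
    with_comma = tokens[:pause_index + 1] + [","] + tokens[pause_index + 1:]
    return with_comma if score(with_comma) > score(tokens) else tokens

print(insert_comma_at_pause(["hello", "world"], 0))
```

Here the comma hypothesis scores higher under the toy table, so the comma is inserted at the pause position.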


International Conference on Acoustics, Speech, and Signal Processing | 2007

Unsupervised Lexicon Acquisition from Speech and Text

Gakuto Kurata; Shinsuke Mori; Nobuyasu Itoh; Masafumi Nishimura

When introducing a large vocabulary continuous speech recognition (LVCSR) system into a specific domain, it is preferable to add the necessary domain-specific words and their correct pronunciations selectively to the lexicon, especially in areas where the LVCSR system must be updated frequently with new words. In this paper, we propose an unsupervised method of word acquisition for Japanese, in which no spaces exist between words. In our method, taking advantage of speech from the target domain, we select the domain-specific words from among an enormous number of word candidates extracted from raw corpora. The experiments showed that the acquired lexicon was of good quality and that it contributed to the performance of the LVCSR system for the target domain.


Meeting of the Association for Computational Linguistics | 1998

A Stochastic Language Model using Dependency and its Improvement by Word Clustering

Shinsuke Mori; Makoto Nagao

In this paper, we present a stochastic language model for Japanese using dependency. The prediction unit in this model is an attribute of a bunsetsu, represented by the product of the head of its content words and that of its function words. The relations between bunsetsu attributes are governed by a context-free grammar. Word sequences are predicted from the attributes using a word n-gram model, and the spelling of unknown words is predicted using a character n-gram model. This model is robust in that it can compute the probability of an arbitrary string, and complete in that it models everything from unknown words to dependency at the same time.
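The character n-gram component for spelling unknown words can be illustrated with a small character bigram model. This is a minimal sketch with made-up training words and add-alpha smoothing, not the paper's actual estimator.

```python
# Sketch only: a character bigram model for scoring the spelling of
# unknown words, one way to realize the unknown-word component above.
import math
from collections import Counter

def train_char_bigram(words):
    """Count character bigrams over training words, with boundary marks."""
    counts, context = Counter(), Counter()
    for w in words:
        chars = ["<w>"] + list(w) + ["</w>"]
        for a, b in zip(chars, chars[1:]):
            counts[(a, b)] += 1
            context[a] += 1
    return counts, context

def char_logprob(word, counts, context, alpha=1.0, vocab=30):
    """Add-alpha smoothed log P(word) under the character bigram model."""
    chars = ["<w>"] + list(word) + ["</w>"]
    logp = 0.0
    for a, b in zip(chars, chars[1:]):
        logp += math.log((counts[(a, b)] + alpha) /
                         (context[a] + alpha * vocab))
    return logp

counts, context = train_char_bigram(["cat", "car", "cart"])
# An unseen word sharing bigrams with the training data scores higher
# than an unrelated string of the same length.
print(char_logprob("carts", counts, context) >
      char_logprob("zzzzz", counts, context))
```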


International Conference on Computational Linguistics | 2000

A stochastic parser based on a structural word prediction model

Shinsuke Mori; Masafumi Nishimura; Nobuyasu Itoh; Shiho Ogino; Hideo Watanabe

In this paper, we present a stochastic language model using dependency. This model considers a sentence as a word sequence and predicts each word from left to right. The history at each step of prediction is a sequence of partial parse trees covering the preceding words. Our model first identifies the partial parse trees that have a dependency relation with the next word, and then predicts the next word from only those trees. Since our model is a generative stochastic model, it can be used not only as a parser but also as a language model for a speech recognizer. In our experiment, we prepared about 1,000 syntactically annotated Japanese sentences extracted from a financial newspaper and estimated the parameters of our model. We built a parser based on our model and tested it on approximately 100 sentences from the same newspaper. The accuracy of the dependency relations was 89.9%, the highest accuracy level obtained by Japanese stochastic parsers.


Meeting of the Association for Computational Linguistics | 2006

Phoneme-to-Text Transcription System with an Infinite Vocabulary

Shinsuke Mori; Daisuke Takuma; Gakuto Kurata

The noisy channel model approach has been successfully applied to various natural language processing tasks. Currently the main research focus of this approach is adaptation methods: how to capture the characteristics of words and expressions in a target domain given example sentences in that domain. As a solution, we describe a method that enlarges the vocabulary of a language model to an almost infinite size and captures the context information of these words. The new method is especially suitable for languages in which words are not delimited by whitespace. We applied our method to a phoneme-to-text transcription task in Japanese and reduced the errors of an existing method by about 10%.
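The noisy channel decision underlying phoneme-to-text transcription can be sketched as picking the text that maximizes P(text) × P(phonemes | text). The probability tables below are invented for illustration (the kana reading きしゃ has several well-known homophones) and do not reflect the paper's adaptive, infinite-vocabulary model.

```python
# Hedged sketch of the noisy channel idea: choose the text t that
# maximizes P(t) * P(phonemes | t). All numbers are invented.
PRIOR = {"貴社": 0.6, "記者": 0.3, "汽車": 0.1}          # P(text)
CHANNEL = {("きしゃ", "貴社"): 1.0,
           ("きしゃ", "記者"): 1.0,
           ("きしゃ", "汽車"): 1.0}                      # P(phonemes | text)

def decode(phonemes, candidates):
    """Return the candidate text with the highest noisy-channel score."""
    return max(candidates,
               key=lambda t: PRIOR.get(t, 0.0) * CHANNEL.get((phonemes, t), 0.0))

print(decode("きしゃ", ["貴社", "記者", "汽車"]))
```

Since all three candidates share the same reading, the channel probabilities tie and the language-model prior decides the output.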


International Conference on Acoustics, Speech, and Signal Processing | 2006

Unsupervised Adaptation of a Stochastic Language Model Using a Japanese Raw Corpus

Gakuto Kurata; Shinsuke Mori; Masafumi Nishimura

The target uses of large vocabulary continuous speech recognition (LVCSR) systems are spreading. Building a good LVCSR system specialized for a target domain takes a long time, because experts need to manually segment the corpus of the target domain, which is a labor-intensive task. In this paper, we propose a new method to adapt an LVCSR system to a new domain. In our method, we stochastically segment a Japanese raw corpus of the target domain, and a domain-specific language model (LM) is then built from this corpus. All of the domain-specific words can be added to the lexicon for LVCSR. Most importantly, the proposed method is fully automatic, so the time needed to introduce an LVCSR system can be reduced drastically. In addition, the proposed method yielded performance comparable or even superior to that of expensive manual segmentation.
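Stochastic segmentation of a raw corpus can be illustrated with a toy sampler that draws word boundaries from per-position boundary probabilities and accumulates the resulting "words". This is an illustration only, not the paper's estimator; the string and probabilities are invented.

```python
# Illustration only: segment a raw string stochastically by sampling
# word boundaries, then count the words in the sampled corpus.
import random
from collections import Counter

def stochastic_segment(text, boundary_probs, rng):
    """Split text where a sampled boundary fires; probs has len(text)-1 entries."""
    words, start = [], 0
    for i, p in enumerate(boundary_probs):
        if rng.random() < p:
            words.append(text[start:i + 1])
            start = i + 1
    words.append(text[start:])
    return words

rng = random.Random(0)
counts = Counter()
for _ in range(1000):
    counts.update(stochastic_segment("abcd", [0.9, 0.1, 0.9], rng))
# With high boundary probability after 'a' and 'c', the segments
# 'a', 'bc', and 'd' dominate the sampled corpus.
print(counts.most_common(3))
```

An LM estimated from such a sampled corpus weights each candidate word by how often the boundary model produces it, without any manual segmentation.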


International Conference on Computational Linguistics | 2002

A stochastic parser based on an SLM with arboreal context trees

Shinsuke Mori

In this paper, we present a parser based on a stochastic structured language model (SLM) with a flexible history reference mechanism. An SLM is an alternative to an n-gram model as a language model for a speech recognizer. The advantage of an SLM against an n-gram model is the ability to return the structure of a given sentence. Thus SLMs are expected to play an important part in spoken language understanding systems. The current SLMs refer to a fixed part of the history for prediction just like an n-gram model. We introduce a flexible history reference mechanism called an ACT (arboreal context tree; an extension of the context tree to tree-shaped histories) and describe a parser based on an SLM with ACTs. In the experiment, we built an SLM-based parser with a fixed history and one with ACTs, and compared their parsing accuracies. The accuracy of our parser was 92.8%, which was higher than that for the parser with the fixed history (89.8%). This result shows that the flexible history reference mechanism improves the parsing ability of an SLM, which has great importance for language understanding.


Journal of Natural Language Processing | 1998

An Improvement of a Morphological Analysis by a Morpheme Clustering

Shinsuke Mori; Makoto Nagao

In this paper, we propose improving the accuracy of a stochastic morphological analyzer through morpheme clustering and an improved unknown-word model. For morpheme clustering, we propose a method that refines a morpheme n-gram model into a class n-gram model using cross-entropy as the criterion. For the unknown-word model, we propose a method that adds, within the framework of the probabilistic model, morphemes given by dictionaries and other resources outside the training corpus. We implemented a bigram model and ran experiments on the EDR corpus, observing an improvement in morphological analysis accuracy. The accuracy of the model with both improvements exceeded that of the part-of-speech trigram model reported in previous work, showing that our model is superior in terms of morphological analysis accuracy. In addition to these experiments, we compared accuracy against a morphological analyzer whose part-of-speech system and connection tables were built by grammar experts. The stochastic morphological analyzer made significantly fewer errors than the expert-built analyzer. Compared with such analyzers based on human linguistic intuition, the stochastic approach to morphological analysis has the advantage not only of currently higher accuracy but also of permitting systematic efforts at further improvement.
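The class n-gram refinement can be illustrated with the standard class bigram factorization P(w2 | w1) ≈ P(class(w2) | class(w1)) × P(w2 | class(w2)). The classes and all probabilities below are invented for illustration; the paper's clustering is driven by cross-entropy, which this sketch does not implement.

```python
# Toy class bigram factorization after clustering morphemes into
# classes. All numbers are made up for illustration.
CLASS_OF = {"eat": "V", "drink": "V", "bread": "N", "water": "N"}
P_CLASS = {("V", "N"): 0.8, ("N", "V"): 0.5,
           ("V", "V"): 0.1, ("N", "N"): 0.2}            # P(c2 | c1)
P_WORD = {"eat": 0.5, "drink": 0.5, "bread": 0.6, "water": 0.4}  # P(w | class)

def class_bigram_prob(w1, w2):
    """P(w2 | w1) under the class bigram factorization."""
    return P_CLASS.get((CLASS_OF[w1], CLASS_OF[w2]), 0.0) * P_WORD[w2]

print(class_bigram_prob("eat", "bread"))  # ≈ 0.48
```

The factorization shares counts across words in a class, which is what lets clustering reduce the cross-entropy of a sparse morpheme n-gram model.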


Archive | 2008

Word boundary probability estimating, probabilistic language model building, kana-kanji converting, and unknown word model building

Shinsuke Mori; Daisuke Takuma


Archive | 2004

Recognizing speech, and processing data

Shinsuke Mori; Nobuyasu Itoh; Masafumi Nishimura
